
Thursday, February 15, 2007



This is a duck pond, right? I mean there's no way it's a lake...
Final presentation in Melbourne: "Grid Data Management and Storage (An EGEE-Centric View)".

This is organised by VeRSI (the Victorian eResearch Strategic Initiative). They have funding to set up ~100TB of multi-site storage to help scientists in Victoria share data, and were very interested in the EGEE DM solutions. I ran through storage, catalogs, SRM 2.2, FTS, etc. (presentation here). However, my conclusion was that operations matter more than technology choices - perhaps that is the real lesson from EGEE.
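For anyone who hasn't met the EGEE stack, the division of labour is roughly: storage elements expose an SRM interface, a catalog (the LFC) maps logical file names to physical replicas, and FTS schedules transfers between sites. Here's a toy sketch of that layering - every class and method name is mine, invented for illustration, not the real service APIs:

    # Toy sketch of the EGEE data management layering. All names are
    # invented for illustration; the real pieces are the SRM interface
    # on storage elements, the LFC catalog and the FTS transfer service.

    class StorageElement:
        """A site's storage, fronted by an SRM-style interface."""
        def __init__(self, site):
            self.site = site
            self.files = {}                      # SURL -> data (stand-in)

        def put(self, surl, data):
            self.files[surl] = data

    class ReplicaCatalog:
        """LFC-style catalog: logical file name -> replica SURLs."""
        def __init__(self):
            self.replicas = {}                   # LFN -> [SURL, ...]

        def register(self, lfn, surl):
            self.replicas.setdefault(lfn, []).append(surl)

        def lookup(self, lfn):
            return self.replicas.get(lfn, [])

    def transfer(catalog, lfn, src, dst):
        """FTS-style copy: replicate a file between sites, then
        record the new replica in the catalog."""
        src_surl = next(s for s in catalog.lookup(lfn) if src.site in s)
        dst_surl = src_surl.replace(src.site, dst.site)
        dst.put(dst_surl, src.files[src_surl])
        catalog.register(lfn, dst_surl)

    cern = StorageElement("se.cern.ch")
    melb = StorageElement("se.unimelb.edu.au")
    lfc = ReplicaCatalog()
    cern.put("srm://se.cern.ch/atlas/run123.root", b"event data")
    lfc.register("lfn:/grid/atlas/run123.root",
                 "srm://se.cern.ch/atlas/run123.root")
    transfer(lfc, "lfn:/grid/atlas/run123.root", cern, melb)

The point of the component split is that any one box in that picture can be swapped out without touching the others.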

After learning a bit more about their project, I was inclined to recommend dCache for them - they have a multi-site storage problem, with dedicated networking between the data centres, and dCache would seem to offer them the most flexible approach. Of course, as with many of the Australian grid projects, SRB seemed to be their de facto solution (needless to say, with Shibboleth authentication). However, in talking to their SRB expert about the LFC, I finally managed to get a number for the scaling of the SRB MCAT catalog - it can start to have problems when you have over 30,000 files. This seems terribly low - I know the LFC has been tested up to millions of entries on even quite modest hardware (although the LFC is just a file catalog, whereas MCAT is also a metadata catalog).
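To make that file-catalog versus metadata-catalog distinction concrete: the LFC only has to answer "where are the replicas of this LFN?", while MCAT also has to store and query arbitrary user metadata, which is a much heavier workload. A toy illustration - invented structures, not either product's actual schema:

    # Toy contrast between a pure file catalog (LFC-like) and a
    # metadata catalog (MCAT-like); invented structures, not the
    # real schemas of either product.

    file_catalog = {
        # One indexed lookup per query: LFN -> replica locations.
        "lfn:/grid/atlas/run123.root": [
            "srm://se.cern.ch/atlas/run123.root",
            "srm://se.unimelb.edu.au/atlas/run123.root",
        ],
    }

    metadata_catalog = {
        # Replicas plus arbitrary key/value metadata. Attribute
        # queries have to scan (or index) every attribute, which
        # is where the scaling pain tends to come from.
        "lfn:/grid/atlas/run123.root": {
            "replicas": ["srm://se.cern.ch/atlas/run123.root"],
            "metadata": {"run": 123, "detector": "inner"},
        },
    }

    def files_with(catalog, key, value):
        """The kind of attribute query only a metadata catalog can
        answer - a linear scan in this naive form."""
        return [lfn for lfn, rec in catalog.items()
                if rec["metadata"].get(key) == value]

    print(files_with(metadata_catalog, "run", 123))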

Of course, no software is a panacea, and they all have problems - perhaps the weakness of the EGEE solution is that there are so many bits to it - it would be quite a daunting thing to setup from scratch.

I'll be interested to see what they do decide in the end.
My fourth talk in Australia was the School of Physics Colloquium. Here was a chance to move away from the storage and data management focus, and deliver a much more general EGEE/LHC talk, which I entitled "Enabling Grids for eScience: How to build a working grid".

As ever, the RTM is a great start - it's such an attractive visualisation of the grid. This time I had no trouble getting it to work on my mac, although it only ever uses the primary display, so I had to put my laptop into mirror display mode, then flip back to two screens for the talk - a minor niggle.

I tried to speak about EGEE in a general way, introducing the project, the services delivered - even a slide on the "Life of a Job", problems commonly seen and operational aspects. The idea was to introduce the grid to potential users, rather than site admins or service providers. This seemed to go pretty well - no one obviously fell asleep (!) and there were a number of questions about EGEE and other grids, job efficiencies and data volumes. Later some of the graduate students complimented me on a good talk, so it must have been pitched at the right level.

I had intended to mention Byzantine Generals, but it slipped my mind. Single points of failure, eh...
Tuesday night: It's Linux Users Victoria.

These guys (and of the 50 or so there, 49 are guys) are uber-geeks. Talking to them is a lot of fun - they like the physics, they like the LHC, they even like the grid. Some of them want grid certificates. Personally I think we should give certificates to them - they'd probably fix lots of stuff!

The GridPP Real Time Monitor was a big hit.
Right, before the suntan gets washed off and all memory of antipodean adventure fades, better blog a bit about my visit to Melbourne Uni on the second week of the trip.

First Monday, it's VPAC. "Data Management and Storage in EGEE" is my talk.

VPAC is the Victorian Partnership for Advanced Computing. I'm giving much the same talk as I did at AUSGRID, but with some particle physics at the beginning. This is partly to flesh things out, but mostly because the LHC is so cool it makes the storage and data management parts of the talk more interesting, having set the context for why we're doing what we're doing. This is my first talk pretending to be a particle physicist - remarkable what an afternoon browsing the Standard Model on Wikipedia can produce ;-)

The audience mostly come from an HPC background, so they're more used to providing compute resources and fast interconnects, but now they have an interest in providing cross-site storage, rather than just some dedicated disk hanging off their HPC. Oh, and they've heard about the grid.

Again, the scale of the EGEE project is generally appreciated - and, by implication, that the solutions we've adopted do scale. I'm specifically asked why we didn't use SRB. It's hard to give a definitive answer as I don't know enough about SRB; however, the advantage of the EGEE solution (usefully characterised by Jens as component-based) is that:

  1. It allows poorly performing components to be replaced individually, e.g., the replacement of the RLS with the LFC.
  2. It allows sites, whose storage is also serving local users and legacy applications, to layer an SRM interface on top of their storage, rather than have to implement a completely new system.
  3. It makes it easier to deploy at a large number of sites, where agreements across the whole grid are often difficult to achieve.

In the discussion it also becomes clear that SRB doesn't offer any good methods of data localisation - there's no easy way to ensure that jobs at a site can access SRB data efficiently. There doesn't seem to be an SRB LAN access protocol, like rfio or dcap.
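To spell out what I mean by data localisation: an EGEE job can ask the catalog for all replicas of a file, pick the one at its own site, and open it over a LAN protocol rather than dragging it across the WAN. Something like this sketch - the helper names are mine, not the lcg-utils API:

    # Sketch of replica selection for data localisation: prefer a
    # replica at the job's own site, opened over a LAN protocol such
    # as rfio or dcap, and only fall back to WAN access. Helper names
    # are invented for illustration.

    def pick_local_replica(replicas, local_site):
        """Return a SURL hosted at the job's site, if there is one."""
        for surl in replicas:
            if local_site in surl:
                return surl
        return None

    def access_url(replicas, local_site, lan_protocol="rfio"):
        local = pick_local_replica(replicas, local_site)
        if local:
            # Rewrite the SURL into a LAN-protocol TURL; SRB has no
            # equivalent local-access path, which was the complaint.
            return local.replace("srm://", lan_protocol + "://")
        return replicas[0]            # last resort: WAN access

    replicas = ["srm://se.cern.ch/atlas/run123.root",
                "srm://se.unimelb.edu.au/atlas/run123.root"]
    print(access_url(replicas, "se.unimelb.edu.au"))
    # -> rfio://se.unimelb.edu.au/atlas/run123.root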

After the talk I discuss SRB a little more with David Bannon. He says that the SRB model of federations of storage is actually quite fragile and only seems to work if all the SRBs are running exactly the same release - so it's back to big-bang upgrades for maintenance.

Sunday, February 11, 2007

Wow! It's been a busy time here in Melbourne, but just time to blog before getting back on the plane to Scotland.

The first week I was at the ACSW conference in Ballarat. It turned out that AUSGRID was only a one-day event within that, so in fact the primary reason (excuse?) for coming turned out to be the least interesting thing. There were a few good papers, though. The keynote was by Denis Caromel, from Nice, describing how they have grid "enabled" Java, in a package called ProActive. Essentially it replaces all of those awful MPI calls with nice, easy-to-manipulate Java objects. Looks like a splendid way of doing parallel code on the grid. Probably not of huge interest in LCG, where we're mostly embarrassingly parallel, but for communities coming from a supercomputing background, or just starting out with grids, it might be a very valuable toolkit to use.
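I haven't used ProActive's actual API, but the active-object idea it builds on is easy to show: method calls on an object return immediately with a future, and the work happens asynchronously. A rough Python rendering of the pattern - emphatically the idea only, not ProActive's real interface (which is Java):

    # Rough rendering of the active-object pattern that ProActive
    # builds on: every method call runs asynchronously and hands back
    # a future. This is the concept only, not ProActive's API.
    from concurrent.futures import ThreadPoolExecutor

    class ActiveObject:
        """Wrap an ordinary object so method calls return futures
        instead of blocking."""
        def __init__(self, target):
            self._target = target
            self._executor = ThreadPoolExecutor(max_workers=1)

        def __getattr__(self, name):
            method = getattr(self._target, name)
            return lambda *a, **kw: self._executor.submit(method, *a, **kw)

    class Worker:
        def simulate(self, n):
            return sum(i * i for i in range(n))   # stand-in computation

    worker = ActiveObject(Worker())
    future = worker.simulate(10 ** 6)   # returns immediately
    print(future.result())              # block only when the answer is needed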

The paper I gave went down well and there was general appreciation of the scale of EGEE and the volume of work we are doing. There was some discussion on SRM vs. SRB, a theme that continued into my second week.

I met with Lyndsay Hood, the acting program manager of the Australian Partnership for Advanced Computing (APAC), for lunch. We had an interesting chat about their storage issues, where the thing that seems to put them off the EGEE solution is the X509 authentication. Too difficult for normal users, they seem to think. This is something I've heard from people in the UK as well, but no one seems to have a better answer: portals are too limiting, username/password doesn't scale. Shibboleth gets talked about all the time, but it seems to be spoken of as a panacea, rather than a working product that people can use right now.

A general comment on computer science conferences: I've had enough graph theory to last me a lifetime.