Thursday, February 15, 2007

Right, before the suntan gets washed off and all memory of antipodean adventure fades, I'd better blog a bit about my visit to Melbourne Uni in the second week of the trip.

First Monday, it's VPAC. "Data Management and Storage in EGEE" is my talk.

VPAC is the Victorian Partnership for Advanced Computing. I'm giving much the same talk as I did at AUSGRID, but with some particle physics at the beginning. This is partly to flesh things out, but mostly because the LHC is so cool that setting the context for why we're doing what we're doing makes the storage and data management parts of the talk more interesting. This is my first talk pretending to be a particle physicist. Remarkable what an afternoon browsing the Standard Model on Wikipedia can produce ;-)

The audience are mostly from an HPC background, so they are more used to providing compute resources and fast interconnects, but now they are interested in providing cross-site storage, rather than just some dedicated disk hanging off their HPC machines. Oh, and they've heard about the grid.

Again, the scale of the EGEE project is generally appreciated, and by implication so is the fact that the solutions we've adopted do scale. I'm specifically asked why we didn't use SRB. It's hard to give a definitive answer, as I don't know enough about SRB. However, the EGEE solution (usefully characterised by Jens as component based) has these advantages:

  1. It allows poorly performing components to be replaced individually, e.g., the replacement of the RLS with the LFC.
  2. It allows sites whose storage also serves local users and legacy applications to layer an SRM interface on top of that storage, rather than having to implement a completely new system.
  3. It makes it easier to deploy at a large number of sites, where agreements across the whole grid are often difficult to achieve.
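To make the first point concrete, here is a toy Python sketch of what "component based" buys you: client code depends only on a catalogue interface, so a poorly performing catalogue can be swapped for a better one without touching the clients. Everything here is hypothetical and purely illustrative: `ReplicaCatalogue`, `OldCatalogue`, `NewCatalogue`, `locate` and the example LFN/SURL strings are made-up names, and the two classes are in-memory stand-ins for the roles the RLS and LFC play, not models of the real services or their APIs.

```python
from abc import ABC, abstractmethod


class ReplicaCatalogue(ABC):
    """Hypothetical interface: maps a logical file name (LFN) to replica locations."""

    @abstractmethod
    def register(self, lfn: str, surl: str) -> None: ...

    @abstractmethod
    def replicas(self, lfn: str) -> list[str]: ...


class OldCatalogue(ReplicaCatalogue):
    """Toy stand-in for the component being replaced (not the real RLS)."""

    def __init__(self) -> None:
        self._entries: list[tuple[str, str]] = []

    def register(self, lfn: str, surl: str) -> None:
        self._entries.append((lfn, surl))

    def replicas(self, lfn: str) -> list[str]:
        # Linear scan over every entry: gets slower as the catalogue grows.
        return [s for (l, s) in self._entries if l == lfn]


class NewCatalogue(ReplicaCatalogue):
    """Toy stand-in for the replacement component (not the real LFC)."""

    def __init__(self) -> None:
        self._index: dict[str, list[str]] = {}

    def register(self, lfn: str, surl: str) -> None:
        self._index.setdefault(lfn, []).append(surl)

    def replicas(self, lfn: str) -> list[str]:
        # Direct lookup by LFN; return a copy so callers can't mutate the index.
        return list(self._index.get(lfn, []))


def locate(catalogue: ReplicaCatalogue, lfn: str) -> list[str]:
    """Client code sees only the interface, so either catalogue can be plugged in."""
    return catalogue.replicas(lfn)


if __name__ == "__main__":
    for catalogue in (OldCatalogue(), NewCatalogue()):
        catalogue.register("lfn:/grid/demo/run1", "srm://se1.example.org/run1")
        catalogue.register("lfn:/grid/demo/run1", "srm://se2.example.org/run1")
        print(type(catalogue).__name__, locate(catalogue, "lfn:/grid/demo/run1"))
```

Both classes give the client the same answer; only the internals differ, which is the property that lets one component be replaced individually.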

In the discussion it also becomes clear that SRB doesn't offer any good methods of data localisation: there's no easy way to ensure that jobs at a site can access SRB data efficiently. There doesn't seem to be an SRB LAN access protocol equivalent to rfio or dcap.

After the talk I discuss SRB a little more with David Bannon. He says that the SRB model of federated storage is actually quite fragile and only seems to work if all the SRBs are running exactly the same release, so keeping a federation going means going back to big-bang upgrades.
