Thursday, February 15, 2007

Final presentation in Melbourne: "Grid Data Management and Storage (An EGEE-Centric View)".

This was organised by VeRSI (the Victorian eResearch Strategic Initiative). They have funding to set up multi-site storage of ~100TB to help scientists in Victoria share data, and were very interested in the EGEE DM solutions. I ran through storage, catalogs, SRM 2.2, FTS, etc. (presentation here). However, my conclusion was that operations were more important than technology choices - perhaps that is the real lesson from EGEE.

After learning a bit more about their project, I was inclined to recommend dCache for them - they have a multi-site storage problem, with dedicated networking between the data centres, and dCache would seem to offer them the most flexible approach. Of course, as with many of the Australian grid projects, SRB seemed to be their de facto solution (needless to say, with Shibboleth authentication). However, in talking to their SRB expert about the LFC, I finally managed to get a number for the scaling of the SRB MCAT catalog - it can start to have problems when you have over 30,000 files. This seems terribly low - I know the LFC has been tested up to millions of files on even quite modest hardware (although the LFC is just a file catalog, whereas MCAT is also a metadata catalog).

Of course, no software is a panacea, and they all have problems - perhaps the weakness of the EGEE solution is that there are so many bits to it that it would be quite a daunting thing to set up from scratch.

I'll be interested to see what they do decide in the end.
