I was invited to talk about data movement in GridPP/WLCG to the AstroGrid people today. It was really informal and a chance for GridPP and AstroGrid to learn a bit more about one another's problem domains and look at the solutions that each project has adopted.
AstroGrid have a nice virtual filesystem implementation called VOSpace, which allows astronomers to interact with IVOA databases and resources. They now want to extend the concept of these personal virtual spaces into the astronomy data area. So the ROE could host a database for an instrument, with tables for different data, and an astronomer would be able to run queries and store the results in a virtual space within the ROE database (roe/user/graeme/myqueries/hotstars). These results could then be copied out, e.g., into a VOSpace area.
The cunning bit is: can you combine query results between databases, i.e., do a join between a database view at ROE and one at Cambridge? Well, that requires a (possibly large) amount of data to be shipped between the two databases.
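To make the data-shipping cost concrete, here is a minimal sketch of such a cross-site join. Everything in it is an assumption for illustration: two in-memory SQLite databases stand in for the ROE and Cambridge archives, and the table names, columns, and rows are invented. It is not how AstroGrid does (or plans to do) this. The point is only that every row of the remote view has to be copied across before an ordinary local join can run.

```python
import sqlite3


def make_site(table, columns, rows):
    """Create an in-memory database standing in for one site's archive."""
    db = sqlite3.connect(":memory:")
    db.execute(f"CREATE TABLE {table} ({', '.join(columns)})")
    placeholders = ", ".join("?" for _ in columns)
    db.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)
    return db


# Hypothetical data: magnitudes held at one site, spectral types at the other.
roe = make_site("hotstars", ["obj_id INTEGER", "mag REAL"],
                [(1, 5.2), (2, 7.9), (3, 6.1)])
cambridge = make_site("spectra", ["obj_id INTEGER", "sptype TEXT"],
                      [(2, "O5"), (3, "B1"), (4, "A0")])


def cross_site_join(local, remote):
    """Join local 'hotstars' with remote 'spectra' by shipping the remote rows."""
    # Step 1: pull the whole remote view across. On a real network this is
    # the (possibly large) bulk transfer.
    shipped = remote.execute("SELECT obj_id, sptype FROM spectra").fetchall()

    # Step 2: stage the shipped rows in a temporary table at the local site.
    local.execute("CREATE TEMP TABLE remote_spectra (obj_id INTEGER, sptype TEXT)")
    local.executemany("INSERT INTO remote_spectra VALUES (?, ?)", shipped)

    # Step 3: the join itself is now an ordinary local query.
    joined = local.execute(
        "SELECT h.obj_id, h.mag, r.sptype "
        "FROM hotstars AS h JOIN remote_spectra AS r ON h.obj_id = r.obj_id "
        "ORDER BY h.obj_id"
    ).fetchall()

    local.execute("DROP TABLE remote_spectra")
    return joined, len(shipped)


joined, rows_shipped = cross_site_join(roe, cambridge)
print(joined)        # rows present at both sites
print(rows_shipped)  # rows that had to cross the network
```

Note that three rows are shipped to produce a two-row result. With real survey tables that overhead is what makes the transfer mechanism matter.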
The question they wanted to ask me was: did I know an efficient way of transferring the data between the databases?
Now, this is quite different from the LCG problem. So, instead of saying, "buggered if I know" (with my facetious streak, it was tempting ;-), I described the sorts of data flows associated with LCG, the software components we use and the achievements and limitations of our solutions: FTS is great - it ships PB of data around very reliably 24x7, balances VO and site requirements, etc.; FTS is rubbish - it needs Oracle and only talks to SRMs.
Actually, it became quite an interesting and wide-ranging chat about grids - different methods of working and how to convince users to use them. They were quite heartened to see that in HEP the VOs really do now overwhelmingly use the grid for their activities.
In the end it seems they actually want to start this quite small - get the virtualised results spaces working first, and then probably tackle the bulk data shipping later. They were definitely interested in looking at an implementation of VOSpace which used an SRM as a backend store (at the moment they haven't really gone beyond the constraints of a single RAID array). And when they do come to look at bulk data movement, then they will look at FTS and RFT as possible methods for doing it.
Nice to make contact with other communities. They get a good view from the top of Blackford Hill...