Fortunately, we got a big boost in the number of FTS slots from Glasgow to RAL, increasing from 10 to 25 active transfers (see the bottom FTS monitoring plot). Even so, it clearly takes 24 hours for all the backlogs to drain down.
One of the problems here is that the output files from simulation are small (a tiny log file and a 20-50MB HITS file), so the per-file overheads of FTS and SRM are very considerable and the actual bandwidth achieved is quite low. One possibility we are considering in ATLAS is introducing a pre-merge of outputs on the T2, which would allow us to send much bigger files back to the T1 (although a final "super-merge" will probably still be necessary). For this we are waiting for the generic Athena merge transform, and then we will need to test integrating it into the mainline production workflow.
Until then we just have to take the operational load of tweaking the FTS settings when necessary.
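To see why small files hurt so much, here is a rough back-of-the-envelope sketch. It treats each transfer as a fixed setup cost (the FTS + SRM negotiation) plus the time on the wire. The function name, the 10 second overhead and the 20 MB/s per-slot rate are illustrative assumptions, not measurements from Glasgow or RAL.

```python
def effective_throughput_mb_s(file_size_mb, overhead_s, link_rate_mb_s):
    """Average rate of one transfer slot: payload / (fixed overhead + wire time)."""
    wire_time_s = file_size_mb / link_rate_mb_s
    return file_size_mb / (overhead_s + wire_time_s)


# Assumed values for illustration only (not measured).
OVERHEAD_S = 10.0       # per-file FTS + SRM setup cost
LINK_RATE_MB_S = 20.0   # rate one slot could sustain with no overhead

if __name__ == "__main__":
    # 20-50 MB is the HITS file range; the larger sizes stand in for merged outputs.
    for size_mb in (20, 50, 500, 2000):
        rate = effective_throughput_mb_s(size_mb, OVERHEAD_S, LINK_RATE_MB_S)
        print(f"{size_mb:>5} MB file: {rate:5.1f} MB/s per slot")
```

Under these assumptions a HITS-sized file spends most of its slot time on overhead, while a merged file of a few hundred MB gets close to the raw per-slot rate. That is the argument for pre-merging on the T2; adding FTS slots only hides the overhead by running more of it in parallel.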
1 comment:
Queen Mary experienced the same problem a couple of months ago - at least in part due to Imperial's jobs using QMUL's SE.
I suspect the "real" solution is pipelining support in GridFTP.