Friday, May 08, 2009

ScotGrid Updates

  1. Mike enabled pilot roles for both ATLAS and LHCb. He will also work on a parser that digests Torque logs and reports the accounting figures in HEP-SPEC06.
  2. Dug has been tracking down problems and discovering more about the LCG-CE's failure modes than he ever wanted to know (jobs running twice because of comms problems all down the line between Ganga, the WMS, the CE and the batch system).
  3. Stuart has been optimising the cleanup of shared disk areas, which had been cramping our style by sending the main NFS server into serious I/O wait for 20 hours a day.
  4. Sam has installed a small test xrootd server - hopefully I will start running some analysis jobs against it soon to test it out.
  5. We reviewed our fairshares in advance of STEP09 to make sure each group was getting their due. We dropped most of our opportunistic VOs down to 1%.
  6. I discovered a jolly wheeze in Maui: use a QOS to bind the three different ATLAS fairshares into one QOS unit with its own fairshare. This gives the ATLAS sub-groups a fairshare advantage whenever total ATLAS usage is under the total ATLAS target. It goes like this:
GROUPCFG[atlas] FSTARGET=10 MAXPROC=2000,2000 QDEF=atlas
GROUPCFG[atlasprd] FSTARGET=21 MAXPROC=2000,2000 QDEF=atlas
GROUPCFG[atlaspil] FSTARGET=11 MAXPROC=2000,2000 QDEF=atlas
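
To see why the shared QOS helps, here is a rough Python sketch of the arithmetic. It is a toy model, not Maui's actual priority calculation: it assumes the QOS target is simply the sum of the three group targets (the QOS line itself is not shown above), that each fairshare term is a weighted "target minus usage", and that both weights are 1. The usage figures and the `fairshare_term` helper are made up for illustration.

```python
# Fairshare targets (percent) taken from the GROUPCFG lines above.
GROUP_TARGETS = {"atlas": 10.0, "atlasprd": 21.0, "atlaspil": 11.0}

# Assumption: the shared QOS target is the sum of the three group targets.
QOS_TARGET = sum(GROUP_TARGETS.values())


def fairshare_term(group, usage, group_weight=1.0, qos_weight=1.0):
    """Toy fairshare term: weighted (target - usage) for the group plus the QOS."""
    group_delta = GROUP_TARGETS[group] - usage[group]
    qos_delta = QOS_TARGET - sum(usage.values())
    return group_weight * group_delta + qos_weight * qos_delta


# Hypothetical usage (percent): atlasprd is over its own target,
# but ATLAS as a whole is well under the combined target.
usage = {"atlas": 2.0, "atlasprd": 25.0, "atlaspil": 3.0}

without_qos = fairshare_term("atlasprd", usage, qos_weight=0.0)
with_qos = fairshare_term("atlasprd", usage)
print(QOS_TARGET, without_qos, with_qos)
```

In this model atlasprd on its own would be penalised for being over target, but the QOS term more than makes up for it while total ATLAS usage stays under the combined target. In real Maui the relative sizes are set by weights such as FSGROUPWEIGHT and FSQOSWEIGHT, and usage is averaged over the fairshare window.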

  1. Running well, but we decided not to implement the ATLAS pilot role (no intention to really support ATLAS analysis - they don't have the disk) and the LHCb pilot role is optional.
  2. Ran the HEP-SPEC06 benchmark on their nodes and got 67.82 for their Xeon L5430s (2.66GHz).
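
A benchmark figure like this is what a Torque accounting parser needs in order to report usage in benchmark units. The Python sketch below shows the idea under some loud assumptions: that 67.82 is a whole-node score, that a node has 8 cores (two quad-core L5430s), and that jobs are single-core. The sample log line is invented, and the helper names are hypothetical; only the general `date;type;jobid;key=value ...` layout of Torque accounting records is relied on.

```python
NODE_SCORE = 67.82      # benchmark result quoted above
CORES_PER_NODE = 8      # assumption: dual-socket, quad-core L5430
PER_CORE_SCORE = NODE_SCORE / CORES_PER_NODE


def hms_to_hours(hms):
    """Convert an HH:MM:SS string to hours."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours + minutes / 60 + seconds / 3600


def parse_end_record(line):
    """Return (job_id, fields) for a job-end ('E') record, else None."""
    _timestamp, rec_type, job_id, attrs = line.strip().split(";", 3)
    if rec_type != "E":
        return None
    # Attributes are space-separated key=value pairs; values containing
    # spaces are not handled in this sketch.
    fields = dict(item.split("=", 1) for item in attrs.split() if "=" in item)
    return job_id, fields


def benchmark_hours(fields, per_core_score=PER_CORE_SCORE):
    """Wall-clock hours scaled by the per-core score (single-core job assumed)."""
    return hms_to_hours(fields["resources_used.walltime"]) * per_core_score


# Invented example record, for illustration only.
sample = ("05/08/2009 12:00:00;E;12345.svr.example.org;"
          "user=atlas001 group=atlasprd queue=q1d "
          "resources_used.cput=01:30:00 resources_used.walltime=02:00:00")

job_id, fields = parse_end_record(sample)
print(job_id, fields["group"], benchmark_hours(fields))
```

Summing `benchmark_hours` per `group` over a day's log would give the per-VO accounting totals; a real parser would also need the right score for each worker node type.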
  1. To ward off less efficient user jobs we deleted ATLAS AOD - should see them only doing production for now.
  2. APEL publishing problem fixed.
  3. Steve plans to replace the ancient gLite 3.0 CE with a spiffy new gLite 3.1 one.
