Monday, November 26, 2007

Health and Efficiency (ATLAS style)

Now that the conversion to the new ATLAS MC production system (panda/pallette/pangea?) is underway, I thought it would be interesting to compare the site's view of efficiency under the new system with that under the old. First I had to fix up our local accounting database, which was truncating some of the longer username fields we have now (e.g., prdatlasNNN). After doing that, I could easily distinguish between panda pilots and other production activities.

Since we upgraded to SL4 in September (which was just about the time that Rod started toying with panda) the scores are:


Lexor/Cronus
+-------+-----------+------------+---------+
| Jobs  | CPU_Hours | Wall_Hours | Eff     |
+-------+-----------+------------+---------+
| 20047 | 282434.6  | 533904.0   | 0.52899 |
+-------+-----------+------------+---------+

Panda
+-------+------------+------------+------------+
| Jobs  | CPU_Hours  | Wall_Hours | Eff        |
+-------+------------+------------+------------+
| 17746 | 57312.5925 | 59600.919  | 0.96160584 |
+-------+------------+------------+------------+


This is quite a different view of "efficiency" from the VO's view, because here the actual success or failure of the job is masked: we're only looking at wall time efficiency in the batch system, i.e. CPU hours divided by wall hours. However, the improvement here is spectacular (from about 53% to about 96%), so sites should, I think, be very happy with this change.
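For anyone wanting to check the arithmetic, here is a minimal sketch of the calculation, assuming the Eff column is simply CPU hours divided by wall hours. The function name and the dictionary layout are made up for illustration; this is not the query actually run against our accounting database.

```python
def efficiency(cpu_hours: float, wall_hours: float) -> float:
    """Wall time efficiency: CPU hours divided by wall hours.

    Returns 0.0 when no wall time was recorded, to avoid dividing by zero.
    """
    if wall_hours <= 0:
        return 0.0
    return cpu_hours / wall_hours


# (CPU_Hours, Wall_Hours) totals copied from the two tables above.
totals = {
    "Lexor/Cronus": (282434.6, 533904.0),
    "Panda": (57312.5925, 59600.919),
}

# Hypothetical summary structure: production system name -> efficiency.
results = {name: efficiency(cpu, wall) for name, (cpu, wall) in totals.items()}

for name, eff in results.items():
    print(f"{name}: {eff:.3f}")
```

Note that this says nothing about whether the jobs succeeded; it only measures how much of the occupied wall time was spent using the CPU.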

Note that the panda figures include all the pilots, even the ones which had no jobs to pick up (production stalled a few times because of dCache problems at RAL and other teething troubles). If one masks these jobs out then the efficiency is even better: 98.1%.
