Wednesday, November 08, 2006

Some 20 atlas jobs went into a funny state in the batch system, sitting in "hold". I eventually figured out that this was an abnormal condition caused by the jobs not starting correctly.

Interestingly, the fact that 20 atlas jobs were waiting, even though this was an abnormal wait, caused the GIP plugin to report a very high ERT/WRT (estimated/worst response time) for atlas. So there was nothing for it but to cancel these jobs (qdel). As soon as this was done, the ERT returned to 0 and more atlas jobs arrived.
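For next time, here is a rough sketch of how the held jobs could be picked out of the queue listing and handed to qdel in one go. It assumes a Torque/PBS-style default `qstat` layout (job id in the first column, state letter in the fifth, `H` meaning held), which may not match every batch system. The function names and the sample listing are made up for illustration, and nothing here actually runs `qdel`; it only builds the command.

```python
# Hypothetical sample of a Torque/PBS-style `qstat` listing (invented data).
SAMPLE = """\
Job id                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
101.ce                    job_a            atlas001        0        H atlas
102.ce                    job_b            atlas002        00:10:00 R atlas
103.ce                    job_c            atlas003        0        H atlas
"""


def held_job_ids(qstat_text, state_column=4, held_state="H"):
    """Return the job ids whose state column equals held_state.

    Assumes whitespace-separated columns with the job id first and the
    state letter at index state_column (an assumption about the layout).
    """
    ids = []
    for line in qstat_text.splitlines():
        fields = line.split()
        # Skip blank or short lines; header and separator rows never
        # carry the held-state letter in the state column.
        if len(fields) <= state_column:
            continue
        if fields[state_column] == held_state:
            ids.append(fields[0])
    return ids


def qdel_command(job_ids):
    """Build (but do not run) a qdel command line for the given job ids."""
    return ["qdel"] + list(job_ids)


if __name__ == "__main__":
    held = held_job_ids(SAMPLE)
    print(" ".join(qdel_command(held)))
```

In practice the text would come from the real `qstat` output rather than `SAMPLE`, and it is worth eyeballing the list before deleting anything.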

Didn't manage to get to the bottom of why they were failing to start though!
