Monday, February 25, 2008

Dem info system blues

I fixed a problem on the CE information system tonight. YAIM had gone a little screwy and incorrectly written the lcg-info-dynamic-scheduler.conf file, so I had added the lrms_backend_cmd parameter myself as:

lrms_backend_cmd: /opt/lcg/libexec/lrmsinfo-pbs -h

Adding the host seemed sensible, as the CE and the batch system don't run on the same node, right? Wrong! The host parameter ends up being passed down to "qstat -f HOST", which is a broken command. We ended up with zeros everywhere for queued and running jobs and, consequently, a large stack of biomed jobs we are unlikely ever to run.
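For anyone wanting to catch this sort of thing before the information system starts publishing zeros, here is a rough sanity check in Python. It is only a sketch: it assumes the conf file uses simple "key: value" lines like the one above, and the function names backend_cmd and passes_host are made up for illustration, not part of any LCG tool.

```python
import shlex


def backend_cmd(conf_text):
    """Return the lrms_backend_cmd value from the config text, or None.

    Assumes plain "key: value" lines; anything without a colon
    (for example a section header) is skipped.
    """
    for line in conf_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "lrms_backend_cmd":
            return value.strip()
    return None


def passes_host(cmd):
    """Return True if the command's arguments include a -h option."""
    return "-h" in shlex.split(cmd)[1:]


# Sample text mirroring the line quoted in this post.
sample = "lrms_backend_cmd: /opt/lcg/libexec/lrmsinfo-pbs -h\n"
cmd = backend_cmd(sample)
if cmd is not None and passes_host(cmd):
    print("warning: lrms_backend_cmd passes a host option:", cmd)
```

In practice you would read the real lcg-info-dynamic-scheduler.conf into conf_text rather than using the sample string.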

I raised the obligatory GGUS ticket:

