I fixed a problem on the CE information system tonight. YAIM had gone a little screwy and incorrectly written the lcg-info-dynamic-scheduler.conf file, so I had added the lrms_backend_cmd parameter myself as:
lrms_backend_cmd: /opt/lcg/libexec/lrmsinfo-pbs -h svr016.gla.scotgrid.ac.uk
Adding the host seemed sensible, as the CE and the batch system don't run on the same node, right? Wrong! The host parameter ends up being passed down to "qstat -f HOST", which is a broken command. We ended up with zeros everywhere for queued and running jobs and, consequently, a large stack of biomed jobs we are unlikely ever to run.
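For anyone wanting to check their own CE for the same mistake, here is a minimal sketch in Python. It assumes the conf file uses simple "key: value" lines like the one quoted above; the function name backend_cmd_has_host_flag is made up for illustration and is not part of any LCG tool. It only detects the -h argument and does nothing to correct the file.

```python
import shlex


def backend_cmd_has_host_flag(conf_text):
    """Return True if the lrms_backend_cmd line passes a -h argument.

    Assumes "key: value" lines, as in the single line quoted in this post;
    the real file format may be richer than that.
    """
    for line in conf_text.splitlines():
        # Split on the first colon only, so colons in the value survive.
        key, sep, value = line.partition(":")
        if sep and key.strip() == "lrms_backend_cmd":
            # Tokenise the command the way a shell would and look for -h.
            return "-h" in shlex.split(value)
    # No lrms_backend_cmd line found at all.
    return False


if __name__ == "__main__":
    sample = (
        "lrms_backend_cmd: /opt/lcg/libexec/lrmsinfo-pbs "
        "-h svr016.gla.scotgrid.ac.uk"
    )
    print(backend_cmd_has_host_flag(sample))
```

To check a live node, read the contents of lcg-info-dynamic-scheduler.conf into a string and pass it to the function.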
I raised the obligatory GGUS ticket: https://gus.fzk.de/pages/ticket_details.php?ticket=33313