Friday, March 23, 2007

Jobmanager Tweaks at Glasgow

I finally got around to applying the Cal Loomis patch to the gatekeeper, which helps it catch jobs in the Torque Completed state. Rather than patching lcgpbs.in and then reconfiguring Globus, I patched the final Perl module, /opt/globus/lib/perl/Globus/GRAM/JobManager/lcgpbs.pm, and then added this to our cfengine configuration.

I also patched the pbs jobmanager in the same way.
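To illustrate why the Completed state matters: Torque typically keeps a finished job visible in qstat with job_state C for a while, so a jobmanager that only treats "job gone" as finished can miss it. The Python sketch below is a hypothetical illustration of that idea, not the Cal Loomis patch itself (which changes the Perl modules); the state mapping and the function name job_state_from_qstat are assumptions chosen for the example.

```python
# Hypothetical illustration only: map the Torque job_state letter from
# `qstat -f` output to a coarse GRAM-style state. Not the actual patch.
import re
from typing import Optional

# One plausible mapping of Torque state letters (assumption for this sketch).
TORQUE_TO_GRAM = {
    "Q": "PENDING",    # queued
    "H": "PENDING",    # held
    "W": "PENDING",    # waiting for its execution time
    "T": "PENDING",    # being moved
    "R": "ACTIVE",     # running
    "E": "ACTIVE",     # exiting after having run
    "S": "SUSPENDED",  # suspended
    "C": "DONE",       # completed: the case a poller must not miss
}


def job_state_from_qstat(qstat_f_output: str) -> Optional[str]:
    """Return a GRAM-style state for one job's `qstat -f` output.

    Returns None if no job_state line is present (for example, the job has
    already been purged), which a caller would also treat as finished.
    """
    match = re.search(r"^\s*job_state\s*=\s*(\w)", qstat_f_output, re.MULTILINE)
    if match is None:
        return None
    return TORQUE_TO_GRAM.get(match.group(1), "UNKNOWN")
```

With this mapping, a job reported as `job_state = C` is classed as DONE rather than being left waiting until Torque purges it.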

I have also now enabled the pbs jobmanager in /etc/globus.conf. To do this, change the jobmanagers line to read:

jobmanagers="fork lcgpbs pbs"

and then add:

[gatekeeper/pbs]
type=pbs
job_manager=globus-job-manager

before restarting the gatekeeper.
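Since we push changes like this out through cfengine, it can help to make the edit repeatable. The Python sketch below applies the same two changes to the text of globus.conf and is safe to run more than once. It assumes the file contains a jobmanagers="..." line at the start of a line and one [gatekeeper/<name>] section per jobmanager, as in the fragments above; the function name enable_pbs_jobmanager is hypothetical.

```python
# Hypothetical helper, assuming globus.conf has a jobmanagers="..." line and
# [gatekeeper/<name>] sections like the fragments shown above.
import re

PBS_SECTION = "[gatekeeper/pbs]\ntype=pbs\njob_manager=globus-job-manager\n"


def enable_pbs_jobmanager(conf: str) -> str:
    """Return conf with 'pbs' in jobmanagers and a [gatekeeper/pbs] section."""

    def add_pbs(match: re.Match) -> str:
        names = match.group(1).split()
        if "pbs" not in names:
            names.append("pbs")  # keep existing jobmanagers, add pbs last
        return 'jobmanagers="' + " ".join(names) + '"'

    conf = re.sub(r'^jobmanagers="([^"]*)"', add_pbs, conf, flags=re.MULTILINE)

    # Append the pbs section only if it is not already there (idempotent).
    if "[gatekeeper/pbs]" not in conf:
        if conf and not conf.endswith("\n"):
            conf += "\n"
        conf += PBS_SECTION
    return conf
```

In use, you would read /etc/globus.conf, pass its contents through the function, write the result back, and then restart the gatekeeper as above.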

At first I thought it was broken, as I was getting GASS cache errors, but this turned out to be because the pbs jobmanager cannot cope with non-shared home directories (the lcgpbs one can). We don't have shared home directories for EGEE VO users; however, we do use shared homes for NGS and local users, so David (who is in NGS) reports that it works for him.

In addition, we now pass 14 of the 16 NGS GITS tests. The two we fail are gsissh and gsiscp, because GITS naively assumes these run on the gatekeeper host. For us they do not, as we maintain a separate GSI login host for NGS and local users on svr020.

Good progress, though.
