Wednesday, August 12, 2009

Getting ngs.ac.uk VOMS to work

I have been looking into an issue with the NGS, who are testing submission to the WMS. A ticket was raised because authentication failed on both of our production CEs.

I reproduced this by creating an ngs.ac.uk VOMS proxy:

-bash-3.00$ voms-proxy-init -voms ngs.ac.uk --valid 240:00
Cannot find file or dir: /clusterhome/home/gla057/.glite/vomses
Enter GRID pass phrase:
Your identity: /C=UK/O=eScience/OU=Glasgow/L=Compserv/CN=douglas mcnab
Creating temporary proxy ................................................................ Done
Contacting voms.ngs.ac.uk:15010 [/C=UK/O=eScience/OU=Manchester/L=MC/CN=voms.ngs.ac.uk/Email=support@grid-support.ac.uk] "ngs.ac.uk" Done

Warning: voms.ngs.ac.uk:15010: The validity of this VOMS AC in your proxy is shortened to 86400 seconds!

Creating proxy ............................................................................ Done
Your proxy is valid until Thu Aug 20 15:34:41 2009
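
The "Contacting" line above prints the DN of the VOMS server's certificate, and that DN is what the first line of an .lsc file is meant to match. Below is a minimal sketch of a helper that pulls the DN out of saved voms-proxy-init output. The function name voms_server_dn is hypothetical, and the helper assumes the "Contacting host:port [DN] "vo" Done" layout shown here.

```shell
#!/bin/bash
# Hypothetical helper: print the VOMS server DN from saved voms-proxy-init output.
# Assumes the line format: Contacting host:port [DN] "vo" Done
voms_server_dn() {
  # Read from stdin and, on "Contacting" lines only, print the text
  # between the "[" after host:port and the next "]".
  sed -n 's/^Contacting [^ ]* \[\([^]]*\)].*/\1/p'
}
```

For example, if the terminal output has been saved to a file such as proxy-init.log (a hypothetical name), `voms_server_dn < proxy-init.log` prints the DN so it can be compared with the .lsc file character for character.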


Then I tried a direct submission with globus-job-run:

-bash-3.00$ globus-job-run svr021.gla.scotgrid.ac.uk:2119/jobmanager-lcgpbs "/bin/hostname -f"
GRAM Job submission failed because authentication with the remote server failed (error code 7)
-bash-3.00$ globus-job-run svr026.gla.scotgrid.ac.uk:2119/jobmanager-lcgpbs "/bin/hostname -f"
GRAM Job submission failed because data transfer to the server failed (error code 10)


After much investigation, the long and short of it is that the problem persisted even with the correct entries in the groupmapfile and grid-mapfile. So I checked the VO certificate in /etc/grid-security/vomsdir. This was fine, but there was also an /etc/grid-security/vomsdir/ngs.ac.uk/voms.ngs.ac.uk.lsc file, which may have been used in preference to the VO certificate. To test this, I removed /etc/grid-security/vomsdir/ngs.ac.uk/voms.ngs.ac.uk.lsc.
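
Removing the file outright works, but a slightly more cautious variant is to rename it so it can be put back once the underlying problem is understood. The sketch below assumes, without checking the VOMS source, that only a file named exactly <host>.lsc is consulted, so a ".disabled" suffix takes it out of play. The function names disable_lsc and restore_lsc are hypothetical.

```shell
#!/bin/bash
# Hypothetical helpers: move a VOMS .lsc file aside and back again.
# Usage: disable_lsc VOMSDIR VO HOST   (e.g. /etc/grid-security/vomsdir ngs.ac.uk voms.ngs.ac.uk)
# Assumption: only VOMSDIR/VO/HOST.lsc (exact name) is read during validation.
disable_lsc() {
  local lsc="$1/$2/$3.lsc"
  if [ ! -f "$lsc" ]; then
    echo "no such file: $lsc" >&2
    return 1
  fi
  # Refuse to clobber an earlier disabled copy.
  if [ -e "$lsc.disabled" ]; then
    echo "already disabled: $lsc.disabled exists" >&2
    return 1
  fi
  mv "$lsc" "$lsc.disabled"
}

restore_lsc() {
  local lsc="$1/$2/$3.lsc"
  if [ ! -f "$lsc.disabled" ]; then
    echo "nothing to restore for $lsc" >&2
    return 1
  fi
  mv "$lsc.disabled" "$lsc"
}
```

On a CE this would be run as root, for example `disable_lsc /etc/grid-security/vomsdir ngs.ac.uk voms.ngs.ac.uk`, followed by a test submission with globus-job-run.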

Hey presto, submission worked:

-bash-3.00$ globus-job-run svr026.gla.scotgrid.ac.uk:2119/jobmanager-lcgpbs "/bin/hostname -f"
node295.beowulf.cluster
-bash-3.00$ globus-job-run svr021.gla.scotgrid.ac.uk:2119/jobmanager-lcgpbs "/bin/hostname -f"
node295.beowulf.cluster


So I think there may be an issue with the ngs.ac.uk VO and its .lsc file, even though the file looked correct:


svr026:/etc/grid-security/vomsdir/ngs.ac.uk# cat voms.ngs.ac.uk.lsc
/C=UK/O=eScience/OU=Manchester/L=MC/CN=voms.ngs.ac.uk/Email=support@grid-support.ac.uk
/C=UK/O=eScienceCA/OU=Authority/CN=CA
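
An .lsc file lists the VOMS server certificate's subject DN on the first line and the issuing CA's DN on the second. When a file "looks correct" but still fails, one possible cause is an invisible difference, such as a trailing space or a DOS line ending. This is offered as something to rule out, not as the confirmed cause here. Below is a minimal sketch of a checker. The name check_lsc is hypothetical. The expected subject can be taken from the "Contacting" line of voms-proxy-init, and the expected issuer from the CA.

```shell
#!/bin/bash
# Hypothetical helper: compare an .lsc file against the DNs you expect.
# Line 1 = VOMS server subject DN, line 2 = issuing CA DN.
# Usage: check_lsc FILE EXPECTED_SUBJECT EXPECTED_ISSUER
# Returns 0 if both lines match exactly and have no trailing whitespace/CR.
check_lsc() {
  local lsc="$1" want_subject="$2" want_issuer="$3"
  local subject issuer rc=0
  subject=$(sed -n '1p' "$lsc")
  issuer=$(sed -n '2p' "$lsc")
  # A trailing space, tab or carriage return is invisible with cat
  # but would break an exact string comparison.
  if printf '%s\n%s\n' "$subject" "$issuer" | grep -q $'[ \t\r]$'; then
    echo "WARNING: trailing whitespace or CR in $lsc"
    rc=1
  fi
  if [ "$subject" != "$want_subject" ]; then
    echo "subject mismatch: got '$subject'"
    rc=1
  fi
  if [ "$issuer" != "$want_issuer" ]; then
    echo "issuer mismatch: got '$issuer'"
    rc=1
  fi
  return $rc
}
```

Usage would be along the lines of `check_lsc /etc/grid-security/vomsdir/ngs.ac.uk/voms.ngs.ac.uk.lsc '/C=UK/O=eScience/OU=Manchester/L=MC/CN=voms.ngs.ac.uk/Email=support@grid-support.ac.uk' '/C=UK/O=eScienceCA/OU=Authority/CN=CA'`. Running `cat -A` on the file is a quicker manual way to spot the same invisible characters.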


This will become an issue on SL5, where VO certificates are deprecated in favour of .lsc files.
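
Once only .lsc files are used, it helps to be able to generate the expected lines from a copy of the VOMS server's certificate and cross-check them against a hand-written file. Below is a minimal sketch using openssl x509 with `-nameopt compat`, which asks OpenSSL for the old slash-separated DN format. The function name lsc_from_cert is hypothetical. OpenSSL prints the email attribute as emailAddress=, whereas the NGS .lsc file uses Email=. Whether VOMS treats the two spellings as equivalent is not established here, so compare the output rather than blindly overwriting the existing file.

```shell
#!/bin/bash
# Hypothetical helper: print .lsc-style lines for a PEM certificate:
# the subject DN on line 1, then the issuer DN on line 2.
# "-nameopt compat" selects OpenSSL's old /C=../O=../CN=.. DN format.
lsc_from_cert() {
  local cert="$1"
  # Older OpenSSL prints "subject= /C=...", newer prints "subject=/C=...";
  # the sed expressions strip either prefix.
  openssl x509 -in "$cert" -noout -subject -nameopt compat | sed 's/^subject= *//'
  openssl x509 -in "$cert" -noout -issuer -nameopt compat | sed 's/^issuer= *//'
}
```

For example, `lsc_from_cert voms.ngs.ac.uk.pem | diff - voms.ngs.ac.uk.lsc`, where voms.ngs.ac.uk.pem stands for a locally saved copy of the server certificate, shows any differences between the generated and installed lines.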
