Tuesday, January 29, 2008

OPS goes dark...

After I enabled other VOs yesterday, DPM broke for ops. The error messages were as cryptic as ever:

httpg://svr018.gla.scotgrid.ac.uk:8443/srm/managerv1: Unknown error

And in the SRM logs:

01/29 12:29:02 14830,3 srmv1: SRM02 - soap_serve error : Can't get req uniqueid
01/29 12:05:18 14830,0 srmv1: SRM02 - soap_serve error : CGSI-gSOAP: Could not find mapping for: /DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=samoper/CN=582979/CN=Judit Novak

The error seemed to coincide with re-running config_mkgridmap yesterday. However, as Judit (and other ops people) were in the grid-mapfile, I was very confused.

Eventually, staring at lcgdm-mkgridmap.conf, I realised that the ops VO was configured to get VOMS information only from the deprecated lcg-voms.cern.ch server. I reconfigured it to get the information from voms.cern.ch and it started to work.
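
A check like this could be scripted rather than done by eye. The following is only a rough sketch in Python: it assumes the group lines in lcgdm-mkgridmap.conf look like "group vomss://<host>:<port>/voms/<vo>?/<vo> <account>", and the function names and the DEPRECATED_HOSTS set are made up for illustration, not part of any DPM tool.

```python
import re
from collections import defaultdict

# Hypothetical set of VOMS hosts a site considers deprecated; adjust as needed.
DEPRECATED_HOSTS = {"lcg-voms.cern.ch"}

# Assumed line shape (not verified against every mkgridmap version):
#   group vomss://<host>:<port>/voms/<vo>[?/<group>] [<account>]
# The pattern is anchored at the start of the line, so commented-out
# entries ("# group ...") are ignored.
GROUP_LINE = re.compile(
    r"^\s*group\s+vomss?://([^:/\s]+)(?::\d+)?/voms/([^?/\s]+)",
    re.IGNORECASE,
)


def voms_hosts_by_vo(conf_text):
    """Map each VO name to the set of VOMS hosts configured for it."""
    hosts = defaultdict(set)
    for line in conf_text.splitlines():
        match = GROUP_LINE.match(line)
        if match:
            host, vo = match.group(1).lower(), match.group(2)
            hosts[vo].add(host)
    return dict(hosts)


def vos_on_deprecated_only(conf_text, deprecated=DEPRECATED_HOSTS):
    """Return the VOs whose every configured VOMS host is deprecated."""
    deprecated = set(deprecated)
    return sorted(
        vo
        for vo, hosts in voms_hosts_by_vo(conf_text).items()
        if hosts <= deprecated
    )
```

Feeding it the contents of the conf file, for example vos_on_deprecated_only(open(path).read()) with path pointing at wherever the file lives on the DPM head node, would list any VO that, like ops here, has no non-deprecated server behind it.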

The thing I cannot fathom is how it kept working for so long - we have always had lcg-voms.cern.ch as the server for ops.

I updated the ops entry on the GridPP Wiki.

As usual, the things which made this much harder than it should have been were:

1. It only affected the ops VO (not dteam, atlas or pheno which we can test).

2. The error message was, as usual, cryptic and unhelpful.

