Showing posts with label grid-mapfile. Show all posts

Friday, August 29, 2008

nanocmos + lcas = FAIL

While working on an unrelated issue on svr021, I noticed an edg-mkgridmap error in the logfile:

Aug 29 05:28:14 svr021 edg-mkgridmap[6693]: voms search(https://svr029.gla.scotgrid.ac.uk:8443/voms/vo.scotgrid.ac.uk/services/VOMSCompatibility?method=getGridmapUsers): Internal Server Error

I mentioned it to Mike, who promptly went and fixed the issue, only for us to discover 30 minutes later that we were failing SAM tests. The LCAS VOMS plugin had once again gone fubar and caused globus-gatekeeper to segfault:

Aug 29 12:14:41 svr021 GRAM gatekeeper[662]: Authenticated globus user: [DN REMOVED]
Aug 29 12:14:41 svr021 GRAM gatekeeper[663]: Authenticated globus user: [DN REMOVED]
Aug 29 12:14:41 svr021 kernel: globus-gatekeep[662]: segfault at 0000000000000046 rip 0000000000b86259 rsp 00000000ffff9d98 error 4
Aug 29 12:14:41 svr021 kernel: globus-gatekeep[663]: segfault at 0000000000000046 rip 0000000000b86259 rsp 00000000ffff9d98 error 4
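
Segfaults like these are easy to miss when they are buried in syslog. As a rough sketch (not something we run in production), a few lines of Python can pull them out. The function name `find_segfaults` and the regular expression are illustrative assumptions, based only on the log format shown above:

```python
import re

# Pattern for kernel segfault lines in the syslog format shown above.
# It is deliberately not anchored to the start of the line, so it still
# matches if two log records have been run together on one line.
SEGFAULT_RE = re.compile(
    r"(?P<stamp>\w{3}\s+\d+\s+[\d:]{8})\s+(?P<host>\S+)\s+kernel:\s+"
    r"(?P<proc>[^\[\s]+)\[(?P<pid>\d+)\]:\s+segfault at (?P<addr>[0-9a-f]+)"
)


def find_segfaults(lines, process_prefix="globus-gatekeep"):
    """Return (timestamp, host, pid, address) for each matching segfault.

    process_prefix is compared against the (truncated) process name that
    the kernel reports, e.g. "globus-gatekeep".
    """
    hits = []
    for line in lines:
        match = SEGFAULT_RE.search(line)
        if match and match.group("proc").startswith(process_prefix):
            hits.append(
                (
                    match.group("stamp"),
                    match.group("host"),
                    int(match.group("pid")),
                    match.group("addr"),
                )
            )
    return hits
```

Feeding it the lines of the system log (for example, `find_segfaults(open(path))` with whatever path your syslog writes to) gives a list that could drive an alert.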


The globus gatekeeper log has a bit more info:
TIME: Fri Aug 29 12:14:41 2008
PID: 663 -- Notice: 5: Authenticated globus user: [DN REMOVED]
lcas client name: [DN REMOVED]
LCAS 0:
LCAS 1: Initialization LCAS version 1.3.7
allowing empty credentials
LCAS 2: LCAS authorization request
LCAS 0: lcas_userban.mod-plugin_confirm_authorization(): checking banned users in /opt/glite/etc/lcas/ban_users.db
LCAS 0: lcas_plugin_voms-plugin_confirm_authorization_from_x509(): Did not find a matching VO entry in the authorization file
LCAS 0: 2008-08-29.12:14:41 : lcas_plugin_voms-plugin_confirm_authorization_from_x509(): voms plugin failed
LCAS 0: lcas.mod-lcas_run_va(): authorization failed for plugin /opt/glite/lib/modules/lcas_voms.mod
LCAS 0: lcas.mod-lcas_run_va(): failed
LCAS 0: lcas_plugin_voms-plugin_confirm_authorization_from_x509(): Did not find a matching VO entry in the authorization file
LCAS 0: 2008-08-29.12:14:41 : lcas_plugin_voms-plugin_confirm_authorization_from_x509(): voms plugin failed
LCAS 0: lcas.mod-lcas_run_va(): authorization failed for plugin /opt/glite/lib/modules/lcas_voms.mod
LCAS 0: lcas.mod-lcas_run_va(): failed
JMA 2008/08/29 12:14:45 GATEKEEPER_JM_ID 2008-08-29.11:14:39.0000014519.0000000000 JM exiting

As before, commenting out the lcas_voms.mod in /opt/glite/etc/lcas/lcas.db allows it to work, at the expense of losing VOMS roles.

We've got it working using the voms_mod at the moment by altering the ACLs on the VOMS server (svr029) for nanocmos. Now to try to debug the LCAS plugin failure.

Thursday, January 31, 2008

perl-TermReadKey - missing in action

We had a ticket from a Zeus user who was unable to get a file off our DPM, while her Zeus colleagues could. I spent a long time checking the pool accounts, which were all fine, and checking the Zeus VOMS setup, which was also fine.

Finally, I looked in the logs for the grid-mapfile, where the culprit lay:

"Can't locate Term/ReadKey.pm in @INC..."

On two of the servers, disk034 and disk036, the perl-TermReadKey RPM was missing, and the grid-mapfiles had not been rebuilt for a very long time. Judging by the backups, they were last remade in October!

OK, nagios check: age of grid-mapfile and lcgdm-mapfile!
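
A minimal sketch of what such a check might look like, written as a Nagios-style plugin in Python. The file paths in the usage note, the 12 and 24 hour thresholds, and the function names are all assumptions for illustration, not our deployed configuration; only the exit codes (0 OK, 1 WARNING, 2 CRITICAL) follow the standard Nagios plugin convention.

```python
import os
import time

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL"}


def check_age(path, warn_hours, crit_hours, now=None):
    """Return (status, message) based on how long ago path was modified."""
    now = time.time() if now is None else now
    try:
        mtime = os.stat(path).st_mtime
    except OSError as exc:
        # A missing or unreadable mapfile is treated as critical.
        return CRITICAL, f"{path}: {exc.strerror}"
    age_hours = (now - mtime) / 3600.0
    if age_hours >= crit_hours:
        status = CRITICAL
    elif age_hours >= warn_hours:
        status = WARNING
    else:
        status = OK
    return status, f"{path} is {age_hours:.1f}h old"


def main(paths, warn_hours=12, crit_hours=24):
    """Check every path, print one Nagios status line, return the worst status."""
    results = [check_age(p, warn_hours, crit_hours) for p in paths]
    worst = max(status for status, _ in results)
    print(f"MAPFILE {LABELS[worst]} - " + "; ".join(msg for _, msg in results))
    return worst
```

To use it as a plugin, call something like `sys.exit(main(["/etc/grid-security/grid-mapfile", "/opt/lcg/etc/lcgdm-mapfile"]))` from a small wrapper script, adjusting the paths to wherever the mapfiles live on your servers. A check like this would have flagged disk034 and disk036 within a day, rather than months later.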