Friday, August 29, 2008

nanocmos + lcas = FAIL

While working on an unrelated issue on svr021, I noticed an edg-mkgridmap error in the logfile:

Aug 29 05:28:14 svr021 edg-mkgridmap[6693]: voms search(https://svr029.gla.scotgrid.ac.uk:8443/voms/vo.scotgrid.ac.uk/services/VOMSCompatibility?method=getGridmapUsers): Internal Server Error

I mentioned it to Mike, who promptly went and fixed the issue, only for us to discover 30 minutes later that we were failing SAM tests. The LCAS VOMS plugin had once again gone fubar and caused globus-gatekeeper to segfault:

Aug 29 12:14:41 svr021 GRAM gatekeeper[662]: Authenticated globus user: [DN REMOVED]
Aug 29 12:14:41 svr021 GRAM gatekeeper[663]: Authenticated globus user: [DN REMOVED]
Aug 29 12:14:41 svr021 kernel: globus-gatekeep[662]: segfault at 0000000000000046 rip 0000000000b86259 rsp 00000000ffff9d98 error 4
Aug 29 12:14:41 svr021 kernel: globus-gatekeep[663]: segfault at 0000000000000046 rip 0000000000b86259 rsp 00000000ffff9d98 error 4
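
To spot this pattern quickly next time, here is a minimal Python sketch that pulls the kernel segfault entries out of syslog-style lines. The regular expression is based only on the lines quoted above; the function name find_segfaults and the sample list are made up for illustration, and other kernels may format these messages differently.

```python
import re

# Matches kernel segfault entries in the format seen in the excerpt above.
SEGFAULT_RE = re.compile(
    r"kernel: (?P<proc>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"rip (?P<rip>[0-9a-f]+)"
)


def find_segfaults(lines):
    """Return (process, pid, fault address, rip) for each segfault entry."""
    hits = []
    for line in lines:
        # finditer also copes with two entries that ended up on one line
        for m in SEGFAULT_RE.finditer(line):
            hits.append((m["proc"], int(m["pid"]), m["addr"], m["rip"]))
    return hits


# Sample input copied from the log excerpt above.
sample = [
    "Aug 29 12:14:41 svr021 GRAM gatekeeper[662]: Authenticated globus user: [DN REMOVED]",
    "Aug 29 12:14:41 svr021 kernel: globus-gatekeep[662]: segfault at 0000000000000046 "
    "rip 0000000000b86259 rsp 00000000ffff9d98 error 4",
    "Aug 29 12:14:41 svr021 kernel: globus-gatekeep[663]: segfault at 0000000000000046 "
    "rip 0000000000b86259 rsp 00000000ffff9d98 error 4",
]

for proc, pid, addr, rip in find_segfaults(sample):
    print(proc, pid, addr, rip)
```

Both crashes share the same fault address and instruction pointer, which suggests the two gatekeeper processes died at the same spot in the code.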


The globus-gatekeeper log has a bit more info:
TIME: Fri Aug 29 12:14:41 2008
PID: 663 -- Notice: 5: Authenticated globus user: [DN REMOVED]
lcas client name: [DN REMOVED]
LCAS 0:
LCAS 1: Initialization LCAS version 1.3.7
allowing empty credentials
LCAS 2: LCAS authorization request
LCAS 0: lcas_userban.mod-plugin_confirm_authorization(): checking banned users in /opt/glite/etc/lcas/ban_users.db
LCAS 0: lcas_plugin_voms-plugin_confirm_authorization_from_x509(): Did not find a matching VO entry in the authorization file
LCAS 0: 2008-08-29.12:14:41 : lcas_plugin_voms-plugin_confirm_authorization_from_x509(): voms plugin failed
LCAS 0: lcas.mod-lcas_run_va(): authorization failed for plugin /opt/glite/lib/modules/lcas_voms.mod
LCAS 0: lcas.mod-lcas_run_va(): failed
LCAS 0: lcas_plugin_voms-plugin_confirm_authorization_from_x509(): Did not find a matching VO entry in the authorization file
LCAS 0: 2008-08-29.12:14:41 : lcas_plugin_voms-plugin_confirm_authorization_from_x509(): voms plugin failed
LCAS 0: lcas.mod-lcas_run_va(): authorization failed for plugin /opt/glite/lib/modules/lcas_voms.mod
LCAS 0: lcas.mod-lcas_run_va(): failed
JMA 2008/08/29 12:14:45 GATEKEEPER_JM_ID 2008-08-29.11:14:39.0000014519.0000000000 JM exiting

As before, commenting out the lcas_voms.mod in /opt/glite/etc/lcas/lcas.db allows it to work, at the expense of losing VOMS roles.
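
For reference, the workaround can be scripted. This is only a sketch: it assumes that lcas.db treats lines starting with # as comments and that the plugin is named on a single line. The helper disable_plugin and the sample entries are hypothetical and not copied from svr021.

```python
def disable_plugin(db_text, plugin="lcas_voms.mod"):
    """Comment out every active line that mentions the given plugin.

    Assumes '#' starts a comment in lcas.db (an assumption, check your install).
    """
    out = []
    for line in db_text.splitlines():
        if plugin in line and not line.lstrip().startswith("#"):
            line = "#" + line
        out.append(line)
    return "\n".join(out)


# Illustrative content only; real entries carry plugin arguments as well.
sample_db = "pluginname=lcas_userban.mod\npluginname=lcas_voms.mod"
print(disable_plugin(sample_db))
```

Running it on a copy of the file first, and keeping a backup of /opt/glite/etc/lcas/lcas.db, makes it easy to re-enable the plugin once the segfault is understood.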

For the moment we've got it working with the VOMS plugin (lcas_voms.mod) enabled by altering the ACLs on the VOMS server (svr029) for nanocmos. Now to try and debug the LCAS plugin failure.
