Wednesday, April 08, 2009

cream broken pipes

I was just updating our pre-production cream set-up to test it against a newly installed pre-production torque instance when it stopped submitting any jobs.

So, if you ever see the following when submitting a job to cream...

2009-04-07 16:11:31,265 FATAL - MethodName=[jobRegister] Timestamp=[Tue 07 Apr 2009 16:11:31]
ErrorCode=[0] Description=[system error] FaultCause=[cannot write the job wrapper (jobId = CREAM600033116)!
The problem seems to be related to glexec which reported: Broken pipe]

It looked as though re-running yaim on the node had reconfigured something incorrectly. Checking /var/log/messages, it actually looked like glexec could no longer write to its log file.

dev011:/var/log# tail -f messages
Apr 7 16:40:55 dev011 glexec[11697]: Error in LCAS/LCMAPS, rc = 107
Apr 7 16:40:55 dev011 glexec[11697]: LCAS failed, see '/var/log/glite/glexec_lcas_lcmaps.log' for more info.
Apr 7 16:43:33 dev011 glexec[12065]: glexec pid: 12065
Apr 7 16:43:33 dev011 glexec[12065]: lcas_log_open(): Cannot open logfile /var/log/glite/glexec_lcas_lcmaps.log
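If you have several nodes or a long log to wade through, a few lines of script can pull the offending path out of /var/log/messages for you. This is a minimal sketch in Python, assuming the syslog lines look exactly like the excerpt above; `unopenable_glexec_logs` is a hypothetical helper, not something shipped with glexec or LCAS.

```python
import re
from typing import Iterable, List

# Matches the lcas_log_open() error as it appears in the syslog excerpt;
# other glexec/LCAS versions may word this message differently (assumption).
LOG_OPEN_ERROR = re.compile(r"lcas_log_open\(\): Cannot open logfile (\S+)")


def unopenable_glexec_logs(lines: Iterable[str]) -> List[str]:
    """Return each log path glexec reported it could not open, once, in order seen."""
    paths: List[str] = []
    for line in lines:
        match = LOG_OPEN_ERROR.search(line)
        if match and match.group(1) not in paths:
            paths.append(match.group(1))
    return paths


# Two lines copied from the excerpt; on a real node you would pass an open
# file handle for /var/log/messages instead.
sample = [
    "Apr 7 16:43:33 dev011 glexec[12065]: glexec pid: 12065",
    "Apr 7 16:43:33 dev011 glexec[12065]: lcas_log_open(): Cannot open logfile "
    "/var/log/glite/glexec_lcas_lcmaps.log",
]
print(unopenable_glexec_logs(sample))  # ['/var/log/glite/glexec_lcas_lcmaps.log']
```

Because a file handle is just an iterable of lines, the same function works unchanged on `open("/var/log/messages")`, provided you have read access to it.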

It appears that cream sets a default glexec log location in glexec.conf, either /opt/glite/var/log/glexec_lcas_lcmaps.log or /var/log/glite/glexec_lcas_lcmaps.log. This must have changed!

The directory holding this log file must exist, or else cream will not start! Something to remember in future!
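A quick pre-flight check after re-running yaim could catch this before any jobs fail. The sketch below, in Python, simply tests whether the directory for each of the two default log paths mentioned above exists and is writable, optionally creating it. It is a hypothetical helper (`check_log_dirs` is not part of cream or yaim) and it does not read glexec.conf, so you would need to confirm which path your node actually uses. Note that `os.access` checks the permissions of whoever runs the script, which may not be the account glexec logs as, and that ownership of a newly created directory is left for you to fix.

```python
import os
from pathlib import Path
from typing import Dict, Iterable

# The two default locations mentioned in the post; which one applies depends
# on the glexec.conf that yaim wrote on your node.
CANDIDATE_LOGS = [
    "/opt/glite/var/log/glexec_lcas_lcmaps.log",
    "/var/log/glite/glexec_lcas_lcmaps.log",
]


def check_log_dirs(paths: Iterable[str], create: bool = False) -> Dict[str, str]:
    """Report whether the directory of each log path exists and is writable.

    With create=True a missing directory is created with default permissions
    (subject to umask); ownership is NOT adjusted.
    """
    status: Dict[str, str] = {}
    for path in paths:
        directory = Path(path).parent
        if not directory.is_dir():
            if create:
                directory.mkdir(parents=True, exist_ok=True)
                status[path] = "created"
            else:
                status[path] = "missing"
        elif os.access(directory, os.W_OK):
            # Writable for the user running this script, not necessarily for glexec.
            status[path] = "ok"
        else:
            status[path] = "not writable"
    return status


# Read-only report: nothing is created unless create=True is passed.
for log, state in check_log_dirs(CANDIDATE_LOGS).items():
    print(f"{log}: {state}")
```

Leaving `create` off by default keeps the check harmless to run on a production node; you only create directories once you know which path glexec.conf actually points at.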

Massimo

Actually, no: the location of the glexec log file (specified in the glexec conf file by yaim-cream-ce) didn't change.