Wednesday, April 08, 2009

CREAM broken pipes

I was updating our pre-production CREAM set-up to test it against a newly installed pre-production Torque instance, and it stopped submitting jobs altogether.

So, if you ever see the following when submitting a job to CREAM, read on:

2009-04-07 16:11:31,265 FATAL - MethodName=[jobRegister] Timestamp=[Tue 07 Apr 2009 16:11:31]
ErrorCode=[0] Description=[system error] FaultCause=[cannot write the job wrapper (jobId = CREAM600033116)!
The problem seems to be related to glexec which reported: Broken pipe]

It looked as though re-running yaim on the node had misconfigured something. Checking /var/log/messages, it turned out that glexec could no longer write to its log file:

dev011:/var/log# tail -f messages
Apr 7 16:40:55 dev011 glexec[11697]: Error in LCAS/LCMAPS, rc = 107
Apr 7 16:40:55 dev011 glexec[11697]: LCAS failed, see '/var/log/glite/glexec_lcas_lcmaps.log' for more info.
Apr 7 16:43:33 dev011 glexec[12065]: glexec pid: 12065
Apr 7 16:43:33 dev011 glexec[12065]: lcas_log_open(): Cannot open logfile /var/log/glite/glexec_lcas_lcmaps.log

It appears that the glexec.conf used by CREAM sets a default glexec log location, either /opt/glite/var/log/glexec_lcas_lcmaps.log or /var/log/glite/glexec_lcas_lcmaps.log. This must have changed!

This directory must exist, or else CREAM will not start! Something to remember in future!
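To catch this before it bites, a small check can read the log path(s) out of glexec.conf and make sure their directories exist. Below is a minimal POSIX shell sketch, not anything shipped with CREAM or yaim: the function names `glexec_log_paths` and `ensure_log_dir` are hypothetical, and it assumes glexec.conf uses simple `key = value` lines (check your own file, whose location may differ between installs). Run it as root. The `-w` test only checks writability for the user running the script, and the sketch does not set any particular owner or mode on the directory it creates.

```shell
#!/bin/sh
# Sketch: verify that the directory for every glexec log file exists.
# Assumption: glexec.conf uses simple "key = value" lines.

# Print each config value that ends in ".log", skipping comment lines.
glexec_log_paths() {
    awk -F'=' '
        /^[ \t]*#/ { next }                   # skip comment lines
        NF >= 2 {
            val = $2
            gsub(/^[ \t]+|[ \t]+$/, "", val)  # trim surrounding whitespace
            if (val ~ /\.log$/) print val
        }
    ' "$1"
}

# Create the parent directory of a log file if it is missing,
# then confirm the current user can write to it.
ensure_log_dir() {
    dir=$(dirname "$1")
    if [ ! -d "$dir" ]; then
        echo "creating missing log directory: $dir" >&2
        mkdir -p "$dir" || return 1
    fi
    if [ ! -w "$dir" ]; then
        echo "log directory is not writable: $dir" >&2
        return 1
    fi
    return 0
}
```

For example, `for f in $(glexec_log_paths /path/to/glexec.conf); do ensure_log_dir "$f"; done`, with the placeholder path replaced by wherever yaim wrote glexec.conf on your node.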

1 comment:

Massimo said...

Actually, no.
The location of the glexec log file (specified in the glexec conf file by yaim-cream-ce) didn't change.