Andrew did great work getting all the nodes back online, dealing with the quirks of cfengine and reconfiguring everything.
Unfortunately we failed 2 SAM tests, with the infamous "Cannot read JobWrapper output, both from Condor and from Maradona". I checked the torque logs and both of these tests ran on node016, so it looked like this was the bad apple.
When I checked, it was clear that yaim had not run, so the PATH was bad (perhaps this was before Andrew fixed cfengine?).
Quick spin with cfengine and "-Drunyaim" and the node was good again.
The existence of links from /etc/profile.d/grid-env.{csh,sh} is an excellent proxy for YAIM having run correctly, so we should implement this as a cfengine test.
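As a rough idea of what such a test would look for, here is a minimal Python sketch of the check itself. It is not a cfengine rule. The function name missing_grid_env_links is made up for illustration, and it assumes the two files are symlinks whose targets must resolve, which is how I read "links" above. Wiring the result into cfengine (for example to trigger the runyaim class) is left out.

```python
import os

# File names taken from the post; everything else here is illustrative.
GRID_ENV_NAMES = ("grid-env.sh", "grid-env.csh")


def missing_grid_env_links(profile_dir="/etc/profile.d"):
    """Return the expected grid-env links that are absent or dangling.

    An empty list is taken to mean YAIM has probably run on this node.
    """
    missing = []
    for name in GRID_ENV_NAMES:
        path = os.path.join(profile_dir, name)
        # islink() checks the entry is a symlink; exists() follows the
        # link, so it is False when the target has gone away.
        if not (os.path.islink(path) and os.path.exists(path)):
            missing.append(path)
    return missing
```

On a worker node, an empty list from missing_grid_env_links() would suggest YAIM has run; a non-empty list names the links to investigate before the node starts eating jobs.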