Friday, September 05, 2008

take that cfengine

We've had a long-running problem with cfengine at Glasgow: 2.2.3 (the latest DAG package) didn't expand HostRange properly on the non-worker nodes (i.e. where we need it most: the disksvr, gridsvr and natbox groups). Today I spent far too long battling with both 2.2.8 and the latest svn release (don't go there; it's far too fussy about the exact version of aclocal you use), and neither of them worked properly.
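For anyone who hasn't used it: as we understand it, cfengine 2's HostRange(basename,start-end) defines a class that is true when the host's unqualified name is the basename followed by a number in the given range. Group definitions of the kind that were misbehaving look roughly like this. The basenames and ranges are hypothetical, not our real hostnames:

```
# cfagent.conf (cfengine 2): groups built from numbered hostnames.
# Basenames and ranges are illustrative only; substitute your own.
groups:
   disksvr = ( HostRange(disk,1-40) )
   gridsvr = ( HostRange(svr,1-30) )
   natbox  = ( HostRange(nat,1-2) )
```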

I finally got a miniature testcase configuration file to work, then got *really* confused when our live config worked as a testcase file but not when run with the normal incantation.

It turned out to be because we'd defined

domain = ( beowulf.cluster )
in update.conf.

However, setting this broke the way cfengine handles FQDNs on the dual-homed nodes (which have names in both gla.scotgrid.ac.uk and beowulf.cluster). I commented it out, leaving cfengine to guess the right thing to do, and it all seems OK.
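For reference, the change amounts to something like the following in update.conf. Only the domain line comes from our file; the surrounding control: layout is a minimal sketch:

```
control:
   # Left unset so cfengine works out the domain itself; forcing it broke
   # FQDN handling on the dual-homed gla.scotgrid.ac.uk / beowulf.cluster nodes.
   # domain = ( beowulf.cluster )
```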

I have since upgraded uniformly to 2.2.3 across all the SL4 x86_64 machines and tested OK.

While doing this I noticed we hadn't defined the WMS as a mysqld node, so we weren't monitoring it in Nagios or backing up its database. Oops. Sorted.
