Wednesday, February 13, 2008

ganglia gmond

During some testing for next month's outage, we'd rebuilt several nodes. One thing I noticed was that the NAT boxes had stopped reporting into ganglia. We'd had something similar before with an older version of gmond ignoring the 'mcast_if' parameter (hey, the alternative is to set up the routing tables) - the clunky 'copy over a known newer binary' fix wasn't going to be sustainable, and the download only had i386.

However, kudos to the ganglia developers - one stupidly simple 'rpmbuild' and lo, a pile of x86_64 rpms ready to be copied into the cluster repo directory. Some cfengine voodoo and zip - all diskservers (including the shiny new 48T box) and NAT boxes are reporting in. Some of the graphs took a wobble but we're all present and correct with 168 machines in the pool.
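
For anyone wanting to check "present and correct" without squinting at graphs, here is a rough sketch of one way to do it, not what we actually ran. It assumes gmond is serving its XML dump on the default TCP port 8649 and that you have your own list of expected hostnames; the function names and the hostnames in the usage note are made up for illustration.

```python
import socket
import xml.etree.ElementTree as ET


def read_gmond_xml(host="localhost", port=8649, timeout=5.0):
    """Read the XML dump gmond writes when you connect to its TCP channel.

    Assumes the default port 8649; adjust if your gmond.conf differs.
    """
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:  # gmond closes the connection after the dump
                break
            chunks.append(data)
    return b"".join(chunks)


def reporting_hosts(xml_bytes):
    """Return the set of NAME attributes of all HOST elements in the dump."""
    root = ET.fromstring(xml_bytes)
    return {host.get("NAME") for host in root.iter("HOST")}


def missing_hosts(expected, xml_bytes):
    """Return the expected hostnames that do not appear in the dump, sorted."""
    return sorted(set(expected) - reporting_hosts(xml_bytes))
```

Usage would be something like `missing_hosts(["disk01", "nat01"], read_gmond_xml("some-gmond-node"))`, where an empty list means everything on your list is reporting in.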

