Thursday, February 22, 2007

The President's Brain Is Missing...

svr031, the Glasgow cluster headnode, is currently a bit FUBAR.

Grieg was mirroring the dcache repository from DESY, because the mandatory webcaching policy at Glasgow stuffs up yum big time, so any repos need to be mirrored locally for installation to work. However, his mirror script went crazy and managed to wipe the whole of /etc. Arggg!

Backup, what backup? It's a RAID 5 disk, we didn't need a backup (whoops)...

svr031's roles in the cluster are:

  • NAT box for WNs
  • DNS server for the whole of the cluster
  • Central syslogger
  • DHCP server
  • tftp server for kickstarting
  • http server for installation, nagios and ganglia

The critical run-time services are NAT and DNS. Fortunately the DNS server on svr031 and the NATing are still working, so even though the president's brain is missing the organs of the state still function, for now.

The immediate task is to remove the cluster's dependence on svr031's run-time functions. This will consist of:

  1. Generate and copy an /etc/hosts file with a complete set of entries for the internal cluster machines - so that the internal DNS is not required.
  2. Update /etc/resolv.conf on the cluster to use the standard university DNS servers.
  3. Set up NAT via the dedicated NAT boxes.
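
As a rough illustration of steps 1 and 2, here is a minimal Python sketch that renders an /etc/hosts file and an /etc/resolv.conf from a table of nodes. The hostnames, addresses, domain and nameserver IPs are all made up for the example (they are not the real Glasgow values), and the function names are hypothetical; pushing the files out to the nodes is left to whatever copy mechanism is already in use.

```python
import ipaddress


def render_hosts(nodes, domain):
    """Return /etc/hosts text for a mapping of short hostname -> IPv4 address."""
    lines = ["127.0.0.1 localhost.localdomain localhost"]
    # Sort by numeric address so the file is stable and easy to diff.
    ordered = sorted(nodes.items(), key=lambda kv: ipaddress.ip_address(kv[1]))
    for name, addr in ordered:
        # Fully qualified name first, short alias second.
        lines.append(f"{addr} {name}.{domain} {name}")
    return "\n".join(lines) + "\n"


def render_resolv_conf(nameservers, search_domain):
    """Return /etc/resolv.conf text pointing at the given nameservers."""
    lines = [f"search {search_domain}"]
    lines += [f"nameserver {ns}" for ns in nameservers]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder inventory: hypothetical node names and private addresses.
    nodes = {"node002": "10.0.0.2", "node001": "10.0.0.1", "node010": "10.0.0.10"}
    print(render_hosts(nodes, "cluster.example"), end="")
    # Placeholder nameservers from the documentation address range.
    print(render_resolv_conf(["192.0.2.53", "192.0.2.54"], "cluster.example"), end="")
```

Generating both files from one inventory keeps every node's view of the cluster identical, which matters once the internal DNS is no longer there to paper over differences.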

After those steps have been taken our dependency on svr031 should be removed, and we can work on reinstalling it and re-establishing its other services.

