It's taken a while, but svr031 is now out of active service.
All machines now have /etc/hosts and /etc/resolv.conf files in place, which takes care of internal cluster name resolution, and their DNS servers point at the normal university ones.
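For anyone wanting to do something similar, here is a rough sketch of how the two files could be generated from a single list of hosts. It is not the script actually used on the cluster: the function names, host names and addresses are made up for illustration (the addresses come from the 192.0.2.0/24 documentation range), and the limit of three nameservers reflects what the glibc resolver will read.

```python
def render_hosts(entries):
    """Return /etc/hosts text for a mapping of IP address -> list of names.

    `entries` is a hypothetical inventory, e.g.
    {"192.0.2.10": ["node001.cluster.example", "node001"]}.
    """
    lines = ["127.0.0.1\tlocalhost"]
    for ip, names in entries.items():
        # One line per address: canonical name first, then any aliases.
        lines.append(f"{ip}\t{' '.join(names)}")
    return "\n".join(lines) + "\n"


def render_resolv_conf(nameservers, search=None):
    """Return /etc/resolv.conf text for the given DNS servers.

    The glibc resolver only honours the first three nameserver lines,
    so refuse longer lists rather than silently dropping entries.
    """
    if len(nameservers) > 3:
        raise ValueError("resolv.conf supports at most 3 nameservers")
    lines = []
    if search:
        lines.append("search " + " ".join(search))
    lines.extend(f"nameserver {ns}" for ns in nameservers)
    return "\n".join(lines) + "\n"
```

Writing the output of render_hosts and render_resolv_conf to a scratch directory first, and diffing against what is already on a node, is a cheap way to catch mistakes before anything gets overwritten.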
In addition, one of the NAT hosts was set up as a gateway machine for the worker nodes. The workers were (carefully) told to use this new host as their default gateway.
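One way to double-check that a worker really has picked up the new default gateway is to read the kernel routing table. The sketch below is an illustration rather than what was run here: default_gateway is a hypothetical helper, and it assumes a Linux node on a little-endian machine (e.g. x86), where /proc/net/route stores addresses as little-endian hex.

```python
import socket
import struct

RTF_GATEWAY = 0x2  # route flag: traffic goes via a gateway


def default_gateway(route_table_text):
    """Return the default gateway from /proc/net/route text, or None.

    Assumes a little-endian host, so the hex Gateway field is unpacked
    as a little-endian 32-bit integer.
    """
    lines = route_table_text.strip().splitlines()
    for line in lines[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 8:
            continue
        destination, gateway, mask = fields[1], fields[2], fields[7]
        flags = int(fields[3], 16)
        # The default route has destination 0.0.0.0 and netmask 0.0.0.0.
        if destination == "00000000" and mask == "00000000" and flags & RTF_GATEWAY:
            return socket.inet_ntoa(struct.pack("<L", int(gateway, 16)))
    return None
```

On a worker this could be called as default_gateway(open("/proc/net/route").read()) and the result compared with the address of the new gateway host.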
So nothing now relies on any services provided by svr031, and we can prepare for a reinstall next week.
I had to do a bit of resuscitation on svr031, so that one can at least log in via ssh and scp files to and from it. I've restored some of the library paths to get auxiliary commands working.
The cluster itself has been remarkably untroubled by svr031 being in a tizzy - I thought about putting us into a precautionary downtime, but this has not been necessary. We've carried on passing the SAM tests without trouble.
Leaders? Who needs 'em?