Tuesday, January 29, 2008

Keeping up with the Joneses

Well, we recently had an incident with our NFS server for the cluster (home / software) locking up and needing a cold power cycle. Due to $vendor's setup, this takes aaaages (on the order of 20 mins) to get through the BIOS self-check (it hangs at 053C). $vendor would like to poke around the system and perhaps perform a BIOS upgrade. Hmm. Oh well, all 10 disk servers are identical, so we'll just drain one down and play - it also gives us the chance to upgrade DPM from 1.6.5 to the latest 1.6.7-mumble.


... or so we thought.

disk032:~# rpm -qa | grep DPM
DPM-gridftp-server-1.6.7-1sec
DPM-rfio-server-1.6.7-2sec.slc3
DPM-client-1.6.7-2sec.slc3


"That's odd - Graeme, have you updated these?" "Nope." Turns out that the yum.nightly cron job was auto-updating both the disk servers and some of the grid servers... Gaaah. Clickity click and we're all ready to play.
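
One way to catch this sort of silent drift is to compare what is installed against the version you expect. A minimal sketch, assuming the package names follow the name-version-release pattern seen in the rpm output above; dpm_drift is a made-up helper name, not part of DPM or yum:

```shell
# Hypothetical helper: read `rpm -qa` style output on stdin and print every
# DPM package whose version field is not the expected one.
dpm_drift() {
    expected="$1"
    grep '^DPM-' | while read -r pkg; do
        case "$pkg" in
            *-"$expected"-*) ;;      # version matches, nothing to report
            *) echo "$pkg" ;;        # anything else has drifted
        esac
    done
}
```

Run on a disk server as `rpm -qa | dpm_drift 1.6.5`; any line printed is a DPM package that is no longer at 1.6.5.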

In the meantime, dpm-drain migrated most of the data off the server to the other stash of disks, but 69 files still failed with 'Internal error'. I'm looking through the DB to see if I can pull any more info out.
