Wednesday, September 20, 2006

A little bit on last week's ClusterVision training. I've decided I quite like CVOS. It's a nicely constructed system for doing image management. Of course, I don't have much to compare it to (e.g. OSCAR, Rocks).

The trinity tool, which manages nodes' images, the DHCP server and named, is basic but easy to use.

A number of things, like power cycling nodes and running commands across the cluster, are neatly sorted out (and one should never undervalue a simple thing done well).

However, it's clear that we're running into a lot of problems because CV moved wholesale to 64bit images, and this does not work well when the OS has to be 32bit.

This is the problem we have - which requires the rather ugly patching in of a 64bit kernel to allow the image deployment to take place. In the end the installer was not converted back to 32bit. This isn't a disaster, of course, but demonstrates that some painting into a corner has happened...

I'll post my notes from the training session as a comment to this post.

1 comment:

Graeme Stewart said...

The formatting on this is all screwed up. Ho hum...

- [ ] Trinity is the utility used to configure the cluster
- [ ] Invoke "trinity" on command line
- [ ] Nodes are configured by querying the switch for which port they are plugged into.
- [ ] Nodes have category, APC port and eth port number
- [ ] Categories can be defined
- [ ] How? Just manually?
- [ ] Red IP addresses mean MAC is unknown
- [ ] Need to scrub MAC address if machine dies
- [ ] (Auto burn in)
- [ ] Option 2 - Update dynamic hosts files - does dhcpd.conf, nfs daemons, etc.
- [ ] Option 3 - Fundamental reconfiguration - should only need to run once more
- [ ] /slave is the root of all configurations
- [ ] Trinity is quite basic - some parameters need to be changed in the master file (/slave/cluster
- [ ] Default node action is sync for restart; full install for new nodes
- [ ] BIOS can be flashed automatically - but leave this to CV
- [ ] Need to configure VLAN for disk servers and masternode - will use eth1 interfaces
- [ ] Lowrens will do the VLAN - we can plug in using long cables now, but will be cabled properly when the remaining install is done.
- [ ] /slave/installer
- [ ] Basic linux environment (64 bit)
- [ ] From here trinity is run - it's only a bootstrap
- [ ] /slave/catagories
- [ ] Each category has its own configuration file (can share images though)
- [ ] IMAGE tells it what image to use
- [ ] GROUPS file does what?
- [ ] NET defines how networking is configured
- [ ] PARTITIONS defines that node's ptable
- [ ] RSYNC_* do the obvious things...
- [ ] FINALISE is a script run after an image is rsynced
- [ ] Can be used to customise image types and add extra things to fstab
- [ ] /slave/config
- [ ] One directory for every node controlled
- [ ] These files are read/written by trinity curses tool
- [ ] Easier to do batch changes by editing these directly
- [ ] Trinity writes rsyncd.conf - entry for each image
- [ ] Parallel commands
- [ ] ppoweron/ppoweroff.
- [ ] apc - controls APC units via SNMP
- [ ] apc list (all APC ports)
- [ ] apc -n list (with hostnames)
- [ ] apc status/reset HOST
- [ ] ppoweroff will power down everything ("normal" by default)
- [ ] ppoweroff -g GROUP
- [ ] pshutdown will halt node
- [ ] pping - mass ping
- [ ] pexec (uses rsh). -n RANGE
- [ ] Tries to cd to same wd as on master!
- [ ] N.B. default password is "system"
- [ ] Must change!
- [ ] module - sets environment for using a particular package
- [ ] CV will work on 32bit versions of modules
- [ ] RPMs of other compilers available (just ask)
- [ ] ganglia - will add lm_sensors modules
- [ ] nagios
- [ ] CV have a problem... nagios master was unable to pull information for WNs
- [ ] new nagios can use broadcast, which is better - looking at it
- [ ] we need to think about this too
- [ ] shorewall
- [ ] /etc/shorewall/rules
- [ ] have to insert new rule to get ganglia to work!
- [ ] Support:
- [ ] on site repair of master + grid nodes + disk servers
- [ ] back to base for slaves
- [ ] CV pay for shipping - will email a shipping label
- [ ] Always quote project number #50143
- [ ] email
- [ ] Images
- [ ] ganglia installed on the master by CV
- [ ] we add PBS/torque
- [ ] GS to work on SL43 i386 for disk servers
- [ ] module add installer-tools
- [ ] mkinitrd_cvos - wraps initrd up to add nfs support, etc
- [ ] storage nodes
- [ ] (gs will work on disk032)
- [ ] module add areca
- [ ] default password "000"
- [ ] archttppci64
- [ ] has gpt partition table
- [ ] cli64 (command line interface) - could build a nagios sensor for this
- [ ] can set IP address from cli
- [ ] Open issue: IPMI
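
On the point above that batch changes are easier done by editing the per-node directories under /slave/config directly: here is a rough Python sketch of what such a batch edit might look like. The notes don't record what the files in each node directory are called or how they are laid out, so the file name `settings` and the `KEY=VALUE` line format used here are purely hypothetical, as is the `set_key` helper itself. Treat it as an illustration of the idea, not as how trinity actually stores things.

```python
import tempfile
from pathlib import Path


def set_key(config_root, filename, key, value, nodes=None):
    """Set KEY=VALUE in <config_root>/<node>/<filename> for each node directory.

    The file name and KEY=VALUE format are assumptions made for this sketch.
    Returns the sorted list of node names whose file was actually changed.
    """
    changed = []
    for node_dir in sorted(Path(config_root).iterdir()):
        if not node_dir.is_dir():
            continue
        if nodes is not None and node_dir.name not in nodes:
            continue
        path = node_dir / filename
        if not path.is_file():
            continue
        old_lines = path.read_text().splitlines()
        new_lines = []
        found = False
        for line in old_lines:
            # Replace an existing assignment of this key, keep everything else.
            if "=" in line and line.split("=", 1)[0].strip() == key:
                new_lines.append(f"{key}={value}")
                found = True
            else:
                new_lines.append(line)
        if not found:
            new_lines.append(f"{key}={value}")
        if new_lines != old_lines:
            path.write_text("\n".join(new_lines) + "\n")
            changed.append(node_dir.name)
    return changed


if __name__ == "__main__":
    # Demo against a throwaway directory, never the real /slave/config.
    root = Path(tempfile.mkdtemp())
    for name in ("node001", "node002"):
        (root / name).mkdir()
        (root / name / "settings").write_text("CATEGORY=old\n")
    print(set_key(root, "settings", "CATEGORY", "slave", nodes={"node001"}))
```

Anything like this should be tried on a copy of the config tree first, since the trinity curses tool reads and writes the same files.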