Monday, August 28, 2006

DPM on SL4.

I tested DPM on SL4 last week. The i386 version worked fine. This is good from the point of view of hardware support in the kernel; however, it does not help with support for xfs, which is still our preferred filesystem.

The x86_64 version turned out to be very hard to install - some bug in anaconda was causing all of the device nodes in /dev to disappear shortly after formatting the disk partitions. I couldn't find anything on Google at all about this, but it's such a critical bug that I find it amazing no one else has experienced it. Must join the sl-users mailing list and report this.

Found a workaround: the first time around, create the partitions - this install will then fail with the /dev bug above; then restart the install, but create no partitions this time, which avoids the bug. What a pain! It means that our kickstart auto-installation of SL4 x86_64 nodes does not work.

To install the i386 packages needed by the gLite middleware I found yum to be far superior to apt - yum was able to look at the whole repository and pull in the necessary i386 compatibility packages, whereas apt seemed completely stymied.
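
One way to check which architectures actually ended up installed on an x86_64 node after a yum run like this is to tally the installed packages by architecture. The following is only a minimal sketch: the helper name count_by_arch is hypothetical, and it assumes its input is one "name.arch" entry per line, as produced by rpm's --queryformat option.

```shell
# Hypothetical helper: count packages per architecture.
# Reads lines of the form "name.arch" on stdin, e.g. from
#   rpm -qa --queryformat '%{NAME}.%{ARCH}\n'
count_by_arch() {
    # The architecture is whatever follows the last dot on each line;
    # tally it and print "arch count", sorted by architecture.
    awk -F. 'NF > 1 { n[$NF]++ } END { for (a in n) print a, n[a] }' | sort
}
```

On a real node this would be used as `rpm -qa --queryformat '%{NAME}.%{ARCH}\n' | count_by_arch`; a non-zero i386 count alongside x86_64 suggests the compatibility packages were pulled in.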

Finally, testing DPM on this platform seems to have revealed a bug in the dpm daemon itself. It seems to be unable to authenticate users, even when they are listed in the cns_db/userinfo table. This is even weirder because dpns (e.g., dpns-ls) works fine. rfio and dpm-gsiftp both also work fine. If this turns out to be a real problem then at least the disk pools, which are the important component, can be run on SL4, with the headnode on SL3.

I will reboot today and we'll see if that helps - there were some issues with the database and hostnames (again).

See http://www.gridpp.ac.uk/wiki/Installing_SL3_build_of_DPM_on_SL4.

1 comment:

Greig said...

Good stuff with the SL4 work; sounds like you had a tough time with it. Regarding xfs on i386 SL4, I have found it to work without any problems, albeit on a test machine with only a 17GB partition.

[gcowan@wn4 ~]$ uname -a
Linux wn4.epcc.ed.ac.uk 2.6.9-34.0.2.EL #1 Fri Jul 7 09:57:49 CDT 2006 i686 athlon i386 GNU/Linux

[gcowan@wn4 ~]$ cat /etc/redhat-release
Scientific Linux SL release 4.3 (Beryllium)

[gcowan@wn4 ~]$ rpm -qa|grep xfs
kernel-module-xfs-2.6.9-34.0.2.EL-0.1-1
xfsprogs-2.6.13-1.SL

I've run some tiobench and AIM suite-VII benchmarks and have not observed any problems, unlike what was previously reported on the HEPSYSMAN-L list.

It would be good to test this out further on more production level hardware.