Thursday, December 18, 2008

Am I seeing double site bdii?

With the imminent move of the development rack we need to move some of the important grid infrastructure out of the current dev rack and into a permanent production home in clustervision. To minimise site downtime we would like to create a temporary scotgrid BDII on svr027 (currently unused). So here goes.....

Running cfagent -qv on svr027 completed successfully: it worked through the files, links, editfiles, packages (including the correct glite-BDII packages) and copy sections.

All was going well until YAIM.

The command in question comes from my notes on configuring the UI, where running yaim for a UI node configures the UI. For the site BDII it is:

/opt/glite/yaim/bin/yaim -c -s /opt/glite/yaim/etc/site-info.def -n BDII_site

This caused the following errors:

cfengine:svr027:m/bin/yaim -c -: INFO: Executing function: config_edgusers
cfengine:svr027:m/bin/yaim -c -: chown: cannot access `/opt/bdii/var': No such file or directory
cfengine:svr027:m/bin/yaim -c -: sed: can't read /opt/bdii/etc/schemas: No such file or directory
cfengine:svr027:m/bin/yaim -c -: INFO: Executing function: config_bdii_only
cfengine:svr027:m/bin/yaim -c -: Stopping BDII [FAILED]
cfengine:svr027:m/bin/yaim -c -: Starting BDII [ OK ]

These errors were slightly puzzling, until I realised that I had not changed anything in site-info.def.
So I changed the SITE_BDII_HOST parameter from this:

SITE_BDII_HOST=svr030.$MY_DOMAIN

to this:

SITE_BDII_HOST="svr030.$MY_DOMAIN svr027.$MY_DOMAIN"

and re-ran /opt/glite/yaim/bin/yaim -c -s /opt/glite/yaim/etc/site-info.def -n BDII_site
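
A forgotten site-info.def edit like this one could be caught before YAIM runs. The following is only a minimal sketch: the function name bdii_host_listed is hypothetical, and it assumes SITE_BDII_HOST is set on a single line as shown above.

```shell
#!/bin/sh
# Hypothetical pre-flight check (not part of YAIM or cfengine).
# Succeeds only if the SITE_BDII_HOST line of the given site-info.def
# mentions the given short hostname as a whole word.
bdii_host_listed() {
    siteinfo=$1   # path to site-info.def
    host=$2       # short hostname, e.g. svr027
    grep '^SITE_BDII_HOST=' "$siteinfo" | grep -q -w "$host"
}
```

It could then guard the YAIM run, for example: bdii_host_listed /opt/glite/yaim/etc/site-info.def svr027 && /opt/glite/yaim/bin/yaim -c -s /opt/glite/yaim/etc/site-info.def -n BDII_site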

This time the only error was:

sed: can't read /opt/bdii/etc/schemas: No such file or directory

but the configurator still produced:

INFO: Configuration Complete. [ OK ]
INFO: YAIM terminated succesfully.

Checking /opt/bdii/etc on svr027, I had this:

svr027:/opt/bdii/etc# ls -la
total 64
drwxr-xr-x 2 edguser edguser 4096 Dec 17 16:17 .
drwxr-xr-x 6 root root 4096 Dec 17 15:55 ..
-rw-r----- 1 edguser edguser 503 Dec 17 16:17 bdii.conf
-rw-r--r-- 1 edguser edguser 2535 Oct 13 13:54 BDII.schema
-rw-r--r-- 1 edguser edguser 50 Oct 13 13:54 bdii-update.conf
-rw-r--r-- 1 edguser edguser 634 Oct 13 13:54 DB_CONFIG
-rw-r--r-- 1 edguser edguser 246 Oct 13 13:54 default.ldif
-rw-r--r-- 1 edguser edguser 1783 Oct 13 13:54 glue-slapd.conf

Checking this against svr030, I had this:

svr030:/opt/bdii/etc# ls -la
total 48
drwxr-xr-x 2 edguser edguser 4096 Oct 8 10:35 .
drwxr-xr-x 6 root root 4096 Feb 10 2008 ..
-rw-r--r-- 1 edguser edguser 364 Oct 8 10:35 bdii.conf
-rw-r--r-- 1 edguser edguser 50 Feb 10 2008 bdii-update.conf
-rw-r--r-- 1 edguser edguser 377 Feb 10 2008 indexes
-rw-r--r-- 1 edguser edguser 268 Oct 8 10:35 schemas

Very different!

I then decided to reboot and try again from scratch, just to make sure there was nothing hanging around from the previous failure.
After I installed everything in the same way, the file structure still appeared different. So I decided to test the site-level BDII to see if it actually worked.

svr027:/opt/glite/yaim/etc# ldapsearch -xLLL -b mds-vo-name=UKI-SCOTGRID-GLASGOW,o=grid -p 2170 -h svr027.gla.scotgrid.ac.uk > svr027.txt
svr027:/opt/glite/yaim/etc# ldapsearch -xLLL -b mds-vo-name=UKI-SCOTGRID-GLASGOW,o=grid -p 2170 -h svr030.gla.scotgrid.ac.uk > svr030.txt


The two dumps were then sorted and compared: cat svr027.txt | sort > ldapsvr027.txt;cat svr030.txt | sort > ldapsvr030.txt;diff -y ldapsvr027.txt ldapsvr030.txt | grep '>' | grep '.gla.scotgrid'
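
The same comparison can be sketched with comm, which reports the hosts present only in the reference dump without filtering diff -y output by eye. This is a rough sketch rather than a tested tool: the function name hosts_missing_from is hypothetical, it relies on grep -o (available in GNU grep), and it only sees hostnames that LDIF line folding has not split across lines.

```shell
#!/bin/sh
# Hypothetical helper: print the gla.scotgrid.ac.uk hostnames that appear
# in the reference dump ($2) but not in the candidate dump ($1).
hosts_missing_from() {
    candidate=$1
    reference=$2
    tmpdir=$(mktemp -d) || return 1
    pattern='[A-Za-z0-9-]*\.gla\.scotgrid\.ac\.uk'
    # Reduce each dump to a sorted, de-duplicated list of hostnames.
    grep -o "$pattern" "$candidate" | sort -u > "$tmpdir/cand"
    grep -o "$pattern" "$reference" | sort -u > "$tmpdir/ref"
    # comm -13 keeps only the lines unique to the second file.
    comm -13 "$tmpdir/cand" "$tmpdir/ref"
    rm -rf "$tmpdir"
}
```

Called as hosts_missing_from svr027.txt svr030.txt, it lists the hosts that only svr030 publishes. The reference BDII's own hostname may legitimately show up in that list.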

Comparing the output of the two LDAP searches, it was apparent that something was missing: some servers published by svr030 did not appear in the svr027 output. After a quick discussion with Sam we found the file /opt/glite/etc/gip/site-urls.conf and noticed the differences: svr027 lacked the DPM2 and BDII_TOP entries, i.e. svr025 and svr019.

svr027:/opt/glite/etc/gip# cat site-urls.conf
CE ldap://svr021.gla.scotgrid.ac.uk:2170/mds-vo-name=resource,o=grid
CE2 ldap://svr026.gla.scotgrid.ac.uk:2170/mds-vo-name=resource,o=grid
DPM ldap://svr018.gla.scotgrid.ac.uk:2170/mds-vo-name=resource,o=grid
WMS ldap://svr022.gla.scotgrid.ac.uk:2170/mds-vo-name=resource,o=grid
WMS2 ldap://svr023.gla.scotgrid.ac.uk:2170/mds-vo-name=resource,o=grid
BDII ldap://svr027.gla.scotgrid.ac.uk:2170/mds-vo-name=resource,o=grid
VOBOX ldap://svr024.gla.scotgrid.ac.uk:2170/mds-vo-name=resource,o=grid

svr030:/opt/glite/etc/gip# cat site-urls.conf
CE ldap://svr021.gla.scotgrid.ac.uk:2170/mds-vo-name=resource,o=grid
CE2 ldap://svr026.gla.scotgrid.ac.uk:2170/mds-vo-name=resource,o=grid
DPM ldap://svr018.gla.scotgrid.ac.uk:2170/mds-vo-name=resource,o=grid
DPM2 ldap://svr025.gla.scotgrid.ac.uk:2170/mds-vo-name=resource,o=grid
WMS ldap://svr022.gla.scotgrid.ac.uk:2170/mds-vo-name=resource,o=grid
WMS2 ldap://svr023.gla.scotgrid.ac.uk:2170/mds-vo-name=resource,o=grid
BDII ldap://svr030.gla.scotgrid.ac.uk:2170/mds-vo-name=resource,o=grid
BDII_TOP ldap://svr019.gla.scotgrid.ac.uk:2170/mds-vo-name=resource,o=grid
VOBOX ldap://svr024.gla.scotgrid.ac.uk:2170/mds-vo-name=resource,o=grid

After adding these entries on svr027 and restarting the BDII with /etc/init.d/bdii restart, we now have an operational site BDII on svr027.
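
The missing site-urls.conf entries could also be found mechanically by comparing the service names in the first column of the two files. A minimal sketch, assuming both files have been copied to one machine; the function name missing_site_urls is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: print the lines of the reference site-urls.conf ($2)
# whose service name (first field) does not occur in the new file ($1).
missing_site_urls() {
    new=$1
    ref=$2
    awk '
        # First file: remember every service name it defines.
        FILENAME == ARGV[1] { seen[$1] = 1; next }
        # Second file: print non-empty lines with an unseen service name.
        NF && !($1 in seen)
    ' "$new" "$ref"
}
```

Matching on the name rather than the whole line means the BDII entry, which rightly points at a different host on each machine, is not reported as a difference.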

The question is, should these additional entries be in /var/cfengine/inputs/skel/yaim/services/glite-bdii?

On with the move!

Update: svr027 is currently the only SITE_BDII in the GOC DB

1 comment:

Graeme Stewart said...

Hi Dug

You should add the site BDII itself and DPM2 to the site-info.def file for YAIM.

The SITE-BDII was a mistake (they used to include themselves automatically) and the DPM2 now looks like a semi-permanent feature for ATLAS tests.

g