Tuesday, December 16, 2008

Development / PreProd : The UI

I thought my first foray into grid middleware installations deserved a quick blog post, so here goes. Apologies in advance if I am covering old ground.

With grid02 now defunct and dev008 very much alive and kicking, it was time to install the required packages/middleware and configure it to run as a UI.

The first thing for me was to understand/create a cfagent script for the new host. After much deliberation, and despite wishing to keep it all separate and out of the way of the main production script, I decided to add it to the main script to avoid duplication. Something to think about for the future may be splitting this up into much smaller per-host modules that import a few common modules. At this stage, though, I am inclined to go with the old adage: "if it ain't broke, don't fix it". I have also heard/read much about Puppet, an alternative to cfengine with extra bells and whistles. Perhaps something to look at? Anyway, on with the install.

Once the script was created, I ran cfagent -qv. However, beginner's luck was thin on the ground, and the packages failed to install properly the first time around.
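Since cfagent -qv is very chatty, it can help to keep a full copy of the output and pull out only the lines that explain a failure. A minimal sketch, assuming a POSIX shell with grep -E; the function name cf_errors and the log path are hypothetical, and the patterns simply match the kinds of messages quoted in this post:

```shell
#!/bin/sh
# Hypothetical helper: filter cfengine output (read on stdin) down to the
# lines that usually explain a failed run. Matching is case-insensitive,
# so both "Error:" and "error" lines are kept. Prints nothing and returns 1
# when there are no matching lines.
cf_errors() {
  grep -iE "error|couldn't stat|conflicts between"
}

# Usage sketch (the log path is an arbitrary choice):
#   cfagent -qv 2>&1 | tee /tmp/cfagent-dev008.log | cf_errors
```

Keeping the full log with tee means nothing is lost if the filter turns out to be too narrow.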

First off there was a missing dependency:

cfengine:dev008: --> Processing Dependency: perl(URI::URL) for package: perl-libww
Error: Missing Dependency: log4cpp >= 1.0 is needed by package glite-ce-cream-client-api-c
cfengine:dev008: Error: Missing Dependency: liblog4cpp.so.4 is needed by package glite-ce-cream-cli
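Both errors point at the same library (log4cpp). When a run throws up several of these, a small filter can summarise which package needs what. A sketch, assuming sed and sort are available; missing_deps is a hypothetical name, and the pattern only targets the "Missing Dependency ... is needed by package ..." message format shown above:

```shell
#!/bin/sh
# Hypothetical helper: turn yum "Missing Dependency" lines (read on stdin)
# into "<package> needs <dependency>" summaries, one per unique pair.
# Lines in any other format are ignored.
missing_deps() {
  sed -n 's/.*Missing Dependency: \(.*\) is needed by package \(.*\)$/\2 needs \1/p' | sort -u
}

# Usage sketch:
#   cfagent -qv 2>&1 | missing_deps
```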

The fix was to add the DAG repo on dev008 to pull in a later version of log4cpp.
However, there were some issues with this, as the UI is i386 and the grid machines we have are generally 64-bit.
So the DAG repo URL in /etc/yum.repos.d/dag.repo had to be fudged, changing the $basearch variable in the path to i386.
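The dag.repo fudge can be scripted so that it is repeatable. A minimal sketch, assuming GNU sed (for -i); pin_repo_arch is a hypothetical name. If the repo file is itself distributed by cfengine, the change would be better made in that managed copy, otherwise the next cfagent run may well revert it:

```shell
#!/bin/sh
# Hypothetical helper: pin a yum repo file to one architecture by replacing
# every literal "$basearch" with a fixed string (i386 by default).
# A .orig copy is kept so the change can be reverted.
pin_repo_arch() {
  repo_file="${1:?usage: pin_repo_arch REPO_FILE [ARCH]}"
  arch="${2:-i386}"
  [ -w "$repo_file" ] || { echo "cannot write $repo_file" >&2; return 1; }
  cp -p "$repo_file" "$repo_file.orig"
  # '\$' makes sed match the dollar sign literally.
  sed -i 's|\$basearch|'"$arch"'|g' "$repo_file"
}

# Usage sketch (as root on the UI), then drop yum's cached metadata:
#   pin_repo_arch /etc/yum.repos.d/dag.repo i386 && yum clean all
```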

After a yum clean all, I ran cfagent -qv again, which resulted in a second error:

cfengine:dev008: Transaction Check Error: file /usr/share/java/jaf.jar conflicts between attempted installs of geronimo-jaf-1.0.2-api-1.2-11.jpp5 and sun-jaf-1.1-3jpp
cfengine:dev008: file /usr/share/java/jaf_api.jar conflicts between attempted installs of geronimo-jaf-1.0.2-api-1.2-11.jpp5 and sun-jaf-1.1-3jpp

This was a known error with the middleware install, and the fix was to run: yum install glite-UI --disablerepo=jpackage17-generic
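Both messages describe a single clash: two JAF packages both want to install /usr/share/java/jaf.jar and jaf_api.jar. A sketch that collapses such messages into one line per package pair, assuming sed and sort; conflict_pairs is a hypothetical name. Note that the log does not say which repo each package came from; knowing to disable jpackage17-generic came from this being a known issue:

```shell
#!/bin/sh
# Hypothetical helper: read yum transaction-check output on stdin and print
# each pair of packages that try to install the same file, once per pair.
conflict_pairs() {
  sed -n 's/.*conflicts between attempted installs of \([^ ]*\) and \([^ ]*\).*/\1 \2/p' | sort -u
}

# Usage sketch: spot the clash, then retry with the offending repo disabled:
#   cfagent -qv 2>&1 | conflict_pairs
#   yum install glite-UI --disablerepo=jpackage17-generic
```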

After a third run of cfagent -qv it was good to go, or so I thought. What I did notice was that it was running YAIM and failing, so I opted to run YAIM manually. Using the normal UI command, /opt/glite/yaim/bin/yaim -c -s ../etc/site-info.def -n UI, I got the following error:

INFO: Executing function: config_workload_manager_client_setenv
INFO: Executing function: config_workload_manager_client
ERROR: RB_HOST is not set
ERROR: One of the functions returned with error without specifying it's nature !
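When YAIM is run from cfagent its messages are mixed in with everything else, so it can help to capture YAIM's output and check it explicitly: no ERROR: lines, plus the final success message a clean run prints (INFO: YAIM terminated successfully.). A sketch; yaim_ok and the log file name are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: succeed only if a captured YAIM log contains the
# final success message and no lines starting with "ERROR:".
yaim_ok() {
  log="${1:?usage: yaim_ok LOGFILE}"
  grep -q 'YAIM terminated successfully' "$log" && ! grep -q '^ERROR:' "$log"
}

# Usage sketch (the log file name is arbitrary):
#   /opt/glite/yaim/bin/yaim -c -s ../etc/site-info.def -n UI 2>&1 | tee yaim-ui.log
#   yaim_ok yaim-ui.log || echo "YAIM failed, see yaim-ui.log"
```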

A quick cat of site-info.def confirmed that RB_HOST is indeed commented out, presumably because the WMS is defined there instead:

WMS_HOST="svr022.$MY_DOMAIN svr023.$MY_DOMAIN"
LB_HOST="svr022.$MY_DOMAIN svr023.$MY_DOMAIN"
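Because site-info.def is a shell fragment that YAIM sources, a quick way to see what it actually defines (with $MY_DOMAIN expanded) is to source it in a subshell and print the broker-related variables. A sketch; show_broker_hosts is hypothetical, and sourcing assumes the file contains only variable assignments, as this one appears to:

```shell
#!/bin/sh
# Hypothetical helper: source a site-info.def in a subshell (so the caller's
# environment is untouched) and report RB_HOST, WMS_HOST and LB_HOST.
show_broker_hosts() {
  site_def="${1:?usage: show_broker_hosts SITE_INFO_DEF}"
  (
    # Clear any values inherited from the calling shell first.
    unset RB_HOST WMS_HOST LB_HOST
    . "$site_def"
    for var in RB_HOST WMS_HOST LB_HOST; do
      # Indirect lookup: the value of the variable named in $var.
      eval "val=\${$var-}"
      if [ -n "$val" ]; then
        echo "$var=$val"
      else
        echo "$var is not set"
      fi
    done
  )
}

# Usage sketch:
#   show_broker_hosts ../etc/site-info.def
```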

I managed to amend the local site-info.def before cfagent set it back to the original value, and this allowed YAIM to get further. After reading around a few sites, I opted for the following config, as it appears that you can actually have both WMS_HOST and RB_HOST defined in the same file. Perhaps a WMS install will not like this setting? We will have to see.

WMS_HOST="svr022.$MY_DOMAIN svr023.$MY_DOMAIN"
LB_HOST="svr022.$MY_DOMAIN svr023.$MY_DOMAIN"
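For a quick local test, an RB_HOST line can be added idempotently, i.e. only when no active assignment exists. A sketch; ensure_rb_host and the host name rb-host.$MY_DOMAIN are hypothetical placeholders, and as noted above cfagent will put the file back unless the change is also made in the copy that cfengine manages:

```shell
#!/bin/sh
# Hypothetical helper: append an RB_HOST assignment to a site-info.def,
# but only if no active (uncommented) RB_HOST line exists already, so
# running it twice is harmless.
#   $1 = site-info.def path
#   $2 = value to assign, written literally so that '$MY_DOMAIN' is
#        expanded later, when YAIM sources the file
ensure_rb_host() {
  site_def="${1:?usage: ensure_rb_host SITE_INFO_DEF VALUE}"
  value="${2:?usage: ensure_rb_host SITE_INFO_DEF VALUE}"
  if grep -Eq '^[[:space:]]*RB_HOST=' "$site_def"; then
    return 0
  fi
  printf 'RB_HOST="%s"\n' "$value" >> "$site_def"
}

# Usage sketch (rb-host is a placeholder host name):
#   ensure_rb_host ../etc/site-info.def 'rb-host.$MY_DOMAIN'
```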

Running YAIM again (/opt/glite/yaim/bin/yaim -c -s ../etc/site-info.def -n UI) now returned some errors when building the Globus core:

gpt-build ====> Changing to /etc/grid-security/vomsdir/BUILD/globus_core-4.30/
gpt-build ====> BUILDING FLAVOR gcc32
GLOBUS_LOCATION=/opt/globus; export GLOBUS_LOCATION; GLOBUS_CC=gcc; export GLOBUS_CC; /etc/grid-security/vomsdir/BUILD/globus_core-4.30//configure --with-flavor=gcc32
Dependencies Complete
checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
checking for gawk... gawk
checking whether make sets $(MAKE)... yes
checking whether to enable maintainer-specific portions of Makefiles... no
checking for style of include used by make... GNU
checking for gcc... no
checking for cc... no
checking for cc... no
checking for cl... no
configure: error: no acceptable C compiler found in $PATH
See `config.log' for more details.

Bizarrely, it looked as though cfengine does not install gcc on an sl4.i386 host by default, so the fix was: yum install gcc. Checking cfagent.conf confirmed this: there are lots of additional packages listed for sl4.x86_64 but not for i386. Should this be the case?
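To catch this before YAIM gets as far as gpt-build, a pre-flight check can look for the same compilers that configure searched for in the log above. A sketch in POSIX sh; find_c_compiler is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical pre-flight check: print the first C compiler found on PATH,
# trying the same names configure tried (gcc, cc, cl). Returns 1 if none.
find_c_compiler() {
  for cc in gcc cc cl; do
    if command -v "$cc" >/dev/null 2>&1; then
      echo "$cc"
      return 0
    fi
  done
  echo "no C compiler found in PATH (on this UI the fix was: yum install gcc)" >&2
  return 1
}

# Usage sketch, before running YAIM:
#   find_c_compiler || yum install gcc
```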

After another re-run of YAIM (/opt/glite/yaim/bin/yaim -c -s ../etc/site-info.def -n UI):

INFO: Configuration Complete. [ OK ]
INFO: YAIM terminated successfully.

This looked better. After sourcing the grid environment that had just been installed (source /etc/profile.d/grid-env.sh), commands such as voms-proxy-init -voms vo.scotgrid.ac.uk succeeded. In fact, I was able to submit a job from dev008 and retrieve its output. So, installation successful. Or so I thought. I then updated cfagent.conf and ran it all from cfengine.
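The manual post-install checks in the paragraph above can be wrapped into a repeatable smoke test. A sketch; ui_smoke_test is a hypothetical name, and since voms-proxy-init prompts for the certificate passphrase it is meant to be run by hand, not from cfengine:

```shell
#!/bin/sh
# Hypothetical helper: repeat the manual post-install checks on a UI.
#   $1 = VO name, e.g. vo.scotgrid.ac.uk
#   $2 = environment script (defaults to the one YAIM installed)
# Returns 2 if the environment script is missing, 1 if proxy creation fails.
ui_smoke_test() {
  vo="${1:?usage: ui_smoke_test VO [ENV_SCRIPT]}"
  env_script="${2:-/etc/profile.d/grid-env.sh}"
  if [ ! -r "$env_script" ]; then
    echo "cannot read $env_script; has YAIM configured this UI?" >&2
    return 2
  fi
  # Pull in the grid environment variables that YAIM set up.
  . "$env_script"
  # Prompts for the certificate passphrase and requests VOMS attributes.
  voms-proxy-init -voms "$vo" || return 1
  # Show the resulting proxy, including its VOMS attributes.
  voms-proxy-info -all
}

# Usage sketch:
#   ui_smoke_test vo.scotgrid.ac.uk
```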

Cfengine appears to make two passes. The first pass works correctly: it installs the UI and configures it through YAIM. However, some of the file edits and links rely on an already configured gLite, so they fail on the first pass, e.g.:

cfengine:dev008: Error while trying to link /opt/glite/bin/python2 -> /usr/bin/python32
cfengine:dev008: Error while trying to link /opt/glite/bin/grid-proxy-init -> voms-proxy-init
cfengine:dev008: Error while trying to link /opt/glite/bin/grid-proxy-info -> voms-proxy-info
cfengine:dev008: Couldn't stat /opt/glite/etc/glite_wmsui_cmd_var.conf - no file to edit
cfengine:dev008: Couldn't stat /opt/edg/etc/edg_wl_ui_cmd_var.conf - no file to edit
cfengine:dev008: Couldn't stat /opt/glite/etc/gaussian/glite_wms.conf - no file to edit
cfengine:dev008: Couldn't stat /opt/glite/etc/gaussian/glite_wmsui.conf - no file to edit

I had expected these to be caught on the second pass, since by then gLite was installed and configured, but that run of cfagent -qv does not pick them up. When cfagent -qv is run a second time, it does update the files appropriately. I am not sure this is the behaviour we want. Does anyone remember whether this happened with the original UI? dev008 is currently using all the original classes for ui and clusterui, so it should behave in the same way as the original UI install.
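Until the ordering question is answered, one blunt stop-gap is to re-run cfagent until the first-pass failures disappear. This is a workaround, not a fix; the tidier solution is probably to order things in cfagent.conf so that the links and file edits run after the install and YAIM steps. A sketch; run_until_clean and MAX_RUNS are hypothetical names, and only the two message types quoted above are checked:

```shell
#!/bin/sh
# Hypothetical workaround: re-run a command (e.g. cfagent -qv) until its
# output no longer shows the first-pass link/editfiles failures, giving up
# after MAX_RUNS attempts (default 3). Only the output is inspected, not
# the command's exit status.
run_until_clean() {
  max_runs="${MAX_RUNS:-3}"
  n=1
  while [ "$n" -le "$max_runs" ]; do
    run_output=$("$@" 2>&1)
    printf '%s\n' "$run_output"
    if ! printf '%s\n' "$run_output" | grep -Eq "Couldn't stat|Error while trying to link"; then
      echo "clean after $n run(s)" >&2
      return 0
    fi
    n=$((n + 1))
  done
  echo "still failing after $max_runs run(s)" >&2
  return 1
}

# Usage sketch:
#   run_until_clean cfagent -qv
```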

So to summarise the questions:

  1. Can you set RB_HOST and WMS_HOST in the same site-info.def?
  2. Are there lots of packages missing for an sl4.i386 install?
  3. Does anyone remember from the original UI install what happens when it updates files on the second pass?

So, a partial success. Now on to a WMS.
