Monday, May 18, 2009

Cream in Action : Local Users & Glexec

At Glasgow we have now rolled out a production Cream instance, open only to dteam, ops, vo.scotgrid.ac.uk and our newly created vo.optics.ac.uk (to support the optics user community and Lumerical's FDTD software). This is svr014, and it looks like CMS are now looking for production Cream instances too, so it may see further action.

One thing that we have done in the past with our local user community is tweak LCMAPS such that specific local users do not use a pool account for their jobs. This was documented in a previous blog post. With cream I thought we should at least attempt to follow the same model for local users.

However, Cream uses glexec with LCMAPS, and unfortunately the version of glexec that currently ships with the Cream CE does not map to local users correctly. Thanks to Oscar and Mischa at Nikhef for getting me the right versions of glexec. Here are the versions required to do the mapping described below in LCMAPS:


glite-security-glexec-0.6.8-2.slc4.i386.rpm
glite-security-lcmaps-1.4.7-1.slc4.i386.rpm
glite-security-lcmaps-plugins-basic-1.3.10-2.slc4.i386.rpm


These are all in pre-production, so they should be out soon in a full Cream update. When these RPMs are installed, take care to set the setuid bits on glexec, as they are lost during the update:

-rwsr-sr-x 1 root glexec 65620 Apr 30 15:56 /opt/glite/sbin/glexec
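
If the bits do get lost, restoring them is a one-off chown/chmod; a minimal sketch, assuming the root:glexec ownership and 6755 (-rwsr-sr-x) mode shown above are what your install expects:

# restore the setuid/setgid bits on glexec after the RPM update
chown root:glexec /opt/glite/sbin/glexec
chmod 6755 /opt/glite/sbin/glexec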

With these installed, the following LCMAPS policy can be added to (or amended in) /opt/glite/etc/lcmaps/lcmaps-suexec.db:


localuseraccount = "lcmaps_localuseraccount.mod -gridmapfile /usr/local/etc/grid-mapfile-local"

glexec_get_account:
proxycheck -> localuseraccount
localuseraccount -> good | vomslocalgroup
vomslocalgroup -> vomspoolaccount | poolaccount
vomspoolaccount -> good | vomslocalaccount
vomslocalaccount -> good | poolaccount
poolaccount -> good


This policy, when moved so that it is executed first in the list, will map any user listed in grid-mapfile-local to their local user account rather than a pool account.
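
For reference, the grid-mapfile-local read by the localuseraccount plugin is just an ordinary grid-mapfile mapping a full DN to a single local account, along these lines (the DN and account name below are made up for illustration):

"/C=UK/O=eScience/OU=SomeSite/L=SomeDept/CN=some local user" gla057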

This 'tweak' seems to work, but as I discovered, Cream does not really like you doing this, and you have to be very careful about the primary group of the user that glexec maps you to. On the CREAM CE, /opt/glite/var/cream_sandbox is the directory where the sandbox files are staged. It contains a set of directories, created I believe by yaim, named after each user/role combination. For example:


drwxrwx--- 2 tomcat scotg 4096 Apr 28 12:42 scotg
drwxrwx--- 2 tomcat scotgprd 4096 Apr 28 12:42 scotgprd
drwxrwx--- 2 tomcat scotgsgm 4096 Apr 28 12:42 scotgsgm

dev011:/opt/glite/var/cream_sandbox/scotg# ls -la
total 24
drwxrwx--- 3 tomcat scotg 4096 May 18 14:14 .
drwxrwxr-x 81 tomcat tomcat 4096 May 18 14:10 ..
drwx------ 3 scotg094 scotg 4096 May 18 14:14 C_UK_O_eScience_OU_Glasgow_L_Compserv_CN_douglas_mcnab_vo.scotgrid.ac.uk_Role_NULL_Capability_NULL


Note that these are all owned by the tomcat user and the group is in effect the grid group. So, without any customised local users, glexec maps you via your VOMS extension (e.g. vo.scotgrid.ac.uk) to a pool account such as scotg001, a member of the scotg group, and you end up in the scotg directory. Also note the permissions of the directory named after your proxy: 700, meaning no group read/write access to the files contained within it.

With the local-user 'tweaked' LCMAPS in place and my vo.scotgrid.ac.uk proxy (mapped to gla057, primary group scotg), submission attempts to stage the input files into the scotg directory but fails like this:

2009-05-18 14:20:18,983 INFO - Sending [/clusterhome/home/gla057/lumerical/paralleltest.fsp] to [gsiftp://dev011.gla.scotgrid.ac.uk/opt/glite/var/cream_sandbox/scotg/C_UK_O_eScience_OU_Glasgow_L_Compserv_CN_douglas_mcnab_vo.scotgrid.ac.uk_Role_NULL_Capability_NULL/CREAM679019987/ISB/paralleltest.fsp]...
2009-05-18 14:20:18,984 DEBUG - ftpclient::put() - dst=[gsiftp://dev011.gla.scotgrid.ac.uk/opt/glite/var/cream_sandbox/scotg/C_UK_O_eScience_OU_Glasgow_L_Compserv_CN_douglas_mcnab_vo.scotgrid.ac.uk_Role_NULL_Capability_NULL/CREAM679019987/ISB/paralleltest.fsp]
2009-05-18 14:20:19,761 ERROR - data_cb() - globus_ftp_client: the server responded with an error
2009-05-18 14:20:19,761 ERROR - done_cb() - globus_ftp_client: the server responded with an error
2009-05-18 14:20:19,764 FATAL - Error sending file [/clusterhome/home/gla057/lumerical/paralleltest.fsp]


This was very confusing at first, but if you try the transfer by hand with globus-url-copy or uberftp (which I presume is what the CREAM UI is doing underneath), you see that on the client side your proxy is mapped to a pool account and the files are gsiftp'd to CREAM as that account; from what I could see it was using scotg094. On the server side, after applying the local user 'tweak', glexec was interacting with CREAM to build the sandbox directories as a different user. This interaction happens in /opt/glite/etc/glite-ce-cream/cream-glexec.sh, and the resulting directory ends up looking like this:


drwx------ 3 gla057 scotg 4096 May 18 14:20 C_UK_O_eScience_OU_Glasgow_L_Compserv_CN_douglas_mcnab_vo.scotgrid.ac.uk_Role_NULL_Capability_NULL


So the gsiftp could not write, as the user was no longer the pool user and there are no group write permissions on the directories within the sandbox. I was able to get round this by patching /opt/glite/etc/glite-ce-cream/cream-glexec.sh to relax the permissions from 700 to 770, so that members of the same group can effectively read/write/execute in the sandbox directory. I am not entirely happy about this though, as it could be a security concern.
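
The patch itself is nothing clever; it amounts to something like the following wherever cream-glexec.sh sets the mode on the proxy-named sandbox directory (the variable name here is a placeholder, the real script will differ):

# was: chmod 700 "$sandbox_dir"    - only the mapped user could access it
chmod 770 "$sandbox_dir"           # relaxed: group members can read/write too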

Now, this all worked because my local user gla057 still has a primary group (scotg) that matches the primary group of the pool accounts. However, we have other local users whose unix group is glee, which does not match the primary group of any of the pool accounts available to the VO they are a member of: nanocmos. I thought the quick win would be to give the nano pool accounts an additional group of glee, but it turns out that globus-url-copy, uberftp etc. do not understand the concept of secondary groups when gsiftp'ing. So no luck there.

I think the only possible solution is to create another local VO which can be supported properly through the middleware. A hassle but less of a hack.

Cream in Action : Consumable Resources

I am not sure if you remember this previous post, but I stated that some experimenting was required in order to get consumable resources working with the gLite middleware stack.

The reason for this requirement is some licensed software (FDTD by Lumerical) that we have installed on our cluster. The documented way to 'consume' a license is to qsub directly to the batch system and request the software resource with -l software=FDTD. Not much good when you have an lcg-CE in front of it! After some further investigation it appeared that the only way to get this information through the lcg-CE would be to 'patch' the job manager so that it added the directive into the generated PBS script, based on the RSL sent to it. Unfortunately, from what I could see, the RSL schema did not have anything that could carry this software attribute out of the box, and patching the job manager was not ideal going forward.
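
For a local user sat on the cluster, the documented direct submission looks something like this (the job script name is a placeholder; the -l software=FDTD part is the same directive the pbs_local_submit_attributes.sh hook further down ends up emitting):

qsub -l software=FDTD fdtd_job.sh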

This seemed to leave only the option of creating a specific queue for the software and allowing only members of the new VO to run in it. However, it finally struck me to look at the capabilities of Cream. With the help of Massimo Sgaravatto and David Rebatto I was able to pass this batch system requirement through the WMS and CREAM and have it arrive on the batch system correctly, with very little customisation.

In summary:

- Set in your JDL (the one used for the glite-ce-job-submit command):
cerequirements = "software==\"FDTD\"";
- Create on the CREAM CE node the file:
/opt/glite/bin/pbs_local_submit_attributes.sh
which has to properly handle the added attribute ("software" in this case). For this specific use case it could be something like:


#!/bin/sh
if [ "$software" == "FDTD" ]; then
echo "#PBS -l software=FDTD"
fi


So any special CE requirements can be handled by adding them to the pbs_local_submit_attributes.sh file. Cream also has similar capabilities for other batch systems.

As for WMS submission (well, when the ICE component worked, if only for a brief time...): the CErequirements attribute in the JDL sent to CREAM is supposed to be filled in by the WMS. This value should basically take into account what is specified in the Requirements attribute of the JDL and the value specified as CeForwardParameters in the WMS configuration file.

For example, if in your JDL you have:

Requirements= "other.GlueHostMainMemoryRAMSize > 100 && other.GlueCEImplementationName==\"CREAM\"";

and in the conf file of the WMS there is:

CeForwardParameters = {"GlueHostMainMemoryVirtualSize","GlueHostMainMemoryRAMSize","GlueCEPolicyMaxCPUTime"};

The JDL sent by ICE to CREAM should be:

CeRequirements= "other.GlueHostMainMemoryRAMSize > 100";

Unfortunately this doesn't work because of this bug

What you can do now, as a workaround, is specify the cerequirements attribute directly in the JDL used for submission to the WMS, e.g.:
cerequirements = "software==\"FDTD\"";
This will be forwarded as-is to CREAM.
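
Putting the workaround together, a WMS-side JDL for this use case might look something like the following sketch (the executable and sandbox file names are made up for illustration; the cerequirements line is the important bit):

Executable = "run_fdtd.sh";
Arguments = "paralleltest.fsp";
StdOutput = "std.out";
StdError = "std.err";
InputSandbox = {"run_fdtd.sh", "paralleltest.fsp"};
OutputSandbox = {"std.out", "std.err"};
// forwarded untouched to CREAM and picked up by pbs_local_submit_attributes.sh on the CE
cerequirements = "software==\"FDTD\"";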

This has now been written up in more detail on the cream page.

So to sum it up: Thumbs up for cream.

Saturday, May 09, 2009

Oh my gosh... it's users...

I had been aware of a steady increase in the number of ATLAS user jobs on the cluster in the last few months, which I was delighted to see. I decided to quantify this by querying our accounting database and the users really have arrived.

User jobs since April 1 have consumed 867k hours of wallclock and 686k hours of CPU (80% efficient), cf. production numbers of 1981k wallclock and 1867k CPU (94% efficient). This means ATLAS users are now consuming 30% of the ATLAS walltime on the cluster.
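
(For the record, that 30% is just the user wallclock as a fraction of the total ATLAS wallclock: 867k / (867k + 1981k) ≈ 0.30.)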

We've had 235 unique ATLAS users since April and 46 have used more than 1000 hours of wallclock time.

Friday, May 08, 2009

ScotGrid Updates

Glasgow:
  1. Mike enabled pilot roles for both ATLAS and LHCb. He will also work on a parser which digests torque logs and gives the accounting figures in HEP-SPEC2006.
  2. Dug has been tracking down problems and discovering more about the LCG-CE's failure modes than he ever wanted to know (double job running caused by comms problems all down the line between Ganga, WMS, CE and batch system).
  3. Stuart has been optimising the cleanup of shared disk areas, which were cramping our style by sending the main nfs server into serious i/o wait for 20 hours in the day.
  4. Sam has installed a small test xrootd server - hopefully I will start running some analysis jobs against it soon to test it out.
  5. We reviewed our fairshares in advance of STEP09 to make sure each group was getting their due. We dropped most of our opportunistic VOs down to 1%.
  6. I discovered a jolly wheeze in Maui to use QOS to help bind the three different ATLAS fairshares into one QOS unit, with its own fairshare. This gives ATLAS sub-groups a fairshare advantage if the total ATLAS usage is under the total ATLAS target. Goes like this:
GROUPCFG[atlas] FSTARGET=10 MAXPROC=2000,2000 QDEF=atlas
GROUPCFG[atlasprd] FSTARGET=21 MAXPROC=2000,2000 QDEF=atlas
GROUPCFG[atlaspil] FSTARGET=11 MAXPROC=2000,2000 QDEF=atlas

QOSCFG[atlas] FSTARGET=42+
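
(The 42 is just the sum of the three group targets, 10 + 21 + 11, and if I am reading the Maui fairshare docs right the trailing '+' makes it a floor rather than a plain target.)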
Durham:
  1. Running well, but we decided not to implement the ATLAS pilot role (no intention to really support ATLAS analysis - they don't have the disk) and the LHCb pilot role is optional.
  2. Did the HEP-SPEC2006 benchmark on their nodes and got 67.82 for their Xeon L5430s (2.66GHz).
ECDF:
  1. To ward off less efficient user jobs we deleted ATLAS AOD - should see them only doing production for now.
  2. APEL publishing problem fixed.
  3. Steve plans to replace the ancient gLite 3.0 CE with a spiffy new gLite 3.1 one.