Monday, May 18, 2009

Cream in Action : Local Users & Glexec

At Glasgow we have now rolled out a production CREAM instance, open only to dteam, ops, vo.scotgrid.ac.uk and our newly created vo.optics.ac.uk (to support the optics user community and Lumerical's FDTD software). This is svr014. It looks like CMS are now looking for production CREAM instances too, so it may see further action.

One thing that we have done in the past for our local user community is tweak LCMAPS so that specific local users do not use a pool account for their jobs. This was documented in a previous blog post. With CREAM I thought we should at least attempt to follow the same model for local users.

However, CREAM uses glexec with LCMAPS, and unfortunately the version of glexec that currently ships with the CREAM CE does not map to local users correctly. Thanks to Oscar and Mischa at Nikhef for getting me the right versions of glexec. Here are the versions required for the LCMAPS mapping described below:


glite-security-glexec-0.6.8-2.slc4.i386.rpm
glite-security-lcmaps-1.4.7-1.slc4.i386.rpm
glite-security-lcmaps-plugins-basic-1.3.10-2.slc4.i386.rpm


These are all in pre-production, so they should be out soon in a full CREAM update.
When these RPMs are installed, take care to set the setuid bits again, as they are lost during the update.

-rwsr-sr-x 1 root glexec 65620 Apr 30 15:56 /opt/glite/sbin/glexec
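A quick way to confirm the bits survived an update is to check the file mode programmatically. The following is a minimal sketch, not part of the middleware: the function names are my own, and the only thing taken from the listing above is the glexec path and the expected mode (rwsr-sr-x, i.e. 6755).

```python
import os
import stat

def missing_special_bits(mode):
    """Return the names of the setuid/setgid bits that are NOT set in mode."""
    missing = []
    if not mode & stat.S_ISUID:
        missing.append("setuid")
    if not mode & stat.S_ISGID:
        missing.append("setgid")
    return missing

def check_glexec(path="/opt/glite/sbin/glexec"):
    """Stat the glexec binary and report which special bits were lost."""
    return missing_special_bits(os.stat(path).st_mode)

if __name__ == "__main__":
    # 0o106755 is a regular file with mode rwsr-sr-x, as in the listing above.
    print(missing_special_bits(0o106755))
    # 0o100755 is the same file after the special bits have been dropped.
    print(missing_special_bits(0o100755))
```

If check_glexec() returns a non-empty list after an RPM update, the mode needs to be restored to match the listing above.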

With these installed, the following LCMAPS policy can be added to (or amended in) /opt/glite/etc/lcmaps/lcmaps-suexec.db:


localuseraccount = "lcmaps_localuseraccount.mod -gridmapfile /usr/local/etc/grid-mapfile-local"

glexec_get_account:
proxycheck -> localuseraccount
localuseraccount -> good | vomslocalgroup
vomslocalgroup -> vomspoolaccount | poolaccount
vomspoolaccount -> good | vomslocalaccount
vomslocalaccount -> good | poolaccount
poolaccount -> good


When this policy is moved so that it is executed first in the list, it maps any user listed in grid-mapfile-local to their local user account rather than a pool account.
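To make the flow of the policy easier to follow, here is a toy model of it in Python. This is only my reading of the policy syntax ("module -> next on success | next on failure"), not LCMAPS code. The evaluate function is hypothetical, and I am assuming that "good" is a terminal module that always succeeds and that a missing branch ends the policy with the last module's result.

```python
# Toy model of the glexec_get_account policy:
# module -> (next module on success, next module on failure).
# None means there is no further module on that branch.
POLICY = {
    "proxycheck":       ("localuseraccount", None),
    "localuseraccount": ("good", "vomslocalgroup"),
    "vomslocalgroup":   ("vomspoolaccount", "poolaccount"),
    "vomspoolaccount":  ("good", "vomslocalaccount"),
    "vomslocalaccount": ("good", "poolaccount"),
    "poolaccount":      ("good", None),
}

def evaluate(policy, start, outcomes):
    """Walk the policy from `start`.

    `outcomes` maps a module name to True (it succeeds) or False (it fails);
    modules not listed are treated as failing.
    Returns (policy succeeded, list of modules visited).
    """
    trace = []
    current = start
    result = False
    while current is not None:
        trace.append(current)
        if current == "good":  # assumed: terminal module, always succeeds
            return True, trace
        result = outcomes.get(current, False)
        on_success, on_failure = policy[current]
        current = on_success if result else on_failure
    return result, trace

if __name__ == "__main__":
    # A user found in grid-mapfile-local is mapped straight away.
    print(evaluate(POLICY, "proxycheck",
                   {"proxycheck": True, "localuseraccount": True}))
    # Anyone else falls through to the VOMS pool account modules.
    print(evaluate(POLICY, "proxycheck",
                   {"proxycheck": True, "vomslocalgroup": True,
                    "vomspoolaccount": True}))
```

The point of the sketch is the ordering: localuseraccount is tried before any of the pool account modules, so a local mapping wins whenever one exists.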

This 'tweak' seems to work but, as I discovered, CREAM does not really like you doing this, and you have to be very careful about the primary group of the user that glexec maps you to. In CREAM, /opt/glite/var/cream_sandbox is the directory where the sandbox files are staged on the CREAM CE. It contains a set of directories, created I believe by yaim, one for each user/role combination. For example:


drwxrwx--- 2 tomcat scotg 4096 Apr 28 12:42 scotg
drwxrwx--- 2 tomcat scotgprd 4096 Apr 28 12:42 scotgprd
drwxrwx--- 2 tomcat scotgsgm 4096 Apr 28 12:42 scotgsgm

dev011:/opt/glite/var/cream_sandbox/scotg# ls -la
total 24
drwxrwx--- 3 tomcat scotg 4096 May 18 14:14 .
drwxrwxr-x 81 tomcat tomcat 4096 May 18 14:10 ..
drwx------ 3 scotg094 scotg 4096 May 18 14:14 C_UK_O_eScience_OU_Glasgow_L_Compserv_CN_douglas_mcnab_vo.scotgrid.ac.uk_Role_NULL_Capability_NULL


Note that these are all owned by the tomcat user, and the group is in effect the grid group. So, without any customised local users, glexec maps you via your VOMS extension (e.g. vo.scotgrid.ac.uk) to a pool account such as scotg001, a member of the scotg group, and you end up in the scotg directory. Also note the permissions of the directory named after your proxy: 700, meaning there are no group read/write permissions on the files contained within it.

When using the local-user 'tweaked' LCMAPS with my vo.scotgrid.ac.uk proxy (mapped to gla057, group scotg), the client attempts to stage the input files to the scotg directory but fails like this:

2009-05-18 14:20:18,983 INFO - Sending [/clusterhome/home/gla057/lumerical/paralleltest.fsp] to [gsiftp://dev011.gla.scotgrid.ac.uk/opt/glite/var/cream_sandbox/scotg/C_UK_O_eScience_OU_Glasgow_L_Compserv_CN_douglas_mcnab_vo.scotgrid.ac.uk_Role_NULL_Capability_NULL/CREAM679019987/ISB/paralleltest.fsp]...
2009-05-18 14:20:18,984 DEBUG - ftpclient::put() - dst=[gsiftp://dev011.gla.scotgrid.ac.uk/opt/glite/var/cream_sandbox/scotg/C_UK_O_eScience_OU_Glasgow_L_Compserv_CN_douglas_mcnab_vo.scotgrid.ac.uk_Role_NULL_Capability_NULL/CREAM679019987/ISB/paralleltest.fsp]
2009-05-18 14:20:19,761 ERROR - data_cb() - globus_ftp_client: the server responded with an error
2009-05-18 14:20:19,761 ERROR - done_cb() - globus_ftp_client: the server responded with an error
2009-05-18 14:20:19,764 FATAL - Error sending file [/clusterhome/home/gla057/lumerical/paralleltest.fsp]


This was very confusing at first, but it becomes clearer when you actually try a globus-url-copy or an uberftp by hand, which I presume is what the CREAM UI is trying to do. You see that it is in fact using your proxy from the client side to map you to a pool account and gsiftp the files to CREAM. From what I could see, it was using scotg094. On the server side, after applying the local user 'tweak', glexec was interacting with CREAM to build the sandbox directories as a different user. This interaction can be seen in /opt/glite/etc/glite-ce-cream/cream-glexec.sh. The resulting sandbox directory is owned by gla057 rather than the pool account:


drwx------ 3 gla057 scotg 4096 May 18 14:20 C_UK_O_eScience_OU_Glasgow_L_Compserv_CN_douglas_mcnab_vo.scotgrid.ac.uk_Role_NULL_Capability_NULL


So the gsiftp transfer could not write, as the owner was no longer the pool user and there is no group write permission on the directories within the sandbox. I was able to get round this by patching /opt/glite/etc/glite-ce-cream/cream-glexec.sh to relax the permissions from 700 to 770, so that members of the same group can read, write and execute in the sandbox directory. I am not entirely happy about this, though, as it could be a security concern.

Now this all worked because my local user gla057 still has a primary group that matches the pool accounts' primary group, scotg. However, we have other local users whose Unix group is glee. This does not match the primary group of any of the pool accounts available to the VO that they are a member of: nanocmos. I thought the quick win would be to give the nano pool accounts an additional group of glee.
But it turns out that globus-url-copy, uberftp etc. do not understand the concept of secondary groups when gsiftp'ing. So no luck there.
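The permission logic behind both problems can be sketched as a small Python model of the classic Unix owner/group/other check. This is purely illustrative: the numeric uids and gids are made up, can_write is a hypothetical helper, and it deliberately looks only at the primary group, to mirror the gsiftp behaviour I observed.

```python
# Hypothetical numeric ids, for illustration only.
GLA057, SCOTG094 = 5057, 6094        # local user and pool account uids
GLEE_USER, NANO_POOL = 5100, 7001    # a glee local user and a nano pool account
SCOTG, GLEE, NANO = 600, 700, 800    # primary group gids

def can_write(mode, owner_uid, owner_gid, uid, gid):
    """Return True if (uid, gid) may create files in a directory.

    Only the primary group `gid` is considered; secondary groups are ignored.
    Creating a file needs both write (2) and search (1) permission.
    """
    if uid == owner_uid:
        bits = (mode >> 6) & 0o7
    elif gid == owner_gid:
        bits = (mode >> 3) & 0o7
    else:
        bits = mode & 0o7
    return bits & 0o3 == 0o3

if __name__ == "__main__":
    # Sandbox owned by gla057:scotg, gsiftp arriving as scotg094:scotg.
    print(can_write(0o700, GLA057, SCOTG, SCOTG094, SCOTG))  # mode 700
    print(can_write(0o770, GLA057, SCOTG, SCOTG094, SCOTG))  # mode 770
    # Sandbox owned by a glee user, gsiftp arriving as a nano pool account.
    print(can_write(0o770, GLEE_USER, GLEE, NANO_POOL, NANO))
```

With mode 700 the pool account is refused, with 770 it gets in through the shared primary group, and when the primary groups differ even 770 does not help.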

I think the only possible solution is to create another local VO which can be supported properly through the middleware. A hassle but less of a hack.

1 comment:

Massimo said...

I think the fix for the following bug:


http://savannah.cern.ch/bugs/?48083

will address this problem, right?