Wednesday, March 11, 2009

ice cream anyone ...

We now have a functional CREAM CE in our preproduction mini cluster, designated dev011. So what does this give us, I hear you cry? Well, it has recently been reported that an update to the gLite packages allows the WMS to submit directly to CREAM through the ICE component. Anyone for ice cream? So I thought, why not give this a shot! The updates were successfully installed on the UI (dev008), WMS (dev009) and CREAM CE (dev011). When I say successfully installed, I actually mean with some minor jpackage voodoo. It seems that this repo is just plain broken, with all sorts of clashes between JPackage 5 and 1.7. In fact, the advice on LCG-ROLLOUT seems to be to remove 1.7 from the repo definition altogether. It would be nice if we could get a standard build of Java that worked and distribute it along with the middleware; since we haven't seen one, I'm guessing that's not possible! Anyway, on with the ice cream.

I was going to post all the fun I had trying to install the CREAM CE, but for brevity I have moved that to a ScotGrid wiki page and will just show it working with the WMS. In order to test it on our mini cluster I installed a site BDII and changed LCG_GFAL_INFOSYS so that lcg-infosites on the UI picked up the mini cluster CEs.
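For the record, the UI-side change is just a matter of repointing LCG_GFAL_INFOSYS at the new site BDII (the hostname below is a placeholder; 2170 is the usual BDII port):

-bash-3.00$ export LCG_GFAL_INFOSYS=<site-bdii-host>:2170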

-bash-3.00$ lcg-infosites --vo vo.scotgrid.ac.uk ce
#CPU Free Total Jobs Running Waiting ComputingElement
----------------------------------------------------------
.....
1912 5 4 0 4 dev011:8443/cream-pbs-q30m
1912 8 4 0 4 dev010:2119/jobmanager-lcgpbs-q30m

a whipped cream example:

-bash-3.00$ cat whippedcream.jdl
Type = "Job";
JobType = "Normal";
Executable = "double.sh";
StdOutput = "hw.out";
StdError = "hw.err";
InputSandbox = {"double.sh"};
OutputSandboxBaseDestURI = "gsiftp://dev008/clusterhome/home/gla057/cream/job_output";
OutputSandbox = {"hw.out", "hw.err"};
Requirements = other.GlueCEUniqueID == "dev011:8443/cream-pbs-q30m";
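Before submitting, it's worth checking that the WMS can actually match the new CE; glite-wms-job-list-match does the job (the same -a/--vo options as the submit below work here) and should list dev011:8443/cream-pbs-q30m among the matched resources:

-bash-3.00$ glite-wms-job-list-match -a --vo vo.scotgrid.ac.uk whippedcream.jdl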

submission to the CREAM CE through the WMS:

-bash-3.00$ glite-wms-job-submit -a --vo vo.scotgrid.ac.uk --debug -r dev011:8443/cream-pbs-q30m whippedcream.jdl

can we see the job in torque? Yes we can.

svr016:~# qstat | grep sco
2214311.svr016 cream_034614244 scotg001 0 W q30m

Has it worked through the CREAM CE? Yes!

-bash-3.00$ glite-wms-job-status https://dev009:9000/l4-RXjbtZbk1g00moK2IWA

*************************************************************
BOOKKEEPING INFORMATION:

Status info for the Job : https://dev009:9000/l4-RXjbtZbk1g00moK2IWA
Current Status: Done (Success)
Logged Reason(s):
- job completed
- Job Terminated Successfully
Exit code: 0
Status Reason: Job Terminated Successfully
Destination: dev011:8443/cream-pbs-q30m
Submitted: Fri Mar 6 16:28:15 2009 GMT
*************************************************************

It also appears there is no need to ask for the job output, as it is automatically gsiftp'd to the output sandbox directory specified in the JDL.

-bash-3.00$ glite-wms-job-output https://dev009:9000/l4-RXjbtZbk1g00moK2IWA
Connecting to the service https://dev009:7443/glite_wms_wmproxy_server
Error - Output not Allowed
Output files already retrieved

One point to note is that you now have to run a GridFTP server to stage successful output from the CREAM CE. This is also useful for staging files in, especially if you want to bypass the WMS InputSandbox size limits imposed by sites. For a more in-depth account of the install you can check out the ScotGrid wiki, which may help if you encounter anything weird.
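To illustrate the stage-in trick: the JDL lets you give an InputSandboxBaseURI, so files listed in the InputSandbox are pulled over GridFTP from your own server rather than going through the WMS. A minimal sketch (the host, path and tarball name are just examples, reusing the same GridFTP server as above):

InputSandboxBaseURI = "gsiftp://dev008/clusterhome/home/gla057/cream/job_input";
InputSandbox = {"double.sh", "some-big-input.tar.gz"};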

Monday, March 09, 2009

Tier 2.5 open for business

Today we released the long-awaited Tier 2.5 to the local punters at Glasgow. The benefits of this "halfway house" include:

  • Output of Grid jobs running at GU-Scotgrid can now be sent straight to the departmental (i.e. non-Grid enabled) storage, in turn making it accessible to users' desktop machines.
  • Access to the GU-ScotGrid UI is now with a familiar departmental username, rather than an arbitrarily assigned 'glaXXX' account; one less thing for new users to remember.


To protect the NFS mounted departmental storage from ne'er-do-wells, we created an additional Unix group, to which all Tier 2.5 users (and nobody else) must belong. Additionally, the permissions on the ScotGrid end of the NFS mounts are set to '750':


drwxr-x--- 2 root nfsusers 0 Mar 9 16:23 data
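For anyone wanting to replicate this, the recipe is roughly the following (group name as above; the username and mount point path are just examples):

# groupadd nfsusers
# gpasswd -a someuser nfsusers
# chown root:nfsusers /path/to/data
# chmod 750 /path/to/data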


These steps successfully control the users who can see the departmental NFS mounts, but what about Grid jobs? Well, so long as the user's primary GID is a 'Griddy' one, their job will run, have access to the NFS mounts, and be accounted for accordingly.
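As a concrete (entirely made-up) example, a Tier 2.5 user's identity on the cluster ends up looking something like this, with a grid group as the primary GID and nfsusers as a supplementary group:

-bash-3.00$ id
uid=12345(someuser) gid=3000(scotgrid) groups=3000(scotgrid),4000(nfsusers)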

Wednesday, March 04, 2009

jpackage voodoo

I was recently trying to install a glite-MON box along with a glite-UI for the development cluster (i386 SL4). However, JPackage seemed to be playing up more than ever.

With the current jpackage repo setup of:

bash-3.00# cat jpackage.repo
[main]
[jpackage17-generic]
name=JPackage 1.7, generic
baseurl=http://mirrors.dotsrc.org/jpackage/1.7/generic/free/
enabled=1
protect=1

[main]
[jpackage5-generic]
name=JPackage 5, generic
baseurl=http://mirrors.dotsrc.org/jpackage/5.0/generic/free/
enabled=1
protect=1

The first install attempt gave these errors:

yum install glite-MON

Error: Missing Dependency: jdk = 2000:1.6.0_12-fcs is needed by package java-1.6.0-sun-compat
Error: Missing Dependency: xml-commons-jaxp-1.2-apis = 0:1.3.04-5.jpp5 is needed by package xml-commons-resolver11
Error: Missing Dependency: jaxp = 1.2 is needed by package dom4j

Argh!
I remembered that jpackage17 was causing all sorts of issues, so I decided to remove it from the yum repo temporarily and slot in a non-free jpackage5 repo instead. Then, after a 'yum clean all; yum update', I was ready to re-run the glite-MON install.
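For reference, the repo file ended up looking something like this (same mirror as before; the non-free path is the obvious sibling of the free one):

bash-3.00# cat jpackage.repo
[jpackage5-generic]
name=JPackage 5, generic
baseurl=http://mirrors.dotsrc.org/jpackage/5.0/generic/free/
enabled=1
protect=1

[jpackage5-generic-nonfree]
name=JPackage 5, generic non-free
baseurl=http://mirrors.dotsrc.org/jpackage/5.0/generic/non-free/
enabled=1
protect=1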

yum install glite-MON
...
Error: Missing Dependency: xml-commons-jaxp-1.2-apis = 0:1.3.04-5.jpp5 is needed by package xml-commons-resolver11
Error: Missing Dependency: jaxp = 1.2 is needed by package dom4j

Woo Hoo, one down.

bash-3.00# yum list xml-commons-jaxp-1.2-apis
...
Installed Packages
xml-commons-jaxp-1.2-apis.noarch 1.3.04-5.jpp5 installed

I was slightly confused, as it said it couldn't find that a minute ago! Anyway, I did a yum search and found that there was a 1.3 version available, so I installed that.

yum install xml-commons-jaxp-1.3-apis.noarch
Dependencies Resolved
=============================================================================
Package Arch Version Repository Size
=============================================================================
Installing:
xml-commons-jaxp-1.3-apis noarch 1.3.04-5.jpp5 jpackage5-generic 224 k

Transaction Summary
=============================================================================
Install 1 Package(s)
Update 0 Package(s)
Remove 0 Package(s)
Total download size: 224 k
Is this ok [y/N]: y
Downloading Packages:
(1/1): xml-commons-jaxp-1 100% |=========================| 224 kB 00:00
Running Transaction Test
Finished Transaction Test
Transaction Test Succeeded
Running Transaction
Installing: xml-commons-jaxp-1.3-apis ######################### [1/2]
Removing : xml-commons-jaxp-1.2-apis ######################### [2/2]

Installed: xml-commons-jaxp-1.3-apis.noarch 0:1.3.04-5.jpp5
Complete!


Okay, now for the biggie: yum install glite-MON... and it worked!

Monday, March 02, 2009

Gone with the Indices: A story of optimisation and DPM, set against the thrilling backdrop of MySQL.

Last time I blogged, it was to crow about how much we'd improved our DPM performance against the ATLAS User Analysis tests by splitting our DPM into a front end and a MySQL server backend.

It appeared at that point that the limiting factor on the performance of the DPM was the IOwait on the MySQL server, so we've been looking into ways to reduce that.
Turning on slow query logging showed that there were actually a couple of relatively common queries which were selecting on columns that weren't indexed in their tables, so we decided to try adding indexes to see if that improved matters. (While indexes add a small constant to the time taken to make a write, there are already quite a few implicit indexes on the tables, and writes are much less common than reads.)
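For anyone wanting to reproduce this on a MySQL 5.0-era server, the slow query log is a my.cnf change picked up at restart; the threshold and log path below are illustrative:

# added to the [mysqld] section of /etc/my.cnf
log-slow-queries = /var/log/mysqld-slow.log
long_query_time = 2
log-queries-not-using-indexes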
The most common slow queries were of the form:

  • select MAX(lifetime) from dpm_get_filereq where pfn = 'some pfn here'

and lifetime is not indexed in the dpm_db.dpm_get_filereq table (to be fair, there's no obvious reason why it should be, and the db is generally pretty well indexed on the whole).

  • create index pfn_lifetime on dpm_get_filereq (pfn(255), lifetime);

deals with that.
Similarly, for the less frequent lookups for put requests we add:

  • create index status_idx on dpm_put_filereq(status);

and

  • create index stime_idx on dpm_req(stime);

and, finally, to optimise out the spikes we see each time monami tries to query the server, we add an index to the cns_db:

  • create index usage_by_group on Cns_file_metadata(gid, filesize);

(this also speeds up the responsiveness of Greig's DPM Monitoring webapp).

In order to do this without locking the request tables for ages, Stuart implemented a slightly hair-raising approach: clone the "static", older parts of the tables, index the clone, then briefly stop DPM, sync the clone with the dynamic parts, swap the (indexed) clone for the (unindexed) original, and restart DPM.
This works surprisingly well - something like 95% of the request tables appear to be historical and static rather than referring to current requests.
(It also raises the question of whether it would be easier just to delete the first 80% or so of all the request tables, keeping a suitable backup copy, of course.)
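Very roughly, the dance looks like this for the get-request table (the cutoff value is illustrative, and rowid stands in for whatever monotonically increasing key you split the table on; the second copy and the rename happen with DPM stopped):

create table dpm_get_filereq_new like dpm_get_filereq;
create index pfn_lifetime on dpm_get_filereq_new (pfn(255), lifetime);
-- copy the old, static rows while DPM is still running
insert into dpm_get_filereq_new select * from dpm_get_filereq where rowid <= 1234567;
-- stop DPM, catch up on anything newer, then swap the tables over
insert into dpm_get_filereq_new select * from dpm_get_filereq where rowid > 1234567;
rename table dpm_get_filereq to dpm_get_filereq_old, dpm_get_filereq_new to dpm_get_filereq;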

So, after all that, what was the result?
Well.

In normal use, the MySQL load is much smoother than before - we've removed pretty much all the load spikes from intensive infrequent queries, and the background load from get requests is roughly halved from previously.

This is visible by comparing the MySQL server loads during HammerCloud test 135 and the most recent test against Glasgow - HC 164:

SVR015 before indexing

SVR015 after indexing

Unfortunately, within error, it doesn't seem to have actually improved our performance in HammerCloud tests at all:



which is sad. The iowait still appears (though somewhat reduced) when we're under heavy load - the sheer number of reads against the DB is enough to generate this by itself, even with indexes.
It's possible that we could reduce the iowait by increasing the InnoDB Buffer Pool setting for the server - at the moment, we have a 97% hit rate, so increasing that to 99% would cut our iowait by a factor of 3 - but it's not clear that the server is really the bottleneck.
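If we do go down that route, it's a one-line change in my.cnf plus a mysqld restart; the size below is purely illustrative and has to fit in the box's RAM alongside everything else:

# [mysqld] section of /etc/my.cnf
innodb_buffer_pool_size = 4096M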

Looking at the other loads:

SVR018 cpu load for first hour of test.
DPM disk cpu load for first hour of test.
DPM disk network load for first hour of test.
it's not clear where the bottleneck is, really - the disks were slightly more stressed (there's a little bit of iowait visible at their peak CPU load), and it looks as if something in the network bandwidth topped out at the same time (that peak is suspiciously flat at around 800MB/sec).
Further investigation needed, though!

Interactive Debugging on WNs

[Not strictly ScotGrid, but I figured the ScotGrid blog has a higher readership than my personal ramblings.] How to get an interactive bash shell on the worker nodes (with the grid environment) for debugging. Case in point: as part of the certification of the SL5 x86_64 WN, I could lcg-cr fine on the command line, but not from within a job.

WARNING - trying this as a user without the site administrators' assistance will probably lead to 'Bad Things' happening to your DN and the banned user list... You have been warned.

So - I wanted to get a shell to work out exactly what wasn't quite right.

On the Workernode:
1) install screen (yum install screen)
2) chmod 755 /var/run/screen
3) chmod +s /usr/bin/screen (yes, we know SUID is bad mmmkaaay.)
4) append to /etc/screenrc
multiuser on
acladd root


Then your JDL can simply invoke 'screen -dm'. Root can then reattach to the session on the same worker node using the screen -rx wnusername/pid... syntax, e.g.:

[root@vtb-generic-94 ~]# screen -r dteam013/
sh-3.2$ voms-proxy-info --all
subject : /DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=aelwell/CN=671736/CN=Andrew Elwell/CN=proxy/CN=proxy/CN=limited proxy

tada!


Gotchas: trying to be smart and putting Executable = "/usr/bin/screen"; and Arguments = "-d -m"; doesn't help. Although the screen session launches as it should, the cleanup wipes all your proxy and other goodies.

Working with a noddy screen.sh input sandbox of
#!/bin/sh
# start a detached screen session for root to attach to
screen -dm
# keep the job (and hence the proxy and sandbox) alive for an hour
sleep 3600

did the trick fine.
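For completeness, a minimal JDL to wrap that script might look like the following (filenames are just examples; point it at the CE under test with a Requirements line as in the whipped cream example above):

Type = "Job";
JobType = "Normal";
Executable = "screen.sh";
StdOutput = "debug.out";
StdError = "debug.err";
InputSandbox = {"screen.sh"};
OutputSandbox = {"debug.out", "debug.err"};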