Wednesday, March 11, 2009

ice cream anyone ...

We now have a functional CREAM CE in our preproduction mini cluster, designated dev011. So what does this give us, I hear you cry? Well, it has recently been reported that an update to the gLite packages allows the WMS to submit directly to CREAM through the ICE component. Anyone for ice cream? So I thought, why not give this a shot!

The updates were successfully installed on the UI (dev008), WMS (dev009) and CREAM CE (dev011). When I say successfully installed, I actually mean with some minor jpackage voodoo. It seems that this repo is just plain broken, and there are all sorts of clashes between jpackage 5 and 1.7. In fact, the advice on lcg rollout seems to be to remove 1.7 from the repo definition altogether. It would be nice if we could get a standard build of Java that worked and distribute it along with the middleware. Since we haven't seen one, I'm guessing that is not possible! Anyway, on with the ice cream.

I was going to post all the fun I had trying to install the CREAM CE, but for brevity I have moved that to a ScotGrid wiki page and will just show it working with the WMS. To test it on our mini cluster, I installed a site BDII and changed LCG_GFAL_INFOSYS so that lcg-infosites on the UI picked up the mini cluster CEs.

-bash-3.00$ lcg-infosites --vo ce
#CPU    Free    Total Jobs    Running    Waiting    ComputingElement
1912    5       4             0          4          dev011:8443/cream-pbs-q30m
1912    8       4             0          4          dev010:2119/jobmanager-lcgpbs-q30m
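
With both CE flavours published side by side, it can be handy to pick out just the CREAM endpoints. The following is a minimal sketch, not part of the middleware: cream_ces is a hypothetical helper name, and it assumes the CE ID is the last field on each line and that CREAM CE IDs contain "/cream-", as dev011's does above.

```shell
# Hypothetical helper: print only the CREAM CE IDs from lcg-infosites-style
# output read on stdin.
# Assumptions: header lines start with '#', the CE ID is the last
# whitespace-separated field, and CREAM CE IDs contain "/cream-".
cream_ces() {
    awk '!/^#/ && NF > 0 && $NF ~ /\/cream-/ { print $NF }'
}

# Intended use (not run here): pipe the lcg-infosites command shown above
# into cream_ces.
```

Fed the listing above, this would print only the dev011 line's CE ID; the lcg-CE on dev010 is filtered out.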

A whipped cream example:

-bash-3.00$ cat whippedcream.jdl
Type = "Job";
JobType = "Normal";
Executable = "";
StdOutput = "hw.out";
StdError = "hw.err";
InputSandbox = {""};
OutputSandboxBaseDestURI = "gsiftp://dev008/clusterhome/home/gla057/cream/job_output";
OutputSandbox = {"hw.out", "hw.err"};
Requirements = other.GlueCEUniqueID == "dev011:8443/cream-pbs-q30m";

Submission to the CREAM CE through a WMS:

-bash-3.00$ glite-wms-job-submit -a --vo --debug -r dev011:8443/cream-pbs-q30m whippedcream.jdl

Can we see the job in Torque? Yes we can.

svr016:~# qstat | grep sco
2214311.svr016 cream_034614244 scotg001 0 W q30m

Has it worked through the CREAM CE? Yes!

-bash-3.00$ glite-wms-job-status https://dev009:9000/l4-RXjbtZbk1g00moK2IWA


Status info for the Job : https://dev009:9000/l4-RXjbtZbk1g00moK2IWA
Current Status: Done (Success)
Logged Reason(s):
- job completed
- Job Terminated Successfully
Exit code: 0
Status Reason: Job Terminated Successfully
Destination: dev011:8443/cream-pbs-q30m
Submitted: Fri Mar 6 16:28:15 2009 GMT
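
If you want to script the "has it worked?" check rather than eyeball it, the status text can be pulled out of that output. This is a rough sketch under stated assumptions: job_status and is_terminal are hypothetical helper names, the parsing relies on the "Current Status:" line looking as it does above, and the list of final states is my reading of the usual gLite job states rather than anything taken from this test.

```shell
# Hypothetical helper: extract the text after "Current Status:" from
# glite-wms-job-status output read on stdin.
job_status() {
    sed -n 's/^Current Status:[[:space:]]*//p'
}

# Hypothetical helper: succeed if the given status text looks final.
# Assumption: Done, Aborted, Cancelled and Cleared are the final states.
is_terminal() {
    case "$1" in
        Done*|Aborted*|Cancelled*|Cleared*) return 0 ;;
        *) return 1 ;;
    esac
}

# Intended use (not run here):
#   status=$(glite-wms-job-status "$JOBID" | job_status)
#   is_terminal "$status" && echo "finished: $status"
```

A polling loop would simply repeat that last pair of commands with a sleep in between until is_terminal succeeds.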

It also appears there is no need to ask for the job output, as this is automatically gsiftp'd to the output sandbox directory specified in the JDL.

-bash-3.00$ glite-wms-job-output https://dev009:9000/l4-RXjbtZbk1g00moK2IWA
Connecting to the service https://dev009:7443/glite_wms_wmproxy_server
Error - Output not Allowed
Output files already retrieved

One point to note is that you now have to run a gridftp server to stage successful output from the CREAM CE. This is also useful for staging files in, especially if you want to bypass the WMS InputSandbox size limitations imposed by sites. For a more in-depth account of the install you can check out the ScotGrid wiki. This may help if you encounter anything weird.
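
As a sketch of the stage-in idea: to my understanding the gLite JDL accepts gsiftp:// URIs as InputSandbox entries, so files can be pulled from a gridftp server instead of being uploaded through the WMS. I have not tested this on the mini cluster. The helper name input_sandbox_line is hypothetical, and so are any paths you pass it; it only builds the JDL line and does not check that the files exist.

```shell
# Hypothetical helper: build a JDL InputSandbox line from a list of URIs.
# Assumption: the WMS accepts gsiftp:// URIs as InputSandbox entries.
# The body runs in a subshell so its variables do not leak into the caller.
input_sandbox_line() (
    out='InputSandbox = {'
    sep=''
    for uri in "$@"; do
        out="$out$sep\"$uri\""
        sep=', '
    done
    printf '%s};\n' "$out"
)

# Intended use, with made-up paths on the gridftp host:
#   input_sandbox_line gsiftp://dev008/some/path/input.dat >> myjob.jdl
```

The resulting line would take the place of the InputSandbox entry in a JDL like whippedcream.jdl.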


SteveT said...

Hi Dug,

You may be a perfect candidate to check some things out for me. In particular, does anything get passed about what you added as a requirement? E.g. try matching for a queue length.

This should be some environment variables in your job on the WN reflecting what you asked for?

dug mcnab said...

Hi Steve,

I will take a look and write another post.