Friday, March 16, 2007

Resource Broker (Beta) for ScotGrid

Through the wonders of YAIM and cfengine, I was able to set up an lcg-RB on svr023 in two cfengine lines: one to download the metapackage and one to run configure_node.

And it works! I got output back from my first job:

ppepc62:~/jobs$ edg-job-status https://svr023.gla.scotgrid.ac.uk:9000/5hT0x7GZluDFMWJ6qT0KLQ


*************************************************************
BOOKKEEPING INFORMATION:

Status info for the Job : https://svr023.gla.scotgrid.ac.uk:9000/5hT0x7GZluDFMWJ6qT0KLQ
Current Status: Done (Success)
Exit code: 0
Status Reason: Job terminated successfully
Destination: svr016.gla.scotgrid.ac.uk:2119/jobmanager-lcgpbs-dteam
reached on: Fri Mar 16 12:24:38 2007
*************************************************************

ppepc62:~/jobs$ edg-job-get-output https://svr023.gla.scotgrid.ac.uk:9000/5hT0x7GZluDFMWJ6qT0KLQ

Retrieving files from host: svr023.gla.scotgrid.ac.uk ( for https://svr023.gla.scotgrid.ac.uk:9000/5hT0x7GZluDFMWJ6qT0KLQ )

*********************************************************************************
JOB GET OUTPUT OUTCOME

Output sandbox files for the job:
- https://svr023.gla.scotgrid.ac.uk:9000/5hT0x7GZluDFMWJ6qT0KLQ
have been successfully retrieved and stored in the directory:
/tmp/jobOutput/graeme_5hT0x7GZluDFMWJ6qT0KLQ

*********************************************************************************


The jobs did, however, run really slowly, as R-GMA managed to lock up on me twice and needed restarting. I'm really fed up with this, so I have started an Ops Logbook for the site to at least log these issues in a consistent way.

If anyone has a Nagios or cfengine recipe for restarting R-GMA, I'd be glad to use it.
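In the meantime, the kind of thing I have in mind is a small watchdog run from cron. This is only a sketch: it is not a real Nagios plugin or cfengine rule, and the check and restart commands are placeholders, since I'm not assuming any particular R-GMA health check or init script name here. The timeout matters because a locked-up service may hang the check rather than fail it.

```python
import subprocess


def check_and_restart(check_cmd, restart_cmd, run=subprocess.run):
    """Run check_cmd; if it fails or hangs, run restart_cmd.

    Returns True if a restart was attempted, False if the check passed.
    The `run` parameter exists only so the logic can be tested without
    touching real services.
    """
    try:
        healthy = run(check_cmd, timeout=30).returncode == 0
    except subprocess.TimeoutExpired:
        # A hung check is treated the same as a failed one.
        healthy = False
    if healthy:
        return False
    run(restart_cmd, timeout=120)
    return True


# Hypothetical usage (both commands are placeholders, not real paths):
#   check_and_restart(["/path/to/rgma-check"],
#                     ["/path/to/rgma-init-script", "restart"])
```

Logging each restart with a timestamp would also give the Ops Logbook something consistent to record.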
