Monday, March 02, 2009

Interactive Debugging on WNs

[Not strictly ScotGrid, but I figured the ScotGrid blog has a higher readership than my personal ramblings.] This is how to get an interactive bash shell on the worker nodes (with the grid environment) for debugging. Case in point: as part of the certification of the SL5 x86_64 WN, I could lcg-cr fine on the command line, but not as a job.

WARNING - Trying this as a user without the site administrator's assistance will probably lead to 'Bad Things' happening to your DN and the banned user list... You have been warned.

So - I wanted to get a shell to work out exactly what wasn't quite right.

On the Workernode:
1) install screen (yum install screen)
2) chmod 755 /var/run/screen
3) chmod +s /usr/bin/screen (yes, we know SUID is bad mmmkaaay.)
4) append to /etc/screenrc:
   multiuser on
   acladd root
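
The four steps above can be sketched as a single script to run as root on the worker node. This is only a sketch: the DRY_RUN switch and the run helper are additions for this example (not part of the original recipe), and the -y flag on yum is assumed so that the install does not prompt. By default it only prints what it would do.

```shell
#!/bin/sh
# Sketch of the worker node setup above. Run as root on the WN.
# DRY_RUN is a hypothetical switch added for this example: it defaults to 1,
# so the script only prints the commands; set DRY_RUN=0 to execute them.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

setup_screen() {
    run yum install -y screen          # step 1 (-y is an assumption)
    run chmod 755 /var/run/screen      # step 2
    run chmod +s /usr/bin/screen       # step 3 (yes, SUID is bad)
    # Step 4 needs a redirect, so it is handled outside the run helper.
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ append 'multiuser on' and 'acladd root' to /etc/screenrc"
    else
        printf 'multiuser on\nacladd root\n' >> /etc/screenrc
    fi
}

setup_screen
```

Running it with DRY_RUN=0 applies the changes for real; note that running it twice would append the two screenrc lines twice.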

Then your JDL can simply invoke 'screen -dm'. Root can then attach to the session on the same worker node using the screen -rx wnusername/pid... syntax, e.g.:

[root@vtb-generic-94 ~]# screen -r dteam013/
sh-3.2$ voms-proxy-info --all
subject : /DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=aelwell/CN=671736/CN=Andrew Elwell/CN=proxy/CN=proxy/CN=limited proxy


Gotchas: Trying to be smart and putting Executable = "/usr/bin/screen"; and Arguments = "-d -m"; in the JDL doesn't help. Although the screen session launches as it should, the cleanup then wipes your proxy and all the other goodies.

Working with a noddy input sandbox of
screen -dm
sleep 3600

did the trick fine.
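
For illustration, the wrapper and a matching JDL could be generated like this. It is a minimal sketch, not the exact files used here: the file names debug-shell.sh and debug-shell.jdl, and the std.out / std.err names, are made up for the example.

```shell
#!/bin/sh
# Sketch: write the wrapper script and a JDL that ships it in the input sandbox.
# All file names below are hypothetical.

cat > debug-shell.sh <<'EOF'
#!/bin/sh
# Start a detached screen session on the worker node.
screen -dm
# Keep the job running for an hour so the cleanup does not kick in
# while root is attached to the session.
sleep 3600
EOF
chmod 755 debug-shell.sh

cat > debug-shell.jdl <<'EOF'
Executable = "debug-shell.sh";
StdOutput = "std.out";
StdError = "std.err";
InputSandbox = {"debug-shell.sh"};
OutputSandbox = {"std.out", "std.err"};
EOF
```

The sleep is what makes the difference from the Executable = "/usr/bin/screen" attempt: the job itself stays alive, so the proxy and working directory are still there when you attach.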

1 comment:

Stephen Childs said...

The i2glogin program works really well for interactive login over the grid. You start up the server on your UI then submit a job which connects back to the correct port. After waiting a little while for the job to get through the grid, you get a shell on the WN.

Something like this in your JDL:

Executable = "i2glogin";
Arguments = "-p $port:$UI_address -r -t -c /bin/sh";

RPM available at: