Monday, October 08, 2007

MPI Running Properly at Glasgow

I was very keen to take up Stephen Childs's offer to get MPI enabled
properly at Glasgow. (See link for my last attempt at getting this
working, where I cobbled together something that was far from
satisfactory, but at least proved it was possible in theory.)

A problem for MPI at Glasgow is that our pool account home directories
are not shared and that jobs all wake up in /tmp anyway. For local
users we offer the /cluster/share area, which gets around this, but
what to do for generic MPI jobs? We decided it would be a very good
idea to offer some shared area for MPI jobs, and that the right
strategy would be to modify mpi-start to pick the job up by its
bootstraps and drop it back down into a shared directory area in
/cluster/share/mpi. To do this we decided to generalise the
MPI_SHARED_HOME environment variable. Previously this had been "yes"
or "no", but in the new scheme if it points to a directory then the
script transplants the job to an appropriate subdirectory of this
area.
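
The relocation logic is roughly as follows. This is a minimal sketch
of the scheme rather than the actual mpi-start patch, and the
directory names and variable handling are my paraphrase:

    # Sketch of the generalised MPI_SHARED_HOME handling (not the
    # actual mpi-start code; paths are illustrative).
    if [ "x$MPI_SHARED_HOME" != "xyes" ] && [ "x$MPI_SHARED_HOME" != "xno" ] \
       && [ -d "$MPI_SHARED_HOME" ]; then
        # MPI_SHARED_HOME names a directory, e.g. /cluster/share/mpi:
        # make a private subdirectory there and transplant the job.
        SHARED_WORK=$(mktemp -d "$MPI_SHARED_HOME/mpi-job-XXXXXX")
        cp -r . "$SHARED_WORK"
        cd "$SHARED_WORK"
    fi

The nice thing about this is that sites with shared homes keep the old
"yes"/"no" behaviour, while sites like ours just point the variable at
a shared filesystem.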

On the site side I had to make sure all the MPI environment variables
were properly defined in the job's environment (which we do with
/etc/profile.d/mpi.sh) and that the right MPI attributes were
advertised in the information system.
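
For illustration, /etc/profile.d/mpi.sh looks something like the
following. The exact paths and version numbers here are assumptions,
not the file as deployed:

    # /etc/profile.d/mpi.sh -- illustrative values only; the real file
    # points at wherever the MPICH and mpi-start installs actually live.
    # The information system then advertises matching runtime
    # environment tags (e.g. MPICH, MPI-START, MPI_SHARED_HOME).
    export MPI_MPICH_PATH=/opt/mpich-1.2.7p1
    export MPI_MPICH_VERSION=1.2.7p1
    export MPI_SHARED_HOME=/cluster/share/mpi
    export I2G_MPI_START=/opt/i2g/bin/mpi-start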

It all went pretty well, until we hit an issue with mpiexec not being
able to start the other job processes properly. (Mpiexec starts the
other processes via torque, which means they get accounted for
properly and that we can disable passwordless ssh between the WNs
- which we did at the SL4 upgrade.) There was a fear it was due to
some weird torque build problem, but in the end it was a simple issue
with server_name not being properly defined on the workers. A quick
bit of cfengine and this was then fixed.
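
The fix itself is trivial once diagnosed. On each worker something
like the following puts the setting right (the torque spool path and
server hostname below are illustrative, not our real ones); cfengine
now enforces this for us:

    # Make sure torque's server_name is set correctly on a worker node.
    PBS_SERVER_FILE=/var/spool/pbs/server_name
    PBS_SERVER=svr016.gla.scotgrid.ac.uk
    if [ "$(cat "$PBS_SERVER_FILE" 2>/dev/null)" != "$PBS_SERVER" ]; then
        echo "$PBS_SERVER" > "$PBS_SERVER_FILE"
    fi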

So, Glasgow now supports MPI jobs - excellent. (Stephen will rebuild
the newly featured mpi-start and release it next week.)

Big thanks to Stephen for setting this up for us.
