Tuesday, February 20, 2007

Glasgow CE: slow march to death with alice?




I noticed last week, after coming back from Australia, that the load on the CE had been creeping up and up.

On further investigation I found 1790 gatekeeper processes running as the user alice001, yet we have no alice jobs running. Indeed, the torque logs show that we have never run any alice jobs (apart from a few alicesgm jobs in December).
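For anyone wanting to check their own CE, a per-user process tally makes this sort of pile-up obvious. This is only a minimal sketch, not the exact command used here: `count_by_user` is a hypothetical helper name, and it assumes a `ps` that accepts `-eo user=` (as the procps version does).

```shell
# Hypothetical helper: tally user names read from stdin, busiest owner first.
count_by_user() {
    sort | uniq -c | sort -rn
}

# Print the owner of every process (the trailing "=" suppresses the header),
# then show the five users with the most processes.
ps -eo user= | count_by_user | head -n 5
```

Where procps is installed, `pgrep -c -u alice001` should give the count for a single user directly.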

In fact the CE has hit swap now, so it's time to get biblical:

To every thing there is a season, and a time to every purpose under the heaven:
A time to be born, and a time to die; a time to plant, and a time to pluck up that which is planted;
A time to kill...


# kill $(ps aux | perl -ne 'print "$1 " if /^alice001\s+(\d+)/')
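Before pulling the trigger on 1790 processes it may be worth a dry run. The sketch below does the same PID extraction as the one-liner above, but with awk and an exact match on the user column, and only prints what it would kill. `pids_for_user` is a hypothetical helper name, and it assumes the usual `ps aux` layout with the user in column 1 and the PID in column 2.

```shell
# Hypothetical helper: read `ps aux` style lines on stdin and print the PID
# (column 2) of every line whose user (column 1) is exactly $1.
pids_for_user() {
    awk -v u="$1" '$1 == u { print $2 }'
}

# Dry run: list the victims without signalling anything.
ps aux | pids_for_user alice001 | while read -r pid; do
    echo "would kill $pid"
done
```

Once the list looks right, swapping the `echo` for a real `kill "$pid"` does the deed; on systems with procps, `pkill -u alice001` is a shorter route to the same end.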


In the spirit of grid operations I have raised the issue as a GGUS ticket rather than emailing the user directly. This will be an interesting experiment in how well sites can raise problems with VOs through GGUS. Ticket #18703.
