Showing posts with label ALICE. Show all posts

Monday, March 05, 2007

ALICE Queue Disabled at Glasgow

I have been regularly pruning the orphaned ALICE processes off the Glasgow CE. This morning I had to kill 371. We've had very little response from the user concerned, and I can't see how they will be motivated to fix the situation as ALICE have not, and at the moment will not, run any jobs at Glasgow. (We have a 5% share for ALICE as this is GridPP policy and, of course, we generally think grids work better if lots of VOs are enabled.)

I did check the local ALICE queue and it functions correctly.

However, I've now closed the queue in the hope that this will prevent the globus processes from spawning. We know from experience that if too many of these processes are allowed to accumulate, they degrade the performance of the CE and thus affect all users of the site.

Tuesday, February 20, 2007

Glasgow CE slow march to death with alice?




I noticed last week, after coming back from Australia, that the load on the CE had been creeping up and up.

On further investigation I have found 1790 gatekeeper processes running as the user alice001, but we have no alice jobs running. Indeed, the torque logs show that we've never run any alice jobs (well, only a few alicesgm jobs in December).
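For anyone wanting to reproduce that sort of count, here is a minimal sketch. The helper name `count_procs` is hypothetical, and it assumes `ps -eo user,args` style input on stdin (user name in the first column, command line after it); it is not necessarily how the figure above was obtained.

```shell
# Count processes owned by one user whose command line contains a substring.
# count_procs is a hypothetical helper; it reads `ps -eo user,args` style
# lines on stdin: user name first, then the full command line.
count_procs() {
    # $1 = user name, $2 = substring to look for in the command line
    awk -v user="$1" -v pat="$2" '
        $1 == user {
            cmd = $0
            sub(/^[^ ]+ +/, "", cmd)   # drop the user column, keep the command
            if (index(cmd, pat) > 0) n++
        }
        END { print n + 0 }            # print 0 rather than an empty line
    '
}

# Usage (read-only, changes nothing on the system):
#   ps -eo user,args | count_procs alice001 gatekeeper
```

Matching on the command line separately from the user column avoids counting a line just because the search string happens to appear in the account name.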

In fact the CE has hit swap now, so it's time to get biblical:

To every thing there is a season, and a time to every purpose under the heaven:
A time to be born, and a time to die; a time to plant, and a time to pluck up that which is planted;
A time to kill...


# kill $(ps aux | perl -ne 'print "$1 " if /^alice001\s+(\d+)/')
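The same PID extraction can be written without perl. The sketch below is only an illustration: `pids_for_user` is a hypothetical helper name, it assumes `ps aux` style input on stdin (user in column one, PID in column two), and it only prints PIDs, so nothing is killed until you pipe the result onwards yourself.

```shell
# Print the PIDs owned by a given user, one per line.
# pids_for_user is a hypothetical helper; it expects `ps aux` style input
# on stdin, where column 1 is the user name and column 2 is the PID.
pids_for_user() {
    # $1 = user name; exact match, so "alice001" does not match "alice0010"
    awk -v user="$1" '$1 == user { print $2 }'
}

# Usage (as root); review the list before killing anything:
#   ps aux | pids_for_user alice001
#   ps aux | pids_for_user alice001 | xargs -r kill    # -r is a GNU xargs option
```

Printing the list first and killing second makes it easy to sanity-check the count before anything is removed, and with 1790 processes xargs also avoids building one very long kill command line.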


In the spirit of grid operations I have raised the issue as a GGUS ticket rather than emailing the user directly. This will be an interesting experiment in how well sites can raise problems with VOs through GGUS. Ticket #18703.