Wednesday, September 12, 2007

Maximum Queuable Jobs Bites Back

One of our local ATLAS users wanted to submit 2000 jobs to the system, which I thought would be OK. Unfortunately he hit the max_queuable limit of 1000 and his jobs started to fail. Worse, no other ATLAS jobs could be queued either, and we failed quite a few of Steve's tests.

Another surprise was that max_queuable seems to count running jobs as well as queued ones.

Reconsidering the issue, I have decided to set the max_user_queuable parameter to 1000 on each queue instead.

This will prevent a single user from DOSing their entire VO, but should still prevent accidents from taking out the CE.
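
To make the difference between the two limits concrete, here is a toy model in Python. It is only a sketch of the behaviour described above, not how the batch system implements it: the function name can_queue, the user names and the job counts are all made up for illustration, and it assumes that max_user_queuable counts running jobs in the same way that max_queuable appears to.

```python
from collections import Counter


def can_queue(jobs, user, max_queuable=None, max_user_queuable=None):
    """Return True if one more job from `user` would be accepted.

    `jobs` is a list of (owner, state) tuples for jobs already in the
    queue, where state is "Q" (queued) or "R" (running). Both states
    count towards the limits (an assumption for the per-user limit).
    """
    # Queue-wide limit: every job in the queue counts, whoever owns it.
    if max_queuable is not None and len(jobs) >= max_queuable:
        return False
    # Per-user limit: only the submitting user's own jobs count.
    per_user = Counter(owner for owner, _state in jobs)
    if max_user_queuable is not None and per_user[user] >= max_user_queuable:
        return False
    return True


# Hypothetical situation: one user has 1000 jobs in the queue, 200 running.
jobs = [("alice", "R")] * 200 + [("alice", "Q")] * 800

# With the queue-wide limit, nobody else can get a job in.
print(can_queue(jobs, "bob", max_queuable=1000))         # False
# With the per-user limit, only the heavy user is refused.
print(can_queue(jobs, "alice", max_user_queuable=1000))  # False
print(can_queue(jobs, "bob", max_user_queuable=1000))    # True
```

With the per-user limit the heavy submitter still hits a ceiling, but everyone else in the VO can carry on queuing jobs.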
