One of our local ATLAS users wanted to submit 2000 jobs to the system, which I thought would be OK. Unfortunately, he hit the max_queuable limit of 1000 and his jobs started failing. Worse, other ATLAS jobs could not be queued either, and we failed quite a few of Steve's tests.
Another surprise was that max_queuable seems to apply to running plus queued jobs, not just to queued ones.
Reconsidering the issue, I have decided to set the max_user_queuable parameter to 1000 on each queue instead.
This will prevent a single user from DOSing their entire VO, while still preventing accidents from taking out the CE.
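
In qmgr terms, the change would look roughly like the sketch below, assuming a Torque server. The queue name `atlas` is a placeholder (queue names are not given here), and unsetting the old per-queue limit is an assumption about what "instead" implies; the same pair of commands would be repeated for each queue.

```shell
# Sketch only: "atlas" is a hypothetical queue name, repeat for each real queue.

# Assumption: drop the old queue-wide cap so one user can no longer fill it for everyone.
qmgr -c "unset queue atlas max_queuable"

# Cap each individual user at 1000 jobs in this queue.
qmgr -c "set queue atlas max_user_queuable = 1000"

# Print the queue's resulting configuration to check the change.
qmgr -c "print queue atlas"
```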