Tuesday, August 14, 2007

Maximum Queueable Jobs

From our two phenogrid DoS attacks, it seems that the maximum number of queued jobs the system can cope with is about 2500. Beyond that the system slides into a crisis: too many gatekeeper processes are active, the CPU is exhausted, and a context-switch storm starts, from which the system rarely seems to recover spontaneously.

So I have set the max_queuable parameter to 1000 on every queue, which seems a reasonable limit for any single VO or queue.
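
As a rough sketch of how this could be applied across many queues, the snippet below only builds the qmgr command strings and does not run them. The queue names are hypothetical placeholders, not the real queues on this system, and the helper name is made up for illustration.

```python
def max_queuable_commands(queues, limit=1000):
    """Return one qmgr command string per queue; nothing is executed here."""
    if limit <= 0:
        raise ValueError("limit must be a positive integer")
    # Torque queue attributes are set with: qmgr -c "set queue <name> <attr> = <value>"
    return [f'qmgr -c "set queue {name} max_queuable = {limit}"' for name in queues]


if __name__ == "__main__":
    # Hypothetical queue names, for illustration only.
    for cmd in max_queuable_commands(["pheno", "atlas", "ops"]):
        print(cmd)
```

Printing the commands first makes it easy to review them before pasting them into a shell on the Torque server.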

It seems to be a limitation of Torque that it cannot also enforce a global cap on queued jobs (at 2500, for instance); the limit can only be set per queue.
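
To see why per-queue caps alone do not guarantee staying under the overall limit, here is a small sketch that adds up the caps and compares the worst case with the roughly 2500-job figure observed above. The queue names and the number of queues are assumptions for illustration.

```python
def worst_case_queued(per_queue_caps):
    """Total queued jobs if every queue fills up to its own cap."""
    return sum(per_queue_caps.values())


def exceeds_threshold(per_queue_caps, threshold=2500):
    """True if the combined per-queue caps could exceed the system-wide threshold."""
    return worst_case_queued(per_queue_caps) > threshold


if __name__ == "__main__":
    # Hypothetical set of three queues, each capped at 1000.
    caps = {"pheno": 1000, "atlas": 1000, "ops": 1000}
    print(worst_case_queued(caps), exceeds_threshold(caps))
```

With caps of 1000, any three queues filling at once would already pass the 2500 mark, so the per-queue setting protects against a single runaway VO rather than against aggregate load.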

