Now, you could repartition the cluster nodes into a "normal" partition and a "testing" partition, but for most pbs/maui clusters (which have nothing but the 'ALL' partition set), this means changing the configuration for all the nodes, rather than just the nodes we care about (and then changing it back when you're finished).
You might also consider doing this with reservations: indeed, the maui manual suggests that a reservation locked to a user specified with an & prefix will force precisely the behaviour we want, locking the reservation and the user together. In our testing, however, this did not appear to work.
Instead, the solution we've found to work is (all in maui.cfg):
- Create a reservation for the user only.
SRCFG[ssdnodes] PERIOD=INFINITY
SRCFG[ssdnodes] STARTTIME=00:00:00 ENDTIME=24:00:00
SRCFG[ssdnodes] HOSTLIST=node30[0-9]
SRCFG[ssdnodes] USERLIST=ssp001
- Create a quality of service class with the property that it only runs on that reservation.
QOSCFG[ssd] QFLAGS=USERESERVED:ssdnodes
- Make the user a member of that quality of service class only.
USERCFG[ssp001] QDEF=ssd QLIST=ssd
(In this case, the configuration mutually restricts the user ssp001 and the nodes node300 to node309 to each other.)
This has the benefit that it also generalises to any number of users, as long as you add them to the reservation and the QoS class.
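As a rough sketch of that generalisation, the changed lines in maui.cfg might look something like the following. The second user ssp002 is hypothetical, and the fragment is untested. It assumes USERLIST accepts a comma-separated list in the same way as the GROUPLIST line quoted in the comments.

```
# Hypothetical second user ssp002 added to the existing reservation
SRCFG[ssdnodes] USERLIST=ssp001,ssp002
# Each user is a member of the ssd QoS class only, as before
USERCFG[ssp001] QDEF=ssd QLIST=ssd
USERCFG[ssp002] QDEF=ssd QLIST=ssd
```

The QOSCFG line and the rest of the reservation stay as they are. Only the user list and the per-user QoS membership grow with each added user.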
3 comments:
Hi,
we do the same kind of reservations, but without QoS. Something like:
SRCFG[picsgm_64] GROUPLIST=atsgm,sgmcm,lhsgm,masgm,ctasgm,dtsgm,misgm,pasgm,picvosgm
SRCFG[picsgm_64] RESOURCES=PROCS:8
SRCFG[picsgm_64] PRIORITY=1000
SRCFG[picsgm_64] HOSTLIST=node
SRCFG[picsgm_64] STARTTIME=0:00:00 ENDTIME=24:00:00
SRCFG[picsgm_64] PERIOD=INFINITY
Is your QoS conf mandatory? Doesn't it work without it?
Cheers,
Arnau
Well, a reservation like that, without a QoS or other limiting statement, will let only those groups run on those nodes, but doesn't (as far as we can determine with some testing here) prevent those groups from running on other nodes.
That second restriction, keeping those groups off other nodes, is what the QoS class enforces.
(For SGM job priorities, merely providing a dedicated resource is enough. What we were doing was attempting to re-implement partitions (which are poorly supported in maui) but with the existing tool set.)
Ok, I understand your point.
*I forgot to mention that we also have an extra property for SGM nodes, set by torque_submit_filter in the CE.
So the job goes to the regular queue but with an extra queue property. That is our limiting statement.
Thanks Sam,
Arnau