Wednesday, November 14, 2007
Maui has been driving me mad for about 3 weeks now. When we upgraded the cluster I had forgotten that moving to pooled prd and sgm accounts would mean that these groups were independent of the normal VO fairshare. As our engineers started to become more active I was unable to get any ATLAS jobs to start at all, particularly atlasprd jobs. When I tried to add fairshare for the new groups, Maui started to lose the plot and dropped groups entirely from its list of fairshare groups. You can see the effect very clearly in MonAMI's Maui plots: groups just evaporate!
Fed up with this, tonight I stopped Maui, removed all its databases and restarted it. This, of course, means it has lost its current fairshare calculations, but at least it now has fairshares for the new groups.
I have also re-jigged the fairshare algorithm to have far less of a decay on it. Users who ran 7 day jobs were at a huge advantage because, by the time their job had finished, its first day of running was weighted by only 0.3, so it almost didn't count!
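To put rough numbers on that, here is a minimal Python sketch of decayed fairshare usage. It assumes one-day fairshare windows and that each older window is multiplied by one more factor of the decay (the way I understand Maui's FSDECAY to work). The decay value 0.82 is hypothetical, picked only because it gives a weight of about 0.3 for a window six days old; the function names are mine, not Maui's.

```python
def window_weight(decay, age):
    """Weight applied to a fairshare window that is `age` windows old."""
    return decay ** age


def decayed_usage(usage_by_window, decay):
    """Sum usage over windows, newest first, applying the decay weight to each."""
    return sum(
        usage * window_weight(decay, age)
        for age, usage in enumerate(usage_by_window)
    )


# A 7 day job using one unit of resource per day (assumed one-day windows).
seven_day_job = [1.0] * 7

# Hypothetical steep decay: the first day of the job ends up weighted ~0.3.
steep = decayed_usage(seven_day_job, 0.82)

# No decay at all: every day of the job counts in full.
flat = decayed_usage(seven_day_job, 1.0)

print(f"weight of first day under steep decay: {window_weight(0.82, 6):.2f}")
print(f"charged usage with steep decay: {steep:.2f} of 7 days")
print(f"charged usage with no decay:    {flat:.2f} of 7 days")
```

Under the steep decay the long job is charged for only a bit over half of what it actually used, which is why a gentler decay is fairer to everyone running short jobs.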