- General sickness in the ATLAS pilot factory.
- Quite a few BDII dropouts.
- SAM test failures from the above.
- Sluggish clients on our UIs.
- Very slow logins from CERN.
The affected nodes had been configured to use a dnsmasq cache on our headnode for DNS, and for unknown reasons that cache had become very slow (even restarting it did not help).
I reconfigured the nodes to bypass the cache, and suddenly all was rosy again across the cluster.
Curiously, we had originally added the cache to work around problems with the campus DNS.
At least, with everything configured via cfengine, this is a very easy change to make across the whole cluster.
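The exact change isn't recorded here, but as a rough sketch: assuming each node's `/etc/resolv.conf` listed the headnode cache as a `nameserver` ahead of the campus servers, taking the cache out amounts to dropping that one line. The Python function below shows the idea. The name `drop_nameserver` and the IP addresses (taken from the reserved documentation ranges) are hypothetical. In practice the edit would be pushed out through cfengine rather than run as a standalone script.

```python
def drop_nameserver(resolv_conf: str, cache_ip: str) -> str:
    """Return resolv.conf text with any 'nameserver <cache_ip>' line removed.

    Hypothetical helper: all other lines (search, options, the remaining
    nameservers) are kept in their original order.
    """
    kept = []
    for line in resolv_conf.splitlines():
        fields = line.split()
        # Skip only the nameserver entry that points at the (assumed) headnode cache
        if len(fields) >= 2 and fields[0] == "nameserver" and fields[1] == cache_ip:
            continue
        kept.append(line)
    return "\n".join(kept) + "\n"


if __name__ == "__main__":
    # Hypothetical layout: headnode cache first, campus DNS second
    before = "search example.ac.uk\nnameserver 192.0.2.10\nnameserver 198.51.100.53\n"
    print(drop_nameserver(before, "192.0.2.10"), end="")
```

With the cache entry gone, the resolver falls straight through to the remaining (campus) nameserver.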