
Paul is investigating and seems to have found at least one place where connections could leak (although he's unclear why it was triggered).
However, even stopping MonAMI at 11pm last night didn't entirely resolve the situation. At some point in the early hours MySQL again seemed to run out of connections. This caused some of the DPM threads to go mad and write to their log as fast as they could. By 6am there was a 2.5GB DPM log file and / was full. Yikes.
This morning I had to stop all of DPM and MySQL, move the giant logfile out of the way, and then do a restart.
Paul will try the fix soon, this time keeping a much closer eye on things.
I believe we should also make sure that /var/log on the servers is a separate large partition in the future. Although we have enough space in / during normal running, clearly an abnormal situation can fill things up pretty quickly - and running out of space on the root file system is not desirable!
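In the meantime, a small check run from cron would at least give some warning before / fills up. What follows is only a rough sketch of the idea, not something we actually run: the log directory `/var/log/dpm`, the 90% threshold and the 500MB size limit are all assumptions picked for illustration.

```python
import os
import shutil


def usage_fraction(path="/"):
    """Return the fraction (0.0 to 1.0) of the filesystem holding `path` that is used."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def oversized_logs(log_dir, limit_bytes):
    """Return (path, size) pairs for regular files in log_dir above limit_bytes,
    largest first. Subdirectories are not descended into."""
    found = []
    for entry in os.scandir(log_dir):
        if entry.is_file(follow_symlinks=False):
            size = entry.stat(follow_symlinks=False).st_size
            if size > limit_bytes:
                found.append((entry.path, size))
    return sorted(found, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Assumed values: adjust the directory and thresholds for the real servers.
    log_dir = "/var/log/dpm"
    if usage_fraction("/") > 0.90:
        print("WARNING: / is more than 90% full")
    if os.path.isdir(log_dir):
        for path, size in oversized_logs(log_dir, 500 * 1024 ** 2):
            print(f"WARNING: {path} is {size / 1024 ** 2:.0f} MB")
```

A check like this only reports; it doesn't replace putting /var/log on its own partition, which is what actually stops a runaway log from taking the root file system down with it.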
2 comments:
Ah well, I guess all software has its problems. The problem is diagnosed and fixed now. A new RPM will be released imminently.