Paul put MonAMI back onto the DPM yesterday. We saw a rise in the number of MySQL connections very similar to the one before, but this time we were ready for it and used SHOW PROCESSLIST to see who was connecting. It turned out that all the extra connections came from DPM itself. MonAMI was not to blame.
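For anyone wanting to repeat the check, here is a minimal sketch of tallying SHOW PROCESSLIST output by client. It assumes the rows have already been fetched as dictionaries keyed by the standard column names (`User`, `Host`); the function name and the account names in the sample data are hypothetical, not taken from our setup.

```python
from collections import Counter


def connections_by_client(rows):
    """Count open connections per (user, host) from SHOW PROCESSLIST rows.

    Each row is a dict with at least the 'User' and 'Host' columns.
    """
    counts = Counter()
    for row in rows:
        # Host is usually reported as "hostname:port"; drop the port so that
        # several connections from the same machine are grouped together.
        host = row["Host"].rsplit(":", 1)[0]
        counts[(row["User"], host)] += 1
    return counts


# Hypothetical sample rows (account names are assumptions for illustration).
sample_rows = [
    {"Id": 1, "User": "dpm_user", "Host": "localhost:40112"},
    {"Id": 2, "User": "dpm_user", "Host": "localhost:40113"},
    {"Id": 3, "User": "dpm_user", "Host": "localhost:40150"},
    {"Id": 4, "User": "monami_user", "Host": "localhost:40201"},
]

for (user, host), n in connections_by_client(sample_rows).most_common():
    print(f"{user}@{host}: {n}")
```

Sorting by count puts the account holding the most connections at the top, which is the question we were really asking.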
Early this morning the number of connections came back down again. This suggests that, under certain circumstances, DPM opens an extra connection to the database and then holds on to it for some time (the 24-hour slot for any SRM transaction to complete?). I wonder if this might be the cause of the rare putDone failures we saw.
Thanks to MonAMI we'll be able to watch for this, and correlate any failures with how busy MySQL was.
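One rough way to do that correlation is to look up, for each failure, the most recent connection count recorded before it. The sketch below assumes the monitoring data has been exported as a time-sorted list of (timestamp, count) pairs; the function name and the numbers are made up for illustration.

```python
from bisect import bisect_right


def connections_at(samples, when):
    """Return the latest connection count recorded at or before `when`.

    `samples` is a list of (timestamp, count) pairs sorted by timestamp.
    Returns None if `when` is earlier than the first sample.
    """
    times = [t for t, _ in samples]
    i = bisect_right(times, when)
    return samples[i - 1][1] if i else None


# Hypothetical data: seconds since some start time, and connection counts.
samples = [(0, 12), (300, 14), (600, 41), (900, 40)]
failure_times = [100, 650]

for t in failure_times:
    print(t, connections_at(samples, t))
```

If the failures cluster at times when the count is high, that would support the idea that the held connections are involved; a handful of failures is of course not much to go on.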
Paul also did some pretty RRD aggregate plots, which are much easier to read. Thanks! Note how MonAMI is able to distinguish between atlas and atlas/Role=production, which is incredibly useful.