We failed another couple of CE-RM tests over the weekend. The failures clearly started when the load on the machine crept up from ~0.5 to ~1.0 about two weeks ago. I didn't change anything on the box, so I am mystified as to what caused the change. Perhaps a greater load is being put on the BDII by expanded use of the RB?
I have switched back to using the RAL BDII for the moment and we haven't failed since then. I may set up an additional top-level BDII on svr017, which is the unloaded scotgrid admin node, and see if that has a lower overall load.
I will also upgrade the BDII to the new release, which uses indexes to speed up queries, and see if that helps.
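
To judge whether a different BDII host or the indexed release actually helps, one option is to time the same query against each candidate. The following is only a rough sketch, not something run on our boxes: it assumes the `ldapsearch` client is installed, that the BDII listens on the conventional port 2170 with search base `o=grid`, and that a `GlueCE` filter is a representative query. The helper names are made up for illustration.

```python
import statistics
import subprocess
import time

BDII_PORT = 2170      # assumption: conventional BDII LDAP port
BDII_BASE = "o=grid"  # assumption: conventional BDII search base


def ldapsearch_command(host, ldap_filter="(objectClass=GlueCE)"):
    """Build an anonymous ldapsearch command line for one BDII host."""
    return [
        "ldapsearch", "-x", "-LLL",
        "-H", f"ldap://{host}:{BDII_PORT}",
        "-b", BDII_BASE,
        ldap_filter,
    ]


def run_query(host):
    """Run one query against the host, discarding the output."""
    subprocess.run(
        ldapsearch_command(host),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        check=True,
        timeout=60,
    )


def median_latency(query, repeats=5):
    """Call query() several times and return the median wall-clock time in seconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        query()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)
```

Calling `median_latency(lambda: run_query("bdii.example.org"))` for each host (the hostname here is a placeholder) before and after the upgrade would give comparable numbers. The median is used so that a single slow query during a load spike does not skew the result.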