
Finally, after several months of anguish, SAM jobs are running at ECDF! Note that for the moment they fail replica management tests (there was little point in putting effort into the DPM while the CE was so broken), but at last we're getting output from SAM jobs coming back correctly.
The root cause was the networking arrangements, which prevented the worker nodes from making arbitrary outbound connections. Last week we arranged with the systems team to open all outbound ports >1024 from the workers. Then it was a matter of battering down each of the router blocks one by one (painfully, these seemed to take about 2 days each to disappear).
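For anyone chasing a similar problem, a quick way to see whether a worker node can make an outbound connection is to attempt a plain TCP connect with a timeout. The following is only a minimal sketch, not the procedure we actually used: the helper names `can_connect` and `check_targets` are made up for illustration, and the hosts and ports to probe are whatever your own jobs need to reach.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout.

    Hypothetical helper: a refused, filtered, or timed-out connection
    all count as a failure here.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable network, DNS failure, timeout.
        return False


def check_targets(targets, timeout=5.0):
    """Probe each (host, port) pair and return a dict of results."""
    return {(host, port): can_connect(host, port, timeout) for host, port in targets}
```

Run from a worker node, `check_targets` with a list of (host, port) pairs shows which destinations are still blocked. A connection that hangs until the timeout rather than being refused immediately usually points at a firewall or router silently dropping packets.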
Testing will continue, but we're very hopeful that things will now come together quickly.
 