We finally seem to be homing in on the problems at ECDF. Any job which forks off too many processes seems to die in the batch system. A simple Python job that forks children works fine with 20 children, but dies with 50. The same task at Glasgow runs happily with 100 children.
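For anyone wanting to reproduce this, a fork test along these lines should do. This is a minimal sketch rather than the exact script we ran: the function name `fork_children` and the `hold_seconds` parameter are made up for illustration, and it assumes each child just sleeps briefly so that all the children are alive at the same time.

```python
import os
import sys
import time


def fork_children(n, hold_seconds=1.0):
    """Fork n children that sleep and exit; return how many exited cleanly.

    hold_seconds is a hypothetical knob: it keeps the children alive
    together so the batch system sees n concurrent processes.
    """
    pids = []
    for i in range(n):
        try:
            pid = os.fork()
        except OSError as exc:
            # Typically EAGAIN if a process limit is hit.
            print(f"fork {i} failed: {exc}", file=sys.stderr)
            break
        if pid == 0:
            # Child: sleep, then leave without running parent cleanup code.
            time.sleep(hold_seconds)
            os._exit(0)
        pids.append(pid)

    clean = 0
    for pid in pids:
        _, status = os.waitpid(pid, 0)
        if os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0:
            clean += 1
    return clean


if __name__ == "__main__":
    n = int(sys.argv[1]) if len(sys.argv) > 1 else 20
    print(f"{fork_children(n)} of {n} children exited cleanly")
```

Submitting this with 20, 50 and 100 as the argument gives a quick way to compare sites. If the whole job is killed from outside, the final line never gets printed, which is itself a useful clue.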
As far as I can see there is no ulimit issue, but something is unhappy. We must track down whether it is SGE or some gatekeeper weirdness.
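To rule out ulimits from inside the job itself, rather than from a login shell, something like the following can be run on the worker node. Again this is only a sketch: the helper name `process_limits` and the choice of which limits to print are my own assumptions, and it only uses the standard `resource` module.

```python
import resource


def process_limits():
    """Return {limit name: (soft, hard)} for a few limits relevant to forking.

    The selection of limits here is an assumption; names missing on a
    given platform are skipped.
    """
    wanted = ["RLIMIT_NPROC", "RLIMIT_NOFILE", "RLIMIT_AS", "RLIMIT_CPU"]
    limits = {}
    for name in wanted:
        which = getattr(resource, name, None)
        if which is not None:
            limits[name] = resource.getrlimit(which)
    return limits


if __name__ == "__main__":
    for name, (soft, hard) in process_limits().items():
        # resource.RLIM_INFINITY means "unlimited".
        soft_s = "unlimited" if soft == resource.RLIM_INFINITY else soft
        hard_s = "unlimited" if hard == resource.RLIM_INFINITY else hard
        print(f"{name}: soft={soft_s} hard={hard_s}")
```

If the values seen inside the batch job match those on a node where the test passes, the limit is presumably being imposed elsewhere, for example by the batch system.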