Wednesday, March 26, 2008

The curse of 79...

Since the dawn of the Glasgow cluster we have been cursed with a low but persistent rate of Globus 79 errors. We never understood these well, but always believed that they were caused by confusion in the gatekeeper, where the X509 authentication seemed to suffer a race condition and get muddled between users.

However, even after upgrading to an SL4 CE and installing it on a different machine, these errors still crop up (an example).

The GOC Wiki suggests this can be caused by firewall trouble or an incorrect GLOBUS_TCP_PORT_RANGE. Now, this is (and was) correctly defined on both machines to be the standard 20000-25000. However, I have decided to change it to 50000-55000 in case we are tripping some generic nasty filter somewhere else on campus.
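
As a rough illustration of the change, here is a small Python sketch that parses a port range and checks that the old and new ranges do not overlap. The helper names (parse_port_range, ranges_overlap) are made up for this post, and the accepted separators (dash, comma or space) are an assumption rather than a statement of what Globus or the site configuration actually accepts.

```python
import re


def parse_port_range(value):
    """Parse 'MIN-MAX', 'MIN,MAX' or 'MIN MAX' into a (low, high) tuple.

    The set of separators is an assumption made for illustration only.
    """
    parts = re.split(r"[,\s-]+", value.strip())
    if len(parts) != 2:
        raise ValueError("expected two port numbers, got: %r" % value)
    low, high = (int(p) for p in parts)
    if not (0 < low <= high <= 65535):
        raise ValueError("invalid TCP port range: %r" % value)
    return low, high


def ranges_overlap(a, b):
    """Return True if two inclusive (low, high) ranges share any port."""
    return a[0] <= b[1] and b[0] <= a[1]


# Example with the two ranges mentioned in this post:
# old = parse_port_range("20000-25000")
# new = parse_port_range("50000-55000")
# ranges_overlap(old, new)
```

A check like this only confirms that the value is well formed; it says nothing about whether a firewall somewhere on the path lets those ports through.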

Since I made that change last night we haven't had a 79 error. However, this proves nothing so far, as we can easily go a week without one of these happening.

I also contacted the campus networking people to ask if there were any known port blocks in this range.
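
Independently of what the networking people say, a port block can be looked for from an offsite machine. The sketch below is a minimal Python probe, assuming a plain TCP connect is a good enough test: a port that refuses the connection is reachable but has no listener ("closed"), while one that times out is probably being dropped by a filter ("filtered"). The function names and the hostname ce.example.org are placeholders, not the real CE.

```python
import socket


def probe_port(host, port, timeout=2.0):
    """Classify one TCP port as 'open', 'closed' or 'filtered'."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect((host, port))
        return "open"        # something is listening and accepted us
    except ConnectionRefusedError:
        return "closed"      # reachable, but nothing is listening
    except OSError:
        # Timeout or another network error (e.g. host unreachable):
        # treated here as a likely filter, though it is not proof of one.
        return "filtered"
    finally:
        sock.close()


def summarise_range(host, low, high, timeout=2.0):
    """Count how many ports in [low, high] fall into each state."""
    counts = {"open": 0, "closed": 0, "filtered": 0}
    for port in range(low, high + 1):
        counts[probe_port(host, port, timeout)] += 1
    return counts


# Example (hypothetical hostname, small slice of the new range):
# summarise_range("ce.example.org", 50000, 50020)
```

Probing ports one at a time with a timeout is slow over a 5000-port range, so a dedicated scanner is the better tool for a full sweep; this is only meant to show the distinction being tested.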

5 comments:

Unknown said...

Consider unsetting GLOBUS_TCP_PORT_RANGE completely. The only place it needs to be set is for a service requiring callbacks. That means your GridFTP servers (including the one on your CE) and your UI, if and only if you want to use globus-job-run. Otherwise it does more harm than good, badly selecting source ports for you which may still be in an active state. Concerning your gatekeeper, I am also positive it does not need to be set, but keep a close eye on it once disabled.

Graeme Stewart said...

Interesting. So what port range will it use if this is unset? Just whatever it gets from the kernel, I suppose.

Since I moved the range we have not had an error. But if we get another one I will take your advice and remove this parameter.

Graeme Stewart said...

But wait - the gridftp server lives on the gatekeeper, so we do need it then?

We don't suffer any generic filtering at the campus level, so generally any callback port will be ok...

Andrew Elwell said...

"We don't suffer any generic filtering" ... or so we thought - nmap from offsite showed differently. Networks are 'investigating' Ho Hum....

Andrew Elwell said...
This comment has been removed by the author.