Wednesday, September 24, 2008

OpenDNS to the rescue

Glasgow, Edinburgh and Durham suffered SAM failures today because the ScotGrid BDII appeared to go AWOL. In fact the BDII itself was fine: the campus DNS servers were taking so long to respond that the LDAP queries timed out first.
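If you want to confirm that slow name resolution (rather than the service itself) is the culprit, timing a lookup is a quick check. Below is a minimal sketch, not what we actually ran: the hostname bdii.example.org and the 5 second timeout are made-up placeholders, and the helper names are hypothetical.

```python
import socket
import time


def time_lookup(hostname, resolver=socket.getaddrinfo):
    """Time one name lookup; returns (elapsed_seconds, error_or_None).

    `resolver` defaults to the system resolver and can be swapped for a
    stand-in when testing.
    """
    start = time.monotonic()
    error = None
    try:
        resolver(hostname, None)
    except socket.gaierror as exc:
        # A failed lookup still tells us how long the resolver took.
        error = exc
    return time.monotonic() - start, error


def is_too_slow(elapsed, client_timeout):
    """True if the lookup alone would use up the client's whole timeout."""
    return elapsed >= client_timeout


if __name__ == "__main__":
    # Hypothetical hostname and timeout; substitute your own.
    elapsed, error = time_lookup("bdii.example.org")
    print("lookup took %.3fs, error=%r" % (elapsed, error))
    print("too slow for a 5s client timeout:", is_too_slow(elapsed, 5.0))
```

If the lookup time is close to or above the client's timeout, the DNS servers are the thing to fix or replace.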

Cue one quick switchover to OpenDNS servers instead.

Worth scribbling on a sticky note: the two OpenDNS nameserver IPs are 208.67.222.222 and 208.67.220.220.

Update to the above:
OpenDNS doesn't return NXDOMAIN for non-existent domains such as .beowulf.cluster; it answers with the address of its own guide page instead. This can break your installer horribly (as we discovered at Glasgow) if it relies on failed lookups to work out which address is the right one.

However, as we're using dnsmasq, you can get round this by flagging the 'helpful' OpenDNS guide addresses as bogus:

i.e. set up your /etc/dnsmasq.conf like so:

no-resolv
server=208.67.222.222
server=208.67.220.220
bogus-nxdomain=208.69.34.132
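For anyone wondering what bogus-nxdomain does with a reply, here is a rough model of the rewrite in Python. It is only an illustration of the behaviour, not dnsmasq's actual code, and the function name is made up: any reply containing a listed address is turned into a "no such domain" reply.

```python
def apply_bogus_nxdomain(status, answers, bogus_addresses):
    """Model of the bogus-nxdomain rewrite (illustrative only).

    status          -- DNS status string from upstream, e.g. "NOERROR"
    answers         -- list of A-record addresses in the upstream reply
    bogus_addresses -- set of addresses listed under bogus-nxdomain
    """
    if any(addr in bogus_addresses for addr in answers):
        # Reply points at a known redirect page: report NXDOMAIN instead.
        return "NXDOMAIN", []
    # Anything else passes through unchanged.
    return status, list(answers)


if __name__ == "__main__":
    bogus = {"208.69.34.132"}
    # Upstream reply that points at the guide page address.
    print(apply_bogus_nxdomain("NOERROR", ["208.69.34.132"], bogus))
    # A reply with an unlisted address (192.0.2.10 is a documentation address).
    print(apply_bogus_nxdomain("NOERROR", ["192.0.2.10"], bogus))
```

If OpenDNS serves the guide page from more than one address, each one needs its own bogus-nxdomain line.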


With that config in place, a lookup through dnsmasq gives the expected NXDOMAIN:

svr031:~# dig www.flarble.co.uk

; <<>> DiG 9.2.4 <<>> www.flarble.co.uk
;; global options: printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 10483
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;www.flarble.co.uk. IN A

;; Query time: 105 msec
;; SERVER: 10.141.255.254#53(10.141.255.254)
;; WHEN: Fri Oct 31 09:52:00 2008
;; MSG SIZE rcvd: 35


compared to asking OpenDNS directly:
svr031:~# dig www.flarble.co.uk @208.67.222.222

; <<>> DiG 9.2.4 <<>> www.flarble.co.uk @208.67.222.222
; (1 server found)
;; global options: printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 24219
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;www.flarble.co.uk. IN A

;; ANSWER SECTION:
www.flarble.co.uk. 0 IN A 208.69.34.132

;; Query time: 11 msec
;; SERVER: 208.67.222.222#53(208.67.222.222)
;; WHEN: Fri Oct 31 09:52:13 2008
;; MSG SIZE rcvd: 51
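
To check this automatically (say, from a node health script), you can pull the status and any A records out of the dig output and compare the two cases. This is a minimal sketch with a hypothetical helper name; the sample strings are trimmed copies of the output above, and the parsing assumes dig's usual text layout.

```python
import re

# Trimmed copies of the two dig replies shown above.
VIA_DNSMASQ = """;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 10483
;; QUESTION SECTION:
;www.flarble.co.uk. IN A
"""

DIRECT_OPENDNS = """;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 24219
;; QUESTION SECTION:
;www.flarble.co.uk. IN A

;; ANSWER SECTION:
www.flarble.co.uk. 0 IN A 208.69.34.132

;; Query time: 11 msec
"""


def parse_dig(output):
    """Return (status, [A record addresses]) from dig's text output."""
    match = re.search(r"status: (\w+)", output)
    status = match.group(1) if match else None
    addresses = []
    in_answer = False
    for line in output.splitlines():
        if line.startswith(";; ANSWER SECTION:"):
            in_answer = True
            continue
        if in_answer:
            # A blank line or a new ';' comment line ends the section.
            if not line.strip() or line.startswith(";"):
                in_answer = False
                continue
            fields = line.split()  # name, TTL, class, type, data
            if len(fields) >= 5 and fields[3] == "A":
                addresses.append(fields[4])
    return status, addresses


if __name__ == "__main__":
    print(parse_dig(VIA_DNSMASQ))
    print(parse_dig(DIRECT_OPENDNS))
```

A name you know doesn't exist should come back as NXDOMAIN with no addresses through dnsmasq; a NOERROR reply with an address means a redirect has slipped through.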
