Wednesday, July 23, 2008

Supported VO Tweaks

I have enabled the 'gaussian' VO on the ScotGrid UIs at the request of a local user. This took a little longer than usual as the set-up is slightly unusual - we want to have VOMS and job submission correctly configured, but do not want to support the VO on our WMS (which we consider our fragile service).

YAIM 4 makes enabling a VO on a specific node type easy - we have an override stanza in services/glite-UI which adds gaussian (through 'VOS="$VOS gaussian"').

The tricky bit is the cfengine magic to redefine the WMS and the LB services for gaussian only, as YAIM sets all of these to be the same:

ui::
{ /opt/glite/etc/gaussian/glite_wms.conf
ReplaceFirst "https://svr022.gla.scotgrid.ac.uk:7443/glite_wms_wmproxy_server" With "https://rb1.cyf-kr.edu.pl:7443/glite_wms_wmproxy_server"
}
{ /opt/glite/etc/gaussian/glite_wmsui.conf
ReplaceAll "svr022.gla.scotgrid.ac.uk" With "lb.grid.cyf-kr.edu.pl"
}
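
As a rough illustration of what the two edits are meant to achieve, here is a small Python analogue. It is only a sketch: the sample text is made up for the example and is not the real layout of the gLite config files, the helper names are hypothetical, and cfengine treats the search string as a regular expression whereas this uses plain string replacement.

```python
# Sketch only: mimics the intent of the two cfengine edits on made-up sample text.
LOCAL_HOST = "svr022.gla.scotgrid.ac.uk"
LOCAL_WMPROXY = "https://svr022.gla.scotgrid.ac.uk:7443/glite_wms_wmproxy_server"
REMOTE_WMPROXY = "https://rb1.cyf-kr.edu.pl:7443/glite_wms_wmproxy_server"
REMOTE_LB = "lb.grid.cyf-kr.edu.pl"


def point_wms_elsewhere(text):
    """Swap the first occurrence of the local WMProxy endpoint for the remote one."""
    return text.replace(LOCAL_WMPROXY, REMOTE_WMPROXY, 1)


def point_lb_elsewhere(text):
    """Swap every occurrence of the local host name for the remote LB host."""
    return text.replace(LOCAL_HOST, REMOTE_LB)


def still_mentions_local_host(text):
    """Return True if the local host name survives anywhere in the text."""
    return LOCAL_HOST in text


if __name__ == "__main__":
    # Hypothetical file contents, just enough to exercise the helpers.
    sample_wms = "endpoint " + LOCAL_WMPROXY
    sample_ui = "lb " + LOCAL_HOST + " logging " + LOCAL_HOST
    print(point_wms_elsewhere(sample_wms))
    print(point_lb_elsewhere(sample_ui))
```

A check like still_mentions_local_host is a cheap way to confirm that nothing for gaussian still points at our own WMS once the edits have run.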

Gaussian are a special VO who only seem to exist to allow access to a commercial software package: "Gaussian VO enables use of commercial chemical package Gaussian on EGEE Grid".

Grids and licensed software still have some serious paradigm issues when you have to set up an entire VO to use a piece of software.

Nagios, nagios, where are you?

Nagios seems not to be sending us alarms properly on SAM test failures (we had 2 yesterday - one for an internal SAM problem and the other for an SRMv1 timeout issue).

Andrew is investigating.

DPM Upgrade

Mike and I upgraded the DPM yesterday to 1.6.10. There are no schema changes in 1.6.7->1.6.10, so the upgrade involves 'downtime' of about 20s. We didn't put this into the CIC portal, but I suppose in retrospect we should have declared an 'at risk' period.

We're still running the i386 version of DPM (on top of x86_64). At some point it would be desirable to upgrade to x86_64; however, as i386 works just fine and this will involve real downtime, there is no urgent pressure to do so.

Greig has noted that dpm-updatespace in 1.6.10 has a bug in it: https://gus.fzk.de/pages/ticket_details.php?ticket=38330.

Tuesday, July 15, 2008

Alert! Alert!

'twas the night before holidays, when all through the servers not a pager was stirring...


Hmm. Lulled into a false sense of security by the appearance of emails entitled [WLCG Nagios] alerting me about proxy expiry on the shared Nagios system, I foolishly thought all was well. However, we weren't getting any 'real' alerts from the system to the individual sites.

It turned out to be a configuration issue in /etc/nagios/uki-scotgrid-*/contacts.cfg.

We had

service_notification_options n
host_notification_options n
meaning no notifications were sent - changed this to

service_notification_options w,u,c,r,f
host_notification_options d,u,r,f,s

which means we get alerted on pretty much every state change. For more details see the manual.
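
For reference, the option letters can be spelled out with a few lines of Python. This is just a sketch based on the meanings given in the Nagios contact definition documentation; the table and function names are made up for the example.

```python
# Meanings of the notification option letters used in a Nagios contact definition.
SERVICE_OPTIONS = {
    "w": "warning",
    "u": "unknown",
    "c": "critical",
    "r": "recovery",
    "f": "flapping start/stop",
    "n": "none",
}

HOST_OPTIONS = {
    "d": "down",
    "u": "unreachable",
    "r": "recovery",
    "f": "flapping start/stop",
    "s": "scheduled downtime start/end",
    "n": "none",
}


def describe(options, table):
    """Translate a comma-separated option string into readable state names."""
    return [table[letter.strip()] for letter in options.split(",")]


if __name__ == "__main__":
    print("service:", describe("w,u,c,r,f", SERVICE_OPTIONS))
    print("host:   ", describe("d,u,r,f,s", HOST_OPTIONS))
    print("before: ", describe("n", SERVICE_OPTIONS))
```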

On a more annoying note: I left my MacBook PSU back in the UK and there's a limited number of Apple resellers here :-(

Friday, July 11, 2008

Space tokens aplenty!

Glasgow have deployed the ATLAS PROD, USER and GROUP disk space tokens, in line with the requirements of this GGUS ticket.

As we're running DPM, the procedure was fairly trivial, and is documented on the ScotGrid wiki.

We've also fixed and re-enabled the DPM information provider script, which Graeme reported broken in this blog posting.

An ldapsearch query shows that we're now advertising the new tokens correctly:


# atlas:ATLASPRODDISK:online, svr018.gla.scotgrid.ac.uk, resource, grid
dn: GlueSALocalID=atlas:ATLASPRODDISK:online,GlueSEUniqueID=svr018.gla.scotgrid.ac.uk,Mds-Vo-name=resource,o=grid
objectClass: GlueSATop
objectClass: GlueSA
objectClass: GlueSAPolicy
objectClass: GlueSAState
objectClass: GlueSAAccessControlBase
objectClass: GlueKey
objectClass: GlueSchemaVersion
GlueSARoot: atlas:/dpm/gla.scotgrid.ac.uk/home/atlas
GlueSAPath: /dpm/gla.scotgrid.ac.uk/home/atlas
GlueSAType: permanent
GlueSALocalID: atlas:ATLASPRODDISK:online
GlueSAName: Replica online storage for VO atlas
.
.
.
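
If you want to pull just the token names out of ldapsearch output like this, a few lines of Python will do. This is a minimal sketch with hypothetical function names: it assumes plain LDIF text, joins folded lines (LDIF continues a long line on the next one with a single leading space), and does not handle base64-encoded values.

```python
def unfold(ldif_text):
    """Join LDIF continuation lines (starting with one space) onto the previous line."""
    lines = []
    for raw in ldif_text.splitlines():
        if raw.startswith(" ") and lines:
            lines[-1] += raw[1:]
        else:
            lines.append(raw)
    return lines


def local_ids(ldif_text):
    """Return the GlueSALocalID attribute values found in the LDIF text."""
    prefix = "GlueSALocalID:"
    return [
        line[len(prefix):].strip()
        for line in unfold(ldif_text)
        if line.startswith(prefix)
    ]


if __name__ == "__main__":
    # Trimmed sample based on the entry above, with the dn folded as ldapsearch would.
    sample = (
        "dn: GlueSALocalID=atlas:ATLASPRODDISK:online,GlueSEUniqueID=svr018.gla.scotgri\n"
        " d.ac.uk,Mds-Vo-name=resource,o=grid\n"
        "objectClass: GlueSA\n"
        "GlueSAPath: /dpm/gla.scotgrid.ac.uk/home/atlas\n"
        "GlueSALocalID: atlas:ATLASPRODDISK:online\n"
    )
    print(local_ids(sample))
```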


All in all, a productive few minutes' work...I wonder how those with dCache are coping...