On Wednesday the 25th, the Glasgow Scotgrid site took part in the wider SSC5 Security Challenge. During the challenge we encountered several issues with the network security configuration on our core switch.
The configuration changes which caused issues were specifically:
1) Access List Configuration for inbound services
2) ICMP dos-control settings
The Access List Configuration (ACL) did not accept a global default permit with a wildcard mask for both IP address ranges and subnets. The key issue is that the Access List worked correctly when applied on an access port for inbound traffic. However, when applied to the primary egress port onto our network switch, it disabled remote connectivity into the cluster, while not impacting internal machine-to-machine traffic on the cluster. The access list was removed and remote access was restored. The root cause of this failure was traced to an incorrectly set ACL ANY permit within the list. On further investigation, however, each network requiring access to and from the cluster will need its own unique entry, rather than a default network range with a series of denied services. The central IT group at the University also run a series of access lists and firewalls within the edge routing and switching network to the JANET environment, which can be adapted to fit our requirements within the cluster setup at Glasgow.
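To illustrate the "one entry per network" approach described above, here is a minimal Python sketch of how an ordered access list is commonly evaluated. It is not our switch configuration: the wildcard (inverse) mask semantics, the first-match rule and the implicit deny at the end of the list are assumed conventions, and the networks shown are hypothetical documentation ranges rather than the real networks that access the cluster.

```python
import ipaddress

def wildcard_match(addr, base, wildcard):
    """Return True if addr matches base under an inverse (wildcard) mask.

    Assumed convention: a 1 bit in the wildcard means "don't care".
    """
    a = int(ipaddress.IPv4Address(addr))
    b = int(ipaddress.IPv4Address(base))
    w = int(ipaddress.IPv4Address(wildcard))
    care = ~w & 0xFFFFFFFF  # bits that must be equal
    return (a & care) == (b & care)

def evaluate(acl, src):
    """First matching entry wins; anything unmatched is denied (assumed)."""
    for action, base, wildcard in acl:
        if wildcard_match(src, base, wildcard):
            return action
    return "deny"

# Hypothetical per-network permits (documentation address ranges).
ACL = [
    ("permit", "192.0.2.0", "0.0.0.255"),
    ("permit", "198.51.100.0", "0.0.0.255"),
]

if __name__ == "__main__":
    for source in ("192.0.2.17", "198.51.100.200", "203.0.113.5"):
        print(source, evaluate(ACL, source))
```

Under these assumptions, a source address that is not covered by any explicit permit falls through to the implicit deny, which is consistent with remote access being lost when the ANY permit did not take effect.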
A secondary issue:
A dos-control setting which controls the maximum payload for ICMP also caused unusual network behaviour after it was implemented. Limiting the payload to 512 bytes caused Maui and Torque to encounter issues when attempting to communicate with one another, which in turn impacted other services within the cluster environment. While this slowed down Torque and Maui, it did not completely stop the cluster; however, removing the setting immediately improved data connectivity within the cluster. This issue is being referred back to the manufacturer, as the maximum payload size can presently only be increased to 1023 bytes.
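One way to check where an ICMP payload limit actually bites is to send pings with payload sizes just either side of it. The Python sketch below only builds the command lines for such a sweep and does not send anything. It assumes a Linux iputils-style ping, where -s sets the number of data bytes and -c the packet count; the host name is a placeholder, and the helper names are purely illustrative.

```python
def probe_sizes(limit, step=1):
    """Payload sizes just below, at, and just above a suspected limit."""
    return [limit - step, limit, limit + step]

def ping_command(host, payload_bytes, count=1):
    """Argument list for a single ping with a given ICMP payload size.

    Assumes iputils-style flags: -c (count) and -s (data bytes).
    """
    return ["ping", "-c", str(count), "-s", str(payload_bytes), host]

def build_sweep(host, limits=(512, 1023)):
    """Commands probing around each limit mentioned in the post."""
    commands = []
    for limit in limits:
        for size in probe_sizes(limit):
            commands.append(ping_command(host, size))
    return commands

if __name__ == "__main__":
    # "node.example.org" is a placeholder, not a real cluster host.
    for cmd in build_sweep("node.example.org"):
        print(" ".join(cmd))
```

Each command could then be run by hand, or passed to subprocess.run, from a machine on either side of the switch to see which sizes get replies.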
Once we have an update on this issue we will post it on the blog.