Monday, October 08, 2007

Lunch with glexec developers

We had a very interesting lunchtime meeting with the glexec developers
on Tuesday (organised thanks to Alessandra's efforts in the TCG to convince the developers that the sites had serious issues with glexec). The developers were very
happy to meet us and discuss the origins of glexec and why they thought
that it was needed. It was clear that no-one in the meeting is at all
keen on generic pilot job frameworks. However, what's also clear is
that the LHC VOs are going to insist on having them and it's highly
unlikely that sites would ever be able to exert enough influence on
them to stop. What we can do, though, is insist that if such
frameworks do exist, then they will have to use a glexec call and
respect its result. What glexec gives, in essence, is sudo-like
abilities, but integrated into an X509-based authorisation scheme (and
it uses LCAS/LCMAPS plugins, which we are familiar with). The
advantages to the sites are that (1) there is a lot of control as to
who can call glexec in the first place, e.g., restricting this only to
production roles; (2) using LCMAPS, individual users can be banned, whereas
without glexec the only option with misbehaving pilot jobs is to ban
the pilot user on the CE (which is tantamount to banning the whole
VO); (3) there's an audit trail of whose payload has been executed
(which is crucial). glexec has safeguards to stop multiple calls
(i.e., the payload recalling glexec).
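
To make "use a glexec call and respect its result" concrete, here is a minimal Python sketch of what a pilot would have to do. It is only an illustration: the wrapper command is passed in as a parameter, the function name is made up, and nothing here assumes glexec's real command line or environment. The one point it shows is that a refusal from the wrapper is final, with no fallback to running the payload under the pilot's own identity.

```python
import subprocess
import sys


def run_payload_via_wrapper(wrapper_cmd, payload_cmd):
    """Run the payload only through the identity-switching wrapper.

    wrapper_cmd is a hypothetical stand-in for a glexec invocation; the real
    command line is deliberately not assumed here. Returns True only if the
    wrapper (and the payload it launched) exited with status 0.
    """
    result = subprocess.run(list(wrapper_cmd) + list(payload_cmd))
    # Respect the result: a non-zero status means "not run successfully", and
    # the pilot must not retry the payload directly under its own identity.
    return result.returncode == 0


# Stand-in wrappers for demonstration only (these are not glexec).
DENY = [sys.executable, "-c", "import sys; sys.exit(1)"]
ALLOW = [sys.executable, "-c",
         "import subprocess, sys; sys.exit(subprocess.call(sys.argv[1:]))"]
PAYLOAD = [sys.executable, "-c", "pass"]

if __name__ == "__main__":
    print(run_payload_via_wrapper(ALLOW, PAYLOAD))
    print(run_payload_via_wrapper(DENY, PAYLOAD))
```

The important design choice is what is absent: there is no "except: run it anyway" branch, which is exactly the behaviour sites would want to insist on.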

Then, to suid or not? We are assured that glexec will be distributed
in two flavours - one with the suid bit switched on in the RPM, the
other with it switched off. Of course, post-installation, one can
easily flip the bit using cfengine. The danger of not enabling suid is
that it will be possible for the payload job to access the submission
proxy certificate (danger for the VO) and that sorting out the payload
from the pilot is harder at the process level (danger for the
site). Of course, this has to be balanced against the danger of enabling
suid and risking a possible avenue of privilege escalation should
glexec turn out to have a security problem.
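
For anyone who wants to check which flavour actually ended up on a worker node, the suid bit is easy to inspect and flip. The sketch below uses only Python's standard os and stat modules; the helper names are made up for illustration, and the install path of glexec is left as a parameter since it depends on the packaging.

```python
import os
import stat


def mode_has_setuid(mode):
    """Return True if the setuid bit is set in an st_mode value."""
    return bool(mode & stat.S_ISUID)


def mode_with_setuid(mode, enabled):
    """Return the permission bits of mode with the setuid bit forced on or off."""
    perms = stat.S_IMODE(mode)  # drop the file-type bits, keep permissions
    return (perms | stat.S_ISUID) if enabled else (perms & ~stat.S_ISUID)


def has_setuid(path):
    """Check the setuid bit of a file on disk."""
    return mode_has_setuid(os.stat(path).st_mode)


def set_setuid(path, enabled):
    """Flip the bit on disk; requires owning the file or being root."""
    os.chmod(path, mode_with_setuid(os.stat(path).st_mode, enabled))
```

In practice the same change is a one-liner with chmod or a cfengine rule; the point is only that the two RPM flavours differ by a single mode bit.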

I'm still convinced that for ScotGrid deploying glexec in non-suid
form is best. We can run it for a while and then evaluate the
situation further. It's clear that sites like ECDF will never allow
glexec to be suid for them - so running it in non-suid mode will
always be an option.

The security vulnerabilities that Kostas identified have all been
fixed - along with more potential problems they spotted when going
through the code. They are very happy for other people to look at the
code and feed problems back to them.

Overall I was very impressed at the developers' openness. I still
don't think generic pilots are a good idea for the grid, but in a
world where they exist glexec is a definite help.

1 comment:

Kostas Georgiou said...

IMHO there are major problems with the glexec implementation. It is a very bad idea to have a suid script that links to dozens of libraries (that were not even designed to be used in this way).

Take for example the LCAS/LCMAPS libraries: what do you think will happen if I do something like
"env LCMAPS_LOG_FILE=/etc/passwd glexec"
for example?

Of course this specific attack might not work but there will be other ones that will work.