Caching problems, ldap and rereading group memberships

Fredrik Luthander

Jun 11, 2010, 7:59:18 AM

to Repo and Gerrit Discussion

Hi people!

We're experiencing some issues with LDAP group memberships, specifically
with what happens when Gerrit discovers a new group membership for the
first time.

Scenario:
A user is in our LDAP directory, and Gerrit already knows about this
user. The user does not have any group memberships at this time, and so
has no access to gits or branches that require a group membership.

An administrator of a group decides the user should have access to the
gits that the group controls, and hence adds the user to the group - in
LDAP.

At this point the LDAP directory will report that the user is now a
member, but because Gerrit's caches are quite aggressive, Gerrit has not
yet realized this. The user can trigger Gerrit to reread this
information by logging out of Gerrit and then logging right back in
again. However, there is a slight problem after that, and that is the
slave server. The slave server still has not refreshed its knowledge
about the user, so even if the master now allows access to the gits,
the slave won't. So, we need the slaves to be notified that a particular
user is now stale in the cache and needs a reread from the db. It
becomes a large administrative burden to trigger a full cache flush
every time someone gets a group membership. Also, it's expensive, as the
slave loses all group membership caching and will need to reread
memberships for everyone, not only the user in question.

Any ideas or suggestions on the problem we're seeing? Any smart way to
get rid of the admin work around this? Is there anything I could file an
issue for that would help us get rid of this?

BR,
Fredrik

Shawn Pearce

Jun 11, 2010, 11:48:41 AM

to Fredrik Luthander, Repo and Gerrit Discussion

On Fri, Jun 11, 2010 at 04:59, Fredrik Luthander
<fredrik....@sonyericsson.com> wrote:
> At this point the LDAP directory will report that the user is now a
> member, but because Gerrit's caches are quite aggressive, Gerrit has not
> yet realized this. The user can trigger Gerrit to reread this
> information by logging out of Gerrit and then logging right back in
> again. However, there is a slight problem after that, and that is the
> slave server. The slave server still has not refreshed its knowledge
> about the user, so even if the master now allows access to the gits,
> the slave won't.

This is why the daemon manual page says to set the caches to a very
low maxAge, because they don't evict and refresh when operations occur
in the web UI.

> So, we need the slaves to be
> notified that a particular user is now stale in the cache and needs a
> reread from the db.

The idea behind the magic 'Gerrit Code Review' SSH user, and the
peer_keys file, was to allow the slaves to connect over SSH to the
master, and then start listening for cache flush events. Similar to
`gerrit stream-events`, only the slave would be listening for the
lower level events that get posted to the master's caches, allowing
the slaves to evict individual records just seconds behind the master.
I got started... but didn't finish the work.

This may not be that much more work to implement.

When starting up in slave mode, start a background thread that just
connects to the master in a loop. If the connection breaks, it just
keeps trying to reconnect until it's successful. The thread runs some
new `gerrit stream-cache-changes` command or something similar, which
allows the slave to learn about changes to the master's caches.
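
A rough sketch of what that slave-side loop could look like, in plain
Java. Everything named here is made up for illustration: `MasterSession`
stands in for "open an SSH connection to the master, run the streaming
command and block until the connection ends", and the fixed retry delay
is just an assumption. None of this is existing Gerrit code.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical: one blocking streaming session against the master. */
interface MasterSession {
  /** Returns or throws once the connection to the master has ended. */
  void streamUntilDisconnected() throws Exception;
}

/** Keeps (re)connecting to the master until stopped. */
final class ReconnectLoop implements Runnable {
  private final MasterSession session;
  private final long retryDelayMillis; // assumed fixed delay between attempts
  private final AtomicInteger attempts = new AtomicInteger();
  private volatile boolean running = true;

  ReconnectLoop(MasterSession session, long retryDelayMillis) {
    this.session = session;
    this.retryDelayMillis = retryDelayMillis;
  }

  int attempts() {
    return attempts.get();
  }

  void stop() {
    running = false;
  }

  @Override
  public void run() {
    while (running) {
      attempts.incrementAndGet();
      try {
        session.streamUntilDisconnected();
      } catch (InterruptedException e) {
        // Shutdown requested: restore the flag and leave the loop.
        Thread.currentThread().interrupt();
        return;
      } catch (Exception e) {
        // Connection failed or broke; fall through and try again.
      }
      if (!running) {
        break;
      }
      try {
        Thread.sleep(retryDelayMillis);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return;
      }
    }
  }
}
```

The slave would start this once at boot on a daemon thread, e.g.
`new Thread(loop, "cache-listener")` with `setDaemon(true)`. A real
version would probably also want to flush the local caches after every
reconnect, since events may have been missed while disconnected.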

On the master side, implement that new stream-cache-changes command
similar to how stream-events is done. An async IO queue pushing
events out to the connected clients, without needing to tie up a
thread per client on the server side.
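
To make the fan-out part concrete, here is a minimal sketch, assuming
each connected slave gets its own bounded queue and the publisher never
blocks. `CacheChangeBroadcaster` and the event strings are hypothetical,
and the actual async write of queued events to the SSH channel is not
shown. Dropping a client whose queue is full is my own assumption about
how to treat slow slaves, not something decided above.

```java
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical fan-out of cache change events to connected slaves. */
final class CacheChangeBroadcaster {
  private final List<BlockingQueue<String>> clients = new CopyOnWriteArrayList<>();
  private final int capacity;

  CacheChangeBroadcaster(int capacity) {
    this.capacity = capacity;
  }

  /** Registers a new client and returns the queue its events land in. */
  BlockingQueue<String> subscribe() {
    BlockingQueue<String> queue = new ArrayBlockingQueue<>(capacity);
    clients.add(queue);
    return queue;
  }

  void unsubscribe(BlockingQueue<String> queue) {
    clients.remove(queue);
  }

  int clientCount() {
    return clients.size();
  }

  /**
   * Offers the event to every client without blocking and returns how
   * many clients accepted it. A client whose queue is full is dropped,
   * on the assumption that it will reconnect and resync.
   */
  int publish(String event) {
    int delivered = 0;
    // CopyOnWriteArrayList iterates over a snapshot, so removing here is safe.
    for (BlockingQueue<String> queue : clients) {
      if (queue.offer(event)) {
        delivered++;
      } else {
        clients.remove(queue);
      }
    }
    return delivered;
  }
}
```

Because `publish` only does a non-blocking `offer`, the thread that
modifies the master's cache is never held up by a slow slave, and no
server thread has to sit waiting on any one client.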

We might then just need to add a listener to the underlying Ehcache
objects that allows us to learn about remove() and removeAll() events,
and send them out to listening clients.
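
As an illustration of the listener idea, here is a small stand-alone
sketch that does not use Ehcache at all. `RemovalListener` and
`ObservableCache` are hypothetical stand-ins; as far as I know Ehcache's
own `CacheEventListener` interface has comparable callbacks
(`notifyElementRemoved`, `notifyRemoveAll`), but the wiring below is only
a model of the behaviour, and the cache name in the usage note is just
an example.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical stand-in for a cache removal listener; not the Ehcache API. */
interface RemovalListener<K> {
  void onRemove(String cacheName, K key);

  void onRemoveAll(String cacheName);
}

/** Tiny cache that reports remove() and removeAll() to its listeners. */
final class ObservableCache<K, V> {
  private final String name;
  private final Map<K, V> entries = new ConcurrentHashMap<>();
  private final List<RemovalListener<K>> listeners = new CopyOnWriteArrayList<>();

  ObservableCache(String name) {
    this.name = name;
  }

  void addListener(RemovalListener<K> listener) {
    listeners.add(listener);
  }

  void put(K key, V value) {
    entries.put(key, value);
  }

  V get(K key) {
    return entries.get(key);
  }

  /** Notifies even if the key was absent here: a slave may still hold it. */
  void remove(K key) {
    entries.remove(key);
    for (RemovalListener<K> listener : listeners) {
      listener.onRemove(name, key);
    }
  }

  void removeAll() {
    entries.clear();
    for (RemovalListener<K> listener : listeners) {
      listener.onRemoveAll(name);
    }
  }
}
```

On the master, a listener registered on, say, the group membership cache
would turn each callback into one event line and hand it to whatever
pushes events out to the connected slaves; each slave then evicts the
same key (or everything) from its own copy of that cache.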