Gerrit working on Load balancing

Murali T

Feb 9, 2011, 10:02:08 PM

to Repo and Gerrit Discussion

hi,

At our company we're trying to set up Gerrit on 2 systems behind a load
balancer. We are not sure whether Gerrit supports LB.
We have 2 systems, git01 and git02, each running Gerrit individually. Now we
want to put them behind an LB sharing the same set of repos, so that http://git
requests can go to either git01 or git02 in a round-robin fashion.
Please let us know if this is something Gerrit supports. We want
to know your opinion before we try it out.
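For reference, the round-robin distribution described above amounts to something like the following. This is a hypothetical Java sketch (the class name is made up; git01/git02 are just the host names from the post), not Gerrit code or any particular load balancer's implementation. Note that plain round-robin has no notion of a backend being down.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical round-robin selector; not part of Gerrit or any specific load balancer.
final class RoundRobinBalancer {
    private final List<String> backends;
    private final AtomicInteger counter = new AtomicInteger();

    RoundRobinBalancer(List<String> backends) {
        if (backends.isEmpty()) {
            throw new IllegalArgumentException("need at least one backend");
        }
        this.backends = List.copyOf(backends);
    }

    // Returns the backend for the next request; thread-safe thanks to the atomic counter.
    String pick() {
        // floorMod keeps the index non-negative even after the int counter wraps around.
        int index = Math.floorMod(counter.getAndIncrement(), backends.size());
        return backends.get(index);
    }
}
```

With `new RoundRobinBalancer(List.of("git01", "git02"))`, successive `pick()` calls return git01, git02, git01, and so on, whether or not either server is actually healthy.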

thanks
-Murali T

Magnus Bäck

Feb 10, 2011, 1:51:43 AM

to Repo and Gerrit Discussion

On Thursday, February 10, 2011 at 04:02 CET,
Murali T <k12m...@gmail.com> wrote:

I don't see why Gerrit wouldn't support this, but are you actually
planning on distributing the load and have working fail-over or is
it just about round-robin? If just the latter, it would seem like
having two A records in DNS could work sufficiently well. If not,
how would you determine when a server is down and fail-over should
kick in? Chances are the server will still respond on port 80 even
though it's effectively bogged down and not useful to clients.
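Magnus's point is that "still responds on port 80" is a weak liveness signal. A hypothetical probe that instead requires a complete HTTP 200 response within a deadline could look like this; it uses only the JDK 11+ `java.net.http` client, and the class name and the choice of URL to probe are assumptions, not anything Gerrit or a specific load balancer provides.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical health probe: a server that accepts connections but answers
// too slowly (i.e. is "bogged down") is reported as unhealthy.
final class HealthProbe {
    private final HttpClient client;
    private final Duration timeout;

    HealthProbe(Duration timeout) {
        this.timeout = timeout;
        this.client = HttpClient.newBuilder()
                .version(HttpClient.Version.HTTP_1_1)
                .connectTimeout(timeout)
                .build();
    }

    // True only if a GET on the URL returns status 200 before the timeout expires.
    boolean isHealthy(URI url) {
        HttpRequest request = HttpRequest.newBuilder(url).timeout(timeout).GET().build();
        try {
            HttpResponse<Void> response =
                    client.send(request, HttpResponse.BodyHandlers.discarding());
            return response.statusCode() == 200;
        } catch (java.io.IOException e) {
            // Connection refused/reset, or HttpTimeoutException (a subclass of IOException).
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

A load balancer or failover script would call `isHealthy` periodically and only route traffic to backends that pass.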

--
Magnus Bäck Opinions are my own and do not necessarily
SW Configuration Manager represent the ones of my employer, etc.
Sony Ericsson

Ted

Feb 10, 2011, 1:54:27 AM

to Murali T, Repo and Gerrit Discussion

So that would kind of imply you're sharing the Git file system and the database between the two Gerrit instances?

Gerrit has caches, doesn't it? I can't remember if they were for the database or for Git, but wouldn't that cause a caching problem?

--
Ted.

Magnus Bäck

Feb 10, 2011, 2:49:02 AM

to Repo and Gerrit Discussion

On Thursday, February 10, 2011 at 07:54 CET,
Ted <r6squ...@gmail.com> wrote:

> On Thu, Feb 10, 2011 at 2:02 PM, Murali T <k12m...@gmail.com> wrote:
>
> > At our company we're trying to set up Gerrit on 2 systems behind a
> > load balancer. We are not sure whether Gerrit supports LB.
> > We have 2 systems, git01 and git02, each running Gerrit individually. Now
> > we want to put them behind an LB sharing the same set of repos, so that
> > http://git requests can go to either git01 or git02 in a round-robin fashion.
> > Please let us know if this is something Gerrit supports. We
> > want to know your opinion before we try it out.
>

> So that would kind of imply you're sharing the Git file system and
> the database between the two Gerrit instances?

Sharing the git storage file system between servers would still leave
you with a single point of failure (which might not be a problem if
you're only after the performance boost). The typical way of managing
read-only slave servers is configuring the master server -- which of
course could be part of the round-robin setup -- to push changes to
the other slaves, making the slaves independent (git-wise at least).

> Gerrit has caches, doesn't it? I can't remember if they were for the
> database or for Git, but wouldn't that cause a caching problem?

Yes, Gerrit has caches for the database lookups, but if the database
connection dies you'll eventually run into problems anyway. That assumes
you run Gerrit on the read-only slaves; if you don't need
authentication and just run the Git daemon to sync the code, you're
obviously not dependent on the Gerrit database.

Ted

Feb 10, 2011, 3:45:31 AM

to Repo and Gerrit Discussion

On Thu, Feb 10, 2011 at 6:49 PM, Magnus Bäck <magnu...@sonyericsson.com> wrote:

> On Thursday, February 10, 2011 at 07:54 CET,
> Ted <r6squ...@gmail.com> wrote:
>
> > On Thu, Feb 10, 2011 at 2:02 PM, Murali T <k12m...@gmail.com> wrote:
> >
> > > At our company we're trying to set up Gerrit on 2 systems behind a
> > > load balancer. We are not sure whether Gerrit supports LB.
> > > We have 2 systems, git01 and git02, each running Gerrit individually. Now
> > > we want to put them behind an LB sharing the same set of repos, so that
> > > http://git requests can go to either git01 or git02 in a round-robin fashion.
> > > Please let us know if this is something Gerrit supports. We
> > > want to know your opinion before we try it out.
> >
>
> > So that would kind of imply you're sharing the Git file system and
> > the database between the two Gerrit instances?
>
> Sharing the git storage file system between servers would still leave
> you with a single point of failure (which might not be a problem if
> you're only after the performance boost). The typical way of managing
> read-only slave servers is configuring the master server -- which of
> course could be part of the round-robin setup -- to push changes to
> the other slaves, making the slaves independent (git-wise at least).

Yes, I agree, but his original post did say the "same set of repos" and that it was going to be set up in a round-robin format. That would imply neither server is a read-only slave or a cold failover; it seems to imply they are both active at the exact same time.

> > Gerrit has caches, doesn't it? I can't remember if they were for the
> > database or for Git, but wouldn't that cause a caching problem?
>
> Yes, Gerrit has caches for the database lookups, but if the database
> connection dies you'll eventually run into problems anyway. That assumes
> you run Gerrit on the read-only slaves; if you don't need
> authentication and just run the Git daemon to sync the code, you're
> obviously not dependent on the Gerrit database.

Maybe I misread his original posting, but I got the impression he was going to use both servers actively at the same time for the same purpose, which would require authentication on both.

I guess the question for the original poster is: what was the original objective? Are you actually running into a performance limit on one server, or are you trying to set up full redundancy with no single point of failure (which of course would include the load balancer)?

--
Ted.

Murali T

Feb 10, 2011, 2:24:32 PM

to Repo and Gerrit Discussion

Thanks, everyone, for your quick responses!

On Feb 10, 12:45 am, Ted <r6squee...@gmail.com> wrote:
> On Thu, Feb 10, 2011 at 6:49 PM, Magnus Bäck
> <magnus.b...@sonyericsson.com> wrote:
> > On Thursday, February 10, 2011 at 07:54 CET,
> > Ted <r6squee...@gmail.com> wrote:

Ans: Yes, the intention is to have both systems active at the same
time. The LB (http://git) will listen on port 80 and forward
requests to either of the Gerrit servers (git01, git02) waiting on
8088->29418. The backend DB will be MySQL running on a separate system.
Both servers share the same NetApp storage (/git/repos) and review_site.

> I guess the question for the original poster is: what was the
> original objective? Are you actually running into a performance limit on one
> server, or are you trying to set up full redundancy with no single point of
> failure (which of course would include the load balancer)?

Ans: The objective is high availability of the application. We haven't
looked into the performance angle yet; this is still a POC, and we want
to know whether Gerrit supports the above architecture.

> --
> Ted.

Martin Fick

Feb 10, 2011, 2:38:18 PM

to repo-d...@googlegroups.com, Murali T

On Thursday 10 February 2011 12:24:32 pm Murali T wrote:
> Ans: The objective is high availability of the
> application. We haven't looked into the performance angle
> yet; this is still a POC, and we want to know whether
> Gerrit supports the above architecture.

I do not believe that Gerrit currently supports this mode of
operation. Perhaps with some effort, it could be made to
work that way. As pointed out earlier, you would have to
start by eliminating the caching mechanisms. Likely there
are other concurrency issues also, but probably not anything
insurmountable.

The question is, how quick of a failover do you need, and
how much performance degradation would you see? I suspect
that the degradation would not be worth it. Since you
already have shared storage, it would likely be better to
simply failover to a new instance of Gerrit on a new
machine.

A load balancing front end is for performance more than
for HA, and since performance will likely drop, load
balancing is likely useless. For HA, failover is fine, and
Gerrit can operate that way with the help of something like
heartbeat/pacemaker.
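A toy model of the active/passive approach: only one Gerrit instance serves at a time, so the in-memory caches can never disagree, and the standby is promoted after a number of consecutive failed health checks. This is a hypothetical Java sketch of the decision logic only, NOT heartbeat/pacemaker's API or configuration; real cluster managers such as pacemaker also fence the failed node before starting the service elsewhere, which this omits. The threshold value is an assumption.

```java
import java.util.function.Predicate;

// Toy model of active/passive failover; NOT heartbeat/pacemaker's API or config.
final class FailoverMonitor {
    private final int threshold;
    private String active;
    private String standby;
    private int consecutiveFailures;

    FailoverMonitor(String active, String standby, int threshold) {
        if (threshold < 1) {
            throw new IllegalArgumentException("threshold must be >= 1");
        }
        this.active = active;
        this.standby = standby;
        this.threshold = threshold;
    }

    // Runs one health check of the active node and returns the node that should serve next.
    String tick(Predicate<String> isHealthy) {
        if (isHealthy.test(active)) {
            consecutiveFailures = 0; // one good probe resets the counter
        } else if (++consecutiveFailures >= threshold) {
            // Promote the standby; the failed node becomes the (to-be-repaired) standby.
            String failed = active;
            active = standby;
            standby = failed;
            consecutiveFailures = 0;
        }
        return active;
    }
}
```

Requiring several consecutive failures trades a slower failover for fewer spurious switches caused by a single slow response.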

-Martin

Murali T

Feb 11, 2011, 2:45:41 PM

to Repo and Gerrit Discussion

We did some testing, and git01 and git02 showed different profiles for the
same user (SSH keys were submitted using git01, and when logged into
git02, no SSH keys show up under my profile; the same goes for some code
review records). Your assumption is correct: Gerrit is using some kind of
cache locally. I was under the impression that sharing review_site would
share the cache also. It looks like Gerrit maintains a local cache
somewhere. Do you guys have any idea where this local cache is? If
we configure it to be shared, would it work?

On another note, can we have multiple Gerrit instances sharing the same
MySQL database?

Martin Fick

Feb 11, 2011, 3:16:05 PM

to repo-d...@googlegroups.com, Murali T

On Friday 11 February 2011 12:45:41 pm Murali T wrote:
> We did some testing, and git01 and git02 showed different
> profiles for the same user (SSH keys were submitted
> using git01, and when logged into git02, no SSH keys show
> up under my profile; the same goes for some code review records).
> Your assumption is correct: Gerrit is using some kind of
> cache locally. I was under the impression that sharing
> review_site would share the cache also. It looks like Gerrit
> maintains a local cache somewhere. Do you guys have any
> idea where this local cache is? If we configure it to
> be shared, would it work?

The cache is an in-memory cache, so no, I don't think there
is a way to share it. But if you restart 02 after changing
a user on 01, 02 should reflect that change. Interfaces such as
the one in AccountCache.java control the caching; if you
muck with the implementations and create a pass-through one,
it might be a start.
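The symptom Murali saw, and what a pass-through implementation changes, can be reproduced with a small model. Everything here is hypothetical: `SharedDb` is an in-memory stand-in for the shared MySQL database, and `SshKeyCache` is a simplified interface, NOT the real signature of Gerrit's AccountCache.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Stand-in for the shared MySQL database (hypothetical; kept in memory here).
final class SharedDb {
    private final Map<String, String> sshKeys = new ConcurrentHashMap<>();

    String loadSshKey(String user) { return sshKeys.get(user); }

    void saveSshKey(String user, String key) { sshKeys.put(user, key); }
}

// Simplified cache interface; NOT the real signature of Gerrit's AccountCache.
interface SshKeyCache {
    String get(String user);

    void put(String user, String key);
}

// Per-server in-memory cache: fast, but blind to writes made by another server.
final class LocalSshKeyCache implements SshKeyCache {
    private final SharedDb db;
    // Optional lets us cache "no key" too (ConcurrentHashMap rejects null values).
    private final Map<String, Optional<String>> local = new ConcurrentHashMap<>();

    LocalSshKeyCache(SharedDb db) { this.db = db; }

    @Override
    public String get(String user) {
        return local.computeIfAbsent(user, u -> Optional.ofNullable(db.loadSshKey(u))).orElse(null);
    }

    @Override
    public void put(String user, String key) {
        db.saveSshKey(user, key);
        local.put(user, Optional.ofNullable(key)); // only THIS server's copy is refreshed
    }
}

// Pass-through "cache": every call goes to the database, so servers always agree.
final class PassThroughSshKeyCache implements SshKeyCache {
    private final SharedDb db;

    PassThroughSshKeyCache(SharedDb db) { this.db = db; }

    @Override
    public String get(String user) { return db.loadSshKey(user); }

    @Override
    public void put(String user, String key) { db.saveSshKey(user, key); }
}
```

With two `LocalSshKeyCache` instances over one `SharedDb`, a key saved through the first stays invisible to the second once it has cached "no key"; the pass-through version always agrees, at the price of a database query on every lookup.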

> On another note, can we have multiple Gerrit instances
> sharing the same MySQL database?

I don't know. There might be some Gerrit DB operations which,
if read from the DB partway through, would confuse the other
server, but I can't think of where. I know that some DB
operations are not atomic (gwtorm does not have
transactions). There are, I believe, some recovery checks
that get done on Gerrit startup which fix most of the known
potential interrupted operation problems. So, you would
have to understand those, and somehow code the server to be
able to avoid synchronization problems for these operations.

Another thing I suspect could also be problematic is
the legacy creation of IDs for Changes.

Likely only Shawn would know the full extent of potential
gotchas.

-Martin

Shawn Pearce

Feb 11, 2011, 11:12:10 PM

to Martin Fick, repo-d...@googlegroups.com, Murali T

On Fri, Feb 11, 2011 at 15:16, Martin Fick <mf...@codeaurora.org> wrote:
>
> The cache is an in-memory cache, so no, I don't think there
> is a way to share it. But if you restart 02 after changing
> a user on 01, 02 should reflect that change. Interfaces such as
> the one in AccountCache.java control the caching; if you
> muck with the implementations and create a pass-through one,
> it might be a start.

Exactly correct.

>> On another note, can we have multiple Gerrit instances
>> sharing the same MySQL database?
>
> I don't know. There might be some Gerrit DB operations which,
> if read from the DB partway through, would confuse the other
> server, but I can't think of where. I know that some DB
> operations are not atomic (gwtorm does not have
> transactions).

This is (I believe) true. The DB operations are executed in an order
that is safe if they are read dirty by another server.

> There are I believe some recovery checks
> that get done on Gerrit startup which fix most of the known
> potential interrupted operation problems. So, you would
> have to understand those, and somehow code the server to be
> able to avoid synchronization problems for these operations.

This is the problem. Actually, it's not so much the startup as it is
the change merge queue. When a user clicks a button to submit a change
to a branch, Gerrit takes a lock internally in memory to ensure it's
more likely to win the lock on disk when it goes to update the Git
branch. It then proceeds to do the submit into the branch. If the
submit fails, it may reschedule to retry the submit later, or simply
fail hard back to the user. Lock failures (because the other server
actually won the Git reference lock) fail hard back to the user and
don't retry.
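The two-step locking described above can be modelled to show why it breaks with two servers. This is a hypothetical sketch, NOT Gerrit's ChangeMergeQueue/MergeOp code: each `SubmitLocker` stands for one server, the in-memory lock map is per instance (per JVM), and the on-disk lock is a `<ref>.lock` file created with `Files.createFile`, which atomically fails if the file already exists.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical model of the submit path; NOT Gerrit's ChangeMergeQueue/MergeOp code.
final class SubmitLocker {
    private final Path gitDir; // repository directory on the (possibly shared) disk
    private final ConcurrentHashMap<String, ReentrantLock> memoryLocks = new ConcurrentHashMap<>();

    SubmitLocker(Path gitDir) {
        this.gitDir = gitDir;
    }

    // Returns false ("fail hard", no retry) if the on-disk ref lock is already held.
    boolean submit(String ref, Runnable updateRef) {
        ReentrantLock memoryLock = memoryLocks.computeIfAbsent(ref, r -> new ReentrantLock());
        memoryLock.lock(); // 1) serialises threads inside THIS JVM only
        try {
            Path lockFile = gitDir.resolve(ref + ".lock");
            Files.createDirectories(lockFile.getParent());
            try {
                Files.createFile(lockFile); // 2) atomic create-if-absent on disk
            } catch (FileAlreadyExistsException e) {
                return false; // another server (or an admin) won the disk lock
            }
            try {
                updateRef.run();
                return true;
            } finally {
                Files.deleteIfExists(lockFile);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            memoryLock.unlock();
        }
    }
}
```

With one instance, step 1 makes step 2 almost certain to succeed. With two instances over the same directory, the second instance's in-memory lock is a different object, so it sails through step 1 and only fails at step 2, which is the hard failure Shawn describes.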

This was an optimization hack that I took: if we have only one server
but multiple threads within it, it's cheaper to take an in-memory lock
before taking the disk lock, and then we are pretty certain we will
win the disk lock (unless the Gerrit administrator is mucking about
with Git at the same time, and this is very unlikely). Unfortunately
it fails for this case of multiple Gerrit web servers answering submit
requests against the same database.

ChangeMergeQueue / MergeOp might need to be reworked to take the lock
in Git first, but this is perhaps not trivial because you need to make
sure the .lock file is cleaned up if anything aborts for any reason.
Also, JGit doesn't actually expose the locking protocol to you, so if
you take the same lock as JGit, JGit cannot actually make the update.
:-)
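One possible shape for a "lock on disk first" submit, sketched under loud assumptions: it uses a separate, made-up lock file (`<ref>.submit-lock`) so that JGit can still take the ref's own `.lock`, it only works if every Gerrit server honours that file (an admin running git by hand would not), and the bounded retry with linear backoff is an illustrative choice. It is NOT existing Gerrit or JGit code.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical "disk lock first" submit; NOT existing Gerrit or JGit code.
final class DiskFirstSubmitter {
    private final Path gitDir;
    private final int maxAttempts;
    private final long backoffMillis;

    DiskFirstSubmitter(Path gitDir, int maxAttempts, long backoffMillis) {
        this.gitDir = gitDir;
        this.maxAttempts = maxAttempts;
        this.backoffMillis = backoffMillis;
    }

    // Runs updateRef while holding a cross-server lock; false if it was never acquired.
    boolean submit(String ref, Runnable updateRef) {
        // Separate (made-up) lock name, so JGit can still take "<ref>.lock" itself.
        Path lockFile = gitDir.resolve(ref + ".submit-lock");
        try {
            Files.createDirectories(lockFile.getParent());
            for (int attempt = 1; attempt <= maxAttempts; attempt++) {
                try {
                    Files.createFile(lockFile); // atomic: only one server can succeed
                } catch (FileAlreadyExistsException busy) {
                    if (attempt < maxAttempts) {
                        Thread.sleep(backoffMillis * attempt); // linear backoff before retrying
                    }
                    continue;
                }
                try {
                    updateRef.run();
                    return true;
                } finally {
                    // Always clean up, even if the update throws. A hard crash
                    // (kill -9, power loss) can still leave a stale lock behind.
                    Files.deleteIfExists(lockFile);
                }
            }
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

The `finally` covers exceptions inside the JVM, but not a process that dies while holding the file, which is exactly the cleanup problem that makes this rework non-trivial.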

> Another thing I suspect could also be problematic is
> the legacy creation of IDs for Changes.

Nope. These are generated by the database's sequential number
generator, and that is safe.

> Likely only Shawn would know the full extent of potential
> gotchas.

The in-memory caches and the merge queue are the major ones.

I'm not sure this configuration buys you much in terms of performance
or availability. The server really hammers the git filesystem, and
that is one of the bottlenecks. Instead of cheap local disk,
it's now remote disk that is shared with another server that is also
(potentially) hammering away at it. And the caches in-memory are there
to avoid "expensive" database queries when the database is on the same
machine as Gerrit. Removing these caches so the servers stay in-sync
and having them always query a remote MySQL server is likely going to
slow down user response time, not improve it.

I mostly agree with Magnus Bäck's earlier message in this thread about
using a slave server for reads, and having only one master, and Martin
Fick's comments about using Linux HA tools to implement reasonably
quick fail-over from the one front-end to the other if there is a
failure. FWIW, I don't usually see the Gerrit daemon itself fail; the
database or filesystem is almost as likely to fail as the Gerrit
daemon is. And the load-balanced server pair described in this message
doesn't seem to be making those any more reliable than if they were on
the same system as the Gerrit daemon.

Anatol Pomazau

Feb 15, 2011, 12:48:26 PM

to Martin Fick, repo-d...@googlegroups.com, Murali T

On Fri, Feb 11, 2011 at 12:16 PM, Martin Fick <mf...@codeaurora.org> wrote:

> On Friday 11 February 2011 12:45:41 pm Murali T wrote:
> > We did some testing, and git01 and git02 showed different
> > profiles for the same user (SSH keys were submitted
> > using git01, and when logged into git02, no SSH keys show
> > up under my profile; the same goes for some code review records).
> > Your assumption is correct: Gerrit is using some kind of
> > cache locally. I was under the impression that sharing
> > review_site would share the cache also. It looks like Gerrit
> > maintains a local cache somewhere. Do you guys have any
> > idea where this local cache is? If we configure it to
> > be shared, would it work?
>
> The cache is an in-memory cache, so no, I don't think there
> is a way to share it.

A distributed cache should help in this situation. Gerrit uses ehcache, and it has several "distributed replication" adapters. See http://ehcache.org/documentation/distributed_caching.html for more info.
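One common scheme behind replicated caches is invalidation: a write goes to the shared database, and every peer is told to drop its copy so its next read reloads. The in-process sketch below only illustrates that idea; it is NOT the ehcache API, the class and method names are made up, and whether Gerrit's cache setup can be switched to ehcache replication is not established in this thread. It also does nothing about the merge-queue locking issue.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.BiConsumer;
import java.util.function.Function;

// In-process illustration of invalidation-based replication; NOT the ehcache API.
final class InvalidatingCache<K, V> {
    private final Map<K, Optional<V>> local = new ConcurrentHashMap<>();
    private final List<InvalidatingCache<K, V>> peers = new CopyOnWriteArrayList<>();
    private final Function<K, V> loadFromDb;
    private final BiConsumer<K, V> writeToDb;

    InvalidatingCache(Function<K, V> loadFromDb, BiConsumer<K, V> writeToDb) {
        this.loadFromDb = loadFromDb;
        this.writeToDb = writeToDb;
    }

    // Registers two nodes as peers of each other.
    void connect(InvalidatingCache<K, V> peer) {
        peers.add(peer);
        peer.peers.add(this);
    }

    V get(K key) {
        return local.computeIfAbsent(key, k -> Optional.ofNullable(loadFromDb.apply(k))).orElse(null);
    }

    void put(K key, V value) {
        writeToDb.accept(key, value); // 1) write through to the shared database
        local.remove(key);            // 2) drop our own copy
        for (InvalidatingCache<K, V> peer : peers) {
            peer.evict(key);          // 3) in a real cluster this is a network message
        }
    }

    void evict(K key) {
        local.remove(key);
    }
}
```

Here eviction is synchronous; across real machines the evict message travels over the network, so a peer can briefly serve a stale entry until it arrives.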
