I don't see why Gerrit wouldn't support this, but are you actually
planning on distributing the load and have working fail-over or is
it just about round-robin? If just the latter, it would seem like
having two A records in DNS could work sufficiently well. If not,
how would you determine when a server is down and fail-over should
kick in? Chances are the server will still respond on port 80 even
though it's effectively bogged down and not useful to clients.
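One way to make "down" mean more than "port 80 answers" is to time a real request and to require several bad probes in a row before failing over. The following is only a minimal sketch under assumptions: `probe_backend`, `BackendHealth`, and the thresholds are hypothetical names and values, not part of Gerrit or of any particular load balancer.

```python
import time
from typing import Callable


def probe_backend(request: Callable[[], None], max_seconds: float) -> bool:
    """Return True only if `request` finishes without error within max_seconds.

    `request` is a hypothetical callable that performs one real operation
    against the server (for example fetching a page). It must enforce its
    own network timeout; this function only measures elapsed time.
    """
    start = time.monotonic()
    try:
        request()
    except Exception:
        return False
    return (time.monotonic() - start) <= max_seconds


class BackendHealth:
    """Counts consecutive failed probes before declaring a backend down."""

    def __init__(self, failures_to_mark_down: int = 3) -> None:
        self.failures_to_mark_down = failures_to_mark_down
        self.consecutive_failures = 0

    def record(self, probe_ok: bool) -> bool:
        """Record one probe result; return True while the backend counts as up."""
        if probe_ok:
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1
        return self.consecutive_failures < self.failures_to_mark_down
```

A slow but technically reachable server fails the probe here, which is the case a plain TCP check on port 80 would miss.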
--
Magnus Bäck                   Opinions are my own and do not necessarily
SW Configuration Manager      represent the ones of my employer, etc.
Sony Ericsson
--
To unsubscribe, email repo-discuss...@googlegroups.com
More info at http://groups.google.com/group/repo-discuss?hl=en
On Thursday, February 10, 2011 at 07:54 CET,
Ted <r6squ...@gmail.com> wrote:
> On Thu, Feb 10, 2011 at 2:02 PM, Murali T <k12m...@gmail.com> wrote:
>
> > At our company we're trying to setup Gerrit on 2 systems behind a
> > load balancer. We are not sure whether Gerrit supports LB.
> > We've 2 systems, git01, git02 running Gerrit individually..now we
> > want to put a LB sharing same set of repos, so http://git request
> > can go to either git01 or 02 on a round-robin fashion.
> > Please let us know if this is something supported by Gerrit. We
> > want to know your opinion before tryout.
>
> so that would kind of imply you're sharing a file system git and
> database between the two gerrit instances?
Sharing the git storage file system between servers would still leave
you with a single point of failure (which might not be a problem if
you're only after the performance boost). The typical way of managing
read-only slave servers is configuring the master server -- which of
course could be part of the round-robin setup -- to push changes to
the other slaves, making the slaves independent (git-wise at least).
> doesn't gerrit has caches right? I can't remember if it was for the
> database or the git's but wouldn't that cause a caching problem?
Yes, Gerrit has caches for the database lookups, but if the database
connection dies you'll eventually run into problems anyway. Assuming
you run Gerrit on the read-only slaves -- if you don't need
authentication and just run the Git daemon to sync the code you're
obviously not dependent on the Gerrit database.
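To illustrate the "master pushes to the slaves" idea in the message above, here is a small sketch that only builds the `git push --mirror` commands, one per slave. It is an illustration under assumptions: the function name, repository path, and URLs are hypothetical, and Gerrit's own replication support would normally do this job.

```python
from typing import List, Sequence


def mirror_push_commands(git_dir: str, slave_urls: Sequence[str]) -> List[List[str]]:
    """Build one `git push --mirror` command per slave URL.

    Nothing is executed here; the caller decides when to run each command.
    `git_dir` and `slave_urls` are placeholders supplied by the caller.
    """
    return [
        ["git", "--git-dir=" + git_dir, "push", "--mirror", url]
        for url in slave_urls
    ]
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` after an update on the master, so that every slave holds its own independent copy of the repository.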
I do not believe that Gerrit currently supports this mode of
operation. Perhaps with some effort, it could be made to
work that way. As pointed out earlier, you would have to
start by eliminating the caching mechanisms. Likely there
are other concurrency issues also, but probably not anything
unsurmountable.
The question is, how quick of a failover do you need, and
how much performance degradation would you see? I suspect
that the degradation would not be worth it. Since you
already have shared storage, it would likely be better to
simply failover to a new instance of Gerrit on a new
machine.
A load balancing front end is for performance more than
for HA, and since performance will likely drop, load
balancing is likely useless. For HA, failover is fine, and
Gerrit can operate that way with the help of something like
heartbeat/pacemaker.
-Martin
The cache is an in-memory cache, so no, I don't think there
is a way to share it. But if you restart 02 after changing
a user on 01, 02 should reflect that change. Files like the
AccountCache.java interface control the caching; if you
muck with the implementations and create a pass-through one,
it might be a start.
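The stale-profile symptom and the pass-through idea can be shown with a toy model. This is a language-neutral sketch written in Python, not Gerrit's Java code: `Database`, `InMemoryAccountCache`, and `PassThroughAccountCache` are hypothetical stand-ins and do not mirror the real AccountCache interface.

```python
from typing import Dict, Optional


class Database:
    """Stand-in for the shared database: account id -> SSH key."""

    def __init__(self) -> None:
        self.ssh_keys: Dict[int, str] = {}


class InMemoryAccountCache:
    """Per-server cache that remembers the first answer it saw, even 'no key'."""

    def __init__(self, db: Database) -> None:
        self._db = db
        self._entries: Dict[int, Optional[str]] = {}

    def get(self, account_id: int) -> Optional[str]:
        if account_id not in self._entries:
            self._entries[account_id] = self._db.ssh_keys.get(account_id)
        return self._entries[account_id]

    def put(self, account_id: int, key: str) -> None:
        # Updates the database and this server's cache only.
        self._db.ssh_keys[account_id] = key
        self._entries[account_id] = key


class PassThroughAccountCache:
    """Same interface, but every call goes straight to the database."""

    def __init__(self, db: Database) -> None:
        self._db = db

    def get(self, account_id: int) -> Optional[str]:
        return self._db.ssh_keys.get(account_id)

    def put(self, account_id: int, key: str) -> None:
        self._db.ssh_keys[account_id] = key
```

With two `InMemoryAccountCache` instances on one `Database`, a key stored through the first stays invisible to the second if the second had already looked the account up; a fresh instance (a restart) or the pass-through variant sees it, at the cost of a database query on every call.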
> On another note, can we have multiple Gerrit instances
> sharing same mysql database?
I don't know, there might be some Gerrit db operations which
if read from the DB partway through, would confuse the other
server, but I can't think of where. I know that some db
operations are not atomic (gwtorm does not have
transactions). There are I believe some recovery checks
that get done on Gerrit startup which fix most of the known
potential interrupted operation problems. So, you would
have to understand those, and somehow code the server to be
able to avoid synchronization problems for these operations.
Another thing I would suspect could be problematic also is
perhaps the legacy creation of ids for Changes.
Likely only Shawn would know the full extent of potential
gotchas.
-Martin
Exactly correct.
>> On another note, can we have multiple Gerrit instances
>> sharing same mysql database?
>
> I don't know, there might be some Gerrit db operations which
> if read from the DB partway through, would confuse the other
> server, but I can't think of where. I know that some db
> operations are not atomic (gwtorm does not have
> transactions).
This is (I believe) true. The order in which DB operations are executed
is safe if they are read dirty by another server.
> There are I believe some recovery checks
> that get done on Gerrit startup which fix most of the known
> potential interrupted operation problems. So, you would
> have to understand those, and somehow code the server to be
> able to avoid synchronization problems for these operations.
This is the problem. Actually, it's not so much the startup as it is
the change merge queue. When a user clicks a button to submit a change
to a branch, Gerrit takes a lock internally in memory to ensure it's
more likely to win the lock on disk when it goes to update the Git
branch. It then proceeds to do the submit into the branch. If the
submit fails, it may reschedule to retry the submit later, or simply
fail hard back to the user. Lock failures (because the other server
actually won the Git reference lock) fail hard back to the user and
don't retry.
This was an optimization hack that I took: if we have only one server
but multiple threads within it, it's cheaper to take an in-memory lock
before taking the disk lock, and then we are pretty certain we will
win the disk lock (unless the Gerrit administrator is mucking about
with Git at the same time, and this is very unlikely). Unfortunately
it fails for this case of multiple Gerrit web servers answering submit
requests against the same database.
ChangeMergeQueue / MergeOp might need to be reworked to take the lock
in Git first, but this is perhaps not trivial because you need to make
sure the .lock file is cleaned up if anything aborts for any reason.
Also, JGit doesn't actually expose the locking protocol to you, so if
you take the same lock as JGit, JGit cannot actually make the update.
:-)
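The cleanup concern with taking the on-disk lock first can be sketched with a Git-style lock file: create `<ref>.lock` exclusively, write the new value, rename it over the ref, and remove the lock on any abort. This is a generic illustration under assumptions, not JGit's or Gerrit's code; `locked_ref_update` and `LockHeld` are hypothetical names, and as noted above JGit does not expose its own locking for this purpose.

```python
import os
from contextlib import contextmanager
from typing import IO, Iterator


class LockHeld(Exception):
    """Raised when another writer already holds the lock file."""


@contextmanager
def locked_ref_update(ref_path: str) -> Iterator[IO[str]]:
    """Yield a writable lock file; commit it over ref_path on success."""
    lock_path = ref_path + ".lock"
    try:
        # O_EXCL makes creation fail if the lock file already exists.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        # Someone else owns the lock; do not touch their file.
        raise LockHeld(lock_path) from None
    try:
        with os.fdopen(fd, "w") as lock_file:
            yield lock_file
        # Commit: the rename publishes the new value and releases the lock.
        os.replace(lock_path, ref_path)
    except BaseException:
        # Abort for any reason: remove our lock so later updates can proceed.
        if os.path.exists(lock_path):
            os.remove(lock_path)
        raise
```

A caller would write the new value inside `with locked_ref_update(path) as f:`. A second server attempting the same update at the same time gets `LockHeld` instead of silently racing, which corresponds to the hard failure described above.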
> Another thing I would suspect could be problematic also is
> perhaps the legacy creation of ids for Changes.
Nope. These are generated by the database's sequential number
generator, and that is safe.
> Likely only Shawn would know the full extent of potential
> gotchas.
The in-memory caches and the merge queue are the major ones.
I'm not sure this configuration buys you much in terms of performance
or availability. The server really hammers the git filesystem, and
that is one of the bottlenecks. Instead of it being cheap local disk,
it's now remote disk that is shared with another server that is also
(potentially) hammering away at it. And the in-memory caches are there
to avoid "expensive" database queries even when the database is on the
same machine as Gerrit. Removing these caches so the servers stay in-sync
and having them always query a remote MySQL server is likely going to
slow down user response time, not improve it.
I mostly agree with Magnus Bäck's earlier message in this thread about
using a slave server for reads, and having only one master, and Martin
Fick's comments about using Linux HA tools to implement reasonably
quick fail-over from the one front-end to the other if there is a
failure. FWIW, I don't usually see the Gerrit daemon itself fail; the
database or filesystem is almost as likely to fail as the Gerrit
daemon is. And the load-balanced server pair described in this message
doesn't seem to be making those any more reliable than if they were on
the same system as the Gerrit daemon.
On Friday 11 February 2011 12:45:41 pm Murali T wrote:
> We did some testing and git01, 02 showed different
> profiles for the same user (ssh keys were submitted
> using git01 and when logged into 02, no ssh keys shows
> up under my profile, same for some code review records).
> Your assumption is correct, Gerrit is using some kind of
> cache locally. I was under impression that sharing
> review_site would share cache also. It looks like Gerrit
> maintains local cache somewhere. Do you guys have any
> idea on where this local cache is? if we configure to
> share that, would it work?
The cache is an in-memory cache, so no, I don't think there
is a way to share it.