> I'm looking into the practicality of running multiple Git daemons
> (on different hosts) pointing to the same ReviewDB and back-end Git
> repositories, as a means of load-balancing between multiple clients.
Do you mean balancing read access to the repositories?
> Whilst most data is immutable in the ReviewDB, and Gerrit appears to
> do a lot of reloading of the database's state (e.g. to determine if
> another program has run any 'gerrit review' flags), I'm concerned
> whether there might be some data which isn't reloaded upon demand
> from the database.
Gerrit is currently not ready for multi-master operation. You need to
have a single master server, but that master can send data to any number
of slave servers to offload the traffic from clients that fetch data.
[...]
--
Magnus Bäck                    Opinions are my own and do not necessarily
SW Configuration Manager       represent the ones of my employer, etc.
Sony Ericsson
There is a *lot* of data cached in memory in the Gerrit server. The
entire "accounts", "account_group_members", "account_external_ids",
"account_ssh_keys", etc. tables are held in memory by the server and
hardly read from the database. (Actually it's a cache, but assuming
your cache is larger than the active set, it's read once and never read
again.) In Gerrit 2.1.x the "projects" and "ref_rights" tables are also
cached, holding the access control data. In Gerrit 2.2.x the
"refs/meta/config" branch is parsed and held cached until the server
notices the branch was changed, which it checks for every few minutes.
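Cache sizing is tuned per cache in gerrit.config; a sketch of keeping the active set resident, with illustrative cache names and values (see the config documentation for the full list of caches and defaults):

```ini
# gerrit.config -- size caches so the active set stays in memory and
# the tables are effectively read once (values are illustrative)
[cache "accounts"]
  memoryLimit = 1024
[cache "sshkeys"]
  memoryLimit = 1024
```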
> The one thing I can think of is the change set number (which is used
> to calculate the next branch to push to) as this is monotonically
> increasing from one review push to the next.
This is incremented by the database, not the Gerrit server. So it's safe.
What isn't safe is the processing of the submit queue. When a change
gets submitted Gerrit puts the change into the submit queue in memory.
That queue is then run essentially single-threaded to update the Git
branch and mark the change as merged. If the server goes down and
comes back up, the submit queue is reloaded from the database and
processed by the server. If you have N servers, and they all restart
at the same time (e.g. power hiccup and they all restart), the submit
queue will be processed N times concurrently with lots of race
conditions and failures.
In theory this stuff will be safe: the Git low-level locking will make
sure the branch isn't corrupted, but you will probably see a lot of
false lock errors in the error logs due to lock contention, and you
may see changes get stuck in Submitted state even though they are
actually Merged because the lock contention caused 1 server to succeed
and write Merged, and then the other server to fail and try to write
Submitted back to the database to support retrying later.
Basically I didn't write the code to support this use case, and it's never
been tested like this, so I wouldn't suggest running it that way in
production right now.
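The failure mode above can be sketched as a toy interleaving. This models the described race, not Gerrit's actual code; the dict, flag, and change names are all invented for illustration:

```python
# Two restarted servers replay the same submit queue reloaded from the
# database. The dict stands in for the ReviewDB "changes" row; the
# boolean stands in for the Git ref update.
db = {"change-1": "SUBMITTED"}   # state every restarted server reloads
git_branch_merged = False
ref_lock_held = False

# Server A wins the low-level Git ref lock and starts the merge.
ref_lock_held = True

# Server B, replaying the same queue, fails to take the lock and
# schedules a retry for later.
server_b_will_retry = ref_lock_held

# Server A finishes: the branch is updated and the change marked merged.
git_branch_merged = True
db["change-1"] = "MERGED"
ref_lock_held = False

# Server B's retry bookkeeping lands last, writing SUBMITTED back over
# the winner's MERGED state.
if server_b_will_retry:
    db["change-1"] = "SUBMITTED"

# The branch really is merged, but ReviewDB says Submitted: the
# "stuck in Submitted" symptom described above.
```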
> I'd be happy to help contribute changes necessary to try and remove
> any contentious points if there were any.
If you want to load-balance the Git-over-SSH or Git-over-HTTP
operations, you can use slave servers. Check the daemon --slave flag
in the documentation. The slaves can use the same Git repository and
database. All web UI and write operations have to go through the
master, but Git reads can be sent through the slaves. If you run the
slave caches with a shorter maxAge (e.g. 1 hour rather than 90 days)
users who change their SSH keys will have less time to wait for a
slave to pick up their new key.
http://gerrit.googlecode.com/svn/documentation/2.2.0/pgm-daemon.html
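Concretely, the slave-side tuning described above might look like this in gerrit.config (the cache names are real, but the values are illustrative):

```ini
# On each slave, started with: java -jar gerrit.war daemon --slave
# A shorter maxAge means new SSH keys are picked up within an hour
# rather than the (example) 90 days used on the master.
[cache "sshkeys"]
  maxAge = 1 hour
[cache "accounts"]
  maxAge = 1 hour
```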
On Wed, Aug 10, 2011 at 01:45, AlBlue <alex.b...@gmail.com> wrote:
> I'm looking into the practicality of running multiple Git daemons (on
> different hosts) pointing to the same ReviewDB and back-end Git
> repositories, as a means of load-balancing between multiple clients.
[...]
Yup. That's the two classes.
It might be acceptable to have ReloadSubmitQueueOp only run on a
single server, and not on the others (e.g. by a configuration flag).
Or to not run it at all, and instead have an SSH command an
administrator can execute against a specific server to make that server
load and retry the submit queue after a downtime event has occurred.
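The first idea could be sketched like this: only the server whose configuration enables it replays the submit queue at startup. The flag name and structure are hypothetical, not an existing Gerrit option:

```python
# Gate the submit-queue replay behind a per-server config flag so that
# exactly one node replays it after a restart (names are invented).
def maybe_reload_submit_queue(config, reload_op, log):
    if config.get("replaySubmitQueueOnStartup", True):
        reload_op()                       # the one designated server
        log.append("queue reloaded")
    else:
        log.append("skipped; another server owns the queue")

log = []
maybe_reload_submit_queue({"replaySubmitQueueOnStartup": True},
                          lambda: None, log)   # the designated master
maybe_reload_submit_queue({"replaySubmitQueueOnStartup": False},
                          lambda: None, log)   # every other server
```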
> > Thanks, that's also good to know. I suspect read-only replicas will
> > benefit from this, but ideally I'd like to have live-live Gerrit
> > writable instances as well.
A lot of folks would like to have this, and that submit queue stuff is
one of the things preventing it right now. The other is the caches
being (reasonably) coherent between servers.
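The coherence problem can be modeled minimally with a per-server TTL cache and a fake clock. The class and all names are invented for illustration; it only shows why two writable servers with independent in-memory caches serve stale data until maxAge expires:

```python
class TtlCache:
    """Per-server in-memory cache that refetches after max_age seconds."""
    def __init__(self, load, max_age, clock):
        self.load, self.max_age, self.clock = load, max_age, clock
        self.value = None
        self.loaded_at = None

    def get(self):
        now = self.clock()
        if self.loaded_at is None or now - self.loaded_at >= self.max_age:
            self.value, self.loaded_at = self.load(), now
        return self.value

db = {"ssh_key": "key-v1"}         # shared database row
now = [0]                          # fake clock, in seconds
server_a = TtlCache(lambda: db["ssh_key"], max_age=3600, clock=lambda: now[0])
server_b = TtlCache(lambda: db["ssh_key"], max_age=3600, clock=lambda: now[0])

server_a.get(); server_b.get()     # both servers warm up with key-v1
db["ssh_key"] = "key-v2"           # user uploads a new key via server A
stale = server_b.get()             # server B still serves the old key
now[0] += 3600
fresh = server_b.get()             # only after maxAge does B converge
```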