gerrit replication

Mark

Nov 9, 2009, 11:05:39 AM
to repo-d...@googlegroups.com
Pretend I'm a git n00b :).  I have replication set up in my gerrit instance to replicate the repositories to another server.  The applied changes seem to make it to the "slave" server but if I run an rsync against it, it shows that it needs updates.  Is this just because of the way that git stores things under the hood?  If I need to fail over to the "slave" server, will the changes still have the same SHA1 hash names?

--
Mark
"Blessed is he who finds happiness in his own foolishness, for he will always be happy."

Zach Wily

Nov 9, 2009, 11:18:53 AM
to repo-d...@googlegroups.com
Changesets that haven't been merged in yet are stored in refs/changes.
Those don't get replicated by default. (I think refs/heads/* and
refs/tags/* are the defaults for replication.)

This means that if you're using replication to mirror a repo that
you're planning on failing over to for Gerrit use, and not just git,
you'll want to add "push = +refs/changes/*" to your replication config.

zach


Mark

Nov 9, 2009, 11:33:02 AM
to repo-d...@googlegroups.com
The docs say the default is +refs/*:refs/* so wouldn't that cover it?
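For reference, the push refspec can also be spelled out explicitly instead of relying on the default. The following is only a sketch: the remote name `slave`, the host, and the path are made up for illustration, and it writes to a temporary file rather than a real replication.config.

```shell
#!/bin/sh
# Hypothetical file location and remote name, for illustration only.
RC=$(mktemp)

# ${name} stays literal (single quotes); Gerrit substitutes the project name.
git config --file "$RC" remote.slave.url 'gerrit@slave.example.com:/srv/git/${name}.git'
# Push every ref, which includes refs/changes/* as well as heads and tags.
git config --file "$RC" --add remote.slave.push '+refs/*:refs/*'

# Show the resulting settings.
git config --file "$RC" --list
```

A refspec of `+refs/*:refs/*` already covers refs/changes/*, so no separate entry for changes should be needed when it is in effect.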




Zach Wily

Nov 9, 2009, 11:36:56 AM
to repo-d...@googlegroups.com
Ah yes, you are right. What files is rsync saying still need to be transferred? It may just be packing differences between the two repos. (Git will periodically pack up a bunch of small objects into a larger file for efficiency reasons.)

zach


Mark

Nov 9, 2009, 11:39:43 AM
to repo-d...@googlegroups.com
Here is an example of what rsync pushed after I merged one change set on a test project:
mktestproj.git/
mktestproj.git/logs/
mktestproj.git/logs/refs/
mktestproj.git/logs/refs/heads/
mktestproj.git/logs/refs/heads/master
mktestproj.git/objects/
deleting mktestproj.git/objects/e0/f1a28dd093dece4f910cedd0bf7b921afb2226
deleting mktestproj.git/objects/e0/
deleting mktestproj.git/objects/c6/0dd4ce8ec01719244840c73622a340dfdf7b48
deleting mktestproj.git/objects/c6/
deleting mktestproj.git/objects/54/b6e41e68dbf367e9954f83e951bd79d92c6fdb
deleting mktestproj.git/objects/54/
mktestproj.git/objects/pack/
mktestproj.git/objects/pack/pack-c8112d87e5ecbe650a2a894d8152d13672825351.idx
mktestproj.git/objects/pack/pack-c8112d87e5ecbe650a2a894d8152d13672825351.pack
mktestproj.git/refs/changes/
mktestproj.git/refs/changes/98/
mktestproj.git/refs/changes/98/1098/
mktestproj.git/refs/changes/98/1098/1
mktestproj.git/refs/heads/
mktestproj.git/refs/heads/master


Shawn Pearce

Nov 9, 2009, 11:53:43 AM
to repo-d...@googlegroups.com
On Mon, Nov 9, 2009 at 08:05, Mark <ma...@mitsein.net> wrote:
> Pretend I'm a git n00b :).  I have replication set up in my gerrit instance
> to replicate the repositories to another server.  The applied changes seem
> to make it to the "slave" server but if I run an rsync against it, it shows
> that it needs updates.  Is this just because of the way that git stores
> things under the hood?  If I need to fail over to the "slave" server, will
> the changes still have the same SHA1 hash names?

The same SHA-1 hashes are present on the slave, so yes, fail over will
work with no loss.
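One way to check this without rsync is to compare the ref names and the SHA-1s they point at, rather than the files on disk. The sketch below is self-contained: it builds a throwaway repository and a mirror of it under a temporary directory (both hypothetical stand-ins for the master and the slave) and compares their refs.

```shell
#!/bin/sh
# Print "<sha1> <refname>" for every ref in a repository, sorted by ref name.
list_refs() {
  git --git-dir="$1" for-each-ref --format='%(objectname) %(refname)'
}

# Demo setup (hypothetical paths): a small "master" repo and a mirror of it.
tmp=$(mktemp -d)
git init -q "$tmp/work"
git -C "$tmp/work" -c user.name=demo -c user.email=demo@example.com \
  commit -q --allow-empty -m 'initial commit'
git clone -q --mirror "$tmp/work" "$tmp/slave.git"

master_refs=$(list_refs "$tmp/work/.git")
slave_refs=$(list_refs "$tmp/slave.git")

if [ "$master_refs" = "$slave_refs" ]; then
  echo "refs match"
else
  echo "refs differ"
fi
```

Against real servers, `git ls-remote <url>` run for each side gives a similar list that can be compared the same way.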

The difference is due to the on-disk storage. There are two areas
where there will be differences:

- $GIT_DIR/logs/refs

These are different because Gerrit is writing to them on the master...
but the slave is writing to them with different timestamps and
different groups. The timestamps differ because there is a slight lag
between when Gerrit updates its local ref, and when it starts the push
to the slave. That lag means the slave's log will record a slightly
later time for the same change. The slave's log will also record the
user identity of the Gerrit replication user, not the user identity of
the end-user who originally pushed something to Gerrit. Also, the
slave log may differ slightly in its records, as Gerrit might batch up
multiple changes to a local ref within the same replication delay
window into a single push to the slave... so what was 3 records in the
Gerrit log file may be only 1 record in the slave's log file.

- $GIT_DIR/objects/...

Gerrit records every incoming object in a pack file. The standard C
git implementation might explode the incoming pack file into loose
objects. These are two different means of storing the same
information, but they still look different to rsync. The pack files
on the slave and the master could also still be different due to the
batching I just mentioned above. If two different users push to the
same project within the replication delay window, their 2 packs will
be combined into one when pushed to the slave. That means the slave
will get only 1 pack file instead of 2, and will thus appear different
to rsync, but still have the same Git data.
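The loose-versus-packed difference can be seen in a throwaway repository. This is only an illustrative sketch (the paths and file names are made up): it lists every object ID, repacks, and lists them again. The files under objects/ change, but the set of objects does not.

```shell
#!/bin/sh
tmp=$(mktemp -d)
git init -q "$tmp/demo"
cd "$tmp/demo" || exit 1
echo hello > file.txt
git add file.txt
git -c user.name=demo -c user.email=demo@example.com commit -q -m 'add file'

# Object IDs while everything is still stored as loose objects.
before=$(git cat-file --batch-all-objects --batch-check | sort)
loose_before=$(git count-objects -v | sed -n 's/^count: //p')

# Repack into a single pack and drop the now-redundant loose objects.
git repack -a -d -q

after=$(git cat-file --batch-all-objects --batch-check | sort)
loose_after=$(git count-objects -v | sed -n 's/^count: //p')

echo "loose objects before: $loose_before, after: $loose_after"
[ "$before" = "$after" ] && echo "same objects, different files on disk"
```

rsync compares files, so it reports the repacked repository as changed even though every SHA-1 is still present.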


I actually repack my master, and then rsync it to the slaves every so
often. The rsync to the slaves looks like this:

--<8--
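#!/bin/sh

TOP=/srv/gerrit/repositories
RC=/srv/gerrit/replication.config
GERRIT_PORT=29418

# Collect the remote URLs from the replication config, drop any URL
# that contains :29418, and strip the ${name}.git placeholder.
DEST=$(git config --file "$RC" --list |
  grep 'remote\..*\.url' |
  cut -d= -f2 |
  grep -v ':29418' |
  sed 's,${name}.git,,')

for u in $DEST
do
  echo "$u"
  rsync -a --delete-after "$TOP" "$u"
  # Re-push anything Gerrit tried to replicate while rsync was running.
  ssh -q -p "$GERRIT_PORT" localhost gerrit replicate --all --url "$u"
done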
-->8--

The reason for the replicate --all after the rsync is to ensure that
any replication which happened during the rsync isn't lost. Without
this line there is a tiny race condition during the rsync where rsync
may delete or overwrite data which Gerrit has tried to push to the
slave during the rsync. Asking gerrit to replicate all projects back
to the slave after the rsync is done means gerrit will fix up anything
which got missed.
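The repack step that precedes the rsync is not shown in the thread. A minimal sketch of what it could look like follows, assuming every repository is a bare directory whose name ends in .git under the same TOP path as the script above; the function name `repack_all` is made up for illustration.

```shell
#!/bin/sh
# Repack every directory named *.git found under the given path.
repack_all() {
  find "$1" -type d -name '*.git' -prune | while read -r repo
  do
    echo "repacking $repo"
    git --git-dir="$repo" repack -a -d -q
  done
}

# Default path taken from the rsync script; override by exporting TOP.
TOP=${TOP:-/srv/gerrit/repositories}
if [ -d "$TOP" ]; then
  repack_all "$TOP"
fi
```

Running this before the rsync means the slaves receive the consolidated pack files rather than many small ones.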
