WANdisco's Gerrit Multimaster solution. Anyone with experience?


Gustaf Lundh

Dec 5, 2014, 5:13:23 AM
to repo-d...@googlegroups.com
WANdisco seems to provide a backend/framework which would allow Gerrit to run in a multi-master environment [1]. They make many great claims [2], like:

Git MultiSite for Gerrit applies WANdisco’s unique, patented replication to turn Gerrit master
and slave servers into fully writable Git peer nodes, eliminating the bottleneck. 

and

Every User Works at LAN Speed
Deploy a Git MultiSite server close to each office and experience LAN speed while cloning,
pulling, and pushing with Gerrit. 

and

Zero Down Time. Zero Data Loss. Maximum Performance.
Failover between servers is automatic and there is no single point of failure for Git, making maintenance windows a thing of the past. 


Perhaps even more interesting is that they are not relying on a fork:

Pure Gerrit
Git MultiSite for Gerrit is built on open source Gerrit and WANdisco’s active/active
replication technology. There are no proprietary back ends or black
box data stores to contend with. Stay up to date with the latest Gerrit releases
and use regular Git protocols and client programs with no restrictions.

and

Unlike commercial solutions that use proprietary versions of Git polluted with other software users don’t want or need, Git MultiSite builds on open source Git. That means Git MultiSite can be used with any other open source or closed source software with no vendor lock-in, or need to retrain users. Git MultiSite now integrates with Gerrit and GitLab, the leading code review and collaboration tools.

So: does anyone have experience running this solution in production? How does it scale? Since LAN-speed pushes are promised, how are the ref-update race conditions handled? The master receiving the push would still need to synchronize the ref lock with all the other masters around the world, which should be quite costly.



Luca Milanesio

Dec 5, 2014, 5:19:04 AM
to Gustaf Lundh, repo-d...@googlegroups.com
WANdisco developed its own patented technology a while ago (http://www.wandisco.com/get.php?f=documentation/whitepapers/WANdisco_DConE_White_Paper.pdf) for SVN.
They then applied the same to Git, as it is based on file-system-level replication.

Honestly, Gerrit is more than just file-system-level replication, and it's not clear to me how they solved the DB problem: unless they require all Gerrit servers to share the same DB, which wouldn't be ideal :-O

It would be useful if anyone from WANdisco watching this forum could clarify how it works :-)

Luca.


Gustaf Lundh

Dec 5, 2014, 6:54:27 AM
to repo-d...@googlegroups.com, gustaf...@sonymobile.com

> It would be useful if anyone from WANdisco watching this forum could clarify how it works :-)

Yes. I sent them a request asking for an engineer to join the discussion. Let's hope they show up.

Gustaf


Patrick Renaud

Dec 5, 2014, 7:07:46 PM
to repo-d...@googlegroups.com, gustaf...@sonymobile.com
Hi,

We spoke to WANdisco about a month ago. I can confirm that their solution isn't handling the DB part currently. I suspect they are very eager to see the Gerrit community move faster in getting rid of this famous DB, which would suddenly make the WANdisco solution very, very interesting.

We are very intrigued by their solution (read: interested!). We are in the process of securing an NDA between our respective companies in order to push the technical investigations around their solution to the next level. Not sure if this will lead us anywhere, but we want to know more about it. Multi-master is our objective, and we are under the impression that WANdisco may help us get there faster than if we try to do it ourselves with plain Gerrit.

Cheers!
-Patrick

Richard Christie

Dec 6, 2014, 9:29:59 AM
to repo-d...@googlegroups.com
We had WANdisco onsite just last week to talk about this, and we also asked about DB replication.

Their response was that we would need to use the replication features of the underlying database (e.g. MySQL/PostgreSQL) to have hot copies. From my understanding of what they said, the product is not yet ready for rollout, but it will be very shortly.

We are obviously very interested in the multi-master claims, and a more technical discussion about the possibilities with some of their engineers will hopefully be scheduled for January. I'll let you know what we learn if others haven't already got the info by then.

Shawn Pearce

Dec 6, 2014, 8:04:21 PM
to Patrick Renaud, repo-discuss, Lundh, Gustaf
This has been an interesting thread, because it sounds like WANdisco
really has a multi-master Git/Gerrit solution that we have failed to
completely build as open source.

Out of pure curiosity I wonder how it compares to what we use behind
gerrit-review.

On Fri, Dec 5, 2014 at 4:07 PM, Patrick Renaud <pren...@gmail.com> wrote:
> We spoke to WANdisco about a month ago. I can confirm that their solution
> isn't handling the db part currently. I suspect they are very much eager to
> see the Gerrit community moving faster in getting rid of this famous db,
> which would suddenly make the wandisco solution very very interesting.

Dave Borowitz and I had hoped to finish killing the database this
year, but that is running behind. Looks like it's going to happen in
2015.

A big part of our work assumes there will be a fast way for a peer to
be informed about reference updates being made by the Git replication,
as the peer must scan the notedb /meta branch and update the local
Lucene (or Solr) index for change search to work properly from that
peer. Likewise the peer must be able to pick up refs/meta/config
branch updates quickly to apply new ACLs.

We were hoping the multi-master Git support would include some sort of
event publishing system that Gerrit servers could subscribe to, so
they can discover reference updates and take action. I wonder if
WANdisco will be able to supply that from their Git multi-master
product.

> We are very intrigued by their solution (read: interested!). We are in the
> process of securing an NDA between our respective companies in order to push
> the technical investigations around their solution to the next level. Not
> sure if this will lead us anywhere but we want to know more about it.
> Multi-master is our objective, and we are under the impression that WANdisco
> may help us getting there faster than if we try to do it ourselves with
> plain Gerrit.

It (unfortunately) may. We haven't been working on multi-master very
quickly in the Gerrit open source project. :(

Luca Milanesio

Dec 7, 2014, 7:56:59 AM
to Richard Christie, repo-d...@googlegroups.com
That was my impression as well: they do not manage Gerrit replication (repo + DB) but instead simply use their technology for filesystem-based replication, regardless of whether the files are Git or Subversion or any other format.
It would be interesting to have a more detailed design picture of the solution from them.

Luca.

Gustaf Lundh

Dec 8, 2014, 4:20:48 AM
to repo-d...@googlegroups.com, gustaf...@sonymobile.com
Thanks for your comments.

I'm still quite curious how they can promise LAN speeds while pushing. The master receiving the push still needs to synchronize the ref lock with all the other masters around the world. I think this is also the reason Gerrit has longer push times than a single-master installation would have (please correct me if I'm wrong here, Shawn).

And if we read the DConE [1] (Distributed Coordination Engine) white paper, it does seem they are not using this kind of locking to solve the collision issue:

Concurrent Agreement 
The Paxos algorithm only allows agreement to be reached on one proposal
at a time. This has the obvious effect of slowing down performance in a
high transaction volume environment. DConE allows multiple proposals
from multiple proposers to progress simultaneously, rather than waiting for
agreement to be reached by all of the nodes on a proposal by proposal basis.

This is interesting. So if two users are pushing to two different masters, when is the collision handled? After the push has completed, or during the push? Since Gerrit is unmodified, is there actually a way to handle this today?

/Gustaf 

Shawn Pearce

Dec 8, 2014, 12:25:57 PM
to Gustaf Lundh, repo-discuss
On Mon, Dec 8, 2014 at 1:20 AM, Gustaf Lundh
<gustaf...@sonymobile.com> wrote:
> Thanks for your comments.
>
> I'm still quite curious how they can promise LAN speeds while pushing. The
> master receiving the push still needs to synchronize the ref lock with all
> the other masters around the world.

You can do LAN speed for the object transfer to your local master. But
then there must be some sort of lag while the ref lock is
synchronized. And there necessarily must be some sort of time for the
objects to arrive in the other masters.

If I push an edit to a few source files, this might only be 40 KiB of
object data transferred. That edit may be able to be transferred
quickly enough to the remote masters that the lag isn't very
noticeable to a human.

If I push a new 650 MiB binary blob, the transfer to my local master
could be at LAN speed. But there must be some sort of lag for that
data to arrive at the other masters. The question is, what does
WANdisco do while that transfer is occurring? If you only have a 10
Mbps link to a distant site, that transfer will take almost 9 minutes.

> I think this is also the reason Gerrit
> have longer push times than what a single master installation would have
> (please correct me if I'm wrong here, Shawn).

Correct.

Our multi-master solution effectively does several rounds of Paxos
during a push to Gerrit. It looks like this:

round 1) Copy object data from local master to remote masters. For
example, the user pushes to "US-WEST". After the object data is fully
stored at US-WEST, that node does "git push" to all other nodes to
also store the data. No references are updated yet. End-user clients
see this as "Replicating objects ... " with an ASCII art spinner.

If you are pushing 40 KiB, this phase is sometimes invisible to the
user. The spinner only shows up if it's taking more than a few seconds
to do the data transfers. If you are pushing 650 MiB, this spinner is very
likely to appear and make the end-user wait for the copying to occur.

A majority of nodes need to accept the data to ensure we are disaster
proof. We fail the end-user's git push if we can't get a majority
here.

round 2) Paxos is used to coordinate creation of
refs/changes/nn/ccnn/p. Being Paxos, a majority need to accept this
proposal.

round 3) Paxos is used to coordinate the PatchSets and Changes tables
in the database.

These aren't the fastest things in the world. One challenge is the
speed of light. Our sites are around the world. We have to wait for
replies from remote corners of the Earth. Light only goes so fast
along the fiber routes that have been built. And not every fiber route
is the shortest path between two points.

An advantage of our approach is that by the end of round 2 we know some
sites are able to immediately read the branch and the objects in that
branch, with no lag experienced by a reader. We push the lag onto the
writer by making the writer wait for all 3 rounds.

As an optimization we do allow some sites to fall behind by not voting
in a Paxos round. We often see at least one cluster in the group
lagging behind due to its extreme distance from the other nodes on the
network. That node catches up as quickly as possible later, but it may
still take time.
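
To make those three rounds concrete, here is a rough sketch of the flow. The
names (PaxosGroup, replicateObjects, proposeRefCreation, proposeDbRows) are
invented for illustration; this is not the actual gerrit-review code.

import java.util.List;

public class MultiMasterPush {

  /** Hypothetical stand-in for the replication layer. */
  interface PaxosGroup {
    // Round 1: copy the received pack to peers; true once a majority stored it.
    boolean replicateObjects(byte[] pack, List<String> peers);

    // Round 2: propose creating refs/changes/nn/ccnn/p; true if a majority accepts.
    boolean proposeRefCreation(String refName, String newSha1);

    // Round 3: propose the PatchSets and Changes rows for the database.
    boolean proposeDbRows(Object patchSetRow, Object changeRow);
  }

  /** Returns true if the push can be reported to the client as successful. */
  static boolean receivePush(PaxosGroup group, byte[] pack, List<String> peers,
      String changeRef, String sha1, Object patchSetRow, Object changeRow) {
    if (!group.replicateObjects(pack, peers)) {
      return false;   // no majority holds the objects yet: not disaster-proof, reject
    }
    if (!group.proposeRefCreation(changeRef, sha1)) {
      return false;   // a conflicting proposal won; the client must retry
    }
    return group.proposeDbRows(patchSetRow, changeRow);
  }
}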

> And if we read the DCONE[1] (Distributed Coordination Engine) white paper it
> does seems they are not using this kind of locking for solving the collision
> issue:
>
>> Concurrent Agreement
>>
>> The Paxos algorithm only allows agreement to be reached on one proposal
>> at a time. This has the obvious effect of slowing down performance in a
>> high transaction volume environment. DConE allows multiple proposals
>> from multiple proposers to progress simultaneously, rather than waiting
>> for
>> agreement to be reached by all of the nodes on a proposal by proposal
>> basis.
>
>
> This is interesting. So if two users are pushing to two different masters,
> when is the collision handled? After the push is completed? Or at the end of
> the push? Since Gerrit is unmodified, is there actually a way to handle this
> today?

No, you are thinking about DConE wrong.

DConE helps when you have two users pushing to non-conflicting
resources managed by the same Paxos group. If a user in US-WEST is
pushing to “master”, and another user in APAC is pushing to “stable”,
DConE might allow these both to succeed with lower latency overheads.

If both users were pushing to “master”, DConE should still ensure
the conflict is identified and at least one will fail.

The question is how do they define a Paxos group. We define it over a
Git repository. We can pipeline operations on different repositories
efficiently, but within a single repository we have to delay and wait
for pending proposals to succeed/fail before a new one can begin.

Our decision to use per-Git repository Paxos was intentional with
forward thinking about notedb. We plan to update refs/heads/master and
refs/changes/nn/ccnn/meta simultaneously when a change merges. That
requires them to be able to be in a single Paxos proposal, so
conflicts can be identified.
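
As a toy illustration of that per-repository model (the names and the in-memory
ref table below are invented, not our real implementation): a proposal can carry
several ref updates with compare-and-swap semantics, so a submit that touches
refs/heads/master and the change's meta ref either lands completely or is
detected as a conflict.

import java.util.LinkedHashMap;
import java.util.Map;

public class RefProposal {
  private static final String ZERO_ID =
      "0000000000000000000000000000000000000000";

  /** ref name -> {expected old SHA-1, proposed new SHA-1} */
  private final Map<String, String[]> updates = new LinkedHashMap<>();

  public RefProposal add(String ref, String expectedOld, String proposedNew) {
    updates.put(ref, new String[] {expectedOld, proposedNew});
    return this;
  }

  /** Apply against the repository's ref table; all updates succeed or none do. */
  public boolean apply(Map<String, String> refs) {
    for (Map.Entry<String, String[]> e : updates.entrySet()) {
      String current = refs.getOrDefault(e.getKey(), ZERO_ID);
      if (!current.equals(e.getValue()[0])) {
        return false;   // another accepted proposal moved this ref first: conflict
      }
    }
    for (Map.Entry<String, String[]> e : updates.entrySet()) {
      refs.put(e.getKey(), e.getValue()[1]);
    }
    return true;
  }
}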

As it happens with our current Paxos solution, cross-repository
transactions are a lot more complicated. This is one reason among many
we haven't worked on any sort of cross-repository change dependency
thing in Gerrit.

Shawn Pearce

Dec 8, 2014, 1:37:58 PM
to Gustaf Lundh, repo-discuss
On Mon, Dec 8, 2014 at 9:25 AM, Shawn Pearce <s...@google.com> wrote:
>
> On Mon, Dec 8, 2014 at 1:20 AM, Gustaf Lundh
> <gustaf...@sonymobile.com> wrote:
> > Thanks for your comments.
> >
> > I'm still quite curious how they can promise LAN speeds while pushing. The
> > master receiving the push still needs to synchronize the ref lock with all
> > the other masters around the world.
>
> You can do LAN speed for the object transfer to your local master. But
> then there must be some sort of lag while the ref lock is
> synchronized. And there necessarily must be some sort of time for the
> objects to arrive in the other masters.
>
> If I push an edit to a few source files, this might only be 40 KiB of
> object data transferred. That edit may be able to be transferred
> quickly enough to the remote masters that the lag isn't very
> noticeable to a human.
>
> If I push a new 650 MiB binary blob, the transfer to my local master
> could be at LAN speed. But there must be some sort of lag for that
> data to arrive at the other masters. The question is, what does
> WANdisco do while that transfer is occurring? If you only have 10
> Mbps link to a distant site, that transfer will take almost 9 minutes.

A really good question is what happens during "git gc" on a master.

If it's just filesystem replication, are the new huge pack files copied
to the remote masters?

How much does that bottleneck user transactions that are trying to
write to the same repository?

Doug Kelly

Dec 8, 2014, 2:39:17 PM
to repo-d...@googlegroups.com, gustaf...@sonymobile.com
That's a good question indeed: it's even a concern when doing "normal" replication.  In our case, we hadn't ever garbage collected our mirrors until just this year.  Obviously, this led to many stale and loose objects, and running "gerrit gc" has helped with storage space fairly significantly.  Correct me if I'm wrong, but there's nothing that would prevent the gc from running asynchronously.  I personally pick a time at each individual site that's least impactful for its users to run the gc. This does cost more CPU time, of course, but it doesn't cost us anything from a network perspective, which is usually our more limited resource.

Randy Defauw

Dec 9, 2014, 10:21:47 AM
to repo-d...@googlegroups.com
Hi,

Just to introduce myself, I work at WANdisco as a product manager.  I've invited some of the engineers working on our solution to chime in here as well, but I can probably give you more background.

First, in terms of architecture... WANdisco's replication engine uses a Paxos algorithm to coordinate writes to the Git repositories.  It does not use file system replication; there is no dependence on file system tools at all.  When a push comes in to Gerrit via a Git client or through activity in the Gerrit UI, we intercept the activity at roughly the same point a regular Git update hook would come into play.  The replication engine tries to get consensus to accept the push.  If it succeeds, then the content of the push (the pack file) is transferred to other nodes asynchronously.

In terms of the data transfer, there's a configurable setting as to whether you try to transfer the pack file to one (or more) other nodes before the push is accepted.  That's for the sake of data redundancy.  So if you have 5 nodes in the system a/b/c/d/e, say a push comes into 'a'.  By default the content is transferred to one other node (say 'b') before the push is accepted.  After the push is accepted the content is delivered to c/d/e.

'git gc' doesn't affect the replication.  In terms of concurrent operations, read operations are not affected at all (we're not in the loop for reads).  Writes are of course sequenced atomically as Paxos aims for single copy consistency.

We don't replicate the RDBMS and indeed we are looking forward to seeing more data move into the Git repositories.  We'd actually like to help in that area next year.

Anyway if you have any further questions please let me know.  WANdisco is a commercial company of course but we have an interest in helping open source succeed.  WANdisco has supported the Subversion project for many years and I'm looking forward to being more active in the Git/Gerrit community.

Cheers,
Randy


Luca Milanesio

Dec 9, 2014, 10:31:51 AM
to Randy Defauw, repo-d...@googlegroups.com
Hi Randy,
thanks for your feedback: I do have another question on your distributed coordinator.

How do you manage slowdowns in remote sites? Imagine that one of the remote nodes is in China and the line is *reeeeaaaaallllly* slow.

Do you block operations on all the repositories until you’ve got the final ACK from all masters, or do you just queue the operations and allow them to continue?
(in a nutshell: is it a sort of two-phase commit, or do you manage replication queues?)

Luca.

Randy Defauw

Dec 9, 2014, 10:41:36 AM
to Luca Milanesio, repo-d...@googlegroups.com
Hi Luca,

We let you configure the Paxos 'role' for each node, so the impact of a slowdown depends on the configuration and where the push comes from.

Let's say you have nodes A/B/C and to start all are voters (in Paxos terms, acceptors).  If a push comes in to node A, it needs to get a majority of the votes.  It would thus need to get agreement from B or C.  If 'C' is down or just really slow, it won't matter.  

Of course if the commit comes into 'C' you need to get agreement from 'A' or 'B'.  If 'C' is at the wrong end of a really slow network pipe, then the agreement (accepting the proposal for the push) is subject to that latency.  But the proposal is just a few packets.  The bulk of the data is the content of the push (the pack file) and we can send that asynchronously, so the user experience at 'C' shouldn't be awful.

You could also configure the roles differently and change them on a schedule.  For example let's say we have three nodes in the EU and three in Asia.  During working hours in the EU we could make those 3 nodes the voters, then in working hours in Asia we could make those three nodes the voters.

Thanks for reaching out.  I can also share a demo but it's pretty boring to tell you the truth - when replication works, it's transparent.  Normally what we show is:
  • Developer pushes a change for review on node A
  • Another developer performs the code review on node B and accepts it
  • Developer on node C can pull the approved review
Cheers,
Randy



Luca Milanesio

Dec 9, 2014, 10:48:57 AM
to Randy Defauw, repo-d...@googlegroups.com
Hi Randy,
thanks for the quick response.

Replication is not boring at all, feel free to share the demo / screencast / video :-)

Coming back to the slow node ‘C’: imagine that it gave the OK and accepted the push synchronously (at T0) and later on receives the bulk of the data asynchronously (at T1).
What happens between T0 and T1? Does ‘C’ not prevent further pushes to its repo? Is the repo set as “read-only”?

If ‘C’ were to accept pushes between T0 and T1 (without having the BLOBs of the commit), it would potentially open the door to a consistency issue and to diverging branches.
Can you clarify what Paxos would do in the case of node ‘C’ between T0 and T1?

Luca.

Randy Defauw

Dec 9, 2014, 11:22:13 AM
to Luca Milanesio, Mark McKeown, repo-d...@googlegroups.com
Hi Luca,

Ok, I'll try to get my rough cut demo video posted somewhere. :)

I've copied Mark McKeown, one of my colleagues and our Git architect.  He can go into a lot more technical detail but I'll try to reply briefly here.

In the scenario with node 'C': as soon as the proposal is accepted, we let the local Gerrit instance complete its activity normally.  Afterwards node 'C' can be used for continued write operations even if, for example, it takes 5 minutes to transfer a large pack file to node 'A'.  From our perspective then node 'C' is fully usable at T0; it does not have to wait until T1.

Node 'A' has agreed to the proposal also and knows it is coming.  We use a global sequence number (GSN) in the proposals to preserve the proper order of transactions, which is part of the Paxos design.  So if node 'A' gets another incoming push it will know not to handle it until it has gotten the data from 'C' that preceded it.  Again, Mark can go into more formal detail on this point so forgive me if I'm not being precise here.

Cheers,
Randy



Mark McKeown

Dec 9, 2014, 11:36:22 AM
to repo-d...@googlegroups.com, randy....@wandisco.com
Hi Luca,
             WANDisco models each Git Repository as a State Machine [1] and uses the Paxos [2] algorithm to keep the state machines in sync. 
The original paper for this approach is "Time, Clocks and the Ordering of Events in a Distributed System" [3].

A Git push is a bunch of objects (blobs, commits and trees) in a packfile; this is unpacked into the repo and then Git updates
the refs, e.g. refs/heads/master.

When a push for a branch comes into a node, it first transfers the contents of the packfile to the other nodes. Once enough nodes have
the contents (this is configurable), the node issues a Paxos proposal to update the ref. If there are multiple proposals outstanding for
the state machine, then DConE creates a global ordering for the proposals. Each node will execute the proposals in the same order, keeping
the state machines synchronised. A node will only execute a proposal when it already has the packfile.

If a node does not have the packfile, then the node will hold off doing any updates to that repository until it has the packfile and can complete
the proposal. This does not prevent it from accepting more pushes to the repository; they will be submitted to Paxos and will queue up
waiting for the earlier updates to complete. The later pushes may be rejected as they are out of date.

If a node is down it does not prevent the rest of the nodes making progress, when it starts up it catches up with the other nodes.
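
A toy model of that per-repository state machine, just to make the ordering
explicit (the names are invented; this is not our actual implementation):
proposals carry a global sequence number and are executed strictly in that
order, and a proposal only executes once its packfile is locally available.

import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class RepoStateMachine {
  static class Proposal {
    final long gsn;         // global sequence number from the coordination engine
    final String packId;    // packfile this ref update depends on
    final String ref;
    final String newSha1;

    Proposal(long gsn, String packId, String ref, String newSha1) {
      this.gsn = gsn;
      this.packId = packId;
      this.ref = ref;
      this.newSha1 = newSha1;
    }
  }

  private final TreeMap<Long, Proposal> pending = new TreeMap<>();
  private final Set<String> localPacks = new HashSet<>();
  private final Map<String, String> refs = new HashMap<>();
  private long nextGsn = 1;

  synchronized void onProposalAccepted(Proposal p) {
    pending.put(p.gsn, p);
    drain();
  }

  synchronized void onPackArrived(String packId) {
    localPacks.add(packId);
    drain();
  }

  /** Execute proposals in sequence order; stop at the first whose pack is missing. */
  private void drain() {
    while (!pending.isEmpty() && pending.firstKey() == nextGsn) {
      Proposal p = pending.firstEntry().getValue();
      if (!localPacks.contains(p.packId)) {
        return;   // later proposals queue up behind this one until the pack arrives
      }
      refs.put(p.ref, p.newSha1);
      pending.pollFirstEntry();
      nextGsn++;
    }
  }
}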

Hope this helps. 

cheers
Mark Mc Keown


Mark McKeown

Dec 9, 2014, 11:57:35 AM
to repo-d...@googlegroups.com, gustaf...@sonymobile.com
Hi Shawn,
                WANdisco replicates Git pushes. In general the on-disk layout of the Git repositories
on different nodes is different - i.e. the mix of packfiles to loose objects is generally different
on each node.

WANdisco provides the ability to do a “co-ordinated” git-gc, i.e. git-gc is run on a repository
at the same point in the Paxos sequence of updates across all the nodes.

In the standard Git MultiSite product, git-gc runs just like the normal C Git implementation;
this means it can run at different times on different nodes - it does not affect WANdisco Git
replication.

Hope this helps.

cheers
Mark Mc Keown

Shawn Pearce

Dec 9, 2014, 12:51:35 PM
to Mark McKeown, repo-discuss, randy....@wandisco.com
On Tue, Dec 9, 2014 at 8:36 AM, Mark McKeown <mark.m...@wandisco.com> wrote:
> Hi Luca,
> WANDisco models each Git Repository as a State Machine [1] and uses the Paxos [2] algorithm to keep the state machines in sync.
> The original paper for this approach is "Time, Clocks and the Ordering of Events in a Distributed System" [3].

This is similar to what we do for the system behind gerrit-review.

However we allow nodes to apply accepted proposals out of order on different refs within the same repository. This allows small updates on a stable branch to fly ahead of a large change applied to master that is blocking on copying a large pack file.

> A Git push is a bunch of objects (blobs, commits and trees) in a packfile, this is unpacked into the repo and then git updates
> the refs, eg refs/heads/master

I understand how you can use hooks to catch activity from Git receive-pack, but how do you grab the ref updates initiated by Gerrit and send them through the Paxos queue?

> When a push for a branch comes into a node it first transfers the contents of the packfile to the other nodes, once enough nodes have
> the contents (this is configurable) the node issues a Paxos proposal to update the ref. If there are multiple proposals outstanding for
> the state machine then DConE creates a global ordering for the proposals.

If two nodes “a” and “b” both initiate a proposal for “master” to different SHA-1s, I assume the Paxos system resolves this conflict by accepting one proposal and rejecting the other, since they are conflicting proposals inside the same repository?

Whereas two nodes “a” and “b” initiating proposals for different branches (e.g. “a” updates master, “b” updates stable) will be globally ordered and both accepted.

> Each node will execute the proposals in the same order keeping
> the state machines synchronised. The node will only execute the proposal when it already has the packfile.
>
> If a node does not have the packfile then the node will block doing any updates to that repository until it has the packfile and can complete
> the proposal - this does not prevent it accepting any more pushes to the repository, they will be submitted to Paxos and will queue up
> waiting for the earlier updates to complete. The later pushes may be rejected as they are out of date.

Let me ask about two specific cases:

Node “c” is lagging behind and does not have any recent updates (e.g. slow line).

A push comes into “c” for “stable”, which has not been changed by any other node. Sounds like this would be queued at “c”, submitted to Paxos after “c” finishes catching up, and would be accepted. OK. What does the user see while this is happening? Is their “git push” waiting for the catch-up so the user can be notified if Paxos accepted or rejected their proposal for “stable”?

The second case: a push comes into “c” for “master”, which has been changed several times by other node(s). Since “c” is behind, what happens? If you queue it up and submit the proposal to Paxos, it should be rejected since “c” is behind and the client would be discarding the edits made by other nodes.

I think I would explain this as an “eventually consistent” system then, as reads made immediately after a push may not yet observe the push due to replication lag across the network.

> We don’t replicate the RDBMS and indeed we are looking forward to seeing more data move into the Git repositories.  We’d actually like to help in that area next year.

Looking forward to it. One of our motivations for removing the RDBMS is to offload the replication work onto our Paxos based Git system, which vastly simplifies the multi-site management.

Randy Defauw

Dec 9, 2014, 1:21:37 PM
to Shawn Pearce, Mark McKeown, repo-discuss
> However we allow nodes to apply accepted proposals out of order on different refs within the same repository. This allows small updates on a stable branch to fly ahead of a large change applied to master that is blocking on copying a large pack file.

Interesting.  I suppose we could take that approach at some point, perhaps by using additional state machines.

>> A Git push is a bunch of objects (blobs, commits and trees) in a packfile, this is unpacked into the repo and then git updates
>> the refs, eg refs/heads/master
>
> I understand how you can use hooks to catch activity from Git receive-pack, but how do you grab the ref updates initiated by Gerrit and send them through the Paxos queue?

Mark's team made a few updates to JGit in order to intercept the additional activity.  (Naturally we publish any changes we make to open source on request.)

>> When a push for a branch comes into a node it first transfers the contents of the packfile to the other nodes, once enough nodes have
>> the contents (this is configurable) the node issues a Paxos proposal to update the ref. If there are multiple proposals outstanding for
>> the state machine then DConE creates a global ordering for the proposals.
>
> If two nodes “a” and “b” both initiate a proposal for “master” to different SHA-1s, I assume the Paxos system resolves this conflict by accepting one proposal and rejecting the other, since they are conflicting proposals inside the same repository?

In the case of Git, yes, we won't accept non-ff updates on a branch.  It's a little different for Subversion.  We've been replicating Subversion for many years and Subversion being file-based may choose to accept a non-conflicting update on the same branch.  It's really the same as if two people were committing at roughly the same time to the same branch.  In the case of Git the second person is out of date and has to reconcile.  In the case of Subversion it depends on which files were being changed.

> Whereas two nodes “a” and “b” initiating proposals for different branches (e.g. “a” updates master, “b” updates stable) will be globally ordered and both accepted.

Yes, in general.

> Let me ask about two specific cases:
>
> Node “c” is lagging behind and does not have any recent updates (e.g. slow line).
>
> A push comes into “c” for “stable”, which has not been changed by any other node. Sounds like this would be queued at “c”, submitted to Paxos after “c” finishes catching up, and would be accepted. OK. What does the user see while this is happening? Is their “git push” waiting for the catch-up so the user can be notified if Paxos accepted or rejected their proposal for “stable”?

Yes, their 'git push' is hanging.  On the command line it seems like things are going slowly, so we have a few APIs that let you figure out how many proposals are queued up ahead of you, so you can get a sense of why things are slow.  It's a little more interesting in the Gerrit UI (e.g. merging a code review) so we built a configurable timeout into the system.  To be honest we're still doing some UX research to figure out the best way to let the user know what's happening in a useful way.

> The second case: a push comes into “c” for “master”, which has been changed several times by other node(s). Since “c” is behind, what happens? If you queue it up and submit the proposal to Paxos, it should be rejected since “c” is behind and the client would be discarding the edits made by other nodes.

That's right.  If we allowed that it'd be a non-ff update which would be unacceptable.

> I think I would explain this as an “eventually consistent” system then, as reads made immediately after a push may not yet observe the push due to replication lag across the network.

From the perspective of a user reading from a node that's behind, that's fair, although I think 'eventually consistent' is kind of a loaded phrase in this area.  The Paxos experts will probably want to argue about this. :)  From a practical view we let you tune the behavior.  You could insist that the files be copied to all nodes before the proposal is accepted, and that would get you closer to the single-node behavior where reads can happen while a large push is in progress.

>> We don’t replicate the RDBMS and indeed we are looking forward to seeing more data move into the Git repositories.  We’d actually like to help in that area next year.
>
> Looking forward to it. One of our motivations for removing the RDBMS is to offload the replication work onto our Paxos based Git system, which vastly simplifies the multi-site management.

Same page here.  Any suggestions to the best way to get started?  Mark's team will be looking at this next year.
 

Mark McKeown

Dec 9, 2014, 2:01:11 PM
to Randy Defauw, Shawn Pearce, repo-discuss
inline
Eventually consistent is accurate.

So if you do a read from the node you pushed to, then you will see your
changes; this is because we make the client block until the transaction is
complete on the node you pushed to.

If you do a read from another node, you may not see your changes. To have
all readers get a consistent view, "Strict Consistency", you would globally
order all reads as well as writes to the repository, i.e. intercept clone/pull
and run them through Paxos. Some applications that use DConE in the Big
Data space do this.

However, people are happy with eventual consistency for SCM as they are
already effectively using it through Git's use of "Optimistic Concurrency" [1].
It's also interesting to view this in terms of the CAP [2] trade-offs - we
provide Availability, Partition Tolerance and eventual consistency.

If you have master-master replication of git then you are close to the two tier
system outlined in "Dangers of Replication and a Solution" [3] - which is kinda
interesting.

cheers
Mark

[1] http://en.wikipedia.org/wiki/Optimistic_concurrency_control
[2] http://en.wikipedia.org/wiki/CAP_theorem
[3] http://research.microsoft.com/apps/pubs/default.aspx?id=68247

Shawn Pearce

Dec 9, 2014, 2:17:30 PM
to Randy Defauw, Mark McKeown, repo-discuss
On Tue, Dec 9, 2014 at 10:21 AM, Randy Defauw <randy....@wandisco.com> wrote:
>> However we allow nodes to apply accepted proposals out of order on
>> different refs within the same repository. This allows small updates on a
>> stable branch to fly ahead of a large change applied to master that is
>> blocking on copying a large pack file.
>
>
> Interesting. I suppose we could take that approach at some point, perhaps
> by using additional state machines.

Like WANdisco, we globally order each update and assign each one a sequential number.

When a node is applying the accepted proposals, multiple threads can
work on different proposals simultaneously. We keep track of the
sequential number alongside each reference, and skip the update if
another thread already applied a later accepted proposal to the target
branch.

We avoid multiple state machines by just having a clean "look ahead"
at the accepted queue, and parallelizing the updates. However,
skipping old updates processed out of order does require an additional
table on the side connecting proposal numbers to reference names, so a
thread can determine its update is "older" before it really updates
the ref.

>>> A Git push is a bunch of objects (blobs, commits and trees) in a
>>> packfile, this is unpacked into the repo and then git updates
>>> the refs, eg refs/heads/master
>>
>>
>> I understand how you can use hooks to catch activity from Git
>> receive-pack, but how do you grab the ref updates initiated by Gerrit and
>> send them through the Paxos queue?
>
>
> Mark's team made a few updates to JGit in order to intercept the additional
> activity. (Naturally we publish any changes we make to open source on
> request.)

OK, so it's not a stock Gerrit server. It's stock Gerrit with an updated JGit. :)

>>> When a push for a branch comes into a node it first transfers the
>>> contents of the packfile to the other nodes, once enough nodes have
>>> the contents (this is configurable) the node issues a Paxos proposal to
>>> update the ref. If there are multiple proposals outstanding for
>>> the state machine then DConE creates a global ordering for the proposals.
>>
>>
>> I assume if two nodes "a", and "b" both initiate a proposal for "master"
>> to different SHA-1s, I assume the Paxos system resolves this conflict by
>> accepting one proposal and rejecting the other, since they are conflicting
>> proposals inside the same repository?
>
>
> In the case of Git, yes, we won't accept non-ff updates on a branch. It's a
> little different for Subversion. We've been replicating Subversion for many
> years and Subversion being file-based may choose to accept a non-conflicting
> update on the same branch. It's really the same as if two people were
> committing at roughly the same time to the same branch. In the case of Git
> the second person is out of date and has to reconcile. In the case of
> Subversion it depends on which files were being changed.

This makes perfect sense, both for Git and Subversion. Thank you for
the explanation.

>> While two nodes "a" and "b" initiating proposals for different branches
>> (e.g. "a" updates mater, "b" updates stable) will globally order and both be
>> accepted.
>
>
> Yes, in general.
>
>>
>>
>> Let me ask about two specific cases:
>>
>> Node "c" is lagging behind and does not have any recent updates (e.g. slow
>> line).
>>
>> A push comes into "c" for "stable", which has not been changed by any
>> other node. Sounds like this would be queued at "c", submitted to Paxos
>> after "c" finishes catching up, and would be accepted. OK. What does the
>> user see while this is happening? Is their "git push" waiting for the catch
>> up so the user can be notified if Paxos accepted or rejected their proposal
>> for "stable"?
>
>
> Yes, their 'git push' is hanging. On the command line it seems like things
> are going slowly, so we have a few APIs that let you figure out how many
> proposals are queued up ahead of you, so you can get a sense of why things
> are slow.

If you are running in a hook, the hook stderr goes to the tty of the
git client. You can output messages to let the user know something is
still happening on the server side. You can use "\r" (without \n) to
reset the tty to the start of the line and rewrite the line. We use
that trick to show a message:

E.g. if you push enough to our servers you see:

$ git push ...
Counting objects: 14768, done.
Delta compression using up to 12 threads.
Compressing objects: 100% (2168/2168), done.
Writing objects: 100% (14768/14768), 89.65 MiB | 20.48 MiB/s, done.
Total 14768 (delta 9327), reused 14682 (delta 9298)
remote: Resolving deltas: 100% (9327/9327)
remote: Replicating objects... (\)

^^^ our processing with a cute ASCII spinner on the end.
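
The trick itself needs nothing more than rewriting one stderr line from the
server-side process; a trivial stand-alone demo (not the actual Gerrit code):

public class SpinnerDemo {
  public static void main(String[] args) throws InterruptedException {
    char[] frames = {'|', '/', '-', '\\'};
    for (int i = 0; i < 40; i++) {   // pretend replication takes about 10 seconds
      System.err.print("\rReplicating objects... (" + frames[i % frames.length] + ")");
      System.err.flush();
      Thread.sleep(250);
    }
    System.err.println("\rReplicating objects... done.          ");
  }
}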

> It's a little more interesting in the Gerrit UI (e.g. merging a
> code review) so we built a configurable timeout into the system. To be
> honest we're still doing some UX research to figure out the best way to let
> the user know what's happening in a useful way.

I would be interested in your thoughts here, as we can also have delays
in the Gerrit UI during submit. We have been trying to keep this below
1 second to keep users happy, but sometimes that just isn't practical.

Dave Borowitz is mostly working through the code review metadata. But
that is only part of the database issue for commercial sites. Some
installations are very dependent upon the group system that comes
stock in Gerrit Code Review... which is stored in the database.

One open question I posed earlier in this thread was, how does
WANdisco tell Gerrit a ref has replicated onto this node? If Gerrit
at site "a" writes refs/changes/cc/nncc/meta, the Gerrit at site "b"
will need to be told when this ref arrives so it can update the local
Lucene index, or the local Solr cluster, so that user dashboards and
search results will reflect the current state. At Google we have a
trigger in our accepted Paxos proposal records that allows the Paxos
system to kick the event "up" the software stack into a Gerrit process
to do "Gerrit specific stuff" after an accepted proposal is applied at
a node.

Planning for that in your product, and helping us hook onto that, would help. :)


Years ago I proposed moving group management to Git in
https://gerrit-review.googlesource.com/35780 but this got stuck.
Reviving that and building an open source group backend plugin that
uses Git for storage rather than the database would give sites a path
out. Finishing the plugin out with REST API like
https://gerrit-review.googlesource.com/Documentation/rest-api-groups.html
is needed to help teams that have scripted the group system a way off
the database.

One alternative to this is to just tell customers to never use the
internal group system and always rely on the LDAP integration from
their corporate directory. But some folks have found this impossible,
as their corporate IT didn't want to create the number of groups they
needed to organize the Gerrit workflow they wanted to implement.


User preferences are another area not moved to Git, but that should,
to help eliminate user information in the RDBMS. We have started the
migration by placing the user's custom menu items into the All-Users
repository. But we have not yet moved the other preferences. Moving
the other preferences would be fairly straight forward contribution to
the server.

David Ostrovsky

Dec 9, 2014, 2:25:44 PM
to repo-d...@googlegroups.com

On Tuesday, December 9, 2014 at 20:17:30 UTC+1, Shawn Pearce wrote:


> User preferences are another area not moved to Git, but that should,
> to help eliminate user information in the RDBMS. We have started the
> migration by placing the user's custom menu items into the All-Users
> repository. But we have not yet moved the other preferences.

The change that does this is pending for review [1].

Shawn Pearce

Dec 9, 2014, 2:55:20 PM
to Randy Defauw, Mark McKeown, repo-discuss
On Tue, Dec 9, 2014 at 11:17 AM, Shawn Pearce <s...@google.com> wrote:
> On Tue, Dec 9, 2014 at 10:21 AM, Randy Defauw <randy....@wandisco.com> wrote:
>>>> A Git push is a bunch of objects (blobs, commits and trees) in a
>>>> packfile, this is unpacked into the repo and then git updates
>>>> the refs, eg refs/heads/master
>>>
>>>
>>> I understand how you can use hooks to catch activity from Git
>>> receive-pack, but how do you grab the ref updates initiated by Gerrit and
>>> send them through the Paxos queue?
>>
>>
>> Mark's team made a few updates to JGit in order to intercept the additional
>> activity. (Naturally we publish any changes we make to open source on
>> request.)
>
> OK, so its not a stock Gerrit server. Its stock Gerrit with an updated JGit. :)
...
>>>>>> We don't replicate the RDBMS and indeed we are looking forward to
>>>>>> seeing more data move into the Git repositories. We'd actually like to help
>>>>>> in that area next year.
>>>
>>> Looking forward to it. One of our motivations for removing the RDBMS is to
>>> offload the replication work onto our Paxos based Git system, which vastly
>>> simplifies the multi-site management.
>>
>>
>> Same page here. Any suggestions to the best way to get started? Mark's
>> team will be looking at this next year.

git-core itself is trying to abstract the reference management system.
Ronnie Sahlberg was trying something in [1] where git-core connects on
a UNIX domain socket to another process, delegating reference writes
to that process.

If this work all lands into git-core, WANdisco could replace the hook
system and run a server responding to this UNIX domain socket.
Operations written by Git over the socket would then be dropped into
the Paxos queue.

If this work lands into git-core, JGit would also need to be able to
communicate over this protocol to stay compatible with git-core. That
would allow JGit to cleanly talk to WANdisco's software, without
needing a modified version of JGit. Gerrit + JGit would just see this
configuration setting in place in the repository, open the socket, and
delegate all updates to WANdisco's magic sauce.


[1] https://github.com/rsahlberg/git/commit/b7220ca0dc121047c4ce57271c6aaaf8b8c25429
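
Speculating on how that could be wired together: a small daemon owns the ref
writes, git/JGit connect over the UNIX domain socket and delegate every update
to it, and the daemon drops each request into the replication queue. The
one-line "update <ref> <old> <new>" wire format below is invented for
illustration (the real protocol in [1] may differ), and the socket API needs
Java 16+.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.StandardProtocolFamily;
import java.net.UnixDomainSocketAddress;
import java.nio.channels.Channels;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.file.Path;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class RefUpdateDaemon {
  private final BlockingQueue<String> paxosQueue = new LinkedBlockingQueue<>();

  public void serve(Path socketPath) throws Exception {
    UnixDomainSocketAddress addr = UnixDomainSocketAddress.of(socketPath);
    try (ServerSocketChannel server =
        ServerSocketChannel.open(StandardProtocolFamily.UNIX)) {
      server.bind(addr);
      while (true) {
        try (SocketChannel client = server.accept();
            BufferedReader in = new BufferedReader(
                new InputStreamReader(Channels.newInputStream(client)))) {
          String line;
          while ((line = in.readLine()) != null) {
            // e.g. "update refs/heads/master <oldSha1> <newSha1>"
            paxosQueue.put(line);   // hand the write off to the replication engine
          }
        }
      }
    }
  }
}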

Randy Defauw

Dec 9, 2014, 4:06:39 PM
to Shawn Pearce, Mark McKeown, repo-discuss
> Mark's team made a few updates to JGit in order to intercept the additional
> activity.  (Naturally we publish any changes we make to open source on
> request.)

> OK, so it's not a stock Gerrit server. It's stock Gerrit with an updated JGit. :)

Right. :)   For SVN we have an FSFSWD repository layer.  Git was much easier.  For regular Git we have essentially a custom update hook triggered on all writes.  For Gerrit we had to poke into JGit.  Like I said, Git (and JGit) were much easier than FSFSWD.


> If you are running in a hook, the hook stderr goes to the tty of the
> git client. You can output messages to let the user know something is
> still happening on the server side. You can use "\r" (without \n) to
> reset the tty to the start of the line and rewrite the line. We use
> that trick to show a message:

We do put some messages into the console, like "Commit replicated successfully".  I'd like to put more information in there, but the hook doesn't easily have access to all the information in the replicator.  Plus I don't know if it's useful to say "4 transactions queued".  What people care about is how long it'll take to process those 4, and that means we'd have to figure out how fast the content delivery is moving.  Anyway just a long way of saying that giving good feedback about replication was a little more complex than I first thought it would be... 


>  It's a little more interesting in the Gerrit UI (e.g. merging a
> code review) so we built a configurable timeout into the system.  To be
> honest we're still doing some UX research to figure out the best way to let
> the user know what's happening in a useful way.

> I would be interested in your thoughts here, as we can also have delays
> in the Gerrit UI during submit. We have been trying to keep this below
> 1 second to keep users happy, but sometimes that just isn't practical.

From the console we're fairly happy to let a Git or SVN user just hit Control+C if their commit/push is hanging.  (Of course it may succeed eventually, but they don't want to wait forever.)  If you're using a GUI I guess it's a little more irritating but most of the GUIs have a way to let you abort a long-running operation.

For Gerrit it's unusual in the sense that some operations that rarely fail (like project creation) may fail, or just take a long time, if your local node is disconnected or can't get consensus for some reason, or is lagging behind.  We didn't dive into this too deeply yet, but it seems like there's not an intuitive way to provide that sort of feedback to the user while an operation is in progress.  In order to prevent frustration we put an arbitrary configurable time out in certain operations: if your project creation takes more than a couple minutes we'll tell Gerrit it failed and you get a rather ugly 500 message back.  You can then look in our admin console for more information about what happened.

Ideally, we'd be able to present a message in the UI saying "This operation is taking longer than expected but it's still running.  What would you like to do?"  But we also don't want to make a lot of changes to the Gerrit UI and its error handling.  We're looking at some options but haven't settled on anything yet - ideas are welcome of course.
 
>
> Same page here.  Any suggestions to the best way to get started?  Mark's
> team will be looking at this next year.



> One open question I posed earlier in this thread was, how does
> WANdisco tell Gerrit a ref has replicated onto this node?

Ah, we did have to do some work on the secondary index.  Briefly, we have an event stream plugin that notifies other nodes that they should update Lucene for a particular change.  Each replicator stores a list of changes-to-be-reindexed on disk so we can retry if necessary.  Mark has more details on that if you're interested.

We had hoped to use Solr but we found it was more or less deprecated - we really couldn't get it to work, and we got some feedback on the list a few months ago that it was pretty much dead.

We also plan to offer the option to 'inject' replicated activity into the local event stream in case you want to have a single Jenkins instance build everything, for example.
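
A hedged sketch of that reindex path (the Reindexer interface is a stand-in,
not an actual Gerrit or WANdisco API): when the replicator reports that a
refs/changes/... update landed on this node, the change id is appended to an
on-disk journal and the local index is asked to pick it up; the journal is the
retry list if indexing fails.

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ReplicatedReindexQueue {
  interface Reindexer {
    void reindex(int changeId) throws IOException;
  }

  private final Path journal;    // durable list of changes still to be reindexed
  private final Reindexer index;

  ReplicatedReindexQueue(Path journal, Reindexer index) {
    this.journal = journal;
    this.index = index;
  }

  /** Called when the replicator says refs/changes/... was updated remotely. */
  void onReplicatedChange(int changeId) throws IOException {
    Files.writeString(journal, changeId + "\n",
        StandardCharsets.UTF_8, StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    try {
      index.reindex(changeId);
      // A real implementation would now remove the entry from the journal.
    } catch (IOException e) {
      // Leave the entry in the journal; a background task replays it later.
    }
  }
}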
 
> Planning for that in your product, and helping us hook onto that, would help. :)

Sure, any concerns/thoughts you have, we'll certainly take that into account.   

Randy Defauw

Dec 9, 2014, 4:40:21 PM
to Mark McKeown, Shawn Pearce, repo-discuss
> Eventually consistent is accurate.
>
> It's also interesting to view this in terms of the CAP [2] trade-offs - we provide Availability, Partition Tolerance and eventual consistency.

Sometimes CAP isn't as black and white as it seems... this is where the Paxos experts usually beat me up a little bit.

From a pragmatic view the behavior has been well described:
  • Writes are atomic and sequenced correctly using a number generated by the replicator (not time).
  • Reads may give you old data but not incorrect data.  In other words, if you ask for the state of the system as of sequence number 100, you'll get the same answer from every node.  But the node you're talking to may not have all the information about sequence number 101 yet.
I usually describe our system as favoring consistency and partition tolerance.  It is available for the majority subset of voting nodes.  But if your node is partitioned it cannot accept writes anymore.  

That's different from how I usually hear 'eventual consistency' described. I usually think about it as a case where you can write to all nodes all the time, but may have to reconcile conflicts later.  In that case you could ask a node for the state of the system and get a 'wrong' answer.  I believe Cassandra started out that way but now lets you configure how much consistency you want.

But as Mark pointed out we've taken ourselves out of the read loop for performance reasons and so we don't enforce the strong consistency for reads.  They can be stale (but not wrong).
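A toy model of what I mean (nothing to do with our real code, just to make "stale but not wrong" concrete): every node applies writes strictly in the replicator's sequence order, so a read at a given sequence number returns the same value on every node, even though one node's high-water mark may lag behind the others.

import java.util.Map;
import java.util.TreeMap;

// Hypothetical per-node replica: writes are applied strictly in the order of
// the replicator-assigned sequence number; reads return applied state, which
// may be behind other nodes but is never inconsistent with them.
class Replica {
  private final Map<Long, String> log = new TreeMap<>(); // sequence number -> value
  private long lastApplied = 0;

  synchronized void apply(long seq, String value) {
    if (seq != lastApplied + 1) {
      throw new IllegalStateException("out-of-order proposal " + seq);
    }
    log.put(seq, value);
    lastApplied = seq;
  }

  // State as of a given sequence number: identical on every node that has
  // already applied at least that many proposals.
  synchronized String readAt(long seq) {
    if (seq > lastApplied) {
      throw new IllegalStateException("this node has only applied up to " + lastApplied);
    }
    return log.get(seq);
  }
}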




lucamilanesio

unread,
Dec 9, 2014, 4:46:22 PM12/9/14
to repo-d...@googlegroups.com, s...@google.com, mark.m...@wandisco.com
@Randy, @Mark, @Shawn: I'm getting really excited about this multi-master replication topic :-) Who said that this is boring stuff? :o)

I imagine that the solution implemented by WANDisco is thus a sort of "branch" of Gerrit (vanilla + modified JGit) supported as a commercial product.

A few questions:
a. Do you support *ALL* Gerrit features and plugins on top of the WANDisco-modified Gerrit?
b. Do you have performance data from a real-life scenario of your Gerrit multi-master?
c. Do you have large scale Enterprise installations? Can you share some metrics? (# of nodes / sites, # of push / day, # of concurrent users, # of errors or conflicts, average Git push response time)
d. Are you up-to-date with the latest and greatest of Gerrit? (e.g. 2.10 and forthcoming 2.11)

Thanks again for providing feedback to this *very interesting* discussion thread :-)

Luca.

Randy Defauw

unread,
Dec 9, 2014, 6:39:19 PM12/9/14
to lucamilanesio, repo-discuss, Shawn Pearce, Mark McKeown
Hi Luca,


I imagine that the solution implemented by WANDisco is thus a sort of "branch" of Gerrit (vanilla + modified JGit) supported as commercial product.

That's about right.  We're trying to minimize source code changes as much as possible.  The type of companies we want to do business with are really sensitive about keeping up to date with Gerrit, so we're committing to staying within a quarter (3 months) of the major releases.   That just gives us some time to do a full release cycle and regression testing and such.  

We also do tech support... not that Gerrit is tough to administer. :)
 
A few questions:
a. Do you support *ALL* Gerrit features and plugins on top of the WANDisco-modified Gerrit?

We support all features; our goal is to not change the end user experience, just speed it up if we can.  

Plugins and other integrations are more interesting.  Here's where we're at:
  • Plugins that are not interested in the repositories should be fine.
  • Integrations that rely on Git hooks can use regular hooks (which respond to local repository activity) or our replicated hooks that respond to replicated activity.  As an example, you can hook up Jenkins to every node using post-receive hooks.  Or, you could hook Gerrit up to one node using post-receive and a new hook type we added called rp-post-receive.
  • Plugins that rely on the Gerrit event stream should be ok, as they'll be installed (and respond) to events that happen on any particular node.  In the future we're going to give you the option of injecting replicated activity into the local event stream.
That's a long way of saying that we're not anticipating problems but we haven't tested every single plugin.  We've focused on the most common ones like Jenkins and JIRA.

b. Do you have performance data from a real-life scenario of your Gerrit multi-master?

We just put out our GA release this week so we only have internal performance data.  We're hoping to install at our launch partner soon and then monitor their performance metrics.  (As you've probably seen our marketing kicks in early; that's a function of our sales cycles.)
 
c. Do you have large scale Enterprise installations? Can you share some metrics? (# of nodes / sites, # of push / day, # of concurrent users, # of errors or conflicts, average Git push response time)

Case studies coming soon.
 
d. Are you up-to-date with the latest and greatest of Gerrit? (e.g. 2.10 and forthcoming 2.11)

We're current with Gerrit 2.9.2 and will pull in Gerrit 2.10 as soon as it goes GA.  I saw that RC1 just came out last week.
 

Thanks again for providing feedback to this *very interesting* discussion thread :-)

Thanks for asking!
 

Lundh, Gustaf

unread,
Dec 10, 2014, 4:46:33 AM12/10/14
to Randy Defauw, lucamilanesio, repo-discuss, Shawn Pearce, Mark McKeown

Hi Randy,

 

Thanks for showing up and answering all the questions from the community.

 

> That's about right.  We're trying to minimize source code changes as much as possible.  The type of companies we want to do business with are really sensitive about keeping up to date with Gerrit, so we're committing to staying within a quarter (3 months) of the major releases.   That just gives us some time to do a full release cycle and regression testing and such.  

 

At Sony Mobile, we run a modified version of Gerrit. Most often it is an official release + a few hot fixes from @master and perhaps two or three internal patches (that are not yet contributed or still under review at gerrit-review.googlesource.com).

 

Our minor internal fork is quite organic and changes a bit over time, and occasionally even between official releases. Sometimes we run into showstopper issues that need to be hot-fixed _ASAP_, or we find a fix for a performance issue that we really want to pick up before it is included in a new official Gerrit release.

 

If we were to go with WANdisco's solution, would we lose the ability to manage our own internal fork? Or would you continuously provide the full source code of your own version, so that we can modify it as we please? If you are committing to stay within 3 months of major releases, what is the time span for minor releases?

 

Best regards

Gustaf

--

Mark McKeown

unread,
Dec 10, 2014, 8:44:54 AM12/10/14
to Shawn Pearce, Randy Defauw, repo-discuss
Hi Shawn,
The UNIX domain socket work looks like it would be perfect for us. One of our early prototypes was a git-receive-pack proxy [1]; it intercepts some of the traffic going to git-receive-pack and sends it to an HTTP service that would do the replication - maybe someone might find it interesting.

Currently we have a slightly modified version of git-core that adds an extra hook for the replication and handles some extra error handling. The patch is available in the source RPM [2]; it was not clear that anyone else would be interested in having the extra hook, but the patch has always been available.

cheers
Mark

[1] https://git02.shef.wandisco.com/summary/?r=git-ms/grp-proxy-http.git
[2] http://opensource.wandisco.com/rhel/6/git/i686/
--



Mark McKeown

unread,
Dec 10, 2014, 9:11:15 AM12/10/14
to Shawn Pearce, Randy Defauw, repo-discuss
On Tue, Dec 9, 2014 at 7:17 PM, Shawn Pearce <s...@google.com> wrote:
> On Tue, Dec 9, 2014 at 10:21 AM, Randy Defauw <randy....@wandisco.com> wrote:
>>> However we allow nodes to apply accepted proposals out of order on
>>> different refs within the same repository. This allows small updates on a
>>> stable branch to fly ahead of a large change applied to master that is
>>> blocking on copying a large pack file.
>>
>>
>> Interesting. I suppose we could take that approach at some point, perhaps
>> by using additional state machines.
>
> Like WANdisco we globally order each update and assign them a sequential number.
>
> When a node is applying the accepted proposals, multiple threads can
> work on different proposals simultaneously. We keep track of the
> sequential number alongside each reference, and skip the update if
> another thread already applied a later accepted proposal to the target
> branch.
>
> We avoid multiple state machines by just having a clean "look ahead"
> at the accepted queue, and parallelizing the updates. However,
> skipping old updates processed out of order does require an additional
> table on the side connecting proposal numbers to reference names, so a
> thread can determine its update is "older" before it really updates
> the ref.
>
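(Sketched in a few lines with made-up names, the skip rule above amounts to: remember the highest proposal number applied to each ref and drop anything older.)

import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the skip-if-older rule for out-of-order application.
class ProposalApplier {
  private final Map<String, Long> lastAppliedPerRef = new HashMap<>();

  // Coarse-grained synchronization for clarity; a real applier would lock per ref.
  synchronized void apply(long proposalNumber, String refName, Runnable refUpdate) {
    Long current = lastAppliedPerRef.get(refName);
    if (current != null && current >= proposalNumber) {
      return; // a later proposal already updated this ref; skip the stale one
    }
    refUpdate.run(); // perform the actual ref update in the repository
    lastAppliedPerRef.put(refName, proposalNumber);
  }
}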

This is really interesting.

Early versions of WANDisco used a single state machine, however we found it more flexible to have multiple state machines.

Having multiple state machines makes processing the update queues easier and allows repositories to be updated in parallel; the price is supporting multiple state machines, but DConE provides that for us.

Having multiple state machines also allows us to control replication better. For example, you might have ten nodes but only want a particular repository to be replicated on three of them - so you configure that state machine to only run on those three nodes, while some other repository might be on three different nodes.

We also have state machines that are just used to manage nodes and other state machines. This allows us to do things like "Vertical Paxos" [1] and "Cheap Paxos" [2], which lets users control the replication dynamically. For example, in our "follow the sun" model the nodes that are needed for quorum change during the day, so that during European day time quorum can be reached quickly in Europe, but come US day time the system automatically reconfigures the state machines so that quorum is reached quickly for US users (at that point European users can still reach quorum but they pay the transatlantic latency price when trying to achieve quorum).

[1] http://research.microsoft.com/en-us/um/people/lamport/pubs/vertical-paxos.pdf
[2] http://research.microsoft.com/pubs/64634/web-dsn-submission.pdf?q=cheap
This is very cool and something we should do.

cheers
Mark
--



Randy Defauw

unread,
Dec 10, 2014, 10:06:45 AM12/10/14
to Lundh, Gustaf, lucamilanesio, repo-discuss, Shawn Pearce, Mark McKeown
Hi Gustaf,

If we would go with WANdisco’s solution, would we lose the ability to manage our own internal fork? Or do you provide continuously the full source code to your own version, so that we can modify it as we please? If you are committing to stay within 3 months of major releases, what is the time span for a minor releases?


Normally we provide an installer that puts our changes into a compatible version of Gerrit.  However we would be happy to provide our patches so that you could apply them to your fork.  There's a bit of concern that we might run into problems trying to provide technical support in these cases.  However, our changes would likely not conflict with changes that you make internally, so at this point we're willing to cross that bridge when we come to it.

By major release I mean 2.8 -> 2.9 -> 2.10 and so on.  If tomorrow we saw that there was a 2.9.3 release that had a critical bug fix, we'd probably pull it in and release it after we run a quick automated regression cycle on it.  

We don't have anything set in stone yet for minor releases, but our approach will likely be that if we can pull in a minor release and it passes our automated test cycle, we'll ship it with our next patch release.

Cheers,
Randy

Doug Kelly

unread,
Dec 10, 2014, 10:32:18 AM12/10/14
to repo-d...@googlegroups.com, Gustaf...@sonymobile.com, luca.mi...@gmail.com, s...@google.com, mark.m...@wandisco.com


On Wednesday, December 10, 2014 9:06:45 AM UTC-6, Randy Defauw wrote:
Hi Gustaf,

If we would go with WANdisco’s solution, would we lose the ability to manage our own internal fork? Or do you provide continuously the full source code to your own version, so that we can modify it as we please? If you are committing to stay within 3 months of major releases, what is the time span for a minor releases?


Normally we provide an installer that puts our changes into a compatible version of Gerrit.  However we would be happy to provide our patches so that you could apply them to your fork.  There's a bit of concern that we might run into problems trying to provide technical support in these cases.  However, our changes would likely not conflict with changes that you make internally, so at this point we're willing to cross that bridge when we come to it.

By major release I mean 2.8 -> 2.9 -> 2.10 and so on.  If tomorrow we saw that there was a 2.9.3 release that had a critical bug fix, we'd probably pull it in and release it after we run a quick automated regression cycle on it.  

Funny you should mention this: David Pursehouse just released 2.9.3 last night (downgrades SSH to fix that stream-events bug):
 

We don't have anything set in stone yet for minor releases, but our approach will likely be that if we can pull in a minor release and it passes our automated test cycle, we'll ship it with our next patch release.

Cheers,
Randy

Overall, some very good discussion going on here: I look forward to seeing what the future holds.  Shawn does raise a good point with the group backend (hmm, I think we may be one of those pesky groups, with approximately 7 groups created per repo for their access controls), but in more general terms, if the group backend were more pluggable, maybe the internal backend could be less of an issue... Honestly, if group memberships propagated just like the Git repos, that would be fine with me (there's still a built-in audit trail), but the problem may come back as how to efficiently parse the memberships (and still notify other servers when things change).

Mark McKeown

unread,
Dec 10, 2014, 10:33:03 AM12/10/14
to Shawn Pearce, Randy Defauw, repo-discuss
Sorry link [2] to the git-receive-pack proxy should be:

https://github.com/markmckeown/git-receive-proxy

cheers
Mark

Antonio Chirizzi

unread,
Dec 10, 2014, 11:50:03 AM12/10/14
to repo-d...@googlegroups.com, randy....@wandisco.com, mark.m...@wandisco.com

> One open question I posed earlier in this thread was, how does 
> WANdisco tell Gerrit a ref has replicated onto this node? 

Hi Shawn,

this is what we do to tell one Gerrit that a ref has changed on another Gerrit.
Every Gerrit installation has a standard simple plugin, written by us, added to it that catches all the change events (plus a new one we added, because it was missing):

@Listen
public class GitMSChangeEventListenerFile implements ChangeListener,LifecycleListener {
...
}

so that if something happens on one Gerrit node about a change-id, it will be caught and written to disk on that node. The GitMS (Git MultiSite), living on the very same node, will then pick up that file and send a proposal to the other nodes about that change-id.
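Conceptually the handoff is just a spool directory (illustrative sketch only, with invented names - not the actual plugin code):

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;

// Hypothetical originating-side handoff: the listener drops one small file per
// change-id into a spool directory; the GitMS replicator on the same node
// watches that directory and turns each file into a replication proposal.
class ChangeSpool {
  private final Path spoolDir;

  ChangeSpool(Path spoolDir) { this.spoolDir = spoolDir; }

  void record(String changeId) throws IOException {
    Files.createDirectories(spoolDir);
    Path tmp = Files.createTempFile(spoolDir, "change-", ".tmp");
    Files.write(tmp, changeId.getBytes(StandardCharsets.UTF_8));
    // Atomic rename, so the replicator never sees a half-written file.
    Files.move(tmp, spoolDir.resolve("change-" + changeId + "-" + System.nanoTime()),
        StandardCopyOption.ATOMIC_MOVE);
  }
}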

The GitMS available on the other nodes will take the proposal and send 
a REST request to Gerrit (on the same node) in order to index that
change-id. If Gerrit is down or does not reply in the right way, the index
request will be retried until successful (for a finite amount of time).
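The retry on the receiving side is conceptually like this (sketch only; the URL below is a placeholder, not necessarily the endpoint GitMS really calls, and authentication is omitted):

import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical retry loop for the node-local reindex call.
class LocalReindexer {
  private final String gerritBaseUrl;

  LocalReindexer(String gerritBaseUrl) { this.gerritBaseUrl = gerritBaseUrl; }

  boolean indexWithRetry(String changeId, long retryWindowMillis) throws InterruptedException {
    long giveUpAt = System.currentTimeMillis() + retryWindowMillis;
    while (System.currentTimeMillis() < giveUpAt) {
      try {
        URL url = new URL(gerritBaseUrl + "/changes/" + changeId + "/index"); // placeholder path
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        try (OutputStream out = conn.getOutputStream()) {
          // empty request body
        }
        int status = conn.getResponseCode();
        conn.disconnect();
        if (status >= 200 && status < 300) {
          return true; // the local Gerrit has reindexed the change
        }
      } catch (IOException e) {
        // Gerrit may be down or restarting; fall through and retry after a pause.
      }
      Thread.sleep(5000);
    }
    return false; // gave up after the configured window
  }
}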

You said you have a 
> trigger in our accepted Paxos proposal records that allows the Paxos 
> system to kick the event "up" the software stack into a Gerrit process 

Can you be more specific on that?
 
In our solution we use 2 JVMs (one for Gerrit and one for GitMS), but if Gerrit already has something to manage the event, we could think about putting everything into a single JVM, avoiding the REST calls to index the change-ids.

Thanks,

-Antonio

Shawn Pearce

unread,
Dec 10, 2014, 12:25:36 PM12/10/14
to Antonio Chirizzi, repo-discuss, Randy Defauw, Mark McKeown
On Wed, Dec 10, 2014 at 8:50 AM, Antonio Chirizzi <antonio....@wandisco.com> wrote:

> One open question I posed earlier in this thread was, how does 
> WANdisco tell Gerrit a ref has replicated onto this node? 

Hi Shawn,

this is what we do to tell one Gerrit that a ref has changed on another Gerrit.
Every Gerrit installation is added of a standard simple plugin written by us that catches
all  the change events (plus a new one we added, cause it was missing):

We would be interested in this Gerrit patch for the missing event being upstreamed into the main product. Our intent is for the plugin extension points to allow this sort of monitoring without needing to modify Gerrit. :)

@Listen
public class GitMSChangeEventListenerFile implements ChangeListener,LifecycleListener {
...
}

so that if something happens on one Gerrit node about a change-id, this will
be caught and written to disk on that node. The GitMS (GIt MultiSite), living
on the very same node, will then pick up that file and send a proposal to
the other nodes about that change-id.

The GitMS available on the other nodes will take the proposal and send 
a REST request to Gerrit (on the same node) in order to index that
change-id. If Gerrit is down or does not reply in the right way, the index
request will be retried until successful (for a finite amount of time).

Cute reuse of technology. :)
 
You said you have a 
> trigger in our accepted Paxos proposal records that allows the Paxos 
> system to kick the event "up" the software stack into a Gerrit process 

Can you be more specific on that?

It's similar to what you implemented.

Whenever a Git update for a branch is proposed into the Paxos queue, part of that proposal necessarily includes the branch name (e.g. "refs/heads/master") and its SHA-1.

When a node is applying accepted proposals to Git, it also looks at the names of the references affected by that proposal. If it sees one that affects Gerrit Code Review (currently only "refs/meta/config" for ACL and config), it sends an RPC to the node's local Gerrit server to flush that project from the "projects" cache. The RPC is sent after Git was updated.

We plan to expand the patterns our Paxos system looks at and notifies on to include the special names used for code review metadata ("refs/changes/*/meta").
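Sketched in code, the check is nothing more than a name test on each applied proposal (hypothetical names, not our internal implementation):

// Hypothetical post-apply hook: after an accepted proposal has updated Git,
// look at the ref it touched and notify the local Gerrit accordingly.
class GerritNotifier {
  interface LocalGerrit {
    void flushProjectCache(String project);             // e.g. when refs/meta/config changed
    void reindexChangeMeta(String project, String ref); // refs/changes/*/meta
  }

  private final LocalGerrit gerrit;

  GerritNotifier(LocalGerrit gerrit) { this.gerrit = gerrit; }

  void onProposalApplied(String project, String refName) {
    if ("refs/meta/config".equals(refName)) {
      gerrit.flushProjectCache(project);
    } else if (refName.startsWith("refs/changes/") && refName.endsWith("/meta")) {
      gerrit.reindexChangeMeta(project, refName);
    }
    // Other refs need no Gerrit-specific treatment here.
  }
}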

In our solution we use 2 JVM (one for Gerrit and one for GitMS), but if
Gerrit already has something to manage the event, we could think
to put everything in one single JVM, avoiding the REST calls to index
the change ids.

My original planning for "Gerrit multi-master" was to have one Gerrit "git push" to another. The receiving Gerrit could then look at the refs updated and do special treatment, like flushing the "projects" cache entry if refs/meta/config was updated. That logic already exists inside of ReceiveCommits to handle pushes from authorized users updating refs/meta/config.

With Gerrit moving to a Git based database, ReceiveCommits will also need to update the index if an authorized user pushes (or deletes) the refs/changes/*/meta namespace branches. That is necessary to support federation, project migration between servers, etc.

I was planning to extract that "special logic" bits into some sort of helper class that only needed to care about branch names and SHA-1s, and then knew how to do Gerrit special behavior from there.

Google would then connect our Paxos queue to invoke that helper class when our processor is applying an accepted proposal at this node. WANdisco could modify its plugin to call the same helper. The only work for you is to have your Paxos state machine deliver the reference information from a proposal into your plugin, so the plugin can call Gerrit's helper class.

It's basically the same idea that you have now, but you wouldn't need that funny special file for each change id. Instead you just forward a copy of each accepted proposal to your plugin, and the main Gerrit code will know what special logic it needs. Since plugins can implement REST API and SSH commands (or anything else like its own socket server), you have a lot of options to get your Paxos queue to forward into Gerrit.

But you wouldn't need to monitor Gerrit for changes specifically anymore, so the @Listen example you gave above could die. Listening for mutations from Gerrit would be limited to only the reference changes made through JGit, which the UNIX domain socket for reference handling discussed earlier in this thread may simplify.

 
On Tuesday, December 9, 2014 7:17:30 PM UTC, Shawn Pearce wrote:

One open question I posed earlier in this thread was, how does
WANdisco tell Gerrit a ref has replicated onto this node?  If Gerrit
at site "a" writes refs/changes/cc/nncc/meta, the Gerrit at site "b"
will need to be told when this ref arrives so it can update the local
Lucene index, or the local Solr cluster, so that user dashboards and
search results will reflect the current state. At Google we have a
trigger in our accepted Paxos proposal records that allows the Paxos
system to kick the event "up" the software stack into a Gerrit process
to do "Gerrit specific stuff" after an accepted proposal is applied at
a node.

Planning for that in your product, and helping us hook onto that, would help. :)



David Pursehouse

unread,
Dec 10, 2014, 7:43:19 PM12/10/14
to Antonio Chirizzi, repo-d...@googlegroups.com, randy....@wandisco.com, mark.m...@wandisco.com
On 12/11/2014 01:50 AM, Antonio Chirizzi wrote:
>
> > One open question I posed earlier in this thread was, how does
> > WANdisco tell Gerrit a ref has replicated onto this node?
>
> Hi Shawn,
>
> this is what we do to tell one Gerrit that a ref has changed on another
> Gerrit.
> Every Gerrit installation is added of a standard simple plugin written
> by us that catches
> all the change events (plus a new one we added, cause it was missing):

In the earlier mail from Randy Defauw [1] it was implied that Gerrit is
unmodified. How have you added the new event, if that's the case?

[1] https://groups.google.com/d/msg/repo-discuss/ZrhEWrbZtO8/-5uCcc2yBywJ

Chris Mackey

unread,
Dec 11, 2014, 11:15:23 AM12/11/14
to repo-d...@googlegroups.com, antonio....@wandisco.com, randy....@wandisco.com, mark.m...@wandisco.com
Hey David,

We do modify Gerrit slightly to add the new event.

We'd be happy to submit this change back but I'm not sure the event has any value outside
of a replicated scenario where you try to maintain an accurate index on multiple nodes.

The reason we added it was to deal with a temporary divergence in the UIs between different Gerrit nodes. When an open review is submitted to the merge queue, its status is changed in the database and indexed on the local node. If, however, the review is held in the merge queue for a while (missing dependency, no quorum in GitMS, etc.), it will stay that way until the issue is resolved. Meanwhile, no event is fired to be picked up by our plugin to let the other nodes know about the updated status - they would all continue to see the review as "Open". So we just created a new event to be sent after the index has been performed on the originating node.


> OK, so its not a stock Gerrit server. Its stock Gerrit with an updated JGit. :)

Previously, we did replace the JGit jar bundled with Gerrit with our own, before we determined
that modifications to Gerrit itself were required, but now we just add our JGit changes to the
gerrit-patch-jgit directory and that seems to be a lot nicer.




Shawn Pearce

unread,
Dec 11, 2014, 11:24:11 AM12/11/14
to Chris Mackey, repo-discuss, antonio....@wandisco.com, Randy Defauw, Mark McKeown
On Thu, Dec 11, 2014 at 8:15 AM, Chris Mackey <chris....@wandisco.com> wrote:
>
> Hey David,
>
> We do modify Gerrit slightly to add the new event.
>
> We'd be happy to submit this change back but I'm not sure the event has any value outside
> of a replicated scenario where you try to maintain an accurate index on multiple nodes.
>
> The reason we added it was to deal with a temporary divergence in the UIs between different
> Gerrit nodes. When an open review is submitted to the merge queue, its status is changed
> in the database, and indexed on the local node. In the case however that the review will be held
> in the merge queue for a while (missing dependency, no quorum in GitMS, etc), it will stay
> that way until the issue is resolved. Meanwhile, no event is fired to be picked up by our
> plugin to let the other nodes know about the updated status - they would all continue to see
> the review as "Open". So we just created a new event to be sent after the index has been
> performed on the originating node.

Oh, duh. That makes sense. There is no event broadcast to plugins when
the state changes from NEW to SUBMITTED, as this is usually a short
transient window. But it could be longer if there are missing
dependencies.

> > OK, so its not a stock Gerrit server. Its stock Gerrit with an updated JGit. :)
>
> Previously, we did replace the JGit jar bundled with Gerrit with our own, before we determined
> that modifications to Gerrit itself were required, but now we just add our JGit changes to the
> gerrit-patch-jgit directory and that seems to be a lot nicer.

Please be careful with this.

If you are shadowing any classes in JGit ... that could be bad. The
Gerrit runtime ClassLoader structure doesn't provide any ordering
promises about JARs pushed into the classpath. It may be possible for
JGit to be placed ahead of gerrit-patch-jgit, and then your shadow
class doesn't exist.

Randy Defauw

unread,
Dec 11, 2014, 11:24:17 AM12/11/14
to Chris Mackey, repo-discuss, Antonio Chirizzi, Mark McKeown
Hi,

Just to add to what Chris said, I'm sorry if I caused confusion by not being precise.  I would usually say that we haven't modified Gerrit as we aren't trying to change behavior (beyond adding integration hooks for replication) and we definitely don't want to make substantial source code changes. 

But in the interest of full disclosure we are going to publish all the changes we've made, which fall in three places:
  • Changes to JGit as described by Mark earlier.
  • Changes to Gerrit to add the additional event that Chris just described.
  • The source of the plugin we use to send event stream updates to other nodes.

The additional event may be of general interest but I doubt the rest would be.  In any case the truth is in the code, so once we have it up on a public site you can judge for yourselves.

Those patches and plugin source should be published within a day or two.

Cheers,

Randy



Randy DeFauw I Senior Product Manager

WANdisco // Non-Stop Data

Chris Mackey

unread,
Dec 11, 2014, 11:58:56 AM12/11/14
to repo-d...@googlegroups.com, chris....@wandisco.com, antonio....@wandisco.com, randy....@wandisco.com, mark.m...@wandisco.com

> > OK, so its not a stock Gerrit server. Its stock Gerrit with an updated JGit. :)
>
> Previously, we did replace the JGit jar bundled with Gerrit with our own, before we determined
> that modifications to Gerrit itself were required, but now we just add our JGit changes to the
> gerrit-patch-jgit directory and that seems to be a lot nicer.

Please be careful with this.

If you are shadowing any classes in JGit ... that could be bad. The
Gerrit runtime ClassLoader structure doesn't provide any ordering
promises about JARs pushed into the classpath. It may be possible for
JGit to be placed ahead of gerrit-patch-jgit, and then your shadow
class doesn't exist.

That is good to know, thanks. I guess we'll have to swap back to replacing the jar then. We had assumed that some guarantee of ordering of the gerrit-patch-jgit overrides existed if it was already in use. If I can pick your mind for a moment, can I ask why it's there at all if it's not reliably loaded?

 

Shawn Pearce

unread,
Dec 11, 2014, 12:18:29 PM12/11/14
to Chris Mackey, repo-discuss, Antonio Chirizzi, Randy Defauw, Mark McKeown
It introduces new classes that do not exist in JGit, but which use
package visible methods to do dirty things under the hood that I
wasn't committed to doing in upstream JGit. Notably around
serialization of ObjectIds and grabbing stats off the block cache.

Chris Mackey

unread,
Dec 12, 2014, 1:55:47 PM12/12/14
to repo-d...@googlegroups.com, chris....@wandisco.com, antonio....@wandisco.com, mark.m...@wandisco.com
Hey guys,

We've made the changes so far available in 3 GitHub repositories:


As Randy has already said, the changes are focused in 3 areas - a source code change to Gerrit for the previously mentioned SubmitEvent, the GitMS-Gerrit-Event-Plugin, which intercepts events in Gerrit and forwards them to the replicator, and changes in JGit so that, when operations are to be performed on disk, it instead contacts the GitMS replicator to perform them. This has been based off the 2.9.1 release, as that is what we have been developing against. Work to move this up to the latest version of Gerrit will start in the next release cycle (after Christmas).

Luca Milanesio

unread,
Dec 12, 2014, 5:16:01 PM12/12/14
to Chris Mackey, repo-d...@googlegroups.com, antonio....@wandisco.com, mark.m...@wandisco.com
Hey Chris,
thank you very much for sharing the changes; it would make it much easier to assess adoption by companies that are very cautious about Gerrit upgrades.

One suggestion: why don’t you use Gerrit for accepting contributions on your fork?
Have you tried GerritHub.io?

You can see an introduction at:

Luca.


Randy Defauw

unread,
Dec 12, 2014, 5:21:33 PM12/12/14
to Luca Milanesio, Chris Mackey, repo-discuss, Antonio Chirizzi, Mark McKeown
Hi,

We hadn't really thought about it actually - we have a lot of people who contribute to various Apache projects and when I asked around it seemed like our GitHub account was a good place to put these repos. 

What's the preferred way to share with the community, particularly if we want to share back some bug fixes or patches? 

Cheers,
Randy


Randy DeFauw I Senior Product Manager

WANdisco // Non-Stop Data


Luca Milanesio

unread,
Dec 12, 2014, 5:27:04 PM12/12/14
to Randy Defauw, Chris Mackey, repo-discuss, Antonio Chirizzi, Mark McKeown
You can keep them in GitHub, but accept contributions and reviews through GerritHub.io.
The video I mentioned explains how it works :-)

This means that everyone will be free to send contributions and patches in the way they prefer:
- GitHub pull requests
- Gerrit patches

The Gerrit Code Review process will allow you to take both and review them with the community before they are merged; this allows:
- line and code-level comments
- scoring and validations
- tracking of patch-sets on the same change

The last point in particular is very problematic with GitHub :-( If you submit a patch and, based on the reviews, the commit gets amended … you completely lose the history of the previous commits :-(
Gerrit Code Review has a much better workflow and patch-set management for commits under review.

Luca.

Jonathan Nieder

unread,
Dec 12, 2014, 5:33:26 PM12/12/14
to Randy Defauw, Luca Milanesio, Chris Mackey, repo-discuss, Antonio Chirizzi, Mark McKeown
Randy Defauw wrote:
 
What's the preferred way to share with the community, particularly if we want to share back some bug fixes or patches? 

https://gerrit-review.googlesource.com/ and https://git.eclipse.org/r/. See Documentation/dev-contributing.txt in gerrit and CONTRIBUTING.md in jgit for details.

Thanks,
Jonathan

Luca Milanesio

unread,
Dec 12, 2014, 5:50:47 PM12/12/14
to Jonathan Nieder, Randy Defauw, Chris Mackey, repo-discuss, Antonio Chirizzi, Mark McKeown
That’s even better :-) … just use gerrit-review.googlesource.com :-)

Bear in mind however that not all the changes WANDisco will make would necessarily be accepted in Gerrit core, possibly because they could be too specific to your multi-master implementation and needs.
In general you should try to focus your WANDisco-specific stuff in plugins and, should you need more extension points in Gerrit, propose a patch directly to Gerrit Code Review.

CollabNet has followed this approach and has successfully contributed *a lot* of patches and improvements.
GerritForge has decided instead to put 100% of its code into gerrit-review.googlesource.com as OpenSource under Apache 2.0.

Up to WANDisco to decide which path to follow :-)

Luca.

lucamilanesio

unread,
Mar 2, 2017, 3:46:17 AM3/2/17
to Repo and Gerrit Discussion, j...@google.com, randy....@wandisco.com, chris....@wandisco.com, antonio....@wandisco.com, mark.m...@wandisco.com
Over 2 years have passed since the last message on this thread: what's the current status?

@Wandisco: have you decided to publish your customizations for review to the OpenSource Community?
@all: is anyone using Wandisco w/ Gerrit customizations in production?

Luca.

lucamilanesio

unread,
Mar 3, 2017, 5:03:23 AM3/3/17
to Repo and Gerrit Discussion, j...@google.com, randy....@wandisco.com, chris....@wandisco.com, antonio....@wandisco.com, mark.m...@wandisco.com
ping ... is this thread (and the WANdisco solution?) definitely dead?

Luca.

David Ostrovsky

unread,
Mar 3, 2017, 6:09:10 AM3/3/17
to Repo and Gerrit Discussion

Am Freitag, 3. März 2017 11:03:23 UTC+1 schrieb lucamilanesio:
ping ... is this thread (and the WANdisco solution?) definitely dead?

Looking at their site [1] reveals that if you use the gerrit-multisite solution, you get these benefits:

* No downtime, no restart required
* Guaranteed consistency and performance
* No change in Gerrit functionality and no learning curve for developers

So sounds like "Set it and forget it" to me, if it really works.


Luca Milanesio

unread,
Mar 3, 2017, 6:14:59 AM3/3/17
to David Ostrovsky, Repo and Gerrit Discussion
Yes, but the documentation is 2 years old, it requires a WANdisco fork of Gerrit based on 2.11.x [2], and it has had no major updates since then.

That's why I was wondering *IF* there is something new *AND* whether anyone has used it in production.
I remember at the Summits that some people said they tried it in a test environment, but I am not aware of anyone using it in production yet.

I believe we should get some updates from our WANdisco mates soon on this thread :-)

Antonio Chirizzi

unread,
Mar 3, 2017, 12:52:36 PM3/3/17
to Repo and Gerrit Discussion, david.o...@gmail.com
Yes we (WANdisco) have not been updating this thread but in the meantime we have been working on Gerrit!

We use Gerrit MultiSite ourselves and it is being used by our customers - large multi-national companies - who look more for stability than the latest available feature.

That's why our last Gerrit fork is the 2.11.9 one. In a few weeks we'll be releasing the latest changes for Gerrit 2.13.x, because that has been requested by some customers.

Since the last update on this thread we have changed our way of hacking Gerrit. In the beginning we were using a plugin to achieve master-master replication, but since then we have had to change the Gerrit source code itself in order to control the master-master updates made on the database (we use the freely available Percona XtraDB Cluster or MariaDB Galera Cluster).

And yes, we already publish our changes on GitHub, of course. We would welcome working with the community on how we could generalise the interfaces we have changed in order to make Gerrit more adaptable to the replicated product space!

We use our implementation of the Paxos algorithm to replicate the changes across the Gerrit instances, and we make use of freely available replicated databases. But we are completely open to review the interfaces or create new APIs and work with the community if this is required!

Luca Milanesio

unread,
Mar 3, 2017, 4:47:31 PM3/3/17
to Antonio Chirizzi, Repo and Gerrit Discussion, David Ostrovsky
Ciao Antonio,
thanks for the update, see my feedback below.

On 3 Mar 2017, at 17:52, 'Antonio Chirizzi' via Repo and Gerrit Discussion <repo-d...@googlegroups.com> wrote:

Yes we (WANdisco) have not been updating this thread but in the meantime we have been working on Gerrit!

We use Gerrit MultiSite ourselves and it is being used by our customers - large multi-national companies - who look more for stability than the latest available feature.

I have to say that Gerrit is an exception on this: the most recent versions typically contain many more fixes and stability improvements than the older ones.
A lot of our clients on 2.11.x are urgently looking to upgrade to 2.13.x because of the Lucene indexing thread cancellation problem.

If your "large multi-national companies" are the ones coming to the Gerrit User Summits (btw: there will be one in 2017, you can register your interest at https://goo.gl/JuoIqQ) they are well aware of the problems I am referring to. I would be surprised if they wanted to "stay on 2.11.x" looking for stability.


That’s why our last Gerrit fork is the 2.11.9 one.


In a few weeks we’ll be releasing the latest changes for Gerrit 2.13.x, cause that has been requested by some customers.

Good to know :-)


Since the last update on this thread we have changed our way of hacking Gerrit. In the beginning we were using a plugin to achieve a master-master replication, but since then, we had to move to change the Gerrit source code from the inside in order to control the master-master updates made on the database (we use freely available Percona XtraDB Cluster or the MariaDB galera cluster).

Ouch ... that would make your fork diverge even more from plain vanilla, I'm afraid.
There were a lot of presentations by customers "stuck in a fork" at past User Summits; for this reason large clients tend to avoid forks and stay with plain-vanilla Gerrit.

For an archive of all the previous Gerrit User Summits and the associated presentations, see:


And yes we already publish our changes on GitHUB of course. We would welcome working with the community on how we could generalise the interfaces we have changed in order to make Gerrit more adaptable into the replicated product space!

My 2p (I'm based in the UK): stop pushing to your GitHub repository and start contributing to Gerrit Code Review at https://gerrit.googlesource.com/gerrit/.
Second suggestion: start coming to the User Summits and participating in the design sessions on the future architecture of Gerrit. Everyone is interested in multi-master and we started laying the groundwork for it a long time ago.

Ericsson contributed a series of very useful plugins to solve some of the multi-master problems, Google is working on Ketch (https://www.infoq.com/news/2016/02/google-kick-starts-git-ketch) for the Raft consensus protocol and GerritForge is working on an Apache Cassandra backend (https://www.slideshare.net/HaithemJarraya/infinite-gerrit) for a multi-node / multi-zone distributed high-performance storage.

Your experience and input would be more than welcome, if you guys just decide to actively contribute to the community.
Pushing code to GitHub, unfortunately, isn't enough :-( 


We use our implementation of the Paxos algorithm to replicate the changes across the Gerrit instances, and we make use of freely available replicated databases. But we are completely open to review the interfaces or create new APIs and work with the community if this is required!

Looking forward to it :-)



On Friday, March 3, 2017 at 11:14:59 AM UTC, lucamilanesio wrote:

On 3 Mar 2017, at 11:09, David Ostrovsky <david.o...@gmail.com> wrote:


Am Freitag, 3. März 2017 11:03:23 UTC+1 schrieb lucamilanesio:
ping ... is this thread (and the WANdisco solution?) definitely dead?

Looking at their site: [1], reveals that if you use gerrit-multisite solution, you would have these benefits:

* No downtime No restart required
* Guaranteed consistency and performance
* No change in Gerrit functionality and no learning curve for developers

So sounds like "Set it and forget it" to me, if it really works.


Yes, but the documentation is 2 years old, it requires a WANDisco's fork of Gerrit based on 2.11.x [2] and had no major updates since then.

That's why I was wondering *IF* there is something new *AND* has someone used it in production. 
I remember at the Summits that some people said they tried it in a test environment, but I am not aware of anyone using it in production yet.

I believe we should get soon some updates from WANdisco's mates soon on this thread :-)





Matthias Sohn

unread,
Mar 3, 2017, 5:43:03 PM3/3/17
to Luca Milanesio, Antonio Chirizzi, Repo and Gerrit Discussion, David Ostrovsky
On Fri, Mar 3, 2017 at 10:47 PM, Luca Milanesio <luca.mi...@gmail.com> wrote:
Ciao Antonio,
thanks for the update, see my feedback below.

On 3 Mar 2017, at 17:52, 'Antonio Chirizzi' via Repo and Gerrit Discussion <repo-d...@googlegroups.com> wrote:

Yes we (WANdisco) have not been updating this thread but in the meantime we have been working on Gerrit!

We use Gerrit MultiSite ourselves and it is being used by our customers - large multi-national companies - who look more for stability than the latest available feature.

I have to say that Gerrit is an exception on this: the most recent versions contains many more fixes and stability improvements than the older ones typically.
A log of our clients on 2.11.x are urgently looking to upgrade to the 2.13.x because of the Lucene indexing thread cancellation problem.

If your "large multi-national companies" are the ones coming to the Gerrit User Summits (btw: there will be one in 2017, you can register your interest at https://goo.gl/JuoIqQ) they are well aware of the problems I am referring to. I would be surprised that they would want to "stay on 2.11.x" looking for stability.


That’s why our last Gerrit fork is the 2.11.9 one.


In a few weeks we’ll be releasing the latest changes for Gerrit 2.13.x, cause that has been requested by some customers.

Good to know :-)


Since the last update on this thread we have changed our way of hacking Gerrit. In the beginning we were using a plugin to achieve a master-master replication, but since then, we had to move to change the Gerrit source code from the inside in order to control the master-master updates made on the database (we use freely available Percona XtraDB Cluster or the MariaDB galera cluster).

Ouch ... that would make your fork diverging even more from the plain vanilla I'm afraid.
There were a lot of presentations of customers "stuck in a fork" at the past User Summits, for this reason large clients tend to avoid forks and stay with the plain-vanilla Gerrit.

For an archive of all the previous Gerrit User Summits and the associated presentations, see:


And yes we already publish our changes on GitHUB of course. We would welcome working with the community on how we could generalise the interfaces we have changed in order to make Gerrit more adaptable into the replicated product space!

My 2p (I'm based in the UK): stop pushing to your GitHub repository and start contributing to Gerrit Code Review at https://gerrit.googlesource.com/gerrit/.
Second suggestion: start coming to the User Summits and participating to the design sessions of the future architecture of Gerrit. Everyone is interested in multi-master and we started a long ago putting the basis for it.

Ericsson contributed a series of very useful plugins to solve some of the multi-master problems, Google is working on Ketch (https://www.infoq.com/news/2016/02/google-kick-starts-git-ketch) for the Raft consensus protocol and GerritForge is working on an Apache Cassandra backend (https://www.slideshare.net/HaithemJarraya/infinite-gerrit) for a multi-node / multi-zone distributed high-performance storage.

Is the source code for infinite-gerrit public?
Do you consider contributing this DFS implementation to JGit?

-Matthias

Luca Milanesio

unread,
Mar 3, 2017, 6:15:24 PM3/3/17
to Matthias Sohn, Antonio Chirizzi, Repo and Gerrit Discussion, David Ostrovsky
Hi Matthias,
there is a draft change uploaded to Gerrit; I'll ask Haithem to publish it.

The JGit DFS implementation is going to be 100% OpenSource and released under Apache 2.0.

Luca.

Matthias Sohn

unread,
Mar 3, 2017, 6:37:06 PM3/3/17
to Luca Milanesio, Antonio Chirizzi, Repo and Gerrit Discussion, David Ostrovsky
On Sat, Mar 4, 2017 at 12:15 AM, Luca Milanesio <luca.mi...@gmail.com> wrote:
Hi Matthias,
there is a draft change uploaded to Gerrit, I'll ask Haithem to publish.

The JGit DFS implementation is going to be 100% OpenSource and released under Apache 2.0.

so you don't want to contribute it to JGit itself but open source it elsewhere ?

Luca Milanesio

unread,
Mar 3, 2017, 6:39:52 PM3/3/17
to Matthias Sohn, Antonio Chirizzi, Repo and Gerrit Discussion, David Ostrovsky
I wouldn't pollute JGit, because the DFS implementation would add a lot of Cassandra-related dependencies, which would make the build quite complicated.
During the hackathon we installed it as a "LibModule" and deployed it to Gerrit under its /lib directory.

However, we are open to suggestions and best practice to follow for JGit contributions ;-)

Luca.

Matthias Sohn

unread,
Mar 3, 2017, 6:53:51 PM3/3/17
to Luca Milanesio, Antonio Chirizzi, Repo and Gerrit Discussion, David Ostrovsky
JGit consists of many libraries, so this could be added as a new one, e.g. org.eclipse.jgit.dfs.cassandra.
Which dependencies does this DFS implementation need ?

How can Cassandra be considered to be a distributed file system (DFS) ?

-Matthias 

Luca Milanesio

unread,
Mar 3, 2017, 7:08:56 PM3/3/17
to Matthias Sohn, Antonio Chirizzi, Repo and Gerrit Discussion, David Ostrovsky, GerritForge Support
A specific shaded version of the Cassandra Java Driver (com.datastax.cassandra:cassandra-driver-core) and then, of course, the Maven Shade plugin.


How can Cassandra be considered to be a distributed file system (DFS) ?

It is not a filesystem but a key/value store, completely distributed over a network of nodes and dynamically extensible.
It manages automatically:
- data repartitioning
- sharding
- data locality / geo-location
- recovery

It typically excels for immutable data, which Git objects and Packs are.

Luca.

lucamilanesio

unread,
Mar 3, 2017, 7:59:11 PM3/3/17
to Repo and Gerrit Discussion, antonio....@wandisco.com, david.o...@gmail.com
@Gustaf / @Patrick: you were the ones who have been in contact with WANdisco to experiment with their multi-master solution ... did you ever move to production with it?

Did you take any benchmark you can share? Any feedback?

Luca.


Matthias Sohn

unread,
Mar 5, 2017, 3:52:42 PM3/5/17
to Luca Milanesio, Antonio Chirizzi, Repo and Gerrit Discussion, David Ostrovsky, GerritForge Support
On Sat, Mar 4, 2017 at 1:08 AM, Luca Milanesio <luca.mi...@gmail.com> wrote:
On 3 Mar 2017, at 23:53, Matthias Sohn <matthi...@gmail.com> wrote:

On Sat, Mar 4, 2017 at 12:39 AM, Luca Milanesio <luca.milanesio@gmail.com> wrote:

On 3 Mar 2017, at 23:36, Matthias Sohn <matthi...@gmail.com> wrote:

On Sat, Mar 4, 2017 at 12:15 AM, Luca Milanesio <luca.milanesio@gmail.com> wrote:
Hi Matthias,
there is a draft change uploaded to Gerrit, I'll ask Haithem to publish.

The JGit DFS implementation is going to be 100% OpenSource and released under Apache 2.0.

so you don't want to contribute it to JGit itself but open source it elsewhere ?

I wouldn't pollute JGit., because the DFS implementation would add a lot of Cassandra related dependencies which would make the build quite complicated.
During the hackathon we installed as "LibModule" and deployed to Gerrit under its /lib directory.

However, we are open to suggestions and best practice to follow for JGit contributions ;-)

JGit consists of many libraries, so this could be added as a new one, e.g. org.eclipse.jgit.dfs.cassandra.
Which dependencies does this DFS implementation need ?
A specific shaded version of the Cassandra Java Driver (com.datastax.cassandra:cassandra-driver-core) and then, of course, the Maven Shade plugin.

 this doesn't sound like a lot of additional dependencies
How can Cassandra be considered to be a distributed file system (DFS) ?

It is not a filesystem but a key/value store, completely distributed over a network of nodes and dynamically extensible.
It manages automatically:
- data repartitioning
- sharding
- data locallity geo-location
- recovery

It typically excels for immutable data, which Git objects and Packs are.

How are you storing objects and packs ? 
Cassandra recommends chunking larger BLOBs 

-Matthias

Luca Milanesio

unread,
Mar 5, 2017, 4:36:05 PM3/5/17
to Matthias Sohn, Antonio Chirizzi, Repo and Gerrit Discussion, David Ostrovsky
On 5 Mar 2017, at 20:52, Matthias Sohn <matthi...@gmail.com> wrote:

On Sat, Mar 4, 2017 at 1:08 AM, Luca Milanesio <luca.mi...@gmail.com> wrote:

On 3 Mar 2017, at 23:53, Matthias Sohn <matthi...@gmail.com> wrote:

On Sat, Mar 4, 2017 at 12:39 AM, Luca Milanesio <luca.milanesio@gmail.com> wrote:

On 3 Mar 2017, at 23:36, Matthias Sohn <matthi...@gmail.com> wrote:

On Sat, Mar 4, 2017 at 12:15 AM, Luca Milanesio <luca.milanesio@gmail.com> wrote:
Hi Matthias,
there is a draft change uploaded to Gerrit, I'll ask Haithem to publish.

The JGit DFS implementation is going to be 100% OpenSource and released under Apache 2.0.

so you don't want to contribute it to JGit itself but open source it elsewhere ?

I wouldn't pollute JGit., because the DFS implementation would add a lot of Cassandra related dependencies which would make the build quite complicated.
During the hackathon we installed as "LibModule" and deployed to Gerrit under its /lib directory.

However, we are open to suggestions and best practice to follow for JGit contributions ;-)

JGit consists of many libraries, so this could be added as a new one, e.g. org.eclipse.jgit.dfs.cassandra.
Which dependencies does this DFS implementation need ?

A specific shaded version of the Cassandra Java Driver (com.datastax.cassandra:cassandra-driver-core) and then, of course, the Maven Shade plugin.

 this doesn't sound like a lot of additional dependencies

Nope, but the Cassandra Java Driver pulls in a lot of them, some incompatible with Gerrit's own.
We then needed to shade the Java Driver and remap the incompatible ones, and now it works :-)


How can Cassandra be considered to be a distributed file system (DFS) ?

It is not a filesystem but a key/value store, completely distributed over a network of nodes and dynamically extensible.
It manages automatically:
- data repartitioning
- sharding
- data locallity geo-location
- recovery

It typically excels for immutable data, which Git objects and Packs are.

How are you storing objects and packs ? 
Cassandra recommends chunking larger BLOBs 

Yes, we do chunk them, as suggested by Shawn.
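For illustration, chunking boils down to something like this (hypothetical store interface and chunk size; not the actual GerritForge schema):

import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of BLOB chunking: a pack (or large object) is cut
// into fixed-size slices keyed by (objectName, chunkIndex) before being written
// to the key/value store, so no single Cassandra cell holds a huge blob.
class ChunkedBlobWriter {
  interface KeyValueStore {
    void put(String objectName, int chunkIndex, byte[] chunk);
  }

  static final int CHUNK_SIZE = 1024 * 1024; // 1 MiB per chunk, arbitrary choice

  private final KeyValueStore store;

  ChunkedBlobWriter(KeyValueStore store) { this.store = store; }

  void write(String objectName, byte[] data) {
    for (int offset = 0, index = 0; offset < data.length; offset += CHUNK_SIZE, index++) {
      int len = Math.min(CHUNK_SIZE, data.length - offset);
      byte[] chunk = new byte[len];
      System.arraycopy(data, offset, chunk, 0, len);
      store.put(objectName, index, chunk);
    }
  }
}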

Luca.

Gustaf Lundh

unread,
Mar 6, 2017, 3:54:04 AM3/6/17
to lucamilanesio, Repo and Gerrit Discussion, antonio....@wandisco.com, david.o...@gmail.com

> did you ever move to production with it?


No, while we were interested, we never got time to investigate Wandisco's solution much further (and then I moved on to another company, where we had other challenges to attack).


Though I'm interested in how the secondary index works here. Is it documented somewhere? Is it handled the same way Ericsson does it, with event forwarding?


/Gustaf






From: repo-d...@googlegroups.com <repo-d...@googlegroups.com> on behalf of lucamilanesio <luca.mi...@gmail.com>
Sent: Saturday, March 4, 2017 1:59 AM
To: Repo and Gerrit Discussion
Cc: antonio....@wandisco.com; david.o...@gmail.com
Subject: Re: Wandisco's Gerrit Multimaster solution. Anyone with experience?
 

Luca Milanesio

unread,
Mar 6, 2017, 3:55:10 AM3/6/17
to Gustaf Lundh, Repo and Gerrit Discussion, antonio....@wandisco.com, David Ostrovsky
WANdisco's fork is public on GitHub, DavidO and I have started looking at it.

Luca.

On 6 Mar 2017, at 08:53, Gustaf Lundh <gustaf...@axis.com> wrote:

> did you ever move to production with it?

No, while we were interested, we never got time to investigate Wandisco's solution much further (and then I moved on to another company, where we had other challenges to attack).

Though I'm interested how the secondary index works here. Is it documented somewhere? Is it handled the same way Ericsson does it, with event forwarding?

/Gustaf





From: repo-d...@googlegroups.com <repo-d...@googlegroups.com> on behalf of lucamilanesio <luca.mi...@gmail.com>
Sent: Saturday, March 4, 2017 1:59 AM
To: Repo and Gerrit Discussion
Cc: antonio....@wandisco.com; david.o...@gmail.com
Subject: Re: Wandisco's Gerrit Multimaster solution. Anyone with experience?
 
@Gustaf / @Patrick you were the ones who have been in contact with WANdisco to experiment their multi-master solution ... did you ever move to production with it?

Did you take any benchmark you can share? Any feedback?

Luca.

On Friday, March 3, 2017 at 9:47:31 PM UTC, lucamilanesio wrote:
Ciao Antonio,
thanks for the update, see my feedback below.

On 3 Mar 2017, at 17:52, 'Antonio Chirizzi' via Repo and Gerrit Discussion <repo-d...@googlegroups.com> wrote:

Yes we (WANdisco) have not been updating this thread but in the meantime we have been working on Gerrit!

We use Gerrit MultiSite ourselves and it is being used by our customers - large multi-national companies - who look more for stability than the latest available feature.

I have to say that Gerrit is an exception on this: the most recent versions contains many more fixes and stability improvements than the older ones typically.
A log of our clients on 2.11.x are urgently looking to upgrade to the 2.13.x because of the Lucene indexing thread cancellation problem.

If your "large multi-national companies" are the ones coming to the Gerrit User Summits (btw: there will be one in 2017, you can register your interest athttps://goo.gl/JuoIqQ) they are well aware of the problems I am referring to. I would be surprised that they would want to "stay on 2.11.x" looking for stability.