Re: [jgit-dev] Ketch: multi-master replicated Git

Luca Milanesio

Jan 13, 2016, 10:07:22 AM
to Shawn Pearce, jgit-dev, Repo and Gerrit Discussion
Hi Shawn,
worth sharing on the repo-discuss mailing list as well :-)

Could I then use Git Ketch to manage the agreement process on pushes (as Cassandra gives me some headaches with that) and still use another DFS implementation based on Git objects on Cassandra?

Luca.

> On 13 Jan 2016, at 14:54, Shawn Pearce <spe...@spearce.org> wrote:
>
> Google is starting to contribute a new multi-master implementation,
> calling it Git Ketch. Changes are on the Eclipse Gerrit[1].
>
>
> What is Git Ketch?
> ------------------
> Git Ketch is a multi-master Git repository management system. Writes
> (such as git push) can be started on any server, at any time. Writes
> are successful only if a majority of participant servers agree.
>
> Acked writes are durable against server failure, due to a majority of
> the participants storing all required objects.
>
>
> Do I need DFS?
> --------------
> No. We realized not everyone wants to run JGit DFS. Git Ketch is a
> higher-level, storage-agnostic service that can use both classical
> local file repositories, and DFS type repositories.
>
>
> Do I need JGit?
> ---------------
> Sort of.
>
> The Ketch Leader process running the consensus algorithm is currently
> implemented in Java, relying on JGit. However...
>
> Any Git repository served by Git >= 2.4.0 can act as a voting
> participant (the required feature is `git push --atomic`). The
> consensus algorithm runs on the Git wire protocol.
>
>
> Where's the rest?
> -----------------
> Google's prior multi-master implementation is 4 years old and heavily
> intertwined with internal source code. We are rewriting the
> multi-master logic and open sourcing as we go. We think this will make
> it easier for the JGit project to review and digest.
>
> Unfortunately, major portions are being rewritten from scratch, as
> there are segments deeply connected to our internal implementation of
> JGit DFS on Google Bigtable, or to our internal authentication and RPC
> protocols. None of that makes sense in the open source JGit project.
>
> So, the rest is Coming Soon(TM). We are working on it.
> You can help by providing feedback. :)
>
>
> Why is it called Git Ketch?
> ---------------------------
> Git Ketch is modeled on the Raft Consensus Algorithm[2]. A ketch[3]
> sailing vessel is faster and more nimble than a raft[4]. It can also
> carry more source codes.
>
> Git Ketch front-loads replication costs, which we think vaguely
> resembles a ketch sailing vessel's distinguishing feature of the main
> mast on the front of the ship.
>
>
> Footnotes
> ---------
> [1] https://git.eclipse.org/r/64206
> [2] https://raft.github.io/
> [3] https://www.google.com/search?q=ketch&tbm=isch
> [4] https://www.google.com/search?q=raft&tbm=isch
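
The voting-participant requirement in the announcement (Git >= 2.4.0, because of `git push --atomic`) can be illustrated with a minimal Python sketch. The helper names below are made up for illustration and are not part of Ketch or JGit; the sketch only assumes that a version string in the style of `git --version` output is available, however it was obtained.

```python
import re

# From the announcement: `git push --atomic` needs Git >= 2.4.0.
MIN_ATOMIC_PUSH = (2, 4, 0)


def parse_git_version(text):
    """Extract (major, minor, patch) from `git --version` style output."""
    match = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"no version number in {text!r}")
    # A missing patch component is treated as 0.
    return tuple(int(g) if g is not None else 0 for g in match.groups())


def can_vote(version_text):
    """Hypothetical check: is this Git new enough for atomic pushes?"""
    return parse_git_version(version_text) >= MIN_ATOMIC_PUSH


def atomic_push_argv(remote, refspecs):
    """Build the argument list for an all-or-nothing push of several refs."""
    return ["git", "push", "--atomic", remote, *refspecs]
```

With `--atomic`, either every ref in the push is updated on the remote or none is, which is the property the announcement relies on.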

Saša Živkov

Jan 13, 2016, 11:31:22 AM
to Luca Milanesio, Shawn Pearce, jgit-dev, Repo and Gerrit Discussion
On Wed, Jan 13, 2016 at 4:07 PM, Luca Milanesio <luca.mi...@gmail.com> wrote:
> Hi Shawn,
> worth sharing on the repo-discuss mailing list as well :-)
>
> Could then I use Git Ketch to manage the agreement process on pushes (as Cassandra gives me some headache with that) and still use another DFS implementation based on Git objects on Cassandra?

If I understood the announcement [1] correctly you can.

[1]

Luca Milanesio

Jan 13, 2016, 11:46:43 AM
to Saša Živkov, Shawn Pearce, jgit-dev, Repo and Gerrit Discussion
I would add ... after the JGit BitMap work ... and now the JGit multi-master agreement manager ... JGit is definitely the innovation gear and leading the way of Git into the Enterprise :-)
Thanks Shawn again for contributing it.

Luca.

On 13 Jan 2016, at 16:40, Saša Živkov <ziv...@gmail.com> wrote:

> On Wed, Jan 13, 2016 at 3:54 PM, Shawn Pearce <spe...@spearce.org> wrote:
>> Google is starting to contribute a new multi-master implementation,
>> calling it Git Ketch. Changes are on the Eclipse Gerrit[1].
>
> Great news!
> Thanks for contributing this work to JGit.
> Looks like replicated, multi-master, open source Gerrit will be soon a reality :-)
>
> Do you also have a POC change in Gerrit which adds the "--ketch=LEADER"
> to its daemon program? Would something like that be useful at this early phase?

Shawn Pearce

Jan 13, 2016, 5:45:48 PM
to Luca Milanesio, Saša Živkov, jgit-dev, Repo and Gerrit Discussion
On Wed, Jan 13, 2016 at 8:30 AM, Saša Živkov <ziv...@gmail.com> wrote:
> On Wed, Jan 13, 2016 at 4:07 PM, Luca Milanesio <luca.mi...@gmail.com>
> wrote:
>>
>> Hi Shawn,
>> worth sharing on the repo-discuss mailing list as well :-)
>>
>> Could then I use Git Ketch to manage the agreement process on pushes (as
>> Cassandra gives me some headache with that) and still use another DFS
>> implementation based on Git objects on Cassandra?
>
>
> If I understood the announcement [1] correctly you can.

Yes, this should be supported.

It gets a bit confusing because I think you are talking about having
like 1 object store in a Cassandra cluster, and then the reference
data is managed by Ketch? Ketch stores the references in the object
store using RefTree, but still needs to use an odd number of copies of
refs/txn/accepted on durable storage to form the voting system.

To be honest I didn't consider a system layout such as this before. Up
until this email I was thinking a minimum Ketch 3.0 (3 voters, 0
followers) system would be 3 separate installations, e.g. 3 Linux
servers running Git on local disk. With Cassandra it's more like 3
isolated Cassandra clusters providing 3 copies of the repositories,
and if each Cassandra cluster itself is probably 3 machines at
minimum, this is like a 9 machine system.

If those 9 machines are in the same data center then you may be better
off with something like HDFS providing disk storage for JGit DFS and
using 3 local disks or 3 small installations of reliable databases for
the RefTree bootstrap layer (where Ketch stores its
refs/txn/accepted).
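
To make the voter arithmetic above concrete (writes need a majority, and an odd number of refs/txn/accepted copies forms the voting system), here is a minimal Python sketch. The function names are hypothetical and are not Ketch APIs; the sketch only encodes ordinary majority-quorum arithmetic.

```python
def majority(voters):
    """Smallest number of acknowledgements that is more than half."""
    if voters < 1:
        raise ValueError("need at least one voter")
    return voters // 2 + 1


def tolerated_failures(voters):
    """How many voters can be down while a majority is still reachable."""
    return voters - majority(voters)


def is_accepted(acks, voters):
    """A write is accepted once a majority of voters has acknowledged it."""
    return acks >= majority(voters)
```

With 3 voters the majority is 2 and one failure is tolerated. A 4th voter raises the majority to 3 but still tolerates only one failure, which is one reason an odd number of voters is the natural choice.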

luca.mi...@gmail.com

Jan 14, 2016, 2:11:19 AM
to Shawn Pearce, Saša Živkov, jgit-dev, repo-d...@googlegroups.com


> On 13 Jan 2016, at 23:27, Shawn Pearce <spe...@spearce.org> wrote:
>
>> On Wed, Jan 13, 2016 at 8:40 AM, Saša Živkov <ziv...@gmail.com> wrote:
>>> On Wed, Jan 13, 2016 at 3:54 PM, Shawn Pearce <spe...@spearce.org> wrote:
>>>
>>> Google is starting to contribute a new multi-master implementation,
>>> calling it Git Ketch. Changes are on the Eclipse Gerrit[1].
>>
>>
>> Great news!
>> Thanks for contributing this work to JGit.
>> Looks like replicated, multi-master, open source Gerrit will be soon a
>> reality :-)
>
> Hopefully. :-)
>
> This is still very early work. It's months away from risking a
> prototype server on, let alone a production server. But I think 2016
> will be the year Gerrit ships a multi-master capable release.
>
>> Do you also have a POC change in Gerrit which adds the "--ketch=LEADER"
>> to its daemon program? Would something like that be useful at this early
>> phase?
>
> Not yet. There's a few things you want to do for Gerrit.
>
> RefTreeDatabase needs an extension that is aware of Ketch so it can
> convert BatchRefUpdates and RefUpdate objects to a Ketch Proposal and
> send them to the KetchLeader. Once that's done any JGit application
> (e.g. Gerrit) can make mutations to the repository.

This could already be plugged into Gerrit without affecting JGit, simply by implementing a receiveCommit listener that sends the update and waits for a KetchLeader confirmation ... and in the meantime says something useful to the user, like "what's the weather today? Bear with me ..."

Luca

>
> This should actually be a small change someone else could try to
> author. RefTreeDatabase is reasonably clean code at this point. Extend
> it and override newBatchRefUpdate() to build a BatchRefUpdate that
> passes the ReceiveCommands into a Proposal object and hands that to
> the KetchLeader. The RefTreeUpdate also needs to know how to do this.
> I think I messed up a few of the internal APIs between RefTreeUpdate
> and RefTreeBatch that may require a bit of refactoring to make it
> easier to recast the update in terms of a Proposal. But this should be
> approachable by someone other than me.
>
>
> Something similar to KetchPreReceive should be wired into
> ReceiveCommits somewhere. When it's performing a BatchRefUpdate to send
> objects you want to run a spinner while waiting for the Proposal that
> matches the BatchRefUpdate to execute. But that's a lot of change. It may
> be easier to find a way to plumb progress messages around somehow so
> the `git push` user is entertained while a 3s operation runs on a
> world-wide installation. :-\
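
The "send, wait for confirmation, and keep the user entertained" idea discussed above can be sketched in a few lines of Python. Everything here is hypothetical: `Proposal` and `wait_with_progress` are illustrative stand-ins, not the JGit or Ketch classes of the same name, and the real wiring into ReceiveCommits would look different.

```python
import threading
import time
from dataclasses import dataclass, field


@dataclass
class Proposal:
    """Hypothetical stand-in for a batch of ref updates awaiting a vote."""
    commands: list
    accepted: bool = False
    done: threading.Event = field(default_factory=threading.Event)

    def resolve(self, accepted):
        # Called by the (imaginary) leader once the vote is decided.
        self.accepted = accepted
        self.done.set()


def wait_with_progress(proposal, report, interval=0.5, timeout=30.0):
    """Block until the proposal is decided, reporting progress meanwhile."""
    deadline = time.monotonic() + timeout
    while not proposal.done.wait(interval):
        if time.monotonic() >= deadline:
            raise TimeoutError("no decision from the leader before the timeout")
        # This is where a spinner or a sideband message to `git push` would go.
        report("waiting for a majority of replicas to accept ...")
    return proposal.accepted
```

A caller would create a `Proposal` from the received commands, hand it to whatever plays the leader role, and call `wait_with_progress(proposal, print)` on the receiving thread while another thread eventually calls `proposal.resolve(True)` or `proposal.resolve(False)`.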

luca.mi...@gmail.com

Jan 14, 2016, 2:20:29 AM
to Shawn Pearce, Saša Živkov, jgit-dev, Repo and Gerrit Discussion


> On 13 Jan 2016, at 22:45, Shawn Pearce <spe...@spearce.org> wrote:
>
>> On Wed, Jan 13, 2016 at 8:30 AM, Saša Živkov <ziv...@gmail.com> wrote:
>> On Wed, Jan 13, 2016 at 4:07 PM, Luca Milanesio <luca.mi...@gmail.com>
>> wrote:
>>>
>>> Hi Shawn,
>>> worth sharing on the repo-discuss mailing list as well :-)
>>>
>>> Could then I use Git Ketch to manage the agreement process on pushes (as
>>> Cassandra gives me some headache with that) and still use another DFS
>>> implementation based on Git objects on Cassandra?
>>
>>
>> If I understood the announcement [1] correctly you can.
>
> Yes, this should be supported.
>
> It gets a bit confusing because I think you are talking about having
> like 1 object store in a Cassandra cluster, and then the reference
> data is managed by Ketch? Ketch stores the references in the object
> store using RefTree, but still needs to use an odd number of copies of
> refs/txn/accepted on durable storage to form the voting system.

Ketch and Cassandra nodes could be co-located, and Ketch could use the local FS for its refs/txn/accepted while Cassandra storage could be used for everything else.

A Cassandra cluster is typically at least a dozen machines, and often around one hundred. It would be a configuration for large setups anyway ... We have great ambitions of growth for GerritHub :-)

>
> To be honest I didn't consider a system layout such as this before. Up
> until this email I was thinking a minimum Ketch 3.0 (3 voters, 0
> followers) system would be 3 separate installations, e.g. 3 Linux
> servers running Git on local disk. With Cassandra it's more like 3
> isolated Cassandra clusters providing 3 copies of the repositories,

The Cassandra replication factor would make some copies of the data across the cluster, but it isn't exactly a copy of everything everywhere :-) It's more about partitioning / sharding.

> and if each Cassandra cluster itself is probably 3 machines at
> minimum, this is like a 9 machine system.

Or still 3 if each node runs both Cassandra and Ketch. Again, 3 machines is a very small cluster anyway :-)

>
> If those 9 machines are in the same data center then you may be better
> off with something like HDFS providing disk storage for JGit DFS

I thought about HDFS as well in the past, but the problem is file explosion: the name node will blow up from the number of files created by JGit for hundreds of thousands of repos :-(
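
The point about replication factor versus "a copy of everything everywhere" can be put into rough numbers. This is a back-of-the-envelope Python sketch under a loud assumption: data is spread evenly across nodes, which real clusters only approximate. The function name is made up for illustration.

```python
def share_per_node(nodes, replication_factor):
    """Approximate fraction of the whole data set held by each node.

    Assumes perfectly even partitioning; real token distribution varies.
    """
    if nodes < 1 or replication_factor < 1:
        raise ValueError("nodes and replication_factor must be positive")
    if replication_factor > nodes:
        raise ValueError("cannot place more replicas than there are nodes")
    return replication_factor / nodes
```

Under that assumption, a 12-node cluster with replication factor 3 keeps three copies of every row, yet each node holds only about a quarter of the data. Only in the 3-node, replication-factor-3 case does every node hold everything.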

Markus Duft

Jan 15, 2018, 8:01:41 AM
to Repo and Gerrit Discussion
Hey everyone,

This is quite an old thread, but multi-master is still a very interesting topic. Is there anything happening with Ketch and other open source Gerrit multi-master topics, or has Google discontinued work on this?

Cheers,
Markus

Luca Milanesio

Jan 15, 2018, 9:49:27 AM
to Markus Duft, Repo and Gerrit Discussion
Hi Markus,
at the last Gerrit User Summit in London there were two different talks on Gerrit Multi-master:

1. Gerrit Multi-Master at Qualcomm (https://www.youtube.com/watch?v=X_rmI8TbKmY)

Multi-Master is definitely a reality in Gerrit OpenSource and you do not need Ketch.
Multi-Site is a different thing, and possibly you were referring to this instead.

GerritForge is working on the Cassandra DFS with Zookeeper for refs for the multi-site OpenSource implementation.

Luca.

Duft Markus

Jan 15, 2018, 9:54:09 AM
to Luca Milanesio, Repo and Gerrit Discussion

Hi,

Thanks Luca. I have seen the first talk, but missed the second one :) And yes, I am referring to multi-site. I assume you will announce once your implementation is somewhere near ready, so I'll just sit here waiting :D In the meantime I will watch the second video.

Cheers,
Markus



Han-Wen Nienhuys

Jan 15, 2018, 10:10:32 AM
to Markus Duft, Repo and Gerrit Discussion
On Mon, Jan 15, 2018 at 2:01 PM, Markus Duft
<marku...@ssi-schaefer.com> wrote:
> Hey everyone,
>
> This is a quite old thread, but multi-master is still a very interesting
> topic. Is there anything happening towards Ketch and other Open Source
> Gerrit Multi Master topics, or has google discontinued work on this?

Google has discontinued work on Ketch.


Duft Markus

Jan 15, 2018, 10:11:49 AM
to Han-Wen Nienhuys, Repo and Gerrit Discussion
Sigh, anyway, thanks for the quick reply :)

Cheers,
Markus


David Pursehouse

Jan 15, 2018, 8:51:02 PM
to Han-Wen Nienhuys, Markus Duft, Repo and Gerrit Discussion
On Tue, Jan 16, 2018 at 12:10 AM 'Han-Wen Nienhuys' via Repo and Gerrit Discussion <repo-d...@googlegroups.com> wrote:
> On Mon, Jan 15, 2018 at 2:01 PM, Markus Duft
> <marku...@ssi-schaefer.com> wrote:
>> Hey everyone,
>>
>> This is a quite old thread, but multi-master is still a very interesting
>> topic. Is there anything happening towards Ketch and other Open Source
>> Gerrit Multi Master topics, or has google discontinued work on this?
>
> Google has discontinued work on Ketch.


What does this mean for Google's plans/intentions to open source a multi-master Gerrit implementation? Is there some other work ongoing that does not use Ketch, or is the entire thing discontinued?

If Google is no longer working on this, would it still be feasible for the community to continue working with Ketch?

Han-Wen Nienhuys

Jan 16, 2018, 3:38:29 AM
to David Pursehouse, Markus Duft, Repo and Gerrit Discussion
On Tue, Jan 16, 2018 at 2:50 AM, David Pursehouse
<david.pu...@gmail.com> wrote:
> On Tue, Jan 16, 2018 at 12:10 AM 'Han-Wen Nienhuys' via Repo and Gerrit
> Discussion <repo-d...@googlegroups.com> wrote:
>>
>> On Mon, Jan 15, 2018 at 2:01 PM, Markus Duft
>> <marku...@ssi-schaefer.com> wrote:
>> > Hey everyone,
>> >
>> > This is a quite old thread, but multi-master is still a very interesting
>> > topic. Is there anything happening towards Ketch and other Open Source
>> > Gerrit Multi Master topics, or has google discontinued work on this?
>>
>> Google has discontinued work on Ketch.
>>
>
> What does this mean for Google's plans/intentions to open source a
> multi-master Gerrit implementation? Is there some other work ongoing that
> does not use Ketch, or is the entire thing discontinued?

I'm sorry to say that we're focused on other things entirely (but
others are free to continue tinkering with Ketch). Our current MM
implementation does not resemble Ketch in any way, and you could
probably build something similar to it using CockroachDB/Zookeeper and
Cassandra/MongoDB.

> If Google is no longer working on this, would it still be feasible for the
> community to continue working with Ketch?

Of course!