Server Side Forking - smart way


Jacek Centkowski

Jul 1, 2016, 7:03:37 AM
to Repo and Gerrit Discussion
Hi,

I searched this discussion group but couldn't find a related topic, so here is the idea:
From time to time (more often than we would like to admit ;)) one needs to fork a repository (reasons vary - highly confidential work, etc.). Correct me if I am wrong, but so far one has to clone the repository, create a new one, and force-push under the new name in Gerrit. This is especially tedious for large repositories. Of course one can do it over the file protocol on the server in question, but that has at least two drawbacks:
- it is not available to everyone (server access is needed) - the rest have to use the traditional approach (which will take ages...)
- it means that repository history up to the fork point is duplicated (sounds like a terrible waste of resources...)

A smarter way would be to have some sort of link that points from the new repository to a specific point in the history of the original repository. Does anyone know if there is any solution to this for Gerrit?
If there is interest in such a feature, how about doing a POC during the forthcoming hackathon?

Look forward to hearing from you
Jacek

Edwin Kempin

Jul 1, 2016, 7:12:45 AM
to Jacek Centkowski, Repo and Gerrit Discussion
Can't you do this with branches and ACLs?
E.g. assign OWNER on refs/heads/myfork/* to a 'MyFork Owners' group.

 


Saša Živkov

Jul 1, 2016, 7:50:46 AM
to Jacek Centkowski, Repo and Gerrit Discussion
On Fri, Jul 1, 2016 at 1:03 PM, Jacek Centkowski <geminica...@gmail.com> wrote:
> A smarter way would be to have some sort of link that points from the new repository to a specific point in the history of the original repository.

Sounds like you need the "git replace" feature [1], explained more in [2].

> Does anyone know if there is any solution to this for Gerrit?

IMO, this would require that JGit supports the "replace" feature.
A feature request for that already exists [3].

David Pursehouse

Jul 1, 2016, 7:59:31 AM
to Edwin Kempin, Jacek Centkowski, Repo and Gerrit Discussion
On Fri, Jul 1, 2016 at 8:50 PM 'Edwin Kempin' via Repo and Gerrit Discussion <repo-d...@googlegroups.com> wrote:
> Can't you do this with branches and ACLs?
> E.g. assign OWNER on refs/heads/myfork/* to a 'MyFork Owners' group.

Yes, but the idea here, as far as I understand, is for each "fork" to look like a separate repo with its own master branch, but behind-the-scenes actually be the same underlying git repository (as far as common history allows).

Matthias Sohn

Jul 1, 2016, 9:51:58 AM
to David Pursehouse, Edwin Kempin, Jacek Centkowski, Repo and Gerrit Discussion
On Fri, Jul 1, 2016 at 1:59 PM, David Pursehouse <david.pu...@gmail.com> wrote:
> Yes, but the idea here, as far as I understand, is for each "fork" to look like a separate repo with its own master branch, but behind-the-scenes actually be the same underlying git repository (as far as common history allows).

JGit supports alternate object directories which can be set using BaseRepositoryBuilder.addAlternateObjectDirectory() [1]
or via objects/info/alternates [2]. Caveat: I didn't yet try using this feature myself.

Though using this option to share objects between forks of the same repository would make the handling of repository
lifecycle more complex, so e.g. the delete-project plugin would need to learn that it should not delete a repository used
via alternates by another repository instance.
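For illustration, a minimal sketch of what the objects/info/alternates route amounts to on disk: the fork's alternates file lists, one per line, the object directories it may borrow objects from. It assumes plain bare repositories on a local filesystem; the helper name link_alternate and the origin.git/fork.git names are hypothetical, and the sketch neither runs git nor touches the lifecycle problem described above.

```python
import os
import tempfile


def link_alternate(fork_git_dir, origin_git_dir):
    """Add origin's object directory to the fork's objects/info/alternates.

    Hypothetical helper: it only edits the alternates file and does not
    check that either directory is a valid git repository.
    """
    origin_objects = os.path.abspath(os.path.join(origin_git_dir, "objects"))
    info_dir = os.path.join(fork_git_dir, "objects", "info")
    os.makedirs(info_dir, exist_ok=True)
    alternates = os.path.join(info_dir, "alternates")

    # Keep entries that are already present (one path per line).
    entries = []
    if os.path.exists(alternates):
        with open(alternates) as f:
            entries = [line.strip() for line in f if line.strip()]

    # Appending twice would be harmless but noisy, so skip duplicates.
    if origin_objects not in entries:
        entries.append(origin_objects)
        with open(alternates, "w") as f:
            f.write("\n".join(entries) + "\n")
    return alternates


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as site:
        origin = os.path.join(site, "origin.git")  # hypothetical names
        fork = os.path.join(site, "fork.git")
        os.makedirs(os.path.join(origin, "objects"))
        os.makedirs(os.path.join(fork, "objects"))
        with open(link_alternate(fork, origin)) as f:
            print(f.read(), end="")
```

The absolute path is a choice made for the sketch; it keeps the link valid if the fork is later moved, but not if the origin is.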


-Matthias

Jacek Centkowski

Jul 1, 2016, 11:36:37 AM
to Repo and Gerrit Discussion, david.pu...@gmail.com, eke...@google.com, geminica...@gmail.com

That looks promising :) Is it already available in version 4.3.0.201604071810-r (the version used by current master, if I am not wrong ;))? Could we try to use it?

Dave Borowitz

Jul 1, 2016, 12:00:28 PM
to Matthias Sohn, David Pursehouse, Edwin Kempin, Jacek Centkowski, Repo and Gerrit Discussion
The bigger issue with alternates, which might be fixed in a recent git but I don't think so, is you need to prevent objects from being pruned during gc if they are reachable only via an alternates link from another repo.

IIRC, for this reason GitHub actually stores all forks of a repo in a single repository with a single shared ref namespace, and does refname translation in their wire protocol proxy.
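To make the refname translation idea concrete, here is a rough sketch of how a proxy might map a fork's refs into one shared namespace and back. The refs/forks/<fork-id>/ layout and all function names are assumptions made up for this example; nothing in this thread describes the actual scheme GitHub uses.

```python
SHARED_PREFIX = "refs/forks/"  # hypothetical layout, not a known real one


def to_shared(fork_id, ref):
    """Map a ref as the fork's client sees it to its name in the shared repo."""
    if not ref.startswith("refs/"):
        raise ValueError("expected a full ref name, got %r" % ref)
    return SHARED_PREFIX + fork_id + "/" + ref[len("refs/"):]


def to_fork(fork_id, shared_ref):
    """Map a shared ref back to the fork's view, or None if it is not ours."""
    prefix = SHARED_PREFIX + fork_id + "/"
    if not shared_ref.startswith(prefix):
        return None
    return "refs/" + shared_ref[len(prefix):]


def advertise(fork_id, shared_refs):
    """Return only the refs a client of this fork should see, renamed."""
    visible = {}
    for name, sha in shared_refs.items():
        local = to_fork(fork_id, name)
        if local is not None:
            visible[local] = sha
    return visible
```

Because every fork's refs live in the same repository, a gc of that repository sees all of them as roots, which is the property the single-repository design is after.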
 


Luca Milanesio

Jul 1, 2016, 4:08:44 PM
to Jacek Centkowski, Matthias Sohn, David Pursehouse, Edwin Kempin, Repo and Gerrit Discussion, Dave Borowitz
My 2c: do we really need forks with Gerrit?

We discussed the point a few years ago and we came to the conclusion that forks are very much linked to the pull-request review model.
Having lots of forks to manage is not a plus ... it is a big MINUS in administration overhead :-(

Luca.

Martin Fick

Jul 1, 2016, 4:50:14 PM
to repo-d...@googlegroups.com, Dave Borowitz, Matthias Sohn, David Pursehouse, Edwin Kempin, Jacek Centkowski
On Friday, July 01, 2016 12:00:04 PM 'Dave Borowitz' via
We have defined a custom feature in our Gerrit server called
"siblings". The feature allows us to associate different
projects with each other, and it allows them to reference
each other's objects via the alternates mechanism. It does
not do exactly what you want, but I figure it is worth giving
a real world example of using jgit alternates on one of the
heaviest used Gerrit servers in the world.

Our approach does not attempt to minimize data storage;
rather, we use it as a mechanism to separate the ref
namespaces, yet still make it easy to copy changes
between repositories. We have additional sibling features
such as a "sibling:" search operator that operates as an
ORing of the project names.

Each repository has a complete copy of all the objects it
needs, and has its own gc applied to it. The object sharing
primarily allows us to copy patch sets from one repository
to the other without having to copy the objects at that time
(they will end up getting copied over on the next gc cycle).

The alternates setup we use is one where we make a symlink
in each sibling to a common alternates file which lists all
the common siblings in that set. This introduces loops in
the alternates architecture which jgit cannot handle. We
have custom jgit patches which fix this. We do notice some
performance degradations due to the extra pack files that
jgit needs to search through when selecting which copy of an
object to send over the wire. I have experimented with a
few patches to fix this; it can be improved. Some ended up
being the same as some of the patches that the distributed
jgit version added. I think jgit could stand to have a few
more optimizations in this area, I just haven't settled on
something I am comfortable deploying yet.
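The loop problem can be illustrated with a small sketch: when every sibling lists every sibling (itself included), a naive recursive walk over the alternates files never terminates, while a walk that remembers visited directories does. This is only an illustration under the assumption of plain alternates files on a local filesystem; it is not the custom jgit patch mentioned above, and the function names are hypothetical.

```python
import os


def read_alternates(objects_dir):
    """Return the object directories listed in objects_dir/info/alternates."""
    path = os.path.join(objects_dir, "info", "alternates")
    if not os.path.exists(path):
        return []
    entries = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line:
                # Relative entries are resolved against the objects directory;
                # os.path.join leaves absolute entries unchanged.
                entries.append(os.path.join(objects_dir, line))
    return entries


def reachable_object_dirs(start_objects_dir):
    """Collect every object directory reachable via alternates, loop-safe."""
    seen = set()
    order = []
    stack = [os.path.realpath(start_objects_dir)]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # already visited: this check is what breaks the loop
        seen.add(current)
        order.append(current)
        stack.extend(os.path.realpath(p) for p in read_alternates(current))
    return order
```

Using the resolved real path as the key means a directory reached through a symlink is still recognized as already visited.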

To put things in perspective, our 7 minute clones of
kernel/msm might rise to about 9 minutes with the extra
alternates (well packed, no bitmaps). The degradation gets
worse as the number of packfiles increases. Regular
repacking due to uploads is required, as if the repos were
all a single repo. Since with bitmaps you would be used to
faster times, the degradation I speak of above will
likely become a greater portion of the overall time to do
things, making it more ripe for further optimizations.


> > Though using this option to share objects between forks
> > of the same repository would make the handling of
> > repository lifecycle more complex, so e.g. the
> > delete-project plugin would need to learn that it
> > should not delete a repository used via alternates by
> > another repository instance.
>
> The bigger issue with alternates, which might be fixed in
> a recent git but I don't think so, is you need to prevent
> objects from being pruned during gc if they are reachable
> only via an alternates link from another repo.

Right, this is why we let each repo get its own copy of its
objects.


-Martin


--
The Qualcomm Innovation Center, Inc. is a member of Code
Aurora Forum, hosted by The Linux Foundation

Oswald Buddenhagen

Jul 4, 2016, 4:42:50 AM
to Repo and Gerrit Discussion
On Fri, Jul 01, 2016 at 03:51:54PM +0200, Matthias Sohn wrote:
> JGit supports alternate object directories which can be set using
> BaseRepositoryBuilder.addAlternateObjectDirectory() [1]
> or via objects/info/alternates [2].

> Caveat: I didn't yet try using this feature myself.
>
i did.
https://bugs.chromium.org/p/gerrit/issues/detail?id=3454

Jacek Centkowski

Jul 8, 2016, 3:46:09 AM
to Repo and Gerrit Discussion, geminica...@gmail.com
Edwin, considering all the answers in this thread, your suggestion seems to be a pretty good idea that is quite simple to achieve - thanks for pointing it out :)

Regards
Jacek