On Friday, July 01, 2016 12:00:04 PM 'Dave Borowitz' via
We have defined a custom feature in our Gerrit server called
"siblings". The feature allows us to associate different
projects with each other, and it allows them to reference
each other's objects via the alternates mechanism. It does
not do exactly what you want, but I figure it is worth giving
a real-world example of using jgit alternates on one of the
most heavily used Gerrit servers in the world.
Our approach does not attempt to minimize data storage;
rather, we use it as a mechanism to separate the ref
namespaces while still making it easy to copy changes
between repositories. We have additional sibling features,
such as a "sibling:" search operator that operates as an
ORing of the project names.
Each repository has a complete copy of all the objects it
needs, and has its own gc applied to it. The object sharing
primarily allows us to copy patch sets from one repository
to the other without having to copy the objects at that time
(they will end up getting copied over on the next gc cycle).
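
As a rough illustration of that copy-now, materialize-at-gc behavior, here is a toy Python model. It is not the actual Gerrit or jgit code: the Repo class, its methods, and the flat "ref points at one object" store are all simplifying assumptions made up for this sketch, and the ref name is only an example.

```python
class Repo:
    """Toy model of a repository: a local object store plus alternates."""

    def __init__(self, name):
        self.name = name
        self.objects = {}      # object id -> content stored locally
        self.alternates = []   # sibling repos whose objects are also visible
        self.refs = {}         # ref name -> object id (hypothetical flat model)

    def lookup(self, oid):
        """Find an object locally first, then in any sibling."""
        if oid in self.objects:
            return self.objects[oid]
        for sibling in self.alternates:
            if oid in sibling.objects:
                return sibling.objects[oid]
        raise KeyError(oid)

    def gc(self):
        """Keep a local copy of every object this repo's refs point at.

        Objects only visible through a sibling get copied in; local
        objects that no ref points at are dropped.
        """
        self.objects = {oid: self.lookup(oid) for oid in set(self.refs.values())}


src, dst = Repo("src"), Repo("dst")
dst.alternates.append(src)

# A patch set exists in src only.
src.objects["abc"] = "patch set content"
src.refs["refs/changes/01/1/1"] = "abc"

# "Copying" the patch set to dst writes just the ref; the object is
# still readable because dst can see src's objects via alternates.
dst.refs["refs/changes/01/1/1"] = "abc"
borrowed = dst.lookup("abc")

# After dst's own gc the object lives in dst as well.
dst.gc()
```

In this model, once dst.gc() has run, dst no longer depends on src for that object, which is the property that makes per-repository gc safe.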
The alternates setup we use is one where we make a symlink
in each sibling to a common alternates file which lists all
the common siblings in that set. This introduces loops in
the alternates architecture which jgit cannot handle. We
have custom jgit patches which fix this. We do notice some
performance degradations due to the extra pack files that
jgit needs to search through when selecting which copy of an
object to send over the wire. I have experimented with a
few patches to fix this, and it can be improved. Some ended up
being the same as some of the patches that the distributed
jgit version added. I think jgit could stand to have a few
more optimizations in this area; I just haven't settled on
something I am comfortable deploying yet.
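
To make the loop problem concrete, here is a minimal Python sketch of the layout described above and of a traversal that tolerates it. It is not the jgit patch itself, only an illustration of the idea; the function names and the "siblings.alternates" file name are hypothetical. It relies on the standard git convention that objects/info/alternates lists one object directory per line.

```python
import os
import tempfile


def read_alternates(objects_dir):
    """Return the object directories listed in objects/info/alternates."""
    path = os.path.join(objects_dir, "info", "alternates")
    if not os.path.exists(path):
        return []
    with open(path) as fh:
        entries = [line.strip() for line in fh if line.strip()]
    # Relative entries are resolved against the objects directory.
    return [os.path.realpath(os.path.join(objects_dir, e)) for e in entries]


def all_object_dirs(objects_dir):
    """Walk alternates depth-first; the visited set makes loops harmless."""
    seen, order, stack = set(), [], [os.path.realpath(objects_dir)]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # already visited: this is where a loop is cut
        seen.add(current)
        order.append(current)
        stack.extend(reversed(read_alternates(current)))
    return order


def make_sibling_set(root, names):
    """Create repo skeletons whose alternates are symlinks to one shared file."""
    shared = os.path.join(root, "siblings.alternates")  # hypothetical name
    dirs = [os.path.join(root, n + ".git", "objects") for n in names]
    with open(shared, "w") as fh:
        fh.write("\n".join(dirs) + "\n")
    for d in dirs:
        os.makedirs(os.path.join(d, "info"))
        os.symlink(shared, os.path.join(d, "info", "alternates"))
    return dirs


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    dirs = make_sibling_set(root, ["a", "b", "c"])
    for d in all_object_dirs(dirs[0]):
        print(d)
```

Because the shared file lists every sibling, each repository ends up naming itself and all the others as alternates. A walker without the visited set would never terminate on this layout.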
To put things in perspective, our 7 minute clones of
kernel/msm might rise to about 9 minutes with the extra
alternates (well packed, no bitmaps). The degradation gets
worse as the number of packfiles increases. Regular
repacking due to uploads is required, as if the repos were
all a single repo. Since you would be used to faster times
with bitmaps, the degradation I speak of above will
likely become a greater portion of the overall time to do
things, making it more ripe for further optimizations.
> > Though using this option to share objects between forks
> > of the same repository would make the handling of
> > repository lifecycle more complex, so e.g. the
> > delete-project plugin would need to learn that it
> > should not delete a repository used via alternates by
> > another repository instance.
>
> The bigger issue with alternates, which might be fixed in
> a recent git but I don't think so, is you need to prevent
> objects from being pruned during gc if they are reachable
> only via an alternates link from another repo.
Right, this is why we let each repo get its own copy of its
objects.
-Martin
--
The Qualcomm Innovation Center, Inc. is a member of Code
Aurora Forum, hosted by The Linux Foundation