Scripts for migrating from SVN to git?


Graeme Stewart

Jan 29, 2016, 6:25:02 AM1/29/16
to HEP Software Foundation Technical Forum
Hi

Thanks to Benedikt for suggesting posting to this forum - it seems like a good list on which to ask a question about backend SCM migration.

ATLAS are now thinking seriously about a migration from SVN to git; however, there's no authoritative advice we can find on how to migrate from our current SVN structure (based on package tags) to a more git-like structure of branches and tags.

Currently in SVN we have this kind of structure (which I think was quite common in the community, hence the question to this list):

/PackageGroupA/PackageA1/{trunk,tags,branches}/real_code_here
/PackageGroupA/PackageA2/{trunk,tags,branches}/
/PackageGroupB/SubGroupB1/PackageB1.1/{trunk,tags,branches}/
/PackageGroupB/SubGroupB1/PackageB1.2/{trunk,tags,branches}/
/PackageGroupB/SubGroupB1/PackageB1.3/{trunk,tags,branches}/
/PackageGroupB/SubGroupB2/PackageB2.1/{trunk,tags,branches}/
/PackageGroupB/SubGroupB2/PackageB2.2/{trunk,tags,branches}/
/PackageGroupC/PackageC1/{trunk,tags,branches}/

Releases are assembled from a collection of package tags, which are held externally in our tag collector, i.e.,

Release 20.1.0.1 == PackageA1-X1-Y1-Z1 + PackageA2-X2-Y2-Z2 + ...

(Evidently we have access to these tag lists!)

We have various major release branches, each of which generally collects revisions like this:

20.1 -> 20.1.0.1, 20.1.0.2, 20.1.1.1, 20.1.1.2
20.7 -> 20.7.0.1, 20.7.1.1, 20.7.2.1, 20.7.2.2, 20.7.2.3

etc.

What we tentatively envisage is summarised like this (sorry if some of this is obvious, but just to spell it out):

Git would be structured by getting rid of all the SVN trunk, tags and branches directories:

/PackageGroupA/PackageA1/real_code_here
/PackageGroupA/PackageA2/
/PackageGroupB/SubGroupB1/PackageB1.1/
/PackageGroupB/SubGroupB1/PackageB1.2/
/PackageGroupB/SubGroupB1/PackageB1.3/
/PackageGroupB/SubGroupB2/PackageB2.1/
/PackageGroupB/SubGroupB2/PackageB2.2/
/PackageGroupC/PackageC1/

And we would create:

- A branch for each release series (obviously with the code from those tags which went into the release!)
- A tag for each point in that release series when we actually built a numbered release

The git master branch naturally contains the current SVN trunk revisions.

The branch/tag then uniquely identifies the code that went into a numbered release (we don't bother even trying to store the specific package tags, as we believe they become mostly irrelevant). However, we would like to store the proper commit history for packages between releases 20.1.X.Y and 20.1.X.Y+1, otherwise "blame" operations become low resolution.

Clearly this can be done; however, none of the tools that we've looked at seems to really fit with doing this migration at the scale of 2500 leaf packages. In addition, this migration requires some mashup of information in SVN and information from our tag collector, so we will not find a tool that manages it all 'off the shelf'.

So my question here is: what experience have people had with a migration of this style of repository? Did you find any particular SVN migration tool that at least helped? Or did you write your own script to do it (and would you share it)?

Oh, and generally, if you have any comment on the strategic plan, endorsement or criticism, we'd like to hear it.

Thanks a lot

Graeme

Michel Jouvin

Jan 29, 2016, 8:09:42 AM1/29/16
to hep-sf-t...@googlegroups.com
Hi Graeme,

Very good initiative! The topic you raised is huge... Quickly, I wanted to share a modest experience compared to the ATLAS scale, but maybe with some useful ideas. This is what we did when we migrated the Quattor project from SourceForge/SVN to GitHub. On SF we had one big SVN repo, with a structure close to yours (in particular, not based on the usual trunk/branches/tags at the very top). We decided to split this into a (large) number of repos, some dedicated to one component, others used for several related things.

Our approach involved basically 2 steps... It worked pretty well for us; I don't know if it can work at the ATLAS scale (due to the time it may require):

1- Dump a package's full history from the repo (rather than dumping the full repo) to a Git repo using git svn.
2- Use git filter-branch to process the created Git repo and rewrite the history, renaming files according to the new scheme/layout you want on the Git side.
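These two steps can be sketched with git commands. Step 1 needs a real SVN server, so it appears only as a comment with a hypothetical URL; step 2 is demonstrated on a tiny fabricated repo standing in for the git-svn import (the index-filter recipe is the one from the git-filter-branch man page; GNU sed assumed):

```shell
#!/bin/sh
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1
# Step 1 (sketch only -- needs a real SVN server; URL is hypothetical):
#   git svn clone https://svn.example.org/repo/PackageGroupA/PackageA1 \
#       -T trunk -b branches -t tags PackageA1
# For step 2, fabricate a tiny Git repo standing in for the git-svn import:
base=/tmp/filterbranch-demo; rm -rf "$base"; mkdir -p "$base"; cd "$base"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=d@e \
       GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=d@e
git init -q PackageA1 && cd PackageA1
echo hello > real_code.cxx
git add real_code.cxx
git commit -qm "add code"

# Step 2: rewrite every commit, moving all files under the new layout
# (index-filter recipe from the git-filter-branch man page):
git filter-branch --index-filter '
  git ls-files -s | sed "s|\t\"*|&PackageGroupA/PackageA1/|" |
  GIT_INDEX_FILE=$GIT_INDEX_FILE.new git update-index --index-info &&
  mv "$GIT_INDEX_FILE.new" "$GIT_INDEX_FILE"
' HEAD
git reset -q --hard

git ls-files    # now: PackageGroupA/PackageA1/real_code.cxx
```

The history (number of commits, messages, diffs) is preserved; only the paths change.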

The main issue is to find the appropriate Git granularity, as sharing a component between several Git repos is not necessarily easy (even if not impossible).

After that, based on our experience, dealing with a release involving a high number of repos is not especially difficult, even if it certainly involves developing a tool/script to collect things from the many repos.

If you are interested, I could try to refresh my memory and share more details later...

Cheers,

Michel
--
You received this message because you are subscribed to the Google Groups "HEP Software Foundation Technical Forum" group.
To unsubscribe from this group and stop receiving emails from it, send an email to hep-sf-tech-fo...@googlegroups.com.
To post to this group, send email to hep-sf-t...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/hep-sf-tech-forum/73e545aa-6db5-417b-a329-d9198a16c84d%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Brett Viren

Jan 29, 2016, 12:07:54 PM1/29/16
to Graeme Stewart, HEP Software Foundation Technical Forum
Hi Graeme,

git-svn makes a remote Subversion repository almost usable as-is,
although merging is still much more difficult than in a pure Git
environment.

It's been enough for me to coexist more or less happily with Daya Bay's
SVN over its lifetime of about 8 years. This includes everyday
committing as well as some mirroring/merging of external SVN (and CVS)
repositories such as Gaudi.

That said, in the case of ATLAS, with its larger code base and longer
experiment lifetime, I can fully see the benefits of switching from
Subversion to Git on the remote side.


A few ideas. You may already know them. Hopefully some help:


You can fairly easily map each of your PackageX# trees in SVN to an
individual Git repository using git-svn. This can be a one-time
conversion or, with some persistent effort, an ongoing bi-directional
connection.

git-svn can be told to honor the trunk,tags,branches convention. SVN
commits that happen to span packages will appear in each Git repository
but with only the relevant files affected.

I would not recommend making a single Git repository from all of SVN
because Git branches and tags are atomic to the repository and not
directory based like in SVN (actually, SVN doesn't *really* have a
branch/tag mechanism at all as it conflates directory hierarchy with
concepts of branches and tags).

So, the end result is to have many, individual Git repos defined at the
level of your PackageX# sub-directories. You will then need something
else to aggregate an overall suite of packages.

Given what you describe I think git-submodules can be a good fit to
provide this aggregation. You probably know of these already, but they
are implemented as a normal Git repository which manages what can be
thought of as symlinks into one or more "real" Git repositories.
Specifically the submodule is a pointer to a specific commit in a remote
Git repo. If you have used it, SVN's "externals" feature is similar
although git-submodules are far less clunky.

git-submodules would allow you to express your current hierarchy. You'd
use git-svn to extract each Package tree into a git repo and then build
up a submodule hierarchy matching your current directory tree.

Then, doing a "git clone --recursive" for the top-level would give you a
working directory tree that matches your existing directory hierarchy.

Note that unlike SVN, git-submodules do not restrict you to expressing
just one hierarchy. You could, for example, make one Git repo that had
any subset of your Packages repositories as submodules regardless as to
where they are in the original SVN hierarchy.
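A minimal, self-contained sketch of this submodule aggregation, using throwaway local repositories in place of the real per-package remotes (all names and paths are illustrative):

```shell
#!/bin/sh
set -e
# Throwaway local repos stand in for real per-package remotes.
base=/tmp/submodule-demo; rm -rf "$base"; mkdir -p "$base"; cd "$base"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=d@e \
       GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=d@e

# A "real" package repository:
git init -q PackageA1
(cd PackageA1 && echo code > real_code.cxx && git add . && git commit -qm "PackageA1 code")

# The aggregating repository pins PackageA1 at a specific commit:
git init -q PackageGroupA
cd PackageGroupA
git commit -q --allow-empty -m "aggregating repo"
git -c protocol.file.allow=always submodule add "$base/PackageA1" PackageA1
git commit -qm "add PackageA1 submodule"
git submodule status    # shows the exact commit PackageA1 is pinned to
```

`git submodule status` in the aggregating repo reports the exact commit each package is pinned to, which is what a release tag on the aggregating repo would capture.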


You will likely want some of your Git repos that hold git-submodules
to play double duty by also holding build files (e.g., a cmt/ subdir
or CMakeLists.txt).


You will probably want to script something to translate your global
release information:

> Release 20.1.0.1 == PackageA1-X1-Y1-Z1 + PackageA2-X2-Y2-Z2 + ...

into tags on the Git repositories holding git-submodules. This
translation would follow a recipe something like:

1) "git clone --recursive <PackageGroupA.remote.url>"

2) "cd PackageGroupA/PackageA1 && git checkout X1-Y1-Z1"

3) etc for the other packages....

4) "cd PackageGroupA/ && git commit" to record the checkout state of the submodules.

5) "git tag -am 'release message' 20.1.0.1" to make your release tag.



There are, of course, other Git aggregation methods to look into if you
haven't already. I know of three in particular which are useful in some
contexts: git-subtree, Google's "repo", "myrepos". But, I think
git-submodules fits your case fairly well.

-Brett.

Graeme Stewart

Feb 5, 2016, 11:45:55 AM2/5/16
to Brett Viren, HEP Software Foundation Technical Forum
Hi Brett

Thanks a lot for your reply - and sorry it took so long to get back to you. I think the git migration is a Friday PM activity for me at the moment.

On 29 January 2016 at 18:07, Brett Viren <brett...@gmail.com> wrote:
> Hi Graeme,
>
> git-svn makes a remote Subversion repository almost usable as-is.
> Although, merging is still much more difficult than a pure Git
> environment.
>
> It's been enough for me to coexist more or less happily with Daya Bay's
> SVN over its lifetime of about 8 years. This includes everyday
> committing as well as some mirroring/merging of external SVN (and CVS)
> repositories such as Gaudi.
>
> That said, in the case of ATLAS with its larger code base and longer
> experiment lifetime, I can fully see the benefits of switching from
> Subversion to a Git on the remote side.

In fact we also see developers who use "git svn" to manage their packages with the current ATLAS SVN master repository. It's a convenient way to work, but evidently migrating the whole repository is rather a different issue.

> A few ideas. You may already know them. Hopefully some help:
>
> You can fairly easily map each of your PackageX# trees in SVN to an
> individual Git repository using git-svn. This can be a one-time
> conversion or, with some persistent effort, an ongoing bi-directional
> connection.

Bi-directional we would avoid. This will have to be a big-bang migration, and we will switch all active development to git. We'll keep the old SVN repository for historical purposes.

> git-svn can be told to honor the trunk,tags,branches convention. SVN
> commits that happen to span packages will appear in each Git repository
> but with only the relevant files affected.
>
> I would not recommend making a single Git repository from all of SVN
> because Git branches and tags are atomic to the repository and not
> directory based like in SVN (actually, SVN doesn't *really* have a
> branch/tag mechanism at all as it conflates directory hierarchy with
> concepts of branches and tags).

Actually, having a single tag that snapshots the offline code repository in its entirety before we build a release is quite desirable for us. We'd specifically rather avoid the problem of gluing together package tags into a release, which is really quite awkward.

> So, the end result is to have many, individual Git repos defined at the
> level of your PackageX# sub-directories. You will then need something
> else to aggregate an overall suite of packages.
>
> Given what you describe I think git-submodules can be a good fit to
> provide this aggregation. You probably know of these already, but they
> are implemented as a normal Git repository which manages what can be
> thought of as symlinks into one or more "real" Git repositories.
> Specifically the submodule is a pointer to a specific commit in a remote
> Git repo. If you have used it, SVN's "externals" feature is similar
> although git-submodules are far less clunky.
>
> git-submodules would allow you to express your current hierarchy. You'd
> use git-svn to extract each Package tree into a git repo and then build
> up a submodule hierarchy matching your current directory tree.

Many people suggest this, but in fact it turns out that this doesn't really capture our workflow anyway. More than 2/3 of our current tag inclusion requests span multiple packages. This means that a coherent code change would become multiple git commits in different submodules, and you would need some external tool to glue/revert that group together. One of the driving issues for us is to simplify, and the present system is already too complex.

Another great feature of single spanning commits is that it makes automated testing a lot easier.

Having looked at scale issues, we're about 50% of the Linux kernel's SLOC, so we believe git scales quite easily to our size.

The problem of the size of a clone (yes, this is a drawback) can at least be mitigated with shallow checkouts.
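A shallow clone fetches only recent history, which is what keeps the checkout small. A self-contained demo, with a toy local repo standing in for the real remote:

```shell
#!/bin/sh
set -e
# Toy repo with three commits, then a shallow clone fetching only the newest.
base=/tmp/shallow-demo; rm -rf "$base"; mkdir -p "$base"; cd "$base"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=d@e \
       GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=d@e
git init -q full
(cd full && for i in 1 2 3; do echo "$i" > f; git add f; git commit -qm "commit $i"; done)

git clone -q --depth 1 "file://$base/full" shallow
cd shallow
git rev-list --count HEAD    # -> 1: only the latest commit was fetched
```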

Thanks for all the suggestions though - it's good to bat them about!

Cheers

Graeme



--
Graeme Stewart 
University of Glasgow PPE @ CERN - 1-R-019  x75950

Michel Jouvin

Feb 5, 2016, 11:58:51 AM2/5/16
to hep-sf-t...@googlegroups.com
Hi Graeme,

This is certainly a topic with no definitive and perfect solution. I understand your point that one big repo is easier for the tagging/release process but, like Brett, I would explore all the ways to avoid it... The daily drawbacks for contributors could outweigh the release complexity of multiple repos.

Clearly, to deal with releases involving multiple repos you need a script/tool that does the tagging across them. But my experience is that this is not that complex, and you are helped by the fact that Git is a DVCS. Tags are something you do first on your clone (used to build the release), not on the reference repo. This means that if something goes wrong in your clone, you can just throw the tags away; once the release is complete in your clone, it is just a matter of a push, which can be redone if it fails at some point for one repo.

Cheers,

Michel

Brett Viren

Feb 5, 2016, 1:08:38 PM2/5/16
to Graeme Stewart, HEP Software Foundation Technical Forum
Hi again,

I'm not trying to convince you one way or the other, but just a couple
of things to clarify/respond to:

Graeme Stewart <graeme...@gmail.com> writes:

> Many people suggest this, but in fact it turns out that this doesn't
> really capture our workflow anyway. More than 2/3 of our current tag
> inclusion requests span multiple packages.

The fact that you say 2/3 and not 100% tells me that your general case
involves placing tags on a subset of the full code base. Since tags in
git are atomic to the repository, I see no way to satisfy this
requirement without having multiple git repositories.

Or, if you can "round up" your requirement to say you will *always*
tag/branch 100% of the code base then, absolutely, a single git repo is
a far easier way to go. (Although, you'll then have to reconcile your
existing package-level branches and tags in SVN somehow, at least if you
want to keep that full history).


> Then this means that a coherent code change would become multiple git
> commits into the different submodules and you need some external tool
> to glue/revert that group together. One of the driving issues we have
> is to simplify and the present system is already too complex.

git-submodules are provided internally by the usual command-line "git"
program (at least; I don't know about all Git implementations).

> Another great feature of single spanning commits is that is makes
> automated testing also a lot easier.

You can have your CI trigger off commits and/or tags made to the
aggregating git repo holding the submodules.

> Having looked at scale issues, we're about 50% of the linux kernel
> SLOC so we believe git scales quite easily to our size.

In my mind, it's not so much the size in SLOC which drives the choice of
repo granularity. I'm not a deep expert but I believe Linux is
developed intentionally as a monolithic code base. There it makes sense
for a single repo (regardless of code size).

Just for fun I counted my current project which spans ~15 repositories
averaging just less than 1k SLOC C++ per repo and which I aggregate with
git-submodules. So far, this is a single-developer code base.

Blindly interpolating between those two wildly different data points, it
tells me the smaller the project the MORE repos it needs. :)

But, seriously, I make this fine-grained choice because I expect this
project to grow and be used in a way where different repositories evolve
at different rates and multiple top-level aggregation repositories will
provide different "views" of the available "real" git repos. I
need/want to tag, test, and build the "core" repos in isolation, and then
separately also various larger groupings which build different
application-level targets.


-Brett.

Marco Cattaneo

Feb 5, 2016, 1:52:56 PM2/5/16
to Michel Jouvin, hep-sf-t...@googlegroups.com
We've had this discussion also in LHCb, with similar conclusions on the pros and cons of the two approaches.

One aspect that worries me more about the monolithic approach is the different life cycles of our different 'projects': all our applications share the same basic framework (LHCb extensions to Gaudi, various filtering tools, etc.), which is in continuous development. The reconstruction is also in continuous development, but at some point we take a snapshot to be used in a given year's HLT, after which we continue to develop on the SVN head for the following year, although we might want to include specific enhancements straight away (which we do by selecting specific tags of specific CMT packages, which can be either on the SVN head or on branches).

I can see how Git enables all this, but I don't see how we can easily delegate the management of e.g. the HLT incremental releases to someone different from the overall "master" of the whole repository. Or, to put it another way: how can different people build new applications releases selecting from overlapping subsets of the pool of pull requests? And how can the validation of such selective integration be automated?

Marco

Brett Viren

Feb 5, 2016, 6:37:02 PM2/5/16
to Marco Cattaneo, Michel Jouvin, hep-sf-t...@googlegroups.com
Hi Marco,

Marco Cattaneo <mca...@gmail.com> writes:

> I can see how Git enables all this, but I don't see how we can easily
> delegate the management of e.g. the HLT incremental releases to
> someone different from the overall "master" of the whole repository.
> Or, to put it another way: how can different people build new
> applications releases selecting from overlapping subsets of the pool
> of pull requests? And how can the validation of such selective
> integration be automated?

I'm not sure I understand but maybe what is needed is a branch/merge
convention? "git flow" is one that is fairly popular and has actual git
command support to help enforce the convention.

http://nvie.com/posts/a-successful-git-branching-model/

I think also your question may have some ties to the particular build
system.

If the aggregation itself must exist in different states one can either
branch the aggregation repository or have multiple aggregation
repositories. Of course, being just Git, the aggregation repositories
themselves can follow "git flow" or some other branch convention.

-Brett.

Michel Jouvin

Feb 6, 2016, 4:11:00 AM2/6/16
to Brett Viren, Marco Cattaneo, hep-sf-t...@googlegroups.com
Hi,

I have renamed the topic of this thread to better reflect what is
discussed. I think this also reflects that the real issue is not the
SVN -> Git migration itself but, amongst the many possibilities of Git
repo layout, which one you choose.

As far as the SVN -> Git migration is concerned, I think this is a
one-off thing where you can afford a bit of complexity if you are
convinced that your target Git repo layout/structure is the appropriate
one for your project. I don't know of any project that was able to do
the migration with git-svn only, as it basically reproduces your SVN
structure, which is often not appropriate. git-svn is the way to get
the full SVN history for a repo (or a subpart of a repo) in Git format,
but your real friend there is git filter-branch, which lets you cycle
through all the commits retrieved from SVN and rewrite them after doing
anything you want, in particular renaming files. This way you can
automate having a different file namespace in Git than in SVN, while
keeping a history that looks the same (with the same number of commits
and the same commit messages/file diffs...). For this step you certainly
need to spend some effort, but again, it is a one-off thing. And I agree
that in the migration you should get rid of the "SVN things" imported
by git-svn. Generally the easiest way to achieve this is to clone the SVN
repo as a Git repo with git-svn and then clone the "git-svn" repo with git.
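That final clean-up can be illustrated in a self-contained way: repo-local configuration (such as the svn-remote section git-svn writes) does not survive a plain clone, so re-cloning the git-svn import yields a plain Git repo. The svn-remote.svn.url value below is a fabricated stand-in for what git-svn would have written:

```shell
#!/bin/sh
set -e
# Fabricated repo with a git-svn-style leftover config entry.
base=/tmp/reclone-demo; rm -rf "$base"; mkdir -p "$base"; cd "$base"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=d@e \
       GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=d@e
git init -q imported
(cd imported && echo x > f && git add f && git commit -qm "import" &&
 git config svn-remote.svn.url https://svn.example.org/repo)  # git-svn-style leftover

# Re-cloning sheds the repo-local SVN bookkeeping:
git clone -q imported clean
cd clean
git config svn-remote.svn.url || echo "no svn remnants"   # prints: no svn remnants
```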

As it turned out in this thread, I think the main topic is the repo
structure vs. your packages/components and your envisioned workflow. If
your project is big but there is a very limited set of "managers" who
manage all aspects of the project, this may limit the traction for
multiple repos (even if there are many other reasons to do it). But if
you have a project made of several/many subprojects with different
people responsible for the different parts, splitting the project into
multiple repos is a must, IMO. You should take into account that the
main access-rights granularity in Git is the repo (even though this can
be refined with the gitolite layer, this is not necessarily exposed
through tools like GitHub and GitLab). Having multiple repos means that
a person A can be the master of repo X but only a contributor without
specific rights in repo Y.

One other strong reason, already mentioned, for splitting a big repo into
multiple smaller ones is CI. If you want to implement some sort of
testing as part of the pull/merge request workflow in GitHub or GitLab,
you need the tests to run in a reasonable amount of time. The first
step in CI is a checkout of the modified repo: the bigger, the longer.
Also, with one big repo it may be an additional complexity to select the
tests relevant to the commits, compared to a smaller one where you run
all the repo's tests at every pull request. Note that having one repo
depend on another one for running the tests is not necessarily a
problem for CI: your CI scripts can check out repos in addition to the
one modified.

Another point to keep in mind, IMO, is that the repo structure should not
make building releases complicated, but neither should it be constrained
by what a release is (in terms of file layout). The release process
should rely on a script/tool that does whatever is required with the
repos, instead of relying directly on the repo contents. Based on my
experience (with the Quattor project mentioned in an earlier email),
this is not overwhelmingly complex. In Quattor, for example, we have 35
repos (at least 25 involved in a release) and a release script which is
a bash script of 300 lines. This script runs the build tools in all
repos, updates one repo with the "include files" provided by others,
tags the repos in a consistent way (not necessarily the same tag for
each repo), builds the packages... I can imagine that an LHC experiment
going this way may need something a little more complex, but probably
not by an order of magnitude...
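The tagging pass of such a release script reduces to a loop over the repos. This self-contained sketch tags a few throwaway local repos consistently; repo names and the version number are illustrative, and the build and push steps are omitted:

```shell
#!/bin/sh
set -e
# Throwaway repos stand in for the ~35 real ones.
base=/tmp/multitag-demo; rm -rf "$base"; mkdir -p "$base"; cd "$base"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=d@e \
       GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=d@e
RELEASE=16.4.0
for r in repoA repoB repoC; do
    git init -q "$r"
    (cd "$r" && echo x > f && git add f && git commit -qm "content")
done

# The release pass: one consistent tag per repo; a push step would
# follow only once every repo has been tagged successfully.
for r in repoA repoB repoC; do
    (cd "$r" && git tag -am "release $RELEASE" "$RELEASE")
done
git -C repoA tag    # -> 16.4.0
```

Because tagging happens first on the local clones, a failure anywhere before the final push is recoverable by simply discarding the clones, as described above.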

Back to Marco's question about different applications with different
release cycles (if I understood properly), using overlapping subsets of
the other repos: I think this is much easier to handle with multiple
repos than with a single one. You should take into account that tags and
branches are different objects in Git, and nothing prevents you from
defining application-specific tags in your common repo (all pointing, or
not, to the same branch) to ease the development of other apps. It is
just a matter of defining (and enforcing!) your tag policy. Again, for
projects having multiple repos (most of them!), I think the tagging is
not done manually by a real human, but by a script doing the right,
consistent thing.

As for the question about pull requests that may apply to several
branches: as said by Brett, this is just a matter of another pull
request! After merging your PR to one branch, you can merge the same
commits to another branch (probably doing a cherry-pick of the relevant
commits) and open another PR on the other branch. But in many cases I
think it will be addressed by the previous approach: both subprojects
use the same branch, in fact, but at different revisions/tags.

This answer is way too long!! Hope it helps... Based on my own
experience (I was very reluctant to move to multiple repos in the
Quattor project when it was decided, with the same arguments given
here), when you come from SVN it is always difficult to imagine living
with many different repos. But after a while, you just ask yourself how
you were ever able to live with one git repo!

Cheers,

Michel

Benedikt Hegner

Feb 6, 2016, 5:56:30 AM2/6/16
to Michel Jouvin, brett...@gmail.com, Marco Cattaneo, hep-sf-t...@googlegroups.com, David Lange
Hi Michel,

with you having some experience with multiple repos - how does one tackle layout refactoring that involves moving code between directories that may be contained in different repositories? How do you preserve history?

The problem I see is that one falls into the same trap as before with CVS/SVN and package layout: code is structured according to the responsibilities of single developers, not functionality. Moving to multiple git repositories may make that even worse, as one really carves things in stone instead of now finally being able to fix it.

@David - maybe you could share more about the CMS release integration experience of a single git repo?

Thanks,
Benedikt
________________________________________
From: hep-sf-t...@googlegroups.com [hep-sf-t...@googlegroups.com] on behalf of Michel Jouvin [jou...@lal.in2p3.fr]
Sent: 06 February 2016 10:10
To: brett...@gmail.com; Marco Cattaneo
Cc: hep-sf-t...@googlegroups.com
Subject: Re: (big) single vs. multiple Git repositories



Michel Jouvin

Feb 6, 2016, 6:27:09 AM2/6/16
to Benedikt Hegner, brett...@gmail.com, Marco Cattaneo, hep-sf-t...@googlegroups.com, David Lange
Hi Benedikt,

I don't think there is one single answer to your question... because
there are several ways of doing it! This will probably be reflected in
other replies...

In my experience (but I can easily imagine different ones), moving code
between repos is quite exceptional if you have an appropriate repo structure.

One answer is that you should avoid too fine-grained a repo division,
and find the right balance between having strongly related pieces of
code in the same repo and having a repo where you can define a sensible
group of managers (people who really have responsibility for what is in
the repo). You are right to say that the repo layout should not be
designed only around the "developer layout". And you need to keep in
mind that Git is just one piece; what will really help in
defining/implementing your "responsibility workflow" is products like
GitHub or GitLab, which allow you to distinguish between contributors
(the people who open pull/merge requests, basically as many people as
you want), the code reviewers (basically also as many people as you
want) and subproject managers (the people who have the final
responsibility of saying yes to integrating a modification, and who
should probably remain a limited number per subproject for optimal
management).

Another possible answer is that if you really move (rather than
duplicate) a piece of code across repos, you may decide to start with
one revision of the moved code in the new repo and just note in a
comment that the previous history is in another repo. This is probably
appropriate for quite a few use cases.

(My) last answer is that splitting/merging Git repos is not that
difficult. There are many recipes, depending on what you want to achieve
exactly. It is not as difficult as it was with SVN (not to speak of
CVS), in particular because Git has these nice commit ids (!!!) which
are independent of the sequential order in the history and make moving
commits across repos feasible. git filter-branch is probably your friend
again. For example,
https://help.github.com/articles/splitting-a-subfolder-out-into-a-new-repository/
explains how to extract part of a repo with its history. Integrating
this into another repo is just a matter of adding a remote pointing to
the extracted repo in the destination repo and doing whatever is
appropriate to add it to the history of one branch.
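The extraction recipe in that link boils down to git filter-branch with --subdirectory-filter. A self-contained demo with a fabricated two-package repo (names are illustrative):

```shell
#!/bin/sh
set -e
export FILTER_BRANCH_SQUELCH_WARNING=1
# Fabricated repo containing two packages:
base=/tmp/split-demo; rm -rf "$base"; mkdir -p "$base"; cd "$base"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=d@e \
       GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=d@e
git init -q big
(cd big &&
 mkdir PackageA1 PackageA2 &&
 echo a > PackageA1/f && echo b > PackageA2/g &&
 git add . && git commit -qm "two packages")

# Work on a fresh clone so the original history stays untouched:
git clone -q big PackageA1-only
cd PackageA1-only
git filter-branch --subdirectory-filter PackageA1 HEAD
git reset -q --hard
git ls-files    # -> f : only PackageA1's content and history remain
```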

I am sure I am missing many other possibilities! This need is certainly
not a showstopper for splitting a big repo!

Cheers,

Michel

David Lange

Feb 9, 2016, 3:55:51 PM2/9/16
to Benedikt Hegner, Michel Jouvin, brett...@gmail.com, Marco Cattaneo, hep-sf-t...@googlegroups.com
Hi Benedikt,

> On Feb 6, 2016, at 11:56 AM, Benedikt Hegner <Benedik...@cern.ch> wrote:
>
> Hi Michel,
>
> with you having some experience with multiple repos - how does one tackle layout refactoring that involves moving code between directories that may be contained in different repositories? How do you preserve history?
>
> The problem I see is that one falls into the same trap as before with CVS/SVN and package layout - code is structured according to responsibilities of single developers, not functionality. And with moving to multipe git repositories one may make that even worse. As one really carves things into stone instead of now finally being able to fix it.
>
> @David - maybe you could share more about the CMS release integration experience of a single git repo?
>

Our integration works quite nicely (thanks to Giulio for designing it…). Indeed, our typical pull request would span multiple repositories if we had separated repositories according to the group responsible or by functionality, so in this sense having a single repository is a good match to our development pattern. In contrast to our CVS repository, branches on GitHub are always in a fully consistent state (e.g., the head of each branch is tested twice per day).

Only a handful of us have actual write permission to the repository. For integration, we assigned each piece of software (we call them "packages") to one or more groups. This map defines which groups should sign off on any given pull request (done by assigning GitHub labels and using the GitHub API to watch for people to say "+1"). In the end we decided to have the release team also sign off before anything actually enters the repository, but this works the same way (we are asked once all the package responsibles have said ok).

Git helps a lot with the issue of code organized into single-developer kingdoms, as conflicts amongst developments within the same package are rare (e.g., just at the sub-file level).


David

David Rousseau

Feb 10, 2016, 3:19:39 AM2/10/16
to David Lange, Benedikt Hegner, Michel Jouvin, brett...@gmail.com, Marco Cattaneo, hep-sf-t...@googlegroups.com
Hi,
Chipping in…
In ATLAS there is one piece of functionality of our TagCollector and nightly integration system which we use a lot: the "validation" nightly release.
Developers submit a tag (or a bundle of tags), which is accepted into the validation nightly with little scrutiny; the next morning, a whole suite of integration tests is run on the validation nightly. If everything is OK, the tag (or bundle of tags) is accepted into the main nightly. If there is a problem, the tag is fixed, or rejected, or left there if the problem is due to something else. The main nightly has the same suite of tests but is more stable, so it can be used to look for more subtle bugs, rare crashes or non-trivial performance changes.
Would such a system be possible in git?
David

-----------------------------------------------------------------------------------------------------
| David Rousseau rous...@lal.in2p3.fr skype:droussea |
| LAL-Orsay, U Paris-Sud, CNRS/IN2P3, Université Paris-Saclay, France |
| snail: LAL, CSO Orsay, Bat 200, BP 34, 91898 Orsay Cedex. |
| @ LAL : Bat 200 Pièce 142 : +33 (0)1 64 46 85 91 Skype: droussea |
| @ CERN : 40/1-D12 : +41 (22) 76 73857 |
-----------------------------------------------------------------------------------------------------

Michel Jouvin

Feb 10, 2016, 3:28:59 AM2/10/16
to David Rousseau, David Lange, Benedikt Hegner, brett...@gmail.com, Marco Cattaneo, hep-sf-t...@googlegroups.com
David,

One of the issues is that adding a tag to a repo requires write access to
it. With Git, using something like GitLab or GitHub allows you to have a
limited number of people with write access to the repo (as explained by
David L.), while still allowing many people to submit contributions (via
pull/merge requests). So to implement the workflow you mention, I
think you'll need to use something other than git tags. You could imagine,
for example, a specific label on a pull/merge request that would be used
by your nightly build system to decide that this pull request must be
built, in addition to the main branches.

Michel

David Lange

Feb 10, 2016, 3:33:38 AM2/10/16
to David Rousseau, Benedikt Hegner, Michel Jouvin, brett...@gmail.com, Marco Cattaneo, hep-sf-t...@googlegroups.com
Hi David

Yes - in CMS we made a big advance in this area with git+jenkins, as we lacked anything automated in our CVS+tag collector workflow.

Here is a CHEP presentation on the issue:
https://indico.cern.ch/event/214784/session/7/contribution/223/attachments/341158/476056/rise-of-build-infrastructure.pdf

Cheers-
David