[llvm-dev] [RFC] One or many git repositories?

Justin Lebar via llvm-dev

Jul 20, 2016, 7:40:13 PM
to llvm-dev
Dear all,

I would like to (re-)open a discussion on the following specific question:

Assuming we are moving the llvm project to git, should we
a) use multiple git repositories, linked together as subrepositories
of an umbrella repo, or
b) use a single git repository for most llvm subprojects.

The current proposal assembled by Renato follows option (a), but I
think option (b) will be significantly simpler and more effective.
Moreover, I think the issues raised with option (b) are either
incorrect or can be reasonably addressed.

Specifically, my proposal is that all LLVM subprojects that are
"version-locked" (and/or use the common CMake build system) live in a
single git repository. That probably means all of the main llvm
subprojects other than the test-suite and maybe libc++. From looking
at the repository today that would be: llvm, clang, clang-tools-extra,
lld, polly, lldb, llgo, compiler-rt, openmp, and parallel-libs.

Let's first talk about the advantages of a single repository. Then
we'll address the disadvantages raised.

At a high level, one repository is simpler than multiple repos that
must be kept in sync using an external mechanism. The submodules
solution requires nontrivial automation to maintain the history of
commits in the umbrella repo (which we need if we want to bisect, or
even just build an old revision of clang), but no such mechanisms are
required if we have a single repo.

Similarly, it's possible to make atomic API changes across subprojects
in a single repo, which we simply can't do with the submodules proposal.
And working with llvm release branches becomes much simpler.
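To make the atomicity point concrete, here is a minimal sketch using a throwaway local repository in which llvm/ and clang/ directories stand in for the real subprojects. The file names, the oldAPI/newAPI rename, and the dummy committer identity are all hypothetical.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)
# g: run git in the toy repo with a dummy identity (hypothetical values).
g() { git -C "$repo" -c user.name=dev -c user.email=dev@example.com "$@"; }

g init -q
mkdir -p "$repo/llvm" "$repo/clang"
echo 'void oldAPI();' > "$repo/llvm/API.h"
echo 'oldAPI();' > "$repo/clang/User.cpp"
g add -A
g commit -q -m "Initial import"

# Rename the API in llvm and update its clang caller in ONE commit.
echo 'void newAPI();' > "$repo/llvm/API.h"
echo 'newAPI();' > "$repo/clang/User.cpp"
g commit -q -a -m "Rename oldAPI to newAPI"

# List the files touched by that commit: both subprojects appear.
g diff-tree --no-commit-id --name-only -r HEAD
```

Because both paths change in the same commit, there is no revision in which llvm has the new API while clang still calls the old one.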

In addition, the single repository approach ties branches that contain
changes to subprojects (e.g. clang) to a specific version of llvm
proper. This means that when you switch between two branches that
contain changes to clang, you'll automatically check out the right
llvm bits.

Although we can do this with submodules too, a single repository makes
it much easier.

As a concrete example, suppose you are working on some changes in
clang. You want to commit the changes, then switch to a new branch
based on the latest upstream head and make some new changes. Finally you want to
switch back to your original branch. And when you switch between
branches, you want to get an llvm that's in sync with the clang in
your working copy.

Here's how I'd do it with a monolithic git repository, option (b):

git commit # old-branch
git fetch
git checkout -b new-branch origin/master
# hack hack hack
git commit # new-branch
git checkout old-branch

Here's how I'd do it with option (a), submodules. I've used git -C
here to make it explicit which repo we're working in, but in real life
I'd probably use cd.

# First, commit to two branches, one in your clang repo and one in your
# master repo. (-a stages the updated clang submodule pointer.)
git -C tools/clang commit # old-branch, clang submodule
git commit -a # old-branch, master repo
# Now fetch the umbrella repo and the submodules. Start a new branch in
# the umbrella repo at head, and move the submodules to match it.
git fetch
git submodule foreach git fetch
git checkout -b new-branch origin/master
git submodule update
# Start a new branch in the clang repo pointing to the current head.
git -C tools/clang checkout -b new-branch
# hack hack hack
# Commit both branches.
git -C tools/clang commit # new-branch
git commit -a # new-branch
# Check out the old branch.
git checkout old-branch
git submodule update

This is more than twice as many git commands, and almost three times as
much typing, to do the same thing.

Indeed, this is so complicated I expect that many developers wouldn't
bother, and would continue to develop the way we currently do. They
would thus continue to be unable to create clang branches that include
an llvm revision. :(

There are real simplifications and productivity advantages to be had
by using a single repository. They will affect essentially every
developer who makes changes to subprojects other than LLVM proper,
cares about release branches, bisects our code, or builds old
revisions.


So that's the first part, what we have to gain by using a monolithic
repository. Let's address the downsides.

If you'll bear with a hypothetical: Imagine you could somehow make the
monolithic repository behave exactly like the N separate repositories
work today. If so, that would be the best of both worlds: Those of us
who want a monolithic repository could have one, and those of us who
don't would be unaffected. Whatever downsides you were worried about
would evaporate in a mist of rainbows and puppies.

It turns out this hypothetical is very close to reality. The key is
git sparse checkouts [1], which let you check out only some files or
directories from a repository. Using this facility, if you don't like
the switch to a monolithic repository, you can set up your git so
you're (almost) entirely unaffected by it.

If you want to check out only llvm and clang, no problem. Just set up
your .git/info/sparse-checkout file appropriately. Done.

If you want to be able to have two different revisions of llvm and
clang checked out at once (maybe you want to update your clang bits
more often than you update your llvm bits), you can do that too. Make
one sparse checkout just of llvm, and make another sparse checkout
just of clang. Symlink the clang checkout to llvm/tools/clang.
That's it. The two checkouts can even share a common .git dir, so you
don't have to fetch and store everything twice.
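A rough sketch of that two-checkout setup, run against a small local stand-in for the monorepo (the mono, llvm-src and clang-src names are hypothetical). Instead of literally sharing one .git dir, this version uses `git clone --shared`, which makes each checkout borrow objects from the original clone via Git's alternates mechanism; `git worktree` would be another way to get a similar effect. Treat it as an illustration of the idea under those assumptions, not a recommended recipe.

```shell
#!/bin/sh
set -e
work=$(mktemp -d)
mono="$work/mono"   # hypothetical stand-in for the full monorepo clone

git init -q "$mono"
mkdir -p "$mono/llvm/tools" "$mono/clang" "$mono/polly"
echo "# placeholder" > "$mono/llvm/tools/CMakeLists.txt"
echo llvm > "$mono/llvm/README"
echo clang > "$mono/clang/README"
echo polly > "$mono/polly/README"
git -C "$mono" add -A
git -C "$mono" -c user.name=dev -c user.email=dev@example.com commit -q -m "Initial import"

# sparse_clone DEST DIR: clone $mono into DEST, borrowing its objects
# (--shared), and populate only the top-level directory DIR.
sparse_clone() {
  git clone -q --shared --no-checkout "$mono" "$1"
  git -C "$1" config core.sparseCheckout true
  mkdir -p "$1/.git/info"
  echo "/$2/" > "$1/.git/info/sparse-checkout"
  git -C "$1" read-tree -mu HEAD
}

sparse_clone "$work/llvm-src" llvm
sparse_clone "$work/clang-src" clang

# Each checkout can now sit at its own revision; wire clang into llvm's tree.
ln -s "$work/clang-src/clang" "$work/llvm-src/llvm/tools/clang"
```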

As far as I can tell, the only overhead of the monolithic repository
is the extra storage in .git. But this is quite small in the scheme
of things.

The .git dir for the existing monolithic repository [2] is 1.2GB. By
way of comparison, my objdir for a release build of llvm and clang is
3.5GB, and a full checkout (workdir + .git dirs) of llvm and clang is
0.65GB.

If the 1.2GB really is a problem for you (or more likely, your
automated infrastructure), a shallow clone [3] takes this down to 90MB.

The critical point to me in all this is that it's easy to set up the
monolithic repository to appear like it's a bunch of separate repos.
But it is impossible, insofar as I can tell, to do the opposite. That
is, option (b) is strictly more powerful than option (a).


Renato has understandably pointed out that the current proposal is
pretty far along, so please speak up now if you want to make this
happen. I think we can.

Regards,
-Justin

[1] Git sparse checkouts were introduced in git 1.7, in 2010. For more
info, see http://jasonkarns.com/blog/subdirectory-checkouts-with-git-sparse-checkout/.
As far as I can tell, sparse checkouts work fine on Windows, but you
have to use git-bash, see http://stackoverflow.com/q/23289006.
[2] https://github.com/llvm-project/llvm-project
[3] git clone --depth=1 https://github.com/llvm-project/llvm-project.git
_______________________________________________
LLVM Developers mailing list
llvm...@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Justin Bogner via llvm-dev

Jul 20, 2016, 8:02:37 PM
to Justin Lebar via llvm-dev
Justin Lebar via llvm-dev <llvm...@lists.llvm.org> writes:
> I would like to (re-)open a discussion on the following specific question:
>
> Assuming we are moving the llvm project to git, should we
> a) use multiple git repositories, linked together as subrepositories
> of an umbrella repo, or
> b) use a single git repository for most llvm subprojects.
>
> The current proposal assembled by Renato follows option (a), but I
> think option (b) will be significantly simpler and more effective.
> Moreover, I think the issues raised with option (b) are either
> incorrect or can be reasonably addressed.
>
> Specifically, my proposal is that all LLVM subprojects that are
> "version-locked" (and/or use the common CMake build system) live in a
> single git repository. That probably means all of the main llvm
> subprojects other than the test-suite and maybe libc++. From looking
> at the repository today that would be: llvm, clang, clang-tools-extra,
> lld, polly, lldb, llgo, compiler-rt, openmp, and parallel-libs.

FWIW, I'm opposed. I'm not convinced that the problems with multiple
repos are any worse than the problems with a single repo, which makes
this more or less just change for the sake of change, IMO.

Chandler Carruth via llvm-dev

Jul 20, 2016, 8:06:43 PM
to Justin Bogner, Justin Lebar via llvm-dev
On Wed, Jul 20, 2016 at 5:02 PM Justin Bogner via llvm-dev <llvm...@lists.llvm.org> wrote:
Justin Lebar via llvm-dev <llvm...@lists.llvm.org> writes:
> I would like to (re-)open a discussion on the following specific question:
>
>   Assuming we are moving the llvm project to git, should we
>   a) use multiple git repositories, linked together as subrepositories
> of an umbrella repo, or
>   b) use a single git repository for most llvm subprojects.
>
> The current proposal assembled by Renato follows option (a), but I
> think option (b) will be significantly simpler and more effective.
> Moreover, I think the issues raised with option (b) are either
> incorrect or can be reasonably addressed.
>
> Specifically, my proposal is that all LLVM subprojects that are
> "version-locked" (and/or use the common CMake build system) live in a
> single git repository.  That probably means all of the main llvm
> subprojects other than the test-suite and maybe libc++.  From looking
> at the repository today that would be: llvm, clang, clang-tools-extra,
> lld, polly, lldb, llgo, compiler-rt, openmp, and parallel-libs.

> FWIW, I'm opposed. I'm not convinced that the problems with multiple
> repos are any worse than the problems with a single repo, which makes
> this more or less just change for the sake of change, IMO.

It would be useful to know what problems you see with a single repo that are more significant. In particular, either why you think the problems jlebar already mentioned are worse than he sees them, or what other problems are that he hasn't addressed.

Sanjoy Das via llvm-dev

Jul 20, 2016, 8:23:28 PM
to Justin Bogner, Justin Lebar via llvm-dev
Hi Justin,

On Wed, Jul 20, 2016 at 5:02 PM, Justin Bogner via llvm-dev
<llvm...@lists.llvm.org> wrote:
> FWIW, I'm opposed. I'm not convinced that the problems with multiple
> repos are any worse than the problems with a single repo, which makes
> this more or less just change for the sake of change, IMO.

Right now we *are* in a monorepo, with sequential revision numbers
across llvm and clang, so I'd say trying to move to separate repos is
actually the "change" here. :)

-- Sanjoy

Renato Golin via llvm-dev

Jul 20, 2016, 8:25:35 PM
to Sanjoy Das, Justin Lebar via llvm-dev
On 21 July 2016 at 01:23, Sanjoy Das via llvm-dev
<llvm...@lists.llvm.org> wrote:
> Right now we *are* in a monorepo, with sequential revision numbers
> across llvm and clang, so I'd say trying to move to separate repos is
> actually the "change" here. :)

Not true. SVN can be checked out by directory, Git needs to be cloned
on the root.

Today I *can* checkout only LLVM and Clang. On a single Git repo I can't.

cheers,
--renato

Dean Michael Berris via llvm-dev

Jul 20, 2016, 8:29:25 PM
to Justin Lebar, LLVM Developers

> On 21 Jul 2016, at 09:39, Justin Lebar via llvm-dev <llvm...@lists.llvm.org> wrote:
>
> Dear all,
>
> I would like to (re-)open a discussion on the following specific question:
>
> Assuming we are moving the llvm project to git, should we
> a) use multiple git repositories, linked together as subrepositories
> of an umbrella repo, or
> b) use a single git repository for most llvm subprojects.
>
> The current proposal assembled by Renato follows option (a), but I
> think option (b) will be significantly simpler and more effective.
> Moreover, I think the issues raised with option (b) are either
> incorrect or can be reasonably addressed.
>

+1 to everything Justin points out here (and the rest of the email, which I've snipped for brevity).

Before anything else, I've been through a few of these conversions from SVN to git in other projects. In most of the ones I've seen going to submodules of multiple repos, a lot of automation is required just to keep things manageable. That's hard to do on a cross-platform basis (do you script in Python, shell script, one per OS, etc.) and is really more trouble than it's worth -- especially when adding new submodules and/or removing them. They're not impossible to do, but they're also much more work than a single repo.

Just to point out some devil's advocate positions:

- Keeping the current structure will be less churn to existing consumers that have "out of tree" builds based on the current structure. Asking them to change their workflow with SVN significantly (since moving to GitHub is mostly swayed by the SVN interface) will probably be non-trivial amounts of work. We probably need to document this well enough or show that the switch won't affect them too badly.

- Some people value keeping the history of the commits in SVN and the Git counterpart once the move happens (for a lot of valid reasons). Making sure we can merge the histories of all the subproject repositories into a single one should be addressed to preserve "provenance".

- Some people like isolation of workflows and concerns. As a git-native convert, I'm not sold on this, but there's some good reasons to be able to do this (maintainers of certain projects will probably enforce different constraints on when/who/how changes can/should/must be made). Making it possible to do so in a monorepo should be explained well (i.e. does this need any special configs on the repo on the server side, on GitHub, etc.).

All in all I think optimising for the case of the everyday developer working on multiple projects (in my case LLVM, Clang, and compiler-rt, and maybe potentially XRay as a subproject too) is a good cause. Whether this translates to every special consumer of the current set-up is less clear at least to me -- so I'd like to know what other stakeholders here think.

Cheers

Justin Bogner via llvm-dev

Jul 20, 2016, 8:36:48 PM
to Chandler Carruth, Justin Lebar via llvm-dev

Running the same 'git checkout' commands on multiple repos has always
been sufficient to manage the multiple repos so far - as long as you
create the same branches and tags in each repo, it's easy[1] to manage
the set of repos with a script that cd's to each one and runs whatever
git command.
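For comparison, the kind of helper described above might look like the following sketch. `for_each_repo` and the REPOS variable are hypothetical names, the directory list is assumed to contain no spaces, and the demo runs against two throwaway local repositories rather than real llvm/clang clones.

```shell
#!/bin/sh
set -e
# for_each_repo GIT-ARGS...: run the same git command in every checkout
# listed (space-separated) in $REPOS.
for_each_repo() {
  for dir in $REPOS; do
    echo "== $dir"
    git -C "$dir" "$@"
  done
}

# Demo on two throwaway repos standing in for llvm and clang.
root=$(mktemp -d)
for name in llvm clang; do
  git init -q "$root/$name"
  git -C "$root/$name" -c user.name=dev -c user.email=dev@example.com \
    commit -q --allow-empty -m "$name initial"
done
REPOS="$root/llvm $root/clang"

for_each_repo checkout -q -b my-feature
for_each_repo rev-parse --abbrev-ref HEAD
```

This runs the same command everywhere, but nothing records which llvm revision a given clang branch goes with.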

So it's a pretty minor inconvenience today to have the multiple repos in
the case where you want to check out all of them.

OTOH, if all of the repos are combined into one, you have to do work
when you only want some of them. In my experience, this is basically
always - between my various machines and projects I have several
checkouts of llvm+compiler-rt+clang+libc++, and I have a lot of
checkouts of just llvm. I've only checked out the other repos when I was
changing APIs and needed to update them.

I haven't tried the options jlebar has described to deal with these -
sparse checkouts and whatnot, but they seem like an equivalent amount of
work/learning curve as writing a script that cd's to several directories
and runs the same git command in each.

Thus, this also sounds like a minor inconvenience. I just don't see how
trading one for the other is worth doing, since AFAICT they're equally
inconvenient.

[1] My understanding of the "umbrella repo" thing for bisecting is that
it'll be managed automatically by a cron or checkin hooks or
whatever, so the bits in jlebar's description about updating
submodules seem like a red herring. I'm assuming that we end up in a
place where working with git is essentially the same as we work with
git-svn today.

Justin Lebar via llvm-dev

Jul 20, 2016, 8:39:39 PM
to Renato Golin, Justin Lebar via llvm-dev
> Today I *can* checkout only LLVM and Clang. On a single Git repo I can't.

This is true if you s/checkout/clone/. With a single repo, you must
clone (download) everything (*), but after you've done so you can use
sparse checkouts to check out (create a working copy of) only llvm and
clang. So you should only notice the fact that there exist things
other than llvm and clang when you first clone (download) llvm.

Either way switching to git is going to be a change from the status
quo. Personally I'm more interested in finding the best overall
solution than the solution which is "most similar" to the current
setup under some metric.

(*) Technically, if you do a shallow clone, you have to download a
single revision of everything. That's the 90MB number from my
original post.

Chandler Carruth via llvm-dev

Jul 20, 2016, 8:47:17 PM
to Justin Bogner, Justin Lebar via llvm-dev
A notable difference is the ability to do API updates across them or the ability to bisect across them.

Also, if the infrastructure that keeps the umbrella repo in sync falls over or has a serious problem, reconstructing version-locked state in order to bisect across those regions of time seems quite challenging. So IMO, it isn't a minor inconvenience, even if it is something we could overcome.
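One way to see why this is hard: without an umbrella record, about the best one can do is match revisions by commit time. The sketch below (toy repositories, hypothetical `commit_at`/`matching_commit` helpers, made-up timestamps) picks the newest clang commit committed no later than a given llvm commit. It is only a heuristic: clock skew, or pushes to the two repositories landing in a different order than their commit dates suggest, can make it pick a pair that never actually built together.

```shell
#!/bin/sh
set -e
root=$(mktemp -d)

# commit_at REPO EPOCH MSG: make an empty commit with a fixed timestamp.
commit_at() {
  GIT_AUTHOR_DATE="$2 +0000" GIT_COMMITTER_DATE="$2 +0000" \
    git -C "$1" -c user.name=dev -c user.email=dev@example.com \
    commit -q --allow-empty -m "$3"
}

for name in llvm clang; do git init -q "$root/$name"; done
commit_at "$root/llvm"  1469000000 "llvm r1"
commit_at "$root/clang" 1469001000 "clang r2"
commit_at "$root/llvm"  1469002000 "llvm r3"
commit_at "$root/clang" 1469003000 "clang r4"

# matching_commit REPO LLVM_REV: newest commit on REPO's HEAD whose
# committer date is not later than that of LLVM_REV (time-based guess).
matching_commit() {
  when=$(git -C "$root/llvm" log -1 --format=%ct "$2")
  git -C "$1" rev-list -1 --before="$when" HEAD
}

llvm_rev=$(git -C "$root/llvm" rev-parse HEAD)   # "llvm r3"
clang_rev=$(matching_commit "$root/clang" "$llvm_rev")
git -C "$root/clang" log -1 --format=%s "$clang_rev"   # expected: clang r2
```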
 
> So it's a pretty minor inconvenience today to have the multiple repos in
> the case where you want to check out all of them.
>
> OTOH, if all of the repos are combined into one, you have to do work
> when you only want some of them. In my experience, this is basically
> always - between my various machines and projects I have several
> checkouts of llvm+compiler-rt+clang+libc++, and I have a lot of
> checkouts of just llvm. I've only checked out the other repos when I was
> changing APIs and needed to update them.
>
> I haven't tried the options jlebar has described to deal with these -
> sparse checkouts and whatnot, but they seem like an equivalent amount of
> work/learning curve as writing a script that cd's to several directories
> and runs the same git command in each.

I actually would like to see an example of how you would checkout a common subset with the sparse checkout feature. jlebar, could you give us demo commands for this?

In particular, I've had a lot of folks come up and ask me for my script to walk all the directories and run the appropriate git commands in them, and if it is easier to have the GettingStarted page document how to use the sparse checkout thing, that would be nice.

Mehdi Amini via llvm-dev

Jul 20, 2016, 8:53:17 PM
to Justin Bogner, Justin Lebar via llvm-dev
IIUC you seem to explain that there are minor inconveniences on both sides, but then I’m not sure why you are opposed, since it seems pretty equal.

Also the minor inconvenience in the case of the monolithic repository is happening during the initial setup/clone/checkout, and not during day-to-day development (git pull, git checkout -b, git commit, git push), while the split model induces “minor inconveniences” in the day-to-day developer interaction.
I.e. I prefer using a script to checkout and setup the repo, and then be able to use the standard git commands for interacting with it.


> [1] My understanding of the "umbrella repo" thing for bisecting is that
>     it'll be managed automatically by a cron or checkin hooks or
>     whatever,

That’s also something that is fragile to me without a deterministic way to reconstruct it identically from scratch using only the split repositories (which would be possible with “git notes” attached by a server-side hook, for instance, but unfortunately GitHub does not allow it, and the current split-repository proposal excludes even *discussing* the merits of other hosting services).
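A small sketch of the notes idea, on toy local repositories. The "llvm-rev:" note format is made up for illustration, and the server-side hook that would run `git notes add` on every push is not shown. Note also that notes live under refs/notes/ and are not fetched by the default refspec, so clients would have to fetch them explicitly.

```shell
#!/bin/sh
set -e
root=$(mktemp -d)
g() { git -c user.name=dev -c user.email=dev@example.com "$@"; }

for name in llvm clang; do
  g init -q "$root/$name"
  g -C "$root/$name" commit -q --allow-empty -m "$name r1"
done
llvm_rev=$(g -C "$root/llvm" rev-parse HEAD)

# What a hook could do when the clang commit lands: record the llvm
# revision that was current at that moment as a note on the commit.
g -C "$root/clang" notes add -m "llvm-rev: $llvm_rev" HEAD

# Later, recover the pairing from the note alone.
paired=$(g -C "$root/clang" notes show HEAD | sed -n 's/^llvm-rev: //p')
echo "clang $(g -C "$root/clang" rev-parse --short HEAD) -> llvm $paired"
```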


>     so the bits in jlebar's description about updating
>     submodules seem like a red herring. I'm assuming that we end up in a
>     place where working with git is essentially the same as we work with
>     git-svn today.

Some people manage today to have a single commit that update clang+llvm at the same time. 
I believe doing this in the split-repository model requires write-access to the umbrella repo.


— 
Mehdi

Renato Golin via llvm-dev

Jul 20, 2016, 8:56:32 PM
to Justin Lebar, Justin Lebar via llvm-dev
On 21 July 2016 at 01:39, Justin Lebar <jle...@google.com> wrote:
> This is true if you s/checkout/clone/. With a single repo, you must
> clone (download) everything (*), but after you've done so you can use
> sparse checkouts to check out (create a working copy of) only llvm and
> clang. So you should only notice the fact that there exist things
> other than llvm and clang when you first clone (download) llvm.

So, we use that to a certain extent.

Linaro's GCC validation uses the full checkout, then does a shallow
checkout that only has the updates.

Our LLVM scripts, OTOH, clone all repos and use worktree for *all*
branches, and we only branch on the repos that we choose, for each
"working dir".
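For readers who haven't used it, `git worktree` (available since Git 2.5) gives one clone several working directories, one per branch, all sharing a single object store. A minimal sketch on a toy repository follows; the directory and branch names are hypothetical, and this is not Linaro's actual script.

```shell
#!/bin/sh
set -e
root=$(mktemp -d)
g() { git -c user.name=dev -c user.email=dev@example.com "$@"; }

g init -q "$root/llvm"
g -C "$root/llvm" commit -q --allow-empty -m "llvm r1"

# One working directory per branch, all backed by $root/llvm/.git.
g -C "$root/llvm" worktree add -b feature-a "$root/wd-feature-a"
g -C "$root/llvm" worktree add -b feature-b "$root/wd-feature-b"
g -C "$root/llvm" worktree list
```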

Our scripts probably would need certain modifications... but it should be fine.

But I'm not, by far, the most problematic user.

The real problem, and why people accepted sub-modules, is that a lot
of downstream people only use one project or another: mostly LLVM, or
Clang, or libc++.

Checking out all of it is bad, but having them officially interlinked,
it seems, is worse. IIUC, the problem is that those users currently
build our projects independently within their own projects, but more
and more CMake changes are creeping in, making it harder and harder to
separate their projects from the rest of LLVM. This means they'll now
depend on a much larger body of sources that will need to be compiled
together, which will probably lead them to abandon LLVM in favour of
something lighter.

I honestly don't know how big that problem is; I don't have it myself,
but I "can imagine" that compiling LLVM and Clang without need would be
pretty bad.

Justin Lebar via llvm-dev

Jul 20, 2016, 9:00:36 PM
to Justin Bogner, Justin Lebar via llvm-dev
> Running the same 'git checkout' commands on multiple repos has always been sufficient to manage the multiple repos so far

Huh. It definitely hasn't worked well for me.

Here's the issue I face every day. I may be working on (unrelated)
changes to clang and llvm. I update my llvm tree (say I checked in a
patch, or I want to pull in changes someone else has checked in). Now
I want to go back to hacking on my clang stuff. Because my clang
branch is not connected to a specific LLVM revision, it no longer
compiles. I'm trying to build an old clang against a new llvm.

Now I have to pull the latest clang and rebase my patches. After I
deal with rebase conflicts (not what I wanted to do at the moment!),
I'm in a new state, which means that when I build, my ccache is no help.
And when I run the clang tests, I don't know whether to expect test
failures. So then I have to pop off my patches and run at head...
(Maybe I have to update clang! In which case I also have to update
llvm...)

This would all be solved with zero work on my part if llvm and clang
were in one repository. Then when I switched to working on my clang
patches, I would automatically check out a version of LLVM that is
compatible.

I think this is the main thing that people aren't getting. Maybe
because it's never been possible before to have a workflow like this.
But having a git branch that you can check out and immediately build
-- without any rebasing, re-syncing, or other messing around -- is
incredibly powerful.
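That workflow can be demonstrated on a toy single repository (directory names, file contents, and branch names are hypothetical). A clang branch is created, llvm then moves on upstream, and checking the branch out again brings back the llvm files it was written against, with no rebasing.

```shell
#!/bin/sh
set -e
repo=$(mktemp -d)
g() { git -C "$repo" -c user.name=dev -c user.email=dev@example.com "$@"; }

g init -q
mkdir -p "$repo/llvm" "$repo/clang"
echo v1 > "$repo/llvm/API.h"
echo base > "$repo/clang/Sema.cpp"
g add -A
g commit -q -m "Initial import"
trunk=$(g rev-parse --abbrev-ref HEAD)   # "master" or "main", per git config

# Start a clang patch on its own branch.
g checkout -q -b clang-work
echo "my patch" >> "$repo/clang/Sema.cpp"
g commit -q -a -m "WIP clang patch"

# Meanwhile llvm changes on trunk.
g checkout -q "$trunk"
echo v2 > "$repo/llvm/API.h"
g commit -q -a -m "llvm API change"

# Switching back restores the llvm this clang patch was written against.
g checkout -q clang-work
cat "$repo/llvm/API.h"
```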

Please let me know if this is still not clear -- it's kind of the key point.

As I said, you can accomplish this with submodules, too, but it
requires the complex hackery from my original email.

To me, this is not at all a minor inconvenience. It's at least an
hour of wasted time every week.

> I haven't tried the options jlebar has described to deal with these - sparse checkouts and whatnot, but they seem like an equivalent amount of work/learning curve as writing a script that cd's to several directories and runs the same git command in each.

I'll send sparse checkout instructions separately. But my example
submodules commands are not at all equivalent to a script that cd's
into several directories and runs a git command in each, and I think
this is the main point of confusion. (In fact you wouldn't need to
write such a script; it's just "git submodule foreach".)

The submodule commands create a single branch in the umbrella repo
that encompasses the checked-out state of *all the LLVM subrepos*. So
you can, at a later time, check out this branch in the umbrella repo
and all the clang, llvm, etc. bits will be identical to the last time
you were on the branch.
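For readers less familiar with submodules, here is a small sketch of what that umbrella branch actually stores, using toy local repositories (the paths are hypothetical). The umbrella commit records the clang submodule as a single gitlink entry (mode 160000) naming an exact clang commit. The `-c protocol.file.allow=always` setting is only there because recent Git versions refuse local-path submodules by default.

```shell
#!/bin/sh
set -e
root=$(mktemp -d)
g() { git -c user.name=dev -c user.email=dev@example.com \
          -c protocol.file.allow=always "$@"; }

g init -q "$root/clang"
g -C "$root/clang" commit -q --allow-empty -m "clang r1"

g init -q "$root/umbrella"
g -C "$root/umbrella" commit -q --allow-empty -m "Umbrella root"
g -C "$root/umbrella" submodule --quiet add "$root/clang" tools/clang
g -C "$root/umbrella" commit -q -m "Add clang at r1"

# The tree entry for tools/clang is a gitlink naming one clang commit.
g -C "$root/umbrella" ls-tree HEAD tools/clang
```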

If all you want is to continue using git the way you use it now, the
multiple git repos gets you that (as does a sparse checkout on the
single repo). My point is that the move to git opens up a new, much
more powerful workflow with branches that encompass both llvm and
clang state. We can do this with or without submodules, but using
submodules for this is far more awkward than using a single repo.

-Justin L.

Mehdi Amini via llvm-dev

Jul 20, 2016, 9:02:11 PM
to Renato Golin, Justin Lebar via llvm-dev


You seem to imply that all the projects in the single repo would be built by default, while it is not part of the proposal.
Actually I’d expect an opt-in mechanism, so that: `mkdir build-llvm && cd build-llvm && cmake ../llvm` only builds LLVM.


Mehdi

Justin Bogner via llvm-dev

Jul 20, 2016, 9:05:03 PM
to Mehdi Amini, Justin Lebar via llvm-dev
Mehdi Amini <mehdi...@apple.com> writes:
>> Running the same 'git checkout' commands on multiple repos has always
>> been sufficient to manage the multiple repos so far - as long as you
>> create the same branches and tags in each repo, it's easy[1] to manage
>> the set of repos with a script that cd's to each one and runs whatever
>> git command.
>>
>> So it's a pretty minor inconvenience today to have the multiple repos in
>> the case where you want to check out all of them.
>>
>> OTOH, if all of the repos are combined into one, you have to do work
>> when you only want some of them. In my experience, this is basically
>> always - between my various machines and projects I have several
>> checkouts of llvm+compiler-rt+clang+libc++, and I have a lot of
>> checkouts of just llvm. I've only checked out the other repos when I was
>> changing APIs and needed to update them.
>>
>> I haven't tried the options jlebar has described to deal with these -
>> sparse checkouts and whatnot, but they seem like an equivalent amount of
>> work/learning curve as writing a script that cd's to several directories
>> and runs the same git command in each.
>>
>> Thus, this also sounds like a minor inconvenience. I just don't see how
>> trading one for the other is worth doing, since AFAICT they're equally
>> inconvenient.
>
> IIUC you seem to explain that there are minor inconveniences on both
> side, but then I’m not sure about why you are opposed? It seems pretty
> equal,

I should clarify, this is a -0 kind of opposed. If people overwhelmingly
think this is the way to go, I won't try to block it or anything. I'd
rather not have to update a bunch of workflow, infrastructure, and bots
for no particular reason though.

> Also the minor inconvenience in the case of the monolithic repository
> is happening during the initial setup/clone/checkout, and not during
> day-to-day development (git pull, git checkout -b, git commit, git
> push), while the split model induces “minor inconveniences” in the
> day-to-day developer interaction.
> I.e. I prefer using a script to checkout and setup the repo, and then
> be able to use the standard git commands for interacting with it.
>
>
>> [1] My understanding of the "umbrella repo" thing for bisecting is that
>> it'll be managed automatically by a cron or checkin hooks or
>> whatever,
>
> That’s also something that is fragile to me without a deterministic
> way to reconstruct it identically from scratch using only the split
> repositories (which would be possible with "git notes” attached by a
> server-side hook for instance, but unfortunately github does not allow
> it, and the current split-repository proposal exclude even
> *discussing* the merits of other hosting services).

I haven't been following that discussion, but that seems surprising
since AFAICT the only particularly compelling reason to move away from
SVN is that it's easy to find good reliable hosting.

>
>> so the bits in jlebar's description about updating
>> submodules seem like a red herring. I'm assuming that we end up in a
>> place where working with git is essentially the same as we work with
>> git-svn today.
>
> Some people manage today to have a single commit that update
> clang+llvm at the same time.
> I believe doing this in the split-repository model requires
> write-access to the umbrella repo.

Chandler Carruth via llvm-dev

Jul 20, 2016, 9:06:23 PM
to Mehdi Amini, Renato Golin, Justin Lebar via llvm-dev
If we end up with a single repository, I agree and think at least some level of opt-in for building subprojects is essential.

I would expect at *most* to automatically enable building the set of subprojects we currently suggest by default in the getting started docs. Any more than that wouldn't make sense, and I could even imagine defaulting *fewer* projects at the build system level.
 

Daniel Berlin via llvm-dev

Jul 20, 2016, 9:07:03 PM
to Renato Golin, Justin Lebar via llvm-dev
On Wed, Jul 20, 2016 at 5:56 PM, Renato Golin via llvm-dev <llvm...@lists.llvm.org> wrote:
> On 21 July 2016 at 01:39, Justin Lebar <jle...@google.com> wrote:
>> This is true if you s/checkout/clone/.  With a single repo, you must
>> clone (download) everything (*), but after you've done so you can use
>> sparse checkouts to check out (create a working copy of) only llvm and
>> clang.  So you should only notice the fact that there exist things
>> other than llvm and clang when you first clone (download) llvm.
>
> So, we use that to a certain extent.
>
> Linaro's GCC validation uses the full checkout, then does a shallow
> checkout that only has the updates.
>
> Our LLVM scripts, OTOH, clone all repos and use worktree for *all*
> branches, and we only branch on the repos that we choose, for each
> "working dir".
>
> Our scripts probably would need certain modifications... but it should be fine.
>
> But I'm not, by far, the most problematic user.
>
> The real problem, and why people accepted sub-modules, is that a lot
> of downstream people only use one project or another: mostly LLVM, or
> Clang, or libc++.
>
> Checking out all of it is bad,

Define bad?
Time?
Disk space?
Bandwidth?

I mean, we already assume you have a lot of each anyway?

> but having them officially interlinked,
> it seems, is worse.

Why?
Below it sounds like you want to do this as a way of enforcing projects to stay independent of each other.
 
I would posit that this is not the best way to do this?

Renato Golin via llvm-dev

Jul 20, 2016, 9:08:51 PM
to Chandler Carruth, Justin Lebar via llvm-dev
On 21 July 2016 at 02:05, Chandler Carruth <chan...@google.com> wrote:
> If we end up with a single repository, I agree and think at least some level
> of opt-in for building subprojects is essential.

We were originally trying to avoid too many moves at the same time.

There is already some CMake efforts to help build the different
repositories, but it's not linked to any proposal.

I think doing so would complicate both build system and version
control migrations...

--renato

Justin Lebar via llvm-dev

Jul 20, 2016, 9:17:52 PM
to Chandler Carruth, Justin Lebar via llvm-dev
> I actually would like to see an example of how you would checkout a common subset with the sparse checkout feature. jlebar, could you give us demo commands for this?

$ git clone --depth 1 https://github.com/llvm-project/llvm-project.git
$ cd llvm-project
$ ls
clang clang-tools-extra compiler-rt dragonegg klee ...
$ git config core.sparsecheckout true
$ echo "/llvm
/clang" > .git/info/sparse-checkout
$ git read-tree -mu HEAD
$ ls
clang llvm

I suppose you could even wrap this in a script and ship that with
llvm, if you wanted.
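On Git 2.25 and later (newer than this thread), the hand-edited `.git/info/sparse-checkout` steps above can be replaced by the `git sparse-checkout` command. A minimal sketch of such a wrapper script, assuming a helper named `sparse_clone` (hypothetical, not something LLVM ships) and the llvm/clang subset from the example:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with LLVM): clone URL into DEST and check
# out only the listed top-level directories. Needs Git 2.25 or newer.
# usage: sparse_clone URL DEST DIR...
sparse_clone() {
  local url="$1" dest="$2"
  shift 2
  # --sparse starts with only the files in the repository root checked out.
  git clone --quiet --sparse "$url" "$dest" || return
  # Cone mode selects whole directories, matching the /llvm, /clang list above.
  git -C "$dest" sparse-checkout init --cone || return
  # Replace the pattern list with the requested directories.
  git -C "$dest" sparse-checkout set "$@"
}
```

For example, `sparse_clone "$URL" llvm-project llvm clang`, with `$URL` being the repository URL from the example above. Against a server that supports partial clone, adding `--filter=blob:none` to the `git clone` line should also avoid downloading file contents for directories that are never checked out; it is left out here to keep the sketch minimal.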

Renato Golin via llvm-dev

Jul 20, 2016, 9:18:38 PM
to Daniel Berlin, Justin Lebar via llvm-dev
On 21 July 2016 at 02:06, Daniel Berlin <dbe...@dberlin.org> wrote:
>> Checking out all of it is bad,
>
> Define bad?
> Time?
> Disk space?
> Bandwidth?
>
> I mean, we already assume you have a lot of each anyway?

This is not about me, it's about people that use LLVM projects elsewhere.


>> but having them officially interlinked, it seems, is worse.
>
> Why?
> Below it sounds like you want to do this as a way of enforcing projects to
> stay independent of each other.

Why does everyone take my comments as my own personal motives?

I'm just the "consensus seeker". None of these ideas are mine, I'm
just echoing what was said in 320 emails, plus what was said in the
past few years when people discussed using pure Git.

People in the IRC were saying I had ulterior motives, that I was
pushing people to use GitHub or sub-modules, or whatever. This is
*really* not cool.

Every single thread so far has died down and I wrote a summary, and no
one said anything. Then I created another thread, and wrote another
summary. Once no one was disagreeing, I wrote the text.

Now everyone wants to disagree again. Seriously?

I *personally* don't care if we use GitHub, or GitLab, Git or
Mercurial. I don't care if we have sub-modules or a monolithic
repository, but I'm not the only user.

LLVM has, so far, taken the modular approach that other projects can
embed our projects on their products. Downstream commercial products
do that, other OSS projects do that, and that's pretty cool.

GCC has had a *huge* flying monster in the last decade because they
weren't modular enough and that has been the big difference of LLVM,
and why it gained traction on impossible partners, like Emacs.

If we're saying we want to close everything down and make a compiler
like GCC, that will make my life **MUCH** easier. So there is
absolutely *no* point in me pushing the other way.

But I'm not the only user... And I'd rather not be selfish.

If the consensus has changed from last week, or if no one has actually
read the emails and threads and want to do it all over again, please
be my guest.

Justin Bogner via llvm-dev

Jul 20, 2016, 9:26:50 PM
to Justin Lebar, Justin Lebar via llvm-dev

I don't know man, when I create a branch to save my clang work I just
create a branch with the same name in all the other repos I have checked
out, then it just stays in the state I left it in as I go do other
stuff. This kind of problem just hasn't really come up for me.

If I do `git log` in a sparse checkout that just has LLVM, will it only
show me LLVM commits? That is, how easy is it to filter out the
clang/lldb/subproject-X commits from a log? Negative globs are kind of
awkward.
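The per-repository workflow described above (the same branch name created in every checked-out repository) is easy to script, which is the kind of wrapper Justin Lebar refers to in his reply. A hypothetical sketch; the function name and the repository paths in the example are assumptions based on the nested llvm/tools/clang layout and should match whatever is actually checked out:

```shell
#!/usr/bin/env bash
# Hypothetical helper: create (or switch to) the same branch name in every
# repository of a multi-repo LLVM checkout.
# usage: branch_all BRANCH REPO...
branch_all() {
  local branch="$1" repo
  shift
  for repo in "$@"; do
    if git -C "$repo" rev-parse --verify --quiet "refs/heads/$branch" >/dev/null; then
      # The branch already exists in this repository: just switch to it.
      git -C "$repo" checkout --quiet "$branch" || return
    else
      # Otherwise create it at the repository's current HEAD.
      git -C "$repo" checkout --quiet -b "$branch" || return
    fi
  done
}

# Example (paths assume the nested layout):
# branch_all my-feature llvm llvm/tools/clang llvm/projects/compiler-rt
```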

Mehdi Amini via llvm-dev

Jul 20, 2016, 9:28:16 PM
to Chandler Carruth, Justin Lebar via llvm-dev
Since there is already a unified repo for testing here: https://github.com/llvm-project/llvm-project

Here is what it would look like for someone interested in checking out LLVM and Clang only:

# Prepare the git repo
mkdir llvm
cd llvm
git init
git remote add origin g...@github.com:llvm-project/llvm-project.git

# Setup the sparse checkout, asking for clang and llvm only
git config core.sparseCheckout true
mkdir -p .git/info
echo /llvm >> .git/info/sparse-checkout 
echo /clang >> .git/info/sparse-checkout 

# Actually fetch the data and checkout just clang and llvm.
git pull origin master

# At this point the checkout contains the directories for clang and llvm only.

Obviously this will download the 2.5GB repository (all branches for all projects), but that should happen *once* on a developer machine (future clones can use `git worktree`).
For bots, shallow clones are efficient, with some modifications to the script above:

# Prepare the git repo
mkdir llvm
cd llvm
git init
git remote add -t master origin g...@github.com:llvm-project/llvm-project.git

# Setup the sparse checkout, asking for clang and llvm only
git config core.sparseCheckout true
mkdir -p .git/info
echo /llvm >> .git/info/sparse-checkout 
echo /clang >> .git/info/sparse-checkout 

# Actually fetch the data and checkout just clang and llvm.
git pull origin master --depth=1
# alternatively: git fetch --depth=1 origin && git reset --hard origin/master

(That’s 81.58MB download, independently of the number of sub-projects to actually checkout)


— 
Mehdi
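Mehdi's note that later checkouts can use `git worktree` (rather than cloning the 2.5GB repository again) could look like this. A minimal sketch, assuming a hypothetical wrapper name `new_worktree` and illustrative paths and branch names; `git worktree add -b` itself is standard Git (available since Git 2.5):

```shell
#!/usr/bin/env bash
# Hypothetical helper: add another working directory that shares the object
# store of an existing clone, on a new branch BRANCH started from START.
# usage: new_worktree REPO DIR BRANCH START
new_worktree() {
  local repo="$1" dir="$2" branch="$3" start="$4"
  git -C "$repo" worktree add -b "$branch" "$dir" "$start"
}

# Example: new_worktree ~/src/llvm ~/src/llvm-my-feature my-feature origin/master
```

Because the new directory reuses the existing object store, creating it costs only the size of the checked-out files, not another full download.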

Mehdi Amini via llvm-dev

Jul 20, 2016, 9:36:26 PM
to Justin Bogner, Justin Lebar via llvm-dev
“git log” would show the full history with a sparse checkout, including the commits that are touching a subdirectory that is not checked out.
From the top of the project you’d have to type “git log llvm” to have only the llvm history. I’m not sure if there is a config/alias for that, but a custom git-log script could read the sparse-checkout config to filter it by default.


— 
Mehdi
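The custom git-log script Mehdi suggests, which reads the sparse-checkout patterns and passes them to `git log` as pathspecs, might be sketched as follows. It is hypothetical (not part of Git or LLVM) and assumes the simple one-directory-per-line patterns used above, such as `/llvm`:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of git or LLVM): `git log` limited to the
# directories listed in the sparse-checkout file, so commits that only touch
# subprojects you did not check out are hidden.
# Assumes simple one-directory-per-line patterns such as "/llvm".
# Extra arguments go to git log; do not pass your own "--" pathspecs.
sparse_log() {
  local top file line
  top="$(git rev-parse --show-toplevel)" || return
  file="$(git -C "$top" rev-parse --git-path info/sparse-checkout)"
  # --git-path may print a path relative to the top-level directory.
  case "$file" in
    /*) ;;
    *) file="$top/$file" ;;
  esac
  set -- "$@" --            # everything after this marker is a pathspec
  if [ -f "$file" ]; then
    while IFS= read -r line || [ -n "$line" ]; do
      # Skip blanks, comments, negated patterns and the cone-mode "/*" line.
      case "$line" in
        ''|'#'*|'!'*|'/*') continue ;;
      esac
      line="${line#/}"      # "/llvm" -> "llvm"
      line="${line%/}"      # "llvm/" -> "llvm"
      set -- "$@" "$line"
    done < "$file"
  fi
  # With no usable patterns this is a plain `git log ... --`, i.e. the full log.
  git -C "$top" log "$@"
}
```

Once sourced, `sparse_log --oneline` run anywhere in a checkout whose sparse-checkout file lists `/llvm` and `/clang` should show only commits touching those two directories.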

Justin Lebar via llvm-dev

Jul 20, 2016, 10:02:51 PM
to Justin Bogner, Justin Lebar via llvm-dev
> I don't know man, when I create a branch to save my clang work I just
> create a branch with the same name in all the other repos I have checked
> out, then it just stays in the state I left it in as I go do other
> stuff. This kind of problem just hasn't really come up for me.

Ah, I understand your workflow now. That works, I guess. It's
definitely better than what I've been doing. :)

You have to write and use these scripts, of course. I think that's
the main problem -- git is hard enough as it is; asking me to do most
git commands completely differently when I happen to be working on
llvm is asking a lot. Even asking everyone to realize that there's a
better way is asking a lot. Inasmuch as we can make the commands we
type every day Just Work Like Any Other Git Repository, I think that's
a clear win for the community's overall productivity.

Beyond that, I guess the main benefits wrt workflow of the single repo
are that you can much more easily work with cross-cutting changes.
You can stash them, bisect them, reorder them, commit a bunch with one
command, whatever, there's nothing special about the fact that they're
cross-cutting.

And of course we don't get atomic commits across subprojects at all
without a single repo. That really would be nice for certain kinds of
changes.

But I think the bigger point wrt workflows is that there's a real
benefit to having fewer special snowflakes in our lives.

-Justin L.

Daniel Berlin via llvm-dev

Jul 20, 2016, 10:07:01 PM
to Renato Golin, Justin Lebar via llvm-dev
On Wed, Jul 20, 2016 at 6:18 PM, Renato Golin <renato...@linaro.org> wrote:
> On 21 July 2016 at 02:06, Daniel Berlin <dbe...@dberlin.org> wrote:
>>> Checking out all of it is bad,
>>
>> Define bad?
>> Time?
>> Disk space?
>> Bandwidth?
>>
>> I mean, we already assume you have a lot of each anyway?
>
> This is not about me, it's about people that use LLVM projects elsewhere.
>
>>> but having them officially interlinked, it seems, is worse.
>>
>> Why?
>> Below it sounds like you want to do this as a way of enforcing projects to
>> stay independent of each other.
>
> Why does everyone take my comments as my own personal motives?

I don't, but I can see how others might.
See below.

> I'm just the "consensus seeker". None of these ideas are mine, I'm
> just echoing what was said in 320 emails, plus what was said in the
> past few years when people discussed using pure Git.

So, if you want to raise the concerns of others, you really need to be a bit more detailed about who and what.
Otherwise it honestly just comes off as "vague objection".

Even a minimum of "if you look at what X said about Y in the thread", or something, would go a long way here.

Otherwise you are basically saying "hey, I think I heard, in the past 300 emails, X". That's not really something that one can respond to reasonably.

> People in the IRC were saying I had ulterior motives, that I was
> pushing people to use GitHub or sub-modules, or whatever. This is
> *really* not cool.

That is definitely not cool. I don't think you do. I converted GCC from CVS to SVN, so I know how this feels, believe me :)

> Every single thread so far has died down and I wrote a summary, and no
> one said anything. Then I created another thread, and wrote another
> summary. Once no one was disagreeing, I wrote the text.
>
> Now everyone wants to disagree again. Seriously?

FWIW: I actually think the LLVM community ratholes on a lot of things, *way too much*. Not sure we are quite at that point yet on this.

> I *personally* don't care if we use GitHub, or GitLab, Git or
> Mercurial. I don't care if we have sub-modules or a monolithic
> repository, but I'm not the only user.
>
> LLVM has, so far, taken the modular approach that other projects can
> embed our projects on their products. Downstream commercial products
> do that, other OSS projects do that, and that's pretty cool.
>
> GCC has had a *huge* flying monster in the last decade because they
> weren't modular enough and that has been the big difference of LLVM,
> and why it gained traction on impossible partners, like Emacs.

Errr, I'm not sure this is really the reason, but let's ignore that :)

> If we're saying we want to close everything down and make a compiler
> like GCC, that will make my life **MUCH** easier.

I don't think anyone has said that. I simply pointed out that having a monolithic repo or not should be 100% orthogonal to that.

> So there is
> absolutely *no* point in me pushing the other way.
>
> But I'm not the only user... And I'd rather not be selfish.
>
> If the consensus has changed from last week, or if no one has actually
> read the emails and threads and want to do it all over again, please
> be my guest.

I think you may need to move a *little* slower, FWIW. On one hand you are saying "there are 300+ emails", but you expect consensus in a week?
That seems... a bit much :)
What if someone was on vacation last week?

I read literally every email in that thread. I guess I don't see all the concerns being raised that you do? I see like one or two emails that could be taken as concerns.
So like I said, if you are going to seek consensus and drive it by voicing the concerns of others, that's great. I applaud it. But when doing it, you may want to make clear that is what you are doing, and who said what, so that the right people can be cc'd with the right responses, etc.

Otherwise I'm not sure it's as helpful as one might think. (I could be wrong, of course.)

> cheers,
> --renato

James Y Knight via llvm-dev

Jul 20, 2016, 10:08:37 PM
to Renato Golin, Justin Lebar via llvm-dev
On Wed, Jul 20, 2016 at 6:18 PM, Renato Golin via llvm-dev <llvm...@lists.llvm.org> wrote:
> Why does everyone take my comments as my own personal motives?
>
> I'm just the "consensus seeker". None of these ideas are mine, I'm
> just echoing what was said in 320 emails, plus what was said in the
> past few years when people discussed using pure Git.
>
> People in the IRC were saying I had ulterior motives, that I was
> pushing people to use GitHub or sub-modules, or whatever. This is
> *really* not cool.
>
> Every single thread so far has died down and I wrote a summary, and no
> one said anything. Then I created another thread, and wrote another
> summary. Once no one was disagreeing, I wrote the text.
>
> Now everyone wants to disagree again. Seriously?

I really really sympathize.

I, too, had read the previous N-hundred messages as having mostly opposition to a single repository solution, and, despite feeling it would be better, had decided it would not be worth my time to push for it. I can live with multi-repo, and I certainly want to move to git more than I want to move to single-repo. And so, here we are -- you've developed a complete proposal for the multi-repo solution, with a high degree of consensus around it.

And, then, suddenly, in the last day or so, a bunch of support seems to have shown up for the one-repo solution. Way more than I'd ever expected for sure. Even people I had thought were super-opposed to it turned out not to be!

I'm also really sad to hear that people have been impugning your motives, because you've done a tremendous amount of work to bring this to a conclusion, and it really ought to be clear to everyone that you've been doing an admirable job of driving towards consensus here, and basically nothing more.

IMO, the only reason we can even have this conversation about a single-repo reasonably now is because of your work in writing up clearly the scheme for a multi-repo solution. So I hope you don't feel discouraged by this turn of events! I personally put the entire credit of getting to this point on your hard work.

But, anyways, +1 on a single-repo solution from me.

Before we can agree to merge to a single-repo, there's one further question that must be resolved:

Should the layout in the merged repository be:
1) Like the "llvm-project" git repository is now:

<root>/llvm/
<root>/clang/
<root>/compiler-rt
...

2) Like the "ideal merged checkout" is now:
llvm/
llvm/tools/clang
llvm/projects/compiler-rt
...


I don't much care which of those is chosen. I have a slight preference for #1, for ease of doing things like grep/log/etc on llvm by itself, excluding all the other projects. But either way seems probably fine, and an improvement over multiple repositories.

Chandler Carruth via llvm-dev

Jul 20, 2016, 10:38:29 PM
to James Y Knight, Renato Golin, Justin Lebar via llvm-dev
On Wed, Jul 20, 2016 at 7:08 PM James Y Knight via llvm-dev <llvm...@lists.llvm.org> wrote:
> Before we can agree to merge to a single-repo, there's one further question that must be resolved:

I disagree that we have to solve this first. Personally, I think that we should first figure out whether we want to have separate repos despite them being version-locked. And if we do, then we should discuss what the layout should look like.

> Should the layout in the merged repository be:
> 1) Like the "llvm-project" git repository is now:
>
> <root>/llvm/
> <root>/clang/
> <root>/compiler-rt
> ...
>
> 2) Like the "ideal merged checkout" is now:
> llvm/
> llvm/tools/clang
> llvm/projects/compiler-rt
> ...
>
> I don't much care which of those is chosen. I have a slight preference for #1, for ease of doing things like grep/log/etc on llvm by itself, excluding all the other projects. But either way seems probably fine, and an improvement over multiple repositories.

FWIW, I strongly prefer #2, but I think the high order bit is the repository question.

Sean Silva via llvm-dev

Jul 20, 2016, 10:41:28 PM
to Justin Bogner, Justin Lebar via llvm-dev
On Wed, Jul 20, 2016 at 5:02 PM, Justin Bogner via llvm-dev <llvm...@lists.llvm.org> wrote:
> Justin Lebar via llvm-dev <llvm...@lists.llvm.org> writes:
>> I would like to (re-)open a discussion on the following specific question:
>>
>>   Assuming we are moving the llvm project to git, should we
>>   a) use multiple git repositories, linked together as subrepositories
>> of an umbrella repo, or
>>   b) use a single git repository for most llvm subprojects.
>>
>> The current proposal assembled by Renato follows option (a), but I
>> think option (b) will be significantly simpler and more effective.
>> Moreover, I think the issues raised with option (b) are either
>> incorrect or can be reasonably addressed.
>>
>> Specifically, my proposal is that all LLVM subprojects that are
>> "version-locked" (and/or use the common CMake build system) live in a
>> single git repository.  That probably means all of the main llvm
>> subprojects other than the test-suite and maybe libc++.  From looking
>> at the repository today that would be: llvm, clang, clang-tools-extra,
>> lld, polly, lldb, llgo, compiler-rt, openmp, and parallel-libs.
>
> FWIW, I'm opposed. I'm not convinced that the problems with multiple
> repos are any worse than the problems with a single repo, which makes
> this more or less just change for the sake of change, IMO.

Just my experience, but having worked extensively with both, the single integrated repository is *much* nicer.

-- Sean Silva

C Bergström

Jul 20, 2016, 11:02:25 PM
to Sean Silva, Justin Lebar via llvm-dev
Can this conversation be moved to a -projects or -infra or
-somewhere-else place.. pretty please. While I have my own opinions -
I don't think my feedback would be heard over the voices of loud
contributors - as such and probably being in the same bucket as many
others on this list, kindly resolve this issue in another channel of
communication - again please set this up and stop flooding a developer
list with -infra issues.

Tim Northover via llvm-dev

Jul 20, 2016, 11:39:53 PM
to C Bergström, Justin Lebar via llvm-dev
On 20 July 2016 at 20:01, C Bergström <llvm...@lists.llvm.org> wrote:
> Can this conversation be moved to a -projects or -infra or
> -somewhere-else place.. pretty please.

No. This affects a large part of the LLVM community and llvm-dev is
the most universal place we have to discuss such issues at the moment.

Feel free to help us set up a better list so your delicate
collaborators aren't bombarded with e-mails if you want, but that's a
separate discussion.

Tim.

Renato Golin via llvm-dev

Jul 21, 2016, 2:03:59 AM
to Daniel Berlin, Justin Lebar via llvm-dev
On 21 July 2016 at 03:06, Daniel Berlin <dbe...@dberlin.org> wrote:
> So, if you want to raise the concerns of others, you really need to be a bit
> more detailed about who and what.
> Otherwise it honestly just comes off as "vague objection".

Sigh.

A bit of history.... as precise as I can muster.

1. Git's been at the back of our minds for a long time
2. Some event (can't remember) triggered a discussion on IRC where
some core devs were mostly in agreement
3. I decided to take it on, folks were happy, I sent a huge email with
*ALL* options. Local hosting, external hosting (all), Git, Mercurial,
SVN, monolithic, or not, etc.
4. There were hundreds of emails, in many cycles, and in each step, I
took a step back, wrote everything that was being said (not what I
wanted), and waited for disagreement.

During this process, I also proposed "voting". Tanya, very helpfully,
said it would be better to have a survey, so we don't take hard
decisions based on simple counts. Chandler, also very helpfully said
we needed a concrete implementation example in which to base our
decisions. People seemed generally in favour to gauge the opinions
more generally, so having "A" proposal was better than having general
discussions.

For me, personally, any monolithic Git repository would do. But there
was a lot of feedback on it not being monolithic, and then on it having
sub-modules, so I was echoing the larger voice, not my own.

And the idea, at least as "I" interpreted it, was to have "some"
concrete example, and a wide survey.

No one said that:
* the result of the survey would dictate the move
* I would get to choose it alone (s/I/anyone/)
* there weren't better models

I specifically stated that, this was one of the models, which I was
trying to push through the survey in the interest of getting a feeling
for how people like it *really*.

I specifically said there could be other proposals, other surveys.


> Even a minimum of "if you look at what X said about Y in the thread", or
> something, would go a long way here.
>
> Otherwise you are basically saying "hey, i think i heard, in the past 300
> emails, X". That's not really something that one can respond to reasonably.

Oversimplifying it is a bit offensive.

Taking one of my points in isolation as if it were my whole argument,
each time, *is* oversimplifying it.


> FWIW: I actually think the LLVM community ratholes on a lot of things, *way
> too much*. Not sure we are quite at that point yet on this.

Having a precise proposal and survey is one way many people proposed
to get out of the rat hole. People are generally more conscious in
surveys than replying to email threads, and any personal attack they
send is restricted to the idea (or becomes childish), which is really
what we want. Mailing lists are too prone to trolling to be an
effective consensus reaching place. We've seen our share.

So, a cyclic model with a proposal and a survey seems like a good thing
to do. GitHub+modules is not *my* proposal, but *our* first proposal.
That's why I added it to a "Proposals" directory in the docs, and why
I wasn't worrying too much if people liked it on the review. It is one
reflection of one discussion from one angle.


>> GCC has had a *huge* flying monster in the last decade because they
>> weren't modular enough and that has been the big difference of LLVM,
>> and why it gained traction on impossible partners, like Emacs.
>
> Errr, i'm not sure this is really the reason, but let's ignore that :)

Again, taking one point as if I meant *everything*, and oversimplifying.

There was certainly a tone to GCC's predicament that was not being
modular enough (being used as a library, extending its AST to Emacs,
having external projects use it in some form), but I have made no
assertion as to what *it* is, or how important *it* is in the whole
scheme of things. You should make no assumption as to my intentions
other than a simple statement.


> I think you may need to move a *little* slower, FWIW. On one hand you are
> saying "there are 300+ emails", but you expect consensus in a week?
> That seems .. a bit much :)
> What if someone was on vacation last week?

The threads lasted for 1 1/2 months, after "soft" discussions for years.

I didn't expect consensus "on the whole problem" in a week, just
consensus on the first proposal, GitHub+modules, which seemed to have
already been reached weeks before.


> So like I said, if you are going to seek consensus and drive it by voicing
> the concerns of others, that's great. I applaud it. But when doing it, you
> may want to make clear that is what you are doing, and who said what, so
> that the right people can be cc'd with the right responses, etc

You have no idea how many times I read the same emails over and over
to make sure I cite the right person. That's why I have consistently
re-written a summary of every thread, with proper quotes and
everything.

But as you said, we tend to not get out of rat holes, and there is only
so much I can cope with going back to re-read the same emails.

A lot of what happened is that a number of people (and I'm being
purposely vague) are opposed to it, and are raising the same concerns
over and over, even though there were arguments to refute what they
are saying.

How many times more do I need to go back, read the emails again and
quote what people said, so that people can feel comfortable? Is that
really the best use of *our* time? Keeping ourselves in rat holes?

I personally think not. And why I wanted to get at least one proposal
out and see what people thought of it.

This may be entirely the wrong approach, and I accept your arguments,
and that's why I wrote "be my guest". It wasn't out of spite, but I'm
really saying, "please do it".

However, I'd really like it if people would stop the personal attacks.
Reiterating, this is not *my* proposal.

So, from now on...

* I've made my part and got "consensus" for one proposal. It is what it is.
* Justin is forming consensus on the monolithic version. This is a
*different* proposal, so it needs to take into account hosting service
and everything else we did in the first.
* Please add a similar document to "docs/Proposals" at the end.
* Repeat.

I'll refrain from driving any other proposal in the interest of mental
sanity (and personal time), not because I support the GitHub+modules
proposal.

When everyone is happy that we have enough proposals, Tanya's survey
should be brought forward, in which case I'll gladly offer my help
again.

I hope this is clear enough and people will stop second guessing me.

regards,

Renato Golin via llvm-dev

Jul 21, 2016, 2:12:50 AM
to James Y Knight, Justin Lebar via llvm-dev
On 21 July 2016 at 03:08, James Y Knight <jykn...@google.com> wrote:
> And, then, suddenly, in the last day or so, a bunch of support seems to have
> shown up for the one-repo solution. Way more than I'd ever expected for
> sure. Even people I had thought were super-opposed to it turned out not
> to be!

Same here.


> I'm also really sad to hear that people have been impugning your motives,
> because you've done a tremendous amount of work to bring this to a
> conclusion, and it really ought to be clear to everyone that you've been
> doing an admirable job of driving towards consensus here, and basically
> nothing more.

Thank you. Appreciated.


> IMO, the only reason we can even have this conversation about a single-repo
> reasonably now is because of your work in writing up clearly the scheme for
> a multi-repo solution. So I hope you don't feel discouraged by this turn of
> events! I personally put the entire credit of getting to this point on your
> hard work.

Haven't thought of it that way. I feel better already. :)


> I don't much care which of those is chosen. I have a slight preference for
> #1, for ease of doing things like grep/log/etc on llvm by itself, excluding
> all the other projects. But either way seems probably fine, and an
> improvement over multiple repositories.

I don't have a strong preference, but #1 proponents weakly convinced
me with two arguments:

1. it is easier to mix-and-match repositories as you like

I'd still symlink as I do today, but I can see why this would be
interesting for off-tree users.

2. it "makes more sense" to let Clang *use* LLVM instead of LLVM *host* Clang

this seems more preference than anything, but people that know CMake
more than I do said it would be "easier" and I trust them. I have no
technical arguments pro or against.

Though, I'd be fine with anything really.

Renato Golin via llvm-dev

Jul 21, 2016, 2:14:21 AM
to Tim Northover, Justin Lebar via llvm-dev
On 21 July 2016 at 04:39, Tim Northover via llvm-dev <llvm...@lists.llvm.org> wrote:
> Feel free to help us set up a better list so your delicate
> collaborators aren't bombarded with e-mails if you want, but that's a
> separate discussion.

There was a separate thread where people seemed in favour of a
different list. It's probably only a matter of time before the
foundation creates it.

--renato

Sean Silva via llvm-dev

Jul 21, 2016, 2:51:30 AM
to Justin Bogner, Justin Lebar via llvm-dev
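> This kind of problem just hasn't really come up for me.

It has for me, and it is a serious problem.

> If I do `git log` in a sparse checkout that just has LLVM, will it only
> show me LLVM commits? That is, how easy is it to filter out the
> clang/lldb/subproject-X commits from a log? Negative globs are kind of
> awkward.

It is extremely easy (even with a full checkout): `git log llvm/`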

-- Sean Silva

Sean Silva via llvm-dev

Jul 21, 2016, 2:54:06 AM
to Justin Bogner, Justin Lebar via llvm-dev
To be clear, this is with a layout like: https://github.com/llvm-project/llvm-project

-- Sean Silva

Justin Bogner via llvm-dev

Jul 21, 2016, 3:19:20 AM
to Sean Silva, Justin Lebar via llvm-dev

Right - I was assuming a layout where the subprojects are already in the
places they need to be checked out to. With llvm-project's layout my
question is silly.

llvm-project's layout is kind of annoying, since with that I have to
check out all of the repos yet I still need to add symlinks or something
to actually use any of them. It also means that anyone who is only using
llvm has to change their paths from /path/to/llvm to /path/to/llvm/llvm,
which is a little bit ugly.
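The symlinks mentioned here (Renato also says he symlinks today) map the flat llvm-project layout onto the nested llvm/tools/clang and llvm/projects/compiler-rt locations of layout #2. A sketch, assuming it is pointed at the root of a flat checkout and that only clang and compiler-rt are wanted; `link_nested` is a hypothetical name and the project list is only an example:

```shell
#!/usr/bin/env bash
# Hypothetical helper: link subprojects of a flat llvm-project checkout
# (<root>/llvm, <root>/clang, ...) into the nested locations of layout #2
# (llvm/tools/clang, llvm/projects/compiler-rt).
# usage: link_nested ROOT
link_nested() {
  local root="$1"
  # Relative targets are resolved from the directory holding the link,
  # so "../../clang" inside llvm/tools points at <root>/clang.
  ln -sfn ../../clang "$root/llvm/tools/clang"
  ln -sfn ../../compiler-rt "$root/llvm/projects/compiler-rt"
}
```

The `-n` flag makes re-running the helper replace the existing links instead of creating a new link inside the linked directory.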

Sean Silva via llvm-dev

Jul 21, 2016, 5:05:03 AM
to Justin Bogner, Justin Lebar via llvm-dev
Hopefully we could teach CMake about the new "default" structure. We already have LLVM_EXTERNAL_{CLANG,LLD}_SOURCE_DIR (which is what I use with llvm-project) so in principle CMake already is close to being capable of understanding this layout if it becomes the default.

-- Sean Silva

David Chisnall via llvm-dev

Jul 21, 2016, 5:51:39 AM
to Renato Golin, Justin Lebar via llvm-dev
On 21 Jul 2016, at 07:12, Renato Golin via llvm-dev <llvm...@lists.llvm.org> wrote:
>
>> I don't much care which of those is chosen. I have a slight preference for
>> #1, for ease of doing things like grep/log/etc on llvm by itself, excluding
>> all the other projects. But either way seems probably fine, and an
>> improvement over multiple repositories.
>
> I don't have a strong preference, but #1 proponents weakly convinced
> me with two arguments:
>
> 1. it is easier to mix-and-match repositories as you like
>
> I'd still symlink as I do today, but I can see why this would be
> interesting for off-tree users.
>
> 2. it "makes more sense" to let Clang *use* LLVM instead of LLVM *host* Clang
>
> this seems more preference than anything, but people that know CMake
> more than I do said it would be "easier" and I trust them. I have no
> technical arguments pro or against.
>
> Though, I'd be fine with anything really.

First of all, thank you very much for driving this Renato. It’s a horrible task to do and I’m very grateful that you’ve taken this on.

I would, however, like to add one argument against a single repo model. If you look at the current LLVM GitHub repo, GitHub is tracking 806 forks. It is tracking 595 forks for clang. Not everyone using git for downstream development has a fork on GitHub. In particular, GitHub does not allow private forks of public repos, so anyone who has a non-public git fork of LLVM will have done a git clone and a git push to their own private repo (on or off GitHub). I know of about a dozen such private repos and (for some bizarre reason) most companies don’t tell me about the secret things that they’re doing with LLVM so there are undoubtedly a lot more that I don’t know about.

Conservatively, I would estimate that we have at least a thousand downstream forks of the current LLVM git repository. Moving to a single repo model will break all of them. It is completely unacceptable to break so many downstream consumers unless we are able to provide them with some coherent migration plan, but I have not seen anyone in the single-repo camp suggest anything.

David

Robinson, Paul via llvm-dev

Jul 21, 2016, 11:01:17 AM
to Justin Bogner, Justin Lebar, llvm...@lists.llvm.org
> I don't know man, when I create a branch to save my clang work I just
> create a branch with the same name in all the other repos I have checked
> out, then it just stays in the state I left it in as I go do other
> stuff. This kind of problem just hasn't really come up for me.
>

I find it too confusing to try to maintain several different patch
threads in one place. For one thing I'd have to keep separate build
directories anyway, why not just have entire separate clones and 'cd'
to the right one to do whatever piece of work. Much faster than doing
checkouts all the time and forgetting which build directory to use.
Clones are relatively cheap, I keep ten or so lying around each with
its own purpose.

On another topic, the sparse-checkout feature looks cool but it's
also complicated. I don't need all the projects all the time but
sometimes a commit will break something and suddenly I'll need to get
clang-tools-extra or lld or whatever. I don't want to bother keeping
them all around all the time.

Finally, the major drawback of a single huge repo IMHO:
In git, to push a commit it must sit on top of the current remote HEAD.
If HEAD has changed you need to rebase/rebuild/retest/retry.
With a single monster repo, a commit to 'lld' means I have to
go through this pain to put in my 'clang' tweak. Why is that good?
I doubt a sparse-checkout helps here.
--paulr
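The push rule Paul describes can be reproduced locally: a push is refused when the remote branch has moved on (a non-fast-forward update), even if the new upstream commit touched a different subproject, and the local commit has to be rebased first. A throwaway sketch, using a local bare repository as a stand-in for the shared upstream; the function name, the branch name `trunk`, and the file names are arbitrary:

```shell
#!/usr/bin/env bash
# Hypothetical demo: two clones of one repository; Alice pushes an lld change,
# then Bob's clang change is rejected until he rebases onto it.
# Prints "rejected" and then "pushed" when things behave as described.
push_demo() {
  local tmp
  tmp="$(mktemp -d)"
  git init --quiet --bare "$tmp/upstream.git"
  git init --quiet "$tmp/seed"
  mkdir "$tmp/seed/clang" "$tmp/seed/lld"
  echo base > "$tmp/seed/clang/file"
  echo base > "$tmp/seed/lld/file"
  git -C "$tmp/seed" add -A
  git -C "$tmp/seed" commit --quiet -m base
  git -C "$tmp/seed" push --quiet "$tmp/upstream.git" HEAD:refs/heads/trunk

  git clone --quiet -b trunk "$tmp/upstream.git" "$tmp/alice"
  git clone --quiet -b trunk "$tmp/upstream.git" "$tmp/bob"

  # Alice lands an lld change first.
  echo alice >> "$tmp/alice/lld/file"
  git -C "$tmp/alice" commit --quiet -am "lld change"
  git -C "$tmp/alice" push --quiet origin trunk

  # Bob's clang change is based on the old tip, so the push is refused.
  echo bob >> "$tmp/bob/clang/file"
  git -C "$tmp/bob" commit --quiet -am "clang change"
  if git -C "$tmp/bob" push --quiet origin trunk 2>/dev/null; then
    echo "unexpectedly pushed"
    return 1
  fi
  echo rejected

  # Rebasing onto the new tip (no conflict: different files) turns the push
  # into a fast-forward, and it goes through.
  git -C "$tmp/bob" pull --quiet --rebase origin trunk
  git -C "$tmp/bob" push --quiet origin trunk && echo pushed
}
```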

James Y Knight via llvm-dev

Jul 21, 2016, 11:11:49 AM
to David Chisnall, Justin Lebar via llvm-dev
That is a good point.

With the multi-repo plan, we were planning to take the existing git repositories that everyone's already using, and has work based on, and make them official.

However, with the single-repo plan, we'd be making a brand new git repository, with an integrated/interleaved history. As such, all the commit-hashes would be different, and even the directory layout will be different from the current git-svn repositories. And so we would "strand" all existing forks -- they'll be unable to easily pull in new changes to these repositories after the migration.

That we'll be getting incompatible history has been glossed over, and it is indeed really important to make it clear and have a good plan there. This doesn't only affect actual "forks", it also affects every single developer with a local git clone which contains unfinished work.

Therefore, we must come up with a plan to allow such users to rebase their existing work onto the new repository structure. Either documentation describing the git commands people need to run, or if it's really complicated, a script.

I don't think this is a really hard problem though -- I can think of a few ways to help existing users that probably will work (although I'd want to try them first, to ensure it actually does work, of course). The two I'm thinking of are just doing "git diff" followed by "git apply --directory=llvm" if you just want to save a patch. Or, some "git filter-branch" invocation to rename all the files in your existing repo, followed by "git rebase" (or "git merge"), if you have some more history you want to maintain.
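A rough sketch of both routes, assuming a POSIX shell and git on the PATH (the `FILTER_BRANCH_SQUELCH_WARNING` variable only matters on newer git releases); the `old` and `mono` repositories, branch names, and file contents are hypothetical stand-ins. Route 1 re-applies uncommitted work under `llvm/`. Route 2 rewrites a local branch and its upstream base so every file moves under `llvm/`, then replays only the local commits onto the monorepo.

```sh
#!/bin/sh
# Sketch of two ways to carry work from a standalone llvm clone into a
# monorepo where the same files live under llvm/.
set -e
export GIT_AUTHOR_NAME=dev GIT_AUTHOR_EMAIL=dev@example.com
export GIT_COMMITTER_NAME=dev GIT_COMMITTER_EMAIL=dev@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1  # newer git otherwise warns and pauses
WORK=$(mktemp -d)
trap 'rm -rf "$WORK"' EXIT
cd "$WORK"

# Old layout: a standalone llvm clone with sources at the root.
git init -q old
git -C old symbolic-ref HEAD refs/heads/main
echo v1 > old/Foo.cpp
git -C old add Foo.cpp
git -C old commit -q -m 'upstream: add Foo.cpp'

# New layout: the same content under llvm/, next to another subproject.
git init -q mono
git -C mono symbolic-ref HEAD refs/heads/main
mkdir -p mono/llvm mono/clang
echo v1 > mono/llvm/Foo.cpp
echo c > mono/clang/Bar.cpp
git -C mono add .
git -C mono commit -q -m 'monorepo: import'

# Route 1: save uncommitted work as a patch, apply it with llvm/ prepended.
echo v2 > old/Foo.cpp
git -C old diff --no-color > patch.diff
git -C mono apply --directory=llvm "$WORK/patch.diff"
ROUTE1=$(cat mono/llvm/Foo.cpp)
git -C old checkout -q -- Foo.cpp        # undo, so both trees are clean again
git -C mono checkout -q -- llvm/Foo.cpp

# Route 2: keep history. Commit work on a branch, move every file under
# llvm/ in both the branch and its upstream base, then replay only the
# local commits onto the monorepo's main.
git -C old checkout -q -b feature
echo v3 > old/Foo.cpp
git -C old commit -q -am 'local: change Foo.cpp'
git -C old filter-branch -f --tree-filter \
  'mkdir -p llvm && find . -mindepth 1 -maxdepth 1 ! -name llvm -exec mv {} llvm/ \;' \
  main feature > /dev/null
git -C mono fetch -q "$WORK/old" main:old-main feature:old-feature
git -C mono rebase -q --onto main old-main old-feature
```

Plain `git rebase` linearizes history, so a fork that has merged from upstream several times would need more care, for example the `git merge` variant instead of the final rebase.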

To me, it seems eminently worth it to pay a one-time transition cost like that, if it makes life easier afterwards, which I believe the single-repo system would do, as long as it's documented well enough that not every developer needs to figure it out on their own.

David Chisnall via llvm-dev

Jul 21, 2016, 11:22:52 AM
to James Y Knight, Justin Lebar via llvm-dev
On 21 Jul 2016, at 16:11, James Y Knight <jykn...@google.com> wrote:
>
> I don't think this is a really hard problem though -- I can think of a few ways to help existing users that probably will work (although I'd want to try them first, to ensure it actually does work, of course). The two I'm thinking of are just doing "git diff" followed by "git apply --directory=llvm" if you just want to save a patch. Or, some "git filter-branch" invocation to rename all the files in your existing repo, followed by "git rebase" (or "git merge"), if you have some more history you want to maintain.

Our clones of LLVM and clang have a reasonable amount of history (a couple of hundred commits, I believe), including multiple branches, that we’d want to preserve. Both branches have merged from upstream multiple times. It’s one of the smaller friendly forks that I know about. I’ve not used git filter-branch before, but I’d be very impressed if there is some simple invocation that can move from this model.

I was in favour of the GitHub migration primarily because a lot of downstream LLVM users already have a workflow based around GitHub that works well and the proposal was to make this closer to the official workflow. I’m very nervous about a last-minute change to require everyone downstream to restructure their workflows.

In particular, the fact that we have a third more public GitHub forks of LLVM than of clang, and eight times as many as of lldb implies to me that forcing everyone downstream to pull in all subprojects would not be particularly well received.

David

Daniel Berlin via llvm-dev

Jul 21, 2016, 11:42:49 AM
to Renato Golin, Justin Lebar via llvm-dev
I have lost my desire to be part of this thread, so I'm just going to make two quick points and then I'll leave y'all to your devices.
Apologies for not responding to every point you make.

> Taking one of my points in separate as if it means my whole argument,
> each time, *is* over simplifying it.

I'm not sure what you are expecting people to do here.  People generally try to find the main points of contention they care about, and respond to them, and ignore side issues.  If you would like different behavior out of people, that's really hard to get, but different approaches to laying out your argument may help.

 
> You have no idea how many times I read the same emails over and over
> to make sure I cite the right person. That's why I have consistently
> re-written a summary of every thread, with proper quotes and
> everything.

In the email I replied to, and we are discussing, you did not quote or cite a single person. When you replied to my response, you said you were representing others' views. Which, like I said, I actually have no doubt is true, but I'm just pointing out that not a single person was directly quoted or cited in the email we are talking about here.
I'm going to gently suggest that if that had happened, you would have gotten a different response.
You can argue "that was the one time I didn't do it", etc, but even if that's true, that may be why you got the response you did :)

--Dan


Justin Lebar via llvm-dev

Jul 21, 2016, 12:16:49 PM
to David Chisnall, Justin Lebar via llvm-dev
> Our clones of LLVM and clang have a reasonable amount of history (a couple of hundred commits, I believe), including multiple branches, that we’d want to preserve. Both branches have merged from upstream multiple times. It’s one of the smaller friendly forks that I know about. I’ve not used git filter-branch before, but I’d be very impressed if there is some simple invocation that can move from this model.

James and I owe you something here. I think this can be handled in a
straightforward manner, but I am not 100% sure how at the moment. I
agree this is very important.

Our demo would be much more compelling if we can use an existing
branch. Does anyone know of one we can play with?

> In particular, the fact that we have a third more public GitHub forks of LLVM than of clang, and eight times as many as of lldb implies to me that forcing everyone downstream to pull in all subprojects would not be particularly well received.

I have a hard time understanding this particular argument. Per the
original e-mail, with three shell commands, you can hide whichever
llvm subprojects you want. After doing that, the only overhead of the
subprojects is extra space in your .git directory, which would still
be much smaller than an llvm+clang objdir.
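One classic way to hide subprojects, not necessarily the exact commands from the original e-mail: enable `core.sparseCheckout`, list the directories to keep, and re-read the index. This sketch assumes a POSIX shell and git on the PATH, with hypothetical directory names; it also shows that widening the checkout later is one more pattern plus another `read-tree`. Newer git releases wrap the same idea in a `git sparse-checkout` command.

```sh
#!/bin/sh
# Sketch: the full history is present, but only some subprojects are
# materialized in the working tree.
set -e
export GIT_AUTHOR_NAME=dev GIT_AUTHOR_EMAIL=dev@example.com
export GIT_COMMITTER_NAME=dev GIT_COMMITTER_EMAIL=dev@example.com
WORK=$(mktemp -d)
trap 'rm -rf "$WORK"' EXIT
cd "$WORK"

git init -q mono
cd mono
mkdir llvm clang lldb clang-tools-extra
echo a > llvm/A.cpp
echo b > clang/B.cpp
echo c > lldb/C.cpp
echo d > clang-tools-extra/D.cpp
git add .
git commit -q -m 'import'

# Enable sparse checkout, list what to keep, re-read the index so that
# everything else leaves the working tree (it stays in .git).
git config core.sparseCheckout true
mkdir -p .git/info
printf '/llvm/\n/clang/\n' > .git/info/sparse-checkout
git read-tree -mu HEAD
LLDB_AFTER_NARROW=$( [ -e lldb/C.cpp ] && echo present || echo absent )

# Needing another subproject later: add a pattern and re-read the index.
echo '/clang-tools-extra/' >> .git/info/sparse-checkout
git read-tree -mu HEAD
```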

Is there something specific that you think will not be well-received?
Or maybe it's better to speak personally -- is there something
specific that will bother you personally about having to clone (but
not check out) everything?

Justin Lebar via llvm-dev

Jul 21, 2016, 12:39:45 PM
to David Chisnall, Justin Lebar via llvm-dev
One other point about maintaining branches:

With the single repository approach, maintaining a long-running branch
that touches multiple subprojects (e.g. llvm and clang) becomes *far*
simpler.

With the umbrella repo, you have to do the submodules trickery I
described in the original e-mail. It is complicated, and takes a lot
of typing (or requires you to develop custom scripts). But with the
single repo, this cross-cutting branch is just a branch.

In fact even if your branch isn't cross-cutting, if it's not a branch
of LLVM proper, I'm curious how you'd do things like bisect the
branch, or even just check out and build an old version. You check
out an old version of the (say) clang branch, and then presumably you
try to figure out the corresponding version in the LLVM repo that you
need to check out. I guess you could find the upstream parent of your
branch, get the SVN revision number from the commit message, then go
to the LLVM branch and find a commit which has an SVN number that's
nearby?

This would all become as simple as "git checkout" under the monolithic model.
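A sketch of that manual lookup, assuming a POSIX shell, git on the PATH, and git-svn style `git-svn-id:` lines in the mirrors' commit messages; the URL, UUID, and revision numbers are made up. Because SVN revision numbers are shared by all subprojects, the matching llvm commit is taken to be the newest one whose revision does not exceed the clang branch's upstream revision.

```sh
#!/bin/sh
# Sketch: pair a local clang branch with an llvm commit via SVN revisions.
set -e
export GIT_AUTHOR_NAME=dev GIT_AUTHOR_EMAIL=dev@example.com
export GIT_COMMITTER_NAME=dev GIT_COMMITTER_EMAIL=dev@example.com
WORK=$(mktemp -d)
trap 'rm -rf "$WORK"' EXIT
cd "$WORK"
UUID=00000000-0000-0000-0000-000000000000   # placeholder repository UUID

# Commit to repo $1 with a git-svn-id line carrying SVN revision $2.
svn_commit() {
  echo "r$2" >> "$1/log.txt"
  git -C "$1" add log.txt
  git -C "$1" commit -q -m "r$2" \
    -m "git-svn-id: https://svn.example.invalid/$1/trunk@$2 $UUID"
}

# Print the SVN revision recorded in commit $2 of repo $1 (empty if none).
svn_rev() {
  git -C "$1" log -1 --format=%B "$2" |
    sed -n 's/^git-svn-id: [^@]*@\([0-9][0-9]*\) .*/\1/p'
}

for r in llvm clang; do
  git init -q "$r"
  git -C "$r" symbolic-ref HEAD refs/heads/main
done
# SVN revisions are global, so the two mirrors interleave.
svn_commit llvm 100
svn_commit clang 101
svn_commit llvm 102
svn_commit clang 104
svn_commit llvm 105

# A local clang branch with one unpushed commit.
git -C clang checkout -q -b mybranch
echo local >> clang/log.txt
git -C clang commit -q -am 'local work'

# Upstream parent of the branch, and its SVN revision.
BASE=$(git -C clang merge-base mybranch main)
REV=$(svn_rev clang "$BASE")

# Walk llvm history newest-first; stop at the first revision <= REV.
LLVM_MATCH=
for c in $(git -C llvm rev-list main); do
  r=$(svn_rev llvm "$c")
  if [ -n "$r" ] && [ "$r" -le "$REV" ]; then LLVM_MATCH=$c; break; fi
done
echo "clang r$REV pairs with llvm commit $LLVM_MATCH"
```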

David Chisnall via llvm-dev

Jul 21, 2016, 12:43:23 PM
to Justin Lebar, Justin Lebar via llvm-dev
On 21 Jul 2016, at 17:16, Justin Lebar <jle...@google.com> wrote:
>
>> Our clones of LLVM and clang have a reasonable amount of history (a couple of hundred commits, I believe), including multiple branches, that we’d want to preserve. Both branches have merged from upstream multiple times. It’s one of the smaller friendly forks that I know about. I’ve not used git filter-branch before, but I’d be very impressed if there is some simple invocation that can move from this model.
>
> James and I owe you something here. I think this can be handled in a
> straightforward manner, but I am not 100% sure how at the moment. I
> agree this is very important.
>
> Our demo would be much more compelling if we can use an existing
> branch. Does anyone know of one we can play with?

Ours are public:

https://github.com/ctsrd-cheri/llvm/
https://github.com/ctsrd-cheri/clang/

If you can provide a convincing demo of how we migrate from there (and how we migrate local clones of that) to a new model then I’d happily withdraw all objections.

I still don’t really like the single repo model, as I’d rather see LLVM fulfil its promise of being a set of modular reusable libraries with less tight coupling, allowing all of the subprojects to evolve at different rates, but I won’t object on purely aesthetic grounds.

David

Pete Cooper via llvm-dev

Jul 21, 2016, 12:46:31 PM
to Renato Golin, LLVM Developers
Thanks for driving this, Renato. It's going to be a huge benefit to everyone once we have a solution in place.

On Jul 20, 2016, at 11:03 PM, Renato Golin via llvm-dev <llvm...@lists.llvm.org> wrote:

> When everyone is happy that we have enough proposals, Tanya's survey
> should be brought forward, in which case I'll gladly offer my help
> again.

Regarding the survey specifically, and since I didn’t see a thread discussing survey options, I’d love to have a ‘I don’t mind what the solution is, I just want git’ option.  Basically, ‘any of the above’.

For me, I’m very happy with the proposals being discussed, but mostly just want to move to a more reliable hosting service (full disclosure, I’m a fan of GitHub), and I use git-svn anyway so native git would be best for me.

Anyway, not trying to derail the discussion, just express that there are likely many others like me out there who are silent not because we don’t have an opinion, but because we just want git and don’t want to have an excessive number of +1’s on a thread saying so.

Cheers,
Pete

Renato Golin via llvm-dev

Jul 21, 2016, 12:49:53 PM
to Justin Lebar, Justin Lebar via llvm-dev
Question:

Which projects do we put under this monolithic repository?

SVN has about 42 projects, some of them dead, some of them on life support.

So far, being "an upstream repository" meant being inside the LLVM SVN
server. We'll change that to "being inside the monolithic LLVM
repository". But this can become huge, and not all projects "link" back
to LLVM.

An alternative would be to just have some core projects in the
monolithic and everything else as separate, but then what's core?

As a back-of-the-envelope, I suggest: llvm, clang, clang-tools-extra,
compiler-rt, libc++, libc++abi, libunwind, test-suite.

I'm thinking LLD and LLDB could remain out, but I don't think it would
be too weird for them to be in...

Anything else? Less?

cheers,
--renato

C Bergström

Jul 21, 2016, 1:03:44 PM
to Renato Golin, Justin Lebar via llvm-dev
Monolithic is trying to solve the wrong problem - it's that simple.
Any discussion or attempt to coddle those who think it's necessary is
a waste of time. #dictator

As part of any potential migration, everyone involved must start to
accept certain changes, (large or small) to the workflow. The big
challenge here isn't technical, it's mindset. It's convincing any
group of people who object that it won't be as painful to them as they
think. (I hope this is a true statement)

#if - there's a group of people who are dogmatic, stubborn and
unreasonable - others outside that group should decide how to deal
with them: ignore, coddle, placate or other.

I don't think there's a perfect technical solution to make everyone
happy - I think focusing on the social engineering will be an equal or
greater importance. (herding cats)

With the survey - I guess you could include levels of objection,
like "strongly against" or "over my dead body"; those reactions are
probably the ones to be most cautious about. Anyone surveyed who falls in
the middle or slightly left/right can be seen as "flexible". If it
turns out that the survey shows only 1-5 people with extreme views
and 100 people with moderate or flexible views - those are hard
numbers. From there decisions can be made and long unending threads
like this can die - so we can all get back to reading more important
things.

David Chisnall via llvm-dev

Jul 21, 2016, 1:06:14 PM
to Renato Golin, Justin Lebar via llvm-dev
On 21 Jul 2016, at 17:49, Renato Golin <renato...@linaro.org> wrote:
>
> As a back-of-the-envelope, I suggest: llvm, clang, clang-tools-extra,
> compiler-rt, libc++, libc++abi, libunwind, test-suite.

The minimum that makes sense is llvm, though that defeats the point of a combined repo.

I don’t think that libc++ / libc++abi make sense there for several reasons:

- You very rarely need to update them in lockstep with anything else

- LLVM/Clang is useful and frequently built without libc++

- libc++ is useful and frequently built without any of the rest of LLVM

The same applies to libunwind. If you’re building an entire toolchain then you might want to use it, but most projects don’t benefit from it and it implements a well-defined standard ABI and so doesn’t need to be updated in lockstep with anything else.

clang-tools-extra is explicitly a bunch of stuff that doesn’t belong in the main clang repo because it’s not of interest to most people doing clang work, so it’s hard to see why it would be of interest to everyone doing LLVM work. Additionally, I believe that they’re mostly things that are built on top of APIs in clang that are supposed to be moderately stable, so shouldn’t need atomically updating with respect to clang very often.

Compiler-rt probably makes sense if clang is there, as it includes a lot of the run-time support for clang.

David

Renato Golin via llvm-dev

Jul 21, 2016, 1:11:23 PM
to David Chisnall, Justin Lebar via llvm-dev
On 21 July 2016 at 18:06, David Chisnall <david.c...@cl.cam.ac.uk> wrote:
> - libc++ is useful and frequently built without any of the rest of LLVM

but it interlocks with libunwind and compiler-rt...


> The same applies to libunwind. If you’re building an entire toolchain then you might want to use it, but most projects don’t benefit from it and it implements a well-defined standard ABI and so doesn’t need to be updated in lockstep with anything else.

Using RT without libunwind on ARM is weird. libgcc_s has some of the
functionality, but the split between libgcc, _s and _eh is not the
same as compiler-rt, libc++abi and libunwind.

If one wants a reasonable solution, one (today) needs to include all
three. Then why not libc++? I mean, GCC does build libstdc++ in tree
already, so it wouldn't be unheard of.


> clang-tools-extra is explicitly a bunch of stuff that doesn’t belong in the main clang repo because it’s not of interest to most people doing clang work, so it’s hard to see why it would be of interest to everyone doing LLVM work. Additionally, I believe that they’re mostly things that are built on top of APIs in clang that are supposed to be moderately stable, so shouldn’t need atomically updating with respect to clang very often.

ok, no strong feelings about it.


> Compiler-rt probably makes sense if clang is there, as it includes a lot of the run-time support for clang.

RT strongly fits into the core. If there is a minimal-minimal core to be
set, that'd be { llvm, clang, RT }. If not for what it can do today,
for what it should do in the future.

Justin Lebar via llvm-dev

Jul 21, 2016, 1:13:19 PM
to Renato Golin, Justin Lebar via llvm-dev
> Which projects do we put under this monolithic repository?

The proposal at the moment is to include

llvm, clang, clang-tools-extra, lld, polly, lldb, llgo, compiler-rt,
openmp, and parallel-libs.

This is the set {llvm} plus the transitive closure of "projects that
are version-locked to a project in the set", where the closure is
taken over the set of all active LLVM subprojects.
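The membership rule above is a fixpoint computation: start from {llvm} and keep adding any project that is version-locked to something already in the set until nothing changes. A minimal sketch in POSIX shell; the version-lock edges are illustrative assumptions chosen to reproduce the proposed list, not an authoritative dependency map.

```sh
#!/bin/sh
# Sketch: {llvm} plus the closure of "version-locked to a member".
set -e

# Hypothetical "project locked-to" pairs, one per line.
EDGES='clang llvm
clang-tools-extra clang
compiler-rt clang
lld llvm
lldb clang
polly llvm
llgo llvm
openmp llvm
parallel-libs llvm'
# test-suite and libcxx have no edges here: not version-locked.

# Succeed if $1 is already a member of SET.
in_set() {
  case " $SET " in *" $1 "*) return 0 ;; esac
  return 1
}

SET=llvm
changed=1
while [ "$changed" -eq 1 ]; do
  changed=0
  while read -r proj dep; do
    if in_set "$dep" && ! in_set "$proj"; then
      SET="$SET $proj"
      changed=1
    fi
  done <<EOF
$EDGES
EOF
done
echo "monorepo members: $SET"
```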

Projects that don't depend on a specific version of llvm or some other
subproject -- test-suite and libc++ -- are not included. Everything
else is, because the whole idea is to have one repository that
captures the implicit versioning dependencies between (say) lldb and
llvm.

As soon as we have one version-locked subproject that's not in the
monolithic repo, we have to maintain an umbrella repo that
tells you which version of llvm corresponds to which version of the
version-locked-but-not-in-monolithic repo.

The cost of including additional projects in the monolithic repository
is very low, since you can ignore them using sparse checkouts.

C Bergström

Jul 21, 2016, 1:14:51 PM
to Renato Golin, Justin Lebar via llvm-dev
On Fri, Jul 22, 2016 at 1:11 AM, Renato Golin via llvm-dev
<llvm...@lists.llvm.org> wrote:
> On 21 July 2016 at 18:06, David Chisnall <david.c...@cl.cam.ac.uk> wrote:
>> - libc++ is useful and frequently built without any of the rest of LLVM
>
> but it interlocks with libunwind and compiler-rt...
>
>
>> The same applies to libunwind. If you’re building an entire toolchain then you might want to use it, but most projects don’t benefit from it and it implements a well-defined standard ABI and so doesn’t need to be updated in lockstep with anything else.
>
> Using RT without libunwind on ARM is weird. libgcc_s has some of the
> functionality, but the split between libgcc, _s and _eh is not the
> same as compiler-rt, libc++abi and libunwind.
>
> If one wants a reasonable solution, one (today) needs to include all
> three. Then why not libc++? I mean, GCC does build libstdc++ in tree
> already, so it wouldn't be unheard of.

Not true - the non-gnu libunwind which is outside of the llvm family
of projects works just fine on AArch64. We're dealing with hopefully
standard interfaces and A1 vs A2 comparisons.

Again - this thread is digressing - wrong solution to the wrong problem
- Renato, step back and try not to let the thread get sidetracked.

NAKAMURA Takumi via llvm-dev

Jul 21, 2016, 1:40:09 PM
to James Y Knight, Renato Golin, Justin Lebar via llvm-dev
TL;DR, excuse me.

Before we can agree to merge to a single-repo, there's one further question that must be resolved:

Should the layout in the merged repository be:
1) Like the "llvm-project" git repository is now:

<root>/llvm/
<root>/clang/
<root>/compiler-rt
...

FYI, the layout can be synthesized with tree objects in each *.git with git-plumbing commands.
 
2) Like the "ideal merged checkout" is now:
llvm/
llvm/tools/clang
llvm/projects/compiler-rt
...
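A sketch of the plumbing approach mentioned under layout 1, assuming a POSIX shell and git on the PATH; the repositories and files are hypothetical, and the existing llvm-project mirror may well be built differently. The existing tree objects are reused unchanged as `llvm/` and `clang/`, so only a new root tree and a new commit are created.

```sh
#!/bin/sh
# Sketch: synthesize an llvm-project style commit (<root>/llvm/, <root>/clang/)
# from two existing single-project repositories using plumbing only.
set -e
export GIT_AUTHOR_NAME=dev GIT_AUTHOR_EMAIL=dev@example.com
export GIT_COMMITTER_NAME=dev GIT_COMMITTER_EMAIL=dev@example.com
WORK=$(mktemp -d)
trap 'rm -rf "$WORK"' EXIT
cd "$WORK"

# Hypothetical stand-ins for the existing llvm and clang repositories.
for p in llvm clang; do
  git init -q "$p"
  echo "$p source" > "$p/Main.cpp"
  git -C "$p" add Main.cpp
  git -C "$p" commit -q -m "$p: initial commit"
done

# Copy both projects' objects into one repository under private refs.
git init -q --bare combined.git
for p in llvm clang; do
  git -C combined.git fetch -q "$WORK/$p" "HEAD:refs/imports/$p"
done
LLVM_COMMIT=$(git -C combined.git rev-parse refs/imports/llvm)
CLANG_COMMIT=$(git -C combined.git rev-parse refs/imports/clang)
LLVM_TREE=$(git -C combined.git rev-parse "$LLVM_COMMIT^{tree}")
CLANG_TREE=$(git -C combined.git rev-parse "$CLANG_COMMIT^{tree}")

# Reuse the existing tree objects as subdirectories of a new root tree.
# mktree reads `git ls-tree` style lines and normalizes their order.
ROOT_TREE=$(printf '040000 tree %s\tllvm\n040000 tree %s\tclang\n' \
  "$LLVM_TREE" "$CLANG_TREE" | git -C combined.git mktree)

# Record both project heads as parents (one possible choice).
COMBINED=$(echo 'Synthesized llvm-project layout' |
  git -C combined.git commit-tree "$ROOT_TREE" -p "$LLVM_COMMIT" -p "$CLANG_COMMIT")
git -C combined.git update-ref refs/heads/main "$COMBINED"
```

Recording both heads as parents keeps each project's history reachable. Whether to interleave the histories instead is a separate design choice.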

I suppose the layout exists only for historical reasons.
At least, clang is position-independent. It would work even if clang were at $(BUILD_ROOT)/projects/clang instead of $(BUILD_ROOT)/tools/clang.

Since the llvm-project unified tree repo has been unofficial, I didn't propose any llvm-project-specific changes.
I have just been making tweaks so that subprojects are not assumed to be position-dependent.

I think we could take any option, svn, set of single repos, or unified tree.
ATM, I don't agree with simply migrating to the single git repo. :)

Jonathan Roelofs via llvm-dev

Jul 21, 2016, 1:44:57 PM
to C Bergström, Justin Lebar via llvm-dev

On 7/21/16 11:03 AM, C Bergström via llvm-dev wrote:
> Monolithic is trying to solve the wrong problem - it's that simple.
> Any discussion or attempt to coddle those who think it's necessary is
> a waste of time. #dictator

Christopher,

AFAICT, you haven't explained *why* it is the wrong problem. Mind
elaborating on that?


Jon

p.s: edicts, appeals to authority, and ad hominems are not useful for
discussion. Doing that, and following up with "#dictator" further
solidifies that you know your own argument is b-s.... please stop.

--
Jon Roelofs
jona...@codesourcery.com
CodeSourcery / Mentor Embedded

Sanjoy Das via llvm-dev

Jul 21, 2016, 2:04:17 PM
to Jonathan Roelofs, Justin Lebar via llvm-dev
FWIW, like David Chisnall, we (Azul) have a problem with rewriting
history. Our LLVM fork has O(100) changes diverging from upstream
(though our branching structure is simple), and keeping all of that
history is important.

What do people think of having one (or a set of) merge commit(s)
merging in the non-llvm projects that will be part of the new
monorepo? That's the only technique I can think of that will preserve
history for downstream users by construction.

-- Sanjoy

--
Sanjoy Das
http://playingwithpointers.com

Justin Lebar via llvm-dev

Jul 21, 2016, 2:20:26 PM
to Sanjoy Das, Justin Lebar via llvm-dev, Jonathan Roelofs
> What do people think of having one (or a set of) merge commit(s)
> merging in the non-llvm projects that will be part of the new
> monorepo? That's the only technique I can think of that will preserve
> history for downstream users by construction.

This would solve the problem of importing history. But if we did it
this way, you'd be unable to check out versions of the complete repo
from before the merge date. So you'd be unable to bisect back before
the merge date, for example. I think the umbrella repo might be a
better solution than one which had that property.

I don't know if there's a way to allow checkouts of everything from
before the merge date while also making the custom branch merge to the
monolithic repository as trivial as "git merge". I think it may
depend on git's handling of file renames, and if so...I am not too
hopeful. :)

For at least David's branches, I think it would be really cool if we
could merge the llvm and clang branches into a single branch with
correct history. We wouldn't be able to do that if we used git merge
to build the monorepo out of its constituent pieces.

Renato Golin via llvm-dev

Jul 21, 2016, 4:55:52 PM
to Justin Lebar, Justin Lebar via llvm-dev
On 21 July 2016 at 18:12, Justin Lebar <jle...@google.com> wrote:
> llvm, clang, clang-tools-extra, lld, polly, lldb, llgo, compiler-rt,
> openmp, and parallel-libs.

I really, *really* would like to see libc++ / abi / unwind. :)

My reason is that, when building toolchains, the C++ ABI and unwinding
are fundamental parts of the run-time library, of which RT is only
one part.

RT has the builtins (and a lot of other stuff), but it can't unwind on
its own. So debuggers (LLDB), profilers (which live in RT) and basic
stack traces don't work, unless you use an alternative option (like
libgcc). This is *especially* true for ARM.

When unwinding C++ code, one needs cxa_* functions, and that's in
libc++abi, which interoperates with libc++, unwind and RT.

The LLVM triple abi/unwind/RT is not divided in the same way as
gcc_eh/gcc_s/gcc, so picking some but not others is not a sane option.
Plus, validating every possible choices needs one buildbot for each
combination, which is not feasible, at least not for us.

Basically, picking RT and not unwind/abi breaks their
inter-dependencies, so does picking abi but not libc++.


> Projects that don't depend on a specific version of llvm or some other
> subproject -- test-suite and libc++ -- are not included. Everything
> else is, because the whole idea is to have one repository that
> captures the implicit versioning dependencies between (say) lldb and
> llvm.

I'm fine with the test-suite not being in the core, but the others
will make it very hard to build actual toolchains.

They're also reasonably small, rarely updated and self-contained, so I
don't see why they can't be there.

Chandler Carruth via llvm-dev

Jul 21, 2016, 5:11:35 PM
to Renato Golin, Justin Lebar, Justin Lebar via llvm-dev
On Thu, Jul 21, 2016 at 1:55 PM Renato Golin via llvm-dev <llvm...@lists.llvm.org> wrote:
> On 21 July 2016 at 18:12, Justin Lebar <jle...@google.com> wrote:
>>   llvm, clang, clang-tools-extra, lld, polly, lldb, llgo, compiler-rt,
>> openmp, and parallel-libs.
>
> I really, *really* would like to see libc++ / abi / unwind. :)

FWIW, I agree for all the reasons you outline.

I didn't suggest it to Justin only because I know those in particular aren't really version-locked. Personally, I'd rather merge all of it and make checking out a slice easy and simple. But I felt like the decision there could be separate for those libraries, and probably should be made by the devs working on those libraries more than me.

-Chandler

Renato Golin via llvm-dev

Jul 21, 2016, 5:19:06 PM
to Chandler Carruth, Justin Lebar via llvm-dev
On 21 July 2016 at 22:11, Chandler Carruth <chan...@google.com> wrote:
> I didn't suggest it to Justin only because I know those in particular aren't
> really version-locked. Personally, I'd rather merge all of it and make
> checking out a slice easy and simple. But I felt like the decision there
> could be separate for those libraries, and probably should be made by the
> devs working on those libraries more than me.

Makes sense.

We work in RT and have a strong vested interest in libunwind, and for
us, having them bundled would be a major win.

libc++/abi matter more as dependencies, but would also be much nicer
bundled. Marshall may have a better view on that specific subject.

Mehdi Amini via llvm-dev

Jul 21, 2016, 5:26:24 PM
to Chandler Carruth, Justin Lebar via llvm-dev

On Jul 21, 2016, at 2:11 PM, Chandler Carruth via llvm-dev <llvm...@lists.llvm.org> wrote:

> On Thu, Jul 21, 2016 at 1:55 PM Renato Golin via llvm-dev <llvm...@lists.llvm.org> wrote:
>> On 21 July 2016 at 18:12, Justin Lebar <jle...@google.com> wrote:
>>>   llvm, clang, clang-tools-extra, lld, polly, lldb, llgo, compiler-rt,
>>> openmp, and parallel-libs.
>>
>> I really, *really* would like to see libc++ / abi / unwind. :)
>
> FWIW, I agree for all the reasons you outline.

Because it is hard to agree on what to put here, I’d have every single LLVM project that is not dying/very experimental research/… inside the repo.

— 
Mehdi

Mehdi Amini via llvm-dev

Jul 21, 2016, 5:29:18 PM
to Sanjoy Das, Justin Lebar via llvm-dev, Jonathan Roelofs

> On Jul 21, 2016, at 11:03 AM, Sanjoy Das via llvm-dev <llvm...@lists.llvm.org> wrote:
>
> FWIW, like David Chisnall, we (Azul) have a problem with rewriting
> history.
> Our LLVM fork has O(100) changes diverging from upstream
> (though our branching structure is simple), and keeping all of that
> history is important.

Nobody downstream has to adopt the new structure: I believe it is possible to extract only the “llvm” commits from the new repo and rebase them on top of the existing llvm repo.
This can be done on the fly by your CI, but it is also a deterministic process, i.e. you can restart from scratch anytime (assuming you have the original llvm.git repo and the new one).
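A sketch of the extraction step only (the rebase onto the existing llvm repo is not shown), assuming a POSIX shell and git on the PATH; `git filter-branch --subdirectory-filter` is one way to do it, and the repository contents are hypothetical. Commits that never touch `llvm/` drop out, `llvm/` becomes the root again, and rewriting a second copy of the branch yields the same hashes, because filter-branch preserves the original author and committer data.

```sh
#!/bin/sh
# Sketch: extract the llvm/ history from a monorepo branch, making llvm/
# the root again, and check that the rewrite is deterministic.
set -e
export GIT_AUTHOR_NAME=dev GIT_AUTHOR_EMAIL=dev@example.com
export GIT_COMMITTER_NAME=dev GIT_COMMITTER_EMAIL=dev@example.com
export FILTER_BRANCH_SQUELCH_WARNING=1  # newer git otherwise warns and pauses
WORK=$(mktemp -d)
trap 'rm -rf "$WORK"' EXIT
cd "$WORK"

git init -q mono
cd mono
git symbolic-ref HEAD refs/heads/main

# Append line $2 to file $1 and commit it with $2 as the message.
commit_line() {
  mkdir -p "$(dirname "$1")"
  echo "$2" >> "$1"
  git add "$1"
  git commit -q -m "$2"
}
commit_line llvm/A.cpp 'llvm: first change'
commit_line clang/B.cpp 'clang: unrelated change'
commit_line llvm/A.cpp 'llvm: second change'

# Rewrite two independent copies of main in exactly the same way.
for b in llvm-only-1 llvm-only-2; do
  git branch "$b" main
  git filter-branch -f --subdirectory-filter llvm "$b" > /dev/null
done
git log --format=%s llvm-only-1
```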

>
> What do people think of having one (or a set of) merge commit(s)
> merging in the non-llvm projects that will be part of the new
> monorepo? That's the only technique I can think of that will preserve
> history for downstream users by construction.

I have no idea what you mean here?


Mehdi

Mehdi Amini via llvm-dev

Jul 21, 2016, 5:32:13 PM
to Sanjoy Das, Justin Lebar via llvm-dev, Jonathan Roelofs

> On Jul 21, 2016, at 2:29 PM, Mehdi Amini <mehdi...@apple.com> wrote:
>
>
>> On Jul 21, 2016, at 11:03 AM, Sanjoy Das via llvm-dev <llvm...@lists.llvm.org> wrote:
>>
>> FWIW, like David Chisnall, we (Azul) have a problem with rewriting
>> history.
>> Our LLVM fork has O(100) changes diverging from upstream
>> (though our branching structure is simple), and keeping all of that
>> history is important.
>
> Nobody downstream has to adopt the new structure: I believe it is possible to extract only the “llvm” commits from the new repo and rebase them on top of the existing llvm repo.
> This can be done on the fly by your CI, but it is also a deterministic process, i.e. you can restart from scratch anytime (assuming you have the original llvm.git repo and the new one).
>
>>
>> What do people think of having one (or a set of) merge commit(s)
>> merging in the non-llvm projects that will be part of the new
>> monorepo? That's the only technique I can think of that will preserve
>> history for downstream users by construction.
>
> I have no idea what you mean here?

I think I understand what you mean:

1) checkout the existing clang repo
2) move everything in a subdirectory “clang”
3) commit the move
4) merge this into the new “llvm-project”.
5) repeat for every single project

That should preserve the hashes and avoid users having to “extract” the subproject to merge into their own branch.
Annoyingly, it breaks git log path/to/file though.
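A sketch of steps 1 to 4 for a single subproject, assuming a POSIX shell and git 2.9 or later (for `--allow-unrelated-histories`); the repositories and file names are hypothetical. The original clang tip remains an ancestor of the merged `HEAD`, which is what would let an existing clang fork keep merging from the new repository.

```sh
#!/bin/sh
# Sketch: move a project's history under a subdirectory and merge it into
# a combined repository without rewriting any existing commit.
set -e
export GIT_AUTHOR_NAME=dev GIT_AUTHOR_EMAIL=dev@example.com
export GIT_COMMITTER_NAME=dev GIT_COMMITTER_EMAIL=dev@example.com
WORK=$(mktemp -d)
trap 'rm -rf "$WORK"' EXIT
cd "$WORK"

# Hypothetical existing clang repository, as a downstream fork sees it.
git init -q clang
echo sema > clang/Sema.cpp
git -C clang add Sema.cpp
git -C clang commit -q -m 'clang: add Sema.cpp'
CLANG_TIP=$(git -C clang rev-parse HEAD)

# Steps 1-3: move every top-level entry under clang/ and commit the move.
cd clang
mkdir clang
git ls-tree --name-only HEAD | while read -r f; do git mv "$f" clang/; done
git commit -q -m 'Move clang sources into clang/'
cd ..

# Step 4: merge the moved history into the new llvm-project repository.
git init -q llvm-project
mkdir llvm-project/llvm
echo llvm > llvm-project/llvm/A.cpp
git -C llvm-project add llvm
git -C llvm-project commit -q -m 'llvm: import'
git -C llvm-project fetch -q "$WORK/clang" HEAD:refs/imports/clang
git -C llvm-project merge -q --no-edit --allow-unrelated-histories \
  -m 'Merge clang history into llvm-project' refs/imports/clang
# Step 5 would repeat this for every other subproject.
```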


Mehdi

Robinson, Paul via llvm-dev

Jul 21, 2016, 5:33:50 PM
to Renato Golin, Justin Lebar, llvm...@lists.llvm.org
> On 21 July 2016 at 18:12, Justin Lebar <jle...@google.com> wrote:
> > llvm, clang, clang-tools-extra, lld, polly, lldb, llgo, compiler-rt,
> > openmp, and parallel-libs.
>
> I really, *really* would like to see libc++ / abi / unwind. :)
>
> My reason is that, when building toolchains, the C++ ABI and unwinding
> are fundamental parts of the run-time library, of which RT is only
> one part.

When building *your* toolchain...

My toolchain uses clang but not libc++/abi/unwind, we have our own, and
we don't currently include them in our tree. We do include compiler-rt.

If we should change our minds later we can opt-in to anything else we
want (libcxx etc, lld? lldb? who knows) but in the meantime they are
unnecessary baggage for my purposes.

--paulr

Renato Golin via llvm-dev

Jul 21, 2016, 5:46:11 PM
to Robinson, Paul, llvm...@lists.llvm.org
On 21 July 2016 at 22:33, Robinson, Paul <paul.r...@sony.com> wrote:
> When building *your* toolchain...

I know... :)


> If we should change our minds later we can opt-in to anything else we
> want (libcxx etc, lld? lldb? who knows) but in the meantime they are
> unnecessary baggage for my purposes.

I really see no way of doing this without bikeshedding, other than do
what Mehdi suggested and put all non-dying projects.

Cloning the first repo could be bad, especially for some of our
boards, so I won't propose it myself. Setting up NFS is rarely an
option (support, stability), so we will suffer for including lld,
lldb, etc. I'd prefer to see them out.

Since I'm not the one trying to reach consensus in this thread, I'll
just state what would be best for me and let Justin collect the
opinions. :)

I'm honestly fine with whatever decision, as we can usually work
around the problems, and it's probably cheaper than to bikeshed to
death.

cheers,
--renato

Justin Bogner via llvm-dev

Jul 21, 2016, 5:57:08 PM
to Mehdi Amini via llvm-dev, Jonathan Roelofs
Mehdi Amini via llvm-dev <llvm...@lists.llvm.org> writes:
>>> What do people think of having one (or a set of) merge commit(s)
>>> merging in the non-llvm projects that will be part of the new
>>> monorepo? That's the only technique I can think of that will preserve
>>> history for downstream users by construction.
>>
>> I have no idea what you mean here?
>
> I think I understand what you mean:
>
> 1) checkout the existing clang repo
> 2) move everything in a subdirectory “clang”
> 3) commit the move
> 4) merge this into the new “llvm-project”.
> 5) repeat for every single project
>
> That should preserve the hashes and avoid users having to “extract”
> the subproject to merge into their own branch.
> Annoyingly, it breaks git log path/to/file though.

Use `git log --follow path/to/file`. It's better ;)
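A small sketch of the difference, assuming a POSIX shell and git on the PATH, with hypothetical file names. A path-limited `git log` stops at the commit that moved the file into `clang/`, while `--follow` continues into the pre-move history.

```sh
#!/bin/sh
# Sketch: path-limited log versus --follow across a move into clang/.
set -e
export GIT_AUTHOR_NAME=dev GIT_AUTHOR_EMAIL=dev@example.com
export GIT_COMMITTER_NAME=dev GIT_COMMITTER_EMAIL=dev@example.com
WORK=$(mktemp -d)
trap 'rm -rf "$WORK"' EXIT
cd "$WORK"

git init -q repo
cd repo
echo v1 > Sema.cpp
git add Sema.cpp
git commit -q -m 'add Sema.cpp'
mkdir clang
git mv Sema.cpp clang/Sema.cpp
git commit -q -m 'move into clang/'

PLAIN=$(git log --format=%s -- clang/Sema.cpp)
FOLLOWED=$(git log --format=%s --follow -- clang/Sema.cpp)
printf 'without --follow:\n%s\nwith --follow:\n%s\n' "$PLAIN" "$FOLLOWED"
```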

Mehdi Amini via llvm-dev

Jul 21, 2016, 6:03:30 PM
to Justin Bogner, Mehdi Amini via llvm-dev, Jonathan Roelofs

> On Jul 21, 2016, at 2:56 PM, Justin Bogner <ma...@justinbogner.com> wrote:
>
> Mehdi Amini via llvm-dev <llvm...@lists.llvm.org> writes:
>>>> What do people think of having one (or a set of) merge commit(s)
>>>> merging in the non-llvm projects that will be part of the new
>>>> monorepo? That's the only technique I can think of that will preserve
>>>> history for downstream users by construction.
>>>
>>> I have no idea what you mean here?
>>
>> I think I understand what you mean:
>>
>> 1) checkout the existing clang repo
>> 2) move everything in a subdirectory “clang”
>> 3) commit the move
>> 4) merge this into the new “llvm-project”.
>> 5) repeat for every single project
>>
>> That should preserve the hashes and avoid users having to “extract”
>> the subproject to merge into their own branch.
>> Annoyingly, it breaks git log path/to/file though.
>
> Use `git log --follow path/to/file`. It's better ;)

I know, it works most of the time for log, but how do you blame it at a revision older than the move?


Mehdi

Mehdi Amini via llvm-dev

Jul 21, 2016, 6:16:11 PM
to Robinson, Paul, llvm...@lists.llvm.org

> On Jul 21, 2016, at 2:33 PM, Robinson, Paul via llvm-dev <llvm...@lists.llvm.org> wrote:
>
>> On 21 July 2016 at 18:12, Justin Lebar <jle...@google.com> wrote:
>>> llvm, clang, clang-tools-extra, lld, polly, lldb, llgo, compiler-rt,
>>> openmp, and parallel-libs.
>>
>> I really, *really* would like to see libc++ / abi / unwind. :)
>>
>> My reason is that, when building toolchains, the C++ ABI and unwinding
>> are fundamental parts of the run-time library, of which RT is only
>> part.
>
> When building *your* toolchain...
>
> My toolchain uses clang but not libc++/abi/unwind, we have our own, and
> we don't currently include them in our tree. We do include compiler-rt.
>
> If we should change our minds later we can opt-in to anything else we
> want (libcxx etc, lld? lldb? who knows) but in the meantime they are
> unnecessary baggage for my purposes.

As a developer, you can checkout part of the repo with sparse-checkout.
As a downstream integrator, you can filter out the repo history as you want before merging into your repo.
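The sparse-checkout route can be sketched with git's long-standing `core.sparseCheckout` setting and `git read-tree -mu HEAD` (newer git releases also offer a `git sparse-checkout` command). This is a minimal sketch against a throwaway repository: the directory names mirror subprojects from this thread, and the chosen patterns are only an example of a "no lld/lldb" setup. The last lines show one way to widen the configuration later.

```shell
#!/bin/sh
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

# A stand-in for a unified repository holding several subprojects.
git init -q "$work/mono"
cd "$work/mono"
for p in llvm clang compiler-rt lld lldb; do
  mkdir "$p"
  echo "$p" > "$p/README"
done
git add .
git commit -q -m "Unified tree"

# A developer clone that only wants llvm, clang and compiler-rt.
git clone -q "$work/mono" "$work/checkout"
cd "$work/checkout"
git config core.sparseCheckout true
mkdir -p .git/info
printf '%s\n' /llvm/ /clang/ /compiler-rt/ > .git/info/sparse-checkout
# Remove everything not matched above from the working tree.
git read-tree -mu HEAD
ls

# Widening later: add a pattern and re-run the same read-tree command.
echo /lld/ >> .git/info/sparse-checkout
git read-tree -mu HEAD
ls
```

The full history is still cloned; only the working tree is filtered, so this addresses checkout size and build scope rather than download size.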


Mehdi

Mehdi Amini via llvm-dev

Jul 21, 2016, 6:27:29 PM
to Mehdi Amini, Justin Lebar via llvm-dev, Jonathan Roelofs
On Jul 21, 2016, at 2:32 PM, Mehdi Amini via llvm-dev <llvm...@lists.llvm.org> wrote:


On Jul 21, 2016, at 2:29 PM, Mehdi Amini <mehdi...@apple.com> wrote:


On Jul 21, 2016, at 11:03 AM, Sanjoy Das via llvm-dev <llvm...@lists.llvm.org> wrote:

FWIW, like David Chisnall, we (Azul) have a problem with rewriting
history.  
Our LLVM fork has O(100) changes diverging from upstream
(though our branching structure is simple), and keeping all of that
history is important.

Nobody downstream has to adopt the new structure: I believe it is possible to extract only the “llvm” commits from the new repo and rebase them on top of the existing llvm repo.
This can be done on the fly by your CI, but it is also a deterministic process, i.e. you can restart from scratch anytime (assuming you have the original llvm.git repo and the new one).

To clarify, I believe we* can maintain this read-only view of the individual repos (like https://github.com/llvm-mirror/llvm and its siblings) with their current history (hashes…), adding on top the new commits from the new repo.

*: we could be the LLVM foundation or some volunteer or each downstream user/org.

— 
Mehdi

Robinson, Paul via llvm-dev

Jul 21, 2016, 7:39:51 PM
to mehdi...@apple.com, llvm...@lists.llvm.org


> -----Original Message-----
> From: mehdi...@apple.com [mailto:mehdi...@apple.com]
> Sent: Thursday, July 21, 2016 3:16 PM
> To: Robinson, Paul
> Cc: Renato Golin; Justin Lebar; llvm...@lists.llvm.org
> Subject: Re: [llvm-dev] [RFC] One or many git repositories?
>
>
> > On Jul 21, 2016, at 2:33 PM, Robinson, Paul via llvm-dev <llvm-d...@lists.llvm.org> wrote:
> >
> >> On 21 July 2016 at 18:12, Justin Lebar <jle...@google.com> wrote:
> >>> llvm, clang, clang-tools-extra, lld, polly, lldb, llgo, compiler-rt,
> >>> openmp, and parallel-libs.
> >>
> >> I really, *really* would like to see libc++ / abi / unwind. :)
> >>
> >> My reason is that, when building toolchains, the C++ ABI and unwinding
> >> are fundamental parts of the run-time library, of which RT is only
> >> part.
> >
> > When building *your* toolchain...
> >
> > My toolchain uses clang but not libc++/abi/unwind, we have our own, and
> > we don't currently include them in our tree. We do include compiler-rt.
> >
> > If we should change our minds later we can opt-in to anything else we
> > want (libcxx etc, lld? lldb? who knows) but in the meantime they are
> > unnecessary baggage for my purposes.
>
> As a developer, you can checkout part of the repo with sparse-checkout.

I'm not clear why imposing this cost on everybody who wants less-than-all
(which I'd think would be most people) is superior to the submodule thing
which can be maintained centrally by people who actually understand how to
do it.

> As a downstream integrator, you can filter out the repo history as you
> want before merging into your repo.

Hmmm maybe, maybe not. It sounds like the claim is: you can do a sparse
checkout of upstream, then merge it to a different branch, and get only
the history of the stuff that was sparsely checked out. Does this work
with subtree merges? Our branches are not rooted at the 'llvm' directory,
and I am suspicious about what the sparse checkout config would do to the
local branch. (I know, I should do the experiment myself, but right now
I'm in the middle of a release-prep circus and really shouldn't be
spending the time to write this email:-).)

If all of this magic *does* work, then mainly it's a matter of scripting
the sparse-checkout config and deploying that internally. Not free, but
maybe not horrible either.
--paulr

Mehdi Amini via llvm-dev

Jul 21, 2016, 7:51:43 PM
to Robinson, Paul, llvm...@lists.llvm.org
Can you please clarify your use of “cost” (bandwidth, disk space, an extra command to type initially?)? Otherwise it is hard for me to address your concerns (for instance, I’m actually sensitive to the one you spelled out clearly in another email about a commit in lld requiring a rebase in llvm).

> is superior to the submodule thing
> which can be maintained centrally by people who actually understand how to
> do it.

While I see some good, principled ways to have a submodule umbrella repo in git, I don’t see any *without server-side hooks* that is free of flaws*. Unfortunately this is not addressed by Renato’s proposal: GitHub does not allow server-side hooks, and any other git hosting service is ruled out of discussion in Renato’s proposal.

* we may consider the flaws acceptable, but they need to be understood and accepted, and I don’t think it has been spelled out clearly in Renato’s proposal.

>> As a downstream integrator, you can filter out the repo history as you
>> want before merging into your repo.
>
> Hmmm maybe, maybe not. It sounds like the claim is: you can do a sparse
> checkout of upstream, then merge it to a different branch, and get only
> the history of the stuff that was sparsely checked out.

No, that’s not the claim (sparse checkouts are totally unrelated to this part of my claim).

The claim is to keep the existing history (i.e. no hash changes) that is currently at http://llvm.org/git/llvm.git and continue to accumulate there any new commit that touches the llvm subdirectory of the unified repo.
This would be a read-only view of course, but just like it is now.

That is, if you’re using the existing git repo, we can keep maintaining your workflow *as-is* forever. It means *no* migration would be forced on any CI/integration system (other than those relying on SVN).
(We’d need some creativity around the git-svn-id in the commit messages for the new commits though).
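One way to maintain such a read-only projection is to turn each new unified-repo commit that touches `llvm/` into a patch whose paths are relative to `llvm/`, then apply it on top of the existing llvm.git history, whose hashes therefore stay untouched. This is a minimal sketch using `git format-patch --relative` and `git am`, not the script discussed in this thread. The repositories, file names, and the `base` marker (the first unified commit) are hypothetical, and the replayed commits get new hashes of their own.

```shell
#!/bin/sh
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

# Existing per-project history whose hashes must not change.
git init -q "$work/llvm"
cd "$work/llvm"
echo 'int f();' > API.h
git add API.h
git commit -q -m "Existing llvm history"
old_head=$(git rev-parse HEAD)

# A unified repository seeded with the same llvm tree under llvm/.
git init -q "$work/unified"
cd "$work/unified"
mkdir llvm clang
cp "$work/llvm/API.h" llvm/
echo 'int g();' > clang/Sema.h
git add .
git commit -q -m "Create unified tree"
base=$(git rev-parse HEAD)

# New work: one cross-project commit and one clang-only commit.
echo 'int f(int);' > llvm/API.h
echo 'int g(int);' > clang/Sema.h
git commit -q -am "Change the API of f and its clang user together"
echo '// docs' >> clang/Sema.h
git commit -q -am "Clang-only tweak"

# Keep only commits touching llvm/, with paths made relative to llvm/.
git format-patch --quiet --relative=llvm/ -o "$work/patches" "$base"..HEAD -- llvm

# Replay them on a clone of the existing llvm history.
git clone -q "$work/llvm" "$work/llvm-projected"
cd "$work/llvm-projected"
git am --quiet "$work/patches"/*.patch
git log --format=%s
```

The replayed commit keeps its full message even though only the llvm/ half of the change appears in the projection; a real setup would also have to record which unified commit each projected commit came from.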


— 
Mehdi

Sanjoy Das via llvm-dev

Jul 21, 2016, 8:16:45 PM
to Mehdi Amini, llvm...@lists.llvm.org
Hi Mehdi,

I really like your idea of having a few "projected" git repositories
(i.e. capture all commits that touch llvm/ into llvm.git, all that
touch clang/ to clang.git etc.). I think it should solve our problem
of llvm-forks-with-downstream changes very nicely (I think we won't
have to do anything, as you said). I still want to sleep on it to see
if I can spot any issues.

@David Chisnall and others with local forks: can you spot any
potential issues with Mehdi's plan? Are there cases where it won't
work?

-- Sanjoy

Justin Lebar via llvm-dev

Jul 21, 2016, 8:32:15 PM
to Sanjoy Das, llvm...@lists.llvm.org
FYI jyknight and I have a hacky script that sort of works for
translating commits to the existing monorepo. I'm working on cleaning
it up and applying it to David Chisnall's branches. Hopefully I'll
have something by eod tomorrow. (This isn't to take a position on
using the existing monorepo as our new source of truth, nor to take a
position on any particular directory layout.)

I wanted to try to merge David's llvm and clang branches into a single
branch -- that would be a big usability improvement over the current
situation. But there isn't enough information in the repositories to
recover the correct interleaving. You could try to order by date, but
that only works so long as the history is linear... So I gave up on
that feature.

I also kind of like the idea of these projected repositories, and if
that's sufficient, awesome, save us some work.

Robinson, Paul via llvm-dev

Jul 21, 2016, 9:08:36 PM
to mehdi...@apple.com, llvm...@lists.llvm.org

> Can you please clarify your use of “cost” (bandwidth, disk space, extra command to type initially?),

Developer time, barrier to entry for new contributors. Getting the sparse-checkout business right looks like it is actually non-trivial and not recommended for the git novice. *Changing* the sparse-checkout configuration later appears to be fraught with peril (easy to get wrong).

> The claim is to keep the existing history (I.e. not hash changes) that is currently at http://llvm.org/git/llvm.git and continue to accumulate there any new commit that would touch the llvm subdirectory of the unified repo.
>
> This would be a read-only view of course, but just like it is now.

Hmmm so there's still a per-old-project view?  Missed that aspect, sorry…  it would let us preserve our processes in terms of integrating the flow from upstream, although being able to get a correctly linearized flow of commits from the unified repo would be preferable and we would *want* to change over.  Still not clear how to make that work with a sparse checkout.

--paulr

Justin Lebar via llvm-dev

Jul 21, 2016, 9:15:13 PM
to Robinson, Paul, llvm...@lists.llvm.org
> Developer time, barrier to entry for new contributors. Getting the sparse-checkout business right looks like it is actually non-trivial and not recommended for the git novice.

It's eminently copy-pastable, and there is no possibility of data loss.

I understand it's not zero cost, but I have trouble seeing how there's
a meaningful comparison between

- the cost of three copy-pastable commands run once, versus
- the benefit of simplifying the git commands we all run tens or
hundreds of times a day.

> *Changing* the sparse-checkout configuration later appears to be fraught with peril (easy to get wrong).

If you get it wrong, you don't have the right files in your checkout,
and you get a build error about a missing file...

Here too, I get that there's a nonzero possibility that one could
screw this up and get themselves into trouble, but when I actually do
the cost/benefit analysis, it is very hard for me to see how the costs
are anywhere near the same magnitude as the benefits.

Robinson, Paul via llvm-dev

Jul 22, 2016, 1:48:44 AM
to Justin Lebar, llvm...@lists.llvm.org


> -----Original Message-----
> From: Justin Lebar [mailto:jle...@google.com]
> Sent: Thursday, July 21, 2016 6:15 PM
> To: Robinson, Paul
> Cc: mehdi...@apple.com; Renato Golin; llvm...@lists.llvm.org
> Subject: Re: [llvm-dev] [RFC] One or many git repositories?
>
> > Developer time, barrier to entry for new contributors. Getting the
> sparse-checkout business right looks like it is actually non-trivial and
> not recommended for the git novice.
>
> It's eminently copy-pastable, and there is no possibility of data loss.
>
> I understand it's not zero cost, but I have trouble seeing how there's
> a meaningful comparison between
>
> - the cost of three copy-pastable commands run once, versus

once per clone (picky, picky, picky...) but extra steps are always
the ones you forget to do. Scriptable, so maybe not a big deal.

> - the benefit of simplifying the git commands we all run tens or
> hundreds of times a day.

Personally I already have a script to deal with updating the entire
tree; adapting to submodules would be a one-time-ever cost and I
never think about it again (and never have to retrain my fingers).

I'll acknowledge that people have different workflows, and there are
advantages to the unified repo beyond what 'checkout' costs. The
size cost of the extra sources is relatively small. So to get those
benefits without the unnecessary complexity of sparse checkouts,
I would like it setup so I *don't have to build* all the extra pieces
even if they exist in the source tree. Build time is iteration time
is lost time when building pieces I don't need or care about. Ditto
the time taken to run the tests of all those pieces I don't care about.
This should be a configuration-time thing (which again I have scripted
and therefore don't have to retrain my fingers). If the cmake run
can do that for me, I have no problem with a unified repo that holds
the entire LLVM universe in it.
--paulr

Justin Lebar via llvm-dev

Jul 22, 2016, 1:59:04 AM
to Robinson, Paul, llvm...@lists.llvm.org
> So to get those benefits without the unnecessary complexity of sparse checkouts, I would like it setup so I *don't have to build* all the extra pieces even if they exist in the source tree. [...] If the cmake run can do that for me, I have no problem with a unified repo that holds the entire LLVM universe in it.

This is absolutely on the table as far as I'm concerned. In a world
with separate repos it might make sense to use the presence or absence
of particular source files to trigger building (or not) a particular
project, but that makes little sense with a monolithic repository.

(I mean, it doesn't personally affect me because I never type plain
"ninja" -- I always do "ninja check-clang" or whatever. But that's
just *my* messed up workflow. :)

-Justin

Mehdi Amini via llvm-dev

Jul 22, 2016, 2:14:45 AM
to Mehdi Amini, Justin Lebar via llvm-dev, Jonathan Roelofs
On Jul 21, 2016, at 2:32 PM, Mehdi Amini via llvm-dev <llvm...@lists.llvm.org> wrote:


On Jul 21, 2016, at 2:29 PM, Mehdi Amini <mehdi...@apple.com> wrote:


On Jul 21, 2016, at 11:03 AM, Sanjoy Das via llvm-dev <llvm...@lists.llvm.org> wrote:

FWIW, like David Chisnall, we (Azul) have a problem with rewriting
history.  
Our LLVM fork has O(100) changes diverging from upstream
(though our branching structure is simple), and keeping all of that
history is important.

Nobody downstream has to adopt the new structure: I believe it is possible to extract only the “llvm” commits from the new repo and rebase them on top of the existing llvm repo.
This can be done on the fly by your CI, but it is also a deterministic process, i.e. you can restart from scratch anytime (assuming you have the original llvm.git repo and the new one).


What do people think of having one (or a set of) merge commit(s)
merging in the non-llvm projects that will be part of the new
monorepo?  That's the only technique I can think of that will preserve
history for downstream users by construction.

I have no idea what you mean here?

I think I understand what you mean:

1) checkout the existing clang repo
2) move everything in a subdirectory “clang”
3) commit the move
4) merge this into the new “llvm-project”.
5) repeat for every single project

That should preserve the hashes and avoid users having to “extract” the subproject to merge into their own branch.
Annoyingly, it breaks git log path/to/file though.

Just tried to set it up there: https://github.com/joker-eph/llvm-unified
(git log --follow is working fine with this setup).

While it preserves the history fine (i.e. the hashes are identical to the current git), it has a drawback: there is no longer a common ancestor for the parents of the merge (this may or may not be an issue in practice, not sure yet, but it is uncommon for git).

Now I need to write the script to regenerate the independent “read-only” repos.

— 

David Chisnall via llvm-dev

Jul 22, 2016, 3:31:42 AM
to Mehdi Amini, Justin Lebar via llvm-dev, Jonathan Roelofs
On 22 Jul 2016, at 07:14, Mehdi Amini via llvm-dev <llvm...@lists.llvm.org> wrote:
>
> Just tried to set it up there: https://github.com/joker-eph/llvm-unified
> (git log --follow is working fine with this setup).
>
> While it preserves the history fine (i.e. the hashes are identical to the current git), it has a drawback: there is no longer a common ancestor for the parents of the merge (this may or may not be an issue in practice, not sure yet, but it is uncommon for git).

Perhaps I’m missing something, but on GitHub I can’t see any of the history. Assuming that this is just a GitHub UI issue, can you explain how a downstream user who has clones of the llvm and clang repos that were forked at some point in the past, have been periodically merged, and contain a load of commits on top of upstream would migrate to using this?

David

Chandler Carruth via llvm-dev

Jul 22, 2016, 3:51:33 AM
to James Y Knight, Renato Golin, Justin Lebar via llvm-dev
I wanted to present some of the particular reasons why I'm pretty strongly opposed to a purely flat layout of projects the way the current github 'llvm-project' repository looks, as that hasn't happened on the list yet. I'm replying to myself as I don't see a much better place to hang that conversation.

On Wed, Jul 20, 2016 at 7:38 PM Chandler Carruth <chan...@google.com> wrote:
On Wed, Jul 20, 2016 at 7:08 PM James Y Knight via llvm-dev <llvm...@lists.llvm.org> wrote:
Should the layout in the merged repository be:
1) Like the "llvm-project" git repository is now:

<root>/llvm/
<root>/clang/
<root>/compiler-rt
...

2) Like the "ideal merged checkout" is now:
llvm/
llvm/tools/clang
llvm/projects/compiler-rt
...


I don't much care which of those is chosen. I have a slight preference for #1, for ease of doing things like grep/log/etc on llvm by itself, excluding all the other projects. But either way seems probably fine, and an improvement over multiple repositories.

FWIW, I strongly prefer #2, but I think the high order bit is the repository question.

So, a reasonable question might be, why do I prefer #2?
I have a lot of not terribly connected reasons.

First, I want to consider what happens if we go with #1. So far, LLVM subprojects have been formed essentially any time it was conducive to do so. This worked around the Subversion sparse checkout challenges (arguably also solved by newer Subversion features, but that's neither here nor there) and didn't cause any problems because we could lay out the tree any way that made sense and we always had a global revision number. A classic example: clang-tools-extra. At the time it was added, it was perceived as very useful to segregate. These days, I'm not sure the risk is interesting any more, and the cost is probably higher than the benefit. But it probably doesn't make sense to have a "cfe" directory and a "clang-tools-extra" directory as peers. If we're moving to a monolithic repo, the clang-tools-extra stuff should almost *certainly* move under the 'tools' directory in the clang repo, wherever that ends up.

So, if we go with #1 above and just use the existing Subversion repos as the top-level directories, how would we rationally decide in the future whether a new directory X "should be a top-level directory, or just fit into the existing hierarchy"? I don't think we will ever have a good and principled response. We will constantly have oddball warts where things happen to be top-level because at one point we wanted the ability to not check out those Subversion repos, and now that has been enshrined.


I'm not actually arguing that #2 is a *good* layout. But I think it is a (slightly) less arbitrary layout than #1. And by breaking this weird mold of "all Subversion projects are top level", I think we'll be in a better place to make reasonable and considered decisions about restructuring the layout long-term to reflect a useful and rational layout based on some set of reasonable technical principles.

It also has the advantage of being the layout which, if people's existing scripts and systems are set up around the defaults in the CMake build, will be the simplest to migrate to. I certainly know that all of my habits and patterns are geared around this layout and it will be dramatically easier for me to migrate to a single repo if it preserves this layout.


Long term, I want to see us use a layout that reasonably connotes the logical and practical structure of the code and project as a whole. I also long-term want to see the layout effectively address the pragmatic needs of tools and systems developers rely on such as "git log". On the whole, I think #2 is (slightly) closer to that than #1 so I strongly prefer it, but it clearly isn't perfect here. I just think we can incrementally fix and improve the layout over time. I don't think we're stuck in a single layout forever.

Hope this helps motivate why I would very much prefer to retain the default layout suggested in our docs and build system for now, and phrase any re-organization as follow-on changes once we have a single repo that makes such changes straightforward and easily history-preserving.

-Chandler

Manuel Klimek via llvm-dev

Jul 22, 2016, 4:15:50 AM
to Chandler Carruth, James Y Knight, Renato Golin, Justin Lebar via llvm-dev
Ok, throwing in my opinion, so we do not under-represent folks working far down the chain (I mainly have branches on clang-tools-extra I juggle with). Also with the disclaimer that a move to any git model will be super welcome from my side, and huge props & thanks to folks who have put so much time and effort into that!

If you work on clang-tools-extra, a single repo would be helpful. Version skew is annoying, and I have already tried (and, I think, so far failed) to understand git submodules. Working on Phabricator, which uses git submodules extensively, I've run into a problem multiple times where I was in a state that had the wrong git submodule version, and I was basically lost on how to fix that aside from completely re-initializing the submodules, which only works if you are at least backwards compatible (that is, the code works with a much newer submodule package version).

Regarding the specific point of being modular: without a stable code interface, we're not really cross-version-modular anyway. And if we want to encourage modularity within the libraries, there are other things we can do than splitting up the repos: for example, use the clang module maps to ensure #includes match dependencies, etc.

I do agree that it would be great to have a good solution to rebase forks onto the new repo, though.

Cheers,
/Manuel


Simon Taylor via llvm-dev

Jul 22, 2016, 4:16:29 AM
to Sanjoy Das, llvm...@lists.llvm.org
Hi all,

I’ll start by saying I’ve skimmed this thread and am not actually a user of LLVM at all, but had some git thoughts that might be worth contributing.

> On 22 Jul 2016, at 01:16, Sanjoy Das via llvm-dev <llvm...@lists.llvm.org> wrote:
>
> @David Chisnall and others with local forks: can you spot any
> potential issues with Mehdi's plan? Are there cases where it won't
> work?

One potential “issue” is that a single commit into the monolithic repository would potentially touch multiple subprojects (that’s one of the advantages). Projecting that into individual repositories would only commit changes to those files, but the commit message would be maintained and might therefore be confusing in the context of the individual repository, especially if only a small part of the commit affects that individual sub-repo.

Essentially if the projects are “supposed” to be separate modules, then submodules is the solution to enforce that independence, ensuring commits in each module only affect that module and have appropriate commit messages for that context.

If the submodules are in practice more intertwined than that, then it does feel like an ideologically pure solution that in the end just gets in the way of developer productivity.

I’ve got a setup here that uses a hierarchy of submodules, so there is a “combined” submodule that just ensures that its children (other submodules) are at mutually compatible versions. That helped productivity (multiple consumers of the “combined” submodule don’t need to manually track versions of all the children), but this discussion is pushing me towards the thought that actually a monorepo would be a more productive solution anyway, and make more sense for cross-cutting changes.

And sorry to throw another option into the ring; and one that might already have been discussed and discounted, but thought it worth sharing.

1) Create a new llvm-project-mono repo
2) Use git subtree instead of git submodule to add all the directories to match the layout of llvm-project.
3) From now on, all commits go to the monorepo
4) monorepo commits can be projected to the individual project repos, and additionally a new commit on llvm-project can be made with the submodule version updates

Advantages:

- No change for existing downstream users unless they want to move to the mono view
- Easier developer experience for cross-cutting changes
- Git log by path would work identically on either view of the repository
- Hashes from before the creation of the mono repo would match in both views - the mono repo will have multiple roots but that’s not unusual with git subtree

Disadvantages:

- Step 4 from my list would need a script to keep things updated. A server-side hook would be best. The mapping is deterministic (every mono repo commit will map to one commit in any affected submodules and one “submodule update” commit in the umbrella llvm-project repo), so if the server responsible falls over the updates might be delayed but can be caught up without losing anything
- Less ideologically pure in terms of trying to keep the modules independent
- Commit hashes will diverge between the two views from the creation of the mono repo, making comparisons / merges between clones of the different views more difficult
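Step 2 of this plan uses `git subtree`, a contrib script that not every git installation ships. The same kind of history-preserving import can be sketched with the subtree-merge recipe from git's own documentation (`git merge -s ours` followed by `git read-tree --prefix`). This is a minimal sketch, assuming git 2.9 or later; the repository names and the file are hypothetical, and the projection back to per-project repositories (step 4) is not shown.

```shell
#!/bin/sh
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

# Stand-in for an existing subproject repository.
git init -q "$work/clang"
cd "$work/clang"
echo 'int main() { return 0; }' > Driver.cpp
git add Driver.cpp
git commit -q -m "Existing clang history"
clang_head=$(git rev-parse HEAD)

# Step 1: the new monorepo.
git init -q "$work/llvm-project-mono"
cd "$work/llvm-project-mono"
echo 'monorepo' > README
git add README
git commit -q -m "Create monorepo"

# Step 2: bring clang's history in under clang/ without rewriting it.
git fetch -q "$work/clang" HEAD
# Record a merge with clang's tip but keep our tree for now...
git merge -q -s ours --no-commit --allow-unrelated-histories FETCH_HEAD
# ...then place clang's tree under clang/ in the index and work tree.
git read-tree --prefix=clang/ -u FETCH_HEAD
git commit -q -m "Import clang under clang/"

git log --oneline --graph
```

Unlike the move-then-merge approach, the subproject repository gets no extra "move" commit; the merge's second parent is the unmodified clang tip.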


Simon

Sean Silva via llvm-dev

Jul 22, 2016, 5:03:48 AM
to Simon Taylor, llvm...@lists.llvm.org
On Fri, Jul 22, 2016 at 1:16 AM, Simon Taylor via llvm-dev <llvm...@lists.llvm.org> wrote:
Hi all,

I’ll start by saying I’ve skimmed this thread and am not actually a user of LLVM at all, but had some git thoughts that might be worth contributing.

> On 22 Jul 2016, at 01:16, Sanjoy Das via llvm-dev <llvm...@lists.llvm.org> wrote:
>
> @David Chisnall and others with local forks: can you spot any
> potential issues with Mehdi's plan?  Are there cases where it won't
> work?

One potential “issue” is that a single commit into the monolithic repository would potentially touch multiple subprojects (that’s one of the advantages). Projecting that into individual repositories would only commit changes to those files, but the commit message would be maintained and might therefore be confusing in the context of the individual repository, especially if only a small part of the commit affects that individual sub-repo.

What do we do now? We already have the ability to do this. See the thread "[LLVMdev] [Git-fu] How to commit inter-repositories atomically to svn"

-- Sean Silva

Renato Golin via llvm-dev

Jul 22, 2016, 5:49:06 AM
to Mehdi Amini, llvm...@lists.llvm.org
On 22 July 2016 at 00:51, Mehdi Amini <mehdi...@apple.com> wrote:
> While I see some good principled way to have a submodule umbrella repo in
> git, I don’t see any *without server-side hooks* that does not have any
> flaw*. Unfortunately this is not addressed by Renato’s proposal, and github
> does not allow server-side hooks, and another git hosting service is spelled
> out-of-discussion for Renato’s proposal.

*Please*, stop calling it *my* proposal. It absolutely wasn't.

I'll repeat, as people seem to prefer repeated arguments to
references to past emails:

* I have sent a number of concerns and options
* People have favoured GitHub with sub-modules (I hadn't)
* I summarised the first proposal, which seemed to be reaching consensus

Let's call it "First Proposal" or "GitHubSubMod" proposal:

http://llvm.org/docs/Proposals/GitHubSubMod.html

Everything "out-of-discussion" on the first proposal was in the
interest of reaching a self-contained proposal, and had absolutely no
ulterior motive.

Now the proposal is there, best we could make it. If there are
technical flaws, by all means, send a review to that document, but you
can't change that proposal into something else.

You can, however, create a new one, and that's what you're doing.

As people said earlier, getting to know one proposal well has shown
many people that the "consensus" might not have been the best way
forward, but that was only possible by actually finalising at least
one proposal.

My assumption was that a survey would take us to the next step
(finding the precise and impersonal problems with that proposal), but
it seems I didn't need that. I stand corrected.

One thing your proposal doesn't even touch is where the repo will be.
I know it's basically orthogonal, but it's one of the key reasons why
we need to move. I have no preference, as long as the solution is
maintainable and caters to our needs.

My personal opinion is to host somewhere professional unless there's a
good reason not to.

If we use external hosting, GitHub is the best because there are
already thousands of forks there (see Chisnall's email), and
people do come to the list thinking the GitHub repo is our official
one.

If we don't, we'll have to understand the costs and who's going to
maintain it (volunteer vs. hired help). Relying on volunteers (like
myself) is extremely risky and I'd very much rather not go that way.
Relying on any company can create bias (or the impression of bias),
which can divide the community.

Again, I'm not pushing *any* agenda, just laying out the issues. But
if you want to compete with the first proposal, you *have* to have a
complete proposal, with all the pros and cons clearly laid out.

cheers,
--renato

PS: We may need a grid of proposals ({external, local} x {submod,
monolithic})...

Daniel Sanders via llvm-dev

Jul 22, 2016, 6:16:50 AM
to Pete Cooper, Renato Golin, LLVM Developers

> Anyway, not trying to derail the discussion, just express that there are likely many others like me out there who are silent not because we don’t have an opinion, but because we just want git and don’t want to have an excessive number of +1’s on a thread saying so.

 

+1 :-)

 

There's another reason I've been staying quiet too which is that past experience with VCS migrations has taught me that people tend to over-value some things and that discussion tends not to convince people in advance of direct experience. I think some of these topics will end up being moot once we've moved to git and gotten used to it. For example, I've seen talk of wanting to preserve linear history which is understandable since it's quite nice to have. However, I suspect we'll drop that after a month or so as people find 'git push' doesn't work very well on a high traffic repo and start looking for alternatives. At that point I think we'll end up switching to pull requests and accepting non-linear history. Similarly, I think the desire for incremental revision numbers will gradually fade as people get used to git.

From: llvm-dev [mailto:llvm-dev...@lists.llvm.org] On Behalf Of Pete Cooper via llvm-dev
Sent: 21 July 2016 17:46
To: Renato Golin
Cc: LLVM Developers
Subject: Re: [llvm-dev] [RFC] One or many git repositories?

Thanks for driving this Renato.  It's going to be a huge benefit to everyone once we have a solution in place.

On Jul 20, 2016, at 11:03 PM, Renato Golin via llvm-dev <llvm...@lists.llvm.org> wrote:

> When everyone is happy that we have enough proposals, Tanya's survey
> should be brought forward, in which case I'll gladly offer my help
> again.

Regarding the survey specifically, and since I didn’t see a thread discussing survey options, I’d love to have a ‘I don’t mind what the solution is, I just want git’ option.  Basically, ‘any of the above’.

For me, I’m very happy with the proposals being discussed, but mostly just want to move to a more reliable hosting service (full disclosure, I’m a fan of GitHub), and I use git-svn anyway so native git would be best for me.

Anyway, not trying to derail the discussion, just express that there are likely many others like me out there who are silent not because we don’t have an opinion, but because we just want git and don’t want to have an excessive number of +1’s on a thread saying so.

Cheers,

Pete

Renato Golin via llvm-dev

Jul 22, 2016, 6:40:29 AM
to Daniel Sanders, LLVM Developers
On 22 July 2016 at 11:16, Daniel Sanders <Daniel....@imgtec.com> wrote:
> There's another reason I've been staying quiet too which is that past
> experience with VCS migrations has taught me that people tend to over-value
> some things and that discussion tends not to convince people in advance of
> direct experience. I think some of these topics will end up being moot once
> we've moved to git and gotten used to it. For example, I've seen talk of
> wanting to preserve linear history which is understandable since it's quite
> nice to have. However, I suspect we'll drop that after a month or so as
> people find 'git push' doesn't work very well on a high traffic repo and
> start looking for alternatives. At that point I think we'll end up switching
> to pull requests and accepting non-linear history. Similarly, I think the
> desire for incremental revision numbers will gradually fade as people get
> used to git.

This is valid for a monolithic model, and that is one of the reasons I prefer it.

Today, I personally prefer the Git model (merges, pull requests, fuzzy
history), but I haven't always done so. The more I learnt how to use
Git, the more I realised how valuable the "confusing model" is for
distributed development.

Trying to force Git into an SVN model for the long term feels like
creating a niche that will be hard to work with (no hard evidence,
pinch of salt and all that).

I don't maintain a downstream fork, so I can't speak for that niche.
But forks in GitHub (all, not just LLVM's) seem to be fine merging
their patches over the original repository.

What this feels like to me is that we were too complacent with the old
model and were slowly creeping Git support into an SVN world, and now we
have realised how unusual our "requirements" are.

Maybe you're right. Maybe moving to yet another model that satisfies
those requirements would be a step back, because we'd be setting in
stone a rule that was accommodated, not designed.

Maybe we should propose a third model: Use Git like Git. Pull requests and all.

As a quick recap, here's a back-of-the-envelope list of the things that
could go wrong...

Changes that are the same as in the linear "monolithic core with external projects" proposal:

* the repositories themselves will have to adapt
* the build system (CMake and all)
* how the non-core repositories interact (relates to build system, bisect)
* all public forks (GitHub and others)
* all downstream forks (much of the current active LLVM development is affected)

New problems will be created:

* public and downstream forks that *rely* on linear history
* validation (buildbots will have to be re-written, or we'd have to
move to Jenkins, pull-request testing, etc)
* bisection (all our current tools will have to understand Git)
* library dependencies will be hard to bisect, because they won't be
in the same repository with the same history. This happens today in
GNU-land with binutils, glibc, etc.

All in all, not *that* different from the linear monolithic proposal,
and in my opinion, a future-facing design, not a past-driven
conformance.

cheers,
--renato

Daniel Sanders via llvm-dev

Jul 22, 2016, 8:40:25 AM
to Renato Golin, LLVM Developers
> -----Original Message-----
> From: Renato Golin [mailto:renato...@linaro.org]
> Sent: 22 July 2016 11:40
> To: Daniel Sanders
> Cc: Pete Cooper; LLVM Developers
> Subject: Re: [llvm-dev] [RFC] One or many git repositories?
>
> On 22 July 2016 at 11:16, Daniel Sanders <Daniel....@imgtec.com>
> wrote:
> > There's another reason I've been staying quiet too which is that past
> > experience with VCS migrations has taught me that people tend to over-value
> > some things and that discussion tends not to convince people in advance of
> > direct experience. I think some of these topics will end up being moot once
> > we've moved to git and gotten used to it. For example, I've seen talk of
> > wanting to preserve linear history which is understandable since it's quite
> > nice to have. However, I suspect we'll drop that after a month or so as
> > people find 'git push' doesn't work very well on a high traffic repo and
> > start looking for alternatives. At that point I think we'll end up switching
> > to pull requests and accepting non-linear history. Similarly, I think the
> > desire for incremental revision numbers will gradually fade as people get
> > used to git.
>
> This is valid for a monolithic model, and that is one of the reasons I prefer it.
>
> Today, I personally prefer the Git model (merges, pull requests, fuzzy
> history), but I haven't always done so. The more I learnt how to use
> Git, the more I realised how valuable the "confusing model" is for
> distributed development.

Same here, accepting the git model was a gradual thing for me. One particular
milestone was realizing that I didn't really need any logic in the commit IDs
because I was only copy/pasting them or using things like 'my-branch^^'.

> Trying to force Git into an SVN model for the long term feels like
> creating a niche that will be hard to work with (no hard evidence,
> pinch of salt and all that).
>
> I don't maintain a downstream fork, so I can't speak for that niche.
> But forks in GitHub (all, not just LLVM's) seem to be fine merging
> their patches over the original repository.
>
> What this feels like to me is that we were too complacent with the old
> model and were slowly creeping Git support into an SVN world, and now we
> have realised how unusual our "requirements" are.
>
> Maybe you're right. Maybe moving to yet another model that satisfies
> those requirements would be a step back, because we'd be setting in
> stone a rule that was accommodated, not designed.

I don't see it as a step backwards but rather as a way of making people
comfortable with the switch. I think opinions may gradually shift towards
a more conventional git model after the switch but that doesn't necessarily
detract from the value of a more svn-ish model if having one helps people
switch.

> Maybe we should propose a third model: Use Git like Git. Pull requests and
> all.
>
> As a quick recap, here's a back-of-the-envelope list of the things that
> could go wrong...
>
> Changes that are the same as in the linear "monolithic core with external projects" proposal:
>
> * the repositories themselves will have to adapt
> * the build system (CMake and all)
> * how the non-core repositories interact (relates to build system, bisect)
> * all public forks (GitHub and others)
> * all downstream forks (much of the current active LLVM development is affected)
>
> New problems will be created:
>
> * public and downstream forks that *rely* on linear history

Do you have an example in mind? I'd expect them to rely on each 'master' being
an improvement on 'master^'. I wouldn't expect them to be interested in how
'master^' became 'master'.

> * validation (buildbots will have to be re-written, or we'd have to
> move to Jenkins, pull-request testing, etc)

Assuming the goal is to preserve what we have rather than improve it, buildbot
will be fine without any changes (beyond switching the source steps from svn to
git of course) whichever model we pick. It would just check out the latest 'master'
on each build like it currently does for trunk.

Renato Golin via llvm-dev

Jul 22, 2016, 8:56:23 AM
to Daniel Sanders, LLVM Developers
On 22 July 2016 at 13:40, Daniel Sanders <Daniel....@imgtec.com> wrote:
> I don't see it as a step backwards but rather as a way of making people
> comfortable with the switch. I think opinions may gradually shift towards
> a more conventional git model after the switch but that doesn't necessarily
> detract from the value of a more svn-ish model if having one helps people
> switch.

The original idea was to change one thing at a time. SVN to Git, keep
everything else the same.

But that has proven harder than we imagined. So, maybe the best way
forward is not to do one step at a time, but to understand where we
are and what we need and take the "right" (tm) step forwards. Even if
it requires multiple steps, we can combine them into larger, fewer
steps.


>> * public and downstream forks that *rely* on linear history
>
> Do you have an example in mind? I'd expect them to rely on each 'master' being
> an improvement on 'master^'. I wouldn't expect them to be interested in how
> 'master^' became 'master'.

Paul Robinson was outlining some of the issues he had with git
history. I don't know their setup, so I'll let him describe the issues
(or he may have done so already in some thread, but I haven't read it
all).


> Assuming the goal is to preserve what we have rather than improve it, buildbot
> will be fine without any changes (beyond switching the source steps from svn to
> git of course) whichever model we pick. It would just check out the latest 'master'
> on each build like it currently does for trunk.

I meant Zorg and the like. Buildbot itself can handle Git, but we may
have assumptions that the repos are linked and linear in the builders.

But we have been discussing pre-commit testing for a while and it's
clear that Buildbots, in the way they're setup now, are not the
answer.

For the sake of the argument, here is the list of things we found:
* buildbots can have pre-commit testing via patch submission, but
controlling security and load is not trivial if we want people to
actually use it
* buildbots tracking non-master branches have the load problems if we
allow people to create branches, but not the security problems
* having a mirror so that bots track that mirror would solve the
security and load problems, but remove the ability for other people to
use it.

In essence, buildbots are single-purpose and hard to configure (much
of the configuration requires a master restart).

OTOH, Jenkins can have configurable build scripts, with parameters and
customisations, that allow us to pick up pull requests and build
them as they come in.

It also scales independently, per architecture, from the number of
configurations, if you can use something like containers. So, in the
long term, it's cheaper and more robust to maintain.

However, it would be another massive change in how we do things, and
the repository migration is already a big enough change on its own.

Bruce Hoult via llvm-dev

Jul 22, 2016, 9:33:13 AM
to Daniel Sanders, LLVM Developers
On Fri, Jul 22, 2016 at 3:40 PM, Daniel Sanders via llvm-dev <llvm...@lists.llvm.org> wrote:
> Do you have an example in mind? I'd expect them to rely on each 'master' being
> an improvement on 'master^'. I wouldn't expect them to be interested in how
> 'master^' became 'master'.

I would love it if each master commit was an improvement on the previous commit, or at least was virtually guaranteed to be not broken. It's most annoying that the existing LLVM history has a lot of examples of commits being reversed by a later commit.

The ease in git of branching -- and more importantly rebasing the branch on a later state of master -- means that you can run buildbots for all the different platforms on each pull request BEFORE merging it to master.

If buildbots are not fast enough to test every change (let alone repeatedly) then you can keep a pristine "master" head and a "proposed master" head that might have several pull requests added onto it sequentially. Then have the buildbots test the "proposed master" and if it passes then fast-forward advance the "master" head to the current "proposed master" head. Then merge the next batch of pull requests onto "proposed master", rinse and repeat.
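A minimal shell sketch of that batching scheme, assuming each pull request is available as a local branch and that the test run is a single shell command. The helper `land_batch`, the branch name `proposed-master`, and the example invocation below are illustrative assumptions, not existing LLVM tooling:

```shell
# land_batch TEST_CMD PR_BRANCH...
# Hypothetical helper: rebuild "proposed-master" from "master", merge each
# pull-request branch onto it in order, run TEST_CMD, and only if that passes
# fast-forward "master" to the tested "proposed-master".
land_batch() {
  lb_test=$1; shift
  # Start the candidate from the current pristine master.
  git checkout -q -B proposed-master master || return 1
  for lb_pr in "$@"; do
    # A conflicting pull request aborts the whole batch; master is untouched.
    if ! git merge -q --no-edit "$lb_pr" >/dev/null; then
      git merge --abort
      return 1
    fi
  done
  # Run the (assumed) test command on the combined result.
  sh -c "$lb_test" || return 1
  # Tests passed: master moves forward without creating a new commit.
  git checkout -q master && git merge -q --ff-only proposed-master >/dev/null
}
```

For example, `land_batch 'ninja -C build check-llvm' pr-101 pr-102` (hypothetical branch names and build directory) only advances `master` if the combined batch passes. Using `--ff-only` means `master` can only ever move to a state that was actually tested.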

If a "proposed master" fails and it has more than one pull request in it, then you can bisect it to find the bad pull request, throw it out, and try again without it.
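The bisection step can be sketched the same way. This is an illustration under stated assumptions: the helpers `prefix_passes` and `first_bad_pr` are hypothetical, the batch is known to fail as a whole, and breakage is monotonic (once a prefix of the batch fails, every longer prefix fails too), which is the same assumption `git bisect` relies on:

```shell
# prefix_passes TEST_CMD K PR_BRANCH...
# Hypothetical helper: rebuild "candidate" from master plus the first K pull
# requests of the batch and report whether TEST_CMD passes on it.
prefix_passes() {
  pp_test=$1; pp_k=$2; shift 2
  git checkout -q -B candidate master || return 2
  pp_i=0
  for pp_pr in "$@"; do
    [ "$pp_i" -ge "$pp_k" ] && break
    if ! git merge -q --no-edit "$pp_pr" >/dev/null; then
      git merge --abort
      return 1            # a conflicting prefix counts as failing
    fi
    pp_i=$((pp_i + 1))
  done
  sh -c "$pp_test"
}

# first_bad_pr TEST_CMD PR_BRANCH...
# Binary-search over prefixes of a failing batch and print the first pull
# request whose addition makes TEST_CMD fail.
first_bad_pr() {
  fb_test=$1; shift
  fb_lo=1; fb_hi=$#       # invariant: the culprit's index is in [lo, hi]
  while [ "$fb_lo" -lt "$fb_hi" ]; do
    fb_mid=$(( (fb_lo + fb_hi) / 2 ))
    if prefix_passes "$fb_test" "$fb_mid" "$@"; then
      fb_lo=$((fb_mid + 1))   # the first mid PRs are fine
    else
      fb_hi=$fb_mid           # the culprit is among the first mid PRs
    fi
  done
  shift $((fb_lo - 1))
  printf '%s\n' "$1"
}
```

With N pull requests in a failing batch this needs roughly log2(N) test runs rather than N; the culprit can then be dropped and the rest of the batch retried.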

Renato Golin via llvm-dev

Jul 22, 2016, 9:54:43 AM
to Bruce Hoult, LLVM Developers
On 22 July 2016 at 14:33, Bruce Hoult <br...@hoult.org> wrote:
> I would love it if each master commit was an improvement on the previous
> commit, or at least was virtually guaranteed to be not broken. It's most
> annoying that the existing LLVM history has a lot of examples of commits
> being reversed by a later commit.

Historically, we have used buildbots the way we do as a workaround for
the fact that SVN doesn't have pull requests.


> The ease in git of branching -- and more importantly rebasing the branch on
> a later state of master -- means that you can run buildbots for all the
> different platforms on each pull request BEFORE merging it to master.

Indeed, this would be a *great* improvement.


> If buildbots are not fast enough to test every change (let alone repeatedly)
> then you can keep a pristine "master" head and a "proposed master" head that
> might have several pull requests added onto it sequentially. Then have the
> buildbots test the "proposed master" and if it passes then fast-forward
> advance the "master" head to the current "proposed master" head. Then merge
> the next batch of pull requests onto "proposed master", rinse and repeat.

We don't need to turn off the current post-commit bots, though. We
don't even need to use buildbots for pre-commits.

The current bots are good at covering the basics, like a last line of
defence. For pull requests we could have a simplified *additional*
testing step that would catch the majority of the breakages.

That could be Jenkins or something else, that can drive configurable
builds through a large shared pool of resources, which is much more
suitable to pre-commit testing.

These would have to be *only* fast builders (~30min or less) and
should cover different targets. We should aim to have at least one per
supported target.

Robinson, Paul via llvm-dev

Jul 22, 2016, 1:50:31 PM
to Renato Golin, Daniel Sanders, llvm...@lists.llvm.org
> >> * public and downstream forks that *rely* on linear history
> >
> > Do you have an example in mind? I'd expect them to rely on each 'master' being
> > an improvement on 'master^'. I wouldn't expect them to be interested in how
> > 'master^' became 'master'.
>
> Paul Robinson was outlining some of the issues he had with git
> history. I don't know their setup, so I'll let him describe the issues
> (or he may have done so already in some thread, but I haven't read it
> all).

Since you asked...

The key point is that a (basically) linear upstream history makes it
feasible to do bisection on a downstream branch that mixes in a pile
of local changes, because the (basically) linear upstream history can
be merged into the downstream branch commit-by-commit which retains
the crucial linearity property.

We have learned through experience that a bulk merge from upstream is
a Bad Idea(tm). Suppose we have a test that fails; it does not repro
with an upstream compiler; we try to bisect it; we discover that it
started after a bulk merge of 1000 commits from upstream. But we can't
bisect down the second-parent line of history, because that turns back
into a straight upstream compiler and the problem fails to repro.

If instead we had rolled the 1000 commits into our repo individually,
we'd have a linear history mixing upstream with our stuff and we would
be able to bisect naturally. But that relies on the *upstream* history
being basically linear, because we can't pick apart an upstream commit
that is itself a big merge of lots of commits. At least I don't know how.
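One way to do the commit-by-commit roll-in described above, sketched in shell under the assumption that upstream is available as a remote-tracking ref (the helper name and the `upstream` remote are hypothetical): list the new upstream mainline commits oldest first and merge each one separately, so every upstream step gets its own downstream merge commit that bisection can land on.

```shell
# merge_upstream_one_by_one UPSTREAM_REF
# Hypothetical helper: instead of one bulk merge, merge each new commit on the
# upstream first-parent (mainline) history into the current branch separately.
merge_upstream_one_by_one() {
  mu_upstream=$1
  # Oldest first; commits already reachable from HEAD are excluded.
  for mu_c in $(git rev-list --reverse --first-parent "HEAD..$mu_upstream"); do
    if ! git merge -q --no-edit "$mu_c" >/dev/null; then
      # Leave the conflict for the user; rerunning after committing the
      # resolution continues from the next upstream commit.
      echo "conflict merging upstream commit $mu_c; resolve and rerun" >&2
      return 1
    fi
  done
}
```

For example, `git fetch upstream && merge_upstream_one_by_one upstream/master`. The `--first-parent` walk is exactly where upstream linearity matters: if a mainline upstream commit is itself a merge of a long side branch, that whole branch still arrives in a single step.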

Now, I do say "basically" linear because the important thing is to have
small increments of change each time. It doesn't mean we have to have
everything be ff-only, and we can surely tolerate the merge commits that
wrap individual commits in a pull-request kind of workflow. But merges
that bring in long chains of commits are not what we want.
--paulr

Richard Smith via llvm-dev

Jul 22, 2016, 4:08:25 PM
to Justin Lebar, llvm-dev
Having read through the entire thread and thought about this for a while, here are my thoughts:

 * A single monolithic repository has quite a lot of advantages, some because of what it is (for instance, you can make atomic cross-project commits), and some because of what it isn't (keeping the repositories separate creates synchronization problems for version-locked components, and it's not clear to me that we have a good answer for these problems)

 * A single repository from which we can build a complete LLVM toolchain, without requiring checking out a dozen components in seemingly-random locations, would be valuable. The default behavior for someone checking out and building the LLVM project should be that they get a complete, fully-functional toolchain.

 * We need to preserve and maintain the easy ability to mix and match LLVM components with other components (other C runtime libraries, C++ ABI libraries, C++ standard libraries, linkers, debuggers, ...). That means that it needs to be obvious what the boundaries of the optional components are, which means that the current project layout (the one implied by the build system) is not good enough for a monolithic repository (LLVM tests will fail if you don't check out llvm/tools/opt, but we presumably want to explicitly support not checking out llvm/tools/clang) -- unless we have extensive documentation covering this, and even then there are likely to be discoverability issues.

However, the move to git and the reorganization need not be done at the same time, and it seems vastly easier to reorganize *after* we move to a monolithic git repository -- it would then be essentially trivial for each person with organizational ideas to move the code around in their monolithic git repository, push it somewhere where we can all look at it, and for us to then make an informed choice about the layout, with a concrete example in front of us. Then we push the selected new layout; git supports this really nicely if all the parts are already in a single repository.
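As a rough illustration of how cheap such a layout experiment is in a single repository, here is a hypothetical helper (the name and the example paths are assumptions) that moves one subproject to a proposed location and records the move as one commit:

```shell
# propose_move FROM TO
# Hypothetical helper for trying out a new layout: move one subproject
# directory and record the move as a single commit.
propose_move() {
  pm_from=$1; pm_to=$2
  # Make sure the destination's parent exists (e.g. for runtimes/compiler-rt).
  mkdir -p "$(dirname "$pm_to")"
  git mv "$pm_from" "$pm_to" &&
    git commit -q -m "Move $pm_from to $pm_to"
}
```

For instance, on a throwaway branch, `propose_move tools/clang clang` records the whole move atomically, and `git log --follow` on any moved file still reaches its pre-move history, so a proposed layout can be pushed and reviewed as a concrete branch.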

So here's what I would suggest:

- we move to a monolithic git repository on github

- this monolithic repository contains all the LLVM subprojects necessary to build a complete toolchain, including libc++ and other pieces that are not version-locked to llvm or clang

- the initial structure exactly matches the current layout implied by the build system (clang in tools/clang, lld in tools/lld, compiler-rt in runtimes/compiler-rt, libc++ in projects/libcxx, and so on)

- after we transition to git, interested parties assemble and upload to github patches reorganizing the project structure, and we have another discussion about principles for the restructuring (including forming solid guidance for how to organize future additions to LLVM), with reference to the patches so we can look at the proposed new layout; we pick one and commit it

The goal would be to have the new layout entirely settled by the time 4.0 branches.

On Wed, Jul 20, 2016 at 4:39 PM, Justin Lebar via llvm-dev <llvm...@lists.llvm.org> wrote:
Dear all,

I would like to (re-)open a discussion on the following specific question:

  Assuming we are moving the llvm project to git, should we
  a) use multiple git repositories, linked together as subrepositories
of an umbrella repo, or
  b) use a single git repository for most llvm subprojects.

The current proposal assembled by Renato follows option (a), but I
think option (b) will be significantly simpler and more effective.
Moreover, I think the issues raised with option (b) are either
incorrect or can be reasonably addressed.

Specifically, my proposal is that all LLVM subprojects that are
"version-locked" (and/or use the common CMake build system) live in a
single git repository.  That probably means all of the main llvm
subprojects other than the test-suite and maybe libc++.  From looking
at the repository today that would be: llvm, clang, clang-tools-extra,
lld, polly, lldb, llgo, compiler-rt, openmp, and parallel-libs.

Let's first talk about the advantages of a single repository.  Then
we'll address the disadvantages raised.

At a high level, one repository is simpler than multiple repos that
must be kept in sync using an external mechanism.  The submodules
solution requires nontrivial automation to maintain the history of
commits in the umbrella repo (which we need if we want to bisect, or
even just build an old revision of clang), but no such mechanisms are
required if we have a single repo.

Similarly, it's possible to make atomic API changes across subprojects
in a single repo, something we simply can't do with the submodules proposal.
And working with llvm release branches becomes much simpler.

In addition, the single repository approach ties branches that contain
changes to subprojects (e.g. clang) to a specific version of llvm
proper.  This means that when you switch between two branches that
contain changes to clang, you'll automatically check out the right
llvm bits.

Although we can do this with submodules too, a single repository makes
it much easier.

As a concrete example, suppose you are working on some changes in
clang.  You want to commit the changes, then switch to a new branch
based on the tip of master and make some new changes.  Finally you want to
switch back to your original branch.  And when you switch between
branches, you want to get an llvm that's in sync with the clang in
your working copy.

Here's how I'd do it with a monolithic git repository, option (b):

  git commit # old-branch
  git fetch
  git checkout -b new-branch origin/master
  # hack hack hack
  git commit # new-branch
  git checkout old-branch

Here's how I'd do it with option (a), submodules.  I've used git -C
here to make it explicit which repo we're working in, but in real life
I'd probably use cd.

  # First, commit to two branches, one in your clang repo and one in your
  # master repo.
  git -C tools/clang commit # old-branch, clang submodule
  git commit # old-branch, master repo
  # Now fetch the umbrella repo and its submodules, and start a new branch
  # in the umbrella repo at head.
  git fetch --recurse-submodules
  git checkout -b new-branch origin/master
  git submodule update
  # Start a new branch in the clang repo pointing to the current head.
  git -C tools/clang checkout -b new-branch
  # hack hack hack
  # Commit both branches.
  git -C tools/clang commit # new-branch
  git commit # new-branch
  # Check out the old branch.
  git checkout old-branch
  git submodule update

This is twice as many git commands, and almost three times as much
typing, to do the same thing.

Indeed, this is so complicated that I expect many developers won't
bother and will continue to develop the way we currently do.  They
would thus continue to be unable to create clang branches that include
an llvm revision.  :(

There are real simplifications and productivity advantages to be had
by using a single repository.  They will affect essentially every
developer who makes changes to subprojects other than LLVM proper,
cares about release branches, bisects our code, or builds old
revisions.


So that's the first part, what we have to gain by using a monolithic
repository.  Let's address the downsides.

If you'll bear with a hypothetical: Imagine you could somehow make the
monolithic repository behave exactly like the N separate repositories
work today.  If so, that would be the best of both worlds: Those of us
who want a monolithic repository could have one, and those of us who
don't would be unaffected.  Whatever downsides you were worried about
would evaporate in a mist of rainbows and puppies.

It turns out this hypothetical is very close to reality.  The key is
git sparse checkouts [1], which let you check out only some files or
directories from a repository.  Using this facility, if you don't like
the switch to a monolithic repository, you can set up your git so
you're (almost) entirely unaffected by it.

If you want to check out only llvm and clang, no problem. Just set up
your .git/info/sparse-checkout file appropriately.  Done.
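For illustration, a sketch of that setup using the `.git/info/sparse-checkout` mechanism from footnote [1], assuming the monolithic repository keeps each subproject in its own top-level directory (e.g. `llvm/`, `clang/`); the helper name `sparse_clone` is hypothetical. Newer git versions (2.25 and later) also provide a `git sparse-checkout` command that manages the same file.

```shell
# sparse_clone URL DIR SUBDIR...
# Hypothetical helper: clone without populating the working tree, list the
# top-level directories you want, then let read-tree materialize only those.
sparse_clone() {
  sc_url=$1; sc_dir=$2; shift 2
  git clone -q --no-checkout "$sc_url" "$sc_dir" || return 1
  git -C "$sc_dir" config core.sparseCheckout true
  mkdir -p "$sc_dir/.git/info"
  : > "$sc_dir/.git/info/sparse-checkout"
  for sc_sub in "$@"; do
    # A leading "/" anchors the pattern at the repository root.
    printf '/%s/\n' "$sc_sub" >> "$sc_dir/.git/info/sparse-checkout"
  done
  # Populate the index from HEAD and write only the matching paths to disk.
  git -C "$sc_dir" read-tree -mu HEAD
}
# Example (assumes top-level llvm/ and clang/ directories):
#   sparse_clone https://github.com/llvm-project/llvm-project.git llvm-project llvm clang
```

Everything else in the repository is still in `.git`, so widening the checkout later is just a matter of adding a line to the file and rerunning `git read-tree -mu HEAD`.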

If you want to be able to have two different revisions of llvm and
clang checked out at once (maybe you want to update your clang bits
more often than you update your llvm bits), you can do that too.  Make
one sparse checkout just of llvm, and make another sparse checkout
just of clang.  Symlink the clang checkout to llvm/tools/clang.
That's it.  The two checkouts can even share a common .git dir, so you
don't have to fetch and store everything twice.
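A sketch of the two-checkout variant, using `git worktree` (available since git 2.5) so both checkouts share one object store. The helper name, paths, and top-level layout are assumptions. It relies on each linked worktree having its own index and its own `info/sparse-checkout` file, which is why the sketch asks git for the worktree's private git dir instead of hard-coding `.git/info`:

```shell
# add_sparse_worktree SHARED_GIT_DIR WORKTREE_PATH REV SUBDIR...
# Hypothetical helper.  Pass absolute paths.  Objects live once in
# SHARED_GIT_DIR; each worktree checks out only the listed directories.
add_sparse_worktree() {
  aw_shared=$1; aw_path=$2; aw_rev=$3; shift 3
  git -C "$aw_shared" config core.sparseCheckout true
  # --detach lets several worktrees sit on the same commit.
  git -C "$aw_shared" worktree add --detach --no-checkout "$aw_path" "$aw_rev" \
    >/dev/null || return 1
  # Locate this worktree's private git dir, which holds its sparse patterns.
  aw_gitdir=$(cd "$aw_path" && cd "$(git rev-parse --git-dir)" && pwd) || return 1
  mkdir -p "$aw_gitdir/info"
  : > "$aw_gitdir/info/sparse-checkout"
  for aw_sub in "$@"; do
    printf '/%s/\n' "$aw_sub" >> "$aw_gitdir/info/sparse-checkout"
  done
  git -C "$aw_path" read-tree -mu HEAD
}
# Example (paths and layout are assumptions):
#   git clone --bare https://github.com/llvm-project/llvm-project.git "$HOME/src/llvm-project.git"
#   add_sparse_worktree "$HOME/src/llvm-project.git" "$HOME/src/llvm"  master llvm
#   add_sparse_worktree "$HOME/src/llvm-project.git" "$HOME/src/clang" master clang
#   ln -s "$HOME/src/clang/clang" "$HOME/src/llvm/llvm/tools/clang"
```

Each worktree can then be moved to a different commit on its own (e.g. with `git -C <worktree> checkout --detach <commit>`), while `llvm/tools/clang` keeps pointing at the clang checkout.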

As far as I can tell, the only overhead of the monolithic repository
is the extra storage in .git.  But this is quite small in the scheme
of things.

The .git dir for the existing monolithic repository [2] is 1.2 GB.  By
way of comparison, my objdir for a release build of llvm and clang is
3.5 GB, and a full checkout (workdir + .git dirs) of llvm and clang is
0.65 GB.

If the 1.2 GB really is a problem for you (or more likely, your
automated infrastructure), a shallow clone [3] takes this down to 90 MB.

The critical point to me in all this is that it's easy to set up the
monolithic repository to appear like it's a bunch of separate repos.
But it is impossible, insofar as I can tell, to do the opposite.  That
is, option (b) is strictly more powerful than option (a).


Renato has understandably pointed out that the current proposal is
pretty far along, so please speak up now if you want to make this
happen.  I think we can.

Regards,
-Justin

[1] Git sparse checkouts were introduced in git 1.7, in 2010. For more
info, see http://jasonkarns.com/blog/subdirectory-checkouts-with-git-sparse-checkout/.
As far as I can tell, sparse checkouts work fine on Windows, but you
have to use git-bash, see http://stackoverflow.com/q/23289006.
[2] https://github.com/llvm-project/llvm-project
[3] git clone --depth=1 https://github.com/llvm-project/llvm-project.git

Hal Finkel via llvm-dev

Jul 22, 2016, 4:18:13 PM
to Richard Smith, llvm-dev
From: "Richard Smith via llvm-dev" <llvm...@lists.llvm.org>
To: "Justin Lebar" <jle...@google.com>
Cc: "llvm-dev" <llvm...@lists.llvm.org>
Sent: Friday, July 22, 2016 3:08:18 PM
Subject: Re: [llvm-dev] [RFC] One or many git repositories?

Having read through the entire thread and thought about this for a while, here are my thoughts:

 * A single monolithic repository has quite a lot of advantages, some because of what it is (for instance, you can make atomic cross-project commits), and some because of what it isn't (keeping the repositories separate creates synchronization problems for version-locked components, and it's not clear to me that we have a good answer for these problems)

 * A single repository from which we can build a complete LLVM toolchain, without requiring checking out a dozen components in seemingly-random locations, would be valuable. The default behavior for someone checking out and building the LLVM project should be that they get a complete, fully-functional toolchain.

 * We need to preserve and maintain the easy ability to mix and match LLVM components with other components (other C runtime libraries, C++ ABI libraries, C++ standard libraries, linkers, debuggers, ...). That means that it needs to be obvious what the boundaries of the optional components are, which means that the current project layout (the one implied by the build system) is not good enough for a monolithic repository (LLVM tests will fail if you don't check out llvm/tools/opt, but we presumably want to explicitly support not checking out llvm/tools/clang) -- unless we have extensive documentation covering this, and even then there are likely to be discoverability issues.

However, the move to git and the reorganization need not be done at the same time, and it seems vastly easier to reorganize *after* we move to a monolithic git repository -- it would then be essentially trivial for each person with organizational ideas to move the code around in their monolithic git repository, push it somewhere where we can all look at it, and for us to then make an informed choice about the layout, with a concrete example in front of us. Then we push the selected new layout; git supports this really nicely if all the parts are already in a single repository.

So here's what I would suggest:

- we move to a monolithic git repository on github

- this monolithic repository contains all the LLVM subprojects necessary to build a complete toolchain, including libc++ and other pieces that are not version-locked to llvm or clang

- the initial structure exactly matches the current layout implied by the build system (clang in tools/clang, lld in tools/lld, compiler-rt in runtimes/compiler-rt, libc++ in projects/libcxx, and so on)

- after we transition to git, interested parties assemble and upload to github patches reorganizing the project structure, and we have another discussion about principles for the restructuring (including forming solid guidance for how to organize future additions to LLVM), with reference to the patches so we can look at the proposed new layout; we pick one and commit it

I agree with all of this.

I think that we should still keep the test-suite in a separate repository (both because it is very large, and should grow even larger, and because it follows a very different licensing policy).

 -Hal
--
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory

Piotr Padlewski via llvm-dev

Jul 22, 2016, 4:18:56 PM
to Richard Smith, llvm-dev
I have one reason why we should not move to a monolithic repository: if you work on something light like clang-tidy, which doesn't often require syncing with clang, but you still want the most recent checks, then I don't see a solution with a monolithic repository.
And this is a real issue if you only have a 2- or 4-core laptop to work on.
And I guess the build system won't solve the problem: even a small change in some llvm file will result in recompiling many files that clang-tidy depends on.

Piotr Padlewski via llvm-dev

Jul 22, 2016, 4:23:23 PM
to Richard Smith, llvm-dev
And the same thing happens with IDEs: I would not like to spend the next 10-15 minutes updating symbols in my IDE, which would also drain my battery. So basically you pay for what you don't use, which is not the C++ way :P

Hal Finkel via llvm-dev

Jul 22, 2016, 4:33:48 PM
to Piotr Padlewski, llvm-dev
From: "Piotr Padlewski via llvm-dev" <llvm...@lists.llvm.org>
To: "Richard Smith" <ric...@metafoo.co.uk>
Cc: "llvm-dev" <llvm...@lists.llvm.org>
Sent: Friday, July 22, 2016 3:18:31 PM
Subject: Re: [llvm-dev] [RFC] One or many git repositories?

> I have one reason why we should not move to a monolithic repository: if you work on something light like clang-tidy, which doesn't often require syncing with clang, but you still want the most recent checks, then I don't see a solution with a monolithic repository.
> And this is a real issue if you only have a 2- or 4-core laptop to work on.
> And I guess the build system won't solve the problem: even a small change in some llvm file will result in recompiling many files that clang-tidy depends on.
This seems like an orthogonal problem. It would also be nice to have a build-system mode which decouples Clang from LLVM, in terms of dependency checking, for the same reason.

 -Hal

Chandler Carruth via llvm-dev

Jul 22, 2016, 4:36:56 PM7/22/16
to Richard Smith, Justin Lebar, llvm-dev
On Fri, Jul 22, 2016 at 1:08 PM Richard Smith via llvm-dev <llvm...@lists.llvm.org> wrote:
Having read through the entire thread and thought about this for a while, here are my thoughts:

 * A single monolithic repository has quite a lot of advantages, some because of what it is (for instance, you can make atomic cross-project commits), and some because of what it isn't (keeping the repositories separate creates synchronization problems for version-locked components, and it's not clear to me that we have a good answer for these problems)

 * A single repository from which we can build a complete LLVM toolchain, without requiring checking out a dozen components in seemingly-random locations, would be valuable. The default behavior for someone checking out and building the LLVM project should be that they get a complete, fully-functional toolchain.

 * We need to preserve and maintain the easy ability to mix and match LLVM components with other components (other C runtime libraries, C++ ABI libraries, C++ standard libraries, linkers, debuggers, ...). That means that it needs to be obvious what the boundaries of the optional components are, which means that the current project layout (the one implied by the build system) is not good enough for a monolithic repository (LLVM tests will fail if you don't check out llvm/tools/opt, but we presumably want to explicitly support not checking out llvm/tools/clang) -- unless we have extensive documentation covering this, and even then there are likely to be discoverability issues.

However, the move to git and the reorganization need not be done at the same time, and it seems vastly easier to reorganize *after* we move to a monolithic git repository -- it would then be essentially trivial for each person with organizational ideas to move the code around in their monolithic git repository, push it somewhere where we can all look at it, and for us to then make an informed choice about the layout, with a concrete example in front of us. Then we push the selected new layout; git supports this really nicely if all the parts are already in a single repository.

So here's what I would suggest:

- we move to a monolithic git repository on github

- this monolithic repository contains all the LLVM subprojects necessary to build a complete toolchain, including libc++ and other pieces that are not version-locked to llvm or clang

- the initial structure exactly matches the current layout implied by the build system (clang in tools/clang, lld in tools/lld, compiler-rt in runtimes/compiler-rt, libc++ in projects/libcxx, and so on)

- after we transition to git, interested parties assemble and upload to github patches reorganizing the project structure, and we have another discussion about principles for the restructuring (including forming solid guidance for how to organize future additions to LLVM), with reference to the patches so we can look at the proposed new layout; we pick one and commit it

The goal would be to have the new layout entirely settled by the time 4.0 branches.

Strong +1 to all of this. It was what I was trying to suggest, but more explicitly written.

Chandler Carruth via llvm-dev

Jul 22, 2016, 4:38:42 PM7/22/16
to Hal Finkel, Richard Smith, llvm-dev
On Fri, Jul 22, 2016 at 1:18 PM Hal Finkel via llvm-dev <llvm...@lists.llvm.org> wrote:
I think that we should still keep the test-suite in a separate repository (both because it is very large, should be even larger, and because it follows a very different licensing policy).

Yes. We actually *can't* merge the test suite without causing massive problems for very many people due to licenses and other things.

And this is also fine, because it still satisfies Richard's point about checking out a single repository and building a complete toolchain -- the test-suite isn't necessary for that.

Mehdi Amini via llvm-dev

Jul 22, 2016, 4:39:17 PM7/22/16
to Piotr Padlewski, llvm-dev
The build system can help: you just need to have two (sparse) checkouts, one for LLVM/clang and the other for clang-tidy, and configure the build from the LLVM/clang checkout, adding clang-tidy as an external project.

— 
Mehdi

Justin Lebar via llvm-dev

Jul 22, 2016, 4:41:45 PM7/22/16
to Piotr Padlewski, llvm-dev
Hi, Piotr.

> If you do some light stuff like clang-tidy, that don't often require syncing with clang, but you still want to have the most recent checks, then I don't see a solution in monolithic repository.

Please see my original e-mail, in the paragraph that begins "If you
want to be able to have two different revisions of llvm and clang
checked out at once".

This describes a workflow that would allow you to update clang-tidy
without updating all of llvm. I think this would address the issue
you raise.

I grant that setting this up would require a one-time but nonzero
amount of work from developers like you. But then the question is
whether we should optimize for this one-time advantage for a few
developers or advantages for the vast majority of us that affect our
work every day.

-Justin

Piotr Padlewski via llvm-dev

Jul 22, 2016, 4:49:19 PM7/22/16
to Mehdi Amini, llvm-dev
2016-07-22 13:39 GMT-07:00 Mehdi Amini <mehdi...@apple.com>:
The build system can help: you just need to have two (sparse) checkouts, one for LLVM/clang and the other for clang-tidy, and configure the build from the LLVM/clang checkout, adding clang-tidy as an external project.

Mehdi


Can you describe it in more detail? I don't get the approach, and it seems that while we are trying to make llvm easier to use, at the same time we are making it harder.

I don't find the repository setup we have right now hard. It might get a little bit harder if we introduced git submodules, mostly because submodules are something that not many developers use on a daily basis, but it would actually give us the syncing ability that we need.

BTW, does anyone know why cmake reloads each time I update the llvm/clang repos? I hope that either approach would solve this problem, because it doesn't seem like something that should happen. 

Tim Northover via llvm-dev

Jul 22, 2016, 5:05:14 PM7/22/16
to Piotr Padlewski, llvm-dev
On 22 July 2016 at 13:48, Piotr Padlewski via llvm-dev
<llvm...@lists.llvm.org> wrote:
>> The build system can help: you just need to have two (sparse) checkouts, one for LLVM/clang and the other for clang-tidy, and configure the build from the LLVM/clang checkout, adding clang-tidy as an external project.

> Can you describe it more?

Something like this:

$ git clone g...@github.com:llvm/llvm.git
$ git clone g...@github.com:llvm/llvm.git clang-tools-extras
$ <make the clang-tools sparse if you want, doesn't seem strictly necessary though>
$ mkdir build && cd build
$ cmake ../llvm \
    -DLLVM_EXTERNAL_CLANG_TOOLS_EXTRA_SOURCE_DIR=../clang-tools-extras/tools/clang/tools/clang-tools-extras
$ make
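
The optional sparse step above is left as a placeholder. As an illustration only, here is a minimal sketch of it using git sparse-checkout, which needs git 2.25 or newer and therefore postdates this thread. It builds a tiny local stand-in for the monolithic repository so it runs offline; the file names are hypothetical, and clang-tools-extra is assumed to live at tools/clang/tools/extra, where the existing build system expects it.

```shell
#!/bin/sh
# Sketch only: a second, sparse clone that checks out just the
# clang-tools-extra subtree of a (hypothetical) monolithic repository.
# Assumes git >= 2.25 for "clone --sparse" and "git sparse-checkout".
set -e

# Build a tiny local stand-in for the monorepo so this runs offline.
work=$(mktemp -d)
cd "$work"
git init -q mono
mkdir -p mono/lib/Support mono/tools/clang/lib \
         mono/tools/clang/tools/extra/clang-tidy
echo 'llvm'  > mono/lib/Support/Support.cpp
echo 'clang' > mono/tools/clang/lib/Sema.cpp
echo 'tidy'  > mono/tools/clang/tools/extra/clang-tidy/ClangTidy.cpp
git -C mono add .
git -C mono -c user.name=demo -c user.email=demo@example.com \
    -c commit.gpgsign=false commit -q -m "initial import"

# Second clone for clang-tidy work: start with a sparse working tree,
# then materialize only the clang-tools-extra directory.
git clone -q --sparse mono clang-tools-extras
git -C clang-tools-extras sparse-checkout set tools/clang/tools/extra

# Only the selected subtree is on disk in the second clone.
ls clang-tools-extras/tools/clang/tools/extra/clang-tidy
```

Only the working tree is trimmed: the index still records every file (git -C clang-tools-extras ls-files lists all three), so history, blame and bisect still cover the whole repository. The -DLLVM_EXTERNAL_CLANG_TOOLS_EXTRA_SOURCE_DIR flag above would then point at the extra directory inside such a checkout.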

> I don't get the approach, but it seems we are trying to make it easier to use llvm, but in the same time we are making it harder.

As Justin said, this isn't an issue for the majority of developers and
it's a solvable problem for you.

> BTW, does anyone know why cmake reloads each time I update the llvm/clang repos? I hope that either approach would solve this problem, because it doesn't seem like something that should happen.

It reconfigures every time a CMakeLists.txt file used in the build is
changed, which is unavoidable as far as I'm aware.

Cheers.

Tim.
