[RFC] Change to repo manifest format

Shawn Pearce

May 21, 2009, 1:49:46 AM
to repo-discuss
After some discussion with my coworkers I've come to realize that the current "repo sync" behavior of jumping to the latest version in each project is insane.  So, I have started working on a proposed overhaul of repo's manifest format.

In a nutshell, a repo manifest would be a git repository formatted for use by `git submodule`[1].  This means that instead of having an XML manifest file like we do today, there would be a single ".gitmodules"[2] file in the top level directory enumerating the projects, and special "gitlink" file entries recording the current commit SHA-1 of each project.

  [1] http://www.kernel.org/pub/software/scm/git/docs/git-submodule.html
  [2] http://www.kernel.org/pub/software/scm/git/docs/gitmodules.html

For example:

  $ cat .repo/manifests/.gitmodules
  [submodule "platform/build"]
        path = build
        url = ./build.git
        revision = .
  [submodule "platform/dalvik"]
        path = dalvik
        url = ./dalvik.git
        revision = .
  [submodule "platform/development"]
        path = development
        url = ./development.git
        revision = .
  $ (cd .repo/manifests; git ls-files --stage)
  160000 9537884b0dabe81bf612c79d12c7b4bf40de10a5 0    build
  160000 dbd5c902c3cca79e22d0f38539cc195ef7395647 0    dalvik
  160000 9cd49a44162f465402b0f10faa351364c76a5b76 0    development
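Because the gitlinks are ordinary index entries, standard tools can read them. Here is a minimal POSIX-style shell sketch. It assumes the `git ls-files --stage` layout shown above (mode, SHA-1, stage, path) and paths without whitespace. `list_gitlinks` is a hypothetical helper, not a repo command:

```sh
# list_gitlinks: read `git ls-files --stage` output on stdin and print
# "path sha1" for every gitlink (mode 160000) entry, skipping ordinary
# files such as .gitmodules.
# Hypothetical helper for illustration; assumes paths contain no whitespace.
list_gitlinks() {
  awk '$1 == "160000" { print $4, $2 }'
}
```

For example, `(cd .repo/manifests && git ls-files --stage) | list_gitlinks` would print one `path sha1` line per project.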

A few things about this proposed format:

- The URL is relative to the manifest repository (or it may be absolute).  This makes setting up a mirror easier, as you don't necessarily need to tweak the manifest file just to point to the mirrored copy.

- I don't know where to put the review URL.  Given that the URLs are relative, it suddenly seems odd to put the review URL into a "submodule.dalvik.review" key in the .gitmodules file.  Suggestions would be appreciated.

- The manifest format matches what `git submodule` expects.  That means one could clone the manifest and manage to do a checkout (and possibly build) without repo.  Conversely, repo could automatically be used on any other already existing `git submodule` style project.

- There is an optional revision property for each module.  The revision property replaces the revision property in the XML manifest format, in that it tells us what *branch* should be used to keep the submodule current.  A revision of "cupcake" would instead always refer to the "cupcake" branch of that project.  The magic value of "." means "use the same branch name as the manifest itself".  Thus branching a manifest implicitly branches all of the projects.  I think it's more common to branch everything than to branch only 1 or 2 projects.
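The branch-resolution rule for the revision property fits in a small function. This sketch follows the rule as described above and is not repo code. `resolve_revision` is a hypothetical name. Treating a missing revision as "no tracking branch" is an assumption, based on the property being optional:

```sh
# resolve_revision REVISION MANIFEST_BRANCH
# Prints the project branch that keeps a submodule current:
#   "."        -> the manifest's own branch name
#   non-empty  -> that branch name, used as-is
#   empty      -> no tracking branch (prints nothing, returns 1)
# Hypothetical helper illustrating the proposed rule.
resolve_revision() {
  case "$1" in
    .)  printf '%s\n' "$2" ;;
    '') return 1 ;;
    *)  printf '%s\n' "$1" ;;
  esac
}
```

With this rule, a manifest on its "cupcake" branch that says `revision = .` tracks every project's "cupcake" branch, so branching the manifest branches everything.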

- There is no .repo/local_manifest.xml.  I can't decide how to represent it, or what aspects are important to maintain.  Comments from users about this removal in functionality would be most appreciated.

- Every project always has a current SHA-1.


The last two points are important.  *Especially* the last point.

During `repo sync`, repo will only update to the SHA-1 listed in the manifest's gitlink file entry.  If the branch named by the revision property is 5 commits ahead, `repo sync` will ignore the branch and will stick to what is listed in the gitlink; that entry displayed by `git ls-files --stage`.

Someone, somewhere, must be responsible for updating these project SHA-1s within the manifest, otherwise `repo sync` will never see new changes.  Enter Gerrit Code Review.

Gerrit Code Review will need to automatically update the manifest gitlink entries when a submit occurs within a project, so that the manifest's gitlink stays current to the project's branch.   Gerrit will automatically update manifests, creating new commits in the manifest project, anytime a submit occurs on a submodule project branch that appears in the revision field in the manifest's ".gitmodules" file.  For example, if this is the "cupcake" manifest branch, and a commit occurs on the "cupcake" branch in dalvik, Gerrit will update the dalvik gitlink entry, and create a new commit in the manifest "cupcake" branch.

If you aren't using Gerrit Code Review, a human, or your own script, will need to manually update the manifest, commit it, and publish that change.


I've only started prototyping this, but I hope to have at least the repo changes ready sometime next week for folks to look at and kick the tires on, before it gets merged over to the "stable" branch.  I have yet to start coding the necessary improvements to Gerrit Code Review, but they are a critical part for this proposal to go live.

repo will continue to honor XML formatted manifests for some time into the future, e.g. another 6 months, but I expect AOSP to cut over to the new manifest format much, much sooner, possibly before the next major tasty platform release is made.

I already have a basic "repo upgrade-manifest" tool that can be used to upgrade a manifest from XML to this submodule format, so for the most part I hope existing manifest maintainers will be able to convert with relatively little effort.  For leaf-level developers, the switch over in format would be automatic on their next "repo sync" (after the manifest maintainer made the change), but see above about losing support for local_manifest.xml.


Thoughts?

Jey Michael

May 21, 2009, 2:17:09 AM
to repo-d...@googlegroups.com
Each project having a SHA1 is good news, Shawn.
Two quick points (further below)


On Wed, May 20, 2009 at 10:49 PM, Shawn Pearce <s...@google.com> wrote:

>
> During `repo sync`, repo will only update to the SHA-1 listed in the manifest's gitlink file entry.  If the branch named by the revision property is 5 commits ahead, `repo sync` will ignore the branch and will stick to what is listed in the gitlink; that entry displayed by `git ls-files --stage`.


I assume that the developer's 'repo sync' will just continue to work
as before, and only the admin at the central repository would have to
update the project SHA-1 within the manifest (either with Gerrit, or
some script) correct?
That brings me to the next point.

>
> Someone, somewhere, must be responsible for updating these project SHA-1s within the manifest, otherwise `repo sync` will never see new changes.  Enter Gerrit Code Review.


Doesn't this add a rather tight dependency on Gerrit? I.e., if one
does not use Gerrit, there is a really painful maintenance task on the
admin side. Or so, it seems from your description. Is this the
proposal? Is it not possible to retain the loose coupling and provide
some out-of-the-box solution, for admins with non-gerrit setup too?

-Jey

Shawn Pearce

May 21, 2009, 10:09:30 AM
to repo-d...@googlegroups.com
On Wed, May 20, 2009 at 23:17, Jey Michael <jey.m...@gmail.com> wrote:
On Wed, May 20, 2009 at 10:49 PM, Shawn Pearce <s...@google.com> wrote:

> During `repo sync`, repo will only update to the SHA-1 listed in the manifest's gitlink file entry.  If the branch named by the revision property is 5 commits ahead, `repo sync` will ignore the branch and will stick to what is listed in the gitlink; that entry displayed by `git ls-files --stage`.

I assume that the developer's 'repo sync' will just continue to work
as before, and only the admin at the central repository would have to
update the project SHA-1 within the manifest (either with Gerrit, or
some script)  correct?

Yes.
 
> Someone, somewhere, must be responsible for updating these project SHA-1s within the manifest, otherwise `repo sync` will never see new changes.  Enter Gerrit Code Review.

Doesn't this add a rather tight dependency on Gerrit?

Unfortunately, yes.  Or rather, a dependency between repo and something that will manage the manifest.  At present I only have plans for teaching Gerrit Code Review how to do that.  However, since it's just a Git repository, other tools could be created.
 
 I.e., if one
does not use Gerrit, there is a really painful maintenance task on the
admin side.  Or so, it seems from your description.  Is this the
proposal?  Is it not possible to retain the loose coupling and provide
some out-of-the-box solution, for admins with non-gerrit setup too?

Maybe create an update hook in each of your repositories that updates the manifest when a branch is modified by git push?  By moving from an XML manifest format to one that is more native to Git, it should be easier to construct a script that can update the manifest as needed.   My hope is that non-Gerrit administrators would pool together and create and share this script, maybe include it in some sort of "contrib/" directory within repo?

Jean-Baptiste Queru

May 21, 2009, 11:23:45 AM
to repo-d...@googlegroups.com
-the new scheme also needs to work with back-end pushes to
refs/heads/cupcake, not just submits done as a result of pushes to
refs/for/cupcake (e.g. Google's auto-importers and auto-mergers work
like that, and it's also used for some maintenance tasks).

-let's keep in mind the use case where someone uses a private variant
of some projects and a shared variant of most others (a use case that
we've seen both inside and outside Google). It's quite painful today,
and I'd hate to see it become harder.

-Gerrit will need some logic to know which projects to include in the
manifest (it can't just rely on existing branches), i.e. it'll need a
manifest template from which to build the manifest.

-having a format that makes it possible to diff manifests that contain explicit
revisions would be a nice added bonus - the current format makes this
hard as every line ends up being different.

JBQ

--
Jean-Baptiste M. "JBQ" Queru
Android Engineer, Google.

Questions sent directly to me that have no reason for being private
will likely get ignored or forwarded to a public forum with no further
warning.

Shawn Pearce

May 21, 2009, 4:41:14 PM
to repo-d...@googlegroups.com
On Thu, May 21, 2009 at 08:23, Jean-Baptiste Queru <j...@android.com> wrote:

-the new scheme also needs to work with back-end pushes to
refs/heads/cupcake, not just submits done as a result of pushes to
refs/for/cupcake (e.g. Google's auto-importers and auto-mergers work
like that, and it's also used for some maintenance tasks).

*sigh*

In the case of e.g. cupcake, where it's the automatic legacy p4 -> git conversion going on, we really should have that conversion tool create the update to the manifest, so it preserves the atomic change in p4 into git, despite the change being broken over multiple projects.

For other branches, e.g. donut, where it's git -> git replication, again, we should be able to use the manifest as created on the one side to update the other.

The git based automerger of donut -> master however would be tricky.  I think that for this to work correctly it would need to merge the project, *and* merge the manifest, and update the manifest with the new project commit after the project merge was done, then push that whole group into Gerrit.  It probably has to happen in the automerger.

But I can see cases where someone manages a project by git push rather than change submits in Gerrit, but that project should automatically update in the manifest.  So we probably do have to support updating the manifest automatically on push.
 
-let's keep in mind the use case where someone uses a private variant
of some projects and a shared variant of most others (a use case that
we've seen both inside and outside Google). It's quite painful today,
and I'd hate to see it become harder.

Manifest inheritance is supposed to fix that.  It's never been implemented.

What I outlined earlier about using a single git repository as the manifest may make manifest inheritance harder.  To implement the inheritance we would probably need to merge the base manifest into the private manifest.  So any updates to the base manifest in Gerrit would need to be also merged into the private manifest.  Rinse, repeat, in case that manifest is a base manifest for another.

An alternative approach would be to have multiple manifest repositories in the client, and make repo union them, but then you lose some aspect of the "atomic sync".  Now instead of one SHA-1 to cover the manifest version we need N SHA-1s, one for each manifest in the union.

I'm not sure which is worse.
 
-Gerrit will need some logic to know which projects to include in the
manifest (it can't just rely on existing branches), i.e. it'll need a
manifest template from which to build the manifest.

That's what the .gitmodules file is for.  It has at least 3 lines per project, denoting the name of the project, where to get it from (relative to the manifest usually), and where it goes in the client checkout (path for the build system to find it at).  But there is also the "gitlink" entry in the index/tree that tells us the project should exist, and what its commit SHA-1 should be.

So Gerrit will need to parse this information out of the manifest repository, and create a reverse lookup table of project -> manifest, detailing what it has to update, where.  Anytime the manifest changes, Gerrit will need to update this reverse lookup table to ensure it remains consistent.
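The first half of that job is reading the project list out of .gitmodules, and `git config` can do that without a custom parser. This sketch assumes submodule names contain no whitespace. `gitmodules_table` is a hypothetical helper, and `revision` is the key proposed here, not one that `git submodule` itself uses:

```sh
# gitmodules_table FILE
# Prints one tab-separated line per [submodule "NAME"] section of FILE:
#   NAME<TAB>PATH<TAB>URL<TAB>REVISION   (REVISION may be empty)
# Uses `git config --file`, so it accepts the same syntax git submodule does.
# Hypothetical helper; assumes submodule names contain no whitespace.
gitmodules_table() {
  file=$1
  git config --file "$file" --get-regexp '^submodule\..*\.path$' |
  while read -r key path; do
    name=${key#submodule.}     # strip the section prefix
    name=${name%.path}         # strip the trailing key name
    url=$(git config --file "$file" "submodule.$name.url") || url=
    rev=$(git config --file "$file" "submodule.$name.revision") || rev=
    printf '%s\t%s\t%s\t%s\n' "$name" "$path" "$url" "$rev"
  done
}
```

Inverting these rows per manifest branch (project and resolved branch pointing back to the manifest branch) would give the reverse lookup table described above.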
 
-having a format that makes it possible to diff manifests that contain explicit
revisions would be a nice added bonus - the current format makes this
hard as every line ends up being different.

Yes.  git diff inside of the manifest repository will give a much more concise answer.

Brad Larson

May 22, 2009, 5:03:53 PM
to Repo and Gerrit Discussion
I'm not sure if I fully understand the proposal, so forgive me if this
seems uninformed.

> After some discussion with my coworkers I've come to realize that the
> current "repo sync" behavior of jumping to the latest version in each
> project is insane.

Why is that? If it is because of possible instability, couldn't you
just specify SHA1 tags in your manifest and update them when needed?

It seems like the proposed change is to always specify the SHA1 tag in
the manifest, and have gerrit automatically update the manifest when
code is merged. Won't that result in the same behavior of always
jumping to the latest version?

I guess that isn't the case if you aren't currently using gerrit...
but why would anyone not be using gerrit? ;)

> - Every project always has a current SHA-1.

By this, do you mean that a repo sync won't pull down changes past
what the manifest SHA1 specifies? This has been a nice feature for
us, so any developer can see what has changed in a project and merge
it in to their branch or update their manifest to point to a more
recent version. Will this capability still exist?

> - There is no .repo/local_manifest.xml. I can't decide how to represent it,
> or what aspects are important to maintain. Comments from users about this
> removal in functionality would be most appreciated.

> Manifest inheritance is supposed to fix that. It's never been implemented.

We are really looking forward to manifest inheritance, but are using
local_manifest.xml currently as a workaround. They aren't required
for our setup, but are very handy.

Other than better manifest diffs and automatic branch detection, I
don't understand what new features or benefits this change will
provide which aren't already supported by the current xml file. I
think the branch detection could be added to the xml format. Here is
our use-case (still somewhat under development):

We have several products sharing one gerrit install. Of the 170ish
repositories, about 110 are pristine copies from upstream and are used
by all products. Another 50 or so have a MyCompany branch, and all
the products use this branch. The other 10 repositories are fairly
product-specific and each product has their own branch. While the
product is in development mode, it follows tips of all internal
repositories and specifies SHA1 tags for upstream unmodified
repositories. When we hit release/bug squash mode, that product will
specify SHA1 hashes in the manifest for all repositories and only
update to newer versions of a repository when needed.

The current xml format works quite well for this. With the new
format, if product A submits a change to one of the 50 shared
repositories, will products B-E have to update their manifest by hand
to use the new change?

Brad

skillzero

May 26, 2009, 12:51:48 AM
to Repo and Gerrit Discussion
On May 22, 2:03 pm, Brad Larson <bklar...@gmail.com> wrote:
>
> We have several products sharing one gerrit install.  Of the 170ish
> repositories, about 110 are pristine copies from upstream and are used
> by all products.  Another 50 or so have a MyCompany branch, and all
> the products use this branch.  The other 10 repositories are fairly
> product-specific and each product has their own branch.  While the
> product is in development mode, it follows tips of all internal
> repositories and specifies SHA1 tags for upstream unmodified
> repositories.  When we hit release/bug squash mode, that product will
> specify SHA1 hashes in the manifest for all repositories and only
> update to newer versions of a repository when needed.
>
> The current xml format works quite well for this.  With the new
> format, if product A submits a change to one of the 50 shared
> repositories, will products B-E have to update their manifest by hand
> to use the new change?

I'm also curious about this. My main reason for using repo is that I
don't have to manually specify or update SHA-1 hashes. I'm sort of
worried now because I've been proposing my group to switch to repo,
but if we have to manually update each manifest repository every time
somebody pushes a change to a sub-project (which is like 30 times a
day on average), we're in for a ton of work to keep everything in
sync. It's not possible for us to use an update hook (at least I don't
think it is) because some people that have access to a sub-project
don't have access to all the super projects that use it so they
wouldn't be able to update some super projects.

Shawn Pearce

May 26, 2009, 11:48:55 AM
to repo-d...@googlegroups.com
On Fri, May 22, 2009 at 14:03, Brad Larson <bkla...@gmail.com> wrote:

I'm not sure if I fully understand the proposal, so forgive me if this
seems uninformed.

> After some discussion with my coworkers I've come to realize that the
> current "repo sync" behavior of jumping to the latest version in each
> project is insane.

Why is that?  If it is because of possible instability,

Yes, it's because of instability.  We are finding that some changes span 5 or 6 projects at once, and if you get a change in project A but not the corresponding change in project B, then the build fails.  If project B is causing a merge conflict, but A has already submitted, we may have the entire system unbuildable for hours while the conflict is resolved and resubmitted.  That puts the entire team at a disadvantage, because now if you sync, you get A, and you can't compile to test.  If you don't sync, you can perhaps compile, but maybe you are missing a critical but unrelated bug fix in Z that was submitted.  This puts everyone between a rock and a hard place, with no way out.  :-)
 
couldn't you
just specify SHA1 tags in your manifest and update them when needed?

Yes.

However, if you put a SHA1 tag in your manifest, then tools like "repo upload" break, because they try to upload to refs/for/$SHA1, and $SHA1 is never a valid branch name in a project.  (Who really names their branches 498a0e8a79ab76eeb6adc40f12b04d59820716f9 or 5d15cf383abb5166d6233779042c7f31eaac26bd, I mean, that's just too obtuse of a name.)
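The following client-side guard shows the problem: a revision that is a full 40-hex SHA-1 can pin a commit, but it cannot name an upload target. `upload_ref` is a hypothetical helper for illustration, not repo's actual upload logic:

```sh
# upload_ref REVISION
# Prints the refs/for/<branch> target an upload would push to, refusing
# revisions that look like a full 40-hex SHA-1, since no project has a
# branch with such a name.  Hypothetical helper for illustration.
upload_ref() {
  if [ -z "$1" ]; then
    echo "error: no revision given" >&2
    return 1
  fi
  if printf '%s\n' "$1" | grep -Eq '^[0-9a-f]{40}$'; then
    echo "error: $1 is a commit SHA-1, not a branch name" >&2
    return 1
  fi
  printf 'refs/for/%s\n' "$1"
}
```

In the proposed format the pin lives in the gitlink. The revision property can therefore stay a real branch name, and a check like this would not trip.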

Also, editing the XML manifest files by machine is somewhat ugly; you need a parser of some sort to go through the tag soup, make the edits, and format it back out.  The most common way to do this is with a DOM parser, which also causes any hand-formatting to be lost.  E.g. look at the hand-crafted manifests from the android-1.0 time period, compared to the output of `repo manifest -o -` today.

We're also likely to see many merge conflicts in the manifest file if it uses SHA1 tags; any time the revision changes, it's a potential conflict.  Resolving such conflicts in XML is rather difficult to do by machine.  E.g. the conflict file produced by a naive git merge isn't a valid XML file that can be processed.  It is possible to reobtain the 2 ancestor revisions, but the common base is much more difficult, especially if this was a criss-cross merge[1] case and the merge base was synthesized in core by git merge.  Even if we ignore the criss-cross case, we need to parse the 3 XML files and then write our own merge algorithm over the DOM tree.  Yuck.

[1] http://revctrl.org/CrissCrossMerge

By switching to the git submodule format, where the most mutable data, the project SHA1, is stored in the Git tree as a gitlink entity, merge conflicts show up in the index structure as discrete records, making it easier to extract the sides, compute the merge result, and update it to resolve the conflict.  It's also easier to update any project SHA1, by using `git update-index --cacheinfo`.  No XML editing.  Just a tree entry update.
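For illustration, here is a sketch that pulls the three sides of a conflicted gitlink out of `git ls-files -u` output, where stages 1, 2 and 3 are base, ours and theirs. `gitlink_conflict` is a hypothetical helper, and it assumes paths without whitespace:

```sh
# gitlink_conflict PATH
# Reads `git ls-files -u` output on stdin and prints the base, ours and
# theirs SHA-1s (index stages 1, 2, 3) of the conflicted gitlink at PATH.
# A stage that is absent (e.g. no common base) is printed as "-".
# Hypothetical helper; assumes paths contain no whitespace.
gitlink_conflict() {
  awk -v want="$1" '
    $1 == "160000" && $4 == want { sha[$3] = $2 }
    END {
      for (s = 1; s <= 3; s++)
        printf "%s%s", ((s in sha) ? sha[s] : "-"), (s < 3 ? " " : "\n")
    }'
}
```

Once a merge tool has chosen a SHA-1, `git update-index --cacheinfo 160000 <sha1> <path>` should record it as the resolved stage-0 entry.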

Also, the less mutable data, the project info pointers, are stored in a file that `git config` can edit from the command line.  This makes it easier to programmatically register a new project, or to remove an existing project, through a few `git config --file=.gitmodules` invocations.
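Here is a sketch of registering and removing a project this way, run from the top of the manifest repository. `add_project` and `remove_project` are hypothetical helpers. The optional fifth argument writes the proposed `revision` key:

```sh
# add_project NAME PATH URL SHA1 [REVISION]
# Writes the .gitmodules keys for NAME and stages a gitlink for PATH at SHA1.
# Hypothetical helper; run from the top level of the manifest repository.
add_project() {
  git config --file=.gitmodules "submodule.$1.path" "$2" &&
  git config --file=.gitmodules "submodule.$1.url" "$3" &&
  if [ -n "$5" ]; then
    git config --file=.gitmodules "submodule.$1.revision" "$5"
  fi &&
  git update-index --add --cacheinfo 160000 "$4" "$2" &&
  git add .gitmodules
}

# remove_project NAME PATH
# Drops the [submodule "NAME"] section and unstages the gitlink at PATH.
remove_project() {
  git config --file=.gitmodules --remove-section "submodule.$1" &&
  git update-index --force-remove "$2" &&
  git add .gitmodules
}
```

The change is only staged; a normal `git commit` in the manifest repository would publish it.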

It seems like the proposed change is to always specify the SHA1 tag in
the manifest, and have gerrit automatically update the manifest when
code is merged.

Yes.
 
 Won't that result in the same behavior of always
jumping to the latest version?

Yes and no.  Let me try to explain again.

If a change only touches one project, then yes, as soon as it submits, the manifest jumps to that new revision.  So there is no difference.

However, if a change touches 2 or more projects, and they are interrelated, the first project submits, and the manifest does nothing.  When the second project submits, then the manifest jumps both projects simultaneously, in the same manifest update commit.  Thus clients syncing off that manifest either see no update, or they see both updates, but they are never presented a version of the manifest where only one of the 2 projects has been updated.
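The atomic update amounts to changing several gitlinks in a single commit. The sketch below is written in the style of the update hook. It assumes it runs inside the manifest repository with HEAD already pointing at a commit. `update_gitlinks` is a hypothetical name, and a scratch index keeps the working tree untouched:

```sh
# update_gitlinks MESSAGE PATH SHA1 [PATH SHA1 ...]
# Records every PATH -> SHA1 gitlink in ONE new commit on top of HEAD,
# so clients syncing the manifest see all of the updates or none of them.
# Hypothetical helper; run inside the manifest repository.
update_gitlinks() {
  msg=$1; shift
  tmpdir=$(mktemp -d) || return 1
  idx=$tmpdir/index              # must not exist yet; git creates it
  rc=1
  if oh=$(git rev-parse --verify HEAD) &&
     GIT_INDEX_FILE=$idx git read-tree "$oh"; then
    rc=0
    while [ $# -ge 2 ]; do
      GIT_INDEX_FILE=$idx git update-index --add --cacheinfo 160000 "$2" "$1" ||
        { rc=1; break; }
      shift 2
    done
    if [ "$rc" -eq 0 ]; then
      # Write the new tree, commit it on top of the old HEAD, and move
      # HEAD only if it still points at the commit we started from.
      tree=$(GIT_INDEX_FILE=$idx git write-tree) &&
      nh=$(printf '%s\n' "$msg" | git commit-tree "$tree" -p "$oh") &&
      git update-ref -m "$msg" HEAD "$nh" "$oh" || rc=1
    fi
  fi
  rm -rf "$tmpdir"
  return "$rc"
}
```

Calling it once with both the dalvik and build pairs yields one manifest commit. A client syncing that manifest never sees one project updated without the other.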
 
I guess that isn't the case if you aren't currently using gerrit...
but why would anyone not be using gerrit?  ;)

Some folks haven't gotten on the Gerrit bandwagon.  They are comfortable with Gitorious.  Or they have multiple remote sites and Gerrit's single central point is too difficult to work with.  Etc.
 
> - Every project always has a current SHA-1.


By this, do you mean that a repo sync won't pull down changes past
what the manifest SHA1 specifies?

Correct.  Well, it would download them, e.g. the korg/master branch would be updated, but the m/master branch wouldn't be updated in that project.  So you would still have access to the commits and the files, but they wouldn't appear in the working tree, and they wouldn't appear yet in the m/master branch, since the m/* namespace tracks the manifest.
 
 This has been a nice feature for
us, so any developer can see what has changed in a project and merge
it in to their branch or update their manifest to point to a more
recent version.  Will this capability still exist?

I guess.  See above about repo sync at least downloading the commits.  repo upload might be confused, however, as it might claim it's about to upload changes that aren't yet in the manifest, but were already submitted.  Actually, that's just a bug in the way repo upload formats its output.  I should fix that.
 
> - There is no .repo/local_manifest.xml.  I can't decide how to represent it,
> or what aspects are important to maintain.  Comments from users about this
> removal in functionality would be most appreciated.


> Manifest inheritance is supposed to fix that.  It's never been implemented.

We are really looking forward to manifest inheritance, but are using
local_manifest.xml currently as a workaround.  They aren't required
for our setup, but are very handy.

Yeah, point taken.  We at Google would also really benefit from having manifest inheritance.  I should implement that as part of this change; it sounds like it's necessary to really make it practical for anyone.
 
Other than better manifest diffs and automatic branch detection, I
don't understand what new features or benefits this change will
provide which aren't already supported by the current xml file.  I
think the branch detection could be added to the xml format.

I think it is going to be easier to implement automatic manifest management with this format than with one that is based on XML.  It's harder to get an invalid manifest, and during merges in the manifest project, it's easier to see and resolve the conflict, as there's less weird XML parsing that has to occur.

Plus, you could actually implement a shell script that resides in $GIT_DIR/hooks/update on the server side to update the manifest automatically when certain branches in the project change.  Updating the manifest is just a matter of editing an index file, e.g.:

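  #!/bin/sh
  # Server-side update hook, run by git as: hooks/update <ref> <old> <new>
  # Untested sketch: assumes the repository lives at .../android/<path>.git,
  # where <path> matches its path in the manifest, and that the manifest
  # repository is ../manifest.git.

  commit=$3    # new SHA-1 of the ref being updated
  path=$(pwd | sed -e 's,^.*/android/,,' -e 's,\.git$,,')

  # Build the new manifest tree in a scratch index, not the real one.
  export GIT_INDEX_FILE=/tmp/gitindex.$$
  export GIT_DIR=../manifest.git
  if oh=$(git rev-parse HEAD) &&
    git read-tree --reset $oh &&
    git update-index --cacheinfo 160000 $commit $path &&
    nh=$(echo Update $(pwd) | git commit-tree $(git write-tree) -p $oh) &&
    git update-ref HEAD $nh $oh
  then
    rc=0
  else
    echo "fatal: Cannot update manifest $GIT_DIR"
    rc=1
  fi
  rm -f $GIT_INDEX_FILE
  exit $rc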

That's it.  Now try doing that with XML, in Bourne shell.  Sure, in Python with an XML parser module it's not too bad, but it's still a bit more work.  OK, actually that doesn't pay attention to the "revision" property of the manifest, and only updates one branch, but it's a general sketch of how to do it.  :-)
 
 Here is
our use-case (still somewhat under development):

We have several products sharing one gerrit install.  Of the 170ish
repositories, about 110 are pristine copies from upstream and are used
by all products.  Another 50 or so have a MyCompany branch, and all
the products use this branch.  The other 10 repositories are fairly
product-specific and each product has their own branch.  While the
product is in development mode, it follows tips of all internal
repositories and specifies SHA1 tags for upstream unmodified
repositories.  When we hit release/bug squash mode, that product will
specify SHA1 hashes in the manifest for all repositories and only
update to newer versions of a repository when needed.

The current xml format works quite well for this.  With the new
format, if product A submits a change to one of the 50 shared
repositories, will products B-E have to update their manifest by hand
to use the new change?

Depends.

If products B-E have manifests that say "revision = ." for that project, Gerrit will update A-E all at once.  Well, it's impossible to do a cross-project atomic update, but Gerrit will bang out the manifest updates as fast as it can, doing every branch of every manifest it manages that wants that new revision.

If products B-E have manifests that lack "revision", or that point at a different branch, Gerrit won't touch them.

IOW, Gerrit will have a big cross reference table, listing for every branch of every project, every branch of every dependent manifest.  A submit will cause Gerrit to go through that table, updating every manifest.  That may be hundreds of manifests, or just one.  Entirely depends on what is configured in them.  Updates to the manifest's ".gitmodules" file would cause Gerrit to update this cross reference table.
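How Gerrit would store that table is not specified in this thread. Here is a minimal sketch of the lookup itself. It assumes a flat, whitespace-separated table with one `MANIFEST_BRANCH PROJECT REVISION` line per entry; both that layout and `manifests_to_update` are hypothetical:

```sh
# manifests_to_update PROJECT BRANCH
# Reads the cross reference table on stdin, one entry per line:
#   MANIFEST_BRANCH PROJECT_NAME [REVISION]
# and prints each manifest branch whose entry for PROJECT tracks BRANCH.
# REVISION "." tracks the branch named like the manifest branch itself;
# entries with no REVISION are never updated automatically.
manifests_to_update() {
  awk -v proj="$1" -v br="$2" '
    NF >= 3 && $2 == proj {
      tracked = ($3 == ".") ? $1 : $3
      if (tracked == br) print $1
    }'
}
```

A submit to platform/dalvik's "cupcake" branch would then select every manifest branch printed for `manifests_to_update platform/dalvik cupcake`, and each of those would get its own manifest commit.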

Shawn Pearce

unread,
May 26, 2009, 11:54:30 AM5/26/09
to repo-d...@googlegroups.com
On Mon, May 25, 2009 at 21:51, skillzero <skil...@gmail.com> wrote:

I'm also curious about this. My main reason for using repo is that I
don't have to manually specify or update SHA-1 hashes. I'm sort of
worried now because I've been proposing my group to switch to repo,
but if we have to manually update each manifest repository every time
somebody pushes a change to a sub-project (which is like 30 times a
day on average), we're in for a ton of work to keep everything in
sync. It's not possible for us to use an update hook (at least I don't
think it is) because some people that have access to a sub-project
don't have access to all the super projects that use it so they
wouldn't be able to update some super projects.

Interesting.  So you guys would actually be negatively impacted by this change, since you aren't using Gerrit and can't rely on it to update the manifest for you, and users are executing with insufficient privileges to update the manifests via hook.

Have you considered using Gitosis[1]?  It would allow everyone to run as a single UNIX user (e.g. "git"), so that the update hook runs under a single identity, which has write access to all repositories.  This would permit an update hook to update the manifests where necessary.

But... you'd have to develop that update hook.  I just posted an untested fragment of what should be the basic steps necessary to write such a hook, but it ignores the notion of multiple branches in the manifest repository, or the "revision" property in .gitmodules, or the existence of multiple manifests.

I wonder if a middleground is possible.  Like a user level configuration that says "just always float to the latest", and have the client do the floating, like it does now.  So the manifest SHA-1s may be stale, but the client would just float ahead to current branch tips anyway.  With that, you might only update the manifest SHA-1s to tag a stable release point, e.g. a build you give to testing prior to release to a customer, but during normal development, the manifest just stays really old.
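Here is a sketch of the per-project decision such a client setting would control. The mode names `pinned` and `float` and the helper `sync_target` are hypothetical; nothing like this exists in repo yet:

```sh
# sync_target MODE GITLINK_SHA1 BRANCH_TIP_SHA1
# Chooses the commit a client would check out for one project:
#   pinned -> the SHA-1 recorded in the manifest's gitlink (proposed default)
#   float  -> the current tip of the tracked branch, as repo behaves today,
#             falling back to the gitlink if no tip is known
# Hypothetical helper illustrating the idea.
sync_target() {
  case "$1" in
    pinned) printf '%s\n' "$2" ;;
    float)  printf '%s\n' "${3:-$2}" ;;
    *)      echo "unknown sync mode: $1" >&2; return 1 ;;
  esac
}
```

With `float`, stale manifest SHA-1s cost nothing during development, and they only matter once someone switches back to `pinned` to reproduce a tagged release.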

[1] http://eagain.net/gitweb/?p=gitosis.git

Jean-Baptiste Queru

May 26, 2009, 2:11:30 PM
to repo-d...@googlegroups.com
I've been trying and failing to find a proper way to say this. I
apologize in advance as it's going to be hard to follow.

I'm concerned about the complexity of propagating/preserving manifest
changes through an auto-merge process.

We (Google) currently have an auto-merge process that walks our 100+
projects every few minutes, and whenever it finds a new change in some
specified branch (e.g. in cupcake) it tries to merge it into some
other specified branch (e.g. into donut).

The process is currently already subject to race conditions
(short-term livelocks): the auto-merger syncs, performs a merge, and
attempts a simple push of the result (which fails if the result isn't
a fast-forward, i.e. if someone else touched that project during that
time window).

I'm trying to picture how such a process would work (or could be made
to work) with the new scheme that you are proposing.

JBQ

Shawn Pearce

May 26, 2009, 2:19:48 PM
to repo-d...@googlegroups.com
On Tue, May 26, 2009 at 11:11, Jean-Baptiste Queru <j...@android.com> wrote:

I'm concerned about the complexity of propagating/preserving manifest
changes through an auto-merge process.

Me too.  This is probably a better conversation to have offline, with the folks who actually write and manage that auto-merge process.  It's unique to Google's current abuse of repo and Gerrit and Git, and until we have a good handle on that tool, making it available to others is worse than keeping it private.
 
We (Google) currently have an auto-merge process that walks our 100+
projects every few minutes, and whenever it find a new change in some
specified branch (e.g. in cupcake) it tries to merge it into some
other specified branch (e.g. into donut).

The process is currently already subject to race conditions
(short-term livelocks): the auto-merger syncs, performs a merge, and
attempts a simple push of the result (which fails if the result isn't
a fast-forward, i.e. if someone else touched that project during that
time window).

I'm trying to picture how such a process would work (or could be made
to work) with the new scheme that you are proposing.

Yea, me too.  I don't really have a good answer here.  Or, the answer is more complex than what I can try to explain in an email with very little context.  :-)

skillzero

May 26, 2009, 4:01:45 PM5/26/09
to Repo and Gerrit Discussion
On May 26, 8:54 am, Shawn Pearce <s...@google.com> wrote:
>
> Interesting.  So you guys would actually be negatively impacted by this
> change, since you aren't using Gerrit and can't rely it to update the
> manifest for you, and users are executing with insufficient privileges to
> update the manifests via hook.
>
> Have you considered using Gitosis[1]?  It would allow everyone to run as a
> single UNIX user (e.g. "git"), so that the update hook runs under a single
> identity, which has write access to all repositories.  This would permit an
> update hook to update the manifests where necessary.

I haven't looked at Gitosis yet. It seems like that would allow
visibility of all the projects if they could see what the update hook
is doing. Secrecy is the main reason some users don't have access to
certain projects. We don't want to disclose certain projects to
certain users because of sometimes crazy company policy. But maybe I
don't fully understand how Gitosis works.

> I wonder if a middleground is possible.  Like a user level configuration
> that says "just always float to the latest", and have the client do the
> floating, like it does now.  So the manifest SHA-1s may be stale, but the
> client would just float ahead to current branch tips anyway.  With that, you
> might only update the manifest SHA-1s to tag a stable release point, e.g. a
> build you give to testing prior to release to a customer, but during normal
> development, the manifest just stays really old.

That seems reasonable (i.e. it supports everything it does today, but
if you happened to look at the manifest XML equivalent, you'd see old
SHA-1 hashes even though it would be using the tip of whatever branch
the super project decided it wanted to use for the sub project).

Is there a reason it even needs to store the SHA-1 hash if it's not
being used? Maybe the SHA-1 hash could be optional. If it's present
then it always uses it even if the branch tip has advanced. If the
SHA-1 hash is missing then it uses the branch tip.
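The two client-side rules discussed here (Shawn's user-level "always float" setting and the optional SHA-1) reduce to a small decision per project. A sketch only; `checkout_target` and its parameters are hypothetical names, not anything in repo:

```python
from typing import Optional


def checkout_target(pinned_sha1: Optional[str], branch_tip: str, float_to_latest: bool) -> str:
    """Choose the commit to check out for one project.

    pinned_sha1 is the manifest's SHA-1, or None if the (hypothetical)
    manifest omits it; float_to_latest models the user-level
    "just always float" setting.
    """
    if float_to_latest or pinned_sha1 is None:
        return branch_tip      # float ahead to the current branch tip, as repo does today
    return pinned_sha1         # honour the pinned commit even if the branch has advanced
```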

skillzero

May 26, 2009, 4:09:57 PM5/26/09
to Repo and Gerrit Discussion
On May 26, 8:48 am, Shawn Pearce <s...@google.com> wrote:
>
> On Fri, May 22, 2009 at 14:03, Brad Larson <bklar...@gmail.com> wrote:
>
> > couldn't you
> > just specify SHA1 tags in your manifest and update them when needed?
>
> Yes.
>
> However, if you put a SHA1 tag in your manifest, then tools like "repo
> upload" break, because they try to upload to refs/for/$SHA1, and $SHA1 is
> never a valid branch name in a project.  (Who really names their branches
> 498a0e8a79ab76eeb6adc40f12b04d59820716f9 or
> 5d15cf383abb5166d6233779042c7f31eaac26bd, I mean, that's just too obtuse of
> a name.)

What about adding a new "branch" attribute to augment the "revision"
attribute? This would tell repo what git branch to use and the
"revision" attribute could optionally be used to lock the project down
to a specific commit ID?

Jean-Baptiste Queru

May 26, 2009, 4:12:11 PM5/26/09
to repo-d...@googlegroups.com
Right now you can specify (e.g.) revision="refs/heads/master".

JBQ

Shawn Pearce

May 26, 2009, 6:00:33 PM5/26/09
to repo-d...@googlegroups.com
On Tue, May 26, 2009 at 13:01, skillzero <skil...@gmail.com> wrote:

On May 26, 8:54 am, Shawn Pearce <s...@google.com> wrote:
>
> Have you considered using Gitosis[1]?
I haven't looked at Gitosis yet. It seems like that would allow
visibility of all the projects if they could see what the update hook
is doing. Secrecy is the main reason some users don't have access to
certain projects. We don't want to disclose certain projects to
certain users because of sometimes crazy company policy. But maybe I
don't fully understand how Gitosis works.

No, you should be able to mark the other projects' manifests as not visible, and unless the update hook displays information to stdout or stderr, the user won't know what is occurring behind the scenes.  So just make sure your update hook doesn't print information the user shouldn't know, and it's fine.
 
> I wonder if a middleground is possible.  Like a user level configuration
> that says "just always float to the latest", and have the client do the
> floating, like it does now.  So the manifest SHA-1s may be stale, but the
> client would just float ahead to current branch tips anyway.  With that, you
> might only update the manifest SHA-1s to tag a stable release point, e.g. a
> build you give to testing prior to release to a customer, but during normal
> development, the manifest just stays really old.

Is there a reason it even needs to store the SHA-1 hash if it's not
being used? Maybe the SHA-1 hash could be optional. If it's present
then it always uses it even if the branch tip has advanced. If the
SHA-1 hash is missing then it uses the branch tip.

The proposed format relies partially on the entry in the ".gitmodules" file, and partially on the gitlink entry in the tree.  The gitlink entry must have a SHA-1; it could be 0{40}, but why do that when we can just point to a valid commit, albeit maybe an old one?  I don't want to relax the parsing rules for the manifest to permit one but not the other, as that may make it easy to omit something that is critical.  I think it is better to just require both.
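A minimal sketch of the "require both" rule, checking the `.gitmodules` entries against the gitlink (mode 160000) entries reported by `git ls-files --stage`. The parser is deliberately simplified (no includes, quoting or escapes), and `check_manifest` and its helpers are hypothetical, not repo code.

```python
import re
from typing import Dict, List

GITLINK_MODE = "160000"


def parse_gitmodules_paths(text: str) -> Dict[str, str]:
    """Map submodule name -> path (simplified: no includes, quoting or escapes)."""
    paths: Dict[str, str] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        header = re.fullmatch(r'\[submodule "([^"]+)"\]', line)
        if header:
            current = header.group(1)
        elif current is not None and "=" in line:
            key, value = (part.strip() for part in line.split("=", 1))
            if key == "path":
                paths[current] = value
    return paths


def parse_gitlinks(ls_files_stage: str) -> Dict[str, str]:
    """Map path -> SHA-1 for the gitlink entries in `git ls-files --stage` output."""
    links: Dict[str, str] = {}
    for line in ls_files_stage.splitlines():
        if not line.strip():
            continue
        mode, sha1, _stage, path = line.split(None, 3)
        if mode == GITLINK_MODE:
            links[path] = sha1
    return links


def check_manifest(gitmodules: str, ls_files_stage: str) -> List[str]:
    """List problems; an empty list means every project has both halves."""
    paths = parse_gitmodules_paths(gitmodules)
    links = parse_gitlinks(ls_files_stage)
    problems = [f"{name}: no gitlink at {path}"
                for name, path in paths.items() if path not in links]
    declared = set(paths.values())
    problems += [f"{path}: gitlink without a .gitmodules entry"
                 for path in links if path not in declared]
    return problems
```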

Gergely Kis

May 26, 2009, 7:55:50 PM5/26/09
to repo-d...@googlegroups.com
Hi,

The thread starter email references the classic debate about working copy update policies: "latest greatest" or "controlled, tested version".
I think this is an issue that any reasonably sized project has to overcome in its lifecycle.

On Tue, May 26, 2009 at 5:48 PM, Shawn Pearce <s...@google.com> wrote:

If a change only touches one project, then yes, as soon as it submits, the manifest jumps to that new revision.  So there is no difference.

However, if a change touches 2 or more projects, and they are interrelated, the first project submits, and the manifest does nothing.  When the second project submits, then the manifest jumps both projects simultaneously, in the same manifest update commit.  Thus clients syncing off that manifest either see no update, or they see both updates, but they are never presented a version of the manifest where only one of the 2 projects has been updated.
Probably I don't understand something: Is it planned to also submit manifest updates as part of a repo upload?

Actually I think it would be a good idea: it creates a "project-wide" atomic commit, which references other commits in other repositories. This way it is very easy to reproduce the exact state of the project: a single commit in the manifest repository does the trick.
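The submit behaviour quoted above (the first project submits and the manifest does nothing; the last one moves every project in a single manifest commit) can be modelled as follows. How the server learns which changes belong together is still open in this thread, so the `group` argument and the `Manifest` class are assumptions for illustration.

```python
from typing import Dict, List, Set


class Manifest:
    """Hypothetical model: each published snapshot is a dict path -> SHA-1."""

    def __init__(self, snapshot: Dict[str, str]):
        self.history: List[Dict[str, str]] = [dict(snapshot)]
        self.pending: Dict[str, Dict[str, str]] = {}  # group id -> submits so far

    @property
    def current(self) -> Dict[str, str]:
        return self.history[-1]

    def submit(self, group: str, members: Set[str], path: str, sha1: str) -> bool:
        """Record one project's submit; publish only once every member is in."""
        got = self.pending.setdefault(group, {})
        got[path] = sha1
        if set(got) != set(members):
            return False                 # earlier submits: the manifest does nothing
        new = dict(self.current)
        new.update(self.pending.pop(group))
        self.history.append(new)         # one manifest commit moves all projects together
        return True
```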

It also provides an easy way to communicate between developers: something like SVN supports with tagging a working copy into the repository.

Also, different manifest commits could be tagged by different labels, thus implement a build/configuration promotion system.

If repo itself can easily create new manifest revisions from specific project states, and submit those to Gerrit or just push them into the master manifest repository, I don't see a problem with the "manual" gitlink changes. Repo should also include a possibility to update to the tip of each branch in the project, and generate the appropriate manifest for that.

I would like to add one more consideration into the mix: handling continuous integration.

I started to implement repo support for Hudson (hudson.dev.java.net), so it can be used as a CI for Android.
Of course, builds have to be reproducible. I saw that repo can output a manifest xml format, which lists all the commits for each repository. I planned to store this xml file as a build artifact so the build can be reproduced. It would be much nicer to use a single commit hash which points to the manifest repository.

I was also thinking on how to support the review process with a CI server.
It would periodically check the not yet merged and not abandoned changes in gerrit, merge them into a workspace, and try to build it. If it does not build, it would automatically add a -1 comment; if it builds, it would add a +1 comment.

It is of course a question whether changes are applied one at a time, or the CI tries to include as many changes as possible into each build.

Also, most probably after a change is merged, the pending changes should be checked whether they can be still merged or not. If not, then they should be excluded from further CI runs, and a report should be sent to the creator of the patch, so they can submit an updated version.
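One polling pass of this CI idea might look like the following, using the one-change-at-a-time strategy (batching several changes per build is the other option mentioned). `Change`, `verify_pending` and the status strings are hypothetical; `try_merge` and `build` stand in for whatever the CI server actually runs.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Change:
    number: int
    status: str   # hypothetical values: "NEW", "MERGED", "ABANDONED"
    owner: str


def verify_pending(changes: List[Change],
                   try_merge: Callable[[Change], bool],
                   build: Callable[[Change], bool]) -> Tuple[Dict[int, int], List[Change]]:
    """Score each open change +1/-1; collect changes that no longer merge."""
    scores: Dict[int, int] = {}
    stale: List[Change] = []
    for change in changes:
        if change.status != "NEW":
            continue                  # skip merged and abandoned changes
        if not try_merge(change):
            stale.append(change)      # exclude from CI runs and notify change.owner
            continue
        scores[change.number] = 1 if build(change) else -1
    return scores, stale
```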

What do you think?

Best Regards,
Gergely

Shawn Pearce

May 26, 2009, 8:23:37 PM5/26/09
to repo-d...@googlegroups.com
On Tue, May 26, 2009 at 16:55, Gergely Kis <gerge...@gmail.com> wrote:

The thread starter email references the classic debate about working copy update policies: "latest greatest" or "controlled, tested version".

Yes, true, that is a good summary of this thread.  :-)
 
I think this is an issue that any reasonably sized project has to overcome in its lifecycle.

Android was already there before the move to Git.  It's just that we've lately realized the early decision of "latest greatest" wasn't the right one.  At least for Google's engineers.
 
On Tue, May 26, 2009 at 5:48 PM, Shawn Pearce <s...@google.com> wrote:

If a change only touches one project, then yes, as soon as it submits, the manifest jumps to that new revision.  So there is no difference.

However, if a change touches 2 or more projects, and they are interrelated, the first project submits, and the manifest does nothing.  When the second project submits, then the manifest jumps both projects simultaneously, in the same manifest update commit.  Thus clients syncing off that manifest either see no update, or they see both updates, but they are never presented a version of the manifest where only one of the 2 projects has been updated.
Probably I don't understand something: Is it planned to also submit manifest updates as part of a repo upload?

I hadn't initially drafted it in this RFC, but I did consider it for about 2 seconds.  I think the idea has a lot of merit.
 
Actually I think it would be a good idea: it creates a "project-wide" atomic commit, which references other commits in other repositories. This way it is very easy to reproduce the exact state of the project: a single commit in the manifest repository does the trick.

Exactly.  And then repo download can better match your working state to the state of the change you are downloading, so it is more likely to compile as-is.
 
It also provides an easy way to communicate between developers: something like SVN supports with tagging a working copy into the repository.

Yes.  Some developers are already trying to use a patch file created by "repo diff", which is hard because you need to slice it up and feed it to "git apply" piecemeal.  It would be much easier if this was more supported natively in repo.
 
Also, different manifest commits could be tagged by different labels, thus implement a build/configuration promotion system.

Yup.
 
If repo itself can easily create new manifest revisions from specific project states, and submit those to Gerrit or just push them into the master manifest repository, I don't see a problem with the "manual" gitlink changes. Repo should also include a possibility to update to the tip of each branch in the project, and generate the appropriate manifest for that.

Yes, but it gets ugly with live-lock when you are talking about updating the manifest repository.  E.g. you might succeed in pushing to the projects, but fail on pushing the manifest, as someone else has already updated the manifest.  Then repo would need to retry the manifest.  A bad client could push the manifest first, then the projects, resulting in the classic "out of order lock acquisition" problem, but instead of causing deadlock, you'd get a nasty merge conflict in the manifest repository.
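The safe ordering described here (project pushes first, manifest push last, and only the manifest push retried on a non-fast-forward) can be sketched with a toy fast-forward-only `Branch`. `push_atomic_change` and `rebuild_manifest` are hypothetical names; a real client would also have to decide what to do when a project push itself is rejected.

```python
from typing import Callable, Dict, Tuple


class Branch:
    """Toy server-side branch that accepts fast-forward pushes only."""

    def __init__(self, tip: str):
        self.tip = tip

    def push(self, expected_tip: str, new_tip: str) -> bool:
        if self.tip != expected_tip:
            return False          # non-fast-forward: someone else got there first
        self.tip = new_tip
        return True


def push_atomic_change(projects: Dict[str, Branch], manifest: Branch,
                       updates: Dict[str, Tuple[str, str]],
                       rebuild_manifest: Callable[[str], str],
                       max_retries: int = 3) -> str:
    """Push project commits first, then the manifest; retry only the manifest.

    updates maps project name -> (tip we built on, new commit). rebuild_manifest
    takes the current manifest tip and returns a manifest commit on top of it
    that points at the new project commits. Until the manifest push lands,
    clients syncing off the manifest do not see the project commits.
    """
    for name, (old_tip, new_commit) in updates.items():
        if not projects[name].push(old_tip, new_commit):
            raise RuntimeError(f"{name} moved; re-sync and rebuild the change")
    for _ in range(max_retries):
        base = manifest.tip
        if manifest.push(base, rebuild_manifest(base)):
            return manifest.tip
    raise RuntimeError("manifest kept moving (live-lock); try again later")
```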
 
I would like to add one more consideration into the mix: handling continuous integration.

I started to implement repo support for Hudson (hudson.dev.java.net), so it can be used as a CI for Android.
Of course, builds have to be reproducible. I saw that repo can output a manifest xml format, which lists all the commits for each repository. I planned to store this xml file as a build artifact so the build can be reproduced. It would be much nicer to use a single commit hash which points to the manifest repository.

I was also thinking on how to support the review process with a CI server.
It would periodically check the not yet merged and not abandoned changes in gerrit, merge them into a workspace, and try to build it. If it does not build, it would automatically add a -1 comment; if it builds, it would add a +1 comment.

It is of course a question whether changes are applied one at a time, or the CI tries to include as many changes as possible into each build.

Also, most probably after a change is merged, the pending changes should be checked whether they can be still merged or not. If not, then they should be excluded from further CI runs, and a report should be sent to the creator of the patch, so they can submit an updated version.

What do you think?

I agree about the CI aspects.  FWIW, the "Verified" field is meant to be set +1/-1 by a CI system, as you describe above.

Actually, we've already talked about all of this internally at Google before we launched the AOSP.  But it's all been pie-in-the-sky engineering, because we've been dragged down in more mundane issues.  I haven't even had time to write it out as part of a roadmap for Gerrit Code Review.  I'm glad someone else shares the same thoughts on the matter, and has taken the time to write it out for us.  :-)

skillzero

May 26, 2009, 9:01:33 PM5/26/09
to Repo and Gerrit Discussion
On May 26, 1:12 pm, Jean-Baptiste Queru <j...@android.com> wrote:
>
> Right now you can specify (e.g.) revision="refs/heads/master".

Yes, but I was thinking of a new "branch" attribute in addition to the
"revision" attribute. Used together, uploads would go to refs/for/$branch,
but repo could still restrict things to $revision. It
sounded like the motivation for the manifest change is for some
workflows to avoid picking up the latest changes for a branch of a
project until other projects have also been updated (i.e. making a
multi-project push appear atomic).
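The upload problem and the proposed fix can be shown side by side. This sketch assumes the upload ref is built by stripping `refs/heads/` from the revision when no branch is given; `upload_and_sync_targets` and its `branch` parameter model the proposed attribute and are not existing repo code.

```python
from typing import Optional, Tuple

R_HEADS = "refs/heads/"


def upload_and_sync_targets(revision: str, branch: Optional[str] = None) -> Tuple[str, str]:
    """Return (ref to upload to, commit or ref to sync to) for one project.

    Without a branch attribute the upload ref has to be derived from revision,
    which yields refs/for/<sha1> when revision is a SHA-1. With the proposed
    branch attribute, uploads follow the branch while sync stays on revision.
    """
    if branch is None:
        branch = revision[len(R_HEADS):] if revision.startswith(R_HEADS) else revision
    return f"refs/for/{branch}", revision
```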

Gergely Kis

May 27, 2009, 4:38:12 AM5/27/09
to repo-d...@googlegroups.com
Hi

On Wed, May 27, 2009 at 2:23 AM, Shawn Pearce <s...@google.com> wrote:

 
If repo itself can easily create new manifest revisions from specific project states, and submit those to Gerrit or just push them into the master manifest repository, I don't see a problem with the "manual" gitlink changes. Repo should also include a possibility to update to the tip of each branch in the project, and generate the appropriate manifest for that.

Yes, but it gets ugly with live-lock when you are talking about updating the manifest repository.  E.g. you might succeed in pushing to the projects, but fail on pushing the manifest, as someone else has already updated the manifest.  Then repo would need to retry the manifest.  A bad client could push the manifest first, then the projects, resulting in the classic "out of order lock acquisition" problem, but instead of causing deadlock, you'd get a nasty merge conflict in the manifest repository.
I thought that currently repo sends changes through Gerrit, so merging from that direction would be serialized. The problem is the auto-merger and other "special people" :), who can directly push changes into the repositories. In the case of Google I think it should be possible to teach the auto-mergers to first get a "lock" from somewhere.
For other people, who don't use Gerrit, a similar locking scheme can be implemented quite easily in my opinion, maybe the locking part of Gerrit could be broken out into a separate tool.

Merging a change that spans multiple repositories will always be "risky" in the sense that if a patch cannot be applied to one repository, the whole process will fail, and the previous repositories will contain "junk" that is not yet usable. With this new manifest scheme this is not a problem in the sense that these changes won't be visible to others, but they will take up some storage space in the repositories. I don't know if it is a requirement to somehow clean out these commits, or if the notion is that they can probably be used later with an updated change that fixes the failing repositories.

Am I missing something? Are you trying to make sure that no such "global lock" is necessary?

Regarding merging the actual manifests: I think that instead of relying only on the built-in merge algorithms from git, a custom merge algorithm should be implemented, which understands the semantics of the manifest merge, e.g. knows that it is merging a change which actually happened in repo B and C and there it went without problems, so the respective SHA hashes can be safely changed to the right value.
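A minimal sketch of such a manifest-aware merge, treating each manifest as a map from project path to SHA-1 and merging project by project instead of line by line, so entry order is irrelevant. `merge_manifests` is hypothetical; a fuller version might also accept the case where one side's SHA-1 is an ancestor of the other's, which needs access to the project repositories.

```python
from typing import Dict, List, Optional, Tuple


def merge_manifests(base: Dict[str, str], ours: Dict[str, str],
                    theirs: Dict[str, str]) -> Tuple[Dict[str, str], List[str]]:
    """Three-way merge of {project path: SHA-1} maps, one project at a time.

    A project changed on one side only takes that side's SHA-1; identical
    changes agree; a project moved to two different SHA-1s is a conflict.
    Additions and removals are changes from/to "absent" (None).
    """
    merged: Dict[str, str] = {}
    conflicts: List[str] = []
    for path in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        result: Optional[str]
        if o == t or t == b:
            result = o
        elif o == b:
            result = t
        else:
            conflicts.append(path)
            result = o            # keep ours provisionally; someone must resolve it
        if result is not None:
            merged[path] = result
    return merged, conflicts
```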


 

I agree about the CI aspects.  FWIW, the "Verified" field is meant to be set +1/-1 by a CI system, as you describe above.

Actually, we've already talked about all of this internally at Google before we launched the AOSP.  But it's all been pie-in-the-sky engineering, because we've been dragged down in more mundane issues.  I haven't even had time to write it out as part of a roadmap for Gerrit Code Review.  I'm glad someone else shares the same thoughts on the matter, and has taken the time to write it out for us.  :-)
If this manifest format change is going to happen soon, it might make more sense to add support for it to the Hudson git module, instead of my current method of calling repo directly. I have to think about this a bit more.

The other parts, like talking to Gerrit can be implemented separately as a build notifier / publisher.
 
Best Regards,
Gergely

Brad Larson

May 27, 2009, 12:16:57 PM5/27/09
to Repo and Gerrit Discussion


On May 26, 10:48 am, Shawn Pearce <s...@google.com> wrote:
> On Fri, May 22, 2009 at 14:03, Brad Larson <bklar...@gmail.com> wrote:
>
> > I'm not sure if I fully understand the proposal, so forgive me if this
> > seems uninformed.
>
> > > After some discussion with my coworkers I've come to realize that the
> > > current "repo sync" behavior of jumping to the latest version in each
> > > project is insane.
>
> > Why is that?  If it is because of possible instability,
>
> Yes, it's because of instability.  We are finding that some changes span 5 or 6
> projects at once, and if you get a change in project A but not the
> corresponding change in project B, then the build fails.

Ah, I didn't realize this was to fix the multiple-project issue. The
proposal makes a lot more sense now.

We have considered adding support so a user can tell Gerrit that a
change depends on other changes - something more official than just
the patchset comments which we currently use for this purpose. But it
would still be up to the merger to merge everything at once. We've
avoided any problems so far by running a test merge on a clean sandbox
before running the actual merge if we have to merge several changes at
once.

So if my manifest says 'Follow tips of projects A - E', and someone
submits interdependent changes to projects A, B, and C, and they merge
cleanly on A and B but have a conflict on C, what will gerrit do? I
assume you want it to modify the manifest to specify the last good
commit for projects A and B while it waits on someone to fix the merge
conflicts with C? What if there are other manifests following tips of
A-E - it sounds like gerrit will update their manifests as well? What
if a manifest tracks tips of A and B, but is on a static revision of
C? (This is more a problem in general, and not with the current
proposal)

How will gerrit know that changes are interdependent? Will the user
specify this through git or a gerrit page?

I agree that changing the manifest from xml to a git config type file
will be beneficial on lots of fronts. Having gerrit update it
concerns me though... as an alternative, would it be possible for
gerrit to run a test merge on interdependent changes to projects A-C,
and only do the actual merge if it knows everything will merge
cleanly?
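The test-merge-first alternative can be sketched as an all-or-nothing gate. `merge_all_or_nothing`, `test_merge` and `real_merge` are hypothetical; note that a branch can still move between the test merge and the real one, so this narrows the window rather than closing it.

```python
from typing import Callable, Dict, List


def merge_all_or_nothing(changes: Dict[str, object],
                         test_merge: Callable[[str, object], bool],
                         real_merge: Callable[[str, object], None]) -> List[str]:
    """Merge interdependent changes only if every one of them merges cleanly.

    changes maps project -> change. test_merge runs in a scratch sandbox and
    publishes nothing; real_merge publishes. Returns the projects that blocked
    the merge (empty on success).
    """
    blocked = [project for project, change in changes.items()
               if not test_merge(project, change)]
    if blocked:
        return blocked            # nothing published; A and B wait together with C
    for project, change in changes.items():
        real_merge(project, change)
    return []
```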

Gergely Kis

May 28, 2009, 7:12:25 AM5/28/09
to repo-d...@googlegroups.com
Hi,

I was playing a bit with git submodule support and I would like to share a few of my thoughts.

On Thu, May 21, 2009 at 7:49 AM, Shawn Pearce <s...@google.com> wrote:

- I don't know where to put the review URL.  Given the URLs being relative, it suddenly seems odd to put the review URL into a "submodule.dalvik.review" key in the .gitmodules file.  Suggestions would be appreciated.
In the current manifest a review url is specified for a remote name, e.g. korg.
I think that the best way would be not to extend the .gitmodules file, but add a new .gerrit file. There you could specify the code review URL.
If we check the use cases:
1. "pure" a.g.k.o/r.android.com setup -> we need only one review url

2. local mirror setup, where we only push changes to local gerrit -> we need only one review url

3. local mirror setup, where we want to push to both a.g.k.o/r.android.com and to our local gerrit as well:
In this case we will have separate branches in the manifest repository: one for each configuration. If I want to push to a.g.k.o, I commit it on my topic branch, push it through my local gerrit, then switch my manifest to the a.g.k.o branch, and push the change using that configuration. (Of course there are many build, test, fix cycles omitted)

Even in this case, we only need a single line in the .gerrit (on each branch), which specifies the host name for gerrit.

Am I missing something?

 


- The manifest format matches what `git submodule` expects.  That means one could clone the manifest and manage to do a checkout (and possibly build) without repo.  Conversely, repo could automatically be used on any other already existing `git submodule` style project.

If I understand it correctly, repo creates the following working tree format:
.repo/
.repo/manifests.git -> the manifest repository
.repo/manifests -> a linked working tree of the manifest repository
.repo/projects/*/*.git -> the cloned repositories
.repo/repo -> the repo repository checked out with a working tree, so repo can find its files
dir1/dir2
dir3
....
dirN -> these are linked working trees, which link to .repo/projects/*/*.git

If I understand it correctly, the main advantage of this setup is that it should be possible to create multiple working tree setups without duplicating the repository clones themselves.

Also, it could be possible to configure different working trees for different products (e.g. dream requires different kernel ... etc.)

If I am not mistaken, this functionality is not yet implemented in repo (or it is well hidden :) )

In order to be able to build Android simply with git submodules, we need to be able to do the following:

git clone git://path/to/manifest.git mydroid
cd mydroid
git submodule init
git submodule update
make (*)

*) for make to work, we need to add the currently <copyfile/>-ed top level Makefile to the manifest repository.

In order for this to work we need to add the absolute urls (git://a.g.k.o/*/*.git) into .gitmodules.

How I would make repo work with this new scheme (assuming that I was right about my assumption above, why repo makes use of the linked working tree approach):

The directory hierarchy would look like this:
.repo
.repo/manifests.git -> clone of the manifests repository
.repo/projects/*/*.git -> clones of the other repositories
.repo/repo -> clone of the repo repository with a working tree so repo can find its files
.repo/working-trees -> a list of directories where workingtrees are stored based on this .repo storage

workingtree1 -> linked working tree of the manifest repository
workingtree1/.git/config -> includes the submodule links to the relative ../.repo/projects/*/*.git
workingtree1/.gitmodules -> the unaltered manifest configuration file, containing the full URLs of all the project repositories.
workingtree1/.repo-inherit -> a file to specify the parent manifest branch (or even commit)
workingtree1/.gerrit -> The code review URL
workingtree1/Makefile -> the currently <copyfile/> -ed Makefile which only includes the makefile from build anyways.
workingtree1/project1
....
workingtree1/projectN -> linked working trees of the projects to ../.repo/projects/*/*.git.

workingtree2
...
workingtreeN -> similar to workingtree1

workingtree1 ... workingtreeN would be named after the manifest branch name, e.g. master, master-dream (master branch configured for dream hardware),  donut, donut-dream ... etc.

Let's check the use cases:
1. Using without repo:
Works, but loses the multiple working tree and extra workflow functionality, like code review, manifest inheritance, manifest maintenance ... etc.

2. Using with repo:
adds multiple working tree support, additional workflow elements (creating / managing topic branches, creating atomic changes in multiple projects... etc.)
It also makes dealing with submodules more bulletproof than

3. Manifest inheritance
Manifest inheritance can be handled by a function in repo which will check the branch / commit specified in .repo-inherit, and make appropriate modifications in the .gitmodules file and the gitlink entries. This should be a semantic merge (repo should understand the structure and be agnostic to reordering in .gitmodules... etc.).

4. local_manifest.xml
local_manifest.xml is replaced by your own manifest branch, which inherits the original branch you wanted to use.

5. auto mergers
The above repo "semantic merge" feature could be used to automatically update all affected manifest branches after an  automerge

6. Atomic changes
When doing repo start, commit, upload the appropriate gitlink updates are done in the manifest repository as a commit and sent together with the changes in the other projects.

This has the advantage that "repo download" would actually set the exact state of the project to the one submitted by the change author.

However, if this is not desired (because the working tree already has changes), repo download would use the semantic merge feature.

The same semantic merge algorithm would be used by Gerrit when merging a change into the master git repositories.

7. Continuous Integration
Simple CI would be possible with the Hudson git plugin (it has submodule support). However, this has no direct integration with Gerrit code review, it only builds what is already merged into the selected branch.

For deeper Gerrit integration (verifying changes prior to merging) a repo plugin is still required.
Additional advantages:
- The CI server could use a common repo storage area, and each build job only uses a linked working tree with the specific manifest branch.
- With the atomic change support above it would be easier to implement the integration.
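For manifest inheritance (item 3 above), parsing `.gitmodules` into a map keyed by submodule name makes the result independent of section order, and a child manifest can then be overlaid on its parent. This is a sketch only: `.repo-inherit` is still a proposal, the overlay rule (child keys override, new entries are added) is an assumption, the project names in the usage example are made up, and the gitlink SHA-1s would need the same treatment.

```python
import re
from typing import Dict


def parse_gitmodules(text: str) -> Dict[str, Dict[str, str]]:
    """Parse .gitmodules into {name: {key: value}}; section order does not matter."""
    modules: Dict[str, Dict[str, str]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue
        header = re.fullmatch(r'\[submodule "([^"]+)"\]', line)
        if header:
            current = modules.setdefault(header.group(1), {})
        elif current is not None and "=" in line:
            key, value = (part.strip() for part in line.split("=", 1))
            current[key] = value
    return modules


def inherit(parent_text: str, child_text: str) -> Dict[str, Dict[str, str]]:
    """Overlay a child manifest on its parent (hypothetical .repo-inherit rule).

    Keys set by the child override the parent's; entries only in the child are added.
    """
    merged = {name: dict(keys) for name, keys in parse_gitmodules(parent_text).items()}
    for name, keys in parse_gitmodules(child_text).items():
        merged.setdefault(name, {}).update(keys)
    return merged
```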

What do you think?

Best Regards,
Gergely

Shawn Pearce

May 28, 2009, 11:12:06 AM5/28/09
to repo-d...@googlegroups.com
On Thu, May 28, 2009 at 04:12, Gergely Kis <gerge...@gmail.com> wrote:
On Thu, May 21, 2009 at 7:49 AM, Shawn Pearce <s...@google.com> wrote:

- I don't know where to put the review URL.  Given the URLs being relative, it suddenly seems odd to put the review URL into a "submodule.dalvik.review" key in the .gitmodules file.  Suggestions would be appreciated.
In the current manifest a review url is specified for a remote name, e.g. korg.
I think that the best way would be not to extend the .gitmodules file, but add a new .gerrit file. There you could specify the code review URL.

Ooooh.  I like that.

Here's why, we can do something like this:

  # on a.g.k.o / r.s.a.c manifests
  #
  $ echo review.source.android.com >.gerrit

  # on company internal manifest
  #
  $ echo review.our.company >.gerrit
  $ echo .gerrit merge=ourfile >>.gitattributes

  # one time setup per merging user
  #
  $ git config --global merge.ourfile.name 'always pick our file'
  $ git config --global merge.ourfile.driver 'cat %A'

Now you can update the internal company manifest by just "git pull korg donut", and git merge properly keeps your .gerrit file, no matter what happens to the upstream gerrit file.

I think the final question is, should ".gerrit" be a simple text file of *just* the hostname/URL, or should it be a git config style file, e.g.:

  $ git config --file=.gerrit review.url review.source.android.com
  $ cat .gerrit
  [review]
    url = review.source.android.com

The latter at least leaves us some room for the future, if we ever need to expand the file contents.  But I honestly can't see what else we'd need.
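If both formats end up being accepted, a reader can tell them apart by the leading `[`. A minimal sketch, with `read_review_url` a hypothetical helper and a deliberately simplified config parser (only the `[review]` section and its `url` key are read):

```python
def read_review_url(text: str) -> str:
    """Return the review host from a .gerrit file in either proposed format."""
    stripped = text.strip()
    if not stripped.startswith("["):
        return stripped                       # plain format: the file is just the host
    section = None
    for raw in text.splitlines():             # config format, as `git config --file` writes it
        line = raw.strip()
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1].strip()
        elif section == "review" and "=" in line:
            key, value = (part.strip() for part in line.split("=", 1))
            if key == "url":
                return value
    raise ValueError(".gerrit has no review.url entry")
```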

- The manifest format matches what `git submodule` expects.  That means one could clone the manifest and manage to do a checkout (and possibly build) without repo.  Conversely, repo could automatically be used on any other already existing `git submodule` style project.

If I understand it correctly, repo creates the following working tree format:
.repo/
.repo/manifests.git -> the manifest repository
.repo/manifests -> a linked working tree of the manifest repository
.repo/projects/*/*.git -> the cloned repositories
.repo/repo -> the repo repository checked out with a working tree, so repo can find its files
dir1/dir2
dir3
....
dirN -> these are linked working trees, which link to .repo/projects/*/*.git

If I understand it correctly, the main advantage of this setup is that it should be possible to create multiple working tree setups without duplicating the repository clones themselves.

No.  The main advantage of this setup was to permit deleting "dir3" when switching between a manifest that has "dir3", and one which does not have "dir3".  During that switch we'd want to delete the working directory / tracked source files (like git would do if this was just a normal subdirectory), but we do not want to lose the repository itself, as that might lose topic branches that only exist here, and future clone costs would be high when we switch back.  By storing the real repository under .repo/projects/ it's protected from the working directory delete, and can be easily reused during the recreate.
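The switch described here can be sketched as a plan over two manifests, each mapping project name to working-directory path. Only working directories are removed or created; the hypothetical `plan_switch` never touches the repositories under `.repo/projects/`, which is the point of the layout. A project whose path changed between manifests counts as both a delete and a create; the paths in the usage example are made up.

```python
from typing import Dict, List, Tuple


def plan_switch(old: Dict[str, str], new: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Return (working dirs to delete, working dirs to create) for a manifest switch.

    Both arguments map project name -> working-directory path. Repositories
    under .repo/projects/ are left alone, so topic branches survive and
    switching back needs no fresh clone.
    """
    delete = sorted(path for name, path in old.items() if new.get(name) != path)
    create = sorted(path for name, path in new.items() if old.get(name) != path)
    return delete, create
```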

FWIW, the only reason ".repo/manifests" is symlinked back to ".repo/manifests.git" is because that simplified the code internally in repo.  It's an implementation detail, not something by design.

Also, it could be possible to configure different working trees for different products (e.g. dream requires different kernel ... etc.)

Yes, and e.g. android-1.0 differs from android-1.5 (aka cupcake) in that some projects were moved, some were added, etc.
 
If I am not mistaken, this functionality is not yet implemented in repo (or it is well hidden :) )

Yes, we could also implement "git new-workdir" sort of semantics with repo, based upon this checkout format.  It just hasn't been hacked together yet.  It might not be too difficult; most of the code passes around this "repodir" value as the path to the ".repo/" directory.
 
In order to be able to build Android simply with git submodules, we need to be able to do the following:

git clone git://path/to/manifest.git mydroid
cd mydroid
git submodule init
git submodule update
make (*)

*) for make to work, we need to add the currently <copyfile/>-ed top level Makefile to the manifest repository.

Or just copy it by hand.  It's 3 lines.  You already went through two git submodule calls just to use this.  Might as well go through a 3rd cp.
 
In order for this to work we need to add the absolute urls (git://a.g.k.o/*/*.git) into .gitmodules.

The relative URL format I chose for the .gitmodules file was based upon the gitmodules documentation saying it supported relative URLs if the URL starts with "./".  Maybe the version of git you are looking at doesn't yet support relative URLs inside .gitmodules?

The advantage of the relative URL format is clear... a very large number of organizations mirror the Android tree internally.  Not needing to hack all of the URLs in the .gitmodules file would be an advantage to them.
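To make the relative URL discussion concrete, here is a sketch of the resolution rule as git documents it for `git submodule`: a URL starting with `./` or `../` is taken relative to the superproject's default remote URL, where each `../` strips one path component and `./` strips nothing. This is a simplified illustration that only handles `scheme://host/path` style URLs (not scp-like `host:path`), and `resolve_submodule_url` is a hypothetical name. Note that under this rule `./build.git` resolves *inside* the manifest URL's path, while `../build.git` names a sibling repository.

```python
def resolve_submodule_url(superproject_url: str, url: str) -> str:
    """Resolve a .gitmodules url against the superproject's remote URL.

    Absolute URLs are returned unchanged. For relative URLs, each leading
    "../" removes one path component from the superproject URL and each
    leading "./" removes nothing; the remainder is appended with "/".
    """
    if not (url.startswith("./") or url.startswith("../")):
        return url  # absolute URL: use as-is
    base = superproject_url.rstrip("/")
    rest = url
    while True:
        if rest.startswith("./"):
            rest = rest[2:]
        elif rest.startswith("../"):
            rest = rest[3:]
            base = base.rsplit("/", 1)[0]  # drop the last path component
        else:
            break
    return base + "/" + rest
```

With the manifest cloned from `git://android.git.kernel.org/platform/manifest.git`, the entry `../build.git` resolves to `git://android.git.kernel.org/platform/build.git`; the same `.gitmodules` cloned from an internal mirror resolves against the mirror instead, which is the benefit being argued for.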
 
How I would make repo work with this new scheme (assuming my guess above about why repo uses the linked working tree approach is right):

The directory hierarchy would look like this:
.repo
.repo/manifests.git -> clone of the manifests repository
.repo/projects/*/*.git -> clones of the other repositories
.repo/repo -> clone of the repo repository with a working tree so repo can find its files
.repo/working-trees -> a list of directories where workingtrees are stored based on this .repo storage

workingtree1 -> linked working tree of the manifest repository
workingtree1/.git/config -> includes the submodule links to the relative ../.repo/projects/*/*.git
workingtree1/.gitmodules -> the unaltered manifest configuration file, containing the full URLs of all the project repositories.
workingtree1/.repo-inherit -> a file to specify the parent manifest branch (or even commit)
workingtree1/.gerrit -> The code review URL
workingtree1/Makefile -> the Makefile currently installed via <copyfile/>, which only includes the makefile from build anyway.
workingtree1/project1
....
workingtree1/projectN -> linked working trees of the projects to ../.repo/projects/*/*.git.

workingtree2
...
workingtreeN -> similar to workingtree1

workingtree1 ... workingtreeN would be named after the manifest branch name, e.g. master, master-dream (master branch configured for dream hardware),  donut, donut-dream ... etc.

Interesting.

But, let's not make a ton of changes at once.  Some people are already unhappy that they need to create a directory to execute "repo init" inside of.  Asking them to move one more level down for "workingtree1" is going to annoy them.  Others, though, already have 3 or 4 parallel repo checkouts.  Clearly these people would benefit from this structure.

I like the multiple work tree approach... but let's do it after the manifest changes are done, rather than trying to do it at the same time, and let's make it optional.


I had planned on *not* creating a ".git" at the top of the working tree, so that commands like "git commit" always fail up here.  This way it's clear that you need to give more context to operate on a project, like cd'ing into a project's working directory, before you can execute a command.  Thus far we have explicitly never had a ".git" at the top level for this very reason.

With the switch to a submodule style format, I want to keep that approach.  The supermodule (aka manifest) is a VCS implementation detail that should be semi-hidden from users, not in their face at the root level.  I'm unhappy that it's "cd .repo/manifests" to access it, but it's better that it is out of sight most of the time.  I think many Android engineers at Google agree with me; they are used to "p4 client" being how they view/manipulate the equivalent of the manifest, and usually they don't think about what their client spec says, they just copy one from another engineer, and that's that.


As far as manifest inheritance, I realized this morning that the most likely proper way to do that is to just merge the parent manifest into your own.  The best way to explain this is with the example of the T-Mobile G1 device.  We really should have 5 manifests:

  platform/manifest.git             ->  the Android Open Source Project, e.g. build, dalvik, framework
  hardware/msm.git                -> msm chipset specific projects
  google/google-experience.git -> projects related to the "Google experience" device
  t-mobile/tmus.git                  -> T-Mobile customizations, like "myFaves"
  htc/g1.git                            -> a manifest that pulls all of these together.

The way to build htc/g1.git is:

  $ mkdir g1
  $ cd g1
  $ git init

  $ git pull git://android.git.kernel.org/platform/manifest.git cupcake
  $ git pull git://android.git.kernel.org/hardware/msm.git cupcake
  $ git pull google:/google/google-experience.git cupcake
  $ git pull tmobile:/tmus.git cupcake

If you need a new version of the base platform, just pull it in again:

  $ git pull git://android.git.kernel.org/platform/manifest.git cupcake


Where that gets ugly is the .gitmodules file.  A semantic merge of the .gitmodules file wouldn't be too difficult to define; it's a pretty simple thing to merge together.  We can do a "repo merge-driver-gitmodules" or something, help users define it in .git/config as a merge driver, and set up a .gitattributes to use our repo based merge driver for the .gitmodules file.
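A sketch of what such a semantic merge could look like, written as a standalone script rather than the hypothetical "repo merge-driver-gitmodules" subcommand. It parses only the simple `[submodule "name"]` / `key = value` shape shown in this thread, merges submodule by submodule and key by key, and follows git's custom merge driver convention: it is invoked with the base, ours and theirs files (`%O %A %B`), leaves the result in `%A`, and exits non-zero on conflict. Wiring it up would look something like `git config merge.gitmodules.driver "python3 gitmodules_merge.py %O %A %B"` plus a `.gitmodules merge=gitmodules` line in `.gitattributes`; the script name is an assumption.

```python
import re
import sys

SECTION_RE = re.compile(r'^\s*\[submodule\s+"([^"]+)"\]\s*$')
KEY_RE = re.compile(r'^\s*([A-Za-z][A-Za-z0-9-]*)\s*=\s*(.*?)\s*$')


def parse_gitmodules(text):
    """Parse a simple .gitmodules file into {name: {key: value}}."""
    modules, current = {}, None
    for line in text.splitlines():
        m = SECTION_RE.match(line)
        if m:
            current = modules.setdefault(m.group(1), {})
            continue
        m = KEY_RE.match(line)
        if m and current is not None:
            current[m.group(1)] = m.group(2)
    return modules


def merge3(base, ours, theirs):
    """Classic three-way pick; returns (result, conflicted)."""
    if ours == theirs:
        return ours, False
    if ours == base:
        return theirs, False
    if theirs == base:
        return ours, False
    return ours, True


def merge_gitmodules(base, ours, theirs):
    """Merge parsed .gitmodules dicts; returns (merged, conflicting names)."""
    merged, conflicts = {}, []
    for name in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(name), ours.get(name), theirs.get(name)
        section, conflicted = merge3(b, o, t)
        if conflicted and o is not None and t is not None:
            # Both sides edited the same submodule: merge individual keys.
            section, conflicted = {}, False
            for key in sorted(set(b or {}) | set(o) | set(t)):
                value, bad = merge3((b or {}).get(key), o.get(key), t.get(key))
                conflicted |= bad
                if value is not None:
                    section[key] = value
        if conflicted:
            conflicts.append(name)
        if section is not None:  # None means the submodule was deleted
            merged[name] = section
    return merged, conflicts


def format_gitmodules(modules):
    out = []
    for name, keys in modules.items():
        out.append(f'[submodule "{name}"]')
        out.extend(f"\t{key} = {value}" for key, value in keys.items())
    return "\n".join(out) + "\n"


def main(base_path, ours_path, theirs_path):
    """Merge driver entry point: git passes %O %A %B and reads back %A."""
    def load(path):
        with open(path) as f:
            return parse_gitmodules(f.read())
    merged, conflicts = merge_gitmodules(load(base_path), load(ours_path),
                                         load(theirs_path))
    with open(ours_path, "w") as f:
        f.write(format_gitmodules(merged))
    for name in conflicts:
        print(f"conflict in submodule {name!r}", file=sys.stderr)
    return 1 if conflicts else 0  # non-zero tells git the merge conflicted


if __name__ == "__main__":
    sys.exit(main(*sys.argv[1:4]))
```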

But you can't just create your own manifest like this without first having everything mirrored locally, otherwise the relative URLs would fail to resolve.

One way out of that might be to define URL prefixes on everything, but that would break compatibility with git submodule.  Hmm.  Maybe we have to use absolute URLs in .gitmodules, but support some sort of rewrites when the project is being initialized.  E.g., everyone keeps the git://android.git.kernel.org/ style URLs for projects whose real location is a.g.k.o, but some sort of rewrite file can be inserted at company mirror points, like the ".gerrit" above.

Shawn Pearce

May 28, 2009, 11:32:34 AM
to repo-d...@googlegroups.com
On Wed, May 27, 2009 at 09:16, Brad Larson <bkla...@gmail.com> wrote:
So if my manifest says 'Follow tips of projects A - E', and someone
submits interdependent changes to projects A, B, and C, and they merge
cleanly on A and B but have a conflict on C, what will gerrit do?

Ideally, when A and B are submitted, they should be marked "SUBMITTED", but held out of the merge operation until C is also submitted.  Once all 3 changes are in the submit queue, Gerrit will have to lock all 3 branches, and execute the merges.

In your example, since C conflicted, Gerrit would release all 3 branch locks, and kick C out with a conflict.  A and B would stay in the SUBMITTED state, and the branches would be unaffected.

Once C is fixed, and resubmitted, Gerrit would relock all 3 branches, and retry the merges over again.  If all 3 merge clean, the branches are committed as rapidly as possible, and then the manifests are updated.

I guess there could be a race condition during the manifest updates... but in theory this shouldn't generally happen because we successfully updated our 3 branches, and we held the locks on those 3 branches.  Updating the manifest should be a trivially clean merge at this point because nobody else should have been able to update those same branches at the same time as we were.
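A minimal sketch of the all-or-nothing submit described above. `Change`, `submit_group`, `try_merge` and `commit_branch` are hypothetical names, not Gerrit's API, and locking is left out here. `try_merge` stands for a test merge that returns the would-be merge commit (or `None` on conflict) without touching the branch; branches are only updated once every change in the group merged cleanly.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Change:
    project: str
    branch: str
    commit: str
    state: str = "NEW"  # NEW -> SUBMITTED -> MERGED


def submit_group(group: List[Change],
                 try_merge: Callable[[Change], Optional[str]],
                 commit_branch: Callable[[Change, str], None]) -> str:
    """Merge an interdependent group of changes all together or not at all."""
    if any(c.state != "SUBMITTED" for c in group):
        return "WAITING"  # hold the SUBMITTED ones until the whole group is ready
    pending = []
    for change in group:
        merged = try_merge(change)
        if merged is None:
            change.state = "NEW"  # kick the conflicting change back to its author
            return f"CONFLICT:{change.project}"  # others stay SUBMITTED
        pending.append((change, merged))
    # Every merge succeeded: only now are the branches actually updated.
    for change, merged in pending:
        commit_branch(change, merged)
        change.state = "MERGED"
    return "MERGED"
```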
 
I assume you want it to modify the manifest to specify the last good commit for projects A and B while it waits on someone to fix the merge conflicts with C?

No, that'd be bad, as it would allow A and B to become visible to clients, while C isn't yet visible.  The result would be what we have now, where the system doesn't compile until A and B are reverted, or C is finally submitted.  We might as well not make any code changes.
 
What if there are other manifests following tips of A-E - it sounds like gerrit will update their manifests as well?

Yes.  The manifest update portion of a submit could take considerable time (e.g. a second), if we have to update 50 manifests to match the new submits.
 
What if a manifest tracks tips of A and B, but is on a static revision of C?  (This is more a problem in general, and not with the current proposal)

Only A and B would update, and C would be left alone.  And that manifest would probably fail to compile.

Perhaps in a case like this Gerrit should instead prepare a manifest update that changes all 3, even though C is static, but instead of immediately submitting it, schedule it as a reviewable change for the authors of A, B, C, the submitters of A, B, C, and the manifest owners.  Folks can then review whether or not to perform this update, or abandon it.  That would reduce the risk that A and B update while C doesn't, and the manifest then fails to compile.
 
How will gerrit know that changes are interdependent?  Will the user specify this through git or a gerrit page?

I was thinking about letting them do this by git, or really, repo.  E.g. repo would prepare a new commit in the manifest repository which references the interdependent commits in the subprojects, and would upload that for review just like any other change.  Gerrit, upon seeing "gitlink" types pointing at commits in other projects would mark everything interdependent.
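A sketch of how the interdependent set could be read off such a manifest commit, assuming the entries are available in the `git ls-files --stage` format (mode, SHA-1, stage, path), where gitlinks carry mode 160000. The function names are hypothetical; the idea is that every gitlink the commit adds or moves relative to its parent belongs to the same interdependent group.

```python
from typing import Dict, List

GITLINK_MODE = "160000"


def parse_gitlinks(ls_files_stage: str) -> Dict[str, str]:
    """Map path -> commit SHA-1 for gitlink entries in ls-files --stage output."""
    links = {}
    for line in ls_files_stage.splitlines():
        parts = line.split(None, 3)  # mode, sha1, stage, path
        if len(parts) == 4 and parts[0] == GITLINK_MODE:
            links[parts[3]] = parts[1]
    return links


def interdependent_projects(parent: Dict[str, str],
                            child: Dict[str, str]) -> List[str]:
    """Projects whose gitlink the manifest commit adds or moves vs. its parent."""
    return sorted(path for path, sha in child.items() if parent.get(path) != sha)
```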

We could eventually also support setting this up in the web UI, but in the background, Gerrit would really just be creating the same data structures in the manifest repository as repo would have created.  That way users can easily set something up if they failed to do it with repo, but the end result is the same.

I prefer having this be specifiable through git data transfer over git push... because it makes it more automatic.  You can prepare the interdependent set locally, mark it interdependent, and upload much later.
 
I agree that changing the manifest from xml to a git config type file will be beneficial on lots of fronts.  Having gerrit update it concerns me though

Why?  It's the same as what Gerrit does now with its merges.  It's just a path level merge, but when there's a conflict we have a trivial way to recover: if one side fully contains the other (as far as revision history goes in that subproject) we can pick that one side.  It's literally the same code Gerrit is already running to submit file changes in projects.  Actually, that's a major reason to move from XML to this format... I can reuse the same merge code, code that we already have a reasonable level of trust in.
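A sketch of that recovery rule for a single gitlink path: the usual three-way picks first, then, if both sides moved the subproject, take the side whose commit contains the other in its history. `merge_gitlink` is a hypothetical name; `is_ancestor(a, b)` must answer "is `a` an ancestor of `b`" in the subproject, which with a real repository could be `git merge-base --is-ancestor` (a newer git command than this thread). The toy `make_is_ancestor` over an in-memory graph exists only for demonstration.

```python
from typing import Callable, Dict, List, Optional, Tuple


def merge_gitlink(base: Optional[str], ours: Optional[str], theirs: Optional[str],
                  is_ancestor: Callable[[str, str], bool]) -> Tuple[Optional[str], bool]:
    """Merge one gitlink entry; returns (commit, conflicted)."""
    if ours == theirs:
        return ours, False
    if ours == base:
        return theirs, False
    if theirs == base:
        return ours, False
    if ours is not None and theirs is not None:
        # One side fully contains the other: pick the descendant.
        if is_ancestor(ours, theirs):
            return theirs, False
        if is_ancestor(theirs, ours):
            return ours, False
    return ours, True  # histories diverged: a real conflict


def make_is_ancestor(parents: Dict[str, List[str]]) -> Callable[[str, str], bool]:
    """Toy ancestry check over an in-memory commit graph, for demonstration only."""
    def is_ancestor(a: str, b: str) -> bool:
        stack, seen = [b], set()
        while stack:
            commit = stack.pop()
            if commit == a:
                return True
            if commit not in seen:
                seen.add(commit)
                stack.extend(parents.get(commit, []))
        return False
    return is_ancestor
```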
 
... as an alternative, would it be possible for gerrit to run a test merge on interdependent changes to projects A-C, and only do the actual merge if it knows everything will merge cleanly?

See above, that's what I'm proposing.  For this to work we have to merge A-C concurrently, and only commit if they all succeed.  I have to work out how to allocate the resource locks to prevent merge attempts where we know we can't win the branch lock yet, but that's an easily solved problem.  If we always acquire the branch locks in the same order (e.g. alphabetical by project name), and we only attempt a merge if we know no other merge is actively working on that same branch, it'll work out fine.

Actually, this is similar to what Eclipse does with its builders and scheduling rules.  Their code is really complex and tied to the Eclipse platform, but same principle.

Gergely Kis

May 28, 2009, 2:00:55 PM
to repo-d...@googlegroups.com
Hi,

Sorry for the long email.

On Thu, May 28, 2009 at 5:12 PM, Shawn Pearce <s...@google.com> wrote:
  $ cat .gerrit
  [review]
    url = review.source.android.com
I think this is a good idea.

 

In order to be able to build Android simply with git submodules, we need to be able to do the following:

git clone git://path/to/manifest.git mydroid
cd mydroid
git submodule init
git submodule update
make (*)

*) for make to work, we need to add the top level Makefile, currently installed via <copyfile/>, to the manifest repository.

Or just copy it by hand.  It's 3 lines.  You already went through two git submodule calls just to use this.  Might as well go through a 3rd cp.
But then your repository won't be clean; you will have an untracked file. It is just ugly :)
 

 
In order for this to work we need to add the absolute urls (git://a.g.k.o/*/*.git) into .gitmodules.

The relative URL format I chose for the .gitmodules file was based upon the gitmodules documentation saying it supported relative URLs if the URL starts with "./".  Maybe the version of git you are looking at doesn't yet support relative URLs inside .gitmodules?
Yes, it does. But if you use relative URLs and don't use repo, you would need to write some scripts to clone the repositories to your system first, and then clone from there.
Also, if you specify relative URLs in .gitmodules, then you have to specify somewhere else from where to clone those repositories.
 


The advantage of the relative URL format is clear... a very large number of organizations mirror the Android tree internally.  Not needing to hack all of the URLs in the .gitmodules file would be an advantage to them.
I don't really see the advantage.

If an organization mirrors the android repositories, they most likely will have their own repositories added to the manifest, or just use different branches ... etc. So they will have their own manifest / manifest branch anyways.

It can always be supported with tools to create / maintain these manifests, but I think it is unrealistic to assume that an organization can mirror the repositories and not touch the manifest. There are many other areas where work would be of much greater help to these organizations.
 

 
How I would make repo work with this new scheme (assuming my guess above about why repo uses the linked working tree approach is right):

The directory hierarchy would look like this:
.repo
.repo/manifests.git -> clone of the manifests repository
.repo/projects/*/*.git -> clones of the other repositories
.repo/repo -> clone of the repo repository with a working tree so repo can find its files
.repo/working-trees -> a list of directories where workingtrees are stored based on this .repo storage

workingtree1 -> linked working tree of the manifest repository
workingtree1/.git/config -> includes the submodule links to the relative ../.repo/projects/*/*.git
workingtree1/.gitmodules -> the unaltered manifest configuration file, containing the full URLs of all the project repositories.
workingtree1/.repo-inherit -> a file to specify the parent manifest branch (or even commit)
workingtree1/.gerrit -> The code review URL
workingtree1/Makefile -> the Makefile currently installed via <copyfile/>, which only includes the makefile from build anyway.
workingtree1/project1
....
workingtree1/projectN -> linked working trees of the projects to ../.repo/projects/*/*.git.

workingtree2
...
workingtreeN -> similar to workingtree1

workingtree1 ... workingtreeN would be named after the manifest branch name, e.g. master, master-dream (master branch configured for dream hardware),  donut, donut-dream ... etc.

Interesting.

But, let's not make a ton of changes at once.  Some people are already unhappy that they need to create a directory to execute "repo init" inside of.  Asking them to move one more level down for "workingtree1" is going to annoy them.  Others, though, already have 3 or 4 parallel repo checkouts.  Clearly these people would benefit from this structure.

I like the multiple work tree approach... but let's do it after the manifest changes are done, rather than trying to do it at the same time, and let's make it optional.
Fair enough. I was trying to communicate a "vision" sort of, not a plan of action or anything. :)

 


I had planned on *not* creating a ".git" at the top of the working tree, so that commands like "git commit" always fail up here.  This way it's clear that you need to give more context to operate on a project, like cd'ing into a project's working directory, before you can execute a command.  Thus far we have explicitly never had a ".git" at the top level for this very reason.

With the switch to a submodule style format, I want to keep that approach.  The supermodule (aka manifest) is a VCS implementation detail that should be semi-hidden from users, not in their face at the root level.  I'm unhappy that it's "cd .repo/manifests" to access it, but it's better that it is out of sight most of the time.  I think many Android engineers at Google agree with me; they are used to "p4 client" being how they view/manipulate the equivalent of the manifest, and usually they don't think about what their client spec says, they just copy one from another engineer, and that's that.
Regarding p4: I know the same "copy my neighbors' config spec" approach from ClearCase :)

However, with this change I wanted to shift the role of the "manifest" repository a bit. It would be no longer a mere "configspec" storage, but really the "top-level" umbrella repository.

When you create a change, then you would make a commit here as well, and then push that commit together with the other commits during repo upload to gerrit. So in the end this top level commit would get into refs/changes/* inside the manifest repository.

This commit would basically bind together all the commits that you made in the project repositories. So if you want to reproduce the _exact_ state of the working tree of the developer who submitted the commit, you can simply use this top level commit and switch to it.

I wanted to reflect this role change in the working tree layout as well. Plus, if someone wants to edit the manifest by hand for some reason (and not using an appropriate repo command), he can do it simply with stock git commands (git add $subproject ; git commit -m 'My message'), like he would in a regular "git submodule" project. 

I checked locally: git (1.5.4.3 from Ubuntu hardy) won't be confused by .gitmodules pointing to the absolute URLs, while .git/config points to the local ones in .repo: add, commit, push, pull, "submodule update" works just fine. Of course "submodule init" would be confused, but repo sync would be used for that anyways.

I will understand if you don't want to go this route, but please consider it, because I think that the benefits are much greater than the possible issues with annoyed developers.

Personally, I think that while changes in the tooling have to be well planned and communicated in advance, usually the actual cutover goes without big problems.

I was CM Lead at Siemens Mobile in one of the mobile phone projects and also member of a CM architect team for improving CM tooling / processes, so I have experience with big and slow organizations.

My experience was that while there is always a fear of change, most developers will see the reason behind the changes if communicated properly (and emphasizing the advantages for them :) )

I don't know how the work is organized at the companies developing for Android, but I always had a number of working trees for different product configurations:
- legacy product support
- currently developed products (usually more than one product configuration, which required different working trees)
- different language configurations (EMEA, US/North America, Asia-Pacific)
- different operator configurations (T-Mobile, Vodafone, ... etc.)

I expect that most people working on Android at Google will need to have at least 2 working trees by now: cupcake and donut.
They will probably also have separate trees for each type of hardware they are testing with. (I usually had 3 - 5 different hardware, and each required a separate tree)

Right now there are 3 types of hardware that we outside of Google know of: G1 -> T-Mobile variant, G2 -> Vodafone variant, ADP1
The first device that will target multiple operators will need even more trees.
 



As far as manifest inheritance, I realized this morning that the most likely proper way to do that is to just merge the parent manifest into your own.  The best way to explain this is with the example of the T-Mobile G1 device.  We really should have 5 manifests:

  platform/manifest.git             ->  the Android Open Source Project, e.g. build, dalvik, framework
  hardware/msm.git                -> msm chipset specific projects
  google/google-experience.git -> projects related to the "Google experience" device
  t-mobile/tmus.git                  -> T-Mobile customizations, like "myFaves"
  htc/g1.git                            -> a manifest that pulls all of these together.

The way to build htc/g1.git is:

  $ mkdir g1
  $ cd g1
  $ git init

  $ git pull git://android.git.kernel.org/platform/manifest.git cupcake
  $ git pull git://android.git.kernel.org/hardware/msm.git cupcake
  $ git pull google:/google/google-experience.git cupcake
  $ git pull tmobile:/tmus.git cupcake

If you need a new version of the base platform, just pull it in again:

  $ git pull git://android.git.kernel.org/platform/manifest.git cupcake

Would these repositories you list only contain the manifests?

In fact this is very similar to what I was proposing with .repo-inherit. The .repo-inherit would only be a convenience feature: With a "repo update-manifest" command it would automatically find the "parent" of the current manifest and do the merge if the parent has changed.

I expect that a build manager / CM person will be responsible for managing each organization's manifest repository which would track a.g.k.o and possibly other private / public manifest repositories from their partners.

From these sources they would derive their own manifest branch(es), and use those.
 



Where that gets ugly is the .gitmodules file.  A semantic merge of the .gitmodules file wouldn't be too difficult to define; it's a pretty simple thing to merge together.  We can do a "repo merge-driver-gitmodules" or something, help users define it in .git/config as a merge driver, and set up a .gitattributes to use our repo based merge driver for the .gitmodules file.

But you can't just create your own manifest like this without first having everything mirrored locally, otherwise the relative URLs would fail to resolve.
With my proposal the process would be like this:

repo init git://path/to/manifest.git would only clone the manifest and repo repositories into .repo.

Then you do a
repo sync <branchname>, which will do the following:
1. create a linked working tree from .repo/manifests.git into <branchname>
2. basically simulate git submodule init, by first examining .gitmodules, and cloning the required repositories into .repo/projects/*/*.git
3. it writes to <branchname>/.git/config the appropriate [submodule] sections with the relative (or by choice absolute) paths to the .repo/projects/*/*.git
4. execute the equivalent of "git submodule update", but creating linked working trees instead of a full clone of the repositories.
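A sketch of step 3, rendering the `[submodule]` sections of `<branchname>/.git/config` so they point at the local clones. It assumes the clones are stored at `.repo/projects/<name>.git` keyed by submodule name (matching the `.repo/projects/*/*.git` layout above); `submodule_config` is a hypothetical helper. The `relative` flag switches between the two options weighed below: relative paths survive moving the whole setup, absolute paths survive moving the work tree relative to `.repo`.

```python
import os
from typing import Iterable


def submodule_config(worktree: str, repo_dir: str, names: Iterable[str],
                     relative: bool = True) -> str:
    """Render [submodule] sections whose urls point at repo_dir/projects clones."""
    lines = []
    for name in sorted(names):
        clone = os.path.join(repo_dir, "projects", name + ".git")
        url = (os.path.relpath(clone, worktree) if relative
               else os.path.abspath(clone))
        lines.append(f'[submodule "{name}"]')
        lines.append(f"\turl = {url}")
    return "\n".join(lines) + "\n"
```

For example, `submodule_config("master", ".repo", ["platform/build"])` yields a section with `url = ../.repo/projects/platform/build.git`.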

So this way you can do:

repo init git://path/to/manifest.git
repo sync master
repo sync master-dream
repo sync cupcake
repo sync cupcake-dream
...
And ideally only the first invocation will take long, the later ones only need to clone very few new repositories.

Then:
cd master ; repo sync

will do what you expect:
1. pull the manifest repository
2. examine the .gitmodules, clone the missing / fetch the existing submodule repositories
3. update .git/config to reflect the changes in the repositories
4. run the equivalent of "git submodule update" but with the linked working trees

In fact repo already implements the creation of linked working trees, the only part missing is the examination of .gitmodules and the updating of .git/config + the glue code to have basic sync support for this layout.

As an extra feature with a REPO_STORAGE or similar environment variable you can put .repo anywhere you want, and repo will just find it. So in fact this layout gives more freedom to the developers.
This is even more important for the CI setup, where you have no control where the workspace of the CI server is located.
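A sketch of how repo could find its storage under this proposal: an explicit environment variable wins, otherwise it walks up from the current directory looking for a `.repo` directory, much like repo already searches parent directories for its client. `REPO_STORAGE` is only the placeholder name used above, and `find_repo_dir` is a hypothetical helper.

```python
import os
from typing import Mapping, Optional

ENV_VAR = "REPO_STORAGE"  # placeholder name from the proposal, not a real repo setting


def find_repo_dir(start: str, environ: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the .repo directory to use for a work tree at or below start."""
    explicit = environ.get(ENV_VAR)
    if explicit:
        return os.path.abspath(explicit)  # e.g. set by a CI job
    current = os.path.abspath(start)
    while True:
        candidate = os.path.join(current, ".repo")
        if os.path.isdir(candidate):
            return candidate
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root
            return None
        current = parent
```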

Regarding relative (../../../.repo) or absolute (/home/myuser/mywork/repo) paths in .git/config: both have advantages and disadvantages:

With relative paths you can move everything to another machine / another directories, and the whole setup will continue to work.

With absolute path you can change the relative position of the working trees and the .repo directory, but you have to make sure that the absolute path to .repo stays constant.

Best Regards,
Gergely

Shawn Pearce

May 28, 2009, 10:06:06 PM
to repo-d...@googlegroups.com
On Thu, May 28, 2009 at 11:00, Gergely Kis <gerge...@gmail.com> wrote:
On Thu, May 28, 2009 at 5:12 PM, Shawn Pearce <s...@google.com> wrote:
  $ cat .gerrit
  [review]
    url = review.source.android.com
I think this is a good idea.

Ok, noted.  So we're likely to do it this way then.
 
*) for make to work, we need to add the top level Makefile, currently installed via <copyfile/>, to the manifest repository.

But then your repository won't be clean; you will have an untracked file. It is just ugly :)

echo /Makefile >>.git/info/exclude

:-)
 
In order for this to work we need to add the absolute urls (git://a.g.k.o/*/*.git) into .gitmodules.

The advantage of the relative URL format is clear... a very large number of organizations mirror the Android tree internally.  Not needing to hack all of the URLs in the .gitmodules file would be an advantage to them.
I don't really see the advantage.

If an organization mirrors the android repositories, they most likely will have their own repositories added to the manifest, or just use different branches ... etc. So they will have their own manifest / manifest branch anyways.

It can always be supported with tools to create / maintain these manifests, but I think it is unrealistic to assume that an organization can mirror the repositories and not touch the manifest. There are many other areas where work would be of much greater help to these organizations.

Ah, OK.

My problem is, Google is juggling a number of different manifests, and for whatever reason, some folks are refusing to edit the URLs but are instead trying to play games with url.insteadOf in ~/.gitconfig.  This is resulting in people getting stuck in a position where they have to edit ~/.gitconfig every time they switch their current working directory, or switch between terminal windows, because different clients have the same absolute URL in their manifests, but originated from different mirror servers, or from android.git.kernel.org.  I had hoped that by using relative URLs, much of this pain would disappear.  But maybe not.

Maybe what we need is to support these URL rewrites in the client level within repo, rather than relying on git to do them transparently.  Then you can configure a remapping between "repo init" and "repo sync", and still get roughly the same effect.
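A sketch of such a client-level rewrite, using the same longest-matching-prefix rule as git's `url.<base>.insteadOf`, but with the table kept per client (for example somewhere under `.repo/`, an assumption) instead of in `~/.gitconfig`. `rewrite_url` and the mirror hostnames are hypothetical; the point is that two checkouts of the same manifest can fetch from different mirrors without anyone editing the manifest URLs.

```python
from typing import Dict


def rewrite_url(url: str, rewrites: Dict[str, str]) -> str:
    """Rewrite url using the longest matching prefix in rewrites.

    rewrites maps an original prefix (e.g. the public server) to the prefix
    that should be used instead (e.g. a company mirror).
    """
    best = max((p for p in rewrites if url.startswith(p)), key=len, default=None)
    if best is None:
        return url  # no rule applies: fetch from the URL as written
    return rewrites[best] + url[len(best):]
```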

Your arguments above though make good sense... absolute URLs might make it easier to deal with, especially when talking about a blended manifest (one pulled together from several other more modular manifests).
 
I had planned on *not* creating a ".git" at the top of the working tree, so that commands like "git commit" always fail up here.  This way it's clear that you need to give more context to operate on a project, like cd'ing into a project's working directory, before you can execute a command.  Thus far we have explicitly never had a ".git" at the top level for this very reason.

With the switch to a submodule style format, I want to keep that approach.  The supermodule (aka manifest) is a VCS implementation detail that should be semi-hidden from users, not in their face at the root level.  I'm unhappy that it's "cd .repo/manifests" to access it, but it's better that it is out of sight most of the time.  I think many Android engineers at Google agree with me; they are used to "p4 client" being how they view/manipulate the equivalent of the manifest, and usually they don't think about what their client spec says, they just copy one from another engineer, and that's that.
Regarding p4: I know the same "copy my neighbors' config spec" approach from ClearCase :)

However, with this change I wanted to shift the role of the "manifest" repository a bit. It would be no longer a mere "configspec" storage, but really the "top-level" umbrella repository.

We've never wanted the manifest to be some sort of top level.  It's always supposed to be *just* a "configspec".  So it's small, doesn't change that often, etc.  It wasn't even a git repository in the early days of repo; we tacked that on once we realized that editing the XML file by hand was horrible, and people would want to share the XML files back and forth... so we shoved it in git just to facilitate sharing.

I'm not sure what value we get from it being on the top level, other than the "git add $subproject; git commit" benefit.  Which may or may not confuse the heck out of a user used to p4 or svn, for example.
 
As far as manifest inheritance, I realized this morning that the most likely proper way to do that is to just merge the parent manifest into your own.  The best way to explain this is with the example of the T-Mobile G1 device.  We really should have 5 manifests:

  platform/manifest.git             ->  the Android Open Source Project, e.g. build, dalvik, framework
  hardware/msm.git                -> msm chipset specific projects
  google/google-experience.git -> projects related to the "Google experience" device
  t-mobile/tmus.git                  -> T-Mobile customizations, like "myFaves"
  htc/g1.git                            -> a manifest that pulls all of these together.

The way to build htc/g1.git is:

  $ mkdir g1
  $ cd g1
  $ git init

  $ git pull git://android.git.kernel.org/platform/manifest.git cupcake
  $ git pull git://android.git.kernel.org/hardware/msm.git cupcake
  $ git pull google:/google/google-experience.git cupcake
  $ git pull tmobile:/tmus.git cupcake

If you need a new version of the base platform, just pull it in again:

  $ git pull git://android.git.kernel.org/platform/manifest.git cupcake

Would these repositories you list only contain the manifests?

Yes.  Or, well, each repository is a manifest repository.  They merge down into the "htc/g1" repository to provide the manifest necessary for HTC to build a release system image for manufacturing.  "htc/adp1" would be a very similar manifest, but is slightly different, due to the different keys on the device, etc.
 
In fact this is very similar to what I was proposing with .repo-inherit. The .repo-inherit would only be a convenience feature: With a "repo update-manifest" command it would automatically find the "parent" of the current manifest and do the merge if the parent has changed.

Sure, but why not just "git pull" ?  And note above, there's more than one parent for "htc/g1".  There are technically 4.
 
I expect that a build manager / CM person will be responsible for managing each organization's manifest repository which would track a.g.k.o and possibly other private / public manifest repositories from their partners.

From these sources they would derive their own manifest branch(es), and use those.

Yes, that's true I think in every group I've talked to that is working with the Android sources.  Usually I'm talking to that build manager / CM person.  :-)

Gergely Kis

May 29, 2009, 3:47:58 AM
to repo-d...@googlegroups.com
Hi,

On Fri, May 29, 2009 at 4:06 AM, Shawn Pearce <s...@google.com> wrote:

However, with this change I wanted to shift the role of the "manifest" repository a bit. It would be no longer a mere "configspec" storage, but really the "top-level" umbrella repository.

We've never wanted the manifest to be some sort of top level.  It's always supposed to be *just* a "configspec".  So it's small, doesn't change that often, etc.  It wasn't even a git repository in the early days of repo; we tacked that on once we realized that editing the XML file by hand was horrible, and people would want to share the XML files back and forth... so we shoved it in git just to facilitate sharing.
Fair enough, but with the proposed format change the manifest repository will have to change on each and every merged commit into any of the project repositories. So the "seldom changing" argument no longer applies.

 


I'm not sure what value we get from it being on the top level, other than the "git add $subproject; git commit" benefit.  Which may or may not confuse the heck out of a user used to p4 or svn, for example.
The main benefit is together with my "atomic change" proposal where each change also includes a commit in the manifest repository, so every change is globally defined. This is a property, that should not be underestimated.  It is something that works in SVN, in a single git repository, and even in a regular git submodule based project if we look at the superproject's commits.

I have worked with systems where this did not apply, like base ClearCase (you have to label every object in your view to get the same effect), CVS, or Telelogic CM Synergy (previously Continuus), etc. While you can work with such systems, it is far from optimal.

In my opinion, my proposal blends in nicely with the intent of the whole change: to have more control over what gets into the developer's working tree.
At the basis it remains completely compatible with a regular "git submodule" project, no need to touch repo or gerrit if you want. (You can add "freedom of choice" to your marketing flyers :-) )

Repo only builds on this foundation:
- linked working trees to save disk space and sync times
- make working with submodules easier (not having to do separate commits in subprojects ... etc.)
- topic branch management
- code review workflow
etc.

In any case, I can submit patches for repo to add support for this layout, and test how it sits with the developers / CM people. However, the biggest advantage of this layout would be the easy support for atomic changes, which would also require Gerrit support.
 

  $ git pull git://android.git.kernel.org/platform/manifest.git cupcake
  $ git pull git://android.git.kernel.org/hardware/msm.git cupcake
  $ git pull google:/google/-experience.git cupcake
  $ git pull tmobile:/tmus.git cupcake

If you need a new version of the base platform, just pull it in again:

  $ git pull git://android.git.kernel.org/platform/manifest.git cupcake
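A minimal sketch, assuming plain git and the gitlink-based manifest format from this proposal, of why merging manifests with ordinary pulls can work: a downstream manifest repository takes gitlink bumps from two upstream manifest repositories via fetch plus merge (which is what `git pull` does). The repository names (`base`, `up1`, `up2`, `down`), the helper `set_gitlink`, and every SHA-1 value are made up for illustration; the real "htc/g1" case has four parents rather than two.

```sh
#!/bin/sh
# Sketch: merge gitlink updates from two upstream manifest repositories
# into a downstream one. Repository names and SHA-1 values are hypothetical.
set -e
g() { git -c user.name=demo -c user.email=demo@example.com "$@"; }
demo=$(mktemp -d)

B1=1111111111111111111111111111111111111111  # made-up "build" commits
B2=2222222222222222222222222222222222222222
D1=3333333333333333333333333333333333333333  # made-up "dalvik" commits
D2=4444444444444444444444444444444444444444

# Hypothetical helper: record a gitlink (mode 160000) without any checkout.
set_gitlink() {  # $1 = repository, $2 = path, $3 = commit SHA-1
  git -C "$1" update-index --add --cacheinfo "160000,$3,$2"
}

git init -q "$demo/base"
set_gitlink "$demo/base" build "$B1"
set_gitlink "$demo/base" dalvik "$D1"
g -C "$demo/base" commit -q -m "base manifest"

git clone -q "$demo/base" "$demo/up1"    # stands in for platform/manifest
set_gitlink "$demo/up1" build "$B2"
g -C "$demo/up1" commit -q -m "update build"

git clone -q "$demo/base" "$demo/up2"    # stands in for a partner manifest
set_gitlink "$demo/up2" dalvik "$D2"
g -C "$demo/up2" commit -q -m "update dalvik"

git clone -q "$demo/base" "$demo/down"   # stands in for "htc/g1"
for up in up1 up2; do
  # Equivalent to "git pull <url> HEAD": fetch, then merge FETCH_HEAD.
  git -C "$demo/down" fetch -q "$demo/$up" HEAD
  g -C "$demo/down" merge -q --no-edit FETCH_HEAD
done

# Both gitlink updates are present; gitlinks at different paths merge cleanly.
git -C "$demo/down" ls-files --stage
```

If two upstreams moved the same gitlink to different commits, git would instead stop with a conflict on that path, and whoever maintains the downstream manifest would have to pick one.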

Would these repositories you list only contain the manifests?

Yes.  Or, well, each repository is a manifest repository.  They merge down into the "htc/g1" repository to provide the manifest necessary for HTC to build a release system image for manufacturing.  "htc/adp1" would be a very similar manifest, but is slightly different, due to the different keys on the device, etc.
 
In fact this is very similar to what I was proposing with .repo-inherit. The .repo-inherit would only be a convenience feature: With a "repo update-manifest" command it would automatically find the "parent" of the current manifest and do the merge if the parent has changed.

Sure, but why not just "git pull" ?  And note above, there's more than one parent for "htc/g1".  There are technically 4.
I see your point, with the merge driver the handling should be pretty straightforward. Nice.
 
Best Regards,
Gergely

Shawn Pearce

May 29, 2009, 10:45:01 AM
to repo-d...@googlegroups.com
On Fri, May 29, 2009 at 00:47, Gergely Kis <gerge...@gmail.com> wrote:
On Fri, May 29, 2009 at 4:06 AM, Shawn Pearce <s...@google.com> wrote:

However, with this change I wanted to shift the role of the "manifest" repository a bit. It would be no longer a mere "configspec" storage, but really the "top-level" umbrella repository.

We've never wanted the manifest to be some sort of top level.  It's always supposed to be *just* a "configspec".  So it's small, doesn't change that often, etc.  It wasn't even a git repository in the early days of repo, we tacked that on once we realized that editing the XML file by hand was horrible, and people would want to share the XML files back and forth... so we shoved it in git just to facilitate sharing.
Fair enough, but with the proposed format change the manifest repository will have to change on each and every merged commit into any of the project repositories. So the "seldom changing" argument no longer applies.

After sleeping on it, I have changed my mind, and I think we are now in agreement.  I've moved over to your idea that the top level should just be a git repository working directory.  Well, actually, it depends on whether or not you use the "multiple parallel work trees" approach.

I think we should support both layouts, starting/defaulting with the single work tree layout, and then allowing the user to switch to multiple if they so choose.  E.g. "repo init --multi -u ..." would immediately use the multiple layout you proposed several messages back.  "git new-workdir" is still in contrib for a reason: it's easy to get confused and wind up with a branch checked out in two places at once, etc.  As it is, repo and git give you enough rope to hang yourself; let's at least make it an advanced user option to ask for a gun along with that rope.  Moving from the default/single layout to multi should just be "repo init --multi" in the existing client, and only require moving the top level directories and fixing some symlinks.  So you wouldn't even lose your build products, or need to recompile the world.

But in either layout, the "top level" should be the manifest project, as you have been arguing for.
 
I'm not sure what value we get from it being on the top level, other than the "git add $subproject; git commit" benefit.  Which may or may not confuse the heck out of a user used to p4 or svn, for example.
The main benefit is together with my "atomic change" proposal where each change also includes a commit in the manifest repository, so every change is globally defined. This is a property, that should not be underestimated.  It is something that works in SVN, in a single git repository, and even in a regular git submodule based project if we look at the superproject's commits.

Yes, but it's also a throwback to BitKeeper for some.  Committing a file, and then committing again to create the change set.  It's a very awkward UI.
 
In my opinion, my proposal blends in nicely with the intent of the whole change: to have more control over what gets into the developer's working tree.
At the basis it remains completely compatible with a regular "git submodule" project, no need to touch repo or gerrit if you want. (You can add "freedom of choice" to your marketing flyers :-) )

Yea, one of the things I like is that it does better match to "git submodule".  There's always been "freedom of choice" with repo: you have the source code under a very liberal license, and anyone could have created a tool that did the same task... but why, repo works well enough to get AOSP, and in some cases, is better/faster than git submodule.  But with this full rework of the client layout, repo can actually be used on any project, potentially replacing "git submodule" for many users.
 
Repo only builds on this foundation:
- linked working trees to save disk space and sync times
- make working with submodules easier (not having to do separate commits in subprojects ... etc.)
- topic branch management
- code review workflow
etc.

In any case, I can submit patches for repo to add support for this layout, and test how it sits with the developers / CM people. However, the biggest advantage of this layout would be the easy support for atomic changes, which would also require Gerrit support.

If you haven't yet looked at the repo code, I'm going to warn you, it isn't pretty.  I learned Python by writing it.  I know there are more elegant ways to write the code.  I know some of the style is inconsistent.  It's, uh, not my proudest moment.  And since it works well enough for the folks that are using it today, it doesn't get any attention to clean it up... my time goes to Gerrit.  So, uh, apologies in advance for the sad state that the code is currently in.

Gergely Kis

May 29, 2009, 3:15:41 PM
to repo-d...@googlegroups.com
Hi,

On Fri, May 29, 2009 at 4:45 PM, Shawn Pearce <s...@google.com> wrote:
After sleeping on it, I have changed my mind, and I think we are now in agreement.  I've moved over to your idea that the top level should just be a git repository working directory.  Well, actually, it depends on whether or not you use the "multiple parallel work trees" approach.

I think we should support both layouts, starting/defaulting with the single work tree layout, and then allowing the user to switch to multiple if they so choose.  E.g. "repo init --multi -u ..." would immediately use the multiple layout you proposed several messages back.  "git new-workdir" is still in contrib for a reason: it's easy to get confused and wind up with a branch checked out in two places at once, etc.  As it is, repo and git give you enough rope to hang yourself; let's at least make it an advanced user option to ask for a gun along with that rope.  Moving from the default/single layout to multi should just be "repo init --multi" in the existing client, and only require moving the top level directories and fixing some symlinks.  So you wouldn't even lose your build products, or need to recompile the world.

But in either layout, the "top level" should be the manifest project, as you have been arguing for.
So the "single layout" would be like this:

.repo
.repo/manifest.git -> clone of the manifest repository without working tree
.repo/repo -> repo.git clone with working tree
.repo/projects/*/*.git
.git/* -> Linked working tree to .repo/manifest.git
.git/config -> Generated content, listing submodules as relative urls to .repo/projects/*/*.git

.gitmodules -> File checkout from .repo/manifest.git
.gerrit -> File checkout from .repo/manifest.git
dir1 ... dirN -> Submodule working trees, essentially unchanged from the current layout.

This makes the conversion of the current workspaces very easy. A repo sync can first update the manifest repository, then generate the .git/* alongside .repo, then execute the new sync algorithm by looking at .gitmodules and the gitlink entries.
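A minimal sketch of two of the steps just described, using stock git commands and assuming the single layout: writing submodule URLs into `.git/config` that point into `.repo/projects/<name>.git`, and listing the gitlink entries a new sync would act on. The helper names `gen_submodule_urls` and `list_gitlinks` are hypothetical; the submodule names and SHA-1 values are taken from the `.gitmodules` example at the start of this thread. The URL is written as the top-relative path `.repo/projects/<name>.git`, following the layout line above; how repo would actually spell these URLs is not settled here, and the sketch stops short of the checkout itself.

```sh
#!/bin/sh
# Sketch of two steps from the single layout, using stock git only.
set -e

# Hypothetical helper: for every submodule in the checked-out .gitmodules,
# write submodule.<name>.url into .git/config as .repo/projects/<name>.git.
gen_submodule_urls() {  # $1 = top-level work tree
  git -C "$1" config --file .gitmodules --name-only --get-regexp '^submodule\..*\.path$' |
  while read -r key; do
    name=${key#submodule.}
    name=${name%.path}
    git -C "$1" config "submodule.$name.url" ".repo/projects/$name.git"
  done
}

# Hypothetical helper: print "<path> <SHA-1>" for every gitlink (mode 160000)
# in the top-level index; these are the entries a sync would check out.
list_gitlinks() {  # $1 = top-level work tree
  git -C "$1" ls-files --stage | while read -r mode sha stage subpath; do
    if [ "$mode" = 160000 ]; then
      printf '%s %s\n' "$subpath" "$sha"
    fi
  done
}

top=$(mktemp -d)
git init -q "$top"
git -C "$top" config --file=.gitmodules submodule.platform/build.path build
git -C "$top" config --file=.gitmodules submodule.platform/dalvik.path dalvik
git -C "$top" add .gitmodules
git -C "$top" update-index --add --cacheinfo 160000,9537884b0dabe81bf612c79d12c7b4bf40de10a5,build
git -C "$top" update-index --add --cacheinfo 160000,dbd5c902c3cca79e22d0f38539cc195ef7395647,dalvik

gen_submodule_urls "$top"
git -C "$top" config --get-regexp '^submodule\.'   # the generated URLs
list_gitlinks "$top"                                 # one line per gitlink
```

Keeping these URLs in `.git/config` rather than in the tracked `.gitmodules` keeps client-local paths out of the shared manifest, which is the same split `git submodule init` makes.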

Are we on the same page here?

Yes, but it's also a throwback to BitKeeper for some.  Committing a file, and then committing again to create the change set.  It's a very awkward UI.
Well, I thought that repo could provide a frontend to make this easier. In fact a repo commit in the top level project could commit recursively in all projects with the appropriate topic branch, for example.

Or repo commit in a subproject could automatically execute "git add $subproject" in the top level project. Also, repo upload could check for commits in subproject topic branches that are not reflected in the top level project.
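The two behaviours suggested here can be approximated with plain git, as a sketch: `git add <subproject>` in the top level records the subproject's current HEAD as its gitlink, and comparing each gitlink with the subproject's HEAD finds commits not yet reflected in the top level. `unrecorded_subprojects` is a hypothetical helper name, not an existing repo command.

```sh
#!/bin/sh
# Sketch: record subproject commits in the top level, and detect
# subproject commits that the top level does not reference yet.
set -e
g() { git -c user.name=demo -c user.email=demo@example.com "$@"; }

# Hypothetical helper. Prints each gitlink path whose checked-out HEAD
# differs from the commit recorded in the top-level index.
unrecorded_subprojects() {  # $1 = top-level work tree
  git -C "$1" ls-files --stage | while read -r mode sha stage subpath; do
    if [ "$mode" = 160000 ] && [ "$(git -C "$1/$subpath" rev-parse HEAD)" != "$sha" ]; then
      echo "$subpath"
    fi
  done
}

top=$(mktemp -d)
git init -q "$top"
git init -q "$top/build"
g -C "$top/build" commit -q --allow-empty -m "first"
git -C "$top" add build                  # records build's HEAD as a gitlink

unrecorded_subprojects "$top"            # prints nothing
g -C "$top/build" commit -q --allow-empty -m "second"
unrecorded_subprojects "$top"            # prints "build"
git -C "$top" add build                  # what "repo commit" might do for you
unrecorded_subprojects "$top"            # prints nothing again
```

Whether repo should run the `git add` automatically is the open question in this exchange; the sketch only shows that the underlying git operations are simple.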

We should probably think this through systematically by collecting all the relevant use cases to have a consistent experience for developers.
 

If you haven't yet looked at the repo code, I'm going to warn you, it isn't pretty.  I learned Python by writing it.  I know there are more elegant ways to write the code.  I know some of the style is inconsistent.  It's, uh, not my proudest moment.  And since it works well enough for the folks that are using it today, it doesn't get any attention to clean it up... my time goes to Gerrit.  So, uh, apologies in advance for the sad state that the code is currently in.
I already had a peek before we started this discussion, so I know what to expect. I don't think it is in too bad shape. I saw that you already started with the preparations for the manifest format change.

Best Regards,
Gergely

Shawn Pearce

May 29, 2009, 4:36:10 PM
to repo-d...@googlegroups.com
On Fri, May 29, 2009 at 12:15, Gergely Kis <gerge...@gmail.com> wrote:
On Fri, May 29, 2009 at 4:45 PM, Shawn Pearce <s...@google.com> wrote:
After sleeping on it, I have changed my mind, and I think we are now in agreement.  I've moved over to your idea that the top level should just be a git repository working directory.  Well, actually, it depends on whether or not you use the "multiple parallel work trees" approach.

I think we should support both layouts, starting/defaulting with the single work tree layout, and then allowing the user to switch to multiple if they so choose.  E.g. "repo init --multi -u ..." would immediately use the multiple layout you proposed several messages back.  "git new-workdir" is still in contrib for a reason: it's easy to get confused and wind up with a branch checked out in two places at once, etc.  As it is, repo and git give you enough rope to hang yourself; let's at least make it an advanced user option to ask for a gun along with that rope.  Moving from the default/single layout to multi should just be "repo init --multi" in the existing client, and only require moving the top level directories and fixing some symlinks.  So you wouldn't even lose your build products, or need to recompile the world.

But in either layout, the "top level" should be the manifest project, as you have been arguing for.
So the "single layout" would be like this:

.repo
.repo/manifest.git -> clone of the manifest repository without working tree
.repo/repo -> repo.git clone with working tree
.repo/projects/*/*.git
.git/* -> Linked working tree to .repo/manifest.git
.git/config -> Generated content, listing submodules as relative urls to .repo/projects/*/*.git

.gitmodules -> File checkout from .repo/manifest.git
.gerrit -> File checkout from .repo/manifest.git
dir1 ... dirN -> Submodule working trees, essentially unchanged from the current layout.

This makes the conversion of the current workspaces very easy. A repo sync can first update the manifest repository, then generate the .git/* alongside .repo, then execute the new sync algorithm by looking at .gitmodules and the gitlink entries.

Are we on the same page here?

Yes, absolutely.
 
Yes, but it's also a throwback to BitKeeper for some.  Committing a file, and then committing again to create the change set.  It's a very awkward UI.
Well, I thought that repo could provide a frontend to make this easier. In fact a repo commit in the top level project could commit recursively in all projects with the appropriate topic branch, for example.

Or repo commit in a subproject could automatically execute "git add $subproject" in the top level project. Also, repo upload could check for commits in subproject topic branches that are not reflected in the top level project.

We should probably think this through systematically by collecting all the relevant use cases to have a consistent experience for developers.

I'm still semi-against a "repo commit" for all projects in one shot.  There are cases where you don't want that.  E.g. the webkit/browser projects.  Folks work on both sometimes.  The commit message in external/webkit needs to be something relevant to *webkit* not to Android, because they work very hard to push that all back upstream to webkit.org.  Having a concise message that makes sense in the context of webkit at the time the code change is made helps them present that change upstream a week or two later once they've decided that really is the right course of action to take.

Meanwhile, in frameworks/base and the contacts provider and the contacts application, these are still so tightly tied together in Android that often a commit in all 3 should just carry the same message, and needs to be atomic across all 3 to prevent build breakages due to submit timing.  So here it does make sense.

There are good arguments for both.

If you haven't yet looked at the repo code, I'm going to warn you, it isn't pretty.  I learned Python by writing it.  I know there are more elegant ways to write the code.  I know some of the style is inconsistent.  It's, uh, not my proudest moment.  And since it works well enough for the folks that are using it today, it doesn't get any attention to clean it up... my time goes to Gerrit.  So, uh, apologies in advance for the sad state that the code is currently in.
I already had a peek before we started this discussion, so I know what to expect. I don't think it is in too bad shape. I saw that you already started with the preparations for the manifest format change.

I still have more code to push out.  I started that stuff a while ago, but couldn't get around to testing it enough to get it out there.  Right now I'm working on a bug in some code I did that split the "revision" property of a manifest into two fields, SHA-1 and revision, so that in the submodule style manifest we have access to both the commit the gitlink refers to, and also to the project's remote branch name, if one is specified for gerrit to auto-update.
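To illustrate the split between the gitlink SHA-1 and the branch name, here is a sketch that reads both halves for one project: the commit from the gitlink in the top-level index, and the branch from a `submodule.<name>.revision` key in `.gitmodules` (the key this proposal uses; `git submodule` itself does not read it). `project_revision` is a hypothetical helper, not repo's actual code, and the SHA-1 values are borrowed from the example at the start of the thread.

```sh
#!/bin/sh
# Sketch: read both halves of a project's "revision" from a
# submodule-style manifest.
set -e

# Hypothetical helper. $1 = top-level work tree, $2 = submodule name.
# Prints "<gitlink SHA-1> <branch>"; the branch part is empty if unset.
project_revision() {
  subpath=$(git -C "$1" config --file .gitmodules "submodule.$2.path")
  branch=$(git -C "$1" config --file .gitmodules "submodule.$2.revision" || true)
  sha=$(git -C "$1" ls-files --stage -- "$subpath" | cut -d' ' -f2)
  printf '%s %s\n' "$sha" "$branch"
}

top=$(mktemp -d)
git init -q "$top"
git -C "$top" config --file=.gitmodules submodule.platform/build.path build
git -C "$top" config --file=.gitmodules submodule.platform/build.revision master
git -C "$top" config --file=.gitmodules submodule.platform/dalvik.path dalvik
git -C "$top" update-index --add --cacheinfo 160000,9537884b0dabe81bf612c79d12c7b4bf40de10a5,build
git -C "$top" update-index --add --cacheinfo 160000,dbd5c902c3cca79e22d0f38539cc195ef7395647,dalvik

project_revision "$top" platform/build    # gitlink SHA-1 plus "master"
project_revision "$top" platform/dalvik   # gitlink SHA-1 only; no revision key
```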

skillzero

May 29, 2009, 6:38:23 PM
to Repo and Gerrit Discussion
On May 29, 1:36 pm, Shawn Pearce <s...@google.com> wrote:
>
> I still have more code to push out.  I started that stuff a while ago, but
> couldn't get around to testing it enough to get it out there.  Right now I'm
> working on a bug in some code I did that split the "revision" property of a
> manifest into two fields, SHA-1 and revision, so that in the submodule style
> manifest we have access to both the commit the gitlink refers to, and also
> to the project's remote branch name, if one is specified for gerrit to
> auto-update.

You mentioned a few days ago about supporting a "floating" branch such
that it would always bring in the tip of the specified branch even if
that made the SHA-1 in the manifest/modules file stale. Are you still
planning on supporting this?

I ask because it seems like the only alternative to this would be the
custom update hook you mentioned that did a commit to each of the
super project manifest/module files on every update of a sub-project.
I thought more about that and I'm not sure it would work in my case
(which I realize may not be very important to others, but I figured
I'd mention it anyway because I'm selfish in my desire to use repo and
maybe others want to do similar things). The update hook for each sub
project would need to know about every super project and many of the
sub project owners wouldn't have (or be allowed to have) knowledge of
some of the super projects. And some super project owners may not have
write access to the update hook for each sub project. For example, a
GitHub-based super project might have a sub project that's a GitHub-
based repository from some other user.

But assuming the floating/stale thing is allowed, it sounds like the
main change from a usage standpoint with the new manifest format is
that instead of creating the manifest repository, the super project
would just be a git repository where you do 'git submodule add' for
each sub project then create another file to specify the branch names
of each sub project (since I didn't think git submodule let you
specify a branch name)?

Once that was set up, if I commit a change to a sub project and push
it to the sub project's repository and then somebody does a 'repo
sync' of the super project, they'd automatically get that newly
committed/pushed sub project change, right?

Shawn Pearce

Jun 3, 2009, 10:52:44 AM
to repo-d...@googlegroups.com
On Fri, May 29, 2009 at 15:38, skillzero <skil...@gmail.com> wrote:

On May 29, 1:36 pm, Shawn Pearce <s...@google.com> wrote:
>
> I still have more code to push out.  I started that stuff a while ago, but
> couldn't get around to testing it enough to get it out there.  Right now I'm
> working on a bug in some code I did that split the "revision" property of a
> manifest into two fields, SHA-1 and revision, so that in the submodule style
> manifest we have access to both the commit the gitlink refers to, and also
> to the project's remote branch name, if one is specified for gerrit to
> auto-update.

You mentioned a few days ago about supporting a "floating" branch such
that it would always bring in the tip of the specified branch even if
that made the SHA-1 in the manifest/modules file stale. Are you still
planning on supporting this?

Yes.  I just don't know how we'll activate it.

Maybe it will be an option to "repo init", e.g. "repo init --enable-floating-revision".

But assuming the floating/stale thing is allowed, it sounds like the
main change from a usage standpoint with the new manifest format is
that instead of creating the manifest repository, the super project
would just be a git repository where you do 'git submodule add' for
each sub project then create another file to specify the branch names
of each sub project (since I didn't think git submodule let you
specify a branch name)?

Yes, but it's in the .gitmodules file, so you can set the branch name with

  git config --file=.gitmodules submodule.$name.revision master

 
Once that was set up, if I commit a change to a sub project and push
it to the sub project's repository and then somebody does a 'repo
sync' of the super project, they'd automatically get that newly
committed/pushed sub project change, right?

If they had floating revisions enabled in the client, yes.
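A sketch of the choice a sync might make with and without floating revisions: pinned mode uses the commit recorded in the gitlink, floating mode uses the tip of the branch named by `submodule.<name>.revision`. In a real client the tip would come from a fetched remote-tracking branch; here a local branch in the subproject stands in for it. `sync_target` and its yes/no switch are hypothetical; the thread only suggests `repo init --enable-floating-revision` as one possible way to turn this on.

```sh
#!/bin/sh
# Sketch: pick the commit a sync would check out for one project.
set -e
g() { git -c user.name=demo -c user.email=demo@example.com "$@"; }

# Hypothetical helper. $1 = top-level work tree, $2 = submodule name,
# $3 = "yes" when floating revisions are enabled in this client.
sync_target() {
  subpath=$(git -C "$1" config --file .gitmodules "submodule.$2.path")
  branch=$(git -C "$1" config --file .gitmodules "submodule.$2.revision" || true)
  if [ "$3" = yes ] && [ -n "$branch" ]; then
    # Floating: tip of the branch (a local branch stands in for the
    # remote-tracking branch a real sync would have fetched).
    git -C "$1/$subpath" rev-parse "refs/heads/$branch"
  else
    # Pinned: the commit recorded by the gitlink in the top-level index.
    git -C "$1" ls-files --stage -- "$subpath" | cut -d' ' -f2
  fi
}

top=$(mktemp -d)
git init -q "$top"
git init -q "$top/build"
git -C "$top/build" symbolic-ref HEAD refs/heads/master
g -C "$top/build" commit -q --allow-empty -m "recorded in the manifest"
git -C "$top" add build
git -C "$top" config --file=.gitmodules submodule.platform/build.path build
git -C "$top" config --file=.gitmodules submodule.platform/build.revision master
pinned=$(git -C "$top/build" rev-parse HEAD)

# Someone pushes a newer commit to the project's branch.
g -C "$top/build" commit -q --allow-empty -m "newer commit"
tip=$(git -C "$top/build" rev-parse HEAD)

sync_target "$top" platform/build no    # prints the pinned SHA-1
sync_target "$top" platform/build yes   # prints the branch tip SHA-1
```

With floating revisions off, every client syncing the same manifest commit gets the same tree; turning them on trades that reproducibility for always picking up the latest pushed change.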
