General Manifest/Gerrit Questions


Mike

Mar 4, 2009, 12:28:18 PM
to Repo and Gerrit Discussion
I have a few questions about the manifest.xml file format and adding
projects to Gerrit.

First, is there any documentation on the manifest.xml format?

When inserting a project into the projects table of the Gerrit
Postgres DB, is it best to use the path or the name attribute from the
project tag of the manifest file? Usually the path contains a full
path, whereas the name is relative to that parent directory. I'm
guessing the full path relative to the repository root (i.e. the value
saved in the config table), but I just want to be sure.

When setting up an android mirror, would you insert each project that
has a <project> tag in the manifest file into the Gerrit projects and
branches table? Is this normally dependent on usage; in other words,
only add the projects that your team is going to use?

Is there ever more than one manifest file in a repo repository?

Finally, is the purpose of the add-branch UI tool to add branches that
already exist in the repository or to create a new branch altogether?

Thanks,
Mike

Shawn Pearce

Mar 4, 2009, 1:18:10 PM
to repo-d...@googlegroups.com
On Wed, Mar 4, 2009 at 09:28, Mike <msw...@gmail.com> wrote:

> I have a few questions about the manifest.xml file format and adding
> projects to Gerrit.
>
> First, is there any documentation on the manifest.xml format?

Yes.

  http://android.git.kernel.org/?p=tools/repo.git;a=blob;f=docs/manifest-format.txt;hb=HEAD

or see

  .repo/repo/docs/manifest-format.txt

in any repo client.

> When inserting a project into the projects table of the Gerrit
> Postgres DB, is it best to use the path or the name attribute from the
> project tag of the manifest file?

The path.  "repo upload" uses the path value of the project element to communicate the destination to Gerrit.  Gerrit takes that as-is and does a query on the projects table.  So yea, project "name" in Gerrit is misnamed in the database relative to the repo manifest.  I never noticed it before either.

I'll consider renaming it to path in the projects table.  It's a relatively painless change.  It's just YetAnotherSchemaUpgrade.  We seem to have one every release.  :)
 
> When setting up an android mirror, would you insert each project that
> has a <project> tag in the manifest file into the Gerrit projects and
> branches table?  Is this normally dependent on usage; in other words,
> only add the projects that your team is going to use?

You can do either.  Google is putting every project into our internal Gerrit, Just In Case We Need It(tm).  We don't expect to need to make a change to every project on the internal server.  Most changes would just be on the external one.  But we might as well import all of them.

Anyone else have suggestions for Mike?
 
> Is there ever more than one manifest file in a repo repository?

No.  There is only one manifest.  You can augment it with the .repo/local_manifest.xml file, but this is local to the repo client and isn't distributed.

You can have multiple manifest files per manifest Git repository and select them with the -m flag to repo init.  But I think this is an obscure feature that perhaps nobody uses.

> Finally, is the purpose of the add-branch UI tool to add branches that
> already exist in the repository or to create a new branch altogether?

It's to create new branches in an already existing repository, basing them on an existing commit SHA-1.  E.g. if you need to fork off to put risky changes somewhere, you need to make the branch before you can upload the change for review.

Mike

Mar 4, 2009, 1:21:35 PM
to Repo and Gerrit Discussion
Excellent!! Thank you very much (once again).


Shawn Pearce

Mar 4, 2009, 3:54:42 PM
to repo-d...@googlegroups.com
On Wed, Mar 4, 2009 at 10:18, Shawn Pearce <s...@google.com> wrote:
On Wed, Mar 4, 2009 at 09:28, Mike <msw...@gmail.com> wrote:

> > When inserting a project into the projects table of the Gerrit
> > Postgres DB, is it best to use the path or the name attribute from the
> > project tag of the manifest file?
>
> The path.

WTF.  Sorry.  I did not consume enough caffeine before writing.

repo upload uses the project *NAME*, not the path.  Ignore what I said earlier.

> I'll consider renaming it to path in the projects table.  It's a relatively painless change.  It's just YetAnotherSchemaUpgrade.  We seem to have one every release.  :)

Thus no schema change.  The project "name" field in the database matches with the project "name" attribute in the XML.
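
To make the name-versus-path point concrete, here is a minimal Python sketch that pulls the "name" attribute out of each <project> element, which per the correction above is the value that matches the Gerrit projects table. The manifest fragment, the project names, and the function gerrit_project_names are made up for illustration; this is not how repo or Gerrit does it internally.

```python
import xml.etree.ElementTree as ET

# Hypothetical manifest fragment; names and paths are invented examples.
MANIFEST_XML = """\
<manifest>
  <project path="kernel" name="platform/kernel" />
  <project path="tools/repo" name="tools/repo" />
</manifest>
"""


def gerrit_project_names(manifest_text):
    """Return the name attribute of every <project> element, in document order."""
    root = ET.fromstring(manifest_text)
    return [project.attrib["name"] for project in root.iter("project")]


names = gerrit_project_names(MANIFEST_XML)
for name in names:
    # The name, not the checkout path, is what would be inserted into Gerrit.
    print(name)
```

Note that the first project is checked out at "kernel" on the client but would be registered in Gerrit under its name.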

Seriously.  I needed to drink more of my coffee before replying.  Sorry.

Mike

Mar 4, 2009, 4:24:28 PM
to Repo and Gerrit Discussion
I think I just confused you by saying "Usually the path contains a
full path; whereas, the name is relative to that parent directory." :)

Anyway, no problem, that makes sense.

I have another question. My company will have several projects going
at the same time for different customers. It's necessary to keep all
the development efforts completely separate for the customers as
well. I just pulled down a repo repository that has something like 70
sub-projects inside of it. If we have several going at the same time,
we might have something like 300 projects in Gerrit within the first
year or so of using it.

Is there a way to create wildcard style access rules? For example, if
you have
Project/proj1
Project/proj2
Project/proj3

Could you use a wildcard in the database to assign some user group to
have "verify" access to Project/*?

Thanks,
Mike


Shawn Pearce

Mar 4, 2009, 4:33:18 PM
to repo-d...@googlegroups.com
On Wed, Mar 4, 2009 at 13:24, Mike <msw...@gmail.com> wrote:
> I have another question.  My company will have several projects going
> at the same time for different customers.  It's necessary to keep all
> the development efforts completely separate for the customers as
> well.  I just pulled down a repo repository that has something like 70
> sub-projects inside of it.  If we have several going at the same time,
> we might have something like 300 projects in Gerrit within the first
> year or so of using it.

Yea, easily.  AOSP is up to 139 hosted on the site...
 
> Is there a way to create wildcard style access rules?  For example, if
> you have
> Project/proj1
> Project/proj2
> Project/proj3
>
> Could you use a wildcard in the database to assign some user group to
> have "verify" access to Project/*?

No.  But I was thinking about that this morning.  There are sections of the AOSP where we just want more or less the same rule across the entire subtree, e.g. kernel/* or tools/*.
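
For illustration only, the kind of rule being asked about could be sketched in Python as below. Gerrit does not offer this, per the answer above; the rule tuples, the group name, and the groups_with_access function are hypothetical, and fnmatch-style globbing is just one possible matching scheme.

```python
from fnmatch import fnmatchcase

# Hypothetical rules: (project name pattern, group, access category).
RULES = [
    ("Project/*", "customer-a-verifiers", "verify"),
]


def groups_with_access(project, category, rules=RULES):
    """Return the sorted groups whose pattern matches the project for a category."""
    return sorted(
        {
            group
            for pattern, group, rule_category in rules
            if rule_category == category and fnmatchcase(project, pattern)
        }
    )


print(groups_with_access("Project/proj1", "verify"))
print(groups_with_access("Other/proj1", "verify"))
```

One rule then covers Project/proj1, Project/proj2 and Project/proj3 instead of one row per project. Be aware that fnmatch's "*" also matches "/", so Project/* would cover nested names as well.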

Mike

Mar 5, 2009, 12:02:10 PM
to Repo and Gerrit Discussion
On Mar 4, 1:33 pm, Shawn Pearce <s...@google.com> wrote:
> Yea, easily.  AOSP is up to 139 hosted on the site...

The System Design document lists these numbers in the Scalability
section:

Parameter            Estimated Maximum
Projects             500
Contributors         2,000
Changes/Day          400
Revisions/Change     2.0
Files/Change         4
Comments/File        2
Reviewers/Change     1.0

Is the "500 projects" number still an accurate estimation?

Shawn Pearce

Mar 5, 2009, 12:19:06 PM
to repo-d...@googlegroups.com
On Thu, Mar 5, 2009 at 09:02, Mike <msw...@gmail.com> wrote:

> On Mar 4, 1:33 pm, Shawn Pearce <s...@google.com> wrote:
> > Yea, easily.  AOSP is up to 139 hosted on the site...
>
> The System Design document lists these numbers in the Scalability
> section:
>
> Parameter/Estimated Maximum
> Projects 500
>
> Is the "500 projects" number still an accurate estimation?

Maybe.  I don't know.

I never intended for Gerrit to house, say, 8 complete copies of the AOSP under different directory paths.  I always assumed it would only need to house one copy.  AOSP may grow to 500 projects, but I think it will take many years for it to reach that level.  Although it's currently ~200...

Projects are actually quite lightweight; each is just a record in the projects table.  The big footprint is the disk space in the Git repositories directory.

The project management UI may start to choke with 500 projects on it.  That screen doesn't paginate or anything.  Loading that many records may cause your browser to freeze up for a few seconds while it parses the JSON and creates the HTML.

But the number of projects in the database has almost no bearing otherwise on how Gerrit performs.

Shawn Pearce

Mar 5, 2009, 12:30:19 PM
to repo-d...@googlegroups.com
On Thu, Mar 5, 2009 at 09:19, Shawn Pearce <s...@google.com> wrote:
On Thu, Mar 5, 2009 at 09:02, Mike <msw...@gmail.com> wrote:

> The System Design document lists these numbers in the Scalability
> section:
>
> Parameter/Estimated Maximum
> Projects 500
>
> Is the "500 projects" number still an accurate estimation?

I think, aside from that project screen when you are in the "Administrators" group and can see *every* project, Gerrit could scale into 20,000+ projects without blinking.  All of the database queries run on single index range scans or equality scans, and most stuff dealing with a project runs on either the unique project id, or the unique project name.

Where you'd definitely run into a problem is the local filesystem.  20,000 projects may not fit on a single filesystem.  They may not fit into a directory without causing huge slowdowns in the kernel's VFS layer as it tries to parse through the directory entries during filesystem access.  To really get into this realm we would want to implement a translation function of some sort to hash the repository directory on disk, to spread the load around.  That might then break gitweb and git daemon URL references, unless gitweb and git daemon also knew how to do the hashing, or the hashed path was used instead of the project name.

Right now I don't need to get into that 20,000 project range with Gerrit.  But it shouldn't be much more than a few days work to implement hashing and some sane translation.  Most of it is isolated to RepositoryCache, and then dealing with that gitweb/git-daemon reference.
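
A rough sketch of the kind of translation function described above, written in Python rather than as Gerrit code. The SHA-1 bucket scheme, the two-character fan-out, the base directory, and the function name hashed_repo_path are all assumptions for illustration; the thread does not specify a scheme.

```python
import hashlib
from pathlib import PurePosixPath


def hashed_repo_path(base_dir, project_name):
    """Map a project name to an on-disk path under a hash-derived bucket.

    The first two hex digits of the SHA-1 of the name pick one of 256
    bucket directories, so no single directory has to hold every repository.
    """
    digest = hashlib.sha1(project_name.encode("utf-8")).hexdigest()
    bucket = digest[:2]
    return PurePosixPath(base_dir) / bucket / (project_name + ".git")


# Hypothetical base directory and project name.
example = hashed_repo_path("/srv/git", "platform/kernel")
print(example)
```

The mapping is deterministic, so any tool that knows the function can find the repository from the project name alone. As noted above, gitweb and git daemon would need the same function, or would have to be handed the hashed path directly.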