Create-project extra code

Lincoln

Aug 30, 2010, 10:07:44 AM
to Repo and Gerrit Discussion
Hi Shawn,

Currently, SE has a script that does some extra "stuff" when a project
is created on Gerrit.
This script adds an initial commit to the new git repository, so that
you can immediately include it in a repo manifest.
However, we would like to move this "fix" to a proper place.

Where do you think is the best place to fix this?

Thanks,
Lincoln

Shawn Pearce

Aug 30, 2010, 10:13:57 AM
to Lincoln, Repo and Gerrit Discussion

Just add it to create-project with a new command line option:
--empty-commit. The code for this in JGit is relatively simple:

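    import java.io.IOException;
    import org.eclipse.jgit.lib.CommitBuilder;
    import org.eclipse.jgit.lib.Constants;
    import org.eclipse.jgit.lib.ObjectId;
    import org.eclipse.jgit.lib.ObjectInserter;
    import org.eclipse.jgit.lib.RefUpdate;

    // repo (Repository) and serverIdent (PersonIdent) are provided by
    // the surrounding create-project command.
    ObjectInserter oi = repo.newObjectInserter();
    try {
      CommitBuilder cb = new CommitBuilder();
      // Insert an empty tree and use it as the commit's root tree.
      cb.setTreeId(oi.insert(Constants.OBJ_TREE, new byte[] {}));
      cb.setAuthor(serverIdent);
      cb.setCommitter(serverIdent);
      cb.setMessage("Initial empty repository");

      ObjectId id = oi.insert(cb);
      oi.flush();

      // HEAD is symbolic (normally refs/heads/master), so this creates
      // the branch that HEAD points at.
      RefUpdate ru = repo.updateRef(Constants.HEAD);
      ru.setNewObjectId(id);
      switch (ru.update()) {
        case NEW:
          break; // the branch now exists with the empty commit
        default:
          throw new IOException("Cannot create initial commit: " + ru.getResult());
      }
    } finally {
      oi.release(); // named close() in newer JGit versions
    }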
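As a quick check on the snippet above: `oi.insert(Constants.OBJ_TREE, new byte[] {})` writes Git's empty tree, and Git names every object by the SHA-1 of a `<type> <size>` header, a NUL byte, and the content. The standard-library sketch below (class and method names are illustrative, not JGit API) computes that ID, which should match the tree of the initial commit.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class EmptyTreeId {
  // Git object ID: SHA-1 over "<type> <length>\0" followed by the content.
  static String objectId(String type, byte[] content) {
    MessageDigest sha1;
    try {
      sha1 = MessageDigest.getInstance("SHA-1");
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException("SHA-1 not available", e);
    }
    sha1.update((type + " " + content.length + "\0").getBytes(StandardCharsets.US_ASCII));
    sha1.update(content);
    StringBuilder hex = new StringBuilder();
    for (byte b : sha1.digest()) {
      hex.append(String.format("%02x", b));
    }
    return hex.toString();
  }

  public static void main(String[] args) {
    // An empty tree has no entries, so its content is zero bytes long.
    System.out.println(objectId("tree", new byte[0]));
    // prints 4b825dc642cb6eb9a060e54bf8d69288fbee4904
  }
}
```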

Lincoln

Aug 30, 2010, 10:40:00 AM
to Repo and Gerrit Discussion
Hi Shawn, thanks for the fast response!

In the patch containing this fix, we also want to include two more
things that the script currently does:

One of them is related to this discussion:
http://groups.google.com/group/repo-discuss/browse_thread/thread/b8ba2537a3669535/f7c8916189e23aa6?lnk=gst&q=Replication+problems

The general idea is to create a new entry in the "remote" section of
replication.config, so that the user can create a new project
remotely through an option (for example --slave, as you suggested).

The other thing: we would like to be able to list projects that are
already parents of another project; they are likely parent candidates
for new projects.

In the UI it's pretty easy to add this (we are going to include it in
the create-project UI patch), but what do you think about the
following solution for the command line?

We could add a "--suggestparent" option to the create-project command.
If the user creates a project with this option, the parent candidates
would be shown to the user (like an interactive command line).
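The parent-candidate idea can be sketched with plain Java collections. This sketch assumes, hypothetically, that each project's parent is available as a map from project name to parent name (in Gerrit the data would come from its project records), and `ParentCandidates`/`candidates` are illustrative names. Ranking by number of children is an extra assumption, not part of the proposal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ParentCandidates {
  // parentOf maps each project name to its parent's name (null if none).
  // Returns every project that is already the parent of at least one other
  // project, most children first, ties broken alphabetically.
  static List<String> candidates(Map<String, String> parentOf) {
    Map<String, Integer> childCount = new TreeMap<>();
    for (String parent : parentOf.values()) {
      if (parent != null) {
        childCount.merge(parent, 1, Integer::sum);
      }
    }
    List<String> result = new ArrayList<>(childCount.keySet());
    result.sort((a, b) -> {
      int byCount = Integer.compare(childCount.get(b), childCount.get(a));
      return byCount != 0 ? byCount : a.compareTo(b);
    });
    return result;
  }

  public static void main(String[] args) {
    Map<String, String> parentOf = new TreeMap<>();
    parentOf.put("platform/apps", "android-base");
    parentOf.put("platform/base", "android-base");
    parentOf.put("tools/repo", "tools-base");
    parentOf.put("android-base", null);
    System.out.println(candidates(parentOf)); // [android-base, tools-base]
  }
}
```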

Please, let me know what you think about this :)

Thanks
Lincoln

Shawn Pearce

Aug 30, 2010, 10:55:01 AM
to Lincoln, Repo and Gerrit Discussion
On Mon, Aug 30, 2010 at 07:40, Lincoln
<lincoln.oliveirac...@sonyericsson.com> wrote:
>
> The general idea is to create a new entry for the "remote" section in
> the replication.config, so that
> the user can create a new project remotely through a option (--slave,
> as you suggested, for example).

Adding anything to replication.config is going to be a challenge:
PushReplication doesn't reload that file until the server restarts.
If we start modifying the file on the fly for the administrator, we
also need to update our internal structures to reflect that change.

> The other thing: we would like to have the possibility to list
> projects that already are parents to another project --- they are
> likely parent candidates for new projects.
>
> In the UI it's pretty easy to add this (we are gonna to include this
> in the create-project UI patch)), but what do you think about the
> following solution for command-line?
>
> We could create an option "--suggestparent" in the create-project
> command
> and If the user creates a project with this option the parent
> candidates would be shown to the user (like an interactive command-
> line).

Being interactive worries me a little. While we are waiting for the
user to make their choice we are tying up a worker thread, and there
are only so many worker threads available in a server. Not everyone
can create a project, but if enough users try to create a project
interactively at the same time, they could starve out every other user
on the system. That means no git fetch/repo sync/repo upload/etc.
:-(

If we are going to be interactive, we might want to force
CreateProject to use its own thread, similar to the way other
@AdminHighPriorityCommand implementations already do. That means
reworking some of the logic in BaseCommand's startThread.

If it's going to be interactive, maybe a better name for the option
is --select-parent?
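The thread-starvation concern can be illustrated with java.util.concurrent. In this hypothetical sketch (not Gerrit's BaseCommand or work-queue code; `CommandDispatcher`, `submit`, and the `interactive` flag are made-up names), ordinary commands share a small fixed pool, while a command marked interactive gets its own thread, so a user sitting at a prompt cannot occupy one of the shared workers.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.FutureTask;
import java.util.concurrent.TimeUnit;

class CommandDispatcher {
  private final ExecutorService workers;

  CommandDispatcher(int poolSize) {
    workers = Executors.newFixedThreadPool(poolSize);
  }

  // Interactive commands may block for a long time waiting on user input,
  // so they run on a dedicated thread instead of a shared worker.
  Future<?> submit(Runnable command, boolean interactive) {
    if (!interactive) {
      return workers.submit(command);
    }
    FutureTask<Void> task = new FutureTask<>(command, null);
    Thread t = new Thread(task, "interactive-command");
    t.setDaemon(true);
    t.start();
    return task;
  }

  void shutdown() {
    workers.shutdown();
  }

  public static void main(String[] args) throws Exception {
    CommandDispatcher d = new CommandDispatcher(1); // only one shared worker
    CountDownLatch userAnswered = new CountDownLatch(1);

    // Simulates create-project waiting for the user to pick a parent.
    d.submit(() -> {
      try {
        userAnswered.await();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }, true);

    // A fetch is still served by the single shared worker.
    Future<?> fetch = d.submit(() -> System.out.println("fetch served"), false);
    fetch.get(5, TimeUnit.SECONDS);

    userAnswered.countDown();
    d.shutdown();
  }
}
```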

Nasser Grainawi

Aug 30, 2010, 11:39:02 AM
to Shawn Pearce, Lincoln, Repo and Gerrit Discussion

If you really don't want interactive you could have --suggestparent just
return with a list of suggested parents, then you'd have to re-run the
command with --parent <one of the suggestions> to actually create the
project. I considered hacking something similar to this into our
create-project wrapper script, but having this (or Lincoln's suggestion)
implemented natively would be better.
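Nasser's non-interactive variant can be sketched as a plain argument check: with --suggestparent the command only prints candidates and exits without creating anything, and the user then re-runs it with --parent. The option names come from this thread; `CreateProjectFlow`, `run`, the returned messages, and the "(server default)" fallback are hypothetical, and real option parsing would go through Gerrit's own command framework.

```java
import java.util.Arrays;
import java.util.List;

class CreateProjectFlow {
  // Returns the text the command would print. Nothing here waits on user
  // input, so no worker thread is held while the user decides.
  static String run(List<String> args, List<String> candidates) {
    if (args.contains("--suggestparent")) {
      // Step 1: list candidates and exit; no project is created.
      return "Suggested parents: " + String.join(", ", candidates);
    }
    // Step 2: the user re-runs the command with an explicit --parent.
    int i = args.indexOf("--parent");
    String parent = (i >= 0 && i + 1 < args.size()) ? args.get(i + 1) : "(server default)";
    return "Creating project with parent " + parent;
  }

  public static void main(String[] args) {
    List<String> candidates = Arrays.asList("android-base", "tools-base");
    System.out.println(run(Arrays.asList("--suggestparent"), candidates));
    System.out.println(run(Arrays.asList("--parent", "android-base"), candidates));
  }
}
```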

Nasser

--
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum

Lincoln

Aug 30, 2010, 11:56:17 AM
to Repo and Gerrit Discussion
Hi,

Thanks for the response.

Regarding the first item, I don't think I was clear: the problem is
that we are unable to make replication work over SSH, so we currently
need to use the writable git-daemon to replicate git content to slave
servers.
The problem with this approach is that we can't initialize new empty
gits over the writable git-daemon; we need to do that separately over
an SSH link instead.

So the idea is to allow replication.config to contain, besides the
"url" entry, an "ADMIN" or "SSH" entry that would be used only for
project creation and rename tasks.
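The proposed rule (create or rename repositories on a slave through a separate admin URL when one is configured) can be sketched in a few lines. The `adminUrl` key name is the one Shawn uses later in this thread; the `${name}` placeholder follows the replication.config URL convention as we understand it; falling back to the ordinary URL when no admin URL exists is an assumption; and `AdminUrlSelector`, `Remote`, and `initUrl` are hypothetical names.

```java
import java.util.Optional;

class AdminUrlSelector {
  // Hypothetical view of one [remote "..."] entry in replication.config.
  static final class Remote {
    final String url;      // ordinary replication URL, e.g. a git:// daemon
    final String adminUrl; // optional SSH URL used only to create/rename repos

    Remote(String url, String adminUrl) {
      this.url = url;
      this.adminUrl = adminUrl;
    }
  }

  // URL used to initialize a new repository on the slave: adminUrl when
  // present, otherwise the normal replication URL.
  static String initUrl(Remote remote, String projectName) {
    String template = Optional.ofNullable(remote.adminUrl).orElse(remote.url);
    return template.replace("${name}", projectName);
  }

  public static void main(String[] args) {
    Remote slave = new Remote(
        "git://slave.example.com/${name}.git",
        "ssh://gerrit@slave.example.com/srv/git/${name}.git");
    // prints ssh://gerrit@slave.example.com/srv/git/platform/base.git
    System.out.println(initUrl(slave, "platform/base"));
  }
}
```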

Thanks
Lincoln

Shawn Pearce

Aug 30, 2010, 12:45:17 PM
to Lincoln, Repo and Gerrit Discussion
On Mon, Aug 30, 2010 at 08:56, Lincoln
<lincoln.oliveirac...@sonyericsson.com> wrote:
> In the first item, I think I wasn't clear, the problem is that we are
> unable to make replication work over ssh, and thus we currently need
> to use the writable git-daemon for replication of git content to slave
> servers.
> The problem with this approach is that we can’t initialize new empty
> gits over the writable git-daemon. We need to do that separately over
> an ssh-link instead.
>
> So the idea is to add the possibility to have in the
> replication.config file besides the "url" section also a "ADMIN" or
> "SSH" section, that would be used only for project-creation and rename
> tasks.

Oh, OK, that makes sense. It's unrelated to the create-project parent
selection change you are also discussing in this thread, or even the
empty initial commit change that started this thread, but I can see
the reason for this. I'll happily review a change to add this
create-project URL to replication.config.

Fredrik Luthander

Sep 1, 2010, 6:39:09 AM
to Repo and Gerrit Discussion
Hi there!

To be honest, I don't think this needs its own switch; Gerrit should
simply always honor that setting in replication.config at project
creation time. It's not possible to create a new project over the
writable git-daemon protocol today, so if an admin/ssh/whatever-name-
we-choose entry is available in the replication entry, we should use
it by default. It's easy to foresee that this entry will be present
in at least every git-daemon entry.

If Gerrit is not able to create the project on all slaves, an error
should be written to error_log.

BR,
Fredrik

Shawn Pearce

Sep 1, 2010, 10:19:01 AM
to Fredrik Luthander, Repo and Gerrit Discussion
On Wed, Sep 1, 2010 at 03:39, Fredrik Luthander
<fredrik....@sonyericsson.com> wrote:
> I must be honest, I think this doesn't need its own switch, I just
> think gerrit should always honor that switch in replication.config at
> project creation time. It's not possible to create a new project over
> the writable git-daemon protocol as of today, so if there is an admin/
> ssh/whatever-name-we-choose entry available in the replication entry,
> we should use it by default. It's easy to forsee that this entry will
> be available in at least every git-daemon entry.

Oh, I agree with this completely. This thread is actually two
different threads. One is talking about a new option to
replication.config to support this admin URL concept, and if present
that would always be used. The other half of the thread is talking
about something completely unrelated, which is to add a new switch to
create-project to help the user select a parent project.

> If gerrit is not able to create the project on all slaves, an error
> should be generated for error_log.

Yes. And even better if the create-project command could warn the
creator that one or more slaves failed to initialize the project.
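Logging to error_log and warning the creator can both be served by trying every slave, collecting failures instead of stopping at the first one, and returning them to the command. A hypothetical sketch: `SlaveInitReport`, `SlaveInit`, and `initAll` are made-up names, and java.util.logging stands in for Gerrit's error_log.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.logging.Logger;

class SlaveInitReport {
  private static final Logger log = Logger.getLogger("replication");

  // Hypothetical hook that creates the repository on one slave.
  interface SlaveInit {
    void init(String slaveUrl, String project) throws Exception;
  }

  // Tries every slave (name -> URL). Failures are logged for the admin
  // and returned so create-project can also warn the user.
  static List<String> initAll(Map<String, String> slaves, String project, SlaveInit init) {
    List<String> warnings = new ArrayList<>();
    for (Map.Entry<String, String> slave : slaves.entrySet()) {
      try {
        init.init(slave.getValue(), project);
      } catch (Exception e) {
        String msg = "warning: could not create " + project + " on "
            + slave.getKey() + ": " + e.getMessage();
        log.severe(msg);
        warnings.add(msg);
      }
    }
    return warnings;
  }

  public static void main(String[] args) {
    Map<String, String> slaves = new LinkedHashMap<>();
    slaves.put("slave1", "ssh://slave1/srv/git");
    slaves.put("slave2", "ssh://slave2/srv/git");
    List<String> warnings = initAll(slaves, "platform/base", (url, project) -> {
      if (url.contains("slave2")) {
        throw new Exception("connection refused");
      }
    });
    warnings.forEach(System.out::println);
  }
}
```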

Fredrik Luthander

Sep 1, 2010, 10:32:29 AM
to Repo and Gerrit Discussion
OK, maybe I misunderstood something; I think I read that into Lincoln's
suggestion of a -slave option. I'm aware of the two issues being
discussed, but even in the parent discussion a -slave switch doesn't
really make sense. But maybe I brought up something that was already a
thing of the past in your discussion. :)

Anyway, seems we are in agreement, which is what matters at the end of
the day.
Thanks!

/Fredrik

Fredrik Luthander

Sep 1, 2010, 2:28:07 PM
to Repo and Gerrit Discussion, s...@google.com
OK, so I just saw the link where Lincoln referred to my own post on
this from June [1], and there we talk about a -slave option as one
alternative.
I think I'm finally catching up with the rest of the world. :-)
[1] http://groups.google.com/group/repo-discuss/browse_thread/thread/b8ba2537a3669535#

So, never mind my gibberish, it seems I'm out of phase here.

So, it seems I'm still too locked into the idea that the master
will run the git initialization on the slave, and I think I'm now
understanding that that isn't the plan. Right?
I see talk about both a stream option that a slave would connect to
in order to get batched commands to execute, and a scheduled batch
option running every so often on the slave server, a separate
instance that would connect once in a while.
Is that describing the options well enough? Or have I misunderstood
once more? :-)
In that case I'm guessing we won't need my suggested entry in the
replication.config file either; my assumption was that the master
would execute the init over SSH on the remote slave.

Perhaps we don't even need the stream if there is a way to keep a list
of tasks to do in the db?
Or is it wrong to store the tasks in a table? If the slave is offline
at the time of project creation, how should that be handled?

I also seem to remember someone being advised to use
replication.config entries to export gits to a public location not
intended for a Gerrit slave. I hope I remember that correctly. Anyway,
if true, how will git initialization, git renaming and git removal (as
discussed in Gerrit issue 349) be handled on those locations? Would it
perhaps be necessary to associate certain slave processes with certain
location entries in replication.config?

Sorry to bombard you with questions and problems, but this will affect
my daily work so I'm quite interested in the outcome. :-)

BR,
Fredrik

Shawn Pearce

Sep 3, 2010, 10:45:38 AM
to Fredrik Luthander, Repo and Gerrit Discussion
On Wed, Sep 1, 2010 at 11:28, Fredrik Luthander
<fredrik....@sonyericsson.com> wrote:
> Ok, so I just saw the link where Lincoln referred to my own post on
> this from June [1], and there we talk about a -slave option as one
> alternative.
> I think I'm finally catching up with the rest of the world. :-)
> [1] http://groups.google.com/group/repo-discuss/browse_thread/thread/b8ba2537a3669535#
>
> So, nevermind my gibberish, it seems I'm out of phase here.
>
> So, it seems I'm still just too locked into the idea that the master
> will run the git initialization on the slave, and I think I'm now
> understanding that that isn't the plan. Right?

I think it is the plan. Let the master init the slave when a project
is created. But the init process may need to go through a different
URL than replication normally follows.

> I see both talk about a stream option that a slave would connect to
> and get batched commands to execute, and then a scheduled batch option
> running every so often on the slave server, a separate instance that
> would connect once every while.

That idea has been floated before, yes. IIRC I was describing it as a
way for slaves to know what cache records to evict, so they could be
more up-to-date with the master's database. But it would also be a
good way to get a slave to initialize a new Git repository.

> In that case I'm guessing we won't need my suggested entry in the
> replication.config file either, my assumption with that was for the
> master to execute the init over ssh on the remote slave.

As I mentioned above, that is still the current plan.

> Perhaps we don't even need the stream if there is a way to keep a list
> of tasks to do in the db?
> Or is it wrong to store the tasks in a table? If the slave is offline
> at the time of project creation, how should that be handled?

We probably should keep a list of recent events, because during master
server restarts it's possible that the slave didn't get to see an event
before it was kicked off the master, or it takes longer to reconnect
and misses an event that happens as soon as the master comes online.
Or, there was a temporary network outage and the slave missed an
hour's worth of events. This is all also true for the current `gerrit
stream-events` command that clients can use to watch repositories and
create a firehose IRC chatbot.

It's somewhat stressful to the database to keep a log of actions in it.
But if you think about it, pretty much every single action we do is
already writing to at least one record in the database anyway. Adding
another record to a log table won't kill the server.
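The catch-up problem described above suggests a log that clients replay from their last-seen position. This in-memory sketch (hypothetical names throughout; a persistent version would keep the same sequence/event rows in a database table, as discussed) gives each event an increasing sequence number, lets a reconnecting client ask for everything after the last number it saw, and returns null when part of that range was already trimmed, so the client knows it must resynchronize fully.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class RecentEventLog {
  static final class Event {
    final long seq;
    final String payload;

    Event(long seq, String payload) {
      this.seq = seq;
      this.payload = payload;
    }
  }

  private final Deque<Event> events = new ArrayDeque<>();
  private final int capacity;
  private long nextSeq = 1;

  RecentEventLog(int capacity) {
    this.capacity = capacity;
  }

  // Appends an event and returns its sequence number, dropping the oldest
  // event once the log holds more than `capacity` entries.
  synchronized long append(String payload) {
    events.addLast(new Event(nextSeq, payload));
    if (events.size() > capacity) {
      events.removeFirst();
    }
    return nextSeq++;
  }

  // Events with seq > lastSeen, or null if some were already trimmed.
  synchronized List<Event> since(long lastSeen) {
    if (!events.isEmpty() && events.peekFirst().seq > lastSeen + 1) {
      return null;
    }
    List<Event> missed = new ArrayList<>();
    for (Event e : events) {
      if (e.seq > lastSeen) {
        missed.add(e);
      }
    }
    return missed;
  }

  public static void main(String[] args) {
    RecentEventLog recent = new RecentEventLog(3);
    for (String p : new String[] {"patchset-created", "comment-added", "change-merged", "change-abandoned"}) {
      recent.append(p);
    }
    // Events 2..4 are kept; event 1 was trimmed.
    System.out.println(recent.since(2).size()); // 2
    System.out.println(recent.since(0) == null); // true
  }
}
```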

> I also seem to remember someone that was recommended to use
> replication.config entries to export gits to a public location not
> intended for a gerrit slave. I hope I remember that correctly. Anyway,
> if true, how will git initialization, git renaming and git removal (as
> discussed in gerrit issue 349) be handled on those locations? Would it
> perhaps be necessary to associate certain slave processes with certain
> location entries in replication.config?

Yes, everything you just said above is true.

Fredrik Luthander

Sep 3, 2010, 8:34:49 PM
to Repo and Gerrit Discussion


On Sep 3, 4:45 pm, Shawn Pearce <s...@google.com> wrote:
> On Wed, Sep 1, 2010 at 11:28, Fredrik Luthander
>
> <fredrik.luthan...@sonyericsson.com> wrote:
> > Ok, so I just saw the link where Lincoln referred to my own post on
> > this from June [1], and there we talk about a -slave option as one
> > alternative.
> > I think I'm finally catching up with the rest of the world. :-)
> > [1]http://groups.google.com/group/repo-discuss/browse_thread/thread/b8ba...
>
> > So, nevermind my gibberish, it seems I'm out of phase here.
>
> > So, it seems I'm still just too locked into the idea that the master
> > will run the git initialization on the slave, and I think I'm now
> > understanding that that isn't the plan. Right?
>
> I think it is the plan.  Let the master init the slave when a project
> is created.  But the init process may need to go through a different
> URL than replication normally follows.

Oh, ok, excellent. That more or less invalidates most things I've
written below then. :)

Most excellent!

>
> > I see both talk about a stream option that a slave would connect to
> > and get batched commands to execute, and then a scheduled batch option
> > running every so often on the slave server, a separate instance that
> > would connect once every while.
>
> That idea has been floated before, yes.  IIRC I was describing it as a
> way for slaves to know what cache records to evict, so they could be
> more up-to-date with the master's database.  But it would also be a
> good way to get a slave to initialize a new Git repository.

My worry with this solution was for the case when there was no Gerrit
slave connected to the end destination.
But again, this worry has no ground anymore. :)

> > Perhaps we don't even need the stream if there is a way to keep a list
> > of tasks to do in the db?
> > Or is it wrong to store the tasks in a table? If the slave is offline
> > at the time of project creation, how should that be handled?
>
> We probably should keep a list of recent events, because during master
> server restarts its possible that the slave didn't get to see an event
> before it was kicked off the master, or it takes longer to reconnect
> and misses an event that happens as soon as the master comes online.
> Or, there was a temporary network outage and the slave missed an
> hour's worth of events.  This is all also true for the current `gerrit
> stream-events` command that clients can use to watch repositories and
> create a firehose IRC chatbot.
>
> Its somewhat stressful to the database to keep a log of actions in it.
>  But if you think about it, pretty much every single action we do is
> already writing to at least one record in the database anyway.  Adding
> another record to a log table won't kill the server.

True. And the amount of data we carry in it is frankly not killing
anybody either. (I'll try to find out how much data our db holds;
we're approaching 47k changes now!)

BR,
Fredrik

Lincoln

Sep 17, 2010, 12:32:29 PM
to Repo and Gerrit Discussion
Hi Shawn!

I have a question regarding the new option ("SSH" or "Admin") to be
added to the "replication.config" file:

Currently, the options in this file are read by the "allConfigs"
method in the PushReplication class, which uses JGit classes such as
"FileBasedConfig" and "RemoteConfig" to do that (RemoteConfig holds
the data from the "remote" section, such as "url").

So, where would you suggest we make changes in order to add this new
option to replication.config?

Thanks,
Lincoln

Shawn Pearce

Sep 17, 2010, 3:13:46 PM
to Lincoln, Repo and Gerrit Discussion
On Fri, Sep 17, 2010 at 09:32, Lincoln
<lincoln.oliveirac...@sonyericsson.com> wrote:
> I have a question regarding the new option("SSH" or "Admin") to be
> added to the "replication.config" file:
>
> Currently the options from this file are retrieved in the method
> "allConfigs" in the PushReplication class,
> but it is used some JGit classes to do that, such as "FileBasedConfig"
> and
> "RemoteConfig" --> that actually contains the data from the "remote"
> section, such as "url".
>
> So, where would you suggest us to change, in order to add this new
> option to replication.config?

Put it in PushReplication.

The FileBasedConfig type from JGit supports arbitrary keys. To get
the new admin URL we could do:

    // getStringList returns every adminUrl value for this remote (String[]).
    String[] adminUrls = cfg.getStringList("remote", rc.getName(), "adminUrl");
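To make the multi-valued lookup concrete without pulling in JGit, the stdlib-only sketch below parses a tiny subset of git-config syntax (`[remote "name"]` headers and `key = value` lines; no quoting, escapes, continuation lines, or includes) and returns every value of a key in file order, mirroring what getStringList does. `MiniConfig` is a hypothetical, simplified stand-in, not JGit's parser; section and key names are matched case-insensitively and subsection names case-sensitively, as in git config.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

class MiniConfig {
  // Returns every value of section.subsection.key, in file order.
  static List<String> getStringList(String text, String section, String subsection, String key) {
    List<String> values = new ArrayList<>();
    boolean inSection = false;
    for (String raw : text.split("\n")) {
      String line = raw.trim();
      if (line.isEmpty() || line.startsWith("#") || line.startsWith(";")) {
        continue;
      }
      if (line.startsWith("[") && line.endsWith("]")) {
        // Header such as: [remote "slave1"]
        String header = line.substring(1, line.length() - 1).trim();
        int quote = header.indexOf('"');
        int last = header.lastIndexOf('"');
        String name = quote < 0 ? header : header.substring(0, quote).trim();
        String sub = quote < 0 ? null
            : (last > quote ? header.substring(quote + 1, last) : header.substring(quote + 1));
        inSection = name.equalsIgnoreCase(section) && Objects.equals(subsection, sub);
        continue;
      }
      int eq = line.indexOf('=');
      if (inSection && eq > 0 && line.substring(0, eq).trim().equalsIgnoreCase(key)) {
        values.add(line.substring(eq + 1).trim());
      }
    }
    return values;
  }

  public static void main(String[] args) {
    String cfg = String.join("\n",
        "[remote \"slave1\"]",
        "  url = git://slave1.example.com/${name}.git",
        "  adminUrl = ssh://gerrit@slave1.example.com/srv/git/${name}.git",
        "[remote \"slave2\"]",
        "  url = git://slave2.example.com/${name}.git");
    System.out.println(getStringList(cfg, "remote", "slave1", "adminUrl"));
    System.out.println(getStringList(cfg, "remote", "slave2", "adminUrl")); // []
  }
}
```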
