Question about up/download restriction

seonguk.baek

Apr 18, 2012, 1:35:51 AM
to Repo and Gerrit Discussion
Dear all,

I have a question about operating replication.

We're operating a single master server plus 5 slave servers.

We want the master server to handle only replication, 'repo upload'
and 'git push'. We don't want developers to download the sources
from the master server, because that overloads the server and the
network.

Is there any way to allow only 'Direct Push' and 'repo upload' on the
master server, but NOT 'repo sync'?

Shawn Pearce

Apr 18, 2012, 3:05:37 AM
to seonguk.baek, Repo and Gerrit Discussion

Set upload.allowGroup in gerrit.config:

http://gerrit-documentation.googlecode.com/svn/Documentation/2.3/config-gerrit.html#upload.allowGroup

E.g. only permit Administrators to use the master directly:

[upload]
allowGroup = Administrators

Don't set this on the slaves. :-)
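The effect of that setting can be pictured with a small sketch. It assumes the semantics described in the linked 2.3 documentation: when upload.allowGroup is unset, every user may fetch; when it lists groups, only members of those groups may. The class and method names are hypothetical and are not Gerrit's code.

```java
import java.util.List;
import java.util.Set;

// Hypothetical model of the upload.allowGroup check; not Gerrit's implementation.
final class UploadAllowGroup {
  private final List<String> allowGroups; // values of upload.allowGroup, possibly empty

  UploadAllowGroup(List<String> allowGroups) {
    this.allowGroups = List.copyOf(allowGroups);
  }

  // Returns true if a user with the given group memberships may clone/fetch.
  boolean mayFetch(Set<String> userGroups) {
    if (allowGroups.isEmpty()) {
      return true; // unset: everyone may fetch (assumed default)
    }
    for (String group : allowGroups) {
      if (userGroups.contains(group)) {
        return true;
      }
    }
    return false;
  }
}
```

With `allowGroup = Administrators` on the master, an ordinary developer is refused a fetch there, while a slave without the setting serves everyone. Pushes and `repo upload` are not governed by this check, because on the server side they run receive-pack rather than upload-pack.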

Edwin Kempin

Apr 18, 2012, 3:11:06 AM
to Shawn Pearce, seonguk.baek, Repo and Gerrit Discussion
Shawn, I think the question was rather if it's possible to prevent cloning and fetching from the master Gerrit server so that users are forced to do these expensive operations via a slave Gerrit server.

Shawn Pearce

Apr 18, 2012, 3:43:11 AM
to Edwin Kempin, seonguk.baek, Repo and Gerrit Discussion
On Wed, Apr 18, 2012 at 00:11, Edwin Kempin <edwin....@gmail.com> wrote:
> Shawn, I think the question was rather if it's possible to prevent cloning
> and fetching from the master Gerrit server so that users are forced to do
> these expensive operations via a slave Gerrit server.

And that is why the fine folks at Sony contributed the
upload.allowGroup config variable.

I know it sounds backwards. But upload on the server side is the git
term for the server portion of clone/fetch/sync. Because the server is
uploading the repository data to the client by replying to the
client's request. Blame Linus Torvalds for the naming.
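To make the naming concrete: git-upload-pack and git-receive-pack are the real server-side programs, named from the server's point of view. The enum below and its list of client commands are only an illustrative sketch, not part of Git or Gerrit.

```java
// Which server-side Git program handles each client operation.
enum GitService {
  UPLOAD_PACK("git-upload-pack"),   // server sends objects: clone, fetch, repo sync
  RECEIVE_PACK("git-receive-pack"); // server receives objects: push, repo upload

  final String program;

  GitService(String program) {
    this.program = program;
  }

  static GitService forClientCommand(String command) {
    switch (command) {
      case "clone":
      case "fetch":
      case "pull":
      case "repo sync":
        return UPLOAD_PACK;
      case "push":
      case "repo upload":
        return RECEIVE_PACK;
      default:
        throw new IllegalArgumentException("unknown command: " + command);
    }
  }
}
```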

Edwin Kempin

Apr 18, 2012, 4:02:28 AM
to Shawn Pearce, seonguk.baek, Repo and Gerrit Discussion
Ah ok, thanks for clarifying. I was really confused by the name.

Magnus Bäck

Apr 18, 2012, 9:54:50 AM
to Repo and Gerrit Discussion
On Wednesday, April 18, 2012 at 03:05 EDT,
Shawn Pearce <s...@google.com> wrote:

> On Tue, Apr 17, 2012 at 22:35, seonguk.baek <baeks...@gmail.com> wrote:

[...]

> > Is there any way to allow the master server only to perform
> > 'Direct Push' and 'repo upload', NOT 'repo sync' ????
>
> Set upload.allowGroup in gerrit.config:
>
> http://gerrit-documentation.googlecode.com/svn/Documentation/2.3/config-gerrit.html#upload.allowGroup
>
> E.g. only permit Administrators to use the master directly:
>
> [upload]
> allowGroup = Administrators
>
> Don't set this on the slaves. :-)

One problem with this feature is that it breaks SSH-based download
commands listed on the change page (which unconditionally use the
web UI's hostname). For HTTP and Git protocol access we already have
gerrit.canonicalGitUrl and gerrit.gitHttpUrl, but perhaps we should
have something like gerrit.gitSshHostname?
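The proposal can be sketched as a fallback: use a Git-specific SSH host when one is configured, otherwise the web UI's hostname (the current behavior described above). Here `gitSshHost` stands in for the proposed gerrit.gitSshHostname, which is not an existing option; the host names and project are hypothetical, and 29418 is Gerrit's default SSH port.

```java
import java.util.Optional;

// Hypothetical sketch of how a change page could build its SSH download URL.
final class SshDownloadUrl {
  static String build(String webHost, Optional<String> gitSshHost, int port,
                      String user, String project) {
    // Fall back to the web UI's hostname when no Git-specific host is set.
    String host = gitSshHost.orElse(webHost);
    return "ssh://" + user + "@" + host + ":" + port + "/" + project;
  }
}
```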

--
Magnus Bäck
ba...@google.com

Edwin Kempin

Apr 18, 2012, 9:58:31 AM
to Repo and Gerrit Discussion
Isn't this already available as sshd.advertisedAddress? See [1] for details.

[1] https://gerrit-review.googlesource.com/23831

Shawn Pearce

Apr 18, 2012, 10:01:55 AM
to Edwin Kempin, Repo and Gerrit Discussion
On Wed, Apr 18, 2012 at 06:58, Edwin Kempin <edwin....@gmail.com> wrote:
> Isn't this already available as sshd.advertisedAddress? See [1] for details.
>
> [1] https://gerrit-review.googlesource.com/23831

Apparently yes, this can be used for that.

The entire Git protocol setup is a confusing mess right now. None of
the options are named consistently. :-(

Magnus Bäck

Apr 18, 2012, 10:05:04 AM
to Repo and Gerrit Discussion
On Wednesday, April 18, 2012 at 09:58 EDT,
Edwin Kempin <edwin....@gmail.com> wrote:

>
> > One problem with this feature is that it breaks SSH-based download
> > commands listed on the change page (which unconditionally uses the
> > web UI's hostname). For HTTP and Git protocol access we already have
> > gerrit.canonicalGitUrl and gerrit.gitHttpUrl, but perhaps we should
> > have something like gerrit.gitSshHostname?
>

> Isn't this already available as sshd.advertisedAddress? See [1] for
> details.
>
> [1] https://gerrit-review.googlesource.com/23831

It's not quite the same thing. The sshd.advertisedAddress setting
controls the advertised hostname for *any* SSH connection, including
Git connections but also all the administrative SSH commands that must
be run against the master server.

--
Magnus Bäck
ba...@google.com

Luthander, Fredrik

Apr 18, 2012, 10:05:41 AM
to Shawn Pearce, Edwin Kempin, Repo and Gerrit Discussion

> From: repo-d...@googlegroups.com [mailto:repo-
> dis...@googlegroups.com] On Behalf Of Shawn Pearce

>
> On Wed, Apr 18, 2012 at 06:58, Edwin Kempin <edwin....@gmail.com>
> wrote:
> > Isn't this already available as sshd.advertisedAddress? See [1] for
> details.
> >
> > [1] https://gerrit-review.googlesource.com/23831
>
> Apparently yes, this can be used for that.
>
> The entire Git protocol setup is a confusing mess right now. None of
> the options are named consistently. :-(
>

On that note, how should one treat a name change of a config variable in gerrit.config or replication.config? How should a migration from an old to a new value be done?

I'm asking because I'm planning to eventually submit a change that renames one of the permissions that control capabilities.

--
Best regards,
    Fredrik Luthander
Sony Mobile Communications AB

Luthander, Fredrik

Apr 18, 2012, 10:07:22 AM
to Luthander, Fredrik, Shawn Pearce, Edwin Kempin, Repo and Gerrit Discussion

Which would then, of course, be found in project.config under refs/meta/config rather than gerrit.config or replication.config... Aside from that, the question remains as before.

Sorry for the confusion. :-)

Shawn Pearce

Apr 18, 2012, 10:14:05 AM
to Luthander, Fredrik, Edwin Kempin, Repo and Gerrit Discussion
On Wed, Apr 18, 2012 at 07:07, Luthander, Fredrik
<Fredrik....@sonymobile.com> wrote:
>> >
>> > The entire Git protocol setup is a confusing mess right now. None of
>> > the options are named consistently. :-(
>>
>> On that note, how should one treat a name change of a config variable
>> in Gerrit.config or replication.config? How should a migration from an
>> old to a new value be done?
>>
>> I'm asking because I'm planning to eventually submit a change that
>> renames one of the permissions that control capabilities.
>
> Which would then of course be found in project.config under refs/meta/config rather than Gerrit.config or replication.config... Aside from that, the question remains as before..

We have to be veerrrrry careful here.

In theory you can run a schema migration, and in the schema migration
iterate through all projects and edit the refs/meta/config branch as
you go. In practice this is not possible. In particular, I can't
iterate everything and convert it while the server is offline in my
hosting environment... because the server is never offline. So doing a
schema migration like this can break my hosting environment, breaking
the gerrit and android projects. If we have to do it, you could be
signing my team up for a week's worth of labor just for that one
change. :-(

The preferred approach is to modify the reader code to read the new
name, and if it is not defined, read the old name, e.g.:

// Prefer the new key; fall back to the legacy key if it is unset.
if (cfg.getString("section", null, "new-name") != null) {
  value = cfg.getEnum("section", null, "new-name", Type.DEFAULT);
} else {
  value = cfg.getEnum("section", null, "old-name", Type.DEFAULT);
}

When storing the value, you can unset old-name and then set new-name.
But ideally we do this only if the server admin modified that
capability, and not just because they modified something else
unrelated in the same file.
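Both halves of that advice (read new-then-old, and migrate on write only when the value itself changed) fit in a small sketch. It uses a hypothetical in-memory stand-in keyed by "section.name" so that it runs without JGit; Gerrit itself reads these files through JGit's Config, and the key names are placeholders.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal stand-in for a Git-style config file; hypothetical, not JGit's Config.
final class RenamedKeyConfig {
  private final Map<String, String> values = new HashMap<>();

  String get(String key) {
    return values.get(key);
  }

  void set(String key, String value) {
    values.put(key, value);
  }

  void unset(String key) {
    values.remove(key);
  }

  // Read the new name first; fall back to the old name if the new one is unset.
  String read(String newKey, String oldKey) {
    String v = values.get(newKey);
    return v != null ? v : values.get(oldKey);
  }

  // Migrate to the new name only when the admin actually changed this value,
  // so unrelated edits to the same file do not rewrite it.
  void write(String newKey, String oldKey, String value) {
    if (value.equals(read(newKey, oldKey))) {
      return; // unchanged: leave whichever name is already stored
    }
    unset(oldKey);
    set(newKey, value);
  }
}
```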

Schema migrations are making me unhappy these days.

Luthander, Fredrik

Apr 18, 2012, 10:24:17 AM
to Shawn Pearce, Repo and Gerrit Discussion

> -----Original Message-----
> From: Shawn Pearce [mailto:s...@google.com]
> Sent: onsdag den 18 april 2012 16:14
> To: Luthander, Fredrik
> Cc: Edwin Kempin; Repo and Gerrit Discussion
> Subject: Re: Question about up/download restriction
>
> On Wed, Apr 18, 2012 at 07:07, Luthander, Fredrik
> <Fredrik....@sonymobile.com> wrote:
> >> >
> >> > The entire Git protocol setup is a confusing mess right now. None
> of
> >> > the options are named consistently. :-(
> >>
> >> On that note, how should one treat a name change of a config
> variable
> >> in Gerrit.config or replication.config? How should a migration from
> an
> >> old to a new value be done?
> >>
> >> I'm asking because I'm planning to eventually submit a change that
> >> renames one of the permissions that control capabilities.
> >
> > Which would then of course be found in project.config under
> refs/meta/config rather than Gerrit.config or replication.config...
> Aside from that, the question remains as before..
>
> We have to be veerrrrry careful here.

That was my thought too when I sat down and thought a little about this. :-)

> In theory you can run a schema migration, and in the schema migration
> iterate through all projects and edit the refs/meta/config branch as
> you go. In practice this is not possible. In particular, I can't
> iterate everything and convert it while the server is offline in my
> hosting environment... because the server is never offline. So doing a
> schema migration like this can break my hosting environment, breaking
> the gerrit and android projects. If we have to do it, you could be
> signing my team up for a week's worth of labor just for that one
> change. :-(

I didn't realize that's what happens for you, but I was still hesitant to go for this anyway. Mostly because of the inherent complexity of the change.

>
> The preferred approach is to modify the reader code to read the new
> name, and if it is not defined, read the old name, e.g.:

I was a little worried that such an approach would be considered a bit ugly and inconsistent, since different installations can have different values.
However, it makes me happy to find that you came up with the same concept I had in mind...

>
> if (cfg.getString("section", null, "new-name") != null) {
> value = cfg.getEnum("section", null, "new-name", Type.DEFAULT);
> } else {
> value = cfg.getEnum("section", null, "old-name", Type.DEFAULT);
> }

.. even though I hadn't written any code for it yet. So I'll go ahead and steal these lines for the change (whenever it might happen). Thanks. :-)
Also, considering the work the other alternative might generate for your shop, I think this is the only viable way to go.

> when storing the value, you can unset old-name and then set new-name.
> But ideally we do this only if the server admin modified that
> capability, and not just because they modified something else
> unrelated in the same file.

Right, it's smart to fix this at the time when we're changing owners of this value anyway.

Many groups can have this capability. I interpret your suggestion as changing the key-name for all holders of the capability at the time of editing any of the holders?

> Schema migrations are making me unhappy these days.

No wonder.

Shawn Pearce

Apr 18, 2012, 10:56:30 AM
to Luthander, Fredrik, Repo and Gerrit Discussion
On Wed, Apr 18, 2012 at 07:24, Luthander, Fredrik
<Fredrik....@sonymobile.com> wrote:
>
>> In theory you can run a schema migration, and in the schema migration
>> iterate through all projects and edit the refs/meta/config branch as
>> you go. In practice this is not possible. In particular, I can't
>> iterate everything and convert it while the server is offline in my
>> hosting environment... because the server is never offline. So doing a
>> schema migration like this can break my hosting environment, breaking
>> the gerrit and android projects. If we have to do it, you could be
>> signing my team up for a week's worth of labor just for that one
>> change.  :-(
>
> I didn't realize that's what happens for you, but I was still hesitant to go for this anyway. Mostly because of the inherent complexity of the change.

My problem is a fairly simple one. We run servers 24/7 that are
talking to the same storage backend. Data changes to either database
or Git arrive at all servers within some small interval of taking
place.

To support a change like this we have to make a custom binary that
knows how to read the new name, and fall back to the old name if the
new name isn't yet defined. We have to push that binary out to all
servers... a process that can take us a full day to perform between
building, testing, and rolling out the binary at a slow enough rate
that we don't adversely interrupt user operations.

The rollout takes a server offline, upgrades it, puts it back into
service, then moves to the next one. But it takes a very long time
because when we take a server out of service, it may already be
handling some requests. So we attempt to wait for it to finish
everything that is outstanding. If a user on a slow connection is
cloning something very big, like say android's frameworks/base, this
may take a long time. Apparently, someone somewhere is always doing
this. Stupid popular projects. :-)

After the new version is running everywhere, we are now probably in
the middle of day 2 of this process, if you assume we spent 1 day
studying the affected change and making sure we have a version of code
that understands both formats at the same time. We still haven't
actually changed the data, but we are 2 days in. Whee.

Now we can roll out the data change. Typically this should be written
as a MapReduce, which isn't the same as a Schema_NN style
migration script. It's further complicated by the way our Git storage
system works. The time it takes to run this MapReduce is generally
low, we don't have that much data to process, but there are some ugly
management overheads we have to deal with to get the MapReduce
compiled and launched near the data. Call this at least 1 day for
someone. The last schema change we did that added new indexes for the
merged branch search cost me a day right here.

After the data is migrated... we are now at the start of day 4, if
things have gone well. Now I can build a new binary that only
understands the new format, test, and roll that out.... bringing us to
day 5.
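The reason for three separate binaries can be checked with a small model: during a rolling upgrade two releases run side by side, and both must understand whatever format is stored at that moment. This sequence is often called "expand, migrate, contract"; that label and all names below are illustrative assumptions, not Shawn's terminology.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical model of the rollout; each release lists the formats it can read.
enum StoredFormat { OLD, NEW }

enum Release {
  V1(EnumSet.of(StoredFormat.OLD)),                   // current release
  V2(EnumSet.of(StoredFormat.OLD, StoredFormat.NEW)), // dual-read release, rolled out first
  V3(EnumSet.of(StoredFormat.NEW));                   // new-only release, rolled out last

  final Set<StoredFormat> reads;

  Release(Set<StoredFormat> reads) {
    this.reads = reads;
  }
}

final class Rollout {
  // Safe if both releases running during the upgrade can read the stored format.
  static boolean safe(Release a, Release b, StoredFormat stored) {
    return a.reads.contains(stored) && b.reads.contains(stored);
  }
}
```

Skipping a step fails the check: mixing V1 and V3 can never be safe, and V3 cannot be rolled out before the data is migrated.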

There went a week. It's interrupty enough that it's hard to get much
else done besides working through this process. We manage to get email
and other small stuff done around it, but none of us has managed to do
serious development work during one of these; it's too
distracting.

Each schema change has thus far been unique enough that I haven't been
able to generalize the process better than the outline I just wrote
above. This clearly sounds like something we could automate. But
schema changes are pretty scary, as the delta is always unique.

> Also, considering the work we might generate for your shop with the other alternative I think this is the only viable way to go.

Unfortunately. :-(

The pain we have is unique to our environment right now, so it's our
own fault. So to some extent we are willing to accept these when we
have to. But we are still looking for ways to bring our cluster-aware
Gerrit to the open source version, which means we can bring this sort
of management pain to everyone!

:-)

>> when storing the value, you can unset old-name and then set new-name.
>> But ideally we do this only if the server admin modified that
>> capability, and not just because they modified something else
>> unrelated in the same file.
>
> Right, it's smart to fix this at the time when we're changing owners of this value anyway.
>
> Many groups can have this capability. I interpret your suggestion as changing the key-name for all holders of the capability at the time of editing any of the holders?

Correct. So let's say we are renaming "createGroup" to "makeGroup",
just for sake of argument. We shouldn't rename this because the
administrator modified "createProject" group list, or because they
modified the Code-Review label for refs/heads/*. We should only rename
it if they modified the createGroup list.

But even then it's a bit of a weird diff in Git. The diff will show a
full replace of the entire list, even though the admin perhaps only
added (or removed) one group. This still makes it hard to see what the
delta is.

So maybe it's more like what we do when we rename a group: if the
server detects, when displaying the access page, that a group is
incorrectly named in the groups file, it makes a commit as the server
identity to rename the group on its own. That way the refactoring is a
no-op, and any changes made by the owner/administrator are reflected
after the rename.
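The benefit of splitting the rename into its own server commit shows up in the diff of the admin's commit. A hypothetical sketch, using Shawn's createGroup/makeGroup example and treating a config section as a list of lines (leading tabs omitted); the set difference below is a crude stand-in for a real diff.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: rename first, then apply the admin's edit separately.
final class TwoStepRename {
  // Rewrites lines like "createGroup = group X" to "makeGroup = group X".
  static List<String> renameKey(List<String> lines, String oldKey, String newKey) {
    List<String> out = new ArrayList<>();
    for (String line : lines) {
      out.add(line.startsWith(oldKey + " =")
          ? newKey + line.substring(oldKey.length())
          : line);
    }
    return out;
  }

  // Lines present in 'after' but not in 'before'.
  static List<String> added(List<String> before, List<String> after) {
    List<String> out = new ArrayList<>(after);
    out.removeAll(before);
    return out;
  }
}
```

If the rename and the admin's edit land in one commit, every line of the list appears changed; after a separate no-op rename commit, the admin's commit shows only the group that was actually added.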

Swindells, Thomas

Apr 18, 2012, 11:28:54 AM
to Shawn Pearce, Luthander, Fredrik, Repo and Gerrit Discussion
Thanks for the long summary Shawn, very interesting and a nice insight.

These are half-thought-through ideas that probably don't work, so feel free to shoot them down.
Premise:
1. All config we are migrating is stored in Git
2. Each server knows the version of the software it is running
3. Config changes can be frozen during migration (may be a premise too far?)

If this is the case then could something like the following be done:
A flag is set indicating that the config can't be changed.
When upgrading, the config file is tagged with the schema version it is for.
When a server loads the config, it finds the schema version tag closest to HEAD.
If it is the correct schema version, then it can read the config.
If it is a newer version, then it backtracks to tag^ and repeats the process until it finds the config in a format it does support.
Once all the servers have been updated the flag is removed and configs can then be updated.
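A rough sketch of the lookup step, under the assumptions that each config commit is labeled with the schema version it was written for (rather than only tagged commits) and that a server reads only its own version. All names are hypothetical.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical sketch of walking back from HEAD to a readable config commit.
final class SchemaTaggedHistory {
  // One commit on the config branch and the schema version it was written for.
  static final class ConfigCommit {
    final String id;
    final int schemaVersion;

    ConfigCommit(String id, int schemaVersion) {
      this.id = id;
      this.schemaVersion = schemaVersion;
    }
  }

  // history is ordered from HEAD (index 0) towards older commits; returns the
  // first commit written for the schema version this server understands.
  static Optional<ConfigCommit> readable(List<ConfigCommit> history, int serverVersion) {
    for (ConfigCommit c : history) {
      if (c.schemaVersion == serverVersion) {
        return Optional.of(c);
      }
    }
    return Optional.empty();
  }
}
```

An old server (version 1) facing a HEAD already migrated to version 2 falls back to the last version-1 commit, which is why config edits must stay frozen until every server has been upgraded.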

This gets around having nasty code coping with multiple config formats, and allows the config files to be updated before the upgrade, but obviously the downside is that configuration is frozen. This might remove a load of the steps from what you are doing, though, and speed it up?

I'm sure it's not this simple just thought I'd throw the idea out (the alternative is getting down and writing the design doc I'm meant to be writing...)

Thomas


seonguk.baek

Apr 18, 2012, 8:18:32 PM
to Repo and Gerrit Discussion
Thanks for your reply :-)

It's working fine!!

seonguk.baek

Apr 19, 2012, 6:38:48 AM
to Repo and Gerrit Discussion
One more question about our replication environment.

We're operating a single master server, 5 slaves for load balancing
and another 5 slaves for overseas users in each country.

I worry whether the master server can handle the load if we add more
overseas slave servers.

Is one master server enough, or is there another option, such as a
dual-master setup?

Thanks

Magnus Bäck

Apr 19, 2012, 8:55:24 AM
to Repo and Gerrit Discussion
On Thursday, April 19, 2012 at 06:38 EDT,
"seonguk.baek" <baeks...@gmail.com> wrote:

No, there must be exactly one master. However, if the replication
load itself is causing a problem for the master I suppose you could
have multi-layer replication where one server acts as a relay, i.e.
accepts data from the master and distributes it to the additional
slaves. This would of course introduce a slight delay.

However, looking at the code (Daemon.java), at least some parts of
the replication are disabled for slave servers. From a quick
look I can't tell whether it's just the startup replication or if
also the on-the-fly replication in ReceiveCommits.java is disabled
for slaves.

Of course, the relay server doesn't have to run Gerrit.

--
Magnus Bäck
ba...@google.com

seonguk.baek

Apr 19, 2012, 8:02:25 PM
to Repo and Gerrit Discussion
Dear Mr. Bäck,

Thanks for your comments.
I have a question about how to set up a relay server.
Please let me know if there is any documentation or any pages about
those configurations.
If not, please guide me through it.

And I don't understand your last comment that the relay server doesn't
need to run Gerrit.
In our case, we run Gerrit on every slave server.

My last question: can a Git repository be synced master -> relay ->
other slave servers even if the relay server doesn't run Gerrit?

Thanks.



Magnus Bäck

Apr 20, 2012, 9:04:45 AM
to Repo and Gerrit Discussion
On Thursday, April 19, 2012 at 20:02 EDT,
"seonguk.baek" <baeks...@gmail.com> wrote:

> Thanks for your comments.
> I've a question for how to set up relay server.
> Please let me know if there is any document or some pages for those
> configurations.
> If not please guide me for that.

If the relay server runs Gerrit you can basically set it up like any
other slave server, except that you also configure replication. If you
don't use Gerrit I don't know what choices you have. There might be
turn-key software for this, but you may also need to develop your own
hooks etc to make it happen.

> And I can't understand your last comments which relay server doesn't
> need to run Gerrit.
> In our case, we run a gerrit for every slave server.

Sure, but a relay server whose only task is to receive commits from the
master server (running Gerrit) and push these updates to a number of
slave servers (also running Gerrit) doesn't need to run Gerrit itself.
It could, but as I pointed out I'm not entirely sure that Gerrit run in
slave mode itself will replicate the data it receives. At least some
aspects of the replication appear to be disabled for slave setups.
Hopefully someone else can chime in here. If Gerrit doesn't support this
type of setup I think it's a bug.

> Last question is about git repository can be sync for master -> relay
> -> other slave server even relay server doesn't have a gerrit.

As I said, the task of receiving commits and other git objects and
pushing them to other servers isn't something that *only* Gerrit can do.
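For illustration, a relay without Gerrit could be a bare repository plus a small forwarder (started, for example, from a post-receive hook or a cron job) that pushes each update on to the downstream slaves. This is a hypothetical sketch: the paths and URLs are placeholders, and the `+refs/*:refs/*` refspec forwards every ref, including refs/changes/*.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical relay forwarder: pushes the relay's refs to each downstream slave.
final class RelayForwarder {
  // Builds one `git push` command per slave for the given bare repository.
  static List<List<String>> pushCommands(String repoPath, List<String> slaveUrls) {
    List<List<String>> cmds = new ArrayList<>();
    for (String url : slaveUrls) {
      cmds.add(List.of("git", "--git-dir=" + repoPath, "push", url, "+refs/*:refs/*"));
    }
    return cmds;
  }

  // Runs the pushes one after another; a real relay would queue and retry failures.
  static void run(List<List<String>> cmds) throws Exception {
    for (List<String> cmd : cmds) {
      Process p = new ProcessBuilder(cmd).inheritIO().start();
      if (p.waitFor() != 0) {
        throw new IllegalStateException("push failed: " + cmd);
      }
    }
  }
}
```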

--
Magnus Bäck
ba...@google.com

Patrick Renaud

Apr 23, 2012, 7:07:05 PM
to Repo and Gerrit Discussion
Regarding the sshd.advertisedAddress: the documentation says that "If
multiple values are supplied, the daemon will advertise all of them".
I haven't been able to do that so far. How can I specify multiple
values and get them all advertised accordingly?

Tried multiple advertisedAddress entries in gerrit.config: only the
first entry gets advertised. Seems the extra entries are just ignored.
Tried multiple advertisedAddress values in the same entry in
gerrit.config: treated as one address :-(. Tried comma separated list
and space delimited too. Running out of ideas....

I must be doing something wrong, or I don't know what I'm doing! ;-)

Shawn Pearce

Apr 23, 2012, 7:28:51 PM
to Patrick Renaud, Repo and Gerrit Discussion
On Mon, Apr 23, 2012 at 16:07, Patrick Renaud <pren...@gmail.com> wrote:
>
> Regarding the sshd.advertisedAddress: the documentation says that "If
> multiple values are supplied, the daemon will advertise all of them".
> I haven't been able to do that so far. How can I specify multiple
> values and get them all advertised accordingly?

This might mean only on the /register page where we show the user the
SSH host keys of the server. :-(

seonguk.baek

May 2, 2012, 12:08:43 AM
to Repo and Gerrit Discussion
Dear Shawn.

As you said, I set gerrit.config as below to restrict repo sync from
the master.

[upload]
allowGroup = groupName

But now developers can't use the gerrit cherry-pick command either.
Is there any way to allow gerrit cherry-pick, either from the master or
from the slaves?

Thanks

Luciano Carvalho

May 2, 2012, 12:21:04 AM
to seonguk.baek, Repo and Gerrit Discussion

They should be able to get it from the slaves with the proper .gitconfig setting in their home dir.

Something like this will do it:

[url "ssh://user@slave-server:29418/"]
   insteadOf = "ssh://user@gerrit-server:29418/"
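What that setting does can be sketched as a prefix rewrite: per the git-config documentation, a URL that starts with an insteadOf value has that prefix replaced by the section's base URL, and the longest matching prefix wins. The class below is an illustrative model, not Git's code. Note that insteadOf also applies when pushing, so it's worth checking that uploads still reach the master.

```java
import java.util.Map;

// Illustrative model of url.<base>.insteadOf rewriting (longest prefix wins).
final class InsteadOf {
  // rules maps an insteadOf prefix to its replacement base URL.
  static String rewrite(String url, Map<String, String> rules) {
    String best = null;
    for (String prefix : rules.keySet()) {
      if (url.startsWith(prefix) && (best == null || prefix.length() > best.length())) {
        best = prefix;
      }
    }
    return best == null ? url : rules.get(best) + url.substring(best.length());
  }
}
```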

Regards,

Luciano.

seonguk.baek

May 2, 2012, 12:44:25 AM
to Repo and Gerrit Discussion
Thanks for your reply!

The master server and the slaves use the same MySQL database.

So we changed the fetch URL from the master to a slave.

The following error occurs:

fatal: Couldn't find remote ref refs/changes/87/60387/1

Martin Fick

unread,
May 2, 2012, 12:50:43 AM5/2/12
to seonguk.baek, Repo and Gerrit Discussion


"seonguk.baek" <baeks...@gmail.com> wrote:
>
>fatal: Couldn't find remote ref refs/changes/87/60387/1

Sounds like your change is not yet replicated to your slave.
The replication queue needs to be fixed to retry on "failed to lock" errors; check your logs to see if there are any.


Employee of Qualcomm Innovation Center,Inc. which is a member of Code Aurora Forum

Luciano Carvalho

May 2, 2012, 1:44:06 AM
to seonguk.baek, Repo and Gerrit Discussion

Maybe they're not replicating the refs/changes namespace.

Make sure you do replicate refs/tags, refs/heads and refs/changes to the slaves as well.
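The failing ref from the error above can be checked against a replication refspec by hand. Gerrit stores patch sets under refs/changes/, sharded by the last two digits of the change number, so change 60387, patch set 1 is refs/changes/87/60387/1. The sketch below is illustrative: its matcher handles only a single trailing "*", which covers patterns like refs/heads/* but not every form of Git refspec.

```java
// Illustrative helpers: build a change ref and test whether a push refspec covers it.
final class ChangeRefs {
  // Shard directory is the change number modulo 100, zero-padded to two digits.
  static String changeRef(int change, int patchSet) {
    return String.format("refs/changes/%02d/%d/%d", change % 100, change, patchSet);
  }

  // Checks the source side of a refspec such as "+refs/heads/*:refs/heads/*".
  static boolean covers(String refspec, String ref) {
    String spec = refspec.startsWith("+") ? refspec.substring(1) : refspec;
    int colon = spec.indexOf(':');
    String src = colon >= 0 ? spec.substring(0, colon) : spec;
    if (src.endsWith("*")) {
      return ref.startsWith(src.substring(0, src.length() - 1));
    }
    return ref.equals(src);
  }
}
```

A push refspec limited to refs/heads/* never sends refs/changes/87/60387/1, which would produce exactly the "Couldn't find remote ref" error on the slave.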
