Gerrit Data Retrieval Problems


johann...@gmail.com

Oct 18, 2017, 5:33:57 AM
to infra-dev, r.m.s....@student.tudelft.nl, m.vrach...@student.tudelft.nl
Hi everyone,

We are a group of students who want to analyse Gerrit data for a university project, to gain insight into the code review process of OSS projects.

Currently, we are trying to retrieve all data from Chromium's Gerrit repo but we ran into the following error:

"cannot exceed 10000 results (after filtering for visibility)"

We request 500 changes per page and usually receive this error around the 7th page, i.e. after roughly 3,500 changes, which is of course far fewer than 10000 results.
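
For reference, a minimal sketch of the kind of paginated query described above. It assumes Gerrit's REST conventions as commonly documented: the `q`, `n` and `S` query parameters, the `)]}'` prefix on JSON responses, and the `_more_changes` flag on the last entry of a page. The function names are hypothetical, and the sketch does not work around the 10000-result cap.

```python
import json
from urllib.parse import urlencode

# Gerrit prepends this line to JSON responses to prevent XSSI.
XSSI_PREFIX = ")]}'"


def changes_url(base, query, start, limit=500):
    """Build the URL for one page of results (hypothetical helper)."""
    params = urlencode({"q": query, "n": limit, "S": start})
    return f"{base}/changes/?{params}"


def parse_changes(body):
    """Strip the XSSI prefix and return (changes, more_pages_available)."""
    if body.startswith(XSSI_PREFIX):
        body = body[len(XSSI_PREFIX):]
    changes = json.loads(body)
    # The last entry carries "_more_changes": true when another page exists.
    more = bool(changes) and changes[-1].get("_more_changes", False)
    return changes, more
```

A caller would keep requesting `changes_url(base, query, start)` with `start` increased by the page size until `parse_changes` reports no more pages.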

Our script works on other Gerrit repos.

Is access to older reviews restricted?
How can we solve that?
Can the Gerrit admins maybe directly provide us with Chromium's Gerrit data?

We would appreciate any help we can get.

Thank you in advance,
Johannes

Primiano Tucci

Oct 18, 2017, 5:51:41 AM
to johann...@gmail.com, infra-dev, r.m.s....@student.tudelft.nl, m.vrach...@student.tudelft.nl
> How can we solve that?
Gerrit has a second, perhaps less documented, API which is based on git.

You can list all the code reviews by listing the refs in the refs/changes/... namespace, e.g.:

$ git ls-remote https://chromium.googlesource.com/chromium/src.git | grep refs/changes/
...
478c6ea91d36d63d32b6cc913cd187c048fd7533 refs/changes/00/453300/1
cab5badec41bc502f8688e3607f08b24aa80a389 refs/changes/00/453300/2
6ce03c4a04d938a210a15040b95f0347d37c42bb refs/changes/00/453300/meta
59c9215c00d64d99f312a8c8e2a22651739c5f2b refs/changes/00/455000/1
e47ad8ecb6a3ea2299efe049a2ccf6b3fe09457c refs/changes/00/455000/2
5039cb03dd576cb52f1a8fc8bb7c60787225119e refs/changes/00/455000/meta
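
A listing like the one above can be grouped per change with a few lines of Python. This is only a sketch: the function name and the returned structure are made up for illustration, and it assumes every line of interest has the `<sha> refs/changes/NN/<change>/<patchset|meta>` shape shown in the output.

```python
import re
from collections import defaultdict

# <sha> refs/changes/<2 digits>/<change number>/<patchset number or "meta">
REF_RE = re.compile(r"^([0-9a-f]{40})\s+refs/changes/\d{2}/(\d+)/(\d+|meta)$")


def group_change_refs(ls_remote_output):
    """Map change number -> {"patchsets": {n: sha}, "meta": sha or None}."""
    changes = defaultdict(lambda: {"patchsets": {}, "meta": None})
    for line in ls_remote_output.splitlines():
        match = REF_RE.match(line.strip())
        if not match:
            continue  # skip branches, tags and anything else
        sha, change, leaf = match.groups()
        entry = changes[int(change)]
        if leaf == "meta":
            entry["meta"] = sha
        else:
            entry["patchsets"][int(leaf)] = sha
    return dict(changes)
```

Feeding it the text printed by `git ls-remote` gives one entry per CL, which is enough to enumerate change numbers without touching the REST API.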

Each CL has 1 + num_patchsets refs: one per patchset, plus one meta ref.
As the name suggests, refs/changes/00/453300/meta points to metadata describing the history of the CL crrev.com/c/453300 (the 00 directory is just the last two digits of the change number). For instance:

$ git fetch origin refs/changes/00/453300/meta
$ git log -p FETCH_HEAD

commit 6ce03c4a04d938a210a15040b95f0347d37c42bb
Author: Jeff Carpenter <1168241@3ce6091f-6c88-37e8-8c75-72f92ae8dfba>
Date:   Thu Apr 13 20:12:26 2017

    Update patch set 2

    Abandoned

    Patch-set: 2
    Status: abandoned
    Tag: autogenerated:gerrit:abandon

commit d1e1d1340ce2eb7045c15e556528330318fffc9d
Author: Jeff Carpenter <1168241@3ce6091f-6c88-37e8-8c75-72f92ae8dfba>
Date:   Mon Apr 3 21:51:29 2017

    Update patch set 2

    Restored

    Patch-set: 2
    Status: new
    Tag: autogenerated:gerrit:restore
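
The `Key: value` lines at the end of each meta commit are easy to pull apart. A rough sketch, assuming every meta commit message follows the layout shown above (subject, optional body, then a final paragraph of footers); the function name is hypothetical, and real messages may carry other footers than these three.

```python
def parse_meta_message(message):
    """Split a meta commit message into its subject and trailing footers."""
    # Normalise indentation so output copied from `git log` also works.
    lines = [line.strip() for line in message.strip().splitlines()]
    paragraphs = [p for p in "\n".join(lines).split("\n\n") if p]
    if not paragraphs:
        return {"subject": "", "footers": {}}
    footers = {}
    if len(paragraphs) > 1:
        for line in paragraphs[-1].splitlines():
            # Split on the first colon only: values may contain colons.
            key, sep, value = line.partition(":")
            if sep:
                footers[key.strip()] = value.strip()
    return {"subject": paragraphs[0], "footers": footers}
```

Walking the commits of a meta ref from oldest to newest and applying this to each message would then give the status history of the CL.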


The refs refs/changes/00/453300/1, refs/changes/00/453300/2 and so on are the actual patchsets.
Depending on what you need, you can either use the git interface alone (pull all the refs and analyze them locally), or mix the two: use the git interface to list CLs and some top-level metadata and, once you have the CL numbers, use the REST API to get the rest.
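
If only the meta refs are needed, one way to pull them is in batches of explicit refspecs, so that no single `git fetch` gets an enormous argument list. A sketch only: the helper name and the batch size of 100 are arbitrary choices, and each refspec stores the ref under the same name in the local repository.

```python
def fetch_commands(refs, remote="origin", batch_size=100):
    """Build `git fetch` command lines that fetch `refs` in batches."""
    commands = []
    for i in range(0, len(refs), batch_size):
        batch = refs[i:i + batch_size]
        # "+src:dst" forces the update and keeps the ref name locally.
        refspecs = [f"+{ref}:{ref}" for ref in batch]
        commands.append(["git", "fetch", "--quiet", remote, *refspecs])
    return commands
```

Each returned list can be passed to `subprocess.run(cmd, check=True, cwd=repo_dir)` from inside a clone of the repository.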

Hope it helps,
Primiano

--
You received this message because you are subscribed to the Google Groups "infra-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to infra-dev+...@chromium.org.
To post to this group, send email to infr...@chromium.org.
To view this discussion on the web visit https://groups.google.com/a/chromium.org/d/msgid/infra-dev/f4263098-ff09-4c6f-8911-be96675c8ae9%40chromium.org.

Aaron Gable

Oct 18, 2017, 1:06:24 PM
to Primiano Tucci, johann...@gmail.com, repo-d...@googlegroups.com, infra-dev, r.m.s....@student.tudelft.nl, m.vrach...@student.tudelft.nl
The list you really want is +repo-...@googlegroups.com, that group contains the developers and maintainers of Gerrit itself.

Riaas Mokiem

Oct 27, 2017, 4:18:57 AM
to Aaron Gable, Primiano Tucci, johann...@gmail.com, repo-d...@googlegroups.com, infra-dev, m.vrach...@student.tudelft.nl
I'm trying Primiano's suggestion right now and it seems to work fairly
well. I could get all the change IDs, which I'm now trying to retrieve one by one.
However, this seems to exceed Gerrit's rate limits. I'm
currently waiting 1 second between successive calls, which works for a few
hours. But then I still end up with an HTTP 429 status and have to stop data
retrieval.
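
Independent of the actual limits, a retrieval loop like this can be made to survive a 429 by backing off instead of stopping. A minimal sketch, with assumptions stated plainly: `fetch` is a hypothetical callable returning `(status, headers, body)`, the delays are arbitrary, and whether the server sends a `Retry-After` header at all is not known from this thread.

```python
import time


def get_with_backoff(fetch, url, max_attempts=6, base_delay=1.0,
                     max_delay=300.0, sleep=time.sleep):
    """Call fetch(url), retrying with exponential backoff on HTTP 429."""
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        status, headers, body = fetch(url)
        if status != 429:
            return status, body
        if attempt == max_attempts:
            break
        # Prefer the server's hint if it gives one in whole seconds.
        retry_after = headers.get("Retry-After", "")
        wait = float(retry_after) if retry_after.isdigit() else delay
        sleep(min(wait, max_delay))
        delay = min(delay * 2, max_delay)
    raise RuntimeError(f"still rate-limited after {max_attempts} attempts: {url}")
```

Combined with a record of which change IDs are already done, the script can resume after a long pause rather than starting over.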

Does anyone know a way around this? Or, if preferred, can anyone tell me the
exact values used for rate limiting, so I can adjust my script to stay within
those limits?
Kind regards,

Riaas


Riaas Mokiem

Oct 27, 2017, 4:40:25 AM
to Aaron Gable, infra-dev, Primiano Tucci, johann...@gmail.com, m.vrach...@student.tudelft.nl
I also tried to send this to repo-discuss, as suggested, but I don't have
permission to post there.

Aaron Gable

Oct 27, 2017, 1:04:01 PM
to Riaas Mokiem, Aaron Gable, infra-dev, Primiano Tucci, johann...@gmail.com, m.vrach...@student.tudelft.nl
All you have to do is join the group to be able to post. They are the only people who can truly answer questions about rate-limiting.

Primiano Tucci

Oct 27, 2017, 1:25:00 PM
to Riaas Mokiem, Aaron Gable, infra-dev, johann...@gmail.com, m.vrach...@student.tudelft.nl, repo-d...@googlegroups.com
Make sure all your requests are authenticated via gitcookies. Registered accounts, even Gmail ones, get a higher quota than anonymous requests.
Just register on Gerrit (chromium.googlesource.com), go to the settings page, open the HTTP password section, and get your cookies.
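
For a script that talks to the REST API directly, the cookie has to be read from the cookie file and sent along with each request. A sketch under two assumptions that are not confirmed in this thread: that the file (typically `~/.gitcookies`) uses the tab-separated Netscape cookie layout with seven fields per line, and that matching on the domain field is sufficient. Both function names are hypothetical.

```python
def load_gitcookies(text, host):
    """Return {name: value} for cookie lines whose domain matches `host`."""
    cookies = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        fields = line.split("\t")
        if len(fields) != 7:
            continue  # not a cookie line in the assumed layout
        domain, _, _, _, _, name, value = fields
        domain = domain.lstrip(".")
        if host == domain or host.endswith("." + domain):
            cookies[name] = value
    return cookies


def cookie_header(cookies):
    """Format cookies as the value of an HTTP Cookie header."""
    return "; ".join(f"{name}={value}" for name, value in cookies.items())
```

The resulting string would go into a `Cookie` request header. Git itself picks the file up through its `http.cookiefile` setting, so fetches over the git interface need no extra code.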