Gerrit Data Retrieval Problems


Johannes Zenz

Oct 18, 2017, 5:44:10 AM
to Repo and Gerrit Discussion
Hi everyone,

We are a group of students who want to analyse Gerrit data for a university project in order to gain insight into the code review process of OSS projects.

Currently, we are trying to retrieve all data from Android's Gerrit repo but we ran into the following error:

"cannot exceed 10000 results (after filtering for visibility)"

We request 500 changes per page and usually receive this error around the 7th page, i.e. after roughly 3,500 changes, which is of course far fewer than 10,000 results.

Our script works on other Gerrit repos.

Is the access to older reviews restricted?
How can we solve that?
Can the Gerrit admins maybe directly provide us with Android's Gerrit data?

We would appreciate any help we can get.

Thank you in advance,
Johannes

Han-Wen Nienhuys

Oct 18, 2017, 12:28:41 PM
to Johannes Zenz, Repo and Gerrit Discussion
If you do

git clone --mirror https://android.googlesource.com/REPO

for all repositories, you will find branches called

refs/changes/56/123456/meta

Each of these contains the code review data (comments, votes) for a change.
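A minimal sketch of how such a ref name is put together, in case you already know a change number and want to fetch only its metadata. It assumes the layout visible in the refs quoted in this thread: the first path component is the last two digits of the change number. The function name `change_ref` is made up for illustration.

```python
def change_ref(change_number: int, suffix: str = "meta") -> str:
    """Build the ref name for a change.

    Assumption: the shard directory is the change number modulo 100,
    zero-padded to two digits. The suffix is "meta" for the review
    metadata or a patch set number for the code itself.
    """
    if change_number <= 0:
        raise ValueError("change numbers are positive integers")
    shard = change_number % 100
    return f"refs/changes/{shard:02d}/{change_number}/{suffix}"


print(change_ref(123456))       # refs/changes/56/123456/meta
print(change_ref(453300, "2"))  # refs/changes/00/453300/2
```

The resulting string can be passed to `git fetch origin <ref>` to retrieve a single change instead of mirroring everything.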


Shawn Pearce

Oct 18, 2017, 12:46:31 PM
to Johannes Zenz, Han-Wen Nienhuys, Repo and Gerrit Discussion
To answer Johannes' original question: the search index server caps each query at 10,000 raw results in our indexing system. Gerrit then filters those changes based on ACLs, keeping only the changes that are visible to you. A private change, for example, is going to be hidden from your results view. If there are 100 private changes in the first 10,000 results, then you can only get 9,900 results back. Tweaking your query to have fewer results (e.g. scoping per project and issuing one query per project) will get you a larger set of changes overall, but each query is still limited to 10,000 raw results.
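A rough sketch of the per-project approach, not a tested crawler. It assumes the standard Gerrit REST query parameters `q`, `n` (page size) and `S` (start offset), the `_more_changes` flag on the last entry of a page, and the `)]}'` guard line that Gerrit prepends to JSON responses. The host name and the function names are assumptions chosen for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed host for Android's Gerrit; substitute the server you are querying.
BASE_URL = "https://android-review.googlesource.com"
XSSI_PREFIX = ")]}'"  # guard line Gerrit puts in front of JSON responses


def parse_gerrit_json(body: str):
    """Strip the guard line if present, then decode the JSON payload."""
    if body.startswith(XSSI_PREFIX):
        body = body[len(XSSI_PREFIX):]
    return json.loads(body)


def http_get(url: str) -> str:
    """Fetch a URL and return the body as text (needs network access)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")


def iter_project_changes(project: str, page_size: int = 500, fetch=http_get):
    """Yield the visible changes of one project, page by page.

    Paging continues while the last entry of a page is marked with
    "_more_changes": true. The server-side cap on raw results still
    applies to each query, so a very large project can still hit it.
    """
    start = 0
    while True:
        query = urlencode({"q": f"project:{project}", "n": page_size, "S": start})
        page = parse_gerrit_json(fetch(f"{BASE_URL}/changes/?{query}"))
        yield from page
        if not page or not page[-1].get("_more_changes"):
            return
        start += len(page)
```

Usage would be along the lines of `for change in iter_project_changes("platform/build"): ...`, repeated for each project name; the `fetch` parameter exists only so the paging logic can be exercised without network access.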


Because Gerrit NoteDb stores the original data in the refs/changes/.../meta branches in every repository, a mirror clone as Han-Wen suggests will get you all accessible information.




Aaron Gable

Oct 18, 2017, 1:06:29 PM
to Primiano Tucci, johann...@gmail.com, repo-d...@googlegroups.com, infra-dev, r.m.s....@student.tudelft.nl, m.vrach...@student.tudelft.nl
The list you really want is +repo-...@googlegroups.com; that group contains the developers and maintainers of Gerrit itself.

On Wed, Oct 18, 2017 at 2:51 AM Primiano Tucci <prim...@chromium.org> wrote:
How can we solve that?
Gerrit has a second, perhaps less documented, API which is based on git.

You can list all the codereviews by listing refs in the refs/changes/... namespace, e.g.:

$ git ls-remote https://chromium.googlesource.com/chromium/src.git | grep refs/changes/
...
478c6ea91d36d63d32b6cc913cd187c048fd7533 refs/changes/00/453300/1
cab5badec41bc502f8688e3607f08b24aa80a389 refs/changes/00/453300/2
6ce03c4a04d938a210a15040b95f0347d37c42bb refs/changes/00/453300/meta
59c9215c00d64d99f312a8c8e2a22651739c5f2b refs/changes/00/455000/1
e47ad8ecb6a3ea2299efe049a2ccf6b3fe09457c refs/changes/00/455000/2
5039cb03dd576cb52f1a8fc8bb7c60787225119e refs/changes/00/455000/meta

Each CL has 1 + num_patchsets refs: one per patchset, plus the meta ref.
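As an illustration of that layout, here is a small sketch that groups `git ls-remote` output by change number. It only parses text, so the output of the command above can be saved to a file or piped in; the function name `group_change_refs` is hypothetical, and the sample lines are the ones quoted above.

```python
from collections import defaultdict


def group_change_refs(ls_remote_output: str) -> dict:
    """Map change number -> {"patchsets": {n: sha}, "meta": sha or None}."""
    changes = defaultdict(lambda: {"patchsets": {}, "meta": None})
    for line in ls_remote_output.splitlines():
        fields = line.split()
        if len(fields) != 2 or not fields[1].startswith("refs/changes/"):
            continue  # not a change ref, e.g. refs/heads/... or a "..." line
        sha, ref = fields
        parts = ref.split("/")  # refs, changes, shard, number, suffix
        if len(parts) != 5 or not parts[3].isdigit():
            continue
        suffix = parts[4]
        if suffix == "meta":
            changes[int(parts[3])]["meta"] = sha
        elif suffix.isdigit():
            changes[int(parts[3])]["patchsets"][int(suffix)] = sha
    return dict(changes)


SAMPLE = """\
478c6ea91d36d63d32b6cc913cd187c048fd7533 refs/changes/00/453300/1
cab5badec41bc502f8688e3607f08b24aa80a389 refs/changes/00/453300/2
6ce03c4a04d938a210a15040b95f0347d37c42bb refs/changes/00/453300/meta
59c9215c00d64d99f312a8c8e2a22651739c5f2b refs/changes/00/455000/1
e47ad8ecb6a3ea2299efe049a2ccf6b3fe09457c refs/changes/00/455000/2
5039cb03dd576cb52f1a8fc8bb7c60787225119e refs/changes/00/455000/meta
"""

for number, refs in sorted(group_change_refs(SAMPLE).items()):
    print(number, sorted(refs["patchsets"]), refs["meta"] is not None)
# 453300 [1, 2] True
# 455000 [1, 2] True
```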
As the name suggests, refs/changes/00/453300/meta refers to metadata that describes the history of the CL crrev.com/c/00453300. For instance:

$ git fetch origin refs/changes/00/453300/meta
$ git log -p FETCH_HEAD

commit 6ce03c4a04d938a210a15040b95f0347d37c42bb
Author: Jeff Carpenter <1168241@3ce6091f-6c88-37e8-8c75-72f92ae8dfba>
Date:   Thu Apr 13 20:12:26 2017

    Update patch set 2

    Abandoned

    Patch-set: 2
    Status: abandoned
    Tag: autogenerated:gerrit:abandon

commit d1e1d1340ce2eb7045c15e556528330318fffc9d
Author: Jeff Carpenter <1168241@3ce6091f-6c88-37e8-8c75-72f92ae8dfba>
Date:   Mon Apr 3 21:51:29 2017

    Update patch set 2

    Restored

    Patch-set: 2
    Status: new
    Tag: autogenerated:gerrit:restore


refs/changes/00/453300/1, refs/changes/00/453300/2 and so on, by contrast, are the actual patchsets.
Depending on what you need, you can either use the git interface alone (pull all the refs and analyze them locally), or use the git interface to list CLs and some top-level metadata and then, once you have the CL number, use the REST API to get the rest.
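For the purely local route, a sketch of pulling the structured fields out of one meta commit message. It assumes the layout seen in the log above: a subject line, an optional free-text body, and a final paragraph of `Key: value` footers. The input is expected to be the raw message, e.g. from `git log --format=%B`; the function name `parse_meta_commit` is hypothetical. Values are kept as lists on the assumption that a key can appear more than once in real meta commits.

```python
import re


def parse_meta_commit(message: str) -> dict:
    """Split a meta commit message into subject, body, and footers."""
    # Drop indentation so output copied from plain `git log` also works.
    text = "\n".join(line.strip() for line in message.strip().splitlines())
    paragraphs = [p for p in re.split(r"\n{2,}", text) if p]
    footers = {}
    if len(paragraphs) > 1:
        pairs = [line.split(": ", 1) for line in paragraphs[-1].splitlines()]
        # Treat the last paragraph as footers only if every line is "Key: value".
        if all(len(pair) == 2 for pair in pairs):
            for key, value in pairs:
                footers.setdefault(key, []).append(value)
            paragraphs = paragraphs[:-1]
    return {
        "subject": paragraphs[0] if paragraphs else "",
        "body": "\n\n".join(paragraphs[1:]),
        "footers": footers,
    }


EXAMPLE = """\
    Update patch set 2

    Abandoned

    Patch-set: 2
    Status: abandoned
    Tag: autogenerated:gerrit:abandon
"""

parsed = parse_meta_commit(EXAMPLE)
print(parsed["subject"])            # Update patch set 2
print(parsed["footers"]["Status"])  # ['abandoned']
```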

Hope it helps,
Primiano


On Wed, Oct 18, 2017 at 10:33 AM <johann...@gmail.com> wrote:
Hi everyone,

We are a group of students who want to analyse Gerrit data for a university project in order to gain insight into the code review process of OSS projects.

Currently, we are trying to retrieve all data from Chromium's Gerrit repo but we ran into the following error:


"cannot exceed 10000 results (after filtering for visibility)"

We request 500 changes per page and usually receive this error around the 7th page, i.e. after roughly 3,500 changes, which is of course far fewer than 10,000 results.

Our script works on other Gerrit repos.

Is the access to older reviews restricted?
How can we solve that?
Can the Gerrit admins maybe directly provide us with Chromium's Gerrit data?


We would appreciate any help we can get.

Thank you in advance,
Johannes
