Gerrit Metrics

Stephen Roberts

Dec 9, 2011, 6:00:19 AM
to Repo and Gerrit Discussion, jose....@isis-papyrus.com
Hi,
I am interested in generating some metrics from our Gerrit
installation and am wondering if it is possible/trivial. Basically, we
have been trialling Gerrit for some time now and I would like to go to
management with some hard numbers as to why we should continue to use
it. Something like: we have done X reviews, which on average go through
Y iterations and catch Z issues. Is this possible? Alternatively, is
the database format documented somewhere, so that writing such a tool
would be easy?

Thanks!
Stephen

Saša Živkov

Dec 9, 2011, 11:23:56 AM
to Stephen Roberts, Repo and Gerrit Discussion, jose....@isis-papyrus.com
I don't think the DB schema is documented, but it is quite simple to understand.
Look at the DB tables (changes, patch_sets, etc.) and you will find the
information you need.


Edwin Kempin

Dec 9, 2011, 11:31:13 AM
to Saša Živkov, Stephen Roberts, Repo and Gerrit Discussion, jose....@isis-papyrus.com
Hi Stephen,

please be aware that there are plans to remove the database (or at least most of its tables) in future Gerrit versions; for example, all change data would then be stored directly in the git repositories. Keep that in mind before investing too much effort in writing a tool that relies on the Gerrit database schema.

Best regards,
Edwin

Deen Sethanandha

Dec 15, 2011, 11:15:21 AM
to Edwin Kempin, Saša Živkov, Stephen Roberts, Repo and Gerrit Discussion, jose....@isis-papyrus.com
Hi,

I am also working on mining Gerrit as part of my Ph.D. dissertation. Currently, I retrieve data directly from the database, but the more robust approach would be to create a metrics database by retrieving the data through the Gerrit API.

@Stephen, please let me know what kind of metrics you are interested in. Hopefully we can work together on this. I am working on proposing metrics that can be used to evaluate and improve the review process.

The tables relevant for collecting metrics are:

1) accounts -> stores user information
2) changes -> stores Gerrit change requests
3) change_messages -> stores comments on change requests
4) patch_sets -> stores patch set information, but not the content of the patch itself; the patch content is stored in Git.
5) patch_comments -> stores inline comments on individual lines of a patch
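To illustrate how these tables could answer the "X reviews, Y iterations" question, here is a minimal Python sketch using `sqlite3` against hypothetical, stripped-down stand-ins for the `changes` and `patch_sets` tables. The column names (`change_id`, `patch_set_id`), the function name, and the toy data are assumptions for illustration only; the real Gerrit schema has more columns, may differ between versions, and (as Edwin notes) may go away, so check it before pointing similar queries at a production database.

```python
import sqlite3
from typing import Tuple

# Hypothetical, heavily simplified stand-ins for two Gerrit tables. Only the
# columns used below are modeled, and their names are assumptions.
SCHEMA = """
CREATE TABLE changes (change_id INTEGER PRIMARY KEY);
CREATE TABLE patch_sets (change_id INTEGER, patch_set_id INTEGER);
"""

def review_summary(conn: sqlite3.Connection) -> Tuple[int, float]:
    """Return (number of changes, average patch sets per change)."""
    (num_changes,) = conn.execute("SELECT COUNT(*) FROM changes").fetchone()
    # Count patch sets per change, then average those per-change counts.
    (avg_iterations,) = conn.execute(
        "SELECT AVG(n) FROM "
        "(SELECT COUNT(*) AS n FROM patch_sets GROUP BY change_id)"
    ).fetchone()
    # AVG over no rows yields NULL (None); report 0.0 in that case.
    return num_changes, float(avg_iterations or 0.0)

if __name__ == "__main__":
    # Toy data: change 1 went through three patch sets, change 2 through one.
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO changes VALUES (?)", [(1,), (2,)])
    conn.executemany(
        "INSERT INTO patch_sets VALUES (?, ?)", [(1, 1), (1, 2), (1, 3), (2, 1)]
    )
    print(review_summary(conn))  # (2, 2.0)
```

Counting patch sets per change is only a proxy for "iterations"; a change can also be reworked in ways that do not produce a new patch set.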

@Edwin, could you let us know where we can learn more about the planned changes when the data is moved to Git? How is Gerrit going to store the review history, and how can it be retrieved? What is the status of this effort?


Best regards
-- Deen

Edwin Kempin

Dec 22, 2011, 2:00:25 PM
to Deen Sethanandha, Saša Živkov, Stephen Roberts, Repo and Gerrit Discussion, jose....@isis-papyrus.com


Shawn has been pursuing the idea of removing the Gerrit database for some time already. It has been discussed on several occasions (e.g. on the mailing list, but also at the GitTogether).
One hint about it can be found in the Gerrit design document 'Documentation/dev-design.html', which says:
"The metadata is mostly housed in the database (*1) ...

*1 Although an effort is underway to eliminate the use of the database altogether, and to store all the metadata directly in the git repositories themselves. So far, as of Gerrit 2.2.1, of all Gerrit’s metadata, only the project configuration metadata has been migrated out of the database and into the git repositories for each project."
For the upcoming Gerrit 2.2.2 release this statement is still true.

Also worth reading is a discussion on the mailing list [1] in which some pros and cons of removing the database are discussed.

How exactly the review history will be stored in the end I can't say, but I'm sure that Shawn already has some ideas about this. I expect that we will have some special branches in the git repositories in which the review data is persisted (similar to the refs/meta/config branch that we already use to store the project configuration metadata).

[1] http://groups.google.com/group/repo-discuss/browse_thread/thread/3f5dd610fd226adc/ca19de38c986cde6

Janne Hellsten

Dec 27, 2011, 1:46:50 PM
to Stephen Roberts, Repo and Gerrit Discussion, jose....@isis-papyrus.com
Hi,

It should be relatively easy to collect metrics via the Gerrit JSON
interface. This interface should be decoupled from Gerrit's internal
data layout.

See docs for the "query" command:
http://gerrit.googlecode.com/svn/documentation/2.2.1/cmd-query.html

Not everything is available via this interface, but it should be
enough for information like the number of reviews done, the average
number of iterations per change, etc. My goal has been to compute a
"review latency" metric from Gerrit.
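A minimal Python sketch of this approach, assuming the output of `gerrit query --format=JSON --patch-sets` (one JSON object per line, followed by a final statistics record with `"type": "stats"`) and the `createdOn`/`lastUpdated` change fields given in seconds since the epoch. The host name `review.example.com`, the function names, and the use of `lastUpdated - createdOn` as a crude latency proxy are illustrative assumptions, not Janne's definition of review latency; verify the field names against the query documentation for your Gerrit version.

```python
import json
import subprocess
from statistics import mean
from typing import Dict, List

def run_query(host: str, query: str, port: int = 29418) -> str:
    """Run `gerrit query` over SSH and return its raw JSON-lines output.

    The host is a placeholder; 29418 is Gerrit's usual SSH port.
    """
    cmd = ["ssh", "-p", str(port), host,
           "gerrit", "query", "--format=JSON", "--patch-sets", query]
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout

def summarize(output: str) -> Dict[str, float]:
    """Compute simple review metrics from `gerrit query --format=JSON` output."""
    changes: List[dict] = []
    for line in output.splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        # The final line is a statistics record, not a change.
        if record.get("type") == "stats":
            continue
        changes.append(record)
    if not changes:
        return {"changes": 0, "avg_patch_sets": 0.0, "avg_open_seconds": 0.0}
    return {
        "changes": len(changes),
        # Patch sets per change as a proxy for review iterations.
        "avg_patch_sets": mean(len(c.get("patchSets", [])) for c in changes),
        # Crude latency proxy: seconds from creation to last update.
        "avg_open_seconds": mean(c["lastUpdated"] - c["createdOn"] for c in changes),
    }
```

For example, `summarize(run_query("review.example.com", "status:merged"))` would summarize the merged changes visible to the SSH user. Keeping the SSH call and the parsing separate means `summarize` can be tested against saved query output.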

I've written Haskell code to talk to Gerrit via SSH and parse the JSON
responses into Haskell data structures. This is pretty handy for
further data mining in Haskell. Ping me if you're interested; I can
make the code available if someone finds it useful.

Janne

Remy Bohmer

Dec 28, 2011, 3:36:13 PM
to Janne Hellsten, Stephen Roberts, Repo and Gerrit Discussion, jose....@isis-papyrus.com
Hi Janne,

2011/12/27 Janne Hellsten <jjhe...@gmail.com>:


> I've written Haskell code to talk to Gerrit via SSH and parse the JSON
> responses to Haskell data structures.  This is pretty handy for
> further data mining in Haskell.  Ping me if you're interested, I can
> make the code available if someone finds it useful.

Here is the ping ;-)
I find it useful, where can I find the code?

Kind regards,

Remy

Janne Hellsten

Dec 29, 2011, 11:02:10 AM
to Remy Bohmer, Stephen Roberts, Repo and Gerrit Discussion, jose....@isis-papyrus.com
>> I've written Haskell code to talk to Gerrit via SSH and parse the JSON
>> responses to Haskell data structures.  This is pretty handy for
>> further data mining in Haskell.  Ping me if you're interested, I can
>> make the code available if someone finds it useful.
>
> Here is the ping ;-)
> I find it useful, where can I find the code?

You can find the GerritJson module here: https://github.com/nurpax/gerrit-json

I only recently returned to Haskell, so there are probably many ways
in which the code can be improved.

Janne

Remy Bohmer

Dec 29, 2011, 4:57:42 PM
to Janne Hellsten, Stephen Roberts, Repo and Gerrit Discussion, jose....@isis-papyrus.com
Hi,

Thanks for sharing!
I will look into it in detail next year/week ;-)

Kind regards,

Remy

Lundh, Gustaf

Oct 12, 2012, 7:30:46 AM
to Remy Bohmer, Janne Hellsten, Stephen Roberts, Repo and Gerrit Discussion, jose....@isis-papyrus.com
For some of my Gerrit metrics I'm using the Gerrit event module in Gerrit Trigger [1] to collect events through the stream-events SSH interface.

I collect the interesting data and feed it to a Graphite instance. This way I can plot real-time data such as changes per hour (or minute) and the rate of comments, merges, etc.
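The same idea can be sketched in Python, assuming events arrive as one JSON object per line from `stream-events`, each with a `type` field such as `comment-added` or `change-merged`. The sketch buckets events per hour by the time they were received (rather than relying on a timestamp inside the event) and formats the counts for Graphite's plaintext protocol (`<metric path> <value> <timestamp>`, conventionally sent to TCP port 2003). The metric prefix `gerrit.events`, the host, and the function names are hypothetical; this is not the Gerrit Trigger code Gustaf uses.

```python
import json
import socket
from collections import Counter
from typing import Iterable, List, Tuple

def bucket_events(
    received: Iterable[Tuple[float, str]], bucket_seconds: int = 3600
) -> Counter:
    """Count events per (event type, start of time bucket).

    `received` yields (unix time the line was received, raw JSON line) pairs.
    """
    counts: Counter = Counter()
    for received_at, line in received:
        event = json.loads(line)
        bucket_start = int(received_at) // bucket_seconds * bucket_seconds
        counts[(event.get("type", "unknown"), bucket_start)] += 1
    return counts

def to_graphite_lines(counts: Counter, prefix: str = "gerrit.events") -> List[str]:
    """Render counts in Graphite's plaintext format: '<path> <value> <timestamp>'."""
    return [
        f"{prefix}.{event_type} {count} {bucket_start}"
        for (event_type, bucket_start), count in sorted(counts.items())
    ]

def send_to_graphite(lines: List[str], host: str, port: int = 2003) -> None:
    """Send pre-formatted metric lines to a Graphite plaintext listener."""
    with socket.create_connection((host, port)) as sock:
        sock.sendall(("\n".join(lines) + "\n").encode("ascii"))
```

Changing `bucket_seconds` to 60 gives per-minute rates instead of per-hour ones.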

One of my colleagues has also created a Python interface for listening to and parsing the events. Not sure if it has been open-sourced yet.

[1] https://github.com/jenkinsci/gerrit-trigger-plugin

Best regards
Gustaf

Pursehouse, David

Oct 12, 2012, 8:10:52 AM
to Lundh, Gustaf, Remy Bohmer, Janne Hellsten, Stephen Roberts, Repo and Gerrit Discussion, jose....@isis-papyrus.com
> One of my colleagues has also created python interface for listening and parsing the events.
> Not sure if it has been open-sourced yet.

Not yet. It needs a bit more tidying up, and then needs to go through the usual management approvals before it can be open sourced.

/David

David Pursehouse

Aug 9, 2013, 2:57:15 AM
to repo-d...@googlegroups.com, Lundh, Gustaf, Remy Bohmer, Janne Hellsten, Stephen Roberts, jose....@isis-papyrus.com
It's taken a bit longer than I expected, but the Python interface that Gustaf was referring to above has been open sourced. Actually, the initial commits were pushed to GitHub some time ago, but it's only in this last week that I've managed to find time to make it into a proper Python package.

Basically, it gives you a Python class that runs the `stream-events` command over SSH and packages the received data into Python objects in a queue that the client can fetch from. We use it in some backend management scripts for Django apps that need real-time information about commits being approved, merged, etc.
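For readers who want to experiment with the same pattern, here is a minimal Python sketch: a background thread runs `stream-events` over SSH and puts each parsed JSON event on a `queue.Queue` for the client to fetch. The class and method names, the host `review.example.com`, and port 29418 are assumptions for illustration; this is not the API of the package described above.

```python
import json
import queue
import subprocess
import sys
import threading
from typing import List, Optional

class EventStream:
    """Run a `stream-events` command and queue every JSON event it prints."""

    def __init__(self, command: Optional[List[str]] = None) -> None:
        # Placeholder SSH invocation; any command printing one JSON object per
        # line works, which makes the class easy to try without a server.
        self.command = command or [
            "ssh", "-p", "29418", "review.example.com", "gerrit", "stream-events",
        ]
        self.events: "queue.Queue[dict]" = queue.Queue()
        self._proc: Optional[subprocess.Popen] = None
        self._thread: Optional[threading.Thread] = None

    def start(self) -> None:
        """Launch the command and start the background reader thread."""
        self._proc = subprocess.Popen(self.command, stdout=subprocess.PIPE, text=True)
        self._thread = threading.Thread(target=self._read_events, daemon=True)
        self._thread.start()

    def _read_events(self) -> None:
        assert self._proc is not None and self._proc.stdout is not None
        for line in self._proc.stdout:
            line = line.strip()
            if not line:
                continue
            try:
                self.events.put(json.loads(line))
            except json.JSONDecodeError:
                continue  # ignore anything that is not a JSON event

    def stop(self) -> None:
        """Terminate the command and wait for the reader thread to finish."""
        if self._proc is not None:
            self._proc.terminate()
            self._proc.wait()
        if self._thread is not None:
            self._thread.join(timeout=5)

if __name__ == "__main__":
    # Fake event source standing in for `gerrit stream-events`.
    fake = [sys.executable, "-c",
            'import json; print(json.dumps({"type": "change-merged"}))']
    stream = EventStream(fake)
    stream.start()
    print(stream.events.get(timeout=10)["type"])  # change-merged
    stream.stop()
```

Reading on a separate thread keeps the SSH pipe drained even when the client is busy, and the queue lets the client poll with a timeout instead of blocking on the connection.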

The source (see link below) includes an example script that demonstrates the usage.

It's been built with quite specific uses in mind so I'm not really sure how useful it's going to be for anyone else.  Pull requests are welcome though.

Links:

