Grafana dashboards


Thomas Dräbing

Jan 14, 2020, 4:27:09 AM
to Repo and Gerrit Discussion
Dear all,

we plan to move some of our monitoring to the Prometheus/Grafana-stack. Among the dashboard collection published on the Grafana homepage, I couldn't find any existing dashboards for Gerrit metrics [1]. Before I start to create new dashboards from scratch, I wanted to ask whether somebody has Grafana dashboards for Gerrit metrics and is willing to share them with the community. Having a solid base to start from would be of great help (not only for me, I guess). Thus, help would be greatly appreciated!

Thanks and best regards,
Thomas


Luca Milanesio

Jan 14, 2020, 8:37:19 AM
to Thomas Dräbing, Luca Milanesio, Repo and Gerrit Discussion

On 14 Jan 2020, at 01:27, Thomas Dräbing <thomas....@gmail.com> wrote:

Dear all,

we plan to move some of our monitoring to the Prometheus/Grafana-stack. Among the dashboard collection published on the Grafana homepage, I couldn't find any existing dashboards for Gerrit metrics [1]. Before I start to create new dashboards from scratch, I wanted to ask whether somebody has Grafana dashboards for Gerrit metrics and is willing to share them with the community. Having a solid base to start from would be of great help (not only for me, I guess). Thus, help would be greatly appreciated!

We have one for Gerrit multi-site, which also includes replication and split-brain metrics.
See some screenshots at [2]

Luca.



Thanks and best regards,
Thomas



--
--
To unsubscribe, email repo-discuss...@googlegroups.com
More info at http://groups.google.com/group/repo-discuss?hl=en

---
You received this message because you are subscribed to the Google Groups "Repo and Gerrit Discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to repo-discuss...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/repo-discuss/dc9d5de4-1e9b-4b98-87b5-28fe61985a57%40googlegroups.com.

Fabio Ponciroli

Jan 14, 2020, 8:40:25 AM
to Luca Milanesio, Luca Milanesio, Thomas Dräbing, Repo and Gerrit Discussion
@Luca Milanesio we could extract the multi-site part and just publish the core metrics. WDYT?

Luca Milanesio

Jan 14, 2020, 8:52:28 AM
to Fabio Ponciroli, Luca Milanesio, Thomas Dräbing, Repo and Gerrit Discussion

On 14 Jan 2020, at 05:39, Fabio Ponciroli <pon...@gmail.com> wrote:

@Luca Milanesio we could extract the multi-site part and just publish the core metrics. WDYT?

Sure, that would be a start.

Luca.

Thomas Dräbing

Jan 14, 2020, 9:02:14 AM
to Luca Milanesio, Fabio Ponciroli, Repo and Gerrit Discussion
Hi Luca, hi Fabio,

if you could share the dashboard, that would be really awesome!

Maybe we can version the json-files describing the dashboards somewhere? Then it would be easy to adapt to new metrics etc.
I will of course also happily share what we did for our Prometheus/Grafana setup, as soon as it is ready.

Best,
Thomas

Luca Milanesio

Jan 14, 2020, 10:33:05 AM
to Thomas Dräbing, Luca Milanesio, Fabio Ponciroli, Repo and Gerrit Discussion

On 14 Jan 2020, at 06:01, Thomas Dräbing <thomas....@gmail.com> wrote:

Hi Luca, hi Fabio,

if you could share the dashboard, that would be really awesome!

I believe the best would be to have a docker-compose.yaml that already contains the components we need, pre-configured:
1. Prometheus
2. Grafana

With regards to the Grafana dashboard, it should be shared on http://snapshot.raintank.io/info/ correct?

Luca.

Thomas Dräbing

Jan 14, 2020, 10:56:52 AM
to Luca Milanesio, Fabio Ponciroli, Repo and Gerrit Discussion
On Tue, 14 Jan 2020 at 16:33, Luca Milanesio <luca.mi...@gmail.com> wrote:


On 14 Jan 2020, at 06:01, Thomas Dräbing <thomas....@gmail.com> wrote:

Hi Luca, hi Fabio,

if you could share the dashboard, that would be really awesome!

I believe the best would be to have a docker-compose.yaml that already contains the components we need and pre-configured:
1. Prometheus
2. Grafana

I am currently working on a Kubernetes-based setup that is very opinionated, so mostly preconfigured. By that I mean that some configuration cannot sensibly be preconfigured (e.g. credentials), while other settings will stay the same most of the time.

In my approach, I use the Helm charts provided for Prometheus [1] and Grafana [2] and configure them as far as possible with values I think are sensible (and that have been tested to work so far). For options like credentials, which have to be set for each deployment, I created a leaner configuration file, which is used to generate the final configuration files for Helm and additional resources. That way, with only a little configuration and a few commands, one can set up the logging stack without having to spend a lot of time learning how to configure Prometheus or Grafana to get a basic monitoring setup for Gerrit.

I am planning to open-source this as soon as I have some dashboards and a sensible base configuration to work with, and have tested it with our setup.
 

With regards to the Grafana dashboard, it should be shared on http://snapshot.raintank.io/info/ correct?

Yes, that would be one option, as well as the Grafana homepage (https://grafana.com/grafana/dashboards). Or we just version the JSON files in a git repository. It will require a bit more work to deploy them though, since we can't just import them by ID or URL, but have to load the JSON. On the other hand, it would give us code review and a bit more control :-). Maybe we could do both?
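
A minimal sketch of what "loading the JSON" could look like when the dashboards are versioned in git. It assumes Grafana's HTTP API endpoint POST /api/dashboards/db and an API token; the function name, URL and token are placeholders for illustration, not part of any existing project. The request is only built here, not sent.

```python
import json
import urllib.request


def build_import_request(grafana_url, api_token, dashboard_json_text):
    """Build (but do not send) a request that uploads one dashboard.

    grafana_url and api_token are placeholders for your own deployment.
    """
    dashboard = json.loads(dashboard_json_text)
    # An exported dashboard carries the numeric id of the instance it came
    # from; clearing it lets the target Grafana assign its own id.
    dashboard["id"] = None
    payload = {"dashboard": dashboard, "overwrite": True}
    return urllib.request.Request(
        url=grafana_url.rstrip("/") + "/api/dashboards/db",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": "Bearer " + api_token,
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Sending it would then be a call to urllib.request.urlopen(request) for every JSON file in the repository, e.g. from a CI job after a change has been reviewed and merged.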
 

Paladox none

Jan 14, 2020, 1:06:42 PM
to Repo and Gerrit Discussion
Wikimedia use the javamelody dashboard [1]

Mihály Petrényi

Jan 15, 2020, 2:30:39 AM
to Repo and Gerrit Discussion
Hi,

I am from Ericsson. We are hosting a huge multi-site Gerrit instance using Grafana with Prometheus for monitoring.
We are using some standard Prometheus exporters: node, mtail, jmx, gerrit, postgres and haproxy.
Additionally, we are generating custom metrics, mainly based on node exporter's textfile collector functionality.
That is a great, easy-to-use feature; I highly recommend it.
Files containing the metrics can be placed in a directory and node exporter will serve those to Prometheus.
We are also planning to use Grafana Loki for log files, instead of the current solution with mtail.
These exporters provide us ~15k metrics / node.
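
To illustrate the textfile collector approach, here is a minimal sketch of a script that writes one custom metric into the directory node exporter reads (the one passed via --collector.textfile.directory; only files ending in .prom are picked up). The metric name, label and helper names are made up for the example, and label values are not escaped.

```python
import os
import tempfile


def render_metric(name, value, labels=None, help_text="", metric_type="gauge"):
    """Render one sample in the Prometheus text exposition format."""
    label_str = ""
    if labels:
        pairs = ",".join(
            '{}="{}"'.format(key, val) for key, val in sorted(labels.items())
        )
        label_str = "{" + pairs + "}"
    return (
        "# HELP {} {}\n".format(name, help_text)
        + "# TYPE {} {}\n".format(name, metric_type)
        + "{}{} {}\n".format(name, label_str, value)
    )


def write_textfile(directory, filename, content):
    """Write to a temp file, then rename, so a scrape never sees half a file."""
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        handle.write(content)
    # mkstemp creates the file as 0600; node exporter may run as another user.
    os.chmod(tmp_path, 0o644)
    os.replace(tmp_path, os.path.join(directory, filename))
```

A cron job could call render_metric("gerrit_repo_count", 42, {"datacenter": "dc1"}, "Number of repositories") and pass the result to write_textfile with a file name such as gerrit.prom.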

We split the metrics into several dashboards. At the moment we have the following main dashboards: Overview, Datacenters, Backend, Database, Frontend, Garbage Collection, Disk usage, Network, RED (based on Google's RED method), Replication, Repositories, Node exporter.

The most important thing is to have the Prometheus targets properly and consistently labeled. We are using the following common labels for our targets:
environment (dev, staging, production), configuration (master, slave), datacenter (for multi-site), job (exporter name), role (backend, frontend, gc, db)
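
One way to keep such labels consistent is to generate the target list instead of writing it by hand. The sketch below assumes Prometheus file-based service discovery (file_sd_configs), whose JSON files are lists of {"targets": [...], "labels": {...}} entries. The host name, file path and helper name are hypothetical, and the job label is left to the job_name of the scrape config.

```python
import json

# Allowed values taken from the label scheme listed above.
ALLOWED_VALUES = {
    "environment": {"dev", "staging", "production"},
    "configuration": {"master", "slave"},
    "role": {"backend", "frontend", "gc", "db"},
}
REQUIRED_LABELS = ("environment", "configuration", "datacenter", "role")


def make_target_group(targets, labels):
    """Return one file_sd entry, rejecting missing or misspelled labels."""
    missing = [key for key in REQUIRED_LABELS if key not in labels]
    if missing:
        raise ValueError("missing labels: " + ", ".join(missing))
    for key, allowed in ALLOWED_VALUES.items():
        if labels[key] not in allowed:
            raise ValueError("invalid value for {}: {}".format(key, labels[key]))
    return {"targets": list(targets), "labels": dict(labels)}


def write_file_sd(path, groups):
    """Write the groups as a JSON file that file_sd_configs can point at."""
    with open(path, "w") as handle:
        json.dump(groups, handle, indent=2, sort_keys=True)
```

A typo such as environment="prod" then fails when the file is generated rather than silently creating a second label value that dashboards do not select.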

Sample screenshot from our Overview dashboard: https://imgur.com/BLlGDt5

It is probably a good idea to have a shared, publicly available Grafana template for Gerrit that can be easily tailored to the given environments.
We are happy to contribute with our experience.

Regards,
Mihaly




Luca Milanesio

Jan 15, 2020, 10:31:46 AM
to Mihály Petrényi, Luca Milanesio, Repo and Gerrit Discussion

On 14 Jan 2020, at 07:20, Mihály Petrényi <e.mihaly...@gmail.com> wrote:

Hi,

I am from Ericsson. We are hosting a huge multi-site Gerrit instance using Grafana with Prometheus for monitoring.

Hi Mihaly, thanks for sharing your experience.

Off topic: which Gerrit multi-site implementation are you using? Gerrit + multi-site plugin? WANdisco? In-house implementation?
Is it a Gerrit multi-master/multi-site or a simple master-slave replication?

We are using some standard Prometheus exporters: node, mtail, jmx, gerrit, postgres and haproxy.

How do you synchronise the Postgres multi-site?

Additionally, we are generating custom metrics, mainly based on node exporter's textfilecollector functionality.
That is a great, easy to use feature, I highly recommend it.
Files containing the metrics can be placed in a directory and node exporter will serve those to Prometheus.

That’s a very good hint, thanks a lot for that.

We are also planning to use Grafana Loki for log files, instead of the current solution with mtail.
These exporters provide us ~15k metrics / node.

Really interesting also.


We split the metrics into several dashboards. At the moment we have the following main dashboards: Overview, Datacenters, Backend, Database, Frontend, Garbage Collection, Disk usage, Network, RED (based on Google's RED method), Replication, Repositories, Node exporter.

Most important thing is to have the Prometheus targets properly and consistently labeled. We are using the following common labels for our targets:
environment (dev, staging, production), configuration (master, slave), datacenter (for multi-site), job (exporter name), role (backend, frontend, gc, db)

Sample screenshot from our Overview dashboard: https://imgur.com/BLlGDt5

It is probably a good idea to have a shared, publicly available Grafana template for Gerrit that can be easily tailored to the given environments.
We are happy to contribute with our experience.

Have you thought about coming to a Gerrit User Summit and presenting your experience?

Luca.



Mihály Petrényi

Jan 15, 2020, 6:43:48 PM
to Repo and Gerrit Discussion

Off topic: which Gerrit multi-site implementation are you using? Gerrit + multi-site plugin? WANdisco? In-house implementation?
Is it a Gerrit multi-master/multi-site or a simple master-slave replication?

We are using an in-house implementation, as we are stuck with an older Gerrit version with PostgreSQL and Lucene. At the moment, it is a "simple" master-slave setup. The master uses the high-availability plugin to achieve an active-passive configuration. This is a bottleneck in our implementation, and we are planning to improve on it in the near future by introducing Elasticsearch and upgrading to NoteDb, which should bring us closer to a multi-master setup. For replication, we use the replication plugin with push replication.
 

How do you synchronise the Postgres multi-site?

 
We use the Postgres streaming replication feature and Pgpool for high availability.


Have you thought about coming to a Gerrit User Summit and presenting your experience?

I will bring this up within our team. Thanks for mentioning.

Mihaly


Mihály Petrényi

Jan 16, 2020, 4:07:03 AM
to Repo and Gerrit Discussion
A few additional things that might be worth mentioning.

For the Prometheus implementation, we use Cortex (https://github.com/cortexproject/cortex) for scalability and high availability.
For the shared Prometheus configuration, we use Consul (https://www.consul.io/).
Alerts are sent by Prometheus Alertmanager via e-mail, Microsoft Teams channels and SMS.
...

Thomas Dräbing

Jan 16, 2020, 8:35:46 AM
to Mihály Petrényi, Repo and Gerrit Discussion
Hi Mihály,

thanks for sharing all the details about your monitoring setup. I agree with Luca, it would be very interesting to see more about that, e.g. at the next User Summit. Especially if you already collected some experience with Loki, since I was also thinking about using it instead of an EFK-stack.

To come back to the original discussion: I think that if we share dashboards to monitor Gerrit, it may only be feasible to share dashboards containing data that is reported by Gerrit itself, since every setup might use different additional components (e.g. HAProxy, NGINX, ...). This might still be very useful for users who want to start monitoring. Such a dashboard will also depend heavily on the metric names used to store the data in Prometheus, and thus on the exporter used to collect the data. I am currently planning to use the metrics-reporter-prometheus plugin, so a dashboard I build would depend on this plugin. Thus, if we publish the dashboard in a git repository, it would probably make sense to do so in the repository of the plugin itself. It would only provide a starting point, since it would not allow combining data from different sources, which may provide the most interesting facts.




Mihály Petrényi

Jan 16, 2020, 9:25:15 AM
to Repo and Gerrit Discussion
On Thursday, January 16, 2020 at 2:35:46 PM UTC+1, Thomas Dräbing wrote:
I am currently planning to use the metrics-reporter-prometheus plugin, ...

That plugin is an easy win in terms of Gerrit-related metrics, but it has a small issue. It exports a lot of metrics, including cache-related ones, which increase the response time of the HTTP endpoint. (The plugin itself creates new HTTP endpoints.)
In fact, the HTTP response from the plugin will be as slow as the response of the "gerrit show-caches" SSH command.
If your Prometheus has a low scrape interval (under a minute), the connections can pile up and eventually overload the nodes.
Make sure your scrape interval is higher than the response time. On our master nodes, querying "gerrit show-caches" takes more than a minute.
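
A rough way to check this before pointing Prometheus at the endpoint is to time a few requests by hand and compare them with the planned scrape interval. The sketch below is only an illustration: the helper names and the safety factor of 2 are my own assumptions, and the URL of the metrics endpoint depends on your plugin setup.

```python
import time
import urllib.request


def fetch_metrics(url, timeout_seconds):
    """Download the whole metrics page once; url is your plugin's endpoint."""
    with urllib.request.urlopen(url, timeout=timeout_seconds) as response:
        response.read()


def measure_seconds(fetch):
    """Return how long one call to fetch() takes, in seconds."""
    start = time.monotonic()
    fetch()
    return time.monotonic() - start


def interval_is_safe(response_seconds, scrape_interval_seconds, safety_factor=2.0):
    """True if the interval leaves headroom over the measured response time.

    safety_factor is an arbitrary margin chosen for this example.
    """
    return scrape_interval_seconds >= response_seconds * safety_factor
```

For example, measure_seconds(lambda: fetch_metrics(url, 300)) on a master node, repeated a few times under load, gives the number to feed into interval_is_safe together with the scrape_interval you intend to configure.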
...

Thomas Dräbing

Jan 16, 2020, 9:35:03 AM
to Mihály Petrényi, Repo and Gerrit Discussion
Yes, we noticed as well that the disk state metrics of the persistent caches are an issue. In our case, we got 10+ hanging threads, because it sometimes took several minutes. We already knew that these metrics cause such issues, and for now I removed these metrics to test whether this fixes the issue. If it does, I plan to propose a change that adds an option to disable this metric in Gerrit.


Mihály Petrényi

Jan 16, 2020, 10:00:30 AM
to Repo and Gerrit Discussion
On Thursday, January 16, 2020 at 3:35:03 PM UTC+1, Thomas Dräbing wrote:
Yes, we noticed as well that the disk state metrics of the persistent caches are an issue. In our case, we got 10+ hanging threads, because it sometimes took several minutes. We already knew that these metrics cause such issues, and for now I removed these metrics to test whether this fixes the issue. If it does, I plan to propose a change that adds an option to disable this metric in Gerrit.

We decided to use metrics-reporter-jmx, which exports the same metrics but through the jmx_exporter, and it had the same issue.
I created a change to fix this by introducing a configuration option to exclude metrics by pattern matching, e.g.:
[metrics]
  exclude = caches.*

If you stick to metrics-reporter-prometheus, feel free to port this fix to metrics-reporter-prometheus as well.
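
To show the idea behind such an exclude option, here is a small sketch of pattern-based filtering. It is not the plugin's code (the plugin is written in Java, and its matching semantics may differ); it assumes each pattern is a regular expression matched from the start of the metric name, and the metric names in the usage note are only illustrative.

```python
import re


def filter_metrics(metric_names, exclude_patterns):
    """Drop every metric whose name matches one of the exclude patterns.

    Assumption: patterns are regular expressions anchored at the start of
    the name (re.match), so "caches.*" drops everything beginning with
    "caches".
    """
    compiled = [re.compile(pattern) for pattern in exclude_patterns]
    return [
        name
        for name in metric_names
        if not any(regex.match(name) for regex in compiled)
    ]
```

With exclude_patterns set to ["caches.*"], a list such as "caches/disk_cached", "caches/memory_hit_ratio" and "http/server/rest_api/count" is reduced to the last entry, so the expensive cache metrics are never computed for a scrape.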



Mihály Petrényi

Jan 16, 2020, 10:02:06 AM
to Repo and Gerrit Discussion
Sorry, forgot the link to metrics-reporter-jmx change: https://gerrit-review.googlesource.com/c/plugins/metrics-reporter-jmx/+/239375 .

Thomas Dräbing

Jan 16, 2020, 10:15:19 AM
to Mihály Petrényi, Repo and Gerrit Discussion
Thanks for the hint! This is exactly what I had in mind. I will blatantly copy your code and adapt it to the Prometheus reporter (hope that is fine :-)).


Luca Milanesio

Jan 16, 2020, 10:23:47 AM
to Thomas Dräbing, Luca Milanesio, Mihály Petrényi, Repo and Gerrit Discussion

On 16 Jan 2020, at 07:15, Thomas Dräbing <thomas....@gmail.com> wrote:

Thanks for the hint! This is exactly, what I had in mind. I will blatantly copy your code and adapt it to the prometheus reporter (Hope that is fine :-)).

Can you post it for review?

Luca.

Thomas Dräbing

Jan 16, 2020, 10:42:49 AM
to Luca Milanesio, Mihály Petrényi, Repo and Gerrit Discussion
On Thu, 16 Jan 2020 at 16:23, Luca Milanesio <luca.mi...@gmail.com> wrote:


On 16 Jan 2020, at 07:15, Thomas Dräbing <thomas....@gmail.com> wrote:

Thanks for the hint! This is exactly, what I had in mind. I will blatantly copy your code and adapt it to the prometheus reporter (Hope that is fine :-)).

Can you post it for review?

Cédric LE COZ

Mar 6, 2020, 1:29:50 AM
to Repo and Gerrit Discussion
Hi Thomas, all,

Did you ever share, make or find a Grafana dashboard for Gerrit? (A JSON file, if I understand correctly what I am seeing in the UI.)

I've just started looking at those Prometheus data points, but there are so many of them that creating a proper dashboard would take hours :)

Tks,
Cedric.

On Thursday, January 16, 2020 at 3:42:49 PM UTC, Thomas Dräbing wrote:
...

Thomas Dräbing

Mar 6, 2020, 3:25:25 AM
to Cédric LE COZ, Repo and Gerrit Discussion
Hi Cedric,

in our team we spent some time over the last few weeks creating a monitoring setup and some dashboards. It has now reached a state where it provides a solid base for monitoring Gerrit, and we plan to open-source it within the next few weeks to collaborate with the community on improving the setup further and to give everybody a solid monitoring experience to start with.

In the current state the setup is Kubernetes-based and provides an opinionated configuration for a stack including Prometheus, Prometheus Alertmanager, Grafana, Promtail, Loki and premade dashboards. The deployments are based on the charts provided by the respective projects. The installation is scripted. Only configuration expected to change between every setup (e.g. secrets) is directly exposed, reducing the complexity of configuring the charts considerably. Secret configuration can be encrypted (using Mozilla/sops) and thus be versioned within a private git repository and used by CI systems with reasonable safety. One of these setups may be used to monitor multiple Gerrit instances.

The dashboards in the project are currently versioned as JSON-files, so if you are only interested in them, you could just use those. We are thinking about moving to using Grafonnet or similar to have dashboards-as-code in the future.
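
As a rough illustration of the dashboards-as-code idea (written here in Python rather than Grafonnet), the sketch below generates a small dashboard JSON from a list of queries. The field names follow the Grafana dashboard JSON model as I understand it ("graph" panels on a 24-column grid) and should be checked against your Grafana version; the helper names are invented for the example.

```python
import json


def make_panel(panel_id, title, expr, y):
    """Return one full-width graph panel with a single Prometheus query."""
    return {
        "id": panel_id,
        "type": "graph",
        "title": title,
        "gridPos": {"h": 8, "w": 24, "x": 0, "y": y},
        "targets": [{"expr": expr, "refId": "A"}],
    }


def make_dashboard(title, queries):
    """Stack one panel per (panel title, PromQL expression) pair."""
    panels = [
        make_panel(index + 1, panel_title, expr, y=index * 8)
        for index, (panel_title, expr) in enumerate(queries)
    ]
    return {"id": None, "title": title, "panels": panels}


def to_versioned_json(dashboard):
    """Serialize with stable key order so diffs stay small in code review."""
    return json.dumps(dashboard, indent=2, sort_keys=True) + "\n"
```

The generated JSON can be committed next to the generator, so reviewers see both the change in intent and the resulting dashboard.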

@The Maintainers: Could we get a new repository for the monitoring setup (e.g. gerrit-monitoring)?

Best,
Thomas


David Pursehouse

Mar 6, 2020, 4:10:52 AM
to Thomas Dräbing, Cédric LE COZ, Repo and Gerrit Discussion
On Fri, Mar 6, 2020 at 5:25 PM Thomas Dräbing <thomas....@gmail.com> wrote:
@The Maintainers: Could we get a new repository for the monitoring setup (e.g. gerrit-monitoring)?


Owner group is "gerrit-monitoring-owners" which currently consists of me (its creator) and you.  Feel free to add others as you see fit.

 

Thomas Dräbing

Mar 6, 2020, 4:14:37 AM
to David Pursehouse, Cédric LE COZ, Repo and Gerrit Discussion
On Fri, 6 Mar 2020 at 10:10, David Pursehouse <david.pu...@gmail.com> wrote:

Owner group is "gerrit-monitoring-owners" which currently consists of me (its creator) and you.  Feel free to add others as you see fit.

Thanks David!

Luca Milanesio

Mar 6, 2020, 5:24:09 AM
to Thomas Dräbing, Luca Milanesio, Cédric LE COZ, Repo and Gerrit Discussion


> On 6 Mar 2020, at 08:25, Thomas Dräbing <thomas....@gmail.com> wrote:
>
> Hi Cedric,
>
> in our team we spend some time during the last weeks to create a monitoring setup and some dashboards. It has now reached a state that it provides a solid base for monitoring Gerrit and we plan to open source it during the next few weeks to collaborate with the community on improving the setup further and to provide everybody with a solid monitoring experience to start with.
>
> In the current state the setup is Kubernetes-based and provides an opinionated configuration for a stack including Prometheus, Prometheus Alertmanager, Grafana, Promtail, Loki and premade dashboards.

Wow, that looks amazing :-)

Does it actually require K8s? Or can it be used standalone with Docker?

> The deployments are based on the charts provided by the respective projects. The installation is scripted. Only configuration expected to change between every setup (e.g. secrets) is directly exposed, reducing the complexity of configuring the charts considerably. Secret configuration can be encrypted (using Mozilla/sops) and thus be versioned within a private git repository and used by CI systems with reasonable safety. One of these setups may be used to monitor multiple Gerrit instances.

Looking forward to seeing the first change coming :-)
Thanks for sharing your experience.

>
> The dashboards in the project are currently versioned as JSON-files, so if you are only interested in them, you could just use those. We are thinking about moving to using Grafonnet or similar to have dashboards-as-code in the future.

That looks very interesting, will have a look at it.

Luca.

Thomas Dräbing

Mar 6, 2020, 7:23:14 AM
to Luca Milanesio, Cédric LE COZ, Repo and Gerrit Discussion
On Fri, 6 Mar 2020 at 11:24, Luca Milanesio <luca.mi...@gmail.com> wrote:


> On 6 Mar 2020, at 08:25, Thomas Dräbing <thomas....@gmail.com> wrote:
>
> Hi Cedric,
>
> in our team we spend some time during the last weeks to create a monitoring setup and some dashboards. It has now reached a state that it provides a solid base for monitoring Gerrit and we plan to open source it during the next few weeks to collaborate with the community on improving the setup further and to provide everybody with a solid monitoring experience to start with.
>
> In the current state the setup is Kubernetes-based and provides an opinionated configuration for a stack including Prometheus, Prometheus Alertmanager, Grafana, Promtail, Loki and premade dashboards.

Wow, that looks amazing :-)

Does it actually require K8s? Or can be used standalone with Docker?

The setup is only available for Kubernetes. The project itself is basically just a collection of configuration, installation scripts and documentation that help with deploying the charts for the individual components. It does not create any new Helm chart, container etc. It allows getting a monitoring setup up and running in minutes if a Kubernetes cluster is available. The Gerrit instance does not have to run in Kubernetes, though; it can run anywhere. Promtail, the tool collecting the logs, has to be installed next to Gerrit with access to the logs on the filesystem. This can be done using the binary or a Docker container. The remaining components, however, run in Kubernetes. So far we monitor 3 Gerrit instances with it and only require about 0.6 CPUs and 3.3 GB RAM. However, this is not HA yet, only 4-6 people are looking at and querying the data, and we only keep the data for 4 weeks, so this will definitely increase soon.

In principle all components are available as docker containers and can be run as such, but the gerrit-monitoring project is currently not set up to support this. 
 

> The deployments are based on the charts provided by the respective projects. The installation is scripted. Only configuration expected to change between every setup (e.g. secrets) is directly exposed, reducing the complexity of configuring the charts considerably. Secret configuration can be encrypted (using Mozilla/sops) and thus be versioned within a private git repository and used by CI systems with reasonable safety. One of these setups may be used to monitor multiple Gerrit instances.

Looking forward to see the first change coming :-)
Thanks for sharing your experience.

Sure. I am looking forward to the community's feedback. I know that you and others have had Grafana dashboards for quite some time and have gathered experience with them. It would be great if everybody shared their work and experience, so that we can provide everybody with an easy-to-set-up monitoring stack. For us, this has already proved extremely valuable for optimizing our configuration after we updated to 2.16.

Matthias Sohn

Mar 11, 2020, 10:50:34 AM
to Thomas Dräbing, Luca Milanesio, Cédric LE COZ, Repo and Gerrit Discussion
Thomas uploaded our current monitoring setup to the new project.
We have been using it for a couple of weeks now for our production master and replica servers.
Kudos to Thomas for driving this Kubernetes-based monitoring setup.

Here are some screenshots:

gerrit-monitoring-queues.png
gerrit-monitoring-process.png

-Matthias 

Luca Milanesio

Mar 11, 2020, 10:57:26 AM
to Matthias Sohn, Thomas Dräbing, Luca Milanesio, Cédric LE COZ, Repo and Gerrit Discussion
Indeed, Kudos to Thomas :-)

Luca.

