Dear all,
At the Gerrit Contributor Summit there was some interest in the Grafana dashboards for Gerrit that were created as part of the gerrit-monitoring project. I therefore want to give a short update on the current state of the project.
Some time ago, we at SAP noticed that some operational tasks were more tedious than necessary, because important metrics were not readily available to us, even though Gerrit already supported exporting them. To mitigate that, we set up a logging/monitoring stack using Prometheus, Loki and Grafana. Since then we have been able to detect issues earlier and find their root causes faster than before, which helps us greatly in running our large Gerrit servers. Early on, we published the setup in the gerrit-monitoring project [1].
The gerrit-monitoring project provides an easy way to install all components of the stack into a Kubernetes cluster. It installs Prometheus and configures it to monitor a set of Gerrit servers, which may run in a Kubernetes cluster or somewhere else. It installs Grafana together with some dashboards showing the most important metrics. Further, it installs Loki, a log aggregator that lets you access your logs within Grafana. To collect the logs, the setup also creates configuration files for Promtail, the log collector used by Loki.
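To illustrate the Prometheus part, a scrape job for Gerrit could be sketched roughly as follows. This is not taken from the project. It assumes that the metrics-reporter-prometheus plugin is installed in Gerrit and serves metrics under `/plugins/metrics-reporter-prometheus/metrics` protected by a bearer token, and that your Prometheus version supports the `authorization` block in a scrape config. The function name, host name, job name and token are placeholders, and the configuration that the project generates may well look different.

```python
import json


def gerrit_scrape_job(targets, bearer_token, job_name="gerrit"):
    """Build one Prometheus scrape job (as a plain dict) for a list of Gerrit hosts.

    Assumption: Gerrit exposes metrics via the metrics-reporter-prometheus
    plugin under the path below and expects a bearer token.
    """
    return {
        "job_name": job_name,
        "scheme": "https",
        "metrics_path": "/plugins/metrics-reporter-prometheus/metrics",
        "authorization": {"type": "Bearer", "credentials": bearer_token},
        "static_configs": [{"targets": list(targets)}],
    }


if __name__ == "__main__":
    # Placeholder host and token; replace with your own values.
    config = {
        "scrape_configs": [gerrit_scrape_job(["gerrit.example.com"], "<token>")]
    }
    # Printed as JSON for inspection; translate to YAML for prometheus.yml.
    print(json.dumps(config, indent=2))
```

Gerrit servers running outside the cluster can be listed as additional entries in `targets`.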
To make the setup as easy and quick as possible, the configuration of the Helm charts used to deploy the components is quite opinionated and is based on what we at SAP required of it. However, it should provide a good starting point for everyone and can easily be adjusted to different needs. Contributions with improvements are of course always welcome :-).
Many of you might not want to install the whole stack, or might not have a Kubernetes cluster at your disposal. The dashboards provided by the project can of course also be used with your own Prometheus/Grafana setup, independent of where it runs. All dashboards are located in the `dashboards` directory in the project's root directory and can be imported into any Grafana instance. I also uploaded the dashboards to the Grafana homepage [2], so that they are easier to discover and import. We would be more than happy to receive feedback on the dashboards. Also, if you already have your own dashboards, please share them and your experiences, so that we can further improve on the current state.
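If you prefer to script the import instead of using the Grafana UI, something along the following lines might work. It is only a sketch that relies on Grafana's `POST /api/dashboards/db` HTTP endpoint. The helper names, the Grafana URL, the API token and the file path are placeholders of mine and not part of the project. Dashboards that contain data source placeholders may need to be adjusted to your data source names first.

```python
import json
import urllib.request


def build_import_payload(dashboard, overwrite=True):
    """Wrap a dashboard definition in the body expected by POST /api/dashboards/db."""
    dashboard = dict(dashboard)  # shallow copy, so the caller's dict is not modified
    dashboard["id"] = None       # let the target Grafana assign its own numeric id
    return {"dashboard": dashboard, "overwrite": overwrite}


def import_dashboard(grafana_url, api_token, path):
    """Upload one dashboard JSON file to a Grafana instance (hypothetical helper)."""
    with open(path, encoding="utf-8") as f:
        dashboard = json.load(f)
    body = json.dumps(build_import_payload(dashboard)).encode("utf-8")
    request = urllib.request.Request(
        grafana_url.rstrip("/") + "/api/dashboards/db",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + api_token,
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling `import_dashboard` once for every JSON file in the `dashboards` directory would then upload the whole set.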
Current efforts in the project are: using Grafonnet to create the dashboards (already in review); offering Fluent Bit/Elasticsearch for logging as an alternative to Loki, which still has some issues; and improving the installation scripts to make them easier to extend.
I am looking forward to your feedback,
Thomas