On 14 Jan 2020, at 01:27, Thomas Dräbing <thomas....@gmail.com> wrote:

Dear all,

we plan to move some of our monitoring to the Prometheus/Grafana-stack. Among the dashboard collection published on the Grafana homepage, I couldn't find any existing dashboards for Gerrit metrics [1]. Before I start to create new dashboards from scratch, I wanted to ask whether somebody has Grafana dashboards for Gerrit metrics and is willing to share them with the community. Having a solid base to start from would be of great help (not only for me, I guess). Thus, help would be greatly appreciated!
--
--
To unsubscribe, email repo-discuss...@googlegroups.com
More info at http://groups.google.com/group/repo-discuss?hl=en
---
You received this message because you are subscribed to the Google Groups "Repo and Gerrit Discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to repo-discuss...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/repo-discuss/dc9d5de4-1e9b-4b98-87b5-28fe61985a57%40googlegroups.com.
On 14 Jan 2020, at 05:39, Fabio Ponciroli <pon...@gmail.com> wrote:

@Luca Milanesio we could extract the multi-site part and just publish the core metrics. WDYT?
On 14 Jan 2020, at 06:01, Thomas Dräbing <thomas....@gmail.com> wrote:
> Hi Luca, hi Fabio,
> if you could share the dashboard, that would be really awesome!

I believe the best would be to have a docker-compose.yaml that already contains the components we need, pre-configured:
1. Prometheus
2. Grafana

With regards to the Grafana dashboard, it should be shared on http://snapshot.raintank.io/info/, correct?

> Maybe we can version the JSON files describing the dashboards somewhere? Then it would be easy to adapt them to new metrics etc.
> I will of course also happily share what we did for our Prometheus/Grafana setup, as soon as it is ready.
> Best,
> Thomas
On Tue, 14 Jan 2020 at 14:52, Luca Milanesio <luca.m...@gmail.com> wrote:
On 14 Jan 2020, at 05:39, Fabio Ponciroli <pon...@gmail.com> wrote:
...@Luca Milanesio we could extract the multi-site part and just publish the core metrics. WDYT?
Sure, that would be a start.
Luca.
On Tue, 14 Jan 2020 at 14:37, Luca Milanesio <luca.m...@gmail.com> wrote:
On 14 Jan 2020, at 01:27, Thomas Dräbing <thomas....@gmail.com> wrote:
> Before I start to create new dashboards from scratch, I wanted to ask whether somebody has Grafana dashboards for Gerrit metrics and is willing to share them with the community.

We have one for Gerrit multi-site, which also includes replication and split-brain metrics.
See some screenshots at [2].

Luca.
On 14 Jan 2020, at 07:20, Mihály Petrényi <e.mihaly...@gmail.com> wrote:

Hi,
I am from Ericsson. We are hosting a huge multi-site Gerrit instance using Grafana with Prometheus for monitoring.
We are using some standard Prometheus exporters: node, mtail, jmx, gerrit, postgres and haproxy.
Additionally, we are generating custom metrics, mainly based on node exporter's textfile collector functionality.
That is a great, easy-to-use feature; I highly recommend it.
Files containing the metrics can be placed in a directory, and node exporter will serve them to Prometheus.
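
For readers who have not used the textfile collector, here is a minimal Python sketch of the idea, not the setup described in this post. It renders one sample in the Prometheus text exposition format and writes it atomically, so node exporter never reads a half-written file. The metric name, label and file name are hypothetical; the target directory is assumed to be the one node exporter is started with (--collector.textfile.directory), and only files ending in .prom are picked up. Label values are not escaped here.

```python
import os
import tempfile


def render_metric(name, help_text, value, labels=None, metric_type="gauge"):
    """Render a single sample in the Prometheus text exposition format."""
    label_str = ""
    if labels:
        # Sort labels so the output is stable between runs.
        pairs = ",".join(f'{key}="{val}"' for key, val in sorted(labels.items()))
        label_str = "{" + pairs + "}"
    return (
        f"# HELP {name} {help_text}\n"
        f"# TYPE {name} {metric_type}\n"
        f"{name}{label_str} {value}\n"
    )


def write_textfile(directory, filename, content):
    """Write to a temp file in the same directory, then rename it into place."""
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as tmp:
        tmp.write(content)
    # mkstemp creates the file with mode 0600; make it readable for the exporter.
    os.chmod(tmp_path, 0o644)
    final_path = os.path.join(directory, filename)
    os.replace(tmp_path, final_path)  # atomic on the same filesystem
    return final_path


if __name__ == "__main__":
    # Hypothetical metric; a real script would compute the value itself.
    text = render_metric(
        "gerrit_custom_open_changes",
        "Number of open changes (hypothetical example).",
        42,
        labels={"project": "demo"},
    )
    out_dir = tempfile.mkdtemp()  # stand-in for the textfile directory
    print(write_textfile(out_dir, "gerrit_custom.prom", text))
```

Such a script would typically be run from cron or a systemd timer.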
We are also planning to use Grafana Loki for log files, instead of the current solution with mtail.
These exporters provide us with ~15k metrics per node.
We split the metrics into several dashboards. At the moment we have the following main dashboards: Overview, Datacenters, Backend, Database, Frontend, Garbage Collection, Disk usage, Network, RED (based on the RED method), Replication, Repositories, Node exporter.
The most important thing is to have the Prometheus targets properly and consistently labeled. We are using the following common labels for our targets:
environment (dev, staging, production), configuration (master, slave), datacenter (for multi-site), job (exporter name), role (backend, frontend, gc, db)
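
One way to keep such labels consistent is to generate the target list from a script that rejects incomplete or misspelled labels. The Python sketch below is only an illustration under assumptions, not the tooling used in this setup: it emits target groups in the JSON shape that Prometheus file-based service discovery reads (a list of objects with "targets" and "labels"). The host names are made up, and the job label is left out because Prometheus normally sets it from the scrape config's job_name.

```python
import json

# Allowed values, taken from the label list above.
ALLOWED = {
    "environment": {"dev", "staging", "production"},
    "configuration": {"master", "slave"},
    "role": {"backend", "frontend", "gc", "db"},
}
# datacenter is required, but its values are site-specific.
REQUIRED = set(ALLOWED) | {"datacenter"}


def make_target_group(targets, labels):
    """Return one file_sd target group, or raise if labels are inconsistent."""
    missing = REQUIRED - set(labels)
    if missing:
        raise ValueError(f"missing labels: {sorted(missing)}")
    for key, allowed in ALLOWED.items():
        if labels[key] not in allowed:
            raise ValueError(f"invalid value for {key}: {labels[key]!r}")
    return {"targets": list(targets), "labels": dict(labels)}


if __name__ == "__main__":
    groups = [
        make_target_group(
            ["gerrit-backend-1.example.com:9100"],  # hypothetical host
            {
                "environment": "production",
                "configuration": "master",
                "datacenter": "dc1",
                "role": "backend",
            },
        )
    ]
    print(json.dumps(groups, indent=2))
```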
Sample screenshot from our Overview dashboard: https://imgur.com/BLlGDt5
It is probably a good idea to have a shared, publicly available Grafana template for Gerrit that can be easily tailored to the given environments.
We are happy to contribute with our experience.
Off topic: which Gerrit multi-site implementation are you using? Gerrit + multi-site plugin? WANdisco? In-house implementation?
Is it a Gerrit multi-master/multi-site or a simple master-slave replication?
How do you synchronise the Postgres multi-site?
Have you thought about coming to a Gerrit User Summit and presenting your experience?
I am currently planning to use the metrics-reporter-prometheus plugin, ...
On Thursday, January 16, 2020 at 3:35:03 PM UTC+1, Thomas Dräbing wrote:
> Yes, we noticed as well that the disk state metrics of the persistent caches are an issue. In our case, we got 10+ hanging threads, because it sometimes took several minutes. We already knew that these metrics cause such issues, and for now I removed these metrics to test whether this fixes the issue. If it does, I plan to propose a change that adds an option to disable this metric in Gerrit.

[metrics]
exclude = caches.*
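
To illustrate what an exclude pattern like "caches.*" would filter out, here is a small Python sketch. It assumes the pattern is a regular expression matched from the start of the metric name; the reporter's actual matching semantics are not stated in this thread, and the metric names below are only examples.

```python
import re


def filter_metrics(names, exclude_patterns):
    """Drop every metric name that matches one of the exclude patterns.

    Assumption: patterns are regular expressions anchored at the start
    of the metric name (re.match); the real reporter may differ.
    """
    compiled = [re.compile(pattern) for pattern in exclude_patterns]
    return [
        name
        for name in names
        if not any(regex.match(name) for regex in compiled)
    ]


if __name__ == "__main__":
    # Example metric names, for illustration only.
    names = [
        "caches/disk_cached",
        "caches/memory_cached",
        "proc/uptime",
        "http/server/rest_api/count",
    ]
    for name in filter_metrics(names, ["caches.*"]):
        print(name)
```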
On 16 Jan 2020, at 07:15, Thomas Dräbing <thomas....@gmail.com> wrote:
> Thanks for the hint! This is exactly what I had in mind. I will blatantly copy your code and adapt it to the prometheus reporter (Hope that is fine :-)).

Can you post it for review?
Hi Cedric,

in our team we spent some time during the last weeks creating a monitoring setup and some dashboards. It has now reached a state where it provides a solid base for monitoring Gerrit, and we plan to open source it during the next few weeks to collaborate with the community on improving the setup further and to provide everybody with a solid monitoring experience to start with.

In the current state the setup is Kubernetes-based and provides an opinionated configuration for a stack including Prometheus, Prometheus Alertmanager, Grafana, Promtail, Loki and premade dashboards. The deployments are based on the charts provided by the respective projects. The installation is scripted. Only configuration expected to change between every setup (e.g. secrets) is directly exposed, reducing the complexity of configuring the charts considerably. Secret configuration can be encrypted (using Mozilla/sops) and thus be versioned within a private git repository and used by CI systems with reasonable safety. One of these setups may be used to monitor multiple Gerrit instances.

The dashboards in the project are currently versioned as JSON files, so if you are only interested in them, you could just use those. We are thinking about moving to Grafonnet or similar to have dashboards-as-code in the future.

@The Maintainers: Could we get a new repository for the monitoring setup (e.g. gerrit-monitoring)?
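
When dashboards are versioned as JSON files, diffs stay readable if every export is normalized before it is committed. The Python sketch below is one possible approach under assumptions, not part of the setup described above: it clears the instance-specific "id", drops the "version" counter that Grafana increments on each save, and writes the keys in a stable order. Which fields count as volatile is an assumption and may need adjusting.

```python
import json


def normalize_dashboard(raw_json):
    """Return a stable, diff-friendly rendering of an exported dashboard."""
    dashboard = json.loads(raw_json)
    # The numeric id is specific to one Grafana instance.
    dashboard["id"] = None
    # The version counter changes on every save and only adds noise to diffs.
    dashboard.pop("version", None)
    return json.dumps(dashboard, indent=2, sort_keys=True) + "\n"


if __name__ == "__main__":
    # Hypothetical, heavily shortened export.
    exported = '{"version": 7, "title": "Gerrit Overview", "id": 12, "panels": []}'
    print(normalize_dashboard(exported), end="")
```

Two exports of the same dashboard that differ only in "id" and "version" then produce identical files.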
On Fri, Mar 6, 2020 at 5:25 PM Thomas Dräbing <thomas....@gmail.com> wrote:
> @The Maintainers: Could we get a new repository for the monitoring setup (e.g. gerrit-monitoring)?

Owner group is "gerrit-monitoring-owners", which currently consists of me (its creator) and you. Feel free to add others as you see fit.
> On 6 Mar 2020, at 08:25, Thomas Dräbing <thomas....@gmail.com> wrote:
>
> Hi Cedric,
>
> in our team we spent some time during the last weeks creating a monitoring setup and some dashboards. It has now reached a state where it provides a solid base for monitoring Gerrit, and we plan to open source it during the next few weeks to collaborate with the community on improving the setup further and to provide everybody with a solid monitoring experience to start with.
>
> In the current state the setup is Kubernetes-based and provides an opinionated configuration for a stack including Prometheus, Prometheus Alertmanager, Grafana, Promtail, Loki and premade dashboards.
Wow, that looks amazing :-)
Does it actually require K8s? Or can it be used standalone with Docker?
> The deployments are based on the charts provided by the respective projects. The installation is scripted. Only configuration expected to change between every setup (e.g. secrets) is directly exposed, reducing the complexity of configuring the charts considerably. Secret configuration can be encrypted (using Mozilla/sops) and thus be versioned within a private git repository and used by CI systems with reasonable safety. One of these setups may be used to monitor multiple Gerrit instances.
Looking forward to seeing the first change coming :-)
Thanks for sharing your experience.


Here are some screenshots:
<gerrit-monitoring-queues.png>
<gerrit-monitoring-process.png>

-Matthias