Prometheus data based on service-level objectives


Juergen Etzlstorfer

Jun 18, 2020, 8:25:21 AM
to Prometheus Users
Hi,

I am a core contributor to an open-source project that uses Prometheus data to evaluate the quality criteria of a microservice or an application. Basically, we automatically pull the relevant metrics and evaluate them against a given specification file based on service-level objectives (SLOs). We usually use SLOs based on the RED metrics, such as throughput, error rate, and duration/response time.
We want to provide more integrations out of the box, and I would be interested to hear which metrics you usually check to determine the quality of your software.
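
To give a concrete idea of the kind of check we automate, here is a minimal sketch of evaluating two RED-style SLOs against the Prometheus HTTP API. This is only an illustration: the Prometheus URL, metric names, label values, and thresholds are placeholders, and it is not Keptn's actual SLO file format or implementation.

# Illustrative sketch only: not Keptn's actual SLO format or code.
# The Prometheus URL, metric names, and thresholds are placeholders.
import requests

PROMETHEUS = "http://localhost:9090"  # assumed local Prometheus

# A tiny "specification": SLOs as PromQL expressions plus pass criteria.
SLOS = {
    "error_rate": {
        "query": 'sum(rate(http_requests_total{job="my-service",code=~"5.."}[5m]))'
                 ' / sum(rate(http_requests_total{job="my-service"}[5m]))',
        "max": 0.01,   # pass if the error rate stays below 1%
    },
    "p95_response_time_seconds": {
        "query": 'histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket{job="my-service"}[5m])) by (le))',
        "max": 0.5,    # pass if p95 latency stays below 500ms
    },
}

def evaluate(name, slo):
    # Query the Prometheus HTTP API for the current value of the SLI.
    resp = requests.get(f"{PROMETHEUS}/api/v1/query", params={"query": slo["query"]})
    resp.raise_for_status()
    result = resp.json()["data"]["result"]
    value = float(result[0]["value"][1]) if result else float("nan")
    passed = value <= slo["max"]
    print(f"{name}: {value:.4f} (max {slo['max']}) -> {'PASS' if passed else 'FAIL'}")
    return passed

if __name__ == "__main__":
    results = [evaluate(name, slo) for name, slo in SLOS.items()]
    print("overall:", "PASS" if all(results) else "FAIL")

In Keptn the SLOs live in a declarative specification file and the evaluation is triggered by the tooling, but the underlying idea is the same: fetch the current SLI values, compare them against the pass criteria, and aggregate an overall result.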

If you want to take a look at how we are doing it, we recently recorded a webinar demonstrating this use case: https://www.youtube.com/watch?v=hx0NHj4u7ic

The project itself can be found here if you are interested: https://keptn.sh 

Looking forward to hearing your thoughts on this! 


Ben Kochie

Jun 18, 2020, 8:34:24 AM
to Juergen Etzlstorfer, Prometheus Users
We're doing something very similar. We have a metrics catalog[0] that turns all of our service-level metrics into uniform SLO metrics.



Juergen Etzlstorfer

Jun 18, 2020, 8:41:39 AM
to Prometheus Users
That's interesting! 
Which metrics do you usually focus on?

Could you please provide a bit more context on how you are then using the jsonnet files?
When is the evaluation triggered? Each time a new microservice gets deployed or by manual execution? 




Ben Kochie

Jun 18, 2020, 8:49:35 AM
to Juergen Etzlstorfer, Prometheus Users
On Thu, Jun 18, 2020 at 2:41 PM Juergen Etzlstorfer <juergen.e...@dynatrace.com> wrote:
That's interesting! 
Which metrics do you usually focus on?

We're following a variation of the RED method.
* Apdex scores for duration
* Request rates
* Error rates
* Component saturation
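
For the apdex scores, we derive them from the request duration histogram buckets in the usual way: requests under a "satisfied" threshold count fully, requests under a "tolerated" threshold count half. A toy illustration of the arithmetic (the thresholds and counts below are made up, not taken from our actual rules):

# Toy illustration of the apdex calculation over duration histogram buckets.
# The thresholds and counts are made-up numbers, not real service data.
requests_le_satisfied = 900    # requests with duration <= 0.1s (the "satisfied" threshold)
requests_le_tolerated = 980    # requests with duration <= 0.4s (the "tolerated" threshold)
requests_total = 1000          # all requests in the same window

# Satisfied requests count fully, tolerating requests (between the thresholds) count half.
apdex = (requests_le_satisfied
         + (requests_le_tolerated - requests_le_satisfied) / 2) / requests_total
print(f"apdex score: {apdex:.3f}")   # 0.940 for these numbers

In PromQL the same thing falls out of summing the rates of the two bucket counters, dividing by two, and dividing by the total request rate.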


Could you please provide a bit more context on how you are then using the jsonnet files?

The jsonnet files allow us to define what metrics are associated with what component. For example, "these histogram buckets are used for apdex".

This allows us to generate standardized recording rules, as well as populate automated Grafana dashboards.
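
To give a flavor of how that works, here is a simplified sketch in Python rather than our actual jsonnet; the service name, metric names, thresholds, and rule names are placeholders. Each service declares its request counter and which histogram buckets mark the apdex thresholds, and a generator expands that into a standard Prometheus rule file:

# Simplified sketch of the metrics-catalog idea, in Python instead of jsonnet.
# Service names, metric names, thresholds, and rule names are placeholders.
import yaml  # PyYAML, assumed available

SERVICES = {
    "web": {
        "apdex_histogram": "http_request_duration_seconds_bucket",
        "satisfied_le": "0.1",
        "tolerated_le": "0.4",
        "request_counter": "http_requests_total",
    },
}

def recording_rules(name, svc):
    # Expand one service definition into standardized recording rules.
    selector = f'job="{name}"'
    return [
        {
            "record": "slo:apdex:ratio_rate5m",
            "labels": {"service": name},
            "expr": (
                f'(sum(rate({svc["apdex_histogram"]}{{{selector},le="{svc["satisfied_le"]}"}}[5m]))'
                f' + sum(rate({svc["apdex_histogram"]}{{{selector},le="{svc["tolerated_le"]}"}}[5m])))'
                f' / 2 / sum(rate({svc["request_counter"]}{{{selector}}}[5m]))'
            ),
        },
        {
            "record": "slo:error:ratio_rate5m",
            "labels": {"service": name},
            "expr": (
                f'sum(rate({svc["request_counter"]}{{{selector},code=~"5.."}}[5m]))'
                f' / sum(rate({svc["request_counter"]}{{{selector}}}[5m]))'
            ),
        },
    ]

rule_file = {
    "groups": [
        {"name": f"slo-{name}", "rules": recording_rules(name, svc)}
        for name, svc in SERVICES.items()
    ]
}
print(yaml.safe_dump(rule_file, sort_keys=False))

The real catalog is jsonnet rather than Python, and the same per-service definitions also feed the generated Grafana dashboards.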
 
When is the evaluation triggered? Each time a new microservice gets deployed or by manual execution?

The jsonnet files are in the repo and get built and published in CI pipelines whenever we change our services, whether that's sharding changes, new services, or new metrics for existing services.
 




Juergen Etzlstorfer

Jun 18, 2020, 9:13:33 AM
to Prometheus Users
On Thursday, June 18, 2020 at 2:49:35 PM UTC+2, Ben Kochie wrote:


On Thu, Jun 18, 2020 at 2:41 PM Juergen Etzlstorfer <juergen.e...@dynatrace.com> wrote:
That's interesting! 
Which metrics do you usually focus on?

We're following a variation of the RED method.
* Apdex scores for duration
* Request rates
* Error rates
* Component saturation


Could you please provide a bit more context on how you are then using the jsonnet files?

The jsonnet files allow us to define what metrics are associated with what component. For example, "these histogram buckets are used for apdex".

This allows us to generate standardized recording rules, as well as populate automated Grafana dashboards.
 
Are you also using the files for automated generation of alerts?  

 
 
When is the evaluation triggered? Each time a new microservice gets deployed or by manual execution?

The jsonnet files are in the repo and get built and published in CI pipelines whenever we change our services, whether that's sharding changes, new services, or new metrics for existing services.

That sounds similar to the approach we are taking. 
 
 


