Introduce the concept of scrape Priority for Targets


Lili Cosic

Jul 22, 2020, 5:14:14 AM
to Prometheus Developers
I've only now seen in the docs that I'm supposed to start any discussion here first before opening an issue, sorry about that! :)

Currently there is no way for one target to have a higher scrape priority than another. Even if you set target limits and sample limits, you can still overestimate what your setup can handle, and in that case you want high-priority targets to be preferred over the entire Prometheus failing. It would need to be based on the inability to ingest into the TSDB at the current scrape rate; once that is hit, the priority class would take effect and only the highest-priority targets would be scraped, in favour of dropping the lower-priority ones. Another option, which might be simpler, would be a global limit on how much Prometheus can handle, based on perf testing.
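For context, this is what I mean by the target limits and sample limits that exist today (a rough sketch; the job name and the numbers are arbitrary):

scrape_configs:
  - job_name: tenant-a
    sample_limit: 50000   # the scrape is failed if a target exposes more than this many samples
    target_limit: 500     # this job's targets are failed if more than this many are discovered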

This would be treated as a last resort, and there would definitely need to be a high-severity alert to inform the admin that something went terribly wrong. But because we would still be able to ingest, for example, Prometheus' own metrics if they are in a higher priority class, alerting would still be possible.

We could model this on something like PriorityClass from Kubernetes, but I am open to other suggestions.
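To make it concrete, here is a purely hypothetical sketch of what that could look like in the scrape config; the priority field does not exist today and the names are only for illustration:

scrape_configs:
  - job_name: control-plane
    priority: critical      # hypothetical: keep scraping these even when ingestion falls behind
  - job_name: user-workloads
    priority: best-effort   # hypothetical: shed these first when the TSDB cannot keep up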

Or maybe something like this already exists and I missed it. The main purpose is to ensure there are protection mechanisms in place, so any ideas and suggestions are welcome!

Thanks and kind regards,
Lili

Julien Pivotto

Jul 22, 2020, 5:18:04 AM
to Lili Cosic, Prometheus Developers
On 22 Jul 02:14, Lili Cosic wrote:
> Only now seen in the docs that I am supposed to start any discussions here
> first before opening an issue, sorry about that! :)
>
> Currently there is no way of a target to have higher scrape priority over
> another, but if you have a setup and even if you set target limits and
> sample limits you can still overestimate your setup, you still want to have
> a higher priority targets that are preferred over the entire Prometheus to
> fail. It would need to be based on the inability to ingest into tsdb on the
> current rate we are scrapping, if that is hit the priority class would take
> affect and only the highest priority targets would be scrapped in favour of
> lower priority. Another option which might be simpler would be to have a
> global limit on how much prometheus can handle based on perf testing.
>
> This would be treated as a last resort, and there would definitely be a
> need for a high severity alert to inform the admin that something went
> terribly wrong, but because we would still be able to ingest Prometheus
> metrics for example if they are higher priority class alerting would be
> possible.

Hi,

I think that limiting the number of targets you scrape is already a last
resort. I don't think we would need a second line of defense.

You can achieve this priority by setting up 2 jobs, one which is limited
and one which is not, and using relabeling to decide which target goes
in which job.

>
> We could model this on something like PriorityClass
> <https://kubernetes.io/docs/concepts/configuration/pod-priority-preemption/#priorityclass> from
> Kubernetes, but I am open to other suggestions.

That could be used in relabeling as I said.

regards,



--
Julien Pivotto
@roidelapluie

Brian Brazil

Jul 22, 2020, 5:23:00 AM
to Lili Cosic, Prometheus Developers
On Wed, 22 Jul 2020 at 10:18, Julien Pivotto <roidel...@prometheus.io> wrote:

Hi,

I think that limiting the number of targets you scrape is already a last
resort. I don't think we would need a second line of defense.

I agree with Julien here. If you've gotten to this point you're already seriously overloaded, and prioritising individual targets is just rearranging the deckchairs at that point.
 

You can achieve this priority by setting 2 jobs, one which is limited
and one which is not, and use relabeling to decinde which target is
going in which job.

Or more generally, one Prometheus for the important targets and another for the less important and riskier targets.
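Roughly something like the following sketch, assuming Kubernetes SD and a pod label called "priority" (both only for illustration):

# prometheus-critical.yml - a small, well-provisioned instance for the important targets
scrape_configs:
  - job_name: critical-pods
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_label_priority]   # assumed pod label
        regex: high
        action: keep

# prometheus-best-effort.yml - a separate instance that is allowed to fall over
scrape_configs:
  - job_name: best-effort-pods
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_label_priority]
        regex: high
        action: drop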

Brian
 


Lili Cosic

Jul 22, 2020, 5:35:30 AM
to Prometheus Developers
I get your point completely, Brian, and agree to some degree, but people are still going to set up a multi-tenant Prometheus, which then causes the problems I mentioned above. Even within the riskier targets, some will be more important to users than others. I think we should still strive to make a single shared Prometheus as safe as possible; if that is not the priority class I suggested, I'm open to other ideas!
 


Julien Pivotto

Jul 22, 2020, 5:40:29 AM
to Lili Cosic, Prometheus Developers
Then 2 jobs are the answer, one unlimited and one limited.

The target_limit is already a pretty advanced use case.

--
Julien Pivotto
@roidelapluie

Frederic Branczyk

Jul 22, 2020, 10:32:37 AM
to Lili Cosic, Prometheus Developers
Can you explain what you mean by two jobs? Do you mean two scrape configs?

Julien Pivotto

Jul 22, 2020, 10:34:18 AM
to Frederic Branczyk, Lili Cosic, Prometheus Developers
On 22 Jul 16:32, Frederic Branczyk wrote:
> Can you explain what you mean by two jobs? Do you mean two scrape configs?

Yes.

--
Julien Pivotto
@roidelapluie

Frederic Branczyk

Jul 22, 2020, 10:36:21 AM
to Frederic Branczyk, Lili Cosic, Prometheus Developers
It's unclear how that helps; can you help me understand?

Julien Pivotto

Jul 22, 2020, 10:39:00 AM
to Frederic Branczyk, Lili Cosic, Prometheus Developers
On 22 Jul 16:36, Frederic Branczyk wrote:
> It's unclear how that helps, can you help me understand?

- job_name: highprio
  relabel_configs:
    - target_label: job
      replacement: pods                       # both jobs present their targets as job="pods"
    - source_labels: [__meta_pod_priority]    # illustrative SD meta label, e.g. a pod label exposed by service discovery
      regex: high
      action: keep                            # this job keeps only the high-priority targets, with no limit
- job_name: lowprio
  relabel_configs:
    - target_label: job
      replacement: pods
    - source_labels: [__meta_pod_priority]
      regex: high
      action: drop                            # everything that is not high priority lands in this job
  target_limit: 1000                          # and is capped

--
Julien Pivotto
@roidelapluie

Frederic Branczyk

Jul 22, 2020, 10:47:19 AM
to Frederic Branczyk, Lili Cosic, Prometheus Developers
In practice even that can still be problematic. You only know that Prometheus has a problem when everything fails; the point is to keep things alive well enough for the more critical components.

Julien Pivotto

Jul 22, 2020, 11:00:26 AM
to Frederic Branczyk, Lili Cosic, Prometheus Developers
On 22 Jul 16:47, Frederic Branczyk wrote:
> In practice even that can still be problematic. You only know that
> Prometheus has a problem when everything fails, the point is to keep things
> alive well enough for more critical components.

The highprio job will always be scraped.

--
Julien Pivotto
@roidelapluie

Frederic Branczyk

Jul 30, 2020, 4:31:00 AM
to Frederic Branczyk, Lili Cosic, Prometheus Developers
That's only effective at limiting the number of targets; the point here is to selectively scrape those with a higher priority, based on backpressure from the system as a whole.

Ben Kochie

Jul 30, 2020, 4:44:59 AM
to Frederic Branczyk, Lili Cosic, Prometheus Developers
I'm with Brian and Julien on this.

Multi-tenancy is not really something we want to solve in Prometheus. This is a concern for higher level systems like Kubernetes. Prometheus is designed to be distributed. If you have targets with different needs, they need to have separate Prometheus instances.

This is also why we have things like Thanos and Cortex as aggregation layers.

Similar to why we have said we don't plan to implement IO limits, this is a scheduling concern, out of scope for Prometheus.

Lili Cosic

Jul 30, 2020, 5:01:30 AM
to Prometheus Developers
Thanks, everyone, for the replies! The official message seems to be to use a Prometheus instance per tenant/priority if you want to have multiple tenants in your environment.

Kind regards,
Lili


Bartłomiej Płotka

Jul 30, 2020, 5:10:52 AM
to Lili Cosic, Prometheus Developers
Yes, it looks like having many scrapers would solve this, with Thanos on top for query aggregation. However, given the overhead of operating even more TSDB instances like Prometheus (e.g. maintaining persistent volumes), I would still see a longer-term solution in better multi-tenant support (isolation of tenants' scrapes) within the scrape engine. An alternative is dynamic relabelling configured from the outside, as seen here: https://blog.freshtracks.io/bomb-squad-automatic-detection-and-suppression-of-prometheus-cardinality-explosions-62ca8e02fa32 - I think that with good monitoring of Prometheus' own health we could implement a "sidecar" applying such priorities dynamically as well. That would be good for a start maybe (: 
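For illustration, the kind of rule such a sidecar injects in the bomb-squad approach is a metric_relabel_configs drop (a sketch; the metric name is a placeholder for whatever was detected as exploding):

metric_relabel_configs:
  - source_labels: [__name__]
    regex: exploding_metric_name   # placeholder for the detected high-cardinality metric
    action: drop                   # stop ingesting it until an operator steps in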

In the meantime, the separate scraper looks like the way to go.

Kind Regards,
Bartek



Julien Pivotto

Jul 30, 2020, 5:19:26 AM
to Bartłomiej Płotka, Lili Cosic, Prometheus Developers
The problem is not so much the priorities themselves, it is all the questions and
confusion around this:

- When do we decide we are overloaded?
- What do we do with the low-priority targets?

and more importantly:

- When do we decide that we can scrape the low-priority targets again?

How to avoid:

High load -> stop low scrapes
-> Normal load (because we do not scrape low priorities) -> restart low scrapes
-> High load -> stop low scrapes
-> Normal load (because we do not scrape low priorities) -> restart low scrapes
-> High load -> stop low scrapes
-> Normal load (because we do not scrape low priorities) -> restart low scrapes


Overall, these do not seem like easy questions.
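Presumably any such mechanism would need hysteresis, separate pause/resume thresholds plus a hold time, and someone has to pick all of those numbers. A purely hypothetical sketch, none of these fields exist:

overload_policy:                      # hypothetical, not a real Prometheus option
  pause_low_priority_when:
    head_series_above: 2000000
  resume_low_priority_when:
    head_series_below: 1500000        # well below the pause threshold
    sustained_for: 30m                # hold period to avoid flapping straight back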

--
Julien Pivotto
@roidelapluie

Chris Marchbanks

Jul 30, 2020, 9:43:03 AM
to Bartłomiej Płotka, Lili Cosic, Prometheus Developers
I do think we need a better way to detect when we are overloaded, especially with respect to memory usage, and have defined behavior for how we handle backpressure in those cases. The current experience of entering OOM loops is frustrating, and makes it hard to debug as you can't query anything to see what caused the extra load. HA is also not helpful in this case as both instances will have similar data and OOM at the same time.
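As a stopgap today, one can at least alert on Prometheus's own resource usage before the OOM hits. A rough sketch (process_resident_memory_bytes is a standard self-monitoring metric; the job label and the 8GiB budget are assumptions to tune per instance):

groups:
  - name: prometheus-self-protection
    rules:
      - alert: PrometheusApproachingMemoryBudget
        expr: process_resident_memory_bytes{job="prometheus"} > 0.8 * 8 * 1024 * 1024 * 1024
        for: 15m
        labels:
          severity: critical
        annotations:
          summary: Prometheus memory usage is above 80% of its tested budget.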

Perhaps after general overloading/backpressure is defined, higher level ideas such as priority can be introduced, but I also agree that it might be best to just run multiple instances.

Chris
