what is the lowest scrapping and alert interval values ?


Sébastien Dionne

Jun 22, 2020, 3:08:08 PM
to Prometheus Users
I want to use Prometheus + Alertmanager as a health manager. I want to know the lowest value I can use for scraping metrics (I hope I can have a config for particular rules) and send an alert as soon as one fires. I need almost real time. Is that possible with Prometheus + Alertmanager?


I have a sample config that works now, but is it possible to use 1s, or something so that Prometheus sends the alert as soon as the metric is read?

serverFiles:
  alerts:
    groups:
      - name: Instances
        rules:
          - alert: InstanceDown
            expr: up == 0
            for: 10s
            labels:
              severity: page
            annotations:
              description: '{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 10 seconds.'
              summary: 'Instance {{ $labels.instance }} down'
              
alertmanagerFiles:
  alertmanager.yml:
    route:
      receiver: default-receiver
      group_wait: 5s
      group_interval: 10s

    receivers:
      - name: default-receiver
        webhook_configs:
              

Stuart Clark

Jun 22, 2020, 3:42:14 PM
to Sébastien Dionne, Prometheus Users
While it is definitely possible to have very low scrape intervals and very sensitive alerts, that often results in poor outcomes.

The reality is that reaction times to alerts are generally fairly long - an alert outside of office hours could easily take 30 minutes or longer to respond to. I'd suggest being very careful about such short "for" intervals. You can very easily end up with a lot of false positives, with alerts which fire then resolve, fire then resolve.

But technically you can have scrape intervals of a second or less, and "for"s of a few seconds.
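For reference, a sketch of what that looks like in a prometheus.yml (the job name and target are made up; `scrape_interval` and `evaluation_interval` are the standard settings, and a rule group's `interval` field can override the evaluation interval per group, which covers the "config for particular rules" part of the question):

```yaml
global:
  scrape_interval: 1s        # how often targets are scraped
  evaluation_interval: 1s    # how often rule groups are evaluated by default

scrape_configs:
  - job_name: health-targets          # hypothetical job
    static_configs:
      - targets: ['localhost:8080']   # hypothetical target

# In a rule file, a group can set its own evaluation interval:
# groups:
#   - name: Instances
#     interval: 1s
#     rules:
#       - alert: InstanceDown
#         expr: up == 0
#         for: 10s
```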
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.

Sébastien Dionne

Jun 22, 2020, 3:46:28 PM
to Prometheus Users
thanks

in my case, the alerts will be sent to our healthManager, which updates the states of our application in the database. No human interaction.


I thought of using a script with a liveness probe; the script could send a POST to our healthManager. But in the end it's the same thing, because the liveness probe will run about every 5 seconds. So I prefer to use the metrics that Prometheus will scrape anyway.





Stuart Clark

Jun 22, 2020, 3:52:35 PM
to Sébastien Dionne, Prometheus Users
Just be aware that you can end up with very noisy data. Something which looks like a failure could easily be due to transient issues - failed scrapes, etc.

Julien Pivotto

Jun 22, 2020, 4:06:31 PM
to Prometheus Users
It would be better for your health manager to query Prometheus directly when it needs the state, instead of relying on pushed alerts.
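A minimal sketch of that approach, assuming Prometheus is reachable at http://localhost:9090 (the URL and helper names are illustrative; the `/api/v1/query` endpoint and its JSON envelope are the standard Prometheus HTTP API):

```python
import json
import urllib.parse
import urllib.request

PROM_URL = "http://localhost:9090"  # assumed Prometheus address

def query_instant(expr, base_url=PROM_URL):
    """Run an instant query via the Prometheus HTTP API; return the parsed JSON payload."""
    url = base_url + "/api/v1/query?" + urllib.parse.urlencode({"query": expr})
    with urllib.request.urlopen(url, timeout=2) as resp:
        return json.load(resp)

def down_instances(payload):
    """Given the payload of a query like `up == 0`, return the 'instance' labels."""
    if payload.get("status") != "success":
        raise RuntimeError("query failed: %r" % payload)
    return [s["metric"].get("instance", "?") for s in payload["data"]["result"]]

# Against a live server the health manager would call:
#   down_instances(query_instant("up == 0"))
```

Polling like this on the health manager's own schedule avoids the scrape-interval / `group_wait` pipeline entirely for state lookups.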

Sébastien Dionne

Jun 23, 2020, 1:26:28 PM
to Prometheus Users
That's a good idea.