Context deadline exceeded with telegraf

chve...@gmail.com

Nov 30, 2018, 8:36:31 AM

to Prometheus Users

Hi there,

I'm trying to scrape telegraf once per second and I get 'Context deadline exceeded'. Yet if I scrape 10K (!) metrics every second from my POC application, it works perfectly.

What could be the reason for this? Has anyone run into this before?

We need to scrape this often during load tests to get the best picture of what is going on in the system.

best,

ilya

Ben Kochie

Nov 30, 2018, 9:10:25 AM

to chve...@gmail.com, Prometheus Users

The scrape timeout can never be more than the scrape interval, so telegraf has to respond within 1 second or the scrape will time out.

Try changing the interval to 10s, then check scrape_duration_seconds to see how long Prometheus thinks it's taking.
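
As a rough illustration of that diagnostic setup, here is a minimal sketch of a scrape job in prometheus.yml. The job name and target address are assumptions (9273 is, as far as I know, the default port of telegraf's prometheus_client output), so adjust them to the real setup.

```yaml
# prometheus.yml (sketch only; job name and target address are assumptions)
scrape_configs:
  - job_name: telegraf                # hypothetical job name
    scrape_interval: 10s              # relaxed from 1s while diagnosing
    scrape_timeout: 10s               # must not exceed scrape_interval
    static_configs:
      - targets: ['localhost:9273']   # assumed telegraf prometheus_client address
```

With a job like this, querying scrape_duration_seconds{job="telegraf"} in the expression browser should show how long each scrape took. Values far below one second would suggest that a 1s interval ought to be achievable.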

Ilya Shvetsov

Nov 30, 2018, 9:20:41 AM

to sup...@gmail.com, promethe...@googlegroups.com

Thanks for the tip and the quick response.

I've checked 'scrape_duration_seconds': it jumps between 0.001 and 0.01 seconds.

So, it should be able to get the data, right?

If I set the scrape interval to 1050 ms, it works fine. I've checked this with Prometheus versions 2.5 and 2.4.3 and the latest telegraf nightly build.

--

With best regards
Ilya 'Akhil' Shvetsov

Ilya Shvetsov

Nov 30, 2018, 9:24:24 AM

to sup...@gmail.com, promethe...@googlegroups.com

The interesting thing is that if I set the scrape interval to 1s, then 'scrape_duration_seconds' is always 1s. I assume that means we hit the timeout.

Also, I forgot to say that I'm running on Windows 10 Pro.

Ilya Shvetsov

Nov 30, 2018, 9:28:40 AM

to sup...@gmail.com, promethe...@googlegroups.com

Well, not exactly 1s: it is between 1 and 1.004s, which looks odd.

Ben Kochie

Nov 30, 2018, 9:43:14 AM

to chve...@gmail.com, Prometheus Users

What if you do this:

scrape_interval: 1s

scrape_timeout: 950ms

Ilya Shvetsov

Nov 30, 2018, 9:48:02 AM

to Ben Kochie, Prometheus Users

Now I see 0.95 - 0.953.

Ilya Shvetsov

Nov 30, 2018, 9:55:46 AM

to Ben Kochie, Prometheus Users

So, if I have

scrape_interval: 2s

scrape_timeout: 2s

then the scrape duration is just a few milliseconds (4-8).

If scrape_timeout is below 1050 ms, the scrape duration starts to be around scrape_timeout.

That's odd, because it works for my POC.

Where is the bug?!

Ilya Shvetsov

Nov 30, 2018, 10:46:15 AM

to Ben Kochie, Prometheus Users

I have tried to scrape prometheus manually using a browser. It seems to answer quickly enough.

Ilya Shvetsov

Dec 3, 2018, 5:19:45 AM

to Ben Kochie, Prometheus Users

@Ben Kochie

Do you have any ideas or recommendations?

Ben Kochie

Dec 3, 2018, 9:28:38 AM

to Ilya Shvetsov, Prometheus Users

No, sorry, I have no idea what could be going wrong with telegraf.
