Hi there,
I am experiencing the following situation with a number of my Prometheus targets:
1. A number of targets are "DOWN" with the error message "context deadline exceeded".
2. My scrape timeout is set to 10s.
3. After a redeploy, the scrape duration starts out below 10s, but over the course of a couple of days it steadily approaches the timeout.
4. scrape_samples_scraped stays under 25k per scrape, but more importantly it doesn't seem to be correlated with the approach to the timeout. At times when scrape_samples_scraped is around 25k but memory usage is low, the scrape duration remains low.
5. According to my dashboard, the metric that seems most strongly correlated with the timeouts is memory usage: after a redeploy, memory usage increases steadily and then fluctuates in a range close to the maximum.
My question is about the effect of low memory availability on a Prometheus scrape. If a request is sent to the /prometheus_metrics endpoint while only a limited amount of memory is available, could the generate_latest method (Python client) hang and cause a timeout?
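One way to check this directly is to time a single call to generate_latest and record how much memory it allocates while rendering. A minimal sketch using only the standard library; `profile_call` is a hypothetical helper, and `tracemalloc` only sees allocations made through Python's allocator, so its peak is not the same as the process's total memory usage.

```python
import time
import tracemalloc
from typing import Any, Callable, Tuple


def profile_call(fn: Callable[..., Any], *args: Any) -> Tuple[Any, float, int]:
    """Call fn(*args) once and return (result, elapsed_seconds, peak_bytes).

    Hypothetical helper. peak_bytes is the peak Python heap allocation traced
    while fn ran. Tracing adds overhead, so elapsed_seconds will be somewhat
    higher than for an untraced call. Assumes tracemalloc is not already in
    use elsewhere in the process, since it is stopped afterwards.
    """
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = fn(*args)
    finally:
        elapsed = time.perf_counter() - start
        # get_traced_memory() returns (current, peak) since start().
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
    return result, elapsed, peak
```

With the Python client installed, this could be called as `profile_call(generate_latest, REGISTRY)` after `from prometheus_client import REGISTRY, generate_latest`, once shortly after a redeploy and again when memory usage is near its maximum, to see whether the render time itself grows.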
Please let me know if there is anything else I can provide, and cheers in advance!