gary.w...@comparethemarket.com
Aug 12, 2016, 11:10:52 AM
to Prometheus Developers
Hi
I'm relatively new to Prometheus and I'm not a code expert, but from a user perspective we've recently seen some drops in our trend data, as though the counters (counts) weren't successfully extracted from the on-prem and AWS instances we're monitoring.
However, we have a log-exporter service running on our app boxes, which should hold the counters until they are successfully scraped (retrieved) by Prometheus. Is this something you are familiar with?
Our process flow for metrics is:
Log files =>
log-exporter (which collects the logs and presents the counts in a Prometheus-compatible format for collection) =>
Prometheus servers in AWS.
At seemingly random times we lose counts: the expression sum(increase(metric{}[15m])) by (instance) drops to zero for a short time.
It looks almost as if Prometheus failed to scrape the counters from the log-exporter service. However, if that were the case, I would expect those counts to end up in Prometheus at a later scrape, whereas they appear to have gone completely.
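To illustrate why I would expect a missed scrape not to lose any counts, here is a minimal Python sketch. It assumes the log-exporter exposes a cumulative counter. The function `increase` is a hypothetical simplification of what PromQL's increase() does (it ignores the extrapolation to the window boundaries that Prometheus applies), and the sample values are made up for the example.

```python
from typing import Optional, Sequence


def increase(samples: Sequence[Optional[float]]) -> float:
    """Simplified increase over a window of counter samples.

    None stands for a missed scrape (no sample stored).
    A drop in value is treated as a counter reset, so the new
    value is counted from zero. Unlike Prometheus, this sketch
    does not extrapolate to the edges of the window.
    """
    total = 0.0
    prev = None
    for value in samples:
        if value is None:
            # Missed scrape: nothing stored, keep the previous sample.
            continue
        if prev is not None:
            if value >= prev:
                total += value - prev
            else:
                # Counter went backwards: assume it restarted from zero.
                total += value
        prev = value
    return total


# Hypothetical counter values seen at five consecutive scrapes.
all_scrapes = [100, 110, 120, 130, 140]
# Same counter, but the third scrape failed.
missed_scrape = [100, 110, None, 130, 140]

print(increase(all_scrapes))
print(increase(missed_scrape))
```

Under these assumptions both windows give the same increase, because the next successful scrape picks up the cumulative value. That is why a failed scrape on its own does not seem to explain counts disappearing completely.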
I'd like to attach an image (chart), but it doesn't look like it's possible to attach one here.
Any thoughts on common reasons why this happens would be greatly appreciated.
Regards
Gary