Need Help: Monitoring MongoDB replicaset replication lag with Prometheus MongoDB exporter


msta...@objectrocket.com

Sep 4, 2018, 7:26:30 AM
to Prometheus Users
Hello
 
I'm still fairly new to Prometheus, especially the query language. I tried to find answers online first but have had little luck so far, so I decided to ask here.

I've been looking at the metrics that the MongoDB exporter makes available for Prometheus to scrape, to see how I can graph (and maybe alert on, when things go bad) the replication lag within a single MongoDB replica set (a 3-node set).

I experimented with various queries using 'max_oplog_lag_by_set' and it all seemed to work, until I noticed something weird in the graphs (and later in the Prometheus query prompt, when I started digging around). As people tend to say, the devil is in the details :)

A few of the MongoDB replica sets monitored that way showed really large values for the lag (up to 8 million seconds).
So I looked at the actual SECONDARY members of those sets and noticed that none of them had such a high lag at the time; in fact, a few had a lag of 0s.
That made me think: maybe the query isn't right at all, or maybe I should use a different metric?
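
As a point of comparison, per-secondary lag can be sketched as the primary's last optime minus each secondary's last optime. This is an illustration only, not how the exporter computes its metrics; the `Member` type, its fields, the node names, and the sample timestamps are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Member:
    name: str      # hypothetical node name
    state: str     # replica set state, e.g. "PRIMARY" or "SECONDARY"
    optime_s: int  # time of last applied operation, Unix seconds (assumed unit)


def secondary_lag_seconds(members):
    """Return {secondary name: seconds behind the primary's optime}."""
    primary = next(m for m in members if m.state == "PRIMARY")
    return {
        m.name: max(0, primary.optime_s - m.optime_s)
        for m in members
        if m.state == "SECONDARY"
    }


# Made-up sample data for a 3-node set.
members = [
    Member("node-a", "PRIMARY", 1536045990),
    Member("node-b", "SECONDARY", 1536045990),
    Member("node-c", "SECONDARY", 1536045987),
]

lags = secondary_lag_seconds(members)
worst_lag = max(lags.values())
print(lags, worst_lag)
```

With values like these, the worst lag in the set is a few seconds, which is the order of magnitude seen on the SECONDARY members themselves, not millions of seconds.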

Any idea where such a value could be coming from? It's really confusing, and it makes setting up alerting practically impossible for this case, as I would get alerted for lag that isn't there :)

Has anybody else here tried to monitor replication lag in MongoDB replica sets using the MongoDB exporter for Prometheus?
What metric / query do you use? Would you mind sharing?

Kind regards
Marcin