How to monitor "consumer lag" via Kafka exporter?


Lijing....@nokia-sbell.com

Apr 18, 2018, 9:56:14 PM
to Prometheus Users
Hi friends,

I'm using the Prometheus Kafka exporter (https://github.com/danielqsj/kafka_exporter) and I'm puzzled about how to monitor consumer lag.
Do you have any idea how to do this? For example, which metric or expression should I use?

My dashboard is based on Grafana dashboard #721, with only 3 panels:
  • "Messages In Per Topic": sum without(instance)(rate(kafka_server_brokertopicmetrics_messagesin_total{job="kubernetes-service-endpoints"}[5m]))
  • "Bytes In Per Topic": sum without(instance)(rate(kafka_server_brokertopicmetrics_bytesin_total{job="kubernetes-service-endpoints"}[5m]))
  • "Bytes Out Per Topic": sum without(instance)(rate(kafka_server_brokertopicmetrics_bytesout_total{job="kubernetes-service-endpoints"}[5m]))
--------
I noticed there are plenty of Kafka metrics available, but I'm not sure which one I should use.
$ curl -s http://prometheus-prometheus-server.default.svc.cluster.local/api/v1/label/__name__/values |jq . | grep -c kafka_
157
But only 2 of them relate to Kafka consumers:
$ curl -s http://prometheus-prometheus-server.default.svc.cluster.local/api/v1/label/__name__/values |jq . | egrep 'kafka.*cons'
"kafka_server_delayedfetchmetrics_expirespersec_fetchertype_consumer",
"kafka_server_fetcherlagmetrics_consumerlag",
I checked these metrics, but the curves are flat, stuck at 0.
The consumers are far behind the producers, to the point that messages are being lost, so I expected to see the lag in "kafka_server_fetcherlagmetrics_consumerlag".

Thanks in advance.

Anton Huck

Apr 19, 2018, 2:20:30 AM
to Prometheus Users
We had plenty of problems monitoring consumer lag with older Kafka versions. Which version are you using?

The only reliable exporter I found was this one:
https://github.com/braedon/prometheus-kafka-consumer-group-exporter
I think it's quite slim and straightforward, but with two drawbacks to consider (or to fix yourself):
- For SSL connections you need to extend the source.
- The lag metric can be omitted, since you get the topic highwater and the consumer offset anyway and can derive the lag as highwater minus offset. Or create a recording rule for it.
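
To illustrate the second point, here is a minimal Python sketch of the arithmetic behind "lag from highwater and consumer offset". It is not the exporter's code: the function name `consumer_lag`, the dictionary layout, and the sample topic and group names are all made up for illustration, and the only assumption is that you have one highwater offset per partition and one committed offset per group and partition.

```python
from collections import defaultdict


def consumer_lag(highwater, committed):
    """Sum lag per (group, topic) as highwater minus committed offset.

    highwater: {(topic, partition): latest offset written to the partition}
    committed: {(group, topic, partition): last offset committed by the group}
    Both layouts are hypothetical, chosen only for this sketch.
    """
    lag = defaultdict(int)
    for (group, topic, partition), offset in committed.items():
        head = highwater.get((topic, partition))
        if head is None:
            # No highwater sample for this partition: skip it rather than guess.
            continue
        # Clamp at 0 in case the two samples were taken at slightly different times.
        lag[(group, topic)] += max(head - offset, 0)
    return dict(lag)


# Made-up sample data: one topic with two partitions, one consumer group.
highwater = {("orders", 0): 1500, ("orders", 1): 1200}
committed = {("billing", "orders", 0): 1400, ("billing", "orders", 1): 1200}
print(consumer_lag(highwater, committed))  # {('billing', 'orders'): 100}
```

A recording rule would express the same subtraction on the Prometheus side, matching the two series on topic and partition.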

Kind regards,
Anton

xuej...@growing.io

Aug 8, 2018, 10:55:15 PM
to Prometheus Users
Does this exporter support the old consumer?

On Thursday, April 19, 2018 at 2:20:30 PM UTC+8, Anton Huck wrote: