We're still working on a good set of alert rules; I'll add those once we review and test what we currently have.

I've added zookeeper & chroot labels to identify clusters, since different clusters might have topics with the same name, but the first thing I did when configuring scraping was to remove those and replace them with a static label that identifies each cluster. There seems to be very little value in those labels in general, so I'll drop them soon.

I'll add some Prometheus config examples to the docs.
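Something along these lines is what I have in mind; the exporter hostname and port are placeholders, and the kafka_cluster label name is just an example, not something the exporter sets:

scrape_configs:
  - job_name: kafka_zookeeper
    metrics_path: /kafka
    params:
      zookeeper: ['zk.example.com:2181']   # placeholder ZooKeeper ensemble
      chroot: ['/kafka/cluster']
    static_configs:
      - targets: ['kafka-zookeeper-exporter.example.com:9999']   # placeholder exporter address
        labels:
          kafka_cluster: 'main'   # one static label per cluster instead of zookeeper/chroot labels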
Good point about kafka_zookeeper_up; it's only set after a successful scrape, so one would need to alert using absent(kafka_zookeeper_up), which is a bit counterintuitive, and that case doesn't really need a dedicated metric anyway.
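For anyone who does want to alert on failed scrapes, the usual approach is to use the up metric that Prometheus generates for every target rather than an exporter-side metric; a rough sketch (job name and threshold are just examples):

groups:
  - name: kafka_zookeeper.rules
    rules:
      - alert: KafkaZookeeperScrapeFailed
        # 'up' is set to 0 by Prometheus itself whenever a scrape fails,
        # so no dedicated exporter metric is required
        expr: up{job="kafka_zookeeper"} == 0
        for: 5m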
--
Łukasz Mierzwa

On Fri, Jul 21, 2017 at 12:18 AM, Brian Brazil <brian.brazil@robustperception.io> wrote:

On 21 July 2017 at 02:42, Łukasz Mierzwa <l.mi...@gmail.com> wrote:

Hi,

we (Cloudflare) are running a few Kafka clusters which use ZooKeeper for cluster coordination. We export broker-level metrics using jmx_exporter, but we were missing metrics from the cluster-level point of view. We've noticed that under some scenarios a broker might have a different view of the world than the rest of the cluster (network partitions), and when that happens it might not be in sync with the rest of the cluster but won't indicate any replication lag (we're still looking into this). To get metrics from the authoritative cluster state, which is ZooKeeper, we've created https://github.com/cloudflare/kafka_zookeeper_exporter, which aims to give us better visibility and more detailed alerting.

One of the issues was that with only broker-level metrics, under-replicated alerts are triggered on the leader node, not on the replica that is out of sync, so the first step is always to find the affected node. Cluster-level metrics will give us a clear list of nodes that are out of sync, which gives us alerts triggered for the affected node rather than the leader.

Hopefully this will also be useful to others; feedback is very much welcome.

Thanks for sharing, this'd be worth putting up on the official list.

This is a blackbox/snmp style exporter, so I'd suggest including an example of the Prometheus configuration for it, as that's not exactly obvious to beginners.

I'd also remove the zookeeper and chroot labels, as that's something that should be handled on the Prometheus relabelling side. For example, if there's only one chroot the user won't want that label, and they'll already have the instance label to cover what the zookeeper label is doing.

kafka_zookeeper_up isn't always being set; given you're failing the whole scrape if you can't talk to zookeeper (which is fine), I'd suggest removing this metric to avoid confusion and users trying to alert on it being 0.

--
Brian Brazil
Speaking of the example config: Kafka doesn't know much about hostnames (kafka1), only about broker IDs, which are numeric (101). To match the broker-level metrics, which all have instance=kafka1, I wanted to rewrite the instance label with:

- job_name: kafka_zookeeper
  dns_sd_configs:
    - names:
        - kafka-exporter-dns-srv
  metrics_path: /kafka
  params:
    zookeeper: ['zk:2181']
    chroot: ['/kafka/cluster']
  relabel_configs:
    - action: replace
      source_labels: [replica]
      regex: (.+)
      target_label: instance
      replacement: kafka${1}

But I can't get it to work. The docs state:

> After relabeling, the instance label is set to the value of __address__ by default if it was not set during relabeling.

So I would expect that to work, but it seems that this relabeling is global to all metrics from a single scrape; the UI shows it on the scrape job label list with instance=kafka.

Am I assuming correctly that this won't provide a different relabeled value for each scraped metric?
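If I read the docs correctly, relabel_configs only runs once per target before the scrape, when the replica label from the scraped series doesn't exist yet, so a per-metric rewrite would probably have to go into metric_relabel_configs instead. Something like this is what I'll try next (untested sketch):

    metric_relabel_configs:
      # runs on every scraped sample, so the replica label is available here
      - action: replace
        source_labels: [replica]
        regex: (.+)
        target_label: instance
        replacement: kafka${1}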
--
Łukasz Mierzwa