On Wed, Mar 8, 2017 at 12:11 PM, Brian Brazil <brian.brazil@robustperception.io> wrote:

> On 8 March 2017 at 16:21, Jack Neely <jjn...@42lines.net> wrote:
>
>> Greetings,
>>
>> I have a question regarding the duplicate metrics that can be produced by redundant Prometheus instances and federation.
>>
>> Some background about how we've deployed Prometheus: each team has its own Prometheus server/instance. That team may add recording rules for important aggregations, and we encourage building Grafana dashboards from recording rules. These "local" Prometheus instances are mostly Aurora jobs and are therefore ephemeral by nature.
>>
>> At a "global" level, on real hardware, we discover all Prometheus instances and federate the metrics matching our recording-rule format. Grafana queries are (in most cases) made against the global Prometheus servers, so dashboards continue to show historical data even if a local Aurora Prometheus job has moved, restarted, or been reconfigured (and lost its storage).
>>
>> When a second Prometheus instance for a team is brought online for redundancy (or even testing), it populates the global Prometheus servers with another copy of the same recording-rule metrics. At that point Grafana dashboards become confusing because they receive double metrics.
>>
>> I need to figure out a way to address folks "seeing double" without passing out strong glasses. I can filter specific Prometheus server instances out of the queries, but that doesn't seem like a scalable solution. Either copy of the time series would probably work on its own, just not both.
>>
>> Do folks have any techniques for managing this?
>
> The usual approach would be to filter out at the graphing stage, likely using Grafana templating. The redundant Prometheus servers should have slightly different external labels (e.g. dc-1 and dc-2), which will be present in the federated data, so you can filter using those.

Presently that label is "instance".
If the recording-rule metric doesn't have an instance label, Prometheus adds one identifying the Prometheus server the metric was federated from. (And of course there is more confusion when instance labels are present.) So this configuration needs to be fixed on my end as it stands.
Are there any suggested label names / semantics for this?
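For concreteness, my reading of the external-label suggestion is roughly the following. This is only a sketch of what I think was meant; the label name "replica" and its values are placeholders, not anything we've deployed:

# prometheus.yml on the first of a team's redundant Prometheus servers;
# the second server would carry replica: "b". External labels ride along
# on the /federate output, so the global server sees them.
global:
  external_labels:
    replica: "a"

A Grafana template variable populated from label_values(replica) could then let a dashboard pin its queries to a single copy by adding {replica="$replica"} to the selectors.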
Did some testing with this. I do have honor_labels set to true. The issue I have is that Prometheus always attempts to attach an 'instance' label. So if one exists, it persists (that applies to the very few recording rules I have that are aggregated by instance). Most rules get an instance label slapped on equal to the Prometheus instance the metric was federated from.
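One idea I'm considering, purely as an untested sketch (it assumes the "replica" external label from the earlier sketch, and the job name, targets, and match[] pattern are made up): drop the attached instance label at federation time on the global server, so the two copies differ only in the replica label. That obviously only makes sense for rules that don't carry a meaningful instance label of their own.

# prometheus.yml on the global Prometheus server (sketch only)
scrape_configs:
  - job_name: 'federate-team'
    honor_labels: true
    metrics_path: '/federate'
    params:
      'match[]':
        - '{__name__=~"job:.*"}'   # placeholder for our recording-rule naming format
    static_configs:
      - targets:
          - 'team-prom-a.example.com:9090'
          - 'team-prom-b.example.com:9090'
    metric_relabel_configs:
      # Drop the instance label that gets attached when the federated series
      # has none, leaving the replica external label to tell the copies apart.
      - action: labeldrop
        regex: 'instance'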
Jack,

Have you considered using HAProxy to hide the Prometheus redundancy from Grafana? I'm currently looking into this myself and would love to hear from other folks who have implemented it, since I'm still in the research phase.
On Thu, May 11, 2017 at 3:32 PM, <afdra...@gmail.com> wrote:

> Jack,
>
> Have you considered using HAProxy to hide the Prometheus redundancy from Grafana? I'm currently looking into this myself and would love to hear from other folks who have implemented it, since I'm still in the research phase.

While you can do this, it might get a tad annoying to have graphs change slightly on every reload (even for the same time range) as you alternate between backends, because the two Prometheus servers are slightly phase-shifted in their data collection. Maybe with some kind of affinity, where it always uses one server unless that server becomes unavailable...
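If anyone does go the HAProxy route, that affinity idea might look roughly like this (a sketch only; hostnames, ports, and timeouts are invented): all queries go to one backend, and the second is used only when the first fails its check, so graphs stay consistent across reloads.

defaults
    mode http
    timeout connect 5s
    timeout client  60s
    timeout server  60s

frontend prometheus_in
    bind *:9090
    default_backend prometheus_out

backend prometheus_out
    # prom-a serves everything; 'backup' makes prom-b a hot spare that only
    # receives traffic while prom-a's health check is failing.
    server prom-a team-prom-a.example.com:9090 check
    server prom-b team-prom-b.example.com:9090 check backup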