to Prometheus Users
Hello.
First of all, English is not my native language, so excuse me if I cannot explain myself well enough.
I am facing the following situation:
- Several Prometheus servers deployed in clusters dedicated to different environments (dev, itg, pre, pro), federated into a central one (utils). All of them run in an HA configuration.
- Each Prometheus server has external labels configured (dev, itg, pre, pro, utils), for example:
  externalLabels:
    cluster: dev-gke-cluster
    environment: dev
- Only one Alertmanager, deployed alongside the central Prometheus, in an HA configuration.
- honor_labels is enabled for the federation targets.
Prometheus was deployed with the prometheus-operator Helm chart.
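For context, a minimal sketch of what the central (utils) federation scrape job presumably looks like, assuming the standard /federate setup; the job name, target address, and match pattern are illustrative assumptions, not taken from the post:

```yaml
scrape_configs:
  - job_name: 'federate-dev'        # illustrative name
    honor_labels: true              # keep the child's labels, including its
                                    # external labels (cluster, environment)
    metrics_path: '/federate'
    params:
      'match[]':
        - '{job=~".+"}'             # placeholder selector
    static_configs:
      - targets:
          - 'prometheus-dev.example.internal:9090'  # placeholder address
```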
The problem is that, with some of the prometheus-operator default alerting rules, I cannot tell which cluster an alert comes from, because the external labels get overwritten.
For example, with the KubePodNotReady alert:
In Prometheus alerts tab:
Annotations
message
Pod demo-apps-devops-back/fwk-springboot-service-example-969897fd4-6c6gd has been in a non-ready state for longer than 15 minutes.
This alert refers to a pod and namespace that exist not in the "utils" environment but in the "dev" one, even though we defined the environment external label. All of the labels on the alert belong to the "utils" Prometheus, where all the metrics are gathered and from which the alerts are generated.
We have found that this happens whenever the alert rule expression contains any kind of aggregation, as in this example:
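The expression itself was not quoted above; as an illustration only (not the exact KubePodNotReady rule), an aggregation of this shape keeps only the labels listed in by (...), so the cluster and environment labels of the original series are dropped during evaluation, before the evaluating Prometheus attaches its own external labels to the alert:

```promql
# Illustrative sketch, not the actual rule from the post: everything
# except "namespace" and "pod" is discarded by the aggregation.
sum by (namespace, pod) (
  kube_pod_status_phase{phase=~"Pending|Unknown"}
) > 0
```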
This is a problem because namespaces with the same name exist in different clusters; in other cases there is no way to be sure of a pod's location except by searching for it manually.
Is there any way to keep the original external labels in the final alert, or are they lost during expression evaluation and then replaced by the labels of the Prometheus that evaluates and sends the alert?
Thanks in advance for your assistance.
Sohaib Omar
Jun 12, 2020, 11:47:19 AM
to dgarciad, Prometheus Users
or are they lost in the expression evaluation, and then replaced by the labels from the Prometheus that evaluates and sends the alert?
I guess this is probably what's happening.
Is there any way to keep the original external labels in the final alert
I guess your best bet is either to change the external label names of the central (utils) Prometheus, or to rewrite the labels of the child clusters at scrape time using relabel_configs (or metric_relabel_configs, for labels that arrive on the scraped series).
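A minimal sketch of the second option, as a fragment added to the central Prometheus's federation scrape job. The field names (metric_relabel_configs, source_labels, target_label) are real Prometheus configuration; the chosen label names are illustrative. Note that for the copied label to survive an aggregating alert expression, the rule's by (...) clause would still need to include it:

```yaml
metric_relabel_configs:
  # Copy the child's environment label to a name that the central
  # Prometheus's own external labels will not collide with.
  - source_labels: [environment]
    target_label: source_environment
```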