On 22.04.21 20:20, Matthias Rampke wrote:
> Your best starting point is the rules page of the Prometheus UI
> (:9090/rules). It will show the error. You can also evaluate the rule
> expression yourself, using the UI, or maybe using PromLens to help debug
> expression issues.
>
> /MR
:9090/rules shows these 2 errors ("found duplicate series for the match
group"). I think we may have a problem with the federation config..
alert: PrometheusRemoteWriteBehind
expr: (max_over_time(prometheus_remote_storage_highest_timestamp_in_seconds[5m])
  - on(job, instance) group_right()
  max_over_time(prometheus_remote_storage_queue_highest_sent_timestamp_seconds[5m]))
  > 120
for: 15m
labels:
  severity: critical
annotations:
  description: Prometheus {{$labels.namespace}}/{{$labels.pod}} remote
    write is {{ printf "%.1f" $value }}s behind for {{
    $labels.remote_name }}:{{ $labels.url }}.
  summary: Prometheus remote write is behind.
found duplicate series for the match group
{instance="prometheus.slash-dir-poc-in.kuber.example.org:9090",
job="federate"} on the left hand-side of the operation:
[{cluster="poc", endpoint="web", exported_instance="x.x.x.x:9090",
exported_job="prometheus-k8s",
instance="prometheus.slash-dir-poc-in.kuber.example.org:9090",
job="federate", namespace="monitoring", pod="prometheus-k8s-1",
prometheus="monitoring/k8s", prometheus_replica="prometheus-k8s-0",
service="prometheus-k8s", team="MY-TEAM-NAME"},
{cluster="poc", endpoint="web", exported_instance="x.x.x.x:9090",
exported_job="prometheus-k8s",
instance="prometheus.slash-dir-poc-in.kuber.example.org:9090",
job="federate", namespace="monitoring", pod="prometheus-k8s-0",
prometheus="monitoring/k8s", prometheus_replica="prometheus-k8s-0",
service="prometheus-k8s", team="MY-TEAM-NAME"}];
many-to-many matching not allowed: matching labels must be unique on one side
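If I read the error right, the federate job returns this metric from both
prometheus-k8s pods under the same (job, instance) pair, so the left side of
the join is not unique. A query along these lines in the UI (just my guess
for debugging, adjust the metric as needed) should show which label sets
collide:

```
count by (job, instance) (
  max_over_time(prometheus_remote_storage_highest_timestamp_in_seconds[5m])
) > 1
```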
and
record: node:node_num_cpu:sum
expr: count by(cluster, node) (sum by(node, cpu)
  (node_cpu_seconds_total{job="node-exporter"} * on(namespace, pod)
  group_left(node) node_namespace_pod:kube_pod_info:))
found duplicate series for the match group {namespace="monitoring",
pod="prometheus-k8s-0"} on the right hand-side of the operation:
[{__name__="node_namespace_pod:kube_pod_info:", cluster="preprod",
instance="prometheus.ep-preprod-in.kuber.example.org:9090",
job="federate", namespace="monitoring",
node="4516e9ed-4917-4792-ad49-2158775dc07e", pod="prometheus-k8s-0",
prometheus="monitoring/k8s", prometheus_replica="prometheus-k8s-1",
team="MY-TEAM-NAME"},
{__name__="node_namespace_pod:kube_pod_info:", cluster="poc",
instance="prometheus.slash-dir-poc-in.kuber.example.org:9090",
job="federate", namespace="monitoring",
node="602efe91-2eb5-466f-9350-c4c6ce35119a", pod="prometheus-k8s-0",
prometheus="monitoring/k8s", prometheus_replica="prometheus-k8s-0",
team="MY-TEAM-NAME"}];
many-to-many matching not allowed: matching labels must be unique on one side
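Here the two right-hand series seem to differ only in cluster (preprod vs
poc), while the rule joins only on (namespace, pod) — it looks like this
recording rule was written to run inside a single cluster, not on a server
that federates several. A query like this on the federating server (again
just a debugging guess) should list the colliding match groups:

```
count by (namespace, pod) (node_namespace_pod:kube_pod_info:) > 1
```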
Also, this alert fires:
name: PrometheusOutOfOrderTimestamps
expr: rate(prometheus_target_scrapes_sample_out_of_order_total[5m]) > 0
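To narrow down which scrape job the out-of-order samples come from (my
assumption is that it's the federate job), I would try something like this
in the UI:

```
sum by (job, instance) (
  rate(prometheus_target_scrapes_sample_out_of_order_total[5m])
) > 0
```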
We may have a problem with federation: we have an external Prometheus
which federates from 4 Kubernetes cluster Prometheus instances.
config:

- job_name: federate
  scrape_interval: 15s
  scrape_timeout: 15s
  honor_labels: false
  metrics_path: /federate
  scheme: https
  tls_config:
    insecure_skip_verify: true
  params:
    'match[]':
      - '{__name__=~".+"}'
  file_sd_configs:
    - files:
        - k8s.yml
  relabel_configs:
    - source_labels:
        - __address__
      regex: (.*)
      replacement: ${1}:9090
      target_label: __address__

and k8s.yml:

- labels:
    cluster: poc
    team: MY-TEAM-NAME
  targets:
    - prometheus.slash-dir-poc-in.kuber.example.org
- labels:
    cluster: devtest
    team: MY-TEAM-NAME
  targets:
    - prometheus.slash-dir-devtest-in.kuber.example.org
- labels:
    cluster: preprod
    team: MY-TEAM-NAME
  targets:
    - prometheus.ep-preprod-in.kuber.example.org
- labels:
    cluster: prod
    team: MY-TEAM-NAME
  targets:
    - prometheus.ep-prod-in.kuber.example.org
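For what it's worth, federating everything with {__name__=~".+"} also pulls
in each cluster's recording-rule outputs and the per-replica self-monitoring
series, which seems to be exactly what the two rule errors above trip over.
I'm considering narrowing the match to aggregated series only — something
like this (untested sketch, the "job:" prefix is just an example of a
recording-rule naming convention, not our actual rule names):

```
params:
  'match[]':
    - '{__name__=~"job:.*"}'
```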
kind regards
Evelyn