Issues with Prometheus rules once Thanos is enabled


Rodrigo Martinez

Aug 24, 2020, 4:04:57 PM
to Prometheus Users

I have noticed that, after enabling Thanos, I now see the following in the Prometheus logs:

Error executing query: found duplicate series for the match group {namespace="monitoring", pod="kube-state-metrics-567789848b-9d77w"} on the right hand-side of the operation: [{__name__="node_namespace_pod:kube_pod_info:", namespace="monitoring", node="test01", pod="kube-state-metrics-567789848b-9d77w"}, {__name__="node_namespace_pod:kube_pod_info:", namespace="monitoring", node="test01", pod="kube-state-metrics-567789848b-9d77w"}];many-to-many matching not allowed: matching labels must be unique on one side

Primarily this is coming from the kube-state-metrics job and from other PrometheusRules whose metrics have honor_labels set to true.
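
For reference, the rules hitting this join roughly like this (a sketch of the shape only, not the exact rule text):

# Illustrative sketch: a join on (namespace, pod) against the
# node_namespace_pod:kube_pod_info: recording rule; "some_pod_metric" is a
# placeholder. If the right-hand side contains the same (namespace, pod)
# combination twice, Prometheus raises the "many-to-many matching not
# allowed" error shown above.
some_pod_metric
  * on (namespace, pod) group_left (node)
    node_namespace_pod:kube_pod_info: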



Just wondering, overall, how enabling Thanos causes these errors.

Bartłomiej Płotka

Aug 25, 2020, 7:12:33 AM
to Rodrigo Martinez, Prometheus Users
Hey, 

Interesting. Can you double-check your rules: what are they asking for? What queries are they making? It sounds like a typo in your rule configuration, with duplicated matchers in the PromQL. I'm not sure how this is related to Thanos if you see this in the Prometheus PromQL log.

Kind Regards,
Bartek Płotka (@bwplotka)



Julien Pivotto

Aug 25, 2020, 7:42:23 AM
to Bartłomiej Płotka, Rodrigo Martinez, Prometheus Users
On 25 Aug 12:12, Bartłomiej Płotka wrote:
> Hey,
>
> Interesting. Can you double-check your rules: what are they asking for?
> What queries are they making? It sounds like a typo in your rule
> configuration, with duplicated matchers in the PromQL. I'm not sure how
> this is related to Thanos if you see this in the Prometheus PromQL log.


If Thanos is configured as a remote read endpoint for Prometheus, you can get errors like this,
because the query then sees the metrics both with and without the external labels.
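
On the Prometheus side that kind of setup looks roughly like this (a sketch; the URL is a placeholder, not something taken from this thread):

# prometheus.yml fragment, illustrative only. With a remote_read endpoint
# configured, a selector can return the same series twice: once from the
# local TSDB and once from the remote store, where the external labels
# have been attached.
remote_read:
  - url: http://thanos-store.example.com/api/v1/read   # placeholder endpoint
    read_recent: true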


--
Julien Pivotto
@roidelapluie

Julien Pivotto

Aug 25, 2020, 7:43:26 AM
to Bartłomiej Płotka, Rodrigo Martinez, Prometheus Users

Can we know which Prometheus version you are running?
--
Julien Pivotto
@roidelapluie

Bartłomiej Płotka

Aug 25, 2020, 7:46:19 AM
to Bartłomiej Płotka, Rodrigo Martinez, Prometheus Users
It's very rare for someone to have Prometheus -> remote read -> Thanos; it's usually the Thanos sidecar connecting to the remote-read endpoint that Prometheus exposes.

Rodrigo, can you confirm what situation you have? 

Kind Regards,
Bartek Płotka (@bwplotka)

Rodrigo Martinez

Aug 25, 2020, 10:35:18 AM
to Bartłomiej Płotka, Prometheus Users
Running Prometheus prometheus:2.18.2
and the Thanos sidecar thanos:0.10.0.

I did see that, after adding the Thanos sidecar, some metrics have the labels exported_namespace/exported_pod,
so I'm guessing those are causing the duplicates.
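
As an example of what I mean (illustrative label values, not copied from my cluster): the same kube-state-metrics sample can end up stored in two shapes depending on how the label conflict is handled.

# With honor_labels: false (the default), the target labels win and the
# conflicting scraped labels are renamed with an exported_ prefix:
kube_pod_info{namespace="monitoring", pod="kube-state-metrics-567789848b-9d77w", exported_namespace="default", exported_pod="example-app-pod"}
# With honor_labels: true, the scraped labels win and no exported_* copies are kept:
kube_pod_info{namespace="default", pod="example-app-pod"}
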

Currently I just have the Thanos sidecar running, enabled via the Prometheus Operator.
How do I verify the setup?
I figured it might be a configuration issue on my end, as I have not seen anyone else with this problem.


This is also the scrape configuration for kube-state-metrics:
- job_name: monitoring/kube-state-metrics/0
  honor_labels: true
  honor_timestamps: true
  scrape_interval: 15s
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: http
  kubernetes_sd_configs:
  - role: endpoints
    namespaces:
      names:
      - monitoring
  relabel_configs:
  - source_labels: [__meta_kubernetes_service_label_app_kubernetes_io_name]
    separator: ;
    regex: kube-state-metrics
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_endpoint_port_name]
    separator: ;
    regex: http
    replacement: $1
    action: keep
  - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
    separator: ;
    regex: Node;(.*)
    target_label: node
    replacement: ${1}
    action: replace
  - source_labels: [__meta_kubernetes_endpoint_address_target_kind, __meta_kubernetes_endpoint_address_target_name]
    separator: ;
    regex: Pod;(.*)
    target_label: pod
    replacement: ${1}
    action: replace
  - source_labels: [__meta_kubernetes_namespace]
    separator: ;
    regex: (.*)
    target_label: namespace
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_service_name]
    separator: ;
    regex: (.*)
    target_label: service
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_pod_name]
    separator: ;
    regex: (.*)
    target_label: pod
    replacement: $1
    action: replace
  - source_labels: [__meta_kubernetes_service_name]
    separator: ;
    regex: (.*)
    target_label: job
    replacement: ${1}
    action: replace
  - source_labels: [__meta_kubernetes_service_label_app_kubernetes_io_name]
    separator: ;
    regex: (.+)
    target_label: job
    replacement: ${1}
    action: replace
  - separator: ;
    regex: (.*)
    target_label: endpoint
    replacement: http
    action: replace


Brian Candler

Aug 25, 2020, 10:49:36 AM
to Prometheus Users
  - source_labels: [__meta_kubernetes_service_name]
    separator: ;
    regex: (.*)
    target_label: job
    replacement: ${1}
    action: replace
  - source_labels: [__meta_kubernetes_service_label_app_kubernetes_io_name]
    separator: ;
    regex: (.+)
    target_label: job
    replacement: ${1}
    action: replace

I'm not sure what you're attempting to do here, but it is risky to mess with the "job" label. This is the label Prometheus itself sets to identify the scrape job the metric originated from, and if you end up scraping the same target multiple times from different jobs, this label is what keeps the time series' label sets unique.
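
As an illustration (made-up series, not taken from your config): if two scrape jobs hit the same kube-state-metrics endpoint, it is only the job label that keeps the resulting series apart.

up{job="monitoring/kube-state-metrics/0", instance="10.0.0.5:8080"}
up{job="ksm-secondary", instance="10.0.0.5:8080"}
# Rewriting "job" in relabel_configs so that both end up as job="kube-state-metrics"
# would collapse these into duplicate series.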

Rodrigo Martinez

Aug 26, 2020, 10:24:38 PM
to Prometheus Users
I have been updating old Prometheus rules and have noticed fewer errors.
However, some errors remain in rules taken from other components.

Currently I have Ceph running, using the PrometheusRule from their examples,
and I see issues with:

kube_node_status_condition{condition="Ready",job="kube-state-metrics",status="true"} * on (node) group_right() max(label_replace(ceph_disk_occupation{job="rook-ceph-mgr"},"node","$1","exported_instance","(.*)")) by (node)

On a cluster with no Thanos integration there are no issues, but with Thanos I see the collision.

When looking at the left-hand side of the expression, I do see that it returns
the metric twice:

kube_node_status_condition{condition="Ready",instance="0.0.0.0:8080",job="kube-state-metrics",node="test-node",status="true"}
kube_node_status_condition{condition="Ready",endpoint="http",instance="0.0.0.0:8080",job="kube-state-metrics",namespace="monitoring",node="test-node",pod="kube-state-metrics-567789848b-9d77w",service="kube-state-metrics",status="true"}

I have to see what I'm doing wrong on my end, but if anything stands out from what is shown above, do let me know, as this issue was not seen until I enabled the Thanos sidecar.
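
In the meantime, one possible workaround for this particular rule (a sketch only; whether aggregating like this is acceptable depends on what the alert is meant to express) is to collapse the duplicates before the join:

# Aggregate the left-hand side by node so it is unique per match group:
max by (node) (kube_node_status_condition{condition="Ready",job="kube-state-metrics",status="true"})
  * on (node) group_right()
    max by (node) (label_replace(ceph_disk_occupation{job="rook-ceph-mgr"}, "node", "$1", "exported_instance", "(.*)"))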

Thanks

Rodrigo Martinez

Aug 26, 2020, 10:35:04 PM
to Prometheus Users
I also just added:
    relabelings:
    - action: labeldrop
      regex: (pod|service|endpoint|namespace)

since it looks like the up-to-date prometheusrule uses this for kube-state-metrics.

Rodrigo Martinez

Aug 26, 2020, 10:35:17 PM
to Prometheus Users
Sorry, I meant ServiceMonitor.
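
For reference, this is roughly where that relabeling lands in a prometheus-operator ServiceMonitor (the metadata, selector, and port are illustrative, not copied from my manifest):

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: kube-state-metrics
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: kube-state-metrics
  endpoints:
  - port: http
    honorLabels: true
    relabelings:
    - action: labeldrop
      regex: (pod|service|endpoint|namespace)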