Federation Job "Error on ingesting out-of-order samples"


Brian Beynon

Feb 15, 2021, 3:45:39 PM
to Prometheus Users
Hello,
I recently updated our Prometheus setup from the Helm chart to the prometheus-operator.

Summary of current setup:
Platform: Google Cloud
1. Main Google project used for monitoring:
   Prometheus Operator (Prometheus, Alertmanager, Grafana, node-exporter, kube-state-metrics, etc.)

2. Multiple other Google projects that now also run the Prometheus Operator (node-exporter, kube-state-metrics, etc.), but without Alertmanager/Grafana.

So the main Google project (#1 above) has federation scrape jobs that connect to each of the other Google projects' Prometheus servers (#2 above).

Since updating to the prometheus-operator I'm now seeing these errors in the main Prometheus's logs:
msg="Error on ingesting samples with different value but same timestamp"
and msg="Error on ingesting out-of-order samples".

Below is an example of one of the federate jobs where the errors are coming from.
When I have both the "vms" job and the "node-exporter" job enabled, the errors occur. If I disable either of those jobs, I no longer see the errors.

- job_name: 'test-abc-123'
  scrape_interval: 60s
  scrape_timeout: 30s
  honor_labels: true
  metrics_path: '/federate'
  scheme: 'https'
  basic_auth:
    username: '###################'
    password: '###################'
  params:
    'match[]':
      - '{job="vms"} '
      - '{job="node-exporter"} '
      - '{job="postgres"} '
      - '{job="barman"} '
      - '{job="apiserver"} '
      - '{job="kube-state-metrics"} '
  static_configs:
    - targets:
      - 'test-abc-123.com'
      labels:
        project: 'test-abc-123'

Here is the node-exporter ServiceMonitor from project test-abc-123:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    app.kubernetes.io/name: node-exporter
    app.kubernetes.io/part-of: kube-prometheus
  name: node-exporter
  namespace: monitoring
spec:
  endpoints:
  - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    interval: 15s
    port: https
    relabelings:
    - action: replace
      regex: (.*)
      replacement: $1
      sourceLabels:
      - __meta_kubernetes_pod_node_name
      targetLabel: instance
    scheme: https
    tlsConfig:
      insecureSkipVerify: true
  selector:
    matchLabels:
      app.kubernetes.io/component: exporter
      app.kubernetes.io/name: node-exporter
      app.kubernetes.io/part-of: kube-prometheus

Here is the "vms" job from project test-abc-123:
    
      - job_name: 'vms'
        static_configs:
          - targets: ['db-prod-1:9100','db-prod-2:9100','util-1:9100']
            labels:
              project: 'client-vms'

I have tried updating labels, but maybe not in the right way. Any suggestions or pointers would be appreciated.
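For reference, the label updates I have been experimenting with are along these lines: an external label on the Prometheus custom resource in each federated project, so its series can be told apart downstream. The resource name and label value below are just placeholders, not my actual config:

apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: k8s
  namespace: monitoring
spec:
  # Placeholder sketch: attach a per-project label to every series this
  # Prometheus exposes, so federated series carry a distinguishing label.
  externalLabels:
    project: test-abc-123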

Thank you

Brian Beynon

Feb 15, 2021, 4:36:12 PM
to Prometheus Users
I found out what the issue was. I had the same set of rules defined in the projects I was scraping, but I only need the rules in my main Prometheus.
I removed the rules from the projects I was federating from and the errors have stopped.
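For anyone who hits the same thing: the duplication presumably comes from recording rules, since a rule evaluated in a federated project produces series that get pulled in via /federate, and the identical rule evaluated again in the main Prometheus writes those same series a second time, which is what shows up as the "out-of-order" and "same timestamp, different value" errors. A minimal sketch of the kind of PrometheusRule that should exist only in the main monitoring project (the rule name, expression, and metadata below are made up for illustration, not copied from my setup):

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  # The labels on this object need to match the ruleSelector of the
  # Prometheus that should evaluate it (typically only the main one).
  name: example-node-rules
  namespace: monitoring
spec:
  groups:
  - name: node.rules
    rules:
    # If this same rule also exists in the federated projects, the main
    # Prometheus ingests instance:node_cpu:rate5m via /federate and then
    # evaluates it locally as well, so one series gets two writers.
    - record: instance:node_cpu:rate5m
      expr: sum by (instance) (rate(node_cpu_seconds_total{mode!="idle"}[5m]))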