Federation Job "Error on ingesting out-of-order samples"


Brian Beynon

Feb 15, 2021, 3:45:39 PM
to Prometheus Users
Hello,
I recently updated our Prometheus setup from the Helm chart to the prometheus-operator.

Summary of current setup:
Platform: Google Cloud
1. Main Google project used for monitoring:
   Prometheus Operator (Prometheus, Alertmanager, Grafana, node-exporter, kube-state-metrics, etc.)

2. Multiple other Google projects that now also run the Prometheus Operator (node-exporter, kube-state-metrics, etc.), but without Alertmanager/Grafana.

So the main Google project (#1 above) has federation scrape jobs that connect to each of the other Google projects' Prometheus servers (#2 above).

Since updating to the prometheus-operator I'm now seeing these errors in the main Prometheus's logs:
msg="Error on ingesting samples with different value but same timestamp"
and msg="Error on ingesting out-of-order samples".

Below is an example of one of the federate jobs where the errors are coming from.
When I have both the "vms" job and the "node-exporter" job enabled, the errors occur. If I disable either of those jobs, I no longer see the errors.

- job_name: 'test-abc-123'
  scrape_interval: 60s
  scrape_timeout: 30s
  honor_labels: true
  metrics_path: '/federate'
  scheme: 'https'
  basic_auth:
    username: '###################'
    password: '###################'
  params:
    'match[]':
      - '{job="vms"} '
      - '{job="node-exporter"} '
      - '{job="postgres"} '
      - '{job="barman"} '
      - '{job="apiserver"} '
      - '{job="kube-state-metrics"} '
  static_configs:
    - targets:
      - 'test-abc-123.com'
      labels:
        project: 'test-abc-123'

Here is the node-exporter ServiceMonitor from project test-abc-123:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    app.kubernetes.io/name: node-exporter
    app.kubernetes.io/part-of: kube-prometheus
  name: node-exporter
  namespace: monitoring
spec:
  endpoints:
  - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    interval: 15s
    port: https
    relabelings:
    - action: replace
      regex: (.*)
      replacement: $1
      sourceLabels:
      - __meta_kubernetes_pod_node_name
      targetLabel: instance
    scheme: https
    tlsConfig:
      insecureSkipVerify: true
  selector:
    matchLabels:
      app.kubernetes.io/component: exporter
      app.kubernetes.io/name: node-exporter
      app.kubernetes.io/part-of: kube-prometheus

Here is the "vms" job from project test-abc-123:
    
      - job_name: 'vms'
        static_configs:
          - targets: ['db-prod-1:9100','db-prod-2:9100','util-1:9100']
            labels:
              project: 'client-vms'

I have tried updating labels, but maybe not in the right way. Any suggestions or pointers would be appreciated.
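For reference, the label updates I have been experimenting with are along these lines: an external label on the Prometheus custom resource in each federated project, so its series can be told apart downstream. The resource name and label value below are just placeholders, not my actual config:

apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: k8s
  namespace: monitoring
spec:
  # Placeholder sketch: attach a per-project label to every series this
  # Prometheus exposes, so federated series carry a distinguishing label.
  externalLabels:
    project: test-abc-123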

Thank you

Brian Beynon

Feb 15, 2021, 4:36:12 PM
to Prometheus Users
I found out what the issue was. I had the same set of rules defined in the projects I was scraping, but I only need the rules in my main Prometheus.
I removed the rules from the projects I was federating from and the errors have stopped.
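For anyone who hits the same thing: the duplication presumably comes from recording rules, since a rule evaluated in a federated project produces series that get pulled in via /federate, and the identical rule evaluated again in the main Prometheus writes those same series a second time, which is what shows up as the "out-of-order" and "same timestamp, different value" errors. A minimal sketch of the kind of PrometheusRule that should exist only in the main monitoring project (the rule name, expression, and metadata below are made up for illustration, not copied from my setup):

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  # The labels on this object need to match the ruleSelector of the
  # Prometheus that should evaluate it (typically only the main one).
  name: example-node-rules
  namespace: monitoring
spec:
  groups:
  - name: node.rules
    rules:
    # If this same rule also exists in the federated projects, the main
    # Prometheus ingests instance:node_cpu:rate5m via /federate and then
    # evaluates it locally as well, so one series gets two writers.
    - record: instance:node_cpu:rate5m
      expr: sum by (instance) (rate(node_cpu_seconds_total{mode!="idle"}[5m]))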