same recording rules on both remote write sender and receiver


Bogdan L

Feb 4, 2022, 8:19:22 AM
to Prometheus Users
Hi,

I have a situation where I have a few "local" Prometheus servers sending data to a "global" server using the remote write API. I get errors that look like this on the remote write receiver:

ts=2022-02-03T12:41:11.244Z caller=write_handler.go:57 level=error component=web msg="Out of order sample from remote write" err="duplicate sample for timestamp"

The senders get the same error back from the receiver, with an HTTP 400 status code.

After much trial and error I figured out that it happens because I have the same recording rules on all servers, on both senders and receiver. recording-rules.yaml looks like this:
```
groups:
  - name: node-exporter
    rules:
      # CPU cores per node
      - record: instance:node_cpus:count
        expr: count(node_cpu_seconds_total{mode="idle"}) without (cpu,mode)

      # CPU in use by CPU
      - record: instance_cpu:node_cpu_seconds_not_idle:rate5m
        expr: sum(rate(node_cpu_seconds_total{mode!="idle"}[5m])) without (mode)
```

However, if I delete the second rule, the errors go away. That is, there are no errors when I change recording-rules.yaml on all servers to:
```
groups:
  - name: node-exporter
    rules:
      # CPU cores per node
      - record: instance:node_cpus:count
        expr: count(node_cpu_seconds_total{mode="idle"}) without (cpu,mode)
```

Why?

1. Why are there duplicates in the first case? Does the remote write receiver also run the rules when it receives data?
2. Why are there no errors when the only rule is the CPU count? Shouldn't there be duplicates in that case too?
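
One way to test whether the receiver's own rule evaluation is what collides with the rule output arriving over remote write would be to give the receiver's copy of the rule a distinguishing label. This is only a sketch under that assumption, not a confirmed diagnosis: the `labels` field is standard recording-rule syntax, but the label name `evaluated_by` and its value are hypothetical.

```
groups:
  - name: node-exporter
    rules:
      # CPU in use by CPU (receiver-side copy only)
      - record: instance_cpu:node_cpu_seconds_not_idle:rate5m
        expr: sum(rate(node_cpu_seconds_total{mode!="idle"}[5m])) without (mode)
        labels:
          # Hypothetical label: keeps the receiver's result distinct
          # from the series the senders write for the same rule
          evaluated_by: global
```

If the errors stop with this change on the receiver, that would suggest the receiver evaluates the rule over the remote-written data and produces series with the same label sets as the ones the senders push.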

Brian Candler

Feb 4, 2022, 10:27:56 AM
to Prometheus Users
Have you given each of your "local" Prometheus servers unique labels, using the global external_labels setting (recommended) or some other way? This is to ensure that every time series has a unique label set.
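
A minimal sketch of what that could look like in a sender's prometheus.yml, for illustration only: the label name `site`, its value, and the receiver hostname are assumptions and do not come from this thread.

```
global:
  external_labels:
    # Hypothetical label; set a different value on each "local" server
    site: local-1

remote_write:
  # Hypothetical host; /api/v1/write is the path the remote write receiver serves
  - url: http://global-prometheus.example:9090/api/v1/write
```

External labels are attached to samples as they leave the sender, so series from two "local" servers that otherwise share a label set remain distinct on the "global" server.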

Bogdan L

Feb 4, 2022, 11:13:15 AM
to Brian Candler, Prometheus Users
There are external_labels, yes. "instance" is also unique; there is no overlap.


Brian Candler

Feb 4, 2022, 11:28:19 AM
to Prometheus Users
Have you checked your Prometheus version at both ends? It's possible that bugs have been fixed. The remote write receiver was only officially promoted to "stable" in v2.33.

Other than that, I'm afraid I don't have any ideas.

Bogdan L

Feb 4, 2022, 11:38:51 AM
to Brian Candler, Prometheus Users

On 4 Feb 2022, at 18:28, Brian Candler <b.ca...@pobox.com> wrote:

> Have you checked your Prometheus version at both ends? It's possible that bugs have been fixed. The remote write receiver was only officially promoted to "stable" in v2.33.

Forgot to mention,

Prometheus version:

prometheus, version 2.32.1 (branch: release-2.32, revision: 0)
  build user:       root
  build date:       20211227-15:14:28
  go version:       go1.17.6
  platform:         freebsd/amd64

2.33 wasn't available in FreeBSD packages when I checked.


> Other than that, I'm afraid I don't have any ideas.

No worries, thank you. 
