Logs are flooded by "skipping resharding" warning messages


Federico Buti

Apr 13, 2020, 5:12:55 PM
to Prometheus Users
Hi all.

As the title implies, we are seeing tons of warnings in the logs of our Prometheus instances about skipped resharding. Here is an excerpt from the logs of one instance:


ts=2020-04-13T20:30:54.507Z caller=dedupe.go:112 component=remote level=warn remote_name=5a17e1 url="http://10.10.3.212:58186/api/v1/prom/write?db=prometheus" msg="Skipping resharding, last successful send was beyond threshold" lastSendTimestamp=1586808573 minSendTimestamp=1586809844,
ts=2020-04-13T20:31:04.507Z caller=dedupe.go:112 component=remote level=warn remote_name=5a17e1 url="http://10.10.3.212:58186/api/v1/prom/write?db=prometheus" msg="Skipping resharding, last successful send was beyond threshold" lastSendTimestamp=1586808573 minSendTimestamp=1586809854,
ts=2020-04-13T20:31:14.507Z caller=dedupe.go:112 component=remote level=warn remote_name=5a17e1 url="http://10.10.3.212:58186/api/v1/prom/write?db=prometheus" msg="Skipping resharding, last successful send was beyond threshold" lastSendTimestamp=1586808573 minSendTimestamp=1586809864,
ts=2020-04-13T20:31:24.507Z caller=dedupe.go:112 component=remote level=warn remote_name=5a17e1 url="http://10.10.3.212:58186/api/v1/prom/write?db=prometheus" msg="Skipping resharding, last successful send was beyond threshold" lastSendTimestamp=1586808573 minSendTimestamp=1586809874,
ts=2020-04-13T20:31:34.507Z caller=dedupe.go:112 component=remote level=warn remote_name=5a17e1 url="http://10.10.3.212:58186/api/v1/prom/write?db=prometheus" msg="Skipping resharding, last successful send was beyond threshold" lastSendTimestamp=1586808573 minSendTimestamp=1586809884,
ts=2020-04-13T20:31:44.507Z caller=dedupe.go:112 component=remote level=warn remote_name=5a17e1 url="http://10.10.3.212:58186/api/v1/prom/write?db=prometheus" msg="Skipping resharding, last successful send was beyond threshold" lastSendTimestamp=1586808573 minSendTimestamp=1586809894,
ts=2020-04-13T20:31:54.507Z caller=dedupe.go:112 component=remote level=warn remote_name=5a17e1 url="http://10.10.3.212:58186/api/v1/prom/write?db=prometheus" msg="Skipping resharding, last successful send was beyond threshold" lastSendTimestamp=1586808573 minSendTimestamp=1586809904,
ts=2020-04-13T20:32:04.507Z caller=dedupe.go:112 component=remote level=warn remote_name=5a17e1 url="http://10.10.3.212:58186/api/v1/prom/write?db=prometheus" msg="Skipping resharding, last successful send was beyond threshold" lastSendTimestamp=1586808573 minSendTimestamp=1586809914,
ts=2020-04-13T20:32:14.507Z caller=dedupe.go:112 component=remote level=warn remote_name=5a17e1 url="http://10.10.3.212:58186/api/v1/prom/write?db=prometheus" msg="Skipping resharding, last successful send was beyond threshold" lastSendTimestamp=1586808573 minSendTimestamp=1586809924,
ts=2020-04-13T20:32:24.506Z caller=dedupe.go:112 component=remote level=warn remote_name=5a17e1 url="http://10.10.3.212:58186/api/v1/prom/write?db=prometheus" msg="Skipping resharding, last successful send was beyond threshold" lastSendTimestamp=1586808573 minSendTimestamp=1586809934,
ts=2020-04-13T20:32:34.507Z caller=dedupe.go:112 component=remote level=warn remote_name=5a17e1 url="http://10.10.3.212:58186/api/v1/prom/write?db=prometheus" msg="Skipping resharding, last successful send was beyond threshold" lastSendTimestamp=1586808573 minSendTimestamp=1586809944,
ts=2020-04-13T20:32:44.507Z caller=dedupe.go:112 component=remote level=warn remote_name=5a17e1 url="http://10.10.3.212:58186/api/v1/prom/write?db=prometheus" msg="Skipping resharding, last successful send was beyond threshold" lastSendTimestamp=1586808573 minSendTimestamp=1586809954,
ts=2020-04-13T20:32:54.507Z caller=dedupe.go:112 component=remote level=warn remote_name=5a17e1 url="http://10.10.3.212:58186/api/v1/prom/write?db=prometheus" msg="Skipping resharding, last successful send was beyond threshold" lastSendTimestamp=1586808573 minSendTimestamp=1586809964,
ts=2020-04-13T20:33:04.507Z caller=dedupe.go:112 component=remote level=warn remote_name=5a17e1 url="http://10.10.3.212:58186/api/v1/prom/write?db=prometheus" msg="Skipping resharding, last successful send was beyond threshold" lastSendTimestamp=1586808573 minSendTimestamp=1586809974,
ts=2020-04-13T20:33:14.507Z caller=dedupe.go:112 component=remote level=warn remote_name=5a17e1 url="http://10.10.3.212:58186/api/v1/prom/write?db=prometheus" msg="Skipping resharding, last successful send was beyond threshold" lastSendTimestamp=1586808573 minSendTimestamp=1586809984,
ts=2020-04-13T20:33:24.507Z caller=dedupe.go:112 component=remote level=warn remote_name=5a17e1 url="http://10.10.3.212:58186/api/v1/prom/write?db=prometheus" msg="Skipping resharding, last successful send was beyond threshold" lastSendTimestamp=1586808573 minSendTimestamp=1586809994,
ts=2020-04-13T20:33:34.507Z caller=dedupe.go:112 component=remote level=warn remote_name=5a17e1 url="http://10.10.3.212:58186/api/v1/prom/write?db=prometheus" msg="Skipping resharding, last successful send was beyond threshold" lastSendTimestamp=1586808573 minSendTimestamp=1586810004,
ts=2020-04-13T20:33:44.507Z caller=dedupe.go:112 component=remote level=warn remote_name=5a17e1 url="http://10.10.3.212:58186/api/v1/prom/write?db=prometheus" msg="Skipping resharding, last successful send was beyond threshold" lastSendTimestamp=1586808573 minSendTimestamp=1586810014,
ts=2020-04-13T20:33:54.507Z caller=dedupe.go:112 component=remote level=warn remote_name=5a17e1 url="http://10.10.3.212:58186/api/v1/prom/write?db=prometheus" msg="Skipping resharding, last successful send was beyond threshold" lastSendTimestamp=1586808573 minSendTimestamp=1586810024,
ts=2020-04-13T20:34:04.507Z caller=dedupe.go:112 component=remote level=warn remote_name=5a17e1 url="http://10.10.3.212:58186/api/v1/prom/write?db=prometheus" msg="Skipping resharding, last successful send was beyond threshold" lastSendTimestamp=1586808573 minSendTimestamp=1586810034,
ts=2020-04-13T20:34:24.507Z caller=dedupe.go:112 component=remote level=warn remote_name=5a17e1 url="http://10.10.3.212:58186/api/v1/prom/write?db=prometheus" msg="Skipping resharding, last successful send was beyond threshold" lastSendTimestamp=1586810048 minSendTimestamp=1586810054,
ts=2020-04-13T20:34:34.521Z caller=dedupe.go:112 component=remote level=warn remote_name=5a17e1 url="http://10.10.3.212:58186/api/v1/prom/write?db=prometheus" msg="Skipping resharding, last successful send was beyond threshold" lastSendTimestamp=1586810048 minSendTimestamp=1586810064,
ts=2020-04-13T20:34:44.507Z caller=dedupe.go:112 component=remote level=warn remote_name=5a17e1 url="http://10.10.3.212:58186/api/v1/prom/write?db=prometheus" msg="Skipping resharding, last successful send was beyond threshold" lastSendTimestamp=1586810048 minSendTimestamp=1586810074,
ts=2020-04-13T20:34:54.507Z caller=dedupe.go:112 component=remote level=warn remote_name=5a17e1 url="http://10.10.3.212:58186/api/v1/prom/write?db=prometheus" msg="Skipping resharding, last successful send was beyond threshold" lastSendTimestamp=1586810048 minSendTimestamp=1586810084,
ts=2020-04-13T20:35:04.507Z caller=dedupe.go:112 component=remote level=warn remote_name=5a17e1 url="http://10.10.3.212:58186/api/v1/prom/write?db=prometheus" msg="Skipping resharding, last successful send was beyond threshold" lastSendTimestamp=1586810048 minSendTimestamp=1586810094,
ts=2020-04-13T20:35:14.507Z caller=dedupe.go:112 component=remote level=warn remote_name=5a17e1 url="http://10.10.3.212:58186/api/v1/prom/write?db=prometheus" msg="Skipping resharding, last successful send was beyond threshold" lastSendTimestamp=1586810048 minSendTimestamp=1586810104,


On the other instance the warnings are a bit more spread out over time:

ts=2020-04-13T19:23:35.283Z caller=dedupe.go:112 component=remote level=warn remote_name=5a17e1 url="http://10.10.3.212:58186/api/v1/prom/write?db=prometheus" msg="Skipping resharding, last successful send was beyond threshold" lastSendTimestamp=1586805515 minSendTimestamp=1586805805,
ts=2020-04-13T19:23:45.283Z caller=dedupe.go:112 component=remote level=warn remote_name=5a17e1 url="http://10.10.3.212:58186/api/v1/prom/write?db=prometheus" msg="Skipping resharding, last successful send was beyond threshold" lastSendTimestamp=1586805515 minSendTimestamp=1586805815,
ts=2020-04-13T19:23:55.283Z caller=dedupe.go:112 component=remote level=warn remote_name=5a17e1 url="http://10.10.3.212:58186/api/v1/prom/write?db=prometheus" msg="Skipping resharding, last successful send was beyond threshold" lastSendTimestamp=1586805515 minSendTimestamp=1586805825,
ts=2020-04-13T19:24:05.283Z caller=dedupe.go:112 component=remote level=warn remote_name=5a17e1 url="http://10.10.3.212:58186/api/v1/prom/write?db=prometheus" msg="Skipping resharding, last successful send was beyond threshold" lastSendTimestamp=1586805515 minSendTimestamp=1586805835,
ts=2020-04-13T19:24:15.283Z caller=dedupe.go:112 component=remote level=warn remote_name=5a17e1 url="http://10.10.3.212:58186/api/v1/prom/write?db=prometheus" msg="Skipping resharding, last successful send was beyond threshold" lastSendTimestamp=1586805515 minSendTimestamp=1586805845,
ts=2020-04-13T19:24:25.283Z caller=dedupe.go:112 component=remote level=warn remote_name=5a17e1 url="http://10.10.3.212:58186/api/v1/prom/write?db=prometheus" msg="Skipping resharding, last successful send was beyond threshold" lastSendTimestamp=1586805515 minSendTimestamp=1586805855,
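
To read these warnings more easily, the two timestamps in each line can be compared directly. The following is only a rough sketch: the helper names `parse_fields` and `send_gap` are made up for illustration, and the regular expression assumes the key=value layout seen in the excerpts above rather than any official log format specification.

```python
import re
from datetime import datetime, timezone

# First warning line from the excerpt above, copied verbatim.
LINE = (
    'ts=2020-04-13T20:30:54.507Z caller=dedupe.go:112 component=remote level=warn '
    'remote_name=5a17e1 url="http://10.10.3.212:58186/api/v1/prom/write?db=prometheus" '
    'msg="Skipping resharding, last successful send was beyond threshold" '
    'lastSendTimestamp=1586808573 minSendTimestamp=1586809844,'
)


def parse_fields(line):
    """Split a key=value log line into a dict, unquoting quoted values."""
    pairs = re.findall(r'(\w+)=("[^"]*"|[^\s,]+)', line)
    return {key: value.strip('"') for key, value in pairs}


def send_gap(line):
    """Return how stale the last successful send was when the line was logged."""
    fields = parse_fields(line)
    last_send = int(fields["lastSendTimestamp"])
    min_send = int(fields["minSendTimestamp"])
    logged_at = datetime.strptime(
        fields["ts"], "%Y-%m-%dT%H:%M:%S.%fZ"
    ).replace(tzinfo=timezone.utc)
    logged_epoch = int(logged_at.timestamp())
    return {
        # Seconds between the last successful send and the warning itself.
        "since_last_send_s": logged_epoch - last_send,
        # Width of the window the threshold allows (log time minus minSendTimestamp).
        "window_s": logged_epoch - min_send,
        # How far the last successful send lies before the threshold.
        "behind_threshold_s": min_send - last_send,
    }


if __name__ == "__main__":
    print(send_gap(LINE))
```

For the line used here, the last successful send is about 21 minutes older than the warning, while the threshold only looks 10 seconds back.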


Our current remote write configuration is as follows:

remote_write:
  - url: http://10.10.3.212:8428/api/v1/write
    queue_config:
      max_samples_per_send: 10000
  - url: "http://10.10.3.212:58186/api/v1/prom/write?db=prometheus"
    write_relabel_configs:
      - source_labels: [__name__, check]
        regex: "xxxx_xx"
        action: keep


The Remote write tuning documentation says that "Prometheus implements sane defaults for remote write". Should I just remove the max_samples_per_send setting, or is there any other advice that applies here?
I skimmed the linked page, but apart from adjusting capacity to match the chosen max_samples_per_send, I'm not sure what else could really help here.

Any advice really appreciated.
Thanks in advance,
F.

Julius Volz

Apr 14, 2020, 6:13:01 AM
to Federico Buti, Prometheus Users
Reading the code, it looks like these warnings are produced because of another error that led to no samples being sent successfully to the remote end in the recent past (for longer than twice the 5s batch send deadline, i.e. 10s). In that case, resharding is skipped.

If the send error is a non-recoverable one, it should be logged at ERROR level (https://github.com/prometheus/prometheus/blob/8224ddec23598152d7506b7b39f5235a77b5e036/storage/remote/queue_manager.go#L840). Since your logs don't contain any such "non-recoverable error" message, another option is that sending encounters an error that is classified as recoverable and is therefore retried. Those errors are only logged at DEBUG level though: https://github.com/prometheus/prometheus/blob/8224ddec23598152d7506b7b39f5235a77b5e036/storage/remote/queue_manager.go#L884. This could be an error like a broken network connection or a 5xx status code.
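
The threshold check described above can be sketched roughly as follows. This is a Python paraphrase for illustration, not the actual Go code from queue_manager.go: the function name `should_skip_resharding` is hypothetical, and the 5s deadline is simply the value mentioned in this reply.

```python
import time

# Batch send deadline in seconds, as mentioned in the reply above ("2x5s").
BATCH_SEND_DEADLINE_S = 5


def should_skip_resharding(last_send_ts, now=None,
                           batch_send_deadline_s=BATCH_SEND_DEADLINE_S):
    """Return (skip, min_send_ts) for a given last successful send time.

    Resharding is skipped when the last successful send is older than
    twice the batch send deadline, measured back from `now`.
    """
    if now is None:
        now = time.time()
    # Oldest acceptable timestamp for the last successful send.
    min_send_ts = int(now - 2 * batch_send_deadline_s)
    return last_send_ts < min_send_ts, min_send_ts


if __name__ == "__main__":
    # Values taken from the first warning in the original post:
    # logged at 2020-04-13T20:30:54.507Z, lastSendTimestamp=1586808573.
    print(should_skip_resharding(1586808573, now=1586809854.507))
```

With the values from the first warning in the original post, the computed threshold matches the logged minSendTimestamp (1586809844), which is consistent with the 10s window.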

So maybe it will help you to turn on debug-level logging (--log.level=debug) for a brief while to see the error that's being retried.


Federico Buti

Apr 14, 2020, 6:24:41 AM
to Prometheus Users
Hi Julius!

Thanks a lot for the quick reply!
Yep, I don't have error logs. Or rather, I have one remaining error log on one of the instances, but it's related to a missing JSON file. The other error logs we had in the past have since been fixed, and I've never seen that specific non-recoverable error.

As soon as I have time I'll try switching on debug logs and I'll post the result!
Many thanks again.
F.