Prometheus remote_read for multiple Prometheus instances or Thanos Querier?

Shay Berman

May 4, 2020, 5:21:41 AM

to Prometheus Users

I would like advice on the right approach for breaking down a "big" Prometheus (one that scrapes too many targets/metrics) into small Prometheus instances (each scraping a range of targets) while still getting a global view via Grafana.

Which approach below is better?

Approach 1) Use a Prometheus instance that does remote_read from all the small Prometheus instances. This remote_read Prometheus can be installed as a k8s Deployment behind a load balancer, so it can scale out depending on network traffic. It would then be used as the data source in Grafana (so all dashboards and queries still use a single data source).

Approach 2) Use Thanos Querier to read from all the small Prometheus instances. Thanos Querier is stateless by design, so it can be deployed in k8s as a Deployment behind a load balancer. It would then be used as the data source in Grafana. I believe the Thanos Querier component has a smaller footprint (compared to Prometheus remote_read).

Thanks

Brian Candler

May 4, 2020, 5:44:44 AM

to Prometheus Users

I don't think (1) will work - that is, I don't think a prometheus server itself can act as a server endpoint for remote_read. (Federation is different).

I would add some more lightweight options to your list, which may do the job:

3) Use promxy as a front-end to combine queries across multiple prometheus instances. This is close in spirit to your option (1), but uses the regular PromQL API to fetch data.

4) Configure separate prometheus data sources in Grafana. This is fine for graphing, but won't help you do queries which span multiple data sources.

5) Use remote_write to push all your data, or a subset of it, to a single VictoriaMetrics instance, and query that in Grafana. A single VM node may or may not scale as far as you need, but it's a doddle to set up so you won't waste any time trying it out.

6) Use a central prometheus instance with federation to scrape a subset of the metrics and/or at lower resolution. This is useful if you only want a subset of your data available globally, and pretty easy to set up. You could also configure global recording rules here to summarise.
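
To make the idea of combining results across instances concrete, here is a minimal Python sketch of a naive fan-out over the regular HTTP query API (`/api/v1/query`). It is not how promxy or Thanos are implemented; the helper names `fetch_instant` and `fan_out` are made up for illustration, the instance URLs are whatever you pass in, and it only merges the results of a plain selector.

```python
import json
import urllib.parse
import urllib.request


def fetch_instant(base_url, promql):
    """Run an instant query against one Prometheus and return its result list."""
    query = urllib.parse.urlencode({"query": promql})
    url = base_url.rstrip("/") + "/api/v1/query?" + query
    with urllib.request.urlopen(url, timeout=10) as resp:
        body = json.load(resp)
    if body.get("status") != "success":
        raise RuntimeError(f"query failed on {base_url}: {body}")
    return body["data"]["result"]


def fan_out(base_urls, promql, fetch=fetch_instant):
    """Query every instance and merge the vectors, one sample per label set."""
    merged = {}
    for base_url in base_urls:
        for sample in fetch(base_url, promql):
            key = tuple(sorted(sample["metric"].items()))
            current = merged.get(key)
            # If two instances return the same label set, keep the newer sample.
            if current is None or sample["value"][0] > current["value"][0]:
                merged[key] = sample
    return list(merged.values())
```

Because the merge happens after each instance has already evaluated the query, an aggregation such as `sum(...)` would be computed per instance and this merge would keep only one of the partial results, not a global total. That gap is what the query front-ends discussed in this thread are meant to close.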

Bartłomiej Płotka

May 4, 2020, 6:00:21 AM

to Brian Candler, Prometheus Users

Hi,

Thanks for the tips, Brian. I will quickly jump in, as there is a little bit of misinformation here, maybe because things are moving quickly in this area. (:

> I believe the Thanos Querier component has a smaller footprint (compared to Prometheus remote_read).

Why? There is literally no overhead vs `remote_read` because Thanos uses exactly `remote_read`.

> I don't think a prometheus server itself can act as a server endpoint for remote_read.

Prometheus can totally serve as both client and server of remote read. It will be less efficient as it does not yet support streaming remote_read, whereas Thanos does.

> This is close in spirit to your option (1), but uses the regular PromQL API to fetch data.

Not really, they added a remote_read option as well, because querying PromQL directly was not the best idea. This makes promxy quite similar to Thanos. However, Thanos has many more features, but I will stop here as I can be biased (:

> 5) Use remote_write to push all your data, or a subset of it, to a single VictoriaMetrics instance, and query that in Grafana. A single VM node may or may not scale as far as you need, but it's a doddle to set up so you won't waste any time trying it out.

I would say it's fair to mention there are many more solutions to try in this space. For example, solutions that are much closer to the Prometheus ecosystem:
* Thanos Receiver, which is quick to set up as well. Plus you can start with a single node AND automatically, gracefully scale up or add object storage support.
* Cortex, which has a single-process option as well.

Just to make sure we mention this: remote write has different tradeoffs. You need a stable network, high availability of the receiving server, etc. I would not start with this for simple scenarios. Thanos + sidecar, promxy, or Prometheus remote read might be more viable for a start.

Kind Regards,
Bartek

--
You received this message because you are subscribed to the Google Groups "Prometheus Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to prometheus-use...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/prometheus-users/a3bdd051-9113-4fc9-9037-70dc51696472%40googlegroups.com.

Brian Candler

May 4, 2020, 6:23:34 AM

to Prometheus Users

I apologise for spreading misinformation!

In my defence, the documentation may be incomplete. I can't find any mention of prometheus providing (rather than consuming) a remote_read endpoint at:

https://prometheus.io/docs/prometheus/latest/querying/api/ [this is not the whole API though; e.g. doesn't mention /federate]

https://prometheus.io/docs/prometheus/latest/storage/

https://prometheus.io/docs/operating/integrations/#remote-endpoints-and-storage

But after grepping the tree, I found it mentioned in passing here:

https://prometheus.io/docs/prometheus/latest/migration/

"The data format in Prometheus 2.0 has completely changed and is not backwards compatible with 1.8. To retain access to your historic monitoring data we recommend you run a non-scraping Prometheus instance running at least version 1.8.1 in parallel with your Prometheus 2.0 instance, and have the new server read existing data from the old one via the remote read protocol."

So it looks like the remote_read server implementation has been around for a very long time, and I never came across it before. Thanks for putting me straight.

Bartłomiej Płotka

May 4, 2020, 7:21:16 AM

to Brian Candler, Prometheus Users

Also this: https://prometheus.io/blog/2019/10/10/remote-read-meets-streaming/

Actually, Brian, you have a very good point here. It's not well documented, and it's definitely easy to miss! There is even a recent discussion about it, thanks to Shay: https://github.com/prometheus/prometheus/issues/7192

Help wanted to improve this! 🤗

Kind Regards,

Bartek

Shay Berman

May 4, 2020, 3:25:41 PM

to Prometheus Users

Hi Bartek,

Thanks for the clarifications.

1. Assuming the Thanos Query component is the same as Prometheus remote_read (probably also in terms of CPU/memory), how much overhead does the Thanos sidecar add (in terms of CPU/memory and network bandwidth)?

2. You mentioned Thanos streaming as an advantage. Can you elaborate on how it compares to just using Prometheus remote_read?

3. What features does Thanos Query provide that do not come with native Prometheus remote_read (apart from downsampling and deduplication for HA)?
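
For readers unfamiliar with the HA deduplication mentioned in question 3: the rough idea is that two replicas scrape the same targets and differ only by a replica label, and the query layer collapses them into one series. A naive Python sketch follows. The label name `replica` and the function `dedup_replicas` are assumptions for illustration, and Thanos' actual deduplication is more involved than the exact-timestamp matching used here.

```python
def dedup_replicas(series, replica_label="replica"):
    """Collapse series that differ only in `replica_label`.

    `series` is a list of {"labels": {...}, "samples": [(ts, value), ...]}.
    """
    merged = {}
    for s in series:
        # Identity of a series once the replica label is ignored.
        labels = {k: v for k, v in s["labels"].items() if k != replica_label}
        key = tuple(sorted(labels.items()))
        entry = merged.setdefault(key, {"labels": labels, "samples": {}})
        for ts, value in s["samples"]:
            # First replica to supply a timestamp wins; later ones only fill gaps.
            entry["samples"].setdefault(ts, value)
    return [
        {"labels": e["labels"], "samples": sorted(e["samples"].items())}
        for e in merged.values()
    ]
```

In Thanos Querier, the label to ignore is configured with the `--query.replica-label` flag; the gap-filling behaviour above is only a stand-in for what it does with that label.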

Thanks

Shay

Shay Berman

May 4, 2020, 3:31:42 PM

to Prometheus Users

Thanks Brian

- I will check promxy as an alternative. But in terms of adoption, Thanos looks much more widely used (which means better support if needed in the future).

- Federation will not fit because it only covers a subset of the metrics, and it is hard to scale.
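
Federation pulls series from the `/federate` endpoint, and only those matched by the `match[]` selectors passed in the request, which is why it is normally used for a subset of metrics. Below is a small Python sketch of building such a URL; `federate_url` is a hypothetical helper, and the host name and selectors in the usage line are placeholders.

```python
from urllib.parse import urlencode


def federate_url(base_url, selectors):
    """Build a /federate URL that requests only series matching `selectors`."""
    if not selectors:
        # Assumption in this sketch: refuse to build a request with no selector.
        raise ValueError("federation needs at least one match[] selector")
    # Each selector becomes its own repeated match[] query parameter.
    query = urlencode([("match[]", s) for s in selectors])
    return f"{base_url.rstrip('/')}/federate?{query}"


if __name__ == "__main__":
    print(federate_url("http://prom-a:9090", ['{job="node"}', '{__name__=~"job:.*"}']))
```

Every additional selector widens what the central instance has to scrape on each interval, which is one way to see the scaling concern raised above.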

Shay
