One Prometheus per Observed System vs. One Prometheus for Everything


Tim Schwenke

May 5, 2020, 7:05:20 AM
to Prometheus Users
Hello,

Did I understand correctly that, because Prometheus is very lightweight (unlike, say, Elasticsearch) and efficient but has an upper limit of xx million time series per instance, it is recommended to run one Prometheus server/container per observed system (be it an application or a set of CI/CD job runners) rather than to host a single massive Prometheus?

On the one hand, I see the advantage of scraping all my REST APIs across all apps with one Prometheus. On the other hand, I also have a ton of application-specific metrics that I would have to separate with a prefix or similar to keep an overview (labels work as well, but I have to pick a metric first before I can filter for a certain label value).

Thanks in advance,

Tim Schwenke

Julius Volz

May 6, 2020, 3:30:28 AM
to Tim Schwenke, Prometheus Users
Yeah, at some scale it becomes common to run multiple or even many Prometheus servers, segmented by different aspects like team, service, datacenter, sharding level, ... On the other hand, many organizations start out with a single Prometheus server and only split things up as that one becomes too large and unwieldy to handle. A single Prometheus on a big machine can take you quite far, but you'll have to experiment. In the end it doesn't matter so much how many different systems or services you monitor, but how many time series they produce, as that's the main scaling bottleneck for Prometheus.
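
(As a rough gauge, each Prometheus server reports its own current series count, which you can query directly:)

    # PromQL, against each server: series currently in its head block
    prometheus_tsdb_head_series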


Stuart Clark

May 6, 2020, 3:46:17 AM
to promethe...@googlegroups.com, Julius Volz, Tim Schwenke, Prometheus Users
The other big reason to split into multiple servers is organisational rather than technical.

Having servers per team or per service allows them to be owned by the same people as the service, rather than by a central team. For larger organisations this can give a lot more control over monitoring to the people who actually use it. For example, a team controlling their own alerts instead of a central team having generic alerts or a slow process to get things changed.

Shay Berman

May 6, 2020, 3:48:12 AM
to Prometheus Users
Actually, I am facing the same situation, dealing with millions of time series on a single Prometheus.
So I am trying to break it down into smaller Prometheus instances (each scraping a range of targets).
But then the need for a global view comes in (because you don't want to break your existing dashboards that query a specific datasource), so there are a few solutions here:
1. Thanos Querier (https://github.com/thanos-io/thanos/blob/master/docs/components/query.md, https://www.youtube.com/watch?v=Iuo1EjCN5i4) - which can also give you long-term storage as an optional phase.
2. promxy (https://github.com/jacksontj/promxy) - which is probably lighter (no sidecar needed inside the Prometheus pod) but has fewer features, e.g. no long-term storage or deduplication.
3. A global Prometheus that just does remote_read (https://prometheus.io/docs/prometheus/latest/configuration/configuration/#remote_read) from all the smaller Prometheus servers - see the sketch below. It works, but it's not really well documented. I believe #1 and #2 are better.
4. Prometheus federation (https://www.robustperception.io/federation-what-is-it-good-for) - but this has scaling limitations, since you must scrape only a subset of the data; it may blow up if you have many small Prometheus servers.
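
For option 3, something like this - a rough sketch, hostnames are placeholders, not a tested config:

    # prometheus.yml on the "global" server: no scraping of its own;
    # queries fan out to the smaller servers via remote_read.
    remote_read:
      - url: http://prometheus-shard-a:9090/api/v1/read
        read_recent: true   # also read data still in the shard's head block
      - url: http://prometheus-shard-b:9090/api/v1/read
        read_recent: true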

Just sharing my 2 cents so far.
Shay

Julius Volz

May 6, 2020, 3:55:58 AM
to Stuart Clark, Prometheus Users, Tim Schwenke
On Wed, May 6, 2020 at 9:46 AM Stuart Clark <stuart...@jahingo.com> wrote:
The other big reason to split into multiple servers is organisational rather than technical.

Having servers per team or per service allows them to be owned by the same people as the service, rather than by a central team. For larger organisations this can give a lot more control over monitoring to the people who actually use it. For example, a team controlling their own alerts instead of a central team having generic alerts or a slow process to get things changed.

Yeah, great point. Unless you run everything pretty centrally, giving each team control and ownership over their own Prometheus servers can make a lot of sense. It also decreases the blast radius in case one team breaks their Prometheus server by ingesting too many metrics, overloading it with queries, etc. (and then they are in the best position to fix the situation too).

Stuart Clark

May 6, 2020, 4:34:24 AM
to Shay Berman, Prometheus Users
On 2020-05-06 08:48, Shay Berman wrote:
> Actually, I am facing the same situation, dealing with millions of
> time series on a single Prometheus.
> So I am trying to break it down into smaller Prometheus instances
> (each scraping a range of targets).
> But then the need for a global view comes in (because you don't want
> to break your existing dashboards that query a specific datasource),
> so there are a few solutions here:
> 1. Thanos Querier - which can also give you long-term storage as an
> optional phase.
> 2. promxy - which is probably lighter (no sidecar needed inside the
> Prometheus pod) but has fewer features, e.g. no long-term storage or
> deduplication.
> 3. A global Prometheus that just does remote_read from all the
> smaller Prometheus servers. It works, but it's not really well
> documented. I believe #1 and #2 are better.
> 4. Prometheus federation - but this has scaling limitations, since
> you must scrape only a subset of the data; it may blow up if you
> have many small Prometheus servers.
>
> Just sharing my 2 cents so far.
> Shay

There are lots of details around how you operate/want to work that may
help with deciding which method works for you.

The options which run queries on the "local" Prometheus servers require those servers to be available and not too busy - you can end up in a situation where a query from somewhere else breaks a server because the query is too big or too slow. Equally, a server being unavailable (down/network issues) will cause the query to fail.

Federation removes that limitation, as the "global" queries would only be handled by the one Prometheus server, with the only load on the "local" servers being the constant federation requests (which should be small and predictable). However, as you mention, switching to federation needs careful design. You would want recording rules in the "local" servers to aggregate the metrics (e.g. removing instance labels using sum()) and then match[] selectors that federate just enough for the global alerts & dashboards. You may want to split your dashboards into local & global - local dashboards would sit with/query the local servers and can give full detail (because they query the full data), but may have availability issues and can't see data not held on that server; global dashboards would use the federated data, but cannot give the full per-instance detail.
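
For illustration, a minimal sketch of that setup - the metric, job, and hostnames are placeholders:

    # rules.yml on each "local" server: aggregate away the instance
    # label so only the summary series needs to be federated.
    groups:
      - name: federation
        rules:
          - record: job:http_requests:rate5m
            expr: sum without (instance) (rate(http_requests_total[5m]))

    # prometheus.yml on the "global" server: federate only the
    # aggregated "job:" series, not the raw per-instance data.
    scrape_configs:
      - job_name: federate
        honor_labels: true
        metrics_path: /federate
        params:
          'match[]':
            - '{__name__=~"job:.*"}'
        static_configs:
          - targets:
              - prometheus-shard-a:9090
              - prometheus-shard-b:9090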

The global storage solutions sit somewhere in the middle. They have the advantage of not being dependent on the local servers for queries, and they can equally store everything, rather than just summaries. However, there is some complexity, and just because you can store everything centrally and query it without recording rules to aggregate doesn't mean you always should - queries will be slow if lots of series/blocks have to be interrogated.



--
Stuart Clark

Harald Koch

May 6, 2020, 3:15:57 PM
to Prometheus Users


On Wed, May 6, 2020, at 03:46, Stuart Clark wrote:
The other big reason to split into multiple servers is organisational rather than technical.

We run Prometheus/Alertmanager/Grafana sets on the load balancers for each of our environments (DEV/TEST/PROD). This allows us, for example, to restrict access to different environments to different users.

But also, it makes my configurations much easier. DEV alerts go to Slack, TEST alerts go to Slack and email, PROD alerts go to Slack and the pager. Writing this kind of rule in a single Alertmanager is troublesome and requires tagging all of your environments. Instead I have a different alertmanager.yml file for each environment*, and each one is very simple.
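
The PROD file boils down to something like this (a sketch - receiver name, channel, and key are placeholders, and a global slack_api_url is assumed):

    # alertmanager.yml for PROD: every alert goes to Slack and the
    # pager; no environment matching needed, because this Alertmanager
    # only ever sees PROD alerts.
    route:
      receiver: prod-page
    receivers:
      - name: prod-page
        slack_configs:
          - channel: '#alerts-prod'
        pagerduty_configs:
          - service_key: '<pagerduty-key>'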

Our Grafana dashboards are also centrally managed, and we can configure them differently for each environment (e.g. hostnames, thresholds, etc.)

We aggregate all data to a single VictoriaMetrics instance so we can do things like compare performance in different clusters, but that's another show (as Alton Brown would say).

I was struggling a lot with getting both alerting and Grafana dashboards right before making this split - so much easier now!

--
Harald

*actually a single file templated using Ansible, but that's an unimportant detail

Shay Berman

May 6, 2020, 3:21:17 PM
to Prometheus Users
Hi Stuart

Agree with your points.

About this section:

> The options which run queries on the "local" Prometheus servers
> require those servers to be available and not too busy - you can end
> up in a situation where a query from somewhere else breaks a server
> because the query is too big or too slow. Equally, a server being
> unavailable (down/network issues) will cause the query to fail.


You didn't mention promxy or Thanos Query - these could help avoid failing the whole query if a single Prometheus instance is not responding.

Stuart Clark

May 6, 2020, 5:16:49 PM
to Shay Berman, Prometheus Users
On 06/05/2020 20:21, Shay Berman wrote:
> You didn't mention promxy or Thanos Query - these could help avoid
> failing the whole query if a single Prometheus instance is not
> responding.


It could help (or hinder) depending on the failure mode & query purpose.

If you are running a query across multiple sharded servers (e.g. different environments), Thanos/promxy isn't going to help with the missing data. However, if you have HA pairs of servers everywhere, it can be very useful when a single server has issues.

If you have queries which stress a server (either due to the number of time series covered or just overall query volume), systems which duplicate queries could in certain situations make things worse - maybe every server is now overloaded.

As I say, the exact "best option" very much depends on your particular situation. Is it a single environment in one location, or lots of environments globally? Do you have a single, easily defined set of users (dashboards/alerts) or lots of different teams with different needs and requirements (e.g. some needing longer-term querying for capacity management, while others just do short-term incident management)? Does the way you operate fit into a more hierarchical structure/process (e.g. region -> environment -> service -> instance) or are things more "flat"?

-- 
Stuart Clark