Hello,
I'm curious how best to use Prometheus for large-scale use cases:
Let's say you've got a successful e-commerce platform that uses Prometheus to track ~300 metrics covering requests, failures, and interactions at various points throughout the system.
Now, your platform is generic enough that it can easily support 10,000 different e-commerce sites and customers.
At this scale, Kubernetes spreads your application across ~100 pods.
(There's no "session aware" ingress router, so any pod can support any customer.)
How would you handle these metrics?
Adding a 'customer' label to every metric results in 300 (metrics) * 10,000 (customers) * 100 (pods) = 300M time series, each scraped every 15 seconds by default.
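To make the scale concrete, here's a quick back-of-the-envelope sketch of that arithmetic (plain Python; the figures are the ones from the scenario above, and the 15s interval is Prometheus's default scrape_interval):

```python
# Cardinality estimate for the scenario above: one series per
# (metric, customer, pod) combination once 'customer' is a label.
metrics = 300        # distinct metrics exposed by the app
customers = 10_000   # values the 'customer' label can take
pods = 100           # any pod can serve any customer, so all pods
                     # eventually expose all customer label values

total_series = metrics * customers * pods
print(f"{total_series:,} active time series")        # 300,000,000

# With the default 15s scrape interval, the sustained ingest rate:
samples_per_second = total_series // 15
print(f"{samples_per_second:,} samples/second")      # 20,000,000
```

For reference, a single Prometheus server is typically sized for a few million active series, so this is two orders of magnitude beyond what one instance is expected to handle.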