Hello,
I'm curious how best to use Prometheus for large-scale use cases:
Let's say you've got a successful e-commerce platform that uses Prometheus to track ~300 metrics covering requests, failures, and interactions at various points throughout the system.
Now, your platform is generic enough that it can easily support 10,000 different e-commerce sites and customers.
At this scale, Kubernetes spreads your application across ~100 pods.
(There's no "session aware" ingress router, so any pod can support any customer.)
How would you handle these metrics?
Adding a 'customer' label to every metric results in 300 (metrics) * 10,000 (customers) * 100 (pods) = 300M time series, each scraped every 15 seconds by default.
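To make the scale concrete, here's a quick back-of-the-envelope sketch of that arithmetic (plain Python; the figures are the ones from the scenario above, and the 15s interval is Prometheus's default scrape_interval):

```python
# Cardinality estimate for the scenario above: one series per
# (metric, customer, pod) combination once 'customer' is a label.
metrics = 300        # distinct metrics exposed by the app
customers = 10_000   # values the 'customer' label can take
pods = 100           # any pod can serve any customer, so all pods
                     # eventually expose all customer label values

total_series = metrics * customers * pods
print(f"{total_series:,} active time series")        # 300,000,000

# With the default 15s scrape interval, the sustained ingest rate:
samples_per_second = total_series // 15
print(f"{samples_per_second:,} samples/second")      # 20,000,000
```

For reference, a single Prometheus server is typically sized for a few million active series, so this is two orders of magnitude beyond what one instance is expected to handle.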