The limit is down to the hardware that Prometheus is running on. The
more time series (the total number of different label combinations in
use across every metric), the more memory you need. A cardinality of
100k for a single metric is pretty large; with only a few such metrics
you'd quickly be into the millions of time series, which would have
substantial infrastructure requirements.
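As a rough illustration of that arithmetic (the ~3 KiB-per-active-series
figure below is an assumption for the sketch, not a Prometheus guarantee;
real usage depends on churn, label sizes, and version):

```python
# Back-of-envelope estimate of active series count and head memory.
# BYTES_PER_SERIES is an assumed ballpark, not a measured value.
BYTES_PER_SERIES = 3 * 1024

def estimate(metrics: int, cardinality_per_metric: int) -> tuple[int, float]:
    """Return (total active series, approximate memory in GiB)."""
    series = metrics * cardinality_per_metric
    gib = series * BYTES_PER_SERIES / 2**30
    return series, gib

series, gib = estimate(metrics=10, cardinality_per_metric=100_000)
print(f"{series:,} series ~= {gib:.1f} GiB")  # ten 100k-cardinality metrics
```

So even ten such metrics puts you at a million active series before
counting anything else the server scrapes.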
For larger Prometheus setups you would generally try to avoid having a
single large central server. Instead, run a Prometheus per failure
domain (e.g. different datacenters or AWS regions), as well as per
service/application/area (whatever makes sense given your
organisational or technical structure).
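One common way to keep those per-domain servers distinguishable in any
global view is `external_labels` in each server's config; the label
names and values here are purely illustrative:

```yaml
# prometheus.yml for the Prometheus in one failure domain.
global:
  external_labels:
    datacenter: eu-west-1   # illustrative failure-domain label
    team: payments          # illustrative service/area label
```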
You can then use tools such as federation or a remote read/write system
(such as Thanos) to construct global views and alerts if needed.
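As a sketch of the federation side, a global Prometheus can scrape each
per-domain server's `/federate` endpoint; the hostnames and the
`match[]` selector here are illustrative (pulling only aggregated
recording-rule series keeps the global server's cardinality down):

```yaml
scrape_configs:
  - job_name: federate
    honor_labels: true        # keep the labels from the source servers
    metrics_path: /federate
    params:
      'match[]':
        - '{__name__=~"job:.*"}'   # only pre-aggregated recording rules
    static_configs:
      - targets:
          - prometheus-dc1.example.com:9090
          - prometheus-dc2.example.com:9090
```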
--
Stuart Clark