There is a nice document on the internal design of Prometheus 2.x storage here:
Roughly speaking:
- Prometheus tries to put all the data points of a time series next to each other on disk, which allows it to delta-compress them.
- each unique bag of labels defines a distinct time series
- therefore, if you have 1 million different values for a label, you will get 1 million different time series
- this has a cost on the write path: each time series has to be buffered and written out in its own chunks, and every distinct time series has to be tracked in RAM
- queries may also end up having to aggregate across millions of time series, requiring lots of small reads
- in the limit you'd end up with billions of time series with one point in each, which makes no sense: there is nothing left to delta-compress, and every point carries the full per-series overhead
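
To make the label-set point concrete, here is a rough Python sketch (not Prometheus code) that counts how many distinct series you get from a set of label values. The helper names `series_key` and `count_series`, the metric name, and the label names are all made up for illustration, and it assumes the worst case where every combination of label values actually occurs.

```python
from itertools import product


def series_key(metric, labels):
    # A series is identified by the metric name plus the full set of
    # label name/value pairs; the order of the labels does not matter.
    return (metric, frozenset(labels.items()))


def count_series(metric, label_values):
    # Count distinct series, assuming every combination of values occurs.
    names = sorted(label_values)
    keys = set()
    for combo in product(*(label_values[name] for name in names)):
        keys.add(series_key(metric, dict(zip(names, combo))))
    return len(keys)


# Two low-cardinality labels: 2 * 3 series.
low = count_series("http_requests_total", {
    "method": ["GET", "POST"],
    "status": ["200", "404", "500"],
})

# Adding one label with 10,000 values multiplies the series count by 10,000.
high = count_series("http_requests_total", {
    "method": ["GET", "POST"],
    "status": ["200", "404", "500"],
    "user_id": [str(i) for i in range(10_000)],
})

print(low, high)  # 6 60000
```

The series count is the product of the per-label value counts, so a single unbounded label (user ID, request ID, email address) dominates everything else.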
You'll find similar cardinality issues with "tags" in InfluxDB.