How does M3DB handle high cardinality metrics?


Patrick O'Brien

Aug 9, 2018, 12:53:58 PM
to M3
Hello,

What are the cardinality limits for M3DB? At what point is a namespace too highly cardinal? If you can share any information about what you have seen internally that would be fantastic.

For some background, some of the metrics we're looking at scraping would be in the range of ~800,000 time series per namespace. I realize this is generally not recommended by the Prometheus authors, but being forced to drop some of our labels to keep this number down would take a lot of the value out of Prometheus.

Aside from that, M3DB looks pretty great and we can't wait to dig in! If you could shed any light on this question, that would be super helpful!

Thank you!

Martin Mao

Aug 10, 2018, 8:15:51 PM
to Patrick O'Brien, M3
Hey Patrick,

Thanks for reaching out! Our cardinality limits for our M3DB clusters at Uber have mostly been restricted by the number of time series that you can fetch in a single query as opposed to the total number of time series you can store. In production, we store over 6 billion time series across multiple clusters, each one containing up to hundreds of millions of time series, so storing 800K series per namespace is not an issue. 

The limitation for us has been when we query this information. For example, if you wanted to retrieve all 800K time series at once and sum them up, the query nodes - as you can imagine - will run out of memory. We are making performance optimizations in our new query engine (https://github.com/m3db/m3/tree/master/src/query) to address this, such as keeping blocks of data compressed in memory until they need to be evaluated and streaming blocks back to the query engine in small pieces at a time. That said, we've found that the better solution for use cases like this is in the ingestion pipeline. Inside Uber, we use our aggregation tier (https://github.com/m3db/m3aggregator) to perform the summation (or other aggregation functions) on the data as it is ingested, so that we only store and retrieve a single time series for the aggregated query. This is similar to Prometheus' recording rules.
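To make the ingest-time roll-up idea concrete, here is a minimal sketch (hypothetical label names and helper function, not M3's actual aggregator API): incoming samples carry full high-cardinality label sets, and the aggregator sums away the noisy labels so only the rolled-up series are stored and queried.

```python
from collections import defaultdict

def rollup_sum(samples, drop_labels):
    """Aggregate samples at ingest time by summing away high-cardinality
    labels, so only the rolled-up series are stored and queried."""
    aggregated = defaultdict(float)
    for labels, value in samples:
        # Keep only the low-cardinality labels for the stored series key.
        key = tuple(sorted((k, v) for k, v in labels.items()
                           if k not in drop_labels))
        aggregated[key] += value
    return dict(aggregated)

# Hypothetical example: per-host request counts rolled up per service.
samples = [
    ({"service": "api", "host": "h1"}, 3.0),
    ({"service": "api", "host": "h2"}, 4.0),
    ({"service": "db", "host": "h3"}, 5.0),
]
rolled = rollup_sum(samples, drop_labels={"host"})
# Two roll-up series stored instead of three raw ones; a query for the
# per-service sum reads one series per service rather than one per host.
```

The point is that the expensive fan-in happens once at write time instead of on every query.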

Our aggregation tier also lets us tweak the retention of metrics so a common use case would be that we store the high cardinality metrics for a shorter period of time and then drop some of the labels when we store them for much longer retention. 
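To illustrate that retention trade-off with a small sketch (hypothetical labels and policy names, not M3 configuration): dropping a label like `host` for the long-retention copy collapses many raw series into a few, so the long-lived storage only ever sees the reduced cardinality.

```python
def series_for_retention(label_sets, policies):
    """For each retention policy, count the distinct series that would be
    stored after dropping that policy's labels."""
    counts = {}
    for name, drop in policies.items():
        kept = {
            tuple(sorted((k, v) for k, v in labels.items() if k not in drop))
            for labels in label_sets
        }
        counts[name] = len(kept)
    return counts

# Hypothetical: four per-host series collapse to two region-level series
# when 'host' is dropped for the long-retention copy.
label_sets = [
    {"region": "us", "host": "h1"},
    {"region": "us", "host": "h2"},
    {"region": "eu", "host": "h3"},
    {"region": "eu", "host": "h4"},
]
policies = {"short_48h": set(), "long_1y": {"host"}}
counts = series_for_retention(label_sets, policies)
# counts == {"short_48h": 4, "long_1y": 2}
```

The high-cardinality data stays queryable for recent debugging, while the cheap, label-stripped roll-ups cover long-term trends.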

I hope that answers your question. Let us know if you need anything else.

Cheers,
Martin

