high cardinality sparse metric (exporter and storage)


Béla Törös

Jul 16, 2019, 4:19:28 AM
to Prometheus Users
Hi,

I have a data source (log files) that I would like to turn into stats
over a long period of time.

The way I need to have this data parsed and broken down will yield
approximately 10M+ time series, which can easily grow 2-3x over time.

The other characteristic of this data is that most of the time series
will probably be sparse, so only about ~10k time series will change at
each scrape interval.

Obviously the exporter's memory requirements will become a factor if
you need to store an ever-increasing number of time series. This is my
first problem: how to keep the exporter's memory requirements low,
most preferably by having it know only about metric changes since the
last scrape interval (turning the current counters into gauges).
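To make the idea concrete, here is a minimal sketch of such a delta-only exporter (all names are hypothetical and no client library is assumed): increments are accumulated between scrapes, exposed once as gauge-style values, and then forgotten, so memory tracks the ~10k active series rather than the 10M+ total.

```python
# Hypothetical sketch: keep only per-series increments since the last
# scrape, and expose them as gauge values instead of ever-growing counters.
from collections import defaultdict

class DeltaExporter:
    def __init__(self):
        # series key (metric name + label values) -> increment since last scrape
        self._deltas = defaultdict(float)

    def observe(self, metric, labels, value=1.0):
        # Called by the log parser for each event.
        self._deltas[(metric, labels)] += value

    def collect(self):
        # Called once per scrape: emit only the series that changed,
        # then forget them, keeping memory proportional to the active
        # series count rather than the total series count.
        changed = dict(self._deltas)
        self._deltas.clear()
        return changed

exporter = DeltaExporter()
exporter.observe("requests_total", ("eu", "GET", "user-123"))
exporter.observe("requests_total", ("eu", "GET", "user-123"))
scrape = exporter.collect()
```

The catch is that Prometheus's rate() and increase() assume monotonically increasing counters, so delta gauges like this would need sum_over_time()-style queries instead, and any missed scrape silently loses data.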

Currently the exporter, as a POC (using counters), has 4 metrics and 3
metric labels. Two of the labels have about 5 possible values each;
the third label has values in the millions. This can obviously be
worked around to yield 4 million metrics with 2 labels, or anything in
between.
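For a rough feel of the numbers (using the label sizes above; the observed 10M+ are just the combinations that actually occur in the logs), the worst-case series count is the product of the metric count and each label's value count, and folding the high-cardinality label into the metric name changes nothing about it:

```python
# Back-of-the-envelope cardinality: worst case is the product of the
# number of metrics and the number of values per label.
metrics = 4
label_a = 5          # ~5 possible values
label_b = 5          # ~5 possible values
label_c = 1_000_000  # "in the millions"

worst_case_series = metrics * label_a * label_b * label_c
print(worst_case_series)  # 100000000 potential series

# Folding the high-cardinality label into the metric name reduces the
# label count, not the cardinality:
folded_metrics = metrics * label_c            # 4,000,000 metric names
folded_series = folded_metrics * label_a * label_b
print(folded_series)      # still 100000000
```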

Is this typically a use case where Prometheus is not a good fit, or
are there some tips/tricks that I can use to tackle the cardinality
and sparseness of the data on both scraper and exporter side? Any
suggestions? In general this type of data seems more suited for a
push-type TSDB, but I thought I would first check with you whether
there is something Prometheus can do for us in this case.


kind regards,
Bela

Brian Brazil

Jul 16, 2019, 4:33:31 AM
to Béla Törös, Prometheus Users
On Tue, 16 Jul 2019 at 09:19, Béla Törös <kale...@gmail.com> wrote:
> Hi,
>
> I have a data source (log files) that I would like to turn into stats
> over a long period of time.
>
> The way I need to have this data parsed and broken down will yield
> approximately 10M+ time series, which can easily grow 2-3x over time.
>
> The other characteristic of this data is that most of the time series
> will probably be sparse, so only about ~10k time series will change at
> each scrape interval.
>
> Obviously the exporter's memory requirements will become a factor if
> you need to store an ever-increasing number of time series. This is my
> first problem: how to keep the exporter's memory requirements low,
> most preferably by having it know only about metric changes since the
> last scrape interval (turning the current counters into gauges).
>
> Currently the exporter, as a POC (using counters), has 4 metrics and 3
> metric labels. Two of the labels have about 5 possible values each;
> the third label has values in the millions. This can obviously be
> worked around to yield 4 million metrics with 2 labels, or anything in
> between.
>
> Is this typically a use case where Prometheus is not a good fit, or
> are there some tips/tricks that I can use to tackle the cardinality
> and sparseness of the data on both scraper and exporter side? Any
> suggestions?

This doesn't sound like a good use case for Prometheus.
 
> In general this type of data seems more suited for a push-type TSDB,
> but I thought I would first check with you whether there is something
> Prometheus can do for us in this case.

Push vs pull has nothing to do with it, it sounds like you want an event logging system such as the ELK stack.


Béla Törös

Jul 16, 2019, 4:37:30 AM
to Brian Brazil, Prometheus Users
> This doesn't sound like a good use case for Prometheus.

Thanks, I will check into other options.

> Push vs pull has nothing to do with it, it sounds like you want an event logging system such as the ELK stack.

The push model would help with the "exporter" side of things, where it
wouldn't need to care about the "history" of the time series.

Brian Brazil

Jul 16, 2019, 4:52:45 AM
to Béla Törös, Prometheus Users
On Tue, 16 Jul 2019 at 09:37, Béla Törös <kale...@gmail.com> wrote:
>> This doesn't sound like a good use case for Prometheus.
>
> Thanks, I will check into other options.
>
>> Push vs pull has nothing to do with it, it sounds like you want an event logging system such as the ELK stack.
>
> The push model would help with the "exporter" side of things, where it
> wouldn't need to care about the "history" of the time series.

You're conflating metrics and events (see https://www.robustperception.io/which-kind-of-push-events-or-metrics), which is not the same as push vs pull.

Brian

Béla Törös

Jul 18, 2019, 10:34:54 AM
to Brian Brazil, Prometheus Users
Hi,

I was a bit vague about why the push model would help on the exporter
side. It would help because push systems usually don't require sending
all metrics on every push (whatever the push interval/aggregation may
be).

In Prometheus, following the recommendations, the exporter needs to
expose a metric at every scrape interval even if it is only updated
once a day; with super-high-cardinality metrics this can take a toll
on the exporter. The push model would take away this extra weight.
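The asymmetry described above can be sketched with toy numbers (hypothetical names, no real client or wire format assumed): a pull-style /metrics page has to render every series the exporter still retains, while a push sender only ships the series that changed this interval.

```python
# Toy sketch of pull vs push exposition cost for sparse series.
# 10,000 retained series, of which only ~1% changed this interval.
retained_series = {f'requests_total{{user="u{i}"}}': float(i) for i in range(10_000)}
changed_keys = set(list(retained_series)[:100])

def pull_exposition(state):
    # Every retained series appears in the scrape response,
    # updated or not.
    return [f"{name} {value}" for name, value in state.items()]

def push_payload(state, changed):
    # Only the changed series are transmitted.
    return [f"{name} {state[name]}" for name in changed]

print(len(pull_exposition(retained_series)))          # 10000 lines per scrape
print(len(push_payload(retained_series, changed_keys)))  # 100 lines per push
```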

We have considered the ELK stack, but the overhead of producing
numbers seemed quite a lot greater than with a TSDB that can handle
10M time series in some shape or form.

Thanks for the input, I will check around in other TSDBs.

--
Bela


Brian Brazil

Jul 18, 2019, 12:09:48 PM
to Béla Törös, Prometheus Users
On Thu, 18 Jul 2019 at 15:34, Béla Törös <kale...@gmail.com> wrote:
> Hi,
>
> I was a bit vague about why the push model would help on the exporter
> side. It would help because push systems usually don't require
> sending all metrics on every push (whatever the push
> interval/aggregation may be).

That's about data model assumptions, not push versus pull. 
 
> In Prometheus, following the recommendations, the exporter needs to
> expose a metric at every scrape interval even if it is only updated
> once a day; with super-high-cardinality metrics this can take a toll
> on the exporter. The push model would take away this extra weight.

Even if you pushed somehow to Prometheus, you'd still have to push everything.
 

> We have considered the ELK stack, but the overhead of producing
> numbers seemed quite a lot greater than with a TSDB that can handle
> 10M time series in some shape or form.
>
> Thanks for the input, I will check around in other TSDBs.

There are also things like InfluxDB.