Hi,
I have a datasource (log files) that I would like to turn into stats
over a long period of time.
The way I need this data parsed and broken down will yield
approximately 10M+ timeseries, and that can easily grow 2-3x over time.
The other feature of this data is that most of the timeseries will
probably be sparse, so likely only about ~10k timeseries will
change at each scrape interval.
Obviously the exporter's memory requirements will become a factor if
you need to store an ever-increasing number of timeseries. This is my
first problem: how to keep the exporter's memory requirements low, most
preferably by having it only track metric changes since the last
scrape interval (turning the current counters into gauges).
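To make that idea concrete, here is a rough sketch of what I mean on the exporter side, assuming a Go exporter built on client_golang (the names like deltaCollector and myapp_events_delta are made up). It only exposes the series touched since the last scrape and then forgets them, so memory is bounded by the ~10k active series instead of the 10M+ total. I realize this goes against the usual Prometheus guidance of exposing every series on every scrape, so treat it as an illustration of the idea rather than a recommended pattern.

package main

import (
    "log"
    "net/http"
    "sync"

    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promhttp"
)

// deltaCollector exposes only the series that changed since the last
// scrape and then drops them, keeping memory proportional to the
// active series rather than the total series ever seen.
type deltaCollector struct {
    mu      sync.Mutex
    desc    *prometheus.Desc
    changed map[string]float64 // entity -> delta since last scrape
}

func newDeltaCollector() *deltaCollector {
    return &deltaCollector{
        desc: prometheus.NewDesc(
            "myapp_events_delta", // made-up metric name
            "Events seen since the previous scrape, per entity.",
            []string{"entity"}, nil,
        ),
        changed: make(map[string]float64),
    }
}

// Observe is what the log parser would call for each event.
func (c *deltaCollector) Observe(entity string, v float64) {
    c.mu.Lock()
    c.changed[entity] += v
    c.mu.Unlock()
}

func (c *deltaCollector) Describe(ch chan<- *prometheus.Desc) {
    ch <- c.desc
}

func (c *deltaCollector) Collect(ch chan<- prometheus.Metric) {
    c.mu.Lock()
    defer c.mu.Unlock()
    for entity, v := range c.changed {
        // Exposed as a gauge holding the delta since the previous
        // scrape, so downstream rules would sum deltas instead of
        // using rate() on counters.
        ch <- prometheus.MustNewConstMetric(c.desc, prometheus.GaugeValue, v, entity)
    }
    c.changed = make(map[string]float64) // drop everything after exposing
}

func main() {
    c := newDeltaCollector()
    prometheus.MustRegister(c)
    c.Observe("entity-000001", 1) // stand-in for the log parser
    http.Handle("/metrics", promhttp.Handler())
    log.Fatal(http.ListenAndServe(":9100", nil))
}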
Currently the exporter, as a POC (using counters), has 4 metrics and 3
metric labels. Two of the labels have about 5 possible values each; the
third has values in the millions. This could obviously be reworked into
4 million metrics with 2 labels, or anything in between.
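For illustration, one of the 4 POC counters looks roughly like this (all names made up). The series count is the number of label combinations actually observed, so moving the high-cardinality dimension into the metric name doesn't change the total, only how it is spelled:

package main

import "github.com/prometheus/client_golang/prometheus"

// One of the 4 POC counters. "kind" and "status" have ~5 values each,
// "entity" has millions, so the number of series is the number of
// (kind, status, entity) combinations actually seen in the logs.
var eventsTotal = prometheus.NewCounterVec(
    prometheus.CounterOpts{
        Name: "myapp_events_total",
        Help: "Events parsed from the log files.",
    },
    []string{"kind", "status", "entity"},
)

func main() {
    prometheus.MustRegister(eventsTotal)
    eventsTotal.WithLabelValues("read", "ok", "entity-000001").Inc()
}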
Is this typically a use case where Prometheus is not a good fit, or
are there some tips/tricks that I can use to tackle the cardinality
and sparseness of the data on both the scraper and exporter side? Any
suggestions?
In general this type of data seems more suited to a push-type TSDB,
but I thought I would first check with you whether there is
something Prometheus can do for us in this case.
> This doesn't sound like a good use case for Prometheus.
Thanks, I will look into other options.
> Push vs pull has nothing to do with it; it sounds like you want an event logging system such as the ELK stack.
The push model would help on the "exporter" side of things, where it
wouldn't need to care about the "history" of the timeseries.
Hi,
I was a bit vague on why the push model would help on the exporter
side. It would help because push-based exporters usually aren't
constrained to push all metrics on every push (whatever the push
interval/aggregation may be).
In Prometheus, based on the recommendations, the exporter would need
to expose a metric on every scrape interval even if it only gets
updated once a day; with super-high-cardinality metrics this can take
a toll on the exporter. The push model would take away this extra weight.
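As a rough sketch of what I mean (using Graphite's plaintext line protocol as an arbitrary example of a push target; the host and metric paths are made up), each interval the exporter would only send the ~10k series that actually changed and would keep no state for the rest:

package main

import (
    "fmt"
    "net"
    "time"
)

// pushChanged sends only the series that changed this interval, in
// Graphite's plaintext line format ("metric.path value timestamp"),
// and keeps nothing for the millions of series that stayed silent.
func pushChanged(addr string, changed map[string]float64) error {
    conn, err := net.DialTimeout("tcp", addr, 5*time.Second)
    if err != nil {
        return err
    }
    defer conn.Close()

    now := time.Now().Unix()
    for path, v := range changed {
        if _, err := fmt.Fprintf(conn, "%s %g %d\n", path, v, now); err != nil {
            return err
        }
    }
    return nil
}

func main() {
    // Only the ~10k series that changed this interval get sent.
    changed := map[string]float64{
        "myapp.read.ok.entity-000001": 1,
    }
    if err := pushChanged("graphite.example.com:2003", changed); err != nil {
        fmt.Println("push failed:", err)
    }
}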
We have considered the ELK stack, but the overhead of producing
numbers seemed quite a bit greater than that of a TSDB that can handle
10M timeseries in some shape or form.
Thanks for the input, I will look around at other TSDBs.