Is Prometheus suitable for ephemeral time series?

JAE HOON KO

May 10, 2016, 10:42:37 PM

to Prometheus Developers

Hi,

I've got a question about collecting ephemeral metrics.

As far as I know, Prometheus stores scraped samples in in-memory chunks and persists them only when they become full.
Each chunk stores samples belonging to a single time series.

So, let's say a time series is sampled only once. A chunk will be created for that sample. Will that chunk remain in memory forever?
If incomplete chunks remain in memory forever, the Prometheus server will end up throttling scrapes once the maximum number of chunks is exceeded by 10%.

A more realistic scenario is as follows:

A Docker container runs only when a request arrives. Once it has processed the request, the container stops.
In the meantime, the container exports application-specific metrics, which carry the Docker container ID as a label.
Because the total number of Docker containers is limited, the number of time series active at any instant is bounded (let's say it stays far below 1M).
However, the total cardinality will grow indefinitely, because each new Docker container has a different container ID.
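
To make the arithmetic of this scenario concrete, here is a minimal sketch in Python. The function name and all numbers (100 concurrent containers, 5 metrics per container, the request counts) are illustrative assumptions, not measurements. It only shows that the number of series active at one instant stays bounded while the number of distinct series ever created grows with every new container ID.

```python
def series_counts(total_requests, max_concurrent, metrics_per_container):
    """Return (peak_active_series, total_distinct_series).

    Assumes one fresh container (and therefore one fresh container ID
    label value) per request, with at most `max_concurrent` containers
    alive at any instant. All parameters are hypothetical.
    """
    # Series alive at the same time are bounded by the concurrency limit.
    peak_active = min(total_requests, max_concurrent) * metrics_per_container
    # Every request introduces a new label value, hence new series.
    total_distinct = total_requests * metrics_per_container
    return peak_active, total_distinct


if __name__ == "__main__":
    for requests in (100, 10_000, 1_000_000):
        active, total = series_counts(requests, max_concurrent=100,
                                      metrics_per_container=5)
        print(f"requests={requests:>9}  peak active={active:>5}  "
              f"total distinct={total:>9}")
```

Under these assumptions the peak stays at 500 series no matter how many requests arrive, while the total number of distinct series is five times the request count.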

Am I misunderstanding something? Or is Prometheus simply not the right solution for ephemeral time series?

Brian Brazil

May 10, 2016, 10:59:04 PM

to JAE HOON KO, Prometheus Developers

On 11 May 2016 at 03:42, JAE HOON KO <roen...@gmail.com> wrote:

> So, let's say a time series is sampled only once. A chunk will be created for that sample. Will that chunk remain in memory forever?

It'll be written out after an hour of no new data.

> Am I misunderstanding something? Or is Prometheus simply not the right solution for ephemeral time series?

You're not looking for time series here, you're looking for an event logging solution such as the ELK stack.

--

Brian Brazil

www.robustperception.io

JAE HOON KO

May 11, 2016, 12:34:55 AM

to Prometheus Developers, roen...@gmail.com

Oops, thanks for the quick answer.

Does 'written out' mean that the chunk will be persisted AND removed from memory?
If it's only persisted and still counted as an in-memory chunk, Prometheus will throttle scraping.

Actually, I tested this scenario after I got your answer. I let Prometheus scrape only once and then took the target down so that no more samples were gathered.
Even after an hour and a half, the number of chunks in memory (prometheus_local_storage_memory_chunks) does not seem to decrease.
Is there a config parameter to change that one hour?

Julius Volz

May 11, 2016, 4:01:22 AM

to JAE HOON KO, Prometheus Developers

On Wed, May 11, 2016 at 6:34 AM, JAE HOON KO <roen...@gmail.com> wrote:

> Does 'written out' mean that the chunk will be persisted AND removed from memory?
> Is there a config parameter to change that 1 hour?

No, but to clarify: The chunk will be scheduled for persistence after one hour, after which it *may* be evicted from memory if necessary. You can control how many chunks Prometheus tries to keep in memory via the -storage.local.memory-chunks flag (see also https://prometheus.io/docs/operating/storage/). By default, Prometheus will try to keep 1048576 chunks in memory (even if they are persisted already).
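
A toy model may help explain why the gauge did not drop in the test described above. The sketch below (Python, standard library only) is not Prometheus code: the `Chunk` class, the `maintain` function, and the single maintenance pass are hypothetical simplifications. It marks idle chunks as persisted immediately rather than scheduling them, and it ignores chunks pinned by running queries. It only encodes the two rules stated in this reply: a chunk becomes eligible for persistence after an hour without new samples, and persisted chunks leave memory only when the in-memory count exceeds the configured target.

```python
from dataclasses import dataclass

IDLE_BEFORE_PERSIST_S = 3600      # "an hour of no new data" (from the thread)
DEFAULT_MEMORY_CHUNKS = 1048576   # default quoted in the thread


@dataclass
class Chunk:
    """Hypothetical stand-in for one in-memory chunk."""
    last_append_s: float
    persisted: bool = False


def maintain(chunks, now_s, target=DEFAULT_MEMORY_CHUNKS):
    """Run one toy maintenance pass and return the chunks still in memory."""
    # Step 1: idle chunks are persisted. They are NOT dropped here.
    for c in chunks:
        if now_s - c.last_append_s >= IDLE_BEFORE_PERSIST_S:
            c.persisted = True
    # Step 2: evict persisted chunks only while the count is over the target.
    excess = len(chunks) - target
    kept = []
    for c in chunks:
        if excess > 0 and c.persisted:
            excess -= 1          # evicted from memory, still on disk
        else:
            kept.append(c)
    return kept


if __name__ == "__main__":
    # One series scraped once at t=0, inspected 90 minutes later.
    in_memory = maintain([Chunk(last_append_s=0.0)], now_s=90 * 60)
    print(len(in_memory), in_memory[0].persisted)
```

In this model the single chunk is persisted after 90 minutes but stays in memory, because one chunk is far below the target of 1048576, which matches the unchanged prometheus_local_storage_memory_chunks value observed in the test.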

--
You received this message because you are subscribed to the Google Groups "Prometheus Developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to prometheus-devel...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

JAE HOON KO

May 11, 2016, 5:36:22 AM

to Prometheus Developers, roen...@gmail.com

Wow, thanks.

So we don't have to worry about the urgency score or the throttling decision taking into account chunks that have already been persisted but not yet evicted.

One follow-up question: does "prometheus_local_storage_memory_chunks" include the chunks to persist? i.e. "prometheus_local_storage_memory_chunks" = "prometheus_local_storage_chunks_to_persist" + chunks-not-yet-to-persist?

Björn Rabenstein

May 11, 2016, 5:58:49 AM

to JAE HOON KO, Prometheus Developers

On 11 May 2016 at 11:36, JAE HOON KO <roen...@gmail.com> wrote:
> One following up question: does "prometheus_local_storage_memory_chunks"
> include the chunks to persist? i.e. "prometheus_local_storage_memory_chunks"
> = "prometheus_local_storage_chunks_to_persist" + chunks-not-yet-to-persist?

Almost... "prometheus_local_storage_memory_chunks" is simply the
number of chunks in memory. These could be in the following states:

- persisted and currently not used in a query (those chunks are the
  only ones that could be evicted if needed)
- persisted but currently used in a query
- waiting to be persisted (which is
  "prometheus_local_storage_chunks_to_persist")
- not eligible for persistence because still being written to
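
The four states listed above can be tallied in a short sketch. This is an illustration in Python, not how the server computes its metrics: the `ChunkState` enum, the `summarize` function, and the "evictable" figure are hypothetical names introduced here. It shows that the in-memory gauge is the sum over all four states, that the to-persist gauge covers only one of them, and that only persisted chunks not used in a query are candidates for eviction.

```python
from collections import Counter
from enum import Enum


class ChunkState(Enum):
    """The four in-memory states named in the list above."""
    PERSISTED_IDLE = "persisted, not used in a query"
    PERSISTED_IN_QUERY = "persisted, used in a query"
    WAITING_TO_PERSIST = "waiting to be persisted"
    OPEN = "still being written to"


def summarize(states):
    """Tally chunk states into the two gauges discussed in the thread."""
    counts = Counter(states)
    return {
        # Every chunk in memory counts, whatever its state.
        "prometheus_local_storage_memory_chunks": sum(counts.values()),
        # Only chunks queued for persistence count here.
        "prometheus_local_storage_chunks_to_persist":
            counts[ChunkState.WAITING_TO_PERSIST],
        # Hypothetical helper figure: chunks that could be evicted.
        "evictable": counts[ChunkState.PERSISTED_IDLE],
    }


if __name__ == "__main__":
    # Hypothetical mix of chunk states.
    mix = ([ChunkState.PERSISTED_IDLE] * 6
           + [ChunkState.PERSISTED_IN_QUERY] * 1
           + [ChunkState.WAITING_TO_PERSIST] * 2
           + [ChunkState.OPEN] * 3)
    for name, value in summarize(mix).items():
        print(f"{name} = {value}")
```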

--
Björn Rabenstein, Engineer
http://soundcloud.com/brabenstein

Brian Brazil

May 11, 2016, 9:27:56 AM

to JAE HOON KO, Prometheus Developers

On 11 May 2016 at 10:36, JAE HOON KO <roen...@gmail.com> wrote:

> Wow, thanks.

To be a bit clearer, this is not a use case that Prometheus is suited to. We're a metrics-based time series database, and this is an event-logging use case. Accordingly, you'll find it very difficult to get this working at even moderate scale with Prometheus, and using PromQL will also be challenging.

What I'd suggest is that you either make your containers long-lived so that they process multiple requests, or only monitor the thing that is starting/stopping the containers and routing the requests.
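
The second suggestion can be illustrated with a small sketch in plain Python (no client library is used). The record layout, the metric name `requests_total`, and the `outcome` label are hypothetical. It contrasts counting per container, which creates one series per container ID, with counting in the dispatcher by a small fixed label such as the request outcome.

```python
from collections import Counter


def count_by_container(records):
    """Per-container counting: one series per container ID (unbounded)."""
    return Counter(("requests_total", r["container_id"]) for r in records)


def count_by_outcome(records):
    """Dispatcher-level counting: one series per outcome (bounded)."""
    return Counter(("requests_total", r["outcome"]) for r in records)


if __name__ == "__main__":
    # Hypothetical records: 1,000 one-shot containers, two possible outcomes.
    records = [
        {"container_id": f"c-{i:04d}", "outcome": "ok" if i % 10 else "error"}
        for i in range(1000)
    ]
    print("series keyed by container id:", len(count_by_container(records)))
    print("series keyed by outcome:", len(count_by_outcome(records)))
```

With these made-up records the per-container variant grows by one series per request, while the dispatcher-level variant stays at two series however many containers come and go.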

Brian

JAE HOON KO

May 11, 2016, 8:47:30 PM

to Prometheus Developers, roen...@gmail.com

Thank you for your comment.
I'll keep that in mind.
