performance impact of high churn rate

Johny

Apr 11, 2022, 8:10:14 PM
to Prometheus Users

When time-series label values are regularly replaced with new ones, the old series become inactive and receive no new data points. Keeping the ingestion rate constant, what is the performance impact of a high churn rate on ingestion, querying, compaction, and other operations?

For example, container_memory_usage_bytes reports the memory usage of a pod's containers in Kubernetes. It carries pod and container labels holding the pod name and container name, respectively; these may change often due to deployment changes, autoscaling, restarts of crashed pods, or load balancing in a large-scale real-time system.
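
To get a feel for the cardinality this produces, a rough PromQL sketch (the metric and label names are from the example above; nothing else is assumed):

    # Current number of active series for this metric, i.e. its cardinality
    count(container_memory_usage_bytes)

    # Cardinality broken down by pod, to see which workloads churn most
    count(container_memory_usage_bytes) by (pod)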

I believe inactive time series should be regularly flushed out of active memory, so the high cardinality caused by a high churn rate should not cause high RAM usage.
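
One way to test this assumption on a running server is to watch Prometheus's own TSDB self-monitoring metrics, e.g.:

    # Series currently held in the in-memory head block
    prometheus_tsdb_head_series

    # Rate at which brand-new series are created, i.e. the churn rate
    rate(prometheus_tsdb_head_series_created_total[5m])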

Compaction runs in the background, packing time-series blocks into bigger blocks. If the ingestion rate is constant, I don't foresee an impact on compaction runtime or resource usage.
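
Compaction cost can likewise be checked directly rather than guessed at; a sketch using Prometheus's self-monitoring metrics (assuming a recent Prometheus 2.x):

    # How often compactions run
    rate(prometheus_tsdb_compactions_total[1h])

    # Time spent per compaction (90th percentile over the last hour)
    histogram_quantile(0.9, rate(prometheus_tsdb_compaction_duration_seconds_bucket[1h]))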

Querying for a specific pod/container (most commonly a container) would touch only the blocks within the query's time horizon. Unless the query spans a large time range, I don't think it should significantly affect query run time or consume too much CPU/RAM.
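
For instance, a typical scoped query of this kind (the pod name here is a made-up placeholder):

    # Average memory usage over the last hour for one pod's containers
    avg_over_time(container_memory_usage_bytes{pod="myapp-7d9f8b6c4-x2kqp", container!=""}[1h])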

Is my understanding correct? Are there other performance considerations for a high churn rate?

Brian Candler

Apr 12, 2022, 3:44:33 AM
to Prometheus Users
> I believe inactive time series should be regularly flushed out of active memory, so the high cardinality caused by a high churn rate should not cause high RAM usage.

As I understand it, the full set of timeseries seen will remain in the "head chunk" for 2 hours.  So high churn rate *does* cause high RAM usage.  If you create and destroy 100 pods per minute, and each pod generates 1000 metrics, then in 2 hours that's 12 million timeseries.  In several ways the stress on the TSDB is similar to scraping 12 million timeseries, even though in a particular scrape you'll only be ingesting data for a small subset of those.
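
Spelling that arithmetic out (with Brian's illustrative numbers):

    100 pods/min × 1,000 series/pod × 120 min = 12,000,000 distinct series in the head block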

You may find these useful:

Also, this is an old document (and sadly no longer published at a stable URL), but it gives the design of the TSDB from when Prometheus 2.0 was created:
