Labels with high cardinality leading to poor performance


Gavin Hamill

Apr 21, 2017, 8:57:17 AM
to Prometheus Users
Hi,

Having used Prometheus for a few weeks for node_exporter system metrics, with Grafana and some simple functions, I wanted to move on to something more chewy. I'm involved in some IoT work where we expect to have ~1 million 'Things', each of which will send metrics in Prometheus text format approximately once per day. (The device transport is MQTT, with a microservice that listens for metrics messages and presents them to Prometheus for scraping.)
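
For illustration, a very rough sketch of what that bridge microservice looks like - not our actual service; the topic layout, port and per-message parsing here are invented, and it pretends each MQTT message carries a single value rather than a full text-format payload - using paho-mqtt (1.x style API) and prometheus_client:

import paho.mqtt.client as mqtt
from prometheus_client import start_http_server, Gauge

# One gauge family keyed by deviceid + gauge name, mirroring the sample
# series further down in this post.
BUFFER_GAUGES = Gauge('bounded_fifo_buffer_gauges', 'Per-device buffer levels',
                      ['deviceid', 'gauge'])

def on_message(client, userdata, msg):
    # Assumed topic layout: metrics/<deviceid>/<gauge>, with a numeric payload.
    _, deviceid, gauge_name = msg.topic.split('/', 2)
    BUFFER_GAUGES.labels(deviceid=deviceid, gauge=gauge_name).set(float(msg.payload))

start_http_server(9102)                 # expose /metrics for Prometheus to scrape
client = mqtt.Client()
client.on_message = on_message
client.connect('mqtt-broker.example.com', 1883)
client.subscribe('metrics/#')
client.loop_forever()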

To get a feel for the necessary AWS machine sizes / disk performance to support 20,000 (arbitrary figure) complete sets of metrics within a short timeframe, I (ab)used the node_exporter's textfile collector. I have 20,000 .prom files, each containing histograms, counters and gauges, and each approx 8kB in size. The whole http://x.x.x.x:9101/metrics scrape is 160MB.

Sample output and the Python generator script at https://gist.github.com/gdhgdhgdh/20ce03cf02540f9a9fdb1e2f46441f47
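
The real generator is in that gist; the shape of it is basically just "write N label-heavy .prom files into the textfile collector directory", something like the below (the path and values are placeholders, and the real files also contain histograms and counters to get to ~8kB each):

import os
import random

TEXTFILE_DIR = '/var/lib/node_exporter/textfile'   # wherever the textfile collector directory flag points
GAUGES = ['bridge_outgoing_pub_buffer', 'bridge_outgoing_unsub_queue']

for n in range(20000):
    deviceid = '%016d' % n
    lines = ['bounded_fifo_buffer_gauges{deviceid="%s",gauge="%s"} %d\n'
             % (deviceid, gauge, random.randint(0, 100))
             for gauge in GAUGES]
    with open(os.path.join(TEXTFILE_DIR, deviceid + '.prom'), 'w') as f:
        f.writelines(lines)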

Given we need to record data for many devices, this leads to very high cardinality on 'deviceid', which the docs clearly state [1] is an anti-pattern. The series look like this:

bounded_fifo_buffer_gauges{deviceid="0000000000000003",gauge="bridge_outgoing_pub_buffer"} 79.0
bounded_fifo_buffer_gauges{deviceid="0000000000000003",gauge="bridge_outgoing_unsub_queue"} 55.0
[...]

I'm running Prometheus 1.6.1 on AWS i3.large (dual-core, 16GB RAM) with -storage.local.path on an XFS-formatted NVMe SSD for huge IO performance. [2]

Regrettably, even with scrape_interval: 60s and scrape_timeout: 50s, I'm regularly seeing messages like:

WARN[0021] Storage has entered rushed mode.              chunksToPersist=0 memoryChunks=519896 source=storage.go:1842 urgencyScore=1
ERRO[0022] Storage needs throttling. Scrapes and rule evaluations will be skipped.  chunksToPersist=0 memoryChunks=519896 source=storage.go:982 urgencyScore=1

This of course leads to poor performance for other queries, and to flapping of monitoring that relies on seeing an up{} sample within the last 60 seconds.

Looking at iostat output, the storage doesn't appear to be very busy - perhaps 60 MB/sec at peak with ~5000-6000 IOPS.

On the other hand, the CPU is being chewed: very little idle time. Even so, a 'perf record' over 10 seconds doesn't show any real smoking gun:

Overhead  Command     Shared Object      Symbol
   8.07%  prometheus  prometheus         [.] runtime.scanobject
   6.64%  prometheus  prometheus         [.] runtime.heapBitsForObject
   4.90%  prometheus  prometheus         [.] runtime.greyobject
   3.54%  prometheus  prometheus         [.] github.com/prometheus/prometheus/vendor/github.com/golang/snappy.encodeBlock
   2.00%  prometheus  prometheus         [.] runtime.memmove
   1.92%  prometheus  prometheus         [.] runtime.mallocgc
   1.71%  prometheus  prometheus         [.] github.com/prometheus/prometheus/vendor/github.com/golang/snappy.decode
   1.32%  prometheus  prometheus         [.] runtime.updatememstats                                                                                                                           

I haven't tweaked any of the RAM settings as yet. Is this type of behaviour to be expected, given the way I'm ingesting data and the format/volume of that data?

Cheers,
Gavin.

Brian Brazil

Apr 21, 2017, 9:09:45 AM
to Gavin Hamill, Prometheus Users
I estimate that's 1.6M time series, which for open head chunks alone will need over 3GB of RAM, so that'll fall over pretty much instantly with the defaults. You need a lot more RAM.
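
Back-of-envelope, with rough assumed sizes (about 100 bytes per exposed sample line, and about 2KiB of heap per open series for the head chunk plus bookkeeping, which is roughly what the 3GB figure implies):

scrape_bytes = 160 * 10**6            # the 160MB exposition from the single target
series = scrape_bytes // 100          # assume ~100 bytes per sample line -> 1.6M series
heap_bytes = series * 2 * 1024        # assume ~2KiB of heap per open series
print(series, heap_bytes / 1024**3)   # 1600000 series, ~3 GiB of heap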

Having it all come from one target will cause a big load spike, which may not be the best thing.

Brian
 

Ben Kochie

Apr 21, 2017, 9:38:58 AM
to Gavin Hamill, Prometheus Users
I think this is one of those cases where Prometheus isn't a great solution. Prometheus is designed for continuous monitoring; gathering data from IoT devices is something more suited to a database like InfluxDB.

Another option would be to use the Cortex[0] database. It's designed to be more horizontally scalable for something like IoT devices. It also supports push-based ingestion.



Gavin Hamill

Apr 21, 2017, 10:48:52 AM
to Prometheus Users
Many thanks to you both for the timely responses. I've bumped up the -storage.local.target-heap-size from the default 2G to a more sensible 8G, and the import process has been turbo-charged. It's now completing in 15-20 seconds where it would previously drag on for several minutes (if it completed at all).

I now understand a bit more about how memory is used, so that can only be a good thing.

I take the comments about the single-target huge import being a real drain on the system, and I agree. Fortunately we already included the concept of partitioning in the platform design, and this exercise will help inform how many clients we should allocate to each partition - a figure driven by the combined capacities of the MQTT service, the metrics collection/reporting, and some ancillary services.

I'll also be sure to take a look at Cortex. I've used InfluxDB before now, but only as a reaction to 'Graphite's whisper format is a pain' rather than for any serious work :)

Cheers,
Gavin.


