Hi,
Having used Prometheus for a few weeks for node_exporter system metrics, with Grafana and some simple functions, I wanted to move on to something more chewy. I'm involved in some IoT work where we expect to have ~1 million 'Things', each of which will send metrics in Prometheus text format approximately once per day. (The device transport is MQTT, with a microservice listening for metrics messages and presenting them to Prometheus for scraping.)
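For context, the bridging microservice is conceptually no more than the sketch below - the broker address, topic layout, port and library (paho.mqtt.golang) are illustrative assumptions here rather than our actual implementation. It just caches the last payload seen per device topic and serves the concatenation on /metrics:

package main

// Conceptual sketch of the MQTT -> /metrics bridge (illustrative only;
// broker address, topic layout and port are assumptions, not our real setup).

import (
	"net/http"
	"sync"

	mqtt "github.com/eclipse/paho.mqtt.golang"
)

func main() {
	var mu sync.Mutex
	latest := map[string][]byte{} // last metrics payload received per device topic

	opts := mqtt.NewClientOptions().AddBroker("tcp://broker:1883")
	client := mqtt.NewClient(opts)
	if tok := client.Connect(); tok.Wait() && tok.Error() != nil {
		panic(tok.Error())
	}

	// Devices publish Prometheus text format to things/<deviceid>/metrics.
	client.Subscribe("things/+/metrics", 0, func(_ mqtt.Client, m mqtt.Message) {
		mu.Lock()
		latest[m.Topic()] = m.Payload()
		mu.Unlock()
	})

	// Prometheus scrapes the concatenation of all cached payloads.
	http.HandleFunc("/metrics", func(w http.ResponseWriter, _ *http.Request) {
		mu.Lock()
		defer mu.Unlock()
		for _, payload := range latest {
			w.Write(payload)
			w.Write([]byte("\n"))
		}
	})
	http.ListenAndServe(":8080", nil)
}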
To get a feel for the AWS machine sizes / disk performance needed to support 20,000 (an arbitrary figure) complete sets of metrics arriving within one short window, I (ab)used the node_exporter's textfile collector. I have 20,000 .prom files, each approximately 8 kB and containing histograms, counters and gauges; the whole http://x.x.x.x:9101/metrics scrape is about 160 MB.
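For reference, the test data was produced with a throwaway generator conceptually like this (metric names, label values and the output directory below are invented for illustration; the real files follow our device schema and come out at roughly 8 kB each):

package main

// Throwaway generator for the 20,000 test .prom files (names and paths are
// illustrative). Each file gets a batch of gauges, counters and a cut-down
// histogram (bucket lines only, _sum/_count omitted for brevity).

import (
	"fmt"
	"os"
	"path/filepath"
)

func main() {
	dir := "/var/lib/node_exporter/textfile" // node_exporter textfile collector directory
	for dev := 0; dev < 20000; dev++ {
		id := fmt.Sprintf("%016d", dev)
		var b []byte
		for g := 0; g < 40; g++ {
			b = append(b, fmt.Sprintf(
				"bounded_fifo_buffer_gauges{deviceid=%q,gauge=\"buffer_%d\"} %d\n", id, g, g*7)...)
		}
		for c := 0; c < 40; c++ {
			b = append(b, fmt.Sprintf(
				"thing_events_total{deviceid=%q,event=\"event_%d\"} %d\n", id, c, c*13)...)
		}
		for _, le := range []string{"0.1", "1", "10", "+Inf"} {
			b = append(b, fmt.Sprintf(
				"thing_rtt_seconds_bucket{deviceid=%q,le=%q} %d\n", id, le, dev%100)...)
		}
		out := filepath.Join(dir, fmt.Sprintf("device_%s.prom", id))
		if err := os.WriteFile(out, b, 0o644); err != nil {
			panic(err)
		}
	}
}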
Given that we need to record data for many devices, this leads to very high cardinality on 'deviceid', which the docs clearly state is an anti-pattern [1]. The series look along these lines:
bounded_fifo_buffer_gauges{deviceid="0000000000000003",gauge="bridge_outgoing_pub_buffer"} 79.0
bounded_fifo_buffer_gauges{deviceid="0000000000000003",gauge="bridge_outgoing_unsub_queue"} 55.0
[...]
I'm running Prometheus 1.6.1 on an AWS i3.large (dual-core, 16GB RAM) with -storage.local.path on an XFS-formatted NVMe SSD, so there should be plenty of IO headroom. [2]
Regrettably, even with scrape_interval: 60s and scrape_timeout: 50s, I'm regularly seeing messages like:
WARN[0021] Storage has entered rushed mode. chunksToPersist=0 memoryChunks=519896 source=storage.go:1842 urgencyScore=1
ERRO[0022] Storage needs throttling. Scrapes and rule evaluations will be skipped. chunksToPersist=0 memoryChunks=519896 source=storage.go:982 urgencyScore=1
This of course leads to poor performance for other queries, and makes any monitoring that relies on seeing an up{} sample within the last 60 seconds flap.
Looking at iostat output, the storage doesn't appear to be very busy: perhaps 60 MB/s at peak with ~5,000-6,000 IOPS.
The CPU, on the other hand, is being chewed - very little idle time - yet a 10-second 'perf record' doesn't show any real smoking gun:
Overhead  Command     Shared Object  Symbol
   8.07%  prometheus  prometheus     [.] runtime.scanobject
   6.64%  prometheus  prometheus     [.] runtime.heapBitsForObject
   4.90%  prometheus  prometheus     [.] runtime.greyobject
   2.00%  prometheus  prometheus     [.] runtime.memmove
   1.92%  prometheus  prometheus     [.] runtime.mallocgc
   1.32%  prometheus  prometheus     [.] runtime.updatememstats
I haven't tweaked any of the RAM settings as yet. Is this type of behaviour to be expected, given the way I'm ingesting data and the format / volume of it?
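(For reference, the RAM-related knob I believe applies here but haven't touched - please correct me if I'm misreading the 1.6 storage docs - is along these lines:

-storage.local.target-heap-size=10737418240   # ~2/3 of physical RAM rather than the 2GiB default

plus the older -storage.local.memory-chunks / -storage.local.max-chunks-to-persist pair, which I understand 1.6 deprecates in favour of the above.)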
Cheers,
Gavin.