replaying WAL consumes huge memory


Vincent Chen

Nov 25, 2019, 3:58:08 AM
to Prometheus Users
One of my Prometheus servers deployed in Kubernetes was restarted for some reason, and I found the memory usage is huge during "replaying WAL".

Prometheus version: v2.13.1
Arguments: --storage.tsdb.retention.time=1d  --web.enable-lifecycle  --storage.tsdb.no-lockfile  --web.route-prefix=/  --storage.tsdb.min-block-duration=2h  --storage.tsdb.max-block-duration=2h
Server: 160 GB RAM, 20 cores

Here is the RSS value from the process_resident_memory_bytes metric; you can see this server has already restarted 3 times. It gets OOM killed while replaying the WAL:

level=info ts=2019-11-25T08:42:27.192Z caller=head.go:562 component=tsdb msg="WAL segment loaded" segment=10648 maxSegment=10762
level=info ts=2019-11-25T08:42:31.414Z caller=head.go:562 component=tsdb msg="WAL segment loaded" segment=10649 maxSegment=10762
level=info ts=2019-11-25T08:42:37.323Z caller=head.go:562 component=tsdb msg="WAL segment loaded" segment=10650 maxSegment=10762
level=info ts=2019-11-25T08:42:43.324Z caller=head.go:562 component=tsdb msg="WAL segment loaded" segment=10651 maxSegment=10762
level=info ts=2019-11-25T08:42:49.385Z caller=head.go:562 component=tsdb msg="WAL segment loaded" segment=10652 maxSegment=10762
level=info ts=2019-11-25T08:43:04.284Z caller=head.go:562 component=tsdb msg="WAL segment loaded" segment=10653 maxSegment=10762
level=warn ts=2019-11-25T08:43:54.134Z caller=main.go:501 msg="Received SIGTERM, exiting gracefully..."
level=info ts=2019-11-25T08:43:54.136Z caller=main.go:526 msg="Stopping scrape discovery manager..."
level=info ts=2019-11-25T08:43:54.137Z caller=main.go:540 msg="Stopping notify discovery manager..."
level=info ts=2019-11-25T08:43:54.137Z caller=main.go:562 msg="Stopping scrape manager..."




Before the restart, the series count was around 20~25 million, but I found the series count in the head chunk goes up to 100 million while "replaying WAL".
Around 4x the usual....
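
(For reference, I'm reading these numbers from Prometheus's own metrics; this assumes the default self-monitoring scrape job:)

    # number of series currently in the head block
    prometheus_tsdb_head_series
    # resident memory of the Prometheus process
    process_resident_memory_bytes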

I SSHed into the node and checked the wal folder,
and found lots of WAL files from BEFORE two hours ago that had not been purged.



total 39G
drwxrwsr-x  3 root root  16K Nov 25 08:44 .
drwxrwsr-x 18 root root 4.0K Nov 25 07:38 ..
-rw-rw-r--  1 root root 121M Nov 25 03:47 00010450
-rw-rw-r--  1 root root 122M Nov 25 03:47 00010451
-rw-rw-r--  1 root root 128M Nov 25 03:47 00010452
-rw-rw-r--  1 root root 124M Nov 25 03:47 00010453
-rw-rw-r--  1 root root 124M Nov 25 03:47 00010454
-rw-rw-r--  1 root root 126M Nov 25 03:47 00010455
-rw-rw-r--  1 root root 128M Nov 25 03:47 00010456
-rw-rw-r--  1 root root 128M Nov 25 03:47 00010457
-rw-rw-r--  1 root root 125M Nov 25 03:47 00010458
-rw-rw-r--  1 root root 124M Nov 25 03:47 00010459
-rw-rw-r--  1 root root 128M Nov 25 03:48 00010460
-rw-rw-r--  1 root root 128M Nov 25 03:49 00010461
-rw-rw-r--  1 root root 128M Nov 25 03:49 00010462
-rw-rw-r--  1 root root 128M Nov 25 03:50 00010463
-rw-rw-r--  1 root root 128M Nov 25 03:51 00010464
-rw-rw-r--  1 root root 128M Nov 25 03:52 00010465

~~~~~~~~~~~~~SKIP~~~~~~~~~~~~~~~~~~~~~~

-rw-rw-r--  1 root root 128M Nov 25 07:28 00010744
-rw-rw-r--  1 root root 128M Nov 25 07:29 00010745
-rw-rw-r--  1 root root 128M Nov 25 07:29 00010746
-rw-rw-r--  1 root root 121M Nov 25 07:30 00010747
-rw-rw-r--  1 root root 128M Nov 25 07:30 00010748
-rw-rw-r--  1 root root 128M Nov 25 07:31 00010749
-rw-rw-r--  1 root root 128M Nov 25 07:32 00010750
-rw-rw-r--  1 root root 124M Nov 25 07:32 00010751
-rw-rw-r--  1 root root 128M Nov 25 07:33 00010752
-rw-rw-r--  1 root root 128M Nov 25 07:33 00010753
-rw-rw-r--  1 root root 128M Nov 25 07:34 00010754
-rw-rw-r--  1 root root 128M Nov 25 07:35 00010755
-rw-rw-r--  1 root root 128M Nov 25 07:36 00010756
-rw-rw-r--  1 root root 128M Nov 25 07:38 00010757
-rw-rw-r--  1 root root 128M Nov 25 07:39 00010758
-rw-rw-r--  1 root root  35M Nov 25 07:39 00010759
-rw-r--r--  1 root root    0 Nov 25 07:39 00010760
-rw-r--r--  1 root root    0 Nov 25 08:01 00010761
-rw-r--r--  1 root root    0 Nov 25 08:23 00010762
-rw-r--r--  1 root root    0 Nov 25 08:44 00010763
drwxrwsr-x  2 root root  12K Nov 25 07:17 checkpoint.010449

Please correct me if I'm wrong:
1. I think Prometheus should only need to replay the WAL files within the last 2 hours, but the last checkpoint here is checkpoint.010449, so on restart it replays every segment after that (00010450 up to 10762, roughly 4 hours' worth; see the quick check below).
2. I found that if the Prometheus server successfully replays the WAL files, it immediately removes series (no idea why).
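
A quick way to see the replay range (a sketch, assuming the data directory is mounted at /prometheus):

    # oldest and newest WAL segments, i.e. the range replayed on restart
    ls /prometheus/wal | grep -v checkpoint | sort | sed -n '1p;$p'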


Vincent Chen

Nov 25, 2019, 4:30:54 AM
to Prometheus Users
Any suggestions for a 25 million series setup?



Ben Kochie

Nov 25, 2019, 9:38:13 AM
to Vincent Chen, Prometheus Users
It seems like it's time to start sharding your Prometheus setup. I usually recommend about 2M series per server to keep them small and fast.
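
One common way to do this is hashmod relabeling, so each server keeps only its slice of the targets. A sketch (the job name and shard count here are placeholders):

    scrape_configs:
      - job_name: 'node'
        relabel_configs:
          - source_labels: [__address__]
            modulus: 4               # total number of Prometheus shards
            target_label: __tmp_hash
            action: hashmod
          - source_labels: [__tmp_hash]
            regex: '0'               # this server keeps only shard 0
            action: keep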

On Mon, Nov 25, 2019, 03:30 Vincent Chen <silence...@gmail.com> wrote:
Any suggestions for 25 Mil series setup?




Vincent Chen

Nov 25, 2019, 9:50:30 PM
to Prometheus Users
Sharding could be an option, but then we have to deal with consolidating queries across servers.

Is it possible to shorten the checkpoint interval?
I found that the old Prometheus 1.x had a storage.local.checkpoint-interval argument to set the interval, but it is not available in the current release.
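
From what I can tell, 2.x has no direct replacement: a checkpoint is written when the head block is compacted and the WAL gets truncated, so the cadence follows the block-duration flags we already set:

    --storage.tsdb.min-block-duration=2h   # head compaction (and thus WAL
    --storage.tsdb.max-block-duration=2h   # truncation/checkpointing) cadence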




On Monday, November 25, 2019 at 10:38:13 PM UTC+8, Ben Kochie wrote:

Benoit Dubois

Dec 3, 2019, 3:34:42 PM
to Prometheus Users
Consolidation can easily be done through VictoriaMetrics, Thanos, Trickster, and many other solutions.
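
For example, with Thanos a single querier can fan out and deduplicate across all the shards (a sketch; the store addresses are placeholders):

    thanos query \
      --store=prometheus-shard-0:10901 \
      --store=prometheus-shard-1:10901 \
      --store=prometheus-shard-2:10901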