Custom flushing time to FS?


Marian Velez

Mar 12, 2023, 6:28:28 PM
to Prometheus Users
Hi!
I need a hand delaying I/O flush operations to the hardware/filesystem, because my SSD has a poor remaining lifetime.
I can't replace the SSD at the moment since I'm at a very remote location, so in the meantime I was thinking of extending its life as much as possible by flushing Prometheus's synchronous I/O operations to the filesystem only every 5 minutes or so.

Initially I was thinking of a hybrid setup in which I run the DB on tmpfs and sync it to the proper FS, but the DB keeps breaking because of open files, which is kind of expected.

On the other hand, I couldn't find any custom Linux filesystem, FUSE or not, that would let me control how often the underlying FS is flushed, so I was trying to see whether I could do that on the Prometheus DB side.

Do you have any clue if this is achievable? 

Thanks in advance!

Julien Pivotto

Mar 12, 2023, 7:18:48 PM
to Marian Velez, Prometheus Users
Hello,

It's not possible directly.

One possible workaround that comes to mind is using snapshots.

You could take periodic snapshots of the Prometheus database and flush
them to the filesystem at a lower frequency, say every 5 minutes, as you
mentioned.

Then you can sync the snapshot to disk.

Note: You can decide to snapshot the head or just the blocks.
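
For reference, a minimal sketch of triggering such a snapshot through the TSDB admin API. It assumes Prometheus listens on localhost:9090 and was started with --web.enable-admin-api; the helper names and the PROM_URL variable are made up for illustration.

```shell
#!/bin/sh
# Assumed Prometheus address; override PROM_URL in the environment if needed.
PROM_URL="${PROM_URL:-http://localhost:9090}"

# Hypothetical helper: build the snapshot URL for the TSDB admin API.
# skip_head=true snapshots only the persisted blocks,
# skip_head=false also includes the in-memory head.
snapshot_url() {
    skip_head="${1:-false}"
    printf '%s/api/v1/admin/tsdb/snapshot?skip_head=%s\n' "$PROM_URL" "$skip_head"
}

# Trigger the snapshot; Prometheus writes it under <data-dir>/snapshots/.
take_snapshot() {
    curl -fsS -XPOST "$(snapshot_url "$1")"
}
```

Calling `take_snapshot true` would snapshot just the blocks, and `take_snapshot false` would include the head as well.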


--
Julien Pivotto
@roidelapluie

Brian Candler

Mar 13, 2023, 4:49:44 AM
to Prometheus Users
Depending on other constraints you have, you could run prometheus in agent mode, with its WAL on ramdisk, doing remote_write to some other system.
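
A rough sketch of what that setup could look like, assuming a Prometheus 2.x release with the agent feature flag. The remote_write URL, the scrape target, and the paths are placeholders, and the helper names are invented for this example.

```shell
#!/bin/sh
# Write a minimal agent-mode config to the given path.
# The remote_write URL and the scrape target are placeholders.
write_agent_config() {
    cat > "$1" <<'EOF'
global:
  scrape_interval: 30s
scrape_configs:
  - job_name: node
    static_configs:
      - targets: ['localhost:9100']
remote_write:
  - url: https://metrics.example.org/api/v1/write
EOF
}

# Print the command line that would start the agent with its WAL
# in the given directory (for example a tmpfs mount).
agent_command() {
    printf 'prometheus --enable-feature=agent --config.file=%s --storage.agent.path=%s\n' "$1" "$2"
}
```

The ramdisk itself could be created as root with something like `mount -t tmpfs -o size=256m tmpfs /mnt/prom-wal`. Keep in mind that anything in the WAL that has not yet been sent to the remote system is lost on reboot or power failure.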

I also remember reading recently about a change in VictoriaMetrics to defer the flushing of memory to reduce SSD wear:

Ben Kochie

Mar 13, 2023, 5:42:52 AM
to Brian Candler, Prometheus Users
IIRC Prometheus WAL no longer fsyncs on a tight interval. It simply writes to page cache, so any flushing is actually controlled by the kernel.

sysctl vm.dirty_writeback_centisecs

The only forced syncs you'll see happen at compaction time, every 2 hours. See prometheus_tsdb_wal_fsync_duration_seconds_count.

So there's no need to do any of the shenanigans that VM does.
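
To check what the kernel is currently doing, something like the following should work on most Linux systems. It is only a sketch: the function names are made up, and vm.dirty_expire_centisecs is included here because it is the companion setting that controls how old dirty data must be before the flusher writes it.

```shell
#!/bin/sh
# Convert centiseconds to whole seconds (integer division truncates).
centisecs_to_seconds() {
    echo $(( $1 / 100 ))
}

# Print the kernel writeback timers with their values in seconds.
show_writeback() {
    for name in dirty_writeback_centisecs dirty_expire_centisecs; do
        value=$(cat "/proc/sys/vm/$name")
        printf 'vm.%s = %s (%s s)\n' "$name" "$value" "$(centisecs_to_seconds "$value")"
    done
}
```

Running `show_writeback` before and after a change makes it easy to confirm the new interval took effect.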

Ben Kochie

Mar 13, 2023, 6:22:56 AM
to Brian Candler, Prometheus Users
Here's the result of changing the sysctl value to 5 minutes (the change was made at 09:45). This is on an odroid Pi-like system running Prometheus.

$ sudo sysctl -w vm.dirty_writeback_centisecs=30000
vm.dirty_writeback_centisecs = 30000

You can see how the writes are now reduced between the 5 minute intervals. No changes to Prometheus are necessary.
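
Note that `sysctl -w` only lasts until the next reboot. A sketch of making it persistent, assuming the system reads drop-in files from /etc/sysctl.d (the file name and the helper names are arbitrary choices for this example):

```shell
#!/bin/sh
# Convert a flush interval in minutes to centiseconds (1 min = 6000 cs).
minutes_to_centisecs() {
    echo $(( $1 * 60 * 100 ))
}

# Write a sysctl.d drop-in so the value is reapplied at boot.
# $1 = interval in minutes, $2 = optional target file.
persist_writeback() {
    conf="${2:-/etc/sysctl.d/90-writeback.conf}"
    printf 'vm.dirty_writeback_centisecs = %s\n' "$(minutes_to_centisecs "$1")" > "$conf"
}
```

As root, `persist_writeback 5` followed by `sysctl --system` should write the file and reload it. The kernel may still start writing earlier if the amount of dirty memory crosses vm.dirty_background_ratio.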

image.png

Brian Candler

Mar 13, 2023, 6:24:57 AM
to Prometheus Users
Interesting. Indeed, if I graph prometheus_tsdb_wal_fsync_duration_seconds_count, I see it increment only once every 2 hours (on a small test system with v2.37.6).
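
The same check can be done without a graph by asking the HTTP query API how much the counter grew over a window. This is a sketch that assumes Prometheus is reachable at localhost:9090; the helper names and PROM_URL are invented for the example.

```shell
#!/bin/sh
# Assumed Prometheus address; override PROM_URL in the environment if needed.
PROM_URL="${PROM_URL:-http://localhost:9090}"

# Build a PromQL expression for the number of WAL fsyncs in a window.
fsync_query() {
    printf 'increase(prometheus_tsdb_wal_fsync_duration_seconds_count[%s])\n' "${1:-2h}"
}

# Run the instant query; the result comes back as JSON.
query_fsyncs() {
    curl -fsS -G --data-urlencode "query=$(fsync_query "$1")" "$PROM_URL/api/v1/query"
}
```

For example, `query_fsyncs 6h` asks how many fsyncs were recorded over the last six hours.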

Ben Kochie

Mar 13, 2023, 7:16:34 AM
to Brian Candler, Prometheus Users
You can also see the effect on the dirty bytes:

image.png

And the smooth writes to disk:
image.png

Note, however, that this doesn't change the overall disk write bandwidth much: bytes written drop by only about 10%.
image.png
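
For anyone who wants to reproduce these numbers without a dashboard, the kernel exposes the raw counters directly. A minimal sketch, with made-up function names, and with the device name in the usage note being just an example:

```shell
#!/bin/sh
# Field 7 of a /sys/block/<dev>/stat file is the number of sectors written.
written_sectors() {
    awk '{ print $7 }' "$1"
}

# The stat file always counts 512-byte sectors, whatever the device uses.
sectors_to_bytes() {
    echo $(( $1 * 512 ))
}

# Bytes written to a block device since boot.
bytes_written() {
    sectors_to_bytes "$(written_sectors "/sys/block/$1/stat")"
}

# Dirty page cache currently waiting for writeback, in kB.
dirty_kb() {
    awk '/^Dirty:/ { print $2 }' /proc/meminfo
}
```

Sampling `bytes_written mmcblk0` an hour apart before and after the sysctl change gives a comparable bytes-written figure, and `dirty_kb` shows how much data is being held back between flushes.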
