Enable WAL?

40 views
Skip to first unread message

Henry Wertz

unread,
Sep 10, 2021, 10:51:44 PM9/10/21
to s3ql
What do you think about enabling WAL (Write Ahead Logging)?

I see it was disabled in 2010 for good reason (WAL file ballooning out to multi-GB size) and poor s3qlcp speeds (not surprising if it was generating that much write traffic..).  But the WAL ballooning behavior was apparenlty changed/fixed in sqlite3 3.11.x (about 6 years ago.)  I applied the attached patch (which just switches from current setting of "PRAGMA journal_mode = OFF" and "PRAGMA synchronous = OFF" to the commented-out journal_mode WAL and synchronous NORMAL in database.py.

Could of course do a version check and only enable if you're on 3.11 or higher, if there is concern about older sqlite3 versions floating around.

I'm running local:// backend to 3 USB hard disks (these are 3 seperate s3ql file systems) with ext4, I'd guess with local storage the database speed may influence total speed more compared to when you have some remote S3 storage backing things up.

I didn't benchmark anything, but rsync'ing in small files is visibly faster, and the file system is better under load (i.e. I can copy stuff in and any simultaneous directory lookups, copying stuff out, etc. is noticeably faster and more responsive.)   I think the writeback tasks are finishing faster too (not really for 10MB blocks, they're probably dominated by compression time, but for smaller files and duplicates.)  The WAL file doesn't balloon too much, I have it grow to 99MB and stop there (I did put one FS under enough stress Linux's write cache started building up, the disk was not keeping up..  at some point there the WAL did increase to 160MB, so I guess it grows a little over 99MB under heavy load, but not by some crazy amount.)    The 99MB seems fairly constant (as opposed to being based on DB size), one of my s3ql systems has a 1.8GB DB, one is like 750MB, both have 99MB WAL files after they're mounted for a while.  s3qlcp on a directory with like 3000 files was a bit slow (but I don't have anything to compare it to..), but s3qlcp on this directory with about 300GB of VMs in it, it still took some seconds but a lot faster than before (the WAL didn't go past 99MB with either s3qlcp test).  As umount.s3ql does it's thing the WAL does grow to be about the same size as the DB (momentarily, just after the WAL reaches peak size I assume the DB is closed, the WAL is gone a second or two later).  (It did this on all 3 disks I have with s3ql local:// over ext4 on them.)

Thanks!
--Henry
database-use-wal.patch

Nikolaus Rath

unread,
Sep 11, 2021, 7:20:32 AM9/11/21
to s3...@googlegroups.com
On Sep 10 2021, Henry Wertz <hwer...@gmail.com> wrote:
> What do you think about enabling WAL (Write Ahead Logging)?
>
[...]
>
> I didn't benchmark anything, but rsync'ing in small files is visibly
> faster, and the file system is better under load (i.e. I can copy stuff in
> and any simultaneous directory lookups, copying stuff out, etc. is
> noticeably faster and more responsive.)

As I understand, WAL should not result in any speed-ups, it just
improves reliability in case of a crash. So I'd be very interested to
see actual benchmark data here rather than subjective impressions :-).

Best,
-Nikolaus

--
GPG Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

»Time flies like an arrow, fruit flies like a Banana.«

Henry Wertz

unread,
Sep 11, 2021, 4:42:32 PM9/11/21
to s3ql
I would think you're write -- you're writing your data into a log, then writing into the DB, that seems like it'd be slower.  But...

WAL docs (https://sqlite.org/wal.html)  say "WAL is significantly faster in most scenarios." (... to be fair they're comparing it to the regular journal mode, though, not journal_mode=off.)   They say WAL provides more concurrency (readers and writers don't usually block each other... I"m assuming the main thread, and the writeback threads, can block each other's access to the DB if only momentrarily), inserts are put into the WAL sequentially which is fast (and then I assume put into the DB at sqlite's leisure, when the DB is not busy), and WAL uses far fewer fsync() calls.   That said, on an SSD it may just mean "more writes" (one into WAL and one into DB) given the typical SSD's near-0 latency and high throughput.

Might be a bit (like later in the week) but I can do that -- pick out some test data (probably 1 test with larger files, 1 with smaller), copy it in, copy it out, s3qlcp, and finally remove both the 1st and s3qlcp'ed copy (so I don't have deduplication throw off the later test runs.)  Test with and without WAL.  I'll make sure to NOT test on my 8TB disk -- it's SMR so you give it like 100GB writes without enough of a breather and it's suddenly dead slow 8-) .

Thanks!
--Henry

Nikolaus Rath

unread,
Sep 12, 2021, 8:18:14 AM9/12/21
to s3...@googlegroups.com
Hi Henry,

A: Because it confuses the reader.
Q: Why?
A: No.
Q: Should I write my response above the quoted reply?

..so please quote properly, as I'm doing in the rest of this mail:


On Sep 11 2021, Henry Wertz <hwer...@gmail.com> wrote:
>> > What do you think about enabling WAL (Write Ahead Logging)?
>> >
>> [...]
>> >
>> > I didn't benchmark anything, but rsync'ing in small files is visibly
>> > faster, and the file system is better under load (i.e. I can copy stuff
>> in
>> > and any simultaneous directory lookups, copying stuff out, etc. is
>> > noticeably faster and more responsive.)
>>
>> As I understand, WAL should not result in any speed-ups, it just
>> improves reliability in case of a crash. So I'd be very interested to
>> see actual benchmark data here rather than subjective impressions :-).
>
> I would think you're write -- you're writing your data into a log, then
> writing into the DB, that seems like it'd be slower. But...
>
> WAL docs (https://sqlite.org/wal.html) say "WAL is significantly faster in
> most scenarios." (... to be fair they're comparing it to the regular
> journal mode, though, not journal_mode=off.) They say WAL provides more
> concurrency (readers and writers don't usually block each other... I"m
> [...]


That's the critical point. S3QL currently does not use a journal at
all. So enabling WAL just means that the data is written to the journal
before (just like now) the database is updated. So it's not clear to me
that this willl result in a speedup. Note also that S3QL currently
disables fsync() calls on the journal.

Henry Wertz

unread,
Sep 28, 2021, 3:13:35 AM9/28/21
to s3ql
Apologies!   I used to use Pine back in the day but got out of practice 8-).
Anyway, I ran some benchmarks  -- Made an empty filesystem and copied a portion of my .cache/bazel into it, a mix of somewhat larger files and a few directories of loads of small files.  Then I deleted it, re-copied, and re-deleted it. 
synchronous and journal-mode
OFF OFF
run 1:
create 4:15 delete 0:57
run 2:
create 4:15 delete 0:58

WAL OFF
run 1:
create 4:28 delete 1:03
run 2:
create 4:30 delete 0:59

WAL NORMAL
run 1:
create 4:29 delete 0:59
run 2:
create 4:30 delete
(I didn't run the last delete since the WAL, synchronous=NORMAL and WAL, synchronous=OFF speeds seemed to be identical anyway.)

I was surprised to see identical speeds with WAL/OFF and WAL/NORMAL.  Interestingly, WAL with sync=NORMAL is slower, but only by about 5%, for write-intensive loads.   WAL+normal is supposed to provide stronger guarantees of a consistent DB (if the system got interrupted, on next DB open sqlite can either complete or roll back transactions in the WAL, to make sure the DB is in a consistent state.)  That said, I've used s3ql on a few USB drives (one with a habit of having the cable fall out a few times, and I had a USB3 interface that had a habit of dropping off now and then with older kernels.)  Worst I had was running a sqlite .repiar and fsck, losing (unsurprisingly) the last several seconds of whatever I was copying in.   In other words I've already found sqlite robust enough with "OFF/OFF" setting, plus of course s3ql has the failsafe of having all those metadata backups just in case.

I did notice, if i ran "find" on the test set, it took 0.9 seconds in WAL mode but 0.3 seconds in OFF mode -- they do note in the docs that WAL can be slower for read-intensive loads, for walking through a directory tree it's getting 1/3rd the speed! 

I suspect that explains the "faster" rsync performance I observed -- I had one rsync copying stuff in and one walking through a directory tree copying stuff out, if it was walking through the tree at like 1/3rd the speed I suppose the write-intensive rsync would proceed faster, despite total filesystem IOPS being lower.

I'll post back in a bit, I've got another patch cooked up.  I decided to look into why running the fsck, why the searching for temporary files was taking over 2 hours.   I found on my s3ql-data, I had 810,000 directories (the "100" through "999" directories, 2 layers deep), but only about 30,000 with data in them,.  I went into the s3ql-data directory and (with s3ql unmounted) ran a "find -type d -exec rmdir {} \+" (and let that run overnight, I imagine it took a while.)  This cut the time for find to walk through from over 2 hours to about 10 minutes (and under a minute if I re-run, I apparently got the directory count low enough it can fit in the directory entry cache.)   s3ql currently creates these directories as needed, but does not remove them when empty, this patch adds that.

Reply all
Reply to author
Forward
0 new messages