Optimal/minimum block size?


Ivan Shapovalov

Jun 12, 2021, 3:51:08 AM6/12/21
to s3...@googlegroups.com
Hello,

Does s3ql have any practical/recommended limits on the number of blocks in
a single filesystem? In other words, if I have, say, 10 TiB of data,
what would be the minimum recommended block size?

My use-case (for this specific filesystem) is storing consolidated
Macrium Reflect backups: a large disk image which is updated in place
daily. If I'm using a large block size (say, 100 or 500 MiB), this
basically causes almost the entire disk image to be retransmitted
every time it changes.

Can this be made to work at all?

--
Ivan Shapovalov / intelfx /

Daniel Jagszent

Jun 14, 2021, 8:29:52 PM6/14/21
to s3...@googlegroups.com
Hello intelfx,
There is a theoretical limit of 2^64 blocks, because that is the limit that sqlite imposes. Long before you get there, your database size will probably become too big to handle.
With big files in the file system, 10 TiB of data is not enough that you need to worry about the database size. I have file systems for Bareos backups ranging from 5 TiB to 20 TiB. They have a maximum block size of 300 MiB because I feared that the database size would become unmanageably big. That fear was unwarranted: the database size is < 10 MiB (uncompressed). If I were to redo these file systems, I would probably choose a smaller maximum block size of 25 MiB to better utilize the CPU cores.

For your use case (only small parts of a big file change), lowering the maximum block size to 1-5 MiB would be better. I guess this would result in a database size (uncompressed) of ~1 GiB. I probably would not go below 1 MiB.
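
For intuition, here is a back-of-the-envelope estimate (the ~100 bytes of metadata per block is my own assumed figure for illustration, not something S3QL specifies):

    # Rough block-count / metadata-size estimate for 10 TiB of data at
    # various maximum block sizes. The 100 bytes per block row is an
    # assumed figure for illustration only, not an S3QL guarantee.
    TIB = 1024 ** 4
    MIB = 1024 ** 2

    def estimate(data_bytes, block_size, bytes_per_block_row=100):
        blocks = data_bytes // block_size
        return blocks, blocks * bytes_per_block_row

    for size_mib in (300, 25, 10, 1):
        blocks, db_bytes = estimate(10 * TIB, size_mib * MIB)
        print(f"{size_mib:>4} MiB blocks: ~{blocks:,} blocks, "
              f"~{db_bytes / MIB:,.0f} MiB of per-block metadata")

At 1 MiB blocks that works out to roughly ten million blocks and on the order of 1 GiB of per-block metadata, which is where the ~1 GiB guess comes from.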

Ivan Shapovalov

Jun 22, 2021, 4:29:24 AM6/22/21
to Daniel Jagszent, s3...@googlegroups.com
On 2021-06-15 at 02:29 +0200, Daniel Jagszent wrote:
> Hello intelfx,
>
> > Does s3ql have any practical/recommended limits on the number of
> > blocks in a single filesystem? In other words, if I have, say,
> > 10 TiB of data, what would be the minimum recommended block size?
> >
> > My use-case (for this specific filesystem) is storing consolidated
> > Macrium Reflect backups: a large disk image which is updated in
> > place daily. If I'm using a large block size (say, 100 or 500 MiB),
> > this basically causes almost the entire disk image to be
> > retransmitted every time it changes.
> >
> > Can this be made to work at all?
>
> There is a theoretical limit of 2^64 blocks, because that is the
> limit that sqlite <https://sqlite.org/limits.html> imposes. Long
> before you get there, your database size will probably become too
> big to handle.
>
> With big files in the file system, 10 TiB of data is not enough that
> you need to worry about the database size. I have file systems for
> Bareos backups ranging from 5 TiB to 20 TiB. They have a maximum
> block size of 300 MiB because I feared that the database size would
> become unmanageably big. That fear was unwarranted: the database
> size is < 10 MiB (uncompressed). If I were to redo these file
> systems, I would probably choose a smaller maximum block size of
> 25 MiB to better utilize the CPU cores.
>
> For your use case (only small parts of a big file change), lowering
> the maximum block size to 1-5 MiB would be better. I guess this
> would result in a database size (uncompressed) of ~1 GiB. I probably
> would not go below 1 MiB.

Thanks.

I tried with a 10 MiB block size, and this made s3ql re-upload 300 GiB
when just 30 were changed that particular day. I'm now trying with
1 MiB; interestingly, it doesn't seem to affect multithreaded upload
speed too much.

I'm also worried about the s3ql block cache. If I'm going to use s3ql
in this setup at all, no matter the block size, it means I'll be
writing 1 TB of data daily(!) to the SSD that holds the block cache.

Is this solvable in s3ql somehow? I'd just put it in RAM (reducing the
cache size to something like 1 GiB which I can spare), but the cache
directory also holds the metadata, which needs to be non-volatile.

Nikolaus Rath

Jun 22, 2021, 5:53:32 AM6/22/21
to s3...@googlegroups.com
On Jun 22 2021, Ivan Shapovalov <int...@intelfx.name> wrote:
> I'm also worried about the s3ql block cache. If I'm going to use s3ql
> in this setup at all, no matter the block size, it means I'll be
> writing 1 TB of data daily(!) to the SSD that holds the block cache.
>
> Is this solvable in s3ql somehow? I'd just put it in RAM (reducing the
> cache size to something like 1 GiB which I can spare), but the cache
> directory also holds the metadata, which needs to be non-volatile.

The cache is in a -cache subdirectory; you might be able to symlink it
elsewhere or mount over it. Note that the name depends on the storage
URL, though.
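
A minimal sketch of that idea, assuming a hypothetical cache layout (the mangled subdirectory name below is only an example; check what mount.s3ql actually creates for your storage URL, and only touch it while the filesystem is unmounted):

    # Keep the S3QL metadata on persistent storage, but point the
    # block-cache subdirectory at a RAM-backed path (e.g. a tmpfs mount).
    # All paths and the mangled directory name are illustrative only.
    import os

    cachedir = "/var/cache/s3ql"             # hypothetical --cachedir
    blockcache = os.path.join(cachedir, "local:=2F=2Fbackup=2Fdata-cache")
    ramdir = "/run/s3ql-blockcache"          # assumed to live on tmpfs

    os.makedirs(ramdir, exist_ok=True)
    if os.path.isdir(blockcache) and not os.path.islink(blockcache):
        os.rmdir(blockcache)                 # only succeeds if it is empty
    if not os.path.lexists(blockcache):
        os.symlink(ramdir, blockcache)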


Best,
-Nikolaus

--
GPG Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

»Time flies like an arrow, fruit flies like a Banana.«

Nikolaus Rath

Jun 22, 2021, 5:56:48 AM6/22/21
to s3...@googlegroups.com
On Jun 22 2021, Ivan Shapovalov <int...@intelfx.name> wrote:
> I tried with a 10 MiB block size, and this made s3ql re-upload 300 GiB
> when just 30 were changed that particular day.

Does "30" refer to "30 MB of changes"? If so, then this isn't saying
anything.

If you change one byte every 10 MB, then S3QL would have to upload every
block even though you only changed 30 MB of data. If, on the other hand,
you write 30 MB in sequence, then S3QL should only upload 3 blocks. If
anything else happens, there is a bug (and it would be great if you
could construct a small testcase that reproduces it).
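
Something along these lines would do as a testcase sketch (the mount point and file name are assumptions; the file is assumed to already exist and span at least 30 blocks):

    # Two write patterns against a file on an S3QL mount with a 10 MiB
    # block size. Pattern A should dirty 30 blocks (300 MiB re-uploaded),
    # pattern B only ceil(30 MiB / 10 MiB) = 3 blocks.
    import os

    MIB = 1024 ** 2
    BLOCK = 10 * MIB
    path = "/mnt/s3ql/testfile"   # assumed mount point and existing file

    with open(path, "r+b") as f:
        # Pattern A: change one byte in each of 30 consecutive blocks.
        for i in range(30):
            f.seek(i * BLOCK)
            f.write(b"\x00")

    with open(path, "r+b") as f:
        # Pattern B: write 30 MiB contiguously at the start of the file.
        f.seek(0)
        f.write(os.urandom(30 * MIB))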

Ivan Shapovalov

Jun 22, 2021, 6:11:42 AM6/22/21
to Nikolaus Rath, s3...@googlegroups.com
On 2021-06-22 at 10:56 +0100, Nikolaus Rath wrote:
> On Jun 22 2021, Ivan Shapovalov <int...@intelfx.name> wrote:
> > I tried with a 10 MiB block size, and this made s3ql re-upload
> > 300 GiB when just 30 were changed that particular day.
>
> Does "30" refer to "30 MB of changes"? If so, then that alone doesn't
> tell us much.

30 GiB. Yes, of course; I was just making a general observation.
I understand how fixed-size block splitting works.

>
> If you change one byte every 10 MB, then S3QL would have to upload
> every block even though you only changed 30 MB of data. If, on the
> other hand, you write 30 MB in sequence, then S3QL should only
> upload 3 blocks. If anything else happens, there is a bug (and it
> would be great if you could construct a small testcase that
> reproduces it).


Ivan Shapovalov

Jun 22, 2021, 6:17:05 AM6/22/21
to Nikolaus Rath, s3...@googlegroups.com
On 2021-06-22 at 10:53 +0100, Nikolaus Rath wrote:
> On Jun 22 2021, Ivan Shapovalov <int...@intelfx.name> wrote:
> > I'm also worried about the s3ql block cache. If I'm going to use
> > s3ql in this setup at all, no matter the block size, it means I'll
> > be writing 1 TB of data daily(!) to the SSD that holds the block
> > cache.
> >
> > Is this solvable in s3ql somehow? I'd just put it in RAM (reducing
> > the cache size to something like 1 GiB which I can spare), but the
> > cache directory also holds the metadata, which needs to be
> > non-volatile.
>
> The cache is in a -cache subdirectory; you might be able to symlink
> it elsewhere or mount over it. Note that the name depends on the
> storage URL, though.

Fair enough. It's kind of fragile (I'd really prefer it if the block
cache and the metadata cache were configurable separately; I might
write a patch if I get around to it), but indeed it will work.

A more serious question, though: what if the machine crashes or loses
power and I lose the cache contents? Is this handled gracefully by
fsck.s3ql (in any way other than crashing or abandoning the filesystem)?



Nikolaus Rath

Jun 23, 2021, 6:10:48 AM6/23/21
to s3...@googlegroups.com
> A more serious question, though: what if the machine crashes or
> loses power and I lose the cache contents? Is this handled
> gracefully by fsck.s3ql (in any way other than crashing or
> abandoning the filesystem)?

fsck.s3ql will just assume that there is no cached data, i.e. that all
dirty data has been flushed and the cache emptied before the crash.

So the filesystem should continue to work, but you may have stale data
in files, or (in cases where new data was appended but not yet
uploaded) you will have holes in the files.

Henry Wertz

Sep 10, 2021, 9:00:20 PM9/10/21
to s3ql
I don't have 10 TB in one, but I can tell you about the three I have (default 10 MB block size). The first has just over 4,000,000 directory entries and 4.12 TB of data using 2.67 TB of space, with a 750 MB (uncompressed) DB. The second has about 166,000 files and 3.31 TB using 2.25 TB of space, with an 88.4 MB DB. The third has 11,341,879 directory entries (but only 3,812,814 data blocks) and 8.27 TB of data using 3.79 TB of space, with a 1.86 GB DB. Small files do not use blocks; they are put directly in the DB. The second one mainly holds some VirtualBox .OVA files and videos (almost all large stuff), while the other two are a mix of videos and hard-drive backups (not images but rsync copies, so there are loads of tiny files from /etc, /usr/include, and /usr/src that are probably put straight into the DB).

I'm using it on my USB disks with the local:// backend, putting the cache on the same disk (two of these drives are permanently hooked up, but one is portable; keeping the cache on the same drive is a nice failsafe if I have to move the drive to another computer). On the "raw" ext4 filesystem I have an s3ql directory with the s3ql software (in case I want to install the .deb on a "new" system to access the filesystem), an s3ql-cache directory, an s3ql-data directory, an s3ql-fs mount point, and a shell script that runs fsck.s3ql (which returns immediately if there's nothing to fix) and then mounts the filesystem (plus one to unmount it). With a 50 GB cache, it runs pretty well! sqlite is nice and robust: I had the cable pop out once or twice and had to repair the DB (but after running an sqlite3 repair, fsck fixed things up fine), and several times I had the drive drop off and didn't have to do anything with the DB, I just ran fsck and it came right up. It also keeps 10 backup copies of the database in s3ql-data in case the database really goes sideways. Just like fsck on a conventional filesystem, if you were in the middle of writing stuff when the disk dropped off, the fsck will put some stuff in lost+found and delete other blocks that are not pointed to by anything. Speed is actually good: I sometimes run the portable one (the one with the 750 MB DB) off a system with 2 GB RAM and it runs comfortably on that.
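
Roughly, that helper script does the equivalent of this Python sketch (the paths are assumptions matching the layout above; I believe both tools accept --cachedir, but check your S3QL version's docs):

    # Run fsck.s3ql first (it returns quickly when there is nothing to
    # fix), then mount the filesystem with the cache kept on the same disk.
    import subprocess

    storage_url = "local:///mnt/usbdisk/s3ql-data"   # assumed layout
    cachedir = "/mnt/usbdisk/s3ql-cache"
    mountpoint = "/mnt/usbdisk/s3ql-fs"

    subprocess.run(["fsck.s3ql", "--cachedir", cachedir, storage_url],
                   check=True)
    subprocess.run(["mount.s3ql", "--cachedir", cachedir, storage_url,
                    mountpoint], check=True)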


It's the best solution I've found for a local filesystem with compression and dedup: btrfs caused me problems (and was slow), and I found the minimum requirements for ZFS too high to look into it.

Thanks!
--Henry