The operations I'm doing are on large VM images. Let me describe the setup and then run the benchmarks on it.
qemu-img convert -O raw trusty-server-cloudimg-amd64-disk1.img testy.img
truncate --size 80G testy.img
which gives this on the filesystem:
# ls -lash testy.img
807M -rw-r--r--. 1 root root 80G Feb 10 14:51 testy.img
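For anyone reproducing this: the gap between the two sizes in that listing is what makes the file sparse. `ls -l` reports the apparent size, while `du` (and the first column of `ls -s`) reports what is actually allocated. A minimal sketch, using a hypothetical filename:

```shell
# truncate extends the apparent size without allocating any data blocks,
# giving a fully sparse file.
truncate --size 1G sparse-demo.img
ls -lh sparse-demo.img   # apparent size: 1.0G
du -h sparse-demo.img    # allocated size: ~0
stat -c 'apparent: %s bytes, allocated: %b blocks' sparse-demo.img
```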
After compressing with lz4 as you suggest, I get the following results when decompressing to new images:
# time lz4 -d -f testy.img.lz4 new-image
Successfully decoded 85899345920 bytes
real 3m34.750s
user 1m48.547s
sys 1m21.370s
# time lz4 -c -d testy.img.lz4 | cp /dev/stdin new-image.cp
real 5m13.574s
user 2m30.285s
sys 3m12.671s
# time lz4 -c -d testy.img.lz4 | cp --sparse=always /dev/stdin new-image.sparse
real 3m21.534s
user 2m42.877s
sys 1m15.336s
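For comparison, GNU dd can also sparsify a stream on the fly with conv=sparse, seeking over all-NUL input blocks instead of writing them. A sketch of the idea with /dev/zero standing in for the lz4 output (assuming GNU coreutils; in the pipeline above the generator would be `lz4 -c -d`, and the block size controls the granularity of hole detection):

```shell
# conv=sparse seeks over all-zero blocks rather than writing them,
# so the output file allocates almost nothing.
dd if=/dev/zero bs=1M count=16 2>/dev/null \
  | dd of=sparse-out.img conv=sparse bs=1M 2>/dev/null
# a trailing hole may need the size restored, as with the runs above:
truncate --size 16M sparse-out.img
ls -ls sparse-out.img   # first column: allocated KB, near zero
```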
Compare that with the sparse-aware conversion from qemu-img:
# time qemu-img convert -O raw ~/trusty-server-cloudimg-amd64-disk1.img new-image.qemu && truncate --size 80G new-image.qemu
real 0m9.174s
user 0m7.576s
sys 0m1.314s
Even if you read the entire 80G raw file it is still faster, because the writes are optimised:
# time qemu-img convert -O raw testy.img new-image.qemu2
real 1m19.341s
user 0m15.849s
sys 1m1.690s
Now, qemu-img is of course designed to do this sort of thing from its own image format, but it cannot stream the way lz4 can. I'm hoping we can get lz4 down to somewhere near the qemu-img times for this use case. That would save a lot of write bandwidth on the servers, which in turn reduces wear on the SSDs.
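Until then, one stopgap (assuming util-linux fallocate and a filesystem that supports hole punching) is to dig holes after the fact. That recovers the disk space but not the write bandwidth or SSD wear, which is why streaming sparse support in lz4 would still be the real win:

```shell
# Write real zero blocks, then deallocate them in place.
dd if=/dev/zero of=dense.img bs=1M count=16 2>/dev/null
du -k dense.img                  # ~16384 KB allocated
fallocate --dig-holes dense.img  # punch holes over zeroed ranges
du -k dense.img                  # allocation drops to ~0
```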