Lz4Stream: A Streaming Lz4 Decoder

Phill Djonov

Dec 14, 2013, 5:43:08 AM

to lz...@googlegroups.com

A while ago I posted here asking if there was a zlib-like decoder implementation available, and it turned out that was still in progress. I've since seen a number of such projects, but they're all based on streams of compressed blocks (each of which is decoded all at once), which unfortunately doesn't fit my use case. Anyway, I ended up implementing this myself, and the code is available here: https://github.com/pdjonov/Lz4Stream.

It won't perform nearly as well as the standard implementation, but it's still pretty quick (quick enough to beat the popular managed zlib implementations), especially given that it satisfies the following requirements:

- It's fully type-safe and verifiable managed code (no calling out to a C library, no use of C#'s unsafe), making it useful in certain constrained use cases.
- It reads from a single continuous block of raw LZ4-compressed data. You can take the output of LZ4_compress[HC] and feed it directly to the decoder.
- It loads data incrementally from a Stream, using a very small internal buffer, preventing long stalls when reading incrementally from a slow source.
- It is a Stream (though it is read-only and non-seekable), so it easily works with existing .NET code.
- It performs well reading large and small blocks alike, so there's no need to rewrite or buffer for clients which do many small reads.
- It uses just a bit more than 64K of internal state. That's not insignificant, but it's a huge savings when streaming through 10+ MB blocks of solid LZ4 data.
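
To make the streaming idea above concrete, here is a minimal sketch in Python (not the actual Lz4Stream code, which is C#). It assumes the standard raw LZ4 block layout (token, literal run, 2-byte little-endian offset, match length) and a buffered source whose read(n) only returns fewer than n bytes at end of input. The names lz4_stream_decode, _read_length and WINDOW_SIZE are hypothetical, chosen for this illustration only. Unlike the decoder described here, the sketch rejects offsets that point before the start of the output.

```python
import io

WINDOW_SIZE = 1 << 16  # LZ4 match offsets are 16-bit, so 64 KiB of history is enough


def _read_length(src, nibble):
    """Decode a length field: a nibble of 15 is extended by bytes summed until one is below 255."""
    length = nibble
    if nibble == 15:
        while True:
            b = src.read(1)
            if not b:
                raise ValueError("truncated length field")
            length += b[0]
            if b[0] != 255:
                break
    return length


def lz4_stream_decode(src, chunk=4096):
    """Yield decoded chunks from a raw LZ4 block read incrementally from `src`."""
    window = bytearray(WINDOW_SIZE)  # ring buffer holding the last 64 KiB of output
    pos = 0                          # total number of bytes decoded so far
    while True:
        token = src.read(1)
        if not token:
            return
        # Literal run: copied straight from the input into the output and the window.
        remaining = _read_length(src, token[0] >> 4)
        while remaining:
            data = src.read(min(remaining, chunk))
            if not data:
                raise ValueError("truncated literal run")
            for b in data:
                window[pos & 0xFFFF] = b
                pos += 1
            remaining -= len(data)
            yield bytes(data)
        # Match: a 2-byte little-endian offset back into already decoded output.
        raw = src.read(2)
        if not raw:
            return  # the final sequence carries literals only
        if len(raw) < 2:
            raise ValueError("truncated match offset")
        offset = raw[0] | (raw[1] << 8)
        if offset == 0 or offset > pos:
            raise ValueError("match offset points outside decoded data")
        match_len = _read_length(src, token[0] & 0x0F) + 4
        out = bytearray()
        for _ in range(match_len):
            # Byte-by-byte copy so overlapping matches (offset < length) repeat correctly.
            b = window[(pos - offset) & 0xFFFF]
            window[pos & 0xFFFF] = b
            pos += 1
            out.append(b)
            if len(out) == chunk:
                yield bytes(out)
                out.clear()
        if out:
            yield bytes(out)


# Hand-built block: literals "abc", a 9-byte match at offset 3, then literals "hello".
block = bytes([0x35]) + b"abc" + bytes([0x03, 0x00, 0x50]) + b"hello"
decoded = b"".join(lz4_stream_decode(io.BytesIO(block)))
```

The only state that grows with the data is the fixed 64 KiB window, which is why memory use stays flat no matter how large the compressed block is.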

A word of caution: it doesn't strictly validate input. Managed C# code without unsafe blocks can't really overflow a buffer, so this isn't a security issue, but invalid data yields undefined results - you won't always get an exception, sometimes invalid data will be silently produced.

At some point I'll likely be adding a C or C++ implementation as well.

I am, of course, happy to hear any feedback.

Cheers!

Yann Collet

Dec 14, 2013, 8:34:22 AM

to lz...@googlegroups.com

Hello Phill

That's a great piece of work you've put together in this release.

It definitely deserves a link on the LZ4 homepage, and can even serve as inspiration for future LZ4 evolutions.

> invalid data yields undefined results - you won't always get an exception, sometimes invalid data will be silently produced.

This part can probably be solved by implementing the xxHash checksum algorithm. Comparing the checksum of the decoded output against one stored with the data would confirm that the output is valid.
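
As a rough sketch of that idea in Python, assuming the 32-bit xxHash variant (XXH32) and a checksum of the original data stored alongside the compressed block: check_output is a hypothetical helper, not part of Lz4Stream or any LZ4 release. It hashes the whole output at once for brevity, whereas a streaming decoder would fold bytes into the hash state as they are produced.

```python
import struct

# XXH32 prime constants
P1, P2, P3, P4, P5 = 0x9E3779B1, 0x85EBCA77, 0xC2B2AE3D, 0x27D4EB2F, 0x165667B1
MASK = 0xFFFFFFFF  # keep all arithmetic in 32 bits


def _rotl(x, r):
    return ((x << r) | (x >> (32 - r))) & MASK


def _round(acc, lane):
    acc = (acc + lane * P2) & MASK
    return (_rotl(acc, 13) * P1) & MASK


def xxh32(data, seed=0):
    """One-shot XXH32 over a bytes-like object."""
    n = len(data)
    i = 0
    if n >= 16:
        # Four accumulators, each consuming one 32-bit lane of every 16-byte stripe.
        v1 = (seed + P1 + P2) & MASK
        v2 = (seed + P2) & MASK
        v3 = seed & MASK
        v4 = (seed - P1) & MASK
        while i + 16 <= n:
            a, b, c, d = struct.unpack_from("<4I", data, i)
            v1, v2, v3, v4 = _round(v1, a), _round(v2, b), _round(v3, c), _round(v4, d)
            i += 16
        h = (_rotl(v1, 1) + _rotl(v2, 7) + _rotl(v3, 12) + _rotl(v4, 18)) & MASK
    else:
        h = (seed + P5) & MASK
    h = (h + n) & MASK
    # Remaining 4-byte words, then remaining single bytes.
    while i + 4 <= n:
        (w,) = struct.unpack_from("<I", data, i)
        h = (_rotl((h + w * P3) & MASK, 17) * P4) & MASK
        i += 4
    while i < n:
        h = (_rotl((h + data[i] * P5) & MASK, 11) * P1) & MASK
        i += 1
    # Final avalanche mixes all bits of the result.
    h ^= h >> 15
    h = (h * P2) & MASK
    h ^= h >> 13
    h = (h * P3) & MASK
    h ^= h >> 16
    return h


def check_output(decoded, expected_checksum):
    """Raise ValueError if the decoded bytes do not match the stored checksum."""
    if xxh32(decoded) != expected_checksum:
        raise ValueError("checksum mismatch: decoded data is corrupt")
    return decoded
```

With such a check in place, corrupt input that happens to decode without an exception would still be caught once the final checksum is compared.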

Best regards

Phill Djonov

Dec 16, 2013, 3:42:59 AM

to lz...@googlegroups.com

Just a quick update: I've posted a preliminary implementation of the same sort of decoder, written in C and with a zlib-like interface. (The files are in the same place as the managed version.) I haven't thoroughly tested it yet, but it's working fine with the data sets I've thrown at it so far, so I feel it's good enough to make public.
