Some questions about LZ4 streaming API


Takayuki Matsuoka

Jun 13, 2014, 4:58:08 PM
to lz...@googlegroups.com
Hi, Yann.

I have some questions about LZ4 streaming API in dev branch ( https://github.com/Cyan4973/lz4/tree/dev ).
Since I'm still not sure about the purpose of these APIs, perhaps my questions are irrelevant.

Here is my questionable code: https://gist.github.com/t-mat/cdb1bffe0f9664e968d1 (the same file is attached). In this code, I employed a double buffer as a "safe buffer" to avoid `memcpy()`/`memmove()`.

My questions are :

1. Is this a valid usage of the API ?

I'm really confused about the compression-with-a-smaller-buffer (< 64 KiB) case. Is an explicit `memset(LZ4_stream_t)` + `LZ4_loadDict()` (Line#53) valid and expected ?

2. What is the purpose of `LZ4_dict_t_internal.initCheck` ?

`LZ4_loadDict()` is checking it (`dict->initCheck != 0`), but even if `dict->initCheck` is zero, it does not guarantee that the entire `LZ4_dict_t_internal` (`LZ4_stream_t`) is 0. I think I'm still missing something, but I have no idea.

lz4-streaming-api-example.zip

Yann Collet

Jun 14, 2014, 7:52:37 AM
to lz...@googlegroups.com
Hi Takayuki


Please find some tentative answers to your questions.


- You don't need to call LZ4_loadDict() before each block.
LZ4_compress_continue() will remember where the previous data is, and will look for it right there.
LZ4_loadDict() is meant to be useful in a static dictionary scenario,
whenever LZ4_compress_continue() has had no chance to see the previous data.

Note that your code will work; it's just doing some additional work which is not required.


- You don't need to memset() before each block either.
memset() is only required once, when the LZ4_stream_t object is created.
Using = {0}; is enough.
After that, it's no longer necessary to reset it.
Resetting the LZ4_stream_t object is equivalent to restarting from zero again.
It's one way to create independently compressed blocks using LZ4_compress_continue().
(The other way is to call LZ4_loadDict() with a dictSize of zero, which is much faster.)


- Small Buffers (<64KB) are a bit tricky, and you're correct to be cautious about them.

Basically :
+ Small blocks have no impact on LZ4_compress_continue() behavior. Everything works as intended, and you don't need to use LZ4_loadDict() or memset(). You can compress multiple very small data chunks one by one, and LZ4_compress_continue() will produce valid results.
+ Its ability to "remember" where all previous small data chunks stand is limited, though. That's where it can become tricky.
+ If all previous small data chunks are stored next to each other in memory, LZ4_compress_continue() will simply merge them, adding the next block to the previous ones, and consider them a single "dictionary area".
+ If, at some point, there is a separation between 2 consecutive blocks, LZ4_compress_continue() will only remember the "previous block", not those before.

+ There is a problem, though, whenever a memory segment registered within the dictionary area is reused as an input area.
In this case, the current algorithm doesn't detect the situation, which can result in data corruption.
It happens when using a small buffer (<=64KB) in rotating mode.
I think it is the case in your example, since you seem to use an 8 KB rotating buffer, separated into 8 chunks of 1 KB.
In this case, the work-around is to use LZ4_loadDict() to reset the dictionary area, as you did, by the way !
(memset() is not required, LZ4_loadDict() is enough).

+ I feel this situation could be handled better.
I've made no effort to support this scenario, since it was not in my target list, but if you believe it is a valid one,
then there might be some ways to integrate it directly into LZ4_compress_continue().
One possibility would be to check the input area on startup, and remove it from the dictionary area whenever it overlaps.


2. What is the purpose of `LZ4_dict_t_internal.initCheck` ?

Well, as stated previously, the LZ4_stream_t memory area should be initialized to zero before first use (then it's no longer necessary).
There is a risk that some users might not even do that.
In this case, the idea is to detect it, using the LZ4_dict_t_internal.initCheck field, and output an error message without attempting compression.

As you said, it's not a guaranteed proof, only a quick hint. My expectation is that it should detect the initialization issue in most cases: for example, in debug mode, memory is typically initialized with a value != 0, precisely to detect uninitialized areas.


- As a quick comment :

The simple fact that you have questions on how the streaming API works is an important hint that it's not clear enough.
Either in its behavior, or the associated documentation, or both.
I'm all ears and very open to comments and suggestions to improve this situation,
and to provide the necessary modifications to make this API as simple as possible to use and understand.


Best Regards

Yann

Yann Collet

Jun 14, 2014, 12:02:02 PM
to lz...@googlegroups.com
The "dev" branch of LZ4 has been updated to better reflect some of the comments here.

A new overlap detector has also been added, which checks whether the input is using space previously expected to be part of the dictionary.
It simply resizes the dictionary dynamically.

This should help your sample double-buffer code, since you no longer have to work around the problem manually using LZ4_loadDict().


Regards

Yann Collet

Jun 15, 2014, 5:06:06 PM
to lz...@googlegroups.com
I'm considering a potential change to the streaming API.
The expectation is to improve clarity (only); there is no change in capabilities.

streaming Decompression
---------------------------------

Move to a model like the following :
- LZ4_streamDecode_t
- LZ4_decompress_x_continue() 
- LZ4_setDict()

where x stands for : safe, fast

The main idea is : get a similar API feel to the compression one.
Technically, it's the same as the current LZ4_decompress_x_usingDict(),
but there is no more need to specify the dictionary; it is automatically tracked within the LZ4_streamDecode_t structure.


Dictionary manipulation
---------------------------------

The current API defines :
LZ4_loadDict()
LZ4_moveDict()

LZ4_loadDict() is more powerful; it can emulate LZ4_moveDict() with the help of a memcpy().
LZ4_moveDict() is basically a shortcut, avoiding the need to recalculate references.

Proposition : replace moveDict() with saveDict() ?
It seems technically more correct, since data is duplicated, not erased from its initial position.


2) If the decompression API is modified, there is a need for dictionary manipulation too.
It's simpler to set up, since there is no associated hash table. Only a pointer and a length.

The proposal is to call this function :
LZ4_setDict()

However, it does not "feel" like a decompression-dedicated function. There is therefore a risk of confusion with compression-related ones.
Could it be replaced with another name, which would avoid such a risk of confusion ?


Regards

Yann Collet

Jun 17, 2014, 4:52:32 PM
to lz...@googlegroups.com
The LZ4 streaming API had a bug with small (<64KB) dictionary buffers.
Your implementation was circumventing the bug, thanks to the reset of the structure and the reload of dictionary data.

I released a new version into the "dev" branch, correcting the bug.
The new version is able to correctly handle your test program.

The updated test program is available there :

The updated version is a bit simpler,
and looks representative of the way to use the LZ4 streaming API.


Regards

Takayuki Matsuoka

Jun 18, 2014, 3:20:59 AM
to lz...@googlegroups.com
Thanks for the detailed answers and fixes !
I'm very glad my first example code has become simpler and flawless. But since I still don't understand the streaming API's behavior yet, I can't evaluate new names/meanings for now.

I've written second example to understand API: https://gist.github.com/t-mat/88e5ceffb45b04def990
But I failed, and have a similar question : can I avoid these memcpy() and LZ4_loadDict() calls ?

In this code, I use a ring buffer which holds variable- but limited-length messages ([1, MESSAGE_MAX_BYTES] bytes). To guarantee always-contiguous memory, it also relaxes the wraparound condition, rather than using a strict offset mask. Since message length is limited, I can wrap the write/read offset around easily.
My first attempt is in the IN_MY_IMAGINATION block, but I can't manage it without memcpy() and LZ4_loadDict().

Yann Collet

Jun 18, 2014, 7:47:43 AM
to lz...@googlegroups.com
Hi Takayuki


It's great that you keep finding use cases which stretch the API to its limits,
since it's the best way to find those limits, and to improve the API before its official master release.

I'll look into your new sample code a.s.a.p.

Note that I just found and corrected another bug within the latest "dev" branch release,
but my guess is that it's unlikely to solve your latest issues.
Some more investigation time is planned for later today.


Regards

Yann

Yann Collet

Jun 19, 2014, 12:58:30 PM
to lz...@googlegroups.com
I believe I caught the issue.

The latest (unreleased) version works well with example 2, but for compression only.
Decompression still fails.
I believe the issue is about overlapping matches.
In ring-buffer mode, using small buffers, it's possible to have a copy operation performed from a source position very close to the destination position, potentially causing problems with memcpy().

I'll look deeper into it before posting a potential fix.

Rgds

Yann Collet

Jun 19, 2014, 6:01:45 PM
to lz...@googlegroups.com
OK, I was wrong. The problem lay in my simplified version of your test. It is now corrected, and seems to work correctly.

You can find a modified version of your example at :

It's been simplified in order to take advantage of the newer streaming API.
It's basically the same version as the one you defined "IN_MY_IMAGINATION".

Latest version of LZ4 uploaded into the "dev" branch now passes this test, at least on my test machines using several sample files.

However, I'm not entirely satisfied.
One of the key reasons it works is that the compression and decompression buffers are "synchronized".
It means they have the same size and the same "reset" rule, and therefore packet messages roll within both of them at exactly the same place.
I believe it wouldn't work anymore if both buffers were no longer "synchronized".
I haven't tested it yet, but I guess it could be tested using, for example, a decompression buffer twice as large as the compression buffer.

It might be nonetheless enough as a requirement. It's just I'm not sure.


Regards

Yann

Yann Collet

Jun 21, 2014, 12:06:53 PM
to lz...@googlegroups.com
FYI,
the latest release within the "dev" branch
corrects some more bugs,
related to the handling of small dictionaries,
which were previously hidden by the timely use of memset().

Fuzzer has been updated to test this new condition.

It is recommended to run future streaming tests with this new version.


Regards

Takayuki Matsuoka

Jun 30, 2014, 9:39:55 PM
to lz...@googlegroups.com
First of all, thanks for the many improvements !

* "Synchronization" of streaming buffers

It's fine for my purpose (async logging and networking). But I've tested
different decompression buffer sizes out of curiosity, and decompression
failed, as you predicted.

I think it would be nice if the streaming API allowed asymmetric buffer sizes.
For example, a larger ring buffer means more timing margin, so we can
trade off memory and latency.


* My Streaming API impressions

1. I prefer an explicit pointer type over void* for LZ4_stream_t and LZ4_streamDecode_t.
void* is always okay, but there are many void* in lz4[hc].h . When LZ4HC
streaming functions are added, it will be more ambiguous.
If you change these functions, LZ4_free() will also be forked
(maybe into LZ4_freeStream() and LZ4_freeStreamDecode()).

2. It would be nice to have LZ4_reset*() functions, like XXH32_resetState().
Zero-fill initialization is documented, but memset() or ={0} are a bit
difficult to distinguish from defensive initialization. LZ4_reset*() functions
would also allow future library-side improvements.

Yann Collet

Jul 1, 2014, 2:01:05 PM
to lz...@googlegroups.com
Thanks Takayuki

I certainly appreciate your suggestions for the next release.


> I think it would be nice if the streaming API allowed asymmetric buffer sizes.
> For example, a larger ring buffer means more timing margin, so we can
> trade off memory and latency.

Indeed, but this will require some changes within the inner decoding loop.
I believe it's possible to expressly support ring buffers;
it will introduce a new mode, with the required piece of code to handle it.
So that's some work.

Just to be clear though :
except for buffers > 64 KB, which are probably not the main targeted use case,
it will nonetheless be required that the decompression buffer be at least as large as the compression one.
Otherwise, the largest offsets won't be decodable...


> I prefer an explicit pointer type over void* for LZ4_stream_t and LZ4_streamDecode_t.

Indeed, I tend to agree.
void* was selected to preserve compatibility with the existing function LZ4_compress_continue().
That's nonetheless a negotiable API element.


>  It would be nice to have LZ4_reset*() functions like XXH32_resetState().

OK, fair enough, and simple to set up.



Rgds

Yann Collet

Jul 5, 2014, 11:16:12 AM
to lz...@googlegroups.com
Hi Takayuki


Quick question :
Do you believe that LZ4_createStream() and LZ4_free() are actually useful within the streaming API ?

I think I added them just for the sake of API similarity,
but I have the feeling they look redundant, considering LZ4_stream_t is already provided,
making the API somewhat heavier for no clear benefit (at least none I can guess).


Regards

Yann

Yann Collet

Jul 5, 2014, 11:51:32 AM
to lz...@googlegroups.com
I've pushed to the dev branch :

a tentative evolution of the LZ4 streaming API, following your suggestions.
Only the fast compression API has been modified so far, to test the main concepts.

Takayuki Matsuoka

Jul 7, 2014, 6:30:49 AM
to lz...@googlegroups.com
> Do you believe that LZ4_createStream() and LZ4_free() are actually useful within the streaming API ?

Yes. I believe these functions are useful.

For my own purposes, they are not so useful, because I prefer to allocate this memory with my own code, and I don't mind rebuilding for future revisions of LZ4.

But for people who prefer to use LZ4 as a (dynamic) library and don't want to change or rebuild their code, these functions are useful to implement a completely opaque type.

I don't know much about the demand for a dynamic library, but these options are good for admins/maintainers who must update LZ4 on the fly, as with r119.

Yann Collet

Jul 7, 2014, 7:40:20 AM
to lz...@googlegroups.com
> But for people who prefer to use LZ4 as a (dynamic) library and don't want to change or rebuild their code, these functions are useful to implement a completely opaque type.
> I don't know much about the demand for a dynamic library, but these options are good for admins/maintainers who must update LZ4 on the fly, as with r119.

OK, but if I understand your point correctly, it seems to contradict another proposition, which is to move from "void*" pointers towards typed pointers. The reason void* was selected was precisely to keep compatibility with earlier versions of the API. Changing the type will break this compatibility.

Note : I consider the streaming API to still be in "experimental mode", and not yet stabilized. So it's not a big issue to change it right now if need be. Later, it will be more problematic. Hence it's better to make the right choices now.


Regards

Takayuki Matsuoka

Jul 9, 2014, 7:57:52 PM
to lz...@googlegroups.com
Yes. A typed pointer will break compatibility with the current API, but I believe it is a necessary breaking change to implement clear and extensible interfaces.
After introducing typed pointers as the signature of the API set, LZ4 implementors and users will not need to change anything.

I think there are several levels of good abstraction. A minimalistic API such as void* is okay for a simple thing; xxHash is a good example.
But LZ4 is a more complex thing. There are many demands and solutions : fast/safe, baseline/HC, block/streaming, and the list will grow.
So I believe it is a good time to introduce a higher level of abstraction (typed pointers) to keep complexity and confusion modest.

Yann Collet

Jul 14, 2014, 6:06:48 PM
to lz...@googlegroups.com
OK, thanks Takayuki, you totally convinced me.
I've adopted your recommendations within latest dev branch release :

Takayuki Matsuoka

Jul 16, 2014, 4:24:47 PM
to lz...@googlegroups.com
Thanks. I've revamped my example: https://gist.github.com/t-mat/88e5ceffb45b04def990

This time, the example code compresses/decompresses a text file line by line. Results are fairly good, especially for access_log_Jul95 (an HTTP server log).

So we will easily be able to implement on-the-fly compression like ngx_http_log_module : http://nginx.org/en/docs/http/ngx_http_log_module.html

Yann Collet

Jul 16, 2014, 6:01:33 PM
to lz...@googlegroups.com
Thanks Takayuki, that's excellent news.

I wasn't expecting much from packets < 200 bytes,
but your line-by-line test, which goes below that threshold, seems to prove that it nonetheless provides decent results in this range.

Best Regards

Yann Collet

Oct 28, 2014, 8:53:56 PM
to lz...@googlegroups.com
Old issue, but just for completeness :

The latest streaming version of the LZ4 decompressor
should now work fine with non-synchronized ring buffers.
All it requires now is one of these 2 conditions :
- Either the ring buffer size for decompression is >= 64 KB,
or
- the ring buffer size for decompression is >= the ring buffer size for compression.

This version is currently accessible within the dev branch of LZ4 :

It's expected to be released within r124.


Rgds

Beyers Cronje

Nov 1, 2014, 5:21:36 PM
to lz...@googlegroups.com
Yann,

Does this also fix the lz4hc ring buffer issue as discussed on GitHub?

Beyers

Yann Collet

Nov 1, 2014, 6:02:25 PM
to lz...@googlegroups.com
Yes, I believe so.
I've updated some tests to check that, and it seems to work fine.