Compression using dictionary in application


Matthew Murrian

May 29, 2016, 9:56:01 AM
to LZ4c
I am making use of lz4.c by Yann Collet.

I have a specific application where I am compressing data packets (they tend to be ~1 kB) before network transmission and decompressing them on the other end. All of this traffic conforms to a specific proprietary file format (it is essentially a data stream of measurements and position reports from one of our GPS receivers). As I am transmitting these packets over UDP, I want to be robust against lost packets. I have opted against `streaming compression' for that reason.

I imagine I might benefit from a `compression dictionary' which could easily be hard-coded into all my senders and receivers ... but how do I construct such a dictionary? I have several gigabytes of past data to analyze but I don't know where to begin ...

How can I construct an `optimal dictionary' for LZ4 from stored data?

Matthew Murrian

May 29, 2016, 9:59:23 AM
to LZ4c
Or, if LZ4 is not the right algorithm for this ... would someone recommend another compression suited for this task?

Cyan

May 29, 2016, 2:14:57 PM
to LZ4c
You can use the dictionary builder from Zstandard: http://www.zstd.net
`zstd --train samplesDirectory/* -o dictionaryFileName --maxdict 64K`

It produces dictionaries for zstd, which are also compatible with lz4 and zlib.
You probably don't need them to be 110 KB (the default target size); lz4 can only make use of 64 KB anyway.
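
To illustrate how a shared dictionary keeps every packet independently decodable, here is a minimal sketch in Python. It is not lz4 code: it uses the standard library's zlib, which accepts a preset dictionary, purely as a stand-in because it needs no extra install. The names `SHARED_DICT`, `compress_packet` and `decompress_packet` are hypothetical, and the dictionary bytes are made up; in practice they would be the contents of the file produced by the training command above. In lz4's C API the corresponding entry points are, to my knowledge, `LZ4_loadDict` and `LZ4_decompress_safe_usingDict`.

```python
import zlib

# Hypothetical shared dictionary. In a real deployment this would be the bytes
# of the trained dictionary file, hard-coded into every sender and receiver.
SHARED_DICT = b"$GPS,lat=,lon=,alt=,sats=,time="


def compress_packet(payload: bytes, zdict: bytes = SHARED_DICT) -> bytes:
    """Compress one packet on its own, primed only with the shared dictionary."""
    # A fresh compressor per packet: no state is carried from earlier packets,
    # so a lost UDP datagram cannot break the decoding of later ones.
    comp = zlib.compressobj(level=6, zdict=zdict)
    return comp.compress(payload) + comp.flush()


def decompress_packet(blob: bytes, zdict: bytes = SHARED_DICT) -> bytes:
    """Decompress one packet using the same shared dictionary."""
    decomp = zlib.decompressobj(zdict=zdict)
    return decomp.decompress(blob) + decomp.flush()
```

Each call starts from the dictionary alone, so the receiver can decode packets in any order and skip missing ones. The cost is that redundancy between consecutive packets is not exploited; only what the dictionary already contains can be referenced.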


You don't need GB of samples to build a good dictionary.
I would advise reducing the total size of your samples to ~10 MB, typically by taking random records from your larger sample base.
A dictionary is only effective if it contains frequent elements, and frequent elements don't need GB to show up.
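
A rough sketch of that sampling step in Python, assuming the stored data has already been split into individual records (for example, one packet per `bytes` object) that fit in memory or can be listed. The helper names `sample_records` and `write_samples`, the file naming scheme, and the default 10 MB budget are illustrative assumptions, not part of any tool.

```python
import random
from pathlib import Path


def sample_records(records, target_bytes=10 * 1024 * 1024, seed=None):
    """Pick records in random order until roughly target_bytes are collected."""
    rng = random.Random(seed)
    order = list(range(len(records)))
    rng.shuffle(order)  # uniform random order over the whole sample base
    picked, total = [], 0
    for i in order:
        if total >= target_bytes:
            break
        picked.append(records[i])
        total += len(records[i])
    return picked


def write_samples(samples, out_dir):
    """Write each sampled record to its own file, one sample per file."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for n, rec in enumerate(samples):
        (out / f"sample_{n:06d}.bin").write_bytes(rec)
```

The resulting directory can then be passed to the training command in place of `samplesDirectory`. Keeping one record per file matches how the trainer is invoked above, where each file counts as a sample.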


lz4's advantage is that it's very light on CPU, for both compression and decompression.

If your application can afford higher CPU and memory requirements,
you can also give zstd itself a try,
as it will provide better compression ratios.

Matthew Murrian

May 31, 2016, 8:58:28 AM
to LZ4c
Very helpful information. Just what I needed to know. Thanks a lot!