understanding lz4

cse b

unread,

Sep 6, 2015, 7:26:51 AM9/6/15

to LZ4c

1.Can any body please explain briefly about lz4 algorithm in simple words?
.
2. Lz4 vs lz4hc

Yann Collet

unread,

Sep 6, 2015, 6:11:21 PM9/6/15

to LZ4c

1. This article might be useful :

http://fastcompression.blogspot.co.uk/2011/05/lz4-explained.html

2. lz4hc == lz4 High Compression

Gilles Vollant

unread,

Sep 7, 2015, 2:57:22 AM9/7/15

to lz...@googlegroups.com

Can we imagine a lz4hc like compression (very fast decompression) with big dictionary (so decompression if fast but can uses, by example 1 MB of memory) ?

De : lz...@googlegroups.com [mailto:lz...@googlegroups.com] De la part de Yann Collet
Envoyé : lundi 7 septembre 2015 00:11
À : LZ4c
Objet : Re: understanding lz4

--
Vous recevez ce message, car vous êtes abonné au groupe Google Groupes "LZ4c".
Pour vous désabonner de ce groupe et ne plus recevoir d'e-mails le concernant, envoyez un e-mail à l'adresse lz4c+uns...@googlegroups.com.
Pour obtenir davantage d'options, consultez la page https://groups.google.com/d/optout.

Yann Collet

unread,

Sep 7, 2015, 9:09:02 AM9/7/15

to LZ4c

Yes.

The problem will be how to design an efficient offset encoding/decoding.

1 MB is 20 bits.

Already today, in an lz4 compressed block, the majority of the cost is the offset cost.

Larger offsets will make this cost even worse.

On the other hand, the ability to look farther will likely reduce literal costs, and even the number of sequences.

So the net trade-off is a probably better compression ratio.

Still, with a fixed-size offset, most of the benefits will be spent just by the increase in offset cost.

So the natural tendency is to create a variable-size offset, in order to pay "cheaply" for short matches.

Unfortunately, it also makes decoding more complex, hence has an impact on decoding speed.

Nonetheless, I bet it's certainly doable.

But considering all these issues,

I made my choice and decided to move to zstd,

which can efficiently produce and decode variable-size offsets and length counter.

Downside : it's more complex...

Le lundi 7 septembre 2015 08:57:22 UTC+2, Gilles Olivier Vollant a écrit :

Can we imagine a lz4hc like compression (very fast decompression) with big dictionary (so decompression if fast but can uses, by example 1 MB of memory) ?

De : lz...@googlegroups.com [mailto:lz...@googlegroups.com] De la part de Yann Collet
Envoyé : lundi 7 septembre 2015 00:11
À : LZ4c
Objet : Re: understanding lz4

1. This article might be useful :
http://fastcompression.blogspot.co.uk/2011/05/lz4-explained.html

2. lz4hc == lz4 High Compression

Le dimanche 6 septembre 2015 13:26:51 UTC+2, cse b a écrit :
1.Can any body please explain briefly about lz4 algorithm in simple words?
.
2. Lz4 vs lz4hc

--
Vous recevez ce message, car vous êtes abonné au groupe Google Groupes "LZ4c".

Pour vous désabonner de ce groupe et ne plus recevoir d'e-mails le concernant, envoyez un e-mail à l'adresse lz4c+unsubscribe@googlegroups.com.

Evan Nemerson

unread,

Sep 7, 2015, 6:46:10 PM9/7/15

to lz...@googlegroups.com

This is getting a bit off-topic, but do you know if anyone has played
with using variable-length quantities [1] (or something similar) for
offsets? I've never seen it mentioned, but then again I don't usually
bother reading the actual implementations of compression algorithms…

Obviously you lose some capacity with the continuation bit, but it
seems like shorter values for smaller offsets should provide a net
gain, and of course it lets you represent large offsets… Maybe a
variant with 15 bits of always-available storage, an optional
continuation bit, then (if that is set) 8 more bits of storage? That
would give you 23 bit offsets (in 24 bits of space), but offsets < 2^15
would only take require 2 bytes of space…

[1] https://en.wikipedia.org/wiki/Variable-length_quantity

> > *De :* lz...@googlegroups.com [mailto:lz...@googlegroups.com] *De la
> > part
> > de* Yann Collet
> > *Envoyé :* lundi 7 septembre 2015 00:11
> > *À :* LZ4c
> > *Objet :* Re: understanding lz4

> >
> >
> >
> >
> >
> > 1. This article might be useful :
> >
> > http://fastcompression.blogspot.co.uk/2011/05/lz4-explained.html
> >
> >
> >
> > 2. lz4hc == lz4 High Compression
> >
> >
> >
> >
> >
> > Le dimanche 6 septembre 2015 13:26:51 UTC+2, cse b a écrit :
> >
> > 1.Can any body please explain briefly about lz4 algorithm in simple
> > words?
> > .
> > 2. Lz4 vs lz4hc
> >
> > --
> > Vous recevez ce message, car vous êtes abonné au groupe Google
> > Groupes
> > "LZ4c".
> > Pour vous désabonner de ce groupe et ne plus recevoir d'e-mails le
> > concernant, envoyez un e-mail à l'adresse

> > lz4c+uns...@googlegroups.com.

Yann Collet

unread,

Sep 9, 2015, 2:45:46 AM9/9/15

to LZ4c

Yes,

Charles Bloom made some in-depth analysis of this, and wrote some very interesting articles about it.

He even created a variant from his learnings, called LZNib.

http://cbloomrants.blogspot.fr/2012/09/09-04-12-encoding-values-in-bytes-part-4.html

http://cbloomrants.blogspot.fr/2012/09/09-13-12-lznib.html

> > lz4c+unsubscribe@googlegroups.com.

Reply all

Reply to author

Forward