
Huffman codes for known distribution


Andrei Alexandrescu

Sep 7, 1998


Hello,

I have a question for you.
I have to send some big numbers whose distribution (and its
parameters) I know beforehand: it's a normal distribution. Since I
know the probability of each symbol (given by Gauss's formula), I can
take the logarithm and calculate each code length without a hitch.
That function will be quadratic in the symbol value.
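The quadratic shape is easy to check numerically. A minimal sketch (mine, not from the posting), assuming the symbol probabilities are the normal density sampled at the symbol values and renormalized:

```python
import math

def ideal_code_lengths(symbols, mu, sigma):
    """Ideal code length -log2 P(x) for each symbol, with P taken as
    a normal density sampled at the symbols and renormalized (an
    illustrative discretization, not the only possible one)."""
    dens = [math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) for x in symbols]
    total = sum(dens)
    return [-math.log2(d / total) for d in dens]

# The length is quadratic in x:
#   l(x) = (x - mu)^2 / (2 sigma^2 ln 2) + constant
lengths = ideal_code_lengths(range(-5, 6), mu=0.0, sigma=2.0)
```

Real Huffman code lengths are the integer-valued approximation of these ideal lengths.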
The problem is: how can I get the actual code for each symbol
without building the Huffman tree? There has to be a mathematical way
that does not require any extra memory. It seems I have to compute
the population (number of symbols) of each code length, but my math
is too weak for that. Otherwise, I have worked quite a bit with
compression and the Huffman algorithm, so you can get deep into
specifics. Could you give me a hand? Thank you very much.

Yours,

Andrei Alexandrescu


DI Michael Schindler

Sep 8, 1998


Hi!

Some ideas are on my Huffman coding page:
http://www.compressconsult.com/huffman/

The thing that might interest you most is the following article by
Alistair Moffat, also referenced from my page:
http://www.cs.mu.oz.au/~alistair/abstracts/inplace.html

My page also gives some hints for in-place code calculation, using
only the space you need for efficient encoding anyway: the last
log2(alphabetsize) bits of the code and the code length (also
log2(alphabetsize) bits maximum) for each symbol.
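These hints rest on canonical codes: once every symbol's code length is known, the codewords follow from the per-length population counts alone, with no tree. A minimal sketch of that assignment (my illustration, not the code from the page; it assumes the lengths already satisfy the Kraft equality):

```python
def canonical_codes(lengths):
    """Canonical Huffman codewords from code lengths alone.
    Returns (codeword, length) per symbol; codewords of equal
    length are consecutive integers, so only the first code of
    each length needs to be derived."""
    max_len = max(lengths)
    # Population (number of symbols) of each code length --
    # exactly the quantity the original question asks about.
    count = [0] * (max_len + 1)
    for l in lengths:
        count[l] += 1
    # First canonical codeword of each length.
    first = [0] * (max_len + 2)
    for l in range(1, max_len + 1):
        first[l + 1] = (first[l] + count[l]) << 1
    next_code = list(first)
    out = []
    for l in lengths:
        out.append((next_code[l], l))
        next_code[l] += 1
    return out

# Lengths 2,1,3,3 satisfy Kraft: 1/4 + 1/2 + 1/8 + 1/8 = 1.
codes = canonical_codes([2, 1, 3, 3])   # -> [(2, 2), (0, 1), (6, 3), (7, 3)]
```

The resulting codewords ('10', '0', '110', '111') are prefix-free by construction.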

In your case it might pay off to Huffman-code not the whole number
but only part of it, and write out the remaining bits verbatim; of
course this depends on the parameters of your distribution. The point
is that the Huffman memory requirement shrinks along with the number
of symbols in this approach. The last few bits will be nearly evenly
distributed anyway, so your loss in compression is small.
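In outline, the split might look like this (a hypothetical sketch; LOW_BITS is a tuning choice, and the high part is what would go to the Huffman coder):

```python
LOW_BITS = 4  # how many low-order bits to send verbatim (a tuning choice)

def split_number(n, low_bits=LOW_BITS):
    """Split a non-negative number into a high part (for the Huffman
    coder, whose alphabet shrinks by a factor of 2**low_bits) and
    low bits that are written out verbatim."""
    return n >> low_bits, n & ((1 << low_bits) - 1)

def join_number(high, low, low_bits=LOW_BITS):
    """Inverse of split_number."""
    return (high << low_bits) | low

high, low = split_number(1000)   # -> (62, 8)
assert join_number(high, low) == 1000
```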

Just mail me if you need more info. Simple entropy coding might not
be the best choice for your data, but I need to know more before
commenting.

regards
Michael

--
DI Michael Schindler Compression Consulting
http://www.compressconsult.com/ mic...@compressconsult.com

Andrei Alexandrescu <alexan...@micromodeling.com> wrote in article
<6t11kr$571$1...@scream.auckland.ac.nz>...

BCrow1962

Sep 11, 1998


If you compute the table for an average distribution, you might find
that the difference in compression versus a table built from each
specific distribution is insignificant. In other words, why not
precompute the table?

Brian Crowley


Christian von Roques

Sep 14, 1998


Andrei Alexandrescu <alexan...@micromodeling.com> writes:

> I have to send some big numbers whose distribution (and its
> parameters) I know beforehand: it's normal distribution. Now that I
> know the probability of each symbol (given by Gauss' formula), I can
> apply logarithm and calculate the code length without a hitch. That
> function will be a quadratic one.

> The problem is, how can I get the actual code for each symbol without
> building the Huffman tree? There has to be a mathematical way that
> does not imply any memory consumption. Seems like I have to compute
> the population (number of symbols) for each code length, but my math
> is too weak for that. Otherwise I worked pretty much with compression
> and the Huffman algorithm, so you can get deep into specifics.
> Could you give me a hand? Thank you very much.

You can make use of arithmetic coding. This is conceptually very
simple, but sadly has some hard practical problems. Let's assume that
you just want to encode your numbers up to a specified precision. To
get a good entropy encoding, each bit in the resulting encoding
should be as likely to be 0 as to be 1, so that each of them carries
one bit of information. In theory this is no problem, as you know the
probability distribution P(x) of the numbers you want to encode
beforehand. To encode a number X, you start with the knowledge that
it must lie between lo = minus infinity and hi = plus infinity. To
make a code bit carry one bit of information you must find the point
m between lo and hi such that lo <= X < m is as likely as
m <= X < hi, that is:

\int_{lo}^{m} P(x)\,dx = \int_{m}^{hi} P(x)\,dx.

When you've found m, you use one bit to encode whether X < m or
m <= X. Finding m is the hard part.

If X < m you set the new hi to m; if m <= X you set lo to m. You
then repeat this encoding with the new hi and lo until hi - lo is
small enough to fulfill your precision constraint. Encoding unlikely
numbers to the specified precision will require more bits than
encoding likely numbers to the same precision. The precision
constraint need not be uniform.
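For a normal distribution the procedure can be sketched directly (my sketch, not code from the posting): `math.erf` gives the CDF, the conditional median m is itself found by numeric bisection (the hard part mentioned above), and plus/minus 20 sigma stands in for the infinite endpoints.

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def conditional_median(lo, hi, mu, sigma):
    """Find m with F(m) = (F(lo) + F(hi)) / 2 by bisection in x --
    the step the posting calls the hard part."""
    target = (normal_cdf(lo, mu, sigma) + normal_cdf(hi, mu, sigma)) / 2.0
    a, b = lo, hi
    for _ in range(60):
        mid = (a + b) / 2.0
        if normal_cdf(mid, mu, sigma) < target:
            a = mid
        else:
            b = mid
    return (a + b) / 2.0

def encode(x, precision, mu=0.0, sigma=1.0):
    """One bit per bisection step until the interval holding x is
    narrower than `precision`.  Tail values cost more bits than
    values near the mean, as described above."""
    lo, hi = mu - 20.0 * sigma, mu + 20.0 * sigma  # stand-ins for +-infinity
    bits = []
    while hi - lo > precision:
        m = conditional_median(lo, hi, mu, sigma)
        if x < m:
            bits.append(0)
            hi = m
        else:
            bits.append(1)
            lo = m
    return bits
```

A decoder would repeat the same bisections, steering with the received bits; at equal precision, a value near the mean (e.g. 0.1) comes out noticeably shorter than a tail value (e.g. 3.0).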

Hope this helps
Christian.

