On 09.02.2012 16:41, Harry Potter wrote:
> On Feb 8, 10:55 am, Thomas Richter<
t...@math.tu-berlin.de> wrote:
>> What type of information do you have, and which type of answer would you
>> need? Do you have the Huffman code available? What do you know on the
>> statistics of the source? Do you need a worst-case or an average-case
>> answer? Is an estimate sufficient?
>>
> * General information--I'll do more specific later
> * I'm only starting the code--I don't have it yet
> * The compressor can scan the source file or parts of it for the
> statistics of the source befoe compression
Is a two-pass approach acceptable, i.e. determine the relative
frequencies first, design the optimal Huffman code, then encode?
> * I need the best possible depending on user settings
> * An estimation will do--and is in fact what I want.
If this is simple zero-order Huffman coder: Set p_i to the relative
frequency of symbol i. ceil(-log_2(p_i)) will give you the number of
bits the Huffman code will assign to the symbol. N \sum_i p_i *
ceil(-log_2(p_i)) will give you the compressed file size in bits, where
N is the number of symbols in the stream. Round up to the next byte,
done. Remember that you somehow also need to signal the Huffman code you
used, i.e. there is an overhead of side-channels not included in the
above computation.
Greetings,
Thomas