While trying to hand-write lossless Webp files, I ran into an issue that it took me many hours to figure out what was going on.
When specifying normal code length prefix codes for G, R, B, A, and Displacement, it appears that Webp requires you to specify more code lengths than you necessarily use, so that the resulting canonical Huffman tree contains all the codes needed to make it "balanced". I'm new to Huffman encoding, so I might be using these terms wrong, so here's an example:
e.g. if your image only has 3 possible B values, webp is unable to decode the image if you only specify code lengths (2, 3, 3), which should expand to the following:
"00"
"010"
"011"
This appears to be illegal because the set is missing all its counterparts that begin with 1.
So instead, you must specify code lengths (2, 3, 3, 3, 3, 3, 3), which expands to:
"00"
"010"
"011"
"100"
"101"
"110"
"111"
Is there a technical or mathematical reason that Webp applies this constraint?
It seems like Webp ought to know how to construct a balanced huffman tree even when the unused branches aren't explicitly specified in the image file.
Would it be possible to add an explanation of this constraint to the documentation?