Note: VH2 is meant to be a bijective Huffman file compressor
from the set of all files to the set of all files, using
65536 symbols (16 bits each).
If one wants to make it a bijective file compressor from the
set of all files with an even number of bytes to all files, you
can change the code where inbit.r() and outbit.wz() occur. Keep
in mind they work in the pure string space of all possible bit
strings. On a read you get either 0 or 1 (the bits), or -1
(End of String), or -2 (past End of String). On a bit write,
0 or 1 writes the bit and -1 writes the End of String, while -2
says "I meant the last 1 to be the End of String": the trailing
zeros are dropped, and you must have written a 1 somewhere
before any -2 write.
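
A rough Python model of the read and write conventions just
described may help. It is not the VH2 source: the class names
StringSpaceWriter and StringSpaceReader are made up for
illustration, and the bit string is kept as a plain list rather
than packed into file bytes, since the packing is not described
here.

```python
class StringSpaceWriter:
    """Toy model of the write side (hypothetical, not VH2's outbit)."""

    def __init__(self):
        self.bits = []       # bits written so far, before the End of String
        self.closed = False  # True once the End of String has been written

    def write(self, v):
        if self.closed:
            raise ValueError("End of String already written")
        if v in (0, 1):
            self.bits.append(v)
        elif v == -1:
            self.closed = True          # End of String after the bits so far
        elif v == -2:
            if 1 not in self.bits:
                raise ValueError("-2 needs an earlier write of 1")
            while self.bits[-1] == 0:   # drop the trailing zeros
                self.bits.pop()
            self.bits.pop()             # that last 1 becomes the End of String
            self.closed = True
        else:
            raise ValueError("expected 0, 1, -1 or -2")


class StringSpaceReader:
    """Toy model of the read side (hypothetical, not VH2's inbit)."""

    def __init__(self, bits):
        self.bits = list(bits)
        self.pos = 0

    def read(self):
        if self.pos < len(self.bits):
            v = self.bits[self.pos]     # an ordinary bit
        elif self.pos == len(self.bits):
            v = -1                      # End of String
        else:
            return -2                   # past End of String
        self.pos += 1
        return v
```

Under this model, writing 0 0 1 0 1 0 0 0 and then -2 leaves the
bits 0 0 1 0 in front of the End of String, while ending the same
bits with -1 keeps all eight of them.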
If you still want it to work with even files only, you can
preprocess the file, that is, map the even files to the set of
all files. Think of the even file as three types of symbols:
0x0000, 0x8000 and REST16. Likewise think of the output file as
three types of symbols: 0x00, 0x80 and REST8. Note that 0x0000
maps to 0x00 and 0x00, while 0x8000 maps to 0x80 and 0x00, while
REST16 maps to one of:
  case A  REST8 REST8
  case B  0x00  REST8
  case C  REST8 0x00
  case D  0x80  REST8
  case E  REST8 0x80
  case F  0x80  0x80
  case G  0x00  0x80
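
As a quick check that the cases A through G cover every REST16
symbol, here is a small Python sketch that classifies all 65536
16-bit values by their two bytes. The function names are made up
for illustration, and the first byte is assumed to be the high
byte, which matches 0x8000 mapping to 0x80 followed by 0x00.

```python
from collections import Counter


def kind8(b):
    """Classify one byte as 0x00, 0x80 or REST8."""
    return "0x00" if b == 0x00 else "0x80" if b == 0x80 else "REST8"


def kind16(sym):
    """Classify one 16-bit symbol as 0x0000, 0x8000 or REST16."""
    return "0x0000" if sym == 0x0000 else "0x8000" if sym == 0x8000 else "REST16"


def classify_pairs():
    """Count 16-bit symbols by (symbol type, first byte type, second byte type)."""
    counts = Counter()
    for sym in range(0x10000):
        hi, lo = sym >> 8, sym & 0xFF   # assumed order: high byte first
        counts[(kind16(sym), kind8(hi), kind8(lo))] += 1
    return counts


counts = classify_pairs()
for key in sorted(counts):
    print(key, counts[key])
```

There are nine byte-type combinations in all: one for 0x0000, one
for 0x8000, and the seven REST16 cases listed above.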
Keep track of two flags, F16 and F8; both start at zero. Read a
16-bit symbol. Every time you look at a new 16-bit symbol you
either set F16 if the symbol is 0x0000, clear F16 if it is a
REST16, or do nothing to F16 if it is 0x8000.
Check whether there is going to be a next 16 bits of input. If
so, do the rest of this paragraph; else go to the Final Phase.
Now write the first byte straight out to the output file. Set F8
if you wrote 0x00, clear F8 if you wrote a REST8 type, and leave
F8 alone if you wrote 0x80. Process the second byte the same way
as the first, setting F8 as above. Get the next 16 bits, and so
on till done.
Final Phase: you're on the last symbol of the even-length input
file. You have just read the last 16 bits and have adjusted the
F16 flag. Process the first byte to the output file, updating
the F8 flag. You're now at the point of deciding whether to
write the last byte at all. The final byte that you would
normally write out is always written out except in the following
two cases:
case A: don't write the last byte if it would be 0x00 and
        F16 and F8 are both currently zero. Note F8 has not been
        updated by the final byte, since it is not written yet.
case B: don't write the last byte if it would be 0x80 and
        the F8 flag is set. Note F16 is always 0 here.
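
The flag bookkeeping and the Final Phase can be sketched in
Python as follows. This is a literal reading of the steps above,
not code from VH2: the function name even_to_all is made up, the
first byte of each pair is assumed to be the high byte of the
16-bit symbol, and an empty input is assumed to map to an empty
output, a case the steps do not cover. It makes no attempt to
prove the mapping is bijective.

```python
def even_to_all(data: bytes) -> bytes:
    """Preprocess an even-length file as described (hypothetical helper)."""
    if len(data) % 2:
        raise ValueError("input must have an even number of bytes")
    out = bytearray()
    f16 = False   # set by 0x0000, cleared by REST16, untouched by 0x8000
    f8 = False    # set by 0x00, cleared by REST8, untouched by 0x80

    def put(b):
        """Write one byte and update F8."""
        nonlocal f8
        out.append(b)
        if b == 0x00:
            f8 = True
        elif b != 0x80:
            f8 = False

    for i in range(0, len(data), 2):
        hi, lo = data[i], data[i + 1]
        sym = (hi << 8) | lo          # assumed order: high byte first
        if sym == 0x0000:
            f16 = True
        elif sym != 0x8000:
            f16 = False
        put(hi)                       # the first byte is always written
        if i + 2 == len(data):        # Final Phase: last 16-bit symbol
            case_a = lo == 0x00 and not f16 and not f8
            case_b = lo == 0x80 and f8
            if case_a or case_b:
                break                 # drop the final byte
        put(lo)
    return bytes(out)


print(even_to_all(b"\x80\x00").hex())          # case A drops the 0x00
print(even_to_all(b"\x00\x00\x80\x80").hex())  # case B drops the last 0x80
```

The output is always the input itself or the input with its last
byte removed.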
Then do VH2.
This will result in the same compressed file as without the
conversion, or one that is shorter. How much shorter depends on
how long the last symbol is compared to not writing the last
symbol. Example: say you did not convert and you have case A or
B above. The last symbol goes to 00101000(-1), the -1 since it
is the last one in the string space. If you converted, it goes
to 0010(-1): it uses the previous last one. The difference
could be several bytes.
David A. Scott
--
My Crypto code
http://bijective.dogma.net/crypto/scott19u.zip
http://www.jim.com/jamesd/Kong/scott19u.zip old version
My Compression code http://bijective.dogma.net/
**TO EMAIL ME drop the roman "five" **
Disclaimer:I am in no way responsible for any of the statements
made in the above text. For all I know I might be drugged.
As a famous person once said "any cryptographic
system is only as strong as its weakest link"