I've fixed algorithm for updating hash table, but my compress ratio is worse on 1-2% than yours...
Could you please say, what algorithm do you use for updating hash table when you use simple hash (not a hash chain)?
Let you have this data to compress, begins from address 1:
0,0,0,0,0,0,1,0,0,0,0,2,0,0,0,0,0,3,...
1. First we calculate hash for first 4 zeros and write in the hash table address 1.
2. We calculate hash for 4 zeros in address 1. We compress these 5 zeros into 1 literal, offset math == 1 and math length == 5. Then we update hash table and change address 1 to 2 (but its not good).
3. We calculate hash by bytes 1,0,0,0 and write in the hash table address 7.
4. We calculate hash by next 4 zeros and found in the hash table address 2, which match with length == 4. We compress the 4 zeros into 1 literal, offset == 6 and match length == 4. We update the hash table and change address 2 to 8 (its also not good).
5. So for the next 5 zeros we find match with only 4 previous zeros instead of 5, if we do not "corrupt" the data in the table.
Do you use some tricks to avoid these "corruption" hash table in case repeated chain of bytes (when offset match <= length match)?
Thanks,
Serge