I have a few quick questions about SDCH/vcdiff dictionaries. Aside from the SDCH header referencing the host, path, expiry etc., is it safe to say that tokens in SDCH/vcdiff dictionaries are separated by newlines (0x0a)? If that's the case, what if the token we want to encode contains a newline?
// Pseudocode: pick the common strings, then write them out as the dictionary payload.
std::list<const char*> common_long_strings = pickStringsFromDocuments(documents);
writeDictToFile(filename, common_long_strings);
--
As I recall (and these are distant memories), the data inside the dictionary is not separated into tokens. Although the dictionary is built from common strings, it contains no separators: the encoding specifies an arbitrary offset into the dictionary plus a length, so a single reference can span consecutive strings and the decoder reproduces that whole stretch of text. A newline inside one of your strings should therefore be just another byte of dictionary data.
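To make the offset-plus-length idea concrete, here is a minimal sketch. It is not the SDCH or open-vcdiff API: `buildDictionaryPayload`, `CopyInstruction`, and `applyCopy` are hypothetical names, and the model is simplified (a real VCDIFF COPY can also address data in the target window, and the SDCH header is ignored here). It assumes the dictionary payload is simply the chosen strings concatenated with no separator.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: build the dictionary payload by concatenating the
// chosen strings back to back, with no separator between them.
std::string buildDictionaryPayload(const std::vector<std::string>& strings) {
    std::string payload;
    for (const std::string& s : strings) {
        payload += s;
    }
    return payload;
}

// Simplified stand-in for a COPY instruction: a byte offset into the
// dictionary and a byte count. It carries no notion of "token".
struct CopyInstruction {
    std::size_t offset;
    std::size_t length;
};

// Reproduce the bytes that the instruction refers to. Any byte value,
// including 0x0a, is copied verbatim.
std::string applyCopy(const std::string& dictionary, const CopyInstruction& copy) {
    if (copy.offset > dictionary.size() ||
        copy.length > dictionary.size() - copy.offset) {
        throw std::out_of_range("COPY range falls outside the dictionary");
    }
    std::string out = dictionary.substr(copy.offset, copy.length);
    assert(out.size() == copy.length);
    return out;
}
```

With a payload built from `<div class="`, `header">` followed by a newline, and `</div>`, a copy of 21 bytes from offset 0 would cover the first two strings, newline included, while a copy of 5 bytes from offset 5 would yield just `class` from the middle of the first string.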
YMMV,
Jim
--