New features in XML-WRT 2.0 (14.06.2006):
-internal zlib and LZMA compression
-input XML file is split into containers depending on start-tags and
end-tags; content under the same tag is sent to the same container
-container for dates in format 1980-02-31 and 01-MAR-1920
-container for times in format 11:30pm
-container for numbers from 1900 to 2155 (years)
-container for pages in format "x-y", where y-x<256, e.g. "120-148", "1480-1600"
-container for numbers in format "x-y", e.g. "1234-0", "87-623"
-container for numbers with two digits after the period, e.g. "102.00", "12.01"
-container for numbers from 0.0 to 24.9 (one digit after the period), e.g. "12.0", "9.9"
-URLs (starting with "http:"), e-mails (x@y.z) and "ü" are added to the
dynamic dictionary
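
The container rules above can be pictured as a small token classifier. The following is only a rough Python sketch under stated assumptions, not the actual XML-WRT code: the function name `classify`, the container labels, the order of the checks and the regular expressions are all made up for illustration, and the page rule is assumed to require y >= x (the list only says y-x<256).

```python
import re

# Hypothetical patterns; the real XML-WRT matching rules may differ.
DATE_ISO = re.compile(r"\d{4}-\d{2}-\d{2}")        # e.g. 1980-02-31
DATE_MON = re.compile(r"\d{2}-[A-Z]{3}-\d{4}")     # e.g. 01-MAR-1920
TIME_12H = re.compile(r"\d{1,2}:\d{2}[ap]m")       # e.g. 11:30pm
INTEGER = re.compile(r"\d+")
PAIR = re.compile(r"(\d+)-(\d+)")                  # "x-y"
TWO_DECIMALS = re.compile(r"\d+\.\d{2}")           # e.g. 102.00
ONE_DECIMAL = re.compile(r"\d{1,2}\.\d")           # e.g. 12.0
EMAIL = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")    # x@y.z


def classify(token):
    """Return an illustrative container label for a single token."""
    if DATE_ISO.fullmatch(token) or DATE_MON.fullmatch(token):
        return "date"
    if TIME_12H.fullmatch(token):
        return "time"
    if INTEGER.fullmatch(token) and 1900 <= int(token) <= 2155:
        return "year"
    pair = PAIR.fullmatch(token)
    if pair:
        x, y = int(pair.group(1)), int(pair.group(2))
        # Assumption: a page range needs y >= x and a difference below 256.
        return "pages" if 0 <= y - x < 256 else "range"
    if TWO_DECIMALS.fullmatch(token):
        return "decimal2"
    if ONE_DECIMAL.fullmatch(token) and float(token) < 25.0:
        return "decimal1"
    if token.startswith("http:"):
        return "url"
    if EMAIL.fullmatch(token):
        return "email"
    return "text"


if __name__ == "__main__":
    for tok in ["1980-02-31", "11:30pm", "1999", "120-148", "87-623", "12.01", "9.9"]:
        print(tok, "->", classify(tok))
```

Note that the year range 1900 to 2155 covers exactly 256 values, and the page rule bounds the difference below 256, so in both cases an offset would fit in a single byte; whether XML-WRT encodes them that way is a guess here.
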
best regards,
Przemyslaw
Updated compression results for xml-wrt + ppmonstr are posted here:
http://cs.fit.edu/~mmahoney/compression/text.html
Compression ratio is improved slightly, from .1547 to .1542 on enwik9.
I haven't tested the standalone compression yet.
-- Matt Mahoney