My testing shows that starlit meets the conditions for the Hutter
prize. On June 8, 2021:
I am currently testing the May 31 submission on my Lenovo with a Core
i7-1165G7 (2.80 GHz) and 16 GB of RAM, in an Ubuntu 20.04 shell under
Windows 10. The executable compiled with clang++-12 is 401505 bytes
(cmix_orig = 124984 bytes), which is larger than the prebuilt binary
(cmix_amdzen1_ub18 = 114012 bytes) but smaller than a zipped directory
tree of the source code and makefile (639942 bytes, mostly the
uncompressed article reordering data). I will use the executable size
for both the Hutter prize and Large Text Benchmark scores.
I have the following results so far. All tests use 9.5 GB of memory
according to top and /usr/bin/time -v.
Each line gives compressed size in bytes, compression time, and
decompression time:
cmix -n enwik6 -> 199039, 276s, 280s (no preprocessing)
cmix -c enwik6 -> 198967, 280s, 270s (preprocessing, no dictionary)
cmix -c .dict enwik6 -> 179224, 186s, 191s (with dictionary)
cmix -c .dict enwik8 -> 15215107, about 5 hours to compress; I am
testing decompression now. I didn't get an exact time because I had to
disable sleep mode and leave the lid open to keep the CPU running at
100%.
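To put these sizes on a common scale, they can be converted to bits per character (bpc). This is a minimal sketch, assuming enwik6 and enwik8 are exactly 10^6 and 10^8 bytes; the helper name `bpc` is hypothetical and not part of cmix.

```python
# Convert compressed sizes to bits per input byte (bpc).
# Assumption: enwik6 = 10**6 bytes, enwik8 = 10**8 bytes.
INPUT_SIZES = {"enwik6": 10**6, "enwik8": 10**8}

RESULTS = [
    # (command, input file, compressed size in bytes)
    ("cmix -n", "enwik6", 199039),
    ("cmix -c", "enwik6", 198967),
    ("cmix -c .dict", "enwik6", 179224),
    ("cmix -c .dict", "enwik8", 15215107),
]


def bpc(compressed_bytes: int, input_bytes: int) -> float:
    """Bits of compressed output per byte of input."""
    return 8 * compressed_bytes / input_bytes


for cmd, name, size in RESULTS:
    rate = bpc(size, INPUT_SIZES[name])
    print(f"{cmd:14} {name}: {size:>9} bytes, {rate:.4f} bpc")
```

On these numbers the dictionary cuts the enwik6 output by about 10% (1.5917 to 1.4338 bpc), and enwik8 compresses to about 1.2172 bpc.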
I expect compression and decompression to take about 2 days each (50
hours), which is within the requirements for the Hutter prize (70
hours on a machine with a Geekbench 5 score of T = 1427).
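The 50-hour figure is consistent with scaling the roughly 5-hour enwik8 run up to enwik9, which is 10 times larger. The sketch below assumes run time grows linearly with input size and that decompression takes about as long as compression; both are assumptions for illustration, and `extrapolate_hours` is a hypothetical helper.

```python
# Rough enwik9 time estimate from the enwik8 run.
# Assumptions: run time is linear in input size, and decompression
# takes about as long as compression.
ENWIK8_BYTES = 10**8
ENWIK9_BYTES = 10**9
ENWIK8_HOURS = 5.0  # approximate enwik8 compression time


def extrapolate_hours(measured_hours: float, measured_bytes: int,
                      target_bytes: int) -> float:
    """Scale a measured run time linearly to a larger input."""
    return measured_hours * target_bytes / measured_bytes


est = extrapolate_hours(ENWIK8_HOURS, ENWIK8_BYTES, ENWIK9_BYTES)
print(f"estimated enwik9 time: {est:.0f} hours ({est / 24:.1f} days) each way")
```

The measured enwik9 times reported on June 13 (about 48 hours each way) are close to this estimate.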
On June 13, 2021:
The starlit test was successful.
archive9 size: 114951433 bytes
Compress time: 48:19:20 (user 173865 s, system 92 s), max resident
set 10230464 kB
Decompress time: 47:41:28 (user 171549 s, system 133 s), max resident
set 10233912 kB
enwik9_uncompressed compares OK.
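The figures above can be cross-checked with a little arithmetic: converting the h:mm:ss wall-clock times to hours, comparing them with user plus system CPU time, and adding the 401505-byte executable to the archive size, following the stated choice to score with the executable size. This is a minimal sketch; `hms_to_seconds` and the variable names are hypothetical.

```python
# Summarize the enwik9 run: wall-clock hours, CPU utilization, and
# archive + executable size (the executable size is used for scoring).
ENWIK9_BYTES = 10**9
ARCHIVE_BYTES = 114951433   # archive9
EXECUTABLE_BYTES = 401505   # clang++-12 build of the May 31 submission


def hms_to_seconds(hms: str) -> int:
    """Convert an h:mm:ss elapsed time (as /usr/bin/time prints it) to seconds."""
    h, m, s = (int(x) for x in hms.split(":"))
    return h * 3600 + m * 60 + s


runs = {
    # name: (wall clock h:mm:ss, user seconds, system seconds)
    "compress": ("48:19:20", 173865, 92),
    "decompress": ("47:41:28", 171549, 133),
}

for name, (wall, user, system) in runs.items():
    wall_s = hms_to_seconds(wall)
    cpu = (user + system) / wall_s  # fraction of one core kept busy
    print(f"{name:10} {wall_s / 3600:6.2f} h wall, CPU {cpu:.1%}")

total = ARCHIVE_BYTES + EXECUTABLE_BYTES
print(f"archive + executable = {total} bytes")
print(f"archive alone: {8 * ARCHIVE_BYTES / ENWIK9_BYTES:.4f} bpc")
```

This shows both runs kept one core essentially fully busy (user plus system time is within a few seconds of wall time), and the combined size comes to 115352938 bytes.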
I have also updated the Large Text Benchmark; starlit is #2.
http://mattmahoney.net/dc/text.html
--
-- Matt Mahoney,
mattma...@gmail.com