Re: HKCP Time

Matt Mahoney

Jun 19, 2021, 10:39:00 AM
to Marcus Hutter, James Bowery, Byron Knoll, Hutter-Prize, Artemiy Margaritov
cc to Hutter-prize group.

Cmix-hp compression is complete. I expect decompression to finish Monday. After that I will be travelling so I won't be doing any more tests until after July 10.

cmix -c .dict enwik8 -> 15957339
Decompression OK 

cmix -e enwik9 (to archive9) -> 113712798. Time 178947s (172685s user, 6502s sys, 52:37 hours, 94% CPU), 6873 MB max resident memory, 10863009 page faults (about 40 GB of SSD I/O at 4k/page). 29.5 GB in temp files according to du. The temp file ppm.temp is listed as 21 GB but is apparently sparse, as there are 17.5 GB of other temp files. Top shows 24 GB virtual memory while running.
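The CPU percentage and the page-fault I/O estimate above can be re-derived from the reported figures. This is only a rough sketch: it assumes that 52:37 is hours:minutes of wall-clock time and that every page fault moved exactly one 4096-byte page, which is a simplification.

```python
# Figures copied from the timing report in this message.
user_s = 172685                      # user CPU seconds
sys_s = 6502                         # system CPU seconds
elapsed_s = 52 * 3600 + 37 * 60      # 52:37 wall clock, assumed to be h:mm
page_faults = 10863009
page_size = 4096                     # assumed: one 4 KiB page per fault

# CPU share is total CPU time divided by wall-clock time.
cpu_fraction = (user_s + sys_s) / elapsed_s

# Upper-level estimate of I/O caused by paging.
io_bytes = page_faults * page_size

print(f"CPU utilisation: {cpu_fraction:.1%}")
print(f"Page-fault I/O: {io_bytes / 1e9:.1f} GB ({io_bytes / 2**30:.1f} GiB)")
```

The result lands in the low 40s of gigabytes whichever unit is used, which is consistent with the rounded figure quoted above.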

Test hardware: Lenovo 82HT laptop, Core i7-1165G7, 2.80 GHz, 4 core, 8 thread, 16 GB RAM, 500 GB SSD, Ubuntu 20.04 under Windows 10. Tested with screen saver and sleep mode disabled, WiFi off, lid open. Geekbench 5: 1427 single, 4667 multi. Geekbench 4: 6638, 20384.

Cmix-hp by Byron Knoll is a modification of starlit by Artemiy Margaritov that enlarges and memory-maps the PPM model. Starlit reorders the articles by mutual information, then preprocesses using the dictionary from phda9 (previous Hutter prize winner by Alexander Rhatushnyak) and compresses with a reduced version of cmix. Both programs produce a self-extracting archive for the special case of enwik9.

On Fri, Jun 18, 2021, 10:55 AM Matt Mahoney <mattma...@gmail.com> wrote:
Cmix-hp compression of enwik9 is 75% complete after 38 hours. I will announce results on the Hutter prize list when it completes, which I expect to take 50 hours, then test decompression. I will be traveling after that and will not be able to run any more tests until after July 10. I was testing enwik8 with the lid shut (screen saver and sleep mode disabled) as an experiment to see whether it affected speed, and it still does. Decompression took 12.5 hours (successfully) and should have taken 5. All my other tests are with the lid open.

On Thu, Jun 17, 2021, 2:29 PM Marcus Hutter <marcus...@gmx.net> wrote:
Hi Matt and James,

Artemiy's has been the logistically most complex submission process so
far, and I have lost track of all the issues.
There even seems to be some dispute about the (de)compressor file size.
Hopefully by the end of June you can give me a summary of where we stand.

As for compression time, I accept a submission if the (de)compressor runs
within 50000/T hours, where T is the Geekbench 5 score, on at least one of
our test machines.

A computer of a friend of mine has a Geekbench 5 single-core score of 835
https://browser.geekbench.com/v5/cpu/8005428
i.e. the budget is 50000/T = 60 hours. Artemiy's (de)compressor ran in 67 hours.
I will let that count this time, but probably not for future submissions.
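The budget arithmetic can be written out as a small sketch. The function name `time_budget_hours` is made up for illustration, and it assumes T is the Geekbench 5 single-core score, as in the example above. The score of 1427 is the single-core figure Matt reports for his laptop elsewhere in this thread.

```python
def time_budget_hours(geekbench5_single: float, constant: float = 50_000) -> float:
    """Allowed running time in hours for a machine with the given score."""
    return constant / geekbench5_single


friend_budget = time_budget_hours(835)    # the friend's machine
laptop_budget = time_budget_hours(1427)   # Matt's laptop

print(f"Friend's machine: {friend_budget:.1f} hours")
print(f"Matt's laptop:    {laptop_budget:.1f} hours")

# A run of 67 hours on the friend's machine exceeds its budget.
print("67 h within budget:", 67 <= friend_budget)
```

The constant is a parameter so that the same function covers any change to the rule.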

Byron, by any measure, your submission is too slow.
Ideally please bring it down to half the time, so that it satisfies the
rules on Matt's machine.
IF it exceeds the time and IF we can use the other machine and IF it
runs under 50'000/T give or take a bit, I'll accept it too.

Thanks for all your help/contributions.

Cheers,

Marcus

Matt Mahoney

Jun 21, 2021, 4:32:33 PM
to Marcus Hutter, James Bowery, Byron Knoll, Hutter-Prize, Artemiy Margaritov
Cmix-hp enwik9 decompression was successful (compares identical) in 53:58 hours using 6756 MB max resident memory and 29.5 GB disk. Here is a summary of both submissions.

Starlit submitted May 31.
Compressor size 401,505.
Enwik8 15,215,107.
Enwik9 archive size 114,951,433.
Time 48:19 + 47:41.
Memory 10230, 10233 MB.
Disk 17.5 GB.

Cmix-hp submitted June 10.
Compressor 397,501.
Enwik8 15,957,339.
Archive 113,712,798.
Time 52:37 + 53:58.
Memory 6873, 6756 MB.
Disk 29.5 GB.
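As a rough illustration of how the two submissions compare, the sketch below computes the relative size difference of the enwik9 archives. It assumes the comparison is made on the archive sizes alone; the thread does not spell out the exact formula used for the prize threshold.

```python
# enwik9 archive sizes in bytes, from the summary above.
starlit_archive = 114_951_433
cmix_hp_archive = 113_712_798

saved_bytes = starlit_archive - cmix_hp_archive
relative_gain = saved_bytes / starlit_archive

print(f"cmix-hp is smaller by {saved_bytes:,} bytes ({relative_gain:.3%})")
```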

Matt Mahoney

Aug 28, 2021, 6:17:33 PM
to Marcus Hutter, Byron Knoll, James Bowery, Hutter-Prize, Artemiy Margaritov
I tested cmix_hpv3 by Byron Knoll (released Aug. 5) on my Lenovo
laptop, Core i7-1165G7, 2.80 GHz, 16 GB RAM, SSD, Ubuntu under Windows
10.

enwik9 compressed to 113591263 in 52:21:32 at 94% CPU using 6671 MB
max resident memory and 17.2 GB temp files.
decompression OK in 52:14:37 at 94% CPU using 6693 MB memory and 16.3
GB temp files.
The compressor (cmix) is 397411 bytes. It consists of the compiled
output of clang++12 (119196 bytes) with the compressed dictionary and
article reordering data appended. The output of the compressor
(archive9) is a self-extracting archive which includes a copy of
itself with the compressed data appended.
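The sizes in the file listings in this message are consistent with that description. The sketch below checks the arithmetic; treating these particular files as the concatenated parts is an inference from the matching sums, not something the message states outright.

```python
# Sizes in bytes, taken from the run and run2 directory listings.
compiled_binary = 119196    # cmix_orig (also .decomp_bin)
compressed_dict = 78127     # comp_dict (also .dict.comp)
compressed_order = 200076   # comp_order (also .new_article_order.comp)
header = 12                 # header.dat

# The compressor appears to be the binary plus both data files plus a header.
compressor_total = compiled_binary + compressed_dict + compressed_order + header
print("compressor:", compressor_total)

# The archive appears to carry the binary, the dictionary and a header
# in front of the compressed enwik9 payload.
archive9 = 113788598
payload = 113591263
overhead = archive9 - payload
print("archive overhead:", overhead)
print("matches binary + dict + header:",
      overhead == compiled_binary + compressed_dict + header)
```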


matt@DESKTOP-KSVMRD1:~/cmixhpv3/run2$ /usr/bin/time -v ./cmix -e ~/enwik9 archive9
78127 bytes -> 415377 bytes in 63.29 s.
200076 bytes -> 1131233 bytes in 162.54 s.
Detected block types: TEXT: 100.0%
934188796 bytes -> 113591263 bytes in 178537.92 s.
free(): invalid size
Command terminated by signal 6
Command being timed: "./cmix -e /home/matt/enwik9 archive9"
User time (seconds): 169926.00
System time (seconds): 8839.13
Percent of CPU this job got: 94%
Elapsed (wall clock) time (h:mm:ss or m:ss): 52:21:32
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 6671896
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 10375707
Minor (reclaiming a frame) page faults: 2335609765
Voluntary context switches: 10294077
Involuntary context switches: 2793631
Swaps: 0
File system inputs: 1660324680
File system outputs: 6510469224
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
matt@DESKTOP-KSVMRD1:~/cmixhpv3/run2$ cd ..
matt@DESKTOP-KSVMRD1:~/cmixhpv3$ mkdir run3
matt@DESKTOP-KSVMRD1:~/cmixhpv3$ cd run3
matt@DESKTOP-KSVMRD1:~/cmixhpv3/run3$ cp ../run2/archive9 .
matt@DESKTOP-KSVMRD1:~/cmixhpv3/run3$ /usr/bin/time -v ./archive9
78127 bytes -> 415377 bytes in 62.98 s.
113591263 bytes -> 934188796 bytes in 177352.75 s.
Command being timed: "./archive9"
User time (seconds): 170047.99
System time (seconds): 7368.66
Percent of CPU this job got: 94%
Elapsed (wall clock) time (h:mm:ss or m:ss): 52:14:37
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 6693856
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 10616738
Minor (reclaiming a frame) page faults: 2335736358
Voluntary context switches: 10535460
Involuntary context switches: 3508856
Swaps: 0
File system inputs: 1736866272
File system outputs: 6513257536
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0

In the file listings below, run2 is the compressor output, run3 is the
decompressor output, run is the original compiled compressor.

matt@DESKTOP-KSVMRD1:~/cmixhpv3/run2$ ll
total 17201408
drwxr-xr-x 2 matt matt 4096 Aug 23 13:55 ./
drwxr-xr-x 10 matt matt 4096 Aug 26 11:43 ../
-rw-r--r-- 1 matt matt 9745 Aug 21 09:37 .coda
-rw-r--r-- 1 matt matt 119196 Aug 21 09:33 .decomp_bin
-rw-r--r-- 1 matt matt 415377 Aug 21 09:34 .dict
-rw-r--r-- 1 matt matt 78127 Aug 21 09:33 .dict.comp
-rw-r--r-- 1 matt matt 1404 Aug 21 09:37 .intro
-rw-r--r-- 1 matt matt 999988851 Aug 21 09:37 .main
-rw-r--r-- 1 matt matt 934177647 Aug 21 09:40 .main_phda9prepr
-rw-r--r-- 1 matt matt 999988851 Aug 21 09:37 .main_reordered
-rw-r--r-- 1 matt matt 1131233 Aug 21 09:37 .new_article_order
-rw-r--r-- 1 matt matt 200076 Aug 21 09:34 .new_article_order.comp
-rw-r--r-- 1 matt matt 934188796 Aug 21 09:40 .ready4cmix
-rwxrwxrwx 1 matt matt 113788598 Aug 23 13:55 archive9*
-rwxr-xr-x 1 matt matt 397411 Aug 21 09:31 cmix*
-rw-r--r-- 1 matt matt 197323 Aug 23 13:55 dec1
-rw-r--r-- 1 matt matt 113788586 Aug 23 13:55 dec2
-rw-r--r-- 1 matt matt 12 Aug 23 13:55 header4archive.dat
-rw-r--r-- 1 matt matt 953981679 Aug 21 09:40 out1
-rw-r--r-- 1 matt matt 953981667 Aug 21 09:39 out10
-rw-r--r-- 1 matt matt 953981673 Aug 21 09:39 out11
-rw-r--r-- 1 matt matt 953981677 Aug 21 09:39 out12
-rw-r--r-- 1 matt matt 953981661 Aug 21 09:38 out2
-rw-r--r-- 1 matt matt 985154324 Aug 21 09:38 out3
-rw-r--r-- 1 matt matt 984215015 Aug 21 09:38 out4
-rw-r--r-- 1 matt matt 980080701 Aug 21 09:38 out5
-rw-r--r-- 1 matt matt 999988944 Aug 21 09:38 out7
-rw-r--r-- 1 matt matt 999988935 Aug 21 09:38 out8
-rw-r--r-- 1 matt matt 953981679 Aug 21 09:39 out9
-rw-r--r-- 1 matt matt 149516 Aug 23 13:54 progress.log
-rw-r--r-- 1 matt matt 12 Aug 21 09:33 test.dat
-rw-r--r-- 1 matt matt 937788523 Aug 21 09:39 tmp1a
-rw-r--r-- 1 matt matt 16193156 Aug 21 09:39 tmp1b
-rw-r--r-- 1 matt matt 929737295 Aug 21 09:38 tmp2a
-rw-r--r-- 1 matt matt 24244366 Aug 21 09:38 tmp2b
-rw-r--r-- 1 matt matt 934179051 Aug 21 09:40 un1
matt@DESKTOP-KSVMRD1:~/cmixhpv3/run2$ cd ../run3
matt@DESKTOP-KSVMRD1:~/cmixhpv3/run3$ ll
total 16390188
drwxr-xr-x 2 matt matt 4096 Aug 28 15:58 ./
drwxr-xr-x 10 matt matt 4096 Aug 26 11:43 ../
-rw-r--r-- 1 matt matt 9745 Aug 28 15:55 .coda_decomp
-rw-r--r-- 1 matt matt 78127 Aug 26 11:44 .dict.comp_decomp
-rw-r--r-- 1 matt matt 415377 Aug 26 11:45 .dict_decomp
-rw-r--r-- 1 matt matt 934188796 Aug 28 15:55 .input_decomp
-rw-r--r-- 1 matt matt 1404 Aug 28 15:55 .intro_decomp
-rw-r--r-- 1 matt matt 934177647 Aug 28 15:55 .main_decomp
-rw-r--r-- 1 matt matt 999988851 Aug 28 15:57 .main_decomp_restored
-rw-r--r-- 1 matt matt 999988851 Aug 28 15:58 .main_decomp_restored_sorted
-rw-r--r-- 1 matt matt 113591263 Aug 26 11:45 .ready4cmix_decomp
-rwxr-xr-x 1 matt matt 113788598 Aug 26 11:44 archive9*
-rw-r--r-- 1 matt matt 1000000000 Aug 28 15:59 enwik9_uncompressed
-rw-r--r-- 1 matt matt 953981661 Aug 28 15:56 out10d
-rw-r--r-- 1 matt matt 953981673 Aug 28 15:56 out11d
-rw-r--r-- 1 matt matt 953981667 Aug 28 15:56 out12d
-rw-r--r-- 1 matt matt 953981677 Aug 28 15:56 out13d
-rw-r--r-- 1 matt matt 999988935 Aug 28 15:57 out15d
-rw-r--r-- 1 matt matt 953981679 Aug 28 15:55 out1d
-rw-r--r-- 1 matt matt 980080701 Aug 28 15:56 out2d
-rw-r--r-- 1 matt matt 999049635 Aug 28 15:56 out3d
-rw-r--r-- 1 matt matt 999988944 Aug 28 15:56 out4d
-rw-r--r-- 1 matt matt 984215035 Aug 28 15:56 out5d
-rw-r--r-- 1 matt matt 953981679 Aug 28 15:55 out6d
-rw-r--r-- 1 matt matt 12 Aug 26 11:44 test.dat
-rw-r--r-- 1 matt matt 999990255 Aug 28 15:58 un1_d
matt@DESKTOP-KSVMRD1:~/cmixhpv3/run3$ du
16390184 .
matt@DESKTOP-KSVMRD1:~/cmixhpv3/run3$ md5sum enwik9_uncompressed
e206c3450ac99950df65bf70ef61a12d enwik9_uncompressed
matt@DESKTOP-KSVMRD1:~/cmixhpv3/run3$ cmp enwik9_uncompressed ~/enwik9
matt@DESKTOP-KSVMRD1:~/cmixhpv3/run3$ cd ../run
matt@DESKTOP-KSVMRD1:~/cmixhpv3/run$ ll
total 920
drwxr-xr-x 2 matt matt 4096 Aug 21 09:30 ./
drwxr-xr-x 10 matt matt 4096 Aug 26 11:43 ../
-rwxr-xr-x 1 matt matt 397411 Aug 21 09:30 cmix*
-rwxr-xr-x 1 matt matt 119196 Aug 21 09:27 cmix_orig*
-rw-r--r-- 1 matt matt 78127 Aug 21 09:28 comp_dict
-rw-r--r-- 1 matt matt 200076 Aug 21 09:30 comp_order
-rw-r--r-- 1 matt matt 12 Aug 21 09:30 header.dat
-rw-r--r-- 1 matt matt 0 Aug 21 09:27 patch
-rw-r--r-- 1 matt matt 122083 Aug 21 09:30 progress.log
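The md5sum and cmp steps in the transcript above verify that decompression restored the original file exactly. A minimal sketch of the same check follows. The helper names `md5_of` and `identical` and the stand-in file names are invented for illustration, and the demonstration runs on small temporary files rather than the real enwik9.

```python
import filecmp
import hashlib
import os
import tempfile


def md5_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading in chunks to bound memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def identical(path_a: str, path_b: str) -> bool:
    """Byte-for-byte comparison, the same question cmp answers."""
    return filecmp.cmp(path_a, path_b, shallow=False)


# Demonstration on small stand-in files (hypothetical names).
with tempfile.TemporaryDirectory() as workdir:
    original = os.path.join(workdir, "original.bin")
    restored = os.path.join(workdir, "restored.bin")
    for name in (original, restored):
        with open(name, "wb") as handle:
            handle.write(b"stand-in payload\n" * 1000)
    print(md5_of(restored), identical(original, restored))
```

Passing `shallow=False` matters here: without it, `filecmp.cmp` may decide from file metadata alone instead of reading the contents.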

On Sun, Jul 11, 2021 at 2:59 AM Marcus Hutter <marcus...@gmx.net> wrote:
>
> Hi Byron,
>
> On 7/11/21 12:20 AM, Byron Knoll wrote:
>
> I think I might be able to make a submission within those constraints in a couple weeks.
>
> That'll be great!
>
> Currently I am quite close - I can get about 5% time improvement and still beat starlit by over 1%. With 10% time improvement, cmix-hp currently doesn't beat starlit by 1%.
>
> If you don't manage, then please submit the 5% speed-up next month or so, and we'll consider this.
>
> Cheers,
>
> Marcus
>
>
> On Sat, Jul 10, 2021 at 1:20 PM James Bowery <jabo...@gmail.com> wrote:
>>
>> Given my system's inability to handle submissions based on STARLIT -- presumably because of a bug in the way LLVM produces binary for the Ryzen 3700 CPU (and a frustrating experience I had trying to report the problem to the LLVM folks) -- I'm going to have to rent time on a server farm or upgrade my linode hosted web server to 16GB ram which will double my monthly fee there. I suppose I could buy a new motherboard and CPU.
>>
>> On Sat, Jul 10, 2021 at 3:06 PM Marcus Hutter <marcus...@gmx.net> wrote:
>>>
>>> Hi All,
>>>
>>> Given that I will accept Artemiy's submission, and Byron's is only 10% slower than Artemiy's and beats it by 1%, I think I should permanently relax the 50'000/T to 70'000/T and (re)consider Byron's submission.
>>>
>>> @James. Could you test the submission?
>>> @Byron. Any chance to make yours 10% faster to put it clearly under the relaxed time constraint?
>>>
>>> Cheers,
>>>
>>> Marcus
--
-- Matt Mahoney, mattma...@gmail.com

James Bowery

Oct 10, 2021, 2:38:16 PM
to Hutter Prize
I managed to get my machine running one of the starlit-derived entries by removing the K80 GPU. Apparently the K80 and the clang ecosystem don't play well together.

The version I'm running is:

0000000000000000000000000000000000000000 1e95318d9f0b157e4aafe6b73c85c2a2674a2fd1 hutterprize <hutterprize@ML.(none)> 1627863019 -0500  clone: from https://github.com/byronknoll/cmix-hp.git
1e95318d9f0b157e4aafe6b73c85c2a2674a2fd1 85a5c6486b898f97e0faa927543b2e872a747524 hutterprize <hutterprize@ML.(none)> 1633881931 -0500  pull: Fast-forward



James Bowery

Oct 12, 2021, 2:08:06 PM
to Hutter Prize
It still core dumped, but only after it _seemed_ to run to completion. (The STOP was because I needed to pause execution temporarily to get some work done, as the OS wasn't prioritizing things correctly -- multi-second key echoes in xterm -- even after renice and ionice.) I'm running the decompression now.

~/entries/ByronKnoll/cmix-hp/run$ time ./cmix -e ../../../enwik9 enwik9.comp
78128 bytes -> 415377 bytes in 64.96 s.
200014 bytes -> 1131233 bytes in 157.62 s.
Detected block types: TEXT: 100.0%
progress: 20.62%     
[1]+  Stopped                 ./cmix -e ../../../enwik9 enwik9.comp

real 546m1.611s
user 0m0.000s
sys 0m0.000s
934188796 bytes -> 113552737 bytes in 159918.94 s.
munmap_chunk(): invalid pointer

[1]+  Aborted                 (core dumped) ./cmix -e ../../../enwik9 enwik9.comp


James Bowery

Oct 15, 2021, 2:41:56 PM
to Hutter Prize
Decompression core dumped without completing. So my system still has compatibility problems with the STARLIT-based submissions. But it _is_ interesting that removing the K80 GPU so dramatically delayed the core dumps.

~/entries/ByronKnoll/cmix-hp/run$ time nice ./archive9
78128 bytes -> 415377 bytes in 61.03 s.
progress: 90.31%Segmentation fault (core dumped)

real 2218m42.903s
user 2080m22.265s
sys 134m59.337s
hutterprize@ML:~/entries/ByronKnoll/cmix-hp/run$ ls -altr

-rwxrwxrwx 1 hutterprize hutterprize   113741417 Oct 12 08:12 archive9
-rw-rw-r-- 1 hutterprize hutterprize          12 Oct 12 13:02 test.dat
-rw-rw-r-- 1 hutterprize hutterprize       78128 Oct 12 13:02 .dict.comp_decomp
-rw-rw-r-- 1 hutterprize hutterprize      415377 Oct 12 13:03 .dict_decomp
-rw-rw-r-- 1 hutterprize hutterprize   113552737 Oct 12 13:03 .ready4cmix_decomp
drwxrwxr-x 2 hutterprize hutterprize        4096 Oct 12 13:03 .
-rw-rw-r-- 1 hutterprize hutterprize 14680064001 Oct 14 02:00 ppm.temp
-rw-rw-r-- 1 hutterprize hutterprize   529080320 Oct 14 02:00 .cmix.temp
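To put the shell `time` figures from the failed decompression run into hours, the minutes-and-seconds format can be converted as in this sketch. The helper name `to_seconds` is made up for illustration.

```python
import re


def to_seconds(stamp: str) -> float:
    """Convert a bash `time` value such as '2218m42.903s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp)
    if match is None:
        raise ValueError(f"unrecognised time value: {stamp!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


# Values copied from the transcript above.
real = to_seconds("2218m42.903s")
user = to_seconds("2080m22.265s")
sys_time = to_seconds("134m59.337s")

print(f"elapsed before the crash: {real / 3600:.2f} hours")
print(f"CPU share: {(user + sys_time) / real:.1%}")
```

With the crash reported at 90.31% progress, the elapsed time gives only a rough lower bound on what a full decompression would have taken on that machine.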