Begin 30 Day Comment Period On Kaido Orav's fx-cmix

James Bowery

Dec 28, 2023, 11:26:06 AM
to Hutter Prize
We're now in the 30 day comment and verification period for Kaido Orav's submission called fx-cmix, which has exceeded the 1% improvement award threshold.

Source code is published at:

Improvement:

1.382% = 100*(1-112578322/114156155)
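
As a quick check of the arithmetic above, here is a minimal Python sketch. The two byte counts are copied from the formula in the post; the constant names are made up for illustration, and reading the denominator as the previous record's total size is an assumption.

```python
# Byte counts copied from the formula in the post.
# Assumption: the denominator is the previous record's total size.
PREVIOUS_RECORD_BYTES = 114156155
FX_CMIX_BYTES = 112578322
AWARD_THRESHOLD_PERCENT = 1.0  # the 1% award threshold mentioned in the post


def improvement_percent(new_size, old_size):
    """Relative size reduction of new_size versus old_size, in percent."""
    return 100 * (1 - new_size / old_size)


improvement = improvement_percent(FX_CMIX_BYTES, PREVIOUS_RECORD_BYTES)
print(f"{improvement:.3f}%")                  # prints 1.382%
print(improvement > AWARD_THRESHOLD_PERCENT)  # prints True
```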

fx-cmix is an updated implementation of fast-cmix.

Submission Description

This submission contains the following modifications on top of the recent fast-cmix Hutter Prize winner:

  • paq8hp model is replaced with fxcmv1 model with the following notable additions:
    • Multiple state tables are used in predictors; this improves prediction.
    • Most contexts are divided between 30 main predictors; this allows more efficient memory usage per context.
    • Added bracket, quote, first char, char in paragraph, column, table, template, and word stream/paragraph contexts. These contexts are parsed depending on the input; this includes parsing of wiki links, http links, tables, columns, paragraphs, quotes, brackets, lists, and templates.
    • Some contexts are swapped in predictors depending on what the current input is (table, column mode, word/paragraph, list).
    • Some predictors are switched on/off depending on the last char, link, or current bracket, which improves compression.
    • Predictions are mixed with contexts that are more aware of what the predictors are outputting.
    • Match model (not present in paq8hp model).
    • Predictors are faster, allowing more complex contexts.
  • new dictionary
  • small change in phda9 preprocessor and in two tables in cmix
  • memory usage is larger in fxcmv1 model compared to old paq8hp
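
For readers unfamiliar with the term, the match model idea from the list above can be sketched in a few lines of Python. This is a toy byte-level illustration of the general technique, not code from fx-cmix or fxcmv1: the class name, the `MIN_LEN` value, and the dictionary lookup are all assumptions made for the example. Real PAQ-family match models predict bits and weight the prediction by match length.

```python
class MatchModel:
    """Toy byte-level match model: predicts the byte that followed the
    most recent earlier occurrence of the current MIN_LEN-byte context."""

    MIN_LEN = 4  # hypothetical context length, not taken from fx-cmix

    def __init__(self):
        self.history = bytearray()
        self.last_seen = {}    # context bytes -> index just past that context
        self.match_ptr = None  # index in history of the predicted next byte

    def predict(self):
        """Return the predicted next byte, or None when there is no match."""
        if self.match_ptr is None:
            return None
        return self.history[self.match_ptr]

    def update(self, byte):
        """Append the actual byte and re-aim the match pointer."""
        # Follow the current match if it predicted correctly, else drop it.
        if self.match_ptr is not None and self.history[self.match_ptr] == byte:
            self.match_ptr += 1
        else:
            self.match_ptr = None
        self.history.append(byte)
        if len(self.history) >= self.MIN_LEN:
            ctx = bytes(self.history[-self.MIN_LEN:])
            if self.match_ptr is None:
                # Look up where this context was last seen, if anywhere.
                self.match_ptr = self.last_seen.get(ctx)
            self.last_seen[ctx] = len(self.history)


def count_hits(data):
    """Count how many bytes of data the toy model predicts correctly."""
    model = MatchModel()
    hits = 0
    for b in data:
        if model.predict() == b:
            hits += 1
        model.update(b)
    return hits
```

On repetitive input such as `b"the cat sat. the cat sat."` the model starts predicting once the four-byte context `"the "` recurs, which is why this kind of model helps on text with long repeated strings.
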
Replication on the recommended Google Cloud VM takes about 64*2 hours:

Machine configuration
Machine type: c2-standard-4
CPU platform: Intel Cascade Lake
Minimum CPU platform: None
Architecture: x86/64
vCPUs to core ratio: 1 vCPU per core
Custom visible cores: -
Display device: Disabled
GPUs: None

Boot disk
Image: ubuntu-2004-focal-v20231130
Description: Canonical, Ubuntu, 20.04 LTS, amd64 focal image

RAM: 16GB
HDD: 50GB
CPU: Intel(R) Xeon(R) CPU @ 3.10GHz
https://browser.geekbench.com/v5/cpu/22027847/claim?key=339176

Matt Mahoney

Jan 17, 2024, 1:47:03 PM
to hutter...@googlegroups.com
I posted test results for the self extracting archive to
https://encode.su/threads/4161-fx-cmix-(HP)?p=81864&viewfull=1#post81864
Unfortunately it is too slow on my test machine: extraction took 60
hours and needs to be under 50. The extracted output was identical to
enwik9, and memory use was within the 10 GB limit at 8.869 GB maximum
resident set size.

--
-- Matt Mahoney, mattma...@gmail.com

James Bowery

Jan 19, 2024, 9:28:32 AM
to Hutter Prize
This is in adjudication pending interpretation of the time limit as wall clock time or CPU time. My interpretation is CPU time, which is why I made the announcement. Matt is open to having Marcus arbitrate this, but Marcus is quite busy for obvious reasons.
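
For anyone who wants to compare the two interpretations on their own machine, here is a minimal Python sketch that reports wall clock time, CPU time (user plus system), and peak resident set size for a child process. It is not the procedure used in the official test. The `measure` helper is a hypothetical name, it relies on the Unix-only `resource` module, and it assumes Linux, where `ru_maxrss` is reported in kilobytes.

```python
import resource
import subprocess
import sys
import time


def measure(cmd):
    """Run cmd and return (wall_seconds, cpu_seconds, max_rss_kib)."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    wall = time.perf_counter() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    # CPU time = user + system time consumed by the child process.
    cpu = (after.ru_utime - before.ru_utime) + (after.ru_stime - before.ru_stime)
    # ru_maxrss is the peak over all waited-for children (KiB on Linux).
    return wall, cpu, after.ru_maxrss


if __name__ == "__main__":
    # Stand-in workload: a child that mostly sleeps, so wall time is
    # far larger than CPU time.
    demo = [sys.executable, "-c", "import time; time.sleep(1)"]
    wall, cpu, rss = measure(demo)
    print(f"wall={wall:.2f}s cpu={cpu:.2f}s max_rss={rss} KiB")
```

A process that waits on I/O or is descheduled accumulates wall clock time without accumulating CPU time, which is how the two readings of the limit can disagree.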

James Bowery

Jan 20, 2024, 6:40:09 PM
to Hutter Prize
Marcus is permitting fx-cmix because the wall clock time does not greatly exceed the allowable time and the CPU time was well below the allowable time.