Begin 30 Day Comment Period On Kaido Orav's fx2-cmix


James Bowery

Sep 3, 2024, 5:56:07 PM
to Hutter Prize
We're now in the 30-day comment and verification period for fx2-cmix, Kaido Orav's submission (sharing credit with Byron Knoll), which has exceeded the 1% improvement award threshold.

Source code is published at:

https://github.com/kaitz/fx2-cmix


1.58585% = 100*(1-110793000/112578322)
% improvement = 100*(1-S/priorS)
110793128 := 441463 + 110351665
S := length(cmix)+length(archive9)
S := length(comp9.exe/zip)+length(archive9.exe)
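As a quick check of the arithmetic above, here is a minimal Python sketch of the % improvement formula. The function and constant names are illustrative only; the byte counts are the ones given in the post. Note that the two components sum to 110793128, while the headline percentage is computed with 110793000.

```python
def improvement_percent(size: int, prior_size: int) -> float:
    """% improvement = 100 * (1 - S / priorS), as defined in the post."""
    return 100.0 * (1.0 - size / prior_size)


PRIOR_S = 112_578_322       # priorS from the formula above
COMP9_ZIP = 441_463         # length(comp9.exe/zip)
ARCHIVE9_EXE = 110_351_665  # length(archive9.exe)

# S is the zipped decompressor plus the self-extracting archive.
S = COMP9_ZIP + ARCHIVE9_EXE

print(S)                                                   # 110793128
print(f"{improvement_percent(110_793_000, PRIOR_S):.5f}")  # 1.58585
print(improvement_percent(S, PRIOR_S) > 1.0)               # True: above the 1% threshold
```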

Submission Description

This submission contains the following major modifications on top of the recent fx-cmix Hutter Prize winner:

  • natural language processing (NLP)
  • online reverse dictionary transform
  • single-pass Wikipedia transform
  • updated order of articles.
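The online reverse dictionary transform can be pictured as a decoder that expands dictionary references back into words as soon as the dictionary itself has been decoded. The sketch below is a hypothetical illustration only: the escape byte `CODE_MARK`, the 2-byte index, the newline-separated dictionary layout and the class name are all assumptions, not fx2-cmix's actual format. It does keep two separate buffers, one for the coded byte stream and one for the reconstructed text.

```python
from typing import List, Optional

CODE_MARK = 0x01  # hypothetical escape byte introducing a dictionary reference


class OnlineReverseDictionary:
    """Hypothetical sketch: expand dictionary codes into words as bytes arrive."""

    def __init__(self) -> None:
        self.words: Optional[List[bytes]] = None  # unknown until the dictionary is decoded
        self.coded = bytearray()   # raw coded bytes, in arrival order
        self.text = bytearray()    # reconstructed (reverse-transformed) text
        self._pending: List[int] = []  # bytes of a code that is not complete yet

    def load_dictionary(self, blob: bytes) -> None:
        # Assumed layout: one word per line.
        self.words = blob.split(b"\n")

    def push(self, byte: int) -> None:
        self.coded.append(byte)
        if self._pending or byte == CODE_MARK:
            self._pending.append(byte)
            if len(self._pending) == 3:  # marker + 2-byte big-endian index
                index = (self._pending[1] << 8) | self._pending[2]
                self._pending.clear()
                if self.words is None:
                    raise ValueError("dictionary code seen before the dictionary was loaded")
                self.text += self.words[index]
        else:
            # Literal byte (this toy format assumes literals never equal CODE_MARK).
            self.text.append(byte)


rdt = OnlineReverseDictionary()
rdt.load_dictionary(b"the\nquick\nfox")
for b in bytes([CODE_MARK, 0, 0]) + b" " + bytes([CODE_MARK, 0, 2]):
    rdt.push(b)
print(rdt.text.decode())  # the fox
```

Keeping the text in its own buffer means a model can condition on the expanded words even though the compressed stream only contains short codes.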
More detailed changes
cmix changes:
  • mixer contexts are more similar to the fxcm mixer contexts.
  • mixers skip the weight update when the error is below a threshold (improves speed).
  • removed the weight regularizer from the mixer (improves speed).
  • the executable binary is smaller due to "simpler" code.
  • Removed 7 indirect nonstationary predictors, 6 match model predictors and 3 mixers. This reduces compression time and, at the same time, allows fxcm to be more complex and slower.
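A mixer that skips its weight update when the error is small, and applies no regularizer, might look like the following floating-point logistic-mixing sketch. The class name, the learning rate, the initial weights and the skip threshold are assumed values; this is not cmix's actual mixer code.

```python
import math
from typing import List


def stretch(p: float) -> float:
    """Logit: map a probability to the logistic domain."""
    return math.log(p / (1.0 - p))


def squash(x: float) -> float:
    """Inverse of stretch."""
    return 1.0 / (1.0 + math.exp(-x))


class SkippingMixer:
    """Hypothetical logistic mixer that skips small updates and has no regularizer."""

    def __init__(self, n_inputs: int, lr: float = 0.02, skip_threshold: float = 0.01) -> None:
        self.weights = [0.3] * n_inputs
        self.lr = lr
        self.skip_threshold = skip_threshold  # assumed value
        self.inputs: List[float] = []
        self.p = 0.5
        self.updates = 0
        self.skips = 0

    def predict(self, probs: List[float]) -> float:
        self.inputs = [stretch(p) for p in probs]
        dot = sum(w * x for w, x in zip(self.weights, self.inputs))
        self.p = squash(dot)
        return self.p

    def update(self, bit: int) -> None:
        err = bit - self.p
        if abs(err) < self.skip_threshold:
            self.skips += 1
            return  # prediction is already close enough: skip the update to save time
        self.updates += 1
        for i, x in enumerate(self.inputs):
            self.weights[i] += self.lr * err * x  # plain gradient step, no regularization term


mixer = SkippingMixer(n_inputs=2)
for _ in range(3000):
    mixer.predict([0.9, 0.5])  # model 0 leans towards 1; model 1 is uninformative
    mixer.update(1)
print(mixer.updates, mixer.skips, round(mixer.p, 3))
```

Once the mixed prediction is within the threshold of the observed bit, each further bit costs only a comparison instead of a pass over all weights, which is where the speed gain comes from.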
fxcm changes:
  • Reverse dictionary transform. The dictionary is loaded as soon as it is found and decompressed. The text has its own buffer, separate from the coded byte-stream buffer.
  • Natural language processing using stemmer (from paq8px(d)).
  • Stemmer has new word types: Article, Conjunction, Adposition, ConjunctiveAdverb.
  • Some word-related contexts change depending on the type of the last word, and some words are removed from the word streams depending on the last word type. This improves compression.
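The idea of conditioning word contexts on the last word type, and of dropping some words from a word stream, can be illustrated with a small Python sketch. The word lists, the `DROP_TYPES` rule and the class name are hypothetical; only the four new word types come from the description, and the real stemmer classifies words far more thoroughly.

```python
from enum import Enum, auto
from typing import List, Tuple


class WordType(Enum):
    OTHER = auto()
    ARTICLE = auto()
    CONJUNCTION = auto()
    ADPOSITION = auto()
    CONJUNCTIVE_ADVERB = auto()


# Tiny hypothetical word lists for illustration only.
WORD_TYPES = {
    "a": WordType.ARTICLE, "an": WordType.ARTICLE, "the": WordType.ARTICLE,
    "and": WordType.CONJUNCTION, "or": WordType.CONJUNCTION, "but": WordType.CONJUNCTION,
    "in": WordType.ADPOSITION, "of": WordType.ADPOSITION, "on": WordType.ADPOSITION,
    "however": WordType.CONJUNCTIVE_ADVERB, "therefore": WordType.CONJUNCTIVE_ADVERB,
}

# Hypothetical rule: words of these types are not pushed onto the word stream.
DROP_TYPES = {WordType.ARTICLE, WordType.ADPOSITION}


def classify(word: str) -> WordType:
    return WORD_TYPES.get(word.lower(), WordType.OTHER)


class WordStream:
    def __init__(self, order: int = 2) -> None:
        self.order = order
        self.history: List[str] = []
        self.last_type = WordType.OTHER

    def add(self, word: str) -> None:
        self.last_type = classify(word)
        if self.last_type not in DROP_TYPES:
            self.history.append(word.lower())

    def context(self) -> Tuple[Tuple[str, ...], WordType]:
        # Last `order` kept words plus the type of the most recent word;
        # a compressor would hash this into a context slot.
        return tuple(self.history[-self.order:]), self.last_type


ws = WordStream()
for w in "The cat sat on the".split():
    ws.add(w)
print(ws.history, ws.last_type.name)  # ['cat', 'sat'] ARTICLE
```

Dropping function words lets a short word context reach back to content words, while the last word type still tells the model that, for example, a noun is likely to follow an article.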
