We're now in the 30-day comment and verification period for fx2-cmix, Kaido Orav's submission sharing credit with Byron Knoll, which has exceeded the 1% improvement award threshold.
Source code is published at:
https://github.com/kaitz/fx2-cmix
1.58585% = 100*(1-110793000/112578322)
% improvement = 100*(1-S/priorS)
110793128 := 441463 + 110351665
S := length(cmix)+length(archive9)
S := length(comp9.exe/zip)+length(archive9.exe)
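
For anyone who wants to check the arithmetic, here is a minimal Python sketch of the formula above. The function and constant names are made up for illustration. Labelling 441463 as the compressor and 110351665 as the archive is an assumption, read off from the order of the terms in S := length(cmix)+length(archive9).

```python
def improvement_percent(new_size: int, prior_size: int) -> float:
    """Percent improvement: 100*(1 - S/priorS)."""
    return 100.0 * (1.0 - new_size / prior_size)


# Sizes in bytes, taken from the lines above. Which term is the compressor
# and which is the archive is an assumption based on the order of the terms.
COMPRESSOR_BYTES = 441463
ARCHIVE_BYTES = 110351665
PRIOR_S = 112578322

S = COMPRESSOR_BYTES + ARCHIVE_BYTES
pct = improvement_percent(S, PRIOR_S)

print(f"S = {S}")
print(f"improvement = {pct:.5f}%")
print(f"exceeds 1% threshold: {pct > 1.0}")
```

With the exact sum the result is roughly 1.5857%. The 1.58585% quoted above comes from using S rounded to 110793000.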
This submission contains the following major modifications on top of the recent fx-cmix Hutter Prize winner:
- NLP (Natural language processing)
- online reverse dictionary transform
- single-pass Wikipedia transform
- updated order of articles.
- mixer contexts are more similar to the fxcm mixer contexts.
- mixers have weight update skipping when error is below threshold (improves speed).
- removed the weight regularizer from the mixer (improves speed).
- executable binary size reduced due to "simpler" code.
- Removed 7 indirect nonstationary predictors, 6 match model predictors, 3 mixers. This improves compression time and at the same time allows fxcm to be more complex and slower.
- Reverse dictionary transform. The dictionary is loaded once it has been found and decompressed. Text has its own buffer, separate from the coded byte stream buffer.
- Natural language processing using a stemmer (from paq8px(d)).
- Stemmer has new word types: Article, Conjunction, Adposition, ConjunctiveAdverb.
- Some word-related contexts change depending on the type of the last word. Some words are removed from the word streams depending on the last word type. This improves compression.
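
To make the "weight update skipping" item above concrete, here is a minimal floating-point sketch of a logistic mixer that leaves its weights untouched when the prediction error is below a threshold. This is not the fx2-cmix code: the class name, learning rate, threshold value and initial weights are all hypothetical, and there is no regularizer term, in line with the list above.

```python
import math


def stretch(p: float) -> float:
    """Logit of a probability, clamped away from 0 and 1."""
    p = min(max(p, 1e-6), 1.0 - 1e-6)
    return math.log(p / (1.0 - p))


def squash(x: float) -> float:
    """Logistic function, with the input clamped to avoid overflow."""
    x = min(max(x, -30.0), 30.0)
    return 1.0 / (1.0 + math.exp(-x))


class SkippingMixer:
    """Hypothetical logistic mixer that skips small-error updates."""

    def __init__(self, n_inputs: int, lr: float = 0.02,
                 skip_threshold: float = 0.01):
        self.weights = [1.0 / n_inputs] * n_inputs
        self.lr = lr                          # assumed learning rate
        self.skip_threshold = skip_threshold  # assumed error threshold
        self.inputs = [0.0] * n_inputs
        self.p = 0.5

    def predict(self, probs):
        """Mix the input probabilities in the stretched domain."""
        self.inputs = [stretch(p) for p in probs]
        dot = sum(w * x for w, x in zip(self.weights, self.inputs))
        self.p = squash(dot)
        return self.p

    def update(self, bit: int) -> bool:
        """Gradient step on coding cost. Returns False if skipped."""
        err = bit - self.p
        if abs(err) < self.skip_threshold:
            return False  # prediction was good enough, save the work
        for i, x in enumerate(self.inputs):
            self.weights[i] += self.lr * err * x
        return True


mixer = SkippingMixer(2, skip_threshold=0.02)
mixer.predict([0.99, 0.99])
print(mixer.update(1))  # small error, update is skipped
mixer.predict([0.99, 0.99])
print(mixer.update(0))  # large error, weights are adjusted
```

The speed gain comes from avoiding the weight loop on well-predicted bits, which are the common case once the models have adapted. Any threshold trades a little compression for speed.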