Context modelling

exama...@gmail.com

Aug 1, 2006, 11:16:47 AM

Hi there,

Can you recommend any papers for studying the work done on PAQ-like
context modelling in statistical analysis?

Best,

--
Eray

Matt Mahoney

Aug 1, 2006, 8:31:59 PM

I wrote a technical report describing PAQ6.
https://www.cs.fit.edu/Projects/tech_reports/cs-2005-16.pdf

PAQ7/8 differs mainly in that bit predictions from different models are
mixed using a neural network rather than by adaptive weighted
averaging. Essentially, probabilities are first transformed into the
logistic domain, p -> log(p/(1-p)), mixed linearly, then transformed
back by the inverse function, t -> 1/(1+exp(-t)). The NN is trained to
minimize coding cost rather than RMS error, so the weight update is
different from back propagation, and actually simpler: w -> w + x*(y-p)*a,
where w is the mixing weight, x is the input from a model (its
probability in the logistic domain), p is the output probability, y is
the actual bit (0 or 1), and a is the learning rate. (y-p) is the output
error. The factor p*(1-p) from back propagation is dropped.
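
The mixing step above can be sketched in a few lines of Python. This is
only an illustration, not the paq8f code: the class name LogisticMixer,
the zero initial weights, the learning rate of 0.02 and the clamping
constants are all assumptions made for the example, and x is taken to be
each model's probability after the log(p/(1-p)) transform. The update is
gradient descent on the coding cost -ln P(y), whose gradient with
respect to w is -(y-p)*x.

```python
import math

def stretch(p):
    """Map a probability into the logistic domain: ln(p / (1 - p))."""
    return math.log(p / (1.0 - p))

def squash(t):
    """Inverse of stretch: 1 / (1 + exp(-t)). Input clamped to avoid overflow."""
    t = max(-30.0, min(30.0, t))
    return 1.0 / (1.0 + math.exp(-t))

class LogisticMixer:
    """Hypothetical single-layer mixer; names and defaults are assumptions."""

    def __init__(self, n_inputs, learning_rate=0.02):
        self.weights = [0.0] * n_inputs   # assumed starting point
        self.learning_rate = learning_rate
        self.inputs = [0.0] * n_inputs
        self.p = 0.5

    def predict(self, probs):
        # Keep probabilities away from 0 and 1 so stretch() stays finite.
        self.inputs = [stretch(min(max(q, 1e-6), 1.0 - 1e-6)) for q in probs]
        dot = sum(w * x for w, x in zip(self.weights, self.inputs))
        self.p = squash(dot)
        return self.p

    def update(self, bit):
        # w -> w + x*(y-p)*a, with no p*(1-p) factor.
        err = bit - self.p
        self.weights = [w + x * err * self.learning_rate
                        for w, x in zip(self.weights, self.inputs)]

# Usage: two models, one leaning towards 1 and one towards 0, while the
# actual bits are all 1. The mixer learns to trust the first model.
mixer = LogisticMixer(2)
for _ in range(100):
    mixer.predict([0.9, 0.1])
    mixer.update(1)
```

After the loop the first weight is positive and the second negative, so
the mixed prediction for the same inputs moves above 0.5.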

There are many small improvements as well. I don't have a paper on it,
but the algorithm is described in detail in the comments at the header
of paq8f.cpp.
http://cs.fit.edu/~mmahoney/compression/paq8f.cpp

The latest version is PAQ8H. PAQ8G by Przemyslaw Skibinski adds WRT
dictionary preprocessing, where words are coded as 1-3 byte symbols.
PAQ8H by Alexander Ratushnyak has some improvements to the word model
and secondary contexts to the neural network mixer. These improvements
were not documented.

Some earlier papers:

http://cs.fit.edu/~mmahoney/compression/paq1.pdf
described PAQ1 in 2002. It uses fixed weights that I tuned by hand to
mix model outputs. (I found that the optimal weight is the square of
the context length). The paper is a draft that I never got around to
submitting for publication.
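
As a rough illustration of fixed-weight mixing with weights equal to the
square of the context length: the sketch below assumes each model
supplies a pair of 0 and 1 counts for its context and that the weighted
counts are summed before normalizing. The function name
mix_fixed_weights, the dictionary layout and the small offset eps are
assumptions for the example, not details taken from the PAQ1 draft.

```python
def mix_fixed_weights(counts, eps=0.5):
    """Estimate P(next bit = 1) from per-context bit counts.

    counts maps a context length n to a pair (n0, n1) of observed 0 and 1
    counts in that context. Each pair is weighted by n*n. eps is an
    assumed offset that keeps the estimate defined when all counts are 0.
    """
    s0 = eps
    s1 = eps
    for n, (n0, n1) in counts.items():
        w = n * n          # weight = square of the context length
        s0 += w * n0
        s1 += w * n1
    return s1 / (s0 + s1)

# Usage: the order-1 context has seen ten 0 bits, the order-3 context
# only two 1 bits, yet the longer context dominates the estimate.
p1 = mix_fixed_weights({1: (10, 0), 3: (0, 2)})
```

Here the order-3 counts carry weight 9 against weight 1 for order 1, so
p1 comes out above 0.5 despite the larger number of order-1 observations.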

http://cs.fit.edu/~mmahoney/compression/mmahoney00.pdf
is a 1999 AAAI paper describing a precursor to PAQ that used a neural
network to directly predict the next bit, rather than mix model
outputs. To make it work online, I had to store 0 and 1 counts along
with each weight to get the learning rate right. Later I realized I
could use the 0 and 1 counts directly and dispense with the weights,
resulting in PAQ1.

-- Matt Mahoney

exama...@gmail.com

Aug 2, 2006, 10:12:50 PM

Thanks a lot for the info, Matt. We are immensely interested in this
stuff. I would like to extend these ideas for use in some data mining
tasks.
