CMM fast context mixing compressor


toffer

Apr 23, 2008, 2:37:00 PM
to encode_ru_f...@googlegroups.com


@osmanturan

I'll answer this tomorrow, or maybe this afternoon. Sorry, I'm busy at the moment.

toffer

Apr 24, 2008, 7:57:00 AM
to encode_ru_f...@googlegroups.com


Actually, a NN doesn't learn that quickly, since you use gradient descent (which converges slowly). This can be compensated for a bit with techniques like step-size estimation, smoothing, etc., but it can't compete with second-order algorithms. By combining several models' predictions you compensate for the inaccuracy of some models and the slow learning speed of others. Surprisingly, this mixed prediction is better than the best single model's prediction, but life isn't perfect.
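As a rough illustration of the mixing described above, here is a minimal sketch of a logistic mixer trained by online gradient descent, in the style of PAQ-family compressors. The class and function names (`LogisticMixer`, `stretch`, `squash`) and the learning rate are my own choices, not necessarily what CMM uses:

```python
import math

def stretch(p):
    # logit: map a probability in (0,1) to the real line
    return math.log(p / (1.0 - p))

def squash(x):
    # inverse logit: map back to (0,1)
    return 1.0 / (1.0 + math.exp(-x))

class LogisticMixer:
    """Mix several model probabilities with weights trained by
    online gradient descent (first-order, hence the slow convergence
    mentioned above)."""
    def __init__(self, n, lr=0.02):
        self.w = [0.0] * n   # one weight per model
        self.lr = lr

    def mix(self, probs):
        # combine predictions in the stretched (logit) domain
        self.x = [stretch(p) for p in probs]
        self.p = squash(sum(w * xi for w, xi in zip(self.w, self.x)))
        return self.p

    def update(self, bit):
        # gradient step on coding cost: error is (bit - p)
        err = bit - self.p
        for i in range(len(self.w)):
            self.w[i] += self.lr * err * self.x[i]
```

Over time the weights drift toward whichever model predicts the actual bits better, which is exactly the compensation effect described above: the mix can beat every individual model.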

The network itself learns to represent a nonlinear function of its input data; the complexity of this function is limited by the network structure (so a NN could be approximated by a polynomial). Since the network we use is very simple, we can't describe an ideal input-output mapping.

Altogether:
the models aren't perfect
the prediction mix isn't perfect

Now this is the point where SSE comes in. It maps a prediction and some context to another prediction, so it tries to learn the "ideal" input-output mapping. First take an SSE step which only maps one probability to another: SSE(p0) = p1. That is a simple one-dimensional function. How do we represent it? You sample the function at the points p0(k) = k/N (which map to p1(k)), where N is the number of intervals between vertices, so you can approximate the mapping. This can be further improved by interpolation; for speed reasons a simple linear interpolation, which involves the two vertices p0(k), p0(k+1) nearest to the input probability p0. And, of course, you need to update the SSE function; this is simple, since each vertex can be represented by a bit model.
An SSE step p1 = SSE(p0) is given by:

1. Find the nearest vertices k, k+1 : k = floor(p0/(1/N)) = floor(p0*N)
2. Find the distance of p0 to vertex k : d = p0 % (1/N) (the remainder)
3. Calculate p1 by interpolation, e.g. linear interpolation of the vertex outputs: p1 = p1(k)*((1/N-d)/(1/N)) + p1(k+1)*(d/(1/N))
4. Update the function, e.g. update the two involved vertices (like the bit models you use)
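The four steps above can be sketched as follows. This is a toy version under my own assumptions (table initialised to the identity map, a simple adaptation rate in place of real bit-model counters), not CMM's actual implementation:

```python
class SSE:
    """Piecewise-linear SSE map following the four steps above.
    The vertices partition [0,1]; each vertex holds a probability
    that is adapted like a simple bit model."""
    def __init__(self, n=33):
        self.n = n
        # initialise the vertices on the identity map SSE(p) = p
        self.v = [k / (n - 1) for k in range(n)]

    def predict(self, p0):
        # step 1: nearest vertices k, k+1
        t = p0 * (self.n - 1)
        k = min(int(t), self.n - 2)
        # step 2: fractional distance of p0 from vertex k
        d = t - k
        self.k, self.d = k, d
        # step 3: linear interpolation of the two vertex outputs
        return self.v[k] * (1 - d) + self.v[k + 1] * d

    def update(self, bit, rate=0.05):
        # step 4: update the two involved vertices toward the bit,
        # weighted by their share of the interpolation
        k, d = self.k, self.d
        self.v[k] += rate * (1 - d) * (bit - self.v[k])
        self.v[k + 1] += rate * d * (bit - self.v[k + 1])
```

With the identity initialisation the map starts out neutral (SSE(p0) = p0) and only bends where the observed bits disagree with the input predictions.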

There are several things you can do to improve this mapping: use more than one SSE function and select the "active" function by a context ctx. This makes SSE two-dimensional: p1 = SSE_ctx(p0), i.e. p1 = SSE(p0, ctx). You can also preprocess the input probability, e.g. apply a transform that increases the resolution where it matters (near accurate predictions). You don't need to do this, but you can (have a look at the discussion about SSE with Shelwien).
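Both improvements can be sketched together: one table per context value, indexed uniformly in the stretched (logit) domain so that vertex spacing in probability space is densest near 0 and 1, where the accurate predictions live. The class name, the clipping bound, and the adaptation rate are my own assumptions for illustration:

```python
import math

def stretch(p):
    # logit transform: spreads out probabilities near 0 and 1
    return math.log(p / (1.0 - p))

class StretchedSSE:
    """Context-selected SSE with vertices spaced uniformly in the
    stretch domain (a sketch of the two improvements above)."""
    def __init__(self, n_ctx, n=33, bound=8.0):
        self.n, self.bound = n, bound
        # each table starts on the identity map: vertex k sits at
        # stretch-position s_k and holds probability squash(s_k)
        self.tables = [
            [1.0 / (1.0 + math.exp(-(2 * bound * k / (n - 1) - bound)))
             for k in range(n)]
            for _ in range(n_ctx)
        ]

    def predict(self, p0, ctx):
        v = self.tables[ctx]
        # index by the stretched probability, clipped to [-bound, bound]
        s = max(-self.bound, min(self.bound, stretch(p0)))
        t = (s + self.bound) / (2 * self.bound) * (self.n - 1)
        k = min(int(t), self.n - 2)
        d = t - k
        self.state = (ctx, k, d)
        return v[k] * (1 - d) + v[k + 1] * d

    def update(self, bit, rate=0.05):
        ctx, k, d = self.state
        v = self.tables[ctx]
        v[k] += rate * (1 - d) * (bit - v[k])
        v[k + 1] += rate * d * (bit - v[k + 1])
```

Each context learns its own correction independently, so a context where the input prediction is systematically biased gets remapped without disturbing the others.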

I hope this helps.

Greets
