toffer
Apr 24, 2008, 7:57:00 AM
to encode_ru_f...@googlegroups.com
Actually an NN doesn't learn that quickly, since it uses gradient descent (which converges slowly). This can be compensated for somewhat by techniques like step-size estimation, smoothing, etc., but it can't compete with second-order algorithms. By combining several models' predictions you compensate for the inaccuracy of some models and the slow learning speed of others. Surprisingly, this mixed prediction is better than the best single model's prediction, but life isn't perfect.
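To make the mixing idea concrete, here is a minimal sketch of combining two models' probabilities with weights adjusted by gradient descent. The function names, the squared-error objective, and the learning rate are my own illustrative choices, not a specific compressor's code:

```python
# Hypothetical sketch: mix two model predictions with weights
# trained by gradient descent (all names/constants are my own).

def mix(p, w):
    """Weighted sum of the models' probabilities, clamped to (0, 1)."""
    s = sum(wi * pi for wi, pi in zip(w, p))
    return min(max(s, 0.001), 0.999)

def update(p, w, bit, lr=0.02):
    """One gradient-descent step on the squared error (mix - bit)^2:
    each weight moves proportionally to its model's prediction."""
    err = bit - mix(p, w)
    return [wi + lr * err * pi for wi, pi in zip(w, p)]
```

After seeing a few bits, the weight of the model that predicted them better grows relative to the other, which is the slow convergence the paragraph above refers to.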
The network itself learns to represent a nonlinear function of its input data; the complexity of this function is limited by the network structure (so an NN could be approximated by a polynomial). Since the network we use is very simple, we can't describe an ideal input-output mapping.
Altogether:
the models aren't perfect
the prediction mix isn't perfect
Now this is the point where SSE comes in. It maps a prediction and some context to another prediction, so it tries to learn the "ideal" input-output mapping. First take an SSE step which only maps one probability to another: SSE(p0) = p1. That is a simple one-dimensional function. How do we represent it? You sample this function at the points p0(k) = k/N (which map to p1(k)), where N is the total number of vertices, so you can approximate the mapping. This can be further improved by interpolation; for speed reasons, a simple linear interpolation involving the two vertices p0(k), p0(k+1) nearest to the input probability p0. And, of course, you need to update the SSE function; this is simple, since each vertex can be represented by a bit model.
An SSE step p1 = SSE(p0) is given by:
1. find the nearest vertices k, k+1 : k = floor(p0/(1/N))
2. find the distance of p0 to vertex k : d = p0 mod (1/N) (the remainder)
3. calculate p1 by interpolation, e.g. linear interpolation between the stored vertex outputs: p1 = p1(k)*((1/N-d)/(1/N)) + p1(k+1)*(d/(1/N))
4. update the function, e.g. update the two involved vertices (like the bit models you use)
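The four steps above can be sketched as follows. The table size N, the initial identity mapping, and the update rate are my own assumptions; a real implementation would use fixed-point arithmetic and counter-style bit models instead:

```python
# Minimal SSE sketch following steps 1-4 above.
# N intervals => N+1 vertices; sse[k] stores p1(k).
N = 32
sse = [k / N for k in range(N + 1)]   # start as the identity map (assumption)

def sse_apply(p0):
    """Steps 1-3: locate the two nearest vertices, interpolate linearly."""
    k = min(int(p0 * N), N - 1)        # step 1: k = floor(p0 / (1/N))
    d = p0 * N - k                     # step 2: remainder, scaled to [0, 1)
    return sse[k] * (1 - d) + sse[k + 1] * d   # step 3: interpolate p1(k), p1(k+1)

def sse_update(p0, bit, rate=0.05):
    """Step 4: nudge the two involved vertices toward the observed bit,
    just like updating two simple bit models."""
    k = min(int(p0 * N), N - 1)
    sse[k]     += rate * (bit - sse[k])
    sse[k + 1] += rate * (bit - sse[k + 1])
```

Starting from the identity map, sse_apply returns p0 unchanged; as sse_update sees bits, the vertices around frequently-used inputs bend the curve toward the observed statistics.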
There are several things you can do to improve this mapping: use more than just one SSE function and select the "active" function by a context ctx. This makes SSE two-dimensional: p1 = SSE_ctx(p0), i.e. p1 = SSE(p0, ctx). You can also preprocess the input probability, e.g. apply a transform that increases the resolution where it's important (near accurate predictions). You don't need to do this, but you can (have a look at the discussion about SSE with Shelwien).
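Both refinements can be sketched together: one table per context, indexed along a logit ("stretch") transform of p0 so the vertices are denser near 0 and 1, where predictions are most confident. The table count, size, and clamping limit are my own assumptions:

```python
import math

# Sketch of context selection plus an input transform (assumptions:
# 4 contexts, 32 intervals, logit clamped to [-8, 8]).
N, NCTX = 32, 4
tables = [[k / N for k in range(N + 1)] for _ in range(NCTX)]

def stretch(p):
    """Logit transform: spreads out probabilities near 0 and 1."""
    return math.log(p / (1 - p))

def sse2(p0, ctx, lim=8.0):
    """p1 = SSE(p0, ctx): pick the table by ctx, index by stretch(p0)."""
    x = (stretch(p0) + lim) / (2 * lim)   # map stretched value to [0, 1]
    x = min(max(x, 0.0), 1.0)
    k = min(int(x * N), N - 1)
    d = x * N - k
    t = tables[ctx]
    return t[k] * (1 - d) + t[k + 1] * d
```

With this indexing, the same number of vertices gives finer resolution for p0 near 0 or 1 than a uniform grid over [0, 1] would.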
I hope this helps.
Greets