On Thursday, April 18, 2013 1:15:43 PM UTC-4, TomH488 wrote:
Sorry for the belated reply:
The neural network platform I'm using is NeuroShell 2 which is fairly basic. They have a black box training method called TurboProp I that simply does not work a fair number of times.
So that leaves Momentum - my only other option.
After a lot of empirical experimentation, needing a method that does not require continuous user tweaking, I settled upon a momentum of .9 (0 to 1 allowed) and the largest learning rate that does not blow up initially, with .01 being typical.
I also did not use sequential case training, but random. So in this case, Momentum acts as a "random perturber" as each case is trained upon.
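For anyone who wants to try this outside NeuroShell 2, here is a minimal sketch of per-case backprop with momentum and random case order. It is not NeuroShell's code: the function names, the single tanh hidden layer with no bias terms, the +-0.3 uniform initial weights, and the use of a fresh random permutation each epoch are all assumptions made for illustration.

```python
import numpy as np

def train_momentum(X, y, n_hidden=8, lr=0.01, momentum=0.9, epochs=10, seed=0):
    """Per-case backprop with momentum; cases are visited in random order."""
    rng = np.random.default_rng(seed)
    # Small uniform initial weights (the +-0.3 range is an assumption).
    W1 = rng.uniform(-0.3, 0.3, (X.shape[1], n_hidden))
    W2 = rng.uniform(-0.3, 0.3, n_hidden)
    V1 = np.zeros_like(W1)  # momentum "velocity" for each weight
    V2 = np.zeros_like(W2)
    for _ in range(epochs):
        # Random (not sequential) presentation of the training cases.
        for i in rng.permutation(len(X)):
            h = np.tanh(X[i] @ W1)          # hidden activations
            err = h @ W2 - y[i]             # output error for this case
            g2 = err * h                    # gradient w.r.t. output weights
            g1 = np.outer(X[i], err * W2 * (1.0 - h ** 2))  # hidden weights
            # Momentum: keep a fraction of the previous step, add the new one.
            V2 = momentum * V2 - lr * g2
            V1 = momentum * V1 - lr * g1
            W2 += V2
            W1 += V1
    return W1, W2

def predict(X, W1, W2):
    """Network output for a batch of cases."""
    return np.tanh(X @ W1) @ W2
```

With momentum at .9 each weight step is a running blend of roughly the last ten per-case gradients, which is why a learning rate as small as .01 still moves the weights at a useful pace.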
Initial weights were typically .3 (+-.3), drawn from a uniform distribution (uniform is the only option).
Then I would use 6 trainings with different random number seeds to "smooth out the results."
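One way to read "smooth out the results" is to average the predictions of the 6 runs, which is what this sketch does. That reading is an assumption, and `fit_random_net` is only a hypothetical stand-in for one training run (a random tanh hidden layer with least-squares output weights), not the NeuroShell procedure; any function that takes a seed and returns a predictor could be dropped in.

```python
import numpy as np

def fit_random_net(X, y, seed, n_hidden=10):
    """Hypothetical stand-in for one training run that depends on the seed."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(-1.0, 1.0, (X.shape[1], n_hidden))  # seed-dependent hidden layer
    beta, *_ = np.linalg.lstsq(np.tanh(X @ W), y, rcond=None)
    return lambda X_new: np.tanh(X_new @ W) @ beta

def seed_ensemble(X, y, X_new, seeds=range(6), fit=fit_random_net):
    """Train once per seed, then return the mean and spread of the predictions."""
    preds = np.stack([fit(X, y, s)(X_new) for s in seeds])
    return preds.mean(axis=0), preds.std(axis=0)
```

The second return value is the per-case standard deviation across seeds, which gives a rough measure of how much the runs disagree on each prediction.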
I figure keeping the momentum really high is like a Poor Man's Simulated Annealing.
I also found that with about 50% more hidden nodes (100) than inputs (70), I get my best results after only 10 epochs of training. Maybe 20-30 epochs max. But any more, chaos really shows up.
I wish I could dither (add fuzz) each training case each time it is processed since I believe this is an extremely robust method to smooth the solution. The fact that the net never sees the same case twice really should help with generalization and not memorization. Further, the "fuzz" sort of sets a slope constraint between case points that should really reduce interpolation haywir-ed-ness (make it smoother!) But unless I write my own code, that isn't going to happen. (Frankly, I think I would be a lot better off using all the code Timothy Masters has published along with his books which I think are grossly under-utilized. What can be better than having the source code? And what a great work to have an active forum on. [Maybe I should try to set one up on Google Groups.])
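If one did write the code, the dithering step itself is small. This is a rough sketch under my own assumptions: the generator name `dithered_cases`, Gaussian noise, and a single noise level `sigma` shared by all inputs are illustrative choices, not anything from NeuroShell or from Masters' code.

```python
import numpy as np

def dithered_cases(X, y, sigma, rng):
    """Yield every case once, in random order, with fresh noise on the inputs."""
    for i in rng.permutation(len(X)):
        # New noise is drawn on every pass, so the net never sees
        # exactly the same input vector twice.
        yield X[i] + rng.normal(0.0, sigma, X.shape[1]), y[i]
```

It would be called once per epoch, for example `for x, target in dithered_cases(X, y, 0.05, rng): ...`, with the per-case weight update inside the loop. The 0.05 is only a placeholder; a sensible `sigma` depends on how the inputs are scaled.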
I'm also way behind the curve on picking lags. Have been looking at cross-correlations of input to output but really need to get an understanding of PACF and its usage. Of course these are all based on linear problems but still, I believe they are relevant. The Big Thing is that if you have an input that affects the output 30 days into the future, you better have a 30 day lag. But as soon as you say that, what if you were using differencing to deal with nonstationarity? Of what use is a difference that is far distant from the output due to a big lag? It's like you need a 30 day difference or just raw input.
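To make the lag question concrete, here is a small sketch of the two pieces mentioned above: the correlation of the input at time t - k with the output at time t, and a k-day difference. The function names are made up for illustration, it measures linear association only, and it covers plain cross-correlation, not PACF.

```python
import numpy as np

def lagged_corr(x, y, max_lag):
    """Correlation between x[t - k] and y[t] for k = 0 .. max_lag."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    out = np.empty(max_lag + 1)
    for k in range(max_lag + 1):
        # Drop the last k inputs and the first k outputs so the pairs line up.
        out[k] = np.corrcoef(x[:len(x) - k], y[k:])[0, 1]
    return out

def k_day_difference(x, k):
    """x[t] - x[t - k]; the result is k samples shorter than x."""
    x = np.asarray(x, dtype=float)
    return x[k:] - x[:-k]
```

A peak in `lagged_corr(x, y, 60)` near k = 30 would point to a 30 day lag, and `k_day_difference(x, 30)` is the matching 30 day difference to try against the raw input.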
Lots of problems and issues. I could keep 10 of me working without end. But my time is a limiting factor.
I am very close actually. If I could simply get smoother results (stock market prediction) between trainings, I would be looking at yachts. Even looked into Kalman filters which are quite amazing and way above my pay grade. And their combination with neural networks is quite amazing. It seems like the weight matrix of a net becomes one of the Kalman matrices! Wow...!
So I'm stuck with my platform and just need to find one more improvement to get the drawdowns under control.
And I'm not even talking about the trading model! HaHaHaaa
Thanks so much, as always, in advance,
Tom