On 09/02/15 01:23, Ajay Talati wrote:
> Hey Alejandro,
>
> sorry for the late reply - my computer's running slow - here's my run
> up to 102 epochs - with training epsilon = 0.05.
>
Looks good, learning faster than my run. Thanks for testing it!
> It seems initializing the network with sd 0.02 gives a small
> performance improvement, especially in the early learning epochs?
>
I don't really know; I haven't done any comparisons.
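For what it's worth, this is roughly what that kind of init looks like in
numpy (just a quick sketch; the layer shapes below are placeholders, not the
actual network):

    import numpy as np

    def init_weights(shape, sd=0.02, rng=np.random):
        # zero-mean Gaussian weights with standard deviation sd
        return rng.normal(loc=0.0, scale=sd, size=shape).astype('float32')

    # placeholder shapes, e.g. a conv filter bank and a fully-connected layer
    conv1_W = init_weights((32, 4, 8, 8))   # filters x channels x height x width
    fc_W = init_weights((2592, 256))
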
> The wrapper's a great idea/tool !!! I'm looking forward to trying it out
> later on - thanks a lot for making it available to us :)
>
> What are your current interests?
>
> a) fiddling with the screen/observations/network inputs, network
> initialization, and the stochastic optimizer? Slow to experiment with!!!
> b) fiddling with the RL parameters, discount and epsilon stuff?
>
> Fiddling with a) is what I'll try and make some progress on :))))
>
I don't currently have much time for proper thinking, which is why I'm
doing just enough to keep the GPU busy. I'm going to tweak the discount
rate and try some other games for now. I haven't seen it learn much on
anything other than Pong and Breakout, although there was definitely
some learning going on in Enduro. Mainly I'm trying to give Nathan data
so he can tell what's working and what's not. (I also just enjoy
watching it play.)
Once all the results from the DeepMind paper are replicated, there are
some simple ideas that I'd like to test (variable discount rate, deeper
network, context-sensitive epsilon). Also, getting it to use the cuDNN
code might be helpful and shouldn't be too hard.
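By context-sensitive epsilon I mean something along these lines (a toy
sketch only; the "close Q-values" rule and the numbers are made up):

    import numpy as np

    def select_action(q_values, epsilon, rng=np.random):
        # standard epsilon-greedy: random action with prob epsilon, else greedy
        if rng.uniform() < epsilon:
            return rng.randint(len(q_values))
        return int(np.argmax(q_values))

    def context_epsilon(q_values, base=0.05, unsure=0.2, margin=0.1):
        # explore more when the top two Q-values are nearly tied (made-up rule)
        top_two = np.sort(q_values)[-2:]
        return unsure if (top_two[1] - top_two[0]) < margin else base

    q = np.array([0.10, 0.40, 0.35])
    action = select_action(q, context_epsilon(q))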