Thomas,
Thanks for your help; your information was very useful. I have a few follow-up questions now that my program is partly working.
First, I'm having a problem with TDLambda where the values in the v vector grow without bound. I have a simple example in which the computer navigates an 8x8 grid and tries to learn the fastest route to a specific position. Running this experiment with the TD code, or with TDLambda where lambda=0, shows that the computer very quickly learns the quickest path to the goal. However, when I increase lambda, the computer wanders around without noticeably learning, and the values in the v vector grow toward infinity at a rate that depends partly on lambda. Do you have any idea what could cause this behavior? I am not passing an eligibility trace and am leaving that argument out of the constructor; my understanding is that traces are improvements to the TDLambda algorithm but are not required.
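For concreteness, here is a minimal sketch of the textbook linear TD(lambda) update with an accumulating eligibility trace, run on a tiny corridor instead of the 8x8 grid. It is a generic illustration and not the library's code: the function names, the corridor task, and the parameter values (alpha=0.1, gamma=0.9, lambda=0.8) are all assumptions. In this formulation the trace vector e is the thing lambda acts on, and it is cleared at the start of every episode.

```python
def one_hot(index, size):
    """Return a feature vector with a single 1.0 at position index."""
    x = [0.0] * size
    x[index] = 1.0
    return x


def td_lambda_step(v, e, x_t, x_tp1, reward, alpha, gamma, lam):
    """Apply one linear TD(lambda) update in place and return the TD error.

    v is the weight vector, e the accumulating eligibility trace.
    """
    value_t = sum(w * f for w, f in zip(v, x_t))
    value_tp1 = sum(w * f for w, f in zip(v, x_tp1))
    delta = reward + gamma * value_tp1 - value_t
    for i in range(len(v)):
        e[i] = gamma * lam * e[i] + x_t[i]  # decay the trace, then add x_t
        v[i] += alpha * delta * e[i]
    return delta


def run_corridor(n_states=5, episodes=1000, alpha=0.1, gamma=0.9, lam=0.8):
    """Hypothetical task: always step right; reward 1.0 on leaving the last state."""
    v = [0.0] * n_states
    terminal = [0.0] * n_states  # the terminal state has all-zero features
    for _ in range(episodes):
        e = [0.0] * n_states  # the trace is reset at the start of each episode
        for s in range(n_states):
            last = s == n_states - 1
            x_t = one_hot(s, n_states)
            x_tp1 = terminal if last else one_hot(s + 1, n_states)
            reward = 1.0 if last else 0.0
            td_lambda_step(v, e, x_t, x_tp1, reward, alpha, gamma, lam)
    return v


values = run_corridor()
# With these settings the learned value of state s settles near gamma ** (4 - s).
print([round(w, 3) for w in values])
```

In this sketch the weights stay bounded for lambda > 0, so comparing its trace handling and step size against the real setup may help narrow down where the two differ.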
Second, I'd like clarification on the nbFeatures argument. My end goal is to use TDLambda to play checkers, so the state of my system can be described as an array of 64 values. This is what I was using for x_t and x_tp1. However, after watching the v vector update (even with lambda=0), I'm wondering if I misunderstood. Should nbFeatures instead be the number of possible states of the board? Currently, with nbFeatures=64 and lambda=0, the values in the v vector still grow to infinity over time. This appears to happen because most pieces don't move when the board state changes, so the values at their corresponding positions in v keep increasing.
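One possible contributor to the growth, offered as a sketch and not a diagnosis, is the step size relative to the number of active features. With a one-hot grid state exactly one feature is active, but a dense board vector has many nonzero entries, and a linear update of the form v += alpha * delta * x overshoots once alpha times the squared norm of x exceeds 2. The example below strips the problem down to repeated updates toward a fixed target (no bootstrapping); the board with 24 occupied squares encoded as 1.0 and all function names are assumptions for illustration.

```python
def one_hot(index, size):
    """Feature vector for a tabular state: a single active feature."""
    x = [0.0] * size
    x[index] = 1.0
    return x


def sq_norm(x):
    """Squared Euclidean norm of a feature vector."""
    return sum(f * f for f in x)


def predict(v, x):
    """Linear value estimate: dot product of weights and features."""
    return sum(w * f for w, f in zip(v, x))


def normalized_alpha(alpha, x):
    """Scale the step size by the squared feature norm (unchanged if x is all zeros)."""
    n = sq_norm(x)
    return alpha / n if n > 0 else alpha


def run_updates(x, target, alpha, steps):
    """Repeat v += alpha * (target - v.x) * x and return the final prediction."""
    v = [0.0] * len(x)
    for _ in range(steps):
        delta = target - predict(v, x)
        for i, f in enumerate(x):
            v[i] += alpha * delta * f
    return predict(v, x)


grid_x = one_hot(10, 64)            # one active feature, squared norm 1
board_x = [1.0] * 24 + [0.0] * 40   # hypothetical board: 24 occupied squares

# The same alpha is stable for the one-hot state but not for the dense board.
grid_pred = run_updates(grid_x, 1.0, 0.1, 50)
raw_pred = run_updates(board_x, 1.0, 0.1, 50)
scaled_pred = run_updates(board_x, 1.0, normalized_alpha(0.1, board_x), 50)
print(grid_pred, raw_pred, scaled_pred)
```

Here the error is multiplied by (1 - alpha * sq_norm(x)) on every update, so the unscaled dense case oscillates with growing magnitude while the scaled case approaches the target. Whether this is what happens in the checkers setup depends on the actual alpha and encoding, which this sketch does not know.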
Lastly, I have been saving v to a binary file after every iteration of the algorithm during a game, with the idea that I could shut down the program and later restart it where the last run left off. However, I don't see any way to set v, since it is protected. Am I right that this would let me pick up learning where an earlier run left off, and if so, is there any way to do this without altering your code?
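The save-and-restore idea itself can be sketched independently of the library. The example below writes a weight vector to a binary file as raw doubles and reads it back; the file name and function names are hypothetical, and it says nothing about how (or whether) the restored values can be handed back to the TDLambda object, which is the open question above.

```python
import os
import tempfile
from array import array


def save_weights(path, v):
    """Write the weight vector to path as raw 8-byte doubles."""
    with open(path, "wb") as f:
        array("d", v).tofile(f)


def load_weights(path, n):
    """Read n doubles from path; raises EOFError if the file holds fewer."""
    a = array("d")
    with open(path, "rb") as f:
        a.fromfile(f, n)
    return a.tolist()


# Round trip through a temporary directory (hypothetical file name).
original = [0.25, -1.5, 3.0, 0.0]
with tempfile.TemporaryDirectory() as tmp:
    weights_path = os.path.join(tmp, "v.bin")
    save_weights(weights_path, original)
    restored = load_weights(weights_path, len(original))

print(restored == original)
```

The raw format uses the machine's native byte order, so a file written this way is only safe to reload on the same kind of machine. If the trace is reset at the start of each episode, saving v at episode boundaries should be enough to resume; saving mid-episode would also need the trace.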
Thanks for all your help,
Ryan