Thanks for the quick response.
I know that 32 tilings is quite a large number, which locates the x,y position in the search space with adequate precision.
However, because more features will be added later on, I thought I would keep the number of tilings constant and alter other parameters (number of tiles, action selection, gamma, etc.).
As for the scaling, Sutton mentions at http://incompleteideas.net/rlai.cs.ualberta.ca/RLAI/RLtoolkit/tiles.html that the number of tiles is not restricted to any region and extends over the infinite plane; it is restricted by the memory size.
My question is: why is the evaluation result approximately the same when I change only the memory size (between 512 and 100000)?
I ran some experiments today with memory size < 128 and 4 tilings. There were many collisions among the returned tile indices and the agent couldn't learn.
Also, as Sutton mentions in the same reference, to achieve a wrap-around effect the double values should be divided by the range and the tiles function called with the remainder.
I ran 50 experiments today wrapping around the values and passing the remainder to the tiles function, and another 50 experiments without wrapping around. The results were approximately the same in terms of cumulative reward and final evaluation reward.
Shouldn't wrapping around the values produce better results?
Finally, as far as I know, the tiles function by default assumes a generalisation of 1, i.e. the tile width is 1.
If someone wants to achieve better generalisation, should they divide the double values by the desired tile width before calling the tiles function?
Thanks,
Simos
As I mentioned previously, if the memory size (maximum length of
weight vector) is large enough to avoid hashing collisions, increasing
it will not change the performance. In your case 512 is large enough,
therefore, increasing it to 10^5 will not improve performance, only
force you to have an array of weights of length 10^5 instead of only
512.
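The point about memory size can be checked with a small sketch. This is not the RLAI tiles code: tile_key and count_collisions are hypothetical helpers, and Python's built-in hash() stands in for the real hash function. It counts how many distinct tiles end up sharing a weight slot for several memory sizes.

```python
import math

def tile_key(x, y, tiling, num_tilings):
    """Hypothetical tile identifier: tiling number plus quantised coordinates."""
    offset = tiling / num_tilings  # assumption: each tiling is shifted by a fraction of a tile
    return (tiling, math.floor(x + offset), math.floor(y + offset))

def count_collisions(states, num_tilings, memory_size):
    """Number of distinct tiles that lose their own slot after hashing into memory_size slots."""
    tiles = {tile_key(x, y, t, num_tilings)
             for (x, y) in states
             for t in range(num_tilings)}
    slots = {hash(key) % memory_size for key in tiles}
    return len(tiles) - len(slots)

# A small 2D state space sampled on a grid (illustrative values only).
states = [(i * 0.5, j * 0.5) for i in range(10) for j in range(10)]

for memory_size in (16, 128, 512, 100000):
    print(memory_size, count_collisions(states, 4, memory_size))
```

Once the collision count is at or near zero, a larger memory size only makes the weight array longer; the set of tiles that share weights no longer changes.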
Wrap around will only give improvements if your problem involves some
form of wrap around (or repetition), e.g. an angle which is the same at
0 as at 2PI and 4PI. In cases such as this, you would not want to learn
a repeating value function at every multiple of 2PI; instead you would
want to use the wrap around to re-use the same values regardless,
i.e. use the angle modulo 2PI as the input rather than the value of the
angle directly (the range is 2PI).
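A minimal sketch of the modulus idea, assuming the input is an angle in radians. wrap_angle is a hypothetical helper, not part of the tiles software; its result is what would be passed on to the tile coder.

```python
import math

def wrap_angle(theta):
    """Map an angle onto [0, 2*pi) so that equivalent angles give the same input."""
    return theta % (2 * math.pi)

# The same physical angle expressed three ways maps to (nearly) the same value.
for theta in (0.5, 0.5 + 2 * math.pi, 0.5 + 4 * math.pi):
    print(wrap_angle(theta))
```

If none of your state variables repeats like this, wrapping gives the learner nothing to share, which would be consistent with the two sets of 50 runs looking the same.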
Dividing your values is the method used in this implementation to
adjust the width of the individual tiles. If you make them smaller you
can achieve more accuracy at the cost of generalisation, because fewer
input values activate any given tile.
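As a rough illustration of the scaling, here is a single tiling with unit-width tiles. tile_index is a hypothetical helper; the real tile coder also applies offsets and hashing, which are left out here.

```python
import math

def tile_index(value, tile_width):
    """Divide by the desired tile width, then quantise to an integer tile number."""
    return math.floor(value / tile_width)

values = [0.1, 0.4, 0.9]

# Width 1.0: all three values share one tile (more generalisation).
print([tile_index(v, 1.0) for v in values])   # [0, 0, 0]

# Width 0.25: the values fall into different tiles (more accuracy).
print([tile_index(v, 0.25) for v in values])  # [0, 1, 3]
```

Dividing by a number larger than 1 does the opposite: it widens the tiles and increases generalisation.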
tiles."
No, you don't need to take the range into account; the tiles aren't
numbered.
The weight indices are calculated by quantising the floating point
inputs to integer values and passing these integers to the hash
function.
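The quantise-then-hash step might look roughly like the sketch below. It is an assumption-laden illustration, not the actual implementation: weight_indices is a hypothetical name, the per-tiling offset scheme is simplified, and Python's hash() stands in for the real hash function.

```python
import math

def weight_indices(floats, num_tilings, memory_size):
    """Return one weight index per tiling for a list of floating point inputs."""
    indices = []
    for tiling in range(num_tilings):
        offset = tiling / num_tilings  # assumption: tilings are shifted uniformly
        quantised = tuple(math.floor(v + offset) for v in floats)
        # The tiling number is part of the key so tilings do not share weights.
        indices.append(hash((tiling,) + quantised) % memory_size)
    return indices

# Nearby inputs inside the same tiles produce the same indices.
print(weight_indices([0.10, 0.20], num_tilings=4, memory_size=512))
print(weight_indices([0.15, 0.20], num_tilings=4, memory_size=512))

# No range is needed: very large or negative inputs are handled the same way.
print(weight_indices([1e6 + 0.3, -42.7], num_tilings=4, memory_size=512))
```

Because the integers go straight into the hash, nothing in the computation depends on a minimum or maximum input value.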