Hi Ryan. I'm afraid that I didn't fully understand your explanation
of your understanding of CMAC and hashing.
First, the code on Rich Sutton's page is for tile-coding, which is a
subset and simplification of what a CMAC algorithm does. Perhaps I'm
being pedantic, but I've seen that as a source of confusion before.
I'm not sure if you've read the Sutton and Barto book, but if not, the
tile-coding section might be appropriate. It explains tile-coding and
hashing to some degree. Even after reading it, I personally found the
concept a little hard to get my mind around at first.
http://www.cs.ualberta.ca/~sutton/book/ebook/node88.html#SECTION04232000000000000000
Here is a VERY simplified explanation of how I think of tile-coding.
Tile-coding translates a list of numbers (possibly continuous and
discrete) into a single number. For example, of the list had only 1
continuous element, then looking at a particular range of values
[0,1], tile coding might create mapping:
Tile Range
0 [0, .25]
1 (.25, .5]
2 (.5, .75]
3 (.75,1.0]
When your observation has a value in one of these ranges, tile-coding
can change it into a discrete number, which you can associate with a
memory location. So, when you care about observation=.1, you get the
value from memory[0]. .2 also is associated with memory[0].
However, .3 is associated with memory[1], etc.
This is a simple discretization. You could do tile-coding multiple
times with different offsets, say 2 times.
Tile Range
0 [0, .25]
1 (.25, .5]
2 (.5, .75]
3 (.75,1.0]
4 [-.1, .15]
5 (.15, .4]
6 (.4, .65]
7 (.65, .9]
8 (.9, 1.15]
You see now every number in [0,1] corresponds to 2 memory locations.
So when you care about observation=.1, you get the value from
memory[0] and memory[4]. But .2 is associated with memory[0] and
memory[5]. You can just combine them:
Value(.1) = .5 * memory[0] + .5 * memory[4]
Value(.2) = .5 * memory[0] + .5 * memory[5]
The number of times you do this is the number of tilings that you are
using. By using 8, 16, or even more tilings, you can make fine
differentiations between different values of your observations.
The problem is that you don't know what tile number you should start
and end at for each tiling, because you don't actually know the range
of the observations. In the example above the first tiling assigned
tiles {0,1,2,3} to the numbers in [0,1]. What about the number 1.5?
Or 5.5, or 1500.5? To resolve this problem, tile-coding assigns the
tile numbers using an algorithm that spans a very large space, and
then hashes them down to fit in the memory you have.
So, the truth might be that the ranges tile to very large numbers:
Tile Range
1497453150 [0, .25]
45213 (.25, .5]
112631 (.5, .75]
33 (.75,1.0]
We would need a very large number to accommodate memory[1497453150].
So, depending on your actual physical memory size, these tile numbers
are hashed to fit (these are all made up numbers by the way):
Tile MemorySize=1000 MemorySize=100 MemorySize=10
1497453150 9750 501 2
112631 776 3 0
45213 5 46 7
33 913 87 2
See, the algorithm works even if the memory size is only 10. However,
the risk is that some ranges my hash to the same spot in the smaller
memory space. Hashing makes the algorithm work efficiently for any
memory size, but we risk collisions. However, with a reasonable sized
memory, and a few tilings, these collisions will not significantly
hamper the agent's performance.
I hope that's helpful...
--
Brian Tanner
Ph.D Student, University of Alberta
br...@tannerpages.com