CMAC and Hashing


Ryan Markley

Jan 15, 2009, 12:25:24 PM
to Reinforcement Learning Mailing List
Hello, I am doing some research related to RL, and I have seen an
example that uses CMAC and hashing to reduce the memory requirements.
I would like to double-check with you whether it is right, since I
would like to use the same approach, but I do not fully understand the idea.

This is what I have seen: we have the attributes, which are the
variables that we choose to represent the states. What they do is,
using an offset for each attribute, keep only the most significant
bits of each attribute. This offset changes between attributes; the
larger the offset, the better the approximation of the attribute.

After that they concatenate all the bits from the different attributes
and XOR the result with a randomly generated constant. This value is
the input to the hashing function, whose output is the index of the
tile. For each tile we do a similar operation, but shifting the
attributes, i.e., changing the order in which we process the
attributes to get the index. Once we have the value of each tile, we
add them up, and we get our Q value.

What I have explained is the procedure for reading from the table. I
hope everybody understands the process; if not, please ask. What I
want to know is: is this what Sutton's code is doing here,
http://www.cs.ualberta.ca/~sutton/tiles.html? Is this the common
approach to reduce the memory requirements? How can this idea work?
And finally, this is the process for reading the Q values, but how
can I write the values back into the table after the SARSA update?

Thank you so much in advance.

Brian Tanner

Jan 15, 2009, 12:57:18 PM
to rl-...@googlegroups.com
Hi Ryan. I'm afraid that I didn't fully understand your explanation
of your understanding of CMAC and hashing.

First, the code on Rich Sutton's page is for tile-coding, which is a
subset and simplification of what a CMAC algorithm does. Perhaps I'm
being pedantic, but I've seen that as a source of confusion before.

I'm not sure if you've read the Sutton and Barto book, but if not, the
tile-coding section might be appropriate. It explains tile-coding and
hashing to some degree. Even after reading it, I personally found the
concept a little hard to get my mind around at first.
http://www.cs.ualberta.ca/~sutton/book/ebook/node88.html#SECTION04232000000000000000

Here is a VERY simplified explanation of how I think of tile-coding.

Tile-coding translates a list of numbers (possibly continuous and
discrete) into a single number. For example, if the list had only one
continuous element, then looking at a particular range of values,
[0,1], tile-coding might create the mapping:
Tile   Range
0      [0, .25]
1      (.25, .5]
2      (.5, .75]
3      (.75, 1.0]

When your observation has a value in one of these ranges, tile-coding
can change it into a discrete number, which you can associate with a
memory location. So, when you care about observation = .1, you get the
value from memory[0]. The value .2 is also associated with memory[0];
however, .3 is associated with memory[1], and so on.
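
To make that concrete, here is a minimal sketch of the single-tiling
lookup in Python (the names tile_width and memory are mine, just for
illustration, and I am glossing over the exact boundary cases of the
ranges above):

tile_width = 0.25
memory = [0.0] * 4  # one stored value per tile

def tile_index(observation):
    # Map a continuous observation in [0, 1] to a discrete tile number.
    return min(int(observation / tile_width), len(memory) - 1)

print(tile_index(0.1))  # 0, so we read memory[0]
print(tile_index(0.2))  # 0, memory[0] again
print(tile_index(0.3))  # 1, memory[1]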

This is a simple discretization. You could do tile-coding multiple
times with different offsets, say 2 times:

Tile   Range
0      [0, .25]
1      (.25, .5]
2      (.5, .75]
3      (.75, 1.0]
4      [-.1, .15]
5      (.15, .4]
6      (.4, .65]
7      (.65, .9]
8      (.9, 1.15]

You see that now every number in [0,1] corresponds to 2 memory
locations. So when you care about observation = .1, you get the values
from memory[0] and memory[4], but .2 is associated with memory[0] and
memory[5]. You can just combine them:
Value(.1) = .5 * memory[0] + .5 * memory[4]
Value(.2) = .5 * memory[0] + .5 * memory[5]
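
Continuing the sketch, the two-tiling version might look like this
(again, made-up names; the second tiling is shifted by -.1 to match
the table above):

tile_width = 0.25
offsets = [0.0, -0.1]      # one offset per tiling
tiles_per_tiling = [4, 5]  # tiles 0-3 and tiles 4-8 from the table
memory = [0.0] * 9

def active_tiles(observation):
    # One tile index per tiling; later tilings are offset into memory.
    indices, base = [], 0
    for offset, n in zip(offsets, tiles_per_tiling):
        i = int((observation - offset) / tile_width)
        indices.append(base + min(max(i, 0), n - 1))
        base += n
    return indices

def value(observation):
    # Average the stored values of the active tiles.
    tiles = active_tiles(observation)
    return sum(memory[t] for t in tiles) / len(tiles)

print(active_tiles(0.1))  # [0, 4]
print(active_tiles(0.2))  # [0, 5]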

The number of times you do this is the number of tilings that you are
using. By using 8, 16, or even more tilings, you can make fine
differentiations between different values of your observations.

The problem is that you don't know what tile number you should start
and end at for each tiling, because you don't actually know the range
of the observations. In the example above the first tiling assigned
tiles {0,1,2,3} to the numbers in [0,1]. What about the number 1.5?
Or 5.5, or 1500.5? To resolve this problem, tile-coding assigns the
tile numbers using an algorithm that spans a very large space, and
then hashes them down to fit in the memory you have.

So, the truth might be that the ranges tile to very large numbers:
Tile         Range
1497453150   [0, .25]
45213        (.25, .5]
112631       (.5, .75]
33           (.75, 1.0]

We would need a very large memory to accommodate memory[1497453150].
So, depending on your actual physical memory size, these tile numbers
are hashed to fit (these are all made-up numbers, by the way):

Tile         MemorySize=1000   MemorySize=100   MemorySize=10
1497453150   975               51               2
112631       776               3                0
45213        5                 46               7
33           913               87               2

See, the algorithm works even if the memory size is only 10. However,
the risk is that some ranges may hash to the same spot in the smaller
memory space. Hashing makes the algorithm work efficiently for any
memory size, but we risk collisions. However, with a reasonably sized
memory and a few tilings, these collisions will not significantly
hamper the agent's performance.
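
The hashing step itself can be as simple as the following sketch
(Python's built-in hash stands in for whatever hash function the real
code uses):

MEMORY_SIZE = 1000  # how many weights you can actually afford

def hashed_index(tile_number, memory_size=MEMORY_SIZE):
    # Fold a huge virtual tile number into the real memory range.
    # Distinct tile numbers can collide in the same slot; with enough
    # memory and several tilings, the occasional collision is harmless.
    return hash(tile_number) % memory_size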

I hope that's helpful...



--
Brian Tanner
Ph.D Student, University of Alberta
br...@tannerpages.com

Rich Sutton

Jan 15, 2009, 1:23:01 PM
to rl-...@googlegroups.com

http://rlai.cs.ualberta.ca/RLAI/RLtoolkit/tiles.html

Csaba Szepesvari

Jan 15, 2009, 2:03:14 PM
to rl-...@googlegroups.com
Hi,

Here is how I think about these methods -- hoping it helps you.

CMAC and tile coding implement linear function approximation:

f(z) = sum_{i=1}^n theta_i phi_i(z).

z is the input, phi_i(z) is the i-th feature value at the input, theta_i
are the tunable parameters, n is the number of features. This determines
the memory requirements (you need to store theta_1,..,theta_n). The
value of n can be quite large.

These methods assume that phi_i is binary valued.
Further, they set up the phi_i in such a way that for any z,
the vector (phi_1(z), ..., phi_n(z)) has only a few of its components
'on' (equal to one); the others are zero.
Let the number of non-zero elements be, say, n(z).
Then, if you knew which elements were non-zero, in order to calculate
f(z) you would only need to sum the corresponding weights.
This is exactly what these methods do. Their computational complexity
scales with n(z).
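
In code, that sparsity trick might look like the following sketch,
where active_features(z) stands in for whatever tile coder returns
the indices i with phi_i(z) = 1:

n = 10000          # number of features
theta = [0.0] * n  # the tunable parameters theta_1, ..., theta_n

def f(z, active_features):
    # Since phi_i(z) is binary and mostly zero, f(z) is just the sum
    # of the weights of the few active features: cost O(n(z)), not O(n).
    return sum(theta[i] for i in active_features(z))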

They also have some non-trivial approximation properties (they are
randomized; is there a relation to compressive sampling?). They do not
leave any point uncovered; in fact, everything is covered fairly
evenly. They are a little bit similar to the ANOVA models of
statistics, I guess, in that higher-order interactions (between
coordinates) are not modeled. This is quite reasonable if you care
about overfitting and about keeping the memory requirements reasonable.

Here I have to admit that I am not aware of any work that has analyzed
the approximation properties of these approximators in a rigorous
manner. In fact, I would appreciate any references on this.

- Csaba

Ryan Markley

Jan 15, 2009, 2:22:32 PM
to Reinforcement Learning Mailing List
Brian, that explanation is perfect; now the Sutton thing makes sense.
I had read that section about 10 times, but it never made complete
sense until now. If possible, you should add that explanation to the
HTML version of the book. But I have a question: we use this to reduce
the size of the table in which we store the Q values because it is a
mechanism of generalization, right? Can you please tell me if the way
to do this is something like the following? For each attribute we do
what you have explained, and then we add every value that we obtain
and get the Q value. Is this correct?

And another question: after the SARSA update we obtain a new Q value;
how do I store it? My main problem is that I have to develop my own
code for the tiling to keep all the computation and memory use under
control, which is why I am interested in understanding the solution
well enough to write my own code. My main concern is the table that
stores the Q values; in my application, as usually happens with RL, I
have a huge number of states.

Regarding my question, that is why I was asking: I had the feeling
that the idea was a totally ad hoc solution and not the common
approach. Here is a picture that shows what I explained:
http://img183.imageshack.us/img183/4391/agtywmvhswlwg9.jpg. Have you
seen this before?


Brian Tanner

Jan 15, 2009, 3:02:36 PM
to rl-...@googlegroups.com
Glad I could be helpful.

I think all of what I said is covered in the link that Rich sent:
http://rlai.cs.ualberta.ca/RLAI/RLtoolkit/tiles.html

I'm not exactly sure about your question, but I would suggest looking
around at some agents that use tile-coding to see how it is done.
Rich has some code here that does this:
http://www.cs.ualberta.ca/~sutton/MountainCar/MountainCar.html
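
In outline (a sketch of mine, not Rich's actual code), the write-back
after a SARSA step just spreads the TD error over the active tiles:

def sarsa_update(theta, tiles, s, a, r, s_next, a_next, alpha, gamma):
    # tiles(s, a) returns the active tile indices for a state-action
    # pair; alpha is the step size (often divided by the number of
    # tilings), and gamma is the discount factor.
    q = sum(theta[i] for i in tiles(s, a))
    q_next = sum(theta[i] for i in tiles(s_next, a_next))
    delta = r + gamma * q_next - q   # TD error
    for i in tiles(s, a):
        theta[i] += alpha * delta    # nudge only the active weights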

Probably the earlier link does too. On the more specific points, and
on your diagram, I cannot help. I have invested time trying to
understand the tile-coding software, and have in the past written my
own (without the hashing), but these days I much prefer to think of
the software as a black box that just gives me answers ;)

--
Brian Tanner
Ph.D Student, University of Alberta
br...@tannerpages.com



