Re: [rl-list] Understanding tile coding

1,314 views
Skip to first unread message

Johnicholas Hines

unread,
Jun 15, 2012, 10:31:15 AM6/15/12
to rl-...@googlegroups.com
I think before working in continuous domains, getting very clear on how reinforcement learning works in discrete gridworld or perhaps n-armed bandit domains is advantageous.

Tile coding, if I understand it correctly, is a way of moving away from the problematic continuous domain toward the discrete gridworld.

Johnicholas

Barry Nichols

unread,
Jun 15, 2012, 11:22:52 AM6/15/12
to rl-...@googlegroups.com
Hi,

On 15 June 2012 14:27, J. Rosado <jfr...@gmail.com> wrote:
> My problem is that i still can't  understand very well what's the
> objective of tile coding and how does it work. So i'm looking for more
> information and explanations on that.

Tile coding is used as a method of approximating the value function,
it allows the storage of values for all possible states and actions
with a manageable number of parameters (or weights) rather than
requiring a lookup table of size action-space*state-space, and also
allows values of states and actions that have never been visited to be
calculated based on similar states and actions that have been visited
(generalisation).

Tile coding is basically the same as the Albus' CMAC neural network,
so you could read more about it by looking at some papers about CMAC.
Also, Russell Smith's Ph.D. thesis
[http://www.q12.org/phd-thesis.html] contains a chapter on CMAC neural
nets which may be helpful.

Kind regards,
Barry

Davi Carnaúba

unread,
Jun 15, 2012, 12:24:49 PM6/15/12
to rl-...@googlegroups.com
Hi,

In simple words, it works by discretizing the state space in regions called tiles, each tile share the knowledge obtained with your neighbors.

2012/6/15 J. Rosado <jfr...@gmail.com>
Hi:

   Maybe i did not put this on the right words. I understand that it's a way of moving away from the problematic continuous domain, or by other words it's a way to deal with large space state worlds without the course of dimension. What i want to understand is how it works? How does tile coding is able to provide the features parameters so the value (or the state-action) function approximates the real value function? 

Regards,

               J. Rosado 

--
You received this message because you are subscribed to the "Reinforcement Learning Mailing List" group.
To post to this group, send email to rl-...@googlegroups.com
To unsubscribe from this group, send email to
rl-list-u...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/rl-list?hl=en

Francisco Martinez Gil

unread,
Jun 15, 2012, 12:59:13 PM6/15/12
to rl-...@googlegroups.com
Hi,
here is a simplified image of the problem that colud help:

A Rubick's cube is a discretization of a portion of 3D space. Imagine a point P at the center of the Rubick's cube. It is inside a small cube. Now imagine many
Rubick's cubes in different positions and orientations that all of them includes inside (not as in the center of course) the point P. Your point P is inside a small cube of each one of the Rubick's cubes. So you can code your point P in a 3D space as the series of small cubes (one for each Rubick's cube)that includes the point P. Other point near P (lets name it R) will have a similar or equal codification of P. So tile coding codifies with similar (equal) codes the points that are near in your feature space.

In a simple case, if you keep a Q table where each entry is a small cube of each of the Rubick's cube you can estimate the value of point P adding up the values of each small cube that belongs to its codification. (Of course you have a problem if you have not enough memory to keep a value for each small cube). But there is also a solution: Hashing. The problem of the colision is very amminorated if you think that likely your problem is actually represented in the space of features by a low percentage of the whole number of small cubes.

I hope this helps.
I regret my bad english.

regards
Francisco Martinez-Gil

Csaba Szepesvari

unread,
Jun 15, 2012, 9:43:34 PM6/15/12
to rl-...@googlegroups.com, J. Rosado
Hi,

The objective of tile coding, as that of any function approximation
method, is to allow you to approximate closely the functions of
interest. Tile coding is a specific case of linear function
approximation, with binary, sparse features. Because of this, evaluating
the function value is really fast and is independent of the number of
features involved. This is a nice property to have.
If the tiles are initialized randomly (as they are most of the time),
you can think of tile coding as implementing a form of hashing.
Thus, probably the theory of random hash functions is where you could
find most likely an explanation of when to expect tile coding to work
well. My gut feeling is that tile coding will work fine with many
dimensions provided that there is limited interaction between the
dimensions.

Bests,
- Csaba
PS: A formal explanation of tile coding (for what it worth) can be found
in my book:
http://www.ualberta.ca/~szepesva/RLBook.html
See p. 27 of the pdf.

On 12-06-15 7:27 AM, J. Rosado wrote:
> Ok..let me try this again, since it seems my first try was misunderstood :(
>
> I'm doing work and research in the RL area, applied to biped locomotion.
> But all subjects related to AI and RL were new to me. So i pick up
> Richard Sutton book, and start studying by it. I'm now a bit stalled at
> the mountain car example and tile coding. I was able to create the
> matlab code equivalent to the tiles.c code provided by Richard, which i
> attach, so anyone can use it, if they wish and you can also provide
> feedback and possible bugs feedback. My problem is that i still can't
> understand very well what's the objective of tile coding and how does it
> work. So i'm looking for more information and explanations on that.
>
> Regards,
>
> J. Rosado
Reply all
Reply to author
Forward
0 new messages