TileCoding by Sutton

64 views
Skip to first unread message

Simos Gerasimou

unread,
Jul 10, 2011, 7:38:46 PM7/10/11
to rl-li...@googlegroups.com
Hello RL people,

I have a question about TIleCoding source code provided by Sutton at Sutton Tile Coding.
I know the background theory for Tile Coding Function Approximation, how it works and how it is used in order to divide the continuous space into distinct tiles.
The notes mention that "the input variables will be gridded at unit intervals, so generalisation will be by 1 in each direction. Any scaling will have to be done externally before calling the function tiles".

I am trying to encode two features: the first represents the x axis distance at the range [0,19] and the second the y axis distance at the same range.
Although I do not make the scaling, I don't divide by the range, the agent manages to learn and its performance is very good.
I also performed some experiments dividing the features' values by the range, before calling the function tiles, and the performance is approximately the same.

I have to mention that the tilings number is kept constant at 32 and I change the memory size. 
The odd thing is that by defining different memory sizes varying [512, 100000] the performance is neither improved nor reduced; average is the same.

Does anybody can explain me why this behaviour occurs?
Is it necessary to scale the values, divide by the range, before calling the function tiles?

Thanks in advance,

Simos


PS: 1)  @BrianTanner: The EpsilonGreedyTileCodingSarsaLamda package was really helpful and I used it as a guidance when implementing my tile coding. Thanks for sharing it.
  
      2) I didn't know which exactly is the correct group to post this question and I have posted the same question at RL_Glue. So please forgive me for double posting.

Barry Nichols

unread,
Jul 11, 2011, 12:49:17 PM7/11/11
to rl-library
Hi Simos,

As you have 32 overlapping tilings (layers), and therefore 32 active
weight indices from any x,y input, when any particular 32 indices are
active you will have quite a precise location in your x, y space; how
precise this needs to be depends on the problem, if you require more
precision you could increase the number of tilings.

Regarding the scaling, this is just the size of each tile in each
dimension. Consider a problem where you require different shaped
tiles, or more precision in one dimension than another, then you may
get better results by scaling x and y differently.

The memory size is just the size of your weight vector (i.e. maximum
number to be returned as an active weight index after hashing).
Because hashing occurs, the 'memory size' shouldn't make a huge
difference as long as you're not getting too many hashing collisions
(i.e. you're getting different sets of active indices for different
inputs).

Also, you can experiment with larger tiles and more layers of tiles in
order to achieve better generalisation. This is because you'd have
some weight indices that are the same for similar inputs, but some of
them will be different, so the value returned would be different. And
the larger the difference, the less common active weights and the
bigger the difference in the output value.

I hope this helps.

--
Regards,
Barry D. Nichols

On Jul 11, 12:38 am, Simos Gerasimou <s.geras...@gmail.com> wrote:
> Hello RL people,
>
> I have a question about TIleCoding source code provided by Sutton at Sutton
> Tile Coding <http://webdocs.cs.ualberta.ca/~sutton/tiles/tiles.cpp>.

Simos Gerasimou

unread,
Jul 11, 2011, 11:36:51 PM7/11/11
to rl-li...@googlegroups.com
Hey Barry,

thanks for the quick response.

I know that 32 tilings is quite a large number giving you with an adequate precision the x,y location in the search space.
However, because the features will be enhanced later on, add more features, I thought to keep the number of tilings constant and alter other parameters (#tiles, action selection, gamma etc)

As far as the scaling, Sutton at http://incompleteideas.net/rlai.cs.ualberta.ca/RLAI/RLtoolkit/tiles.html mentions that the number of tiles is not restricted to any region and it extends to the infinite plane; it is restricted by the memory size.
My question is : why changing only the memory size [512,100000] the evaluation result is approximately the same?
I run some experiments today with memory size < 128 and 4 as # tilings. There were many collisions at the returned tiles indexes and the agent couldn't learn.

Also, as Sutton mentions at the same reference, in order to achieve a wrap-around effect, the double values should be divided by the range and call the tiles function with the remainder.
I run 50 experiments today by wrapping around the values and providing the remainder to the tiles functions and another 50 experiments without wrapping around. The results where approximately the same as far as the accumulative reward and the final evaluation reward.
Shouldn't wrapping around the values to produce better results?

Finally, as I know, the tiles by default assume generalisation of 1, ie the tile width is 1.
If someone wants to achieve better generalisation,he should divide the double values by the desired tile width before calling the function tiles?

Thanks,
Simos

Barry D. Nichols

unread,
Jul 12, 2011, 10:31:31 AM7/12/11
to rl-library
Hi Simos,

As I mentioned previously, if the memory size (maximum length of
weight vector) is large enough to avoid hashing collisions, increasing
it will not change the performance. In your case 512 is large enough,
therefore, increasing it to 10^5 will not improve performance, only
force you to have an array of weights of length 10^5 instead of only
512.

Wrap around will only give improvements if your problem involves some
form of wrap around (or repeating) e.g., an angle which is the same at
0 as 2PI and 4PI... In cases such as this, you would not want to learn
a repeating value function at every multiple of 2PI, instead you would
want to use the wrap around to re-use the same values regardless.
i.e., use modulus 2PI as the input rather than the value of the angle
directly (i.e., range is 2PI).

Dividing your values is the method used in this implementation to
adjust the width the individual tiles, if you make them smaller you
can achieve more accuracy at the cost of generalisation, by reducing
the number of input values which activate any given tile.

--
Barry

On Jul 12, 4:36 am, Simos Gerasimou <s.geras...@gmail.com> wrote:
> Hey Barry,
>
> thanks for the quick response.
>
> I know that 32 tilings is quite a large number giving you with an adequate precision the x,y location in the search space.
> However, because the features will be enhanced later on, add more features, I thought to keep the number of tilings constant and alter other parameters (#tiles, action selection, gamma etc)
>
> As far as the scaling, Sutton  athttp://incompleteideas.net/rlai.cs.ualberta.ca/RLAI/RLtoolkit/tiles.htmlmentions  that the number of tiles is not restricted to any region and it extends to the infinite plane; it is restricted by the memory size.

Simos Gerasimou

unread,
Jul 12, 2011, 3:42:43 PM7/12/11
to rl-li...@googlegroups.com
 
As I mentioned previously, if the memory size (maximum length of 
weight vector) is large enough to avoid hashing collisions, increasing 
it will not change the performance. In your case 512 is large enough, 
therefore, increasing it to 10^5 will not improve performance, only 
force you to have an array of weights of length 10^5 instead of only 
512. 
I realised what you are saying because while experimenting with different memory sizes,
I didn't observe any further improvement except than less collisions on the returned tile indexes. 
I thought that the memory size divided by the number of tilings is the number of tiles in each tiling (ie #tilings X #tiles in each tiling = #memorySize)
So, providing a larger memory size while keeping constant the number of tilings is giving you greater resolution.
This is not the case?
Also, how can someone decide which is the most appropriate memory size? 
Providing small memory size, there would be many collisions and the agent won't learn, whereas providing a large memory size it will not make the agent learning rate better.
Is it domain dependent, is it dependent by the number of dimensions and the width of the tiles or it can be found only by trial and error?


Wrap around will only give improvements if your problem involves some 
form of wrap around (or repeating) e.g., an angle which is the same at 
0 as 2PI and 4PI... In cases such as this, you would not want to learn 
a repeating value function at every multiple of 2PI, instead you would 
want to use the wrap around to re-use the same values regardless. 
i.e., use modulus 2PI as the input rather than the value of the angle 
directly (i.e., range is 2PI). 
Now, I got what's the meaning of wrap around.
So in case you have only to handle distinct values then you don't have to divide by the range.


Dividing your values is the method used in this implementation to 
adjust the width the individual tiles, if you make them smaller you 
can achieve more accuracy at the cost of generalisation, by reducing 
the number of input values which activate any given tile.
To make it clear: by default tile coding assumes that the width of each tile is 1.
So in order to achieve finer resolution, you have to divide the values at each dimension with the desired width of tile before passing these values to the tiles function?


Sorry for asking so many questions and requesting further explanation but I have been struggling with these for many days.


BR,
Simos

Barry D. Nichols

unread,
Jul 12, 2011, 8:40:24 PM7/12/11
to rl-library
Hi,

On Jul 12, 8:42 pm, Simos Gerasimou <s.geras...@gmail.com> wrote:
> I thought that the memory size divided by the number of tilings is the
> number of tiles in each tiling (ie #tilings X #tiles in each tiling =
> #memorySize)
> So, providing a larger memory size while keeping constant the number of
> tilings is giving you greater resolution.
> This is not the case?

No, but possibly reducing the number of hashing collisions. Remember
it is only the maximum weight index to be returned as an activated
tile and as hashing is taking place there can be hashing collisions
even if the memory_size is larger than the total number of tiles.

> Also, how can someone decide which is the most appropriate memory size?

memory_size = tiles_per_tiling X tilings should work OK, but although
this would provide enough weight indices to have exactly one index per
tile, hashing still occurs, so there may still be collisions.

Also, depending on the problem, it may not be possible to have a
weight vector as large as the total number of tiles.

> To make it clear: by default tile coding assumes that the width of each tile
> is 1.
> So in order to achieve finer resolution, you have to divide the values at
> each dimension with the desired width of tile before passing these values to
> the tiles function?

Yes.

>
> Sorry for asking so many questions and requesting further explanation but I
> have been struggling with these for many days.
>

I know the feeling.

Simos Gerasimou

unread,
Jul 12, 2011, 10:56:47 PM7/12/11
to rl-li...@googlegroups.com

Hi,

I have a better understanding now but I still have some questions:

i) when you create the tiles you should not take into account the range for each feature?
  For example, if you have 3 features, x distance in [0,9], y distance in [0,19] and mode in [0,5].
  shouldn't you divide the double values by the respective range before passing the doubles array to the tiles function?
  Otherwise, how would the maximum tile for each feature would be defined and how the function would decide which tile index to return?
   At this example each tiling's plane should be constituted by 10x20x5 tiles.
  (This is the case where the default tile width is used, ie 1)


ii) At the reference manual it is mentioned that "If you want to restrict the input vector to a limited region, then you must do so yourself before calling tiles."
    Is it possible not to divide by the region and still get an accurate estimation for the active tiles?


BR,
Simos 

Barry D. Nichols

unread,
Jul 13, 2011, 12:04:31 PM7/13/11
to rl-library
Hi,

On Jul 13, 3:56 am, Simos Gerasimou <s.geras...@gmail.com> wrote:
> Hi,
>
> I have a better understanding now but I still have some questions:
>
> i) when you create the tiles you should not take into account the range for
> each feature?
>   For example, if you have 3 features, x distance in [0,9], y distance in
> [0,19] and mode in [0,5].
>   shouldn't you divide the double values by the respective range before
> passing the doubles array to the tiles function?
>   Otherwise, how would the maximum tile for each feature would be defined
> and how the function would decide which tile index to return?
>    At this example each tiling's plane should be constituted by 10x20x5
> tiles.
>   (This is the case where the default tile width is used, ie 1)

No, you don't need to take the range into account, the tiles aren't
numbered.

The weight indices are calculated by quantising the floating point
inputs to integer values and passing these integers to the hash
function.

> ii) At the reference manual it is mentioned that "If you want to restrict
> the input vector to a limited region, then you must do so yourself before
> calling tiles."
>     Is it possible not to divide by the region and still get an accurate
> estimation for the active tiles?

I would interpret restricting to a region as enforcing the range of
your inputs. e.g., if you want to ensure x is in [0,9] you could
include something like 'IF (x > 9) x = 9; ELSE IF (x < 0) x = 0;' in
your program before calling the tiles function.

Simos Gerasimou

unread,
Jul 14, 2011, 9:05:31 AM7/14/11
to rl-li...@googlegroups.com
Hi,

A small briefing to see if I understand it correctly:

According to the theory: if you have 3 features with regions [0,9], [0,9] and [0,4] and assuming 
that the tiles' width is the default (1), then each 3-Dimensional tiling would be  constituted by 10x10x5 tiles = 500 tiles_per_tiling.
So for each tiling you will have such a 3D plane laid out in the state (features) space.

Multiplying the tiles_per_tiling by the tilings_number (#tilings x #tiles_per_tiling) gives the natural memory size if there was no hashing.
So, defining the memory size larger than the natural memory size, that minimises the collision possibilities but it does not eliminate them.

In the case were the natural memory size is really large, for example 10^10, setting the memory size smaller would lead to more collisions?
What happens in that case?

 

No, you don't need to take the range into account, the tiles aren't
numbered.
The weight indices are calculated by quantising the floating point
inputs to integer values and passing these integers to the hash
function. 
 
Can you please elaborate on this?


Thanks for your help.

Simos

Barry D. Nichols

unread,
Jul 14, 2011, 4:37:15 PM7/14/11
to rl-library
Hi Simos,

On Jul 14, 2:05 pm, Simos Gerasimou <s.geras...@gmail.com> wrote:
> Hi,
>
> A small briefing to see if I understand it correctly:
>
> According to the theory: if you have 3 features with regions [0,9], [0,9]
> and [0,4] and assuming
> that the tiles' width is the default (1), then each 3-Dimensional tiling
> would be  constituted by 10x10x5 tiles = 500 tiles_per_tiling.
> So for each tiling you will have such a 3D plane laid out in the state
> (features) space.
>
> Multiplying the tiles_per_tiling by the tilings_number (#tilings x
> #tiles_per_tiling) gives the natural memory size if there was no hashing.
> So, defining the memory size larger than the natural memory size, that
> minimises the collision possibilities but it does not eliminate them.

Yes.

> In the case were the natural memory size is really large, for example 10^10,
> setting the memory size smaller would lead to more collisions?
> What happens in that case?

In that case different sets of input values which activate different
tiles would result in the same weight index being returned by the hash
function. This would result in updates to the active weight indices
for one input vector would affect the value returned by other
completely unrelated input vectors.

Having said this, there would also be several other activated tiles so
the small amount of error due to one activated tile might not matter.
Also, for a very large state space some of these tiles will probably
rarely be activated so you wouldn't expect such precision in the value
function anyway.

>
> > No, you don't need to take the range into account, the tiles aren't
> > numbered.
> > The weight indices are calculated by quantising the floating point
> > inputs to integer values and passing these integers to the hash
> > function.
>
> Can you please elaborate on this?
>

What I meant is the tiles aren't numbered 1 to 5 in the top row, 2 to
10 in the second row and so on, which would require knowledge of the
range. They are indexed by their integer coordinates which are passed
to the hash function.
The hash function then calculates a number based on the coordinates of
the activated tile and returns this number as the weight index
(actually it returns an array or numbers, one per tiling).

The hash function also ensures the returned values are less than the
'memory size' specified, so the weight indices returned will be valid.

Simos Gerasimou

unread,
Jul 15, 2011, 9:52:43 PM7/15/11
to rl-li...@googlegroups.com
Hey Barry,

Thanks for your detailed explanation.
It makes sense now that everything is tied together.
I was making some experiments all day modifying the learning parameters and the tile coding parameters

However, allow me to ask 2 questions as far as the memory size:
i) How can someone decide which is the most appropriate memory size?
   I think that it is domain dependent and it is also dependent on the number of actions and features that someone wants to specify.




Simos Gerasimou

unread,
Jul 15, 2011, 10:08:40 PM7/15/11
to rl-li...@googlegroups.com
Continue from the previous....

          For example if you specify memory size larger than the natural memory size it means that many tiles will never be active.
          On the other hand, specifying memory size smaller than the natural memory size leads to many collisions, which as you said
           might not impact the overall agent performance but under certain inputs wrong tiles would be activated; hence wrong weights indexes will be updated.
          So, how can someone decide which is the best memory size?
          
          Also something I noticed today, while experimenting with different memory sizes, was that specifying large memory size
          (at least equal to the natural memory size), leads many action-weight indexes in the respective array (action-weight array)
         to never be visited nor updated.
          

         ii) If the range for each feature is not important (you don't have to divide by the range before passing the value to the tile function)
             it is only used to compute the natural memory size?

Barry D. Nichols

unread,
Jul 15, 2011, 11:02:42 PM7/15/11
to rl-library
Hi,

On Jul 16, 3:08 am, Simos Gerasimou <s.geras...@gmail.com> wrote:
> Continue from the previous....
>
>           For example if you specify memory size larger than the natural
> memory size it means that many tiles will never be active.
>           On the other hand, specifying memory size smaller than the natural
> memory size leads to many collisions, which as you said
>            might not impact the overall agent performance but under certain
> inputs wrong tiles would be activated; hence wrong weights indexes will be
> updated.
>           So, how can someone decide which is the best memory size?

I would set it equal to the natural memory size, I've experimented
with larger but didn't notice any improvements in the learnt policy.

>           Also something I noticed today, while experimenting with different
> memory sizes, was that specifying large memory size
>           (at least equal to the natural memory size), leads many
> action-weight indexes in the respective array (action-weight array)
>          to never be visited nor updated.

Sounds about right, but as the indices are never active they aren't
included in the summation of active weights or the weight update, and
therefore won't affect performance.

>          ii) If the range for each feature is not important (you don't have
> to divide by the range before passing the value to the tile function)
>              it is only used to compute the natural memory size?

Yes.

Simos Gerasimou

unread,
Jul 18, 2011, 7:29:28 AM7/18/11
to rl-li...@googlegroups.com
Thanks Barry. It is almost everything clear now about Sutton's tile coding.
I really appreciate your help. Thanks for your good explanation and patience!
If I have any other questions I will come back.


BR,
Simos
Reply all
Reply to author
Forward
0 new messages