Maze mod, granular maze environment and wall sensor info

Yann Benigot

Feb 1, 2013, 1:19:55 PM
to open...@googlegroups.com
Hi, 

While working on the Q-Learning exercise (Maze mod) and trying to understand the format of wall observations, I saw something strange in environment.py (line 667, the sense function of GranularMazeEnvironment):

---
        for (dr, dc) in MAZE_MOVES:
            direction = Vector3f(dr, dc, 0)
            ray = (p0, p0 + direction * GRID_DX / self.granularity)
            result = getSimContext().findInRay(ray[0], ray[1], 1, True)
            if len(result) > 0:
                (sim, hit) = result
                len1 = (ray[1] - ray[0]).getLength() # max extent
                len2 = (hit - ray[0]).getLength() # actual extent
                if len1 != 0:
                    obs[i_obs] = len2/len1; i_obs += 1
                else:
                    obs[i_obs] = 0; i_obs += 1
---

If I understood correctly, obs[2+i] should contain a value proportional to the distance to the next wall in direction i, whenever a wall lies between the agent and the position it would reach by choosing action i with collisions ignored.

However, this does not seem to work since we ignore cases where there is no wall in the direction we are exploring.
For example, if our agent is in the following situation : 

|------|
|      |
|A     |
|      |
... 'A' being the position of our agent. 

We should expect the following value for observations: (x, y, ?, 0.3, ?, ?) (I put ? because I do not know how we should represent cases where no walls are present)

However, I believe this code may produce something like: (x, y, 0.3, obs[3], obs[4], obs[5]), where obs[3] through obs[5] keep the original values of obs (given as an argument). We never increment i_obs in the for loop when no wall is encountered, so if a wall is encountered in a later iteration, the index i_obs will be wrong when obs is modified.
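
The index drift can be reproduced without OpenNERO. The sketch below is only a stand-in for the loop above: sense_buggy is a hypothetical helper, and the list of hits (None for "no wall", a fraction otherwise) replaces the real findInRay() calls.

```python
def sense_buggy(hits, obs):
    """Mimic the original loop: i_obs only advances when a ray hits a wall."""
    i_obs = 2
    for hit in hits:
        if hit is not None:
            obs[i_obs] = hit
            i_obs += 1
        # no else branch: a direction without a wall leaves i_obs unchanged
    return obs, i_obs

# Hypothetical situation: only the second direction has a wall, at 0.3.
# The -1.0 entries stand for whatever obs held before the call.
obs, i_obs = sense_buggy([None, 0.3, None, None],
                         [5.0, 7.0, -1.0, -1.0, -1.0, -1.0])
print(obs)    # [5.0, 7.0, 0.3, -1.0, -1.0, -1.0]
print(i_obs)  # 3
```

The 0.3 lands in obs[2] instead of obs[3], and i_obs ends at 3 rather than 6.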

Finally, which value should be used for indices where no walls are encountered ? I am going to try to go with 1.0, which leads me to the following code : 
        for (dr, dc) in MAZE_MOVES:
            direction = Vector3f(dr, dc, 0)
            ray = (p0, p0 + direction * GRID_DX / self.granularity)
            result = getSimContext().findInRay(ray[0], ray[1], 1, True)
            # debugging output for each direction
            print "dr ", dr, "dc ", dc, "obs ", obs, "res ", result
            if len(result) > 0:
                (sim, hit) = result
                len1 = (ray[1] - ray[0]).getLength() # max extent
                len2 = (hit - ray[0]).getLength() # actual extent
                if len1 != 0:
                    obs[i_obs] = len2/len1; i_obs += 1
                else:
                    obs[i_obs] = 0.; i_obs += 1
            else:
                # no wall in this direction: still advance the index
                obs[i_obs] = 1.; i_obs += 1

Am I missing something or is there really a problem with the function GranularMazeEnvironment.sense() ?

Thank you

Igor Karpov

Feb 2, 2013, 7:51:52 PM
to open...@googlegroups.com
Hello Yann,

You are absolutely correct! This is a rather ugly piece of code, and it's kind of hard to figure out what's going on here, but you can easily verify your hypothesis by adding an assert(i_obs == 6) after the loop and running the agent to see the assertion fail in the terminal.

The intended output is a vector of 6 elements, where:

  • obs[0] is the x position of the agent
  • obs[1] is the y position of the agent
  • obs[2] through obs[5] are the amounts of free space in the +r, -r, +c, -c directions, where 0 means no space and 1 means "as far as the sensor can see".

So your fix would certainly address this problem. However, I've modified this section to hopefully make it clearer what is going on:

        # calculate free space in the possible move directions
        for i, (dr, dc) in enumerate(MAZE_MOVES):
            direction = Vector3f(dr, dc, 0)
            p1 = p0 + direction * GRID_DX / self.granularity
            hit_result = getSimContext().findInRay(p0, p1, 1, True)
            if len(hit_result) > 0:
                # if the ray hit a wall, return what fraction was clear
                (sim, hit) = hit_result
                len1 = (p1 - p0).getLength() # max extent
                len2 = (hit - p0).getLength() # actual extent
                obs[2 + i] = len2/len1 if len1 != 0 else 0.0
            else:
                # if the ray did not hit a wall, return 1.0
                obs[2 + i] = 1.0
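
To check the normalization outside the simulator, the same arithmetic can be sketched with plain 2-D tuples. This is only an illustration under that assumption: free_fraction is a hypothetical helper, not part of environment.py, and the hit point is passed in directly instead of coming from findInRay().

```python
import math

def free_fraction(p0, p1, hit=None):
    """Fraction of the segment p0 -> p1 that is clear of walls.

    hit is the point where the ray struck a wall, or None if it hit nothing.
    """
    if hit is None:
        return 1.0  # nothing in the way: as far as the sensor can see
    max_extent = math.hypot(p1[0] - p0[0], p1[1] - p0[1])
    if max_extent == 0:
        return 0.0  # degenerate ray, treat as no free space
    actual_extent = math.hypot(hit[0] - p0[0], hit[1] - p0[1])
    return actual_extent / max_extent

blocked = free_fraction((0.0, 0.0), (10.0, 0.0), hit=(3.0, 0.0))
clear = free_fraction((0.0, 0.0), (10.0, 0.0))
print(blocked, clear)
```

Here blocked comes out at about 0.3 (the wall is 3 units along a 10-unit ray) and clear is 1.0.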

I've committed this change to trunk. If you are building from SVN, please test it out and see if it works for you.

Thanks!

--Igor.




Yann Benigot

Feb 3, 2013, 7:48:29 AM
to open...@googlegroups.com
Thank you for your answer!

I have tried the new environment.py and it seems to work correctly for me. 

However, something a little odd seems to happen when the robot "overlaps" with the walls on-screen (see the attached screenshot). This does not seem to have anything to do with the changes you made: I could not see the problem without your changes (and without my changes as well) because observations[2:6] did not seem to have any meaning. 

Precisely, if the robot is in the following situation:
|--------
|
|  A
|--------
(as close to the wall as possible), it seems to be displayed as "overlapping" with the walls. Looking at observations[2:6], I see the value [1, 1, 0.3, 0.3], which would mean the robot has a wall in both the c+ and the c- directions. Internally, however, the robot only has a wall in the c- direction, so perhaps we should instead have [1, 1, 1, 0.3].
I would say this seems to be caused by the fact that Irrlicht is used to detect walls (via findInRay()): since, as far as Irrlicht is concerned, the robot is technically inside the wall, it reports a collision in both directions.
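
A toy one-dimensional model shows how this could happen. It is not Irrlicht's actual collision code, just a sketch under the assumption that a wall is a slab with two faces and that a ray reports the nearest face lying on its segment; first_face_hit and the face positions are made up for illustration.

```python
def first_face_hit(x, dx, faces):
    """Fraction of the segment x -> x + dx travelled before reaching the
    nearest wall face, or 1.0 if no face lies on the segment."""
    best = 1.0
    for face in faces:
        t = (face - x) / dx
        if 0.0 <= t <= 1.0:
            best = min(best, t)
    return best

wall = (-0.3, 0.3)  # hypothetical slab with faces at -0.3 and 0.3

# Agent inside the slab: rays in both directions cross a face.
inside = [first_face_hit(0.0, d, wall) for d in (1.0, -1.0)]
# Agent just outside the slab: only the ray pointing at the wall hits.
outside = [first_face_hit(0.6, d, wall) for d in (1.0, -1.0)]
print(inside)
print(outside)
```

In this model the agent inside the slab reads roughly [0.3, 0.3], while the agent outside reads [1.0, 0.3], which matches the pattern I am seeing.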

I do not know if this is really a problem. I have not seen it happen in the following situations:
|--------
|   A
|  
|--------

--------|
        |
       A|
        |
--------|

so perhaps it only happens when the wall is in the r- and c- directions. In that case there would be no conflict: [1, 1, 0.3, 0.3] would represent a wall in the c- direction and [1, 1, 0.3, 1] a wall in the c+ direction. Since the agent initially does not know what the values of obs[2:6] mean and has to learn them automatically, this does not seem too problematic; it is just harder for a human to understand.


Perhaps the problem is that the wall positions given to Irrlicht are wrong and overlap with the position of the agent in the cases described, in which case it would be enough to give the walls proper positions in Irrlicht. However, this may be too much trouble for something that may not matter much, since agent learning should be able to compensate for it (such changes may also break the first-person agent).
wall.png