Hi,
While working on the Q-Learning exercise (Maze mod) and trying to understand the format of the wall observations, I saw something strange in environment.py (line 667, the sense function of GranularMazeEnvironment):
---
for (dr, dc) in MAZE_MOVES:
    direction = Vector3f(dr, dc, 0)
    ray = (p0, p0 + direction * GRID_DX / self.granularity)
    result = getSimContext().findInRay(ray[0], ray[1], 1, True)
    if len(result) > 0:
        (sim, hit) = result
        len1 = (ray[1] - ray[0]).getLength() # max extent
        len2 = (hit - ray[0]).getLength() # actual extent
        if len1 != 0:
            obs[i_obs] = len2/len1; i_obs += 1
        else:
            obs[i_obs] = 0; i_obs += 1
---
If I understood correctly, obs[2+i] should contain a value proportional to the distance to the next wall in direction i, whenever a wall lies between the agent and the position it would reach by taking action i with collisions ignored.
However, this does not seem to work, because the loop does nothing at all when there is no wall in the direction being explored.
For example, suppose our agent is in the following situation:
|------|
|      |
|A     |
|      |
... 'A' being the position of our agent.
We should expect the following observations: (x, y, ?, 0.3, ?, ?). I wrote ? because I do not know how cases with no wall should be represented.
However, I believe this code may produce something like (x, y, 0.3, obs[3], obs[4], obs[5]), where obs[3], obs[4] and obs[5] are the original values of obs (given as an argument). i_obs is never incremented in the for loop when no wall is encountered, so if a wall is encountered in a later iteration, its value is written at the wrong index of obs.
Finally, which value should be used for indices where no wall is encountered? I am going to try 1.0, which leads me to the following code:
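To make the index drift concrete, here is a minimal standalone sketch. It does not use OpenNERO: `sense_skipping` and the `hits` list are hypothetical stand-ins for the loop and for the findInRay results (None meaning "no wall hit"), and -1.0 is only a marker I chose for slots the loop never writes.

```python
# Hypothetical stand-in for the sensing loop, not the OpenNERO code itself.
# hits[i] is the normalized wall distance in direction i, or None if the
# ray in that direction hits nothing.
def sense_skipping(obs, hits):
    """Mimics the loop as posted: i_obs advances only when a wall is hit."""
    i_obs = 2  # obs[0] and obs[1] hold the agent position
    for hit in hits:
        if hit is not None:
            obs[i_obs] = hit
            i_obs += 1
    return obs

# A wall only in direction 1, at 30% of the ray length.
stale = [0.0, 0.0, -1.0, -1.0, -1.0, -1.0]  # -1.0 marks untouched slots
result = sense_skipping(list(stale), [None, 0.3, None, None])
print(result)  # [0.0, 0.0, 0.3, -1.0, -1.0, -1.0]
```

The 0.3 lands in slot 2 (direction 0) instead of slot 3 (direction 1), and the remaining slots keep whatever they held before the call.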
---
for (dr, dc) in MAZE_MOVES:
    direction = Vector3f(dr, dc, 0)
    ray = (p0, p0 + direction * GRID_DX / self.granularity)
    result = getSimContext().findInRay(ray[0], ray[1], 1, True)
    print "dr ", dr, "dc ", dc, "obs ", obs, "res ", result
    if len(result) > 0:
        (sim, hit) = result
        len1 = (ray[1] - ray[0]).getLength() # max extent
        len2 = (hit - ray[0]).getLength() # actual extent
        if len1 != 0:
            obs[i_obs] = len2/len1; i_obs += 1
        else:
            obs[i_obs] = 0.; i_obs += 1
    else:
        obs[i_obs] = 1.; i_obs += 1
---
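The same idea can be checked outside the simulator with a small sketch. It is only an illustration under my own assumptions: `sense_fixed`, `NO_WALL` and the `hits` list (None meaning "no wall hit") are hypothetical names, not part of OpenNERO, and 1.0 as the "no wall" value is my guess rather than a documented convention. Deriving the slot from the direction index, instead of a running counter, means no slot can be skipped.

```python
NO_WALL = 1.0  # assumed convention: the ray reached full length without a hit

def sense_fixed(obs, hits, first_slot=2):
    """Writes exactly one value per direction: slot first_slot + i is direction i."""
    for i, hit in enumerate(hits):
        obs[first_slot + i] = NO_WALL if hit is None else hit
    return obs

# Same situation as in the example: a wall only in direction 1, at 30% of the ray.
fixed = sense_fixed([0.0, 0.0, -1.0, -1.0, -1.0, -1.0], [None, 0.3, None, None])
print(fixed)  # [0.0, 0.0, 1.0, 0.3, 1.0, 1.0]
```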
Am I missing something, or is there really a problem with GranularMazeEnvironment.sense()?
Thank you