Location objects and TF/RF functionality


Smagloy Ilya

Oct 28, 2017, 9:51:57 AM
to BURLAP Discussion
Hello!

I've been reading the OO-MDP tutorial over the past few weeks, trying to get my head around it.
After a while, I realized I do not understand how the RF and TF work. I believe this is
because I do not understand how the domain interacts with the location objects.

In the tutorial, the author uses the propositional function "PF_AT" in both functions; given an
agent and a location as input, it answers whether the agent is at that location.
This is really confusing, because I do not provide a terminal location to the TF, for example.
As I understand it so far, the groundings include only the location objects provided in the initial state,
and this is why the TF works. But in that case, how does the RF recognize all of the other location objects?

If someone knows the mechanics of this, or has any idea where I should look, I'd be more than glad to hear it.

Thanks!

Ilia

ja...@cogitai.com

Oct 30, 2017, 4:27:21 PM
to BURLAP Discussion
Hi,

The terminal function being used is a special one: SinglePFTF, which receives a PropositionalFunction. Then it answers whether any input state is a terminal state like this: is there any grounding of the provided PF in the state that evaluates to true? We gave it the AT propositional function, which operates on an agent object and a location object. Therefore, it's asking "does there exist an agent (a) and location (l) object in the state such that AT(a, l) is true?" If so, it's terminal. Or in a more English-like fashion: if the agent is at *any* location, it is a terminal state.
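
If it helps to see that idea as code, the check boils down to something like the sketch below. This is a simplified reconstruction rather than the library's exact source, and it assumes the BURLAP 3 OO-MDP API (OOState, PropositionalFunction.allGroundings, GroundedProp); the names may differ a bit in other versions.

import java.util.List;

import burlap.mdp.core.TerminalFunction;
import burlap.mdp.core.oo.propositional.GroundedProp;
import burlap.mdp.core.oo.propositional.PropositionalFunction;
import burlap.mdp.core.oo.state.OOState;
import burlap.mdp.core.state.State;

// Sketch of an "any grounding of the PF is true" terminal check.
public class AnyGroundingPFTF implements TerminalFunction {

    private final PropositionalFunction pf;

    public AnyGroundingPFTF(PropositionalFunction pf) {
        this.pf = pf;
    }

    @Override
    public boolean isTerminal(State s) {
        // Enumerate every binding of the PF's parameters in this state;
        // for AT, that is every (agent, location) pair.
        List<GroundedProp> groundings = this.pf.allGroundings((OOState) s);
        for (GroundedProp gp : groundings) {
            if (gp.isTrue((OOState) s)) {
                return true; // AT(a, l) holds for some agent a and some location l
            }
        }
        return false; // the agent is not at any location
    }
}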

So you're not specifying a terminating location, because the terminal function is defined so that *any* location counts as a terminating location.

Same idea for the reward function.

Make sense?

Smagloy Ilya

Nov 11, 2017, 4:13:01 AM
to BURLAP Discussion
OK, I understand what you're saying. But how does that make sense for the terminal function?
When does it actually terminate the program? And what is the point of the terminal function here if it literally accepts anything?

Thanks a lot for the answer!
Ilia


On Monday, October 30, 2017 at 10:27:21 PM UTC+2, ja...@cogitai.com wrote:

Smagloy Ilya

Nov 11, 2017, 9:23:58 AM
to BURLAP Discussion
Also, where is the RF being used? In the sample function?
For example, I want to add some sort of printing of the reward every time an action is taken.
Where should I search? I'm quite confused.

Thanks again,
Ilia

James MacGlashan

Nov 12, 2017, 11:48:48 AM
to Smagloy Ilya, BURLAP Discussion
To be clear, it doesn't cause it to terminate in any state. If you look at the visualization at the end of the tutorial you will see an image of the world. There is a grey circle (an agent object) and a blue square (a location object). The termination function is defined so that any state in which the grey circle is at a blue square is a terminal state, but nowhere else. In the example, there is only one blue square, in the top right, so the environment stops only when the grey circle is at that blue square. If you added more location objects to the state you'd see more than one blue square, and then the environment would terminate when the grey circle was at any of the blue squares, but nowhere else. The reward function is similarly defined to give a positive reward when the grey circle is at a blue square (any blue square), but nowhere else.

You could of course define more specific terminal and reward functions that regard more specific objects, so that the agent has to go to a specific location object instead of any of them; but that was simply the choice made for this problem definition, and it is equivalent when there is only ever one location object.
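
For instance, a terminal function tied to one particular location object might look roughly like the following. This is a hypothetical sketch under the same BURLAP 3 assumptions; the object names passed in ("agent0", "loc0") are just placeholders for whatever names you gave the objects when building your state.

import burlap.mdp.core.TerminalFunction;
import burlap.mdp.core.oo.propositional.PropositionalFunction;
import burlap.mdp.core.oo.state.OOState;
import burlap.mdp.core.state.State;

// Sketch: terminate only when a specific agent is at a specific, named location.
public class NamedLocationTF implements TerminalFunction {

    private final PropositionalFunction atPF;
    private final String agentName;
    private final String goalLocationName;

    public NamedLocationTF(PropositionalFunction atPF, String agentName, String goalLocationName) {
        this.atPF = atPF;
        this.agentName = agentName;
        this.goalLocationName = goalLocationName;
    }

    @Override
    public boolean isTerminal(State s) {
        // Evaluate AT only for the one (agent, location) pair we care about.
        return this.atPF.isTrue((OOState) s, this.agentName, this.goalLocationName);
    }
}

// usage, e.g.: TerminalFunction tf = new NamedLocationTF(atPF, "agent0", "loc0");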

The reward function gets used differently depending on whether you're using a planning algorithm (which uses it directly) or a simulated environment for reinforcement learning. In the tutorial, an Environment is made and a visual explorer of the environment is created. When you launch that, you'll see you can launch a "shell" to do other tests with the environment. It might be helpful for you to pull that up and tell it to print the last reward every time you try something.

If you want to modify the reward function, you should implement your own class and not use pre-defined ones like SinglePFRF. Those are there to help you for standard cases. You can of course also subclass SinglePFRF if you want to do something different but similar to it.
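
To tie that to your earlier question about printing the reward: a hand-rolled reward function can simply print on every transition it scores. Below is a rough sketch under the same BURLAP 3 assumptions (the reward(s, a, sprime) signature of RewardFunction, Action.actionName(), and the AT PF's groundings); the goal and step reward values are placeholders.

import burlap.mdp.core.action.Action;
import burlap.mdp.core.oo.propositional.GroundedProp;
import burlap.mdp.core.oo.propositional.PropositionalFunction;
import burlap.mdp.core.oo.state.OOState;
import burlap.mdp.core.state.State;
import burlap.mdp.singleagent.model.RewardFunction;

// Sketch of a custom reward function that rewards reaching any location
// and prints the reward for every transition it is asked to score.
public class LoggingGoalRF implements RewardFunction {

    private final PropositionalFunction atPF;
    private final double goalReward;   // e.g. 100.
    private final double stepReward;   // e.g. -1.

    public LoggingGoalRF(PropositionalFunction atPF, double goalReward, double stepReward) {
        this.atPF = atPF;
        this.goalReward = goalReward;
        this.stepReward = stepReward;
    }

    @Override
    public double reward(State s, Action a, State sprime) {
        // The reward is judged on the state we arrive in (sprime).
        boolean atAnyLocation = false;
        for (GroundedProp gp : this.atPF.allGroundings((OOState) sprime)) {
            if (gp.isTrue((OOState) sprime)) {
                atAnyLocation = true;
                break;
            }
        }
        double r = atAnyLocation ? this.goalReward : this.stepReward;
        System.out.println(a.actionName() + " -> reward " + r);
        return r;
    }
}

Keep in mind that a planner will also query the reward function while it plans, so some of the printed lines may not correspond to actions actually executed in the environment.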

James


Smagloy Ilya

Nov 18, 2017, 8:07:57 AM
to BURLAP Discussion
Hello again!

Thank you for answering so frequently.

I'm still a little confused, but I believe I'm closer to understanding. When I initialize the GenericOOState, I initialize it with a list of location objects (however many I give it). They are all then treated as terminal locations (as I checked, this works as long as they have different names).
But then, where are the Location objects in my state, and how does the reward function use them?

In addition, I started looking at planning algorithms and reinforcement learning via a simulated environment. I've noticed that the BURLAP tutorials only explain the use of planning algorithms. Is there an explanation/tutorial on using a simulated environment for reinforcement learning? For example, I found that Value Iteration is implemented in BURLAP, but I don't understand its connection to the reward function that you mentioned earlier.
What I'm trying to do is create my own environment with a single terminal location, where certain locations give rewards different from the baseline negative reward that all the other locations give.
Right now my best lead is to add all of the special locations to the initialization of the GenericOOState, but it seems rather awkward after that. Maybe you have an idea for a different approach?

And lastly, I'd like to alter the GUI a little bit. Does the BURLAP GUI tool have some kind of manual?

Thanks a lot, and have a good week

Ilia Smagloy

Smagloy Ilya

Nov 18, 2017, 8:35:53 AM
to BURLAP Discussion
Hi again!

In addition to what I wrote earlier, while altering the objects passed to the init of GenericOOState, we discovered some unexplained behavior from the agent:
when a different object was placed between the terminal location and the agent, the program would treat that object as the agent, and the agent would go missing from the grid.

Maybe you have an idea of where to look for an explanation. I'm sorry for the large number of questions; I'm interested in understanding this big system and I haven't found a better source of information :)

Thanks again,
Ilia Smagloy