Let's say there is an AI designed to learn to play Super Mario, and we define an action called "jump", which in practice presses button A and then releases it almost immediately. The result is a hop, a short jump.
But as anyone who has played the game knows, there is not only the short jump but also the long jump: to perform it, the AI has to keep holding button A and not release it until it is time to let Mario fall.
Since jump distance is continuous, I don't think it makes sense to define several jump actions with different distances. Instead, I think it makes sense to define one action called "keep jump" (keep holding button A) and another called "release jump" (release button A), so that the AI can choose at each time step whether to fall, like a human player.
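To make the idea concrete, here is a minimal sketch of such an action space. All names and the action table are illustrative, not from any particular emulator API; the only state tracked is whether button A is currently held:

```python
# Illustrative action set: press/release jump plus a few other actions.
ACTIONS = {
    0: "noop",          # do nothing this frame
    1: "press_jump",    # start holding button A
    2: "release_jump",  # stop holding button A
    3: "left",
    4: "right",
}

def apply_action(action_id, a_held):
    """Return the new 'is button A held' state after taking an action."""
    name = ACTIONS[action_id]
    if name == "press_jump":
        return True
    if name == "release_jump":
        return False
    return a_held  # other actions leave the button state unchanged

held = apply_action(1, False)  # press jump: now held
held = apply_action(0, held)   # noop keeps holding, so Mario keeps rising
held = apply_action(2, held)   # release: Mario starts to fall
```

The point is that the button state persists across frames until the agent explicitly releases it, which is exactly what creates the problem described below.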
This raises a question: since there are other actions besides keep/release jump, how do we get the AI, during training, to choose only "release jump" or do nothing after it has pressed jump?
With this action design, it is easy to see it would be terrible if the AI forgot to release the button: it would keep jumping and form a distorted picture of the environment, thinking "Hey, I am not acting, the environment just keeps me jumping. I am always jumping in this game."
I came up with two ways to solve this.
Suppose we are using policy gradient to find an optimal policy as a distribution over actions, and we train a neural network on the game's raw pixels.
1. Use an "if/then" filter on invalid actions: after pressing jump, record it with a flag; then at the next time step, after obtaining a distribution over all actions, whichever action is chosen, do:
    if chosen action == "release jump":
        release jump
        clear the flag
    else:
        do nothing
But the problem is that what the AI actually did does not influence the training: it chose some action, say C, not "release jump". Would that teach the AI that C is good during a jump? I'm at a loss here.
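One way to avoid that mismatch (a variant of the filter idea, not something from the original design) is to apply the filter to the distribution itself before sampling: zero out the probabilities of actions that are invalid mid-jump and renormalize, so the action the policy is trained on is always the action actually executed. A minimal sketch in plain Python, with illustrative action indices:

```python
import random

def masked_sample(probs, jump_held, legal_when_held, rng):
    """Sample an action index; while the jump button is held, restrict
    sampling to the legal subset and renormalize the probabilities."""
    if jump_held:
        probs = [p if i in legal_when_held else 0.0
                 for i, p in enumerate(probs)]
    total = sum(probs)
    probs = [p / total for p in probs]
    action = rng.choices(range(len(probs)), weights=probs)[0]
    return action, probs

# Indices: 0 = noop, 1 = press jump, 2 = release jump, 3 = move right.
rng = random.Random(0)
action, masked = masked_sample([0.2, 0.3, 0.1, 0.4], True, {0, 2}, rng)
# With the jump held, only "noop" (0) or "release jump" (2) can be drawn,
# and the gradient would be computed from the masked probabilities.
```

The masked probabilities, not the raw network output, are then what you plug into the policy-gradient loss, so the filter and the training signal agree.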
2. Leave the choice of actions unchanged, but add a numeric flag to the network: after pressing jump, feed an extra unit with a fixed value into a certain hidden layer of the neural network.
The problems with this solution are:
1) it wastes time, because the AI has to figure out, over many training iterations, that all actions other than "release jump" are meaningless after pressing jump (for simplicity, ignoring the effect of the direction buttons during a jump);
2) we have to decide what numeric values represent "jump pressed" and "jump not pressed". Is it OK to just choose 0 and 1, with no negative effect on the gradient calculation?
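For what it's worth, a common variant of this idea (an assumption on my part, not the original proposal) concatenates the flag to the network's input rather than injecting it into a hidden layer; a 0/1 indicator is a standard feature encoding and does not by itself cause gradient problems. A trivial sketch, with hypothetical names:

```python
def observation_with_flag(pixels, jump_held):
    """Append a binary 'jump button held' indicator to a flattened
    observation vector before feeding it to the network."""
    return list(pixels) + [1.0 if jump_held else 0.0]

obs = observation_with_flag([0.5, 0.25], True)   # ends in 1.0
obs = observation_with_flag([0.5, 0.25], False)  # ends in 0.0
```

Whether 0/1, -1/+1, or some rescaled value works best is a tuning detail; the network only needs the two states to be distinguishable.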
Any ideas on this? Or do you think my "keep/release" action design makes no sense?