# planning and oranges

2 views

### Sean Barrett

Apr 25, 1994, 5:53:06 PM4/25/94
to
Well, I finally sat down and started writing out
the design for a basic plan-forming AI for use
idea kicking around in my mind for well over a
year, and I'm pretty clear on how I want everything
to work. But I did bump into one pretty major flaw.

My thought was to separate out spatial navigation
from the rest of plan formation. It's a pretty
special problem domain, I can do much more optimal
solutions, and I can save a fair amount of computation
time. Additionally, I can later plug in a full
3-space model and adapt it to an Ultima Underworld
or Alone in the Dark sort of game, and all I have
to do is rewrite the navigator.

Now, to handle this, basically, one of the
basic subgoals in the planner, "goto", can't
be directly subdivided. Instead, it calls into
the navigational planner. That planner figures
out a path from one point to another. If it
finds obstacles, let's say a door, it calls
back into the main planner to work out how
to open the door. It can weight paths differently
according to the results of the callback, or
whatever.

Anyway, the problem with this is that it results
in an extremely naive decomposition of the problem.

Suppose the problem is to retrieve an orange from
the kitchen. Suppose the door to the kitchen is
locked, and the AI knows this, and knows that
the key is on the shelf next to it.

The planner decomposes the problem as:

goto(kitchen)
get(orange)
goto(here)

The first subgoal is handed off to the navigational
planner. It works out the route, and detects a door
on the way, so it calls back into the main planner

The main planner looks and says, "Aha, it's locked,
so we need the key." So then it generates some
subgoals (where 'here' is of course different now)

goto(key)
get(key)
goto(here)
unlock(kitchen_door, key)

Then the navigator wraps the above plan in the path
from the start to the kitchen door, etc., and so the
whole thing ends up coming out as:

walk from start to kitchen door
walk to key
get key
walk to kitchen door
unlock kitchen_door

Clearly this is an unacceptable plan.

The problem comes from the fact that by splitting
the planner in two this way, and forcing them to
nest in a very strict fashion, the plans come out
quite suboptimally.

Any clues how to solve this? I could do some sort
of "peephole" optimizations of the entire plan,
or just in the navigator. Since the navigator
knows that it's just trying to get from A to B,
then it knows that any subplans generated along
the way can be rearranged, so two walks in a row
can be optimized together. However, there are
examples (such as when there are two locked doors
on the way) which seem to require a lot of domain
knowledge to solve.

But the domain knowledge should really go in the
main planner, not the navigational one, since I'd
like to be able to change the navigational planner
easily.

The _only_ solution I've come up is to do some sort
of post-processing optimization. Generate the whole
plan, have the main planner look at it, rearrange
the order it does things, and then recompute the

This seems extremely sloppy and redundant. Any
better ideas?

Sean Barrett

### Gareth Rees

Apr 27, 1994, 12:24:50 PM4/27/94
to
Sean Barrett writes:
> The _only_ solution I've come up is to do some sort of post-processing
> optimization. Generate the whole plan, have the main planner look at
> it, rearrange the order it does things, and then recompute the new

I took a look at the section on planning in Elaine Rich, "Artificial
Intelligence" and it suggests that planning is rather messy and often
you have to look at your plan after it's finished and tidy it up (for
example, you have two subgoals such that completing the second undoes
the first but not vice versa - a hierarchical planner might not notice
this dependency). However, the book was published in 1983, so one would
expect the state of the art to be a bit better by now.

You might want to take a loook at the literature on STRIPS (Stanford
Research Institute Problem Solver) which is "designed to solve the
planning problems faced by a robot in rearranging objects and navigating
in a cluttered environment", which sounds a lot like your problem. See

%A R Fikes, P Hart and N Nilsson
%T Learning and executing generalised robot plans
%J Artificial Intelligence 3
%P 251-288
%D 1972

but this is even earlier; perhaps a forward citation search would turn
up something more useful and recent.

--
Gareth Rees

### Collin Pieper

Apr 26, 1994, 2:37:54 AM4/26/94
to
I read this newsgroups fairly seldom, but I was reading some other
messages that eluded to the use of AI in adventure games, more than just
english recognition, and am sorry I missed it. So excuse me if I am off
base.
What your saying is interesting, but what is your overall goal? Having
the AI decide the path that the user must discover? If this is so then
there is nothing wrong with your code, for the user would have to walk to
the door to find out it was locked. If you just want an overall plan
(assuming you were God and knew the door was locked) post-processing
would be the easiest bet (minimal code changes). You've got to realize
that what you have modeled right now is essentially a player who walks
along the path, discovers problems and then tries to solve them. What
it sounds like you want is an omniscient God who could anticipate all
that's happening..
If you don't like post-processing...well actually this is the same
thing, but you could just check for reoccuring actions on the fly.
-has this been done
-yes
-well remove the previous one because it needs to be done after this
It is essentially the same technique you'd use in post-processing, but
it is done on an action by action basis and will definitly take less
memory overhead. There are of course many other AI techniques (you're
implementing the first AI program ever-1957) and seeing that I am no AI
God by any stretch I can't tell you of them, but search the bookstores
and talk to profs. What language are you using? Lisp? Prolog? A good
book for Lisp AI is Paradigms of Artificial Intelligence Programing:Case
Studies in Common Lisp/Peter Norvig/Morgan Kaufmann Publishers/1992.

Collin

### Erik Max Francis

Apr 28, 1994, 12:31:23 AM4/28/94
to
Collin Pieper <cl...@cornell.edu> writes:

> You've got to realize
> that what you have modeled right now is essentially a player who walks
> along the path, discovers problems and then tries to solve them. What
> it sounds like you want is an omniscient God who could anticipate all
> that's happening..

Not necessarily omniscient. But there are contexts in which a non-
player character _should_ know the status of objects (in the most
general sense) in a certain region of the game. For instance, if you
have the boy of the household, the boy will probably know if the
kitchen door is locked or not, and will also know where the key is.

It sounds to me like you can't have it both ways. The non-player
character knows where all the rooms are (which are just big objects),
but doesn't know the state of any objects within those rooms?
Granted, there might be a fuzzy area.

A non-player character with no knowledge would bumble about randomly
until it came across something interesting, and then would start
building a goal stack and, in an attempt to satisfy those goals, would
employ some sort of action stack, where it pops things off when it
completes them and pushes things on when it needs to achieve secondary
goals.

Er, I'm rambling.

Erik Max Francis, &tSftDotIotE ...!uuwest!alcyone!max m...@alcyone.darkside.com
USMail: 1070 Oakmont Dr. #1 San Jose, CA 95117 ICBM: 37 20 N 121 53 W __
AGCTACTGTACGTACGTTTGCACGTATGCTGTGCAXTGCATACTGACATCGTGACTGATCTGCATGACTTGCA / \
"Omnia quia sunt, lumina sunt." (All things that are, are lights.) \__/

### Sean Barrett

Apr 28, 1994, 3:05:56 PM4/28/94
to
Well, I've reached a conclusion about
how I'm going to approach this problem.
I'll give a quick description of what I
expect the final version of the planner
to be like, although I expect I'll do

I think this is pretty doable,
and in the setting I'm dealing with
shouldn't get too computationally
expensive. So I'm going to include
a fair amount of detail.

I realize this isn't "state of the art",
but it seems in excess of "state of the practice"
in the "available games" world.

Let me set up a few things for people
with less terminological background.

form) one or more goals. To meet
a goal, the NPC uses something called
"hierarchical planning". The crucial
point here is that it's _planning_.
The AI should (in effect) _plan the
entire sequence of actions_ it's going
to perform to achieve the goal.

The hierarchical aspect refers to the
fact that the AI takes a goal and
decomposes it into a set of subgoals.
My AI book, with all of three pages
on planning/hierarchical planning,
calls these "macro operators", and
doesn't have much other terminology.

Basically, I view it as you break
a goal down into subgoals, recursively,
until you have "actions". The way you
decompose them is by having what I
call "routes" or "scripts"; writing
scripts is the meat of tailoring the
It's where the "knowledge" of plannig
really lies.

Now, there may be more than one script
for decomposing a goal. Also, some
scripts may have conditions that
can fail to be met. So the planner
may have to examine a number of
possibilities before it finds one
that is guaranteed to work--or it
may not find any guaranteed routes.

Ok, so, here's how it's going to work.

First, I have a knowledge base. This
is used by the planner to check
conditions on routes. For example,
a script for getting through a locked
door might be:

pass_locked_door(door)
select X s.t. is_key(X) and unlocks(door, X)
get(X)
goto(door)
unlock_with(door, X)

The knowledge base is used to determine
what X is. If the NPC has a key that
it doesn't know the truth of unlocks(door, X)
[that is, it is unknown what the key is for],
then it will see this script as being
"possible but not guaranteed".

global knowledge base, making the AI omniscient.
Then later I'll go in and actually have
NPCs have independent knowledge bases. To
keep things simple, though, I may make it
that they can never believe something which
is actually false.

Ok, second, I have a "navigational planner".
It generates paths from A to B. I originally
described it as using a callback. Instead,
now, I'll have it return "all possible paths"
from A to B. However, this is restricted in
a very heavy way, due to domain knowledge.

Here's the crucial observation. In walking
from A to B, you may deal with certain
obstacles. A locked door, an enemy who
planner cannot tell which path is more
optimal.

However, if there are no obstacles, the
optimal path. What that ends up meaning
is that the navigational planner can
return the "path" by simply returning
a special "obstacle-less" movement action,
"move_to".

Thus, in the original kitchen example, the
two paths the navigation planner can return
are:
move_to(locked_door)
pass_obstacle(locked_door)
move_to(kitchen)
or
move_to(kitchen)

The latter is understood as meaning "the best
obstacle free route available". "move_to()"
can only be generated by the navigator, so
it is only generated in contexts where there
really is an obstacle-free route.

The domain knowledge observed above means that
the navigational planner never need generate
paths like this:

move_to(dining_room)
move_to(kitchen)

because if that were the optimal path, it
could just return move_to(kitchen).

So we're guaranteed it's not going to return
a huge number of paths (just all the ones
that might be optimal), and we know it'll
consist of alternating move_to() and
pass_obstacle() steps.

Ok. So now, the planner, say dealing with

goto(kitchen)
get(orange)
goto(start)

will get two different paths from the two
gotos. It can look at all the combinations
and pick which one is the best. Below we're
going to then optimize the final path, so
note that when it picks which one is best,
it really has to optimize it first, and
then pick the best one. I have lots of
ideas for how to prune and simplify this
comparison. It doesn't have to be optimal,
anyway, a good heuristic will do fine.

Now, we have the final problem. Assuming
in our knowledge base, the AI knows the
door is locked and knows where the key is,
we want it to _plan_. We want it to get
the key on the way, not get to the door
and get the key.

Traditionally, people use goal stacks and
action stacks. I've chosen to solve this
problem (and open up a number of other
really cool possibilities) by throwing
away the stacks. I'll use an explicitly
directed acyclic graph, which shows which
goals/actions cannot be performed before
the "specialness" of the move_to() and
goto() constructs, they do not have to
directly appear in the graph; instead,
each action/goal is labeled by "where it
occurs". The planner then builds a topological
sort of the action/goal graph, while trying
to minimize the movement required (basically
a restricted case of the travelling salesman
problem).

The effect is that the planner figures out
everywhere it needs to go first, then optimizes
its movement to be as efficient as possible.

Which is pretty much what I want.

The other things that the DAG approach to
scheduling allows is that the AI can attempt
to meet multiple goals at the same time, and
interleave the actions for meeting them.
Additionally, the AI can form two different
plans for how to meet one goal if it knows
that the first way may not succeed. It
can then interleave both of those plans
as well.

Interestingly, DAG-optimization is a
standard compiler thing. One easy DAG
optimization is common subexpression
elimintation, wherein the same value,
computed twice, is combined. The equivalent
for me is to notice that, say, two different
goals require me to get a particular key,
and I combine these actions together and
only get the key once.

The travelling salesman part of my optimization
doesn't much resemble compiler optimizations,
althought it's vaguely like the optimizations
performed to minimize the number of temporary
registers required, which explicitly carefully
chooses what order to evaluate things in.

All in all, as long as plans don't get too
complex and goals don't get to numerous,
this stays quite easily in the computationally
feasible range, while the navigator can
become as arbitrarily independently complex
as we like.

I've left out a lot of details, such as the
fact that some goals may not be decomposed
but appear themselves in the DAG, to keep
the size of the DAG down. These sorts of
details aren't that important; I think they'd
be pretty obvious if you sat down and tried
to implement smoething like this yourself.

Anyway, in summary, the planner has the following
interface (which is interesting because I'm
not writing the full planner initially, but
I want to reuse the rest of the components):

KNOWLEDGE PLANNER NAVIGATOR
BASE

knowledge hierarchical determine all
base -- <-- decomposition --> routes and
what the into subplans obstacles
AI knows |
|
travelling compute distance
salesman sort --> of obstacle-free
| path from A to B
|
build plan --> build detailed
obstacle-free path
from A to B
-->
subroutine call

The navigator also accesses the knowledge base, of course.

The most severe problem with this whole approach
is the fact that actions change the world, and
change the cost of other actions. Thus we can't
compute the cost of, say, the first half of the
action and the cost of the second half, and locally
optimize them. We have to consider what's changed
from various possible first halves when we look at
the second half.

In my approach, what this means in actuality is
that when we do the "travelling salesman" problem
we have to pay attention to the order of things
and notice what effects they have on the costs.
The costs of the primitive move_to() actions will
not be influenced by past actions (because there
should be an obstacle anywhere the cost of moving
might be variable). However, the costs of the
actual actions described by a node in the graph
might change.

Sean Barrett