Disambiguator tricks

15 views
Skip to first unread message

Jeff Hatch

unread,
Jan 20, 1998, 3:00:00 AM1/20/98
to

I've said some confusing things about my disambiguator in a couple of
threads, especially the "Unexpected bonus syntax" thread. Here's the
long version of my explanation.

THEORY

Disambiguation requires intelligent behavior by the parser. Programming
a computer to break most commands into actor-verb-noun-preposition-noun
sequences is a trivial task, but some commands don't neatly fall into
place with a simple algorithm. For instance, the command "Put two of
the books in the sack" has two prepositional phrases instead of one, so
the phrase "two of the books" has to be treated as a plural direct
object.

No problem yet. But what about the command, "Put two of spades in
sack"? Does this mean "Put the two of spades in the sack," or does it
mean "Put two of the spades in the sack"? There's no way a parser can
figure out which one the player intends without looking at the game
situation. Is the player carrying a two of spades, or is he carrying a
pair of spades?

The best IF libraries can even handle that problem. But what about "put
fly in amber in hole in wall"? What about "Examine the In box. Get In
box"? What about "give Steve Green green apple"? We'd like these
commands to work, because their meaning is obvious. But we don't want
to write special code for these situations, because they occur so seldom
it's not worth the trouble.

I believe these sentences can be handled effortlessly by using a totally
different technique for disambiguation.

The current disambiguation strategy reminds me of early chess-playing
programs. Game designers once tried to write sophisticated code so that
computers could understand the strategic situation on a chessboard in
much the same way humans do. Eventually, a "brute force" approach
proved more successful. Using this method, a computer doesn't
"understand" each board position nearly as well as we do--but it
examines all the possible paths that a game can take for so many moves
that an average computer chess player can defeat most of us.


In much the same way, I think a disambiguator that "looks ahead" will
disambiguate confusing sentences with much less effort than a
disambiguator that tries to "understand" context and game situation in
advance. If you use TADS, you're familiar with this concept--the TADS
disambiguator sometimes makes "quiet calls" to Verify routines to see
whether certain verbs make sense with certain items. I propose an

extension of that technique.


IMPLEMENTATION: THE "BRANCH" STATEMENT

There's no easy way to "look ahead" at will in any established IF
authoring language that I know of. The TADS Verify routines are similar
to what I want, but inadequate in few ways. First, TADS Verify routines
are forbidden to change variable values. Also, TADS Verify routines
must always succeed completely or fail completely, and in some
situations, I want the parser to know that a certain interpration is
unlikely, but still possible. It's better if verification code can be
mixed seamlessly with execution code.

This is one of the reasons I decided to write my own IF authoring
system. Before I describe how my library will use the disambiguator,
let me demonstrate the built-in "branch" statement which is the core of
my disambiguation strategy:

void Execution

{

int CantSeeError = 10000, AlreadyHaveError = 5000, AdjectiveMatch = 100;

int MatchedFruit = 0, MatchedBook = 0;

branch MatchedOrange = 1;

if (MatchedOrange == 1) {

error AlreadyHaveError;
EndMessage ("You already have the orange!");
}

branch MatchedBook = 1;

if (MatchedBook == 1) {

error AdjectiveMatch;
EndMessage ("You take the book.");
}

error CantSeeError;
EndMessage ("You can't see any 'orange' here.");

}


This function always outputs, "You take the book," because that path
gives the lowest error number.

Notes:
1. The "int" keyword begins a list of local variable declarations. The

equivalent Inform syntax would be "[Main CantSeeError AlreadyHaveError

AdjectiveMatch MatchedFruit MatchedBook; CantSeeError = 10000..."

2. The "EndMessage" function prints text and then exits, so if the
message
"You already have the orange!" is printed, none of the later code will
execute.

3. The "error" command has no direct game effect. It's only useful in

conjuction with the branch command.

4. The "branch" command is similar to an "if" statement, because the

statement which comes after the branch may or may not be executed in any

particular turn. But instead of evaluating an expression to see whether

or not the optional code will be executed, my system "quietly" executes

ALL remaining code which would be executed under EITHER possibilities,

and then prints the path which yielded the best result! (The "best"
result
is the one that yields the lowest error value.)

IMPLEMENTATION: USING "BRANCH" IN A LIBRARY
How does the "branch" statement solve the problem of disambiguation?
Well, the simplest way to use it is to branch every time the parser
matches a word or phrase, so the parser either accepts the match or
rejects it, depending on which choice gives the best result.

For instance, my system's IF library is so primitive that I haven't even
built the concept of "scope" into the parser. If I type the sentence
"get knife," my parser looks at every item in the game to see if one
matches the word "knife." When parsing is finished, a post-parsing
routine checks the direct object's location and says "You can't see the
knife here" if necessary. But by using the "branch" command, I was able
to make my system distinguish between two identical knives. When I was
in a room with one of the knives and typed "get knife," my system looked
at three different messages:

"You take the knife."
"You can't see the knife here."
"I don't understand your direct object."

After I assigned error numbers to the "You can't see..." and "I don't
understand..." messages, the first message was the only one that didn't
cause an error, so this was the one that was finally printed out.

A similar method will work for most situations (though my library isn't
sufficiently sophisticated yet). For instance, if a particular object
matches three words in a row, my parser may try accepting zero, one,
two, or three words. Here's how the sentence "put fly in amber in hole
in wall" would be understood, if I used the traditional technique of
matching as many words as possible:

1. "I only understood you as far as wanting to put the fly in amber
somewhere." [The parser interpreted the command as (Put) (fly in amber
in)...]
2. "You put the fly in amber in the hole in the wall." ["put (fly in
amber) in (hole in wall)"]
3. "I only understood you as far as wanting to put the fly in amber in
the hole in the wall."
4. Same as #3.
5. "I only understood you as far as wanting to put the fly in amber in
something."
6. Same as #1.
7. Same as #1.
8. "I only understood you as far as wanting to put the fly in amber in
the fly in amber." [(Put) (fly) (in) (amber)...]
9. Same as #1.
10. "I only understood you as far as wanting to put something
somewhere."
11. "I don't understand the word 'put' as a verb."

If my library used this technique, command #2 would be quickly selected,
and the player would have no need to realize how many different ways his
command could be interpreted.

Or consider the sentence "get two of spades." Skipping duplications,
the potential outputs would be:
1. "You can only see one spade here."
2. "You take the two of spades."
3. "I only understood you as far as wanting to take the two of spades."
4. "I only understood you as far as wanting to take something."
5. "I don't understand the word 'get' as a verb."


Thanks for your patience in reading this far! I'd be glad to hear your
comments, and to answer any non-technical questions. Tomorrow I'll
explain some of the technical details I've omitted in this summary and
mention some other uses for the branch statement, if time permits.


-Rúmil

Reply all
Reply to author
Forward
0 new messages