grammar & parsing


Seth Whelan

Apr 4, 2000
Hi. I'm new to this list but not to IF. Lately I've
been working on a personal project and the issue of parsing
player input is giving me designer's block. The basic question
that's plaguing me is how to take what the player has typed and
resolve that into an action of some kind. I've thought of a few
solutions, but none of them are as neat as the way that Inform
handles it.
Okay, here's my question(s). Is there a paper out there
that covers how Inform handles grammar? Is there a resource that
covers what is generally considered the "right way", the "best
way", or even just a good way to parse player input? Barring
those two, is this a subject that even comes up and that people
discuss, and if so how can I get in on it?
Replies, comments and insults welcome.


____________________________________________________________________
A fool must now and then be right by chance.


Xenos

Apr 7, 2000
The basic concept is the same as parsing any kind of free-form input,
such as parsing compiler languages. I believe Infocom used an LALR
parser for their games. There are a lot of great tools out there that
will generate parsers for you -- not just the ones specifically made
for IF games, but also tools such as ANTLR and Yacc (or Bison).

You can download my C++ parser for IF games from:
http://home.stny.rr.com/xenos/parser_rel_1.0.zip
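For a concrete feel of the Inform-style approach the original poster
asked about, here is a minimal sketch in Python. The grammar table,
verbs, and object names are all hypothetical, not taken from any real
game or from Inform itself: each verb maps to a list of token
patterns, and the parser tries each pattern in turn.

```python
# Hypothetical Inform-style grammar table: each verb has a list of
# patterns, where "noun" matches a visible object and anything else
# is a literal word (a preposition) that must match exactly.
GRAMMAR = {
    "take": [["noun"], ["noun", "from", "noun"]],
    "put":  [["noun", "in", "noun"], ["noun", "on", "noun"]],
}
OBJECTS = {"lamp", "sword", "case"}  # the world model's visible objects

def parse(command):
    words = command.lower().split()
    if not words:
        return None
    verb, rest = words[0], words[1:]
    for pattern in GRAMMAR.get(verb, []):
        nouns, ok, i = [], True, 0
        for slot in pattern:
            if slot == "noun":
                if i < len(rest) and rest[i] in OBJECTS:
                    nouns.append(rest[i]); i += 1
                else:
                    ok = False; break
            else:  # literal preposition
                if i < len(rest) and rest[i] == slot:
                    i += 1
                else:
                    ok = False; break
        if ok and i == len(rest):
            return (verb, *nouns)
    return None

print(parse("put lamp in case"))     # ('put', 'lamp', 'case')
print(parse("take lamp from case"))  # ('take', 'lamp', 'case')
```

Resolution of a matched "noun" slot into an actual world object is
where the real work happens; this sketch only shows the pattern-table
shape of the approach.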

Matthew T. Russotto

Apr 8, 2000
In article <7urseskc1budn4va2...@4ax.com>,

Xenos <nob...@nowhere.com> wrote:
}The basic concept is the same as parsing any kind of free-form input,
}such as parsing compiler languages. I believe Infocom used an LALR
}parser for their games.

Well, they SAID they did. I don't think so -- the grammar the parsers
accepted was certainly not LALR.


--
Matthew T. Russotto russ...@pond.com
"Extremism in defense of liberty is no vice, and moderation in pursuit
of justice is no virtue."

Xenos

Apr 8, 2000
Why do you say that? An LALR parser is very powerful, and can be
built to accept context-free grammars and beyond. They are definitely
more powerful than the DFA-based parsers built with Yacc. An LALR
parser can definitely handle the complexity of the grammar accepted
by Zork, et al.

I don't think you can definitively state what parsing method a program
uses just from the grammar it accepts. Parsing techniques much less
powerful than LALR can handle very complex sentences. Mine, for
example, will parse very complex paragraphs of compound sentences, and
is built using a DFA built from a modified form of BNF -- both of
which will only accept context-free grammars. Child's play for a
parser using LALR.

Matthew T. Russotto

Apr 8, 2000
In article <9gvsescva5fq77g9n...@4ax.com>,

Xenos <nob...@nowhere.com> wrote:
}Why do you say that? An LALR parser is very powerful, and can be
}built to accept context-free grammers and beyond. They are definitely
}more powerful than the DNF based parsers built with Yacc. An LALR
}paser can definely handle the complexity of the grammer accepted by
}Zork, et. al.

Yes -- my point was the other way around, that the grammar accepted by
Zork 0 is very simple and the limitations would seem to indicate a
more limited parsing technique. It's possible they wrote a powerful
parser and hobbled it with a weak grammar, but I find it unlikely.

I thought YACC handled LALR(1) grammars.

Andrew Plotkin

Apr 8, 2000
Matthew T. Russotto <russ...@wanda.vf.pond.com> wrote:
> In article <9gvsescva5fq77g9n...@4ax.com>,
> Xenos <nob...@nowhere.com> wrote:
> }Why do you say that? An LALR parser is very powerful, and can be
> }built to accept context-free grammers and beyond. They are definitely
> }more powerful than the DNF based parsers built with Yacc. An LALR
> }paser can definely handle the complexity of the grammer accepted by
> }Zork, et. al.
>
> Yes -- my point was the other way around, that the grammar accepted by
> Zork 0 is very simple and the limitations would seem to indicate a
> more limited parsing technique. It's possible they wrote a powerful
> parser and hobbled it with a weak grammar, but I find it unlikely.

I agree.

> I thought YACC handled LALR(1) grammars.

Man page says: "yacc converts a context-free grammar into a set of tables
for a simple automaton which executes an LR(1) parsing algorithm."

The whole parser-theory family of grammars is terrific for parsing
computer languages, and terrible for natural language. (Even
pseudo-natural language, such as IF command syntax.)

Computer language grammars are recursive (arbitrarily nested parentheses,
statements, etc) and totally unambiguous. IF syntax is neither. Probably
someone has coded a parser that can accept "GET ALL ROCKS EXCEPT ALL THE
BLUE ROCKS EXCEPT THE SMALL BLUE ROCK", but I bet nobody has ever
*entered* such a command in the course of playing a game.
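For what it's worth, the EXCEPT chain itself is not hard once noun
phrases resolve to sets of objects. Here is a sketch in Python with a
hypothetical three-rock world; treating nested EXCEPTs as
right-associative (so the last exception re-includes the small blue
rock) is an assumed design choice, not anyone's documented behavior.

```python
# Hypothetical world: objects are (adjective-set, noun) pairs.
WORLD = [
    ({"small", "blue"}, "rock"),   # 0
    ({"big", "blue"}, "rock"),     # 1
    ({"red"}, "rock"),             # 2
]

def matches(obj, words):
    adjs, noun = obj
    *mods, head = words
    head = head[:-1] if head.endswith("s") else head  # crude plural strip
    return head == noun and set(mods) <= adjs

def resolve(phrase):
    # "A EXCEPT B EXCEPT C" is read right-associatively: A - (B - C),
    # so an inner EXCEPT re-includes objects into the result.
    head, sep, rest = phrase.lower().partition(" except ")
    words = [w for w in head.split() if w not in ("all", "the")]
    selected = {i for i, o in enumerate(WORLD) if matches(o, words)}
    if sep:
        selected -= resolve(rest)
    return selected

print(resolve(
    "all rocks except all the blue rocks except the small blue rock"))
# {0, 2}: the small blue rock and the red rock
```

Whether any player would ever type such a command is, as noted, a
separate question from whether it is parseable.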

--Z

"And Aholibamah bare Jeush, and Jaalam, and Korah: these were the
borogoves..."

Xenos

Apr 8, 2000
I never played Zork Zero, so I couldn't say, but I believe it was
supposed to be a 'sampler'. It came later than Zork, which could
handle complex sentences. I think when the authors indicated that
they used an LALR parser, they were talking about 1, 2, & 3 (or maybe
even just the original?).

Yacc does handle LALR(1). At least, that's what it says in "Lex &
Yacc", published by O'Reilly. What I was getting at was that since it
uses finite automata to do its parsing, it has trouble handling
more advanced techniques such as backtracking, predicates, and
lookahead values greater than one.


Xenos

Apr 8, 2000
It's not that the grammars are ambiguous, and it's not a problem of
syntax. Grammars for IF can be just as unambiguous as those for
computer languages. The problem lies in the semantics. Spoken
languages are such that a syntactically correct sentence can have
multiple meanings depending on the context it's used in. Then throw in
such things as slang, pronouns, inferred meaning, etc., and it gets
worse. But computer languages also have semantic ambiguity problems,
such as function overloading, literal constant typing, etc.


Bruce Stephens

Apr 8, 2000
Andrew Plotkin <erky...@eblong.com> writes:

[...]

> Computer language grammars are recursive (arbitrarily nested
> parentheses, statements, etc) and totally unambiguous.

Actually, some are syntactically ambiguous. For example, C++ has a
special rule which says that if something is syntactically both a
legal declaration and a statement, then it should be regarded as a
declaration.

Anyway, your basic point seems correct: parsing computer programs is a
quite different problem to parsing natural language. Especially when
you're parsing single sentences which you know are going to be pretty
short (almost always less than 10 words long).

> IF syntax is neither. Probably someone has coded a parser that can
> accept "GET ALL ROCKS EXCEPT ALL THE BLUE ROCKS EXCEPT THE SMALL
> BLUE ROCK", but I bet nobody has ever *entered* such a command in
> the course of playing a game.

I think the Magnetic Scrolls parser could do things like that. Also
the famous "PLANT THE POT PLANT IN THE PLANT POT".

Xenos

Apr 9, 2000
On 08 Apr 2000 23:30:35 +0100, Bruce Stephens
<bruce+...@cenderis.demon.co.uk> wrote:

>Andrew Plotkin <erky...@eblong.com> writes:
>
>[...]
>
>> Computer language grammars are recursive (arbitrarily nested
>> parentheses, statements, etc) and totally unambiguous.
>
>Actually, some are syntactically ambiguous. For example, C++ has a
>special rule which says that if something is syntactically both a
>legal declaration and a statement, then it should be regarded as a
>declaration.
>
>Anyway, your basic point seems correct: parsing computer programs is a
>quite different problem to parsing natural language. Especially when
>you're parsing single sentences which you know are going to be pretty
>short (almost always less than 10 words long).

To throw away everything we know about pattern recognition because
parsing a "natural" language is (seemingly) different from parsing a
computer language is asinine and short-sighted. The techniques and
tools developed for parsing are used for a lot more things than
building compilers.

Joe Mason

Apr 9, 2000
Xenos <nob...@nowhere.com> wrote:
>>Anyway, your basic point seems correct: parsing computer programs is a
>>quite different problem to parsing natural language. Especially when
>>you're parsing single sentences which you know are going to be pretty
>>short (almost always less than 10 words long).
>
>To throw away everthing we know about pattern recognition because
>parsing a "natural" language is (seemingly) different from parsing a
>computer language, is asinine and short-sighted. The techniques and
>tools developed for parsing are used for a lot more things than
>building compilers.

Well, it doesn't really matter. Parsing the player's input isn't the
difficult part of writing IF - the hard part is creating an appropriate world
model. Making the grammar that's recognized more complex will mean the world
model has to be more complex to match.

Joe

Matthew T. Russotto

Apr 9, 2000
In article <olavessnd7jgaqqol...@4ax.com>,

Xenos <nob...@nowhere.com> wrote:
}I never played zork zero, so I couldn't say but I believe it was
}suppose to be a 'sampler'. It came later than Zork, which could
}handle complex sentences.

No, it was Zork Zero (and Arthur and Shogun) that the "LALR" flummery
was written about. It wasn't a sampler but a prequel. They claim it
"used an ATN algorithm with a LALR grammar and one-token lookahead".
I'm not sure what ATN stands for, but the grammar is far more
restricted than LALR(1).

Bruce Stephens

Apr 9, 2000
Xenos <nob...@nowhere.com> writes:

[...]

> To throw away everthing we know about pattern recognition because
> parsing a "natural" language is (seemingly) different from parsing a
> computer language, is asinine and short-sighted. The techniques and
> tools developed for parsing are used for a lot more things than
> building compilers.

I wasn't suggesting that.

It's just that the two problems are different: algorithms which you
might reject for parsing computer code (because they're potentially
exponential in the length of the program or something) might well be
entirely practical for parsing commands to an interactive fiction
game.

Xenos

Apr 9, 2000
On Sun, 09 Apr 2000 01:10:11 GMT, jcm...@uwaterloo.ca (Joe Mason)
wrote:

>Making the grammar that's recognized more complex will mean the world
>model has to be more complex to match.

I don't agree, though I think the converse is true.

Xenos

Apr 9, 2000
On 09 Apr 2000 13:16:09 +0100, Bruce Stephens
<bruce+...@cenderis.demon.co.uk> wrote:

>It's just that the two problems are different: algorithms which you
>might reject for parsing computer code (because they're potentially
>exponential in the length of the program or something) might well be
>entirely practical for parsing commands to an interactive fiction
>game.

If that were true, I would agree with you. But I doubt you could give
an example of a parsing algorithm (used in practice, not academia)
that would be O(n^3) for a programming language and better for IF. I
think this would only occur in a contrived example, and almost never
in practice.

Xenos

Apr 9, 2000
On Sun, 09 Apr 2000 01:56:00 GMT, russ...@wanda.vf.pond.com (Matthew
T. Russotto) wrote:

>In article <olavessnd7jgaqqol...@4ax.com>,
>Xenos <nob...@nowhere.com> wrote:
>}I never played zork zero, so I couldn't say but I believe it was
>}suppose to be a 'sampler'. It came later than Zork, which could
>}handle complex sentences.
>
>No, it was Zork Zero (and Arthur and Shogun) that the "LALR" flummery
>was written about. It wasn't a sampler but a prequel. They claim it
>"used an ATN algorithm with a LALR grammar and one-token lookahead".
>I'm not sure what ATN stands for, but the grammar is far more
>restricted than LALR(1).


Interesting... ATN stands for Augmented Transition Network, a parsing
formalism from computational linguistics.

I found the article by Stu Galley of Infocom where he mentions using
an ATN and an LALR grammar for Zork Zero
(http://www.bf.rmit.edu.au/~fayep/Zork/parser.html). He implied that
the parser used for this game was better than the one used for the
earlier games. As I said, I never played it, so I can't say. Are you
saying that it wasn't?


Bruce Stephens

Apr 9, 2000
Xenos <nob...@nowhere.com> writes:

[...]

> If that were true, I would agree with you. But I doubt you could give
> an example of a parsing algorithm (used is practice not academia) that
> would be O(n^3) for a programming language and better for an IF. I
> think this would only occur in a contrived example, and almost never
> in practice.

Try a generic backtracking parser. That would be fine for IF: it
would be quite feasible to parse "PLANT THE POT PLANT IN THE PLANT
POT" with such a thing. It would be overkill for programming
languages, however: they tend to be designed so that the lexical
tokens have only a single interpretation.
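A minimal sketch of such a backtracking parser in Python, with a
hypothetical four-word lexicon. The grammar here is just VERB NP
[PREP NP] with NP -> ART ADJ* NOUN; the generator yields every way to
split a noun phrase, so when "pot" taken as the head noun leads to a
dead end, the parser backtracks and tries "pot" as an adjective.

```python
# Hypothetical ambiguous lexicon: words belong to several categories.
LEXICON = {
    "plant": {"verb", "noun", "adj"},
    "pot":   {"noun", "adj"},
    "the":   {"art"},
    "in":    {"prep"},
}

def cats(word):
    return LEXICON.get(word, set())

def parse_np(words):
    # NP -> ART ADJ* NOUN; yield the leftover words for each parse.
    if words and "art" in cats(words[0]):
        yield from parse_tail(words[1:])

def parse_tail(words):
    if words and "noun" in cats(words[0]):
        yield words[1:]                   # stop: this word is the head noun
    if words and "adj" in cats(words[0]):
        yield from parse_tail(words[1:])  # or: one more adjective

def parse(words):
    # Command -> VERB NP [PREP NP], consuming all input.
    if not (words and "verb" in cats(words[0])):
        return False
    for rest in parse_np(words[1:]):
        if not rest:
            return True
        if "prep" in cats(rest[0]):
            if any(not r for r in parse_np(rest[1:])):
                return True
    return False

print(parse("plant the pot plant in the plant pot".split()))  # True
```

Exhaustive search like this is hopeless for long programs but entirely
comfortable for ten-word commands, which is the point above.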

Whether a backtracking parser would be *better* than something else is
an entirely different question. There have been perfectly good games
which only understood one and two word commands. It takes a complex
world model to make more complex sentences necessary, and "PLANT THE
POT PLANT IN THE PLANT POT" was only really put there (in "The Pawn")
to show off the parser.

I guess what I'm really suggesting is that parsing commands is
typically a trivial part of a game, and it really doesn't matter what
you use to do it. However, if you're looking for something more
complex to use, you may as well try to solve problems like PLANT being
a verb, noun, and adjective. It seems quite likely that different
algorithms will be appropriate for short sentences with (potentially)
lots of ambiguous tokens.

Xenos

Apr 9, 2000
On 09 Apr 2000 17:02:34 +0100, Bruce Stephens
<bruce+...@cenderis.demon.co.uk> wrote:


>Try a generic backtracking parser. That would be fine for IF: it
>would be quite feasible to parse "PLANT THE POT PLANT IN THE PLANT
>POT" with such a thing. It would be overkill for programming
>languages, however: they tend to be designed so that the lexical
>tokens have only a single interpretation.

That is only true for languages with reserved words. In PL/1 you
could write: IF IF = THEN THEN THEN = ELSE ELSE ELSE = THEN, which is
similar to your PLANT example. The difficulty here is NOT the
grammar, which is trivial. With your plant example, forgetting for a
moment such things as optional parts, you have: <verb> <article>
<adjective> <noun/direct object> <preposition> <article>
<noun/indirect object>. The problem arises from allowing the tokens
or words to be part of multiple classes, or parts of speech. This is
a semantic problem, not a syntactic one, and exists in both human and
computer languages, which is why I submit that the two problems are
not that different.
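The scale of that ambiguity is easy to make concrete. Under a
hypothetical four-word lexicon, simply enumerating the candidate
part-of-speech assignments for the PLANT sentence shows how many
readings the syntax alone admits:

```python
from itertools import product

# Hypothetical lexicon in which tokens belong to multiple categories.
LEXICON = {
    "plant": {"verb", "noun", "adj"},
    "pot":   {"noun", "adj"},
    "the":   {"art"},
    "in":    {"prep"},
}

words = "plant the pot plant in the plant pot".split()
taggings = list(product(*(sorted(LEXICON[w]) for w in words)))
print(len(taggings))  # 3 * 1 * 2 * 3 * 1 * 1 * 3 * 2 = 108 taggings

# Only one of the 108 fits VERB ART ADJ NOUN PREP ART ADJ NOUN.
pattern = ("verb", "art", "adj", "noun", "prep", "art", "adj", "noun")
print(sum(t == pattern for t in taggings))  # 1
```

The grammar filters 108 candidates down to one; it is the lexicon's
ambiguity, not the grammar, that creates the work.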

Also, remember that there are many different kinds of computer
languages. I would agree that a backtracking parser may not be suited
for static-scoped, imperative languages, but for dynamic-scoped,
functional languages, it may be just the ticket.

>
>Whether a backtracking parser would be *better* than something else is
>an entirely different question. There have been perfectly good games
>which only understood one and two word commands. It takes a complex
>world model to make more complex sentences necessary, and "PLANT THE
>POT PLANT IN THE PLANT POT" was only really put there (in "The Pawn")
>to show off the parser.

I agree that a very complex parser would be overkill in a small,
simple world, but the complexity of the parser should be driven by the
complexity of the world, not the other way around.


>
>I guess what I'm really suggesting is that parsing commands is
>typically a trivial part of a game, and it really doesn't matter what
>you use to do it. However, if you're looking for something more
>complex to use, you may as well try to solve problems like PLANT being
>a verb, noun, and adjective. It seems quite likely that different
>algorithms will be appropriate for short sentences with (potentially)
>lots of ambiguous tokens.

Agreed, but to say that algorithm x is efficient for computer
languages, and that this therefore implies x is inefficient for IF
languages, is flawed logic. Structurally, they can be very similar.
The major difficulty with parsing IF is the fact that words exist in
more than one part of speech. Most designers of computer languages
try to avoid this.

Bruce Stephens

Apr 9, 2000
Xenos <nob...@nowhere.com> writes:

[...]

> That is only true for languages with reserved words. In PL/1 you
> could write: IF IF = THEN THEN THEN = ELSE ELSE ELSE = THEN, which
> is similar to your PLANT example. The difficulty here is NOT the
> grammar, which is trivial. With your plant example, forgetting for
> a moment such things as optional parts, you have: <verb> <article>
> <adjective> <noun/direct object> <preposition> <article>
> <noun/indirect object>. The problem arises from allowing the tokens
> or words to be part of multiple classes, or parts of speech. This
> is a semantic problem, not a syntactic one, and exists in both
> human and computer languages, which is why I submit that the two
> problems are not that different.

It exists in computer languages, but it's very much less of a problem
in computer languages. PL/1 is a bit of an exception in this respect
(although it's certainly not alone).

> Also, remember that there are many different kinds of computer
> languages. I would agree that backtracking parser may not be suited
> for static-scoped, imperative languages, but for dynamic-scoped,
> functional languages, they may be just the ticket.

I suspect not; at the time when one needs to know what a symbol is,
its type is usually unambiguous. I could be wrong though---it's
probably not important either way.

[...]

> I agree that a very complex parser would be overkill in a small,
> simple world, but the complexity of the parser should be driven by
> the complexity of the world, not the other way around.

Of course. I think some games have been harmed because they know
about ropes and objects to which ropes may be tied, but don't provide
a parser good enough to make it easy to tie ropes to objects. (I
think one of the Dungeon-type Level-9 games was like this, IIRC.)

[...]

> Agreed, but to say that algorithm x is efficient for computer
> languages, and that this therefore implies x is inefficient for IF
> languages, is flawed logic.

I was suggesting something quite different: just because algorithm x
is inefficient for computer languages doesn't mean it's not suitable
for IF purposes.

> Structurally, they can be very similar. The major difficulty with
> parsing IF is the fact that words exist in more than one part of
> speech. Most designers of computer languages try to avoid this.

Agreed.

Xenos

Apr 9, 2000
On 09 Apr 2000 22:00:48 +0100, Bruce Stephens
<bruce+...@cenderis.demon.co.uk> wrote:

>Of course. I think some games have been harmed because they know
>about ropes and objects to which ropes may be tied, but don't provide
>a parser good enough to make it easy to tie ropes to objects. (I
>think one of the Dungeon-type Level-9 games was like this, IIRC.)

I can't think of a truer statement!


>I was suggesting something quite different: just because algorithm x
>is inefficient for computer languages doesn't mean it's not suitable
>for IF purposes.
>

I concur.

Jonadab the Unsightly One

Apr 10, 2000
Xenos <nob...@nowhere.com> wrote:

> Agreed, but to say that algorithm x is efficient for computer
> languages, and that this therefore implies x is inefficient for IF
> languages, is flawed logic. Structurally, they can be very similar.
> The major difficulty with parsing IF is the fact that words exist
> in more than one part of speech. Most designers of computer
> languages try to avoid this.

It's worse than that, actually. A word may exist in multiple
parts of speech _and_ in any one of those parts of speech
may have several entirely unrelated meanings depending on
context; further, which meaning is in effect may alter the
structure of the rest of the sentence. Worse, not only does
the context of the word within the sentence affect the
parsing, but sometimes game state has a large impact as
well, at which point many algorithms used to parse computer
languages will probably go all to pieces.
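A toy Python illustration of that last point, with hypothetical rooms
and objects: what a word denotes depends on game state, so resolution
has to consult the world model rather than any fixed lexicon.

```python
# Hypothetical world model: the same word names different objects in
# different rooms, so the parser's lexicon is effectively dynamic.
WORLD = {
    "garden":  {"plant": "the pot plant"},
    "kitchen": {"plant": "the plant pot", "knife": "a carving knife"},
}
state = {"location": "garden"}

def resolve(word):
    # Resolution depends on current game state, not just the sentence.
    return WORLD[state["location"]].get(word)

print(resolve("plant"))   # the pot plant
state["location"] = "kitchen"
print(resolve("plant"))   # the plant pot
```

A table-driven compiler parser assumes the token stream's meaning is
fixed in advance; here it changes every time the player moves.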


--

Forward all spam to u...@ftc.gov

okbl...@my-deja.com

Apr 10, 2000
In article <38f16ac9...@news.bright.net>,

jon...@bright.net (Jonadab the Unsightly One) wrote:
>
> It's worse than that, actually. A word may exist in multiple
> parts of speech _and_ in any one of those parts of speech
> may have several entirely unrelated meanings depending on
> context; further, which meaning is in effect may alter the
> structure of the rest of the sentence. Worse, not only does
> the context of the word within the sentence affect the
> parsing, but sometimes game state has a large impact as
> well, at which point many algorithms used to parse computer
> languages will probably go all to pieces.

And it gets worse still with idioms: an idiom can be any combination
of words whose literal meaning is often wildly different from its
actual meaning, and the idiom as a whole may also be used as a
different part of speech.

Stu is about to go on stage.
> BREAK A LEG
You push Stu into the orchestra pit, breaking his leg. As the
understudy, you must go on for him now!

Compound that with the fact that *any* phrase of the form:

[gerund][noun phrase]

can be interpreted as a euphemism for masturbation and you have a *real*
problem. ;-)
--
[ok]
(Spending too much time "posting to usenet", as they say.)



Joe Mason

Apr 10, 2000

I probably could have phrased that better. The world model doesn't HAVE to be
made more complex, but if it isn't then what's the point of making the grammar
more complex?

The canonical example is adding adverbs to the grammar. It's perfectly
possible to parse "DROP THE VASE GENTLY" and "SLOWLY CRAWL NORTH", but if the
world model treats the two commands the same as "DROP THE VASE" and "GO NORTH"
then no player will bother using them, and the extra effort on the grammar is
wasted.
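A sketch in Python of exactly that situation (the word lists are
hypothetical): the grammar happily accepts the adverb, but nothing in
the world model consumes it.

```python
# Hypothetical adverb set; real games would attach these to actions.
ADVERBS = {"gently", "slowly", "carefully"}

def parse(command):
    words = command.lower().split()
    manner = [w for w in words if w in ADVERBS]
    rest = [w for w in words if w not in ADVERBS]
    # Without world-model support, "manner" is parsed and then ignored.
    return {"verb": rest[0], "args": rest[1:], "manner": manner}

print(parse("DROP THE VASE GENTLY"))
# {'verb': 'drop', 'args': ['the', 'vase'], 'manner': ['gently']}
print(parse("SLOWLY CRAWL NORTH"))
# {'verb': 'crawl', 'args': ['north'], 'manner': ['slowly']}
```

The parsing is trivial; the open question is what "gently" should
actually change about dropping the vase.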

This is pretty much what you said up-thread: the complexity of the world model
has to drive the grammar. I just approached it from the other end.

Joe

John W. Kennedy

Apr 10, 2000
Bruce Stephens wrote:
> It exists in computer languages, but it's very much less of a problem
> in computer languages. PL/1 is a bit of an exception in this respect
> (although it's certainly not alone).

Pretty nearly -- the only other cases I can think of are FORTRAN (like
PL/I -- which took the feature from FORTRAN -- its syntax is such that
all keywords can always be disambiguated when used as names) and APL (no
keywords at all, but a huge collection of special characters). I rather
prefer such languages as these, with no reserved words, but they're
rare. The worst is COBOL, whose textbooks recommend eschewing the
entire technical vocabularies of information processing and accounting
when naming things, on the theory that such a word that isn't reserved
in COBOL today very likely will be tomorrow.

One other language that theoretically had no reserved words was ALGOL
60, which achieved it on the ridiculous and, at the time, impossible
principle that all keywords were in a different font. Real-world ALGOL
60 compilers either ignored this, using reserved words in defiance of
the standard, or else used the horrid convention illustrated here:

'IF' X < Y 'THEN' OUTSTRING ('('X IS LESS THAN Y')');

(or, given the character sets usually available at the time, more
probably:)

'IF' X 'LT' Y 'THEN' OUTSTRING ('('X IS LESS THAN Y')');

--
-John W. Kennedy
-rri...@ibm.net
Compact is becoming contract
Man only earns and pays. -- Charles Williams
