On parsers, yet again

Johann 'Myrkraverk' Oskarsson

Feb 15, 2022, 8:36:05 PM
Dear r.a.i-f,

I have been wondering, how are IF parsers generally constructed? Is
there literature on this topic? As in, is it more like programming
language parsing, for which there's abundant literature in compilers,
or is it more like natural language parsing, which I guess is slightly
different? Or neither?

For creating a game, I would probably use TADS, or Inform 6, or some
other ready-made environment for exactly that. However, I have been
wondering if parsers are really that /hard/ to do, or just more like
/annoying/ to make?

Anyone here to share anything on the subject?

--
Johann | email: invalid -> com | www.myrkraverk.com/blog/
I'm not from the Internet, I just work there. | twitter: @myrkraverk

Greg Ewing

Feb 16, 2022, 7:20:04 AM
On 16/02/22 2:36 pm, Johann 'Myrkraverk' Oskarsson wrote:
> I have been wondering, how are IF parsers generally constructed?  Is
> there literature on this topic?  As in, is it more like programming
> language parsing, for which there's abundant literature in compilers,
> or is it more like natural language parsing, which I guess is slightly
> different?  Or neither?

In my experience they're much more like programming language
parsers than natural language parsers. IF input languages are
usually a very restricted subset of natural languages, so you
don't tend to have the same problems of vagueness and ambiguity
that you get when trying to parse natural languages.

> I have been
> wondering if parsers are really that /hard/ to do, or just more like
> /annoying/ to make?

They're not really hard, especially if you have some familiarity
with the techniques used for parsing programming languages. In
fact, IF input languages are usually a lot simpler than typical
programming languages. Most of the complexity comes in figuring
out what to *do* in response to what the player typed.
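
To make that concrete, here is a minimal sketch in Python of how little a
basic command parser needs. It is not taken from any IF system: the word
lists and the function name parse are assumptions made up for this
illustration, and a real parser would handle far more (pronouns, "all",
multiple objects, and so on).

```python
# Hypothetical word lists; a real game would define its own.
ARTICLES = {"the", "a", "an"}
PREPOSITIONS = {"in", "on", "with", "to", "at"}


def parse(command):
    """Split a command into (verb, direct object, preposition, indirect object).

    Returns None for an empty command. Missing parts are None.
    """
    # Tokenize on whitespace and throw away articles.
    words = [w for w in command.lower().split() if w not in ARTICLES]
    if not words:
        return None

    verb, rest = words[0], words[1:]
    direct, prep, indirect = [], None, []
    for w in rest:
        if prep is None and w in PREPOSITIONS:
            prep = w              # the first preposition splits the two objects
        elif prep is None:
            direct.append(w)      # still before the preposition
        else:
            indirect.append(w)    # everything after the preposition

    return (verb, " ".join(direct) or None, prep, " ".join(indirect) or None)


print(parse("put the brass key in the box"))
```

Everything the sketch produces is just a tuple of strings; deciding what
"put", "brass key" and "box" mean in the game world is the part left over,
which is where the complexity mentioned above lives.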

--
Greg

Johann 'Myrkraverk' Oskarsson

Feb 16, 2022, 8:02:55 AM
On 2/16/2022 12:19 PM, Greg Ewing wrote:
> On 16/02/22 2:36 pm, Johann 'Myrkraverk' Oskarsson wrote:
>> I have been wondering, how are IF parsers generally constructed?  Is
>> there literature on this topic?  As in, is it more like programming
>> language parsing, for which there's abundant literature in compilers,
>> or is it more like natural language parsing, which I guess is slightly
>> different?  Or neither?
>
> In my experience they're much more like programming language
> parsers than natural language parsers. IF input languages are
> usually a very restricted subset of natural languages, so you
> don't tend to have the same problems of vagueness and ambiguity
> that you get when trying to parse natural languages.

Right.

>> I have been
>> wondering if parsers are really that /hard/ to do, or just more like
>> /annoying/ to make?
>
> They're not really hard, especially if you have some familiarity
> with the techniques used for parsing programming languages. In
> fact, IF input languages are usually a lot simpler than typical
> programming languages. Most of the complexity comes in figuring
> out what to *do* in response to what the player typed.

I see. I have to say I'm not /very familiar/ with parsing programming
languages; however, recently I have been reading several compiler books,
and I think I'm starting to get -- at least some of -- it. [*]

Then I was thinking, if all of this has been written about compilers,
hasn't /something/ been written about IF parsers? Maybe it hasn't
and it's all in the compiler literature? One thing is different, IME,
in IF, and that's the game itself can add keywords and nouns. Though
maybe that's not too different from adding types in languages like C++.
The difference being that the compiler grammar is /fixed/ while the IF
grammar is more flexible with verbs being added and nouns changing as
the game progresses.

[*] To name two, /Modern Compiler Implementation in ML/ by Appel, and
/Compiler Design in C/ by Holub. The latter is available on the
author's website as a PDF. I also have a usable familiarity with flex
and yacc.

Adam Thornton

Feb 16, 2022, 1:34:10 PM
In article <0C6PJ.1096642$X81f....@fx14.ams4>,
Johann 'Myrkraverk' Oskarsson <joh...@myrkraverk.invalid> wrote:
>Then I was thinking, if all of this has been written about compilers,
>hasn't /something/ been written about IF parsers? Maybe it hasn't
>and it's all in the compiler literature? One thing is different, IME,
>in IF, and that's the game itself can add keywords and nouns. Though
>maybe that's not too different from adding types in languages like C++.
>The difference being that the compiler grammar is /fixed/ while the IF
>grammar is more flexible with verbs being added and nouns changing as
>the game progresses.

Maybe? But plenty of languages let you extend the syntax. FORTH is
my favorite example, but anything LISP-like (and FORTH's stack is just
a LISP expression stood up on end) encourages you to do exactly that.

If you're not scared of wading through source...even though Inform 7
isn't yet open-source, you can wade through its implementation of the
parser and standard library, since that's written in Inform 6 and 7
and bundled with the application.

I'm working with the Mac app, so inside the Inform.app directory,
you'd want to go to Contents/Resources. Linux and Windows will have
analogous structures. Once inside there...Library/6.11 contains a
bunch of Inform 6, including parserm.h, which contains the input
tokenizer and parser. The I6 standard world model is in that
directory as well. Going back up to Contents/Resources, and then down
to Internal/Extensions/Graham\ Nelson will bring you to Standard\
Rules.i7x, which is both the definition of the I7 standard model and
the glue that binds it to I6.

It's an enlightening read, if you want to see how the sausage is made.
What you will find is what Greg Ewing said: the tokenizer is pretty
straightforward, and the parser...recognizes a lot less than you think
it might. The language extensibility is the cool bit, and Inform 7 is
a really neat experiment in making extending the language -- which is
to say, writing Interactive Fiction -- an awful lot like playing a
game written in the language.

Adam

Greg Ewing

Feb 17, 2022, 7:37:35 PM
On 17/02/22 2:02 am, Johann 'Myrkraverk' Oskarsson wrote:
> I have to say I'm not /very familiar/ with parsing programming
> languages; however, recently I have been reading several compiler books,
> and I think I'm starting to get -- at least some of -- it.

Don't worry about getting deeply into the theory of parsing;
most of it is overkill for this purpose.

> the game itself can add keywords and nouns.  Though
> maybe that's not too different from adding types in languages like C++.
> The difference being that the compiler grammar is /fixed/ while the IF
> grammar is more flexible with verbs being added and nouns changing as
> the game progresses.

I'm not sure that's a helpful way to think about it. Rather than
the grammar changing, it's more like different variables being in
scope in different places in a program. The set of verbs, nouns,
adjectives etc. understood by the game is fixed by the game author,
but different objects become accessible at different times.
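
The scoping analogy could be sketched roughly like this in Python. This is
an illustration only, under assumed names: the Thing class, the "inventory"
location and the resolve function are all hypothetical, not how any
particular IF system does it. The vocabulary of each object never changes;
only the set of objects searched does.

```python
from dataclasses import dataclass


@dataclass
class Thing:
    name: str          # printed name
    words: frozenset   # fixed vocabulary that can refer to this object
    location: str      # a room name, or "inventory" if carried


def in_scope(things, room):
    """Objects the player can refer to right now: here, or carried."""
    return [t for t in things if t.location in (room, "inventory")]


def resolve(noun_phrase, things, room):
    """Return in-scope objects whose vocabulary covers every word typed."""
    wanted = set(noun_phrase.split())
    return [t for t in in_scope(things, room) if wanted <= t.words]


things = [
    Thing("brass key", frozenset({"brass", "key"}), "hall"),
    Thing("iron key", frozenset({"iron", "key"}), "cellar"),
]

# The same word picks out a different object depending on where the player is.
print([t.name for t in resolve("key", things, "hall")])
print([t.name for t in resolve("key", things, "cellar")])
```

If resolve returns more than one object, the game would have to ask which
one was meant; if it returns none, the object is simply "not in scope",
much like an undeclared variable.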

What might be a bit different is that whereas in many programming
languages you have reserved words such as "if", "while", etc. that
can't be used for any other purpose, an IF parser needs to treat
tokens more flexibly. E.g. if you decide that the word "plant"
is always a noun so that you can have an object called "green plant",
you're going to have trouble with a command like "plant the plant".

For that reason you may find that tools like yacc, which are designed
for keyword-oriented languages, don't help very much.
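
One way to handle the "plant the plant" case, sketched in Python under
assumed names (the VOCAB table and the tag function are made up for this
example): let each dictionary word carry a set of possible roles, and let
its position in the command decide which role applies, instead of fixing
one role per word in the tokenizer.

```python
# Hypothetical dictionary: each word lists every role it may play.
VOCAB = {
    "plant": {"verb", "noun"},
    "take": {"verb"},
    "green": {"adjective"},
    "the": {"article"},
}

# Roles a word may take after the verb, in order of preference.
NON_VERB_ROLES = ("noun", "adjective", "article")


def tag(command):
    """Label each word with a role chosen by its position in the command."""
    tagged = []
    for position, word in enumerate(command.lower().split()):
        roles = VOCAB.get(word)
        if roles is None:
            raise ValueError(f"unknown word: {word!r}")
        if position == 0:
            # The first word must be usable as a verb.
            if "verb" not in roles:
                raise ValueError(f"{word!r} cannot be used as a verb")
            tagged.append((word, "verb"))
        else:
            # Later words take their first available non-verb role.
            role = next((r for r in NON_VERB_ROLES if r in roles), None)
            if role is None:
                raise ValueError(f"{word!r} cannot follow a verb")
            tagged.append((word, role))
    return tagged


print(tag("plant the green plant"))
```

Here the first "plant" is read as a verb and the last as a noun, which a
tokenizer with one fixed token type per word could not do.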

--
Greg