
code for parsing


Bill Kearney

Oct 17, 2000, 3:00:00 AM
Hi all,

I'm new to this so bear with me.

Is there a good place to read up on parsing command structures? I'd like to
integrate a semi-intelligent text parser into a program. I'm not talking
about a game, per se. Basically I'd like to be able to construct
English-like commands to execute a number of simple commands. Much like
"get bag, open it and eat the sandwich".

Suggestions, pointers?

Thanks,
Bill Kearney


Jonadab the Unsightly One

Oct 18, 2000, 3:00:00 AM
"Bill Kearney" <wkear...@hotmail.com> wrote:

I don't know of any books on this subject. (There may be some; I
don't know of them.) Best I could offer you would be pointers
to the source code for the parsers that some of the major interactive fiction (IF) development
systems use and the general tip that such systems are usually
very object-oriented, with an object for each recognised noun.
The object also understands adjectives that may apply to that
noun, and it has some flags that determine whether it is an
appropriate noun for certain kinds of verbs. (It may be flagged
as a container, for example, or as openable, or wearable...)
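The object-per-noun design described above can be sketched roughly as follows. This is a minimal Python illustration under assumed names: `Thing`, `VERB_REQUIRES`, `resolve`, and the sample objects are all hypothetical and are not taken from any IF system's actual source. Each object knows its noun, the adjectives that may describe it, and flags that say which verbs make sense for it.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Thing:
    """A noun the parser recognises (hypothetical class for illustration)."""
    noun: str
    adjectives: set[str] = field(default_factory=set)
    # Flags deciding which verbs are appropriate for this object.
    container: bool = False
    openable: bool = False
    wearable: bool = False
    edible: bool = False

    def matches(self, words: list[str]) -> bool:
        """True if the last word is our noun and every earlier word is one of our adjectives."""
        if not words or words[-1] != self.noun:
            return False
        return all(w in self.adjectives for w in words[:-1])


# Which flag each verb requires (an assumption for this sketch).
VERB_REQUIRES = {"open": "openable", "wear": "wearable", "eat": "edible"}


def can_apply(verb: str, thing: Thing) -> bool:
    """Verbs with no entry in VERB_REQUIRES apply to anything."""
    flag = VERB_REQUIRES.get(verb)
    return flag is None or getattr(thing, flag)


# A tiny hypothetical world.
WORLD = [
    Thing("bag", {"brown", "paper"}, container=True, openable=True),
    Thing("sandwich", {"ham"}, edible=True),
    Thing("hat", {"red"}, wearable=True),
]


def resolve(phrase: str) -> list[Thing]:
    """Return every object whose noun and adjectives fit the phrase (articles ignored)."""
    words = [w for w in phrase.lower().split() if w != "the"]
    return [t for t in WORLD if t.matches(words)]
```

With this layout, "the brown bag" resolves to the bag, "red bag" resolves to nothing, and `can_apply("eat", bag)` is false because the bag is not flagged as edible.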

--
"Popularity and quality are orthogonal." -- jonadab

TenthStone

Oct 18, 2000, 3:00:00 AM
On Tue, 17 Oct 2000 22:19:41 GMT, "Bill Kearney"
<wkear...@hotmail.com> wrote:

>Hi all,
>
>I'm new to this so bear with me.
>
>Is there a good place to read up on parsing command structures? I'd like to
>integrate a semi-intelligent text parser into a program. I'm not talking
>about a game, per se. Basically I'd like to be able to construct
>English-like commands to execute a number of simple commands. Much like
>"get bag, open it and eat the sandwich".

Well. I don't know of any *books* per se, but that's just my own
ignorance. If you've got a university nearby with a strong
computer science department, go look for things on natural
language parsing.

Alright. A quick web search reveals:
1. the newsgroup comp.ai.nat-lang
2. its bibliographic FAQ at
http://www.cs.cmu.edu/Groups/AI/html/faqs/ai/nlp/nlp_faq/faq.html
3. Colibri at
http://colibri.let.ruu.nl/

Also. TADS 3 implements tools for complete natural-language parsing
in a fairly easy-to-use package; it should be fully functional for
your purposes. The TADS 3 source code is available, in case you need
to use your own programming environment.

----------------
The Imperturbable TenthStone
tenth...@hotmail.com pie...@humour.com rjmc...@cmu.edu

Larry Smith

Oct 18, 2000, 3:00:00 AM

"Bill Kearney" <wkear...@hotmail.com> wrote in message
news:1g4H5.36527$Qf5.3...@newsread1.prod.itd.earthlink.net...

> Hi all,
>
> I'm new to this so bear with me.
>
> Is there a good place to read up on parsing command structures? I'd like to
> integrate a semi-intelligent text parser into a program. I'm not talking
> about a game, per se. Basically I'd like to be able to construct
> English-like commands to execute a number of simple commands. Much like
> "get bag, open it and eat the sandwich".

This is such a large topic that it's hard to know what level to point you
at. Obvious questions include how sophisticated a language you want to
implement, how flexible you need it to be (e.g. how easy it should be to add new
keywords, modify your grammar, etc.), what language you're going to be
programming this in, how experienced a programmer you are, etc, etc, etc.

One of your options is to use compiler-compiler techniques (although this
may well be overkill for what you want). Still, for all I know, it's exactly
what you need, so we'll give it a stab.

It's possible to define the grammar for a "language" (e.g. C or Java) as a
sequence of text definitions, feed it to a sophisticated program, and have
it produce a source program (in C/C++/Java/etc) that will then parse the
language you specified. But this only handles the syntax of the language.
You then need to modify this to add meaning (semantics) to it. For example,
there's nothing cast in concrete that says that "a*b" is multiplication. (I
know of at least one language where this means exponentiation.) Similarly,
"take frammistat" may be syntactically valid in your language, but what
actions should be taken when you recognize that sentence?
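The split between syntax and semantics can be made concrete: once a parser has accepted "take frammistat" as a (verb, object) pair, a separate table decides what that pair actually does. Below is a minimal Python sketch; the game state and the `take`, `drop`, and `execute` functions are invented purely for illustration.

```python
from typing import Callable

# Hypothetical game state: what the player carries and what lies in the room.
inventory: set[str] = set()
room: set[str] = {"frammistat", "bag"}


def take(obj: str) -> str:
    if obj not in room:
        return f"You see no {obj} here."
    room.remove(obj)
    inventory.add(obj)
    return f"Taken: {obj}."


def drop(obj: str) -> str:
    if obj not in inventory:
        return f"You aren't carrying the {obj}."
    inventory.remove(obj)
    room.add(obj)
    return f"Dropped: {obj}."


# The semantic layer: a syntactically valid (verb, object) pair
# means whatever this table says it means.
ACTIONS: dict[str, Callable[[str], str]] = {"take": take, "drop": drop}


def execute(verb: str, obj: str) -> str:
    action = ACTIONS.get(verb)
    if action is None:
        return f"I don't know how to {verb} things."
    return action(obj)
```

Keeping the table separate from the grammar means new verbs can be given meaning without touching the parser at all.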

Here's an incomplete, super-abbreviated example of what a grammar might look
like (with a few comments in [brackets]):

<sentence> ::= <declarative sentence> | <command>
[the "|" is read as "or"]
<command> ::= <command word> !
[a command is a command word followed by an exclamation mark]
<command word> ::= Stop | Proceed | [etc]
<declarative sentence> ::= <noun phrase> <verb> <object>
<noun phrase> ::= <noun> | <adjective> <noun>
[and so on...]
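To show what the grammar fragment above means operationally, here is a hand-written recursive-descent parser for it in Python, with one method per nonterminal. This is not what YACC generates (YACC builds table-driven LALR parsers); it is simply the most direct way to execute the same rules by hand. The word lists are assumptions, and since the fragment never defines `<object>`, the sketch assumes it is another noun phrase.

```python
import re

# Toy vocabulary: an assumption, since the grammar leaves these lists open.
COMMAND_WORDS = {"stop", "proceed"}
NOUNS = {"dog", "cat", "ball"}
ADJECTIVES = {"big", "small", "red"}
VERBS = {"sees", "chases", "wants"}


class ParseError(Exception):
    pass


class Parser:
    """Recursive-descent parser: one method per nonterminal in the grammar."""

    def __init__(self, text: str):
        # Words and "!" become separate tokens.
        self.tokens = re.findall(r"[a-z]+|!", text.lower())
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, allowed, what):
        tok = self.peek()
        if tok not in allowed:
            raise ParseError(f"expected {what}, got {tok!r}")
        self.pos += 1
        return tok

    def sentence(self):
        # <sentence> ::= <declarative sentence> | <command>
        # One token of lookahead chooses the alternative.
        if self.peek() in COMMAND_WORDS:
            result = self.command()
        else:
            result = self.declarative()
        if self.peek() is not None:
            raise ParseError(f"unexpected trailing {self.peek()!r}")
        return result

    def command(self):
        # <command> ::= <command word> !
        word = self.expect(COMMAND_WORDS, "command word")
        self.expect({"!"}, "'!'")
        return ("command", word)

    def declarative(self):
        # <declarative sentence> ::= <noun phrase> <verb> <object>
        subject = self.noun_phrase()
        verb = self.expect(VERBS, "verb")
        obj = self.noun_phrase()  # assumption: <object> is another noun phrase
        return ("declarative", subject, verb, obj)

    def noun_phrase(self):
        # <noun phrase> ::= <noun> | <adjective> <noun>
        adjective = None
        if self.peek() in ADJECTIVES:
            adjective = self.expect(ADJECTIVES, "adjective")
        noun = self.expect(NOUNS, "noun")
        return (adjective, noun)
```

For example, `Parser("big dog chases red ball").sentence()` yields a declarative tuple, while `Parser("dog chases")` raises `ParseError` because the object noun phrase is missing.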


This process can work. Many years ago, I got a book on compiler theory, and
wrote my own program to read in a language description and use that as a
basis for simple natural language processing. It was just a
learning/hobbyist thing. But when I mentioned it to one of the guys at the
office, he jumped on it, saying that such a feature was just what they
needed for one of their projects. I never did find out how well it worked in
practice, since I left the company soon after, but it looked like it was
going to work out for them. (I'd offer you the code, but after 25 years or
so, it's been long lost.)

The traditional program to read in the grammar and produce the customized
parsing program is called YACC ("Yet Another Compiler Compiler"). It's
usually used in conjunction with a program called LEX (LEXical analyzer).
Free clones of these called BISON (from the GNU project; a pun on YACC/yak) and FLEX
are also available (and are usually better than the originals). There are
also packages such as ANTLR (www.antlr.org). See their page of links at
http://www.antlr.org/links.html. Perhaps some of the projects mentioned
there (e.g. Aioli) might be relevant to you. A web search of the phrase
"compiler compiler" will give you a lot of hits.
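The LEX half of that pairing splits raw text into typed tokens before the grammar ever sees it. The sketch below imitates a small LEX specification using Python's `re` module rather than LEX itself; the token names and word lists are assumptions for illustration. One difference worth noting: LEX prefers the longest match, whereas a Python regex alternation takes the first rule that matches, so the keyword rules here use `\b` word boundaries to avoid matching only the start of a longer word.

```python
import re
from typing import Iterator, NamedTuple


class Token(NamedTuple):
    kind: str
    text: str


# Each (kind, pattern) pair plays the role of one rule in a LEX specification.
# Earlier rules win when two could match at the same position.
TOKEN_RULES = [
    ("VERB", r"\b(?:get|take|open|eat|drop)\b"),
    ("PRONOUN", r"\b(?:it|them)\b"),
    ("CONJ", r"\band\b|,"),
    ("ARTICLE", r"\b(?:a|an|the)\b"),
    ("WORD", r"[a-z]+"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{kind}>{pat})" for kind, pat in TOKEN_RULES))


def tokenize(text: str) -> Iterator[Token]:
    text = text.lower()
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        if m.lastgroup != "SKIP":  # whitespace is discarded, like a LEX rule with no action
            yield Token(m.lastgroup, m.group())
        pos = m.end()
```

Run on "get bag, open it and eat the sandwich", this yields the kinds VERB, WORD, CONJ, VERB, PRONOUN, CONJ, VERB, ARTICLE, WORD, which is the kind of stream a YACC-style grammar would consume.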

There are any number of books on compiler theory, but you don't really need
to understand the algorithms used by the above programs (although a few
concepts wouldn't hurt). From a user's POV, you might try "Lex and Yacc,
second edition" from O'Reilly Press (http://www.oreilly.com/catalog/lex/).

Hope this helps.

Kaia Vintr

Oct 18, 2000, 3:00:00 AM

Larry Smith wrote in message ...
>
(about using YACC and FLEX for natural language processing)
>

These types of tools will only work on a VERY restricted grammar. Even for
the most straightforward programming language syntaxes you usually have to
use weird tricks to make YACC (and its cousins) work. The good thing about
them (and the reason people use them) is that the parsers they generate are
very, very fast, which is important for compilers.

So I don't think trying to use them to parse English is a good idea. You'll
make rapid progress at first, but then just as quickly run into situations
that YACC can't handle. In my experience it's a waste of time to even
try -- it'll just be incredibly frustrating.

Instead I would suggest using whatever programming language you're
comfortable in and implementing the parser in a more straightforward way,
resolving any problems by trial and error. Eventually you'll end up with
something that does the job. That's how the Inform and TADS parsers were
presumably created and they work pretty well. I'd definitely look at both
of these parsers very carefully before starting.
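As a starting point for that trial-and-error approach, here is a minimal hand-written Python sketch that handles the original example, "get bag, open it and eat the sandwich". It splits the input on commas and "and", drops articles, and resolves "it" to the most recently mentioned object. The vocabulary and the `parse_commands` name are hypothetical, and this is far simpler than what the Inform or TADS parsers actually do (no adjective matching, disambiguation, or multi-word verbs).

```python
import re

# Hypothetical vocabulary for this sketch.
VERBS = {"get", "take", "open", "eat", "drop"}
ARTICLES = {"a", "an", "the"}
PRONOUNS = {"it"}


def parse_commands(text: str) -> list[tuple[str, str]]:
    """Turn 'get bag, open it and eat the sandwich' into (verb, object) pairs."""
    commands = []
    last_object = None  # what "it" currently refers to
    # Commas and the whole word "and" separate individual commands;
    # \b keeps the "and" inside "sandwich" from splitting it.
    for clause in re.split(r",|\band\b", text.lower()):
        words = [w for w in clause.split() if w not in ARTICLES]
        if not words:
            continue
        verb, *rest = words
        if verb not in VERBS:
            raise ValueError(f"I don't understand the verb {verb!r}.")
        obj = " ".join(rest)
        if obj in PRONOUNS:
            if last_object is None:
                raise ValueError(f"I'm not sure what {obj!r} refers to.")
            obj = last_object
        elif not obj:
            raise ValueError(f"What do you want to {verb}?")
        commands.append((verb, obj))
        last_object = obj
    return commands
```

The example input produces `[("get", "bag"), ("open", "bag"), ("eat", "sandwich")]`. Each failure case raises a message the program could show the user, which is roughly where the trial-and-error refinement would begin.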

Generalized natural language parsing (as opposed to parsing simple commands)
probably does require some very sophisticated tools and algorithms, but no
one's done it entirely successfully yet, so who knows.

- Kaia

Larry Smith

Oct 18, 2000, 3:00:00 AM

"Kaia Vintr" <ka...@xoe.com> wrote in message
news:NqnH5.349731$Gh.10...@news20.bellglobal.com...

>
> Larry Smith wrote in message ...
> >
> (about using YACC and FLEX for natural language processing)
> >
>
> These types of tools will only work on a VERY restricted grammar.

[rest of article snipped, because I agree with everything said.]

I noticed when I re-read my original post that I left out a paragraph I'd
meant to put in. It would basically have said that the language you implement
has to be fairly simple.

Still, the original poster wanted some ideas, and this qualifies. But I'll
get in line to warn him that if he goes in this direction, he'll have a much
steeper learning curve than he expected. Might be worth it in the long run,
but the short run might get rough. There's at least one commercial package
(referred to indirectly by the ANTLR links, and often advertised in Dr. Dobb's)
called Visual Parse++. Since the project seems to be for work, it might be
worth forking out the $495 (list) for it. It's sure to pay for itself many
times over in work-hours saved during the learning curve.
