Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.

Dismiss

I have source for a nice parser

46 views

Skip to first unread message

Phil Goetz

unread,

May 6, 1991, 3:00:09 PM5/6/91

I kermited text for most of the source for my adventure's parser to my system.
If anyone REALLY wants to see source for a parser, I'll mail it to you.
It is about 70,000 bytes of text, and is written in 6502 assembly for the
Apple //. It's old now (I haven't worked on it in years), & I hate to keep
talking about something I wrote years ago, but
it implements most of the wants expressed here, such as recursive
search for noun phrases, context-sensitivity, adjectives, etc. If enough
people respond, I may put it up for ftp instead.

It is case-grammar based, keyed on the verb: The objects, instruments, etc.
must meet the restrictions of the verb. Execution of the command is handled
by verb-specific code (which I haven't uploaded, but could).

Here's a summary of the parser, which I think I've published here before
(though without offering source):

ENGLISH COMMAND PARSING

I developed a command parser which handles objects, sources,
destinations, adjectives, prepositions, adverbs, and adverbial phrases.
[Note: I removed the adverbial phrases and adverbs because they seemed
to have no purpose in an adventure.]
The parser runs on an Apple ][+ in 6502 machine language and
currently occupies about 4K of memory. It is an augmented transition
network. It is supported by an input routine, a lexical analysis routine
with 700-word dictionary, 50 verbs, 150 objects, 15 adjectives, and about
300 edges indicating properties of objects or relationships between objects.
It has 4 phases: lexical analysis, pre-ATN grammatical transformations, the
ATN, and command handlers to manipulate the game world.

Lexical Analysis

Every word in the dictionary has at least 1 type, and often 2 or 3.
Some types have a number associated with them: nouns, for instance, have
an object number. Analysis produces one word number and 1 to 3 word types
for each word, stored in parallel arrays. There are 3 special types: synonyms,
transformationals, and nulls. Synonyms take the types AND word number
of another word in the dictionary. This is so a word and all its synonyms
can be identified by the same word number. Transformationals are marked for
pre-ATN transformation (see below). Nulls are deleted. Articles (a, the),
demonstratives (i.e. these, those) and intensifiers (i.e. very, extremely)
are currently classified as nulls.

Pre-ATN Grammatical Transformation

The transformer scans the parallel arrays of word number and word
type for patterns which should be transformed to simpler forms using the
basic transformational operations of movement, copying, insertion, and
deletion. These include specific quirks of the language as well as
general grammatical transformations. For instance, you would want to
transform VT NP1 NP2 to VT NP2 "TO" NP1 so the ATN can pick up NP2 as
the object.
Note that the transformer will make a transformation if each
word in the input matches sequential words in the pattern using ANY
of the 3 possible types. So it is possible that a transformation could
be made with a mistaken type assumption. In practice, this has not
been a problem.
Each transformation has a priority level assigned to it. S o
high-level priority transforms are tried first. This prevents an uncommon
transformation from taking effect where a common one should have been used.

ATN

A transition network is a set of transitions from one state, or
node, to another based on the section of input currently being examineid.
In a nondeterministic net such as this one, there is more than one
possible choice at some nodes. If the program takes the wrong transition
and winds up at a dead end, it must be able to trace back to its last
decision branch and try another path. If the input is unparsable, the
proram will trace all possible nodes of the network and eventually
back out of the original node. The only other way to leave a transition
network is to succeed in parsing the input. Success is achieved by
reaching the end of the input while in an "accepting" node.
A recursive transition network is one that can call entire
networks, including itself, as a node. An augmented transition
network (ATN) is one which can perform special actions during each
transition and have special (i.e. semantic) requirements for a transition.
For example, the transition from node 7 to node 8 in my network is
invoked if the current word is a preposition of instrument such as "with"
or "using". The transition checks the verb's case frame to see if it
allows an instrument (verbs which do include "hit" and "wash"). If so,
it stores the current word in the Prep of Instrument slot, goes on to the
next word, and enters node 8. If not, it returns to node 7. Some transitions
leave flags to be checked/used later. The word "his" in "Get his book"
(if Ed is the only man in the room) should be used when finding objects
to remove all books from the object list that are not either owned by
or presently "in" (carried by) Ed. It doesn't now.
This network is not truly recursive in that it cannot normally
call independent networks as a node. It fakes recursive calling of noun
phrases (NPs) through tricky manipulations when entering and leaving NPs
and by using a special stack to keep track of NP calls. The difficulty
of implementing this in machine language is just one of many reasons not
to write your parser in machine language.
Backtracking is handled by associating a node and a transition index
with each word. When a node is entered, its number is stored in an array
parallel to the word arrays. When it is left by a successful transition,
the index pointing to the next branch that would have been examined from that
node is saved in another parallel array. If the program fails at the next
node, it decrements the word counter, reenters the node it came from, loads
the index, and resumes where it left off.
One great difficulty with this is that ALL data created/changed by
the unsuccessful link of transitions must be undone. This might be one
reason to write your parser in ProLog. It would be better to have registers
which are associated with a particular parse path through the network
which are then copied into global registers, than to use the same global
registers throughout (which I did).
Jumps are special cases because they involve a transition, yet are
not associated with a word. However, since jumps are always the last
transition tested within a node, the node jumped from can be removed
from the backtracking list and thus the parallel arrays are not thrown
out of sync.
The purpose of this ATN is to fill in certain slots for the command
handler. These slots contain the verb, object list, preposition of instrument,
instrument, prep of destination (ie into, onto), destination, and place (ie
"north", "bed" when used in ungrammatical but popular sentence "go bed").
Say you enter
PUT THE BAG IN THE BACKPACK IN THE BIG BLUE BOX
The first word is always the verb. Lexical analysis deletes the THEs:
PUT BAG IN BACKPACK IN BIG BLUE BOX
Now, a listing of the ATN looks like this (X! means that node X is an
accepting node; that is, if the sentence ends when it reaches that node,
it is accepted):

Current node Input Next node
0 Verb of place 1
Transitive verb 2
Intransitive verb 7
1 Preposition of motion 6
Place 7
2 Noun Phrase (object) 3
3! Exclusionary 4
Jump 5
4 NP (excluded from obj) 5
5! Preposition of motion 6
Jump 7
6 NP (destination) 7
7! Prep of instrument 8

The noun phrase network looks like this:
9 Noun group 10
10! Prep of no motion 11
11 Noun group 12
12! Conjunction ("and") 9

The noun group network looks (I think) like this:
13 Possessive (ie "his") 14
Jump 14
14 Number ("7", "all") 15
Jump 15
15 Adjective 15
Classifier 16
Noun 17
16 Classifier 16
Noun 17
17! Conjunction 14

(A classifier is very similar to an adjective, i.e. "sleeping" is a classifier
for "sleeping bag". The only difference to the program is that adjectives
are handled using edges, and classifiers modifying an object never change.)

PUT is a transitive verb, and so the ATN enters node 2 and looks for a noun
phrase. The noun phrase network identifies BAG IN BACKPACK IN BOX as a noun
phrase in 3 recursive passes. On the first pass, it identifies BAG as a
noun and tries to put it in the object list. It sees that BAG is a superset
containing BAG1 (the plastic bag) and BAG2 (the sleeping bag), so it puts
both in the object list. IN tells the NP to call itself using the next NP
as a source for the last NP. When it finds BACKPACK, it removes any bags
that are not IN the backpack from the object list. Again, IN instructs
the program to reenter the NP network and look for a source for BAG. (This
should really look for a source for BACKPACK, but if a BOX contains
the proper BAG and
the BACKPACK contains the proper BAG, the BACKPACK is probably in the BOX.
I haven't tried GET BACK IN BOX IN BACKPACK yet; my computer is dismantled.)
The NP network places BIG and BLUE in the adjective list. (The adjective
list is not used after parsing.) When it reaches BOX, it puts all boxes
in the known world in the object list, then removes all boxes that are not
BIG and BLUE. Then it removes all bags from the object list that are not in
the big blue box, and finds it has no objects left. It concludes that IN BOX
must specify the destination of the bag in the object list, and so puts the
bags back in the object list and puts BOX1 (big and blue) in the
destination slot and IN in the preposition of destination slot (used by
the command handler).

Errors, Ellipsis:

If the network is unable to parse a sentence, it tries to interpret
the sentence as a sentence fragment by inserting it into the last sentence
parsed. So if you say "Take the book" and them "examine", it replaces
"take" in "Take the book" with "examine" and tries (successfully) to parse
that.

Pronouns:

The only pronoun with any meaning is "it". "It" refers to the
last single object referenced. I might also have a "them" for the last
group of objects referred to, but I forget.

Post-ATN Processing

I thought of having a post-ATN processor create conceptual dependency
representations from the information passed on by the ATN, using something
like Wilks' semantic primitives or Schanks' primitive ACTS and inferences.
Instead of writing a command handler for each verb, more general handlers
could deal with similar verbs (i.e. give, take, sell, buy; eat, drink).
Ideally, this construction should be used during ATN analysis for semantic
interpretation and disambiguation of the sentence, but commands in adventures
are almost always unambiguous and simple enough for correct parsing by
the ATN.
Using semantic primitives is something I think IF requires, but it
is not practical until we can write IF which does not depend on ad-hoc rules
that are applied to individual verb-object-etc. combinations. It would be,
I think, difficult to apply such ad-hoc rules to a conceptual dependency
representation of a sentence - not for the computer, but for the human who
would have to translate all his ad-hoc situations into conceptual dependency
representations while programming.

Command Handlers

The command handlers might be designed for use with a processor as
mentioned above. Currently, however, they are verb-specific command handlers
as in most adventure games. They manipulate the game world by adding and/or
deleting edges in the database. They can also trigger actions/messages/etc
under certain conditions.
If a list of more than one object is provided with an action, the
action is repeated once for each object.

Phil Goetz
go...@cs.buffalo.edu or V373...@UBVMS.bitnet
This is the best of all possible worlds. - Leibniz
Unfortunately, yes. - Voltaire

0 new messages