Infocom text encoding

A. David Havill

Feb 20, 1991, 1:44:43 PM

Does anyone know how to extract the vocabulary and/or text from an Infocom
data file? Purists may call me a cheater, but I'd be interested in reading
all of the "prose" that I'd otherwise see only by typing in a command I'd
use once in a blue moon.

I have tried reverse engineering it with a debugger, but its interpreter is
either far from trivial (of course) or they have deliberately made it
difficult to pull off this sort of stunt.

As far as I can figure out, the last step of the decoding process maps codes
0-26 to the lowercase alphabet, the next 11 or so codes are symbols, and the
final 26 are the uppercase alphabet... thus taking seven bits to store a
letter. I forget the exact ordering but could easily look it up; Infocom
makes no attempt to hide the key table.

Obviously, if it were this simple I'd have no trouble decoding it, so either
it's not this straightforward or I'm in error somewhere.

Could someone post some leads please? Thanks in advance... [net cliche] ;)
--
"I'd dust desert dopeheads | | .|| Adrian "David" Havill
for gas in my moped!" |--| /\\ /||| dha...@rucs2.sunlab.cs.runet.edu
--Opus' view on Desert Storm | |/--\\/ ||| my opinions are not RU's

Phil Goetz

Feb 20, 1991, 6:45:50 PM

In article <1991Feb20....@rucs2.sunlab.cs.runet.edu> dha...@rucs2.sunlab.cs.runet.edu (A. David Havill) writes:
>Does anyone know how to extract the vocabulary and/or text from an Infocom
>data file? Purists may call me a cheater, but I'd be interested in reading
>all of the "prose" that I'd only see by typing in a command that I'd see
>once in a blue moon.

A couple of articles in _Computist_ explained
the Infocom text compression. I don't remember the
dates; maybe 2 years ago. Maybe 4? Anyway,
Infocom also allows a 2-byte value (I think) to
specify a word from a dictionary. The dictionary isn't
that large, so maybe it's a 1-byte or 10-bit value.

The program published in COMPUTIST will, of course, only
let you see the text off a disk for the Apple //
computer.

I'd like to put _Inmate_, my text adventure for the Apple //,
up for anonymous ftp. Where and how should I put it?

Phil Goetz
go...@cs.buffalo.EDU

Douglas Phillip Ghormley

Feb 21, 1991, 7:57:10 PM

Actually, I cracked this code as a 9th grader. Of course, it took me
all summer, but trial and error can really get you places if you're
persistent.

(BTW, if any of this is a little off, I apologize; it's been over 4
years since I've messed with this stuff.) They use 5 bits/character,
thus allowing three characters (if they're lowercase alphabetical) for
every two bytes, with one bit left over. That is the high-order bit and,
if set, indicates that this 16-bit word is the last in the string. A
value of 0 maps to a space, since it's quite frequent, and 6-31 map to
letters. Code 4 is used as a capital-letter prefix and 5 is used for
punctuation (hard-coded, near as I could ever tell), which leaves 3 other
values for escape sequences, which they use for macros.

If you look at a datafile (I'm speaking of the old games like Suspended
and Enchanter; Enchanter was the one we cracked the code on), you'll
notice about 16 bytes at the top of the file used for control and stuff,
somewhere around 64 bytes of 0's, and then garbage for a few hundred
bytes or so. This garbage defines their macros. The macros are listed
end to end, each taking up as many bytes as it needs. Following the
macro definition region (at a point in the file specified by one of
those control bytes at the top) is the macro directory. Each field is
two bytes long and points to the beginning of one of the macro strings.
The first field of the macro directory corresponds to the code (1,0),
the second to (1,1), etc., up to (3,31), which gives 96 macros. There is
actually a conceptually simpler version of this format which takes the
same number of bits but yields 128 macros.
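A minimal Python sketch of the decoding scheme described above: three 5-bit codes per big-endian 16-bit word, the high bit marking the last word, 0 as a space, 4 and 5 as one-character shifts to capitals and punctuation, and 1-3 plus a following code selecting one of 96 macros through a directory of two-byte fields. The exact alphabet rows, the fact that each directory field holds a word address (so the macro string lives at twice that value), and the 10-bit escape are taken from the later Z-Machine Standards Document for version-3 games, not from this post, so treat them as assumptions. The names `zchars` and `decode` are hypothetical.

```python
from typing import List, Optional, Tuple

# Alphabet rows for codes 6-31 in version-3 games (assumed, taken from the
# Z-Machine Standards Document, not from the post). In row 2, code 6 is an
# escape (placeholder "^" here) and code 7 is a newline.
A0 = "abcdefghijklmnopqrstuvwxyz"
A1 = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
A2 = "^\n0123456789.,!?_#'\"/\\-:()"


def zchars(data: bytes, addr: int) -> Tuple[List[int], int]:
    """Unpack 5-bit codes from big-endian 16-bit words starting at addr.

    Returns the codes and the byte address just past the string."""
    codes: List[int] = []
    while True:
        word = (data[addr] << 8) | data[addr + 1]
        addr += 2
        codes += [(word >> 10) & 0x1F, (word >> 5) & 0x1F, word & 0x1F]
        if word & 0x8000:  # high bit set: this is the last word of the string
            return codes, addr


def decode(data: bytes, addr: int, abbrev_table: Optional[int] = None) -> str:
    """Decode the string at addr; expand macros if abbrev_table is given."""
    codes, _ = zchars(data, addr)
    out: List[str] = []
    row = 0  # 0 = lowercase, 1 = capitals, 2 = punctuation
    i = 0
    while i < len(codes):
        z = codes[i]
        if z == 4 or z == 5:
            row = z - 3  # one-character shift: applies to the next code only
            i += 1
            continue
        if z == 0:
            out.append(" ")
        elif z <= 3:
            # Macro: (z, next code) picks one of 96 directory fields. Each
            # field is a word address, so the string starts at twice its value.
            # Macros are not expanded inside other macros.
            if i + 1 < len(codes) and abbrev_table is not None:
                entry = abbrev_table + 2 * (32 * (z - 1) + codes[i + 1])
                word_addr = (data[entry] << 8) | data[entry + 1]
                out.append(decode(data, 2 * word_addr))
            i += 1  # consume the selector code
        elif row == 2 and z == 6:
            # Escape: the next two codes form a 10-bit value (treated as ASCII).
            if i + 2 < len(codes):
                out.append(chr((codes[i + 1] << 5) | codes[i + 2]))
            i += 2
        else:
            out.append((A0, A1, A2)[row][z - 6])
        row = 0
        i += 1
    return "".join(out)
```

Assuming a version-3 story file read with `story = open(path, "rb").read()`, the Standard places the byte address of the macro directory in the header word at offset 0x18, so a call might look like `decode(story, addr, abbrev_table=(story[0x18] << 8) | story[0x19])`. Locating where each string starts is a separate problem, and version-1 and -2 files assign codes 1-5 somewhat differently.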

Anyway, not to ramble, but we did many other interesting things to
Enchanter, such as adding links between rooms, changing room attributes
(name of room, whether or not a light was needed). We even managed to
start the game inside the loaf of bread....

We actually came up with a screen-oriented program for the IBM to decode
and display portions of the datafiles. Unfortunately the two of us
working on it graduated and went our separate ways and InfoDep (our
program) went the way of the Dodo.

Douglas Ghormley

Dower

Feb 21, 1991, 11:24:25 PM

While we're on the topic of Infocom text: does anyone out there have an
idea what I might need to do to write a language parser, or what kind of
search routine to use? Algorithms, pseudocode, or even just responses are
welcome...

Reply to:
do...@ritcsh.rit.edu...
thanks...

Kevin B. Keller

Feb 22, 1991, 7:10:12 PM

In article <1991Feb22.0...@ritcsh.csh.rit.edu> do...@ritcsh.csh.rit.edu (Dower) writes:
>
> While on the topic of the infocom text, I was wondering, does
>anyone out there have an idea what I might need to do to make a language
>parser, any idea for what kind of search routine? Algorithms or PseudoCode
>or even responses welcome...
>

I am still a novice in this area, but it is a major interest of mine and
I am in the process of learning it. A couple of years ago I wrote a simple
(read: kludgy) Pascal program to recognize a "structured" sentence using
sets. The structure was basically:
verb source_noun preposition destination_noun. The nouns could be preceded
by 0 or more adjectives, and "fluff" words ('the', 'a', and so on) were
just thrown away. This worked fine but crashed hard when the structure was
incorrect; it was also very slow.

There are two things to investigate: natural language recognition, and
language theory (compilers and interpreters). Natural language processing
is a division of AI that is (from what I understand) not having much
success. The problem is so complex that work has to be broken into
specialty fields, and the parsers become very specialized. As far as
language theory goes, there are two tools that are standard in Unix
environments (I believe): LEX and YACC. LEX allows you to define 'chunks'
of text, and it generates the code to recognize those chunks and return
values unique to each type; these are called tokens. YACC helps you
generate code that groups these tokens according to rules and then
performs an action that you can program (in writing a compiler you would
build a parse tree, but the action can be any C code). This method still
requires that you know the structure ahead of time (in writing a compiler
you would use the BNF, or other form, formal language specification).
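A minimal Python version of the set-based "structured sentence" recognizer described in the first paragraph, assuming a made-up vocabulary (the word sets and the names `parse`, `noun_phrase`, `Command`, and `NounPhrase` are hypothetical). It follows the verb, source noun, preposition, destination noun structure with optional adjectives and discards fluff words; unlike the original it returns `None` on a malformed sentence instead of crashing, and it also accepts a bare verb and noun, which is a small extension.

```python
from typing import List, NamedTuple, Optional, Tuple

# Hypothetical word sets; a real game would have far larger ones.
VERBS = {"put", "take", "drop", "give"}
ADJECTIVES = {"brass", "small", "wooden", "rusty"}
NOUNS = {"lamp", "key", "box", "table", "troll"}
PREPOSITIONS = {"in", "on", "to", "under", "with"}
FLUFF = {"the", "a", "an"}  # thrown away before matching


class NounPhrase(NamedTuple):
    adjectives: Tuple[str, ...]
    noun: str


class Command(NamedTuple):
    verb: str
    source: NounPhrase
    preposition: Optional[str]
    destination: Optional[NounPhrase]


def noun_phrase(words: List[str], i: int) -> Tuple[Optional[NounPhrase], int]:
    """Match zero or more adjectives followed by a noun, starting at words[i]."""
    adjectives: List[str] = []
    while i < len(words) and words[i] in ADJECTIVES:
        adjectives.append(words[i])
        i += 1
    if i < len(words) and words[i] in NOUNS:
        return NounPhrase(tuple(adjectives), words[i]), i + 1
    return None, i


def parse(sentence: str) -> Optional[Command]:
    """Recognize 'verb source [preposition destination]'; None if malformed."""
    words = [w for w in sentence.lower().split() if w not in FLUFF]
    if not words or words[0] not in VERBS:
        return None
    source, i = noun_phrase(words, 1)
    if source is None:
        return None
    if i == len(words):  # bare "verb noun" (extension to the original)
        return Command(words[0], source, None, None)
    if words[i] not in PREPOSITIONS:
        return None
    destination, j = noun_phrase(words, i + 1)
    if destination is None or j != len(words):
        return None
    return Command(words[0], source, words[i], destination)
```

Set membership makes each word lookup cheap; the slowness and fragility come from the fixed pattern, which is what the grammar-driven tools discussed above address.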

My compiler textbook for the quarter is: Compilers: Principles, Techniques,
and Tools. Aho, Sethi, and Ullman. This is a standard book and should be
easy to find.

My AI text is Artificial Intelligence and the Design of Expert Systems,
Luger and Stubblefield. There is a chapter devoted to natural language
understanding (ch. 10), but it is not very in-depth.

A nice beginning book is: Writing Interactive Compilers and Interpreters,
P.J. Brown. This is more of an introductory book and is lighter on the
theory; Compilers gets into finite-state theory pretty deeply.

For LEX and YACC you can do 'man lex' and 'man yacc' on Unix. There are
two PC versions that I know of, from MKS and Abraxas Software. See
Computer Language, March 1989, for a YACC review.

Well, hope this helps; if you want more, just email me.

kevin
kke...@polyslo.calpoly.edu

>relpy to:
>do...@ritcsh.rit.edu...
>thanks...

A. David Havill

Feb 23, 1991, 4:08:55 AM

In article do...@ritcsh.csh.rit.edu (Dower) writes:
>
>... [A]nyone out there have an idea what I might need to do to make a language
>parser, any idea for what kind of search routine? Algorithms or PseudoCode...

If you're considering writing a beast on the order of an Infocom parser, the
best bet is to get out an old computer science compiler textbook and write a
scanner and a recursive descent parser. The technique isn't trivial, but it
isn't that hard either-- it's easier to write a parser than a compiler %-)
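To make the scanner-plus-recursive-descent advice concrete, here is a small Python sketch. The grammar (written as BNF in the comments), the token kinds, and the toy `LEXICON` are all hypothetical; the point is that each grammar rule becomes one method that consumes tokens and calls the methods for its sub-rules, and that bad input raises an error the game can report rather than crashing.

```python
from typing import List, Optional, Tuple

# Hypothetical toy vocabulary mapping each word to its token kind.
LEXICON = {
    "take": "VERB", "put": "VERB", "drop": "VERB",
    "the": "ARTICLE", "a": "ARTICLE",
    "brass": "ADJ", "small": "ADJ", "rusty": "ADJ",
    "lamp": "NOUN", "key": "NOUN", "box": "NOUN", "sword": "NOUN",
    "in": "PREP", "on": "PREP",
    "and": "AND",
}

Phrase = Tuple[Tuple[str, ...], str]  # (adjectives, noun)


class ParseError(Exception):
    """Raised for input the grammar does not accept."""


def scan(text: str) -> List[Tuple[str, str]]:
    """Scanner: turn the input into (kind, word) tokens."""
    tokens: List[Tuple[str, str]] = []
    for word in text.lower().split():
        if word not in LEXICON:
            raise ParseError(f"I don't know the word '{word}'.")
        tokens.append((LEXICON[word], word))
    return tokens


class Parser:
    def __init__(self, tokens: List[Tuple[str, str]]) -> None:
        self.tokens = tokens
        self.pos = 0

    def peek(self) -> Optional[str]:
        """Kind of the next token, or None at end of input."""
        return self.tokens[self.pos][0] if self.pos < len(self.tokens) else None

    def expect(self, kind: str) -> str:
        """Consume a token of the given kind and return its word."""
        if self.peek() != kind:
            raise ParseError(f"expected {kind}")
        word = self.tokens[self.pos][1]
        self.pos += 1
        return word

    # command ::= VERB object-list [ PREP noun-phrase ]
    def command(self):
        verb = self.expect("VERB")
        objects = self.object_list()
        prep, target = None, None
        if self.peek() == "PREP":
            prep = self.expect("PREP")
            target = self.noun_phrase()
        if self.peek() is not None:
            raise ParseError("unexpected words at end of sentence")
        return verb, objects, prep, target

    # object-list ::= noun-phrase { AND noun-phrase }
    def object_list(self) -> List[Phrase]:
        objects = [self.noun_phrase()]
        while self.peek() == "AND":
            self.expect("AND")
            objects.append(self.noun_phrase())
        return objects

    # noun-phrase ::= [ ARTICLE ] { ADJ } NOUN
    def noun_phrase(self) -> Phrase:
        if self.peek() == "ARTICLE":
            self.expect("ARTICLE")
        adjectives: List[str] = []
        while self.peek() == "ADJ":
            adjectives.append(self.expect("ADJ"))
        return tuple(adjectives), self.expect("NOUN")


def parse(text: str):
    """Scan, then parse starting from the top-level rule."""
    return Parser(scan(text)).command()
```

Extending the language means adding a rule and the matching method, rather than reworking a fixed sentence pattern.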

There are a lot of places to go from here, and there's a good book called
"Adventure Games" by Compute! Books that gives a very good breakdown of the
basic two-word adventure game-- in pseudo-code. They also explain how to
define multi-word grammars in BNF, doing a pretty good job of putting it in
layman's terms.

It may be hard to find the book, since it came out in the heyday of text
adventures (around '83), so I wish you luck in obtaining a copy.

jpb

Feb 23, 1991, 12:18:34 PM

In article <1991Feb23....@rucs2.sunlab.cs.runet.edu> dha...@rucs2.sunlab.cs.runet.edu (A. David Havill) writes:
>In article do...@ritcsh.csh.rit.edu (Dower) writes:
>>
>>... [A]nyone out there have an idea what I might need to do to make a language
>>parser, any idea for what kind of search routine? Algorithms or PseudoCode...
>
>If you're considering writing a beast on the order of an infocom parser, the
>best bet is to get out a old computer science compiler textbook, write a
>scanner and recursive descent parser. The technique isn't trivial, but it
>isn't that hard either-- it's easier to write a parser than a compiler %-)
>

You also might want to look at the moo/mud/mush/muck programs on belch.berkeley.edu. They all have parsers that you can look at, in fairly portable C even.

--
Joe Block (j...@umbio.med.miami.edu)
"Never send a monster to do the work of an evil genius."