Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

compiling to z-machine code ?

21 views
Skip to first unread message

Mad Scientist Jr

unread,
Feb 5, 2005, 7:58:55 PM2/5/05
to
Is there a somewhat easy to understand spec of the z-machine? Just for
fun, I am interested in writing a simple compiler that can produce
basic z-machine runnable code (ZIP? ZIL? i forget the terminology) with
commands like variable assignment (string/integer), printing to the
screen, asking the user for input, if/then, etc. Any pointers
appreciated...

samwyse

unread,
Feb 5, 2005, 8:29:42 PM2/5/05
to
On or about 2/5/2005 6:58 PM, Mad Scientist Jr did proclaim:

First of all, there's the Inform compiler, which produces either Z-code
or Glulx (which is, sort-of, kind-of, a 32-bit version of Z-code,
although its author would disagree). You can find it here:
http://www.inform-fiction.org/

Given that, if you still want to write your own compiler then you need
"The Z-Machine Standards Document" found at:
http://www.inform-fiction.org/zmachine/standards/index.html

Terry Olsen

unread,
Feb 5, 2005, 10:25:39 PM2/5/05
to
On that note: has anyone ever done a program that could take a story file
and generate a basic program (or any other language) from it?


samwyse

unread,
Feb 6, 2005, 6:53:24 AM2/6/05
to
On or about 2/5/2005 9:25 PM, Terry Olsen did proclaim:

> On that note: has anyone ever done a program that could take a story file
> and generate a basic program (or any other language) from it?

Not off the top of my head. Back in '97-'98, there was a big effort to
decypher storyfiles, as part of the project to figure out how storyfiles
were put together and exactly what all of the Z-machine opcodes were
supposed to do. All of the Ztools were created at that time; you can
download them here:

http://www.inform-fiction.org/zmachine/ztools.html

InfoDump will display all of the strings in a storyfile, while txd will
give you just about every byte formatted for easy viewing. Nothing will
look at a procedure and create high-level code; the best you can ever
hope for is Inform assembler op-codes. You are especially out of luck
if you want a file that you can modify and re-compile, since storyfiles
use a compressed format to store pointers to objects, strings and
procedures. This means that deciding which variables and object
properties are pointers to objects and which are just numbers would be
especially uncertain. Any changes to the file that caused things to
move would invalidate any pointers that had been misidentified as
numbers (and vice versa), causing interesting errors at run time.

Fredrik Ramsberg

unread,
Feb 6, 2005, 7:03:32 AM2/6/05
to
Amir Karger has written a Perl module that translates Zcode files to
Perl:

http://ifarchive.jmac.org/if-archive/infocom/tools/Language-Zcode-0.8.tar.gz

I believe it's built so that you should be able to translate Zcode into
languages other than Perl too, if you're willing to do some coding of
course.

Message has been deleted

Fredrik Ramsberg

unread,
Feb 6, 2005, 7:48:08 AM2/6/05
to
>> On that note: has anyone ever done a program that could take
>> a story file and generate a basic program (or any other
>> language) from it?
> Not off the top of my head. Back in '97-'98, there was a
> big effort to decypher storyfiles, as part of the project
> to figure out how storyfiles were put together and exactly
> what all of the Z-machine opcodes were supposed to do.
> All of the Ztools were created at that time; you can
> download them here:


Inform 1 was released to the public on 10 May 1993, and I recall Graham
Nelson claiming that the didn't decode the Zcode story file format
himself - others had already done that before him.

Mad Scientist Jr

unread,
Feb 6, 2005, 12:21:52 PM2/6/05
to
Not that I am in a great hurry to do this, but I can't imagine it NOT
being possible to disassemble z-code. Are you saying that due to this
compressed format, any meaningful variable or function/procedure names
would be lost, and instead you would be left with A,B,C, functionA,
functionB, functionC, etc.? Because if the z-machine can run it, a
z-disassembler ought to be able to render source code from it (albeit
without friendly var/function names).

Andrew Plotkin

unread,
Feb 6, 2005, 12:39:06 PM2/6/05
to
Here, Mad Scientist Jr <usenet_...@yahoo.com> wrote:
> Not that I am in a great hurry to do this, but I can't imagine it NOT
> being possible to disassemble z-code. Are you saying that due to this
> compressed format, any meaningful variable or function/procedure names
> would be lost, and instead you would be left with A,B,C, functionA,
> functionB, functionC, etc.?

You have that problem, yes.

A bigger problem is distinguishing addresses from constants. If you
see a z-code opcode "load $4A8E into local variable 1", you have to
figure out that $4A8E actually refers to a string at that address,
rather than a numeric constant. This is important for a human reading
the code. It's even more important if you plan to modify the code and
recompile it, because in the new binary that string may be at a
different address. If you guess wrong about what that instance of
$4A8E means, you'll either print garbage or change an important
numeric constant in the program.

You can use some reasonable heuristics. For example, if local variable
1 is immediately printed, it's a very good bet that $4A8E is a string
address. If there *is* a string with address $4A8E, then again it's a
fair bet. But none of these heuristics is perfectly reliable, so any
true disassembling effort will require some going-over with human eyes.

(If that problem seems easy, then consider property identifiers. These
have the same difficulty, but they're small numbers like 4 or 7, which
-- unlike $4A8E -- are as likely to be *intended* as numbers as they
are to be property ids.)

--Z

"And Aholibamah bare Jeush, and Jaalam, and Korah: these were the borogoves..."
*
I'm still thinking about what to put in this space.

Rioshin an'Harthen

unread,
Feb 6, 2005, 1:00:04 PM2/6/05
to
"Andrew Plotkin" <erky...@eblong.com> wrote in message
news:cu5knp$fu$2...@reader2.panix.com...

> Here, Mad Scientist Jr <usenet_...@yahoo.com> wrote:
>> Not that I am in a great hurry to do this, but I can't imagine it NOT
>> being possible to disassemble z-code. Are you saying that due to this
>> compressed format, any meaningful variable or function/procedure names
>> would be lost, and instead you would be left with A,B,C, functionA,
>> functionB, functionC, etc.?
>
> You have that problem, yes.
>
> A bigger problem is distinguishing addresses from constants. If you
> see a z-code opcode "load $4A8E into local variable 1", you have to
> figure out that $4A8E actually refers to a string at that address,
> rather than a numeric constant. This is important for a human reading
> the code. It's even more important if you plan to modify the code and
> recompile it, because in the new binary that string may be at a
> different address. If you guess wrong about what that instance of
> $4A8E means, you'll either print garbage or change an important
> numeric constant in the program.

>
> You can use some reasonable heuristics. For example, if local variable
> 1 is immediately printed, it's a very good bet that $4A8E is a string
> address. If there *is* a string with address $4A8E, then again it's a
> fair bet. But none of these heuristics is perfectly reliable, so any
> true disassembling effort will require some going-over with human eyes.

Ok, so a big problem is distinguishing addresses from constants. How then
does an interpreter distinguish them? A disassembler can see just as much of
the story file as an interpreter. So why wouldn't it be able to distinguish
the cases from each other? If an interpreter can do it, so can a
disassembler.

> (If that problem seems easy, then consider property identifiers. These
> have the same difficulty, but they're small numbers like 4 or 7, which
> -- unlike $4A8E -- are as likely to be *intended* as numbers as they
> are to be property ids.)

And again, same reasoning. If an interpreter can distinguish them, then why
couldn't a disassembler?


samwyse

unread,
Feb 6, 2005, 2:12:53 PM2/6/05
to
On or about 2/6/2005 12:00 PM, Rioshin an'Harthen did proclaim:

> "Andrew Plotkin" <erky...@eblong.com> wrote in message
> news:cu5knp$fu$2...@reader2.panix.com...
>>You can use some reasonable heuristics. For example, if local variable
>>1 is immediately printed, it's a very good bet that $4A8E is a string
>>address. If there *is* a string with address $4A8E, then again it's a
>>fair bet. But none of these heuristics is perfectly reliable, so any
>>true disassembling effort will require some going-over with human eyes.
>
> Ok, so a big problem is distinguishing addresses from constants. How then
> does an interpreter distinguish them? A disassembler can see just as much of
> the story file as an interpreter. So why wouldn't it be able to distinguish
> the cases from each other? If an interpreter can do it, so can a
> disassembler.

The interpreter doesn't have to distinguish a value's meaning until it
actually get used by an instruction. When the interpreter sees the
opcode to print local variable 1, it can assume that the variable
contains the address of a string. (And if it doesn't, it can signal a
run-time error.) But the value may have been stored in that variable at
a completely different time. Also note that if the variable was
originally named 'temp', then it will likely have different types of
values at different times, which can futher bolix up the disassembler.
And static analysis of the entire program doesn't help since there might
be dead code and/or undiscovered bugs. Just because there's an
instruction somewhere to print local variable 1 doesn't mean that it
will ever be executed.

BTW, Infocom used a LISP-based language that is very different from
Inform. There are storyfiles (Seastalker comes immediately to mind)
that the Inform compiler would be unable to duplicate, because it won't
let you put procedures and strings in readable memory.

Mad Scientist Jr

unread,
Feb 6, 2005, 3:49:02 PM2/6/05
to
okay i think i understand that explanation. thanks to everyone who has
posted. i read some articles by the infocom guys that showed some
Muddle (aka MDL, the LISP like language they used) source code and it's
ugly! Did they ever make available any of their complete source codes
anywhere?

samwyse

unread,
Feb 6, 2005, 4:30:21 PM2/6/05
to
On or about 2/6/2005 2:49 PM, Mad Scientist Jr did proclaim:

> okay i think i understand that explanation. thanks to everyone who has
> posted. i read some articles by the infocom guys that showed some
> Muddle (aka MDL, the LISP like language they used) source code and it's
> ugly! Did they ever make available any of their complete source codes
> anywhere?

LISP, APL and Forth are all interpreted languages that are incredibly
powerful, but which have incredibly steep learning curves. LISP-like
languages are particularly suited for natural language applications,
which is why they are used so much in Artificial Intellegence and were
judged well suited for interactive fiction.

The excerpts that you've seen published are pretty much all that there
is, unless Activision has some code hidden away in a vault somewhere.
The circumstantial evidence is pretty strong, however, that all they
have are the binaries, the same as us. Someone posted last year that
they were going to try to recreate the Infocom compiler; nothing has
been heard on the subject since then.

Andrew Plotkin

unread,
Feb 6, 2005, 5:06:24 PM2/6/05
to
Here, samwyse <deja...@email.com> wrote:
> On or about 2/6/2005 2:49 PM, Mad Scientist Jr did proclaim:
> > okay i think i understand that explanation. thanks to everyone who has
> > posted. i read some articles by the infocom guys that showed some
> > Muddle (aka MDL, the LISP like language they used) source code and it's
> > ugly! Did they ever make available any of their complete source codes
> > anywhere?
>
> LISP, APL and Forth are all interpreted languages that are incredibly
> powerful, but which have incredibly steep learning curves. LISP-like
> languages are particularly suited for natural language applications,
> which is why they are used so much in Artificial Intellegence and were
> judged well suited for interactive fiction.

The bits of Infocom code we have are not LISP. They're in a compiled
(not interpreted) language, which is imperative in form (not
functional or dynamic). It's a lot like early Inform, which is not a
surprise. Only the syntax is LISP-like.

Andrew Plotkin

unread,
Feb 6, 2005, 5:10:50 PM2/6/05
to
Here, Rioshin an'Harthen <rio...@breakthru.com> wrote:
> "Andrew Plotkin" <erky...@eblong.com> wrote in message
> news:cu5knp$fu$2...@reader2.panix.com...
> >
> > A bigger problem is distinguishing addresses from constants. If you
> > see a z-code opcode "load $4A8E into local variable 1", you have to
> > figure out that $4A8E actually refers to a string at that address,
> > rather than a numeric constant. This is important for a human reading
> > the code. It's even more important if you plan to modify the code and
> > recompile it, because in the new binary that string may be at a
> > different address. If you guess wrong about what that instance of
> > $4A8E means, you'll either print garbage or change an important
> > numeric constant in the program.
>
> Ok, so a big problem is distinguishing addresses from constants. How then
> does an interpreter distinguish them?

The interpreter doesn't distinguish them. It just passes the value
through blindly, and assumes that the right thing will happen.

Note that I specified the tasks of "making human-readable code" and
"modifying and recompiling code". Those are tasks which an interpreter
doesn't have to do. They are tasks which the Z-code format is not
designed to make easy. Unsurprisingly, therefore, they're *not* easy.

David Alex Lamb

unread,
Feb 6, 2005, 5:37:04 PM2/6/05
to
In article <cu5knp$fu$2...@reader2.panix.com>,

Andrew Plotkin <erky...@eblong.com> wrote:
>A bigger problem is distinguishing addresses from constants. If you
>see a z-code opcode "load $4A8E into local variable 1", you have to
>figure out that $4A8E actually refers to a string at that address,
>rather than a numeric constant. This is important for a human reading
>the code. It's even more important if you plan to modify the code and
>recompile it, because in the new binary that string may be at a
>different address. If you guess wrong about what that instance of
>$4A8E means, you'll either print garbage or change an important
>numeric constant in the program.

My knowledge of decompilers has deteriorated a lot in the last 25 years, but
I'm fairly sure this is a "standard problem" solved by some variant of type
inference or some other form of dynamic code analysis. I think there are
similar issues with byte-code-to-native compilers. I'm not saying it's easy,
just that there is literature in the compiler community that addresses similar problems.

>You can use some reasonable heuristics. For example, if local variable
>1 is immediately printed, it's a very good bet that $4A8E is a string
>address. If there *is* a string with address $4A8E, then again it's a
>fair bet. But none of these heuristics is perfectly reliable, so any
>true disassembling effort will require some going-over with human eyes.

IIRC the above-mentioned amount to formalizing these kind of heuristics and
verifying that the code really does treat the value that way.

>(If that problem seems easy, then consider property identifiers. These
>have the same difficulty, but they're small numbers like 4 or 7, which
>-- unlike $4A8E -- are as likely to be *intended* as numbers as they
>are to be property ids.)

The interesting trick is if some clever compiler treats them as property
identifiers in one place and simple integers in another. I have vague
memories of some peephole optimizers generating such multi-use situations.
--
"Yo' ideas need to be thinked befo' they are say'd" - Ian Lamb, age 3.5
http://www.cs.queensu.ca/~dalamb/ qucis->cs to reply (it's a long story...)

samwyse

unread,
Feb 6, 2005, 6:19:27 PM2/6/05
to
On or about 2/6/2005 4:06 PM, Andrew Plotkin did proclaim:

> Here, samwyse <deja...@email.com> wrote:
>
>>On or about 2/6/2005 2:49 PM, Mad Scientist Jr did proclaim:
>>
>>>okay i think i understand that explanation. thanks to everyone who has
>>>posted. i read some articles by the infocom guys that showed some
>>>Muddle (aka MDL, the LISP like language they used) source code and it's
>>>ugly! Did they ever make available any of their complete source codes
>>>anywhere?
>>
>>LISP, APL and Forth are all interpreted languages that are incredibly
>>powerful, but which have incredibly steep learning curves. LISP-like
>>languages are particularly suited for natural language applications,
>>which is why they are used so much in Artificial Intellegence and were
>>judged well suited for interactive fiction.
>
> The bits of Infocom code we have are not LISP. They're in a compiled
> (not interpreted) language, which is imperative in form (not
> functional or dynamic). It's a lot like early Inform, which is not a
> surprise. Only the syntax is LISP-like.

Quite so, but I was mostly discussing why a language was chosen that was
"ugly". I've used a few variants of LISP, including (in reverse order)
CommonLisp, Scheme and an IBM mainframe version that was probably a
close decendent of McCarthy's original implementation. A really
interesting (though very short) discussion of ZIL (Zork Implementation
Language) can be found here:

http://lambda-the-ultimate.org/node/view/429

It indicates that Activision *does* have the source code for both ZIL
and all of the Infocom games. "A lot of people might not realize it,
but as late as 1993, Activision's "Return to Zork" was still written in
a (heavily extended) version of ZIL, i.e. a dialect of Lisp."

Andrew Plotkin

unread,
Feb 6, 2005, 6:39:27 PM2/6/05
to
Here, samwyse <deja...@email.com> wrote:
> On or about 2/6/2005 4:06 PM, Andrew Plotkin did proclaim:
> > Here, samwyse <deja...@email.com> wrote:
> >
> >>On or about 2/6/2005 2:49 PM, Mad Scientist Jr did proclaim:
> >>
> >>>okay i think i understand that explanation. thanks to everyone who has
> >>>posted. i read some articles by the infocom guys that showed some
> >>>Muddle (aka MDL, the LISP like language they used) source code and it's
> >>>ugly! Did they ever make available any of their complete source codes
> >>>anywhere?
> >>
> >>LISP, APL and Forth are all interpreted languages that are incredibly
> >>powerful, but which have incredibly steep learning curves. LISP-like
> >>languages are particularly suited for natural language applications,
> >>which is why they are used so much in Artificial Intellegence and were
> >>judged well suited for interactive fiction.
> >
> > The bits of Infocom code we have are not LISP. They're in a compiled
> > (not interpreted) language, which is imperative in form (not
> > functional or dynamic). It's a lot like early Inform, which is not a
> > surprise. Only the syntax is LISP-like.
>
> Quite so, but I was mostly discussing why a language was chosen that was
> "ugly".

That's what I mean, though. If you're talking about MDL (the original
Zork), then it *was* more or less LISP-like, and I'm sure they chose
it because they were MIT hackers and that was the era when LISP was
the hippest thing in the cool universe.

If you're talking about ZIL, then I imagine they designed it ugly
because they were MIT hackers. And even though they were effectively
inventing a form of compiled BASIC, they were at least going to
*pretend* it was LISP-like. Dammit. Regardless, it doesn't actually
have any of the properties ("powerful, suited to natural languages")
that you were talking about.

> A really
> interesting (though very short) discussion of ZIL (Zork Implementation
> Language) can be found here:
>
> http://lambda-the-ultimate.org/node/view/429

I'm not convinced by everything posted there. In particular, calling
ZIL "MDL with a lot of its genuine arcana removed" seems to be
way off-base. ZIL had no facility to allocate or free memory; it had
no cons, car, or cdr facilities. I'm sticking with "compiled BASIC
with a lingering addiction to parentheses".

ethan...@gmail.com

unread,
Feb 7, 2005, 12:28:37 AM2/7/05
to
samwyse wrote:
> A really interesting (though very short) discussion of ZIL (Zork
> Implementation Language) can be found here:
>
> http://lambda-the-ultimate.org/node/view/429
>
> It indicates that Activision *does* have the source code for both
> ZIL and all of the Infocom games. "A lot of people might not
> realize it, but as late as 1993, Activision's "Return to Zork"
> was still written in a (heavily extended) version of ZIL, i.e. a
> dialect of Lisp."

Having been a beta-tester for "Return to Zork", I would have to
disagree with the quote from this page... RtZ was written in a language
that _looked_ Lispy, but it had zero to do with MDL or ZIL... no common
code, no common authorship, etc.

-ethan

Joel Ray Holveck

unread,
Feb 7, 2005, 3:34:24 AM2/7/05
to
On 2005-02-06 09:21:52 -0800, "Mad Scientist Jr"
<usenet_...@yahoo.com> said:

> Are you saying that due to this
> compressed format, any meaningful variable or function/procedure names
> would be lost, and instead you would be left with A,B,C, functionA,
> functionB, functionC, etc.? Because if the z-machine can run it, a
> z-disassembler ought to be able to render source code from it (albeit
> without friendly var/function names).

No, but many things are referred to as simply numbers, and you can't
tell what those numbers mean unless they're being used in their
functions.

For example, suppose that I have the following code:

[Foo direction;
print "That way is ", (name)location.direction;
];
[Bar obj;
print "That is ", (the)obj;
];
[Baz verb;
if (verb == ##Jump)
"You jump.";
"You don't jump.";
];
[Quux;
Foo(n_to);
Bar(platypus);
Baz(##Climb);
];

Now, in Quux, you're passing a number to Foo, a number to Bar, and a
number to Baz. The compiler is kind enough to let you use n_to (a
property name), platypus (an object identifier), and ##Climb (a verb
identifier) to refer to their numbers, but in the compiled code,
they're just numbers-- and they overlap. (In fact, the standard Inform
library relies on the fact that n_to and n_obj have the same number--
something I found out, much to my dismay, when I violated this just
before putting my program in beta.) A disassembler examining Quux
wouldn't know which is which, that is, it wouldn't know if the number
for n_to was meant to represent a property, object, verb, etc.

"But can't it look at Foo, Bar, and Baz?" Well, kind of. That would
take a considerable amount of analysis. (It's not entirely impossible;
Spidey, a Scheme analyzer, does some such analysis. But I suspect it's
impossible to always perform such analysis). Also, the routines may
not clearly define what type of object they're expecting. In the
program I'm working on now, some subroutines can be passed dictionary
words or objects, for example.

There are decompilers for some languages, but they're also unreliable
for similar reasons. It's more of a problem for Inform, since it
relies on lots of types. Decompilers are most common for languages
like C, which is almost a macro assembler anyway.

Cheers,
Piquan

samwyse

unread,
Feb 7, 2005, 9:30:10 AM2/7/05
to
On or about 2/6/2005 5:39 PM, Andrew Plotkin did proclaim:

> Here, samwyse <deja...@email.com> wrote:
>>Quite so, but I was mostly discussing why a language was chosen that was
>>"ugly".
>
> That's what I mean, though. If you're talking about MDL (the original
> Zork), then it *was* more or less LISP-like, and I'm sure they chose
> it because they were MIT hackers and that was the era when LISP was
> the hippest thing in the cool universe.

I can't argue with that. :-) I once played with a Lisp Machine that my
company was thinking about buying.

> If you're talking about ZIL, then I imagine they designed it ugly
> because they were MIT hackers. And even though they were effectively
> inventing a form of compiled BASIC, they were at least going to
> *pretend* it was LISP-like. Dammit. Regardless, it doesn't actually
> have any of the properties ("powerful, suited to natural languages")
> that you were talking about.

Well, the dictionary does implement LISP-like atoms. Other than that,
yeah, it is mostly a fancy BASIC.

>>A really
>>interesting (though very short) discussion of ZIL (Zork Implementation
>>Language) can be found here:
>>
>> http://lambda-the-ultimate.org/node/view/429
>
>
> I'm not convinced by everything posted there. In particular, calling
> ZIL "MDL with a lot of its genuine arcana removed" seems to be
> way off-base. ZIL had no facility to allocate or free memory; it had
> no cons, car, or cdr facilities. I'm sticking with "compiled BASIC
> with a lingering addiction to parentheses".

I suppose that it depends on the definition of "genuine arcana"; lots of
people would put all of LISP in that category. ZORK was probably
written while peeking at the FORTRAN source for Adventure and so got
written in a FORTRANish pidgin. It didn't need to allocate or free
memory; it required no cons, car, or cdr.

MDL, like a lot of LISPs, could be compiled (mostly to counter the
perception at the time that LISP was slower than C), and given one
version of LISP, it was pretty easy to derive new versions. I have no
proof, but I imagine that they took an existing version of MDL and
ported it to a new "hardware" architecture, albeit one that was
incredibly limited. Those portions of MDL that implemented "arcana"
didn't get the procedures written for compilation to the new hardware
and so got dropped. And once a working library is written, it tends to
freeze things, so the later games got written in the same pidgin.

I do agree with your final accessment. Once the axe was put away, what
was left did resemble BASIC more than it did LISP. I'll bet, though,
that the compiler did make lots of use of cons, car and cdr.

samwyse

unread,
Feb 7, 2005, 9:56:50 AM2/7/05
to
On or about 2/6/2005 4:37 PM, David Alex Lamb did proclaim:

> My knowledge of decompilers has deteriorated a lot in the last 25 years, but
> I'm fairly sure this is a "standard problem" solved by some variant of type
> inference or some other form of dynamic code analysis. I think there are
> similar issues with byte-code-to-native compilers. I'm not saying it's easy,
> just that there is literature in the compiler community that addresses similar problems.

Quite so. Java is very strongly typed, and (to support the
retrospection APIs) the class files keep a ton of data that would
normally be of interest to only to a debugger. As a result, Java
decompilers are a dime a dozen, because you can get by with a purely
static analysis. The Z machine is the exact opposite of Java, looking
more like an Motorola 6502. The architecture even seems to support
self-modifying code, although no one seems to have ever tried writing any.

Another problem is that there are multiple compilers. Turning a Java
class file back into Java is aided by the fact that there's really only
been a couple of versions of Java, and those have only very minor
differences. The major versions of Inform are quite different from each
other, there is a C compiler (AFAIK only used for "Silicon Castles"),
and then there's ZIL, of which little is known but was used to create
"the canon". Turing a storyfile compiled using Inform 4 into Inform 5
source would be very difficult; turning something that started out as C
or ZIL would, I suspect, be almost impossible.

samwyse

unread,
Feb 7, 2005, 9:58:29 AM2/7/05
to
On or about 2/7/2005 2:34 AM, Joel Ray Holveck did proclaim:

> (In fact, the standard Inform library
> relies on the fact that n_to and n_obj have the same number-- something
> I found out, much to my dismay, when I violated this just before putting
> my program in beta.)

I'm surprised to hear that. Where does that happen?

Matthew Russotto

unread,
Feb 7, 2005, 10:53:17 AM2/7/05
to
In article <1107693861.1...@c13g2000cwb.googlegroups.com>,

Fredrik Ramsberg <f...@mail.com> wrote:
>>> On that note: has anyone ever done a program that could take
>>> a story file and generate a basic program (or any other
>>> language) from it?
>> Not off the top of my head. Back in '97-'98, there was a
>> big effort to decypher storyfiles, as part of the project
>> to figure out how storyfiles were put together and exactly
>> what all of the Z-machine opcodes were supposed to do.
>> All of the Ztools were created at that time; you can
>> download them here:
>
>Inform 1 was released to the public on 10 May 1993, and I recall Graham
>Nelson claiming that the didn't decode the Zcode story file format
>himself - others had already done that before him. Your acclaimed date
>is probably wrong by at least five years.

The Ztools date back to 1992, and I believe they were the _second_
generation of Z-machine story file tools.

However, there was a later push to decode V6 and fill in
some of the gaps.

Matthew Russotto

unread,
Feb 7, 2005, 10:57:49 AM2/7/05
to
In article <1107710512.1...@l41g2000cwc.googlegroups.com>,

Mad Scientist Jr <usenet_...@yahoo.com> wrote:
>Not that I am in a great hurry to do this, but I can't imagine it NOT
>being possible to disassemble z-code. Are you saying that due to this
>compressed format, any meaningful variable or function/procedure names
>would be lost, and instead you would be left with A,B,C, functionA,
>functionB, functionC, etc.? Because if the z-machine can run it, a
>z-disassembler ought to be able to render source code from it (albeit
>without friendly var/function names).

Certainly Z-code can be disassembled. But at present, only to a
low-level assembler. Nobody has made a totally successful decompiler,
though there have been partially successful attempts.

In theory, most Z-code shouldn't be all that hard to decompile. It
comes down to a matter of bookkeeping; some manual help would be
needed when information was totally lost and to fill in variable and
function names, but not as much as you might think. In practice, no
one has done it. It doesn't help that TXD isn't really suitable as
the starting point for a decompiling effort, so you'd have to re-write
the disassembler part before you got to the fun stuff.

Matthew Russotto

unread,
Feb 7, 2005, 11:07:11 AM2/7/05
to
In article <cu666g$6jj$1...@knot.queensu.ca>,

David Alex Lamb <dal...@qucis.queensu.ca> wrote:
>
>My knowledge of decompilers has deteriorated a lot in the last 25 years, but
>I'm fairly sure this is a "standard problem" solved by some variant of type
>inference or some other form of dynamic code analysis. I think there are
>similar issues with byte-code-to-native compilers. I'm not saying it's easy,
>just that there is literature in the compiler community that
>addresses similar problems.

Most recent byte-code to native work has been done on type-aware VMs
like Java. This simplifies the task enormously.

On the flip side, the Z-machine's paucity of types simplifies that
side quite a bit.

>IIRC the above-mentioned amount to formalizing these kind of heuristics and
>verifying that the code really does treat the value that way.

Right. This is what I call "bookkeeping". If you can figure out that
parameter two of routine R_A562 must be a string (because it's used in a
print_paddr), then you can figure out that anything passed in as the
actual second parameter to R_A562 is a string also. You can work
things like that forward and backwards, and come up with the types of
a lot of things. With Infocom (but not Inform) code, you can also
assume that global variables have the same type throughout the code,
and that all properties of a given number have the same type.

>The interesting trick is if some clever compiler treats them as property
>identifiers in one place and simple integers in another. I have vague
>memories of some peephole optimizers generating such multi-use situations.

Fortunately, the ZIL compiler does not do this.

Graham Holden

unread,
Feb 7, 2005, 11:36:21 AM2/7/05
to
On Mon, 07 Feb 2005 10:07:11 -0600, russ...@grace.speakeasy.net (Matthew
Russotto) wrote:

>Right. This is what I call "bookkeeping". If you can figure out that
>parameter two of routine R_A562 must be a string (because it's used in a
>print_paddr), then you can figure out that anything passed in as the
>actual second parameter to R_A562 is a string also. You can work
>things like that forward and backwards, and come up with the types of
>a lot of things. With Infocom (but not Inform) code, you can also
>assume that global variables have the same type throughout the code,
>and that all properties of a given number have the same type.

But presumably, because of Inform/Z-machine's weak typing, there are many
situations where this could go wrong? Things like PrintOrRun() or others
where different types are passed in at different points in a game?

Regards,
Graham Holden (g-holden AT dircon DOT co DOT uk)
--
There are 10 types of people in the world;
those that understand binary and those that don't.

Mad Scientist Jr

unread,
Feb 7, 2005, 2:08:43 PM2/7/05
to
Alas if I only had a slightly bigger brain. I would think the value in
dcompiling z-code would mainly be to see how the masters wrote the
classic Infocom games - more for learning and game design purposes. I'm
sure a lot of that can be done from newer Inform games whose source is
freely available, but I still would love to start by learning from the
original Zork.

Andrew Plotkin

unread,
Feb 7, 2005, 2:13:52 PM2/7/05
to
Here, Joel Ray Holveck <jo...@piquan.org> wrote:
> (In fact, the standard Inform
> library relies on the fact that n_to and n_obj have the same number--

No, I don't believe it does. It's unfortunately easy for the *user* to
write code that relies on this assumption (writing "rm.n_obj" when you
meant "rm.n_to"), but the library doesn't contain that assumption out
of the box.

Mad Scientist Jr

unread,
Feb 7, 2005, 2:20:53 PM2/7/05
to
If I only had a slightly bigger brain (and more free time) I would try
to write the decompiler myself. As it is, I looked at the z-machine
spec and it's beyond my current knowledge and time... I once knew some
8-bit assembly and my understanding of low level computing ends there.
It's just a better use of my time to learn Inform and start writing
games. My original hope was I could write a simple z-code compiler to
translate my existing text games (which are no Zorks, I assure you) to
z-code from their current homebrewed language (a sub BASIC scripting
tongue at best). I think it best to learn Inform, and then if I'm
*still* intersted in porting these ancient games of mine over for comic
relief, it would be easier to write something to translate them to
Inform code. I think it's best to just write new games and attempt
something good. I begin reading the Inform manual tonight...

Matthew Russotto

unread,
Feb 7, 2005, 2:44:46 PM2/7/05
to
In article <2n5f01p8hlb2ml5i9...@4ax.com>,

Graham Holden <lo...@bottom.of.post> wrote:
>
>But presumably, because of Inform/Z-machine's weak typing, there are many
>situations where this could go wrong? Things like PrintOrRun() or others
>where different types are passed in at different points in a game?

PrintOrRun is an Inform thing; I don't think Infocom used anything
like it (I could be mistaken, it's been a while since I disassembled
the story files). However, even if you allow that case you've got it
down to a packed address; knowing it's either a routine or a string is
probably enough for most purposes, particularly since any constant
with such a type can be disambiguated in Inform-generated code.

Undoubtedly, though, you wouldn't be able to determine every type, and
you'd have to call for user help.

Richard Bos

unread,
Feb 7, 2005, 4:11:53 PM2/7/05
to
samwyse <deja...@email.com> wrote:

> The major versions of Inform are quite different from each
> other, there is a C compiler (AFAIK only used for "Silicon Castles"),
> and then there's ZIL, of which little is known but was used to create
> "the canon". Turing a storyfile compiled using Inform 4 into Inform 5

^^^^^^
_Nice_ typo.

Richard

Fredrik Ramsberg

unread,
Feb 7, 2005, 7:59:25 PM2/7/05
to

Matthew Russotto wrote:
> In article <2n5f01p8hlb2ml5i9...@4ax.com>,
> Graham Holden <lo...@bottom.of.post> wrote:
> >
> >But presumably, because of Inform/Z-machine's weak typing, there are
many
> >situations where this could go wrong? Things like PrintOrRun() or
others
> >where different types are passed in at different points in a game?
>
> PrintOrRun is an Inform thing; I don't think Infocom used anything
> like it (I could be mistaken, it's been a while since I disassembled
> the story files). However, even if you allow that case you've got it
> down to a packed address; knowing it's either a routine or a string
is
> probably enough for most purposes, particularly since any constant
> with such a type can be disambiguated in Inform-generated code.

There are also properties which can hold either an object number,
or a string or a routine, like n_to. There can't be any clues
in the code as to what kind of values they may contain.

/Fredrik

Joel Ray Holveck

unread,
Feb 7, 2005, 7:57:14 PM2/7/05
to
On 2005-02-07 11:13:52 -0800, Andrew Plotkin <erky...@eblong.com> said:

>> (In fact, the standard Inform library relies on the fact that n_to and
>> n_obj have the same number--
> No, I don't believe it does. It's unfortunately easy for the *user* to
> write code that relies on this assumption (writing "rm.n_obj" when you
> meant "rm.n_to"), but the library doesn't contain that assumption out
> of the box.

You are, of course, correct. I had thought (for some silly reason)
that the DM4 specified door_to should hold n_obj, not n_to, and had
written my code that way. Later I realized that GoSub would require
them to have the same number for ENTER DOOR to work.

Upon checking my assumptions, I discover that I was quite mistaken, and
apologize to the standard library authors.

Piquan

Joel Ray Holveck

unread,
Feb 7, 2005, 8:11:42 PM2/7/05
to
>> (In fact, the standard Inform library relies on the fact that n_to and
>> n_obj have the same number-- something I found out, much to my dismay,
>> when I violated this just before putting my program in beta.)
> I'm surprised to hear that. Where does that happen?

Only in my imagination, as it turns out; see my reply to Zarf.

Piquan

Ben Rudiak-Gould

unread,
Feb 7, 2005, 9:20:04 PM2/7/05
to
David Alex Lamb wrote:
> My knowledge of decompilers has deteriorated a lot in the last 25 years,
> but I'm fairly sure this is a "standard problem" solved by some variant
> of type inference or some other form of dynamic code analysis. I think
> there are similar issues with byte-code-to-native compilers. I'm not
> saying it's easy, just that there is literature in the compiler
> community that addresses similar problems.

Since no one has mentioned it yet, let me plug my Z-machine decompiler, Reform:

http://www.darkweb.com/~benrg/if-decompilers/

It relies on the user to specify the types of local variables, arrays, and
whatnot, and does some simple local type inference to figure out the types
of local constants. E.g. if the expression "local2-->0 == $1234" appears in
the code, and the user declared local2 to be a pointer to a word array of
string addresses, then $1234 will be tagged as a string address. It works
well in practice, but at the expense of a lot of manual effort to identify
the types of variables.

I was somewhat aware of research in decompilation when I started. I read
Cifuentes's thesis, and even implemented some of the algorithms, but none of
that made it into the released version; I settled for simpler ad-hoc
techniques that worked well enough. I'd love to return to it, but I don't
know if I'll ever have the time.

One problem I had is that high-level analysis conflicts with accurate
decompilation to some extent. Zilch and Inform are very simple compilers,
and simple decompilation preserves more of the original program structure
than fancy graph analysis would. If you just pattern-match on code fragments
produced by the Inform compiler, you'll do fine. If you try to get clever,
you can easily end up with a flowgraph that can't be expressed directly in
Inform.

>>You can use some reasonable heuristics. For example, if local variable
>>1 is immediately printed, it's a very good bet that $4A8E is a string
>>address.

The problem is that it's a very good bet, rather than a certainty. I
actually implemented global type inference for Reform, but never got it to
work properly. It always inferred that (nearly) everything had every type. I
never tracked down exactly why. Undoubtedly one reason is that I didn't do
flow analysis on local variables; if one got reused to hold a different
type, equivalence classes would be merged incorrectly. Global variables are
nastier. Inform likes to use g254 and g255 to hold temporaries. Memory
locations and properties are nastier still.

I definitely believe that it could be made to work, at least much better
than my primitive attempt. If anyone wants to pick up the torch, I'll help
you out as best I can. :-)

> The interesting trick is if some clever compiler treats them as property
> identifiers in one place and simple integers in another. I have vague
> memories of some peephole optimizers generating such multi-use situations.

Both Inform and Zilch do basically no optimization, so I don't think this
situation arises.

-- Ben

Charles Richmond

unread,
Feb 8, 2005, 12:43:47 AM2/8/05
to
samwyse wrote:
>
> On or about 2/6/2005 4:37 PM, David Alex Lamb did proclaim:
> > My knowledge of decompilers has deteriorated a lot in the last 25 years, but
> > I'm fairly sure this is a "standard problem" solved by some variant of type
> > inference or some other form of dynamic code analysis. I think there are
> > similar issues with byte-code-to-native compilers. I'm not saying it's easy,
> > just that there is literature in the compiler community that addresses similar problems.
>
> Quite so. Java is very strongly typed, and (to support the
> retrospection APIs) the class files keep a ton of data that would
> normally be of interest to only to a debugger. As a result, Java
> decompilers are a dime a dozen, because you can get by with a purely
> static analysis. The Z machine is the exact opposite of Java, looking
> more like an Motorola 6502. The architecture even seems to support
> self-modifying code, although no one seems to have ever tried writing any.
>
Since the 6502 was *not* made by Motorola, that was some trick alright...

--
+--------------------------------------------------------------------+
| Charles and Francis Richmond The way feminists see oppression |
| everywhere, men see urinals. |
| richmond at plano dot net It's a design feature. -- Fred Reed |

+--------------------------------------------------------------------+

samwyse

unread,
Feb 7, 2005, 11:47:09 PM2/7/05
to
On or about 2/7/2005 11:43 PM, Charles Richmond did proclaim:

> Since the 6502 was *not* made by Motorola, that was some trick alright...

You know, I was going to say "the Moto 6800", then decided that it was
more like the 6502. I think I've programmed in too many different
assembly languages, they're all starting to blur together.

Graham Holden

unread,
Feb 8, 2005, 6:05:30 AM2/8/05
to
<snip bits about using various heuristics to determine types/uses in
decompiled Z-code.>

One approach that might help would be to build an interpreter into the
decompiler (or build a decompiler into an interpreter...) and allow the
game to be played through. If a suitable "walkthrough" is available, this
wouldn't need too much user-input, but even if it had to be played
manually, it probably wouldn't feel like the same level of user input as
trying to guess the meaning of lumps of code (and certainly wouldn't need
the same "hacker-level" user to do it).

The idea being that by noting how addresses, references, variables etc. are
_actually_ used (by seeing what the interpreter does in a real game
session), instead of how they _might_ be used (through code analysis), you
might get a better picture of what was originally intended.

Quite how you'd go about this I don't know yet... it's just an "off the top
of my head" idea at the moment.

samwyse

unread,
Feb 8, 2005, 6:32:22 AM2/8/05
to
On or about 2/7/2005 1:08 PM, Mad Scientist Jr did proclaim:

> Alas if I only had a slightly bigger brain. I would think the value in
> dcompiling z-code would mainly be to see how the masters wrote the
> classic Infocom games - more for learning and game design purposes. I'm
> sure a lot of that can be done from newer Inform games whose source is
> freely available, but I still would love to start by learning from the
> original Zork.

The only problem is that the old games were frequently buggy. Take a
look at http://www.google.com/search?q=infocom-bugs, but be warned that
there are a lot of spoilers on those lists. Inform has a lot of
safety-checks built in to prevent a large number of those problems. The
old games are respected today more for their writing and their ideas,
not their implementation.

There are some neat objects in some of the games, but I think that
everyone one of them has been re-created in Inform. Check out "Toyshop"
for details. Also, there is a re-implementation of "Deadline" (or is it
"Suspect") that was written entirely in Inform. However, I don't know
if the author worked from a disassembly of the original, or created
everything from scratch.

Mad Scientist Jr

unread,
Feb 8, 2005, 11:18:41 AM2/8/05
to
I will definitely check out Toyshop. What would you and other users say
are the top 3 games I can get the source code to, to study and learn
good design? Or will the Inform manual be sufficient?

Rexx Magnus

unread,
Feb 8, 2005, 11:25:04 AM2/8/05
to
On Tue, 08 Feb 2005 16:18:41 GMT, Mad Scientist Jr scrawled:

> I will definitely check out Toyshop. What would you and other users say
> are the top 3 games I can get the source code to, to study and learn
> good design? Or will the Inform manual be sufficient?
>

I learned from the DM4, and also picked up a few tips as to what to use
and when, after reading the Inform Beginners Guide (which has been
revamped in the last few months).

--
http://www.rexx.co.uk

To email me, visit the site.

Matthew Russotto

unread,
Feb 8, 2005, 11:35:33 AM2/8/05
to
In article <1107824365....@g14g2000cwa.googlegroups.com>,

That would be a problem, but only for Inform code. It's Infocom code
I'm actually more interested in. The problem is similar to the problem of
unions in other languages, so there's probably been some work on it.

Matthew Russotto

unread,
Feb 8, 2005, 11:39:44 AM2/8/05
to
In article <7p6h01lakst0opo1m...@4ax.com>,

Graham Holden <lo...@bottom.of.post> wrote:
>
>One approach that might help would be to build an interpreter into the
>decompiler (or build a decompiler into an interpreter...) and allow the
>game to be played through. If a suitable "walkthrough" is available, this
>wouldn't need too much user-input, but even if it had to be played
>manually, it probably wouldn't feel like the same level of user input as
>trying to guess the meaning of lumps of code (and certainly wouldn't need
>the same "hacker-level" user to do it).

Now that's an interesting way of getting around a number of problems
-- particularly the function-pointer problem and the untyped-constant
problem. You could use this to collect data which would be fed into a
more traditional decompiler.

Esa A E Peuha

unread,
Feb 9, 2005, 5:19:29 AM2/9/05
to
Andrew Plotkin <erky...@eblong.com> writes:

> If you're talking about ZIL, then I imagine they designed it ugly
> because they were MIT hackers. And even though they were effectively
> inventing a form of compiled BASIC, they were at least going to
> *pretend* it was LISP-like.

Or maybe they just didn't want to rewrite the existing Zork source
which was in MDL, and therefore retained the syntax even though the
actual language turned out to be very different.

--
Esa Peuha
student of mathematics at the University of Helsinki
http://www.helsinki.fi/~peuha/

Esa A E Peuha

unread,
Feb 9, 2005, 5:23:25 AM2/9/05
to
"Mad Scientist Jr" <usenet_...@yahoo.com> writes:

> Alas if I only had a slightly bigger brain. I would think the value in
> dcompiling z-code would mainly be to see how the masters wrote the
> classic Infocom games - more for learning and game design purposes. I'm
> sure a lot of that can be done from newer Inform games whose source is
> freely available, but I still would love to start by learning from the
> original Zork.

You might want to check out Dungeon; after all, it _is_ the "original
Zork".

Andrew Plotkin

unread,
Feb 9, 2005, 11:01:23 AM2/9/05
to
Here, Esa A E Peuha <esa....@helsinki.fi> wrote:
> Andrew Plotkin <erky...@eblong.com> writes:
>
> > If you're talking about ZIL, then I imagine they designed it ugly
> > because they were MIT hackers. And even though they were effectively
> > inventing a form of compiled BASIC, they were at least going to
> > *pretend* it was LISP-like.
>
> Or maybe they just didn't want to rewrite the existing Zork source
> which was in MDL, and therefore retained the syntax even though the
> actual language turned out to be very different.

I doubt very much that functional MDL code works as functional ZIL
code. Why would the original MDL have avoided list operations?

Brandon Roberson

unread,
Feb 10, 2005, 7:45:06 PM2/10/05
to
The idea of writing a new compiler targeting a new virtual machine, and
tweaking every little detail of both in such a way that you can use the
unmodified source code for one program to compile and run a single
game... I'd call that preposterous.

But then again, I'm the sort of person who'd probably do that for the
fun of it, and I've known quite a few programmers who'd feel the same.

0 new messages