[Glulx] Text compression

Andrew Plotkin

unread,

Aug 10, 1999, 3:00:00 AM8/10/99

to

Well, it's more or less come down to Huffman compression.

In an effort to allow a little flexibility, the VM allows both character
and word-based Huffman encoding, or a mixture, if you like. The
"compression table" (in the game file) is a binary tree, whose leaves can
be any or all of the following:

* Single character (interpreted as Latin-1, as usual)
* Array of characters (terminated by a zero byte)
* End-of-string marker (stops the decompression process)
* Address of a Glulx string or function in memory
* Address of the address of a Glulx string or function in memory
(this is useful if you want to modify the effect of a node without
having to modify the compression table itself. Typically the table will
be in ROM, so this is a good thing.)

I'll write up the storage format of this tree for the spec, but it's
trivial. Buncha bytes.

Note that the standard Inform abbreviation mechanism can be handled by
this (using the array-of-chars leaf). So can the dynamic string mechanism;
in fact, that becomes more flexible, because strings can be nested
arbitrarily, and you can also have function calls in there.

The *implementation* of nested strings and function calls turns out to be
a little tricky, but not horrific. Basically a string-printing operation
will *be* a form of function call, and will push a call stub onto the
stack. Recursion then works in the usual Glulx way.

(Naively, it's fairly wasteful to have to push and pop stack values every
time you print a string. But in fact it's very easy for the interpreter to
optimize this out in the common case -- printing a single string with no
funky sub-string or sub-function calls. A game that makes heavy use of
abbreviations will run a bit more slowly, but I don't think it'll be a
bottleneck.)

--Z

"And Aholibamah bare Jeush, and Jaalam, and Korah: these were the
borogoves..."

Andrew Plotkin

unread,

Aug 12, 1999, 3:00:00 AM8/12/99

to

Andrew Plotkin <erky...@netcom.com> wrote:
> Well, it's more or less come down to Huffman compression.
>
> In an effort to allow a little flexibility, the VM allows both character
> and word-based Huffman encoding, or a mixture, if you like.

I've updated the Glulx specification to include all this stuff. I've also
added a few more things, like the dynamic memory opcodes.

http://www.eblong.com/zarf/glulx/

Most of this is implemented in the interpreter -- enough to run a game
with compressed strings and abbreviations -- but not quite all of it, so I
haven't updated the interpreter source code yet.

Magnus Olsson

unread,

Aug 12, 1999, 3:00:00 AM8/12/99

to

In article <7oqckt$e...@dfw-ixnews17.ix.netcom.com>,

Andrew Plotkin <erky...@netcom.com> wrote:
>Well, it's more or less come down to Huffman compression.
>
>In an effort to allow a little flexibility, the VM allows both character

>and word-based Huffman encoding, or a mixture, if you like. The
>"compression table" (in the game file) is a binary tree, whose leaves can
>be any or all of the following:

What are your ideas on how the Inform compiler should utilize compression?
--
Magnus Olsson (m...@df.lth.se, zeb...@pobox.com)
------ http://www.pobox.com/~zebulon ------

Andrew Plotkin

unread,

Aug 12, 1999, 3:00:00 AM8/12/99

to

Magnus Olsson <m...@bartlet.df.lth.se> wrote:
> In article <7oqckt$e...@dfw-ixnews17.ix.netcom.com>,
> Andrew Plotkin <erky...@netcom.com> wrote:
>>Well, it's more or less come down to Huffman compression.
>>
>>In an effort to allow a little flexibility, the VM allows both character
>>and word-based Huffman encoding, or a mixture, if you like. The
>>"compression table" (in the game file) is a binary tree, whose leaves can
>>be any or all of the following:
>
> What are your ideas on how the Inform compiler should utilize compression?

At the moment, what I've got is that it compresses by characters, except
that if you've got -e mode on, it adds words in "Abbreviate" directives to
the list. (That's easy; Inform already has the code to check for those
words and render them as one entity.)

I still have to add support for the @0, @1, @2... code, which will be
dynamic strings, as in Z-code Inform. (Only now they can be nested, or
function calls.)

Other than that, I doubt I'll do anything else to it. Entirely word-based
compression would take a lot more memory and time during compilation, and
it's generally not worth it.