Extending the Z-Machine with Parchment

Dannii

Dec 13, 2009, 9:23:02 PM

to Parchment

I hadn't liked the idea of extending the Z-Machine before, but I'm
warming to it.

Something I think is very important is maintaining compatibility with
other interpreters. We need to be able to test whether we're running on
an interpreter that supports the Parchment extensions. With Glulx this
would be very easy, but it's not so easy with the Z-Machine.

So how can we test if we're on Parchment in a backwards compatible
way? We could set a bit in the header, but that seems rather risky to
me. Instead I've had the idea of hacking the @check_unicode opcode by
setting one of its undefined bits. Other interpreters won't set this
bit, and so it will give us a simple test.

I've made a test file to check just this. It calls @check_unicode for
both the character 'a' and a private use character. Ideally we'd set
the bit just on a single character, just in case there are zcode files
that would break by having other bits be set.

I've put the code up in a gist: http://gist.github.com/255723
The zcode file is also available:
http://gist.github.com/raw/255723/b976aea28ff2cd87c745d8839d80f1cc9b946c59/parchment-gestalt-test.z5

Please test this in all the interpreters you can! Hopefully both tests
will return 3. But post the results whatever they are.

This is what Parchment prints (which is what I hope all the others
will display too):

@check_unicode 97 (a): 3
@check_unicode $EF94 (private use character): 3

Ben Cressey

Dec 13, 2009, 10:16:17 PM

to parc...@googlegroups.com

> Please test this in all the interpreters you can! Hopefully both tests
> will return 3. But post the results whatever they are.

nFrotz will return 1 for @check_unicode $EF94. I'd borrowed that code for the forthcoming Glk Frotz release and only realized the error after I ran your test.

I've patched my code but it's perhaps a common mistake, especially for interpreters without an easy way to check the font being used.

Dannii

Dec 13, 2009, 10:33:58 PM

to Parchment

Actually 1 is probably better for $EF94. 3 means the terp can print
and read the character, but there isn't a way for arbitrary unicode
characters to be read in zcode AFAIK (you need to use custom ZSCII
alphabets.)
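
To make the return values in this thread easier to compare, here is a small Python sketch that splits a @check_unicode result into its two defined bits (bit 0: the character can be printed, bit 1: it can be read). The function name and dictionary keys are invented for illustration and do not come from any interpreter.

```python
def decode_check_unicode(result):
    """Split a @check_unicode result into its two defined flags."""
    return {
        "can_print": bool(result & 1),  # bit 0
        "can_read": bool(result & 2),   # bit 1
        "extra_bits": result & ~3,      # anything beyond the two defined bits
    }


for value in (0, 1, 3):
    print(value, decode_check_unicode(value))
```

A result of 1 therefore means "printable but not readable", which is the answer argued for here for $EF94.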

Although if Glk can't display all unicode characters either then it
should return 0...

The main thing I want is consistency. I had thought about using a
private use character because it would be obscure and unlikely to
occur in normal zcode files, but then it would be weird to check if a
terp can print 'a' too. If all the terps are consistent with their
results for 'a' then we'll use that.

Thanks Ben!

Ben Cressey

Dec 15, 2009, 7:10:44 PM

to parc...@googlegroups.com

> Although if Glk can't display all unicode characters either then it
> should return 0...

Glk can display any Unicode character. The limitation is more from the library side: it doesn't swap in fonts if the glyph is not available in the default font, and it doesn't expose the available glyphs in a way that an interpreter can query.

I've changed the return value back to 1. Thanks for the explanation.

> The main thing I want is consistency. I had thought about using a
> private use character because it would be obscure and unlikely to
> occur in normal zcode files, but then it would be weird to check if a
> terp can print 'a' too. If all the terps are consistent with their
> results for 'a' then we'll use that.

If there were a way for the available glyphs to be queried, it would be dangerous to rely on a specific return value. One of the fonts that was being considered for garglk was Charis SIL, which uses the PUA range for some special characters. Not $EF94, admittedly, but since assignment is arbitrary, there's no way to guarantee that it would not be printable in a given font, or readable in a given game. MUFI, for instance, recommends $EF94 for LATIN CAPITAL LIGATURE AU.

However, if you pick something like $FFFE, which is guaranteed not to be a character, you could check to see if the interpreter claims to be able to read and print it. This could be reported as a legitimate bug in non-Parchment interpreters, though it may be bugged in most of them.
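
A rough sketch of the $FFFE idea, in Python: Unicode reserves U+FDD0..U+FDEF and the last two code points of every plane as noncharacters, so any nonzero @check_unicode answer for one of them is suspect. The function names are hypothetical, and `result` stands for whatever value an interpreter returned for that code point.

```python
def is_noncharacter(code_point):
    """True for the code points Unicode reserves as noncharacters."""
    if 0xFDD0 <= code_point <= 0xFDEF:
        return True
    # The last two code points of every plane: U+FFFE, U+FFFF, U+1FFFE, ...
    return 0 <= code_point <= 0x10FFFF and (code_point & 0xFFFE) == 0xFFFE


def is_suspicious(code_point, result):
    """An interpreter claiming to print or read a noncharacter is likely wrong."""
    return is_noncharacter(code_point) and result != 0


print(is_suspicious(0xFFFE, 3))
print(is_suspicious(0xEF94, 3))
```

Only the 16-bit cases matter for @check_unicode itself; the check for higher planes is included just to keep the helper complete.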

Dannii

Dec 18, 2009, 6:11:10 AM

to Parchment

On Dec 16, 10:10 am, Ben Cressey <bcres...@gmail.com> wrote:
>
> > The main thing I want is consistency. I had thought about using a
> > private use character because it would be obscure and unlikely to
> > occur in normal zcode files, but then it would be weird to check if a
> > terp can print 'a' too. If all the terps are consistent with their
> > results for 'a' then we'll use that.
>
> If there were a way for the available glyphs to be queried, it would be
> dangerous to rely on a specific return value. One of the fonts that was
> being considered for garglk was Charis SIL, which uses the PUA range for
> some special characters. Not $EF94, admittedly, but since assignment is
> arbitrary, there's no way to guarantee that it would not be printable in a
> given font, or readable in a given game.
> MUFI <http://www.mufi.info/specs/MUFI-Alphabetic-2-0.pdf>, for instance,
> recommends $EF94 for LATIN CAPITAL LIGATURE AU.
>
> However, if you pick something like $FFFE, which is guaranteed
> <http://en.wikipedia.org/wiki/Unicode_Specials> not to be a character,
> you could check to see if the interpreter claims to be able to read and
> print it. This could be reported as a legitimate bug in non-Parchment
> interpreters, though it may be bugged in most of them.

I think I'll stick with 'a'.

What Parchment will do is:

@check_unicode(char):
if char == 'a' return 7
else return 3

(Although it could be nice to make it check properly whether it can
actually read an arbitrary character, that can be changed if and when
someone reports the bug.)
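
The pseudocode above could be sketched in Python roughly as follows. This is only an illustration of the proposal, not Parchment's actual code; the constant names are made up, and bit 2 is simply the first bit the opcode leaves undefined.

```python
PRINT = 1      # bit 0: the character can be printed
READ = 2       # bit 1: the character can be read from the keyboard
PARCHMENT = 4  # bit 2: undefined for @check_unicode, used here as the marker


def check_unicode(code_point):
    """Return 7 for 'a' and 3 for every other character, as proposed."""
    result = PRINT | READ
    if code_point == ord("a"):
        result |= PARCHMENT
    return result


print(check_unicode(ord("a")))
print(check_unicode(0xEF94))
```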

What a parchment-IO game will do is check if @check_unicode('a') == 7.
Hopefully all other interpreters will return 3, I'd be very surprised
if they do not!

So far so good, this system should allow us to detect whether we are
on Parchment or not. However what about non-Parchment-IO games? Well
hopefully none of them have any reason to check whether 'a' can be
printed. If they do, hopefully they will either check if it returns a
positive value, or check the individual bits. If there are any silly
games which need @check_unicode('a') == 3 exactly, then... well, we can
cross that bridge later. We could probably enable this only for games
dated 2010 or later.
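
To show which kinds of game code would and would not be affected, here is a hedged Python sketch of the three styles of check mentioned above. The function names are invented; `result` is the value @check_unicode returned for 'a'.

```python
def is_parchment(result):
    """What a parchment-IO game would test for."""
    return result == 7


def can_print_tolerant(result):
    """Tests only the defined 'can print' bit, so an extra bit is harmless."""
    return bool(result & 1)


def can_print_fragile(result):
    """Demands exactly 3, so it breaks once the extra bit is set."""
    return result == 3


for result in (3, 7):
    print(result, is_parchment(result), can_print_tolerant(result), can_print_fragile(result))
```

Only the last style misbehaves when the interpreter answers 7, which is the "silly games" case.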

Now why am I suggesting this? It will allow us to test and develop a
more ideal web IO system earlier than if we had to wait for Quixe to
be finished. Quixe would be the ultimate target for this IO system,
though testing it with zcode games will be fun and helpful!

David Kinder

Feb 5, 2010, 3:09:21 AM

to Parchment

> So how can we test if we're on Parchment in a backwards compatible
> way? We could set a bit in the header, but that seems rather risky to
> me. Instead I've had the idea of hacking the @check_unicode opcode by
> setting one of its undefined bits. Other interpreters won't set this
> bit, and so it will give us a simple test.

I'm a bit late to this, but can I suggest that if you want to extend
the Z-machine you don't do it by hacking the meaning of
@check_unicode? It would be a lot more elegant to have something in
the header, and this would at least be following in the tradition of
Infocom data files that occasionally use the contents of the header to
change themselves in various ways. The obvious candidates in the
header for this are

1) The interpreter number field (offset 30 in the header), used to say
what sort of machine the interpreter is running on. Infocom used
values from 1 to 11, and I've never seen any interpreter use anything
outside of this. You could pick a number (e.g. 20) to mean "you're
running under Parchment".

2) The user name field (8 bytes starting at offset 56 in the header).
Used only by a few Infocom test files to change behaviour. The
interpreter could set this to "Parch".
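
For illustration, the two header options could look like this in Python, treating the story file's memory as a bytearray. The offsets are the ones given above; the value 20 and the name "Parch" are just the examples from this post, and padding the name with zero bytes is an assumption of this sketch.

```python
INTERPRETER_NUMBER_OFFSET = 30  # one byte
USER_NAME_OFFSET = 56           # eight bytes
USER_NAME_LENGTH = 8
PARCHMENT_NUMBER = 20           # example value only, not an assigned number


def mark_header(memory, name=b"Parch"):
    """Interpreter side: write both markers into the header."""
    memory[INTERPRETER_NUMBER_OFFSET] = PARCHMENT_NUMBER
    padded = name[:USER_NAME_LENGTH].ljust(USER_NAME_LENGTH, b"\0")  # assumed padding
    memory[USER_NAME_OFFSET:USER_NAME_OFFSET + USER_NAME_LENGTH] = padded


def looks_like_parchment(memory):
    """Game side: test either marker."""
    name = bytes(memory[USER_NAME_OFFSET:USER_NAME_OFFSET + USER_NAME_LENGTH])
    return memory[INTERPRETER_NUMBER_OFFSET] == PARCHMENT_NUMBER or name.startswith(b"Parch")


header = bytearray(64)
mark_header(header)
print(looks_like_parchment(header))
```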

One or both of these seems rather more appealing than hacking
@check_unicode: if you go ahead with "use extra bits of the return
code" for this one, then in future we can't re-use that bit for
anything actually related to Unicode.

David

Dannii

Feb 10, 2010, 3:01:03 AM

to Parchment

Thanks for your input David!

I wasn't sure about using the header, as from the spec it seemed like
there was a huge degree of incompatibility or at least many different
implementations. @check_unicode however was new and clearly defined.

I doubt anyone will ever succeed in extending @check_unicode, but I
feel what you're saying. If you're sure setting the interpreter number
to something new won't cause problems, that will be better. In
addition, I can make it check the year and only set the new
interpreter number if it's (20)10 or later. Then we'd only have to
worry about I6 or I7 doing something silly... there's no reason for
them to care about the interpreter number is there?
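
One possible way to do the year check, sketched in Python. It assumes the year is taken from the six-character serial code in the header (offset 18), that the serial follows the usual YYMMDD form, and that two-digit years from 79 upwards mean 19xx. All of these are assumptions of this sketch, and the function names are invented.

```python
SERIAL_OFFSET = 18  # six ASCII characters, usually YYMMDD
SERIAL_LENGTH = 6


def story_year(memory):
    """Guess the four-digit year from the serial code, or None if it is not a date."""
    raw = bytes(memory[SERIAL_OFFSET:SERIAL_OFFSET + SERIAL_LENGTH])
    serial = raw.decode("ascii", errors="replace")
    if not serial[:2].isdigit():
        return None
    yy = int(serial[:2])
    return 1900 + yy if yy >= 79 else 2000 + yy  # assumed pivot year


def should_identify_as_parchment(memory):
    """Only announce the new interpreter number to games from 2010 or later."""
    year = story_year(memory)
    return year is not None and year >= 2010


header = bytearray(64)
header[SERIAL_OFFSET:SERIAL_OFFSET + SERIAL_LENGTH] = b"100210"
print(story_year(header), should_identify_as_parchment(header))
```

Games whose serial is not a date simply keep the old behaviour under this scheme.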

David Kinder

Feb 10, 2010, 11:40:03 AM

to Parchment

> interpreter number if it's (20)10 or later. Then we'd only have to
> worry about I6 or I7 doing something silly... there's no reason for
> them to care about the interpreter number is there?

No, the I6 and I7 libraries never look at the interpreter number,
except for the version command, which just prints the number out. To
my knowledge the only games that did different things for different
machines were the Infocom ones, and then only in a few cases, mostly
Beyond Zork and the V6 games.

David
