Glulx: the cost of avoiding the stack.


d...@pobox.com

Feb 26, 2007, 1:29:01 PM
In a recent thread I asserted that if one didn't want to use the Glulx
VM stack when compiling one's language then one didn't need to, and the
result would not be infeasibly inefficient. I now have some data.

There are probably several reasons why you might want to avoid using
the Glulx VM stack, most of them revolve around not being able to
address it (that is, create a pointer to it). I need to address the
stack in order to scan it for garbage collection (or rather, I will
need to do that when I get round to it); other languages might want to
stack allocate structured (larger than 1 word) objects and/or create
pointers to stack-allocated objects.

So. I believe that the cost of not using the Glulx VM stack is a
speed hit of no more than 4 times worse (that is, programs that don't
use the stack will take no longer than 400% of the time taken by an
equivalent program that does use the stack).

Here's tak in Inform 7 (see Gabriel's Performance and Evaluation of
Lisp Systems: http://www.dreamsongs.com/Books.html )

to decide what number is tak (x - number) ** (y - number) ** (z - number):
    if y >= x
    begin;
        decide on z;
    end if;
    let predecessor of x be x - 1;
    let taka be tak predecessor of x ** y ** z;
    let predecessor of y be y - 1;
    let takb be tak predecessor of y ** z ** x;
    let predecessor of z be z - 1;
    let takc be tak predecessor of z ** x ** y;
    decide on tak taka ** takb ** takc;

when play begins: say tak 24 ** 16 ** 8; end the game saying "Goodbye".

h is a room.

Here's tak in Lisp:

(defun tak (x y z)
  (if (not (< y x))
      z
      (tak (tak (1- x) y z)
           (tak (1- y) z x)
           (tak (1- z) x y))))
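And, for comparison, the same function as a plain C sketch — just a reference for what the benchmark computes (the canonical Gabriel figure is (tak 18 12 6) = 7); the two versions above are the ones actually benchmarked:

```c
#include <assert.h>

/* Gabriel's tak: does almost nothing but make function calls and
   shuffle arguments, which is why it stresses VM stack access. */
static int tak(int x, int y, int z) {
    if (y >= x)
        return z;
    return tak(tak(x - 1, y, z),
               tak(y - 1, z, x),
               tak(z - 1, x, y));
}
```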


Both of these are compiled to glulx and executed using glulxe on my
MacBook, times taken using /usr/bin/time.

Inform 7: echo quit | time glulxe /Users/drj/tak.ulx
3.39 real 3.38 user 0.00 sys

Lisp: /usr/bin/time glulxe test000010.ulx
11.05 real 11.04 user 0.00 sys

I don't really have a Lisp to glulx compiler. I have the very
smallest beginnings of a Lisp to glulx compiler that's just enough to
compile the tak benchmark into very inefficient code. Currently the
way (< y x) is compiled is particularly embarrassing. Still, the
translated Lisp version doesn't use the VM stack, and whilst it is a
fair bit slower, it's not horrendously slow.

Here's an extract from the Lisp version of the glulx file. The
beginning of the function tak:

(in my undocumented disassembler syntax)

000000C8 4C 0D0D 00 0. 04 astore ram:00 (0 / 00) ram:04
000000CD 40 DD 00 04 copy ram:00 ram:04
000000D1 10 1D0D 00 50 00 add ram:00 (80 / 50) ram:00
000000D7 4C 1D0D 04 01 0C astore ram:04 (1 / 01) ram:0C
000000DD 48 1D0D 04 FE 14 aload ram:04 (254 / FE) ram:14
000000E3 4C 1D0D 04 02 14 astore ram:04 (2 / 02) ram:14
000000E9 48 1D0D 04 FD 14 aload ram:04 (253 / FD) ram:14
000000EF 4C 1D0D 04 03 14 astore ram:04 (3 / 03) ram:14
000000F5 40 D1 03 68 copy (3 / 03) ram:68
000000F9 48 1D0D 04 02 6C aload ram:04 (2 / 02) ram:6C
000000FF 48 DD0D 04 68 70 aload ram:04 ram:68 ram:70
00000105 27 DD03 6C 70 0000001B jge ram:6C ram:70 (27 / 0000001B)
0000010E 40 DD 70 6C copy ram:70 ram:6C
00000112 10 1D0D 68 01 68 add ram:68 (1 / 01) ram:68
00000118 26 1D03 68 04 FFFFFFE0 jlt ram:68 (4 / 04) (4294967264 /
FFFFFFE0)
00000121 20 03 00000014 jump (20 / 00000014)
00000127 48 1D0D 04 FF 14 aload ram:04 (255 / FF) ram:14
0000012D 4C 1D0D 04 02 14 astore ram:04 (2 / 02) ram:14
00000133 20 03 00000124 jump (292 / 00000124)
00000139 48 1D0D 04 FD 14 aload ram:04 (253 / FD) ram:14
0000013F 4C 1D0D 04 02 14 astore ram:04 (2 / 02) ram:14
00000145 48 1D0D 04 02 14 aload ram:04 (2 / 02) ram:14
0000014B 11 1D0D 14 04 14 sub ram:14 (4 / 04) ram:14
00000151 4C 1D0D 04 02 14 astore ram:04 (2 / 02) ram:14
00000157 48 1D0D 04 FE 14 aload ram:04 (254 / FE) ram:14
0000015D 4C 1D0D 04 03 14 astore ram:04 (3 / 03) ram:14
00000163 48 1D0D 04 FF 14 aload ram:04 (255 / FF) ram:14
00000169 4C 1D0D 04 04 14 astore ram:04 (4 / 04) ram:14
0000016F 10 1D0D 04 14 00 add ram:04 (20 / 14) ram:00
00000175 10 1D0D 04 08 08 add ram:04 (8 / 08) ram:08
0000017B 40 D3 00000189 0C copy (393 / 00000189) ram:0C
00000182 8104 03 000000C8 jumpabs (200 / 000000C8)
00000189 4C 1D0D 04 02 10 astore ram:04 (2 / 02) ram:10

ram:XX means ram location XX (in hex).
ram:00 is the Stack Pointer, which is actually only used to denote the
end of the argument list when calling a function.
ram:04 is the Base Pointer, which throughout the life of a function is
constant and is the value that Stack Pointer had on entry to the
function. You can see BP being saved (at SP) at the beginning of the
function. Note that this is indexed positively to get hold of
temporaries, and negatively to get hold of arguments.
ram:08 is the Arg Pointer; it denotes the beginning of the argument
list when calling a function.
ram:0C is the Link Register; it denotes the continuation when calling
a function (the address to which the function will return). You can
see this being saved at the beginning of the function too.

And just in case you were wondering, integers are represented by
shifting them left 2 (2-bit tagged representation). That's why (1- x)
gets compiled into sub blah 4 blah.
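A sketch of that tagging in C (the names are invented for illustration; the compiler's actual representation may differ in detail):

```c
#include <assert.h>
#include <stdint.h>

/* 2-bit tagged representation: a fixnum n is stored as n << 2,
   leaving the low two bits free as a tag. */
typedef int32_t lispval;

static lispval make_fixnum(int32_t n) { return n << 2; }
static int32_t fixnum_value(lispval v) { return v >> 2; }

/* (1- x) on a tagged value: subtract the tagged constant 1, which
   is 1 << 2 = 4. Hence "sub blah 4 blah" in the generated code. */
static lispval tagged_1minus(lispval v) { return v - 4; }
```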

At the moment this is really nothing more than a proof-of-concept:
that you don't need to use the Glulx stack if you don't want to.

More detail, including the actual .ulx files if anyone wants them,
available on request.

Oh. Is there a good place to discuss this nargy VM stuff? I mean, so
the rest of RAIF can get on with discussing the art of creating
adventure games.

drj

Andrew Plotkin

Feb 26, 2007, 10:24:11 PM
Here, d...@pobox.com wrote:
> In a recent thread I asserted that if one didn't want to use the Glulx
> VM stack when compiling one's language then one didn't need to, and the
> result would not be infeasibly inefficient. I now have some data.

Thanks! Data, yum. :)

> So. I believe that the cost of not using the Glulx VM stack is a
> speed hit of no more than 4 times worse (that is, programs that don't
> use the stack will take no longer than 400% of the time taken by an
> equivalent program that does use the stack).

I confess that this is worse than I thought. I figured a stack access
would be four-to-eight times faster than a RAM access, but I didn't
figure that stack accesses would dominate interpreter activity to such
an extent.

On the up side, I'm glad my stack architecture does so much good. :)

So... if the Glulx spec were to include more direct stack access
opcodes, would it do you any good?

The "full stack scenario" is where you want to allocate stack objects,
of any size, and refer to them through pointers that can be intermixed
with (that is, distinguished from when found as a value) ordinary
memory pointers. That would mean putting the stack in RAM, which is
not what I'm thinking about at the moment. Are there lesser scenarios
which are still useful to you?

I guess the minimum scenario is a "get stack pointer" opcode, and then
"read word from stack" and "write word to stack" (which address the
stack using that pointer value -- i.e., an absolute value, as opposed
to the down-from-the-top offset which we've currently got.)

This wouldn't let you write code which correctly handled both stack
pointers and RAM pointers. But if you knew which you had, you could
write code to deal with it. And it would let you do your
garbage-collector.

> Oh. Is there a good place to discuss this nargy VM stuff?

I'm okay with it here. Every level of the IF development stack seems
to be fair game.

--Z

--
"And Aholibamah bare Jeush, and Jaalam, and Korah: these were the borogoves..."
*
Making a saint out of Reagan is sad. Making an idol out of Nixon ("If the
President does it then it's legal") is contemptible.

greg

Feb 27, 2007, 2:01:47 AM
Andrew Plotkin wrote:
> That would mean putting the stack in RAM, which is
> not what I'm thinking about at the moment.

Is there any reason *not* to put the stack in RAM? That
would seem to be the simplest solution, as it would allow
doing anything anyone might want to do with the stack,
without having to introduce new opcodes, new kinds of
pointers, etc.

--
Greg

Andrew Plotkin

Feb 27, 2007, 2:17:16 AM
Here, greg <gr...@cosc.canterbury.ac.nz> wrote:
> Andrew Plotkin wrote:
> > That would mean putting the stack in RAM, which is
> > not what I'm thinking about at the moment.
>
> Is there any reason *not* to put the stack in RAM?

Well, apparently it would slow down games by a factor of four.

--Z

--
"And Aholibamah bare Jeush, and Jaalam, and Korah: these were the borogoves..."
*

It's a nice distinction to tell American soldiers (and Iraqis) to die in
Iraq for the sake of democracy (ignoring the question of whether it's
*working*) and then whine that "The Constitution is not a suicide pact."

d...@pobox.com

Feb 27, 2007, 3:50:17 AM
On Feb 27, 7:17 am, Andrew Plotkin <erkyr...@eblong.com> wrote:

> Here, greg <g...@cosc.canterbury.ac.nz> wrote:
> > Andrew Plotkin wrote:
> > > That would mean putting the stack in RAM, which is
> > > not what I'm thinking about at the moment.
>
> > Is there any reason *not* to put the stack in RAM?
>
> Well, apparently it would slow down games by a factor of four.

I think you misinterpret my experiment. The thing that slows down my
Lisp version of tak isn't particularly having the stack in RAM, it's
that having got the stack in RAM I can no longer use the really short
(and presumably fast) opcodes and operand modes for accessing the
Glulx VM stack directly. Essentially every load/store to a stack
location becomes an aload or an astore. And if I need to move data
from one stack location to another then that's an aload/astore pair.

Consider how (< y x) is compiled (as I said previously, this is
embarrassing. Part of that embarrassment is that the Lisp version of
<= can take an arbitrary number of arguments and checks that they're
all in order. This is neat for doing (<= 0 x 9)):

000000DD 48 1D0D 04 FE 14 aload ram:04 (254 / FE) ram:14
000000E3 4C 1D0D 04 02 14 astore ram:04 (2 / 02) ram:14
000000E9 48 1D0D 04 FD 14 aload ram:04 (253 / FD) ram:14
000000EF 4C 1D0D 04 03 14 astore ram:04 (3 / 03) ram:14
000000F5 40 D1 03 68 copy (3 / 03) ram:68
000000F9 48 1D0D 04 02 6C aload ram:04 (2 / 02) ram:6C
000000FF 48 DD0D 04 68 70 aload ram:04 ram:68 ram:70
00000105 27 DD03 6C 70 0000001B jge ram:6C ram:70 (27 / 0000001B)

<= is (currently) coded to evaluate all of its arguments onto
consecutive slots in the stack, then it loops over and compares
adjacent pairs. In our case the arguments to <= are both arguments to
tak; they end up at BP[2] and BP[3].

The first two instructions copy the second argument, y - stored at
BP[-2], into BP[2], via a temporary (ram:14). This takes one aload
and one astore. See how tedious this is? The next two instructions
do the same for x. When it comes to actually doing the comparison
(the JGE instruction at 0x105), the operands have to be in temporaries
(ram:6C and ram:70 in this case); the comparison cannot use the values
stored in the stack directly. Which means that the stacked values
have to be copied (again, embarrassingly) using aload instructions (at
0xf9 0xff).

All these aloads and astores are where the speed hit comes. I think.

If the Glulx stack were memory mapped then I'd just use that.
Arguments would be pushed on the stack using operand mode 8 and popped
off by the instruction that consumed them using operand mode 8 also.
The generated code would look much more natural and be much more
compact. The only speed hit would be if that meant that the
interpreter necessarily slowed down.

drj

d...@pobox.com

Feb 27, 2007, 4:25:54 AM
On Feb 27, 3:24 am, Andrew Plotkin <erkyr...@eblong.com> wrote:
> Here, d...@pobox.com wrote:
> > In a recent thread I asserted that if one didn't want to use the Glulx
> > VM stack when compiling one's language then one didn't need to, and the
> > result would not be infeasibly inefficient. I now have some data.
>
> Thanks! Data, yum. :)
>
> > So. I believe that the cost of not using the Glulx VM stack is a
> > speed hit of no more than 4 times worse (that is, programs that don't
> > use the stack will take no longer than 400% of the time taken by an
> > equivalent program that does use the stack).
>
> I confess that this is worse than I thought. I figured a stack access
> would be four-to-eight times faster than a RAM access, but I didn't
> figure that stack accesses would dominate interpreter activity to such
> an extent.

Well, a cursory examination of the code I generate will reveal how
poorly (that is, not) optimised it is. Also, the tak function is
deliberately designed to do almost nothing other than thrash up and
down the stack. That's why I chose tak: it's close to a worst case
(as much stack access as possible), so the corresponding speed
slowdown should be close to the largest possible. I expect a more
typical program would not show as much slowdown.

>
> On the up side, I'm glad my stack architecture does so much good. :)
>
> So... if the Glulx spec were to include more direct stack access
> opcodes, would it do you any good?

Yees.

> The "full stack scenario" is where you want to allocate stack objects,
> of any size, and refer to them through pointers that can be intermixed
> with (that is, distinguished from when found as a value) ordinary
> memory pointers. That would mean putting the stack in RAM, which is
> not what I'm thinking about at the moment. Are there lesser scenarios
> which are still useful to you?
>
> I guess the minimum scenario is a "get stack pointer" opcode, and then
> "read word from stack" and "write word to stack" (which address the
> stack using that pointer value -- i.e., an absolute value, as opposed
> to the down-from-the-top offset which we've currently got.)

My primary purpose in having the stack directly addressable is that I
need to scan it for GC. This involves reading the entire stack and
modifying some of the values found therein. The activity happens
infrequently, but all at once: when GC begins, the entire stack is
scanned; at other times no special stack access is required.

Crucially I need to be able to access the entire stack, not just the
frame of the current function. I need to start at the beginning of
the stack and go right to the end; in other words if I have a pointer
into the stack then I need to be able to increment it. If you're
thinking you can make the pointer returned by the "get stack pointer"
opcode opaque so that you don't define the relationship between the
addresses of one stack item and another stack item then that's no good
to me. If you define the layout of individual stack frames but still
allow opaque junk between stack frames then I'll use "get stack
pointer" at the beginning of each function and chain all the stack
frames together by hand so that I can traverse them.

It's simplest if the architectural stack is defined as a particular
sequence of bytes at a particular address in memory. I don't
particularly care if I have to use a tedious opcode to access that
memory, but it is important that "read word from stack" can take all
addresses from 0 to N (or P to P+N, I don't really care, as long as I
know), and be defined for all such addresses.
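A minimal C sketch of the scan loop this requirement serves, with a plain array standing in for the hypothetical stack-access opcodes; the tag test and forwarding function here are placeholders, not the real scheme:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { STACK_WORDS = 8 };
static uint32_t vmstack[STACK_WORDS];  /* stands in for "read/write word from stack" */

/* Assume heap pointers carry tag 01 in the low bits (illustrative only). */
static int is_heap_pointer(uint32_t w) { return (w & 3) == 1; }

/* Placeholder for relocating an object and returning its new address. */
static uint32_t forward(uint32_t w) { return w + 0x100; }

/* The crucial property: the scan walks every word from the start of the
   stack to the end, so stack addresses cannot be opaque. */
static void gc_scan_stack(size_t n_words) {
    for (size_t i = 0; i < n_words; i++) {
        if (is_heap_pointer(vmstack[i]))
            vmstack[i] = forward(vmstack[i]);
    }
}
```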

It's even simpler if the stack is mapped into RAM. Glulxe implements
the stack as a single block. It seems to me that it would be a simple
matter to put this at the end of RAM and define the particular layout
used by Glulxe as the architectural specification (or clean it up a
bit first). Ah, this conflicts with @malloc. Well, in that case put
the stack at addresses -stacksize to -1 (that is, the top end of
memory). Glulxe can still allocate the whole lot as a single block;
memmap would point into the interior of a big block of memory.
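A sketch of that layout in C (names invented for illustration; glulxe's actual code differs):

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* One allocation holding the stack followed by main memory, with
   memmap pointing into the interior so stack addresses run from
   -stacksize to -1 and ordinary RAM starts at 0. */
typedef struct {
    unsigned char *block;    /* the single real allocation */
    unsigned char *memmap;   /* VM address 0 */
    int32_t stacksize;
} layout_t;

static int layout_init(layout_t *l, int32_t stacksize, int32_t memsize) {
    l->block = calloc((size_t)stacksize + (size_t)memsize, 1);
    if (!l->block)
        return -1;
    l->stacksize = stacksize;
    l->memmap = l->block + stacksize;  /* interior pointer */
    return 0;
}

/* Addresses -stacksize..-1 land in the stack; 0..memsize-1 in RAM. */
static unsigned char *vm_byte(layout_t *l, int32_t addr) {
    return l->memmap + addr;
}
```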

> This wouldn't let you write code which correctly handled both stack
> pointers and RAM pointers. But if you knew which you had, you could
> write code to deal with it. And it would let you do your
> garbage-collector.

If I needed stack pointers in this scenario then I'd tag them so I
could distinguish; but I don't _need_ them. In Lisp,
(declare (dynamic-extent ...)) provides an opportunity to
stack-allocate some objects, but I'd probably only bother if the
stack was memory mapped.

With the provisos above, yes, something along the lines of those
opcodes would be sufficient to do the GC.

The suggestion strikes me as just barely sufficient for GC and almost
completely useless for stack allocated objects (which I don't want,
but didn't you say that Inform 7 could potentially make good use of
them?). If code to access a stack allocated object and code to access
a heap allocated object are different then it makes stack allocated
objects not very useful. Surely the whole point is that I can pass an
object pointer to a function and it can manipulate the object without
caring whether the object is on the stack or statically allocated?
But, I still don't really need stack-allocated objects.

drj

Tor Andersson

Feb 27, 2007, 8:01:43 AM
On Feb 27, 9:50 am, d...@pobox.com wrote:
> On Feb 27, 7:17 am, Andrew Plotkin <erkyr...@eblong.com> wrote:
>
> > Here, greg <g...@cosc.canterbury.ac.nz> wrote:
> > > Andrew Plotkin wrote:
> > > > That would mean putting the stack in RAM, which is
> > > > not what I'm thinking about at the moment.
>
> > > Is there any reason *not* to put the stack in RAM?
>
> > Well, apparently it would slow down games by a factor of four.
>
> I think you misinterpret my experiment. The thing that slows down my
> Lisp version of tak isn't particularly having the stack in RAM, it's
> that having got the stack in RAM I can no longer use the really short
> (and presumably fast) opcodes and operand modes for accessing the
> Glulx VM stack directly. Essentially every load/store to a stack
> location becomes an aload or an astore. And if I need to move data
> from one stack location to another then that's an aload/astore pair.

I'm going to pipe in here and agree. It's the extra instructions
required to move data to and from the stack space in RAM that costs,
because when using the native glulx stack the instructions are
extremely concise.

Here's an experiment with a toy compiler I wrote for Glulx:

tak ( x, y, z )
{
    if ( ! ( y < x ) )
        return z;
    return
        tak( tak(1-x, y, z),
             tak(1-y, z, x),
             tak(1-z, x, y));
}

Turns into this assembly code:

:_tak
dc.b 193 4 3 0 0
jlt a4 a0 :L0
return a8
:L0
sub 1 a8 sp
callfiii _tak sp a0 a4 sp
sub 1 a4 sp
callfiii _tak sp a8 a0 sp
sub 1 a0 sp
callfiii _tak sp a4 a8 sp
callfiii _tak sp sp sp sp
return sp

And then, when using a manual stack with pushing and popping
arguments on a global stack object.

memory stk[2000];
global top = 32; // first 32 are temps

tak ( )
{
    stk[0] = stk[top--]; // pop z
    stk[1] = stk[top--]; // pop y
    stk[2] = stk[top--]; // pop x

    if ( ! ( stk[1] < stk[2] ) )
    {
        stk[++top] = stk[0]; // push return value
        return;
    }

    stk[3] = 1 - stk[2];
    stk[++top] = stk[3];
    stk[++top] = stk[1];
    stk[++top] = stk[0];
    tak(); // leave result on stack

    stk[3] = 1 - stk[1];
    stk[++top] = stk[3];
    stk[++top] = stk[0];
    stk[++top] = stk[2];
    tak(); // leave result on stack

    stk[3] = 1 - stk[0];
    stk[++top] = stk[3];
    stk[++top] = stk[2];
    stk[++top] = stk[1];
    tak(); // leave result on stack

    tak();
}

:_tak
dc.b 193 0 0
copy *_top sp
sub *_top 1 *_top
aload _stk sp sp
astore _stk 0 sp
copy *_top sp
sub *_top 1 *_top
aload _stk sp sp
astore _stk 1 sp
copy *_top sp
sub *_top 1 *_top
aload _stk sp sp
astore _stk 2 sp
aload _stk 2 sp
aload _stk 1 sp
jlt sp sp :L0
aload _stk 0 sp
add *_top 1 *_top
copy *_top sp
astore _stk sp sp
return 0
:L0
aload _stk 2 sp
sub 1 sp sp
astore _stk 3 sp
aload _stk 3 sp
add *_top 1 *_top
copy *_top sp
astore _stk sp sp
aload _stk 1 sp
add *_top 1 *_top
copy *_top sp
astore _stk sp sp
aload _stk 0 sp
add *_top 1 *_top
copy *_top sp
astore _stk sp sp
callf _tak 0
aload _stk 1 sp
sub 1 sp sp
astore _stk 3 sp
aload _stk 3 sp
add *_top 1 *_top
copy *_top sp
astore _stk sp sp
aload _stk 0 sp
add *_top 1 *_top
copy *_top sp
astore _stk sp sp
aload _stk 2 sp
add *_top 1 *_top
copy *_top sp
astore _stk sp sp
callf _tak 0
aload _stk 0 sp
sub 1 sp sp
astore _stk 3 sp
aload _stk 3 sp
add *_top 1 *_top
copy *_top sp
astore _stk sp sp
aload _stk 2 sp
add *_top 1 *_top
copy *_top sp
astore _stk sp sp
aload _stk 1 sp
add *_top 1 *_top
copy *_top sp
astore _stk sp sp
callf _tak 0
callf _tak 0
return 0

That's roughly a factor four increase in the number of instructions.

Tor

d...@pobox.com

Feb 27, 2007, 8:42:56 AM

Not to mention the fact that you're still using the Glulx VM stack for
function call/return; avoiding the stack for that causes even more bloat.

Nice example.

drj

Andrew Plotkin

Feb 27, 2007, 11:43:38 AM
Here, Tor Andersson <tor.an...@gmail.com> wrote:
>
> I'm going to pipe in here and agree. It's the extra instructions
> required to move data to and from the stack space in RAM that costs,
> because when using the native glulx stack the instructions are
> extremely concise.

Conciseness is related to speed, but it's not the same thing.

I take your points. I still think that a RAM stack has an inherent
speed hit, because every stack access then has to be read
byte-by-byte. And I still think you're not considering the question of
what the VM can do to help you *without* changing the current
architecture. Which, as I said, I am not considering at the present
time.

For example, what if there was an opcode addressing mode for "indirect
read and increment register"? (This is how I learned to do stack
operations in good old 6502 assembly.)

Glulx doesn't have registers the way the 6502 did, but it strikes me
(with some amusement) that the top of the *native* stack is a pretty
good place for the RAM-stack pointer.

--Z

--
"And Aholibamah bare Jeush, and Jaalam, and Korah: these were the borogoves..."
*

9/11 did change everything. Since 9/12, the biggest threat to American
society has been the American president. I'd call that a change.

Andrew Plotkin

Feb 27, 2007, 11:56:44 AM
Here, Andrew Plotkin <erky...@eblong.com> wrote:
> ... *without* changing the current architecture. Which, as I said, I

> am not considering at the present time.

And if you still want to know why, it's because I would stare
dejectedly at the problem, maybe write a spec sometime this summer,
think about updating the interpreter, and never update the compiler.

I'm trying to think about solutions that you will actually get. The
current malloc functionality is on the right scale -- as witness the
fact that you got it.

Stack-based objects, to go off on another tangent, aren't a
requirement. They're a solution. I think the requirement is "allocate
RAM-mapped objects which don't need to be specifically deallocated",
but that *is* a good use case for the current spec. You're not popping
those on and off the stack every opcode.

Blank

Feb 28, 2007, 6:27:15 AM
d...@pobox.com wrote:

> Oh. Is there a good place to discuss this nargy VM stuff? I mean, so
> the rest of RAIF can get on with discussing the art of creating
> adventure games.
>
> drj
>

Though I don't have enough knowledge to contribute, I'm finding it
interesting looking over everyone's shoulder. The discussion is neatly
contained in one thread so anyone who's not interested can easily skip
it - or choose not to download it if that's an issue.

jz

Michael Martin

Mar 2, 2007, 5:42:34 PM
On Feb 27, 8:43 am, Andrew Plotkin <erkyr...@eblong.com> wrote:
> For example, what if there was an opcode addressing mode for "indirect
> read and increment register"? (This is how I learned to do stack
> operations in good old 6502 assembly.)

If such an addressing mode were to be added, would there be any reason
not to go all-out and use the (Base Register) + (constant step) *
(Index Register) paradigm? While there's something to be said for
Good Old 6502 Assembly, LDA ($02),Y is not my fondest memory of it.
(Fonder than LDA ($02,X), though.)

--Michael

Andrew Plotkin

Mar 2, 2007, 9:15:22 PM

Well, I don't really have registers to play with, other than the
(native) stack...

I was imagining something where you have a value on top of the stack
(e.g., 0x1004). A write in this address mode would store into memory
address 0x1004 and change the top stack value to 0x1008. A read, the
reverse.

You're suggesting having, I think, 0x1000 0x0004 on top of the stack,
and a write would write into 0x1004 and change the upper value to
0x0008?
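A sketch of those semantics in C, with an ordinary variable standing in for the top-of-stack value (names invented for illustration):

```c
#include <assert.h>
#include <stdint.h>

static unsigned char ram[16];
static uint32_t stack_top;  /* models the address sitting on top of the native stack */

/* Glulx RAM is big-endian; read one aligned word. */
static uint32_t ram_read4(uint32_t a) {
    return ((uint32_t)ram[a] << 24) | ((uint32_t)ram[a + 1] << 16)
         | ((uint32_t)ram[a + 2] << 8) | (uint32_t)ram[a + 3];
}

/* The proposed mode: read through the address on top of the stack,
   then advance it to the next word (0x1004 -> 0x1008). */
static uint32_t read_indirect_inc(void) {
    uint32_t v = ram_read4(stack_top);
    stack_top += 4;
    return v;
}
```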

But I haven't thought about whether this would be worth anything to
programs which are (mostly) normal I6 code, compiled by the I6
compiler. Hm. You'd also need to make sure that the @call and @glk
opcodes are invokable. A new compiler could avoid the former, but not
the latter.

Michael Martin

Mar 4, 2007, 11:04:14 PM
On Mar 2, 6:15 pm, Andrew Plotkin <erkyr...@eblong.com> wrote:
> I was imagining something where you have a value on top of the stack
> (e.g., 0x1004). A write in this address mode would store into memory
> address 0x1004 and change the top stack value to 0x1008. A read, the
> reverse.

I completely misread you here, actually; I thought you were suggesting
an addressing mode that included an indexed-stack-peek operation,
similar to the... er... *sound of digging up and checking the
Commodore 64 Programmer's Reference Guide* indirect-indexed addressing
mode.

> You're suggesting having, I think, 0x1000 0x0004 on top of the stack,
> and a write would write into 0x1004 and change the upper value to
> 0x0008?

Based on my misreading, I was actually suggesting that one instead
mimic the astore* instructions when looking down the stack.

I'll have to study the thread more carefully before commenting
further.

--Michael

d...@pobox.com

Mar 7, 2007, 6:04:06 AM

Until I'd implemented it, I thought this was the simplest.

Day 1 (some "Days" may settle in transit, etc):

Simply allocate the interpreter's stack contiguously and immediately
before main memory. Stack now appears at addresses -stacksize through
to -1 (2^32-stacksize to 2^32-1 if you prefer unsigned addresses).
That was easy.

Unfortunately, since the stack is accessed by native C code using
native C access, that means the stack appears swabbed in VM memory on
little-endian architectures (Intel). Aside: The C aliasing rules are
preserved because the glulxe VM always accesses VM memory using a
series of byte accesses.

Day 2:

Swab all VM accesses to its main memory so that from the native C code
in the interpreter an aligned 32-bit integer stored by the VM appears
to be in the correct order for native C access (and more importantly,
vice versa). This turns out to be surprisingly easy to do: every time
you access the array memmap with index i, simply use index (i^3)
instead.

This almost works. All the VM proper works, and you can access the
stack in memory with 32-bit ints coming out the right way round. The
interpreter's access to the stack isn't slowed down at all since it is
still accessing using native C int access.
Unfortunately the Glk interface assumes that it can simply write a
char array directly into VM memory without needing to treat it. Note
that the Glk interface already treats int arrays specially so that
they can be swabbed in and out.

I realise what is going on and feel immensely pleased with myself when
I type " tegttob el" into advent.ulx, and it works.

Day 3:

Swab char arrays coming in and out of Glk. This was pretty easy too
since most of the code for handling arrays that need to be converted
already exists for int arrays.

Result: my new interpreter now plays advent.ulx. As far as I can
tell. And you can access the stack using negative addresses.

Patch in separate post.

drj

d...@pobox.com

Mar 7, 2007, 6:22:09 AM
On Feb 27, 3:24 am, Andrew Plotkin <erkyr...@eblong.com> wrote:
> That would mean putting the stack in RAM, which is
> not what I'm thinking about at the moment.

Here's a patch that can be applied to glulxe-0.4.0. It causes the
stack to appear in RAM at addresses -stacksize to -1. Native access
to the stack is not slowed at all; all VM accesses to memory are
slowed a bit. I haven't measured it, mostly for lack of a suitable
test.

Sit in directory containing the sources (that is, where the Makefile
appears) and apply patch with:
patch -p1 < file-containing-this-post

Summary of patch:
glkop.c: changed to swab in and out char arrays (just like int
arrays). grab_temp_char_array / release_temp_char_array; making
retained_register / unregister work with char arrays.
glulxe.h: Key change. Mem* macros changed to access memory in swabbed
form. SWABA macro (swab address) converts from VM address to native C
address. This is currently hard-wired to work for little-endian
architectures (Intel). For big-endian architectures this macro needs
to be changed to: #define SWABA(addr) (addr).
vm.c: allocate stack and main memory contiguously by allocating them
in a single block. Stack is memory mapped at top-end of memory.
Loading file now goes via MemW1 macro.

diff -ur glulxe/glkop.c glulxe-mapstack/glkop.c
--- glulxe/glkop.c 2004-12-17 04:02:02.000000000 +0000
+++ glulxe-mapstack/glkop.c 2007-03-06 22:32:57.000000000 +0000
@@ -13,10 +13,13 @@
   - We can read or write to a 32-bit integer in VM memory using the macros
     ReadMemory(addr) and WriteMemory(addr), where addr is an address
     taken from the argument list.
-  - A character array is an actual array of bytes somewhere in terp
+  - (A character array is an actual array of bytes somewhere in terp
     memory, whose actual address can be computed by the macro
     AddressOfArray(addr). Again, addr is a VM address from the argument
-    list.
+    list.) This assumption has been eliminated in the swabbing version
+    of Glulx. Character arrays are treated more like integer arrays
+    (see below), they are converted to C native format and back with a
+    pair of macros.
   - An integer array is a sequence of integers somewhere in VM memory.
     The array can be turned into a C integer array by the macro
     CaptureIArray(addr, len), and released by ReleaseIArray().
@@ -52,6 +55,10 @@
(grab_temp_array(addr, len, passin))
#define ReleaseIArray(ptr, addr, len, passout) \
(release_temp_array(ptr, addr, len, passout))
+#define CaptureCArray(addr, len, passin) \
+ (grab_temp_char_array(addr, len, passin))
+#define ReleaseCArray(ptr, addr, len, passout) \
+ (release_temp_char_array(ptr, addr, len, passout))
#define ReadStructField(addr, fieldnum) \
(((addr) == 0xffffffff) \
? (stackptr -= 4, Stk4(stackptr)) \
@@ -83,8 +90,8 @@
} dispatch_splot_t;

 /* We maintain a linked list of arrays being used for Glk calls. It is
-   only used for integer (glui32) arrays -- char arrays are handled in
-   place. It's not worth bothering with a hash table, since most
+   used for integer (glui32) and char arrays.
+   It's not worth bothering with a hash table, since most
    arrays appear here only momentarily. */

typedef struct arrayref_struct arrayref_t;
@@ -136,6 +143,8 @@

 static glui32 *grab_temp_array(glui32 addr, glui32 len, int passin);
 static void release_temp_array(glui32 *arr, glui32 addr, glui32 len, int passout);
+static char *grab_temp_char_array(glui32 addr, glui32 len, int passin);
+static void release_temp_char_array(char *arr, glui32 addr, glui32 len, int passout);
 
 static void prepare_glk_args(char *proto, dispatch_splot_t *splot);
 static void parse_glk_args(dispatch_splot_t *splot, char **proto, int depth,
@@ -468,7 +477,7 @@

   switch (typeclass) {
     case 'C':
-      garglist[gargnum].array = AddressOfArray(varglist[ix]);
+      garglist[gargnum].array = CaptureCArray(varglist[ix], varglist[ix+1], passin);
       gargnum++;
       ix++;
       garglist[gargnum].uint = varglist[ix];
@@ -664,6 +673,7 @@

switch (typeclass) {
case 'C':
+ ReleaseCArray(garglist[gargnum].array, varglist[ix], varglist[ix+1], passout);
gargnum++;
ix++;
gargnum++;
@@ -920,41 +930,62 @@
classes_remove(objclass, obj);
}

-static glui32 *grab_temp_array(glui32 addr, glui32 len, int passin)
+static void *grab_temp_generic_array(glui32 addr,
+ glui32 len,
+ glui32 elemsize)
{
arrayref_t *arref = NULL;
- glui32 *arr = NULL;
- glui32 ix, addr2;
+ void *arr = NULL;

if (len) {
- arr = (glui32 *)glulx_malloc(len * sizeof(glui32));
+ arr = (glui32 *)glulx_malloc(len * elemsize);
arref = (arrayref_t *)glulx_malloc(sizeof(arrayref_t));
if (!arr || !arref)
fatal_error("Unable to allocate space for array argument to Glk call.");

arref->array = arr;
arref->addr = addr;
- arref->elemsize = 4;
+ arref->elemsize = elemsize;
arref->retained = FALSE;
arref->len = len;
arref->next = arrays;
arrays = arref;
+ }

- if (passin) {
- for (ix=0, addr2=addr; ix<len; ix++, addr2+=4) {
- arr[ix] = Mem4(addr2);
- }
+ return arr;
+}
+
+static glui32 *grab_temp_array(glui32 addr, glui32 len, int passin)
+{
+ glui32 *arr = grab_temp_generic_array(addr, len, sizeof(glui32));
+ glui32 ix, addr2;
+
+ if (passin) {
+ for (ix=0, addr2=addr; ix<len; ix++, addr2+=4) {
+ arr[ix] = Mem4(addr2);
}
}
+ return arr;
+}
+
+static char *grab_temp_char_array(glui32 addr, glui32 len, int passin)
+{
+ char *arr = grab_temp_generic_array(addr, len, 1);
+ glui32 ix;

+ if (passin) {
+ for (ix=0; ix<len; ix++) {
+ arr[ix] = Mem1(addr+ix);
+ }
+ }
return arr;
}

-static void release_temp_array(glui32 *arr, glui32 addr, glui32 len, int passout)
+
+static void release_temp_generic_array(void *arr, glui32 addr, glui32 len)
{
arrayref_t *arref = NULL;
arrayref_t **aptr;
- glui32 ix, val, addr2;

if (arr) {
for (aptr=(&arrays); (*aptr); aptr=(&((*aptr)->next))) {
@@ -974,17 +1005,43 @@
*aptr = arref->next;
arref->next = NULL;

- if (passout) {
- for (ix=0, addr2=addr; ix<len; ix++, addr2+=4) {
- val = arr[ix];
- MemW4(addr2, val);
- }
- }
glulx_free(arr);
glulx_free(arref);
}
}

+static void release_temp_array(glui32 *arr, glui32 addr, glui32 len, int passout)
+{
+ if (passout) {
+ glui32 ix, val, addr2;
+
+ for (ix=0, addr2=addr; ix<len; ix++, addr2+=4) {
+ val = arr[ix];
+ MemW4(addr2, val);
+ }
+ }
+
+ release_temp_generic_array(arr, addr, len);
+ return;
+}
+
+static void release_temp_char_array(char *arr, glui32 addr, glui32 len,
+ int passout)
+{
+ if (passout) {
+ glui32 ix;
+
+ for (ix=0; ix<len; ix++) {
+ MemW1(addr+ix, arr[ix]);
+ }
+ }
+
+ release_temp_generic_array(arr, addr, len);
+ return;
+}
+
+
+
gidispatch_rock_t glulxe_retained_register(void *array,
glui32 len, char *typecode)
{
@@ -992,8 +1049,8 @@
arrayref_t *arref = NULL;
arrayref_t **aptr;

- if (typecode[4] != 'I' || array == NULL) {
- /* We only retain integer arrays. */
+ if ((typecode[4] != 'I' && typecode[4] != 'C') || array == NULL) {
+ /* We only retain integer and character arrays. */
rock.ptr = NULL;
return rock;
}
@@ -1005,8 +1062,12 @@
arref = *aptr;
if (!arref)
fatal_error("Unable to re-find array argument in Glk call.");
- if (arref->elemsize != 4 || arref->len != len)
+ if ((typecode[4] == 'I' && arref->elemsize != 4) ||
+ (typecode[4] == 'C' && arref->elemsize != 1) ||
+ arref->len != len)
+ {
fatal_error("Mismatched array argument in Glk call.");
+ }

arref->retained = TRUE;

@@ -1021,8 +1082,8 @@
arrayref_t **aptr;
glui32 ix, addr2, val;

- if (typecode[4] != 'I' || array == NULL) {
- /* We only retain integer arrays. */
+ if ((typecode[4] != 'I' && typecode[4] != 'C') || array == NULL) {
+ /* We only retain integer and character arrays. */
return;
}

@@ -1037,12 +1098,23 @@
fatal_error("Mismatched array reference in Glk call.");
if (!arref->retained)
fatal_error("Unretained array reference in Glk call.");
- if (arref->elemsize != 4 || arref->len != len)
+ if ((typecode[4] == 'I' && arref->elemsize != 4) ||
+ (typecode[4] == 'C' && arref->elemsize != 1) ||
+ arref->len != len)
+ {
fatal_error("Mismatched array argument in Glk call.");
+ }

- for (ix=0, addr2=arref->addr; ix<arref->len; ix++, addr2+=4) {
- val = ((glui32 *)array)[ix];
- MemW4(addr2, val);
+ if (typecode[4] == 'I') {
+ for (ix=0, addr2=arref->addr; ix<arref->len; ix++, addr2+=4) {
+ val = ((glui32 *)array)[ix];
+ MemW4(addr2, val);
+ }
+ } else if (typecode[4] == 'C') {
+ addr2=arref->addr;
+ for (ix=0; ix<len; ix++) {
+ MemW1(addr2+ix, ((char *)array)[ix]);
+ }
}
glulx_free(array);
glulx_free(arref);
diff -ur glulxe/glulxe.h glulxe-mapstack/glulxe.h
--- glulxe/glulxe.h 2004-12-14 01:30:19.000000000 +0000
+++ glulxe-mapstack/glulxe.h 2007-03-07 09:09:12.000000000 +0000
@@ -48,13 +48,59 @@
#define Write1(ptr, vl) \
(((unsigned char *)(ptr))[0] = (vl))

+/* A non-obvious fact that we rely on (in the swabbing memory
+ * implementation) is that the Mem macros and the Read/Write macros are
+ * never used to access the same piece of memory (except that right
+ * here, the Mem macros are defined in terms of the Read/Write macros).
+ * In other words Mem can access memory in whatever format it likes and
+ * as long as it's all consistent, that's fine. The Mem macros do not
+ * have to be consistent with Read/Write.
+ * Read/Write _do_ have to be big-endian; they're used to read and write
+ * values from external files (save files, game files).
+ */
+#if 0
#define Mem1(adr) (Read1(memmap+(adr)))
#define Mem2(adr) (Read2(memmap+(adr)))
#define Mem4(adr) (Read4(memmap+(adr)))
#define MemW1(adr, vl) (Write1(memmap+(adr), (vl)))
#define MemW2(adr, vl) (Write2(memmap+(adr), (vl)))
#define MemW4(adr, vl) (Write4(memmap+(adr), (vl)))
+#endif /* 0 */

+/* Converts big-endian to little-endian (and vice-versa). Really it
+ * just converts a byte-pointer that points to a big-endian object to a
+ * byte-pointer that points to a little-endian object. */
+#define SWABA(addr) ((addr)^3)
+
+#define Roff4(base, index) \
+ ( ((glui32)(((unsigned char *)(base))[SWABA(index+0)]) << 24) \
+ | ((glui32)(((unsigned char *)(base))[SWABA(index+1)]) << 16) \
+ | ((glui32)(((unsigned char *)(base))[SWABA(index+2)]) << 8) \
+ | ((glui32)(((unsigned char *)(base))[SWABA(index+3)])) )
+#define Roff2(base, index) \
+ ( ((glui16)(((unsigned char *)(base))[SWABA(index+0)]) << 8) \
+ | ((glui16)(((unsigned char *)(base))[SWABA(index+1)])) )
+#define Roff1(base, index) \
+ (((unsigned char *)(base))[SWABA(index)])
+
+#define Woff4(base, index, value) \
+ ( ((base)[SWABA(index+0)] = (unsigned char)(((glui32)(value)) >> 24)), \
+ ((base)[SWABA(index+1)] = (unsigned char)(((glui32)(value)) >> 16)), \
+ ((base)[SWABA(index+2)] = (unsigned char)(((glui32)(value)) >> 8)), \
+ ((base)[SWABA(index+3)] = (unsigned char)(((glui32)(value)))) )
+#define Woff2(base, index, value) \
+ ( ((base)[SWABA(index+0)] = (unsigned char)(((glui32)(value)) >> 8)), \
+ ((base)[SWABA(index+1)] = (unsigned char)(((glui32)(value)))) )
+#define Woff1(base, index, value) \
+ (((unsigned char *)(base))[SWABA(index)] = (value))
+
+#define Mem1(addr) (Roff1(memmap, addr))
+#define Mem2(addr) (Roff2(memmap, addr))
+#define Mem4(addr) (Roff4(memmap, addr))
+#define MemW1(addr, value) (Woff1(memmap, addr, value))
+#define MemW2(addr, value) (Woff2(memmap, addr, value))
+#define MemW4(addr, value) (Woff4(memmap, addr, value))
+
/* Macros to access values on the stack. These *must* be used
with proper alignment! (That is, Stk4 and StkW4 must take
addresses which are multiples of four, etc.) If the alignment
diff -ur glulxe/vm.c glulxe-mapstack/vm.c
--- glulxe/vm.c 2004-12-17 04:02:58.000000000 +0000
+++ glulxe-mapstack/vm.c 2007-03-05 14:03:50.000000000 +0000
@@ -85,16 +85,11 @@
/* Allocate main memory and the stack. This is where memory allocation
errors are most likely to occur. */
endmem = origendmem;
- memmap = (unsigned char *)glulx_malloc(origendmem);
- if (!memmap) {
- fatal_error("Unable to allocate Glulx memory space.");
- }
- stack = (unsigned char *)glulx_malloc(stacksize);
+ stack = (unsigned char *)glulx_malloc(stacksize + origendmem);
if (!stack) {
- glulx_free(memmap);
- memmap = NULL;
- fatal_error("Unable to allocate Glulx stack space.");
+ fatal_error("Unable to allocate Glulx memory space.");
}
+ memmap = stack + stacksize;
stringtable = 0;

/* Initialize various other things in the terp. */
@@ -110,10 +105,7 @@
*/
void finalize_vm()
{
- if (memmap) {
- glulx_free(memmap);
- memmap = NULL;
- }
+ memmap = NULL;
if (stack) {
glulx_free(stack);
stack = NULL;
@@ -144,7 +136,7 @@
}
if (lx >= protectstart && lx < protectend)
continue;
- memmap[lx] = res;
+ MemW1(lx, res);
}
for (lx=endgamefile; lx<origendmem; lx++) {
memmap[lx] = 0;
@@ -176,7 +168,7 @@
glui32 change_memsize(glui32 newlen)
{
long lx;
- unsigned char *newmemmap;
+ unsigned char *newp;

if (newlen == endmem)
return 0;
@@ -191,12 +183,13 @@
if (newlen & 0xFF)
fatal_error("Can only resize Glulx memory space to a 256-byte boundary.");

- newmemmap = (unsigned char *)glulx_realloc(memmap, newlen);
- if (!newmemmap) {
+ newp = (unsigned char *)glulx_realloc(stack, stacksize + newlen);
+ if (!newp) {
/* The old block is still in place, unchanged. */
return 1;
}
- memmap = newmemmap;
+ stack = newp;
+ memmap = stack + stacksize;

if (newlen > endmem) {
for (lx=endmem; lx<newlen; lx++) {

Cassy Palop

Mar 7, 2007, 6:33:44 AM
First of all, thanks for your great solution.

Now, one question: the patch that you posted seems to be incomplete,
doesn't it?

It abruptly ends here:

> if (newlen > endmem) {
> for (lx=endmem; lx<newlen; lx++) {

Am I wrong? I am not sure if such "sudden end" is due to the google
groups reader.

Thanks in advance
Cassy

d...@pobox.com

Mar 7, 2007, 7:30:30 AM
On Mar 7, 11:33 am, "Cassy Palop" <cassandrapa...@gmail.com> wrote:
> First of all, thanks for your great solution.

Umm. You're welcome. Though if I thought there was any danger of
anyone using this code... I would've littered it with the usual
disclaimers: not designed, not documented, not tested. Not fit for
any use. Proof of concept only.

Are you actually interested in using it?


>
> Now, one question: the patch that you posted seems to be incomplete,
> doesn't it?

Well, it may seem that way, but it isn't.


>
> It abruptly ends here:
>
> > if (newlen > endmem) {
> > for (lx=endmem; lx<newlen; lx++) {
>
> Am I wrong? I am not sure if such "sudden end" is due to the google
> groups reader.

It ends like that.

However...

There are more serious problems. Google, or someone, seems to have
wrapped my post. Which for a patch is just a horrible death. Sorry
about that.

People used to post patches to usenet _all the time_, the internet was
built using them. *sigh*

I can e-mail you the patch if you want.

drj

Cassy Palop

Mar 7, 2007, 9:55:13 AM
Hi

> Are you actually interested in using it?

Sure. Inform 7 and Emily Short's "Floatpoint" recently spawned my
interest in the 32-bit virtual machine for IF dreamt up by Andrew
Plotkin.

> There are more serious problems. Google, or someone, seems to have
> wrapped my post. Which for a patch is just a horrible death. Sorry
> about that.

I noticed the side effect. (How unseemly of Google!). However I
finally managed to manually unwrap the gift (though in an incomplete
fashion). :)

> I can e-mail you the patch if you want.

That would be so kind of you.

Thanks in advance,
Cassy


Andrew Plotkin

Mar 7, 2007, 2:10:30 PM
Here, Cassy Palop <cassand...@gmail.com> wrote:
> Hi
>
> > Are you actually interested in using it?
>
> Sure. Inform 7 and Emily Short's "Floatpoint" recently spawned my
> interest in the 32-bit virtual machine for IF dreamt up by Andrew
> Plotkin.

At this point I must put on my spec hat and say that, while I
appreciate the proof of the concept, it is not something that I intend
to add to Glulx.

Rumors that I have uncorked the flying ninja lemurs and given them
DRJ's name and blood type are entirely absurd.

--Z

--
"And Aholibamah bare Jeush, and Jaalam, and Korah: these were the borogoves..."
*

If the Bush administration hasn't thrown you in military prison
without trial, it's for one reason: they don't feel like it. Not
because of the Fifth Amendment.

d...@pobox.com

Mar 7, 2007, 4:50:28 PM
On Mar 7, 7:10 pm, Andrew Plotkin <erkyr...@eblong.com> wrote:
> At this point I must put on my spec hat and say that, while I
> appreciate the proof of the concept, it is not something that I intend
> to add to Glulx.

Tell you what then, why don't we go back to the original plan where I
ignore all possible changes to the glulx VM, because they're in the
future anyway, and you don't ask what possible new VM features people
might want. It was much simpler that way.

drj

Andrew Plotkin

Mar 7, 2007, 5:57:53 PM

This is why I said in my original question that I wasn't planning to
add a memory-mapped stack.

--Z

--
"And Aholibamah bare Jeush, and Jaalam, and Korah: these were the borogoves..."
*
If the Bush administration hasn't thrown you in military prison without trial,

it's for one reason: they don't feel like it. Not because you're an American.

d...@pobox.com

Mar 9, 2007, 5:03:18 AM
On Feb 27, 4:43 pm, Andrew Plotkin <erkyr...@eblong.com> wrote:

> Here, Tor Andersson <tor.anders...@gmail.com> wrote:
>
>
>
> > I'm going to pipe in here and agree. It's the extra instructions
> > required to move data to and from the stack space in RAM that costs,
> > because when using the native glulx stack the instructions are
> > extremely concise.
>
> Conciseness is related to speed, but it's not the same thing.
>
> I take your points. I still think that a RAM stack has an inherent
> speed hit, because every stack access then has to be read
> byte-by-byte.

My proof of concept was mostly motivated by this assertion. A memory
mapped stack doesn't have to be read byte-by-byte if VM memory is
swabbed so that it is stored the same way round as the stack. Which
is what I did. Now I just need to measure the slow-down, but I don't
believe it will be very significant because VM memory is already
accessed byte-by-byte, and that hasn't changed.

> And I still think you're not considering the question of
> what the VM can do to help you *without* changing the current
> architecture. Which, as I said, I am not considering at the present
> time.

To be perfectly honest, I have a GC requirement, and I have a solution.
It works now, on all currently deployed Glulx interpreters (that meet
the spec). I'm not particularly interested in changing Glulx,
particularly not in ad-hoc ways. I was originally just doing some
measurement. I didn't criticise Glulx, and I didn't set out
suggesting ways to change Glulx.

drj

d...@pobox.com

Mar 15, 2007, 5:22:22 AM
On Feb 27, 3:24 am, Andrew Plotkin <erkyr...@eblong.com> wrote:
> So... if the Glulx spec were to include more direct stack access
> opcodes, would it do you any good?
>
> The "full stack scenario" is where you want to allocate stack objects,
> of any size, and refer to them through pointers that can be intermixed
> with (that is, distinguished from when found as a value) ordinary
> memory pointers. That would mean putting the stack in RAM, which is
> not what I'm thinking about at the moment. Are there lesser scenarios
> which are still useful to you?
>
> I guess the minimum scenario is a "get stack pointer" opcode, and then
> "read word from stack" and "write word to stack" (which address the
> stack using that pointer value -- i.e., an absolute value, as opposed
> to the down-from-the-top offset which we've currently got.)
>
> This wouldn't let you write code which correctly handled both stack
> pointers and RAM pointers. But if you knew which you had, you could
> write code to deal with it. And it would let you do your
> garbage-collector.

There is another use for a memory mapped stack that I forgot about
earlier: representing closed over variables.

In many languages, ML, Lisp, Lua, functions can be defined inside
other functions, and inner functions can access the variables of the
outer function:

(defun additator (x)
(lambda (y) (+ x y)))

The inner function (an anonymous lambda in this case) accesses the x
variable of additator and this continues to be true even when
additator returns (in Lisp we say that the variable x has lexical
scope, in that only code lexically enclosed by additator can access x,
and indefinite extent, in that the lifetime of the binding is for as
long as it is needed). The outer variables are closed over by the
inner function.

One strategy for implementing this (which I intend to use) is to
represent each closed over variable as a pointer. Whilst additator is
active x is on the stack and the inner lambda's x closure points to
the slot for x on the stack. When additator returns, freeing up the
stack space where x resides, space for x is allocated on the heap and
the inner lambda's closure is changed to point to the heap copy.
Additator can access x just like any other variable, and the inner
lambda pays the cost of one indirection; there's some bookkeeping
cost, primarily when a function (with possible closures) returns.

This is pretty straightforward to do in conventional architectures,
and pretty easy to do in Glulx with a memory mapped stack (either a
synthetic one like I'm currently using, or the built-in stack memory
mapped using my patch). A special "get stack pointer" instruction
would not make the built-in stack very attractive for this case. The
closure, essentially just a pointer, would have to "know" whether it
was pointing to the stack or to the heap (probably a type slot stored
alongside) and code to access a closed over variable would have to
cope with either case. Which would make accessing a closed over
variable much slower and fill more instructions. Probably to the
extent that I wouldn't bother using it.

drj
