trick to improve procedure calls on PDP-11

Jun Woong

Jul 7, 2009, 3:59:17 AM

[I'm unsure whether this is the proper group for this question. If not,
please let me know.]

In the paper named "The C Language Calling Sequence," Dennis Ritchie
introduces a trick used to improve procedure calls on PDP-11:

When sp is used to push arguments onto the stack, it must be
readjusted after each call to throw away the arguments. However, if
an extra word is left at the end of the stack, calls with only one
argument need only move this argument [to the] top of the stack, and sp need
not be adjusted after the call. Since many calls have one argument,
this is attractive.

with a caveat:

When using this technique special care must be taken to handle
nested calls and calls with active expression temporaries.

I tried to figure out exactly how it works, but failed. The "Red
Zone" used on x86-64 seems similar, although the "Red Zone" is
useful only in leaf functions. For signals to work on the same
stack, I think the extra word in the trick above must be protected
by sp, but then it is unclear how the caveat should be
understood.

Could anyone shed light on this?

Thanks in advance.

--
Jun, Woong (woong at icu.ac.kr)

Scott Lurndal

Jul 7, 2009, 11:53:37 AM

Jun,

It doesn't apply to the x86 (or x86_64) architectures since the
'leave/ret' instructions will restore the stack pointer to the same value
as it had prior to the call, with all arguments removed (were any
passed on the stack - the x86_64 psABI calling sequence passes the
first six scalar/pointer arguments in registers).

scott

Jun Woong

Jul 8, 2009, 1:52:54 AM

sc...@slp53.sl.home (Scott Lurndal) wrote:
[...]

>
> Jun,
>
> It doesn't apply to the x86 (or x86_64) architectures since the
> 'leave/ret' instructions will restore the stack pointer to the same value
> as it had prior to the call, with all arguments removed (were any
> passed on the stack - the x86_64 psABI calling sequence passes the
> first six scalar/pointer arguments in registers).
>

I knew that; maybe I should have been clearer. I mentioned the "Red
Zone" only because it came to mind when I first read about the
trick explained in the paper. I was not asking whether the trick is used
on x86-64 or other architectures in use today, but how it
worked on the PDP-11 and how it relates to the caveat DMR gave.

Thanks.

Scott Lurndal

Jul 8, 2009, 1:26:28 PM

Jun,

You may try your query at alt.folklore.computers, DMR hangs out there.

I suspect that the calling function would stash the parameter in the
reserved word then call the callee function. The callee function
would extract the argument from the reserved word and use it. Note
that the callee must preserve the word locally (in a register or on the
stack) if the callee calls other functions that use the same calling
sequence, which may mean only leaf functions can really take advantage
of the performance boost obtained by this technique.

Not too different, really, from passing arguments in registers; here
the 'register' is just a reserved location on the stack.

scott

GPS

Jul 8, 2009, 5:12:44 PM

Jun Woong wrote:

[comp.compilers might be more suitable, depending on how this develops]

Based on my experience with compilers I think he is talking about stack
frame setup.

The idea is that if you allocate space on the stack for the variables, and
any other state required for the frame, such as for spilling registers
(potentially), alignment, etc. you need to change the stack pointer.

What I believe he is proposing is directly using the TOS (top of stack)
indicated by the sp (stack pointer), so that you don't need to subtract from
sp to create the stack frame, or add to sp to cleanup the stack frame.

So if you have a function defined like so:
int foo(int r) { return r; }

You could generate code like this (pseudo-asm):
foo:
1. sub $4,sp
2. mov 0(pfp),0(sp)
3. mov 0(sp),res
4. add $4,sp
5. ret

1 allocates a stack frame. $4 indicates a literal 4, as opposed to an
address 4, that might generate a different sub potentially (depending on the
assembler and architecture).
2 moves the parameter into the stack frame and stores it in variable r's
space.
3 moves the variable r's value into the res (result) register.
4 cleans up the stack frame.
5 returns to the caller.

So, you could also do this:
foo:
1. mov 0(pfp),0(sp)
2. mov 0(sp),res
3. ret

So, in this case, I think that is similar to what DMR is talking about.

In the C code I gave, there is little need for 2. This could be done
instead:

foo:
1. mov 0(pfp),res
2. ret

Improving performance involves removing indirection. In some cases the
variables of a function may be entirely passed and stored in CPU registers,
so the need to move data to and from RAM is eliminated. The Sparc RISC
architecture takes advantage of this via register windows. It has been some
years since I wrote Sparc assembly, but from what I recall the output
registers become input registers after a call instruction, and there are
many more registers for locals.

-GPS

GPS

Jul 8, 2009, 5:16:45 PM

GPS wrote:

It just occurred to me: you would most likely need a temporary register
here.

Most processors I have used don't provide memory-to-memory mov
instructions.

Jun Woong

Jul 10, 2009, 12:44:12 AM

GPS <georg...@xmission.com> wrote:
[...]

>
> [comp.compilers might be more suitable, depending on how this develops]
>

Yes, I think this question is getting off-topic here.

> Based on my experience with compilers I think he is talking about stack
> frame setup.
>
> The idea is that if you allocate space on the stack for the variables, and
> any other state required for the frame, such as for spilling registers
> (potentially), alignment, etc. you need to change the stack pointer.
>
> What I believe he is proposing is directly using the TOS (top of stack)
> indicated by the sp (stack pointer), so that you don't need to subtract from
> sp to create the stack frame, or add to sp to cleanup the stack frame.
>
> So if you have a function defined like so:
> int foo(int r) { return r; }
>
> You could generate code like this (pseudo-asm):
> foo:
> 1. sub $4,sp
> 2. mov 0(pfp),0(sp)

I think you meant:

mov 0(pfp),0(sp)
sub $4,sp

to match the optimized code given below.

> 3. mov 0(sp),res
> 4. add $4,sp
> 5. ret
>
> 1 allocates a stack frame. $4 indicates a literal 4, as opposed to an
> address 4, that might generate a different sub potentially (depending on the
> assembler and architecture).
> 2 moves the parameter into the stack frame and stores it in variable r's
> space.
> 3 moves the variable r's value into the res (result) register.
> 4 cleans up the stack frame.
> 5 returns to the caller.
>
> So, you could also do this:
> foo:
> 1. mov 0(pfp),0(sp)
> 2. mov 0(sp),res
> 3. ret
>
> So, in this case, I think that is similar to what DMR is talking about.
>

The problem is that, according to DMR, the stack frame has the status
of the caller saved *below* the incoming argument. That is,

incoming arg 1
saved status (incl. old fp)
--------------------------- fp
locals and temps (if any)
--------------------------- sp

where the stack grows toward the bottom.

(You can see the HTML version of the paper in question at

http://cm.bell-labs.com/cm/cs/who/dmr/clcs.html

The stack frame for PDP-11 is given in the "Good Strategy" section and
the trick being discussed under the "A Five-per-cent Digression"
section.)

Putting arguments into the frame is done by the caller and saving the
caller's status is done by the callee on PDP-11.

Saving the caller's status below the incoming arguments suggests to me that
the callee has to adjust sp anyway, whereas it doesn't have to in your
example.

I agree that the trick in question works (more precisely, improves
performance) only when sp need not be adjusted across calls. The
status saved below the incoming argument, however, makes me wonder
whether there is ever a case where omitting the adjustment of sp is
possible.

> In the C code I gave, there is little need for 2. This could be done
> instead:
>
> foo:
> 1. mov 0(pfp),res
> 2. ret
>
> Improving performance involves removing indirection. In some cases the
> variables of a function may be entirely passed and stored in CPU registers,
> so the need to move data to and from RAM is eliminated. The Sparc RISC
> architecture takes advantage of this via register windows. It has been some
> years since I wrote Sparc assembly, but from what I recall the output
> registers become input registers after a call instruction, and there are
> many more registers for locals.
>

Thanks for your kind explanation.

Jun Woong

Jul 10, 2009, 12:54:06 AM

sc...@slp53.sl.home (Scott Lurndal) wrote:
[...]
>
> Jun,
>

> You may try your query at alt.folklore.computers, DMR hangs out there.
>

Thanks for the information. It seems better to post the question
there hoping DMR sees it; unfortunately, the last posting to Usenet
by DMR is dated Dec 2008.