
Safety and security from attack vectors


MitchAlsup

unread,
Jun 29, 2022, 3:19:01 PM6/29/22
to
This thread addresses how architectures can make themselves more
secure from various attack strategies.
<
Everyone should chime in with their favorite methods and apparatuses
<
My 66000 has a Safe Stack mode (which I am contemplating making
the only execution mode).
<
In this mode, there is a stack where return addresses are stored and loaded
and where preserved registers are stored and loaded.
<
The stack pointer to this stack is not in the register file <per se> but
is a control register accessible with GuestOS privilege.
<
The stack is translated with PTEs where RWE = 000, so normal LDs
and STs will page fault.
<
The stack is known to be used as a strict stack, so one can add new
cache lines with allocate and not bother with coherence traffic; similarly,
when the stack area is deallocated, modified lines do not have to be
written into the memory hierarchy, improving performance, since by then
they contain only garbage.
<
The Safe Stack is only accessed by CALL instructions, which push
the return address onto the stack, and by the ENTER instruction, which
pushes preserved registers on the Safe Stack, pushes non-preserved
registers on the regular stack, and allocates the Local_data_area on the
regular stack {SP = R31}. EXIT reverses what ENTER did, and if EXIT
touches what would be R0, or if RET is performed, control is transferred
back to the caller (or message sender).
<
Preserved registers pushed on Safe Stack have their values reinitialized
to 0x0 <at least conceptually> so the application cannot even see that
data.
<
This was done in such a way that compiled code does not need to
know whether it is being called from within this thread or from
a different thread; both EXIT and RET will transfer control back to
the appropriate receiver.
>
Buffer overflows can still corrupt application data, but can no longer
corrupt data the compiler believes is safe.
<
Obviously there is no way to corrupt the return address, eliminating
Return-Oriented Programming (ROP) attacks.
<
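To make the class of bug concrete, here is a minimal C sketch (the names
are illustrative, this is not My 66000 code): with a single conventional
stack the overflow below can reach the saved return address; with the
return address held on a Safe Stack that ordinary LDs and STs cannot touch,
the overflow can still clobber neighboring locals but cannot redirect control.

#include <string.h>

/* Classic stack-smashing pattern. On a conventional ABI the saved
   return address sits a few words above 'buf'; under a Safe Stack
   scheme it lives on a separate stack this store cannot reach.     */
void copy_name(const char *untrusted)
{
    char buf[16];
    strcpy(buf, untrusted);   /* unbounded copy: may overflow 'buf',
                                 corrupting application data only    */
}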
--------------------------------------------------------------------------------------------------------------
>
My 66000 defines the last layer of the cache hierarchy as <effectively>
the DRAM cache. This is a place where DRAM can prefetch ahead and
a place where modified lines await being written into DRAM.
>
An application can modify a cache line and flush it toward DRAM hundreds of
times, yet only the last one (or two) of those writes will actually reach
DRAM, preventing Row Hammer attacks without modifying the DRAM chips themselves.
<
---------------------------------------------------------------------------------------------------------------
<
My 66000 directly supports an instruction which branches or calls
through a table, and this table can be in RWE = 001 access space
so the application cannot read or write the table. The branch version
directly supports PIC switch tables, the call version supports method
calls. The important thing here is that the tables are not necessarily
modifiable, and are inherently PIC--containing word offsets from the
current IP.
<
Thus, the compiler can make it extremely difficult to perform
Call-Oriented Programming attacks.
<
------------------Paranoid Applications----------------------------------------------------------
<
There are applications which do not even want the Operating System
to be able to see their data, yet want the OS to be able to provide file
system services to them. The <effective> trap interface
of My 66000 has the ability to allow or to deny access to the caller's
address space.
<
GuestOS can still look at the address, still verify the address and size
are reasonable, and still do all that is necessary to set up the file system
transfer, sending the <opaque> originator virtual address to the <say>
SATA driver. When the SATA driver is ready it performs virtual DMA directly
to/from the paranoid application's address space; GuestOS never gets to see
any of it.
<
Well, that's a start.

Stephen Fuld

unread,
Jun 30, 2022, 12:51:50 AM6/30/22
to
On 6/29/2022 12:18 PM, MitchAlsup wrote:
> This thread addresses how architectures can make themselves more
> secure from various attack strategies.
> <
> Everyone should chime in with their favorite methods and aparatuses

A couple of points.

1. You have mentioned before the steps you take to prevent side channel
attacks such as Spectre, etc. No need to repeat them here, just to
mention that this is another set of attack strategies. And that there
may be other such strategies in the future.

2. As Ivan has pointed out, there is no architectural prevention against
bags of cash or pretty women (or men, if the subject prefers). And no
prevention against the latest breach -

https://www.avclub.com/drunk-usb-drive-amagasaki-japan-data-leak-1849117533

Though these don't relieve the designer of working hard to prevent other
types of attacks.


> <
> My 66000 has a Safe Stack mode (which I am contemplating making
> the only execution mode).

This is generally a good idea, but you have to provide a mechanism for
some programs, e.g. debuggers, to read the stack, and then, of course,
some mechanism to prevent unauthorized use of the first mechanism. :-)


--
- Stephen Fuld
(e-mail address disguised to prevent spam)

Ivan Godard

unread,
Jun 30, 2022, 2:31:50 AM6/30/22
to
This is quite tricky to get right, or at least it has been for us. The
Safe Stack ("control stack" in our parlance) content may belong to
several different contexts and that content may be interleaved depending
on just how state spill works. Not only do you want to avoid needing a
trusted debugger, unless content is aligned in permission region units
it is impossible to give a debugger permission to inspect only content
that its turf has rights to.

That's not much of a problem for us with our byte-sized permission unit,
but it's a real mess when the unit is pages; you have to advance the
whole stack by a page every time you transit to a new domain. Of course,
you can have a trusted stack-inspector, but that suffers not only the
watcher watched problem you mention, but also requires exposing the
structure of the spilled state. It's a mess.

Our solution avoids the problem completely by having a separate control
stack for each thread×turf. However, that's prohibitively costly if you
don't have our patented implicit zero. I suspect that other systems will
continue to use trusted debuggers and separate processes, thereby losing
the security gain of a Control Stack.



BGB

unread,
Jun 30, 2022, 4:03:27 AM6/30/22
to
On 6/29/2022 11:51 PM, Stephen Fuld wrote:
> On 6/29/2022 12:18 PM, MitchAlsup wrote:
>> This thread addresses how architectures can make themselves more
>> secure from various attack strategies.
>> <
>> Everyone should chime in with their favorite methods and aparatuses
>
> A couple of points.
>
> 1.    You have mentioned before the steps you take to prevent side
> channel attacks such as Spectre, etc.  No need to repeat them here, just
> to mention that this is another set of attack strategies.  And that
> there may be other such strategies in the future.
>

IOW: Security is hard...

A few options (anti-injection protections):
1A, ASLR:
Some protection via the power of random numbers;
Very weak if attacker can guess or brute force the randomization;
Generally cheap.
1B, Security Tokens:
Fairly effective against stack-based buffer overflows;
Limited in scope;
Non-zero performance cost.
1C, Non-Executable Stack/Heap:
Can prevent many types of hostile code injections;
Cheap if supported by hardware;
Requires special "RWX memory" for JIT or Lambdas;
1D, Bounds-Checked Pointers:
Can prevent out-of-bounds access as it happens;
Basically, "Capabilities without the memory tagging";
Weaker due to no way to prevent forging.

BJX2 does A/B/C here in the 64-bit ABI, D would require the 128-bit ABI.
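
As an illustration of 1B, a hand-rolled sketch of the stack-token idea in C
(purely illustrative: a real compiler, e.g. with -fstack-protector, chooses
the frame layout and randomizes the token at startup, which source code
cannot control):

#include <stdint.h>
#include <stdlib.h>
#include <string.h>

uint64_t stack_token = 0x2f3a9c4be17d5086ULL;  /* would be randomized at startup */

void handle_input(const char *untrusted)
{
    uint64_t token = stack_token;   /* copy placed among the locals    */
    char buf[32];

    strcpy(buf, untrusted);         /* potential linear overflow       */

    /* An overflow of 'buf' toward the saved state has to walk over
       the token first; a changed token means the frame is corrupt. */
    if (token != stack_token)
        abort();
}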


Protection against non-trusted code:
2A, User/Supervisor Split
Pretty much every OS for the past 25+ years does this;
Depends on the OS enforcing the HAL/Syscalls and other things;
Should not be possible to sidestep the kernel or drivers.
2B, Hypervisor:
Slightly stronger, mostly by adding additional layers.
Can be broken if the code knows how to break the multiple layers.
2C, My VUGID system was originally intended partly for this scenario:
Enforce UID/GID + ACL checking on threads and memory objects;
Drawbacks: Relatively complicated to put into use effectively;
Can offer protection even with known memory addresses;
Similar properties to traditional page-based memory protection.
"Getting ROOT" breaks VUGID though.
2D, Capabilities:
Could offer fairly strong protection, if used correctly;
Relatively expensive to support in hardware and in software;
Tagged memory, 128-bit pointers, ...
If a way exists to steal or forge capabilities:
... the whole system is broken ...
2E, Type-Safety based Security
Similar properties to capabilities
Just with type-safety instead.
Typically requires a language design like Java or C#.
Typically also a GC because manual MM can break it.
...
Doesn't play well with writing code in C or C++.
2F, Very Big ASLR
Make address space very big and addresses hard to guess.


BJX2 (on paper) does A, C, and F.
Potentially, 1D + 2F could do a "poor man's imitation" of 2D.
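
A small C sketch of option 1D (names invented): a "fat pointer" carries its
bounds so every access can be checked, but since the descriptor is ordinary
data nothing stops code from forging or editing one, which is the weakness
noted above.

#include <stddef.h>
#include <stdlib.h>

/* Bounds-checked pointer: a capability minus the tagging, so it can
   catch out-of-bounds accesses but cannot prevent forgery.          */
typedef struct {
    char  *base;
    size_t length;
} fatptr;

static char load_byte(fatptr p, size_t index)
{
    if (index >= p.length)   /* the out-of-bounds access is caught here */
        abort();
    return p.base[index];
}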

Still kinda moot as TestKern still has roughly as much memory
protection going on as MS-DOS...



Issues like Spectre or similar are an entirely different class of issue.
I suspect for a lot of these, a very dumb in-order CPU could be an
advantage (can't as easily "steal" information that isn't present in the
first place).



> 2.    As Ivan has pointed out, there is no architectural prevention
> against bags of cash or pretty women (or men, if the subject prefers).

Well, assuming the person experiences a sense of attraction in some form
other than as an abstracted set of social behaviors; and/or some level
of generalized anxiety about still being alone and similar... (but no
real interest in anything "casual" as this seems like it would be pretty
much pointless in terms of moving towards one's end goals).


They could lure someone with claims like "Hey, want to live rent-free, I
can provide a couch to crash on. There will also be a fridge perpetually
stocked with food and beer.", "Hey man, that is a tempting offer, but
what is the catch?...", but there is always a catch, "aint no free
couch", ...


> And no prevention against the latest breach -
>
> https://www.avclub.com/drunk-usb-drive-amagasaki-japan-data-leak-1849117533
>
> Though these don't relieve the designer of working hard to prevent other
> types of attacks.
>


>
>> <
>> My 66000 has a Safe Stack mode (which I am contemplating making
>> the only execution mode).
>
> This is generally a good idea, but you have to provide a mechanism for
> some programs, e.g. debuggers, to read the stack, and then, of course,
> some mechanism to prevent unauthorized use of the first mechanism.  :-)
>
>

Yep.

Not really an obvious "good" solution within the traditional space.


One possible alternate approach would be a sort of "continuation passing
style" where the "continuation" (control-float graph) and "lexical
binding frame" (local variables and similar) are two
partially-independent constructs. Both would be present as dynamically
allocated structures which, while partially tied together, would be
structurally independent, and would take over the combined role
traditionally assigned to a call-stack. They would be created and
destroyed dynamically under normal control flow, but may be "captured"
into a more persistent form (likely via a linked-list walk, and setting
a "do not destroy when control-flow leaves this frame" flag for
everything along the path).

Some languages are built around such an execution model, but the obvious
drawback is that doing stuff this way is likely to have a "fairly steep"
impact on performance compared with a more traditional "C-like"
execution model.

But, it is possible it could be helped some if the ISA and low-level ABI
were adapted to help with using this model in place of the more
traditional linear stack approach.

There are a few languages around which traditionally assume an execution
model like this, though.
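
A rough C sketch of the shape of that model (all names invented): each
activation record is a heap object linked to its caller, freed when control
leaves it unless something has captured it into a more persistent form.

#include <stdlib.h>

/* Dynamically allocated activation record: control state and lexical
   bindings live here rather than on a linear call stack.             */
struct frame {
    struct frame *caller;    /* dynamic link back toward the root         */
    int           captured;  /* "do not destroy when control leaves" flag */
    /* local variables for this activation would follow */
};

static struct frame *enter_frame(struct frame *caller)
{
    struct frame *f = calloc(1, sizeof *f);
    if (f)
        f->caller = caller;
    return f;
}

static struct frame *leave_frame(struct frame *f)
{
    struct frame *up = f->caller;
    if (!f->captured)        /* captured frames outlive the call (closures) */
        free(f);
    return up;
}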

robf...@gmail.com

unread,
Jun 30, 2022, 6:35:39 AM6/30/22
to
Call target instructions. Calls cause an exception if the target is not a call
target instruction. Prevents jumping into the middle of code. An instruction
like ENTER could be a call target instruction.

Stefan Monnier

unread,
Jun 30, 2022, 11:50:42 AM6/30/22
to
> My 66000 defines the last layer of the cache hierarchy as <effectively>
> the DRAM cache. This is a place where DRAM can prefetch ahead and
> a place where modified lines await being written into DRAM.
>>
> An application can modify a cache line, flush it to DRAM, hundreds of
> times, yet only the last 1 (or 2) lines will be written, preventing ROW
> HAMMER attacks without modifying DRAM chips themselves.

How does this manifest itself in the architecture?


Stefan

MitchAlsup

unread,
Jun 30, 2022, 2:31:52 PM6/30/22
to
On Wednesday, June 29, 2022 at 11:51:50 PM UTC-5, Stephen Fuld wrote:
> On 6/29/2022 12:18 PM, MitchAlsup wrote:
> > This thread addresses how architectures can make themselves more
> > secure from various attack strategies.
> > <
> > Everyone should chime in with their favorite methods and aparatuses
> A couple of points.
>
> 1. You have mentioned before the steps you take to prevent side channel
> attacks such as Spectre, etc. No need to repeat them here, just to
> mention that this is another set of attack strategies. And that there
> may be other such strategies in the future.
>
> 2. As Ivan has pointed out, there is no architectural prevention against
> bags of cash or pretty women (or men, if the subject prefers).
<
In my case, I am still waiting !
<
> And no
> prevention against the latest breach -
>
> https://www.avclub.com/drunk-usb-drive-amagasaki-japan-data-leak-1849117533
>
> Though these don't relieve the designer of working hard to prevent other
> types of attacks.
> > <
> > My 66000 has a Safe Stack mode (which I am contemplating making
> > the only execution mode).
<
> This is generally a good idea, but you have to provide a mechanism for
> some programs, e.g. debuggers, to read the stack, and then, of course,
> some mechanism to prevent unauthorized use of the first mechanism. :-)
<
Any thread which has RWE mappings to those pages can do what it pleases;
it is just that the primary thread cannot (RWE = 000 is checked by HW).

MitchAlsup

unread,
Jun 30, 2022, 2:39:40 PM6/30/22
to
Last layer cache is shared by DRAM not across cores. Topologically,
it can be anywhere on die. Think of it as the multi-MB DRAM R/W
buffer.
<
Every core has uniform access.
<
The DRAM controller decides when a write is appropriate: the row has been
RASed, is about to be CASed, and the bus is electrically pointed in the DRAM
direction. Even here, an arriving line write will update the data before
it is sent to DRAM.
<
It operates a lot like the cache in a SATA drive.
>
> Stefan

Stefan Monnier

unread,
Jun 30, 2022, 4:05:48 PM6/30/22
to
MitchAlsup [2022-06-30 11:39:37] wrote:
>> How does this manifest itself in the architecture?
> Last layer cache is shared by DRAM not across cores. Topologically,
> it can be anywhere on die. Think of it as the multi-MB DRAM R/W
> buffer.

I understand this part, but AFAICT it affects the implementation but not
the architecture. IOW the same could apply to an amd64 processor and
the software would be none the wiser and similarly a My 66000 could also
choose not to use such an implementation.

Or am I missing something?


Stefan

MitchAlsup

unread,
Jun 30, 2022, 4:31:55 PM6/30/22
to
It has been specified as such in architecture to prevent implementations
from placing the LLCache where RowHammer can attack DRAM stability.
That is the only reason for the specification in architecture.
>
> Stefan

Stefan Monnier

unread,
Jun 30, 2022, 4:36:47 PM6/30/22
to
>> I understand this part, but AFAICT it affects the implementation but not
>> the architecture. IOW the same could apply to an amd64 processor and
>> the software would be none the wiser and similarly a My 66000 could also
>> choose not to use such an implementation.
>>
>> Or am I missing something?
>>
> It has been specified as such in architecture to prevent implementations
> from placing the LLCache where RowHammer can attack DRAM stability.
> That is the only reason for the specification in architecture.

I see. So, IIUC, no piece of software should be able to notice the
difference (so an implementation that doesn't obey this principle would
still work just as well), but it's mentioned as a kind of "quality of
implementation" requirement.
Thanks,


Stefan

MitchAlsup

unread,
Jun 30, 2022, 5:42:10 PM6/30/22
to
In effect, yes. Specifies a minimum quality of implementation.
<
Cache updates and TLB updates are also specified as not being performed
until the instruction completes (Spectre).
<
There are a couple of other requirements sprinkled over the specifications
that are also, in effect, minimum quality of implementation requirements.
<
> Thanks,
>
>
> Stefan

EricP

unread,
Jul 1, 2022, 4:08:29 PM7/1/22
to
MitchAlsup wrote:
> <
> Everyone should chime in with their favorite methods and aparatuses
> <
> My 66000 has a Safe Stack mode (which I am contemplating making
> the only execution mode).

I don't think making safe mode mandatory is a good idea because it
means your current ABI definition and in particular its register and
stack manipulations are the only calling interface supported.

Each language has its own requirements, and each compiler
developer needs to be able to tailor their ABI to suit them.
I don't think it is possible to anticipate everyone's current and
future needs. What is optimal for one will not be optimal for all.

What I would think is acceptable is to say something like
"The My66000 ISA is optimized for the ABI defined in Appendix D"
and leave it to OS and compiler developers to decide when it is suitable.


MitchAlsup

unread,
Jul 1, 2022, 4:26:38 PM7/1/22
to
On Friday, July 1, 2022 at 3:08:29 PM UTC-5, EricP wrote:
> MitchAlsup wrote:
> > <
> > Everyone should chime in with their favorite methods and aparatuses
> > <
> > My 66000 has a Safe Stack mode (which I am contemplating making
> > the only execution mode).
<
> I don't think making safe mode mandatory is a good idea because it
> means your current ABI definition and in particular its register and
> stack manipulations are the only calling interface supported.
<
I will try to keep this in mind; and I thought I had.
>
> Each language has its own requirements, and each compiler
> developer needs to be able to tailor their ABI to suit them.
<
What if those languages want to call {C, Fortran, Ada, C++, LISP, ...}
or existing run time libraries, existing system services, ...
There has to be some commonality to do this.
<
> I don't think it is possible to anticipate everyones current and
> future needs. What is optimal for one will not be optimal for all.
<
Can you cite an example of a language* that would want something
significantly different ? and identify what kind of difference that
language might want ?
<
(*) or language feature.
>
> What I would think is acceptable is to say something like
> "The My66000 ISA is optimized for the ABI defined in Appendix D"
> and leave it to OS and compiler developers to decide when it is suitable.
<
It is over in "My 66000 Software Operation", Chapter 4.
{The totality of the documentation became larger than seemed reasonable
in a single document. So, last year I broke it into 2 (then 3) documents.}

EricP

unread,
Jul 1, 2022, 5:48:27 PM7/1/22
to
MitchAlsup wrote:
> On Friday, July 1, 2022 at 3:08:29 PM UTC-5, EricP wrote:
>> MitchAlsup wrote:
>>> <
>>> Everyone should chime in with their favorite methods and aparatuses
>>> <
>>> My 66000 has a Safe Stack mode (which I am contemplating making
>>> the only execution mode).
> <
>> I don't think making safe mode mandatory is a good idea because it
>> means your current ABI definition and in particular its register and
>> stack manipulations are the only calling interface supported.
> <
> I will try to keep this in mind; and I though I had.

Might just be me getting anxious when I feel caged in.

>> Each language has its own requirements, and each compiler
>> developer needs to be able to tailor their ABI to suit them.
> <
> What if those languages want to call {C, Fortran, Ada, C++, LISP, ...}
> or existing run time libraries, existing system services, ...
> There has to be some commonality to do this.

The way many compilers seem to deal with this is with
routine declarations that state which ABI to use.

E.g. MS C and GCC have different x64 ABI's.
If they want to support calls to the others code GCC might allow
external int __MS_stdcall CreateWindow (blah);

> <
>> I don't think it is possible to anticipate everyones current and
>> future needs. What is optimal for one will not be optimal for all.
> <
> Can you cite an example of a language* that would want something
> significantly different ? and identify what kind of difference that
> language might want ?
> <
> (*) or language feature.

For example, Ada has language features that C doesn't support,
such as functions that return variably sized objects like strings.

In one possible implementation the callee allocates space on the stack,
copies in the string return value and puts the address in R0,
restores the preserved registers including FP,
and returns to the caller *WITHOUT* restoring SP.
Caller does what wants with that return string then restores SP.


MitchAlsup

unread,
Jul 1, 2022, 6:13:44 PM7/1/22
to
On Friday, July 1, 2022 at 4:48:27 PM UTC-5, EricP wrote:
> MitchAlsup wrote:
> > On Friday, July 1, 2022 at 3:08:29 PM UTC-5, EricP wrote:
> >> MitchAlsup wrote:
> >>> <
> >>> Everyone should chime in with their favorite methods and aparatuses
> >>> <
> >>> My 66000 has a Safe Stack mode (which I am contemplating making
> >>> the only execution mode).
> > <
> >> I don't think making safe mode mandatory is a good idea because it
> >> means your current ABI definition and in particular its register and
> >> stack manipulations are the only calling interface supported.
> > <
> > I will try to keep this in mind; and I though I had.
> Might just be me getting anxious when I feel caged in.
> >> Each language has its own requirements, and each compiler
> >> developer needs to be able to tailor their ABI to suit them.
> > <
> > What if those languages want to call {C, Fortran, Ada, C++, LISP, ...}
> > or existing run time libraries, existing system services, ...
> > There has to be some commonality to do this.
> The way many compilers seem to deal with this is with
> routine declarations that state which ABI to use.
>
> E.g. MS C and GCC have different x64 ABI's.
> If they want to support calls to the others code GCC might allow
> external int __MS_stdcall CreateWindow (blah);
<
I am trying to nip this in the bud beforehand.
<
> > <
> >> I don't think it is possible to anticipate everyones current and
> >> future needs. What is optimal for one will not be optimal for all.
> > <
> > Can you cite an example of a language* that would want something
> > significantly different ? and identify what kind of difference that
> > language might want ?
> > <
> > (*) or language feature.
<
> For example Ada has language features that C doesn't support
> like functions that return variable sized objects like strings.
<
Arguments and results can be in struct form {multiple DoubleWords}
so one can return {uint64_t cnt; char *string; };
<
The first 8 DoubleWords traverse the boundary in registers, the rest on the
stack.
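<
In plain C terms (an illustrative sketch, not My 66000 code), a
two-DoubleWord result of exactly that shape comes back entirely in
registers under such an ABI:

#include <stdint.h>
#include <string.h>

/* Two DoubleWords of result: count plus pointer, returned by value. */
struct counted_string {
    uint64_t  cnt;
    char     *string;
};

struct counted_string make_result(char *text)
{
    struct counted_string r = { strlen(text), text };
    return r;      /* fits in the register portion of the return area */
}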
>
> In one possible implementation the callee allocates space on the stack,
> copies in the string return value and puts the address in R0,
> restores the preserved registers including FP,
> and returns to the caller *WITHOUT* restoring SP.
> Caller does what wants with that return string then restores SP.
<
Since Ada in particular has very strong (and checked) typing, this seems
to be a case where that short section of code at the return point gets inlined
into the callee just before the Epilogue sequence, and then the ABI is used normally.
{But thanks for the example. I will give it some thought.}

Brett

unread,
Jul 1, 2022, 6:40:42 PM7/1/22
to
"My 66000 Software Operation"

Is this document available on the web?


MitchAlsup

unread,
Jul 1, 2022, 9:38:07 PM7/1/22
to
Email me.

Anton Ertl

unread,
Jul 2, 2022, 2:36:56 AM7/2/22
to
MitchAlsup <Mitch...@aol.com> writes:
>On Friday, July 1, 2022 at 3:08:29 PM UTC-5, EricP wrote:
>> Each language has its own requirements, and each compiler
>> developer needs to be able to tailor their ABI to suit them.
><
>What if those languages want to call {C, Fortran, Ada, C++, LISP, ...}
>or existing run time libraries, existing system services, ...
>There has to be some commonality to do this.

Calls to C typically require special declarations, have significant
restrictions, and are generally a pain in many languages. SWIG is a
big project that exists only to deal with this pain. The reason for
the pain is the impedance mismatch between the languages. And
because it is such a pain, there is also no way to call arbitrary Ada,
C++, Lisp, ... functions, because that would be even more pain. You
can call Fortran (because it's close enough to C) if you know how to
call it as if it was a C function; you can call C++ functions declared
as "C" function, but not other C++ functions.

>Can you cite an example of a language* that would want something
>significantly different ? and identify what kind of difference that
>language might want ?

Functional languages (e.g., the ML family or Haskell) generally want
guaranteed tail-call elimination, which, in general, is incompatible
with the typical C calling convention (that's why gcc only offers
tail-call optimization for "sibcalls", not in general). In LLVM the
intermediate language has functions as a language feature, and
therefore has the ABI baked in, to some extent. There is the option
of defining your own ABI, and IIRC the Haskell people tried to make
use of that, but I read that this was hard going, and I have not read
about anyone else going there.

In Prolog a predicate is called, and it can either fail or succeed,
with or without generating a choice point (possibly in
predicates called from the predicate at hand, not in the predicate
itself). Without choice point, the success is closest to a C return,
but Prolog implementations want tail-call elimination in that case.
If there is a choice point (or several), a subsequent failure
backtracks to the most recent choice point and starts working on the
alternative. Overall the literature has depicted Prolog calls as
having four ports: call, success, backtrack, failure.

Snobol and Icon have success and failure, but I don't know much more
about them.

PAF (currently a paper project)
<https://www.complang.tuwien.ac.at/anton/euroforth/ef13/papers/ertl-paf.pdf>
has PAF definitions and PAF calls (section 3.10) that offer more
liberties than your typical ABI (e.g., tail-call optimization), and
ABI definitions and calls that follow the ABI.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7...@googlegroups.com>

Niklas Holsti

unread,
Jul 2, 2022, 4:06:44 AM7/2/22
to
When an Ada function returns a dynamically-sized object such as a string
of dynamically determined length, we do not want to use heap allocation
for that object. Therefore the ABI must define where that object is
stored, so that the caller can take responsibility for releasing the
storage when no longer needed.

It is not enough for the ABI to specify that the returned value is a
pointer and a length.


>> In one possible implementation the callee allocates space on the stack,
>> copies in the string return value and puts the address in R0,
>> restores the preserved registers including FP,
>> and returns to the caller *WITHOUT* restoring SP.
>> Caller does what wants with that return string then restores SP.


The most widely used Ada compiler, GNAT, uses a "secondary stack" for
dynamically-sized function results and other dynamically-sized local
variables. For such objects the normal or primary stack contains
pointers into the secondary stack, leaving the primary stack frames with
a statically fixed lay-out. Each thread (Ada task) has its own secondary
stack arena that is created when the thread is created. Some systems fix
the size of the secondary stack, which means that it can overflow
(causing an exception); others may let the stack grow as needed.

The top-of-stack pointer for the secondary stack is not reset when a
function returns a value on the secondary stack, so it works as EricP
described, except for not messing with the primary stack pointer.
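
For readers who think in C, a minimal sketch of the secondary-stack idea
(fixed-size arena, invented names; GNAT's actual runtime differs): the
function producing a dynamically sized result bumps the pointer, and the
consumer, not the producer, eventually rewinds it.

#include <stddef.h>

#define SS_SIZE (64 * 1024)
static _Thread_local char   ss_arena[SS_SIZE];  /* one arena per thread (Ada task) */
static _Thread_local size_t ss_top;             /* bytes currently in use          */

void *ss_alloc(size_t n)          /* called by the function returning the object */
{
    n = (n + 15) & ~(size_t)15;   /* keep allocations aligned */
    if (SS_SIZE - ss_top < n)
        return NULL;              /* overflow; GNAT would raise Storage_Error */
    void *p = ss_arena + ss_top;
    ss_top += n;
    return p;
}

size_t ss_mark(void)           { return ss_top; }  /* caller saves the mark before the call */
void   ss_release(size_t mark) { ss_top = mark; }  /* ... and restores it when done          */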


> Since ADA in particular has very strong (and checked) typing. This seems
> to be a case where that short section of code at the return point gets inlined
> into callee just before the Epilogue sequence. And then use the ABI normally.


I don't understand that. How do you "inline" into the /callee/? And
which "short section of code"? When an Ada function returns a
dynamically-sized object, that object can live (in the caller, or the
caller's caller, etc.) for a long time after it is returned.


> {But thanks for the example. I will give it some thought.}


I can provide some real Ada code examples, if you want such.

Thomas Koenig

unread,
Jul 2, 2022, 6:09:14 AM7/2/22
to
Anton Ertl <an...@mips.complang.tuwien.ac.at> schrieb:
> MitchAlsup <Mitch...@aol.com> writes:
>>On Friday, July 1, 2022 at 3:08:29 PM UTC-5, EricP wrote:
>>> Each language has its own requirements, and each compiler
>>> developer needs to be able to tailor their ABI to suit them.
>><
>>What if those languages want to call {C, Fortran, Ada, C++, LISP, ...}
>>or existing run time libraries, existing system services, ...
>>There has to be some commonality to do this.
>
> Calls to C typically require special declarations, have significant
> restrictions, and are generally a pain in many languages. SWIG is a
> big project that exists only to deal with this pain. The reason for
> the pain is the impendance mismatch between the languages. And
> because it is such a pain, there is also no way to call arbitrary Ada,
> C++, Lisp, ... functions, because that would be even more pain. You
> can call Fortran (because it's close enough to C) if you know how to
> call it as if it was a C function; you can call C++ functions declared
> as "C" function, but not other C++ functions.

Not sure what you are referring to here.

"Traditional" Fortran (F77) has no fixed calling convention,
and subtle differences, such as passing hidden string lengths,
do bite people (cf. the recent R/Lapack issues), or the difference
between varargs and non-varargs calls.

Since Fortran 2003, Fortran has a standardized C interop feature,
which was extended with array descriptors on the C side with
Fortran 2018. Unfortunately (and this was a big mistake) the
Fortran standard only specified access macros, and not the
way the relevant structs are ordered. This has led to binary
incompatibility between compilers :-(

>>Can you cite an example of a language* that would want something
>>significantly different ? and identify what kind of difference that
>>language might want ?

Fortran's language definition makes it impossible for a valid
program to determine the way of argument passing. Traditionally,
this would mean pointers (unless VALUE is specified, and even there
some compilers differ in their handling).

What I would like is a "call by reference" for INTENT(INOUT), where
a value is passed in a register, and the changed value is returned
in a register.

Another point is efficient passing of array descriptors (aka
dope vectors). For each dimension, a lower bound, an extent and a
stride have to be passed, plus (for the whole array) a data pointer,
the rank, the type, an attribute saying what exactly it is, and the number
of arrays.

So, a two-dimensional array probably requires eight 64-bit words,
which is traditionally just passed with a pointer to the struct.
A few arguments and you will probably not have the option of passing
all that in registers any more...
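
To put a size on that, here is a C sketch of roughly the information such a
descriptor carries (the Fortran 2018 CFI_cdesc_t only specifies access
macros, so the exact fields and types here are illustrative):

#include <stdint.h>

struct dim_info {
    int64_t lower_bound;
    int64_t extent;
    int64_t stride;              /* spacing between successive elements */
};

struct array_descriptor {
    void    *base_addr;          /* where the data lives                */
    int64_t  elem_len;           /* size of one element                 */
    int32_t  rank;               /* number of dimensions                */
    int32_t  type;               /* element type code                   */
    int32_t  attribute;          /* allocatable / pointer / other       */
    struct dim_info dim[2];      /* one triple per dimension (rank 2)   */
};
/* Already around eight to ten 64-bit words for a rank-2 array, which is
   why the whole thing is passed by reference rather than in registers. */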

Thomas Koenig

unread,
Jul 2, 2022, 6:10:30 AM7/2/22
to
MitchAlsup <Mitch...@aol.com> schrieb:
> On Friday, July 1, 2022 at 4:48:27 PM UTC-5, EricP wrote:
>> MitchAlsup wrote:
>> > On Friday, July 1, 2022 at 3:08:29 PM UTC-5, EricP wrote:
>> >> MitchAlsup wrote:
>> >>> <
>> >>> Everyone should chime in with their favorite methods and aparatuses
>> >>> <
>> >>> My 66000 has a Safe Stack mode (which I am contemplating making
>> >>> the only execution mode).
>> > <
>> >> I don't think making safe mode mandatory is a good idea because it
>> >> means your current ABI definition and in particular its register and
>> >> stack manipulations are the only calling interface supported.
>> > <
>> > I will try to keep this in mind; and I though I had.
>> Might just be me getting anxious when I feel caged in.
>> >> Each language has its own requirements, and each compiler
>> >> developer needs to be able to tailor their ABI to suit them.
>> > <
>> > What if those languages want to call {C, Fortran, Ada, C++, LISP, ...}
>> > or existing run time libraries, existing system services, ...
>> > There has to be some commonality to do this.
>> The way many compilers seem to deal with this is with
>> routine declarations that state which ABI to use.
>>
>> E.g. MS C and GCC have different x64 ABI's.
>> If they want to support calls to the others code GCC might allow
>> external int __MS_stdcall CreateWindow (blah);
><
> I am trying to nip this in the bud before hand.

If (hopefully) your architecture ever gets implemented in silicon,
and Microsoft ports Windows to it, do you think that MS will care?

Anton Ertl

unread,
Jul 2, 2022, 11:46:52 AM7/2/22
to
Thomas Koenig <tko...@netcologne.de> writes:
>Anton Ertl <an...@mips.complang.tuwien.ac.at> schrieb:
>> [...] You
>> can call Fortran (because it's close enough to C) if you know how to
>> call it as if it was a C function [...]
>
>Not sure what you are referring to here.
>
>"Traditional" Fortran (F77) has no fixed calling convention,

Languages in general don't, systems usually do. I was thinking about
the fact that in (old) Unix (when Fortran was still significant to
more than just scientific computing) C and Fortran were interoperable,
and VAX/VMS had some common calling convention that covered several
languages, including Fortran and at some point C.

I also managed to call DGEMM from C in recent years.

You certainly have more experience in that area.

>and subtle differences, such as passing hidden string lengths,
>do bite people (cf. the recent R/Lapack issues), or the difference
>between varargs and non-varargs calls.

Varargs are responsible for much of the messiness in C calling
conventions, including the inability to guarantee tail-call
optimization (thanks to varargs, the caller has to deallocate the
arguments if they land on the stack).

Concerning "hidden string lengths", different string representations
are a typical example of the impedance mismatch between programming
languages that makes calling between languages such a pain.

>Since Fortran 2003, Fortran has a standardized C interop feature,

Ok, that's along the lines of "foreign function interfaces" (really C
interfaces) in other languages.

MitchAlsup

unread,
Jul 2, 2022, 12:55:04 PM7/2/22
to
On Saturday, July 2, 2022 at 10:46:52 AM UTC-5, Anton Ertl wrote:
> Thomas Koenig <tko...@netcologne.de> writes:
> >Anton Ertl <an...@mips.complang.tuwien.ac.at> schrieb:
> >> [...] You
> >> can call Fortran (because it's close enough to C) if you know how to
> >> call it as if it was a C function [...]
> >
> >Not sure what you are referring to here.
> >
> >"Traditional" Fortran (F77) has no fixed calling convention,
> Languages in general don't, systems usually do. I was thinking about
> the fact that in (old) Unix (when Fortran was still significant to
> more than just scientific computing) C and Fortran were interoperable,
> and VAX/VMS had some common calling convention that covered several
> languages, including Fortran and at some point C.
>
> I also managed to call DGEMM from C in recent years.
>
> You certainly have more experience in that area.
> >and subtle differences, such as passing hidden string lengths,
> >do bite people (cf. the recent R/Lapack issues), or the difference
> >between varargs and non-varargs calls.
<
> Varargs are responsible for much of the messiness in C calling
> conventions, including the inability to guarantee tail-call
> optimization (thanks to varargs, the caller has to deallocate the
> arguments if they land on the stack).
<
This depends on the ABI.
<
In the My 66000 ABI the caller is responsible for allocating and deallocating
arguments [9..k] on the stack while the callee is responsible for allocating
and deallocating arguments[1..8] on the stack. Varargs is no different
than non-varargs, here. {Arguments[1..8] are passed in registers, and
only need to be stored onto the stack if this is a varargs subroutine,
so the callee knows if he is varargs (or not) and can do the appropriate
thing.} Once "on the stack" the arguments are a vector in memory which
can be indexed normally.
<
Thus the caller need not know if he is calling a varargs subroutine or not.
The only thing callee needs to do if he is varargs, is to store argument[1..8]
immediately "below" arguments[9..k] on the stack and to initialize a pointer
to that vector. Callee dereferences the vector and increments pointer each
access.
<
The original K&R C macro style varargs is compatible with My 66000 ABI.
Only the varargs subroutine has to know anything about variable argument
lists; the caller remains blissfully ignorant, as does anyone the varargs
subroutine calls.
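<
The portable C view of the same thing: only the variadic callee below
touches <stdarg.h>; the caller makes an ordinary-looking call, and under
the ABI described above va_arg is essentially a pointer walking the
spilled argument vector.

#include <stdarg.h>
#include <stdio.h>

/* Only the callee knows the argument list is variable. */
static long sum(int count, ...)
{
    va_list ap;
    long total = 0;

    va_start(ap, count);             /* point at the argument vector     */
    for (int i = 0; i < count; i++)
        total += va_arg(ap, long);   /* dereference and step the pointer */
    va_end(ap);
    return total;
}

int main(void)
{
    printf("%ld\n", sum(3, 1L, 2L, 3L));   /* caller is blissfully ignorant */
    return 0;
}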

Anton Ertl

unread,
Jul 2, 2022, 2:11:03 PM7/2/22
to
MitchAlsup <Mitch...@aol.com> writes:
>On Saturday, July 2, 2022 at 10:46:52 AM UTC-5, Anton Ertl wrote:
>> Varargs are responsible for much of the messiness in C calling
>> conventions, including the inability to guarantee tail-call
>> optimization (thanks to varargs, the caller has to deallocate the
>> arguments if they land on the stack).
><
>This depends on the ABI.
><
>My 66000 ABI the caller is responsible for allocating and deallocating
>arguments [9..k] on the stack while the callee is responsible for allocating
>and deallocating arguments[1..8] on the stack. Varargs is no different
>than non-varargs, here. {Arguments[1..8] are passed in registers, and
>only need to be stored onto the stack if this is a varargs subroutine.
>so the callee known if he is varargs (or not) and can do the appropriate
>thing.} Once "on the stack" the arguments are a vector in memory which
>can be indexed normally.
><
>Thus caller needs not know if he is calling a varargs subroutine or not.
>The only thing callee needs to do if he is varargs, is to store argument[1..8]
>immediately "below" arguments[9..k] on the stack and to initialize a pointer
>to that vector. Callee dereferences the vector and increments pointer each
>access.

That's fairly typical of modern ABIs, and it's a good approach. It is
compatible with calling a vararg function without prototype, and with
passing more arguments than the function needs. But the fact that the
caller puts arguments on the stack for 9 or more arguments means that
it cannot optimize tail calls to such functions. Can functions with
fewer arguments be tail-call optimized? One answer in
<https://stackoverflow.com/questions/3514283/c-tail-call-optimization>
mentions position-independent code, but I have not followed that
point. Anyway, the devil is in the details.

MitchAlsup

unread,
Jul 2, 2022, 4:16:04 PM7/2/22
to
Where should a caller that is calling a subroutine put the 32 arguments it needs
to pass, if not on the stack?
<
Secondarily, if the tail-call optimizer puts arguments 1..8 in registers, and 9..k
on the stack, then jumping to the entry point should perform the tail-call. It is
obvious that stack arguments[9..k] to the calling subroutine are no longer needed,
so this area of the stack can be reused. The compiler may have to shuffle values
around to properly build the caller's stack arguments, but the process is
straightforward.

MitchAlsup

unread,
Jul 2, 2022, 4:20:58 PM7/2/22
to
On Saturday, July 2, 2022 at 1:11:03 PM UTC-5, Anton Ertl wrote:
Note: My 66000 PIC does not have the problem mentioned at the link.

Thomas Koenig

unread,
Jul 2, 2022, 4:44:46 PM7/2/22
to
MitchAlsup <Mitch...@aol.com> schrieb:
> This thread addresses how architectures can make themselves more
> secure from various attack strategies.
><
> Everyone should chime in with their favorite methods and aparatuses

Regarding stacks:

> My 66000 has a Safe Stack mode (which I am contemplating making
> the only execution mode).

[...]

If the caller does not trust the callee, then it should not pass an
address into its own stack to return arguments. I would like the
possibility to protect the caller's stack against reading/writing by
using pointer overruns in such a case, so there could be a separate
argument return area. This could also be implemented by using a
separate stack (again with protection) or on the same stack as the
original call, with some rearrangement by the compiler if needed.

So, something like

[private memory of the caller, protected]
[read-only memory of the caller]
[read-write memory of the caller]

where the read-only and read-write part could be enforced by
hardware. This scheme could run into limitations if pointers were
passed several levels of callees; maybe a separate "call data stack"
would indeed be better.

Heap management - it might be a good idea to look at what the LLVM
and gcc sanitizers are doing, and how hardware could accelerate
that.

MitchAlsup

unread,
Jul 2, 2022, 5:32:24 PM7/2/22
to
On Saturday, July 2, 2022 at 3:44:46 PM UTC-5, Thomas Koenig wrote:
> MitchAlsup <Mitch...@aol.com> schrieb:
> > This thread addresses how architectures can make themselves more
> > secure from various attack strategies.
> ><
> > Everyone should chime in with their favorite methods and aparatuses
> Regarding stacks:
<
Preface:: To the extent I understand what Thomas is talking about::
<
> > My 66000 has a Safe Stack mode (which I am contemplating making
> > the only execution mode).
> [...]
>
> If the caller does not trust the callee, then it should not pass an
> address into its own stack to return arguments. I would like the
> possibility to protect the caller's stack against reading/writing by
> using pointer overruns in such a case, so there could be a separate
> argument return area. This could also be implemented by using a
> separte stack (again with protection) or on the same stack as the
> original call, with some rearrangement by the compiler if needed.
<
My 66000 has a typical paging protection scheme. If you want intra-page
protection variations you need to put the data in different pages. I worked
for several weeks on various schemes to protect different lines within a
page--but in the end it was all vastly easier to put the Safe Stack in different
pages and use a register to access it that cannot be seen by the lesser
privileged threads.
<
Even with the Mill's more exotic protection mechanism, they still chose to
use different memory areas for their equivalent of Safe Stack.
>
> So, something like
>
> [private memory of the caller, protected]
> [read-only memory of the caller]
> [read-write memory of the caller]
>
Why not write-only memory--memory this subroutine can write that is to be
consumed by another subroutine but not looked at by this one after writing ?
>
> where the read-only and read-write part could be enforced by
> hardware. This scheme could run into limitations if pointers were
> passed several levels of callees; maybe a separate "call data stack"
> would indeed be better.
<
Appears to me you need 4 stacks::
a) normal local_data stack
b) argument sending stack
c) result returning stack
d) stack where compiler data is stored that cannot be accessed by application.
....{preserved registers, return addresses, ... }
<
AND: I don't think anyone is going to go in this direction.
>
> Heap management - it might be a good idea to look at what the LLVM
> and gcc sanitizers are doing, and how hardware could accelerate
> that.
<
I did, field extraction is especially efficient in My 66000 ISA.

Ivan Godard

unread,
Jul 3, 2022, 12:23:19 AM7/3/22
to
Have you thought about varargs calls across protection boundaries?

Have you thought about varargs returns?

Anton Ertl

unread,
Jul 3, 2022, 7:55:50 AM7/3/22
to
MitchAlsup <Mitch...@aol.com> writes:
>On Saturday, July 2, 2022 at 1:11:03 PM UTC-5, Anton Ertl wrote:
>> But the fact that the
>> caller puts arguments on the stack for 9 or more arguments means that
>> it cannot optimize tail calls to such functions. Can functions with
><
>Where should a caller, that is calling a subroutine put the 32 arguments it needs
>to pass, if not on the stack ?

Passing the arguments on the stack is fine. However, making the
caller responsible for deleting them from the stack means that the
call cannot be tail-call optimized (because the cleanup has to happen
after the call, so it cannot be a tail-call).

A calling convention where the callee is responsible for removing
stack arguments would avoid that problem, but I don't think that's a
good idea for C.

>Secondarily, if the tail-call optimizer puts arguments 1..8 in registers, and 9..k
>on the stack, then jumping to the entry point should perform the tail-call.

The callee will not clean up the stack, because that's the
responsibility of the caller, so the caller must do the cleanup after
the call, and jumping to the callee won't do it.

Thomas Koenig

unread,
Jul 3, 2022, 11:22:44 AM7/3/22
to
MitchAlsup <Mitch...@aol.com> schrieb:
So, let's go through the -fsanitize switches of gcc and see where
hardware can help there.

First, the address sanitizer.

https://github.com/google/sanitizers/wiki/AddressSanitizer has the
details, which include:

- Use after free
- Heap buffer overflow
- Stack buffer overflow
- Global buffer overflow
- Use after return
- Use after scope
- Initialization order bugs
- Memory leaks

Trapping on use after return and use after scope could be done by making the
stack inaccessible after a function return, but this would do nothing for use
after free.

Use after free - if the architecture had a way to mark memory
as "invalid", that could help. Page-grained protection would
probably lead to a huge waste of memory.

All kind of overflows - if it was possible to mark memory as
invalid to write to, that could help.

Memory leaks - that's a software issue, as are initialization
order bugs.
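
As a point of reference for what such hardware would be accelerating, a toy
C sketch of the shadow-memory scheme the address sanitizer uses in software
(this is not the real ASan runtime interface; everything here is simplified
and the names are invented):

#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define GRANULE 8                       /* one shadow byte per 8 bytes of data */
static uint8_t shadow[1 << 20];         /* toy shadow map                      */

static size_t shadow_index(const void *p)
{
    return ((uintptr_t)p / GRANULE) % sizeof shadow;
}

static void poison(const void *p, size_t n, bool bad)
{
    for (size_t i = 0; i <= n / GRANULE; i++)
        shadow[(shadow_index(p) + i) % sizeof shadow] = bad;
}

void check_access(const void *p)   /* instrumentation inserts this before accesses */
{
    if (shadow[shadow_index(p)])
        abort();                   /* overflow into a redzone, or use after free    */
}

void *checked_malloc(size_t n) { void *p = malloc(n); if (p) poison(p, n, false); return p; }
void  checked_free(void *p, size_t n) { poison(p, n, true); free(p); }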

EricP

unread,
Jul 3, 2022, 12:47:46 PM7/3/22
to
And the release of this storage has to not leak in the face of
exceptions being thrown between the allocator and releaser.
(Also heaps are more expensive, require thread mutexes,...)

>>> In one possible implementation the callee allocates space on the stack,
>>> copies in the string return value and puts the address in R0,
>>> restores the preserved registers including FP,
>>> and returns to the caller *WITHOUT* restoring SP.
>>> Caller does what wants with that return string then restores SP.
>
>
> The most widely used Ada compiler, GNAT, uses a "secondary stack" for
> dynamically-sized function results and other dynamically-sized local
> variables. For such objects the normal or primary stack contains
> pointers into the secondary stack, leaving the primary stack frames with
> a statically fixed lay-out. Each thread (Ada task) has its own secondary
> stack arena that is created when the thread is created. Some systems fix
> the size of the secondary stack, which means that it can overflow
> (causing an exception); others may let the stack grow as needed.
>
> The top-of-stack pointer for the secondary stack is not reset when a
> function returns a value on the secondary stack, so it works as EricP
> described, except for not messing with the primary stack pointer.

Yes, a secondary stack is also an option.
I wanted to provide a practical example which requires manipulating
the registers in a manner incompatible with My66k SafeStack mode.

Both mechanisms are efficient (allocate is just a subtract and
a check against stack limit), thread local (no mutexes),
and allow for exceptions thrown between allocator and releaser.
Exception handler just has to save and restore the stack pointers
(as well as housekeeping if destructors are involved).

Variable Return Functions (VRF) are uncommon enough that I would NOT
allocate a specific register to the Secondary Stack Pointer (SSP)
but rather locate it as a field in the user mode thread header.
(Also if new frame construction is rare I considered having a field
in the thread header for FP to be saved, saving the FP register).

Secondary stack comes with its own set of considerations, size,
but the manipulation is pretty much the same as the primary stack.

My ISA has a bank of up to 64k user mode "environment registers"
that pass information between user mode and the OS or HW.
Two instructions "EVR rd,evr" and "EVW evr,rs" Environment Read/Write
a 16-bit environment register number that does something model specific.
Reads and writes of individual evr's can be trapped by virtual machines.
Reads or writes to invalid evr's cause an invalid instruction trap.

One such evr holds the user mode thread header pointer so it doesn't
need to take up a primary register. Another might hold the SSP.

>> Since ADA in particular has very strong (and checked) typing. This seems
>> to be a case where that short section of code at the return point gets
>> inlined
>> into callee just before the Epilogue sequence. And then use the ABI
>> normally.
>
>
> I don't understand that. How do you "inline" into the /callee/? And
> which "short section of code"? When an Ada function returns a
> dynamically-sized object, that object can live (in the caller, or the
> caller's caller, etc.) for a long time after it is returned.

Right, the method I described was not an inline.
It is a callee, who may be nested some depth below the caller,
editing the registers such that the callee's stack allocation
appears inside the original caller's local variable space.
The caller knows from the interface specification that the callee will
do this, so it saves the SP beforehand and restores it after.

There could be any number of routines between the allocator routine
and releaser routine (the original caller) passing this value around,
and multiple such allocations active at once.


MitchAlsup

unread,
Jul 3, 2022, 2:32:34 PM7/3/22
to
Yes, it's possible; but the code is not compatible with calls made within the same thread.
>
> Have you thought about varargs returns?
<
Only enough to be dangerous.

MitchAlsup

unread,
Jul 3, 2022, 2:37:41 PM7/3/22
to
On Sunday, July 3, 2022 at 6:55:50 AM UTC-5, Anton Ertl wrote:
> MitchAlsup <Mitch...@aol.com> writes:
> >On Saturday, July 2, 2022 at 1:11:03 PM UTC-5, Anton Ertl wrote:
> >> But the fact that the
> >> caller puts arguments on the stack for 9 or more arguments means that
> >> it cannot optimize tail calls to such functions. Can functions with
> ><
> >Where should a caller, that is calling a subroutine put the 32 arguments it needs
> >to pass, if not on the stack ?
<
> Passing the arguments on the stack is fine. However, making the
> caller responsible for deleting them from the stack means that the
> call cannot be tail-call optimized (because the cleanup has to happen
> after the call, so it cannot be a tail-call).
<
The tail call optimization neither allocates nor deallocates space on the stack,
but reutilizes it.
>
> A calling convention where the callee is responsible for removing
> stack arguments would avoid that problem, but I don't think that's a
> good idea for C.
<
caller is responsible for deallocating stack he allocated.
callee is responsible for deallocating stack he allocated.
<
> >Secondarily, if the tail-call optimizer puts arguments 1..8 in registers, and 9..k
> >on the stack, then jumping to the entry point should perform the tail-call.
<
> The callee will not clean up the stack, because that's the
> responsibility of the caller, so the caller must do the cleanup after
> the call, and jumping to the callee won't do it.
<
Tail-call reuses the callee-allocated stack.
When a return is to be performed, the callee deallocates that space so that
the caller receives the stack in the same condition as when he performed the
original call.

Stefan Monnier

unread,
Jul 3, 2022, 4:03:57 PM7/3/22
to
> caller is responsible for deallocating stack he allocated.
> callee is responsible for deallocating stack he allocated.
[...]
> Tail-call reuses callee allocated stack.
> When a return is to be performed, callee deallocates that space so that
> caller receives stack in the same condition he performed the original call.

Proper tail-call elimination means that when A calls B(b_args) and B ends
with a call to C(c_args), C should be entered with a stack that's
virtually identical to what it would be if A had called C(c_args) directly.

So if A deallocates the "b_args" he allocated by subtracting |b_args|
from the stack pointer, then it will misbehave unless |c_args| happens
to be equal to |b_args|. It can work correctly OTOH if A deallocates
the "b_args" by restoring some previous value of the stack pointer
without paying attention to the value of SP returned by the call.
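
In C terms the distinction is visible with gcc's sibling-call optimization
(illustrative code, hypothetical functions): when all of C's arguments
travel in registers, B's final call can usually be compiled as a jump and C
returns directly to A's call site; once arguments spill into a caller-owned
stack area that B must clean up after the call, that jump is no longer
possible.

/* B ends with a call to C: a candidate for tail-call elimination. */
long C(long x, long y) { return x + y; }

long B(long a)
{
    long t = a * 2 + 1;
    return C(t, a);   /* with register-only arguments this can become
                         "jump to C", and C returns straight to B's caller */
}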


Stefan

MitchAlsup

unread,
Jul 3, 2022, 4:22:43 PM7/3/22
to
On Sunday, July 3, 2022 at 3:03:57 PM UTC-5, Stefan Monnier wrote:
> > caller is responsible for deallocating stack he allocated.
> > callee is responsible for deallocating stack he allocated.
> [...]
> > Tail-call reuses callee allocated stack.
> > When a return is to be performed, callee deallocates that space so that
> > caller receives stack in the same condition he performed the original call.
<
> Proper tail-call elimination means that when A call B(b_args) and B ends
> with a call to C(c_args), C should be entered with a stack that's
> virtually identical as if A had called C(c-args) directly.
<
Obviously.
>
> So if A deallocates the "b_args" he allocated by subtracting |b_args|
<
A is not in control while B is in control. Thus A is not in a place to do anything.
A cannot deallocate the space he made on the stack until control returns.
{On a different note: the stack memory allocated for memory-passed arguments
is the MAX( all( individual stack based arguments ) ). That is, the
SP is manipulated once at the beginning of A and once at the end of A. A
knows this size and is ready to deallocate it when control returns from B
and it is time to return from A. }
<
When B tail-calls C, B uses the 8 register arguments to call C and can also
use all of the memory based argument area (since at this point that area
is no longer of value to B.) Certainly the 8 register arguments are not part
of this discussion. And if the stack memory area is large enough it can be
reused, too.
<
> from the stack pointer, then it will misbehave unless |c_args| happens
> to be equal to |b_args|. It can work correctly OTOH if A deallocates
> the "b_args" by restoring some previous value of the stack pointer
> without paying attention to the value of SP returned by the call.
<
So, all one has to do to make tail-calls work correctly is include C in the
MAX calculation above {transitively, of course.} and change the call to a
jump.
>
>
> Stefan

Stefan Monnier

unread,
Jul 3, 2022, 5:57:20 PM7/3/22
to
> So, all one has to do to make tail-calls work correctly is include C in the
> MAX calculation above {transitively, of course.} and change the call to a
> jump.

I don't think you can compute this MAX in general (e.g. when the call to
B or C is via an indirect function call).


Stefan

Ivan Godard

unread,
Jul 3, 2022, 6:23:49 PM7/3/22
to
Seems similar to the Ada dynamic object return that Niklas explained.

MitchAlsup

unread,
Jul 3, 2022, 8:19:39 PM7/3/22
to
How many (%) tail-call optimizations need more than 8 arguments ?

George Neuner

unread,
Jul 3, 2022, 10:06:57 PM7/3/22
to
On Sun, 3 Jul 2022 17:19:37 -0700 (PDT), MitchAlsup
<Mitch...@aol.com> wrote:

>How many (%) tail-call optimizations need more than 8 arguments ?

Percentage-wise, very few. I don't have URLs handy, but I recall
(late 90s, early 00s) seeing studies of several different languages
showing that the vast majority of all functions required 5 or fewer
arguments, and almost none required more than 12 arguments.

Note that they were NOT studying tail calls. Also my recollection is
that they were studying code AS WRITTEN by developers and not taking
into account compiler transformations: e.g., 'this' pointers for
object methods, hidden pointers to stack based closures or common data
structures, etc. There are a number of general transformations and
specific language features that can result in functions taking
additional compiler generated arguments.


However, it would be nice if those oddball functions that really *do*
require many arguments still could be tail-called efficiently. And
also called tail recursively efficiently.

There actually are more languages that require proper tail recursion
than languages that require tail call optimization. TCO is not
required for TR - at least not for simple same-function TR. Of
course, TCO aids CPS code, but since SSA became the dominant IR form,
very few compilers (for any language) bother doing CPS transformation
any more; and there are a handful of situations where CPS and SSA just
do not play well together.

YMMV,
George

Stefan Monnier

unread,
Jul 4, 2022, 12:33:49 AM7/4/22
to
> There actually are more languages that require proper tail recursion
> than languages that require tail call optimization.

Interesting. I wonder how they specify it. The only languages I know
which specify some kind of "tail call" behavior are things like Scheme
or SML which specify it as a kind of "safe for space" property which
applies to all tail calls, whether recursive or not.

To make it apply only for "recursion" you need to clarify which kinds of
recursion need to be supported (mutual recursion? Recursion through an
indirect function call? ...), which is why I'm curious to see how it's
usually done.


Stefan

MitchAlsup

unread,
Jul 4, 2022, 2:40:28 PM7/4/22
to
On Sunday, July 3, 2022 at 3:03:57 PM UTC-5, Stefan Monnier wrote:
> > caller is responsible for deallocating stack he allocated.
> > callee is responsible for deallocating stack he allocated.
> [...]
> > Tail-call reuses callee allocated stack.
> > When a return is to be performed, callee deallocates that space so that
> > caller receives stack in the same condition he performed the original call.
<
> Proper tail-call elimination means that when A calls B(b_args) and B ends
> with a call to C(c_args), C should be entered with a stack that's
> virtually identical to what it would be if A had called C(c_args) directly.
<
When A calls B
<
A's inbound argument[8..k] area
A's Local_data_area
A's outbound argument[8..k] area
<
B performs prologue:
// A's outbound argument[8..k] area become B's inbound argument area.
<
B's Local_data_area
B's outbound argument[8..k] area
<
B calls C:: C performs prologue
// B's outbound argument[8..k] becomes C's inbound argument[8..k] area
<
C's Local_data_area
<
Since C is only tail-calling itself, and since at the time it calls itself
it no longer needs B's outbound arguments to C, it can use this space
for inbound arguments to C again. Thus the stack looks like:
< High address
A's inbound argument[8..k] area
A's Local_data_area
A's outbound argument[8..k] area B's inbound argument[8..k] area
B's Local_data_area
B's outbound argument[8..k] area C's inbound argument[8..k] area
C's Local_data_area
> low address
<
C tail-calls itself using C's inbound argument[8..k] area and
reuses C's Local_data_area.
<
C's inbound argument area is sufficient because B called C!
C's Local_data_area is sufficient because the current C is using it.
<
Thus, C has enough area in registers and on the stack to tail-call
C, and the work involved is identical to the work of setting up
arguments and calling--except that the call gets transformed
into a jump/branch and you enter the subroutine immediately
AFTER prologue--which BTW is known to the compiler at the
Call of C from C.
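<
Expressed as a hedged C-level sketch (the function and its arguments are
invented; the label stands in for "immediately after the prologue"):

    long C(long n, long acc)
    {
    after_prologue:
        if (n == 0)
            return acc;
        /* self tail-call C(n - 1, acc + n): rewrite the inbound argument
           slots in place, reuse the Local_data_area, and jump rather than
           call -- no new frame is created. */
        acc = acc + n;
        n   = n - 1;
        goto after_prologue;
    }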
<
> So if A deallocates the "b_args" he allocated by subtracting |b_args|
<
A only deallocates b_args when A is returning to whoever called it.
You have the 'when' wrong: as long as we will return through A,
b_args remains on A's stack.
<
> from the stack pointer, then it will misbehave unless |c_args| happens
> to be equal to |b_args|. It can work correctly OTOH if A deallocates
> the "b_args" by restoring some previous value of the stack pointer
> without paying attention to the value of SP returned by the call.
<
You have the 'when' wrong with respect to when stack-based arguments are
deallocated, and possibly where they sit on the stack.
>
>
> Stefan

MitchAlsup

unread,
Jul 4, 2022, 2:43:38 PM7/4/22
to
On Sunday, July 3, 2022 at 9:06:57 PM UTC-5, George Neuner wrote:
> On Sun, 3 Jul 2022 17:19:37 -0700 (PDT), MitchAlsup
> <Mitch...@aol.com> wrote:
>
> >How many (%) tail-call optimizations need more than 8 arguments ?
> Percentage-wise, very few. I don't have URLs handy, but I recall
> (late 90s, early 00s) seeing studies of several different languages
> showing that the vast majority of all functions required 5 or fewer
> arguments, and almost none required more than 12 arguments.
<
Let me extend my question::
<
How many (%) tail-call subroutines use variable arguments? To
themselves?
>
> Note that they were NOT studying tail calls. Also my recollection is
> that they were studying code AS WRITTEN by developers and not taking
> into account compiler transformations: e.g., 'this' pointers for
> object methods, hidden pointers to stack based closures or common data
> structures, etc. There are a number of general transformations and
> specific language features that can result in functions taking
> additional compiler generated arguments.
>
>
> However, it would be nice if those oddball functions that really *do*
> require many arguments still could be tail-called efficiently. And
> also called tail recursively efficiently.
<
I agree, but there was an assertion that the My 66000 ABI was going to encounter
trouble as written, and I am as yet unconvinced that there is even a problem.

Stefan Monnier

unread,
Jul 4, 2022, 2:52:43 PM7/4/22
to
MitchAlsup [2022-07-04 11:40:26] wrote:
> On Sunday, July 3, 2022 at 3:03:57 PM UTC-5, Stefan Monnier wrote:
>> Proper tail-call elimination means that when A calls B(b_args) and B ends
>> with a call to C(c_args), C should be entered with a stack that's
>> virtually identical to what it would be if A had called C(c_args) directly.
> <
> When A calls B
> <
> A's inbound argument[8..k] area
> A's Local_data_area
> A's outbound argument[8..k] area
> <
> B performs prologue:
> // A's outbound argument[8..k] area become B's inbound argument area.
> <
> B's Local_data_area
> B's outbound argument[8..k] area
> <
> B calls C:: C performs prologue
> // B's outbound argument[8..k] becomes C's inbound argument[8..k] area
> <
> C's Local_data_area
> <
> Since C is only tail-calling himself, and since at the time it calls itself

Hmm... you're talking about a different situation. I'm talking about
the case where B tail-calls C. Whether C tail-calls itself or not is
a separate issue.


Stefan

MitchAlsup

unread,
Jul 4, 2022, 3:43:21 PM7/4/22
to
I thought a tail-call was where C calls itself (recursion mimicking looping).
B's calling of C is not a tail call, but a simple forward call (deeper onto
the stack).
>
>
> Stefan

MitchAlsup

unread,
Jul 4, 2022, 3:50:53 PM7/4/22
to
Adding:
<
When B does::
<
return C(c_args);
<
This is not a tail call, but a call where we know that B does not need control
returned. The vast majority of these can be converted into a jump within
the My 66000 ABI. Only when the argument list to C is larger than the argument
list to B are there any compiler issues. In these cases, the fallback position
is to compile B's call of C in standard ABI form.
<
Given that B's outbound argument list (to C) sits topologically below B's
Local_data_area on the stack, AND both are no longer needed,
C can tail-call itself using C's argument area and B's Local_data_area before
trouble is encountered in tail-call optimizations.
<
That should suffice for the 99.99%-ile
>
>
> Stefan

Stefan Monnier

unread,
Jul 4, 2022, 4:01:52 PM7/4/22
to
MitchAlsup [2022-07-04 12:43:19] wrote:
>> Hmm... you're talking about a different situation. I'm talking about
>> the case where B tail-calls C. Whether C tail-calls itself or not is
>> a separate issue.
> I thought a tail-call was where C calls itself (recursion mimicking looping).

A tail call is a call after which nothing else happens. E.g.:

int B(...)
{
    [...]
    return C(...);
}

the call to C is a tail call. In a language with proper tail-call
elimination (like Scheme or SML), the space used by B (and B's
arguments) should be recovered before calling C, such that during the
execution of C we're in the same situation as if A had called C directly
without going through B.

self-recursive tail calls are just one particular case of tail calls.

Most implementations of languages with proper tail-call elimination end
up using their own ABI for function calls, because the "native ABI"
designed for C tends not to be friendly for tail-call elimination :-(

> B's calling of C is not a tail call, but a simple forward call (deeper onto
> the stack)

None of B's local variables (and formal arguments) are still needed
after it called C, so keeping those on the stack is a kind of leak: it
should not be *deeper* onto the stack.

Many/most languages consider this kind of leak perfectly acceptable, but
not all of them. In the presence of automatic memory management, this
leak can be very significant because B's local variables may end up
pointing to a large data-structure that could otherwise be reclaimed.

A good reason for a language to require proper tail call elimination is
not just that it avoids those leaks, but that those leaks can be
described/explained only in terms of low-level implementation artifacts;
so if you want your language's semantics to be both clean and precise,
it's easier to require the implementation to be "safe for space" than to
try and clarify what is allowed to leak.


Stefan

John Levine

unread,
Jul 4, 2022, 4:03:57 PM7/4/22
to
According to MitchAlsup <Mitch...@aol.com>:
>> Hmm... you're talking about a different situation. I'm talking about
>> the case where B tail-calls C. Whether C tail-calls itself or not is
>> a separate issue.
><
>I thought a tail-call was where C calls itself (recursion mimicking looping).
>B's calling of C is not a tail call, but a simple forward call (deeper onto
>the stack)

This is definitely a tail call

int B(x)
{
    y = ... x ...
    return C(y);
}

Maybe B will tail call C so it's mutual recursion, maybe not, but with
proper tail call discipline it shouldn't matter.

If C tail calls itself directly, a decent optimizer should be able to turn that into a loop.
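
For illustration -- a standard mutual-recursion pair, not taken from this
thread, where the turn-the-self-call-into-a-loop rewrite alone does not help:

    #include <stdbool.h>

    static bool is_odd(unsigned n);

    static bool is_even(unsigned n)
    {
        return n == 0 ? true : is_odd(n - 1);   /* tail call to the other function */
    }

    static bool is_odd(unsigned n)
    {
        return n == 0 ? false : is_even(n - 1); /* tail call back again */
    }

With proper tail call discipline each hop reuses the frame, so is_even of a
very large number runs in constant stack space; without it, every hop burns
a new frame.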

--
Regards,
John Levine, jo...@taugh.com, Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly

Stefan Monnier

unread,
Jul 4, 2022, 5:01:41 PM7/4/22
to
> When B does::
> <
> return C(c_args);
> <
> This is not a tail call, but a call where we know that B does not need
> control returned.

In my books (and at https://en.wikipedia.org/wiki/Tail_call), this is
called "a tail call" :-)

> The very vast majority of these can be converted into a jump within
> My 66000 ABI. Only when the argument list to C is larger than the argument
> list to B is there any compiler issues.

Hmm... I thought the issue could only show up when the number of args
(for B or C) is large enough to require the use of the stack, but that
it could bite both when C's arglist is longer and when it's shorter
than B's.

But admittedly, I don't know enough about how ENTER/EXIT work
(AFAIK CALL/RET aren't affected by this because they only push/pop
a return address) and your ABI in general.

> In these cases, the fall back position
> is to convert B's calling of C in standard ABI form.

Can't B manually shuffle the stack to get rid of its own data and then
jump to C?


Stefan

MitchAlsup

unread,
Jul 4, 2022, 5:47:01 PM7/4/22
to
On Monday, July 4, 2022 at 4:01:41 PM UTC-5, Stefan Monnier wrote:
> > When B does::
> > <
> > return C(c_args);
> > <
> > This is not a tail call, but a call where we know that B does not need
> > control returned.
> In my books (and at https://en.wikipedia.org/wiki/Tail_call), this is
> called "a tail call" :-)
> > The very vast majority of these can be converted into a jump within
> > My 66000 ABI. Only when the argument list to C is larger than the argument
> > list to B is there any compiler issues.
<
> Hmm... I thought the issue could only show up when the number of args
> (for B or C) is large enough to require the use of the stack, but that
> it could bite both when C's arglist is longer and when it's shorter
> than B's.
<
Basically, as long as the stack's usage is "reasonable" the optimization can
be applied. When B's or C's argument lists are 8 or smaller, there is no concept
of a problem. When B's argument list is longer than C's argument list, there is
never a 'stack' problem. When arguments to C do not point into B's
Local_data_area, there is only a problem when C's argument list is longer than
B's inbound argument list + B's Local_data_area + B's outbound argument area.
The remaining 0.01% can simply do without the optimization at little real cost.
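<
Those conditions, restated as a compile-time feasibility check -- a sketch
only, with every name invented and every size assumed to be known statically:

    #include <stddef.h>

    /* Can B's "return C(...)" be turned into a jump under the scheme above? */
    static int tail_call_as_jump_ok(size_t b_inbound_arg_bytes,
                                    size_t b_local_data_bytes,
                                    size_t b_outbound_arg_bytes,
                                    size_t c_stack_arg_bytes,
                                    int    c_args_point_into_b_locals)
    {
        if (c_args_point_into_b_locals)
            return 0;            /* conservatively fall back to the standard ABI call */
        /* C's memory-resident arguments must fit in space B already owns. */
        return c_stack_arg_bytes <= b_inbound_arg_bytes
                                  + b_local_data_bytes
                                  + b_outbound_arg_bytes;
    }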
>
> But admittedly, I don't know enough about how ENTER/EXIT work
> (AFAIK CALL/RET aren't affected by this because they only push/pop
> a return address) and your ABI in general.
> > In these cases, the fall back position
> > is to convert B's calling of C in standard ABI form.
<
> Can't B manually shuffle the stack to get rid of its own data and then
> jump to C?
<
It can manually reutilize the stack area it is already allowed to use::
{ B's inbound argument area, B's local_data_area, B's outbound argument area }
in creating a jump-call to C. Arguments to C pointing into B's Local_data_area
create the limit on how much more memory you have to play with before
running into trouble. With no such pointing arguments, you should be able to
optimize return-call into jump 99.99% of the time.
>
>
> Stefan

BGB

unread,
Jul 7, 2022, 7:17:56 PM7/7/22
to
On 7/2/2022 5:10 AM, Thomas Koenig wrote:
> MitchAlsup <Mitch...@aol.com> schrieb:
>> On Friday, July 1, 2022 at 4:48:27 PM UTC-5, EricP wrote:
>>> MitchAlsup wrote:
>>>> On Friday, July 1, 2022 at 3:08:29 PM UTC-5, EricP wrote:
>>>>> MitchAlsup wrote:
>>>>>> <
>>>>>> Everyone should chime in with their favorite methods and aparatuses
>>>>>> <
>>>>>> My 66000 has a Safe Stack mode (which I am contemplating making
>>>>>> the only execution mode).
>>>> <
>>>>> I don't think making safe mode mandatory is a good idea because it
>>>>> means your current ABI definition and in particular its register and
>>>>> stack manipulations are the only calling interface supported.
>>>> <
>>>> I will try to keep this in mind; and I thought I had.
>>> Might just be me getting anxious when I feel caged in.
>>>>> Each language has its own requirements, and each compiler
>>>>> developer needs to be able to tailor their ABI to suit them.
>>>> <
>>>> What if those languages want to call {C, Fortran, Ada, C++, LISP, ...}
>>>> or existing run time libraries, existing system services, ...
>>>> There has to be some commonality to do this.
>>> The way many compilers seem to deal with this is with
>>> routine declarations that state which ABI to use.
>>>
>>> E.g. MS C and GCC have different x64 ABI's.
>>> If they want to support calls to the others code GCC might allow
>>> external int __MS_stdcall CreateWindow (blah);
>> <
>> I am trying to nip this in the bud before hand.
>
> If (hopefully) your architecture ever gets implemented in silicon,
> and Microsoft ports Windows to it, do you think that MS will care?

From what I have seen, MS's usual response to a new architecture is to
do something similar to their existing C ABI, just modified some to fit
the "flavor" of the new target (often with similar register assignments
to the target's original ABI).

Say, for example:
4 arguments in registers;
4 argument "Red Zone" on the stack;
Passing/returning structures by reference;
...

From what I have seen, they seem to have done similar ABI designs to
this on a number of targets.


I wouldn't be entirely surprised if, should MS ever port Windows or
similar to RISC-V, they ended up doing something similar.



Though, it does come up as a thought that if (hypothetically) they went
for BJX2, they may or may not keep a similar ABI design, if for no other
reason than that the existing BJX2 ABI is already pretty close to the typical
MS design.

The only significant differences here being:
8 register arguments rather than 4;
No "Red Zone" area;
Passes structures under 16 bytes in registers rather than by-reference.
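
A trivial illustration of the third difference (types invented; the MS-style
behavior is as characterized above, not checked against any particular
Windows ABI document):

    typedef struct { int x, y, z; } Vec3;   /* 12 bytes: under the 16-byte limit */

    extern long sum3(Vec3 v);   /* v goes in registers under the BJX2 ABI as
                                   described; an MS-style ABI would instead
                                   pass a pointer to a caller-made copy */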


Granted, this was partly because BGBCC's ABI was originally derived in
part from the WinCE SH ABI anyway (rather than necessarily from the
original Hitachi ABI).

So, in this area, it remains an open question (well, and possibly
someone could make a case for re-adding the "Red Zone" area, though it
would cover 8 arguments in this case).

...