
What the World Needs Now


Quadibloc
Dec 14, 2023, 7:44:50 AM

...is love, sweet love. That's the only thing that there's just too little
of.

At least according to Hal David, and Burt Bacharach.

My original Concertina architecture was intended just to
illustrate how computers work, and then I decided to illustrate
almost every feature some computer somewhere ever had by throwing them
all in.

Concertina II was intended to be a bit more practical. But it wasn't
really designed primarily for commercial success.

I designed it to show what I would like to see - so it is still going
to have lots of features, if not _quite_ as many as the original
Concertina. IBM-format Decimal Floating Point. Character string
instructions. Cray-like vector instructions.

I threw in a block structure to allow pseudo-immediates, since I
agreed with Mitch that using immediates for constants is a good
idea with memory access being so slow these days. Even if the
way that made sense for me to implement them was one Mitch
quite reasonably found really awful.

And I also put in VLIW - based on the arguments in favor of the
Mill architecture, I felt that if being lightweight and avoiding
OoO but still being efficient is a good thing, then offering
this capability in a more conventional architecture, less
radically innovative than the Mill, might serve a purpose.

But none of this stuff is what people really want to see from
a new computer architecture. (In fact, they don't _want_ a
new computer architecture, they want to run their old Windows
programs in peace!)

What new improved capability _would_ have people beating
down the doors to get their hands on processors with a novel
architecture? To me, it's obvious. If somebody could
design a processor that would power computers that were
*much more secure* than today's computers, *that* is what
would be extremely attractive.

Is the problem really in the processor, though? Maybe the
problem is mostly in the software. Perhaps hardware outside
the processor - like having one hard disk read-only unless
a physical switch was turned to allow software to be installed
on drive C: - is what we need.

Mitch Alsup noted that on his design, there is no bit in the
status word that gets turned on to allow supervisor or privileged
or kernel mode; instead, program code allowed to run privileged
instructions is specified as such within the memory management
unit.

To me, it seemed like this approach had a fundamental problem,
although I'm sure Mitch figured out a way to deal with it.

Because you can always turn privileged mode off, but never on,
except through an interrupt or an interrupt-like mechanism
such as a supervisor call, the computer has to start in
privileged mode when it is turned on. And memory management is
complicated, and needs to be set up before it can be used.

A workaround is to make 'raw' memory capable of running
privileged instructions, but then that means people who
don't want to bother with memory management don't have a
security mechanism.

So, while I felt his idea was useful, I would be tempted
to do things differently: have a privilege bit, but then
also have a feature that can be enabled that only allows
privileged code to run in code segments designated by the
memory descriptors given to the MMU.
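
As a rough sketch in C - the names here are invented for
illustration, not a worked-out design:

#include <stdbool.h>

/* One memory descriptor as handed to the MMU; fields are illustrative. */
typedef struct {
    bool executable;      /* ordinary execute permission */
    bool privileged_ok;   /* code here may use privileged instructions */
} descriptor;

/* Fetch-time test: the classic privilege bit must be set, and, when
   the extra feature is enabled, the fetched segment must also be
   designated as allowed to run privileged instructions. */
static bool may_run_privileged(descriptor d, bool priv_bit,
                               bool enforce_descriptors)
{
    if (!d.executable || !priv_bit)
        return false;
    return !enforce_descriptors || d.privileged_ok;
}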

Since the most insidious attack on a computer is to corrupt
its BIOS, or the boot sectors of its hard drive, a security
mechanism that could work when a computer is started up
would be attractive.

Modern microprocessors include the feature of executing
encrypted programs; the purpose of this is to allow
copy-protection schemes for things like movies and music
to work; the code is hidden from being disassembled, and
can't be tampered with or have its functionality duplicated.

This doesn't help security, though; a virus writer could
use the same public keys of the microprocessor maker
to hide parts of a virus from analysis. (But when something
is known to be a virus, and not an implementation of HDMI
encryption, the microprocessor maker will gladly decrypt
it for antivirus developers...)

In any case, this has inspired me to think of a feature to
add to Concertina II. Add a new kind of header block, which
must be the very first one, which contains an encrypted
checksum of the rest of the block - which must be valid for
the block to be allowed to execute. A mode exists where
only blocks with such a checksum can be executed.

The idea is to provide the following facilities:

Supply the computer with an encryption key.

Set the computer so that on bootup it is in the
mode only allowing execution of this checksummed code.

Have the computer add checksums to blocks, according
to an encryption key supplied when this is used - _not_
the one set inside the computer, which is used only
for validation.

With this:

The code to be executed is visible, and can be checked
to ensure it is the desired code;

A virus seeking to replace that code with its own can't
simply ask the computer to add checksums to its own
code.

A pin on the CPU will need to be used so that physical
access to the computer is required to change the
encryption/validation key it uses to a new one.

And to make it possible to replace the BIOS with one using a new
encryption key for the checksums, the original non-checksummed
BIOS with which the machine is shipped needs to be in ROM. The
idea is that, with physical access to the machine, the Flash
memory used for updated BIOSes (which can be checksummed) can be
switched away from, and the mode requiring checksummed code on
bootup can be turned off, to allow a switchover to checksummed
code with a new key.
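
To make the block-checksum idea concrete, here is a toy C sketch;
the block size, the field layout, and especially the keyed hash are
illustrative stand-ins - a real design would use a proper MAC:

#include <stdint.h>
#include <stddef.h>
#include <stdbool.h>

#define BLOCK_WORDS 8              /* instruction words per block (assumed) */

typedef struct {
    uint32_t checksum;             /* header: keyed checksum of words[] */
    uint32_t words[BLOCK_WORDS];   /* the instructions themselves */
} code_block;

/* Stand-in keyed hash (FNV-style mixing); not cryptographically sound. */
static uint32_t keyed_sum(const uint32_t *w, size_t n, uint32_t key)
{
    uint32_t h = key;
    for (size_t i = 0; i < n; i++)
        h = (h ^ w[i]) * 0x01000193u;
    return h;
}

/* Installer side: signs with the key supplied when checksums are added. */
static void sign_block(code_block *b, uint32_t signing_key)
{
    b->checksum = keyed_sum(b->words, BLOCK_WORDS, signing_key);
}

/* Fetch side: validates with the key held inside the CPU; in the
   "checksummed only" mode, a mismatch is a fault rather than a fetch. */
static bool may_execute(const code_block *b, uint32_t validation_key)
{
    return keyed_sum(b->words, BLOCK_WORDS, validation_key) == b->checksum;
}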

John Savard

MitchAlsup
Dec 14, 2023, 1:36:21 PM

Quadibloc wrote:

> ...is love, sweet love. That's the only thing that there's just too little
> of.

> At least according to Hal David, and Burt Bacharach.

A people will happily go to war when the Hope in Peace is worse.
MitchAlsup 12/13/2023

> My original Concertina architecture was intended just to
> illustrate how computers work, and then I decided to illustrate
> almost every feature some computer somewhere ever had by throwing them
> all in.

> Concertina II was intended to be a bit more practical. But it wasn't
> really designed primarily for commercial success.

> I designed it to show what I would like to see - so it is still going
> to have lots of features, if not _quite_ as many as the original
> Concertina. IBM-format Decimal Floating Point. Character string
> instructions. Cray-like vector instructions.

> I threw in a block structure to allow pseudo-immediates, since I
> agreed with Mitch that using immediates for constants is a good
> idea with memory access being so slow these days. Even if the
> way that made sense for me to implement them was one Mitch
> quite reasonably found really awful.

> And I also put in VLIW - based on the arguments in favor of the
> Mill architecture, I felt that if being lightweight and avoiding
> OoO but still being efficient is a good thing, then offering
> this capability in a more conventional architecture, less
> radically innovative than the Mill, might serve a purpose.

> But none of this stuff is what people really want to see from
> a new computer architecture. (In fact, they don't _want_ a
> new computer architecture, they want to run their old Windows
> programs in peace!)

Apparently, people do not want the interiors of their automobiles
to look like their computer desk either (based on sales).

> What new improved capability _would_ have people beating
> down the doors to get their hands on processors with a novel
> architecture? To me, it's obvious. If somebody could
> design a processor that would power computers that were
> *much more secure* than today's computers, *that* is what
> would be extremely attractive.

My 66000 is not sensitive to the vast majority of current attack
strategies {rowhammer, Spectré, Meltdown, RoP, buffer overflows,
..}.

> Is the problem really in the processor, though? Maybe the
> problem is mostly in the software. Perhaps hardware outside
> the processor - like having one hard disk read-only unless
> a physical switch was turned to allow software to be installed
> on drive C: - is what we need.

> Mitch Alsup noted that on his design, there is no bit in the
> status word that gets turned on to allow supervisor or privileged
> or kernel mode; instead, program code allowed to run privileged
> instructions is specified as such within the memory management
> unit.

> To me, it seemed like this approach had a fundamental problem,
> although I'm sure Mitch figured out a way to deal with it.

> Because you can always turn privileged mode off, but never on,
> except through an interrupt or an interrupt-like mechanism
> such as a supervisor call, the computer has to start in
> privileged mode when it is turned on. And memory management is
> complicated, and needs to be set up before it can be used.

No, but there does need to be a couple of cache lines in ROM
that define the state of the core(s) after reset is removed.
This data will provide Root Pointers for the various privilege
levels, various IPs, and random other configuration data. The
core will still have to enumerate PCIe, find, configure, and
initialize DRAM, load boot device drivers, and load the initial
system image.
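
Roughly the sort of thing those ROM cache lines might hold - the
layout and field names here are guesses, not the actual My 66000
specification:

#include <stdint.h>

/* Two 64-byte cache lines' worth of reset state. */
typedef struct {
    uint64_t root_pointer[4];   /* translation roots, one per privilege level */
    uint64_t initial_ip[4];     /* where each privilege level starts */
    uint64_t config[8];         /* random other configuration data */
} reset_image;                  /* resides at a fixed ROM address */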

> A workaround is to make 'raw' memory capable of running
> privileged instructions, but then that means people who
> don't want to bother with memory management don't have a
> security mechanism.

Let them burn. Their impudence will get them in the end.

> So, while I felt his idea was useful, I would be tempted
> to do things differently: have a privilege bit, but then
> also have a feature that can be enabled that only allows
> privileged code to run in code segments designated by the
> memory descriptors given to the MMU.

My 66000, instead, gives each privilege level its own
virtual address space.

> Since the most insidious attack on a computer is to corrupt
> its BIOS, or the boot sectors of its hard drive, a security
> mechanism that could work when a computer is started up
> would be attractive.

My 66000 comes out of reset with its MMUs turned on, its
privilege stack set up, and its priority interrupt system
enabled. It uses this ROM MMU to direct L1 and L2 to provide
storage for HLL code while DRAM is being found, configured,
and initialized and a pool of DRAM pages is built.

> Modern microprocessors include the feature of executing
> encrypted programs; the purpose of this is to allow
> copy-protection schemes for things like movies and music
> to work; the code is hidden from being disassembled, and
> can't be tampered with or have its functionality duplicated.

My 66000 has a means by which an application (without privilege)
can request GuestOS services while not allowing GuestOS to examine
its memory. We call these applications "Paranoid". Obviously
only a subset of GuestOS service calls will work, but files
and sockets do. I/O devices have access to application memory
that GuestOS does not.

This prevents GuestOS from reading keys or secrets from an
application, without needing another mode (SEM, AMD-V, ...).
Of course the application cannot access GuestOS memory; it does
not even have visibility of the GuestOS root pointer.
{Hint:: GuestOS and application do not NECESSARILY share
an address space, TLB entries, or ASIDs--yet GuestOS can access
non-paranoid application virtual memory. -- No Global bit}

> This doesn't help security, though; a virus writer could
> use the same public keys of the microprocessor maker
> to hide parts of a virus from analysis. (But when something
> is known to be a virus, and not an implementation of HDMI
> encryption, the microprocessor maker will gladly decrypt
> it for antivirus developers...)

> In any case, this has inspired me to think of a feature to
> add to Concertina II. Add a new kind of header block, which
> must be the very first one, which contains an encrypted
> checksum of the rest of the block - which must be valid for
> the block to be allowed to execute. A mode exists where
> only blocks with such a checksum can be executed.

This is one reason you cannot allow writable and executable
in the same PTE. My 66000 HW checks this.

> The idea is to provide the following facilities:

> Supply the computer with an encryption key.

Maybe more like an infinite set of encryption keys.

> Set the computer so that on bootup it is in the
> mode only allowing execution of this checksummed code.

Easily done if it comes out of reset with its MMU turned on.

Quadibloc
Dec 15, 2023, 9:30:31 AM

On Thu, 14 Dec 2023 18:34:10 +0000, MitchAlsup wrote:

> This is one reason you cannot allow writable and executable in the same
> PTE. My 66000 HW checks this.

This is a very sound principle.

Of course, computers must allow executable programs to be loaded into
memory from the disk, but that just means that loader programs now
need to include a call to the operating system to change the status of
the memory into which a program has been loaded.
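
On a POSIX system that call is essentially an mprotect() transition;
a minimal sketch of the write-then-execute pattern, with error
handling omitted:

#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

typedef void (*entry_fn)(void);

/* Emit into writable pages, then flip them to executable before
   jumping in. */
static entry_fn load_code(const unsigned char *code, size_t len)
{
    unsigned char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    memcpy(buf, code, len);                     /* write phase */
    mprotect(buf, len, PROT_READ | PROT_EXEC);  /* execute phase */
    /* on ARM and others an instruction-cache flush is also needed here */
    return (entry_fn)(void *)buf;
}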

The effect of this elementary precaution, though, is to break legacy
programs that do JIT compilation, for example. So it interferes with
making the x86/x86-64 platform secure.

Which helps to remind me of a very important fact: while I mentioned
adding a very _minor_ security feature to Concertina II here, which
happens to make use of the block format of program code, to permit
making certain types of fine-grained security a bit more convenient...

what is the really _important_ security enhancement that computers need?

In my opinion, it's better, more effective, sandboxing of Web browsers,
E-mail clients, and other Internet-facing applications. And when I've
been previously thinking about this, I noticed that one class of
applications - which is not connected with most of the security risk -
stands in the way of using some possible mechanisms for this purpose.

A lot of computer games - a class of applications which typically
require access to the full computational power of the computer - also
are connected to the Internet, so that players can interact and can't
cheat because actual gameplay takes place on servers, while the game
clients take care of game graphics.

If it's just browsers and E-mail clients, having those run in a
"sandbox" that also has a lot less performance than the real computer
is not a problem.

Is the CPU even the place for sandboxing? A genuinely effective
sandbox would involve a physical separation between the protected
computer and the one connected to the Internet, after all. But that
isn't convenient...

John Savard

EricP
Dec 15, 2023, 9:53:25 AM

MitchAlsup wrote:
> Quadibloc wrote:
>
>> In any case, this has inspired me to think of a feature to
>> add to Concertina II. Add a new kind of header block, which
>> must be the very first one, which contains an encrypted
>> checksum of the rest of the block - which must be valid for
>> the block to be allowed to execute. A mode exists where
>> only blocks with such a checksum can be executed.
>
> This is one reason you cannot allow writable and executable
> in the same PTE. My 66000 HW checks this.

Not supporting RWE as a valid PTE permission won't stop anything.
It just means that on My66k an OS has to use trap and emulate
to accomplish the same end result. All that does is make
certain things more expensive on that platform.

Whether to support RWE is a decision that should be left to the OS,
implemented through its privileging mechanisms and authorized
on a per-user or per-app basis for individual memory sections.


Scott Lurndal
Dec 15, 2023, 9:57:08 AM

Quadibloc <quad...@servername.invalid> writes:
>On Thu, 14 Dec 2023 18:34:10 +0000, MitchAlsup wrote:
>
>> This is one reason you cannot allow writable and executable in the same
>> PTE. My 66000 HW checks this.
>
>This is a very sound principle.

Every modern translation table system from x86 through ARM has
at least one way to mark writable pages as no-execute. It's not a new feature.

The JIT folks have all adapted to this.

MitchAlsup
Dec 15, 2023, 12:11:05 PM

Quadibloc wrote:

> On Thu, 14 Dec 2023 18:34:10 +0000, MitchAlsup wrote:

>> This is one reason you cannot allow writable and executable in the same
>> PTE. My 66000 HW checks this.

> This is a very sound principle.

> Of course, computers must allow executable programs to be loaded into
> memory from the disk, but that just means that, now, loader programs
> need to include a call to the operating system to change the status of
> the memory into which a program has been loaded.

> So the effect of this elementary precaution is to break any legacy
> programs that do JIT compilation, for example. So this interferes with
> making the x86/x86-64 platform secure.

The JiTer is in a different address space and has access to the common
JiTpool with write permission. The JiTpool is executable in the applications
using JiT. The transition between address spaces is on the order of 10
cycles. This is VASTLY more secure than allowing a malicious JiT user
write access to the JiTed instruction space.

> Which helps to remind me of a very important fact: while I mentioned
> adding a very _minor_ security feature to Concertina II here, which
> happens to make use of the block format of program code, to permit
> making certain types of fine-grained security a bit more convenient...

> what is the really _important_ security enhancement that computers need?

> In my opinion, it's better, more effective, sandboxing of Web browsers,
> E-mail clients, and other Internet-facing applications. And when I've
> been previously thinking about this, I noticed that one class of
> applications - which is not connected with most of the security risk -
> stands in the way of using some possible mechanisms for this purpose.

Make the sandbox a completely different virtual address space.

> A lot of computer games - a class of applications which typically
> require access to the full computational power of the computer - also
> are connected to the Internet, so that players can interact and can't
> cheat because actual gameplay takes place on servers, while the game
> clients take care of game graphics.

> If it's just browsers and E-mail clients, having those run in a
> "sandbox" that also has a lot less performance than the real computer
> is not a problem.

Not when a context switch is 10 cycles.

You see, that is the problem: a context switch (to a different VAS) is
currently 1000 cycles and "all these performance issues arise". Those
performance issues vanish when a context switch is 10 cycles (without
needing an excursion through the OS).

> Is the CPU even the place for sandboxing? A genuinely effective
> sandbox would involve a physical separation between the protected
> computer and the one connected to the Internet, after all. But that
> isn't convenient...

With 10-cycle context switches, you can run the sandbox where those
cores are only provided indirect access to the internet through an
IPI-like mechanism (also 10 cycles).

When a real hard context switch remains 10 cycles, you can run the
secure sandbox under a different HyperVisor that provides no illusion
of internet access. Still 10 cycles.

> John Savard

Quadibloc
Dec 15, 2023, 5:39:19 PM

On Fri, 15 Dec 2023 17:10:36 +0000, MitchAlsup wrote:
> Quadibloc wrote:

>> Is the CPU even the place for sandboxing? A genuinely effective sandbox
>> would involve a physical separation between the protected computer and
>> the one connected to the Internet, after all. But that isn't
>> convenient...
>
> With 10-cycle context switches, you can run the sandbox where those
> cores are only provided indirect access to the internet through an
> IPI-like mechanism (also 10-cycles).
>
> When a real hard context switch remains 10 cycles, you can run the
> secure sandbox under a different HyperVisor that provides no illusion of
> internet access. Still 10 cycles.

It's certainly true that if you can have faster context switches,
you can use context switches more often with less loss of performance.

One obvious mechanism that some old computers used was to have a
separate set of registers for the 'other' state, instead of saving
and restoring registers from memory. So, yes, we certainly know
how to do that.

To me, though, the problem is that of course you can have context A
and context B, but how do we make the untrusted context secure, so
that it doesn't just find some vulnerability somewhere that wasn't
anticipated to affect or spy on the privileged context?

It's in a different address space, because the MMU put it there? Great,
that _should_ work, but we've already had cases of malware surmounting
the barriers between virtual machines under a hypervisor, for example.

If using a *virtual machine* won't give you a secure sandbox, then I
despair of anything else giving you a secure sandbox on a single computer.

And, worse, even if one had a perfect secure sandbox, it wouldn't
completely solve the security problem. Because one of the things
that you use the Internet for is to download programs to run on
your "good" computer. So the sandbox runs JavaScript and stuff like
that, but there's still a link between it and the computer we
want to keep secure, so that stuff can be downloaded.

And the stuff that's downloaded can be corrupted - on the server,
or inside the sandbox where the bad things are locked up!

These meditations lead me to the conclusion that there's no
really simple solution that can easily be seen to be perfect.

To give computers even a fighting chance to be more secure, then,
it seems the result will have to be more like this:

- Even the "good" computer, that runs the programs that need its
full speed and power, will still need to run antivirus programs
and have mitigations. Perimeter security that does away with the
need for this is not possible.

- The Internet-facing computer needs to be made highly resistant
to being compromised. An old idea from the early Bell Electronic
Switching System seems to be what is appropriate here: this
computer should be physically unable to ever write to the memory
it uses for (primary) executable code, that is, the programs
that handle Internet access, its own local operating system, and
so on; instead, that computer's software gets loaded into its
memory by the "good" computer shielded behind it.

It still gets to run "secondary" executable code - JavaScript and
the like - and because that's so dangerous, the secondary computer,
instead of being thought of as the sandbox, should still have
the kind of software sandbox functionality that processors do
nowadays. But, wait - we already know those sandboxes _can_ be
compromised. So, while what is mentioned so far creates a second
line of defense, which is nice, it's not really a "solution", since
the lines of defense can be compromised one at a time.

So you need one other thing. A good mechanism, external to the
Internet-facing computer, which can detect if something has been
compromised on that computer. For example, a technique like
rowhammer could compromise the program memory in it that
it doesn't even have a write connection to! (Of course, here, using
special memory, even if slower and/or more expensive, that avoids
rowhammer is another measure that I think is warranted.)

So the idea, I guess, so far is this:

For routine stuff - JavaScript that executes to give web sites more
functionality - that stuff stays outside the "real" computer, thus
making a large reduction in the risk.

Some functionality requires things to go from the Internet to the
"good" computer to be executed, but if the amount of that is limited,
then one can restrict that to trusted sources: i.e. reputable
game publishers, trusted repositories of software for download.

That keeps the danger down to a "dull roar"; no longer will sending a
bad E-mail to a computer or getting it to look at a bad web site let
a miscreant take over the computer. Social engineering, where a user
is led to trust the _wrong_ software repository, of course, is still
going to be possible. That couldn't be eliminated without compromising
the usability and power of the computer (of course, for special
purposes, such a compromise may be admissible; i.e. locked down
Chromebooks or iOS devices lent out by schools to their students and
stuff like that).

So now you've heard my philosophy on security. (And that guy who
made philosophy a dirty word here disappeared for real about the time
of the spam onslaught. So that ill wind blew some good.)

John Savard

Quadibloc
Dec 15, 2023, 8:07:29 PM

On Fri, 15 Dec 2023 22:39:15 +0000, Quadibloc wrote:

> If using a *virtual machine* won't give you a secure sandbox, then I
> despair of anything else giving you a secure sandbox on a single
> computer.

So, instead, my answer boils down to this:

Yo dawg! I heard you like computers, so I put a computer in your
computer so that you can compute (securely) while you compute (with
untrusted code).

John Savard

BGB
Dec 15, 2023, 9:20:05 PM

Agreed.

There are some targets where double-mapping is needed for RWE use-cases,
but this doesn't really solve anything apart from making JITs annoying
(and this sort of thing doesn't really seem to have caught on).
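
For reference, the double-mapping in question looks roughly like this
on a POSIX target; the "/jitpool" name is invented and error handling
is omitted:

#include <stddef.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>

/* One physical region, two virtual views: the JIT writes through
   wview, the generated code runs through xview. */
static void double_map(size_t len, void **wview, void **xview)
{
    int fd = shm_open("/jitpool", O_RDWR | O_CREAT, 0600);
    ftruncate(fd, (off_t)len);
    *wview = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    *xview = mmap(NULL, len, PROT_READ | PROT_EXEC,  MAP_SHARED, fd, 0);
    shm_unlink("/jitpool");   /* mappings persist; the name does not */
    close(fd);
}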

RWE should not be a default though:
Executable sections can be R+X;
Data sections and heap can be R/W.


OTOH, in my project, I had added a "tk_malloc_cat()" function (also
"_malloc_cat()"), which is given a magic "cat" value that can specify
various special cases:
RWE memory;
GlobalAlloc memory;
...

This allows the memory manager to parcel out these special cases, while
keeping the usual malloc/free interface (though, each category has its
own free-lists), and "tk_free()"/"free()" will return the object to the
appropriate free list.
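
A toy sketch of that interface; the category values and the header
trick are invented here - the real allocator presumably draws each
category from its own pool, e.g. RWE-mapped pages for the RWE case:

#include <stdlib.h>

enum alloc_cat { CAT_NORMAL = 0, CAT_RWE = 1, CAT_GLOBAL = 2 };

/* Each allocation remembers its category, so the free path can
   return it to the matching per-category free list. */
typedef struct hdr { int cat; size_t size; } hdr;

void *tk_malloc_cat(size_t size, int cat)
{
    hdr *h = malloc(sizeof(hdr) + size);   /* stand-in for the pools */
    if (!h) return NULL;
    h->cat = cat;
    h->size = size;
    return h + 1;
}

void tk_free(void *p)
{
    if (!p) return;
    hdr *h = (hdr *)p - 1;
    /* dispatch on h->cat to the matching free list; elided here */
    free(h);
}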


Besides JITs, a major use-case for RWE memory is implementing things
like lambdas and similar (and allowing them to present themselves as
normal C function pointers).


Quadibloc
Dec 16, 2023, 11:20:03 AM

On Fri, 15 Dec 2023 09:53:14 -0500, EricP wrote:

> Not supporting RWE as a valid PTE permission won't stop anything. It
> just means that on My66k an OS has to use trap and emulate to
> accomplish the same end result. All that does is make certain things
> more expensive on that platform.
>
> Whether to support RWE is a decision that should be left to the OS,
> implemented through its privileging mechanisms and authorized on a
> per-user or per-app basis for individual memory sections.

Although I tend to agree with you, I also think that allowing write
permission and execute permission at the same time is highly
dangerous. So I would both deprecate that option, and put extra
security safeguards around it so that if malware found a vulnerability
in the operating system, it still wouldn't be able to turn that mode
on.

Here, making it so that once access to write plus execute is turned off,
it can't be turned on again without a reboot seems appropriate. (Or
"potentially on", because forcing the system to boot with this enabled
isn't a good idea either. So there would be three states: RWE available,
but off; RWE on; RWE turned permanently off until reboot. From the third
state, there is no return to the first two.)
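
As a state machine, roughly - a sketch, not a spec:

/* The three proposed states; once locked off, only a reboot returns
   the machine to the available-but-off state. */
typedef enum { RWE_AVAILABLE_OFF, RWE_ON, RWE_LOCKED_OFF } rwe_state;

static rwe_state rwe_request(rwe_state current, rwe_state wanted)
{
    if (current == RWE_LOCKED_OFF)   /* one-way latch until reboot */
        return RWE_LOCKED_OFF;
    return wanted;
}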

John Savard

EricP
Dec 16, 2023, 11:54:53 AM

Did you take a course in customer abusive design or something?
Who is it you think you are going to sell this feature to?



Anton Ertl
Dec 16, 2023, 12:58:48 PM

Quadibloc <quad...@servername.invalid> writes:
>Although I tend to agree with you, I also think that allowing write
>permission and execute permission at the same time is highly
>dangerous.

Security theatre!

Gforth uses mmap(... RWX ...) for the area where it generates native
code. Now on MacOS on Apple Silicon (MacOS on Intel works fine) that
mmap() fails, which means that currently the development version of
Gforth does not work. When I work around this misfeature of MacOS on
Apple Silicon (by disabling native code generation), Gforth will run
around a factor of 2 slower on MacOS than on Linux.
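
Presumably the probe looks something like this - not Gforth's actual
source; Apple's sanctioned route involves MAP_JIT and per-thread
write-protect toggling instead:

#include <stddef.h>
#include <sys/mman.h>

/* Ask for an RWX code area; if the OS refuses (as MacOS on Apple
   Silicon does), fall back to plain RW and disable native code. */
static void *alloc_code_area(size_t len, int *native_ok)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE | PROT_EXEC,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    *native_ok = (p != MAP_FAILED);
    if (p == MAP_FAILED)
        p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p;
}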

Ok, you might say, the additional security is worth that to you. But
note that either variant can write to every byte in the process, jump
to every executable byte in the process, and perform all the system
activities supported by Gforth (including writing files and then
calling chmod on them and invoking them, and compiling and running C
code). That's because Gforth implements a general-purpose programming
language that's supposed to be able to do all these things. So this
security "feature" of MacOS on Apple Silicon does not make Gforth any
more secure.

Well, you might say, but at least Gforth does not run on MacOS on
Apple Silicon now. But it does, in the latest released version, which
happens to be able to work around this misfeature automatically. It's
still not any more secure.

- anton
--
'Anyone trying for "industrial quality" ISA should avoid undefined behavior.'
Mitch Alsup, <c17fcd89-f024-40e7...@googlegroups.com>

Quadibloc
Dec 16, 2023, 6:49:33 PM

On Sat, 16 Dec 2023 11:54:40 -0500, EricP wrote:
> Quadibloc wrote:

>> Here, making it so that once access to write plus execute is turned
>> off,
>> it can't be turned on again without a reboot seems appropriate. (Or
>> "potentially on", because forcing the system to boot with this enabled
>> isn't a good idea either. So there would be three states: RWE
>> available,
>> but off; RWE on; RWE turned permanently off until reboot. From the
>> third state, there is no return to the first two.)

> Did you take a course in customer abusive design or something?
> Who is it you think you are going to sell this feature to?

Well, Mitch proposes making it completely and absolutely impossible
for memory to both contain code that can currently be executed, and
for it to be writeable, at the same time.

I recognized the logic behind this. Yet, there are cases when this
would be a problem. So I felt that an option might be to dial the
security just a tiny notch down:

Make it _possible_ to have an operating system that did allow
write and execute at the same time, but _also_ make it possible for
an operating system not to allow it - in a way that can't be
subverted. The only way to change that being disabled is to reboot
into a different operating system that does allow it.

John Savard

Quadibloc
Dec 16, 2023, 6:58:07 PM

On Sat, 16 Dec 2023 17:33:48 +0000, Anton Ertl wrote:
> Quadibloc <quad...@servername.invalid> writes:

>>Although I tend to agree with you, I also think that allowing write
>>permission and execute permission at the same time is highly dangerous.

> Security theatre!
>
> Gforth uses mmap(... RWX ...) for the area where it generates native
> code. Now on MacOS on Apple Silicon (MacOS on Intel works fine) that
> mmap() fails, which means that currently the development version of
> Gforth does not work. When I work around this misfeature of MacOS on
> Apple Silicon (by disabling native code generation), Gforth will run
> around a factor of 2 slower on MacOS than on Linux.

This is an argument you need to have with Mitch, rather than
with me. I lack the expertise to really assess how valuable not allowing
write and execute at the same time is; it certainly seems potentially
dangerous, so I went along with Mitch.

At the same time, there might be a demand for having both write and execute
in some cases - such as GForth. So I decided on allowing write/execute to
be disabled - with re-enabling it requiring a reboot. Those who fear
write/execute because of security then have a situation that's almost as
good as the processor not supporting it at all, but those who want it
have the option of using an operating system that runs with it allowed.

The situation on older computers, where being able to write and
execute together was just the norm, makes things easier for malware
to tamper with code. So it's clearly bad. If, instead, you have to
request that new blocks of memory be created with write/execute, then
JIT programs can run, but this can't be used to tamper with existing code
in memory - maybe that's good enough, and anything more is just
security theatre.

But hackers have been able to get around all sorts of security features
in computers.

John Savard

MitchAlsup
Dec 16, 2023, 8:21:09 PM

Quadibloc wrote:

> On Sat, 16 Dec 2023 11:54:40 -0500, EricP wrote:
>> Quadibloc wrote:

>>> Here, making it so that once access to write plus execute is turned
>>> off,
>>> it can't be turned on again without a reboot seems appropriate. (Or
>>> "potentially on", because forcing the system to boot with this enabled
>>> isn't a good idea either. So there would be three states: RWE
>>> available,
>>> but off; RWE on; RWE turned permanently off until reboot. From the
>>> third state, there is no return to the first two.)

>> Did you take a course in customer abusive design or something?
>> Who is it you think you are going to sell this feature to?

> Well, Mitch proposes making it completely and absolutely impossible
> for memory to both contain code that can currently be executed, and
> for it to be writeable, at the same time.

I go even further:: GOT is read-only to the application using dynamic
linking--so malicious applications cannot reprogram GOT into a RoP
attack strategy.

> I recognized the logic behind this. Yet, there are cases when this
> would be a problem. So I felt that an option might be to dial the
> security just a tiny notch down:

Context switching in 10-cycles is the cure to the perceived problems.

Quadibloc
Dec 16, 2023, 11:20:33 PM

On Sat, 16 Dec 2023 23:58:02 +0000, Quadibloc wrote:

> At the same time, there might be a demand for having both write and
> execute in some cases - such as GForth. So I decided on allowing
> write/execute to be disabled - with re-enabling it requiring a reboot.
> Those who fear write/execute because of security then have a situation
> that's almost as good as the processor not supporting it at all, but
> those who want it have the option of using an operating system that runs
> with it allowed.

But what if the next best thing isn't good enough?

> But hackers have been able to get around all sorts of security features
> in computers.

So if the CPU normally gets set, by the BIOS, or by boot-up code in the
OS, into a mode that doesn't allow write plus execute... and can't be
exited short of a reboot...

well, then, the hacker just has to corrupt the BIOS or early boot-up
code so that it never gets into that mode in the first place, if writing
executable code in memory is the next step in taking over the system!

Hmm. How do I take care of this?

Well, I've _already_ noted that *a pin on the CPU* might be used to
cause it to start up in a mode where it will refuse to execute ANY
instructions unless they're in blocks protected by encrypted
checksums!

So perhaps we need to dedicate _another_ pin on the CPU to mandating
another security mode. If the user wishes to only run operating systems
where write plus execute is hard disabled, then... ground that pin
or whatever, and it tells the CPU:

do not allow exiting the checksummed blocks only mode unless write
plus execute has been disabled!

That will stop any attempt to bring back write plus execute to a system
intended to exclude it!

John Savard

Quadibloc
Dec 16, 2023, 11:39:57 PM

On Sun, 17 Dec 2023 04:20:28 +0000, Quadibloc wrote:

> So perhaps we need to dedicate _another_ pin on the CPU to mandating
> another security mode. If the user wishes to only run operating systems
> where write plus execute is hard disabled, then... ground that pin or
> whatever, and it tells the CPU:
>
> do not allow exiting the checksummed blocks only mode unless write plus
> execute has been disabled!

Silly me. This is more complicated than simply using that pin to
completely disable write plus execute.

John Savard

Quadibloc
Dec 16, 2023, 11:42:01 PM

On Sun, 17 Dec 2023 01:17:14 +0000, MitchAlsup wrote:

> Context switching in 10-cycles is the cure to the perceived problems.

Yes, for a new architecture.

For an existing architecture, such as x86/x86-64, though, an OS
call to switch from allowing write to allowing execute is something
that would need to be added to a program using JIT compilation
that was written before the write/execute disable security feature
was added.

John Savard

Anton Ertl
Dec 17, 2023, 3:40:53 AM

Quadibloc <quad...@servername.invalid> writes:
>On Sat, 16 Dec 2023 17:33:48 +0000, Anton Ertl wrote:
>> Quadibloc <quad...@servername.invalid> writes:
>
>>>Although I tend to agree with you, I also think that allowing write
>>>permission and execute permission at the same time is highly dangerous.
>
>> Security theatre!
>>
>> Gforth uses mmap(... RWX ...) for the area where it generates native
>> code. Now on MacOS on Apple Silicon (MacOS on Intel works fine) that
>> mmap() fails, which means that currently the development version of
>> Gforth does not work. When I work around this misfeature of MacOS on
>> Apple Silicon (by disabling native code generation), Gforth will run
>> around a factor of 2 slower on MacOS than on Linux.
>
>This is an argument you need to have with Mitch, rather than
>with me.

This is a newsgroup; he can read and react to my posting as well as
you and others can.

>I lack the expertise to really assess how valuable not allowing
>write and execute at the same time is; it certainly seems potentially
>dangerous, so I went along with Mitch.

The point I was making is that in the case of Gforth, it does not add
any danger that is not already there. If you let an untrusted user
run Gforth, it's like allowing that user to run non-suid binaries that
that user supplied.

Thanks to return-oriented or jump-oriented programming, even normal
applications can do pretty much everything without needing RWX. In
recent times CPU manufacturers have added features for making these
techniques harder to use, but for anything using a virtual-machine
interpreter (e.g., Gforth), these features are unlikely to help,
because such interpreters provide a Turing-complete set of routines that you can
jump to, and typically enough OS functions for an attack.

So better sabotage interpreters, too? Goodbye, CPython, bash etc.
Instead, like Mitch Alsup and Apple suggest, provide a twisty maze of
passages for a JIT compiler writer to follow (different passages for
Apple and My66000), and hope that the JIT compiler writers follow this
passage while the attackers will not.

I don't think this will work, either: E.g., an attacker could
overwrite VM code before it is JIT-compiled, and presto, they again
can do anything the VM can do, like with the interpreter.

Bottom line: W^X does not add security, it adds security theatre.

Anton Ertl
Dec 17, 2023, 3:56:51 AM

Quadibloc <quad...@servername.invalid> writes:
>The situation on older computers, where being able to write and
>execute together was just the norm, makes things easier for malware
>to tamper with code.

It is the norm.

How does it make things easier for attackers to tamper with code? For
15-20 years, malloc() has allocated everything RW (no X), data and BSS
segments are RW, and code segments are RX. So any plain C program
(and stuff based on that) never sees RWX memory.

Those programs that perform mmap(... RWX ...) need it for a reason,
typically for things like JIT compilation. And as I explained in
<2023Dec1...@mips.complang.tuwien.ac.at>, disabling RWX makes
life worse for the authors or users of these programs, but does not
prevent attacks from succeeding.

>If, instead, you have to
>request that new blocks of memory be created with write/execute, then
>JIT programs can run, but this can't be used to tamper with existing code
>in memory - maybe that's good enough, and anything more is just
>security theatre.

That's the usual situation nowadays, no need to cripple the hardware
for it; the exception is MacOS on Apple silicon, and this was done
without crippling the hardware, as demonstrated by Linux and by the
twisty passage that Apple wants JIT authors to follow.

Anton Ertl
Dec 17, 2023, 4:17:43 AM

Quadibloc <quad...@servername.invalid> writes:
>Well, I've _already_ noted that *a pin on the CPU* might be used to
>cause it to start up in a mode where it will refuse to execute ANY
>instructions unless they're in blocks protected by encrypted
>checksums!

Of course freely programmable computers are not in the interest of
everyone. In particular, not in the interests of providers of
"consumer product" type things like the Playstation and the iPhone.
Sony and Apple want to be the gatekeepers for the software that runs
on their customers' computers (but of course, Sony and Apple think
these are their (Sony's and Apple's) computers); and they are charging
a pretty hefty toll at the gate, and don't want anyone to bypass the
gate and supply their customers (whom they want to turn into an
exclusive possession) with software in some other way.

Therefore they have done such things for a long time. And other
business types also like this kind of business model, so we have had
such a misfeature, called "Secure Boot", in PCs for quite some time,
and it is quite a bit more elaborate than what you have outlined.

However, PC customers have not been as willing to let themselves be
chained by an overlord as game console or smartphone customers, so the
overlord has been moving slowly in the PC space, but it has been
moving: For Windows 11 to be installable a TPM (a hardware feature to
make the system airtight) is required, and IIRC for Windows on ARM
Microsoft requires Secure Boot to be enabled.

How many more years before all software that we run on our (not
Microsoft's) PCs has to go through the toll gates of Microsoft?

In any case, what you are imagining has already been imagined many
years ago, in more detail, and it has little to do with disabling RWX.

Thomas Koenig
Dec 17, 2023, 6:14:47 AM

Anton Ertl <an...@mips.complang.tuwien.ac.at> schrieb:
> Quadibloc <quad...@servername.invalid> writes:
>>The situation on older computers, where being able to write and
>>execute together was just the norm, makes things easier for malware
>>to tamper with code.
>
> It is the norm.
>
> How does it make things easier for attackers to tamper with code? For
> 15-20 years, malloc() has allocated everything RW (no X), data and BSS
> segments are RW, and code segments are RX. So any plain C program
> (and stuff based on that) never sees RWX memory.

An example: executable stacks are the norm on Linux, for realizing
trampolines (gcc parlance for pointers to nested functions).
See https://gcc.gnu.org/onlinedocs/gccint/Trampolines.html .

I believe that gcc now uses heap-allocated rwx pages on Darwin,
which I believe is allowed (but I would have to check details
to be sure).

So, an open question: How does one create a pointer to a nested
function in architectures which disallow rwx pages? Standard C
does not allow this, but languages like Fortran do, as does a
gcc extension (which clang does not support, AFAIK).

A Fortran example (sorry for the somewhat lengthy code, but
I wanted to make it self-contained).

module interf
  implicit none
  abstract interface
     function to_zero (x)
       real, intent(in) :: x
       real :: to_zero
     end function to_zero
  end interface
  interface
     function search_zero (a, b, fun)
       import
       real, intent(in) :: a, b
       procedure (to_zero) :: fun
       real :: search_zero
     end function search_zero
  end interface
end module interf

program main
  use interf
  implicit none
  real :: a, b, sol
  read (*,*) a, b
  sol = search_zero (a, b, my_to_zero)
  print *, sol

contains
  function my_to_zero(x)
    real, intent(in) :: x
    real :: my_to_zero
    my_to_zero = cos(x - a - b**2)
  end function my_to_zero
end program main

John Dallman
Dec 17, 2023, 7:26:17 AM

In article <ulml73$10fu3$1...@newsreader4.netcologne.de>,
tko...@netcologne.de (Thomas Koenig) wrote:

> An example: executable stacks are the norm on Linux, for realizing
> trampolines (gcc parlance for pointers to nested functions).
> See https://gcc.gnu.org/onlinedocs/gccint/Trampolines.html .

One can, however, specify at link time that an executable or shared
library has no need for an executable stack, and it's good practice to do
so when possible.

John

Quadibloc
Dec 17, 2023, 8:52:51 AM

On Sun, 17 Dec 2023 08:16:23 +0000, Anton Ertl wrote:

> The point I was making is that in the case of Gforth, it does not add
> any danger that is not already there.

I'm not sure of the relevance of that objection.

Nobody is accusing GForth of having bugs in it. The problem is that
the feature of the CPU that is needed for GForth to run efficiently -
having memory to which both write and execute access is permitted
at the same time - is dangerous.

That doesn't depend on GForth itself being dangerous. It means the
computer is more open to attacks which don't involve the application
GForth at all.

John Savard

Anton Ertl
Dec 17, 2023, 10:26:02 AM

Thomas Koenig <tko...@netcologne.de> writes:
>An example: executable stacks are the norm on Linux, for realizing
>trampolines (gcc parlance for pointers to nested functions).
>See https://gcc.gnu.org/onlinedocs/gccint/Trampolines.html .

Are they the norm?

I did "pmap <pid>|grep rwx" on a number of processes in Debian 11, and
came up empty (no rwx page) on a bash, an xterm, and an xrn process.
I have compiled xrn myself without giving a flag for suppressing rwx.
Finally, the fourth try brought up a process with two rwx memory areas
(out of >3000), but they were "[ anon ]", not "[ stack ]": firefox.
For Firefox with its JavaScript JIT I am not surprised.

I next did

cd /proc
grep -l rwx */maps

and it came up with one other process (out of 170) in addition to
Firefox: Xorg.

>I believe that gcc now uses heap-allocated rwx pages on Darwin,
>which I believe is allowed (but I would have to check details
>to be sure).

On MacOS on Intel, mmap(... RWX ...) works without further ado; for
MacOS on Apple Silicon I have come across some documentation about the
hoops that Apple wants JIT compiler writers to jump through; but given
that I don't plan to, I don't remember any details.

>So, an open question: How does one create a pointer to a nested
>function in architectures which disallow rwx pages? Standard C
>does not allow this, but languages like Fortran do, as does a
>gcc extension (which clang does not support, AFAIK).

The classical approach has been to pass such a thing as two machine
words: one points to the code, the other to the environment. Of
course, in an ABI that's based on the idea of only passing a code
pointer (e.g., what you might design with standard C in mind), you
need to pass the environment pointer through the code pointer, which
is what trampolines give you. I think that if you want to do that,
you either have to implement trampolines (and need to write code for
that), or you have to use a different ABI (e.g., pass a pair, or pass
a data pointer as a function pointer).
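
In C terms, the two-word alternative is just an explicit pair - a sketch:

/* A nested-function value as an explicit (code, environment) pair. */
typedef struct {
    double (*code)(void *env, double x);  /* compiled body of the nested fn */
    void   *env;                          /* pointer into the enclosing frame */
} closure;

static double call_closure(closure f, double x)
{
    return f.code(f.env, x);   /* no executable stack or heap needed */
}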

Anton Ertl
Dec 17, 2023, 10:53:59 AM

Quadibloc <quad...@servername.invalid> writes:
>On Sun, 17 Dec 2023 08:16:23 +0000, Anton Ertl wrote:
>
>> The point I was making is that in the case of Gforth, it does not add
>> any danger that is not already there.
>
>I'm not sure of the relevance of that objection.
>
>Nobody is accusing GForth of having bugs in it. The problem is that
>the feature of the CPU that is needed for GForth to run efficiently -
>having memory to which both write and execute access is permitted
>at the same time - is dangerous.

I don't know what the bugs of Gforth have to do with it.

Concerning the feature, it is not any more dangerous than Gforth
already is (irrespective of bugs and irrespective of RWX pages),
because Gforth already provides all the power that any non-suid
process running as that user already has.

>It means the
>computer is more open to attacks which don't involve the application
>GForth at all.

It isn't. As mentioned earlier, only those processes that need them
get RWX areas. Those processes that do not have RWX areas are
certainly not "more open to attacks". And those that have RWX are
still just as open to attacks as without RWX. I have given the
example of Gforth.

For FireFox with its JavaScript and WebAssembly JIT compilers a
similar reasoning applies. The reasoning is more involved because
unlike Forth, JavaScript and WebAssembly are supposed to be
memory-safe. However, if some buffer overflow vulnerability
(typically in some library written in C) allows writing code in an RWX
area, it probably also allows overwriting the data in a way that
subverts memory safety.

For Xorg I don't know why it has an RWX area, but it has 86 areas
mapped R-X, which surely provide plenty of gadgets for ROP or JOP. It
even includes libLLVM-11, so yes, there is everything there that any
attacker can wish for, so the presence of the RWX area probably
provides no additional danger, either.

John Levine
Dec 17, 2023, 11:01:29 AM

According to Thomas Koenig <tko...@netcologne.de>:
>So, an open question: How does one create a pointer to a nested
>function in architectures which disallow rwx pages?

You make pointers two words, one for the code, one that points to the
display for the dynamic data. This has been a standard compiler
technique since the late 1960s, and indeed the GCC page you pointed to
calls them descriptors. In the fine tradition of GCC overimplementing
everything, they set bits in descriptors to deliberately misalign them
and indirect calls check to see how much indirection to do.
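
Schematically - the actual GCC mechanics differ in detail:

#include <stdint.h>

/* A descriptor is a (code, environment) pair at a deliberately odd
   address; indirect calls test the low bit to decide whether they
   hold plain code or a descriptor needing one more indirection. */
typedef struct { void *code; void *env; } fdesc;

static void *resolve_call(void *fp, void **env_out)
{
    if ((uintptr_t)fp & 1) {              /* misaligned: a descriptor */
        fdesc *d = (fdesc *)((uintptr_t)fp - 1);
        *env_out = d->env;
        return d->code;
    }
    *env_out = NULL;                      /* ordinary function pointer */
    return fp;
}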

For a particularly painful example of nested functions, see Knuth's Man or Boy test

https://rosettacode.org/wiki/Man_or_boy_test

--
Regards,
John Levine, jo...@taugh.com, Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly

Thomas Koenig
Dec 17, 2023, 11:38:44 AM

John Levine <jo...@taugh.com> schrieb:
> According to Thomas Koenig <tko...@netcologne.de>:
>>So, an open question: How does one create a pointer to a nested
>>function in architectures which disallow rwx pages?
>
> You make pointers two words, one for the code, one that points to the
> display for the dynamic data. This has been a standard compiler
> technique since the late 1960s, and indeed the GCC page you pointed to
> calls them descriptors. In the fine tradition of GCC overimplementing
> everything, they set bits in descriptors to deliberately misalign them
> and indirect calls check to see how much indirection to do.

... which yields ABI problems if the ABI is modeled on C (and
on function pointers being convertible to void* and back),
and requires conditional code for _each_ call through a
function pointer, because it needs to check if it is a vanilla
call or a call to a nested function.

> For a particularly painful example of nested functions, see Knuth's Man or Boy test
>
> https://rosettacode.org/wiki/Man_or_boy_test

Yes, that one is quite challenging.

EricP
Dec 17, 2023, 11:39:16 AM

I'm objecting to the notion of forcing a processor to reboot
for any reason, let alone in order to change something that is
under the control of the operating system. There was a time before PCs
when computers normally ran for a year without a reboot and never crashed.

As for Mitch's RW-no-E restriction, I am confident it
will not survive its first encounter with actual customers
(where philosophy collides with income).


Thomas Koenig
Dec 17, 2023, 1:07:56 PM

EricP <ThatWould...@thevillage.com> schrieb:

> I'm objecting to the notion of forcing a processor to reboot
> for any reason, let alone in order to change something that is
> under control of the operating system. There was a time before PC's
> when computers normally ran for a year without reboot and never crashed.

[tkoenig@cfarm135 ~]$ uptime
18:03:12 up 1537 days, 19:33, 1 user, load average: 12,31, 9,59, 8,58
[tkoenig@cfarm135 ~]$ uname -a
Linux cfarm135 4.18.0-80.7.2.el7.ppc64le #1 SMP Thu Sep 12 15:45:05 UTC 2019 ppc64le ppc64le ppc64le GNU/Linux

MitchAlsup
Dec 17, 2023, 1:26:07 PM

Thomas Koenig wrote:

> EricP <ThatWould...@thevillage.com> schrieb:

>> I'm objecting to the notion of forcing a processor to reboot
>> for any reason, let alone in order to change something that is
>> under control of the operating system. There was a time before PC's
>> when computers normally ran for a year without reboot and never crashed.

When we shut down the server at Ross Technologies, it had been up for
7 years and several months. It had 16× the memory it started with,
25× the disk space, and every CPU had been upgraded multiple times
from single-core 33 MHz SUN to quad-core 150 MHz Ross modules.

Quadibloc
Dec 17, 2023, 1:26:42 PM

On Sun, 17 Dec 2023 11:38:43 -0500, EricP wrote:

> As for Mitch's RW-no-E restriction, I am confident it will not survive
> its first encounter with actual customers (where philosophy collides
> with income).

It's true that since it is Mitch who advanced the assertion that it
is a useful restriction here, it is he who people should debate its
needfulness with. However, I do not believe that it was _his idea_,
but rather that this restriction is *current accepted practice*, and
indeed is already implemented on a number of current processor
architectures.

That's why I decided that forcing a reboot to get out of it was a
reasonable compromise between supporting legacy operation and
still having a fairly basic security feature.

John Savard

Scott Lurndal
Dec 17, 2023, 2:31:20 PM

an...@mips.complang.tuwien.ac.at (Anton Ertl) writes:
>Quadibloc <quad...@servername.invalid> writes:
>>On Sat, 16 Dec 2023 17:33:48 +0000, Anton Ertl wrote:
>>> Quadibloc <quad...@servername.invalid> writes:
>>
>>>>Although I tend to agree with you, I also think that allowing write
>>>>permission and execute permission at the same time is highly dangerous.
>>
>>> Security theatre!
>>>
>>> Gforth uses mmap(... RWX ...) for the area where it generates native
>>> code. Now on MacOS on Apple Silicon (MacOS on Intel works fine) that
>>> mmap() fails, which means that currently the development version of
>>> Gforth does not work. When I work around this misfeature of MacOS on
>>> Apple Silicon (by disabling native code generation), Gforth will run
>>> around a factor of 2 slower on MacOS than on Linux.
>>
>>This is an argument you need to have with Mitch, rather than
>>with me.
>
>This is a newsgroup, he can read and react on my posting as well as
>you and others.
>
>>I lack the expertise to really assess how valuable not allowing
>>write and execute at the same time is; it certainly seems potentially
>>dangerous, so I went along with Mitch.
>
>The point I was making is that in the case of Gforth, it does not add
>any danger that is not already there. If you let an untrusted user
>run Gforth, it's like allowing that user to run non-suid binaries that
>that user supplied.

Assuming you're the klapatious who posted the question on stack
overflow, the answer was there - you need to switch on the jit
mode (and have the jit privilege to do what you want).


Scott Lurndal
Dec 17, 2023, 2:36:26 PM

My best was on a 2000 vintage sony vaio laptop running RH8 hosting sendmail, apache and named
on the internet:

Mon May 10 11:27:56 PDT 2010
11:28am up 1528 days, 6:47, 8 users, load average: 0.23, 0.12, 0.05

Got taken out by a power outage where the 10 year old laptop battery
couldn't cover the outage.

Quadibloc
Dec 17, 2023, 2:41:50 PM

On Thu, 14 Dec 2023 12:44:46 +0000, Quadibloc wrote:


> In any case, this has inspired me to think of a feature to add to
> Concertina II. Add a new kind of header block, which must be the very
> first one, which contains an encrypted checksum of the rest of the block
> - which must be valid for the block to be allowed to execute. A mode
> exists where only blocks with such a checksum can be executed.

But given that a common feature of present-day computers is to
disable writing to blocks which contain executable code, what on
Earth does this feature provide that that one does not already
provide?

I finally came out of my mental fog to see the answer!

Let us have the computer able to put itself into a mode in which,
while in privileged mode, it can only execute checksummed code.

This will protect the entire kernel against tampering. But what
you really want is to protect the whole operating system, including
the nonprivileged parts, from tampering.

Well, let the MMU have, in addition to "execute" permission, an
"execute checksummed only" permission to hand out. So
the kernel uses that whenever it allocates blocks containing operating
system code (hmm, is it going to need to do deep object-module
inspection to do this?)... so now these features work together.
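
In rough C, the fetch-permission check might look like the following
(a sketch only; the flag names and bit layout are invented for
illustration):

#include <stdbool.h>
#include <stdint.h>

enum {
    PTE_READ    = 1u << 0,
    PTE_WRITE   = 1u << 1,
    PTE_EXEC    = 1u << 2,   /* execute anything                */
    PTE_EXEC_CK = 1u << 3,   /* execute checksummed blocks only */
};

/* Is an instruction fetch from a page with flags 'pte' allowed?
   'block_ok' is the result of the hardware checksum verification. */
static bool fetch_allowed(uint32_t pte, bool block_ok)
{
    if (pte & PTE_EXEC)
        return true;         /* ordinary executable page        */
    if (pte & PTE_EXEC_CK)
        return block_ok;     /* only verified blocks may run    */
    return false;            /* not executable at all           */
}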

And then, *duh*, the benefit of these checksums over no-write-execute
hits me. Write-execute bans protect the copy of code that's in
memory. Checksums even protect the copy *that's residing on the
computer's disk storage*, which should be of some value.

John Savard

John Levine

unread,
Dec 17, 2023, 2:48:05 PM12/17/23
to
According to Thomas Koenig <tko...@netcologne.de>:
>> You make pointers two words, one for the code, one that points to the
>> display for the dynamic data. This has been a standard compiler
>> technique since the late 1960s, ...

>... which yields ABI problems if the ABI is modeled on C (and on
>function pointers being convertible to void* and back), and
>requires conditional code for _each_ call through a
>function pointer, because it needs to check if it is a vanilla
>call or a call to a nested function.

Yup, after almost 50 years C still suffers from "everything is really
an int". You can use C, or you can use nested functions, but you'll be
sad if you try to do both at the same time.

I never found the lack of nested functions in C to be much of a
problem. In the rare cases where I want to do something like that,
it's not hard to pass what would have been in the display as
explicit arguments.
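
For instance, in C one might write the callback to take its context
explicitly (a sketch; integrate_ctx is a hypothetical integrator that
merely forwards the context pointer to the callback):

#include <math.h>

double integrate_ctx(double (*f)(double x, void *ctx), void *ctx,
                     double lo, double hi);   /* assumed to exist */

static double pow_cb(double x, void *ctx)
{
    double y = *(double *)ctx;   /* what a display slot would have held */
    return pow(x, y);
}

double integrate_pow(double y, double lo, double hi)
{
    return integrate_ctx(pow_cb, &y, lo, hi);
}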

Quadibloc

unread,
Dec 17, 2023, 3:14:41 PM12/17/23
to
On Sun, 17 Dec 2023 19:41:45 +0000, Quadibloc wrote:

> And then, *duh*, the benefit of these checksums over no write-execute
> hits me. Write execute bans protect the copy of code that's in the
> memory. Checksums even protect the copy *that's residing on the
> computer's disk storage*, which should be of some value.

And now my vision of the secure computer of the future should be
clear.

On the front of the computer is a little panel that you can slide
open when the computer is turned off. Behind it is a switch.

The switch turns on checksum protection, or turns it off.

One starts by turning it off.

Then the installation program on the DVD-ROM drive can run
as the first thing the computer boots from. And, indeed,
the ROM BIOS can run, to tell the computer how to work the
DVD-ROM drive.

And, indeed, the first DVD-ROM you use came in the box with
your *motherboard*; it is used to install a BIOS on flash
memory which will be checksummed, so it can run when
checksum protection is turned on.

It asks you for a password, which is hashed to produce the
checksum key, stored securely in the CPU.

Then you use the DVD-ROM for installing the operating system,
with checksum protection still turned off. (Even though
you've rebooted after flashing the BIOS. Just because checksum
protection is turned off doesn't mean that checksummed code
won't run; it only means that non-checksummed code can _also_
run.)

The installer program, read in from the DVD-ROM, asks for
a password - which has to be the same one you used for
installing the BIOS - and so it now copies the object code
of the operating system to the hard drive, with appropriate
checksums, unique to _your computer_, in every block of
instructions.
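
As a toy illustration of the per-block tagging I have in mind
(deliberately *not* a real MAC - a production design would use
something like HMAC - and the fifteen-instruction-words-plus-one-
checksum-word block layout is merely an assumption):

#include <stddef.h>
#include <stdint.h>

#define BLOCK_WORDS 15   /* 15 instruction words + 1 checksum word */

/* Keyed checksum over one instruction block; 'key' is derived from
   the user's password and held securely in the CPU. */
uint32_t block_checksum(const uint32_t insn[BLOCK_WORDS], uint64_t key)
{
    uint64_t h = key;
    for (size_t i = 0; i < BLOCK_WORDS; i++) {
        h ^= insn[i];
        h *= 0x100000001b3ULL;   /* FNV-style mixing, toy strength only */
    }
    return (uint32_t)(h ^ (h >> 32));
}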

Keep the password, because the operating system won't be
able to update itself without it. (So the favored method
of hacking one of these systems will be to impersonate the
update screen to *get* the key that lets one ask the CPU
to write checksums that work!)

And after the operating system is installed, when the
computer turns itself off, turn on checksum protection,
and turn the computer back on.

This basic technique can of course be augmented by using
more secure and lower-overhead techniques whereby the
computer insists on digital signatures on object modules.

Unlike the digital signature techniques now in use, though,
here the unique key is created by the user, and is under the
user's control, so Windows is not privileged above Linux,
as has caused controversy around Secure Boot.

John Savard

Quadibloc

unread,
Dec 17, 2023, 3:28:04 PM12/17/23
to
On Sun, 17 Dec 2023 20:14:37 +0000, Quadibloc wrote:

> The installer program, read in from the DVD-ROM, asks for a password -
> which has to be the same one you used for installing the BIOS -

This assumes that everyone uses the same hash algorithm
to go from a password to a checksum key.

The alternative, where every motherboard maker uses its
own hash algorithm which then gives back a long string
of hex digits that have to be typed in during operating
system installation, does not bear thinking about.

However, a standard algorithm gives hackers a smaller search space
in which to look for bad passwords - and bad passwords are highly
likely when it is so vitally important not to forget a password
that is rarely used!

Hmm. Perhaps the BIOS installer could also think of a number,
which the user must also write down? And then the computer
might have a unique serial number which can be used also,
although that has its issues too.

John Savard

BGB

unread,
Dec 18, 2023, 1:19:46 AM12/18/23
to
Yeah.
Imposing such a restriction at the ISA level is likely a bit too much of
an ask.

Better IMO to have flags to encode various modes, say:
Read Only
Read/Write
Read/Execute
Read/Write/Execute.

For these, one might have, say, a No-Write and No-Execute flag.
Write-Only and Execute-Only are not really common modes.

However, in my case, the MMU has a No-Read flag as well, so these are
technically possible.


For the main page flags, one might also have, say:
No-User (Supervisor Only)
No-Cache
Shared/Global
Valid/Present
...


If doing ACL checks, it may make sense to have a separate set of
Read/Write/Execute flags that are applied during ACL checking.

I guess it might be relevant to devise a scheme for mapping ACL
checking onto a conventional page-walking style MMU design.


Where, say, one associates pairs of 16-bit Keys/ACLIDs with a 16-bit
field in the tables (say: user/group/other read/write/execute flags,
and some mode-control bits for the matching rules).

Though, I guess one possible option could be to Morton-shuffle the two
IDs and then use them as a key into a page-table like structure.

With 16-bit ACLIDs, this would be a 3 level table.
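
Something like the following, say (a quick C sketch of the
bit-interleave for the two 16-bit IDs mentioned above):

#include <stdint.h>

/* Spread the 16 bits of x out to the even bit positions of a
   32-bit word: bit i of x moves to bit 2*i. */
static uint32_t spread16(uint32_t x)
{
    x = (x | (x << 8)) & 0x00FF00FFu;
    x = (x | (x << 4)) & 0x0F0F0F0Fu;
    x = (x | (x << 2)) & 0x33333333u;
    x = (x | (x << 1)) & 0x55555555u;
    return x;
}

/* Morton-shuffle the two IDs into one 32-bit key for the walker. */
uint32_t morton_key(uint16_t aclid, uint16_t krr)
{
    return (spread16(aclid) << 1) | spread16(krr);
}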

For software handling, one can use a more space-efficient structure
(such as a B-Tree). But, a B-Tree walk or similar in hardware likely
isn't viable.

But, then again, ACL space is likely very sparse (if looking at pairs),
where only certain pairs of keys have anything defined, so a
page-table-like structure is unlikely to be particularly space-efficient
in this case.

...



EricP

unread,
Dec 18, 2023, 1:15:10 PM12/18/23
to
BGB wrote:
> On 12/17/2023 10:38 AM, EricP wrote:
>>
>> As for Mitch's RW-no-E restriction, I am confident it
>> will not survive its first encounter with actual customers
>> (where philosophy collides with income).
>>
>
> Yeah.
> Imposing such a restriction at the ISA level is likely a bit too much of
> an ask.
>
> Better IMO to have flags to encode various modes, say:
> Read Only
> Read/Write
> Read/Execute
> Read/Write/Execute.
>
> For these, one might have, say, a No-Write and No-Execute flag.
> Write-Only and Execute-Only are not really common modes.
>
> However, in my case, the MMU has a No-Read flag as well, so these are
> technically possible.
>
>
> For the main page flags, one might also have, say:
> No-User (Supervisor Only)
> No-Cache
> Shared/Global
> Valid/Present
> ...

If one has super and User mode, and R,W,E enables, that's 6 PTE bits.
The PTE is very cramped so there is a big incentive to save any bits.
We can get that down to 3 PTE bits by throwing out the silly combinations,
Write-only, Execute-only, Super:NoAccess+User:RWE,...

The P-Present bit suffices for full No-Access pages.

One could hard code 8 reasonable choices from the 64 possible but
my preferred approach now is to use the 3-bit access control field
as an index to an 8-entry table with the full 6 bits for each row,
and let the OS decide which 8 combinations of Super and User accesses
make sense for it.
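
In C terms, roughly (a sketch: the field position and the sample rows
are made up, and a real MMU would hold the table in a control register
loaded by the OS rather than in a static array):

#include <stdbool.h>
#include <stdint.h>

struct perm {
    bool sr, sw, se;   /* supervisor read/write/execute */
    bool ur, uw, ue;   /* user read/write/execute       */
};

/* Loaded by the OS with whichever 8 combinations it finds useful. */
static struct perm perm_table[8] = {
    [0] = {0,0,0, 0,0,0},   /* no access (beyond P bit) */
    [1] = {1,1,0, 0,0,0},   /* kernel data              */
    [2] = {1,0,1, 0,0,0},   /* kernel code              */
    [3] = {1,1,0, 1,0,0},   /* S:RW, U:R                */
    [4] = {1,1,0, 1,1,0},   /* ordinary data            */
    [5] = {1,0,1, 1,0,1},   /* ordinary code            */
    [6] = {1,1,0, 1,0,1},   /* S:RW, U:RE               */
    [7] = {1,1,1, 1,1,1},   /* everything               */
};

static struct perm pte_perms(uint64_t pte)
{
    return perm_table[(pte >> 1) & 7];   /* 3-bit field above P */
}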

I'd also use a 3-bit PTE field to index an 8 entry table for cache control,
as x86 does. Though it also has to deal with legacy issues so maybe
2-bits would suffice here (depends on whether one wants to have
write-allocate vs write-noallocate as distinct selectable options).



BGB

unread,
Dec 18, 2023, 1:54:46 PM12/18/23
to
As long as the total stays under 12 bits or so, probably fine...

Say, my case:
* (11): PR.U3 (User 3, OS / Virtual Memory)
* (10): PR.U2 (User 2, OS / Virtual Memory)
* ( 9): PR.S1 (Size 1) / PR.U1 (User 1, OS / Virtual Memory)
* ( 8): PR.S0 (Size 0) / PR.U0 (User 0, OS / Virtual Memory)
* ( 7): PR.NU (Not User Accessible)
* ( 6): PR.NX (Execute Disable)
* ( 5): PR.NW (Write Disable)
* ( 4): PR.NR (Read Disable)
* ( 3): PR.NC, Page Is Volatile / Non-Cachable
* ( 2): PR.D, (Dirty)
* ( 1): PR.V1, Page Valid Mode / Shared
* ( 0): PR.V0, Page Valid Mode / Present

There are not separate user and supervisor bits in this case.

There are U4 and U5, but these only exist with 16K pages (or U6 and U7
with 64K pages).

The Dirty flag is effectively another User bit in practice, as my MMU
doesn't actually use it for anything (it is instead used so that the OS
page-management code can keep track of whether a page may be dirty, by
effectively implementing a write barrier).



But, some special cases, such as "User Read Only" may be encoded:
* NC Set, NU Clear:
** NX Clear and NW Set: Secure Execute Mode (*1)
** NX Set and NW Set: User Read Only


*1: These pages allow userland to "temporarily" gain supervisor-mode
instructions, but:
Only if certain magic instructions are invoked on entering and leaving
these pages (trying to execute these instructions elsewhere is a fault,
trying to leave without first exiting the mode is a fault,
trying to use supervisor instructions without first entering
secure-execute mode via the special instructions is a fault, ...).


For page-table entries:
(63:48) holds the ACLID.

As can be noted, the ACLID and KRR register are used for additional page
access checks. The ACL handling is generally done using a separate
table. If a KRR key is not in the ACL Cache, it will raise an ACL-MISS
fault, and the ISR will load it from the table.

The ACL Cache is currently a very small but fully-associative cache
(needs at least 4-way, 8-way is better, but 8 entries is generally all
that is needed for this cache, as there tend to be only a few ACLs in
use at any given time).


Note the ACLID is not the ASID, where the ASID is held in bits (63:48)
of the TTB register (Page Table Base), with the low-order bits of TTB
mostly used to indicate the page-table layout.



> One could hard code 8 reasonable choices from the 64 possible but
> my preferred approach now is to use the 3-bit access control field
> as an index to an 8-entry table with the full 6 bits for each row,
> and let the OS decide which 8 combinations of Super and User accesses
> make sense for it.
>
> I'd also use a 3-bit PTE field to index an 8 entry table for cache control,
> as x86 does. Though it also has to deal with legacy issues so maybe
> 2-bits would suffice here (depends on whether one wants to have
> write-allocate vs write-noallocate as distinct selectable options).
>

Dunno...

>
>

MitchAlsup

unread,
Dec 18, 2023, 2:58:32 PM12/18/23
to
EricP wrote:

>
> If one has super and User mode, and R,W,E enables, that's 6 PTE bits.
> The PTE is very cramped so there is a big incentive to save any bits.
> We can get that down to 3 PTE bits by throwing out the silly combinations,
> Write-only, Execute-only, Super:NoAccess+User:RWE,...

If one uses a different page table for user and supervisor, you only
need those 3-bits. In my case Guest OS has its own table and its own
ASID completely separate from application. ASID plays the part of G
(global).

Quadibloc

unread,
Dec 18, 2023, 10:30:58 PM12/18/23
to
On Sun, 17 Dec 2023 19:31:15 +0000, Scott Lurndal wrote:

> Assuming you're the klapatious who posted the question on Stack
> Overflow,

I presume that's a slightly misspelled reference to Klapaucius,
a robot (along with another, named Trurl) from Stanislaw Lem's _Cyberiad_.

John Savard

Thomas Koenig

unread,
Dec 19, 2023, 9:37:43 AM12/19/23
to
John Levine <jo...@taugh.com> schrieb:
> According to Thomas Koenig <tko...@netcologne.de>:
>>> You make pointers two words, one for the code, one that points to the
>>> display for the dynamic data. This has been a standard compiler
>>> technique since the late 1960s, ...
>
>... which yields ABI problems if the ABI is modeled on C (and on
>function pointers being convertible to void* and back), and
>requires conditional code for _each_ call through a
>function pointer, because it needs to check if it is a vanilla
>call or a call to a nested function.
>
> Yup, after almost 50 years C still suffers from "everything is really
> an int". You can use C, or you can use nested functions, but you'll be
> sad if you try to do both at the same time.

It becomes more interesting if you want to design an ABI (or an
ABI together with an ISA) which caters well to them, or have a
C-based ABI and try to fit them in.

> I never found the lack of nested functions in C to be much of a
> problem. In the rare cases where I want to do something like that,
> it's not hard to pass what would have been in the display as
> explicit arguments.

Some languages have it in their specification (Fortran and, I
believe, Ada), so they need to be supported, and should
be supported well.

John Levine

unread,
Dec 19, 2023, 10:17:43 AM12/19/23
to
According to Thomas Koenig <tko...@netcologne.de>:
>> I never found the lack of nested functions in C to be much of a
>> problem. In the rare cases where I want to do something like that,
>> it's not hard to pass what would have been in the display as
>> explicit arguments.
>
>Some languages have it in their specification (Fortran and, I
>believe, Ada), so they need to be supported, and should
>be supported well.

If you want to support nested functions, it's not hard, you make
function pointers two words, the code pointer and the static chain.

As you note, what's hard is trying to retrofit that into a C model
where all pointers are the same. Then you end up with kludges like
trampolines and misaligned descriptor pointers.
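
Concretely, a sketch of such a two-word pointer in C terms (the names
are invented):

struct fatptr {
    int  (*code)(void *env, int arg);  /* callee receives its environment */
    void  *env;                        /* static chain: enclosing frame   */
};

static int call_fat(struct fatptr fp, int arg)
{
    return fp.code(fp.env, arg);  /* every indirect call passes the chain */
}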

Scott Lurndal

unread,
Dec 19, 2023, 12:11:48 PM12/19/23
to
John Levine <jo...@taugh.com> writes:
>According to Thomas Koenig <tko...@netcologne.de>:
>>> I never found the lack of nested functions in C to be much of a
>>> problem. In the rare cases where I want to do something like that,
>>> it's not hard to pass what would have been in the display as
>>> explicit arguments.
>>
>>Some languages have it in their specification (Fortran and, I
>>believe, Ada), so they need to be supported, and should
>>be supported well.
>
>If you want to support nested functions, it's not hard, you make
>function pointers two words, the code pointer and the static chain.

Why? Nested functions need access to containing function local _data_,
not code. The actual generated code is independent of the containing
function.

Gcc has an extension that supports nested functions.

https://gcc.gnu.org/onlinedocs/gcc/Nested-Functions.html


Niklas Holsti

unread,
Dec 19, 2023, 12:38:50 PM12/19/23
to
Yes, Ada supports nested subprograms, and also supports nesting for
other kinds of things such as tasks/threads.

But coding standards for high-integrity Ada applications usually forbid
nested subprograms, typically because they are difficult to unit-test,
and early in the history of Ada there were voices suggesting the same
generally, for example the 1980 paper "Nesting in Ada programs is for
the birds", https://dl.acm.org/doi/10.1145/800004.807944. From the
abstract: "Given a data abstraction construct like the Ada package and
in light of current thoughts on programming methodology, we feel that
nesting is an anachronism."

In my own (non-critical) Ada programs, small nested subprograms are not
unusual, but pointers to them are very rare (I don't recall a single
case of it).

Anton Ertl

unread,
Dec 19, 2023, 1:29:12 PM12/19/23
to
Niklas Holsti <niklas...@tidorum.invalid> writes:
>In my own (non-critical) Ada programs, small nested subprograms are not
>unusual, but pointers to them are very rare (I don't recall a single
>case of it).

That's interesting, because a major reason for using nested functions
(or, more generally, closures), is if you need to pass a function with
a given interface, but have to pass additional data to that function.

E.g., consider a numerical integration function integrate(f,lo,hi),
which expects a function with one parameter f(x), and you want to
integrate x^y (pow(x,y) in C), where y is, say, passed as parameter.
Then you would do something like:

double intpowy(double y, double lo, double hi)
{
  double f(double x)   /* nested function; not standard C (gcc extension) */
  {
    return pow(x,y);
  }
  return integrate(f,lo,hi);
}

Niklas Holsti

unread,
Dec 19, 2023, 4:35:41 PM12/19/23
to
On 2023-12-19 20:12, Anton Ertl wrote:
> Niklas Holsti <niklas...@tidorum.invalid> writes:
>> In my own (non-critical) Ada programs, small nested subprograms are not
>> unusual, but pointers to them are very rare (I don't recall a single
>> case of it).
>
> That's interesting, because a major reason for using nested functions
> (or, more generally, closures), is if you need to pass a function with
> a given interface, but have to pass additional data to that function.


In Ada, instead of passing a (nested) function pointer to some service
that needs such a parameter, one can implement the service as a generic
program unit, where the function to be passed is given as a generic
parameter, and instantiate that generic locally in the nested context,
thus binding the instance with the nested function.

Depending on the way the compiler implements such generics (as macros,
or with shared code) this may or may not be implemented by passing a
function pointer, but even if it is, the compiler can use a "fat
pointer" for this generic case, without being forced to use a fat
pointer for all function pointers.


> E.g., consider a numerical integration function integrate(f,lo,hi),
> which expects a function with one parameter f(x), and you want to
> integrate x^y (pow(x,y) in C), where y is, say, passed as parameter.


That is the canonical example, yes. But it happens I have never needed
to do numerical integration in my Ada programs. I do have some cases
where I have used locally instantiated generics with generic function
parameters in the way I described above. In Ada the integration example
would be declared thusly:

     generic
        with function F (X : Float) return Float;
     function Integrate (Lo, Hi : Float) return Float;

although one would usually not limit it to a specific number type such
as Float, but would make this type also a generic (type) parameter.
IIRC, the ability to define numerical integration as a generic
subprogram, with the function to be integrated as a generic parameter,
was one of the reasons presented for omitting function pointers entirely
from original Ada (Ada 83).

Thomas Koenig

unread,
Dec 19, 2023, 4:58:56 PM12/19/23
to
Niklas Holsti <niklas...@tidorum.invalid> schrieb:

>> E.g., consider a numerical integration function integrate(f,lo,hi),
>> which expects a function with one parameter f(x), and you want to
>> integrate x^y (pow(x,y) in C), where y is, say, passed as parameter.
>
>
> That is the canonical example, yes. But it happens I have never needed
> to do numerical integration in my Ada programs. I do have some cases
> where I have used locally instantiated generics with generic function
> parameters in the way I described above. In Ada the integration example
> would be declared thusly:
>
> generic
> with function F (X : Float) return Float;
> function Integrate (Lo, Hi : Float) return Float;
>
> although one would usually not limit it to a specific number type such
> as Float, but would make this type also a generic (type) parameter.
> IIRC, the ability to define numerical integration as a generic
> subprogram, with the function to be integrated as a generic parameter,
> was one of the reasons presented for omitting function pointers entirely
> from original Ada (Ada 83).

Interesting, thanks!

Just one remark: Making numerical routines generic can be a bit
problematic if constants depend on the accuracy of a type; things
like tolerances for checks for convergence, number of coefficients
in a polynomial (or Chebyshev) evaluation, quantities to divide by
for numeric differentiation etc.

MitchAlsup

unread,
Dec 19, 2023, 5:42:03 PM12/19/23
to
whether compiler constant arithmetic has the same numeric properties as
compiled code instruction sequences,

John Levine

unread,
Dec 19, 2023, 5:53:15 PM12/19/23
to
According to Scott Lurndal <sl...@pacbell.net>:
>>If you want to support nested functions, it's not hard, you make
>>function pointers two words, the code pointer and the static chain.
>
>Why? Nested functions need access to containing function local _data_,
>not code. The actual generated code is independent of the containing
>function.

Sorry, I meant pointers to nested functions. You are quite right that
if you can't take the address of an internal function, the code is
straightforward.

MitchAlsup

unread,
Dec 19, 2023, 8:51:00 PM12/19/23
to
John Levine wrote:

> According to Scott Lurndal <sl...@pacbell.net>:
>>>If you want to support nested functions, it's not hard, you make
>>>function pointers two words, the code pointer and the static chain.

Quibble:

The conventional understanding is that a word is 32 bits*, whereas by now
all address spaces are 64-bit capable. (*) Except in x86 land, where a
word remains 16 bits (crikey), which makes "word" even less useful
when describing pointers.

>>
>>Why? Nested functions need access to containing function local _data_,
>>not code. The actual generated code is independent of the containing
>>function.

> Sorry, I meant pointers to nested functions. You are quite right that
> if you can't take the address of an internal function, the code is
> straightforward.

Say you have an array of pointers to nested recursive functions;
each of these pointers was initialized at some depth of recursion,
and you run into an indexed dereference of one of those pointers::

result = fun_p[k](arguments);

How do you figure out where on the call-stack the scope of this particular
function being called resides ??

Then: is it possible to pass this nested function pointer through a
void * pointer and back to a nested function pointer and retain the
ability to access the function in the scope it was initialized ??

void *p = fun_p[k];
result = p(arguments);

Niklas Holsti

unread,
Dec 20, 2023, 3:43:52 AM12/20/23
to
I certainly agree that the numerical analysis involved in implementing
something type-generic to a known accuracy is hard, and probably beyond
my skills. However, Ada provides a number of "attributes" that can be
used to query the properties of a floating-point type, including a
generic formal type, for example S'Model_Epsilon which shows the
accuracy of the floating-point type S. In principle, a generic algorithm
could adjust itself according to these properties.

For the full list of these attributes and their definitions, see
http://www.ada-auth.org/standards/22rm/html/RM-A-5-3.html and
http://www.ada-auth.org/standards/22rm/html/RM-G-2-2.html.

Niklas Holsti

unread,
Dec 20, 2023, 3:51:30 AM12/20/23
to
Ada compilers are required to evaluate typeless constant arithmetic to
unbounded precision (as "bignums"). For example, the constant pi is
defined thusly in the standard package Ada.Numerics:

    Pi : constant :=
       3.14159_26535_89793_23846_26433_83279_50288_41971_69399_37511;

and all the digits are significant in any computation involving Pi and
other typeless constants.

Computation with typed constants is required to use the operations of
that type, so should match what the target code does.

David Brown

unread,
Dec 20, 2023, 9:10:09 AM12/20/23
to
On 19/12/2023 22:35, Niklas Holsti wrote:
> On 2023-12-19 20:12, Anton Ertl wrote:
>> Niklas Holsti <niklas...@tidorum.invalid> writes:
>>> In my own (non-critical) Ada programs, small nested subprograms are not
>>> unusual, but pointers to them are very rare (I don't recall a single
>>> case of it).
>>
>> That's interesting, because a major reason for using nested functions
>> (or, more generally, closures), is if you need to pass a function with
>> a given interface, but have to pass additional data to that function.
>
>
> In Ada, instead of passing a (nested) function pointer to some service
> that needs such a parameter, one can implement the service as a generic
> program unit, where the function to be passed is given as a generic
> parameter, and instantiate that generic locally in the nested context,
> thus binding the instance with the nested function.
>
> Depending on the way the compiler implements such generics (as macros,
> or with shared code) this may or may not be implemented by passing a
> function pointer, but even if it is, the compiler can use a "fat
> pointer" for this generic case, without being forced to use a fat
> pointer for all function pointers.
>

That sounds very like lambdas in C++. It is much better than the way
nested functions in (extended) C sometimes have to be implemented, where
run-time generated trampoline functions on the stack are used to hide
the "fat" when a fat pointer would be safer and more efficient.


EricP

unread,
Dec 20, 2023, 11:27:07 AM12/20/23
to
Ok but all major OS's view the current process address space (P-space)
as directly addressable from system space (S-space), so the OS can reach
directly into the current process to read or write memory to pass args
to/from syscalls or deliver signals, etc.

To support this view of virtual memory on that platform all OS's would
need to maintain two parallel page tables to map each P-space twice,
one for user mode and one for super mode.

MitchAlsup

unread,
Dec 20, 2023, 11:41:05 AM12/20/23
to
What happens when the hosting machine makes a rounding error that the
target machine would not ?? {{Say, for example, the Ada routine changes
the rounding mode as it goes about its business, but the host compiler
does not see the rounding mode being changed ??}}

MitchAlsup

unread,
Dec 20, 2023, 11:41:06 AM12/20/23
to
EricP wrote:

> MitchAlsup wrote:
>> EricP wrote:
>>
>>>
>>> If one has super and User mode, and R,W,E enables, that's 6 PTE bits.
>>> The PTE is very cramped so there is a big incentive to save any bits.
>>> We can get that down to 3 PTE bits by throwing out the silly
>>> combinations,
>>> Write-only, Execute-only, Super:NoAccess+User:RWE,...
>>
>> If one uses a different page table for user and supervisor, you only
>> need those 3-bits. In my case Guest OS has its own table and its own
>> ASID completely separate from application. ASID plays the part of G
>> (global).

> Ok but all major OS's view the current process address space (P-space)
> as directly addressable from system space (S-space), so the OS can reach
> directly into the current process to read or write memory to pass args
> to/from syscalls or deliver signals, etc.

My 66000 provides this. Guest OS can reach down into application VaS,
application cannot reach up into Guest OS VaS. When Guest OS accesses
its own VaS it uses its own ASID, when accessing application it uses
application ASID. The same analogue occurs in Guest HV reaching down
into Guest OS VaS. When Guest HV accesses its own VaS it uses 1-level
translation and its own ASID; when accessing Guest OS it uses 2-level
translation and Guest OS ASID.

Switching between levels is fast because all the required information
is loaded at all times. Switching contexts is fast because only 1
register needs to be written, the data is obtained as if the control
registers were a cache of the data at the written address.

Stephen Fuld

unread,
Dec 20, 2023, 1:04:55 PM12/20/23
to
I believe Pascal, back in the 1970s, implemented nested functions, though
I don't know how any specific implementation accomplished it. ISTM
this is a nice mechanism for program development, allowing "iterative"
development.


--
- Stephen Fuld
(e-mail address disguised to prevent spam)

EricP

unread,
Dec 20, 2023, 1:13:57 PM12/20/23
to
How do these two ASIDs allow 3 bits in a PTE to grant different
access for the same page to user and OS? For example S:RW,U:RE

Scott Lurndal

unread,
Dec 20, 2023, 1:25:28 PM12/20/23
to
ARMv8 maps each half separately. There is a separate translation table
for each half of the virtual address space, with the high half holding
the kernel code and data, and the low half holding the user portion.

Each half has its own translation table base register, and each half
can use a different ASID.

Context switch between processes simply requires reloading the
table base register for the lower half (TTBR0_EL1). No other
TLB maintenance is required unless the 16-bit ASID wraps.
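
In rough C, the switch is just (a sketch, assuming TCR_EL1.A1=0 so the
ASID lives in TTBR0_EL1[63:48], with the table base in [47:1]):

#include <stdint.h>

static inline void switch_user_space(uint64_t table_pa, uint16_t asid)
{
    uint64_t ttbr0 = ((uint64_t)asid << 48)              /* ASID  */
                   | (table_pa & 0x0000FFFFFFFFFFFEull); /* BADDR */
    __asm__ volatile("msr ttbr0_el1, %0 \n\t isb"
                     :: "r"(ttbr0) : "memory");
}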

Anton Ertl

unread,
Dec 20, 2023, 1:57:46 PM12/20/23
to
Stephen Fuld <sf...@alumni.cmu.edu.invalid> writes:
>I believe Pascal, back in the 1970s implemented nested functions, though
>I don't know how any specific implementation accomplished it.

AFAIK Pascal implementations all just use static link chains to the
enclosing scopes. So you access an outer local by following the link
chain as many levels as you cross scopes. Pascal allows passing
nested functions/procedures as parameters, but does not support
variables containing them, or any other first-class handling of them.
This avoids the upward funarg problem (as the Lispers call it). These
parameters consist of the code pointer and the static link pointer.

Modula-2 has nested procedures, and allows putting procedures in
variables, fields, etc., but eliminates the upward funarg problem by
allowing only global-level procedures, not nested procedures, to be
stored in variables. So it went closer to C.

Oberon originally had nested procedures, and AFAIK similar rules to
Modula-2. When I asked Wirth in 2020 whether he had a good example for
the use of nested procedures (I thought he would have, given that he kept
this feature for so many years), he told me that he had removed nested
functions in 2013 or so.

MitchAlsup

unread,
Dec 20, 2023, 4:11:10 PM12/20/23
to
Anton Ertl wrote:

> Stephen Fuld <sf...@alumni.cmu.edu.invalid> writes:
>>I believe Pascal, back in the 1970s implemented nested functions, though
>>I don't know how any specific implementation accomplished it.

> AFAIK Pascal implementations all just use static links chains to the
> enclosing scopes. So you access an outer local by following the link
> chain as many levels as you cross scopes.

How does Pascal do this counting when there are recursive frames on
the stack ??

MitchAlsup

unread,
Dec 20, 2023, 4:16:57 PM12/20/23
to
Oddly enough, I did not know this until reading this paragraph - and
yet, I converged on apparently the same scheme.

Anton Ertl

unread,
Dec 20, 2023, 4:47:28 PM12/20/23
to
mitch...@aol.com (MitchAlsup) writes:
>Anton Ertl wrote:
>
>> Stephen Fuld <sf...@alumni.cmu.edu.invalid> writes:
>>>I believe Pascal, back in the 1970s implemented nested functions, though
>>>I don't know how any specific implementation accomplished it.
>
>> AFAIK Pascal implementations all just use static link chains to the
>> enclosing scopes. So you access an outer local by following the link
>> chain as many levels as you cross scopes.
>
>How does Pascal do this counting when there are recursive frames on
>the stack ??

The static scope level is independent of recursion. Note that this
kind of implementation has a static link in addition to the dynamic
link.

E.g., consider

procedure a(...)
   procedure b(...)
      procedure c(...)
      begin
         b(...)
      end;
   begin
      c(...);
   end;
begin
   b(...);
end;

After a few levels of recursion you have dynamic and static chains like this:

dynamic: c->b->c->b->c->b->a
static: --^--------------^

And upon the next call to b:

dynamic: b->c->b->c->b->c->b->a
static: --------------------^

MitchAlsup

unread,
Dec 20, 2023, 5:01:07 PM12/20/23
to
I was planning on ignoring access rights when a more privileged thread
accesses less privileged data.

Stefan Monnier

unread,
Dec 20, 2023, 5:27:38 PM12/20/23
to
Anton Ertl [2023-12-20 21:37:44] wrote:
> mitch...@aol.com (MitchAlsup) writes:
>>Anton Ertl wrote:
>>> Stephen Fuld <sf...@alumni.cmu.edu.invalid> writes:
>>>>I believe Pascal, back in the 1970s implemented nested functions, though
>>>>I don't know how any specific implementation accomplished it.
>>> AFAIK Pascal implementations all just use static link chains to the
>>> enclosing scopes. So you access an outer local by following the link
>>> chain as many levels as you cross scopes.
>>How does Pascal do this counting when there are recursive frames on
>>the stack ??
> The static scope level is independent of recursion. Note that this
> kind of implementation has a static link in addition to the dynamic
> link.

Maybe I'm missing something, but I think Pascal's display is "orthogonal"
to the issue discussed before of how to stuff a code pointer
together with a pointer to its associated data.

Pascal's display is one possible representation of closures, but it
doesn't save you from the need to use a kind of "fat pointer" for the
first-class functions.

What saves Pascal is that first-class functions aren't first class at
all, so Pascal can safely and easily represent them any way it likes,
including as a pair of two values (a pointer to the code, and a pointer
to a stack frame, typically).

It's harder in C because some code wants to be able to convert
a function pointer to a `void*` and back, so you usually want to make it
fit into the size of a normal pointer.


Stefan

David Brown

unread,
Dec 20, 2023, 5:51:44 PM12/20/23
to
Converting between function pointers and data pointers (like void*) is
undefined behaviour in C. Of course, some people want to do that
regardless, and on many platforms you can probably get away with it if
you are lucky. On some platforms, however, data pointers and function
pointers are different sizes. (There are even some platforms where
pointers to different types of data are different sizes.)

But you /are/ allowed to convert back and forth between different
function pointer types in C, which means function pointers are always
the same size. And for efficiency, that would generally be the smallest
size that works for normal functions - i.e., typically just the address
in memory of the function's code.

So what saves C is that nested functions are not part of the language!
And when you have extensions that support nested functions, such as
supported by gcc, if the function really needs a fat pointer you have a
trampoline on the stack to hide that and make it look like a normal
"thin" pointer.


Scott Lurndal

unread,
Dec 20, 2023, 7:58:34 PM12/20/23
to
ARMv8 has PAN (privileged access never) which can be used to prevent
access by privileged code to unprivileged code and data. The bit
(in the PSR) can be set and reset by the kernel, so the kernel will
explicitly reset it when accessing user data, and set it otherwise.

If the kernel accesses nonprivileged data while PAN is set, it will fault.


https://developer.arm.com/documentation/102376/0100/Permissions-attributes
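
In kernel C this ends up looking roughly like the following (a sketch;
the MSR to the PAN PSTATE field is an ARMv8.1 feature):

/* Bracket an intentional kernel access to user memory. */
static inline void uaccess_enable(void)
{
    __asm__ volatile("msr pan, #0" ::: "memory");  /* clear PSTATE.PAN */
}

static inline void uaccess_disable(void)
{
    __asm__ volatile("msr pan, #1" ::: "memory");  /* set PSTATE.PAN */
}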

George Neuner

unread,
Dec 21, 2023, 12:08:40 AM12/21/23
to
On Wed, 20 Dec 2023 21:09:44 +0000, mitch...@aol.com (MitchAlsup)
wrote:

>Anton Ertl wrote:
>
>> Stephen Fuld <sf...@alumni.cmu.edu.invalid> writes:
>>>I believe Pascal, back in the 1970s implemented nested functions, though
>>>I don't know how any specific implementation accomplished it.
>
>> AFAIK Pascal implementations all just use static link chains to the
>> enclosing scopes.

@Anton: implementation dependent - some use displays instead.


>> So you access an outer local by following the link
>> chain as many levels as you cross scopes.
>
>How does Pascal do this counting when there are recursive frames on
>the stack ??

To access non-local, non-global things (variables, functions, etc.)
nested functions require that the definition scopes be tracked through
the chain of function calls.

To this end there are 2 different kinds of links: by convention
referred to as 'static' and 'dynamic'.

dynamic = who called who
static = who defined who

The stack frame in a language - such as Pascal - having nested
functions contains both kinds of links. The dynamic link is the same
as in C: it just points to the immediate caller. The static link,
however, points to the last created frame of the function that
/defined/ the currently executing function.


[view with a fixed width font]

Expanding a bit on Anton's example, consider:

A defines B and C
B defines D
C defines E

A runs
A calls B
B calls D
D calls C
C calls E
E calls E
E calls A
A calls B
B calls D
D calls B

dynamic: A <- B <- D <- C <- E <- E <- A <- B <- D <- B
static : ^
: ^-----
: ^-----
: ^---------------
: ^-----
: ^----------
: ^
: ^-----
: ^-----
: ^---------------

You can see the dynamic links simply reflect the call chain. The
static links however, jump around and appear to form multiple chains.

The top level function A has no enclosing scope, so its static link is
to self. However, B, C, D and E all can access things defined in
enclosing scopes by following the chain of static links: e.g., a D
instance has a link to the last B instance, and that B instance has a
link to the last A instance.



Obviously there has to be a way to FIND the last instance of a scope
as it may be deep in the stack. The /simplest/ way is to mark each
frame with a small integer identifying their static nest level: e.g.,
A = 1, B = 2, C = 2, D = 3, E = 3



A 'display' is another way to implement non-local, non-global access
which uses an array of frame pointers rather than a 'chain' of static
links woven through the stack frames. Accessing an enclosing scope is
simply an indirection through display[target scope].

When entering a function at nest level N, the value of display[N] is
saved in the frame and display[N] set to point to the last instance of
the enclosing level N-1 frame. Exiting the function, display[N] is
restored from the saved value.
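
In C-like pseudo-code, the entry/exit protocol looks about like this
(a sketch, using the common variant in which display[N] points at the
current level-N frame; real compilers emit this in the prologue and
epilogue):

enum { MAX_NEST = 8 };
static void *display[MAX_NEST + 1];   /* display[1..MAX_NEST] in use  */

struct b_frame { int x; };            /* locals of level-2 function B */

static void B(void)                   /* B is defined at nest level 2 */
{
    struct b_frame frame;
    void *saved = display[2];         /* save the previous display[2] */
    display[2] = &frame;              /* repoint at our own frame     */

    frame.x = 42;
    /* ... a nested level-3 function reaches B's locals as
       ((struct b_frame *)display[2])->x ... */

    display[2] = saved;               /* restore on exit              */
}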

This makes /maintaining/ a display a bit more costly than using static
links - i.e., a static link can simply be abandoned. However, if there
is significant use of things from multiple deep scopes [where 'deep'
is 2 or more scopes away], the display will be faster to use.

Only display elements at or (numerically) lower than the nest level of
the current function can (legally) be used - e.g., a function at level
3 can use display[1] .. display[3], whereas using display[4] or higher
would constitute an error.


So, assuming the same example as above:

A defines B and C
B defines D
C defines E

display[ 1 2 3 4 5 6 ]
A runs [ A . . . . . ] L = 1
A calls B [ A A . . . . ] L = 2
B calls D [ A A B . . . ] L = 3
D calls C [ A A b . . . ] L = 2
C calls E [ A A C . . . ] L = 3
E calls E [ A A C . . . ] L = 3
E calls A [ A a c . . . ] L = 1
A calls B [ A A c . . . ] L = 2
B calls D [ A A B . . . ] L = 3
D calls B [ A A b . . . ] L = 2

'L' indicates the current run level (after the call). Display values
at nest levels greater than the current run level are shown as
lowercase: they ARE still in the array, but (theoretically) are not
accessible from the currently running function.

Although it sort of /looks/ like the display is a stack, the elements
at indices greater than the current nest level still are in the array
and MAY still have meaning.

Unfortunately, there is a limit to how complex an example can be shown
here ... a far more complicated example would be required to show
saving and restoring values.


Using displays also interacts with threading: the display is part of
the thread state. However, thread local storage works fine for this -
in use displays tend to be cached quite effectively and so there is
little or no need for special hardware to handle them.


You might ask "who makes multiple deep accesses to different frames?"
That's a fair question to which I don't have a good answer. Displays
fell out of favor ... not because they were troublesome, but rather
because better compilation methods were developed to turn nested
functions into flat closures (which could be in heap or on stack) and
these mostly removed the need for either displays or static links.

YMMV.


>> Pascal allows passing
>> nested functions/procedures as parameters, but does not support
>> variables containing them

@Anton: again, implementation dependent. Standard Pascal (ISO and
ANSI) does not recognize 'function pointer' variables, but some
extended Pascals did.


>> , or any other first-class handling of them.
>> This avoids the upward funarg problem (as the Lispers call it). These
>> parameters consist of the code pointer and the static link pointer.

Yes, in Standard Pascal you could only call functions defined at the
same nesting level or in an enclosing levels WITHIN THE SAME SCOPE
CHAIN. In the example above E couldn't call D because their
definition scopes are disjoint.

However, again, some extended Pascals did allow cross scope calls
using pointers provided the caller and callee both were within a
common enclosing scope. IOW there were implementations in which you
could create a pointer to E in A and then call E from D.


>> Modula-2 has nested procedures, and allows putting procedures in
>> variables, fields, etc., but eliminates the upward funarg problem by
>> only allowing to store global-level procedures in variables, not
>> nested procedures. So it went closer to C.
>
>> Oberon originally had nested procedures, and AFAIK similar rules to
>> Modula-2. When I asked Wirth in 2020 if he has a good example for the
>> use of nested procedures (I thought he would have, given that he kept
>> this feature for so many years), he told me that he had removed nested
>> functions in 2013 or so.

Yes, for complex code, simple nested functions often are inadequate
and you need (not necessarily 'first-class' but) persistent closures.

[Consider that closures in Lisp really are 2nd class because they must
be bound to a symbol in order to be stored/persisted. In Scheme where
they ARE 1st class, they can be stored and passed around - including
upward - just like any other value.]


>> - anton

Anton Ertl

unread,
Dec 21, 2023, 1:56:40 AM12/21/23
to
Stefan Monnier <mon...@iro.umontreal.ca> writes:
>Anton Ertl [2023-12-20 21:37:44] wrote:
>> mitch...@aol.com (MitchAlsup) writes:
>>>Anton Ertl wrote:
>>>> Stephen Fuld <sf...@alumni.cmu.edu.invalid> writes:
>>>>>I believe Pascal, back in the 1970s implemented nested functions, though
>>>>>I don't know how any specific implementation accomplished it.
>>>> AFAIK Pascal implementations all just use static link chains to the
>>>> enclosing scopes. So you access an outer local by following the link
>>>> chain as many levels as you cross scopes.
>>>How does Pascal do this counting when there are recursive frames on
>>>the stack ??
>> The static scope level is independent of recursion. Note that this
>> kind of implementation has a static link in addition to the dynamic
>> link.
>
>Maybe I'm missing something but I think Pascal's display is "orthogonal"
>to the issue discussed before of how to stuff together a code pointer
>together with a pointer to its associated data.

Certainly. For example, AFAIK Pascal implementations do not use a
display, but just follow static links (but maybe somebody with more
knowledge can correct me). Cohen [cohen91] described how to use a
display for type inclusion tests. IIRC Wirth wrote a letter in reply
where he stated that he had tried to use a display for accessing outer
locals around 1970, but found that it hurt performance, and that he
was happy that this technique was useful for something after all.

@Article{cohen91,
author = "Norman H. Cohen",
title = "Type-Extension Type Tests Can Be Performed In Constant
Time",
journal = "ACM Transactions on Programming Languages and
Systems",
volume = "13",
number = "4",
pages = "626--629",
month = oct,
year = "1991",
refs = "2",
checked = "19940624",
source = "Dept. Library",
keywords = "class, descriptor, display, extensible data type,
inheritance, membership test, object-oriented
programming, type extension, type test",
note = "Technical Correspondence",
abstract = "Wirth's proposal for type extensions includes an
algorithm for determining whether a given value belongs
to an extension of a given type. In the worst case,
this algorithm takes time proportional to the depth of
the type-extension hierarchy. Wirth describes the loop
in this algorithm as ``unavoidable,'' but in fact, the
test can be performed in constant time by associating a
``display'' of base types with each type descriptor.",
xref = "Wirth:acm:toplas:1988",
reffrom = "Corney:Gough:plasa:1994",
}

>Pascal's display is one possible representation of closures, but it
>doesn't save you from the need to use a kind of "fat pointer" for the
>first-class functions.

Correct.

>What saves Pascal is that first-class functions aren't first class at
>all, so Pascal can safely and easily represent them any way it likes,
>including as a pair of two values (a pointer to the code, and a pointer
>to a stack frame, typically).

Even first-class closures are not a problem. The easiest way is to
let them be a type of two machine words, a code pointer and an
environment pointer.

>It's harder in C because some code wants to be able to convert
>a function pointer to a `void*` and back, so you usually want to make it
>fit into the size of a normal pointer.

Even first class closures that are represented in one machine word are
not a problem. Just box the two words and pass a pointer to that box.

The problem is an ABI that passes function pointers as code pointers
and where calling a function pointer performs an indirect call to the
passed pointer. I.e., a typical C ABI.
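
A sketch of that boxing in C terms (names invented):

#include <stdlib.h>

struct closure {
    int  (*code)(struct closure *self, int arg);
    void  *env;                         /* environment pointer */
};

/* The one-word handle passed around is just a struct closure *. */
static int call_closure(struct closure *c, int arg)
{
    return c->code(c, arg);
}

static struct closure *box_closure(int (*code)(struct closure *, int),
                                   void *env)
{
    struct closure *c = malloc(sizeof *c);   /* error handling omitted */
    c->code = code;
    c->env  = env;
    return c;
}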

Terje Mathisen

unread,
Dec 21, 2023, 7:29:28 AM12/21/23
to
Niklas Holsti wrote:
> On 2023-12-20 0:37, MitchAlsup wrote:
>> Thomas Koenig wrote:
>>
>>> Niklas Holsti <niklas...@tidorum.invalid> schrieb:
>>
>>>>> E.g., consider a numerical integration function integrate(f,lo,hi),
>>>>> which expects a function with one parameter f(x), and you want to
>>>>> integrate x^y (pow(x,y) in C), where y is, say, passed as parameter.
>>>>
>>>>
>>>> That is the canonical example, yes. But it happens I have never
>>>> needed to do numerical integration in my Ada programs. I do have
>>>> some cases where I have used locally instantiated generics with
>>>> generic function parameters in the way I described above. In Ada the
>>>> integration example would be declared thusly:
>>>>
>>>>     generic
>>>>        with function F (X : Float) return Float;
>>>>     function Integrate (Lo, Hi : Float) return Float;
>>>>
>>>> although one would usually not limit it to a specific number type
>>>> such as Float, but would make this type also a generic (type)
>>>> parameter. IIRC, the ability to define numerical integration as a
>>>> generic subprogram, with the function to be integrated as a generic
>>>> parameter, was one of the reasons presented for omitting function
>>>> pointers entirely from original Ada (Ada 83).
>>
>>> Interesting, thanks!
>>
>>> Just one remark: Making numerical routines generic can be a bit
>>> problematic if constants depend on the accuracy of a type; things
>>> like tolerances for checks for convergence,
>>
>> whether compiler constant arithmetic has the same numeric properties
>> as compiled code instruction sequences,
>
>
> Ada compilers are required to evaluate typeless constant arithmetic to
> unbounded precision (as "bignums"). For example, the constant pi is
> defined thusly in the standard package Ada.Numerics:
>
>     Pi : constant :=
>        3.14159_26535_89793_23846_26433_83279_50288_41971_69399_37511;
>
> and all the digits are significant in any computation involving Pi and
> other typeless constants.
>
> Computation with typed constants is required to use the operations of
> that type, so should match what the target code does.
>
And still, 50 digits is only ~170 bits, so not enough to do arbitrary
parameter reductions (which requires ~1100 bits).

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

Thomas Koenig

unread,
Dec 21, 2023, 8:02:49 AM12/21/23
to
David Brown <david...@hesbynett.no> schrieb:

> So what saves C is that nested functions are not part of the language!
> And when you have extensions that support nested functions, such as
> supported by gcc, if the function really needs a fat pointer you have a
> trampoline on the stack to hide that and make it look like a normal
> "thin" pointer.

I looked at what the Fortran compilers I could lay my hands on did
with the following code, which uses a contained subroutine inside
a subroutine:

module test
contains
  subroutine t(f)
    interface
      subroutine f()
      end subroutine
    end interface
    call f()
  end subroutine
end module

subroutine foo(i)
  use test
  integer, intent(inout) :: i
  call t(bar)
contains
  subroutine bar()
    call xxx(i)
  end
end

Interestingly, every compiler I could lay my hands on at short
notice (gfortran for several architectures, ifort, xlf, the new
flang for LLVM) used trampolines.

See https://godbolt.org/z/nv5nr934q , which is for POWER and
contains a call to a subroutine suggestively called
__trampoline_setup.

Michael S

unread,
Dec 21, 2023, 8:22:41 AM12/21/23
to
On both leading general-purpose computing platforms, i.e. on Windows and
on Posix-compatibles, you don't have to be lucky: this type of
conversion is guaranteed to work. Period. In both cases it is an
integral part of dynamic linking APIs.

David Brown

unread,
Dec 21, 2023, 9:28:29 AM12/21/23
to
Nope.

POSIX requires that an object of type "void *" can hold a pointer to a
function. But it does not require that a conversion between a function
pointer and a data pointer is defined. Not only is this code not
guaranteed to have defined behaviour or work as you might expect, but a
conforming C compiler is obliged to issue a diagnostic about it:

void (*foo)(void);

foo = (void (*)(void)) dlsym(handle, "foo");

What /does/ work, for any POSIX system, is:

*(void **)(&foo) = dlsym(handle, "foo");

<https://pubs.opengroup.org/onlinepubs/009696899/functions/dlsym.html>


Note that having the same size - and even the same representation - is
not all that is needed for things to work as naïvely as you might assume
in C.

And of course, the C world is vastly greater than just the "leading
general-purpose computing platforms", and it includes targets where
function pointers and data pointers are significantly different. Some
platforms (including quite popular ones in their day, like DOS) support
different memory models where data and code pointers can be big and
flexible or small and efficient. Things that work on one platform might
not work on others. The C language - defined by the standards - gives a
common subset. If you are writing in C, and you have not specified a
particular target but you rely on target-specific features, your code
relies on luck.

Niklas Holsti

unread,
Dec 21, 2023, 10:03:24 AM12/21/23
to
On 2023-12-21 14:29, Terje Mathisen wrote:
> Niklas Holsti wrote:

[snip]

>> Ada compilers are required to evaluate typeless constant arithmetic to
>> unbounded precision (as "bignums"). For example, the constant pi is
>> defined thusly in the standard package Ada.Numerics:
>>
>>      Pi : constant :=
>>         3.14159_26535_89793_23846_26433_83279_50288_41971_69399_37511;
>>
>> and all the digits are significant in any computation involving Pi and
>> other typeless constants.

[snip]

> And still, 50 digits is only ~170 bits, so not enough to do arbitrary
> parameter reductions (which requires ~1100 bits).


Ada compilers are explicitly allowed to define Ada.Numerics.Pi with more
digits if they want to do so (which sort of introduces a portability
problem, oops...)

I believe they are also allowed to use a more precise approximation of
pi in the numerical library functions.

EricP

unread,
Dec 21, 2023, 11:29:20 AM12/21/23
to
BTW this requires a RWE stack.
The x64 gfortran is considerably simpler.
It stuffs two MOV reg,imm instructions and a JMP onto the stack at RSP+8,
copies RSP+8 to RDI, and calls __test_MOD_t, which jumps to RDI.

bar.0:
mov rdi, QWORD PTR [r10]
jmp xxx_
__test_MOD_t:
jmp rdi
foo_:
sub rsp, 56
mov edx, -17599 // REX MOV reg,imm
mov ecx, -17847 // REX MOV reg,imm
mov QWORD PTR [rsp], rdi
lea rdi, [rsp+8]
mov WORD PTR [rsp+8], dx
mov edx, OFFSET FLAT:bar.0
mov QWORD PTR [rsp+40], 0
mov DWORD PTR [rsp+10], edx
mov WORD PTR [rsp+14], cx
mov QWORD PTR [rsp+16], rsp
mov DWORD PTR [rsp+24], -1864106167 // REX JMP r11 + NOP
call __test_MOD_t
add rsp, 56
ret



MitchAlsup

unread,
Dec 21, 2023, 1:21:07 PM12/21/23
to
Where is R10 made to point at i ??

EricP

unread,
Dec 21, 2023, 2:32:25 PM12/21/23
to
-17847 = 0xBA49, i.e. the bytes 49 BA = REX.WB MOV r10,imm64
mov QWORD PTR [rsp],rdi copies i (in rdi) to the bottom of stack [RSP].
mov WORD PTR [rsp+14],cx stuffs the REX.WB MOV r10,imm64 onto the stack
mov QWORD PTR [rsp+16],rsp copies RSP onto the stack after the REX.WB MOV r10,imm64

At execution, the REX.WB MOV r10,imm64 loads the prior rsp (which points at i) into r10
bar.0:
mov rdi, QWORD PTR [r10] loads i into rdi



Thomas Koenig

unread,
Dec 22, 2023, 7:42:48 AM12/22/23
to
Niklas Holsti <niklas...@tidorum.invalid> schrieb:
> On 2023-12-19 20:12, Anton Ertl wrote:
>> Niklas Holsti <niklas...@tidorum.invalid> writes:
>>> In my own (non-critical) Ada programs, small nested subprograms are not
>>> unusual, but pointers to them are very rare (I don't recall a single
>>> case of it).
>>
>> That's interesting, because a major reason for using nested functions
>> (or, more generally, closures), is if you need to pass a function with
>> a given interface, but have to pass additional data to that function.
>
>
> In Ada, instead of passing a (nested) function pointer to some service
> that needs such a parameter, one can implement the service as a generic
> program unit, where the function to be passed is given as a generic
> parameter, and instantiate that generic locally in the nested context,
> thus binding the instance with the nested function.

It would be interesting to see how Ada compilers handle nested
functions. Would it be possible to create an Ada equivalent
of https://godbolt.org/z/e5cr9EoE3 (I hope the syntax is clear)
and post it?

Jean-Marc Bourguet

unread,
Dec 26, 2023, 11:35:20 AM12/26/23
to
mitch...@aol.com (MitchAlsup) writes:

> Anton Ertl wrote:
>
>> Stephen Fuld <sf...@alumni.cmu.edu.invalid> writes:
>>> I believe Pascal, back in the 1970s implemented nested functions, though
>>> I don't know how any specific implementation accomplished it.
>
> AFAIK Pascal implementations all just use static link chains to the
>> enclosing scopes. So you access an outer local by following the link
>> chain as many levels as you cross scopes.
>
> How does Pascal do this counting when there are recursive frames on the
> stack ??

There are two kinds of frame links: the dynamic links point just to the
frame of the caller, the static link to the lexically enclosing frame.

Something like (in a C-like syntax with nested function):

void f() {
    int fv;

    void f1() {
        use(fv);
    }

    void f2() {
        int f2v;

        void f2a(int r) {
            if (r != 0) {
                f2a(r-1);
            }
            f1();
            fv += 42;
            f2v += 36;
        }

        f2a(3);
    }
}

would be translated as (in a C-like syntax without nested function and with
the dynamic link handled by the C compiler if one is needed):

struct __f_frame {
    int fv;
};

struct __f2_frame {
    struct __f_frame* parent;
    int f2v;
};

void f1(struct __f_frame* parent) {
    use(parent->fv);
}

void f2a(struct __f2_frame* parent, int r) {
    if (r != 0) {
        f2a(parent, r-1);
    }
    f1(parent->parent);
    parent->parent->fv += 42;
    parent->f2v += 36;
}

void f2(struct __f_frame* parent) {
    struct __f2_frame local_frame;
    local_frame.parent = parent;
    f2a(&local_frame, 3);
}

void f() {
    struct __f_frame local_frame;
    f2(&local_frame);
}

Yours,

--
Jean-Marc

Tim Rentsch

unread,
Jan 1, 2024, 3:26:59 PMJan 1
to
an...@mips.complang.tuwien.ac.at (Anton Ertl) writes:

[...]

> Modula-2 has nested procedures, and allows putting procedures in
> variables, fields, etc., but eliminates the upward funarg problem by
> only allowing to store global-level procedures in variables, not
> nested procedures. So it went closer to C.
>
> Oberon originally had nested procedures, and AFAIK similar rules to
> Modula-2. When I asked Wirth in 2020 if he has a good example for the
> use of nested procedures (I thought he would have, given that he kept
> this feature for so many years), he told me that he had removed nested
> functions in 2013 or so.

There is a significant benefit to well-done nested functions, where
well-done means full closures, lightweight syntax, and an anonymous
expression form, often called a lambda: the ability to define
first-class control structures in the language, rather than needing
control structures built into the language. See for example blocks
in Smalltalk.

Quadibloc

unread,
Jan 3, 2024, 2:37:34 AMJan 3
to
On Fri, 15 Dec 2023 14:30:25 +0000, Quadibloc wrote:

> Is the CPU even the place for sandboxing? A genuinely effective sandbox
> would involve a physical separation between the protected computer and
> the one connected to the Internet, after all. But that isn't
> convenient...

In a different thread, I've finally fleshed this out.

One has the "real" computer, which is built for speed, and which may
therefore have some limitations to its security. So that computer isn't
connected directly to the Internet.

But it controls a subordinate computer which does connect to the Internet.

That computer should be completely unable to write to the only memory it
is capable of loading executable code from. Instead, its parent, the real
computer, loads all executable code into that memory.

This scheme was used by the old Bell System's Electronic Switching
System, and I think that it produces pretty good security.

That computer would then have a subordinate computer of its own. This
third computer is used to run things like the implementation of JavaScript
with a just-in-time compiler. It uses 486-style technology; that is, it's
an in-order processor (no Spectre) running slowly enough that its DRAM
is not subject to Rowhammer.

So a malicious web site would have to exploit vulnerabilities created by
software bugs inside the third computer, and these would presumably be
gradually eliminated as attacks are detected. It wouldn't have the
hardware vulnerabilities, which are difficult to get rid of, to exploit.

And then, in order to get at the hard drives of the computer, it would have
to attack the "real" computer at the top level. But between it and that is
a computer with zero ability to execute code that it can write or modify.

To me, that seems like a desert that malware can't cross. (A vulnerability
could be created, of course, if the second computer were allowed to execute
external programs by means of an _interpreter_ instead of a just-in-time
compiler. So making interpreters available within the secondary computer
that talks to the Internet must be recognized as a practice to avoid.)

John Savard

MitchAlsup

unread,
Jan 3, 2024, 1:01:40 PMJan 3
to
Tim Rentsch wrote:

> There is a significant benefit to well-done nested functions, where
> well-done means full closures, lightweight syntax, and an anonymous
> expression form, often called a lambda: the ability to define
> first-class control structures in the language, rather than needing
> control structures built into the language. See for example blocks
> in Smalltalk.

How do pointers to nested functions interact with try-throw-catch ??

Do you use the stack as it is now ?? or how it was when the closure was
constructed ??

John Levine

unread,
Jan 3, 2024, 3:31:37 PMJan 3
to
According to MitchAlsup <mitch...@aol.com>:
>Tim Rentsch wrote:
>
>> There is a significant benefit to well-done nested functions, where
>> well-done means full closures, ...

>How do pointers to nested functions interact with try-throw-catch ??
>
>Do you use the stack as it is now ?? or how it was when the closure was
>constructed ??

The usual answer is that you unwind the stack to its state at try, and
if you then call something that uses stack frames below that, you
lose.

Or you can do spaghetti stacks, where the stack can be forked and you
keep all the frames around that might be reactivated. See Scheme and
Smalltalk.


--
Regards,
John Levine, jo...@taugh.com, Primary Perpetrator of "The Internet for Dummies",
Please consider the environment before reading this e-mail. https://jl.ly

MitchAlsup

unread,
Jan 3, 2024, 5:47:01 PMJan 3
to
John Levine wrote:

> According to MitchAlsup <mitch...@aol.com>:
>>Tim Rentsch wrote:
>>
>>> There is a significant benefit to well-done nested functions, where
>>> well-done means full closures, ...

>>How do pointers to nested functions interact with try-throw-catch ??
>>
>>Do you use the stack as it is now ?? or how it was when the closure was
>>constructed ??

> The usual answer is that you unwind the stack to its state at try, and
> if you then call something that uses stack frames below that, you
> lose.

> Or you can do spaghetti stacks, where the stack can be forked and you
> keep all the frames around that might be reactivated. See Scheme and
> Smalltalk.

Thanks.

Tim Rentsch

unread,
Jan 3, 2024, 11:53:18 PMJan 3
to
I see you got your question answered. That's good.

Note that Smalltalk doesn't have exceptions built into the
language. When exceptions were added to the Smalltalk
environment, no changes were needed to the language or
the VM - just defining some methods in the right places.

Anton Ertl

unread,
Jan 4, 2024, 4:06:53 AMJan 4
to
John Levine <jo...@taugh.com> writes:
>According to MitchAlsup <mitch...@aol.com>:
>>Tim Rentsch wrote:
>>
>>> There is a significant benefit to well-done nested functions, where
>>> well-done means full closures, ...
>
>>How do pointers to nested functions interact with try-throw-catch ??
>>
>>Do you use the stack as it is now ?? or how it was when the closure was
>>constructed ??
>
>The usual answer is that you unwind the stack to its state at try, and
>if you then call something that uses stack frames below that, you
>lose.

It's not clear to me what you mean by "below", but anyway:

In the usual exception-handling variant (as exists in, e.g., Java)
where you cannot continue at the throwing site, you just restore the
state, including the stack pointer and the frame pointer (and thus
the static and the dynamic link), to what it was at the "try". The
same holds for Pascal's goto to a label in a statically surrounding
procedure; the more usual exception mechanisms find the try-catch
block dynamically, but that makes no difference after the target of
the throw has been determined.
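In C terms, a minimal sketch of that behaviour (illustrative only):
setjmp records, among other things, the stack and frame pointers at the
"try", and longjmp restores them at the "throw", discarding every frame
pushed in between:

#include <setjmp.h>
#include <stdio.h>

static jmp_buf try_point;          /* state saved at the "try" */

static void deep(int n) {
    if (n == 0)
        longjmp(try_point, 1);     /* the "throw": restores the
                                      sp/fp saved by setjmp */
    deep(n - 1);                   /* frames pushed here are
                                      simply abandoned */
}

int main(void) {
    if (setjmp(try_point) == 0) {  /* the "try" */
        deep(10);
        printf("not reached\n");
    } else {                       /* the "catch" */
        printf("back at the try's stack depth\n");
    }
    return 0;
}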

If the static link was invalid after the "try", it was invalid before
the "try", too; many languages and their implementations are designed
to avoid that, either by preventing closures that live after the
procedure that created them has finished (e.g., Pascal and Modula-2),
or by keeping such closures in garbage-collected memory rather than
the stack (e.g., Scheme).
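A hand-rolled C illustration of the second strategy (names are made up;
a real implementation would have the compiler do this): the captured
frame is heap-allocated, so an environment handed upward stays valid
after its creator returns, at the cost of explicit (or garbage-collected)
reclamation:

#include <stdio.h>
#include <stdlib.h>

/* Heap-allocated "frame" holding the captured variable. */
struct counter_frame {
    int count;
};

/* The closure's body, taking its environment explicitly -
   the role a static link plays in compiled code. */
static int bump(struct counter_frame *env) {
    return ++env->count;
}

/* The creator returns an environment that outlives this
   call. Had the frame lived on the stack, the pointer
   would dangle; on the heap it remains valid. */
static struct counter_frame *make_counter(void) {
    struct counter_frame *env = malloc(sizeof *env);
    if (env == NULL)
        exit(1);
    env->count = 0;
    return env;
}

int main(void) {
    struct counter_frame *c = make_counter();
    printf("%d\n", bump(c));   /* prints 1 */
    printf("%d\n", bump(c));   /* prints 2 */
    free(c);                   /* a GC would do this for us */
    return 0;
}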

>Or you can do spaghetti stacks, where the stack can be forked and you
>keep all the frames around that might be reactivated. See Scheme and
>Smalltalk.

From the viewpoint of what to do, that's pretty similar to the case
above: if the catch block can decide to continue at the throwing
place, then before entering the catch block, the frame pointer is
restored to where it was at the "try", but the stack pointer is not.
If the catch block decides to continue at the throw, the frame pointer
(and the stack pointer, if changed in the meantime) is restored to the
value at the throw; if the catch block decides to continue after the
catch block, the stack pointer is restored to the depth at the "try".

George Neuner

unread,
Jan 4, 2024, 10:07:58 PMJan 4
to
On Thu, 04 Jan 2024 08:17:05 GMT, an...@mips.complang.tuwien.ac.at
(Anton Ertl) wrote:

>John Levine <jo...@taugh.com> writes:
>>According to MitchAlsup <mitch...@aol.com>:
>>>Tim Rentsch wrote:
>>>
>>>> There is a significant benefit to well-done nested functions, where
>>>> well-done means full closures, ...
>>
>>>How do pointers to nested functions interact with try-throw-catch ??
>>>
>>>Do you use the stack as it is now ?? or how it was when the closure was
>>>constructed ??
>>
>>The usual answer is that you unwind the stack to its state at try, and
>>if you then call something that uses stack frames below that, you
>>lose.
>
>It's not clear to me what you mean by "below", but anyway:

Functions defined in scopes enclosed by the function containing the
"try".

The stack POV is reversed: the frame for the function containing the
"try" would be lower in the stack, and frames for any enclosed functions
would be higher (closer to the top) in the stack.


>In the usual exception-handling variant (as exists in, e.g., Java)
>where you cannot continue at the throwing site, you just restore the
>state, including the stack pointer and the frame pointer (and thus
>the static and the dynamic link), to what it was at the "try". The
>same holds for Pascal's goto to a label in a statically surrounding
>procedure; the more usual exception mechanisms find the try-catch
>block dynamically, but that makes no difference after the target of
>the throw has been determined.

But that won't necessarily work in a language like Scheme where a
saved closure might use functions enclosed by the "try".
[Depends on the implementation of closures.]


>If the static link was invalid after the "try", it was invalid before
>the "try", too; many languages and their implementations are designed
>to avoid that, either by preventing closures that live after the
>procedure that created them has finished (e.g., Pascal and Modula-2),
>or by keeping such closures in garbage-collected memory rather than
>the stack (e.g., Scheme).

The static link is not invalid after the try - but it MAY become
invalid after a throw.


>>Or you can do spaghetti stacks, where the stack can be forked and you
>>keep all the frames around that might be reactivated. See Scheme and
>>Smalltalk.

As John mentioned, one possible implementation of closures is to
preserve the stack frames that are referenced [note: not necessarily
the whole call chain - just the referenced scope chain]. Static links
within the preserved chain of frames would remain valid.


>From the viewpoint of what to do, that's pretty similar to the case
>above: if the catch block can decide to continue at the throwing
>place, then before entering the catch block, the frame pointer is
>restored to where it was at the "try", but the stack pointer is not.
>If the catch block decides to continue at the throw, the frame pointer
>(and the stack pointer, if changed in the meantime) is restored to the
>value at the throw; if the catch block decides to continue after the
>catch block, the stack pointer is restored to the depth at the "try".

Microsoft's Structured Exception Handling (SEH) includes the ability
to continue from the exception point.

Lisp's condition system includes the ability to continue from multiple
(re-entry) points - restarts - following the throw.


>- anton

BGB

unread,
Jan 6, 2024, 1:28:11 AMJan 6
to
In my case, I have function pointers as narrow code pointers, and also
closures (as a C language extension), which look just like normal
function pointers.

Though the current implementation is that a closure is a pointer to a
blob of machine code that loads its own address into a designated
register and then branches to the actual entry point for the function
(with the captured bindings for the closure following the machine-code
blob).

Granted, this implementation still favors non-closure function pointers
as the default case.

At present, this is implemented by having a mechanism to allocate RWE
memory objects (via one of the special-case values passed to
"_malloc_cat()").


> - anton
