64-bit embedded computing is here and now

James Brakefield

Jun 7, 2021, 10:47:54 AM

Sometimes things move faster than expected.
As someone with an embedded background this caught me by surprise:

Tera-Byte microSD cards are readily available and getting cheaper.
Heck, you can carry ten of them in a credit card pouch.
Likely to move to the same price range as hard disks ($20/TB).

That means that a 2+ square inch PCB can hold a 64-bit processor and enough storage for memory mapped files larger than 4GB.

Is the 32-bit embedded processor cost vulnerable to 64-bit 7nm devices as the FABs mature? Will video data move to the IOT edge? Will AI move to the edge? Will every embedded CPU have a built-in radio?

Wait a few years and find out.

Don Y

Jun 7, 2021, 5:13:55 PM
On 6/7/2021 7:47 AM, James Brakefield wrote:
>
> Sometimes things move faster than expected. As someone with an embedded
> background this caught me by surprise:
>
> Tera-Byte microSD cards are readily available and getting cheaper. Heck, you
> can carry ten of them in a credit card pouch. Likely to move to the same
> price range as hard disks ($20/TB).
>
> That means that a 2+ square inch PCB can hold a 64-bit processor and enough
> storage for memory mapped files larger than 4GB.

Kind of old news. I've been developing on a SAMA5D36 platform with 256M of
FLASH and 256M of DDR2 for 5 or 6 years, now. PCB is just over 2 sq in
(but most of that being off-board connectors). Granted, it's a 32b processor
but I'll be upgrading that to something "wider" before release (software and
OS have been written for a 64b world -- previously waiting for costs to fall
to make it as economical as the 32b was years ago; now waiting to see if I
can leverage even MORE hardware-per-dollar!).

Once you have any sort of connectivity, it becomes practical to support
files larger than your physical memory -- just fault the appropriate
page in over whatever interface(s) you have available (assuming you
have other boxes that you can talk to/with).

> Is the 32-bit embedded processor cost vulnerable to 64-bit 7nm devices as
> the FABs mature? Will video data move to the IOT edge? Will AI move to the
> edge? Will every embedded CPU have a built-in radio?

In my case, video is already *at* the edge. The idea of needing a
"bigger host" or "the cloud" is already obsolescent. Even the need
for bulk storage -- whether on-board (removable flash, as you suggest)
or remotely served -- is dubious. How much persistent store do you
really need, beyond your executables, in a typical application?

I've decided that RAM is the bottleneck as you can't XIP out of
an SD card...

Radios? <shrug> Possibly as wireless is *so* much easier to
interconnect than wired. But, you're still left with the power
problem; even at a couple of watts, wall warts are unsightly
and low voltage DC isn't readily available *everywhere* that
you may want to site a device. (how many devices do you
want tethered to a USB host before it starts to look a mess?)

The bigger challenge is moving developers to think in terms of
the capabilities that the hardware will afford. E.g., can
you exploit *true* concurrency in your application? Or, will
you "waste" a second core/thread context on some largely
decoupled activity? How much capability will you be willing
to sacrifice to your hosting OS -- and what NEW capabilities
will it provide you?

> Wait a few years and find out.

The wait won't even be *that* long...

Paul Rubin

Jun 8, 2021, 1:31:59 AM
James Brakefield <jim.bra...@ieee.org> writes:
> Is the 32-bit embedded processor cost vulnerable to 64-bit 7nm devices
> as the FABs mature? Will video data move to the IOT edge? Will AI move
> to the edge? Will every embedded CPU have a built-in radio?

I don't care what the people say--
32 bits are here to stay.

David Brown

Jun 8, 2021, 1:59:58 AM
8-bit microcontrollers are still far more common than 32-bit devices in
the embedded world (and 4-bit devices are not gone yet). At the other
end, 64-bit devices have been used for a decade or two in some kinds of
embedded systems.

We'll see 64-bit take a greater proportion of the embedded systems that
demand high throughput or processing power (network devices, hard cores
in expensive FPGAs, etc.) where the extra cost in dollars, power,
complexity, board design are not a problem. They will probably become
more common in embedded Linux systems as the core itself is not usually
the biggest part of the cost. And such systems are definitely on the
increase.

But for microcontrollers - which dominate embedded systems - there has
been a lot to gain by going from 8-bit and 16-bit to 32-bit for little
cost. There is almost nothing to gain from a move to 64-bit, but the
cost would be a good deal higher. So it is not going to happen - at
least not more than a very small and very gradual change.

The OP sounds more like a salesman than someone who actually works with
embedded development in reality.

Don Y

Jun 8, 2021, 3:39:29 AM
On 6/7/2021 10:59 PM, David Brown wrote:
> 8-bit microcontrollers are still far more common than 32-bit devices in
> the embedded world (and 4-bit devices are not gone yet). At the other
> end, 64-bit devices have been used for a decade or two in some kinds of
> embedded systems.

I contend that a good many "32b" implementations are really glorified
8/16b applications that exhausted their memory space. I still see lots
of designs built on a small platform (8/16b) and augmented -- either
with some "memory enhancement" technology or additional "slave"
processors to split the binaries. Code increases in complexity but
there doesn't seem to be a need for the "work-per-unit-time" to increase.

[This has actually been the case for a long time. The appeal of
newer CPUs is often in the set of peripherals that accompany the
processor, not the processor itself.]

> We'll see 64-bit take a greater proportion of the embedded systems that
> demand high throughput or processing power (network devices, hard cores
> in expensive FPGAs, etc.) where the extra cost in dollars, power,
> complexity, board design are not a problem. They will probably become
> more common in embedded Linux systems as the core itself is not usually
> the biggest part of the cost. And such systems are definitely on the
> increase.
>
> But for microcontrollers - which dominate embedded systems - there has
> been a lot to gain by going from 8-bit and 16-bit to 32-bit for little

I disagree. The "cost" (barrier) that I see clients facing is the
added complexity of a 32b platform and how it often implies (or even
*requires*) a more formal OS underpinning the application. Where you
could hack together something on bare metal in the 8/16b worlds,
moving to 32 often requires additional complexity in managing
mechanisms that aren't usually present in smaller CPUs (caches,
MMU/MPU, DMA, etc.). Developers (and their organizations) can't just
play "coder cowboy" and coerce the hardware to behaving as they
would like. Existing staff (hired with the "bare metal" mindset)
are often not equipped to move into a more structured environment.

[I can hack together a device to meet some particular purpose
much easier on "development hardware" than I can on a "PC" -- simply
because there's too much I have to "work around" on a PC that isn't
present on development hardware.]

Not every product needs a filesystem, network stack, protected
execution domains, etc. Those come with additional costs -- often
in the form of a lack of understanding as to what the ACTUAL
code in your product is doing at any given time. (this isn't the
case in the smaller MCU world; it's possible for a developer to
have written EVERY line of code in a smaller platform)

> cost. There is almost nothing to gain from a move to 64-bit, but the
> cost would be a good deal higher.

Why is the cost "a good deal higher"? Code/data footprints don't
uniformly "double" in size. The CPU doesn't slow down to handle
bigger data.

The cost is driven by where the market goes. Note how many 68Ks found
design-ins vs. the T11, F11, 16032, etc. My first 32b design was
physically large, consumed a boatload of power and ran at only a modest
improvement (in terms of system clock) over 8b processors of its day.
Now, I can buy two orders of magnitude more horsepower PLUS a
bunch of built-in peripherals for two cups of coffee (at QTY 1)

> So it is not going to happen - at
> least not more than a very small and very gradual change.

We got 32b processors NOT because the embedded world cried out for
them but, rather, because of the influence of the 32b desktop world.
We've had 32b processors since the early 80's. But, we've only had
PCs since about the same timeframe! One assumes ubiquity in the
desktop world would need to happen before any real spillover to embedded.
(When the "desktop" was an '11 sitting in a back room, it wasn't seen
as ubiquitous.)

In the future, we'll see the 64b *phone* world drive the evolution
of embedded designs, similarly. (do you really need 32b/64b to
make a phone? how much code is actually executing at any given
time and in how many different containers?)

[The OP suggests MCUs with radios -- maybe they'll be cell phone
radios and *not* wifi/BLE as I assume he's thinking! Why add the
need for some sort of access point to a product's deployment if
the product *itself* can make a direct connection??]

My current design can't fill a 32b address space (but, that's because
I've decomposed apps to the point that they can be relatively small).
OTOH, designing a system with a 32b limitation seems like an invitation
to do it over when 64b is "cost effective". The extra "baggage" has
proven to be relatively insignificant (I have ports of my codebase
to SPARC as well as Atom running alongside a 32b ARM).

> The OP sounds more like a salesman than someone who actually works with
> embedded development in reality.

Possibly. Or, just someone that wanted to stir up discussion...

David Brown

Jun 8, 2021, 7:04:05 AM
On 08/06/2021 09:39, Don Y wrote:
> On 6/7/2021 10:59 PM, David Brown wrote:
>> 8-bit microcontrollers are still far more common than 32-bit devices in
>> the embedded world (and 4-bit devices are not gone yet).  At the other
>> end, 64-bit devices have been used for a decade or two in some kinds of
>> embedded systems.
>
> I contend that a good many "32b" implementations are really glorified
> 8/16b applications that exhausted their memory space. 

Sure. Previously you might have used 32 kB flash on an 8-bit device,
now you can use 64 kB flash on a 32-bit device. The point is, you are
/not/ going to find yourself hitting GB limits any time soon. The step
from 8-bit or 16-bit to 32-bit is useful to get a bit more out of the
system - the step from 32-bit to 64-bit is totally pointless for 99.99%
of embedded systems. (Even for most embedded Linux systems, you usually
only have a 64-bit cpu because you want bigger and faster, not because
of memory limitations. It is only when you have a big gui with fast
graphics that 32-bit address space becomes a limitation.)

A 32-bit microcontroller is simply much easier to work with than an
8-bit or 16-bit with "extended" or banked memory to get beyond 64 K
address space limits.

>
>> We'll see 64-bit take a greater proportion of the embedded systems that
>> demand high throughput or processing power (network devices, hard cores
>> in expensive FPGAs, etc.) where the extra cost in dollars, power,
>> complexity, board design are not a problem.  They will probably become
>> more common in embedded Linux systems as the core itself is not usually
>> the biggest part of the cost.  And such systems are definitely on the
>> increase.
>>
>> But for microcontrollers - which dominate embedded systems - there has
>> been a lot to gain by going from 8-bit and 16-bit to 32-bit for little
>
> I disagree.  The "cost" (barrier) that I see clients facing is the
> added complexity of a 32b platform and how it often implies (or even
> *requires*) a more formal OS underpinning the application.

Yes, that is definitely a cost in some cases - 32-bit microcontrollers
are usually noticeably more complicated than 8-bit ones. How
significant the cost is depends on the balances of the project between
development costs and production costs, and how beneficial the extra
functionality can be (like moving from bare metal to RTOS, or supporting
networking).

>
>> cost.  There is almost nothing to gain from a move to 64-bit, but the
>> cost would be a good deal higher.
>
> Why is the cost "a good deal higher"?  Code/data footprints don't
> uniformly "double" in size.  The CPU doesn't slow down to handle
> bigger data.

Some parts of code and data /do/ double in size - but not uniformly, of
course. But your chip is bigger, faster, requires more power, has wider
buses, needs more advanced memories, has more balls on the package,
requires finer pitched pcb layouts, etc.

In theory, you /could/ make a microcontroller in a 64-pin LQFP and
replace the 72 MHz Cortex-M4 with a 64-bit ARM core at the same clock
speed. The die would only cost two or three times more, and take
perhaps less than 10 times the power for the core. But it would be so
utterly pointless that no manufacturer would make such a device.

So a move to 64-bit in practice means moving from a small, cheap,
self-contained microcontroller to an embedded PC. Lots of new
possibilities, lots of new costs of all kinds.

Oh, and the cpu /could/ be slower for some tasks - bigger cpus that are
optimised for throughput often have poorer latency and more jitter for
interrupts and other time-critical features.

>
>>  So it is not going to happen - at
>> least not more than a very small and very gradual change.
>
> We got 32b processors NOT because the embedded world cried out for
> them but, rather, because of the influence of the 32b desktop world.
> We've had 32b processors since the early 80's.  But, we've only had
> PCs since about the same timeframe!  One assumes ubiquity in the
> desktop world would need to happen before any real spillover to embedded.
> (When the "desktop" was an '11 sitting in a back room, it wasn't seen
> as ubiquitous.)

I don't assume there is any direct connection between the desktop world
and the embedded world - the needs are usually very different. There is
a small overlap in the area of embedded devices with good networking and
a gui, where similarity to the desktop world is useful.

We have had 32-bit microcontrollers for decades. I used a 16-bit
Windows system when working with my first 32-bit microcontroller. But
at that time, 32-bit microcontrollers cost a lot more and required more
from the board (external memories, more power, etc.) than 8-bit or
16-bit devices. That has gradually changed with an almost total
disregard for what has happened in the desktop world.

Yes, the embedded world /did/ cry out for 32-bit microcontrollers for an
increasing proportion of tasks. We cried many tears when the
microcontroller manufacturers offered to give more flash space to their
8-bit devices by having different memory models, banking, far jumps, and
all the other shit that goes with not having a big enough address space.
We cried out when we wanted to have Ethernet and the microcontroller
only had a few KB of ram. I have used maybe 6 or 8 different 32-bit
microcontroller processor architectures, and I used them because I
needed them for the task. It's only in the past 5+ years that I have
been using 32-bit microcontrollers for tasks that could be done fine
with 8-bit devices, but the 32-bit devices are smaller, cheaper and
easier to work with than the corresponding 8-bit parts.

>
> In the future, we'll see the 64b *phone* world drive the evolution
> of embedded designs, similarly.  (do you really need 32b/64b to
> make a phone?  how much code is actually executing at any given
> time and in how many different containers?)
>

We will see that on devices that are, roughly speaking, tablets -
embedded systems with a good gui, a touchscreen, networking. And that's
fine. But these are a tiny proportion of the embedded devices made.

>
>> The OP sounds more like a salesman than someone who actually works with
>> embedded development in reality.
>
> Possibly.  Or, just someone that wanted to stir up discussion...
>

Could be. And there's no harm in that!

Theo

Jun 8, 2021, 10:46:29 AM
David Brown <david...@hesbynett.no> wrote:
> But for microcontrollers - which dominate embedded systems - there has
> been a lot to gain by going from 8-bit and 16-bit to 32-bit for little
> cost. There is almost nothing to gain from a move to 64-bit, but the
> cost would be a good deal higher. So it is not going to happen - at
> least not more than a very small and very gradual change.

I think there will be divergence about what people mean by an N-bit system:

Register size
Unit of logical/arithmetical processing
Memory address/pointer size
Memory bus/cache width

I think we will increasingly see parts which have different sizes on one
area but not the other.

For example, for doing some kinds of logical operations (eg crypto), having
64-bit registers and ALU makes sense, but you might only need kilobytes of
memory so only have <32 address bits.

For something else, like a microcontroller that's hung off the side of a
bigger system (eg the MCU on a PCIe card) you might want the ability to
handle 64 bit addresses but don't need to pay the price for 64-bit
registers.

Or you might operate with 16 or 32 bit wide external RAM chip, but your
cache could extend that to a wider word width.

There are many permutations, and I think people will pay the cost where it
benefits them and not where it doesn't.

This is not a new phenomenon, of course. But for a time all these numbers
were in the range between 16 and 32 bits, which made 32 simplest all round.
Just like we previously had various 8/16 hybrids (eg 8 bit datapath, 16 bit
address) I think we're going to see more 32/64 hybrids.

Theo

James Brakefield

Jun 8, 2021, 3:38:46 PM
The only thing that will take more than 4GB is video or a day's worth of photos.
So there are likely to be some embedded apps that need a >32-bit address space.
Cost, size or storage capacity are no longer limiting factors.

Am trying to puzzle out what a 64-bit embedded processor should look like.
At the low end, yeah, a simple RISC processor. And support for complex arithmetic
using 32-bit floats? And support for pixel alpha blending using quad 16-bit numbers?
32-bit pointers into the software?

David Brown

Jun 8, 2021, 4:11:24 PM
On 08/06/2021 21:38, James Brakefield wrote:

Could you explain your background here, and what you are trying to get
at? That would make it easier to give you better answers.

> The only thing that will take more than 4GB is video or a day's worth of photos.

No, video is not the only thing that takes 4GB or more. But it is,
perhaps, one of the more common cases. Most embedded systems don't need
anything remotely like that much memory - to the nearest percent, 100%
of embedded devices don't even need close to 4MB of memory (ram and
flash put together).

> So there is likely to be some embedded aps that need a > 32-bit address space.

Some, yes. Many, no.

> Cost, size or storage capacity are no longer limiting factors.

Cost and size (and power) are /always/ limiting factors in embedded systems.

>
> Am trying to puzzle out what a 64-bit embedded processor should look like.

There are plenty to look at. There are ARMs, PowerPC, MIPS, RISC-V.
And of course there are some x86 processors used in embedded systems.

> At the low end, yeah, a simple RISC processor.

Pretty much all processors except x86 and brain-dead old-fashioned 8-bit
CISC devices are RISC. Not all are simple.

> And support for complex arithmetic
> using 32-bit floats?

A 64-bit processor will certainly support 64-bit doubles as well as
32-bit floats. Complex arithmetic is rarely needed, except perhaps for
FFT's, but is easily done using real arithmetic. You can happily do
32-bit complex arithmetic on an 8-bit AVR, albeit taking significant
code space and run time. I believe the latest gcc for the AVR will do
64-bit doubles as well - using exactly the same C code you would on any
other processor.

> And support for pixel alpha blending using quad 16-bit numbers?

You would use a hardware 2D graphics accelerator for that, not the
processor.

> 32-bit pointers into the software?
>

With 64-bit processors you usually use 64-bit pointers.

David Brown

Jun 8, 2021, 4:18:28 PM
On 08/06/2021 16:46, Theo wrote:
> David Brown <david...@hesbynett.no> wrote:
>> But for microcontrollers - which dominate embedded systems - there has
>> been a lot to gain by going from 8-bit and 16-bit to 32-bit for little
>> cost. There is almost nothing to gain from a move to 64-bit, but the
>> cost would be a good deal higher. So it is not going to happen - at
>> least not more than a very small and very gradual change.
>
> I think there will be divergence about what people mean by an N-bit system:

There has always been different ways to measure the width of a cpu, and
different people have different preferences.

>
> Register size

Yes, that is common.

> Unit of logical/arithmetical processing

As is that. Sometimes the width supported by general instructions
differs from the ALU width, however, resulting in classifications like
8/16-bit for the Z80 and 16/32-bit for the 68000.

> Memory address/pointer size

Yes, also common.

> Memory bus/cache width

No, that is not a common way to measure cpu "width", for many reasons.
A chip is likely to have many buses outside the cpu core itself (and the
cache(s) may or may not be considered part of the core). It's common to
have 64-bit wide buses on 32-bit processors, it's also common to have
16-bit external databuses on a microcontroller. And the cache might be
128 bits wide.

>
> I think we will increasingly see parts which have different sizes on one
> area but not the other.
>

That has always been the case.

> For example, for doing some kinds of logical operations (eg crypto), having
> 64-bit registers and ALU makes sense, but you might only need kilobytes of
> memory so only have <32 address bits.

You need quite a few KB of ram for more serious cryptography. But it
sounds more like you are talking about SIMD or vector operations here,
which are not considered part of the "normal" width of the cpu. Modern
x86 cpus might have 512 bit SIMD registers - but they are still 64-bit
processors.

But you are right that you might want some parts of the system to be
wider and other parts thinner.

>
> For something else, like a microcontroller that's hung off the side of a
> bigger system (eg the MCU on a PCIe card) you might want the ability to
> handle 64 bit addresses but don't need to pay the price for 64-bit
> registers.
>
> Or you might operate with 16 or 32 bit wide external RAM chip, but your
> cache could extend that to a wider word width.
>
> There are many permutations, and I think people will pay the cost where it
> benefits them and not where it doesn't.
>

Agreed.

> This is not a new phenomenon, of course. But for a time all these numbers
> were in the range between 16 and 32 bits, which made 32 simplest all round.
> Just like we previously had various 8/16 hybrids (eg 8 bit datapath, 16 bit
> address) I think we're going to see more 32/64 hybrids.
>

32-bit processors have often had 64-bit registers for floating point,
and 64-bit operations of various sorts. It is not new.


Dimiter_Popoff

Jun 8, 2021, 4:39:31 PM
On 6/8/2021 23:18, David Brown wrote:
> On 08/06/2021 16:46, Theo wrote:
>> ......
>
>> Memory bus/cache width
>
> No, that is not a common way to measure cpu "width", for many reasons.
> A chip is likely to have many buses outside the cpu core itself (and the
> cache(s) may or may not be considered part of the core). It's common to
> have 64-bit wide buses on 32-bit processors, it's also common to have
> 16-bit external databuses on a microcontroller. And the cache might be
> 128 bits wide.

I agree with your points and those of Theo, but isn't the cache
basically as wide as the registers? Logically, that is; a cacheline is
several times that, which is probably what you refer to.
Not that it makes much of a difference: 64-bit data buses/registers
in an MCU (apart from FPU registers -- 32-bit FPUs are useless to me)
are unlikely to attract much interest, there being nothing of
significance to be gained, as you said.
To me 64 bit CPUs are of interest of course and thankfully there are
some available, but this goes somewhat past what we call "embedded".
Not long ago, in a chat with a guy who knew some of the 64-bit ARM
world, I gathered there is some real mess with their out-of-order
execution: one needs to do... hmmm... a "sync", whatever they call it,
all the time, and there is a huge performance cost because of that.
Has anybody heard anything about it? (I only know what I was told.)

Dimiter


James Brakefield

Jun 8, 2021, 5:25:24 PM
Am familiar with embedded systems, image processing and scientific applications.
Have used a number of 8, 16, 32 and ~64bit processors. Have also done work in
FPGAs. Am semi-retired and when working was always trying to stay ahead of
new opportunities and challenges.

Some of my questions/comments belong over at comp.arch

Dimiter_Popoff

Jun 8, 2021, 6:01:27 PM
The real value in 64 bit integer registers and 64 bit address space is
just that, having an orthogonal "endless" space (well I remember some
30 years ago 32 bits seemed sort of "endless" to me...).

Not needing to assign overlapping logical addresses to anything
can make a big difference to how the OS is done.

A 32-bit FPU seems useless to me; 64-bit is OK. Although 32-bit FP
*numbers* can be quite useful for storing/passing data.

Dimiter

======================================================
Dimiter Popoff, TGI http://www.tgi-sci.com
======================================================
http://www.flickr.com/photos/didi_tgi/

Don Y

Jun 8, 2021, 7:01:22 PM
On 6/8/2021 7:46 AM, Theo wrote:
> David Brown <david...@hesbynett.no> wrote:
>> But for microcontrollers - which dominate embedded systems - there has
>> been a lot to gain by going from 8-bit and 16-bit to 32-bit for little
>> cost. There is almost nothing to gain from a move to 64-bit, but the
>> cost would be a good deal higher. So it is not going to happen - at
>> least not more than a very small and very gradual change.
>
> I think there will be divergence about what people mean by an N-bit system:
>
> Register size
> Unit of logical/arithmetical processing
> Memory address/pointer size
> Memory bus/cache width

(General) Register size is the primary driver.

A processor can have very different "size" subcomponents.
E.g., a Z80 is an 8b processor -- registers are nominally 8b.
However, it support 16b operations -- on register PAIRs
(an implicit acknowledgement that the REGISTER is smaller
than the register pair). This is common on many smaller
processors. The address space is 16b -- with a separate 16b
address space for I/Os. The Z180 extends the PHYSICAL
address space to 20b but the logical address space
remains unchanged at 16b (if you want to specify a physical
address, you must use 20+ bits to represent it -- and invoke
a separate mechanism to access it!). The ALU is *4* bits.

Cache? Which one? I or D? L1/2/3/?

What about the oddballs -- 12b? 1b?

> I think we will increasingly see parts which have different sizes on one
> area but not the other.
>
> For example, for doing some kinds of logical operations (eg crypto), having
> 64-bit registers and ALU makes sense, but you might only need kilobytes of
> memory so only have <32 address bits.

That depends on the algorithm chosen and the hardware support available.

> For something else, like a microcontroller that's hung off the side of a
> bigger system (eg the MCU on a PCIe card) you might want the ability to
> handle 64 bit addresses but don't need to pay the price for 64-bit
> registers.
>
> Or you might operate with 16 or 32 bit wide external RAM chip, but your
> cache could extend that to a wider word width.
>
> There are many permutations, and I think people will pay the cost where it
> benefits them and not where it doesn't.

But you don't buy MCUs with a-la-carte pricing. How much does an extra
timer cost me? What if I want it to also serve as a *counter*? What
cost for 100K of internal ROM? 200K?

[It would be an interesting exercise to try to do a linear analysis of
product prices with an idea of trying to tease out the "costs" (to
the developer) for each feature in EXISTING products!]

Instead, you see a *price* that is reflective of how widely used the
device happens to be, today. You are reliant on the preferences of others
to determine which is the most cost effective product -- for *you*.

E.g., most of my devices have no "display" -- yet, the MCU I've chosen
has hardware support for same. It would obviously cost me more to
select a device WITHOUT that added capability -- because most
purchasers *want* a display (and *they* drive the production economies).

I could, potentially, use a 2A03 for some applications. But, the "TCO"
of such an approach would exceed that of a 32b (or larger) processor!

[What a crazy world!]

Don Y

Jun 8, 2021, 7:52:25 PM
On 6/8/2021 12:38 PM, James Brakefield wrote:

> |> I contend that a good many "32b" implementations are really glorified
> |> 8/16b applications that exhausted their memory space.
>
> The only thing that will take more than 4GB is video or a day's worth of photos.

That's not true. For example, I rely on a "PC" in my current design
to support the RDBMS. Otherwise, I would have to design a "special
node" (I have a distributed system) that had the resources necessary
to process multiple concurrent queries in a timely fashion; I can
put 100GB of RAM in a PC (whereas my current nodes only have 256MB).

The alternative is to rely on secondary (disk) storage -- which is
even worse!

And "video" is incredibly nondescript. It conjures ideas of STBs.
Instead, I see a wider range of applications in terms of *vision*.

E.g., let your doorbell camera "notice motion", recognize that
motion as indicative of someone/thing approaching it (e.g.,
a visitor), recognize the face/features of the visitor and
alert you to its presence (if desired). No need to involve a
cloud service to do this.

[My "doorbell" is a camera/microphone/speaker. *If* I want to
know that you are present, *it* will tell me. Or, if told to
do so, will grant you access to the house (even in my absence).
For "undesirables", I'm mounting a coin mechanism adjacent to
the entryway (our front door is protected by a gated porch area):
"Deposit 25c to ring bell. If we want to talk to you, your
deposit will be refunded. If *not*, consider that the *cost* of
pestering us!"]

There are surveillance cameras discreetly placed around the exterior
of the house (don't want the place to look like a frigging *bank*!).
One of them has a clear view of the mailbox (our mail is delivered
via lettercarriers riding in mail trucks). Same front door camera
hardware. But, now: detect motion; detect motion STOPPING
proximate to mailbox (for a few seconds or more); detect motion
resuming; signal "mail available". Again, no need to involve a
cloud service to accomplish this. And, when not watching for mail
delivery, it's performing "general" surveillance -- mail detection
is a "free bonus"!

Imagine designing a vision-based inspection system where you "train"
the CAMERA -- instead of some box that the camera connects to. And,
the CAMERA signals accept/reject directly.

[I use a boatload of cameras, here; they are cheap sensors -- the
"cost" lies in the signal processing!]

> So there is likely to be some embedded apps that need a > 32-bit address space.
> Cost, size or storage capacity are no longer limiting factors.

No, cost, size and storage are ALWAYS limiting factors!

E.g., each of my nodes derive power from the wired network connection.
That puts a practical limit of ~12W on what a node can dissipate.
That has to support the processing core plus any local I/Os! Note
that dissipated power == heat. So, one also has to be conscious of
how that heat will affect the devices' environs.

(Yes, there are schemes to increase this to ~100W but now the cost
of providing power -- and BACKUP power -- to a remote device starts
to be a sizeable portion of the product's cost and complexity).

My devices are intended to be "invisible" to the user -- so, they
have to hide *inside* something (most commonly, the walls or
ceiling -- in standard Jboxes for accessibility and Code compliance).
So, that limits their size/volume (mine are about the volume of a
standard duplex receptacle -- 3 cu in -- so fit in even the smallest
of 1G boxes... even pancake boxes!)

They have to be inexpensive so I can justify using LOTS of them
(I will have 240 deployed, here; my industrial beta site will have
over 1000; commercial beta site almost a similar number). Not only
is the cost of initial acquisition of concern, but also the *perceived*
cost of maintaining the hardware in a functional state (customer
doesn't want to have $10K of spares on hand for rapid incident response
and staff to be able to diagnose and repair/replace "on demand")

In my case, I sidestep the PERSISTENT storage issue by relegating that
to the RDBMS. In *that* domain, I can freely add spinning rust or
an SSD without complicating the design of the rest of the nodes.
So, "storage" becomes:
- how much do I need for a secure bootstrap
- how much do I need to contain a downloaded (from the RDBMS!) binary
- how much do I need to keep "local runtime resources"
- how much can I exploit surplus capacity *elsewhere* in the system
to address transient needs

Imagine what it would be like having to replace "worn" SD cards
at some frequency in hundreds of devices scattered around hundreds
of "invisible" places! Almost as bad as replacing *batteries* in
those devices!

[Have you ever had an SD card suddenly write protect itself?]

> Am trying to puzzle out what a 64-bit embedded processor should look like.

"Should"? That depends on what you expect it to do for you.
The nonrecurring cost of development will become an ever-increasing
portion of the device's "cost". If you sell 10K units but spend
500K on development (over its lifetime), you've justification for
spending a few more dollars on recurring costs *if* you can realize
a reduction in development/maintenance costs (because the development
is easier, bugs are fewer/easier to find, etc.)

Developers (and silicon vendors, as Good Business Practice)
will look at their code and see what's "hard" to do, efficiently.
Then, consider mechanisms that could make that easier or
more effective.

I see the addition of hardware features that enhance the robustness
of the software development *process*. E.g., allowing for compartmentalizing
applications and subsystems more effectively and *efficiently*.

[I put individual objects into their own address space containers
to ensure Object A can't be mangled by Client B (or Object C). As
a result, talking to an object is expensive because I have to hop
back and forth across that protection boundary. It's even worse
when the targeted object is located on some other physical node
(as now I have the transport cost to contend with).]

Similarly, making communications more robust. We already see that
with crypto accelerators. The idea of device "islands" is
obsolescent. Increasingly, devices will interact with other
devices to solve problems. More processing will move to the
edge simply because of scaling issues (I can add more CPUs
far more effectively than I can increase the performance of
a "centralized" CPU; add another sense/control point? let *it*
bring some processing abilities along with it!).

And, securing the product from tampering/counterfeiting; it seems
like most approaches, to date, have some hidden weakness. It's hard
to believe hardware can't ameliorate that. The fact that "obscurity"
is still relied upon by silicon vendors suggests an acknowledgement
of their weaknesses.

Beyond that? Likely more DSP-related support in the "native"
instruction set (so you can blend operations between conventional
computing needs and signal processing related issues).

And, graphics acceleration as many applications implement user
interfaces in the appliance.

There may be some other optimizations that help with hashing
or managing large "datasets" (without them being considered
formal datasets).

Power management (and measurement) will become increasingly
important (I spend almost as much on the "power supply"
as I do on the compute engine). Developers will want to be
able to easily ascertain what they are consuming as well
as why -- so they can (dynamically) alter their strategies.
In addition to varying CPU clock frequency, there may be
mechanisms to automatically (!) power down sections of
the die based on observed instruction sequences (instead
of me having to explicitly do so).

[E.g., I shed load when I'm running off backup power.
This involves powering down nodes as well as the "fields"
on selective nodes. How do I decide *which* load to shed to
gain the greatest benefit?]

Memory management (in the conventional sense) will likely
see more innovation. Instead of just "settling" for a couple
of page sizes, we might see "adjustable" page sizes.
Or, the ability to specify some PORTION of a *particular*
page as being "valid" -- instead of treating the entire
page as such.

Scheduling algorithms will hopefully get additional
hardware support. E.g., everything is deadline driven
in my design ("real-time"). So, schedulers are concerned
with evaluating the deadlines of "ready" tasks -- which
can vary, over time, as well as may need further qualification
based on other criteria (e.g., least-slack-time scheduling)

Everything in my system is an *opaque* object with a
set of POSSIBLE methods that can be invoked. But, each *Client*
of that object (an Actor may be multiple Clients if it possesses
multiple different Handles to the Object) is constrained as to
which methods can be invoked via a particular Handle.

So, I can (e.g.) create an Authenticator object that has methods like
"set_passphrase" and "test_passphrase" and "invalidate_passphrase".
Yet, no "disclose_passphrase" method (for obvious reasons).
I can create an Interface to one privileged Client that
allows it to *set* a new passphrase. And, all other Interfaces
(to that Client as well as others!) may all be restricted to
only *testing* the passphrase ("Is it 'foobar'?"). And, I can
limit the number of attempts that you can invoke a particular
method over a particular interface so the OS does the enforcement
instead of relying on the Server to do so.

[What's to stop a Client from hammering on the Server (Authenticator
Object) repeatedly -- invoking test_passphrase with full knowledge
that it doesn't know the correct passphrase: "Is it 'foobar'?"
"Is it 'foobar'?" "Is it 'foobar'?" "Is it 'foobar'?" "Is it 'foobar'?"
The Client has been enabled to do this; that doesn't mean he can't or
won't abuse it!

Note that unlimited access means the server has to respond to each of
those method invocations. By contrast, putting a limit on them
means the OS can block the invocation from ever reaching the Object
(and needlessly tying up the Object's resources). A capabilities
based system that relies on encrypted tokens means the Server has
to decrypt a token in order to determine that it is invalid;
the Server's resources are consumed instead of the Client's]

It takes effort (in the kernel) to verify that a Client *can* access a
particular Object (i.e., has a Handle to it) AND that the Client can
invoke THAT particular Method on that Object via this Handle (bound to
a particular Object *Interface*) as well as verifying the format of
the data, converting to a format suitable for the targeted Object
(which may use a different representational structure) for a
particular Version of the Interface...

I can either skimp on performing some of these checks (and rely
on other mechanisms to ensure the security and reliability of
the codebase -- in the presence of unvetted Actors) or hope
that some hardware mechanism in the processor makes these a bit
easier.

> At the low end, yeah, a simple RISC processor. And support for complex arithmetic
> using 32-bit floats? And support for pixel alpha blending using quad 16-bit numbers?
> 32-bit pointers into the software?

I doubt complex arithmetic will have much play. There might be support for
*building* larger data types (e.g., I use BigRationals which are incredibly
inefficient). But, the bigger bang will be for operators that allow
tedious/iterative solutions to be implemented in constant time. This,
for example, is why a hardware multiply (or other FPU capabilities)
is such a win -- consider the amount of code that is replaced by a single
op-code! Ditto things like "find first set bit", etc.

Why stick with 32b floats when you can likely implement doubles with a bit
more microcode (surely faster than trying to do wider operations built from
narrower ones)?

There's an entirely different mindset when you start thinking in
terms of "bigger processors". I.e., the folks who see 32b processors as
just *wider* 8/16b processors have typically not made this adjustment.
It's like trying to "sample the carry" in a HLL (common in ASM)
instead of concentrating on what you REALLY want to do and letting
the language make it easier for you to express that.

Expect to see people making leaps forward in terms of what they
expect from the solutions they put forth. Anything that you could
do with a PC, before, can now be done *in* a handheld flashlight!

Don Y

Jun 8, 2021, 8:31:25 PM
On 6/8/2021 4:04 AM, David Brown wrote:
> On 08/06/2021 09:39, Don Y wrote:
>> On 6/7/2021 10:59 PM, David Brown wrote:
>>> 8-bit microcontrollers are still far more common than 32-bit devices in
>>> the embedded world (and 4-bit devices are not gone yet). At the other
>>> end, 64-bit devices have been used for a decade or two in some kinds of
>>> embedded systems.
>>
>> I contend that a good many "32b" implementations are really glorified
>> 8/16b applications that exhausted their memory space.
>
> Sure. Previously you might have used 32 kB flash on an 8-bit device,
> now you can use 64 kB flash on a 32-bit device. The point is, you are
> /not/ going to find yourself hitting GB limits any time soon. The step

I don't see the "problem" with 32b devices as one of address space limits
(except devices utilizing VMM with insanely large page sizes). As I said,
in my application, task address spaces are really just a handful of pages.

I *do* see (flat) address spaces that find themselves filling up with
stack-and-heap-per-task, big chunks set aside for "onboard" I/Os,
*partial* address decoding for offboard I/Os, etc. (i.e., you're
not likely going to fully decode a single address to access a set
of DIP switches as the decode logic is disproportionately high
relative to the functionality it adds)

How often do you see a high-order address line used for kernel/user?
(gee, now your "user" space has been halved)

> from 8-bit or 16-bit to 32-bit is useful to get a bit more out of the
> system - the step from 32-bit to 64-bit is totally pointless for 99.99%
> of embedded systems. (Even for most embedded Linux systems, you usually
> only have a 64-bit cpu because you want bigger and faster, not because
> of memory limitations. It is only when you have a big gui with fast
> graphics that 32-bit address space becomes a limitation.)

You're assuming there has to be some "capacity" value to the 64b move.

You might discover that the ultralow power devices (for phones!)
are being offered in the process geometries targeted for the 64b
devices. Or, that some integrated peripheral "makes sense" for
phones (but not MCUs targeting motor control applications). Or,
that there are additional power management strategies supported
in the hardware.

In my mind, the distinction brought about by "32b" was more advanced
memory protection/management -- even if not used in a particular
application. You simply didn't see these sorts of mechanisms
in 8/16b offerings. Likewise, floating point accelerators. Working
in smaller processors meant you had to spend extra effort to
bullet-proof your code, economize on math operators, etc.

So, if you wanted the advantages of those (hardware) mechanisms,
you "upgraded" your design to 32b -- even if it didn't need
gobs of address space or generic MIPS. It just wasn't economical
to bolt on an AM9511 or practical to build a homebrew MMU.

> A 32-bit microcontroller is simply much easier to work with than an
> 8-bit or 16-bit with "extended" or banked memory to get beyond 64 K
> address space limits.

There have been some 8b processors that could seamlessly (in HLL)
handle extended address spaces. The Z180s were delightfully easy
to use, thusly. You just had to keep in mind that a "call" to
a different bank was more expensive than a "local" call (though
there were no syntactic differences; the linkage editor and runtime
package made this invisible to the developer).

We were selling products with 128K of DRAM on Z80's back in 1981.
Because it was easier to design THAT hardware than to step up to
a 68K, for example. (as well as leveraging our existing codebase)
The "video game era" was built on hybridized 8b systems -- even though
you could buy 32b hardware, at the time. You would be surprised at
the ingenuity of many of those systems in offloading the processor
of costly (time consuming) operations to make the device appear more
powerful than it actually was.

>>> We'll see 64-bit take a greater proportion of the embedded systems that
>>> demand high throughput or processing power (network devices, hard cores
>>> in expensive FPGAs, etc.) where the extra cost in dollars, power,
>>> complexity, board design are not a problem. They will probably become
>>> more common in embedded Linux systems as the core itself is not usually
>>> the biggest part of the cost. And such systems are definitely on the
>>> increase.
>>>
>>> But for microcontrollers - which dominate embedded systems - there has
>>> been a lot to gain by going from 8-bit and 16-bit to 32-bit for little
>>
>> I disagree. The "cost" (barrier) that I see clients facing is the
>> added complexity of a 32b platform and how it often implies (or even
>> *requires*) a more formal OS underpinning the application.
>
> Yes, that is definitely a cost in some cases - 32-bit microcontrollers
> are usually noticeably more complicated than 8-bit ones. How
> significant the cost is depends on the balances of the project between
> development costs and production costs, and how beneficial the extra
> functionality can be (like moving from bare metal to RTOS, or supporting
> networking).

I see most 32b designs operating without the benefits that a VMM system
can apply (even if you discount demand paging). They just want to have
a big address space and not have to dick with "segment registers", etc.
They plow through the learning effort required to configure the device
to move the "extra capabilities" out of the way. Then, just treat it
like a bigger 8/16 processor.

You can "bolt on" a simple network stack even with a rudimentary RTOS/MTOS.
Likewise, a web server. Now, you remove the need for graphics and other UI
activities hosted *in* the device. And, you likely don't need to support
multiple concurrent clients. If you want to provide those capabilities, do
that *outside* the device (let it be someone else's problem). And, you gain
"remote access" for free.

Few such devices *need* (or even WANT!) ARP caches, inetd, high performance
stack, file systems, etc.

Given the obvious (coming) push for enhanced security in devices, anything
running on your box that you don't need (or UNDERSTAND!) is likely going to
be pruned off as a way to reduce the attack surface. "Why is this port open?
What is this process doing? How robust is the XXX subsystem implementation
to hostile actors in an *unsupervised* setting?"

>>> cost. There is almost nothing to gain from a move to 64-bit, but the
>>> cost would be a good deal higher.
>>
>> Why is the cost "a good deal higher"? Code/data footprints don't
>> uniformly "double" in size. The CPU doesn't slow down to handle
>> bigger data.
>
> Some parts of code and data /do/ double in size - but not uniformly, of
> course. But your chip is bigger, faster, requires more power, has wider
> buses, needs more advanced memories, has more balls on the package,
> requires finer pitched pcb layouts, etc.

And has been targeted to a market that is EXTREMELY power sensitive
(phones!).

It is increasingly common for manufacturing technologies to be moving away
from "casual development". The days of owning your own wave and doing
in-house manufacturing at a small startup are gone. If you want to
limit yourself to the kinds of products that you CAN (easily) assemble, you
will find yourself operating with a much poorer selection of components
available. I could fab a PCB in-house and build small runs of prototypes
using the wave and shake-and-bake facilities that we had on hand. Harder
to do so, nowadays.

This has always been the case. When thru-hole met SMT, folks had to
either retool to support SMT, or limit themselves to components that
were available in thru-hole packages. As the trend has always been
for MORE devices to move to newer packaging technologies, anyone
who spent any time thinking about it could read the writing on the wall!
(I bought my Leister in 1988? Now, I prefer begging favors from
colleagues to get my prototypes assembled!)

I suspect this is why we now see designs built on COTS "modules"
increasingly. Just like designs using wall warts (so they don't
have to do the testing on their own, internally designed supplies).
It's one of the reasons FOSH is hampered (unlike FOSS, you can't roll
your own copy of a hardware design!)

> In theory, you /could/ make a microcontroller in a 64-pin LQFP and
> replace the 72 MHz Cortex-M4 with a 64-bit ARM core at the same clock
> speed. The die would only cost two or three times more, and take
> perhaps less than 10 times the power for the core. But it would be so
> utterly pointless that no manufacturer would make such a device.

This is specious reasoning: "You could take the die out of a 68K and
replace it with a 64 bit ARM." Would THAT core cost two or three times more
(do you recall how BIG 68K die were?) and consume 10 times the power?
(it would consume considerably LESS).

The market will drive the cost (power, size, $$$, etc.) of 64b cores
down as they will find increasing use in devices that are size and
power constrained. There's far more incentive to make a cheap,
low power 64b ARM than there is to make a cheap, low power i686
(or 68K) -- you don't see x86 devices in phones (laptops have bigger
power budgets so less pressure on efficiency).

There's no incentive to making thru-hole versions of any "serious"
processor, today. Just like you can't find any fabs for DTL devices.
Or 10 & 12" vinyl. (yeah, you can buy vinyl, today -- at a premium.
And, I suspect you can find someone to package an ARM on a DIP
carrier. But, each of those are niche markets, not where the
"money lies")

> So a move to 64-bit in practice means moving from a small, cheap,
> self-contained microcontroller to an embedded PC. Lots of new
> possibilities, lots of new costs of all kinds.

How do you come to that conclusion? I have a 32b MCU on a board.
And some FLASH and DRAM. How is that going to change when I
move to a 64b processor? The 64b devices are also SoCs so
it's not like you suddenly have to add address decoding logic,
a clock generator, interrupt controller, etc.

Will phones suddenly become FATTER to accommodate the extra
hardware needed? Will they all need bolt on battery boosters?

> Oh, and the cpu /could/ be slower for some tasks - bigger cpus that are
> optimised for throughput often have poorer latency and more jitter for
> interrupts and other time-critical features.

You're cherry picking. They can also be FASTER for other tasks
and likely will be optimized to justify/exploit those added abilities;
a vendor isn't going to offer a product that is LESS desirable than
his existing products. An IPv6 stack on a 64b processor is a bit
easier to implement than on 32b.

(remember, ARM is in a LOT of fabs! That speaks to how ubiquitous
it is!)

>>> So it is not going to happen - at
>>> least not more than a very small and very gradual change.
>>
>> We got 32b processors NOT because the embedded world cried out for
>> them but, rather, because of the influence of the 32b desktop world.
>> We've had 32b processors since the early 80's. But, we've only had
>> PCs since about the same timeframe! One assumes ubiquity in the
>> desktop world would need to happen before any real spillover to embedded.
>> (When the "desktop" was an '11 sitting in a back room, it wasn't seen
>> as ubiquitous.)
>
> I don't assume there is any direct connection between the desktop world
> and the embedded world - the needs are usually very different. There is
> a small overlap in the area of embedded devices with good networking and
> a gui, where similarity to the desktop world is useful.

The desktop world inspires the embedded world. You see what CAN be done
for "reasonable money".

In the 70's, we put i4004's into products because we knew the processing
that was required was "affordable" (at several kilobucks) -- because
we had our own '11 on site. We leveraged the in-house '11 to compute
"initialization constants" for the needs of specific users (operating
the i4004-based products). We didn't hesitate to migrate to i8080/85
when they became available -- because the price point was largely
unchanged (from where it had been with the i4004) AND we could skip the
involvement of the '11 in computing those initialization constants!

I watch the prices of the original 32b ARM I chose fall and see that
as an opportunity -- to UPGRADE the capabilities (and future-safeness
of the design). If I'd assumed $X was a tolerable price, before,
then it likely still is!

> We have had 32-bit microcontrollers for decades. I used a 16-bit
> Windows system when working with my first 32-bit microcontroller. But
> at that time, 32-bit microcontrollers cost a lot more and required more
> from the board (external memories, more power, etc.) than 8-bit or
> 16-bit devices. That has gradually changed with an almost total
> disregard for what has happened in the desktop world.

I disagree. I recall having to put lots of "peripherals" into
an 8/16b system, external address decoding logic, clock generators,
DRAM controllers, etc.

And, the cost of entry was considerably higher. Development systems
used to cost tens of kilodollars (Intellec MDS, Zilog ZRDS, Moto
EXORmacs, etc.) I shared a development system with several other
developers in the 70's -- because the idea of giving each of us our
own was anathema, at the time.

For 35+ years, you could put one on YOUR desk for a few kilobucks.
Now, it's considerably less than that.

You'd have to be blind to NOT think that the components that
are "embedded" in products haven't -- and won't continue -- to
see similar reductions in price and increases in performance.

Do you think the folks making the components didn't anticipate
the potential demand for smaller/faster/cheaper chips?

We've had TCP/IP for decades. Why is it "suddenly" more ubiquitous
in product offerings? People *see* what they can do with a technology
in one application domain (e.g., desktop) and extrapolate that to
other, similar application domains (embedded).

I did my first full custom 30+ years ago. Now, I can buy an off-the-shelf
component and "program" it to get similar functionality (without
involving a service bureau). Ideas that previously were "gee, if only..."
are now commonplace.

> Yes, the embedded world /did/ cry out for 32-bit microcontrollers for an
> increasing proportion of tasks. We cried many tears when the
> microcontroller manufacturers offered to give more flash space to their
> 8-bit devices by having different memory models, banking, far jumps, and
> all the other shit that goes with not having a big enough address space.
> We cried out when we wanted to have Ethernet and the microcontroller
> only had a few KB of ram. I have used maybe 6 or 8 different 32-bit
> microcontroller processor architectures, and I used them because I
> needed them for the task. It's only in the past 5+ years that I have
> been using 32-bit microcontrollers for tasks that could be done fine
> with 8-bit devices, but the 32-bit devices are smaller, cheaper and
> easier to work with than the corresponding 8-bit parts.

But that's because your needs evolve and the tools you choose to
use have, as well.

I wanted to build a little line frequency clock to see how well it
could discipline my NTPd. I've got all these PCs, single board PCs,
etc. lying around. It was *easier* to hack together a small 8b
processor to do the job -- less hardware to understand, no OS
to get in the way, really simple to put a number on the interrupt
latency that I could expect, no uncertainties about the hardware
that's on the PC, etc.

OTOH, I have a network stack that I wrote for the Z180 decades
ago. Despite being written in a HLL, it is a bear to deploy and
maintain owing to the tools and resources available in that
platform. My 32b stack was a piece of cake to write, by comparison!

>> In the future, we'll see the 64b *phone* world drive the evolution
>> of embedded designs, similarly. (do you really need 32b/64b to
>> make a phone? how much code is actually executing at any given
>> time and in how many different containers?)
>
> We will see that on devices that are, roughly speaking, tablets -
> embedded systems with a good gui, a touchscreen, networking. And that's
> fine. But these are a tiny proportion of the embedded devices made.

Again, I disagree. You've already admitted to using 32b processors
where 8b could suffice. What makes you think you won't be using 64b
processors when 32b could suffice?

It's just as hard for me to prototype a 64b SoC as it is a 32b SoC.
The boards are essentially the same size. "System" power consumption
is almost identical. Cost is the sole differentiating factor, today.
History tells us it will be less so, tomorrow. And, the innovations
that will likely come in that offering will likely exceed the
capabilities (or perceived market needs) of smaller processors.
To say nothing of the *imagined* uses that future developers will
envision!

I can make a camera that "reports to google/amazon" to do motion detection,
remote access, etc. Or, for virtually the same (customer) dollars, I
can provide that functionality locally. Would a customer want to add
an "unnecessary" dependency to a solution? "Tired of being dependant
on Big Brother for your home security needs? ..." Imagine a 64b SoC
with a cellular radio: "I'll *call* you when someone comes to the door..."
(or SMS)

I have cameras INSIDE my garage that assist with my parking and
tell me if I've forgotten to close the garage door. Should I have
google/amazon perform those value-added tasks for me? Will they
tell me if I've left something in the car's path before I run over it?
Will they turn on the light to make it easier for me to see?
Should I, instead, tether all of those cameras to some "big box"
that does all of that signal processing? What happens to those
resources when the garage is "empty"??

The "electric eye" (interrupter) that guards against closing the
garage door on a toddler/pet/item in it's path does nothing to
protect me if I leave some portion of the vehicle in the path of
the door (but ABOVE the detection range of the interrupter).
Locating a *camera* on teh side of the doorway lets me detect
if ANYTHING is in the path of the door, regardless of how high
above the old interrupter's position it may be located.

How *many* camera interfaces should the SoC *directly* support?

The number (and type) of applications that can be addressed with
ADDITIONAL *local* smarts/resources is almost boundless. And, folks
don't have to wait for a cloud supplier (off-site processing) to
decide to offer them.

"Build it and they will come."

[Does your thermostat REALLY need all of that horsepower -- two
processors! -- AND google's server in order to control the HVAC
in your home? My god, how did that simple bimetallic strip
ever do it??!]

If you move into the commercial/industrial domains, the opportunities
are even more diverse! (e.g., build a camera that does component inspection
*in* the camera and interfaces to a go/nogo gate or labeller)

Note that none of these applications need a display, touch panel, etc.
What they likely need is low power, small size, connectivity, MIPS and
memory. The same sorts of things that are common in phones.

>>> The OP sounds more like a salesman than someone who actually works with
>>> embedded development in reality.
>>
>> Possibly. Or, just someone that wanted to stir up discussion...
>
> Could be. And there's no harm in that!

On that, we agree.

Time for ice cream (easiest -- and most enjoyable -- way to lose weight)!

Paul Rubin

Jun 8, 2021, 9:20:13 PM
James Brakefield <jim.bra...@ieee.org> writes:
> Am trying to puzzle out what a 64-bit embedded processor should look like.

Buy yourself a Raspberry Pi 4 and set it up to run your fish tank via a
remote web browser. There's your 64 bit embedded system.

Don Y

Jun 8, 2021, 9:29:55 PM
On 6/8/2021 3:01 PM, Dimiter_Popoff wrote:

>> Am trying to puzzle out what a 64-bit embedded processor should look like.
>> At the low end, yeah, a simple RISC processor. And support for complex
>> arithmetic
>> using 32-bit floats? And support for pixel alpha blending using quad 16-bit
>> numbers?
>> 32-bit pointers into the software?
>
> The real value in 64 bit integer registers and 64 bit address space is
> just that, having an orthogonal "endless" space (well I remember some
> 30 years ago 32 bits seemed sort of "endless" to me...).
>
> Not needing to assign overlapping logical addresses to anything
> can make a big difference to how the OS is done.

That depends on what you expect from the OS. If you are
comfortable with the possibility of bugs propagating between
different subsystems, then you can live with a logical address
space that exactly coincides with a physical address space.

But, consider how life was before Windows compartmentalized
applications (and the OS). How easy it was for one "application"
(or subsystem) to cause a reboot -- unceremoniously.

The general direction (in software development, and, by
association, hardware) seems to be to move away from unrestrained
access to the underlying hardware in an attempt to limit the
amount of damage that a "misbehaving" application can cause.

You see this in languages designed to eliminate dereferencing
pointers, pointer arithmetic, etc. Languages that claim to
ensure your code can't misbehave because it can only do
exactly what the language allows (no more injecting ASM
into your HLL code).

I think that because you are the sole developer in your
application, you see a distorted vision of what the rest
of the development world encounters. Imagine handing your
codebase to a third party. And, *then* having to come
back to it and fix the things that "got broken".

Or, in my case, allowing a developer to install software
that I have to "tolerate" (for some definition of "tolerate")
without impacting the software that I've already got running.
(i.e., its ok to kill off his application if it is broken; but
he can't cause *my* portion of the system to misbehave!)

> 32 bit FPU seems useless to me, 64 bit is OK. Although 32 FP
> *numbers* can be quite useful for storing/passing data.

32 bit numbers have appeal if your registers are 32b;
they "fit nicely". Ditto 64b in 64b registers.

Don Y

Jun 8, 2021, 9:33:31 PM
On 6/8/2021 1:39 PM, Dimiter_Popoff wrote:

> Not long ago in a chat with a guy who knew some of ARM 64 bit I gathered
> there is some real mess with their out of order execution, one needs to
> do... hmmmm.. "sync", whatever they call it, all the time and there is
> a huge performance cost because of that. Anybody heard anything about
> it? (I only know what I was told).

Many processors support instruction reordering (and many compilers
will reorder the code they generate). In each case, the reordering
is supposed to preserve semantics.

If the code "just runs" (and is never interrupted nor synchronized
with something else), the result should be the same.

If you want to be able to arbitrarily interrupt an instruction
sequence, then you need to take special measures. This is why
we have barriers, the ability to flush caches, etc.

For "generic" code, the developer isn't involved with any of this.
Inside the kernel (or device drivers), it's often a different
story...
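The "special measures" above map directly onto C11 atomics; a minimal sketch of a producer/consumer handoff (the publish/consume names are illustrative, not from the thread), where release/acquire ordering constrains reordering by both the compiler and the CPU:

```c
#include <stdatomic.h>
#include <stdbool.h>

static int payload;                 /* plain data written by one context  */
static atomic_bool ready = false;   /* flag observed by the other context */

/* Producer: the release store forbids moving the write of `payload`
 * past the write of `ready`, in the compiler and in the CPU. */
void publish(int value)
{
    payload = value;
    atomic_store_explicit(&ready, true, memory_order_release);
}

/* Consumer: the acquire load guarantees that once `ready` is seen
 * true, the earlier write to `payload` is also visible. */
bool try_consume(int *out)
{
    if (!atomic_load_explicit(&ready, memory_order_acquire))
        return false;
    *out = payload;
    return true;
}
```

With relaxed ordering instead, code that "just runs" uninterrupted would still behave, but an interrupt handler or second core could observe ready set before payload -- exactly the hazard the barriers exist to prevent.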

George Neuner

Jun 9, 2021, 12:16:42 AM
On Tue, 8 Jun 2021 22:11:18 +0200, David Brown
<david...@hesbynett.no> wrote:


>Pretty much all processors except x86 and brain-dead old-fashioned 8-bit
>CISC devices are RISC...

It certainly is correct to say of the x86 that its legacy, programmer
visible, instruction set is CISC ... but it is no longer correct to
say that the chip design is CISC.

Since (at least) the Pentium 4, x86 chips really are a CISC decoder bolted
onto the front of what essentially is a load/store RISC.

"Complex" x86 instructions (in RAM and/or $I cache) are dynamically
translated into equivalent short sequences[*] of RISC-like wide format
instructions which are what actually is executed. Those sequences
also are stored into a special trace cache in case they will be used
again soon - e.g., in a loop - so they (hopefully) will not have to be
translated again.


[*] Actually, a great many x86 instructions map 1:1 to internal RISC
instructions - only a small percentage of complex x86 instructions
require "emulation" via a sequence of RISC instructions.


>... Not all [RISC] are simple.

Correct. Every successful RISC CPU has supported a suite of complex
instructions.


Of course, YMMV.
George

David Brown

Jun 9, 2021, 3:18:02 AM
On 09/06/2021 02:30, Don Y wrote:
> On 6/8/2021 4:04 AM, David Brown wrote:
>> On 08/06/2021 09:39, Don Y wrote:
>>> On 6/7/2021 10:59 PM, David Brown wrote:
>>>> 8-bit microcontrollers are still far more common than 32-bit devices in
>>>> the embedded world (and 4-bit devices are not gone yet).  At the other
>>>> end, 64-bit devices have been used for a decade or two in some kinds of
>>>> embedded systems.
>>>
>>> I contend that a good many "32b" implementations are really glorified
>>> 8/16b applications that exhausted their memory space.
>>
>> Sure.  Previously you might have used 32 kB flash on an 8-bit device,
>> now you can use 64 kB flash on a 32-bit device.  The point is, you are
>> /not/ going to find yourself hitting GB limits any time soon.  The step
>
> I don't see the "problem" with 32b devices as one of address space limits
> (except devices utilizing VMM with insanely large page sizes).  As I said,
> in my application, task address spaces are really just a handful of pages.
>

32 bit address space is not typically a problem or limitation.

(One other use of 64-bit address space is for debug tools like valgrind
or "sanitizers" that use large address spaces along with MMU protection
and specialised memory allocation to help catch memory errors. But
these also need sophisticated MMU's and a lot of other resources not
often found on small embedded systems.)

> I *do* see (flat) address spaces that find themselves filling up with
> stack-and-heap-per-task, big chunks set aside for "onboard" I/Os,
> *partial* address decoding for offboard I/Os, etc.  (i.e., you're
> not likely going to fully decode a single address to access a set
> of DIP switches as the decode logic is disproportionately high
> relative to the functionality it adds)
>
> How often do you see a high-order address line used for kernel/user?
> (gee, now your "user" space has been halved)

Unless you are talking about embedded Linux and particularly demanding
(or inefficient!) tasks, halving your address space is not going to be a
problem.

>
>> from 8-bit or 16-bit to 32-bit is useful to get a bit more out of the
>> system - the step from 32-bit to 64-bit is totally pointless for 99.99%
>> of embedded systems.  (Even for most embedded Linux systems, you usually
>> only have a 64-bit cpu because you want bigger and faster, not because
>> of memory limitations.  It is only when you have a big gui with fast
>> graphics that 32-bit address space becomes a limitation.)
>
> You're assuming there has to be some "capacity" value to the 64b move.
>

I'm trying to establish if there is any value at all in moving to
64-bit. And I have no doubt that for the /great/ majority of embedded
systems, it would not.

I don't even see it as having noticeable added value in the solid
majority of embedded Linux systems produced. But in those systems, the
cost is minor or irrelevant once you have a big enough processor.

> You might discover that the ultralow power devices (for phones!)
> are being offered in the process geometries targeted for the 64b
> devices.

Process geometries are not targeted at 64-bit. They are targeted at
smaller, faster and lower dynamic power. In order to produce such a big
design as a 64-bit cpu, you'll aim for a minimum level of process
sophistication - but that same process can be used for twice as many
32-bit cores, or bigger sram, or graphics accelerators, or whatever else
suits the needs of the device.

A major reason you see 64-bit cores in big SOC's is that the die space
is primarily taken up by caches, graphics units, on-board ram,
networking, interfaces, and everything else. Moving the cpu core from
32-bit to 64-bit only increases the die size by a few percent, and for
some tasks it will also increase the performance of the code by a
small but helpful amount. So it is not uncommon, even if you don't need
the additional address space.

(The other major reason is that for some systems, you want to work with
more than about 2 GB ram, and then life is much easier with 64-bit cores.)

On microcontrollers - say, a random Cortex-M4 or M7 device - changing to
a 64-bit core will increase the die by maybe 30% and give roughly /zero/
performance increase. You don't use 64-bit unless you really need it.



>  Or, that some integrated peripheral "makes sense" for
> phones (but not MCUs targeting motor control applications).  Or,
> that there are additional power management strategies supported
> in the hardware.
>
> In my mind, the distinction brought about by "32b" was more advanced
> memory protection/management -- even if not used in a particular
> application.  You simply didn't see these sorts of mechanisms
> in 8/16b offerings.  Likewise, floating point accelerators.  Working
> in smaller processors meant you had to spend extra effort to
> bullet-proof your code, economize on math operators, etc.

You need to write correct code regardless of the size of the device. I
disagree entirely about memory protection being useful there. This is
comp.arch.embedded, not comp.programs.windows (or whatever). An MPU
might make it easier to catch and fix bugs while developing and testing,
but code that hits MPU traps should not leave your workbench.

But you are absolutely right about maths (floating point or integer) -
having 32-bit gives you a lot more freedom and less messing around with
scaling back and forth to make things fit and work efficiently in 8-bit
or 16-bit. And if you have floating point hardware (and know how to use
it properly), that opens up new possibilities.

64-bit cores will extend that, but the step is almost negligible in
comparison. It would be wrong to say "int32_t is enough for anyone",
but it is /almost/ true. It is certainly true enough that it is not a
problem that using "int64_t" takes two instructions instead of one.

>> Some parts of code and data /do/ double in size - but not uniformly, of
>> course.  But your chip is bigger, faster, requires more power, has wider
>> buses, needs more advanced memories, has more balls on the package,
>> requires finer pitched pcb layouts, etc.
>
> And has been targeted to a market that is EXTREMELY power sensitive
> (phones!).

A phone cpu takes orders of magnitude more power to do the kinds of
tasks that might be typical for a microcontroller cpu - reading sensors,
controlling outputs, handling UARTs, SPI and I²C buses, etc. Phone cpus
are optimised for doing the "big phone stuff" efficiently - because
that's what takes the time, and therefore the power.

(I'm snipping because there is far too much here - I have read your
comments, but I'm trying to limit the ones I reply to.)

>>
>> We will see that on devices that are, roughly speaking, tablets -
>> embedded systems with a good gui, a touchscreen, networking.  And that's
>> fine.  But these are a tiny proportion of the embedded devices made.
>
> Again, I disagree.

I assume you are disagreeing about seeing 64-bit cpus only on devices
that need a lot of memory or processing power, rather than disagreeing
that such devices are only a tiny proportion of embedded devices.

> You've already admitted to using 32b processors
> where 8b could suffice.  What makes you think you won't be using 64b
> processors when 32b could suffice?

As I have said, I think there will be an increase in the proportion of
64-bit embedded devices - but I think it will be very slow and gradual.
Perhaps in 20 years time 64-bit will be in the place that 32-bit is
now. But it won't happen for a long time.

Why do I use 32-bit microcontrollers where an 8-bit one could do the
job? Well, we mentioned above that you can be freer with the maths.
You can, in general, be freer in the code - and you can use better tools
and languages. With ARM microcontrollers I can use the latest gcc and
C++ standards - I don't have to program in a weird almost-C dialect
using extensions to get data in flash, or pay thousands for a limited
C++ compiler with last century's standards. I don't have to try and
squeeze things into 8-bit scaled integers, or limit my use of pointers
due to cpu limitations.

And manufacturers make the devices smaller, cheaper, lower power and
faster than 8-bit devices in many cases.

If manufactures made 64-bit devices that are smaller, cheaper and lower
power than the 32-bit ones today, I'd use them. But they would not be
better for the job, or better to work with and better for development in
the way 32-bit devices are better than 8-bit and 16-bit.

>
> It's just as hard for me to prototype a 64b SoC as it is a 32b SoC.
> The boards are essentially the same size.  "System" power consumption
> is almost identical.  Cost is the sole differentiating factor, today.

For you, perhaps. Not necessarily for others.

We design, program and manufacture electronics. Production and testing
of simpler cards is cheaper. The pcbs are cheaper. The chips are
cheaper. The mounting is faster. The programming and testing is
faster. You don't mix big, thick tracks and high power on the same
board as tight-packed BGA with blind/buried vias - but you /can/ happily
work with less dense packages on the same board.

If you are talking about replacing one 400-ball SOC with another
400-ball SOC with a 64-bit core instead of a 32-bit core, then it will
make no difference in manufacturing. But if you are talking about
replacing a Cortex-M4 microcontroller with a Cortex-A53 SOC, it /will/
be a lot more expensive in most volumes.

I can't really tell what kinds of designs you are discussing here. When
I talk about embedded systems in general, I mean microcontrollers
running specific programs - not general-purpose computers in embedded
formats (such as phones).

(For very small volumes, the actual physical production costs are a
small proportion of the price, and for very large volumes you have
dedicated machines for the particular board.)

>>> Possibly.  Or, just someone that wanted to stir up discussion...
>>
>> Could be.  And there's no harm in that!
>
> On that, we agree.
>
> Time for ice cream (easiest -- and most enjoyable -- way to lose weight)!

I've not heard of that as a dieting method, but I shall give it a try :-)

David Brown

Jun 9, 2021, 4:40:43 AM
On 09/06/2021 06:16, George Neuner wrote:
> On Tue, 8 Jun 2021 22:11:18 +0200, David Brown
> <david...@hesbynett.no> wrote:
>
>
>> Pretty much all processors except x86 and brain-dead old-fashioned 8-bit
>> CISC devices are RISC...
>
> It certainly is correct to say of the x86 that its legacy, programmer
> visible, instruction set is CISC ... but it is no longer correct to
> say that the chip design is CISC.
>
> Since (at least) the Pentium 4, x86 chips really are a CISC decoder
> onto the front of what essentially is a load/store RISC.
>

Absolutely. But from the user viewpoint, it is the ISA that matters -
it is a CISC ISA. The implementation details are mostly hidden (though
sometimes it is useful to know about timings).

> "Complex" x86 instructions (in RAM and/or $I cache) are dynamically
> translated into equivalent short sequences[*] of RISC-like wide format
> instructions which are what actually is executed. Those sequences
> also are stored into a special trace cache in case they will be used
> again soon - e.g., in a loop - so they (hopefully) will not have to be
> translated again.
>
>
> [*] Actually, a great many x86 instructions map 1:1 to internal RISC
> instructions - only a small percentage of complex x86 instructions
> require "emulation" via a sequence of RISC instructions.
>

And also, some sequences of several x86 instructions map to single RISC
instructions, or to no instructions at all.

It is, of course, a horrendously complex mess - and is a major reason
for x86 cores taking more power and costing more than RISC cores for the
same performance.

>
>> ... Not all [RISC] are simple.
>
> Correct. Every successful RISC CPU has supported a suite of complex
> instructions.
>

Yes. People often parse RISC as R(IS)C - i.e., they think it means the
ISA has a small instruction set. It should be parsed (RI)SC - the
instructions are limited compared to those on a (CI)SC cpu.

>
> Of course, YMMV.
> George
>

David Brown

Jun 9, 2021, 4:59:34 AM
sync instructions of various types can be needed to handle
thread/process synchronisation, atomic accesses, and coordination
between software and hardware registers. Software normally runs with
the idea that it is the only thing running, and the cpu can re-order and
re-arrange the instructions and execution as long as it maintains the
illusion that the assembly instructions in the current thread are
executed one after the other. These re-arrangements and parallel
execution can give very large performance benefits.

But it also means that when you need to coordinate with other things,
you need syncs, perhaps cache flushes, etc. Full syncs can take
hundreds of cycles to execute on large processors. So you need to
distinguish between reads and writes, acquires and releases, syncs on
single addresses or general memory syncs. Big processors are optimised
for throughput, not latency or quick reaction to hardware events.
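The same distinction shows up on bare metal when coordinating software with hardware registers; a minimal sketch, assuming a hypothetical memory-mapped DMA controller whose descriptor writes must be visible before the start bit is set (the register layout and the dma_kick name are invented for illustration):

```c
#include <stdint.h>

/* Register block of a hypothetical memory-mapped DMA controller. */
typedef struct {
    volatile uint32_t src;    /* source address          */
    volatile uint32_t len;    /* transfer length         */
    volatile uint32_t start;  /* writing 1 kicks the DMA */
} dma_regs_t;

/* Full data memory barrier on ARM; a portable fence elsewhere. */
static inline void dmb(void)
{
#if defined(__arm__) || defined(__aarch64__)
    __asm__ volatile ("dmb sy" ::: "memory");
#else
    __atomic_thread_fence(__ATOMIC_SEQ_CST);
#endif
}

void dma_kick(dma_regs_t *dma, uint32_t src, uint32_t len)
{
    dma->src = src;
    dma->len = len;
    dmb();            /* descriptor must be visible first... */
    dma->start = 1;   /* ...only then trigger the transfer   */
}
```

Strictly, a DMB only orders memory accesses as seen by other observers; whether the DMA engine also needs cache maintenance operations depends on the system, which is part of why full syncs get expensive on big cores.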

There are good reasons why big cpus are often paired with a Cortex-M
core in SOCs.


Don Y

Jun 9, 2021, 6:12:41 AM
On 6/9/2021 12:17 AM, David Brown wrote:

>>> from 8-bit or 16-bit to 32-bit is useful to get a bit more out of the
>>> system - the step from 32-bit to 64-bit is totally pointless for 99.99%
>>> of embedded systems. (Even for most embedded Linux systems, you usually
>>> only have a 64-bit cpu because you want bigger and faster, not because
>>> of memory limitations. It is only when you have a big gui with fast
>>> graphics that 32-bit address space becomes a limitation.)
>>
>> You're assuming there has to be some "capacity" value to the 64b move.
>
> I'm trying to establish if there is any value at all in moving to
> 64-bit. And I have no doubt that for the /great/ majority of embedded
> systems, it would not.

That's a no-brainer -- most embedded systems are small MCUs.
Consider the PC I'm sitting at has an MCU in the keyboard;
another in the mouse; one in the optical disk drive; one in
the rust disk drive; one in the printer; two in the UPS;
one in the wireless "modem"; one in the router; one in
the thumb drive; etc. All offsetting the "big" CPU in
the computer, itself.

> I don't even see it as having noticeable added value in the solid
> majority of embedded Linux systems produced. But in those systems, the
> cost is minor or irrelevant once you have a big enough processor.

My point is that the market can distort the "price/value"
relationship in ways that might not, otherwise, make sense.
A "better" device may end up costing less than a "worse"
device -- simply because of the volumes that the population
of customers favor.

>> You might discover that the ultralow power devices (for phones!)
>> are being offered in the process geometries targeted for the 64b
>> devices.
>
> Process geometries are not targeted at 64-bit. They are targeted at
> smaller, faster and lower dynamic power. In order to produce such a big
> design as a 64-bit cpu, you'll aim for a minimum level of process
> sophistication - but that same process can be used for twice as many
> 32-bit cores, or bigger sram, or graphics accelerators, or whatever else
> suits the needs of the device.

They will apply newer process geometries to newer devices.
No one is going to retool an existing design -- unless doing
so will result in a significant market enhancement.

Why don't we have 100MHz MC6800's?

> A major reason you see 64-bit cores in big SOC's is that the die space
> is primarily taken up by caches, graphics units, on-board ram,
> networking, interfaces, and everything else. Moving the cpu core from
> 32-bit to 64-bit only increases the die size by a few percent, and for
> some tasks it will also increase the the performance of the code by a
> small but helpful amount. So it is not uncommon, even if you don't need
> the additional address space.
>
> (The other major reason is that for some systems, you want to work with
> more than about 2 GB ram, and then life is much easier with 64-bit cores.)
>
> On microcontrollers - say, a random Cortex-M4 or M7 device - changing to
> a 64-bit core will increase the die by maybe 30% and give roughly /zero/
> performance increase. You don't use 64-bit unless you really need it.

Again, "... unless the market has made those devices cheaper than
their previous choices" People don't necessarily "fit" their
applications to the devices they choose; they consider other
factors (cost, package type, availability, etc.) in deciding
what to actual design into the product.

You might "need" X MB of RAM but will "tolerate" 4X -- if the
price is better than for the X MB *or* the X MB devices are
not available. If the PCB layout can directly accommodate
such a solution, then great! But, even if not, a PCB
revision is a cheap expenditure if it lets you take advantage of
a different component.

I've made very deliberate efforts NOT to use many of the
"I/Os" on the MCUs that I'm designing around so I can
have more leeway in making that selection when released
to production (every capability used represents a
constraint that OTHER selections must satisfy)

>> Or, that some integrated peripheral "makes sense" for
>> phones (but not MCUs targeting motor control applications). Or,
>> that there are additional power management strategies supported
>> in the hardware.
>>
>> In my mind, the distinction brought about by "32b" was more advanced
>> memory protection/management -- even if not used in a particular
>> application. You simply didn't see these sorts of mechanisms
>> in 8/16b offerings. Likewise, floating point accelerators. Working
>> in smaller processors meant you had to spend extra effort to
>> bullet-proof your code, economize on math operators, etc.
>
> You need to write correct code regardless of the size of the device. I
> disagree entirely about memory protection being useful there. This is
> comp.arch.embedded, not comp.programs.windows (or whatever). An MPU
> might make it easier to catch and fix bugs while developing and testing,
> but code that hits MPU traps should not leave your workbench.

You're assuming you (or I) have control over all of the code that
executes on a product/platform. And, that every potential bug
manifests *in* testing. (If that were the case, we'd never
see bugs in the wild!)

In my case, "third parties" (who the hell is the SECOND party??)
can install code that I've no control over. That code could
be buggy -- or malevolent. Being able to isolate "actors"
from each other means the OS can detect "can't happens"
at run time and shut down the offender -- instead of letting
it corrupt some part of the system.
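That isolation can be sketched with a host-side analogy (POSIX fork on a hosted OS, not an embedded MPU, and the actor names are invented): run the suspect code in its own address space, so a wild write kills only the offender while the supervisor carries on:

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run `actor` in a child process.  If it trips a protection fault,
 * the kernel kills only the child; the supervisor observes the
 * terminating signal and keeps running. */
int run_isolated(void (*actor)(void))
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;                /* could not create the child */
    if (pid == 0) {               /* child: the untrusted actor */
        actor();
        _exit(0);
    }
    int status;
    waitpid(pid, &status, 0);
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}

void well_behaved(void) { }

void misbehaving(void)
{
    *(volatile int *)0 = 1;       /* wild write: faults in the child only */
}
```

An MPU-based RTOS achieves the same "shut down the offender" effect with far less machinery, but the principle -- hardware-enforced address-space boundaries between actors -- is the same.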

> But you are absolutely right about maths (floating point or integer) -
> having 32-bit gives you a lot more freedom and less messing around with
> scaling back and forth to make things fit and work efficiently in 8-bit
> or 16-bit. And if you have floating point hardware (and know how to use
> it properly), that opens up new possibilities.
>
> 64-bit cores will extend that, but the step is almost negligible in
> comparison. It would be wrong to say "int32_t is enough for anyone",
> but it is /almost/ true. It is certainly true enough that it is not a
> problem that using "int64_t" takes two instructions instead of one.

Except that int64_t can take *four* instead of one (add/sub/mul two
int64_t's with 32b hardware).
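Made explicit: on a 32-bit core a 64-bit add is an ADD of the low halves followed by an add-with-carry of the high halves, which is roughly what a compiler emits for int64_t (the carry is modelled by hand here, since C exposes no carry flag):

```c
#include <stdint.h>

/* 64-bit addition built from 32-bit halves -- two instructions'
 * worth of work (ADD then ADC) on a typical 32-bit ISA. */
uint64_t add64(uint32_t a_lo, uint32_t a_hi, uint32_t b_lo, uint32_t b_hi)
{
    uint32_t lo    = a_lo + b_lo;          /* ADD: low words           */
    uint32_t carry = (lo < a_lo);          /* carry out of the low add */
    uint32_t hi    = a_hi + b_hi + carry;  /* ADC: high words + carry  */
    return ((uint64_t)hi << 32) | lo;
}
```

A 64x64 multiply is worse still: the schoolbook method needs three or four 32x32 partial products plus the adds to combine them, which is where the factor of four comes from.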

>>> Some parts of code and data /do/ double in size - but not uniformly, of
>>> course. But your chip is bigger, faster, requires more power, has wider
>>> buses, needs more advanced memories, has more balls on the package,
>>> requires finer pitched pcb layouts, etc.
>>
>> And has been targeted to a market that is EXTREMELY power sensitive
>> (phones!).
>
> A phone cpu takes orders of magnitude more power to do the kinds of
> tasks that might be typical for a microcontroller cpu - reading sensors,
> controlling outputs, handling UARTs, SPI and I²C buses, etc. Phone cpus
> are optimised for doing the "big phone stuff" efficiently - because
> that's what takes the time, and therefore the power.

But you're making assumptions about what the "embedded microcontroller"
will actually be called upon to do!

Most of my embedded devices have "done more" than the PCs on which
they were designed -- despite the fact that the PC can defrost bagels!

> (I'm snipping because there is far too much here - I have read your
> comments, but I'm trying to limit the ones I reply to.)
>
>>>
>>> We will see that on devices that are, roughly speaking, tablets -
>>> embedded systems with a good gui, a touchscreen, networking. And that's
>>> fine. But these are a tiny proportion of the embedded devices made.
>>
>> Again, I disagree.
>
> I assume you are disagreeing about seeing 64-bit cpus only on devices
> that need a lot of memory or processing power, rather than disagreeing
> that such devices are only a tiny proportion of embedded devices.

I'm disagreeing with the assumption that 64bit CPUs are solely used
on "tablets, devices with good GUIs, touchscreens, networking"
(in the embedded domain).

>> You've already admitted to using 32b processors
>> where 8b could suffice. What makes you think you won't be using 64b
>> processors when 32b could suffice?
>
> As I have said, I think there will be an increase in the proportion of
> 64-bit embedded devices - but I think it will be very slow and gradual.
> Perhaps in 20 years time 64-bit will be in the place that 32-bit is
> now. But it won't happen for a long time.

And how is that any different from 32b processors introduced in 1980
only NOW seeing any sort of "widespread" use?

The adoption of new technologies accelerates, over time. People
(not "everyone") are more willing to try new things -- esp if
it is relatively easy to do so. I can buy a 64b evaluation kit
for a few hundred dollars -- I paid more than that for my first
8" floppy drive. I can run/install some demo software and
get a feel for the level of performance, how much power
is consumed, etc. I don't need to convince my employer to
make that investment (so *I* can explore).

In a group environment, if such a solution is *suggested*,
I can then lend my support -- instead of shying away out of
fear of the unknown risks.

> Why do I use 32-bit microcontrollers where an 8-bit one could do the
> job? Well, we mentioned above that you can be freer with the maths.
> You can, in general, be freer in the code - and you can use better tools
> and languages.

Exactly. It's "easier" and you're less concerned with sorting
out (later) what might not fit or be fast enough, etc.

I could have done my current project with a bunch of PICs
talking to a "big machine" over EIA485 links (I'd done an
industrial automation project like that, before). But,
unless you can predict how many sensors/actuators ("motes")
there will EVER be, it's hard to determine how "big" that
computer needs to be!

Given that the cost of the PIC is only partially reflective
of the cost of the DEPLOYED mote (run cable, attach and
calibrate sensors/actuators, etc.) the added cost of
moving to a bigger device on that mote disappears.
Especially when you consider the flexibility it affords
(in terms of scaling)

> With ARM microcontrollers I can use the latest gcc and
> C++ standards - I don't have to program in a weird almost-C dialect
> using extensions to get data in flash, or pay thousands for a limited
> C++ compiler with last century's standards. I don't have to try and
> squeeze things into 8-bit scaled integers, or limit my use of pointers
> due to cpu limitations.
>
> And manufacturers make the devices smaller, cheaper, lower power and
> faster than 8-bit devices in many cases.
>
> If manufactures made 64-bit devices that are smaller, cheaper and lower
> power than the 32-bit ones today, I'd use them. But they would not be
> better for the job, or better to work with and better for development in
> the way 32-bit devices are better than 8-bit and 16-bit.

Again, you're making predictions about what those devices will be.

Imagine 64b devices ARE equipped with radios. You can ADD a radio
to your "better suited" 32b design. Or, *buy* the radio already
integrated into the 64b solution. Are you going to stick with
32b devices because they are "better suited" to the application?
Or, will you "suffer" the pains of embracing the 64b device?

It's not *just* a CPU core that you're dealing with. Just like
the 8/16 vs 32b decision isn't JUST about the width of the registers
in the device or size of the address space.

I mentioned my little experimental LFC device to discipline my
NTPd. It would have been *nice* if it had an 8P8C onboard
so I could talk to it "over the wire". But, that's not the
appropriate sort of connectivity for an 8b device -- a serial
port is. If I didn't have a means of connecting to it thusly,
the 8b solution -- despite being a TINY development effort -- would
have been impractical; bolting on a network stack and NIC would
greatly magnify the cost (development time) of that platform.

>> It's just as hard for me to prototype a 64b SoC as it is a 32b SoC.
>> The boards are essentially the same size. "System" power consumption
>> is almost identical. Cost is the sole differentiating factor, today.
>
> For you, perhaps. Not necessarily for others.
>
> We design, program and manufacture electronics. Production and testing
> of simpler cards is cheaper. The pcbs are cheaper. The chips are
> cheaper. The mounting is faster. The programming and testing is
> faster. You don't mix big, thick tracks and high power on the same
> board as tight-packed BGA with blind/buried vias - but you /can/ happily
> work with less dense packages on the same board.
>
> If you are talking about replacing one 400-ball SOC with another
> 400-ball SOC with a 64-bit core instead of a 32-bit core, then it will
> make no difference in manufacturing. But if you are talking about
> replacing a Cortex-M4 microcontroller with a Cortex-A53 SOC, it /will/
> be a lot more expensive in most volumes.
>
> I can't really tell what kinds of designs you are discussing here. When
> I talk about embedded systems in general, I mean microcontrollers
> running specific programs - not general-purpose computers in embedded
> formats (such as phones).

I cite phones as an example of a "big market" that will severely
impact the devices (MCUs) that are actually manufactured and sold.

I increasingly see "applications" growing in complexity -- beyond
"single use" devices in the past. Devices talk to more things
(devices) than they had, previously. Interfaces grow in
complexity (markets often want to exercise some sort of control
or configuration over a device -- remotely -- instead of just
letting it do its ONE thing).

In the past, additional functionality was an infrequent upgrade.
Now, designs accommodate it "in the field" -- because they
are expected to (no one wants to mail a device back to the factory
for a software upgrade -- or have a visit from a service tech
for that purpose).

Rarely does a product become LESS complex, with updates. I've
often found myself updating a design only to discover I've
run out of some resource ("ROM", RAM, real-time, etc.). This
never causes the update to be aborted; rather, it forces
an unexpected diversion into shoehorning the "new REQUIREMENTS"
into the old "5 pound sack".

In *my* case, there are fixed applications (MANY) running on
the hardware. But, the system is designed to allow for
new applications to be added, old ones replaced (or retired),
augmented with additional hardware, etc. It's not the "closed
unless updated" systems previously common.

We made LORAN-C position plotters, ages ago. Conceptually,
cut a portion of a commercially available map and adhere it
to the plotter bed. Position the pen at your current location
on the map. Turn on. Start driving ("sailing"). The pen
will move to indicate your NEW current position as well as
a track indicating your path TO that (from wherever you
were a moment ago).

[This uses 100% of an 8b processor's real-time to keep up
with the updates from the navigation receiver.]

"Gee, what if the user doesn't have a commercial map,
handy? Can't we *draw* one for him?"

[Hmmm... if we concentrate on JUST drawing a map, then
we can spend 100% of the CPU on THAT activity! We'll just
need to find some extra space to store the code required
and RAM to hold the variables we'll need...]

"Gee, when the fisherman drops a lobster pot over the
side, he has to run over to the plotter to mark the
current location -- so he can return to it at some later
date. Why can't we give him a button (on a long cable)
that automatically draws an 'X' on the plot each time
he depresses it?"

You can see where this is going...

Devices grow in features and complexity. If that plotter
was designed today, it would likely have a graphic display
(instead of pen and ink). And the 'X' would want to be
displayed in RED (or, some user-configured color). And
another color for the map to distinguish it from the "track".
And updates would want to be distributed via a phone
or thumbdrive or other "user accessible" medium.

This because the needs of such a device will undoubtedly
evolve. How often have you updated the firmware in
your disk drives? Optical drives? Mice? Keyboard?
Microwave oven? TV?

We designed medical instruments where the firmware resided
in a big, bulky "module" that could easily be removed
(expensive ZIF connector!) -- so that medtechs could
perform the updates in minutes (instead of taking the device
out of service). But, as long as we didn't overly tax the
real-time demands of the "base hardware", we were free
(subject to pricing issues) to enhance that "module" to
accommodate whatever new features were required. The product
could "remain current".

Like adding RAM to a PC to extend its utility (why can't I add
RAM to my SmartTVs? Why can't I update their codecs?)

The upgradeable products are designed for longer service lives
than the non-upgradeable examples, here. So, they have to be
able to accommodate (in their "base designs") a wider variety
of unforeseeable changes.

If you expect a short service life, then you can rationalize NOT
upgrading/updating and simply expecting the user to REPLACE the
device at some interval that your marketeers consider appropriate.

> (For very small volumes, the actual physical production costs are a
> small proportion of the price, and for very large volumes you have
> dedicated machines for the particular board.)
>
>>>> Possibly. Or, just someone that wanted to stir up discussion...
>>>
>>> Could be. And there's no harm in that!
>>
>> On that, we agree.
>>
>> Time for ice cream (easiest -- and most enjoyable -- way to lose weight)!
>
> I've not heard of that as a dieting method, but I shall give it a try :-)

It's not recommended. I suspect it is evidence of some sort of
food allergy that causes my body not to process calories properly
(a tablespoon is 200+ calories; an enviable "scoop" is well over a
thousand!). It annoys my other half to no end cuz she gains weight
just by LOOKING at the stuff! :>  So, it's best for me to "sneak"
it when she can't set eyes on it. Or, for me to make flavors
that she's not keen on (this was butter pecan so she is REALLY
annoyed!)

Theo

unread,
Jun 9, 2021, 8:10:31 AMJun 9
to
Don Y <blocked...@foo.invalid> wrote:
> On 6/8/2021 7:46 AM, Theo wrote:
> > I think there will be divergence about what people mean by an N-bit system:
> >
> > Register size
> > Unit of logical/arithmetical processing
> > Memory address/pointer size
> > Memory bus/cache width
>
> (General) Register size is the primary driver.

Is it, though? What's driving that?
Why do you want larger registers without a larger ALU width?

I don't think register size is of itself a primary pressure. On larger CPUs
with lots of rename or vector registers, they have kilobytes of SRAM to hold
the registers, and increasing the size is a cost. On a basic in-order MCU
with 16 or 32 registers, is the register width an issue? We aren't
designing them on 10 micron technology any more.

I would expect datapath width to be more critical, but again that's
relatively small on an in-order CPU, especially compared with on-chip SRAM.

> However, it supports 16b operations -- on register PAIRs
> (an implicit acknowledgement that the REGISTER is smaller
> than the register pair). This is common on many smaller
> processors. The address space is 16b -- with a separate 16b
> address space for I/Os. The Z180 extends the PHYSICAL
> address space to 20b but the logical address space
> remains unchanged at 16b (if you want to specify a physical
> address, you must use 20+ bits to represent it -- and invoke
> a separate mechanism to access it!). The ALU is *4* bits.

This is not really the world of a current 32-bit MCU, which has a 32 bit
datapath and 32 bit registers. Maybe it does 64 bit arithmetic in 32 bit
chunks, which then leads to the question of which MCU workloads require 64
bit arithmetic?
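For illustration, here is a rough C model of what "64 bit arithmetic in 32 bit chunks" amounts to. A real compiler emits an add/add-with-carry instruction pair rather than this explicit carry test, so treat it as a sketch, not an implementation:

```c
#include <stdint.h>

/* A 64-bit value held as two 32-bit words, roughly as a 32-bit
 * MCU's compiler handles "long long" internally. */
typedef struct { uint32_t lo, hi; } u64_pair;

static u64_pair add64_32(u64_pair a, u64_pair b)
{
    u64_pair r;
    r.lo = a.lo + b.lo;              /* first 32-bit add          */
    uint32_t carry = (r.lo < a.lo);  /* carry out of the low word */
    r.hi = a.hi + b.hi + carry;      /* second add, plus carry    */
    return r;
}
```

Every 64-bit operation costs at least two 32-bit ones plus carry handling, which is the overhead in question.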

> But you don't buy MCUs with a-la-carte pricing. How much does an extra
> timer cost me? What if I want it to also serve as a *counter*? What
> cost for 100K of internal ROM? 200K?
>
> [It would be an interesting exercise to try to do a linear analysis of
> product prices with an idea of trying to tease out the "costs" (to
> the developer) for each feature in EXISTING products!]
>
> Instead, you see a *price* that is reflective of how widely used the
> device happens to be, today. You are reliant on the preferences of others
> to determine which is the most cost effective product -- for *you*.

Sure, what you buy is a 'highest common denominator' - you get things you
don't use, but that other people do. But it still depends on a significant
chunk of the market demanding those features. It's then a cost function of
how much the market wants a feature against how much it'll cost to implement
(and at runtime). If the cost is tiny, it may well get implemented even if
almost nobody asked for it.

If there's a use case, people will pay for it.
(although maybe not enough)

Theo

Don Y

unread,
Jun 9, 2021, 9:20:27 AMJun 9
to
On 6/9/2021 5:10 AM, Theo wrote:
> Don Y <blocked...@foo.invalid> wrote:
>> On 6/8/2021 7:46 AM, Theo wrote:
>>> I think there will be divergence about what people mean by an N-bit system:
>>>
>>> Register size
>>> Unit of logical/arithmetical processing
>>> Memory address/pointer size
>>> Memory bus/cache width
>>
>> (General) Register size is the primary driver.
>
> Is it, though? What's driving that?
> Why do you want larger registers without a larger ALU width?

You can use a smaller ALU (in the days when silicon was expensive)
to do the work of a larger one -- if you spread the operation over
time.
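The same trick, modeled in C: the Z80's 4-bit ALU did something like this in hardware, computing an 8-bit sum as two 4-bit adds across two internal cycles. The function name and structure here are invented for illustration:

```c
#include <stdint.h>

/* An 8-bit add performed on a hypothetical 4-bit ALU:
 * low nibble first, then high nibble plus the carry. */
static uint8_t add8_via_4bit_alu(uint8_t a, uint8_t b)
{
    uint8_t lo    = (a & 0x0F) + (b & 0x0F);   /* "cycle 1": low nibble  */
    uint8_t carry = lo >> 4;                   /* carry out of low nibble */
    uint8_t hi    = (a >> 4) + (b >> 4) + carry; /* "cycle 2": high nibble */
    return (uint8_t)((hi << 4) | (lo & 0x0F));
}
```

The programmer-visible register is 8 bits; the hardware just takes longer per operation.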

> I don't think register size is of itself a primary pressure. On larger CPUs
> with lots of rename or vector registers, they have kilobytes of SRAM to hold
> the registers, and increasing the size is a cost. On a basic in-order MCU
> with 16 or 32 registers, is the register width an issue? We aren't
> designing them on 10 micron technology any more.

It's just how people think of CPU widths. If there's no cost to
register width, then why didn't 8b CPUs have 64 bit accumulators
(and register files)?

> I would expect datapath width to be more critical, but again that's
> relatively small on an in-order CPU, especially compared with on-chip SRAM.
>
>> However, it supports 16b operations -- on register PAIRs
>> (an implicit acknowledgement that the REGISTER is smaller
>> than the register pair). This is common on many smaller
>> processors. The address space is 16b -- with a separate 16b
>> address space for I/Os. The Z180 extends the PHYSICAL
>> address space to 20b but the logical address space
>> remains unchanged at 16b (if you want to specify a physical
>> address, you must use 20+ bits to represent it -- and invoke
>> a separate mechanism to access it!). The ALU is *4* bits.
>
> This is not really the world of a current 32-bit MCU, which has a 32 bit
> datapath and 32 bit registers.

Correct. I was just illustrating how you can have different
"widths" in a single architecture; yet a single "CPU width"
has to be used to describe it.

> Maybe it does 64 bit arithmetic in 32 bit
> chunks, which then leads to the question of which MCU workloads require 64
> bit arithmetic?

I treat time as a 64b entity (32b being inadequate).
IPv6 addresses won't fit in 32b.
There are also algorithms that can benefit from processing
data in wider chunks (e.g., counting the number of set bits
in a 64b array goes faster with a 64b register than with a 32b one).
My BigRationals would be noticeably faster if I could process
64b at a time, instead of 32.

[This, of course, assumes D cache can hold "as much data" in each
case.]
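The set-bit-counting case can be sketched like so (assuming the GCC/Clang popcount builtin; a 32b build would need two builtin calls and an extra add per 64b word, doubling the loop work):

```c
#include <stddef.h>
#include <stdint.h>

/* Population count over a bit array: one operation per 64-bit
 * word, i.e. half the loop iterations of a 32-bit walk. */
static unsigned popcount_array64(const uint64_t *words, size_t n)
{
    unsigned total = 0;
    for (size_t i = 0; i < n; i++)
        total += (unsigned)__builtin_popcountll(words[i]);
    return total;
}
```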

And you don't always need the full width of a register -- do you use
all 32b of a register when you use it to keep track of the remaining
number of iterations of a loop? Or, the index into an array? Or the
time remaining until an upcoming deadline? Or processing characters
in a string?

>> But you don't buy MCUs with a-la-carte pricing. How much does an extra
>> timer cost me? What if I want it to also serve as a *counter*? What
>> cost for 100K of internal ROM? 200K?
>>
>> [It would be an interesting exercise to try to do a linear analysis of
>> product prices with an idea of trying to tease out the "costs" (to
>> the developer) for each feature in EXISTING products!]
>>
>> Instead, you see a *price* that is reflective of how widely used the
>> device happens to be, today. You are reliant on the preferences of others
>> to determine which is the most cost effective product -- for *you*.
>
> Sure, what you buy is a 'highest common denominator' - you get things you
> don't use, but that other people do. But it still depends on a significant
> chunk of the market demanding those features.

Yes. Or, an application domain that consumes lots of parts.

> It's then a cost function of
> how much the market wants a feature against how much it'll cost to implement
> (and at runtime). If the cost is tiny, it may well get implemented even if
> almost nobody asked for it.

You also have to remember that the seller isn't the sole actor in that
negotiation. Charge too much and the customer can opt for a different
(possibly "second choice") implementation.

So, it is in the seller's interest to make his product as cost-effectively
as possible. *Or*, have something that can't be obtained elsewhere.

Nowadays, there are no second sources as there were in decades past.
OTOH, I can find *another* ARM (for example) that may be "close enough"
to what I need and largely compatible with my existing codebase.
So, try to "hold me up" (overcharge) and I may find myself motivated
to visit one of your competitors.

[As HLLs are increasingly used, it's considerably easier to port a
design to a different processor family entirely! Not so when you had
100K of ASM to leverage]

I worked in a Motogorilla shop, years ago. When I started my design,
I brought in folks from other vendors. The Motogorilla rep got spooked;
to lose a design to another house would require answering some serious
questions from his superiors ("How did you lose the account?"). He
was especially nervous that the only Moto offering that I was considering
was second sourced by 7 or 8 other vendors... so, even if the device
got the design, he would likely have competitors keeping his pricing
in line.

> If there's a use case, people will pay for it.
> (although maybe not enough)

Designers often have somewhat arbitrary criteria for their decisions.
Maybe you're looking for something that will be available for at
least a decade. Or, have alternate sources that could be called upon
in case your fab was compromised or oversold (nothing worse than
hearing parts are "on allocation"!)

So, a vendor can't assume he has the "right" solution (or price) for a
given application. Maybe the designer has a "history" with a particular
vendor or product line and can leverage that experience in ways that
wouldn't apply to a different vendor.

A vendor's goal should always be to produce the best device for his
perceived/targeted audience at the best price point. Then, get it
into their hands so they are ready to embrace it when the opportunity
presents.

Microchip took an interesting approach trying to buy into "hobbyists"
with cheap evaluation boards and tools. I'm sure these were loss leaders.
But, if they ended up winning a design (or two) because the "hobbyist"
was in a position to influence a purchasing decision...

Paul Rubin

unread,
Jun 9, 2021, 12:41:30 PMJun 9
to
David Brown <david...@hesbynett.no> writes:
> I can't really tell what kinds of designs you are discussing here. When
> I talk about embedded systems in general, I mean microcontrollers
> running specific programs - not general-purpose computers in embedded
> formats (such as phones).

Philip Munts made a comment a while back that stayed with me: that these
days, in anything mains powered, there is usually little reason to use
an MCU instead of a Linux board.

Theo

unread,
Jun 9, 2021, 1:08:04 PMJun 9
to
I suppose there's a question of what embedded tasks intrinsically require
>4GiB RAM, and those that do so because it makes programmers' lives easier?

In other words, you /can/ write a function to detect if your fish tank is
hot or cold in Javascript that runs in a web app on top of Chromium on top
of Linux. Or you could make it out of a 6502, or a pair of logic gates.

That's complexity that's not fundamental to the application. OTOH
maintaining a database that's larger than 4GB physically won't work without
that amount of memory (or storage, etc).

There are obviously plenty of computer systems doing that, but the question
I don't know is what applications can be said to be 'embedded' but need that
kind of RAM.

Theo

Don Y

unread,
Jun 9, 2021, 1:12:54 PMJun 9
to
I note that anytime you use a COTS "module" of any kind, you're still
stuck having to design and layout some sort of "add-on" card that
handles your specific I/O needs; few real world devices can be
controlled with just serial ports, NICs and "storage interfaces".

And, you're now dependent on a board supplier as well as having
to understand what's on (and in) that board as they are now
critical components of YOUR product. The same applies to any firmware
or software that it runs.

I'm sure the FAA, FDA, etc. will gladly allow you to formally
validate some other party's software and assume responsibility
for its proper operation!

DJ Delorie

unread,
Jun 9, 2021, 1:16:22 PMJun 9
to
Paul Rubin <no.e...@nospam.invalid> writes:
> Philip Munts made a comment a while back that stayed with me: that these
> days, in anything mains powered, there is usually little reason to use
> an MCU instead of a Linux board.

I have a friend who has a ceiling fan with a raspberry pi in it, because
that was the easiest solution to turning it on and off remotely...

So yeah, I agree, "with a computer" is becoming a default answer.

On the other hand, my furnace (now geothermal) has been controlled by a
linux board since 2005 or so... maybe I'm not the typical user ;-)

Paul Rubin

unread,
Jun 9, 2021, 1:34:51 PMJun 9
to
Theo <theom...@chiark.greenend.org.uk> writes:
>> Buy yourself a Raspberry Pi 4 and set it up to run your fish tank via a
>> remote web browser. There's your 64 bit embedded system.
> I suppose there's a question of what embedded tasks intrinsically require
>>4GiB RAM, and those that do so because it makes programmers' lives easier?

You can buy a Raspberry Pi 4 with up to 8gb of ram, but the most common
configuration is 2gb. The cpu is 64 bit anyway because why not?

> There are obviously plenty of computer systems doing that, but the
> question I don't know is what applications can be said to be
> 'embedded' but need that kind of RAM.

Lots of stuff is using 32 bit cpus with a few KB of ram these days. 32
bits is displacing 8 bits in the MCU world.

Is 64 bit displacing 32 bit in application processors like the Raspberry
Pi, even when less than 4GB of ram is involved? I think yes, at least
to some extent, and it will continue. My fairly low end mobile phone
has 2GB of ram and a 64 bit 4-core processor, I think.

Will 64 bit MCU's displace 32 bit MCUs? I don't know, maybe not.

Are application processors displacing MCU's in embedded systems? Not
much in portable and wearable stuff (other than phones) at least for
now, but in larger devices I think yes, at least somewhat for now, and
probably more going forward. Even if you're not using networking, it
makes software and UI development a heck of a lot easier.

Phil Hobbs

unread,
Jun 9, 2021, 1:44:24 PMJun 9
to
Except that if it has a network connection, you have to patch it
unendingly or suffer the common-as-dirt IoT security nightmares.

Cheers

Phil Hobbs

--
Dr Philip C D Hobbs
Principal Consultant
ElectroOptical Innovations LLC / Hobbs ElectroOptics
Optics, Electro-optics, Photonics, Analog Electronics
Briarcliff Manor NY 10510

http://electrooptical.net
http://hobbs-eo.com

Dimiter_Popoff

unread,
Jun 9, 2021, 1:56:30 PMJun 9
to
On 6/9/2021 4:29, Don Y wrote:
> On 6/8/2021 3:01 PM, Dimiter_Popoff wrote:
>
>>> Am trying to puzzle out what a 64-bit embedded processor should look
>>> like.
>>> At the low end, yeah, a simple RISC processor.  And support for
>>> complex arithmetic
>>> using 32-bit floats?  And support for pixel alpha blending using quad
>>> 16-bit numbers?
>>> 32-bit pointers into the software?
>>
>> The real value in 64 bit integer registers and 64 bit address space is
>> just that, having an orthogonal "endless" space (well I remember some
>> 30 years ago 32 bits seemed sort of "endless" to me...).
>>
>> Not needing to assign overlapping logical addresses to anything
>> can make a big difference to how the OS is done.
>
> That depends on what you expect from the OS.  If you are
> comfortable with the possibility of bugs propagating between
> different subsystems, then you can live with a logical address
> space that exactly coincides with a physical address space.

So how does the linear 64 bit address space get in the way of
any protection you want to implement? Pages are still 4 k and
each has its own protection attributes governed by the OS,
it is like that with 32 bit processors as well (I talk power, I am
not interested in half baked stuff like ARM, risc-v etc., I don't
know if there could be a problem like that with one of these).

There is *nothing* to gain on a 64 bit machine from segmentation,
assigning overlapping address spaces to tasks etc.

Notice I am talking *logical* addresses, I was explicit about
that.

Dimiter_Popoff

unread,
Jun 9, 2021, 2:00:25 PMJun 9
to
Of course I know all that, David; I have been using power processors
which do things out of order for over 20 years now.
What I was told was something about a real mess, like system memory
accesses going wrong because of out of order execution, hence
plenty of syncs needed to keep the thing working. I have not
even tried to verify that, only someone with experience with 64 bit
ARM can do that - so far none here seems to have that.

David Brown

unread,
Jun 9, 2021, 2:55:18 PMJun 9
to
It depends on the actual PPC's in question - with single core devices
targeted for embedded systems, you don't need much of that at all.
Perhaps an occasional sync of some sort in connection with using DMA,
but that's about it. Key to this is, of course, having your MPU set up
right to make sure hardware register accesses are in-order and not cached.
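That "occasional sync in connection with DMA" looks roughly like this. This is a hedged sketch using the GCC full-barrier intrinsic and a stand-in register variable; a real driver would use the architecture's own barrier instruction (sync/eieio on Power, dsb on ARM) plus a cache flush, against a fixed uncached MMIO address:

```c
#include <stdint.h>

/* Stand-in for a memory-mapped DMA "start" register; on real
 * hardware this is a fixed physical address, mapped uncached. */
static volatile uint32_t dma_start_reg;

static void dma_send(uint32_t *buf, uint32_t len_words, uint32_t value)
{
    for (uint32_t i = 0; i < len_words; i++)
        buf[i] = value;          /* fill the DMA buffer */

    __sync_synchronize();        /* barrier: the buffer writes must be
                                  * globally visible before the DMA
                                  * engine is told to go read them */

    dma_start_reg = len_words;   /* kick off the transfer */
}
```

Miss the barrier and a weakly-ordered core may let the "start" write reach the device before the buffer writes do.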

> What I was told was something about a real mess, like system memory
> accesses going wrong because of out of order execution, hence
> plenty of syncs needed to keep the thing working. I have not
> even tried to verify that, only someone with experience with 64 bit
> ARM can do that - so far none here seems to have that.
>

If the person programming the device has made incorrect assumptions, or
incorrect setup, then yes, things can go wrong if something other than
the current core is affected by the reads or writes.


Dimiter_Popoff

unread,
Jun 9, 2021, 3:03:21 PMJun 9
to
On 6/9/2021 20:44, Phil Hobbs wrote:
> Paul Rubin wrote:
>> David Brown <david...@hesbynett.no> writes:
>>> I can't really tell what kinds of designs you are discussing here.  When
>>> I talk about embedded systems in general, I mean microcontrollers
>>> running specific programs - not general-purpose computers in embedded
>>> formats (such as phones).
>>
>> Philip Munts made a comment a while back that stayed with me: that these
>> days, in anything mains powered, there is usually little reason to use
>> an MCU instead of a Linux board.
>>
>
> Except that if it has a network connection, you have to patch it
> unendingly or suffer the common-as-dirt IoT security nightmares.
>
> Cheers
>
> Phil Hobbs
>

Those nightmares do not apply if you are in complete control of your
firmware - which few people are nowadays indeed.

I have had netMCA devices on the net for over 10 years now in many
countries, the worst problem I have seen was some Chinese IP hanging
on port 80 to no consequences.

Dimiter_Popoff

unread,
Jun 9, 2021, 3:06:57 PMJun 9
to
You *do* need to know it well enough to know what there is to
know about it; I have been through it all. How big the latency is
is irrelevant to the point.

>> What I was told was something about a real mess, like system memory
>> accesses going wrong because of out of order execution, hence
>> plenty of syncs needed to keep the thing working. I have not
>> even tried to verify that, only someone with experience with 64 bit
>> ARM can do that - so far none here seems to have that.
>>
>
> If the person programming the device has made incorrect assumptions, or
> incorrect setup, then yes, things can go wrong if something other than
> the current core is affected by the reads or writes.
>

May be the assumptions of the person were wrong. Or may be your
assumption that their assumptions were wrong is wrong.
Neither of us knows which it is.

Phil Hobbs

unread,
Jun 9, 2021, 3:22:44 PMJun 9
to
Dimiter_Popoff wrote:
> On 6/9/2021 20:44, Phil Hobbs wrote:
>> Paul Rubin wrote:
>>> David Brown <david...@hesbynett.no> writes:
>>>> I can't really tell what kinds of designs you are discussing here.
>>>> When
>>>> I talk about embedded systems in general, I mean microcontrollers
>>>> running specific programs - not general-purpose computers in embedded
>>>> formats (such as phones).
>>>
>>> Philip Munts made a comment a while back that stayed with me: that these
>>> days, in anything mains powered, there is usually little reason to use
>>> an MCU instead of a Linux board.
>>>
>>
>> Except that if it has a network connection, you have to patch it
>> unendingly or suffer the common-as-dirt IoT security nightmares.
>>
>
> Those nightmares do not apply if you are in complete control of your
> firmware - which few people are nowadays indeed.
>
> I have had netMCA devices on the net for over 10 years now in many
> countries, the worst problem I have seen was some Chinese IP hanging
> on port 80 to no consequences.

But if you're using a RasPi or Beaglebone or something like that, you
need a reasonably well-upholstered Linux distro, which has to be patched
regularly. At very least it'll need a kernel, and kernel patches
affecting security are not exactly rare.

Paul Rubin

unread,
Jun 9, 2021, 3:58:48 PMJun 9
to
Phil Hobbs <pcdhSpamM...@electrooptical.net> writes:
> But if you're using a RasPi or Beaglebone or something like that, you
> need a reasonably well-upholstered Linux distro, which has to be
> patched regularly. At very least it'll need a kernel, and kernel
> patches affecting security are not exactly rare.

You're in the same situation with almost anything else connected to the
internet. Think of the notorious "smart light bulbs".

On the other hand, you are in reasonable shape if the raspberry pi
running your fish tank is only reachable through a LAN or VPN.
Non-networked low end linux boards are also a thing.

Dimiter_Popoff

unread,
Jun 9, 2021, 4:12:54 PMJun 9
to
Oh, if you use one of these all you can rely on is prayer; I don't
think there is *one* person who knows everything that goes on within
such a system. Basically it is impossible to know, even if you have
all the manpower to dissect all the code you can still be taken by
surprise by something a compiler has inserted somewhere etc., your
initial point is well taken here.
If you ask *me* if I am 100% sure what my devices might do - and I
have written every single bit of code running on them, which has
been compiled by a compiler I have written every single bit of - I
might still be scratching my head. We buy our silicon, you know...

Hans-Bernhard Bröker

unread,
Jun 9, 2021, 4:52:15 PMJun 9
to
Am 09.06.2021 um 10:40 schrieb David Brown:
> On 09/06/2021 06:16, George Neuner wrote:

>> Since (at least) the Pentium 4 x86 really are a CISC decoder bolted
>> onto the front of what essentially is a load/store RISC.

... and at about that time they also abandoned the last traces of their
original von-Neumann architecture. The actual core is quite strictly
Harvard now, treating the external RAM banks more like mass storage
devices than an actual combined code+data memory.

> Absolutely. But from the user viewpoint, it is the ISA that matters -

That depends rather a lot on who gets to be called the "user".

x86 are quite strictly limited to the PC ecosystem these days: boxes and
laptops built for Mac OS or Windows, some of them running Linux instead.
There the "user" is somebody buying hardware and software from
completely unrelated suppliers. I.e. unlike in the embedded world we
discuss here, the persons writing software for those things had no say
at all in what type of CPU is used. They're thus not really the "user."
If they were, they probably wouldn't be using an x86. ;-)

The actual x86 users couldn't care less about the ISA --- the
overwhelming majority of them haven't the slightest idea what an ISA
even is. Some of them used to have a vague idea that there was some
32bit vs. a 64bit whatchamacallit somewhere in there, but even that has
surely faded away by now, as users no longer even face the decision
between them.

Don Y

unread,
Jun 9, 2021, 7:26:36 PMJun 9
to
On 6/9/2021 12:58 PM, Paul Rubin wrote:
> Phil Hobbs <pcdhSpamM...@electrooptical.net> writes:
>> But if you're using a RasPi or Beaglebone or something like that, you
>> need a reasonably well-upholstered Linux distro, which has to be
>> patched regularly. At very least it'll need a kernel, and kernel
>> patches affecting security are not exactly rare.
>
> You're in the same situation with almost anything else connected to the
> internet. Think of the notorious "smart light bulbs".

No, that's only if you didn't adequately prepare for such "exposure".

How many Linux/Windows boxes are running un-NEEDED services? Have
ports open that shouldn't be? How much emphasis was spent on eking
out a few percent extra performance from the network stack that
could have, instead, been spent on making it more robust?

How many folks RUNNING something like Linux/Windows in their product
actually know much of anything about what's under the hood? Do they
even know how to BUILD a kernel, let alone sort out what it's
doing (wrong)?

Exposed to the 'net you always are at the mercy of DoS attacks
consuming your inbound bandwidth (assuming you have no contrtol
of upstream traffic/routing). But, even a saturated network
connection doesn't have to crash your device.

OTOH, if your box is dutifully trying to respond to incoming packets
that may be malicious, then you'd better hope that response is
"correct" (or at least SAFE) in EVERY case.

For any of these mainstream OS's, an adversary can play with an
exact copy of yours 24/7/365 to determine its vulnerabilities
before ever approaching your device. And, even dig through
the sources (of some) to see how a potential attack could unfold.
Your device will likely advertise exactly what version of the
kernel (and network stack) it is running.

[An adversary can also BUY one of YOUR devices and do the same
off-line analysis -- but the analysis will only apply to YOUR
device (if you have a proprietary OS/stack) and not a
multitude of other exposed devices]

> On the other hand, you are in reasonable shape if the raspberry pi
> running your fish tank is only reachable through a LAN or VPN.
> Non-networked low end linux boards are also a thing.

Exactly. But that limits utility/accessibility.

If you only need moderate/occasional access, you can implement
a "stealth mode" that lets the server hide, "unprotected".
Or, require all accesses to be initiated from that server
(*to* the remote client) -- similar to a call-back modem.

And, of course, you can place constraints on what can be done
over that connection instead of just treating it as "God Mode".
[No, you can't set the heat to 105 degrees in the summer time;
I don't care if you happen to have appropriate credentials!
And, no, you can't install an update without my verifying
you and the update through other mechanisms...]

Don Y

unread,
Jun 9, 2021, 7:37:56 PMJun 9
to
On 6/9/2021 10:34 AM, Paul Rubin wrote:
> Theo <theom...@chiark.greenend.org.uk> writes:
>>> Buy yourself a Raspberry Pi 4 and set it up to run your fish tank via a
>>> remote web browser. There's your 64 bit embedded system.
>> I suppose there's a question of what embedded tasks intrinsically require
>>> 4GiB RAM, and those that do so because it makes programmers' lives easier?
>
> You can buy a Raspberry Pi 4 with up to 8gb of ram, but the most common
> configuration is 2gb. The cpu is 64 bit anyway because why not?

Exactly. Are they going to give you a *discount* for a 32b version?

(Here, you can have this one for half of 'FREE'...)

>> There are obviously plenty of computer systems doing that, but the
>> question I don't know is what applications can be said to be
>> 'embedded' but need that kind of RAM.
>
> Lots of stuff is using 32 bit cpus with a few KB of ram these days. 32
> bits is displacing 8 bits in the MCU world.
>
> Is 64 bit displacing 32 bit in application processors like the Raspberry
> Pi, even when less than 4GB of ram is involved? I think yes, at least
> to some extent, and it will continue. My fairly low end mobile phone
> has 2GB of ram and a 64 bit 4-core processor, I think.
>
> Will 64 bit MCU's displace 32 bit MCUs? I don't know, maybe not.

Some due to need but, I suspect, most due to pricing or other
features not available in the 32b world. Just like you don't
find PMMUs on 8/16b devices nor in-built NICs.

> Are application processors displacing MCU's in embedded systems? Not
> much in portable and wearable stuff (other than phones) at least for
> now, but in larger devices I think yes, at least somewhat for now, and
> probably more going forward. Even if you're not using networking, it
> makes software and UI development a heck of a lot easier.

This -------------------------------^^^^^^^^^^^^^^^^^^^^^^

Elbow room always takes some of the stress out of design. You
don't worry (as much) about bumping into limits and, instead,
concentrate on solving the problem at hand. The idea of
packing 8 'bools' into a byte (cuz I only had a hundred or
so of them available) is SO behind me, now! Just use something
"more convenient"... eight of them!

I pass pages between processes as an efficiency hack -- even if
I'm only using a fraction of the page. In smaller processors,
I'd be "upset" by this blatant "waste". Instead, I shrug it off
and note that it gives me a uniform way of moving data around
(instead of having to tweak interfaces to LIMIT the amount
of data that I move; or "massage" the data JUST for transport).

My "calculator service" uses BigRationals -- because its easier than
trying to explain to users writing scripts that arithmetic can overflow,
suffer rounding errors, that order of operations is important, etc.

Don Y

unread,
Jun 9, 2021, 7:43:54 PMJun 9
to
Transcoding multiple video sources (for concurrent clients) in a single
appliance?

I have ~30 cameras, here. Had I naively designed with them all connected
to a "camera processor", I suspect memory would be the least of my
concerns (motion and scene recognition in 30 places simultaneously?)
Instead, it was "easier" to give each camera its own processor. And,
gain extended "remotability" as part of the process.

Remember, the 32b address space has to simultaneously hold EVERYTHING that
will need to be accessible to your application -- the OS, its memory
requirements, the application(s) tasks, the stacks/heaps for the threads
they contain, the data to be processed (in and out), the memory-mapped
I/Os consumed by the SoC itself, etc.

When you HAVE a capability/resource, it somehow ALWAYS gets used! ;-)

Don Y

unread,
Jun 9, 2021, 8:13:30 PMJun 9
to
On 6/9/2021 10:56 AM, Dimiter_Popoff wrote:
> On 6/9/2021 4:29, Don Y wrote:
>> On 6/8/2021 3:01 PM, Dimiter_Popoff wrote:
>>
>>>> Am trying to puzzle out what a 64-bit embedded processor should look like.
>>>> At the low end, yeah, a simple RISC processor. And support for complex
>>>> arithmetic
>>>> using 32-bit floats? And support for pixel alpha blending using quad
>>>> 16-bit numbers?
>>>> 32-bit pointers into the software?
>>>
>>> The real value in 64 bit integer registers and 64 bit address space is
>>> just that, having an orthogonal "endless" space (well I remember some
>>> 30 years ago 32 bits seemed sort of "endless" to me...).
>>>
>>> Not needing to assign overlapping logical addresses to anything
>>> can make a big difference to how the OS is done.
>>
>> That depends on what you expect from the OS. If you are
>> comfortable with the possibility of bugs propagating between
>> different subsystems, then you can live with a logical address
>> space that exactly coincides with a physical address space.
>
> So how does the linear 64 bit address space get in the way of
> any protection you want to implement? Pages are still 4 k and
> each has its own protection attributes governed by the OS,
> it is like that with 32 bit processors as well (I talk power, I am
> not interested in half baked stuff like ARM, risc-v etc., I don't
> know if there could be a problem like that with one of these).

With a linear address space, you typically have to link EVERYTHING
as a single image to place each thing in its own piece of memory
(or use segment based addressing).

I can share code between tasks without conflicting addressing;
the "data" for one instance of the app is isolated from other
instances while the code is untouched -- the code doesn't even
need to know that it is being invoked on different "data"
from one timeslice to the next. In a flat address space,
you'd need the equivalent of a "context pointer" that you'd
have to pass to the "shared code". And, have to hope that
all of your context could be represented in a single such
reference! (I can rearrange physical pages so they each
appear "where expected" to a bit of const CODE).

Similarly, the data passed (or shared) from one task (process) to
another can "appear" at entirely different logical addresses
"at the same time" as befitting the needs of each task WITHOUT
CONCERN (or awareness) of the existence of the other task.
Again, I don't need to pass a pointer to the data; the address
space has been manipulated to make sure it's where it should be.

The needs of a task can be met by resources "harvested" from
some other task. E.g., where is the stack for your TaskA?
How large is it? How much of it is in-use *now*? How much
can it GROW before it bumps into something (because that something
occupies space in "its" address space).

I start a task (thread) with a single page of stack. And, a
limit on how much it is allowed to consume during its execution.
Then, when it pushes something "off the end" of that page,
I fault a new page in and map it at the faulting address.
This continues as the task's stack needs grow.

When I run out of available pages, I do a GC cycle to
reclaim pages from (other?) tasks that are no longer using
them.

In this way, I can effectively SHARE a stack (or heap)
between multiple tasks -- without having to give any
consideration for where, in memory, they (or the stacks!)
reside.

I can move a page from one task (full of data) to another
task at some place that the destination task finds "convenient".
I can import a page from another network device or export
one *to* another device.

Because each task's address space is effectively empty/sparse,
mapping a page doesn't require much effort to find a "free"
place for it.

I can put constraints on each such mapping -- and then runtime
checks to ensure "things are as I expect": "Why is this NIC
buffer residing in this particular portion of the address space?"

With a task bound to a semicontiguous portion of memory, it can
deal with that region as if it was a smaller virtual region.
I can store 32b pointers to things if I know that my addresses
are based from 0x000 and the task never extends beyond a 4GB
region. If available, I can exploit "shorter" addressing modes.

> There is *nothing* to gain on a 64 bit machine from segmentation, assigning
> overlapping address spaces to tasks etc.

What do you gain by NOT using it? You're still dicking with the MMU.
(if you aren't then what value the MMU in your "logical" space? map
each physical page to a corresponding logical page and never talk to
the MMU again; store const page tables and let your OS just tweak the
base pointer for the TLBs to use for THIS task)

You still have to "position" physical resources in particular places
(and you have to deal with the constraints of all tasks, simultaneously,
instead of just those constraints imposed by the "current task")

David Brown

Jun 10, 2021, 3:37:45 AM
On 09/06/2021 22:52, Hans-Bernhard Bröker wrote:
> On 09.06.2021 10:40, David Brown wrote:
>> On 09/06/2021 06:16, George Neuner wrote:
>
>>> Since (at least) the Pentium 4 x86 really are a CISC decoder bolted
>>> onto the front of what essentially is a load/store RISC.
>
> ... and at about that time they also abandoned the last traces of their
> original von-Neumann architecture.  The actual core is quite strictly
> Harvard now, treating the external RAM banks more like mass storage
> devices than an actual combined code+data memory.
>
>> Absolutely.  But from the user viewpoint, it is the ISA that matters -
>
> That depends rather a lot on who gets to be called the "user".
>

I meant "the person using the ISA" - i.e., the programmer. And even
then, I meant low-level programmers who have to understand things like
memory models, cache thrashing, coding for vectors and SIMD, etc. These
are the people who see the ISA. I was not talking about the person
wiggling the mouse and watching youtube!

Dimiter_Popoff

Jun 10, 2021, 6:45:56 AM
Nothing could be further from the truth. What kind of crippled
environment can make you think that? Code can be position
independent on processors which are not dead by design nowadays.
When I started dps some 27 years ago I allowed program modules
to demand a fixed address on which they would reside. This exists
to this day and has been used 0 (zero) times. Same about object
descriptors, program library modules etc., the first system call
I wrote is called "allocm$", allocate memory. You request a number
of bytes and you get back an address and the actual number of
bytes you were given (it comes rounded up to the memory cluster
size, typically 4k, i.e. a page). This was the *first* thing I did.
And yes, all allocation is done using a worst-fit strategy, sometimes
enhanced worst fit - things the now popular OSes have yet to get to;
they still have to defragment their disks, LOL.

>
> I can share code between tasks without conflicting addressing;
> the "data" for one instance of the app is isolated from other
> instances while the code is untouched -- the code doesn't even
> need to know that it is being invoked on different "data"
> from one timeslice to the next.  In a flat address space,
> you'd need the equivalent of a "context pointer" that you'd
> have to pass to the "shared code".  And, have to hope that
> all of your context could be represented in a single such
> reference!  (I can rearrange physical pages so they each
> appear "where expected" to a bit of const CODE).
>
> Similarly, the data passed (or shared) from one task (process) to
> another can "appear" at entirely different logical addresses
> "at the same time" as befitting the needs of each task WITHOUT
> CONCERN (or awareness) of the existence of the other task.
> Again, I don't need to pass a pointer to the data; the address
> space has been manipulated to make sure it's where it should be.

So how do you pass the offset from the page beginning if you do
not pass an address?
And how is page manipulation simpler and/or safer than just passing
an address? Sounds like a recipe for quite a mess to me.
In a 64 bit address space there is nothing stopping you from
passing addresses or not passing them, allowing access to the areas
you want to and disallowing it elsewhere.
Other than that there is nothing to be gained by a 64 bit architecture
really, on 32 bit machines you do have FPUs, vector units etc.
doing calculation probably faster than the integer unit of a
64 bit processor.
The *whole point* of a 64 bit core is the 64 bit address space.


>
> The needs of a task can be met by resources "harvested" from
> some other task.  E.g., where is the stack for your TaskA?
> How large is it?  How much of it is in-use *now*?  How much
> can it GROW before it bumps into something (because that something
> occupies space in "its" address space).

This is the beauty of 64 bit logical address space. You allocate
enough logical memory and then you allocate physical on demand,
this is what MMUs are there for. If you want to grow your stack
indefinitely - the messy C style - you can just allocate it
a few gigabytes of logical memory and use the first few kilobytes
of it with no waste of resources. Of course there are much slicker
ways to deal with memory allocation.


>
> I start a task (thread) with a single page of stack.  And, a
> limit on how much it is allowed to consume during its execution.
> Then, when it pushes something "off the end" of that page,
> I fault a new page in and map it at the faulting address.
> This continues as the task's stack needs grow.

This is called "allocate on demand" and has been around
for times immemorial, check my former paragraph.

>
> When I run out of available pages, I do a GC cycle to
> reclaim pages from (other?) tasks that are no longer using
> them.

This is called "memory swapping", also for times immemorial.
For the case when there is no physical memory to reclaim, that
is.
The first version of dps - some decades ago - ran on a CPU32
(a 68340). It had no MMU so I implemented "memory blocks",
a task can declare a piece a swap-able block and allow/disallow
its swapping. Those blocks would then be shared or written to disk when
more memory was needed etc., memory swapping without an MMU.
Worked fine, must be still working for code I have not
touched since on my power machines, all those decades later.

>
> In this way, I can effectively SHARE a stack (or heap)
> between multiple tasks -- without having to give any
> consideration for where, in memory, they (or the stacks!)
> reside.

You can do this in a linear address space, too - this is what
the MMU is for.


>
> I can move a page from one task (full of data) to another
> task at some place that the destination task finds "convenient".
> I can import a page from another network device or export
> one *to* another device.

So instead of simply passing an address you have to switch page
translation entries, adjust them on each task switch, flush and
sync whatever it takes - does not sound very efficient to me.

>
> Because each task's address space is effectively empty/sparse,
> mapping a page doesn't require much effort to find a "free"
> place for it.

This is the beauty of having the 64 bit address space, you always
have enough logical memory. The "64 bit address space per task"
buys you *nothing*.

Don Y

Jun 10, 2021, 9:55:47 AM
On 6/10/2021 3:45 AM, Dimiter_Popoff wrote:

[attrs elided]
You missed my point -- possibly because this issue was raised
BEFORE pointing out how much DYNAMIC management of the MMU
(typically an OS-delegated activity) "buys you":
"That depends on what you expect from the OS."

If you can ignore the MMU *completely*, then the OS is greatly
simplified. YOU (developer) take on the responsibilities of remembering
what is where, etc. EVERYTHING is visible to EVERYONE, at ALL
TIMES. The OS doesn't have to get involved in the management
of objects/tasks/etc. It's YOUR responsibility to ensure
your taskA doesn't go dicking around with taskB's resources.

Welcome to the 8/16b world!

The next step up is to statically deploy the MMU. You build
a SINGLE logical address space to suit your liking. Then, map
the underlying physical resources to it as best fits. And,
this never needs to change -- memory doesn't "move around",
it doesn't change characteristics (readable, writable,
executable, accessible-by-X, etc.)!

But, you can't then change permissions based on which task is
executing -- unless you want to dick with the MMU dynamically
(or swap between N discrete sets of STATIC page tables that
define the many different ways M tasks can share permissions)

So, you *just* use the MMU as a Memory Protection Unit; you mark
sections of memory that have CODE in them as no-write, you mark
regions with DATA as no-execute, and everything else as no-access.

And that's the way it stays for EVERY task!

This lets you convert RAM to ROM and prevents "fetches" from "DATA"
memory. It ensures your code is never overwritten and that the
processor never tries to execute out of "data memory" and NOTHING
tries to access address regions that are "empty"!

You've implemented a 1980's vintage protection scheme (this is how
we designed arcade pieces, back then, as you wanted your CODE
and FRAME BUFFER to occupy the same limited range of addresses)

<yawn>

Once you start using the MMU to dynamically *manage* memory (which
includes altering protections and re-mapping), then the cost of the
OS increases -- because these are typically things that are delegated
*to* the OS.

Whether or not you have overlapping address spaces or a single
flat address space is immaterial -- you need to dynamically manage
separate page tables for each task in either scheme. You can't
argue that the OS doesn't need to dick with the MMU "because it's
a flat address space" -- unless you forfeit those abilities
(that I illustrated in my post).

If you want to compare a less-able OS to one that is more featured,
then it's disingenuous to blame that on overlapping address spaces;
the real "blame" lies in the support of more advanced features.

The goal of an OS should be to make writing *correct* code easier
by providing features as enhancements. It's why the OS typically
reads disk files instead of replicating that file system and driver
code into each task that needs to do so. Or, why it implements
delays/timers -- so each task doesn't reinvent the wheel (with its
own unique set of bugs).

You can live without an OS. But, typically only for a trivial
application. And, you're not likely to use a 64b processor just
to count characters received on a serial port! Or as an egg timer!

>> I can share code between tasks without conflicting addressing;
>> the "data" for one instance of the app is isolated from other
>> instances while the code is untouched -- the code doesn't even
>> need to know that it is being invoked on different "data"
>> from one timeslice to the next. In a flat address space,
>> you'd need the equivalent of a "context pointer" that you'd
>> have to pass to the "shared code". And, have to hope that
>> all of your context could be represented in a single such
>> reference! (I can rearrange physical pages so they each
>> appear "where expected" to a bit of const CODE).
>>
>> Similarly, the data passed (or shared) from one task (process) to
>> another can "appear" at entirely different logical addresses
>> "at the same time" as befitting the needs of each task WITHOUT
>> CONCERN (or awareness) of the existence of the other task.
>> Again, I don't need to pass a pointer to the data; the address
>> space has been manipulated to make sure it's where it should be.
>
> So how do you pass the offset from the page beginning if you do
> not pass an address.

YOU pass an object to the OS and let the OS map it where *it*
wants, with possible hints from the targeted task (logical address
space).

I routinely pass multiple-page-sized objects around the system.

"Here's a 20MB telephone recording, memory mapped (to wherever YOU,
its recipient, want it). Because it is memory mapped and has its
own pager, the actual amount of physical memory that is in use
at any given time can vary -- based on the resource allocation
you've been granted and the current resource availability in the
system. E.g., there may be as little as one page of physical
data present at any given time -- and that page may "move" to
back a different logical address based on WHERE you are presently
looking!

Go through and sort out when Bob is speaking and when Tom is speaking.
"Return" an object of UNKNOWN length that lists each of these time
intervals along with the speaker assumed to be talking in each. Tell
me where you (the OS) decided it would best fit into my logical address
space, after consulting the hint I provided (but that you may not have
been able to honor because the result ended up *bigger* than the "hole"
I had imagined it fitting into). No need to tell me how big it really
is as I will be able to parse it (cuz I know how you will have built that
list) and the OS will track the memory that it uses so all I have to do
is free() it (it may be built out of 1K pages, 4K pages, 16MB pages)!"

How is this HARDER to do when a single task has an entire 64b address
space instead of when it has to SHARE *a* single address space among
all tasks/objects?

> And how is page manipulation simpler and/or safer than just passing
> an address, sounds like a recipe for quite a mess to me.

The MMU has made that mapping a "permanent" part of THIS task's
address space. It isn't visible to any other task -- why *should*
it be? Why does the pointer need to indirectly reflect the fact
that portions of that SINGLE address space are ineligible to
contain said object because of OTHER unrelated (to this task) objects??

> In a 64 bit address space there is nothing stopping you to
> pass addresses or not passing them and allow access to areas
> you want to and disallow it elsewhere.

And I can't do that in N overlapping 64b address spaces?

The only "win" you get is by exposing everything to everyone.
That's not the way software is evolving. Compartmentalization
(to protect from other actors), opacity (to hide implementation
details), accessors (instead of exposing actual data), etc.

This comes at a cost -- in performance as well as OS design.
But, *seems* to be worth the effort, given how "mainstream"
development is heading.

> Other than that there is nothing to be gained by a 64 bit architecture
> really, on 32 bit machines you do have FPUs, vector units etc.
> doing calculation probably faster than the integer unit of a
> 64 bit processor.
> The *whole point* of a 64 bit core is the 64 bit address space.

No, the whole point of a 64b core is the 64b registers.
You can package a 64b CPU so that only 20! address lines
are bonded out. This limits the physical address space
to 20b. What's the value in making the logical address
space bigger -- so you can leave gaps for expansion
between objects??

>> The needs of a task can be met by resources "harvested" from
>> some other task. E.g., where is the stack for your TaskA?
>> How large is it? How much of it is in-use *now*? How much
>> can it GROW before it bumps into something (because that something
>> occupies space in "its" address space).
>
> This is the beauty of 64 bit logical address space. You allocate
> enough logical memory and then you allocate physical on demand,
> this is what MMUs are there for. If you want to grow your stack
> indefinitely - the messy C style - you can just allocate it
> a few gigabytes of logical memory and use the first few kilobytes
> of it to no waste of resources. Of course there are much slicker
> ways to deal with memory allocation.

Again, how is this any harder with "overlapping" 64b address spaces?
Or, how is it EASIER with nonoverlap?

>> I start a task (thread) with a single page of stack. And, a
>> limit on how much it is allowed to consume during its execution.
>> Then, when it pushes something "off the end" of that page,
>> I fault a new page in and map it at the faulting address.
>> This continues as the task's stack needs grow.
>
> This is called "allocate on demand" and has been around
> for times immemorial, check my former paragraph.

I'm not trying to be "novel". Rather, showing that these
features come from the MMU -- not a "nonoverlapping"
(or overlapping!) address space.

I.e., the take away from all this is the MMU is the win
AND the cost for the OS. Without it, the OS gets simpler...
and less capable!

>> When I run out of available pages, I do a GC cycle to
>> reclaim pages from (other?) tasks that are no longer using
>> them.
>
> This is called "memory swapping", also for times immemorial.
> For the case when there is no physical memory to reclaim, that
> is.
> The first version of dps - some decades ago - ran on a CPU32
> (a 68340). It had no MMU so I implemented "memory blocks",
> a task can declare a piece a swap-able block and allow/disallow
> its swapping. Those blocks would then be shared or written to disk when
> more memory was needed etc., memory swapping without an MMU.
> Worked fine, must be still working for code I have not
> touched since on my power machines, all those decades later.

There's no disk involved. The amount of physical memory
is limited to what's on-board (unless I try to move resources
to another node or -- *gack* -- use a scratch table in the RDBMS
as a backing store).

Recovering "no longer in use" portions of stack is "low hanging fruit";
look at the task's stack pointer and you know how much allocated stack
is no longer in use. Try to recover it (of course, the task
may immediately fault another page back into play but that's
an optimization issue).

If there is no "low hanging fruit", then I ask tasks to voluntarily
relinquish memory. Some tasks may have requested "extra" memory
in order to precompute results for future requests/activities.
If it was available -- and if the task wanted to "pay" for it -- then
the OS would grant the allocation (knowing that it could eventually
revoke it!) They could relinquish those resources at the expense of
having to recompute those things at a later date ("on demand" *or* when
memory is again available).

If I can't recover enough resources "voluntarily", then I
*take* memory away from a (selected) task and inform it
(raise an exception that it will handle as soon as it gets
a timeslice) of that "theft". It will either recover from
the loss (because it was being greedy and didn't elect
to forfeit excess memory that it had allocated when I asked,
earlier) *or* it will crash. <shrug> When you run out
of resources, SOMETHING has to give (and the OS is better
suited to determining WHAT than the individual tasks are...
they ALL think *they* are important!)

Again, "what do you expect from your OS?"

>> In this way, I can effectively SHARE a stack (or heap)
>> between multiple tasks -- without having to give any
>> consideration for where, in memory, they (or the stacks!)
>> reside.
>
> You can do this in a linear address space, too - this is what
> the MMU is for.

Yes, see? There's nothing special about a flat address space!

>> I can move a page from one task (full of data) to another
>> task at some place that the destination task finds "convenient".
>> I can import a page from another network device or export
>> one *to* another device.
>
> So instead of simply passing an address you have to switch page
> translation entries, adjust them on each task switch, flush and
> sync whatever it takes - does not sound very efficient to me.

It's not intended to be fast/efficient. It's intended to ensure
that the recipient -- AND ONLY THE RECIPIENT -- is *now*
granted access to that page's contents. Depending on semantics,
it can create a copy of an object or "move" the object, leaving
a "hole" in the original location.

[I.e., if move semantics, then the original owner shouldn't be
trying to access something that he's "given away"! Any access,
by him, to that memory region should signal a fatal exception!]

If you don't care who sees what, then you don't need the MMU!
And we're back to my initial paragraph of this reply! :>

>> Because each task's address space is effectively empty/sparse,
>> mapping a page doesn't require much effort to find a "free"
>> place for it.
>
> This is the beauty of having the 64 bit address space, you always
> have enough logical memory. The "64 bit address space per task"
> buys you *nothing*.

If "always having enough logical memory" is such a great thing,
isn't having MORE logical memory (because you've moved other
things into OVERLAPPING portions of that memory space) an
EVEN BETTER thing?

Again, what does your flat addressing BUY the OS in terms of
complexity reduction? (your initial assumption)
"...a big difference to how the OS is done"

Dimiter_Popoff

Jun 10, 2021, 11:32:30 AM
On 6/10/2021 16:55, Don Y wrote:
> On 6/10/2021 3:45 AM, Dimiter_Popoff wrote:
>
> [attrs elided]
>
Don, this is becoming way too lengthy and repetitive.

You keep on saying that a linear 64 bit address space means exposing
everything to everybody after I explained this is not true at all.

You keep on claiming this or that about how I do things without
bothering to understand what I said - like your claim that I use the MMU
for "protection only".
NO, this is not true either. On 32 bit machines - as mine in
production are - mapping 4G logical space into say 128M of physical
memory goes all the way through page translation, block translation
for regions where page translation would be impractical etc.
You sound the way I would have sounded before I had written and
built on for years what is now dps. The devil is in the detail :-).

You pass "objects", pages etc. Well guess what, it *always* boils
down to an *address* for the CPU. The rest is generic talk.
And if you choose to have overlapping address spaces when you
pass a pointer from one task to another the OS has to deal with this
at a significant cost.
In a linear address space, you pass the pointer *as is* so the OS does
not have to deal with anything except access restrictions.
In dps, you can send a message to another task - the message being
data the OS will copy into that tasks memory, the data being
perfectly able to be an address of something in another task's
memory. If a task accesses an address it is not supposed to,
the user is notified and allowed to press CR to kill that task.
Then there are common data sections for groups of tasks etc.,
it is pretty huge really.

The concept "one entire address space to all tasks" is from the 60-s
if not earlier (I just don't know and don't care to check now) and it
has done a good job while it was necessary, mostly on 16 bit CPUs.
For today's processors this means just making them run with the
handbrake on, *nothing* is gained because of that - no more security
(please don't repeat that "expose everything" nonsense), just
burning more CPU power, constantly having to remap addresses etc.

Don Y

Jun 10, 2021, 5:09:47 PM
On 6/10/2021 8:32 AM, Dimiter_Popoff wrote:
> On 6/10/2021 16:55, Don Y wrote:
>> On 6/10/2021 3:45 AM, Dimiter_Popoff wrote:
>>
>> [attrs elided]
> >
> Don, this becomes way too lengthy and repeating itself.
>
> You keep on saying that a linear 64 bit address space means exposing
> everything to everybody after I explained this is not true at all.

Task A has built a structure -- a page worth of data residing
at 0x123456. It wants to pass this to TaskB so that TaskB can perform
some operations on it.

Can TaskB access the data at 0x123456 *before* TaskA has told it
to do so?

Can TaskB access the data at 0x123456 WHILE TaskA is manipulating it?

Can TaskA alter the data at 0x123456 *after* it has "passed it along"
to TaskB -- possibly while TaskB is still using it?

> You keep on claiming this or that about how I do things without
> bothering to understand what I said - like your claim that I use the MMU
> for "protection only".

I didn't say that YOU did that. I said that to be able to ignore
the MMU after setting it up, you can ONLY use it to protect
code from alteration, data from execution, etc. The "permissions"
that it applies have to be invariant over the execution time of
ALL of the code.

So, if you DON'T use it "for protection only", then you are admitting
to having to dynamically tweak it.

*THIS* is the cost that the OS incurs -- and having a flat address
space doesn't make it any easier! If you aren't incurring that cost,
then you're not protecting something.

> NO, this is not true either. On 32 bit machines - as mine in
> production are - mapping 4G logical space into say 128M of physical
> memory goes all the way through page translation, block translation
> for regions where page translation would be impractical etc.
> You sound the way I would have sounded before I had written and
> built on for years what is now dps. The devil is in the detail :-).
>
> You pass "objects", pages etc. Well guess what, it *always* boils
> down to an *address* for the CPU. The rest is generic talk.

Yes, the question is "who manages the protocol for sharing".
Since forever, you could pass pointers around and let anyone
access anything they wanted. You could impose -- but not
ENFORCE -- schemes that ensured data was shared properly
(e.g., so YOU wouldn't be altering data that *I* was using).

[Monitors can provide some structure to that sharing but
are costly when you consider the number of things that may
potentially need to be shared. And, you can still poke
directly at the data being shared, bypassing the monitor,
if you want to (or have a bug)]

But, you had to rely on programming discipline to ensure this
worked. Just like you have to rely on discipline to ensure
code is "bugfree" (how's that worked for the industry?)

> And if you choose to have overlapping address spaces when you
> pass a pointer from one task to another the OS has to deal with this
> at a significant cost.

How does your system handle the above example? How do you "pass" the
pointer from TaskA to TaskB -- if not via the OS? Do you expose a
shared memory region that both tasks can use to exchange data
and hope they follow some rules? Always use synchronization
primitives for each data exchange? RELY on the developer to
get it right? ALWAYS?

Once you've passed the pointer, how does TaskB access that data
WITHOUT having to update the MMU? Or, has TaskB had access to
the data all along?

What happens when B wants to pass the modified data to C?
Does the MMU have to be updated (C's tables) to grant that
access? Or, like B, has C had access all along? And, has
C had to remain disciplined enough not to go mucking around
with that region of memory until A *and* B are done modifying
it?

I don't allow anyone to see anything -- until the owner of that thing
explicitly grants access. If you try to access something before it's
been made available for your access, the OS traps and aborts your
process -- you've violated the discipline and the OS is going to
enforce it! In an orderly manner that doesn't penalize other
tasks that have behaved properly.

> In a linear address space, you pass the pointer *as is* so the OS does
> not have to deal with anything except access restrictions.
> In dps, you can send a message to another task - the message being
> data the OS will copy into that tasks memory, the data being
> perfectly able to be an address of something in another task's

So, you don't use the MMU to protect TaskA's resources from TaskB
(or TaskC!) access. You expect LESS from your OS.

> memory. If a task accesses an address it is not supposed to
> the user is notified and allowed to press CR to kill that task.

What are the addresses "it's not supposed to?" Some *subset* of
the addresses that "belong" to other tasks? Perhaps I can
access a buffer that belongs to TaskB but not TaskB's code?
Or, some OTHER buffer that TaskB doesn't want me to see? Do
you explicitly have to locate ("org") each buffer so that you
can place SOME in protected portions of the address space and
others in shared areas? How do you change these distinctions
dynamically -- or, do you do a lot of data copying from
"protected" space to "shared" space?

> Then there are common data sections for groups of tasks etc.,
> it is pretty huge really.

Again, you expose things by default -- even if only a subset
of things. You create shared memory regions where there are
no protections and then rely on your application to behave and
not access data (that has been exposed for its access) until
it *should*.

Everybody does this. And everyone has bugs as a result. You
are relying on the developer to *repeatedly* implement the sharing
protocol -- instead of relying on the OS to enforce that for you.

It's like putting tons of globals in your application -- to
make data sharing easier (and, thus, more prone to bugs).

You expect less of your OS.

My tasks are free to do whatever they want in their own protection domain.
They KNOW that nothing can SEE the data they are manipulating *or*
observe HOW they are manipulating it or *influence* their manipulation
of it.

Until they want to expose that data. And, then, only to those entities
that they think SHOULD see it.

They can give (hand-off) data to another entity -- much like call-by-value
semantics -- and have the other entity know that NOTHING that the
original "donor" can do AFTER that handoff will affect the data that
has been "passed" to them.

Yet, they can still manipulate that data -- update it or reuse that
memory region -- for the next "client".

The OS enforces these guarantees. Much more than just passing along
a pointer to the data! Trying to track down the donor's alteration
of data while the recipient is concurrently accessing it (multiple
tasks, multiple cores, multiple CPUs) is a nightmare proposition.
And, making an *unnecessary* copy of it is a waste of resources
(esp. if the two parties actually ARE well-behaved).
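
The hand-off guarantee, as a rough sketch (the Handoff class and its
method names are invented here; bytes() stands in for whatever page
remapping or copy-on-write the kernel actually uses to avoid a real
copy in the well-behaved case):

```python
class Handoff:
    def __init__(self):
        self._inbox = {}

    def give(self, recipient, region: bytearray):
        # Snapshot at hand-off time: call-by-value semantics.  Nothing
        # the donor does to `region` afterwards reaches the recipient.
        self._inbox[recipient] = bytes(region)

    def take(self, recipient):
        return self._inbox.pop(recipient)

os_ipc = Handoff()
region = bytearray(b"result #1")
os_ipc.give("TaskB", region)     # A hands the data off to B
region[:] = b"result #2"         # A reuses the region for the next client
assert os_ipc.take("TaskB") == b"result #1"   # B's view is unaffected
```

Chasing a donor's concurrent alteration of shared data is exactly the
nightmare this semantics eliminates.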

> The concept "one entire address space to all tasks" is from the 60-s
> if not earlier (I just don't know and don't care to check now) and it
> has done a good job while it was necessary, mostly on 16 bit CPUs.
> For today's processors this means just making them run with the
> handbrake on, *nothing* is gained because of that - no more security
> (please don't repeat that "expose everything" nonsense), just
> burning more CPU power, constantly having to remap addresses etc.

Remapping is done in hardware. The protection overhead is a
matter of updating page table entries. *You* gain nothing by creating
a flat address space because *you* aren't trying to compartmentalize
different tasks and subsystems. You likely protect the kernel's
code/data from direct interference from "userland" (U/S bit) but
want the costs of sharing between tasks to be low -- at the expense
of forfeiting protections between them.
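
The cost in question really is just a table update.  A toy model of the
lookup, with invented names (R/W bits, 4 KiB pages) standing in for real
PTE formats:

```python
PAGE = 4096
R, W = 1, 2                       # permission bits in a page-table entry

class PageFault(Exception):
    pass

class MMU:
    def __init__(self):
        self.table = {}           # virtual page number -> (phys page, perms)

    def map(self, vpn, ppn, perms):
        self.table[vpn] = (ppn, perms)      # "updating a PTE" is this cheap

    def translate(self, vaddr, want):
        ppn, perms = self.table.get(vaddr // PAGE, (None, 0))
        if ppn is None or (perms & want) != want:
            raise PageFault(hex(vaddr))     # hardware traps to the kernel
        return ppn * PAGE + vaddr % PAGE

mmu = MMU()
mmu.map(vpn=5, ppn=9, perms=R)              # read-only mapping
assert mmu.translate(5 * PAGE + 12, R) == 9 * PAGE + 12
try:
    mmu.translate(5 * PAGE + 12, W)         # write to a read-only page
except PageFault:
    pass
mmu.map(vpn=5, ppn=9, perms=R | W)          # one PTE update grants write
assert mmu.translate(5 * PAGE + 12, W) == 9 * PAGE + 12
```

Every translation goes through the table regardless; compartmentalizing
tasks only changes what the entries *say*, not how the lookup works.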

*Most* of the world consists of imperfect coders. *Most* of us have
to deal with colleagues (of varying abilities) before, after and
during our tenure running code on the same CPU as our applications.

"The bug is (never!) in my code! So, it MUST be in YOURS!"

You can either stare at each other, confident in the correctness
of your own code. Or, find the bug IN THE OTHER GUY'S CODE
(you can't prove yours is correct any more than he can; so you have to
find the bug SOMEWHERE to make your point), effectively doing his
debugging *for* him.

Why do you think desktop OS's go to such lengths to compartmentalize
applications? Aren't the coders of application A just as competent
as those who coded application B? Why would you think application
A might stomp on some resource belonging to application B? Wouldn't
that be a violation of DISCIPLINE (and outright RUDE)?

You've been isolated from this for far too long. So, don't see
what it's like to have to deal with others' code impacting
the same product that *you* are working on.

Encapsulation and opacity are the best ways to ensure all interactions
to your code/data are through permitted interfaces.
"Who overwrote my location 0x123456? I know *I* didn't..."
"Who turned on power to the motor? I'm the only one who should do so!"
"Who deleted the log file?"
There's a reason we eschew globals!

I can ensure TaskB can't delete the log file -- by simply denying him
access to logfile.delete(). But, letting him use logfile.append()
as much as he wants! At the same time, allowing TaskA to
logfile.delete() or logfile.rollover() as it sees fit -- because I've verified that
TaskA does this appropriately as part of its contract. And, there's
no NEED for TaskB to ever do so -- it's not B's responsibility
(so why allow him the opportunity to ERRONEOUSLY do so -- and then
have to chase down how this happened?)

If TaskB *tries* to access logfile.delete(), I can trap to make his
violation obvious: "Reason for process termination: illegal access"

And, I don't need to do this with pointers or hardware protection
of the pages in which logfile.delete() resides! I just don't let
him invoke *that* method! I *expect* my OS to provide these mechanisms
to the developer to make his job easier AND the resulting code more robust.
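
A sketch of that per-task restriction (Logfile, append, delete and
rollover are the names from the example above; the Restricted proxy is
a hypothetical stand-in for the OS mechanism, which needs no hardware
page protection to enforce this):

```python
class IllegalAccess(Exception):
    """Reason for process termination: illegal access."""

class Logfile:
    def __init__(self):
        self.lines = []
    def append(self, line): self.lines.append(line)
    def delete(self): self.lines.clear()
    def rollover(self): self.lines = self.lines[-10:]

class Restricted:
    """Hand a task only the methods its contract allows."""
    def __init__(self, obj, allowed):
        self._obj, self._allowed = obj, frozenset(allowed)
    def __getattr__(self, name):
        if name not in self._allowed:
            raise IllegalAccess(name)       # the "trap", made obvious
        return getattr(self._obj, name)

log = Logfile()
task_a = Restricted(log, {"append", "delete", "rollover"})  # full contract
task_b = Restricted(log, {"append"})                        # append-only

task_b.append("b was here")     # fine, as much as B wants
task_a.delete()                 # fine: A's contract permits it
try:
    task_b.delete()             # B oversteps...
except IllegalAccess:
    pass                        # ...and the violation is made obvious
```

Same object, two different views of it -- and no chasing down who
deleted the log file.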

There is a cost to all this. But, *if* something misbehaves, it leaves
visible evidence of its DIRECT actions; you don't have to wonder WHEN
(in the past) some datum was corrupted that NOW manifests as an error
in some, possibly unrelated, manner.

Of course, you don't need any of this if you're a perfect coder.

You don't expose the internals of your OS to your tasks, do you?
Why? Don't you TRUST them to observe proper discipline in their
interactions with it? You trust them to observe those same
disciplines when interacting with each other... Why can't TaskA
see the preserved state for TaskB? Don't you TRUST it to
only modify it if it truly knows what it's doing? Not the result
of resolving some errant pointer?

Welcome to the 70's!

Dimiter_Popoff

Jun 10, 2021, 6:13:29 PM
On 6/11/2021 0:09, Don Y wrote:
> On 6/10/2021 8:32 AM, Dimiter_Popoff wrote:
>> On 6/10/2021 16:55, Don Y wrote:
>>> On 6/10/2021 3:45 AM, Dimiter_Popoff wrote:
>>>
>>> [attrs elided]
>>  >
>> Don, this becomes way too lengthy and repeating itself.
>>
>> You keep on saying that a linear 64 bit address space means exposing
>> everything to everybody after I explained this is not true at all.
>
> Task A has built a structure -- a page worth of data residing
> at 0x123456.  It wants to pass this to TaskB so that TaskB can perform
> some operations on it.
>
> Can TaskB access the data at 0x123456 *before* TaskA has told it
> to do so?
>
> Can TaskB access the data at 0x123456 WHILE TaskA is manipulating it?
>
> Can TaskA alter the data at 0x123456 *after* it has "passed it along"
> to TaskB -- possibly while TaskB is still using it?

If task A does not want any of the above it just places them in a
page to which it only has access. Or it can allow read access only.
*Why* do you confuse this with linear address space? What does the
one have to do with the other?

>
>> You keep on claiming this or that about how I do things without
>> bothering to understand what I said - like your claim that I use the MMU
>> for "protection only".
>
> I didn't say that YOU did that.  I said that to be able to ignore
> the MMU after setting it up, you can ONLY use it to protect
> code from alteration, data from execution, etc.  The "permissions"
> that it applies have to be invariant over the execution time of
> ALL of the code.
>
> So, if you DON'T use it "for protection only", then you are admitting
> to having to dynamically tweak it.

Of course dps is dealing with it, all the time. The purpose of the
linear *logical* address space is just orthogonality and simplicity,
like not having to remap passed addresses (which can have a lot
of further implications, like inability to use addresses in another
task's structure).

>
> *THIS* is the cost that the OS incurs -- and having a flat address
> space doesn't make it any easier!  If you aren't incurring that cost,
> then you're not protecting something.

Oh but it does - see my former paragraph.
I already explained that. If task A wants to leave a message
into task B memory it goes through a call (signd7$ or whatever,
there are variations) and the message is left there.
If task A did not want to receive messages it won't even be
attempted by the OS, will return a straight error (task does not
support... whatever). If the message is illegal the result is
similar. And if it happens that task A tries to access directly
memory of task B which it is not supposed to it will just go to
the "task A memory access violation. Press CR to kill it".
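
Roughly, in Python (send_message and the Task fields are invented names
standing in for signd7$ and the real task structures; a real violation
traps, it does not return):

```python
class Task:
    def __init__(self, name, accepts_messages):
        self.name = name
        self.accepts_messages = accepts_messages
        self.mailbox = []               # memory the OS copies messages into

def send_message(dst, payload):
    """The OS-mediated call: copy into the destination, don't share."""
    if not dst.accepts_messages:
        return "task does not support messages"   # straight error, no trap
    dst.mailbox.append(bytes(payload))            # OS copies into dst memory
    return "ok"

a = Task("A", accepts_messages=False)
b = Task("B", accepts_messages=True)
assert send_message(b, b"hello") == "ok"
assert b.mailbox == [b"hello"]
assert send_message(a, b"hello") == "task does not support messages"
```

Only the *direct* out-of-bounds access ever reaches the "press CR to
kill it" path; everything legitimate goes through the call.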

You have to rely on the developer to get it right if they
write supervisor code; otherwise you need not.
The signalling system works in user mode, though you can
write supervisor level code which uses it - but if you
are allowed to write at that level you can mess up pretty
much everything. I hope you are not trying to wrestle
*that* one.

>
> Once you've passed the pointer, how does TaskB access that data
> WITHOUT having to update the MMU?  Or, has TaskB had access to
> the data all along?


By just writing to the address task A has listed for the
purpose. It is not in a protected area so the only thing
the MMU may have to do is a tablewalk.

*THIS* demonstrates the advantage of the linear logical
address space very well.

>
> What happens when B wants to pass the modified data to C?
> Does the MMU have to be updated (C's tables) to grant that
> access?  Or, like B, has C had access all along?  And, has
> C had to remain disciplined enough not to go mucking around
> with that region of memory until A *and* B are done modifying
> it?

Either of these has its area which allows messaging. I don't
see what you want to achieve by making it only more cumbersome
(but not less possible) to do.

> I don't allow anyone to see anything -- until the owner of that thing
> explicitly grants access.  If you try to access something before it's
> been made available for your access, the OS traps and aborts your
> process -- you've violated the discipline and the OS is going to
> enforce it!  In an orderly manner that doesn't penalize other
> tasks that have behaved properly.


So well, how is the linear address space in your way of doing that?
It certainly is not in my way when I do it.

>
>> In a linear address space, you pass the pointer *as is* so the OS does
>> not have to deal with anything except access restrictions.
>> In dps, you can send a message to another task - the message being
>> data the OS will copy into that tasks memory, the data being
>> perfectly able to be an address of something in another task's
>
> So, you don't use the MMU to protect TaskA's resources from TaskB
> (or TaskC!) access.  You expect LESS from your OS.

Why on Earth do you think that? And what does the linear address space
have to do with *any* of it?
Pages can be as small as 4k, so why not just have them properly
set up upon task start, or at some later time, by having the page which
can receive messages open to accesses and the rest closed?
And again, how on Earth do you see any relevance between a linear
logical address space and all this?

>
>> memory. If a task accesses an address it is not supposed to
>> the user is notified and allowed to press CR to kill that task.
>
> What are the addresses "it's not supposed to?"  Some *subset* of
> the addresses that "belong" to other tasks?  Perhaps I can
> access a buffer that belongs to TaskB but not TaskB's code?
> Or, some OTHER buffer that TaskB doesn't want me to see?  Do
> you explicitly have to locate ("org") each buffer so that you
> can place SOME in protected portions of the address space and
> others in shared areas?  How do you change these distinctions
> dynamical