>> >Don't agree on that. Aarch64 has some brilliant bits, but it also
>> >has some decisions which I think will prove to be very short-sighted.
>>
>> Such as?
>
>Such as the fixed size 32 bit instructions. ARM invented the modern
>RISC with both 16 bit and 32 bit instructions, beating every other
>ISA style for code density on compiled C code. It's what propelled
>them to superstardom.
Doubtful. I think that ARM succeeded by focussing on the embedded and
the low-power market rather than trying to compete at the high end
during the GHz race years. And they succeeded by developing a
business model for this market: selling design and architecture
licenses rather than selling chips.
>Astounding that they abandoned it for 64 bit at the cost of 30% code
>density compared to their own 32 bit ISA -- and to our 64 bit one.
They obviously don't consider it worth the complication for 64-bit
systems. Makes sense to me. They have an ISA for low-end systems:
T32. No need to complicate A64 (Aarch64) with that. Note that
ARMv8-M supports "a variant of" T32, not A64.
>At the same time, they're likely to run out of opcode space for
>extensions in the foreseeable future, as MIPS did.
Can you make the "foreseeable" more concrete? Which year? Or what
ISA extension do you foresee that will make them run into this
problem?
>> Why would one want a 64-bit core that is lower-end than a Cortex-A34?
>> The silicon (and thus cost) of the A34 is tiny compared to the silicon
>> of 4GB of DRAM or even flash, so having something even smaller does
>> not save much money.
>
>The only justification I *need* to give is that customers are lining
>up to write six figure cheques to license such cores.
That does not answer my question at all. And it might be that your
license is just cheaper than whatever ARM is charging (and as for the
customers paying you more than for free cores, they may not be
comfortable with free stuff for CYA reasons).
>There is a *ton* of customer demand for cores with basically a Cortex
>M0 microarchitecture and code size but with 64 bit registers.
Apparently not enough for ARM to actually do that. Such a core should
be relatively easy for ARM to do.
Interestingly, what ARM offers is the Cortex-A32, a 32-bit (Aarch32)
only ARMv8-A implementation. Not sure why one would want that,
either.
>> And the other competition is a 32-bit core (say, Cortex-M0) with some
>> extra stuff to allow accessing more memory.
>
>Extra stuff such as bank switching and address translation, sure.
>It's not a lot smaller than just making the damn registers 64 bits
>wide. Plus you can't just compile standard C code for it but have to
>muck about with that bank switching. It's awful. Surely the 8086-286
>taught us that much?
You cannot use a regular ARM compiler for these accesses, yes. As for
standard C code, a lot of the nasal demons in the C standard come from
architectures like the 8086, so if you have standard C code (a big if)
and you have a compiler that maps C to this extended architecture, you
can just use it.
However, my idea was that the 64-bit accesses would only be used in a
few routines, and that the programmer would muck around explicitly
with the banks there.
>Some people want 64 bit for the address space but it's not
>necessarily RAM. It might be sparse.
For what?
>Or you might have other bigger 64 bit applications processors in the
>system but don't want to bother them with taking interrupts or
>transferring data between peripherals and memory. But you want
>something a little more than a dumb state machine DMA.
I.e., maybe a processor on a PCIe card, or something like Intel's
Management Engine or AMD's PSP. Yes, that would be an application
that makes sense.
>You might not want just one such core. Maybe you want dozens of them.
For what?
>ARM won't sell you a 64 bit core without SIMD.
>ARM won't sell you a 64 bit core without FP.
>ARM won't sell you a 64 bit core without multiply/divide.
>ARM won't sell you a 64 bit core with 16 registers.
Good. The 32-bit ISAs of ARM are a mess. If you look up a
description of an instruction, you find that it's optional, and you
have no idea how much using that instruction will limit the reach of
your program. Conversely, the fact that AMD64 guarantees SSE2 was a
big advantage over IA-32 which did not: programs compiled for AMD64
could just use SSE2 for FP instead of having to resort to the 387
instruction set.
So the fact that Aarch64 guarantees all these features is an
advantage, and RISC-V will feel the disadvantage of having so little
guaranteed once it tries to expand into the general-purpose computing
space.
>ARM can't sell you a 64 bit core with comparable code density to
>their own 32 bit cores.
IMO code density is overrated in this newsgroup.
>It's notable that where a particular 32 bit ARM core has 32 KB of icache, t=
>he comparable 64 bit core has 48 KB. That's expensive.
Which cores do you compare?
Looking at the data sheet of the RK3399
<http://www.rockchip.fr/RK3399%20datasheet%20V1.8.pdf>, I see:
2 Cortex-A72: 48KB I-cache, 32KB 4-way set-associative D-cache, 1MB
L2 shared between these cores.
4 Cortex-A53: 32KB I-cache, 32KB 4-way set-associative D-cache, 512KB
L2 shared between these cores.
Do they use 8KB pages?
The data sheet of the AmLogic S922X (with Cortex-A73) does not specify
the cache sizes.
In any case, Intel has lived with 32KB I-caches on their
high-performance cores for many years and continues to do so in Sunny
Cove (while increasing the D-cache size), and according to you the
code density is the same as that of Aarch64, so the 48KB of the A72 in
the RK3399 is probably just because they could make it that large
without incurring another cycle of latency.
>Some people who want a 64 bit core don't care about large memory
>addressing at all. They just know that a significant part of energy
>use goes into instruction fetch and decode and control. Doubling the
>datapath width can, depending on your workload, nearly halve the
>number of instructions you have to fetch and decode and execute.
A good reason to use SIMD instructions, then, not a 64-bit core.
That's if you have SIMD instructions.
>> >As for high end .. we shall see. The U84 is only four years behind
>> >the A72. Last year's U74 was six years behind the A53. That looks
>> >like rapidly catching up to me.
>>
>> ARM has been trying to play catch-up with Intel/AMD in the server
>> market for quite a while. For now they have not succeeded.
>
>It's very different. That market has historically wanted to run
>legacy x86 applications.
Not really. There are lots of servers running only free software that
is also available for, e.g., Aarch64.
The issues are not in the applications, but in the fringes:
Booting and therefore installing an OS on an IA-32/AMD64 system is
more standardized and better streamlined than for Aarch64; e.g., I can
boot all kinds of systems from the same USB stick or HDD (the
BIOS->UEFI transition introduced a little fragmentation, however); by
contrast, I cannot use the SD cards of the Odroid C2, Odroid N2,
RockPro64, and Raspi4 interchangeably, even though they all support
Aarch64 and Aarch32.
I have to invest extra work to get at the performance counters of
these Aarch64 machines, and have not succeeded for the Odroid N2, and
only partially for the RockPro64 (and have not tried on the Raspi4),
because the custom kernels they come with are supplied without perf,
and getting a stock kernel to work on them requires extra work. Of
course, not that many people need performance counters, but many need
some other fringe specialty that works out of the box on AMD64 boxes,
but acts up on Aarch64.
The booting issue is also a problem on Alpha and PowerPC (and probably
others), just read the relevant FAQs. Apparently the natural state of
booting is fragmented, and it requires dedication to get to a common
booting system (or now two) as present on AMD64.
>To whatever extent ARM does succeed in making the server market
>ISA-agnostic, we benefit from that too.
Only if you have an ISA rather than a fragmented mess of options.
Fragmentation hurts in general-purpose computing, big time.
>> > And a lot of them want predictable execution times.
>>=20
>> Actually the worst-case execution time of caches is predictable and
>> better than that of main memory if the cache is designed appropriately
>> (true LRU replacement is good). I expect that one can also predict
>> the worst-case with dynamic branch predictors, but I don't know if the
>> result is better than with static branch prediction.
>
>It's not caches vs dynamic RAM. It's caches vs about the same size
>of static RAM. In many applications a couple of KB is enough.
Then of course you don't need caches. But you also don't need 64-bit
addressing.
- anton
--
M. Anton Ertl Some things have to be seen to be believed
an...@mips.complang.tuwien.ac.at Most things have to be believed to be seen
http://www.complang.tuwien.ac.at/anton/home.html