
Who is going to buy the first Arm (M1) MacBooks?


Ant

unread,
Nov 11, 2020, 3:46:40 PM11/11/20
to
So, who is going to buy the first Arm (M1) MacBooks? ;)

Thank you for reading and hopefully answering. :)
--
Life's so loco! ..!.. *isms, sins, hates, (d)evil, tiredness, z, my body, illnesses (e.g., COVID-19 & SARS-CoV-2), deaths (RIP), heat, interruptions, issues, conflicts, obstacles, stresses, fires, out(r)ages, dramas, unlucky #4, 2020, greeds, bugs (e.g., crashes & female mosquitoes), etc. D:
Note: A fixed width font (Courier, Monospace, etc.) is required to see this signature correctly.
/\___/\ Ant(Dude) @ http://aqfl.net & http://antfarm.home.dhs.org.
/ /\ /\ \ Please nuke ANT if replying by e-mail.
| |o o| |
\ _ /
( )

Alan Baker

unread,
Nov 11, 2020, 3:57:47 PM11/11/20
to
On 2020-11-11 12:46 p.m., Ant wrote:
> So, who is going to buy the first Arm (M1) MacBooks? ;)
>
> Thank you for reading and hopefully answering. :)
>

I'm seriously considering it.

I'm about due for a new system...

...but I'm loath to be an early adopter.

:-)

Lewis

unread,
Nov 11, 2020, 5:22:52 PM11/11/20
to
In message <B5ednfUswZo00DHC...@earthlink.com> Ant <a...@zimage.comANT> wrote:
> So, who is going to buy the first Arm (M1) MacBooks? ;)

I am ordering a MBA as soon as I decide if I should increase the RAM or
not. It will be mostly for my wife who almost certainly doesn't need
16GB, but I am torn.


--
I'm literally becoming less cool with every word you speak.

Alan Baker

unread,
Nov 11, 2020, 5:39:20 PM11/11/20
to
On 2020-11-11 2:22 p.m., Lewis wrote:
> In message <B5ednfUswZo00DHC...@earthlink.com> Ant <a...@zimage.comANT> wrote:
>> So, who is going to buy the first Arm (M1) MacBooks? ;)
>
> I am ordering a MBA as soon as I decide if I should increase the RAM or
> not. It will be mostly for my wife who almost certainly doesn't need
> 16GB, but I am torn.
>
>

Spend the money and get the 16GB.

You're buying a machine to last the next 5 years or more. The extra RAM
will cost you (I'm in Canada) about $250CAD, so that's $50 per year to
ensure the machine has what it will need for the latter part of its life.

Ant

unread,
Nov 11, 2020, 5:40:15 PM11/11/20
to
It looks like Apple doesn't even sell its old Intel Macs from its web site? :(

Ant

unread,
Nov 11, 2020, 5:40:55 PM11/11/20
to
In comp.sys.mac.system Lewis <g.k...@kreme.dont-email.me> wrote:
> In message <B5ednfUswZo00DHC...@earthlink.com> Ant <a...@zimage.comANT> wrote:
> > So, who is going to buy the first Arm (M1) MacBooks? ;)

> I am ordering a MBA as soon as I decide if I should increase the RAM or
> not. It will be mostly for my wife who almost certainly doesn't need
> 16GB, but I am torn.

Aren't the RAMs still soldered in? If so, then you might as well go as big as
you can afford. Same for storage. :(

nospam

unread,
Nov 11, 2020, 5:46:21 PM11/11/20
to
In article <XP2dnVNnM_jU9THC...@earthlink.com>, Ant
<a...@zimage.comANT> wrote:

> It looks like Apple doesn't even sell its old Intel Macs from its web site? :(

yes they do, and there's a lot of them.

nospam

unread,
Nov 11, 2020, 5:46:23 PM11/11/20
to
In article <XP2dnVJnM_js9THC...@earthlink.com>, Ant
<a...@zimage.comANT> wrote:

>
> Aren't the RAMs still soldered in? If so, then you might as well go as big as
> you can afford. Same for storage. :(

no. the ram is in the m1 itself.

Alan Baker

unread,
Nov 11, 2020, 6:27:19 PM11/11/20
to
On 2020-11-11 2:40 p.m., Ant wrote:
> In comp.sys.mac.system Alan Baker <notony...@no.no.no.no> wrote:
>> On 2020-11-11 12:46 p.m., Ant wrote:
>>> So, who is going to buy the first Arm (M1) MacBooks? ;)
>>>
>>> Thank you for reading and hopefully answering. :)
>>>
>
>> I'm seriously considering it.
>
>> I'm about due for a new system...
>
>> ...but I'm loath to be an earlier adopter.
>
>> :-)
>
> It looks like Apple doesn't even sell its old Intel Macs from its web site? :(
>

Nope.

It doesn't look like that at all.

PEBCAK

Your Name

unread,
Nov 11, 2020, 6:39:07 PM11/11/20
to
On 2020-11-11 22:40:49 +0000, Ant said:
> In comp.sys.mac.system Lewis <g.k...@kreme.dont-email.me> wrote:
>> In message <B5ednfUswZo00DHC...@earthlink.com> Ant
>> <a...@zimage.comANT> wrote:
>>>
>>> So, who is going to buy the first Arm (M1) MacBooks? ;)
>>
>> I am ordering a MBA as soon as I decide if I should increase the RAM or
>> not. It will be mostly for my wife who almost certainly doesn't need
>> 16GB, but I am torn.
>
> Aren't the RAMs still soldered in? If so, then you might as well go as big as
> you can afford. Same for storage. :(

Yep. It is one of the biggest pain-in-the-backside points with Apple
these days, especially considering Apple's over-pricing of RAM and
storage drives. :-(

Your Name

unread,
Nov 11, 2020, 6:50:44 PM11/11/20
to
Depends on the model you're looking for (may also depend on the
country). On the New Zealand Apple Store the list shows:

- three Mac Mini models: 2 x M1, 1 x Intel
- four MacBook Pro 13" models: 2 x M1, 2 x Intel

But for the MacBook Air there are only the two M1 versions.

There will of course still be some old stock at Apple Authorised
Retailers and probably appearing on the Refurb Store for a while (but
stock there varies and can change quickly).


JF Mezei

unread,
Nov 11, 2020, 8:07:03 PM11/11/20
to
On 2020-11-11 17:40, Ant wrote:

> It looks like Apple doesn't even sell its old Intel Macs from its web site? :(

In Canada, the MacBook Pro 13" is available in either M1 or Intel.

The M1 appears cheaper, but the displayed price is for a config with less
RAM and less storage.

The 16GB config max for the Air is laughable. Consider how much focus
on graphic designers/video editors there was in the keynote.

Remember the memory is not only soldered but embedded in the CPU package,
so there is no way to upgrade. I suspect the 8GB models are actually 16GB
inside but with half disabled. Same with the Air version (the cheaper one)
with 7 GPU cores instead of 8: it allows them to make use of chips with
defects.



nospam

unread,
Nov 11, 2020, 8:15:49 PM11/11/20
to
In article <VU%qH.433945$r25.3...@fx08.iad>, JF Mezei
<jfmezei...@vaxination.ca> wrote:

> The 16GB config max for the Air is laughable.

it's a completely different architecture, therefore direct comparisons
are bogus.

ipads and iphones do exceptionally well with 4 gig.

Ant

unread,
Nov 11, 2020, 9:00:12 PM11/11/20
to
In comp.sys.mac.hardware.misc Tim <timst...@greenbee.net> wrote:
> On 11 Nov 2020 at 22:40:09 GMT, Ant <Ant> wrote:

> > It looks like Apple doesn't even sell its old Intel Macs from its web site? :(

> You can still buy Intel Minis.

Ah, but no MacBooks?

Ant

unread,
Nov 11, 2020, 9:01:30 PM11/11/20
to
That's frustrating. :(

nospam

unread,
Nov 11, 2020, 9:06:34 PM11/11/20
to
In article <VoydnT7Dncu7CjHC...@earthlink.com>, Ant
<a...@zimage.comANT> wrote:

> > > It looks like Apple doesn't even sell its old Intel Macs from its web
> > > site? :(
>
> > You can still buy Intel Minis.
>
> Ah, but no MacBooks?

lots of them.

Your Name

unread,
Nov 11, 2020, 9:09:10 PM11/11/20
to
On 2020-11-12 02:00:06 +0000, Ant said:
> In comp.sys.mac.hardware.misc Tim <timst...@greenbee.net> wrote:
>> On 11 Nov 2020 at 22:40:09 GMT, Ant <Ant> wrote:
>>>
>>> It looks like Apple doesn't even sell its old Intel Macs from its web site? :(
>>
>> You can still buy Intel Minis.
>
> Ah, but no MacBooks?

MacBook Pro 13", two Intel versions.
MacBook Air, no Intel versions.

At least on the New Zealand Apple Store website.

Lewis

unread,
Nov 12, 2020, 9:48:03 AM11/12/20
to
That is what I ended up doing. With the EDU discount it's $1100.


--
Oh look, good intentions!

Lewis

unread,
Nov 12, 2020, 9:50:12 AM11/12/20
to
In message <XP2dnVJnM_js9THC...@earthlink.com> Ant <a...@zimage.comANT> wrote:
> In comp.sys.mac.system Lewis <g.k...@kreme.dont-email.me> wrote:
>> In message <B5ednfUswZo00DHC...@earthlink.com> Ant <a...@zimage.comANT> wrote:
>> > So, who is going to buy the first Arm (M1) MacBooks? ;)

>> I am ordering a MBA as soon as I decide if I should increase the RAM or
>> not. It will be mostly for my wife who almost certainly doesn't need
>> 16GB, but I am torn.

> Aren't the RAMs still soldered in? If so, then you might as well go as big as
> you can afford. Same for storage. :(

The RAM is not soldered in, no. The RAM is part of the SOC.

She's been using laptops with 8GB of RAM for 8 years now, so I am sure
she has no need of more RAM, but eh... it adds $8/month interest free, so
I went ahead.


--
This zone of tranquility is compromised

Lewis

unread,
Nov 12, 2020, 9:51:57 AM11/12/20
to
In message <VoydnTnDncvpCjHC...@earthlink.com> Ant <a...@zimage.comANT> wrote:
> In comp.sys.mac.hardware.misc Your Name <Your...@yourisp.com> wrote:
>> On 2020-11-11 22:40:49 +0000, Ant said:
>> > In comp.sys.mac.system Lewis <g.k...@kreme.dont-email.me> wrote:
>> >> In message <B5ednfUswZo00DHC...@earthlink.com> Ant
>> >> <a...@zimage.comANT> wrote:
>> >>>
>> >>> So, who is going to buy the first Arm (M1) MacBooks? ;)
>> >>
>> >> I am ordering a MBA as soon as I decide if I should increase the RAM or
>> >> not. It will be mostly for my wife who almost certainly doesn't need
>> >> 16GB, but I am torn.
>> >
>> > Aren't the RAMs still soldered in? If so, then you might as well go as big as
>> > you can afford. Same for storage. :(

>> Yep. It is one of the biggest pain-in-the-backside points with Apple
>> these days, especially considering Apple's over-pricing of RAM and
>> storage drives. :-(

> That's frustrating. :(

No it's not. It means the RAM is much faster and more tightly integrated
with the CPU and GPU, and that much less needs to be moved around (for
one, no copying data from "system" RAM to the "GPU").


--
"Are you pondering what I'm pondering?"
"Wuh, I think so, Brain, but will they let the Cranberry Duchess stay
in the Lincoln Bedroom?"

Lewis

unread,
Nov 12, 2020, 9:54:51 AM11/12/20
to
In message <XP2dnVNnM_jU9THC...@earthlink.com> Ant <a...@zimage.comANT> wrote:
> It looks like Apple doesn't even sell its old Intel Macs from its web site? :(

That is entirely incorrect.

You can buy a 13" MBP with Intel, a 16" MBP with Intel, an iMac with
Intel, an iMac Pro with Intel, a Mac Pro with Intel, and a Mac mini i5.


--
Rincewind had always been happy to think of himself as a racist. The
One Hundred Meters, the Mile, the Marathon -- he'd run them all.

Lewis

unread,
Nov 12, 2020, 9:56:22 AM11/12/20
to
In message <VU%qH.433945$r25.3...@fx08.iad> JF Mezei <jfmezei...@vaxination.ca> wrote:
> The 16GB config max for the Air is laughable. Consider how much focus
> on graphic designers/video editors there was in the keynote.

You have no idea what you are talking abut, as usual.

--
Train Station: where the train stops. Work Station: ...

JF Mezei

unread,
Nov 12, 2020, 1:34:53 PM11/12/20
to
In the keynote, Apple bragged about 8k video rendering on the MacBook Air.

A different architecture may change the size of executable code in RAM,
but it doesn't change buffer sizes when you render videos.

ARM being RISC, it should in fact require more instructions to do the
same work as CISC 8086. Also, binaries from 64 bit platforms are larger
since the fields in each opcode are larger.

Going from the CISC 32-bit VAX to the 64-bit RISC Alpha resulted in
substantially larger binaries, and the need to tune the system very
differently as it required far more memory for the shareable images
(dynamic libraries), which were bigger.


OS-X is already 64-bit so moving to 64-bit ARM won't change much, but
going from CISC to RISC does.

The iPhone has typically required less RAM than Android because it is
more efficient and compact. But both are on the same platform (ARM).

To say that moving from 8086 to ARM will require less memory is not logical.

JF Mezei

unread,
Nov 12, 2020, 2:00:33 PM11/12/20
to
On 2020-11-11 20:15, nospam wrote:
Another aspect: that 8 or 16GB on an M1-based Mac is used for both the
CPU and the GPUs. So less of the RAM is actually available to the OS
because the GPUs use the same RAM. (Likely why Apple requires some
machine virtualization capabilities to manage RAM and other resource
allocation between different operating system instances: OS-X on the
CPU, the GPU OS, the neural engine, etc.)

In the discrete GPU days, the GPUs had their own gigabytes of memory, so
they did not eat into the CPU's RAM (though some data was duplicated, as
the app needed to populate some RAM which was then transferred to the
GPU, a step that is bypassed on Apple's ARM chips as the GPU shares the
same memory).

JF Mezei

unread,
Nov 12, 2020, 2:12:28 PM11/12/20
to
On 2020-11-12 09:51, Lewis wrote:

> No it's not. It means the RAM is much faster and more tightly integrated
> with the CPU and GPU and that much less needs to be moved around (for
> one, no copying RAM from the "system" to the "GPU".

I haven't verified this, but a friend pointed out that PCI Express bus
speed is faster than DDR4 transfers.


With "normal" chips, you know how many channels to memory you have and
have the speed. So you can compare two systems's performance. (when you
have many cores, access to RAM can become the bottleneck so RAM
performance becomes important.

What sort of memory controller Apple uses will matter, especially when
you consider how many portions of the chip now have access to RAM
separately (not just CPU cores, but also GPU cores, the Secure Enclave,
the neural engine, the IO controller, etc.).


All we know is "unified memory" which is meaningless marketing mumble
jumble.

It is only once independent testers run standard benchmarks that we'll
have an idea of actual performance. Apple's keynote provided nothing of
value on performance (look at their graphs: pretty lines, but no
scale/numbers for each axis).

Consider also that the Pro has a fan, the air doesn't. Yet, same CPU,
same OS. So it is very likely that the Air will be speed throttled due
to heat while the Pro will see better performance when doing lengthy work.

And it will make for interesting benchmark logic because you have cores
of different speeds. The 4 slow cores will slow down the average of the
4 fast ones.

nospam

unread,
Nov 12, 2020, 2:21:01 PM11/12/20
to
In article <gffrH.299239$mK4.2...@fx03.iad>, JF Mezei
<jfmezei...@vaxination.ca> wrote:

>
> Going from CISC 32 bit VAX

mezei law invoked.





>
> To say that moving from 8086 to ARM will require less memory is not logica.

it is.

nospam

unread,
Nov 12, 2020, 2:21:02 PM11/12/20
to
In article <kDfrH.133972$tL.5...@fx15.iad>, JF Mezei
<jfmezei...@vaxination.ca> wrote:

> Another aspect: that 8 or 16GB on a M1 based MAC is used for both the
> CPU and the GPUs. So less of the RAM is actually available to the OS
> because the GPUs use the same RAM.

nope.

one major advantage is both the cpu and gpu can access the same data,
eliminating the need to copy it. another advantage is that the memory
itself is much faster.
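
A rough sketch of the difference being argued here, in C; every function
and type below is a made-up stand-in for a GPU runtime, not any real API,
so treat it as illustration only:

    #include <stdlib.h>
    #include <string.h>

    /* Hypothetical stand-ins for a discrete-GPU runtime -- not a real API. */
    typedef struct { void *vram; size_t len; } gpu_buf_t;

    static gpu_buf_t gpu_alloc(size_t len) {
        gpu_buf_t b = { malloc(len), len };
        return b;
    }

    static void gpu_copy_to_device(gpu_buf_t *b, const void *src) {
        memcpy(b->vram, src, b->len);        /* the trip across the bus */
    }

    int main(void) {
        size_t n = 1024, len = n * sizeof(float);

        /* Discrete GPU model: fill a buffer in system RAM, then copy it
           into the GPU's own VRAM (modelled as a second allocation). */
        float *sys_buf = calloc(n, sizeof(float));
        gpu_buf_t vram = gpu_alloc(len);
        gpu_copy_to_device(&vram, sys_buf);  /* the extra copy */

        /* Unified memory model: CPU and GPU address the same allocation,
           so the copy disappears -- but the data still lives in the one
           shared pool of RAM, which is the other side of the argument. */
        float *shared_buf = calloc(n, sizeof(float));

        free(sys_buf);
        free(vram.vram);
        free(shared_buf);
        return 0;
    }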

nospam

unread,
Nov 12, 2020, 2:21:04 PM11/12/20
to
In article <uOfrH.514768$RY8....@fx48.iad>, JF Mezei
<jfmezei...@vaxination.ca> wrote:

> Asll we know is "unified memory" which is meaningless marketing mumble
> jumble.

nonsense.

JF Mezei

unread,
Nov 13, 2020, 2:13:46 AM11/13/20
to
On 2020-11-12 14:21, nospam wrote:

> one major advantage is both the cpu and gpu can access the same data,
> eliminating the need to copy it. another advantage is that the memory
> itself is much faster.


Yes, you save when the app transfers the scene description to the GPU.
But the GPU also uses a lot of memory to render the scene for each
frame. And that is a lot of memory used by the GPU which now draws from
the RAM available to the CPU.



Lewis

unread,
Nov 13, 2020, 5:12:12 AM11/13/20
to
In message <uOfrH.514768$RY8....@fx48.iad> JF Mezei <jfmezei...@vaxination.ca> wrote:
> What sort of memery controller Apple uses will matter, especially when

You are talking out of your ass again.

> Asll we know is "unified memory" which is meaningless marketing mumble
> jumble.

No it isn't. The RAM being part of the SoC is what it means.

> It is only once independant testers riun standard benchmarks that we'll
> have an idea of actual performance. Apple's keynote provided nothing of
> value on performance. (look at their graphs, pretty lines, but no
> scale/numbers for each axis).

You obviously didn't pay attention. Again.

> Consider also that the Pro has a fan, the air doesn't. Yet, same CPU,
> same OS. So it is very likely that the Air will be speed throttled due
> to heat

Yes. The mini and the MBP are faster because they have fans, they
specifically said this. The Air is still really really fucking fast.

> while the Pro will see better performance when doing lengthy work.

Let me think of the right word to respond to this. It's tricky. Oh
wait, I've got it,

DUH!

> And it will make for interesting benchmark logic because you have cores
> of different speeds.

No, it won't. Benchmarks run everything at full power, and processes
don't get to pick which cores they run on.

> The 4 slow cores will slow down the average of the 4 fast ones.

The 4 slower cores are twice the speed of the old MacBook Air, so even if
the performance cores never kick in, the new Air is twice the performance
of the old Air and has massive battery gains. In fact, if only the HE
cores are used that would probably increase the battery life even more.

--
'Vetinari isn't mad.' 'Depends how you look at it. No one can be as
sane as he is without being mad.' --Feet of Clay

Lewis

unread,
Nov 13, 2020, 5:15:29 AM11/13/20
to
In message <gffrH.299239$mK4.2...@fx03.iad> JF Mezei <jfmezei...@vaxination.ca> wrote:
> On 2020-11-11 20:15, nospam wrote:
>> In article <VU%qH.433945$r25.3...@fx08.iad>, JF Mezei
>> <jfmezei...@vaxination.ca> wrote:
>>
>>> The 16GB config max for the Air is laughable.
>>
>> it's a completely different architecture, therefore direct comparisons
>> are bogus.
>>
>> ipads and iphones do exceptionally well with 4 gig.

> In the keynote, Apple bragged about 8k video rendering on the MacBook Air.

And you think they were lying.

> ARM being RISC, it should in fact require more instructions to do the
> same work as CISC 8086.

You are, once again, talking out of your ass. Really, you need to get
out of the fucking 90s.

> Going from CISC 32 bit VAX

And there we are.

NO ONE FUCKING CARES ABOUT VAX. VAX IS NOT RELEVANT TO *ANYTHING*.

Fucking idiot.

--
The universe doesn't much care if you tread on a butterfly. There are
plenty more butterflies. Gods might note the fall of a sparrow
but they don't make any effort to catch them. --Lords and Ladies

nospam

unread,
Nov 13, 2020, 6:30:48 AM11/13/20
to
In article <ImqrH.448051$I15.3...@fx36.iad>, JF Mezei
you don't understand how things work, do you?

Lewis

unread,
Nov 13, 2020, 10:43:42 AM11/13/20
to
He really doesn't. At all.

--
He wasn't good or evil or cruel or extreme in any way but one, which
was that he had elevated greyness to the status of a fine art and
cultivated a mind that was as bleak and pitiless and logical as
the slopes of Hell.

Alan Browne

unread,
Nov 13, 2020, 11:27:22 AM11/13/20
to
On 2020-11-11 20:15, nospam wrote:
This is neither. In any case, it depends on the use case. For many
people 8 GB is oodles (browsing, some office apps, e-mail and the like).

16 GB would be a bare minimum for my home computer but 8 GB is more than
enough at work even with a VM.

--
"...there are many humorous things in this world; among them the white
man's notion that he is less savage than the other savages."
-Samuel Clemens

Alan Browne

unread,
Nov 13, 2020, 11:43:38 AM11/13/20
to
On 2020-11-12 13:34, JF Mezei wrote:

> ARM being RISC,

ARM's are RISCier than ever before just as CISC's are RISCier than ever.

Which is why that distinction is not talked about much anymore ... or for
that matter over the last 10 years or so for most computers. It does
still matter in lower-end devices (routers, for example) vs. Intel-based
microcontrollers.

Further, it's likely that Apple are implementing their own specialized
instructions which would be CISCy, not RISCy.

Alan Browne

unread,
Nov 13, 2020, 11:45:56 AM11/13/20
to
On 2020-11-12 14:00, JF Mezei wrote:
> On 2020-11-11 20:15, nospam wrote:
>> In article <VU%qH.433945$r25.3...@fx08.iad>, JF Mezei
>> <jfmezei...@vaxination.ca> wrote:
>>
>>> The 16GB config max for the Air is laughable.
>>
>> it's a completely different architecture, therefore direct comparisons
>> are bogus.
>>
>> ipads and iphones do exceptionally well with 4 gig.
>
> Another aspect: that 8 or 16GB on a M1 based MAC is used for both the
> CPU and the GPUs.

As it is on many Macs that have shared GPU memory with the intel chip as
they use intel graphics. See this a lot in various MBA's and some Mac
Minis. Possibly lower spec 21" iMacs too (I haven't looked).

Alan Browne

unread,
Nov 13, 2020, 11:56:30 AM11/13/20
to
On 2020-11-12 14:21, nospam wrote:
It is marketing speak to a degree. Per a Wikipedia article[1] the SOC
uses 3733 MHz LPDDR4X spec'd SDRAM as an included component within the
SOC. Still needs a memory manager though that may be more deeply
integrated in the CPU; likely has DMA of some kind, esp. for the GPU,
but other I/O as well.

[1] that source has no reference so I declare escape clause. But it's a
good bet if someone has seen that in System Info declared as such on the
many, many developer systems out there.

Alan Browne

unread,
Nov 13, 2020, 12:47:56 PM11/13/20
to
On 2020-11-13 11:43, Alan Browne wrote:
> On 2020-11-12 13:34, JF Mezei wrote:
>
>> ARM being RISC,
>
> ARM's are RISCier than ever before just as CISC's are RISCier than ever.
CISCier... I meant to say.

JF Mezei

unread,
Nov 13, 2020, 1:16:36 PM11/13/20
to
On 2020-11-13 10:43, Lewis wrote:

>> you don't understand how things work, do you?
>
> He really doesn't. At all.


Perhaps you could do the community a favour and explain how PCI-E GPUs
work and how the Apple ones which share RAM will work.

JF Mezei

unread,
Nov 13, 2020, 1:22:06 PM11/13/20
to
On 2020-11-13 11:43, Alan Browne wrote:

> Further, it's likely that Apple are implementing their own specialized
> instructions which would be CISCy, not RISCy.

Apple seems to have focused on adding discrete components such as the
neural processor, image processor, etc. to do these tasks. Adding
instructions would require compilers and LLVM to be updated, would be
specific to Apple, and it is doubtful there would be much use of them.

Please remember that the apologists argued that an ARM binary would be
smaller than an Intel one, hence the need for less RAM on the laptop.
That is what I was responding to.

Lewis

unread,
Nov 13, 2020, 3:18:46 PM11/13/20
to
Perhaps you should do your own research as you have proved countless
times that you are incapable of learning from anything posted here.

--
You know a thorn can main / But a lover does the same / A gem will
reflect light / And a Fool will marvel at the sight / A fool such
as me, /Who sees not the gold, but the beauty of the shine

Lewis

unread,
Nov 13, 2020, 3:20:31 PM11/13/20
to
In message <0VyrH.145047$Ml5.1...@fx24.iad> Alan Browne <bitb...@blackhole.com> wrote:
> On 2020-11-12 14:21, nospam wrote:
>> In article <uOfrH.514768$RY8....@fx48.iad>, JF Mezei
>> <jfmezei...@vaxination.ca> wrote:
>>
>>> Asll we know is "unified memory" which is meaningless marketing mumble
>>> jumble.
>>
>> nonsense.

> It is marketing speak to a degree. Per a Wikipedia article[1] the SOC
> uses 3733 MHz LPDDR4X spec'd SDRAM as an included component within the
> SOC. Still needs a memory manager though that may be more deeply
> integrated in the CPU; likely has DMA of some kind, esp. for the GPU,
> but other I/O as well.

> [1] that source has no reference so I declare escape clause. But it's a
> good bet if someone has seen that in System Info declared as such on the
> many, many developer systems out there.

What many many developer systems out there?

(The DTK is not an M1 machine).


--
This is not a movie I saw, this is a dream I had. --Steven Wright

Alan Browne

unread,
Nov 13, 2020, 3:40:16 PM11/13/20
to
On 2020-11-13 13:22, JF Mezei wrote:
> On 2020-11-13 11:43, Alan Browne wrote:
>
>> Further, it's likely that Apple are implementing their own specialized
>> instructions which would be CISCy, not RISCy.
>
> Apple seems to ave focued on adding discrete components such as the
> neural processor, image processor etc to do these tasks. Adding
> instructions would require compilers and LLVM be updated and be
> specific to Apple and it is doubtful rhere would be much use of them.

They can add all the instructions they want to their own compiler(s) w/o
breaking anyone else's compilers. The latter would simply not have access
to the extended instructions.

It's up to Apple to decide what additional instructions would be of use
to them and whether or not to publish the instructions for other
compilers to use.

>
> Please remembes that the apologists argued that an ARM binary would be
> smaller than an Intel one , hence need for less RAM on the laptop. That
> is what I was responding to.

Adding to instruction sets has 0 impact on the binary. May even reduce
it in particular cases.

Alan Browne

unread,
Nov 13, 2020, 3:47:37 PM11/13/20
to
On 2020-11-13 15:20, Lewis wrote:
> In message <0VyrH.145047$Ml5.1...@fx24.iad> Alan Browne <bitb...@blackhole.com> wrote:
>> On 2020-11-12 14:21, nospam wrote:
>>> In article <uOfrH.514768$RY8....@fx48.iad>, JF Mezei
>>> <jfmezei...@vaxination.ca> wrote:
>>>
>>>> Asll we know is "unified memory" which is meaningless marketing mumble
>>>> jumble.
>>>
>>> nonsense.
>
>> It is marketing speak to a degree. Per a Wikipedia article[1] the SOC
>> uses 3733 MHz LPDDR4X spec'd SDRAM as an included component within the
>> SOC. Still needs a memory manager though that may be more deeply
>> integrated in the CPU; likely has DMA of some kind, esp. for the GPU,
>> but other I/O as well.
>
>> [1] that source has no reference so I declare escape clause. But it's a
>> good bet if someone has seen that in System Info declared as such on the
>> many, many developer systems out there.
>
> What many many developer systems out there?

Thousands...

>
> (The DTK is not an M1 machine).

I assumed it was. Nevertheless, the use of the memory model cited above
is quite possible, and that part of the fabrication may be supplied by
the chip maker, whether on the same wafer or separately.

And/or that someone made the same assumption that I did about the memory
found on the dev kit.

JF Mezei

unread,
Nov 13, 2020, 4:58:11 PM11/13/20
to
On 2020-11-13 15:18, Lewis wrote:

> Perhaps you should do your own research as you have proved countless
> times that you are incapable of learning from anything posted here.


So you insult what I say but are unable to provide any real information
to show I was wrong.


Hint: look at the RAM config for the M1 vs. Intel versions of the MacBook Pro.


Both have the in-CPU GPU.


BTW, someone mentioned the M1 Macs have 3733MHz LPDDR4X RAM. That RAM
spec is for the Intel version of the MacBook Pro 13". All Apple says for
the M1 is "unified memory".

Lewis

unread,
Nov 13, 2020, 7:51:14 PM11/13/20
to
In message <RjDrH.209288$5l1.1...@fx10.iad> JF Mezei <jfmezei...@vaxination.ca> wrote:
> On 2020-11-13 15:18, Lewis wrote:

>> Perhaps you should do your own research as you have proved countless
>> times that you are incapable of learning from anything posted here.

> So you insult

Stating you are ignorant and spew irrelevant VAX shit constantly is not
an insult, it is a fact.

--
"I have no choice but to believe in free will." - Randy Wayne White

Lewis

unread,
Nov 13, 2020, 7:54:11 PM11/13/20
to
In message <HhCrH.320247$GQ4.3...@fx02.iad> Alan Browne <bitb...@blackhole.com> wrote:
> On 2020-11-13 15:20, Lewis wrote:
>> In message <0VyrH.145047$Ml5.1...@fx24.iad> Alan Browne <bitb...@blackhole.com> wrote:
>>> On 2020-11-12 14:21, nospam wrote:
>>>> In article <uOfrH.514768$RY8....@fx48.iad>, JF Mezei
>>>> <jfmezei...@vaxination.ca> wrote:
>>>>
>>>>> Asll we know is "unified memory" which is meaningless marketing mumble
>>>>> jumble.
>>>>
>>>> nonsense.
>>
>>> It is marketing speak to a degree. Per a Wikipedia article[1] the SOC
>>> uses 3733 MHz LPDDR4X spec'd SDRAM as an included component within the
>>> SOC. Still needs a memory manager though that may be more deeply
>>> integrated in the CPU; likely has DMA of some kind, esp. for the GPU,
>>> but other I/O as well.
>>
>>> [1] that source has no reference so I declare escape clause. But it's a
>>> good bet if someone has seen that in System Info declared as such on the
>>> many, many developer systems out there.
>>
>> What many many developer systems out there?

> Thousands...

Nonsense.

>> (The DTK is not an M1 machine).

> I assumed it was.

You assumed wrong. As was detailed at the time it is an A12X. Literally
an iPad chip. It does not have unified memory. It does not have
USB-4/TB. It is not the same chip *at all*

> Nevertheless, the use of the memory model cited above is quite
> possible and that that part of the fab is supplied by the chip maker
> whether on the same wafer or other.

The chip maker supplies exactly what Apple designed, nothing more,
nothing less.

> And/or that someone made the same assumption that I did about the memory
> found on the dev kit.

And that person was equally wrong.



--
Lead me not into temptation, I can find the way.

nospam

unread,
Nov 13, 2020, 8:06:33 PM11/13/20
to
In article <slrnrquali....@ProMini.lan>, Lewis
<g.k...@kreme.dont-email.me> wrote:

>
> >> (The DTK is not an M1 machine).
>
> > I assumed it was.
>
> You assumed wrong. As was detailed at the time it is a A12X. Literally
> an iPad chip. It does not have unified memory. It does not have
> USB-4/TB. It is not the same chip *at all*

actually, it was an a12z. it was basically an ipad in a mac mini box.
which is good enough for development work.

Lewis

unread,
Nov 14, 2020, 4:47:40 AM11/14/20
to
In message <131120202006318871%nos...@nospam.invalid> nospam <nos...@nospam.invalid> wrote:
> In article <slrnrquali....@ProMini.lan>, Lewis
> <g.k...@kreme.dont-email.me> wrote:

>>
>> >> (The DTK is not an M1 machine).
>>
>> > I assumed it was.
>>
>> You assumed wrong. As was detailed at the time it is a A12X. Literally
>> an iPad chip. It does not have unified memory. It does not have
>> USB-4/TB. It is not the same chip *at all*

> actually, it was an a12z.

Yes, which differed from the A12X in that it had one more GPU core.

> it was basically an ipad in a mac mini box.

Very much so. I know several people who had them.

> which is good enough for development work.

Yep, especially considering how simple the transition from Intel to ARM
was for the developers (one spent several days porting, but the others
did their ports in under a day, and one spent less than 15 minutes).

The rest of their summer was spent figuring out what fun new things they
could do.

--
Anybody who could duck the Vietnam war can certainly duck a couple of
shoes.

JF Mezei

unread,
Nov 14, 2020, 5:21:06 AM11/14/20
to
On 2020-11-14 04:47, Lewis wrote:

> Yeo, especially considering how simple the transition from Intel to ARM
> was for the developers (One spent several days porting, but the others
> did their ports in under a day and one spent less than 15 minutes).


Simple enough that it will take months more for Adobe to release native
Photoshop.

If you want to create binaries that make use of the various new
sub-processors like the neural engine, image processor, etc., and support
the integrated GPUs, you need to start putting in a lot of conditional
code that applies only to a compilation targeted at the Mx chips, and
still generate your normal code for all other platforms.

It remains to be seen how much use Adobe will make of the proprietary
processors around the CPU, and how much Metal/GPU they will use.

It is also possible that merely recompiling wouldn't have yielded
marketable performance.

It is also possible that Adobe is waiting for usable Macs (aka: next
models with more RAM) and the delay has nothing to do with the porting
effort.

nospam

unread,
Nov 14, 2020, 8:29:15 AM11/14/20
to
In article <kcOrH.306825$mK4....@fx03.iad>, JF Mezei
<jfmezei...@vaxination.ca> wrote:

>
> > Yeo, especially considering how simple the transition from Intel to ARM
> > was for the developers (One spent several days porting, but the others
> > did their ports in under a day and one spent less than 15 minutes).
>
>
> Simple enough that it will take months more for Adobe to release native
> Photoshop.

photoshop is a very complex app, with some core routines in assembly
that are hand tuned to specific versions of processors.


> If you want to create binaries that make use of the new various
> sub-processors like neural engine, image processor etc, support the
> integrated GPUs etc, you need to start putting in a lot of conditional
> code that applies only for a compilation targetted at the Mx chips, and
> still generate your normal code for all other platforms.
>
> Remains to be seen how much use Adobe will make the the proprietary
> processors around he CPU, and how much Metal/GPU they will use.

photoshop is cross platform, so any change they make to the mac version
must also be done for the windows version so that the results are
identical for both.

> It is also possible that merely recompiling wouldn't have yielded
> marketable performance.

given that portions of photoshop are x86 assembly, it would be
impossible to 'merely recompile' it.

> It is also possible that Adobe is waiting for usable Macs (aka: next
> models with more RAM) and the delay has nothing to do with the porting
> effort.

no.

Lewis

unread,
Nov 14, 2020, 9:04:15 AM11/14/20
to
In message <kcOrH.306825$mK4....@fx03.iad> JF Mezei <jfmezei...@vaxination.ca> wrote:
> On 2020-11-14 04:47, Lewis wrote:

>> Yeo, especially considering how simple the transition from Intel to ARM
>> was for the developers (One spent several days porting, but the others
>> did their ports in under a day and one spent less than 15 minutes).

> Simple enough that it will take months more for Adobe to release native
> Photoshop.

Adobe is Adobe and always moves at a snail's pace. Affinity will be out
day one, for example. ALL the Omni apps are ready to go on M1 Macs. Many
many others as well.

> If you want to create binaries that make use of the new various
> sub-processors like neural engine, image processor etc, support the
> integrated GPUs etc, you need to start putting in a lot of conditional
> code that applies only for a compilation targetted at the Mx chips, and
> still generate your normal code for all other platforms.

You have no fucking clue what you are talking about. Again. Nothing in
that paragraph is at all accurate.

> Remains to be seen how much use Adobe will make the the proprietary
> processors around he CPU, and how much Metal/GPU they will use.

And the idiocy continues.

--
Psychic convention cancelled due to unforeseen problems.

Alan Browne

unread,
Nov 14, 2020, 9:30:32 AM11/14/20
to
On 2020-11-13 19:54, Lewis wrote:
> In message <HhCrH.320247$GQ4.3...@fx02.iad> Alan Browne <bitb...@blackhole.com> wrote:
>> On 2020-11-13 15:20, Lewis wrote:
>>> In message <0VyrH.145047$Ml5.1...@fx24.iad> Alan Browne <bitb...@blackhole.com> wrote:
>>>> On 2020-11-12 14:21, nospam wrote:
>>>>> In article <uOfrH.514768$RY8....@fx48.iad>, JF Mezei
>>>>> <jfmezei...@vaxination.ca> wrote:
>>>>>
>>>>>> Asll we know is "unified memory" which is meaningless marketing mumble
>>>>>> jumble.
>>>>>
>>>>> nonsense.
>>>
>>>> It is marketing speak to a degree. Per a Wikipedia article[1] the SOC
>>>> uses 3733 MHz LPDDR4X spec'd SDRAM as an included component within the
>>>> SOC. Still needs a memory manager though that may be more deeply
>>>> integrated in the CPU; likely has DMA of some kind, esp. for the GPU,
>>>> but other I/O as well.
>>>
>>>> [1] that source has no reference so I declare escape clause. But it's a
>>>> good bet if someone has seen that in System Info declared as such on the
>>>> many, many developer systems out there.
>>>
>>> What many many developer systems out there?
>
>> Thousands...
>
> Nonsense.

Really. Even my lowly Pascal compiler developer got one and very
quickly. It was underwhelming in the sense that the new compiler was
tested and ready in mere days.

>
>>> (The DTK is not an M1 machine).
>
>> I assumed it was.
>
> You assumed wrong. As was detailed at the time it is a A12X. Literally
> an iPad chip. It does not have unified memory. It does not have
> USB-4/TB. It is not the same chip *at all*

Fine.

>
>> Nevertheless, the use of the memory model cited above is quite
>> possible and that that part of the fab is supplied by the chip maker
>> whether on the same wafer or other.
>
> The chip maker supplies exactly what Apple designed, nothing more,
> nothing less.

Simplest path: spec an existing type that is in the fabs capability and
meets the spec.

Lewis

unread,
Nov 14, 2020, 11:51:51 AM11/14/20
to
Again, the DTK is NOT an M1 Mac. At all.

> Simplest path: spec an existing type that is in the fabs capability and
> meets the spec.

Are you channeling JF?


--
> I miss the old days. I haven't killed anyone in years.
>
That's sad.

Alan Browne

unread,
Nov 14, 2020, 12:12:42 PM11/14/20
to
On 2020-11-14 11:51, Lewis wrote:
> In message <aSRrH.889498$AN2....@fx46.iad> Alan Browne <bitb...@blackhole.com> wrote:
>> On 2020-11-13 19:54, Lewis wrote:
>>> In message <HhCrH.320247$GQ4.3...@fx02.iad> Alan Browne <bitb...@blackhole.com> wrote:
>>>> On 2020-11-13 15:20, Lewis wrote:
>>>>> In message <0VyrH.145047$Ml5.1...@fx24.iad> Alan Browne <bitb...@blackhole.com> wrote:
>>>>>> On 2020-11-12 14:21, nospam wrote:
>>>>>>> In article <uOfrH.514768$RY8....@fx48.iad>, JF Mezei
>>>>>>> <jfmezei...@vaxination.ca> wrote:
>>>>>>>
>>>>>>>> Asll we know is "unified memory" which is meaningless marketing mumble
>>>>>>>> jumble.
>>>>>>>
>>>>>>> nonsense.
>>>>>
>>>>>> It is marketing speak to a degree. Per a Wikipedia article[1] the SOC
>>>>>> uses 3733 MHz LPDDR4X spec'd SDRAM as an included component within the
>>>>>> SOC. Still needs a memory manager though that may be more deeply
>>>>>> integrated in the CPU; likely has DMA of some kind, esp. for the GPU,
>>>>>> but other I/O as well.
>>>>>
>>>>>> [1] that source has no reference so I declare escape clause. But it's a
>>>>>> good bet if someone has seen that in System Info declared as such on the
>>>>>> many, many developer systems out there.
>>>>>
>>>>> What many many developer systems out there?
>>>
>>>> Thousands...
>>>
>>> Nonsense. [1] <------
>
>> Really. Even my lowly Pascal compiler developer got one and very
>> quickly. It was underwhelming in the sense that the new compiler was
>> tested and ready in mere days.
>
> Again, the DTK Is NOT the M1 mac. At all.

Yeah, resolved in prior posts.

I wasn't being specific to the processor - only that Apple put out
thousands of transition developer kits.

>
>> Simplest path: spec an existing type that is in the fabs capability and
>> meets the spec.
>
> Are you channeling JF?

Not at all.

Alan Browne

unread,
Nov 14, 2020, 12:22:00 PM11/14/20
to
On 2020-11-14 08:29, nospam wrote:

> photoshop is a very complex app, with some core routines in assembly
> that are hand tuned to specific versions of processors.
<s>
>
> given that portions of photoshop are x86 assembly, it would be
> impossible to 'merely recompile' it.

Such can be done with translation tools (x86 -> ARM assembler).
They've possibly developed or acquired such for their iOS products.

The ARM 64-bit register set will offer a lot more opportunity to keep
data in registers rather than on the stack, which will help functions
blaze vs. x86.

x86: 16 x 64b registers.

ARM_64: 31 x 64b - though the ARM procedure calling convention allocates
several specifically. Apple follow that convention to a large degree
but have their own "guideline" for some register use and specifically
reserve x18 and x29. Apple also have slightly different stack/register
conventions/alignments/restrictions than ARM.
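
For a concrete (if simplified) illustration of the register point, assuming
the standard SysV x86-64 and AArch64 AAPCS64 calling conventions (the
register names in the comment come from those ABIs, not from anything
Apple-specific):

    /* Eight integer arguments: AAPCS64 (ARM64) passes all eight in x0-x7,
       while SysV x86-64 passes only the first six in rdi, rsi, rdx, rcx,
       r8, r9 and pushes g and h onto the stack.  More arguments and locals
       kept in registers means fewer memory accesses in hot code. */
    long sum8(long a, long b, long c, long d,
              long e, long f, long g, long h)
    {
        return a + b + c + d + e + f + g + h;
    }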

JF Mezei

unread,
Nov 14, 2020, 12:40:33 PM11/14/20
to
On 2020-11-14 08:29, nospam wrote:

> given that portions of photoshop are x86 assembly, it would be
> impossible to 'merely recompile' it.

Surprised at that allegation since Photoshop used to run on 68k, PowerPC
and 32 bit 8086s.


The GPUs take in C++ code (or a variant thereof). And x86 assembly code
isn't compiler-optimized, so it is generally only used for low-level
hardware interfaces.

It wasn't that long ago that you and your ilk even refused to admit that
OS-X would have any assembly language, and now you state Photoshop (a
user-mode application) has assembler.

Depending on demand, there might be an x26 assembler *compiler* which
would then generate optimised ARM opcodes.

When Digital did the migration from VAX to Alpha, it found itself with a
LOT of VAX assembler code (Macro) and decided to create a Macro compiler
that generated optimized Alpha binaries. This prevented the need to
rewrite many of the older apps.

However, I suspect the amount of x86 assembler in the OS-X ecosystem is
really not sufficient to warrant developing such a compiler. The
assembler portions in OS-X would be very low level and would have to be
rewritten for the new ARM environment (device interfaces, etc.), so such
a compiler would not be very useful.


JF Mezei

unread,
Nov 14, 2020, 12:46:40 PM11/14/20
to
On 2020-11-14 10:41, Jolly Roger wrote:

> Fuck off, FUDster. Adobe is famous in the Mac community for taking
> longer than most other developers to update their apps.


Are you calling Tim Cook a FUDster? He is the one who announced
Photoshop would arrive next year.


nospam

unread,
Nov 14, 2020, 1:04:20 PM11/14/20
to
In article <jEUrH.365302$5_4....@fx40.iad>, JF Mezei
<jfmezei...@vaxination.ca> wrote:

>
> > given that portions of photoshop are x86 assembly, it would be
> > impossible to 'merely recompile' it.
>
> Surprised at that allegation since Photoshop used to run on 68k, PowerPC
> and 32 bit 8086s.

obviously, the 68k and powerpc versions of photoshop had 68k and
powerpc assembly in them. this is not a difficult concept.

photoshop has been intel-only for over a decade, thus there is no 68k
or powerpc code anywhere to be found anymore. also not a difficult
concept.

> The GPUs take in C++ code (or variant thereof). And x86 code isn't
> optimized, so generally only used for low level hardware interfaces.
>
> It wasn't that long ago that you and your ilk even refised to admit that
> OS-X would have any assembly language, and now you state Photoshop (a
> user mode application) has assembler.

i never said any such thing.

> Depending on demand, there might be an x26 assembler *compiler* which
> would then generate optimised ARM opcodes.
>
> When Digital did the migration from VAX to Alpha,

there you go again...

nospam

unread,
Nov 14, 2020, 1:04:21 PM11/14/20
to
In article <WmUrH.138317$nI.1...@fx21.iad>, Alan Browne
<bitb...@blackhole.com> wrote:

>
> > photoshop is a very complex app, with some core routines in assembly
> > that are hand tuned to specific versions of processors.
> <s>
> >
> > given that portions of photoshop are x86 assembly, it would be
> > impossible to 'merely recompile' it.
>
> Such can be done with translation tools (_x86 -> ARM assembler).
> They've possibly developed or acquired such for their iOS products.

it can, but it would not be anywhere close to optimal.

adobe tweaks their core routines not just for i5/i7, but different
variants, which is one reason why photoshop is one of the fastest image
processing apps.

Alan Browne

unread,
Nov 14, 2020, 1:26:55 PM11/14/20
to
On 2020-11-14 12:40, JF Mezei wrote:
> On 2020-11-14 08:29, nospam wrote:
>
>> given that portions of photoshop are x86 assembly, it would be
>> impossible to 'merely recompile' it.
>
> Surprised at that allegation since Photoshop used to run on 68k, PowerPC
> and 32 bit 8086s.

Pretty irrelevant. Going forward at least, assembler for one
architecture is pretty quickly converted to almost any other. As
architectures evolve it just gets easier as register sets expand and
opcodes provide more options to do operations in fewer instructions.

> The GPUs take in C++ code (or variant thereof). And x86 code isn't
> optimized, so generally only used for low level hardware interfaces.

Not at all. In treating large sets of integer values (which is what a
digital image is "made of"), assembler offers many ways to do many
things very efficiently and further allows for the ultimate of
optimization and clever tricks that are not expressible efficiently in
high level language. This can even include FP ops with opportunistic
avoidance of FWAIT if one takes extraordinary care.

On the other side of that dime, h/w control is now rarely implemented in
assembler because the HOL's provide ample read/write to control
registers and of course memory mapped I/O. Tossup whether assembler or
HOL is better, actually. For hardware makers this is even preferred as
they don't have to worry about what architecture and often even OS that
a driver will run on. Minor tweaks at most (and taken care of via pragmas
in the source code).

> It wasn't that long ago that you and your ilk even refised to admit that
> OS-X would have any assembly language, and now you state Photoshop (a
> user mode application) has assembler.

The OS can be 100% in C (or whatever mix) and apps can be 100% in
assembler if need be, though usually limited to where needed. Not
saying Mac OS is assembler free, but as time has gone on, Apple have
reduced that to bare minimum.

> Depending on demand, there might be an x26 assembler *compiler* which
> would then generate optimised ARM opcodes.

x26 assembler? Eh? *compiler*? Do you mean translator or converter?
Sure. Quite plausible.

> When Digital did the migration from VAX to Alpha,
<S>
... get over it already.

> However, I suspect the amount of x86 assembler in the OS-X ecosystem is
> really not sufficient to warrant developing such a compiler. The
> assembler portions in OS-X would be very low level and hace to be
> rewritten to the new ARM environment (device interfaces etc) so a
> compiler not very useful.

You mean translator. Who knows. I'd lean more to Apple simply
re-writing the assembler portions for best efficiency since there is,
ultimately, not that much assembler left in Apple's OSes. So they might
well rewrite the little bit there is as tightly as possible, esp. as
portions of the OS are common to the Watch and presumably AirTag, where
energy use is most important.

The ARM assembler is available and Apple expressly provide their
guidelines to developers for using ARM assembly language for Mac OS (and
iOS and tvOS). I didn't see iPad OS or Watch OS in there but assume the
same guidance applies. I assume that ARM's assembler is invokable from
Xcode, but if not, the assembler output object code certainly can be
linked. You need to follow ARM calling conventions with the Apple
exceptions of course.

Lewis

unread,
Nov 14, 2020, 1:34:06 PM11/14/20
to
You were talking about the unified memory in the M1 chip and comparing it
to the DTK.

>>> Simplest path: spec an existing type that is in the fabs capability and
>>> meets the spec.
>>
>> Are you channeling JF?

> Not at all.

Seems like it. The M1 is not an "existing type"; it is a new chip
designed by Apple and made to their design. It is not being assembled out
of other bits.

--
Women like silent men, they think they're listening.

Lewis

unread,
Nov 14, 2020, 1:40:18 PM11/14/20
to
In message <jEUrH.365302$5_4....@fx40.iad> JF Mezei <jfmezei...@vaxination.ca> wrote:
> On 2020-11-14 08:29, nospam wrote:

>> given that portions of photoshop are x86 assembly, it would be
>> impossible to 'merely recompile' it.

> Surprised at that allegation since Photoshop used to run on 68k, PowerPC
> and 32 bit 8086s.

And you think they simply recompiled the 68000 code for Intel? Are you
really that daft?

> The GPUs take in C++ code (or variant thereof). And x86 code isn't
> optimized, so generally only used for low level hardware interfaces.

You, AGAIN, have no idea what you're talking about.

> It wasn't that long ago that you and your ilk even refised to admit that
> OS-X would have any assembly language,

No one said that outside your delusion. What was said was that for the
vast majority of developers and the vast majority of apps, there was no
assembly written. And there is less and less as Apple continues to
abstract the hardware from being accessed directly.

The end of kexts is coming very soon (possibly next year), for example.
Apple is moving to prevent anything from executing outside of the base
OS's permission model.

> When Digital did the migration from VAX

BZZT! And you're outta here!

--
He felt that the darkness was full of unimaginable horrors - and the
trouble with unimaginable horrors was that they were only too
easy to imagine...

Alan Browne

unread,
Nov 14, 2020, 1:40:40 PM11/14/20
to
On 2020-11-14 13:04, nospam wrote:
> In article <WmUrH.138317$nI.1...@fx21.iad>, Alan Browne
> <bitb...@blackhole.com> wrote:
>
>>
>>> photoshop is a very complex app, with some core routines in assembly
>>> that are hand tuned to specific versions of processors.
>> <s>
>>>
>>> given that portions of photoshop are x86 assembly, it would be
>>> impossible to 'merely recompile' it.
>>
>> Such can be done with translation tools (_x86 -> ARM assembler).
>> They've possibly developed or acquired such for their iOS products.
>
> it can, but it would not be anywhere close to optimal.

Why I said "possible". Further, when going forward (architecture) it's
easier to build optimization into the translation process.

It's also a good way to get the project ahead, validate the result, and
then "hand optimize" the result in successive rounds.

nospam

unread,
Nov 14, 2020, 1:47:03 PM11/14/20
to
In article <slrnrr094g....@ProMini.lan>, Lewis
<g.k...@kreme.dont-email.me> wrote:

> >> given that portions of photoshop are x86 assembly, it would be
> >> impossible to 'merely recompile' it.
>
> > Surprised at that allegation since Photoshop used to run on 68k, PowerPC
> > and 32 bit 8086s.
>
> An you think they simply recompiled the 68000 code for Intel Are ou
> really that daft?

do you really need to ask?

Alan Browne

unread,
Nov 14, 2020, 2:20:05 PM11/14/20
to
I took your objection to be the word "Thousands". That was me
referencing the number of dev kits. Not what specific processor was in
there.

>>>> Simplest path: spec an existing type that is in the fabs capability and
>>>> meets the spec.
>>>
>>> Are you channeling JF?
>
>> Not at all.
>
> Seems like it. The M1 is not an "existing type" it is a new chip
> designed by Apple and made to their design. it is not being assembled out
> of other bits.

Those elements can come from different designers (companies), with the
design data transferred to Apple to integrate into their overall design
by specification.

Apple claim "16 billion transistors" for the M1. That can't possibly
represent the entire CPU + 8 GB or 16 GB of memory. (8 GiB would take a
minimum of 64B transistors for a DRAM and 6 times that for SRAM plus all
the interface logic (address decode, etc.)).
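
Rough arithmetic behind that lower bound, assuming roughly one transistor
per bit for 1T1C DRAM and six per bit for SRAM (the "64B" figure uses
decimal gigabytes; binary GiB gives about 68.7 billion):

    #include <stdio.h>

    int main(void) {
        double bits = 8e9 * 8.0;  /* 8 GB (decimal) in bits = 6.4e10 */
        printf("DRAM  ~%.0f transistors\n", bits * 1.0);  /* ~64 billion  */
        printf("SRAM  ~%.0f transistors\n", bits * 6.0);  /* ~384 billion */
        return 0;
    }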

So, could be integrated as separate chips onto 1 carrier:

Looking at:
https://www.apple.com/v/mac/m1/a/images/overview/chip__fffqz3ljssi2_large.jpg

That looks like 2 memory modules integrated onto the same carrier as the
processor portion to the left.

IAC: we'll see in an M1 Mac if the memory "spec" is listed in the system
information. I look forward to your screenshot.

JF Mezei

unread,
Nov 14, 2020, 2:48:21 PM11/14/20
to
On 2020-11-14 13:26, Alan Browne wrote:

> Not at all. In treating large sets of integer values (which is what a
> digital image is "made of"), assembler offers many ways to do many
> things very efficiently and further allows for the ultimate of
> optimization and clever tricks that are not expressable efficiently in
> high level language.

When moving to RISC, this changes the equation because of the large
number of optimizations possible by re-ordering opcodes to allow the
chip to make best use of pipelining, branch prediction and other
performance techniques.

And assembler, by definition, generates opcodes that match your code
both in nature and order. No optimizations possible.

However, when a chip has an instruction to decode an H.264 stream, for
instance, it can be more efficient to use assembler to use that
instruction vs. writing your own in a higher-level language that gets
optimized.


> x26 assembler? Eh? *compiler*? Do you mean translator or converter?
> Sure. Quite plausible.

COMPILER. The compiler treats the source as a high-level language and
does the optimization, re-ordering the code to fit the target
architecture. A translator merely translates already-compiled code
without trying to understand the code and optimize it.

Rosetta 2 is a translator that takes code optimized for 8086 and
translates it to multiple ARM instructions that do the same whereas
recompiling will optimise the binary to run on ARM.

This is why Digital did a compiler for its Macro assembler when moving
platform so that the CISC VAX instructions would not only be translated
into a bunch of RISC instructions but could also optmize sets of
instructiosn just as any 3rd level language is optimized.


> ... get over it already.

So you refuse to discuss experiences of other ports and wish to keep
your head in the sand and blindly drink Apple's Kool-Aid.


> You mean translator.

For applications where performance counts, I mean COMPILER. Where
low-level code is truly needed, assembler is used because you can't
optimize it, and it needs to be rewritten from scratch because the
handling of device drivers, the types of IO interfaces, etc. are all
very different.

In the case of OS-X, much already exists from iOS, so much could be
re-used, but there is still code needed for the new Thunderbolt/USB-4
drivers and all the variants attached to it, including Ethernet drivers
attached to the new Thunderbolt/USB-4 IO interface.




> The ARM assembler is available and Apple expressly provide their
> guidelines to developers for using ARM assembly language

I am sure it is. But when targeting a RISC platform, it is much harder
for assembly language to beat the efficiency of higher-level languages,
because of compiler and LLVM optimizations which know how that CPU does
instruction pre-fetching, pipelining, branch prediction, etc.



JF Mezei

unread,
Nov 14, 2020, 2:52:38 PM11/14/20
to
On 2020-11-14 13:47, nospam wrote:

>> An you think they simply recompiled the 68000 code for Intel Are ou
>> really that daft?
>
> do you really need to ask?


The argument was made that Adobe had to rewrite large parts because they
were in 8086 assembly. I countered that Adobe already had multi-platform
support in the past. So they can rewrite parts for ARM if they want to.
But it does show that it isn't just a simple recompile.


nospam

unread,
Nov 14, 2020, 3:08:50 PM11/14/20
to
In article <8wWrH.329951$9r7....@fx07.iad>, JF Mezei
<jfmezei...@vaxination.ca> wrote:

>
> And Assembler, by definition, generates opcodes that match your code
> both in nature and order. No optimizatiosn possible.

false.

> However, when a chip has an instruction to decode an H.264 stream for
> instance, it can be more efficient to use assembler to use that
> instriuction vs writing your own in higher level language that gets
> optimized.

better yet, let the hardware decoder handle it.



>
> This is why Digital did a compiler for its Macro assembler when moving
> platform so that the CISC VAX

*gong*

nospam

unread,
Nov 14, 2020, 3:08:51 PM11/14/20
to
In article <9AWrH.329952$9r7.2...@fx07.iad>, JF Mezei
<jfmezei...@vaxination.ca> wrote:

> Argument was made that Adobe had to rewrite large parts because they
> were in 8086 assembly.

some parts are, not large parts.

it also needs to be functionally identical to the windows version.

> I counterred that Adobe already had multi platfor
> support in the past. So they can rewrite parts for ARM if they want to.

and they do.

> But it does show that it isn't just a simple recompile.

that's the whole point.

Alan Browne

unread,
Nov 14, 2020, 3:50:56 PM11/14/20
to
On 2020-11-14 14:48, JF Mezei wrote:
> On 2020-11-14 13:26, Alan Browne wrote:
>
>> Not at all. In treating large sets of integer values (which is what a
>> digital image is "made of"), assembler offers many ways to do many
>> things very efficiently and further allows for the ultimate of
>> optimization and clever tricks that are not expressable efficiently in
>> high level language.
>
> When moving to RISC, this changes the equation because of the large
> number of optomizations possible buy re-ordering of opcodes to allow the
> chip to make best use of pipeline, branch prediction and other
> perofrmance techniques.

1) RISC is an even better way to do tailored integer operations on large
arrays like images.
2) ARM chips over the past 10 or even 20 years have become quite CISCy.
> And Assembler, by definition, generates opcodes that match your code
> both in nature and order. No optimizatiosn possible.

Well, that is the point. Hand code in assembler so that every possible
instruction cycle is optimal. ie: optimization is implicit. That said,
tools can be employed that look for human error in the design and
suggest optimization.

For example a value pushed onto the stack when an otherwise un/little
used register is available could result in a large improvement in
performance.

> However, when a chip has an instruction to decode an H.264 stream for
> instance, it can be more efficient to use assembler to use that
> instriuction vs writing your own in higher level language that gets
> optimized.

Wow. Do you want your Sherlock medal by mail?

>
>
>> x26 assembler? Eh? *compiler*? Do you mean translator or converter?
>> Sure. Quite plausible.
>
COMPILER. The compiler treats the source as a high-level language and
does the optimization, re-ordering the code to fit the target
architecture. A translator merely translates already-compiled code
without trying to understand the code and optimize it.

I only asked because in your typical rushed and poorly written prose
it's often difficult to understand what you're saying (esp. when it's
wrong or redundant as it often is).

[AAA].
Adding to that, you have no clue what translators may be doing to
optimize code. For example, they could easily decide to remove a bunch
of pushes onto the stack because more registers are available than on the
original target. So the code runs much faster because of fewer push/pops,
but more importantly, VARs in registers make for very fast execution vs.
VARs on the stack.

> Rosetta 2 is a translator that takes code optimized for 8086 and
> translates it to multiple ARM instructions that do the same whereas
> recompiling will optimise the binary to run on ARM.

You don't know how many instructions it takes. Since later ARMs are
somewhat CISCy they can do a lot of instructions pretty much identically
to the x86_64.

Also, see an example above of optimization[AAA].

IAC, it would not matter much in many cases, as some x86 instructions
take a bucket of cycles that would take 2 or 3 RISC instructions of
shorter length.

>
> This is why Digital died.

Yep. They are dead. Get over it.

>
>> ... get over it already.
>
> So you refuse to discuss experiences of other ports and wish to keep
> your head in sand and and blindly drink Apple's kool aid.

Not at all. I probably have more experience with hardware and assembler
on a range of processor types than pretty much everyone here - probably
combined. And it was mainly for real time systems where in many cases,
cycles counted. (eg: a TMS320C40 doing 100,000 interrupts per second when
it wasn't synthesizing a complex waveform. er, 2 waveforms). And I
confess that that was not a "nice" CPU to program compared to, say, an
8086. (Worst was the COSMAC 1802, which I never mastered; thankfully I
just did very minor changes to that system).

>> You mean translator.
>
> For applicationn where performance counts, I mean COMPILER. Where low
> level code is truly needed, assembler is used because you can't optimize

Assembler is as optimal as can be assuming competent design.

> it and it needs to be rewritten from scratch because handling of device
> drivers, the types of io interfaces etc all very different.
>
> In the case of OS-X, much already exists from IOS, so much could be
> re-used, but there is still code needed for the new thunderbolt/USB-4
> drivers and all the variants attached to it, including ethernet drivers
> attached to the new thunderbolt/USB-4 IO interface.

Which, as I point out elsewhere, is almost certainly in HOL. High speed
I/O is most often via some variant of DMA. So you set the registers of
the DMA controller (via ports and/or memory-mapped ports) and likewise
(usually) with an interrupt controller and/or memory controller.
Actual data movement is done by that hardware, not the CPU. Way too much
detail to bother with here.
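
For a sense of what "setting the registers of the DMA controller" looks
like in a HOL, here is a minimal C sketch; the register layout, base
address and bit assignments are hypothetical, not any real controller's:

#include <stdint.h>

/* Hypothetical memory-mapped DMA controller. The register layout and
   base address are invented for illustration; a real driver would use
   the layout from the device's datasheet. */
#define DMA_BASE  0x40001000UL

typedef struct {
    volatile uint32_t src_addr;    /* physical source address            */
    volatile uint32_t dst_addr;    /* physical destination address       */
    volatile uint32_t byte_count;  /* transfer length in bytes           */
    volatile uint32_t control;     /* bit 0 = start, bit 1 = IRQ enable  */
} dma_regs_t;

static void start_dma(uint32_t src, uint32_t dst, uint32_t len)
{
    dma_regs_t *dma = (dma_regs_t *)DMA_BASE;

    dma->src_addr   = src;         /* program the transfer               */
    dma->dst_addr   = dst;
    dma->byte_count = len;
    dma->control    = 0x3;         /* start it and enable the interrupt  */
    /* From here the DMA engine moves the data; the CPU is free until
       the completion interrupt fires. */
}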

>> The ARM assembler is available and Apple expressly provide their
>> guidelines to developers for using ARM assembly language
>
> I am sure it is. But when targetting a RISC platform, assembly language
> is much harder to beat efficieny of higher level languages because of
> compiler and LLVM optimizations which know about how that CPU does
> instruction pre-fetching, pipelining, branch prediction, etc.

Uh, no. Good assembler designers read the opcode manual very carefully
so that they know each instruction intimately (in its myriad variations)
and also carefully read every possible variant of the machine
optimizations (pre-fetch, pipeline, etc.) so they can take advantage of
the best set of instructions for the task.

Assembler allows the designer to carefully optimize every tiny step and
if desired model every machine cycle, etc. Understanding how the h/w
implementations of all the clever bits actually work allows a more
subtle optimization by the designer (ie: for some data sets that tend
some way, certain combinations of instructions will statistically beat
the assumed best set over a large set of data). This deep level of
optimization isn't done much, but I could see image processing being an
area where it could be an advantage.

Lewis

unread,
Nov 14, 2020, 6:20:54 PM11/14/20
to
No, I don't think that is correct. The SoC is entirely Apple design.

> Apple claim "16 billion transistors" for the M1. That can't possibly
> represent the entire CPU + 8 GB or 16 GB of memory. (8 GiB would take a
> minimum of 64B transistors for a DRAM and 6 times that for SRAM plus all
> the interface logic (address decode, etc.)).

The 16 billion transistors is for the CPUs specifically, not for the
entire M1. But how is the RAM integrated into the SoC? Sure, maybe those
components are sourced separately.

> So, could be integrated as separate chips onto 1 carrier:

> Looking at:
> https://www.apple.com/v/mac/m1/a/images/overview/chip__fffqz3ljssi2_large.jpg

> That looks like 2 memory modules integrated onto the same carrier as the
> processor portion to the left.

It does. We'll know more when someone scrapes an M1 and takes an
electron microscope to it, I suppose.

> IAC: we'll see in an M1 Mac if the memory "spec" is listed in the system
> information. I look forward to your screenshot.

Next month.


--
GRAMMAR IS NOT A TIME OF WASTE Bart chalkboard Ep. AABF10

Lewis

unread,
Nov 14, 2020, 6:26:27 PM11/14/20
to
In message <9AWrH.329952$9r7.2...@fx07.iad> JF Mezei <jfmezei...@vaxination.ca> wrote:
> On 2020-11-14 13:47, nospam wrote:

>>> An you think they simply recompiled the 68000 code for Intel Are ou
>>> really that daft?
>>
>> do you really need to ask?

> Argument was made that Adobe had to rewrite large parts because they
> were in 8086 assembly. I counterred that Adobe already had multi platfor
> support in the past.

And it took them forever to update then too. When did they finally have
an Intel-native Mac app? And keep in mind, they already had ALL their
software running on Intel.

Adobe is slow. Always has been.

> So they can rewrite parts for ARM if they want to.

Eventually. Honestly, "next year" is lightning speed for Adobe.

> But it does show that it isn't just a simple recompile.

It is for many, many developers and many, many apps. Adobe goes to great
lengths to do their own thing, and that means they do not benefit from
many of the things Apple does to make transitions seamless.

Omni, all apps on day one for M1. Affinity, all apps on M1 on day one. Adobe
and Microsoft? Eh, some day next year probably. Google? 2027 for
anything not Chrome.

--
"Are you pondering what I'm pondering?"
"Umm, I think so, Don Cerebro, but, umm, why would Sophia Loren do a
musical?"

Lewis

unread,
Nov 14, 2020, 6:28:26 PM11/14/20
to
In message <8wWrH.329951$9r7....@fx07.iad> JF Mezei <jfmezei...@vaxination.ca> wrote:
> And Assembler, by definition, generates opcodes that match your code
> both in nature and order. No optimizatiosn possible.

High bar, I know, but this may just be the dumbest most wrong thing you
have ever said.

--
"He raised his hammer defiantly and opened his mouth to say, "Oh,
yeah?" but stopped, because just by his ear he heard a growl. It
was quite low and soft, but it had a complex little waveform
which went straight down into a little knobbly bit in his spinal
column where it pressed an ancient button marked Primal Terror."

Alan Browne

unread,
Nov 14, 2020, 7:39:22 PM11/14/20
to
On 2020-11-14 18:20, Lewis wrote:
> In message <D5WrH.797928$eN2.1...@fx47.iad> Alan Browne <bitb...@blackhole.com> wrote:
>> On 2020-11-14 13:34, Lewis wrote:
>>> Seems like it. The M1 is not an "existing type" it is a new chip
>>> designed by Apple and made to their design. it is not being assembled out
>>> of other bits.
>
>> Those elements can come from different designers(companies) with the
>> data for the design transferred to Apple to integrate into their overall
>> design by specification.
>
> No, I don't think that is correct. The SoC is entirely Apple design.

Memory is a specialty not worth doing if someone else is a master at it.
Apple could license it and incorporate as needed. Not to say Apple
won't do their own memory in the future, but I'd bet they licensed it or
bought the modules outright. (or rather had the fab acquire the Apple
spec'd modules from the fab that made the modules unless it's the same
fab...)

>
>> Apple claim "16 billion transistors" for the M1. That can't possibly
>> represent the entire CPU + 8 GB or 16 GB of memory. (8 GiB would take a
>> minimum of 64B transistors for a DRAM and 6 times that for SRAM plus all
>> the interface logic (address decode, etc.)).
>
> The 16 billion transistors is for the CPUs specifically, not for the
> entire M1. but how the RAM is integrated into the Soc? Sure, maybe those
> components are sourced separately..
>
>> So, could be integrated as separate chips onto 1 carrier:
>
>> Looking at:
>> https://www.apple.com/v/mac/m1/a/images/overview/chip__fffqz3ljssi2_large.jpg
>
>> That looks like 2 memory modules integrated onto the same carrier as the
>> processor portion to the left.
>
> It does. We'll know more when someone scrapes an M1 and takes an
> electron microscope to it, I suppose.

Look at the Apple M1 video around 07:40 - 07:50. Not conclusive, but
certainly suggestive.

JF Mezei

unread,
Nov 14, 2020, 10:46:55 PM11/14/20
to
On 2020-11-14 15:50, Alan Browne wrote:

> Well, that is the point. Hand code in assembler so that every possible
> instruction cycle is optimal.

Have you coded on Alpha? Have you coded on Itanium? VAX?

The Macro (VAX assembly language) *compiler* for IA64 produced
faster/more efficient code than hand-coding native IA64 assembler,
because the LLVM/compilers on Itanium spent the time to order and block
the operations properly to allow the Itanic chips to run fast.

Doing the work natively on IA64 would require that you not only translate your
idea into individual opcodes, but also know what types of operations to
do in what order, and insert the IA64-specific operations to tell the
chip which operations depend on which one.



> Adding to that you have no clue what translators may be doing to
> optimize code.

Apple has given sufficient hints on Rosetta 2, namely that all system
calls are linked to a special library that accepts the call with the Intel
argument-passing mechanism and then issues the corresponding call to the
"real" routine with the ARM argument-passing standard.

This means the translator really only translates the existing functions
without understanding what they do. Optimizing is much easier at a higher
level because concepts like loops are understood by the compiler.



> For example they could easily decide to remove a bunch
> of pushes onto the stack because more registers are available than the
> original target.

A language like PostScript is based on a stack, as is a reverse Polish
calculator. That stack is part of the logic, not just some means to deal
with a shortage of registers. A translator cannot know if you are using
the stack as temporary storage or whether it is an integral part of your
logic. The translator must maintain functionality.

A compiler generating code for x86 may decide to use the stack to
store a value, and the same code, with the same compiler targeting ARM,
may use a register. But that is a decision made by a compiler that
understands the desired goal of the source code.

A translator of already-compiled binary code doesn't. If it sees use of
the stack, it doesn't know whether it was meant as temporary storage, or if
it was truly meant as LIFO storage logic desired by the program.


>> This is why Digital died.
>
> Yep. They are dead. Get over it.

Not sure where you got that quote, but it was not me who said this in
this thread.


> Not at all. I probably have more experience with hardware and assembler
> on a range of processor types than pretty much everyone here

Simple "embedded device" processors tend to not have very fancy logic
and it is straighforward to code for them. Once you get into high
performnce processors (or the Itanic where Intel tried high
performance), it gets verry messy because of how a processor reacts to
instructions.

When you code for a processor that has faults when you tru to access
memory that isn't quadword aligned (64 bits), your fancy assembler code
that ran well on a less complex CPU suddently runs like molasses even
though that processor is supposed to be high performance. This is
something that won't happen with higher level labguage because the
copiler and LLVM know to align all memory access to a quadword to avoid
this and this is done automatically for you. so if you need the 3rd
byte, it will fetch 8 bytes from memory into a register and do the
sfifts to get the byte you want to avoid the memory fault.
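
A minimal C sketch of that pattern (the byte numbering and the
little-endian assumption are mine): fetch the aligned 64-bit word that
contains the byte, then shift and mask.

#include <stddef.h>
#include <stdint.h>

/* Fetch byte number n from a buffer without ever issuing an unaligned
   access: load the aligned 64-bit word that contains it, then shift and
   mask. Little-endian byte numbering is assumed here. */
static uint8_t get_byte(const uint64_t *base, size_t n)
{
    uint64_t word  = base[n / 8];            /* one aligned quadword load */
    unsigned shift = (unsigned)(n % 8) * 8;  /* bit position of the byte  */
    return (uint8_t)(word >> shift);         /* shift down and truncate   */
}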


> Which, as I point out elsewhere, is almost certainly in HOL. High speed
> I/O is most often via some variant of DMA.

Photoshop is not a low-level device driver. Most of a device driver is
now in a higher-level language, with only the very lowest level in
assembler, where you only do a few instructions.


> Assembler allows the designer to carefully optimize every tiny step and
> if desired model every machine cycle, etc.

"allows" is the keyword. The problem is that it requires you have
intimate knowledge of the whole architecturure to know what combinations
of OP codes you can use and in what order and know how the CPU will
pipeline them into different instriction prefetch etc.

When you are dealing with simplerembedded device CPUs, life is much
simpler and you focus on optimizing logic because every assembvler
instruction is executed sequentially anyways.


JF Mezei

unread,
Nov 14, 2020, 10:52:32 PM11/14/20
to
On 2020-11-14 18:26, Lewis wrote:

> Eventually. Honestly, "net year" is lightning speed for Adobe.

Adobe would have been aware of Apple's move to ARM for some time, well
before the WWDC announcement. Adobe has also ported Photoshop to iOS/iPadOS,
so some of that work may already have been done.

What surprises me is that Photoshop was mentioned, not Premiere or After
Effects, which are far more complex software in terms of hardware
interaction because they, at least on Wintel with CUDA GPUs, will make
use of the GPUs a lot.


> Omni, all app on day one for M1. Affinity, all apps on M1 on one. Adobe
> and Microsoft? Eh, some day next year probably. Google? 2027 for
> anything not Chrome.


The keynote spoke at great length of its laptops for video and photo
editing. It didn't focus on those laptops being used for word processing
or email. If the keynote focuses on the very tasks handled by Adobe,
then if Adobe isn't there, it's a problem.

(Especially since at WWDC they demoed Adobe software running translated,
which shows they place importance on Adobe software.)


Lewis

unread,
Nov 15, 2020, 5:15:56 AM11/15/20
to
In message <2C1sH.110171$4d1....@fx09.iad> JF Mezei <jfmezei...@vaxination.ca> wrote:
> On 2020-11-14 18:26, Lewis wrote:

>> Eventually. Honestly, "net year" is lightning speed for Adobe.

> Adobe would have been aware of Apple's move to ARm for some time, well
> before WWDC announcement. Adobe has also ported Photoshop to IOS/iPAD
> OS, so some of that work may already have been done.

And yet, Adobe is slow. Adobe is always slow. Their sloth has nothing to
do with anything or anyone who is not Adobe.

> What surprises me is that Photoshop was mentioned, not Première or After
> Effects which are facr more complex software in terms of hardware
> interaction because they, at least on Wintel with CUDA GPSU will make
> use of the GPUs a lot.

Photoshop is Adobe's #1 product.

>> Omni, all app on day one for M1. Affinity, all apps on M1 on one. Adobe
>> and Microsoft? Eh, some day next year probably. Google? 2027 for
>> anything not Chrome.


> The keynote spoke at great length of its laptops for video and photo
> editing. It didn't focus on those laptops being use for workd processor
> or email.

I made zero mention of any "word processing or email" apps.

> If the keynote focuses on the very tasks handled by Adobe,
> then if Adobe isn't there, it's a problem.

It's a problem for Adobe, sure.

> (Especially since at WWDC they demoed Adobe software running translated,
> which shows they place importance on Adobe software.

It shows "Even Adboe software will run (because we all know how much of
a pain in the ass Adobe software is, don't we?)"




--
I'd like to move us right along to a Peter Gibbons. Now we had a
chance to meet this young man, and boy that's just a straight
shooter with upper management written all over him.

nospam

unread,
Nov 15, 2020, 8:09:37 AM11/15/20
to
In article <2C1sH.110171$4d1....@fx09.iad>, JF Mezei
<jfmezei...@vaxination.ca> wrote:

> Adobe would have been aware of Apple's move to ARm for some time, well
> before WWDC announcement.

no

> Adobe has also ported Photoshop to IOS/iPAD
> OS, so some of that work may already have been done.

also wrong.

the ios version of photoshop is not in any way a port.

> What surprises me is that Photoshop was mentioned, not Première or After
> Effects which are facr more complex software in terms of hardware
> interaction because they, at least on Wintel with CUDA GPSU will make
> use of the GPUs a lot.

photoshop has massive name recognition. just about everyone has heard
of it. it's even become a verb.

Alan Browne

unread,
Nov 15, 2020, 11:56:59 AM11/15/20
to
On 2020-11-14 22:46, JF Mezei wrote:
> On 2020-11-14 15:50, Alan Browne wrote:
>
>> Well, that is the point. Hand code in assembler so that every possible
>> instruction cycle is optimal.
>
> Jave you coded on Alpha? Have you coded on Itanium ? VAX ?

Irrelevant to x86 / ARM transition and, read the news: Alpha is OLD and
dead technology.

The most recent Itanium iteration had no improvement made to it other
than clock speed. And that was in 2017. That is the end of that
product. It's dead Jim.

Stop bringing it up.

And yes, I've coded on VAX VMS, though exclusively in HOL; I had to
understand the stack operations to integrate across languages (calling
Fortran S/Rs from Pascal mainly). I've looked at the PDP-11 and VAX
instruction sets and there was nothing especially daunting about them -
indeed quite friendly as I recall.

> The Macro (VAX assembly langage) *compiler* for IA64 produced
> faster/more efficient code than hand coding native IA64 assembler
> becauise the LLVM/compilets on Ianiium spent the time to order and block
> the operations properly to allow Itanic chips to run fast.

For lazy assembler writers, sure. Don't forget, at heart, the compiler
writes machine instructions that are entirely expressible in assembler.
Thus a good assembler programmer would do fine.

In reality, writing assembler is too expensive (man hour cost) to
warrant it in most cases. So good optimizing compilers are more than
good enough and will do the most useful optimizations most often.
> Doing the work native on IA64 would require you not only translate yoru
> idea into individual opcodes, but also know what type of operations to
> do in what order and insert the IA64 specific operations to tell the
> chip which operatiosn depnds on which one.

As I've explained several times, writing good efficient code is the goal
of writing assembler. There's a big man-hour cost in design, coding,
testing and de-bugging not to mention long term life cycle costs. And
of course it's less portable than HOL.

>> Adding to that you have no clue what translators may be doing to
>> optimize code.
>
> Apple has given sufficient hints on Rosetta2, namely that all system
> calls and linked to a special library that accepts the call with Intel
> argumant passing mechanism and then issue corresponding call to the
> "real" routine with ARM argument passing standard.

It's JIT translation as well as translate on install so code is
translated a single time in install cases.

QUOTE
Rosetta 2 can convert an application right at installation time,
effectively creating an ARM-optimized version of the app before you’ve
opened it. (It can also translate on the fly for apps that can’t be
translated ahead of time, such as browser, Java, and Javascript
processes, or if it encounters other new code that wasn’t translated at
install time.) With Rosetta 2 frontloading a bulk of the work, we may
see better performance from translated apps.
ENDQUOTE
https://www.theverge.com/21304182/apple-arm-mac-rosetta-2-emulation-app-converter-explainer


And finally, developers who want to keep up, will be recompiling for M1
- if they haven't already.

(Don't bring up Adobe. We know).

>
> This means the translator really only translates existing function
> without undestanding what it does. Optimizing is much easier at higher
> level because the concept of the loop etc are understood by the compiler.

You don't know how sophisticated the translator is. I suspect it has
some very, very clever tricks up its sleeve.

>> For example they could easily decide to remove a bunch
>> of pushes onto the stack because more registers are available than the
>> original target.
>
> A language like Postscript is based on a stack, as is a reverse polish
> calculator. That stack is part of the logic, not just some means to deal
> with shortage of registers. A translator cannot know if you are using
> the stack as temporary storage or whether it is an integral part of your
> logic. The translator must maintain functionality.

I was referring to stack machine call conventions for parameter passing
and saving return addresses, registers, etc. You're referring to an
implementation abstraction.

An RPN calculator, in HOL code, emulates a calculator stack in vars
(usually a linked list of some kind). This is not the same as the
machine stack, but instead an abstraction of a calculator stack usually
implemented in a HOL such as C, Fortran, Pascal, etc.

And yes, if it recurses (as it should) then the machine stack is used
for that, but the abstraction of the RPN is not on the machine stack.
It is in program memory (Data segment, not stack segment). It could
also use allocated memory if the stack is extremely deep (ie: not likely
to be done but is certainly "doable" for the exercise). This is
typically in the "extra segment" (_x86 speak) and based off of that
register pointer.

Postscript would also implement a stack structure (linked list probably)
to save and restore states through the document.

> A compiler generating code for x86 may decide to use stack mechanism to
> store a value, and the same code, with teh same compiler targetting ARM
> may use a register. But that is a decision made by compiler who
> understand the desired goal of the sourcxe code.

You don't need to know the "goal" only that a particular stack located
variable can instead be put into a register. That saves a push/pop and
more importantly is much faster than a memory located var. Going from
the 16x64b registers of the x86 to the 29 x 64b (available) registers of
the ARM will afford a lot of opportunity for the translator to do the same.

Even my _x86 Pascal compiler does this. Example:

.globl  _P$INVENBAL_$$_GETFILERECPOINTER$WORD$$PRECPTR
_P$RUNFILM_$$_GETFILERECPOINTER$WORD$$PRECPTR:
# Temps allocated between rsp+0 and rsp+56
# [733] BEGIN
        pushq   %rbx
        pushq   %r12
        pushq   %r13
        pushq   %r14
        pushq   %r15
        leaq    -64(%rsp),%rsp
# Var v located in register r14w
# Var $result located in register rax
# Var TLim located in register xmm0
# Var T located in register xmm0
# Var i located in register eax
# Var R located in register eax
# Var Rc located in register eax
# Var TCount located in register eax
# Var Gr located in register r12b
# Var State located in register r15b
# Var found located in register r13b

ie: all these vars are usually stack located. Now (with the right
switch), they are register located. Two of those are pointers and that
makes for extraordinary speed improvements in accessing and processing
data, esp. with double indirect operations.

The ARM will just add 13 more registers for such optimization!

> A translator of already compiled binary code doesn't. If it sees use of
> stack, it doesn't know whether it was meant as temporary storage, or if
> it was truly meant as a LIFO storage logic desired by the program.

A well-designed translator can optimize to the point where the resulting
ops are faster by more than the cost of the translation itself.

>>> This is why Digital died.
>>
>> Yep. They are dead. Get over it.
>
> Not sure where you got that quote, bit is was not me who said this in
> this thread.

<whoosh>

>
>
>> Not at all. I probably have more experience with hardware and assembler
>> on a range of processor types than pretty much everyone here
>
> Simple "embedded device" processors tend to not have very fancy logic
> and it is straighforward to code for them. Once you get into high
> performnce processors (or the Itanic where Intel tried high
> performance), it gets verry messy because of how a processor reacts to
> instructions.

No. It's just expensive for the vast majority of applications so a
compiler is used.

> When you code for a processor that has faults when you tru to access
> memory that isn't quadword aligned (64 bits), your fancy assembler code
> that ran well on a less complex CPU suddently runs like molasses even
> though that processor is supposed to be high performance. This is
> something that won't happen with higher level labguage because the
> copiler and LLVM know to align all memory access to a quadword to avoid
> this and this is done automatically for you. so if you need the 3rd
> byte, it will fetch 8 bytes from memory into a register and do the
> sfifts to get the byte you want to avoid the memory fault.

Properly written and tested assembler code will avoid such faults.
Further, align pragmas are very much present in assembler (and have been for
a very long time going back) if one wants speed over storage. Just
design trade-off decisions.

[AAA]
>>Mezei wrote, and snipped in an attempt to dodge the facts:
>>> In the case of OS-X, much already exists from IOS, so much could be
>>> re-used, but there is still code needed for the new
>>> thunderbolt/USB-4 drivers and all the variants attached to it,
>>> including ethernet >>>drivers attached to the new thunderbolt/USB-4
>>> IO interface.

>> Which, as I point out elsewhere, is almost certainly in HOL. High speed
>> I/O is most often via some variant of DMA.
[See AAA above]
>
> Photoshop is not low level device driver. Most of a device driver is

I was replying to your I/O points that you snipped out and that I
restored above.[AAA]. Again: device drivers are less and less in
assembler and more and more HOL.

> now at higfher level labguage with only the very lowwest level in
> assembler where you only do a few instructions.

As also explained to you a couple of times, image processing means operating
on very large integer arrays. So assembler is a great way to do some
functions on such data.

Going from x86_64 to ARM with 29 available 64b registers will seem like
a gift from heaven to Adobe for such processing and will help it blaze.
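
For a sense of what that processing looks like, here is a minimal C
sketch of the kind of tight integer loop over pixel data being discussed
(compilers will typically vectorize it; a hand-tuned assembler version
targets the same operation). The routine and its parameters are made up
for illustration, not taken from any real product:

#include <stddef.h>
#include <stdint.h>

/* Representative of the work being discussed (large integer arrays),
   not Adobe's actual code: scale every 8-bit pixel by num/den and clamp. */
static void apply_gain(uint8_t *pixels, size_t count,
                       unsigned num, unsigned den)
{
    for (size_t i = 0; i < count; i++) {
        unsigned v = (pixels[i] * num) / den;        /* scale            */
        pixels[i] = (uint8_t)(v > 255 ? 255 : v);    /* clamp to 8 bits  */
    }
}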

>> Assembler allows the designer to carefully optimize every tiny step and
>> if desired model every machine cycle, etc.
>
> "allows" is the keyword. The problem is that it requires you have
> intimate knowledge of the whole architecturure to know what combinations
> of OP codes you can use and in what order and know how the CPU will
> pipeline them into different instriction prefetch etc.

A good programmer should have such knowledge - as I've pointed out
elsewhere.

>
> When you are dealing with simplerembedded device CPUs, life is much
> simpler and you focus on optimizing logic because every assembvler
> instruction is executed sequentially anyways.

Even the simplest microcontrollers these days have optimizations and
it's entirely independent of the language used.

Assembler is not worth the cost in 99.99% of cases. Adobe, who cater to
a huge audience including very high-end photography and marketing
departments, want to not only do the most, but do it fast. So their core
processes are worth the man-hour investment - which is part of the
relatively high price we pay for many Adobe products.

Your Name

unread,
Nov 15, 2020, 3:05:07 PM11/15/20
to
On 2020-11-15 10:15:54 +0000, Lewis said:
> In message <2C1sH.110171$4d1....@fx09.iad> JF Mezei
> <jfmezei...@vaxination.ca> wrote:
>> On 2020-11-14 18:26, Lewis wrote:
>>>
>>> Eventually. Honestly, "net year" is lightning speed for Adobe.
>>
>> Adobe would have been aware of Apple's move to ARm for some time, well
>> before WWDC announcement. Adobe has also ported Photoshop to IOS/iPAD
>> OS, so some of that work may already have been done.
>
> And yet, Adobe is slow. Adobe is always slow. Their sloth has nothing to
> do with anything or anyone who is not Adobe.
>
>> What surprises me is that Photoshop was mentioned, not Première or After
>> Effects which are facr more complex software in terms of hardware
>> interaction because they, at least on Wintel with CUDA GPSU will make
>> use of the GPUs a lot.
>
> Photoshop is Adobe's #1 product.
<snip>

Depends on how you 'calculate' that. In terms of dollar-sales value,
Photoshop could be their #1 product, but in terms of users it's more
likely the free Adobe Acrobat Reader.



JF Mezei

unread,
Nov 16, 2020, 9:47:48 AM11/16/20
to
On 2020-11-15 11:56, Alan Browne wrote:

> Irrelevant to x86 / ARM transition and, read the news: Alpha is OLD and
> dead technology.

Intel's x86 has "alphas Inside". AMD has "Alphs Inside". Apple's Ax
chips have Alpha inside. Don't underestimate the silent controbution it
made when it was given to varous groups and when engineers split off so
they woudln't be forced to wrok for Intel when HP "donated" them to Intel.

Multi-code memory controllers were first developped by Digital on Alpha.
Out of order execution, branch predictions done by DEC.

> The most recent Itanium iteration had no improvement

While it failed, the Itanic is the perfect example of a platform where
writing in assembler was strongly dissuaded, because the CPU expects the
compiler to do the optimizations that the CPU itself doesn't do. (The E in EPIC
was because the compiler had to explicitly add opcodes to tell the
CPU about blocks that could be executed in parallel.)

Apple is very opaque when it comes to how it implemented the ARM
architecture. Not knowing the details of how it is implemented means you
cannot manually optimize any assembler you write.


> than clock speed. And that was in 2017. That is the end of that
> product. It's dead Jim.


The Itanic died well before that. HP paid Intel to keep it on life support
because La Carly made the mistake of signing on to hard long-term
commitments. So what happened is that the promised iterations were
spread over the remaining 15 years of that 20-year commitment. The last
one was, like the commitment for a last Alpha, the same chip but the
better-quality ones pushed to a higher clock rate. (EV7z were EV7 Alphas
that passed tests at higher clock rates.)



> And yes, I've coded on VAX VMS, though exclusively HOL

VAX was very, very CISC, but because of that, it was also very simple in
terms of how it processed instructions. This is why Digital moved to
RISC, where it could then work on simpler instructions that could easily
be optimized for huge efficiency gains. Those optimizations are what
make coding in assembler a lot harder if you want to be efficient.



> For lazy assembler writers, sure. Don't forget, at heart, the compiler
> writes machine instructions that are entirely expressible in assembler.
> Thus a good assembler programmer would do fine.

The compiler and now LLVM are intimate with the target CPU and will do
optimizations that an organized assembler program won't do, because it
would become unreadable with operations in a different order than your
logic requires.


> As I've explained several times, writing good efficient code is the goal
> of writing assembler.

This is old thinking, because CPUs have gotten very complex under the
hood in how they process opcodes. Assembler is required for very low
level stuff (drivers, hardware interfaces) and if you need to use
instructions such as encryption/compression that are in the CPU but not
accessible from a high-level language (and at that, the code snippet is
very short, taking arguments, issuing the opcode and then returning).



> QUOTE
> Rosetta 2 can convert an application right at installation time,
> effectively creating an ARM-optimized version of the app before you’ve
> opened it.


The binary being translated has been optimized for x86. Rosetta 2 will
translate to replicate x86 operations with one or multiple ARM
instructions. But it will not optimize, reorder or regroup opcodes to
make full use of how Apple implemented ARM.

This is why an application compiled for ARM will outperform the same
application compiled for x86 and then translated to ARM.


> I was referring to stack machine call conventions for parameter passing
> and saving return addresses, registers, etc. You're referring to an
> implementation abstraction.

At run time, the CPU does not know if you are pushing something on the
stack for the purpose of building an argument list for a subsequent call
to a subroutine (which, for it, is a mere "branch") or whether you are
pushing something on the stack because you are implementing a
LIFO buffer in your logic which must not be tinkered with.

The CPU has no idea when you will be popping items from the stack, nor
does it know how many items will be popped from the stack when the next
subroutine is called.

And do not forget that calling any system service will get you to a
jacket subroutine which will process your Intel-format argument list,
build an ARM-format argument list and then call the system service for
you. So the translator cannot willy-nilly change pushes and pops into
register storage operations.



> An RPN calculator, in HOL code, emulates a calculator stack in vars
> (usually a linked list of some kind). This is not the same as the
> machine stack, but instead an abstraction of a calculator stack usually
> implemented in a HOL such as C, Fortran, Pascal, etc.

An RPN calculator written in assembler will make full use of the stack.
And the various "stack" operations available from higher-level languages
will make use of it as well.

> Properly written and tested assembler code will avoid such faults.
> Further, align pragmas are very present in assembler (and have been for
> a very long time going back) if one wants speed over storage. Just
> design trade decisions.

Align pragmas align a variable. But if you need the 3rd byte of a 4-byte
variable, your assembler will be quite different if you run on a machine
that lets you access any byte vs. one that requires only
quadword-aligned accesses (at which point, you load the value into a register
and do shifts to get your 3rd byte).

> I was replying to your I/O points that you snipped out and that I
> restored above.[AAA]. Again: device drivers are less and less in
> assembler and more and more HOL.

And for exactly the reasons I mentioned. What is left is truly the very
low level stuff.

> A good programmer should have such knowledge - as I've pointed out
> elsewhere.

So I take it that you are like nospam and lewis and claim to have
intimate knowledge of how Apple implemented ARM instructions, what sort
of logic its CPUs have in terms of instruction processing, pre-fetching,
parallel instruction decoding, pre-loading of values, branch prediction, etc.?


> Assembler is not worth the cost in 99.99% of cases. Adobe, who cater to
> a huge audience including very high end photography and marketing
> departments want to not only do the most, but do it fast.

Adobe products have to respond to a large variety of variants even
within the 8086 family. They likely have a high-level language
implementation of the logic, but if running on a CPU that supports
instruction Z, will instead branch to a small assembler routine that
uses the opcode available on that CPU.

And I have to assume that Apple provided that info to Adobe under NDA a
long, long time ago.


nospam

unread,
Nov 16, 2020, 10:43:59 AM11/16/20
to
In article <miwsH.528513$RY8.4...@fx48.iad>, JF Mezei
<jfmezei...@vaxination.ca> wrote:

>
> Intel's x86 has "alphas Inside". AMD has "Alphs Inside". Apple's Ax
> chips have Alpha inside.

nonsense.

Lewis

unread,
Nov 16, 2020, 10:48:10 AM11/16/20
to
In message <miwsH.528513$RY8.4...@fx48.iad> JF Mezei <jfmezei...@vaxination.ca> wrote:
> Intel's x86 has "alphas Inside". AMD has "Alphs Inside". Apple's Ax
> chips have Alpha inside.

PLEASE take your meds.

--
I had the weirdest dream last night where I was a chocolate doughnut and you
were a steam roller painted pink. What do you think it means?
I think it means you were asleep.

Alan Browne

unread,
Nov 16, 2020, 2:01:40 PM11/16/20
to
On 2020-11-16 09:47, JF Mezei wrote:
> On 2020-11-15 11:56, Alan Browne wrote:

> Was was very very CISC, but because of that, was also vbery simple in
> terms of how it processed instructions. This is whty Digital moved to
> RISC where it could then work on simpler instructions that could easily
> be opto9mied and huge efficiency gains. Those optimizations is what
> makes coding in Assember a lot harder if you want to be efficient.

Writing assembler in RISC is just longer. Not harder. And yes, the
goal is always to be efficient (speed).

The goal (in the day) of RISC was higher performance at lower machine cost.

And, as mentioned, the ARM-64 is pretty CISCy, as most formerly RISC
processors have become as they evolved over time.

Get over it. Your comparisons to a dead era are meaningless.

>
>
>> For lazy assembler writers, sure. Don't forget, at heart, the compiler
>> writes machine instructions that are entirely expressible in assembler.
>> Thus a good assembler programmer would do fine.
>
> The compiler and now LLVM are intimate with the target CPU and will do
> optimiuzations that a organized Assembler program won't do because it
> would become inreadable with operations i different order than your
> logic requires.

Out of order ops are equally implementable by writing assembler. If you
mean predictive branching, then it's entirely irrelevant.

>> As I've explained several times, writing good efficient code is the goal
>> of writing assembler.
>
> This is old thinking because CPUs have gotten very complex under the
> hood on how the process opcodes. Assembler is required for very low

No it is not old thinking. You resort to assembler because the compiler
is not doing as well as hand written machine code.

> level stuff (drivers, hardware interface) and if you need to use
> instructions such as encryptioN/copmpression tyat are in the CPU but not
> accessible from high level language. (and at that, the code snippet is
> very short, taking argumenst, issuing the opcode and then returning).

And (again) most drivers are actually in HOL. Not assembler. Because
of all sorts of needs from portability to maintenance and simply because
there is no difference in the HOL code for simple things like
reading/writing ports/mem-mapped i/o.

The rest of what you write is simple nonsense.

>> QUOTE
>> Rosetta 2 can convert an application right at installation time,
>> effectively creating an ARM-optimized version of the app before you’ve
>> opened it.
>
>
> The binary being translated has been optimised for x86. Rosetta 2 will
> translate to replicate x86 operations with a or multiple ARM
> instructions. But it will not optmize, reorder or regroup opcodes to
> make full use of how Apple implemented ARM.

You have no clue at all how Rosetta 2 works. Do you believe that Apple
are working at your coarse level of unpracticed understanding or that
perhaps they are making R2 really shine with a sophisticated conversion
process - one that takes place at install time especially?

For most implementations the translation will be a one time (on install
of the x86 code) event. That one time event allows for all sorts of
cleverness.

>
> This is why an application compiled for ARM will outperform the same
> appliction compiled for x86 and then translated to ARM.

Almost always. But the difference between the compiled as ARM case and
translated case will be far less than you assume.

>
>> I was referring to stack machine call conventions for parameter passing
>> and saving return addresses, registers, etc. You're referring to an
>> implementation abstraction.
>
> At run time, the CPU does not know if you are pushing something on the
> stack for the purposes of building an argument list for subsequent call
> to subroutine (which for it, is a mere "branch") or whether you are
> pushing something something on the stack because you are implementing a
> LIFO buffer in your logic which must not be thinkered with.

You don't seem to understand that the CPU stack is not the stack used in
the abstraction of a thing like an RPN calculator.

> the CPU has no idea when you will be popping items from the stack, nor
> does it know how many items will be popped from the stack when the next
> subroutine is called.

Again: implementation of RPN stack abstraction ≠ CPU stack.

>
> And do not forget that calling any system service will get you to a
> jacket subroutine which will process your Intel-format argument line,
> built an ARM formal argument list and then call the system service for
> you. So the translator cannot willy nilly change push and pops into
> register storage operations.

You're so off on understanding how such can be implemented that it
doesn't bear reply.

>> An RPN calculator, in HOL code, emulates a calculator stack in vars
>> (usually a linked list of some kind). This is not the same as the
>> machine stack, but instead an abstraction of a calculator stack usually
>> implemented in a HOL such as C, Fortran, Pascal, etc.
>
> A RPN calculator written in Assembler will make full use of the stack.
> And the varous "stack" operations available from highler level languages
> will make use of them as well.

You do not understand the difference between the implementation
abstraction of the calculator "stack" and the CPU processor stack. They
are two separate entities. Of course the latter will be used, esp. as
any such calculator will likely use recursion, but the abstraction of a
calculator stack would not use the processor stack to abstract the RPN.

eg: in a HOL the RPN stack for a 16-deep RPN could be represented as:

ValStack = record
  vals: array[0..15] of double;  { the stack entries }
  ptr: shortint;                 { index of the last value pushed }
end;

The pointer just keeps track of the last thing thrown to the stack
(initial ptr value 15 or 0 depending on how you want to do it). The
procedures that operate the RPN would of course have to change that
pointer and bound it and handle exceptions. (One could have an open
ended "stack" using a linked list instead at the cost of more handling).

And then operations would operate on that. Function calls would of
course use the CPU stack, but the above record would be in local memory
or the var section of that procedure (itself located on the stack).

>> Properly written and tested assembler code will avoid such faults.
>> Further, align pragmas are very present in assembler (and have been for
>> a very long time going back) if one wants speed over storage. Just
>> design trade decisions.
>
> Align pragmas align a variable. But if you need the 3rd byte of a 4 byte
> variable, your assembler will be quite different if you run on a machine
> that lets you access any byte, vs one that required only
> quadword-aligned accesses. (at which point, you load value in register
> and do shifts to get your 3rd byte).

The point of an align pragma is so the next declared variable begins on
the align divisor address. Properly accessing that variable results in
a correct store or load. Most assemblers will throw a warning if you
load, eg: a byte, from a variable declared as a word (most notations do
allow for accessing a sub byte of a longer var, but that has to be done
explicitly to avoid a warning).

Every processor I've worked on does the shifting automatically (at the
cost of a few machine cycles) if the word is not aligned.

Thus, esp. for stack located variables it makes sense to align as that
memory is temporarily allocated to that function or procedure so there's
no real memory "cost" to it.

Global variables are a different matter and lead to the programmer
optimizing the order of variables, typically the largest (records of n
bytes, then 8 byte, then 4....1) in groups to keep the large vars
aligned, and to minimize memory unused due to align pragmas.
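
As a concrete C illustration of that ordering point (sizes assume a
typical 64-bit ABI; the field names are invented for the example), the
two layouts below hold the same data but differ in size purely because
of alignment padding:

#include <stdint.h>
#include <stdio.h>

/* Poorly ordered: padding is inserted after the small fields so the
   8-byte members stay 8-byte aligned. */
struct bad {
    uint8_t  flag;     /* 1 byte + 7 bytes of padding  */
    uint64_t total;    /* 8 bytes                      */
    uint16_t count;    /* 2 bytes + 6 bytes of padding */
    double   average;  /* 8 bytes                      */
};                     /* typically 32 bytes           */

/* Largest members first: same data, less padding. */
struct good {
    uint64_t total;
    double   average;
    uint16_t count;
    uint8_t  flag;     /* up to 5 bytes of tail padding */
};                     /* typically 24 bytes            */

int main(void)
{
    printf("bad: %zu bytes, good: %zu bytes\n",
           sizeof(struct bad), sizeof(struct good));
    return 0;
}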

IOW, you're throwing up nonsense. Again.

>
>> I was replying to your I/O points that you snipped out and that I
>> restored above.[AAA]. Again: device drivers are less and less in
>> assembler and more and more HOL.
>
> And for exactly the reasons I mentioned. What is left is truly the very
> low level stuff.

Not in Adobe's case it would appear. They are doing some core
functionality in assembler in order to have higher performance.
>
> =
>> A good programmer should have such knowledge - as I've pointed out
>> elsewhere.
>
> So I take it that you are like nospam and lewis and claim to have
> intimate knowledge of how Apple implemented ARM instriuctions, what sort
> of logic its CPUs have in terms of instruction processing, pr-fetching,
> parralel instruction decoding, pr-loading of values, branch prediction etc?

I have intimate knowledge of real time programming in various assembler
languages, though not ARM. Looking at the ARM architecture there is
nothing esp. daunting, and of course going forward from an architecture
with 16 general purpose x 64b registers to one with 29 x 64b GP
registers doesn't make anything harder - quite the opposite.
>
>
>> Assembler is not worth the cost in 99.99% of cases. Adobe, who cater to
>> a huge audience including very high end photography and marketing
>> departments want to not only do the most, but do it fast.
>
> Adobe products have to respond to a large variety of variants wvene
> within the 8086 family. They likely have high level labnguage
> implementation of logic, but if running on a a CPU that supports
> instriction Z, will instead branch to a small assembler routine that
> uses the opcode available on that CPU.

More likely to use blocks of code tailored to each processor variant.
And I doubt it's granular to all possible variants but instead to the
more useful ones.
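
Something like that dispatch can be sketched in C: pick an implementation
once, based on a CPU-feature check. The __builtin_cpu_supports() call is
a GCC/Clang x86 extension used here purely for illustration, and the
"tuned" routine is a stand-in for what would really be assembler or
intrinsics:

#include <stddef.h>
#include <stdint.h>

/* Generic C fallback that works on any CPU. */
static void add_arrays_generic(uint32_t *dst, const uint32_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] += src[i];
}

/* Stand-in for a hand-tuned variant (in practice assembler or
   intrinsics); kept as the same plain loop so the sketch stays
   self-contained. */
static void add_arrays_tuned(uint32_t *dst, const uint32_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] += src[i];
}

/* Function pointer selected once, based on what the CPU we landed on
   can actually do. */
static void (*add_arrays)(uint32_t *, const uint32_t *, size_t) =
    add_arrays_generic;

static void select_implementation(void)
{
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    __builtin_cpu_init();                      /* GCC/Clang extension */
    if (__builtin_cpu_supports("avx2"))        /* feature check       */
        add_arrays = add_arrays_tuned;
#endif
}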

> And I have to assume that Apple provided that info to Adobe under NDA a
> long long time go.

No need other than special arrangements they may have. The guidance on
ARM programming for Apple iOS, tvOS and Mac OS is openly available.

Conclusion: you don't know much about machine level programming. Your
silly RPN example is clear evidence of that (can't discern between the
CPU stack and an RPN program stack abstraction).

The M1 is blazing fast for a variety of reasons from the CPU design
(always getting better) to an optimized use of memory ("unified")
amongst key elements of the processor to the operating system taking
advantage at an intimate level (and CPU designed for the OS) to Rosetta
2 most likely doing pretty deep optimization on emulated code.

And I expect that the M2 and on will be quite fantastic, with even higher
core counts appropriate to iMacs, higher-end minis and laptops, and "pro"
machines.

I'm done. Surprised I've spent so much time on this, actually.

JF Mezei

unread,
Nov 16, 2020, 5:25:51 PM11/16/20
to
Intel stole Alpha patents in the mid-1990s. Digital won the lawsuit in 1997,
but instead of punishing Intel, as Digital was in the process of
downsizing itself so Compaq could buy it, it formally donated all Alpha
IP to Intel and committed its Digital Unix port to Itanium (which was still
vapourware).

Alpha development continued with EV6, and EV7 and EV8 were planned. When HP's
La Carly needed a distraction to postpone her being fired, she agreed
to buy Compaq (which, like Digital, had set itself up for sale). Part of
this included killing Alpha in June 2001. HP announced the purchase of
Compaq on Sept 7, 2001.

At that time, HP committed to completing EV7 and, instead of EV8, set the
target as EV7+. EV7 was delayed as much as they could because it would
have made Itanium look like a 1950s computer.

As part of the deal for killing Alpha and porting the Digital and Tandem
machines to Itanium, HP gave Alpha engineers to Intel (keeping enough to
finish EV7, and then they were also "donated" to Intel), as well as all
the intellectual property related to Alpha, in particular the high-end
memory controller which would become the CSI/QuickPath. Digital had had
experience with non-uniform memory access since the 1990s without much
success, so with EV7 it found the solution to make it work. This approach
is the basis for most memory controllers today.

As part of HP divesting itself from chip engineering, a lot of them
didn't go to Intel, but went instead to AMD, with a large group going
to PA Semi, which Apple purchased to make its own chips.



BTW, the 1997 Intel agreement also had Intel get Digital's Hudson fab,
which was still state of the art (but never upgraded, so abandoned), and
also Digital's StrongARM IP. The StrongARM is the chip that powered the
Newton, BTW.

Intel squandered StrongARM, in part because getting the Alpha IP gave it
a chance to push its 8086 further.

The aspect which was of value to Compaq and HP is that the "punishment"
Intel got for stealing Alpha IP was offering 8086s at low prices to
Digital, a deal inherited by Compaq and later HP. As I recall it was a
10-year deal.


JF Mezei

unread,
Nov 16, 2020, 6:53:24 PM11/16/20
to
On 2020-11-16 14:01, Alan Browne wrote:

> And, as mentioned, the ARM-64 is pretty CISCy as are most formerly RISC
> processors have evolved over time.

So can you please describe the various techniques Apple has implemented
in its cores with regard to instruction pre-fetching, pipelining, out
of order execution, predictive branching? From your discussion, you
appear to be fully comfortable with Apple's ARM cores, so it would be a
great contribution if you could describe them to help others who need to
write efficient assembler.


> Out of order ops are equally implementable by writing assembler.

And then try to get someone else to understand your source code.


The reason I brought up VAX to Alpha and Alpha to Itanium is exactly
because the chips evolved into a situation where normal assembler
writing will often not only not take advantage of run-time optimizations,
but can also slow things down if you do operations in an order that
forces pre-fetched instructions to be reset and started from scratch again.



> No it is not old thinking. You resort to assembler because the compiler
> is not doing as well as hand written machine code.

C, Fortran and COBOL on older platforms generated very efficient code.
On newer platforms, they tend to generate more efficient code due to
optimizations by the compiler.

C++ has a few more layers of abstraction, but can also generate very
efficient code, as can Apple's Swift. This is because the optimizations
done by the compiler and then by LLVM make maximum use of the chip, and
the old concept of only assembly being fast is gone.



> And (again) most drivers are actually in HOL. Not assembler. Because
> of all sorts of needs from portability to maintenance

Hardware drivers still need some assembler at the lowest level to interface
with the hardware. Higher-level drivers can be in C or other languages.

> You have no clue at all how Rosetta 2 works.

Again, consider that Rosetta 2 links the translated image against
special system frameworks that accept Intel format calls and transform
them to ARM format calls and then call the desired system routine.

And within the translated image, the code still calls internal routines
with Intel format/technique for argument passing.

This says a LOT about what Rosetta does and doesn't do. And there can't
be logic changes, because the translator needs to maintain the
environment expected by the rest of the code.



> Do you believe that Apple
> are working at your coarse level of unpracticed understanding or that
> perhaps they are making R2 really shine with a sophisticated conversion
> process - one that takes place at install time especially?


Rosetta 2 is a translator, not a recompiler. Yes, the translated image
runs natively as ARM code, but it maintains its Intel heritage for a lot
of stuff such as argument-passing mechanisms.




> For most implementations the translation will be a one time (on install
> of the x86 code) event. That one time event allows for all sorts of
> cleverness.

Contrary to previous ports, there isn't much abandonware left on OS X.
If you didn't compile for 64 bits, you're no longer present/supported on
the platform. So of the software that is available now, chances are it will
all be recompiled for ARM shortly. There isn't much of a point in making a
translator that does AI on the binary code to understand what it tries
to do and optimize it.

Consider also that OS X is ARM-native from day 1, contrary to the 68K to
PowerPC transition, which kept many parts as 68K until finally converted.
So there is less of a need to make a super-duper efficient translator.



> You don't seem to understand that the CPU stack is not the stack used in
> the abstraction of a thing like an RPN calculator.

The whole point of a CPU providing a stack is so it could be used. You
have no right to state that it is only used for argument passing.


> Again: implementation of RPN stack abstraction ≠ CPU stack.

Are you saying it is illegal to use the stack for such an app?
On which CPU/OS is it illegal?

> The point of an align pragma is so the next declared variable begins on
> the align divisor address. Properly accessing that variable results in
> a correct store or load.

You missed the point. Your variable may be aligned, but if the variable
is a single byte, the available assembly operation might force you to load 8
bytes into the register. So you then have to play with shifts within the
register to get the one byte placed in the least significant bits of the
register (and do the shifts with awareness of the sign implementation on
that platform).

With a compiler, this is done for you in the most efficient way, and the
compiler may decide to put many one-byte variables together so one load
from memory allows access to any of the variables from the register.


> Every processor I've worked on does the shifting automatically (at the
> cost of a few machine cycles) if the word is not aligned.

For the Itanic, it was an actual fault in the early versions. Heavy
performance penalty. The fault code would then pick up the 8 bytes and
extract what you needed from them for you. When Intel designed the chip,
they thought compilers would take care of this. But it turns out not all
computing is scientific, and the business applications on those large
machines tended to treat characters as strings and not numbers when
processing a client record. So even COBOL ended up generating terrible
performance.



> Not in Adobe's case it would appear. They are doing some core
> functionality in assembler in order to have higher performance.

I doubt there is much in assembler code. Likely very small routines that
make use of one opcode to encode/decode/compress/decompress a block, with
a high-level routine doing the same work sitting on the side for when the
app is executed on a CPU that doesn't have that extra instruction.

> I have intimate knowledge of real time programming in various assembler
> languages, though not ARM. Looking at the ARM architecture there is
> nothing esp. daunting, and of course going forward from an architecture
> with 16 general purpose x 64b registers to one with 29 x 64b GP
> registers doesn't make anything harder - quite the opposite.

It is how it is implemented that matters. The stuff one doesn't see,
such as instruction pre-fetching/decoding, pipelining, etc.

> Conclusion: you don't know much about machine level programming. Your
> silly RPN example is clear evidence of that (can't discern between the
> CPU stack and an RPN program stack abstraction).

Not surprising coming from you or your ilk such as nospam and lewis.


> The M1 is blazing fast for a variety of reasons from the CPU design
> (always getting better) to an optimized use of memory ("unified")

Oh come on now. "unified memory" is just marketing bullshit. Apple
hasn't confirmed the memory type or speed on the M1; it is only
speculation that it is the same as on the Intel model (the low-power
DDR4).

Real benchmarks should start popping up soon. Ones that last a few
minutes, enough to test thermal performance.



> And I expect that the M2 and on will be quite fantastic with even higher
> core counts appropriate to iMacs, higher end minis and laptops and "pro"
> machines.

Only time will tell how Apple will scale its Axx chips to the Mac line
and what will become of the Mac Pro. And only time will tell whether the
Apple Macs will be narrowed to specific uses for which benchmarks are
produced or whether they will remain, like the x86, for generic computing.

aka: chip perfect for viewing H.265 videos, but sucks at calculating pi
to 5 billion decimals.

nospam

unread,
Nov 16, 2020, 9:33:40 PM11/16/20
to
In article <ShEsH.225224$BL.8...@fx16.iad>, JF Mezei
<jfmezei...@vaxination.ca> wrote:



> > No it is not old thinking. You resort to assembler because the compiler
> > is not doing as well as hand written machine code.
>
> C, Fortran and COBOL, on older platforms, generated very efficient code.
> On newer platforms, they tend to generate more efficient code due to
> optimizations by the compiler.

hah, no.

> C++ has a few more layers of abstractions, but can also generate very
> efficient code, as with Apple's Swift. This is because the optimizations
> done by the compiler and then by LLVM make maximum use of the chip, and
> the old concept of only assembly being fast is gone.

c++ has a lot more info for the compiler to be able to optimize than c,
fortran or cobol.

optimized cobol. that's really funny.

> > And (again) most drivers are actually in HOL. Not assembler. Because
> > of all sorts of needs from portability to maintenance
>
> hardware drivers still need some assembler at the lowest level to interface
> with the hardware.

no they don't.

> Higher level drivers can be in C or other.

that part is true.



>
> Consider also that OS-X is ARM native from day 1, contrary to the 68K to
> PowerPC transition, which kept many parts as 68K until finally converted.
> So there is less of a need to make a super duper efficient translator.

68k apps ran faster on powerpc than they natively ran on 68k macs.

some intel apps will run faster on m1 than natively on intel. it's
still much too early to know just how many and by how much.

> > You don't seem to understand that the CPU stack is not the stack used in
> > the abstraction of a thing like an RPN calculator.
>
> The whole point of a CPU providing a stack is so it can be used. You
> have no right to state that it is only used for argument passing.
>
> > Again: implementation of RPN stack abstraction ≠ CPU stack.
>
> Are you saying it is illegal to use the stack for such an app?
> On which CPU/OS is it illegal?

whooooooooosh.

> > The point of an align pragma is so the next declared variable begins on
> > the align divisor address. Properly accessing that variable results in
> > a correct store or load.
>
> You missed the point. Your variable may be aligned, but if the variable
> is a single byte, the available assembly operation might force you to
> load 8 bytes into the register. So you then have to play with shifts
> within the register to get the one byte placed in the least significant
> bits of the register (and do the shifts with awareness of how
> signedness is implemented on that platform).

no.

> With a compiler, this is done for you in the most efficient way, and
> the compiler may decide to put many 1-byte variables together so one
> load from memory allows access to any of the variables from a register.

maybe, if there are enough to actually do that and it makes sense,
which it might not. typically it's padded.

> > Every processor I've worked on does the shifting automatically (at the
> > cost of a few machine cycles) if the word is not aligned.
>
> For Itanic, it was an actual fault on the early versions. Heavy
> performance penalty. The fault handler would then pick up the 8 bytes
> and extract what you needed from them for you. When Intel designed the
> chip, they thought compilers would take care of this. But it turns out
> not all computing is scientific, and the business applications on those
> large machines tended to treat characters as strings and not numbers
> when processing a client record. So even COBOL ended up generating
> terrible performance.

earlier, you said cobol was highly optimized. what happened?

> > Not in Adobe's case it would appear. They are doing some core
> > functionality in assembler in order to have higher performance.
>
> I doubt there is much in assembler code. Likely very small routines
> that make use of 1 opcode to encode/decode/compress/decompress a block,
> with a high level routine doing the same work sitting on the side for
> when the app is executed on a CPU that doesn't have that extra
> instruction.

another one of your uninformed assumptions.

with photoshop, key core image processing routines are *very* highly
optimized, to where they are tuned for different versions of processors
in the same family.



>
>
> > The M1 is blazing fast for a variety of reasons from the CPU design
> > (always getting better) to an optimized use of memory ("unified")
>
> Oh come on now. "unified memory" is just marketing bullshit. Apple
> hasn't confirmed the memory type or speed on the M1; it is only
> speculation that it is the same as on the Intel model (the low-power
> DDR4).

it very definitely is *not* marketing bullshit.

> Real benchmarks should start popping up soon. Ones that last a few
> minutes, enough to test thermal performance.

they already have been and are extremely impressive.

<https://www.macrumors.com/2020/11/15/m1-chip-emulating-x86-benchmark/>
Despite the impact on performance, the single-core Rosetta 2
score results still outperforms any other Intel Mac, including the
2020 27-inch iMac with Intel Core i9-10910 @ 3.6GHz.

> > And I expect that the M2 and on will be quite fantastic with even higher
> > core counts appropriate to iMacs, higher end minis and laptops and "pro"
> > machines.
>
> Only time will tell how Apple will scale its Axx chips to the Mac line
> and what will become of the Mac Pro.

how many times do you have to be told that the a* chips are for phones
and ipads, not for macs?

apple's announcement made this *very* clear by choosing an entirely new
letter, m, for the mac processors, at least initially.

it's likely that higher end apple silicon processors destined for the
mac pro and imac pro might get their own letter rather than an m-variant
(e.g., m1x), much like intel does with core i5/i7 and xeon.

nospam

unread,
Nov 17, 2020, 2:42:52 PM11/17/20
to
In article <2C1sH.110171$4d1....@fx09.iad>, JF Mezei
<jfmezei...@vaxination.ca> wrote:

> > Omni, all apps on day one for M1. Affinity, all apps on M1 on day one. Adobe
> > and Microsoft? Eh, some day next year probably. Google? 2027 for
> > anything not Chrome.
>
>
> The keynote spoke at great length about its laptops for video and photo
> editing. It didn't focus on those laptops being used for word processing
> or email. If the keynote focuses on the very tasks handled by Adobe,
> then if Adobe isn't there, it's a problem.
>
> (Especially since at WWDC they demoed Adobe software running translated,
> which shows they place importance on Adobe software.)

native photoshop will be a lot sooner than you might think.

<https://www.macrumors.com/2020/11/17/photoshop-apple-silicon-beta/>
We are excited to announce the first Beta release of Photoshop
running natively on Apple Silicon hardware!

Alan Browne

unread,
Nov 17, 2020, 8:04:34 PM11/17/20
to
On 2020-11-16 18:53, JF Mezei wrote:
> On 2020-11-16 14:01, Alan Browne wrote:
>
>> And, as mentioned, the ARM-64 is pretty CISCy, as most formerly RISC
>> processors have become as they evolved over time.
>
> So can you please describe

Don't even try games like that. It's clear your understanding of CPU
design is rooted in the 80s with buzzwords from the trade press ever since.