On 10/14/2023 8:48 PM, MitchAlsup wrote:
> On Saturday, October 14, 2023 at 1:17:44 PM UTC-5, BGB wrote:
>> On 10/14/2023 11:33 AM, Michael S wrote:
>>> On Friday, October 13, 2023 at 9:13:04 PM UTC+3, John Levine wrote:
>>>> According to John Dallman <j...@cix.co.uk>:
>>>>> Overall, Oracle's vertical integration plan for making their database
>>>>> work better on their own hardware was not a success. It turned out that
>>>>> Oracle DB and SPARC Solaris were already pretty well-tuned for each other,
>>>>> and there were no easy gains there.
>>>> What could they do that would make it better than an ARM or RISC V chip
>>>> running at the same speed? Transactional memory?
>>>>
>>>
>>> It seems, by the time of the acquisition (late 2009), it was already known
>>> internally that Rock (the SPARC processor with TM support) was doomed,
>>> although it was not announced publicly until the next year.
>>>
>> I also personally have difficulty imagining what exactly one could do in
>> a CPU that would give it much advantage for database tasks that would
>> not otherwise make sense with a general purpose CPU.
> <
> Bigger caches and a slower clock rate help databases but do not help general
> purpose. More, slower CPUs and a thinner cache hierarchy help. Less prediction
> helps too.
> <
> So, instead of 64KB L1s and 1MB L2s and 8MB L3s:: do a 256KB L1 and
> 8MB L2 with no L3s.
> <
> It does not matter how fast the clock rate is if you are waiting for memory to
> respond.
OK.
In my own use cases, 16/32/64K seemed near optimal, but I guess if the
working sets are larger, this could favor bigger L1s.
It had seemed like, at 32K or 64K, one tends toward a 98 or 99% hit rate.
In the past, my attempts to increase MHz tended to come at the cost of a
significant reduction in L1 cache size (mostly, the "magic" being to
make the caches small enough to fit effectively into LUTRAMs).
The disadvantage is that this offsets any gains from more MHz with more
cycles spent on cache misses.
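FWIW, the sort of back-of-the-envelope math I tend to use for this
trade-off looks something like the following (the hit rates, cycle
counts, and miss penalty here are all made-up illustrative numbers, not
measurements; which side wins depends entirely on what one plugs in):

/*
 * Effective time per memory access: smaller L1 at a higher clock vs.
 * a bigger L1 at a lower clock, assuming a fixed miss-service time in
 * ns (the DRAM does not speed up with the core clock).
 */
#include <stdio.h>

static double ns_per_access(double mhz, double hit_rate,
                            double hit_cycles, double miss_ns)
{
    double cyc_ns = 1000.0 / mhz;   /* ns per clock cycle */
    return hit_rate * (hit_cycles * cyc_ns) +
           (1.0 - hit_rate) * (hit_cycles * cyc_ns + miss_ns);
}

int main(void)
{
    /* hypothetical: tiny LUTRAM-friendly L1 @ 75MHz, ~95% hit rate;
       32K L1 @ 50MHz, ~99% hit rate; ~100ns to service a miss. */
    printf("small L1 @ 75MHz: %5.1f ns/access\n",
           ns_per_access(75.0, 0.95, 2.0, 100.0));
    printf("big   L1 @ 50MHz: %5.1f ns/access\n",
           ns_per_access(50.0, 0.99, 2.0, 100.0));
    return 0;
}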
I now have it "mostly working" (with the bigger caches), but still need
to determine whether "more MHz" is enough to offset the significant
increase in clock cycles spent on interlock / register-RAW stalls. It
still falls short of consistently passing timing, though (and also
currently lacks the low-precision FP-SIMD unit).
As-is:

Losses (for now):
  Disabled dedicated FP-SIMD unit (*2);
  Many more ops have higher latency;
  Dropped RISC-V support and 96-bit address space support for now;
  No more LDTEX instruction for now;
  Back to no FDIV or FSQRT ops;
  ...

Partial:
  Compare-Branch reduced to compare-with-zero only;
  For now, have disabled 128-bit ALU operations.
Running the ringbus and L2 cache at higher clock-speeds does seem to
have increased memory bandwidth (including for accessing external RAM).
*2: Pulling off single-precision operations with a 3-cycle latency at
75MHz still seemed to be asking a bit too much.
But, on the other side, dropping MHz to allow bigger L1s could also make
sense, if the code has a naturally higher L1 miss rate.
In my own use cases, 33 or 25 MHz wouldn't allow much gain over 50MHz,
apart from possibly allowing fully pipelined double-precision FPU
operations or similar.
>>
>> I would expect database workloads to be primarily dominated by memory
>> and IO.
> <
> More so memory and less so I/O once they got main memories in the TB region.
> Those TBs of main memory are used like the older index partitions on disks.
Hmm...
I still have 48GB; and (on average) the most RAM-intensive thing I tend
to do is running Firefox (well, except in rare cases where I try to
recompile LLVM and it brings my computer to its knees, *).
Seemingly, Firefox is a gas that expands to fill all available RAM
(until one periodically terminates the whole process tree and reloads
it).
*: Part of the reason I invested a lot more effort in BGBCC, which I can
rebuild in a few seconds without totally owning my PC. Never mind that I
frequently end up spending hours or days on long-running Verilator
simulations...
AFAIK, a lot of the servers were using 10K RPM SAS drives and similar.
>>
>> So, it seems like one would mostly just want a CPU with significant
>> memory bandwidth and similar. Maybe some helper operations for things
>> like string compare and similar.
>>
> Mostly irrelevant. You are waiting for the comparison values to arrive a lot
> more often than you spend performing the comparisons themselves. The
> whole B-Tree stuff used to be performed on the disks themselves, leaving
> the CPU(s) to do other stuff while the Tree was being searched.
?...
I hadn't heard of anything like this; none of the disk interfaces I am
aware of have presented much of an abstraction beyond that of 512B
sectors and linear block addresses.
RAID was typically multiple HDDs (with hardware error correction, ...),
but still presenting a "linear array of sectors" abstraction.
I am aware that older HDDs typically used C/H/S addressing. Early on,
one needed to set up the drive's geometry values exactly as printed on
the drive, or it wouldn't work.
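(As an aside, once drives presented the flat "array of sectors" view,
the mapping between the old C/H/S geometry and a linear block address
is just a bit of arithmetic over the drive geometry; a quick sketch,
with made-up geometry numbers:)

#include <stdio.h>

#define HEADS_PER_CYL    16u   /* example geometry, not from a real drive */
#define SECTORS_PER_TRK  63u   /* sectors are 1-based in CHS */

static unsigned chs_to_lba(unsigned c, unsigned h, unsigned s)
{
    return (c * HEADS_PER_CYL + h) * SECTORS_PER_TRK + (s - 1);
}

static void lba_to_chs(unsigned lba, unsigned *c, unsigned *h, unsigned *s)
{
    *c = lba / (HEADS_PER_CYL * SECTORS_PER_TRK);
    *h = (lba / SECTORS_PER_TRK) % HEADS_PER_CYL;
    *s = (lba % SECTORS_PER_TRK) + 1;
}

int main(void)
{
    unsigned c, h, s;
    unsigned lba = chs_to_lba(2, 5, 10);  /* -> 2340 with this geometry */
    lba_to_chs(lba, &c, &h, &s);
    printf("LBA=%u -> C=%u H=%u S=%u\n", lba, c, h, s);
    return 0;
}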
I guess, older still, apparently one had to keep the HDD and controller
matched, as a drive could not be read with a different type of
controller than the one that had initialized it (with different
controllers using different recording strategies, etc).
Apparently this was fixed with IDE drives, but AFAIK still existed with
floppy drives (but, with more standardization in terms of the recording
strategies used on floppy disks, since changing any of this would render
the floppies incompatible with those made on a different computer).
Though, admittedly, by the time I was using computers, they were mostly
already using IDE drives; but we still used floppies for a while. There
was also the wonkiness that early CD-ROM drives seemed to use
non-standardized cables and typically plugged into the soundcard rather
than using IDE.
It wasn't until after I was an adult that things moved over to SATA.
When I saw videos of early HDDs, I guess the difference was that they
were a lot bigger (roughly the same footprint as a CD-ROM drive but
twice as tall), and also connected with a pair of ribbon cables (in
addition to the power cable).
Well, and I guess there was some sort of older HDD tech where the HDD
was the size of a washing machine and had a lid that could be opened to
access a stack of platters.
Also saw a video where someone was trying to interface an 8-inch floppy
drive with a newer PC, but then had to deal with the issue that
apparently these drives ran on 110VAC / 60Hz and used 24V data signals
rather than 5V or 3.3V, ...
I guess someone could maybe get creative and mount one sideways in a
full-tower PC case?...
Dude: "Hey, check out my rig!", *kerchunk*, proceeds to pull out an 8
inch floppy... Then maybe rig up a power-bypass, and voltage leveling
stuff to try to plug it into a USB based floppy controller, maybe so
they could store data on it.
Or, maybe build one using some slightly newer tech, but still use 8-inch
floppies, to make "absurdly large" floppies (if one could achieve
similar recording density to a 3.5" floppy, they could fit significantly
more data on an 8" disk).
Then again, practically, this makes about as much sense as trying to
stick digital data onto a vinyl record...
Then again, it seemed like a lot of the Gen Z people gained an interest
in vinyl records; which for me seemed a little odd as this was old tech
even by my standards...
Waves cane, "Back in my day, we had CDs and MP3s!".
>>
>> Though, assuming the B-Trees were structured in an appropriate way,
>> things like "packed search" / "packed compare" instructions could be useful.
> <
> Why do you think this is a job for a CPU ??
I hadn't heard of there being anything else to run the B-Tree walks...
It seems like one would at least need a mechanism for moderately
efficient "strncmp()" and similar, though...