Announcing Mecrisp-Quintus for RISC-V

Matthias Koch

Mar 4, 2018, 10:16:02 PM

I am happy to announce Mecrisp-Quintus, a port of Mecrisp-Stellaris for the RISC-V architecture. Currently, it requires the RV32IM instruction set, and its first target is Picosoc by Clifford Wolf on an HX8K FPGA. Early testers and developers with real RISC-V hardware are welcome.

mecrisp.sourceforge.net

Paul Rubin

Mar 5, 2018, 1:20:43 PM

Matthias Koch <matthi...@hot.uni-hannover.de> writes:
> I am happy to announce Mecrisp-Quintus, a port of Mecrisp-Stellaris
> for the RISC-V architecture.

Nice! How much work is involved, converting and debugging all that asm
code?

Matthias Koch

Mar 6, 2018, 8:45:17 AM

> Nice! How much work is involved, converting and debugging all that asm
> code?

Hi Paul,

thank you !

I encountered the RISC-V architecture at 34C3 in Leipzig at the end of December.
It took two months of my free time to learn the new instruction set and to
write Mecrisp-Quintus. Note that it cannot do register allocation for now,
it just performs constant folding and a bit of inlining.
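
For readers unfamiliar with the term, constant folding in a Forth-like compiler can be sketched roughly as below. This is a hypothetical illustration, not Mecrisp-Quintus's actual code: the names `FOLDABLE` and `fold_constants` and the `("lit", n)` / `("call", name)` output format are assumptions made for the example, and 32-bit wraparound is ignored.

```python
# Minimal sketch of constant folding for a Forth-like token stream.
# Hypothetical illustration only; names and output format are assumptions.

FOLDABLE = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
}

def fold_constants(tokens):
    """Translate tokens into ("lit", n) / ("call", name) operations,
    evaluating operators at compile time when both inputs are literals."""
    out = []
    for tok in tokens:
        try:
            out.append(("lit", int(tok)))
            continue
        except ValueError:
            pass  # not a number, so treat it as a word
        both_literal = (
            len(out) >= 2 and out[-1][0] == "lit" and out[-2][0] == "lit"
        )
        if tok in FOLDABLE and both_literal:
            b = out.pop()[1]
            a = out.pop()[1]
            out.append(("lit", FOLDABLE[tok](a, b)))
        else:
            out.append(("call", tok))
    return out

print(fold_constants("2 3 + dup *".split()))
```

Here `2 3 +` collapses into a single literal at compile time, while `dup *` still compiles to calls because its operands are only known at run time.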

Matthias

foxaudio...@gmail.com

Mar 6, 2018, 8:59:43 AM

If I understood correctly I saw that the CPU uses 3..4 clocks per instruction (40 for MUL).

Is that typical for RISC machines in FPGA? (limited pipelining I assume)
Does having the register set give better performance than a J1 style stack machine in general usage?

Matthias Koch

Mar 6, 2018, 9:40:25 AM

> If I understood correctly I saw that the CPU uses 3..4 clocks per instruction (40 for MUL).
>
> Is that typical for RISC machines in FPGA? (limited pipelining I assume)
> Does having the register set give better performance than a J1 style stack machine in general usage?

As Picosoc by Clifford Wolf on the HX8K breakout board runs out of a SPI attached serial
flash memory, it crawls. But the architecture itself is by no means limited to this slow
minimal "hardware" implementation ! J1a gives a much better performance on the same HX8K
breakout board. No, this is neither typical for RISC-V nor for RISC machines in FPGA.
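
A rough back-of-the-envelope model shows why fetching code over a serial link dominates the cycle count. Everything here is an illustrative assumption rather than a description of Picosoc's flash controller: the 8-bit command and 24-bit address framing is modeled on common SPI NOR flash read commands, and the function name `fetch_clocks` is made up for the sketch.

```python
# Rough model: serial clocks needed to fetch one 32-bit instruction word.
# All parameters are illustrative assumptions, not measured Picosoc values.

def fetch_clocks(data_bits=32, lanes=1, overhead_bits=0):
    """Serial clocks to move `data_bits` over `lanes` data lines,
    plus `overhead_bits` of command/address sent on a single line."""
    data_clocks = -(-data_bits // lanes)  # ceiling division
    return overhead_bits + data_clocks

# Sequential fetch on one data line: only the data bits are clocked in.
sequential = fetch_clocks()

# Non-sequential fetch (for example after a branch), assuming an 8-bit
# command and a 24-bit address must be sent again first.
random_access = fetch_clocks(overhead_bits=8 + 24)

# The same non-sequential fetch with four data lines for the data phase.
quad = fetch_clocks(lanes=4, overhead_bits=8 + 24)

print(sequential, random_access, quad)
```

Even in the best case of this model, every instruction word costs tens of serial clocks before the core can start executing it, which is independent of the instruction set.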

Matthias

JUERGEN

Mar 7, 2018, 6:40:19 AM

Matthias,
does this Mecrisp implementation also run on the standard Lattice 8K board that I have from the 1802 IP implementation for the FIG Forth implementation? http://www.latticesemi.com/Products/DevelopmentBoardsAndKits/iCE40HX8KBreakoutBoard.aspx

And is there a plan to cover the Microsemi implementation? From their email today:
The RISC-V ecosystem is revolutionizing the processor technology landscape. One of the most innovative RISC-V solutions I witnessed during my time at embedded world 2018 was Microsemi’s integration of PolarFire FPGAs with their Mi-V RISC-V development tools ecosystem.

foxaudio...@gmail.com

Mar 7, 2018, 7:25:42 AM

My bad. I missed the detail on running out of SPI. Thanks for the explanation.

B

Cecil Bayona

Mar 7, 2018, 12:33:38 PM

Sounds like the story of the TI99-4A, take a fairly good processor and
totally cripple the CPU by using a horribly inefficient memory system.

--
Cecil - k5nwa

foxaudio...@gmail.com

Mar 7, 2018, 9:06:05 PM

Hey. That's my hobby system. :-)

But yes it is glacial and the "fairly good" processor takes 14 clocks to MOV R0,R1
in zero wait state RAM. But it can do a context switch in 1 instruction which is pretty cool.

B

jpit...@gmail.com

Mar 8, 2018, 8:47:52 AM

Did I miss something, or are you comparing a TI home computer from a generation ago (28 years) with a fixed CPU https://en.wikipedia.org/wiki/Texas_Instruments_TI-99/4A
with somebody's trial implementation of a RISC-V core using a freeware toolchain on a Lattice iCE 8K board to see it running? https://en.wikipedia.org/wiki/RISC-V

A partial list of organizations that support the RISC-V Foundation includes: AMD[citation needed], BAE Systems, Berkeley Architecture Research, Bluespec, Inc., Cortus, Draper[citation needed], Google, GreenWaves Technologies, Hewlett Packard Enterprise, Huawei, IBM, Imperas Software, ICT, IIT Madras, Lattice Semiconductor, Mellanox Technologies, Microsemi, Micron, Microsoft[citation needed], Nvidia, NXP, Oracle, Qualcomm, Rambus Cryptography Research, Western Digital, and SiFive.

Features and Typical Applications

Small (750-2000 LUTs in 7-Series Xilinx Architecture)
High fmax (250-450 MHz on 7-Series Xilinx FPGAs)
Selectable native memory interface or AXI4-Lite master
Optional IRQ support (using a simple custom ISA)
Optional Co-Processor Interface

This CPU is meant to be used as auxiliary processor in FPGA designs and ASICs. Due to its high fmax it can be integrated in most existing designs without crossing clock domains. When operated on a lower frequency, it will have a lot of timing slack and thus can be added to a design without compromising timing closure.

For even smaller size it is possible to disable support for registers x16..x31 as well as RDCYCLE[H], RDTIME[H], and RDINSTRET[H] instructions, turning the processor into an RV32E core.

Furthermore it is possible to choose between a dual-port and a single-port register file implementation. The former provides better performance while the latter results in a smaller core.

Note: In architectures that implement the register file in dedicated memory resources, such as many FPGAs, disabling the 16 upper registers and/or disabling the dual-port register file may not further reduce the core size.

The core exists in two variations: picorv32 and picorv32_axi. The former provides a simple native memory interface, that is easy to use in simple environments, and the latter provides an AXI-4 Lite Master interface that can easily be integrated with existing systems that are already using the AXI standard.

A separate core picorv32_axi_adapter is provided to bridge between the native memory interface and AXI4. This core can be used to create custom cores that include one or more PicoRV32 cores together with local RAM, ROM, and memory-mapped peripherals, communicating with each other using the native interface, and communicating with the outside world via AXI4.

The optional IRQ feature can be used to react to events from the outside, implement fault handlers, or catch instructions from a larger ISA and emulate them in software.

The optional Pico Co-Processor Interface (PCPI) can be used to implement non-branching instructions in an external coprocessor. Implementations of PCPI cores that implement the M Standard Extension instructions MUL[H[SU|U]] and DIV[U]/REM[U] are included in this package.

Mark Wills

Mar 8, 2018, 10:43:02 AM

Mine too. The TMS9995 is a MUUUUUUCH better processor than the TMS9900.
Everything about it is improved, and it has the same gorgeous instruction
set.

I like the memory-to-memory architecture.

Cecil Bayona

Mar 8, 2018, 11:06:42 AM

Yes, all that fancy CPU is crippled by using SPI serial memory instead
of really fast static RAM; like it says, it crawls waiting on the memory.

By the way, the marketing blurb is nice, but without decent memory all
those fancy words are a total waste of time. The same thing happens with
the esp8266: a nice core, small, cheap, and it could be very fast, but to
save on pins they connect it to SPI serial memory too, and it crawls as
well. A total waste of a decent core.
--
Cecil - k5nwa

Cecil Bayona

Mar 8, 2018, 11:12:52 AM

It had a nice instruction set, but a few registers would have improved
its performance. The technology at the time made that expensive, though,
and then it would not have been compatible with TI's minicomputer line. The
TI designers killed all possible performance with the memory design, by
sharing the RAM with the video controller on the TI99-4A. It was a nice
processor; their newer design, the MSP430, uses real registers and is
cheap and fast. I especially like the FRAM devices.

--
Cecil - k5nwa

Mark Wills

Mar 8, 2018, 11:14:23 AM

But speed isn't the only metric. Pin count, physical size, power consumption
and cost are all relevant.

Anton Ertl

Mar 8, 2018, 11:19:26 AM

Cecil Bayona <cba...@cbayona.com> writes:
[slow FPGA implementation of RISC-V]

>Sounds like the story of the TI99-4A, take a fairly good processor and
>totally cripple the CPU by using a horribly inefficient memory system.

If you want faster implementations, the fastest one at the moment is a
fast PC with a fast RISC-V emulator (IIRC Michael Clark reported that
he could run a benchmark on his emulator on the PC at half the speed
that the benchmark runs native on the PC). If you want RISC-V
hardware, you can buy the Hifive1, or order the Hifive Unleashed
(should be delivered in June).

But for the purposes of the OP, obviously the slow FPGA implementation
was good enough.

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: http://www.forth200x.org/forth200x.html
EuroForth 2017: http://euro.theforth.net/

Cecil Bayona

Mar 8, 2018, 11:23:24 AM

True, but it's hard to get excited about a slow CPU; we have had those
for a long time. You can get a PIC for a fraction of a dollar, but it's
nothing to get excited about unless it also performs well. Performance
is one of the features they brag about for RISC-V, and with SPI RAM
there is no performance to brag about.

Nonetheless, I would like to eventually see the code for it. Some of the
small FPGAs have a bit of really fast internal memory, and maybe with
some minor changes that could speed it up quite a bit.

--
Cecil - k5nwa

Cecil Bayona

Mar 8, 2018, 11:40:45 AM

On 3/8/2018 9:59 AM, Anton Ertl wrote:
> Cecil Bayona <cba...@cbayona.com> writes:
> [slow FPGA implementation of RISC-V]
>> Sounds like the story of the TI99-4A, take a fairly good processor and
>> totally cripple the CPU by using a horribly inefficient memory system.
>
> If you want faster implementations, the fastest one at the moment is a
> fast PC with a fast RISC-V emulator (IIRC Michael Clark reported that
> he could run a benchmark on his emulator on the PC at half the speed
> that the benchmark runs native on the PC). If you want RISC-V
> hardware, you can buy the Hifive1, or order the Hifive Unleashed
> (should be delivered in June).
>
> But for the purposes of the OP, obviously the slow FPGA implementation
> was good enough.
>
> - anton
>

The Hifive1 looks like inexpensive hardware if one wants to experiment
and get familiar with the CPU, but I see the same pattern: serial RAM to
cut the use of pins, at the cost of performance. With a cache it's better,
but really fast memory would be better still. The benchmark figures would be
hard to meet in the real world; the test most likely fits inside the
cache, so it runs fast, but a larger program would not do as well.

I wonder how well it would compare to a $10 esp CPU32 board?
--
Cecil - k5nwa

Paul Rubin

Mar 8, 2018, 2:35:39 PM

Cecil Bayona <cba...@cbayona.com> writes:
> The same thing happens with the esp8266, nice core small, cheap, could
> be very fast but to save on pins used they connect it too to SPI
> serial memory and it crawls too a total waste of a decent core.

It has something like 128k of on-chip ram (most of it taken up by the
wifi stack) which is more than plenty for most MCU applications.
Program memory is SPI flash but the idea is you can copy it into ram and
run from there.

> performance is one of the features that they brag about the RISC-V and
> with SPI RAM there is no performance to brag about.

RISC-V performance depends on the implementation and the bragging refers
to the high possible ceiling. There are also versions that trade speed
for minimal gate count, etc. So I think this FPGA version is intended
as a simple testbed rather than a performance champ. If it can't also
use block ram, I'd be surprised if there are no versions that do.
Really though, RISC-V gets more interesting when it's on custom silicon.

Stephen Pelc

Mar 9, 2018, 6:12:57 AM

On Thu, 8 Mar 2018 07:43:00 -0800 (PST), Mark Wills
<markwi...@gmail.com> wrote:

>Mine too. The TMS9995 is a MUUUUUUCH better processor than the TMS9900.
>Everything about it is improved, and it has the same gorgeous instruction
>set.

At that time, most CPUs increased their clock speeds regularly.
However, as far as I know, TI never had a major clock speed
upgrade. The original TMS9900 needed a 48MHz input for a 3 MHz
runtime clock, and the TMS9995 needed a 12 MHz input. The use
of a 4:1 clocking system doomed the family during that silicon
period.

Stephen

--
Stephen Pelc, steph...@mpeforth.com
MicroProcessor Engineering Ltd - More Real, Less Time
133 Hill Lane, Southampton SO15 5AF, England
tel: +44 (0)23 8063 1441
web: http://www.mpeforth.com - free VFX Forth downloads

jpit...@gmail.com

Mar 9, 2018, 11:02:02 AM

I am just surprised about all of this negativity here.
Matthias did the work, THANK YOU VERY MUCH,
used an available board to have mecrisp SOFTWARE run on this new processor core.
As some people might understand here - mecrisp runs on many variants - independent of the hardware - slow fast fastest cheap and expensive.

It was interesting for me and probably others to learn that many of the supporting companies do not have a clue about processors, speed and other features relevant for an industrial product:

Cecil Bayona

Mar 9, 2018, 1:11:16 PM

Sorry you are upset. I don't know about others, but I have not criticized
the FPGA implementation of the CPU or meCrisp; I have used meCrisp and
it works well. What I have criticized is the implementation by early
CPUs, in their desire to cut down on cost, of a horrible
memory architecture that kills their performance. In the various chips
available we see a horrible memory architecture, then all kinds of
complex tricks such as complex caches to try to bring the performance
back up, instead of using the tried and true solution of having lots of
real RAM internally.

Such early steps could spell doom for the future of the RISC-V architecture,
as it will become, in the reputation of many, just another poor-performance
toy processor. If that is their desire, so be it; then they can join the
ranks of the esp8266 as low-cost, medium-performance CPUs. I think that
would be a shame, as the CPU has so much potential. At least the esp8266
models have lots of real RAM inside the device, which allows them to work a
lot better than they otherwise would with just external serial memory.

The CPU needs to establish in its early forms that the performance
benefits they brag about are real, not marketing hype. As such, some of the
early processors need to have memory that allows the processor to really
shine and perform exceedingly well in other than hand-picked benchmarks;
otherwise it can become, in the minds of many, another marketing ploy that
doesn't deliver what it promises.

The FPGA implementation is something I would like to take a look at, to
see how simple it is, so I'm glad it was done, and I agree: THANK YOU.
Perhaps there is a site available with details of the implementation;
that would be really useful. I recall no mention of
it being available for others to inspect, learn from, and see the advantages
of the CPU family over others, so for now all we can do is guess from the
information in a paragraph.

--
Cecil - k5nwa

Jan Coombs

Mar 9, 2018, 3:34:37 PM

On Fri, 9 Mar 2018 12:11:11 -0600
Cecil Bayona <cba...@cbayona.com> wrote:

> The CPU needs to establish in it's early forms that the
> performance benefits they brag about is real not marketing
> hype, as such some of the early processors need to have memory
> that allow the processor to really shine and perform
> exceedingly well in other than hand picked benchmarks
> otherwise it can become in the mind of many another marketing
> ploy that doesn't deliver what it promises.

It has only been about 2 years from microcontroller to PC level
product:

https://www.sifive.com/products/hifive-unleashed/

Jan Coombs

peter4...@gmail.com

Mar 9, 2018, 5:08:56 PM

Thank you Matthias , for such a great job !

Mecrisp is an absolutely outstanding Forth. I tested it on the STM32 ARM Cortex, and it runs absurdly fast!

I hope to test the RISC-V version on some fast board soon.

Thumbs up for your great job !

Matthias Koch

Mar 9, 2018, 10:15:23 PM

> does this mecrisp implementation as well run using the Standard Lattice 8k board

Yes, this one, as is, unmodified.

> and is there a plan to cover the Microsemi -Implementation?

Not on my side, as there is no open source toolchain for Microsemi FPGAs.

But I am fine with real RISC-V silicon, why not donate a HiFive1 ?
Does anybody know what happened to Arduino Cinque ?

jpit...@gmail.com

Mar 10, 2018, 4:11:44 AM

You might want to increase your knowledge about the hardware and silicon by using the following links.
And I assume somebody here might go to the Linux Event and bring back some real facts from the horse's mouth:

https://www.sifive.com/products/hifive1/

https://groups.google.com/forum/#!topic/comp.lang.forth/bi1Baa82KDI

https://events.linuxfoundation.org/events/elc-openiot-north-america-2018/features-and-add-ons/co-located-events/

http://www.datacenterknowledge.com/hardware/open-source-risc-v-ready-take-intel-amd-and-arm-data-center

According to the RISC-V Foundation, from a technical standpoint, the architecture should be able to compete with Intel, AMD, and ARM. The foundation says they see no reason why a RISC-V implementation should be any slower than x86 or ARM, and that "the ISA design should enable implementations to be somewhat more efficient than either." As for power consumption, the foundation says that depending "on the quality of the implementation," power efficiency should be able to more than match ARM's low energy use.
"In one point of comparison, the RISC-V Rocket core is twice as energy efficient as the most similar ARM implementation, the Cortex-A5," the foundation said in a Q&A.

jpit...@gmail.com

Mar 10, 2018, 7:16:39 AM

Thanks Matthias. As you might know, Steve Teal designed a CDP1802 IP as part of the FIG Forth eBook Project https://www.amazon.co.uk/FIG-Forth-Manual-Documentation-Test-1802-ebook/dp/B01N42VLJE/ref=sr_1_1?s=books&ie=UTF8&qid=1479668561&sr=1-1

This project uses the Lattice ICE40 - 8K Board, see as well https://wiki.forth-ev.de/doku.php/projects:fig-forth-1802-fpga:start

If the RISC-V with mecrisp uses the same board, it should be relatively easy just to program the Flash and then run it? Is there an image available somewhere? There must be a lot of these boards around which could be upgraded to mecrisp ...

Downloading the Diamond software is free and you only need to use the programmer then - and forget about the rest as no VHDL work is involved.

jpit...@gmail.com

Mar 10, 2018, 7:26:22 AM

Just went through the documentation there and saw there is additional Forth support https://github.com/riscv/riscv-wiki/wiki/RISC-V-Software-Status

Anton Ertl

Mar 10, 2018, 9:56:53 AM

Jan Coombs <jenfhaom...@murmic.plus.com> writes:
>I has only been about 2 years from microcontroller to PC level
>product:
>
>https://www.sifive.com/products/hifive-unleashed/

WRT performance, this one is probably competitive with a Raspberry Pi 3:
slightly faster clock, larger caches, but only single-issue instead
of dual-issue. There we have it:

2550 DMIPS for U54 (1.7 DMIPS/MHz * 1500MHz)
2458.1 Dhrystone MIPS for Raspberry Pi 3

Sources:
<https://www.sifive.com/products/risc-v-core-ip/u54-mc/>
<https://www.raspberrypi.org/magpi/raspberry-pi-3-specs-benchmarks/>

3504 Coremarks for Raspberry Pi 3
4125 Coremarks for U54 (2.75 Coremarks/MHz * 1500MHz)

Sources:
<https://www.sifive.com/products/risc-v-core-ip/u54-mc/>
<https://www.heise.de/ct/artikel/Raspberry-Pi-3-Leistungsaufnahme-und-CoreMark-Vergleich-3121139.html>
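
The arithmetic behind these figures can be reproduced with a few lines. The U54 numbers come from the post above; the 1200 MHz clock used to normalize the Raspberry Pi 3 results is an assumption (the board's nominal clock) and is not stated in the thread.

```python
# Reproduce the benchmark arithmetic quoted in the post.
U54_MHZ = 1500   # clock used in the post
PI3_MHZ = 1200   # assumption: nominal Raspberry Pi 3 clock, not from the post

# U54: per-MHz scores scaled up to the full clock.
u54_dmips = 1.7 * U54_MHZ
u54_coremark = 2.75 * U54_MHZ

# Raspberry Pi 3: absolute scores scaled down to per-MHz figures.
pi3_dmips_per_mhz = 2458.1 / PI3_MHZ
pi3_coremark_per_mhz = 3504 / PI3_MHZ

print(f"U54:  {u54_dmips:.0f} DMIPS, {u54_coremark:.0f} CoreMark")
print(f"Pi 3: {pi3_dmips_per_mhz:.2f} DMIPS/MHz, "
      f"{pi3_coremark_per_mhz:.2f} CoreMark/MHz")
```

Under that clock assumption the Pi 3 scores somewhat higher per MHz, consistent with the remark that the U54's advantage comes from its faster clock rather than its issue width.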

Cecil Bayona

Mar 10, 2018, 11:44:23 AM

Thanks for the laugh, you quote a bunch of marketing articles and I'm
supposed to bend over?

I like the "depending on the quality of implementation" and "see no reason
why a RISC-V implementation should be any slower than x86 or ARM" parts.
That is precisely what I'm talking about: the architecture has
potential but they need to do it right.

--
Cecil - k5nwa

jpit...@gmail.com

Mar 10, 2018, 1:11:29 PM

I just have to admit that a Cecil somewhere has a bigger brain - probably a few tons then - than the supporters AMD, BAE Systems, Berkeley Architecture Research, Bluespec, Inc., Cortus, Draper[citation needed], Google, GreenWaves Technologies, Hewlett Packard Enterprise, Huawei, IBM, etc.

You probably insulted 1000+ engineers with your comments.

I can understand that you do not like marketing as you do not have a clue there either.
Talking about things you do not understand unfortunately makes you - again - look silly.
You have not shown any competence regarding silicon
and I now wonder if you have any regarding software.

It is just outright silly to compare Matthias's implementation on a board he had access to with a professional board -

And I am laughing now as well as Anton has proven what the data says. Probably a miscalculation as you have not done it yourself ...

jpit...@gmail.com

Mar 10, 2018, 1:48:34 PM

Thank you very much Anton for the data.
It reminds me of the comments I heard when ARM started
- somehow that has changed now.
Arm was developed for certain target applications - and these are dominated now by ARM - not by Intel.
So it comes down to: what is the better processor?
the answer is it depends on what you want to achieve.

Cecil Bayona

Mar 10, 2018, 2:24:50 PM

I guess you want to misunderstand me, nothing I can do about that. Where
have I criticized Matthias? My criticism has been about early
implementations that put things in that cripple the performance and
could give the technology a black eye. The potential of RISC-V is there
but putting things like serial RAM makes the family look bad.

Are you a marketing guy? Are you not aware that these marketing types
will say anything to sell the product, even when they have to stretch the
truth? So you believe everything they say. I depend more on what the
product does in the real world than on a piece of paper, but if
I'm a horrible person because I don't believe the paper 100% then there
is little left to discuss. Engineering fact: if one uses SIO memory to
run software, the unit will not perform to the capacity it could be
capable of, so if I'm somehow clueless for stating that then we should
terminate this discussion.

Have a good life and don't be disappointed when the marketing hype fails.

--
Cecil - k5nwa

jpit...@gmail.com

Mar 11, 2018, 6:00:11 AM

On Saturday, 10 March 2018 03:15:23 UTC, Matthias Koch wrote:

Fingers crossed Matthias that the boards arrive soon - please send your address to epld...@aol.com - I cannot find it as the PC had crashed;

Will anybody be at the Linux conference?

I had sent an email to SIFive yesterday based on your post and this was the answer:

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Hi Juergen,

Thank you for reaching out and your detailed note below.
We will be happy to support you with a few HiFive1 boards.
Can you please let us know what address we should send them to?

By chance, will either yourself, Stephen, or Matthias Koch or anybody else be at the Embedded Linux Conference/OpenIoT Summit next week (3/12-3/14) in Portland Oregon?
SiFive is hosting a hackathon event there where we will have HiFive1s on site for folks to work with.
That will be a great chance to meet folks from the SiFive team in person.

Otherwise, as you make progress with HiFive1,
the best way to keep us and the larger community updated is to use our forums: https://forums.sifive.com/
You can see people posting about other HiFive1 related projects here as well.

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

jpit...@gmail.com

Mar 11, 2018, 3:10:32 PM

There are quite a few more RISC V processors in the pipeline, see https://en.wikipedia.org/wiki/RISC-V

The main problem is probably to have early Toolchains - definitely a chance for FORTH - and to invite the Libraries - get Sockpuppet integrated.

Anton Ertl

Mar 12, 2018, 6:08:40 AM

jpit...@gmail.com writes:
>Arm was developed for certain target applications - and these are dominated now by ARM - not by Intel.

ARM was developed for Acorn home computers. The home computer market
was eaten by the PC market, and that is dominated by Intel. ARM
retargeted for small embedded computers, which allowed it to survive
the MHz wars, and they developed a business model that is successful
in the mobile market. And as an offspring of that, they are, in a
way, back in the home computer market with the Raspberry Pi computers;
although my guess is that only a small percentage of the Raspberry Pis
are used as home computers in the sense that they are the primary
computers of their users, like the Acorn machines of old were.

jpit...@gmail.com

Mar 12, 2018, 1:44:18 PM

Please do not forget the ARM MultiCore Processors which have been designed to cut power consumption in computer farms in the future - where often the cost is in getting rid of the heat. We will see what happens with ARM and RISC-V over the next years.

Paul Rubin

Mar 12, 2018, 9:06:58 PM

an...@mips.complang.tuwien.ac.at (Anton Ertl) writes:
> although my guess is that only a small percentage of the Raspberry Pis
> are used as home computers in the sense that they are the primary
> computers of their users, like the Acorn machines of old were.

That's ok, people's primary computers these days are mostly mobile
phones, which for now have ARM cpus.

Matthias Koch

Mar 12, 2018, 11:17:47 PM

Thank you for the nice comments !

At the weekend, I finished an instruction set emulator which allows running
RISC-V instructions on an ARM Cortex M3/M4. Mecrisp-Quintus 0.2 therefore
is available for Picosoc and the LM4F120 Stellaris Launchpad.

Matthias

Jan Coombs

Mar 13, 2018, 6:01:40 AM

I'm not quite there yet, and not comfortable with Android yet,
but am looking at 10" tablet to replace phone, or possibly even
better for my eyes a 13" eInk tablet.

So you're right, and maybe I'm slowly getting there. Part of this
is due to noticing that my large laptop although turned off at
night seems to wake up and call home.

Perhaps my ideal tablet would have an open-source RISC-V
processor in it?

Jan Coombs
--

sam....@gmail.com

Mar 20, 2018, 4:04:11 PM

I can't stand to watch this go on any more. The misdirected blame onto RISC-V and PicoRV32 cannot go on any longer.

On Friday, March 9, 2018 at 10:11:16 AM UTC-8, Cecil - k5nwa wrote:
> Sorry you are upset, I don't know about others but I have not criticized
> the FPGA implementation of the CPU or meCrisp, I have used meCrisp and

You actually *are* criticizing the FPGA implementation of the CPU. Your very words, taken literally, are under the completely mistaken impression that the 32-bit PicoRV32 processor core exposes an SPI bus interface as its front-side bus, and that all PicoRV32 installations will exhibit similar performance characteristics. This is simply not true.

Source code for this CPU is up on Github. You'll find that it exposes a parallel bus interface, capable of sustaining one memory transfer every clock cycle, assuming the CPU's specific implementation can keep up. My own RISC-V core, a 64-bit KCP53000, similarly exposes a parallel bus architecture that, given sufficient resources, can also sustain one transfer per cycle. (NOTE: Neither PicoRV32 nor KCP53000 takes advantage of this capability due to how their internal logic works, but that's due to a conscious engineering decision to aim for small size of the core, not high performance.)

However, the RISC-V already has a reputation for keeping up and even surpassing the Joneses. Actual silicon has been fabricated which, compared to ARM processors at similar fabrication nodes, were both slightly more performant and yet slightly less power-hungry. Make no mistake: ARM is scared.

I've personally witnessed a 62MHz Rocket core (in-order, single-issue, 5-stage pipelined, 64-bit processor) synthesized on a rather beefy Zynq FPGA running Quake under Linux and pushing pixels out over PCIe. The still-experimental BOOM (Berkeley Out-Of-Order Machine) implementation attains close to 79% of the instruction throughput of a Xeon processor. That's not bad considering the discrepancy in engineering resources that goes into Xeon vs BOOM.

> it works well, what I have criticized is the implementation by early
> CPUs on their desire to cut down on the cost by implementing horrible
> memory architecture that kill it's performance. In the various chips

I hate to tell you this, but your very own desktop PC also has a similarly bottle-necked memory path. Yeah, you got a lot of data bits coming out of those SIMMs, but let's be honest, the front-side memory bandwidth is *nothing* compared to what your voracious 4- to 7-execution unit, 14- to 31-stage (per core!!) pipelined Intel/AMD processor consumes under full workload. This is why your CPU has not one, not two, but THREE levels of cache on it -- to hide/amortize that bottleneck. This is something you correctly identify later in your post, but yet you completely fail to realize that it has *zero* to do with instruction set architectures. It boggles the mind.
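
How a cache hierarchy hides a slow memory path is commonly summarized as the average memory access time: hit time plus miss rate times miss penalty, applied level by level. The sketch below uses that textbook formula; the cycle counts and miss rates are made-up placeholders rather than measurements of any real processor, and the function names are chosen for this example.

```python
# Average memory access time (AMAT) for a multi-level cache hierarchy.
# The numbers in the example are invented placeholders, not measurements.

def amat(hit_time, miss_rate, miss_penalty):
    """Single level: every access pays hit_time, misses also pay the penalty."""
    return hit_time + miss_rate * miss_penalty

def amat_multilevel(levels, memory_time):
    """levels: list of (hit_time, miss_rate) pairs ordered from L1 outward.
    The miss penalty of each level is the AMAT of the next one out."""
    penalty = memory_time
    for hit_time, miss_rate in reversed(levels):
        penalty = amat(hit_time, miss_rate, penalty)
    return penalty

# Hypothetical three-level hierarchy in front of a 200-cycle main memory.
levels = [(4, 0.05), (12, 0.2), (40, 0.5)]
with_caches = amat_multilevel(levels, memory_time=200)
without_caches = amat_multilevel([], memory_time=200)

print(f"with caches: {with_caches:.1f} cycles, without: {without_caches} cycles")
```

The same reasoning applies to a small core running from serial flash: the fraction of accesses served from fast local memory determines how much of the slow path is visible, whatever the instruction set.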

Nobody denies that serial paths to memory are sucky. Clifford gets it. I get it. You obviously get it. But, by the same token, our hands are tied. We have to make do with what comes on off-the-shelf FPGA development boards. For example, on my homebrew computer project (which runs a custom Forth dialect of my own, BTW), the Kestrel-2DX, I have access to 48KB of 16-bit wide block RAM and a synthesized ROM. It's mighty quick, and I love it. But on my successor project, the Kestrel-3, the situation will be different: I'll have access to 512Kx16 external SRAM at 10ns (yay!) but serial SPI flash ROM (and, in fact, the same ROM used to program the FPGA itself -- yuck!). I cannot change this. I can only *hide* it, perhaps by using a portion of RAM to shadow ROM, a la how early 80286/80386 systems used to use VM for this purpose.

RISC-V is an instruction set architecture. It is *NOT* an implementation. I'll be the first to tell you that KCP53000 is *not* production-ready for anything beyond what you'd throw a 6502 or Z80 in (except that it has the bonus of being a 64-bit CPU). It has a high number of CPI, and right now, the only examples of its use couples its 64-bit data path to a 16-bit RAM bottleneck. At 25MHz, it gets 6 MIPS average performance. Not a mover. Definitely not a shaker.
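
The "high number of CPI" can be read straight off the figures in the paragraph above: clock rate divided by instruction rate gives average cycles per instruction. A minimal sketch, with a function name chosen only for this example:

```python
# Average cycles per instruction from clock rate and instruction rate.

def cycles_per_instruction(clock_mhz, mips):
    """Both arguments are in millions per second, so the units cancel."""
    return clock_mhz / mips

# Figures from the post: 25 MHz clock, 6 MIPS average.
cpi = cycles_per_instruction(25, 6)
print(f"about {cpi:.2f} cycles per instruction")
```

That works out to a little over four cycles per instruction on average for the quoted configuration.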

But a Rocket core or BOOM core, synthesized on a Zynq with 32-bit path to RAM and equipped with a decent-sized cache? It's good enough for a decent quality gaming console. It might not play Crysis (as the meme goes), but it can definitely play Quake smoothly. I've seen it with my own eyes.

> Such early steps could spell doom to the RISC-V architecture future as
> it will become in the reputation of many just another poor performance
> toy processor. If that is their desire so be it then they can join the

They've actually had exactly the *opposite* effect. They're seeing that RISC-V is a viable, Linux-capable instruction set architecture, and now more and more people are wanting to see real silicon. This is especially true when looking at the larger FPGA implementations, such as those synthesized on Zynq boards, with 32-bit paths to external memory, et al.

> The CPU needs to establish in it's early forms that the performance
> benefits they brag about is real not marketing hype, as such some of the

Already done. Watch the videos of the RISC-V workshops. Every one has a comparison with silicon against a competitor.

--
Samuel A. Falvo II
Kestrel Computer Project

jpit...@gmail.com

Mar 21, 2018, 4:18:52 AM

Brilliant. Thank you Sam. Exactly what was needed from a person who knows the details.

jpit...@gmail.com

Mar 21, 2018, 4:22:18 AM

On Tuesday, 20 March 2018 20:04:11 UTC, sam....@gmail.com wrote:

Thank you very much Sam for your post.
All the important facts from somebody who knows the details.

gnuarm.del...@gmail.com

Mar 22, 2018, 9:32:53 PM

On Wednesday, March 7, 2018 at 8:06:05 PM UTC-6, foxaudio...@gmail.com wrote:
> On Wednesday, March 7, 2018 at 12:33:38 PM UTC-5, Cecil - k5nwa wrote:
> > On 3/7/2018 6:25 AM, foxaudio...@gmail.com wrote:
> > > On Tuesday, March 6, 2018 at 9:40:25 AM UTC-5, Matthias Koch wrote:
> > >>> If I understood correctly I saw that the CPU uses 3..4 clocks per instruction (40 for MUL).
> > >>>
> > >>> Is that typical for RISC machines in FPGA? (limited pipelining I assume)
> > >>> Does having the register set give better performance than a J1 style stack machine in general usage?
> > >>
> > >> As Picosoc by Clifford Wolf on the HX8K breakout board runs out of a SPI attached serial
> > >> flash memory, it crawls. But the architecture itself is by no means limited to this slow
> > >> minimal "hardware" implementation ! J1a gives a much better performance on the same HX8K
> > >> breakout board. No, this is neither typical for RISC-V nor for RISC machines in FPGA.
> > >>
> > >> Matthias
> > >
> > > My bad. I missed the detail on running out of SPI. Thanks for the explanation.
> > >
> > > B

> > >
> >
> > Sounds like the story of the TI99-4A, take a fairly good processor and
> > totally cripple the CPU by using a horribly inefficient memory system.
> >

> > --
> > Cecil - k5nwa
>
> Hey. That's my hobby system. :-)
>
> But yes it is glacial and the "fairly good" processor takes 14 clocks to MOV R0,R1
> in zero wait state RAM. But it can do a context switch in 1 instruction which is pretty cool.

People often look at one aspect of a CPU and think of how "cool" it is. But a processor is not made up of a single instruction; it is the sum of its pieces. Often a good point here comes with a poor point there. The good point may only be an advantage in a certain combination of external factors, which is the case for the TI-99x instruction set.

The TMS-990 instruction set was designed for mini-computers where the logic delays in the ALU and decode logic were large compared to the memory cycle time. Once the CPU was integrated on a single chip the logic timing became much faster while the memory timing remained about the same. Once the memory access time dominated, the "register in memory" model became a boat anchor in spite of the ability to do fast context switches. If they had persisted and kept the processor design alive until they could integrate fast memory on chip they would have had a potential winner again. TI didn't want to be in the microcomputer business at that time and killed the product line entirely.

Oddly enough they still worked on MCUs and developed the first line of general purpose DSP processors, but the TMS99x was not part of that future.

Rick C.

gnuarm.del...@gmail.com

Mar 22, 2018, 9:39:23 PM

On Thursday, March 8, 2018 at 9:43:02 AM UTC-6, Mark Wills wrote:

> On Thursday, 8 March 2018 02:06:05 UTC, foxaudio...@gmail.com wrote:
> > On Wednesday, March 7, 2018 at 12:33:38 PM UTC-5, Cecil - k5nwa wrote:
> > > On 3/7/2018 6:25 AM, foxaudio...@gmail.com wrote:
> > > > On Tuesday, March 6, 2018 at 9:40:25 AM UTC-5, Matthias Koch wrote:
> > > >>> If I understood correctly I saw that the CPU uses 3..4 clocks per instruction (40 for MUL).
> > > >>>
> > > >>> Is that typical for RISC machines in FPGA? (limited pipelining I assume)
> > > >>> Does having the register set give better performance than a J1 style stack machine in general usage?
> > > >>
> > > >> As Picosoc by Clifford Wolf on the HX8K breakout board runs out of a SPI attached serial
> > > >> flash memory, it crawls. But the architecture itself is by no means limited to this slow
> > > >> minimal "hardware" implementation ! J1a gives a much better performance on the same HX8K
> > > >> breakout board. No, this is neither typical for RISC-V nor for RISC machines in FPGA.
> > > >>
> > > >> Matthias
> > > >
> > > > My bad. I missed the detail on running out of SPI. Thanks for the explanation.
> > > >
> > > > B
> > > >
> > >
> > > Sounds like the story of the TI99-4A, take a fairly good processor and
> > > totally cripple the CPU by using a horribly inefficient memory system.
> > >
> > > --
> > > Cecil - k5nwa
> >
> > Hey. That's my hobby system. :-)
> >
> > But yes it is glacial and the "fairly good" processor takes 14 clocks to MOV R0,R1
> > in zero wait state RAM. But it can do a context switch in 1 instruction which is pretty cool.
> >
> >

> > B

>
> Mine too. The TMS9995 is a MUUUUUUCH better processor than the TMS9900.
> Everything about it is improved, and it has the same gorgeous instruction
> set.
>

> I like the memory-to-memory architecture.

I still have a 9995 CPU board I designed myself. I haven't fired it up in decades. I also have a Technico 9900 board that I actually pulled out and took a look at a while back. It had sockets for one of the very early EPROM devices, maybe the 2708 and I added wire wrap sockets that expanded the size to 2716 or maybe 2732. At that time they were pumping out new EPROM devices a bit faster than Moore's law I believe. It sure seemed to me like we couldn't keep up with the new versions. Anyway, the RAM was the same at just a couple of KB and the board was huge, maybe two foot long! It's a museum piece now, right next to my 8008 computer.

Rick C.

gnuarm.del...@gmail.com

Mar 22, 2018, 9:43:27 PM

On Thursday, March 8, 2018 at 10:23:24 AM UTC-6, Cecil - k5nwa wrote:
> On 3/8/2018 10:14 AM, Mark Wills wrote:
> > On Thursday, 8 March 2018 16:06:42 UTC, Cecil - k5nwa wrote:
> >> On 3/8/2018 7:47 AM, jpit...@gmail.com wrote:

> >>> On Wednesday, 7 March 2018 17:33:38 UTC, Cecil - k5nwa wrote:
> >>>> On 3/7/2018 6:25 AM, foxaudio...@gmail.com wrote:
> >>>>> On Tuesday, March 6, 2018 at 9:40:25 AM UTC-5, Matthias Koch wrote:
> >>>>>>> If I understood correctly I saw that the CPU uses 3..4 clocks per instruction (40 for MUL).
> >>>>>>>
> >>>>>>> Is that typical for RISC machines in FPGA? (limited pipelining I assume)
> >>>>>>> Does having the register set give better performance than a J1 style stack machine in general usage?
> >>>>>>
> >>>>>> As Picosoc by Clifford Wolf on the HX8K breakout board runs out of a SPI attached serial
> >>>>>> flash memory, it crawls. But the architecture itself is by no means limited to this slow
> >>>>>> minimal "hardware" implementation ! J1a gives a much better performance on the same HX8K
> >>>>>> breakout board. No, this is neither typical for RISC-V nor for RISC machines in FPGA.
> >>>>>>
> >>>>>> Matthias
> >>>>>
> >>>>> My bad. I missed the detail on running out of SPI. Thanks for the explanation.
> >>>>>
> >>>>> B
> >>>>>
> >>>>
> >>>> Sounds like the story of the TI99-4A, take a fairly good processor and
> >>>> totally cripple the CPU by using a horribly inefficient memory system.
> >>>>
> >>>> --
> >>>> Cecil - k5nwa
> >>>

> >>> Did I miss something - or you are comparing a TI home computer from a generation ago ( 28 years) with a fixed CPU https://en.wikipedia.org/wiki/Texas_Instruments_TI-99/4A
> >>> with somebody's trial implementation of a RISC V CORE using a freeware tool chain on a Lattice ICE 8k board to see it running?? https://en.wikipedia.org/wiki/RISC-V

> >>>
> >>> A partial list of organizations that support the RISC-V Foundation includes: AMD[citation needed], BAE Systems, Berkeley Architecture Research, Bluespec, Inc., Cortus, Draper[citation needed], Google, GreenWaves Technologies, Hewlett Packard Enterprise, Huawei, IBM, Imperas Software, ICT, IIT Madras, Lattice Semiconductor, Mellanox Technologies, Microsemi, Micron, Microsoft[citation needed], Nvidia, NXP, Oracle, Qualcomm, Rambus Cryptography Research, Western Digital, and SiFive[12][13].
> >>>

> >>> Features and Typical Applications
> >>>
> >>> Small (750-2000 LUTs in 7-Series Xilinx Architecture)
> >>> High fmax (250-450 MHz on 7-Series Xilinx FPGAs)
> >>> Selectable native memory interface or AXI4-Lite master
> >>> Optional IRQ support (using a simple custom ISA)
> >>> Optional Co-Processor Interface
> >>>
> >>> This CPU is meant to be used as auxiliary processor in FPGA designs and ASICs. Due to its high fmax it can be integrated in most existing designs without crossing clock domains. When operated on a lower frequency, it will have a lot of timing slack and thus can be added to a design without compromising timing closure.
> >>>
> >>> For even smaller size it is possible to disable support for registers x16..x31 as well as RDCYCLE[H], RDTIME[H], and RDINSTRET[H] instructions, turning the processor into an RV32E core.
> >>>
> >>> Furthermore it is possible to choose between a dual-port and a single-port register file implementation. The former provides better performance while the latter results in a smaller core.
> >>>
> >>> Note: In architectures that implement the register file in dedicated memory resources, such as many FPGAs, disabling the 16 upper registers and/or disabling the dual-port register file may not further reduce the core size.
> >>>
> >>> The core exists in two variations: picorv32 and picorv32_axi. The former provides a simple native memory interface, that is easy to use in simple environments, and the latter provides an AXI-4 Lite Master interface that can easily be integrated with existing systems that are already using the AXI standard.
> >>>
> >>> A separate core picorv32_axi_adapter is provided to bridge between the native memory interface and AXI4. This core can be used to create custom cores that include one or more PicoRV32 cores together with local RAM, ROM, and memory-mapped peripherals, communicating with each other using the native interface, and communicating with the outside world via AXI4.
> >>>
> >>> The optional IRQ feature can be used to react to events from the outside, implement fault handlers, or catch instructions from a larger ISA and emulate them in software.
> >>>
> >>> The optional Pico Co-Processor Interface (PCPI) can be used to implement non-branching instructions in an external coprocessor. Implementations of PCPI cores that implement the M Standard Extension instructions MUL[H[SU|U]] and DIV[U]/REM[U] are included in this package.
> >>>
> >>
> >> Yes, all that fancy CPU is crippled by using SPI serial memory instead
> >> of really fast STATIC RAM, like it says it crawls waiting on the memory.
> >>
> >> By the way the marketing blurb is nice but without decent memory all
> >> those fancy words are a total waste of time. The same thing happens with
> >> the esp8266, nice core small, cheap, could be very fast but to save on
> >> pins used they connect it too to SPI serial memory and it crawls too a
> >> total waste of a decent core.
> >> --
> >> Cecil - k5nwa
> >
> > But speed isn't the only metric. Pin count, physical size, power consumption
> > and cost are all relevant.
> >
>
> True but it's hard to get excited about a slow CPU, we have had those
> for a long time. You can get a PIC for a fraction of a dollar but it's
> nothing to get excited about unless it also performs well, performance
> is one of the features that they brag about the RISC-V and with SPI RAM
> there is no performance to brag about.
>
> Nonetheless I would like to eventually see the code for it some of the
> small FPGAs have a bit of really fast internal memory and maybe with
> some minor changes it could speed it up quite a bit.

The RISC-V is not all about performance. It is intended to include a range of processors with different advantages. Fitting into less than 1 KLUT in an FPGA is a great advantage even if run from serial memory. The design could be made to run out of FPGA internal memory if speed is more important while still keeping a small footprint. No?

Rick C.

gnuarm.del...@gmail.com

Mar 22, 2018, 10:20:59 PM

On Friday, March 9, 2018 at 5:12:57 AM UTC-6, Stephen Pelc wrote:

> On Thu, 8 Mar 2018 07:43:00 -0800 (PST), Mark Wills
> <markwi...@gmail.com> wrote:
>
> >Mine too. The TMS9995 is a MUUUUUUCH better processor than the TMS9900.
> >Everything about it is improved, and it has the same gorgeous instruction
> >set.
>

> At that time, most CPUs increased their clock speeds regularly.
> However, as fas as I know, TI never had a major clock speed
> upgrade. The original TMS9900 needed a 48MHz input for a 3 MHz
> runtime clock, and the TMS9995 needed a 12 MHz input. The use
> of a 4:1 clocking system doomed the family during that silicon
> period.
>
> Stephen

I don't think you understand the need for the 4x clock in the early processors. The same was true for the Intel 8080 and even the three chip CPU in the LSI-11. They used a logic construction that used transmission gates to form FFs and so required non-overlapping clocks to make use of the capacitance of the logic node to hold the value for a portion of a cycle. Then the value would be propagated to the next node on the next phase of the clock and so on. This was the best way to construct logic on the early processes. Once they refined NMOS further, they got away from dynamically clocked logic and processors used the conventional FFs we use today... most of the time.

People decry the divided down clocking of the early processors, but that doesn't make them slow. The TI line didn't die because the processors were slow, it died because TI started to focus their attention elsewhere. After all, most of the 9900 product line was intended to be used in their 990 minicomputer line and the day of the minicomputer was clearly coming to an end. They had newer, faster processors on the drawing board, but the company decided to put its eggs in a different basket.

Rick C.

Paul Rubin

Mar 22, 2018, 11:05:21 PM

gnuarm.del...@gmail.com writes:
> Fitting into less than 1 KLUT in an FPGA is a great advantage even if
> run from serial memory.

I wonder if the tradeoffs required to cram a risc-v into 1 KLUT might
make a simpler 1-KLUT cpu (i.e. an 8 or 16 bitter, maybe even a stack
design) more attractive a lot of the time.

gnuarm.del...@gmail.com

Mar 23, 2018, 12:21:02 AM

What are those trade offs?

Rick C.

Paul Rubin

Mar 23, 2018, 12:27:00 AM

gnuarm.del...@gmail.com writes:
>> I wonder if the tradeoffs required to cram a risc-v into 1 KLUT might

> What are those trade offs?

If the 1-KLUT risc-v works as well as a 100-KLUT risc-v, then there were
no trade-offs. If it doesn't work as well, then they traded off
functionality/performance to get reduced gate count. Are you telling me
that the 1-KLUT one works as well as the 100-KLUT one? I'm skeptical,
but I'm not a hardware guy.

Anton Ertl

Mar 23, 2018, 4:50:59 AM

gnuarm.del...@gmail.com writes:
[TI 990 instruction set]
>If they had persisted and kept the processor design alive until they could
>integrate fast memory on chip they would have had a potential winner again.

Not by using that fast memory instead of registers. Larger structures
have longer access latencies, and you usually cannot afford to have a
lot of ports for larger memories. E.g., the 32KB D-caches of recent
Intel and AMD CPUs have 4 cycles of access latency and 2 ports, while
the registers have a shorter latency (the apparent latency is 0
cycles), and 8+ read ports and 4+ write ports.

OTOH, putting the workspace in registers, and letting the change of
the workspace pointer store the old workspace into memory and load the
new workspace from memory would be possible. The context switch would
cost a little more, but the rest of the execution would be faster.
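The trade-off between the two schemes can be made concrete with a toy model. This is only a sketch under stated assumptions: the class and function names are invented for illustration, memory is a flat list of words, and every register access in the memory-based scheme is counted as one memory access. It is not a cycle-accurate TMS9900 or TMS9995.

```python
NUM_REGS = 16  # size of one workspace, in words


class MemoryWorkspaceCPU:
    """Registers live in RAM; the workspace pointer (WP) selects 16 words."""

    def __init__(self, memory):
        self.memory = memory
        self.wp = 0
        self.memory_accesses = 0

    def read_reg(self, n):
        self.memory_accesses += 1
        return self.memory[self.wp + n]

    def write_reg(self, n, value):
        self.memory_accesses += 1
        self.memory[self.wp + n] = value

    def switch_context(self, new_wp):
        # Context switch is just a pointer change: nothing is copied.
        self.wp = new_wp


class CachedWorkspaceCPU:
    """Workspace is held in on-chip registers and spilled when WP changes."""

    def __init__(self, memory):
        self.memory = memory
        self.wp = 0
        self.regs = list(memory[0:NUM_REGS])  # initial load from RAM
        self.memory_accesses = NUM_REGS

    def read_reg(self, n):
        return self.regs[n]  # no memory traffic

    def write_reg(self, n, value):
        self.regs[n] = value  # no memory traffic

    def switch_context(self, new_wp):
        # Store the old workspace, then load the new one: 2 * 16 accesses.
        self.memory[self.wp:self.wp + NUM_REGS] = self.regs
        self.regs = list(self.memory[new_wp:new_wp + NUM_REGS])
        self.memory_accesses += 2 * NUM_REGS
        self.wp = new_wp


def run(cpu):
    """Register-heavy loop, then a switch to a second workspace and back."""
    for _ in range(100):
        cpu.write_reg(1, cpu.read_reg(1) + 1)
    cpu.switch_context(NUM_REGS)
    cpu.write_reg(1, 7)
    cpu.switch_context(0)
    return cpu.read_reg(1)


for cpu_class in (MemoryWorkspaceCPU, CachedWorkspaceCPU):
    cpu = cpu_class([0] * (2 * NUM_REGS))
    result = run(cpu)
    print(cpu_class.__name__, "result:", result, "memory accesses:", cpu.memory_accesses)
```

Both models compute the same register contents. The cached variant pays a fixed cost per context switch but nothing per register access, so it wins whenever code touches registers much more often than it switches workspaces.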

Anton Ertl

Mar 23, 2018, 5:01:49 AM

Sure, the b16 probably runs a lot faster on small FPGAs than this
implementation of RISC-V. The advantage of this RISC-V implementation
is that you can play with RISC-V in "real hardware" (rather than the
much faster software emulation on a PC) even if you own only a small
FPGA board.

Anton Ertl

Mar 23, 2018, 5:05:46 AM

gnuarm.del...@gmail.com writes:
>The TI line didn't die because the processors were slow, it died because TI
>started to focus their attention elsewhere. After all, most of the 9900
>product line was intended to be used in their 990 minicomputer line and the
>day of the minicomputer was clearly coming to an end.

And ARM was intended to be used in home computers, and home computers
(and Acorn itself) have come to an end, yet ARM is one of the dominant
architectures. So that's not the decisive factor.

>They had newer, faster processors on the drawing board, the company decided
>to put it's eggs in a different basket.

That's more like it.

Stephen Pelc

Mar 23, 2018, 9:14:53 AM

On Thu, 22 Mar 2018 19:20:58 -0700 (PDT),
gnuarm.del...@gmail.com wrote:

>I don't think you understand the need for the 4x clock in the early
>processors. The same was true for the Intel 8080 and even the three chip CPU
>in the LSI-11.

NMOS died, thank heavens.

Regardless of technical merit, the CPUs that won the race from the
late 1970s were the ones that raised their clock speed significantly.
These were the descendants of the 8086 and 68000. Note also that 1987
saw the Sun 4/260, arguably the first commercial SPARC/RISC
workstation. The 80386 appeared in 1985.

The other killer for 64 KB constrained architectures was the need for
bank-switching for extended code and data memory.

Stephen

--
Stephen Pelc, steph...@mpeforth.com
MicroProcessor Engineering Ltd - More Real, Less Time
133 Hill Lane, Southampton SO15 5AF, England
tel: +44 (0)23 8063 1441
web: http://www.mpeforth.com - free VFX Forth downloads

gnuarm.del...@gmail.com

Mar 23, 2018, 1:08:10 PM

I'm asking how you expect your question to be meaningful given that you have no idea what the tradeoffs are or how to determine them. What would the point of comparison be exactly?

Just like many architectures, RISC-V is an instruction set. It's not really about the "trade offs" of one implementation vs. another. The question should be how does the 1 kLUT version compare to your requirements? Who cares how much faster the RISC-V being made on a custom die is going to run when you are looking for a CPU to go in an FPGA?

So there are a (huge) number of available CPUs for use in FPGAs. How well do any of them meet your requirements?

I know one design called ZPU which is actually a stack machine (not two stacks though) intended to run C code. The entire point was to have a common instruction set that could be implemented across a wide range of cost vs. performance points. At the small end it was a pretty crappy processor taking many clock cycles to perform a single instruction... BUT, it was still running the same instruction set and would run the same code compiled by the same tools. So if you didn't need higher performance here was a C compatible processor in just 600 LUTs. It has gotten some traction and been used professionally. Clearly some people's requirements were for a very small processor that could be coded in C.

Rick C.

gnuarm.del...@gmail.com

Mar 23, 2018, 1:13:20 PM

On Friday, March 23, 2018 at 4:50:59 AM UTC-4, Anton Ertl wrote:
> gnuarm.del...@gmail.com writes:
> [TI 990 instruction set]
> >If they had persisted and kept the processor design alive until they could
> >integrate fast memory on chip they would have had a potential winner again.
>
> Not by using that fast memory instead of registers. Larger structures
> have longer access latencies, and you usually cannot afford to have a
> lot of ports for larger memories. E.g, the 32KB D-cache of recent
> Intel and AMD CPUs have 4 cycles of access latency and 2 ports, while
> the registers have a shorter latency (the apparent latency is 0
> cycles), and 8+ read ports and 4+ write ports.
>
> OTOH, putting the workspace in registers, and letting the change of
> the workspace pointer store the old workspace into memory and load the
> new workspace from memory would be possible. The context switch would
> cost a little more, but the rest of the execution would be faster.

Yup, the TMS99xx would never have competed against an i7. But using on chip memory for registers would have allowed the device to compete with the other devices of the day including the 8086 and 68000... well not entirely. It was still a 16 bit machine with a 64 kB address space (I think, I don't recall for sure, but it may have had data/instruction separation which would give you 128 kB). TI was working on the 99000 which was going to get past this barrier, but again the registers in external memory were an issue, but more importantly, TI decided to get out of the clearly limited mini-computer business which is what all these processors were intended for in the eyes of TI.

Rick C.

gnuarm.del...@gmail.com

Mar 23, 2018, 1:20:27 PM

On Friday, March 23, 2018 at 5:05:46 AM UTC-4, Anton Ertl wrote:
> gnuarm.del...@gmail.com writes:
> >The TI line didn't die because the processors were slow, it died because TI
> >started to focus their attention elsewhere. After all, most of the 9900
> >product line was intended to be used in their 990 minicomputer line and the
> >day of the minicomputer was clearly coming to an end.
>
> And ARM was intended to be used in home computers, and home computers
> (and Acorn itself) have come to an end, yet ARM is one of the dominant
> architectures. So that's not the decisive factor.

You seem to be confusing the outcome of history with the intent of corporate decisions. IBM never intended the PC to become a big seller by itself, it was supposed to extend their large computers. That doesn't mean the PC was a marketing stroke of genius. They lucked into it and in the end it was not really a big win for them so they sold it off. That doesn't mean IBM had great foresight in any way.

> >They had newer, faster processors on the drawing board, the company decided
> >to put it's eggs in a different basket.
>
> That's more like it.

Not sure what you mean. This was part of the same thing I was saying above. The TMS99xx was intended to support their mini-computers. Mini-computers were coming to an end. TI saw other opportunities for their semiconductor business so they killed off the mini/microcomputer business to pursue other prospects. It's all one thing.

Rick C.

gnuarm.del...@gmail.com

Mar 23, 2018, 1:30:26 PM

On Friday, March 23, 2018 at 9:14:53 AM UTC-4, Stephen Pelc wrote:
> On Thu, 22 Mar 2018 19:20:58 -0700 (PDT),
> gnuarm.del...@gmail.com wrote:
>
> >I don't think you understand the need for the 4x clock in the early
> >processors. The same was true for the Intel 8080 and even the three chip CPU
> >in the LSI-11.
>
> NMOS died, thank heavens.

Hey! It was a huge improvement over the PMOS design of the 8008 which gave the 8080 its three clock basic machine cycle. The 8008 had a single 8 bit bus for address and data, so cycle 1 was address upper half, cycle 2 was address lower half, cycle 3 was data. (I may have address upper and lower mixed up). The 8080 used these same cycles without the address multiplexing although the mux came back in the 8085.

NMOS didn't so much die as morph into CMOS.

> Regardless of technical merit, the CPUs that won the race from the
> late 1970s were the ones that raised their clock speed significantly.
> These were the descendants of the 8086 and 68000. Note also that 1987
> saw the Sun 4/260, arguably the first commercial SPARC/RISC
> workstation. The 80386 appeared in 1985.
>
> The other killer for 64 kb constrained architectures was the need for
> bank-switching for extended code and data memory.

Yes, and that is why the 8086 is just a mode in the 80286. It was a poor man's way to get past the 16 bit barrier which was largely replaced and honed in the 286 and 386. TI was doing similar things with the 99000. So the 9900 architecture wasn't unworkable, it died because TI found better business areas than supporting a dying mini-computer line.
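For readers who have not met the 8086 scheme, it can be sketched in a few lines: a 16-bit segment value is shifted left by four bits and added to a 16-bit offset, giving a 20-bit physical address. The function name below is invented for illustration, and the sketch covers only 8086-style real mode, not the descriptor-based protected modes of the 286 and 386.

```python
def physical_address(segment: int, offset: int) -> int:
    """8086 real-mode address: (segment << 4) + offset, wrapped to 20 bits."""
    segment &= 0xFFFF  # segment registers are 16 bits wide
    offset &= 0xFFFF   # offsets are 16 bits wide
    return ((segment << 4) + offset) & 0xFFFFF  # 20 address lines on the 8086


# Many segment:offset pairs name the same byte, because segments overlap
# every 16 bytes.
print(hex(physical_address(0x1234, 0x0010)))
print(hex(physical_address(0x1235, 0x0000)))
```

Each segment still spans only 64 KB, so programs larger than that had to juggle segment registers, which is the same kind of bookkeeping bank-switching imposed on the 8-bit machines.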

Rick C.

Anton Ertl

Mar 24, 2018, 3:42:21 AM

gnuarm.del...@gmail.com writes:
>On Friday, March 23, 2018 at 5:05:46 AM UTC-4, Anton Ertl wrote:
>> gnuarm.del...@gmail.com writes:

>> >The TI line didn't die because the processors were slow, it died because
>> >TI started to focus their attention elsewhere. After all, most of the 9900
>> >product line was intended to be used in their 990 minicomputer line and the
>> >day of the minicomputer was clearly coming to an end.
>>

>> And ARM was intended to be used in home computers, and home computers
>> (and Acorn itself) have come to an end, yet ARM is one of the dominant
>> architectures. So that's not the decisive factor.
>

>You seem to be confusing the outcome of history with the intent of
>corporate decisions.

This points out that the original purpose of an architecture does not
determine its long-term viability.

>IBM never intended the PC to become a big seller by itself, it was supposed
>to extend their large computers. That doesn't mean the PC was a marketing
>stroke of genius. They lucked into it and in the end it was not really a big
>win for them so they sold it off. That doesn't mean IBM had great foresight
>in any way.

The IBM PC and its followups were a success because personal computers
were a success, because IBM did not cripple it (they did not perceive
it as threat to their large computers, and it was not, for a while),
and because it was an open system that inspired a lot of add-in cards
and other network effects.

I am pretty sure that the line produced a lot of revenue and profits
for them and is the basis for turning IBM into the more
service-oriented company that it is now. If they had not had the IBM
PC, maybe they would have ended up like DEC.

In the end (at least that's what an IBMer told me) their customers
expected Dell prices with free IBM service from an IBM PC, and they
could not make a profit on that, so they sold that part of the
business off.

>> >They had newer, faster processors on the drawing board, the company
>> >decided to put it's eggs in a different basket.
>>
>> That's more like it.
>
>Not sure what you mean.

If they had a better alternative, and compatibility was not a concern
(e.g., because the product line was not successful), of course they
would go for the alternative.

>This was part of the same thing I was saying above. The TMS99xx was intended
>to support their mini-computers. Mini-computers were coming to an end.

TMS99xx were microprocessors, so TI's machines using them were
microcomputers (even though they were descended from a line of
minicomputers).

DEC built the VAX as a successor of their successful PDP-11 line which
suffered from the limitations of a 16-bit architecture, and they were
very successful with the VAX, for a decade or so. So it's not that
one could not build upon the success of minicomputers. I guess,
though, that the TI-990 was not that successful from the start (it was
also late to the party, starting in 1973).

It seems to me that in this case technical reasons and business
reasons coincided to kill off the line.

Mark Wills

Mar 24, 2018, 6:25:11 AM

On Friday, March 23, 2018 at 1:32:53 AM UTC, gnuarm.del...@gmail.com wrote:
> If they had persisted and kept the processor design alive until they could
> integrate fast memory on chip they would have had a potential winner again. TI
> didn't want to be in the microcomputer business at that time and killed the
> product line entirely.

> Rick C.

They did that with the TMS9995 - it had 256 bytes of on-chip ram, which was more
than enough for holding multiple register sets, and it worked very well.

Too little too late though; a 16-bit CPU with a 64K address space isn't going
to set the world on fire. By the time the TMS9995 was out it was up against
the 68000, which was a far more attractive option if you wanted a 16-bitter.

Ah well, it's all water under many bridges!

gnuarm.del...@gmail.com

Mar 24, 2018, 12:31:43 PM

On Saturday, March 24, 2018 at 2:42:21 AM UTC-5, Anton Ertl wrote:
> gnuarm.del...@gmail.com writes:
> >On Friday, March 23, 2018 at 5:05:46 AM UTC-4, Anton Ertl wrote:
> >> gnuarm.del...@gmail.com writes:
> >> >The TI line didn't die because the processors were slow, it died because
> >> >TI started to focus their attention elsewhere. After all, most of the 9900
> >> >product line was intended to be used in their 990 minicomputer line and the
> >> >day of the minicomputer was clearly coming to an end.
> >>
> >> And ARM was intended to be used in home computers, and home computers
> >> (and Acorn itself) have come to an end, yet ARM is one of the dominant
> >> architectures. So that's not the decisive factor.
> >
> >You seem to be confusing the outcome of history with the intent of
> >corporate decisions.
>
> This points out that the original purpose of an architecture does not
> determine its long-term viability.

We are talking about two different things. You are talking about predicting the market and I am talking about corporate intent. If TI decides to drop their minicomputer line and the 99xx line has a purpose of supporting that, it is dead as well. No one will ever know if the 99xx architecture would have competed against other devices or for how long.

> >IBM never intended the PC to become a big seller by itself, it was supposed
> >to extend their large computers. That doesn't mean the PC was a marketing
> >stroke of genius. They lucked into it and in the end it was not really a big
> >win for them so they sold it off. That doesn't mean IBM had great foresight
> >in any way.
>
> The IBM PC and its followups were a success because personal computers
> were a success, because IBM did not cripple it (they did not perceive
> it as threat to their large computers, and it was not, for a while),
> and because it was an open system that inspired a lot of add-in cards
> and other network effects.

None of which IBM foresaw.

> I am pretty sure that the line produced a lot of revenue and profits
> for them and is the basis for turning IBM into the more
> service-oriented company that it is now. If they had not had the IBM
> PC, maybe they would have ended up like DEC.

We will never know. The point is that IBM never intended for the PC to become what it was. They didn't have the foresight and didn't have the marketing interest. TI killed off their micro products because they saw the minicomputer market going away, but didn't foresee the rise of the microcomputer market. IBM created the PC to facilitate their large computer lines, but didn't foresee the rise of the microcomputer market.

> In the end (at least that's what an IBMer told me) their customers
> expected Dell prices with free IBM service from an IBM PC, and they
> could not make a profit on that, so they sold that part of the
> business off.

Yep, they couldn't adapt to that market, so they left it. IBM had a great reputation for high quality products. I can't believe they couldn't make a profit on business computers. They just changed their focus and so the PC line was out the door.

> >> >They had newer, faster processors on the drawing board, the company
> >> >decided to put it's eggs in a different basket.
> >>
> >> That's more like it.
> >
> >Not sure what you mean.
>
> If they had a better alternative, and compatibility was not a concern
> (e.g., because the product line was not successful), of course they
> would go for the alternative.

Why? If they didn't foresee the rise of the microcomputer that eventually would happen, why would they produce *any* microcomputer product?

> >This was part of the same thing I was saying above. The TMS99xx was
> >intended to support their mini-computers. Mini-computers were coming
> >to an end.
>
> TMS99xx were microprocessors, so TI's machines using them were
> microcomputers (even though they were descended from a line of
> minicomputers).

Most of the TMS9900 family were designed to be used in the TI-990 line of computers. That they could be used in microcomputers was just a sideline to TI. You can call them what you want, their minicomputers were not sold as PCs, they were sold as minicomputers to replace other minicomputers.

> DEC built the VAX as a successor of their successful PDP-11 line which
> suffered from the limitations of a 16-bit architecture, and they were
> very successful with the VAX, for a decade or so. So it's not that
> one could not build upon the success of minicomputers. I guess,
> though, that the TI-990 was not that successful from the start (it was
> also late to the party, starting in 1973).

And... minicomputers as a whole were on the decline, not on the rise. DEC was building minicomputers, pushing them upward to compete with mainframes, which ultimately were dying a slow death as well. There was no success to build on. The route to take was down in size and price, by changing the business model away from minicomputers entirely, which TI did and DEC did not... and which of them is still around?

> It seems to me that in this case technical reasons and business
> reasons coincided to kill off the line.

I don't agree that technical reasons had anything to do with it. The TMS9900 was a rather competitive processor in many ways. The TMS9995 "fixed" the issue of size, coming in a 40-pin package (the 9900 was in a 64-pin package), and ran faster at the same time. The TMS99xx family had some legs. It just didn't have a willing owner.

Rick C.

Matthias Koch

Apr 25, 2018, 8:32:10 AM

With thanks to the SiFive team for a hardware donation,
Mecrisp-Quintus 0.4 is now available for the HiFive1 board.

Matthias