New soft processor core paper publisher?

tammi...@gmail.com

Jun 12, 2013, 5:17:18 PM
I have a general purpose soft processor core that I developed in verilog. The processor is unusual in that it uses four indexed LIFO stacks with explicit stack pointer controls in the opcode. It is 32 bit, 2 operand, fully pipelined, 8 threads, and produces an aggregate 200 MIPs in bargain basement Altera Cyclone 3 and 4 speed grade 8 parts while consuming ~1800 LEs. The design is relatively simple (as these things go) yet powerful enough to do real work.

I wrote a fairly extensive paper describing the processor, and am about to post it and my code over at opencores.org, but was thinking the paper and the concepts might be good enough for a more formal publication. Any suggestions on who might be interested in publishing it?

Nikolaos Kavvadias

Jun 13, 2013, 6:33:22 AM
Hi,

I was in a similar position about 5 years ago. My own processor is the
ByoRISC, a RISC-like extensible custom processor supporting multiple-
input, multiple-output custom instructions.

> I have a general purpose soft processor core that I developed in verilog.  The processor is unusual in that it uses four indexed LIFO stacks with explicit stack pointer controls in the opcode.  It is 32 bit, 2 operand, fully pipelined, 8 threads, and produces an aggregate 200 MIPs in bargain basement Altera Cyclone 3 and 4 speed grade 8 parts while consuming ~1800 LEs.  The design is relatively simple (as these things go) yet powerful enough to do real work.

This reads like a "fourstack" architecture on steroids. It seems
good!
How do you compare with more classic RISC-like soft-cores like
MicroBlaze, Nios-II, LEON, etc?
There is also a classic book on stack-based computers, you really need
to go through this and reference it in your publication.

> I wrote a fairly extensive paper describing the processor, and am about to post it and my code over at opencores.org, but was thinking the paper and the concepts might be good enough for a more formal publication.  Any suggestions on who might be interested in publishing it?

I had chosen to publish to VLSI-SoC 2008 (due to proximity, that year
it was held in Greece).
It is an OK conference; however, it is not indexed well by DBLP and the
like.
Anyway, here is a link to my submitted version of the paper:
http://www.nkavvadias.com/publications/kavvadias_vlsisoc08.pdf

The paper was really well accepted at the conference venue. I had
received some of my best reviews.
However, I didn't have the chance to present the paper in person,
because I was in really deep-S in the army and couldn't get a
three-day special leave for the conference. (I joined the army at 31,
so I was s'thing like an elderly private :). For instance, I was about
the same age as all the majors in the camp. Only colonels and people
among the permanent staff were older.)

On the other hand, I had a hard time publishing an extended/long version
of the paper as a journal paper. All three publishers were arguing
about the existence of the conference paper, and that due to this
fact, no journal paper version was necessary (even with ~40% material
additions).

My suggestion is to:
a) go for the journal paper (e.g. IEEE Trans. on VLSI or ACM TECS if
you have s'thing really modern)
b) otherwise submit to an FPGA or architecture conference. It depends
on where you live, there are numerous European and worldwide
conferences with processor-related topics (FPGA-based architectures,
GPUs, ASIPs, novel architectures, manycores, etc).

In all cases you may have to adapt your material (e.g. due to page
limits) to the conventions of the publisher.

BTW another more recent example is the paper on the iDEA DSP soft-core
processor:
http://www.ntu.edu.sg/home/sfahmy/files/papers/fpt2012-cheah.pdf

This looks like a lean, mean architecture well suited to contemporary
FPGAs.


Hope these help.

Best regards,
Nikolaos Kavvadias
http://www.nkavvadias.com

jt_eaton

Jun 13, 2013, 12:41:49 PM
>I have a general purpose soft processor core that I developed in verilog.
>The processor is unusual in that it uses four indexed LIFO stacks with
>explicit stack pointer controls in the opcode. It is 32 bit, 2 operand, fully
>pipelined, 8 threads, and produces an aggregate 200 MIPs in bargain
>basement Altera Cyclone 3 and 4 speed grade 8 parts while consuming ~1800 LEs.
>The design is relatively simple (as these things go) yet powerful enough to
>do real work.
>
>I wrote a fairly extensive paper describing the processor, and am about to
>post it and my code over at opencores.org, but was thinking the paper and
>the concepts might be good enough for a more formal publication. Any
>suggestions on who might be interested in publishing it?
>

Do you also have an assembler, C++ compiler and debugger for this beast?
You should have a reference design running on an FPGA board if you want to
attract a following. Ideally it should also run Linux.

Why can't you do both? Post the code to opencores.org and then write a
paper about it and publish.

John

---------------------------------------
Posted through http://www.FPGARelated.com

Eric Wallin

Jun 13, 2013, 1:07:22 PM
Thank you for your reply Nikolaos!

> This reads like a "fourstack" architecture on steroids. It seems
> good!

"A Four Stack Processor" by Bernd Paysan? I ran across that paper several years ago (thanks!). Very interesting, but with multiple ALUs, access to data below the LIFO tops, TLBs, security, etc. it is much more complex than my processor. It looks like a real bear to program and manage at the lowest level.

> How do you compare with more classic RISC-like soft-cores like
> MicroBlaze, Nios-II, LEON, etc?

The target audience for my processor is an FPGA developer who needs to implement complex functionality that tolerates latency but requires deterministic timing. Hand coding with no toolchain (verilog initial statement boot code). Simple enough to keep the processor model and current state in one's head (with room to spare). Small enough to fit in the smallest of FPGAs (with room to spare). Not meant at all to run a full-blown OS, but not a trivial processor.

> There is also a classic book on stack-based computers, you really need
> to go through this and reference it in your publication.

"Stack Computers: The New Wave" by Philip J. Koopman, Jr.? Also ran across that many years ago (thanks!). The main thrust of it seems to be the advocating of single data stack, single return stack, zero operand machines, which I feel (nothing personal) are crap. Easy to design and implement (I've made several while under the spell) but impossible to program in an efficient manner (gobs of real time wasted on stack thrash, the minimization of which leads directly to unreadable procedural coding practices, which leads to catastrophic stack faults).

> On the other hand, I had a hard time publishing an extended/long version
> of the paper as a journal paper. All three publishers were arguing
> about the existence of the conference paper, and that due to this
> fact, no journal paper version was necessary (even with ~40% material
> additions).

Hmm. The last thing I want is to have my hands tied when I'm trying to give something away for free. But my paper would likely benefit from external editorial input.

> My suggestion is to:
> a) go for the journal paper (e.g. IEEE Trans. on VLSI or ACM TECS if
> you have s'thing really modern)

My processor incorporates what I believe are a couple of new innovations (but who ever really knows?) that I'd like to get out there if possible. And I wouldn't mind a bit of personal recognition if only for my efforts.

IEEE is probably out. I fundamentally disagree with the hoarding of technical papers behind a greedy paywall.

> BTW another more recent example is the paper on the iDEA DSP soft-core
> processor:
> http://www.ntu.edu.sg/home/sfahmy/files/papers/fpt2012-cheah.pdf

Wow, very nice paper describing a very nice design, thanks!

Eric Wallin

Jun 13, 2013, 1:23:56 PM
Thanks for the response John!

> Do you also have an assembler, C++ compiler and debugger for this beast?
> You should have a reference design running on a FPGA board if you want to
> attract a following. Ideally it should also run linux.

See my response to Nikolaos above. Full-blown OS support was not the development target. But it's not a pico-blaze either. Somewhere in the middle, mainly for FPGA algorithms that can benefit from serialization.

> Why can't you do both. Post the code to opencores.org and then write a
> paper about it and publish.

That's probably the route I'll end up taking.

Theo Markettos

Jun 13, 2013, 8:44:03 PM
Eric Wallin <tammi...@gmail.com> wrote:
> Thanks for the response John!
>
> > Do you also have an assembler, C++ compiler and debugger for this beast?
> > You should have a reference design running on a FPGA board if you want to
> > attract a following. Ideally it should also run linux.
>
> See my response to Nikolaos above. Full-blown OS support was not the
> development target. But it's not a pico-blaze either. Somewhere in the
> middle, mainly for FPGA algorithms that can benefit from serialization.

Benchmarks. Tell us why we should use your processor. How does it win
compared with the alternatives?

How easy is it to program? An assembler or a C compiler are really
necessary to make something usable - LLVM may come in handy as a C compiler
toolkit, I'm not sure what's an equivalent assembler toolkit.

Actually synthesise the thing. It's hard to take seriously something that's
never actually been tested for real, especially if it makes assumptions like
having gigabytes of single-cycle-latency memory. Debug it and make sure it
works in real hardware.

> > Why can't you do both. Post the code to opencores.org and then write a
> > paper about it and publish.

If you put it on opencores, document document document. There are tons of
half-baked projects with lame or nonexistent documentation, that kind of
half work on the author's dev system but fall over in real life for one
reason or another.

Is it vendor-independent, or does it use Xilinx/Altera/etc special stuff?
If so, how easily can that be replaced with an alternative vendor?

Regression tests and test suites. How do we know it's working? Can we work
on the code and make sure we don't break anything? What does 'working' mean
in the first place?


If you're trying to make an argument in computer architecture you can get
away without some of this stuff (a research prototype can have rough edges
because it's only to prove a point, as long as you tell us what they are).
Generally you need to tell a convincing story, and either the story is that
XYZ is a useful approach to take (so we can throw away the prototype and
build something better) or XYZ is a component people should use (when it
becomes more convincing if there's more support).

Some lists of well-known conferences:
http://sites.google.com/site/calasweb/fpga-conferences-and-workshops
http://tcfpga.org/conferences.html

Good luck :)

Theo

Eric Wallin

Jun 14, 2013, 11:43:26 AM
Thanks for your response Theo!

On Thursday, June 13, 2013 8:44:03 PM UTC-4, Theo Markettos wrote:

> Benchmarks. Tell us why we should use your processor. How does it win
> compared with the alternatives?

Good point. So far I've coded a verification boot code gauntlet that it has passed, as well as restoring division and log2. If I had more code to push through it I could statistically tailor the instruction set (size the immediates, etc.) but I don't. I may at some point but I may not either. This is mainly for me, to help me to implement various projects that require complex computations in an FPGA (I currently need it for a digital Theremin that is under development), but I want to release it so others may examine and possibly use it or help me make it better, or use some of the ideas in there in their own stuff.
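
For readers who have not met restoring division, here is a minimal Python sketch of the textbook unsigned algorithm. It is not the processor code from the paper; the function name and the 32-bit default width are assumptions chosen to match the 32-bit data path mentioned in the thread.

```python
def restoring_divide(dividend, divisor, width=32):
    """Unsigned restoring division. Returns (quotient, remainder)."""
    mask = (1 << width) - 1
    dividend &= mask
    divisor &= mask
    if divisor == 0:
        raise ZeroDivisionError("divisor must be non-zero")

    remainder = 0
    quotient = 0
    # Process the dividend one bit at a time, most significant bit first.
    for bit in range(width - 1, -1, -1):
        # Shift the next dividend bit into the partial remainder.
        remainder = (remainder << 1) | ((dividend >> bit) & 1)
        # Trial subtraction of the divisor.
        trial = remainder - divisor
        if trial >= 0:
            # Subtraction succeeded: keep it and set this quotient bit.
            remainder = trial
            quotient |= 1 << bit
        # Otherwise "restore": the old partial remainder is kept unchanged.
    return quotient, remainder
```

The loop always runs `width` iterations; only the work done inside each iteration depends on the operands.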

> How easy is it to program? An assembler or a C compiler are really
> necessary to make something usable - LLVM may come in handy as a C compiler
> toolkit, I'm not sure what's an equivalent assembler toolkit.

It's fairly general purpose and I think if you read the paper you might (or might not) find it easy to understand and program by hand using verilog initial statements. My main goals were that it be simple enough to grasp without tools, complex and fast enough to do real things, have compact opcodes so BRAM isn't wasted, etc. A compiler, OS, etc. are overkill and definitely not the intended target.

There is a middle ground between trivial and full-blown processors (particularly for FPGA logical use). Of all the commercial offerings in this range that I'm aware of, my processor is probably most similar to the Parallax Propeller, which is almost certainly pipeline threaded (though they don't tell you that in the documentation). The Propeller has a video generator; character, sine, and log tables; and other stuff mine doesn't. But mine has a simpler, more unified overall concept and programming model. It is a true middle ground between register and stack machines.

> Actually synthesise the thing. It's hard to take seriously something that's
> never actually been tested for real, especially if it makes assumptions like
> having gigabytes of single-cycle-latency memory. Debug it and make sure it
> works in real hardware.

Not trying to argue from authority, but I've got 10 years of professional HDL experience, and have made several processors in the past for my own edification and had them up and running on Xilinx demo boards. This one hasn't actually run in the flesh yet, but it has gone through the build process many times and has been pretty thoroughly verified, so I would be amazed if there were any issues (famous last words). But I'll run it on a Cyclone IV board before releasing it.

> If you put it on opencores, document document document. There are tons of
> half-baked projects with lame or nonexistent documentation, that kind of
> half work on the author's dev system but fall over in real life for one
> reason or another.

I know what you mean, I never use any code directly from there. To be fair, most of the code I ran across in industry was fairly poor as well. Anyway, I've got a really nice document that took me about a month to write, with lots of drawings, tables, examples, etc. describing the design and my thoughts behind it. Even if people don't particularly like my processor they might be able to get something out of the background info in the paper (FPGA multipliers and RAM, LIFO & ALU design, pipelining, register set construction, etc.).

> Is it vendor-independent, or does it use Xilinx/Altera/etc special stuff?
> If so, how easily can that be replaced with an alternative vendor?

I was careful to not use vendor specific constructs in the verilog. The block RAM for main memory and the stacks is inferred, as are the ALU signed multipliers. I spent a long time on the modular partitioning of the code with a strong eye towards verification (as I usually do). The code was developed in Quartus, and has been compiled many, many times, but I haven't run it through XST yet.

> Regression tests and test suites. How do we know it's working? Can we work
> on the code and make sure we don't break anything? What does 'working' mean
> in the first place?

I'm probably an odd man out, but I don't agree with a lot of "standard" industry verification methodology. Test benches are fine for really complex code and / or data environments, but there is no substitute for good coding, proper modular partitioning, and thorough hand testing of each module. I've seen too many out of control projects with designers throwing things over various walls, leaving the verification up to the next guy who usually isn't familiar enough with it to really bang on the sensitive parts. And I kind of hate modelsim.

Anyone that codes should spend a lot of time verifying - I do, and for the most part really enjoy it. The industry has turned this essential activity into something most people loathe, so it just doesn't happen unless people get pushed into doing it. And even then it usually doesn't get done very thoroughly. Co-developing in environments like that is a nightmare.

Thanks for the conference lists, I'll check them out!

Theo Markettos

Jun 14, 2013, 5:29:15 PM
Eric Wallin <tammi...@gmail.com> wrote:
> Thanks for your response Theo!
>
> On Thursday, June 13, 2013 8:44:03 PM UTC-4, Theo Markettos wrote:
>
> > Benchmarks. Tell us why we should use your processor. How does it win
> > compared with the alternatives?
>
> Good point. So far I've coded a verification boot code gauntlet that it
> has passed, as well as restoring division and log2. If I had more code to
> push through it I could statistically tailor the instruction set (size the
> immediates, etc.) but I don't. I may at some point but I may not either.
> This is mainly for me, to help me to implement various projects that
> require complex computations in an FPGA (I currently need it for a digital
> Theremin that is under development), but I want to release it so others
> may examine and possibly use it or help me make it better, or use some of
> the ideas in there in their own stuff.

FWIW 'benchmarks' doesn't necessarily mean running SPECfoo at 2.7 times
quicker than a 4004, but things like 'how many instructions does it take to
write division/FFT/quicksort/whatever' compared with the leading brand. Or
how many LEs, BRAMs, mW, etc. Numbers are good (as is publishing the source
so we can reproduce them).

> > How easy is it to program? An assembler or a C compiler are really
> > necessary to make something usable - LLVM may come in handy as a C compiler
> > toolkit, I'm not sure what's an equivalent assembler toolkit.
>
> It's fairly general purpose and I think if you read the paper you might
> (or might not) find it easy to understand and program by hand using
> verilog initial statements. My main goals were that it be simple enough
> to grasp without tools, complex and fast enough to do real things, have
> compact opcodes so BRAM isn't wasted, etc. A compiler, OS, etc. are
> overkill and definitely not the intended target.

Fair enough. If you're making architectural points, you can probably get
away with assembly examples. A simple assembler is good for developer
sanity, though. Could probably be knocked up in Python reasonably fast.

> > Actually synthesise the thing. It's hard to take seriously something that's
> > never actually been tested for real, especially if it makes assumptions like
> > having gigabytes of single-cycle-latency memory. Debug it and make sure it
> > works in real hardware.
>
> Not trying to argue from authority, but I've got 10 years of professional
> HDL experience, and have made several processors in the past for my own
> edification and had them up and running on Xilinx demo boards. This one
> hasn't actually run in the flesh yet, but it has gone through the build
> process many times and has been pretty thoroughly verified, so I would be
> amazed if there were any issues (famous last words). But I'll run it on a
> Cyclone IV board before releasing it.

I'm just a bit jaded from seeing papers at conferences where somebody wrote
some verilog which they only ran in modelsim, and never had to worry about
limited BRAM, or meeting timing, or multiple clock domains, or...

> > If you put it on opencores, document document document. There are tons of
> > half-baked projects with lame or nonexistent documentation, that kind of
> > half work on the author's dev system but fall over in real life for one
> > reason or another.
>
> I know what you mean, I never use any code directly from there. To be
> fair, most of the code I ran across in industry was fairly poor as well.
> Anyway, I've got a really nice document that took me about a month to
> write, with lots of drawings, tables, examples, etc. describing the
> design and my thoughts behind it. Even if people don't particularly like
> my processor they might be able to get something out of the background
> info in the paper (FPGA multipliers and RAM, LIFO & ALU design,
> pipelining, register set construction, etc.).

This is good. Just a thought - could you angle it as 'how to do processor
design' using your processor as a case study? That makes it more of a
useful tutorial than 'buy our brand, it's great'...

> I'm probably an odd man out, but I don't agree with a lot of "standard"
> industry verification methodology. Test benches are fine for really
> complex code and / or data environments, but there is no substitute for
> good coding, proper modular partitioning, and thorough hand testing of
> each module. I've seen too many out of control projects with designers
> throwing things over various walls, leaving the verification up to the
> next guy who usually isn't familiar enough with it to really bang on the
> sensitive parts. And I kind of hate modelsim.

That's not exactly what I meant... let's say you rearrange the pipelining on
your CPU. It turns out you introduce some obscure bug that causes branches to
jump to the wrong place if there's a multiply 3 instructions back from the
branch. How would you know if you did this, and make sure it didn't happen
again? Hand testing modules won't catch that.

It's worse if there's an OS involved, of course. But it can be easy to
introduce stupid bugs when you're refactoring something, and waste a lot of
time tracking them down.

We use Bluespec so avoid modelsim ;-)
(with Jenkins so we run the test suite for every commit. A bit overkill for
your needs, perhaps)

> Anyone that codes should spend a lot of time verifying - I do, and for the
> most part really enjoy it. The industry has turned this essential
> activity into something most people loathe, so it just doesn't happen
> unless people get pushed into doing it. And even then it usually doesn't
> get done very thoroughly. Co-developing in environments like that is a
> nightmare.

I admit the tools don't always make it easy...

Theo

Eric Wallin

Jun 14, 2013, 6:49:31 PM
On Friday, June 14, 2013 5:29:15 PM UTC-4, Theo Markettos wrote:

> FWIW 'benchmarks' doesn't necessarily mean running SPECfoo at 2.7 times
> quicker than a 4004, but things like 'how many instructions does it take to
> write division/FFT/quicksort/whatever' compared with the leading brand. Or
> how many LEs, BRAMs, mW, etc. Numbers are good (as is publishing the source
> so we can reproduce them).

I have FPGA resource numbers for the Cyclone III target in the paper. Briefly, it consumes ~1800 LEs, 4 18x18 multipliers, 4 BRAMs for the stacks, plus whatever the main memory needs. This is roughly 1/3 of the smallest Cyclone III part. I have a restoring division example in the paper that gives 197 / 293 cycles best / worst case (a thread cycle is 8 200MHz clocks, but there are 8 threads running at this speed so aggregate throughput is potentially 200 MIPs if all threads are busy doing something).
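
The arithmetic behind those figures can be spelled out in a few lines of Python. This is only a sketch under stated assumptions: that each thread issues one instruction per thread cycle, and that the 197 / 293 division counts are thread cycles. The constant and function names are invented for the example.

```python
CLOCK_HZ = 200_000_000  # core clock quoted in the post
THREADS = 8             # one thread cycle = 8 core clocks


def per_thread_mips():
    # Assumption: each thread issues one instruction per thread cycle.
    return CLOCK_HZ / THREADS / 1e6


def aggregate_mips():
    # All threads busy: the per-thread rate times the number of threads.
    return per_thread_mips() * THREADS


def thread_cycles_to_ns(thread_cycles):
    # Assumption: the quoted division counts are thread cycles.
    core_clocks = thread_cycles * THREADS
    return core_clocks * 1_000_000_000 // CLOCK_HZ
```

Under these assumptions a single thread runs at 25 MIPS, and the restoring division example takes 7880 ns in the best case and 11720 ns in the worst case as seen by one thread.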

I've seen lots of papers that claim speed numbers but don't give the speed grade, or tell you what hoops they jumped through to get those speeds. Without that info the speeds are meaningless.

> Fair enough. If you're making architectural points, you can probably get
> away with assembly examples. A simple assembler is good for developer
> sanity, though. Could probably be knocked up in Python reasonably fast.

That's certainly possible. At this point I'm writing code for it directly in verilog using an initial statement text file that gets included in the main memory. Several define statements make this clearer and actually fairly easy. But uploading code to a boot loader would require something like an assembler. I'm really trying to stay away from the need for toolsets.

> This is good. Just a thought - could you angle it as 'how to do processor
> design' using your processor as a case study? That makes it more of a
> useful tutorial than 'buy our brand, it's great'...

The paper is kind of that, background and general how to, but my processor doesn't have caches, branch prediction, pipeline hazards, TLBs, etc. so people wanting to know how to do that stuff will come up totally empty.

> That's not exactly what I meant... let's say you rearrange the pipelining on
> your CPU. It turns out you introduce some obscure bug that causes branches to
> jump to the wrong place if there's a multiply 3 instructions back from the
> branch. How would you know if you did this, and make sure it didn't happen
> again? Hand testing modules won't catch that.

It's correct by construction! ;-) Seriously though, there are no hazards to speak of and very little internal state, so branches pretty much either work or they don't. Once basic functionality was confirmed in simulation, I used processor code to check the processor itself e.g. I wrote some code that checks all branches against all possible branch conditions. Each test increments a count if it passes or decrements if it fails. The final passing number can only be reached if all tests pass. I've got simple code like this to test all of the opcodes. This exercise can help give an early feel for the completeness of the instruction set as well. Verifying something like the Pentium must be one agonizingly painful mountain to climb. Verifying each silicon copy must be a bear as well.
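
The pass/fail counting scheme described above can be modelled in Python as follows. This is a sketch of the idea only: the four branch conditions and the `branch_taken` stand-in for the hardware are hypothetical, since the thread does not list the real conditions.

```python
# Each case: (condition name, test value, expected "branch taken" result).
# The condition names are invented for this example.
CASES = [
    ("z", 0, True), ("z", 1, False), ("z", -1, False),
    ("nz", 0, False), ("nz", 1, True), ("nz", -1, True),
    ("lz", 0, False), ("lz", 1, False), ("lz", -1, True),
    ("nlz", 0, True), ("nlz", 1, True), ("nlz", -1, False),
]


def branch_taken(cond, value):
    """Stand-in for the device under test."""
    if cond == "z":
        return value == 0
    if cond == "nz":
        return value != 0
    if cond == "lz":
        return value < 0
    if cond == "nlz":
        return value >= 0
    raise ValueError(f"unknown condition: {cond}")


def run_gauntlet(dut, cases=CASES):
    """Increment on every pass, decrement on every failure."""
    count = 0
    for cond, value, expected in cases:
        count += 1 if dut(cond, value) == expected else -1
    return count


def broken_branch_taken(cond, value):
    """A deliberately wrong model: 'lz' also fires on zero."""
    if cond == "lz":
        return value <= 0
    return branch_taken(cond, value)
```

Because every failure subtracts one as well as withholding the increment, the final count equals `len(CASES)` only if every single test passes, so the check reduces to comparing one number.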

Eric Wallin

Jun 15, 2013, 3:45:48 PM
So is there anything like the old Byte magazine (or a web equivalent) where enthusiasts and other non-academic, non-industry types can publish articles on computers / computing hardware?

rickman

Jun 15, 2013, 8:40:27 PM
On 6/13/2013 1:07 PM, Eric Wallin wrote:
> Thank you for your reply Nikolaos!
>
>> This reads like a "fourstack" architecture on steroids. It seems
>> good!
>
> "A Four Stack Processor" by Bernd Paysan? I ran across that paper several years ago (thanks!). Very interesting, but with multiple ALUs, access to data below the LIFO tops, TLBs, security, etc. it is much more complex than my processor. It looks like a real bear to program and manage at the lowest level.
>
>> How do you compare with more classic RISC-like soft-cores like
>> MicroBlaze, Nios-II, LEON, etc?
>
> The target audience for my processor is an FPGA developer who needs to implement complex functionality that tolerates latency but requires deterministic timing. Hand coding with no toolchain (verilog initial statement boot code). Simple enough to keep the processor model and current state in one's head (with room to spare). Small enough to fit in the smallest of FPGAs (with room to spare). Not meant at all to run a full-blown OS, but not a trivial processor.

That's the ground I have been plowing off and on for the last 10 years.


>> There is also a classic book on stack-based computers, you really need
>> to go through this and reference it in your publication.
>
> "Stack Computers: The New Wave" by Philip J. Koopman, Jr.? Also ran across that many years ago (thanks!). The main thrust of it seems to be the advocating of single data stack, single return stack, zero operand machines, which I feel (nothing personal) are crap. Easy to design and implement (I've made several while under the spell) but impossible to program in an efficient manner (gobs of real time wasted on stack thrash, the minimization of which leads directly to unreadable procedural coding practices, which leads to catastrophic stack faults).

I assume that you do understand that the point of MISC is that the
implementation can be minimized so that the instructions run faster. In
theory this makes up for the extra instructions needed to manipulate the
stack on occasion. But I understand your interest in minimizing the
inconvenience of stack ops. I spent a little time looking at
alternatives and am currently looking at a stack CPU design that allows
offsets into the stack to get around the extra stack ops. I'm not sure
how this compares to your ideas. It is still a dual stack design as I
have an interest in keeping the size of the implementation at a minimum.
1800 LEs won't even fit on the FPGAs I am targeting.


> My processor incorporates what I believe are a couple of new innovations (but who ever really knows?) that I'd like to get out there if possible. And I wouldn't mind a bit of personal recognition if only for my efforts.

I would like to hear about your innovations. As you seem to understand,
it is hard to be truly innovative finding new ideas that others have not
uncovered. But I think you are certainly in an area that is not
thoroughly explored.


> IEEE is probably out. I fundamentally disagree with the hoarding of technical papers behind a greedy paywall.

I won't argue with that. Even when I was an IEEE member, I never found
a document I didn't have to pay for.

When can we expect to see your paper?

--

Rick

Eric Wallin

Jun 15, 2013, 10:17:24 PM
Thanks for your response rickman!

On Saturday, June 15, 2013 8:40:27 PM UTC-4, rickman wrote:
> That's the ground I have been plowing off and on for the last 10 years.

Ooo, same here, and my condolences. I caught a break a couple of months ago and have been beavering away on it ever since, and I finally have something that doesn't cause me to vomit when I code for it. Multiple indexed simple stacks with explicit pointer control make everything a lot easier than a bog standard stack machine. I think the auto-consumption of literally everything, particularly the data, indexes, and pointers you dearly want to use again, is at the bottom of all the craziness people just accept with stack machines. This mechanism works great for manual data entry on HP calculators, but not so much for stack machines IMHO. Auto consumption also pretty much rules out conditional execution of single instructions.
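
The difference between auto-consumption and explicit pointer control can be illustrated with a toy Python model. This is a guess at the general idea, not a description of the actual design: the `Lifo` class, the `add` function, and the `pop_dst` / `pop_src` flags are all hypothetical names.

```python
class Lifo:
    """A simple LIFO whose reads consume the top only when asked to."""

    def __init__(self, *initial):
        self.items = list(initial)

    def push(self, value):
        self.items.append(value)

    def read(self, pop):
        # pop=True behaves like a classic stack machine (operand consumed);
        # pop=False leaves the operand in place for the next instruction.
        return self.items.pop() if pop else self.items[-1]


def add(stacks, dst, src, pop_dst, pop_src):
    """Hypothetical 2-operand ADD: dst_top + src_top is pushed onto stack dst."""
    b = stacks[src].read(pop_src)
    a = stacks[dst].read(pop_dst)
    stacks[dst].push(a + b)


# Four stacks: stack 0 holds a running sum, stack 1 holds a loop index.
stacks = [Lifo(10), Lifo(3), Lifo(), Lifo()]

# Replace the running sum with sum + index, but keep the index for reuse.
add(stacks, dst=0, src=1, pop_dst=True, pop_src=False)
```

After the call, stack 0 holds the new sum and stack 1 still holds the index. On an auto-consuming machine the index would have to be duplicated first to survive the addition.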

> I assume that you do understand that the point of MISC is that the
> implementation can be minimized so that the instructions run faster. In
> theory this makes up for the extra instructions needed to manipulate the
> stack on occasion. But I understand your interest in minimizing the
> inconvenience of stack ops. I spent a little time looking at
> alternatives and am currently looking at a stack CPU design that allows
> offsets into the stack to get around the extra stack ops. I'm not sure
> how this compares to your ideas. It is still a dual stack design as I
> have an interest in keeping the size of the implementation at a minimum.

MISC is interesting, but you have to consider that all ops, including simple stack manipulations, will generally consume as much real time as a multiply, which suddenly makes all of those confusing stack gymnastics you have to perform to dig out your loop index or whatever from underneath your read/write pointer from underneath your data and such overly burdensome.

Indexes into a moving stack - that way lies insanity. Ever hit the roll down button on an HP calculator and get instantly flummoxed? Maybe a compiler can keep track of that kind of stuff, but my weak brain isn't up to the task.

Altera BRAM doesn't go as wide as Xilinx with true dual port. When I was working in Xilinx I was able to use a single BRAM for both the data and return stacks (16 bit data).

> 1800 LEs won't even fit on the FPGAs I am targeting.

I'm not sure anything less than the smallest Cyclone 2 is really worth developing in. A lot of the stuff below that is often more expensive due to the built-in configuration memory and such. There are quite inexpensive Cyclone dev boards on eBay from China.

> I would like to hear about your innovations. As you seem to understand,
> it is hard to be truly innovative finding new ideas that others have not
> uncovered. But I think you are certainly in an area that is not
> thoroughly explored.

I haven't seen anything exactly like it, certainly not the way the stacks are implemented. And I deal with extended arithmetic results in an unusual way. In terms of scheduling and pipelining, the Parallax Propeller is probably the closest in architecture (you can infer from the specs and operational model what they don't explicitly tell you in the datasheet).

> I won't argue with that. Even when I was an IEEE member, I never found
> a document I didn't have to pay for.

I was a member too right out of grad school. But, like Janet Jackson sang: "What have they done for me lately?"

> When can we expect to see your paper?

It's all but done, just picking around the edges at this point. As soon as the code is verified to my satisfaction I'll release both and post here.

Tom Gardner

unread,
Jun 16, 2013, 5:23:01 AM6/16/13
to
Eric Wallin wrote:

> Indexes into a moving stack - that way lies insanity. Ever hit the roll down button on an HP calculator and get instantly flummoxed? Maybe a compiler can keep track of that kind of stuff, but my weak brain isn't up to the task.

Have a look at comp.arch, in particular the current discussion about the "belt" in the Mill processor. Start by watching the video.

The Mill is a radical architecture that offers far greater instruction level parallelism than existing processors, partly by having no general purpose registers.

The Mill is irrelevant to FPGA processors; it is aimed at beating x86 machines.

Eric Wallin

unread,
Jun 16, 2013, 9:58:24 AM6/16/13
to
On Sunday, June 16, 2013 5:23:01 AM UTC-4, Tom Gardner wrote:

> The Mill is irrelevant to FPGA processors...

The Mill looks vaguely interesting (if you're into billion transistor processors) but as you indicated I'm not sure how it is relevant to this thread?

Tom Gardner

unread,
Jun 16, 2013, 1:16:42 PM6/16/13
to
You wrote "Indexes into a moving stack - that way lies insanity."
The Mill's belt is effectively exactly that, and they appear not to have gone insane.

Eric Wallin

unread,
Jun 16, 2013, 8:32:56 PM6/16/13
to
On Sunday, June 16, 2013 1:16:42 PM UTC-4, Tom Gardner wrote:

> You wrote "Indexes into a moving stack - that way lies insanity."
> The Mill's belt is effectively exactly that, and they appear not to have gone insane.

I bet they would if they tried to hand code it in assembly! ;-)

The first video comment is priceless: "Gandalf?"

Tom Gardner

unread,
Jun 17, 2013, 6:26:49 AM6/17/13
to
Eric Wallin wrote:
> On Sunday, June 16, 2013 1:16:42 PM UTC-4, Tom Gardner wrote:
>
>> You wrote "Indexes into a moving stack - that way lies insanity."
>> The Mill's belt is effectively exactly that, and they appear not to have gone insane.
>
> I bet they would if they tried to hand code it in assembly! ;-)

It *is* considerably easier than hand-coding Itanium. With that
you change *any* aspect of the microarchitecture and you go back
to the beginning. How do I know? I asked someone that was doing
it to assess its performance, and decided to Run Away from
anything to do with the Itanium.


> The first video comment is priceless: "Gandalf?"

How shallow :)


rickman

unread,
Jun 19, 2013, 5:13:04 PM6/19/13
to
On 6/15/2013 10:17 PM, Eric Wallin wrote:
> Thanks for your response rickman!
>
> On Saturday, June 15, 2013 8:40:27 PM UTC-4, rickman wrote:
>> That's the ground I have been plowing off and on for the last 10 years.
>
> Ooo, same here, and my condolences. I caught a break a couple of months ago and have been beavering away on it ever since, and I finally have something that doesn't cause me to vomit when I code for it. Multiple indexed simple stacks with explicit pointer control makes everything a lot easier than a bog standard stack machine. I think the auto-consumption of literally everything, particularly the data, indexes, and pointers you dearly want to use again is at the bottom of all the crazy people just accept with stack machines. This mechanism works great for manual data entry on HP calculators, but not so much for stack machines IMHO. Auto consumption also pretty much rules out conditional execution of single instructions.

I was looking at how to improve a stack design a few months ago and came
to a similar conclusion. My first attempt at getting around the stack
ops was to use registers. I was able to write code that was both
smaller and faster since in my design all instructions are one clock
cycle so executed instruction count equals number of machine cycles.
Well, sort of. My original dual stack design was literally one clock
per instruction. In order to work with clocked block ram the register
machine would use either both phases of two clocks per machine cycle or
four clock cycles.

While pushing ideas around on paper, the J1 design gave me an idea of
adjusting the stack pointer as well as using an offset in each
instruction. That gave a design that is even faster with fewer
instructions. I'm not sure if it is practical in a small opcode. I
have been working with 8 and 9 bit opcodes, the latest approach with
stack pointer control can fit in 9 bits, but would be happier with a
couple more bits.
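
A rough Python sketch of what "an offset plus a stack pointer adjustment in each instruction" could look like. The 5/2/2 bit split, the `Instr` fields, and the meaning given to the delta are all assumptions for illustration; the post only says the scheme fits in 9 bits.

```python
from collections import namedtuple

# Hypothetical 9-bit layout: 5-bit operation, 2-bit offset, 2-bit signed delta.
Instr = namedtuple("Instr", "op offset delta")

OP_ADD = 1
OPS = {OP_ADD: lambda a, b: a + b}


def encode(instr):
    """Pack an instruction into a 9-bit integer."""
    assert 0 <= instr.op < 32 and 0 <= instr.offset < 4
    assert instr.delta in (-1, 0, 1)
    return (instr.op << 4) | (instr.offset << 2) | (instr.delta & 0b11)


def decode(word):
    """Unpack a 9-bit integer; the low two bits are a signed delta."""
    delta = word & 0b11
    if delta >= 2:
        delta -= 4
    return Instr(word >> 4, (word >> 2) & 0b11, delta)


def step(stack, instr):
    """Combine the top of stack with the item `offset` places below it."""
    result = OPS[instr.op](stack[-1], stack[-1 - instr.offset])
    if instr.delta == 1:      # grow: keep both operands, push the result
        stack.append(result)
    elif instr.delta == 0:    # same depth: overwrite the top
        stack[-1] = result
    else:                     # shrink: drop the top, overwrite the new top
        stack.pop()
        stack[-1] = result


stack = [7, 5, 2]
step(stack, decode(encode(Instr(OP_ADD, 2, 0))))   # reach two below the top
step(stack, decode(encode(Instr(OP_ADD, 1, -1))))  # behaves like a classic ADD
```

With offset 1 and delta -1 the instruction degenerates to an ordinary stack ADD; other combinations replace the PICK/ROT style shuffling a plain two-stack machine would need.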


>> I assume that you do understand that the point of MISC is that the
>> implementation can be minimized so that the instructions run faster. In
>> theory this makes up for the extra instructions needed to manipulate the
>> stack on occasion. But I understand your interest in minimizing the
>> inconvenience of stack ops. I spent a little time looking at
>> alternatives and am currently looking at a stack CPU design that allows
>> offsets into the stack to get around the extra stack ops. I'm not sure
>> how this compares to your ideas. It is still a dual stack design as I
>> have an interest in keeping the size of the implementation at a minimum.
>
> MISC is interesting, but you have to consider that all ops, including simple stack manipulations, will generally consume as much real time as a multiply, which suddenly makes all of those confusing stack gymnastics you have to perform to dig out your loop index or whatever from underneath your read/write pointer from underneath your data and such overly burdensome.

Programming to facilitate stack optimization is king on a stack machine.
I'm not sure how the multiply speed is relevant, but the real question
is just how fast does an algorithm run which has to include all the
instructions needed as well as the clock speed. Then it is also
important to consider resources used. I think you said your design uses
1800 LEs which is a *lot* more than a simple two stack design. They
aren't always available.


> Indexes into a moving stack - that way lies insanity. Ever hit the roll down button on an HP calculator and get instantly flummoxed? Maybe a compiler can keep track of that kind of stuff, but my weak brain isn't up to the task.

Then I don't know why you are designing CPUs, lol! I like RPN
calculators and have trouble using anything else. I also program in
Forth so this all works for me.


> Altera BRAM doesn't go as wide as Xilinx with true dual port. When I was working in Xilinx I was able to use a single BRAM for both the data and return stacks (16 bit data).

I expect Xilinx has some patent that Altera can't get around for a
couple more years. Lattice seems to be pretty good though. I just
would prefer to have an async read since that works in a one clock
machine cycle better.


>> 1800 LEs won't even fit on the FPGAs I am targeting.
>
> I'm not sure anything less than the smallest Cyclone 2 is really worth developing in. A lot of the stuff below that is often more expensive due to the built-in configuration memory and such. There are quite inexpensive Cyclone dev boards on eBay from China.

I don't know about dev board cost, but I can get a 1280 LUT Lattice part
for under $4 in reasonable quantity. That is the area I typically work
in. My big problem is packages. I don't want to have to use extra fine
pitch on PCBs to avoid the higher costs. BGAs require very fine via
holes and fine pitch PCB traces and run the board costs up a bit. None
of the FPGA makers support the parts I like very well. VQ100 is my
favorite, small but enough pins for most projects.


>> I would like to hear about your innovations. As you seem to understand,
>> it is hard to be truly innovative finding new ideas that others have not
>> uncovered. But I think you are certainly in an area that is not
>> thoroughly explored.
>
> I haven't seen anything exactly like it, certainly not the way the stacks are implemented. And I deal with extended arithmetic results in an unusual way. In terms of scheduling and pipelining, the Parallax Propeller is probably the closest in architecture (you can infer from the specs and operational model what they don't explicitly tell you in the datasheet).
>
>> I won't argue with that. Even when I was an IEEE member, I never found
>> a document I didn't have to pay for.
>
> I was a member too right out of grad school. But, like Janet Jackson sang: "What have they done for me lately?"

My mistake was getting involved in the local chapters. Seems IEEE is
just a good ol' boys network and is all about status and going along to
get along. They don't believe in the written rules, more so the
unwritten ones.


>> When can we expect to see your paper?
>
> It's all but done, just picking around the edges at this point. As soon as the code is verified to my satisfaction I'll release both and post here.

Ok, looking forward to it.

--

Rick

Eric Wallin

unread,
Jun 20, 2013, 10:35:30 AM6/20/13
to
On Wednesday, June 19, 2013 5:13:04 PM UTC-4, rickman wrote:

> While pushing ideas around on paper, the J1 design gave me an idea of
> adjusting the stack pointer as well as using an offset in each
> instruction. That gave a design that is even faster with fewer
> instructions. I'm not sure if it is practical in a small opcode.

Interesting. The J1 strongly influenced me as well.

> I have been working with 8 and 9 bit opcodes, the latest approach with
> stack pointer control can fit in 9 bits, but would be happier with a
> couple more bits.

I decided to stay away from non-powers of 2 widths for instructions and data. Not efficient in standard storage. Having multiple instructions per word I see now as more of a bug than a feature because you have to index into it to return from a subroutine and how / where do you store the index?
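
One possible answer to the "where do you store the index" question, sketched in Python: keep the slot number in the low bits of the return address. This is only an illustration. The four-slots-per-word packing and all the names here are assumptions, not a description of any particular processor.

```python
SLOTS_PER_WORD = 4  # hypothetical: four 8-bit opcodes packed in a 32-bit word
SLOT_BITS = 2       # bits needed to number the slots


def pack_return(word_addr, slot):
    """Fold the slot index into the low bits of the return address."""
    return (word_addr << SLOT_BITS) | slot


def unpack_return(ret):
    """Split a packed return address back into (word address, slot)."""
    return ret >> SLOT_BITS, ret & (SLOTS_PER_WORD - 1)


def next_pc(word_addr, slot):
    """Advance to the next opcode slot, rolling over to the next word."""
    slot += 1
    if slot == SLOTS_PER_WORD:
        return word_addr + 1, 0
    return word_addr, slot
```

The cost is visible right away: every return-stack entry is `SLOT_BITS` wider than a plain word address, which is exactly the kind of odd width the paragraph above is trying to avoid.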

> Programming to facilitate stack optimization is king on a stack machine.

I feel that this is a fiddly activity that wastes the programmer's time and creates code that is exceedingly difficult to figure out later.

> I'm not sure how the multiply speed is relevant, but the real question
> is just how fast does an algorithm run which has to include all the
> instructions needed as well as the clock speed.

Multiply is relevant because in a 32 bit machine it will likely be THE speed bottleneck, pulling overall timing down. FPGA vendors include non-fabric registering at the I/O of the multiply hardware to help pipeline it. Same with BRAM - reads really speed up if you use the "free" output registering (in addition to the synchronous register you are generally forced to use).

> > Indexes into a moving stack - that way lies insanity. Ever hit the roll down button on an HP calculator and get instantly flummoxed? Maybe a compiler can keep track of that kind of stuff, but my weak brain isn't up to the task.
>
> Then I don't know why you are designing CPUs, lol! I like RPN
> calculators and have trouble using anything else. I also program in
> Forth so this all works for me.

Quite the contrary, I've used HP calculators religiously since I won one in a HS engineering contest almost 30 years ago. Too bad they don't make the "real" ones anymore (35S is the best they can do it seems, maybe they lost the plans along with those of the Saturn V). But when I hit the roll down button to find a value on the stack, I have to give up on the other stack items due to confusion. I really want to like Forth, but after reading the books and being repeatedly repelled by the syntax and programming model I gave up.

My goal with CPU design was to make one simple enough to program without special tools, but complex enough to do real work and I think I've finally achieved that.

> I expect Xilinx has some patent that Altera can't get around for a
> couple more years. Lattice seems to be pretty good though. I just
> would prefer to have an async read since that works in a one clock
> machine cycle better.

I like Lattice parts too, and used the original MachXO on many boards in lieu of a CPLD.

But I gave up on single cycle along with two stacks and autoconsumption. Like you say async read BRAM is hard to come by. Single cycle is also slow and strands a bazillion FFs in the fabric.

I wonder if you've read this article:

http://spectrum.ieee.org/semiconductors/processors/25-microchips-that-shook-the-world

Moore made a lot of money off of what seem like frivolous lawsuits, which brings him down several notches in my eyes.

Tom Gardner

unread,
Jun 20, 2013, 10:51:15 AM6/20/13
to
Eric Wallin wrote:
> Quite the contrary, I've used HP calculators religiously since
> I won one in a HS engineering contest almost 30 years ago.
> Too bad they don't make the "real" ones anymore (35S is the
> best they can do it seems, maybe they lost the plans along
> with those of the Saturn V).

I'm sure HP still has the plans for the Saturn, viz
http://www.hpmuseum.org/saturn.htm

Sorry, couldn't resist.


> I really want to like Forth, but after reading the books
> and being repeatedly repelled by the syntax and programming
> model I gave up.

Nobody /writes/ Forth. They write programs that emit Forth.
The most mainstream example of that is printer drivers
emitting PostScript.

rickman

unread,
Jun 20, 2013, 7:50:09 PM6/20/13
to
On 6/20/2013 10:51 AM, Tom Gardner wrote:
> Eric Wallin wrote:
> > I really want to like Forth, but after reading the books
> > and being repeatedly repelled by the syntax and programming
> > model I gave up.

Quitter! If the syntax (or near total lack thereof) bothers you, then
you must have a very thin skin.

> Nobody /writes/ Forth. They write programs that emit Forth.
> The most mainstream example of that is printer drivers
> emitting PostScript.

LOL! I guess I was documenting something this morning rather than
writing code...

--

Rick

Elizabeth D. Rather

unread,
Jun 20, 2013, 9:31:53 PM6/20/13
to
Eric, I'm curious what books these were that you found so offensive?

I'm also baffled about your comment about "programs that emit Forth."
Although PostScript has many features in common with Forth, it is quite
different, both in terms of command set and programming model.

Modern Forths (e.g. since the release of ANS Forth 94) feature a variety
of implementation strategies, ranging from fairly conventional compilers
that generate optimized machine code to more traditional threaded code
models.

Cheers,
Elizabeth

--
==================================================
Elizabeth D. Rather (US & Canada) 800-55-FORTH
FORTH Inc. +1 310.999.6784
5959 West Century Blvd. Suite 700
Los Angeles, CA 90045
http://www.forth.com

"Forth-based products and Services for real-time
applications since 1973."
==================================================

Eric Wallin

unread,
Jun 20, 2013, 11:38:58 PM6/20/13
to
On Thursday, June 20, 2013 7:50:09 PM UTC-4, rickman wrote:
> Quitter! If the syntax (or near total lack thereof) bothers you, then
> you must have a very thin skin.

Ha ha! And I see what you did there.


On Thursday, June 20, 2013 9:31:53 PM UTC-4, Elizabeth D. Rather wrote:
> Eric, I'm curious what books these were that you found so offensive?

The books ("Starting Forth", "Thinking Forth", "Forth Programmer's Handbook") weren't themselves offensive, but they revealed Forth to be much lamer than I expected for all the stick-it-to-the-man ethos surrounding it. I was totally stoked for a stack-based language that would solve all my problems, but all I got was some books gathering dust.

> I'm also baffled about your comment about "programs that emit Forth."
> Although PostScript has many features in common with Forth, it is quite
> different, both in terms of command set and programming model.

You're looking for Tom Gardner, he's down the hall near the elevators using a little stamp at the bottom of his cane to make little chicken footprints on the floor.

Tom Gardner

unread,
Jun 21, 2013, 5:30:39 AM6/21/13
to
Elizabeth D. Rather wrote:
> I'm also baffled about your comment about "programs that emit
> Forth." Although PostScript has many features in common with
> Forth, it is quite different, both in terms of command set
> and programming model.

It was my comment, based on experience of writing
a few Forth and PostScript programs back in the mid 80s.

I know there are differences between Forth and
PostScript, but the similarities are /more/ significant.
Forth is like PostScript in the same way that Delphi
is like Pascal, C# is like Java, Scheme is like Lisp,
Objective-C is like Smalltalk (and unlike C++) etc.

And, more importantly, Forth/PostScript is unlike C,
is unlike Prolog, is unlike Lisp, is unlike Smalltalk
is unlike Pascal.

I was beguiled by Forth and to some extent remain so:
I'd /love/ to find a good justification to use it again,
and might do so, just for the hell of it.

I'm sure Forth has moved on since the 80s, but it cannot
escape its ethos any more than C, Prolog, Lisp,
Smalltalk can. Or rather, if it did change then it wouldn't
be Forth anymore.


> Modern Forths (e.g. since the release of ANS Forth 94)
> feature a variety of implementation strategies, ranging
> from fairly conventional compilers that generate
> optimized machine code to more traditional threaded code models.

I'm sure that's true, but it misses the point.
Interpreted Java is the same as jitted Java is
the same as Hotspotted Java.

Ditto Forth; if it wasn't then it simply wouldn't be Forth!

I suppose I ought to change "nobody writes Forth" to
"almost nobody writes Forth".

I would, however, be interested to know whether Forth
has a defined memory model, since that's a _necessary_
_precondition_ to be able to write portable multithreaded
code that runs on multicore processors. Hell's teeth,
even C has now finally decided that they need to define
a memory model, only 20 years after those With Clue
knew it was required. (I don't think there are any
implementations though!)

David Brown

unread,
Jun 21, 2013, 7:18:32 AM6/21/13
to
On 21/06/13 11:30, Tom Gardner wrote:

> I suppose I ought to change "nobody writes Forth" to
> "almost nobody writes Forth".
>

Shouldn't that be "almost nobody Forth writes" ?

I wonder if Yoda programs in Forth...


(I too would like an excuse to work with Forth again.)

Tom Gardner

unread,
Jun 21, 2013, 10:25:55 AM6/21/13
to
David Brown wrote:
> On 21/06/13 11:30, Tom Gardner wrote:
>
>> I suppose I ought to change "nobody writes Forth" to
>> "almost nobody writes Forth".
>>
>
> Shouldn't that be "almost nobody Forth writes" ?

:)

Still use my HP calculators in preference to
algebraic and half-algebraic calculators!

Eric Wallin

unread,
Jun 22, 2013, 12:44:29 AM6/22/13
to
On Friday, June 21, 2013 10:25:55 AM UTC-4, Tom Gardner wrote:

> Still use my HP calculators in preference to
> algebraic and half-algebraic calculators!

Same here! When manually calculating, give me HP (or similar) or give me death. But Forth kind of sucks (and it pains me deeply to have discovered that). The failure (as I see it) of Forth is the awkward target programming model (or virtual machine as the kids say these days - virtual anything is sexy).

There's a pretty big gulf between manual data & operation entry on something like an HP (or any hand held calculator) and programming with Forth (or any language). A single stack very much facilitates the former but places handcuffs on the latter. These are two fairly different activities that IMO don't exactly have boatloads of overlap, however much we might want or personally need for it to be so.

rickman

unread,
Jun 22, 2013, 1:07:12 AM6/22/13
to
I just saw that episode a few weeks ago. The chicken contest was pretty
cool actually.

I actually identify with House. Not that I am as smart as he is, but I
have a bad hip (waiting for Obamacare to kick in so I can get a new one)
and have a natural tendency to tick off people unless I work to rein it
in. :]

--

Rick

rickman

unread,
Jun 22, 2013, 1:20:13 AM6/22/13
to
On 6/21/2013 5:30 AM, Tom Gardner wrote:
> Elizabeth D. Rather wrote:
> > I'm also baffled about your comment about "programs that emit
> > Forth." Although PostScript has many features in common with
> > Forth, it is quite different, both in terms of command set
> > and programming model.
>
> It was my comment, based on experience of writing
> a few Forth and PostScript programs back in the mid 80s.
>
> I know there are differences between Forth and
> PostScript, but the similarities are /more/ significant.
> Forth is like PostScript in the same way that Delphi
> is like Pascal, C# is like Java, Scheme is like Lisp,
> Objective-C is like Smalltalk (and unlike C++) etc.
>
> And, more importantly, Forth/PostScript is unlike C,
> is unlike Prolog, is unlike Lisp, is unlike Smalltalk
> is unlike Pascal.
>
> I was beguiled by Forth and to some extent remain so:
> I'd /love/ to find a good justification to use it again,
> and might do so, just for the hell of it.

If you ever need to control hardware, you will want it. The
interactivity is great.
Actually, when doing things in real time the interactivity sort of goes
away (or at least the utility of it) since you can't type fast enough to
control a robot or intercept a serial port running at 38.4 kbps. But
you can very easily test small portions of your code in ways that are
tricky in C or the other languages you mention. Then those ideas can be
applied to any app... even if not real time.


> I'm sure Forth has moved on since the 80s, but it cannot
> escape its ethos any more than C, Prolog, Lisp,
> Smalltalk can. Or rather, if it did change then it wouldn't
> be Forth anymore.

Can *any* of us ever escape our ethos? I know I've been trying for many
a year and it is still right here by my side, aren't you Ethos?
Actually, that would be a good name for a couple of dogs, Ethos and
Pathos, Chesapeake Bay retrievers.


> > Modern Forths (e.g. since the release of ANS Forth 94)
> > feature a variety of implementation strategies, ranging
> > from fairly conventional compilers that generate
> > optimized machine code to more traditional threaded code models.
>
> I'm sure that's true, but it misses the point.
> Interpreted Java is the same as jitted Java is
> the same as Hotspotted Java.
>
> Ditto Forth; if it wasn't then it simply wouldn't be Forth!
>
> I suppose I ought to change "nobody writes Forth" to
> "almost nobody writes Forth".

So now I've been demoted to "almost" nobody? What do I have to do to
work my way *up* to nobody?


> I would, however, be interested to know whether Forth
> has a defined memory model, since that's a _necessary_
> _precondition_ to be able to write portable multithreaded
> code that runs on multicore processors. Hell's teeth,
> even C has now finally decided that they need to define
> a memory model, only 20 years after those With Clue
> knew it was required. (I don't think there are any
> implementations though!)

Uh, can *someone* who knows something answer that one?

Do you have a lot of call to program multithreaded code for multicore
processors? What sort of apps are you coding?

--

Rick

rickman

unread,
Jun 22, 2013, 1:23:18 AM6/22/13
to
On 6/21/2013 7:18 AM, David Brown wrote:
> On 21/06/13 11:30, Tom Gardner wrote:
>
>> I suppose I ought to change "nobody writes Forth" to
>> "almost nobody writes Forth".
>>
>
> Shouldn't that be "almost nobody Forth writes" ?

I would say that was "nobody almost Forth writes". Wouldn't it be [noun
[adjective] [noun [adjective]]] verb?

> (I too would like an excuse to work with Forth again.)

What do you do instead?

--

Rick

rickman

unread,
Jun 22, 2013, 1:26:53 AM6/22/13
to
On 6/22/2013 12:44 AM, Eric Wallin wrote:
> On Friday, June 21, 2013 10:25:55 AM UTC-4, Tom Gardner wrote:
>
>> Still use my HP calculators in preference to
>> algebraic and half-algebraic calculators!
>
> Same here! When manually calculating, give me HP (or similar) or give me death. But Forth kind of sucks (and it pains me deeply to have discovered that). The failure (as I see it) of Forth is the awkward target programming model (or virtual machine as the kids say these days - virtual anything is sexy).
>
> There's a pretty big gulf between manual data & operation entry on something like an HP (or any hand held calculator) and programming with Forth (or any language). A single stack very much facilitates the former but places handcuffs on the latter. These are two fairly different activities that IMO don't exactly have boatloads of overlap, however much we might want or personally need for it to be so.

I won't argue with any of that, or I might...

So why are you here exactly? I'm not saying you shouldn't be here or
that you shouldn't be saying what you are saying. But given how you
feel about Forth, I'm just curious why you want to have the conversation
you are having? Are you exploring your inner curmudgeon?

--

Rick

Tom Gardner

unread,
Jun 22, 2013, 4:34:46 AM6/22/13
to
rickman wrote:
> On 6/21/2013 5:30 AM, Tom Gardner wrote:
>> Elizabeth D. Rather wrote:
>> > I'm also baffled about your comment about "programs that emit
>> > Forth." Although PostScript has many features in common with
>> > Forth, it is quite different, both in terms of command set
>> > and programming model.
>>
>> I was beguiled by Forth and to some extent remain so:
>> I'd /love/ to find a good justification to use it again,
>> and might do so, just for the hell of it.
>
> If you ever need to control hardware, you will want it. The interactivity is great.

The first time I wished I had Forth was when manually
testing prototype hardware with a z80-class processor
c1983. Forth would have been ideal, and significantly
better than the rubbish I threw together.

That experience has shaped my views of "domain specific
languages" vs libraries to this day.

> Actually, when doing things in real time the interactivity sort of goes away (or at least the utility of it) since you can't type fast enough to control a robot or intercept a serial port running at
> 38.4 kbps. But you can very easily test small portions of your code in ways that are tricky in C or the other languages you mention. Then those ideas can be applied to any app... even if not real time.

Just so. It seems we are in violent agreement.

One also has to consider the tools that other people
are familiar with; changing away from them requires
very significant advantages, and Forth just doesn't
have those.


>> I suppose I ought to change "nobody writes Forth" to
>> "almost nobody writes Forth".
>
> So now I've been demoted to "almost" nobody? What do I have to do to work my way *up* to nobody?

Why are you assuming "demoted"? Is a screw above or
below a nail in some imagined hierarchy?


>> I would, however, be interested to know whether Forth
>> has a defined memory model, since that's a _necessary_
>> _precondition_ to be able to write portable multithreaded
>> code that runs on multicore processors. Hell's teeth,
>> even C has now finally decided that they need to define
>> a memory model, only 20 years after those With Clue
>> knew it was required. (I don't think there are any
>> implementations though!)
>
> Uh, can *someone* who knows something answer that one?
>
> Do you have a lot of call to program multithreaded code for multicore processors? What sort of apps are you coding?

Even cellphones have dual and quad processors nowadays.

And, of course, there are outliers like the Parallax
Propeller.

Multicore will become the norm in the near future; the
only constraint in embedded systems will be memory bandwidth.

Tom Gardner

unread,
Jun 22, 2013, 4:36:22 AM6/22/13
to
rickman wrote:
> I won't argue with any of that, or I might...
>
> So why are you here exactly? I'm not saying you shouldn't be here or that you shouldn't be saying what you are saying. But given how you feel about Forth, I'm just curious why you want to have the
> conversation you are having? Are you exploring your inner curmudgeon?

(a) because it is fun
(b) to gain a personal understanding of the advantages of
nails and screws, and when each should/shouldn't be used


rickman

unread,
Jun 22, 2013, 10:17:29 AM6/22/13
to
On 6/22/2013 4:34 AM, Tom Gardner wrote:
> rickman wrote:
>> On 6/21/2013 5:30 AM, Tom Gardner wrote:
>>> Elizabeth D. Rather wrote:
>>> > I'm also baffled about your comment about "programs that emit
>>> > Forth." Although PostScript has many features in common with
>>> > Forth, it is quite different, both in terms of command set
>>> > and programming model.
>>>
>>> I was beguiled by Forth and to some extent remain so:
>>> I'd /love/ to find a good justification to use it again,
>>> and might do so, just for the hell of it.
>>
>> If you ever need to control hardware, you will want it. The
>> interactivity is great.
>
> The first time I wished I had Forth was when manually
> testing prototype hardware with a z80-class processor
> c1983. Forth would have been ideal, and significantly
> better than the rubbish I threw together.
>
> That experience has shaped my views of "domain specific
> languages" vs libraries to this day.

I'm not sure what that implies, but I'll go with it.


>> Actually, when doing things in real time the interactivity sort of
>> goes away (or at least the utility of it) since you can't type fast
>> enough to control a robot or intercept a serial port running at
>> 38.4 kbps. But you can very easily test small portions of your code in
>> ways that are tricky in C or the other languages you mention. Then
>> those ideas can be applied to any app... even if not real time.
>
> Just so. It seems we are in violent agreement.

Not *too* violent. I don't have guns or anything like that.


> One also has to consider the tools that other people
> are familiar with; changing away from them requires
> very significant advantages, and Forth just doesn't
> have those.

I'm not sure what tools you mean. Sometimes I miss using a debugger
where I can step through my code. Actually I bet Win32Forth has that
somewhere, but the docs are not so great and I am likely missing some
95% of what is in it. Still, debuggers have their limits, notably they
don't do a great job of getting you in the vicinity of the bug so you
can step through the code to find it.

My willingness to give up debuggers and "go with the Forth" was largely
prompted by a statement by Jeff Fox. I don't recall the context exactly
but I was talking about the difficulties of tracking down stack
underflows. My context was thinking like I was coding in C where I
would write a bunch of code and then start in on the bugs often adding
new ones in the process. Jeff pointed out that in Forth, if you have a
stack mismatch it shows that *you can't count*. Even though (or perhaps
because) it was so simple, that really struck me. If I can't write a
Forth word that balances the stack effects, it means I can't count to
three or four. Counting to three or four is a lot easier than using a
debugger!
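
The "counting" can be written down mechanically. The sketch below, in Python for illustration, tallies stack effects for a handful of standard Forth words. The `EFFECTS` table covers only these six words and `stack_effect` is a made-up helper, not part of any Forth system.

```python
# (items consumed, items produced) for a few standard Forth words
EFFECTS = {
    "DUP": (1, 2),
    "DROP": (1, 0),
    "SWAP": (2, 2),
    "OVER": (2, 3),
    "+": (2, 1),
    "*": (2, 1),
}


def stack_effect(words):
    """Return (items required on entry, items left on exit) for a word list."""
    depth = 0    # running depth relative to the depth on entry
    lowest = 0   # deepest point reached below the entry depth
    for w in words:
        consumed, produced = EFFECTS[w]
        depth -= consumed
        lowest = min(lowest, depth)
        depth += produced
    required = -lowest
    return required, required + depth
```

For example, `stack_effect(["DUP", "*"])` gives `(1, 1)`, matching the usual stack comment `( n -- n*n )` for a squaring word. If the pair does not match the stack comment you intended, the word is unbalanced.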


>>> I suppose I ought to change "nobody writes Forth" to
>>> "almost nobody writes Forth".
>>
>> So now I've been demoted to "almost" nobody? What do I have to do
>> to work my way *up* to nobody?
>
> Why are you assuming "demoted"? Is a screw above or
> below a nail in some imagined hierarchy?

Context is everything. I have no idea what the context of screw v nail
is about.


>>> I would, however, be interested to know whether Forth
>>> has a defined memory model, since that's a _necessary_
>>> _precondition_ to be able to write portable multithreaded
>>> code that runs on multicore processors. Hell's teeth,
>>> even C has now finally decided that they need to define
>>> a memory model, only 20 years after those With Clue
>>> knew it was required. (I don't think there are any
>>> implementations though!)
>>
>> Uh, can *someone* who knows something answer that one?
>>
>> Do you have a lot of call to program multithreaded code for multicore
>> processors? What sort of apps are you coding?
>
> Even cellphones have dual and quad processors nowadays.
>
> And, of course, there are outliers like the Parallax
> Propeller.
>
> Multicore will become the norm in the near future; the
> only constraint in embedded systems will be memory bandwidth.

So which are you writing code for? Or is this just a theoretical
discussion?

As to multicore being the norm... well, the last toaster I bought does
seem to have a micro in it, *really*. It's a four slice unit, each half
has three little buttons for bagel, frozen and reheat, one button for
cancel and a darkness knob. You have up to five seconds after pushing
down the handle to make your selections. I'll bet you anything this has
a GA4 in it! I can't imagine doing this without a multicore
processor... oh, wait, that is programmed in Forth isn't it... what is
that memory model again?

--

Rick

chrisabele

Jun 22, 2013, 10:19:55 AM
On 6/22/2013 1:20 AM, rickman wrote:
>...
> Actually, that would be a good name for a couple of dogs, Ethos and
> Pathos, Chesapeake Bay retrievers.
>...

What happened to the third dog, Logos?

A funny omission, given that this group is primarily logic related (only
secondarily philosophy related).

Chris

David Brown

Jun 22, 2013, 11:21:14 AM
On 22/06/13 07:23, rickman wrote:
> On 6/21/2013 7:18 AM, David Brown wrote:
>> On 21/06/13 11:30, Tom Gardner wrote:
>>
>>> I suppose I ought to change "nobody writes Forth" to
>>> "almost nobody writes Forth".
>>>
>>
>> Shouldn't that be "almost nobody Forth writes" ?
>
> I would say that was "nobody almost Forth writes". Wouldn't it be [noun
> [adjective] [noun [adjective]]] verb?

I thought about that, but I was not sure. When I say "work with Forth
again", I have only "played" with Forth, not "worked" with it, and it
was a couple of decades ago.

>
>> (I too would like an excuse to work with Forth again.)
>
> What do you do instead?
>

I do mostly small-systems embedded programming, which is mostly in C.
It used to include a lot more assembly, but that's quite rare now
(though it is not uncommon to have to make little snippets in assembly,
or to study compiler-generated assembly), and perhaps in the future it
will include more C++ (especially with C++11 features). I also do
desktop and server programming, mostly in Python, and I have done a bit
of FPGA work (but not for a number of years).

I don't think of Forth as being a suitable choice of language for the
kind of systems I work with - but I do think it would be fun to work
with the kind of systems for which Forth is the best choice. However, I
suspect that is unlikely to happen in practice. (Many years ago, my
company looked at a potential project for which Atmel's Marc-4
processors were a possibility, but that's the nearest I've come to Forth
at work.)

I just think it's fun to work with different types of language - it
gives you a better understanding of programming in general, and new
ideas of different ways to handle tasks.

rickman

Jun 22, 2013, 12:18:55 PM
On 6/22/2013 11:21 AM, David Brown wrote:
> On 22/06/13 07:23, rickman wrote:
>> On 6/21/2013 7:18 AM, David Brown wrote:
>>> On 21/06/13 11:30, Tom Gardner wrote:
>>>
>>>> I suppose I ought to change "nobody writes Forth" to
>>>> "almost nobody writes Forth".
>>>>
>>>
>>> Shouldn't that be "almost nobody Forth writes" ?
>>
>> I would say that was "nobody almost Forth writes". Wouldn't it be [noun
>> [adjective] [noun [adjective]]] verb?
>
> I thought about that, but I was not sure. When I say "work with Forth
> again", I have only "played" with Forth, not "worked" with it, and it
> was a couple of decades ago.

Hey, it's not like this is *real* forth. But looking at how some Forth
code works for things like assemblers and my own projects, the data is
dealt with first starting with some sort of a noun type piece of data
(like a register) which may be modified by an adjective (perhaps an
addressing mode) followed by others, then the final verb to complete the
action (operation).


>>> (I too would like an excuse to work with Forth again.)
>>
>> What do you do instead?
>>
>
> I do mostly small-systems embedded programming, which is mostly in C. It
> used to include a lot more assembly, but that's quite rare now (though
> it is not uncommon to have to make little snippets in assembly, or to
> study compiler-generated assembly), and perhaps in the future it will
> include more C++ (especially with C++11 features). I also do desktop and
> server programming, mostly in Python, and I have done a bit of FPGA work
> (but not for a number of years).

Similar to myself, but with the opposite emphasis. I mostly do hardware
and FPGA work; embedded programming has been rare for me for some years.

I think Python is the language a customer recommended to me. He said
that some languages are good for this or good for that, but Python
incorporates a lot of the various features that make it good for most
things. They write code running under Linux on IP chassis. I think
they use Python a lot.


> I don't think of Forth as being a suitable choice of language for the
> kind of systems I work with - but I do think it would be fun to work
> with the kind of systems for which Forth is the best choice. However, I
> suspect that is unlikely to happen in practice. (Many years ago, my
> company looked at a potential project for which Atmel's Marc-4
> processors were a possibility, but that's the nearest I've come to Forth
> at work.)

So why can't you consider Forth for processors that aren't stack based?


> I just think it's fun to work with different types of language - it
> gives you a better understanding of programming in general, and new
> ideas of different ways to handle tasks.

I'm beyond "playing" in this stuff and I don't mean "playing" in a
derogatory way, I mean I just want to get my work done. I'm all but
retired and although some of my projects are not truly profit motivated,
I want to get them done with a minimum of fuss. I look at the tools
used to code in C on embedded systems and it scares me off really,
especially the open source ones that require you to learn so much before
you can become productive or even get the "hello world" program to work.
That's why I haven't done anything with the rPi or the Beagle Boards.

--

Rick

Eric Wallin

Jun 22, 2013, 12:26:51 PM
On Saturday, June 22, 2013 1:26:53 AM UTC-4, rickman wrote:

> So why are you here exactly? I'm not saying you shouldn't be here or
> that you shouldn't be saying what you are saying. But given how you
> feel about Forth, I'm just curious why you want to have the conversation
> you are having? Are you exploring your inner curmudgeon?

Just venting at the industry, and procrastinating a bit (putting off the final verification of the processor). Apparently OT is my favorite subject as it seems I'm always busy derailing my own (and others) threads. That, and Y'all have very interesting takes on these and various and sundry other things.

Backing up a bit, it strikes me as a bit crazy to make a language based on the concept of a weird target processor. I mean, I get the portability thing, but at what cost? If my experience as a casual user (not programmer) of Java on my PC is any indication (data point of one, the plural of anecdote isn't data, etc.), the virtual stack-based processor paradigm has failed, as the constant updates, security issues, etc. pretty much forced me to uninstall it. And I would think that a language targeting a processor model that is radically different than the physically underlying one would be terribly inefficient unless the compiler can do hand stands while juggling spinning plates on fire - even if it is, god knows what it spits out. Canonical stack processors and their languages (Forth, Java, Postscript) at this point seem to be hanging by a legacy thread (even if every PC runs one peripherally at one time or another).

I suspect that multiple independent equal bandwidth threads (as I strongly suspect the Propeller has, and my processor definitely has) is such a natural construct - it fully utilizes the HW pipeline by eliminating all hazards, bubbles, stalls, branch prediction, etc. and uses the interstage registering for data and control value storage - that it will come more into common usage as compilers better adapt to multi-cores and threads. Then again, the industry never met a billion transistor bizarro world processor it didn't absolutely love, so what do I know?

I find it exceedingly odd that the PC industry is still using x86 _anything_ at this point. Apple showed us you can just dump your processor and switch horses in midstream pretty much whenever you feel like it (68k => PowerPC => x86) and not torch your product line / lose your customer base. I suppose having Intel and MS go belly up overnight is beyond the pale and at the root of why we can't have nice things. I remember buying my first 286, imagining all the wonderful projects it would enable, and then finding out what complete dogs the processor and OS were - it was quite disillusioning for the big boys to sell me a lump of shit like that (and for a lot more than 3 farthings).

Eric Wallin

Jun 22, 2013, 12:36:20 PM
Why is it that every "serious" processor is a ball of cruft with multiple bags on the side, some much worse than others, but none even a mother could love? Is it because overly complex compilers have made HW developers disillusioned / complacent?

I honestly don't get how we got here from there. If it's anything, engineering is an exercise in complexity management, but processor design is somehow flying under the radar.

Tom Gardner

Jun 22, 2013, 12:56:24 PM
Eric Wallin wrote:
> Why is it that every "serious" processor is a ball of cruft with multiple bags on the side, some much worse than others, but none even a mother could love? Is it because overly complex compilers have made HW developers disillusioned / complacent?

1) an adherence to the von Neumann concepts of what a processor should be
2) processing models that worked well given previous technology limits,
but which don't scale to modern technology limits (e.g. processor/memory
speed ratio, number of IC pins, cache coherence)
3) backwards compatibility (*the* dominant commercial consideration)



> I honestly don't get how we got here from there. If it's anything, engineering is an exercise in complexity management, but processor design is somehow flying under the radar.

1) look at The Mill on comp.arch, for one beguiling possibility
2) message passing between processors/threads executing on
multiprocessor machines with non-coherent memories


Tom Gardner

Jun 22, 2013, 1:17:18 PM
Eric Wallin wrote:
> On Saturday, June 22, 2013 1:26:53 AM UTC-4, rickman wrote:
>
>> So why are you here exactly? I'm not saying you shouldn't be here or
>> that you shouldn't be saying what you are saying. But given how you
>> feel about Forth, I'm just curious why you want to have the conversation
>> you are having? Are you exploring your inner curmudgeon?
>
> Just venting at the industry, and procrastinating a bit (putting off the final verification of the processor). Apparently OT is my favorite subject as it seems I'm always busy derailing my own (and others) threads. That, and Y'all have very interesting takes on these and various and sundry other things.
>
> Backing up a bit, it strikes me as a bit crazy to make a
> language based on the concept of a weird target processor.
> I mean, I get the portability thing, but at what cost?
> If my experience as a casual user (not programmer) of
> Java on my PC is any indication (data point of one,
> the plural of anecdote isn't data, etc.), the virtual stack-based
> processor paradigm has failed, as the constant updates, security
> issues, etc. pretty much forced me to uninstall it.

None of those real problems are related to "virtual stack-based
processor paradigm".



> And I would think that a language targeting a processor model
> that is radically different than the physically underlying one
> would be terribly inefficient unless the compiler can do hand
> stands while juggling spinning plates on fire - even if it is,
> god knows what it spits out.

That is just about what HotSpot does :) As L Peter Deutsch remarked
when people had similar reservations about the first (Smalltalk) JIT
in the mid 80s, if you can't tell the difference externally, the
internals don't matter.

Now for some fun: how emulated processors can be faster than
native processors *even when both processors are the same*
- take processor X executing cpu intensive C benchmarks
compiled with optimisation on, and measure speed S1
- write an emulator for processor X and run that emulator
on processor X
- run those C benchmarks in the emulator, see what the
code is actually doing (as opposed to what the compiler
dared not assume)
- use that knowledge to "patch the optimised binaries"
- run the patched binaries in the emulator, and measure
speed S2
- note that S2 can be faster than S1
http://archive.arstechnica.com/reviews/1q00/dynamo/dynamo-1.html



> Canonical stack processors and their languages (Forth, Java,
> Postscript) at this point seem to be hanging by a legacy thread
> (even if every PC runs one peripherally at one time or another).

That's a bizarre assertion.

> I find it exceedingly odd that the PC industry is still using
> x86 _anything_ at this point.

Backwards compatibility is the dominant commercial imperative:
don't inconvenience your existing customers. (Windows 8? Tee hee)

> Apple showed us you can just dump your processor and switch
> horses in midstream pretty much whenever you feel like it
> (68k => PowerPC => x86) and not torch your product line /
> lose your customer base.

Only given preconditions that don't apply in the Wintel world.

Tom Gardner

Jun 22, 2013, 1:27:11 PM
David Brown wrote:

> I do mostly small-systems embedded programming, which is mostly in C. It used to include a lot more assembly, but that's quite rare now (though it is not uncommon to have to make little snippets in
> assembly, or to study compiler-generated assembly), and perhaps in the future it will include more C++ (especially with C++11 features). I also do desktop and server programming, mostly in Python,
> and I have done a bit of FPGA work (but not for a number of years).

If C++ is the answer, I want to know what the question is.

How many *years* does it take before the first commercial
implementation of a C++ standard becomes available? Yes, I
know partial implementations become available pretty quickly.

Stroustrup's tome describing "this is what I meant by the
various bits of C++" started out at ~400 pages and is now
around 1300 pages. 'Nuff said.

Don't forget that it is possible to get the compiler to
emit the sequence of prime numbers during the (unterminating)
compilation process. The language designers didn't realise
they had created such a monster until it was demonstrated
to them!
<http://en.wikibooks.org/wiki/C%2B%2B_Programming/Templates/Template_Meta-Programming#History_of_TMP>



> I just think it's fun to work with different types of language - it gives you a better understanding of programming in general, and new ideas of different ways to handle tasks.

Very true. Choose the right tool for the task at hand.


rickman

Jun 22, 2013, 2:08:20 PM
On 6/22/2013 12:26 PM, Eric Wallin wrote:
> On Saturday, June 22, 2013 1:26:53 AM UTC-4, rickman wrote:
>
>> So why are you here exactly? I'm not saying you shouldn't be here or
>> that you shouldn't be saying what you are saying. But given how you
>> feel about Forth, I'm just curious why you want to have the conversation
>> you are having? Are you exploring your inner curmudgeon?
>
> Just venting at the industry, and procrastinating a bit (putting off the final verification of the processor). Apparently OT is my favorite subject as it seems I'm always busy derailing my own (and others) threads. That, and Y'all have very interesting takes on these and various and sundry other things.

I am cc'ing this to the forth group in case anyone there cares to join
in. I'm still a novice at the language so I only can give you my take
on things.


> Backing up a bit, it strikes me as a bit crazy to make a language based on the concept of a weird target processor. I mean, I get the portability thing, but at what cost? If my experience as a casual user (not programmer) of Java on my PC is any indication (data point of one, the plural of anecdote isn't data, etc.), the virtual stack-based processor paradigm has failed, as the constant updates, security issues, etc. pretty much forced me to uninstall it. And I would think that a language targeting a processor model that is radically different than the physically underlying one would be terribly inefficient unless the compiler can do hand stands while juggling spinning plates on fire - even if it is, god knows what it spits out. Canonical stack processors and their languages (Forth, Java, Postscript) at this point seem to be hanging by a legacy thread (even if every PC runs one peripherally at one time or another).

Off the topic at hand, here is one of thunderbird's many issues as a
news reader. It displays messages just fine in the reading window, but
in the edit window all of your quoted paragraphs show as single lines
going far off the right side of the screen. I have to switch back and
forth to read the text I am replying to!

Back to the discussion...

By weird target processor you mean the virtual machine? That is because
it is a very simple model. It does seem odd that such a model would be
adopted, but the use of the stack makes for a very simple parameter
passing method supported by very simple language features. There is no
need for syntax other than spaces. That is *very* powerful and allows
the tool to be kept very small.

Chuck Moore is all about simplicity and this is how he got this level of
simplicity in the language.


> I suspect that multiple independent equal bandwidth threads (as I strongly suspect the Propeller has, and my processor definitely has) is such a natural construct - it fully utilizes the HW pipeline by eliminating all hazards, bubbles, stalls, branch prediction, etc. and uses the interstage registering for data and control value storage - that it will come more into common usage as compilers better adapt to multi-cores and threads. Then again, the industry never met a billion transistor bizarro world processor it didn't absolutely love, so what do I know?

So what clock speeds does your processor achieve? It is an interesting
idea to pipeline everything and then treat the one processor as N
processors running in parallel. I think you have mentioned that here
before and I seem to recall taking a quick look at the idea some time
back. It fits well with many of the features available in FPGAs and
likely would do ok in an ASIC. I just would not have much need for it
in most of the things I am looking at doing.

Rather than N totally independent processors, have you considered using
pipelining to implement SIMD? This could get around some of the
difficulties in the N wide processor like memory bandwidth.


> I find it exceedingly odd that the PC industry is still using x86 _anything_ at this point. Apple showed us you can just dump your processor and switch horses in midstream pretty much whenever you feel like it (68k => PowerPC => x86) and not torch your product line / lose your customer base. I suppose having Intel and MS go belly up overnight is beyond the pale and at the root of why we can't have nice things. I remember buying my first 286, imagining all the wonderful projects it would enable, and then finding out what complete dogs the processor and OS were - it was quite disillusioning for the big boys to sell me a lump of shit like that (and for a lot more than 3 farthings).

You know why the x86 is still in use. It is not really that bad in
relation to the other architectures when measured objectively. It may
not be the best, but there is a large investment, mostly by Intel. If
Intel doesn't change why would anyone else? But that is being eroded by
the ARM processors in the handheld market. We'll see if Intel can
continue to adapt the x86 to low power and maintain a low cost.

I don't think MS is propping up the x86. They offer a version of
Windows for the ARM don't they? As you say, there is a bit of processor
specific code but the vast bulk of it is just a matter of saying ARMxyz
rather than X86xyz. Developers are another matter. Not many want to
support yet another target, period. If the market opens up for Windows
on ARM devices then that can change. In the mean time it will be
business as usual for desktop computing.

--

Rick

Eric Wallin

Jun 22, 2013, 4:11:28 PM
On Saturday, June 22, 2013 2:08:20 PM UTC-4, rickman wrote:

> So what clock speeds does your processor achieve? It is an interesting
> idea to pipeline everything and then treat the one processor as N
> processors running in parallel. I think you have mentioned that here
> before and I seem to recall taking a quick look at the idea some time
> back. It fits well with many of the features available in FPGAs and
> likely would do ok in an ASIC. I just would not have much need for it
> in most of the things I am looking at doing.

The core will do ~200 MHz in the smallest Cyclone 3 or 4 speed grade 8 (the cheapest and slowest). It looks to the outside world like 8 independent processors (threads) running at 25 MHz, each with its own independent interrupt. Internally each thread has 4 private general purpose stacks that are each 32 entries deep, but all threads fully share main memory (combined instruction/data).
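
The arithmetic behind the "8 processors at 25 MHz" view can be sketched as below. This is a toy model in Python, not the Verilog core: it assumes strict round-robin issue (one thread per clock, chosen purely by cycle number), and CLOCK_HZ, THREADS, and issue_counts are hypothetical names used only here.

```python
CLOCK_HZ = 200_000_000   # approximate core clock quoted above
THREADS = 8              # hardware threads sharing the pipeline

def issue_counts(cycles, threads=THREADS):
    """Count the issue slots each thread gets under strict round-robin."""
    counts = [0] * threads
    for cycle in range(cycles):
        counts[cycle % threads] += 1   # thread chosen purely by cycle number
    return counts

counts = issue_counts(8_000)
per_thread_hz = CLOCK_HZ // THREADS
print(counts)          # every thread gets 1000 of the 8000 slots
print(per_thread_hz)   # 25000000
```

Under that assumption every thread sees exactly one slot in eight, so no thread can starve another and the per-thread rate is simply the core clock divided by the thread count.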

> Rather than N totally independent processors, have you considered using
> pipelining to implement SIMD? This could get around some of the
> difficulties in the N wide processor like memory bandwidth.

I haven't given this very much thought. But different cores could simultaneously work on different byte fields in a word in main memory so I'm not sure HW SIMD support is all that necessary.

This is just a small FPGA core, not an x86 killer. Though it beats me why more muscular processors don't employ these simple techniques.

Tom Gardner

Jun 22, 2013, 4:22:02 PM
Eric Wallin wrote:

> The core will do ~200 MHz in the smallest Cyclone 3 or 4 speed grade
> 8 (the cheapest and slowest). It looks to the outside world like
> 8 independent processors (threads) running at 25 MHz, each with
> its own independent interrupt. Internally each thread has 4
> private general purpose stacks that are each 32 entries deep,
> but all threads fully share main memory (combined instruction/data).

Have you defined what happens when one processor writes to a
memory location that is being read by another processor?
In other words, what primitives do you provide that allow
one processor to reliably communicate with another?

rickman

Jun 22, 2013, 4:37:00 PM
That is my point. With SIMD you have 1/8th the instruction rate saving
memory accesses, but the same amount of data can be processed. Of
course, it all depends on your app.

--

Rick

Eric Wallin

Jun 22, 2013, 5:11:56 PM
On Saturday, June 22, 2013 4:22:02 PM UTC-4, Tom Gardner wrote:

> Have you defined what happens when one processor writes to a
> memory location that is being read by another processor?

Main memory is accessed by the threads sequentially, so there is no real contention possible.

> In other words, what primitives do you provide that allow
> one processor to reliably communicate with another?

None, it's all up to the programmer. Off the top of my head, one thread might keep tabs on a certain memory location A looking for a change of some sort, perform some activity in response to this, then write to a separate location B that one or more other threads are similarly watching.

Another option (that I didn't implement, but it would be simple to do) would be to enable interrupt access via the local register set, giving threads the ability to interrupt one another for whatever reason. But doing this via a single register could lead to confusion because there is no atomic read/write access (and I don't think it's worth implementing atomics just for this). Each thread interrupt could be in a separate register I suppose. With an ocean of main memory available for flags and mail boxes and such I guess I don't see the need for added complexity.
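
The flag-and-mailbox idea can be sketched as follows. This is a Python simulation, not code for the core: it assumes every word read or write is indivisible and that writes become visible in program order, which sequential access to a single memory should give. Each shared word has exactly one writer, so no atomic read-modify-write is needed. The addresses DATA, SEQ, and ACK and the helpers producer, consumer, and run are hypothetical names.

```python
import random

DATA, SEQ, ACK = 0, 1, 2   # hypothetical word addresses in shared memory

def producer(mem, items):
    seq = 0
    for item in items:
        while True:            # wait until the previous item was acknowledged
            ack = mem[ACK]
            yield              # one memory access per scheduling step
            if ack == seq:
                break
        mem[DATA] = item       # 1) write the payload
        yield
        seq += 1
        mem[SEQ] = seq         # 2) then publish it; only the producer writes SEQ
        yield

def consumer(mem, count, received):
    seen = 0
    while seen < count:
        seq = mem[SEQ]
        yield
        if seq == seen:
            continue           # nothing new yet, keep polling
        item = mem[DATA]
        yield
        received.append(item)
        seen = seq
        mem[ACK] = seen        # only the consumer writes ACK
        yield

def run(items, seed=0):
    """Interleave the two threads arbitrarily, one memory access at a time."""
    rng = random.Random(seed)
    mem = [0, 0, 0]
    received = []
    threads = [producer(mem, items), consumer(mem, len(items), received)]
    while threads:
        t = rng.choice(threads)
        try:
            next(t)
        except StopIteration:
            threads.remove(t)
    return received

print(run([5, 6, 7]))   # [5, 6, 7]
```

The ordering matters: the payload is written before the sequence word, and the producer does not overwrite the payload until the acknowledgement catches up. A mailbox with several writers to the same word would need something stronger than this.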

Eric Wallin

Jun 22, 2013, 5:21:51 PM
On Saturday, June 22, 2013 4:37:00 PM UTC-4, rickman wrote:

> That is my point. With SIMD you have 1/8th the instruction rate saving
> memory accesses, but the same amount of data can be processed. Of
> course, it all depends on your app.

But this processor core doesn't have a memory bandwidth bottleneck, so the instruction rate is moot.

Main memory is a full dual port BRAM, so each thread gets a chance to read/write data and fetch an instruction every cycle. The bandwidth is actually overkill - the fetch side write port is unused.

Eric Wallin

Jun 22, 2013, 5:32:43 PM
I think the design document is good enough for general public consumption, so I applied for a project over at opencores.org (they say they'll get around to it in one working day).

Still doing verification and minor code polishing, no bugs so far. All branch immediate distances and conditionals check out; interrupts are working as expected; stack functionality, depth, and error reporting via the local register set checks out. A log base 2 subroutine returns the same values as a spreadsheet, ditto for restoring unsigned division. I just need to confirm a few more things like logical and arithmetic ALU operations and the code should be good to go.
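
For readers who have not met it, restoring unsigned division looks roughly like this. This is a generic textbook sketch in Python, not the subroutine that was run on the core; restoring_divide and its width parameter are hypothetical names.

```python
def restoring_divide(dividend, divisor, width=32):
    """Unsigned restoring division; returns (quotient, remainder)."""
    mask = (1 << width) - 1
    dividend &= mask
    divisor &= mask
    if divisor == 0:
        raise ZeroDivisionError("divisor must be non-zero")
    remainder = 0
    quotient = 0
    for bit in range(width - 1, -1, -1):
        # shift the next dividend bit into the partial remainder
        remainder = (remainder << 1) | ((dividend >> bit) & 1)
        trial = remainder - divisor   # trial subtraction
        if trial >= 0:
            remainder = trial         # keep the result, quotient bit is 1
            quotient |= 1 << bit
        # otherwise "restore" by leaving the remainder alone, quotient bit is 0
    return quotient, remainder

print(restoring_divide(100, 7))   # (14, 2)
```

Checking the routine against a reference, as described above with the spreadsheet, amounts to comparing each result with an independent quotient and remainder.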

Tom Gardner

Jun 22, 2013, 5:57:23 PM
Eric Wallin wrote:
> On Saturday, June 22, 2013 4:22:02 PM UTC-4, Tom Gardner wrote:
>
>> Have you defined what happens when one processor writes to a
>> memory location that is being read by another processor?
>
> Main memory is accessed by the threads sequentially, so there is no real contention possible.

OK, so what *atomic* synchronisation primitives are available?
Classic examples involve atomic read-modify-write operations (e.g.
test and set, compare and swap). And they are bloody difficult
and non-scalable if there is any memory hierarchy.


>> In other words, what primitives do you provide that allow
>> one processor to reliably communicate with another?
>
> None, it's all up to the programmer.

That raises red flags with software engineers. Infamously
with the Itanic, for example!


> Off the top of my head, one thread might keep tabs on a certain memory location A looking for a change of some sort, perform some activity in response to this, then write to a separate location B that one or more other threads are similarly watching.
>
> Another option (that I didn't implement, but it would be simple to do) would be to enable interrupt access via the local register set, giving threads the ability to interrupt one another for whatever reason. But doing this via a single register could lead to confusion because there is no atomic read/write access (and I don't think it's worth implementing atomics just for this). Each thread interrupt could be in a separate register I suppose. With an ocean of main memory available for flags and mail boxes and such I guess I don't see the need for added complexity.

How do you propose to implement mailboxes reliably?
You need to think of all the possible memory-access
sequences, of course.

rickman

Jun 22, 2013, 11:44:44 PM
I don't get the question. Weren't semaphores invented a long time ago
and require no special support from the processor?
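
A related classic is Peterson's algorithm, which gets mutual exclusion between two threads out of plain loads and stores. The sketch below is a Python simulation, not code for any real processor, and it assumes each load and store is indivisible and seen by both threads in program order; on hardware with caches or reordering that assumption does not hold without barriers. The names worker and run and the mem layout are hypothetical.

```python
import random

def worker(me, mem, rounds, log):
    other = 1 - me
    for _ in range(rounds):
        mem["flag"][me] = True     # announce intent
        yield                      # one memory access per scheduling step
        mem["turn"] = other        # give priority to the other thread
        yield
        while True:                # spin while the other wants in and has priority
            f = mem["flag"][other]
            yield
            if not f:
                break
            t = mem["turn"]
            yield
            if t != other:
                break
        # critical section: a non-atomic read-modify-write on shared data
        mem["inside"] += 1         # instrumentation only, not part of the algorithm
        log.append(mem["inside"])
        value = mem["counter"]
        yield
        mem["counter"] = value + 1
        yield
        mem["inside"] -= 1
        mem["flag"][me] = False    # leave the critical section
        yield

def run(rounds=50, seed=0):
    """Interleave two workers arbitrarily; return (counter, max threads inside)."""
    rng = random.Random(seed)
    mem = {"flag": [False, False], "turn": 0, "counter": 0, "inside": 0}
    log = []
    threads = [worker(0, mem, rounds, log), worker(1, mem, rounds, log)]
    while threads:
        t = rng.choice(threads)
        try:
            next(t)
        except StopIteration:
            threads.remove(t)
    return mem["counter"], max(log)

print(run())   # (100, 1)
```

The counter only comes out right because the two increments never overlap. Whether this is enough in practice depends on the memory model, which is the question being asked in this part of the thread.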

--

Rick

rickman

Jun 22, 2013, 11:45:56 PM
But that is only true if you limit yourself to on chip memory.

What is the app you designed this processor for?

--

Rick

rickman

Jun 22, 2013, 11:47:49 PM
On 6/22/2013 5:32 PM, Eric Wallin wrote:
> I think the design document is good enough for general public consumption, so I applied for a project over at opencores.org (they say they'll get around to it in one working day).
>
> Still doing verification and minor code polishing, no bugs so far. All branch immediate distances and conditionals check out; interrupts are working as expected; stack functionality, depth, and error reporting via the local register set checks out. A log base 2 subroutine returns the same values as a spreadsheet, ditto for restoring unsigned division. I just need to confirm a few more things like logical and arithmetic ALU operations and the code should be good to go.

Someone was talking about this recently, I don't recall if it was you or
someone else. It was pointed out that the most important aspect of any
core is the documentation. opencores has lots of pretty worthless cores
because you have to reverse engineer them to do anything with them.

--

Rick

David Brown

Jun 23, 2013, 3:54:10 AM
On 22/06/13 20:08, rickman wrote:
> On 6/22/2013 12:26 PM, Eric Wallin wrote:
>> I find it exceedingly odd that the PC industry is still using x86
>> _anything_ at this point. Apple showed us you can just dump your
>> processor and switch horses in midstream pretty much whenever you feel
>> like it (68k => PowerPC => x86) and not torch your product line /
>> lose your customer base. I suppose having Intel and MS go belly up
>> overnight is beyond the pale and at the root of why we can't have nice
>> things. I remember buying my first 286, imagining all the
>> wonderful projects it would enable, and then finding out what complete
>> dogs the processor and OS were - it was quite disillusioning for the
>> big boys to sell me a lump of shit like that (and for a lot more than
>> 3 farthings).
>
> You know why the x86 is still in use. It is not really that bad in
> relation to the other architectures when measured objectively. It may
> not be the best, but there is a large investment, mostly by Intel.

The x86 was already considered an old-fashioned architecture the day it
was first released. It was picked for the IBM PC (against the opinion
of all the technical people, who wanted the 68000) because some PHB
decided that the PC was a marketing experiment of no more than 1000 or
so units, so the processor didn't matter and they could pick the cheaper
x86 chip.

Modern x86 chips are fantastic pieces of engineering - but they are
fantastic implementations of a terrible original design. They are the
world's best example that given enough money and clever people, you
/can/ polish a turd.

> if
> Intel doesn't change why would anyone else? But that is being eroded by
> the ARM processors in the handheld market. We'll see if Intel can
> continue to adapt the x86 to low power and maintain a low cost.
>
> I don't think MS is propping up the x86. They offer a version of
> Windows for the ARM don't they? As you say, there is a bit of processor
> specific code but the vast bulk of it is just a matter of saying ARMxyz
> rather than X86xyz. Developers are another matter. Not many want to
> support yet another target, period. If the market opens up for Windows
> on ARM devices then that can change. In the mean time it will be
> business as usual for desktop computing.
>

MS props up the x86 - of that there is no doubt. MS doesn't really care
if the chips are made by Intel, AMD, or any of the other x86
manufacturers that have come and gone.

MS tried to make Windows independent of the processor architecture when
they first made Windows NT. That original ran on x86, MIPS, PPC and
Alpha. But they charged MIPS, PPC and Alpha for the privilege - and
when they couldn't afford the high costs to MS, they stopped paying and
MS stopped making these Windows ports. MS did virtually nothing to
promote these ports of Windows, and almost nothing to encourage any
other developers to target them. They just took the cash from the
processor manufacturers, and used it to split the workstation market
(which was dominated by Unix on non-x86 processors) and discourage Unix.

I don't think MS particularly cares about x86 in any way - they just
care that you run /their/ software. Pushing x86 above every other
architecture just makes things easier and cheaper for them.

Part of the problem is that there is a non-negligible proportion of
Windows (and many third-party programs) design and code that /is/
x86-specific, and it is not separated into portability layers because it
was never designed for portability - so porting is a lot more work than
just a re-compile. It is even more work if you want to make the code
run fast - there is a lot of "manual optimisation" in key Windows code
that is fine-tuned to the x86. For example, sometimes 8-bit variables
will be used because they are the fastest choice on old x86 processors
and modern x86 cpus handle them fine - but 32-bit RISC processors will
need extra masking and extending instructions to use them. To be fair
on MS, such code was written long before int_fast8_t and friends came
into use.
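
The masking and extending mentioned above can be shown with a small sketch. It is a Python model of register arithmetic only, not anything taken from Windows, and the helper names `add_u8` and `sign_extend_8` are made up for the illustration. The point is that a machine with only full-width registers has to do extra work after each operation to keep a value inside 8 bits.

```python
def add_u8(a: int, b: int) -> int:
    """Add two unsigned 8-bit values the way a full-width register machine
    must: add at full width, then mask the result back down to 8 bits."""
    return (a + b) & 0xFF


def sign_extend_8(value: int) -> int:
    """Interpret the low 8 bits of value as a signed two's-complement number,
    which is the 'extending' step needed before signed arithmetic."""
    value &= 0xFF
    return value - 0x100 if value & 0x80 else value
```

With these helpers, `add_u8(200, 100)` wraps around instead of growing past 8 bits, and `sign_extend_8(0xFF)` gives -1 rather than 255.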

However, it is a lot better these days than it used to be - the
processor-specific code is much less as more code is in higher level
languages, and in particular, little of the old assembly code remains.
It is also easier on the third-party side, as steadily more developers
are making cross-platform code for Windows, MacOS and Linux - such code
is far easier to port to other processors.

As for Windows on the ARM, it is widely considered to be a bad joke. It
exists to try and take some of the ARM tablet market, but is basically a
con - it is a poor substitute for Android or iOS as a pad system, and
like other pads, it is a poor substitute for a "real" PC for content
creation rather than just viewing. People buy it thinking they can run
Windows programs on their new pad (they can't), or that they can use it
for MS Office applications (they can't - it's a cut-down version that
has fewer features than Polaris Office on Android, and you can't do
sensible work on a pad anyway).

If ARM takes off as a replacement for x86, it will not be due to MS - it
will be due to Linux. The first "target" is the server world - software
running on Linux servers is already highly portable across processors.
For most of the software, you have the source code and you just
re-compile (by "you", I mean usually Red Hat, Suse, or other distro).
Proprietary Linux server apps are also usually equally portable - if
Oracle sees a market for their software on ARM Linux servers, they'll do
the re-compile quickly and easily.

/Real/ Windows for ARM, if we ever see it, will therefore come first to
servers.

David Brown

Jun 23, 2013, 4:00:24 AM
If threads can do some sort of compare-and-swap instruction, then
semaphores should be no problem with this architecture. Without them,
there are algorithms to make semaphores but they don't scale well
(typically each semaphore needs a memory location for each thread that
might want access, and taking the semaphore requires a read of each of
these locations). It helps if you can guarantee that your thread has a
certain proportion of the processor time - i.e., there is a limit to how
much other threads can do in between instructions of your thread.
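
As a rough sketch of the first point, here is how a counting semaphore could sit on top of compare-and-swap. This is a Python model, not Hive code: the `Cell` class, `try_take` and `give` are hypothetical names, and the swap is an ordinary method call, whereas real hardware would have to perform it as one uninterruptible operation.

```python
class Cell:
    """One shared memory word with a modelled compare-and-swap.

    Assumption: on real hardware compare_and_swap is a single atomic
    instruction. Here it is plain Python, so this only models the semantics.
    """

    def __init__(self, value: int = 0) -> None:
        self.value = value

    def compare_and_swap(self, expected: int, new: int) -> bool:
        if self.value != expected:
            return False      # somebody changed the word; caller must retry
        self.value = new
        return True


def try_take(sem: Cell) -> bool:
    """Try once to take the semaphore; return True on success."""
    count = sem.value
    if count == 0:
        return False          # nothing available
    # Succeeds only if no other thread changed the count since it was read.
    return sem.compare_and_swap(count, count - 1)


def give(sem: Cell) -> None:
    """Release the semaphore, retrying until the swap goes through."""
    while True:
        count = sem.value
        if sem.compare_and_swap(count, count + 1):
            return
```

Only one memory word per semaphore is needed here, regardless of how many threads compete for it, which is the scaling advantage over the per-thread-location schemes.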

David Brown

Jun 23, 2013, 4:38:04 AM
There are two main reasons.

The first, and perhaps most important, is non-technical - C (and to a
much smaller extent, C++) is the most popular language for embedded
development. That means it is the best supported by tools, best
understood by other developers, has the most sample code and libraries,
etc. There are a few niches where other languages are used - assembly,
Ada, etc. And of course there are hobby developers, lone wolves, and
amateurs pretending to be professionals who pick Pascal, Basic, or Forth.

I get to pick these things myself to a fair extent (with some FPGA work
long ago, I used confluence rather than the standard VHDL/Verilog). But
I would need very strong reasons to pick anything other than C or C++
for embedded development.


The other reason is more technical - Forth is simply not a great
language for embedded development work.

It certainly has some good points - its interactivity is very nice, and
you can write very compact source code.

But the stack model makes it hard to work with more complex functions,
so it is difficult to be sure your code is correct and maintainable.
The traditional Forth solution is to break your code into lots of tiny
pieces - but that means the programmer is jumping back and forth in the
code, rather than working with sequential events in a logical sequence.
The arithmetic model makes it hard to work with different sized types,
which are essential in embedded systems - the lack of overloading on
arithmetic operators means a lot of manual work in manipulating types
and getting the correct variant of the operator you want. The highly
flexible syntax means that static error checking is almost non-existent.

Andrew Haley

Jun 23, 2013, 5:04:56 AM
In comp.lang.forth rickman <gnu...@gmail.com> wrote:

> Backing up a bit, it strikes me as a bit crazy to make a language
> based on the concept of a weird target processor. I mean, I get the
> portability thing, but at what cost? If my experience as a casual
> user (not programmer) of Java on my PC is any indication (data point
> of one, the plural of anecdote isn't data, etc.), the virtual
> stack-based processor paradigm has failed, as the constant updates,
> security issues, etc. pretty much forced me to uninstall it. And I
> would think that a language targeting a processor model that is
> radically different than the physically underlying one would be
> terribly inefficient unless the compiler can do hand stands while
> juggling spinning plates on fire - even if it is, god knows what it
> spits out.

Let's pick this apart a bit. Firstly, most Java updates and security
bugs have nothing whatsoever to do with the concept of a virtual
machine. They're almost always caused by coding errors in the
library, and they'd be bugs regardless of the architecture of the
virtual machine. Secondly, targeting a processor model that is
radically different than the physically underlying one is what every
optimizing compiler does every day, and Java is no different.

> Canonical stack processors and their languages (Forth, Java,
> Postscript) at this point seem to be hanging by a legacy thread
> (even if every PC runs one peripherally at one time or another).

Not even remotely true. Java is either the most popular or the second
most popular programming language in the world. Most Java runs on
servers; the desktop is such a tiny part of the market that even if
everyone drops Java in the browser it will make almost no difference.

Andrew.

Paul Rubin

Jun 23, 2013, 5:09:09 AM
> Java is either the most popular or the second most popular programming
> language in the world. Most Java runs on servers;

I think most Java runs on SIM cards. Of course there are more of those
than desktops and servers put together.

Tom Gardner

Jun 23, 2013, 5:31:24 AM
Of course they are one communications mechanism, but not
the only one. Implementation can be made impossible by
some design decisions. Whether support is "special" depends
on what you regard as "normal", so I can't give you an
answer to that one!

Eric Wallin

Jun 23, 2013, 7:34:26 AM
On Saturday, June 22, 2013 11:47:49 PM UTC-4, rickman wrote:

> Someone was talking about this recently, I don't recall if it was you or
> someone else. It was pointed out that the most important aspect of any
> core is the documentation. opencores has lots of pretty worthless cores
> because you have to reverse engineer them to do anything with them.

I just posted the design document:

http://opencores.org/project,hive

I'd be interested in any comments, my email address is in the document. I'll post the verilog soon.

Cheers!

Andrew Haley

Jun 23, 2013, 1:09:23 PM
In comp.lang.forth Paul Rubin <no.e...@nospam.invalid> wrote:
>> Java is either the most popular or the second most popular programming
>> language in the world. Most Java runs on servers;
>
> I think most Java runs on SIM cards.

Err, what is this belief based on? I mean, you might be right, but I
never heard that before.

Andrew.

rickman

Jun 23, 2013, 1:18:44 PM
You might want to fix your attribution line... What you replied to were
not my words.

Rick
--

Rick

rickman

Jun 23, 2013, 1:24:48 PM
I'd be interested in reading the design document, but this is what I
find at Opencores...


HIVE - a 32 bit, 8 thread, 4 register/stack hybrid, pipelined verilog
soft processor core :: Overview Overview News Downloads Bugtracker

Project maintainers

Wallin, Eric
Details

Name: hive
Created: Jun 22, 2013
Updated: Jun 23, 2013
SVN: No files checked in


--

Rick

rickman

Jun 23, 2013, 1:27:16 PM
What aspect of a processor can make implementation of semaphores
impossible?

--

Rick

Rob Doyle

Jun 23, 2013, 1:34:09 PM
Lack of atomic operations.

Rob.


rickman

Jun 23, 2013, 1:45:03 PM
What you just said in response to my question about why you *can't* pick
Forth is, "because it doesn't suit me". That's fair enough, but not as
much about Forth as it is about your preferences and biases.


> The other reason is more technical - Forth is simply not a great
> language for embedded development work.
>
> It certainly has some good points - its interactivity is very nice, and
> you can write very compact source code.
>
> But the stack model makes it hard to work with more complex functions,
> so it is difficult to be sure your code is correct and maintainable.

I think you will get some disagreement on that point.


> The traditional Forth solution is to break your code into lots of tiny
> pieces - but that means the programmer is jumping back and forth in the
> code, rather than working with sequential events in a logical sequence.

I am no expert, so far be it from me to defend Forth in this regard, but
my experience is that if you are having trouble writing code in Forth,
you don't "get it".

I've mentioned many times I think the first time I can recall hearing
"the word". I liked the idea of Forth, but was having trouble writing
code in it for the various reasons that people give, one of which is
your issue above. One time I was complaining that it was hard to find
stack mismatches where words were leaving too many parameters on the
stack or not enough. Jeff Fox weighed in (as he often would) and told
me I didn't need debuggers and such, stack mismatches just showed that
I couldn't count... That hit me between the eyes and I realized he was
right. Balancing the stack is just a matter of counting... *and*
keeping your word definitions small so that you aren't prone to
miscounting. That was the real lesson, keep the definitions small.
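
To make the "just a matter of counting" point concrete, here is a small sketch of the bookkeeping involved. It is written in Python rather than Forth, and `EFFECTS` and `net_effect` are names invented for the illustration. The table holds the usual stack effects of a few common words, and the function adds them up across a definition. Literals and control flow are left out.

```python
# Stack effects as (items consumed, items produced) for a few common words.
EFFECTS = {
    "DUP": (1, 2),
    "DROP": (1, 0),
    "SWAP": (2, 2),
    "OVER": (2, 3),
    "+": (2, 1),
    "*": (2, 1),
}


def net_effect(words):
    """Return (needed, left): how many items a definition needs on the
    stack on entry and how many it leaves there on exit."""
    depth = 0      # current depth relative to the stack on entry
    needed = 0     # deepest the definition reaches below the entry depth
    for word in words:
        consumed, produced = EFFECTS[word]
        needed = max(needed, consumed - depth)
        depth += produced - consumed
    return needed, needed + depth
```

For example, `net_effect(["DUP", "*"])` reports that a squaring word needs one item and leaves one. The shorter the definition, the easier it is to do the same sum in your head.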

You don't need to jump "back and forth" so much, you just need to learn
to decompose the code so that each word is small enough to debug
visually. It was recognized a long time ago that even in C programming
smaller is better. I don't recall the expert, but one of the
programming gurus of yesteryear had a guideline that C routines should
fit on a screen which was 24 lines at the time. But do people listen?
No. They write large routines that are hard to debug.


> The arithmetic model makes it hard to work with different sized types,
> which are essential in embedded systems - the lack of overloading on
> arithmetic operators means a lot of manual work in manipulating types
> and getting the correct variant of the operator you want. The highly
> flexible syntax means that static error checking is almost non-existent.

Really? I have never considered data types to be a problem in Forth.
Using S>D or just typing 0 to convert from single to double precision
isn't so hard. You could also define words that are C like,
(signed_double) and (unsigned_double), but then I don't know if this
would help you since I don't understand your concern.
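
For readers who don't know the words involved, here is a sketch of what the two conversions do. It is a Python model assuming a 16-bit cell (the width is an assumption picked for the example), and the function names are invented. S>D sign-extends a single cell into a double, while pushing a plain 0 as the high cell is only right for unsigned or non-negative values.

```python
CELL_BITS = 16                      # assumed cell width for this sketch
CELL_MASK = (1 << CELL_BITS) - 1


def s_to_d(n: int):
    """Model of S>D: sign-extend a single-cell value to a double.

    Returns (low_cell, high_cell); on a Forth stack the high cell is on top.
    """
    low = n & CELL_MASK
    high = CELL_MASK if n < 0 else 0    # all ones for negative values
    return low, high


def u_to_d(u: int):
    """Model of 'just typing 0': the high cell is simply zero."""
    return u & CELL_MASK, 0


def d_to_int(low: int, high: int) -> int:
    """Reassemble a double cell as a signed Python integer."""
    raw = (high << CELL_BITS) | low
    sign_bit = 1 << (2 * CELL_BITS - 1)
    return raw - (1 << (2 * CELL_BITS)) if raw & sign_bit else raw
```

The difference shows up for negative numbers: `s_to_d(-1)` fills the high cell with ones, whereas treating the same bit pattern as unsigned and pushing 0 gives 65535.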

Yes, in terms of error checking, Forth is at the other end of the
universe (almost) from Ada or VHDL (I'm pretty proficient at VHDL, not
so much with Ada). I can tell you that in VHDL you spend almost as much
time specifying and converting data types as you do the rest of coding.
The only difference from not having the type checking is that the tool
catches the "bugs" and you spend your time figuring out how to make it
happy, vs. debugging the usual way. I'm not sure which is really
faster. I honestly can't recall having a bug from data types in Forth,
but it could have happened.

--

Rick

rickman

Jun 23, 2013, 1:46:26 PM
Lol, so semaphores were never implemented on a machine without an atomic
read modify write?

--

Rick

Rob Doyle

Jun 23, 2013, 2:05:14 PM
You don't need a read/modify/write instruction. You need to perform a
read/modify/write sequence of instructions atomically. On simple
processors that could be accomplished by disabling interrupts around the
critical section of code.
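
A small model shows why the sequence has to be atomic. The sketch below is Python, not code for any real processor, and `run` and its `schedule` argument are invented for the illustration. Two tasks each try to take a semaphore whose count is 1, using three separate steps (read, modify, write), and the schedule decides which task executes its next step.

```python
def run(schedule):
    """Run two tasks that each try to take one unit from a shared count of 1.

    Each task performs three separate steps: read, modify, write.
    `schedule` lists the task ids in the order they get to execute a step.
    Returns how many tasks believe they took the semaphore.
    """
    shared = {"count": 1}
    local = {}              # per-task register holding the value read
    step = {0: 0, 1: 0}     # next step for each task
    winners = 0
    for task in schedule:
        if step[task] == 0:                      # read
            local[task] = shared["count"]
        elif step[task] == 1:                    # modify (in a register)
            local[task] -= 1
        elif step[task] == 2:                    # write back if not negative
            if local[task] >= 0:
                shared["count"] = local[task]
                winners += 1
        step[task] += 1
    return winners
```

With the schedule `[0, 0, 0, 1, 1, 1]` (each sequence runs to completion, as it would with interrupts disabled) only one task wins. With `[0, 1, 0, 1, 0, 1]` both tasks read the count before either writes it back, and both believe they hold the semaphore.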

I don't know if there is a machine that /can't/ implement a semaphore -
but that is not the question that you asked.

I suppose I could contrive one. For example, if you had a processor
that required disabling interrupts as described above and you had
to support a non-maskable interrupt...

Rob.

David Brown

Jun 23, 2013, 4:51:34 PM
I viewed the question as "why *you* can't pick Forth" - I can only
really answer for myself.

You say "preferences and biases" - I say "experience and understanding" :-)

>
>
>> The other reason is more technical - Forth is simply not a great
>> language for embedded development work.
>>
>> It certainly has some good points - its interactivity is very nice, and
>> you can write very compact source code.
>>
>> But the stack model makes it hard to work with more complex functions,
>> so it is difficult to be sure your code is correct and maintainable.
>
> I think you will get some disagreement on that point.
>

No doubt I will.

Of course, remember your own preferences and biases - I say Forth makes
these things hard or difficult, but not impossible. If you are very
experienced with Forth, you'll find them easier. In fact, you will
forget that you ever found them hard, and can't see why it's not easy
for everyone.

>
>> The traditional Forth solution is to break your code into lots of tiny
>> pieces - but that means the programmer is jumping back and forth in the
>> code, rather than working with sequential events in a logical sequence.
>
> I am no expert, so far be it from me to defend Forth in this regard, but
> my experience is that if you are having trouble writing code in Forth,
> you don't "get it".
>

I can agree with that to a fair extent. Forth requires you to think in
a different manner than procedural languages (just as object oriented
languages, functional languages, etc., all require different ways to think
about the task).

My claim is that even when you do "get it", there are disadvantages and
limitations to Forth.

> I've mentioned many times I think the first time I can recall hearing
> "the word". I liked the idea of Forth, but was having trouble writing
> code in it for the various reasons that people give, one of which is
> your issue above. One time I was complaining that it was hard to find
> stack mismatches where words were leaving too many parameters on the
> stack or not enough. Jeff Fox weighed in (as he often would) and told
> me I didn't need debuggers and such, stack mismatches just showed that
> I couldn't count... That hit me between the eyes and I realized he was
> right. Balancing the stack is just a matter of counting... *and*
> keeping your word definitions small so that you aren't prone to
> miscounting. That was the real lesson, keep the definitions small.

There are times when code is complex, because the task in hand is
complex, and it cannot sensibly be reduced into small parts without a
lot of duplication, inefficiency, or confusing structure (the same
applies to procedural programming - sometimes the best choice really is
a huge switch statement). I don't want to deal with trial-and-error
debugging in the hope that I've tested all cases of miscounting - I want
a compiler that handles the drudge work automatically and lets me
concentrate on the important things.

>
> You don't need to jump "back and forth" so much, you just need to learn
> to decompose the code so that each word is small enough to debug
> visually. It was recognized a long time ago that even in C programming
> smaller is better. I don't recall the expert, but one of the
> programming gurus of yesteryear had a guideline that C routines should
> fit on a screen which was 24 lines at the time. But do people listen?
> No. They write large routines that are hard to debug.

Don't kid yourself here - people write crap in all languages. And most
programmers - of all languages - are pretty bad at it. Forth might
encourage you to split up the code into small parts, but there will be
people who call these "part1", "part2", "part1b", etc.

>
>
>> The arithmetic model makes it hard to work with different sized types,
>> which are essential in embedded systems - the lack of overloading on
>> arithmetic operators means a lot of manual work in manipulating types
>> and getting the correct variant of the operator you want. The highly
>> flexible syntax means that static error checking is almost non-existent.
>
> Really? I have never considered data types to be a problem in Forth.
> Using S>D or just typing 0 to convert from single to double precision
> isn't so hard. You could also define words that are C like,
> (signed_double) and (unsigned_double), but then I don't know if this
> would help you since I don't understand your concern.
>

I need to easily and reliably deal with data that is 8-bit, 16-bit,
32-bit and 64-bit. Sometimes I need bit fields that are a different
size. Sometimes I work with processors that have 20-bit, 24-bit or
40-bit data. I need to know exactly what I am getting, and exactly what
I am doing with it. Working with a "cell" or "double cell" is not good
enough - just like C "int" or "short int" is unacceptable.

If Forth has the equivalent of "uint8_t", "int_fast16_t", etc., then it
could work - but as far as I know, it does not. Unless I am missing
something, there is no easy way to write code that is portable between
different Forth targets if cell width is different. You would have to
define your own special set of words and operators, with different
definitions depending on the cell size. You are no longer working in
Forth, but your own little private language.

> Yes, in terms of error checking, Forth is at the other end of the
> universe (almost) from Ada or VHDL (I'm pretty proficient at VHDL, not
> so much with Ada). I can tell you that in VHDL you spend almost as much
> time specifying and converting data types as you do the rest of coding.

Yes, I dislike that about VHDL and Ada, though I have done little work
with either. C is a bit more of a happy medium - though sometimes the
extra protection you can get (but only if you want it) with C++ can be a
good idea.

> The only difference from not having the type checking is that the tool
> catches the "bugs" and you spend your time figuring out how to make it
> happy, vs. debugging the usual way. I'm not sure which is really
> faster. I honestly can't recall having a bug from data types in Forth,
> but it could have happened.
>

I use Python quite a bit - it has strong typing, but the types are
dynamic. This means there is very little compile-time checking. I
definitely miss that in the language - you waste a lot of time debugging
by trial-and-error when a statically typed language would spot your
error for you immediately.
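
A minimal illustration of the kind of error meant here (the function `describe` is made up for the example): the definition below is accepted without complaint, and the mistake only surfaces when the faulty branch actually runs. The type hints are not enforced by the interpreter, though an external static checker such as mypy would be expected to flag the line.

```python
def describe(reading: int, verbose: bool = False) -> str:
    if verbose:
        # Bug: str + int. Python accepts this definition and only raises
        # TypeError when this branch is actually executed.
        return "reading: " + reading
    return "ok"
```

Calling `describe(3)` works, so a test suite that never sets `verbose=True` will not notice the problem.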

Eric Wallin

Jun 23, 2013, 4:52:57 PM
On Sunday, June 23, 2013 1:24:48 PM UTC-4, rickman wrote:

> I'd be interested in reading the design document, but this is what I
> find at Opencores...
>
> SVN: No files checked in

I believe SVN is for the verilog, which isn't there quite yet, but the document is. Click on "Downloads" at the upper right.

Here is a link to it:

http://opencores.org/usercontent,doc,1371986749

Eric Wallin

Jun 23, 2013, 5:10:19 PM
I can see the need for some kind of semaphore mechanism if you have one or more caches sitting between the processor and the main memory (a "memory hierarchy") but that certainly isn't the case for my (Hive) processor, which is targeted towards small processor-centric tasks in FPGAs. Main memory is static, dual port, and connected directly to the core.

rickman

Jun 23, 2013, 6:30:28 PM
Ok, this is certainly a lot more document than is typical for CPU
designs on opencores.

I'm not sure why you need to insert so much opinion of stack machines in
the discussions of the paper. Some of what I have read so far is not
very clear exactly what your point is and just comes off as a general
bias about stack machines including those who promote them. I don't
mind at all when technical shortcomings are pointed out, but I'm not
excited about reading the sort of opinion shown...

"Stack machines are (perhaps somewhat inadvertently) portrayed as a
panacea for all computing ills" I don't recall ever hearing anyone
saying that. Certainly there are a lot of claims for stack machines,
but the above is almost hyperbole.

There is a lot to digest in your document. I'll spend some time looking
at it.

--

Rick

Tom Gardner

Jun 23, 2013, 6:43:06 PM
Eric Wallin wrote:
> I can see the need for some kind of semaphore mechanism if you have one or more caches sitting between the processor and the main memory (a "memory hierarchy") but that certainly isn't the case for my (Hive) processor, which is targeted towards small processor-centric tasks in FPGAs. Main memory is static, dual port, and connected directly to the core.

Unless your system is constrained in ways you haven't mentioned...

Do you have interrupts? If so you need semaphores.

Can more than one "source" cause a memory location
to be read or written within one processor instruction
cycle? If so you need semaphores.

I first realised the need for atomic operations when
doing hard real-time work on a 6800 (no caches,
single processor) as a vacation student. Then I did
some research and found out about semaphores. Atomicity
could only be guaranteed by disabling interrupts
for the critical operations. And if you ran two 6809s
off opposite clock phases, even that wasn't sufficient.

glen herrmannsfeldt

Jun 23, 2013, 8:15:09 PM
rickman <gnu...@gmail.com> wrote:
> On 6/23/2013 4:52 PM, Eric Wallin wrote:

(snip)
>> I believe SVN is for the verilog, which isn't there quite yet,
>> but the document is. Click on "Downloads" at the upper right.

(snip)
>> http://opencores.org/usercontent,doc,1371986749

> Ok, this is certainly a lot more document than is typical for CPU
> designs on opencores.

> I'm not sure why you need to insert so much opinion of stack
> machines in the discussions of the paper. Some of what I have
> read so far is not very clear exactly what your point is and
> just comes off as a general bias about stack machines including
> those who promote them. I don't mind at all when technical
> shortcomings are pointed out, but I'm not excited about reading
> the sort of opinion shown...

> "Stack machines are (perhaps somewhat inadvertently) portrayed as a
> panacea for all computing ills" I don't recall ever hearing anyone
> saying that. Certainly there are a lot of claims for stack
> machines, but the above is almost hyperbole.

I suppose. Stack machines are pretty much out of style now.
One reason is that current compiler technology has a hard
time generating good code for them.

Well, stack machines, such as the Burroughs B5500, were popular
when machines had a small number of registers. They could be
implemented with most or all of the stack in main memory
(usually magnetic core). They allow for smaller instructions,
even as addressing space gets larger. (The base-displacement
addressing for S/360 was also to help with addressing.)

Now, I suppose if stack machines had stayed popular, that compiler
technology would have developed to use them more efficiently,
but general registers, 16 or more of them, allow for flexibility
in addressing that stacks make difficult.

Now, you could do like the x87, with a stack that also allows
one to address any stack element. The best, and some of the
worst, of both worlds.

-- glen

Eric Wallin

Jun 23, 2013, 9:23:26 PM
On Sunday, June 23, 2013 6:43:06 PM UTC-4, Tom Gardner wrote:

> Do you have interrupts?

Yes, one per thread.

> If so you need semaphores.

Not sure I follow, but I'm not sure you've read the paper.

> Can more than one "source" cause a memory location
> to be read or written within one processor instruction
> cycle? If so you need semaphores.

If the programmer writes the individual thread programs so that two threads never write to the same address then by definition it can't happen (unless there is a bug in the code). I probably haven't thought about this as much as you have, but I don't see the fundamental need for more hardware if the programmer does his/her job.

rickman

Jun 23, 2013, 9:57:19 PM
On 6/23/2013 9:23 PM, Eric Wallin wrote:
> On Sunday, June 23, 2013 6:43:06 PM UTC-4, Tom Gardner wrote:
>
>> Do you have interrupts?
>
> Yes, one per thread.
>
>> If so you need semaphores.
>
> Not sure I follow, but I'm not sure you've read the paper.

I used to know this stuff, but it has been a long time. I think what
Tom is referring to may not apply if you don't run more than one task on
a given processor. The issue is that to implement a semaphore you have
to do a read-modify-write operation on a word in memory. If "anyone"
else can get in the middle of your operation the semaphore can be
corrupted or fail. But I'm not sure just using an interrupt means you
will have problems, I think it simply means the door is open since
context can be switched causing a failure in the semaphore.

But as I say, it has been a long time and there are different reasons
for semaphores and different implementations.


>> Can more than one "source" cause a memory location
>> to be read or written within one processor instruction
>> cycle? If so you need semaphores.
>
> If the programmer writes the individual thread programs so that two threads never write to the same address then by definition it can't happen (unless there is a bug in the code). I probably haven't thought about this as much as you have, but I don't see the fundamental need for more hardware if the programmer does his/her job.

There are other resources that might be shared. Or maybe not, but if
so, you need to manage it.

Wow, I never realized how much I have forgotten.

--

Rick

rickman

Jun 23, 2013, 10:06:58 PM
On 6/23/2013 8:15 PM, glen herrmannsfeldt wrote:
> rickman<gnu...@gmail.com> wrote:
>> On 6/23/2013 4:52 PM, Eric Wallin wrote:
>
> (snip)
>>> I believe SVN is for the verilog, which isn't there quite yet,
>>> but the document is. Click on "Downloads" at the upper right.
>
> (snip)
>>> http://opencores.org/usercontent,doc,1371986749
>
>> Ok, this is certainly a lot more document than is typical for CPU
>> designs on opencores.
>
>> I'm not sure why you need to insert so much opinion of stack
>> machines in the discussions of the paper. Some of what I have
>> read so far is not very clear exactly what your point is and
>> just comes off as a general bias about stack machines including
>> those who promote them. I don't mind at all when technical
>> shortcomings are pointed out, but I'm not excited about reading
>> the sort of opinion shown...
>
>> "Stack machines are (perhaps somewhat inadvertently) portrayed as a
>> panacea for all computing ills" I don't recall ever hearing anyone
>> saying that. Certainly there are a lot of claims for stack
>> machines, but the above is almost hyperbole.
>
> I suppose. Stack machines are pretty much out of style now.
> One reason is that current compiler technology has a hard
> time generating good code for them.

I think you might be referring to the sort of stack machines used in
minicomputers 30 years ago. For FPGA implementations stack CPUs are
alive and kicking. Forth seems to do a pretty good job with them. What
is the problem with other languages?


> Well, stack machines, such as the Burroughs B5500, were popular
> when machines had a small number of registers. They could be
> implemented with most or all of the stack in main memory
> (usually magnetic core). They allow for smaller instructions,
> even as addressing space gets larger. (The base-displacement
> addressing for S/360 was also to help with addressing.)

Yes, you *are* talking about 30 year old machines, or even 40 year old
machines.


> Now, I suppose if stack machines had stayed popular, that compiler
> technology would have developed to use them more efficiently,
> but general registers, 16 or more of them, allow for flexibility
> in addressing that stacks make difficult.
>
> Now, you could do like the x87, with a stack that also allows
> one to address any stack element. The best, and some of the
> worst, of both worlds.

You are reading my mind! That is what I spent some time looking at this
past winter. Then I got busy with work and have had to put it aside.

Eric's machine is a bit different having four stacks for each processor
and allowing each one to be popped or not rather than any addressing on
the stack itself. Interesting, but not so small as the two stack CPUs.

--

Rick

glen herrmannsfeldt

Jun 23, 2013, 10:07:37 PM
Eric Wallin <tammi...@gmail.com> wrote:

(snip)
> If the programmer writes the individual thread programs so that
> two threads never write to the same address then by definition
> it can't happen (unless there is a bug in the code).

The question is related to communication between threads.
If they are independent, processing independent data, then no
problem. Usually they at least need to communicate with the OS
(or outside world, in general), which often needs semaphores.

> I probably haven't thought about this as much as you have,
> but I don't see the fundamental need for more hardware if
> the programmer does his/her job.

-- glen

Eric Wallin

Jun 23, 2013, 10:54:14 PM
On Sunday, June 23, 2013 6:30:28 PM UTC-4, rickman wrote:

> I'm not sure why you need to insert so much opinion of stack machines in
> the discussions of the paper. Some of what I have read so far is not
> very clear exactly what your point is and just comes off as a general
> bias about stack machines including those who promote them. I don't
> mind at all when technical shortcomings are pointed out, but I'm not
> excited about reading the sort of opinion shown...

Point taken. I suppose I'm trying to spare others from wasting too much time and energy on canonical one and two stack machines. There just aren't enough stacks, so unless you want to deal with the top entry or two right now you'll be digging around, wasting both programming and real time, and getting confused. And they automatically toss data away that you often very much need, so you waste more time copying it or reloading it or whatever. I spent years trying to like them, thinking the problem was me. The J processor really helped break the spell.

Not saying I have all the answers, I hope the paper doesn't come across that way, but I do have to sell it to some degree (the paper ends with the down sides that I'm aware of, I'm sure there are more).

> "Stack machines are (perhaps somewhat inadvertently) portrayed as a
> panacea for all computing ills" I don't recall ever hearing anyone
> saying that. Certainly there are a lot of claims for stack machines,
> but the above is almost hyperbole.

Defense exhibit A:

http://www.ultratechnology.com/cowboys.html

Maybe I'm seeing things that aren't there, but almost every web site, paper, and book on stack machines and Forth that I've encountered has a vibe of "look at this revolutionary idea that the man has managed to keep down!" Absolutely no down sides mentioned, so the hapless noob is left with much too flattering of an impression. In my case this false impression was quite lasting, so I guess I've got something of an axe to grind. Perhaps I'll moderate this in future releases of the design document.

glen herrmannsfeldt

unread,
Jun 23, 2013, 11:18:21 PM6/23/13
to
rickman <gnu...@gmail.com> wrote:

(snip, I wrote)
>> I suppose. Stack machines are pretty much out of style now.
>> One reason is that current compiler technology has a hard
>> time generating good code for them.

> I think you might be referring to the sort of stack machines used in
> minicomputers 30 years ago. For FPGA implementations stack CPUs are
> alive and kicking. Forth seems to do a pretty good job with them. What
> is the problem with other languages?

The code generators designed for register machines, such as that
used by GCC or LCC, don't adapt to stack machines well.

As users of HP calculators know, given an expression with unrelated
arguments, it isn't hard to evaluate using a stack. But what if
the expression has some common subexpressions? You want to
evaluate the expression, evaluating the common subexpressions only
once. It is not so easy to get things into the right place on
the stack, such that they are at the top at the right time.
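
A minimal sketch of that point, assuming a hypothetical single-stack machine with six made-up opcodes (PUSH, DUP, SWAP, ADD, SUB, MUL; not any real instruction set). It evaluates (a+b)*c - (a+b) twice: once recomputing a+b, and once computing it a single time, which needs an extra SWAP to get the operands into the right order for SUB.

```python
def run(program, env):
    """Execute (opcode, argument) pairs on one data stack; return the stack."""
    stack = []
    for op, arg in program:
        if op == "PUSH":
            stack.append(env[arg])
        elif op == "DUP":
            stack.append(stack[-1])
        elif op == "SWAP":
            stack[-1], stack[-2] = stack[-2], stack[-1]
        elif op in ("ADD", "SUB", "MUL"):
            b = stack.pop()          # top of stack
            a = stack.pop()          # next on stack
            stack.append({"ADD": a + b, "SUB": a - b, "MUL": a * b}[op])
        else:
            raise ValueError("unknown opcode: " + op)
    return stack

env = {"a": 2, "b": 3, "c": 4}

# Recompute a+b where it is needed: nine instructions.
naive = [("PUSH", "a"), ("PUSH", "b"), ("ADD", None),
         ("PUSH", "c"), ("MUL", None),
         ("PUSH", "a"), ("PUSH", "b"), ("ADD", None),
         ("SUB", None)]

# Compute a+b once and DUP it. After MUL the stack holds [a+b, (a+b)*c],
# which is the wrong order for SUB, so a SWAP is needed first.
shared = [("PUSH", "a"), ("PUSH", "b"), ("ADD", None),
          ("DUP", None), ("PUSH", "c"), ("MUL", None),
          ("SWAP", None), ("SUB", None)]

print(run(naive, env), len(naive))
print(run(shared, env), len(shared))
```

Here one SWAP is enough. With more shared values, or values needed deeper in the expression, the shuffling grows, and that is the part a register-oriented code generator has no model for.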

-- glen

Les Cargill

unread,
Jun 23, 2013, 11:21:14 PM6/23/13
to
No. The only requirement for semaphores
to work is to be able to turn off interrupts briefly.


--
Les Cargill

rickman

unread,
Jun 23, 2013, 11:42:33 PM6/23/13
to
On 6/23/2013 11:18 PM, glen herrmannsfeldt wrote:
> rickman<gnu...@gmail.com> wrote:
>
> (snip, I wrote)
>>> I suppose. Stack machines are pretty much out of style now.
>>> One reason is that current compiler technology has a hard
>>> time generating good code for them.
>
>> I think you might be referring to the sort of stack machines used in
>> minicomputers 30 years ago. For FPGA implementations stack CPUs are
>> alive and kicking. Forth seems to do a pretty good job with them. What
>> is the problem with other languages?
>
> The code generators designed for register machines, such as that
> used by GCC or LCC, don't adapt to stack machines well.

That shouldn't be a surprise to anyone. The guy who designed the ZPU
found that out the hard way.


> As users of HP calculators know, given an expression with unrelated
> arguments, it isn't hard to evaluate using a stack. But what if
> the expression has some common subexpressions? You want to
> evaluate the expression, evaluating the common subexpressions only
> once. It is not so easy to get things into the right place on
> the stack, such that they are at the top at the right time.

Tell me about it ;^)

--

Rick

rickman

unread,
Jun 24, 2013, 12:07:23 AM6/24/13
to
On 6/23/2013 10:54 PM, Eric Wallin wrote:
> On Sunday, June 23, 2013 6:30:28 PM UTC-4, rickman wrote:
>
>> I'm not sure why you need to insert so much opinion of stack machines in
>> the discussions of the paper. Some of what I have read so far is not
>> very clear exactly what your point is and just comes off as a general
>> bias about stack machines including those who promote them. I don't
>> mind at all when technical shortcomings are pointed out, but I'm not
>> excited about reading the sort of opinion shown...
>
> Point taken. I suppose I'm trying to spare others from wasting too much time and energy on canonical one and two stack machines. There just aren't enough stacks, so unless you want to deal with the top entry or two right now you'll be digging around, wasting both programming and real time, and getting confused. And they automatically toss data away that you often very much need, so you waste more time copying it or reloading it or whatever. I spent years trying to like them, thinking the problem was me. The J processor really helped break the spell.
>
> Not saying I have all the answers, I hope the paper doesn't come across that way, but I do have to sell it to some degree (the paper ends with the down sides that I'm aware of, I'm sure there are more).

I'm glad you can take (hopefully) constructive criticism. I was
concerned when I wrote the above that it might be a bit too blunt.

It will be a while before I get to the end of your paper. Do you
describe the applications you think the design would be good for? One
reason I don't completely agree with you about the suitability of MISC
type CPUs is that there are many apps with different requirements. Some
will definitely do better with a design other than yours. I wonder if
you had some specific class of applications that you were seeing that
you didn't think the MISC approach was optimal for or if it was just the
various "features" of MISC that didn't suit your tastes.


>> "Stack machines are (perhaps somewhat inadvertently) portrayed as a
>> panacea for all computing ills" I don't recall ever hearing anyone
>> saying that. Certainly there are a lot of claims for stack machines,
>> but the above is almost hyperbole.
>
> Defense exhibit A:
>
> http://www.ultratechnology.com/cowboys.html
>
> Maybe I'm seeing things that aren't there, but almost every web site, paper, and book on stack machines and Forth that I've encountered has a vibe of "look at this revolutionary idea that the man has managed to keep down!" Absolutely no down sides mentioned, so the hapless noob is left with much too flattering of an impression. In my case this false impression was quite lasting, so I guess I've got something of an axe to grind. Perhaps I'll moderate this in future releases of the design document.

I can't argue with you on this one. When I first saw the GA144 design
it sounded fantastic! But that is typical corporate product hype. The
reality of the chip is very different. When it comes to CPU cores for
FPGAs I don't see a lot of difference. Check out some of the other
offerings on Opencores. Everyone touts their design as something pretty
special even if they are just one of two or three that do the same
thing! I think they had some five or six PIC implementations and all
seemed to say they were the best!

I do have to say I am not in complete agreement with you about the
issues of MISC machines. Yes, there can be a lot of stack ops compared
to a register machine. But these can be minimized with careful
programming. I know that from experience. However, part of the utility
of a design is the ease of programming efficiently. I haven't looked at
yours yet, but just picturing the four stacks makes it seem pretty
simple... so far. :^)

I have to say I'm not crazy about the large instruction word. That is
one of the appealing things about MISC to me. I work in very small
FPGAs and 16 bit instructions are better avoided if possible, but that
may be a red herring. What matters is how many bytes a given program
uses, not how many bits are in an instruction.

I am supposed to present to the SVFIG and I think your design would be a
very interesting part of the presentation unless you think you would
rather present yourself. I'm sure they would like to hear about it and
they likely would be interested in your opinions on MISC. I know I am.

--

Rick

glen herrmannsfeldt

unread,
Jun 24, 2013, 12:49:50 AM6/24/13
to
Les Cargill <lcarg...@comcast.com> wrote:
> Rob Doyle wrote:

(snip)

>> Lack of atomic operations.

> No. The only requirement for semaphores
> to work is to be able to turn off interrupts briefly.

What about other processors or I/O using the same memory?

-- glen

Tom Gardner

unread,
Jun 24, 2013, 3:24:44 AM6/24/13
to
Eric Wallin wrote:
> On Sunday, June 23, 2013 6:43:06 PM UTC-4, Tom Gardner wrote:
>
>> Do you have interrupts?
>
> Yes, one per thread.
>
>> If so you need semaphores.
>
> Not sure I follow, but I'm not sure you've read the paper.

Nope. I've too many other things to understand in detail.
I have no bandwidth to debug your design.


>> Can more than one "source" cause a memory location
>> to be read or written within one processor instruction
>> cycle? If so you need semaphores.
>
> If the programmer writes the individual thread programs so that two threads never write to the same address then by definition it can't happen (unless there is a bug in the code). I probably haven't thought about this as much as you have, but I don't see the fundamental need for more hardware if the programmer does his/her job.

The problems that arise from the lack of atomic operations
and/or semaphores are well known. Any respectable
university-level software course will cover the problems
and various solutions.

Consider trying to pass a message consisting of one
integer from one thread to another such that the
receiving thread is guaranteed to be able to pick
it up exactly once.


Eric Wallin

unread,
Jun 24, 2013, 8:03:52 AM6/24/13
to
On Monday, June 24, 2013 3:24:44 AM UTC-4, Tom Gardner wrote:

> Consider trying to pass a message consisting of one
> integer from one thread to another such that the
> receiving thread is guaranteed to be able to pick
> it up exactly once.

Thread A works on the integer value and when it is done it writes it to location Z. It then reads a value at location X, increments it, and writes it back to location X.

Thread B has been repeatedly reading location X and notices it has been incremented. It reads the integer value at Z, performs some function on it, and writes it back to location Z. It then reads a value at Y, increments it, and writes it back to location Y to let thread A know it took, worked on, and replaced the integer at Z.

The above seems airtight to me if reads and writes to memory are not cached or otherwise delayed, and I don't see how interrupts are germane, but perhaps I haven't taken everything into account.
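
A minimal sketch of the handshake described above, under stated assumptions: each thread is modelled as a Python generator, every `yield` marks a point where the other thread may run, single reads and writes are indivisible and uncached, and both counters start at a known value of 0. The names (`thread_a`, `thread_b`, `run`) and the doubling "function" are made up for illustration. The key property is that X is only ever written by A, and Y only by B.

```python
import random

def thread_a(mem, value):
    mem["Z"] = value            # publish the integer
    yield
    x = mem["X"]                # read X ...
    yield
    mem["X"] = x + 1            # ... and write it back; A is the only writer of X
    yield
    while mem["Y"] == 0:        # poll Y until B signals it is done
        yield
    yield
    mem["result"] = mem["Z"]    # collect what B left in Z
    yield

def thread_b(mem):
    while mem["X"] == 0:        # poll X until A signals the value is ready
        yield
    yield
    z = mem["Z"]
    yield
    mem["Z"] = z * 2            # "some function" of the integer
    yield
    y = mem["Y"]
    yield
    mem["Y"] = y + 1            # B is the only writer of Y
    yield

def run(seed):
    """Interleave the two threads in a seeded random order until both finish."""
    rng = random.Random(seed)
    mem = {"X": 0, "Y": 0, "Z": 0, "result": None}
    threads = [thread_a(mem, 21), thread_b(mem)]
    while threads:
        t = rng.choice(threads)
        try:
            next(t)
        except StopIteration:
            threads.remove(t)
    return mem

results = {run(seed)["result"] for seed in range(200)}
print(results)
```

This only exercises the one-writer-per-location case. It says nothing about what happens when two threads both read-modify-write the same address.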

Tom Gardner

unread,
Jun 24, 2013, 8:30:46 AM6/24/13
to
Consider what happens if an interrupt occurs at an inopportune moment in the above sequence, and the other thread runs. You can get double or missed updates.

Do some research to find why "test and set" and "compare and swap" instructions exist.
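
A minimal sketch of the failure mode those instructions address, assuming two writers that both do read-increment-write on the same location. Threads are modelled as Python generators, with each `yield` marking a point where the other thread may run. The model, the names, and the `lock` word are assumptions for illustration. The test-and-set step is modelled as a single indivisible read-and-write of the lock.

```python
def plain_incrementer(mem):
    tmp = mem["count"]              # read
    yield
    mem["count"] = tmp + 1          # write back a possibly stale value
    yield

def locked_incrementer(mem):
    while True:
        # Test-and-set: fetch the old lock value and set the lock in one step.
        old, mem["lock"] = mem["lock"], 1
        yield
        if old == 0:                # the lock was free, so this thread now owns it
            break
    tmp = mem["count"]
    yield
    mem["count"] = tmp + 1
    yield
    mem["lock"] = 0                 # release
    yield

def run(make_thread):
    """Run two identical threads in strict alternation, one step each."""
    mem = {"count": 0, "lock": 0}
    threads = [make_thread(mem), make_thread(mem)]
    while threads:
        for t in list(threads):
            try:
                next(t)
            except StopIteration:
                threads.remove(t)
    return mem["count"]

print(run(plain_incrementer))   # one update is lost
print(run(locked_incrementer))  # both updates survive
```

With strict alternation both plain threads read 0 before either writes, so the counter ends at 1 rather than 2. The locked version ends at 2 because the second thread spins until the first releases the lock.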

Les Cargill

unread,
Jun 24, 2013, 8:31:58 AM6/24/13
to
Then it's no longer what I would call a semaphore. A
semaphore is, SFAIK, only a Dijkstra P() or V() operation.

It's not that things like this don't exist but rather that
they should be called something else, like "bus arbitration
scheme."

--
Les Cargill


Les Cargill

unread,
Jun 24, 2013, 8:50:04 AM6/24/13
to

Eric Wallin

unread,
Jun 24, 2013, 9:28:53 AM6/24/13
to
On Monday, June 24, 2013 8:30:46 AM UTC-4, Tom Gardner wrote:

> Consider what happens if interrupt occurs at inopportune moment in the above sequence, and the other thread runs. You can get double or missed updates.

I think you're missing the point that in my processor the threads run concurrently, not sequentially.

Tom Gardner

unread,
Jun 24, 2013, 9:47:28 AM6/24/13
to
Nope. That usually exacerbates problems, plus having 8-port memory (one for each thread) is not cheap!

Please explain why your processor does not need test and set or compare and exchange operations. What theoretical advance have you made?

Eric Wallin

unread,
Jun 24, 2013, 11:57:40 AM6/24/13
to
On Monday, June 24, 2013 9:47:28 AM UTC-4, Tom Gardner wrote:

> Please explain why your processor does not need test and set or compare and exchange operations. What theoretical advance have you made?

I'm not exactly sure why we're having this generalized, theoretical discussion when a simple reading of the design document I've provided would probably answer your questions. If it doesn't then perhaps you could tell me what I left out, and I might include that info in the next rev. Not trying to be gruff or anything, I'd very much like the document (and processor) to be on as solid a footing as possible.

Tom Gardner

unread,
Jun 24, 2013, 12:13:39 PM6/24/13
to
Eric Wallin wrote:
> On Monday, June 24, 2013 9:47:28 AM UTC-4, Tom Gardner wrote:
>
>> Please explain why your processor does not need test and set or compare and exchange operations. What theoretical advance have you made?
>
> I'm not exactly sure why we're having this generalized, theoretical discussion when a simple reading of the design document I've provided would probably answer your questions.

Please point me to the section which discusses the primitive operations/attributes/properties that you have provided to enable inter-thread communication.


> If it doesn't then perhaps you could tell me what I left out, and I might include that info in the next rev.

Sorry, I don't have time to poorly recapitulate subjects that have been known about and solved for decades.


> Not trying to be gruff or anything, I'd very much like the document (and processor) to be on as solid a footing as possible.

Ditto, and I don't have time to find the flaws in your arguments.

If you want people to use your processor, it might be wise to give them the information they need to have confidence in its design.


rickman

unread,
Jun 24, 2013, 12:35:02 PM6/24/13
to
Eric, I think you have explained properly how your design will deal with
synchronization. I'm not sure what Tom is going on about. Clearly he
doesn't understand your design.

If it is of any help, Eric's design is more like 8 cores running in
parallel, time sharing memory and in fact, the same processor hardware
on a machine cycle basis (so no 8 ported memory required). If an
interrupt occurs it doesn't cause one of the other 7 tasks to run, they
are already running, it simply invokes the interrupt handler. I believe
Eric is not envisioning multiple tasks on a single processor.
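
A minimal sketch of that picture, assuming a fixed round-robin rotation in which each clock cycle belongs to exactly one of the 8 threads, so shared memory sees one access per cycle. The rotation order, the handler address, and the function names are assumptions for illustration, not taken from the paper.

```python
THREADS = 8
HANDLER = 100   # hypothetical interrupt handler address

def simulate(cycles, interrupt_at=None):
    """Step a barrel-style core; interrupt_at is an optional (cycle, thread) pair."""
    pc = [0] * THREADS              # one program counter per thread
    pending = [False] * THREADS     # one interrupt line per thread
    slots = []                      # which thread owned each cycle
    for cycle in range(cycles):
        if interrupt_at is not None and cycle == interrupt_at[0]:
            pending[interrupt_at[1]] = True
        tid = cycle % THREADS       # exactly one thread owns this cycle
        slots.append(tid)
        if pending[tid]:
            pc[tid] = HANDLER       # only the interrupted thread is redirected
            pending[tid] = False
        else:
            pc[tid] += 1
    return pc, slots

pc, slots = simulate(32, interrupt_at=(10, 3))
print(pc)
print(slots[:8])
```

In this model the interrupt raised for thread 3 changes only thread 3's program counter; the other seven threads keep their slots and carry on. If the rotation really is even, the 200 MIPS aggregate quoted at the top of the thread works out to 25 MIPS per thread.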

As others have pointed out, test and set instructions are not required
to support concurrency and communications. They are certainly nice to
have, but are not essential. In your case they would be superfluous.

--

Rick

Tom Gardner

unread,
Jun 24, 2013, 12:56:05 PM6/24/13
to
rickman wrote:
> On 6/24/2013 11:57 AM, Eric Wallin wrote:
>> On Monday, June 24, 2013 9:47:28 AM UTC-4, Tom Gardner wrote:
>>
>>> Please explain why your processor does not need test and set or compare and exchange operations. What theoretical advance have you made?
>>
>> I'm not exactly sure why we're having this generalized, theoretical discussion when a simple reading of the design document I've provided would probably answer your questions. If it doesn't then
>> perhaps you could tell me what I left out, and I might include that info in the next rev. Not trying to be gruff or anything, I'd very much like the document (and processor) to be on as solid a
>> footing as possible.
>
> Eric, I think you have explained properly how your design will deal with synchronization. I'm not sure what Tom is going on about. Clearly he doesn't understand your design.

Correct.

> If it is of any help, Eric's design is more like 8 cores running in parallel, time sharing memory and in fact, the same processor hardware on a machine cycle basis
>(so no 8 ported memory required).

Fair enough; sounds like it is in the same area as the Propeller chip.

Is there anything to prevent multiple cores reading/writing the
same memory location in the same machine cycle? What is the
result when that happens?


> If an interrupt occurs it doesn't cause one of the other 7 tasks to run, they are already running, it simply invokes the interrupt handler. I believe Eric is not envisioning multiple tasks on a
> single processor.

Such presumptions would be useful to have in the white paper.


> As others have pointed out, test and set instructions are not required to support concurrency and communications. They are certainly nice to have, but are not essential.

Agreed. I'm perfectly prepared to accept alternative techniques,
e.g. disable interrupts.


> In your case they would be superfluous.

Not proven to me.

The trouble is I've seen too many hardware designs that
leave the awkward problems to software - especially first
efforts by small teams.

And too often those problems can be very difficult to solve
in software. Nowadays it is hard to find people that have
sufficient experience across the whole hardware/firmware/system
software spectrum to enable them to avoid such traps.

I don't know whether Eric is such a person, but I'm afraid
his answers have raised orange flags in my mind.

As a point of reference, I had similar misgivings when I
first heard about the Itanium's architecture in, IIRC,
1994. I suppressed them because the people involved were
undoubtedly more skilled in the area than I, and had been
working for 5 years. Much later I regrettably came to the
conclusion the orange flags were too optimistic.

glen herrmannsfeldt

unread,
Jun 24, 2013, 2:01:07 PM6/24/13
to
Tom Gardner <spam...@blueyonder.co.uk> wrote:

(snip)
> Is there anything to prevent multiple cores reading/writing the
> same memory location in the same machine cycle? What is the
> result when that happens?

Yes. Even if the threads don't communicate with each other, they
might share I/O devices which needs some communication.

(snip)

> The trouble is I've seen too many hardware designs that
> leave the awkward problems to software - especially first
> efforts by small teams.

The 8087 was originally designed to have a virtual stack, where
on stack overflow an interrupt would trigger a software routine
to spill some stack registers to memory, and on underflow bring
them back again. But no-one tried to write the interrupt routine
until the hardware was done, and it turned out that it wasn't
possible. Not all the required state was available or settable.

They might have fixed it in the 80287, but then they had to be
compatible with the 8087. Actually, I don't know why they didn't
fix it, but it still isn't fixed.

-- glen

Tom Gardner

unread,
Jun 24, 2013, 2:45:29 PM6/24/13
to
glen herrmannsfeldt wrote:
> Tom Gardner <spam...@blueyonder.co.uk> wrote:
>
> (snip)
>> Is there anything to prevent multiple cores reading/writing the
>> same memory location in the same machine cycle? What is the
>> result when that happens?
>
> Yes. Even if the threads don't communicate with each other, they
> might share I/O devices which needs some communication.
>
> (snip)
>
>> The trouble is I've seen too many hardware designs that
>> leave the awkward problems to software - especially first
>> efforts by small teams.
>
> The 8087 was originally designed to have a virtual stack, where
> on stack overflow an interrupt would trigger a software routine
> to spill some stack registers to memory, and on underflow bring
> them back again. But no-one tried to write the interrupt routine
> until the hardware was done, and it turned out that it wasn't
> possible. Not all the required state was available or settable.

Oh, inaccessible state is a problem that has been repeated
many times in many companies! Often to be found near to
virtual memory tables, exceptions, interrupts, and
debuggers - making Heisenbugs the norm not the exception :(


> They might have fixed it in the 80287, but then they had to be
> compatible with the 8087. Actually, I don't know why they didn't
> fix it, but it still isn't fixed.

I strongly suspect compatibility is the (fully justifiable)
reason; it is the reason for all sorts of hardware and
software cruft.

Eric Wallin

unread,
Jun 24, 2013, 3:25:20 PM6/24/13
to
On Monday, June 24, 2013 12:07:23 AM UTC-4, rickman wrote:

> I'm glad you can take (hopefully) constructive criticism. I was
> concerned when I wrote the above that it might be a bit too blunt.

I apologize to everyone here, I kind of barged in and have behaved somewhat brashly.

> ... part of the utility
> of a design is the ease of programming efficiently. I haven't looked at
> yours yet, but just picturing the four stacks makes it seem pretty
> simple... so far. :^)

Writing a conventional stack machine in an HDL isn't too daunting, but programming it afterward, for me anyway, was just too much.

> I have to say I'm not crazy about the large instruction word. That is
> one of the appealing things about MISC to me. I work in very small
> FPGAs and 16 bit instructions are better avoided if possible, but that
> may be a red herring. What matters is how many bytes a given program
> uses, not how many bits are in an instruction.

Yes. Opcode space obviously expands exponentially with bit count, so one can get a lot more with a small size increase. I think a 32 bit opcode is pushing it for a small FPGA implementation, but a 16 bit opcode gives one a couple of small operand indices, and some reasonably sized immediate instructions (data, conditional jumps, shifts, add) that I find I'm using quite a bit during the testing and verification phase. Data plus operation in a single opcode is hard to beat for efficiency but it has to earn its keep in the expanded opcode space. With the operand indices you get a free copy/move with most single operand operations which is another efficiency.
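
A minimal sketch of how a 16 bit word might be carved up, purely as an illustration. The split used here (4 bit operation, two 2 bit stack indices, 8 bit immediate) is a hypothetical layout chosen because four stacks need 2 bits per index; it is not the encoding from the design document.

```python
# Hypothetical field widths; they total 16 bits.
OP_BITS, A_BITS, B_BITS, IMM_BITS = 4, 2, 2, 8

def encode(op, a, b, imm):
    """Pack operation, two stack indices and an immediate into one 16 bit word."""
    for value, bits in ((op, OP_BITS), (a, A_BITS), (b, B_BITS), (imm, IMM_BITS)):
        if not 0 <= value < (1 << bits):
            raise ValueError("field out of range")
    return (op << 12) | (a << 10) | (b << 8) | imm

def decode(word):
    """Unpack a 16 bit word into (op, a, b, imm)."""
    return (word >> 12) & 0xF, (word >> 10) & 0x3, (word >> 8) & 0x3, word & 0xFF

word = encode(op=0x5, a=2, b=1, imm=0x7F)
print(hex(word), decode(word))

# Each extra opcode bit doubles the number of distinct encodings.
print(2 ** 16, 2 ** 17)
```

The trade-off shows up directly in the widths: every bit given to an operand index or the immediate is a bit taken away from the operation field.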

> I am supposed to present to the SVFIG and I think your design would be a
> very interesting part of the presentation unless you think you would
> rather present yourself. I'm sure they would like to hear about it and
> they likely would be interested in your opinions on MISC. I know I am.

I'm on the other coast so I most likely can't attend, but I would be most honored if you were to present it to SVFIG.

Eric Wallin

unread,
Jun 24, 2013, 3:47:35 PM6/24/13
to
On Monday, June 24, 2013 12:56:05 PM UTC-4, Tom Gardner wrote:

> The trouble is I've seen too many hardware designs that
> leave the awkward problems to software - especially first
> efforts by small teams.

I know what you mean. Maybe I'm too picky, or perhaps too rigid to conform to others' coding styles, but 90% of the HDL code I've encountered both vocationally and avocationally has been quite poor. I tend to rewrite everything, including my slightly older code if I'm using it again, because my style keeps evolving, and it never hurts to take another look at things or at least give them some more polish.

> And too often those problems can be very difficult to solve
> in software. Nowadays it is hard to find people that have
> sufficient experience across the whole hardware/firmware/system
> software spectrum to enable them to avoid such traps.
>
> I don't know whether Eric is such a person, but I'm afraid
> his answers have raised orange flags in my mind.

Processors aren't my main thing, but I do have need of them in my main thing so I've been quite interested in them for ~15 years now. My EE graduate adviser was a professor of computer engineering and I took a couple of his courses. I own and have read the two Hennessy & Patterson texts, though I must admit my eyes glazed over when TLBs and pipeline hazards were being discussed.

Discovering stack machines was a transforming experience for me, showing that it was possible to have much simpler and tractable HW underlying it all. But it's hard to beat indexed registers. This hybrid is something of a middle ground, and so far I'm not finding it too revolting to program by hand - then again it might be something only a mother can love.

Tom Gardner

unread,
Jun 24, 2013, 4:02:32 PM6/24/13
to
Doing something "just because I want to" is an _excellent_
reason. For me probably the next such project will be to
make a cheapskate 1GS/s oscilloscope & TDR. But there's a
heck of a learning curve w.r.t. FPGA clocking, i/o
structures, floorplanning and dev kits nowadays!

Whenever I've come across people that say "if you have
problem X then my product will solve it provided Y applies",
I give them more credence if they also say "but my product
doesn't do Z, you have to do that some other way".
I'm sure you've had similar experiences.