
mr paysan where is your chip? where is the personal computer powered by paysan cpu? that runs firefox and is like 10x faster than amd?


gavino_himself

Oct 12, 2012, 4:36:00 PM
mr paysan where is your chip? where is the personal computer powered by paysan cpu? that runs firefox and is like 10x faster than amd?

Bernd Paysan

Oct 12, 2012, 4:49:04 PM
gavino_himself wrote:

> mr paysan where is your chip? where is the personal computer pwoered
> by paysan cpu? that runs firefox and is liek 10x faster than amd?

As far as I can tell, the last development using my chip will go into an
iPod battery monitor - or already went, because Apple and the company
that produced it are quite secretive. It has 2 or 4k of RAM, and it
won't run any firefox. It's *not* a PC CPU. It's a deeply embedded
CPU. I think you should have a beer with Rod Pemberton, who will
explain to you in depth the fundamental difference between the two things.

The other thing I designed, the 4stack CPU, is a 15 year old diploma
thesis. I don't think it will be that competitive today. Though, when
I look at all the ARMs around, which can be 10 times slower than a low-
end AMD, maybe I'm wrong.

--
Bernd Paysan
"If you want it done right, you have to do it yourself"
http://bernd-paysan.de/

rickman

Oct 13, 2012, 12:06:59 AM
On 10/12/2012 4:49 PM, Bernd Paysan wrote:
> gavino_himself wrote:
>
>> mr paysan where is your chip? where is the personal computer pwoered
>> by paysan cpu? that runs firefox and is liek 10x faster than amd?
>
> As far as I can tell, the last development using my chip will go into an
> iPod battery monitor - or already went, because Apple and the company
> that produced it are quite secretive. It has 2 or 4k of RAM, and it
> won't run any firefox. It's *not* a PC CPU. It's a deeply embedded
> CPU. I think you should have a beer with Rod Pemberton, who will in
> deep explain you the fundamental difference between the two things.
>
> The other thing I designed, the 4stack CPU, is a 15 year old diploma
> thesis. I don't think it will be that competitive today. Though, when
> I look at all the ARMs around, which can be 10 times slower than a low-
> end AMD, maybe I'm wrong.

I remember looking at that some years ago, but I don't recall the
significance. What was the advantage of four stacks? What were the
"other" two stacks for?

More importantly, what were its successes and what were its failures?
Or in other terms, what was good and what wasn't as good about the
architecture?

Rick

Bernd Paysan

Oct 13, 2012, 4:34:39 PM
rickman wrote:
> I remember looking at that some years ago, but I don't recall the
> significance. What was the advantage of four stacks? What were the
> "other" two stacks for?

This essentially was a VLIW CPU which used a stack architecture to make
the instruction word smaller. You shouldn't think of the four stacks
as similar to the four stacks Gforth has (data, return, float, locals
stack); all four are general-purpose stacks, and are used for
parallelizing instructions.
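
As a rough illustration of the idea (not the actual 4stack instruction set, which this post does not spell out), the sketch below models one long instruction word as four slots, one per stack and ALU. Stack operations name no registers, which is why each slot can be short. The mnemonics, the encoding, and the lack of any communication between stacks are assumptions made for the example.

```python
from typing import List, Optional, Tuple

# (hypothetical mnemonic, optional literal); not the real 4stack encoding
Op = Tuple[str, Optional[int]]


def step(stacks: List[List[int]], word: List[Op]) -> None:
    """Execute one long instruction word: slot i drives only stack i."""
    if len(word) != len(stacks):
        raise ValueError("one slot per stack is required")
    for stack, (name, arg) in zip(stacks, word):
        if name == "lit":
            stack.append(arg)
        elif name in ("add", "sub", "mul"):
            b = stack.pop()
            a = stack.pop()
            stack.append({"add": a + b, "sub": a - b, "mul": a * b}[name])
        elif name != "nop":
            raise ValueError(f"unknown op: {name}")


def run(program: List[List[Op]]) -> List[List[int]]:
    """Run a list of instruction words on four initially empty stacks."""
    stacks: List[List[int]] = [[] for _ in range(4)]
    for word in program:
        step(stacks, word)
    return stacks


# Four independent computations issued side by side in three words.
PROGRAM: List[List[Op]] = [
    [("lit", 1), ("lit", 3), ("lit", 5), ("lit", 7)],
    [("lit", 2), ("lit", 4), ("lit", 6), ("lit", 8)],
    [("add", None), ("mul", None), ("sub", None), ("add", None)],
]

if __name__ == "__main__":
    print(run(PROGRAM))
```

In this toy model twelve operations complete in three instruction words, i.e. four operations per word, provided the work can be split into four independent streams.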

> More importantly, what were it's successes and what were it's
> failures?

From a personal point of view the architecture's success was that I
could write synthesizable Verilog within the time frame of a diploma
thesis (which is only half a year), and the failure was that it was too
big for the tools our university had back then to make a prototype.

> Or in other terms, what was good and what wasn't as good about the
> architecture?

One thing that definitely wasn't good was how hard it is to target a C
compiler at it (or even a Forth compiler). Back then, people were quite
optimistic about improving C compilers towards wider issue machines, i.e.
VLIWs, which was also the reason why Intel decided to replace x86 by
IA64. They were all wrong. C compilers had just reached the point of
unmaintainability, i.e. the point where further progress is almost
impossible or at least requires starting over from scratch. Looking at
llvm's output convinces me that starting over from scratch just results
in faster generation of the same rubbish code and in neater error
messages.

What it did achieve was a quite high instructions-per-cycle count while
still being quite small. However, as I said in the discussion about
GA144 here, what actually matters most in computation performance today
is not the CPU, it's the memory. And that's why the CPU architecture
isn't important anymore.

rickman

Oct 13, 2012, 4:46:45 PM
Thanks for your explanation. I'm not clear how four stacks with one ALU
(I'm assuming here since you haven't mentioned more) gives a big
advantage. But that is what it is and if you aren't working in that
direction I guess it is not likely to pay off benefits.

I don't agree with your CPU vs. memory generalization. Yes, the GA144
is limited by its memory architecture, but that is a limitation, not a
disqualification. As I have said before, the GA144 is not a processor
to be compared to the IA64 or the high end ARMs. It can't do the job
they can do and they can't do the jobs the GA144 can do. It's that
simple, they are apples and oranges... or should be to everyone other
than Gavino.

Rick

Bernd Paysan

Oct 13, 2012, 6:24:44 PM
rickman wrote:
> Thanks for your explanation. I'm not clear how four stacks with one
> ALU (I'm assuming here since you haven't mentioned more)

I mentioned VLIW, didn't I? I didn't mention "ALU" at all. VLIW
doesn't have a "humpty dumpty" meaning you can define as you like it:
They are four stacks with four ALUs, that's the whole point of the
exercise. I don't think I have to mention everything about the 4stack
processor in every posting I make; you can still read it up. Google for
"4stack", and you'll still find my old web page. m(

> gives a big
> advantage. But that is what it is and if you aren't working in that
> direction I guess it is not likely to pay off benefits.

It wasn't the kind of CPU I needed for the work I did in the
semiconductor industry in the past, that's why I created the b16 instead
- most tasks need surprisingly little computation power. And to get the
4stack off and running as a commercial product (rather than just a
thesis paper) requires too much funding - millions. And even though you
don't agree with my CPU and memory generalization: The memory wall is
not something I made up. And the difficulties of programming a VLIW didn't
go away.

> I don't agree with your CPU vs. memory generalization. Yes, the GA144
> is limited by its memory architecture, but that is a limitation, not a
> disqualification.

Well, it is a pretty absurd limitation. Apart from the hearing aid, I
haven't seen anything commercial that really uses this product. And the
hearing aid would probably have been better off with a full custom
solution.

For the b16 tasks I made, I found that anything less than 2k of memory
is too little. Ease of programming is important; you add a CPU to your
chip because it is easier to program than Verilog. Making something
deliberately difficult to program (as the GA144 is) means that its
target audience are students with ample spare time who like to solve
some puzzles. Not engineers who like to make a product in a limited
time. Ease of programming *is* important.

> As I have said before, the GA144 is not a processor
> to be compared to the IA64 or the high end ARMs. It can't do the job
> they can do and they can't do the jobs the GA144 can do. It's that
> simple, they are apples and oranges... or should be to everyone other
> than Gavino.

I honestly don't know anything the GA144 is really good at. It is
difficult to program. It is cornering itself into a small niche of the
few algorithms that don't need memory (64 words is more like "no
memory", because that's shared between program and data). The CPU core
it uses would be quite good for all these little tasks where you don't
need much computing power, but you also don't have much electric power.
Let's put it that way: Chuck presented the c18 (the core that after some
iterations went into the GA144) at EuroForth 2001 in Dagstuhl. I wrote
the b16 in Verilog over the Christmas holidays, inspired by Chuck's work.
Four months later, we started the first product to put it into - a
variant of it, because it is easy to adapt to the actual
requirements, so we did that.

That's what commercially viable solutions look like. Not a solution
looking for a problem, but a solution that naturally fits the problems
customers have. I simply don't see that with GA144. Chuck is aiming at
high performance and low power, something others attack successfully
with modern processes instead. If he was aiming at medium performance
and ultra-low-power, he would have a product. His power budget for the
144 cores is in the same order of magnitude as a high-end ARM with
GPGPU, and those are designed to scale with quite a range of power
requirements (dynamically). And please look at the figures: These high-
end ARMs with GPGPU can do considerably more than GA144, they are
cheaper than a GA144 (because they are filling fabs - really, entire
fabs; this is big business), and they require *fewer* external components
to make them run (they are SoCs, all you need is a crystal and a bit of
power management outside - and memory). Plus they are much easier to
program.

Chuck's GA144 isn't an apple. But it isn't an orange, either. Nor is it
a banana. It might be some decorative pumpkin, colorful, with warts on
the outside, and hollow inside. GA144 is Chuck's "burn the 125 million
dollars the Moore patent portfolio generated" project. Unfortunately,
Chuck doesn't even have that money. If you make a deal with the devil
(and patent trolls totally qualify as devils), all the gold the devil
gave you for your soul will evaporate at dawn. That's how it works,
it's not just a fairy tale.

Rod Pemberton

Oct 13, 2012, 8:53:23 PM
"Bernd Paysan" <bernd....@gmx.de> wrote in message
news:51932083....@sunwukong.fritz.box...
> rickman wrote:
...

> > I remember looking at that some years ago, but I don't recall the
> > significance. What was the advantage of four stacks? What were the
> > "other" two stacks for?
>
> This essentially was a VLIW CPU which used a stack architecture to make
> the instruction word less big. You shouldn't think of the four stacks
> as similar to the four stacks Gforth has (data, return, float, locals
> stack), all four are general purpose stacks, and are used for
> parallelizing instructions.
>

Since both Chuck Moore's and Philip Koopman's numerous forays into
stack-based Forth microprocessors were known failures by the end of the
1980s or early 1990s, what made you believe your stack-based design
would be a success?

> > More importantly, what were it's successes and what were it's
> > failures?
>
> From a personal point of view the architecture's success was that I
> could write synthesizable Verilog within the time frame of a diploma
> thesis (which is only half a year), and the failure was that it was too
> big for the tools our university had back then to make a prototype.
>
> > Or in other terms, what was good and what wasn't as good about the
> > architecture?
>
> One thing that definitely wasn't good is the ease to target a C compiler
> at it (or even a Forth compiler). Back then, people were quite
> optimistic to improve C compilers towards wider issue machines, i.e.
> VLIWs, which was also the reason why Intel decided to replace x86 by
> IA64. They were all wrong.
...

> C compilers had just reached the point of
> unmaintainability, i.e. the point where further progress is almost
> impossible or at least requires to start over from scratch.

Now, that's an interesting claim. I'm sure the list of reasons would be
interesting, to me, or, at least, amusing. They're not suitable for c.l.f.,
but that goes for this entire thread on obscure, irrelevant, obsolete,
microprocessor design.

> Looking at
> llvm's output convinces me that starting over from scratch just results
> in faster generation of the same rubbish code and in neater error
> messages.
>
> What it did achieve was a quite high instruction per cycle count while
> still being quite small. However, as I said in the discussion about
> GA144 here, what actually matters most in computation performance today
> is not the CPU, it's the memory. And that's why the CPU architecture
> isn't important anymore.
>
...


Rod Pemberton



rickman

Oct 14, 2012, 7:09:07 PM
I had a well written, but somewhat lengthy reply written (although not
as long as your message ;) all ready to send, when my computer crashed.
I won't bother to rewrite it because you have heard it all before. I
appreciate your b16 as well as your 4 stack architecture even if I don't
remember the details (I do remember your b16 was very similar in
architecture to mine, mainly different instruction encoding).

To summarize our differences, I maintain that memory speed is a
limitation in all high end processors and in some midlevel applications.
The GA144 is not a high end processor and is not well suited to the
apps you would run on those processors. It is suited to apps which need
lots of small, fast processors such as SDR (software defined radio) with
the advantage over FPGAs of potentially lower power consumption.

We've been discussing this for some time now and we are being rather
repetitive I think. I hope that we can figure out what to agree on and
what our differences are, accept them and move on.

Rick

Mark Wills

Oct 15, 2012, 4:46:47 AM
Why do you consider SDR to need lots of small processors? There are
plenty of videos on YouTube demoing SDR running on Arduinos and the
like...

Anton Ertl

Oct 15, 2012, 5:28:40 AM
Bernd Paysan <bernd....@gmx.de> writes:
>The memory wall is
>not something I made up.

No, it's something that William Wulf and Sally McKee made up. And
their technical arguments were wrong, and I debunked them in
<http://www.complang.tuwien.ac.at/anton/memory-wall.html>.

The general idea that there are programs where memory latency is the
main contributor to run-time is not wrong. It seems to me that, for
general-purpose CPUs, the proportion of such programs is getting
smaller these days, because the relative speed of CPUs and memory
stays the same, but caches become bigger over time.
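
The effect of a larger cache on latency-bound code can be sketched with the textbook average-memory-access-time relation (hit time plus miss rate times miss penalty). The numbers below are invented for illustration and are not measurements from this thread or from the linked page.

```python
def average_access_time_ns(hit_time_ns: float,
                           miss_rate: float,
                           miss_penalty_ns: float) -> float:
    """Textbook AMAT: every access pays the hit time, misses pay extra."""
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be between 0 and 1")
    return hit_time_ns + miss_rate * miss_penalty_ns


if __name__ == "__main__":
    # Hypothetical figures: same CPU/memory speed ratio, but a bigger
    # cache lowers the miss rate from 5% to 1%.
    small_cache = average_access_time_ns(1.0, 0.05, 100.0)
    large_cache = average_access_time_ns(1.0, 0.01, 100.0)
    print(f"small cache: {small_cache:.1f} ns per access")
    print(f"large cache: {large_cache:.1f} ns per access")
```

With the miss penalty held constant, the average cost per access falls as the miss rate falls, which is the sense in which fewer programs end up dominated by memory latency.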

Another idea, which was not meant at all by the Wulf & McKee paper, is
that there are memory-bandwidth-limited programs, where a large
proportion of the run-time is spent transferring data between the CPU
and memory. The proportion of such programs may be rising, because
CPUs gain more cores (slowly), wider SIMD units, and (GP)GPUs, and
that seems to eat bandwidth faster than the bandwidth grows from
having faster and wider memory interfaces.

In any case, there are a lot of programs that are CPU-bound, and for
those a faster CPU is helpful.

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: http://www.forth200x.org/forth200x.html
EuroForth 2012: http://www.euroforth.org/ef12/

Mark Wills

Oct 15, 2012, 6:02:03 AM
On Oct 15, 10:56 am, an...@mips.complang.tuwien.ac.at (Anton Ertl)
wrote:
If a cache is not a solution for slow memory, then what is it?

visua...@rocketmail.com

Oct 15, 2012, 6:17:18 AM
On Saturday, October 13, 2012 10:46:55 PM UTC+2, rickman wrote:
>
> >> I remember looking at that some years ago, but I don't recall the
> >> significance. What was the advantage of four stacks? What were the
> >> "other" two stacks for?
>

If you don't already know what advantages you get from stacks, please look at "Design of digital computers" by Hans W. Gschwind, Edward J. McCluskey, Springer Verlag, Wien 1967. There you will find the explanations why stack computers are the future. It's not only because stack operations are high-speed zero-address operations.

We are all tolerant - as long as the other behaves as expected!


Paul Rubin

Oct 15, 2012, 7:25:05 AM
Mark Wills <forth...@gmail.com> writes:
> Why do you consider SDR to need lots of small processors? There are
> plenty of videos on YouTube demoing SDR running on Arduino's and the
> like...

I'd expect in those demos, the Arduino cpu is only implementing a
controller or UI. The actual demodulation is happening in a fast DSP or
FPGA or something of that sort, like in GNU Radio. It's not obvious to
me that all currently important modulation schemes can be parallelized
well, but maybe they can.

Mark Wills

Oct 15, 2012, 8:35:44 AM
On Oct 15, 12:25 pm, Paul Rubin <no.em...@nospam.invalid> wrote:
Hmmm... Actually, you may be correct now that you mention it. I spent
an hour or so a few weeks ago looking at SDR on YouTube and I do recall
a lot of them (now that you've jogged my memory) using DSPs and the
like. Thanks for the refresh!

Bernd Paysan

Oct 15, 2012, 8:47:28 AM
Anton Ertl wrote:

> Bernd Paysan <bernd....@gmx.de> writes:
>>The memory wall is
>>not something I made up.
>
> No, it's something that William Wulf and Sally McKee made up. And
> their technical arguments were wrong, and I debunked them in
> <http://www.complang.tuwien.ac.at/anton/memory-wall.html>.

Well, their arguments were wrong insofar as you can move a
sufficiently large "window" of memory (the cache) close enough to the CPU
to overcome the latency problem. And external memory becomes
faster at roughly the same pace as CPUs become faster.

> In any case, there are a lot of programs that are CPU-bound, and for
> those a faster CPU is helpful.

Given enough fast caches. The memory wall is not something
insurmountable, but something you can overcome - given enough caches and
enough bandwidth to enough memory. I'm not arguing against *that*. The
memory hierarchy is the most important part of a CPU.

Anton Ertl

Oct 15, 2012, 8:50:46 AM
Mark Wills <forth...@gmail.com> writes:
>If a cache is not a solution for slow memory, then what is it?

Who said it isn't?

Bernd Paysan

Oct 15, 2012, 9:41:21 AM
Rod Pemberton wrote:
> Since both Chuck Moore's and Philip Koopman's numerous forrays into
> stack-based, Forth microprocessors were known failures

Commercial failures: yes, sort-of (RTX made profits, especially with the
radiation-hardened RTX2000). Technical failures: not really. I
understand that your simple troll-mind can only provoke, and not gain
knowledge.

> by the end of
> the 1980's or early 1990's, what made you believe your stack-based
> design would be a success?

Come on: This was a cool diploma thesis. The motivation to make a cool
diploma thesis is not to get rich quickly, but to explore things nobody
else has tried before, i.e. combining things that looked promising from
a technical point of view (VLIW+stacks). That's how you make progress
(to be precise: I don't mean you personally with "you". Idiots can't
make progress, so maybe you can figure out why "you personally" is ruled
out). Even if it is not a commercial success, I have learned a lot, and
got a degree. Learning a lot is the main point of writing a diploma
thesis, so it was a tremendous success.

The b16, on the other hand, was an almost-instant commercial success for
the company I worked for. Idiots like gavino want to buy it as Firefox
browser PC (Firefox already has enough Forth in it to qualify as a Forth
success story), but the b16's intent is deeply embedded controlling
stuff inside a chip.

I don't know how much progress was made at the company where I last used
the b16, but the plan was to eventually have it in every iPod and iOS
device as battery monitor. While I was there, the managers were scared
by the CPU (and me, and the customer, and everything, the climate of
fear was horrible), and constantly considered ARM Cortex M0s (though it
was way too large and expensive); after I left and some other guys were
forced to take over, the reluctance declined considerably. Stack
processors are alien and unknown, and that's the main reason for
refusing to use them.

>> C compilers had just reached the point of
>> unmaintainability, i.e. the point where further progress is almost
>> impossible or at least requires to start over from scratch.
>
> Now, that's an interesting an claim. I'm sure the list of reasons
> would be interesting, to me, or, at least, amusing.

Why should I go down to your level of discussion? You will definitely
beat me with your experience of being an idiot, being subtly offensive,
and misunderstanding everything I write. I should rather keep you in my
killfile.

You could instead look at the history of GCC development, which at that
time struggled under Kenner, was forked to EGCS, and the maintainers of
EGCS struggled for years to get things into GCC which had been state of
the art, like SSA-trees.

And it's not just GCC: llvm has exactly the same problems; the code llvm
now generates for Gforth reminds me of the 3.x series of GCC, and while
GCC 4.x generates better code than llvm, it has not fully recovered.
They now try to change language (move to C++) in GCC, but llvm is
already written in C++; it doesn't really help.

Mark Wills

Oct 15, 2012, 10:01:54 AM
Bernd:
> Commercial failures: yes, sort-of (RTX made profits, especially with the
> radiation-harded RTX2000).  Technical failures: not really.

I'm inclined to agree. The figures shown in the Koopman book that I
have show stack based processors (at that time, based on the
technology and production processes of that time) to be faster at just
about everything (IIRC) than 'conventional' processors.

For some reason, they just didn't get any traction. Maybe it's just
about where the investment/R&D dollars are, not the technology per se.
I guess they were just 'too different'. Perhaps it's considered too
difficult to write code that operates on a stack (where access to data
is constrained by the order in which it is on the stack) as opposed to
code that accesses data via registers (essentially random access).
Personally, I don't find it particularly difficult, and I'm sure I'd
take to writing 'Forth assembly' like a duck to water.

I think the world missed a good opportunity. There was a chance to
explore an avenue of processor design where processors could have
perhaps been an order of magnitude less complex, with the payoffs in
performance and power consumption that that brings. Instead, we seemed
to go down the "the more complex the better" route.

Paul Rubin

Oct 15, 2012, 12:31:18 PM
Mark Wills <forth...@gmail.com> writes:
> Perhaps it's considered too difficult to write code that operates on a
> stack (where access to data is constrained by the order in which it is
> on the stack) as opposed to code that accesses data via registers
> (essentially random access).

Stack processors including Chuck's have added registers to deal with the
insufficiency of pure stacks for implementing useful code. Without the
registers you end up having to use memory to hold temporary or other
very frequently accessed values. In the F18's case, a memory access
burns several instruction slots (because the memory address is in the
program stream) plus the access itself is several times slower than a
stack or register operation.

Koopman's book at least for some sections assumed instructions to get at
deeper levels of the stack than the top element, but in that case it's
not really a pure stack machine.

> Personally, I don't find it particularly difficult, and I'm sure I'd
> take to writing 'Forth assembly' like a duck to water.

The usefulness of local variables in practical Forth shows that there's
occasionally too much live data to do everything on the stack. In
interpreted Forths where the stack itself is also in memory, or in
compiled Forths where stack and local accesses end up using registers,
this works out ok. In stack hardware where locals and memory are 10x(?)
slower than stack accesses, overall program performance probably
suffers.

> I think the world missed a good opportunity. There was a chance to
> explore an avenue of processor design where processors could have
> perhaps been an order of magnitude less complex, with the payoffs in
> performance and power consumption that that brings. Instead, we seemed
> to go down the "the more complex the better" route.

"order of magnitude" sounds pretty dubious even for very simple
processors. Chuck's F18 has around 7000 transistors (I guess that
doesn't count the memory arrays) and I think this is comparable to the
classic 8-bit micros that were sort of similar in capability. I could
accept that the F18's performance is X percent better for some moderate
X, but I don't believe 10 times better.

Bernd Paysan

Oct 15, 2012, 1:51:06 PM
Paul Rubin wrote:

> Mark Wills <forth...@gmail.com> writes:
>> Perhaps it's considered too difficult to write code that operates on
>> a stack (where access to data is constrained by the order in which it
>> is on the stack) as opposed to code that accesses data via registers
>> (essentially random access).
>
> Stack processors including Chuck's have added registers to deal with
> the insufficiency of pure stacks for implementing useful code.

Chuck has an A and B register, which are hard-coded in the instructions
that use them (i.e. they are not general purpose registers); they are
used as pointers into memory. The Transputer (which had a very shallow
stack) had its workspace, a pointer into the fast on-chip SRAM which
could be used with small indices (the pointer could point anywhere, but
it usually pointed into the small SRAM). The workspace has been used
for local variables, and moving the pointer made call/returns quick.
Such a workspace pointer plus a suitable way of caching the memory
underneath it could also work quite well for OOP - you keep your current
object close to the CPU, and access its members quickly. You probably
need two of them in that case - one for the locals, one for the current
object.

> Without
> the registers you end up having to use memory to hold temporary or
> other very frequently accessed values.

Yes. AFAIK about 1/3 of all operations are memory accesses, and that's
not just the programs I write; this equation holds true even when you
have enough registers for your locals.

> In the F18's case, a memory access
> burns several instruction slots (because the memory address is in the
> program stream) plus the access itself is several times slower than a
> stack or register operation.

Which is more related to Chuck's inexperience in designing memories.
The small memory I used for the TSMC 180nm process had an access time of
<2ns (worst case, IIRC). My recommendation is to have a small and fast
SRAM memory for these values, and a larger memory for the program - the
program access is sequential, and prefetching can hide most of the
latency.

> Koopman's book at least for some sections assumed instructions to get
> at deeper levels of the stack than the top element, but in that case
> it's not really a pure stack machine.

And it makes the stack slow, as well. The stack is ultra-fast
*because* you can directly access only TOS and NOS, and nothing else.

>> Personally, I don't find it particularly difficult, and I'm sure I'd
>> take to writing 'Forth assembly' like a duck to water.
>
> The usefulness of local variables in practical Forth shows that
> there's
> occasionally too much live data to do everything on the stack. In
> interpreted Forths where the stack itself is also in memory, or in
> compiled Forths where stack and local accesses end up using registers,
> this works out ok. In stack hardware where locals and memory are
> 10x(?) slower than stack accesses, overall program performance
> probably suffers.

There's no need to make your memory access *that* slow. Single-cycle
for a small SRAM is easily doable. You need something like the
Transputer's workspace to be fast: a pointer register, and a bunch of
instructions that fetch and store words at small offsets relative to
this register.

In Gforth's VM, we have something similar: The local stack pointer is a
register, and a few primitives are dedicated to accessing the first few
locals directly. We decided on four integer locals and two
floating point locals (only fetches, no stores, as the preferred
programming model with Gforth locals is to assign once, and that's done
with a >l, a move to the local stack).
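
The workspace idea described above (a pointer register plus loads and stores at small offsets from it) can be sketched as follows. This is a toy model under stated assumptions: the class and method names are hypothetical, the frame grows upward for simplicity, and it is neither the Transputer's real instruction set nor Gforth's implementation.

```python
class WorkspaceMachine:
    """Toy model of a workspace pointer into a small, fast SRAM."""

    def __init__(self, sram_words: int = 64) -> None:
        self.sram = [0] * sram_words  # stands in for single-cycle SRAM
        self.wp = 0                   # workspace pointer register

    def load_local(self, offset: int) -> int:
        """Fetch the word at a small offset from the workspace pointer."""
        return self.sram[self.wp + offset]

    def store_local(self, offset: int, value: int) -> None:
        """Store a word at a small offset from the workspace pointer."""
        self.sram[self.wp + offset] = value

    def call(self, caller_frame_words: int) -> None:
        """Enter a callee: skip past the caller's locals by moving wp."""
        self.wp += caller_frame_words

    def ret(self, caller_frame_words: int) -> None:
        """Return: move wp back so the caller's locals are visible again."""
        self.wp -= caller_frame_words


def demo() -> tuple:
    m = WorkspaceMachine()
    m.store_local(0, 11)          # caller's local 0
    m.store_local(1, 22)          # caller's local 1
    m.call(2)                     # callee gets a fresh frame
    m.store_local(0, 99)          # callee's local 0, same offset as caller's
    callee_value = m.load_local(0)
    m.ret(2)
    return m.load_local(0), m.load_local(1), callee_value


if __name__ == "__main__":
    print(demo())
```

Call and return only adjust one register, so no locals are copied, and every local access is a single indexed read or write into the small memory.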

>> I think the world missed a good opportunity. There was a chance to
>> explore an avenue of processor design where processors could have
>> perhaps been an order of magnitude less complex, with the payoffs in
>> performance and power consumption that that brings. Instead, we
>> seemed to go down the "the more complex the better" route.
>
> "order of magnitude" sounds pretty dubious even for very simple
> processors. Chuck's F18 has around 7000 transistors (I guess that
> doesn't count the memory arrays)

64 words by 16 bits by 6 transistors for an SRAM cell would be already
6k transistors, so it must be without memory arrays, but probably
including stacks.
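
The arithmetic behind that estimate, as a small sketch. The 64 words, 16 bits and 6 transistors per cell come from the paragraph above; the 18-bit variant is an assumption added here (suggested by the core's name), not a figure stated in this post.

```python
def sram_cell_transistors(words: int, bits_per_word: int,
                          transistors_per_cell: int = 6) -> int:
    """Transistors in the cell array only (no decoders or sense amps)."""
    return words * bits_per_word * transistors_per_cell


if __name__ == "__main__":
    print(sram_cell_transistors(64, 16))  # figures used in the post
    print(sram_cell_transistors(64, 18))  # assumed 18-bit word width
```

Either way the cell array alone comes to roughly 6k to 7k transistors, which supports the conclusion that the quoted 7000 cannot include the memory.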

> and I think this is comparable to the
> classic 8-bit micros that were sort of similar in capability. I could
> accept that the F18's performance is X percent better for some
> moderate X, but I don't believe 10 times better.

The classic 8-bit micros are really slow, especially since you needed 16
bit operations (which are then two 8 bit operations or more) quite
often. They also all needed several cycles or phases for one
instruction - typically 4 phases minimum. Going to a 1 cycle 16 bit
design gives an order of magnitude. The early Forth processors like the
Novix or RTX often crammed more than a single Forth instruction into one
instruction word, so they were faster per clock than the F18.
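
A back-of-the-envelope version of the "order of magnitude" claim, using only the minimum figures given above (a 16-bit operation as two 8-bit operations, four phases per instruction). Real parts vary, so treat the defaults as illustrative assumptions rather than data for any specific chip.

```python
def cycles_for_16bit_op_on_8bit_cpu(ops_8bit: int = 2,
                                    phases_per_instruction: int = 4) -> int:
    """Clock phases an 8-bit micro spends on one 16-bit operation."""
    return ops_8bit * phases_per_instruction


def speedup_per_clock(single_cycle_design_cycles: int = 1) -> float:
    """Per-clock advantage of a 1-cycle 16-bit design over that baseline."""
    return cycles_for_16bit_op_on_8bit_cpu() / single_cycle_design_cycles


if __name__ == "__main__":
    print(speedup_per_clock())
```

With these minimum figures the ratio is 8, and it grows further whenever the 8-bit part needs more than two instructions or more than four phases.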

8 bit processors, especially the very popular 8051, have seen modern
single- or two-cycle implementations, but they are a lot larger than 7k
transistors.

Elizabeth D. Rather

Oct 15, 2012, 2:02:25 PM
The problem wasn't lack of "market traction". They had a number of
important design wins, but just as Harris was about to go
into full production (a very cash-intensive process), the company made an
unfortunate acquisition (I believe of the semiconductor div. of RCA,
although I could be remembering wrong), which consumed their cash, and
they had to halt further development of the RTX except for the rad-hard
RTX2010s.

Cheers,
Elizabeth

--
==================================================
Elizabeth D. Rather (US & Canada) 800-55-FORTH
FORTH Inc. +1 310.999.6784
5959 West Century Blvd. Suite 700
Los Angeles, CA 90045
http://www.forth.com

"Forth-based products and Services for real-time
applications since 1973."
==================================================

visua...@rocketmail.com

Oct 15, 2012, 2:39:00 PM
to era...@forth.com
On Monday, October 15, 2012 8:02:24 PM UTC+2, Elizabeth D. Rather wrote:
> The problem wasn't lack of "market traction". They had a number of
> important design wins, but the fact that just as Harris was about to go
> into full production (a very cash-intensive process) the company made an
> unfortunate acquisition (I believe of the semiconductor div. of RCA,
> although I could be remembering wrong) which consumed their cash and
> they had to halt further development of the RTX except for the rad-hard
> RTX2010s.
>
> Cheers,
> Elizabeth
>
Yes, HARRIS bought INTERSIL and the INTERSIL management took over, killing the RTX2000.
A HARRIS insider told me back then - believe it or not - that HARRIS had to decide whether to invest a lot of money into high volume fabs to feed the automobile market with RTX2000, and they didn't want to.
As I read now, they preferred to buy INTERSIL instead.

DB.
We are all tolerant as long as the other behaves as expected.

rickman

Oct 15, 2012, 4:43:52 PM10/15/12
to
You can do some very simple radio functions on a very simple processor,
but you have to have more horsepower to do more advanced stuff. Heck,
one of the radios I worked on had some two dozen processors in total
with at least one a full blown top end ARM and all sorts of IF
processing in an FPGA. Can an Arduino do that?

What is your point?

Rick

rickman

Oct 15, 2012, 4:52:21 PM10/15/12
to
On 10/15/2012 6:17 AM, visua...@rocketmail.com wrote:
> On Saturday, October 13, 2012 10:46:55 PM UTC+2, rickman wrote:>
>>
>>>> I remember looking at that some years ago, but I don't recall the
>>>> significance. What was the advantage of four stacks? What were the
>>>> "other" two stacks for?
>>
>
> If you don't already know what advantages you get of stacks, please look at "Design of digital computers" by Hans W. Gschwind, Edward J. McCluskey, Springer Verlag, Wien 1967. There you find the explanations why stack computers are the future. It's not only because stack operations are high speed zero address operations.

Wait a minute... you are citing a 35+ year old book for its forecast of
the future... what?

Today is yesterday's tomorrow! Am I here yet?

Rick

rickman

Oct 15, 2012, 5:10:57 PM10/15/12
to
On 10/15/2012 9:41 AM, Bernd Paysan wrote:
> Idiots like gavino

What!!!??? @@!!***

How dare you call Gavino an "Idiot"??? You should be washing his feet
and dropping rose petals on the ground in front of him when he walks.
No, you aren't worthy, obviously.

Gavino rules!


> Stack
> processors are alien and unknown, and that's the main reason for
> rejecting to use them.

I once worked at a company making array processors, which were DSP chips
in a double rack cabinet (yes, I'm that old). They had a 9U eurocard
holding an IO processor (a DMA engine) based on a sequencer from a bit
slice family. I added some code to the home brewed assembler so I could
add a new instruction to help with self test. Management (if you call
what they had management) was aghast that I dared to change the
microcode assembler! Jeeze, one damn new instruction and you thought
the world would end!!! I can't imagine what they would have done if I
suggested using a different architecture!

Have fun!

Rick

rickman

Oct 15, 2012, 5:15:27 PM10/15/12
to
On 10/15/2012 10:01 AM, Mark Wills wrote:
>
> I think the world missed a good opportunity. There was a chance to
> explore an avenue of processor design where processors could have
> perhaps been an order of magnitude less complex, with the payoffs in
> performance and power consumption that that brings. Instead, we seemed
> to go down the "the more complex the better" route.

When people complain about the "complexity" of mainstream processors,
they often don't take into account the inherent speed they are capable
of. Today's laptops run rings around any other CPU in existence. That
is why they are at the heart of the industry. ARMs are taking over from
the low end, but they are the same thing, just smaller versions which
are working from the other end upward rather than the high end downward.

A MISC processor (or even an array) is not a substitute for a CISC or a
RISC. Different horses for different courses.

Rick

rickman

Oct 15, 2012, 5:27:02 PM10/15/12
to
On 10/15/2012 12:31 PM, Paul Rubin wrote:
> Mark Wills<forth...@gmail.com> writes:
>> Perhaps it's considered too difficult to write code that operates on a
>> stack (where access to data is constrained by the order in which it is
>> on the stack) as opposed to code that accesses data via registers
>> (essentially random access).
>
> Stack processors including Chuck's have added registers to deal with the
> insufficiency of pure stacks for implementing useful code. Without the
> registers you end up having to use memory to hold temporary or other
> very frequently accessed values. In the F18's case, a memory access
> burns several instruction slots (because the memory address is in the
> program stream) plus the access itself is several times slower than a
> stack or register operation.

When the address is in A or B, the F18A memory access is more than three
times as slow as CPU instructions (max 5.7 ns vs max 1.65 ns) because
they are memory accesses. If the memory address is indirect through P,
then it adds another memory cycle (now six times slower than a CPU
cycle) plus the time for delaying the instruction prefetch which depends
on the slot occupied by the instruction.
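
As a rough check on those ratios, the figures can be worked through directly. This is only a sketch using the worst-case numbers quoted above (5.7 ns per memory access, 1.65 ns per CPU instruction). The names are made up for illustration, and the prefetch delay is left out because it depends on the slot.

```python
# Worst-case timings quoted in this post; the names are illustrative only.
MEM_ACCESS_NS = 5.7
CPU_INSTR_NS = 1.65


def slowdown(mem_cycles):
    """Cost of a number of memory cycles, measured in CPU-instruction times."""
    return mem_cycles * MEM_ACCESS_NS / CPU_INSTR_NS


via_a_or_b = slowdown(1)  # address already sits in A or B
via_p = slowdown(2)       # one extra memory cycle to fetch the address through P

print(f"A/B: {via_a_or_b:.2f}x, via P: {via_p:.2f}x")
```

So "more than three times" and "six times" are both on the conservative side of the worst-case figures.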


>> Personally, I don't find it particularly difficult, and I'm sure I'd
>> take to writing 'Forth assembly' like a duck to water.
>
> The usefulness of local variables in practical Forth shows that there's
> occasionally too much live data to do everything on the stack. In
> interpreted Forths where the stack itself is also in memory, or in
> compiled Forths where stack and local accesses end up using registers,
> this works out ok. In stack hardware where locals and memory are 10x(?)
> slower than stack accesses, overall program performance probably
> suffers.

Are local variables "useful" or just convenient? I don't agree that
there is "too much live data" for the stack, at least very seldom. It
has been put to me that this indicates poor factoring. But I guess I
shouldn't open the factoring can of worms.


Rick

rickman

Oct 15, 2012, 5:47:23 PM10/15/12
to
On 10/15/2012 1:51 PM, Bernd Paysan wrote:
> Paul Rubin wrote:
>
>> In the F18's case, a memory access
>> burns several instruction slots (because the memory address is in the
>> program stream) plus the access itself is several times slower than a
>> stack or register operation.
>
> Which is more related to Chuck's inexperience in designing memories.
> The small memory I used for the TSMC 180nm process had an access time of
> <2ns (worst case, IIRC). My recommendation is to have a small and fast
> SRAM memory for these values, and a larger memory for the program - the
> program access is sequential, and prefetching can hide most of the
> latency.

Chuck's chip does prefetching of the opcodes. How can you prefetch an
operand when the address register just changed in the instruction
preceding the memory access?


>> Koopman's book at least for some sections assumed instructions to get
>> at deeper levels of the stack than the top element, but in that case
>> it's not really a pure stack machine.
>
> And it makes the stack slow, as well. The stack is ultra-fast,
> *because* you directly can only access TOS and NOS, and nothing else.

I think technically you can only access TOS in a true stack. Typically
the TOS and often the NOS are implemented in registers with a stack
behind them, that is how Chuck does it in the F18A.


> 64 words by 16 bits by 6 transistors for an SRAM cell would be already
> 6k transistors, so it must be without memory arrays, but probably
> including stacks.

This makes me wonder if there is too much emphasis on keeping the CPU
small and a larger RAM would not be hard at all with little impact on
the CPU speed. There is small and there is tiny. I believe most
processors end up with half the die or more as RAM/cache. That's not
all bad. Actually I'd like to see blocks of RAM like they have in
FPGAs. But that would break the regularity of the design which I'm sure
is not desired much by the developers (I say that instead of Chuck
because I'm led to believe there are other designers involved, but who
knows at what level).
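
The quoted estimate is easy to reproduce. The sketch below just multiplies out the figures from the quote (64 words, 16 bits, 6 transistors per SRAM cell). The function name and the larger 1024-word case are hypothetical, added only to show how quickly RAM outgrows a 7k-transistor core.

```python
# Figures from the quoted estimate: 64 words, 16 bits per word,
# 6 transistors per SRAM cell. The function name is illustrative.
def sram_transistors(words, bits_per_word, transistors_per_cell=6):
    """Transistors in the cell array only (no decoders or sense amps)."""
    return words * bits_per_word * transistors_per_cell


small = sram_transistors(64, 16)
print(small)  # 6144, i.e. about 6k

# Hypothetical larger memory, not a figure from the thread.
large = sram_transistors(1024, 16)
print(large)  # 98304
```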


>> and I think this is comparable to the
>> classic 8-bit micros that were sort of similar in capability. I could
>> accept that the F18's performance is X percent better for some
>> moderate X, but I don't believe 10 times better.
>
> The classic 8-bit micros are really slow, especially since you needed 16
> bit operations (which are then two 8 bit operations or more) quite
> often. They also all needed several cycles or phases for one
> instruction - typically 4 phases minimum. Going to a 1 cycle 16 bit
> design gives an order of magnitude. The early Forth processors like the
> Novix or RTX often crammed more than a single Forth instruction into one
> instruction word, so they were faster per clock than the F18.
>
> 8 bit processors, especially the very popular 8051 have seen modern
> single- or two-cycle implementations, but they are a lot larger than 7k
> transistors.

8051 is a poor example. Other 8 bit processors have much better designs
and are far from large. But they typically aren't done in current
process technology because they need low power, higher I/O voltages and
don't need to run any faster than the Flash.

With today's technology I don't see the point of keeping a design to 7 k
transistors when that lets you put some 50-60 times the number of
processors on the same size die as the GA144. So maybe you need 20
times more processors and more on chip memory... or maybe no more
processors and more on chip memory?

I get tired of thinking about the GA144 sometimes. A manager once told
me (back when I was very young) that I had raw talent (with a BIG
emphasis on RAW). That is the GA144, a chip with lots of RAW potential.
Who knows if it will ever be developed?

Rick

Paul Rubin

Oct 15, 2012, 6:28:54 PM10/15/12
to
rickman <gnu...@gmail.com> writes:
> When the address is in A or B, the F18A memory access is more than
> three times as slow as CPU instructions (max 5.7 ns vs max 1.65 ns)
> because they are memory accesses. If the memory address is indirect
> through P, then it adds another memory cycle (now six times slower
> than a CPU cycle) plus the time for delaying the instruction prefetch
> which depends on the slot occupied by the instruction.


Don't forget that even in the direct case, there are two or three memory
accesses. E.g. adding register A to the TOS is "@a +" (two instruction
slots), but adding variable X to the TOS ("X +") gets the address of X
from the bits of the remaining instruction slots in the current word (if
possible--potentially it has to read the -next- word to get the address
of X), then it has to read the contents of X, and finally it has to
fetch the following word before it can continue with the instruction
stream.

I remember finding it incredibly frustrating to not have a way (unless I
missed it) to load small constants like 1 without burning several words.

rickman

Oct 15, 2012, 7:07:30 PM10/15/12
to
On 10/15/2012 6:28 PM, Paul Rubin wrote:
> rickman<gnu...@gmail.com> writes:
>> When the address is in A or B, the F18A memory access is more than
>> three times as slow as CPU instructions (max 5.7 ns vs max 1.65 ns)
>> because they are memory accesses. If the memory address is indirect
>> through P, then it adds another memory cycle (now six times slower
>> than a CPU cycle) plus the time for delaying the instruction prefetch
>> which depends on the slot occupied by the instruction.
>
>
> Don't forget that even in the direct case, there are two or three memory
> accesses. E.g. adding register A to the TOS is "@a +" (two instruction
> slots), but adding variable X to the TOS ("X +") gets the address of X
> from the bits of the remaining instruction slots in the current word (if
> possible--potentially it has to read the -next- word to get the address
> of X), then it has to read the contents of X, and finally it has to
> fetch the following word before it can continue with the instruction
> stream.

The remaining bits of an instruction word can be used for immediate
data, but are NEVER used for an address I'm pretty sure. The opcode
used is @p, indirect P with autoincrement which fetches the next word
from instruction memory and increments the program counter. This is
followed by a! or b! and then @ or @b respectively. So it would be
@p a! @a and an address in the next word to read the value in memory
with an immediate addressing mode. It's been a couple of months since
I've looked at the GA144 so correct me if I am wrong on this.


> I remember finding it incredibly frustrating to not have a way (unless I
> missed it) to load small constants like 1 without burning several words.

I know, but try designing a MISC sometime. There are tons of tradeoffs
and Chuck picked ones that often end up with multiple instructions.
Read his pages, he has some shortcuts that aren't so bad. Zero is just
DUP OR if the stack is empty or DUP DUP OR if it isn't. Minus 1 is "get
a zero" and -. There are some that aren't about constants. Subtract is
quicker if you subtract S from T instead of the conventional T from S...
- . + - rather than push - pop . + -
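
For anyone who wants to convince themselves these sequences work, here is a small model in Python. It is a sketch under stated assumptions, not the chip's definition: words are taken to be 18 bits, "or" is treated as exclusive-or, "-" as ones' complement, and "." as a no-op. The function names are made up for illustration.

```python
MASK = (1 << 18) - 1  # assumption: 18-bit words


def inv(t):
    """'-' : ones' complement of T."""
    return ~t & MASK


def xor(s, t):
    """'or' : modelled here as exclusive-or."""
    return (s ^ t) & MASK


def add(s, t):
    """'+' : addition, wrapped to the word size."""
    return (s + t) & MASK


def zero(t):
    """dup or : T xor T is 0 whatever T held."""
    return xor(t, t)


def minus_one(t):
    """'get a zero' then '-' : all ones, which is -1 in two's complement."""
    return inv(zero(t))


def t_minus_s(s, t):
    """- . + - : ~(~T + S) equals T - S."""
    return inv(add(s, inv(t)))


def s_minus_t(s, t):
    """push - pop . + - : ~(~S + T) equals S - T."""
    return inv(add(inv(s), t))


print(zero(12345), minus_one(7) == MASK, t_minus_s(3, 10), s_minus_t(10, 3))
```

The shorter sequence saves the push and pop because it inverts T in place, at the cost of giving T - S instead of S - T.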

The idea is that the ALU opcodes run really fast, so you can afford to
run a few off to do basic things that you aren't doing every six
instructions. I get that. But there are other issues I have trouble
with, like not having complete timing info.

Rick

rickman

Oct 15, 2012, 7:09:25 PM10/15/12
to
Oops, I forgot the link. Lots of useful info on this page. Like much
of his stuff, you really need to read it multiple times to get all the
juice from the grapes.

http://www.colorforth.com/inst.htm

Rick

Rod Pemberton

Oct 16, 2012, 1:21:52 AM10/16/12
to
"rickman" <gnu...@gmail.com> wrote in message
news:k5i4vr$qhl$1...@dont-email.me...
> On 10/15/2012 6:28 PM, Paul Rubin wrote:
...

> > I remember finding it incredibly frustrating to not have a way (unless I
> > missed it) to load small constants like 1 without burning several words.
>
> I know, but try designing a MISC sometime. There are tons of tradeoffs
> and Chuck picked ones that often end up with multiple instructions.
> Read his pages, he has some shortcuts that aren't so bad. Zero is just
> DUP OR if the stack is empty or DUP DUP OR if it isn't. Minus 1 is "get
> a zero" and -. There are some that aren't about constants. Subtract is
> quicker if you subtract S from T instead of the conventional T from S...
> - . + - rather than push - pop . + -
>

Instead of "DUP OR", I think you meant "DUP XOR".


So, must you do a stack size check to determine which sequence to use?

"stack_size" 0<> IF DUP THEN DUP XOR

That's obtuse.


Rod Pemberton


Rod Pemberton

Oct 16, 2012, 1:27:36 AM10/16/12
to
"Paul Rubin" <no.e...@nospam.invalid> wrote in message
news:7xtxtvo...@ruckus.brouhaha.com...
> Mark Wills <forth...@gmail.com> writes:
> > Perhaps it's considered too difficult to write code that operates on a
> > stack (where access to data is constrained by the order in which it is
> > on the stack) as opposed to code that accesses data via registers
> > (essentially random access).
>
> Stack processors including Chuck's have added registers to deal with the
> insufficiency of pure stacks for implementing useful code. Without the
> registers you end up having to use memory to hold temporary or other
> very frequently accessed values.

Exactly. Memory is generally much slower than registers. Registers are
usually implemented in very fast RAM and use a faster bus.

> > I think the world missed a good opportunity. There was a chance to
> > explore an avenue of processor design where processors could have
> > perhaps been an order of magnitude less complex, with the payoffs in
> > performance and power consumption that that brings. Instead, we seemed
> > to go down the "the more complex the better" route.
>
> "order of magnitude" sounds pretty dubious even for very simple
> processors. Chuck's F18 has around 7000 transistors (I guess that
> doesn't count the memory arrays) and I think this is comparable to the
> classic 8-bit micros that were sort of similar in capability. I could
> accept that the F18's performance is X percent better for some moderate
> X, but I don't believe 10 times better.

From another post of mine five years ago:

8088 core (29000 transistors)
6502 core (9000 transistors)
Z80 core (8500 transistors)

So, the F18 is definitely comparable on a transistor count basis
to "classic 8-bit micros".


Rod Pemberton


Paul Rubin

Oct 16, 2012, 1:26:44 AM10/16/12
to
"Rod Pemberton" <do_no...@notemailnotz.cnm> writes:
> Instead of "DUP OR", I think you meant "DUP XOR".


DUP OR is correct. OR on the F18 is an exclusive or. If you have a
concern about that, then as they say on the Charles Schwab commercials,
"talk to Chuck".

> So, must you do a stack size check to determine which sequence to use?

No. The F18 has circular stacks that don't underflow.
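
To illustrate the idea of a stack that wraps instead of underflowing, here is a minimal sketch. It is not the F18's actual stack structure: the class, its size of 8 and the method names are all hypothetical, and "or" is modelled as exclusive-or as described above.

```python
class CircularStack:
    """Fixed-size stack whose pointer wraps instead of underflowing."""

    def __init__(self, size=8):  # size is a hypothetical choice
        self.cells = [0] * size
        self.top = 0  # index of the current top element

    def push(self, value):
        self.top = (self.top + 1) % len(self.cells)
        self.cells[self.top] = value  # silently overwrites the oldest cell

    def pop(self):
        value = self.cells[self.top]
        self.top = (self.top - 1) % len(self.cells)
        return value  # always returns something, never raises

    def dup(self):
        self.push(self.cells[self.top])

    def xor(self):  # stands in for the machine's "or"
        t = self.pop()
        s = self.pop()
        self.push(s ^ t)


stack = CircularStack(4)
stack.cells = [5, 9, 2, 7]  # whatever junk happened to be there
stack.dup()
stack.xor()
result = stack.pop()
print(result)  # 0, regardless of the junk
```

Because the pointer only ever wraps, "dup or" yields zero without any check on how much is on the stack.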

Rod Pemberton

Oct 16, 2012, 1:38:37 AM10/16/12
to
"Bernd Paysan" <bernd....@gmx.de> wrote in message
news:5582590.o...@sunwukong.fritz.box...
> Rod Pemberton wrote:
...

> > Since both Chuck Moore's and Philip Koopman's numerous forays into
> > stack-based, Forth microprocessors were known failures
>
> Commercial failures: yes, sort-of (RTX made profits, especially with the
> radiation-hardened RTX2000). Technical failures: not really.

Technical in what sense? In the sense, that they were built? Yes, they
were successes in that sense. Or, in the sense that they developed new
technology that is still used in microprocessors today? No, they were
failures in that sense.

> I understand that your simple troll-mind can only provoke, and not gain
> knowledge.

Can you _possibly_ state something without an insult included?

> > by the end of
> > the 1980's or early 1990's, what made you believe your stack-based
> > design would be a success?
>
> Come on: This was a cool diploma thesis. The motivation to make a cool
> diploma thesis is not to get rich quickly, but to explore things nobody
> else has tried before, i.e. combining things that looked promising from
> a technical point of view (VLIW+stacks). That's how you make progress.

Stack machines looked promising after well publicized market failures?

I guess you were slightly out of touch with reality at the time.

> [insult relocated]
...

> Even if it is not a commercial success, I have learned a lot, and
> got a degree. Learning a lot is the main point of writing a diploma
> thesis, so it was a tremendous success.
...

> (to be precise: I don't mean you personally with "you". Idiots can't
> make progress, so maybe you can figure out why "you personally" is
> ruled out).

Well, declaring me to be an idiot puts you in an intellectual class similar to
that of Albert Einstein. So, genius, why did the EU get the peace prize
(insert any prestigious intellectual, scientific, or humanitarian award)
and not you?

> [...] Stack processors are alien and unknown, and that's the main
> reason for rejecting to use them.

Seriously? (rhetorical)

You know full well that the efficiency of register based processors is the
reason stack processors aren't used.

The 6502 demonstrated this the best. It had processor registers, stack in
memory, and fast zero-page memory access. The fast zero-page access was
used to provide a pseudo-register file. This processor, by itself, clearly
demonstrated that registers, even pseudo-registers, and not stacks, were far
more efficient.

> >> C compilers had just reached the point of
> >> unmaintainability, i.e. the point where further progress is almost
> >> impossible or at least requires to start over from scratch.
> >
> > Now, that's an interesting claim. I'm sure the list of reasons
> > would be interesting, to me, or, at least, amusing.
>
> Why should I go down to your level of discussion? You will definitely
> beat me with your experience of being an idiot, being subtly offensive,
> and misunderstanding everything I write. I should rather keep you in my
> killfile.

If you had killfiled me as you've stated you would at least three times
now, you wouldn't have responded to this post.

I'm not a Usenet newb. How is your memory? I've been posting to
c.l.f. for six years.

I was first introduced to Usenet in the late 1980's. But, I didn't start
posting to Usenet until a few years ago.

> You could instead look at the history of GCC development, which at that
> time struggled under Kenner, was forked to EGCS, and the maintainers of
> EGCS struggled for years to get things into GCC which had been state of
> the art, like SSA-trees.
>
> And it's not just GCC: llvm has exactly the same problems; the code llvm
> now generates for Gforth reminds me of the 3.x series of GCC, and while
> GCC 4.x generates better code than llvm, it has not fully recovered.
> They now try to change language (move to C++) in GCC, but llvm is
> already written in C++; it doesn't really help.

I was hoping for a list of what was unmaintainable with modern C compilers,
what limited their progress, and what required their current implementations
to be discarded. I've seen nothing of the sort of claimed issues with any
of the C compilers I've used, or even those that I'm aware of.


Rod Pemberton





Paul Rubin

Oct 16, 2012, 1:35:19 AM10/16/12
to
Paul Rubin <no.e...@nospam.invalid> writes:
>> So, must you do a stack size check to determine which sequence to use?
> No. The F18 has circular stacks that don't underflow.

Or to answer more relevantly, I should have said, the idea is that you
normally would know statically what stuff is on the stack (specifically
about whether you are willing to trash the top element).

Rod Pemberton

Oct 16, 2012, 1:50:18 AM10/16/12
to
"Paul Rubin" <no.e...@nospam.invalid> wrote in message
news:7xtxtvg...@ruckus.brouhaha.com...
> "Rod Pemberton" <do_no...@notemailnotz.cnm> writes:
...

> > Instead of "DUP OR", I think you meant "DUP XOR".
>
> DUP OR is correct. OR on the F18 is an exclusive or.

Does it have an inclusive-or? If so (expected), what's it named? "IOR"?

FYI, one should probably clarify if an "OR" instruction is actually an
exclusive-or, and isn't actually an inclusive-or, as would be expected by
most. Otherwise, it requires familiarity with instruction set of the
processor in question.


Rod Pemberton



Paul Rubin

Oct 16, 2012, 1:48:28 AM10/16/12
to
"Rod Pemberton" <do_no...@notemailnotz.cnm> writes:
> I was hoping for a list of what was unmaintainable with modern C compilers,
> what limited their progress, and what required their current implementations
> to be discarded. I've seen nothing of the sort of claimed issues with any
> of the C compilers I've used, or even those that I'm aware of.

I've been involved in GCC development in the not-very-recent past. It
is a grand old program that has built up a lot of cruft over the years
and become hard to maintain. I do think using C++ will help some. But
frankly I agree with LCC co-author Chris Fraser, who says that C is a
lousy language for writing compilers in (ref:
http://stackoverflow.com/a/809954 ).

Paul Rubin

Oct 16, 2012, 1:54:53 AM10/16/12
to
"Rod Pemberton" <do_no...@notemailnotz.cnm> writes:
> Does it have an inclusive-or?

Nope. You could look yourself, http://www.colorforth.com/inst.htm .

> Otherwise, it requires familiarity with instruction set of the
> processor in question.

Yes. It's the F18 machine language, and machine language programming
usually requires familiarity with the processor's instruction set.

Elizabeth D. Rather

Oct 16, 2012, 3:18:52 AM10/16/12
to
On 10/15/12 7:38 PM, Rod Pemberton wrote:
> "Bernd Paysan" <bernd....@gmx.de> wrote in message
> news:5582590.o...@sunwukong.fritz.box...
>> Rod Pemberton wrote:
> ...
>
>>> Since both Chuck Moore's and Philip Koopman's numerous forays into
>>> stack-based, Forth microprocessors were known failures
>>
>> Commercial failures: yes, sort-of (RTX made profits, especially with the
>> radiation-hardened RTX2000). Technical failures: not really.
>
> Technical in what sense? In the sense, that they were built? Yes, they
> were successes in that sense. Or, in the sense that they developed new
> technology that is still used in microprocessors today? No, they were
> failures in that sense.

I would call a "technical failure" something that didn't work, or didn't
work well enough to establish a market. The RTX was killed *after* it
was proven to work, and *before* it went into full production, and the
reasons were economic in ways independent of anything specific to the
RTX itself.

visua...@rocketmail.com

Oct 16, 2012, 5:20:54 AM10/16/12
to
On Monday, October 15, 2012 10:52:32 PM UTC+2, rickman wrote:
> On 10/15/2012 6:17 AM, visualforth.com wrote:
>
> > If you don't already know what advantages you get of stacks, please look at "Design of digital computers" by Hans W. Gschwind, Edward J. McCluskey, Springer Verlag, Wien 1967. There you find the explanations why stack computers are the future. It's not only because stack operations are high speed zero address operations.
>
> Wait a minute... you are citing a 35+ year old book for its forecast of
> the future... what?
>
> Today is yesterday's tomorrow! Am I here yet?
>
> Rick

Yes, I have the audacity to quote an ancient truth.
The Apple is falling since Newton's time, but Chuck Moore is still alive!

Dirk Bruehl
http://4e4th.com
We are all tolerant - as long as the other behaves as expected!

Stephen Pelc

Oct 16, 2012, 6:00:33 AM10/16/12
to
On Mon, 15 Oct 2012 07:01:54 -0700 (PDT), Mark Wills
<forth...@gmail.com> wrote:

>Bernd:
>> Commercial failures: yes, sort-of (RTX made profits, especially with the
>> radiation-hardened RTX2000). Technical failures: not really.
...
>For some reason, they just didn't get any traction. Maybe it's just
>about where the investment/R&D dollars are, not the technology per se.

The reasons for halting the Harris RTX development projects are
several and a little strange. Here are a few items to set the record
straight, though it was a very long time ago and I have probably got
some things wrong.

Harris took over the Intersil and RCA semiconductor products. The
RTX "champion" at Harris was reassigned to manage the takeover.

A third-party fab was being negotiated. Iraq invaded Kuwait and the
third party got cold feet and pulled out.

There were probably other factors.

From a technical view, the RTX2000 was not a good machine on which
to run C. Even then, a new CPU had to be a good target for C in order
to achieve commercial success. The RTX4000 (32 bit CPU) was a
different beast, but it was cancelled too.

Stephen

--
Stephen Pelc, steph...@mpeforth.com
MicroProcessor Engineering Ltd - More Real, Less Time
133 Hill Lane, Southampton SO15 5AF, England
tel: +44 (0)23 8063 1441, fax: +44 (0)23 8033 9691
web: http://www.mpeforth.com - free VFX Forth downloads

Syd Rumpo

Oct 16, 2012, 6:10:50 AM10/16/12
to
On 15/10/2012 22:47, rickman wrote:
> On 10/15/2012 1:51 PM, Bernd Paysan wrote:
>> Paul Rubin wrote:

<snip>

>>> Koopman's book at least for some sections assumed instructions to get
>>> at deeper levels of the stack than the top element, but in that case
>>> it's not really a pure stack machine.
>>
>> And it makes the stack slow, as well. The stack is ultra-fast,
>> *because* you directly can only access TOS and NOS, and nothing else.
>
> I think technically you can only access TOS in a true stack. Typically
> the TOS and often the NOS are implemented in registers with a stack
> behind them, that is how Chuck does it in the F18A.

The PSC1000 allowed access to IIRC the top 15 return stack elements.
That means you can push locals on to it and then use r7@ r7! etc as
appropriate. There was a multiple rdrop instruction for tidying up
afterwards.

In practice, with real code, I found this a very useful idea.

<snipped ad lib>

> I get tired of thinking about the GA144 sometimes.

Yes. It's frustrating to have something available with such great
potential, yet not be able to think of an application which couldn't be
done better in some other way.

Maybe (heresy!) it's a dud. The RTX2000s were great, the PSC1000 was
great too, technically. Several FPGA stack machines look to be very
useful, but maybe the GA144 is just backed too far into a niche.

Cheers
--
Syd

Ed

Oct 16, 2012, 7:31:05 AM10/16/12
to
Elizabeth D. Rather wrote:
> On 10/15/12 7:38 PM, Rod Pemberton wrote:
> > "Bernd Paysan" <bernd....@gmx.de> wrote in message
> > news:5582590.o...@sunwukong.fritz.box...
> >> Rod Pemberton wrote:
> > ...
> >
> >>> Since both Chuck Moore's and Philip Koopman's numerous forays into
> >>> stack-based, Forth microprocessors were known failures
> >>
> >> Commercial failures: yes, sort-of (RTX made profits, especially with the
> >> radiation-hardened RTX2000). Technical failures: not really.
> >
> > Technical in what sense? In the sense, that they were built? Yes, they
> > were successes in that sense. Or, in the sense that they developed new
> > technology that is still used in microprocessors today? No, they were
> > failures in that sense.
>
> I would call a "technical failure" something that didn't work, or didn't
> work well enough to establish a market. The RTX was killed *after* it
> was proven to work, and *before* it went into full production, and the
> reasons were economic in ways independent of anything specific to the
> RTX itself.

Yes but how many times can one claim Forth's lack of traction in the
marketplace is due to bad luck, lack of marketing, poor education,
inadequate Standards, hobbyists etc. When one runs out of excuses,
what fact remains?



Anton Ertl

Oct 16, 2012, 7:46:54 AM10/16/12
to
There are a number of ways to deal with having lots of live data (I
listed a number of them in my EuroForth 2011 paper
<http://www.complang.tuwien.ac.at/anton/euroforth/ef11/papers/ertl.pdf>).
Before we had locals, we used the other ways, but locals are often
better; e.g., using global variables makes the code non-reentrant,
using locals doesn't.

Concerning factoring, it helps in some cases, but not always. I put
up some cases where I could not come up with a factoring that
eliminates locals here as a challenge
<2003Jan2...@a0.complang.tuwien.ac.at>
<a3j4aa$a23$1...@news.tuwien.ac.at>
<2009Sep1...@mips.complang.tuwien.ac.at>, but nobody has taken up
the challenge.

- anton
--
M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html
comp.lang.forth FAQs: http://www.complang.tuwien.ac.at/forth/faq/toc.html
New standard: http://www.forth200x.org/forth200x.html
EuroForth 2012: http://www.euroforth.org/ef12/

Bernd Paysan

Oct 16, 2012, 10:56:10 AM10/16/12
to
Ed wrote:
> Yes but how many times can one claim Forth's lack of traction in the
> marketplace is due to bad luck, lack of marketing, poor education,
> inadequate Standards, hobbyists etc. When one runs out of excuses,
> what fact remains?

IMHO the main fact that I see as limiting for Forth is its strangeness.
Forth is only suitable for people who don't fear strange things.

Example: I have a recumbent bike. Not many people do have recumbent
bikes, as they look strange. It depends on age - when I drive through a
group of kids, I get comments like "cool bike!" and "I want that, too!"
When I drive through a group of near-retirement adults, I get comments
like "strange/awful/fearsome bike!" and some people even raised concerns
about how to steer it (there is a handlebar attached to the front wheel,
you know?) or that I need to turn the pedals with my hands or something
similarly silly - people who have seen me riding it, and I certainly
push the pedals with my feet, and have my hands on the handlebar -
which is underneath my seat.

A recumbent bike looks "all wrong" for people who are used to upright
bikes. A Forth compiler or program looks "all wrong" to people who are
experienced with C or similar languages.

In the Forth community, the number of people who ride a recumbent bike
is significantly larger. Recumbent bikes have a similar lack of
traction in the bicycle world, even though they are more secure, more
comfortable, and faster than normal bikes. Why? It's obvious that they
are just too strange for adults who have settled their beliefs and are
no longer in explorative mode.

Therefore, I think Dirk is right: If you want to make Forth popular, get
the kids. I'm not sure if you get the kids with a cheap controller
board (maybe it's more interesting to them to program their beloved
phones), but all experience we have in the Forth community with kids is
that they get it pretty quickly and that they don't have the problems
experienced programmers have with Forth: It isn't "too strange" for
them.

My experience is that when you suggest changing habits to people
who have settled, they interpret it immediately as an insult. When I said
to my ex-boss that I'm not going to have a stop-over flight to San
Francisco, but I take the direct flight instead, he had to approve that
(the direct flight was even slightly cheaper), but he complained
bitterly that I made them all look stupid. Look, they *were* stupid.
It's not my fault for pointing out the obvious. But these people like
to kill the messenger, and stay as stupid as they are.

When I was younger, I often wondered why the most retarded technology
seems to get the most traction. Ugly processors like PIG^HC or 8051,
x86, DOS/Windows, nowadays things like PHP. When you ask people why
they have chosen them, the response is always "because that's what the
majority has chosen". Sheeple don't think for themselves, they follow
the herd. The herd is stupid. And stupid people - when forced to make
a decision - get it wrong. At least most of the time. IBM has chosen
x86 and DOS, and nobody was ever fired for buying IBM - that's the way
these people finally make their decisions: "Even if it might be wrong
(which is inevitable, because I'm stupid), I'm not going to be blamed."

Retarded technology for retarded people. If you want to use Forth, you
have to hide it. Inside the JavaScript compiler. As boot system,
inside a controller. Or, as Carrier IQ demonstrated: In some pre-
installed spyware. Most Forth success stories are projects where the
Forth system is hidden. This is not by accident. Look e.g. at National
Instruments: They had a Forth system, Asyst, to glue all the instruments
together. They replaced it with Labview. And Labview is a huge
success. Labview is very deliberately written *not* to alienate anybody
stupid, especially not the deciders, who have never seen a real program,
but only flow-charts or data-flow graphs. That's how you program Labview
- you paint data/control flow graphs, looks like a textbook
illustration. Your boss will be able to pretend to understand it. It
does not make him look stupid, as ": foo @ swap over + ! ;" would.

Maybe I'm even wrong with Lua as user-visible language for my net2o
effort. Maybe a Labview/Matlab-like approach would be the real way to
success.

--
Bernd Paysan
"If you want it done right, you have to do it yourself"
http://bernd-paysan.de/

Josh Grams

unread,
Oct 16, 2012, 12:09:58 PM10/16/12
to
Anton Ertl wrote: <2012Oct1...@mips.complang.tuwien.ac.at>
> rickman <gnu...@gmail.com> writes:
>>Are local variables "useful" or just convenient? I don't agree that
>>there is "too much live data" for the stack, at least very seldom. It
>>has been put to me that this indicates poor factoring. But I guess I
>>shouldn't open the factoring can of worms.
>
> Concerning factoring, it helps in some cases, but not always. I put
> up some cases where I could not come up with a factoring that
> eliminates locals here as a challenge, but nobody has taken up the challenge.

For those who have trouble with message IDs:

><2003Jan2...@a0.complang.tuwien.ac.at>
https://groups.google.com/forum/#!msg/comp.lang.forth/fDIecx3K4yk/gbhctrTvxuEJ

><a3j4aa$a23$1...@news.tuwien.ac.at>
https://groups.google.com/forum/#!msg/comp.lang.forth/rZ9Dqtlt2mU/eGh-okHZNLcJ

><2009Sep1...@mips.complang.tuwien.ac.at>
https://groups.google.com/forum/#!msg/comp.lang.forth/CECdFr4jt5E/lEUpbG4XfHQJ

--Josh

visua...@rocketmail.com

unread,
Oct 16, 2012, 2:26:21 PM10/16/12
to mi-k...@t-online.de, di...@bruehlconsult.com
I totally agree. I have the same experience.
But I don't hide Forth.
There will be a day of accepting Forth.

Dirk Bruehl

"We are all totally tolerant - as long as the other behaves as expected!"

Elizabeth D. Rather

unread,
Oct 16, 2012, 2:52:25 PM10/16/12
to
The appropriate question to ask is whether in each instance you're
looking at a "reason" or an "excuse".

If you're trying to blame Forth's lack of traction on technical
inadequacy, you need to point to an actual technical shortcoming, not
perceptional issues. Perceptions are real and important, but they're in
the realm of communication and marketing, not technical issues.

rickman

unread,
Oct 16, 2012, 3:55:41 PM10/16/12
to
On 10/16/2012 6:10 AM, Syd Rumpo wrote:
> On 15/10/2012 22:47, rickman wrote:
>> On 10/15/2012 1:51 PM, Bernd Paysan wrote:
>>> Paul Rubin wrote:
>
> <snip>
>
>>>> Koopman's book at least for some sections assumed instructions to get
>>>> at deeper levels of the stack than the top element, but in that case
>>>> it's not really a pure stack machine.
>>>
>>> And it makes the stack slow, as well. The stack is ultra-fast,
>>> *because* you directly can only access TOS and NOS, and nothing else.
>>
>> I think technically you can only access TOS in a true stack. Typically
>> the TOS and often the NOS are implemented in registers with a stack
>> behind them, that is how Chuck does it in the F18A.
>
> The PSC1000 allowed access to IIRC the top 15 return stack elements.
> That means you can push locals on to it and then use r7@ r7! etc as
> appropriate. There was a multiple rdrop instruction for tidying up
> afterwards.
>
> In practice, with real code, I found this a very useful idea.
>
> <snipped ad lib>
>
>> I get tired of thinking about the GA144 sometimes.
>
> Yes. It's frustrating to have something available with such great
> potential, yet not be able to think of an application which couldn't be
> done better in some other way.

I don't agree with this. I have said many times that I think it can
make a very good digital receiver (SDR) because of being so like an FPGA
but at much lower power levels.


> Maybe (heresy!) it's a dud. The RTX2000s were great, the PSC1000 was
> great too, technically. Several FPGA stack machines look to be very
> useful, but maybe the GA144 is just backed too far into a niche.

Certainly there are any number of aspects that were not optimized for
real world designs that we all have right now. I'm told this design is
intended for the "Internet of Things". But in reality I think it was
not "intended" for anything. It is just what happened in Chuck's head
and now he is looking for how best to use it. Read his blog posts and
you will see that even now he seems to have no idea how to do any number
of tasks with the chip. It is all about exploring the solution space of
this device. Read "The Map is not the Territory".

Someone here said that about him. He is not really a product designer,
he is a researcher who isn't trying to design useful stuff, rather he
designs stuff and then looks for uses.

Rick

rickman

unread,
Oct 16, 2012, 3:58:56 PM10/16/12
to
On 10/16/2012 1:21 AM, Rod Pemberton wrote:
> "rickman"<gnu...@gmail.com> wrote in message
> news:k5i4vr$qhl$1...@dont-email.me...
>> On 10/15/2012 6:28 PM, Paul Rubin wrote:
> ....
>
>>> I remember finding it incredibly frustrating to not have a way (unless I
>>> missed it) to load small constants like 1 without burning several words.
>>
>> I know, but try designing a MISC sometime. There are tons of tradeoffs
>> and Chuck picked ones that often end up with multiple instructions.
>> Read his pages, he has some shortcuts that aren't so bad. Zero is just
>> DUP OR if the stack is empty or DUP DUP OR if it isn't. Minus 1 is "get
>> a zero" and -. There are some that aren't about constants. Subtract is
>> quicker if you subtract S from T instead of the conventional T from S...
>> - . + - rather than push - pop . + -
>>
>
> Instead of "DUP OR", I think you meant "DUP XOR".

You are not conversant with the GA144 assembly language. The OR
instruction *is* an "xor".


> So, must you do a stack size check to determine which sequence to use?
>
> "stack_size" 0<> IF DUP THEN DUP XOR
>
> That's obtuse.

If your code has to check the size of the stack to know if the stack is
empty you can't handle programming the GA144.

You've got bloody 64 words of combined instruction and data space.
Figure it out!

No, you just aren't allowed to post anything about the GA144 at this
point. I'm revoking your privileges!

Rick

rickman

unread,
Oct 16, 2012, 4:00:32 PM10/16/12
to
On 10/16/2012 1:50 AM, Rod Pemberton wrote:
> "Paul Rubin"<no.e...@nospam.invalid> wrote in message
> news:7xtxtvg...@ruckus.brouhaha.com...
>> "Rod Pemberton"<do_no...@notemailnotz.cnm> writes:
> ....
>
>>> Instead of "DUP OR", I think you meant "DUP XOR".
>>
>> DUP OR is correct. OR on the F18 is an exclusive or.
>
> Does it have an inclusive-or? If so (expected), what's it named? "IOR"?
>
> FYI, one should probably clarify if an "OR" instruction is actually an
> exclusive-or, and isn't actually an inclusive-or, as would be expected by
> most. Otherwise, it requires familiarity with instruction set of the
> processor in question.

Programming in assembly language requires familiarity with the
instruction set of the processor... DUH!

Rick
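
A minimal sketch of the zero and minus-one idioms argued about in the last few posts, modeled in Python. It assumes 18-bit words, that the F18 `or` instruction is a bitwise exclusive-or, and that `-` is a one's-complement invert, as the posters describe; the function names are invented for illustration and are not part of any GreenArrays tool.

```python
MASK = (1 << 18) - 1  # assumed 18-bit word size of the F18


def f18_or(t, s):
    """Model of the F18 'or' instruction, which is an exclusive-or."""
    return (t ^ s) & MASK


def f18_not(t):
    """Model of the F18 '-' instruction, assumed to be a one's-complement invert."""
    return ~t & MASK


def to_signed(w):
    """Interpret an 18-bit word as a two's-complement integer."""
    return w - (1 << 18) if w & (1 << 17) else w


def zero_from(t):
    """DUP OR: xor-ing any value with itself gives zero."""
    return f18_or(t, t)


def minus_one_from(t):
    """DUP OR - : inverting zero gives all ones, which reads as -1."""
    return f18_not(zero_from(t))
```

With an inclusive-or, as in standard Forth `OR`, the same sequence would leave the value unchanged instead of clearing it, which is the source of the confusion in the thread.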

rickman

unread,
Oct 16, 2012, 4:04:10 PM10/16/12
to
I've never looked at how locals are implemented. For them to be
reentrant, don't they have to be on the stack? What if your stack is
not addressable? Do you have to build a stack in memory? That would be
very awkward on the F18A. With such small memory space I think the
entire issue is a "moop" point to quote George Costanza.

Rick

rickman

unread,
Oct 16, 2012, 4:09:42 PM10/16/12
to
Are you suggesting that it just *sucks*???

Rick

humptydumpty

unread,
Oct 16, 2012, 4:25:10 PM10/16/12
to
On Tuesday, October 16, 2012 12:15:32 PM UTC, Anton Ertl wrote:
Hi Anton!

A try for map-array:

: do-at ( x xt[x--y] a -- y xt )
  @ swap dup >R execute R> ;

: mpa ( .. xt a u -- .. )
  cells bounds ?DO I do-at 1 cells +LOOP drop ;

Have a nice day,
humptydumpty

van...@vsta.org

unread,
Oct 16, 2012, 4:27:13 PM10/16/12
to
rickman <gnu...@gmail.com> wrote:
> Are you suggesting that it just *sucks*???

For more and more applications, it would be the Wrong Choice. I just
quickly typed in the following, which tallies use of words in input
text, and then reports by word and then by count of incidences. I
didn't have to think about word length limits or number of word limits,
nor did I have to fuss with storage allocation and recovery. I used
just low level Python data structures, not any prebuilt higher level
classes.

Languages need to climb a curve of expressiveness (generators and
lambda were later additions to Python). At some point, they become
unwieldy to further evolve, and then you look for the best points to
carry forward into a new generation of programming language.

Forth did OK as it evolved in place. We're to the point where it
really needs to have its best bits carry forward to something new.
I can't say how to do that; every time I attempt the exercise
I map out something which sucks. The thing which is needed is
obvious by its absence.

Andy Valencia

==================================

import sys
res = {}
for l in sys.stdin:
    for w in l.split():
        res[w] = res.get(w, 0) + 1

print "Sorted by word:"
for w in sorted(res.keys()):
    print w, res[w]

print "Sorted by incidences:"
for w,count in sorted(res.iteritems(), key=lambda tup: tup[1]):
    print w, count

--
Andy Valencia
Home page: http://www.vsta.org/andy/
To contact me: http://www.vsta.org/contact/andy.html
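
The word-tally program in the post above is written for Python 2 (print statements, `iteritems`). A hedged sketch of the same idea for Python 3 follows; it keeps to a plain dict as the post does, and the function names `tally` and `report` are made up here for illustration.

```python
import io


def tally(stream):
    """Count whitespace-separated words in a text stream using a plain dict."""
    res = {}
    for line in stream:
        for w in line.split():
            res[w] = res.get(w, 0) + 1
    return res


def report(res):
    """Return two listings of (word, count) pairs: by word, then by count."""
    by_word = sorted(res.items())
    by_count = sorted(res.items(), key=lambda tup: tup[1])
    return by_word, by_count


# Usage with an in-memory stream; pass sys.stdin instead to mirror the post.
sample = io.StringIO("forth is strange\nforth is small\n")
by_word, by_count = report(tally(sample))
print("Sorted by word:")
for w, count in by_word:
    print(w, count)
print("Sorted by incidences:")
for w, count in by_count:
    print(w, count)
```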

Syd Rumpo

unread,
Oct 16, 2012, 4:31:44 PM10/16/12
to
On 16/10/2012 20:55, rickman wrote:
> On 10/16/2012 6:10 AM, Syd Rumpo wrote:
>> On 15/10/2012 22:47, rickman wrote:

<snipn>

>>> I get tired of thinking about the GA144 sometimes.
>>
>> Yes. It's frustrating to have something available with such great
>> potential, yet not be able to think of an application which couldn't be
>> done better in some other way.
>
> I don't agree with this. I have said many times that I think it can
> make a very good digital receiver (SDR) because of being so like an FPGA
> but at much lower power levels.

Yes, I've looked at it from the FPGA standpoint, but it just doesn't
have the routing flexibility and multiplies are slower than you might
expect. Maybe that's fixable in the future. How far have you looked at
SDR on GA144?

<snipped>

> Someone here said that about him. He is not really a product designer,
> he is a researcher who isn't trying to design useful stuff, rather he
> designs stuff and then looks for uses.

Agreed, and it's an enviable place to be. That doesn't help me though,
and it doesn't make GA144 viable. I wish it were otherwise.

Cheers
--
Syd

rickman

unread,
Oct 16, 2012, 5:06:25 PM10/16/12
to
On 10/16/2012 4:31 PM, Syd Rumpo wrote:
> On 16/10/2012 20:55, rickman wrote:
>> On 10/16/2012 6:10 AM, Syd Rumpo wrote:
>>> On 15/10/2012 22:47, rickman wrote:
>
> <snipn>
>
>>>> I get tired of thinking about the GA144 sometimes.
>>>
>>> Yes. It's frustrating to have something available with such great
>>> potential, yet not be able to think of an application which couldn't be
>>> done better in some other way.
>>
>> I don't agree with this. I have said many times that I think it can
>> make a very good digital receiver (SDR) because of being so like an FPGA
>> but at much lower power levels.
>
> Yes, I've looked at it from the FPGA standpoint, but it just doesn't
> have the routing flexibility and multiplies are slower than you might
> expect. Maybe that's fixable in the future. How far have you looked at
> SDR on GA144?

I haven't gone too far with it. I was actually looking at an
oscilloscope design as a way to get familiar with the device and found
that Green Arrays would not provide the timing information I wanted to
analyze an SDRAM interface using a clock. I wanted to paper calculate
the timing from the clock input to the GA144 to signal outputs the same
way I would in an FPGA. This requires knowing timing relative to the
stop/start mechanism of the CPU, both on I/O and inter-processor. They
would not provide that info, partly because they haven't measured it and
it would be a PITA to characterize it the same way they do the rest of
the timing but they also said they felt this would provide too much
insight into their proprietary design. I was free, however, to buy an
eval board and characterize their chips for myself. Honest, that's what
they said.


>> Someone here said that about him. He is not really a product designer,
>> he is a researcher who isn't trying to design useful stuff, rather he
>> designs stuff and then looks for uses.
>
> Agreed, and it's an enviable place to be. That doesn't help me though,
> and it doesn't make GA144 viable. I wish it were otherwise.

I may go back to the oscilloscope design. I don't think I can get an
SDRAM interface to run any faster than 50 MHz vs. the 133 MHz the chip
is designed for and I'm not certain that will work. Meanwhile I believe
Chuck claims near 50 ns operation of the static RAM on the eval boards.
That would be good as that is random access timing. But Chuck's
timing is typically done with nop cycles which result in timing
variations which have to be allowed for with timing margin. His work
seems to be more lab-type stuff than production-oriented work, and he
is happy if it only works with his monitor or only on cool days, that
sort of thing.

Rick

Elizabeth D Rather

unread,
Oct 16, 2012, 5:37:33 PM10/16/12
to
Most Forth implementations of locals put them somewhere relative to the
return stack for reentrancy.

As for the larger point, as Anton notes, there are many ways to deal
with large amounts of "live" data. In our experience, locals are
occasionally useful (especially in things like managing Windows calls),
but mostly not. We have never felt a need for them in embedded systems.

The issues involving stack and data management as well as factoring are
in the realm of Forth-specific programming skills that are part of the
learning curve for programmers accustomed to other languages. The best
way to summarize it is to strive for simplicity: if factoring makes your
programs simpler (e.g. giving a clear name to recurrent operations) or
simplifies stack management, it's good. As the FizzBuzz conversation
recently showed, factoring for its own sake is not helpful. The same is
true for locals: if they simplify your code, they're good. If they're a
crutch for avoiding developing stack management skills, not so much.

Paul Rubin

unread,
Oct 16, 2012, 11:56:46 PM10/16/12
to
Bernd Paysan <bernd....@gmx.de> writes:
> Chuck has an A and B register, which are hard-coded in the instructions
> that use them (i.e. they are not general purpose registers); they are
> used as pointers into memory.

Register A is relatively general purpose and it is quite useful for
holding temporary values. Register B is more specialized.

> The Transputer (which had a very shallow stack) had its workspace, a
> pointer into the fast on-chip SRAM which could be used with small

That sounds sort of like SPARC register windows? I have the impression
those were considered a good idea at one time but became a bottleneck
later, for reasons I don't understand.

>> In stack hardware where locals and memory are 10x(?) slower than
>> stack accesses, overall program performance probably suffers.
> There's no need to make your memory access *that* slow. Single-cycle
> for a small SRAM is easily doable.

It's not just the memory access itself; there's also the extra
instruction cycles, and the extra memory accesses needed to get the
address, plus the additional fetch to get the next word full of
instructions. Maybe with faster memory it can be less than 10x but it's
still a lot.

> The classic 8-bit micros are really slow, especially since you needed 16
> bit operations (which are then two 8 bit operations or more) quite
> often. They also all needed several cycles or phases for one
> instruction - typically 4 phases minimum. Going to a 1 cycle 16 bit
> design gives an order of magnitude.

The F18 has several phases I'm pretty sure. And an 18-bit addition on
the F18 takes two cycles (you have to insert a no-op for the ripple
carry to settle, or something like that). And some 8-bitters had some
16-bit operations, though maybe not the classic ones.

Paul Rubin

unread,
Oct 17, 2012, 12:10:15 AM10/17/12
to
an...@mips.complang.tuwien.ac.at (Anton Ertl) writes:
> There are a number of ways to deal with having lots of live data (I
> listed a number of them in my EuroForth 2011 paper
> <http://www.complang.tuwien.ac.at/anton/euroforth/ef11/papers/ertl.pdf>.
> Before we had locals, we used the other ways, but locals are often
> better; e.g., using global variables makes the code non-reentrant,
> using locals doesn't.

The paper is interesting, though it doesn't mention the obvious
(unspeakable?) method, namely using PICK to random-access the data
stack.

Here's my version of the rectangle problem, using doubleword operations
but no PICK:

: draw ( x0 y0 x1 y1 -- )
  2swap ." line from " swap . . ." to " swap . . cr ;

: xy-ll ( xll yll xur yur -- xll yll xur yur xll yll )
  2over ;
: xy-ul ( xll yll xur yur -- xll yll xur yur xll yur )
  2over 2over nip nip ;
: xy-ur ( xll yll xur yur -- xll yll xur yur xur yur )
  2dup ;
: xy-lr ( xll yll xur yur -- xll yll xur yur xur yll )
  2over 2over 2>r nip 2r> drop swap ;

: rect ( xll yll xur yur -- ) cr
  xy-ul 2>r xy-ll 2r> draw
  xy-ur 2>r xy-ul 2r> draw
  xy-lr 2>r xy-ur 2r> draw
  xy-ll 2>r xy-lr 2r> draw
  2drop 2drop ;

For readability it saves more junk on the stack than necessary.

A version with PICK is almost as easy as the locals version. Of course
PICK is very distasteful, even more than locals.
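
One way to check the stack shuffling in the rectangle words above is to trace them on a model stack. The sketch below is such a trace in Python, not a Forth system: it assumes the usual meanings of 2OVER, 2DUP, NIP, DROP, SWAP, 2>R and 2R>, and it records the lines DRAW would print instead of printing them.

```python
def rect(xll, yll, xur, yur):
    """Trace the RECT word; return the list of lines drawn and the final stack."""
    s = [xll, yll, xur, yur]   # data stack, top of stack at the end
    r = []                     # return stack
    lines = []                 # what DRAW would have printed

    def two_over(): s.extend(s[-4:-2])              # ( a b c d -- a b c d a b )
    def two_dup(): s.extend(s[-2:])                 # ( a b -- a b a b )
    def nip(): del s[-2]                            # ( a b -- b )
    def drop(): s.pop()                             # ( a -- )
    def swap(): s[-1], s[-2] = s[-2], s[-1]         # ( a b -- b a )
    def two_to_r(): r.extend(s[-2:]); del s[-2:]    # 2>r
    def two_r_from(): s.extend(r[-2:]); del r[-2:]  # 2r>

    def draw():                                     # ( x0 y0 x1 y1 -- )
        x0, y0, x1, y1 = s[-4:]
        del s[-4:]
        lines.append(((x0, y0), (x1, y1)))

    def xy_ll(): two_over()
    def xy_ul(): two_over(); two_over(); nip(); nip()
    def xy_ur(): two_dup()
    def xy_lr():
        two_over(); two_over(); two_to_r(); nip(); two_r_from(); drop(); swap()

    # Each row of RECT is: <end point> 2>r <start point> 2r> draw
    for end, start in ((xy_ul, xy_ll), (xy_ur, xy_ul), (xy_lr, xy_ur), (xy_ll, xy_lr)):
        end(); two_to_r(); start(); two_r_from(); draw()
    del s[-4:]                                      # 2drop 2drop
    return lines, s
```

Tracing `rect(1, 2, 7, 9)` walks the four corners in order (lower left, upper left, upper right, lower right, back to lower left) and leaves the data stack empty.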

Andrew Haley

unread,
Oct 17, 2012, 12:44:17 AM10/17/12
to
Paul Rubin <no.e...@nospam.invalid> wrote:
> Bernd Paysan <bernd....@gmx.de> writes:
>
>> The Transputer (which had a very shallow stack) had its workspace, a
>> pointer into the fast on-chip SRAM which could be used with small
>
> That sounds sort of like SPARC register windows?

No, the workspace pointer just pointed to the local frame. Local
variables were at small offsets from the workspace pointer. This made
for very fast context switches: it just had to save the program
counter in your workspace, swap the workspace pointer, and restore the
program counter. This was done with a simple round robin; there was
no other context to save and restore.

Andrew.
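
The context switch described above can be sketched as a toy model. This is an illustration under stated assumptions, not the Transputer's actual scheduler: the class name, the slot used to save the program counter (one word below the workspace pointer), and the queue of ready workspaces are all chosen here for the example.

```python
from collections import deque

PC_SLOT = -1  # assumed offset (relative to the workspace pointer) for the saved pc


class Machine:
    """Toy model: locals live at small offsets from a workspace pointer."""

    def __init__(self, mem_words):
        self.mem = [0] * mem_words
        self.wptr = 0          # workspace pointer of the running process
        self.pc = 0            # program counter of the running process
        self.ready = deque()   # workspace pointers of waiting processes

    def ldl(self, n):
        """Load local n of the current process."""
        return self.mem[self.wptr + n]

    def stl(self, n, value):
        """Store into local n of the current process."""
        self.mem[self.wptr + n] = value

    def add_process(self, wptr, pc):
        """Queue a process by saving its pc in its own workspace."""
        self.mem[wptr + PC_SLOT] = pc
        self.ready.append(wptr)

    def switch(self):
        """Round-robin switch: save pc, swap workspace pointer, restore pc."""
        self.mem[self.wptr + PC_SLOT] = self.pc
        self.ready.append(self.wptr)
        self.wptr = self.ready.popleft()
        self.pc = self.mem[self.wptr + PC_SLOT]
```

The point of the model is that `switch` touches only one memory word and two machine registers; the locals of each process stay where they are in memory.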

humptydumpty

unread,
Oct 17, 2012, 2:55:23 AM10/17/12
to
On Tuesday, October 16, 2012 12:15:32 PM UTC, Anton Ertl wrote:
Hi!

Should `execute' keep xt on stack?

: execute ( ..x xt -- ..y xt )
dup >R execute R> ;

With this new `execute' map-array falls trivial:

: map-array ( .. xt[..x I--..y] a u -- .. )
cells bounds DO? I swap execute 1 cells +LOOP drop ;

where xt has stack effect ( ..x I -- ..y ); in this case I is an address.

It also works for 2D maps where block inside loop is `I J rot execute'.

humptydumpty

unread,
Oct 17, 2012, 3:06:10 AM10/17/12
to
Sorry for typo, should be `?DO' not `DO?'.

Ed

unread,
Oct 17, 2012, 4:25:19 AM10/17/12
to
Elizabeth D. Rather wrote:
> On 10/16/12 1:31 AM, Ed wrote:
> > ...
> > Yes but how many times can one claim Forth's lack of traction in the
> > marketplace is due to bad luck, lack of marketing, poor education,
> > inadequate Standards, hobbyists etc. When one runs out of excuses,
> > what fact remains?
>
> The appropriate question to ask is whether in each instance you're
> looking at a "reason" or an "excuse".
>
> If you're trying to blame Forth's lack of traction on technical
> inadequacy, you need to point to an actual technical shortcoming, not
> perceptional issues. Perceptions are real and important, but they're in
> the realm of communication and marketing, not technical issues.

Is it not a failure of Forth that a product was deemed not sufficiently
important to proceed with?

From the point of view of forthers, Forth is their world. From the
world's viewpoint however Forth may well be insignificant.



Albert van der Horst

unread,
Oct 17, 2012, 6:01:40 AM10/17/12
to
In article <l-idnbOtB7U8quPN...@supernews.com>,
And may I add to this, the lowest 16 in the workspace were really registers.
A register to register move r4-r5 would be two bytes
ldl 4
stl 5
Comparable in speed and code density with the 80386 of the time.
But look what happens if you need 17 registers! Nothing much.
The ldl 4 will take one more byte.

What happens if you run out of registers on the 80386?
head scratching and a total redesign.

I have done my twin prime counting attempt totally in transputer
(Forth macro) assembly. Smooth.

>
>Andrew.

Groetjes Albert
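
A hedged sketch of why the seventeenth local costs only one extra byte. It assumes the Transputer's byte-wide instruction format with a 4-bit function code and a 4-bit operand, where larger operands are built up with prefix bytes; the function codes below (pfix = 2, ldl = 7, stl = 0xD) are quoted from memory and should be checked against the instruction set manual. Negative operands (nfix) are not modeled.

```python
PFIX, LDL, STL = 0x2, 0x7, 0xD  # assumed function codes, see lead-in


def encode(fn, operand):
    """Encode one direct instruction with a non-negative operand."""
    if operand < 0:
        raise ValueError("negative operands need nfix, which is not modeled here")
    nibbles = []
    while True:
        nibbles.append(operand & 0xF)  # least significant nibble first
        operand >>= 4
        if operand == 0:
            break
    # Every nibble above the lowest one is loaded by its own pfix byte.
    out = [(PFIX << 4) | n for n in reversed(nibbles[1:])]
    out.append((fn << 4) | nibbles[0])
    return bytes(out)


# The two-byte "register move" from the post, and the first local past 15.
move = encode(LDL, 4) + encode(STL, 5)
far_load = encode(LDL, 16)
print(move.hex(), far_load.hex())
```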

Albert van der Horst

unread,
Oct 17, 2012, 6:25:40 AM10/17/12
to
In article <2012Oct1...@mips.complang.tuwien.ac.at>,
For me this challenge is not about live data, but an interface definition
that is non-Forthy. I'd say if you format your data this way, you not
only have problems realizing the word that prints, but also where the
word is used. If it is used in a way like 7 5 3 .print , it might
as well be `` <% 7%s %| 3 %s %> 7 OVER - blank TYPE '' in <# # #> style.

So a typical Forth evasion: we need a <# equivalent for floating point.

>
>- anton
>--
>M. Anton Ertl http://www.complang.tuwien.ac.at/anton/home.html

Groetjes Albert

Rod Pemberton

unread,
Oct 17, 2012, 6:32:44 AM10/17/12
to
"rickman" <gnu...@gmail.com> wrote in message
news:k5kea4$6ue$1...@dont-email.me...
> On 10/16/2012 1:21 AM, Rod Pemberton wrote:
> > "rickman"<gnu...@gmail.com> wrote in message
> > news:k5i4vr$qhl$1...@dont-email.me...
> >> On 10/15/2012 6:28 PM, Paul Rubin wrote:
> > ....
> >
> >>> I remember finding it incredibly frustrating to not have a way (unless
> >>> I missed it) to load small constants like 1 without burning several
> >>> words.
> >>
> >> I know, but try designing a MISC sometime. There are tons of tradeoffs
> >> and Chuck picked ones that often end up with multiple instructions.
> >> Read his pages, he has some shortcuts that aren't so bad. Zero is just
> >> DUP OR if the stack is empty or DUP DUP OR if it isn't. Minus 1 is
> >> "get a zero" and -. There are some that aren't about constants.
> >> Subtract is quicker if you subtract S from T instead of the
> >> conventional T from S... - . + - rather than push - pop . + -
> >>
> >
> > Instead of "DUP OR", I think you meant "DUP XOR".
>
> You are not conversant with the GA144 assembly language. The OR
> instruction *is* an "xor".

Who mentioned GA144?

> > So, must you do a stack size check to determine which sequence to use?
> >
> > "stack_size" 0<> IF DUP THEN DUP XOR
> >
> > That's obtuse.
>
> If your code has to check the size of the stack to know if the stack is
> empty you can't handle programming the GA144.
>
> You've got bloody 64 words of combined instruction and data space.
> Figure it out!
>
> No, you just aren't allowed to post anything about the GA144 at this
> point. I'm revoking your privileges!
>

Do we have multiple people posting as "rickman"?

I didn't post anything about the GA144, nor even the F18 that Paul
mentioned. I merely corrected an incorrect statement, that a "DUP OR"
clears a value. That's not true for Forth or most other languages where an
"OR" is an inclusive-or. So, I corrected a statement that appeared to be
incorrect to me and probably most others here. Anyone who has programmed in
numerous assembly and high-level languages knows that "OR" is commonly known
as an inclusive-or, not as an exclusive-or which is commonly known as "XOR".
This also happens to be a Forth newsgroup, not GA144 nor F18. The various
Forth standards (ANS, F83, F79, fig) define OR as an inclusive-or and XOR as
an exclusive-or. (See ANS 6.1.1980 and 6.1.2490)


Rod Pemberton




Rod Pemberton

unread,
Oct 17, 2012, 6:40:00 AM10/17/12
to
"rickman" <gnu...@gmail.com> wrote in message
news:k5ked3$6ue$2...@dont-email.me...
Paul may be programming the F18 (or you may be programming the GA144), but
I'm certainly not, nor are most others here. Most here are probably not
familiar with those instruction sets, nor care to ever be. So, if Paul
knows his statements are going to lead to an erroneous understanding, he
could've preemptively prevented it by noting that it's not what one typically
expects. From his few recent posts, I'd say he is thoroughly familiar with
enough microprocessors and programming languages that he knows "OR" normally
means inclusive-or in most programming contexts.


Rod Pemberton



humptydumpty

unread,
Oct 17, 2012, 8:01:09 AM10/17/12
to
Hi Paul!
This subject was many times debated here in c.l.f.
What comes to mind now is something like:

: 4dup 2over 2over ;

: hline ( y2 y1 x -- ;use 'putpixel' inside a ?DO..LOOP ) ... ;
: vline ( y x2 x1 -- ;the same idea ) ... ;

: rectangle ( y2 y1 x2 x1 -- )
4dup drop hline
4dup nip hline
4dup rot drop vline
vline drop ;

Anton Ertl

unread,
Oct 17, 2012, 10:00:53 AM10/17/12
to
rickman <gnu...@gmail.com> writes:
>I've never looked at how locals are implemented. For them to be
>reentrant, don't they have to be on the stack?

The usual implementation is to have them on the return stack, or to
have a separate locals stack.

> What if your stack is
>not addressable? Do you have to build a stack in memory? That would be
>very awkward on the F18A. With such small memory space I think the
>entire issue is a "moop" point to quote George Costanza.

If the F18A is something like a GA144 core, yes, you probably would
not use locals there. One could work on a sophisticated compiler that
uses A or B for a local or puts it on one of the stacks, but I think
that would miss the point of this architecture. There, you have A and
B for dealing with more data than can be dealt with easily with two
stacks, and the programmer is responsible for managing A and B.

I don't think that this is a good solution for large programs, but
these CPUs cannot run large programs anyway.

Anton Ertl

unread,
Oct 17, 2012, 10:08:38 AM10/17/12
to
humptydumpty <oua...@gmail.com> writes:
>A try for map-array:
>
>: do-at ( x xt[x--y] a -- y xt )
> @ swap dup >R execute R> ;
>
>: mpa ( .. xt a u -- .. )
> cells bounds ?DO I do-at 1 cells +LOOP drop ;

Looks functional. But is it an improvement over

: map-array ( ... addr u xt -- ... )
\ executes xt ( ... x -- ... ) for every element of the array starting
\ at addr and containing u elements
{ xt }
cells over + swap ?do
i @ xt execute
1 cells +loop ;

?

Anton Ertl

unread,
Oct 17, 2012, 10:21:15 AM10/17/12
to
Well, when I started on what eventually became F.RDP, I thought in
this direction at first, but did not see a good way to do it, so I
eventually settled on the F.RDP interface. But if you can come up
with a good interface that can be implemented with reasonable effort,
I would be glad to see it.

Bernd Paysan

unread,
Oct 17, 2012, 11:54:36 AM10/17/12
to
Anton Ertl wrote:
> Well, when I started on what eventually became F.RDP, I thought in
> this direction at first, but did not see a good way to do it, so I
> eventually settled on the F.RDP interface. But if you can come up
> with a good interface that can be implemented with reasonable effort,
> I would be glad to see it.

I don't like the factoring of f.rdp. The steps of working with floating
point numbers to get a printable representation IMHO should be:

Obtain the exponent in the current base (or base 10 if you restrict
output to base 10), and normalize the rest of the floating point number
(in [0.1,1[).

Do exponent adjustments - e.g. we want to print with exponent=0 for
exponent ranges [-2,5], and with an exponent factor of 3 for everything
else.

Split the normalized floating point number into integer and fractional
part (both results are still FP numbers, as the integer part might not
fit into a cell or double cell).

Extract integer part digits and fractional digits into a buffer.
Integer digits are added to the left, fractional digits to the right.
Allow to limit the total number of digits in that buffer.

Extract the sign.

Append the exponent in a suitable form (e.g. as SI unit prefix or as
e<sign><number>). Maybe you might want to know beforehand how many
characters that will take to make other adjustments.

--
Bernd Paysan
"If you want it done right, you have to do it yourself"
http://bernd-paysan.de/

Anton Ertl

unread,
Oct 17, 2012, 11:58:23 AM10/17/12
to
Bernd Paysan <bernd....@gmx.de> writes:
>Anton Ertl wrote:
>> Well, when I started on what eventually became F.RDP, I thought in
>> this direction at first, but did not see a good way to do it, so I
>> eventually settled on the F.RDP interface. But if you can come up
>> with a good interface that can be implemented with reasonable effort,
>> I would be glad to see it.
>
>I don't like the factoring of f.rdp. The steps of working with floating
>point numbers to get a printable representation IMHO should be:
>
>Obtain the exponent in the current base (or base 10 if you restrict
>output to base 10), and normalize the rest of the floating point number
>(in [0.1,1[).

Unfortunately the exponent and the normalization depend on the number
of digits to be shown. E.g., if you want to show 99.99 with two or
more fractional digits, the exponent is 2 and the normalized number is
0.9999, whereas for 0 or 1 fractional digits, the exponent is 3 and
the normalized number is 0.1. And if you want to drop into scientific
or engineering representation in some circumstances while keeping the
total width the same, there are additional cases to consider.

Anyway, I am certainly interested in seeing your better-factored FP
output words.
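
The 99.99 example above can be reproduced with a small sketch. It uses Python's `decimal` module rather than Forth floating point, handles non-negative inputs only, and the function name `normalize` and the choice of round-half-even are assumptions made for the illustration. It rounds to the requested number of fractional digits first and only then derives the exponent and a mantissa in [0.1,1[.

```python
from decimal import Decimal, ROUND_HALF_EVEN


def normalize(text, frac_digits):
    """Round to frac_digits fractional digits, then split into
    (exponent, mantissa) with mantissa in [0.1, 1) and
    value == mantissa * 10**exponent."""
    x = Decimal(text)
    quantum = Decimal(1).scaleb(-frac_digits)      # 10**-frac_digits
    rounded = x.quantize(quantum, rounding=ROUND_HALF_EVEN)
    if rounded == 0:
        return 0, Decimal(0)
    exponent = rounded.adjusted() + 1              # position of the leading digit
    mantissa = rounded.scaleb(-exponent)
    return exponent, mantissa


for digits in (2, 1, 0):
    print(digits, normalize("99.99", digits))
```

With two fractional digits the result has exponent 2 and mantissa 0.9999; with one or zero fractional digits the rounding carries into a new leading digit, so the exponent becomes 3 and the mantissa 0.1. That is the dependency on the digit count that makes "obtain the exponent first" awkward as a separate step.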

rickman

unread,
Oct 17, 2012, 1:23:10 PM10/17/12
to
On 10/16/2012 11:56 PM, Paul Rubin wrote:
> Bernd Paysan<bernd....@gmx.de> writes:
>> Chuck has an A and B register, which are hard-coded in the instructions
>> that use them (i.e. they are not general purpose registers); they are
>> used as pointers into memory.
>
> Register A is relatively general purpose and it is quite useful for
> holding temporary values. Register B is more specialized.

Neither A nor B are "general" registers. The only difference is register
B can only be written while A can be read or written. Each can be the
address for memory accesses. There are no other operations supported.
So they aren't even "relatively" general.


>> The Transputer (which had a very shallow stack) had its workspace, a
>> pointer into the fast on-chip SRAM which could be used with small
>
> That sounds sort of like SPARC register windows? I have the impression
> those were considered a good idea at one time but became a bottleneck
> later, for reasons I don't understand.

No, register windows are accessed as registers. The Transputer on-chip
RAM is just that: RAM that is on chip to be faster. It is addressed like
RAM, not registers.

But there isn't much reason to compare a Transputer to a SPARC. I don't
think there were many SPARCs used as embedded processors while nearly
all Transputers were such. The SPARC architecture was intended for
desktop use.


>>> In stack hardware where locals and memory are 10x(?) slower than
>>> stack accesses, overall program performance probably suffers.
>> There's no need to make your memory access *that* slow. Single-cycle
>> for a small SRAM is easily doable.
>
> It's not just the memory access itself; there's also the extra
> instruction cycles, and the extra memory accesses needed to get the
> address, plus the additional fetch to get the next word full of
> instructions. Maybe with faster memory it can be less than 10x but it's
> still a lot.

In the GA144, with on-chip RAM, a memory access using register addressing
is 3x slower than a stack operation. But that is just to load/store the
memory word; you still need to do the operation from the stack.

If the address is immediate, a fetch from program space is needed, taking
another 3x plus the additional time for the disrupted instruction
prefetch. So conservatively say it is then 7x slower than the operation
with data on the stack.

But the data has to get to the stack somehow. The question is whether
the data started on the stack or not.


>> The classic 8-bit micros are really slow, especially since you needed 16
>> bit operations (which are then two 8 bit operations or more) quite
>> often. They also all needed several cycles or phases for one
>> instruction - typically 4 phases minimum. Going to a 1 cycle 16 bit
>> design gives an order of magnitude.
>
> The F18 has several phases I'm pretty sure. And an 18-bit addition on
> the F18 takes two cycles (you have to insert a no-op for the ripple
> carry to settle, or something like that). And some 8-bitters had some
> 16-bit operations, though maybe not the classic ones.

No, there are no "phases" for the F18A, it is an async processor. There
is no clock even. There are just instruction timings. The carry
propagation for an add takes the same time as two "basic" opcode cycles,
but that is from the time the operands on the stack are stable. If
there is another instruction that doesn't disturb the stack just before
the + then the carry has had time to propagate. The + instruction
always executes in the same time, you just have to provide time for the
carry to propagate.


Rick

rickman

Oct 17, 2012, 1:32:55 PM10/17/12
to
I would suggest that by now, you should know that there is little about
the F18A or the GA144 that is like anything you have worked with before.
Don't assume...

Also, I made the post that you misunderstood. I did think for a moment
about the code being misunderstood and wasn't sure who did and didn't
know about the opcodes of the F18A. Now I know. I also realized it
would stick in the minds of the readers much better if just this sort
of discussion resulted. There are many ways to make a point and I think
this one has been rooted in your mind pretty well, no?

"mission accomplished"

Rick

Elizabeth D. Rather

Oct 17, 2012, 2:33:08 PM10/17/12
to
That depends on whether the decision to drop the RTX had anything to do
with Forth either way. Your last sentence is critical: to us, the RTX
was a "Forth chip". To Harris, it was "that new processor" and at the
time giving birth to a new processor wasn't something they were prepared
to support regardless of its features. If Forth had nothing to do with
the decision, it can hardly be considered a "failure of Forth".

Paul Rubin

Oct 17, 2012, 2:58:32 PM10/17/12
to
rickman <gnu...@gmail.com> writes:
> Neither A nor B are "general" registers. The only difference is
> register B can only be written while A can be read or written. Each
> can be the address for memory accesses. There are no other operations
> supported. So they aren't even "relatively" general.

A is general relative to B in the sense that you can store temporary
values in A and read them back. Obviously it's a stack processor and
there's no "add register to register" or anything like that.

> No, register windows are accessed as registers. The Transputer
> on-chip RAM is just that: RAM that is on chip to be faster.

That sounds like a good simple method, though these days larger CPUs
have caches for what I guess is the same purpose.

> If the address is immediate ... conservatively say it is then 7x
> slower than the operation with data on the stack.

Right, this is the case of reading or storing a variable. Of course to
address through A or B, you have to first load the register, which
(depending) might or might not count.

> No, there are no "phases" for the F18A, it is an async processor.
> There is no clock even.

I thought I saw in some other posts that the GA is async only in the
sense that the different cpu nodes are clocked independently of each
other. But each node has its own clock (ring oscillator) that puts out
several pulses (phases) for each instruction executed, and the different
steps happening inside an instruction are synchronized by those pulses.

rickman

Oct 17, 2012, 3:22:08 PM10/17/12
to
On 10/17/2012 2:58 PM, Paul Rubin wrote:
> rickman<gnu...@gmail.com> writes:
>> Neither A nor B are "general" registers. The only difference is
>> register B can only be written while A can be read or written. Each
>> can be the address for memory accesses. There are no other operations
>> supported. So they aren't even "relatively" general.
>
> A is general relative to B in the sense that you can store temporary
> values in A and read them back. Obviously it's a stack processor and
> there's no "add register to register" or anything like that.

Yes, but that is the point. It is a special purpose register for
addressing memory. Having general purpose registers would be well
outside the idea of a stack machine.


>> No, register windows are accessed as registers. The Transputer
>> on-chip RAM is just that: RAM that is on chip to be faster.
>
> That sounds like a good simple method, though these days larger CPUs
> have caches for what I guess is the same purpose.

Some others were talking about a "workspace" which I didn't remember. I
guess that was pointed to in memory and typically was in the on chip
memory.


>> If the address is immediate ... conservatively say it is then 7x
>> slower than the operation with data on the stack.
>
> Right, this is the case of reading or storing a variable. Of course to
> address through A or B, you have to first load the register, which
> (depending) might or might not count.

Exactly. If the address is already in the register (for example, because
it was calculated while stepping through an array), then the access costs
the lower of the two figures.


>> No, there are no "phases" for the F18A, it is an async processor.
>> There is no clock even.
>
> I thought I saw in some other posts that the GA is async only in the
> sense that the different cpu nodes are clocked independently of each
> other. But each node has its own clock (ring oscillator) that puts out
> several pulses (phases) for each instruction executed, and the different
> steps happening inside an instruction are synchronized by those pulses.

That is the nonsense that some people will post, but the instruction
timings vary widely depending on the instruction, prefetch and other
aspects and they are not integer multiples indicating that there is a
processor self clock that runs machine cycles or anything like that.
The F18A nodes are async processors in every sense of the word. Some
people just like to speculate about the internal structure.

Rick

Paul Rubin

Oct 17, 2012, 3:30:59 PM10/17/12
to
Bernd Paysan <bernd....@gmx.de> writes:
> Recumbent bikes have a similar lack of traction in the bicycle world,
> even though they are more secure, more comfortable, and faster than
> normal bikes. Why?

From what I have heard, they are much less maneuverable than normal
bikes. They are also harder to see in traffic, because the rider is
lower.

> and some people even raised concerns ... that I need to turn the
> pedals with my hands or something similarly silly -

They do make some hand-propelled ones and some where both the hands and
feet are used (I think some kind of speed record was set with one of
those). So it seems like a reasonable question to me.

> to my ex-boss that I'm not going to have a stop-over flight to San
> Francisco, but I take the direct flight instead,.. he complained
> bitterly that I made them all look stupid.

Flying from Europe to San Francisco takes something like 10 hours and I
can understand why most people wouldn't want to sit on a nonstop flight
for that long. Since the nonstop flight was a little cheaper, your boss
may have been concerned that upper management would now start pressuring
everyone to take the nonstop flights.

> When I was younger, I often wondered why the most retarded technology
> seems to get the most traction. Ugly processors like PIG^HC or 8051,
> x86, DOS/Windows, nowadays things like PHP.

Obviously first-mover advantage has a huge part. The PIC and 8051
offered high levels of integration at low cost compared to the
alternatives when they were introduced, DOS ran the original IBM PC,
etc. But, I'd say that PHP is really quite nice for beginners and
simple uses even now. It's easy to learn, it comes with a lot of useful
libraries that languages like Python are missing, and it's very well
documented. It's also quite efficient compared to many alternatives in
certain environments (massive shared hosting). Its disadvantages only
start to be felt once you're doing more complicated things.

> Inside the JavaScript compiler. As boot system, inside a controller.
> Or, as Carrier IQ demonstrated: In some pre-installed spyware. Most
> Forth success stories are projects where the Forth system is hidden.
> This is not by accident.

They used to say similar things about Lisp...

Frank Thomason

Oct 17, 2012, 3:31:04 PM10/17/12
to
On 10/17/2012 2:33 PM, Elizabeth D. Rather wrote:
> On 10/16/12 10:25 PM, Ed wrote:
>> Elizabeth D. Rather wrote:
>>> On 10/16/12 1:31 AM, Ed wrote:
>
> That depends on whether the decision to drop the RTX had anything to do
> with Forth either way. Your last sentence is critical: to us, the RTX
> was a "Forth chip". To Harris, it was "that new processor" and at the
> time giving birth to a new processor wasn't something they were prepared
> to support regardless of its features. If Forth had nothing to do with
> the decision, it can hardly be considered a "failure of Forth".
>

I worked at Harris, on the RTX product, and I can verify that what
Elizabeth says is basically correct. Harris' "official" microprocessor
product was the CMOS version of Intel's 286; the RTX effort was
considered a waste of resources. After the successful launch of the
RTX2000, we approached management with a proposal for what would be
needed to develop and support the 32-bit version of the RTX (Phil
Koopman's design). Our code name for it was the BINAR (But It's Not An
RTX), just to hide the project's origins. Management's response was very
negative. That was the beginning of the end of the RTX development.


Bernd Paysan

Oct 17, 2012, 3:53:59 PM10/17/12
to
rickman wrote:
> That is the nonsense that some people will post, but the instruction
> timings vary widely depending on the instruction, prefetch and other
> aspects and they are not integer multiples indicating that there is a
> processor self clock that runs machine cycles or anything like that.
> The F18A nodes are async processors in every sense of the word. Some
> people just like to speculate about the internal structure.

Which is something Chuck did talk about 11 years ago at EuroForth when
he presented the c18, the predecessor of the F18A. And which matches
the fact that all basic ALU and stack operations do take exactly the
same amount of time.

The structure, as far as I understood Chuck, is a ring oscillator, which
is stopped when something is "not ready", and there are a few ready
signals, from the prefetcher, from memory accesses, from the IOs. These
ready signals have their own delay, which is *not* a multiple of some
clock (the clock is *stopped* while waiting). The speculative part is
that Chuck kept the architecture from the c18, which is not that far-
fetched.

Nowadays, they don't talk that openly about internals anymore. Maybe
spoiled by the patent trolls.

rickman

Oct 17, 2012, 4:00:40 PM10/17/12
to
On 10/17/2012 3:53 PM, Bernd Paysan wrote:
> rickman wrote:
>> That is the nonsense that some people will post, but the instruction
>> timings vary widely depending on the instruction, prefetch and other
>> aspects and they are not integer multiples indicating that there is a
>> processor self clock that runs machine cycles or anything like that.
>> The F18A nodes are async processors in every sense of the word. Some
>> people just like to speculate about the internal structure.
>
> Which is something Chuck did talk about 11 years ago at EuroForth when
> he presented the c18, the predecessor of the F18A. And which matches
> the fact that all basic ALU and stack operations do take exactly the
> same amount of time.
>
> The structure, as far as I understood Chuck, is a ring oscillator, which
> is stopped when something is "not ready", and there are a few ready
> signals, from the prefetcher, from memory accesses, from the IOs. These
> ready signals have their own delay, which is *not* a multiple of some
> clock (the clock is *stopped* while waiting). The speculative part is
> that Chuck kept the architecture from the c18, which is not that far-
> fetched.

Yes, you like the ring oscillator name, but in reality this is just a
delay line matched to the delay of the ALU. Likewise the other paths
all have matched delay lines, all of which, if in use, must indicate
ready for the CPU to proceed. You even say this yourself when you say,
"These ready signals have their own delay".

The point is that these delay lines are all different, their timings all
vary with PVT and they are all independent, not a function of any clock.
Unless you have the design in your hands, anything else is
speculation, but of more importance, doesn't add anything to the
understanding of the CPU.

Rick

Bernd Paysan

Oct 17, 2012, 4:35:06 PM10/17/12
to
Paul Rubin wrote:

> Bernd Paysan <bernd....@gmx.de> writes:
>> Recumbent bikes have a similar lack of traction in the bicycle world,
>> even though they are more secure, more comfortable, and faster than
>> normal bikes. Why?
>
> From what I have heard, they are much less maneuverable than normal
> bikes. They are also harder to see in traffic, because the rider is
> lower.

That's what you *heard*. You don't have any experience of your own. A
"short-wheeler" is about as easy to maneuver as a normal bike; however,
if you need to jump in some way, you can't do it (it's not a BMX or similar).
And typical recumbent bikes - with rider - are about as high as a normal
car. In winter, when it's snowing, I use my upright (because I don't
have the right tires for snowy weather on my recumbent). I feel
completely invisible on the upright.

>> and some people even raised concerns ... that I need to turn the
>> pedals with my hands or something similarly silly -
>
> They do make some hand-propelled ones

Those are wheel-chairs (including the ones with hand+feet). Yes, some
people seem to confuse a recumbent bike with a wheel-chair, as well, and
ask me why I use this, as I obviously have healthy legs.

> and some where both the hands
> and feet are used (I think some kind of speed record was set with one
> of those).

Don't know, the speed records are usually achieved by good aerodynamics.
Muscle strength is not an issue with riding a bicycle, it's getting
enough oxygen inside. That's why Tour de France winners take Epo.

> So it seems like a reasonable question to me.

Even when the rider is riding it and you can look at what he does?
Nope. The kids don't ask me those questions. These are questions of
people who have made their mind up and don't need input any longer.

>> to my ex-boss that I'm not going to have a stop-over flight to San
>> Francisco, but I take the direct flight instead,.. he complained
>> bitterly that I made them all look stupid.
>
> Flying from Europe to San Francisco takes something like 10 hours and
> I can understand why most people wouldn't want to sit on a nonstop
> flight for that long.

It takes 11 hours from Munich and it takes 10:45 from London (San
Francisco is mostly north¹ of us, so going westwards doesn't help). So
they chose to fly to London first, which means taking an early morning
flight, waiting 3 or 4 hours in the BA lounge, going through security
again, etc. It all makes so much more sense now that you have explained
to me that they simply can't do the additional 15 minutes nonstop. I
deliberately only travel with hand-luggage for such a trip, because I
don't want to waste another hour for checking in my luggage.

¹) Some flat-earthers sitting next to me complained that taking the
detour over Greenland was unnecessary.

> Since the nonstop flight was a little cheaper, your
> boss may have been concerned that upper management would now start
> pressuring everyone to take the nonstop flights.

No, the price was a non-issue. Upper management deliberately said that
for inter-continental flights, comfort is more important than cost,
unless the cost is excessive (i.e. no first class allowed).

>> When I was younger, I often wondered why the most retarded technology
>> seems to get the most traction. Ugly processors like PIG^HC or 8051,
>> x86, DOS/Windows, nowadays things like PHP.
>
> Obviously first-mover advantage has a huge part. The PIC and 8051
> offered high levels of integration at low cost compared to the
> alternatives when they were introduced, DOS ran the original IBM PC,

Which was somewhat late to that market... but was the first one
"acceptable" to buy for people who are driven by fear.

> etc. But, I'd say that PHP is really quite nice for beginners and
> simple uses even now. It's easy to learn, it comes with a lot of
> useful libraries that languages like Python are missing, and it's very
> well documented. It's also quite efficient compared to many
> alternatives in certain environments (massive shared hosting). Its
> disadvantages only start to be felt once you're doing more
> complicated things.

Or trying to secure your account database ;-).

>> Inside the JavaScript compiler. As boot system, inside a controller.
>> Or, as Carrier IQ demonstrated: In some pre-installed spyware. Most
>> Forth success stories are projects where the Forth system is hidden.
>> This is not by accident.
>
> They used to say similar things about Lisp...

Indeed. For a long time, Emacs was about the most visible success story
of Lisp. And all the Lisp-haters dismissed Emacs ;-).

Bernd Paysan

Oct 17, 2012, 5:10:00 PM10/17/12
to
rickman wrote:
> Yes, you like the ring oscillator name, but in reality this is just a
> delay line matched to the delay of the ALU. Likewise the other paths
> all have matched delay lines, all of which, if in use, must indicate
> ready for the CPU to proceed. You even say this yourself when you
> say, "These ready signals have their own delay".

Yes, but the essential design is clocked flip-flops or latches, with a
ring oscillator as primary driving source of the clock. That there is a
"stop clock" signal to the ring oscillator is nothing special; the more
special thing is that this signal is used to asynchronously delay those
instructions that need it. It's not multiple delay paths without an
oscillator, as you say; it's a ring oscillator plus delay paths.

> The point is that these delay lines are all different, their timings
> all vary with PVT and they are all independent, not a function of any
> clock.

But the only delay path that is circular and feeds back to itself,
resulting in the thing actually going forward is the thing I and Chuck
call "ring oscillator". It doesn't matter if you don't understand a
particular term.

> Unless you have the design in your hands, anything else is
> speculation, but of more importance, doesn't add anything to the
> understanding of the CPU.

That's your opinion, that's black-box magic crap. In general, I'd like
to know what the inner workings are, and in general, this *does* add
considerably to the understanding.

rickman

Oct 17, 2012, 5:28:38 PM10/17/12
to
On 10/17/2012 5:10 PM, Bernd Paysan wrote:
> rickman wrote:
>> Yes, you like the ring oscillator name, but in reality this is just a
>> delay line matched to the delay of the ALU. Likewise the other paths
>> all have matched delay lines, all of which, if in use, must indicate
>> ready for the CPU to proceed. You even say this yourself when you
>> say, "These ready signals have their own delay".
>
> Yes, but the essential design is clocked flip-flops or latches, with a
> ring oscillator as primary driving source of the clock. That there is a
> "stop clock" signal to the ring oscillator is nothing special; the more
> special thing is that this signal is used to asynchronously delay those
> instructions that need it. It's not multiple delay paths without an
> oscillator, as you say; it's a ring oscillator plus delay paths.

This is just po.ta.toes and po.tah.toes...


>> The point is that these delay lines are all different, their timings
>> all vary with PVT and they are all independent, not a function of any
>> clock.
>
> But the only delay path that is circular and feeds back to itself,
> resulting in the thing actually going forward is the thing I and Chuck
> call "ring oscillator". It doesn't matter if you don't understand a
> particular term.

I'm not much interested in opening this whole debate again.

All of the delay paths close a loop, otherwise the processor would stop.
The loop starts with whatever kicks off the instruction and when that
instruction ends, it kicks off the timing for the next instruction. The
instruction determines which delay path to utilize. For you to say the
other delays cause the primary delay to "wait" is just terminology.

If you have 3 instructions in a row that all use the same delay path,
won't that delay path be oscillating in a ring while the ALU delay path
is locked?

The point is the control circuit for the CPU has multiple delays and
whichever one controls the timing for the current instruction triggers
the FFs and kicks off the next instruction. Naming one path the
"primary" path is just terminology.


>> Unless you have the design in your hands, anything else is
>> speculation, but of more importance, doesn't add anything to the
>> understanding of the CPU.
>
> That's your opinion, that's black-box magic crap. In general, I'd like
> to know what the inner workings are, and in general, this *does* add
> considerably to the understanding.

Really, what does it shed light on exactly? I want to know exactly what
the instructions do at a register level and I want to know max and min
for all the timings over PVT. Power levels are also useful. There may
be a few other assorted details, but why should anyone care whether they
use an AND gate or an OR gate and which delay path gets called the "ring
oscillator"?

Why don't we just agree to disagree on this?

Rick

Bernd Paysan

Oct 17, 2012, 6:11:35 PM10/17/12
to
rickman wrote:
>> That's your opinion, that's black-box magic crap. In general, I'd
>> like to know what the inner workings are, and in general, this *does*
>> add considerably to the understanding.
>
> Really, what does it shed light on exactly?

Well, if it is a ring oscillator with additional stop/restart, there is
a maximum frequency you can achieve. It can't possibly go faster than
the primary cycle time, and there's also a minimum adder for start/stop
delay (though that can be quite small), so everything else has to be
slower. If it is just a selection of different delay chains (all
inverted, so that all can loop on their own), there is no such minimum
delay, nor is there a delay adder for the slower delays.

The minimum delay could be important for the communications with other
cores. If there wasn't such a minimum delay, and the port is already
written by the next core, you could end up with a very short delay (it
*is* ready), and therefore a setup violation. The ring oscillator as
core and as minimum delay ensures a minimum low and a minimum high time
of the clock, and therefore is a quite trivial means to sanitize the
design, with the impact that all other paths (that are not minimal)
can't be equal in speed.

The difference is between "all equal" (your mental picture) and "one is
always active and therefore guarantees a minimum low/high time".
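
The two mental pictures can be put side by side as a toy timing model. This is a sketch of the argument only, not a description of the actual F18A circuit; the numbers are arbitrary time units and every name (`primary`, `restart-cost`, `ring-time`, `chain-time`) is invented for the illustration.

```forth
\ Hypothetical numbers in arbitrary time units, for illustration only.
10 constant primary        \ period of the always-active loop
 1 constant restart-cost   \ assumed extra delay for stopping and restarting it

: ring-time ( ready -- t )
  \ Loop with stop/restart: never faster than PRIMARY;
  \ a late ready signal costs its own delay plus RESTART-COST.
  dup primary > if restart-cost + else drop primary then ;

: chain-time ( ready -- t )
  \ Independent delay chains: the selected chain alone sets the time.
  ;

 4 ring-time .  4 chain-time . cr   \ early ready signal
25 ring-time . 25 chain-time . cr   \ late ready signal
```

With an early ready signal the first model still waits out the full primary period while the second finishes in 4 units; with a late one the first model pays the extra restart cost. That is the observable difference the two descriptions would predict.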

> I want to know exactly
> what the instructions do at a register level and I want to know max
> and min
> for all the timings over PVT. Power levels are also useful. There
> may be a few other assorted details, but why should anyone care
> whether they use an AND gate or an OR gate and which delay path gets
> called the "ring oscillator"?

Well, it doesn't seem to matter for you, but it does matter for me.

> Why don't we just agree to disagree on this?

Yes, let's agree on that. So you let me call this thing a ring oscillator
with extra asynchronous events to stop/restart it, without telling me
that I'm wrong or speculating without any substance.

There is some substantial research into asynchronous CPUs, and - apart
from Chuck - little progress, because these designs are hard to analyze
and to get right - and there are no automated tools to do this. Using
one loop as dominant loop IMHO simplifies this task considerably,
allowing a more conventional approach for most of the design; probably
automated tools could get that right, too. You can analyze the part
that's clocked by the dominating loop as a normal synchronous design,
leaving only a few paths through special function units (like SRAM or
communication interface) to be checked vs. the other delay chains.

Rod Pemberton

Oct 17, 2012, 9:10:45 PM10/17/12
to
"rickman" <gnu...@gmail.com> wrote in message
news:k5mq4c$336$1...@dont-email.me...
...

> There many ways to make a point and I think
> this one has been rooted in your mind pretty well, no?
>
> "mission accomplished"
>
> Rick

You're Rick as in Rick Hohensee, yes?

Silly me, I didn't catch the "rickman" monicker.

Shouldn't you be troll..., er, I mean touting your software, such as the
osimplay assembler, H3sm 3-stack Forth, H3rL 3-ring Linux, Ha3sm
3-stack Forth based OS, or Forreal 32-bit RM x86 code?


RP



Paul Rubin

Oct 17, 2012, 9:15:28 PM10/17/12
to
Bernd Paysan <bernd....@gmx.de> writes:
> That's what you *heard*. You don't have any experience of your own.
> A "short-wheeler" is about as easy to maneuver as a normal bike;

Including at very low (walking) speeds, which are sometimes necessary on
city streets due to having to keep moving and wait for traffic at the
same time?

> And typical recumbent bikes - with rider - are about as high as a
> normal car.

Some people do use recumbents around here and I can tell you, they are
harder to see from a car. Maybe not all models are like that.

> Don't know, the speed records are usually achieved by good aerodynamics.
> Muscle strength is not an issue with riding a bicycle, it's getting
> enough oxygen inside. That's why Tour de France winners take Epo.

Speed records aren't endurance contests like the Tour de France.
They're about getting the maximum possible instantaneous power. So it
probably helps to use more muscles at the same time.

>> [Munich to San Francisco]
> they chose to fly to London first

That does sound a little odd, unless they wanted to shop at Heathrow
Airport (lots of overpriced stores there) or something like that. More
sensible would be to stop in (e.g.) New York which is about 6 hours from
Munich, I think.

Andrew Haley

Oct 17, 2012, 10:11:31 PM10/17/12
to
Albert van der Horst <alb...@spenarnc.xs4all.nl> wrote:
> In article <l-idnbOtB7U8quPN...@supernews.com>,
> Andrew Haley <andr...@littlepinkcloud.invalid> wrote:
>>
>>No, the workspace pointer just pointed to the local frame. Local
>>variables were at small offsets from the workspace pointer. This made
>>for very fast context switches: it just had to save the program
>>counter in your workspace, swap the workspace pointer, and restore the
>>program counter. This was done with a simple round robin; there was
>>no other context to save and restore.
>
> And may I add to this, the lowest 16 in the workspace were really registers.

Gosh, are you sure? This would make for horrendous context switch
times. I don't remember this.

Andrew.

Mark Wills

Oct 18, 2012, 4:08:34 AM10/18/12
to
On Oct 18, 3:11 am, Andrew Haley <andre...@littlepinkcloud.invalid>
wrote:
> Albert van der Horst <alb...@spenarnc.xs4all.nl> wrote:
>
> > In article <l-idnbOtB7U8quPNnZ2dnUVZ8t-dn...@supernews.com>,
> > Andrew Haley  <andre...@littlepinkcloud.invalid> wrote:
>
> >>No, the workspace pointer just pointed to the local frame.  Local
> >>variables were at small offsets from the workspace pointer.  This made
> >>for very fast context switches: it just had to save the program
> >>counter in your workspace, swap the workspace pointer, and restore the
> >>program counter.  This was done with a simple round robin; there was
> >>no other context to save and restore.
>
> > And may I add to this, the lowest 16 in the workspace were really registers.
>
> Gosh, are you sure?  This would make for horrendous context switch
> times.  I don't remember this.
>
> Andrew.

TMS9900 and derivatives have worked like this since 1976. It's a
brilliant system, though it didn't catch on. It means you can have as
many registers as you want, effectively. Subroutines can have their
own set of registers, with no need to push/pop data to/from a stack
before/after subroutine calls.

Love it.
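
The idea can be modelled in a few lines of Forth. This is a simplified sketch, not TMS9900 code: it ignores the program counter and status register, and `wp`, `reg`, `task-a` and `task-b` are hypothetical names chosen for the illustration.

```forth
\ Sketch only: WP, REG and the two workspaces are hypothetical names.
variable wp                          \ workspace pointer
create task-a 16 cells allot         \ one 16-register workspace per context
create task-b 16 cells allot

: reg ( n -- addr )  cells wp @ + ;  \ register n is just memory at WP + n cells

task-a wp !   111 0 reg !            \ "R0" of context A
task-b wp !   222 0 reg !            \ "R0" of context B, nothing saved or restored
task-a wp !   0 reg @ .              \ back in context A: prints 111
```

Switching context is a single store to `wp`; each context's registers stay where they are in memory.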

humptydumpty

Oct 18, 2012, 4:31:48 AM10/18/12
to
Hi!
Factoring that more I've got:
: REEXEC ( ..x xt -- ..y xt ) dup >R execute R> ;
Then I thought that definition deserved to be a primitive, so I posted "Should `execute' keep xt on stack?" in this thread.

In this new version, `map-array' is trivial:

: map-array ( .. xt a u -- .. )
  cells bounds ?DO
    I swap REEXEC  \ or the new `EXECUTE' that keeps xt on the stack
  1 cells +LOOP drop ;

I think that would be an improvement.
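
A possible usage sketch, repeating the two definitions from this post so that it stands alone. It assumes a system that provides BOUNDS (Gforth does; it is not a standard word), and `squares`, `.cell` and `sum-cell` are hypothetical names made up for the example.

```forth
: REEXEC ( ..x xt -- ..y xt ) dup >R execute R> ;
: map-array ( .. xt a u -- .. )
  cells bounds ?DO I swap REEXEC 1 cells +LOOP drop ;

create squares 1 , 4 , 9 , 16 ,          \ hypothetical test data

: .cell    ( addr -- )       @ . ;       \ print one element
: sum-cell ( n addr -- n' )  @ + ;       \ add one element to a running total

' .cell squares 4 map-array cr           \ prints 1 4 9 16
0 ' sum-cell squares 4 map-array . cr    \ prints 30
```

The second call shows what the `..` in the stack comment allows: the running total sits below the xt and is passed through each call untouched by `map-array' itself.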

Albert van der Horst

Oct 18, 2012, 5:25:12 AM10/18/12
to
In article <AJydnU63XsXO-OLN...@supernews.com>,
Andrew Haley <andr...@littlepinkcloud.invalid> wrote:
>Albert van der Horst <alb...@spenarnc.xs4all.nl> wrote:
>> In article <l-idnbOtB7U8quPN...@supernews.com>,
>> Andrew Haley <andr...@littlepinkcloud.invalid> wrote:
>>>
>>>No, the workspace pointer just pointed to the local frame. Local
>>>variables were at small offsets from the workspace pointer. This made
>>>for very fast context switches: it just had to save the program
>>>counter in your workspace, swap the workspace pointer, and restore the
>>>program counter. This was done with a simple round robin; there was
>>>no other context to save and restore.
>>
>> And may I add to this, the lowest 16 in the workspace were really registers.
>
>Gosh, are you sure? This would make for horrendous context switch
>times. I don't remember this.

The context switch time is very short. The workspace pointer points
to a register set in on-chip RAM. That is what I am trying to explain.
Although they are mapped regularly in the memory space, they are for
all intents and purposes registers.

>
>Andrew.

Bernd Paysan

Oct 18, 2012, 12:35:24 PM
Andrew Haley wrote:
>> And may I add to this, the lowest 16 in the workspace were really
>> registers.
>
> Gosh, are you sure? This would make for horrendous context switch
> times. I don't remember this.

Later transputers like the T9000 (the never-ending prototype, which
killed Inmos) and the ST20 used a "workspace cache", which was mostly a
register file. So when you switched context, there was a bit of a
slowdown to warm up that cache.

Andrew Haley

Oct 18, 2012, 12:42:42 PM
Mark Wills <forth...@gmail.com> wrote:
> On Oct 18, 3:11 am, Andrew Haley <andre...@littlepinkcloud.invalid>
> wrote:
>> Albert van der Horst <alb...@spenarnc.xs4all.nl> wrote:
>>
>> > In article <l-idnbOtB7U8quPNnZ2dnUVZ8t-dn...@supernews.com>,
>> > Andrew Haley <andre...@littlepinkcloud.invalid> wrote:
>>
>> >>No, the workspace pointer just pointed to the local frame. Local
>> >>variables were at small offsets from the workspace pointer. This made
>> >>for very fast context switches: it just had to save the program
>> >>counter in your workspace, swap the workspace pointer, and restore the
>> >>program counter. This was done with a simple round robin; there was
>> >>no other context to save and restore.
>>
>> > And may I add to this, the lowest 16 in the workspace were really registers.
>> Gosh, are you sure? This would make for horrendous context switch
>> times. I don't remember this.
>
> TMS9900 and derivatives have worked like this since 1976. It's a
> brilliant system, though it didn't catch on. It means you can have as
> many registers as you want, effectively.

But they're not registers, they're RAM, aren't they?

Andrew.

Anton Ertl

Oct 18, 2012, 12:42:03 PM
humptydumpty <oua...@gmail.com> writes:
>Factoring that more I've got:
>: REEXEC ( ..x xt -- ..y xt ) dup >R execute R> ;
>Then I thought that definition would deserve to be a primitive, so I posted in this thread "Should `execute' keep xt on stack?".
>
>In this new version `map-array' is trivial:
>
>: map-array ( .. xt a u -- .. ) cells bounds ?DO I swap REEXEC ( or the new `EXECUTE' that keeps xt on stack ) 1 cells +LOOP drop ;
>
>I think that would be an improvement.

Yes, this REEXEC looks promising. The question is: Is it useful in
other contexts?

Andrew Haley

Oct 18, 2012, 12:46:29 PM
Of course, you can point the workspace at the Transputer's internal
memory, but that's not registers, it's still RAM. It's wrong to say
they were "really registers": they were just offsets from a pointer,
just like most processors can use indexed addressing to access locals.

Andrew

Andrew Haley

Oct 18, 2012, 1:06:14 PM
Bernd Paysan <bernd....@gmx.de> wrote:
> Andrew Haley wrote:
>>> And may I add to this, the lowest 16 in the workspace were really
>>> registers.
>>
>> Gosh, are you sure? This would make for horrendous context switch
>> times. I don't remember this.
>
> Later transputers like the T9000 (the never-ending prototype, which
> killed Inmos) and the ST20 used a "workspace cache", which was mostly a
> register file. So when you switched context, there was a bit of a
> slowdown to warm up that cache.

Thanks, that's what I was thinking of. So, the T8 series (which I
used) didn't have workspace in registers, but T9 did.

Andrew.

Anton Ertl

Oct 18, 2012, 12:45:48 PM
Paul Rubin <no.e...@nospam.invalid> writes:
>Bernd Paysan <bernd....@gmx.de> writes:
>> That's what you *heard*. You don't have any own experience. A "short-
>> wheeler" is about as easy to maneuver as a normal bike;
>
>Including at very low (walking) speeds, which are sometimes necessary on
>city streets due to having to keep moving and wait for traffic at the
>same time?

Sure, that's no problem. It seems to be a problem for car drivers,
though (at least here where hydraulic automatic transmissions are
rare), and they still drive on city streets.

>> Don't know, the speed records are usually achieved by good aerodynamics.
>> Muscle strength is not an issue with riding a bicycle, it's getting
>> enough oxygen inside. That's why Tour de France winners take Epo.
>
>Speed records aren't endurance contests like the Tour de France.
>They're about getting the maximum possible instantaneous power. So it
>probably helps to use more muscles at the same time.

The same bikes (e.g. Varna Tempest) are used for the "Fastest Human
list" and the "Fastest Hour list"
<http://www.wisil.recumbents.com/wisil/fastest_list.asp>, and AFAIK
they are all just pedal driven. Even for the Fastest Human, where the
measured distance (200m) takes only about 5s, one has to accelerate to
that speed first, and maximum power has to be used for much more than
5s.

>>> [Munich to San Francisco]
>> they chose to fly to London first
>
>That does sound a little odd, unless they wanted to shop at Heathrow
>Airport (lots of overpriced stores there) or something like that. More
>sensible would be to stop in (e.g.) New York which is about 6 hours from
>Munich, I think.

Pick up the baggage there, go through immigration, check in the
baggage, miss the connecting flight because immigration took too long,
yes, what wouldn't we do for a stopover.