
More Thoughts on Green Arrays


rickman

Apr 9, 2011, 1:44:32 PM
I had a conversation about the Green Arrays chips in an FPGA
discussion group where I said the devices should be thought of as
processor arrays with the emphasis on array rather than processor.
The point is that no one thinks about a LUT in an FPGA in terms of
keeping it "busy" or utilizing every gate, etc. They know that there
are lots of LUTs and FFs and they are cheap, so there is no need to
optimize every gate.

Another poster was very concerned about the sorts of issues you have
when you are multitasking on typical processors, like preventing
deadlocks. I can't remember ever worrying about deadlocks when
designing logic although I'm sure it is possible. But I couldn't
explain how they are avoided when designing hardware.

In fact, that is how I see the Green Array devices: as programmable
hardware, not software engines. But I'm not sure I fully understand
how to use all the features in this way. Consider the memory
interface. It uses three processors to control memory and has to be
accessible from many nodes in the chip. I suppose one will be the
primary node to handle access requests. I suppose some of the other
nodes would need to pass messages around and act as multiplexors.

The one thing I am having trouble getting around is the tiny 64 word
memories. I can accept a small stack and I can see where minimal data
memory is needed, but I am stuck thinking about how to use a node for
processing and also use it for passing data between the other nodes.

Do the tools have a simulator so that code can be tested before the
hardware is available? That is another way I see these as being
similar to hardware: I will feel a lot more comfortable using a
simulator that lets me "see" what is going on in the chip. How else
will you be able to debug 144 processors?

Rick

Greg Bailey

Apr 9, 2011, 8:53:27 PM
Darn right they include a simulator, Rick! One can find that out by
RTFM, but on the assumption you aren't the only fellow who has not
done that, we just updated the website so that it is a little bit
harder to miss now.

In addition, we just today released a new version of arrayForth that
incorporates a lot of improvements John, Jeff, Charley and others have
made over the past months in softsim as a result of having used it
extensively themselves. So the timing is good for you to get
acquainted with it. hot...@greenarraychips.com is the right email
address to talk with about usage issues, if any. Downloadable right
now, free of charge, from http://www.greenarraychips.com where there
is a bunch of other news being announced.

Cheers - Greg

"rickman" wrote in message
news:853c5ae6-879d-4e83...@a12g2000yqk.googlegroups.com...

<snip>

Paul Rubin

Apr 10, 2011, 5:53:31 PM
"Greg Bailey" <gr...@GreenArrayChips.com> writes:
> Darn right they include a simulator, Rick! One can find that out by
> RTFM but on the assumption you aren't the only fellow who has not
> done that we just updated the website so that it is a little bit harder
> to miss now.

Hey Greg, good that you're here, and I hope things are going well at GA.

Have you released the source code for your Eforth port, including the
virtual machine that runs on the GA nodes? That might be re-usable for
purposes of targeting a C compiler to the GA, like you suggested in your
Forth Day video. I don't feel likely to be the guy to do such a port,
but it would be interesting to look at, and probably helpful to someone
wanting to pursue running C on the chip.

Greg Bailey

Apr 11, 2011, 1:18:23 PM
"Paul Rubin" wrote in message news:7xipule...@ruckus.brouhaha.com...


Howdy!

Source for the current VM is included in the arrayForth release and
documentation is underway; look for a preliminary release soon.

Brad

Apr 11, 2011, 5:44:52 PM
On Apr 9, 10:44 am, rickman <gnu...@gmail.com> wrote:
> I had a conversation about the Green Arrays chips in an FPGA
> discussion group where I said the devices should be thought of as
> processor arrays with the emphasis on array rather than processor.

The FPGA analogy is pretty good IMHO, though GreenArrays uses a 180nm
process, so you would have to compare them to 180nm FPGAs. FPGAs have
embedded RAM blocks and hard multipliers these days, and those will
hopefully come to GreenArrays.

> ... I suppose some of the other
> nodes would need to pass messages around and act as multiplexors.

Yep, the nodes are often used for routing. They can do things to the
signal as they pass it along. It takes very little power to act as a
smart wire.

> The one thing I am having trouble getting around is the tiny 64 word
> memories.  I can accept a small stack and I can see where minimal data
> memory is needed, but I am stuck thinking about how to use a node for
> processing and also use it for passing data between the other nodes.

I don't agree with the one-size-fits-all philosophy. There are sure
to be tasks that would like double or quadruple the RAM, so there
should be some cores in the mix (taking up double the footprint of a
normal C18) that are RAM-intensive. It's okay if they run a little
slower, since they're async cores. IMHO, with factoring, the power of
code grows exponentially up to a point of diminishing returns. 64
words is probably well before that point.

-Brad

Brad

Apr 11, 2011, 8:11:08 PM
On Apr 11, 2:44 pm, Brad <hwfw...@gmail.com> wrote:
> I don't agree with the one-size-fits-all philosophy.

Here's a picture of what I'd like to see in an array:
https://sites.google.com/site/forthtoychest/sea.png

-Brad

rickman

Apr 12, 2011, 12:41:52 AM

They may not have provided you with CPUs with 8x the memory, but they
did provide the option of shutting off one node to allow an adjacent
node to use the RAM, giving that node double the amount. I am pretty
sure I read that somewhere.

There are a number of features of the GA devices that make me a bit
uncomfortable, but I'm not ready to start recommending changes until I
have tried using the parts.

My biggest concern is that there won't be enough app notes to provide
the techniques to solve practical problems. For example, the
oscillator app note does not give enough info to add a crystal to the
chip for an oscillator. Then of course, the big difference between
these chips and others is the software, and I have no idea how much
documentation will be required to explain how to properly work with
that.

I'm pretty convinced these parts have a lot of applications if I can
understand how to properly use them. So I'll be interested in
reviewing the docs they come out with.

Rick

Paul Rubin

Apr 12, 2011, 3:06:22 AM
rickman <gnu...@gmail.com> writes:
> They may not have provided you with CPUs with 8x the memory, but they
> did provide the option of shutting off one node to allow an adjacent
> node to use the RAM giving that node double the amount. I am pretty
> sure I read that somewhere.

As I understand, you can use the adjacent nodes' ram as (essentially)
I/O devices, so it's slower than using local ram, and uses some of your
code space.

> I'm pretty convinced these parts have a lot of applications if I can
> understand how to properly use them. So I'll be interested in
> reviewing the docs they come out with.

It does seem cool to get so much computing speed, at so small power
consumption and low cost. But I spent a while trying to think of
applications, and almost everything I could want to do turns out to not
really be workable. Most everything I do wants 8-bit bytes and 32-bit
arithmetic, or else wants gobs of ram, or else wants fast DSP blocks
or floating point arithmetic, etc.

rickman

Apr 12, 2011, 9:22:57 AM
On Apr 12, 3:06 am, Paul Rubin <no.em...@nospam.invalid> wrote:
> rickman <gnu...@gmail.com> writes:
> > They may not have provided you with CPUs with 8x the memory, but they
> > did provide the option of shutting off one node to allow an adjacent
> > node to use the RAM giving that node double the amount. I am pretty
> > sure I read that somewhere.
>
> As I understand, you can use the adjacent nodes' ram as (essentially)
> I/O devices, so it's slower than using local ram, and uses some of your
> code space.

I had the impression it was a direct connection, but perhaps it is
just a message protocol that requires software on both nodes. That
would not be as fast, but it would be completely extensible to all
nodes you wish to make part of the "RAM pool".

Rather than speculate, I guess we should wait for more details. This
is the sort of thing that I am waiting for. We won't have an idea of
what problems these parts may be applied to until we have more details
on how to make the part do useful things.


> > I'm pretty convinced these parts have a lot of applications if I can
> > understand how to properly use them. So I'll be interested in
> > reviewing the docs they come out with.
>
> It does seem cool to get so much computing speed, at so small power
> consumption and low cost. But I spent a while trying to think of
> applications, and almost everything I could want to do turns out to not
> really be workable. Most everything I do wants 8-bit bytes and 32-bit
> arithmetic, or else wants gobs of ram, or else wants fast DSP blocks
> or floating point arithmetic, etc.

I don't understand how needing 8 bit data can be a problem. Are you
saying that you want the device to automatically limit the data range
somehow? I don't see how wanting 32 bit arithmetic can be a problem
either. Can't multiple precision arithmetic do the job? The RAM
issue can be dealt with by adding external memory, either static or
dynamic RAM. Is that not fast enough? I don't follow the fast DSP
concern at all. 666 MIPS is not fast enough? Ok, so the serial
multiply slows it to say 30 MMACS... times 144 is... well you do the
math.

For signal processing these devices don't seem suited to the high end,
RADAR sample rates that FPGAs can handle. But they seem very capable
of handling audio processing.

Could it be that you are used to thinking in terms of the available
solutions which all fit a "standard" mold? I find the unique features
of this part to be applicable to many applications. I just don't know
enough to understand how to properly apply them to an application. I
do think that the low power of this device will be what will put this
device into new applications that other parts can't do.

Rick

Charley Shattuck

Apr 17, 2011, 5:11:05 PM
Take a look at the appnote on MD5,
http://www.greenarraychips.com/home/documents/pub/AP001-MD5.html on the
Green Arrays website. Skip down to the source blocks 842, 844, and 850.
Block 842 puts a 64 word table of numbers into one node. Block 844 does
the same for another node. In this case we're doing 32 bit arithmetic
by letting one row of nodes have the high 16 bits and an adjacent row
have the low 16 bits. Once in a while two nodes will share carry
information.

But for now just look at one 64 word table. That node jumps to the port
shared with a neighbor and waits for instructions to be provided by that
neighbor. Once the data node has jumped to or called its code provider,
the code in block 850's GET will read consecutive numbers from the data
table as needed. It looks like:

: get right b! @p .. @+ !p .. !b @b ;

So, "right b!" causes the B register to point to the data node neighbor.
"@p .." fetches the next word in memory to the data stack and pads the
rest of this word with nops. The word fetched is the instruction word
compiled by "@+ !p .." "@+" makes the data node fetch a word of data from
where its A register points. (The A register was initialized to 0 in the
word GO). "!p" writes the value back into the port. Meanwhile the code
node has executed "!b" to send the instruction word to the data node and
is waiting for that data with a "@b".

Note that the data node has absolutely no code in its RAM, just data. The
neighbor with the code simply writes instruction words into the shared
comm port to tell the data node what to do.
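
To restate the prose above as an annotated listing (the comments are
mine; the colorForth block itself carries no such annotation):

: get
  right b!   \ point B at the comm port shared with the data node
  @p ..      \ fetch the next word here as a literal: the instruction word
  @+ !p ..   \ that literal, executed by the data node: fetch via A,
             \ autoincrement, and write the fetched value to the port
  !b         \ send the instruction word to the data node through B
  @b ;       \ read back the data word the data node wrote to the port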

I think that if you read enough of the website you'll see that there *is*
some info about how to do things. It will only get better as time goes on.

And, as Greg said, the appropriate place to ask such questions is
hot...@GreenArrayChips.com if you'd like to get an answer from someone
who knows it.

Charley.

rickman

Apr 17, 2011, 5:53:46 PM
> hotl...@GreenArrayChips.com if you'd like to get an answer from someone
> who knows it.

Charley,

The last thing I want to do is to knock Green Arrays. I like your
ideas and I like the possibilities of the products. But I think your
main difficulty is going to be getting past the bias many engineers
will have against a part like this and educating them on how to
properly use the chip in fruitful ways. I get that you may have tons
of stuff on your web site but expecting a potential user to reverse
engineer code to learn how to use these parts is a bit of a reach.

Again, I'm not trying to knock your company's efforts, but my one
attempt to evaluate the parts was to look at an app which would be
doing ADC and DAC conversions and so would need a very stable clock.
I tried to read the app note on a 32 kHz crystal oscillator and was
not able to verify in a simulation that I could make it work. I heard
from someone from your company and was told this would bre fleshed out
in detail later and indicated that a 10 MHz oscillator was working in
the lab as proof that it was practical. Working "in the lab" is not
at all the same thing as being ready for prime time. Maybe it was
more comment of others in this group insisting that this constituted
"proof" that it was good to go. But to be used in a design an
oscillator has to have a lot of properties that can't be verified by
"it works in the lab".

I'm still positive on the GA devices. But I will need much better
docs and app notes before I will be able to create useful designs with
them. I'm looking forward to that time.

Rick

Paul Rubin

Apr 17, 2011, 11:41:29 PM
rickman <gnu...@gmail.com> writes:
>> As I understand, you can use the adjacent nodes' ram as (essentially)
>> I/O devices, so it's slower than using local ram, and uses some of your
>> code space....

> Rather than speculate, I guess we should wait for more details. This
> is the sort of thing that I am waiting for.

I thought it was already pretty well documented, in the md5 notes for
example. I notice there is also some new (1 week old) data on the GA
web site: http://greenarrays.com/home/news/index.html

>> really be workable. Most everything I do wants 8-bit bytes and 32-bit

> I don't understand how needing 8 bit data can be a problem. Are you
> saying that you want the device to automatically limit the data range
> somehow?

Doing 8 bit operations seems to cost a big slowdown for shifting and
masking, if you want to use the on-cpu ram to hold 2 bytes per word.

> I don't see how wanting 32 bit arithmetic can be a problem either.
> Can't multiple precision arithmetic do the job?

Again, only at the expense of a big slowdown for all that added
juggling. Look at the md5 example for the amount of pain it takes.
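
For a flavor of that juggling, here is roughly what one step looks like
in plain ANS Forth, off-chip: a 32-bit add on values kept as 16-bit
halves in separate cells (the word d16+ and the single-stack framing
are my sketch, not code from the app note):

: d16+  ( al ah bl bh -- sl sh )
  rot +             \ al bl hsum   add the high halves
  >r + dup          \ lsum lsum    add the low halves
  65535 and         \ lsum sl      keep the low 16 bits
  swap 16 rshift    \ sl carry     carry out of the low half
  r> + 65535 and ;  \ sl sh        fold the carry into the high half

On the GA144 the halves live on different nodes, so that carry has to
cross a comm port instead of staying on one stack.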

> The RAM issue can be dealt with by adding external memory, either
> static or dynamic RAM. Is that not fast enough?

Certainly way too slow, like 100+ nsec for off chip access, vs 2 ns
for on-chip. Also bandwidth limited by the number of pins. Only
a few of the processor nodes can reach external ram. Those 2ns
internal accesses can happen on all 144 nodes at the same time.
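
Back of the envelope, using those numbers: 144 nodes each fetching a
word every 2 ns is on the order of 70 words per nanosecond in
aggregate, while one shared off-chip bus at 100+ ns per access is
about 0.01 words per nanosecond -- a factor of several thousand.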



> For signal processing these devices don't seem suited to the high end,
> RADAR sample rates that FPGAs can handle. But they seem very capable
> of handling audio processing.

Think of video codecs, software defined radio, etc.-- the stuff you find
in a fancy cell phone.

> Could it be that you are used to thinking in terms of the available
> solutions which all fit a "standard" mold?

Yes, possibly so. I'm used to processing big piles of character data,
so for me it keeps coming back to byte operations and 32-bit math, but
I'm sure there's lots that I'm not thinking of.

> I find the unique features of this part to be applicable to many
> applications. ... I do think that the low power of this device will
> be what will put this device into new applications that other parts
> can't do.

If you have any example application ideas you can reveal, I'd be
interested in seeing them. It's an amazing chip, I just haven't yet
personally figured out what to do with it, despite a fair amount of head
scratching.

rickman

Apr 19, 2011, 12:38:27 AM
On Apr 17, 11:41 pm, Paul Rubin <no.em...@nospam.invalid> wrote:
> rickman <gnu...@gmail.com> writes:
> >> As I understand, you can use the adjacent nodes' ram as (essentially)
> >> I/O devices, so it's slower than using local ram, and uses some of your
> >> code space....
> > Rather than speculate, I guess we should wait for more details.  This
> > is the sort of thing that I am waiting for.
>
> I thought it was already pretty well documented, in the md5 notes for
> example.  I notice there is also some new (1 week old) data on the GA
> web site:http://greenarrays.com/home/news/index.html

I haven't read the MD5 app note.

> >> really be workable.  Most everything I do wants 8-bit bytes and 32-bit
> > I don't understand how needing 8 bit data can be a problem.  Are you
> > saying that you want the device to automatically limit the data range
> > somehow?
>
> Doing 8 bit operations seems to cost a big slowdown for shifting and
> masking, if you want to use the on-cpu ram to hold 2 bytes per word.

"Seems"? I don't see how it is that much of a hindrance to work with
bytes, but you know your apps.


> > I don't see how wanting 32 bit arithmetic can be a problem either.
> > Can't multiple precision arithmetic do the job?
>
> Again, only at the expense of a big slowdown for all that added
> juggling.  Look at the md5 example for the amount of pain it takes.

Again, I'll take your word for it. Most DSP devices are 16 bits and
the top end 32 bit devices use Watts rather than milliWatts. I just
don't see how 144 processors would be slow doing 32 bit arithmetic.


> > The RAM issue can be dealt with by adding external memory, either
> > static or dynamic RAM.  Is that not fast enough?
>
> Certainly way too slow, like 100+ nsec for off chip access, vs 2 ns
> for on-chip.  Also bandwidth limited by the number of pins.  Only
> a few of the processor nodes can reach external ram.  Those 2ns
> internal accesses can happen on all 144 node at the same time.

Yes, external memory doesn't run at 2 ns. But does that mean it is
too slow for a given app? If your apps only run on processors which
have 2 ns access to tons of memory, then yes, these chips won't do the
job.


> > For signal processing these devices don't seem suited to the high end,
> > RADAR sample rates that FPGAs can handle.  But they seem very capable
> > of handling audio processing.
>
> Think of video codecs, software defined radio, etc.-- the stuff you find
> in a fancy cell phone.

I think software defined radio is one area where this chip will
shine! You don't need tons of memory for that, you need fast
processing. Better than cell phones, which use lots of tailored
chips, are other products which use the same technology but aren't
made in the millions. I think these parts have a lot of potential
here.


> > Could it be that you are used to thinking in terms of the available
> > solutions which all fit a "standard" mold?  
>
> Yes, possibly so, I'm used to processing big piles of character data
> so for me it keeps coming back to byte operations 32-bit math, but I'm
> sure there's lots that I'm not thinking of.

What types of apps are these?


> > I find the unique features of this part to be applicable to many
> > applications.  ...  I do think that the low power of this device will
> > be what will put this device into new applications that other parts
> > can't do.
>
> If you have any example application ideas you can reveal, I'd be
> interested in seeing them.  It's an amazing chip, I just haven't yet
> personally figured out what to do with it, despite a fair amount of head
> scratching.

I still don't know enough about them to know what I can and can't do
with them. I may need to spend some time with the MD5 app note. It
looks like that is the one with the most info at the moment.

Rick

Paul Rubin

Apr 20, 2011, 12:13:46 AM
Charley Shattuck <csha...@surewest.net> writes:
> I think that if you read enough of the website you'll see that there *is*
> some info about how to do things. It will only get better as time goes on.

Yes, there is a reasonable amount of docs now (at least for software)
and it was also helpful to look at the Seaforth docs which are also
still online. There are some parts that I'm a bit confused by
conceptually, like how one gets code to the GA144's interior nodes, or
what the rom of those nodes contains. I see node 105(?) containing the
eforth vm is rather far from the memory controller--does that mean using
several wire nodes between the vm and external memory? I guess those
delays of a few ns are still much faster than external dram.

I spent a little time last night looking at the vm and related code in
the arrayforth zip file. I didn't understand much of it, and it was
clearly anachronistic (I'd never seen code as "blocks" before though I'd
heard of it), but it was in a certain way beautiful. I'd like to try to
figure it out some more.

Paul Rubin

Apr 20, 2011, 1:20:09 AM
rickman <gnu...@gmail.com> writes:
>> Doing 8 bit operations seems to cost a big slowdown for shifting and
>> masking, if you want to use the on-cpu ram to hold 2 bytes per word.
> "Seems"? I don't see how it is that much of a hindrance to work with
> bytes, but you know your apps.

I was thinking about the RC4 stream cipher. On a conventional processor
including a typical 8 bit micro, each byte of output takes around a
dozen instructions, juggling around a 256 byte memory array. On the GA,
I haven't counted exactly, but it's quite a bit more arithmetic and you
end up using three nodes. Despite this, I think RC4 on the GA is
reasonably practical. An RC4 key cracker might be an interesting GA144
application if you want to attack password-protected files from old
versions of Microsoft Word or a few other such programs.
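
A sketch of the masking involved, in ordinary ANS Forth (the word name
and the two-bytes-per-cell layout are my illustration, not GA code):

: pbyte@  ( i addr -- b )   \ fetch byte i from an array packed 2 bytes/cell
  over 2/ cells + @         \ the cell holding bytes 2i and 2i+1
  swap 1 and                \ odd index selects the high byte
  if 8 rshift then
  255 and ;

On a byte-addressed micro that whole word is a single load instruction,
which is where the slowdown comes from.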

> Again, I'll take your word for it. Most DSP devices are 16 bits and
> the top end 32 bit devices use Watts rather than milliWatts. I just
> don't see how 144 processors would be slow doing 32 bit arithmetic.

Well, look at the md5 notes, they split 32-bit words into 16-bit halves
stored on separate nodes, then have to juggle the halves around,
propagate carry bits between nodes, etc. The GA144 running flat out
uses about 0.6W. You get a stupendous amount of mips for that power,
but it's not something you can run from a watch battery.

> Yes, external memory doesn't run at 2 ns. But does that mean it is
> too slow for a given app? If your apps only run on processors which
> have 2 ns access to tons of memory, then yes, these chips won't do the job.

Yes, the caches in a conventional x86 or fast ARM processor are 2ns and
quite large compared with the GA144 memory.

> I think software defined radio is one area where this chip will
> shine! You don't need tons of memory for that, you need fast
> processing.

That's interesting and I'm glad to hear it. You probably know a lot
more about SDR than I do. But, I think I calculated that the GA beats
just about everything for energy per integer addition, it's not really
far ahead of dedicated DSP's for MAC, and might have been behind some.
I posted numbers in an earlier newsgroup message but no longer remember
them.

>> Yes, possibly so, I'm used to processing big piles of character data
>> so for me it keeps coming back to byte operations 32-bit math, but I'm
>> sure there's lots that I'm not thinking of.
>
> What types of apps are these?

The usual data communications and crypto stuff, character processing, etc.
Data compression wants a lot of memory besides the computation.

>> If you have any example application ideas you can reveal...

> I still don't know enough about them to know what I can and can't do
> with them. I may need to spend some time with the MD5 app note. It
> looks like that is the one with the most info at the moment.

Cool.

rickman

Apr 20, 2011, 3:56:43 AM
On Apr 20, 1:20 am, Paul Rubin <no.em...@nospam.invalid> wrote:
> rickman <gnu...@gmail.com> writes:
> >> Doing 8 bit operations seems to cost a big slowdown for shifting and
> >> masking, if you want to use the on-cpu ram to hold 2 bytes per word.
> > "Seems"?  I don't see how it is that much of a hindrance to work with
> > bytes, but you know your apps.
>
> I was thinking about the RC4 stream cipher.  On a conventional processor
> including a typical 8 bit micro, each byte of output takes around a
> dozen instructions, juggling around a 256 byte memory array.  On the GA,
> I haven't counted exactly, but it's quite a bit more arithmetic and you
> end up using three nodes.  Despite this, I think RC4 on the GA is
> reasonably practical.  An RC4 key cracker might be an interesting GA144
> application if you want to attack password-protected files from old
> versions of Microsoft Word or a few other such programs.
>
> > Again, I'll take your word for it.  Most DSP devices are 16 bits and
> > the top end 32 bit devices use Watts rather than milliWatts.  I just
> > don't see how 144 processors would be slow doing 32 bit arithmetic.
>
> Well, look at the md5 notes, they split 32-bit words into 16-bit halves
> stored on separate nodes, then have to juggle the halves around,
> propagate carry bits between nodes, etc.  The GA144 running flat out
> uses about 0.6W.  You get a stupendous amount of mips for that power,
> but it's not something you can run from a watch battery.

I'm sure they did that as a way to parallelize the app. I don't think
you need to use two nodes to do 32 (or actually 36) bit arithmetic. As
to the power consumption, you are missing the single biggest feature
of the GA devices in my opinion, the nearly perfect power to
instruction use relationship. When you execute code it uses power.
When the processor stops, the power virtually stops. I think it would
be hard to find an app that actually dissipates 0.6 Watts in a GA144.
Think about it, that would be 144 * 666 MIPS = 96,000 MIPS!!! Even
using a factor of 10 to allow for the simple instruction set that
would give about 10,000 MIPS compared to a regular DSP. If your app
needs that much processing power, I think you will be very happy if it
only uses 0.6 Watts!

But if your app only uses 10% of that processing because of
"inefficiencies" as the processors wait for data to be passed,
synchronization, etc., the power consumption will be 10% or just 60
mW. A GA32 running at 10% utilization (2,100 MIPS) would only use
14.4 mW. That IS almost watch battery levels.
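
The arithmetic, Forth style (using the ballpark figures from this
thread, not datasheet numbers):

144 666 * .       \ 95904   ~96,000 peak MIPS for a GA144
32 666 * 10 / .   \ 2131    ~2,100 MIPS for a GA32 at 10% utilization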


> > Yes, external memory doesn't run at 2 ns.  But does that mean it is
> > too slow for a given app?  If your apps only run on processors which
> > have 2 ns access to tons of memory, then yes, these chips won't do the job.
>
> Yes, the caches in a conventional x86 or fast ARM processor are 2ns and
> quite large compared with the GA144 memory.

Yes, and they consume lots of power and cost lots of $$$ per chip.
You compare the GA144 to other devices that may be similar in
processing power, but not in any other sense. The devices that are
about the same size, cost and power consumption have much, much less
processing power. That is why the GA architecture can afford to waste
processing power, there is so much of it and it is so cheap! That is
why I compare it to and FPGA rather than an MCU. Designers don't care
much about using the elements in FPGAs because they are so cheap and
plentiful.


> > I think software defined radio is one area where this chip will
> > shine!  You don't need tons of memory for that, you need fast
> > processing.
>
> That's interesting and I'm glad to hear it.  You probably know a lot
> more about SDR than I do.  But, I think I calculated that the GA beats
> just about everything for energy per integer addition, it's not really
> far ahead of dedicated DSP's for MAC, and might have been behind some.
> I posted numbers in an earlier newsgroup message but no longer remember
> them.

I wouldn't go so far as to say I know lots more than anyone about
anything. But I was thinking of a good app to illustrate the GA
capabilities and I think a radio controlled clock might be a good
one. That was when I tried to figure out how to use it to drive a
crystal for an oscillator. I'll wait for the final app note on how to
design such a circuit. Power consumption is critical and a high speed
crystal driver will use too much power I think. A 32 kHz crystal will
be too slow. The design needs something around 500 kHz to drive the
ADC conversions to sample a 60 kHz radio signal. The bit rate of the
time reference signal is one bit per second which gives lots of
integration time. The ADC running at 8x to 10x the carrier will allow
a correlation to sync phase and pull the carrier out of the noise.
The rest is very simple and fairly low processing. This would work
easily on a GA4 I think. But the devil is in the details and I would
like to see some characterization data on the ADC. They talk about
linearization code for the ADC as well as temperature compensation,
etc. I don't know that any of this is needed for this app, but for me
to consider using the device I need to know what it can and can't
do. The radio controlled clock is just a demo app idea to show that
it can compete on power with custom solutions.
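
The correlation itself is just a multiply-accumulate loop. A sketch in
plain ANS Forth (array names and cell layout are mine; on the chip this
would be a few words on one node):

: correlate  ( s-addr r-addr n -- sum )  \ dot product of two n-cell arrays
  0 swap 0 do           \ s r sum on each pass
    >r                  \ park the running sum
    over i cells + @    \ s r s[i]
    over i cells + @    \ s r s[i] r[i]
    * r> +              \ s r sum'
  loop nip nip ;

Run that against a stored 60 kHz reference at a few phase offsets and
pick the peak to lock phase.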


> >> Yes, possibly so, I'm used to processing big piles of character data
> >> so for me it keeps coming back to byte operations 32-bit math, but I'm
> >> sure there's lots that I'm not thinking of.
>
> > What types of apps are these?
>
> The usual data communications and crypto stuff, character processing, etc.  
> Data compression wants a lot of memory besides the computation.

I guess you are talking fairly high data rates if you are worried
about the bandwidth to memory. If the chip has dedicated processors
handling the memory interface the rest of the design can have the full
bandwidth of the memory, no? For an SDRAM that can be pretty fast!


> >> If you have any example application ideas you can reveal...
> > I still don't know enough about them to know what I can and can't do
> > with them.  I may need to spend some time with the MD5 app note.  It
> > looks like that is the one with the most info at the moment.
>
> Cool.

A customer has just lit a fire and I need to put it out. I have to
test the next 200 boards and my test fixture still isn't working up to
snuff. It keeps screwing up the serial coms, actually the PC does,
but the point is tests have to be repeated until all the planets align
and the test works as well as the board does. This will take me a few
days. I wish I had bought a decent logic analyzer a couple of weeks
ago. The ancient beast I'm still using just doesn't cut the mustard
sometimes.

Rick

Albert van der Horst

Apr 20, 2011, 1:42:12 PM
In article <7xtydts...@ruckus.brouhaha.com>,

Paul Rubin <no.e...@nospam.invalid> wrote:
>Charley Shattuck <csha...@surewest.net> writes:
>> I think that if you read enough of the website you'll see that there *is*
>> some info about how to do things. It will only get better as time goes on.
>
>Yes, there is a reasonable amount of docs now (at least for software)
>and it was also helpful to look at the Seaforth docs which are also
>still online. There are some parts that I'm a bit confused by
>conceptually, like how one gets code to the GA144's interior nodes, or
>what the rom of those nodes contains. I see node 105(?) containing the
>eforth vm is rather far from the memory controller--does that mean using
>several wire nodes between the vm and external memory? I guess those
>delays of a few ns are still much faster than external dram.

Getting code to interior nodes? That is like being worried about not
knowing what is going on inside a compiler.

I had parpi running on actual chips, you know. The Intellasys chips.
There were a couple of pieces of code, and then you tell the
system which code runs on what node, and how they are connected.
The system communicated with a PC through a serial line.
in : lN
out : the number of primes under N (after a couple of mS.)

We didn't have to write the code to handle a serial line either,
would you have expected otherwise?

>
>I spent a little time last night looking at the vm and related code in
>the arrayforth zip file. I didn't understand much of it, and it was
>clearly anachronistic (I'd never seen code as "blocks" before though I'd
>heard of it), but it was in a certain way beautiful. I'd like to try to
>figure it out some more.

We (Leon Konings and I) have moved away from blocks,
and now use ascii source, with some tools of our own.
(This has been mentioned before on this forum.)

Groetjes Albert

--
Albert van der Horst, UTRECHT,THE NETHERLANDS
Economic growth -- being exponential -- ultimately falters.
albert@spe&ar&c.xs4all.nl &=n http://home.hccnet.nl/a.w.m.van.der.horst

Paul Rubin

Apr 21, 2011, 2:02:04 AM
rickman <gnu...@gmail.com> writes:
>> Well, look at the md5 notes, they split 32-bit words into 16-bit halves
>> stored on separate nodes, then have to juggle the halves around,...

> I'm sure they did that as a way to parallelize the app. I don't think
> you need to use two node to do 32 (or actually 36) bit arithmetic.

I didn't get that impression, since they'd still have had to use
the multiple nodes to hold the data (64 32-bit constants). Maybe
there is some way to reorganize the code to split each 32-bit word
into a pair of 18-bit words on the same node, but I figured that
the GA guys knew what they were doing and would have programmed it
the other way if that made sense.

> As to the power consumption, you are missing the single biggest
> feature of the GA devices in my opinion, the nearly perfect power to
> instruction use relationship. When you execute code it uses power.
> When the processor stops, the power virtually stops. I think it would
> be hard to find an app that actually dissipates 0.6 Watts in a GA144.
> Think about it, that would be 144 * 666 MIPS = 96,000 MIPS!!!

It's true, I'm thinking like a data center programmer rather than an
embedded guy, but I figure hardware costs money, so as a code-tweaker
I'm trying to maximize CPU utilization and if I'm only using 48,000
mips of those 96,000, I'd say my code is wasting half the available
cycles ;-).

> Even using a factor of 10 to allow for the simple instruction set that
> would give about 10,000 MIPS compared to a regular DSP. If your app
> needs that much processing power, I think you will be very happy if it
> only uses 0.6 Watts!

I don't think that's unrealistic; the GPU or DSP part of an Arm Cortex
chip in a fancy cell phone probably can do close to 10,000 DSP mips with
less than 0.6W of power. They use the mips for stuff like 1080HD video
encoding.

> A GA32 running at 10% utilization (2,100 MIPS) would only use 14.4 mW.
> That IS almost watch battery levels.

Again I'm not sure how that compares with the DSP's used in cell phones
and media player.

>> Yes, the caches in a conventional x86 or fast ARM processor are 2ns and
>> quite large compared with the GA144 memory.
> Yes, and they consume lots of power and cost lots of $$$ per chip.

I don't know how much the ARM chip cost, but the Cortex A8 (OMAP 3630)
is used in the Archos 28 media player, which is a complete Android
tablet including CPU, 4gb of flash, touch screen, battery, wifi, audio
codec, USB, etc. all for $100 retail. I'd expect the A8 chip has to be
cost comparable to the GA144 in order to be included in a gadget like
that. The economies of scale and the 45nm process technology both help
a lot I'm sure.

> That is why the GA architecture can afford to waste processing power,
> there is so much of it and it is so cheap! That is why I compare it
> to and FPGA rather than an MCU. Designers don't care much about using
> the elements in FPGAs because they are so cheap and plentiful.

Is that for real? I'd thought FPGA designers fiendishly try to
optimize their designs so they can fit in smaller, cheaper parts,
or get more functionality into a given part. The GA's big
strength is its stupendous amount of raw integer mips, so
my main thoughts towards applications are about how to use those
mips. Maybe I'm not looking in the right directions for problems
it can solve, though. I'm used to big-machine tasks which aren't
that great a fit for the GA's capabilities.

> I was thinking of a good app to illustrate the GA capabilities and I
> think a radio controlled clock might be a good one... Power
> consumption is critical and a high speed crystal driver will use too
> much power I think. A 32 kHz crystal will be too slow. The design
> needs something around 500 kHz to drive the ADC conversions to sample
> a 60 kHz radio signal.

I thought the idea is just use an ultra-low powered clock (32khz
crystal) like a conventional digital watch, with a radio that you
activate once a day for 1 minute or so, to get the time signal and
adjust the clock. So if the clock has a 200 mAH 3 volt coin cell,
and you want it to run for 3 years on a battery, assuming the
32 khz operation takes basically no power, the radio can use
30 mW if you run it 1 minute a day. That seems pretty doable
with a conventional part. In fact radio controlled clocks and
watches are widely available, so I don't think this application
really uses unique GA capabilities.

Anyway, do they even try to digitize the time signal? It's not
just analog demodulation making a serial signal given to a cpu
input pin?


> But the devil is in the details and I would like to see some
> characterization data on the ADC. The talk about linearization code
> for the ADC as well as temperature compensation, etc.

It would surprise me if the cheap consumer radio clocks already
available use any type of precise a/d's.

> I guess you are talking fairly high data rates if you are worried
> about the bandwidth to memory.

Well, if I'm running some monstrous computational task, and I want to
replace racks full of PC's with racks full of GA's... a guy can dream ;-).

More realistically I'm thinking of GPGPU-style applications which do use
quite high memory bandwidth. The (power hungry and expensive) NVidia
Tesla chip apparently has about 150 GB/s bandwidth (384 bit bus, 3.1 ghz
clock) and I don't think the GA144 can come near that.

> A customer has just lit a fire and I need to put it out. ... This
> will take me a few days. I wish I had bought a decent logic analyzer
> a couple of weeks ago. The ancient beast I'm still using just doesn't
> cut the mustard sometimes.

OK, good luck with it, I hope it works out. I'd have no clue how to
fix an issue like that.

rickman

Apr 22, 2011, 1:32:30 PM
On Apr 21, 2:02 am, Paul Rubin <no.em...@nospam.invalid> wrote:
> rickman <gnu...@gmail.com> writes:
> >> Well, look at the md5 notes, they split 32-bit words into 16-bit halves
> >> stored on separate nodes, then have to juggle the halves around,...
> > I'm sure they did that as a way to parallelize the app.  I don't think
> > you need to use two nodes to do 32 (or actually 36) bit arithmetic.
>
> I didn't get that impression, since they'd still have had to use
> the multiple nodes to hold the data (64 32-bit constants).  Maybe
> there is some way to reorganize the code to split each 32-bit word
> into a pair of 18-bit words on the same node, but I figured that
> the GA guys knew what they were doing and would have programmed it
> the other way if that made sense.

I can't say why they designed the md5 app the way they did, but my
point is that I don't think there is any constraint that precludes you
from doing 32 or 36 bit arithmetic on a single node. In fact, I have
no doubt that it is much more complex to distribute a single math
operation across multiple nodes. So I don't consider their example as
justification that multiple precision arithmetic is difficult on the
GA devices.


> > As to the power consumption, you are missing the single biggest
> > feature of the GA devices in my opinion, the nearly perfect power to
> > instruction use relationship.  When you execute code it uses power.
> > When the processor stops, the power virtually stops.  I think it would
> > be hard to find an app that actually dissipates 0.6 Watts in a GA144.
> > Think about it, that would be 144 * 666 MIPS = 96,000 MIPS!!!
>
> It's true, I'm thinking like a data center programmer rather than an
> embedded guy, but I figure hardware costs money, so as a code-tweaker
> I'm trying to maximize CPU utilization and if I'm only using 48,000
> mips of those 96,000, I'd say my code is wasting half the available
> cycles ;-).  

That makes sense when you are dealing with $200 CPUs on $1000 boards.
But that is the first difference between the GA devices and any other
processor out there. The processor is not "king" any more. I think I
have made the analogy several times that this is more like an FPGA
where you take advantage of the fact that the chip is a "sea" of
processors and don't worry about using them all, much less using them
all at max capacity. This device is not about setting a processing
rate barrier, it is about using the wealth of processing performance
to solve problems.


> > Even using a factor of 10 to allow for the simple instruction set that
> > would give about 10,000 MIPS compared to a regular DSP.  If your app
> > needs that much processing power, I think you will be very happy if it
> > only uses 0.6 Watts!
>
> I don't think that's unrealistic; the GPU or DSP part of an Arm Cortex
> chip in a fancy cell phone probably can do close to 10,000 DSP mips with
> less than 0.6W of power.  They use the mips for stuff like 1080HD video
> encoding.

GPU...? what??? There are no GPUs in a cell phone! An ARM Cortex
chip has no GPU and they don't have DSP either. There are combined
ARM/DSP chips, but there is no reason for a cell phone to have a GPU
unless they are trying to be a state of the art smart phone playing
videos. Trust me, when smart phones turn on all those features they
use a LOT more than 0.6 Watts!!! The smart phone owners I talk to say
if they don't use them too much, they can barely go a day between
charges!!!


> > A GA32 running at 10% utilization (2,100 MIPS) would only use 14.4 mW.
> > That IS almost watch battery levels.
>
> Again I'm not sure how that compares with the DSP's used in cell phones
> and media player.  

Why are you worried about comparing to other devices? The benchmark I
use is the application I would consider it for. In my case I see that
the idle power of each processor is in the ballpark of 100 nW. The
running power of a processor is in the ballpark of 6.75 uW/MHz. Most
embedded processors are in the range of 200 uW/MHz and power house
processors like DSPs and GPUs are much more power hungry with the
possible exception of a desktop GPU when you have all the MACs running
productively. Someone published a power consumption number on a GPU
which showed it was in the ballpark of an embedded processor in terms
of uW/MHz, but desktop GPUs have little power control circuitry and
can't be throttled back the same way smaller devices are. I guess
that's why they are desktop GPUs, where power consumption is not the
big issue.
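
Those ballpark numbers are at least self-consistent with the 0.6 W
figure mentioned earlier (Forth-style arithmetic, in uW, writing 6.75
as 675/100):

675 666 100 */ .        \ 4495    ~4.5 mW per node flat out at 666 MHz
675 666 100 */ 144 * .  \ 647280  ~0.65 W for all 144 nodes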


> >> Yes, the caches in a conventional x86 or fast ARM processor are 2ns and
> >> quite large compared with the GA144 memory.
> > Yes, and they consume lots of power and cost lots of $$$ per chip.
>
> I don't know how much the ARM chip cost, but the Cortex A8 (OMAP 3630)
> is used in the Archos 28 media player, which is a complete Android
> tablet including CPU, 4gb of flash, touch screen, battery, wifi, audio
> codec, USB, etc. all for $100 retail.  I'd expect the A8 chip has to be
> cost comparable to the GA144 in order to be included in a gadget like
> that.  The economies of scale and the 45nm process technology both help
> a lot I'm sure.

I think Android processors are at least a factor of 2x or 4x the
ultimate cost of the GA144 once it is in volume production. But the
power consumption will never be on a par. The GA144 can literally run
from a watch battery if you have a task that lightly burdens it. You
might be able to keep the RTC of an Android processor running from a
watch cell. I can't say I have hard data on any of this as I have not
looked at the Android type processors in a couple of years. But they
use external RAM chips to run the OS and use external Flash to hold
the data and OS so in many ways it is an apples to oranges
comparison. Are you looking at the GA devices for an Android type
application? This may not be a good fit. I don't know how well a
GA144 would run as an application processor.


> > That is why the GA architecture can afford to waste processing power,
> > there is so much of it and it is so cheap!  That is why I compare it
> > to and FPGA rather than an MCU.  Designers don't care much about using
> > the elements in FPGAs because they are so cheap and plentiful.
>
> Is that for real?  I'd thought FPGA designers fiendishly try to
> optimize their designs so they can fit in smaller, cheaper parts,
> or get more functionality into a given part.  The GA's big
> strength is its stupendous amount of raw integer mips, so
> my main thoughts towards applications are about how to use those
> mips.  Maybe I'm not looking in the right directions for problems
> it can solve, though.  I'm used to big-machine tasks which aren't
> that great a fit for the GA's capabilities.

I am an FPGA designer and we don't have the means to properly and
readily optimize the logic in a small design much less a large one.
Unless you have lots of time to spend optimizing you can't really do
much other than flip the compiler switch between optimizing for speed
vs. size. There are techniques for generating designs that are lean,
but mostly designers spend their time on getting the job done and pick
a chip that will hold it.

Personally I think the GA loses a lot as soon as you try to harness it
to anything that looks like a desktop or even a smart phone
application. As you pointed out, it doesn't have that sort of memory
bandwidth or internal memory. MIPS can be used many ways. One way is
to replace hardware. That is what has happened in PCs over the last
15 years. They used to have telco modem cards that were larger than a
cell phone. Now that is one chip on the motherboard, not because the
chip incorporates it all, but because all the processing is now done
on the x86 CPU.

For embedded apps, rather than using a single hugely fast processor to
try to do everything, which can be very hard, a multiprocessor like
the GA144 can assign separate nodes to handle separate parts of the
problem.

Someone mentioned software defined radio. These typically have an RF
front end with down conversion to IF, IF ADC, FPGA to filter and
further down convert and a DSP to perform demodulation. A chip like
the GA144 can likely do everything after the IF ADC in software.


> > I was thinking of a good app to illustrate the GA capabilities and I
> > think a radio controlled clock might be a good one...  Power
> > consumption is critical and a high speed crystal driver will use too
> > much power I think.  A 32 kHz crystal will be too slow.  The design
> > needs something around 500 kHz to drive the ADC conversions to sample
> > a 60 kHz radio signal.
>
> I thought the idea is just use an ultra-low powered clock (32khz
> crystal) like a conventional digital watch, with a radio that you
> activate once a day for 1 minute or so, to get the time signal and
> adjust the clock.  So if the clock has a 200 mAH 3 volt coin cell,
> and you want it to run for 3 years on a battery, assuming the
> 32 khz operation takes basically no power, the radio can use
> 30 mW if you run it 1 minute a day.  That seems pretty doable
> with a conventional part.  In fact radio controlled clocks and
> watches are widely available, so I don't think this application
> really uses unique GA capabilities.

I don't think that is the way these devices work. They need to be
able to monitor the signal at all times because the signal strength is
marginal at best and it can't predict when it will be able to receive
it. The nature of the signal itself is to provide for averaging over
long periods to pull the signal out of the noise. The data rate is 1
bit per second with a 60 kHz carrier.


> Anyway, do they even try to digitize the time signal?  It's not
> just analog demodulation making a serial signal given to a cpu
> input pin?

Not sure what you mean by "digitize the time signal". The signal is a
1 bps digital bit pattern giving the time once every minute. The bits
are pulse width encoded and then amplitude modulated onto the 60 kHz
carrier.


> > But the devil is in the details and I would like to see some
> > characterization data on the ADC.  The talk about linearization code
> > for the ADC as well as temperature compensation, etc.
>
> It would surprise me if the cheap consumer radio clocks already
> available use any type of precise a/d's.

No, but that is not the only app I have that would need ADCs. I
currently produce a board that has an FPGA, a CODEC and various other
chips that processes CD quality audio at up to 48 kSPS and can go up
to 192 kSPS if there were a need to. The GA144 might be able to do 48
kSPS at 16 bits but that is not completely clear to me yet. It will
require software to linearize the transfer function as well as
measuring/setting the gain factor. My real concern is that talk is
cheap and until someone does a complete design and tests the ADC/DAC
as a piece of audio equipment, there is no way to know how well it
will work.


> > I guess you are talking fairly high data rates if you are worried
> > about the bandwidth to memory.
>
> Well, if I'm running some monstrous computational task, and I want to
> replace racks full of PC's with racks full of GA's... a guy can dream ;-).
>
> More realistically I'm thinking of GPGPU-style applications which do use
> quite high memory bandwidth.  The (power hungry and expensive) NVidia
> Tesla chip apparently has about 150 GB/s bandwidth (384 bit bus, 3.1 ghz
> clock) and I don't think the GA144 can come near that.

No, this is one limitation of the GA chip. It was never intended to
work in such an app and I doubt would do better than a GPU if it
could. GPU apps run constantly and as you have been describing need
to maximize the efficiency of every processor. Those chips are
optimized for that job and do it well. The GA devices are not
optimized for those constraints and so would do poorly.


> > A customer has just lit a fire and I need to put it out. ...  This
> > will take me a few days.  I wish I had bought a decent logic analyzer
> > a couple of weeks ago.  The ancient beast I'm still using just doesn't
> > cut the mustard sometimes.
>
> OK, good luck with it, I hope it works out.  I'd have no clue how to
> fix an issue like that.

I need to buy a new one, but I want one combined with a decent scope.
They are all pricey. Hantek has some listed on their web site, but
they are not yet shipping and they don't really give specs, so they
may not do the job and may still be pricey. Since I haven't been able
to fix my test fixture problem, I'll have to use it as is and live
with the slower testing.

Rick

Bernd Paysan

Apr 22, 2011, 4:06:10 PM
rickman wrote:

> GPU...? what??? There are no GPUs in a cell phone! An ARM Cortex
> chip has no GPU

Ok, here's a description of the processor in a quite popular cell phone:

http://en.wikipedia.org/wiki/Apple_A4

It has a PowerVR GPU. Same with the successor:

http://en.wikipedia.org/wiki/Apple_A5

Very similar chips can be found in other smartphones, where your
location data is not stored in Cupertino, but in Mountain View instead
(not that it really matters which cellphone operating system maker knows
your whereabouts ;-).

--
Bernd Paysan
"If you want it done right, you have to do it yourself"
http://www.jwdt.com/~paysan/

Paul Rubin

Apr 23, 2011, 7:03:13 AM
Albert van der Horst <alb...@spenarnc.xs4all.nl> writes:
> Getting code to interior nodes? That is like being worried about not
> knowing what is going on inside a compiler.

As a compiler hacker, of course I want to know what compilers do.

> There were a couple of pieces of code, and then you tell the
> system which code runs on what node, and how they are connected.
> The system communicated with a PC through a serial line.

OK, so there is some kind of boot loader that moves the code around.
Can you replace code on the fly on individual nodes, during different
phases of a computation?

> We (Leon Konings and I) have moved away from blocks,
> and now use ascii source, with some (own) tools.
> (Been mentioned before on this forum.)

The blocks thing has its charms, but yeah, ascii seems more practical.

Paul E. Bennett

Apr 23, 2011, 7:17:09 AM
rickman wrote:

[%X]

>> > A customer has just lit a fire and I need to put it out. ... This
>> > will take me a few days. I wish I had bought a decent logic analyzer
>> > a couple of weeks ago. The ancient beast I'm still using just doesn't
>> > cut the mustard sometimes.
>>
>> OK, good luck with it, I hope it works out. I'd have no clue how to
>> fix an issue like that.
>
> I need to buy a new one, but I want one combined with a decent scope.
> They are all pricey. Hantek has some listed on their web site, but
> they are not yet shipping and they don't really give specs, so they
> may not do the job and may still be pricey. Since I haven't been able
> to fix my test fixture problem, I'll have to use it as is and live
> with the slower testing.

Jack Ganssle reviewed the new scope and logic analyser instruments from
Agilent in his column in EE Times just recently. Not sure what price range
you would find supportable, but the devices seemed to have everything but
the kitchen sink and coffee maker. I know, that may be the deal breaker.

As to use of GA144 chips I have had a few ideas along the binocular vision
route. Two (cheap) cameras feeding the chip with video streams and
performing something like Canny Edge Detection to determine the
characteristics of the scene before it. This would be used in a self-guiding
robot. The Canny Edge Detection algorithm looks like it could be spread over
three or four cells for each vision channel with a few others to make sense
out of the scene and provide guidance to the motors. I am not worried that I
may not use the full capability of the whole chip for this.

Of course, if you get to need real processing muscle, hypercubing the chips
would give you enormous processing potential.

--
********************************************************************
Paul E. Bennett...............<email://Paul_E....@topmail.co.uk>
Forth based HIDECS Consultancy
Mob: +44 (0)7811-639972
Tel: +44 (0)1235-510979
Going Forth Safely ..... EBA. www.electric-boat-association.org.uk..
********************************************************************

rickman

Apr 23, 2011, 9:42:41 PM
On Apr 22, 4:06 pm, Bernd Paysan <bernd.pay...@gmx.de> wrote:
> rickman wrote:
> > GPU...? what???  There are no GPUs in a cell phone!  An ARM Cortex
> > chip has no GPU
>
> Ok, here's a description of the processor in a quite popular cell phone:
>
> http://en.wikipedia.org/wiki/Apple_A4
>
> It has a PoverVR GPU. Same with the successor:
>
> http://en.wikipedia.org/wiki/Apple_A5
>
> Very similar chips can be found in other smartphones, where your
> location data is not stored in Cupertino, but in Mountain View instead
> (not that it really matters which cellphone operating system maker knows
> your whereabout ;-).

You snipped my post. How about if you read the full post?

Rick

foxchip

Apr 24, 2011, 2:10:22 AM
On Apr 23, 3:03 am, Paul Rubin <no.em...@nospam.invalid> wrote:
> OK, so there is some kind of boot loader that moves the code around.
> Can you replace code on the fly on individual nodes, during different
> phases of a computation?

There is an SPI flash boot node, an autobaud asynchronous serial
boot node, a synchronous serial boot node, a 1-wire serial boot
node, and two serdes boot nodes with boot code in ROM. Any of
these nodes can boot any or all nodes by loading code into RAM,
and then any node with pins can be used to boot in other ways.
Of course you can replace code on the fly on individual nodes
by writing to RAM.
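
The on-the-fly part looks roughly like this. I am writing from memory, so
check the slot packing against the F18A and boot protocol documents: because
executing from a port does not advance P, a short stream of instruction
words can make a listening node fill its own RAM:

   ( instruction words streamed into the target node's port )
   @p a! . .      \ next word of the stream goes to A: the load address
   <address>
   @p push . .    \ next word goes to the return stack: count minus 1
   <count-1>
   @p !+ unext .  \ unext reruns the fetch/store: copy count words to RAM
   <count words of code or data>

followed, if you like, by a jump into the freshly loaded code.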

You can also boot from a combination of nodes, loading a program
from SPI flash to all nodes on the chip, or loading an external
memory driver and then booting from parallel flash; then you
can load programs interactively or run debug and diagnostic software
from any node with pins. This is all very basic stuff that has
been discussed for years.

I found this very helpful when I spent a couple of days creating
the talking voltmeter demo that used five nodes. I loaded programs
and ran them interactively from the PC, then I wrote some of it
to flash after that was debugged. Then I booted it from SPI flash
and continued to run it interactively from the PC, added a little
more code and wrote that to flash to boot the full application.
I picked it as an example because I noticed that the parts it
used, booting from flash, reading data from flash, using analog
output pins or doing pwm analog output on a digital pin to play
speech, doing analog input, linearizing analog input, converting
A/D input into digits, indexing into SPI flash to play the
recorded spoken analog digits, buffering the streamed voice output,
and running from flash in a stand-alone mode or offering various
ways to interact with it at a Forth command line did not take
much code. These were the sorts of beginners tutorials I saw
being offered on other microcontrollers and I wanted to show
how simple it all was to partition and implement and how the
whole thing could be done in about 175 words of memory. I
wanted to show that that sort of program only needed a tiny
fraction of the resources on a 24-node or 40-node design.

> > We (Leon Konings and I) have moved away from blocks,
> > and now use ascii source, with some (own) tools.
> > (Been mentioned before on this forum.)

VentureForth also used ascii source in files so that programmers
who had abandoned Forth blocks or never appreciated the advantages
they have over files could play using the tools that they used
for other things. Some people write editors; some say that they
would kill if someone tried to take away the editor they are
used to that was written by someone else. ;-)

Chuck never used VentureForth or the ascii files. I made the
mistake of talking about blocks in the past tense once around Chuck
and he corrected me. I was the person who felt that most
people had never appreciated the advantages of blocks for
simplicity of software, speed, throughput, editing of source,
and data storage, and that we could offer an environment that
required using only a small fraction of Forth. I also did
CAD and software development work in colorforth so I was in
a pretty good position to compare the details of the use of
both environments.

I did feel sort of like a traitor for offering the VentureForth
tools to the public in ANS Forth, using files, etc., when I knew
what really worked best. But I also had seen how many people had
abandoned the best features of Forth and how many never grasped
what they were about and had seen how many people in c.l.f just
absolutely hated how different colorforth was from things they
already knew.

> The blocks thing has its charms, but yeah, ascii seems more practical.

VentureForth did have some features that were not implemented in
colorforth blocks at the time, like generating code templates and
doing dataflow algebra to generate test code templates, automate
place and route of functions, and prove correctness. Of course
those things are not hard, but many people complained that they
didn't understand them, were sure they were hard, and feared they
would be the source of their worst errors or would just scare them
away from trying simple stuff kids learn to do.

I found it amusing when Dave Guzeman would tell people that they
were silly to be so worried about place and route. He would say that
from everything he had seen it was about like deciding how to
place your groceries in your refrigerator, and he doubted that people
really had to rely on computer programs to put their food in their
fridge. I was also offering tools designed for when one connected
up a few hundred or a few thousand cluster chips together, which
does make place and route, data flow correctness, and interactive
debugging in Forth a little more complicated than on chips with
only a few small nodes to deal with.

Best Wishes

Paul Rubin

unread,
Apr 25, 2011, 12:00:34 PM4/25/11
to
> I found this very helpful when I spent a couple of days creating
> the talking voltmeter demo that used five nodes. ...

That sounds like it would make a nice tutorial/application note.

> These were the sorts of beginners tutorials I saw
> being offered on other microcontrollers and I wanted to show
> how simple it all was to partition and implement and how the
> whole thing could be done in about 175 words of memory. I
> wanted to show that that sort of program only needed a tiny
> fraction of the resources on a 24-node or 40-node design.

See, here's the issue I'm wondering about. 175 words of code plus some
data can fit on 5 nodes, which could be 1 central node doing something
like remote procedure calls to its 4 neighbors using ports, fine. But
what happens if you need 10 or 20 nodes? Now you need nodes
communicating with other nodes that are somewhat distant on the chip, so
there has to be code on the intermediate nodes to route traffic around
while still getting on with their own functions, and it starts to seem
really cramped with so few channels and so little code space. It's not
like a PC where you can just plop in a multi-threaded TCP server. How
do you handle this?

> VentureForth did have some features that were not implemented in
> colorforth blocks at the time like generating code templates and
> doing dataflow algebra to generate test code templates, automate
> place and route of functions, and prove correctness. Of course
> those things are not hard

Are you saying colorforth has this stuff now? Maybe you're not using
the terminology the same way as me, but I think of those as nontrivial
problems.

> I found it amusing when Dave Guzeman would tell people that they
> were silly to be so worried about place and route. He would say that
> from everything he had seen it was about like deciding how to
> place your groceries in your refrigerator and he doubted if people
> really had to rely on computer programs to put their food in their
> fridge.

I know this is a big issue for CAD programs, which tend to be one of the
driving applications for SAT solvers.

rickman

unread,
Apr 25, 2011, 8:10:11 PM4/25/11
to
On Apr 25, 12:00 pm, Paul Rubin <no.em...@nospam.invalid> wrote:
> > I found this very helpful when I spent a couple of days creating
> > the talking voltmeter demo that used five nodes.  ...
>
> That sounds like it would make a nice tutorial/application note.
>
> > These were the sorts of beginners tutorials I saw
> > being offered on other microcontrollers and I wanted to show
> > how simple it all was to partition and implement and how the
> > whole thing could be done in about 175 words of memory.  I
> > wanted to show that that sort of program only needed a tiny
> > fraction of the resources on a 24-node or 40-node design.
>
> See, here's the issue I'm wondering about.  175 words of code plus some
> data can fit on 5 nodes, which could be 1 central node doing something
> like remote procedure calls to its 4 neighbors using ports, fine.  But
> what happens if you need 10 or 20 nodes?  Now you need nodes
> communicating with other nodes that are somewhat distant on the chip, so
> there has to be code on the intermediate nodes to route traffic around
> while still getting on with their own functions, and it starts to seem
> really cramped with so few channels and so little code space. It's not
> like a PC where you can just plop in a multi-threaded TCP server. How
> do you handle this?

Not sure how you get 175 words of code. It is 64 18-bit words per
node, no? Each word has up to... I forget, four instructions? So it
would be up to 5 * 64 = 320 words, or over 1000 instructions, in five
nodes.

If you can't design an app to split the task between a number of nodes
so that each one fits in the memory available, I don't think I would
use the GA devices. Typically for these devices, I would consider the
data path first and allocate processors along the line of the data
flow; typically data flow is linear. Then, if there will be processors
that need to interpret commands, I would allocate a flow for the
control path. Depending on the app, that may be more of a tree
structure, or it may suit to pass the control along with the data in
packets.

Just consider how you want to decompose the design and then allocate
to nodes... like groceries in your refrigerator... I like that.


> > I found it amusing when Dave Guzeman would tell people that they
> > were silly to be so worried about place and route.  He would say that
> > from everything he had seen it was about like deciding how to
> > place your groceries in your refrigerator and he doubted if people
> > really had to rely on computer programs to put their food in their
> > fridge.
>
> I know this is a big issue for CAD programs, which tend to be one of the
> driving applications for SAT solvers.

I don't know about the rest of it, but this is a question I can
answer. The difference between place and route in an ASIC or FPGA and
putting away your groceries is the order of magnitude. I can picture
everything in my refrigerator, even if that is not an altogether
pleasant thought, while it is virtually impossible for a human to
manage place and route for 10,000 or 100,000 LUT/FFs with some 6 or
more inputs and up to two outputs each. 144 nodes with just four
channels each is a piece of cake... in the fridge.

Rick

Albert van der Horst

unread,
Apr 26, 2011, 7:03:24 AM4/26/11
to
In article <7xk4f0y...@ruckus.brouhaha.com>,

You've been looking at programs. Then I can understand that conclusion.

I've been looking in Elektor, an electronics magazine.
About half of the designs published would be better off on the GA144.
Not necessarily cheaper, but better under control, better specs,
smaller, less PCB real estate, fewer connections. Also easier to program
in principle, but current tools don't cut it.

A notable example from the last issue: an RS232-to-VGA converter.

Albert van der Horst

unread,
Apr 26, 2011, 7:22:30 AM4/26/11
to
In article <dd08c476-df9c-4480...@j16g2000pro.googlegroups.com>,
foxchip <f...@ultratechnology.com> wrote:
<SNIP>

>
>I found this very helpful when I spent a couple of days creating
>the talking voltmeter demo that used five nodes. I loaded programs
>and ran them interactively from the PC, then I wrote some of it
>to flash after that was debugged. Then I booted it from SPI flash
>and continued to run it interactively from the PC, added a little
>more code and wrote that to flash to boot the full application.
>I picked it as an example because I noticed that the parts it
>used, booting from flash, reading data from flash, using analog
>output pins or doing pwm analog output on a digital pin to play
>speech, doing analog input, linearizing analog input, converting
>A/D input into digits, indexing into SPI flash to play the
>recorded spoken analog digits, buffering the streamed voice output,
>and running from flash in a stand-alone mode or offering various
>ways to interact with it at a Forth command line did not take
>much code. These were the sorts of beginners tutorials I saw
>being offered on other microcontrollers and I wanted to show
>how simple it all was to partition and implement and how the
>whole thing could be done in about 175 words of memory. I
>wanted to show that that sort of program only needed a tiny
>fraction of the resources on a 24-node or 40-node design.

Is this an official Green Arrays project?

That would tell me there is something seriously wrong in the
marketing department.

Instead of making the world understand that this chip can
do a crystal oscillator with just the crystal (even if it
must be very low frequency), you embark on new projects.

If you want to promote GreenArrays,
just finish the oscillator document for Pete's sake!
Nobody believes the claims that are made there, short
of people who have looked very deeply into the chips,
like me. And these results have been discussed extensively
on the embedded newsgroups.

That would make enough of a splash in electronics circles.
You don't need anything more spectacular. If there is a
new paradigm, people really need a focus point to wrap
their heads around.

It's like with Jesus. A small miracle like turning water
into wine is very convincing, if it happens before your
very eyes.

>Best Wishes

Paul Rubin

unread,
Apr 28, 2011, 2:17:36 AM4/28/11
to
Charley Shattuck <csha...@surewest.net> writes:
> Take a look at the appnote on MD5,
> http://www.greenarraychips.com/home/documents/pub/AP001-MD5.html
> on the Green Arrays website. Skip down to
> the source blocks 842, 844, and 850. ...


Charley, I bookmarked this post when you posted it but wanted to get
around to thanking you and saying it is appreciated. The md5 code is
more understandable now than it was when I first looked at it, since
I've gotten to understand the processor a little better.

It occurs to me that it's probably possible to implement the Salsa20
stream cipher in one node, if that's of any interest. TEA/XXTEA is
another possibility.
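
For scale, the Salsa20 kernel is nothing but 32-bit add, xor, and rotate. A
quick ANS Forth doodle of one quarter-round step (host-side only, with word
names of my own; the 18-bit F18A version would need double-cell arithmetic,
which is the real question):

   $FFFFFFFF constant mask32
   : rotl ( x n -- x' )   \ 32-bit left rotate; assumes cells wider than 32 bits
      2dup lshift >r  32 swap - rshift  r> or  mask32 and ;
   : qstep ( y0 y3 y1 n -- z1 )   \ z1 = y1 xor rotl( y0 + y3, n )
      >r >r  + mask32 and  r> swap  r> rotl  xor ;

The full quarter-round is four such steps with rotations 7, 9, 13, and 18;
whether the state plus that code squeezes into 64 words is the interesting
part.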

> And, as Greg said, the appropriate place to ask such questions is the
> hot...@GreenArrayChips.com if you'd like to get an answer from someone
> who knows it.

Right now I'm just sort of a sight-seer (not involved in any hardware
development) so I wouldn't feel right sending questions to a channel
intended for more legitimate business communication.

foxchip

unread,
May 1, 2011, 1:05:55 PM5/1/11
to
On Apr 26, 3:22 am, Albert van der Horst <alb...@spenarnc.xs4all.nl>
wrote:

> >I found this very helpful when I spent a couple of days creating
> >the talking voltmeter demo that used five nodes.
<SNIP>
>
> Is this an official Green Arrays project?

No. That was a demo done in VentureForth at IntellaSys.
The reference was to the SEAforth24 24-node processor and
VentureForth compiler released five years ago. It was
intended to show the sort of simple things that children
could understand.

> That would tell me there is something seriously wrong in the
> marketing department.

You don't seem to pay a lot of attention to details like
who did what, when, on what, at which company, how it worked,
or how simple some things are that seem like miracles to you.

> Instead of making the world understand that this chip can
> do a crystal oscillator with just the crystal (even if it
> must be very low frequency), you embark on new projects.

That reference was a different company, different compiler,
different application, done years ago; you have things going
backwards in time, since it is hard to "embark on new projects"
that were actually done in a few days, a few years ago.

> It's like with Jesus. A small miracle like turning water
> into wine is very convincing, if it happens before your
> very eyes.

I think the crystal example is a good easy project for
children to do as a learning exercise for starters with
limited understanding and background.

Best Wishes

foxchip

unread,
May 1, 2011, 1:37:18 PM5/1/11
to
On Apr 25, 8:00 am, Paul Rubin <no.em...@nospam.invalid> wrote:
> See, here's the issue I'm wondering about.  175 words of code plus some
> data can fit on 5 nodes, which could be 1 central node doing something
> like remote procedure calls to its 4 neighbors using ports, fine.  But
> what happens if you need 10 or 20 nodes?  

The design began by looking at the required dataflow, with analog
readings coming in on one end, the SPI flash serving up recorded sound
on the other end, a node doing analog out or pwm analog out on a
digital pin to one side, and a few nodes doing some conversions along
the way. Once a simple dataflow pattern was determined, a very simple
code template for the data flow was generated and a little conversion
code was added to that to define the application.

When laying it out, any nodes that needed to be wires just used the
code for the dataflow template, instantiated with IN and OUT ports
specified by placement. The dataflow code was generated, and the
instantiated IN and OUT on each node were generated by placement.

The idea was that this simple approach worked for 5, 10, or 10,000
nodes quite easily. Doing 5 or 10 nodes manually is easy, but
doing 10,000 or 100,000 nodes individually one by one would not
be very productive.
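
In sketch form, a pure wire node's whole program is about one line. This
is arrayForth-flavored pseudo-source from memory, with left and right
standing for whatever pair of port addresses placement assigns, so check
the exact phrasing against the tools:

   : wire  left a!  right b!   \ A faces the upstream port, B the downstream
      begin  @ !b  end         \ fetch a word, pass it on, forever; the node
                               \ sleeps whenever nothing is moving

Instantiating the IN and OUT of the template by placement just means
substituting the right pair of port addresses for left and right.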

The SEAforth24 design done first had SPI flash more distant from
the analog nodes, so it needed a few simple wire nodes instantiated
with the dataflow pattern template code. The SEAforth40 had SPI flash
close to the analog nodes, so it did not need the simple wire nodes
with the simple dataflow template code. It was a simple demo, after
all, meant to show that it was at least as simple and maybe smaller
than the same demo done on other small embedded micros.

> Now you need nodes
> communicating with other nodes that are somewhat distant on the chip, so
> there has to be code on the intermediate nodes to route traffic around
> while still getting on with their own functions, and it starts to seem
> really cramped with so few channels and so little code space.  

That was not the case. You are overly concerned about something about
as simple as it could be. How complicated is a line or two of
machine-generated code plopped down on a node?

> It's not like a PC where you can just plop in a multi-threaded
> TCP server.  How do you handle this?

You plop in a multi-threaded TCP server if you have one in the
library, just like on a PC. Whenever I mention something like
that, Greg always says "you are just talking about a TCP/IP server,"
as he has lots of experience with those.

It was also a demo of how people who plop things in from a
library to create an application could do just that in a GUI
based drag and drop environment just like they like to do on a
PC or with some embedded programming tools. ;-)

> > VentureForth did have some features that were not implemented in
> > colorforth blocks at the time like generating code templates and
> > doing dataflow algebra to generate test code templates, automate
> > place and route of functions, and prove correctness.  Of course
> > those things are not hard
>
> Are you saying colorforth has this stuff now?  Maybe you're not using
> the terminology the same way as me, but I think of those as nontrivial
> problems.

Did you read the paper explaining it? I did a little more reading
on Haskell and how it approaches parallel programming and got more
insight into why you see so many things as being so complicated.

No. I was saying that VentureForth was designed for drag and drop
programmers who want to plop things in from a library and have the
compiler make sure that the details are correct and the dataflow
is verified correct with the generated templates instantiated by
just doing a drag and drop onto a picture. It was not hard stuff
to do but it was not the sort of thing the programmers who were
doing work in colorforth were doing. And in those days colorforth
did not even have a softsim module.

> I know this is a big issue for CAD programs that tend to be one of the
> driving applications for SAT solvers.

Most of the colorforth users back then were doing VLSI CAD, designing
simple circuits that other CAD programs could not comprehend. They
were not writing application programs, and the code they were using
was not designed to work like the code used by drag and drop
CAD programmers. GA, being a smaller company, has focused on using
its colorforth tools.

Best Wishes

Paul Rubin

unread,
May 2, 2011, 3:14:25 AM5/2/11
to
foxchip <f...@ultratechnology.com> writes:
> When laying it out, any nodes that needed to be wires just used the
> code for the dataflow template, instantiated with IN and OUT ports
> specified by placement. The dataflow code was generated, and the
> instantiated IN and OUT on each node were generated by placement.
>
> The idea was that this simple approach worked for 5, 10, or 10,000
> nodes quite easily. Doing 5 or 10 nodes manually is easy, but
> doing 10,000 or 100,000 nodes individually one by one would not
> be very productive.

Thanks, maybe one issue is that I haven't seen your tools in action, so
I have to just go by the chip data sheet. Do your tools have manuals
online? If yes, I haven't seen them.

An example of what I'm asking: suppose you want to implement a 4Kbyte
lookup table in ram. With 2 bytes per word that takes 32 nodes, with no
ram space left for holding code. So if you have a 4*8 block of nodes
for the ram, how do you get data out of the interior ones? Can a code
word written to an exterior node (on one edge) route all the way to the
other side of the block somehow? Is there around 5ns delay for each
node that the data has to traverse?
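
My naive guess at what a ram-holding node might run, in arrayForth-flavored
pseudocode (the port names and phrasing are guesses on my part):

   : serve                      \ loop: take an address, return the data
      begin
        left a! @   ( addr )    \ sleep until an address arrives on a port
        a! @        ( data )    \ fetch the table word at that address
        left a! !               \ send it back the way it came
      end

but I don't know whether that, plus relaying for the deeper nodes, leaves
any room for the table itself.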

I also wonder if there's a way to 1) asynchronously check whether input
is available on a port; 2) listen to all 4 ports simultaneously and
sleep until input appears on one of them (like "select" in unix).

> You plop in a mult-threaded TCP server if you have one in the
> library just like on a PC.

That is pretty impressive with just 64 words of ram...

>> Are you saying colorforth has this stuff now?

> Did you read the paper explaining it?

Probably not--which paper?

> No. I was saying that VentureForth was designed for drag and drop
> programmers who want to plop things in from a library

Oh I see, I didn't realize that.

Albert van der Horst

unread,
May 2, 2011, 8:01:49 AM5/2/11
to
In article <2195d916-e3ca-4c72...@i39g2000prd.googlegroups.com>,
foxchip <f...@ultratechnology.com> wrote:
>On Apr 26, 3:22 am, Albert van der Horst <alb...@spenarnc.xs4all.nl>

>wrote:
>> >I found this very helpful when I spent a couple of days creating

Please note: this happened a few years ago.

>> >the talking voltmeter demo that used five nodes. I loaded programs


>
>> That would tell me there is some seriously wrong in the
>> marketing department.
>
>You don't seem to pay a lot of attention to details like
>who did what, when, on what, at which company, how it worked,
>or how simple some things are that seem like miracles to you.

No, I don't pay much attention to details in your long
repetitious posts.
They tend to be vague and, may I say so, overly pretentious.
Still, a talking voltmeter should have drawn my attention.

>
>> Instead of making the world understand that this chip can
>> do a crystal oscillator with just the crystal (even if it
>> must be very low frequency), you embark on new projects.
>
>That reference was a different company, different compiler,
>different application, done years ago; you have things going
>backwards in time, since it is hard to "embark on new projects"
>that were actually done in a few days, a few years ago.

I think you should at least have mentioned that it was
done at a different company, with a different compiler.
So we could draw the conclusion that it was hardly relevant.
Can you at least comment on how well this project will run
on the Green Array chips?

If you are not working on the crystal oscillator and not
on the talking Voltmeter, what are you working on lately?

>
>> It's like with Jesus. A small miracle like turning water
>> into wine is very convincing, if it happens before your
>> very eyes.
>
>I think the crystal example is a good easy project for
>children to do as a learning exercise for starters with
>limited understanding and background.

You shouldn't downplay the wine miracle. It knocked people's
socks off, because they didn't understand how Jesus
did it. The understanding of children nowadays would
have impressed Greek philosophers, but they have been taught,
which is different from inventing.

This is a silly rationalisation of a decision not to beef up
the crystal document. From a marketing point of view this
decision is weird, if not indefensible. Unless, of course,
Green Arrays can't. That is what the world thinks.
What I think is that a practical circuit with a 10 MHz
crystal is definitely non-trivial.

foxchip

unread,
May 2, 2011, 10:51:53 AM5/2/11
to
On May 2, 4:01 am, Albert van der Horst <alb...@spenarnc.xs4all.nl>
wrote:

> No, I don't pay much attention to details in your long
> repetitious posts.

Yes, I have heard you make that excuse before for your
ignorance and repeated posting of untruths.

Then please don't ask me to repeat myself again if you simply
refuse to read my answers and documentation, or to watch the
videos of presentations by Chuck, by me, and by others, so
that you can brag about your ignorance and post insults.

> They tend to be vague and, may I say so, overly pretentious.
> Still a talking voltmeter should have drawn my attention.

Many of the presentations I have done for SVFIG and papers I
posted were directly in answer to questions asked in c.l.f
or the absurd stuff that gets posted over and over by people
who say that they don't read the documentation, complain
that it never existed, and don't bother looking at real
details until many years after the fact.

When I read the nay-sayers, detractors, Forth-haters, and
people who refuse to read real documentation writing things
like "Jeff Fox does not use abstractions and advises other
Forth programmers to not use abstractions" or posting their
own crazy code or made-up facts, I am likely to post details
of some of the abstractions I do use, post real code, and
actual facts, whether you personally like it or bother to
pay attention or not. Some people do learn, and I have
helped a few who did pay attention earn a million
here or there along the way.

> I think you should at least have mentioned that it was
> done at a different company, with a different compiler.

I not only mentioned it, I gave a long presentation on the
details, which was videotaped in 2008. The data flow stuff
was discussed in c.l.f in explanations of the 3D vision
project done with BMW in 2007 and 2008. The talking
voltmeter tutorial was presented in 2008, with more
details and discussions in c.l.f before Green Array
Chips began. I mentioned it again when you posted
the insults where you confused the past for the future.

I did think Greg Bailey did an excellent job in his Forth
Day presentation in 2010 of addressing all the questions
and concerns that I had seen people raise in discussions
in c.l.f in 2009 and 2010 with the exception of the
questions I and others addressed about the use of the
tools or the technical details of how things work.

We had quite a few discussions of VentureForth last
year in c.l.f. and I moved on after being told by
Elizabeth in no uncertain terms that I actually didn't
write it at all.

I thought the funniest thing presented at last year's
Forth Day was the offhand comment Greg made about how
there was no hidden clause in the colorforth license
saying that if you write any code for it, you won't
own what you wrote.

I mentioned to Greg and Chuck the last time I was up
in Incline Village NV that I thought the comment was
funny. Chuck's response was that the felt it was
better that Forth Inc. claimed ownership and authorship
than that it be tied up and kept locked away by TPL.

Of course I could say a lot of things about it, but I
know what sort of responses are likely in c.l.f. I
have considered explaining in more detail things we did
that were not in the public release; the explanation
of the data-flow algebra tools was one example of that.

> So we could draw the conclusion that it was hardly relevant.
> Can you at least comment on how well this project will run
> on the Green Array chips?

I commented on that in detail in c.l.f about a year ago. You
say you don't read my "long repetitious posts," but provide
your own versions of things, then ask me to repeat myself.
I think I have wasted enough time addressing your noise
on that.

> If you are not working on the crystal oscillator and not
> on the talking Voltmeter, what are you working on lately?

Those were the sorts of things that were designed to teach
people with interest or even children how to do something
easy in a day and to answer questions that were asked.

The things I have done more recently were presented as
part of the plan for 2011 at last year's SVFIG Forth Day at
the end of 2010. I know that some people don't like to
discuss presentations and documentation until many years
later, when they complain that they never bothered to look at
the details when their questions were first answered.

I think you just got upset when I reviewed the examples
of the sample code you had written and posted a while
back that exposed that you hadn't even bothered to learn
the most basic things like -IF THEN introduced in the
eighties and explained again and again in the context
of answers to questions about Sh-Boom, P21, i21, F21,
P8, P32, X25, SEAforth24, SEAforth40, GA32, GA40, GA4,
and GA144. After all, things like -IF THEN were documented
and explained hundreds of times in the last twenty years,
yet you somehow missed all of it.

You prefer to brag that you don't read documentation,
don't watch presentations, and don't bother to read
answers no matter how much there is or how many times
your questions actually get answered. At least you
are not alone in that.

Best Wishes

foxchip

unread,
May 2, 2011, 11:20:05 AM5/2/11
to
On May 1, 11:14 pm, Paul Rubin <no.em...@nospam.invalid> wrote:
> Thanks, maybe one issue is that I haven't seen your tools in action, so
> I have to just go by the chip data sheet.  Do your tools have manuals
> online?  If yes, I haven't seen them.

I think this is where I came in when over a year ago you were
complaining that the megabytes of documentation, gigabytes of
videos, and the many tutorials and explanations I had given
didn't exist at all. I do get tired of repeating myself to
people who prefer to argue that they can keep their eyes and
ears and mind closed.

It would be so much easier if people would listen to answers
or read documentation or even watch videos rather than just
ask the same questions over and over.

> I also wonder if there's a way to 1) asynchronously check whether input
> is available on a port;

Sure, that's basic stuff documented a hundred times in a hundred
places. The IOCS register contains the status of neighbor
ports. One can read it without halting and see if any neighbors are
sleeping waiting for responses to port reads or writes.

> 2) listen to all 4 ports simultaneously and
> sleep until input appears on one of them (like "select" in unix).

Sure, that's basic stuff documented a hundred times in a hundred
places. It's called a multiport read to the address RDLU, which stands
for Right, Down, Left, Up. This name is also the order in which
the status bits appear in the IOCS register.
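
In sketch form, again arrayForth-flavored and from memory, so verify the
names, addresses, and bit positions against the F18A reference:

   : port? ( mask -- flag )   \ poll: read IOCS without halting and
      io a! @ and ;           \ test the status bit for a given port
   : select ( -- x )          \ a fetch from the multiport address
      rdlu a! @ ;             \ sleeps until Right, Down, Left or Up
                              \ delivers a word

The multiport fetch wakes on the first neighbor to write, which is about
as close to unix select as four instructions can get.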

I get the impression more and more that you still haven't read
any documentation.

> > Did you read the paper explaining it?
>
> Probably not--which paper?

As I say, I think this is where I began discussions with you
about a year ago when you were complaining that the explanations,
documentation, and videos didn't exist. I do get tired of
repeating myself to answer your same basic questions when you
simply don't bother to look at documentation.

The answer to that question is that I posted that explanation and
further documentation to my 2008 Forth Day presentation in my
blog at http://www.ultratechnology.com/blog.htm#ForthDay2008

where there is also a link to

A Transformational Algebra for Communicating Sequential Process
Data-Flow Diagram Statements in Classes of Parallel Forthlet
Objects for Design, Automated Place and Route, and Application
Development on the SEAforth Architecture.

http://www.ultratechnology.com/CSP-data-flow_diagrams.doc

and where there was also code to "plop down" a talking
voltmeter demo. The example given did not show the GUI
based drag and drop interface done by one of the Russian
programmers, but it did mention it.

> > No. I was saying that VentureForth was designed for drag and drop
> > programmers who want to plop things in from a library
>
> Oh I see, I didn't realize that.  

Yes. Now I would prefer to move on from the 2008 documentation
of the work done before that.

Best Wishes

Paul Rubin

unread,
May 4, 2011, 12:26:28 AM5/4/11
to
foxchip <f...@ultratechnology.com> writes:
>> I also wonder if there's a way to 1) asynchronously check whether input
>> is available on a port;

> Sure, that's basic stuff documented a hundred times in a hundred places.
> The IOCS register contains the status of neighbor
> ports. One can read it without halting and see if any neighbors are
> sleeping waiting for responses to port reads or writes.

It's not in the obvious place, which is the chip data book, but I was
able to find some more info once I knew what to look for, so thanks.

>> 2) listen to all 4 ports simultaneously

> Sure, that's basic stuff documented a hundred times in a hundred places

I hope at some point GA can collect all this info into one document (it
could be called something like a "reference manual") instead of
expecting people to keep tabs on what's scattered in 100 places around
the internet.

> It's called a multiport read to the address RDLU, which stands ...

> I get the impression more and more that you still haven't read
> any documentation.

Similarly, the data book said nothing about this; it just said things
like "when you read from a port, your node sleeps til data is
available". But with this new info I found a passing reference to
multiport reads in a doc I'd already looked at.

> blog at http://www.ultratechnology.com/blog.htm#ForthDay2008
> where there is also a link to...
> http://www.ultratechnology.com/CSP-data-flow_diagrams.doc

Thanks, this was an informative document.

Greg Bailey

unread,
May 7, 2011, 11:11:21 PM5/7/11
to
"Paul Rubin" wrote in message news:7x39kvw...@ruckus.brouhaha.com...

foxchip <f...@ultratechnology.com> writes:
>> I also wonder if there's a way to 1) asynchronously check whether input
>> is available on a port;

> Sure, that's basic stuff documented a hundred times in a hundred places.
> The IOCS register contains the status of neighbor
> ports. One can read it without halting and see if any neighbors are
> sleeping waiting for responses to port reads or writes.

It's not in the obvious place, which is the chip data book, but I was
able to find some more info once I knew what to look for, so thanks.

>> 2) listen to all 4 ports simultaneously
> Sure, that's basic stuff documented a hundred times in a hundred places

I hope at some point GA can collect all this info into one document (it
could be called something like a "reference manual") instead of
expecting people to keep tabs on what's scattered in 100 places around
the internet.

If by "chip data book" you refer to DB002, please refer to its first section
in which Related Documentation is identified. GA documentation is factored
to avoid redundancy, which should come as no surprise to anyone :-)

Greg Bailey

unread,
May 9, 2011, 1:23:58 AM5/9/11
to
Albert,


As regards your questions about what Jeff was working on *recently*, by
which I take it you mean the past weeks since Jeff was hospitalized for
congestive heart failure, I am happy to report that Jeff had been working on
the Automated Testing System (ATS) which we need to implement and use before
we can ship chips to our paying customers. A thorough description of this
system is already in draft and will be publicly released when it is
complete.

I wish you well - Greg


Greg Bailey

unread,
May 9, 2011, 1:35:21 AM5/9/11
to
Apologies, my reply to this message last night was hidden in the body of the
quoted message.

The first section of each of our Data Books has a heading for "Related
Documentation."

In the case of the Chip data book, DB002, this section reads as follows:

1.2 Related Documents
This book describes this particular model of GreenArray chip, including its
array and I/O configuration, pin-out, ROM contents, packaging, and its
electrical and physical characteristics. In the interest of avoiding
needless and often confusing redundancy, it is designed to be used in
combination with other documents describing standard architecture and other
components of the chip.
The general characteristics and programming details for the F18A computers
and I/O used in this chip are described in a separate document; please refer
to F18A Technology Reference. The boot protocols supported by this chip are
detailed in Boot Protocols for GreenArrays Chips. The current editions of
these, along with many other relevant documents and application notes as
well as the current edition of this document, may be found on our website at
http://www.greenarraychips.com . It is always advisable to ensure that you
are using the latest documents before starting work.

As I wrote here recently, GreenArrays solicits e-mail from anyone who finds
deficiencies in our documentation.

Thanks - Greg


"Paul Rubin" wrote in message news:7x39kvw...@ruckus.brouhaha.com...

foxchip <f...@ultratechnology.com> writes:

Paul Rubin

unread,
May 9, 2011, 11:11:08 AM5/9/11
to
"Greg Bailey" <gr...@greenarraychips.com> writes:
> Apologies, my reply to this message last night was hidden in the body
> of the quoted message.... RDLU ...

Thanks, the info I wanted was in the F18A Technology Reference (DB001),
pp. 11-12. I meant to post that earlier.
