
Would green arrays produce something with a web browser? like a cheap appliance?


gavino

Jan 10, 2012, 5:28:39 AM
Would green arrays produce something with some persistence and a web browser? perhaps staggeringly cheap? or perhaps something like plan9's 9p that could replace the web with a network file system or some other metaphor?

Elizabeth D. Rather

Jan 10, 2012, 1:24:07 PM
No. Their business is embedded systems, mostly very small low-power
devices, as you would know if you read their web site.

Cheers,
Elizabeth

--
==================================================
Elizabeth D. Rather (US & Canada) 800-55-FORTH
FORTH Inc. +1 310.999.6784
5959 West Century Blvd. Suite 700
Los Angeles, CA 90045
http://www.forth.com

"Forth-based products and Services for real-time
applications since 1973."
==================================================

BruceMcF

Jan 10, 2012, 1:49:44 PM
On Jan 10, 5:28 am, gavino <gavcom...@gmail.com> wrote:
> Would green arrays produce something with some persistence and a web browser? perhaps staggeringly cheap?  or perhaps something like plan9's 9p that could replace the web with a network file system or some other metaphor?

a web browser that replaces the web would be useless because people
use web browsers to browse the stuff on the web and the stuff that is
on the web is on the web and not on the web replacement?

and staggeringly cheap means requiring staggering sales volume to
cover any fixed costs at all? for instance a bot to replace gavino
could be staggeringly cheap but since nobody would pay anything for it
the profit margin would be a minus percent?

Bernd Paysan

Jan 10, 2012, 2:48:43 PM
BruceMcF wrote:
> a web browser that replaces the web would be useless because people
> use web browsers to browse the stuff on the web and the stuff that is
> on the web is on the web and not on the web replacement?

Well, Gavino's questions usually sound stupid, but your answer is not
right, either. Apple's App Store and Google's Android Market are full of
apps, which just allow access to web content in a way that is an
improvement over HTML. Often a quite significant improvement.

So there is a point to rethink the web, and do it better (and that does
not mean gradual improvements like HTML5). After all, as content
publisher, you don't want an HTML page, and two apps for your customers,
you want one single thing, easier to maintain.

--
Bernd Paysan
"If you want it done right, you have to do it yourself"
http://bernd-paysan.de/

Jason Damisch

Jan 10, 2012, 5:45:07 PM
On Jan 10, 2:28 am, gavino <gavcom...@gmail.com> wrote:
> Would green arrays produce something with some persistence and a web browser? perhaps staggeringly cheap?  or perhaps something like plan9's 9p that could replace the web with a network file system or some other metaphor?

No, but you can investigate iTV, as they were developing a cheap web
browser which might have been pretty cool. But the internet and
internet programming have become so complex due to such technologies as
Javascript and Flash that a simple cheap web browser would in fact
have limited scope and abilities, and only be able to use a subset of
the internet. This would not be a mass appeal device.

Brad

Jan 10, 2012, 8:04:54 PM
On Jan 10, 3:28 am, gavino <gavcom...@gmail.com> wrote:
> ... that could replace the web with a network file system or some other metaphor?

Maybe Gavino could be replaced with a Forth program that generates
random questions. Maybe Gavino _is_ such a program.

GreenArrays chips are for devices that bear no resemblance to a
computer: no screen, no network, no keyboard, etc.

-Brad

Arnold Doray

Jan 10, 2012, 9:31:48 PM
With a high speed connection, it might be possible to offload the heavy
lifting (CSS rendering, Javascript interactivity) to a server. Your "Web
Device" doesn't do very much apart from displaying the resulting image,
handling the network connection and relaying the user input. The new Kindle Fire's
browser takes this approach.

The GA144 could conceivably be used in this context. You could probably
strip out some of the peripheral chips from existing designs, because the
GA144 can do a lot of these in software. CM illustrates some of this
capability in the latest Forth Day "fireside" video. I found it
absolutely fascinating.


Cheers,
Arnold

Jason Damisch

Jan 10, 2012, 11:35:28 PM
OnLive does this for video games. I had Onlive installed on my
computer, but I didn't find much on it which I wouldn't rather just
purchase as a game to run on my own hardware. I determined that I
would actually save money by purchasing the very few games that I
would play, and I don't play video games very much these days anyway.
I'm not that impressed with cloud gaming at the moment. My attitude
could change though.

I thought that a GA144 could be used to create one of those cheap
knock off game consoles which cost $40 and have a hundred games
already in ROM. The hardware could be cheap enough in quantity, but
cost of software would be a show stopper, especially if the built in
games were of any real quality. The quality of the built in games
for these knock off consoles has been low enough to hinder their
popularity.

Jason

Arnold Doray

Jan 11, 2012, 2:21:12 AM
That's an interesting idea. I can imagine a gaming company with deep
pockets making these cheap GA144 based consoles and selling them at cost.

The user pays for the high quality games. This could be done in stages
-- provide a sample of free games and higher levels are paid.

There's actually a huge target market for this -- the (pre)teens. I
should know -- I have a hard time stopping mine from playing Angry Birds.
This age group is also unlikely to be given a dedicated handphone for
playing games.

Of course, you'd still have to change your game's entire programming
model. Even if someone comes up with a "C" that targets the GA144, it's
still hard going since parallel programming is hard. I don't think it's
possible to just take a C program and parallelize it automatically for the
GA144. You'd have to start afresh. But it should not be too hard for simple
games.

Cheers,
Arnold


Rod Pemberton

Jan 11, 2012, 7:22:46 AM
"Arnold Doray" <inv...@invalid.com> wrote in message
news:jeisak$von$1...@dont-email.me...
...

<OT browsers>

> With a high speed connection, it might be possible to offload the heavy
> lifting (CSS rendering, Javascript interactivity) to a server.
>

Certain browsers, e.g. Opera, natively support proxies and webpage
compression intended for slow connections such as 56K dialup. I think most
wireless connections would qualify as a "slow" connection in this case,
even if not that slow. I'm not sure if you can use Opera on portable
devices though. You'll have to check for yourself:

http://www.opera.com/


Rod Pemberton



Rod Pemberton

Jan 11, 2012, 7:23:40 AM
"Arnold Doray" <inv...@invalid.com> wrote in message
news:jejd97$rkq$2...@dont-email.me...
> On Tue, 10 Jan 2012 20:35:28 -0800, Jason Damisch wrote:
>
...

<OT gaming consoles>

> > I thought that a GA144 could be used to create one of those cheap knock
> > off game consoles which cost $40 and have a hundred games already in
> > ROM. The hardware could be cheap enough in quantity, but cost of
> > software would be a show stopper, especially if the built in games were
> > of any real quality. The quality of the built in games for these knock
> > off consoles has been low enough to hinder their popularity.
> >
>
> That's an interesting idea.

Aren't there a few devices like that already, e.g., C64 with games, Atari
2600 with games ...

They aren't GA144 based.

> I can imagine a gaming company with deep pockets making these
> cheap GA144 based consoles and selling them at cost.

That sounds like what MS did with XBox ...

> The user pays for the high quality games. This could be done in stages
> -- provide a sample of free games and higher levels are paid.
>

That sounds like what MS did with XBox ...

> I have a hard time stopping mine from playing Angry Birds.

You could always retrain them for the new economy (sarcasm):
http://games.adultswim.com/hemp-tycoon-puzzle-online-game.html

> This age group is also unlikely to be given a dedicated handphone
> for playing games.

The modern version of LED handheld football ... ? It's viable.


Rod Pemberton


Rod Pemberton

Jan 11, 2012, 7:24:04 AM
"Brad" <hwf...@gmail.com> wrote in message
news:e89dc94a-7290-47f7...@y12g2000yqc.googlegroups.com...
...

> Maybe Gavino could be replaced with a Forth program that generates
> random questions. Maybe Gavino _is_ such a program.

No, his questions are too focused and repetitive. AI returns results
slightly less vague than the magic 8-ball.


Rod Pemberton



Arnold Doray

Jan 11, 2012, 9:21:18 AM
Good observations Rod. That's just my ignorance showing. The last game I
played (and enjoyed) was Dune2. I got it out of my system then. Everything
else I've seen bores me. :)

I thought the value of the GA144 in this context is that it allows you to
do a lot of bit banging. It's amazingly versatile. You can move into
software what you could only previously do in hardware. This means lower
$ due to parts you don't need, and also because it simplifies console
design and fabrication.

Cheers,
Arnold





Jason Damisch

Jan 11, 2012, 12:02:00 PM

> Aren't there a few devices like that already, e.g., C64 with games, Atari
> 2600 with games ...

Yes, there are those. There are the ones that fit inside of a game
controller, and the game controller itself is the entire console. There
is the Atari Flashback which is an actual Atari 2600 shrunk down with
the games in a single ROM and with the pins for the cartridge port
unconnected. Then there are the Chinese knockoffs which are entirely
different hardware, which have better hardware, but whose games are not
very good, and which get returned in droves after Christmas because the
customers are not happy with them.

> They aren't GA144 based.

True

> > I can imagine a gaming company with deep pockets making these
> > cheap GA144 based consoles and selling them at cost.
>
> That sounds like what MS with XBox ...

The idea is to not compete with Sony or MS, because you just can't. The
idea, which actually, probably isn't viable anyway, is to create
something like the Chinese knock offs, but which are actually fun to
play. I think that I'm just sort of playing around with ideas here.

> The modern version of LED handheld football ... ?  It's viable.

If you can get good cheap screens, maybe you can use the extra horse
power for fantastic A.I. Otherwise I'm not sure if there would be an
advantage or not.

> Rod Pemberton

Jason Damisch

Jan 11, 2012, 11:55:36 AM
Well, I'm sure that the GA144 is not a dedicated graphics system, so
most of the cores would be used to generate the graphics. By adding the
ability to upload and install new games, you have to add an SD card and
then your hardware costs go up. At some point you start to compete
against Sony and Microsoft, and the big kids won't accept a no name
brand console in place of a Sony or XBox with zillions of popular games
out for them already. The big kids are very fashion and brand
sensitive. The little kids might like a $40 stocking stuffer *IF* the
games were good enough, but then again, software development costs kill
you unless you have a big toy company actually invest heavily into the
project.

You might be better off sticking to a different model or approach if
you want to include GA chips inside of toys.

Jason

( just making conversation )

BruceMcF

Jan 11, 2012, 2:01:16 PM
On Jan 10, 2:48 pm, Bernd Paysan <bernd.pay...@gmx.de> wrote:
> BruceMcF wrote:
>> a web browser that replaces the web would be useless because people
>> use web browsers to browse the stuff on the web and the stuff that is
>> on the web is on the web and not on the web replacement?

> Well, Gavino's questions usually sound stupid, but your answer is not
> right, either.  Apple's App Store and Google's Android Market are full
> of apps, which just allow access to web content in a way that is an
> improvement over HTML.  Often a quite significant improvement.

But how many of those apps that "replace the web" are web browsers?

> So there is a point to rethink the web, and do it better (and that
> does not mean gradual improvements like HTML5).  After all, as content
> publisher, you don't want a HTML page, and two apps for your
> customers, you want one single thing, easier to maintain.

Ceteris paribus, but all other things are rarely equal. Content
publishers may *want* that, but I rather see it actually heading
the opposite direction, with content publishers competing for content
by proliferating means of access rather than by winnowing down means
of access.

Still, that dimension of the question does not seem to be actually
present in the question: rather, you projected that content into the
question. That encourages gavino to continue posting random questions
in hopes that people will read something into them of interest.

rickman

Jan 15, 2012, 2:05:57 PM
What peripheral chips could you push inside the GA144? I think it to
be highly unlikely you could get Ethernet at any standard speed above
10 Mbps in a GA144. Same with USB above 12 Mbps. You could probably
drive a VGA signal at fairly low resolution, but who wants that? I
think 1024 pixels would be required even for a state of the art
handheld device.

The peripherals are where the GA144 falls short in many ways. If you
don't mind rewriting all the code and limiting the applications to a
minimum subset the chip will be a good platform. But you will need to
add devices to get the standard interfaces running at standard rates;
USB, Ethernet, HDMI, ...

Rick

Arnold Doray

Jan 15, 2012, 10:56:31 PM
I don't think the problems you mention are insurmountable. 10 Mbps could
be OK if you're clever in how you compress the image data you're sending
from server to console. The server does the heavy lifting, the console
only displays and relays user input.
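
To put a rough number on "clever" here, a back-of-the-envelope sketch. The display figures (1024x768, 16 bits per pixel, 15 frames per second) are purely hypothetical and not taken from any real device; only the 10 Mbps link rate comes from this thread.

```python
# Back-of-the-envelope check: how much compression would a 10 Mbps link need?
# All display figures below are hypothetical, chosen only for illustration.

def raw_video_bps(width, height, bits_per_pixel, frames_per_second):
    """Uncompressed bit rate of a stream of full frames."""
    return width * height * bits_per_pixel * frames_per_second

def required_compression_ratio(raw_bps, link_bps):
    """Factor by which the stream must shrink to fit the link."""
    return raw_bps / link_bps

raw = raw_video_bps(1024, 768, 16, 15)   # assumed: 1024x768, 16 bpp, 15 fps
ratio = required_compression_ratio(raw, 10_000_000)
print(f"raw stream: {raw / 1e6:.1f} Mbps, needs about {ratio:.1f}:1 compression")
```

Sending only the regions that changed, rather than full frames, would lower the requirement further; the sketch deliberately ignores that.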

> The peripherals are where the GA144 falls short in many ways. If you
> don't mind rewriting all the code and limiting the applications to a
> minimum subset the chip will be a good platform. But you will need to
> add devices to get the standard interfaces running at standard rates;
> USB, Ethernet, HDMI, ...
>

Of course you'd have to do this. But it isn't that hard if you control
all the hardware. You'd only need one driver for each.

Cheers,
Arnold


rickman

Jan 16, 2012, 11:01:01 AM
I think you are missing my point. Or maybe I am missing yours. There
are few useful interfaces you can push inside the GA144. It can't
handle most useful interfaces that I can see. I am thinking of
building a GA144 board and it looks like I would need to add a
conventional MCU to provide useful interfaces, USB, Ethernet, etc.

But maybe I am wrong. What interfaces did you have in mind?

Rick

Arnold Doray

Jan 16, 2012, 11:23:17 AM
On Mon, 16 Jan 2012 08:01:01 -0800, rickman wrote:

>
> I think you are missing my point. Or maybe I am missing yours. There
> are few useful interfaces you can push inside the GA144. It can't
> handle most useful interfaces that I can see. I am thinking of building
> a GA144 board and it looks like I would need to add a conventional MCU
> to provide useful interfaces, USB, Ethernet, etc.
>
> But maybe I am wrong. What interfaces did you have in mind?
>
> Rick

I was comparing the GA144 with something of similar computing muscle &
wattage (like the Atoms, ARMs, etc). My thinking is that compared to
solutions that use these, you could conceivably move some hardware
functions into software by using a GA144 (or more than one, if
necessary). Of course, you can't completely avoid using extra hardware
for Ethernet, USB, etc.

Cheers
Arnold

Paul Rubin

Jan 16, 2012, 1:30:58 PM
rickman <gnu...@gmail.com> writes:
> I am thinking of building a GA144 board and it looks like I would need
> to add a conventional MCU to provide useful interfaces, USB, Ethernet, etc.

This doesn't seem so terrible. The GA144 is a big expensive (and
powerful) part compared to an MCU. I bet a typical PC motherboard or
fancy cell phone has dozens of MCU's, so it doesn't seem like a big deal
if complex interfaces for a GA144 also need a few external components.

The GA144 of course can handle GPIO unusually well, by dedicating CPU
nodes to I/O pins. It should be able to do traditional serial or
parallel i/o pretty well, and there are serial or parallel to USB bridge
chips that handle the USB protocol (presumably also for ethernet). The
main bottleneck for doing fancy protocols directly in the GA144 is
probably code space in the nodes rather than processing cycles.

Albert van der Horst

Jan 16, 2012, 3:53:24 PM
In article <jf1itl$b22$2...@dont-email.me>,
Arnold Doray <inv...@invalid.com> wrote:
<SNIP>
>
>I was comparing the GA144 with something of similar computing muscle &
>wattage (like the Atoms, ARMs, etc). My thinking is that compared to
>solutions that use these, you could conceivably move some hardware
>functions into software by using a GA144 (or more than one, if
>necessary). Of course, you can't completely avoid using extra hardware
>for Ethernet, USB, etc.

This is limited to level converters. GA144 should be able to
handle Ethernet and USB up to 100 MHz bit-banging.

We are not talking about 12 nm GA's. These could replace the chips
in SATA drives.
I'm much more optimistic in this area than with AI applications.

>
>Cheers
>Arnold


--
--
Albert van der Horst, UTRECHT,THE NETHERLANDS
Economic growth -- being exponential -- ultimately falters.
albert@spe&ar&c.xs4all.nl &=n http://home.hccnet.nl/a.w.m.van.der.horst

rickman

Jan 16, 2012, 3:39:40 PM
On Jan 16, 1:30 pm, Paul Rubin <no.em...@nospam.invalid> wrote:
> rickman <gnu...@gmail.com> writes:
> > I am thinking of building a GA144 board and it looks like I would need
> > to add a conventional MCU to provide useful interfaces, USB, Ethernet, etc.
>
> This doesn't seem so terrible.

I guess this is where we disagree. When I select an MCU the
incorporation of functions into the device is what I look for. I
don't want to have to add an Ethernet MAC to my MCU because I can buy
MCUs that include that. Many include the PHY as well. Same with
USB. If the CPUs inside the GA144 can't be used to replace interface
chips to the point that an external MCU needs to be added, the GA144
is no longer an MCU of any note.


> The GA144 is a big expensive (and
> powerful) part compared to an MCU.  I bet a typical PC motherboard or
> fancy cell phone has dozens of MCU's, so it doesn't seem like a big deal
> if complex interfaces for a GA144 also need a few external components.

Sure a PC has many MCUs in it. But I'm not building a PC. I want to
build small embedded systems. On a general purpose board it already
needs an FPGA added to the GA144 just to do level shifting as the
GA144 is 1.8 volt I/O only. I would prefer that I could implement the
interfaces of the day without adding other MCUs.

BTW, how exactly is the GA144 big, or even expensive? At less than
$15 each in some quantity they are way less than most of the high end
ARM devices. Yes, they are at the top end of MCUs, but not out of the
ballpark.


> The GA144 of course can handle GPIO unusually well, by dedicating CPU
> nodes to I/O pins.

That is where it falls short. GPIO often needs something other than
the CPU core voltage. So level shifting is needed. This is a real
shortcoming in a device built in a process some years old. Nearly all
MCUs are available in 5 volt tolerant versions.


> It should be able to do traditional serial or
> parallel i/o pretty well,

Any MCU can do async serial at RS-232 speeds. In fact I bid an FPGA
based design for a 16 serial port board a year or two ago and it was
underbid by someone planning software UARTs on a CM3 ARM.
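
For anyone who hasn't written one, the framing a software UART has to produce and check is small. A minimal sketch, assuming the common 8N1 format (one start bit, eight data bits LSB first, one stop bit); the function names are only illustrative, and the hard part on real hardware, the bit timing, is left out entirely.

```python
def uart_frame(byte):
    """Line levels for one 8N1 character: start bit (0), 8 data bits LSB first, stop bit (1)."""
    data = [(byte >> i) & 1 for i in range(8)]
    return [0] + data + [1]

def uart_unframe(levels):
    """Recover the byte from 10 sampled line levels, checking start and stop bits."""
    if len(levels) != 10 or levels[0] != 0 or levels[9] != 1:
        raise ValueError("framing error")
    return sum(bit << i for i, bit in enumerate(levels[1:9]))

print(uart_frame(0x41))
print(hex(uart_unframe(uart_frame(0x41))))
```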


> and there are serial or parallel to USB bridge
> chips that handle the USB protocol (presumably also for ethernet).

I think they are the opposite direction. When I talk about USB
interfaces I want host capability on the board. Yes, you can get that
in a chip, but why add a special chip when you can add an MCU with
all the interfaces you need? Then the question is why is the GA144
needed?


> The
> main bottleneck for doing fancy protocols directly in the GA144 is
> probably code space in the nodes rather than processing cycles.

No, processing speed is not likely to be a bottle neck while code
space may be. But that depends on the interface. I'd like to know
how to implement 100 Mbps Ethernet on the GA144 or 480 Mbps USB using
processors that only run 700 MIPS theoretical max. Even if you can
partition the code among the nodes, I don't think the interface can be
driven at the full bit rate. That is another reason to add the FPGA.
Perhaps adding some hardware for the front end will allow the software
to handle the rest. I don't know. But GreenArrays hasn't come out
with any info to explain how to do it either.

I guess I am really lamenting that GA is not showing us how to do the
things that you might want to do with this device. Other MCU vendors
provide all sorts of app notes and design info for their parts. Why?
Because designers won't use the parts if you don't. No one wants to
reinvent the wheel.

Rick

rickman

Jan 16, 2012, 3:52:45 PM
I am all ears. How would you implement high speed USB on a GA144
without adding an MCU or something just as expensive and using as much
board space? Like I said, I would just add an MCU that has USB
software available.

10 years ago MCUs may have had support for USB and Ethernet, but you
didn't get software handed to you. Now you do. Just saying the
interface can, in theory, be shoved into software is pointless when
other MCUs exist which provide both the hardware and software.

I don't see the GA144 as a super fast CPU. There is far too little
memory on each node to treat them like general purpose CPUs. I see
the GA144 more like a gate array, each node being an element to be
used to implement a portion of an application without considering if
the full processing speed is used. In that context most of the
potential CPU cycles will be wasted. In fact, the chip is designed to
optimize wasting CPU cycles by not burning power when idle and getting
into and out of idle state with zero overhead. Other multiple CPU
chips provide much larger amounts of memory. Using those leave you
with the problem of keeping the CPUs busy. That is a problem you
should ignore with the GA144 I feel. The problem really is what parts
of the application can you push into the software and I don't see
enough hardware around the edges (the I/Os) to properly facilitate
most apps.

Rick

Richard Owlett

Jan 16, 2012, 4:43:16 PM
rickman wrote:
> On Jan 16, 1:30 pm, Paul Rubin<no.em...@nospam.invalid> wrote:
> [ *MASSIVE SNIP* ]
>
>> The GA144 of course can handle GPIO unusually well, by dedicating CPU
>> nodes to I/O pins.
>
> That is where it falls short. GPIO often needs something other than
> the CPU core voltage. So level shifting is needed. This is a real
> shortcoming in a device built in a process some years old. Nearly all
> MCUs are available in 5 volt tolerant versions.

At board level, is that a legitimate objection?
[I'm so elderly as to think of I/O as 026's and line printers]

Thought there was a NEW_FANGLED thingy called "TriState buffer"
OKalreadymayalsoneedalevelshifter ps lackof spaces
intentional ;/

OK Owl "ducks" for cover as he composes a VERY bad pun for
another subject line

A. K.

Jan 16, 2012, 5:07:31 PM
On 16.01.2012 22:43, Richard Owlett wrote:
> rickman wrote:
>> On Jan 16, 1:30 pm, Paul Rubin<no.em...@nospam.invalid> wrote:
>> [ *MASSIVE SNIP* ]
>>
>>> The GA144 of course can handle GPIO unusually well, by dedicating CPU
>>> nodes to I/O pins.
>>
>> That is where it falls short. GPIO often needs something other than
>> the CPU core voltage. So level shifting is needed. This is a real
>> shortcoming in a device built in a process some years old. Nearly all
>> MCUs are available in 5 volt tolerant versions.
>
> At board level, is that a legitimate objection?
> [I'm so elderly as to think of I/O as 026's and line printers]

Well, son, you beat me here. :-)))))

rickman

Jan 16, 2012, 6:12:21 PM
On Jan 16, 3:53 pm, Albert van der Horst <alb...@spenarnc.xs4all.nl>
wrote:
> In article <jf1itl$b2...@dont-email.me>,
> Arnold Doray  <inva...@invalid.com> wrote:
> <SNIP>
>
>
>
> >I was comparing the GA144 with something of similar computing muscle &
> >wattage (like the Atoms, ARMs, etc). My thinking is that compared to
> >solutions that use these, you could conceivably move some hardware
> >functions into software by using a GA144 (or more than one, if
> >necessary). Of course, you can't completely avoid using extra hardware
> >for Ethernet, USB, etc.
>
> This is limited to level converters. GA144 should be able to
> handle Ethernet and USB up to 100 MHz bit-banging.
>
> We are not talking about 12 nm GA's. These could replace the chips
> in SATA drives.
> I'm much more optimistic in this area than with AI applications.

Why do you feel the GA chip can handle 100 Mbps serial interfaces?
Even if you could make that happen, I don't see 480 Mbps USB. There
are even little CM3 devices that can support High Speed USB.

Don't get me wrong. I'd love to see this happen. But running at less
than 7 instructions per bit, how can a GA144 node do any processing on
the serial bit stream? At 480 Mbps we are talking about less than 2
instructions per bit.
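
The arithmetic behind those figures, as a rough sketch. It assumes the 700 MIPS theoretical peak per node quoted earlier in this thread, treats the nominal data rate as the bit rate (line coding would only make things tighter), and uses an illustrative function name.

```python
# Rough per-bit instruction budget for a single node.
# Assumption: 700 MIPS is the theoretical per-node peak quoted earlier in the thread.
PEAK_MIPS = 700

def instructions_per_bit(bit_rate_mbps, mips=PEAK_MIPS):
    """Peak number of instructions one node can execute during one bit time."""
    return mips / bit_rate_mbps

RATES_MBPS = {
    "10 Mbps Ethernet": 10,
    "12 Mbps USB (full speed)": 12,
    "100 Mbps Ethernet": 100,
    "480 Mbps USB (high speed)": 480,
}

for name, rate in RATES_MBPS.items():
    print(f"{name:28s} {instructions_per_bit(rate):6.2f} instructions/bit")
```

Since 700 MIPS is a theoretical maximum, the real budget at 100 Mbps is somewhat under 7 instructions per bit, and at 480 Mbps it is under 2.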

Rick

rickman

Jan 16, 2012, 6:14:29 PM
I'm afraid that I have no idea what you are talking about...

Rick

Richard Owlett

Jan 17, 2012, 12:06:51 AM
All right already, it's been 30 years since I dealt with
tri-state buffers.
Level shifters can be trivial [one transistor and two
resistors {emitter to common, input to resistor to base,
output from junction of collector and resistor to supply rail}]

I'll let Jerry donate the 74xxxx numbers ;)


Paul Rubin

Jan 17, 2012, 1:36:25 AM
rickman <gnu...@gmail.com> writes:
> BTW, how exactly is the GA144 big, or even expensive? At less than
> $15 each in some quantity they are way less than most of the high end
> ARM devices. Yes, they are at the top end of MCUs, but not out of the
> ballpark.

I would have thought $15 is a big expensive part by most embedded-cpu
standards, and the external parts (USB interface etc.) we're talking
about would be $1 or less.

I think you could do 12 mbps USB by bit banging, if there's enough code
space. 480 mbps would need external support. 100 mbit ethernet might
barely be doable if you've got a node doing nothing but deserialize bits
and shuttle them to another node for processing. I would think not that
many applications really need those fast interfaces. Can 10 mbit
ethernet co-exist with 100 mbit on the same port on a typical PC? Maybe
10 mbit is the easiest approach if you want to do ethernet in software.

> Any MCU can do async serial at RS-232 speeds. In fact I bid an FPGA
> based design for a 16 serial port board a year or two ago and it was
> underbid by someone planning software UARTs on a CM3 ARM.

16 ports in software on a low end ARM at some high speed? Hmmm.
Recently I was thinking that multiport serial was a possible good use
for a GA144. An ARM would sure be easier...

> but why add a special chip when you can add an MCU with all the
> interfaces you need? Then the question is why is the GA144 needed?

Well the natural idea is that you want the GA144's high computational
parallelism, but finding applications for all that cpu speed that fit
within the GA144's unusual constraints hasn't been that easy.

> I guess I am really lamenting that GA is not showing us how to do the
> things that you might want to do with this device. Other MCU vendors
> provide all sorts of app notes and design info for their parts. Why?
> Because designers won't use the parts if you don't.

GA is trying to get people to visit them and write app notes (I
mentioned the video link a few threads ago) in exchange for eval boards.

Albert van der Horst

Jan 17, 2012, 6:45:19 AM
In article <893cb497-7588-4857...@k28g2000yqc.googlegroups.com>,
rickman <gnu...@gmail.com> wrote:
>
>Why do you feel the GA chip can handle 100 Mbps serial interfaces?
>Even if you could make that happen, I don't see 480 Mbps USB. There
>are even little CM3 devices that can support High Speed USB.
>
>Don't get me wrong. I'd love to see this happen. But running at less
>than 7 instructions per bit, how can a GA144 node do any processing on
>the serial bit stream? At 480 Mbps we are talking about less than 2
>instructions per bit.

Fetching a byte is a few instructions per bit.
Handling the byte (done by another processor) is an order of magnitude
less fast.
Tucking a checksum to a stream, can be done streaming.
It depends a bit on the protocol, but 100 Mbps seems feasible to
me with the current chips.
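
What "done streaming" means for a checksum can be shown in a few lines: the running value is updated chunk by chunk and never needs the whole frame in memory. This sketch uses Python's zlib.crc32, which as far as I know uses the same CRC-32 polynomial as the Ethernet frame check sequence; the bit ordering on a real wire is not modelled, and the helper name is made up.

```python
import zlib

def streaming_crc32(chunks):
    """Fold a CRC-32 over data that arrives in pieces, keeping only the running value."""
    crc = 0
    for chunk in chunks:
        crc = zlib.crc32(chunk, crc)   # second argument is the running CRC so far
    return crc

frame = bytes(range(256))
pieces = [frame[:10], frame[10:100], frame[100:]]
print(streaming_crc32(pieces) == zlib.crc32(frame))
```

It says nothing about whether the per-byte update fits into a node's instruction budget at 100 Mbps; that is the open question in this thread.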

480 Mbps USB is probably out of reach.

>
>Rick

Albert van der Horst

Jan 17, 2012, 6:53:57 AM
In article <VsqdnROWe8loBInS...@supernews.com>,
What is the joy of the GA144?

Having a thumbnail device with a solar cell on one side and a
micro SD on the other and the GA144 in between, playing mpegs.

Would you carry a square meter of printed circuit board
around your neck?
Would you be impressed by a device that can do what your cellphone
can (playing Mpegs) that is actually *larger* than a cellphone?

I must say that Rickman hits the nail on the head.

Groetjes Albert

rickman

Jan 17, 2012, 7:50:41 AM
On Jan 17, 6:45 am, Albert van der Horst <alb...@spenarnc.xs4all.nl>
wrote:
> In article <893cb497-7588-4857-8a6b-5bb60497f...@k28g2000yqc.googlegroups.com>,
>
> rickman <gnu...@gmail.com> wrote:
>
> >Why do you feel the GA chip can handle 100 Mbps serial interfaces?
> >Even if you could make that happen, I don't see 480 Mbps USB. There
> >are even little CM3 devices that can support High Speed USB.
>
> >Don't get me wrong. I'd love to see this happen. But running at less
> >than 7 instructions per bit, how can a GA144 node do any processing on
> >the serial bit stream? At 480 Mbps we are talking about less than 2
> >instructions per bit.
>
> Fetching a byte is a few instructions per bit.
> Handling the byte (done by another processor) is an order of magnitude
> less fast.
> Tucking a checksum to a stream, can be done streaming.
> It depends a bit on the protocol, but 100 Mbps seems feasible to
> me with the current chips.
>
> 480 Mbps USB is probably out of reach.

I think this is a VERY optimistic analysis. With only some five
instructions per bit I seriously doubt that bits can be collected from
a CLOCKED serial stream into bytes and passed on to another
processor. This will require maintaining a bit counter with a test to
know when the sample is assembled, the data being shifted, both need
to be cleared out when complete, data needs to be sync'd to a clock
signal. I think that is far too much to do in so few instructions.
This is a hard synchronous process. The input stream MUST be sampled
at the correct time. I suppose you could use one node to
synchronously sample bits and queue them on to the next node which
aggregates them into samples. But I'm not sure the aggregation can be
done at 100 MHz.
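
The instruction counts being quoted in this exchange follow from simple division. As a rough sketch only: the node speed below (about 666 million simple instructions per second) is an assumed figure, picked because it is consistent with the "less than 7 instructions per bit" at 100 Mbps mentioned earlier; the function name is made up for illustration.

```python
# Rough per-bit instruction budget for a single node.
# ASSUMPTION: roughly 666 million simple instructions per second per node.
# This is not a measured figure; it is chosen to match the numbers in the thread.
NODE_MIPS = 666.0


def instructions_per_bit(bit_rate_mbps, node_mips=NODE_MIPS):
    """Instructions one node can execute during a single bit time."""
    return node_mips / bit_rate_mbps


# Bit (or symbol) rates discussed in the thread, in Mbit/s.
for name, rate in [
    ("10 Mbps Ethernet", 10),
    ("12 Mbps USB", 12),
    ("100 Mbps serial", 100),
    ("125 Mbaud (100 Mbit Ethernet)", 125),
    ("480 Mbps USB", 480),
]:
    print(f"{name:32s} {instructions_per_bit(rate):6.1f} instructions per bit")
```

Under that assumption the budget is comfortable at 10 to 12 Mbps and shrinks to a handful of instructions at 100 Mbps and above, which is the whole argument here.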

Seat of the pants analysis is easy. I can't see anyone signing up to
do this.

I think my point still stands that it is not practical to implement
today's standard embedded application interface protocols in software
on the GA144. If it can be done at all, it will be a neat trick. But
someone will need to demonstrate that.

Rick

Albert van der Horst

Jan 17, 2012, 2:35:14 PM
In article <bJ-dndD049lwnIjS...@supernews.com>,
One transistor and two resistors for one hundred outputs,
piece of cake.

>
>I'll let Jerry donate the 74xxxx numbers ;)
>
>

Bernd Paysan

Jan 17, 2012, 7:00:18 PM
rickman wrote:
> I think this is a VERY optimistic analysis. With only some five
> instructions per bit I seriously doubt that bits can be collected from
> a CLOCKED serial stream into bytes and passed on to another
> processor. This will require maintaining a bit counter with a test to
> know when the sample is assembled, the data being shifted, both need
> to be cleared out when complete, data needs to be sync'd to a clock
> signal. I think that is far too much to do in so few instructions.
> This is a hard synchronous process. The input stream MUST be sampled
> at the correct time.

I'm very sceptical that GA144 can do anything beyond 10MBit Ethernet,
which is Manchester encoded at a 20MHz signal rate (Manchester code means
up to two signals per bit). Receiving 10MBit Ethernet seems to be
doable; sending, well, maybe the other side also tolerates quite some
different rates (after all, short/long signals in Manchester code are by
a factor two apart).
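
To make the "two signals per bit" and "short/long by a factor of two" points concrete, here is a small Python sketch of Manchester coding. It assumes the IEEE 802.3 polarity (0 is sent as high then low, 1 as low then high); the function names are invented for illustration and this is a bit-level model, not GA144 code.

```python
def manchester_encode(bits):
    """Encode data bits as half-bit line levels (assumed IEEE 802.3 polarity)."""
    levels = []
    for b in bits:
        # 1 -> low, high ; 0 -> high, low (a transition in the middle of every bit)
        levels.extend((0, 1) if b else (1, 0))
    return levels


def manchester_decode(levels):
    """Recover data bits from half-bit line levels; reject missing transitions."""
    if len(levels) % 2:
        raise ValueError("need an even number of half-bit levels")
    bits = []
    for first, second in zip(levels[0::2], levels[1::2]):
        if first == second:
            raise ValueError("no mid-bit transition: not valid Manchester code")
        bits.append(1 if (first, second) == (0, 1) else 0)
    return bits


def run_lengths(levels):
    """Lengths of runs of identical line level, in half-bit units."""
    runs = []
    count = 1
    for prev, cur in zip(levels, levels[1:]):
        if cur == prev:
            count += 1
        else:
            runs.append(count)
            count = 1
    if levels:
        runs.append(count)
    return runs


data = [1, 0, 1, 1, 0, 0]
line = manchester_encode(data)
print(line)               # twice as many line levels as data bits
print(run_lengths(line))  # only runs of 1 or 2 half-bits occur
```

Every data bit becomes two half-bit levels, so 10 Mbit/s needs a 20 MHz signal rate, and the line never stays at one level for more than two half-bits. That bounded run length is what lets a receiver tolerate some mismatch in rate.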

100 MBit Ethernet has a signal rate of 125MHz (4b/5b encoding), and you
indeed need to synchronize with the signal. Would be easier if the
GA144 had a fixed clock frequency at a multiple of 125MHz, but it
doesn't. Using a PHY chip makes things easier, because the PHY already
does the 4b/5b stuff, i.e. you only need to run at 25MHz, but you need 4
bits in parallel (plus clock).
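
The 125MHz and 25MHz figures come straight out of the 4b/5b ratio. The sketch below uses the standard 4B/5B data code groups; the helper names and the low-nibble-first ordering are assumptions made for illustration, and it models only the coding step, not the line signalling.

```python
# Standard 4B/5B data code groups: each 4-bit nibble maps to a 5-bit group.
FOUR_B_FIVE_B = {
    0x0: 0b11110, 0x1: 0b01001, 0x2: 0b10100, 0x3: 0b10101,
    0x4: 0b01010, 0x5: 0b01011, 0x6: 0b01110, 0x7: 0b01111,
    0x8: 0b10010, 0x9: 0b10011, 0xA: 0b10110, 0xB: 0b10111,
    0xC: 0b11010, 0xD: 0b11011, 0xE: 0b11100, 0xF: 0b11101,
}
DECODE = {group: nibble for nibble, group in FOUR_B_FIVE_B.items()}


def encode_byte(byte):
    """Return the two 5-bit groups for one byte (low nibble first, assumed order)."""
    return FOUR_B_FIVE_B[byte & 0xF], FOUR_B_FIVE_B[byte >> 4]


def decode_groups(low_group, high_group):
    """Inverse of encode_byte."""
    return (DECODE[high_group] << 4) | DECODE[low_group]


def line_rate_mbaud(data_rate_mbps):
    """5 line bits are sent for every 4 data bits."""
    return data_rate_mbps * 5 / 4


def nibble_clock_mhz(data_rate_mbps):
    """Clock seen behind a PHY that hands over 4 decoded bits in parallel."""
    return data_rate_mbps / 4


print(line_rate_mbaud(100), nibble_clock_mhz(100))
```

With a PHY doing this mapping, the processor only has to take a 4-bit nibble on each 25MHz clock instead of a single bit at 125MHz.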

--
Bernd Paysan
"If you want it done right, you have to do it yourself"
http://bernd-paysan.de/

rickman

Jan 17, 2012, 10:57:06 PM
There is nothing in the GA144 that precludes the use of a clock. I
don't think you can get a node to accurately sample 125 Mbps even with
a clock. I suppose that using a PHY brings the design into the realm
of possibility. Does the PHY use a PLL to synchronize to the bus data
rate?

Rick

Albert van der Horst

Jan 18, 2012, 11:20:09 AM
In article <jf522j$llr$1...@online.de>, Bernd Paysan <bernd....@gmx.de> wrote:
>
>100 MBit Ethernet has a signal rate of 125MHz (4b/5b encoding), and you
>indeed need to synchronize with the signal. Would be easier if the
>GA144 had a fixed clock frequency at a multiple of 125MHz, but it
>doesn't. Using a PHY chip makes things easier, because the PHY already
>does the 4b/5b stuff, i.e. you only need to run at 25MHz, but you need 4
>bits in parallel (plus clock).

Of course I assumed that the 125 MHz is available as a clock on one of
the other pins of the dedicated processor that does the very lowest
bit handling. There is no reason why this wouldn't be feasible, but
it would probably cost us another processor or two.

>
>--
>Bernd Paysan
>"If you want it done right, you have to do it yourself"
>http://bernd-paysan.de/
>


rickman

Jan 18, 2012, 11:11:55 AM
On Jan 18, 11:20 am, Albert van der Horst <alb...@spenarnc.xs4all.nl>
wrote:
> In article <jf522j$ll...@online.de>, Bernd Paysan <bernd.pay...@gmx.de> wrote:
>
> >100 MBit Ethernet has a signal rate of 125MHz (4b/5b encoding), and you
> >indeed need to synchronize with the signal. Would be easier if the
> >GA144 had a fixed clock frequency at a multiple of 125MHz, but it
> >doesn't. Using a PHY chip makes things easier, because the PHY already
> >does the 4b/5b stuff, i.e. you only need to run at 25MHz, but you need 4
> >bits in parallel (plus clock).
>
> Of course I assumed that the 125 MHz is available as a clock on one of
> the other pins of the dedicated processor that does the very lowest
> bit handling. There is no reason why this wouldn't be feasible, but
> it would probably cost us another processor or two.

I have no idea why you say accepting a 125 MHz clock to sample a data
stream would be feasible on a GA144. That is less than five
instructions per bit. Do you have any idea of how to accomplish this
or are you just mulling it over in your mind and saying, yeah, that
could work?

I have done similar things and it is pretty hard to do anything useful
in five instructions. Let's reason this out. The first node would
have to wait for the clock and read the data bit. I think that can be
done in one instruction if both the clock and the data are read
together. But then the data must be masked out which is how many
instructions, two? It has to be shifted to make the parallel byte or
word as preferred, that's two or three more instructions reaching the
five threshold and we still haven't dealt with the bit counter. The
bit counter could be done in a separate node which sends the bit count
to the input node and waits for it to be read before incrementing the
counter. Still the input node must read the value and decide if the
data in the shift register must be sent to the next node.
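
The per-bit steps listed above can be modelled in software to see where the budget goes. This is only a Python sketch, not F18A code: the pin layout (DATA_MASK), the function name and the cost table are assumptions for illustration, and the costs are the rough estimates from the paragraph above, taking the lower figure of two for the shift.

```python
# Hypothetical pin layout for one read of the I/O port:
# bit 0 carries the data line, bit 1 carries the clock line.
DATA_MASK = 0b01

# Rough per-bit instruction estimates taken from the post (not measured).
ESTIMATED_COST = {
    "read clock and data together": 1,
    "mask out the data bit": 2,
    "shift into the word": 2,
}
BUDGET_AT_125_MBPS = 5  # "less than five instructions per bit"


def deserialize(samples, width=8):
    """Assemble one pin sample per clock edge into width-bit words, MSB first."""
    words = []
    shift_reg = 0
    count = 0
    for pins in samples:
        bit = pins & DATA_MASK              # mask out the data bit
        shift_reg = (shift_reg << 1) | bit  # shift it into the word
        count += 1                          # the bit counter
        if count == width:                  # test: is the word complete?
            words.append(shift_reg)         # pass it to the next stage
            shift_reg = 0                   # clear both for the next word
            count = 0
    return words


spent = sum(ESTIMATED_COST.values())
print(f"estimated {spent} of {BUDGET_AT_125_MBPS} instructions used "
      "before the counter, the test or the hand-off are counted")
```

Even in this idealized model the read, mask and shift already consume the whole estimate, so the counter, the completion test and the hand-off have to live in another node or be avoided somehow.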

I don't see any of this as being further decomposable really. I
suppose the input process could just send the parallel data to the
next node on every clock and let the next node handle the counter and
deciding when to pass the assembled data further up the chain. But we
reached our five instructions before we even counted the counter logic
or passing the data to the next process.

Do you have some ideas on another way to do this? Am I making this
too complex? I will say that writing this I found reductions I didn't
think of initially.

I don't mean to run down the GA people or especially Chuck and anyone
else who helped to design the chip. But I would have expected things
like this to have been addressed BEFORE the chip design was completed
and built. I would have used the prototypes to build and test all of
the basic interface types I expected would be needed. There are
SERDES on board, but I don't think they can be used with a clock.
Otherwise they could potentially make even 480 Mbps USB feasible on
these devices.

Rick

rickman

Jan 18, 2012, 4:01:46 PM
On Jan 17, 1:36 am, Paul Rubin <no.em...@nospam.invalid> wrote:
> rickman <gnu...@gmail.com> writes:
> > BTW, how exactly is the GA144 big, or even expensive? At less than
> > $15 each in some quantity they are way less than most of the high end
> > ARM devices. Yes, they are at the top end of MCUs, but not out of the
> > ballpark.
>
> I would have thought $15 is a big expensive part by most embedded-cpu
> standards, and the external parts (USB interface etc.) we're talking
> about would be $1 or less.
>
> I think you could do 12 mbps USB by bit banging, if there's enough code
> space. 480 mbps would need external support. 100 mbit ethernet might
> barely be doable if you've got a node doing nothing but deserialize bits
> and shuttle them to another node for processing. I would think not that
> many applications really need those fast interfaces. Can 10 mbit
> ethernet co-exist with 100 mbit on the same port on a typical PC? Maybe
> 10 mbit is the easiest approach if you want to do ethernet in software.

Sure, I think I said doing 10 Mbps Ethernet and 12 Mbps USB is likely
no big deal. If I knew more about the details of Ethernet I would be
all over that now. But a viable product will likely need 100 Mbps
Ethernet or 480 Mbps USB. I suppose that all depends on the
application of course. My point is that I think they could have added
very little specialized hardware, just like the other MCU vendors, and
ended up with a product that is a lot more usable.


> > Any MCU can do async serial at RS-232 speeds. In fact I bid an FPGA
> > based design for a 16 serial port board a year or two ago and it was
> > underbid by someone planning software UARTs on a CM3 ARM.
>
> 16 ports in software on a low end ARM at some high speed? Hmmm.
> Recently I was thinking that multiport serial was a possible good use
> for a GA144. An ARM would sure be easier...

I am looking at an application which will have low volumes. It does
not require the GA144 at all, but I am thinking of making the board
available as a GA144 eval board as well, two birds with one stone. So
I'm trying to figure out what I might provide on the board to
facilitate users' I/O requirements. It almost seems absurd to add a
separate MCU as the USB/Ethernet interface device because it
integrates the I/O capability but that is what I am looking at.


> > but why adding a special chip when you can add an MCU with all the
> > interfaces you need? Then the question is why is the GA144 needed?
>
> Well the natural idea is that you want the GA144's high computational
> parallelism, but finding applications for all that cpu speed that fit
> within the GA144's unusual constraints hasn't been that easy.

I'm not sure the chip is really about the "high computational
parallelism" in the way most people think. The CPUs are all memory
starved. You can attach a relatively low speed memory device
(compared to PC memory) which can be shared among the internal nodes.
But this still does not make the CPU nodes "high speed" computers. I
prefer to think of them as hardware like an FPGA. They can do a task,
given their limitations and don't worry about wasting the CPU cycles.
So then the question is what can they do? 125 Mbps Ethernet I/O is at
the very limit and likely beyond.


> > I guess I am really lamenting that GA is not showing us how to do the
> > things that you might want to do with this device. Other MCU vendors
> > provide all sorts of app notes and design info for their parts. Why?
> > Because designers won't use the parts if you don't.
>
> GA is trying to get people to visit them and write app notes (I
> mentioned the video link a few threads ago) in exchange for eval boards.

Yes, I read about that. I can't afford to spend a week or more out
there donating my time, ending up with nothing I can use
commercially. I suppose it might make the person the reigning guru in
that application on the GA144 which could lead to jobs.

Rick

Bernd Paysan

Jan 18, 2012, 7:39:37 PM
rickman wrote:
> I'm not sure the chip is really about the "high computational
> parallelism" in the way most people think. The CPUs are all memory
> starved.

Yes, it's really not well thought through. Computationally intensive
tasks usually also are memory intensive. You need data to crunch? It's
probably a lot of data then.

Look at the competition. There's, on the one side: FPGAs. The Cyclone
V 5CEA4 is a relatively low cost device, and has 144 18x19 multipliers
(all single cycle, the GA144 multipliers are 1x18 multipliers, and AFAIK
need two cycles per bit, because the carry propagation has to be taken
into account - granted, the cycles are smaller, but probably only by a
factor of 3), 30 kB of on-chip memory (the GA144 has 18kB, and these 64
words per CPU are shared between program and data), 400MHz DDR2-RAM IOs
to really have fast access to other memory, etc. Handling a GB Ethernet
device in such a Cyclone 5 is not all that complicated - you need your
external phy device.
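
A quick sketch of the arithmetic behind the multiplier and memory comparison. The shift-and-add routine below is a generic one-bit-per-step multiply, used only to illustrate what a 1x18 multiplier implies; it is not the actual GA144 multiply instruction. The "two cycles per bit" factor is the AFAIK estimate from the paragraph above, and the reading of "18kB" as two bytes per word is an assumption about how that figure was reached.

```python
WORD_BITS = 18
NODES = 144
WORDS_PER_NODE = 64


def shift_add_multiply(a, b, bits=WORD_BITS):
    """Multiply unsigned ints one multiplier bit per step; return (product, steps)."""
    product = 0
    steps = 0
    for i in range(bits):
        if (b >> i) & 1:       # examine one bit of the multiplier per step
            product += a << i  # conditionally add the shifted multiplicand
        steps += 1
    return product, steps


product, steps = shift_add_multiply(1234, 5678)
cycles = steps * 2  # assumed: two cycles per bit, as estimated in the post
print(product, steps, cycles)

total_words = NODES * WORDS_PER_NODE
print(total_words, "words of 18 bits")
print(total_words * WORD_BITS // 8, "bytes if the 18-bit words are packed")
print(total_words * 2, "bytes if each word is counted as 2 bytes")
```

So a full 18-bit multiply costs on the order of 18 steps (36 cycles under the two-cycles-per-bit estimate) against one cycle for a hard multiplier, and the on-chip RAM comes to 9216 words in total.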

The other competition are GPGPUs. They have such an abundance of
computation resources that they are rated in teraflops now (single
precision).

Chuck doesn't like the ultra-complex design software for current chips.
He thinks his simplistic approach yields faster chips. I'm not
convinced. I've done similar CPUs (b16) with similar technology, and
ended up at comparable clock speeds, and I don't need to spend years
tuning the thing for a particular technology - I can jump to the latest
available or my sweet spot for affordability; even if I'm 30% slower
than Chuck for a given technology, I can be 10 times faster by using the
most recent technology, and pack 100 times as many cores on one die. If
that would actually be a good idea.

IMHO, Chuck is something like a one-man research institute. He has
wonderful ideas, but it's research work, not actual product development.

forther

Jan 18, 2012, 8:11:18 PM
On Wednesday, January 18, 2012 4:39:37 PM UTC-8, Bernd Paysan wrote:

> Yes, it's really not well thought through. Computationally intensive
> tasks usually also are memory intensive. You need data to crunch? It's
> probably a lot of data then.

A lot of data to crunch doesn't necessarily require a lot of memory if it's streaming data.

Arnold Doray

Jan 19, 2012, 9:49:09 AM
On Thu, 19 Jan 2012 01:39:37 +0100, Bernd Paysan wrote:

> rickman wrote:
>> I'm not sure the chip is really about the "high computational
>> parallelism" in the way most people think. The CPUs are all memory
>> starved.
>
> Yes, it's really not well thought through.

With respect, that's a pretty arrogant statement. CM et al have been
thinking of these things for *decades*. He's sunk millions into it. It is
likely very well thought through. You're just applying it in the wrong
directions.

>
> Computationally intensive
> tasks usually also are memory intensive. You need data to crunch? It's
> probably a lot of data then.
>

No, not all classes of programs that are computationally intensive are also
data intensive. There is a large class of practical problems (AI, Machine
Vision come to mind) to which you could apply the GA144-type chips.

> Look at the competition. There's, on the one side: FPGAs. The Cyclone
> V 5CEA4 is a relatively low cost device, and has 144 18x19 multipliers
> (all single cycle, the GA144 multipliers are 1x18 multipliers, and AFAIK
> need two cycles per bit, because the carry propagation has to be taken
> into account - granted, the cycles are smaller, but probably only by a
> factor of 3), 30 kB of on-chip memory (the GA144 has 18kB, and these 64
> words per CPU are shared between program and data), 400MHz DDR2-RAM IOs
> to really have fast access to other memory, etc. Handling a GB Ethernet
> device in such a Cyclone 5 is not all that complicated - you need your
> external phy device.

The devil is in the details. FPGAs are hard to program especially for
novel applications. That extra cost of development has to be taken into
account. There is a lot more to computation than multiplying. You have to
consider the whole package.

Did you see this Forth Day video:

http://www.forth.org/svfig/videos/fd2010/montvelishsky.ogm

where Michael Montvelishsky used an S40 to incorporate a realtime machine
vision algorithm?

>
> The other competition are GPGPUs. They have such an abundance of
> computation resources that they are rated in teraflops now (single
> precision).
>

That's comparing apples to oranges. GPGPUs need a CPU to feed them with
data. Try running a GPGPU on AAA batteries.

> Chuck doesn't like the ultra-complex design software for current chips.
> He thinks his simplistic approach yields faster chips. I'm not
> convinced. I've done similar CPUs (b16) with similar technology, and
> ended up at comparable clock speeds, and I don't need to spend years
> tuning the thing for a particular technology - I can jump to the latest
> available or my sweet spot for affordability; even if I'm 30% slower
> than Chuck for a given technology, I can be 10 times faster by using the
> most recent technology, and pack 100 times as many cores on one die. If
> that would actually be a good idea.

Show us the hardware. Unless you've built these devices and have
actually put them into production, it sounds like a lot of handwaving to
me.

>
> IMHO, Chuck is something like a one-man research institute. He has
> wonderful ideas, but it's research work, not actual product development.

Forth came out of his brain. Is it "research work"? Well, many people see
Forth as merely a curiosity. But it's still pretty pervasive, despite the
95% whinging about its "deficiencies". ;)

I believe the GA144 is like that as well. It's a device for which the
market hasn't been created yet, precisely because there hasn't been a
viable device on the market for this niche.

You have to have some imagination to see where it can be best applied.

Cheers,
Arnold

rickman

Jan 19, 2012, 2:01:21 PM
I don't see the comparison to GPUs as they are power hungry devices
and do not lend themselves to applications outside of PCs. I've never
seen them used in any other designs. Although they can be separately
programmed, they don't run an OS and don't work standalone.

I think your comparison with FPGAs is not too far off target. I
question the price issue as the Cyclone 5 devices are quite large,
even the smallest. But in general the GA144 is priced in the same
ballpark as low end FPGAs. I don't think it would be easy to get the
same processing speed in a low end FPGA, but then a GA144 won't be so
easy to get working with 100+ MHz signals.

Rick

rickman

Jan 19, 2012, 2:30:54 PM
On Jan 19, 9:49 am, Arnold Doray <inva...@invalid.com> wrote:
> On Thu, 19 Jan 2012 01:39:37 +0100, Bernd Paysan wrote:
> > rickman wrote:
> >> I'm not sure the chip is really about the "high computational
> >> parallelism" in the way most people think. The CPUs are all memory
> >> starved.
>
> > Yes, it's really not well thought through.
>
> With respect, that's a pretty arrogant statement. CM et al have been
> thinking of these things for *decades*. He's sunk millions into it. It is
> likely very well thought through. You're just applying it in the wrong
> directions.

I'm not sure how you can call someone arrogant "with respect". I
agree that calling it "not well thought through" is a bit much, but
arrogant? He is just saying he doesn't think it works well. I can't
argue that it doesn't have significant limitations. I think that
mostly you just need to recalibrate your thinking of these as typical
MCUs.


> > Computationally intensive
> > tasks usually also are memory intensive. You need data to crunch? It's
> > probably a lot of data then.
>
> No, not all classes of programs that are computationally intensive are also
> data intensive. There is a large class of practical problems (AI, Machine
> Vision come to mind) to which you could apply the GA144-type chips.
>
> > Look at the competition. There's, on the one side: FPGAs. The Cyclone
> > V 5CEA4 is a relatively low cost device, and has 144 18x19 multipliers
> > (all single cycle, the GA144 multipliers are 1x18 multipliers, and AFAIK
> > need two cycles per bit, because the carry propagation has to be taken
> > into account - granted, the cycles are smaller, but probably only by a
> > factor of 3), 30 kB of on-chip memory (the GA144 has 18kB, and these 64
> > words per CPU are shared between program and data), 400MHz DDR2-RAM IOs
> > to really have fast access to other memory, etc. Handling a GB Ethernet
> > device in such a Cyclone 5 is not all that complicated - you need your
> > external phy device.
>
> The devil is in the details. FPGAs are hard to program especially for
> novel applications. That extra cost of development has to be taken into
> account. There is a lot more to computation than multiplying. You have to
> consider the whole package.

Yes, I have been making my living the last 10 years or so doing easy
development of "hard to program" FPGAs. I'll stack FPGAs against most
processors any day in terms of ease of development.


> Did you see this Forth Day video:
>
> http://www.forth.org/svfig/videos/fd2010/montvelishsky.ogm
>
> where Michael Montvelishsky used a S40 to incorporate a realtime machine
> vision algorithm?
>
>
>
> > The other competition are GPGPUs. They have such an abundance of
> > computation resources that they are rated in teraflops now (single
> > precision).
>
> That's comparing apples to oranges. GPGPUs need a CPU to feed them with
> data. Try running a GPGPU on AAA batteries.
>
> > Chuck doesn't like the ultra-complex design software for current chips.
> > He thinks his simplistic approach yields faster chips. I'm not
> > convinced. I've done similar CPUs (b16) with similar technology, and
> > ended up at comparable clock speeds, and I don't need to spend years
> > tuning the thing for a particular technology - I can jump to the latest
> > available or my sweet spot for affordability; even if I'm 30% slower
> > than Chuck for a given technology, I can be 10 times faster by using the
> > most recent technology, and pack 100 times as many cores on one die. If
> > that would actually be a good idea.
>
> Show us the hardware. Unless you've built these devices and have
> actually put them into production, it sounds like a lot of handwaving to
> me.

Now you stepped on your own... toes. Bernd has produced his Forth CPU
in silicon before. You should research your arguments before you make
them.


> > IMHO, Chuck is something like a one-man research institute. He has
> > wonderful ideas, but it's research work, not actual product development.
>
> Forth came out of his brain. Is it "research work"? Well, many people see
> Forth as merely a curiosity. But it's still pretty pervasive, despite the
> 95% whinging about its "deficiencies". ;)
>
> I believe the GA144 is like that as well. It's a device for which the
> market hasn't been created yet, precisely because there hasn't been a
> viable device on the market for this niche.
>
> You have to have some imagination to see where it can be best applied.

I would love to hear details. I can't even figure out how to
implement current standards, no, make that yesterday's standards in
I/O on the GA144. 100 Mbps Ethernet and 480 Mbps USB are both at least
one revision old compared to current technology in use in homes.
Neither one can be implemented in the GA144 without external silicon
to help. That is the sort of thing Bernd is talking about by his
"research" comment. If the chip had been designed with more of a
marketing focus I believe it would incorporate the needed hardware to
make each of these standards much easier to use.

Rick

Bernd Paysan

Jan 19, 2012, 4:45:36 PM
forther wrote:

> On Wednesday, January 18, 2012 4:39:37 PM UTC-8, Bernd Paysan wrote:
>> Yes, it's really not well thought through. Computationally intensive
>> tasks usually also are memory intensive. You need data to crunch?
>> It's probably a lot of data then.
>
> A lot of data to crunch doesn't necessarily require a lot of memory if
> it's streaming data.

But then it still requires a lot of IO bandwidth. GA144 has neither.

forther

Jan 19, 2012, 5:08:56 PM
On Thursday, January 19, 2012 1:45:36 PM UTC-8, Bernd Paysan wrote:

> But then it still requires a lot of IO bandwidth. GA144 has neither.

It depends on how you define "a lot".

Anyway, we've implemented at least 3 applications based on the SEAforth 40C18: a music synthesizer, a hearing aid device and a subsystem of machine vision. In all these cases the memory and IO bandwidth were adequate to the task. Of course, that doesn't prove that SEAforth or the GA144 is a perfect match for everything. It just proves that such fields exist. In our case SEAforth appeared to be better than any known competitor, especially in terms of power efficiency.

Paul Rubin

Jan 19, 2012, 5:26:03 PM
rickman <gnu...@gmail.com> writes:
> I don't see the comparison to GPUs as they are power hungry devices
> and do not lend themselves to applications outside of PCs. I've never
> seen them used in any other designs.

Mobile phones have them too FWIW.

> I don't think it would be easy to get the same processing speed in a
> low end FPGA, but then a GA144 won't be so easy to get working with
> 100+ MHz signals.

The GA144's other main attraction is very low static power consumption
of idle nodes.

Bernd Paysan

Jan 19, 2012, 5:27:45 PM
rickman wrote:
> I don't see the comparison to GPUs as they are power hungry devices
> and do not lend themselves to applications outside of PCs. I've never
> seen them used in any other designs. Although they can be separately
> programmed, they don't run an OS and don't work standalone.

Well, GPUs as such are power hungry devices, but in terms of
power/compute power, they are pretty damned good. A teraflops (single
precision) for 200W? Well, if you only need a gigaflop, a GPU is too
big for you.

> I think your comparison with FPGAs is not too far off target. I
> question the price issue as the Cyclone 5 devices are quite large,
> even the smallest.

Well, it's a 28nm device. It isn't large, even the largest. It just
has a lot of gates on it. Because it uses the latest available
technology.

> But in general the GA144 is priced in the same
> ballpark as low end FPGAs. I don't think it would be easy to get the
> same processing speed in a low end FPGA, but then a GA144 won't be so
> easy to get working with 100+ MHz signals.

A GA144 in 28nm would probably be a much more impressive device than a
Cyclone 5. But it isn't available. I think the Cyclone 5 with the 144
multipliers will beat a GA144 hands down in whatever task you can
imagine.

I've done my b16 in similar processes as Chuck did his stuff. But that
was because my b16 always was just an add-on to some custom-specific
ASIC with analog and power components, and analog+power usually is added
to decade old processes (there is a current counter-trend to this,
because people want SoCs, and the power+analog should not go into
another die). The main reason to deploy the b16 never was speed, it
always was area and power consumption (being quick to finish is good for
power consumption, too).

The IO stuff always was dedicated Verilog, not bit-banging in software.
It always cooperated with the CPU, or used DMA to write directly into
memory.

I heard Chuck talking at EuroForth about 10 years ago, when he started
this multi-chip endeavor. He clearly said that he thinks this is a cool
idea, but did not know what kind of product that would go into.

This is the definition of research project. You have a cool idea, but
no product. Research is necessary. But to make it a product, you have
to think about how people will use it. In a product-driven development,
the requirements are first, the solution (the actual implementation)
comes next.

Bernd Paysan

Jan 19, 2012, 6:15:45 PM
Oh yes, there always will be applications that are computation intense,
but need neither lots of IO nor lots of internal memory. I've done an
equalizer filter component in Zetex' DDFA, and it would qualify as such
an application - it needs 4 cells of data and 5 cells for coefficients,
and the program for a biquad filter is really small - this sort of thing
easily fits into one GA144 core (maybe even stereo, i.e. two of those),
and the data rate for audio data is low.
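
For readers who have not met a biquad: the "4 cells of data and 5 cells for coefficients" can be illustrated with a direct form I section, which needs exactly four state values and five coefficients. This is a floating-point Python sketch for illustration only. The filter structure actually used in the DDFA is not given here, and a real implementation on such hardware would presumably be fixed point; the class and method names are made up.

```python
class Biquad:
    """Direct form I biquad: five coefficients, four cells of state."""

    def __init__(self, b0, b1, b2, a1, a2):
        self.coeffs = (b0, b1, b2, a1, a2)           # 5 coefficient cells
        self.x1 = self.x2 = self.y1 = self.y2 = 0.0  # 4 data cells

    def step(self, x):
        """Consume one input sample and produce one output sample."""
        b0, b1, b2, a1, a2 = self.coeffs
        y = (b0 * x + b1 * self.x1 + b2 * self.x2
             - a1 * self.y1 - a2 * self.y2)
        # Shift the delay lines; nothing else is kept between samples.
        self.x2, self.x1 = self.x1, x
        self.y2, self.y1 = self.y1, y
        return y


# Streaming use: samples go in and out one at a time, no buffer needed.
smoother = Biquad(0.5, 0.5, 0.0, 0.0, 0.0)  # simple two-tap average
print([smoother.step(x) for x in [2.0, 4.0, 6.0]])
```

The point of the example is the footprint: nine numbers plus a few multiply-adds per sample, with no buffering of the stream, which is why this kind of task fits in a very small memory.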

Zetex' DDFA is a full custom chip, so the most competitive approach was to build
a special-purpose CPU that was just good at biquads and mixing (I did
not even consider to use a b16 for that task), but if you have to buy
off-the-shelf parts, the selection process is different. I would say
that a hearing aid should be a full-custom chip, because you want it to
be as small as possible (so it should be a one-chip solution), and as
low power as possible (i.e. the smallest possible battery). As hearing
aids should be mass products (thanks to Apple for selling so many iPods
;-), a full custom chip is cost efficient.

Arnold Doray

Jan 19, 2012, 10:19:02 PM
On Thu, 19 Jan 2012 11:30:54 -0800, rickman wrote:

> On Jan 19, 9:49 am, Arnold Doray <inva...@invalid.com> wrote:
>> On Thu, 19 Jan 2012 01:39:37 +0100, Bernd Paysan wrote:
>> > rickman wrote:
>> >> I'm not sure the chip is really about the "high computational
>> >> parallelism" in the way most people think. The CPUs are all memory
>> >> starved.
>>
>> > Yes, it's really not well thought through.
>>
>> With respect, that's a pretty arrogant statement. CM et al have been
>> thinking of these things for *decades*. He's sunk millions into it. It
>> is likely very well thought through. You're just applying it in the
>> wrong directions.
>
> I'm not sure how you can call someone arrogant "with respect". I agree
> that calling it "not well thought through" is a bit much, but arrogant?
> He is just saying he doesn't think it works well. I can't argue that it
> doesn't have significant limitations. I think that mostly you just need
> to recalibrate your thinking of these as typical MCUs.
>

I'm not calling Bernd arrogant. Only that that one statement of his
sounds arrogant to me. The statement sounds arrogant because it dismisses
someone's life work without providing any adequate basis for that
judgment. I mean, comparing the GA144 to GPGPUs ??!! Seriously. I guess
even geniuses have a day off.

I find it hard to equate GA144s with ordinary MCUs. Their computational
capabilities, parallelism, lack of synchronous processing, etc make them
very different from MCUs. You could of course force them into that mold,
but I suspect they'd disappoint. You've got to apply them in a different
context.

Perhaps that's where our differences in opinion lie. You want to teach a
new dog old tricks. :)

>> > Computationally intensive tasks usually also are memory intensive. You
>> > need data to crunch? It's probably a lot of data then.
>>
>> No, not all classes of programs that are computationally intensive are
>> also data intensive. There is a large class of practical problems
>> (AI, Machine Vision come to mind) to which you could apply the
>> GA144-type chips.
>>
>> > Look at the competition. There's, on the one side: FPGAs. The
>> > Cyclone V 5CEA4 is a relatively low cost device, and has 144 18x19
>> > multipliers (all single cycle, the GA144 multipliers are 1x18
>> > multipliers, and AFAIK need two cycles per bit, because the carry
>> > propagation has to be taken into account - granted, the cycles are
>> > smaller, but probably only by a factor of 3), 30 kB of on-chip
>> > memory (the GA144 has 18kB, and these 64 words per CPU are shared
>> > between program and data), 400MHz DDR2-RAM IOs to really have fast
>> > access to other memory, etc. Handling a GB Ethernet device in such a
>> > Cyclone 5 is not all that complicated - you need your external phy
>> > device.
>>
>> The devil is in the details. FPGAs are hard to program especially for
>> novel applications. That extra cost of development has to be taken into
>> account. There is a lot more to computation than multiplying. You have
>> to consider the whole package.
>
> Yes, I have been making my living the last 10 years or so doing easy
> development of "hard to program" FPGAs. I'll stack FPGAs against most
> processors any day in terms of ease of development.
>
>

On what tasks, Rick? Have you used FPGAs for realtime machine vision?
Robot control? I'm saying that you have to look at new markets, not old
ones. The old markets (DSP, etc) are probably best served by the old
devices.

Also, do FPGAs scale to large production volumes for your target device?
The GA144 costs $20 at low volume, and probably gets much cheaper with
volume. Your target device's production costs go down with volume with
the same reliability. With FPGAs, small volumes are probably cheaper (I
mean base cost of device + cost of putting in your app into the FPGA),
but your costs are going to go up with volume, for the same reliability.
This hidden cost has to be factored in the total cost of production.

I suspect large production volumes is where the GA144 would beat FPGAs in
terms of cost.


>> Did you see this Forth Day video:
>>
>> http://www.forth.org/svfig/videos/fd2010/montvelishsky.ogm
>>
>> where Michael Montvelishsky used a S40 to incorporate a realtime
>> machine vision algorithm?
>>
>>
>>
>> > The other competition are GPGPUs. They have such an abundance of
>> > computation resources that they are rated in teraflops now (single
>> > precision).
>>
>> That's comparing apples to oranges. GPGPUs need a CPU to feed them with
>> data. Try running a GPGPU on AAA batteries.
>>
>> > Chuck doesn't like the ultra-complex design software for current
>> > chips. He thinks his simplistic approach yields faster chips. I'm
>> > not convinced. I've done similar CPUs (b16) with similar technology,
>> > and ended up at comparable clock speeds, and I don't need to spend
>> > years tuning the thing for a particular technology - I can jump to
>> > the latest available or my sweet spot for affordability; even if I'm
>> > 30% slower than Chuck for a given technology, I can be 10 times
>> > faster by using the most recent technology, and pack 100 times as
>> > many cores on one die. If that would actually be a good idea.
>>
>> Show us the hardware. Unless you've built these devices and have
>> actually put them into production, it sounds like a lot of handwaving
>> to me.
>
> Now you stepped on your own... toes. Bernd has produced his Forth CPU
> in silicon before. You should research your arguments before you make
> them.
>
>

I am aware of Bernd's b16 processor. I said "and put them into
production". Meaning ready for production in volume with the hardware
issues fully resolved. That's the hard part. I wasn't aware that had been
done for the b16.

Even so, the rest of Bernd's argument in that paragraph is misleading. The
GA144 uses a conservative process. CM says as much in his 2011 video.
They could (conservatively) put in 50x the number of cores on the same
die space using modern technology. CM says that they don't know if this
is necessary. You need to start writing apps for the GA144 first. I
agree. That makes perfect business sense.


>> > IMHO, Chuck is something like a one-man research institute. He has
>> > wonderful ideas, but it's research work, not actual product
>> > development.
>>
>> Forth came out of his brain. Is it "research work"? Well, many people
>> see Forth as merely a curiosity. But it's still pretty pervasive,
>> despite the 95% whinging about its "deficiencies". ;)
>>
>> I believe the GA144 is like that as well. It's a device for which the
>> market hasn't been created yet, precisely because there hasn't been a
>> viable device on the market for this niche.
>>
>> You have to have some imagination to see where it can be best applied.
>
> I would love to hear details. I can't even figure out how to implement
> current standards, no, make that yesterday's standards in I/O on the
> GA144. 100 Mbps Ethernet and 480 Mbps USB are both at least one
> revision old compared to current technology in use in homes. Neither
> one can be implemented in the GA144 without external silicon to help.
> That is the sort of thing Bernd is talking about by his "research"
> comment. If the chip had been designed with more of a marketing focus I
> believe it would incorporate the needed hardware to make each of these
> standards much easier to use.
>

Like I said, if you want to squeeze in these standards, you'd probably be
disappointed with the GA144's performance. I suspect you've got to look
at *new* markets to get the best out of these devices. AI/machine vision
and robot control are probably good fits.

Take a look at Michael Montvelishsky's video for a real application and
its details.

Cheers,
Arnold







Bernd Paysan

Jan 21, 2012, 5:40:04 PM
Arnold Doray wrote:
> I'm not calling Bernd arrogant. Only that that one statement of his
> sounds arrogant to me. The statement sounds arrogant because it
> dismisses someone's life work without providing any adequate basis for
> that judgment. I mean, comparing the GA144 to GPGPUs ??!! Seriously. I
> guess even geniuses have a day off.

I don't think you understood what I wrote. I wrote that GPGPUs - which
have many cores, just like GA144 - do actually provide their many cores
with an adequate memory interface and an adequate amount of local memory
for each core - GA144 does not, the nodes are severely memory-starved
for all but a few algorithms, and external memory access is cumbersome
and slow.

This means GA144 is severely limited in what it can be used for.

> Also, do FPGAs scale to large production volumes for your target
> device?

Yes. If you have large production volumes, you take the Verilog or VHDL
code and convert it into an ASIC, with an intermediate step for medium
volume productions, where the FPGA makers offer something like a gate
array, which is compatible with their FPGA tool chain, AFAIK using a
single mask layer to encode the actual logic and routing (which is
really saving a lot of money with current processes).
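
The volume argument comes down to break-even arithmetic: a one-time conversion cost against a per-unit saving. A minimal Python sketch follows; the function name and every number in it are placeholders for illustration, not vendor figures.

```python
import math

def break_even_units(nre_cost, fpga_unit_cost, asic_unit_cost):
    """Smallest volume at which the ASIC route is strictly cheaper than FPGAs.

    nre_cost: one-time conversion cost (masks, engineering), hypothetical.
    fpga_unit_cost / asic_unit_cost: per-unit prices, hypothetical.
    """
    saving_per_unit = fpga_unit_cost - asic_unit_cost
    if saving_per_unit <= 0:
        return None  # the conversion never pays for itself
    return math.floor(nre_cost / saving_per_unit) + 1

# Placeholder numbers only: 200k one-time cost, $20 FPGA vs $4 ASIC per unit.
print(break_even_units(nre_cost=200_000, fpga_unit_cost=20.0, asic_unit_cost=4.0))
```

With a single-mask-layer gate array the one-time cost would be lower than for a full ASIC, which moves the break-even point toward smaller volumes.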

Arnold Doray

Jan 21, 2012, 11:59:59 PM
On Sat, 21 Jan 2012 23:40:04 +0100, Bernd Paysan wrote:

> Arnold Doray wrote:
>> I'm not calling Bernd arrogant. Only that that one statement of his
>> sounds arrogant to me. The statement sounds arrogant because it
>> dismisses someone's life work without providing any adequate basis for
>> that judgment. I mean, comparing the GA144 to GPGPUs ??!! Seriously. I
>> guess even geniuses have a day off.
>
> I don't think you understood what I wrote. I wrote that GPGPUs - which
> have many cores, just like GA144 - do actually provide their many cores
> with an adequate memory interface and an adequate amount of local memory
> for each core - GA144 does not, the nodes are severely memory-starved
> for all but a few algorithms, and external memory access is cumbersome
> and slow.
>

The GA144 and GPGPUs are meant for vastly different markets. I don't see
any meaningful comparisons being drawn between them. That they rely on
multiple simple cores does not make them comparable either. That would be
like comparing radios and televisions just because both use transistors.

Perhaps a better comparison would be between the GA144 and products like
Tilera's [1]. Or the designs from Adapteva [2]. But they have an order of
magnitude higher power consumption compared to the GA144.

The point is that these various devices fit into different niches. And
even for the same niche, the competing devices would offer different
advantages (ease of development, cost, reliability, power, processing
capability, brand reputability, etc.).

> This means GA144 is severely limited in what it can be used for.

"severely limited" is a relative term. One poster has used a similar
device (the SeaForth chip) for music synthesizers, machine vision, etc.
and found it much better compared to the alternatives. In these cases,
the "slowness" of the GA144's I/O is not an issue. You can just stream
data into it from external SDRAM.

Personally, I feel that the biggest hindrance to using the GA144 is that
the development tools have a sharp learning curve. To program for the
GA144, you need to:

- learn a new language (colorForth)
- use an unfamiliar development environment (the colorForth IDE and
softsim)
- learn the F18's instruction set
- learn how to get around the F18's idiosyncrasies (18-bit word, circular
stacks, etc).
- learn how to effectively decompose your problem into the multiple simple
nodes the GA144 has. This is hard for the GA144 because of its
limitations.
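
To illustrate the "circular stacks" item, here is a toy Python model of a fixed-depth stack that never overflows or underflows but simply wraps around. The class name, the depth parameter and the 18-bit mask are assumptions made for illustration; this is not a model of the F18's actual register layout.

```python
class CircularStack:
    """Toy fixed-depth circular stack holding 18-bit words (illustrative only)."""

    MASK = (1 << 18) - 1  # truncate every value to 18 bits

    def __init__(self, depth=8):
        self.cells = [0] * depth
        self.top = 0  # index of the most recently pushed cell

    def push(self, value):
        # Advance with wrap-around; a full stack silently loses its oldest entry.
        self.top = (self.top + 1) % len(self.cells)
        self.cells[self.top] = value & self.MASK

    def pop(self):
        # Popping past the bottom wraps around and returns old values again.
        value = self.cells[self.top]
        self.top = (self.top - 1) % len(self.cells)
        return value


s = CircularStack(depth=4)
for n in range(1, 6):  # push 1..5 into a 4-deep stack
    s.push(n)
print([s.pop() for _ in range(5)])  # [5, 4, 3, 2, 5]
```

The value 1 is overwritten by the fifth push, and the fifth pop wraps around to 5 again instead of raising an error. Code written for a conventional stack cannot rely on overflow or underflow being detected.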

Compare these with Tilera's products, which use Java/C++ development
with Eclipse.

Of course, these challenges won't stop a company that recognizes the
competitive advantage the GA144 might give them. But it certainly
discourages casual experimentation and exploration, and lengthens the
development cycle. Some with existing FPGA experience might just use
FPGAs for the job even if they might obtain better results with the
GA144.

I feel getting more examples using standard Forth (eg, eForth or
polyForth) might help, as would a more standard development environment.
But it leaves open the issue of mapping your application to the chip. The
MD5 algorithm, while a good start, might not be one of great pedagogical
value, since you need to know the MD5 algorithm well, and it also does
not illustrate the advantage of using the GA144 for this problem -- is
using GA144 faster for this problem?

Perhaps a better approach is to use simpler problems, which lay bare the
issues. I came across a nice one yesterday - the "Partial Digest" (aka
"Turnpike") problem: Given a set of (non-repeating) numbers X (eg, X =
{1,3,5}), the "partial digest" of X, called DX, is the multiset of all
pairwise distances (ie, DX = {3-1, 5-1, 5-3} = {2,4,2}). The problem is:
given a partial digest DX, calculate X itself, with 0 being the smallest
number as reference. The solution is not necessarily unique. You need to
output all the answers. The problem is simple, amenable to exhaustive
search and parallelizable.
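
For concreteness, here is a minimal exhaustive-search sketch of the problem in Python. The function name and the approach are illustrative assumptions: it is plain brute force on a desktop interpreter, not GA144 code and not a tuned algorithm.

```python
from collections import Counter
from itertools import combinations

def partial_digest(distances):
    """Return every point set X with min(X) == 0 whose pairwise distances match."""
    target = Counter(distances)
    m = len(distances)
    # Solve n*(n-1)/2 == m for the number of points n.
    n = 2
    while n * (n - 1) // 2 < m:
        n += 1
    if n * (n - 1) // 2 != m:
        return []
    width = max(distances)  # distance between the two outermost points
    # Each point equals its own distance from 0, so it must occur in the input.
    candidates = sorted(set(distances) - {width})
    solutions = []
    for middle in combinations(candidates, n - 2):
        xs = (0,) + middle + (width,)
        if Counter(b - a for a, b in combinations(xs, 2)) == target:
            solutions.append(list(xs))
    return solutions

print(partial_digest([2, 4, 2]))  # [[0, 2, 4]]
```

Each candidate combination is checked independently of the others, which is what makes the search easy to split across many small workers. An input like [1, 2, 3] returns two mirror-image answers, [0, 1, 3] and [0, 2, 3], showing the non-uniqueness.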


Cheers,
Arnold

[1] http://www.tilera.com/products/processors/TILE-Gx-8000
[2] http://www.adapteva.com/index.php?option=com_content&view=article&id=72&Itemid=79

gavino

Jan 23, 2012, 10:50:33 AM
Hi Bernd, what kinds of things have you used your own chip for?

rickman

Jan 24, 2012, 9:29:10 AM
On Jan 19, 5:26 pm, Paul Rubin <no.em...@nospam.invalid> wrote:
> rickman <gnu...@gmail.com> writes:
> > I don't see the comparison to GPUs as they are power hungry devices
> > and do not lend themselves to applications outside of PCs.  I've never
> > seen them used in any other designs.
>
> Mobile phones have them too FWIW.

I guess you can call them mobile phones, but they are really "smart
phones" which aren't really low power. They last about as long as a
low power laptop really. These are very limited versions of GPUs
specifically designed for the mobiles. I doubt they would be very
useful for any other app.


> > I don't think it would be easy to get the same processing speed in a
> > low end FPGA, but then a GA144 won't be so easy to get working with
> > 100+ MHz signals.
>
> The GA144's other main attraction is very low static power consumption
> of idle nodes.

That's actually where much of its low power comes from. It would be
hard to design an app that could use most of the nodes most of the
time, so if the nodes aren't very, very low power when idle they ruin
the low power nature of the chip.

Rick

rickman

Jan 24, 2012, 10:08:00 AM
On Jan 19, 5:27 pm, Bernd Paysan <bernd.pay...@gmx.de> wrote:
> rickman wrote:
> > I don't see the comparison to GPUs as they are power hungry devices
> > and do not lend themselves to applications outside of PCs.  I've never
> > seen them used in any other designs.  Although they can be separately
> > programmed, they don't run an OS and don't work standalone.
>
> Well, GPUs as such are power hungry devices, but in terms of
> power/compute power, they are pretty damn good.  A teraflops (single
> precision) for 200W?  Well, if you only need a gigaflop, a GPU is too
> big for you.

The point is that the two devices have very different power
envelopes. If you need a device that can fit in your hand or one that
runs off of a small battery for days, a GPU just won't do the job,
period.


> > I think your comparison with FPGAs is not too far off target.  I
> > question the price issue as the Cyclone 5 devices are quite large,
> > even the smallest.
>
> Well, it's a 28nm device.  It isn't large, even the largest.  It just
> has a lot of gates on it.  Because it uses the latest available
> technology.

I think you exaggerate the aspects of 28 nm a bit. The larger FPGAs
are always large chips. Someone posted in the FPGA group not too long
ago that Xilinx is selling one of their largest chips for $5000 each I
believe. Heck, I've heard it only costs $1000 to process a wafer! Of
course yield also plays a part in this. But you get the point.

I tried to look up current prices on FPGAs, but the Cyclone V parts don't
have pricing at Digikey and similar places, you have to get quotes.
With other families, the lowest prices I find are typically the same
as a low volume price on the GA144. However the GA144 isn't really
being sold in large quantities so they aren't able to pass on the full
potential savings of mass production. I'd be interested in how much a
GA32 would cost once it was being produced at 100,000 per month.
Better I'd like to be selling a product at 100,000 per month using a
GA32.


> > But in general the GA144 is priced in the same
> > ballpark as low end FPGAs.  I don't think it would be easy to get the
> > same processing speed in a low end FPGA, but then a GA144 won't be so
> > easy to get working with 100+ MHz signals.
>
> A GA144 in 28nm would probably be a much more impressive device than a
> Cyclone 5.  But it isn't available.  I think the Cyclone 5 with the 144
> multipliers will beat a GA144 hands down in whatever task you can
> imagine.

If you consider all aspects of the application you just lost that
bet. None of the FPGA families other than the SiBlue parts can
compare to a GA144 on power, running or static. The SiBlue parts
might just give the GA144 a run for its money in terms of power
consumption, but doesn't have multipliers, etc, just LUTs and FFs and
maybe a PLL.

Actually, the 5CEA4 with 144 multipliers is not the smallest Cyclone 5,
it is the next to smallest and I bet is priced closer to $20. Also,
it may have 144 of the 18x19 multipliers, but only 72 DSP units with
72 accumulators.

I expect an FPGA can be configured for a task and beat a GA144, but
I'm not so sure you could build a similar array of processors in the
smallest FPGA (meaning similar pricing) and end up with a faster
device. Even the fastest version is spec'd for under 300 MHz
operation for 18x18 multiplies and the memory is about the same
speed. Even though the GA144 cores can't multiply at that rate, they
run the rest of the instructions around twice as fast.


> I've done my b16 in similar processes as Chuck did his stuff.  But that
> was because my b16 always was just an add-on to some custom-specific
> ASIC with analog and power components, and analog+power usually is added
> to decade old processes (there is a current counter-trend to this,
> because people want SoCs, and the power+analog should not go into
> another die).  The main reason to deploy the b16 never was speed, it
> always was area and power consumption (being quick to finish is good for
> power consumption, too).

Yes, the only time I was able to justify a processor in an FPGA was
when we were slammed to the wall on capacity and replaced the random
logic with a special purpose processor to perform some general, but
low rate, calculations.


> The IO stuff always was dedicated Verilog, not bit-banging in software.
> It always cooperated with the CPU, or used DMA to write directly into
> memory.
>
> I heard Chuck talking at EuroForth about 10 years ago, when he started
> this multi-chip endeavor.  He clearly said that he thinks this is a cool
> idea, but did not know what kind of product that would go into.
>
> This is the definition of research project.  You have a cool idea, but
> no product.  Research is necessary.  But to make it a product, you have
> to think about how people will use it.  In a product-driven development,
> the requirements are first, the solution (the actual implementation)
> comes next.

Yes, I agree 100% really. I think they not only missed the boat by
not adding specific hardware for the various interfaces that many
products need, they should have taken a cue from FPGAs and made the I/
Os more capable in terms of the voltage standards they work with. Add
memory blocks, both flash and ram, higher speed Ethernet and USB
support and make the I/Os multiple voltage with 3.3 volt tolerance and
I think they would have a system on a chip that will beat nearly
anything out there. As it is the device is really just an array of
processors and enough memory for the register set.

Rick

Brad

Jan 25, 2012, 10:11:43 PM
On Jan 24, 8:08 am, rickman <gnu...@gmail.com> wrote:
> As it is the device is really just an array of
> processors and enough memory for the register set.
>
You should see Russell Fish's latest multicore venture. He developed a
computer architecture (and libraries to build it) for DRAM processes.
Pretty impressive. https://www.venraytechnology.com/

-Brad

rickman

Jan 26, 2012, 5:14:05 PM
I don't get it. They talk about cell phone and laptops...

"Depending on product, battery lifetime will increase 2X-5X"

Then they suggest some absurd possibilities...

"Previously impossible low-power products are now feasible including
possibly:

Bluetooth earpiece computer
Computer embedded in glasses
Spoken language translator
Disposable computers
Human and animal implanted computers"


If you take a conventional cell phone processor and cut the power
consumption 5x you are still miles away from something that can be
implanted in a person or in glasses.

Their web site reads like a viewgraph presentation. Do they actually
have a product in silicon or is it all library cells waiting for
someone to give them money to build it?

Rick

Mark Wills

Jan 27, 2012, 6:47:36 AM
Rick,

This will shed some light on it. The guy seems pretty creepy to me...

http://hothardware.com/News/CPU-Startup-Combines-CPUDRAMAnd-A-Whole-Bunch-Of-Crazy/

rickman

Jan 30, 2012, 3:53:45 PM
> http://hothardware.com/News/CPU-Startup-Combines-CPUDRAMAnd-A-Whole-B...

Yes, I've looked at this before. It is not a bad idea, but it needs
to be defined in terms of the applications it is good for. The web
site is pretty poor at that. I still say they are looking for people
with deep pockets to finance more work on their approach and are not
really ready to release any sort of product.

Rick

gavino

Feb 2, 2012, 6:48:24 PM
what???

gavino

Feb 2, 2012, 6:49:45 PM
plan9 replaces http with something apparently better than NFS

some like gopher

Bernd Paysan

Feb 7, 2012, 10:35:17 AM
Arnold Doray wrote:

> On Sat, 21 Jan 2012 23:40:04 +0100, Bernd Paysan wrote:
>> I don't think you understood what I wrote. I wrote that GPGPUs -
>> which have many cores, just like GA144 - do actually provide their
>> many cores with an adequate memory interface and an adequate amount
>> of local memory for each core - GA144 does not, the nodes are
>> severely memory-starved for all but a few algorithms, and external
>> memory access is cumbersome and slow.
>>
>
> The GA144 and GPGPUs are meant for vastly different markets. I don't
> see any meaningful comparisons being drawn between them. That they
> rely on multiple simple cores does not make them comparable either.
> That would be like comparing radios and televisions just because both
> use transistors.

You don't understand. It does not matter which target market it is, it
matters that you usually need memory and bandwidth for computation.
This is a very fundamental issue. It does not matter what power
consumption GA144 has, when you can't get the data in to perform the
operation.

>> This means GA144 is severely limited in what it can be used for.
>
> "severely limited" is a relative term. One poster has used a similar
> device (the SeaForth chip) for music synthesizers, machine vision,
> etc. and found it much better compared to the alternatives. In these
> cases, the "slowness" of the GA144's I/O is not an issue. You can just
> stream data into it from external SDRAM.

In these cases the number of operations per data from the external SDRAM
is sufficiently large, and the internal state needed to perform these
algorithms sufficiently small - I know that for FIR filters and edge
detection on images, this is actually true, they can operate with very
little state and really small programs. Other approaches at
synthesizers like sample-based ones will not work that well on GA144.
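
As a rough illustration of how little state a FIR filter needs, here is a streaming sketch in Python. The function name and the tap values are made up for the example, and it runs on a desktop interpreter rather than on a GA144; the point is only that the whole state is one short window of past samples.

```python
from collections import deque

def fir_stream(samples, taps):
    """Yield FIR outputs one at a time, keeping only len(taps) past samples."""
    # This window is the entire state of the filter.
    history = deque([0] * len(taps), maxlen=len(taps))
    for x in samples:
        history.appendleft(x)  # newest sample first; the oldest drops off the end
        yield sum(c * h for c, h in zip(taps, history))

# 3-tap moving sum over a short integer stream.
print(list(fir_stream([1, 2, 3, 4, 5], taps=[1, 1, 1])))  # [1, 3, 6, 9, 12]
```

A sample-based synthesizer, by contrast, has to index into a large table of stored waveforms, so its state cannot be shrunk to a short window like this.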

The rule of thumb is that for one operation you need one bit of memory
IO (with the amount of memory the GA144 processing element has). There
probably is a bell-shaped curve around algorithms where some use more
memory, and some use less. You can find the algorithms where GA144 is
useful, but they are likely to be only a few, because GA144 is clearly
unbalanced - it does not allow one bit of memory IO per core operation,
it allows much less.
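
The balance argument can be put into a few lines of arithmetic. The Python sketch below uses made-up placeholder rates, not measured GA144 figures, and the function names are hypothetical; it only shows how a shortage of memory IO caps the achievable operation rate.

```python
def io_bits_per_op(io_bits_per_second, ops_per_second):
    """Bits of memory IO the hardware can supply per operation at peak rate."""
    return io_bits_per_second / ops_per_second

def sustained_ops(ops_per_second, io_bits_per_second, bits_needed_per_op):
    """Operation rate actually reachable when each operation needs some IO."""
    if bits_needed_per_op == 0:
        return ops_per_second  # no IO needed, so purely compute bound
    return min(ops_per_second, io_bits_per_second / bits_needed_per_op)

# Placeholder machine: 1e9 ops/s peak, but only 1e8 bits/s of memory IO.
PEAK_OPS = 1_000_000_000
IO_BITS = 100_000_000

print(io_bits_per_op(IO_BITS, PEAK_OPS))      # supply: 0.1 bit per operation
print(sustained_ops(PEAK_OPS, IO_BITS, 1))    # needs 1 bit/op: IO bound
print(sustained_ops(PEAK_OPS, IO_BITS, 0.05)) # needs 0.05 bit/op: compute bound
```

An algorithm that follows the one-bit-per-operation rule would run at a tenth of peak on this imaginary machine, while one needing far less IO per operation would still reach the peak rate.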

The less balanced an architecture is, the fewer purposes you will find
for it. Your critique that you have to get accustomed to an unusual
development system is valid, too, but it is an entirely different
critique.

rickman

Feb 7, 2012, 5:48:03 PM
On Feb 7, 10:35 am, Bernd Paysan <bernd.pay...@gmx.de> wrote:
> Arnold Doray wrote:
> > On Sat, 21 Jan 2012 23:40:04 +0100, Bernd Paysan wrote:
> >> I don't think you understood what I wrote. I wrote that GPGPUs -
> >> which have many cores, just like GA144 - do actually provide their
> >> many cores with an adequate memory interface and an adequate amount
> >> of local memory for each core - GA144 does not, the nodes are
> >> severely memory-starved for all but a few algorithms, and external
> >> memory access is cumbersome and slow.
>
> > The GA144 and GPGPUs are meant for vastly different markets. I don't
> > see any meaningful comparisons being drawn between them. That they
> > rely on multiple simple cores does not make them comparable either.
> > That would be like comparing radios and televisions just because both
> > use transistors.
>
> You don't understand. It does not matter which target market it is, it
> matters that you usually need memory and bandwidth for computation.
> This is a very fundamental issue. It does not matter what power
> consumption GA144 has, when you can't get the data in to perform the
> operation.

Bernd, I really don't know what you are going on about. Of course it
matters what your target market is. "Usually" means "not always"
which is the point. There are certainly apps out there that will suit
the GA144. There are development groups out there that the tools will
suit. There are management teams that the company Green Arrays will
suit. To be successful all of these will need to be in the same
company.


> >> This means GA144 is severely limited in what it can be used for.
>
> > "severely limited" is a relative term. One poster has used a similar
> > device (the SeaForth chip) for music synthesizers, machine vision,
> > etc. and found it much better compared to the alternatives. In these
> > cases, the "slowness" of the GA144's I/O is not an issue. You can just
> > stream data into it from external SDRAM.
>
> In these cases the number of operations per data from the external SDRAM
> is sufficiently large, and the internal state needed to perform these
> algorithms sufficiently small - I know that for FIR filters and edge
> detection on images, this is actually true, they can operate with very
> little state and really small programs. Other approaches at
> synthesizers like sample-based ones will not work that well on GA144.
>
> The rule of thumb is that for one operation you need one bit of memory
> IO (with the amount of memory the GA144 processing element has). There
> probably is a bell-shaped curve around algorithms where some use more
> memory, and some use less. You can find the algorithms where GA144 is
> useful, but they are likely to be only a few, because GA144 is clearly
> unbalanced - it does not allow one bit of memory IO per core operation,
> it allows much less.
>
> The less balanced an architecture is, the fewer purposes you will find
> for it. Your critique that you have to get accustomed to an unusual
> development system is valid, too, but it is an entirely different
> critique.

"Balanced" has no meaning except in the context of applications. So
by definition an architecture that is "balanced" for most applications
will suit most applications. I believe you just defined it that way.
But you haven't shown anything about how the GA144 is unbalanced other
than stating some "rule of thumb". Apps are not really the same as
algorithms. Fitting an app to a processor is not the same as fitting
algorithms to a processor.

I still say you really don't "get" the GA144. I have repeatedly made
the point that I don't think of the GA144 as a processor, I think of
it more like an FPGA with processors instead of LUTs. No one worries
if the 4 input LUT in an FPGA is not a good match to their algorithm.
They just use them the best way they can figure and if some LUTs are
used as inverters, fine. In fact, in the older days, it was not
uncommon for a CLB in a Xilinx part to be used solely for routing!
After all, they claim to be selling you the routing and giving you the
logic for free.

Consider that you use a processor in the GA144 to implement functions
without worrying if you use all the MIPS. MIPS are very inexpensive
in the GA144 and you should feel free to let many of them go to
waste! Once you appreciate that I think the GA144 will make more
sense to you and you will find more apps to apply it to. Maybe you
should consider that GA is selling you the I/O and you get the MIPS
for free!

In other words, think about your apps, not about how many MIPS are
wasted... MIPS are ALWAYS wasted in every processor.

Rick

Paul Rubin

Feb 7, 2012, 8:07:07 PM
rickman <gnu...@gmail.com> writes:
> I still say you really don't "get" the GA144. I have repeatedly made
> the point that I don't think of the GA144 as a processor, I think of
> it more like an FPGA with processors instead of LUTs.

Well, where are we then:

1) try to figure out how to use the GA144 instead of traditional
CPU's, say for web browsing. Conclude the GA144 isn't really
good for that due to lack of memory, so we should look for
FPGA-like applications instead.

2) try to figure out how to use the GA144 instead of traditional
FPGA's, say for i/o protocols such as SATA or ethernet. Now
there's maybe enough memory, but nowhere near enough i/o bandwidth.

So what does that leave? There are limitations in both views. I also
remember Bernd saying that his b16 applications tended to use around 1k
(bytes?) of program rom. The GA144 has just 64 words of rom per cpu,
requiring at best complicated partitioning of the code across multiple
nodes. And unless you're getting custom masks made with your program in
the rom, you have to put the code into RAM instead, which eats into the
amount of data you can operate on. FPGA's typically have block RAM and
you can use CLB's as ram, while routing corresponds to program rom.

I know that when I've tried thinking up potential GA144 apps and coding
them, I always hit walls made by the limited code space. Having 1k
bytes (400 words, say) would have opened up lots more possibilities.

It might be interesting to have some GA processor nodes available as
hard CPU blocks in a normal FPGA, comparable to how they now have DSP
slices and RAM blocks. They'd be much more compact than the synthesized
softcore CPU's one often finds in FPGA-based designs.

rickman

Feb 7, 2012, 9:18:00 PM
On Feb 7, 8:07 pm, Paul Rubin <no.em...@nospam.invalid> wrote:
> rickman <gnu...@gmail.com> writes:
> > I still say you really don't "get" the GA144. I have repeatedly made
> > the point that I don't think of the GA144 as a processor, I think of
> > it more like an FPGA with processors instead of LUTs.
>
> Well, where are we then:
>
> 1) try to figure out how to use the GA144 instead of traditional
> CPU's, say for web browsing. Conclude the GA144 isn't really
> good for that due to lack of memory, so we should look for
> FPGA-like applications instead.
>
> 2) try to figure out how to use the GA144 instead of traditional
> FPGA's, say for i/o protocols such as SATA or ethernet. Now
> there's maybe enough memory, but nowhere near enough i/o bandwidth.
>
> So what does that leave? There are limitations in both views. I also
> remember Bernd saying that his b16 applications tended to use around 1k
> (bytes?) of program rom. The GA144 has just 64 words of rom per cpu,
> requiring at best complicated partitioning of the code across multiple
> nodes. And unless you're getting custom masks made with your program in
> the rom, you have to put the code into RAM instead, which eats into the
> amount of data you can operate on. FPGA's typically have block RAM and
> you can use CLB's as ram, while routing corresponds to program rom.

Let me understand. You are saying that if a processor can't be used
as a web browser then there are not many apps that can be run on the
processor. Also if an FPGA can't be used for SATA or Ethernet then
there aren't many apps to be done on the FPGA (btw, I'm not sure you
can't use a GA144 for Ethernet, just not for 100 Mbps without external
support). Do you see the fallacy in that? There are TONS of apps
that aren't excluded by these limitations but that could make use of
the GA144 processors.

I have a board in production that would be well suited for the GA144
if it wasn't already designed. The board is small with a low end
FPGA, a stereo audio codec, some analog buffering and some level
translation (RS-422, 3.3 V to 5 V, etc). The GA144 has real potential
for this and would provide so much more potential for additional
processing. The board currently does some signal processing to
demodulate an IRIG-B signal. The GA144 would be able to do much
faster versions of IRIG. We recently added pure audio capability and
the GA144 could add all sorts of functions to that including speech
compression. The GA144 is slightly higher cost than the existing FPGA
and CODEC but parts cost is not a problem currently and a single digit
$ cost increase would not be a problem. Interestingly I included a
1.8 volt power converter so a lower power version of the FPGA could be
used if needed, so the board space is already there for that. If I
was selling more of these boards I would consider pushing my customer
to try to generate some interest for additional capabilities.

I have any number of apps that would make use of a GA32 or even a
GA4.


> I know that when I've tried thinking up potential GA144 apps and coding
> them, I always hit walls made by the limited code space. Having 1k
> bytes (400 words, say) would have opened up lots more possibilities.
>
> It might be interesting to have some GA processor nodes available as
> hard CPU blocks in a normal FPGA, comparable to how they now have DSP
> slices and RAM blocks. They'd be much more compact than the synthesized
> softcore CPU's one often finds in FPGA-based designs.

Yes, I wouldn't argue with that. Even as is, I would like to see some
features added to the GA144 such as versatile I/O blocks supporting
multiple I/O standards including true differential I/O and 5 volt
tolerance. That would not impact the core CPUs at all really. It
would just require some additional set up registers in the I/O CPUs.
I'd also like to see support for a practical crystal oscillator.
Having to design an external oscillator or buy one is something few
other CPU chips do. Almost all of them will work with a crystal
directly.

Rick

van...@vsta.org

Feb 7, 2012, 11:08:31 PM
Paul Rubin <no.e...@nospam.invalid> wrote:
> I know that when I've tried thinking up potential GA144 apps and coding
> them, I always hit walls made by the limited code space. Having 1k
> bytes (400 words, say) would have opened up lots more possibilities.

For network packet processing, I always figured you'd need a GA144-like
architecture, except with horizontal and vertical memory busses for each row
and column. Design them without arbitration in order to simplify them, and
let the surrounding board supply SRAM to the subset of rows and columns your
design ends up actually needing.

This way you can provide larger amounts of state at the critical points in
the processing chain, without having to bog yourself down with the full
"matrix of Pentium" budget. I still doubt 64 words is the sweet spot, but a
couple hundred words and those external memory primitives would really open
up the design space, at least in the way I was looking at using this kind of
processing. You could even do creative things like having your north
neighbor strobe the address to the SRAM on the vertical bus a while before
sending the actual work south to your node. Design the sequence right and
the needed SRAM data is already available, minimizing bus+SRAM latencies.
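
A rough way to see what strobing the address early buys is to model the stall per work item. This is only a sketch: the function names and the cycle counts are made-up assumptions for illustration, not measurements of any real part.

```python
def stall_cycles(sram_latency, lead_time):
    """Cycles a node waits for SRAM data when the address was strobed
    lead_time cycles before the work item arrived (hypothetical model)."""
    return max(0, sram_latency - lead_time)


def total_cycles(items, compute, sram_latency, lead_time):
    """Total cycles for a stream of items, each needing one SRAM read."""
    return items * (compute + stall_cycles(sram_latency, lead_time))


# Illustrative numbers only: 100 items, 20 compute cycles, 10-cycle SRAM.
no_prefetch = total_cycles(100, 20, 10, lead_time=0)
with_prefetch = total_cycles(100, 20, 10, lead_time=10)
print(no_prefetch, with_prefetch)
```

Once the lead time covers the SRAM latency, the stall term drops to zero and only the compute cycles remain.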

--
Andy Valencia
Home page: http://www.vsta.org/andy/
To contact me: http://www.vsta.org/contact/andy.html

Bernd Paysan

unread,
Feb 9, 2012, 6:12:29 PM2/9/12
to
rickman wrote:
> I still say you really don't "get" the GA144. I have repeatedly made
> the point that I don't think of the GA144 as a processor, I think of
> it more like an FPGA with processors instead of LUTs. No one worries
> if the 4 input LUT in an FPGA is not a good match to their algorithm.

Yes, but one *does* worry about the amount of on-chip memory and IO
bandwidth in an FPGA. And look at FPGAs, they have more memory and
significantly more IO bandwidth than the GA144, when e.g. having a
comparable amount of multipliers.

These rules of thumb are true for FPGAs, too, though of course they are
formulated differently. The rule of thumb there is that per 4-input
LUT, you need a DFF, and per LUT+DFF, you need about 100 bits of on-chip
SRAM. This is quite a lot of SRAM per 4-input logic function.

When you want to compute something, you need data, computation without
data is meaningless. To store data, you need memory.

Elizabeth D. Rather

unread,
Feb 9, 2012, 7:59:09 PM2/9/12
to
As I understand it, the theory with the GA parts is that data *flows
through* one or more cores that are performing various procedures on it
before ultimately storing it in off-chip RAM or sending out to a device.
Therefore the requirement for data storage on a single cell is minimal.
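
The flow-through idea can be sketched in software as a chain of small stages, each holding at most one word of state. The stage names and the sample values below are invented for illustration; this is not GA144 code, just a Python model of the style.

```python
def scale(samples, gain):
    """Stage 1: multiply each sample by a constant. No stored state."""
    for s in samples:
        yield s * gain


def difference(samples):
    """Stage 2: first difference. Keeps exactly one previous sample."""
    prev = 0
    for s in samples:
        yield s - prev
        prev = s


def clip(samples, limit):
    """Stage 3: clamp each sample into [-limit, limit]. No stored state."""
    for s in samples:
        yield max(-limit, min(limit, s))


def pipeline(source):
    """Chain the stages; data flows through one sample at a time."""
    return clip(difference(scale(source, 2)), 5)


print(list(pipeline([1, 2, 5, 9, 9])))  # [2, 2, 5, 5, 0]
```

No stage ever sees the whole data set, which is the sense in which per-cell storage stays minimal.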

Cheers,
Elizabeth

--
==================================================
Elizabeth D. Rather (US & Canada) 800-55-FORTH
FORTH Inc. +1 310.999.6784
5959 West Century Blvd. Suite 700
Los Angeles, CA 90045
http://www.forth.com

"Forth-based products and Services for real-time
applications since 1973."
==================================================

rickman

unread,
Feb 9, 2012, 10:58:32 PM2/9/12
to
On Feb 9, 6:12 pm, Bernd Paysan <bernd.pay...@gmx.de> wrote:
> rickman wrote:
> > I still say you really don't "get" the GA144.  I have repeatedly made
> > the point that I don't think of the GA144 as a processor, I think of
> > it more like an FPGA with processors instead of LUTs.  No one worries
> > if the 4 input LUT in an FPGA is not a good match to their algorithm.
>
> Yes, but one *does* worry about the amount of on-chip memory and IO
> bandwidth in an FPGA. And look at FPGAs, they have more memory and
> significantly more IO bandwidth than the GA144, when e.g. having a
> comparable amount of multipliers.
>
> These rules of thumb are true for FPGAs, too, though of course they are
> formulated differently.  The rule of thumb there is that per 4-input
> LUT, you need a DFF, and per LUT+DFF, you need about 100 bits of on-chip
> SRAM.  This is quite a lot of SRAM per 4-input logic function.

I don't know where you get your rules of thumb. I have been using
FPGAs for 15 years and I've never heard this. In fact, in 1995 there
was NO on chip memory other than the LUTs which in Xilinx parts could
be used as 16 bits of SRAM per 4-input LUT. Even the DFF is not at
all standard. I have seen a number of devices that provide 3 DFF for
4 LUTs because they know all the DFFs in a design are seldom used.
The reason there was no on chip memory? Because the devices were
routing limited. It was only after the density of chips increased to
a point where they could provide better routing resources and enough
logic that it was better to include ram than more logic. So there has
always been a tradeoff.
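
To spell out the "16 bits of SRAM per 4-input LUT" remark: a 4-input LUT is just a 16-entry truth table, so the same 16 configuration bits can be read as logic or as a tiny memory addressed by the four inputs. A minimal model, with names of my own choosing:

```python
def lut4(config, a, b, c, d):
    """Look up one output bit of a 4-input LUT.

    config is a 16-bit integer holding the truth table; the four
    inputs form the 4-bit address of the bit to read.
    """
    index = a | (b << 1) | (c << 2) | (d << 3)
    return (config >> index) & 1


AND4 = 1 << 15   # only address 0b1111 holds a 1
XOR4 = 0x6996    # 1 at every address with an odd number of set bits

print(lut4(AND4, 1, 1, 1, 1), lut4(XOR4, 1, 0, 0, 0))
```

Used as logic, config is fixed at load time; used as RAM, the same bits are rewritten at run time.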

Today the density of FPGAs allows DSP modules by the thousands, memory
by the Mbit and nearly a million LUTs. Potentially you could put the
equivalent of dozens of GA144 devices in a single Virtex 6 device.
What does that have to do with the GA144?

Sure more memory is better... as long as you don't have to give up
anything to get it. The point is that the GA144 is a useful device,
if you don't let your prejudices get in the way.


> When you want to compute something, you need data, computation without
> data is meaningless.  To store data, you need memory.
>
> --
> Bernd Paysan
> "If you want it done right, you have to do it yourself" http://bernd-paysan.de/

Ok, believe in the "rules of thumb". In the meantime others will be
making products with the GA144. I may be one of them shortly.

Rick

Arnold Doray

unread,
Feb 10, 2012, 5:23:17 AM2/10/12
to
On Thu, 09 Feb 2012 14:59:09 -1000, Elizabeth D. Rather wrote:

> As I understand it, the theory with the GA parts is that data *flows
> through* one or more cores that are performing various procedures on it
> before ultimately storing it in off-chip RAM or sending out to a device.
> Therefore the requirement for data storage on a single cell is minimal.

I don't believe there is any disagreement on this point. The argument is
whether there is a "mismatch" between how fast computation is done and
how fast data can be read/written off/to external memory to feed that
computation.

Of course, any "mismatch" is relative to the task at hand. But still,
it's a valid criticism because many traditional tasks might not be a good
fit for the GA144 because of this mismatch.
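
One hedged way to make the "mismatch" concrete is a simple bound: the sustained rate is the smaller of the peak compute rate and what the I/O can feed. The function and parameter names are mine, and the figures in the example are placeholders, not GA144 specifications.

```python
def sustained_ops_per_s(peak_ops_per_s, io_bits_per_s, bits_per_op):
    """Upper bound on sustained operations per second.

    bits_per_op is how many bits of external I/O the algorithm
    needs per operation (an algorithm property, assumed known).
    """
    if bits_per_op <= 0:
        return peak_ops_per_s          # no external data needed
    return min(peak_ops_per_s, io_bits_per_s / bits_per_op)


def is_io_bound(peak_ops_per_s, io_bits_per_s, bits_per_op):
    """True when the I/O, not the cores, limits throughput."""
    return sustained_ops_per_s(
        peak_ops_per_s, io_bits_per_s, bits_per_op) < peak_ops_per_s


# Placeholder figures: 1e9 ops/s peak, 1e8 bits/s of external I/O.
print(is_io_bound(1e9, 1e8, 1.0))    # one bit per op: starved
print(is_io_bound(1e9, 1e8, 0.05))   # 20 ops per bit: cores are the limit
```

Whether a given task is a good fit then comes down to its bits_per_op, which is exactly the "relative to the task at hand" caveat.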

The $1M question is whether there are commercially compelling
applications for this chip despite this limitation.

Cheers,
Arnold

Arnold Doray

unread,
Feb 10, 2012, 6:02:36 AM2/10/12
to
On Tue, 07 Feb 2012 16:35:17 +0100, Bernd Paysan wrote:

> Arnold Doray wrote:
>
>> On Sat, 21 Jan 2012 23:40:04 +0100, Bernd Paysan wrote:
>>> I don't think you understood what I wrote. I wrote that GPGPUs -
>>> which have many cores, just like GA144 - do actually provide their
>>> many cores with an adequate memory interface and an adequate amount of
>>> local memory for each core - GA144 does not, the nodes are severely
>>> memory-starved for all but a few algorithms, and external memory
>>> access is cumbersome and slow.
>>>
>>>
>> The GA144 and GPGPUs are meant for vastly different markets. I don't
>> see any meaningful comparisons being drawn between them. That they rely
>> on multiple simple cores does not make them comparable either. That
>> would be like comparing radios and televisions just because both use
>> transistors.
>
> You don't understand. It does not matter which target market it is, it
> matters that you usually need memory and bandwidth for computation. This
> is a very fundamental issue. It does not matter what power consumption
> GA144 has, when you can't get the data in to perform the operation.
>

Yes, that's true: You need to get the data into the CPU quickly enough
otherwise your computation waits on I/O.

But even so, this rule simply means you have to be mindful of the data
needs of your program. Yes, a lot of the traditional apps won't fit the
GA144's limitations, but that doesn't mean the GA144 is merely a
"research" chip as you claim.

>>> This means GA144 is severely limited in what it can be used for.
>>
>> "severely limited" is a relative term. One poster has used a similar
>> device (the SeaForth chip) for music synthesizers, machine vision, etc.
>> and found it much better compared to the alternatives. In these cases,
>> the "slowness" of the GA144's I/O is not an issue. You can just stream
>> data into it from external SDRAM.
>
> In these cases the number of operations per data from the external SDRAM
> is sufficiently large, and the internal state needed to perform these
> algorithms sufficiently small - I know that for FIR filters and edge
> detection on images, this is actually true, they can operate with very
> little state and really small programs. Other approaches at
> synthesizers like sample-based ones will not work that well on GA144.
>
> The rule of thumb is that for one operation you need one bit of memory
> IO (with the amount of memory the GA144 processing element has). There
> probably is a bell-shaped curve around algorithms where some use more
> memory, and some use less. You can find the algorithms where GA144 is
> useful, but they are likely to be only a few, because GA144 is clearly
> unbalanced - it does not allow one bit of memory IO per core operation,
> it allows much less.
>
> The less balanced an architecture is, the fewer purposes you will find
> for it.

Yes, but have you confused "number of algorithms fitting the GA144" with
"number of deployments that the GA144 can go on"? The latter is the only
economic metric any business cares about. A single application on a
billion deployments can still make a compelling business case.

Also, even if your app had to wait on I/O, there could be other reasons
why using the GA144 might still make sense.

Cheers,
Arnold


Arnold Doray

unread,
Feb 10, 2012, 6:16:53 AM2/10/12
to
On Thu, 09 Feb 2012 19:58:32 -0800, rickman wrote:

> Ok, believe in the "rules of thumb". In the meantime others will be
> making products with the GA144. I may be one of them shortly.
>
> Rick

This reminds me of an Isaac Asimov short story: A Scientist proves that a
force field technology could never be useful because an enormous amount
of energy is needed to sustain a field for any reasonable length of time.
Meanwhile, a Technician, unaware of the scientist's proof, comes up with
a practical force field -- instead of a continuously sustained field, the
Technician's invention switches the field on/off for billionths of a
second at a time.

Cheers,
Arnold

Bernd Paysan

unread,
Feb 10, 2012, 11:27:47 AM2/10/12
to
Arnold Doray wrote:

> On Thu, 09 Feb 2012 19:58:32 -0800, rickman wrote:
>
>> Ok, believe in the "rules of thumb". In the meantime others will be
>> making products with the GA144. I may be one of them shortly.
>
> This reminds me of an Isaac Asimov short story: A Scientist proves
> that a force field technology could never be useful because an
> enormous amount of energy is needed to sustain a field for any
> reasonable length of time. Meanwhile, a Technician, unaware of the
> scientist's proof, comes up with a practical force field -- instead of
> a continuously sustained field, the Technician's invention switches
> the field on/off for billionths of a second at a time.

However, this time it is the other way round. The rules of thumb
that go into FPGAs, GPGPUs and other devices used all over the world by
many people *are* derived from practical needs, from real algorithms and
measurements. 15 years ago, as stated, FPGAs did not follow these rules
of thumb, and as a consequence were severely limited. The relationship
between calculation and memory resources was derived from real designs,
not from scientific theories. The engineers at Altera and Xilinx try to
find a good compromise when they put various hard macros (memory,
multipliers) on their chips.

If Chuck's theory is (as Elizabeth stated) that data flows through the
nodes, and only very minimal storage is needed, *this* is the theory
part here. The practice shows otherwise, it shows that algorithms need
state and memory. Not all of them in the same amount; rules of thumb
are averages found by analyzing data.

And a "rule of thumb" is clearly not a scientific proof of something.
It is a practical rule, which is not a universal truth. So go ahead
and make your product based on the GA144. There will be niches where
the memory-starved CPU nodes and the limited IO bandwidth are not a
problem. But please acknowledge it: This is a special purpose device,
deliberately shifted to a particular corner. Maybe it's even a valid
business strategy, as the center of the memory/computation power chart
is already full of players.

Bernd Paysan

unread,
Feb 10, 2012, 11:54:03 AM2/10/12
to
I've already addressed this issue: When you have your billion
deployment case, you build the chip around your requirements, no matter
what's available on the market, because building a special purpose chip
for a billion deployments is *always* worth the price. There are very
few identical chips built in that order of magnitude, if you don't count
discretes and memory.

This is not what Chuck did. GA144 is not built around some requirements
for a mass-market product. It is clearly a solution looking for a
problem, not the other way round. Solutions looking for problems can be
successful, but it is by accident. This is a product with an unknown
business case, and as Chuck doesn't know what people will use this
product for, it is research. It is exploration of unknown territory.
Normal chip design does not work like this, you have a customer, you
know what the customer wants, and you have some freedom for
implementation details, but the requirements are there before you start
(they will change while the project is going, but not that much).

I've been watching Chuck for the last 20 years doing microprocessors here
and there, and it was always interesting to watch. The ratio of useful
products to the work and genius put into them, however, was very poor.

rickman

unread,
Feb 16, 2012, 4:12:17 PM2/16/12
to
On Feb 10, 11:27 am, Bernd Paysan <bernd.pay...@gmx.de> wrote:
> Arnold Doray wrote:
> > On Thu, 09 Feb 2012 19:58:32 -0800, rickman wrote:
>
> >> Ok, believe in the "rules of thumb". In the meantime others will be
> >> making products with the GA144. I may be one of them shortly.
>
> > This reminds me of an Isaac Asimov short story: A Scientist proves
> > that a force field technology could never be useful because an
> > enormous amount of energy is needed to sustain a field for any
> > reasonable length of time. Meanwhile, a Technician, unaware of the
> > scientist's proof, comes up with a practical force field -- instead of
> > a continuously sustained field, the Technician's invention switches
> > the field on/off for billionths of a second at a time.
>
> However, this time it is the other way round. The rules of thumb
> that go into FPGAs, GPGPUs and other devices used all over the world by
> many people *are* derived from practical needs, from real algorithms and
> measurements. 15 years ago, as stated, FPGAs did not follow these rules
> of thumb, and as a consequence were severely limited. The relationship
> between calculation and memory resources was derived from real designs,
> not from scientific theories. The engineers at Altera and Xilinx try to
> find a good compromise when they put various hard macros (memory,
> multipliers) on their chips.

You talk of these rules of FPGAs, but I have been designing with FPGAs
since the XC3000 parts and I don't recall ever hearing such rules
stated anywhere. Even if such rules were given by someone at
sometime, just like any generalization, they only apply when the
underlying *assumptions* are true. Often people forget the underlying
assumptions and only remember the rule.

For the most part, the amount of memory on an FPGA is a matter of
marketing rather than engineering really. They sell the same parts to
a wide market with a focus on their biggest customers. So the chip
designs optimize their profits and don't really constitute an FPGA
design principle.


> If Chuck's theory is (as Elizabeth stated) that data flows through the
> nodes, and only very minimal storage is needed, *this* is the theory
> part here. The practice shows otherwise, it shows that algorithms need
> state and memory. Not all of them in the same amount; rules of thumb
> are averages found by analyzing data.

Here is a "niche" where very little working memory is required,
Software Defined Radio (SDR). The basic receiver chain can be done on
a GA144 entirely from the IF to audio by using the on chip ADCs and
DACs. No external memory would be needed other than the Flash for
program storage during boot. But then I guess that is just a tiny
market... ;^)
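
As a rough illustration of how little state the front of such a receiver chain needs, here is a software model of mixing an IF down to baseband and decimating with an integrate-and-dump stage. The function name, the sample rate and the LO frequency are assumptions chosen for the example; a real design would use fixed-point arithmetic and a table-driven oscillator rather than floating-point cos/sin.

```python
import math


def mix_and_decimate(samples, f_lo, f_s, decim):
    """Mix real IF samples to baseband I/Q, then average every
    decim samples. Working state is two accumulators plus the
    sample counter."""
    acc_i = acc_q = 0.0
    out = []
    for n, x in enumerate(samples):
        phase = 2 * math.pi * f_lo * n / f_s
        acc_i += x * math.cos(phase)      # in-phase product
        acc_q -= x * math.sin(phase)      # quadrature product
        if (n + 1) % decim == 0:          # dump and reset
            out.append((acc_i / decim, acc_q / decim))
            acc_i = acc_q = 0.0
    return out


# A unit carrier exactly at the LO frequency lands at DC with I = 0.5.
carrier = [math.cos(2 * math.pi * 1 * n / 8) for n in range(16)]
print(mix_and_decimate(carrier, f_lo=1, f_s=8, decim=8))
```

Later stages (channel filter, demodulator) need more taps, but each still works on a short sliding window rather than a large buffer.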


> And a "rule of thumb" is clearly not a scientific proof of something.
> It is a practical rule, which is not a universal truth. So go ahead
> and make your product based on the GA144. There will be niches where
> the memory-starved CPU nodes and the limited IO bandwidth are not a
> problem. But please acknowledge it: This is a special purpose device,
> deliberately shifted to a particular corner. Maybe it's even a valid
> business strategy, as the center of the memory/computation power chart
> is already full of players.

I won't argue with those words because they also describe GPUs and
DSPs, special purpose devices deliberately shifted to a particular
corner.. and very large corners at that.

I have never denied that the GA144 is memory starved. I have tried to
point out that by focusing on this limitation you are unreasonably
limiting the application of the device. I maintain that there are
plenty of useful apps that can be done using such a device and you
seem to insist that they are very few. Just ten years ago most
engineers would have said the Atom and the ARM processors were only
good for "niche" processing applications. Their low power constraints
limited the processing so much they would never be able to suit the
"center of the computation power chart". The market has shifted
because of their existence, and they now define the center of the market.

Maybe the GA144 won't be a hugely successful device. I don't like the
$20 price tag. I think the Raspberry Pi people are not paying that
much for the Broadcom chip, or they couldn't sell the board for $25.
A GA32 at $5 would be much more suitable for my needs. I suspect the
family won't really take off until it is updated for some market
realities and moved to a smaller process enabling perhaps 1024 nodes.
I'd also like to see more RAM, but I am going to explore the device as
is and see where it goes from here. Hopefully not the dust bin of
history.

Rick

Paul Rubin

unread,
Feb 16, 2012, 4:45:16 PM2/16/12
to
rickman <gnu...@gmail.com> writes:
> Here is a "niche" where very little working memory is required,
> Software Defined Radio (SDR). The basic receiver chain can be done on
> a GA144 entirely from the IF to audio by using the on chip ADCs and
> DACs. No external memory would be needed other than the Flash for
> program storage during boot. But then I guess that is just a tiny
> market... ;^)

I'm having trouble believing that, for spread spectrum schemes where the
spreading function is complicated, or you want to handle wideband
signals (e.g. GNU SDR can demodulate multiple FM broadcasts from a
single A/D simultaneously, needing sample rate in the 10's or 100's of
MHz if I understand correctly).

> I won't argue with those words because they also describe GPUs and
> DSPs, special purpose devices deliberately shifted to a particular
> corner.. and very large corners at that.

Bernd made the point that applications for GPU's and DSP's were well
known before GPU's and DSP's were actually developed, and the
applications drove the development. Is there a known application that
drove the development of the GA144?

> I maintain that there are plenty of useful apps that can be done using
> such a device and you seem to insist that they are very few.

Well, few and plenty are relative terms... try to think of applications
for DSP's or microcontrollers and they come to mind far more rapidly
than they do for the GA144, it seems to me.

> A GA32 at $5 would be much more suitable for my needs. I suspect the
> family won't really take off until it is updated for some market
> realities and moved to a smaller process enabling perhaps 1024 nodes.

I haven't heard of anyone figuring out what to do with even 144 nodes (given
the GA144's memory and i/o constraints) so I have a hard time seeing a
GA1024 as doing much good without a lot of other new capabilities. Even
the GA144 (it seems to me) doesn't have enough interconnect between its
nodes, which could be added fairly easily, according to an earlier clf
discussion with Jeff Fox. I do like the idea of a $5 GA32 and it seems
to me that it could have similar total i/o capacity to the GA144,
bringing it much closer to Bernd's suggested i/o to mips ratio.

> I'd also like to see more RAM, but I am going to explore the device as
> is and see where it goes from here. Hopefully not the dust bin of
> history.

I'd like to see an FPGA with some ram blocks, some DSP slices, and some
GA controller nodes as hard macros. I do like the spirit of the GA
folks, and I wish them success one way or the other. I'm still
scratching my head on the technical side.

Tarkin

unread,
Feb 16, 2012, 4:58:46 PM2/16/12
to
On Jan 19, 5:27 pm, Bernd Paysan <bernd.pay...@gmx.de> wrote:
> rickman wrote:
> > I don't see the comparison to GPUs as they are power hungry devices
> > and do not lend themselves to applications outside of PCs.  I've never
> > seen them used in any other designs.  Although they can be separately
> > programmed, they don't run an OS and don't work standalone.
>
> Well, GPUs as such are power hungry devices, but in terms of
> power/compute power, they are pretty damn good. A teraflops (single
> precision) for 200W?  Well, if you only need a gigaflop, a GPU is too
> big for you.
>
> > I think your comparison with FPGAs is not too far off target.  I
> > question the price issue as the Cyclone 5 devices are quite large,
> > even the smallest.
>
> Well, it's a 28nm device.  It isn't large, even the largest.  It just
> has a lot of gates on it.  Because it uses the latest available
> technology.
>
> > But in general the GA144 is priced in the same
> > ballpark as low end FPGAs.  I don't think it would be easy to get the
> > same processing speed in a low end FPGA, but then a GA144 won't be so
> > easy to get working with 100+ MHz signals.
>
> A GA144 in 28nm would probably be a much more impressive device than a
> Cyclone 5.  But it isn't available.  I think the Cyclone 5 with the 144
> multipliers will beat a GA144 hands down in whatever task you can
> imagine.
>
> I've done my b16 in similar processes as Chuck did his stuff.  But that
> was because my b16 always was just an add-on to some custom-specific
> ASIC with analog and power components, and analog+power usually is added
> to decade old processes (there is a current counter-trend to this,
> because people want SoCs, and the power+analog should not go into
> another die).  The main reason to deploy the b16 never was speed, it
> always was area and power consumption (be quick to finish is good for
> power consumption, too).
>
> The IO stuff always was dedicated Verilog, not bit-banging in software.
> It always cooperated with the CPU, or used DMA to write directly into
> memory.
>
> I heard Chuck talking at EuroForth about 10 years ago, when he started
> this multi-chip endeavor. He clearly said that he thinks this is a cool
> idea, but did not know what kind of product that would go into.
>
> This is the definition of research project.  You have a cool idea, but
> no product.  Research is necessary.  But to make it a product, you have
> to think about how people will use it.  In a product-driven development,
> the requirements are first, the solution (the actual implementation)
> comes next.
>
> --
> Bernd Paysan
> "If you want it done right, you have to do it yourself" http://bernd-paysan.de/

Isn't AI the 800-pound (362.87 kg) gorilla in the room?
Each "neuron" needs ins, outs, weights, and a function.
With low power consumption, 'stacking' (or otherwise interconnecting)
these individual nets draws less power than say, a cheap cluster of
x86 boxen (Don't know about an ARM array, though- an ARMy?).
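
For concreteness, the "ins, outs, weights, and a function" picture is the textbook artificial neuron, sketched below. The bias term and the step activation are my additions for the example, and nothing here is specific to the GA144.

```python
def step(x):
    """Threshold activation: fire (1) when the sum is non-negative."""
    return 1 if x >= 0 else 0


def neuron(inputs, weights, bias, activation):
    """Weighted sum of the inputs plus a bias, passed through a function."""
    total = sum(i * w for i, w in zip(inputs, weights)) + bias
    return activation(total)


# With these hand-picked weights the neuron behaves as a 2-input AND.
for a in (0, 1):
    for b in (0, 1):
        print(a, b, neuron([a, b], [1, 1], -1.5, step))
```

Per neuron the stored state is just the weights and the bias, a handful of words for small fan-in.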

TTFN,
Tarkin

van...@vsta.org

unread,
Feb 16, 2012, 8:11:15 PM2/16/12
to
Tarkin <tark...@gmail.com> wrote:
> Isn't AI the 800-pound (362.87 kg) gorilla in the room?
> Each "neuron" needs ins, outs, weights, and a function.

I believe current studies on human neurons show that in both signalling and
state there is quite a bit more complexity than was originally appreciated.
For interconnect, simpler neural nets (think earthworms) use *timing* of
inter-neuron traffic as a control mechanism, which greatly simplifies the
neural net topology. For a long time, it was thought that time signalling
was not used in the much more complex human neural net. More recent studies
are indicating that it's in fact being used for a whole deeper, previously
unappreciated level of complexity.

Similarly, the standard model is that each neuron is a simple actor in the
net. Again, recent studies are finding vastly more complex interior state to
the neuron's mechanism (at least in humans).

So we may not know enough about AI neurons to be able to look at a particular
building block and decide whether or not it's a fit.

Tarkin

unread,
Feb 16, 2012, 11:32:00 PM2/16/12
to
On Feb 16, 8:11 pm, van...@vsta.org wrote:
To be sure, but I am suggesting that the GA chip(s) might prove
to be a closer approximation at a power level far less
than what is being used currently.

TTFN,
Tarkin

Bernd Paysan

unread,
Feb 18, 2012, 7:50:21 PM2/18/12
to
rickman wrote:
> You talk of these rules of FPGAs, but I have been designing with FPGAs
> since the XC3000 parts and I don't recall ever hearing such rules
> stated anywhere. Even if such rules were given by someone at
> sometime, just like any generalization, they only apply when the
> underlying *assumptions* are true. Often people forget the underlying
> assumptions and only remember the rule.
>
> For the most part, the amount of memory on an FPGA is a matter of
> marketing rather than engineering really. They sell the same parts to
> a wide market with a focus on their biggest customers. So the chip
> designs optimize their profits and don't really constitute an FPGA
> design principle.

This is a really funny remark. Do you understand what you say? IMHO,
you prove my point. Yet you seem to disagree with me.

Yes, FPGAs are customer-driven. The amount of memory, LUTs, DSP
functions and such that go into an FPGA are based on customer
requirements, and weighted by prospected sales, so customers with odd
requirements and low volume will not get an optimal device, while the
majority gets a good fit.

This is what this "rule of thumb" is about, this is the whole point: It
defines a good balance between these different resources, obtained by
actual requirements. Like I said, GPUs, DSPs, FPGAs, and full custom
chips are usually designed to meet requirements from existing customers.
You may call this "marketing", I don't care what name it has. It is an
actual feedback loop between consumer and producer, optimizing the
product to the needs of the majority of its consumers.

FPGAs sell to engineers, and if you have an unbalanced FPGA, which
"looks good on paper", because some numbers are bigger than the
competition, it will not sell, because it will not meet the demand of
the customers, and the customers are intelligent enough to see the
problem. The purchase decision for an FPGA usually is quite low in the
food chain, because it is not expensive. Therefore, the pointy hair
factor is low.

The actual ratios between IO bandwidth, internal memory, and
computational functions usually is *not* determined by the rules of
thumb as I describe, it is determined by the actual requirements. The
rule of thumb, as usual, is only a heuristic to get there *without*
doing all the hard work of actually analyzing the requirements. That's
the whole point of a "rule of thumb".

Concerning audio applications: I still remember the memory requirements
for the two filters I did for the DDFA. The constraints are a bit
different from the GA144, because I used a single-cycle 35x32 MAC unit,
and designed the other components to share that unit if necessary. A
GA144 node needs about 200 cycles to do this multiplication and
accumulation, but the cycle time is shorter, so maybe one node is a
factor 30 slower. I.e. the computation power of one GA144 (with all 144
nodes) is about four of my multipliers (I had six in that chip in
total). The whole amount of RAM in the DDFA design is about 30k bytes.
Scaling that down (GA144 computation power 2/3rd of Zetex DDFA), we get
20k bytes. This is roughly what the GA144 has. So to speak, for an
audio signal processing unit, computation power and memory are indeed
ok. And maybe, I'm miscalculating my rule of thumb, because I use the
~1ns cycle time as baseline. If I use a 36x36 MAC operation as
baseline, one CPU in a GA144 is so much slower that the balance shifts
towards numbers I'm used to. After all, our DDFA design did fit well
into FPGAs, using up LUTs, memory, and DSP functions.
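
The scaling argument above can be written out step by step. The factor of 30, the six multipliers and the 30k bytes are the figures from this post; the 64 words per node is the figure mentioned earlier in the thread, and the 18-bit word width is an assumption taken from the published F18A description, not from anything stated here.

```python
NODES = 144
SLOWDOWN_PER_NODE = 30      # one node vs. one single-cycle MAC (from the post)
DDFA_MULTIPLIERS = 6        # multipliers in the DDFA chip (from the post)
DDFA_RAM_BYTES = 30_000     # "about 30k bytes" (from the post)
RAM_WORDS_PER_NODE = 64     # figure mentioned earlier in the thread
WORD_BITS = 18              # assumption: F18A word width


def equivalent_multipliers():
    """All nodes together, expressed in single-cycle MAC units."""
    return NODES / SLOWDOWN_PER_NODE


def scaled_ram_bytes(multipliers=4):
    """DDFA RAM scaled by the compute ratio; the post rounds to four."""
    return DDFA_RAM_BYTES * multipliers // DDFA_MULTIPLIERS


def ga144_ram_bytes():
    """Total on-chip RAM across all nodes, in bytes."""
    return NODES * RAM_WORDS_PER_NODE * WORD_BITS // 8


print(equivalent_multipliers(), scaled_ram_bytes(), ga144_ram_bytes())
```

Under those assumptions the scaled requirement and the on-chip total both come out near 20k bytes, which matches the "roughly what the GA144 has" estimate.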

But then, doing these rather wide multiplications is an abuse of the
GA144; it is not designed to do them efficiently.

rickman

unread,
Feb 20, 2012, 7:35:25 PM2/20/12
to
On Feb 16, 4:45 pm, Paul Rubin <no.em...@nospam.invalid> wrote:
> rickman <gnu...@gmail.com> writes:
> > Here is a "niche" where very little working memory is required,
> > Software Defined Radio (SDR). The basic receiver chain can be done on
> > a GA144 entirely from the IF to audio by using the on chip ADCs and
> > DACs. No external memory would be needed other than the Flash for
> > program storage during boot. But then I guess that is just a tiny
> > market... ;^)
>
> I'm having trouble believing that, for spread spectrum schemes where the
> spreading function is complicated, or you want to handle wideband
> signals (e.g. GNU SDR can demodulate multiple FM broadcasts from a
> single A/D simultaneously, needing sample rate in the 10's or 100's of
> MHz if I understand correctly).

I'm not clear on what you have trouble believing? I am sure there are
any number of things that are hard to impossible to do on a GA144, but
there are lots of SDRs that can be implemented on a GA144 with an
appropriate IF input. No, I don't think the ADCs will handle a 70 MHz
IF with 20 MHz of bandwidth, but that is not the spec of every SDR.


> > I won't argue with those words because they also describe GPUs and
> > DSPs, special purpose devices deliberately shifted to a particular
> > corner.. and very large corners at that.
>
> Bernd made the point that applications for GPU's and DSP's were well
> known before GPU's and DSP's were actually developed, and the
> applications drove the development. Is there a known application that
> drove the development of the GA144?

I don't get the point of the question. I think you have snipped too
much context. Bernd was talking about the GA144 being "special
purpose" as if that meant its market was limited. My point is that
DSPs and GPUs are "special purpose" and still have huge markets. For
the DSP the market (cell phones) didn't exist initially if I
remember.


> > I maintain that there are plenty of useful apps that can be done using
> > such a device and you seem to insist that they are very few.
>
> Well, few and plenty are relative terms... try to think of applications
> for DSP's or microcontrollers and they come to mind far more rapidly
> than they do for the GA144, it seems to me.

That is largely because you aren't familiar with the GA144 I expect.
I don't see any shortage of markets for it. What markets is the GA144
excluded from that the DSPs and MCUs address? The only limitation
that I see is that the entry level price is over $20 currently because
you need to add a Flash and a RAM. But that is also true of many DSPs
depending on the particular unit. Also, the GA144 is just one chip,
the family is yet to be filled out and there are lots of variations
that can be done in the future. The first DSP and MCU were real
pieces of junk compared to what we have today.


> > A GA32 at $5 would be much more suitable for my needs. I suspect the
> > family won't really take off until it is updated for some market
> > realities and moved to a smaller process enabling perhaps 1024 nodes.
>
> I haven't heard of anyone figuring out what to do with even 144 nodes (given
> the GA144's memory and i/o constraints) so I have a hard time seeing a
> GA1024 as doing much good without a lot of other new capabilities. Even
> the GA144 (it seems to me) doesn't have enough interconnect between its
> nodes, which could be added fairly easily, according to an earlier clf
> discussion with Jeff Fox. I do like the idea of a $5 GA32 and it seems
> to me that it could have similar total i/o capacity to the GA144,
> bringing it much closer to Bernd's suggested i/o to mips ratio.

You keep talking about what you can imagine. I'm not totally clear on
what you don't like about various things, but it is very clear to me
you keep seeing the GA144 the same way you see a DSP or an MCU. Until
you see it as it is, I don't think you will find much use for it. I
would love to read what Jeff would have to say about it. I have
always learned a lot from reading Jeff's posts here.


> > I'd also like to see more RAM, but I am going to explore the device as
> > is and see where it goes from here. Hopefully not the dust bin of
> > history.
>
> I'd like to see an FPGA with some ram blocks, some DSP slices, and some
> GA controller nodes as hard macros. I do like the spirit of the GA
> folks, and I wish them success one way or the other. I'm still
> scratching my head on the technical side.

I think the FPGA is the analog for the GA144 (or GA1024), much more so
than MCUs or DSPs. Designers often do DSP on FPGAs. Does anyone
worry that the FPGA isn't keeping LUTs busy, etc? No, they decompose
the problem to fit the available resources. Focus on what the GA144
is rich with and work around what people see as limitations.

Rick

rickman

unread,
Feb 20, 2012, 7:47:00 PM2/20/12
to
This discussion is starting to get personal and I don't want to
argue. I'll just say that your numbers for the multiply are off by a
minimum of a factor of two. But even if they were on target, they
have no value in this conversation because they apply to one design
niche. Are you really going to try to itemize all the designs that
the GA144 isn't good for? Not all audio requires 35 bit
multiplies. I'm pretty sure the GA144 was intended to do audio since
one of the apps some of the predecessor chips were touted for was home
theater.

Anyway, do you think there is much use to continuing this
conversation? I think we have plowed this furrow enough and will just
continue to see the thing differently.

Rick

van...@vsta.org

unread,
Feb 20, 2012, 8:03:09 PM2/20/12
to
rickman <gnu...@gmail.com> wrote:
> I'm not clear on what you have trouble believing? I am sure there are
> any number of things that are hard to impossible to do on a GA144, but
> there are lots of SDRs that can be implemented on a GA144 with an
> appropriate IF input.

Has anybody verified their MD5 appnote? Does it generate correct results?
What kind of performance does it get? Who designed and wrote the code, and
how long did it take to develop?

It seems like the only non-trivial piece of functionality to be documented
for the GA144. If it's correct and fast after less than a week's effort, the
GA144 strategy is probably OK. If it's buggy and slow after months of
expert development... it could flag a problem.

Bernd Paysan

unread,
Feb 20, 2012, 8:27:15 PM2/20/12
to
rickman wrote:
> This discussion is starting to get personal and I don't want to
> argue. I'll just say that your numbers for the multiply are off by a
> minimum of a factor of two.

Are they? The GA144 core can do a 18x1 multiply step, and AFAIK each
takes two cycles (full carry chain). To make that a 36x36
multiplication, you need 4 times 18x18, i.e. 144 cycles for the
multiplication steps alone. Add accumulation and a bit of overhead, and
you end up in "the order of magnitude" of 200, which I stated.
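The cycle count above can be reproduced with a small sketch. It assumes, as the post does, that each one-bit multiply step costs two cycles; the function names are made up for illustration, and the loop is a generic shift-and-add multiplier, not a model of the actual GA144 instruction set.

```python
def mul18_shift_add(a, b):
    """Multiply two unsigned 18-bit values one multiplier bit per step."""
    assert 0 <= a < (1 << 18) and 0 <= b < (1 << 18)
    product = 0
    steps = 0
    for bit in range(18):
        if (b >> bit) & 1:
            product += a << bit
        steps += 1  # one 18x1 multiply step per multiplier bit
    return product, steps


def mul36(a, b):
    """Build a 36x36 multiply from four 18x18 partial products."""
    mask = (1 << 18) - 1
    a_lo, a_hi = a & mask, a >> 18
    b_lo, b_hi = b & mask, b >> 18
    result = 0
    total_steps = 0
    for x, y, shift in ((a_lo, b_lo, 0), (a_lo, b_hi, 18),
                        (a_hi, b_lo, 18), (a_hi, b_hi, 36)):
        partial, steps = mul18_shift_add(x, y)
        result += partial << shift
        total_steps += steps
    return result, total_steps


CYCLES_PER_STEP = 2  # assumption taken from the post ("full carry chain")

a, b = 0x9ABCDEF12, 0x123456789
product, steps = mul36(a, b)
cycles = steps * CYCLES_PER_STEP
print(steps, cycles)
```

The sketch counts only the multiply steps; combining the four partial products and any node-to-node communication come on top, which is where the "order of magnitude of 200" estimate comes from.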

> But even if they were on target, they
> have no value in this conversation because they apply to one design
> niche. Are you really going to try to itemize all the designs that
> the GA144 isn't good for?

No, I'm trying to map a real-world design I know of to the GA144, which
is "more general purpose", and apparently good at audio-style
applications.

> Not all audio requires 35 bit
> multiplies. I'm pretty sure the GA144 was intended to do audio since
> one of the apps some of the predecessor chips were touted for was home
> theater.

I know. That's what the Zetex DDFA is touted for, as well - it is a
special-purpose chip, built for a particular product segment, and by
design a niche product. Nobody would argue that it is somehow
unbalanced, because it is designed to a particular target.

The 35 bit internal data path of the DDFA seems to be "absurd", but we
wanted a high quality audio processor and calculated how much noise FIR
filters add to the signal (yes, they do), and that was the
headroom we ended up with to keep this additional noise lower than the
actual noise of the system.

As I said: You *first* do the analysis of your requirements, and *then*
you build your product (where of course part of the analysis phase is
building a prototype). Dismissing some particular requirement which was
derived by sound engineering, just because the GA144 looks bad on it,
well, that's fanboi-ism.

> Anyway, do you think there is much use to continuing this
> conversation? I think we have plowed this furrow enough and will just
> continue to see the thing differently.

I'm trying to discuss; I haven't come to a final conclusion about the GA144
yet. If you don't want to discuss, because you "see things
differently", feel free to stop. This discussion started personally (I
said "this is a research chip, not well thought-through"), and I have no
problems with similar personal replies like "even a genius has a bad
day" or such, because, yes, I do have bad days, and I can be convinced.

Paul Rubin

unread,
Feb 20, 2012, 10:07:18 PM2/20/12
to
van...@vsta.org writes:
> Has anybody verified their MD5 appnote? Does it generate correct
> results? What kind of performance does it get? Who designed and
> wrote the code, and how long did it take to develop?

I haven't verified it (did they even post the code?) but I've studied the
doc for long enough to be convinced that it would be dog slow. I spent
a considerable amount of time trying to identify cryptography primitives
that could be implemented in the GA144 efficiently, and it's pretty
hard. It's bloody painful to code for the thing given its memory
constraints.

Paul Rubin

unread,
Feb 20, 2012, 10:47:52 PM2/20/12
to
rickman <gnu...@gmail.com> writes:
> I'm not clear on what you have trouble believing? I am sure there are
> any number of things that are hard to impossible to do on a GA144, but
> there are lots of SDRs that can be implemented on a GA144 with an
> appropriate IF input. No, I don't think the ADCs will handle a 70 MHz
> IF with 20 MHz of bandwidth, but that is not the spec of every SDR.

OK, you know more about SDR than I do, and I can think of some legacy
analog modulation schemes that a GA144 could possibly handle. With
digital modulation it may be more of a challenge, even in narrow bands.

> For the DSP the market (cell phones) didn't exist initially if I
> remember.

I thought DSP's were developed for use in the landline phone system long
before there were cell phones. That market, and the market for signal
processing filters in general, formerly had to be served by boxes full
of expensive analog circuitry.

> That is largely because you aren't familiar with the GA144 I expect.
> I don't see any shortage of markets for it. What markets is the GA144
> excluded from that the DSPs and MCUs address?

> The only limitation that I see is that the entry level price is over
> $20 currently because you need to add a Flash and a RAM.

The package is also pretty big, and the processor is slow unless you
have a way to put the parallelism to good use. The 700 MHz base clock
isn't that helpful when you consider how many cycles it takes to get the
data from another node or from external memory, to where you can use it.
The 18 bit wordsize is also pretty awkward for "computer" applications
(it's probably less awkward for signal processing).

> there are lots of variations that can be done in the future. The
> first DSP and MCU were real pieces of junk compared to what we have
> today.

I agree with you that future GA chips (if they are made) may be much
more directly useable than the GA144. But I think even the earliest
MCU's and DSP's enabled significant cost reductions over what could be
done without them. It was before my time but I think the first MCU
application was a desk calculator using the Intel 4004. The MCU
replaced dozens or 100's of chips used in earlier calculators. The DSP
replaced boards full of complex and fragile analog filter circuits.
What does the GA144 replace?

> You keep talking about what you can imagine. I'm not totally clear on
> what you don't like about various things, but it is very clear to me
> you keep seeing the GA144 the same way you see a DSP or an MCU. Until
> you see it as it is, I don't think you will find much use for it. I
> would love to read what Jeff would have to say about it. I have
> always learned a lot from reading Jeff's posts here.

I miss Jeff too, I was just reading his site last night in fact. But I
remember discussing the GA144 with him and still not being that
persuaded. One thing he claimed was that the Seaforth-based hearing aid
was equivalent to a DSP board that used 100 watts, requiring a
prospective user to carry a car battery around. After further
discussion it turned out that the hearing aid had been prototyped with a
high-end DSP that used 1 watt, not 100. The Seaforth chip used quite a
lot less than 1 watt, but it still wasn't clear to me that the functions
couldn't be implemented on a low-powered DSP with total consumption
comparable to the Seaforth chip's, rather than on the 1 watt part.

> I think the FPGA is the analog for the GA144 (or GA1024), much more so
> than MCUs or DSPs. Designers often do DSP on FPGAs. Does anyone
> worry that the FPGA isn't keeping LUTs busy, etc?

Don't they? I mean, there are small cheap FPGA's and big expensive
ones, and don't designers try to use the small cheap ones when they can
and the big expensive ones only if they must?

And certainly, if I had a giant FPGA (say one with a billion gates) I
would know right away what kinds of interesting new things I could do
with it (like implement a new CPU architecture, or a big
content-addressable memory, or something like DJ Bernstein's factoring
circuits) but it's harder for me to see what to do with a GA1024. If
ideas for it are so easy to come up with, perhaps you could post a few.
Most applications I've thought of for a GA processor can either be done
in 10-20 nodes, or else can't be done competitively with other
approaches at all. But I'm not a hardware guy so I agree with you that
maybe my vision is limited in this area.

Elizabeth D. Rather

unread,
Feb 21, 2012, 12:15:36 AM2/21/12
to
Greg Bailey, who's the head of that team, did a really great job coding
MD5 on an 8051 for me some years ago. I think if anybody can make that
hum satisfactorily it would be Greg.

Paul Rubin

unread,
Feb 21, 2012, 5:28:29 AM2/21/12
to
"Elizabeth D. Rather" <era...@forth.com> writes:
> Greg Bailey, who's the head of that team, did a really great job
> coding MD5 on an 8051 for me some years ago. I think if anybody can
> make that hum satisfactorily it would be Greg.

Yeah, the problem is that the GA144 just isn't a good match for that
algorithm, which relies heavily on 32-bit arithmetic and table lookups
in a 64*32 table, neither of which maps well onto the GA. I think it
might be possible to code the RC4 stream cipher in 3 nodes sort of
reasonably, though I got bogged down on optimizations when I tried to
code it. I may try again sometime. Shift register ciphers like E0
(used in Bluetooth) might be easier to code, but they generally have
poor security records (RC4 also has known imperfections).
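For reference, here is a minimal sketch of RC4 itself. It shows why the cipher is awkward on small nodes: the state is a 256-entry byte table that is swapped in place on every output byte, and even packed two bytes per 18-bit word that is 128 words, more than the 64 words per node mentioned elsewhere in this thread. The function name is made up for illustration, and this is ordinary Python, not GA144 code.

```python
def rc4_keystream(key, n):
    """Return n bytes of RC4 keystream for the given key (bytes)."""
    # Key scheduling: permute the 256-entry state table using the key.
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) & 0xFF
        s[i], s[j] = s[j], s[i]

    # Output generation: one table swap and one lookup per keystream byte.
    i = j = 0
    out = bytearray()
    for _ in range(n):
        i = (i + 1) & 0xFF
        j = (j + s[i]) & 0xFF
        s[i], s[j] = s[j], s[i]
        out.append(s[(s[i] + s[j]) & 0xFF])
    return bytes(out)


plaintext = b"Plaintext"
keystream = rc4_keystream(b"Key", len(plaintext))
ciphertext = bytes(p ^ k for p, k in zip(plaintext, keystream))
print(ciphertext.hex())
```

Every step depends on the whole table, so splitting it across nodes means a round trip between nodes for most lookups and swaps.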

Andrew Haley

unread,
Feb 21, 2012, 6:20:03 AM2/21/12
to
Paul Rubin <no.e...@nospam.invalid> wrote:

> I think it might be possible to code the RC4 stream cipher in 3
> nodes sort of reasonably, though I got bogged down on optimizations
> when I tried to code it. I may try again sometime. Shift register
> ciphers like E0 (used in Bluetooth) might be easier to code, but
> they generally have poor security records ...

Generally, maybe, but some have fabulous security records. The
shrinking generator springs immediately to my mind: no-one in the
public domain has even got close to breaking it.
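For readers who have not met it, the shrinking generator can be sketched in a few lines: two LFSRs run in step, and the data register's bit is emitted only when the selector register's bit is 1. The register sizes, taps and seeds below are toy values chosen for illustration, far too small for real use.

```python
from itertools import islice


def lfsr(state, taps, nbits):
    """Fibonacci LFSR: yield the low bit, then shift in the XOR of the taps."""
    while True:
        out = state & 1
        feedback = 0
        for t in taps:
            feedback ^= (state >> t) & 1
        state = (state >> 1) | (feedback << (nbits - 1))
        yield out


def shrinking_generator(data_bits, select_bits):
    """Emit a data bit only when the matching selector bit is 1."""
    for a, s in zip(data_bits, select_bits):
        if s:
            yield a


# Toy parameters (hypothetical): a 5-bit data LFSR and a 4-bit selector LFSR.
data = lfsr(state=0b10011, taps=(0, 2), nbits=5)
select = lfsr(state=0b1001, taps=(0, 1), nbits=4)
keystream = list(islice(shrinking_generator(data, select), 16))
print(keystream)
```

Because output bits appear at an irregular rate, a practical implementation also needs some buffering to deliver a steady keystream.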

Andrew.

Paul Rubin

unread,
Feb 22, 2012, 5:31:16 AM2/22/12
to
Andrew Haley <andr...@littlepinkcloud.invalid> writes:
> Generally, maybe, but some have fabulous security records. The
> shrinking generator springs immediately to my mind: no-one in the
> public domain has even got close to breaking it.

That is pretty interesting. I did find some papers by Golic and others
about distinguishing attacks, but RC4 has those too, and it's cool that
something so simple works so well. OTOH, it probably hasn't had nearly
as much security analysis as RC4 or AES or whatever. And a GA
implementation involving clocking single bits around would be awfully
slow, like 100's of cycles per keystream bit. Maybe it's feasible to
split a "bit slice" implementation across several nodes. That would be a
lot easier if the nodes had more memory, even 128 words instead of 64.

Andrew Haley

unread,
Feb 22, 2012, 6:03:01 AM2/22/12
to
Paul Rubin <no.e...@nospam.invalid> wrote:
> Andrew Haley <andr...@littlepinkcloud.invalid> writes:
>> Generally, maybe, but some have fabulous security records. The
>> shrinking generator springs immediately to my mind: no-one in the
>> public domain has even got close to breaking it.
>
> That is pretty interesting. I did find some papers by Golic and
> others about distinguishing attacks, but RC4 has those too, and it's
> cool that something so simple works so well. OTOH, it probably
> hasn't had nearly as much security analysis as RC4 or AES or
> whatever.

It's had quite a lot: people are fascinated that something so simple
can be so hard to crack, and it's been around for quite a while.

> And a GA implementation involving clocking single bits around would
> be awfully slow, like 100's of cycles per keystream bit. Maybe it's
> feasible to split a "bit slice" implementation across several nodes.
> That would be a lot easier if the nodes had more memory, even 128 words
> instead of 64.

Sure. The shrinking generator's strength is really in hardware
implementations, where its low gate count is a big advantage.

Andrew.

forther

unread,
Feb 23, 2012, 6:38:02 PM2/23/12
to
On Monday, February 20, 2012 5:27:15 PM UTC-8, Bernd Paysan wrote:
> The 35 bit internal data path of the DDFA seems to be "absurd", but we
> wanted a high quality audio processor and did calculations how FIR
> filters add noise to the signal (yes, they do), and that was the
> headroom we ended up to make this additional noise be lower than the
> actual noise of the system.

35 bits of internal data is not absurd. What is absurd is to have both the sample
history and the kernel in 35 bits. 18 bits or so of kernel coefficients is quite enough.
So it's not a 36x36 but a 36x18 MAC that is required for high quality audio
processing. A one-bit MAC step takes 2 ops (cycles) on the g144, meaning that a complete
36x18 MAC takes 72+ ops. Not 200. It's the same ballpark though.

> As I said: You *first* do the analysis of your requirements, and *then*
> you build your product (where of course part of the analysis phase is
> building a prototype). Dismissing some particular requirement which was
> derived by sound engineering, just because the GA144 looks bad on it,
> well, that's fanboi-ism.

Let's define "looking bad". If it's power efficiency we are after, then (assuming
that a single op takes ~7 picojoules) a single "hifi audio" MAC is ~0.5 nanojoules.
Is that good or bad?

If it's speed you are after, then you have to take into consideration that the
g144 has 144 cores and you can run 144 MACs in parallel. A single core does
~700 megaops/second, which is ~9 MMACs/second on a single core, or
~1.2 GMACs/sec if all 144 cores do nothing but multiply and add.
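These estimates are easy to recompute. The sketch below uses only the figures given in this post (72 ops per MAC as a lower bound, ~7 pJ per op, ~700 Mops/s per core, 144 cores); none of them are measured values.

```python
OPS_PER_MAC = 72          # lower bound stated in the post ("72+ ops")
ENERGY_PER_OP_J = 7e-12   # ~7 picojoules per op, as assumed in the post
OPS_PER_SECOND = 700e6    # ~700 megaops/second per core
CORES = 144

energy_per_mac_nj = OPS_PER_MAC * ENERGY_PER_OP_J * 1e9
macs_per_core = OPS_PER_SECOND / OPS_PER_MAC
macs_per_chip = macs_per_core * CORES

print(f"{energy_per_mac_nj:.3f} nJ per MAC")
print(f"{macs_per_core / 1e6:.2f} MMAC/s per core")
print(f"{macs_per_chip / 1e9:.2f} GMAC/s per chip")
```

With exactly 72 ops the per-core and per-chip rates come out a little higher than the ~9 MMAC/s and ~1.2 GMAC/s quoted, which is consistent with the "72+" allowing for a few extra ops per MAC.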

Can you make the same estimations for that DDFA chip?

Bernd Paysan

unread,
Feb 25, 2012, 8:08:16 PM2/25/12
to
forther wrote:
> 35 bits internal data is not absurd. Absurd is to have both sample
> history and the kernel in 35 bits. 18 bits or so of kernel
> coefficients is way enough. So, it's not 36x36, but 36x18 MAC, which
> is required for high quality audio processing. One bit MAC takes 2 ops
> (cycles) of g144, meaning that complete 36x18 MAC takes 72+ ops. Not
> 200. It's the same ballpark though.

We found that we need 24 bits for IIR filter coefficients to be
sufficiently precise, and 32 bits for the upsampling FIR filter - the
"problem" for the less critical IIR filters is the low-frequency range,
where correction is needed the most, and speakers have ugly response
curves. So yes, it's not 36x36, but 36x18 won't do it, either. The
upsampling FIR filters are 2/3 of these multiplication macros, the IIR
filters are 1/3.

You also need to calculate accumulation (18x18 multiplication already
takes 36 instructions, without any accumulation), and core-to-core
communication.

>> As I said: You *first* do the analysis of your requirements, and
>> *then* you build your product (where of course part of the analysis
>> phase is
>> building a prototype). Dismissing some particular requirement which
>> was derived by sound engineering, just because the GA144 looks bad on
>> it, well, that's fanboi-ism.
>
> Let's define "looking bad". If it's power efficiency we are after,
> then (assuming, that single op takes ~7 pico Joules) single "hifi
> audio" MAC is ~0.5 nano Joules. Is it good or bad?

Bad (and it's more likely 1.5 nJ, because 36x18 wouldn't be enough, and
you forgot the overhead). The DDFA needs 600 million of those per
second, and the typical (not maximum!) output power of a stereo
amplifier is just a few watts. So our design goal was to be around 1W
in the digital part, including the feedback ADCs (they run at the same
~100MHz, producing 8 bits per cycle). I don't remember the figures
exactly, but AFAIK our power budget for the DSP parts was no more than
300mW. And we did that in 180nm, not in 130nm.

> If it's the speed you are after, then you have to take into the
> consideration, that g144 has 144 cores and you can run 144 MACs in
> parallel. Single core makes ~700 megaops/second, which is ~9
> MMACs/second on single core, or ~1.2 GMACs/sec. if all the 144 cores
> do nothing but multiplying/adding.
>
> Can you make the same estimations for that DDFA chip?

I still think the 200 cycles per MAC plus overhead are closer to reality
than your 72 cycles - you can't cut the specification, and it has to
accumulate, it has to communicate ;-). And you need a worst-case
scenario, it's not ok to take an average chip at room temperature. You
need to take a bad chip at high temperature - cooling audio equipment is
difficult (no fans allowed!), and even a highly efficient DDFA-based
amplifier still may need to get rid of 10W per channel under high volume
conditions (up to 8 channels in total) - passive cooling only (though,
fortunately, not in direct proximity of the digital chip).

The DDFA chip does 648 MMAC/s. It needs all of those only at 192kHz
inputs, but only the IIR filters slow down on lower sample rates (the
upsampling filter for 48kHz and below just has a steeper cut-off and
four times the coefficients).

Unfortunately, the takeover of Zetex by Diodes meant that only two
products were made with the DDFA, the NAD VISO 1 and the NAD M51
(according to reviews, the VISO 1 is the best-sounding iPod docking
station in the world...), because Diodes is unable to sell a device that
costs more than a few cents, and NAD was already a pilot customer before that.

David Kuehling

unread,
Feb 25, 2012, 8:40:56 PM2/25/12
to
>>>>> "Bernd" == Bernd Paysan <bernd....@gmx.de> writes:

[..]
>> If it's the speed you are after, then you have to take into the
>> consideration, that g144 has 144 cores and you can run 144 MACs in
>> parallel. Single core makes ~700 megaops/second, which is ~9
>> MMACs/second on single core, or ~1.2 GMACs/sec. if all the 144 cores
>> do nothing but multiplying/adding.
>>
>> Can you make the same estimations for that DDFA chip?

> I still think the 200 cycles per MAC plus overhead are closer to
> reality than your 72 cycles - you can't cut the specification, and it
> has to accumulate, [..]

But doesn't accumulation come for free? I.e. if one multiplication step
+* is (test bit, shift left, add conditionally), and you need to
accumulate, you just start with some value X instead of initializing to
zero.

I thought that this was the reason why all those DSPs include MAC
operations. The only downside is that you need 3 inputs to the
operation, so more data busses and more read ports into the register
file.

David
--
GnuPG public key: http://dvdkhlng.users.sourceforge.net/dk.gpg
Fingerprint: B17A DC95 D293 657B 4205 D016 7DEF 5323 C174 7D40

Bernd Paysan

unread,
Feb 26, 2012, 2:38:46 PM2/26/12
to
David Kuehling wrote:
> But doesn't accumulation come for free? I.e. if one multiplication
> step +* is (test bit, shift left, add conditionally), and you need to
> accumulate, you just start with some value X instead of initializing
> to zero.

+* can do something like 18*18+18 (bits), but not 18*18+36. And to
generate 36*36+72 (36 bit MAC), you would even need 18*18+72 to make
this work without overhead.
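The width argument can be checked with a generic shift-and-add multiplier whose accumulator is seeded with the value to be added. This is a hedged sketch in plain Python, not a model of the real +* instruction; the function name is made up for illustration.

```python
def mac_shift_add(a, b, acc, bits=18):
    """Return a * b + acc, adding one shifted copy of a per set bit of b."""
    for bit in range(bits):
        if (b >> bit) & 1:
            acc += a << bit
    return acc


MAX18 = (1 << 18) - 1
MAX36 = (1 << 36) - 1

# Seeding with an 18-bit addend: the result still fits in a 36-bit double word.
small = mac_shift_add(MAX18, MAX18, MAX18)
# Seeding with a 36-bit addend: the result can need a 37th bit.
large = mac_shift_add(MAX18, MAX18, MAX36)

print(small.bit_length(), large.bit_length())
```

So seeding the accumulator does give accumulation for free in the arithmetic sense, but only as long as the addend is narrow enough; a running sum that is already double-word wide needs extra carry handling, which is the overhead referred to above.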

> I thought that this was the reason why all those DSPs include MAC
> operations. The only downside is that you need 3 inputs to the
> operation, so more data busses and more read ports into the register
> file.

DSPs have parallel multipliers, where you indeed have ws*ws+dws (ws=word
size, dws=double word size) as one operation, often
single-cycle/pipelined with one op per cycle.

Paul Rubin

unread,
Feb 26, 2012, 4:44:03 PM2/26/12
to
Bernd Paysan <bernd....@gmx.de> writes:
> DSPs have parallel multipliers, where you indeed have ws*ws+dws (ws=word
> size, dws=double word size) as one operation, often
> single-cycle/pipelined with one op per cycle.

Some of them have accumulators wider than dws, like the Motorola 56k
which has 24*24+56. That stops overflow when you use the whole ws and
accumulate multiple terms. I don't know how common that feature is, but
it makes implementing bignum multiplication much nicer.
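The benefit of the extra accumulator bits can be quantified with a short sketch. It takes the 24*24+56 figure from the post and, as a simplifying assumption, treats the operands as unsigned integers rather than the signed fractional format a real DSP would normally use.

```python
WORD_BITS = 24   # operand width, from the post
ACC_BITS = 56    # accumulator width, from the post

max_product = ((1 << WORD_BITS) - 1) ** 2
guard_bits = ACC_BITS - 2 * WORD_BITS

# How many worst-case products can be summed before the accumulator overflows?
safe_terms_wide = ((1 << ACC_BITS) - 1) // max_product
# The same question for an accumulator that is only one double word wide.
safe_terms_dws = ((1 << (2 * WORD_BITS)) - 1) // max_product

print(guard_bits, safe_terms_wide, safe_terms_dws)
```

Under these assumptions the 8 guard bits allow 256 worst-case products to be accumulated, whereas a plain double-word accumulator can overflow on the second one.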

forther

unread,
Feb 29, 2012, 3:33:02 PM2/29/12
to
On Saturday, February 25, 2012 5:08:16 PM UTC-8, Bernd Paysan wrote:
> forther wrote:
> > 35 bits internal data is not absurd. Absurd is to have both sample
> > history and the kernel in 35 bits. 18 bits or so of kernel
> > coefficients is way enough. So, it's not 36x36, but 36x18 MAC, which
> > is required for high quality audio processing. One bit MAC takes 2 ops
> > (cycles) of g144, meaning that complete 36x18 MAC takes 72+ ops. Not
> > 200. It's the same ballpark though.
>
> We found that we need 24 bits for IIR filter coefficients to be
> sufficiently precise, and 32 bits for the upsampling FIR filter - the
> "problem" for the less critical IIR filters is the low-frequency range,
> where correction is needed the most, and speakers have ugly response
> curves. So yes, it's not 36x36, but 36x18 won't do it, either. The
> upsampling FIR filters are 2/3 of these multiplication macros, the IIR
> filters are 1/3.

I'm still not convinced.

> Bad (and it's more likely 1.5 nJ, because 36x18 wouldn't be enough, and
> you forgot the overhead). The DDFA needs 600 million of those per
> second, and the typical (not maximum!) output power of a stereo
> amplifier is just a few watts. So our design goal was to be around 1W
> in the digital part, including the feedback ADCs (they run at the same
> ~100MHz, producing 8 bits per cycle). I don't remember the figures
> exactly, but AFAIK our power budget for the DSP parts was no more than
> 300mW.

Even *if* 200 (including overhead) steps per MAC is a must on g144 and
*if* you do remember correctly it's still the same ballpark: 900mW vs 300mW
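The 900mW figure follows from numbers already given in this thread: 648 MMAC/s for the DDFA workload, 200 cycles per MAC, and the assumed ~7 pJ per op. The sketch below just redoes that arithmetic; the core count is an extra derived value, based on the ~700 Mops/s per core quoted earlier.

```python
import math

MACS_PER_SECOND = 648e6          # DDFA workload quoted in the thread
CYCLES_PER_MAC = 200             # pessimistic estimate, including overhead
ENERGY_PER_OP_J = 7e-12          # assumed ~7 pJ per op
OPS_PER_SECOND_PER_CORE = 700e6  # ~700 megaops/second per core

ops_per_second = MACS_PER_SECOND * CYCLES_PER_MAC
power_watts = ops_per_second * ENERGY_PER_OP_J
cores_needed = math.ceil(ops_per_second / OPS_PER_SECOND_PER_CORE)

print(f"{power_watts * 1000:.0f} mW, {cores_needed} cores")
```

Note that at 200 cycles per MAC the same arithmetic asks for more cores than one chip provides, so the comparison is about energy per operation rather than a claim that a single g144 could carry the whole workload.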

> And we did that in 180nm, not in 130nm.

g144 is 180nm

quiet_lad

unread,
Mar 7, 2012, 7:27:12 PM3/7/12
to
On Jan 10, 2:45 pm, Jason Damisch <jasondami...@yahoo.com> wrote:
> On Jan 10, 2:28 am, gavino <gavcom...@gmail.com> wrote:
>
> > Would green arrays produce something with some persistance and a web browser? perhaps staggeringly cheap?  or perhaps something like plan9's 9p that could repalce the web with a network file system or some other metaphore?
>
> No, but you can investigate iTV as they were developing cheap web
> browser which might have been pretty cool.  But, the internet and
> internet programming has become so complex due to such technologies as
> Javascript and Flash, that a simple cheap web-browser would in fact
> have limited scope and abilities, and only be able to use a subset of
> the internet.  This would not be a mass appeal device.

would html 5 change this?

quiet_lad

unread,
Mar 7, 2012, 7:26:18 PM3/7/12
to
On Jan 10, 10:49 am, BruceMcF <agil...@netscape.net> wrote:
> On Jan 10, 5:28 am, gavino <gavcom...@gmail.com> wrote:
>
> > Would green arrays produce something with some persistance and a web browser? perhaps staggeringly cheap?  or perhaps something like plan9's 9p that could repalce the web with a network file system or some other metaphore?
>
> a web browser that replaces the web would be useless because people
> use web browsers to browse the stuff on the web and the stuff that is
> one the web is on the web and not on the web replacement?
>
> and staggeringly cheap means requiring staggering sales volume to
> cover any fixed costs at all? for instance a bot to replace gavino
> could be staggeringly cheap but since nobody would pay anything for it
> the profit margin would be a minus percent?

zomg ur lame