powerpc in virtex2pro

Muthu

Mar 12, 2002, 9:12:00 AM

Hi,

The Virtex-II Pro has two built-in PowerPC 405 cores. Does that mean that
the PowerPC is used to implement the logic functionality of every design,
or can we simply use it when we need a PowerPC?

What is special about having a built-in PowerPC?

Thanks and Regards,
Muthu

Peter Alfke

Mar 12, 2002, 11:12:22 AM

Muthu wrote:

> What is special about having a built-in PowerPC?

The advantage of a built-in PowerPC microprocessor is that it connects
very well to the logic fabric (the CLBs, BlockRAMs, etc.). In
Virtex-II Pro, each PPC has about 700 connections to the fabric, with
several 64-bit-wide buses.
Obviously, you could use an external PPC, but that would not only mean an
additional package, it would also mean many hundreds of FPGA pins being
wasted on interfacing to the external PPC. More space, more power, less
reliability, and most likely lower system performance.
The tight and flexible connection between PPC and the logic is the
biggest advantage.

You can use the PPC for anything you want, from a glorified state machine
(living off its two 16 Kbyte caches) to a full-fledged computer with its
operating system residing in external RAM. This covers a very wide
range. You can make many different trade-offs between hardware and
software implementation, perhaps without ever changing the pc-board.

Peter Alfke

Tim

Mar 15, 2002, 3:16:53 PM

Peter Alfke wrote

> The advantage of built-in PowerPC microprocessor is that it connects
> very well to the logic fabric ( the CLBs, BlockRAMs, etc.) In
> Virtex-IIPro, each PPC has about 700 connections to the fabric, with
> several 64-wide busses.
> Obviously, you could use an external PPC, but that would not only mean an
> additional package, it would also mean many hundreds of FPGA pins being
> wasted on interfacing to the external PPC. More space, more power, less
> reliability, and most likely lower system performance.
> The tight and flexible connection between PPC and the logic is the
> biggest advantage.

It looks as if the '405 consumes the space of 512 LUTs, ignoring
any dedicated layers on the chip. 512 LUTs is midway between an
XC2S30 and an XC2S50. But maybe the design was harder than a
typical XC2S50 implementation :-)

Cyrille de Brébisson

Mar 29, 2002, 4:26:33 PM

Hello,

Actually, I have a related question.
In our design we are using an ARM CPU. My question is:
Can we put an ARM in the Virtex-II Pro?
Where can I find/buy an ARM CPU core source (or precompiled) file to program
into my FPGA?

Regards, Cyrille

"Muthu" <muthu...@yahoo.co.in> wrote in message
news:28c66cd3.02031...@posting.google.com...

Peter Alfke

Mar 30, 2002, 12:20:13 AM

"Cyrille de Brébisson" wrote:

> In our design we are using an ARM CPU. My question is:
> Can we put an ARM in the virtex 2 pro?
> Where can I find/buy an ARM cpu core source (or precompiled) file to program
> in my FPGA?
>

Cyrille,
the answer to both your questions is: No.
The PowerPC in Virtex-II Pro is a "hard" implementation, packing the
microprocessor with its caches and MMU into the smallest possible silicon
area, <4 square millimeters.
What you seem to be looking for is a "soft" implementation, using the
programmable logic "fabric".
That solution is impractical for something as complex as PowerPC or even ARM.
It would take up an unreasonable portion of a large chip, and achieve mediocre
performance at best.
Xilinx offers a soft microprocessor, called MicroBlaze, especially tuned for
efficient implementation in the Virtex architecture. It is not as fast and
capable as PowerPC, but uses only ~900 slices.
"Half the size and twice the speed of NIOS" is the Xilinx slogan. Please, no
flames...

Peter Alfke, Xilinx Applications

Keith R. Williams

Mar 30, 2002, 2:27:04 PM

In article <a82m6n$lvj$1...@web1.cup.hp.com>, cyrille_de...@hp.com
says...

> Hello,
>
> Actually, I have a related question.
> In our design we are using an ARM CPU. My question is:
> Can we put an ARM in the virtex 2 pro?
> Where can I find/buy an ARM cpu core source (or precompiled) file to program
> in my FPGA?

In the interest of full disclosure... Xilinx doesn't do ARM, but Altera
has an ARM hard core in their Excalibur series. I don't know anything
more than what is on their web site though.

Xilinx chose the right processor. ;-)

----
Keith
IBM PowerPC Development (but nothing to do with Xilinx/IBM alliance)

Kevin Brace

Mar 30, 2002, 4:26:12 PM

I read an article a few months ago about a startup developing yet
another ARM clone (PicoTurbo exited the ARM clone business.).

http://www.eetimes.com/story/OEG20020124S0111

What is the purpose of putting an ARM or its clone core into a Virtex-II
Pro?
Is it for an ASIC prototype?

Kevin Brace (In general, don't respond to me directly, and respond
within the newsgroup.)

Ron Huizen

Apr 1, 2002, 9:18:40 AM

Peter,

Are you saying that putting an ARM core into a Virtex II is not doable,
or just not practical? Or are you only talking about the V2 Pro?

---------
Ron Huizen
BittWare

Peter Alfke

Apr 1, 2002, 11:36:19 AM

There are hard cores and soft cores.
Hard cores are specialized silicon areas that implement a function very efficiently,
but only the manufacturer (Xilinx, Altera, etc.) can do that.
Soft macros use the logic fabric (CLBs etc.), and any user can instantiate them
anywhere. Soft macros consume more silicon area per function, and are slower, but
more flexible (e.g., MicroBlaze and Nios).
IMHO, both PowerPC and ARM are too complex to be implemented as soft macros.
Xilinx picked PowerPC as the hard core in Virtex-II Pro, Altera picked ARM for
Excalibur, and, surprisingly enough, no microprocessor at all for Stratix, their
future flagship...

Peter Alfke
===================

Rick Filipkiewicz

Apr 1, 2002, 4:00:20 PM

"Keith R. Williams" wrote:

> In article <a82m6n$lvj$1...@web1.cup.hp.com>, cyrille_de...@hp.com
> says...
> > Hello,
> >
> > Actually, I have a related question.
> > In our design we are using an ARM CPU. My question is:
> > Can we put an ARM in the virtex 2 pro?
> > Where can I find/buy an ARM cpu core source (or precompiled) file to program
> > in my FPGA?
>
> In the interest of full disclosure... Xilinx doesn't do ARM, but Altera
> has an ARM hard core in their Excalibur series. I don't know anything
> more than what is on their web site though.
>
> Xilinx chose the right processor. ;-)
>
>

Nah! It should have been one of the MIPS cores.

- Speaking as someone for whom the bigger the MIPS world, the bigger my earnings.

Ray Andraka

Apr 1, 2002, 7:20:32 PM

Rick Filipkiewicz wrote:

> "Keith R. Williams" wrote:
>
> > Xilinx chose the right processor. ;-)
> >
> >
>
> Nah! It should have been one of the MIPS cores.
>

I voted for an RCA1802 knock off ;-)

--
--Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930 Fax 401/884-7950
email r...@andraka.com
http://www.andraka.com

"They that give up essential liberty to obtain a little
temporary safety deserve neither liberty nor safety."
-Benjamin Franklin, 1759

Alan Calac

Apr 1, 2002, 8:40:43 PM

MicroBlaze has an unofficial slogan too:
"Half the size of Nios, and a quarter of the features".

Xilinx certainly doesn't prefer this one, but it seems to be catching
on with those who have used both processors.

--------->Alan

Peter Alfke <pal...@earthlink.net> wrote in message news:<3CA54B02...@earthlink.net>...

Keith R. Williams

Apr 1, 2002, 10:29:28 PM

In article <3CA8CA64...@algor.co.uk>, ri...@algor.co.uk
says...
>
>
> "Keith R. Williams" wrote:

> > Xilinx chose the right processor. ;-)
> >
> >
>
> Nah! It should have been one of the MIPS cores.

Other people have other points of view. ;-)

>
> - Speaking as someone for whom the bigger the MIPS world, the bigger my earnings.

Yep! My thoughts exactly! ;-)

I may be looking to leverage some knowledge someday. Nice going
Xilinx!

----
Keith

Jan Gray

Apr 2, 2002, 9:04:03 AM

"Peter Alfke" <pal...@earthlink.net> wrote

> IMHO, both PowerPC and ARM are too complex to be implemented as soft
> macros.

Implementations of *integer subset*s of MIPS, ARM, and PowerPC architectures
are not too complex to be implemented as soft cores. One can produce an
integer MIPS-I soft core as "small" as MicroBlaze; and I have done a
spreadsheet analysis/design study for an FPGA-optimized PowerPC Book I soft
core that cost between 1200 and 2000 LUTs (1.3-2.2x the size of MicroBlaze),
depending upon performance tradeoffs and whether or not you trap and emulate
certain rare and expensive instructions.

The only thing holding back fast (100 MHz) relatively compact (800-2000
LUTs) FPGA-optimized soft core implementations of subsetted commercial RISC
instruction set architectures is the intellectual property landscape.

I am surprised that certain processor IP companies, that lack a hard core
programmable logic platform, and may therefore be losing certain design wins
to ARM and PPC, have not yet launched soft FPGA-optimized processor core
products. Perhaps they too think it infeasible or impractical. (My company
would be pleased to demonstrate otherwise.)

I predict that sooner-or-later all processor IP licensors will come to the
realization that programmable logic has become the air that a great many of
their designers breathe, and that eventually all processor IP licensors will
offer or endorse FPGA-optimized soft processor core implementations of their
ISAs. To not do so would be to surrender a quickly growing market segment
to their competitors. I put that date around 2005.

There is no defense against the ATTACK OF THE KILLER FPGAS!

I also feel that binary translation (static or dynamic) will become
important and then commonplace, both as a way to run legacy ISAs on
streamlined FPGA-optimized cores, and as a way to run full ISAs on subsetted
ISA implementations.

Jan Gray, Gray Research LLC
FPGA CPU News: www.fpgacpu.org

Austin Lesea

Apr 2, 2002, 6:04:39 PM

Ron,

You can put any soft processor core you want in the Virtex II, or II Pro, as long as
it is small enough to fit in the FPGA.

Of course, there are royalty issues when using the ARM core.

The PowerPC(tm IBM) license allows unlimited use of the PowerPC core in Virtex II
Pro. And the license is free. MicroBlaze is also free.

Austin

Utku Ozcan

Apr 1, 2002, 9:00:10 AM

"Cyrille de Brébisson" wrote:

> Hello,
>
> Actually, I have a related question.
> In our design we are using an ARM CPU. My question is:
> Can we put an ARM in the virtex 2 pro?
> Where can I find/buy an ARM cpu core source (or precompiled) file to program
> in my FPGA?
>
> Regards, Cyrille

AFAIK Altera devices support ARM.

Utku

crob

Apr 3, 2002, 12:43:57 PM

It's simple. If you want an ARM core, use Altera's Excalibur family.
This is a cost-effective method if you want an ARM microprocessor
connected directly to a PLD.

As for the slogan, "Half the size and twice the speed of NIOS", leave
this where it belongs, with the Marketing weenies. I have used the
MicroBlaze microprocessor, and couldn't get close to the numbers
claimed, go figure. I also noticed a SIGNIFICANT decrease in
performance when I ran my code from external memory.

C-ROB

Austin Lesea <austin...@xilinx.com> wrote in message news:<3CAA3906...@xilinx.com>...

jerry1111

Apr 3, 2002, 3:36:53 PM

User "crob" <cro...@yahoo.com> wrote in message
news:cb769f6b.0204...@posting.google.com...

> It's simple. If you want an ARM core, use Altera's Excalibur family.
> This is a cost-effective method if you want an ARM microprocessor

Cost effective?
Altera's dealer told me that prices are starting from $1000 (in Poland)
to $5000. I wanted to buy ARM kit two weeks ago. When he told
me that kit costs $10k, and chips are starting from $1k - I stopped ;)

> connected directly to a PLD.
>
> As for the slogan, "Half the size and twice the speed of NIOS", leave
> this were it belongs, with the Marketing weenies. I have used the
> MicroBlaze microprocessor, and couldn't get close to the numbers
> claimed, go figure. I also noticed a SIGNIFICANT decrease in
> performance when I ran my code from external memory.

Interesting. Do you know how Nios compares? My Nios
is waiting for... taxes (the taxes applied to goods crossing a
national border - I don't know their name in English ;)

jerry

Jim Granville

Apr 3, 2002, 4:13:01 PM

jerry1111 wrote:
>
> User "crob" <cro...@yahoo.com> wrote in message
> news:cb769f6b.0204...@posting.google.com...
> > It's simple. If you want an ARM core, use Altera's Excalibur family.
> > This is a cost-effective method if you want an ARM microprocessor
> Cos effective?
> Altera's dealer told me that prices are starting from $1000 (in Poland)
> to $5000. I wanted to buy ARM kit two weeks ago. When he told
> me that kit costs $10k, and chips are starting from $1k - I stopped ;)

You have to need the speed, and tight coupling :-)

At the other end of the scale, the lowest price spotted so far for
an ARM microcontroller (off-chip memory) is $4.95/10K from OKI.
See
http://www.okisemi.com/html/docs/Intro-7830.html

That's under x188 / x186 / eZ80 ...

-jg

Austin Lesea

Apr 3, 2002, 4:24:38 PM

C-ROB,

Obviously, when one does a performance test, one does not go off chip to memory.

It is much faster to execute out of BRAM, and show what the part is capable of doing.

It would be like testing a Pentium IV with SDR 133 MHz RAM.....

As for the difference when on-chip, I can only guess that you were doing something different from
what was done for the benchmark. Did you open a hotline case at the time, and ask why?

Austin

Peter Alfke

Apr 3, 2002, 4:55:01 PM

to crob

crob wrote:

> I also noticed a SIGNIFICANT decrease in
> performance when I ran my code from external memory.

No surprise, and an excellent argument for on-chip microprocessors running out of on-chip caches
and BlockRAM, and having good connectivity to the FPGA fabric.
Let me stop here, before I get into my Virtex-II Pro with PowerPC pitch... :-)

Peter Alfke

Jan Gray

Apr 3, 2002, 5:40:23 PM

"Peter Alfke" <peter...@xilinx.com> wrote:
> crob wrote:
> > I also noticed a SIGNIFICANT decrease in
> > performance when I ran my code from external memory.
>
> No surprise, and an excellent argument for on-chip microprocessors running
> out of on-chip caches and BlockRAM, and having good connectivity to the
> FPGA fabric.

If you're going to stay on chip, you might as well assume a 16-bit address
space. Once you've done that, a <200 LUT 16-bit RISC MCU will often
suffice. :-)

If you're going to choose a 32-bit CPU because you want to use some of those
great sprawling OSs and RTOSs and TCP/IP stacks and so forth, then it is
likely that your application will not fit in on-chip RAM, and you should
choose a microprocessor that provides at least an I-buffer, or I-cache, or
branch target cache, lest your processor stall every instruction or branch.
(Even if you have adequate bandwidth to off-chip instruction memory, the
branch latency (nonsequential instruction fetch latency) will kill you.) A
20-50 MHz RISC CPU can get by with flow-through external SRAM. But at 125
MHz and up, plan to use an I-cache.

The good news is that it is fairly simple to add an I-cache to an FPGA CPU core,
assuming it already has an "instruction not ready" signal.
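
To make the stall argument concrete, here is a minimal Python sketch of a direct-mapped I-cache that holds the core off (the "instruction not ready" condition) while a line is refilled from external memory. It is only a toy cycle-count model, not taken from any real core: the class name, the cache and line sizes, and the 8-cycle miss penalty are all assumptions picked for illustration.

```python
class DirectMappedICache:
    """Toy direct-mapped instruction cache that only counts cycles.

    All parameters are hypothetical; a real design would pick them
    from the BlockRAM geometry and the external memory latency.
    """

    def __init__(self, num_lines=64, words_per_line=4, miss_penalty=8):
        self.num_lines = num_lines
        self.words_per_line = words_per_line
        self.miss_penalty = miss_penalty      # cycles "not ready" is held
        self.tags = [None] * num_lines        # one stored line address per line
        self.stall_cycles = 0

    def fetch(self, word_addr):
        """Return the number of cycles this instruction fetch takes."""
        line_addr = word_addr // self.words_per_line
        index = line_addr % self.num_lines
        if self.tags[index] == line_addr:
            return 1                          # hit: ready in one cycle
        # Miss: refill the line; the core stalls for the whole penalty.
        self.tags[index] = line_addr
        self.stall_cycles += self.miss_penalty
        return 1 + self.miss_penalty


def run_loop(cache, start, length, iterations):
    """Fetch a straight-line loop body repeatedly; return total cycles."""
    cycles = 0
    for _ in range(iterations):
        for addr in range(start, start + length):
            cycles += cache.fetch(addr)
    return cycles


if __name__ == "__main__":
    cache = DirectMappedICache()
    total = run_loop(cache, start=0, length=16, iterations=100)
    print("cycles with I-cache:", total)
    print("cycles if every fetch paid the penalty:", 16 * 100 * (1 + 8))
```

With these made-up numbers, a 16-instruction loop run 100 times misses only on its four cache lines during the first pass, so nearly all fetches complete in a single cycle, whereas an uncached core would pay the external latency on every fetch.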

Of course, caches are also important to reduce core external bandwidth
requirements. Using caches, you may be able to share a common memory
interface with other cores, or may be able to use a 16-bit external memory
data path instead of a 32-bit one.

B. Joshua Rosen

Apr 3, 2002, 6:08:41 PM

Xilinx has made some curious choices with the VirtexII Pro line and I was
wondering if Peter would comment on some of them. First off let me say
that the PPC was the right choice. Over the last few years the embedded
processor of choice at all of the places that I have consulted to has
been the 405. I also think that combining a 405 with an FPGA makes a lot
of sense, it goes a long way towards being able to build a single chip
customized embedded system without having to do an ASIC. Now for the
things that I think were a little curious:

1) How come there isn't a dedicated DDR interface on the chip? I've never
seen a PPC application that didn't require DRAM, a dedicated interface
would be cheaper and higher performing than using valuable CLBs to build
a soft interface. (If I'm mistaken about the lack of a dedicated DDR
interface please let me know, I didn't see any mention of one when I read
the spec).

2) I don't see the need for putting four processors on a die. In almost
all cases a single 405 should be adequate; in a few cases you could make
good use of two but I don't think that you would ever need four. There
should have been a wider choice of parts with a single 405 core.

3) There should also be a wider range of parts that have Rocket IO but no
PPC. This is really a 2003 time frame issue when 3GIO starts to roll out,
at that point Rocket IO will become very important.

4) On chip Flash RAM would be useful. An embedded PPC is going to require
some Flash. Also it would be nice if the serial Flash RAM were on chip,
I bet everyone is sick of the extra part that most Xilinx designs
require.

5) The IBM 405 chips include on board ethernet MACs, a PCI interface and
an SDRAM interface, a version of that chip that also incorporates an FPGA
and FLASH would be a good idea. In theory you could handle almost any
embedded application with just that one chip plus an SDRAM or two.

6) This is a Virtex II issue, not just a Virtex II Pro issue. How about
offering versions of the Virtex II without the on board multipliers. The
multipliers make sense for DSP applications but they are a waste of money
and power for everything else. In my 12 years doing Xilinx designs I have
never needed a multiplier. I've frequently needed a CAM so I wouldn't
mind a few CAMs on board, but I'd rather have a cheaper part without the
multipliers.

John_H

Apr 3, 2002, 6:44:11 PM

I'll put in my 2 cents worth on a few items below regarding the aspects I've
come to appreciate...

"B. Joshua Rosen" wrote:

> 1) How come there isn't a dedicated DDR interface on the chip. I've never
> seen a PPC application that didn't require DRAM, a dedicated interface
> would be cheaper and higher performing than using valuable CLBs to build
> a soft interface. (If I'm mistaken about the lack of a dedicated DDR
> interface please let me know, I didn't see any mention of one when I read
> the spec).
>
> 2) I don't see the need for putting four processors on a die. In almost
> all cases a single 405 should be adequate, in a few case you could make
> good use of two but I don't think that you would ever need four. There
> should have been a wider choice of parts with a single 405 core.

For both 1) and 2), the intent isn't exclusively to replace the processor in
an embedded system, but to distribute the tasks. If you're dealing with an IP
block that needs some intelligence, a single PPC could be used to take care of
the somewhat complex - though limited - functionality needed. Why do almost
all systems require a DRAM interface? Because there's just sooo much
functionality pushed into a single general purpose machine. If you can
distribute the processing and keep the code requirements small for some
functional blocks, a main CPU could take care of the big tasks outside of the
FPGA when performance wouldn't allow the little Virtex-II PPC to keep up with
all the system demands.

> 4) On chip Flash RAM would be useful. An embedded PPC is going to require
> some Flash. Also it would be nice if the serial Flash RAM were on chip,
> I bet every one is sick of the extra part that most Xilinx designs
> require.

I've been finding more ways to deal with external flash memory. I'd prefer to
keep the FPGA cost constrained and get cheap generic flash rather than
spending more for a process that isn't optimum for either the logic or the
memory.

> 6) This is a Virtex II issue, not just a Virtex II Pro issue. How about
> offering versions of the Virtex II without the on board multipliers. The
> multipliers make sense for DSP applications but they are a waste of money
> and power for everything else. In my 12 years doing Xilinx designs I have
> never needed a multiplier. I've frequently needed a CAM so I wouldn't
> mind a few CAMs on board, but I'd rather have a cheaper part without the
> multiplers.

Do you find yourself designing shifters? Both barrel shifters and straight
shifters are nicely implemented in the multiplier blocks. Whether to align a
SONET bit stream to the byte oriented frame or to insert/extract variable
bit-width items into/from serial streams, shifters have been an important part
of many of my designs.

Austin Lesea

Apr 4, 2002, 11:35:42 AM

Joshua,

Maybe I can comment, as I was on the VII Pro team, and the VII team.

See below,

Austin

---------------------snip----------------

> 1) How come there isn't a dedicated DDR interface on the chip? I've never
> seen a PPC application that didn't require DRAM, a dedicated interface
> would be cheaper and higher performing than using valuable CLBs to build
> a soft interface. (If I'm mistaken about the lack of a dedicated DDR
> interface please let me know, I didn't see any mention of one when I read
> the spec).

DDR is built out of the DDR FF in the IOB's and logic in the FPGA. DDR isn't the only standard, and customers have many other applications. DDR is neat, but too specific.

> 2) I don't see the need for putting four processors on a die. In almost
> all cases a single 405 should be adequate; in a few cases you could make
> good use of two but I don't think that you would ever need four. There
> should have been a wider choice of parts with a single 405 core.

We just don't know how customers will use all of this power. If 405ppc's are 'free', you can use one executing out of internal cache to handle the "error 404", and another running off internal cache to monitor QOS, etc.

When electric motors were very expensive, a machine shop had one, and leather belts to every tool station. When fractional horsepower motors became inexpensive and ubiquitous, they were used everywhere, with no thought.

If 405ppc are everywhere, you may dedicate them to tasks that seem horribly inefficient if you continue to think in terms of the one big expensive monster processor.

> 3) There should also be a wider range of parts that have Rocket IO but no
> PPC. This is really a 2003 time frame issue when 3GIO starts to roll out,
> at that point Rocket IO will become very important.

Stay tuned.

> 4) On chip Flash RAM would be useful. An embedded PPC is going to require
> some Flash. Also it would be nice if the serial Flash RAM were on chip,
> I bet everyone is sick of the extra part that most Xilinx designs
> require.

Flash requires a process that is usually two years behind the leading process. To do a flash capable FPGA would be to be obsolete on day 1 of the introduction. Not very exciting.

> 5) The IBM 405 chips include on board ethernet MACs, a PCI interface and
> an SDRAM interface, a version of that chip that also incorporates an FPGA
> and FLASH would be a good idea. In theory you could handle almost any
> embedded application with just that one chip plus an SDRAM or two.

MACs are soft cores to us.

> 6) This is a Virtex II issue, not just a Virtex II Pro issue. How about
> offering versions of the Virtex II without the on board multipliers. The
> multipliers make sense for DSP applications but they are a waste of money
> and power for everything else. In my 12 years doing Xilinx designs I have
> never needed a multiplier. I've frequently needed a CAM so I wouldn't
> mind a few CAMs on board, but I'd rather have a cheaper part without the
> multipliers.

Well, they take up a tiny amount of area, so the cost savings is washed out completely by having to make two parts, with lower volumes in each.

Lasse Langwadt Christensen

Apr 4, 2002, 1:26:11 PM

You can definitely put an ARM in an FPGA. On the last project I worked on, I
did an ASIC prototype of an SoC with an ARM7-TDMI-S in a Virtex-E. Right now
I'm working on something similar, but in a Virtex-II, so that hopefully more
of the clock gating in the design will work in the prototype.

Size and performance will not be like a hard implementation, but for a
prototype that doesn't really matter as long as the performance is enough
and the design fits a chip you can buy. And if you need to, there are things
that could be changed to better fit an FPGA, so performance could be
increased, but for a prototype you don't want to do that unless you have to.

But anyways, buying the source code for an ARM will probably cost you an
arm ;) and a leg,

-Lasse
-- Lasse Langwadt Christensen,
-- Aalborg, Danmark

Nicholas Weaver

Apr 4, 2002, 2:04:22 PM

Additional view from a computer architect type:

In article <3CAC80DE...@xilinx.com>,

Austin Lesea <austin...@xilinx.com> wrote:
>> 1) How come there isn't a dedicated DDR interface on the chip. I've never
>> seen a PPC application that didn't require DRAM, a dedicated interface
>> would be cheaper and higher performing than using valuable CLBs to build
>> a soft interface. (If I'm mistaken about the lack of a dedicated DDR
>> interface please let me know, I didn't see any mention of one when I read
>> the spec).
>
>DDR is built out of the DDR FF in the IOB's and logic in the FPGA. DDR isn't
>the only standard, and customers have many other applications. DDR is neat,
>but too specific.

There is also a design philosophy (which I can agree with for some
uses, can't for others here) that only the minimally useful set should
be implemented, because that is the cheapest and useable by the most
people.

A dedicated DDR SDRAM interface would be very nice, but that would
consume a couple mm^2 of silicon, which is only usable by those who
are going to plunk down a DDR interface, on a specific set of pins.

>> 2) I don't see the need for putting four processors on a die. In almost
>> all cases a single 405 should be adequate, in a few case you could make
>> good use of two but I don't think that you would ever need four. There
>> should have been a wider choice of parts with a single 405 core.
>
>We just don't know how customers will use all of this power. If 405ppc's are
>'free', you can use one executing out of internal cache to handle the "error
>404", and another running off internal cache to monitor QOS, etc.
>
>When electric motors were very expensive, a machine shop had one, and leather
>belts to every tool station. When fractional horsepower motors became
>inexpensive and ubiquitous, they were used everywhere, with no thought.
>
>If 405ppc are everywhere, you may dedicate them to tasks that seem horribly
>inefficient if you continue to think in terms of the one big expensive monster
>processor.

And processors these days, for a simple core, are INCREDIBLY cheap,
especially this one:

It has no memory (those are the BlockRAMs), only the register file,
datapath, and control logic.

Even in synthesis, discounting the register file and caches, a 5 stage
SPARC uP core takes 1.3mm x .85mm in a .18uM process. The caches, out
of 4 1024x32b memories, are almost as big as the core itself!
http://www.eecg.toronto.edu/~pagiamt/research/leon.html

So in the area of about ~8-10 Virtex 2 BlockRAMs (1024x18b memories),
you can fit a SYNTHESIZED sparc core (without a hardware
multiplier/divider or MMU). I suspect that the Virtex 2 PPC core is
even smaller, but with most of the actual area being the interfacing
of the core to everything else.

I'd love to get my hands on an XC2VP4 or larger die or die photo, just
to verify these hunches about area in more detail.

But according to the datasheet, the XC2VP2 uses 4 columns, 4 high of
BlockRAMs, with the top and bottom of the center columns replaced with
the RocketIO transceivers, so a pitch of 4 CLB slices/BlockRAM.

The XC2VP4 uses 4 columns, 10 high (it's a 40x22 instead of a 16x22
array) and has 28 BlockRAMs, so 8 BlockRAMs are replaced for the PPC
core, and 128 CLBs (500 slices) of logic. This is pretty CHEAP!
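
The counting behind that estimate can be written out as a short Python sketch. The array size, column count, and BlockRAM total are the figures quoted above. The four sites given to RocketIO are an inference from the "top and bottom of the center columns" description, and four slices per CLB is the standard Virtex-II figure. The function names are made up for illustration.

```python
SLICES_PER_CLB = 4  # a Virtex-II CLB contains four slices


def bram_sites(clb_rows, bram_columns=4, clbs_per_bram=4):
    """BlockRAM sites available if every column were fully populated."""
    return bram_columns * (clb_rows // clbs_per_bram)


def sites_taken_by_ppc(clb_rows, brams_present, rocketio_sites=4):
    """Sites that hold neither a BlockRAM nor a RocketIO transceiver.

    rocketio_sites=4 assumes the top and bottom of the two center
    columns are given to RocketIO, as described for the XC2VP2.
    """
    return bram_sites(clb_rows) - rocketio_sites - brams_present


if __name__ == "__main__":
    # XC2VP4: 40 CLB rows, 28 BlockRAMs (figures quoted in the post).
    print("BlockRAM sites:", bram_sites(40))
    print("sites displaced by the PPC:", sites_taken_by_ppc(40, 28))
    print("slices in 128 CLBs:", 128 * SLICES_PER_CLB)
```

Under those assumptions the 40-row device has 40 sites, of which 28 hold BlockRAMs and 4 go to RocketIO, leaving the 8 attributed to the processor. The 128 CLBs come to 512 slices, which the post rounds to 500.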

If you have a low time critical function (EG, one which takes a fair
path-length, but isn't necessarily pipeline-every-cycle), if you can
replace just 128 CLBs with the use of the processor core, you've won,
bigtime. So my assumption here is the 8 BlockRAMs of area are
replaced with the uP core, with the rest going to a heck of a lot of
interface logic.

>> 4) On chip Flash RAM would be useful. An embedded PPC is going to require
>> some Flash. Also it would be nice if the serial Flash RAM were on chip,
>> I bet every one is sick of the extra part that most Xilinx designs
>> require.
>
>Flash requires a process that is usually two years behind the leading
>process. To do a flash capable FPGA would be to be obsolete on day 1 of the
>introduction. Not very exciting.

The only way I could conceive of there being Flash on the die is some
fancy packaging, e.g., a chip-up smaller flash chip bonded to internal
pads on a chip down larger part. And do you REALLY want to spend an
extra $20 just to reduce your part count from 2 to 1, and save 16-30
external pins?

>> 6) This is a Virtex II issue, not just a Virtex II Pro issue. How about
>> offering versions of the Virtex II without the on board multipliers. The
>> multipliers make sense for DSP applications but they are a waste of money
>> and power for everything else. In my 12 years doing Xilinx designs I have
>> never needed a multiplier. I've frequently needed a CAM so I wouldn't
>> mind a few CAMs on board, but I'd rather have a cheaper part without the
>> multiplers.
>
>Well, they take up a tiny amount of area, so the cost savings is washed out
>completely by having to make two parts, with lower volumes in each.

And, as Ray Andraka has pointed out, a multiplier makes a great
shifter as well. A variable shift is surprisingly expensive in an FPGA
fabric: there are a lot of muxes, but it is an operation that is
surprisingly common.

An 18x18 multiplier can implement an 18 bit variable rotation with
just 18 LUTs worth of logic to decode the shift amount, and an
additional 18 LUTs worth of logic if you want to make it a left
shift/rotate, an additional 36 LUTs worth if you want to make a
variable left/right shift.
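
The multiplier-as-shifter trick can be sketched in a few lines of Python: decode the shift amount n into the one-hot value 2**n, multiply, then take the low half of the product for a left shift, or OR the two halves together for a rotate. This is a behavioural model only. It treats the multiplier as unsigned 18x18 for simplicity, whereas the real block is a signed multiplier, and the function names are invented for illustration.

```python
WIDTH = 18
MASK = (1 << WIDTH) - 1


def decode_shift(amount):
    """Decode a shift amount n into the one-hot multiplicand 2**n."""
    assert 0 <= amount < WIDTH
    return 1 << amount


def mul18x18(a, b):
    """Model of an unsigned 18x18 multiplier with a 36-bit product."""
    assert 0 <= a <= MASK and 0 <= b <= MASK
    return a * b


def shift_left(value, amount):
    """Logical left shift: keep only the low 18 bits of the product."""
    return mul18x18(value, decode_shift(amount)) & MASK


def rotate_left(value, amount):
    """Rotate left: bits pushed into the high half wrap to the bottom."""
    product = mul18x18(value, decode_shift(amount))
    low = product & MASK
    high = product >> WIDTH
    return low | high


if __name__ == "__main__":
    print(bin(shift_left(0b101, 3)))
    print(bin(rotate_left(1 << 17, 1)))
```

In hardware the decode step and the final OR are the only parts that cost fabric LUTs, which is the point of the estimate above. The wide multiplexer tree of a conventional barrel shifter is absorbed by the multiplier block.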

The multiplier blocks are an example of something which IS very
common.
--
Nicholas C. Weaver nwe...@cs.berkeley.edu

Nicholas Weaver

Apr 4, 2002, 2:09:15 PM

In article <3CAC9AC3...@ieee.org>,

Lasse Langwadt Christensen <lang...@ieee.org> wrote:
>But anyways, buying the source code for an ARM will probably cost you an
>arm ;) and a leg,

While SPARC is free. :)
http://www.gaisler.com/leon.html

Falk Brunner

Apr 4, 2002, 1:47:01 PM

"Peter Alfke" <peter...@xilinx.com> wrote in message
news:3CAB7A34...@xilinx.com...

> No surprise, and an excellent argument for on-chip microprocessors running
> out of on-chip caches and BlockRAM, and having good connectivity to the
> FPGA fabric.
> Let me stop here, before I get into my Virtex-II Pro with PowerPC pitch... :-)

Peter Alfke, always on duty !
SNCR. . .;-)

--
Regards
Falk

Falk Brunner

Apr 4, 2002, 1:57:04 PM

"Austin Lesea" <austin...@xilinx.com> wrote in message
news:3CAC80DE...@xilinx.com...

> If 405ppc are everywhere, you may dedicate them to tasks that seem horribly
> inefficient if you continue to think in terms of the one big expensive
> monster processor.

110% acknowledge!!!!!!!

This "one big CPU for all tasks" is the ancient approach of those Intel guys.
I remember a day, not too long ago, when Intel saw the future of the
personal computer as just a big RAM and a CPU, doing everything in
software. :-0
Hey guys, see those graphics controllers nowadays? See how many transistors
they have? See how many OPS they do?
Yes?
So go home and cry.
;-)

--
MfG
Falk

Nicholas Weaver

unread,

Apr 4, 2002, 2:40:46 PM4/4/02

to

In article <a8i8kh$sh28a$2...@ID-84877.news.dfncis.de>,

Falk Brunner <Falk.B...@gmx.de> wrote:
>110% acknowledge!!!!!!!
>
>This "one big CPU for all tasks" is the ancient approach of those Intel guys.
>I remember a day, not too long ago, when Intel saw the future of the
>personal computer as just a big RAM and a CPU, doing everything in
>software. :-0
>Hey guys, see those graphics controllers nowadays? See how many transistors
>they have? See how many OPS they do?
>Yes?

Pfah. Big bloated pieces of silicon. :)

It has ALWAYS been that several small processors are more "efficient"
than one big processor, and it has always been a matter of
programmability.

A classic example is the Intel IXP1200 network processor: it consists
of a single ARM core and 6 small RISC-like cores, with context-switch
on event (memory miss). A really powerful architecture if you can
program it, and small too. Excluding the numerous interfaces (SDRAM,
PCI, IXP bus, etc), it ends up being in the ~$10 silicon range.

There is a lot of space still left in architectures with such
performance that are also easier to program.

Remember, an 8x8mm die, in a wafer level package, can buy you >200
pins [1], 10+ 32b Gops/second, in the sub $10/chip range. [2]

[1] albeit at a .5mm pitch. Then again, 200 pins, any other way, is
going to easily add another $4-5 to the chip cost. So it is a
tradeoff: higher board cost, lower part cost and area.

Ken McElvain

unread,

Apr 4, 2002, 2:55:24 PM4/4/02

to

Lasse Langwadt Christensen wrote:

>>
>
> you can definitely put an ARM in an FPGA. The last project I worked on, I
> did an ASIC proto of a SoC with an ARM7-TDMI-S in a VirtexE; right now I'm
> working on something similar but in a Virtex2, so it can hopefully get more
> of the clock gating in the design working in the prototype.

Clock gating for an ASIC design can be automatically converted to
enables in Certify with no source code changes. This covers
flops, latches, memories (inferred or instantiated).
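
For readers wondering why such a conversion is safe, here is a small Python behavioural sketch. It is not Certify's algorithm and uses no Synplicity API; it only illustrates that a flop clocked by (clk AND gate) and a flop on the free-running clock with a clock enable capture the same values. The key assumption is that the gate signal changes only while the clock is low, so the gated clock never glitches. All names (expand, simulate_gated_clock, simulate_clock_enable) are hypothetical.

```python
def expand(per_cycle):
    """Repeat each per-cycle value for the clock-low and clock-high phase."""
    return [v for v in per_cycle for _ in range(2)]

def simulate_gated_clock(clk, gate, d, q0=0):
    """Flop clocked by gclk = clk AND gate (the ASIC-style source)."""
    gclk = [c & g for c, g in zip(clk, gate)]
    q, trace = q0, [q0]
    for i in range(1, len(clk)):
        if gclk[i - 1] == 0 and gclk[i] == 1:   # rising edge of the gated clock
            q = d[i - 1]                        # data set up before the edge
        trace.append(q)
    return trace

def simulate_clock_enable(clk, enable, d, q0=0):
    """Flop on the ungated clock that loads only when enable is high."""
    q, trace = q0, [q0]
    for i in range(1, len(clk)):
        if clk[i - 1] == 0 and clk[i] == 1 and enable[i]:
            q = d[i - 1]
        trace.append(q)
    return trace

# Illustrative stimulus: one enable and one data value per clock cycle.
cycles_enable = [1, 0, 0, 1, 1, 0]
cycles_data = [5, 6, 7, 8, 9, 10]
clk = [0, 1] * len(cycles_enable)       # half-cycle resolution: low, then high
gate = expand(cycles_enable)            # changes only while clk is low
data = expand(cycles_data)

gated = simulate_gated_clock(clk, gate, data)
enabled = simulate_clock_enable(clk, gate, data)
```

With this stimulus the two traces are identical, which is the property that lets the FPGA prototype keep every flop on a single global clock.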

Ken McElvain CTO
Synplicity, Inc.

Peter Alfke

unread,

Apr 4, 2002, 2:55:45 PM4/4/02

to

Austin answered the specific questions very well.
Please allow me to add some philosophical comments:

We are in the business of providing programmable solutions, but there is always
a temptation to add dedicated circuitry because it is smaller and faster and may
consume less power. We have to make agonizing choices, because any
specialization detracts from the universality, and any one of the special
circuits we add burdens each chip and must be paid for by every user, while it
may help only certain users or applications.

Over the years we have added global clocks, carry logic, BlockRAM, clock
management, lots of I/O standards, on-chip termination resistors, multipliers,
triple-DES decryption, and now also PowerPC and 3-gigabit SerDes dedicated
circuitry. Every one of these additions was made after carefully evaluating the
trade-offs between the dedicated area (cost) vs general usefulness. And we are
happy with our choices.

There is a long list of potential candidates that were rejected (I was in favor
of adding a dedicated PCI interface to the XC4000, which luckily was
rejected).

Some of our competitors have populated a graveyard (or at least a retirement
community) of commercially unsuccessful attempts to add excessive or poorly
executed specialization to programmable logic, and IMHO Excalibur with its
glued-on ARM and Mercury with its limited-speed incomplete dedicated clock
recovery may be headed in the same direction.

Whenever you add something costly, you should do it right, and don't leave the
job half completed!

Xilinx is obviously also adding dedicated circuitry, but only after very careful
consideration of the technical and economical trade-offs.
And it looks like we have been right in our choices so far.
But keep the suggestions coming.
We are listening!

Peter Alfke

Austin Lesea

unread,

Apr 4, 2002, 4:26:47 PM4/4/02

to

Nicholas,

Just one minor point: the 405ppc has its own caches (16K for data, and 16K for
instructions) so you can execute quite a bit right out of that without ever using a
BRAM.

Austin

Steve Casselman

unread,

Apr 4, 2002, 5:00:49 PM4/4/02

to

I have to disagree that a part with dedicated pins is a net loss for Xilinx.
For example my patent http://www.delphion.com/details?pn=US06178494__
suggests that it might be useful to have a part that can be inserted into a
pre-existing socket. For example if there were a part that fit into the
second slot of a Pentium system there is a good chance you could sell
millions and millions of them.

Steve

"Peter Alfke" <peter...@xilinx.com> wrote in message
news:3CACAFC2...@xilinx.com...

Nicholas Weaver

unread,

Apr 4, 2002, 5:05:49 PM4/4/02

to

In article <3CACC517...@xilinx.com>,

Austin Lesea <austin...@xilinx.com> wrote:
>Nicholas,
>
>Just one minor point: the 405ppc has its own caches (16K for data, and 16K for
>instructions) so you can execute quite a bit right out of that without ever using a
>BRAM.

OK. That makes even more sense (i shoulda noticed something was
wrong), because otherwise it would take a HELL of a lot of interface
logic to occupy 128 CLBs worth of logic.

In any case, the assertion is: A uP is small. Including a fair
number of them in a large FPGA is rather low cost.

Nicholas Weaver

unread,

Apr 4, 2002, 5:16:53 PM4/4/02

to

In article <k24r8.1447$Jl4.91...@newssvr13.news.prodigy.com>,

Steve Casselman <sc.n...@vcc.com> wrote:
>I have to disagree that a part with dedicated pins is a net loss for Xilinx.
>For example my patent http://www.delphion.com/details?pn=US06178494__
>suggests that it might be useful to have a part that can be inserted into a
>pre-existing socket. For example if there were a part that fit into the
>second slot of a Pentium system there is a good chance you could sell
>millions and millions of them.

However, the only consistent dedicated pins NEEDED are power and
ground. Otherwise, thanks to the joys of reconfiguration, as long as
the reconfigurable logic is fast enough, you can match the interface.

Also, any dedicated circuitry is much harder to test, as it adds
irregularities which need to be tested.

Kevin Brace

unread,

Apr 4, 2002, 7:35:27 PM4/4/02

to

I have seen a patent issued to Xilinx that describes an FPGA with a
dedicated bus controller.
The bus controller the patent described was a PCI interface.
Peter, is this patent related to what you just said?

Kevin Brace (In general, don't respond to me directly, and respond
within the newsgroup.)

Peter Alfke

unread,

Apr 4, 2002, 7:39:10 PM4/4/02

to

There is no clear relationship between patents and product planning.
Sometimes things get patented, but still don't make it through the product planning
process. A clever idea is one thing, a successful product something else. Sometimes
they come together :-)
I do not know about this particular patent. It is not in my name...
Peter Alfke
======================

Ray Andraka

unread,

Apr 4, 2002, 8:24:13 PM4/4/02

to

Well, no it wasn't me this time. Actually the multipliers can be used as a shifter
but... consider that to build a 16 bit rotator or shifter only requires 64 luts,
and if pipelined to the maximum 4 stages can be clocked at more than twice the
speed of the multiplier. Even then, the speed of the multiplier assumes that the
multiplier has its inputs and outputs registered in the adjacent CLBs with no LUTs
between (the multiplier has fairly long setup and clock to out times compared to
the CLBs, so you need to keep the routes to/from the multiplier very short and with
no LUTs). Adding those registers, you have 16 in, 16 out on top of the registers
and logic for the shift decode (that can actually be done with the BRAM rather than
with CLBs).
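
One way to arrive at the 64-LUT figure is a logarithmic barrel rotator: log2(16) = 4 stages, each stage a row of 16 two-input muxes, one LUT per mux. The post does not spell out the structure, so the Python sketch below is an assumption about it, and barrel_rotate_left and the mux counter are illustrative names only. Each stage conditionally rotates by 1, 2, 4 or 8 positions under control of one bit of the shift amount, which is also why it pipelines naturally into 4 stages.

```python
WIDTH = 16
STAGES = 4          # log2(WIDTH): rotate by 1, 2, 4, 8

def barrel_rotate_left(value, amount):
    """Rotate a 16-bit value left; also count the 2:1 muxes used."""
    bits = [(value >> i) & 1 for i in range(WIDTH)]
    mux_count = 0
    for stage in range(STAGES):
        step = 1 << stage
        select = (amount >> stage) & 1          # one control bit per stage
        next_bits = []
        for i in range(WIDTH):
            # 2:1 mux per bit: pass through, or take the bit `step` places down
            next_bits.append(bits[(i - step) % WIDTH] if select else bits[i])
            mux_count += 1
        bits = next_bits                        # a pipeline register could sit here
    result = sum(b << i for i, b in enumerate(bits))
    return result, mux_count

if __name__ == "__main__":
    rotated, muxes = barrel_rotate_left(0x8001, 1)
    print(hex(rotated), muxes)
```

Under that one-LUT-per-mux assumption the count comes out to 16 x 4 = 64 for every shift amount, matching the figure above.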

In any event, you can see that using the multiplier as a shifter winds up costing
more than half the CLB resources needed to do the same function in the fabric, and
you only get half the speed or less if you are not careful about placement. The
in-the-fabric version also gives you freedom of placement anywhere on the die
instead of being constrained to the multiplier sites. The time-hardware product in
this case actually favors the shifter implemented in the fabric even if the
multipliers themselves are free and not counted in the comparison.

Sorry to rain on your parade there, but you know not everyone can see the emperor's
new clothes.

Nicholas Weaver wrote:

> And, as Ray Andraka has pointed out, a multiplier makes a great
> shifter as well. A variable shift is surprisingly expensive in an FPGA
> fabric: it takes a lot of muxes, yet it is an operation that is
> surprisingly common.
>
> An 18x18 multiplier can implement an 18 bit variable rotation with
> just 18 LUTs worth of logic to decode the shift amount, and an
> additional 18 LUTs worth of logic if you want to make it a left
> shift/rotate, an additional 36 LUTs worth if you want to make a
> variable left/right shift.
>

--
--Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930 Fax 401/884-7950
email r...@andraka.com
http://www.andraka.com

"They that give up essential liberty to obtain a little
temporary safety deserve neither liberty nor safety."
-Benjamin Franklin, 1759

Jan Gray

unread,

Apr 7, 2002, 11:14:56 AM4/7/02

to

"Steve Casselman" <sc.n...@vcc.com> wrote

> For example my patent http://www.delphion.com/details?pn=US06178494__
> suggests that it might be useful to have a part that can be inserted into
> a pre-existing socket.

http://www.fpgacpu.org/usenet/fpgas_as_pc_coprocessors.html:
"Five times better latency and four times better bandwidth could be
achieved if FPGA vendors invent a way to directly connect their parts
to the Pentium Pro external bus, as a peer of the memory/bus
controller. A custom, dedicated Pentium Pro interface would probably
be required, since FPGA configurable logic would be too slow and
electrically incompatible."

(4/96, 5/96 threads:
http://groups.google.com/groups?th=589c20eee24735de,
http://groups.google.com/groups?th=2946a4c24dd295ae)

Steve Casselman

unread,

Apr 8, 2002, 2:06:22 PM4/8/02

to

Hey Jan. You'll notice that even though the patent was issued in 2000 it was
submitted 11/96. Of course my notebook that has my initial write-up is
dated Dec 17, 1994...

Steve Casselman

"Jan Gray" <jsg...@acm.org> wrote in message
news:a8pnvu$e89$1...@slb4.atl.mindspring.net...