
Signal processing using FPGAs


Amit Thakar

Jan 16, 2002, 3:11:11 PM
Hi,

I was hoping someone could answer questions I had regarding digital signal
processors vs. FPGAs for implementing computationally intensive signal
processing algorithms:

1. Can FPGAs (especially newer ones) achieve better performance than DSPs?
2. If so, then why do people use DSPs as opposed to FPGAs?
3. Which is more flexible in terms of reprogrammability? (I would think they
would be the same in this regard.)
4. What are the other advantages and disadvantages of using FPGAs vs. DSPs?

In general, I understand that DSPs provide a low cost solution due to high
volume of generic products, but performance tends to fall short for many
applications.

Any input would be greatly appreciated.
Thanks!

-Amit

Austin Lesea

Jan 16, 2002, 4:00:06 PM
See below,

Austin

Amit Thakar wrote:

> Hi,
>
> I was hoping someone could answer questions I had regarding digital signal
> processors vs. FPGAs for implementing computationally intensive signal
> processing algorithms:
>
> 1. Can FPGAs (especially newer ones) achieve better performance than DSPs?

Yes. 70 to over 100 times better performance.

>
> 2. If so, then why do people use DSPs as opposed to FPGAs?

Tools. It is hard to create DSP structures for FPGAs, as opposed to writing C
code, and hard to simulate them. The XtremeDSP (tm) program is addressing these
issues with better tools (i.e. CoreGen, an interface to MATLAB/Simulink, etc.).

>
> 3. Which is more flexible in terms of reprogrammability (I would think they
> would be the same in this regard).

Either is easy to reprogram, but you need the program! FPGAs can change their
hardware, not just the algorithm, so they are (potentially) more flexible.

>
> 4. What are other (dis)advantages of using FPGAs vs. DSPs.

Tools. Hard to design, hard to simulate. Cost. If you need 100X the
performance, the cost is low. If you only need what a simple DSP chip can do,
the cost is high (in comparison).

>
>
> In general, I understand that DSPs provide a low cost solution due to high
> volume of generic products, but performance tends to fall short for many
> applications.

FPGAs are also mass produced, so their cost is low (see Spartan for example).

Ray Andraka

Jan 16, 2002, 5:17:26 PM

Amit Thakar wrote:

> Hi,
>
> I was hoping someone could answer questions I had regarding digital signal
> processors vs. FPGAs for implementing computationally intensive signal
> processing algorithms:
>
> 1. Can FPGAs (especially newer ones) achieve better performance than DSPs?

Yes. We typically see just under a 100:1 performance advantage in our designs
(mostly radar and imaging). That ratio hasn't changed much over the last 10
years.

>
> 2. If so, then why do people use DSPs as opposed to FPGAs?

Obtaining that performance from FPGAs requires a relatively rare skill set: the
FPGA DSP designer needs to be skilled in logic design, specifically FPGA design,
and also has to be familiar enough with hardware-based DSP to be able to
optimize the algorithm for a hardware (make that FPGA) implementation. Most DSP
algorithms in use today are geared toward a software solution, and an efficient
hardware solution often requires a quite different approach. Most people doing
DSP work these days are software types (which makes sense, seeing that the DSP
field has been dominated by the microprocessor for the past quarter century), so
it follows that DSP designs will continue to use microprocessors as the
preferred platform as long as the project requirements are met.

>
> 3. Which is more flexible in terms of reprogrammability (I would think they
> would be the same in this regard).

The microprocessors still hold an edge in flexibility, mostly because of the
state of the tools. Microprocessor code changes typically take seconds to
compile, whereas an FPGA compile can take hours. The tools, especially when it
comes to evaluating algorithms, are also much better on the software side.

>
> 4. What are other (dis)advantages of using FPGAs vs. DSPs.

The biggest disadvantage of using FPGAs is the relative scarcity of the
expertise needed for the performance gains I noted above. FPGAs also cost more
per device than typical DSP micros, but when you consider dollars for a given
level of performance, the FPGAs get cheaper as soon as performance drives you
past one or two DSPs.

I tell my customers that if it can be done in one (or at most two) DSP micros,
they should do it there rather than in an FPGA, because it is a lot easier (and
somewhat cheaper) to find expertise on the processors.

>
>
> In general, I understand that DSPs provide a low cost solution due to high
> volume of generic products, but performance tends to fall short for many
> applications.
>
> Any input would be greatly appreciated.
> Thanks!
>
> -Amit

--
--Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930 Fax 401/884-7950
email r...@andraka.com
http://www.andraka.com

"They that give up essential liberty to obtain a little
temporary safety deserve neither liberty nor safety."
-Benjamin Franklin, 1759


Amit Thakar

Jan 16, 2002, 5:36:31 PM

Thanks a lot Austin and Ray, your responses were really helpful.

-Amit

"Amit Thakar" <ath...@uwaterloo.ca> wrote in message
news:z7l18.253991$KT.59...@news4.rdc1.on.home.com...

Steve Underwood

Jan 17, 2002, 12:42:43 AM
Amit Thakar wrote:

> Hi,
>
> I was hoping someone could answer questions I had regarding digital signal
> processors vs. FPGAs for implementing computationally intensive signal
> processing algorithms:
>
> 1. Can FPGAs (especially newer ones) achieve better performance than DSPs?


It depends on what you mean by performance. If you mean speed, then yes: an
FPGA solution can generally be one to two orders of magnitude faster. If you
mean the ability to implement extremely complex algorithms well, then no. The
complexity of handling, say, a CELP vocoder in dedicated hardware makes it a
poor choice.


> 2. If so, then why do people use DSPs as opposed to FPGAs?


Use dedicated hardware solutions for simple things - if you need a
simple filter in a CD player, nothing will beat a bit serial solution in
hardware.

Use dedicated hardware solutions for speed - if you do real-time radio
channel processing you generally have a high throughput requirement, but
the algorithms aren't too complex. Doing this with programmable DSPs
would require a large bank of them, splitting up the work. FPGAs are
ideal here.

Use programmable DSPs where the algorithms are complex - things like
vocoders can be a nightmare to implement in an FPGA.

Use programmable DSPs where the requirements change frequently - FPGAs
may be reprogrammable, but minor changes in requirements tend to become
major changes in an FPGA DSP solution. In a programmable solution they
may be just a few lines of code.

Use programmable DSPs for low volume - this is not a clear cut thing, as
FPGAs are the right things for a lot of low volume uses. However, your
chances of being able to use, say, a standard PCI card solution are far
better with the programmable approach.

> 3. Which is more flexible in terms of reprogrammability (I would think they
> would be the same in this regard).


It depends on what flexibility you need. If you got the board layout wrong, you
are more likely to be able to fix that by reprogramming an FPGA than a
programmable DSP. If you want to make algorithm changes, it's usually
much quicker with programmable DSPs.


> 4. What are other (dis)advantages of using FPGAs vs. DSPs.

I think these issues are too application dependent to generalise.

> In general, I understand that DSPs provide a low cost solution due to high
> volume of generic products, but performance tends to fall short for many
> applications.


What does "performance falls short" mean? Nothing will match the current
consumption of the programmable solution in my GSM phone. If battery
life is your overriding criterion, that very current-consumption-focussed
solution will beat anything a more general-purpose FPGA can do. In terms
of mere processing speed, in most DSP applications some specific level
is enough and you can't take any real advantage of any left-over compute
resources. You can never have too low a level of battery consumption,
though. No solution will maximise all performance parameters - pretty
much like any other area of engineering.

Regards,
Steve

Bert Cuzeau

Jan 17, 2002, 4:06:36 AM
Hello Amit,

You have indeed received enthusiastic feedback about
implementing D.S.Processing in FPGAs. (Is this a surprise? ;-)

One issue you may soon face is getting/building a hardware
platform, and how to interface it with the real world.

For some of these designs, we now use a superb platform
made by a UK company named Hunt Engineering.
http://www.hunteng.co.uk

This architecture is really clever, and you can freely
mix FPGA(s) and DSP(s) in your system and get the
best of both worlds. Control by the PC is also a breeze.

Hope this helps,

Bert Cuzeau
Technical Manager, ALSE France
Technical Manager ALSE rance
- Digital Design
- VHDL + Verilog
- Training Courses


Austin Lesea

Jan 17, 2002, 10:31:23 AM
You are welcome,

Please visit http://www.nallatech.com/ for hardware and software to make the
DSP development in FPGAs much much easier.

Austin

Falk Brunner

Jan 17, 2002, 12:30:53 PM
"Ray Andraka" <r...@andraka.com> schrieb im Newsbeitrag
news:3C45FC8B...@andraka.com...

> The biggest disadvantage to using FPGAs is the relative scarcity of
expertise
> needed for the performance gains I noted above. FPGAs also cost more per
device
> than typical DSP micros, but when you consider dollars for a specific
> performance then the PFGAs get cheaper as soon as performance drives you
over
> one or two DSPs.

There is a nice article in the new issue of Xcell. It's about a design for a
2D FFT of a 2048x2048x16-bit image at 120 frames/second.
They use two XC2V6000s: one for the row transformation, one for the column
transformation. A 2048-point (16-bit) FFT takes 2.8 us. The whole system
costs about $20k. The system that was used before (dozens of PowerPC cards)
was about $480k.
Hmm, not bad ;-)

--
MfG
Falk
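[A quick back-of-envelope check of the throughput Falk quotes. The frame size and rate are taken from his post; the script itself is just illustrative arithmetic, not from the Xcell article:]

```python
# Sanity check on the numbers quoted above: one device must transform
# 2048 rows per frame at 120 frames/second, so the time budget for each
# 2048-point row FFT is:
rows_per_frame = 2048
frames_per_second = 120
budget_us = 1e6 / (rows_per_frame * frames_per_second)  # microseconds per FFT
print(round(budget_us, 2))  # about 4.07 us, so a 2.8 us FFT fits with margin
```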


#BASUKI ENDAH PRIYANTO#

Jan 17, 2002, 9:18:38 PM
Go to the following URL: www.andraka.com

They have explanations for your Q's.

BUZZ


Ron Huizen

Jan 21, 2002, 9:54:51 AM
With respect to DSP vs. FPGA, I'm curious to hear thoughts on floating-
point versus fixed-point issues. There has been some discussion with
respect to algorithm complexity, and also on the types of algorithms
better suited for DSP than FPGA implementations. Is it just assumed
that anything requiring floating point falls into the DSP camp?

Of course, I must confess a bit of a bias, working at a SHARC DSP board
company, but we do have a PMC board with both a floating-point DSP (ADI
21160) and a Virtex-II, so I'm looking for more insight as to where we
should be drawing the line between the FPGA and the DSP in terms of system
implementations.
Our approach is always weighted by customer time and cost issues, but
typically whatever's easier (and hence faster and less risky) comes
ahead of whatever is most efficient in terms of saving a few $$.

----------
Ron Huizen
Bittware

glen herrmannsfeldt

Jan 21, 2002, 2:35:56 PM
Amit Thakar [mailto:ath...@uwaterloo.ca] wrote:

>I was hoping someone could answer questions I had regarding
>digital signal processors vs. FPGAs for implementing
>computationally intensive signal processing algorithms
>
> 1. Can FPGAs (especially newer ones) achieve better
> performance than DSPs?
> 2. If so, then why do people use DSPs as opposed to FPGAs?
> 3. Which is more flexible in terms of reprogrammability (I
> would think they would be the same in this regard).

> 4. What are other (dis)advantages of using FPGAs vs. DSPs.


>
> In general, I understand that DSPs provide a low cost solution
> due to high volume of generic products, but performance
> tends to fall short for many applications.

Some years ago, I met people working on a C compiler that compiled
C code to an FPGA configuration. The idea seemed wrong to me.

A programmed FPGA is like a whole bunch of logic connected together;
once programmed, it is static. If you can use most of the logic on
most clock cycles (assuming synchronous designs), FPGAs work well.

Consider an average C program, with some initialization, maybe
a few loops in the middle, some function calls, and some results
written out at the end. When each of those parts is executing,
the other parts are not. If you tried to program this into an
FPGA, some of the logic would go to initialization and be used
only once. The result writing at the end is also done only once.
This is not the way logic design is done.

Only the loop part is normally executed enough to make good use
of FPGA logic. If you concentrate on those iterative parts
of the algorithm, then you can make good use of an FPGA.

In many cases you would want both an FPGA to implement the most
used logic operations, and a DSP or other programmable processor
to implement the control functions, things like I/O operations
and such.

Some logic operations work well in FPGAs while others don't.
Something as common as floating-point arithmetic is very hard
to implement well in an FPGA, but is supported well by many DSPs.

Pretty much, you have to look at each algorithm individually
to see what the best implementation would be.

P.S., if this is homework please reference the newsgroup.
You will still need to do enough thinking to get an answer
out of this that you should get appropriate credit.

-- glen

Ray Andraka

Jan 21, 2002, 2:41:17 PM
We draw the dividing line mostly in terms of performance. The cost differences
are not great enough in most cases to be a deciding factor.

As for floating point, a majority of DSP applications can be done in fixed
point rather than floating point. The nice thing with FPGAs is that you are not
tied to a specific bit width; you can go with whatever width matches your
needs. There are occasionally applications that benefit from floating point,
and we do still do those in FPGAs. Again, there are architectural things that
can be done to reduce the complexity of the hardware. If you look at floating
point, all it is is a fixed-point value with a second fixed-point value
attached to it indicating a power-of-2 scale. Many operations require the
operands to be of the same scale, so that the operation is essentially a
fixed-point operation (addition is an example of this).

When we do something that requires floating point, we typically do a string of
operations in fixed point, starting with the normalized mantissa. Then, only
after completing the operations, we renormalize the result and adjust the
exponent. For example, a CORDIC rotation is inherently a fixed-point operation
because it is a series of shifts and adds. If we need to use it with
floating-point input and output, we do the actual rotation on the normalized
input pair and pass the exponent around it. At the back end, we renormalize
the rotated mantissa and adjust the exponent accordingly. If you compare that
to a series of floating-point adders, you've saved a tremendous amount of
hardware in terms of normalizing/denormalizing shifters without giving up
anything in dynamic range or precision.

Another example is the use of a block floating-point algorithm for doing large
FFTs. For small FFTs we just accept the bit growth by increasing the width of
the data path. For large ones, we often use a block floating-point scheme that
has a common exponent for the whole set.
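[Ray's shared-exponent idea can be sketched behaviorally. This is a toy model for illustration, not anyone's actual FPGA implementation; the function names and the 16-bit mantissa width are assumptions:]

```python
def block_float_pack(samples, mant_bits=16):
    # Toy block-floating-point pack: one shared exponent for the whole
    # block instead of a per-sample exponent (hypothetical sketch).
    peak = max(abs(s) for s in samples)
    exp = 0
    # Grow the shared exponent until the peak magnitude fits the mantissa.
    while (peak >> exp) >= (1 << (mant_bits - 1)):
        exp += 1
    mantissas = [s >> exp for s in samples]  # lossy for small samples
    return mantissas, exp

def block_float_unpack(mantissas, exp):
    # Rescale the shared-exponent mantissas back to full-scale values.
    return [m << exp for m in mantissas]

mants, exp = block_float_pack([100000, -3, 7])
print(mants, exp)  # [25000, -1, 1] 2
```

[Small samples lose their low bits to the shared exponent, which is the trade being described: the bit growth of a large FFT is absorbed by a single exponent for the whole set instead of per-sample normalization hardware.]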

Peter Alfke

Jan 21, 2002, 4:30:05 PM
to glen herrmannsfeldt

glen herrmannsfeldt wrote:

> In many cases you would want both an FPGA to implement the most
> used logic operations, and a DSP or other programmable processor
> to implement the control functions, things like I/O operations
> and such.
>
> Some logic operations work well in FPGA's while others don't.
> Something as common as floating point arithmetic is very hard
> to implement well in an FPGA, but is supported well by many DSP.
>
> Pretty much, you have to look at each algorithm individually
> to see what the best implementation would be.

In a nutshell:
FPGAs are good at parallel implementations - thousands (tens of thousands!) of
little LUT engines running simultaneously. That's why they can beat DSPs by
orders of magnitude in performance, but not in versatility and ease of use.
DSPs are good at executing code sequentially (yes, I know, sometimes a few
operations in parallel). Not as fast, but they handle more complexity.

Traditionally, FPGAs have shied away from floating point, but Virtex-II with
hundreds of multipliers may change that ( the multiplier can also be used as a
shifter ). This invites simplified non-IEEE floating point.

Peter Alfke, Xilinx Applications

Eric Smith

Jan 21, 2002, 5:46:22 PM
Peter Alfke <peter...@xilinx.com> writes:
> Traditionally, FPGAs have shied away from floating point, but
> Virtex-II with hundreds of multipliers may change that ( the
> multiplier can also be used as a shifter ). This invites simplified
> non-IEEE floating point.

What's the most efficient way to implement a priority encoder in a
Virtex-II to compute the number of bits needed to shift for normalization
after FP add/subtract?

Peter Alfke

Jan 21, 2002, 7:15:42 PM
to Eric Smith
There must be many different methods. Here is one 18-input priority encoder
that seems "creative":
Use a BlockRAM configured as 1k x 18. Feed 9 bits into port A, the other 9
bits into port B.

This produces un-encoded outputs: 9 from port A, 9 from port B (stagger the
output wiring appropriately).
Now you need a wide NOR of all 9 inputs on A to act as an enable for port B.
If the BlockRAM were combinatorial, we could hide this wide NOR inside the ROM,
but the BlockRAM read is clocked, so we must use external logic (2 or 3 LUTs)
to enable the outputs from port B.
But we save the encoder/decoder needed to control the multiplier as a shifter.
There is only a single 1 coming from this 18-input priority encoder.

I thank Bernie New for the basic idea.

Peter Alfke, Xilinx Applications
=======================================
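[Behaviorally, Peter's circuit reduces to: emit a one-hot word marking only the most-significant set input. A toy model for reference; the dual-port BlockRAM split and the wide NOR on port A are implementation details not modeled here, and the function name is invented:]

```python
def priority_onehot(bits):
    # Behavioral model of an 18-input priority encoder with un-encoded
    # (one-hot) output: only the first (most-significant) 1 survives.
    # bits[0] is treated as the MSB.
    out = [0] * len(bits)
    for i, b in enumerate(bits):
        if b:
            out[i] = 1
            break
    return out

print(priority_onehot([0, 0, 1, 0, 1, 1]))  # [0, 0, 1, 0, 0, 0]
```

[The one-hot output is exactly what is needed to drive the multiplier as a shifter without an encode/decode pair.]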

Ray Andraka

Jan 21, 2002, 7:49:33 PM
You don't need a priority encoder for normalization. Instead, use a merged tree
shifter and make the shift decision at each step.

This shifter consists of n layers of 2:1 muxes. For the normalizing shift, you
need to do the largest shift first, so in the case of a 16-bit normalizer you
have 4 levels of 2:1 muxes. The first layer shifts the data left by 8 bits if
there are 8 redundant sign bits in the input (for sign-magnitude format, if
there are 8 leading zeros; for 2's complement, if the top 9 bits are the same);
otherwise it passes the data unchanged. The next layer shifts the data from the
first layer left by 4 bits if the top four bits of that data are redundant sign
bits, or passes it unchanged otherwise, and so on. No priority encode is
needed, you can pipeline it deeply and easily (at no cost), it takes less real
estate and clock latency than the priority-encode-plus-shifter approach, and
the shift decisions at each level, when concatenated together, indicate the
total shift, which can be used as the exponent. This is not unique to
Virtex-II, and when pipelined it is much (nearly 3x) faster than either the
multipliers or the block RAM.
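[Ray's merged tree shifter is easy to model for the unsigned (leading-zero) case. A minimal sketch under that assumption; the 16-bit width follows his example and the function name is mine:]

```python
def normalize16(x):
    # Four layers of conditional left shifts (8, 4, 2, 1). Each layer
    # shifts only when the top k bits are all zero; the concatenated
    # shift decisions, MSB first, form the binary shift count (exponent).
    assert 0 < x < (1 << 16)
    shift = 0
    for k in (8, 4, 2, 1):
        if (x >> (16 - k)) == 0:      # top k bits all zero?
            x = (x << k) & 0xFFFF     # shift left by k
            shift = (shift << 1) | 1  # record a 1 decision
        else:
            shift = shift << 1        # record a 0 decision
    return x, shift

print(normalize16(0x00FF))  # (65280, 8): 0xFF00 with a shift count of 8
```

[No priority encode appears anywhere: the shift amount falls out of the mux decisions themselves, which is the point of the scheme.]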


John_H

Jan 22, 2002, 12:57:35 PM
The "single bit out" to work as a shift position in the multiplier block is
beautiful.

But wouldn't it be faster and more efficient to do a cascade to generate this
single bit? Cascade one logic value until the first "one" is encountered, then
cascade the other value. Take an XOR of the cascade shifted zero and the
cascade shifted one and you have the single-bit output, like with the dual-port
ROM. [If the timing analyzer wouldn't choke on feeding YB back into the
next-stage LUT for n(tilo), the XORCY could perform the function without an
additional level of logic.]

Agreed, there are "many different methods," but I've never liked getting into
the ROM initialization issues.
