Pipelining a large mathematical equation

Carlos

Oct 12, 2012, 10:47:22 AM

Hello,

This is a bit of a general question concerning coding styles.
Suppose that you want to implement the following equation in a fully pipelined manner:
((a + b) + (c + d)*e)*f

To achieve reasonably good timing, I would tend to register the result of almost every arithmetic operation as its own intermediate value.
Afterwards I would combine the intermediate results, for example as shown below (this is just pseudocode, so it may contain errors).

process (my_clk)
begin
  if rising_edge(my_clk) then
    if my_reset = '1' then
      ab1      <= (others => '0');
      ab2      <= (others => '0');
      cd1      <= (others => '0');
      cde2     <= (others => '0');
      abcde3   <= (others => '0');
      abcdef4  <= (others => '0');

      e1       <= (others => '0');
      f1       <= (others => '0');
      f2       <= (others => '0');
      f3       <= (others => '0');
    else
      -- Stage (1)
      ab1      <= a + b;
      cd1      <= c + d;
      e1       <= e;    -- Delay
      f1       <= f;    -- Delay
      -- Stage (2)
      cde2     <= cd1 * e1;
      f2       <= f1;   -- Delay
      ab2      <= ab1;  -- Delay
      -- Stage (3)
      abcde3   <= ab2 + cde2;
      f3       <= f2;   -- Delay
      -- Stage (4)
      abcdef4  <= abcde3 * f3;
    end if;
  end if;
end process;

For now, I append a number to each signal name to indicate which pipeline stage it belongs to, since I don't want to combine an intermediate result with one from an "older" stage.
This might be fine, but for a large interpolation formula with 20 arithmetic terms it becomes quite cumbersome, especially since you have to give every intermediate value a meaningful name.

My questions are the following:
#1- Is there a better coding style to keep the code clean and easily modifiable? Maybe use variables instead of signals etc... (I'm interested in knowing how you guys would implement that).
#2- Is there a way to just insert the whole equation and give enough pipelining registers for the synthesis tool to use/duplicate/balance/insert in order to achieve something neat.
(In this case latency is not an issue, we are more concerned with throughput).
For example, could we do something like the code below and let the synthesis tool do its magic?
#3- If #2 is doable, will this code be more or less portable to other devices, or will I have to manually tweak the synthesis parameters for different devices?

my_eq1 <= ((a + b) + (c + d)*e)*f;
my_eq2 <= my_eq1;
my_eq3 <= my_eq2;
my_eq4 <= my_eq3;
-- ...

In reality we are only going to read the signal "my_eq4"; the others are just pipeline stages that we don't care about.

Notes:
- I am assuming that all vector widths are correctly set.
- I also understand that, depending on the platform and available resources, an architecture might be better suited than another.

Any cool coding tips would be appreciated. Thanks!

Gabor

Oct 13, 2012, 10:56:19 AM

I would suggest trying your idea in your synthesis tool to see
what becomes of it. XST, at least in more recent versions, is pretty
good at "balancing registers" to insert the pipeline stages where it
makes sense to meet timing. Specifically, you can code wide
multiplications that need more than one DSP unit, follow the output
with a number of pipeline stages, and XST will move the pipelining
into the DSP registers for intermediate results. It's not clear
how well XST will perform on other, more general equations, but
again it couldn't hurt to try it out. I imagine Synplicity will
do at least as well. Can't comment on other tools.
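
A minimal sketch of that kind of experiment, for anyone who wants to try it. The entity name, port names and the 35-bit operand width are only assumptions picked for illustration (wide enough that one DSP block is unlikely to hold the product), and the number of trailing registers is arbitrary. The registers have no reset on the assumption that this leaves the tool more freedom to move them; whether they actually end up inside the DSP blocks depends on the tool and on register balancing being enabled in its settings.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical test entity: one wide multiplication followed by spare
-- registers that the synthesis tool may retime into the multiplier.
entity wide_mult is
  port (
    clk : in  std_logic;
    x   : in  unsigned(34 downto 0);
    y   : in  unsigned(34 downto 0);
    p   : out unsigned(69 downto 0)
  );
end entity wide_mult;

architecture rtl of wide_mult is
  -- numeric_std "*" returns x'length + y'length bits, i.e. 70 here.
  signal p1, p2, p3 : unsigned(69 downto 0);
begin
  process (clk)
  begin
    if rising_edge(clk) then
      p1 <= x * y;  -- the full product, described as a single step
      p2 <= p1;     -- spare pipeline register
      p3 <= p2;     -- spare pipeline register
    end if;
  end process;

  p <= p3;          -- latency as written: three clock cycles
end architecture rtl;
```

Comparing the timing report with and without register balancing should show fairly quickly whether the tool makes use of the extra stages.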

-- Gabor

Andy

Oct 15, 2012, 10:04:53 AM

Anytime I see variables or signals named xx1, xx2, xx3, etc., alarm bells and flashing lights go off in my head. You can use an array, and a shift register to accomplish what you want, and the length of the shift register determines the number of pipeline stages (and you could use a generic for that).

generic(stages: positive := 1);

...

type pipe_t is array(1 to stages) of unsigned(output'range);
signal pipe: pipe_t;

...

pipe <= pipe(2 to stages) & (((a + b) + (c + d)*e)*f); -- registered

...

output <= pipe(1); -- combinatorial

Keep in mind, you may want to manage intermediate result size/resolution. If so, take a look at the fixed point package (VHDL-2008), even if you need zero fractional bits. Each operation's result grows to handle the maximum range/precision, and then you use resize() (intermediately and/or at the end) to specify how much range/precision you want to keep. This would be a good place for a function that handles all that; then just call the function in the concatenation for the shift register.
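
Putting the fragments above together, a minimal sketch could look like the following. It uses numeric_std rather than the fixed point package, and the entity name eq_pipe, the function name calc, the width generic and the choice to truncate each product back to the operand width are all assumptions made for illustration, not a recommendation for any particular design.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity eq_pipe is
  generic (
    stages : positive := 4;   -- number of pipeline registers
    width  : positive := 16   -- hypothetical operand and result width
  );
  port (
    clk              : in  std_logic;
    a, b, c, d, e, f : in  unsigned(width - 1 downto 0);
    output           : out unsigned(width - 1 downto 0)
  );
end entity eq_pipe;

architecture rtl of eq_pipe is
  type pipe_t is array (1 to stages) of unsigned(output'range);
  signal pipe : pipe_t;

  -- The whole equation as one combinatorial function. Each product is
  -- twice the operand width, so resize() cuts it back to w bits,
  -- keeping the low-order bits (an assumption for this sketch).
  function calc (a, b, c, d, e, f : unsigned) return unsigned is
    constant w : positive := a'length;
  begin
    return resize((a + b + resize((c + d) * e, w)) * f, w);
  end function calc;
begin
  process (clk)
  begin
    if rising_edge(clk) then
      -- The new result enters at pipe(stages) and shifts toward pipe(1).
      pipe <= pipe(2 to stages) & calc(a, b, c, d, e, f);
    end if;
  end process;

  output <= pipe(1);
end architecture rtl;
```

As written, all of the arithmetic sits in front of the first register; the remaining registers only help if the synthesis tool is allowed to retime them back into the expression.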

Andy

HT-Lab

Oct 17, 2012, 4:53:31 AM

On 12/10/2012 15:47, Carlos wrote:
> Hello,
>
> This is a bit of a general question concerning coding styles.
> Suppose that you want to implement the following equation in a fully pipelined manner:
> ((a + b) + (c + d)*e)*f

Perhaps another solution to look at is High-Level Synthesis. Thanks to
Xilinx Vivado HLS this is now an affordable solution (comes free with
ISE14.2?).

There are some nice videos here:
http://www.xilinx.com/training/vivado/index.htm

HLS tools are ideally suited to perform architectural exploration
(number of pipelines, number/type of resources etc) on a large un-timed
block of logic. After you have crafted your perfect architecture the
tool generates the VHDL or Verilog for you.

Obviously Vivado HLS is not as capable as the big boys (Catapult C,
Cynthesizer, etc.) but I suspect you get quite a lot of HLS for your money.

Just a thought,

Hans
www.ht-lab.com

carlos...@gmail.com

Oct 17, 2012, 1:03:13 PM

Hello,

Gabor, thanks, I might test it out with a more concrete design.

Andy, I do agree with you concerning the incremental numbers; they are quite annoying and I do not typically use them unless it's a single delay pipe.
Concerning using a function: do you mean that you would do the whole calculation (with resizes etc.) in a combinatorial fashion (or in a single equation) and then add some number of pipeline stages to improve the performance?
Or would you manually separate the calculations into several stages rather than rely on the tools to do the work?

I will also have a look at Xilinx Vivado HLS, thanks!
C.

Andy

Oct 24, 2012, 8:59:37 AM

On Wednesday, October 17, 2012 12:03:14 PM UTC-5, carlos...@gmail.com wrote:
> Andy, I do agree with you concerning the incremental numbers, they are quite annoying and I do not typically use them unless it's a single delay pipe.
> Concerning using a function. Do you mean that you would do the whole calculation (with resizes etc...) in a combinatorial fashion (or in single equation) and then add some amount of pipelining stages to improve the performance.
> Or would you manually separate the calculations into several stages rather than rely on the tools to do the work?
>
> I will also have a look at Xilinx Vivado HLS, thanks!
> C.

Sorry it took me a while to get back to you...

I would try the single expression followed (or preceded) by pipeline stages, and enable retiming/pipelining optimizations in synthesis first. If that gets you where you need to be (speed, area, etc.) then you're done. Only if that does not work satisfactorily with your tool would I break it up manually into individual pipeline stages. The former is much more readable, writable and maintainable than the latter.

Note that, if you need to use resize() to manage the size/precision of intermediate results, you can break it up into several statements using variables, then shift the final variable result into the pipeline. Just remember to write each variable before reading it, so that it keeps its combinatorial behavior before it gets shifted into the pipeline, all in the same clocked process.

Note also that breaking the expression up is not absolutely necessary if you want to use resize() on intermediate results; you can embed resize() in the expression.
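
A minimal sketch of the variable-based version, assuming numeric_std unsigned operands of equal width. The entity name eq_pipe_var, the variable names and the truncating resize() calls are hypothetical and only there to make the example complete.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity eq_pipe_var is
  generic (
    stages : positive := 4;   -- number of pipeline registers
    width  : positive := 16   -- hypothetical operand and result width
  );
  port (
    clk              : in  std_logic;
    a, b, c, d, e, f : in  unsigned(width - 1 downto 0);
    output           : out unsigned(width - 1 downto 0)
  );
end entity eq_pipe_var;

architecture rtl of eq_pipe_var is
  type pipe_t is array (1 to stages) of unsigned(width - 1 downto 0);
  signal pipe : pipe_t;
begin
  process (clk)
    variable sum_ab : unsigned(width - 1 downto 0);
    variable sum_cd : unsigned(width - 1 downto 0);
    variable prod   : unsigned(width - 1 downto 0);
    variable result : unsigned(width - 1 downto 0);
  begin
    if rising_edge(clk) then
      -- Every variable is written before it is read in the same clock
      -- edge, so none of them holds state between clocks; together
      -- they describe one combinatorial expression.
      sum_ab := a + b;
      sum_cd := c + d;
      prod   := resize(sum_cd * e, width);              -- keep low bits
      result := resize((sum_ab + prod) * f, width);     -- keep low bits

      -- Only the final value is registered, at the tail of the pipeline.
      pipe <= pipe(2 to stages) & result;
    end if;
  end process;

  output <= pipe(1);
end architecture rtl;
```

Reading one of these variables before assigning it in the same process would make it hold its value from the previous clock, which is exactly the accidental register that the write-before-read rule avoids.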

Andy

Gerhard Hoffmann

Dec 9, 2012, 10:09:07 AM

On 12.10.2012 16:47, Carlos wrote:
> Hello,
>
> This is a bit of a general question concerning coding styles.
> Suppose that you want to implement the following equation in a fully pipelined manner:

.....

> Any cool coding tips would be appreciated. Thanks!
>

Take a look at my sine / cosine function on opencores.org,
and the pipeline entity in particular.

http://opencores.org/project,sincos

regards, Gerhard