ALU in VHDL and a bunch of questions

Dmitri Katchalov

Jul 25, 2002, 12:34:33 PM

Hi,

I'm new to FPGAs. I'm trying to replicate the PIC16Fxxx core as an exercise
(any real programmer should write at least one OS and compiler :)

I'm trying to synthesize a simple ALU. I'm using VHDL and XST (WebPack).
The target is a Spartan-IIE. It sort of works but is rather inefficient.
At first I tried a big case statement for all ALU operations.
XST happily infers lots of built-in macros (one for each ALU op)
and a huge output mux. For example it produces 6 carry-chain adders
(one for each ADD, SUB, INC, DEC and another two to get the
half-carry bit for ADD/SUB) where I would think one is enough.

I've narrowed the problem down to a simple adder/subtractor:

if add='1' then
    Y <= A + B;
else
    Y <= A - B;
end if;

This works fine, produces a single 8-bit adder/subtractor. 4 slices in total.
But this does not give me carry/borrow bit.

if add='1' then
    Y <= ('0' & A) + ('0' & B);
else
    Y <= ('0' & A) - ('0' & B);
end if;

produces 8bit adder with carry-out, a separate 9bit subtractor and
a 9bit 2x1 mux. 9 slices. I tried different variations of the above
with the same results.

Finally I have come up with the following code.
It uses the fact that A-B = A +(-B) = A + ((not B) + 1).

variable tmp: integer;
variable cin: std_logic;

if op = '1' then
    tmp := conv_integer(B);
    cin := '0';
else
    tmp := conv_integer(not B);
    cin := '1';
end if;

Y <= conv_std_logic_vector(conv_integer(A) + tmp + conv_integer(cin),9);

This infers 1 "9bit adder carry in" and 8 2x1 muxes and takes only 4 slices.
Much better. One small detail: if I declare cin as integer instead
of converting it from std_logic at the last step, I'm back to 9 slices.

Now the questions.

* Am I on the right track?

* I'm trying to describe purely combinatorial logic here. The output
is supposed to be the same fixed boolean function of inputs no matter
how it is described. Why such big variations (more than 2 times the area)?
Is this a problem with the tool or are they all like that?

* Should I be tweaking XST settings instead? Is there a magic setting
like "Do what I mean not what I say" :)

* Xilinx lib has "8bit adder carry out" but it doesn't seem to have
"8bit subtractor borrow out". Is this right?

* How do I get the half-carry bit out of the 8bit adder? I guess I can
instantiate/infer two separate 4bit adders. Is there a better way?

* What's the story with IEEE.STD_LOGIC_SIGNED vs. IEEE.STD_LOGIC_UNSIGNED? I heard that
they are mutually exclusive and math operations produce different
results depending on which one is in use. Webpack automatically inserts
IEEE.STD_LOGIC_UNSIGNED.ALL at the beginning of every VHDL source it
creates. Should I always use UNSIGNED?

* Is there a decent on-line reference for all those IEEE.* libraries?
I've found several good VHDL tutorials but none of them covers
std_logic in detail.

Thanks,
Dmitri

Goran Bilski

Jul 25, 2002, 1:58:43 PM

Hi Dmitri,

Believe me, I have been there, and I couldn't find a nice way to do what I wanted,
which was an add/sub with carry-in and carry-out.
I spent a day trying all sorts of ways to express that but only ended up with half
solutions.
Then I just instantiated LUT, MUXCY and XORCY using a generate loop.
That took me 5 minutes to code, which is way faster than trying to fool the tools.

I have found that if I have to work around the synthesis tool by changing my code,
it's way faster to just instantiate what I want.
That also gives me the possibility to RLOC the components and do even more stuff
using the MULT_AND and the set/reset on the DFFs.

Göran

Renaud Pacalet

Jul 26, 2002, 7:34:35 AM

Dmitri Katchalov wrote:

> Hi,
>

Hi Dmitri,

> ...

> Now the questions.
>
> * Am I on the right track?

As you know what you want you could express it directly:

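library IEEE;
use IEEE.STD_LOGIC_1164.all;
use IEEE.NUMERIC_STD.all;
...
signal IN1, IN2: UNSIGNED(8 downto 0);
signal CIN: NATURAL range 0 to 1;
...
-- The UNSIGNED(...) conversions are needed if A and B are
-- STD_LOGIC_VECTORs and are harmless if they are already UNSIGNED.
IN1 <= '0' & UNSIGNED(A);
IN2 <= '0' & UNSIGNED(B) when ADD = '1' else
       '1' & UNSIGNED(not B);
CIN <= 0 when ADD = '1' else
       1;
Y <= STD_LOGIC_VECTOR(IN1 + IN2 + CIN);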

>
> * I'm trying to describe purely combinatorial logic here. The
> output is supposed to be the same fixed boolean function of inputs
> no matter how it is described. Why such big variations (more than
> 2 times the area)? Is this a problem with the tool or they all
> like that?

They're all like that.

> * What's the story with IEEE.std_logic.SIGNED vs .UNSIGNED? I
> heard that they are mutually exclusive and math operations
> produce different results depending on which one is in use.
> Webpack automatically inserts IEEE.STD_LOGIC_UNSIGNED.ALL at the
> beginning of every VHDL source it creates. Should I always use
> UNSIGNED?

Use IEEE.NUMERIC_STD instead of all these non-standard packages.

Regards,
--
Renaud Pacalet, ENST / COMELEC, 46 rue Barrault 75634 Paris Cedex 13
Tel. : 01 45 81 78 08 | Fax : 01 45 80 40 36 | Mel : pac...@enst.fr

Adrian Bica

Jul 26, 2002, 10:37:05 AM

dmi...@mailandnews.com (Dmitri Katchalov) wrote in message news:<3db7c986.02072...@posting.google.com>...

> Hi,
>
> I'm new to FPGA. I'm trying to replicate PIC16Fxxx core as an exercise
> (any real programmer should write at least one OS and compiler :)
>
> I'm trying to synthesize a simple ALU. I'm using VHDL and XST (WebPack).

I would suggest working at the bit level. Create a bit cell able to
perform all the operations, not only add and subtract. The bit cell
will have two inputs A and B for operands, one input Cin for carry-in,
two inputs S0 and S1 to select the operation, one output for the result
and one for Cout (carry-out). Basically, you need a 4-input mux with
two selection bits. The inputs of the mux will be:

  A        - used for NOPs and other instructions which don't
             change the accumulator
  A or B   - used for the OR instruction and set-bit instructions
  A and B  - used for AND instructions and clear-bit instructions
  A xor B  - used for XOR, ADD and SUB (if you apply an inverted operand
             at input B and set Cy to 1 if no borrow)

I didn't go into details (you should also use Generate and
Propagate signals for speed, and you should think about how to do rotate
and shift, bit operations and test-bit operations); I just want to suggest
using a structural description rather than a behavioral one if you
want to optimize the design.
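
A minimal sketch of such a bit cell, for illustration only. The entity name alu_bit_cell, the encoding of S, and the carry_en input are assumptions of this sketch and not part of the description above. carry_en is there because a plain XOR needs the carry held at '0', while ADD and SUB need it to ripple.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical one-bit ALU cell; names and encodings are illustrative.
entity alu_bit_cell is
  port (
    A, B     : in  std_logic;                     -- operand bits
    Cin      : in  std_logic;                     -- carry-in from the cell below
    S        : in  std_logic_vector(1 downto 0);  -- operation select (S1 & S0)
    carry_en : in  std_logic;                     -- assumed: '1' for ADD/SUB, '0' for XOR
    Y        : out std_logic;                     -- result bit
    Cout     : out std_logic                      -- carry-out to the cell above
  );
end entity alu_bit_cell;

architecture rtl of alu_bit_cell is
  signal p : std_logic;  -- propagate term: A xor B
begin
  p <= A xor B;

  -- 4-input mux selecting the result bit.
  with S select
    Y <= A         when "00",    -- pass A through
         A or B    when "01",    -- OR / set bit
         A and B   when "10",    -- AND / clear bit
         p xor Cin when others;  -- XOR (Cin held at '0'), ADD, SUB (B inverted outside)

  -- Full-adder carry, forced low when carry_en = '0'.
  Cout <= carry_en and ((A and B) or (p and Cin));
end architecture rtl;
```

Eight of these chained Cout-to-Cin would give an 8-bit datapath; for SUB, the surrounding logic would feed in not B and set the bottom Cin to '1'.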

Regards,
Adrian

Ray Andraka

Jul 26, 2002, 2:09:16 PM

Hear! hear!

Goran speaks the truth here. This is why a large part of our internal library is
structurally instantiated.

The trouble with an add/sub with carry-in is that for subtraction you need to do A +
!B +1, which means the +1 has to go into the carry chain as the carry in, so you'll
have trouble making that work in 1 level of logic unless you do the +1 in other logic
either before or after the add/sub stage.

The trick to getting the synth to produce what you want most of the time is to code
it up with a structure exactly mimicking the slice, which is to say code it so that
you do any muxing etc in front of the add:

B_inv <= not B when sub='1' else B;
cin <= 1 when sub='1' else 0;
d <= A + B_inv + cin;

If need be, you can put syn_keeps on the inputs of that logic to keep the synth
honest.

Goran Bilski wrote:

--
--Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930 Fax 401/884-7950
email r...@andraka.com
http://www.andraka.com

"They that give up essential liberty to obtain a little
temporary safety deserve neither liberty nor safety."
-Benjamin Franklin, 1759

SPAM@freeuk.com MikeJ

Jul 27, 2002, 10:38:10 AM

Hi,

This is the add/sub part of the ALU from the risc5x core on opencores, where
you can also get the package and the generic vhdl / simulation model.

Output is A + B, A - B or A.

The trick is to force a one on the carry in when doing a subtract.
Logic usage : 1 slice for every 2 bits.

Some people argue (rightly so) that this level of code is unreadable. True,
it is, but you can build up a library of these things which have been
simulated to death (I have simulation models of LUT4, MUXCY etc) and then
just use them when you need them.

hope this helps,
Mike.
--
-- Risc5x
-- www.OpenCores.Org - November 2001
--
--
-- This library is free software; you can distribute it and/or modify it
-- under the terms of the GNU Lesser General Public License as published
-- by the Free Software Foundation; either version 2.1 of the License, or
-- (at your option) any later version.
--
-- This library is distributed in the hope that it will be useful, but
-- WITHOUT ANY WARRANTY; without even the implied warranty of
-- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
-- See the GNU Lesser General Public License for more details.
--
-- A RISC CPU core.
--
-- (c) Mike Johnson 2001. All Rights Reserved.
-- mikej@<NOSPAM>opencores.org for support or any other issues.
--
-- Revision list
--
-- version 1.0 initial opencores release
--
library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_arith.all;
use ieee.std_logic_unsigned.all;

--
-- op <= A +/- B or A
--
entity ADD_SUB is
  generic (
    WIDTH : in natural := 8
  );
  port (
    A          : in  std_logic_vector(WIDTH-1 downto 0);
    B          : in  std_logic_vector(WIDTH-1 downto 0);

    ADD_OR_SUB : in  std_logic; -- high for DOUT <= A +/- B, low for DOUT <= A
    DO_SUB     : in  std_logic; -- high for DOUT <= A - B, low for DOUT <= A + B

    CARRY_OUT  : out std_logic_vector(WIDTH-1 downto 0);
    DOUT       : out std_logic_vector(WIDTH-1 downto 0)
  );
end;

use work.pkg_xilinx_prims.all;
library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_arith.all;
use ieee.std_logic_unsigned.all;

architecture VIRTEX of ADD_SUB is

  signal lut_op      : std_logic_vector(WIDTH-1 downto 0);
  signal mult_and_op : std_logic_vector(WIDTH-1 downto 0);
  signal carry       : std_logic_vector(WIDTH downto 0);
  signal op_int      : std_logic_vector(WIDTH-1 downto 0);

  function loc(i : integer) return integer is
  begin
    return (((WIDTH+1)/2)-1) - i/2;
  end loc;

begin
  carry(0) <= DO_SUB;
  INST : for i in 0 to WIDTH-1 generate
    attribute RLOC of u_lut : label is "R" & integer'image(loc(i)) & "C0.S1";
    attribute RLOC of u_1   : label is "R" & integer'image(loc(i)) & "C0.S1";
    attribute RLOC of u_2   : label is "R" & integer'image(loc(i)) & "C0.S1";
    attribute RLOC of u_3   : label is "R" & integer'image(loc(i)) & "C0.S1";
    attribute INIT of u_lut : label is "C66C";
  begin
    u_lut : LUT4
      --pragma translate_off
      generic map (
        INIT => str2slv(u_lut'INIT)
      )
      --pragma translate_on
      port map (
        I0 => ADD_OR_SUB,
        I1 => A(i),
        I2 => B(i),
        I3 => DO_SUB,
        O  => lut_op(i)
      );

    u_1 : MULT_AND
      port map (
        I0 => ADD_OR_SUB,
        I1 => A(i),
        LO => mult_and_op(i)
      );

    u_2 : MUXCY
      port map (
        DI => mult_and_op(i),
        CI => carry(i),
        S  => lut_op(i),
        O  => carry(i+1)
      );

    u_3 : XORCY
      port map (
        LI => lut_op(i),
        CI => carry(i),
        O  => op_int(i)
      );

  end generate;
  CARRY_OUT <= carry(WIDTH downto 1);
  DOUT      <= op_int;
end Virtex;
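
For anyone puzzling over the INIT value, here is a behavioral sketch of what one bit of the structure above computes. It assumes the standard LUT4 convention that INIT bit number I3*8 + I2*4 + I1*2 + I0 holds the output for that input combination. Read that way, "C66C" decodes to A when ADD_OR_SUB is low and A xor B xor DO_SUB when it is high. The entity name add_sub_bit_model is made up for this sketch; it is a reading aid, not part of risc5x.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical behavioral model of one bit position of ADD_SUB.
entity add_sub_bit_model is
  port (
    ADD_OR_SUB : in  std_logic;  -- LUT4 I0 / MULT_AND I0
    A          : in  std_logic;  -- LUT4 I1 / MULT_AND I1
    B          : in  std_logic;  -- LUT4 I2
    DO_SUB     : in  std_logic;  -- LUT4 I3
    CI         : in  std_logic;  -- carry from the bit below
    SUM        : out std_logic;  -- corresponds to op_int(i)
    CO         : out std_logic   -- corresponds to carry(i+1)
  );
end entity add_sub_bit_model;

architecture behav of add_sub_bit_model is
  signal p : std_logic;  -- LUT4 output (propagate), lut_op(i)
  signal g : std_logic;  -- MULT_AND output, mult_and_op(i)
begin
  -- LUT4 with INIT "C66C": B is effectively inverted when subtracting.
  p <= (A xor B xor DO_SUB) when ADD_OR_SUB = '1' else A;

  -- MULT_AND: LO = I0 and I1.
  g <= ADD_OR_SUB and A;

  -- MUXCY: pass the incoming carry when p = '1', otherwise take DI.
  CO <= CI when p = '1' else g;

  -- XORCY: sum bit.
  SUM <= p xor CI;
end architecture behav;
```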

<SNIP>

Dmitri Katchalov

Jul 27, 2002, 10:54:01 AM

Thank you guys for your valuable comments.

Dmitri

dmi...@mailandnews.com (Dmitri Katchalov) wrote in message news:<3db7c986.02072...@posting.google.com>...

> I'm trying to synthesize a simple ALU.

rickman

Jul 27, 2002, 12:01:50 PM

A couple of comments for points that were not fully addressed.

Yes, but this will be somewhat compiler-dependent.

> * I'm trying to describe purely combinatorial logic here. The output
> is supposed to be the same fixed boolean function of inputs no matter
> how it is described. Why such big variations (more than 2 times the area)?
> Is this a problem with the tool or they all like that?

As you said, "any real programmer should write at least one OS and
compiler". Try writing code to translate this stuff into hardware, or even
just figuring out how to do it. Not so easy. Compilers are simple
in comparison.

> * Should I be tweaking XST settings instead? Is there a magic setting
> like "Do what I mean not what I say" :)

No, issues with carry and the like are not easy since different chips
deal with them differently. So the compiler needs to be able to map to
different architectures.

> * Xilinx lib has "8bit adder carry out" but it doesn't seem to have
> "8bit subtractor borrow out". Is this right?

Don't know, but as you found, an adder and a subtractor are the same
thing with inverters on one input and the carries.

> * How do I get the half-carry bit out of the 8bit adder? I guess I can
> instantiate/infer two separate 4bit adders. Is there a better way?

The last time I tried to get a carry out of the middle of a carry chain,
I found that the Xilinx architecture does not support that without
breaking the carry chain. So it will need to be done with two 4 bit
adders, as you say.

> * What's the story with IEEE.std_logic.SIGNED vs .UNSIGNED? I heard that
> they are mutually exclusive and math operations produce different
> results depending on which one is in use. Webpack automatically inserts
> IEEE.STD_LOGIC_UNSIGNED.ALL at the beginning of every VHDL source it
> creates. Should I always use UNSIGNED?

Both of these packages are NOT IEEE standards. They are Synopsys
proprietary, IIRC. So avoid using them and use the "numeric_std" package
instead.

use IEEE.NUMERIC_STD.all;

> * Is there a decent on-line reference for all those IEEE.* libraries?
> I've found several good VHDL tutorials but none of them covers
> std_logic in details.

If you find one, let us all know. Type conversion is the only thing I
have trouble with in VHDL. I recently worked with some Verilog people
and could not convince them that VHDL was even viable because of all the
issues created by strong typing. Verilog is much like C and lets you do
anything you want, no matter how stupid or wrong. But then in a year of
coding, I only made two mistakes from that and it was the same mistake
twice! Sometimes I am a little slow to learn :)
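
Since type conversion keeps coming up, here is a small sketch of the numeric_std conversions between std_logic_vector, unsigned and integer. The comments note the rough equivalents of the Synopsys-package calls used earlier in the thread. The entity conv_demo and its port names are made up for illustration.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.all;
use IEEE.NUMERIC_STD.all;

-- Hypothetical demo entity: passes a byte through three representations.
entity conv_demo is
  port (
    slv_in  : in  std_logic_vector(7 downto 0);
    slv_out : out std_logic_vector(7 downto 0)
  );
end entity conv_demo;

architecture rtl of conv_demo is
  signal u : unsigned(7 downto 0);
  signal n : natural range 0 to 255;
begin
  -- std_logic_vector -> unsigned: a type conversion (same bits, new type).
  u <= unsigned(slv_in);

  -- unsigned -> integer: a conversion function.
  -- to_integer(unsigned(x)) plays roughly the role of conv_integer(x)
  -- from std_logic_unsigned.
  n <= to_integer(u);

  -- integer -> unsigned -> std_logic_vector.
  -- std_logic_vector(to_unsigned(i, w)) plays roughly the role of
  -- conv_std_logic_vector(i, w) from std_logic_arith.
  slv_out <= std_logic_vector(to_unsigned(n, 8));
end architecture rtl;
```

The rule of thumb is a type conversion (the type name used like a function) between the closely related array types, and to_integer / to_unsigned / to_signed whenever an integer is involved.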

--

Rick "rickman" Collins

rick.c...@XYarius.com
Ignore the reply address. To email me use the above address with the XY
removed.

Arius - A Signal Processing Solutions Company
Specializing in DSP and FPGA design URL http://www.arius.com
4 King Ave 301-682-7772 Voice
Frederick, MD 21701-3110 301-682-7666 FAX

Ray Andraka

Jul 27, 2002, 2:57:41 PM

rickman wrote:

> A couple of comments for points that were not fully addressed.
>
>

I think there is an adder/subtractor in the coregen, if you insist on using a
generated core.

>
> > * Xilinx lib has "8bit adder carry out" but it doesn't seem to have
> > "8bit subtractor borrow out". Is this right?
>
> Don't know, but as you found, an adder and a subtractor are the same
> thing with inverters on one input and the carries.
>
> > * How do I get the half-carry bit out of the 8bit adder? I guess I can
> > instantiate/infer two separate 4bit adders. Is there a better way?

It can be done, but it takes a little mind-bending. Basically, you need to turn
your 8 bit adder into a 9 bit one with bit 4 being a dummy so that you can pull out
the carry out through the bit. It takes a bit of caressing the tools to make them
infer it.
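
A sketch of the dummy-bit idea for a plain 8-bit add. The dummy position gets operand bits '1' and '0' so that it passes the incoming carry on unchanged; its sum bit is then the inverse of the half carry. Entity and signal names are made up, and whether a given synthesizer maps this onto a single carry chain is not guaranteed.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.all;
use IEEE.NUMERIC_STD.all;

-- Hypothetical 8-bit adder exposing the half carry via a dummy bit.
entity add8_dc is
  port (
    A, B   : in  std_logic_vector(7 downto 0);
    Y      : out std_logic_vector(7 downto 0);
    dc_out : out std_logic   -- carry out of bit 3 (half carry)
  );
end entity add8_dc;

architecture rtl of add8_dc is
  signal x, y_in, s : unsigned(8 downto 0);
begin
  -- Insert the dummy at position 4: '1' on one operand, '0' on the other.
  -- 1 + 0 + c gives sum bit (not c) and carry-out c, so the carry from
  -- the low nibble still reaches the high nibble.
  x    <= unsigned(A(7 downto 4)) & '1' & unsigned(A(3 downto 0));
  y_in <= unsigned(B(7 downto 4)) & '0' & unsigned(B(3 downto 0));

  s <= x + y_in;

  -- Drop the dummy bit from the result and recover the half carry from it.
  Y      <= std_logic_vector(s(8 downto 5) & s(3 downto 0));
  dc_out <= not s(4);
end architecture rtl;
```

A carry-in and a carry-out would each need one more bit on the operands; they are left out here to keep the dummy-bit trick visible.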

Eric Smith

Jul 27, 2002, 5:37:53 PM

> * How do I get the half-carry bit out of the 8bit adder? I guess I can
> instantiate/infer two separate 4bit adders. Is there a better way?

Ray Andraka <r...@andraka.com> writes:
> It can be done, but it takes a little mind-bending. Basically, you
> need to turn your 8 bit adder into a 9 bit one with bit 4 being a
> dummy so that you can pull out the carry out through the bit. It
> takes a bit of caressing the tools to make them infer it.

Is there any advantage to doing that rather than two four-bit adders?
For instance, with two four-bit adders, does the synthesizer not
recognize that it can continue the carry chain between them? Or
does the FPGA not allow you to tap the carry from intermediate stages
of the chain?

Ray Andraka

Jul 28, 2002, 10:20:38 AM

In order to tap the carry chain you need to add an extra bit in the carry
chain. The synthesis tools won't do that for you, and in fact will not
infer a carry chain for less than about 7 bits. Using two 4-bit adders
you incur the delay to get off and then onto the second chain, whereas with a
single chain you only incur ~100ps. With two 4-bit adders, it is likely not
your worst-case path anyway, so for the sake of simplicity, readability and
maintainability of the code, it is probably better to just infer them as
separate adders. My point was that what you asked about could be done,
but it is not done automatically by the tools and it takes a bit of
finagling to make it work.

Eric Smith wrote:

--

Dmitri Katchalov

Jul 29, 2002, 4:42:35 AM

Thanks again everyone.

Using your suggestions I've managed to implement PIC-style
ADD/SUB/INC/DEC with carry and half-carry out in just 4 slices, see code below.
I'm not sure about the polarity of the borrow bit though.

Synthesis infers 2 5-bit adders, later optimised into 4-bit
adders with carry in/out. P&R places them in one column one immediately
on top of another (in an otherwise empty FPGA). I don't have sufficient
knowledge to tell from all those reports whether the carry chain
is broken or continues over. It does seem to continue over.

Here is the code, comments appreciated.

Regards,
Dmitri

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;

entity alu_adder is
  Port ( A,B:       in  std_logic_vector(7 downto 0);
         op:        in  std_logic_vector(1 downto 0);
         Y:         out std_logic_vector(7 downto 0);
         carry_out: out std_logic;
         dc_out:    out std_logic );
  constant ADD : std_logic_vector(1 downto 0) := "00";
  constant SUB : std_logic_vector(1 downto 0) := "01";
  constant DEC : std_logic_vector(1 downto 0) := "10";
  constant INC : std_logic_vector(1 downto 0) := "11";
end entity alu_adder;

architecture Behavioral of alu_adder is
begin
  process( A, B, op )
    variable tmp: std_logic_vector(7 downto 0);
    variable lo_nibble, hi_nibble: unsigned(5 downto 0);
    variable cin: std_logic;
  begin
    case op is
      when INC    => tmp := (others => '0'); cin := '1';
      when DEC    => tmp := (others => '1'); cin := '0';
      when SUB    => tmp := not B;           cin := '1';
      when ADD    => tmp := B;               cin := '0';
      when others => tmp := (others => '-'); cin := '-';
    end case;

    lo_nibble := unsigned('0' & A(3 downto 0) & cin ) +
                 unsigned('0' & tmp(3 downto 0) & cin );

    hi_nibble := unsigned('0' & A(7 downto 4) & lo_nibble(5) ) +
                 unsigned('0' & tmp(7 downto 4) & lo_nibble(5) );

    Y <= std_logic_vector( hi_nibble(4 downto 1) & lo_nibble(4 downto 1));
    dc_out <= lo_nibble(5);
    carry_out <= hi_nibble(5);
  end process;
end architecture Behavioral;
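
On the polarity question: with SUB computed as A + (not B) + 1, the carry out is '1' when no borrow occurred (A >= B) and '0' when one did. The PIC16 datasheets describe the C flag the same way (borrow polarity reversed), so the code above appears to match. Below is a small simulation-only sketch that checks the arithmetic on two cases; the entity name borrow_polarity_check and the helper sub9 are made up for illustration.

```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.all;
use IEEE.NUMERIC_STD.all;

-- Hypothetical self-checking testbench; it has no ports and is not synthesizable.
entity borrow_polarity_check is
end entity borrow_polarity_check;

architecture sim of borrow_polarity_check is
  -- A - B computed as A + (not B) + 1 in 9 bits, as in the SUB case above.
  function sub9(a, b : natural) return unsigned is
  begin
    return ('0' & to_unsigned(a, 8)) + ('0' & not to_unsigned(b, 8)) + 1;
  end function sub9;
begin
  process
    variable s : unsigned(8 downto 0);
  begin
    -- 5 - 3: 5 + 252 + 1 = 258, so bit 8 is set and the low byte is 2.
    s := sub9(5, 3);
    assert s(8) = '1' and s(7 downto 0) = 2
      report "expected no borrow (carry = '1') for 5 - 3" severity failure;

    -- 3 - 5: 3 + 250 + 1 = 254, so bit 8 is clear and the low byte is 254 (-2 mod 256).
    s := sub9(3, 5);
    assert s(8) = '0' and s(7 downto 0) = 254
      report "expected borrow (carry = '0') for 3 - 5" severity failure;

    wait;  -- stop the process after the checks
  end process;
end architecture sim;
```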

Ray Andraka <r...@andraka.com> wrote in message news:<3D43FE74...@andraka.com>...

Ray Andraka

Jul 29, 2002, 9:20:06 AM

It may very well be tapping it. The Virtex architecture supports tapping the carry chain
and sending it out the XB and YB outputs of the slice. I have had trouble in the past
getting the software to use that tap when the carry out to the next slice is also used,
although I don't recall if it was a Xilinx problem or a Synplify problem. If you are
using 4.x there is a good possibility that it is properly tapping the carry chain. If you
are using the paid version of the Xilinx software, you can open the FPGA editor and
examine the results directly. If not, you can tell from the timing report if you have it
show all timing and then you find the carry chain in question.

Dmitri Katchalov wrote:

--

SPAM@freeuk.com MikeJ

Jul 29, 2002, 5:28:49 PM

Paranoia !
I just thought I would check again that the risc5x code I posted earlier works
correctly.
Looking with FPGA Editor, it does build a single 8-LUT-long carry chain and
the 4.2i tools correctly strip off the DC carry bit, in my case using the
YB slice output.
Phew !

Mikej

"Ray Andraka" <r...@andraka.com> wrote in message

news:3D4541C8...@andraka.com...

Ray Andraka

Jul 29, 2002, 8:44:49 PM

Yeah, it used to be a problem. I hadn't looked at the tools' output for that
case in a while. I knew the Virtex could tap the chain, but for a long time the
tools wouldn't let you do it.