RAM based Accumulator?

Michael Holmstrom

Mar 29, 2001, 8:54:58 PM

The problem is to implement a RAM-based accumulator in VHDL. The basic idea
is to have a RAM that gets fed data at a certain rate, and each time new
data appears at the RAM input, it gets added to the previous value stored in
the RAM at that specific address. So far so good; it gets tricky when the
requirement is that the data is clocked in at the same rate as the clock
of the process. What would be the most sensible approach to get this to
work with this setup? Can it be solved by splitting things up into several
parallel processes, or is there a better way?

Thanks for your help!

/Michael

Kent Orthner

Mar 29, 2001, 9:13:09 PM

Well, it depends on your RAM.

For example, in the most common FPGAs, the RAM components have separate
pins for the inputs and outputs. This means that as the data flows out
of the RAM on the output pins, you can feed it back to the input pins.

Then it becomes as simple as:
RamDataIn <= RamDataOut + NewData;

You control the address you want to accumulate to, and the write strobe,
and you're set.
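
A minimal sketch of that feedback path, assuming a RAM with an asynchronous (combinational) read port such as LUT-based distributed RAM. The entity name `ram_accum_async`, its port names and the generic widths are made up for illustration, and whether a given tool maps this onto RAM primitives is something to check in its synthesis report:

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity: names, ports and widths are illustrative assumptions.
entity ram_accum_async is
  generic (
    ADDR_WIDTH : positive := 4;
    DATA_WIDTH : positive := 16
  );
  port (
    clk      : in  std_logic;
    we       : in  std_logic;                        -- write strobe
    addr     : in  unsigned(ADDR_WIDTH-1 downto 0);  -- location to accumulate into
    new_data : in  unsigned(DATA_WIDTH-1 downto 0);
    sum      : out unsigned(DATA_WIDTH-1 downto 0)   -- value being written back
  );
end entity ram_accum_async;

architecture rtl of ram_accum_async is
  type ram_t is array (0 to 2**ADDR_WIDTH-1) of unsigned(DATA_WIDTH-1 downto 0);
  signal ram          : ram_t := (others => (others => '0'));
  signal ram_data_out : unsigned(DATA_WIDTH-1 downto 0);
  signal ram_data_in  : unsigned(DATA_WIDTH-1 downto 0);
begin
  -- Asynchronous read: the stored value appears in the same cycle as addr.
  ram_data_out <= ram(to_integer(addr));

  -- The feedback adder. Unsigned addition wraps on overflow.
  ram_data_in <= ram_data_out + new_data;
  sum         <= ram_data_in;

  -- Synchronous write of the updated total back to the same address.
  process (clk)
  begin
    if rising_edge(clk) then
      if we = '1' then
        ram(to_integer(addr)) <= ram_data_in;
      end if;
    end if;
  end process;
end architecture rtl;
```

The read, the add and the write setup all have to fit in one clock period here, so the adder width and the RAM access time set the achievable clock rate.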

If you're using a RAM that has registered outputs, such as Xilinx's BlockRAM,
then you don't actually get the data out of the RAM until the next clock cycle.
In this case, you'll need to use a dual-port RAM: port A reads from addr(t)
while port B writes the updated sum back to addr(t-1).

If you're using a normal RAM chip that can do one operation per clock cycle and
has a shared read/write bus, then I don't see how it can be done.

Hope this helps,
-Kent

Hamish Moffatt VK3SB

Mar 30, 2001, 4:10:24 AM

Kent Orthner <kort...@hotmail.nospam.com> wrote:
> If you're using a RAM that has registered outputs, such as Xilinx's BlockRAM,

The block RAM has registered address inputs, with asynchronous outputs.

To quote the datasheet:
"The read address is registered on the read port clock edge
and data appears on the output after the RAM access time."

(It would be nice if it DID have registered outputs, as the
outputs are quite slow.)

The effect is the same in this application though. Some sort
of banking is probably needed if data must be accumulated on
every cycle.

Hamish
--
Hamish Moffatt VK3SB <ham...@debian.org> <ham...@cloud.net.au>

Kent Orthner

Mar 30, 2001, 6:33:51 AM

> Kent Orthner <kort...@hotmail.nospam.com> wrote:
> > If you're using a RAM that has registered outputs, such as Xilinx's BlockRAM,

ham...@cloud.net.au (Hamish Moffatt VK3SB) writes:
> The block RAM has registered address inputs, with asynchronous outputs.

Good point. My mistake.

> To quote the datasheet:
> "The read address is registered on the read port clock edge
> and data appears on the output after the RAM access time."
>
> (It would be nice if it DID have registered outputs, as the
> outputs are quite slow.)

I disagree! If you want registered outputs, you can register
them yourself. I'm glad we're not forced to use registered O/P's.

> The effect is the same in this application though. Some sort
> of banking is probably needed if data must be accumulated on
> every cycle.

Well, if BlockRAM is being used, the dual-port capability can be
used: one port to read, the other to update.

-Kent

Hamish Moffatt VK3SB

Mar 30, 2001, 8:01:49 PM

Kent Orthner <kort...@hotmail.nospam.com> wrote:
> I disagree! If you want registered outputs, you can register
> them yourself. I'm glad we're not forced to use registered O/P's.

Optional registered outputs, then. In high speed designs the output
registers have to be placed quite close to the block RAMs to meet
timing. I've had to use LOCs for this in the past.

> Well, if BlockRAM is being used, the dual-port capability can be
> used: one port to read, the other to update.

As long as you don't need to update the same location as you're
reading.
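
A sketch of how the read/update scheme can cope with back-to-back hits on the same address. It assumes a RAM whose read address is registered (data one cycle later) and adds a one-stage bypass register, which is a technique swapped in here for illustration rather than something taken from the posts above. The entity name `ram_accum_bypass`, the `valid` input and all other names and widths are hypothetical:

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity: names, ports and widths are illustrative assumptions.
entity ram_accum_bypass is
  generic (
    ADDR_WIDTH : positive := 8;
    DATA_WIDTH : positive := 16
  );
  port (
    clk      : in  std_logic;
    valid    : in  std_logic;                        -- '1' when addr/new_data carry a sample
    addr     : in  unsigned(ADDR_WIDTH-1 downto 0);
    new_data : in  unsigned(DATA_WIDTH-1 downto 0);
    acc      : out unsigned(DATA_WIDTH-1 downto 0)   -- updated total, one cycle after the sample
  );
end entity ram_accum_bypass;

architecture rtl of ram_accum_bypass is
  type ram_t is array (0 to 2**ADDR_WIDTH-1) of unsigned(DATA_WIDTH-1 downto 0);
  signal ram      : ram_t := (others => (others => '0'));
  signal rd_data  : unsigned(DATA_WIDTH-1 downto 0) := (others => '0');
  signal last_sum : unsigned(DATA_WIDTH-1 downto 0) := (others => '0');
  signal data_d   : unsigned(DATA_WIDTH-1 downto 0) := (others => '0');
  signal addr_d   : unsigned(ADDR_WIDTH-1 downto 0) := (others => '0');
  signal valid_d  : std_logic := '0';
  signal bypass   : std_logic := '0';
  signal sum      : unsigned(DATA_WIDTH-1 downto 0);
begin
  -- Use the sum written on the previous edge when the RAM read raced that write.
  sum <= last_sum + data_d when bypass = '1' else rd_data + data_d;
  acc <= sum;

  process (clk)
  begin
    if rising_edge(clk) then
      -- Read port: registered read of addr(t); data is usable next cycle.
      rd_data <= ram(to_integer(addr));
      addr_d  <= addr;
      data_d  <= new_data;
      valid_d <= valid;

      -- Write port: store the updated total at addr(t-1).
      if valid_d = '1' then
        ram(to_integer(addr_d)) <= sum;
        last_sum                <= sum;
      end if;

      -- The read issued on this edge targets the location written on this edge,
      -- so its result is stale and must be replaced by last_sum next cycle.
      if valid_d = '1' and addr = addr_d then
        bypass <= '1';
      else
        bypass <= '0';
      end if;
    end if;
  end process;
end architecture rtl;
```

One bypass stage is enough in this arrangement because the write lands exactly one clock after the read; a sample two cycles later reads the already updated location. Whatever the RAM returns for a simultaneous read and write of one location is ignored, so the sketch does not depend on a particular read-during-write behaviour.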

Ray Andraka

Mar 30, 2001, 9:58:26 PM

Hamish Moffatt VK3SB wrote:
>
> Kent Orthner <kort...@hotmail.nospam.com> wrote:
> > I disagree! If you want registered outputs, you can register
> > them yourself. I'm glad we're not forced to use registered O/P's.
>
> Optional registered outputs, then. In high speed designs the output
> registers have to be placed quite close to the block RAMs to meet
> timing. I've had to use LOCs for this in the past.

Yeah, and even when the enables (those are the worst ones) and data are
registered in the CLBs immediately adjacent to the BRAM, the BRAM interface is
typically the limiting factor on how fast my designs can be clocked. I shudder
to think how slow they would be without the 'registered' read address.

The other long pole in Virtex is carry chains with a high fan-in control such
as add/subtract (boy, I sure miss having the LUT follow the carry chain instead
of contributing to its delay). It looks like the same is going to be true in
Virtex-II. Just look at the BRAM and multiplier numbers. Lots slower than the
fabric can go :-(. Still, the V2 is a neat part with lots of nice features.
Even if I can't clock them as fast as I'd like, I can still get an awful lot of
processing in there.

>
> > Well, if BlockRAM is being used, the dual-port capability can be
> > used: one port to read, the other to update.
>
> As long as you don't need to update the same location as you're
> reading.
>
> Hamish
> --
> Hamish Moffatt VK3SB <ham...@debian.org> <ham...@cloud.net.au>

--
-Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930 Fax 401/884-7950
email r...@andraka.com
http://www.andraka.com

Hamish Moffatt VK3SB

Mar 31, 2001, 8:09:41 PM

Ray Andraka <r...@andraka.com> wrote:
> Yeah, and even when the enables (those are the worst ones) and data are
> registered in the CLBs immediately adjacent to the BRAM, the BRAM interface is
> typically the limiting factor on how fast my designs can be clocked. I shudder
> to think how slow they would be without the 'registered' read address.

Well, I heard that the block RAM outputs go straight to the multiplier
inputs without registers in between. Somehow 150+ MHz operating
speeds don't seem very likely in this configuration.

Ray Andraka

Mar 31, 2001, 11:38:00 PM

It's only about 140 MHz. The multiplier speeds are dependent on the width of
the multiplier. If you look closely, the numbers for the 18 bit unpipelined
multiplier are only about 140 MHz. Supposedly the pipeline register in the
multiplier will bring the speed up, but unfortunately it is neither documented
in the datasheet, nor supported by the current software. Either way, you can
still build a faster (but deeply pipelined) multiplier in the fabric. It is not
at all clear what kind of performance numbers you'll get with the BRAM feeding
the multiplier as is intended.

--