RMW example for BlockRAM?


Gabor Greif

Nov 27, 2015, 2:18:45 PM
to CLaSH - Hardware Description Language
Hi all,

I am working on a content-addressed memory that is backed by block-RAM. For the bucket updates I need a read-modify-write cycle. I'd do it with a delayed signal somehow, but if there is something to steal out there I'd love to see it...

Any hints?

Thanks,

Gabor

Bogdan Penkovskyi

Nov 28, 2015, 6:00:12 AM
to CLaSH - Hardware Description Language
Hi Gabor!

Here is my quick implementation of a FIFO memory using block RAM. I believe it can be optimized,
so I'd appreciate feedback from more experienced designers.

Cheers,

Bogdan


import CLaSH.Prelude

-- | A cyclic counter which increases on `en` signal
counterEnT :: (Eq t, Num t) => t -> t -> Bool -> (t, t)
counterEnT maxVal cnt en = (cnt', cnt)
  where cnt' | en && maxVal - 1 == cnt = 0
             | en = cnt + 1
             | otherwise = cnt

counterEn :: Unsigned 9 -> Signal Bool -> Signal (Unsigned 9)
counterEn maxVal = mealyB (counterEnT maxVal) 0

-- | Fifo memory using block RAM with independent read and write
fifo :: (KnownNat n, KnownNat n1)
      => SNat n
      -> (Signal (BitVector n1), Signal Bool, Signal Bool)
      -> (Signal (BitVector n1), Signal Bool, Signal Bool)
fifo depth (x, write, read) = (bram j i we x, full, empty)
  where
    bram = blockRam (replicate depth 0)
    full = d .==. signal max_d
    empty = d .==. signal 0

    d :: Signal (Unsigned 9)
    d = register 0 d'
    d' = delta `fmap` (bundle (write, read, d))

    depth' = fromIntegral $ natVal depth

    i :: Signal (Unsigned 9)
    i = (counterEn depth') (read .&&. (not1 empty))

    j :: Signal (Unsigned 9)
    j = (counterEn depth') we

    we = write .&&. can_write
    can_write = read .||. (d .<. signal max_d)

    delta (w',r',d') | w' && not r' && d' < max_d = d' + 1
                     | (not w') && r' && d' > 0 = d' - 1
                     | otherwise = d'
    max_d = fromIntegral (natVal depth)
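
To make the fill-level bookkeeping easier to review, here is a plain-Haskell model of `delta` that runs without CLaSH. It is only a sketch: `Req`, `fillLevels` and `writeEnable` are names made up for this illustration, and `Int` stands in for `Unsigned 9`.

```haskell
-- Plain-Haskell model of the FIFO's fill-level update (no CLaSH needed).
-- `Req`, `fillLevels` and `writeEnable` are hypothetical helper names.
type Req = (Bool, Bool)  -- (write request, read request)

-- Same guards as `delta` in the FIFO above; `maxD` plays the role of `max_d`.
delta :: Int -> Int -> Req -> Int
delta maxD d (w, r)
  | w && not r && d < maxD = d + 1
  | not w && r && d > 0    = d - 1
  | otherwise              = d

-- Fill level after each cycle, starting from an empty FIFO.
fillLevels :: Int -> [Req] -> [Int]
fillLevels maxD = drop 1 . scanl (delta maxD) 0

-- Write enable as computed in the FIFO: write .&&. (read .||. d .<. max_d).
writeEnable :: Int -> Int -> Req -> Bool
writeEnable maxD d (w, r) = w && (r || d < maxD)
```

For a depth of 2, `fillLevels 2 [(True,False),(True,False),(True,False),(False,True),(True,True)]` gives `[1,2,2,1,1]`: the third write is refused because the FIFO is full. One corner that may deserve a second look, if I read the guards correctly: with an empty FIFO and both requests active, `writeEnable 2 0 (True,True)` is `True` while `delta 2 0 (True,True)` stays at `0`, so a word would be stored without the fill level moving.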

Gabor Greif

Dec 11, 2015, 7:15:26 PM
to CLaSH - Hardware Description Language
Thanks for the answers so far! I was also having a look at the BlockRAM processor example and I believe I understand the issues now.

I have prepared a minimal example of a histogram, ready for content/style review: https://github.com/ggreif/clash-ground/blob/master/Histogram.hs

The obvious gotcha is that I do not (yet) handle the case of equal read/write addresses, so the behavior is that of read-after-write, and I miss counts.

I'll try to fix that next.

Cheers,

Gabor

Gabor Greif

Dec 18, 2015, 4:51:50 AM
to CLaSH - Hardware Description Language
Replying to my own post...

I am observing a strange behaviour in the Haskell side of BlockRam (e.g. in the precursor version https://github.com/ggreif/clash-ground/blob/f9bdfded1ae8ecf0a4962d444990842acbddc0c1/Histogram.hs)

When defining
> testInput' :: Signal (Unsigned 7)
> testInput' = stimuliGenerator $ 0 :> 0 :> 0 :> 0 :> 0 :> 0 :> 0 :> 0 :> 0 :> 0 :> 0 :> 0 :> 0 :> Nil

and sampling it like:

*Main> L.drop 1 (sampleN 10 (topEntity testInput'))
[0,0,1,1,2,2,3,3,4]

These repetitions must happen because the read and write addresses coincide. But it surely looks like a bug.
A Xilinx block RAM expert told me that a write and a read to the same address are deterministically possible:
the write goes before the read, and the read fetches what the write has changed.

Any hints?

Thanks,

   Gabor

Christiaan Baaij

Dec 19, 2015, 5:44:23 AM
to clash-l...@googlegroups.com
Hi Gabor,

It's very likely that Xilinx indeed natively supports write-before-read, or new-data-read. I think it's the same for Altera.

However, when I implemented the blockRam primitive, I followed the Altera HDL coding guide, which recommends read-before-write (old-data-read).

We can make new primitives, which implement write-before-read.
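
The doubled values in the sample can be reproduced with a small plain-Haskell model of a read-before-write RAM driven by a read-modify-write loop. This is a sketch under assumptions: the loop below is my guess at the shape of the histogram (read the bin now, write back the incremented value one cycle later), not a copy of the linked Histogram.hs, and `Mem`, `writeMem`, `stepOldData` and `naiveHistogram` are names invented here.

```haskell
-- Plain-Haskell model (no CLaSH): memory as a function from address to value.
type Mem = Int -> Int

-- Apply an optional write (address, value) to the memory.
writeMem :: Maybe (Int, Int) -> Mem -> Mem
writeMem Nothing       m = m
writeMem (Just (a, v)) m = \x -> if x == a then v else m x

-- One clock edge of an "old data" (read-before-write) RAM: the value that
-- appears on the output next cycle is read from the memory as it was
-- before this cycle's write lands.
stepOldData :: Mem -> Maybe (Int, Int) -> Int -> (Mem, Int)
stepOldData m wr rdAddr = (writeMem wr m, m rdAddr)

-- Assumed RMW loop: each cycle reads the bin for the current input and
-- writes (value read last cycle + 1) back to last cycle's bin.
-- Returns the RAM output after each clock edge.
naiveHistogram :: [Int] -> [Int]
naiveHistogram = go (const 0) Nothing 0
  where
    go _ _ _ [] = []
    go m prev out (b : bs) = out' : go m' (Just b) out' bs
      where
        wr         = fmap (\a -> (a, out + 1)) prev
        (m', out') = stepOldData m wr b
```

With nine consecutive hits on bin 0, `naiveHistogram (replicate 9 0)` yields `[0,0,1,1,2,2,3,3,4]`: every second increment is computed from stale data. With alternating bins, `naiveHistogram [0,1,0,1]` yields `[0,0,1,1]`, so no counts are lost when consecutive addresses differ.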

-- Christiaan
--
You received this message because you are subscribed to the Google Groups "CLaSH - Hardware Description Language" group.
To unsubscribe from this group and stop receiving emails from it, send an email to clash-languag...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Gabor Greif

Dec 20, 2015, 2:32:06 PM
to CLaSH - Hardware Description Language

Altera seems to support new data mode at least since 2011 (https://www.altera.com/en_US/pdfs/literature/hb/cyclone-iv/cyiv-51003.pdf page 16).

Xilinx at least since 2005 (http://www.xilinx.com/support/documentation/ip_documentation/sp_block_mem.pdf)

Should I open an issue for this? (I can also try to implement it.)

Cheers,

    Gabor

Christiaan Baaij

Dec 21, 2015, 4:46:51 AM
to clash-l...@googlegroups.com
Hi Gabor,

I looked at the datasheets of:
- The Altera Cyclone IV series
(https://www.altera.com/en_US/pdfs/literature/hb/cyclone-iv/cyiv-51003.pdf)
- The Xilinx 7 series
(http://www.xilinx.com/support/documentation/user_guides/ug473_7Series_Memory_Resources.pdf)

Now, technically, the CLaSH block RAM primitive is a:
- Dual port block RAM
- Operating in synchronous mode

Dual port means that we have two address lines, which in our case
are a read address and a write address. Synchronous mode means that
both the read and write address lines are synchronised to the same clock.

Now, at the bottom of page 16 of the Cyclone IV memory block
documentation, in the section "Mixed-Port Read-During-Write Mode", which
is what we have, it says:

"This mode applies to a RAM in simple or true dual-port mode, which has
one port reading and the other port writing to the same address location
with the same clock. In this mode, you also have two output choices: Old
Data mode or Don't Care mode. In Old Data mode, a read-during-write
operation to different ports causes the RAM outputs to reflect the old
data at that address location. In Don't Care mode, the same operation
results in a “Don't Care” or unknown value on the RAM outputs"

As you can see, the "New Data" write mode is not an option in dual-port
operation for the Altera Cyclone IV. Now, moving to the latest Xilinx
7-series FPGAs, on pages 18 and 19 of the memory resources document, in
the section "Synchronous Clocking", it says:

"When one port performs a write operation, the write operation succeeds;
the other port can reliably read data from the same location if the
write port is in READ_FIRST mode. DATA_OUT on both ports then reflects
the previously stored data.

If the write port is in either WRITE_FIRST or in NO_CHANGE mode, then
the DATA_OUT on the read port would become invalid (unreliable). The
mode setting of the read-port does not affect this operation."

So, although WRITE_FIRST is an available option, the output in
synchronous dual-port mode on a conflicting read port would be
invalid/unreliable. So even though WRITE_FIRST is available, READ_FIRST
seems like the sensible choice.

So for both Xilinx and Altera devices, the "Old Data" or READ_FIRST mode
is the only available/sensible choice for our synchronous dual-port
blockRam primitive. We could add single-port blockRam primitives to
CLaSH, which has a single address line, and read and write enable
signals. For single-port block RAMs, "New Data" or WRITE_FIRST is
possible/sensible.

Please consult your resident Xilinx FPGA expert first, though.
Perhaps I'm misunderstanding the Xilinx documentation, and a reliable
WRITE_FIRST, dual-port, synchronous block RAM is possible on Xilinx
devices. In that case, I'm also fine with adding such a primitive.
However, I would then like it to live in a new "CLaSH.Xilinx" module
hierarchy.

Cheers,

Christiaan



Dan Brown

Dec 24, 2015, 10:25:57 AM
to CLaSH - Hardware Description Language
I've recently been wrestling with issues related to this at work, but in the single-port case.  In case it helps anyone, let me respond to this:

> We could add single-port blockRam primitives to
> CLaSH, which has a single address line, and read and write enable
> signals. For single-port block RAMs, "New Data" or WRITE_FIRST is
> possible/sensible.

In fact, for Altera the situation is pretty horrible, at least for the devices I'm using.  From the Stratix V handbook, in Embedded Memory Features, table 1:

Features                     | M20K                                            | MLAB
Same-port read-during-write  | Output ports set to "new data".                 | Output ports set to "don't care".
Mixed-port read-during-write | Output ports set to "old data" or "don't care". | Output ports set to "old data", "new data", "don't care", or "constrained don't care".

The Stratix V devices have two types of embedded BRAM.  We can see that in the single-port case, one type of memory block supports only "New Data", the other only "Don't Care".  Which means that "Don't Care" is the only reasonable behavior to assume, if you want a generic memory that the tools can instantiate to either memory block!

This also applies to simultaneous read + write on the same port of a dual-port device, which is the mode I encountered at work.
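
The portability argument can be written down as a tiny plain-Haskell model. It is only an illustration: `Policy`, `collision` and `portable` are names invented here, and `Nothing` stands for an unknown RAM output.

```haskell
-- Read-during-write policies named in the vendor documents quoted above.
data Policy = OldData | NewData | DontCare
  deriving (Eq, Show)

-- Value seen on the read output when the same address is written in the
-- same cycle. Nothing models an unknown/unreliable output.
collision :: Policy -> a -> a -> Maybe a
collision OldData  old _   = Just old
collision NewData  _   new = Just new
collision DontCare _   _   = Nothing

-- A collision is portable across a set of candidate RAM blocks only if
-- every block's policy gives the same, known answer.
portable :: Eq a => [Policy] -> a -> a -> Bool
portable ps old new = case map (\p -> collision p old new) ps of
  []            -> True
  (Nothing : _) -> False
  (r : rs)      -> all (== r) rs
```

Taking the same-port row of the table, `portable [NewData, DontCare] 0 1` is `False`, which is the point above: a generic memory that may land in either block type cannot rely on the collision result. Only the degenerate case where old and new data coincide, such as `portable [OldData, NewData] 7 7`, comes out `True`.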

   -Dan

Peter Lebbing

Dec 24, 2015, 1:13:00 PM
to clash-l...@googlegroups.com
On 24/12/15 16:25, Dan Brown wrote:
> Which means that "Don't Care" is the only reasonable behavior to
> assume, if you want a generic memory that the tools can instantiate
> to either memory block!

Wouldn't it be possible to instantiate a "New data" behaviour with a
memory in any mode you want and some surrounding logic that provides
a bypass? If you see a read-during-write occurring, disregard the
data output of the RAM and feed the data directly from the write input
to the read output.

This is a waste of resources if the application doesn't care, but can be
instantiated regardless of the mode of the actual RAM.

My 2 cents,

Peter.

--
I use the GNU Privacy Guard (GnuPG) in combination with Enigmail.
You can send me encrypted mail if you want some privacy.
My key is available at <http://digitalbrains.com/2012/openpgp-key-peter>

Dan Brown

Dec 24, 2015, 1:15:35 PM
to clash-l...@googlegroups.com
Excellent point.  "New Data" mode can *always* be implemented, at the cost of logic and performance (maximum clock rate).


Gabor Greif

Dec 29, 2015, 6:13:54 AM
to CLaSH - Hardware Description Language
Hi Christiaan, Peter, Can!

Thanks for these insights. They really made me think, and prompted me to package this up as an abstraction...

I have implemented Peter's suggestion like this:

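> -- Wrap a read-before-write RAM so that reading an address while it is
> -- being written yields the new data (bypass around the RAM).
> readNew :: (Signal (Unsigned n) -> Signal (Unsigned n) -> Signal Bool -> Signal a -> Signal a)
>         -> Signal (Unsigned n) -> Signal (Unsigned n) -> Signal Bool -> Signal a -> Signal a
> readNew ram wrAddr rdAddr wrEn wrData = mux wasSame wasWritten $ ram wrAddr rdAddr wrEn wrData
>   where sameAddr = liftA2 (==) wrAddr rdAddr
>         -- both registers delay by one cycle to line up with the RAM's read latency
>         wasSame = False `register` (liftA2 (&&) wrEn sameAddr)
>         wasWritten = undefined `register` wrData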

usage is:

*Main> :t readNew (blockRam $ replicate d4 0)
readNew (blockRam $ replicate d4 0)
  :: (Num a, KnownNat n) =>
     Signal (Unsigned n)
     -> Signal (Unsigned n) -> Signal Bool -> Signal a -> Signal a

*Main> :t readNew (blockRamPow2 $ replicate d4 0)
readNew (blockRamPow2 $ replicate d4 0)
  :: Num a =>
     Signal (Unsigned 2)
     -> Signal (Unsigned 2) -> Signal Bool -> Signal a -> Signal a

Technically it would work for a 1-cycle delayed asyncRam too.
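
For anyone who wants to check the bypass without a CLaSH installation, here is a plain-Haskell model of the same idea. It is a sketch under assumptions: the record fields mirror the `wasSame` and `wasWritten` registers of `readNew`, while `St`, `step`, `output` and `bypassHistogram` are names made up here, and the read-modify-write loop is an assumed histogram, not the one from the repository.

```haskell
-- Plain-Haskell model (no CLaSH): memory as a function from address to value.
type Mem = Int -> Int

-- Apply an optional write (address, value) to the memory.
writeMem :: Maybe (Int, Int) -> Mem -> Mem
writeMem Nothing       m = m
writeMem (Just (a, v)) m = \x -> if x == a then v else m x

-- Model state: memory, the RAM's registered output, and the two bypass
-- registers corresponding to `wasSame` and `wasWritten` in `readNew`.
data St = St { mem :: Mem, ramOut :: Int, wasSame :: Bool, wasWritten :: Int }

-- Visible output: the mux in front of the RAM output.
output :: St -> Int
output s = if wasSame s then wasWritten s else ramOut s

-- One clock edge; `wr` is Nothing when write enable is low.
step :: St -> Maybe (Int, Int) -> Int -> St
step s wr rdAddr = St
  { mem        = writeMem wr (mem s)
  , ramOut     = mem s rdAddr                         -- old-data read
  , wasSame    = maybe False ((== rdAddr) . fst) wr   -- enabled write, same address
  , wasWritten = maybe (wasWritten s) snd wr          -- only observed when wasSame
  }

-- Assumed histogram loop: read the current bin, write back (last output + 1)
-- to the bin that was read in the previous cycle.
bypassHistogram :: [Int] -> [Int]
bypassHistogram = go (St (const 0) 0 False 0) Nothing
  where
    go _ _ [] = []
    go s prev (b : bs) = output s' : go s' (Just b) bs
      where
        wr = fmap (\a -> (a, output s + 1)) prev
        s' = step s wr b
```

In this model `bypassHistogram (replicate 5 0)` gives `[0,1,2,3,4]`, so repeated hits on one bin no longer lose counts, and `bypassHistogram [0,1,0,1]` gives `[0,0,1,1]`, the same as without the bypass.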

Christiaan, do you think it would make sense to include `readNew` into the CLaSH.Prelude?

Cheers,

    Gabor

Gabor Greif

Dec 29, 2015, 8:03:58 AM
to CLaSH - Hardware Description Language
Some more thoughts. Maybe the platform toolchains (Vivado, Quartus?) will recognize specialised uses of the read-after-write pattern (by peeking through the output logic) and automatically optimize to the HW's abilities?

Cheers,

    Gabor


On Tuesday, December 29, 2015 at 12:13:54 UTC+1, Gabor Greif wrote:
> Hi Christiaan, Peter, Can!

Of course I mean "Dan". Sorry for the typo.
 

Christiaan Baaij

Dec 30, 2015, 6:28:30 AM
to clash-l...@googlegroups.com
Yes, it makes sense to add it to the CLaSH Prelude.
Especially given that both Xilinx and Altera recommend implementing read-write conflict resolution in logic anyway (i.e. not depending on the read-write conflict behaviour of the RAM itself).
Perhaps you want to submit a patch/pull request?

With regards to platform toolchain magic: unlikely.
From personal experience, the VHDL/(System)Verilog has to be quite specific for the toolchains to infer (block)RAMs.
It is the main reason why the blockRam is a primitive in the first place.

-- Christiaan


Gabor Greif

Dec 30, 2015, 8:47:12 AM
to CLaSH - Hardware Description Language
On Wednesday, December 30, 2015 at 12:28:30 UTC+1, Christiaan Baaij wrote:
> Yes, it makes sense to add it to the CLaSH Prelude.
> Especially given that both Xilinx and Altera recommend implementing read-write conflict resolution in logic anyway (i.e. not depending on the read-write conflict behaviour of the RAM itself).
> Perhaps you want to submit a patch/pull request?

Christiaan,

great, here it is in its first version: https://github.com/clash-lang/clash-prelude/pull/39

Let's start bikeshedding about function/module names and docs.
 

> With regards to platform toolchain magic: unlikely.
> From personal experience, the VHDL/(System)Verilog has to be quite specific for the toolchains to infer (block)RAMs.
> It is the main reason why the blockRam is a primitive in the first place.

I see.

Thanks,

    Gabor
 

Gabor Greif

Jan 7, 2016, 7:19:55 AM
to CLaSH - Hardware Description Language
Okay, looks like the readNew adapter for blockRam will be part of the next release. Still, any comments welcome.

Another issue that popped up for me is the lack of modularity in the (block)Ram primitives. There are 3 signals controlling the write portion of the RAM. I tried to join those into a single, more expressive signal, and came up with this convenience adapter:

maybeWrite :: (Signal addr -> Signal addr -> Signal Bool -> Signal dt -> Signal dt)
           -> Signal (Maybe (addr, dt)) -> Signal addr -> Signal dt
maybeWrite ram wr rd = ram wrAddr rd wrEn wrData
  where apart (Just (addr, dt)) = (True, addr, dt)
        apart Nothing = (False, undefined, undefined)
        (wrEn, wrAddr, wrData) = unbundle (apart <$> wr)

The big upside of a RAM signature Signal (Maybe (addr, dt)) -> Signal addr -> Signal dt is that the whole power of the Applicative/Monad tools suddenly becomes available.

I stacked further functionality on top of this, such as:

condWrite :: (Signal (Maybe (addr, dt)) -> Signal addr -> Signal dt)
          -> (dt -> Maybe dt) -> Signal addr -> Signal dt

and finally the error-prone task of dealing with clock-delayed RMW became a rather trivial one-liner:

histoCond = condWrite (maybeWrite $ readNew (blockRamPow2 (repeat 0))) (Just . (+1))
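
To give a flavour of what the `Maybe (addr, dt)` write port buys, here are a few per-sample combinators in plain Haskell. They are illustrations only, not part of the proposed adapters: `WriteReq`, `together`, `arbitrate`, `rebase` and `update` are names invented for this sketch, and in a design they would be lifted over `Signal` with `fmap`/`liftA2`.

```haskell
import Control.Applicative ((<|>))
import Data.Bifunctor (first)

-- One sample of the combined write port: Nothing means "no write".
type WriteReq addr dt = Maybe (addr, dt)

-- Inverse of `apart`: rebuild a request from the three classic values.
together :: Bool -> addr -> dt -> WriteReq addr dt
together en a d = if en then Just (a, d) else Nothing

-- Two writers sharing one port; the first one wins when both are active.
arbitrate :: WriteReq addr dt -> WriteReq addr dt -> WriteReq addr dt
arbitrate = (<|>)

-- Relocate a request into a different address window.
rebase :: Num addr => addr -> WriteReq addr dt -> WriteReq addr dt
rebase base = fmap (first (+ base))

-- Conditional update in the spirit of condWrite's (dt -> Maybe dt) argument:
-- a write is produced only if the update function yields a value.
update :: (dt -> Maybe dt) -> addr -> dt -> WriteReq addr dt
update f a old = fmap (\new -> (a, new)) (f old)
```

For example, `update (Just . (+1)) 5 41` is `Just (5,42)`, while `update (const Nothing) 5 41` is `Nothing`, so skipping a write needs no separate enable signal.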

Does anybody also feel that such utility adapters have a place in the Prelude?

My dabblings can be seen here: https://github.com/ggreif/clash-ground/blob/master/Histogram.hs

As always, I'd love to hear your thoughts.

Cheers,

    Gabor