VHDL if statement without else equivalent in Clash


Mahshid Shahmohammadian

May 21, 2021, 12:02:08 PM
to Clash - Hardware Description Language
Hi,

I am working on a serial handshaking combinator that performs a function (say, an incrementer for now) every n cycles. I need an if without an else. I have written it in Kansas Lava, where I had to use delay to assign to the signal whenever the else clause occurs. I'm new to Clash, so before moving to a monadic approach and using when, I wrote the code below. In it, for example, the else branch of the x assignment assigns x to x, which in Kansas Lava leads to a loop during code generation. I'm wondering what happens here in Clash, since it compiles and generates the VHDL code successfully. And how do you suggest writing this statement in Clash?

handshakeSerial :: (Num a) => State -> (Enabled a,Ready) -> (State,(Enabled a,Ready))
handshakeSerial state (dataIn,out_ready) = (state',(dataOut,in_ready))
  where
    dataOut   = case state of  Idle   -> Nothing
                               Valid  -> x'
                               Ready  -> Nothing
    in_ready  = case cnt of  0 -> True
                             _ -> False
    state' = case state of  Idle -> case dataIn of
                              Nothing -> Idle
                              _       -> Valid
                            Valid  ->   if out_ready then Ready else Valid
                            Ready  ->   if not out_ready then Idle else Ready

    cnt = if (state == Valid && cnt == 0) || (cnt > 0 && cnt < n) then cnt + 1
          else (if cnt == n && out_ready == True then 0 else cnt)
    x = if (state == Valid && cnt == 0) then dataIn else x
    x' = if (state == Valid && cnt == 0) || (cnt > 0 && cnt < n) then x + 1
          else (if cnt == n && out_ready == True then 0 else x)

type Enabled a = Maybe a
type Ready = Bool
data State = Idle | Valid | Ready  deriving (Eq,Show,Generic,NFData,ShowX)


Thanks,
Mahshid

peter.t...@gmail.com

May 22, 2021, 4:32:58 AM
to Clash - Hardware Description Language
Your x is only used to calculate x' for the value next cycle.

State Valid is the only condition in which x' slips through to data output

When state is Valid, the count is n (nonzero) and the read-enable line ("out_ready") is down, then yes, x = x is what you have defined, so the value of x' is a looped calculation, and that gets through to the data output.

I assume you didn't mean that! I guess what you meant to write instead was that x should stay equal to the value it held before, so you need to hold on to that value in order to assign it to x now. Make your state remember x for one cycle, say as "x_", and then instead of "else x", write "else x_".

So you need to replace "State" by a pair "(State,datain)", then your first argument becomes not "state", but "(state,x_)".

You also need to provide the value of x you calculate now as an extra piece of state to remember for next cycle. Your result should now say not "state'" but "(state',x)".

It would be helpful to reformat the code to make it more easily readable. Say:

x = case (state,cnt) of
      (Valid,0) -> dataIn
      _         -> x_      -- NB previous value, not present value!

for example. Lots of white space.

PTB (who is about to ask a question ...)

Christiaan Baaij

May 22, 2021, 5:12:09 AM
to clash-l...@googlegroups.com
It is as Peter says: you seem to want to use the `x` from the previous clock cycle, so you need to make it part of your state.
So you probably want:
```
handshakeSerial :: (Num a) => (State, Enabled a) -> (Enabled a,Ready) -> ((State,Enabled a),(Enabled a,Ready))
handshakeSerial (state, xP) (dataIn,out_ready) = ((state',x'),(dataOut,in_ready))

  where
    dataOut   = case state of  Idle   -> Nothing
                               Valid  -> x'
                               Ready  -> Nothing
    in_ready  = case cnt of  0 -> True
                             _ -> False
    state' = case state of  Idle -> case dataIn of
                              Nothing -> Idle
                              _       -> Valid
                            Valid  ->   if out_ready then Ready else Valid
                            Ready  ->   if not out_ready then Idle else Ready

    cnt = if (state == Valid && cnt == 0) || (cnt > 0 && cnt < n) then cnt + 1
          else (if cnt == n && out_ready == True then 0 else cnt)
    x = if (state == Valid && cnt == 0) then dataIn else xP

    x' = if (state == Valid && cnt == 0) || (cnt > 0 && cnt < n) then x + 1
          else (if cnt == n && out_ready == True then 0 else x)
type Enabled a = Maybe a
type Ready = Bool
data State = Idle | Valid | Ready  deriving (Eq,Show,Generic,NFData,ShowX)
```
I'm not exactly sure which value you want to remember though:
1. The value of dataIn, or
2. The value of x'

Currently, the above code implements 2. If you meant option 1, you have to change to:
```
handshakeSerial (state, xP) (dataIn,out_ready) = ((state',x),(dataOut,in_ready))
```
Either option sorta corresponds to adding a `delay` function like you would in Kansas Lava.
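
The "if without else" pattern, where the missing else means "keep the old value", can be sketched in plain Haskell without Clash, using `mapAccumL` from `Data.List` as a stand-in for `mealy`. The names `holdStep` and `runHold` are made up for this illustration, and the reset value is simply passed in as an argument.

```haskell
import Data.List (mapAccumL)

-- One clock cycle of a "latch on enable, otherwise hold" register.
-- prev is the value remembered from the previous cycle (the xP / x_ above).
holdStep :: a -> (Bool, a) -> (a, a)
holdStep prev (en, new) = (cur, cur)
  where
    cur = if en then new else prev   -- the else branch reads the OLD value, never cur itself

-- Run the step function over a list of per-cycle inputs, the way `mealy`
-- would run it over a Signal. The first argument plays the role of the reset value.
runHold :: a -> [(Bool, a)] -> [a]
runHold reset = snd . mapAccumL holdStep reset
```

For example, `runHold 0 [(False,5),(True,7),(False,9),(True,1)]` evaluates to `[0,7,7,1]`: the value only changes in cycles where the enable is high.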

Finally: Clash does not check for combinational loops when it translates Haskell to Verilog/VHDL; that's why Clash happily generates Verilog/VHDL for your original code.
There are some corner cases that make checking for "actual" combinational loops tricky, so we haven't created the infrastructure in the Clash compiler to check for them.
Also, you would already be able to witness combinational loops when you simulate/run your code as a regular Haskell program: you would get a blinking cursor because evaluation of your program gets stuck.
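
A rough way to see the difference in ordinary Haskell is to model a signal as a lazy list with one element per clock cycle. This is only a sketch: `Sig`, `registerL`, `counter` and `stuck` are names invented here, not Clash API.

```haskell
-- Signals modelled as lazy lists: element i is the value in clock cycle i.
type Sig a = [a]

-- A register delays its input by one cycle and outputs `reset` first.
registerL :: a -> Sig a -> Sig a
registerL reset s = reset : s

-- Feedback through a register: each cycle depends only on the previous one.
counter :: Sig Int
counter = registerL 0 (map (+ 1) counter)

-- Feedback without a register: every element depends on itself.
-- The definition is accepted by the compiler, but forcing any element never
-- yields a value (it hangs, or the runtime reports <<loop>> if it notices).
stuck :: Sig Int
stuck = map (+ 1) stuck
```

`take 5 counter` gives `[0,1,2,3,4]`, while `take 1 stuck` never returns a result; that is the list-level analogue of the blinking cursor.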

Hope the above helps

--
You received this message because you are subscribed to the Google Groups "Clash - Hardware Description Language" group.
To unsubscribe from this group and stop receiving emails from it, send an email to clash-languag...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/clash-language/4e1deb0e-59da-474d-a14d-465fb87ff046n%40googlegroups.com.

peter.t...@gmail.com

May 22, 2021, 6:14:38 AM
to Clash - Hardware Description Language


"cnt" may have the same problem, going by the "cnt = ... else cnt". Hard to say if that ever gets through the logic. And I thought runtime in the interpreter says "<<loop>>" explicitly, when it can tell, so clearly I have too old a version!

Peter

Mahshid Shahmohammadian

May 24, 2021, 8:35:07 AM
to clash-l...@googlegroups.com
Thank you, Peter and Christiaan. A delayed version of x (and cnt) is what I need; as I mentioned, I implemented this with delay or register in Kansas Lava. I was curious why Clash successfully generates code for this implementation, which Christiaan clarified.

Should I do the same with cnt? That is, bundle both the previous x and the previous cnt with the state that goes into the mealy machine, something like this:

handshakeSerial :: (Num a) => (ST,Enabled a,Int) -> (Enabled a,Ready) -> ((ST,Enabled a,Int),(Enabled a,Ready))
handshakeSerial (state,x_,cnt_) (dataIn,out_ready) = ((state',x',cnt),(dataOut,in_ready))

  where
    dataOut   = case state of  Idle -> Nothing
                               Valid  -> x'
                               Ready  -> Nothing
    in_ready  = case cnt of  0 -> True
                             _ -> False
    state' = case state of  Idle -> case dataIn of
                              Nothing -> Idle
                              _       -> Valid
                            Valid  ->   if out_ready then Ready else Valid
                            Ready  ->   if not out_ready then Idle else Ready

    cnt = if (state == Valid && cnt == 0) || (cnt > 0 && cnt < n) then cnt + 1
          else (if cnt == n && out_ready == True then 0 else cnt_)
    x = if (state == Valid && cnt == 0) then dataIn else x_

    x' = if (state == Valid && cnt == 0) || (cnt > 0 && cnt < n) then x + 1
          else (if cnt == n && out_ready == True then 0 else x)

hsSerial :: (KnownDomain dom,
        GHC.Classes.IP (Clash.Signal.HiddenClockName dom) (Clock dom),
        GHC.Classes.IP (Clash.Signal.HiddenEnableName dom) (Enable dom),
        GHC.Classes.IP (Clash.Signal.HiddenResetName dom) (Reset dom),
        Num a, NFDataX a)
        =>
        Signal dom (Enabled a, Ready) -> Signal dom (Enabled a, Ready)
hsSerial = mealy handshakeSerial (Idle, Nothing, 0)

For some reason, when I simulate this in Clash I get no results:

handshakeSerialTest :: forall a . (NFDataX a, Num a) => [(Enabled a,Bool)]
handshakeSerialTest = simulate @System hsSerial [(Nothing,False),(Just 5, False),(Just 5, False),(Just 5,True),(Just 5, False)]

And with simulate_lazy I get:
[(Nothing,


Thanks,
Mahshid



--
Mahshid Shahmohammadian
Ph.D. Candidate
Computer Science Department
Drexel University

Martijn Bastiaan

May 24, 2021, 8:48:04 AM
to clash-l...@googlegroups.com

The definition of `cnt` depends on itself:

    cnt = if (state == Valid && cnt == 0) || (cnt > 0 && cnt < n) then cnt + 1
          else (if cnt == n && out_ready == True then 0 else cnt_)

I.e., if we assume `state` equals `Valid` and we want to know what `cnt` is, we first need to know what `cnt` is - as we want to check whether it's equal to zero. This is a combinatorial loop - Clash will never insert memory elements implicitly. The simulation will therefore get stuck in an infinite loop trying to evaluate `cnt`.
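
A loop-free counter tests only the previous value on the right-hand side. The following is a plain-Haskell sketch under a few assumptions: `cntStep` and `runCnt` are hypothetical names, the Boolean `start` stands in for `state == Valid`, the period `n` is passed as a parameter, and `mapAccumL` replaces `mealy`.

```haskell
import Data.List (mapAccumL)

-- Next counter value, computed only from the previous value cntP and the
-- inputs of this cycle. cnt never appears in its own definition.
cntStep :: Int -> Int -> (Bool, Bool) -> (Int, Int)
cntStep n cntP (start, outReady) = (cnt, cnt)
  where
    cnt
      | (start && cntP == 0) || (cntP > 0 && cntP < n) = cntP + 1
      | cntP == n && outReady                           = 0
      | otherwise                                       = cntP  -- hold

-- Simulate from a counter reset value of 0.
runCnt :: Int -> [(Bool, Bool)] -> [Int]
runCnt n = snd . mapAccumL (cntStep n) 0
```

With `n = 3` and the inputs `[(False,False),(True,False),(False,False),(False,False),(False,False),(False,True)]` this yields `[0,1,2,3,3,0]`: the counter starts, runs up to n, waits there, and clears once `outReady` is seen.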

Martijn Bastiaan

May 24, 2021, 8:50:05 AM
to clash-l...@googlegroups.com

Also, I see you're using the underlying representation of the various Hidden* constructs we've got. (This makes sense as this is unfortunately what is being shown in error messages.) You could replace:

KnownDomain dom,
        GHC.Classes.IP (Clash.Signal.HiddenClockName dom) (Clock dom),
        GHC.Classes.IP (Clash.Signal.HiddenEnableName dom) (Enable dom),
        GHC.Classes.IP (Clash.Signal.HiddenResetName dom) (Reset dom)

with

HiddenClockResetEnable dom

Cheers,
Martijn


Mahshid Shahmohammadian

May 24, 2021, 9:01:12 AM
to clash-l...@googlegroups.com
Right, this should depend on the value of "cnt_" which is the previous value of cnt not itself.

Thanks,
Mahshid

Mahshid Shahmohammadian

May 27, 2021, 2:22:50 PM
to Clash - Hardware Description Language
I'm facing a new challenge and would like to ask for your recommendation. Remember my serial combinator; now consider a pipelined version, which I have implemented in Kansas Lava as well. I'm implementing it using a fold over a Vec. My problem is again with delays; for the serial version you suggested passing the previous values to the function that is fed to the mealy machine.

Please take a look at the code below:

handshakeParallel :: (Num a) => (Int,a,Valid) -> (Bool,a,Valid,Ready) -> ((Int,a,Bool),(a,Valid,Ready))
handshakeParallel (n,x_,valid_out_) (rst,dataIn,in_valid,out_ready) = ((n,x',out_valid),(x',out_valid,in_ready))
  where

    out_valid = some logic ...
    in_ready  = out_ready

    x = if (rst == False && out_ready == True) then dataIn else x_
    ones = replicate (SNat :: SNat 16) 1
    initial = (0,out_ready,x)
    (_,_,x') = foldl (\(i,ready,f) g -> if ready then (i,ready,f+g) else ??? ) initial ones


hsParallel :: (HiddenClockResetEnable dom, Num a, NFDataX a)
                  => Signal dom (Bool, a, Valid, Ready) -> Signal dom (a,Valid, Ready)
hsParallel = mealy handshakeParallel (16, 0,True)    --- (n,x_init, valid_out_init)


In the function passed to the fold, I want to say: if ready is asserted, add 1 to the input (since this combinator just adds 1 for now), else give me the previous value. How can I refer to the previous value inside this function? Am I required to use "delay", which operates on Signal types, so that I need to lift my types to "Signal a"? And if so, should I write my conditional statements using only "mux" once I am in the Signal realm?

Also, in the lambda expression (\(i,ready,f) g -> if ready then (i,ready,f+g) else ??? ), the f+g should be replaced with (delay f + g) if I want a pipelined structure, which I guess makes the need for delay even stronger.

Thanks a lot in advance,
Mahshid

Christiaan Baaij

May 28, 2021, 3:43:49 AM
to clash-l...@googlegroups.com
You say you implemented this function in Kansas Lava as well, could we see it? (Just a link to some online source repo, e.g. github / github gist, is sufficient).
That way I have a better understanding of what you're trying to do.

peter.t...@gmail.com

May 28, 2021, 7:14:07 AM
to Clash - Hardware Description Language
I admit to being baffled by the code too. The last lot had me beswazzled because state' did not depend on state! There was a correlation with several things, but no direct statement of the intended evolution. (An explanation of why it is hard for a human to understand might be that it originally was a finite-state machine generated from a diagram, rendered in code, and then partially translated/reverse engineered by a human again - my guess.) Without the first code as a solid basis for understanding (or the Lava), it is practically impossible, at least for this human, to repartition it into parallel/pipelined units.

But people here do the impossible all the time, so maybe! Good luck!

I'll have a go at rendering your code in a way that I find more parseable .... (let me know if I err):

handshakeParallel :: (Num a)
                  => (Int,a,Valid)         -- state   (last count? last "x"? nominal state)
                  -> (Bool,a,Valid,Ready)  -- inputs  (reset? data in? what? flow control?)
                  -> ( (Int,a,Bool)        -- state
                     , (a,Valid,Ready)     -- outputs (data out? what? flow control?)
                     )
handshakeParallel (n,   x_    ,valid_out_)
                  (rst, dataIn,in_valid, out_ready) =
                      ( (n,x',out_valid)
                      , (x',out_valid,in_ready)
                      )
  where

    out_valid = some logic ... -- I need to see it! How does it depend on state? Please elaborate.
    in_ready  = out_ready      -- WHY is this here?


Above you should drop the out_ready as it plays no role  and we want to simplify for understanding. You can connect the streams via "out_ready = in_ready"  after having constructed the mealy machine without the extra output. (I don't know what names you will give to the streams at that point, so you will have to modify these names to match the ones you use for the inputs and outputs to/from the mealy machine).

    x = if (rst == False && out_ready == True) then dataIn else x_

That is surprising if rst is really a reset. Shouldn't reset set x to something like 0?

    x = case undefined of
          _ | rst            -> x_   -- (really? Surely 0?)
          _ | not out_ready  -> x_
          _                  -> dataIn  -- so out_ready gates dataIn to x
   
It looks from the above like x is intended to be dataOut! Is that what it really is, perhaps?

The following fold is really quite far from parseable for me personally and I would ordinarily guess on that account that it is mistaken. Can you explain for me in words what it is trying to do? Perhaps some inline types would aid my reading?

    init = (0,out_ready,x)
    (_,_,x') = foldl fn init  (replicate d16 1)
               where fn (i,ready,f) g = case ready of
                                          True  -> (i,ready,f+g)
                                          False -> -- need to see this!

Fold works its way along the input (which is 11111...) accumulating a result. That result seems to be more or less a count (discarded) (can't tell what it counts because the code is not there :-(), a yes/no boolean that starts out as "out_ready" and does not change in the code shown, so the business end must be in the elided code, and a sum that starts with x and adds 1 all the time in the code shown, so it ends up with x+16 (the number of initial 1s).

The initial condition out_ready does not change, and the vector it is applied to is a constant, so this (x') is a function of x and out_ready ONLY. What is it intended to be? The part shown is

   x' = case out_ready of
          True  -> x+16
          False -> ????

My conviction is that the code with fold in must be wrong? The types and portion of code given say this is a simple function with no need to use fold. Can you say something about that to put my mind at ease?
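
The claim that the visible branch collapses to x+16 is easy to check in plain Haskell. This sketch models only the "ready" branch with a list instead of a Vec; the "not ready" branch is elided in the thread and is deliberately left out here. `foldReady` is a name invented for the illustration.

```haskell
-- The "ready" branch of the fold: add each element of the constant vector
-- of sixteen 1s to the accumulator, starting from x.
foldReady :: Int -> Int
foldReady x = foldl (\f g -> f + g) x (replicate 16 1)

-- The same function written without a fold.
plus16 :: Int -> Int
plus16 x = x + 16
```

`foldReady` and `plus16` agree on every input, which is why the fold adds nothing here: all sixteen additions happen within a single evaluation, not across sixteen clock cycles.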

hsParallel :: (HiddenClockResetEnable dom, Num a, NFDataX a)
           => Signal dom (Bool, a, Valid, Ready)  -- inputs
           -> Signal dom (a,Valid, Ready)         -- outputs
hsParallel = mealy handshakeParallel (16, 0,True) -- (n,x_init, valid_out_init)

Without knowing with a great deal more certainty what the semantics is intended to be, I can't really offer a parallelization. Those things are hard enough to get right with all the information in the world!

For me, the first mystery to resolve is why your mealy machine has a next state that does not depend explicitly on the previous state plus new inputs. It is not natural for a human being to write that as a mealy machine! Can you shed some light there?

If it were really true that there is no semantic connection between prior and next state, then you could just not bother with a mealy machine. But I think there is some connection and it has been obscured in the coding. Can you make the dependence explicit, please? That would help my understanding a lot.

Regards

PTB

peter.t...@gmail.com

May 28, 2021, 7:38:26 AM
to Clash - Hardware Description Language

Are you perhaps trying to add up  the 16 last inputs for which the ready signal was high when they arrived ? Or something not a million miles away from that?

If so, the idea of using fold is mistaken because (a) the vector it is applied to must be  present right now, and you are trying to apply it "across time" (I guess!). You must instead accumulate the 16 last inputs into a vector present in the here and now, and apply fold to that vector, right now. But please don't do that, because...

(b) the idea of using fold was rather baroque because the function it implements does not look hard at all, so you didn't need a sledgehammer.

My guess is that you want to accumulate some ongoing count or sum of inputs, each gated by a ready signal that was present at the same time as the input and output that evolving count or sum while so requested by  flow control and reset it on a signal too. Is that it?

People will be able to render that for you quite simply (but illegibly!) as a combination of operations on signals. You'd be surprised.

For example, the signal that at each moment in time contains the last 16 inputs (including the last) as a vector is

last16Ins :: (HiddenClockResetEnable dom, NFDataX a, Num a, ...) => Signal dom (Vec 16 a)
last16Ins = liftA2 (+>>) dataIns (register (replicate d16 1) last16Ins)

or something similar. Details left to the reader! Then you can just fmap f onto that, where f does whatever you want on that vector of 16 things you have accumulated over the last 16 cycles. (Maybe the vector should record the data_ready input too!).
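
The sliding-window idea can be tried out on ordinary lists before moving to Signals. In this sketch `shiftIn` is a list version of what `(+>>)` does on a Vec (push at the head, drop the last element), and `lastNIns` is a hypothetical helper that returns the window contents after each input; the initial window is supplied by the caller and is assumed to be non-empty.

```haskell
-- Push a new value in at the head and drop the last element.
-- Assumes a non-empty window (init fails on an empty list).
shiftIn :: a -> [a] -> [a]
shiftIn x v = x : init v

-- Window contents after each input has arrived, current input included.
-- scanl also returns the initial window first, so tail drops it.
lastNIns :: [a] -> [a] -> [[a]]
lastNIns window0 = tail . scanl (flip shiftIn) window0
```

For instance, `lastNIns [0,0,0] [1,2,3,4]` is `[[1,0,0],[2,1,0],[3,2,1],[4,3,2]]`, and `map sum` over that result plays the role of "fmap f".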

Mahshid Shahmohammadian

May 28, 2021, 12:44:38 PM
to Clash - Hardware Description Language
Sorry for the confusion the code I provided may have caused. Thank you Peter for trying to parse the function. The idea of linking to a repository for a better understanding of the functionality of this pipelined incrementer makes sense. Please check the repo here:


And my Kansas Lava implementation is in:

gen-vhdl/incrementer/kansas-lava/incrementer.hs --> parallel version is the function "parallelIter"

Also, I have a handwritten VHDL version of this functionality that helps to understand what I'm talking about which is:



Thanks,
Mahshid

Mahshid Shahmohammadian

May 28, 2021, 1:27:04 PM
to Clash - Hardware Description Language
Peter,

The last16Ins is not quite what I have in mind; however, it is kind of similar. The circuit checks whether the ready signal is asserted and, if so, assigns (dataIn+1) to the first element of the pipeline stage, and also the rest of the pipeline stages will be incremented. The valid is treated as a shift register when ready is asserted. Finally, the last stage of the pipeline is outputted to dataOut.

So, in simulation, the waveform will look like the attached file.

[Attachment: parallel-inc.png]

peter.t...@gmail.com

May 28, 2021, 5:30:13 PM
to Clash - Hardware Description Language
That's more HOW than WHAT, but I get the idea, I think ...


1. Check if the ready signal is asserted assigns the (dataIn+1) to the first element of the pipeline stage,

Assuming the state is a vector of somethings representing what is in the various stages at one time, that is

           v' = if ready then replace 0 (dataIn+1) v else v



2. and also the rest of the pipeline stages will be incremented.

So that is       

   v' = if ready then map (+1) (replace 0 dataIn v) else v


3. The valid is treated as a shift register when ready is asserted.

I don't quite parse that. Do you mean that actually the stuff also all shifts up one position  when ready is high? That would be

  v'  = if ready then map (+1) ( dataIn +>> v) else v

I am assuming that nothing moves and/or is incremented and/or introduced when ready is low! You didn't say.

Finally, the last stage of the pipeline is outputted to dataOut.

  dataOut = last v  -- or do you mean last of v', the changed vector? I guess you meant the former.

Yes? Now that is a mealy machine as written, but it could be split up. I'll talk about that below. Meanwhile the mealy machine is

  mymachine :: (HiddenClockResetEnable dom , dataIn ~ data, dataOut ~ data, ready ~ Bool) -- what types really?
            => Signal dom dataIn -> Signal dom ready -> Signal dom dataOut
  mymachine = mealy f init . curry bundle       -- playing silly with ". curry bundle" to give you the nice type above
         where
                   f :: (Vec 16 data,(dataIn,ready)) -> (Vec 16 data,dataOut)
                   f (v,(dataIn,ready)) = (v',dataOut)
                                          where v' = if ready then map (+1) ( dataIn +>> v) else v
                                                dataOut = last v
                   init = def :: Vec 16 data  -- FIXME, specify please

      
OK? I don't know if that is exactly what you meant, because the English isn't fully determinative on some points of detail. I have no sure feeling, for example, whether you really meant to increment ALL the stages at once in the situation where anything happens at all. What's the point? The data will just all end up having been incremented by 16 by the time it gets to the end of the pipeline (the "vector"), so why not just increment it by 16 in one go instead of by 1, 16 times over?

Maybe it's just an exercise, and it's not meant to make too much sense in practical terms!

What is interesting is that what comes out is delayed by at least 16 cycles (and incremented by 16, as per the above), and likely in practice is delayed by considerably more. That is because every cycle in which ready is down adds one cycle to the delay, as the pipeline does not move at all on that cycle. Did you really mean  that? I assume so. Otherwise it's just a last16Ins producer, and you didn't want that.
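
The delay-and-increment behaviour can be checked on a small list model before committing to Clash types. This is a sketch under assumptions: lists replace `Vec`, `mapAccumL` replaces `mealy`, the pipeline is reset to all zeros, and `pipeStep` and `runPipe` are names invented here. The depth must be at least 1.

```haskell
import Data.List (mapAccumL)

-- One cycle of the vector-state machine: when ready, shift dataIn in at the
-- front and increment every stage; otherwise hold everything.
-- The output is read from the OLD state v, not from v'.
pipeStep :: [Int] -> (Bool, Int) -> ([Int], Int)
pipeStep v (ready, dataIn) = (v', last v)
  where
    v' = if ready then map (+ 1) (dataIn : init v) else v

-- Simulate a pipeline of the given depth, reset to all zeros.
runPipe :: Int -> [(Bool, Int)] -> [Int]
runPipe depth = snd . mapAccumL pipeStep (replicate depth 0)
```

With depth 3 and inputs `[(True,10),(True,20),(False,99),(True,30),(True,40),(True,50)]` the outputs are `[0,1,2,2,13,23]`. The 10 that entered in cycle 0 leaves as 13 in cycle 4: three ready cycles plus the one stalled cycle, during which the old output is simply repeated.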

You really want this  to be not a single machine handling a vector of 16 values, but 16 stages each handling one value each. I'll do that now.

So you would write

        dataOuts = (m15 readies . m14 readies . ... . m1 readies . m0 readies) dataIns

where each of those 16 machines (each takes the ready signal) takes a dataIn and produces a dataOut. I'll do that more succinctly lower down, but it helps  to see it written out "longhand" first, I think. Each machine has the formal type just announced:

      m0, m1, ..., m15 :: (HiddenClockResetEnable dom, ...) => Signal dom ready -> Signal dom dataIn -> Signal dom dataOut


They are all identical, aren't they?

    m0 = m
    m1 = m
    ...
    m15 = m

OK, I got tired already. Let's cut the longhand and just write

  dataOuts = (m readies . m readies . ... . m readies . m readies) dataIns

What we need is the vector of 16 machines already wired with the ready signal:

   replicate d16 (m readies)

and we need to compose them:

   dataOuts  = compose (replicate d16 (m readies))  dataIns

and "compose" had better mean a fold of the binary function composition operator

  dataOuts = compose (replicate d16 (m readies))  dataIns
                   where compose = fold (.)

It is fervently to be hoped that Clash can smash all that abstract statement out into a flat application, so I don't have to help it at all in any way, not by crossing my fingers even. That remains to be seen.

The single machine "m" is the mealy machine with a vector length 1 in it, instead of length 16.

  m :: (HiddenClockResetEnable dom , dataIn ~ data, dataOut ~ data, ready ~ Bool)
            => Signal dom dataIn -> Signal dom ready -> Signal dom dataOut
  m = mealy f init . curry bundle
         where
                   f :: (Vec 1 data,(dataIn,ready)) -> (Vec 1 data,dataOut)
                   f (v,(dataIn,ready)) = (v',dataOut)
                                          where v' = if ready then map (+1) ( dataIn +>> v) else v
                                                dataOut = last v
                   init = def :: Vec 1 data  -- FIXME, specify please


I copied that from higher up and changed the 16s to 1s. You could beat up on it for having vectors length 1 instead of just the data inside the vector. Shrug. I'm lazy.
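
The composition of identical one-register stages can be tried out on lists first. This is a plain-Haskell sketch, not Clash: streams are lists, `foldr (.) id` stands in for folding `(.)` over a Vec, each stage is assumed to reset to 0, and `stage` and `pipeline` are hypothetical names. The ready stream is the first argument so that `stage readies` can be replicated.

```haskell
import Data.List (mapAccumL)

-- A single pipeline stage: one register that loads dataIn + 1 when ready
-- and holds otherwise. Its output is the OLD register content.
stage :: [Bool] -> [Int] -> [Int]
stage readies dataIns = snd (mapAccumL step 0 (zip readies dataIns))
  where
    step r (ready, dataIn) = (if ready then dataIn + 1 else r, r)

-- depth identical stages, all wired to the same ready stream, composed
-- into one stream function.
pipeline :: Int -> [Bool] -> [Int] -> [Int]
pipeline depth readies = foldr (.) id (replicate depth (stage readies))
```

`pipeline 3 [True,True,False,True,True,True] [10,20,99,30,40,50]` evaluates to `[0,1,2,2,13,23]`: each value is delayed by three ready cycles and incremented three times on its way through.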

Is this like what you were thinking of?

Regards

PTB

Mahshid Shahmohammadian

May 28, 2021, 6:19:50 PM
to clash-l...@googlegroups.com
Thanks a lot for your complete elaboration! Other than the valid part, everything you mentioned is what I meant. Valid is another input signal to the circuit; we keep its values in a shift register and output out_valid as the last (MSB) element of valid. Also, to answer your question about why we don't just add 16 and delay by 16: this is going to be a generic circuit (a combinator), not only one that increments stuff.

Your idea to pass the vector as a state to the mealy machine sounds good! So I'm just going to go with that.

Thanks,
Mahshid


Mahshid Shahmohammadian

May 31, 2021, 12:40:34 PM
to clash-l...@googlegroups.com
I gave this more thought. In the first implementation, we can ignore the reset signal and just use your function for m as a single machine for every other 15 stages of the pipeline. The first stage is different from others. The first one gets the input data but the rest of the machines get the data from the previous machines inside the pipeline. I'm not sure the composition is going to be the same, is it?

Do you think this implementation is going to use fewer resources after synthesis compared to the fold implementation I suggested along with using delays (since you are way more expert than me)?

Peter Breuer

May 31, 2021, 1:59:45 PM
to clash-l...@googlegroups.com
The first is the same machine as the rest. Just its input has a different name. Honest!

(Brevity caused by phone gui - apologies!)

Peter

peter.t...@gmail.com

May 31, 2021, 6:58:18 PM
to Clash - Hardware Description Language
Now at a computer, let me elaborate.

First of all, I think you should concentrate first on your solution using a single mealy machine with a vector as state. When you have that working, but not before then, you can think about splitting it up into a pipeline of 16 mealy machines, following roughly what I did. One thing at a time! (though it's great that you are thinking ahead).

I made several typos/thinkos in writing down my own mealy machine on that vector, so I _know_ you will have to think about that first in order to get it working. I didn't intend that, apologies, but it's quite serendipitous from that point of view :-). On my conscience are

Thinko 1) I forgot to prefix the function type declaration with "forall (dom::Domain) data dataIn dataOut . " in order that the type declarations in subfunctions work (using those named types). Otherwise they'll just be a headache.

Thinko 2) I wrote down the (ready,dataIn) arguments the wrong way round, as (dataIn,ready).  It has to be "ready first" in order that the readies stream can be given as a (first) parameter to the mealy machine(s), which in turn allowed that neat trick of partially evaluating the tiny little mealy machines on the readies stream as "m readies" before replicating them in a vector and composing them as functions on their remaining input stream. Ahem. You'll probably spot that and correct it without a second thought.

OK? I reiterate that I don't see any difference in the semantics of the tiny little machines, if ever you get to the point of making those. The names of their input streams, or what is on them, are none of their business as machines. They take what they are given, and that's that. That some of them are taking the outputs of other machines as their input, and one is taking the dataIns stream, is unknown to them all. They do what they do. If I miscalculated and somehow there should be some difference between their semantics, my apologies, but as I vaguely understood the intended functionality, there should be none. You may well know better!

You confused me with that "valid" business because without my allegedly erroneous interpretation of what you said, the pipeline would never move!

You forgot to tell me to move the whole pipeline up one position when ready is asserted (or perhaps on every cycle! I don't know!). I misinterpreted what you said as telling me to do that. Now I don't know what you mean, as you tell me I got it wrong. But you should take note that you _MUST_ move the pipeline along explicitly yourself, if you intend to do so. It won't happen on its own. The mealy machine does what you tell it to and nothing else. Don't tell it to move the pipeline up one, and it won't!

Also the "valid" stuff as you now describe it just sounds like a parallel pipeline/vector moving an incoming "valid" signal up in step with the pipeline contents (yes, I know you think it's a bitvector, not a vector, but the difference is slight). Do you mean to keep the "valids" in step with the real pipeline content or not? If yes, why not attach them, and  move pairs (data,valid) up the  pipeline, instead of just "data"? You may say it uses more gates, but I don't think with small numbers like "16" hanging around that _anything_ can use too many gates, and the bigger danger right now is getting the semantics wrong. When you have it right, the easy way, then you can obscure the code for some greater implementation efficiency! Nothing is more difficult than keeping parallel accounting in step. What happens on reset? Make it as easy on yourself as possible!
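
Moving (data, valid) pairs together is a small change to a vector-state step function. The following list-based sketch assumes that the slots reset to `(0, False)` and that the valid flag travels unchanged while the data is incremented; `Slot`, `pairStep` and `runPairs` are names invented for the illustration, and `mapAccumL` replaces `mealy`.

```haskell
import Data.List (mapAccumL)

-- Each stage carries its data together with its valid flag, so the two
-- cannot get out of step.
type Slot = (Int, Bool)

pairStep :: [Slot] -> (Bool, Int, Bool) -> ([Slot], Slot)
pairStep v (ready, dataIn, inValid) = (v', last v)
  where
    bump (d, ok) = (d + 1, ok)   -- increment the data, keep its valid flag
    v' = if ready then map bump ((dataIn, inValid) : init v) else v

-- Simulate a pipeline of the given depth (at least 1), reset to invalid slots.
runPairs :: Int -> [(Bool, Int, Bool)] -> [Slot]
runPairs depth = snd . mapAccumL pairStep (replicate depth (0, False))
```

With depth 2 and inputs `[(True,10,True),(True,20,False),(False,0,True),(True,30,True)]` the outputs are `[(0,False),(1,False),(12,True),(12,True)]`: the reset slots come out marked invalid, and the valid flag arrives together with the data it was attached to.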

But in any case, you have very little control over how Clash renders your code. In principle you can write whatever you like within one mealy machine and provided the functionality is the same, Clash will render that the same way, no matter how hard you try to obscure or disguise it :-) (in practice that is not true, but it is a fair first approximation to the truth). So you should not care about expressing what you mean one way or another. Clash will just smash the whole thing out into combinatorial logic, stuck inside a single loop, with one clock delay.  Same semantics for the combinatorial logic, same ("normalized") logic expression results ("in principle").

 [The reality is that rendering into vhdl or verilog will intercept higher level constructs than AND and NOR gates, which will result in different expressions in those languages ... nevertheless, vhdl or verilog compilers if asked to render that into pure logic gates should always complete the normalization - in their own fashion. You are fundamentally working in a computationally decidable domain, in which the semantic equality of syntactic expressions can in principle be decided in finite time and syntactic differences after normalization must mean differences in semantics ... so sue me for not mentioning that computational complexity can make that ideal effectively impossible.]

The only real difference you can make is with that split into 16 mealy machines in a pipeline. Even then I wouldn't swear that will make any difference, unless you also declare "NOINLINE m" on the little mealy machines. That will definitely stop Clash trying to smash those together, if it does that (which I give about 70/30 odds on).

I also was concerned because those 16 machines in a pipeline apparently don't have any delay between their input and output. BUT, but but, I was careful to define their output as "last v", not "last v'", which should provide the missing delay. Mealy machines left to themselves do output on the same cycle as their input, just a little later in the cycle. Ordinarily the effect of an input would cascade all the way up that pipeline in the same cycle, making you have to go very slow with the clock! But I disengaged the output from the input via "last v" (it would have been engaged via "last v'", because the input dataIn influences v', while v is the last state of the vector, before the input arrived), so there is no cascade. Please be careful and don't accidentally connect the input to the output in the same cycle and then put everything in a pipeline ...! It is the way I had it deliberately.
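A small pure-Haskell sketch of that point, with plain lists standing in for signals (one list element per clock cycle). The names simulate, registered and combinational are made up for illustration and are not Clash API; the stage here holds a single Int rather than a vector.

```haskell
import Data.List (mapAccumL)

-- Pure list model of a mealy machine: thread the state through the inputs
-- and collect one output per input (one list element = one clock cycle).
simulate :: (s -> i -> (s, o)) -> s -> [i] -> [o]
simulate step s0 = snd . mapAccumL step s0

-- Output taken from the OLD state (like "last v"):
-- the input only shows up at the output one cycle later.
registered :: Int -> Int -> (Int, Int)
registered old new = (new, old)

-- Output taken from the NEW state (like "last v'"):
-- the input reaches the output in the same cycle.
combinational :: Int -> Int -> (Int, Int)
combinational _old new = (new, new)
```

In GHCi, simulate registered 0 [1,2,3] gives [0,1,2], while simulate combinational 0 [1,2,3] gives [1,2,3]. Chaining two registered stages gives [0,0,1], so each stage adds one cycle of delay, whereas any number of combinational stages adds none.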

That's about all I can think of to say about it right now.

Ask again when you have the vector mealy machine working.

Regards

PTB

Mahshid Shahmohammadian

May 31, 2021, 8:48:47 PM5/31/21
to clash-l...@googlegroups.com
I already implemented the single machine version, but my own version. (since I could not figure out that " . curry bundle" part and got syntax errors). Here it is:

m16 :: (Num a) 
    => (Vec 16 a,Vec 16 Bool)
    -> (Bool,a,Valid,Ready)
    -> ((Vec 16 a,Vec 16 Bool),(a,Valid,Ready))
m16 (x_,v_) (rst,dataIn,in_valid,out_ready) = ((x',v'),(dataOut,out_valid,in_ready))
  where
    v' = if rst then (replicate d16 False)
         else (if (out_ready == True) then (in_valid +>> v_) else v_)

    x' = if (rst == False && out_ready == True)
         then (map (+1) (dataIn +>> x_)) else x_
    out_valid = last v_
    dataOut = last x_
    in_ready  = out_ready

hsm16 :: (HiddenClockResetEnable dom, Num a, NFDataX a)

       => Signal dom (Bool, a, Valid, Ready)
       -> Signal dom (a,Valid, Ready)
hsm16 = mealy m16 (replicate d16 0, replicate d16 False)

After synthesizing this, it's resulting in a much fewer number of LUTs than the fold implementation that I mentioned before. So, I wonder if I can get those 16 machines compositions to work, I can optimize the resource utilization by a lot! I'm kinda stuck on that curry and (.) which I think you tried to get the output of type "signal dom data" out of mealy call. Could you please double-check that?

BTW this is my other version which takes much more resources. I would appreciate it if you share any thoughts on why this one is utilizing way more LUTs.

handshakeParallel :: (HiddenClockResetEnable dom, Num a,NFDataX a)
                   => (Bool, Signal dom a,Valid,Ready)
                   -> (Signal dom a,Valid,Ready)
handshakeParallel (rst,dataIn,in_valid,out_ready) = (x',out_valid,out_ready)
  where   
    val_vec = replicate d16 in_valid
    vec' = if (rst == False && out_ready == True)
           then rotateLeft val_vec (length val_vec - 1)
           else rotateLeft val_vec (length val_vec - 2)      
    out_valid = if (rst == True) then False else (vec' !! 0)

    x = if (rst == False && out_ready == True) then (dataIn+1) else (delay 0 x)
    ones = replicate d15 1
    initial = (0,out_ready,x)
    (_,_,x') = foldl iter initial ones

iter (i, ready, f) g = (i',ready',f')
   where
     f' = if ready then ( f + g) else (dflipflop f')
     ready' = ready
     i' = i + 1


Thanks,
Mahshid

peter.t...@gmail.com

Jun 1, 2021, 2:31:19 AM6/1/21
to Clash - Hardware Description Language
Great! Here's an update. I'll explain below (yes, it really does what you want).

import Clash.Prelude
type Valid = Bool
type Ready = Bool
m16_s :: ( Num a
         , dataIn ~ a, dataOut ~ a
         , validOut ~ Valid            -- = Bool
         , readyOut ~ Ready            -- = Bool
         )
    => Vec 16 (Maybe a)                -- entries tagged as valid/not valid
    -> ( readyOut, dataIn)             -- validIN really! + data
    -> ( Vec 16 (Maybe a)
       , (validOut, dataOut)
       )
m16_s x (readyOut,dataIn) = ( x', (validOut,dataOut))
    where
    x' = if readyOut then map (fmap (+1)) (Just dataIn +>> x) else x
    validOut  = case last x of
                  Nothing -> False -- invalid slot
                  _       -> True  -- valid   slot
    dataOut   = case last x of
                  Nothing -> 0     -- invalid slot
                  Just x  -> x     -- valid   slot

m16 :: (HiddenClockResetEnable dom, Num a, NFDataX a)
    => Reset dom -> Signal dom (Ready, a) -> Signal dom (Valid,a)
m16 rst = withReset rst (mealy m16_s def)

So ... it is now clear what that "valid" thing is about. Because you start with a vector of 0s, you can't tell when a 0 is in the state vector because it was newly input as data or it was just there before anything at all was input and is now coming out. Your vector of "valids" just filled up as 00000, 00001, 00011, 00111, etc as more data entered your pipeline. When a 1 gets to the end of that vector, you know that the corresponding data in the parallel vector alongside it is for real, whether it is a 0 or not a 0. You might as well have counted to 16! Using a rolling vector of 1/0s (aka True/False)  saved you about 10 gates on addition, since you needed three full adders and one half adder to increment a 4-bit count. Maybe less. Anyway, it isn't needed ...

... because the simple thing is just to tag the data in the state vector with that one extra "valid" bit saying if it is for real or not.  That also uses 16 extra bits of storage, just like you had, and it also rolls the valid bit along without needing any more logic than that.

So instead of having a "Vec 16 a" as the type of your state, you need a "Vec 16 (Maybe a)". The Maybe means the entries are tagged with a "Just" when they are for real. (Otherwise they are tagged as "Nothing").

Then in your code, where you had a "dataIn +>> x", instead you need to add the "for real" tag to the data going in and write "Just dataIn +>> x".

Now you have a slight difficulty in incrementing the stuff in the state vector, because you have to increment underneath the tags. So instead of writing "map (+1) ...", you have to write "map (fmap (+1)) ...". The "fmap" turns the "+1" into something that works underneath the tags, so it turns "Just x" into "Just (x+1)" (and it leaves "Nothing" as  is).

 Your data will now trundle up the pipeline with a "yes, I am for real" tag attached, if it is for real, because the Just is only attached on data that is incoming when the ready signal is high. I've made sure the state vector is initialized with Nothings with the mealy machine initial state, which is "def".

Of a vector, that means to put a default value in each entry of the vector, and the default value for any "Maybe foo" type is "Nothing", so that's what one will get in the starting state vector entries.

One will also get that when the reset signal is applied.
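To see the tagging at work without synthesizing anything, here is a list-based model of the same step function. It is only a sketch: a list stands in for the Vec, shiftIn is a hypothetical replacement for +>>, the data type is fixed to Int, and the pipeline depth is a parameter so that short traces are easy to read.

```haskell
import Data.List (mapAccumL)

-- Hypothetical list stand-in for +>> on a Vec: push a new element in at
-- the head and drop the last one. Assumes a non-empty list.
shiftIn :: a -> [a] -> [a]
shiftIn x xs = x : init xs

-- One clock cycle: when ready, tag the input with Just, shift it in and
-- increment every valid slot underneath its tag; otherwise hold the state.
-- The outputs are read from the OLD state.
step :: [Maybe Int] -> (Bool, Int) -> ([Maybe Int], (Bool, Int))
step st (ready, dataIn) = (st', (validOut, dataOut))
  where
    st' | ready     = map (fmap (+ 1)) (shiftIn (Just dataIn) st)
        | otherwise = st
    validOut = maybe False (const True) (last st)
    dataOut  = maybe 0 id (last st)

-- Run from the all-Nothing state (what "def" gives for a vector of Maybe).
run :: Int -> [(Bool, Int)] -> [(Bool, Int)]
run depth = snd . mapAccumL step (replicate depth Nothing)
```

With a depth of 2, run 2 [(True,10),(True,20),(True,30),(True,40)] gives [(False,0),(False,0),(True,12),(True,22)]: the first two outputs are the untagged initial slots, and each real datum comes out incremented once per stage.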

I've taken out your "rst" input because as far as I could see what it did was return the state vector to the initial state (I didn't check too carefully, but that seemed to be the gist). You can do that by just signalling reset to the mealy machine when you have it as a complete build (the "m16" above).

I haven't bothered with the output that was just equal to an input - or something like that. That's a wire! In parallel.

You can now see that your validOut signal (sorry, was that out_valid originally? I tried to make all the names follow the same pattern) is just checking that the last entry in the state vector is tagged OK. I believe that follows the spirit of what you intended. You just had the tags trotting along in an auxiliary vector in parallel, instead of attached to the data in the actual state vector. It's just the same, but less complicated.

I have taken care to ensure that you can just abbreviate these machines to a 1-element state vector and then connect 16 of them up head-to-tail in a chain, and they should still work. You have to pass the _same_ reset signal to them all, so you will eventually write

  m16 rsts = compose (replicate d16 (m1 rsts))

for the complete machine built as 16 small machines. The input signal type is the same as the output signal type. That is, the input is Signal dom (Bool,a) and the output is Signal dom (Bool,a). You called one of the Bools "Valid" and the other "Ready" (I think). The intent is the same both on input and output, if I understand what is going on correctly. It is to signal that the accompanying data value is meaningful, not just some garbage that happened to be lying around and is still here. So I am pretty confident that chaining will work.

(not that I am going to stand close enough to try it ...)

Peter

peter.t...@gmail.com

Jun 1, 2021, 7:35:50 AM6/1/21
to Clash - Hardware Description Language
PS. I can't tell you why one thing uses more LUTs than another because I have little idea what a "LUT" is! Logical Unit something, maybe? If you could tell me how you are getting a count for them, maybe I could work out what they are from that.

If it means "logic gates", I'd love to know how you get a gate count out of VHDL. Please tell me! Does it become apparent when one compiles the VHDL to something else, maybe netlist(s)? Where is that information exactly?

Tx

Peter

Mahshid Shahmohammadian

Jun 1, 2021, 11:43:56 AM6/1/21
to clash-l...@googlegroups.com
Sorry for the confusion. I'm synthesizing the generated VHDLs using vivado for Xilinx Virtex 7 FPGA. Please take a look at page 21 of this user guide:

I'm getting about 900 LUTs for the fold with delay implementation and only 34 LUTs for the m16 implementation with mealy machine. Would really appreciate any thoughts on the reasons.

Thanks a lot!
Mahshid

peter.t...@gmail.com

Jun 1, 2021, 3:15:04 PM6/1/21
to Clash - Hardware Description Language


Doesn't help me much, I'm afraid. "What's an LUT" should have an easy answer! It says "Lookup table"!

They say they mean more exactly a table of 64 entries, one for each combination of 6 input bits, with up to two output bits per entry. That defines the function graph of a little gate with 6 input wires and up to 2 output wires. As far as I can tell, that is either one 6-input, 1-output gate or two 5-input, 1-output gates sharing the same input wires.

Anyway,  one LUT is two logic gates, plus frills. Probably one can make a basic flip-flop out of that, with feedback. So one of your designs is much more complex in its possible behaviours than the other, and/or needs much more storage! That's all. Are you sure their behaviour is the same? Storage requirements the same? The synthesis doesn't seem to think so.

(You will have to explain your second design to me because I can't make anything of it  by eye)

Regards

PTB

Mahshid Shahmohammadian

Jun 7, 2021, 11:50:12 AM6/7/21
to Clash - Hardware Description Language
Sorry for the late response. I understand that the two designs differ in that one keeps the elements in flip-flops, with the results of the carry logic passed to the registers, while the other stores the result in look-up tables, but the reason is not 100 percent clear to me. I have tested the functionality of the two designs and they both produce the same outputs. For example, in a testbench, if I wait for a random number of clock cycles, they both give me correct results.

To explain the second design:

This one does not use a mealy machine and uses dflipflop and delay functions instead for each element in the pipeline.  dataIn is inputted to the pipeline if the out_ready is asserted, otherwise, the previous value of x is assigned to x. Then because I know I want my function to be "x+1" as an incrementer, so I create a vector of ones of length 16 (the number of iterations). Then I apply a fold on this with function iter which does something similar (if ready is asserted then add the pipeline element to 1 otherwise a delayed version of that element). For valid I used the functions available in Vector library to make something like shift register and at the end get the last of the vector. That's it.

handshakeParallel :: (HiddenClockResetEnable dom, Num a,NFDataX a)
                   => (Bool, Signal dom a,Valid,Ready)
                   -> (Signal dom a,Valid,Ready)
handshakeParallel (rst,dataIn,in_valid,out_ready) = (x',out_valid,out_ready)
  where  
    x = if (rst == False && out_ready == True) then dataIn else (delay 0 x)
    ones = replicate d16 1

    initial = (0,out_ready,x)
    (_,_,x') = foldl iter initial ones
    val_vec = replicate d16 in_valid
    vec' = if (rst == False && out_ready == True)
           then rotateLeft val_vec (length val_vec - 1)
           else rotateLeft val_vec (length val_vec - 2)      
    out_valid = if (rst == True) then False else (vec' !! 0)


iter (i, ready, f) g = (i',ready',f')
   where
     f' = if ready then ( f + g) else (dflipflop f')
     ready' = ready
     i' = i + 1


Thanks,
Mahshid

peter.t...@gmail.com

Jun 7, 2021, 2:14:07 PM6/7/21
to Clash - Hardware Description Language
I'll try this:

"dataIn is inputted to the pipeline if the out_ready is asserted", otherwise, the previous value of x is assigned to x."

What is your representation of a pipeline? What (name and) type have you given it?

"Then because I know I want my function to be "x+1" as an incrementer, so I create a vector of ones of length 16 (the number of iterations)."

Is this vector a pipeline state? Part of a pipeline state? An input to the pipeline? Where do you create it and what is its name and type? The problem for me thus far is that you seem to be describing the pipeline I gave, and that does not correspond at all to the code you have written, so  you must be thinking one thing and reading/writing another. You should check your thinking by devising some checks! Please add type declarations to the functions and objects you define in that code, and make sure that Clash's idea of what you have written corresponds to yours. You should find the answer is "no"!

"Then I apply a fold on this with function iter which does something similar (if ready is asserted then add the pipeline element to 1 otherwise a delayed version of that element)."

What does the function iter do, and what is the function it constructs via fold, and what is the type it applies to and what is the type of what it produces?

(Fold just applies a binary operator pairwise between the elements of a list/vector, so the information content in it is just the binary function part. I need to hear more than "something similar"! Please say exactly)

"For valid I used the functions available in Vector library to make something like shift register and at the end get the last of the vector."

What is "valid"? Isn't that just the OK tag on the data that came in, delayed 16 cycles? If so, you are saying:

      valids, out_readies :: Signal dom Bool
      -- 16 registers placed in series ("compose"d) and applied: a 16-cycle delay
      valids = compose (replicate d16 (register False)) out_readies
        where compose = fold (.)

Your function iter must be the clue. Clash tells me it has type:

iter :: (HiddenClockResetEnable dom, Num x, Num y, NFDataX y) => (x, Bool, Signal dom y) -> Signal dom y -> (x, Bool, Signal dom y)

It looks like you intend a binary stream to stream transformation, done over several (16?) times.  So you will take a vector of 16 streams and do something to them? The binary transform part is

     stream' = if ready then  streamL + streamR else register undefined stream'

You can't have stream' = ... stream'. That's a loop. You might have meant the stream of undefineds, which would be

    stream' = if ready then streamL + streamR else pure undefined

So if ready is high, then this will just add (I didn't know you can do that!!! Thanks for telling me. I hope you are right) the elements of two streams pairwise. If you start with 16 streams, it adds their elements 16-wise.  What are these 16 streams? You apply your folded iter binary function to "ones", which is "1" replicated 16 times. The trouble is, that needs to be a stream in order to be presented to iter as one of its arguments, so the type system will cause that to be interpreted as whatever the numerical representation of a stream of 1s is, which I presume is just "pure 1". This looks like fun!  Yes:

  > sampleN 4 (1 :: Signal System Int)
  [1,1,1,1]

Nice! So one could have written

  stream' = if ready then streamL + streamR else undefined

The whole thing just has to be wrong, but it is fun!

Bottom line: declare types for your functions. I think you'll see you've been wrong about what the arguments and results are. Having "1" mean an infinite stream of Ints is certainly not what you intended. I think.

Cute notation, and I personally approve of it as great fun and just what I would want, but it explains why Haskell instead requires one to explicitly put things inside a Monad with an injection (that is "pure 1", here), rather than letting the system silently infer the injection for you. That's all very well if you know what you are doing and are right, but if you are wrong then it is a silent magnifier of your error.

This class-based feature in many ways allows one to construct a new language, giving new semantics to old syntax. The example above of "1" meaning an infinite stream of 1s is just wondrous.
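The mechanism can be reproduced outside Clash in a few lines. In this sketch Stream is a hypothetical stand-in for Signal (an infinite list of samples) and sampleN is re-implemented for it; as far as I understand, Clash's own Num instance for Signal lifts the operations pointwise in much the same way.

```haskell
import Control.Applicative (liftA2)

-- Hypothetical stand-in for Clash's Signal: an infinite list of samples.
newtype Stream a = Stream [a]

instance Functor Stream where
  fmap f (Stream xs) = Stream (map f xs)

instance Applicative Stream where
  pure = Stream . repeat                               -- constant stream
  Stream fs <*> Stream xs = Stream (zipWith ($) fs xs) -- sample by sample

-- Lifting Num pointwise is what lets a bare literal denote a constant stream:
-- the literal 1 is read as fromInteger 1, which here is pure 1.
instance Num a => Num (Stream a) where
  (+) = liftA2 (+)
  (*) = liftA2 (*)
  (-) = liftA2 (-)
  abs = fmap abs
  signum = fmap signum
  fromInteger = pure . fromInteger

-- Take the first n samples of a stream.
sampleN :: Int -> Stream a -> [a]
sampleN n (Stream xs) = take n xs
```

With this in scope, sampleN 4 (1 :: Stream Int) gives [1,1,1,1], and sampleN 3 (Stream [1 ..] + 1) gives [2,3,4]: the literal is silently promoted to a stream and "+" adds corresponding samples.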



Peter

peter.t...@gmail.com

Jun 7, 2021, 5:15:08 PM6/7/21
to Clash - Hardware Description Language
I feel moved to add that was the most exciting and wonderful "mistake" that anyone probably ever will make/has made. You probably don't think so.

You seem to have accidentally embedded everything in a space of streams, so "1" is the stream of 1s, and "+" is the operator that adds the elements of two streams, corresponding element to corresponding element. It's like embedding functions as points in a space of measures (!!). You are now living in a place where everything may be much, much more complicated than you were imagining. That accounts for the extra gates.

It's a miracle that everything works as it should - if it does (there may be a functorial embedding in a dual space at work here). But basically, that's what's up. Everything is wrong, from the point of view of your intention (very right! from the point of view of my own interest).

(A) The first thing that is wrong for you is the type of your "handshake". You intended it to be a stream transformer that takes a stream of dataIns and a stream of boolean outReadies and a stream of resets,  and produces  a stream of dataOuts and a stream of boolean valids.

The intended type is ("Reset" is a special Signal kind known to Clash as such, but it is really just booleans underneath)

      Reset dom -> Signal dom dataIn -> Signal dom OutReady -> (Signal dom dataOut, Signal dom Valid)

Instead of just one mealy machine doing that, you want a pipeline of mealy machines. Each little mealy machine does exactly the same thing as that at the type level, because you're just labelling what the streams of inputs and outputs ARE, really.

   m0, m1, ..., mF :: Reset dom -> Signal dom dataIn -> Signal dom OutReady -> (Signal dom dataOut, Signal dom Valid)

Your only problem is how to connect them up so the output of the first streams into the second as input, the second output into the third, etc. Worry about what they actually DO inside later. That's just semantics inside the boxes. The wiring between the boxes is the important part.

So you name all these streams:

(dataOuts0, valids0) = m0 resets dataIns0 out_readies0
(dataOuts1, valids1) = m1 resets dataIns1 out_readies1
...

[This is cringeworthy stylistically in terms of code authorship, but this way one can see what is going on.]

Now you connect them:

dataIns1 = dataOuts0
out_readies1 = valids0
dataIns2 = dataOuts1
out_readies2 = valids1
...

There, you are, one pipeline. I hooked the same reset signal up to all the stages. It seemed kinder.

All you have to do is define what the pipeline components do.

They all do the same thing. If the out_ready input is high, then they take in a new datum and move out their old datum, together with a/its valid high tag. If the out_ready input is low, they don't take in a new datum, but they still  do move out their old datum, together with whatever its valid tag said.

That is a mealy machine (or a register, if you prefer, but a register is a mealy machine). Its state is a pair (dat,Valid)   (also known as "Maybe dat" in Clash). When the accompanying valid tag is low, nobody cares what value dat is, and it may be undefined, even. As follows:

   m :: Reset dom -> Signal dom (dataIn,OutReady) -> Signal dom (dataOut,Valid)
   m resets = withReset resets (mealy fn init)
     where
       init :: (dat,Valid)
       init = (undefined,False)
       fn :: (dat,Valid) -> (dataIn,OutReady) -> ((dat,Valid), (dataOut,Valid))
       fn (dat,valid) (dataIn,out_ready) =          -- result is (new state, outputs)
         if out_ready then ((dataIn,True),(dat,valid))
                      else ((undefined,False),(dat,valid))


You can do that with just a register, if you prefer. A register really is a mealy machine.

   m resets = withReset resets (register (undefined,False) . filter)
     where
       filter :: Signal dom (dataIn,OutReady) -> Signal dom (dataIn,OutReady)
       filter = fmap fn
         where
           fn :: (dataIn,OutReady) -> (dataIn,OutReady)
           fn (dataIn,out_ready) = if out_ready then (dataIn,True) else (undefined,False)

I am supposing throughout that OutReady ~ Valid ~ Bool, dataIn ~ dataOut ~ dat as types.

I mentioned the predefined type Clash has that represents (dat,True) as Just dat, and (undefined,False) as Nothing. You may as well use it.

   m :: Reset dom -> Signal dom (Maybe dataIn) -> Signal dom (Maybe dataOut)

Things look simpler like that. It's clearer that there's just one stream of inputs and one stream of outputs for each pipeline stage. The inputs and outputs carry tags. When the tag is a "Just" then the accompanying data is valid. When the tag is "Nothing" then the accompanying data is not valid. Indeed, to keep everyone honest, it has been vamooshed and replaced by undefined. That will save a few gates. It doesn't need storing or processing.

This is where you should notice that instead of laboriously writing out all the wiring names and hookups one by one, you can simply write a higher order combinator to put the whole lot together and save yourself the bother:

    handshake :: Reset dom -> Signal dom (Maybe dataIn) -> Signal dom (Maybe dataOut)
    handshake resets = compose (replicate d16 (m resets))
      where compose = fold (.)

Nobody needs to know the names of the wires, other than input to and output from the complete pipeline (and I haven't named those either).
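The composition trick can be tried on plain lists before involving Clash at all. The following is only a model: a list of Maybe Int stands in for Signal dom (Maybe a), stage and pipeline are illustrative names, and the "+ 1" inside each stage is an assumption borrowed from the incrementer discussed earlier in the thread (the register stage m above does no arithmetic). It uses foldr (.) id rather than Clash's fold (.) so that a length of zero is also handled.

```haskell
import Data.List (mapAccumL)

-- One pipeline stage modelled on lists: a one-slot register that starts
-- empty (Nothing). Each cycle it emits the stored slot and stores the
-- incoming one, incremented underneath its tag.
stage :: [Maybe Int] -> [Maybe Int]
stage = snd . mapAccumL (\slot input -> (fmap (+ 1) input, slot)) Nothing

-- Compose n copies of the stage head-to-tail; no wire names needed.
pipeline :: Int -> [Maybe Int] -> [Maybe Int]
pipeline n = foldr (.) id (replicate n stage)
```

For example, pipeline 3 [Just 0, Just 10, Nothing, Nothing, Nothing] gives [Nothing, Nothing, Nothing, Just 3, Just 13]: three cycles of latency, three increments, and the Nothing tags mark the cycles in which no real data has arrived at the output yet.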

As to what you actually did, that would bear quite some analysis, and I hope somebody will do it. Maybe me. Maybe not!

Mahshid Shahmohammadian

Jun 7, 2021, 6:38:26 PM6/7/21
to clash-l...@googlegroups.com
Let me respond like this:

- What is your representation of a pipeline? What (name and) type have you given it?

I have not named the pipeline, but the first stage of the pipeline is named x (with type Signal dom a) which goes into "initial" for the fold. Fold constructs the pipeline stages every clk cycle.

- Is this vector a pipeline state? Part of a pipeline state? An input to the pipeline? Where do you create it and what is its name and type? The problem for me thus far is that you seem to be describing the pipeline I gave, and that does not correspond at all to the code you have written, so you must be thinking one thing and reading/writing another. You should check your thinking by devising some checks! Please add type declarations to the functions and objects you define in that code, and make sure that Clash's idea of what you have written corresponds to yours. You should find the answer is "no"!

The vector of ones I created is the +1 of the incrementer function I need for every stage of the pipeline. This code is something different from the other one we talked about previously (the one you proposed with mealy machines) and I'm just trying to ask some expert's opinion to see why this one takes much more resources on FPGA.

- What does the function iter do, and what is the function it constructs via fold, and what is the type it applies to and what is the type of what it produces? (Fold just applies a binary operator pairwise between the elements of a list/vector, so the information content in it is just the binary function part. I need to hear more than "something similar"! Please say exactly)

The function iter adds the pipeline element to 1 if ready is asserted otherwise a delayed version of that element is assigned for every clock cycle. That 1 comes from the vector I previously created. Because I created 16 ones this fold will continue for 16 cycles, and the last one will have 16 delays just like a pipeline structure.

-The trouble is, that needs to be a stream in order to be presented to iter as one of its arguments, so the type system will cause that to be interpreted as whatever the numerical representation of a stream of 1s is, which I presume is just "pure 1".
- Bottom line: declare types for your functions. I think you'll see you've been wrong about what the arguments and results are. Having "1" mean an infinite stream of Ints is certainly not what you intended. I think.

The type of my iter function is:
iter :: (Int,Bool,Signal dom a) -> Signal dom a -> (Int,Bool,Signal dom a)
So, isn't a signal supposed to work on stream of values instead of pure values?  The type of "1" I defined is not just Int, it's "Signal dom a" which I think should be a stream of 1 values that should be paired with the values from my iter function.

Thanks,
Mahshid




Peter Breuer

Jun 7, 2021, 7:40:37 PM6/7/21
to clash-l...@googlegroups.com
There's a small problem here:

> I have not named the pipeline, but the first stage of the pipeline is named
> x (with type Signal dom a) which goes into "initial" for the fold. Fold

A pipeline stage is not a signal. A signal goes INTO a pipeline stage
(and another comes out). A signal is what one finds on a wire. Wires
are attached to stages.

I don't think we can get past that.

But I think I understand what you intend. Instead of supplying a
vector of 16 1s, you are supplying a vector of 16 wires, on each of
which 1 is constantly asserted, dynamically, from cycle to cycle.

That saves you having to store 1 internally, or connect a wire to the
positive line, internally. OTOH it leaves open the question of who and
where are the 16 sources of 1s that you have shanghaied into
supplying, forever.

> constructs the pipeline stages every clk cycle.

Pipeline stages cannot be constructed dynamically (though it's a nice
idea!). They're silicon.

> The vector of ones I created is the +1 of the incrementer function I need

It isn't, but it may be your intention. Clash will tell you that you
have created a vector of 16 signals, each of which is carrying a 1,
repeated forever. That is not a vector of ones, but a vector of
signals, and each signal is carrying infinitely many 1s, timewise.

If you intended to make a vector of 1s, you would have written

replicate d16 1 :: Vec 16 Int

But Clash will tell you that you have created:

replicate d16 1 :: Vec 16 (Signal dom Int)

So it is a collection of 16 unnamed wires with 1s on each. As I said,
I get your idea.

It just seems mind-bending to me! Why would you supply a stream of 1s?

> for every stage of the pipeline. This code is something different from the
> other one we talked about previously (the one you proposed with mealy
> machines)

But a pipeline IS "mealy machines" arranged in sequence. What do you
imagine it as if not that?

What you may be describing is stream semantics. That is a system in
which every node in a topological network is understood as the
producer and consumer of several (different!) infinite streams of
data. Each stream has one node as origin, and one node as destination.
(Some nodes split incoming data into two outgoing copies of the input
stream).

In that kind of system, yes, you would provide a constant "1" not as a
simple parameter, but as a stream of 1s, one 1 arriving every clock
cycle, forever.

Is that what you are imagining? I can guess that in some vocabulary,
the streams may be referred to as "pipes" (of data), and that somehow
you have conflated that with "pipeline", which is a different word.

The dictionary will say:

pipeline

<architecture> A sequence of {functional units} ("stages")
which performs a task in several steps, like an assembly line
in a factory.

There is a connection in language, but it is serendipitous.

So ... if I were to replace "pipeline" by "data pipe" in what you have
written, would I understand better?

Peter

Mahshid Shahmohammadian

Jun 7, 2021, 7:58:55 PM6/7/21