VHDL if statement without else equivalent in Clash

Mahshid Shahmohammadian

May 21, 2021, 12:02:08 PM
to Clash - Hardware Description Language
Hi,

I am working on a serial handshaking combinator that performs a function (say, an incrementer for now) every n cycles. I need an if without an else. I have written it in Kansas Lava, where I had to use delay to assign to the signal whenever the else clause occurs. I'm new to Clash, so before moving to a monadic approach and using when, I wrote the code below. There, for example, the else branch of the x assignment assigns x to itself, which in Kansas Lava leads to a loop during code generation. I'm wondering what happens here in Clash, since it compiles and generates the VHDL code successfully. And how do you suggest writing this statement in Clash?

handshakeSerial :: (Num a) => State -> (Enabled a,Ready) -> (State,(Enabled a,Ready))
handshakeSerial state (dataIn,out_ready) = (state',(dataOut,in_ready))
  where
    dataOut   = case state of
                  Idle  -> Nothing
                  Valid -> x'
                  Ready -> Nothing
    in_ready  = case cnt of
                  0 -> True
                  _ -> False
    state'    = case state of
                  Idle  -> case dataIn of
                             Nothing -> Idle
                             _       -> Valid
                  Valid -> if out_ready then Ready else Valid
                  Ready -> if not out_ready then Idle else Ready

    cnt = if (state == Valid && cnt == 0) || (cnt > 0 && cnt < n) then cnt + 1
          else (if cnt == n && out_ready == True then 0 else cnt)
    x = if (state == Valid && cnt == 0) then dataIn else x
    x' = if (state == Valid && cnt == 0) || (cnt > 0 && cnt < n) then x + 1
          else (if cnt == n && out_ready == True then 0 else x)

type Enabled a = Maybe a
type Ready = Bool
data State = Idle | Valid | Ready  deriving (Eq,Show,Generic,NFData,ShowX)


Thanks,
Mahshid

peter.t...@gmail.com

May 22, 2021, 4:32:58 AM
to Clash - Hardware Description Language
Your x is only used to calculate x' for the value next cycle.

State Valid is the only condition in which x' slips through to data output

When state is Valid, cnt is n (nonzero) and the read-enable line ("out_ready") is low, then yes, x = x is what you have defined, so the value of x' is a looped calculation, and that gets through to the data output.

I assume you didn't mean that! I guess what you meant to write instead was that x should stay equal to the value it held before. So you need to hold on to that value in order to assign it to x now, by making your state remember x for one cycle, say as "x_", and then instead of "else x", write "else x_".

So you need to replace "State" by a pair "(State,datain)", then your first argument becomes not "state", but "(state,x_)".

You also need to provide the value of x you calculate now as an extra piece of state to remember for next cycle. Your result should now say not "state'" but "(state',x)".

It would be helpful to reformat the code to make it more easily readable. Say:

x = case (state,cnt) of
      (Valid,0) -> dataIn
      _         -> x_     -- NB previous value, not present value!

for example. Lots of white space.

PTB (who is about to ask a question ...)

Christiaan Baaij

May 22, 2021, 5:12:09 AM
to clash-l...@googlegroups.com
It is as Peter says: you seem to want to use the `x` from the previous clock cycle, so you need to make it part of your state.
So you probably want:
```
handshakeSerial :: (Num a) => (State, Enabled a) -> (Enabled a,Ready) -> ((State,Enabled a),(Enabled a,Ready))
handshakeSerial (state, xP) (dataIn,out_ready) = ((state',x'),(dataOut,in_ready))

  where
    dataOut   = case state of
                  Idle  -> Nothing
                  Valid -> x'
                  Ready -> Nothing
    in_ready  = case cnt of
                  0 -> True
                  _ -> False
    state'    = case state of
                  Idle  -> case dataIn of
                             Nothing -> Idle
                             _       -> Valid
                  Valid -> if out_ready then Ready else Valid
                  Ready -> if not out_ready then Idle else Ready

    cnt = if (state == Valid && cnt == 0) || (cnt > 0 && cnt < n) then cnt + 1
          else (if cnt == n && out_ready == True then 0 else cnt)
    x = if (state == Valid && cnt == 0) then dataIn else xP

    x' = if (state == Valid && cnt == 0) || (cnt > 0 && cnt < n) then x + 1
          else (if cnt == n && out_ready == True then 0 else x)
type Enabled a = Maybe a
type Ready = Bool
data State = Idle | Valid | Ready  deriving (Eq,Show,Generic,NFData,ShowX)
```
I'm not exactly sure which value you want to remember though:
1. The value of dataIn, or
2. The value of x'

Currently, the above code implements 2. If you meant option 1, you have to change to:
```
handshakeSerial (state, xP) (dataIn,out_ready) = ((state',x),(dataOut,in_ready))
```
Either option sorta corresponds to adding a `delay` function like you would in Kansas Lava.

Finally: Clash does not check for combinational loops when it translates Haskell to Verilog/VHDL; that's why Clash happily generates Verilog/VHDL for your original code.
There are some corner cases that make checking for "actual" combinational loops tricky, so we haven't created the infrastructure in the Clash compiler to check for combinational loops.
Also, you would already be able to witness combinational loops when you simulate/run your code as a regular Haskell program: you would get a blinking cursor because evaluation of your program gets stuck.
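
As an illustration of the "previous value as state" idea, here is a minimal sketch in plain Haskell (no Clash imports). It assumes that `Data.List.mapAccumL` is an acceptable list-based stand-in for `mealy` when simulating; the names `holdStep` and `simulateHold` are hypothetical and do not appear in the thread. The else branch returns the stored value, which is roughly what a VHDL if without else would infer as a register.

```haskell
import Data.List (mapAccumL)

-- One clock cycle of a "hold" register (hypothetical helper, not from the thread).
-- State: the value stored on the previous cycle.
-- Input: (load, new value).
-- The else branch returns the *stored* value, so x never refers to itself.
holdStep :: a -> (Bool, a) -> (a, a)
holdStep xPrev (load, dataIn) = (x, x)
  where
    x = if load then dataIn else xPrev

-- List-based stand-in for `mealy holdStep x0` (an assumption, for simulation only).
simulateHold :: a -> [(Bool, a)] -> [a]
simulateHold x0 = snd . mapAccumL holdStep x0
```

For example, `simulateHold 0 [(False,7),(True,5),(False,9),(True,2)]` evaluates to `[0,5,5,2]`: the value only changes on cycles where the load flag is set.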

Hope the above helps


peter.t...@gmail.com

May 22, 2021, 6:14:38 AM
to Clash - Hardware Description Language


"cnt" may have the same problem, going by the "cnt = ... else cnt". Hard to say if that ever gets through the logic. And I thought runtime in the interpreter says "<<loop>>" explicitly, when it can tell, so clearly I have too old a version!

Peter

Mahshid Shahmohammadian

May 24, 2021, 8:35:07 AM
to clash-l...@googlegroups.com
Thank you, Peter and Christiaan. A delayed version of x (and cnt) is what I need; as I mentioned, I implemented this with delay or register in Kansas Lava. I was curious why Clash successfully generates code for this implementation, which Christiaan has now clarified.

Should I do the same with cnt? That is, bundle both the previous x and the previous cnt with the state that goes into the mealy machine, something like this:

handshakeSerial :: (Num a) => (ST,Enabled a,Int) -> (Enabled a,Ready) -> ((ST,Enabled a,Int),(Enabled a,Ready))
handshakeSerial (state,x_,cnt_) (dataIn,out_ready) = ((state',x',cnt),(dataOut,in_ready))

  where
    dataOut   = case state of  Idle -> Nothing
                               Valid  -> x'
                               Ready  -> Nothing
    in_ready  = case cnt of  0 -> True
                             _ -> False
    state' = case state of  Idle -> case dataIn of
                              Nothing -> Idle
                              _       -> Valid
                            Valid  ->   if out_ready then Ready else Valid
                            Ready  ->   if not out_ready then Idle else Ready

    cnt = if (state == Valid && cnt == 0) || (cnt > 0 && cnt < n) then cnt + 1
          else (if cnt == n && out_ready == True then 0 else cnt_)
    x = if (state == Valid && cnt == 0) then dataIn else x_

    x' = if (state == Valid && cnt == 0) || (cnt > 0 && cnt < n) then x + 1
          else (if cnt == n && out_ready == True then 0 else x)

hsSerial :: (KnownDomain dom,
        GHC.Classes.IP (Clash.Signal.HiddenClockName dom) (Clock dom),
        GHC.Classes.IP (Clash.Signal.HiddenEnableName dom) (Enable dom),
        GHC.Classes.IP (Clash.Signal.HiddenResetName dom) (Reset dom),
        Num a, NFDataX a)
        =>
        Signal dom (Enabled a, Ready) -> Signal dom (Enabled a, Ready)
hsSerial = mealy handshakeSerial (Idle, Nothing, 0)

For some reason, when I simulate this in Clash I get no results:

handshakeSerialTest :: forall a . (NFDataX a, Num a) => [(Enabled a,Bool)]
handshakeSerialTest = simulate @System hsSerial [(Nothing,False),(Just 5, False),(Just 5, False),(Just 5,True),(Just 5, False)]

And with simulate_lazy I get:
[(Nothing,


Thanks,
Mahshid



--
Mahshid Shahmohammadian
Ph.D. Candidate
Computer Science Department
Drexel University

Martijn Bastiaan

May 24, 2021, 8:48:04 AM
to clash-l...@googlegroups.com

The definition of `cnt` depends on itself:

    cnt = if (state == Valid && cnt == 0) || (cnt > 0 && cnt < n) then cnt + 1
          else (if cnt == n && out_ready == True then 0 else cnt_)

I.e., if we assume `state` equals `Valid` and we want to know what `cnt` is, we first need to know what `cnt` is - as we want to check whether it's equal to zero. This is a combinatorial loop - Clash will never insert memory elements implicitly. The simulation will therefore get stuck in an infinite loop trying to evaluate `cnt`.
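
One way to break this loop can be sketched in plain Haskell, under the assumption that every read of `cnt` on the right-hand side was meant to refer to the previous cycle's value `cnt_`. The function name `nextCnt` and the explicit `n` parameter are hypothetical; the `State` type is repeated (with fewer derived classes) only to keep the sketch self-contained.

```haskell
data State = Idle | Valid | Ready deriving (Eq, Show)

-- Next counter value, computed only from the registered previous value cnt_.
-- No occurrence of the result appears on the right-hand side, so there is
-- nothing for the evaluation to loop on.
nextCnt :: Int -> State -> Bool -> Int -> Int
nextCnt n state outReady cnt_
  | (state == Valid && cnt_ == 0) || (cnt_ > 0 && cnt_ < n) = cnt_ + 1
  | cnt_ == n && outReady                                   = 0
  | otherwise                                               = cnt_
```

With `n = 3`, state held at `Valid` and `out_ready` held high, `take 6 (iterate (nextCnt 3 Valid True) 0)` gives `[0,1,2,3,0,1]`: the counter runs up to n and wraps. In the mealy machine, the result of this function would be returned as the new state component rather than read back in the same cycle.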

Martijn Bastiaan

May 24, 2021, 8:50:05 AM
to clash-l...@googlegroups.com

Also, I see you're using the underlying representation of the various Hidden* constructs we've got. (This makes sense as this is unfortunately what is being shown in error messages.) You could replace:

KnownDomain dom,
        GHC.Classes.IP (Clash.Signal.HiddenClockName dom) (Clock dom),
        GHC.Classes.IP (Clash.Signal.HiddenEnableName dom) (Enable dom),
        GHC.Classes.IP (Clash.Signal.HiddenResetName dom) (Reset dom)

with

HiddenClockResetEnable dom

Cheers,
Martijn


Mahshid Shahmohammadian

May 24, 2021, 9:01:12 AM
to clash-l...@googlegroups.com
Right, this should depend on "cnt_", the previous value of cnt, not on cnt itself.

Thanks,
Mahshid

Mahshid Shahmohammadian

May 27, 2021, 2:22:50 PM
to Clash - Hardware Description Language
I'm facing a new challenge and would like to ask for your recommendation. Remember my serial combinator; now consider a pipelined version, which I have implemented in Kansas Lava as well. I'm implementing it using a fold over the Vec type. My problem is again with delays; for the serial version you suggested passing the previous values to the function that is fed to the mealy machine.

Please take a look at the code below:

handshakeParallel :: (Num a) => (Int,a,Valid) -> (Bool,a,Valid,Ready) -> ((Int,a,Bool),(a,Valid,Ready))
handshakeParallel (n,x_,valid_out_) (rst,dataIn,in_valid,out_ready) = ((n,x',out_valid),(x',out_valid,in_ready))
  where

    out_valid = some logic ...
    in_ready  = out_ready

    x = if (rst == False && out_ready == True) then dataIn else x_
    ones = replicate (SNat :: SNat 16) 1
    initial = (0,out_ready,x)
    (_,_,x') = foldl (\(i,ready,f) g -> if ready then (i,ready,f+g) else ??? ) initial ones


hsParallel :: (HiddenClockResetEnable dom, Num a, NFDataX a)
                  => Signal dom (Bool, a, Valid, Ready) -> Signal dom (a,Valid, Ready)
hsParallel = mealy handshakeParallel (16, 0,True)    --- (n,x_init, valid_out_init)


In the function passed to the fold, I want to say: if ready is asserted, add 1 to the input (since this combinator just adds 1 for now); otherwise, give me the previous value. How can I refer to the previous value inside this function? Am I required to use "delay", which operates on Signal types, so that I need to lift my types to "Signal a"? And if so, should I write my conditional statements using only "mux" once I am in the Signal realm?

Also, in the lambda expression (\(i,ready,f) g -> if ready then (i,ready,f+g) else ??? ), the f+g should be replaced with (delay f + g) if I want a pipelined structure, which I guess makes the need for delay even stronger.

Thanks a lot in advance,
Mahshid

Christiaan Baaij

May 28, 2021, 3:43:49 AM
to clash-l...@googlegroups.com
You say you implemented this function in Kansas Lava as well, could we see it? (Just a link to some online source repo, e.g. github / github gist, is sufficient).
That way I have a better understanding of what you're trying to do.

peter.t...@gmail.com

May 28, 2021, 7:14:07 AM
to Clash - Hardware Description Language
I admit to being baffled by the code too. The last lot had me beswazzled because state' did not depend on state! There was a correlation with several things, but no direct statement of the intended evolution. (An explanation of why it is hard for a human to understand might be that it was originally a finite-state machine generated from a diagram, rendered in code, and then partially translated/reverse-engineered by a human again; that is my guess.) Without the first code (or the Lava) as a solid basis for understanding, it is practically impossible, at least for this human, to repartition it into parallel/pipelined units.

But people here do the impossible all the time, so maybe! Good luck!

I'll have a go at rendering your code in a way that I find more parseable .... (let me know if I err):

handshakeParallel :: (Num a)
                  => (Int,a,Valid)         -- state   (last count? last "x"? nominal state)
                  -> (Bool,a,Valid,Ready)  -- inputs  (reset? data in? what? flow control?)
                  -> ( (Int,a,Bool)        -- state
                     , (a,Valid,Ready)     -- outputs (data out? what? flow control?)
                     )
handshakeParallel (n,   x_    ,valid_out_)
                  (rst, dataIn,in_valid, out_ready) =
                      ( (n,x',out_valid)
                      , (x',out_valid,in_ready)
                      )
  where

    out_valid = some logic ... -- I need to see it! How does it depend on state? Please elaborate.
    in_ready  = out_ready      -- WHY is this here?


Above you should drop the out_ready as it plays no role  and we want to simplify for understanding. You can connect the streams via "out_ready = in_ready"  after having constructed the mealy machine without the extra output. (I don't know what names you will give to the streams at that point, so you will have to modify these names to match the ones you use for the inputs and outputs to/from the mealy machine).

    x = if (rst == False && out_ready == True) then dataIn else x_

That is surprising if rst is really a reset. Shouldn't reset set x to something like 0?

    x = case undefined of
          _ | rst            -> x_   -- (really? Surely 0?)
          _ | not out_ready  -> x_
          _                  -> dataIn  -- so out_ready gates dataIn to x
   
It looks from the above like x is intended to be dataOut! Is that what it really is, perhaps?

The following fold is really quite far from parseable for me personally and I would ordinarily guess on that account that it is mistaken. Can you explain for me in words what it is trying to do? Perhaps some inline types would aid my reading?

    init = (0,out_ready,x)
    (_,_,x') = foldl fn init  (replicate d16 1)
               where fn (i,ready,f) g = case ready of
                                          True  -> (i,ready,f+g)
                                          False -> -- need to see this!

Fold works its way along the input (which is 11111...) accumulating a result. That result seems to consist of three things: a count (discarded; I can't tell what it counts because the code is not there :-( ); a yes/no boolean that starts out as "out_ready" and does not change in the code shown, so the business end must be in the elided code; and a sum that starts with x and adds 1 each time in the code shown, so it ends up as x+16 (16 being the number of 1s in the vector).

The initial condition out_ready does not change, and the vector it is applied to is a constant, so this (x') is a function of x and out_ready ONLY. What is it intended to be? The part shown is

   x' = case out_ready of
          True  -> x+16
          False -> ????

My conviction is that the code with fold in must be wrong? The types and portion of code given say this is a simple function with no need to use fold. Can you say something about that to put my mind at ease?

hsParallel :: (HiddenClockResetEnable dom, Num a, NFDataX a)
           => Signal dom (Bool, a, Valid, Ready)  -- inputs
           -> Signal dom (a,Valid, Ready)         -- outputs
hsParallel = mealy handshakeParallel (16, 0,True) -- (n,x_init, valid_out_init)

Without knowing with a great deal more certainty what the semantics is intended to be, I can't really offer a parallelization. Those things are hard enough to get right with all the information in the world!

For me, the first mystery to resolve is why your mealy machine has a next state that does not depend explicitly on the previous state plus new inputs. It is not natural for a human being to write that as a mealy machine! Can you shed some light there?

If it were really true that there is no semantic connection between prior and next state, then you could just not bother with a mealy machine. But I think there is some connection and it has been obscured in the coding. Can you make the dependence explicit, please? That would help my understanding a lot.

Regards

PTB

peter.t...@gmail.com

May 28, 2021, 7:38:26 AM
to Clash - Hardware Description Language

Are you perhaps trying to add up  the 16 last inputs for which the ready signal was high when they arrived ? Or something not a million miles away from that?

If so, the idea of using fold is mistaken because (a) the vector it is applied to must be  present right now, and you are trying to apply it "across time" (I guess!). You must instead accumulate the 16 last inputs into a vector present in the here and now, and apply fold to that vector, right now. But please don't do that, because...

(b) the idea of using fold was rather baroque because the function it implements does not look hard at all, so you didn't need a sledgehammer.

My guess is that you want to accumulate some ongoing count or sum of inputs, each gated by a ready signal that was present at the same time as the input; to output that evolving count or sum while flow control requests it; and to reset it on a signal too. Is that it?

People will be able to render that for you quite simply (but illegibly!) as a combination of operations on signals. You'd be surprised.

For example, the signal that at each moment in time contains the last 16 inputs (including the last) as a vector is

last16Ins :: (HiddenClockResetEnable dom, NFDataX a, Num a) => Signal dom a -> Signal dom (Vec 16 a)
last16Ins dataIns = window
  where window = (+>>) <$> dataIns <*> register (replicate d16 1) window

or something similar. Details left to the reader! Then you can just fmap f onto that, where f does whatever you want on that vector of 16 things you have accumulated over the last 16 cycles. (Maybe the vector should record the data_ready input too!)

Mahshid Shahmohammadian

May 28, 2021, 12:44:38 PM
to Clash - Hardware Description Language
Sorry for the confusion the code I provided may have caused. Thank you Peter for trying to parse the function. The idea of linking to a repository for a better understanding of the functionality of this pipelined incrementer makes sense. Please check the repo here:


And my Kansas Lava implementation is in:

gen-vhdl/incrementer/kansas-lava/incrementer.hs --> parallel version is the function "parallelIter"

Also, I have a handwritten VHDL version of this functionality that helps to understand what I'm talking about which is:



Thanks,
Mahshid

Mahshid Shahmohammadian

May 28, 2021, 1:27:04 PM
to Clash - Hardware Description Language
Peter,

The last16Ins is not quite what I have in mind, although it is kind of similar. If the ready signal is asserted, the circuit assigns (dataIn+1) to the first pipeline stage, and the remaining pipeline stages are incremented as well. The valid signal is treated as a shift register when ready is asserted. Finally, the last stage of the pipeline is output to dataOut.

So, in simulation, the waveform will look like the attached file.

[Attachment: parallel-inc.png]

peter.t...@gmail.com

May 28, 2021, 5:30:13 PM
to Clash - Hardware Description Language
That's more HOW than WHAT, but I get the idea, I think ...


1  Check if the ready signal is asserted assigns the (dataIn+1) to the first element of the pipeline stage,

Assuming the state is a vector of somethings representing what is in the various stages at one time, that is

           v' = if ready then replace 0 (dataIn+1) v else v



2. and also the rest of the pipeline stages will be incremented.

So that is       

   v' = if ready then map (+1) (replace 0 dataIn v) else v


3. The valid is treated as a shift register when ready is asserted.

I don't quite parse that. Do you mean that actually the stuff also all shifts up one position  when ready is high? That would be

  v'  = if ready then map (+1) ( dataIn +>> v) else v

I am assuming that nothing moves and/or is incremented and/or introduced when ready is low! You didn't say.

Finally, the last stage of the pipeline is outputted to dataOut.

  dataOut = last v  -- or do you mean last of v', the changed vector? I guess you meant the former.

Yes? Now that is a mealy machine as written, but it could be split up. I'll talk about that below. Meanwhile the mealy machine is

  mymachine :: (HiddenClockResetEnable dom , dataIn ~ data, dataOut ~ data, ready ~ Bool) -- what types really?
            => Signal dom dataIn -> Signal dom ready -> Signal dom dataOut
  mymachine = mealy f init . curry bundle       -- playing silly with ". curry bundle" to give you the nice type above
    where
      f :: (Vec 16 data,(dataIn,ready)) -> (Vec 16 data,dataOut)
      f (v,(dataIn,ready)) = (v',dataOut)
        where v' = if ready then map (+1) ( dataIn +>> v) else v
              dataOut = last v
      init = def :: Vec 16 data  -- FIXME, specify please

      
OK? I don't know if that is exactly what you meant because the English isn't fully determinative with respect to some points of detail. I have no sure feeling, for example, whether you really meant to increment ALL the stages at once in the situation where anything happens at all. What's the point? The data will just all end up having been incremented by 16 by the time it gets to the end of the pipeline (the "vector"), so why not just increment it by 16 in one go instead of by 1, 16 times over?

Maybe it's just an exercise, and it's not meant to make too much sense in practical terms!

What is interesting is that what comes out is delayed by at least 16 cycles (and incremented by 16, as per the above), and likely in practice is delayed by considerably more. That is because every cycle in which ready is down adds one cycle to the delay, as the pipeline does not move at all on that cycle. Did you really mean  that? I assume so. Otherwise it's just a last16Ins producer, and you didn't want that.
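
To make the delay and the stalling visible, here is a small list-based model of the vector-state machine in plain Haskell. It is a sketch under assumptions: a list stands in for `Vec`, `mapAccumL` stands in for `mealy`, the element type is fixed to `Int`, the pipeline starts out all zeros, and the names `pipeStep` and `simulatePipe` are hypothetical.

```haskell
import Data.List (mapAccumL)

-- One clock cycle of the vector-state machine, with a list standing in for Vec.
-- When ready is high, the new input is shifted in at the front, the oldest
-- element is dropped, and every stage is incremented; otherwise nothing moves.
-- The output is the last element of the *old* state.
pipeStep :: [Int] -> (Bool, Int) -> ([Int], Int)
pipeStep v (ready, dataIn) = (v', last v)
  where
    v' = if ready then map (+ 1) (dataIn : init v) else v

-- List-based stand-in for `mealy`, starting from an all-zero pipeline.
-- The depth must be at least 1, because last and init fail on an empty list.
simulatePipe :: Int -> [(Bool, Int)] -> [Int]
simulatePipe depth = snd . mapAccumL pipeStep (replicate depth 0)
```

With a depth of 3, `simulatePipe 3 [(True,10),(True,20),(False,99),(True,30),(True,40),(True,50)]` yields `[0,1,2,2,13,23]`. The input 10 emerges as 13 (incremented once per stage) on the fifth cycle: three cycles to traverse the stages plus one cycle lost while ready was low.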

You really want this  to be not a single machine handling a vector of 16 values, but 16 stages each handling one value each. I'll do that now.

So you would write

        dataOuts = (m15 readies . m14 readies . ... . m1 readies . m0 readies) dataIns

where each of those 16 machines (each takes the ready signal) takes a dataIn and produces a dataOut. I'll do that more succinctly lower down, but it helps  to see it written out "longhand" first, I think. Each machine has the formal type just announced:

      m0, m1, ..., m15 :: (HiddenClockResetEnable dom, ...) => Signal dom ready -> Signal dom dataIn -> Signal dom dataOut


They are all identical, aren't they?

    m0 = m
    m1 = m
    ...
    m15 = m

OK, I got tired already. Let's cut the longhand and just write

  dataOuts = (m readies . m readies . ... . m readies . m readies) dataIns

What we need is the vector of 16 machines already wired with the ready signal:

   replicate d16 (m readies)

and we need to compose them:

   dataOuts  = compose (replicate d16 (m readies))  dataIns

and "compose" had better mean a fold of the binary function composition operator

  dataOuts = compose (replicate d16 (m readies))  dataIns
                   where compose = fold (.)
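
The composition idea can be tried out on plain Haskell lists before committing to Clash types. This is only a sketch: sample lists stand in for signals, `foldr (.) id` stands in for a fold of `(.)` over a vector of machines, each stage starts at 0, and the names `stage` and `pipeline` are hypothetical.

```haskell
import Data.List (mapAccumL)

-- A single one-element stage as a function on sample lists.
-- It stores one value; when ready is high it takes in (input + 1),
-- and it always outputs the value stored *before* this cycle.
stage :: [Bool] -> [Int] -> [Int]
stage readies dataIns = snd (mapAccumL step 0 (zip readies dataIns))
  where
    step s (ready, d) = (if ready then d + 1 else s, s)

-- n identical stages, each partially applied to the same ready stream,
-- composed with a fold of (.).
pipeline :: Int -> [Bool] -> [Int] -> [Int]
pipeline n readies = foldr (.) id (replicate n (stage readies))
```

For instance, `pipeline 3 [True,True,False,True,True,True] [10,20,99,30,40,50]` gives `[0,1,2,2,13,23]`, which is what a single machine holding three values would produce for the same inputs if it shifts and increments on ready and outputs its oldest stored value.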

It is fervently to be hoped that Clash can smash all that abstract statement out into a flat application, so I don't have to help it at all in any way, not by crossing my fingers even. That remains to be seen.

The single machine "m" is the mealy machine with a vector length 1 in it, instead of length 16.

  m :: (HiddenClockResetEnable dom , dataIn ~ data, dataOut ~ data, ready ~ Bool)
    => Signal dom dataIn -> Signal dom ready -> Signal dom dataOut
  m = mealy f init . curry bundle
    where
      f :: (Vec 1 data,(dataIn,ready)) -> (Vec 1 data,dataOut)
      f (v,(dataIn,ready)) = (v',dataOut)
        where v' = if ready then map (+1) ( dataIn +>> v) else v
              dataOut = last v
      init = def :: Vec 1 data  -- FIXME, specify please


I copied that from higher up and changed the 16s to 1s. You could beat up on it for having vectors length 1 instead of just the data inside the vector. Shrug. I'm lazy.

Is this like what you were thinking of?

Regards

PTB

Mahshid Shahmohammadian

May 28, 2021, 6:19:50 PM
to clash-l...@googlegroups.com
Thanks a lot for your complete elaboration! Other than the valid part, everything you mentioned is what I meant. Valid is another input signal to the circuit; we keep its values in a shift register and output out_valid as the last element (MSB) of that register. Also, to answer your question about why we don't just add 16 and delay by 16: this is going to be a generic circuit (a combinator), not one that only increments.

Your idea to pass the vector as a state to the mealy machine sounds good! So I'm just going to go with that.

Thanks,
Mahshid


Mahshid Shahmohammadian

May 31, 2021, 12:40:34 PM
to clash-l...@googlegroups.com
I gave this more thought. In the first implementation, we can ignore the reset signal and just use your function m as the single machine for the other 15 stages of the pipeline. The first stage is different from the others: it gets the input data, whereas the rest of the machines get their data from the previous machine in the pipeline. I'm not sure the composition is going to be the same, is it?

Do you think this implementation is going to use fewer resources after synthesis compared to the fold implementation I suggested along with using delays (since you are way more expert than me)?

Peter Breuer

May 31, 2021, 1:59:45 PM
to clash-l...@googlegroups.com
The first is the same machine as the rest. Just its input has a different name. Honest!

(Brevity caused by phone gui - apologies!)

Peter

peter.t...@gmail.com

May 31, 2021, 6:58:18 PM
to Clash - Hardware Description Language
Now at a computer, let me elaborate.

First of all, I think you should concentrate first on your solution using a single mealy machine with a vector as state. When you have that working, but not before then, you can think about splitting it up into a pipeline of 16 mealy machines, following roughly what I did. One thing at a time! (though it's great that you are thinking ahead).

I made several typos/thinkos in writing down my own mealy machine on that vector, so I _know_ you will have to think about that first in order to get it working. I didn't intend that, apologies, but it's quite serendipitous from that point of view :-). On my conscience are

Thinko 1) I forgot to prefix the function type declaration with "forall (dom::Domain) data dataIn dataOut . " in order that the type declarations in subfunctions work (using those named types). Otherwise they'll just be a headache.

Thinko 2) I wrote down the (ready,dataIn) arguments the wrong way round, as (dataIn,ready).  It has to be "ready first" in order that the readies stream can be given as a (first) parameter to the mealy machine(s), which in turn allowed that neat trick of partially evaluating the tiny little mealy machines on the readies stream as "m readies" before replicating them in a vector and composing them as functions on their remaining input stream. Ahem. You'll probably spot that and correct it without a second thought.

OK? I reiterate that I don't see any difference in the semantics of the tiny little machines, if ever you get to the point of making those. The names of their input streams, or what is on them, are none of their business as machines. They take what they are given, and that's that. That some of them are taking the outputs of other machines as their input, and one is taking the dataIns stream, is unknown to them all. They do what they do. If I miscalculated and somehow there should be some difference between their semantics, my apologies, but as I vaguely understood the intended functionality, there should be none. You may well know better!

You confused me with that "valid" business because without my allegedly erroneous interpretation of what you said, the pipeline would never move!

You forgot to tell me to move the whole pipeline up one position when ready is asserted (or perhaps on every cycle! I don't know!). I misinterpreted what you said as telling me to do that. Now I don't know what you mean, as you tell me I got it wrong. But you should take note that you _MUST_ move the pipeline along explicitly yourself, if you intend to do so. It won't happen on its own. The mealy machine does what you tell it to and nothing else. Don't tell it to move the pipeline up one, and it won't!

Also the "valid" stuff as you now describe it just sounds like a parallel pipeline/vector moving an incoming "valid" signal up in step with the pipeline contents (yes, I know you think it's a bitvector, not a vector, but the difference is slight). Do you mean to keep the "valids" in step with the real pipeline content or not? If yes, why not attach them, and  move pairs (data,valid) up the  pipeline, instead of just "data"? You may say it uses more gates, but I don't think with small numbers like "16" hanging around that _anything_ can use too many gates, and the bigger danger right now is getting the semantics wrong. When you have it right, the easy way, then you can obscure the code for some greater implementation efficiency! Nothing is more difficult than keeping parallel accounting in step. What happens on reset? Make it as easy on yourself as possible!
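
The suggestion of moving (data,valid) pairs along together can be sketched in the same list-based style. As before, this is plain Haskell under assumptions: a list stands in for the vector, `mapAccumL` stands in for `mealy`, empty slots start as `(0, False)`, and the names `Slot`, `pairStep` and `simulatePairs` are hypothetical. What should happen on reset is left open, as in the discussion.

```haskell
import Data.List (mapAccumL)

-- Each pipeline slot carries its data together with its valid flag,
-- so the two cannot drift out of step.
type Slot = (Int, Bool)

-- One clock cycle: on ready, shift the incoming pair in and increment the
-- data part of every slot; the valid flags just travel along unchanged.
-- The output is the last slot of the *old* state.
pairStep :: [Slot] -> (Bool, Slot) -> ([Slot], Slot)
pairStep v (ready, slotIn) = (v', last v)
  where
    bump (x, ok) = (x + 1, ok)
    v' = if ready then map bump (slotIn : init v) else v

-- List-based stand-in for `mealy`; the depth must be at least 1.
simulatePairs :: Int -> [(Bool, Slot)] -> [Slot]
simulatePairs depth = snd . mapAccumL pairStep (replicate depth (0, False))
```

With a depth of 2 and ready held high, feeding `(10,True)`, `(20,False)`, `(30,True)`, `(0,False)` produces `[(0,False),(1,False),(12,True),(22,False)]`: each output's valid flag belongs to the same input as its data, with no separate bookkeeping.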

But in any case, you have very little control over how Clash renders your code. In principle you can write whatever you like within one mealy machine and provided the functionality is the same, Clash will render that the same way, no matter how hard you try to obscure or disguise it :-) (in practice that is not true, but it is a fair first approximation to the truth). So you should not care about expressing what you mean one way or another. Clash will just smash the whole thing out into combinatorial logic, stuck inside a single loop, with one clock delay.  Same semantics for the combinatorial logic, same ("normalized") logic expression results ("in principle").

 [The reality is that rendering into vhdl or verilog will intercept higher level constructs than AND and NOR gates, which will result in different expressions in those languages ... nevertheless, vhdl or verilog compilers if asked to render that into pure logic gates should always complete the normalization - in their own fashion. You are fundamentally working in a computationally decidable domain, in which the semantic equality of syntactic expressions can in principle be decided in finite time and syntactic differences after normalization must mean differences in semantics ... so sue me for not mentioning that computational complexity can make that ideal effectively impossible.]

The only real difference you can make is with that split into 16 mealy machines in a pipeline. Even then I wouldn't swear that will make any difference, unless you also declare "NOINLINE m" on the little mealy machines. That will definitely stop Clash trying to smash those together, if it does that (which I give about 70/30 odds on).

I also was concerned because those 16 machines in a pipeline apparently don't have any delay between their input and output. BUT, but but, I was careful to define their output as "last v", not "last v'", which should provide the missing delay. Mealy machines left to themselves do output on the same cycle as their input, just a little later in the cycle. Ordinarily the effect of an input would cascade all the way up that pipeline in the same cycle, making you have to go very slow with the clock! But I disengaged the output from the input via "last v" (it would have been engaged via "last v'", because the input dataIn influences v', while v is the last state of the vector, before the input arrived), so there is no cascade. Please be careful and don't accidentally connect the input to the output in the same cycle and then put everything in a pipeline ...! It is the way I had it deliberately.
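To make the "last v" versus "last v'" point concrete, here is a minimal sketch in plain Haskell. It uses lists in place of Clash's Vec and Signal, and the names runMealy, shiftIn, stepOld and stepNew are assumptions made up for this illustration, not anything from the code earlier in the thread.

```haskell
import Data.List (mapAccumL)

-- Run a Mealy transition function over a finite list of inputs.
-- (A list-based stand-in for Clash's `mealy`; an assumption for illustration.)
runMealy :: (s -> i -> (s, o)) -> s -> [i] -> [o]
runMealy f s0 = snd . mapAccumL f s0

-- Shift the input in at the front and drop the last element,
-- in the manner of `dataIn +>> v` on a Clash vector.
shiftIn :: a -> [a] -> [a]
shiftIn x v = x : init v

-- Output taken from the OLD state: the input cannot reach the output
-- in the same cycle, so the stage contributes one cycle of delay.
stepOld :: [Int] -> Int -> ([Int], Int)
stepOld v x = (v', last v)
  where v' = shiftIn x v

-- Output taken from the NEW state: with a one-element state the input
-- shows up on the output in the same cycle (a combinational path).
stepNew :: [Int] -> Int -> ([Int], Int)
stepNew v x = (v', last v')
  where v' = shiftIn x v

main :: IO ()
main = do
  print (runMealy stepOld [0] [1, 2, 3])  -- [0,1,2]
  print (runMealy stepNew [0] [1, 2, 3])  -- [1,2,3]
```

With a one-element state, the second variant is a plain wire from input to output, which is exactly the cascade to avoid when many such stages are chained.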

That's about all I can think of to say about it right now.

Ask again when you have the vector mealy machine working.

Regards

PTB

Mahshid Shahmohammadian

May 31, 2021, 8:48:47 PM5/31/21
to clash-l...@googlegroups.com
I already implemented the single-machine version, but in my own way (since I could not figure out that " . curry bundle" part and got syntax errors). Here it is:

m16 :: (Num a) 
    => (Vec 16 a,Vec 16 Bool)
    -> (Bool,a,Valid,Ready)
    -> ((Vec 16 a,Vec 16 Bool),(a,Valid,Ready))
m16 (x_,v_) (rst,dataIn,in_valid,out_ready) = ((x',v'),(dataOut,out_valid,in_ready))
  where
    v' = if rst then (replicate d16 False)
         else (if (out_ready == True) then (in_valid +>> v_) else v_)

    x' = if (rst == False && out_ready == True)
         then (map (+1) (dataIn +>> x_)) else x_
    out_valid = last v_
    dataOut = last x_
    in_ready  = out_ready

hsm16 :: (HiddenClockResetEnable dom, Num a, NFDataX a)
       => Signal dom (Bool, a, Valid, Ready)
       -> Signal dom (a,Valid, Ready)
hsm16 = mealy m16 (replicate d16 0, replicate d16 False)

After synthesizing this, it results in far fewer LUTs than the fold implementation that I mentioned before. So I wonder whether, if I can get the composition of those 16 machines to work, I can reduce the resource utilization by a lot! I'm kind of stuck on that curry and (.), which I think you used to get an output of type "Signal dom data" out of the mealy call. Could you please double-check that?

BTW this is my other version which takes much more resources. I would appreciate it if you share any thoughts on why this one is utilizing way more LUTs.

handshakeParallel :: (HiddenClockResetEnable dom, Num a,NFDataX a)
                   => (Bool, Signal dom a,Valid,Ready)
                   -> (Signal dom a,Valid,Ready)
handshakeParallel (rst,dataIn,in_valid,out_ready) = (x',out_valid,out_ready)
  where   
    val_vec = replicate d16 in_valid
    vec' = if (rst == False && out_ready == True)
           then rotateLeft val_vec (length val_vec - 1)
           else rotateLeft val_vec (length val_vec - 2)      
    out_valid = if (rst == True) then False else (vec' !! 0)

    x = if (rst == False && out_ready == True) then (dataIn+1) else (delay 0 x)
    ones = replicate d15 1
    initial = (0,out_ready,x)
    (_,_,x') = foldl iter initial ones

iter (i, ready, f) g = (i',ready',f')
   where
     f' = if ready then ( f + g) else (dflipflop f')
     ready' = ready
     i' = i + 1


Thanks,
Mahshid






peter.t...@gmail.com

Jun 1, 2021, 2:31:19 AM6/1/21
to Clash - Hardware Description Language
Great! Here's an update. I'll explain below (yes, it really does what you want).

import Clash.Prelude
type Valid = Bool
type Ready = Bool
m16_s :: ( Num a
         , dataIn ~ a, dataOut ~ a
         , validOut ~ Valid            -- = Bool
         , readyOut ~ Ready            -- = Bool
         )
    => Vec 16 (Maybe a)                -- entries tagged as valid/not valid
    -> ( readyOut, dataIn)             -- validIN really! + data
    -> ( Vec 16 (Maybe a)
       , (validOut, dataOut)
       )
m16_s x (readyOut,dataIn) = ( x', (validOut,dataOut))
    where
    x' = if readyOut then map (fmap (+1)) (Just dataIn +>> x) else x
    validOut  = case last x of
                  Nothing -> False -- invalid slot
                  _       -> True  -- valid   slot
    dataOut   = case last x of
                  Nothing -> 0     -- invalid slot
                  Just x  -> x     -- valid   slot

m16 :: (HiddenClockResetEnable dom, Num a, NFDataX a)
    => Reset dom -> Signal dom (Ready, a) -> Signal dom (Valid,a)
m16 rst = withReset rst (mealy m16_s def)

So ... it is now clear what that "valid" thing is about. Because you start with a vector of 0s, you can't tell when a 0 is in the state vector because it was newly input as data or it was just there before anything at all was input and is now coming out. Your vector of "valids" just filled up as 00000, 00001, 00011, 00111, etc as more data entered your pipeline. When a 1 gets to the end of that vector, you know that the corresponding data in the parallel vector alongside it is for real, whether it is a 0 or not a 0. You might as well have counted to 16! Using a rolling vector of 1/0s (aka True/False)  saved you about 10 gates on addition, since you needed three full adders and one half adder to increment a 4-bit count. Maybe less. Anyway, it isn't needed ...

... because the simple thing is just to tag the data in the state vector with that one extra "valid" bit saying if it is for real or not.  That also uses 16 extra bits of storage, just like you had, and it also rolls the valid bit along without needing any more logic than that.

So instead of having a "Vec 16 a" as the type of your state, you need a "Vec 16 (Maybe a)". The Maybe means the entries are tagged with a "Just" when they are for real. (Otherwise they are tagged as "Nothing").

Then in your code, where you had a "dataIn +>> x", instead you need to add the "for real" tag to the data going in and write "Just dataIn +>> x".

Now you have a slight difficulty in incrementing the stuff in the state vector, because you have to increment underneath the tags. So instead of writing "map (+1) ...", you have to write "map (fmap (+1)) ...". The "fmap" turns the "+1" into something that works underneath the tags, so it turns "Just x" into "Just (x+1)" (and it leaves "Nothing" as  is).

 Your data will now trundle up the pipeline with a "yes, I am for real" tag attached, if it is for real, because the Just is only attached on data that is incoming when the ready signal is high. I've made sure the state vector is initialized with Nothings with the mealy machine initial state, which is "def".

Of a vector, that means to put a default value in each entry of the vector, and the default value for any "Maybe foo" type is "Nothing", so that's what one will get in the starting state vector entries.

One will also get that when the reset signal is applied.
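As a rough check of the tagging idea, here is a sketch of the same transition function in plain Haskell, with a list standing in for Vec 16 and mapAccumL standing in for mealy. The names Pipe, step, initial and run are assumptions for this illustration only, and reset is not modelled.

```haskell
import Data.List (mapAccumL)
import Data.Maybe (fromMaybe, isJust)

-- List-based stand-in for the 16-slot state vector.
type Pipe a = [Maybe a]

-- One clock tick: same shape as m16_s, with a list in place of Vec 16.
step :: Num a => Pipe a -> (Bool, a) -> (Pipe a, (Bool, a))
step x (ready, dataIn) = (x', (validOut, dataOut))
  where
    -- Shift the tagged datum in, then increment underneath the tags.
    x' | ready     = map (fmap (+ 1)) (Just dataIn : init x)
       | otherwise = x
    validOut = isJust (last x)        -- is the last slot tagged as real?
    dataOut  = fromMaybe 0 (last x)   -- 0 stands in for "no data"

-- Initial state: every slot is Nothing, mirroring `def`.
initial :: Pipe a
initial = replicate 16 Nothing

run :: Num a => [(Bool, a)] -> [(Bool, a)]
run = snd . mapAccumL step initial

main :: IO ()
main = do
  let outs = run [(True, n) | n <- [10 .. 27 :: Int]]
  print (take 16 outs)  -- sixteen (False,0) entries
  print (drop 16 outs)  -- [(True,26),(True,27)]
```

The first sixteen outputs are flagged invalid because the slots still hold Nothing; after that each datum emerges incremented sixteen times, once per slot it passed through.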

I've taken out your "rst" input because as far as I could see what it did was return the state vector to the initial state (I didn't check too carefully, but that seemed to be the gist). You can do that by just signalling reset to the mealy machine when you have it as a complete build (the "m16" above).

I haven't bothered with the output that was just equal to an input - or something like that. That's a wire! In parallel.

You can now see that your validOut signal (sorry, was that out_valid originally? I tried to make all the names follow the same pattern) is just checking that the last entry in the state vector is tagged OK. I believe that follows the spirit of what you intended. You just had the tags trotting along in an auxiliary vector in parallel, instead of attached to the data in the actual state vector. It's just the same, but less complicated.

I have taken care to ensure that you can just abbreviate these machines to a 1-element state vector and then connect 16 of them up head-to-tail in a chain, and they should still work. You have to pass the _same_ reset signal to them all, so you will eventually write

  m16 rsts = compose (replicate d16 (m1 rsts))

for the complete machine built as 16 small machines. The input signal type is the same as the output signal type. That is, the input is Signal dom (Bool,a) and the output is Signal dom (Bool,a). You called one of the Bools "Valid" and the other "Ready" (I think). The intent is the same both on input and output, if I understand what is going on correctly. It is to signal that the accompanying data value is meaningful, not just some garbage that happened to be lying around and is still here. So I am pretty confident that chaining will work.

(not that I am going to stand close enough to try it ...)

Peter

peter.t...@gmail.com

Jun 1, 2021, 7:35:50 AM6/1/21
to Clash - Hardware Description Language
PS. I can't tell you why one thing uses more LUTs than another because I have little idea what a "LUT" is! Logical Unit something, maybe? If you could tell me how you are getting a count for them, maybe I could work out what they are from that.

If it means "logic gates", I'd love to know how you get a gate count out of VHDL. Please tell me! Does it become apparent when one compiles the VHDL to something else, maybe netlist(s)? Where is that information exactly?

Tx

Peter

Mahshid Shahmohammadian

Jun 1, 2021, 11:43:56 AM6/1/21
to clash-l...@googlegroups.com
Sorry for the confusion. I'm synthesizing the generated VHDL using Vivado for a Xilinx Virtex-7 FPGA. Please take a look at page 21 of this user guide:

I'm getting about 900 LUTs for the fold with delay implementation and only 34 LUTs for the m16 implementation with mealy machine. Would really appreciate any thoughts on the reasons.

Thanks a lot!
Mahshid

peter.t...@gmail.com

Jun 1, 2021, 3:15:04 PM6/1/21
to Clash - Hardware Description Language


Doesn't help me much, I'm afraid. "What's an LUT" should have an easy answer! It says "Lookup table"!

They say they mean more exactly a list of (64) 6-bit inputs, with one 2-bit output for each. That defines the function graph of a little gate with 6 input wires and 2 output wires. That is two independent 3-input, 1-output gates? Say two 2-input, 1-output gates plus enable lines.

Anyway, one LUT is two logic gates, plus frills. Probably one can make a basic flip-flop out of that, with feedback. So one of your designs is much more complex in its possible behaviours than the other, and/or needs much more storage! That's all. Are you sure their behaviour is the same? Storage requirements the same? The synthesis doesn't seem to think so.
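For readers who have not met the term, a lookup table in the generic sense can be sketched as a truth table addressed by its input bits. The plain-Haskell model below is only that generic idea, with one output bit per table; it leaves aside whatever device-specific features the vendor guide describes, and the names Lut, mkLut, bitsOf and evalLut are assumptions for illustration.

```haskell
-- A k-input LUT modelled as a table with 2^k one-bit entries.
newtype Lut = Lut [Bool]

-- The k low bits of n as Booleans, least significant first.
bitsOf :: Int -> Int -> [Bool]
bitsOf k n = [odd (n `div` 2 ^ i) | i <- [0 .. k - 1]]

-- Build the table for a Boolean function of k inputs by enumerating
-- every input combination.
mkLut :: Int -> ([Bool] -> Bool) -> Lut
mkLut k f = Lut [f (bitsOf k n) | n <- [0 .. 2 ^ k - 1]]

-- Evaluate: the inputs form an address into the table.
evalLut :: Lut -> [Bool] -> Bool
evalLut (Lut table) inputs = table !! address
  where address = sum [2 ^ i | (i, True) <- zip [0 :: Int ..] inputs]

-- Example: three-input XOR stored as an 8-entry table.
xor3 :: Lut
xor3 = mkLut 3 (foldr (/=) False)

main :: IO ()
main = do
  print (evalLut xor3 [True, False, True])  -- False
  print (evalLut xor3 [True, True, True])   -- True
```

Seen this way, a LUT count measures how many small arbitrary Boolean functions the synthesis tool needed, not how many AND/OR gates the source code appears to contain.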

(You will have to explain your second design to me because I can't make anything of it  by eye)

Regards

PTB

Mahshid Shahmohammadian

Jun 7, 2021, 11:50:12 AM6/7/21
to Clash - Hardware Description Language
Sorry for the late response. I understand that the two designs differ in that one keeps the elements in flip-flops, with the results of the carry logic passed to the registers, while the other stores the result in look-up tables, but the reason is not 100 percent clear to me. I have tested the functionality of the two designs and they both produce the same outputs. For example, in a testbench, if I wait for a random number of clock cycles, they both give me correct results.

To explain the second design:

This one does not use a mealy machine and uses dflipflop and delay functions instead for each element in the pipeline.  dataIn is inputted to the pipeline if the out_ready is asserted, otherwise, the previous value of x is assigned to x. Then because I know I want my function to be "x+1" as an incrementer, so I create a vector of ones of length 16 (the number of iterations). Then I apply a fold on this with function iter which does something similar (if ready is asserted then add the pipeline element to 1 otherwise a delayed version of that element). For valid I used the functions available in Vector library to make something like shift register and at the end get the last of the vector. That's it.

handshakeParallel :: (HiddenClockResetEnable dom, Num a,NFDataX a)
                   => (Bool, Signal dom a,Valid,Ready)
                   -> (Signal dom a,Valid,Ready)
handshakeParallel (rst,dataIn,in_valid,out_ready) = (x',out_valid,out_ready)
  where  
    x = if (rst == False && out_ready == True) then dataIn else (delay 0 x)
    ones = replicate d16 1

    initial = (0,out_ready,x)
    (_,_,x') = foldl iter initial ones
    val_vec = replicate d16 in_valid
    vec' = if (rst == False && out_ready == True)
           then rotateLeft val_vec (length val_vec - 1)
           else rotateLeft val_vec (length val_vec - 2)      
    out_valid = if (rst == True) then False else (vec' !! 0)


iter (i, ready, f) g = (i',ready',f')
   where
     f' = if ready then ( f + g) else (dflipflop f')
     ready' = ready
     i' = i + 1


Thanks,
Mahshid

peter.t...@gmail.com

Jun 7, 2021, 2:14:07 PM6/7/21
to Clash - Hardware Description Language
I'll try this:

"dataIn is inputted to the pipeline if the out_ready is asserted, otherwise, the previous value of x is assigned to x."

What is your representation of a pipeline? What (name and) type have you given it?

"Then because I know I want my function to be "x+1" as an incrementer, so I create a vector of ones of length 16 (the number of iterations)."

Is this vector a pipeline state? Part of a pipeline state? An input to the pipeline? Where do you create it and what is its name and type? The problem for me thus far is that you seem to be describing the pipeline I gave, and that does not correspond at all to the code you have written, so  you must be thinking one thing and reading/writing another. You should check your thinking by devising some checks! Please add type declarations to the functions and objects you define in that code, and make sure that Clash's idea of what you have written corresponds to yours. You should find the answer is "no"!

"Then I apply a fold on this with function iter which does something similar (if ready is asserted then add the pipeline element to 1 otherwise a delayed version of that element)."

What does the function iter do, and what is the function it constructs via fold, and what is the type it applies to and what is the type of what it produces?

(Fold just applies a binary operator pairwise between the elements of a list/vector, so the information content in it is just the binary function part. I need to hear more than "something similar"! Please say exactly)

"For valid I used the functions available in Vector library to make something like shift register and at the end get the last of the vector."

What is "valid"? Isn't that just the OK tag on the data that came in, delayed 16 cycles? If so, you are saying:

      valids, out_readies :: Signal dom Bool
      -- delay 16 cycles: 16 registers, placed in series ("compose"d), applied
      valids = compose (replicate d16 (register False)) out_readies
        where compose = fold (.)

Your function iter must be the clue. Clash tells me it has type:

iter :: (HiddenClockResetEnable dom, Num x, Num y, NFDataX y) => (x, Bool, Signal dom y) -> Signal dom y -> (x, Bool, Signal dom y)

It looks like you intend a binary stream-to-stream transformation, repeated several (16?) times. So you will take a vector of 16 streams and do something to them? The binary transform part is

     stream' = if ready then  streamL + streamR else register undefined stream'

You can't have stream' = ... stream'. That's a loop. You might have meant the stream of undefineds, which would be

    stream' = if ready then streamL + streamR else pure undefined

So if ready is high, then this will just add (I didn't know you can do that!!! Thanks for telling me. I hope you are right) the elements of two streams pairwise. If you start with 16 streams, it adds their elements 16-wise.  What are these 16 streams? You apply your folded iter binary function to "ones", which is "1" replicated 16 times. The trouble is, that needs to be a stream in order to be presented to iter as one of its arguments, so the type system will cause that to be interpreted as whatever the numerical representation of a stream of 1s is, which I presume is just "pure 1". This looks like fun!  Yes:

  > sampleN 4 (1 :: Signal System Int)
  [1,1,1,1]

Nice! So one could have written

  stream' = if ready then streamL + streamR else undefined

The whole thing just has to be wrong, but it is fun!

Bottom line: declare types for your functions. I think you'll see you've been wrong about what the arguments and results are. Having "1" mean an infinite stream of Ints is certainly not what you intended. I think.

Cute notation, and I personally approve of it as great fun and just what I would want, but it explains why Haskell instead requires one to explicitly put things inside a Monad with an injection (that is "pure 1", here), rather than letting the system silently infer the injection for you. That's all very well if you know what you are doing and are right, but if you are wrong then it is a silent magnifier of your error.

This class-based feature in many ways allows one to construct a new language, giving new semantics to old syntax. The example above of "1" meaning an infinite stream of 1s is just wondrous.
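The mechanism can be reproduced without Clash. The sketch below defines a toy Stream type (an assumption standing in for Signal, with a local sampleN that is not the Clash function) and gives it a pointwise Num instance, which is all it takes for the literal "1" to denote an endless stream of 1s.

```haskell
-- A toy infinite-stream type standing in for Clash's Signal
-- (an assumption for illustration only).
newtype Stream a = Stream [a]

instance Functor Stream where
  fmap f (Stream xs) = Stream (map f xs)

instance Applicative Stream where
  pure x = Stream (repeat x)
  Stream fs <*> Stream xs = Stream (zipWith ($) fs xs)

-- Lifting Num pointwise is what lets a literal denote a whole stream:
-- the compiler rewrites `1` as `fromInteger 1`, which here is `pure 1`.
instance Num a => Num (Stream a) where
  a + b       = (+) <$> a <*> b
  a - b       = (-) <$> a <*> b
  a * b       = (*) <$> a <*> b
  abs         = fmap abs
  signum      = fmap signum
  fromInteger = pure . fromInteger

-- Take the first n samples of a stream.
sampleN :: Int -> Stream a -> [a]
sampleN n (Stream xs) = take n xs

main :: IO ()
main = do
  print (sampleN 4 (1 :: Stream Int))                       -- [1,1,1,1]
  print (sampleN 4 (Stream [10, 20 ..] + 1 :: Stream Int))  -- [11,21,31,41]
```

Nothing in the source text marks the difference between the number 1 and the stream of 1s; only the inferred type does, which is why writing the type signatures down matters so much here.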



Peter

peter.t...@gmail.com

Jun 7, 2021, 5:15:08 PM6/7/21
to Clash - Hardware Description Language
I feel moved to add that that was the most exciting and wonderful "mistake" that anyone probably ever will make/has made. You probably don't think so.

You seem to have accidentally embedded everything in a space of streams, so "1" is the stream of 1s, and "+" is the operator that adds the elements of two streams, corresponding element to corresponding element. It's like embedding functions as points in a space of measures (!!). You are now living in a place where everything may be much, much more complicated than you were imagining. That accounts for the extra gates.

It's a miracle that everything works as it should - if it does (there may be a functorial embedding in a dual space at work here). But basically, that's what's up. Everything is wrong, from the point of view of your intention (very right! from the point of view of my own interest).

(A) The first thing that is wrong for you is the type of your "handshake". You intended it to be a stream transformer that takes a stream of dataIns and a stream of boolean outReadies and a stream of resets,  and produces  a stream of dataOuts and a stream of boolean valids.

The intended type is ("Reset" is a special Signal kind known to Clash as such, but it is really just booleans underneath)

      Reset dom -> Signal dom dataIn -> Signal dom OutReady -> (Signal dom dataOut, Signal dom Valid)

Instead of just one mealy machine doing that, you want a pipeline of mealy machines. Each little mealy machine does exactly the same thing as that at the type level, because you're just labelling what the streams of inputs and outputs ARE, really.

   m0, m1, ..., mF :: Reset dom -> Signal dom dataIn -> Signal dom OutReady -> (Signal dom dataOut, Signal dom Valid)

Your only problem is how to connect them up so the output of the first streams into the second as input, the second output into the third, etc. Worry about what they actually DO inside later. That's just semantics inside the boxes. The wiring between the boxes is the important part.

So you name all these streams:

(dataOuts0, valids0) = m0 resets dataIns0 out_readies0
(dataOuts1, valids1) = m1 resets dataIns1 out_readies1
...

[This is cringeworthy stylistically in terms of code authorship, but this way one can see what is going on.]

Now you connect them:

dataIns1 = dataOuts0
out_readies1 = valids0
dataIns2 = dataOuts1
out_readies2 = valids1
...

There you are, one pipeline. I hooked the same reset signal up to all the stages. It seemed kinder.

All you have to do is define what the pipeline components do.

They all do the same thing. If the out_ready input is high, then they take in a new datum and move out their old datum, together with a/its valid high tag. If the out_ready input is low, they don't take in a new datum, but they still  do move out their old datum, together with whatever its valid tag said.

That is a mealy machine (or a register, if you prefer, but a register is a mealy machine). Its state is a pair (dat,Valid)   (also known as "Maybe dat" in Clash). When the accompanying valid tag is low, nobody cares what value dat is, and it may be undefined, even. As follows:

   m :: Reset dom -> Signal dom (dataIn,OutReady) -> Signal dom (dataOut,Valid)
   m resets = withReset resets (mealy fn init)
     where
       init :: (dat,Valid)
       init = (undefined,False)
       fn :: (dat,Valid) -> (dataIn,OutReady) -> ((dat,Valid), (dataOut,Valid))
       fn (dat,valid) (dataIn,out_ready) =
         if out_ready then ((dataIn,True),(dat,valid))
                      else ((undefined,False),(dat,valid))   -- new state, outputs


You can do that with just a register, if you prefer. A register really is a mealy machine.

   m resets = withReset resets (register (undefined,False) . filter)
     where
       filter :: Signal dom (dataIn,OutReady) -> Signal dom (dataIn,OutReady)
       filter = fmap fn
         where
           fn :: (dataIn,OutReady) -> (dataIn,OutReady)
           fn (dataIn,out_ready) = if out_ready then (dataIn,True) else (undefined,False)

I am supposing throughout that OutReady ~ Valid ~ Bool, dataIn ~ dataOut ~ dat as types.

I mentioned the predefined type Clash has that represents (dat,True) as Just dat, and (undefined,False) as Nothing. You may as well use it.

   m :: Reset dom -> Signal dom (Maybe dataIn) -> Signal dom (Maybe dataOut)

Things look simpler like that. It's clearer that there's just one stream of inputs and one stream of outputs for each pipeline stage. The inputs and outputs carry tags. When the tag is a "Just" then the accompanying data is valid. When the tag is "Nothing" then the accompanying data is not valid. Indeed, to keep everyone honest, it has been vamooshed and replaced by undefined. That will save a few gates. It doesn't need storing or processing.

This is where you should notice that instead of laboriously writing out all the wiring names and hookups one by one, you can simply write a higher order combinator to put the whole lot together and save yourself the bother:

    handshake :: Reset dom -> Signal dom (Maybe dataIn) -> Signal dom (Maybe dataOut)
    handshake resets = compose (replicate d16 (m resets))
                                      where compose = fold (.)
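A list-based sketch of that combinator, in plain Haskell rather than Clash: each stage is modelled as a register holding a Maybe that starts out as Nothing, and the reset argument is left out. The names stage, compose and handshake mirror the text, but this model itself is an assumption for illustration. Note that foldr (.) id is used here so that an empty list of stages is also accepted, whereas fold (.) on a Clash vector needs at least one element.

```haskell
-- One pipeline stage modelled on lists: a register holding a Maybe,
-- initially Nothing, so its output is its input delayed by one cycle.
stage :: [Maybe a] -> [Maybe a]
stage xs = Nothing : xs

-- Chain a list of stream transformers head-to-tail.
compose :: [b -> b] -> b -> b
compose = foldr (.) id

-- Sixteen identical stages in series.
handshake :: [Maybe a] -> [Maybe a]
handshake = compose (replicate 16 stage)

main :: IO ()
main = do
  let outs = handshake (map Just [1 :: Int ..])
  print (take 16 outs == replicate 16 Nothing)  -- True
  print (take 2 (drop 16 outs))                 -- [Just 1,Just 2]
```

The first sixteen outputs are the Nothings that the registers held initially; the real data follows sixteen cycles behind its input, with no wire names in sight.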

Nobody needs to know the names of the wires, other than input to and output from the complete pipeline (and I haven't named those either).

As to what you actually did, that would bear quite some analysis, and I hope somebody will do it. Maybe me. Maybe not!

Mahshid Shahmohammadian

Jun 7, 2021, 6:38:26 PM6/7/21
to clash-l...@googlegroups.com
Let me respond like this:

- What is your representation of a pipeline? What (name and) type have you given it?

I have not named the pipeline, but the first stage of the pipeline is named x (with type Signal dom a) which goes into "initial" for the fold. Fold constructs the pipeline stages every clk cycle.

- Is this vector a pipeline state? Part of a pipeline state? An input to the pipeline? Where do you create it and what is its name and type? The problem for me thus far is that you seem to be describing the pipeline I gave, and that does not correspond at all to the code you have written, so you must be thinking one thing and reading/writing another. You should check your thinking by devising some checks! Please add type declarations to the functions and objects you define in that code, and make sure that Clash's idea of what you have written corresponds to yours. You should find the answer is "no"!

The vector of ones I created is the +1 of the incrementer function I need for every stage of the pipeline. This code is something different from the other one we talked about previously (the one you proposed with mealy machines) and I'm just trying to ask some expert's opinion to see why this one takes much more resources on FPGA.

- What does the function iter do, and what is the function it constructs via fold, and what is the type it applies to and what is the type of what it produces? (Fold just applies a binary operator pairwise between the elements of a list/vector, so the information content in it is just the binary function part. I need to hear more than "something similar"! Please say exactly)

The function iter adds the pipeline element to 1 if ready is asserted otherwise a delayed version of that element is assigned for every clock cycle. That 1 comes from the vector I previously created. Because I created 16 ones this fold will continue for 16 cycles, and the last one will have 16 delays just like a pipeline structure.

-The trouble is, that needs to be a stream in order to be presented to iter as one of its arguments, so the type system will cause that to be interpreted as whatever the numerical representation of a stream of 1s is, which I presume is just "pure 1".
- Bottom line: declare types for your functions. I think you'll see you've been wrong about what the arguments and results are. Having "1" mean an infinite stream of Ints is certainly not what you intended. I think.

The type of my iter function is:
iter :: (Int,Bool,Signal dom a) -> Signal dom a -> (Int,Bool,Signal dom a)
So, isn't a signal supposed to work on stream of values instead of pure values?  The type of "1" I defined is not just Int, it's "Signal dom a" which I think should be a stream of 1 values that should be paired with the values from my iter function.

Thanks,
Mahshid




Peter Breuer

Jun 7, 2021, 7:40:37 PM6/7/21
to clash-l...@googlegroups.com
There's a small problem here:

> I have not named the pipeline, but the first stage of the pipeline is named
> x (with type Signal dom a) which goes into "initial" for the fold. Fold

A pipeline stage is not a signal. A signal goes INTO a pipeline stage
(and another comes out). A signal is what one finds on a wire. Wires
are attached to stages.

I don't think we can get past that.

But I think I understand what you intend. Instead of supplying a
vector of 16 1s, you are supplying a vector of 16 wires, on each of
which 1 is constantly asserted, dynamically, from cycle to cycle.

That saves you having to store 1 internally, or connect a wire to the
positive line, internally. OTOH it leaves open the question of who and
where are the 16 sources of 1s that you have shanghaied into
supplying, forever.

> constructs the pipeline stages every clk cycle.

Pipeline stages cannot be constructed dynamically (though it's a nice
idea!). They're silicon.

> The vector of ones I created is the +1 of the incrementer function I need

It isn't, but it may be your intention. Clash will tell you that you
have created a vector of 16 signals, each of which is carrying a 1,
repeated forever. That is not a vector of ones, but a vector of
signals, and each signal is carrying infinitely many 1s, timewise.

If you intended to make a vector of 1s, you would have written

replicate d16 1 :: Vec 16 Int

But Clash will tell you that you have created:

replicate d16 1 :: Vec 16 (Signal dom Int)

So it is a collection of 16 unnamed wires with 1s on each. As I said,
I get your idea.

It just seems mind-bending to me! Why would you supply a stream of 1s?

> for every stage of the pipeline. This code is something different from the
> other one we talked about previously (the one you proposed with mealy
> machines)

But a pipeline IS "mealy machines" arranged in sequence. What do you
imagine it as if not that?

What you may be describing is stream semantics. That is a system in
which every node in a topological network is understood as the
producer and consumer of several (different!) infinite streams of
data. Each stream has one node as origin, and one node as destination.
(Some nodes split incoming data into two outgoing copies of the input
stream).

In that kind of system, yes, you would provide a constant "1" not as a
simple parameter, but as a stream of 1s, one 1 arriving every clock
cycle, forever.

Is that what you are imagining? I can guess that in some vocabulary,
the streams may be referred to as "pipes" (of data), and that somehow
you have conflated that with "pipeline", which is a different word.

The dictionary will say:

pipeline

<architecture> A sequence of {functional units} ("stages")
which performs a task in several steps, like an assembly line
in a factory.

There is a connection in language, but it is serendipitous.

So ... if I were to replace "pipeline" by "data pipe" in what you have
written, would I understand better?

Peter

Mahshid Shahmohammadian

Jun 7, 2021, 7:58:55 PM6/7/21
to clash-l...@googlegroups.com
Sorry if that sentence made confusion, yes a pipeline is a mealy machine. What I meant was I want this code not to use mealy machine and just use register/delay/dflipflop. Just this.

Was that stream of 1s "mistake" a misunderstanding? You wrote it much better than me: "That is not a vector of ones, but a vector of signals, and each signal is carrying infinitely many 1s, timewise."

-So it is a collection of 16 unnamed wires with 1s on each. As I said, I get your idea.

It just seems mind-bending to me! Why would you supply a stream of 1s?

I need those for the function inside iter, here for the incrementer it is 1 because I add to 1, but suppose you have a function like CORDIC that keeps constants in a look-up table and every clock cycle gets and applies a function to each one. The idea basically came from here!

Thanks,
Mahshid


Peter Breuer

Jun 7, 2021, 8:45:03 PM6/7/21
to clash-l...@googlegroups.com
On 08/06/2021, Mahshid Shahmohammadian <mahshi...@gmail.com> wrote:
> Sorry if that sentence made confusion, yes a pipeline is a mealy machine.
> What I meant was I want this code not to use mealy machine and just use
> register/delay/dflipflop. Just this.

Now you've confused me more! A register IS a mealy machine. And I DID
construct the single machine using just a register and nothing else,
just to illustrate that it is so. Look down the page.


> Was that stream of 1s "mistake" a misunderstanding? You wrote it much
> better than me: "That is not a vector of ones, but a vector of
> signals, and each signal is carrying infinitely many 1s, timewise."

But I don't know which you want! They are different! I now understand
that you want a vector of streams, each carrying 1s, and have been
mistakenly calling that a vector of 1s? Yes?


> Why would you supply a stream of 1s?
>
> I need those for the function inside iter, here for the incrementer it is 1

But iter is a nonsensical construction as it stands: as a node in a
stream semantics network, it receives two input streams and one
PARAMETER (not a stream of parameters!) and produces one output
stream. In other words, it is a node that is switched between two
stream transform semantics by the parameter "out_ready", which is a
constant in time.

You must have made a mistake there. If you are going to provide a 1 as
a stream of 1s, then you are surely going to provide the out_ready
parameter not as a single one-time-for-all-time value, but as a stream
of values.

> because I add to 1, but suppose you have a function like CORDIC that keeps

CORDIC?

> constants in a look-up table and every clock cycle gets and applies a
> function to each one. The idea basically came from here!


Surely you want not to do a vector calculation ("map ...") but instead
do just a piece of an arbitrary calculation in 16 stages? So the type
might be:

mymachine :: Signal dom (Maybe (Val ,Vec 16 Cmd)) -> Signal dom (Maybe Val)

and the idea is that each value (with out_ready asserted) comes in
with 16 commands about what to do to it in sequence in each stage
attached to it, and it comes out with those done.

That IS a pipeline.

But is not particularly interesting because it's just a linear
topology, in stream semantics terms.

I think that before you were thinking of supplying a stream of
commands to each node in a 16-node (linear) network. Thus:

n0, n1, n2, ... :: Signal dom (Maybe Cmd) -> Signal dom (Maybe Val)
                -> Signal dom (Maybe Val)

outs0 = n0 cmds0 ins0
outs1 = n1 cmds1 ins1
...

and each n0,n1, n2, ... is the same mealy machine (which I wrote as a register).

If a command comes in at the same time as a valid data, that validates the pair!

Rest left as exercise.

Peter

Mahshid Shahmohammadian

Jun 7, 2021, 9:01:56 PM6/7/21
to clash-l...@googlegroups.com
> Now you've confused me more! A register IS a mealy machine. And I DID
> construct the single machine using just a register and nothing else,
> just to illustrate that it is so. Look down the page.

Yes, that's right; I mean the syntax. I saw both of your implementations and understand them. I am just curious to know why my other code results in such high resource usage while still giving the same output and waveform in simulation.

> But I don't know which you want! They are different!  I now understand
> that you want a vector of streams,  each carrying 1s, and have been
> mistakenly calling that a vector of 1s? Yes?

Yes, a vector of streams of ones, with type Vec 16 (Signal dom a).

> But iter is a nonsensical construction as it stands: as a node in a
> stream semantics network, it receives two input streams and one
> PARAMETER (not a stream of parameters!)

I see, ready should be of type Signal dom Bool, yes? So that it is also a stream of parameters. So, I think iter should have the type below:
iter :: (Int,Signal dom Bool,Signal dom a) -> Signal dom a -> (Int,Signal dom Bool,Signal dom a)


Thank you,
Mahshid


peter.t...@gmail.com

Jun 8, 2021, 4:42:38 AM6/8/21
to Clash - Hardware Description Language
Your (intended!) stream semantics code cannot be giving the same simulation results as the pipeline code, because they really are type-theoretically and semantically different. It could be one of those situations where one thinks X is looking at B but it is really looking at A, because Y has thought that they had erased A and replaced it by B, but one of those steps has silently failed, so it is still A there, not B.

I would prove that by adding the type declarations to the stream semantics code, and simplifying, but that would be really, really pedantic! It just isn't the same shape, form or content. One can, for example simplify the declaration

    (_,_,x') = foldl iter initial ones

to

   x' = if out_ready then 16 else compose (replicate d16 dflipflop) x'

("16" really stands for a stream of 16::Int here; it is actually "sum ones", where ones = replicate d16 1 is a vector of 16 streams of 1::Int. Your fold placed a + operator "between" each of those 16 streams, creating the pointwise sum of 16 streams of 1s, which is a stream of 16s.)
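
The pointwise-sum observation can be checked with a small plain-Haskell model. This is only a sketch: it assumes that a signal can be modelled as an infinite list of samples, and it uses ordinary lists instead of Clash's `Signal` and `Vec` types. The names `ones` and `summed` are illustrative.

```haskell
-- Model: a "signal" is an infinite list of samples (an assumption for
-- illustration; this is not how Clash represents Signal).

-- Sixteen streams, each carrying 1 forever.
ones :: [[Int]]
ones = replicate 16 (repeat 1)

-- Folding (+) pointwise "between" the streams gives one stream of sums.
summed :: [Int]
summed = foldl1 (zipWith (+)) ones

main :: IO ()
main = print (take 4 summed)  -- prints [16,16,16,16]
```

Every sample of the result is 16, which is why the fold collapses to a constant stream rather than to a 16-stage pipeline.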

It just cannot be the case that you expect either the answer 16 (as a stream!) or "undefined", depending on the one constant parameter "out_ready".

I think you want to go all the way to a stream semantics rendering. Forget about things like fold and iter for now. They do not directly apply to that situation.


Peter

peter.t...@gmail.com

Jun 8, 2021, 5:08:58 AM6/8/21
to Clash - Hardware Description Language
By the way, I think what you want in terms of a topological network of nodes and "pipes"  (aka streams of data) is:

dataIn -> (n0) -> (n1) -> ... (n15) -> dataOut
            ^      ^           ^
            |      |           |
           cmd0   cmd1       cmd15

where dataIn, dataOut, cmd0, cmd1, ..., cmd15 are all _streams_. 

Is that right?  Does a command flip a node into a state where it does the commanded function from then onwards, or does it just apply to the current data input to the node (or the next data input)?

The only difference wrt a "pipeline" is that the stages are commanded to change functionality individually, in real time , instead of taking their cue from a program  (sequence of 16 instructions) that came in packaged along with the data at the single entry point to the pipeline.

Peter

Peter Lebbing

Jun 8, 2021, 12:31:05 PM6/8/21
to clash-l...@googlegroups.com
Hello Mahshid, this is Peter from QBayLogic.

Let me start out noting that Clash is a structural HDL. The structure of
your Clash code will be really similar to the structure in the FPGA.
There is no high-level synthesis. Maybe in the future, but not at the
moment.

On Mon, 31 May 2021, Mahshid Shahmohammadian wrote:
> I already implemented the single machine version, but my own version. (since I could not figure out that " . curry
> bundle" part and got syntax errors). Here it is:
>
> m16 :: (Num a) 
>     => (Vec 16 a,Vec 16 Bool)
>     -> (Bool,a,Valid,Ready)
>     -> ((Vec 16 a,Vec 16 Bool),(a,Valid,Ready))
> m16 (x_,v_) (rst,dataIn,in_valid,out_ready) = ((x',v'),(dataOut,out_valid,in_ready))
>   where
>     v' = if rst then (replicate d16 False)
>          else (if (out_ready == True) then (in_valid +>> v_) else v_)
>     x' = if (rst == False && out_ready == True)
>          then (map (+1) (dataIn +>> x_)) else x_
>     out_valid = last v_
>     dataOut = last x_
>     in_ready  = out_ready
>
> hsm16 :: (HiddenClockResetEnable dom, Num a, NFDataX a)
>        => Signal dom (Bool, a, Valid, Ready)
>        -> Signal dom (a,Valid, Ready)
> hsm16 = mealy m16 (replicate d16 0, replicate d16 False)

I see room for improvement, but yeah, that's a basic working pipeline
that seems to match the waveform you mentioned[1], just looking at it. I
haven't tried to run it.

> After synthesizing this, it's resulting in a much fewer number of LUTs than the fold implementation that I mentioned
> before.

So there are two styles of composition in Clash. You can write a lot of
functionality in a combinatorial network and then connect registers to
that. So for instance you end up with a combinatorial network of the
type `NFDataX s => s -> i -> (s, o)` and you plug that into `mealy`, or
if the output of the Mealy machine happens to match the state, another
good option is:

myMachine x =
  let y = register 0 (f x y)
  in y

Your function `f` is the combinatorial circuit there. And the 0 is the
initial state.

The other style of composition in Clash is to have components that take
and produce `Signal`s, and connect those together. So instead of making
a large combinatorial function and passing that to `mealy`, you create
smaller combinatorial functions, each in a `mealy` and connect those
together.
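
To make the first style concrete, here is a minimal plain-Haskell sketch of what a Mealy machine does with a transition function of shape `s -> i -> (s, o)`. It models a signal as a list of samples, which is an assumption for illustration; `mealyL` and `accumT` are hypothetical names and not part of Clash.

```haskell
import Data.List (mapAccumL)

-- List model of a Mealy machine: thread the state through the inputs
-- and collect one output per input sample.
mealyL :: (s -> i -> (s, o)) -> s -> [i] -> [o]
mealyL f s0 = snd . mapAccumL f s0

-- Transition function of a running sum. The output is the state
-- *before* the update, so the output behaves like a register.
accumT :: Int -> Int -> (Int, Int)
accumT s i = (s + i, s)

main :: IO ()
main = print (mealyL accumT 0 [1, 2, 3, 4])  -- prints [0,1,3,6]
```

The transition function itself is purely combinatorial; all of the state lives in the value threaded between steps.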

The following two are equivalent:

variant1
  :: HiddenClockResetEnable dom
  => Signal dom Bool
  -> Signal dom (Unsigned 8)
  -> (Signal dom Bool, Signal dom (Unsigned 8))
variant1 valid inp = mealyB transF (repeat False, 0, 0) (valid, inp)
 where
  transF
    :: (Vec 2 Bool, Unsigned 8, Unsigned 8)
    -> (Bool, Unsigned 8)
    -> ( (Vec 2 Bool, Unsigned 8, Unsigned 8)
       , (Bool, Unsigned 8))
  transF (valids, d1, d2) (valid, inp) =
    let valids0 = valid +>> valids
        d1_0 = inp + 3
        d2_0 = d1 `shiftL` 1
        validO = last valids
    in ((valids0, d1_0, d2_0), (validO, d2))

variant2
  :: forall dom
   . HiddenClockResetEnable dom
  => Signal dom Bool
  -> Signal dom (Unsigned 8)
  -> (Signal dom Bool, Signal dom (Unsigned 8))
variant2 valid inp = (validO, d2)
 where
  valids :: Signal dom (Vec 2 Bool)
  valids = register (repeat False) (liftA2 (+>>) valid valids)
  validO = last <$> valids

  d1 = register 0 (inp + 3)
  d2 = register 0 ((`shiftL` 1) <$> d1)

Now one style might lead to an easier to understand result in one case,
and in other cases the other style might be preferable. Note that
`variant1` is a lot more friendly to people just starting out. In
variant2, I could get away with `inp + 3` because, as you had already
discovered, we have this instance:

Num a => Num (Signal dom a)

This says that whenever you use a type that has a `Num` instance, you
can do `Num` things directly to a signal of that type. That means you
can write integer literals and use +, -, *, negate, abs and signum
directly on signals, as an aid to writing designs. If you need
to do anything else, you suddenly start seeing <$> and <*>
(the latter not here), or liftA, liftA2 and the like. Those indicate
that an operation is being lifted to work on Signals.
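
As an illustration of how such an instance can work, the following plain-Haskell sketch defines a stand-in stream type with a `Num` instance built by lifting. The type `Stream` and the helper `firstN` are hypothetical and only model the idea; this is not Clash's actual `Signal` definition.

```haskell
import Control.Applicative (ZipList (..), liftA2)

-- Hypothetical stand-in for Signal: a list of samples, combined pointwise.
newtype Stream a = Stream { samples :: ZipList a }

instance Functor Stream where
  fmap f (Stream xs) = Stream (fmap f xs)

instance Applicative Stream where
  pure = Stream . pure                       -- the same value forever
  Stream fs <*> Stream xs = Stream (fs <*> xs)

-- Every Num operation is the lifted version of the element operation;
-- a literal such as 3 becomes a stream of 3s via fromInteger.
instance Num a => Num (Stream a) where
  (+) = liftA2 (+)
  (-) = liftA2 (-)
  (*) = liftA2 (*)
  negate = fmap negate
  abs = fmap abs
  signum = fmap signum
  fromInteger = pure . fromInteger

firstN :: Int -> Stream a -> [a]
firstN n = take n . getZipList . samples

inp :: Stream Int
inp = Stream (ZipList [10, 20, 30])

main :: IO ()
main = print (firstN 3 (inp + 3))  -- prints [13,23,33]
```

Here `inp + 3` type-checks because the literal `3` is read as a constant stream, which is the same trick that makes `inp + 3` legal on a `Signal`.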

The forms are equivalent, but sometimes one form can lead to HDL that is
easier for the synthesis tooling to optimise than the other
form. My gut feeling is that many `fold`s will lead to a better
result if done inside `mealy` or `register` rather than as a `fold` over
`Signal`s. But I might be wrong; I'm not a compiler developer here at
QBayLogic, so my knowledge of this is pretty limited.

> So, I wonder if I can get those 16 machines compositions to work, I can optimize the resource utilization by a
> lot! I'm kinda stuck on that curry and (.) which I think you tried to get the output of type "signal dom data" out of
> mealy call. Could you please double-check that?

The following two types are not equal, but are semantically equivalent
and they are isomorphic:

1. (Signal dom a, Signal dom b)
2. Signal dom (a, b)

One is a tuple of signals, and the other is a signal of a tuple. Using
bundle, you can go from 1. to 2. and unbundle, you can go from 2. to 1.
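
The isomorphism can be pictured with lists standing in for signals. This is a sketch under that assumption only; `bundleL` and `unbundleL` are hypothetical list analogues, not the Clash functions.

```haskell
-- List analogue of bundle: a pair of streams becomes a stream of pairs.
bundleL :: ([a], [b]) -> [(a, b)]
bundleL = uncurry zip

-- List analogue of unbundle: a stream of pairs becomes a pair of streams.
unbundleL :: [(a, b)] -> ([a], [b])
unbundleL = unzip

pairOfStreams :: ([Int], String)
pairOfStreams = ([1, 2, 3], "abc")

main :: IO ()
main = do
  print (bundleL pairOfStreams)                               -- [(1,'a'),(2,'b'),(3,'c')]
  print (unbundleL (bundleL pairOfStreams) == pairOfStreams)  -- True
```

For streams of equal length the round trip loses nothing, which is what "isomorphic" means here.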

`mealy` has the type:
[...] -> Signal dom i -> Signal dom o

It has only a single input. So what if we have multiple inputs and yet
want to use `mealy`? We write:

f i1 i2 = mealy fT 0 (bundle (i1, i2))

This is all you need to know to work with it. In fact, read the
documentation on `mealyB` for a shorthand for often occurring situations
with `mealy` and `bundle` (don't use `mealyB` indiscriminately!).

Now people who are well used to writing Haskell might do the following
two transformations that change almost nothing about its semantics:

f i1 i2 = mealy fT 0 (curry bundle i1 i2)
f = (mealy fT 0 .) . curry bundle

This is because `curry` is:

curry :: ((a, b) -> c) -> a -> b -> c

and a concrete instance of `curry bundle` can be the
following:

f :: KnownDomain dom
  => Signal dom a
  -> Signal dom b
  -> Signal dom (a, b)
f = curry bundle

Without specifying type signatures, Clash might be unable to precisely
infer the type of "curry bundle" though, so be sure to sprinkle
liberally with type signatures.

Or rather, if I were you, I'd just write

f i1 i2 = mealy fT 0 (bundle (i1, i2))

or

f i1 i2 = mealyB fT 0 (i1, i2)

until you feel you want to involve more advanced prose-saving
constructions.

> BTW this is my other version which takes much more resources. I would appreciate it if you share any thoughts on why
> this one is utilizing way more LUTs.
>
> handshakeParallel :: (HiddenClockResetEnable dom, Num a,NFDataX a)
>                    => (Bool, Signal dom a,Valid,Ready)
>                    -> (Signal dom a,Valid,Ready)
> handshakeParallel (rst,dataIn,in_valid,out_ready) = (x',out_valid,out_ready)
>   where   
>     val_vec = replicate d16 in_valid
>     vec' = if (rst == False && out_ready == True)
>            then rotateLeft val_vec (length val_vec - 1)
>            else rotateLeft val_vec (length val_vec - 2)      
>     out_valid = if (rst == True) then False else (vec' !! 0)
>
>     x = if (rst == False && out_ready == True) then (dataIn+1) else (delay 0 x)
>     ones = replicate d15 1
>     initial = (0,out_ready,x)
>     (_,_,x') = foldl iter initial ones
>
> iter (i, ready, f) g = (i',ready',f')
>    where
>      f' = if ready then ( f + g) else (dflipflop f')
>      ready' = ready
>      i' = i + 1

I think it's the path where you do one of two `rotateLeft`s. Those are
dynamic rotates, that take a "runtime" rotation amount, even though
optimisations should be able to determine that since `length val_vec` is
actually a constant, it's just a mux choosing between two possible
sources. The code also does not look like it's working
right.

The whole of vec' is 16 copies of in_valid, which is rotated by two
fixed amounts depending on the value of rst and out_ready. Since all
elements are equal to one another (it's just 16 times the input
in_valid), it can't be what you intended to write.

I can't tell whether your `Valid` and `Ready` types are `Bool` or
`Signal dom Bool`, but the difference is major. If it's a Bool, it
cannot be a signal in the design, it should be treated as a constant
coefficient. You might be able to blur this line, but if it is something
that might be one value at one moment in your circuit and another at a
different moment, then it needs to be in `Signal` for your own sanity
and the sanity of the compiler.

I can't really tell what your intention was with the `rotateLeft`, but I
can concoct a version that is equivalent to the correct code at the top
quote in this e-mail, if you want me to, for educational purposes. But
if I were you, I'd do most of your writing in the combinatorial
transition function. That is the most accessible style when Haskell is
relatively new to you.

I'll also respond later with some ideas about pipelines, but now my work
day is over!

HTH,

Peter.

[1] https://groups.google.com/g/clash-language/c/nhnc_dNOOxg/m/GptSDPUXAgAJ

peter.t...@gmail.com

Jun 8, 2021, 7:34:14 PM6/8/21
to Clash - Hardware Description Language


I was so highly amused by

>  because, as you had already  discovered, we have this instance:
> Num a => Num (Signal dom a)

that I decided to see what other weird stuff one could get happening by simply mentioning things like "1" and having Clash interpret them as something else completely (in the above, the signal that is always 1/aka the stream of infinitely many 1s).

It turns out that Signal is also an applicative functor, which means one should morally be able to get a stream of y by just writing a stream of functions each of type x->y right next to a stream of x, as in "fns xs".

Boo hoo, but it won't do that, because applicative functor class functor things need to be kicked with a "<*>" before they'll take the hint and do the obvious (apply themselves pointwise), so one has to write "fns <*> xs" , which rather spoils the surprise.

I suppose one could write that as "(fns <*>) xs", which is a lot nicer, apart from the parens, because yes, that is the "upper star" of a function thing in a monadic setting.  Perhaps "(<*>) fns xs" is nicer, because at least fns and xs do end up next to each other, which is the morally right thing.

Anyway, I thought I might write one of the nodes in the intended stream-semantics network using that sort of mind-bending technique. It's all a bit forced, because one has to sometimes use "<*>" and sometimes use "<$>" to keep the type system happy, so the typing is getting in the way of it rather than driving the magic. It would be nicer if one could just write what looks natural and have those things inferred for one (and complete confusion for anyone who can't figure out what is going on, heh). This is the topological network:

dataIn -> (n0) -> (n1) -> ... (n15) -> dataOut
            ^      ^           ^
            |      |           |
           cmd0   cmd1       cmd15

and the stuff coming in and going out are all streams. The dataIn is only sometimes there (Maybe x) [what was called "out_ready" data when the data is Just, not so "out_ready" when Nothing]. The commands will be only sometimes there too (Maybe (x->x)) and when a new command comes in its action is to change the function of the node from then on. The dataOut is also only sometimes there, being a 16-clock interval behind the input, but otherwise in step.

Here is n0, mysteriously written without much of any typing to help the reader:


n0 :: (HiddenClockResetEnable dom, NFDataX dataOut)
  => Signal dom (Maybe dataIn)               -- ready/not ready data in
  -> Signal dom (Maybe (dataIn->dataOut))    -- maybe command flip to new functionality
  -> Signal dom (Maybe dataOut)              -- valid/not valid data out
n0 mins mfns = mouts
            where mouts  = fmap <$> states <*> mins
                  states = register state0 altered_states
                           where altered_states = (?:) <$> mfns <*> states
                  state0 = const undefined

-- helper
(?:) :: Maybe x -> x -> x
(?:) (Just x) y = x
(?:) Nothing  y = y

Surely nobody can understand that! Not even if I add the typing:

                  mouts :: Signal dom (Maybe dataOut)
                  mouts = fmap <$> states <*> mins

                  states :: Signal dom (dataIn -> dataOut)
                  states = register state0 altered_states

                  altered_states :: Signal dom (dataIn->dataOut)
                  altered_states = (?:) <$> mfns <*> states

                  state0 :: dataIn -> dataOut
                  state0 = const undefined




Surely not.

Peter

Peter Lebbing

Jun 9, 2021, 8:43:55 AM6/9/21
to Clash - Hardware Description Language
On Tue, 8 Jun 2021, peter.t...@gmail.com wrote:
> [...]
> Surely nobody can understand that! Not even if I add the typing:
>
> [...]
> Surely not.

This thread is about helping Mahshid write their circuit in Clash. This post
does not help with that; it only makes things more difficult to understand.
Let's please try to keep threads on-topic. Thank you!

Peter Lebbing.

peter.t...@gmail.com

Jun 9, 2021, 10:52:16 AM6/9/21
to Clash - Hardware Description Language
"This thread is about helping Mahshid write their circuit in Clash. This post
does not help with that; it only makes things more difficult to understand."

Nice attempt at humor!

It's the first time I have ever used such a technique, so I am at learner level! Anything more you can tell me about it would be interesting.

However, it's exactly what Mahshid asked for right now in order to compare with the big clunky combinatorial logic plus mealy machine approach: use a register, "not" a mealy machine (we know that's just presentation, but still), and use the streams of values as though they were values, and build a topological network of components communicating over "pipes" (as I think we have found out, rather than "pipeline").

Actually my claiming the notation is rebarbative was to encourage you not to do it. I merely copied your introduction of a mealy machine as y = register 0 (f x y).

Perhaps you're pretending not to understand that intentionally gentle remark as a sort of double touche!

[The difficulty for understanding in that is that the same thing, "y", appears on the left and the right, so it's hard to understand for a software engineer because it looks like a big baaaad recursion. It might even look bad to a h/w engineer who has taken to heart the admonition to avoid recursion, if synthesis is wanted! Yet it could also look OK to a h/w engineer who knows absolutely nothing about programming, and sees it simply as a wiring diagram in which y is wired through logic f that mixes in input x and goes into a register the output of which is the y that we tapped to provide the input to f in the first place. The "y" and "x" are the labels on the wires in a diagram, as a H/W engineer is used to, and the "f" and "register" are the labels on the boxes they are connected to. Similarly in foo <*> bar one has two wires leading into a box called "<*>" and coming out with their data generically conjoined. I complained that this nice attempt at making things easy for HW engineers actually fails a little because the type system interferes at times rather than helps, forcing me to write gum <$> foo <*> bar where effortless writing would have gum <*> foo <*> bar and let the type system take care of it (it should infer pure gum <*> foo <*> bar). If you want to fix that user-unfriendliness, there's a project! If one can automatically interpret 1 as pure 1 in the right context, one should be able to automatically interpret gum as pure gum here, and Wrigley's Spearmint gum at that ..]

[NB. I went looking for some operator/class that would allow one to write y =  0 : f x y or similar, but didn't see anything great. Foldable?]

I too would be interested in knowing how the gate counts compare between

1) one big mealy machine with a 16-part state
2) 16 small mealy machines arranged in a pipeline
3) a topological network, as set out above [with +1 cmds and in-node interpretation of them instead of general functions]

All we know right now is that what Mahshid thinks is the gate count for the alternative to (1) that she built can't be right, because the code for that was pretty much impossible every which way, thanks to confusion between vector of 1s/vector of streams of 1s. But ...

Since a pipeline IS a topological network, it's hard to figure what more one could want in asking for something that is NOT a pipeline. My suggestion above is to introduce data simultaneously at every point in the "pipeline" rather than only at the beginning (which is what pipeline means).

Or perhaps she wants to take it out from the middle too! I don't know! There are hints that more "parallelism" is what is wanted.

Anyway, maybe Mahshid thinks 1) and 2) are the same (why?), so has not measured (2), which is a shame if so, as that would answer a question.

It would tell if Clash's normalization expands what VHDL would regard as "inner loops" into the one big outer loop that VHDL simulations run, and VHDL (93) semantics requires (by adding extra loop counters, etc). If so, there should be no difference.

I would actually disagree fairly strongly with the characterization you gave of Clash as "structural" (firstly because that does not actually mean anything - it's the kind of thing people say to excuse that their compiler doesn't do much more than translate literally what the programmer wrote, while hoping that the people they are talking to don't know that while it sounds positive to them! It's PR. And secondly ...) because the extensive and computationally heavy "normalization" that Clash undertakes is ostensibly a smashing up of what the programmer wrote into a different, normal form that by definition must take no notice of the presentation of the original.  To make it be "structural" one would have to declare NOINLINE of every programmed function, which would stop normalization.

It can be made to look particularly "structural" by programming in terms of those  <*>, <$> symbols and stopping inlining of those or larger subexpressions.. I was wondering why Christiaan seems to like them as I have to look them up every time .. but perhaps I have understood now. Possibly they  reflect an inheritance from stream semantics and topological networks?



 
Clearer?

PTB

Peter Lebbing

Jun 9, 2021, 11:21:35 AM6/9/21
to clash-l...@googlegroups.com
On Tue, 8 Jun 2021, Peter Lebbing wrote:
> I'll also respond later with some ideas about pipelines, but now my work
> day is over!

So here we go. Except that I made a mistake yesterday and would like to
correct it:

> So there are two styles of composition in Clash. You can write a lot of
> functionality in a combinatorial network and then connect registers to
> that. So for instance you end up with a combinatorial network of the
> type `NFDataX s => s -> i -> (s, o)` and you plug that into `mealy`, or
> if the output of the Mealy machine happens to match the state, another
> good option is:
>
> myMachine x =
>   let y = register 0 (f x y)
>   in y

That falls firmly in the second style, connecting `Signal`s, and I wrote
it wrong. It should be:

myMachine x =
  let y = register 0 (f <$> x <*> y)
  in y

because I had intended `f` to be a plain function, but `x` and `y` are
`Signal`s so we need to lift `f` to `Signal`s. Forget this example, it
was poor. The usual way to write this in what I dubbed "the first style"
is:

myMachine x = mealy myMachineT 0 x

myMachineT s i = (s', s)
 where
  s' = ... compute next state ...

(first line can be eta-reduced to

myMachine = mealy myMachineT 0

if you want)

You might even look at moore[1] (`moore trans id`), or medvedev[2]. The
latter needs to be imported before it's available.


But let's talk pipelines.

You use boolean valid flags, but Clash has a nice tool for this, the
`Maybe` type. This generates the same hardware as a valid bit and a
value, but it has some useful features.

First off, `Maybe` is a `Functor`. Without going into details, `Functor`
gives you a function `fmap` and its infix version `<$>`. The two are
identical, but one is prefix notation and the other infix notation. For
`Maybe` specifically, they are defined as follows:

fmap
  :: (a -> b)
  -> Maybe a
  -> Maybe b
fmap f Nothing = Nothing
fmap f (Just x) = Just (f x)

So we give it a function and a `Maybe` value and the function is applied
to every value inside the `Maybe`. `Nothing` stays `Nothing`.
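
A tiny plain-Haskell check of that behaviour, using only the standard `Maybe` instance from base (the names `validSample` and `invalidSample` are just illustrative):

```haskell
-- A valid sample and an invalid one.
validSample, invalidSample :: Maybe Int
validSample = Just 41
invalidSample = Nothing

main :: IO ()
main = do
  print (fmap (+ 1) validSample)   -- prints Just 42
  print ((+ 1) <$> invalidSample)  -- prints Nothing
```

The function only ever sees the payload of a valid sample, so a pipeline stage written with `fmap` cannot accidentally turn invalid data into valid data.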

Secondly, you want to stall the pipeline when the thing connected to the
output is not ready, and you want to be able to reset the thing to an
initial value. Many of our building blocks have a context that states
`HiddenClockResetEnable dom`. This context asserts that the function has
a clock domain called `dom` (here universally quantified) and three
hidden parameters, a `Clock dom`, a `Reset dom` and an `Enable dom`.
These signals will become explicit during generation of HDL for your
synthesis tool, and can be connected there. But especially `Enable` can
be used inside a design and is perfect for stalling a pipeline. So I'd
like to introduce you to the following solution, my first variant of the
pipeline:

--8<---------------cut here---------------start------------->8---

import Clash.Prelude

import qualified Clash.Explicit.Prelude as CEP

variant1
  :: forall dom a
   . ( HiddenClockResetEnable dom
     , Num a
     , NFDataX a
     )
  => Signal dom (Maybe a)
  -- ^ Input
  -> Signal dom Bool
  -- ^ Output ready
  -> ( Signal dom (Maybe a)
     -- ^ Output
     , Signal dom Bool
     -- ^ Input ready
     )
variant1 inp outReady = (variant1_0 outReady inp, outReady)
 where
  variant1_0 = hideClockResetEnable variant1_1

  variant1_1
    :: KnownDomain dom
    => Clock dom
    -> Reset dom
    -> Enable dom
    -> Signal dom Bool
    -> Signal dom (Maybe a)
    -> Signal dom (Maybe a)
  variant1_1 clk rst en ready =
    let en0 = CEP.enable en ready
    in CEP.mealy clk rst en0 var1Trans (repeat Nothing)

var1Trans
  :: Num a
  => Vec 17 (Maybe a)
  -> Maybe a
  -> (Vec 17 (Maybe a), Maybe a)
var1Trans s i = (s', last s)
 where
  s' = i :> map (fmap (+1)) (init s)

--8<---------------cut here---------------end--------------->8---

There are two forms: implicit, with its hidden parameters, and explicit.
The Haskell type system will appreciate it if you make clear cuts
between the two; trying to be too clever will outsmart the type system,
unfortunately. So have something be either implicit or explicit, and try
to keep the two separate.

`variant1` is the packaging: it passes the ready signal on to the producer
upstream of the component and hands the rest to `variant1_0`. `variant1_0`
merely hides the explicit parameters of `variant1_1`. `variant1_1` is written with
explicit parameters, and merges the `ready` signal into the `Enable` for
`CEP.mealy`, the explicit form of a mealy machine. The effect is gating
all registers of `CEP.mealy` if `ready` is de-asserted /or/ when the
incoming Enable is de-asserted. So they are binary-AND-ed together, in
order to keep honoring the incoming `Enable`.
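
The effect of AND-ing the two enables can be sketched with a list model of a single register. This is plain Haskell under the assumption that one list element is one clock cycle; `gatedReg` is a hypothetical name and not a Clash function.

```haskell
import Data.List (mapAccumL)

-- One register with two enable inputs per cycle (incoming enable, ready).
-- It loads the data input only when both are True; otherwise it holds.
-- The output each cycle is the value stored before the clock edge.
gatedReg :: a -> [(Bool, Bool, a)] -> [a]
gatedReg initial = snd . mapAccumL step initial
  where
    step q (en, ready, d) = (if en && ready then d else q, q)

stimulus :: [(Bool, Bool, Int)]
stimulus =
  [ (True,  True,  1)   -- loads 1
  , (True,  False, 2)   -- stalled: not ready
  , (False, True,  3)   -- stalled: incoming enable low
  , (True,  True,  4)   -- loads 4
  , (True,  True,  5)
  ]

main :: IO ()
main = print (gatedReg 0 stimulus)  -- prints [0,1,1,1,4]
```

The values 2 and 3 never reach the register, because in those cycles one of the two enables is low.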

Use `enableGen` to generate an enable that is always asserted. If Clash
can determine that a circuit is using an always asserted `Enable` line,
it will in many cases not emit any enable logic in generated HDL.

The transition function `var1Trans` is short and sweet. I surmised that
your pipeline stages would have a significant propagation delay, so I
stuck a register in front, and there is also a register directly before
the output (the latter was also in your design). `i` is just registered
without any combinatorial logic, and the rest of the pipeline stages
have `fmap (+1)` applied to the `Maybe a`.
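
To see the latency and the total increment, the transition function can be modelled with plain lists. This sketch assumes a list of length 17 in place of `Vec 17` and one list element per clock cycle; `stepL` and `runPipe` are hypothetical names.

```haskell
import Data.List (mapAccumL)

-- List model of the transition function: shift the new input in at the
-- front, add 1 to everything that moves one stage along, and output the
-- last element of the old state.
stepL :: [Maybe Int] -> Maybe Int -> ([Maybe Int], Maybe Int)
stepL s i = (i : map (fmap (+ 1)) (init s), last s)

-- Run the pipeline from the all-Nothing initial state.
runPipe :: [Maybe Int] -> [Maybe Int]
runPipe = snd . mapAccumL stepL (replicate 17 Nothing)

outs :: [Maybe Int]
outs = runPipe (Just 0 : replicate 20 Nothing)

main :: IO ()
main = do
  print (outs !! 17)                         -- prints Just 16
  print (length (filter (/= Nothing) outs))  -- prints 1
```

A single valid input of 0 reappears 17 cycles later as 16: one plain register stage followed by 16 incrementing stages.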

Using `Maybe` also gives us an advantage your version of the pipeline
did not have. Using `BitPack`[3], you can inspect what many types used
for representing data look like in logic. Now look at what
`Maybe (Unsigned 8)` looks like on the wire:

Start `clashi` and type the following:

>>> import qualified Prelude as P
>>> putStr (unlines (P.map (show . pack) [Nothing, Just (1 :: Unsigned 8), Just 2, Just 3]))
0_...._....
1_0000_0001
1_0000_0010
1_0000_0011

(For clarity, the dot means function composition, so `show . pack` means
"first apply pack, then apply show".

Note that you can use https://hoogle.haskell.org/ to quickly find
documentation for functions, like https://hoogle.haskell.org/?hoogle=unlines .)

What we see here is a valid bit in position 8, and positions 0 through 7
are the value. When the valid bit is unset for `Nothing`, this tells us
that the bits of the value are don't care bits, represented by dots.
Note that our initial state (and reset state, if you assert the hidden
`Reset` input) consists of all `Nothing`s. This tells synthesis that all
"valid bits" should obviously be zero, but the data bits can be
don't care. Synthesis will only synthesise a Clear input on the "valid"
flipflop connected to Reset, and will not route any Reset at all to the
data bits. This improves the efficiency, reducing fabric usage,
improving placement and possibly raising clock speeds.

One thing I don't know is whether synthesis automatically does the best
thing for power usage. From a power usage standpoint, it should gate the
flipflops for the data bits based on the valid bit, because unchanging
don't care data uses less power than changing don't care data. On the
other hand, the extra gating increases fabric usage and might prevent
optimal placement of flipflops in a certain LE, so maybe it depends on
the circuit what is optimal.

Viewing the RTL in your synthesis tool will probably show all the reset
and enable lines of individual flip flops for more info.

One final comment about variant1. Type variables are implicitly
universally quantified, but by doing `forall dom a` explicitly, we bind
the type variables `dom` and `a`, which allows us to use those variables
in the `where` block in type signatures, asserting that all those `a`s
refer to one and the same concrete type.

This was a toy example with every stage doing the same thing. What about
having the stages do the same operation but with differing coefficients?
Here's variant2. It just changes `var1Trans`, the rest stays the same.
Adding that is left as an exercise to the reader.

var2Trans
  :: Num a
  => Vec 17 (Maybe a)
  -> Maybe a
  -> (Vec 17 (Maybe a), Maybe a)
var2Trans s i = (s', last s)
 where
  s' = i :> zipWith (flip (fmap . (+)))
                    (init s)
                    (    2 :>  3 :>  5 :>  7 :> 11 :> 13 :> 17 :> 19
                     :> 23 :> 29 :> 31 :> 37 :> 41 :> 43 :> 47 :> 53 :> Nil)


But if even your operations are different, I think variant 3 is the
better option:

--8<---------------cut here---------------start------------->8---

variant3
  :: forall dom a
   . ( HiddenClockResetEnable dom
     , Num a
     , Bits a
     , NFDataX a
     )
  => Signal dom (Maybe a)
     -- ^ Input
  -> Signal dom Bool
     -- ^ Output ready
  -> ( Signal dom (Maybe a)
       -- ^ Output
     , Signal dom Bool
       -- ^ Input ready
     )
variant3 inp outReady = (variant3_0 outReady inp, outReady)
 where
  variant3_0 = hideClockResetEnable variant3_1

  variant3_1
    :: KnownDomain dom
    => Clock dom
    -> Reset dom
    -> Enable dom
    -> Signal dom Bool
    -> Signal dom (Maybe a)
    -> Signal dom (Maybe a)
  variant3_1 clk rst en ready =
    let en0 = CEP.enable en ready
    in withClockResetEnable clk rst en0 variant3_2

variant3_2
  :: ( HiddenClockResetEnable dom
     , Num a
     , Bits a
     , NFDataX a
     )
  => Signal dom (Maybe a)
  -> Signal dom (Maybe a)
variant3_2 =
    plStage (`xor` 32767)
  . plStage (+ 5)
  . plStage (`shiftL` 2)
  . plStage (.&. 682)

plStage
  :: ( HiddenClockResetEnable dom
     , NFDataX a
     )
  => (a -> a)
  -> Signal dom (Maybe a)
  -> Signal dom (Maybe a)
plStage f = register Nothing . (fmap f <$>)

--8<---------------cut here---------------end--------------->8---

Note that I'm using `register` even though we have `regEn` which would
readily (heh) take the boolean ready input. But I liked `variant3_2` a
lot better with just the data flowing through the composition, so I kept
the stuff where the ready input is merged in the `Enable` input.

The `(fmap f <$>)` is silliness, but it's useful. Let's look at the
types in clashi:

>>> :t fmap
fmap :: Functor f => (a -> b) -> f a -> f b
>>> :t (\f -> (fmap f <$>))
(\f -> (fmap f <$>))
:: (Functor f1, Functor f2) => (a -> b) -> f1 (f2 a) -> f1 (f2 b)
>>> :t fmap . fmap
fmap . fmap
:: (Functor f1, Functor f2) => (a -> b) -> f1 (f2 a) -> f1 (f2 b)

The thing is that `Signal` is also a Functor. So when we have a
`Signal (Maybe a)`, that's a Functor over a Functor, and we need to
`fmap` twice to apply a function on `a` inside the `Maybe` inside the
`Signal`. And `<$>` is the infix version of `fmap`, so the following two
are the same thing:

((fmap . fmap) f)
(fmap f <$>)
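
To see the equivalence without a Clash installation, here is a minimal sketch in plain Haskell. It assumes an ordinary list as a stand-in for `Signal` (both are Functors); the names `twice1` and `twice2` are made up for this illustration.

```haskell
-- A plain list stands in for Signal here. That substitution is an
-- assumption made only so the example runs without Clash.
twice1, twice2 :: (a -> b) -> [Maybe a] -> [Maybe b]
twice1 f = (fmap . fmap) f   -- fmap through the list, then through Maybe
twice2 f = (fmap f <$>)      -- the same thing, written as a section of <$>

main :: IO ()
main = do
  let samples = [Nothing, Just 1, Just 2] :: [Maybe Int]
  print (twice1 (+ 1) samples)
  print (twice2 (+ 1) samples)
  -- both lines print [Nothing,Just 2,Just 3]
```

`Nothing` stays `Nothing`, and only the values inside `Just` are touched, which is what `plStage` relies on.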

I hope this gets you on the right track!

Peter.

[1] https://hackage.haskell.org/package/clash-prelude-1.4.2/docs/Clash-Prelude-Moore.html#v:moore

[2] https://hackage.haskell.org/package/clash-prelude-1.4.2/docs/Clash-Prelude-Moore.html#v:medvedev

[3] https://hackage.haskell.org/package/clash-prelude-1.4.2/docs/Clash-Class-BitPack.html

Peter Lebbing

Jun 9, 2021, 11:37:10 AM6/9/21
to clash-l...@googlegroups.com
On Wed, 9 Jun 2021, Peter Lebbing wrote:
> variant3_2 =
> plStage (`xor` 32767)
> . plStage (+ 5)
> . plStage (`shiftL` 2)
> . plStage (.&. 682)

There's no register on the input anymore, make that

variant3_2 =
plStage (`xor` 32767)
. plStage (+ 5)
. plStage (`shiftL` 2)
. plStage (.&. 682)
. register Nothing

(alternatively, `plStage id` would do the same).

And data flows from bottom to top, so it's ANDed first, then shifted,
then addition, then XOR.
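
A small plain-Haskell sketch of that ordering, assuming we drop the registers and the `Maybe` wrapper and only compose the four combinational functions (the name `stages` is hypothetical):

```haskell
import Data.Bits (shiftL, xor, (.&.))

-- Only the combinational part of the pipeline, without registers.
-- Function composition applies the bottom function first.
stages :: Int -> Int
stages =
    (`xor` 32767)
  . (+ 5)
  . (`shiftL` 2)
  . (.&. 682)

main :: IO ()
main = do
  print (1023 .&. 682 :: Int)  -- first step only: 682
  print (stages 1023)          -- 682 -> 2728 -> 2733 -> 30034
```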

HTH,

Peter.

Mahshid Shahmohammadian

Jun 9, 2021, 12:08:10 PM6/9/21
to Clash - Hardware Description Language
Thank you all for the effort, I've got many ideas of what could have been wrong with the second code (register with fold) and what other options to consider!

Mahshid

Peter Lebbing

Jun 10, 2021, 7:06:05 AM6/10/21
to clash-l...@googlegroups.com

Hello Mahshid!

On Wed, 9 Jun 2021, Mahshid Shahmohammadian wrote:

> Thank you all for the effort, I've got many ideas of what could have
> been wrong with the second code (register with fold) and what other
> options to consider!

You're welcome, I hope you'll like using Clash!

It occurred to me that I'm not listening to my own advice! :-) I'm using complicated syntax for register in plStage when I said that moore is much more readable.

This one is so much nicer and identical in function:

plStage f = moore (const (fmap f)) id Nothing

I hadn't mentioned that you can use Hoogle for Clash documentation as well:

https://hoogle.haskell.org/?hoogle=moore

It gives two results inside clash-prelude, one with explicit clock, reset and enable and one with implicit ones. I'm using the implicit one here.
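
For intuition about why the `moore` form is a one-cycle delay that applies `f`, here is a rough list-based model in plain Haskell. `mooreModel` and `plStageModel` are hypothetical helpers written for this sketch; they are not Clash's `moore`, they ignore clock, reset and enable, and they only work on finite lists.

```haskell
-- List-based model of a Moore machine (not Clash's moore).
-- The output at cycle n depends only on the state at cycle n;
-- the state at cycle 0 is the initial state.
mooreModel :: (s -> i -> s) -> (s -> o) -> s -> [i] -> [o]
mooreModel trans out s0 inputs = map out (init (scanl trans s0 inputs))

-- Same shape as: plStage f = moore (const (fmap f)) id Nothing
plStageModel :: (a -> a) -> [Maybe a] -> [Maybe a]
plStageModel f = mooreModel (const (fmap f)) id Nothing

main :: IO ()
main = do
  let inputs = [Just 1, Nothing, Just 3] :: [Maybe Int]
  print (plStageModel (+ 1) inputs)
  -- prints [Nothing,Just 2,Nothing]
```

The first output is the initial state `Nothing`, and every later output is the previous input with `f` applied inside the `Maybe`.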


And I thought that Google Groups destroyed the formatting of code for everyone, but it appears that only happens for plain text mails; people using HTML mails can apparently format code properly. So I'll try to paste all my code below in a form that is not a pain to read.

The two variants from my first mail, as an example of "mealy style" and "Signal composition style" are:

variant1
  :: HiddenClockResetEnable dom
  => Signal dom Bool
  -> Signal dom (Unsigned 8)
  -> (Signal dom Bool, Signal dom (Unsigned 8))
variant1 valid inp = mealyB transF (repeat False, 0, 0) (valid, inp)
 where
  transF
    :: (Vec 2 Bool, Unsigned 8, Unsigned 8)
    -> (Bool, Unsigned 8)
    -> ( (Vec 2 Bool, Unsigned 8, Unsigned 8)
       , (Bool, Unsigned 8))
  transF (valids, d1, d2) (valid, inp) =
    let valids0 = valid +>> valids
        d1_0 = inp + 3
        d2_0 = d1 `shiftL` 1
        validO = last valids
    in ((valids0, d1_0, d2_0), (validO, d2))

variant2
  :: forall dom
   . HiddenClockResetEnable dom
  => Signal dom Bool
  -> Signal dom (Unsigned 8)
  -> (Signal dom Bool, Signal dom (Unsigned 8))
variant2 valid inp = (validO, d2)
 where
  valids :: Signal dom (Vec 2 Bool)
  valids = register (repeat False) (liftA2 (+>>) valid valids)
  validO = last <$> valids

  d1 = register 0 (inp + 3)
  d2 = register 0 ((`shiftL` 1) <$> d1)


The variants of the pipeline from my second mail:

var2Trans
  :: Num a
  => Vec 17 (Maybe a)
  -> Maybe a
  -> (Vec 17 (Maybe a), Maybe a)
var2Trans s i = (s', last s)
 where
  s' = i :> zipWith (flip (fmap . (+)))
                    (init s)
                    (    2 :>  3 :>  5 :>  7 :> 11 :> 13 :> 17 :> 19
                     :> 23 :> 29 :> 31 :> 37 :> 41 :> 43 :> 47 :> 53 :> Nil)

variant3

variant3_2 =
    plStage (`xor` 32767)
  . plStage (+ 5)
  . plStage (`shiftL` 2)
  . plStage (.&. 682)
  . register Nothing


plStage
  :: ( HiddenClockResetEnable dom
     , NFDataX a
     )
  => (a -> a)
  -> Signal dom (Maybe a)
  -> Signal dom (Maybe a)

plStage f = moore (const (fmap f)) id Nothing

--8<---------------cut here---------------end--------------->8---

I'd really like to improve the situation with needing about 15 lines of code just to do

    let en0 = CEP.enable en ready

so I'm going to see if we can do something about that.


HTH,

Peter.

Mahshid Shahmohammadian

Jun 14, 2021, 11:27:05 AM6/14/21
to clash-l...@googlegroups.com
Hi Peter Lebbing,

Thanks for your help! I have one question, here you mention:

> Secondly, you want to stall the pipeline when the thing connected to the
> output is not ready, and you want to be able to reset the thing to an
> initial value. Many of our building blocks have a context that states
> `HiddenClockResetEnable dom`. This context asserts that the function has
> a clock domain called `dom` (here universally quantified) and three
> hidden parameters, a `Clock dom`, a `Reset dom` and an `Enable dom`.
> These signals will become explicit during generation of HDL for your
> synthesis tool, and can be connected there. But especially `Enable` can
> be used inside a design and is perfect for stalling a pipeline.

Why do you think this should be linked to Enable and not Reset? How can I tell the Clash compiler to connect the reset signal used in my function to the default reset port after VHDL generation?
Also, I could not find CEP.enable in Clash.Explicit.Prelude, I see that you have: import qualified Clash.Explicit.Prelude as CEP

Can you point me to more documents to read about these implicit and explicit Clock Reset Enable in Clash system? and when we should hide and expose them?

Thanks a lot.
Mahshid



Peter Lebbing

Jun 15, 2021, 12:19:10 PM6/15/21
to clash-l...@googlegroups.com
Hi Mahshid,

On 2021-06-14 17:26, Mahshid Shahmohammadian wrote:

> Also, I could not find CEP.enable in Clash.Explicit.Prelude, I see
> that you have: import qualified Clash.Explicit.Prelude as CEP

That is a style of import that imports the specified module under a user-given name. There are many ways to change how an import makes names available. This one says that all names exported by Clash.Explicit.Prelude should be made available under the qualification CEP. So all of f, g, h and j are identical to each other:


import Clash.Explicit.Prelude

f = enable
g = Clash.Explicit.Prelude.enable

- or -

import qualified Clash.Explicit.Prelude

h = Clash.Explicit.Prelude.enable

- or -

import qualified Clash.Explicit.Prelude as CEP

j = CEP.enable

There are a bunch more combinations; I'm just showing these to illustrate what CEP does. It's not a proper part of the name of the function, it's a label locally given to that module.
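
The same mechanism can be tried without Clash. This is a minimal sketch that assumes `Data.List` from base as a stand-in module and `L` as the local label; all three import forms can coexist in one file.

```haskell
import Data.List                 -- unqualified and fully qualified names
import qualified Data.List       -- fully qualified names only
import qualified Data.List as L  -- names under the local label L

-- All three refer to the very same function.
f, g, j :: [Int] -> [Int]
f = sort
g = Data.List.sort
j = L.sort

main :: IO ()
main = do
  print (f [3, 1, 2])  -- [1,2,3]
  print (g [3, 1, 2])  -- [1,2,3]
  print (j [3, 1, 2])  -- [1,2,3]
```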


> Can you point me to more documents to read about these implicit and
> explicit Clock Reset Enable in Clash system? and when we should hide
> and expose them?

https://hackage.haskell.org/package/clash-prelude-1.4.2/docs/Clash-Signal.html#g:10
discusses hidden clocks, resets and enables, although I am just in the process of improving that text a bit (no major overhaul). Part of the text is still from when we just had hidden clocks and resets, before we added hidden enables.

These implicit parameters were introduced to reduce the amount of boring simple signal passing in your design. The following variation on my previous code is properly functioning Clash code:
 
 

import Clash.Explicit.Prelude

variant3
  :: ( KnownDomain dom
     , Num a
     , Bits a
     , NFDataX a
     )
  => Clock dom
  -> Reset dom
  -> Enable dom
  -> Signal dom (Maybe a)
     -- ^ Input
  -> Signal dom Bool
     -- ^ Output ready
  -> ( Signal dom (Maybe a)
       -- ^ Output
     , Signal dom Bool
       -- ^ Input ready
     )
variant3 clk rst en inp outReady
  = (variant3_2 clk rst en0 inp, outReady)
 where
  en0 = enable en outReady

variant3_2
  :: ( KnownDomain dom
     , Num a
     , Bits a
     , NFDataX a
     )
  => Clock dom
  -> Reset dom
  -> Enable dom
  -> Signal dom (Maybe a)
  -> Signal dom (Maybe a)
variant3_2 clk rst en =
    plStage (`xor` 32767) clk rst en
  . plStage (+ 5)         clk rst en
  . plStage (`shiftL` 2)  clk rst en
  . plStage (.&. 682)     clk rst en
  . register clk rst en Nothing

plStage
  :: ( KnownDomain dom
     , NFDataX a
     )
  => (a -> a)
  -> Clock dom
  -> Reset dom
  -> Enable dom
  -> Signal dom (Maybe a)
  -> Signal dom (Maybe a)
plStage f clk rst en = moore clk rst en (const (fmap f)) id Nothing
 
 
 
Please note that my code did not start with the usual "import Clash.Prelude", do not include that. Instead it imports "Clash.Explicit.Prelude", the version of our Prelude that does not use implicit clocks, resets and enables.

This style of programming gets rather tedious when all we do with those clocks, resets and enables is pass them along unaltered. Now I did alter the "Enable" at one point, and interestingly there the implicit version got tedious instead for a moment. I'm working on that, but it is not my top priority. It'll come, in the new version of Clash.

In the simplest case, if your whole design is one clock domain, with an external global reset and optionally an external global enable, you could write your whole design completely implicitly, i.e. without doing anything with your Hidden constraints yourself. Then Clash will generate the entities with clock, reset and enable lines in the generated HDL, and you can connect them externally in your synthesis tooling. Do note that the reset line in particular needs to abide by the restrictions the FPGA imposes on them, like for instance synchronous de-asserting.

But this is not really recommended, it's quick-and-dirty. As the tutorial states, it is common to have the top entity of your design in explicit form. That part of the tutorial shows the most basic case of just exposing the signals on the subordinate(s) of the top entity.

But as the tutorial shows in the Blinker example, it is also common to generate clock and reset signals in Clash.

My advice is that as soon as you actually do meaningful things with clocks, resets and/or enables, they should be explicit. The implicit version is for when all you do is pass along the same clock, reset and enable signals to all sequential components in that part of the circuit.



> Why do you think this should be linked to Enable and not Reset?

In your original code, you had a separate input that would reset the circuit to its initial state, and an input that stalled the pipeline when the consumer on the output was not ready to consume a datum.

I was glossing over the Reset; it's not uncommon for there to be only one signal in the whole clock domain that is the reset. So I meant to imply that such a reset can take place at the top entity for that clock domain. Of course, you could create multiple separate reset domains within the clock domain if that is beneficial for your design.

I focussed on the Enable because it is more common to have multiple enable signals in one clock domain, each gating a separate part of the design. I created a separate enable signal for the memory elements in your pipeline such that specifically the pipeline stalls. Everything that should stall when the consumer at the output is not ready can get that same enable signal. But the whole clock domain should not have its Enable de-asserted by the consumer's readiness, because that would probably include the consumer itself: the consumer would be locked in a state where it cannot accept, and everything just stops.
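
As a rough illustration of what a de-asserted enable does to a memory element (the "if without else" from the start of this thread), here is a list-based model in plain Haskell. `regEnModel` is a hypothetical helper for this sketch, not a Clash function; it ignores clock and reset and only works on finite lists.

```haskell
-- Model of a register with an enable input.
-- Each input sample is (enable, data). When enable is False the
-- register keeps its old contents, so no "else" branch has to
-- assign anything: holding the value is what the flipflop does.
regEnModel :: a -> [(Bool, a)] -> [a]
regEnModel s0 = init . scanl step s0
 where
  step s (en, i) = if en then i else s

main :: IO ()
main = do
  let samples = [(True, 1), (False, 2), (True, 3), (True, 4)] :: [(Bool, Int)]
  print (regEnModel 0 samples)
  -- prints [0,1,1,3]
```

In the second cycle the enable is off, so the `2` on the input is ignored and the register keeps showing `1` for one more cycle.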


> How can I tell CLash compiler to connect the reset signal used in my
> function to the default reset port after VHDL generation?

Both when you have a hidden reset and when you have an explicit reset on the top entity, the generated HDL will contain a reset line for you to connect:
 
 
 

topEntity
  :: Clock System
  -> Reset System
  -> Enable System
  -> Signal System (Unsigned 8)
  -> Signal System (Unsigned 8)
topEntity = exposeClockResetEnable (register 0)
 

will produce the following VHDL:
 

entity topEntity is
  port(-- clock
       clk    : in TopEntReset_topEntity_types.clk_System;
       -- reset
       rst    : in TopEntReset_topEntity_types.rst_System;
       -- enable
       en     : in TopEntReset_topEntity_types.en_System;
       s      : in unsigned(7 downto 0);
       result : out unsigned(7 downto 0));
end;
 

and
 
 

topEntity
  :: SystemClockResetEnable
  => Signal System (Unsigned 8)
  -> Signal System (Unsigned 8)
topEntity = register 0
 

will produce the following VHDL:
 

entity topEntity is
  port(-- clock
       \c$$d(%,,%)\   : in TopEntReset_topEntity_types.clk_System;
       -- reset
       \c$$d(%,,%)_0\ : in TopEntReset_topEntity_types.rst_System;
       -- enable
       \c$$d(%,,%)_1\ : in TopEntReset_topEntity_types.en_System;
       s              : in unsigned(7 downto 0);
       result         : out unsigned(7 downto 0));
end;
 
The default names are definitely not as nice, they look like ASCII emoji ;-). But it works.

I hope this clarifies things more.

Have a good day,

Peter.
--
I use the GNU Privacy Guard (GnuPG) in combination with Enigmail.
You can send me encrypted mail if you want some privacy.
My key is available at <http://digitalbrains.com/2012/openpgp-key-peter>

Leon Schoorl

Jun 17, 2021, 7:49:58 AM6/17/21
to clash-l...@googlegroups.com
Somewhat confusingly enable isn't listed in the documentation for Clash.Explicit.Prelude: https://hackage.haskell.org/package/clash-prelude-1.4.2/docs/Clash-Explicit-Prelude.html
There you can see enable is exported from three different places: Clash.Explicit.Signal, Clash.Explicit.Prelude.Safe, Clash.Explicit.Prelude

On Tue, 15 Jun 2021 at 18:19, Peter Lebbing <pe...@qbaylogic.com> wrote:

Peter Lebbing

Jun 17, 2021, 8:32:22 AM6/17/21
to clash-l...@googlegroups.com
On Thu, 17 Jun 2021, Leon Schoorl wrote:
> Somewhat confusingly enable isn't listed in the documentation for
> Clash.Explicit.Prelude:

My apologies for my incomplete answer! I had completely failed to
realise we don't list documentation for everything in the preludes. It
would make an already massive documentation page completely unwieldy if
we did that, unfortunately.

Yes, the Index is your friend, and Hoogle:

https://hoogle.haskell.org/?hoogle=clash-prelude.enable

Peter.