Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Counting the number of ones present ...

123 views
Skip to first unread message

John Ledford

unread,
Jul 8, 2002, 12:17:03 PM7/8/02
to
Hi all,

I am trying to figure out how to write a module that can count the number of
ones present on the input to the module. I have 32 inputs into this module
and I want to put out a number from 0-31 in a 5 bit hex number.

Thoughts or pointers ? Getting this put succinctly into a google search is
quite difficult.... and I was wondering if anyone out there had any good
pointers ... (or even source code for a similar module ? ... yeah, as if I
would be that lucky ;).

Any pointers or thoughts would be appreciated!
Thanks!
John


--
<><><><><><><><><><><><><><><><><><><><>
John Ledford, Design Engineer
Instrumentation Design Laboratory
The University of Kansas
6042 Malott Hall, Lawrence, Kansas 66045
Office: (785) 864-4083
Fax: (785) 864-5396
http://www.idl.ku.edu
jled...@NO.Span.Remove.This-ku.edu
<><><><><><><><><><><><><><><><><><><><>


Mark Schellhorn

unread,
Jul 8, 2002, 12:31:35 PM7/8/02
to
I'm assuming that the code should be synthesizable...

input [31:0] a;
output [4:0] sum;

integer n;

sum = 0;
for (n=0; n<=31; n=n+1) begin
sum = sum + a[n];
end

or:

sum = (a[31] +
a[30]) +
(a[29] +
...
...
a[2]) +
(a[1] +
a[0]);

The first one will have a long ripple delay. You can play with the
parentheses in the second one to force balancing of the adders in
different ways by the synthesis tool. Not using any parentheses will
probably not produce an optimum ckt.

John_H

unread,
Jul 8, 2002, 12:36:00 PM7/8/02
to
In verilog, it's effectively in[0]+in[1]+in[2]+in[3]+...

Keep in mind that 32 inputs requires a 6 bit output for a count of 0 to 32,
inclusive.

Timing will often be a problem and a hierarchical approach can give you the
ability to pipeline. You can use a first stage of logic - 4 input lookups in an
FPGA, for instance - to do a quick count of the first "groups" of signals then
add those counts together to get larger counts.

fully pipelined:

input clk;
input [31:0] in;
output [6:0];
reg [2:0] count4 [7:0];
reg [3:0] count8 [3:0];
reg [4:0] count16 [1:0];
reg [5:0] count32; // 6 bits
integer i,j,k;

always @( posedge clk)
begin
for( i=0; i<8; i=i+1)
count4[i] = in[4*i]+in[4*i+1]+in[4*i+2]+in[4*i+3];
for( j=0; j<4; j=j+1 )
count8[j] <= count4[2*j]+count4[2*j+1];
for( k=0; k<2; k=k+1 )
count16[k] <= count8[2*k]+count8[2*k+1];
count32 = count16[0] + count16[1];
end

You can make it combinatorial in all or in part to get better timing.

John Ledford

unread,
Jul 8, 2002, 2:13:09 PM7/8/02
to
Hi Mark (et al.),

> I'm assuming that the code should be synthesizable...

Agreed. Thanks!


> sum = 0;
> for (n=0; n<=31; n=n+1) begin
> sum = sum + a[n];
> end

> sum = (a[31] +


> a[30]) +
> (a[29] +
> ...
> ...
> a[2]) +
> (a[1] +
> a[0]);

Wow ... I hadn't thought of something so simple. I am used to writing things
that are esoteric functions and letting the synthesis tool figure out how to
make it, but I guess I missed this one ... wow ...

> The first one will have a long ripple delay.

Yeah, and I forgot to mention that as an issue. One problem is that I have
one clock cycle to sample 32 inputs, determine which were not present last
time, count the number of ones present, pick out the minimum value and the
maximum value of the # of inputs present and then put the min / max into
storage registers.

I'd like to do all of that at around 200 MHz or more ... :) .... I am not
holding my breath and I might have to pipeline it, but I did figure out how
to do the min / max asynchronously with any number of inputs :) (32 input (5
bit wide) min comparator with any number of inputs missing or present).


> You can play with the
> parentheses in the second one to force balancing of the adders in
> different ways by the synthesis tool. Not using any parentheses will
> probably not produce an optimum ckt.

I'll do that. I greatly appreciate your time and input!.

Thanks!
John <- who used to do a lot more HDL designs and just got a huge signal
processing type application dumped in his lap after not having done HDL in
years :).

John Ledford

unread,
Jul 8, 2002, 2:25:00 PM7/8/02
to
Hi John,

> In verilog, it's effectively in[0]+in[1]+in[2]+in[3]+...

Cool! Thanks for the help!


> Keep in mind that 32 inputs requires a 6 bit output for a count of 0 to
32,
> inclusive.

Ummm ....errr .... 2^5 = 32 ... so I need 5 bits total to represent 32
states. Did I miss something ? .... I could be confused but I've never seen
this requirement .... help ?


> Timing will often be a problem and a hierarchical approach can give you
the
> ability to pipeline.

Agreed on all accounts. See other message to Mark Schellhorn about what the
project is. I forgot to add that after the end of a timing window that is
started by the first signal going high, I have to be done in around 200ns or
less. I have a LOT to do .... so I am shooting for an optimized flat
approach ... if I can't get it fast enough I may have to turn around and
pipeline it ... area is not a problem, just speed (I haven't even picked the
part family yet and won't until I have some idea how big this monster is
going to be ....).

I am hoping that I can find a way to parallel operations such that
everything is going at the exact same time, i.e. the min / max detector and
the counter are completely seperate operations. The logic they feed can then
take the next clock cycle to propogate ... so if I can get them fast enough
then I don't have to pipeline ... and that is a hope anyhow ....

Of course I will have to figure out how to spec timing constraints ... which
I have managed to never have to do (well, more than one or two signals)
since I have always worked on very highly pipelined synchronous designs ....
the type where there is maybe 2 logic levels between clocks :), and they're
typically slow clocks.... until now ...

I've even considered duplicating detectors and counters and having a sampler
block that sends data to blocks in alternating sequence :). Like I said,
area is not a problem. Design time and speed are the issue.

I love working on one off projects. I hated working commercial products
where end cost was the overriding factor (no, I CAN'T get 110% of the logic
into the chip!!!!, but yet you have to ... so you optimize and optimize and
re-write and re-write ... ARGH!).


Thanks again for your input and the sample code. I really appreciate it!

Mark Schellhorn

unread,
Jul 8, 2002, 2:58:50 PM7/8/02
to
Glad to help.

John_H is correct in that you do need 6 bits to count to 32; unless it's
not possible that you will have 0 ones. If you never have all zeroes you
can code the count with five bits: 0x0 means 1, 0x1 means 2... 0x1F
means 32, etc.

John Ledford

unread,
Jul 8, 2002, 3:12:39 PM7/8/02
to
> Glad to help.

And I appreciate it very much. Solved that sticking point :)


> John_H is correct in that you do need 6 bits to count to 32; unless it's
> not possible that you will have 0 ones. If you never have all zeroes you
> can code the count with five bits: 0x0 means 1, 0x1 means 2... 0x1F
> means 32, etc.

Ahhh ... I get it ... hadn't thought of that .... there is definately the
case where their will be all zero's ... in fact it's more likely that there
will never be 32 ones :).... all zero's will actually be the common input
... and over hundreds of cycles the sum total will be up to 32 ... but I
have to be able to have all 32 hit at once in a test mode that is present in
the system ...

Thanks again for your help!

John <- who needs to get back to real work ... I hadn't subscribed to news
groups on purpose at work until this problem :).

Al Provist

unread,
Jul 8, 2002, 3:17:34 PM7/8/02
to
>
> Ummm ....errr .... 2^5 = 32 ... so I need 5 bits total to represent 32
> states. Did I miss something ? .... I could be confused but I've never
> seen
> this requirement .... help ?
>

2^n is coded with n+1 bits.

example: 2^0 ->1 1 bit
2^1 -> 10 2 bits
...
2^n -> 100..000 n+1 bits.
<--n--->

Of course, you can code 32 states with 5 bits but from 0 to 31. If you want to count 32 inputs set to 1, you need 33 states, 0 input at 1,
1 input at, ..., all 32 inputs at 1. And thus a counter with 6 bits.

A+
Al

John Ledford

unread,
Jul 8, 2002, 3:48:52 PM7/8/02
to

> Of course, you can code 32 states with 5 bits but from 0 to 31. If you
want to count 32 inputs set to 1, you need 33 states, 0 input at 1,
> 1 input at, ..., all 32 inputs at 1. And thus a counter with 6 bits.


Thanks for the clarification! I appreciate it!

John


John_H

unread,
Jul 8, 2002, 4:20:01 PM7/8/02
to
The comparators should compile nicely as well. For the 6'h0-6'h20 result
(0-32), the 6 bit compares require a carry chain of only 6 or 7 bits each when
compared to the max or min value. Just be sure you reset the max tracker to 0
and the min tracker to 32 (or more) such that the first clock results is the
true min/max value equal to that first (only) sample. With the pipelined
approach I showed you, the system should run very nicely in most FPGAs. Some
CPLDs might be troublesome, but FPGAs with built-in carry chains are clean.

You might pipeline the comparison rather than feeding it directly to an enable
or mux to select the new value. In this arrangement you typically only have one
net delay followed by the carry chain directly to the setup of a register.
Using the comparison result directly to 6 enables would add another net delay
onto the carry chain. Only in the tightest timing requirements would you need
to go to the extreme of so many pipeline stages. You might find that you only
need two or 3 pipeline stages rather than the 5 or 6 stages I've been talking
toward.

Enjoy!

0 new messages