Array of bits

pozz

unread,

May 20, 2021, 3:22:43 AM5/20/21

to

Many times I need to pass to functions or serialize an array of bits. If
they are just a few (8-16 bits), I decide to use a standard array:

uint8_t my_short_array_of_bits[16];
void abits_set_bit(uint8_t *array, size_t size, uint8_t nbit);
uint8_t abits_get_bit(uint8_t *arrray, size_t size, uint8_t nbit);

It sometimes occurs that the number of bits is much greater, maybe 100
or 200. If I developed on desktop/server machine, I would continue using
array of uint8_t, but I'm developing on embedded platforms with limited
resources.

So I started using uint32_t for 32 bits. It is very comfortable to test,
set and clear a bit. I often need to set the N lowest bits or get the
lowest bit set (__builtin_ctz).

For a recent project, I needed to increase the number of bits to 60, so
the first idea is to use uint64_t. However I'm thinking to try to
generalize for a bigger number of bits, because tomorrow someone will
ask to increase the length again to 100 or 200.

Do you have some suggestions for this? Maybe in the same project I need
to use an array of 100 bits and an array of 200 bits. I would prefer a
typedef for both types, for example:

typedef uint8_t inputs_t[100];
uint8_t inputs_get_bit(inputs_t inputs, unsigned int nbit);
void inputs_set_bit(...);
void inputs_reset_bit(...);
void inputs_toggle_bit(...);
int inputs_ctz(inputs_t inputs);
void inputs_set_low_bits(inputs_t inputs, unsigned int nbits);

typedef uint8_t outputs_t[200];
uint8_t outputs_get_bit(outputs_t inputs, unsigned int nbit);
...

Maybe I can create a .h file with preprocessor macro that will define
static inline functions automatically for many types of arrays.
Something similar to:

#define ABITS_TYPE_NAME inputs
#define ABITS_LENGTH 100
#include "abits.h"

--- abits.h ---
#define ABITS_TYPE CONCAT(ABITS_TYPE_NAME, _t)
typedef uint8_t ABITS_TYPE[ABITS_LENGTH];

#define ABITS_GET_FUNC CONCAT(ABITS_TYPE_NAME, _get_bit)
static inline uint8_t
ABITS_GET_FUNC(ABITS_TYPE ABITS_TYPE_NAME, unsigned int nbits) {
...
}

...
--- end of abits.h ---

I know I can try to write this .h file, but maybe there are some public
domain code ready-to-use.

Don Y

unread,

May 21, 2021, 7:05:59 PM5/21/21

to

On 5/20/2021 12:22 AM, pozz wrote:
> Many times I need to pass to functions or serialize an array of bits. If they
> are just a few (8-16 bits), I decide to use a standard array:
>
> uint8_t my_short_array_of_bits[16];
> void abits_set_bit(uint8_t *array, size_t size, uint8_t nbit);
> uint8_t abits_get_bit(uint8_t *arrray, size_t size, uint8_t nbit);

Why would you want to allocate a BYTE (uint8_t) for EACH BIT?
8 bits would fit into:
uint8_t my_short_array_of_bits[1];
while 16 would fit in:
uint8_t my_short_array_of_bits[2];
generally:
uint8_t my_short_array_of_bits[(NUMBITS+7)/8];

> It sometimes occurs that the number of bits is much greater, maybe 100 or 200.

uint8_t my_short_array_of_bits[(100+7)/8];
uint8_t my_short_array_of_bits[(200+7)/8];

> If I developed on desktop/server machine, I would continue using array of
> uint8_t, but I'm developing on embedded platforms with limited resources.

So?

> So I started using uint32_t for 32 bits. It is very comfortable to test, set
> and clear a bit. I often need to set the N lowest bits or get the lowest bit
> set (__builtin_ctz).
>
> For a recent project, I needed to increase the number of bits to 60, so the
> first idea is to use uint64_t. However I'm thinking to try to generalize for a
> bigger number of bits, because tomorrow someone will ask to increase the length
> again to 100 or 200.
>
> Do you have some suggestions for this? Maybe in the same project I need to use
> an array of 100 bits and an array of 200 bits. I would prefer a typedef for
> both types, for example:

Ideally, you would conditionally pick your underlying implementation
(i.e., whether to use uint8 vs unint32 vs uint64) based on whatever the
native hardware best supports.

Using a wider base type can be less efficient on targets that have narrower
natural data sizes. E.g., support for a uint32 on an 8 bit processor may
ADD code where a "more natural" 8 bit type would better fit with the
instruction set/architecture.

E.g., I use BigRationals in some of my math. These are pairs of BigIntegers
(which, in turn, are implemented as arrays of some convenient *base* data type
wide enough to support the value(s) they are being asked to represent.
So, if the value is currently 0x123456789, you'd need at least 33 bits to
represent the value. On an architecture where bytes are the nominal data
type, this would require an array of 5 bytes; on a 16bit architecture, it
would require 3 words; on a 32bit architecture, 2 longs.

In my case, the size of the array is hidden from the type; a BigInteger
value of 0x123456789 would consume less resources than one having the value
0x112233445566778899 -- but, both would still be treated as "BigIntegers".

If you want to create stronger typing and create a type for each possible
bit-array size, then you'll need to "templatize" the routines that process
those data types as a bit_array_16_t is not type compatible with a
bit_array_18_t -- even though the underlying implementations are strikingly
similar.

OTOH, if you just treat them all as "bit_array_t" -- and assume responsibility
for manually ensuring that you've specified the correct size for each
argument processed, then you can share a parameterized (instead of templatized)
implementation.

boB

unread,

May 22, 2021, 3:17:21 AM5/22/21

to

On Fri, 21 May 2021 16:05:32 -0700, Don Y
<blocked...@foo.invalid> wrote:

>On 5/20/2021 12:22 AM, pozz wrote:
>> Many times I need to pass to functions or serialize an array of bits. If they
>> are just a few (8-16 bits), I decide to use a standard array:
>>
>> uint8_t my_short_array_of_bits[16];
>> void abits_set_bit(uint8_t *array, size_t size, uint8_t nbit);
>> uint8_t abits_get_bit(uint8_t *arrray, size_t size, uint8_t nbit);
>
>Why would you want to allocate a BYTE (uint8_t) for EACH BIT?
>8 bits would fit into:
> uint8_t my_short_array_of_bits[1];
>while 16 would fit in:
> uint8_t my_short_array_of_bits[2];
>generally:
> uint8_t my_short_array_of_bits[(NUMBITS+7)/8];

Don, isn't this kind of like Bit Banding for an ARM Corte-M3, 4, 7 etc
? Except I ~think~ that ARM uses more than 1 byte for each bit ?

I wouldn't prefer not to use 1 byte to represent 1 bit but I have seen
it done before for true/false I think back in the 8 bit days

Don Y

unread,

May 22, 2021, 3:50:59 AM5/22/21

to

On 5/22/2021 12:17 AM, boB wrote:
> On Fri, 21 May 2021 16:05:32 -0700, Don Y
> <blocked...@foo.invalid> wrote:
>
>> On 5/20/2021 12:22 AM, pozz wrote:
>>> Many times I need to pass to functions or serialize an array of bits. If they
>>> are just a few (8-16 bits), I decide to use a standard array:
>>>
>>> uint8_t my_short_array_of_bits[16];
>>> void abits_set_bit(uint8_t *array, size_t size, uint8_t nbit);
>>> uint8_t abits_get_bit(uint8_t *arrray, size_t size, uint8_t nbit);
>>
>> Why would you want to allocate a BYTE (uint8_t) for EACH BIT?
>> 8 bits would fit into:
>> uint8_t my_short_array_of_bits[1];
>> while 16 would fit in:
>> uint8_t my_short_array_of_bits[2];
>> generally:
>> uint8_t my_short_array_of_bits[(NUMBITS+7)/8];
>
> Don, isn't this kind of like Bit Banding for an ARM Corte-M3, 4, 7 etc
> ? Except I ~think~ that ARM uses more than 1 byte for each bit ?
>
> I wouldn't prefer not to use 1 byte to represent 1 bit but I have seen
> it done before for true/false I think back in the 8 bit days

Sure, you can use a long long to represent a bit -- if handling long longs
is efficient (for your architecture) *and* if you have the resources.

But, you only *need* a single bit.

Some multiple-bit operations (e.g., set all bits to the right or left
of a particular bit) are far more efficient when bits are packed into
a "datum" (byte/word/long/etc.) -- you don't need to do multiple
load/stores to access them at different addresses.

More than one bit set in a "variable" (useful for scanning keypads)?

if ((variable - 1) & variable) {
// multiple bits set
} else {
// only one bit set
}

By contrast, if each "bit" was an array element, you'd have to
walk through the array -- the *entire* array (or, at least enough
of it to conclusively rule out "only one bit")

Count number of set bits?

int
bitcount(uint32_t value) {
value = value - ((value >> 0001) & 0x55555555);
value = (value & 0x33333333) + ((value >> 0010) & 0x33333333);
value = (value + (value >> 0100)) & 0x0F0F0F0F;
return (value * 0x01010101) >> (0011 * 01000);
}

[hopefully I didn't misremember any typos!]

constant time, regardless of number of set bits (no branches).
And, *faster*/smaller than examining individual elements of
an array.

Etc. (there are lots of clever tricks that can be applied when
bits are *packed*)

The OP doesn't explain what he intends to do with these data.
What other operations will be performed on them? *Between* them?
merge_bit_arrays(array1, array2)
intersect_bit_arrays(...)
etc.

As always, horses for courses...

Don Y

unread,

May 22, 2021, 4:04:04 AM5/22/21

to

On 5/22/2021 12:50 AM, Don Y wrote:
> int
> bitcount(uint32_t value) {
> value = value - ((value >> 0001) & 0x55555555);
> value = (value & 0x33333333) + ((value >> 0010) & 0x33333333);
> value = (value + (value >> 0100)) & 0x0F0F0F0F;
> return (value * 0x01010101) >> (0011 * 01000);
> }

Ugh! Sorry, I'm mixing language characteristics. In GNU C, the
constants that appear to be "octal" (but are intended to be binary)
would be prefaced with "0b". In Limbo, you'd use the "2r" prefix.

Dimiter_Popoff

unread,

May 22, 2021, 8:13:40 AM5/22/21

to

To make the thread somewhat less dull let me state
(to the delight of David :-) that true big endian (power like)
shows up as being the correct way when doing bitmaps.
Bit 0 is the left most bit in the byte at the lowest offset
etc. Nicely ordered left to right...

Don Y

unread,

May 22, 2021, 8:03:08 PM5/22/21

to

With modern debuggers, you'd not bother to look at the "raw"
data in memory but, rather, display the "variable" directly
(letting the debugger make it "look pretty")

I'm more concerned with things like synchronizing logs between
different hosts working on the same "problem"; being able to
order them temporally as well as causally.

Presently, not the sort of thing a debugger can tackle...

pozz

unread,

May 23, 2021, 10:32:58 AM5/23/21

to

Il 22/05/2021 01:05, Don Y ha scritto:
> On 5/20/2021 12:22 AM, pozz wrote:
>> Many times I need to pass to functions or serialize an array of bits.
>> If they are just a few (8-16 bits), I decide to use a standard array:
>>
>> uint8_t my_short_array_of_bits[16];
>> void abits_set_bit(uint8_t *array, size_t size, uint8_t nbit);
>> uint8_t abits_get_bit(uint8_t *arrray, size_t size, uint8_t nbit);
>
> Why would you want to allocate a BYTE (uint8_t) for EACH BIT?
> 8 bits would fit into:
>    uint8_t my_short_array_of_bits[1];
> while 16 would fit in:
>    uint8_t my_short_array_of_bits[2];
> generally:
>    uint8_t my_short_array_of_bits[(NUMBITS+7)/8];

Because the bits are just a few, so I don't lost many space and loops
for operations on bit are very fast.

Of course, this decision is made when the bit are completely indipendent
data and I don't interested in classical "bit operations".

For example, when you have 10 boolean options for your software. Why
should you pack those options in a bitmask if they are unrelated?

>> It sometimes occurs that the number of bits is much greater, maybe 100
>> or 200.
>
> uint8_t my_short_array_of_bits[(100+7)/8];
> uint8_t my_short_array_of_bits[(200+7)/8];
>
>> If I developed on desktop/server machine, I would continue using array
>> of uint8_t, but I'm developing on embedded platforms with limited
>> resources.
>
> So?
>
>> So I started using uint32_t for 32 bits. It is very comfortable to
>> test, set and clear a bit. I often need to set the N lowest bits or
>> get the lowest bit set (__builtin_ctz).
>>
>> For a recent project, I needed to increase the number of bits to 60,
>> so the first idea is to use uint64_t. However I'm thinking to try to
>> generalize for a bigger number of bits, because tomorrow someone will
>> ask to increase the length again to 100 or 200.
>>
>> Do you have some suggestions for this? Maybe in the same project I
>> need to use an array of 100 bits and an array of 200 bits. I would
>> prefer a typedef for both types, for example:
>
> Ideally, you would conditionally pick your underlying implementation
> (i.e., whether to use uint8 vs unint32 vs uint64) based on whatever the
> native hardware best supports.

Yes, but my question was related to a "generic" approach that can be
used if the bits are 32, 64 or 256. In the latter case, arrays are
necessary.

> Using a wider base type can be less efficient on targets that have narrower
> natural data sizes. E.g., support for a uint32 on an 8 bit processor may
> ADD code where a "more natural" 8 bit type would better fit with the
> instruction set/architecture.
>
> E.g., I use BigRationals in some of my math. These are pairs of
> BigIntegers
> (which, in turn, are implemented as arrays of some convenient *base*
> data type
> wide enough to support the value(s) they are being asked to represent.
> So, if the value is currently 0x123456789, you'd need at least 33 bits to
> represent the value. On an architecture where bytes are the nominal data
> type, this would require an array of 5 bytes; on a 16bit architecture, it
> would require 3 words; on a 32bit architecture, 2 longs.
>
> In my case, the size of the array is hidden from the type; a BigInteger
> value of 0x123456789 would consume less resources than one having the value
> 0x112233445566778899 -- but, both would still be treated as "BigIntegers".
>
> If you want to create stronger typing and create a type for each possible
> bit-array size, then you'll need to "templatize" the routines that process
> those data types as a bit_array_16_t is not type compatible with a
> bit_array_18_t -- even though the underlying implementations are strikingly
> similar.

Yes, something similar. I only asked if there was a public domain code
with this kind of approach.

> OTOH, if you just treat them all as "bit_array_t" -- and assume
> responsibility
> for manually ensuring that you've specified the correct size for each
> argument processed, then you can share a parameterized (instead of
> templatized)
> implementation.

Yes, this is another solution.

Don Y

unread,

May 23, 2021, 4:08:39 PM5/23/21

to

On 5/23/2021 7:32 AM, pozz wrote:
> Il 22/05/2021 01:05, Don Y ha scritto:
>> On 5/20/2021 12:22 AM, pozz wrote:
>>> Many times I need to pass to functions or serialize an array of bits. If
>>> they are just a few (8-16 bits), I decide to use a standard array:
>>>
>>> uint8_t my_short_array_of_bits[16];
>>> void abits_set_bit(uint8_t *array, size_t size, uint8_t nbit);
>>> uint8_t abits_get_bit(uint8_t *arrray, size_t size, uint8_t nbit);
>>
>> Why would you want to allocate a BYTE (uint8_t) for EACH BIT?
>> 8 bits would fit into:
>> uint8_t my_short_array_of_bits[1];
>> while 16 would fit in:
>> uint8_t my_short_array_of_bits[2];
>> generally:
>> uint8_t my_short_array_of_bits[(NUMBITS+7)/8];
>
> Because the bits are just a few, so I don't lost many space and loops for
> operations on bit are very fast.
>
> Of course, this decision is made when the bit are completely indipendent data
> and I don't interested in classical "bit operations".
>
> For example, when you have 10 boolean options for your software. Why should you
> pack those options in a bitmask if they are unrelated?

Why should you pack them into a bit_array_t -- if they are unrelated?

Bool flagX;
Bool flag3;
Bool running;
Bool OK;

Putting them into an array suggests that they are related to each other
more than simply by the fact that they are of the same *type*.

You don't put all of your "integers" into a int_array_t, do you?

int count;
int books;
int iteration;
int bank_balance;

>>> Do you have some suggestions for this? Maybe in the same project I need to
>>> use an array of 100 bits and an array of 200 bits. I would prefer a typedef
>>> for both types, for example:
>>
>> Ideally, you would conditionally pick your underlying implementation
>> (i.e., whether to use uint8 vs unint32 vs uint64) based on whatever the
>> native hardware best supports.
>
> Yes, but my question was related to a "generic" approach that can be used if
> the bits are 32, 64 or 256. In the latter case, arrays are necessary.

Whether or not an array is necessary depends on the size of the "base type"
on which the array will be built. If your processor handles unit8_t's
efficiently, you'd pack 8 bits into a uint8_t and then number_of_bits/8
of these into a bit_array_t. If your processor handles uint32_t's
efficiently, you'd pack 32 of them into a uint32_t and then pack
number_of_bits/32 of these into a bit_array_t.

Note that "handles efficiently" may not be the same as the
native data size for your processor. E.g., a 32b CPU with
an 8 bit data path would likely handle uint8_t's more efficiently
than uint32_t's -- because it could reduce the array member access
to a single "byte access" (memory cycle). OTOH, if the cost of
accessing a uint32_t is the same (or lower!), then you'd opt
for that as a base type.

Once you have selected a base type, the code is the same
(except for adjustments in manifest constants related to
the number of bits packed in each byte)

>> Using a wider base type can be less efficient on targets that have narrower
>> natural data sizes. E.g., support for a uint32 on an 8 bit processor may
>> ADD code where a "more natural" 8 bit type would better fit with the
>> instruction set/architecture.
>

> Yes, something similar. I only asked if there was a public domain code with
> this kind of approach.

<shrug> It would likely take longer to FIND than it would take to WRITE.
Pseudocode:

// note that this relies on BASE_T being the "expected" width
// AND sizeof yielding that value!
// if it is greater, then bits of memory get wasted

#define BASE_T ... // unit8_t, uint16_t, uint32_t, etc.
#define BASE_BITS (8 * sizeof(BASE_T))

result_t
test_bit(
uint bit, // [0,size)
BASE_T *array, // (size+7)/BASE_BITS elements
uint size // (0,UINT_MAX]
) {
ASSERT( size > 0 ); // makes no sense to have zero or fewer bits!

// ensure the referenced bit resides within the structure
if ( (bit < 0) || (bit >= size) ) {
return ERROR;
}

if ( array[bit/8] & (1 << (bit % BASE_BITS)) ) {
return (1);
} else {
return (0);
}
}

[No guarantees as I just rolled out of bed! :> ]

This implementation counts bits "the normal way"
(from LSb upward).

Pick different BASE_T's and see what the code looks like.
Or, profile it's execution on YOUR hardware to see if there
are advantages to one BASE_T over others.

Note that the larger the BASE_T, the greater the chance for
"wasted bits" (e.g., if you only need 5 bits, then using
a ulonglong will "cost" you considerably more than you *need*!)

By comparison, an implementation that uses uint8_t's for each "bit"
would likely look like:

result_t
test_bit(
uint bit, // [0,size)
BASE_T *array, // size elements
uint size // (0,UINT_MAX]
) {
ASSERT( size > 0 ); // makes no sense to have zero or fewer bits!

// ensure the referenced bit resides within the structure
if ( (bit < 0) || (bit >= size) ) {
return ERROR;
}

if ( array[bit] ) {
return (1);
} else {
return (0);
}
}

i.e., once you start considering the overhead of error checking,
there's little performance gain.

>> OTOH, if you just treat them all as "bit_array_t" -- and assume responsibility
>> for manually ensuring that you've specified the correct size for each
>> argument processed, then you can share a parameterized (instead of templatized)
>> implementation.
>
> Yes, this is another solution.

As implemented, above. But, now the compiler can't tell you if you've used
the wrong type in a particular line of code.

Time for breakfast (which most people call "lunch"!). Sunday... The Pork Dish!

Don Y

unread,

May 23, 2021, 4:17:39 PM5/23/21

to

On 5/23/2021 1:08 PM, Don Y wrote:

> result_t
> test_bit(
> uint bit, // [0,size)
> BASE_T *array, // size elements
> uint size // (0,UINT_MAX]
> ) {
> ASSERT( size > 0 ); // makes no sense to have zero or fewer bits!
>
> // ensure the referenced bit resides within the structure
> if ( (bit < 0) || (bit >= size) ) {

Note that many compilers will flag the "bit < 0" test as unnecessary (because
I've chosen to define it as unsigned -- it can't take on a negative value!).
However, I include this as it guards against someone deciding to change bit's
data type to signed and introducing a latent error.

Don Y

unread,

May 25, 2021, 4:49:33 AM5/25/21

to

On 5/23/2021 1:08 PM, Don Y wrote:

> if ( array[bit/8] & (1 << (bit % BASE_BITS)) ) {

s/8/BASE_BITS/

Does that make you happy, Jay? :>

(And, no, I'm not going to make the other changes!)