bitfields compact packing within a struct or class

MarioCPPP

Apr 14, 2023, 1:19:24 PM

I have come across a directive that is new to me (ALIGNAT), but it
cannot pack bit-fields that span two different bytes.

Is there a way to avoid "padding" within the structure? The
purpose would be to redistribute bits (I mean, mapping 5 bytes
of 8 bits onto 8 fields of 5 bits, all by means of a union
with another integer at least 40 bits wide, using a single
read / write).

Maybe my approach is wrong from the beginning.
In that case, what would you recommend I explore?

I mean something like this (the code is buggy, due to padding!):

#include <cstdint>

struct EightSymbolsFiveBitWide
{
    unsigned Field_0 : 5;
    unsigned Field_1 : 5;
    unsigned Field_2 : 5;
    unsigned Field_3 : 5;
    unsigned Field_4 : 5;
    unsigned Field_5 : 5;
    unsigned Field_6 : 5;
    unsigned Field_7 : 5;
};

struct FiveBytesEightBitWide
{
    uint8_t Bytes [5];
};

union Converter
{
    EightSymbolsFiveBitWide Symbols;
    FiveBytesEightBitWide Bytes;
    unsigned long long FastReadWriter;
    // some of its bits get lost (41 to 64)
};

FiveBytesEightBitWide.Bytes should be the interface to
low-level (byte-based) file access, without padding.

I am playing with converters similar to Base64, with different
sets of symbols (variable bit depths, even using relatively
prime numbers, I mean numbers which share no divisor
except 1).

But I would prefer not to use any bit operators (like &, |,
<<, >>), and instead rely only on data structures that
redistribute the bits.

Any suggestions?
Thanks

--
1) Resist, resist, resist.
2) If everyone pays taxes, taxes are paid by everyone
MarioCPPP

Bo Persson

Apr 14, 2023, 1:45:31 PM

There is no portable way to do this. The standard explicitly says

"Allocation of bit-fields within a class object is
implementation-defined. Alignment of bit-fields is
implementation-defined. Bit-fields are packed into some addressable
allocation unit.

[Note 1: Bit-fields straddle allocation units on some machines and not
on others. Bit-fields are assigned right-to-left on some machines,
left-to-right on others. — end note]"

(http://eel.is/c++draft/class.bit)

Bo Persson

Apr 14, 2023, 1:50:18 PM

Except of course using bit operations "by hand". It is the bit-fields
that are not portable.
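
A minimal sketch of what "by hand" could look like, assuming symbol 0 goes into the least significant 5 bits and byte 0 holds the lowest 8 bits (that ordering, and the names Symbols, Bytes, pack and unpack, are assumptions for illustration and not from the original post):

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Symbols = std::array<std::uint8_t, 8>;  // eight values, each in 0..31
using Bytes   = std::array<std::uint8_t, 5>;  // the same 40 bits as five bytes

// Pack eight 5-bit symbols into five bytes via a 64-bit accumulator.
inline Bytes pack(const Symbols& s)
{
    std::uint64_t acc = 0;
    for (std::size_t i = 0; i < 8; ++i)
        acc |= static_cast<std::uint64_t>(s[i] & 0x1Fu) << (5 * i);

    Bytes out{};
    for (std::size_t i = 0; i < 5; ++i)
        out[i] = static_cast<std::uint8_t>(acc >> (8 * i));
    return out;
}

// Inverse of pack(): rebuild the accumulator, then slice it into quintets.
inline Symbols unpack(const Bytes& b)
{
    std::uint64_t acc = 0;
    for (std::size_t i = 0; i < 5; ++i)
        acc |= static_cast<std::uint64_t>(b[i]) << (8 * i);

    Symbols out{};
    for (std::size_t i = 0; i < 8; ++i)
        out[i] = static_cast<std::uint8_t>((acc >> (5 * i)) & 0x1Fu);
    return out;
}

// Round-trip check: unpacking a packed block gives back the input.
inline void check_round_trip()
{
    const Symbols in{1, 2, 3, 4, 5, 6, 7, 31};
    assert(unpack(pack(in)) == in);
}
```

Because only shifts and masks on integer values are involved, the result does not depend on the compiler's bit-field layout or on the host's byte order.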

Scott Lurndal

Apr 14, 2023, 2:00:49 PM

MarioCPPP <NoliMihiFran...@libero.it> writes:
>I am seeing a new (new to me) directive (ALIGNAT) but it
>cannot pack bitfields spanning over two different bytes.
>
>Is it there a way to avoid "padding" within the structure,
>because the function would be the redistribution of bits (I
>mean, some mapping of 5 bytes of 8 bits in 8 fields of 5
>bits, all of this by mean of a union, with another 40++ bits
>wide integer and a single read / write) ?
>
>maybe my approach is wrong from the beginning.
>In this case, what way would you recommend to explore ?
>
>I mean sth like this (the code is bugged, due to padding !)
>
>
>struct EightSymbolsFiveBitWide
>{
> unsigned Field_0 : 5;
> unsigned Field_1 : 5;
> unsigned Field_2 : 5;
> unsigned Field_3 : 5;
> unsigned Field_4 : 5;
> unsigned Field_5 : 5;
> unsigned Field_6 : 5;
> unsigned Field_7 : 5;
>};

struct eight_symbols_five_bits_wide {
    uint64_t field0 : 5,
             field1 : 5,
             field2 : 5,
             field3 : 5,
             field4 : 5,
             field5 : 5,
             field6 : 5,
             field7 : 5,
             unused : 24;
} __attribute__ ((packed));
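
A hedged sketch of how this struct might be checked, assuming GCC or Clang (the packed attribute is a compiler extension, and the helpers raw_bits and check_fields are hypothetical names added for illustration). It only verifies the size and that values survive a write and read; which bits of the 64-bit word each field occupies remains implementation-defined:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Assumes GCC or Clang; other compilers may reject __attribute__.
struct eight_symbols_five_bits_wide {
    std::uint64_t field0 : 5,
                  field1 : 5,
                  field2 : 5,
                  field3 : 5,
                  field4 : 5,
                  field5 : 5,
                  field6 : 5,
                  field7 : 5,
                  unused : 24;
} __attribute__((packed));

// 8 * 5 + 24 = 64 bits, so the struct is expected to fill one 64-bit unit.
static_assert(sizeof(eight_symbols_five_bits_wide) == 8,
              "expected exactly 8 bytes");

// Copy the object representation into an integer without a union.
inline std::uint64_t raw_bits(const eight_symbols_five_bits_wide& s)
{
    std::uint64_t raw = 0;
    std::memcpy(&raw, &s, sizeof raw);
    return raw;
}

// Values written to the fields can be read back unchanged.
inline void check_fields()
{
    eight_symbols_five_bits_wide s{};
    s.field0 = 31;
    s.field7 = 17;
    assert(s.field0 == 31 && s.field7 == 17);
    assert(raw_bits(s) != 0);
}
```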

MarioCPPP

Apr 16, 2023, 9:21:03 AM

understood. Tnx for clarification.

And what about the standard type BITSET <size> ?

I know, it's not a "value-type", so fitting more bitset in a
struct would not work.
I need to think about the suitability of bitset <size> for
my experiment ...
But any consideration is welcome

MarioCPPP

Apr 16, 2023, 9:22:24 AM

Interesting! But from the little I can understand, this
answer seems to contradict Bo Persson's earlier reply. So
which is true? Or is it just me not understanding that they
do not really contradict each other?

Another sub-question: why is the field declared with a
fixed-size unsigned type, which does not even match the
width? I mean, I usually see only plain unsigned. uint64_t
is a "complete" type, suggesting a 64-bit width, which is not
the actual size. Why so?
Thanks

MarioCPPP

Apr 16, 2023, 9:24:48 AM

On 14/04/23 20:00, Scott Lurndal wrote:

another detail surprising to me : is it equivalent to use
COMMA to separate fields ? I saw this just in ENUMS, but not
for member declaration.
Or is it the secret that enables packing, while the
SEMICOLON would have prevented the packing ?
Or was it just a typo ? Sorry for the silly question, but I
have doubts on bit fields

Öö Tiib

Apr 16, 2023, 3:22:01 PM

We do not know your use case precisely, but std::bitset looks
unlikely to fit. A std::bitset contains bits, not the 5-bit
integers you seem to want. It has no methods to extract or set
5-bit sequences. Either way, you have to access it bit by bit or
manipulate an unsigned long long (which it can take as input or
produce as output) by hand.
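
To illustrate the point, here is a rough sketch of what quintet access on a std::bitset<40> might look like. The helpers get_quintet and set_quintet are hypothetical, not part of the standard library, and quintet 0 is assumed to sit in the lowest 5 bits. The bitset's own shift and mask operators are used, so this is still bit manipulation by hand:

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

// Read the 5-bit group number `index` (0..7) from a 40-bit set.
inline unsigned get_quintet(const std::bitset<40>& bits, std::size_t index)
{
    const std::bitset<40> mask(0x1Fu);
    return static_cast<unsigned>(((bits >> (5 * index)) & mask).to_ullong());
}

// Overwrite the 5-bit group number `index` (0..7) with `value` (0..31).
inline void set_quintet(std::bitset<40>& bits, std::size_t index, unsigned value)
{
    const std::bitset<40> mask(0x1Fu);
    bits &= ~(mask << (5 * index));                          // clear the old group
    bits |= std::bitset<40>(value & 0x1Fu) << (5 * index);   // insert the new one
}

// Small self-check of the two helpers.
inline void check_quintets()
{
    std::bitset<40> b;
    set_quintet(b, 7, 17);
    assert(get_quintet(b, 7) == 17);
    assert(get_quintet(b, 6) == 0);
}
```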

To me, just writing a class that keeps the data either in a
uint64_t or a uint8_t[5] (depending on requirements) and
extracts or sets the quintets in those by hand feels like a
more promising idea than using std::bitset<40>.

Öö Tiib

Apr 16, 2023, 3:28:53 PM

No, it has always been so in both C and C++, and it makes
no difference.
People just often tend to declare data members of classes
individually (or it may be required by their coding standard);
otherwise ...

struct point { int x, y; };

... and ...

struct point { int x; int y; };

... have always been equivalent struct definitions.
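
A small compile-time check of that equivalence, as a sketch (the names point_a and point_b are made up here so that both forms can coexist in one translation unit):

```cpp
#include <cstddef>
#include <type_traits>

struct point_a { int x, y; };      // one declaration, two declarators
struct point_b { int x; int y; };  // two separate declarations

// Both forms give members of the same type, at the same offsets,
// in a struct of the same size.
static_assert(std::is_same<decltype(point_a::y), decltype(point_b::y)>::value,
              "same member type");
static_assert(offsetof(point_a, y) == offsetof(point_b, y), "same offset");
static_assert(sizeof(point_a) == sizeof(point_b), "same size");
```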

Michael S

Apr 16, 2023, 4:00:21 PM

There is no contradiction if you know that the answer came from Scott Lurndal.
In recent years Scott always assumes ARM64 platform unless explicitly
stated otherwise.
On ARM64 his solution is indeed portable between different compilers
because the ABI on this platform specifies exact rules for bit fields.

>
> another sub-question : why the type is defined with a
> precise size unsigned type, which does not even match the
> width ? I mean, I usually see only plain unsigned. uint64_t
> is a "complete" type, suggesting 64 bit width, which is not
> the actual size. Why so ?

On ARM64 Linux uint64_t is the same as unsigned long.

Scott Lurndal

Apr 16, 2023, 7:14:08 PM

Michael S <already...@yahoo.com> writes:
>On Sunday, April 16, 2023 at 4:22:24 PM UTC+3, MarioCPPP wrote:
>> On 14/04/23 20:00, Scott Lurndal wrote:
>> > MarioCPPP <NoliMihiFran...@libero.it> writes:
>> >> I am seeing a new (new to me) directive (ALIGNAT) but it
>> >> cannot pack bitfields spanning over two different bytes.
>> >>
>> >> Is it there a way to avoid "padding" within the structure,
>> >> because the function would be the redistribution of bits (I
>> >> mean, some mapping of 5 bytes of 8 bits in 8 fields of 5
>> >> bits, all of this by mean of a union, with another 40++ bits
>> >> wide integer and a single read / write) ?
>> >>
>> >> maybe my approach is wrong from the beginning.
>> >> In this case, what way would you recommend to explore ?
>> >>
>> >> I mean sth like this (the code is bugged, due to padding !)
>> >>
>> >>
>> >> struct EightSymbolsFiveBitWide
>> >> {
>> >> unsigned Field_0 : 5;
>> >> unsigned Field_1 : 5;
>> >> unsigned Field_2 : 5;
>> >> unsigned Field_3 : 5;
>> >> unsigned Field_4 : 5;
>> >> unsigned Field_5 : 5;
>> >> unsigned Field_6 : 5;
>> >> unsigned Field_7 : 5;
>> >> };
>> >
>> > struct eight_symbols_five_bits_wide {
>> > uint64_t field0 : 5,
>> > field1 : 5,
>> > field2 : 5,
>> > field3 : 5,
>> > field4 : 5,
>> > field5 : 5,
>> > field6 : 5,
>> > field7 : 5,
>> > unused : 24;
>> > } __attribute__ ((packed));
>> >
>> intresting !!! But for the little I can understand, this
>> answer seems in contrast with former Bo Persson reply. So
>> which is true ? Or is it just me that don't understand they
>> are not really in contrast ?
>>
>
>There is no contradiction if you know that the answer came from Scott Lurndal.
>In recent years Scott always assumes ARM64 platform unless explicitly

Actually, in this case, I was posting code for x86_64, which we
use in production every day. Note that the order of the lsbit and
msbit differs depending on whether the host is big-endian or little-endian.

>stated otherwise.
>On ARM64 his solution is indeed portable between different compilers
>because the ABI on this platform specifies exact rules for bit fields.
>

>>
>> another sub-question : why the type is defined with a
>> precise size unsigned type, which does not even match the
>> width ? I mean, I usually see only plain unsigned. uint64_t
>> is a "complete" type, suggesting 64 bit width, which is not
>> the actual size. Why so ?

>
>On ARM64 Linux uint64_t is the same as unsigned long.

on x86_64 as well.

Paavo Helde

Apr 17, 2023, 12:50:51 AM

This is true on Linux x86_64, but not in Windows x86_64.

A related story: just a week ago I was trying out how smart GPT-4 is,
and it generated for me some quite impressive code using the openssl
X509_gmtime_adj() function. Very nice, but the manpage suggests against
using that function, because of the limited range of a 'long' argument.

All our programs are 64-bit anyway, so we should not have to worry about
this limitation, but we also need to support Windows x86_64, where we
do need to worry about it.

So basically GPT-4 silently created a Y2K38 bug for me, which would
surface only after ca 15 years, and only on Windows. So there ;-)

MarioCPPP

Apr 17, 2023, 8:31:42 AM

yes ... I was afraid so

> The std::bitset contains bits not 5 bit integers like you seem
> to want. There are no methods to extract or set 5-bit sequences
> in it. You anyway either have to access it bit by bit or
> manipulate unsigned long long (that it can input or output)
> by hand.
>
> To me just writing a class that keeps the data ether in
> uint_64_t or uint8_t[5] (depending on requirements) and

the problem is to match
uint8_t[5] <=> uint5_t[8]
with single read / write.
uint5_t obviously is not a type, just to stress the concept.

> extracts or sets the quintets in those by hand feels
> more promising idea than using std::bitset<40>.

Ok: not a viable way to explore ! Tnx

James Kuyper

Apr 17, 2023, 10:41:28 AM

On 4/16/23 09:18, MarioCPPP wrote:
...

> And what about the standard type BITSET <size> ?

C++ is a case sensitive language, so you need to be careful about case.
It does have a bitset class template, it doesn't have anything named
BITSET.

James Kuyper

Apr 17, 2023, 10:51:58 AM

On 4/16/23 09:22, MarioCPPP wrote:
> On 14/04/23 20:00, Scott Lurndal wrote:

...

>> struct eight_symbols_five_bits_wide {
>> uint64_t field0 : 5,

...

> another sub-question : why the type is defined with a
> precise size unsigned type, which does not even match the
> width ? I mean, I usually see only plain unsigned. uint64_t
> is a "complete" type, suggesting 64 bit width, which is not
> the actual size. Why so ?

"A bit-field shall have integral or enumeration type; the bit-field
semantic property is not part of the type of the class member." (11.4.9p1).

A bit-field's declared type determines how values extracted from the
bit-field are handled after extraction.
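
As a rough illustration of the quoted rule, the sketch below (with a hypothetical struct named sample) checks that decltype reports the declared type of a bit-field member, regardless of its width:

```cpp
#include <cstdint>
#include <type_traits>

struct sample {                  // hypothetical struct for illustration
    std::uint64_t wide   : 5;    // 5 bits wide, declared as uint64_t
    unsigned      narrow : 5;    // 5 bits wide, declared as unsigned
};

// The width is not part of the member's type: decltype yields the
// declared type in both cases.
static_assert(std::is_same<decltype(sample::wide), std::uint64_t>::value,
              "declared type is uint64_t");
static_assert(std::is_same<decltype(sample::narrow), unsigned>::value,
              "declared type is unsigned");
```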

James Kuyper

Apr 17, 2023, 11:21:07 AM

On 4/16/23 09:24, MarioCPPP wrote:
> On 14/04/23 20:00, Scott Lurndal wrote:

...

>> struct eight_symbols_five_bits_wide {
>> uint64_t field0 : 5,
>> field1 : 5,
>> field2 : 5,
>> field3 : 5,
>> field4 : 5,
>> field5 : 5,
>> field6 : 5,
>> field7 : 5,
>> unused : 24;
>> } __attribute__ ((packed));
>>
>
> another detail surprising to me : is it equivalent to use
> COMMA to separate fields ? I saw this just in ENUMS, but not
> for member declaration.
> Or is it the secret that enables packing, while the
> SEMICOLON would have prevented the packing ?
> Or was it just a typo ? Sorry for the silly question, but I
> have doubts on bit fields

Those commas have nothing to do with packing. They are separators for
the member-declarator-list part of a member declaration which is based
upon the following grammar rule:

"member-declaration :
attribute-specifier-seq opt decl-specifier-seq opt
member-declarator-list opt ;" (11.4)

The key part of a declarator is the identifier that names what is being
declared. It also includes the syntax that declares whether the
identifier identifies a function, a pointer, a reference, a bit-field; if
none of those apply, it identifies an object. The "final" or "override"
keywords and any "requires" clause or initializer are also part of the
declarator. This rule allows you to declare multiple declarators in a
single declaration.
The attribute and declaration specifiers that precede the declarator
list are shared by all of the declarators in the list. The key part of
the declaration specifier (9.2) is the defining-type-specifier (9.2.8),
which describes the type of the object or bit-field, the type returned
by the function, or pointed at by the pointer, or referred to by the
reference. It also includes the "friend", "typedef", "constexpr",
"consteval", "constinit", "inline", "virtual", and "explicit" keywords.
If the decl-specifier-seq is sufficiently complicated, declaring
multiple declarators in a single declaration can save a significant
amount of typing. However, I would not recommend using that fact as the
sole justification for doing so. I recommend using it only to declare a
group of closely related declarators (as in the field0 through field7
members declared above).

MarioCPPP

Apr 17, 2023, 12:59:43 PM

it is simply a way to stress keywords in a portable way
without formatting

james...@alumni.caltech.edu

Apr 17, 2023, 1:25:06 PM

On Monday, April 17, 2023 at 12:59:43 PM UTC-4, MarioCPPP wrote:
> On 17/04/23 16:41, James Kuyper wrote:
> > On 4/16/23 09:18, MarioCPPP wrote:
> > ...
> >> And what about the standard type BITSET <size> ?
> >
> > C++ is a type sensitive language, so you need to be careful about case.
> > It does have a bitset class template, it doesn't have anything named
> > BITSET.
> it is simply a way to stress keywords in a portable way
> without formatting

bitset is not a keyword, it's an identifier.

I recognised what you were trying to do, I'm just telling you that, because C++ is a case sensitive language, it's a bad idea to use case for any purpose other than identifying the correct spelling of a keyword or identifier.