Bit parsing class

Al

Nov 9, 2009, 3:04:19 PM

We have a utility class that is used to parse octet buffers on a bit
by bit basis. You initialize the parser object to point at a buffer
and then you call methods such as x = getBits(5); and it will return
the next 5 bits into x and internally keep track of the bit position
where you left off so next time you call you get the very next
bits....

What makes the most sense in designing this utility when it comes to
bit order?

Let's say the memory is laid out as such...
0xab (address 10)
0xcd (address 11)
0xef (address 12)

A) Should the very first call to the bit parser start from bit 0 of
address 10 and iterate to the left? or ...
B) Should the very first call start at bit 7 of address 10 and iterate
to the right?

Solution A seems intuitive because bit 0 of address 10 is returned
first but Solution B returns numerical quantities that span multiple
bytes in network byte order.
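
To make the two options concrete, here is a minimal sketch, not a finished design. The class name BitReader and the BitOrder enum are invented for illustration; only getBits() comes from the description above. It assumes 8-bit bytes and at most 32 bits per call.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical names: LsbFirst is Solution A, MsbFirst is Solution B.
enum class BitOrder { LsbFirst, MsbFirst };

class BitReader {
public:
    BitReader(const std::uint8_t* data, std::size_t size, BitOrder order)
        : data_(data), size_bits_(size * 8), order_(order) {}

    // Returns the next n bits (n <= 32) and advances the bit position.
    std::uint32_t getBits(unsigned n) {
        assert(n <= 32 && pos_ + n <= size_bits_);
        std::uint32_t result = 0;
        for (unsigned i = 0; i < n; ++i, ++pos_) {
            const std::uint8_t byte = data_[pos_ / 8];
            const unsigned k = static_cast<unsigned>(pos_ % 8);
            if (order_ == BitOrder::MsbFirst) {
                // Start at bit 7 and move right; earlier bits are more significant.
                const std::uint32_t bit = (byte >> (7 - k)) & 1u;
                result = (result << 1) | bit;
            } else {
                // Start at bit 0 and move left; earlier bits are less significant.
                const std::uint32_t bit = (byte >> k) & 1u;
                result |= bit << i;
            }
        }
        return result;
    }

private:
    const std::uint8_t* data_;
    std::size_t size_bits_;
    BitOrder order_;
    std::size_t pos_ = 0;
};
```

With the buffer 0xab 0xcd 0xef, two successive getBits(5) calls give 11 and then 13 under Solution A, and 21 and then 15 under Solution B. A single getBits(16) gives 0xcdab under A and 0xabcd under B, which is the network byte order property mentioned above.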

What is the most intuitive way to do this so it makes the most sense
to someone using this utility?

Hypothetically if this were a supported standard iostream class what
do you think they would implement?

--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

Neil Butterworth

Nov 9, 2009, 4:59:38 PM

Al wrote:

> We have a utility class that is used to parse octet buffers on a bit
> by bit basis. You initialize the parser object to point at a buffer
> and then you call methods such as x = getBits(5); and it will return
> the next 5 bits into x and internally keep track of the bit position
> where you left off so next time you call you get the very next
> bits....
>
> What makes the most sense in designing this utility when it comes to
> bit order?

I don't think the interface makes much sense. It's bound to be
enormously inefficient, in an area (bit-twiddling) where, if you need to
use it at all, you really need efficiency. In these cases, the normal
bitwise operators, explicitly using masks and shifts, seem to make more
sense. This is probably why the standard library provides nothing other
than the bitset template class.
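
For comparison, a sketch of the explicit mask-and-shift style. The 16-bit header layout and all function names here are made up for illustration; nothing in the thread specifies a format.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 16-bit header, most significant field first:
// [version:3][type:5][length:8]
inline std::uint16_t load_be16(const std::uint8_t* p) {
    return static_cast<std::uint16_t>((p[0] << 8) | p[1]);
}

// Each field is one shift and one mask; no bit cursor is kept.
inline unsigned version_of(std::uint16_t h) { return (h >> 13) & 0x07u; }
inline unsigned type_of(std::uint16_t h)    { return (h >> 8) & 0x1Fu; }
inline unsigned length_of(std::uint16_t h)  { return h & 0xFFu; }

inline void example() {
    const std::uint8_t buf[] = {0xab, 0xcd};
    const std::uint16_t h = load_be16(buf);  // 0xabcd
    assert(version_of(h) == 5u);             // bits 101
    assert(type_of(h) == 11u);               // bits 01011
    assert(length_of(h) == 0xcdu);           // bits 11001101
}
```

The trade-off is that the field positions are hard-coded, so this only works when the layout is fixed and known in advance.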

>
> Lets say the memory is laid out as such...
> 0xab (address 10)
> 0xcd (address 11)
> 0xef (address 12)
>
> A) Should the very first call to the bit parser start from bit 0 of
> address 10 and iterate to the left? or ...
> B) Should the very first call start at bit 7 of address 10 and iterate
> to the right?

What does your application require? You do have an application that
needs this functionality, don't you? If not, why are you providing it?

> Solution A seems intuitive because bit 0 of address 10 is returned
> first but Solution B returns numerical quantities that span multiple
> bytes in network byte order.
>
> What is the most intuitive way to do this so it makes the most sense
> to someone using this utility?

Nothing, I repeat nothing, in programming is intuitive. And this must be
true in spades for bitwise operations, where bit order is really a
matter of momentary convenience.

> Hypothetically if this were a supported standard iostream class what
> do you think they would implement?

iostreams are pretty resolutely character oriented - I can't imagine
them implementing a bitwise interface.

Neil Butterworth

Seungbeom Kim

Nov 9, 2009, 4:56:36 PM

Al wrote:
> We have a utility class that is used to parse octet buffers on a bit
> by bit basis. You initialize the parser object to point at a buffer
> and then you call methods such as x = getBits(5); and it will return
> the next 5 bits into x and internally keep track of the bit position
> where you left off so next time you call you get the very next
> bits....
>
> What makes the most sense in designing this utility when it comes to
> bit order?
>
> Lets say the memory is laid out as such...
> 0xab (address 10)
> 0xcd (address 11)
> 0xef (address 12)
>
> A) Should the very first call to the bit parser start from bit 0 of
> address 10 and iterate to the left? or ...
> B) Should the very first call start at bit 7 of address 10 and iterate
> to the right?
>
> Solution A seems intuitive because bit 0 of address 10 is returned
> first but Solution B returns numerical quantities that span multiple
> bytes in network byte order.
>
> What is the most intuitive way to do this so it makes the most sense
> to someone using this utility?
>
> Hypothetically if this were a supported standard iostream class what
> do you think they would implement?

I don't think there's one right way, especially when you consider
both big-endian and little-endian environments.

Assuming that lower-addressed bytes should be accessed first in any
case, we have two possibilities: under a little-endian environment,
less significant bytes are accessed first, so it is more consistent
and makes more sense to take less significant bits first (Solution A)
as well, and vice versa (Solution B for big-endian).
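
A small sketch of why the little-endian pairing is consistent: if bits are taken least significant first, stream bit i is the same as bit i of the little-endian integer built from the same bytes. The function names are hypothetical and the check covers a 3-byte buffer only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Stream bit i under Solution A: bit (i % 8) of byte (i / 8).
inline unsigned stream_bit_lsb_first(const std::uint8_t* p, std::size_t i) {
    return (p[i / 8] >> (i % 8)) & 1u;
}

// Little-endian 24-bit load: the lowest-addressed byte is least significant.
inline std::uint32_t load_le24(const std::uint8_t* p) {
    return p[0] | (std::uint32_t(p[1]) << 8) | (std::uint32_t(p[2]) << 16);
}

// True if stream bit i equals bit i of the little-endian value for all 24 bits.
inline bool numbering_is_consistent(const std::uint8_t* p) {
    const std::uint32_t v = load_le24(p);
    for (std::size_t i = 0; i < 24; ++i) {
        if (stream_bit_lsb_first(p, i) != ((v >> i) & 1u)) return false;
    }
    return true;
}

inline void example() {
    const std::uint8_t buf[] = {0xab, 0xcd, 0xef};
    assert(load_le24(buf) == 0xefcdabu);
    assert(numbering_is_consistent(buf));
}
```

The mirror-image statement holds for Solution B with a big-endian load, counting bits from the most significant end.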

(Personally, I feel little-endianness is more consistent in the first
place (suppose you're implementing your own BigInt class using std::
vector and guess which ordering you'll choose), but opinions may vary.)

You just have to make a choice and be consistent throughout your program.

--
Seungbeom Kim

George Neuner

Nov 9, 2009, 9:41:29 PM

On Mon, 9 Nov 2009 14:04:19 CST, Al <agam....@gmail.com> wrote:

>We have a utility class that is used to parse octet buffers on a bit
>by bit basis. You initialize the parser object to point at a buffer
>and then you call methods such as x = getBits(5); and it will return
>the next 5 bits into x and internally keep track of the bit position
>where you left off so next time you call you get the very next
>bits....
>
>What makes the most sense in designing this utility when it comes to
>bit order?
>
>Lets say the memory is laid out as such...
>0xab (address 10)
>0xcd (address 11)
>0xef (address 12)
>
>A) Should the very first call to the bit parser start from bit 0 of
>address 10 and iterate to the left? or ...
>B) Should the very first call start at bit 7 of address 10 and iterate
>to the right?
>
>Solution A seems intuitive because bit 0 of address 10 is returned
>first but Solution B returns numerical quantities that span multiple
>bytes in network byte order.
>
>What is the most intuitive way to do this so it makes the most sense
>to someone using this utility?

Byte order and bit order are independent. Most CPUs today order bits
in a byte right to left - 76543210 - regardless of their multi-byte
data endianness. (Note that TCP/IP's htonl() etc. swap only bytes -
they don't swap bits).
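
A sketch of the difference between swapping bytes and swapping bits. Both helpers are hand-written stand-ins with invented names; swap_bytes32 does what htonl() does on a little-endian host, and a real program would simply call htonl().

```cpp
#include <cassert>
#include <cstdint>

// Reverses the order of the four bytes; the bits inside each byte are untouched.
inline std::uint32_t swap_bytes32(std::uint32_t v) {
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

// Reverses the bits inside one byte; shown only for contrast.
inline std::uint8_t reverse_bits8(std::uint8_t b) {
    std::uint8_t r = 0;
    for (int i = 0; i < 8; ++i) {
        r = static_cast<std::uint8_t>((r << 1) | ((b >> i) & 1));
    }
    return r;
}

inline void example() {
    // Byte swap: 0x12 moves to the other end but is still 0x12.
    assert(swap_bytes32(0x12345678u) == 0x78563412u);
    // Bit reversal is a different operation: 00010010 becomes 01001000.
    assert(reverse_bits8(0x12) == 0x48);
}
```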

Making any sense of the stream is going to depend on the order in
which the data was packed to begin with. This could be a problem if
you cross CPUs or languages (e.g., the layout of bit-fields is
implementation-defined in both C and C++). You really need to have
control of both sides.

>Hypothetically if this were a supported standard iostream class what
>do you think they would implement?

If you would like to follow some convention, then I would look to
serial communication where the msb typically is transmitted first
(although some snazzy serial chips can go either way).

That would be option B.

George

Nick Hounsome

Nov 10, 2009, 8:00:00 AM

On 9 Nov, 20:04, Al <agam.em...@gmail.com> wrote:
> We have a utility class that is used to parse octet buffers on a bit
> by bit basis. You initialize the parser object to point at a buffer
> and then you call methods such as x = getBits(5); and it will return
> the next 5 bits into x and internally keep track of the bit position
> where you left off so next time you call you get the very next
> bits....

But presumably x is an integer, so you don't really mean to get some
bits (they would have to go into a bitset in general: getBits(70)???);
you mean to get an integer, so you want something like
getBigEndian<int>(5) or possibly even getLittleEndian<int,5>() or
getUnsigned<BIG_ENDIAN,int,5>() or getBigEndian(5,x).

This should clarify things for you: The bit order (within a byte) is
always natural - it's only the byte order that can change and that can
be shown in the function name.
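
One way the getBigEndian<int>(5) spelling could look, as a sketch only. The class name BitCursor is invented, the result type is restricted to unsigned integers as a simplifying assumption, and the little-endian variants are left out. Naming the result type also bounds how many bits a call can ask for, which answers the getBits(70) concern.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <type_traits>

class BitCursor {
public:
    BitCursor(const std::uint8_t* data, std::size_t size)
        : data_(data), size_bits_(size * 8) {}

    // Reads n bits, most significant first, into an unsigned integer type T.
    template <typename T>
    T getBigEndian(unsigned n) {
        static_assert(std::is_unsigned<T>::value,
                      "T must be an unsigned integer type");
        // The result type limits the request: no 70-bit reads into 32 bits.
        assert(n <= static_cast<unsigned>(std::numeric_limits<T>::digits));
        assert(pos_ + n <= size_bits_);
        T result = 0;
        for (unsigned i = 0; i < n; ++i, ++pos_) {
            const unsigned bit = (data_[pos_ / 8] >> (7 - pos_ % 8)) & 1u;
            result = static_cast<T>((result << 1) | bit);
        }
        return result;
    }

private:
    const std::uint8_t* data_;
    std::size_t size_bits_;
    std::size_t pos_ = 0;
};
```

On the buffer 0xab 0xcd 0xef, getBigEndian<std::uint8_t>(5) returns 21 and a following getBigEndian<std::uint16_t>(11) returns 0x3cd.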

>
> What makes the most sense in designing this utility when it comes to
> bit order?
>
> Lets say the memory is laid out as such...
> 0xab (address 10)
> 0xcd (address 11)
> 0xef (address 12)
>
> A) Should the very first call to the bit parser start from bit 0 of
> address 10 and iterate to the left? or ...
> B) Should the very first call start at bit 7 of address 10 and iterate
> to the right?
>
> Solution A seems intuitive because bit 0 of address 10 is returned
> first but Solution B returns numerical quantities that span multiple
> bytes in network byte order.
>
> What is the most intuitive way to do this so it makes the most sense
> to someone using this utility?
>
> Hypothetically if this were a supported standard iostream class what
> do you think they would implement?

iostreams don't do binary.

If you derive from an ostream, then mystream << 3 will still be
character output.

Creating your own class and overloading operator << is not a good idea
because it is too easy to output the wrong size - it's best to
explicitly say what you want. Also I have had situations where some
values must be big endian and others little endian so, again, you have
to be explicit.
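
A sketch of what "explicit" can mean on the writing side: the width and byte order are spelled out in the function name, so the call site cannot pick the wrong overload by accident. The put_* names and the std::vector output buffer are assumptions for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Appends a 16-bit value, most significant byte first.
inline void put_u16_be(std::vector<std::uint8_t>& out, std::uint16_t v) {
    out.push_back(static_cast<std::uint8_t>(v >> 8));
    out.push_back(static_cast<std::uint8_t>(v & 0xFF));
}

// Appends a 16-bit value, least significant byte first.
inline void put_u16_le(std::vector<std::uint8_t>& out, std::uint16_t v) {
    out.push_back(static_cast<std::uint8_t>(v & 0xFF));
    out.push_back(static_cast<std::uint8_t>(v >> 8));
}

inline void example() {
    std::vector<std::uint8_t> out;
    put_u16_be(out, 0x1234);  // appends 0x12 0x34
    put_u16_le(out, 0x1234);  // appends 0x34 0x12
    assert(out.size() == 4);
    assert(out[0] == 0x12 && out[1] == 0x34);
    assert(out[2] == 0x34 && out[3] == 0x12);
}
```

With an overloaded operator <<, writing `stream << 3` would silently choose the width of int; here the caller has to decide.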

Never do this sort of thing unless you really have to - i.e. you have
to stuff a lot of data into a fixed size buffer and it won't fit
naturally.

tohava

Nov 10, 2009, 9:21:59 AM

On Nov 9, 11:59 pm, Neil Butterworth <nbutterworth1...@gmail.com>
wrote:

> Al wrote:
> > We have a utility class that is used to parse octet buffers on a bit
> > by bit basis. You initialize the parser object to point at a buffer
> > and then you call methods such as x = getBits(5); and it will return
> > the next 5 bits into x and internally keep track of the bit position
> > where you left off so next time you call you get the very next
> > bits....
>
> > What makes the most sense in designing this utility when it comes to
> > bit order?
>
> I don't think the interface makes much sense. It's bound to be
> enormously inefficient, in an area (bit-twiddling) when if you need to
> use it all you really need efficiency.

Sometimes you use bit-twiddling to conserve space, not to increase
speed (for example, in a piece of code that I have, I use a 16-bit
fixed point arithmetic type which I implement using a class).
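
For readers who have not seen one, a 16-bit fixed point class might look roughly like this. The Q8.8 split (8 integer bits, 8 fractional bits), the name Fixed16, and the lack of overflow handling are all assumptions; the post does not say which format is used.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical Q8.8 value: the real number is raw / 256, stored in 16 bits.
class Fixed16 {
public:
    static Fixed16 fromRaw(std::int16_t raw) {
        Fixed16 f;
        f.raw_ = raw;
        return f;
    }
    static Fixed16 fromDouble(double d) {
        return fromRaw(static_cast<std::int16_t>(d * 256.0));
    }
    double toDouble() const { return raw_ / 256.0; }
    std::int16_t raw() const { return raw_; }

    friend Fixed16 operator+(Fixed16 a, Fixed16 b) {
        return Fixed16::fromRaw(static_cast<std::int16_t>(a.raw_ + b.raw_));
    }
    friend Fixed16 operator*(Fixed16 a, Fixed16 b) {
        // Widen to 32 bits, multiply, then drop the extra 8 fractional bits.
        const std::int32_t wide = static_cast<std::int32_t>(a.raw_) * b.raw_;
        return Fixed16::fromRaw(static_cast<std::int16_t>(wide / 256));
    }

private:
    std::int16_t raw_ = 0;
};

inline void example() {
    const Fixed16 a = Fixed16::fromDouble(1.5);   // raw 384
    const Fixed16 b = Fixed16::fromDouble(2.25);  // raw 576
    assert((a + b).toDouble() == 3.75);
    assert((a * b).toDouble() == 3.375);
}
```

The space saving is the point: each value occupies two bytes instead of the four or eight a float or double would take.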

Jens Schmidt

Nov 11, 2009, 12:16:44 AM

George Neuner wrote:

> If you would like to follow some convention, then I would look to
> serial communication where the msb typically is transmitted first
> (although some snazzy serial chips can go either way).

Actually serial communication over the well known serial port
transmits lsb first. As a consequence, you can't see any difference
between
7 bits, space parity, 1 stop bit
8 bits, no parity, 1 stop bit (if sending 0..127 only)
7 bits, no parity, 2 stop bits

All in all, bit order is very application dependent. When looking
at 1 bit pixels in graphics formats they can be ordered in any
of the four directions.
--
Greetings,
Jens Schmidt

Nick Hounsome

Nov 11, 2009, 8:00:38 PM

On 11 Nov, 05:16, Jens Schmidt <Jens.Schmidt...@gmx.de> wrote:
> George Neuner wrote:
> > If you would like to follow some convention, then I would look to
> > serial communication where the msb typically is transmitted first
> > (although some snazzy serial chips can go either way).
>
> Actually serial communication over the well known serial port
> transmits lsb first. As a consequence, you can't see any difference
> between
> 7 bits, space parity, 1 stop bit
> 8 bits, no parity, 1 stop bit (if sending 0..127 only)
> 7 bits, no parity, 2 stop bits

The bit order used over serial lines is irrelevant.

The serial port driver presents data to the software as bytes, not bits.
This means that if I write a byte with value 0x12 on a little endian
machine and send it over a serial line to a big endian machine, it will
still be a byte with value 0x12 whatever bit order is used on the line.

If, however, I write a 2 byte integer, then, for example, 0x1234 will
become 0x3412.
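
The two-byte case can be simulated on a single machine. This sketch models the little endian writer and the big endian reader with explicit shifts, so it behaves the same on any host; the function names are invented.

```cpp
#include <cassert>
#include <cstdint>

// Bytes a little-endian writer puts on the wire for a 16-bit value.
inline void write_le16(std::uint16_t v, std::uint8_t out[2]) {
    out[0] = static_cast<std::uint8_t>(v & 0xFF);  // low byte first
    out[1] = static_cast<std::uint8_t>(v >> 8);
}

// Value a big-endian reader reconstructs from the same two bytes.
inline std::uint16_t read_be16(const std::uint8_t in[2]) {
    return static_cast<std::uint16_t>((in[0] << 8) | in[1]);
}

// Round trip with mismatched byte-order conventions.
inline std::uint16_t le_writer_be_reader(std::uint16_t v) {
    std::uint8_t wire[2];
    write_le16(v, wire);
    return read_be16(wire);
}

inline void example() {
    // The two bytes arrive intact but in the order the reader does not expect.
    assert(le_writer_be_reader(0x1234) == 0x3412);
}
```

Each byte keeps its value (0x12 stays 0x12, 0x34 stays 0x34); only their positions in the integer are exchanged.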

Jens Schmidt

Nov 12, 2009, 7:15:34 AM

Nick Hounsome wrote:

> On 11 Nov, 05:16, Jens Schmidt <Jens.Schmidt...@gmx.de> wrote:
>> George Neuner wrote:
>> > If you would like to follow some convention, then I would look to
>> > serial communication where the msb typically is transmitted first
>> > (although some snazzy serial chips can go either way).
>>
>> Actually serial communication over the well known serial port
>> transmits lsb first. As a consequence, you can't see any difference
>> between
>> 7 bits, space parity, 1 stop bit
>> 8 bits, no parity, 1 stop bit (if sending 0..127 only)
>> 7 bits, no parity, 2 stop bits
>
> The bit order used over serial lines is irrelevant.

Usually yes, I agree. My post was just to correct the error in the
post before.
But currently I have a program on my to-do list where for reception
I can use the serial hardware, but for transmission I have to
send out the data bit by bit. The hardware just can't satisfy
the timing constraints.
--
Greetings,
Jens Schmidt

George Neuner

Nov 12, 2009, 9:18:23 PM

On Tue, 10 Nov 2009 23:16:44 CST, Jens Schmidt
<Jens.Sc...@gmx.de> wrote:

>George Neuner wrote:
>
>> If you would like to follow some convention, then I would look to
>> serial communication where the msb typically is transmitted first
>> (although some snazzy serial chips can go either way).
>
>Actually serial communication over the well known serial port
>transmits lsb first.

You're right. I definitely misremembered this.

But there are comm standards that xmit msb first.

George
