[boost] [Review] Boost.Endian mini-review

Joel FALCOU

unread,

Jan 23, 2015, 11:17:25 AM1/23/15

to bo...@lists.boost.org, boost...@lists.boost.org, boost-a...@lists.boost.org

The mini-review of Boost.Endian starts today and runs through Sunday,
February 1.

The library repository is available on GitHub at
https://github.com/boostorg/endian

The docs for the library can be viewed online at
https://boostorg.github.io/endian/

The objective of the mini-review is to verify that the issues raised
during the formal review have been addressed. Those issues are
documented at https://boostorg.github.io/endian/mini_review_topics.html
which also describes the resolution of each issue.

Comments and suggestions unrelated to the mini-review issues are also
welcome, but the key question the review manager needs your opinion on
is this:

Is the library ready to be added to Boost releases?

-- Joel Falcou, Endian Review Manager

_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Peter Dimov

unread,

Jan 23, 2015, 12:47:24 PM1/23/15

to bo...@lists.boost.org

Joel FALCOU wrote:

> Is the library ready to be added to Boost releases?

Let me preface everything else I'll say by that: I think that the answer to
the above question is "yes", as every question raised during the review
seems to have been addressed.

That said, I want to point out that the library in its current state does
not support one approach to dealing with endianness, which I will outline
below.

Let's take the following hypothetical file format as an example, loosely
lifted from the docs:

[00] code (32 bit big endian)
[04] length (32 bit big endian)
[08] version (16 bit little endian)
[10] shape type (32 bit little endian)
[14] ...

All three approaches that the library supports involve declaring a struct
with the above members and expecting that this struct can be read/written
directly to file, which means that its layout and size must correspond to
the above description.

What I tend to do, however, is rather different. I do declare a
corresponding struct:

struct header
{
int code;
unsigned length;
int version;
int shape_type;
};

but never read or write it directly, which means that I do not need to make
sure that its layout and size are fixed.

Instead, in the function read(h), I do this (pseudocode):

read( header& h )
{
unsigned char data[ 14 ];
fread data from file;

read_32_lsb( h.code, data + 0 );
read_32_lsb( h.length, data + 4 );
read_16_msb( h.version, data + 8 );
read_32_lsb( h.shape_type, data + 10 );
}

Note that this does not require the machine to have a 32 bit int or a 16 bit
int at all. int can be 48 bits wide and even have trap bits. Which is, I
admit, only of academic interest today, but still.

The generic implementation of read_32_lsb is:

void read_32_lsb( int & v, unsigned char data[ 4 ] )
{
unsigned w = data[ 0 ];

w += (unsigned)data[ 1 ] << 8;
w += (unsigned)data[ 2 ] << 16;
w += (unsigned)data[ 3 ] << 24;

v = w;
}

which works on any endianness.

This approach - as shown - does have a drawback. If you have an array of
802511 32 bit lsb integers in the file, and the native int is 32 bit lsb,
one read of 802511*4 bytes is vastly superior in performance to a loop that
would read 4 bytes and call read_32_lsb on the byte[4]. Which is why the
above is generally combined with

void read_32_lsb_n( int * first, int * last, FILE * fp );

but this ties us to using FILE* and, given the rest of the library, is not
strictly needed because it offers us enough options to handle this case
efficiently.

Nevin Liber

unread,

Jan 23, 2015, 2:27:06 PM1/23/15

to Renee Silverman

Is the library ready to be added to Boost releases?

YES!

On 23 January 2015 at 11:47, Peter Dimov <li...@pdimov.com> wrote:

Instead, in the function read(h), I do this (pseudocode):
>
> read( header& h )
> {
> unsigned char data[ 14 ];
> fread data from file;
>
> read_32_lsb( h.code, data + 0 );
> read_32_lsb( h.length, data + 4 );
> read_16_msb( h.version, data + 8 );
> read_32_lsb( h.shape_type, data + 10 );
> }
>

Basically you serialize it by hand.

Another more declarative (de)serialization technique is the one that Tom
Rodgers talked about at CppCon "Implementing Wire Protocols with Boost
Fusion" http://youtu.be/wbZdZKpUVeg.

Other than specifically calling out Boost.Serialization, this already seems
to be covered in the documentation:

*Why not just use Boost.Serialization?* Serialization involves a conversion
for every object involved in I/O. Endian integers require no conversion or
copying. They are already in the desired format for binary I/O. Thus they
can be read or written in bulk.

--
Nevin ":-)" Liber <mailto:ne...@eviloverlord.com> (847) 691-1404

Philip Bennefall

unread,

Jan 23, 2015, 4:27:38 PM1/23/15

to bo...@lists.boost.org

Hi all,

This is my first time reviewing a Boost library, so I figured a
mini-review might be a good place to start.

I did not look at the library during the formal review, so I am
approaching it as a new user. I have read through the documentation and
parts of the reference, and I think it looks good. The documentation
does a good job of explaining the pros and cons of the three usage
patterns, and I am happy with the performance measurements. I also
looked through the source, and was pleasantly surprised to see the
modest dependency list. That's a deal-breaker for me when deciding which
Boost libraries to use.

One thing I was not quite clear on is, what features are disabled in
C++03 mode? Can this be clarified a little?

I was happy to find the support for unaligned types with byte sizes
other than 1, 2, 4 and 8. This, in combination with dynamic_bitset,
would have been very useful when I implemented a MIDI file parser a
little while ago.

In my view, an endianness library should have been in Boost 10 years
ago. I vote for acceptance.

Kind regards,

Philip Bennefall

> .

Olaf van der Spek

unread,

Jan 23, 2015, 6:25:18 PM1/23/15

to bo...@lists.boost.org

On Fri, Jan 23, 2015 at 6:47 PM, Peter Dimov <li...@pdimov.com> wrote:
> read( header& h )
> {
> unsigned char data[ 14 ];
> fread data from file;
>
> read_32_lsb( h.code, data + 0 );
> read_32_lsb( h.length, data + 4 );
> read_16_msb( h.version, data + 8 );
> read_32_lsb( h.shape_type, data + 10 );
> }

I've been looking for (standard) functions like that too, I think
they'd be very useful.
You'd probably want aligned variants too as they might offer better performance.

Also got a question about the name endian_reverse: wouldn't
reverse_endian make more sense?

--
Olaf

Jason Newton

unread,

Jan 24, 2015, 2:25:33 AM1/24/15

to bo...@lists.boost.org, boost...@lists.boost.org, boost-a...@lists.boost.org

As a long time user of the Boost.Endian library (applying patches to the
distribution I maintain), I am happy to see it finally up for uh...
mini-review. Hoping to see it get to formal review and included... further
hoping it makes it into the next C++ standard!

It is ready in my opinion. I've seen it go through some topical changes
over the years but the only thing I've used from it are the conversion
functions which have been great for implementing proper io in the range of
network socket io / serial with various protocols and stored data from
varies devices/sensors) if a specification enforces byte order. It
fullfills my needs and always did fairly well on performance with gcc/clang.

I think the only thing that I have no interest in using - a kind way of
saying its probably unneeded bloat IMO is the arithmetic types. I prefer
native types and using conversion functions. But maybe they could improve
my code - its something I toss back and forth a little and haven't
attempted embracing.

This library should not be looked at as fulfilling an uncommon need - and
should even be part of future C++ standards because the usual
macro/functions (e.g. htons) in the in arpa/inet.h are widely used outside
of inet related code (see above examples) and have the types encoded in the
name making them not easily usable/composable in contrast to something
template based which the library offers. Further IEEE 754 support -
especially for double floating point types. The library definitely fills
that gap and has for years IMO.

Best Regards,
-Jason

Beman Dawes

unread,

Jan 24, 2015, 9:54:18 AM1/24/15

to Boost Developers List

Yes, decoupling the external I/O format from the internal format is
often good design.

> Instead, in the function read(h), I do this (pseudocode):
>
> read( header& h )
> {
> unsigned char data[ 14 ];
> fread data from file;
>
> read_32_lsb( h.code, data + 0 );
> read_32_lsb( h.length, data + 4 );
> read_16_msb( h.version, data + 8 );
> read_32_lsb( h.shape_type, data + 10 );
> }

Using the endian library to achieve the same effect:

struct header // same as above

{
int code;
unsigned length;
int version;
int shape_type;
};

read( header& h )
{
struct
{
big_int32_buf_t code;
big_uint32_buf_t length;
little_int16_buf_t version;
big_int32_buf_ut shape_type;
} data;

fread data from file;

h.code = data.code.value();
h.length = data.length.value();
h.version = data.version.value();
h.shape_type = data.shape_type();
}

This seems to me to be better code because the layout of either of the
two structs can change without having to change anything else. In your
code, if the size of " [00] code (32 bit big endian)" changed to "
[00] code (64 bit big endian)", then all four of the read_* function
calls would have to be changed.

--Beman

Olaf van der Spek

unread,

Jan 24, 2015, 10:53:03 AM1/24/15

to bo...@lists.boost.org

On Sat, Jan 24, 2015 at 3:54 PM, Beman Dawes <bda...@acm.org> wrote:
> struct
> {
> big_int32_buf_t code;
> big_uint32_buf_t length;
> little_int16_buf_t version;
> big_int32_buf_ut shape_type;
> } data;

You're mixing aligned and unaligned types. Shouldn't all be unaligned?
See the docs: "Warning: Code that uses aligned types is possibly
non-portable because alignment requirements vary between hardware
architectures and because alignment may be affected by compiler
switches or pragmas. For example, alignment of an 64-bit integer may
be to a 32-bit boundary on a 32-bit machine. Furthermore, aligned
types are only available on architectures with 16, 32, and 64-bit
integer types."

--
Olaf

Peter Dimov

unread,

Jan 24, 2015, 11:10:15 AM1/24/15

to bo...@lists.boost.org

Olaf van der Spek wrote:
> On Sat, Jan 24, 2015 at 3:54 PM, Beman Dawes <bda...@acm.org> wrote:
> > struct
> > {
> > big_int32_buf_t code;
> > big_uint32_buf_t length;
> > little_int16_buf_t version;
> > big_int32_buf_ut shape_type;
> > } data;
>
> You're mixing aligned and unaligned types. Shouldn't all be unaligned? See
> the docs: "Warning: Code that uses aligned types is possibly non-portable
> because alignment requirements vary between hardware architectures and
> because alignment may be affected by compiler switches or pragmas. For
> example, alignment of an 64-bit integer may be to a 32-bit boundary on a
> 32-bit machine. Furthermore, aligned types are only available on
> architectures with 16, 32, and 64-bit integer types."

Yes, a good argument against having aligned buffers at all, even though the
docs do provide rationale for them. If you want aligned, use the non-buffer
types.

Peter Dimov

unread,

Jan 24, 2015, 11:10:28 AM1/24/15

to bo...@lists.boost.org

> This approach - as shown - does have a drawback. If you have an array of
> 802511 32 bit lsb integers in the file, and the native int is 32 bit lsb,
> one read of 802511*4 bytes is vastly superior in performance to a loop
> that would read 4 bytes and call read_32_lsb on the byte[4]. Which is why
> the above is generally combined with
>
> void read_32_lsb_n( int * first, int * last, FILE * fp );

It occurs to me that the library could benefit from similar additions to
little_to_native et al, which are currently scalar-only and would,
therefore, depend on a Sufficiently Smart(tm) compiler to optimize out the
empty calls.

Batch endianness fixups are very common and very performance sensitive.

Olaf van der Spek

unread,

Jan 24, 2015, 11:13:33 AM1/24/15

to bo...@lists.boost.org

On Sat, Jan 24, 2015 at 5:04 PM, Peter Dimov <li...@pdimov.com> wrote:
> Olaf van der Spek wrote:
>>
>> On Sat, Jan 24, 2015 at 3:54 PM, Beman Dawes <bda...@acm.org> wrote:
>> > struct
>> > {
>> > big_int32_buf_t code;
>> > big_uint32_buf_t length;
>> > little_int16_buf_t version;
>> > big_int32_buf_ut shape_type;
>> > } data;
>>
>> You're mixing aligned and unaligned types. Shouldn't all be unaligned? See
>> the docs: "Warning: Code that uses aligned types is possibly non-portable
>> because alignment requirements vary between hardware architectures and
>> because alignment may be affected by compiler switches or pragmas. For
>> example, alignment of an 64-bit integer may be to a 32-bit boundary on a
>> 32-bit machine. Furthermore, aligned types are only available on
>> architectures with 16, 32, and 64-bit integer types."
>
>
> Yes, a good argument against having aligned buffers at all, even though the
> docs do provide rationale for them. If you want aligned, use the non-buffer
> types.

I think the default / shortest name types should be unaligned, with
aligned access being explicit.

--
Olaf

Beman Dawes

unread,

Jan 24, 2015, 1:00:52 PM1/24/15

to Boost Developers List

On Sat, Jan 24, 2015 at 10:52 AM, Olaf van der Spek <m...@vdspek.org> wrote:
> On Sat, Jan 24, 2015 at 3:54 PM, Beman Dawes <bda...@acm.org> wrote:
>> struct
>> {
>> big_int32_buf_t code;
>> big_uint32_buf_t length;
>> little_int16_buf_t version;
>> big_int32_buf_ut shape_type;
>> } data;
>
> You're mixing aligned and unaligned types. Shouldn't all be unaligned?

In theory, yes. If data_t ever got embedded into a larger structure at
an unaligned location, that would matter. Unaligned types would also
protect against architectures that required super-alignment or don't
support 16/32/64 bit integer types, but I'm not sure such
architectures actually exist anymore.

In practice, the enhanced efficiency of aligned types is usually more
important than worry about someday encountering an odd-ball
architecture or use in an unaligned location. But it is an engineering
tradeoff, so you can make the choice whichever way seems appropriate
to your application.

--Beman

Peter Dimov

unread,

Jan 24, 2015, 1:28:00 PM1/24/15

to bo...@lists.boost.org

Beman Dawes wrote:
> In practice, the enhanced efficiency of aligned types is usually more
> important than worry about someday encountering an odd-ball architecture
> or use in an unaligned location. But it is an engineering tradeoff, so you
> can make the choice whichever way seems appropriate to your application.

The more practical problem is not oddball architectures, but the fact that
when a field changes in size, everything after that field needs to be
rechecked for alignedness. (You correctly identified this as a maintenance
problem with the fixed offset approach I outlined, and it also applies
here.)

Nevin Liber

unread,

Jan 24, 2015, 3:03:59 PM1/24/15

to Renee Silverman

On 24 January 2015 at 12:00, Beman Dawes <bda...@acm.org> wrote:

> In practice, the enhanced efficiency of aligned types is usually more
> important than worry about someday encountering an odd-ball
> architecture or use in an unaligned location.
>

I disagree with that for endian. Most of those uses are in wire and file
protocols, and they are either densely packed or based on the alignment of
the original architecture that the protocol was developed on. Unless one
is transferring data between machines with different architectures, why
would one use endian at all?

--
Nevin ":-)" Liber <mailto:ne...@eviloverlord.com> (847) 691-1404

Olaf van der Spek

unread,

Jan 24, 2015, 5:34:26 PM1/24/15

to bo...@lists.boost.org

On Sat, Jan 24, 2015 at 9:03 PM, Nevin Liber <ne...@eviloverlord.com> wrote:
> On 24 January 2015 at 12:00, Beman Dawes <bda...@acm.org> wrote:
>
>> In practice, the enhanced efficiency of aligned types is usually more
>> important than worry about someday encountering an odd-ball
>> architecture or use in an unaligned location.
>>
>
> I disagree with that for endian. Most of those uses are in wire and file
> protocols, and they are either densely packed or based on the alignment of
> the original architecture that the protocol was developed on. Unless one
> is transferring data between machines with different architectures, why
> would one use endian at all?

Because the protocol is specified as using big endian ints (for
example) and the code has to run on little endian machines too.

--
Olaf

Beman Dawes

unread,

Jan 25, 2015, 10:25:03 AM1/25/15

to Boost Developers List

On Fri, Jan 23, 2015 at 4:26 PM, Philip Bennefall <phi...@blastbay.com> wrote:
> Hi all,
>
> This is my first time reviewing a Boost library, so I figured a mini-review
> might be a good place to start.
>
> I did not look at the library during the formal review, so I am approaching
> it as a new user. I have read through the documentation and parts of the
> reference, and I think it looks good. The documentation does a good job of
> explaining the pros and cons of the three usage patterns, and I am happy
> with the performance measurements. I also looked through the source, and was
> pleasantly surprised to see the modest dependency list. That's a
> deal-breaker for me when deciding which Boost libraries to use.
>
> One thing I was not quite clear on is, what features are disabled in C++03
> mode? Can this be clarified a little?

New section, added. See
https://boostorg.github.io/endian/index.html#C++03-support

>
> I was happy to find the support for unaligned types with byte sizes other
> than 1, 2, 4 and 8. This, in combination with dynamic_bitset, would have
> been very useful when I implemented a MIDI file parser a little while ago.
>
> In my view, an endianness library should have been in Boost 10 years ago. I
> vote for acceptance.

Thanks,

--Beman

Beman Dawes

unread,

Jan 25, 2015, 10:49:03 AM1/25/15

to Boost Developers List

On Fri, Jan 23, 2015 at 6:25 PM, Olaf van der Spek <m...@vdspek.org> wrote:
...

>
> Also got a question about the name endian_reverse: wouldn't
> reverse_endian make more sense?

I've gone around and around in circles between "reverse_endianness"
(most explicit, but too long), "reverse_endian" (which grates on me a
bit because every time I see it I waste time thinking it should be
spelled "reverse_endianness"), and "endian_reverse". I like
endian_reverse because it starts right out by alerting the reader as
to the context of the reversal. Since this function is used as a UDT
customization point, so gets defined in a user namespace for UDT
types, that seems important to me.

Thanks,

--Beman

Beman Dawes

unread,

Jan 25, 2015, 11:22:24 AM1/25/15

to Boost Developers List, boost...@lists.boost.org

On Sat, Jan 24, 2015 at 2:25 AM, Jason Newton <nev...@gmail.com> wrote:
...

> As a long time user of the Boost.Endian library (applying patches to the
> distribution I maintain), I am happy to see it finally up for uh...
> mini-review. Hoping to see it get to formal review and included... further
> hoping it makes it into the next C++ standard!

The library went through formal review in 2011, so will go into Boost
as soon as the mini-review is over. And, yes, I do plan to propose it
to the standards committee.

> It is ready in my opinion. I've seen it go through some topical changes
> over the years but the only thing I've used from it are the conversion
> functions which have been great for implementing proper io in the range of
> network socket io / serial with various protocols and stored data from
> varies devices/sensors) if a specification enforces byte order. It
> fullfills my needs and always did fairly well on performance with gcc/clang.

Depending on what version you have been using, the performance may now
be even better.

> I think the only thing that I have no interest in using - a kind way of
> saying its probably unneeded bloat IMO is the arithmetic types. I prefer
> native types and using conversion functions. But maybe they could improve
> my code - its something I toss back and forth a little and haven't
> attempted embracing.

Because even the endian buffer and arithmetic types that appear most
expensive generate only short sequences of code, the optimizing
compilers often generate exactly the same instructions regardless of
endian approach. On Intel machines, that often distills down to a
single bswap instruction since the value is already in a register.

That isn't always true, so you do need to measure performance of
various approaches in the context of your actual application. But if
use of the buffer or arithmetic types would otherwise improve your
code, please don't reject them without actually testing performance
first.

>
> This library should not be looked at as fulfilling an uncommon need - and
> should even be part of future C++ standards because the usual
> macro/functions (e.g. htons) in the in arpa/inet.h are widely used outside
> of inet related code (see above examples) and have the types encoded in the
> name making them not easily usable/composable in contrast to something
> template based which the library offers. Further IEEE 754 support -
> especially for double floating point types. The library definitely fills
> that gap and has for years IMO.

Thanks for the kind words!

--Beman

Beman Dawes

unread,

Jan 25, 2015, 11:53:36 AM1/25/15

to Boost Developers List

On Sat, Jan 24, 2015 at 11:13 AM, Olaf van der Spek <m...@vdspek.org> wrote:
> On Sat, Jan 24, 2015 at 5:04 PM, Peter Dimov <li...@pdimov.com> wrote:
>> Olaf van der Spek wrote:
>>>
>>> On Sat, Jan 24, 2015 at 3:54 PM, Beman Dawes <bda...@acm.org> wrote:
>>> > struct
>>> > {
>>> > big_int32_buf_t code;
>>> > big_uint32_buf_t length;
>>> > little_int16_buf_t version;
>>> > big_int32_buf_ut shape_type;
>>> > } data;
>>>
>>> You're mixing aligned and unaligned types. Shouldn't all be unaligned? See
>>> the docs: "Warning: Code that uses aligned types is possibly non-portable
>>> because alignment requirements vary between hardware architectures and
>>> because alignment may be affected by compiler switches or pragmas. For
>>> example, alignment of an 64-bit integer may be to a 32-bit boundary on a
>>> 32-bit machine. Furthermore, aligned types are only available on
>>> architectures with 16, 32, and 64-bit integer types."
>>
>>
>> Yes, a good argument against having aligned buffers at all, even though the
>> docs do provide rationale for them. If you want aligned, use the non-buffer
>> types.
>
> I think the default / shortest name types should be unaligned, with
> aligned access being explicit.

I thought that at one time, too, and the names were as you suggest.

There were several problems with defaulting to unaligned and thus
having longer names for aligned types.

1) Real-world applications use aligned data layouts far more often
than unaligned. Users prefer that defaults meet the most common needs.

2) If the default is unaligned, unaligned types will get used
inadvertently where aligned types could be used, and that can cause
performance problems. Remember that the familiar <cstdint> types (e.g.
int16_t, int32_t) are aligned types. It is natural to assume
big_int32_t is an aligned type.

Thanks,

--Beman

Peter Dimov

unread,

Jan 25, 2015, 12:03:55 PM1/25/15

to bo...@lists.boost.org

Beman Dawes wrote:

> 2) If the default is unaligned, unaligned types will get used
> inadvertently where aligned types could be used, and that can cause
> performance problems. Remember that the familiar <cstdint> types (e.g.
> int16_t, int32_t) are aligned types. It is natural to assume big_int32_t
> is an aligned type.

big_int32_t is not a buffer type though.

I don't know whether Olaf includes non-buffer types when he says:

>> I think the default / shortest name types should be unaligned, with
>> aligned access being explicit.

but I most certainly do not:

>>> Yes, a good argument against having aligned buffers at all, even though
>>> the docs do provide rationale for them. If you want aligned, use the
>>> non-buffer types.

Jason Newton

unread,

Jan 25, 2015, 6:12:48 PM1/25/15

to bo...@lists.boost.org, boost...@lists.boost.org

On Sun, Jan 25, 2015 at 8:21 AM, Beman Dawes <bda...@acm.org> wrote:

> On Sat, Jan 24, 2015 at 2:25 AM, Jason Newton <nev...@gmail.com> wrote:
> > I think the only thing that I have no interest in using - a kind way of
> > saying its probably unneeded bloat IMO is the arithmetic types. I prefer
> > native types and using conversion functions. But maybe they could
> improve
> > my code - its something I toss back and forth a little and haven't
> > attempted embracing.
>
> Because even the endian buffer and arithmetic types that appear most
> expensive generate only short sequences of code, the optimizing
> compilers often generate exactly the same instructions regardless of
> endian approach. On Intel machines, that often distills down to a
> single bswap instruction since the value is already in a register.
>
> That isn't always true, so you do need to measure performance of
> various approaches in the context of your actual application. But if
> use of the buffer or arithmetic types would otherwise improve your
> code, please don't reject them without actually testing performance
> first.
>

Allow me to clarify: I meant code bloat, not speed.

-Jason

Jeremy Maitin-Shepard

unread,

Jan 25, 2015, 9:19:30 PM1/25/15

to bo...@lists.boost.org

I have made use of this library over the past 6 months. It has served my
needs very well, and I think it is definitely ready to be added to Boost.

I have a few comments, though:

conditional_reverse and conditional_reverse_inplace should default the
second order value to native (for both the compile-time order and runtime
order versions). Conversions are almost always to/from native endianness,
and it is annoying to have to specify that explicitly.

The supplied endian_arithmetic and endian_buffer templates are also a bit
inconvenient to use due to the need to specify both the integer type and the
number of bits. It would be more convenient to be able to just specify the
number of bits and have the integer type be determined automatically. I
suppose there is the issue of needing to choose between an exact type and a
possible faster type.

For both the buffer and arithmetic types, if the aligned types are to be
kept, the user needs to make an explicit choice about alignment. Making
either aligned or unaligned the effective default, by giving it the
big_uint32_t name, rather than the more explicit big_uint32_ut/big_uint32_at
types, makes it easier for the user to overlook alignment issues. Therefore
I suggest that the _t names be removed in favor of the more explicit _at and
_ut names. Potentially, the alignment amount could also be specified as a
template argument rather than be platform-dependent.

The correct usage of aligned generally seems rather tricky. Since alignment
is platform-specific, it seems there is high risk of inadvertantly
introducing padding where it doesn't belong. It also seems there is a high
risk of the alignment constraints being violated when some byte buffer is
cast to the struct containing endian buffer/arithmetic types. It seems
about the only guaranteed safe use of the aligned types is if a single
endian type is declared as a local variable.

There is also the fact that on x86/x86-64, there are no separate
instructions for aligned vs. unaligned data, the code just runs a bit faster
if the data is aligned. I think the Endian library partially takes
advantage of this (e.g. via the patch I submitted), though there may be some
places in which it does not take advantage of this. The benefit of using
aligned types is particularly small, therefore, on x86. It would be nice if
there were a way of declaring a "preferred" alignment, that will be used if
possible for automatic variables, but will never result in padding being
added. Unfortunately I don't know of any way to get that behavior.

Beman Dawes

unread,

Jan 26, 2015, 7:24:28 AM1/26/15

to Boost Developers List

On Sat, Jan 24, 2015 at 3:03 PM, Nevin Liber <ne...@eviloverlord.com> wrote:
> On 24 January 2015 at 12:00, Beman Dawes <bda...@acm.org> wrote:
>
>> In practice, the enhanced efficiency of aligned types is usually more
>> important than worry about someday encountering an odd-ball
>> architecture or use in an unaligned location.
>>
>
> I disagree with that for endian. Most of those uses are in wire and file
> protocols, and they are either densely packed or based on the alignment of
> the original architecture that the protocol was developed on.

I agree with you that a lot of uses are for legacy formats, but my
experience has been that most of those observe 16 and 32-bit alignment
rules that are the same as we need today. 64-bit values may be a
different story - I have virtually no experience with legacy formats
that use 64-bit data.

But regardless, the endian library must support both aligned and
unaligned. I'm warming to Jeremy Maitin-Shepard's suggestion in a
later post that we use explicit names for both aligned and unaligned
types.

> Unless one
> is transferring data between machines with different architectures, why
> would one use endian at all?

Most of us write programs for little-endian architectures that deal
with big endian wire and file formats.

Thanks,

--Beman

Beman Dawes

unread,

Jan 26, 2015, 8:04:20 AM1/26/15

to Boost Developers List, boost...@lists.boost.org

On Sun, Jan 25, 2015 at 6:12 PM, Jason Newton <nev...@gmail.com> wrote:
> On Sun, Jan 25, 2015 at 8:21 AM, Beman Dawes <bda...@acm.org> wrote:
>
>> On Sat, Jan 24, 2015 at 2:25 AM, Jason Newton <nev...@gmail.com> wrote:
>> > I think the only thing that I have no interest in using - a kind way of
>> > saying its probably unneeded bloat IMO is the arithmetic types. I prefer
>> > native types and using conversion functions. But maybe they could
>> improve
>> > my code - its something I toss back and forth a little and haven't
>> > attempted embracing.
>>
>> Because even the endian buffer and arithmetic types that appear most
>> expensive generate only short sequences of code, the optimizing
>> compilers often generate exactly the same instructions regardless of
>> endian approach. On Intel machines, that often distills down to a
>> single bswap instruction since the value is already in a register.
>>
>> That isn't always true, so you do need to measure performance of
>> various approaches in the context of your actual application. But if
>> use of the buffer or arithmetic types would otherwise improve your
>> code, please don't reject them without actually testing performance
>> first.
>>
>
> Allow me to clarify: I meant code bloat, not speed.

Sorry - your original post was clear, but I misinterpreted.

I have not systematically measured code bloat. Maybe I can do that at
least for some common use cases.

Thanks,

--Beman

Beman Dawes

unread,

Jan 26, 2015, 9:39:59 AM1/26/15

to Boost Developers List

On Sun, Jan 25, 2015 at 9:19 PM, Jeremy Maitin-Shepard
<jer...@jeremyms.com> wrote:
> I have made use of this library over the past 6 months. It has served my
> needs very well, and I think it is definitely ready to be added to Boost.
>
> I have a few comments, though:
>
> conditional_reverse and conditional_reverse_inplace should default the
> second order value to native (for both the compile-time order and runtime
> order versions). Conversions are almost always to/from native endianness,
> and it is annoying to have to specify that explicitly.

Makes sense. Will give it a try.

> The supplied endian_arithmetic and endian_buffer templates are also a bit
> inconvenient to use due to the need to specify both the integer type and the
> number of bits. It would be more convenient to be able to just specify the
> number of bits and have the integer type be determined automatically. I
> suppose there is the issue of needing to choose between an exact type and a
> possible faster type.

My expectation is that the class templates will only be used directly
is in highly generic code where specifying the integer type is not a
problem. I also worry about introducing change to these critical
templates at this stage in the process.

>
> For both the buffer and arithmetic types, if the aligned types are to be
> kept, the user needs to make an explicit choice about alignment. Making
> either aligned or unaligned the effective default, by giving it the
> big_uint32_t name, rather than the more explicit big_uint32_ut/big_uint32_at
> types, makes it easier for the user to overlook alignment issues. Therefore
> I suggest that the _t names be removed in favor of the more explicit _at and
> _ut names.

That's an interesting thought. Alignment is a critical choice, so
requiring it be explicit might be doing users a favor. What do others
think?

> Potentially, the alignment amount could also be specified as a
> template argument rather than be platform-dependent.

I'd rather not introduce additional complexity for something that is a
potential rather than demonstrated need.

> The correct usage of aligned generally seems rather tricky. Since alignment
> is platform-specific, it seems there is high risk of inadvertantly
> introducing padding where it doesn't belong.

Agreed. Docs changed to recommend static_asserting that sizeof the
enclosing structure is correct.

> It also seems there is a high
> risk of the alignment constraints being violated when some byte buffer is
> cast to the struct containing endian buffer/arithmetic types. It seems
> about the only guaranteed safe use of the aligned types is if a single
> endian type is declared as a local variable.

In isolation, perhaps. But when coupled with software engineering
practices like use of asserts and static asserts, aligned types are
safe.

> There is also the fact that on x86/x86-64, there are no separate
> instructions for aligned vs. unaligned data, the code just runs a bit faster
> if the data is aligned. I think the Endian library partially takes
> advantage of this (e.g. via the patch I submitted),

Yes, your patches have been applied:-)

> though there may be some
> places in which it does not take advantage of this.

Possibly. Any additional patches would be welcomed.

> The benefit of using
> aligned types is particularly small, therefore, on x86. It would be nice if
> there were a way of declaring a "preferred" alignment, that will be used if
> possible for automatic variables, but will never result in padding being
> added. Unfortunately I don't know of any way to get that behavior.

I'm afraid we are getting into an area of diminishing returns,
although it is hard to know for sure without a specific proposal.

My feeling is that the library is "good enough", and that future
effort is best spent on a proposal for standardizing it:-)

Thanks,

--Beman

Peter Dimov

unread,

Jan 26, 2015, 10:11:42 AM1/26/15

to bo...@lists.boost.org

Beman Dawes wrote:

> My expectation is that the class templates will only be used directly is
> in highly generic code where specifying the integer type is not a problem.

Your expectation is incorrect. Some file formats support both little and big
endian variants (to make it easier for writers, at readers' expense). It's
typical to do

read( signature );

if( signature == 0x12345678 )
{
read_header<little_endian>( h );
}
else if( signature == 0x78563412 )
{
read_header<big_endian>( h );
}
else
{
// not a valid file
}

which is "lowly generic".

That said, I support your decision to have both the type and the size as
template parameters; as I demonstrated in my example, the type is decided by
the program, and the size is decided by the file format, so in general, the
two do not necessarily match.

However, this allows me to sneak in another comment. Currently, if I try to
put 0x123456 into a 16 bit buffer, it silently cuts it to 0x3456 without
missing a beat. This is sometimes, but very rarely, what we want. Typically,
what one wants in this case is an exception, because this usually happens
when writing a file, and if one tries to write 0x123456 and only 0x3456
comes out, it's likely that the result will be corrupted and unreadable.

Beman Dawes

unread,

Jan 26, 2015, 4:00:30 PM1/26/15

to Boost Developers List

On Mon, Jan 26, 2015 at 10:11 AM, Peter Dimov <li...@pdimov.com> wrote:
...

>
> However, this allows me to sneak in another comment. Currently, if I try to
> put 0x123456 into a 16 bit buffer, it silently cuts it to 0x3456 without
> missing a beat. This is sometimes, but very rarely, what we want. Typically,
> what one wants in this case is an exception, because this usually happens
> when writing a file, and if one tries to write 0x123456 and only 0x3456
> comes out, it's likely that the result will be corrupted and unreadable.

Good idea!

Perhaps an error checking policy could be added as a template
parameter, with two supplied - one a nop, the other throwing an
exception on overflow.

I'll document that somewhere as a high priority for the future.

--Beman

Peter Dimov

unread,

Jan 26, 2015, 4:43:05 PM1/26/15

to bo...@lists.boost.org

Beman Dawes wrote:
> Perhaps an error checking policy could be added as a template parameter,
> with two supplied - one a nop, the other throwing an exception on
> overflow.

Given that in many, if not most, cases, the check is a no-op - when the
value type and the bits match - it might instead make sense to make the
methods conditionally noexcept when the bits are enough to store everything
in range, and always throw on overflow otherwise.

Olaf van der Spek

unread,

Jan 26, 2015, 4:49:38 PM1/26/15

to bo...@lists.boost.org

On Mon, Jan 26, 2015 at 10:42 PM, Peter Dimov <li...@pdimov.com> wrote:
> Beman Dawes wrote:
>>
>> Perhaps an error checking policy could be added as a template parameter,
>> with two supplied - one a nop, the other throwing an exception on overflow.
>
>
> Given that in many, if not most, cases, the check is a no-op - when the
> value type and the bits match - it might instead make sense to make the
> methods conditionally noexcept when the bits are enough to store everything
> in range, and always throw on overflow otherwise.

Isn't overflow more like a logic error than a runtime error? An assert
might make more sense.

--
Olaf

Peter Dimov

unread,

Jan 26, 2015, 5:14:40 PM1/26/15

to bo...@lists.boost.org

Olaf van der Spek wrote:

> Isn't overflow more like a logic error than a runtime error? An assert
> might make more sense.

It typically happens in a portion of the code that can already throw because
it deals with output to file. And if it were an assert, what would you do to
avoid it? The same the library would do, check the value and throw if it
doesn't fit. So the library just does for you what you'd have done anyway,
as a convenience.

As a more practical matter, users don't tend to like asserts when saving,
even if this would have been the theoretically sound thing to do. :-)

(For the reverse operation, where the bits are more than the value type can
represent, assert doesn't seem to make sense either because we don't control
the bits, which typically come from an external source.)

In all this I'm assuming the buffer types, BTW. I'm not sure if the same
logic would apply to the arithmetic types, because they have very different
use cases. Buffers are relatively straightforward - the bits are read from
file and the value is extracted, or the value is stored and the bits are
written - so it's easier to reason about the context in which they are used.
Arithmetics should probably just do on overflow whatever the underlying type
does.

Cliff Green

unread,

Jan 26, 2015, 11:25:24 PM1/26/15

to bo...@lists.boost.org

> Is the library ready to be added to Boost releases?

Yes.

I have used Beman's Endian library from 2011 extensively and have been
completely satisfied by it. This newer version provides the original
functionality plus adds extensive new useful functionality (and will allow
me to replace some of my "homegrown" endian utility code).

However, I do have concern about one aspect of the new functionality (or
maybe I didn't read deep enough in the docs) - floating point endian
reversal functions which return by value.

Unless something has significantly changed in the last few years, swapping
and returning floating point values is likely to silently change bits for
certain bit patterns / values. In particular, returning by value will
normalize some values (I assume as part of loading or accessing the value
into floating point CPU registers). There may be other floating point
characteristics that silently change the bits dealing with NaN and infinity
values (but I'm far from an expert on floating point computations).

I've never seen the normalization happen for in-place swapping of floating
point values, only when returning by value.

This means that, for certain values of x, where x is floating point:
endian_reverse(endian_reverse(x)) != x. (Note that for integers,
endian_reverse(endian_reverse(x)) == x, for all possible values of x.)

Maybe I missed something in the docs where the normalization is turned off,
or an exception thrown, or somehow this situation is addressed (and again,
it's only for the conversion functions returning by value). Some of the
Boost floating point / computation experts might want to chime in.

I've worked in multiple projects that blithely swapped floating point values
in various distributed processing environments, rarely paying attention to
these kinds of issues (as well as more obvious issues, like whether all
systems were IEEE 754 or not), and most of the time they were "lucky",
happening to write home-grown code that swapped in place and was careful not
to move the swapped floating point values out of buffers before reading or
writing. I always mentioned this code was brittle and to pay attention or
document the brittleness.

Cliff

---
This email has been checked for viruses by Avast antivirus software.
http://www.avast.com

Olaf van der Spek

unread,

Jan 27, 2015, 3:41:31 AM1/27/15

to bo...@lists.boost.org

On Mon, Jan 26, 2015 at 11:14 PM, Peter Dimov <li...@pdimov.com> wrote:
> Olaf van der Spek wrote:
>
>> Isn't overflow more like a logic error than a runtime error? An assert
>> might make more sense.
>
>
> It typically happens in a portion of the code that can already throw because
> it deals with output to file. And if it were an assert, what would you do to
> avoid it?

On input ensure the destination (type) is large enough to hold all
values and on output ensure the value to be written isn't too large.

> The same the library would do, check the value and throw if it
> doesn't fit. So the library just does for you what you'd have done anyway,
> as a convenience.
>
> As a more practical matter, users don't tend to like asserts when saving,
> even if this would have been the theoretically sound thing to do. :-)

Users don't like bugs in general.

> (For the reverse operation, where the bits are more than the value type can
> represent, assert doesn't seem to make sense either because we don't control
> the bits, which typically come from an external source.)

But we do control the destination type.

> In all this I'm assuming the buffer types, BTW. I'm not sure if the same
> logic would apply to the arithmetic types, because they have very different
> use cases. Buffers are relatively straightforward - the bits are read from
> file and the value is extracted, or the value is stored and the bits are
> written - so it's easier to reason about the context in which they are used.
> Arithmetics should probably just do on overflow whatever the underlying type
> does.
>
> _______________________________________________
> Unsubscribe & other changes:
> http://lists.boost.org/mailman/listinfo.cgi/boost

--
Olaf

Olaf van der Spek

unread,

Jan 27, 2015, 4:26:34 AM1/27/15

to bo...@lists.boost.org

On Sun, Jan 25, 2015 at 5:53 PM, Beman Dawes <bda...@acm.org> wrote:
> I thought that at one time, too, and the names were as you suggest.
>
> There were several problems with defaulting to unaligned and thus
> having longer names for aligned types.
>
> 1) Real-world applications use aligned data layouts far more often
> than unaligned. Users prefer that defaults meet the most common needs.

Unaligned types support all cases, aligned types do not, so it's at
most a performance issue.

> 2) If the default is unaligned, unaligned types will get used
> inadvertently where aligned types could be used, and that can cause
> performance problems. Remember that the familiar <cstdint> types (e.g.
> int16_t, int32_t) are aligned types. It is natural to assume
> big_int32_t is an aligned type.

Maybe

BTW, what about the requested read_be32 and write_be32 etc variants?
Having a way to read an int somewhere out of a (unsigned) char buffer
is really handy.

--
Olaf

Beman Dawes

unread,

Jan 27, 2015, 8:49:56 AM1/27/15

to Boost Developers List

Cliff,

Thank you very much for this post! You have identified a serious weakness.

The underlying problem is that I have virtually no experience with
floating-point issues. So if you or others could help improve endian
floating point support, it will be greatly appreciated.

Here are some areas where help is needed:

* Recommendations for changes to the library interface for floating
point. For example, eliminating the return by value functions.
* Test cases that probe for errors you have seen in the past or that
you worry about.
* Suggestions about warnings or other changes you would like to see
made to the docs.
* Should the implementation refuse to compile (i.e. #error) on no-IEEE
754 systems? If so, how to implement?
* Anything else that you can think of to improve FP support.

Thanks,

--Beman

Beman Dawes

unread,

Jan 27, 2015, 10:26:08 AM1/27/15

to Boost Developers List

On Tue, Jan 27, 2015 at 4:26 AM, Olaf van der Spek <m...@vdspek.org> wrote:

>
> BTW, what about the requested read_be32 and write_be32 etc variants?
> Having a way to read an int somewhere out of a (unsigned) char buffer
> is really handy.

I've added a "Future directions" docs section, and that is one of the items.

https://boostorg.github.io/endian/index.html#Future-directions

--Beman

Stefan Seefeld

unread,

Jan 27, 2015, 11:47:02 AM1/27/15

to bo...@lists.boost.org

Hi there,

sorry to hijack this discussion, but trying to view at the docs I had
some generic questions:

On 23/01/15 11:17 AM, Joel FALCOU wrote:
> The docs for the library can be viewed online at
> https://boostorg.github.io/endian/

Are there plans to make all of the boost docs / website available
through https://boostorg.github.io ?
At present https://boostorg.github.io/endian/ appears to work mostly
fine, though the logo is missing, and the toplevel link to
https://boostorg.github.io/index.html yields a 404 error. Is that
supposed o work ? Are library authors required to do anything to
generate those docs, and is that documented anywhere ?

(I'm looking at https://svn.boost.org/trac/boost/wiki/ModularBoost but
find that to be lacking a bit behind, so it's hard to figure out what
actually works and what is only planned. )

Thanks,
Stefan

--

...ich hab' noch einen Koffer in Berlin...

Beman Dawes

unread,

Jan 27, 2015, 1:28:50 PM1/27/15

to Boost Developers List

Beman Dawes

unread,

Jan 27, 2015, 1:39:30 PM1/27/15

to Boost Developers List

On Tue, Jan 27, 2015 at 11:46 AM, Stefan Seefeld <ste...@seefeld.name> wrote:

> Hi there,
>
> sorry to hijack this discussion, but trying to view at the docs I had
> some generic questions:
>
> On 23/01/15 11:17 AM, Joel FALCOU wrote:
>> The docs for the library can be viewed online at
>> https://boostorg.github.io/endian/
>
> Are there plans to make all of the boost docs / website available
> through https://boostorg.github.io ?

Not that I know of. It is convenient to have a library's docs online
for mini-reviews, and I used GitHub to do that because it is so
convenient.

> At present https://boostorg.github.io/endian/ appears to work mostly
> fine, though the logo is missing, and the toplevel link to
> https://boostorg.github.io/index.html yields a 404 error. Is that
> supposed o work?

No, since those are both links to the super project, and for purposes
of a mini-review they were not necessary.

> Are library authors required to do anything to
> generate those docs, and is that documented anywhere ?

No, and no. I sometimes publish master docs and sometimes develop
docs, according to current needs.

>
> (I'm looking at https://svn.boost.org/trac/boost/wiki/ModularBoost but
> find that to be lacking a bit behind, so it's hard to figure out what
> actually works and what is only planned. )

If we do ever require it, those pages will get updated.

Cheers,

--Beman

Rob Stewart

unread,

Jan 29, 2015, 1:53:41 PM1/29/15

to bo...@lists.boost.org

On January 23, 2015 11:17:25 AM EST, Joel FALCOU <joel....@gmail.com> wrote:
> The mini-review of Boost.Endian starts today and runs through Sunday,
> February 1.

>
> The docs for the library can be viewed online at
> https://boostorg.github.io/endian/

I found a few issues on the docs:

* "an class"
* "may improved efficiency"
* " Remarks: Meet the EndianReversible requirements", for endian_reverse() should, I assume, refer to "x"
* should there be a note suggesting that the following be constexpr, when possible?

template <class EndianReversible> EndianReversible conditional_reverse(EndianReversible x, order order1, order order2) noexcept;

* The effects documented for endian_reverse_inplace() don't allow for specialization.

The code type is much smaller than the surrounding text. This is particularly odd, besides annoying, on the specifications section in which the effects, remarks, etc. Are huge compared to the function declaration to which they relate.
I
___
Rob

(Sent from my portable computation engine)

Paul A. Bristow

unread,

Jan 30, 2015, 5:25:36 AM1/30/15

to bo...@lists.boost.org

The proof of the pudding is in the eating.

I'd suggest that *testing interoperability* is the key precaution that you should advise.

It should obviously be between all possible combinations of processor hardware types.

The test should include a range of floating-point values from min to max,
and include NaN and infinity unless it is certain that they will never occur.

(Unless very special precautions are taken, then infinity and NaN are like to rear their ugly heads
at some time - probably when you least want to see them!)

It must be possible to 'round-trip' all floating-point values.

You can't test all possible values (except for 32-bit float - 64-bit double takes about 50 years at
current processor speeds ;-)
but you can chose values (floating-point bit patterns) at random for a test lasting some minutes or
even hours.

(This was the only way that I found to check for failures in Microsoft cout - cin round-tripping (a
feature not a bug) that are relevant to serialization).

Paul

---
Paul A. Bristow
Prizet Farmhouse
Kendal UK LA8 8AB
+44 (0) 1539 561830

Cliff Green

unread,

Jan 30, 2015, 12:17:43 PM1/30/15

to bo...@lists.boost.org

>> ... I'd suggest that *testing interoperability* is the key precaution

>> that you should advise.

It should obviously be between all possible combinations of processor
hardware types.

The test should include a range of floating-point values from min to max,
and include NaN and infinity unless it is certain that they will never
occur.

(Unless very special precautions are taken, then infinity and NaN are like
to rear their ugly heads at some time - probably when you least want to see
them!)

It must be possible to 'round-trip' all floating-point values.

You can't test all possible values (except for 32-bit float - 64-bit double
takes about 50 years at current processor speeds ;-) but you can chose
values (floating-point bit patterns) at random for a test lasting some
minutes or even hours.
<<

Thanks, Paul.

I was going to start another discussion thread asking for suggestions from
some of the experts (or at least people more expert than me) dealing with
floating point endian swapping, but I'll wait to see if we get responses on
this thread.

I'm still not sure what to suggest to Beman for the Endian library wrt to
float and double API's, specially as this library could or should be the
basis for a standardization proposal. Integers are completely safe and
well-defined for swapping - all bit patterns form valid integers and
swapping bytes never result in "special" hardware processing. This is not
the case for floating point. I have definitely seen "in-place" floating
point swapping used extensively without problems, and have definitely seen
other functionality have problems (my previously mentioned "returning
swapped floating point by value causes silent changing of bits"
normalization issue). And, of course, there's the additional problem of
binary floating point interoperability between IEEE-754 processors and those
that are not.

Boost.Endian does not (and should not) define wire protocols. It provides
functionality and API's for translating to and from wire protocols. But with
floating point, any kind of standardization will have to not only define the
API, but some form of wire protocol. Then the Endian functions become
something like:

opaque_binary_type native_to_interoperate_fp(float x); // opaque_binary_type
stores fp elements - sign, mantissa, exponent, etc
opaque_binary_type native_to_interoperate_fp(double x);
float fp_interoperate_to_native(opaque_binary_type x);
double fp_interoperate_to_native(opaque_binary_type x);

Unless there is something I'm missing.

Here's my initial (and incomplete) answers to Beman's queries to my e-mail:

* Recommendations for changes to the library interface for floating point.
For example, eliminating the return by value functions.

Eliminating the floating point "return by value" function should definitely
should be performed. Whether the "in place" swap should be removed I'd like
to hear from others more expert. Specifically, with common hardware
architectures, what are the operations that will invoke floating point
normalization (and other "bit changing" functionality)? What is safe to
perform?

* Test cases that probe for errors you have seen in the past or that you
worry about.

Paul gave a pretty good summary in his reply about the general testing
strategy. I don't have specific test cases, so maybe others more expert can
chime in with specific test logic.

* Suggestions about warnings or other changes you would like to see made to
the docs.

Definitely some kind of summary of these warnings and "brittleness" should
be included in the docs. Probably no more than a paragraph or two of general
"watch out". I'll be happy to help. If we get responses from people more
expert at the hardware instruction level (i.e. what is really going on
"under the covers") summaries of those discussions can be included.

* Should the implementation refuse to compile (i.e. #error) on no-IEEE 754
systems? If so, how to implement?

Not sure - if we're not defining a floating point wire protocol, then it
doesn't matter what the underlying bit pattern of a swapped floating value
is. It's up to the application to know what's "on the wire" and whether it
can be safely unswapped back to native float or double (just like it's up
the application to know the underlying integer wire protocol).

* Anything else that you can think of to improve FP support.

Hopefully this is a good start and others will chime in and you'll have a
useful and safe (even if not 100% complete) initial API and implementation.

Cliff

---
This email has been checked for viruses by Avast antivirus software.
http://www.avast.com

Rob Stewart

unread,

Jan 30, 2015, 1:52:10 PM1/30/15

to bo...@lists.boost.org

On January 30, 2015 12:17:21 PM EST, Cliff Green <cli...@codewrangler.net> wrote:
> >> ... I'd suggest that *testing interoperability* is the key
> precaution
> >> that you should advise.

[snip]

> I was going to start another discussion thread asking for suggestions
> from
> some of the experts (or at least people more expert than me) dealing
> with
> floating point endian swapping, but I'll wait to see if we get
> responses on
> this thread.

Another thread would be wise. The change in subject will attract attention, and the new thread will surface even if someone has ignored this one.

___
Rob

(Sent from my portable computation engine)

Gottlob Frege

unread,

Jan 31, 2015, 12:42:58 AM1/31/15

to bo...@lists.boost.org

On Fri, Jan 30, 2015 at 12:17 PM, Cliff Green <cli...@codewrangler.net> wrote:
>
> I'm still not sure what to suggest to Beman for the Endian library wrt to
> float and double API's, specially as this library could or should be the
> basis for a standardization proposal. Integers are completely safe and
> well-defined for swapping - all bit patterns form valid integers and
> swapping bytes never result in "special" hardware processing.

I don't think that it actually guaranteed (it is only guaranteed for
unsigned char). It is fine for 2s-compliment, but for endian in the
standard, other integer representations need to be taken into account.

Tony

Bjorn Reese

unread,

Feb 1, 2015, 8:33:22 AM2/1/15

to bo...@lists.boost.org

On 01/23/2015 05:17 PM, Joel FALCOU wrote:

> Is the library ready to be added to Boost releases?

Yes.

The documentation is well-written, the code looks clean (except for the
multitude of #ifdefs), and the API is superior to N3783.

Having said that, I do have a few comments:

The endian_reverse() functions assume that there are only two types of
endianness (big and little) and that they are the reverse of each other.
In practice, this is a feasible assumption. However, once in a while, an
odd endianness does show up in this millennia. For example, ARM11
processors uses mix-endian [1] for double. Byte reversal will not
handle such cases correctly. The could become a problem during C++
standardization.

Why are there no *_int8_t types?

[1]
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0338g/I1844.html

Niall Douglas

unread,

Feb 1, 2015, 10:20:35 AM2/1/15

to bo...@lists.boost.org

On 1 Feb 2015 at 14:33, Bjorn Reese wrote:

> The endian_reverse() functions assume that there are only two types of
> endianness (big and little) and that they are the reverse of each other.
> In practice, this is a feasible assumption. However, once in a while, an
> odd endianness does show up in this millennia. For example, ARM11
> processors uses mix-endian [1] for double. Byte reversal will not
> handle such cases correctly. The could become a problem during C++
> standardization.

Another big problem for standardisation is that Endian only supports
CHAR_BIT == 8. I recently had a pull request come through for my
nedtries library adding CHAR_BIT = 10 support. I guess 10 bit per
byte CPUs are still common enough, and a bit more surprisingly they
can compile modern libraries.

Niall

--
ned Productions Limited Consulting
http://www.nedproductions.biz/
http://ie.linkedin.com/in/nialldouglas/

Rob Stewart

unread,

Feb 1, 2015, 7:11:55 PM2/1/15

to bo...@lists.boost.org

enum class align is overkill. I'd prefer an ordinary enum:

enum assignment
{
aligned,
unaligned
};

Usage, in endian_buffer for example, would be a little more straightforward:

template
<
order Order
, typename T
, std::size_t Nbits
, alignment Align = unaligned
> class endian_buffer;

More doc comments....

index.html

"and the least-significant-byte-first is traditionally called": omit "the" our add it to the beginning of the sentence.

s/release builds/optimized builds/ (the former is colloquial)

s/Useful when intrinsic headers such as byteswap.h are not being found by your compiler/This is useful when a compiler cannot find out has no intrinsic support or fails to locate the appropriate header/

s/Useful for eliminating missing intrinsics/This is useful for eliminating missing intrinsics/

The answer to the FAQ, "Does endianness have any uses outside of portable binary file or network I/O formats?", is unsatisfying. Several times already you've mentioned the same idea, but nowhere do you explain what you mean. How does the library provide that minor benefit?

In the FAQ, "Why bother with binary I/O?", s/direct access/random access/.

In the same FAQ, the sentence starting with "Disadvantages" should be a separate paragraph.

On the FAQ, "Which is better, big-endian or little-endian?", s/x64/x86-64/.

s/Systems where integer endianness differs/Systems on which integer endianness differs/ (systems aren't places)

buffer.html

s/See the Wikipedia/See Wikipedia/ reads better to me

s/containing four byte big-endian/containing four-byte, big-endian/

In the GIS example, I suggest sizeof(h) vs. sizeof(header) as a general means to protect against type changes. Boost should set a good example.

s/do layout memory/do lay out memory/

s/base classes will no longer disqualify/base classes no longer disqualify/

s/from being a POD/from being POD/ (s/POD/POD type/ would also work)

"alignment of an 64-bit integer may be to a 32-bit boundary on a 32-bit machine" would be more helpful if you complete the thought: "and to a 64-bit boundary on a 64-bit machine."

The note, "One-byte big and little buffer types have identical functionality", might be better as, "One-byte big and little buffer types are identical on all platforms."

The endian_buffer default constructor should have a note saying that the value is unspecified. (I'm assuming the value is not zero-initialized.)

s/endianness conversion if required is performed/endianness conversion, if required, is performed/

s/set to one../set to one./

If prefer s/POD's/PODs/

"This is ensures that , and so" needs some work.

___
Rob

(Sent from my portable computation engine)

Rob Stewart

unread,

Feb 3, 2015, 6:18:11 PM2/3/15

to bo...@lists.boost.org

arithmetic.html

"operators are +, -, ~, !, prefix and postfix -- and ++" has too many "and"s. Try, "operators are +, -, ~, !, plus both prefix and postfix -- and ++".

The binary arithmetic and relational operators lists really should have "and" before the final operator in each series.

Connects I made on buffer.html, like those on the warning and note after the typedef table apply to this page.

In "An endian is an integer byte-holder", I presume that "endian" should be "endian_arithmetic". (If not, then s/endian_arithmetic/endian/ throughout the page. The same applies to the synopsis.

The effects for the default constructor say nothing about the value. If it is uninitialized, a note to that effect would be helpful.

Rob Stewart

unread,

Feb 3, 2015, 6:25:24 PM2/3/15

to bo...@lists.boost.org

One thing that is not clear is why both the buffer and arithmetic types exist. The one, of course, derives from the other, so the only benefit would be the omission of what the derivate adds. What does one gain by using the buffer class rather than just always using the arithmetic type? IOW, why not just collapse the hierarchy?

Beman Dawes

unread,

Feb 9, 2015, 8:40:37 AM2/9/15

to Boost Developers List

On Tue, Feb 3, 2015 at 6:25 PM, Rob Stewart <rob.s...@verizon.net> wrote:
> One thing that is not clear is why both the buffer and arithmetic types exist. The one, of course, derives from the other, so the only benefit would be the omission of what the derivate adds. What does one gain by using the buffer class rather than just always using the arithmetic type? IOW, why not just collapse the hierarchy?

I've added these two entries to the overall FAQ:

Why do both the buffer and arithmetic types exist?

Conversions in the buffer types are explicit. Conversions in the
arithmetic types are implicit. This fundamental difference is a
deliberate design feature that would be lost if the inheritance
hierarchy were collapsed.

The original design provided only arithmetic types. Buffer types were
requested during formal review by those wishing total control over
when conversion occurs. They also felt that buffer types would be less
likely to be misused by maintenance programmers not familiar with the
implications of performing a lot of arithmetic operations on the
endian arithmetic types.

What is gained by using the buffer types rather than always just using
the arithmetic types?

Assurance than hidden conversions are not performed. This is of
overriding importance to users concerned about achieving the ultimate
in terms of speed.

"Always just using the arithmetic types" is fine for other users. When
the ultimate in speed needs to be ensured, the arithmetic types can be
used in the same design patterns or idioms that would be used for
buffer types, resulting in the same code being generated for either
types.

Thanks,

--Beman

Rob Stewart

unread,

Feb 9, 2015, 12:14:00 PM2/9/15

to bo...@lists.boost.org

On February 9, 2015 8:40:29 AM EST, Beman Dawes <bda...@acm.org> wrote:
>
> What is gained by using the buffer types rather than always just using
> the arithmetic types?
>
> Assurance than hidden conversions are not performed. This is of
> overriding importance to users concerned about achieving the
> ultimate in terms of speed.

^^^^^^^^^^^^^^^^^^^^^^
"highest speed" would suffice.

> "Always just using the arithmetic types" is fine for other users. When
> the ultimate in speed needs to be

You inverted the logic there.

> ensured, the arithmetic types can be
> used in the same design patterns or idioms that would be used for
> buffer types, resulting in the same code being generated for either
> types.

The latter confuses me a bit. The arithmetic types can, obviously, be used in the same cases as for the buffer types, but the obvious reason to prefer the arithmetic types by default is for the flexibility the implicit conversions and extra operators afford.

___
Rob

(Sent from my portable computation engine)

Beman Dawes

unread,

Feb 9, 2015, 7:39:20 PM2/9/15

to Boost Developers List

On Thu, Jan 29, 2015 at 1:53 PM, Rob Stewart <rob.s...@verizon.net> wrote:
> On January 23, 2015 11:17:25 AM EST, Joel FALCOU <joel....@gmail.com> wrote:
>> The mini-review of Boost.Endian starts today and runs through Sunday,
>> February 1.
>>
>> The docs for the library can be viewed online at
>> https://boostorg.github.io/endian/
>
> I found a few issues on the docs:
>
> * "an class"

Fixed.

> * "may improved efficiency"

Fixed.

> * " Remarks: Meet the EndianReversible requirements", for endian_reverse() should, I assume, refer to "x"

Fixed.

> * should there be a note suggesting that the following be constexpr, when possible?
>
> template <class EndianReversible> EndianReversible conditional_reverse(EndianReversible x, order order1, order order2) noexcept;

I'm leery of that. The user would not be able to depend on a given
function being constexpr because it would change according to the
endianness of the platform. I could be wrong about that, and I'm not
sure if users would get in trouble even if it were correct. The C++
committee is also considering similar scenarios for constexpr, and the
point is far from settled, even between people who understand
constexpr better than I do.

>
> * The effects documented for endian_reverse_inplace() don't allow for specialization.

I'll deal with that separately in a day or two.

>
> The code type is much smaller than the surrounding text. This is particularly odd, besides annoying, on the specifications section in which the effects, remarks, etc. Are huge compared to the function declaration to which they relate.

I'll try to remove the font-size from the css, and see if that helps.
I currently have it set at 85% because that looks good on my
development machine under Firefox.

I checked firefox, chrome, safari, and IE, using Windows, Linux, OS X,
and Android. While a few of those combinations produced
<code>...</code> a bit on the small side, it was never "much smaller".

It will be tomorrow before I push these changes up to GitHub.

Thanks,

--Beman

Gavin Lambert

unread,

Feb 10, 2015, 12:24:59 AM2/10/15

to bo...@lists.boost.org

On 27/01/2015 01:24, Beman Dawes wrote:
> On Sat, Jan 24, 2015 at 3:03 PM, Nevin Liber <ne...@eviloverlord.com> wrote:
>> On 24 January 2015 at 12:00, Beman Dawes <bda...@acm.org> wrote:
>>
>>> In practice, the enhanced efficiency of aligned types is usually more
>>> important than worry about someday encountering an odd-ball
>>> architecture or use in an unaligned location.
>>
>> I disagree with that for endian. Most of those uses are in wire and file
>> protocols, and they are either densely packed or based on the alignment of
>> the original architecture that the protocol was developed on.
>
> I agree with you that a lot of uses are for legacy formats, but my
> experience has been that most of those observe 16 and 32-bit alignment
> rules that are the same as we need today. 64-bit values may be a
> different story - I have virtually no experience with legacy formats
> that use 64-bit data.

Storage formats are a mixed bag (aligned is probably most common), but
virtually all wire formats that I've seen require unaligned (no-padding)
data.

Beman Dawes

unread,

Feb 11, 2015, 10:03:51 AM2/11/15

to Boost Developers List

On Sun, Feb 1, 2015 at 10:20 AM, Niall Douglas
<s_sour...@nedprod.com> wrote:
> On 1 Feb 2015 at 14:33, Bjorn Reese wrote:
>
>> The endian_reverse() functions assume that there are only two types of
>> endianness (big and little) and that they are the reverse of each other.
>> In practice, this is a feasible assumption. However, once in a while, an
>> odd endianness does show up in this millennia. For example, ARM11
>> processors uses mix-endian [1] for double. Byte reversal will not
>> handle such cases correctly. The could become a problem during C++
>> standardization.
>
> Another big problem for standardisation is that Endian only supports
> CHAR_BIT == 8. I recently had a pull request come through for my
> nedtries library adding CHAR_BIT = 10 support. I guess 10 bit per
> byte CPUs are still common enough, and a bit more surprisingly they
> can compile modern libraries.

Actually, the library specification for the endian types does support
CHAR_BITs other than 8.
Likewise built-in integers with sizes other than 8, 16, 32, and
64-bits. Ones complement integers should also work, and probably
sign-magnitude too.

The current implementation has only been tested on twos-complement
8-bit byte machines with 8, 16, 32, and 64-bit built-in integers.
That's why it asserts on CHAR_BIT == 8. Various optimizations may have
to be disabled before the code would work correctly.

The C++ committee's library working group is OK with conversion
functions with various limitations, and had included some in the
Library Fundamentals TS 1. The only reason they got pulled was because
on many platforms the names were already in use as macros, and there
wasn't time to come up with a new set of names.

Thanks,

--Beman

Beman Dawes

unread,

Feb 11, 2015, 10:24:54 AM2/11/15

to Boost Developers List

On Thu, Jan 29, 2015 at 1:53 PM, Rob Stewart <rob.s...@verizon.net> wrote:

> The code type is much smaller than the surrounding text. This is particularly odd, besides annoying, on the specifications section in which the effects, remarks, etc. Are huge compared to the function declaration to which they relate.

An update to the CSS has just been pushed to
https://boostorg.github.io/endian. It makes the presentation of text
more uniform across browsers, but the only smartphone I have is
android based, so I couldn't test IOS. On android the Chrome
presentation is excellent, while the Firefox presentation is perfectly
usable although not quite as pretty. Also tested various browsers on
Windows, Linux, and OS X, and all the results are adequate to
excellent. Safari does a particularly pleasing rendering on my macbook
pro retina display, although chrome is pretty good to. Firefox is
adequate.

Thanks,

--Beman

Rob Stewart

unread,

Feb 12, 2015, 8:42:29 AM2/12/15

to bo...@lists.boost.org

On February 11, 2015 10:24:44 AM EST, Beman Dawes <bda...@acm.org> wrote:
> On Thu, Jan 29, 2015 at 1:53 PM, Rob Stewart <rob.s...@verizon.net>
> wrote:
>
> > The code type is much smaller than the surrounding text. This is
> particularly odd, besides annoying, on the specifications section in
> which the effects, remarks, etc. Are huge compared to the function
> declaration to which they relate.
>
> An update to the CSS has just been pushed to
> https://boostorg.github.io/endian. It makes the presentation of text
> more uniform across browsers, but the only smartphone I have is
> android based, so I couldn't test IOS. On android the Chrome
> presentation is excellent, while the Firefox presentation is perfectly
> usable although not quite as pretty.

With Chrome on Android, it looks very nice except that the code often runs beyond the right edge of the blue background.

With Firefox on Android, the display has improved, but the code is half the height of the prose, which is still hard to read. Changing font size in the browser has no effect. That's unhelpful for those with poor eyesight or large screens, etc. as they are denied control.

> Also tested various browsers on
> Windows, Linux, and OS X, and all the results are adequate to
> excellent. Safari does a particularly pleasing rendering on my macbook
> pro retina display, although chrome is pretty good to. Firefox is
> adequate.

With Firefox on Windows, the text runs down the middle and extends about half the window width, and the code overruns the blue background, but it is quite readable. The text size does change as I change the browser font size, unlike on Android.

If only they all worked alike.

___
Rob

(Sent from my portable computation engine)

Rob Stewart

unread,

Feb 12, 2015, 8:47:54 AM2/12/15

to bo...@lists.boost.org

[I found this languishing in my drafts folder. Sorry it's late.]

arithmetic.html

"operators are +, -, ~, !, prefix and postfix -- and ++" has too many "and"s. Try, "operators are +, -, ~, !, plus both prefix and postfix -- and ++".

The binary arithmetic and relational operators lists really should have "and" before the final operator in each series.

Comments I made on buffer.html, like those on the warning and note after the typedef table apply to this page.

In, "An endian is an integer byte-holder", I presume that "endian" should be "endian_arithmetic". (If not, then s/endian_arithmetic/endian/ throughout the page. The same applies to the synopsis. The two are used variously.

The effects for the default constructor say nothing about the value. If it is uninitialized, a note to that effect would be helpful.

___
Rob

(Sent from my portable computation engine)

Beman Dawes

unread,

Feb 18, 2015, 7:36:37 AM2/18/15

to Boost Developers List

On Sun, Feb 1, 2015 at 7:11 PM, Rob Stewart <rob.s...@verizon.net> wrote:
> enum class align is overkill. I'd prefer an ordinary enum:
>
> enum assignment
> {
> aligned,
> unaligned
> };
>
> Usage, in endian_buffer for example, would be a little more straightforward:
>
> template
> <
> order Order
> , typename T
> , std::size_t Nbits
> , alignment Align = unaligned
>> class endian_buffer;
>

I have a mild preference for the current scoped enum formulation, so
am not making the change. I do agree that for libraries that support
C++03 compilers, a scoped enum is awkward.

> More doc comments....

> index.html
>
> 9 editorial fixes.

Applied, more or less as suggested.

>
> buffer.html
>
> s/See the Wikipedia/See Wikipedia/ reads better to me

I tried it, but "the Wikipedia" reads better to me.

> 12 editorial fixes.

Applied, more or less as suggested.

I've push the changes up to github, and also updated
https://boostorg.github.io/endian/

Thanks,

--Beman

Gavin Lambert

unread,

Feb 18, 2015, 7:44:18 PM2/18/15

to bo...@lists.boost.org

On 19/02/2015 01:36, Beman Dawes wrote:
> On Sun, Feb 1, 2015 at 7:11 PM, Rob Stewart <rob.s...@verizon.net> wrote:
>>
>> buffer.html
>>
>> s/See the Wikipedia/See Wikipedia/ reads better to me
>
> I tried it, but "the Wikipedia" reads better to me.

For what this particular bikeshed is worth, as "Wikipedia" is a proper
noun, usually the forms that sound the most correct are "see the
Wikipedia article for more info" or "see Wikipedia for more info".

You're not supposed to use articles with proper nouns such as place
names, and that includes well-known site names.

(Although Vortigaunts get a free pass. They remember the Freeman.)

Given that in this case you're hyperlinking to a specific article,
perhaps you should just add "article" to the hyperlink, as in "the
_Wikipedia article_".

Reply all

Reply to author

Forward