Comments on n3783: Network byte order


fmatth...@gmail.com

Oct 1, 2013, 11:56:10 PM
to std-pr...@isocpp.org
This proposal should be generalized to a more generic byte swapping solution. Something like:

//<bswap>
//Endianness
struct byte_order {
  enum order {
    little_endian,
    big_endian
  };
  static constexpr order value = little_endian; //implementation defined
};

//Cross-platform byte swapping
template <typename T> constexpr T bswap(T val);
template <typename T> constexpr T cpu_to_be(T val) {
  return byte_order::value == byte_order::little_endian ? bswap(val) : val;
}
template <typename T> constexpr T cpu_to_le(T val) {
  return byte_order::value == byte_order::little_endian ? val : bswap(val);
}

//<net>

#include <bswap>
//Network byte order layer built on top of bswap
template <typename T> constexpr T hton(T val) { return cpu_to_be(val); }

This is so easy and trivial to implement. Most compilers even have builtins for all of the byte swapping routines. I don't know why it's not standardized.
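As a rough illustration of how little is needed even without compiler builtins, here is a minimal sketch of a portable constexpr byte swap built only from shifts and masks. It assumes C++14 (for the loop inside a constexpr function) and 8-bit bytes; the name `swap_bytes` is chosen for this example and is not part of any standard header.

```cpp
#include <climits>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Reverses the byte order of an unsigned integer of any width.
// Each iteration moves the lowest remaining byte of 'value' into the lowest
// free position of 'result', so the first byte read ends up most significant.
template <typename T>
constexpr T swap_bytes(T value) {
    static_assert(std::is_unsigned<T>::value, "swap_bytes: unsigned types only");
    static_assert(CHAR_BIT == 8, "swap_bytes: assumes 8-bit bytes");
    T result = 0;
    for (std::size_t i = 0; i < sizeof(T); ++i) {
        result = static_cast<T>((result << 8) | (value & 0xFFu));
        value = static_cast<T>(value >> 8);
    }
    return result;
}
```

Whether a given compiler turns such a loop into a single byte-swap instruction is up to its optimizer; the builtins mentioned above make that explicit.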

Ville Voutilainen

Oct 1, 2013, 11:59:37 PM
to std-pr...@isocpp.org
On 2 October 2013 06:56, <fmatth...@gmail.com> wrote:
> This proposal should be generalized to a more generic byte swapping solution. Something like:

We know. :)

> This is so easy and trivial to implement. Most compilers even have builtins for all of the byte swapping routines. I don't know why it's not standardized.



It's not standardized because networking didn't need the generic solution and I don't recall
seeing a proposal for the generic one.

fmatth...@gmail.com

Oct 4, 2013, 1:05:15 AM
to std-pr...@isocpp.org

> It's not standardized because networking didn't need the generic solution and I don't recall
> seeing a proposal for the generic one.


In that case I'd like to work on making one. It's a pretty small thing, so it should not be difficult.
Here is an example of what it might look like:
https://github.com/fmatthew5876/stdcxx/blob/master/byteorder/include/byteorder.hh
Anyone care to comment or add suggestions? 

Thanks!

stackm...@hotmail.com

Oct 4, 2013, 2:55:00 AM
to std-pr...@isocpp.org, fmatth...@gmail.com
I am strongly against hton and ntoh. There is no single fixed network byte order. Byte order depends on the underlying protocol.

Thiago Macieira

Oct 4, 2013, 3:18:55 AM
to std-pr...@isocpp.org
On quinta-feira, 3 de outubro de 2013 23:55:00, stackm...@hotmail.com
wrote:
> I am strongly against hton and ntoh. There is no single fixed network
> byteorder. Byteorder depends on the underlying protocol.

You're at least 20 years too late on that argument.

"Network byte order" has been established for ages as "big endian". The htonl
and htons functions have existed since the 1980s in one form or another.

That said, we need conversions to/from little endian as well, since many
protocols currently use (and future ones might too) little-endian, especially
as most consumer-grade CPUs operate in little-endian mode.
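Both directions can be handled without knowing the host's order at all, by producing the bytes from the value with shifts. A minimal sketch, assuming 8-bit bytes; `store_be32` and `store_le32` are hypothetical helper names, not an existing API:

```cpp
#include <cassert>
#include <cstdint>

// Writes v as 4 bytes, most significant byte first: big endian, which is
// what "network byte order" means. Only shifts of the value are used, so
// the output is the same on every host (assumes 8-bit bytes).
inline void store_be32(std::uint32_t v, unsigned char* out) {
    out[0] = static_cast<unsigned char>(v >> 24);
    out[1] = static_cast<unsigned char>(v >> 16);
    out[2] = static_cast<unsigned char>(v >> 8);
    out[3] = static_cast<unsigned char>(v);
}

// The same value, least significant byte first (little endian), as needed
// by the little-endian protocols mentioned above.
inline void store_le32(std::uint32_t v, unsigned char* out) {
    out[0] = static_cast<unsigned char>(v);
    out[1] = static_cast<unsigned char>(v >> 8);
    out[2] = static_cast<unsigned char>(v >> 16);
    out[3] = static_cast<unsigned char>(v >> 24);
}
```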

--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
Software Architect - Intel Open Source Technology Center
PGP/GPG: 0x6EF45358; fingerprint:
E067 918B B660 DBD1 105C 966C 33F5 F005 6EF4 5358

fmatth...@gmail.com

Oct 4, 2013, 7:55:23 AM
to std-pr...@isocpp.org, fmatth...@gmail.com


On Friday, October 4, 2013 2:55:00 AM UTC-4, stackm...@hotmail.com wrote:
> I am strongly against hton and ntoh. There is no single fixed network byte order. Byte order depends on the underlying protocol.

The "internet" is big endian. Unless we move away from tcp/ip and replace all of our switches and routers this will not change anytime soon.
hton() is a good abstraction, but it needs to be built on top of a more generic solution. Many people deal with binary files
and hardware devices. These can require either little or big endian file formats.

Byte swapping is really about cross-platform binary data formats. Networking is only one user.

Markus Mayer

Oct 4, 2013, 8:51:45 AM
to std-pr...@isocpp.org
> --
>
> ---
> You received this message because you are subscribed to the Google
> Groups "ISO C++ Standard - Future Proposals" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to std-proposal...@isocpp.org.
> To post to this group, send email to std-pr...@isocpp.org.
> Visit this group at
> http://groups.google.com/a/isocpp.org/group/std-proposals/.

Thanks for your interest in making a proposal. As I am interested in
this topic too, I want to help you with this.

> struct byte_order {
> enum order {
> little_endian = 1234,
> big_endian = 4321,
> //Optional: pdp endian? Linux defines this.
> pdp_endian = 3412,
> };
> constexpr static order value = little_endian; //Implementation defined value little_endian or big_endian.
> };

Is there a reason for these specific numbers (e.g. little_endian =
1234)? Using smaller numbers (0,1,...) would enable us to store the
values in a smaller data type (e.g. char).

Intuitively, I would write the above as:

enum class byte_order
//enum class byte_order : unsigned char (if size matters)
{
little_endian = 0,
big_endian = 1,
native_endian = little_endian //Implementation defined
};

I cannot say if it is better or worse. What is your opinion about it?

> constexpr uint16_t bswap(uint16_t v) {
> return __builtin_bswap16(v);
> }

I didn't know that __builtin_bswap could be used as a constexpr. Awesome!

> constexpr uint16_t bswap16(uint16_t t) { return bswap(t); }

I would prefer a template for bswap:
template <typename T>
constexpr T bswap(T x); //Only specialized for supported types

This enables the following usages:
uint16_t x = bswap<uint16_t>(5);
uint16_t y = bswap(x);
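A minimal C++11 sketch of the "only specialized for supported types" idea, assuming the fixed-width unsigned types are the supported set. The primary template fails with a static_assert if it is ever instantiated; the helper `always_false` is a name invented for this example.

```cpp
#include <cstdint>
#include <type_traits>

// Dependent false value: the static_assert below fires only when the
// primary template is actually instantiated for an unsupported type.
template <typename T>
struct always_false : std::false_type {};

// Primary template: deliberately unusable.
template <typename T>
constexpr T bswap(T x) {
    static_assert(always_false<T>::value, "bswap: unsupported type");
    return x;
}

// Explicit specializations for the supported types.
template <>
constexpr std::uint16_t bswap<std::uint16_t>(std::uint16_t x) {
    return static_cast<std::uint16_t>((x >> 8) | (x << 8));
}

template <>
constexpr std::uint32_t bswap<std::uint32_t>(std::uint32_t x) {
    return (x >> 24) | ((x >> 8) & 0x0000FF00u) | ((x << 8) & 0x00FF0000u) | (x << 24);
}

template <>
constexpr std::uint64_t bswap<std::uint64_t>(std::uint64_t x) {
    // Swap each 32-bit half, then exchange the halves.
    return (static_cast<std::uint64_t>(bswap<std::uint32_t>(static_cast<std::uint32_t>(x))) << 32) |
           bswap<std::uint32_t>(static_cast<std::uint32_t>(x >> 32));
}
```

Both usages from the message work: `bswap<uint16_t>(5)` converts the argument, and `bswap(x)` deduces `T` from `x`.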

> template <typename T>
> constexpr T cpu_to_le(T t) { return byte_order::value == byte_order::little_endian ? t : bswap(t); }

I personally would prefer the wording 'native' to 'cpu'.
I would move the byte_orders to template arguments (either as addition,
or as a replacement of the above):

template <byte_order from, byte_order to, typename T>
constexpr T convertByteOrder(T t);
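A compilable sketch of this compile-time variant, spelled `convert_byte_order` here. It assumes a two-value `byte_order` enum, unsigned integer types, 8-bit bytes and C++14; the `swap_bytes` helper is hypothetical. With only two orders, a conversion is either a no-op or a full byte reversal:

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

enum class byte_order { little, big };

// Reverses the bytes of an unsigned integer (assumes 8-bit bytes; C++14).
template <typename T>
constexpr T swap_bytes(T v) {
    static_assert(std::is_unsigned<T>::value, "unsigned types only");
    T r = 0;
    for (std::size_t i = 0; i < sizeof(T); ++i) {
        r = static_cast<T>((r << 8) | (v & 0xFFu));
        v = static_cast<T>(v >> 8);
    }
    return r;
}

// Both orders are template arguments, so the choice is fixed at compile time.
// T comes last and is deduced from the function argument.
template <byte_order From, byte_order To, typename T>
constexpr T convert_byte_order(T t) {
    return From == To ? t : swap_bytes(t);
}
```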

> //Optional: Should we support signed integers as well?

IMHO yes!

> //Optional: Should we support floating point types? Do binary formats or hardware devices need this?

Yes, I am sure someone will someday create a format that needs this (If
it is not already existing).

Question: Is there any architecture where integers and floating point
numbers are stored in different endiannesses?

> //Optional: Byte Swapping an arbitrary sized buffer? Is this at all useful?

I am not sure about this either. At least all supported integral types
should be supported.

Note: For buffers std::reverse_copy could be used to realize bswap.
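A variant of that idea works for any fixed-size object: copy its bytes out, reverse them with `std::reverse` (the in-place sibling of `std::reverse_copy`), and copy them back. A minimal sketch under the assumption that it is only used for integer and floating-point types; `reverse_bytes` is a hypothetical name. For floating-point `T` the reversed pattern is stored back into a `T`, so the intermediate value may be a NaN or some other unusual value.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Reverses the object representation of a trivially copyable T.
template <typename T>
T reverse_bytes(T value) {
    static_assert(std::is_trivially_copyable<T>::value, "needs a trivially copyable type");
    unsigned char bytes[sizeof(T)];
    std::memcpy(bytes, &value, sizeof(T));   // object representation out
    std::reverse(bytes, bytes + sizeof(T));  // swap first and last bytes, and so on
    std::memcpy(&value, bytes, sizeof(T));   // reversed representation back in
    return value;
}
```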

> //Optional: Do we need/want a macro interface?

I think we you have it!

What I've been missing is a function to convert the byte_order without
knowing the order during compilation time, e.g.:

template<typename T>
constexpr T convertByteOrderRT(T t, byte_order from, byte_order to);

byte_order from = ??; // Is set during runtime

uint16_t x = convertByteOrderRT(v, from, byte_order::little_endian);

This function could also replace the 'convertByteOrder' function
mentioned above.
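A minimal sketch of such a runtime function, assuming the two-value `byte_order` enum, unsigned integer types and C++14; `convert_byte_order_rt` and `swap_bytes` are hypothetical names:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

enum class byte_order { little, big };

// Reverses the bytes of an unsigned integer (assumes 8-bit bytes; C++14).
template <typename T>
constexpr T swap_bytes(T v) {
    static_assert(std::is_unsigned<T>::value, "unsigned types only");
    T r = 0;
    for (std::size_t i = 0; i < sizeof(T); ++i) {
        r = static_cast<T>((r << 8) | (v & 0xFFu));
        v = static_cast<T>(v >> 8);
    }
    return r;
}

// 'from' and 'to' are ordinary parameters, so they may come from data read
// while the program runs (for example a flag in a file header).
template <typename T>
constexpr T convert_byte_order_rt(T t, byte_order from, byte_order to) {
    return from == to ? t : swap_bytes(t);
}
```

Because the function is still constexpr, it also covers the compile-time case when its arguments are constants, which is the replacement suggested above.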

Some corner cases we probably should not care about:
- Swapping of types with sizes like 6 bytes
- Behavior when CHAR_BIT != 8


I hope these remarks are helpful for you and welcome any feedback.


regards, Markus









Thiago Macieira

Oct 4, 2013, 12:16:35 PM
to std-pr...@isocpp.org
On sexta-feira, 4 de outubro de 2013 14:51:45, Markus Mayer wrote:
> Is there a reason for these specific numbers (e.g. little_endian =
> 1234)? Using smaller numbers (0,1,...) would enable us to store the
> values in a smaller data type (e.g. char).

Historic, from BSD. See endian.h.

There's also a third possible value used historically too. endian.h calls it
PDP_ENDIAN and its value is 3412. I've also seen it called "boustrophedon byte
order", but I guess no one can remember how to spell that (I had to look it
up).
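One way to read these historical constants: store the 32-bit value 0x04030201, so that each byte holds its own significance rank (1 for the least significant byte), then list the bytes in address order. A small sketch, assuming 8-bit bytes and a 4-byte `std::uint32_t`; the function name is hypothetical:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Returns 1234 on little-endian hosts, 4321 on big-endian hosts and 3412 on
// PDP-11-style hosts (low byte first within a 16-bit word, high word first).
inline int host_order_code() {
    const std::uint32_t probe = 0x04030201u;
    unsigned char bytes[4];
    std::memcpy(bytes, &probe, sizeof bytes);  // bytes in address order
    return bytes[0] * 1000 + bytes[1] * 100 + bytes[2] * 10 + bytes[3];
}
```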

fmatth...@gmail.com

Oct 4, 2013, 11:54:05 PM
to std-pr...@isocpp.org, lotha...@gmx.de
This is much better, actually. native_endian should be the same enum type. This opens more possibilities and freedom in implementing templates, functions, and switch statements.
Also, while I like the style of 1234 and 4321, 0 and 1 are better. That allows using the enums as array indices for free.

Maybe this?
enum class endian {
  little,
  big,
  native = little
};
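A small sketch of the array-index point, assuming a two-value scoped enum. Because `enum class` does not convert implicitly, indexing needs an explicit `static_cast`; `detect_native` is a hypothetical runtime stand-in for the implementation-defined `native` value, and `endian_names` is an illustrative table:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// 0 and 1, so an enumerator can be used as an array index.
enum class endian { little = 0, big = 1 };

// Runtime stand-in for 'native': check whether the low-order byte of a
// known 16-bit value is stored first.
inline endian detect_native() {
    const std::uint16_t probe = 0x0001;
    unsigned char first_byte;
    std::memcpy(&first_byte, &probe, 1);
    return first_byte == 1 ? endian::little : endian::big;
}

// A lookup table indexed by the enum.
constexpr const char* endian_names[] = {"little", "big"};

inline const char* name_of(endian e) {
    return endian_names[static_cast<int>(e)];
}
```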

> I cannot say if it is better or worse. What is your opinion about it?

>> constexpr uint16_t bswap(uint16_t v) {
>>   return __builtin_bswap16(v);
>> }

> I didn't know that __builtin_bswap could be used as a constexpr. Awesome!

I haven't actually tested this for sure, but it compiles. Anyway the purpose is just to make the point that most compilers already do this trivially.

 
>> constexpr uint16_t bswap16(uint16_t t) { return bswap(t); }

> I would prefer a template for bswap:
> template <typename T>
> constexpr T bswap(T x); //Only specialized for supported types
>
> This enables the following usages:
> uint16_t x = bswap<uint16_t>(5);
> uint16_t y = bswap(x);

I'm sold; that's by far the most elegant way to handle type conversions.

 
>> template <typename T>
>> constexpr T cpu_to_le(T t) { return byte_order::value == byte_order::little_endian ? t : bswap(t); }

> I personally would prefer the wording 'native' to 'cpu'.
> I would move the byte_orders to template arguments (either as addition,
> or as a replacement of the above):

Host is also another possibility. Naming is hard; let's think more, we have time.


> template <byte_order from, byte_order to, typename T>
> constexpr T convertByteOrder(T t);


This I'm not sure I agree with. It's incredibly wordy and I'm not sure what the template arguments buy you. Most of the time people just want to convert to either little or big endian,
so almost all of the time one of the arguments will be native. When the user wants to do a strict conversion, he can just use bswap directly.
A short name is also important here, especially when setting up tables of constexpr values.

Imagine a header file for a device driver which has a large table of big endian 32 bit ordinal commands.

//device.hh
enum : uint32_t {
  //uint32_t arguments keep this a 32-bit swap; 5UL may be 64 bits wide
  kOrdOpen = cpu_to_be(uint32_t{5}),
  kOrdClose = cpu_to_be(uint32_t{6})
};

The above is concise. Spelling it out with templates would be redundant and tiresome.
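A self-contained version of the device-table idea, as a sketch only. It assumes a GCC- or Clang-style compiler that predefines `__BYTE_ORDER__` and `__ORDER_LITTLE_ENDIAN__`, and uses a 32-bit-only `cpu_to_be` (hypothetical, not the header on github). Taking `std::uint32_t` keeps the swap at 32 bits, so the result always fits the enum's underlying type:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

#if !defined(__BYTE_ORDER__)
#error "this sketch relies on the GCC/Clang __BYTE_ORDER__ macro"
#endif

// Reverses the bytes of a 32-bit value.
constexpr std::uint32_t swap32(std::uint32_t x) {
    return (x >> 24) | ((x >> 8) & 0x0000FF00u) | ((x << 8) & 0x00FF0000u) | (x << 24);
}

// Host order to big endian; evaluated at compile time when x is a constant.
constexpr std::uint32_t cpu_to_be(std::uint32_t x) {
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
    return swap32(x);
#else
    return x;  // big-endian host (PDP byte order is not handled here)
#endif
}

//device.hh (sketch)
enum : std::uint32_t {
    kOrdOpen = cpu_to_be(5),
    kOrdClose = cpu_to_be(6)
};
```

Whatever the host, the four bytes of `kOrdOpen` in memory are then 00 00 00 05, which is what a big-endian device expects.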

>> //Optional: Should we support signed integers as well?

> IMHO yes!

OK, I agree.

>> //Optional: Should we support floating point types? Do binary formats or hardware devices need this?

> Yes, I am sure someone will someday create a format that needs this (If
> it is not already existing).
>
> Question: Is there any architecture where integers and floating point
> numbers are stored in different endiannesses?

Never heard of that, but if it's possible, there can be separate int and float endian enums like we have above.
I don't think that adds very much complexity. In the meantime, we should have them in.


>> //Optional: Byte Swapping an arbitrary sized buffer? Is this at all useful?

> I am not sure about this either. At least all supported integral types
> should be supported.
>
> Note: For buffers std::reverse_copy could be used to realize bswap.

>> //Optional: Do we need/want a macro interface?

> I think we you have it!
>
> What I've been missing is a function to convert the byte_order without
> knowing the order during compilation time. e.g:
>
> template<typename T>
> constexpr T convertByteOrderRT(T t, byte_order from, byte_order to);
>
> byte_order from = ??; // Is set during runtime
>
> uint16_t x = convertByteOrderRT(v, from, little_endian);
>
> This function could also replace the 'convertByteOrder' function
> mentioned above.

Runtime control makes sense.
 

> Some corner cases we probably should not care about:
> - Swapping of types with sizes like 6 bytes
Just reverse the bytes, I suppose; what else would you do?

> - Behavior when CHAR_BIT != 8
Byte swapping is about bytes, not bits, so I think we can safely ignore this horrible issue.
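For the 6-byte corner case, "just reverse the bytes" works when such a field is carried in a wider integer. A minimal C++11 sketch, assuming 8-bit bytes; `reverse_low_bytes` is a hypothetical helper:

```cpp
#include <cstdint>

// Reverses the order of the low n bytes of v (requires n <= 8); any bytes
// above the low n are ignored. For a 6-byte field stored in a uint64_t, n = 6.
constexpr std::uint64_t reverse_low_bytes(std::uint64_t v, unsigned n) {
    return n == 0 ? 0
                  : ((v & 0xFFu) << (8 * (n - 1))) | reverse_low_bytes(v >> 8, n - 1);
}
```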
 


> I hope these remarks are helpful for you and welcome any feedback.

Thanks for your suggestions!

> regards, Markus



New version of header on github. 

Bjorn Reese

Oct 5, 2013, 8:50:16 AM
to std-pr...@isocpp.org
On 10/05/2013 05:54 AM, fmatth...@gmail.com wrote:

> New version of header on github.

I suggest that you rename the cpu_to_* and *_to_cpu functions, because
not all operating systems (e.g. Stratus VOS) use the same endianness as
the CPU.

fmatth...@gmail.com

Oct 5, 2013, 9:14:04 AM
to std-pr...@isocpp.org

I can agree with that. I've changed the name to host for the moment. Still open to possibly using native or some other name.
One other benefit of host is that it matches hton() naming scheme.

Maybe the byte_order enum value should also match? This makes the link between them a little more clear. byte_order::host/host_to_Xe or byte_order::native/native_to_Xe

We could also shorten the names even further:
htole()
htobe()
letoh()
betoh() 

Ville Voutilainen

Oct 5, 2013, 10:04:18 AM
to std-pr...@isocpp.org
On 5 October 2013 16:14, <fmatth...@gmail.com> wrote:

> We could also shorten the names even further:
> htole()
> htobe()
> letoh()
> betoh()

Perhaps we should try to make them readable. htons and ntohs are bad examples to follow
in that regard.

Markus Mayer

Oct 5, 2013, 12:10:22 PM
to std-pr...@isocpp.org
On 10/05/2013 03:14 PM, fmatth...@gmail.com wrote:
>
>
> On Saturday, October 5, 2013 8:50:16 AM UTC-4, Bjorn Reese wrote:
>
> On 10/05/2013 05:54 AM, fmatth...@gmail.com wrote:
>
> > New version of header on github.
>
> I suggest that you rename the cpu_to_* and *_to_cpu functions, because
> not all operating systems (e.g. Stratus VOS) use the same endianness as
> the CPU.
>
>
> I can agree with that. I've changed the name to host for the moment.
> Still open to possibly using native or some other name.
> One other benefit of host is that it matches hton() naming scheme.
>
> Maybe the byte_order enum value should also match? This makes the link
> between them a little more clear. byte_order::host/host_to_Xe or
> byte_order::native/native_to_Xe
>

Regardless of whether we call it 'native', 'host' or anything else, the enum
value should match the *_to_Xe functions. Anything else would be
irritating.

Markus Mayer

Oct 5, 2013, 12:14:30 PM
to std-pr...@isocpp.org
On 10/05/2013 04:04 PM, Ville Voutilainen wrote:
>
>
>
> On 5 October 2013 16:14, <fmatth...@gmail.com> wrote:
>
>
> We could also shorten the names even further:
> htole()
> htobe()
> letoh()
> betoh()
>
>
>
>
> Perhaps we should try to make them readable. htons and ntohs are bad
> examples to follow
> in that regard.
>

+1 for readability.

I also suggest to rename the bswap template to 'byte_swap' or even
'swap_bytes'.

The same holds for 'bconvert' ('byte_convert' or 'convert_bytes').


fmatth...@gmail.com

Oct 5, 2013, 12:34:34 PM
to std-pr...@isocpp.org, lotha...@gmx.de


On Saturday, October 5, 2013 12:14:30 PM UTC-4, Markus Mayer wrote:
> On 10/05/2013 04:04 PM, Ville Voutilainen wrote:
>>
>> On 5 October 2013 16:14, <fmatth...@gmail.com> wrote:
>>
>>     We could also shorten the names even further:
>>     htole()
>>     htobe()
>>     letoh()
>>     betoh()
>>
>> Perhaps we should try to make them readable. htons and ntohs are bad
>> examples to follow
>> in that regard.
>>

> +1 for readability.


99% of the time I agree. However, I view these kinds of bit/byte operations as primitives, much like operators such as +, |, <<, etc.
These are the only places where I'd prefer short names to avoid typing, especially if they are being combined in an expression.

> I also suggest to rename the bswap template to 'byte_swap' or even
> 'swap_bytes'.
Usually for function names, I prefer verbs to go first, so swap_bytes would be preferable. I'm not totally opposed to bswap() though, for the reasons mentioned above.

> The same holds for 'bconvert' ('byte_convert' or 'convert_bytes').

This was a bad name, now there is a new one.

Updates in git. 

 

Thiago Macieira

Oct 5, 2013, 12:49:25 PM
to std-pr...@isocpp.org
On sexta-feira, 4 de outubro de 2013 14:51:45, Markus Mayer wrote:
> > constexpr uint16_t bswap(uint16_t v) {
> >
> > return __builtin_bswap16(v);
> >
> > }
>
> I didn't know that __builtin_bswap could be used as a constexpr. Awesome!

Depends on the compiler.

According to comments on the Qt source code, GCC allows it but Clang doesn't.

Bo Persson

Oct 5, 2013, 1:10:17 PM
to std-pr...@isocpp.org
fmatth...@gmail.com skrev 2013-10-05 15:14:
>
>
> On Saturday, October 5, 2013 8:50:16 AM UTC-4, Bjorn Reese wrote:
>
> On 10/05/2013 05:54 AM, fmatth...@gmail.com wrote:
>
> > New version of header on github.
>
> I suggest that you rename the cpu_to_* and *_to_cpu functions, because
> not all operating systems (e.g. Stratus VOS) use the same endianness as
> the CPU.
>
>
> I can agree with that. I've changed the name to host for the moment.
> Still open to possibly using native or some other name.

If we are going to be this picky, 'host' is not that good when you run
in a virtual environment. Neither is 'native'. :-)


Bo Persson



fmatth...@gmail.com

Oct 5, 2013, 1:52:51 PM
to std-pr...@isocpp.org, b...@gmb.dk

> If we are going to be this picky, 'host' is not that good when you run
> in a virtual environment. Neither is 'native'.  :-)

Maybe true, but a little too pedantic I think as you hinted.

I like host because it also matches the networking context.
host_to_net(T t) will be symmetrical with the other byte order methods. 

Markus Mayer

Oct 6, 2013, 7:37:48 AM
to std-pr...@isocpp.org
On 10/05/2013 05:54 AM, fmatth...@gmail.com wrote:
>
> Maybe this?
> enum class endian {
> little,
> big,
> native = little
> };
>

Yes. I like your solution. We should just be consistent about whether we use the
wording 'endian' or 'byte_order' (I prefer the latter, which you also use
in your version on github).

>
> I personally would prefer the wording 'native' to 'cpu'.
> I would move the byte_orders to template arguments (either as addition,
> or as a replacement of the above):
>
>
> host is also another possibility. Naming is hard, lets think more, we
> have time.
>

I will create a list of possible names with some pros and cons to help
the discussion.

>
> Never heard of that, but if its possible there can be separate int and
> float endian enums like we have above.
> I don't think that adds very much complexity. In the meantime, we should
> have them in.

A quick look at Wikipedia reveals the following:
'However, on modern standard computers (i.e., implementing IEEE 754),
one may in practice safely assume that the endianness is the same for
floating point numbers as for integers, making the conversion
straightforward regardless of data type. (Small embedded systems using
special floating point formats may be another matter however.)'

I am not sure if we should cover that case, but my solution would look
something like:

enum class byte_order {
little,
big,
integral_native = little, //Implementation defined
floating_point_native
};

>
>
> > //Optional: Byte Swapping an arbitrary sized buffer? Is this at
> all useful?
>
> I am not sure about this either. At least all supported integral types
> should be supported.
>

After thinking about it further, I can't come up with a use case requiring
a buffer swap routine.

> Note: For buffers std::reverse_copy could be used to realize bswap.
>
> > //Optional: Do we need/want a macro interface?
>
> I think we you have it!

I am not that sure anymore if we really need a macro interface.

>
> What I've been missing is a function the convert the byte_order without
> knowing the order during compilation time. e.g:
>
> template<typename T>
> constexpr T convertByteOrderRT(T t, byte_order from, byte_order to);
>
> byte_order from = ??; // Is set during runtime
>
> uint16_t x = convertByteOrderRT(v, from, little_endian);
>
> This function could also replace the 'convertByteOrder' function
> mentioned above.
>
>
> runtime control makes sense.
>
>

I think we could unify the 'reorder_bytes' functions as well as the
'host_to' function to something like:

template <typename T>
constexpr T reorder_bytes(T t, byte_order in, byte_order out =
byte_order::host) {
return in == out ? t : swap_bytes(t);
}


We also should come up with some concrete use cases to see if we cover
them.

Bo Persson

Oct 6, 2013, 8:08:41 AM
to std-pr...@isocpp.org
Markus Mayer wrote 2013-10-06 13:37:
>
> A quick look at wikipedia revels the following:
> 'However, on modern standard computers (i.e., implementing IEEE 754),
> one may in practice safely assume that the endianness is the same for
> floating point numbers as for integers, making the conversion
> straightforward regardless of data type. (Small embedded systems using
> special floating point formats may be another matter however.)'
>
> I am not sure if we should cover that case,

Another "modern" computer that doesn't use IEEE floating point by
default is IBM mainframes (zSeries).

That's a rather important platform that a language standard shouldn't
neglect.


Bo Persson



Thiago Macieira

Oct 6, 2013, 2:32:17 PM
to std-pr...@isocpp.org
On domingo, 6 de outubro de 2013 13:37:48, Markus Mayer wrote:
> > > //Optional: Byte Swapping an arbitrary sized buffer? Is this at
> > all useful?
> >
> > I am not sure about this either. At least all supported integral types
> > should be supported.
>
> After further think about it, I cann't come up with a use case requiring
> a buffer swap routine.

Here's one: checking the endianness and converting to an integral at once,
from a char buffer.

const uchar *buffer = getBufferPointer();
unsigned id = qFromBigEndian<uint16_t>(buffer);
unsigned qdcount = qFromBigEndian<uint16_t>(buffer + 4);
unsigned ancount = qFromBigEndian<uint16_t>(buffer + 6);
unsigned nscount = qFromBigEndian<uint16_t>(buffer + 8);
unsigned arcount = qFromBigEndian<uint16_t>(buffer + 10);

This is very useful when parsing binary network protocols or binary files.
Usually, the formats align the fields naturally, but sometimes they don't.
What's more, there's no guarantee that the alignment requirements of the local
CPU match those of the format. And, of course, there's the matter of whether
the character buffer was aligned enough to begin with.

See
http://code.woboq.org/qt5/qtbase/src/corelib/global/qendian.h.html#_Z14qFromBigEndianPKh
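For comparison, the same kind of parsing can be written without any host-order check and without alignment concerns by assembling each field from individual bytes. A minimal sketch, assuming 8-bit bytes; `load_be16`, `dns_counts` and `parse_dns_counts` are hypothetical names, and the offsets follow the example above (they match the DNS message header layout, with the flags field at offset 2 skipped):

```cpp
#include <cassert>
#include <cstdint>

// Reads a 16-bit big-endian field from any address in a byte buffer.
// Building the value from single bytes needs no alignment and does not
// depend on the host's byte order.
inline std::uint16_t load_be16(const unsigned char* p) {
    return static_cast<std::uint16_t>((p[0] << 8) | p[1]);
}

struct dns_counts {
    std::uint16_t id, qdcount, ancount, nscount, arcount;
};

inline dns_counts parse_dns_counts(const unsigned char* buffer) {
    return dns_counts{load_be16(buffer), load_be16(buffer + 4), load_be16(buffer + 6),
                      load_be16(buffer + 8), load_be16(buffer + 10)};
}
```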

fmatth...@gmail.com

Oct 6, 2013, 2:34:17 PM
to std-pr...@isocpp.org, lotha...@gmx.de


On Sunday, October 6, 2013 7:37:48 AM UTC-4, Markus Mayer wrote:
> On 10/05/2013 05:54 AM, fmatth...@gmail.com wrote:
>>
>> Maybe this?
>> enum class endian {
>>    little,
>>    big,
>>    native = little
>> };
>>

> Yes. I like your solution. We should just be consistent about whether we use the
> wording 'endian' or 'byte_order' (I prefer the latter, which you also use
> in your version on github).

I also prefer byte_order. 

>>
>>     I personally would prefer the wording 'native' to 'cpu'.
>>     I would move the byte_orders to template arguments (either as addition,
>>     or as a replacement of the above):
>>
>>
>> host is also another possibility. Naming is hard, let's think more, we
>> have time.
>>

> I will create a list of possible names with some pros and cons to help
> the discussion.

>>
>> Never heard of that, but if it's possible there can be separate int and
>> float endian enums like we have above.
>> I don't think that adds very much complexity. In the meantime, we should
>> have them in.

> A quick look at Wikipedia reveals the following:
> 'However, on modern standard computers (i.e., implementing IEEE 754),
> one may in practice safely assume that the endianness is the same for
> floating point numbers as for integers, making the conversion
> straightforward regardless of data type. (Small embedded systems using
> special floating point formats may be another matter however.)'
>
> I am not sure if we should cover that case, but my solution would look
> something like:
>
> enum class byte_order {
>    little,
>    big,
>    integral_native = little, //Implementation defined
>    floating_point_native
> };

Or maybe something like this:

enum class endian {
  little,
  big
};

template <typename T>
struct byte_order {
  static constexpr endian value = endian::little; //implementation defined, per type
};

And then provide specializations for integral and floating point types as needed.

Then the host_to_le method becomes:
template <typename T>
T host_to_le(T t) { return byte_order<T>::value == endian::little ? t : swap_bytes(t); }
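A compilable sketch of the per-type trait, under stated assumptions: the primary template takes the host's integer order from the GCC/Clang `__BYTE_ORDER__` macro and falls back to little endian when the macro is absent, `host_to_le` is restricted to unsigned integers, and `swap_bytes` is a hypothetical helper (C++14):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

enum class endian { little, big };

// Per-type byte order trait; specializations can override the default.
template <typename T>
struct byte_order {
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
    static constexpr endian value = endian::big;
#else
    static constexpr endian value = endian::little;
#endif
};

// Reverses the bytes of an unsigned integer (assumes 8-bit bytes).
template <typename T>
constexpr T swap_bytes(T v) {
    static_assert(std::is_unsigned<T>::value, "unsigned types only");
    T r = 0;
    for (std::size_t i = 0; i < sizeof(T); ++i) {
        r = static_cast<T>((r << 8) | (v & 0xFFu));
        v = static_cast<T>(v >> 8);
    }
    return r;
}

// Converts from host order to little endian, consulting the trait for T.
template <typename T>
constexpr T host_to_le(T t) {
    return byte_order<T>::value == endian::little ? t : swap_bytes(t);
}
```

A specialization such as `byte_order<double>` is where a platform whose floating-point order differs from its integer order would record that difference; the unsigned-only `swap_bytes` here would then need a floating-point counterpart.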
 

>>
>>
>>      > //Optional: Byte Swapping an arbitrary sized buffer? Is this at all useful?
>>
>>     I am not sure about this either. At least all supported integral types
>>     should be supported.
>>

> After thinking about it further, I can't come up with a use case requiring
> a buffer swap routine.
Me neither; let's drop it.
 

>>     Note: For buffers std::reverse_copy could be used to realize bswap.
>>
>>      > //Optional: Do we need/want a macro interface?
>>
>>     I think we you have it!

> I am not that sure anymore if we really need a macro interface.

I am going to hold onto the macro interface. Macros might be evil, but there could be a use for it. After all, we still don't have a static_if. Once a real proposal is drafted, if the standards committee decides not to have the macro interface, I'll remove it then.
 

>>
>>     What I've been missing is a function to convert the byte_order without
>>     knowing the order during compilation time. e.g:
>>
>>     template<typename T>
>>     constexpr T convertByteOrderRT(T t, byte_order from, byte_order to);
>>
>>     byte_order from = ??; // Is set during runtime
>>
>>     uint16_t x = convertByteOrderRT(v, from, little_endian);
>>
>>     This function could also replace the 'convertByteOrder' function
>>     mentioned above.
>>
>>
>> runtime control makes sense.
>>

> I think we could unify the 'reorder_bytes' functions as well as the
> 'host_to' function to something like:
>
> template <typename T>
> constexpr T reorder_bytes(T t, byte_order in, byte_order out =
> byte_order::host) {
>    return in == out ? t : swap_bytes(t);
> }

I had some ideas like this too.

Thiago Macieira

Oct 6, 2013, 2:53:10 PM
to std-pr...@isocpp.org
On domingo, 6 de outubro de 2013 11:34:17, fmatth...@gmail.com wrote:
> template <typename T>
> struct byte_order<T> {
> endian value = little;
> }
>
> And then provide specializations for integral and floating point types as
> needed.

Any chance of modifying numeric_limits for this?

fmatth...@gmail.com

Oct 6, 2013, 3:15:51 PM
to std-pr...@isocpp.org
On Sunday, October 6, 2013 2:53:10 PM UTC-4, Thiago Macieira wrote:
> On domingo, 6 de outubro de 2013 11:34:17, fmatth...@gmail.com wrote:
>> template <typename T>
>> struct byte_order<T> {
>>   endian value = little;
>> }
>>
>> And then provide specializations for integral and floating point types as
>> needed.
>
> Any chance of modifying numeric_limits for this?


What would be the benefit of putting it there?

Keeping it as a separate trait makes it easier to extend for user-defined types like SIMD registers.
Also, not modifying already existing headers makes this proposal much easier to accept.

Still, if there's a compelling reason to put it in numeric_limits, we should fight for that.

Geoffrey Romer

Oct 7, 2013, 12:06:05 PM
to std-pr...@isocpp.org
On Fri, Oct 4, 2013 at 5:51 AM, Markus Mayer <lotha...@gmx.de> wrote:
> On 10/04/2013 07:05 AM, fmatth...@gmail.com wrote:
>
>>>     It's not standardized because networking didn't need the generic
>>>     solution and I don't recall
>>>     seeing a proposal for the generic one.
>>
>>
>> In that case I'd like to work on making one. Its a pretty small thing so
>> it should not be difficult.
>> Here is an example of what it might look like:
>> https://github.com/fmatthew5876/stdcxx/blob/master/byteorder/include/byteorder.hh
>>
>> Anyone care to comment or add suggestions?
>>
>> Thanks!

This seems problematic, because it's hard to rigorously specify "byte swapping" for types beyond the exact-width integer types, and it may not even be implementable with defined behavior (e.g. if byte-swapping a well-formed value produces a trap value). It's also too low-level an API for general use, because it requires the user to know or condition on the host endianness, which is almost always a mistake (q.v. http://commandcenter.blogspot.com/2012/04/byte-order-fallacy.html).

It seems to me that what's really wanted are two symmetric operations, serialization and deserialization, which convert between the (unspecified) native format and an explicitly-specified byte format (e.g. 32-bit little-endian two's complement, or big-endian IEEE 754 double-precision). The non-native format should be handled using char*, the standard-sanctioned way to work with uninterpreted bytes (or perhaps some type that wraps char*).

Straw-man proposal: establish a BinaryFormatPolicy concept, such that if Policy is a type that models that concept, and c is a char* value,

Policy::native_type Policy::read(c);

reads Policy::number_of_bytes bytes starting at c, and returns the value they represent. Similarly, if v is a value of type Policy::native_type

Policy::write(v, c);

writes the encoded value of v in the Policy::number_of_bytes bytes starting at c. You could even conceivably support variable-length encodings, by having read and write update their pointer parameters, but then you have to worry about issues like buffer allocation.
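To make the straw-man concrete, here is one possible model of it: an unsigned 32-bit integer stored big endian. The member names `native_type`, `number_of_bytes`, `read` and `write` follow the message; the class name `big_endian_uint32` and everything else are assumptions of this sketch, and 8-bit bytes are assumed.

```cpp
#include <cassert>
#include <cstdint>

struct big_endian_uint32 {
    using native_type = std::uint32_t;
    static constexpr int number_of_bytes = 4;

    // Reads number_of_bytes bytes starting at c, most significant first.
    static native_type read(const char* c) {
        const unsigned char* p = reinterpret_cast<const unsigned char*>(c);
        return (native_type{p[0]} << 24) | (native_type{p[1]} << 16) |
               (native_type{p[2]} << 8) | native_type{p[3]};
    }

    // Writes the encoding of v into number_of_bytes bytes starting at c.
    static void write(native_type v, char* c) {
        unsigned char* p = reinterpret_cast<unsigned char*>(c);
        p[0] = static_cast<unsigned char>(v >> 24);
        p[1] = static_cast<unsigned char>(v >> 16);
        p[2] = static_cast<unsigned char>(v >> 8);
        p[3] = static_cast<unsigned char>(v);
    }
};
```

The host's byte order never appears in the code, so the same policy works unchanged on any host, in line with the byte-order-fallacy article linked above.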

You'd also want to standardize at least some basic policy classes, covering big and little endian versions of unsigned and two's complement integers in various sizes, and IEEE single, double, and extended precision. I'm not sure exactly how to name them; it's tempting to make things like endianness and size be template parameters, but it's not clear how useful that is in practice, and I worry making size a parameter will tempt users into doing subtly wrong things like std::big_endian_ieee<sizeof(double)> in the name of portability.
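A sketch of one such basic policy, big-endian IEEE double precision, in the shape of the BinaryFormatPolicy concept described above. The name `big_endian_ieee_double` is hypothetical. It assumes the host `double` is IEEE 754 binary64 (checked at compile time) and that `double` and `std::uint64_t` share a byte order, which common platforms satisfy but the standard does not promise. The width is fixed at 8 bytes rather than derived from `sizeof(double)`, in keeping with the concern about size parameters.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

struct big_endian_ieee_double {
    static_assert(std::numeric_limits<double>::is_iec559 && sizeof(double) == 8,
                  "requires IEEE 754 binary64 doubles");
    using native_type = double;
    static constexpr int number_of_bytes = 8;

    static void write(double v, char* c) {
        std::uint64_t bits;
        std::memcpy(&bits, &v, sizeof bits);  // host bit pattern as an integer
        unsigned char* p = reinterpret_cast<unsigned char*>(c);
        for (int i = 0; i < 8; ++i) {         // most significant byte first
            p[i] = static_cast<unsigned char>(bits >> (56 - 8 * i));
        }
    }

    static double read(const char* c) {
        const unsigned char* p = reinterpret_cast<const unsigned char*>(c);
        std::uint64_t bits = 0;
        for (int i = 0; i < 8; ++i) {
            bits = (bits << 8) | p[i];
        }
        double v;
        std::memcpy(&v, &bits, sizeof v);
        return v;
    }
};
```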


> This enables the following usages:
> uint16_t x = bswap<uint16_t>(5);
> uint16_t y = bswap(x);
>
>> template <typename T>
>> constexpr T cpu_to_le(T t) { return byte_order::value == byte_order::little_endian ? t : bswap(t); }
>
> I personally would prefer the wording 'native' to 'cpu'.
> I would move the byte_orders to template arguments (either as addition, or as a replacement of the above):
>
> template <byte_order from, byte_order to, typename T>
> constexpr T convertByteOrder(T t);
>
>> //Optional: Should we support signed integers as well?
>
> IMHO yes!
>
>> //Optional: Should we support floating point types? Do binary formats or hardware devices need this?
>
> Yes, I am sure someone will someday create a format that needs this (If it is not already existing).
>
> Question: Is there any architecture where integers and floating point numbers are stored in different endiannesses?

As was pointed out elsewhere in the thread, the answer is yes, and worse, Wikipedia refers to "old ARM processors that have half little-endian, half big-endian floating point representation". The possibility of boustrophedon endianness even for integers has already come up. However, per the above, we shouldn't have to care about any possible host endianness; we just need to care about the formats for which it's worth standardizing the ability to read and write them.
 

//Optional: Byte Swapping an arbitrary sized buffer? Is this at all useful?

I am not sure about this either. At least all supported integral types should be supported.

Note: For buffers std::reverse_copy could be used to realize bswap.

//Optional: Do we need/want a macro interface?

I think we you have it!

What I've been missing is a function to convert the byte order without knowing the order at compile time, e.g.:

template<typename T>
constexpr T convertByteOrderRT(T t, byte_order from, byte_order to);

byte_order from = ??; // Is set during runtime

uint16_t x = convertByteOrderRT(v, from, little_endian);

This function could also replace the 'convertByteOrder' function mentioned above.
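Here is a sketch of such a run-time variant, under two stated assumptions. First, only big and little endian are modeled, and the `byte_order` enum here is hypothetical. Second, `T` is a trivially copyable type without padding bits, such as the fixed-width integers. The sketch reverses the object representation with `std::memcpy` and `std::reverse`. That makes it usable beyond integers, but it also means it can no longer be `constexpr` in C++14.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical run-time-selectable byte orders (only the two common ones).
enum class byte_order { little_endian, big_endian };

// Sketch of convertByteOrderRT: from/to are ordinary arguments, so they can come
// from a file header or protocol field read at run time.
template <typename T>
T convertByteOrderRT(T t, byte_order from, byte_order to) {
  static_assert(std::is_trivially_copyable<T>::value, "needs a trivially copyable type");
  if (from == to) return t;                // same order: nothing to do
  unsigned char bytes[sizeof(T)];
  std::memcpy(bytes, &t, sizeof(T));       // copy out the raw bytes
  std::reverse(bytes, bytes + sizeof(T));  // reverse them
  std::memcpy(&t, bytes, sizeof(T));       // copy them back
  return t;
}
```

Calling it with constant arguments gives the same result that the template-argument `convertByteOrder` would give.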

Some corner cases we probably should not care about:
- Swapping of types with sizes like 6 bytes
- Behavior when CHAR_BIT != 8


I hope these remarks are helpful for you and welcome any feedback.


regards, Markus
--

--- You received this message because you are subscribed to the Google Groups "ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an email to std-proposals+unsubscribe@isocpp.org.

fmatth...@gmail.com

Oct 8, 2013, 8:41:10 PM10/8/13
to std-pr...@isocpp.org


On Monday, October 7, 2013 12:06:05 PM UTC-4, Geoffrey Romer wrote:
On Fri, Oct 4, 2013 at 5:51 AM, Markus Mayer <lotha...@gmx.de> wrote:
On 10/04/2013 07:05 AM, fmatth...@gmail.com wrote:

    It's not standardized because networking didn't need the generic
    solution and I don't recall
    seeing a proposal for the generic one.


In that case I'd like to work on making one. It's a pretty small thing so
it should not be difficult.
Here is an example of what it might look like:
https://github.com/fmatthew5876/stdcxx/blob/master/byteorder/include/byteorder.hh

Anyone care to comment or add suggestions?

Thanks!

Even if byte-swapping produces a trap value (for example, a signaling NaN for float), all it does is store the value in either a register or memory. Is there any architecture where merely storing a particular bit pattern could cause a fault or exception? I certainly would hope not.
If your byte-swapped value could cause an exception when you do arithmetic or other operations on it, that's OK.
 
low-level an API for general use, because it requires the user to know or condition on the host endianness, which is almost always a mistake (q.v. http://commandcenter.blogspot.com/2012/04/byte-order-fallacy.html).
Thank you for this; the point of view in this article was very enlightening. I'll comment more at the end. 

It seems to me that what's really wanted are two symmetric operations, serialization and deserialization, which convert between the (unspecified) native format and an explicitly-specified byte format (e.g. 32-bit little-endian two's complement, or big-endian IEEE 754 double-precision). The non-native format should be handled using char*, the standard-sanctioned way to work with uninterpreted bytes (or perhaps some type that wraps char*).

Yes, converting from whatever the native format is to big and little endian is really the end goal. The byte order of the machine is an implementation detail. 

Straw-man proposal: establish a BinaryFormatPolicy concept, such that if P is a type that models that concept, and c is a char* value,

Policy::native_type Policy::read(c);
Isn't this pointer requirement a bit restrictive? File I/O in particular does not expose its internal buffer.

float farray[512];
read_from_file(somefile, farray, sizeof(farray));

//Entire loop should be optimized out if host is big endian.
for(auto& f : farray) {
f = be_to_host(f);
}


I'll have to consider your policy idea a bit more first. Something like that could be a better interface.

This brings up an interesting question. Do we actually need to expose the host byte order at all?
If we just provide host_to_Xe() functions, then we avoid all of these nasty corner cases with strange byte orders.

In addition, exposing the byte order brings problems when a new architecture with a byte order not in the standard list comes along.
User code written like this will break:
if(host_byte_order == little_endian) {
  //Do little endian stuff
}  else {
  //Assume big endian
}

This becomes impossible if there is no exposed host byte order.

So now let me ask this question. If we just provide host_to_Xe() functions for all builtin types (and optionally encourage overloads for implementation-specific types, such as SIMD types, as needed),
is there ever a case where users will still need a method of determining the system byte order?

Put another way, is it possible for us to account for all possible use cases of knowing the byte order and just provide those via a simple interface?
The only time I've ever needed it is for serialization.

If exposing the byte order enables an important use case that cannot be abstracted, then I'd be inclined to keep it around. Otherwise, do we really need it at all?
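To illustrate that host_to_Xe() can be provided without exposing the host byte order at all, here is a sketch. The names are hypothetical, it covers only `uint32_t`, and it assumes 8-bit bytes. It builds the target byte sequence with shifts and copies that sequence into the result, so nothing in it ever tests the host order. On a host whose order already matches, an optimizer may reduce it to a plain copy, but that is up to the compiler.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical host_to_le: returns a value whose in-memory bytes are little-endian.
inline std::uint32_t host_to_le(std::uint32_t v) {
  const unsigned char bytes[4] = {
      static_cast<unsigned char>(v), static_cast<unsigned char>(v >> 8),
      static_cast<unsigned char>(v >> 16), static_cast<unsigned char>(v >> 24)};
  std::uint32_t out;
  std::memcpy(&out, bytes, sizeof out);  // lay the chosen bytes into the result
  return out;
}

// Hypothetical host_to_be: same idea with the most significant byte first.
inline std::uint32_t host_to_be(std::uint32_t v) {
  const unsigned char bytes[4] = {
      static_cast<unsigned char>(v >> 24), static_cast<unsigned char>(v >> 16),
      static_cast<unsigned char>(v >> 8), static_cast<unsigned char>(v)};
  std::uint32_t out;
  std::memcpy(&out, bytes, sizeof out);
  return out;
}
```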


 

David Stone

Oct 8, 2013, 8:41:22 PM10/8/13
to std-pr...@isocpp.org, fmatth...@gmail.com
There are situations in which you want to do more than just host vs. specified. For instance, you may have a machine that translates from one networking protocol to another. On the receiving end, the data is in little-endian format, but on the sending end, the data must be in big-endian format. You don't want to have to go through any intermediate step, you just want to convert little to big, even if you happen to be running on a PDP-10.

For what it's worth, both I and Beman Dawes have worked on two fairly complete endian libraries (that are partially merged). Mine only has the byte conversion functions, and his also has integer types of specified endianness. I will let Beman discuss his version (if he reads this), but mine can be found at https://bitbucket.org/davidstone/endian so this might be a good reference for discussion. I believe that it should work on any platform with a standard-compliant implementation, even if CHAR_BIT == 19, sizeof(short) == sizeof(int) == sizeof(long) == (sizeof(long long) / 9).

Based on my experiences now, I would actually argue against the naming decisions that I made in that library. For reference, the functions are named in the style of `T be_to_le(T t)` and `T h_to_pdp(T t)`. My original guide was to follow the lead of the htons-style functions. The type is never included in the function name, though, and possible values in either position are be, le, pdp, h, and n. I now believe that it would be better to depart a bit from these somewhat cryptic names. These functions are unlikely to be called very often (usually relegated to one or two locations as part of a larger library function), and will almost never be part of large chained expressions. Typical use would likely look something like

    int const value = n_to_h(read_int(socket));

as the most complicated form in most code. I believe that this justifies a more verbose naming convention with less abbreviation, and were I to re-write this library, I would spell things out more. My preference would be a name like host_to_network or little_to_big. host_to_network_byte_order is getting a little too long. I also don't feel like we get any benefit from specifying the source and destination formats as an enum passed via template parameter.

Whether the "network" version of byte ordering functions is needed at all is something we would have to decide as well. I lean toward leaving it out (and just having people use "big"), but I do not feel strongly about this and would not complain if others preferred to have it in. I do not know how languages other than C name similar functions.

"host" would be my preferred name over "native" or "cpu", but this also isn't an important issue to me.

Theoretically these could be defined as constexpr using the new relaxed constexpr rules (but not as I defined the functions due to the use of reinterpret_cast). However, this would constrain implementations a bit. Based on my testing, the reinterpret_cast version actually ended up being slower on all compilers, but it is also the only simple solution that works for floating point types. Moreover, most uses of an endian library will be for writing to files or network interfaces, which cannot be done in constexpr functions, anyway, so these functions probably should not be declared constexpr.
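Regarding floating point, a common alternative to `reinterpret_cast` is to copy the bits through a same-sized integer with `std::memcpy`. That avoids the strict-aliasing problem, but like `reinterpret_cast` it cannot be used in a C++14 constant expression. The following sketch assumes a 32-bit `float` and 8-bit bytes, and `bswap_float` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical byte swap for float: copy to an integer, swap, copy back.
inline float bswap_float(float f) {
  static_assert(sizeof(float) == sizeof(std::uint32_t), "assumes 32-bit float");
  std::uint32_t bits;
  std::memcpy(&bits, &f, sizeof bits);  // no aliasing violation, unlike reinterpret_cast
  bits = ((bits & 0x000000FFu) << 24) | ((bits & 0x0000FF00u) << 8) |
         ((bits & 0x00FF0000u) >> 8) | ((bits & 0xFF000000u) >> 24);
  std::memcpy(&f, &bits, sizeof f);
  return f;
}
```

Swapping 1.5f (bit pattern 0x3FC00000) gives the bit pattern 0x0000C03F, which is a subnormal. Swapping again restores 1.5f.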

They also should not be declared noexcept due to the possibility of trap representations.



Given all of this, I don't believe it would be necessary to worry about defining some sort of enum or integer value to specify what the byte order of the host machine is, as the only purpose for such a thing that I can see would be to define these functions.



There are systems where the data segment of memory can be little or big endian, and on such systems, these functions should still work as expected (do a dynamic determination of what type of system it is).
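A run-time probe of the kind described here might look like the following sketch. The `host_order` enum and `detect_host_order` are hypothetical names. The probe reports the layout that the compiled code actually uses for a `uint32_t`. Compilers will often fold it to a constant, which suggests the compiler has to know the layout already.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical result type: the two common orders plus "anything else".
enum class host_order { little, big, other };

// Inspect the in-memory bytes of a known 32-bit pattern.
inline host_order detect_host_order() {
  const std::uint32_t probe = 0x01020304u;
  unsigned char b[4];
  std::memcpy(b, &probe, sizeof b);
  if (b[0] == 0x04 && b[1] == 0x03 && b[2] == 0x02 && b[3] == 0x01) return host_order::little;
  if (b[0] == 0x01 && b[1] == 0x02 && b[2] == 0x03 && b[3] == 0x04) return host_order::big;
  return host_order::other;  // e.g. PDP-style word-swapped layouts
}
```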

fmatth...@gmail.com

Oct 8, 2013, 10:58:09 PM10/8/13
to std-pr...@isocpp.org, fmatth...@gmail.com


On Tuesday, October 8, 2013 8:41:22 PM UTC-4, David Stone wrote:
There are situations in which you want to do more than just host vs. specified. For instance, you may have a machine that translates from one networking protocol to another. On the receiving end, the data is in little-endian format, but on the sending end, the data must be in big-endian format. You don't want to have to go through any intermediate step, you just want to convert little to big, even if you happen to be running on a PDP-10.

Indeed, we should have conversions to and from the different formats. This is useful for virtual machines, binary converters, and such.
 

For what it's worth, both I and Beman Dawes have worked on two fairly complete endian libraries (that are partially merged). Mine only has the byte conversion functions, and his also has integer types of specified endianness. I will let Beman discuss his version (if he

Please contact him and let him know about this if you can. I'd really like the input of people who have already done this before. Experience is the most valuable teacher.
 
reads this), but mine can be found at https://bitbucket.org/davidstone/endian so this might be a good reference for discussion.
Very cool, I will take a look.
 
I believe that it should work on any platform with a standard-compliant implementation, even if CHAR_BIT == 19, sizeof(short) == sizeof(int) == sizeof(long) == (sizeof(long long) / 9). 

Based on my experiences now, I would actually argue against the naming decisions that I made in that library. For reference, the functions are named in the style of `T be_to_le(T t)` and `T h_to_pdp(T t)`. My original guide was to follow the lead of the htons-style functions. The type is never included in the function name, though, and possible values in either position are be, le, pdp, h, and n. I now believe that it would be better to depart a bit from these somewhat cryptic names. These functions are unlikely to be called very often (usually relegated to one or two locations as part of a larger library function), and will almost never be part of large chained expressions. Typical use would likely look something like

    int const value = n_to_h(read_int(socket));

as the most complicated form in most code. I believe that this justifies a more verbose naming convention with less abbreviation, and were I to re-write this library, I would spell things out more. My preference would be a name like host_to_network or little_to_big. host_to_network_byte_order is getting a little too long.

That's good to know. Naming is always the hardest thing to decide.
 
I also don't feel like we get any benefit from specifying the source and destination formats as an enum passed via template parameter.
 
Perhaps you want to write a customizable serialization class/framework and pass in the byte order of target byte stream? Providing templated versions certainly doesn't hurt.


Whether the "network" version of byte ordering functions is needed at all is something we would have to decide as well. I lean toward leaving it out (and just having people use "big"), but I do not feel strongly about this and would not complain if others preferred to have it in. I do not know how languages other than C name similar functions.

Actually, we don't have to decide that here. I'm going to leave it to the networking people to decide for themselves.
One thing we may want to collaborate on is naming. If we use host_to_little, they may want to use host_to_network; if we go with htole or host_to_le, maybe they stick with hton.
 

"host" would be my preferred name over "native" or "cpu", but this also isn't an important issue to me.

I'm kind of in favor of host as well. It at least matches the traditional networking context everyone is familiar with.
 

Theoretically these could be defined as constexpr using the new relaxed constexpr rules (but not as I defined the functions due to the use of reinterpret_cast). However, this would constrain implementations a bit. Based on my testing, the reinterpret_cast version actually ended up being slower on all compilers, but it is also the only simple solution that works for floating point types. Moreover, most uses of an endian library will be for writing to files or network interfaces, which cannot be done in constexpr functions, anyway, so these functions probably should not be declared constexpr.

I'm strongly in favor of constexpr. You can define things like constant headers or hardware device ordinals right into your code with no runtime overhead.

struct header {
  uint32_t version;
  uint32_t size;
};
const header hdr = {
  host_to_be(5),
  host_to_be(32)
};

write(file, &hdr, sizeof(hdr));
write(file, data, 32);

Also, it wouldn't surprise me if constexpr continues to get more powerful and permissive as time goes on.
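Here is a compilable version of the constexpr header example above, offered only as a sketch. `host_to_be` is hypothetical, and it learns the host order from the GCC/Clang predefined macros `__BYTE_ORDER__`, `__ORDER_LITTLE_ENDIAN__` and `__ORDER_BIG_ENDIAN__`. Relying on those macros is an implementation-specific assumption, and other compilers would need a different source. A host whose byte order is chosen at run time cannot be handled this way.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical constexpr host_to_be for uint32_t (C++11 single-return form).
constexpr std::uint32_t host_to_be(std::uint32_t v) {
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
  return ((v & 0x000000FFu) << 24) | ((v & 0x0000FF00u) << 8) |
         ((v & 0x00FF0000u) >> 8) | ((v & 0xFF000000u) >> 24);
#elif defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
  return v;  // already big-endian
#else
#error "host byte order unknown to this sketch"
#endif
}

struct header {
  std::uint32_t version;
  std::uint32_t size;
};

// Evaluated at compile time: no byte swapping happens at run time.
constexpr header hdr = {host_to_be(5), host_to_be(32)};
```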


They also should not be declared noexcept due to the possibility of trap representations.

This was mentioned before. 
Do you know of a concrete example on some machine?


Given all of this, I don't believe it would be necessary to worry about defining some sort of enum or integer value to specify what the byte order of the host machine is, as the only purpose for such a thing that I can see would be to define these functions.

I think so too. Unless someone can come up with a compelling use case, maybe it should be left undefined.


There are systems where the data segment of memory can be little or big endian, and on such systems, these functions should still work as expected (do a dynamic determination of what type of system it is).

This is another possible issue. What are the limits of configurable byte orders? If it's just that the app can be compiled one way or the other, then it's no problem. If it has to be determined at runtime, that would break constexpr.
 

David Stone

Oct 8, 2013, 11:17:07 PM10/8/13
to std-pr...@isocpp.org, fmatth...@gmail.com
Another possible naming convention that someone had brought up in the past is that for "host" configurations, the "host" can simply be implied. Rather than host_to_big, the function would be to_big. Rather than little_to_host the function would be from_little. However, that was in the context of a proposed boost::endian namespace. Perhaps this library should follow the lead of std::chrono with the use of a nested namespace?

Consider the case of some ARM processors: http://infocenter.arm.com/help/topic/com.arm.doc.ddi0278b/Cegbbbab.html . The endianness can be set dynamically by setting a global signal, and that endianness can be big, little, or "mixed". Mixed is the same as little endian, except that 64-bit values have their 32-bit "words" swapped, which is similar to how PDP endianness treats the 16-bit halves of 32-bit values.

This type of scenario appears to ruin any possibility of constexpr support in general.
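To make these three layouts concrete, the following sketch computes how the 64-bit value 0x0102030405060708 would be stored under each of them. The bytes are listed lowest address first. "Mixed" follows the description above, meaning little endian with the two 32-bit words swapped. The function names are hypothetical. The layouts are computed arithmetically, so the result does not depend on the host.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using bytes8 = std::array<unsigned char, 8>;

// Least significant byte at the lowest address.
inline bytes8 little_layout(std::uint64_t v) {
  bytes8 b{};
  for (int i = 0; i < 8; ++i) b[i] = static_cast<unsigned char>(v >> (8 * i));
  return b;
}

// Most significant byte at the lowest address.
inline bytes8 big_layout(std::uint64_t v) {
  bytes8 b{};
  for (int i = 0; i < 8; ++i) b[i] = static_cast<unsigned char>(v >> (8 * (7 - i)));
  return b;
}

// Swap the two 32-bit halves, then lay the result out little-endian.
inline bytes8 mixed_layout(std::uint64_t v) {
  return little_layout((v << 32) | (v >> 32));
}
```

For 0x0102030405060708 this gives 08 07 06 05 04 03 02 01 (little), 01 02 03 04 05 06 07 08 (big), and 04 03 02 01 08 07 06 05 (mixed).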

David Stone

Oct 8, 2013, 11:39:57 PM10/8/13
to std-pr...@isocpp.org, fmatth...@gmail.com
As for trap representations, this is trivial if we allow the endian functions to work on floating point values for systems with IEEE floats. There are valid values for double in one endianness that would be a signaling NaN on another. I don't know off-hand of any real platforms that have trap representations for integer types, but I would be suspicious of getting the -0 representation on a one's complement or sign-magnitude architecture.
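To make the signaling-NaN point concrete, here is a sketch that assumes an IEEE 754 binary64 `double` and 8-bit bytes. The bit pattern 0x010000000000F07F is an ordinary tiny, positive, finite double. Its byte reversal, 0x7FF0000000000001, has an all-ones exponent, a clear quiet bit and a nonzero payload, which makes it a signaling NaN. Whether merely copying or loading such a value traps is platform-specific, so this only shows that the pattern arises. `reverse64` and `from_bits` are hypothetical helpers.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Reverse the 8 bytes of a 64-bit pattern (assumes 8-bit bytes).
inline std::uint64_t reverse64(std::uint64_t v) {
  std::uint64_t r = 0;
  for (int i = 0; i < 8; ++i) {
    r = (r << 8) | (v & 0xFFu);  // move the lowest remaining byte into r
    v >>= 8;
  }
  return r;
}

// Reinterpret a bit pattern as a double (assumes 64-bit IEEE 754 double).
inline double from_bits(std::uint64_t bits) {
  static_assert(sizeof(double) == sizeof(std::uint64_t), "assumes 64-bit double");
  double d;
  std::memcpy(&d, &bits, sizeof d);
  return d;
}

// Bit pattern of a tiny, positive, finite double.
constexpr std::uint64_t finite_bits = 0x010000000000F07Full;
```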

Markus Mayer

Oct 9, 2013, 11:17:03 AM10/9/13
to std-pr...@isocpp.org
On 10/06/2013 08:32 PM, Thiago Macieira wrote:
> On domingo, 6 de outubro de 2013 13:37:48, Markus Mayer wrote:
>>> > //Optional: Byte Swapping an arbitrary sized buffer? Is this at
>>> all useful?
>>>
>>> I am not sure about this either. At least all supported integral types
>>> should be supported.
>>
>> After further thinking about it, I can't come up with a use case requiring
>> a buffer swap routine.
>
> Here's one: checking the endianness and converting to an integral at once,
> from a char buffer.
>
> const char *buffer = getBufferPointer();
> unsigned id = qFromBigEndian<uint16_t>(buffer);
> unsigned qdcount = qFromBigEndian<uint16_t>(buffer + 4);
> unsigned ancount = qFromBigEndian<uint16_t>(buffer + 6);
> unsigned nscount = qFromBigEndian<uint16_t>(buffer + 8);
> unsigned arcount = qFromBigEndian<uint16_t>(buffer + 10);
>
> This is very useful when parsing binary network protocols or binary files.
> Usually, the formats align the fields naturally, but sometimes they don't.
> What's more, there's no guarantee that the alignment requirements of the local
> CPU match those of the format. And, of course, there's the matter of whether
> the character buffer was aligned enough to begin with.
>
> See
> http://code.woboq.org/qt5/qtbase/src/corelib/global/qendian.h.html#_Z14qFromBigEndianPKh
>

I think this would mix two unrelated functions into one (byte_order
conversion and obtaining values from a buffer).

The proper solution would be two independent functions like:

unsigned qdcount = big_to_host(fromBuffer<uint16_t>(buffer + 4));
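The two-step decomposition can be sketched as follows. `fromBuffer` and `big_to_host` are hypothetical, and `big_to_host` handles 16 bits only. A one-step `read_big16`, closer in spirit to `qFromBigEndian`, is included for comparison. `std::memcpy` makes the load safe for unaligned buffers, and `unsigned char` is used instead of `char` to avoid sign extension. Whether a compiler produces equally good code for both forms is up to the optimizer.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical fromBuffer: alignment-safe load of a T in host byte order.
template <typename T>
T fromBuffer(const unsigned char* p) {
  T v;
  std::memcpy(&v, p, sizeof v);
  return v;
}

// Hypothetical big_to_host: treat the value's in-memory bytes as big-endian.
inline std::uint16_t big_to_host(std::uint16_t v) {
  unsigned char b[2];
  std::memcpy(b, &v, sizeof b);
  return static_cast<std::uint16_t>((b[0] << 8) | b[1]);
}

// One-step alternative: decode big-endian bytes directly.
inline std::uint16_t read_big16(const unsigned char* p) {
  return static_cast<std::uint16_t>((p[0] << 8) | p[1]);
}
```

With a 12-byte DNS header in `buf`, `big_to_host(fromBuffer<uint16_t>(buf + 4))` and `read_big16(buf + 4)` both yield the question count.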

Markus Mayer

Oct 9, 2013, 11:18:31 AM10/9/13
to std-pr...@isocpp.org
On 10/06/2013 08:53 PM, Thiago Macieira wrote:
> On domingo, 6 de outubro de 2013 11:34:17, fmatth...@gmail.com wrote:
>> template <typename T>
>> struct byte_order<T> {
>> endian value = little;
>> }
>>
>> And then provide specializations for integral and floating point types as
>> needed.
>
> Any chance of modifying numeric_limits for this?
>

I also thought about this, but how does byte_order relate to the
term "numeric_limits"?

Markus Mayer

Oct 9, 2013, 11:26:54 AM10/9/13
to std-pr...@isocpp.org
Even if you can change the endianness of some ARM processors, the
compiler still needs to know the endianness (gcc uses -mlittle-endian and
-mbig-endian for ARM) to lay out multibyte values correctly.

Thiago Macieira

Oct 9, 2013, 12:25:08 PM10/9/13
to std-pr...@isocpp.org
On quarta-feira, 9 de outubro de 2013 17:17:03, Markus Mayer wrote:
> The proper solution would be two independent functions like:
>
> unsigned qdcount = big_to_host(fromBuffer<uint16_t>(buffer + 4));

Maybe, or that might be slower.

Geoffrey Romer

Oct 9, 2013, 1:16:35 PM10/9/13
to std-pr...@isocpp.org
I'm not very familiar with how trap values work on various architectures, so you may be right on that, but my point is more general: the whole point of strong typing is to provide a guarantee that typed objects are well-formed. In other words, 'float' should only be used to store floating-point numbers; it is an abuse of the type system to use it to store "some bytes that we will eventually convert to a floating point number", and the standard library should not enable those sorts of abuses. It's a bad practice, even if not technically harmful in this particular case.

The correct type for storing uninterpreted bytes is char* (although if you get me drunk enough, I might also endorse unsigned integral types in some circumstances).
 
 
low-level an API for general use, because it requires the user to know or condition on the host endianness, which is almost always a mistake (q.v. http://commandcenter.blogspot.com/2012/04/byte-order-fallacy.html).
Thank you for this; the point of view in this article was very enlightening. I'll comment more at the end. 

It seems to me that what's really wanted are two symmetric operations, serialization and deserialization, which convert between the (unspecified) native format and an explicitly-specified byte format (e.g. 32-bit little-endian two's complement, or big-endian IEEE 754 double-precision). The non-native format should be handled using char*, the standard-sanctioned way to work with uninterpreted bytes (or perhaps some type that wraps char*).

Yes converting from whatever native is to big and little endian is really the end goal. The byte order of the machine is an implementation detail. 

Straw-man proposal: establish a BinaryFormatPolicy concept, such that if P is a type that models that concept, and c is a char* value,

Policy::native_type Policy::read(c);
Isn't this pointer requirement a bit restrictive? File I/O in particular does not expose its internal buffer. 

Yes, that's one of several reasons this is only a straw man, not a real proposal. One could imagine using some sort of generic byte-source and byte-sink concept instead of char*, or providing overloads for iostreams. Ideally, the API would cleanly support both I/O and in-place conversion. It's important, though, that the optimizer be able to fully inline read(), write(), and iteration operations on the source/sink types, so that it can eliminate no-op loops.
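On the byte-source idea, one way to avoid the `char*` restriction for file I/O is to let `read()` take any input iterator over bytes. That way it can consume data directly from a stream through `std::istreambuf_iterator`. Below is a sketch of a single hypothetical policy for 32-bit little-endian unsigned integers. Its names follow the straw man above but do not come from any library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <iterator>
#include <sstream>
#include <string>

// Hypothetical wire-format policy: 32-bit little-endian unsigned integer.
struct little_endian_u32 {
  using host_type = std::uint32_t;
  static constexpr std::size_t wire_size = 4;

  // Reads wire_size bytes from any input iterator and advances it past them.
  template <typename InputIt>
  static host_type read(InputIt& it) {
    host_type v = 0;
    for (std::size_t i = 0; i < wire_size; ++i, ++it) {
      v |= static_cast<host_type>(static_cast<unsigned char>(*it)) << (8 * i);
    }
    return v;
  }
};

// Usage: decode two values straight from a stream, with no intermediate buffer.
inline void read_two(std::istream& in, std::uint32_t& a, std::uint32_t& b) {
  std::istreambuf_iterator<char> it(in);
  a = little_endian_u32::read(it);
  b = little_endian_u32::read(it);
}
```

The same `read()` works on a `std::istringstream`, a file stream, or a plain pointer into a buffer.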
 

float farray[512];
read_from_file(somefile, farray, sizeof(farray));

//Entire loop should be optimized out if host is big endian.
for(auto& f : farray) {
f = be_to_host(f);
}

There are two problems with this code. First of all, you are reading uninterpreted bytes directly into float objects. As discussed above, this is a bad practice; the behavior of this code is at best unspecified, if not outright undefined, and I'm far from convinced it's safe in practice, given the possibility of things like signaling NaNs.

Second, it's not portable. It tacitly assumes that sizeof(float) is exactly equal to the size of floating point numbers in somefile's format, but this cannot be true in general, because the standard does not uniquely determine sizeof(float). Worse, your comment indicates that be_to_host does nothing but byte swapping, which means that this code is also assuming that the host uses the same floating-point format as somefile.

These are the exact points I wanted to emphasize with my proposed API:
- The conversion API must not be symmetric: the wire-format data is an uninterpreted byte sequence, and must be treated as such by the type system.
- The conversion API cannot be parameterized by the host type: it must be parameterized by the wire format (which contains strictly more information). It is implementation-defined whether any given host type can hold all possible values of a given wire format, so the host type must be dictated by the API, not by the user.

Here's how I'd write that code (note that I've tweaked some names from my previous proposal):

typedef big_endian_ieee_single::host_type ieee_float;
static_assert(sizeof(ieee_float) == big_endian_ieee_single::wire_size,
              "in-place conversion needs host and wire sizes to match");

alignas(ieee_float) char raw_array[512 * big_endian_ieee_single::wire_size];
read_from_file(somefile, raw_array, sizeof(raw_array));

char* ptr = raw_array;
// Entire loop should be optimized out if host supports big-endian IEEE
while (ptr < raw_array + sizeof(raw_array)) {
  *reinterpret_cast<ieee_float*>(ptr) = big_endian_ieee_single::read(ptr);
  ptr += big_endian_ieee_single::wire_size;
}

ieee_float* float_array = reinterpret_cast<ieee_float*>(raw_array);

Note that the static_assert is not guaranteed to succeed, but it's much more likely to be true than if I was using "float" in place of the ieee_float typedef. If I wanted my code to be completely portable, I'd need to allocate the raw and deserialized arrays separately, but I could probably use SFINAE tricks to do this only when sizeof(ieee_float) != big_endian_ieee_single::wire_size.

As for whether users ever need to know the host byte order: the major case I'm aware of is, as David mentioned, when you want to translate between two formats. Notably, even in this case you don't actually care about the native byte order; you just want to efficiently convert one well-specified format to another. We could extend the API with something like

template <typename SourcePolicy, typename SinkPolicy>
void convert_byte_format(const char* source, char* sink);

which the implementation can specialize appropriately (an interesting but solvable challenge would be to extend this to support user-defined policies). However, that may be over-engineering; this use case seems too marginal to warrant much standardization effort, and a separate API may not be needed: you can just code this in terms of the to-host/from-host API

char* source;
char* sink;
// Initialize source and sink
while (have_more_input) {
  typename SourcePolicy::host_type tmp = SourcePolicy::read(source);
  SinkPolicy::write(tmp, sink);
  // increment source and sink
}

and rely on the optimizer to generate efficient code. I'd expect a decent optimizer to be able to turn this into a minimal number of byte-swaps and copies, and to completely eliminate the loop when the source and sink have the same endianness.
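Here is a runnable version of that loop, offered as a sketch. `le_u32` and `be_u32` are hypothetical policies for 32-bit unsigned integers, the buffers are `unsigned char`, and a `count` parameter stands in for `have_more_input`. Each value passes through the host type, and nothing queries the host byte order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical policy: 32-bit little-endian unsigned integer.
struct le_u32 {
  using host_type = std::uint32_t;
  static constexpr std::size_t wire_size = 4;
  static host_type read(const unsigned char* p) {
    return host_type(p[0]) | host_type(p[1]) << 8 | host_type(p[2]) << 16 | host_type(p[3]) << 24;
  }
  static void write(host_type v, unsigned char* p) {
    for (std::size_t i = 0; i < wire_size; ++i) p[i] = static_cast<unsigned char>(v >> (8 * i));
  }
};

// Hypothetical policy: 32-bit big-endian unsigned integer.
struct be_u32 {
  using host_type = std::uint32_t;
  static constexpr std::size_t wire_size = 4;
  static host_type read(const unsigned char* p) {
    return host_type(p[0]) << 24 | host_type(p[1]) << 16 | host_type(p[2]) << 8 | host_type(p[3]);
  }
  static void write(host_type v, unsigned char* p) {
    for (std::size_t i = 0; i < wire_size; ++i)
      p[i] = static_cast<unsigned char>(v >> (8 * (wire_size - 1 - i)));
  }
};

// Convert `count` values from SourcePolicy's wire format to SinkPolicy's.
template <typename SourcePolicy, typename SinkPolicy>
void convert_byte_format(const unsigned char* source, unsigned char* sink, std::size_t count) {
  for (std::size_t i = 0; i < count; ++i) {
    typename SourcePolicy::host_type tmp = SourcePolicy::read(source);
    SinkPolicy::write(tmp, sink);
    source += SourcePolicy::wire_size;
    sink += SinkPolicy::wire_size;
  }
}
```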

In case it wasn't clear, I don't support exposing the host byte order in a standard API: It would be annoyingly hard to standardize such an API in the face of mixed endianness, dynamic endianness, and other complicating factors, and the result would likely be an "attractive nuisance" rather than a genuinely useful utility. The people who really need it are probably already living in the land of implementation-defined behavior, so they can just use vendor extensions. 

Beman Dawes

Oct 9, 2013, 5:32:18 PM10/9/13
to std-proposals
On Tue, Oct 1, 2013 at 11:59 PM, Ville Voutilainen <ville.vo...@gmail.com> wrote:



On 2 October 2013 06:56, <fmatth...@gmail.com> wrote:
This proposal should be generalized to a more generic byte swapping solution. Something like:

We know. :)
 
This is so easy and trivial to implement. Most compilers even have builtins for all of the byte swapping routines. I don't know why it's not standardized.



It's not standardized because networking didn't need the generic solution and I don't recall
seeing a proposal for the generic one.

With help from a lot of others, I've had a generic Endian library accepted by Boost, subject to dealing with a number of concerns. Those concerns have now been dealt with, so I'm hoping the library will go into Boost trunk in the next few weeks and ship with Boost release 1.56.  The library handles both endian conversions and endian types, and would presumably be suitable for a standard library TS proposal.

See github.com/beman/endian

--Beman

fmatth...@gmail.com

Oct 20, 2013, 8:27:36 PM10/20/13
to std-pr...@isocpp.org, bda...@acm.org
I'm going to put this on hold for now and focus on my other bitwise operators proposal (which may end up containing generic byte reversal routines). We can revisit the idea later. 
Perhaps the best way would be to use Beman's library and build from there.