
On endianness, many #ifdefs, and the need thereof


Daniel

Oct 2, 2019, 7:00:51 AM
Most C++ software that needs to know whether the host is big endian or little
endian comes with a header file with a very long sequence of #ifdef's, along
the lines of

# if (defined(__BYTE_ORDER__) && defined(__ORDER_BIG_ENDIAN__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__) || \
(defined(__BYTE_ORDER) && defined(__BIG_ENDIAN) && __BYTE_ORDER == __BIG_ENDIAN) || \
(defined(BYTE_ORDER) && defined(BIG_ENDIAN) && BYTE_ORDER == BIG_ENDIAN) || \
(defined(_BIG_ENDIAN) && !defined(_LITTLE_ENDIAN)) || (defined(__BIG_ENDIAN__) && !defined(__LITTLE_ENDIAN__)) || \
defined(__ARMEB__) || defined(__MIPSEB__) || defined(__s390__) || defined(__sparc__)

On the other hand, at least one library merely has

static constexpr bool little_endianess(int num = 1) noexcept
{
    return *reinterpret_cast<char*>(&num) == 1;
}

referencing a contribution on stackoverflow, http://stackoverflow.com/a/1001328/266378.

So why the many #ifdef's, if this is enough? Or is it enough?

Thanks,
Daniel

David Brown

Oct 2, 2019, 7:26:44 AM
This function cannot be evaluated as a constant expression, despite
the "constexpr" qualifier. So you can't use it in places where you
actually need a constant expression (size of an array, template
parameter, etc.).

It will still be good enough in some contexts, especially as compilers
can optimise knowing the result of the function.

For constant expression endianness detection, conditional compilation is
your main choice (the complexity depends on the portability you need),
with pre-build configuration being an alternative, until C++20
standardises endianness.



Öö Tiib

Oct 2, 2019, 8:42:35 AM
On Wednesday, 2 October 2019 14:00:51 UTC+3, Daniel wrote:

> On the other hand, at least one library merely has
>
> static constexpr bool little_endianess(int num = 1) noexcept
> {
> return *reinterpret_cast<char*>(&num) == 1;
> }
>
> referencing a contribution on stackoverflow, http://stackoverflow.com/a/1001328/266378.
>
> So why the many #ifdef's, if this is enough? Or is it enough?

The "constexpr" qualifier on a function means something like
"pure function". It may still be impossible to evaluate it at
compile time.

The result of a reinterpret_cast is not a constant expression, so your
function does not work at compile time. Demo with an online compiler:
<http://coliru.stacked-crooked.com/a/0801518db168d629>


Juha Nieminen

Oct 2, 2019, 8:53:51 AM
Daniel <daniel...@gmail.com> wrote:
> On the other hand, at least one library merely has
>
> static constexpr bool little_endianess(int num = 1) noexcept
> {
> return *reinterpret_cast<char*>(&num) == 1;
> }
>
> referencing a contribution on stackoverflow, http://stackoverflow.com/a/1001328/266378.
>
> So why the many #ifdef's, if this is enough? Or is it enough?

Endianess cannot be resolved at compile time, no matter what you do, not
even with modern C++. (Believe me, I have tried. It's not possible.)
The only way to know endianess at compile time is if an external tool
(such as some kind of configure script) provides you with a macro or
constant that says so.

I haven't tried, but I believe the above example will not be evaluated
at compile time (assuming it compiles at all, which I don't think it does;
haven't tried, though).

There may be a technical reason why the compiler cannot provide
endianess information at compile time.

David Brown

Oct 2, 2019, 9:05:28 AM
The /compiler/ can handle it at compile time - it can figure out the
result of this function at compile time and use that in optimisation.
(Compilers also usually have pre-defined macros for endianness - but
they are not standardised.) The issue is that the language says this
can't be part of a constant expression.

I suspect the technical reason will be that constant expressions can't
access memory - they need to be independent of the state of the system.

What I don't really understand, is why endianness macros have never been
standardised. gcc has __BYTE_ORDER__, __ORDER_LITTLE_ENDIAN__,
__ORDER_BIG_ENDIAN__. Other compilers usually have similar macros. It
would not have taken a great deal of effort to make __STDC__ equivalents
part of the C and C++ standards.

With C++20, we will be able to write:

static constexpr bool little_endianess() noexcept
{
    return std::endian::native == std::endian::little;
}

and it will be a constant expression.
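
A sketch of how this could be used while C++20 support is still patchy: prefer std::endian when the library advertises it through the __cpp_lib_endian feature-test macro, and otherwise fall back to the gcc-style predefined macros mentioned above. The names host_is_little_endian, host_is_big_endian, ByteOrderTag and HostByteOrder are made up for this illustration, and the _WIN32 branch assumes that every Windows target of interest is little-endian.

```cpp
// Pull in <version> when it exists so the feature-test macro becomes visible.
#if defined(__has_include)
#  if __has_include(<version>)
#    include <version>
#  endif
#endif

#if defined(__cpp_lib_endian)
#  include <bit>
constexpr bool host_is_little_endian = (std::endian::native == std::endian::little);
constexpr bool host_is_big_endian    = (std::endian::native == std::endian::big);
#elif defined(__BYTE_ORDER__) && defined(__ORDER_LITTLE_ENDIAN__) && defined(__ORDER_BIG_ENDIAN__)
constexpr bool host_is_little_endian = (__BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__);
constexpr bool host_is_big_endian    = (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__);
#elif defined(_WIN32)
// Assumption: all Windows targets of interest are little-endian.
constexpr bool host_is_little_endian = true;
constexpr bool host_is_big_endian    = false;
#else
#  error "No known way to detect endianness with this compiler"
#endif

// The values are genuine constant expressions, so they can be used as
// template arguments and in static_assert.
template <bool LittleEndian>
struct ByteOrderTag {
    static constexpr const char* name() { return LittleEndian ? "little" : "big"; }
};

using HostByteOrder = ByteOrderTag<host_is_little_endian>;

static_assert(host_is_little_endian != host_is_big_endian,
              "mixed-endian hosts are not handled by this sketch");
```

The static_assert at the end makes the sketch refuse to build on a mixed-endian target rather than silently pick a wrong answer.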



Bonita Montero

Oct 2, 2019, 9:11:03 AM
I think it's best to have wrapper classes for endianness-specific data
types which are externally exchanged. Internally these might use,
depending on the compiler (selected per #ifdef), manual swapping
of the bytes or an appropriate intrinsic.
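
A minimal sketch of such a wrapper, using manual byte handling only (no #ifdef, no intrinsics), so it is one possible shape rather than necessarily what the post had in mind. The class name be_uint32 and its members are hypothetical.

```cpp
#include <cstdint>

// Holds a 32-bit unsigned value as four bytes in big-endian (network) order,
// so the object can be copied to or from an external byte stream as-is.
class be_uint32 {
public:
    be_uint32() : bytes_{0, 0, 0, 0} {}

    // Convert from a native value. Shifts operate on values, not on memory,
    // so the result does not depend on the host byte order.
    be_uint32(std::uint32_t v)
        : bytes_{static_cast<std::uint8_t>(v >> 24),
                 static_cast<std::uint8_t>(v >> 16),
                 static_cast<std::uint8_t>(v >> 8),
                 static_cast<std::uint8_t>(v)} {}

    // Convert back to a native value.
    operator std::uint32_t() const {
        return (static_cast<std::uint32_t>(bytes_[0]) << 24) |
               (static_cast<std::uint32_t>(bytes_[1]) << 16) |
               (static_cast<std::uint32_t>(bytes_[2]) << 8) |
                static_cast<std::uint32_t>(bytes_[3]);
    }

    // Raw access to the wire representation.
    const std::uint8_t* data() const { return bytes_; }

private:
    std::uint8_t bytes_[4];
};

static_assert(sizeof(be_uint32) == 4, "wrapper must not add padding");
```

A structure describing an external record could then hold be_uint32 members and be read and written without any byte-order test at the point of use.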

Scott Lurndal

Oct 2, 2019, 9:54:55 AM
David Brown <david...@hesbynett.no> writes:
>On 02/10/2019 14:53, Juha Nieminen wrote:
>> Daniel <daniel...@gmail.com> wrote:
>>> On the other hand, at least one library merely has
>>>
>>> static constexpr bool little_endianess(int num = 1) noexcept
>>> {
>>> return *reinterpret_cast<char*>(&num) == 1;
>>> }
>>>
>>> referencing a contribution on stackoverflow, http://stackoverflow.com/a/1001328/266378.
>>>
>>> So why the many #ifdef's, if this is enough? Or is it enough?
>>
>> Endianess cannot be resolved at compile time, no matter what you do, not
>> even with modern C++. (Believe me, I have tried. It's not possible.)
>> The only way to know endianess at compile time is if an external tool
>> (such as some kind of configure script) provides you with a macro or
>> constant that says so.
>>
>> I haven't tried, but I believe the above example will not be evaluated
>> at compile time (assuming it compiles at all, which I don't think it does;
>> haven't tried, though).
>>
>> There may be a technical reason why the compiler cannot provide
>> endianess information at compile time.
>>
>
>The /compiler/ can handle it at compile time - it can figure out the
>result of this function at compile time and use that in optimisation.

Can it? How does that work when you're cross-compiling to a different
architecture (like a microcontroller, for example)?

Daniel

Oct 2, 2019, 10:52:29 AM
On Wednesday, October 2, 2019 at 8:53:51 AM UTC-4, Juha Nieminen wrote:
> Daniel wrote:
> > On the other hand, at least one library merely has
> >
> > static constexpr bool little_endianess(int num = 1) noexcept
> > {
> > return *reinterpret_cast<char*>(&num) == 1;
> > }
> >
> > referencing a contribution on stackoverflow, http://stackoverflow.com/a/1001328/266378.
> >
>
> I haven't tried, but I believe the above example will not be evaluated
> at compile time (assuming it compiles at all, which I don't think it does;
>
The example is from the nlohmann json library, where I first saw it,
https://github.com/nlohmann/json/blob/develop/single_include/nlohmann/json.hpp#L5159,
and it does compile.

Daniel

Öö Tiib

Oct 2, 2019, 11:04:48 AM
On what compiler does that compile:

constexpr bool b = nlohmann::detail::little_endianess();

???

Daniel

Oct 2, 2019, 11:31:48 AM
That wouldn't :-) little_endianess() is a static member function of the
template class binary_reader. I meant that the function compiles in the
contexts in which it is used in nlohmann's library, it is used after all. But
the result can't be assigned to a constexpr bool.

Daniel



Barry Schwarz

Oct 2, 2019, 11:33:27 AM
Both approaches will allow the programmer to generate code for each
case as appropriate.

The macro approach will allow the compiler to generate code only for
the case that applies to that compilation (assuming appropriate uses
of the #if DIRECTIVE). This will reduce the size of the generated
code. Whether that is a significant consideration obviously depends
on the code.

The function approach will be evaluated at run time. The compiler
will generate code for both cases and the program will then execute
only the relevant code (assuming appropriate uses of the if
STATEMENT). The if will be evaluated whenever the code needs to
distinguish between the two cases, thus increasing the run time.
Whether this is a significant consideration obviously depends on the
code.

IMO, the macro approach is a bit more difficult to maintain if
portability is an issue since different compilers (or even updates to
the same compiler) may use different macro names that need to be
tested not to mention the nesting of && and || operations.

OTOH, the function approach will fail if sizeof (int) is 1 (32-bit
char).
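
A sketch of the run-time function approach that avoids both the reinterpret_cast and the sizeof (int) == 1 problem: it copies the object representation with std::memcpy and refuses to compile when the probe type occupies a single byte. The name host_is_little_endian_at_runtime is invented here, and the function deliberately makes no constexpr claim.

```cpp
#include <cstdint>
#include <cstring>

// Run-time check: look at the first byte of a known multi-byte value.
inline bool host_is_little_endian_at_runtime() noexcept
{
    const std::uint32_t probe = 1;
    static_assert(sizeof(probe) > 1,
                  "the probe must span more than one byte for the test to mean anything");

    unsigned char first_byte = 0;
    std::memcpy(&first_byte, &probe, 1);  // well-defined way to inspect the representation
    return first_byte == 1;               // least significant byte is stored first
}
```

Callers would write an ordinary if (host_is_little_endian_at_runtime()) { ... } else { ... }, accepting that both branches are compiled.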

--
Remove del for email

Daniel

Oct 2, 2019, 11:59:25 AM
On Wednesday, October 2, 2019 at 7:26:44 AM UTC-4, David Brown wrote:
> On 02/10/2019 13:00, Daniel wrote:
> >
> > On the other hand, at least one library merely has
> >
> > static constexpr bool little_endianess(int num = 1) noexcept
> > {
> > return *reinterpret_cast<char*>(&num) == 1;
> > }
> >
>
> This function is cannot be evaluated as a constant expression, despite
> the "constexpr" qualifier. So you can't use in in places where you
> actually need a constant expression (size of an array, template
> parameter, etc.).
>
> It will still be good enough in some contexts, especially as compilers
> can optimise knowing the result of the function.
>
> For constant expression endianness detection, conditional compilation is
> your main choice (the complexity depends on the portability you need),
> with pre-build configuration being an alternative, until C++20
> standardises endianness.

Thanks for the explanation, appreciated.

Daniel

Alf P. Steinbach

Oct 2, 2019, 1:36:12 PM
On 02.10.2019 13:00, Daniel wrote:
> Most C++ software that needs to know whether the host is big endian or little
> endian comes with a header file with a very long sequence of #ifdef's, along
> the lines of
>
> # if (defined(__BYTE_ORDER__) && defined(__ORDER_BIG_ENDIAN__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__) || \
> (defined(__BYTE_ORDER) && defined(__BIG_ENDIAN) && __BYTE_ORDER == __BIG_ENDIAN) || \
> (defined(BYTE_ORDER) && defined(BIG_ENDIAN) && BYTE_ORDER == BIG_ENDIAN) || \
> (defined(_BIG_ENDIAN) && !defined(_LITTLE_ENDIAN)) || (defined(__BIG_ENDIAN__) && !defined(__LITTLE_ENDIAN__)) || \
> defined(__ARMEB__) || defined(__MIPSEB__) || defined(__s390__) || defined(__sparc__)

C++20 will provide compile time endianness information.

<url: https://en.cppreference.com/w/cpp/types/endian> notes that one
possible implementation is

enum class endian
{
#ifdef _WIN32
    little = 0,
    big    = 1,
    native = little
#else
    little = __ORDER_LITTLE_ENDIAN__,
    big    = __ORDER_BIG_ENDIAN__,
    native = __BYTE_ORDER__
#endif
};

Evidently that's because the number of fully fledged C++ compilers today
is very limited, where each is either compatible with Visual C++ (the
first branch) or with g++ (the second branch).


> On the other hand, at least one library merely has
>
> static constexpr bool little_endianess(int num = 1) noexcept
> {
> return *reinterpret_cast<char*>(&num) == 1;
> }
>
> referencing a contribution on stackoverflow, http://stackoverflow.com/a/1001328/266378.

As noted else-thread this cannot be evaluated at compile time by a
conforming compiler.

As not noted there (as I remember it), since there's no way the body
/can/ be evaluated at compile time, the function definition is not valid
as declared `constexpr`, and should not compile at all.

Indeed, Visual C++ 2019 says

foo.cpp(1): error C3615: constexpr function 'little_endianess' cannot
result in a constant expression

and g++ 8.2.0 says

foo.cpp:3:17: error: a reinterpret_cast is not a constant expression
return *reinterpret_cast<char*>(&num) == 1;
^~~~~~~~~~~~~~~~~~~~~~~~~~~~

> So why the many #ifdef's, if this is enough? Or is it enough?

It's reasonable to assume that that macro is many years old.


- Alf

Oliver S.

Oct 2, 2019, 1:59:17 PM
> # if (defined(__BYTE_ORDER__) && defined(__ORDER_BIG_ENDIAN__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__) || \
> (defined(__BYTE_ORDER) && defined(__BIG_ENDIAN) && __BYTE_ORDER == __BIG_ENDIAN) || \
> (defined(BYTE_ORDER) && defined(BIG_ENDIAN) && BYTE_ORDER == BIG_ENDIAN) || \
> (defined(_BIG_ENDIAN) && !defined(_LITTLE_ENDIAN)) || (defined(__BIG_ENDIAN__) && !defined(__LITTLE_ENDIAN__)) || \
> defined(__ARMEB__) || defined(__MIPSEB__) || defined(__s390__) || defined(__sparc__)

obfuscated preprocessor contest ...

Keith Thompson

Oct 2, 2019, 3:54:36 PM
sc...@slp53.sl.home (Scott Lurndal) writes:
> David Brown <david...@hesbynett.no> writes:
[...]
>>The /compiler/ can handle it at compile time - it can figure out the
>>result of this function at compile time and use that in optimisation.
>
> Can it? How does that work when you're cross-compiling to a different
> architecture (like a microcontroller, for example)?

Presumably the compiler knows the endianness of the target architecture
it's generating code for. There might be an issue with the compiler not
having that information soon enough, but it could certainly get the
information if it's worthwhile.

--
Keith Thompson (The_Other_Keith) ks...@mib.org <http://www.ghoti.net/~kst>
Will write code for food.
void Void(void) { Void(); } /* The recursive call of the void */

Scott Lurndal

Oct 2, 2019, 4:16:53 PM
Keith Thompson <ks...@mib.org> writes:
>sc...@slp53.sl.home (Scott Lurndal) writes:
>> David Brown <david...@hesbynett.no> writes:
>[...]
>>>The /compiler/ can handle it at compile time - it can figure out the
>>>result of this function at compile time and use that in optimisation.
>>
>> Can it? How does that work when you're cross-compiling to a different
>> architecture (like a microcontroller, for example)?
>
>Presumably the compiler knows the endianness of the target architecture
>it's generating code for. There might be an issue with the compiler not
>having that information soon enough, but it could certainly get the
>information if it's worthwhile.

Each ring in ARM64 can be set to run big- or little-endian. In ARMv7,
the application itself can change the endianness dynamically (SETEND
instruction). Linux supports reading the endianness from the ELF
header on Arm64 and will configure the process state accordingly.

So, absent some indication to the compiler on the command line
which endianness is desired, there doesn't seem to be a way for the
compiler to figure it out itself.

There is absolutely nothing wrong with using the pre-processor and
implementation defined (or specified by the programmer with -D)
macros to determine which endianness is being used.

We write a lot of code that is required to run in both big- and
little-endian environments, and one of the larger issues with
portability is related to bitfields. All of our structures with
bitfields have declarations similar to:

struct INTCTLR_CMD_CLEAR_s {
#if __BYTE_ORDER == __BIG_ENDIAN
    uint64_t dev_id : 32;            /**< [ 63: 32] Interrupt device ID. */
    uint64_t reserved_8_31 : 24;
    uint64_t cmd_type : 8;           /**< [ 7: 0] Command type. Indicates GITS_CMD_TYPE_E::CMD_CLEAR. */
#else
    uint64_t cmd_type : 8;
    uint64_t reserved_8_31 : 24;
    uint64_t dev_id : 32;
#endif
#if __BYTE_ORDER == __BIG_ENDIAN
    uint64_t reserved_96_127 : 32;
    uint64_t int_id : 32;            /**< [ 95: 64] Interrupt ID to be translated. */
#else
    uint64_t int_id : 32;
    uint64_t reserved_96_127 : 32;
#endif
    uint64_t reserved_128_191 : 64;
    uint64_t reserved_192_255 : 64;
} s;
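
For comparison, a sketch of the first 64-bit word of that command handled with shifts and masks instead of bitfields, which sidesteps the compiler's bitfield layout altogether. This is a different technique from the one in the post and is shown only as an illustration. The field positions are taken from the comments in the struct above (cmd_type in bits 7:0, dev_id in bits 63:32); the constant and function names are hypothetical. The word itself still has to be stored to or loaded from the device in the byte order the hardware expects.

```cpp
#include <cstdint>

// Field positions within the first 64-bit word of the command,
// taken from the bit-range comments in the struct above.
constexpr unsigned      kCmdTypeShift = 0;
constexpr std::uint64_t kCmdTypeMask  = 0xFFull;
constexpr unsigned      kDevIdShift   = 32;
constexpr std::uint64_t kDevIdMask    = 0xFFFFFFFFull;

// Build word 0 from its fields; reserved bits 31:8 stay zero.
constexpr std::uint64_t make_cmd_word0(std::uint8_t cmd_type, std::uint32_t dev_id)
{
    return ((static_cast<std::uint64_t>(cmd_type) & kCmdTypeMask) << kCmdTypeShift) |
           ((static_cast<std::uint64_t>(dev_id)   & kDevIdMask)   << kDevIdShift);
}

// Extract the command type (bits 7:0).
constexpr std::uint8_t cmd_type_of(std::uint64_t word0)
{
    return static_cast<std::uint8_t>((word0 >> kCmdTypeShift) & kCmdTypeMask);
}

// Extract the device ID (bits 63:32).
constexpr std::uint32_t dev_id_of(std::uint64_t word0)
{
    return static_cast<std::uint32_t>((word0 >> kDevIdShift) & kDevIdMask);
}

static_assert(make_cmd_word0(0x04, 0x12345678u) == 0x1234567800000004ull, "layout check");
```

The trade-off is more typing per field in exchange for a single declaration that reads the same on both byte orders.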

Juha Nieminen

Oct 2, 2019, 5:32:38 PM
Daniel <daniel...@gmail.com> wrote:
>> > > > static constexpr bool little_endianess(int num = 1) noexcept
>> > > > {
>> > > > return *reinterpret_cast<char*>(&num) == 1;
>> > > > }

>> > The example is from the nlohmann json library, where I first saw it,
>> > https://github.com/nlohmann/json/blob/develop/single_include/nlohmann/json.hpp#L5159,
>> > and it does compile.
>>
>> On what compiler does that compile:
>>
>> constexpr bool b = nlohmann:detail::little_endianess();
>>
>> ???
>
> That wouldn't :-) little_endianess() is a static member function of the
> template class binary_reader. I meant that the function compiles in the
> contexts in which it is used in nlohmann's library, it is used after all. But
> the result can't be assigned to a constexpr bool.

I find it surprising that it will compile with that 'constexpr'
modifier. I have always had the understanding that a constexpr
function can only contain code that could at least in theory be
evaluated at compile time (even if it actually isn't). At the very
least the potential must be there.

Taking the address of a variable and dereferencing it doesn't
sound like something that should be doable at compile time,
and therefore it sounds like it wouldn't be allowed in a
constexpr function (in the same way as it isn't allowed
eg. as a non-type template parameter).

Daniel

Oct 2, 2019, 6:39:52 PM
On Wednesday, October 2, 2019 at 1:36:12 PM UTC-4, Alf P. Steinbach wrote:
> On 02.10.2019 13:00, Daniel wrote:
>
> > On the other hand, at least one library merely has
> >
> > static constexpr bool little_endianess(int num = 1) noexcept
> > {
> > return *reinterpret_cast<char*>(&num) == 1;
> > }
> >
>
> As noted else-thread this cannot be evaluated at compile time by a
> conforming compiler.
>
> As not noted there (as I remember it), since there's no way the body
> /can/ be evaluated at compile time, the function definition is not valid
> as declared `constexpr`, and should not compile at all.
>
> Indeed, Visual C++ 2019 says
>
> foo.cpp(1): error C3615: constexpr function 'little_endianess' cannot
> result in a constant expression
>
> and g++ 8.2.0 says
>
> foo.cpp:3:17: error: a reinterpret_cast is not a constant expression
> return *reinterpret_cast<char*>(&num) == 1;
> ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
A curiosity is that

const bool val = nlohmann::detail::binary_reader<nlohmann::json>::little_endianess();

compiles in vs2019 version 16.2.5, where little_endianess() is as shown above.
I'm assuming it must also compile with clang and g++ on linux and OSX, as it's
exercised in continuous integration builds. It fails in vs2019 version 16.2.5
with the error message Alf reported when extracted from the class. More work
for the compiler vendors!

Daniel

David Brown

Oct 3, 2019, 3:48:18 AM
Try this on <https://godbolt.org>


static constexpr bool little_endianess(int num = 1) noexcept
{
    return *reinterpret_cast<char*>(&num) == 1;
}

bool is_little_endian() {
    return little_endianess();
}



You'll see gcc generate code such as:

is_little_endian():
movl $1, %eax
ret


This is, of course, an optimisation issue - not all compilers can handle
it, and you will need to enable optimisation. (And clang throws a
wobbly when you have a "reinterpret_cast" in a "constexpr" function.
You can just remove the "constexpr", since it doesn't help anyway.)


Compilers know their targets, and they know if they are targeting
little-endian or big-endian (or mixed-endian). They know that, even if
the cpu in question supports both endiannesses - binaries are generated
for only one endianness at a time.

David Brown

Oct 3, 2019, 3:54:47 AM
That is the /logical/ behaviour. In practice, a good compiler can
evaluate the function at compile time (even though it is not a constant
expression), and use this along with dead code elimination to remove the
run-time check and code that cannot be called.

(Since it logically has the test, the unused branch of code still
has to be valid and compilable code, unlike when you use conditional
compilation with the preprocessor.)

>
> IMO, the macro approach is a bit more difficult to maintain if
> portability is an issue since different compilers (or even updates to
> the same compiler) may use different macro names that need to be
> tested not to mention the nesting of && and || operations.
>
> OTOH, the function approach will fail if sizeof (int) is 1 (32-bit
> char).
>

You can happily combine the approaches:

# if (defined(__BYTE_ORDER__) && defined(__ORDER_BIG_ENDIAN__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__) || \
(defined(__BYTE_ORDER) && defined(__BIG_ENDIAN) && __BYTE_ORDER == __BIG_ENDIAN) || \
(defined(BYTE_ORDER) && defined(BIG_ENDIAN) && BYTE_ORDER == BIG_ENDIAN) || \
(defined(_BIG_ENDIAN) && !defined(_LITTLE_ENDIAN)) || (defined(__BIG_ENDIAN__) && !defined(__LITTLE_ENDIAN__)) || \
defined(__ARMEB__) || defined(__MIPSEB__) || defined(__s390__) || defined(__sparc__)

constexpr bool is_big_endian = true;
constexpr bool is_little_endian = false;
#define IS_BIG_ENDIAN 1
#define IS_LITTLE_ENDIAN 0

#else

constexpr bool is_big_endian = false;
constexpr bool is_little_endian = true;
#define IS_BIG_ENDIAN 0
#define IS_LITTLE_ENDIAN 1

#endif


Then in the rest of your code, you can use conditional compilation on
"#if IS_BIG_ENDIAN", or use "if (is_big_endian)", or even "if
constexpr(is_big_endian)" for newer C++.

Of course, you'll still be in trouble on a PDP, but you can't have
everything!
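
A short sketch of how such constants might be consumed, assuming a gcc- or clang-style __BYTE_ORDER__ macro in place of the long condition (other compilers would need the full test above). The helpers host_to_be16 and host_to_be16_pp are hypothetical. With a plain "if", both branches must compile even though only one is expected to survive optimisation.

```cpp
#include <cstdint>

#if defined(__BYTE_ORDER__) && defined(__ORDER_BIG_ENDIAN__) && \
    (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
constexpr bool is_big_endian = true;
#define IS_BIG_ENDIAN 1
#else
// Assumption for this sketch: anything not reported as big-endian is little-endian.
constexpr bool is_big_endian = false;
#define IS_BIG_ENDIAN 0
#endif

// Variant 1: ordinary "if" on the constant. Both branches are compiled,
// and the optimiser is expected to drop the one that can never run.
inline std::uint16_t host_to_be16(std::uint16_t v)
{
    if (is_big_endian) {
        return v;                                            // already big-endian
    }
    return static_cast<std::uint16_t>((v << 8) | (v >> 8));  // swap the two bytes
}

// Variant 2: conditional compilation. The compiler only ever sees one body.
inline std::uint16_t host_to_be16_pp(std::uint16_t v)
{
#if IS_BIG_ENDIAN
    return v;
#else
    return static_cast<std::uint16_t>((v << 8) | (v >> 8));
#endif
}
```

Both variants should produce the same bytes in memory; the difference is only in what the compiler is asked to parse.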

Jorgen Grahn

Oct 3, 2019, 7:34:06 AM
On Wed, 2019-10-02, Daniel wrote:
> On Wednesday, October 2, 2019 at 8:53:51 AM UTC-4, Juha Nieminen wrote:
>> Daniel wrote:
>> > On the other hand, at least one library merely has
>> >
>> > static constexpr bool little_endianess(int num = 1) noexcept
>> > {
>> > return *reinterpret_cast<char*>(&num) == 1;
>> > }
>> >
>> > referencing a contribution on stackoverflow,
>> > http://stackoverflow.com/a/1001328/266378.
>> >
>>
>> I haven't tried, but I believe the above example will not be evaluated
>> at compile time (assuming it compiles at all, which I don't think it does;
>>
> The example is from the nlohmann json library, where I first saw it,
> https://github.com/nlohmann/json/ [...]

Weird and slightly worrying that a JSON library would have to be
endianness-aware ...

/Jorgen,
too lazy to check the source code

--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .

David Brown

Oct 3, 2019, 7:46:13 AM
A little look suggests that it is for parsing binary JSON formats (like
UBJSON), rather than standard text-based JSON. But it is still not
necessary, and I doubt if the code's algorithm is more efficient than
the obvious endian independent method.

Paavo Helde

Oct 3, 2019, 7:46:23 AM
On 3.10.2019 14:33, Jorgen Grahn wrote:
> Weird and slightly worrying that a JSON library would have to be
> endianness-aware ...

A comment from that library:

@note This function needs to respect the system's endianess, because bytes
in CBOR, MessagePack, and UBJSON are stored in network order (big
endian) and therefore need reordering on little endian systems.
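
The reordering described in that comment can be sketched as follows. This is not the library's code, just an illustration of the idea under the assumption that the host is either little- or big-endian; from_network_order and host_is_little_endian are hypothetical names.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Run-time probe of the host byte order (little vs. big only).
inline bool host_is_little_endian()
{
    const std::uint16_t probe = 1;
    unsigned char first = 0;
    std::memcpy(&first, &probe, 1);
    return first == 1;
}

// Read a number of type T from sizeof(T) bytes stored in network
// (big-endian) order, reversing the bytes when the host is little-endian.
template <typename T>
T from_network_order(const unsigned char* wire)
{
    static_assert(std::is_arithmetic<T>::value, "T must be a numeric type");

    unsigned char tmp[sizeof(T)];
    std::memcpy(tmp, wire, sizeof(T));
    if (host_is_little_endian()) {
        std::reverse(tmp, tmp + sizeof(T));  // big-endian -> little-endian
    }

    T value;
    std::memcpy(&value, tmp, sizeof(T));     // copy the bytes into the object
    return value;
}
```

This is the endianness-aware style: the code inspects the host and rearranges memory, in contrast to approaches that never look at the host byte order at all.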


Daniel

Oct 3, 2019, 8:40:43 AM
On Thursday, October 3, 2019 at 7:34:06 AM UTC-4, Jorgen Grahn wrote:
>
> Weird and slightly worrying that a JSON library would have to be
> endianness-aware ...
>
A strictly JSON library need not be, but a number of JSON libraries include
support for JSON-like binary formats, and nlohmann includes limited support for
CBOR, Message Pack, UBJSON and BSON.

Daniel

Daniel

Oct 3, 2019, 8:56:13 AM
On Thursday, October 3, 2019 at 7:46:13 AM UTC-4, David Brown wrote:
>
> A little look suggests that it is for parsing binary JSON formats (like
> UBJSON), rather than standard text-based JSON. But is still not
> necessary, and I doubt if the code's algorithm is more efficient than
> the obvious endian independent method.

An "obvious endian independent method"? For decoding and encoding CBOR,
Message Pack, UBJSON, and BSON into native types? What are you referring to?

CBOR encodes 8, 16, 32, and 64 bit signed and unsigned ints as big endian, and
16, 32, and 64 bit floats as big endian. "CBOR for typed arrays" accommodates
typed arrays of 8, 16, 32, and 64 bit signed and unsigned ints, and 16, 32,
64, 128 bit floating point numbers, in either little endian or big endian.
Could you explain the "obvious endian independent method" that will encode and
decode that?

Thanks,
Daniel

Scott Lurndal

Oct 3, 2019, 9:17:48 AM
If the incoming big-endian data is treated as a sequence of bytes, you
can easily reconstruct a binary value simply shifting the data into the
correct sized container a byte at a time.

uint64_t value = 0ul;
char *bp = &buffer[first_byte_of_uint64_big_endian_data];

for (size_t i = 0; i < sizeof(uint64_t); i++) {
    value <<= 8;
    value |= *bp++;
}

This works regardless of the host endianness.
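
The same byte-at-a-time idea extends to the big-endian floating-point values mentioned earlier in the thread: assemble the bits into an unsigned integer, then copy them into the float. A sketch, assuming float is a 32-bit IEEE 754 type whose byte order matches that of the host's integers (the first is checked in the code, the second is only stated); load_be_float32 is a hypothetical name, and the input is taken as unsigned bytes.

```cpp
#include <cstdint>
#include <cstring>
#include <limits>

static_assert(sizeof(float) == sizeof(std::uint32_t) &&
              std::numeric_limits<float>::is_iec559,
              "this sketch assumes a 32-bit IEEE 754 float");

// Decode a big-endian IEEE 754 binary32 value from four bytes.
inline float load_be_float32(const std::uint8_t* p)
{
    // Step 1: build the 32-bit pattern with shifts (independent of host order).
    std::uint32_t bits = 0;
    for (int i = 0; i < 4; ++i) {
        bits = (bits << 8) | p[i];
    }

    // Step 2: copy the pattern into a float.
    // Assumption: floats and integers share the same byte order on the host.
    float value;
    std::memcpy(&value, &bits, sizeof value);
    return value;
}
```

For example, the bytes 3F 80 00 00 decode to 1.0f under these assumptions.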

David Brown

Oct 3, 2019, 10:03:52 AM
You want "uint8_t" (or, for even more portability, uint_least8_t)
instead of "char" for your pointer. Unpleasant things will happen if
"char" is signed.

Apart from that, that is the "obvious" method I meant.

Some compilers are smart enough to optimise this kind of code into a
single load (if the cpu supports unaligned accesses) and a byte-swap (if
the cpu is little-endian). Clang manages it for 16-bit and 32-bit
versions of this code, but not the 64-bit version. gcc won't do it at
all in this case (I've seen it do such optimisation with other code).
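
With the uint8_t correction applied, the loop can be wrapped as a pair of helpers. A sketch; the names load_be64 and store_be64 are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

// Assemble a 64-bit value from eight bytes stored most significant byte first.
inline std::uint64_t load_be64(const std::uint8_t* bp)
{
    std::uint64_t value = 0;
    for (std::size_t i = 0; i < sizeof(std::uint64_t); ++i) {
        value <<= 8;
        value |= bp[i];  // unsigned byte: no sign extension into the upper bits
    }
    return value;
}

// The reverse direction: emit the most significant byte first.
inline void store_be64(std::uint64_t value, std::uint8_t* out)
{
    for (std::size_t i = 0; i < sizeof(std::uint64_t); ++i) {
        out[i] = static_cast<std::uint8_t>(value >> (8 * (sizeof(std::uint64_t) - 1 - i)));
    }
}
```

With a signed char, a byte such as 0xFF would be converted from -1 and the |= would set every higher bit of the result, which is the kind of unpleasantness the unsigned type avoids.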

Daniel

Oct 3, 2019, 10:14:09 AM
On Thursday, October 3, 2019 at 10:03:52 AM UTC-4, David Brown wrote:
> On 03/10/2019 15:17, Scott Lurndal wrote:
> >
> > If the incoming big-endian data is treated as a sequence of bytes, you
> > can easily reconstruct a binary value simply shifting the data into the
> > correct sized container a byte at a time.
> >
> > uint64_t value = 0ul;
> > char *bp = &buffer[first_byte_of_uint64_big_endian_data];
> >
> > for (size_t i = 0; i < sizeof(uint64_t); i++) {
> > value <<= 8;
> > value |= *bp++;
> > }
> >
> > This works regardless of the host endianness.
> >
>
> You want "uint8_t" (or, for even more portability, uint_least8_t)
> instead of "char" for your pointer. Unpleasant things will happen if
> "char" is signed.
>
> Apart from that, that is the "obvious" method I meant.
>
Thanks, Scott and David, obvious, perhaps, but I didn't know about it.

Daniel

Jorgen Grahn

Oct 3, 2019, 2:51:14 PM
On Thu, 2019-10-03, David Brown wrote:
> On 03/10/2019 13:33, Jorgen Grahn wrote:
>> On Wed, 2019-10-02, Daniel wrote:
...
>>> The example is from the nlohmann json library, where I first saw it,
>>> https://github.com/nlohmann/json/ [...]
>>
>> Weird and slightly worrying that a JSON library would have to be
>> endianness-aware ...
>>
>> /Jorgen,
>> too lazy to check the source code
>
> A little look suggests that it is for parsing binary JSON formats (like
> UBJSON),

I can't help misreading that as "Undefined Behaviour JSON" :-)

Thanks. I'm not into newish file formats, and didn't know things like
this existed. (Maybe ASN.1 wasn't such a bad idea after all?)

/Jorgen

Keith Thompson

Oct 3, 2019, 5:28:16 PM
David Brown <david...@hesbynett.no> writes:
[...]
> A little look suggests that it is for parsing binary JSON formats (like
> UBJSON), rather than standard text-based JSON. But is still not
> necessary, and I doubt if the code's algorithm is more efficient than
> the obvious endian independent method.

FYI, my browser shows me a "This site may be hacked." warning for
http://ubjson.org/
I haven't investigated further.

David Brown

Oct 4, 2019, 4:11:27 AM
On 03/10/2019 23:28, Keith Thompson wrote:
> David Brown <david...@hesbynett.no> writes:
> [...]
>> A little look suggests that it is for parsing binary JSON formats (like
>> UBJSON), rather than standard text-based JSON. But is still not
>> necessary, and I doubt if the code's algorithm is more efficient than
>> the obvious endian independent method.
>
> FYI, my browser shows me a "This site may be hacked." warning for
> http://ubjson.org/
> I haven't investigated further.
>

I think that's just the modern (and IMHO ridiculous and
counter-productive) trend for browsers to get their knickers in a twist
over "http" sites - as though using "https" automatically made a site
safe and secure.

It certainly doesn't look hacked to me. Of course, it might have been a
temporary issue, or maybe you are the victim of a man-in-the-middle
attack, and https really would have been useful!

If you are remotely interested in UBJSON itself, Wikipedia is a safe bet:

<https://en.wikipedia.org/wiki/UBJSON>

It looks like an interesting format, but it bugs me a little that they
distinguish between int8 and uint8, but do not have unsigned versions of
other integer sizes.

Daniel

Oct 4, 2019, 12:00:46 PM
On Friday, October 4, 2019 at 4:11:27 AM UTC-4, David Brown wrote:
>
> <https://en.wikipedia.org/wiki/UBJSON>
>
> It [UBJSON] looks like an interesting format, but it bugs me a little that
> they distinguish between int8 and uint8, but do not have unsigned versions
> of other integer sizes.

Compared to other JSON like binary formats, UBJSON has one interesting
feature, strongly typed arrays and objects,
see http://ubjson.org/type-reference/container-types/#optimized-format-example-array.
This allows the value type tags for items in an array to be omitted, when the
value type is the same. In other respects, though, UBJSON falls short of other
JSON like binary formats such as Message Pack and CBOR, both in terms of
support for types and for compactness. CBOR, in particular, has all the
momentum, being defined in an Internet Standards Document, RFC 7049,
https://tools.ietf.org/html/rfc7049, and now widely supported in tooling
across different programming languages. CBOR supports extensible tagging of
byte sequences, and RFC's are emerging that take advantage of that, e.g. CBOR
Tags for Typed Arrays,
https://tools.ietf.org/html/draft-ietf-cbor-array-tags-07.

Daniel
https://github.com/danielaparker/jsoncons