Compress on Big Endian, decompress on Little Endian

Nick Gildea

Mar 14, 2012, 8:33:03 AM

to LZ4c

Hi,

I'm currently attempting to use LZ4 to compress data recorded on an
Android device (big endian) and decompress the data on an x86 machine
(little endian).

From reading the source code (lz4.c) it seems that the code is set up
to use a single endian-ness, based on the LZ4_READ_LITTLEENDIAN_16 and
WRITE macros. Is there a correct way of handling this deployment
scenario?

I initially used an unmodified lz4.c and found that my decompression
would always fail fairly early on in the process, judging by the
return values from LZ4_uncompress(). After looking at the code, I
guessed that the problem was an endian issue: the Android code would
create a big endian archive (as LZ4_READ_LITTLEENDIAN_16 would resolve
to the big endian version when built for Android) while when built on
the x86 machine the macro would resolve to the little endian version.
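
For illustration, here is a minimal sketch of how a 16-bit value can be stored so that the byte order in the archive does not depend on the CPU that wrote it. It uses byte accesses only, so no endianness detection is needed. The names write_le16 and read_le16 are hypothetical and are not taken from lz4.c.

```c
#include <stdint.h>

/* Hypothetical helpers, not part of lz4.c: store and load a 16-bit value
   in little-endian byte order using byte accesses only, so the result is
   the same whatever the host CPU's endianness. */
static void write_le16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v & 0xFF);        /* low byte first */
    p[1] = (uint8_t)((v >> 8) & 0xFF); /* high byte second */
}

static uint16_t read_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | ((uint16_t)p[1] << 8));
}
```

If the writer and the reader both go through helpers like these, a mismatch of the kind described above cannot occur, at the cost of two byte accesses instead of one 16-bit access.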

I tried to resolve this by commenting out the #if case and only
supplying the little endian version of the macros. This allowed me to
decompress the archives, but there seem to be some problems with the
data. (I.e. each of the packets is numbered prior to compression, but
the decompressed packets had some inconsistencies in the numbering,
e.g. I would get IDs { 0, 0, 2, 3, ... } instead of { 0, 1, 2,
3, ... }).

Is this usage possible with the default code? Can I correctly modify
lz4.c to support this setup?

Thanks,

Nick.

Nick Gildea

Mar 14, 2012, 8:39:50 AM

to LZ4c

I've just had another look at lz4.c, and tried a different change.
I've commented out the #define LZ4_BIG_ENDIAN 1 block, hopefully forcing
data to always be organised as little endian.

This *appears* to be working. Does this seem like a reasonable thing
to do?

Yann Collet

Mar 14, 2012, 9:02:20 AM

to lz...@googlegroups.com

Hi Nick

If you know, beforehand, that your CPU is using Little Endian convention,
then yes, you are right : by commenting out this define, you ensure LZ4 code will consider the CPU as being Little Endian, regardless of the detection macro.

What's strange, however, is that you seem certain that your data producer is Big Endian. If it were, the "Big Endian" code variant would be necessary to preserve compatibility with a Little Endian target.
That does not seem to be what happens here: you need the "Little Endian" path to preserve compatibility.

Some CPUs that are "Big Endian by default" are in fact endian-selectable (also called Bi-Endian, see http://en.wikipedia.org/wiki/Endianness#Bi-endian_hardware): they can also run a program in Little Endian mode, depending on some parameters provided by the system. Maybe that's what happens on your Android device?

The case you raise seems to point towards a false positive in the automatic Big Endian detection routine. CPU feature detection is still, to this date, an empirical process, which can be flawed in unforeseen environments.
Specifically, endianness is detected at compile time within LZ4, and is therefore fixed into the binary. This is a problem if the target system can "select its endianness" at will during runtime.

Dynamic detection is implemented within lz4demo. It works fine, since this routine has much less impact on global performance.
It's possible to introduce the same dynamic model into lz4, but that will cost a bit of performance, due to extra checks within critical loops.
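
As a rough sketch of what a runtime check could look like (this is not the lz4demo routine; host_is_little_endian and load_le16_dynamic are hypothetical names chosen for illustration), the program inspects the first byte of a known integer and swaps bytes only when needed:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns 1 on a little-endian host, 0 otherwise. */
static int host_is_little_endian(void)
{
    const uint32_t probe = 1;
    uint8_t first;
    memcpy(&first, &probe, 1);   /* inspect the lowest-addressed byte */
    return first == 1;
}

/* Hypothetical helper: loads a 16-bit little-endian value from src,
   swapping bytes at runtime when the host is not little-endian. */
static uint16_t load_le16_dynamic(const void *src)
{
    uint16_t v;
    memcpy(&v, src, sizeof v);
    if (!host_is_little_endian())
        v = (uint16_t)((v >> 8) | (v << 8));
    return v;
}
```

The branch inside load_le16_dynamic is the kind of extra check that would end up in the critical loops, which is where the performance cost mentioned above would come from.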

Your comments are always welcome.

Regards

Yann

Nick Gildea

Mar 14, 2012, 9:37:30 AM

to LZ4c

Hi Yann,

Ahh, that makes sense. I had read that ARM CPUs were bi-endian and
that they worked in little endian mode by default. However, when the
macros in lz4.c picked up the environment as big endian, I assumed
that was correct! :)

If I have some time I'll try and figure out why the false positive
occurs, and hopefully then correct the condition for android.

Thanks again,

Nick.

Cyan

Mar 14, 2012, 10:21:07 AM

to LZ4c

The critical detection macro is:

#if (defined(__BIG_ENDIAN__) || defined(__BIG_ENDIAN) || defined(_BIG_ENDIAN) \
  || defined(_ARCH_PPC) || defined(__PPC__) || defined(__PPC) || defined(PPC) \
  || defined(__powerpc__) || defined(__powerpc) || defined(powerpc) \
  || (defined(__BYTE_ORDER__) && (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)))

Since you mention that your target CPU is ARM (which is Bi-Endian, and
Little Endian by default), would you mind checking:
Presence and value of: __BIG_ENDIAN__, __BIG_ENDIAN, _BIG_ENDIAN
Presence and value of: __BYTE_ORDER__, __ORDER_BIG_ENDIAN__
Presence and value of: __LITTLE_ENDIAN__, __LITTLE_ENDIAN, _LITTLE_ENDIAN

Possible outcome: one of the BIG_ENDIAN macros is defined, but its value
is 0
Possible outcome: BIG_ENDIAN and LITTLE_ENDIAN are both defined
simultaneously
Open question: is there a way to detect Bi-Endian situations?
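
One possible way to run this check is a small diagnostic built with the same compiler and flags as the Android target. The sketch below is an assumption about how to do it, not code from lz4.c; report_endian_macros, STR and STR_ are hypothetical names. It prints each macro that is defined together with its expansion.

```c
#include <stdio.h>

/* Two-level stringification so the macro's expansion is printed,
   not its name. */
#define STR_(x) #x
#define STR(x) STR_(x)

/* Hypothetical diagnostic, not part of lz4.c: prints each endianness
   macro that is defined, with its expansion, and returns how many of
   the eight macros checked were found. */
static int report_endian_macros(FILE *out)
{
    int found = 0;
#ifdef __BIG_ENDIAN__
    fprintf(out, "__BIG_ENDIAN__ = [%s]\n", STR(__BIG_ENDIAN__)); found++;
#endif
#ifdef __BIG_ENDIAN
    fprintf(out, "__BIG_ENDIAN = [%s]\n", STR(__BIG_ENDIAN)); found++;
#endif
#ifdef _BIG_ENDIAN
    fprintf(out, "_BIG_ENDIAN = [%s]\n", STR(_BIG_ENDIAN)); found++;
#endif
#ifdef __LITTLE_ENDIAN__
    fprintf(out, "__LITTLE_ENDIAN__ = [%s]\n", STR(__LITTLE_ENDIAN__)); found++;
#endif
#ifdef __LITTLE_ENDIAN
    fprintf(out, "__LITTLE_ENDIAN = [%s]\n", STR(__LITTLE_ENDIAN)); found++;
#endif
#ifdef _LITTLE_ENDIAN
    fprintf(out, "_LITTLE_ENDIAN = [%s]\n", STR(_LITTLE_ENDIAN)); found++;
#endif
#ifdef __BYTE_ORDER__
    fprintf(out, "__BYTE_ORDER__ = [%s]\n", STR(__BYTE_ORDER__)); found++;
#endif
#ifdef __ORDER_BIG_ENDIAN__
    fprintf(out, "__ORDER_BIG_ENDIAN__ = [%s]\n", STR(__ORDER_BIG_ENDIAN__)); found++;
#endif
    return found;
}
```

Calling report_endian_macros(stdout) from a test program on the device should show which of the outcomes listed above applies. If a BIG_ENDIAN and a LITTLE_ENDIAN macro both show up with different non-zero values, they are probably byte-order constants meant to be compared against another macro rather than flags, which would match the second outcome.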

Rgds

Yann

Dmitry Cherepanov

Mar 14, 2012, 4:43:36 PM

to lz...@googlegroups.com

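/* Treat the target as big endian only if the macro is defined AND non-zero. */
#if defined(__BIG_ENDIAN__)
  #if (__BIG_ENDIAN__ > 0)
    #ifndef MY_BIGENDIAN
      #define MY_BIGENDIAN 1
    #endif
  #endif
#endif

/* Same test for the next macro; each macro must be checked under its own name. */
#if defined(_BIG_ENDIAN)
  #if (_BIG_ENDIAN > 0)
    #ifndef MY_BIGENDIAN
      #define MY_BIGENDIAN 1
    #endif
  #endif
#endif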

and so on for every big endian macro in every compiler!
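
A hedged variant of the same idea, as a sketch only: prefer the explicit __BYTE_ORDER__ comparison when the compiler provides it, and trust a BIG_ENDIAN flag only when no matching LITTLE_ENDIAN macro is defined alongside it. Falling back to little endian when nothing is conclusive is an assumption made here for illustration, and my_is_big_endian is a hypothetical name.

```c
/* Sketch: decide endianness at compile time. A *_BIG_ENDIAN macro is
   trusted only when the matching *_LITTLE_ENDIAN macro is absent, to
   avoid the case where both are defined as byte-order constants. */
#if defined(__BYTE_ORDER__) && defined(__ORDER_BIG_ENDIAN__)
#  if (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
#    define MY_BIGENDIAN 1
#  else
#    define MY_BIGENDIAN 0
#  endif
#elif defined(__BIG_ENDIAN__) && !defined(__LITTLE_ENDIAN__)
#  define MY_BIGENDIAN 1
#elif defined(_BIG_ENDIAN) && !defined(_LITTLE_ENDIAN)
#  define MY_BIGENDIAN 1
#else
#  define MY_BIGENDIAN 0   /* assumption: default to little endian */
#endif

/* Hypothetical accessor so the compile-time decision can be inspected. */
static int my_is_big_endian(void)
{
    return MY_BIGENDIAN;
}
```

This still cannot detect a CPU that switches endianness at runtime; it only reduces the chance of a false positive at compile time.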

Dmitry

Mar 14, 2012, 5:22:36 PM

to LZ4c

Sorry, my previous code does not help to detect bi-endian,
but maybe:
#if (LZ4_BIG_ENDIAN_DETECTED == 1) && (LZ4_LITTLE_ENDIAN_DETECTED == 1)
#define LZ4_BI_ENDIAN_DETECTED 1
#endif

Anyway, I think
there are three sources of information:

1. The ARM reference documentation for the core
2. The chip manufacturer's reference guide for the chip. A chip has a
core plus some peripheral devices and, importantly, buses (to RAM, to
ROM, etc.), so the chip manufacturer can remove big or little endian
support from the core
3. The compiler documentation and OS documentation

My case is:

1. The ARM7TDMIs core is endian independent (Bi-Endian)
2. BUT :-( the chip manufacturer, Philips, cut Little Endian support
in its MCU series
3. No OS in my case, and the compiler has no option to switch endianness
