Re: Byte Order


Sachin Garg

Jan 25, 2003, 1:35:31 PM
Hi,

> Actually Intel and Motorola chips store the 4 byte int in different
> orders.
>
> To force the order to be processor independent just write
> the 4 bytes out one at a time rather than the whole int
> at once with code like.
>
> Given:
> int num;
>
> cout << (char)((num >> 24) & 0xFF);
> cout << (char)((num >> 16) & 0xFF);
> cout << (char)((num >> 8) & 0xFF);
> cout << (char)((num) & 0xFF);

I have a related question.

I have to read data from a file into unsigned longs (4 bytes each). The data is
written to the file in little-endian form.

My code has to be platform independent. Currently I read straight into an
unsigned long and then convert the endianness if I am on a big-endian machine
(using #if to check the platform).

However, it would make my job easier if I could read the data byte by byte and
then form a long after every 4 reads. Is there any platform-independent code to
do this? (It would be better if it doesn't need #if.)

Thanks.

Sachin Garg [India]
http://sachingarg.cjb.net
http://sachingarg.go.to

Kevin Goodsell

Jan 25, 2003, 2:54:18 PM
On 25 Jan 2003 10:35:31 -0800, sch...@yahoo.com (Sachin Garg) wrote:

>
>I have to read data from a file into unsigned longs(4 byte). Data is written
>in file in LITTLE endian form.
>
>My code is to be platform independent. I am doing this by reading
>straight into an unsigned long and then converting endianness if I am on a
>big endian machine. (using #if to check platform)

You are assuming that only those two orders are possible. This is
simply not the case.

>
>However, It will make my job easier if I can "read data byte by byte" and then
>form a long after every 4 reads. Any platform independent code to do this?
>(it will be better it doesnt needs #if)

The "correct" way is platform independent (unless possibly if the
platforms in question use different byte-sizes).

const int word_length = 4; // assuming 4-byte values
unsigned char c;
unsigned long value = 0;
for (int i=0; i<word_length; i++)
{
    // unformatted read; operator>> would skip whitespace bytes
    c = fin.get(); // some error-checking would be appropriate
    value = (value << CHAR_BIT) | c;
}

Or you could read the 4 bytes into an array, vector, deque or
whatever, then operate on that. That would probably simplify it a bit
(particularly the error-checking that I skipped above):

for (int i=0; i<word_length; i++)
{
    value = (value << CHAR_BIT) | array[i];
}
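
For the little-endian file in the original question, the same array approach can be turned around so that byte i lands at bit position 8*i. A minimal sketch, assuming 8-bit bytes; the helper name from_little_endian is made up for illustration and is not from the post above:

```cpp
#include <cstddef>

// Hypothetical helper: assemble a 32-bit value from 4 bytes that were
// stored least significant byte first (little-endian file order).
// Assumes each array element holds one 8-bit byte from the file.
unsigned long from_little_endian(const unsigned char bytes[4])
{
    unsigned long value = 0;
    for (std::size_t i = 0; i < 4; ++i)
    {
        // Byte i carries bits 8*i .. 8*i+7 of the result.
        value |= static_cast<unsigned long>(bytes[i]) << (8 * i);
    }
    return value;
}
```

Because only shifts and ORs on the value are involved, the host's own byte order never enters into it.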

-Kevin

Sachin Garg

Jan 25, 2003, 6:19:32 PM
"Kevin Goodsell" <good...@bridgernet.com> wrote in message news:u6q53vohm097jnj1o...@4ax.com...

> On 25 Jan 2003 10:35:31 -0800, sch...@yahoo.com (Sachin Garg) wrote:

> >I have to read data from a file into unsigned longs(4 byte). Data is written
> >in file in LITTLE endian form.

> The "correct" way is platform independent (unless possibly if the
> platforms in question use different byte-sizes).
>
> const int word_length = 4; // assuming 4-byte values
> unsigned char c;
> unsigned long value = 0;
> for (int i=0; i<word_length; i++)
> {
> fin >> c; // some error-checking would be appropriate
> value = (value << CHAR_BIT) | c;
> }

Hi,

So this shifting method will always give the same 'value', independent of
platform endianness.

Is this 'value' the same in terms of the "numerical value" stored, or in
terms of the "bit pattern" stored?

As I understand it, it should be the same "numerical value". If that is the
case, will this 'value' be the same as the one read by "fin>>value;" on a
little-endian machine, or the one read on a big-endian machine?

Or, if it is the same "bit pattern", which numerical value will that bit
pattern have?

Gianni Mariani

Jan 25, 2003, 6:38:05 PM
to Sachin Garg


Here is a class that is endian-independent and uses no #defines.

template <class base_type >
class NetworkOrder
{
public:

    base_type m_uav;

    // True when the first byte of an unsigned holding 1 is zero,
    // i.e. the least significant byte is not stored first.
    static inline bool IsBigEndianbool()
    {
        unsigned x = 1;
        return ! ( * ( char * )( & x ) );
    }

    static inline void OrderRead(
        const base_type & i_val,
        base_type & i_destination
    )
    {
        const unsigned char * src = ( const unsigned char * ) & i_val;
        unsigned char * dst = ( unsigned char * ) & i_destination;

        if (
            ( sizeof( base_type ) == 1 )
            || IsBigEndianbool()
        ) {

            //
            // Alignment is an issue on some architectures so
            // even for non-swapping we read a byte at a time

            if ( sizeof( base_type ) == 1 ) {
                dst[0] = src[0];
            } else if ( sizeof( base_type ) == 2 ) {
                dst[0] = src[0];
                dst[1] = src[1];
            } else if ( sizeof( base_type ) == 4 ) {
                dst[0] = src[0];
                dst[1] = src[1];
                dst[2] = src[2];
                dst[3] = src[3];
            } else {

                for (
                    int i = sizeof( base_type );
                    i > 0;
                    i --
                ) {
                    * ( dst ++ ) = * ( src ++ );
                }
            }

        } else {

            // Host is not big endian: reverse the bytes

            if ( sizeof( base_type ) == 2 ) {
                dst[1] = src[0];
                dst[0] = src[1];
            } else if ( sizeof( base_type ) == 4 ) {
                dst[3] = src[0];
                dst[2] = src[1];
                dst[1] = src[2];
                dst[0] = src[3];
            } else {
                dst += sizeof( base_type ) -1;
                for ( int i = sizeof( base_type ); i > 0; i -- ) {
                    * ( dst -- ) = * ( src ++ );
                }
            }
        }
    }


    static inline void OrderWrite(
        const base_type & i_val,
        base_type & i_destination
    )
    {
        // for the time being this is the same as OrderRead
        OrderRead( i_val, i_destination );
    }

    inline operator base_type () const
    {
        base_type l_value;
        OrderRead( m_uav, l_value );
        return l_value;
    }


    inline base_type operator=( base_type in_val )
    {
        OrderWrite( in_val, m_uav );
        return in_val;
    }

};


Use:

int i = * reinterpret_cast< NetworkOrder<int> * >( p );

or even in a struct:

struct wire_data
{
    NetworkOrder<int> a;
};


code looks like:

wire_data * x;
..
x->a = 999988888;

cout << x->a;

Prints 999988888 but stores the bytes in big endian.


On a good optimizing compiler the IsBigEndianbool method is completely
optimized away as well as the wrong side of the if() in OrderRead.
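
For comparison, the same big-endian (network order) layout can also be produced with shifts alone, which avoids detecting the host order at all. This is only a sketch under the assumption of 8-bit bytes and 32-bit values; store_be32 and load_be32 are hypothetical names, not part of the class above:

```cpp
// Hypothetical helpers: write and read a 32-bit value in big-endian
// (network) byte order using only arithmetic on the value.
void store_be32(unsigned long value, unsigned char out[4])
{
    out[0] = static_cast<unsigned char>((value >> 24) & 0xFF); // most significant byte first
    out[1] = static_cast<unsigned char>((value >> 16) & 0xFF);
    out[2] = static_cast<unsigned char>((value >> 8) & 0xFF);
    out[3] = static_cast<unsigned char>(value & 0xFF);
}

unsigned long load_be32(const unsigned char in[4])
{
    return (static_cast<unsigned long>(in[0]) << 24)
         | (static_cast<unsigned long>(in[1]) << 16)
         | (static_cast<unsigned long>(in[2]) << 8)
         |  static_cast<unsigned long>(in[3]);
}
```

The trade-off is that the caller works with an explicit byte buffer instead of overlaying a struct on the wire data.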

Kevin Goodsell

Jan 25, 2003, 6:55:22 PM
On 25 Jan 2003 15:19:32 -0800, sch...@yahoo.com (Sachin Garg) wrote:

>Hi,
>
>So, this shifting method will always give same 'value' independent of
>platform endianness.
>
>Is this 'value' SAME in terms of "numerical value" stored in them or
>in
>terms of "bit pattern" stored in them.

Value, not bit pattern.
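
To illustrate the distinction, here is a small sketch that builds a value two ways from the same four bytes. The function names are made up for this example, and it assumes 8-bit bytes and a compiler that provides std::uint32_t:

```cpp
#include <cstdint>
#include <cstring>

// Arithmetic assembly, most significant byte first: the numerical
// result is the same on every host.
unsigned long assemble_msb_first(const unsigned char bytes[4])
{
    unsigned long value = 0;
    for (int i = 0; i < 4; ++i)
        value = (value << 8) | bytes[i];
    return value;
}

// Raw copy into the object's storage: the bit pattern in memory is the
// same everywhere, but the numerical value depends on host byte order.
std::uint32_t copy_raw(const unsigned char bytes[4])
{
    std::uint32_t value = 0;
    std::memcpy(&value, bytes, sizeof value);
    return value;
}
```

For the bytes 12 34 56 78 (hex), assemble_msb_first gives 0x12345678 on any host, while copy_raw gives 0x12345678 on a big-endian host and 0x78563412 on a little-endian one.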

>
>According to me, it should be same "numerical value". If this is the
>case,
>then, this 'value' will be same as that read by "fin>>value;" on a
>little endian machine or that on a big endian machine?

"fin >> value;" will probably give unexpected results on a binary
file.
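
A short sketch of why formatted extraction is a poor fit for binary data: operator>> treats its input as text and, by default, skips whitespace, so a byte such as 0x20 is silently dropped. The helper names below are hypothetical and a string stream stands in for the file:

```cpp
#include <sstream>
#include <string>

// First byte as seen by formatted extraction (operator>>).
int first_byte_formatted(const std::string& data)
{
    std::istringstream in(data);
    unsigned char c = 0;
    in >> c;          // skips leading whitespace by default
    return c;
}

// First byte as seen by unformatted input (get()).
int first_byte_unformatted(const std::string& data)
{
    std::istringstream in(data);
    return in.get();  // returns the next byte unchanged, or EOF
}
```

With the two input bytes 0x20 0x41, the formatted version returns 0x41 and the unformatted version returns 0x20.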

Reading the raw bytes into an unsigned long variable would have the
expected (correct) results on a machine that stores the bytes in
decreasing order of significance. I can never remember which
"endian-ness" is which, but I'm talking about the Motorola byte order,
not the Intel byte order.

Incidentally, that reminds me that I made an assumption in my previous
post (and I'm continuing to use that assumption here) that I forgot to
state. I was assuming that the values in the file are stored with the
bytes in decreasing order of significance.

-Kevin

William

Jan 27, 2003, 12:20:28 PM
"Kevin Goodsell" <good...@bridgernet.com> wrote in message
news:u6q53vohm097jnj1o...@4ax.com...
> On 25 Jan 2003 10:35:31 -0800, sch...@yahoo.com (Sachin Garg) wrote:
>
> >
> >I have to read data from a file into unsigned longs(4 byte). Data is written
> >in file in LITTLE endian form.
> >
> >My code is to be platform independent. I am doing this by reading
> >straight into an unsigned long and then converting endianness if I am on a
> >big endian machine. (using #if to check platform)
>
> You are assuming that only those two orders are possible. This is
> simply not the case.

Wasn't there a DEC system that was "middle-endian" - the byte
order within a two-byte word was not the same as the word
order within a double word? (Hence, the high-order byte was
one of the two middle bytes in a DWORD.) -Wm

Ron Natalie

Jan 27, 2003, 12:45:18 PM

"William" <wi...@us.itmasters.com> wrote in message news:1nudndK8yOo...@giganews.com...

> Wasn't there a DEC system that was "middle-endian" - the byte
> order within a two-byte word was not the same as the word
> order within a double word? (Hence, the high-order byte was
> one of the two middle bytes in a DWORD.) -Wm

Yes, the original PDP-11s were essentially 16-bit little-endian machines.
The 32-bit integer quantity got implemented as two 16-bit words with
the high-order word first. So if you number the bytes starting from the LSB
as 0, the order in memory for this was BYTE2 BYTE3 BYTE0 BYTE1.
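
That layout can be decoded with the same shift technique used elsewhere in this thread. A minimal sketch, assuming 8-bit bytes; from_pdp_endian is a hypothetical name used only for illustration:

```cpp
// Hypothetical helper: decode a 32-bit value whose bytes sit in memory
// as BYTE2 BYTE3 BYTE0 BYTE1, where BYTE0 is the least significant byte.
unsigned long from_pdp_endian(const unsigned char mem[4])
{
    return  static_cast<unsigned long>(mem[2])          // BYTE0
         | (static_cast<unsigned long>(mem[3]) << 8)    // BYTE1
         | (static_cast<unsigned long>(mem[0]) << 16)   // BYTE2
         | (static_cast<unsigned long>(mem[1]) << 24);  // BYTE3
}
```

For example, the value 0x0A0B0C0D would sit in memory as 0B 0A 0D 0C under this scheme.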

Wes Groleau

Jan 27, 2003, 2:43:31 PM

> Yes, the original PDP-11s were essentially 16-bit little-endian machines.
> The 32-bit integer quantity got implemented as two 16-bit words with
> the high-order word first. So if you number the bytes starting from the LSB
> as 0, the order in memory for this was BYTE2 BYTE3 BYTE0 BYTE1.

This scheme was inherited by the big VAXes as well.

Duane Hebert

Jan 27, 2003, 8:37:16 PM
> Wasn't there a DEC system that was "middle-endian" - the byte
> order within a two-byte word was not the same as the word
> order within a double word? (Hence, the high-order byte was
> one of the two middle bytes in a DWORD.) -Wm

PDP-11, though I think it was changed by the 11/73. It's been a while though.
Octal machine code. Times change.


Bruce G. Stewart

Jan 25, 2003, 2:13:36 PM

Read unsigned characters one at a time, or read an array of
4 unsigned chars using the .get() member function. Then assemble
your value using arithmetic. The "endianness" of the platform
won't matter.

There are other platform issues though - the data format depends
on 8 bit binary bytes being read from the file; some machines may
have unsigned char types wider than that. There are many
conceivable ways for an implementation on such a machine to
handle a binary file; it's impossible to have complete platform
independence in an application like this without complete
knowledge of the universe of platforms available.
