I am trying to read binary data files on a Linux machine with some C
programs compiled with gcc. I would like to utilize C-code written on a
Sun without changes, but it seems as though Linux swaps bytes and words.
Does anyone know an easy workaround (compiler options?). I would like to
avoid using data translators if possible, as many of the data files are
large (up to 200 Mbytes). It appears character strings are fine, short
integers are byte swapped, and long integers are byte and word swapped. Is
this correct?
Thanks in advance for the help.
--
Keith Barr ba...@netcom.com
COM-ASMEL-IA-A&IGI
Westminster, Colorado, USA
http://chinook.atd.ucar.edu/~barr
Below is some stuff I used to figure out the swapping.
-------------------------------------------------
from Sun:
od -h file yields:
0000000 4450 0040 0001 02d6 018f 547f 0495 0012
The C code below prints out:
As unsigned shorts: 4450 0040 0001 02d6
As unsigned longs: 44500040 000102d6
As unsigned chars: 44 50 00 40 00 01 02 d6
-------------------------------------------------
from Linux:
od -h file yields:
0000000 5044 4000 0100 d602 8f01 7f54 9504 1200
The C code below prints out:
As unsigned shorts: 5044 4000 0100 d602
As unsigned longs: 40005044 d6020100
As unsigned chars: 44 50 00 40 00 01 02 d6
-------------------------------------------------
Notes: code compiled on sun and linux using gcc
same code is used on both platforms
data file is sitting on sun, and is NFS mounted to linux machine
-------------------------------------------------
Here is the code listing:
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>   /* read(), close() */
typedef struct {
    unsigned short data[4];
} SAMPLE;
typedef struct {
    unsigned long data[2];   /* assumes a 32-bit long, as on the machines above */
} LSAMPLE;
int main(void)
{
    SAMPLE sample;
    LSAMPLE *lsample;
    unsigned char *pChar;
    int i, fd;
    fd = open("mh", O_RDONLY);
    if (fd < 0) {
        perror("mh");
        return 1;
    }
    /* read the first 8 bytes of the file exactly as they are stored */
    if (read(fd, (char*)&sample, 8) != 8) {
        fprintf(stderr, "short read\n");
        close(fd);
        return 1;
    }
    printf("As unsigned shorts: ");
    for (i = 0 ; i < 4 ; i++) printf("%04x ", sample.data[i]);
    printf("\n");
    /* cast the data as long ints */
    lsample = (LSAMPLE*)&sample;
    printf("As unsigned longs: ");
    for (i = 0 ; i < 2 ; i++) printf("%08lx ", lsample->data[i]);
    printf("\n");
    /* cast the data as characters */
    pChar = (unsigned char*)&sample;
    printf("As unsigned chars: ");
    for (i = 0 ; i < 8 ; i++) printf("%02x ",*pChar++);
    printf("\n");
    close(fd);
    return 0;
}
-------------------------------------------------
There are no compiler options to handle this. What you need is the
set of networking calls with names like htonl(), which does a Host TO
Network Long conversion. Sun machines have their words the "right" way
round, so htonl() simply returns its input. Linux boxes, by their x86
architecture, have the words the "wrong" way round, so htonl() does a
reversal.
So effectively what you need to do, at both ends of the link, is to
read and write through the converters. All the data will then be
luvverly and portable: the code can be run on anything and the data
will be read correctly. The calls are actually designed for use on
TCP links, where this is also an issue.
--
Keith Lucas -- sill...@wardrobe.demon.co.uk
>Hi all,
> I am trying to read binary data files on a Linux machine with some C
>programs compiled with gcc. I would like to utilize C-code written on a
>Sun without changes, but it seems as though Linux swaps bytes and words.
>Does anyone know an easy workaround (compiler options?). I would like to
>avoid using data translators if possible, as many of the data files are
>large (up to 200 Mbytes). It appears character strings are fine, short
>integers are byte swapped, and long integers are byte and word swapped. Is
>this correct?
The Sun byte order uses what I think of as Arabic byte order (i.e. as
you write the numbers down - big numbers first). The Intel x86 uses
"bit-linear" addressing, e.g. to access the 100th bit in a file, you
can either look at bit 4 in char 12 (12 * 8 + 4 == 100) or bit 4 in
short int 6 (6 * 16 + 4) or bit 4 in long int 3 (3 * 32 + 4). This
does not work for Sun byte order, where the same bit turns up as bit 12
in short int 6 or bit 28 in long int 3 :-/
Also, if you know a long value is within the bounds of a short value,
say, the x86 allows you to cast the long address to a short address
and get the right value (a trick that 16-bit compilers might use).
The Sun byte order (inherited from the Motorola 68000) practically
requires you to load the long value and truncate it.
I know that, within the standard, the byte-order makes little or no
difference to an application using native files (just to its file
format). I am just stating my support for the Intel byte-order - I
think it was a more logical choice.
Of course, DEC compromised - within a short they use Intel byte order,
within a long they use Sun order of the shorts.
As pointed out elsewhere, the functions/macros htons() and ntohs() can
be used for reading/writing shorts in Sun byte-order. For long int, or
for the 32-bit image of a float copied into an integer, use htonl() and
ntohl(). These are all expressed as inline assembler when optimized.
For double the task is more complicated -
there are no common htond() or ntohd().
None of the above functions is within the standard (which avoids such
complexities). How such functions operate where CHAR_BIT != 8,
sizeof(short) != 2 and sizeof(long) != 4 is another matter and another
good reason for the standard to avoid the issue. Until 64-bit
workstations default to sizeof(long) == 8, these functions offer a
solution.
Regards,
Andy Robb.
Anyway, if byte orders between architectures are your concern, use the
htonl, htons, ntohl, ntohs family of functions in your C code.
hiro
--
Hiro Sugawara hsug...@us.oracle.com +1(415)506-9336
Oracle Corporation MPP Development
500 Oracle Parkway, Box 659107, Redwood Shores, CA 94065 USA
>> I am trying to read binary data files on a Linux machine with some C
>> programs compiled with gcc. I would like to utilize C-code written on
>> a Sun without changes, but it seems as though Linux swaps bytes and
>> words.
>The problem is that Intel processors (the majority of Linux boxes are)
>use a little-endian byte order, whereas everyone else on the face of
>the earth (worth talking about at least) uses big-endian. This means
>that on a Sun/HP/IBM/DEC/Apple the number 0x33cc55aa is stored in ram
>as 33cc55aa,
^^^ DEC machines aren't big endian, they are
little endian (at least the VAX, MIPS and Alpha CPUs in DEC boxes are
little endian; the PDP-11 was PDP-11-endian :-). Many processors (e.g.
MIPS) can switch their endianness. So the same workstation might be
little endian if it is running Windows NT and big endian if it is
running Unix.
>while on Intel machines, it's stored as aa55cc33. When you read and
>write binary data, it is written in the machine's native byte order, so
>you obviously have a problem when you move binary data between the two.
Right. But endianness isn't the only problem. There are also different
integer lengths, different padding (this can even be an issue when using
two different compilers on the same machine), different floating point
formats ...
In short, files should not be just "memory dumps" but conform to some
documented format.
>The correct solution is to use htons(), htonl(), ntohl(), and ntohs()
>to convert the data when you read and write. These functions convert
>shorts and longs to "network byte order", which is big-endian. On a
>Sun/IBM/HP/Apple, they are ignored by the compiler since they already
>use big-endian byte order internally, but on Intel chips, they reverse
>the byte order. You should _always_ use these functions when dealing
>with any binary data other than bytes.
Always is a little bit too strong. What if you have to write integers
longer than 4 bytes or floating point numbers?
Fortunately XDR specifies a lot more data types ...
hp