byte swapping confusion

utab

Nov 29, 2010, 3:43:00 PM

Dear all,

I am a little confused about a binary read operation.

I am trying to read a binary file with the stream functions. It is a
result file of a commercial program, and I know the structure of the
file, at least from the manual.

The file is structured as records, and the program is written in Fortran.
Each record looks like this:

record length (int)
dummy integer
data (could be int, double)
dummy integer

The first record is a block of 100 integers, which corresponds to "data"
in the above representation.

When I read the first value in the file, which is the record length (an
integer), I have to swap the bytes to get the correct value of 100. I do
not understand why the swap is needed: the file is generated on the same
machine, so both programs should be using the same system-specific
routines, yet that does not seem to be the case. Something else is going
on that I do not understand. Any comments are appreciated.

Here is a naive test case

#include <fstream>
#include <iostream>
using namespace std;

int main () {
    ifstream myfile;
    char intBuffer[4];
    myfile.open ("truss.rst", ios::binary);
    myfile.read(intBuffer, sizeof(int));
    // Printing the value here, without the swap below,
    // does not give me what I want:
    //cout << *((int*)intBuffer) << endl;

    // Reverse the four bytes in place.
    char tmp;
    tmp = intBuffer[0];
    intBuffer[0] = intBuffer[3];
    intBuffer[3] = tmp;
    tmp = intBuffer[1];
    intBuffer[1] = intBuffer[2];
    intBuffer[2] = tmp;
    // -----------------------------
    cout << *((int*)intBuffer) << endl;

    myfile.close();
    return 0;
}

Best,
U.

Jonathan Lee

Nov 30, 2010, 12:38:13 AM

When an int is converted to a sequence of bytes (or vice
versa), the resulting bytes are in some particular order.
What is happening is that your file format and C++
disagree on that order.

There's really no reason why they should be the same.
Google for "Endian" or "endianness" for more info.
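
To make the byte-order point concrete, here is a small sketch (not from the original posts) that lays out a 32-bit value as four bytes in both orders and checks which one the host uses. The names toBigEndian, toLittleEndian and hostIsLittleEndian are made up for illustration, and 8-bit bytes are assumed.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// Hypothetical helpers, for illustration only.

// Most significant byte first ("big-endian", also called network order).
std::array<unsigned char, 4> toBigEndian(std::uint32_t v)
{
    return {{ static_cast<unsigned char>(v >> 24),
              static_cast<unsigned char>(v >> 16),
              static_cast<unsigned char>(v >> 8),
              static_cast<unsigned char>(v) }};
}

// Least significant byte first ("little-endian", as on x86).
std::array<unsigned char, 4> toLittleEndian(std::uint32_t v)
{
    return {{ static_cast<unsigned char>(v),
              static_cast<unsigned char>(v >> 8),
              static_cast<unsigned char>(v >> 16),
              static_cast<unsigned char>(v >> 24) }};
}

// Reports how this machine stores a 32-bit integer in memory.
bool hostIsLittleEndian()
{
    const std::uint32_t one = 1;
    unsigned char first;
    std::memcpy(&first, &one, 1);  // look at the lowest-addressed byte
    return first == 1;
}
```

With these, the value 100 is the bytes 00 00 00 64 in big-endian order and 64 00 00 00 in little-endian order. A file holding the first form, read directly into an int on a little-endian host, would need exactly the swap shown in the original post.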

James Kanze

Nov 30, 2010, 11:33:59 AM

On Nov 30, 5:38 am, Jonathan Lee <jonathan.lee....@gmail.com> wrote:
> On Nov 29, 2:43 pm, utab <uta...@ipact.nl> wrote:
> > I have a little confusion for a binary read operation.

> > I am trying to read a binary file with the stream functions. This is a
> > result file of a commercial program, and I know the structure of the
> > file, at least from the manual.

> > The file is structured as records and the program is written in fortran.
> > So the structure is like

> > record Length (int)
> > dummy integer
> > data (could be int, double)
> > dummy integer

> > The first record is a 100 integer block, where this
> > corresponds to data in the above representation.

[...]

> When an int is converted to a sequence of bytes (or vice
> versa), the resulting bytes are in some particular order,

And size. And format. (At least one widespread protocol
supports variable length int's, with a special format.)

> What is happening is that your file format and C++
> disagree on that order.

C++ is agnostic on the issue. It uses whatever the hardware
gives it. The file format can't be agnostic, or it won't be
portable, so it chooses one (or more) particular format(s), and
specifies those.

Rather than hassle with byte swapping, etc., and introduce
machine dependencies, it's cleaner to just read byte by byte,
and assemble the values into an int, e.g.:

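#include <cstdio>      // EOF
#include <istream>
#include <stdint.h>    // uint32_t

struct UnexpectedEOF {};   // placeholder exception type; use your own

int getInt(std::istream& source)
{
    uint32_t tmp = 0;
    int shift = 32;
    // The first byte read is the most significant one.
    while (shift > 0) {
        shift -= 8;
        int byte = source.get();
        if (byte == EOF)
            throw UnexpectedEOF();
        // Convert to unsigned before shifting, so that a byte >= 0x80
        // shifted left by 24 cannot overflow a signed int.
        tmp |= static_cast<uint32_t>(byte) << shift;
    }
    return tmp;
}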

--
James Kanze

utab

Dec 1, 2010, 3:52:17 AM

> int getInt(std::istream& source)
> {
> uint32_t tmp = 0;
> int shift = 32;
> while (shift > 0) {
> shift -= 8;
> int byte = source.get();
> if (byte == EOF)
> throw UnexpectedEOF();
> tmp |= byte << shift;
> }
> return tmp;
> }
>
> --
> James Kanze

Dear James,

Thanks for the response. With a bit more research, I ended up with
the function ntohl().

Something like:

// ntohl() is declared in <arpa/inet.h> on POSIX systems
ifstream myfile;
char intBuffer[4];

uint32_t val;
myfile.open ("file.rth", ios::binary);
myfile.read( intBuffer, sizeof(uint32_t) );
val = *((uint32_t*)intBuffer);
uint32_t newval = ntohl(val);
cout << newval << endl;
myfile.close();

Or should I keep to your advice of manipulating byte by byte? (I am not
a computer science graduate, so the code might be ugly or perhaps not
even portable.)

Best regards,
Umut

James Kanze

Dec 1, 2010, 11:32:58 AM

On Dec 1, 8:52 am, utab <uta...@ipact.nl> wrote:
> > int getInt(std::istream& source)
> > {
> > uint32_t tmp = 0;
> > int shift = 32;
> > while (shift > 0) {
> > shift -= 8;
> > int byte = source.get();
> > if (byte == EOF)
> > throw UnexpectedEOF();
> > tmp |= byte << shift;
> > }
> > return tmp;
> > }

> Thanks for the response. With just a bit more research, I ended up with
> the function ntohl()?

> Sth like,
>
> ifstream myfile;
> char intBuffer[4];
> uint32_t val;
> myfile.open ("file.rth", ios::binary);
> myfile.read( intBuffer, sizeof(uint32_t) );
> val = *((int*)intBuffer);
> uint32_t newval = ntohl(val);
> cout << newval << endl;
> myfile.close();
>
> Or should I keep your advice of manipulating byte by byte?(I am not
> computer science graduate so the code might be ugly or even not portable
> perhaps)

It depends on how portable you want to be. Things like ntohl
sort of work, for a small number of platforms (but including the
most frequent), provided you only compile in 32 bit mode.
Manipulating byte by byte works everywhere there is a uint32_t,
and it's possible to make it work even on machines which don't
have a 32 bit native integral type.

--
James Kanze

Ian Collins

Dec 1, 2010, 2:41:10 PM

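On 12/ 2/10 05:32 AM, James Kanze wrote:
> It depends on how portable you want to be. Things like ntohl
> sort of work, for a small number of platforms (but including the
> most frequent), provided you only compile in 32 bit mode.

That isn't true for POSIX systems, where the network byte order
functions are declared with fixed width types.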

> Manipulating byte by byte works everywhere there is a uint32_t,
> and it's possible to make it work even on machines which don't
> have a 32 bit native integral type.

--
Ian Collins

James Kanze

Dec 2, 2010, 4:53:23 AM

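On Dec 1, 7:41 pm, Ian Collins<ian-n...@hotmail.com> wrote:
> That isn't true for POSIX systems, where the network byte order
> functions are declared with fixed width types.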

So in ntohl, l doesn't mean long, but uint32_t. So I see.
Reminds me of Windows, where some of the names don't mean what
they say either. All very hacky, if you ask me.

FWIW: in practice, I unroll the loop in the byte shifting code,
and it benchmarks at about the same speed as ntohl. So you
might as well be fully portable. The same isn't true for
floating point, however, and if your portability is limited to
machines which support IEEE, reading the float/double as
a uint32_t/uint64_t, then type punning, will be faster -- not as
much as one might think, but still significantly -- than a fully
portable solution. (On the other hand, machines not using IEEE
are relatively common, whereas machines not supporting 32 bit 2's
complement ints are decidedly rare.)

--
James Kanze

James Kanze

Dec 2, 2010, 4:58:40 AM

One other consideration: the solution with shifting works
regardless of the alignment; ntohl requires some
reinterpret_cast's, and could fail if the bytes are misaligned
in the buffer. (In practice, most of the protocols I know do
ensure that integers are aligned at four byte boundaries. It's
a real problem, however, with double.)
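
For anyone who still prefers ntohl, one way to sidestep the alignment concern is to copy the bytes into a properly aligned variable first instead of casting the buffer pointer. The following is a sketch only, assuming a POSIX system (where ntohl comes from <arpa/inet.h>) and a big-endian value in the buffer; the name loadBigEndianU32 is made up.

```cpp
#include <arpa/inet.h>  // POSIX: uint32_t ntohl(uint32_t)
#include <cstdint>
#include <cstring>

// Hypothetical helper: decodes a big-endian 32-bit value stored at any
// address, aligned or not.
std::uint32_t loadBigEndianU32(const char* bytes)
{
    std::uint32_t raw;
    std::memcpy(&raw, bytes, sizeof raw);  // copy instead of pointer cast
    return ntohl(raw);                     // network (big-endian) to host order
}
```

The copy is typically optimized away by the compiler, so this should cost about the same as the cast while staying well defined for any offset into the buffer.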

--
James Kanze

Ian Collins

Dec 2, 2010, 5:07:05 AM

On 12/ 2/10 10:53 PM, James Kanze wrote:
> On Dec 1, 7:41 pm, Ian Collins<ian-n...@hotmail.com> wrote:
>> On 12/ 2/10 05:32 AM, James Kanze wrote:
>
>>> It depends on how portable you want to be. Things like ntohl
>>> sort of work, for a small number of platforms (but including the
>>> most frequent), provided you only compile in 32 bit mode.
>
>> That isn't true for POSIX systems, where the network byte order
>> functions are declared with fixed width types.
>
> So in ntohl, l doesn't mean long, but uint32_t. So I see.
> Reminds me of Windows, where some of the names don't mean what
> they say either. All very hacky, if you ask me.

Well, it isn't hacky, it's a facet of history. Once upon a time l used
to be long and s used to be short. Now we live in a more complex world
where long isn't always 32 bit any more.

> FWIW: in practice, I unroll the loop in the byte shifting code,
> and it benchmarks at about the same speed as ntohl. So you
> might as well be fully portable.

I am, but I use a little bit of meta-programming to get the compiler to
unroll the loop.
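
One possible shape for that kind of compile-time unrolling is a recursive template, sketched below. This is a guess at the general technique, not the code referred to above; the names BigEndian and decodeBigEndianU32 are invented, and the input is assumed to be four bytes with the most significant first.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical sketch; N is the number of bytes still to be consumed.
// Each instantiation handles one byte and hands the rest to N - 1, so
// the recursion is resolved at compile time rather than by a loop.
template <std::size_t N>
struct BigEndian
{
    static std::uint32_t decode(const unsigned char* p, std::uint32_t acc = 0)
    {
        return BigEndian<N - 1>::decode(p + 1, (acc << 8) | *p);
    }
};

// Terminating specialization: nothing left to read.
template <>
struct BigEndian<0>
{
    static std::uint32_t decode(const unsigned char*, std::uint32_t acc)
    {
        return acc;
    }
};

inline std::uint32_t decodeBigEndianU32(const unsigned char* p)
{
    return BigEndian<4>::decode(p);
}
```

Whether this beats a plain loop depends on the compiler; many optimizers unroll a four-iteration loop on their own.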

--
Ian Collins