bytes to unsigned long

moumita

May 9, 2007, 2:44:05 AM

Hi All,
I need to convert 4 bytes to an unsigned long.
Suppose I have an array such as unsigned char buf[4]. I need to convert
these 4 bytes into a single unsigned long. Is the following piece of
code right, and is it the right approach?

unsigned long temp;
temp = (unsigned long) buf[3];
temp |= ((unsigned long) buf[2]) << 8;
temp |= ((unsigned long) buf[1]) << 16;
temp |= ((unsigned long) buf[0]) << 24;

Waiting for your suggestions.

Jim Langston

May 9, 2007, 3:21:16 AM

"moumita" <moumit...@tataelxsi.co.in> wrote in message
news:1178693045.4...@p77g2000hsh.googlegroups.com...

There are a few ways to do it. One way I've done it in the past is to
simply treat an unsigned long as a char array and load the bytes in.
Endianness may be an issue.

unsigned long temp;
// buf must hold at least sizeof( unsigned long ) bytes
for ( std::size_t i = 0; i < sizeof( unsigned long ); ++i )
    (reinterpret_cast<unsigned char*>(&temp))[i] = buf[i];

The advantage of this is that it works for any size of unsigned long; you
just have to make sure the buffer is long enough. How the buffer was loaded
with the unsigned long may also matter (big vs. little endian).

I've seen your method used, however.

James Kanze

May 9, 2007, 3:24:18 AM

Maybe. It's the right approach, anyway. The question is where
the four bytes come from. If they're from an Internet protocol,
it's correct.

You might prefer using uint32_t instead of unsigned long. It's
not present in the current version of the C++ standard, but it
will be part of the next version, and it is already standard C,
so it should be supported by most compilers (provided you
include <stdint.h>, of course). On many modern machines,
unsigned long is 64 bits. (Not that it really matters here.)
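A minimal sketch of that suggestion, assuming a compiler that ships <cstdint> (or <stdint.h>) and assuming the four bytes arrive in network (big-endian) order. The function name decode_be32 is hypothetical, chosen only for this illustration.

```cpp
#include <cassert>
#include <cstdint>

// Decode four bytes in network (big-endian) order into a 32-bit value.
// decode_be32 is a hypothetical name, not part of any library.
inline std::uint32_t decode_be32(const unsigned char* buf)
{
    assert(buf != nullptr);
    // Convert each byte to the 32-bit type before shifting, so the
    // shift by 24 never happens in a narrower or signed type.
    return (static_cast<std::uint32_t>(buf[0]) << 24)
         | (static_cast<std::uint32_t>(buf[1]) << 16)
         | (static_cast<std::uint32_t>(buf[2]) <<  8)
         |  static_cast<std::uint32_t>(buf[3]);
}
```

The result can be assigned to an unsigned long without loss, since unsigned long is at least 32 bits wide.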

--
James Kanze (GABI Software) email:james...@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

Gianni Mariani

May 9, 2007, 3:35:01 AM

You may need to worry about endianness...

I posted one of these things a while back ... oh here it is.
http://groups.google.com/group/comp.programming/msg/061db1be797a255f

I attached an example of how you can do it. It's kind of the whole hog,
it allows you to simply re-interpret cast and read the value in the
correct byte order.

xx_endian.cpp

moumita

May 9, 2007, 4:40:46 AM

On May 9, 12:21 pm, "Jim Langston" <tazmas...@rocketmail.com> wrote:
> "moumita" <moumitagh...@tataelxsi.co.in> wrote in message

Thank you all for the replies.

Rennie deGraaf

May 9, 2007, 4:39:14 AM

That's one way to do it, assuming that you've figured out your
endianness and that unsigned long is at least 32 bits on your system.
An alternate method is to use a union, as in something like this:

union ulong_u
{
    unsigned long ul;
    unsigned char uc[4];
};

//...

ulong_u u;
std::memcpy(&u.uc, &buf, 4);
unsigned long temp = u.ul;

Of course, you may have to shuffle the bytes that you assign to u.uc to
handle endianness correctly.

Rennie deGraaf

red floyd

May 9, 2007, 10:54:51 AM

Gianni Mariani wrote:
> posting redacted

Please don't post attachments to c.l.c++:
http://www.parashift.com/c++-faq-lite/how-to-post.html#faq-5.4

Gianni Mariani

May 9, 2007, 11:07:49 AM

Rules are meant to be broken....

That one in particular.

Old Wolf

May 9, 2007, 6:00:54 PM

On May 9, 7:21 pm, "Jim Langston" <tazmas...@rocketmail.com> wrote:
> > Hi All,
> > I need to convert 4 bytes to an unsigned long.
>

> There are a few ways to do it. One way I've done it in the past is to
> simply treat a unsigned long as a char array and load the bytes in. Endian
> may be an issue.
>
> unsigned long temp;
> for ( int i = 0; i < sizeof( unsigned long ); ++i )
> (reinterpret_cast<char*>(&temp))[i] = buff[i];

This way, and Rennie deGraaf's way, are non-portable. You might
cause a program crash by creating a bit pattern that is not valid for
an unsigned long, and you also have no control over which
integer you get out of the bytes you put in.

The only reliable method is the one used in the OP code.
AFAIC any time you have to say "endian might be an issue",
there's something wrong with your algorithm.
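To make the difference concrete, here is a small sketch that contrasts the two styles. It assumes std::uint32_t is available and uses it for the raw copy so that exactly four bytes are copied into an object without padding bits. The names from_shifts and from_memcpy are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// The value is defined by arithmetic, so the result is the same on
// every host, whatever its internal byte order.
inline std::uint32_t from_shifts(const unsigned char* buf)
{
    assert(buf != nullptr);
    return (static_cast<std::uint32_t>(buf[0]) << 24)
         | (static_cast<std::uint32_t>(buf[1]) << 16)
         | (static_cast<std::uint32_t>(buf[2]) <<  8)
         |  static_cast<std::uint32_t>(buf[3]);
}

// The value is defined by the host's object representation, so the
// result depends on the machine the code runs on.
inline std::uint32_t from_memcpy(const unsigned char* buf)
{
    assert(buf != nullptr);
    std::uint32_t v = 0;
    std::memcpy(&v, buf, sizeof v);
    return v;
}
```

For the bytes 01 02 03 04, from_shifts always yields 0x01020304, while from_memcpy yields 0x01020304 on a big-endian host and 0x04030201 on a little-endian one.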

James Kanze

May 10, 2007, 4:52:43 AM

On May 9, 9:21 am, "Jim Langston" <tazmas...@rocketmail.com> wrote:
> "moumita" <moumitagh...@tataelxsi.co.in> wrote in message
> news:1178693045.4...@p77g2000hsh.googlegroups.com...

> > I need to convert 4 bytes to an unsigned long.
> > Suppose I have one array like unsigned char buf[4].I need to convert
> > these 4 bytes into a single
> > unsigned long. Is the following piece of code is right??Or is it a
> > right approch to do that??

> > unsigned long temp;
> > temp = (unsigned long) buf[3];
> > temp |= ((unsigned long) buf[2]) << 8;
> > temp |= ((unsigned long) buf[1]) << 16;
> > temp |= ((unsigned long) buf[0]) << 24;

> > Waiting for your suggestions.

> There are a few ways to do it. One way I've done it in the past is to
> simply treat a unsigned long as a char array and load the bytes in. Endian
> may be an issue.

As may be any number of other issues.

> unsigned long temp;
> for ( int i = 0; i < sizeof( unsigned long ); ++i )
> (reinterpret_cast<char*>(&temp))[i] = buff[i];

> The advantage of this is that it works on any size of unsigned long, just
> gotta make sure the buffer is long enough.

The disadvantage of this is that it supposes that the external
representation corresponds exactly to the internal one. Your
"advantage" is actually a serious disadvantage. If the external
format is four bytes, you want to convert exactly four bytes, no
more no less. You don't want to suddenly start reading eight
bytes just because you upgraded your machine, when only four
bytes were read.

James Kanze

May 10, 2007, 4:56:32 AM

On May 9, 9:35 am, Gianni Mariani <gi3nos...@mariani.ws> wrote:
> moumita wrote:

> > I need to convert 4 bytes to an unsigned long.
> > Suppose I have one array like unsigned char buf[4].I need to convert
> > these 4 bytes into a single
> > unsigned long. Is the following piece of code is right??Or is it a
> > right approch to do that??

> > unsigned long temp;
> > temp = (unsigned long) buf[3];
> > temp |= ((unsigned long) buf[2]) << 8;
> > temp |= ((unsigned long) buf[1]) << 16;
> > temp |= ((unsigned long) buf[0]) << 24;

> > Waiting for your suggestions.

> You may need to worry about endianness...

His code handles endianness transparently. That's why he wrote
it like that.

> I attached an example of how you can do it. It's kind of the whole hog,
> it allows you to simply re-interpret cast and read the value in the
> correct byte order.

> [xx_endian.cpp]
>
> template <class base_type, bool wire_is_big_endian = true >

Question: we're talking about a four byte entity here. There
are 24 different byte orders possible. I've actually seen at
least three. How do you represent this with a bool?

His original code was much cleaner, easier to understand, and
far more portable.

James Kanze

May 10, 2007, 5:10:54 AM

On May 9, 10:39 am, Rennie deGraaf <degr...@cpsc.no-processed-
pork.ucalgary.ca> wrote:
> moumita wrote:

> > I need to convert 4 bytes to an unsigned long.
> > Suppose I have one array like unsigned char buf[4].I need to convert
> > these 4 bytes into a single
> > unsigned long. Is the following piece of code is right??Or is it a
> > right approch to do that??

> > unsigned long temp;
> > temp = (unsigned long) buf[3];
> > temp |= ((unsigned long) buf[2]) << 8;
> > temp |= ((unsigned long) buf[1]) << 16;
> > temp |= ((unsigned long) buf[0]) << 24;

> > Waiting for your suggestions.

> That's one way to do it, assuming that you've figured out your
> endianness and that unsigned long is at least 32 bits on your system.

The whole point of his code is that the endianness of the
internal representation doesn't matter. And of course, unsigned
long is required by the language to be at least 32 bits.

If the external representation is the standard Internet four
byte integer, his code is guaranteed to work as long as the
machine it is running on guarantees that any upper bits (above
the 8 low order bits) of an unsigned char are 0, for the data
source in question. (E.g. if he's running on a machine with 9
bit char, the hardware reading the data will still read it in 8
bit blocks, putting one per char, and setting the upper bit to
0, rather than e.g. parity or whatever.) It's 100% guaranteed
for any machine with 8 bit char, which covers a pretty large
percentage of current implementations. The one place he might
run into problems is on some DSP with 32 bit char, which could
read putting all four network bytes into a single char.
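If the upper bits cannot be trusted, one defensive variant is to mask each char down to its low 8 bits before shifting. This is only a sketch under that assumption, and decode_be32_masked is a hypothetical name. On a machine with 8 bit char the mask changes nothing, and it does not help in the 32 bit char case, where the four network bytes are packed into a single char.

```cpp
#include <cassert>

// Decode four big-endian bytes, ignoring any bits above the low 8 of
// each char (relevant only where char is wider than 8 bits).
inline unsigned long decode_be32_masked(const unsigned char* buf)
{
    assert(buf != nullptr);
    // The 0xFFul operand converts each byte to unsigned long before the
    // shift, so the shift by 24 is done in a type of at least 32 bits.
    return ((buf[0] & 0xFFul) << 24)
         | ((buf[1] & 0xFFul) << 16)
         | ((buf[2] & 0xFFul) <<  8)
         |  (buf[3] & 0xFFul);
}
```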

> An alternate method is to use a union, as in something like this:

> union ulong_u
> {
> unsigned long ul;
> unsigned char uc[4];
> };

And that doesn't work, because there's not the slightest
guarantee concerning the compatibility of the representations.

> //...
>
> ulong_u u;
> std::memcpy(&u.uc, &buf, 4);
> unsigned long temp = u.ul;

That generates the wrong results on all of the machines I use.

> Of course, you may have to shuffle the bytes that you assign to u.uc to
> handle endianness correctly.

Which still doesn't handle the fact that:

-- how you "shuffle" the bytes depends on the machine, the
compiler, the version of the compiler, and maybe even the
options used when compiling,

-- on most modern machines, unsigned long will be longer than
four bytes,

-- on at least one machine still being sold, unsigned char is 9
bits; if the upper bit is 0, then the value will not
correspond, and

-- on at least one machine in the past, unsigned long had
padding bits, which had to be 0. (Of course, on that
machine, an unsigned long was 6 bytes, so you would have had
problems because of the second point as well.)
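Some of the assumptions in this list can at least be written down so the compiler evaluates them. The sketch below assumes a C++11 compiler (for static_assert and constexpr), and all of the names in it are hypothetical. It covers char width, the size of unsigned long and padding bits. It cannot tell you how to shuffle the bytes.

```cpp
#include <cassert>
#include <climits>
#include <limits>

// Guaranteed by the language: unsigned long has at least 32 value bits.
static_assert(std::numeric_limits<unsigned long>::digits >= 32,
              "unsigned long must have at least 32 value bits");

// Not guaranteed: each flag is an assumption of the raw-copy approach.
constexpr bool chars_are_octets = (CHAR_BIT == 8);
constexpr bool ulong_is_four_chars = (sizeof(unsigned long) == 4);
constexpr bool ulong_has_no_padding =
    std::numeric_limits<unsigned long>::digits
        == static_cast<int>(sizeof(unsigned long) * CHAR_BIT);

constexpr bool raw_copy_assumptions_hold =
    chars_are_octets && ulong_is_four_chars && ulong_has_no_padding;

// Hypothetical start-up check for code that insists on the raw copy.
// On a platform with an 8 byte unsigned long this assertion fails.
inline void require_raw_copy_assumptions()
{
    assert(raw_copy_assumptions_hold);
}
```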

Gianni Mariani

May 10, 2007, 4:48:59 PM

On May 10, 6:56 pm, James Kanze <james.ka...@gmail.com> wrote:
> On May 9, 9:35 am, Gianni Mariani <gi3nos...@mariani.ws> wrote:
>
> > moumita wrote:
> > > I need to convert 4 bytes to an unsigned long.
> > > Suppose I have one array like unsigned char buf[4].I need to convert
> > > these 4 bytes into a single
> > > unsigned long. Is the following piece of code is right??Or is it a
> > > right approch to do that??
> > > unsigned long temp;
> > > temp = (unsigned long) buf[3];
> > > temp |= ((unsigned long) buf[2]) << 8;
> > > temp |= ((unsigned long) buf[1]) << 16;
> > > temp |= ((unsigned long) buf[0]) << 24;
> > > Waiting for your suggestions.
> > You may need to worry about endianness...
>
> His code handles endianness transparently. That's why he wrote
> it like that.

Are you sure it should not be 0,1,2,3 instead of 3,2,1,0? I.e. is
the wire order b/e or l/e? The only choice we need to make in the
NetworkOrder class is whether a true or a false is needed. The
NetworkOrder class may have many other issues (it's not really
copyable, but you never really should copy it; it's strictly UB, but
it works and will need to continue to work, due to ABI issues, for a
very long time).

>
> > I attached an example of how you can do it. It's kind of the whole hog,
> > it allows you to simply re-interpret cast and read the value in the
> > correct byte order.
> > [xx_endian.cpp]
>
> > template <class base_type, bool wire_is_big_endian = true >
>
> Question: we're talking about a four byte entity here. There
> are 24 different byte orders possible. I've actually seen at
> least three. How do you represent this with a bool?

I have only seen 2 endiannesses that *I* have ever needed to support.
If someone cares about different orders, they're welcome to extend the
class.

>
> His original code was much cleaner, easier to understand, and
> far more portable.

You know better than to say that to me.

The "Mariani Minimum Complexity Proposition" suggests that any
complexity you can place in a library is better than placed in all
other locations in the code. Why and/or when is "std::string" better
than "char *" ?

i.e.

unsigned long val = wire_buffer.val;

and

wire_buffer.val = val;

is a whole lot easier to write and maintain than:

unsigned long temp;
temp = (unsigned long) buf[3];
temp |= ((unsigned long) buf[2]) << 8;
temp |= ((unsigned long) buf[1]) << 16;
temp |= ((unsigned long) buf[0]) << 24;

... the other 6 lines of code for writing it.

Oh - and if you ever need to support one of those other 22 endian
types, all the code is in one place to fix that.
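For reference, the writing direction in the shift style might look like the sketch below. It assumes a big-endian wire order, and encode_be32 is a hypothetical name. Only the low 32 bits of the value are written, whatever the width of unsigned long.

```cpp
#include <cassert>

// Store the low 32 bits of value into out[0..3], most significant
// byte first (big-endian wire order).
inline void encode_be32(unsigned long value, unsigned char* out)
{
    assert(out != nullptr);
    out[0] = static_cast<unsigned char>((value >> 24) & 0xFF);
    out[1] = static_cast<unsigned char>((value >> 16) & 0xFF);
    out[2] = static_cast<unsigned char>((value >>  8) & 0xFF);
    out[3] = static_cast<unsigned char>( value        & 0xFF);
}
```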

James Kanze

May 11, 2007, 3:38:54 AM

On May 10, 10:48 pm, Gianni Mariani <gi3nos...@mariani.ws> wrote:
> On May 10, 6:56 pm, James Kanze <james.ka...@gmail.com> wrote:

> > On May 9, 9:35 am, Gianni Mariani <gi3nos...@mariani.ws> wrote:

> > > moumita wrote:
> > > > I need to convert 4 bytes to an unsigned long.
> > > > Suppose I have one array like unsigned char buf[4].I need to convert
> > > > these 4 bytes into a single
> > > > unsigned long. Is the following piece of code is right??Or is it a
> > > > right approch to do that??
> > > > unsigned long temp;
> > > > temp = (unsigned long) buf[3];
> > > > temp |= ((unsigned long) buf[2]) << 8;
> > > > temp |= ((unsigned long) buf[1]) << 16;
> > > > temp |= ((unsigned long) buf[0]) << 24;
> > > > Waiting for your suggestions.
> > > You may need to worry about endianness...

> > His code handles endianness transparently. That's why he wrote
> > it like that.

> Are you sure it should not be 0,1,2,3 instead of 3,2,1,0 ? i.e. is
> the wire order b/e or l/e ?

It depends on the protocol. Presumably, his code is specific to
the protocol. His code implements big endian, which is correct
for all of the Internet protocols, for fixed width integers in
BER, and for most other protocols. (FWIW: I don't know of a
little endian protocol.)

> The only choice we need to make in the
> NetworkOrder class is whether a true or a false is needed. The
> NetworkOrder class may have many other issues (it's not really
> copyable - but you never really should copy it, It's strictly UB but
> it works and will need to continue to work (due to ABI issues) for a
> very long time),

Your code assumed two possible orders, both for the line and for
the internal representation. In practice, there is only one for
the line, except perhaps for some special in house protocols.
On the other hand, I've actually seen 3 different internal
orders (not just 2). His code is transparent to the internal
ordering.

> > > I attached an example of how you can do it. It's kind of the whole hog,
> > > it allows you to simply re-interpret cast and read the value in the
> > > correct byte order.
> > > [xx_endian.cpp]

> > > template <class base_type, bool wire_is_big_endian = true >

> > Question: we're talking about a four byte entity here. There
> > are 24 different byte orders possible. I've actually seen at
> > least three. How do you represent this with a bool?

> I have only seen 2 endiannesses that *I* have ever needed to support.
> If someone cares about different orders, they're welcome to extend the
> class.

Fine. I've actually seen and used three different internal
orderings. All on very widely used machines---nothing exotic.
(But you've probably never heard of MS-DOS, or PDP-11's. All
the world is Windows.)
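As an illustration of why a bool is not enough, here is a sketch that describes a byte order as a table of shift counts instead. The third table assumes the commonly described PDP-11 layout for 32 bit values, in which 0x0A0B0C0D is stored as the bytes 0B 0A 0D 0C. All names are hypothetical.

```cpp
#include <cassert>

// shifts[i] is the bit position (0, 8, 16 or 24) that byte i of the
// stored representation occupies in the resulting value.
inline unsigned long decode_with_order(const unsigned char* buf,
                                       const unsigned (&shifts)[4])
{
    assert(buf != nullptr);
    unsigned long v = 0;
    for (int i = 0; i < 4; ++i)
        v |= (buf[i] & 0xFFul) << shifts[i];
    return v;
}

const unsigned big_endian_shifts[4]    = { 24, 16,  8,  0 };
const unsigned little_endian_shifts[4] = {  0,  8, 16, 24 };
// Assumed PDP-11 ordering: high 16-bit word first, low byte first in each.
const unsigned pdp11_shifts[4]         = { 16, 24,  0,  8 };
```

Any of the 24 possible orders is just another table, and the decoding function itself never depends on the byte order of the machine it runs on.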

> > His original code was much cleaner, easier to understand, and
> > far more portable.

> You know better than to say that to me.

Why? Because you know it all, and won't listen, even to people
who have considerably more experience than you. (The code you
posted is what I would consider amateurish, and would certainly
fail code review anywhere I've worked.)

> The "Mariani Minimum Complexity Proposition" suggests that any
> complexity you can place in a library is better than placed in all
> other locations in the code. Why and/or when is "std::string" better
> than "char *" ?

So what does that have to do with anything here? You've got a
block of extremely hard to read, hard to modify, overly complex
code which doesn't handle as many real cases as the original.

> i.e.

> unsigned long val = wire_buffer.val;

> and

> wire_buffer.val = val;

> is a whole lot easier to write and maintain than:

> unsigned long temp;
> temp = (unsigned long) buf[3];
> temp |= ((unsigned long) buf[2]) << 8;
> temp |= ((unsigned long) buf[1]) << 16;
> temp |= ((unsigned long) buf[0]) << 24;

Obviously, this is in a library somewhere. The use is (almost)
exactly the same. (Actually, my own code for this is in an
ixdrstream/oxdrstream class, using the iostream idiom. So you
write:

source >> val1 >> val2 ...

where source is an ixdrstream, using a streambuf connected to
the socket.)
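A rough sketch of that idiom, for readers who have not seen it. This is not the actual ixdrstream class; xdr_reader_sketch and read_pair are hypothetical names, and the sketch assumes each value is a four byte big-endian integer read from a std::streambuf. An in-memory std::stringbuf stands in for the streambuf connected to the socket.

```cpp
#include <cassert>
#include <sstream>
#include <streambuf>
#include <string>

// Hypothetical stand-in for an XDR-style input stream.
class xdr_reader_sketch
{
public:
    explicit xdr_reader_sketch(std::streambuf* sb) : sb_(sb), ok_(true)
    {
        assert(sb != nullptr);
    }

    bool ok() const { return ok_; }

    // Read exactly four bytes, most significant first. On a short read
    // the reader is marked as failed and value is left untouched.
    xdr_reader_sketch& operator>>(unsigned long& value)
    {
        typedef std::streambuf::traits_type traits;
        unsigned long v = 0;
        for (int i = 0; i < 4 && ok_; ++i) {
            const traits::int_type c = sb_->sbumpc();
            if (traits::eq_int_type(c, traits::eof()))
                ok_ = false;
            else
                v = (v << 8) | (static_cast<unsigned long>(c) & 0xFFul);
        }
        if (ok_)
            value = v;
        return *this;
    }

private:
    std::streambuf* sb_;
    bool ok_;
};

// Usage: read two values from an in-memory buffer.
inline bool read_pair(const std::string& bytes,
                      unsigned long& val1, unsigned long& val2)
{
    std::stringbuf sb(bytes, std::ios_base::in);
    xdr_reader_sketch source(&sb);
    source >> val1 >> val2;
    return source.ok();
}
```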

We're talking here about the code you put into the library, not
about the interface of the library.

> ... the other 6 lines of code for writing it.

> Oh - and if you ever need to support one of those other 22 endian
> types, all the code is in one place to fix that.

The trick is, of course, that his code handles the internal
representation transparently, regardless of what it is. Neither
yours nor his (nor mine) handles "exotic" representations,
however. Some of which (e.g. variable length ints in BER) are
fairly widespread.

Gianni Mariani

May 11, 2007, 7:12:53 AM

On May 11, 5:38 pm, James Kanze <james.ka...@gmail.com> wrote:
> On May 10, 10:48 pm, Gianni Mariani <gi3nos...@mariani.ws> wrote:

...

>
> > > His original code was much cleaner, easier to understand, and
> > > far more portable.
> > You know better than to say that to me.
>
> Why? Because you know it all, and won't listen, even to people
> who have considerably more experience than you.

Yep, that's it.