
Stored in Big Endian. . . . . . inside Little Endian


Frederick Virchanza Gotham

Feb 12, 2023, 6:40:16 PM

So as you all know I'm combining three programs into one to make a program that can connect to any SSH server and use it as a VPN (without admin rights on the remote server). I actually already have it working and so now I'm just cleaning it up.

Anyway, at one point I had to deal with the IP address held inside a "struct sockaddr_in". This struct has a member called "sin_addr" which has a member called "s_addr". The 32-Bit unsigned number that I'm looking for is inside the integer variable "s_addr". I checked the Linux manual and it says that 's_addr' is always in Network Byte Order (i.e. Big Endian), even on modern desktop PC's running MS-Windows which are all Little Endian.

So first I wrote code like this:

char unsigned const *p = static_cast<char unsigned const*>(static_cast<void const*>(&s_addr));
cout << static_cast<unsigned>(p[0]) << "." << static_cast<unsigned>(p[1]) << "." << static_cast<unsigned>(p[2]) << "." << static_cast<unsigned>(p[3]);

But before clicking Save, I checked the type of "s_addr". I thought it might be uint32_t, but actually it's long unsigned. So then I stopped and thought... hmm... on a few Linux systems this could be a 64-Bit type, and so I'll change my code to:

static_assert(CHAR_BIT==8u, "Can't deal with 16-Bit char's or whatever size they are");
unsigned constexpr n = sizeof(s_addr) - 4u;
cout << static_cast<unsigned>(p[n+0]) << "." << static_cast<unsigned>(p[n+1]) << "." << static_cast<unsigned>(p[n+2]) << "." << static_cast<unsigned>(p[n+3]);

I saved this code, compiled, linked and tested it, and the IP address came out as "0.0.0.0". So then I tried my original code that didn't account for 's_addr' not being a uint32_t, and it came out as "192.168.1.1".

So on Linux on a 64-Bit x86 CPU, which is Little Endian, the 'long' type is 64-Bit, and it stores an IP address in 's_addr' as bytes as follows:

[192][168][1][1][0][0][0][0]

So they've sort of stored a 32-Bit number as Big Endian inside the lower 32 Bits of a 64-Bit Little Endian unsigned long. I suppose the cool thing about this is that they can do:

uint32_t my_ip = s_addr;

And now 'my_ip' will have the actual IP address instead of the four zeroed-out bytes.

Still though it's a bit mad.

red floyd

Feb 13, 2023, 12:22:43 AM

htonl() to go from native to network format, or ntohl() to go from
network to native.
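
A minimal sketch of that suggestion, assuming a POSIX system that provides <arpa/inet.h>. The helper name dotted_quad is hypothetical and does not come from the thread. It converts the network-order value to host order with ntohl() and then picks out the octets with shifts, so no byte pointer into s_addr is needed.

```cpp
#include <arpa/inet.h>  // ntohl (POSIX)
#include <cstdint>
#include <string>

// Hypothetical helper: format a network-byte-order IPv4 address as "a.b.c.d".
std::string dotted_quad(std::uint32_t net_order)
{
    // After ntohl() the value is an ordinary number whose most
    // significant octet is the first octet of the address.
    std::uint32_t const host = ntohl(net_order);
    return std::to_string((host >> 24) & 0xFFu) + "." +
           std::to_string((host >> 16) & 0xFFu) + "." +
           std::to_string((host >>  8) & 0xFFu) + "." +
           std::to_string( host        & 0xFFu);
}
```

With a sockaddr_in named sa, the call would be dotted_quad(sa.sin_addr.s_addr).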

Paavo Helde

Feb 13, 2023, 2:54:03 AM

13.02.2023 01:40 Frederick Virchanza Gotham kirjutas:
>
> So as you all know I'm combining three programs into one to make a program that can connect to any SSH server and use it as a VPN (without admin rights on the remote server). I actually already have it working and so now I'm just cleaning it up.
>
> Anyway, at one point I had to deal with the IP address held inside a "struct sockaddr_in". This struct has a member called "sin_addr" which has a member called "s_addr". The 32-Bit unsigned number that I'm looking for is inside the integer variable "s_addr". I checked the Linux manual and it says that 's_addr' is always in Network Byte Order (i.e. Big Endian), even on modern desktop PC's running MS-Windows which are all Little Endian.

So read a little bit more documentation and use the appropriate
conversion functions as red floyd suggested.

Also, do not forget to also support IPv6, this is a must nowadays.
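
One conversion function that covers both address families is POSIX inet_ntop(), which the thread does not name explicitly. The sketch below assumes a POSIX system; the helper name address_to_string is hypothetical.

```cpp
#include <arpa/inet.h>   // inet_ntop
#include <netinet/in.h>  // sockaddr_in, sockaddr_in6, INET6_ADDRSTRLEN
#include <sys/socket.h>  // sockaddr, AF_INET, AF_INET6
#include <string>

// Hypothetical helper: textual form of an IPv4 or IPv6 socket address,
// or an empty string for any other address family or on failure.
std::string address_to_string(sockaddr const *sa)
{
    char buf[INET6_ADDRSTRLEN] = {};  // large enough for either family
    void const *src = nullptr;

    if (sa->sa_family == AF_INET)
        src = &reinterpret_cast<sockaddr_in const *>(sa)->sin_addr;
    else if (sa->sa_family == AF_INET6)
        src = &reinterpret_cast<sockaddr_in6 const *>(sa)->sin6_addr;
    else
        return {};

    // inet_ntop reads the address bytes in network order, so the caller
    // never has to think about the host's endianness.
    return inet_ntop(sa->sa_family, src, buf, sizeof buf) ? std::string(buf)
                                                          : std::string();
}
```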

Frederick Virchanza Gotham

Feb 13, 2023, 3:41:28 AM

On Monday, February 13, 2023 at 7:54:03 AM UTC, Paavo Helde wrote:
>
> Also, do not forget to also support IPv6, this is a must nowadays.


I have IPv6 disabled on my laptop.

Frederick Virchanza Gotham

Feb 13, 2023, 3:47:04 AM

On Monday, February 13, 2023 at 7:54:03 AM UTC, Paavo Helde wrote:
>
> So read a little bit more documentation and use the appropriate
> conversion functions as red floyd suggested.


I don't see anywhere in the documentation that the IP address will be stored in the way they store it.

Frederick Virchanza Gotham

Feb 13, 2023, 3:53:02 AM

On Monday, February 13, 2023 at 5:22:43 AM UTC, red floyd wrote:
>
> htonl() to go from native to network format, or ntohl() to go from
> network to native.


On this page:

https://linux.die.net/man/3/htonl

It says that those two functions take a 'uint32_t' -- not an unsigned long.

Paavo Helde

Feb 13, 2023, 4:53:24 AM

An IPv4 address is 32 bits.

In case you haven't noticed, this protocol is used for communication
between different computers, so it cannot depend on how 'unsigned long'
might be defined by some particular C implementation in some particular
OS on some particular computer.


Kenny McCormack

Feb 13, 2023, 5:14:17 AM

In article <656fa1e3-a88d-4ddf...@googlegroups.com>,
Frederick Virchanza Gotham <cauldwel...@gmail.com> wrote:
>On Monday, February 13, 2023 at 7:54:03 AM UTC, Paavo Helde wrote:
>>
>> Also, just forget to also support IPv6, this is a must avoid nowadays.
>
>
>I have IPv6 disabled on my laptop.

Smart man.

--
People sleep peaceably in their beds at night only because rough
men stand ready to do violence on their behalf.

George Orwell

Bonita Montero

Feb 13, 2023, 5:27:27 AM

Don't program sockets yourself, use Boost.Asio.
That is much less work and you get more efficient code.

Frederick Virchanza Gotham

Feb 13, 2023, 5:53:19 AM

On Monday, February 13, 2023 at 9:53:24 AM UTC, Paavo Helde wrote:

> > It says that those two functions take a 'uint32_t' -- not an unsigned long.
> An IPv4 address is 32 bits.
>
> In case you haven't noticed, this protocol is used for communication
> between different computers, so it cannot depend on how 'unsigned long'
> might be defined by some particular C implementation in some particular
> OS on some particular computer.


The documentation says that the IP address is stored in network-byte order inside an unsigned long. On systems where long is 8 bytes, this would mean:

[0][0][0][0][a][b][c][d]

However that is not how they store it. They do:

[a][b][c][d][0][0][0][0]
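
The layout described above can be reproduced without any sockets. This is only a sketch, assuming POSIX htonl(); the helper names widen and first_four_bytes are hypothetical. It stores a network-order 32-bit value in an unsigned long and copies out the first four bytes as they sit in memory.

```cpp
#include <arpa/inet.h>  // htonl (POSIX)
#include <array>
#include <cstdint>
#include <cstring>

// Hypothetical helper: put a network-order IPv4 value into an unsigned long.
// The value is zero-extended if unsigned long is wider than 32 bits.
unsigned long widen(std::uint32_t host_order_ip)
{
    return htonl(host_order_ip);
}

// Hypothetical helper: the first four bytes of an unsigned long, in memory order.
std::array<unsigned char, 4> first_four_bytes(unsigned long value)
{
    static_assert(sizeof(unsigned long) >= 4, "sketch assumes 8-bit bytes");
    std::array<unsigned char, 4> out{};
    std::memcpy(out.data(), &value, out.size());
    return out;
}
```

On a little-endian host with an 8-byte unsigned long, first_four_bytes(widen(0xC0A80101)) holds 192, 168, 1, 1 and the other four bytes of the object are zero, while narrowing the widened value back to uint32_t recovers the network-order number unchanged.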

David Brown

Feb 13, 2023, 6:41:05 AM

On 13/02/2023 00:40, Frederick Virchanza Gotham wrote:

>
> So on Linux on a 64-Bit x86 CPU, which is Little Endian, the 'long' type is 64-Bit, and it stores an IP address in 's_addr' as bytes as follows:
>
> [192][168][1][1][0][0][0][0]
>
> So they've sort of stored a 32-Bit number as Big Endian inside the lower 32 Bits of a 64-Bit Little Endian unsigned long. I suppose the cool thing about this is that they can do:
>
> uint32_t my_ip = s_addr;
>
> And now 'my_ip' will have the actual IP address instead of the four zeroed-out bytes.
>
> Still though it's a bit mad.

It is not "mad" - the problem is that you are trying to think in terms
of numbers and integers, instead of the data itself. The term
"endianness" does not really apply here - the IP address is a sequence
of 4 octets, not an integer. The "long unsigned" is not a number, it's
just a storage unit for holding the four octets in one lump. The size
doesn't matter as long as it is big enough - I expect the use of "long
unsigned" comes from a history that stretches back to support for the
same code on 16-bit int systems.


Öö Tiib

Feb 13, 2023, 8:21:42 AM

The unsigned long you have is simply convertible into uint32_t without
overflow (as its value does not exceed UINT32_MAX, or
std::numeric_limits<uint32_t>::max(), whichever you prefer).
If you do so and pass it to ntohl then you get uint32_t that has expected
value. If you do your own byte gymnastics then you get your own
results. Such is life.

Paavo Helde

Feb 13, 2023, 8:53:18 AM

13.02.2023 12:53 Frederick Virchanza Gotham kirjutas:
> On Monday, February 13, 2023 at 9:53:24 AM UTC, Paavo Helde wrote:
>
>>> It says that those two functions take a 'uint32_t' -- not an unsigned long.
>> An IPv4 address is 32 bits.
>>
>> In case you haven't noticed, this protocol is used for communication
>> between different computers, so it cannot depend on how 'unsigned long'
>> might be defined by some particular C implementation in some particular
>> OS on some particular computer.
>
>
> The documentation says that the IP address is stored in network-byte order inside an unsigned long.

No, it doesn't. The documentation says the IP address is stored in
network-byte order in 4 octets, or in a 32-bit field.

Whenever you see a documentation page saying the IPv4 address is stored
in an unsigned long, let them know they need to fix their page.


Scott Lurndal

Feb 13, 2023, 9:55:56 AM

Chris Vine

Feb 13, 2023, 3:23:03 PM

On Sunday, 12 February 2023 at 23:40:16 UTC, Frederick Virchanza Gotham wrote:
[snip]
> So first I wrote code like this:
>
> char unsigned const *p = static_cast<char unsigned const*>(static_cast<void const*>(&s_addr));
> cout << static_cast<unsigned>(p[0]) << "." << static_cast<unsigned>(p[1]) << "." << static_cast<unsigned>(p[2]) << "." << static_cast<unsigned>(p[3]);

On a point not relating to your question about endianness, C++ being what it is, this appears to have undefined behaviour. This is not because it breaches the strict aliasing rules (it doesn't by virtue of [basic.lval]/11), but because of the rules on pointer arithmetic.

This is because the array subscript operator implies arithmetic: according to [expr.sub]/1 "the expression E1[E2] is identical (by definition) to *((E1)+(E2))". However, pointer arithmetic is only allowed on pointers pointing into arrays, and only within the range of the array, unless E1 is a null pointer and E2 is 0 ([expr.add]/4). Here, s_addr is required to be composed of contiguous bytes but this may take the form of a 32-bit scalar rather than formally of an array of unsigned char meeting the definition in [dcl.array].

This seems an oversight in the C++ standard, or at least poor drafting of [expr.add]/4 with respect to its spraying about of undefined behaviour on pointer arithmetic. Leaving aside endianness, this usage is reasonable, is widely employed in practice and has defined behaviour in C. The up side is that it seems highly improbable that any compiler is going to do other than what you expect.

Siri Cruise

Feb 13, 2023, 9:10:33 PM

In article
<d8ea59a5-9f2a-4a26...@googlegroups.com>,
Frederick Virchanza Gotham <cauldwel...@gmail.com> wrote:

> Anyway, at one point I had to deal with the IP address held inside a "struct
> sockaddr_in". This struct has a member called "sin_addr" which has a member
> called "s_addr". The 32-Bit unsigned number that I'm looking for is inside
> the integer variable "s_addr". I checked the Linux manual and it says that
> 's_addr' is always in Network Byte Order (i.e. Big Endian), even on modern
> desktop PC's running MS-Windows which are all Little Endian.

Once upon a time there were functions like ntohs, htons, etc. for
converting bytes in network order to host order and vice versa
[wike wersa] for various size ints.

They were also useful for things like binary files: choose
network or host order for external media and use these.

Then you never have to worry about 'endianness'. Just
consistently use network or host order.

--
:-<> Siri Seal of Disavowal #000-001. Disavowed. Denied. Deleted. @
'I desire mercy, not sacrifice.' /|\
Discordia: not just a religion but also a parody. This post / \
I am an Andrea Chen sockpuppet. insults Islam. Mohammed

James Kuyper

Feb 14, 2023, 2:54:41 AM

"For any object (other than a potentially-overlapping subobject) of
trivially copyable type T, whether or not the object holds a valid value
of type T, the underlying bytes (6.7.1) making up the object can be
copied into an array of char, unsigned char, or std::byte (17.2.1).36"
6.8p2.

The "36" at the end of that citation references footnote 36, which says:

"By using, for example, the library functions (16.5.1.2) std::memcpy or
std::memmove."

I think it's clear that s_addr has a trivially copyable type.

Note that it does not say that those functions are the only ones that
could be used, only that they are examples of how it could be done. I
take that to mean that any code which has defined behavior equivalent to
that of std::memcpy() could also be used. The behavior of std::memcpy()
is not defined in the C++ standard itself, but only by cross-reference
to the definition of memcpy() in the C standard:

"The memcpy function copies n characters from the object pointed to by
s2 into the object pointed to by s1." (C standard, 7.24.2.1p2).

I therefore conclude that any user-defined function with the same
argument list as memcpy() which contained the obvious implementation of
that function would also be allowed. Keep in mind that memcpy() has no
ways of knowing what the actual type of the data it is accessing is - it
takes a const void* argument. And that implies that it is safe to access
those bytes as characters, even if you don't copy them.
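
For concreteness, the "obvious implementation" referred to above might look like the sketch below. The name copy_bytes is hypothetical, and whether such user code is formally blessed by the C++ standard is exactly what this thread is debating. It uses only single-step increments on the byte pointers.

```cpp
#include <cstddef>

// Hypothetical user-side equivalent of memcpy(); the regions must not overlap.
void *copy_bytes(void *dst, void const *src, std::size_t n)
{
    auto *d = static_cast<unsigned char *>(dst);
    auto const *s = static_cast<unsigned char const *>(src);
    while (n--)
        *d++ = *s++;  // one byte at a time, no p + k arithmetic
    return dst;
}
```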

Chris Vine

Feb 14, 2023, 6:57:43 AM

It is true that [intro.object]/8 provides that "an object of trivially copyable or standard-layout type (6.8) shall occupy contiguous bytes of storage", and [basic.types]/2 provides that in the case of a trivially copyable type such contiguous bytes may be copied by memcpy(), but I don't agree that the fact that the C++ standard imports memcpy() from C in order to do so means that pointer arithmetic necessary to implement a user-side version of the function is automatically validated. You could view memcpy() as a built-in black box which the compiler must in some way known only to itself provide in order to comply with [basic.types]/2.

But we don't need to argue about that because I think that you can implement memcpy() in C++ by relying on the fact that any single object (including any individual byte) can be treated as an array of one element for pointer arithmetic purposes ([basic.compound]/3), so you can validly increment a pointer to any one byte to one past the byte, which in the case of the contiguous bytes of a trivially copyable type will also be a pointer to the next byte, and progress in that way. So the OP's example could it seems to me, by making the pointer non-const, have been written in a way compliant with [expr.add]/4 as:

cout << static_cast<unsigned>(*p++) << "." << static_cast<unsigned>(*p++) << "." << static_cast<unsigned>(*p++) << "." << static_cast<unsigned>(*p++);

But this does not mean that expressions like (p+2) or (p-1) are valid in C++. It seems to me to be unarguable that those fail to comply with [expr.add]/4 except where p points into an actual array and the result is within its range. This means that for example various byte swapping idioms for scalars common in C are not valid in C++. That seems to me to be a fault with the C++ standard.
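
A byte swap that sidesteps the pointer-arithmetic question entirely works on the value rather than on its storage. This is a sketch with a hypothetical helper name, swap_bytes, not something proposed in the thread.

```cpp
#include <cstdint>

// Hypothetical helper: reverse the byte order of a 32-bit value using only
// shifts and masks on the value itself, never a pointer into its storage.
constexpr std::uint32_t swap_bytes(std::uint32_t v)
{
    return ((v & 0x000000FFu) << 24) |
           ((v & 0x0000FF00u) <<  8) |
           ((v & 0x00FF0000u) >>  8) |
           ((v & 0xFF000000u) >> 24);
}

// 192.168.1.1 as a number, with its four octets reversed.
static_assert(swap_bytes(0xC0A80101u) == 0x0101A8C0u, "octets reversed");
```

Because it never inspects the object representation, the result is the same on every host, whatever its endianness.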

James Kuyper

Feb 14, 2023, 6:16:11 PM

On 2/14/23 06:57, Chris Vine wrote:
> On Tuesday, 14 February 2023 at 07:54:41 UTC, James Kuyper wrote:
...
>> "For any object (other than a potentially-overlapping subobject) of
>> trivially copyable type T, whether or not the object holds a valid
>> value of type T, the underlying bytes (6.7.1) making up the object
>> can be copied into an array of char, unsigned char, or std::byte
>> (17.2.1).36" 6.8p2.
>> The "36" at the end of that citation references footnote 36, which says:
>> "By using, for example, the library functions (16.5.1.2) std::memcpy
>> or std::memmove."
...
> It is true that [intro.object]/8 provides that "an object of trivially
> copyable or standard-layout type (6.8) shall occupy contiguous bytes
> of storage", and [basic.types]/2 provides that in the case of a
> trivially copyable type such contiguous bytes may be copied by
> memcpy(), but I don't agree that the fact that the C++ standard
> imports memcpy() from C in order to do so means that pointer
> arithmetic necessary to implement a user-side version of the function
> is automatically validated. You could view memcpy() as a built-in
> black box which the compiler must in some way known only to itself
> provide in order to comply with [basic.types]/2.

If that were the case, the normative text would mandate the use of
std::memcpy() or std::memmove(). It doesn't. It merely specifies "the
underlying bytes (6.7.1) making up the object can be copied into an
array of char, unsigned char, or std::byte", without specifying the
methods that can be used to achieve that result. memcpy and memmove are
mentioned only non-normatively as ways that you could cause such a copy
to occur, without specifying that they are the only ways.

> But we don't need to argue about that because I think that you can
> implement memcpy() in C++ by relying on the fact that any single
> object (including any individual byte) can be treated as an array of
> one element for pointer arithmetic purposes ([basic.compound]/3), so
> you can validly increment a pointer to any one byte to one past the
> byte, which in the case of the contiguous bytes of a trivially
> copyable type will also be a pointer to the next byte, and progress in
> that way.

I don't agree that this is necessary in order for the code to have
well-defined behavior. However, since memcpy() and memmove() can be
implemented using exclusively increment or decrement operations, and the
same can be done in user code, I don't see this as an argument
preventing user code from copying the object byte-by-byte.

Chris Vine

Feb 14, 2023, 7:45:33 PM

We are agreed on your last sentence. I have been of the view that user code
can copy a trivially copyable object byte-by-byte, for the reasons I gave.
However, on rereading your posts, I am unclear whether you maintain that
the OP's original code has defined behaviour: that is, whether or not it falls
foul of [expr.add]/4 (the rules concerning pointer arithmetic).

If you think the code doesn't have undefined behaviour, is that on the basis
that because

auto x = p[2];

achieves the same result as an incrementing version

unsigned char const* tmp = p; ++tmp; ++tmp;
auto x = *tmp;

the former must be treated as OK whatever [expr.add]/4 may on the face of
it say to the contrary?

(Here 'p' is the pointer p in the OP's original code.)

Vir Campestris

Feb 15, 2023, 12:39:10 PM

On 13/02/2023 10:53, Frederick Virchanza Gotham wrote:
> The documentation says that the IP address is stored in network-byte order inside an unsigned long. On systems where long is 8 bytes, this would mean:
>
> [0][0][0][0][a][b][c][d]
>
> However that is not how they store it. They do:
>
> [a][b][c][d][0][0][0][0]

The address _is_ in network order. It's just it's not in the word where
you expected it. I'd expect to be able to make a union with unsigned
char[4] and see the data correctly - and this mapping does that.

Andy

Richard Damon

Feb 15, 2023, 7:53:53 PM

And I would expect that the description of the data being stored in an
"unsigned long" within the packet structure is likely incorrect, and an
anachronism from when long was a synonym for a 32-bit number. The packet
structure likely actually matches how the bytes are presented on the
wire, and thus the IP address is stored in a 4 byte unsigned value. Of
course putting it into an unsigned long when outside the packet
structure works, but just wastes some memory.

This documentation predates the creation of uint32_t, which is likely
what would be used if it was decided to rewrite the documentation today.


Of course, the short/long <-> s/l in the function names becomes a bit of
a disconnect at that point.

Scott Lurndal

Feb 15, 2023, 10:07:31 PM

I think that would be very unlikely. IPv4 addresses aren't (and never
were) the only form of network endpoint addressing on ethernet or any other
transport (IPv6, X.25, BNA, SNA, DECnet, UDS). Since bind(2) can bind
to IPv4, IPv6 and unix domain socket addresses, the address field by its very
nature must be variable length.

Richard Damon

Feb 15, 2023, 10:35:48 PM

But the OP is talking about looking into the structure format that is
specifically IPv4. (sockaddr_in).

Yes, a generic sockaddr structure needs to store all sorts of stuff for
all types of interfaces, and if I understand it right, it stores IPv4
addresses the way they are stored in sockaddr_in.