
Does this memory access yield undefined behaviour?


Matthias Hofmann

Apr 1, 2009, 3:17:13 AM
Hello everyone,

please have a look at the following code, which is a stripped-down version
of a 24-bit graphics routine I am working on:

int main()
{
    const int size = 12;
    const int width = 4;

    int red = 0x00ff0000;
    int green = 0x0000ff00;
    int blue = 0x000000ff;

    char* pc = new char[size];

    int* pi = ( int* ) pc;

    for ( int i = 0; i < width; i++ )
    {
        *pi &= 0xff000000;
        *pi |= red | green | blue;

        pi = ( int* ) ( ( char* ) pi + 3 );
    }

    delete [] pc;

    return 0;
}

The problem is that there is no 24 bit data type in C++, so I am using a 32
bit data type (my original code uses a Windows DWORD, which is defined as
unsigned long, but I am using int in my example for simplicity) to iterate
through a scanline of 24 bit pixels.

So in the last iteration of the loop, only the first 3 of the 4 bytes that
'pi' points to belong to the allocated array. In the line

*pi &= 0xff000000;

these first 3 bytes are set to 0, while the last byte, which is one past the
end of the array, is left unchanged. Similarly, in the line

*pi |= red | green | blue;

the first 3 bytes that 'pi' points to are set to 255, while the last byte is
left unchanged again.

Now I wonder whether my code yields undefined behaviour?

--
Matthias Hofmann
Anvil-Soft, CEO
http://www.anvil-soft.com - The Creators of Toilet Tycoon
http://www.anvil-soft.de - Die Macher des Klomanagers

[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

SG

Apr 1, 2009, 9:32:40 AM
On 1 Apr., 09:17, "Matthias Hofmann" <hofm...@anvil-soft.com> wrote:
> Hello everyone,
>
> please have a look at the following code, which is a stripped-down version
> of a 24 bit graphics routine I am working at:
>
> int main()
> {
> const int size = 12;
> const int width = 4;
> int red = 0x00ff0000;
> int green = 0x0000ff00;
> int blue = 0x000000ff;
> char* pc = new char[size];
> int* pi = ( int* ) pc;
> for ( int i = 0; i < width; i++ )
> {
> *pi &= 0xff000000;
> *pi |= red | green | blue;
> pi = ( int* ) ( ( char* ) pi + 3 );
> }
> delete [] pc;
> return 0;
> }
> [...]

> Now I wonder whether my code yields undefined behaviour?

I don't know whether it qualifies as "undefined behaviour" or just
"implementation defined". Apart from the obvious endianness issues and
the fact that an int might only have 16 bits, you have alignment
problems. On some machines a pointer to an int could be restricted to
point at "even addresses" only (for example).

What you *can* do is use memcpy:

std::vector<unsigned char> blah (12);
unsigned k = 0;
std::memcpy(&k,&blah[0],3);

provided that "unsigned" has at least 24 bits, a character has exactly
8 bits and the integer's layout is little endian (as on an x86-based
machine).

Check out the <climits> header file and its macros UINT_MAX and
ULONG_MAX and pick the first type of {unsigned, unsigned long} where
xxx_MAX >= 16777215. Well, according to the standard ULONG_MAX is
guaranteed to be at least 2^32-1. So, you only have to check UINT_MAX
like this:

#include <climits>

#if UINT_MAX >= 16777215
typedef unsigned pixel24_type;
#else
typedef unsigned long pixel24_type;
#endif

If you want your code to be endian-safe you can assemble your 24-bit
ints by hand. If that turns out to be too slow on "little-endian + 8bit
chars" machines you can still use std::memcpy for those machines.

Prefer the use of *unsigned* characters. If you use plain chars and
they turn out to be signed then you run into implementation-defined
behaviour when writing something like this:

char c = (some_24bit_color_code) & 0xFF;

The rhs could result in a value of, say, 160. An assignment to a
signed integer with a value outside of the range of possible values
for that integer is implementation-defined. So, in case "char" is
signed and has 8 bits you don't really know what the value of 'c' is
going to be. For *unsigned* integers as target the assignment obeys a
simple rule.

So, assembling your integers would look like this:

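inline pixel24_type uchar2pixel(unsigned char* p)
{
    // Convert before shifting: p[i] would otherwise promote to int,
    // which may have only 16 bits.
    return static_cast<pixel24_type>(p[0])
         | (static_cast<pixel24_type>(p[1]) << 8)
         | (static_cast<pixel24_type>(p[2]) << 16);
}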

and if you're ultra paranoid and fear characters with more than 8
bits:

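inline pixel24_type uchar2pixel(unsigned char* p)
{
    // Mask to 8 bits, then shift in a type that has at least 24 bits.
    return (static_cast<pixel24_type>(p[0]) & 0xFFu)
         | ((static_cast<pixel24_type>(p[1]) & 0xFFu) << 8)
         | ((static_cast<pixel24_type>(p[2]) & 0xFFu) << 16);
}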


See http://home.att.net/~jackklein/c/inttypes.html


Cheers!
SG


--

Goran

Apr 1, 2009, 9:31:52 AM
> *pi &= 0xff000000;
>
> these first 3 bytes are set to 0, while the last byte, which is one past the
> end of the array, is left unchanged.

It's misleading to say that this last byte is left unchanged -
that's orthogonal to the problem, which is that you are trying to access
it, even if it's only to read it, and it's not yours. That's a no-no.

Other than that, frankly, I see no purpose in the way you are trying
to do things. Did you measure, in an optimized build, a solution with
a simple 3-byte (packed) struct? And did it yield significantly worse
performance than or-ing? One is often hard pressed to beat the
optimizing compiler in bit-twiddling (and VC ain't half bad).
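
A minimal sketch of the 3-byte struct approach, not a measured solution.
The names Pixel24 and fill_pixels are hypothetical; the blue, green, red
member order is an assumption matching Windows 24-bit bitmaps, and the
static_assert (C++11) guards the assumption that the compiler adds no
padding:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical packed pixel: three bytes, no int-sized access anywhere.
struct Pixel24 {
    unsigned char b, g, r;
};

// Padding is permitted by the language, so check the size explicitly.
static_assert(sizeof(Pixel24) == 3, "this sketch assumes no padding");

// Fills `width` pixels of a scanline with one colour.
inline void fill_pixels(Pixel24* row, std::size_t width, Pixel24 colour)
{
    assert(row != nullptr || width == 0);
    for (std::size_t i = 0; i < width; ++i)
        row[i] = colour;   // plain struct copy, stays inside the array
}
```

Whether this is as fast as the or-ing version would still have to be
measured in an optimized build.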

HTH,
Goran.

--

salil

Apr 1, 2009, 9:31:41 AM
> int* pi = ( int* ) pc;
>
> pi = ( int* ) ( ( char* ) pi + 3 );
>
Hi,

pi is an integer pointer, and generally the implementation will have an
alignment requirement. So, on 32-bit systems, an integer typically needs
to be aligned on a 4-byte boundary. Since you are incrementing it by 3
bytes, on ARM systems it will result in a data abort, I guess.
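
To make the alignment point concrete, here is a small sketch (C++11 or
later, for alignof and std::uintptr_t). The helper name aligned_for_int is
made up for illustration, and it assumes that converting a pointer to
std::uintptr_t yields the numeric address, which is implementation-defined
but holds on mainstream platforms:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Returns true if the address `p` satisfies the alignment requirement of int.
inline bool aligned_for_int(const void* p)
{
    assert(p != nullptr);
    return reinterpret_cast<std::uintptr_t>(p) % alignof(int) == 0;
}
```

The pointer returned by new char[n] passes this check, while the same
pointer advanced by 3 bytes fails it wherever alignof(int) is 2 or 4.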

regards
Salil


--

Martin Bonner

Apr 1, 2009, 12:05:08 PM
On Apr 1, 8:17 am, "Matthias Hofmann" <hofm...@anvil-soft.com> wrote:
> Hello everyone,
>
> please have a look at the following code, which is a stripped-down version
> of a 24 bit graphics routine I am working at:
>
[snip bit twiddling via pointer to int]

> Now I wonder whether my code yields undefined behaviour?
Yes. Access via pointer to integer is undefined behaviour if the
underlying object actually has type array of char.

You will, in practice, blow up the second time through the loop on a
non-x86 machine (eg ARM or PowerPC).

On an x86 machine you will just suffer horrible performance problems
as it tries to fix up the unaligned reads and writes.

Given that the space off the end of the array is quite likely to be
used by the heap for book-keeping, it is entirely possible that your
non-atomic "read-don't modify-replace original" sequence will blow up
some neat lock-free heap management algorithm (if the library
atomically increments that byte in a different thread, you will
restore the old value and lose the change).

As others have said, does this actually make a significant difference
to performance?


--

Mathias Gaunard

Apr 1, 2009, 12:01:05 PM
On 1 avr, 09:17, "Matthias Hofmann" <hofm...@anvil-soft.com> wrote:

>
> The problem is that there is no 24 bit data type in C++, so I am using a 32
> bit data type (my original code uses a Windows DWORD, which is defined as
> unsigned long, but I am using int in my example for simplicity) to iterate
> through a scanline of 24 bit pixels.
>
> So in the last iteration of the loop, only the first 3 of the 4 bytes that
> 'pi' points to belong to the allocated array. In the line

> [...]


>
> Now I wonder whether my code yields undefined behaviour?

Yes, it does.
You're reading past the bounds of your array.

--

Pavel Minaev

Apr 1, 2009, 6:08:32 PM
On Apr 1, 9:05 am, Martin Bonner <martinfro...@yahoo.co.uk> wrote:
> Yes. Access via pointer to integer is undefined behaviour if the
> underlying object was actually type array of char.

Is it, however? int is POD, and therefore its lifetime begins as soon
as we have a properly aligned block of memory of suitable size, per
the Standard. I always took it to mean that any such suitable block
could have its address cast to int*, and used as such.

Of course, in this particular case, the block isn't properly aligned,
so it doesn't apply here. But in general?

joshua...@gmail.com

Apr 1, 2009, 9:19:44 PM
On Apr 1, 12:17 am, "Matthias Hofmann" <hofm...@anvil-soft.com> wrote:
> Hello everyone,
>
> please have a look at the following code, which is a stripped-down version
> of a 24 bit graphics routine I am working at:
[snip]

> Now I wonder whether my code yields undefined behaviour?

Let's see.

1- I think the standard says that you can cast from T* to void* and
back to T* and have that work for all types T. (This may also work for
char*. Does it? See C++03 3.9.2.4. It says that char* and void* shall
have the same representation and alignment constraints. Odd that
unsigned char* isn't mentioned here.) This doesn't work for T* to U*
and back to T* for all types T and U, specifically int*. This is
undefined behavior.

2- Alignment issues. reinterpret_cast<int*>(reinterpret_cast<char*>
(pi) + 3) may not be a valid int pointer. This is undefined behavior.
(Do note that new char[] and new unsigned char[] return memory which
is aligned for any type (assuming the allocation size is large enough
to hold the object), see C++ 5.3.4.10.)

3- You access 1 byte off the end of the array. This is undefined
behavior.

4- You access memory of the char array through an int pointer. Aka the
strict aliasing rule. See C++03 3.10.15. This is undefined behavior.
memcpy (?? and the other related <cstring> functions ??), union, and
placement new are options to get around this.

5- Endian issues with the bit twiddling. You might not get the results
you want. Implementation defined.

6- Assuming specific size of int. Implementation defined.
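
One way to sidestep points 2 to 4 above is to write each pixel one byte at
a time through an unsigned char pointer. A minimal sketch, assuming 8-bit
bytes and a blue, green, red byte order in memory (what the original int
writes would produce on a little-endian machine); the name fill_scanline
is hypothetical and not from the original code:

```cpp
#include <cassert>
#include <cstddef>

// Fills `width` packed 24-bit pixels with one colour given as 0x00RRGGBB.
// Every store is a single unsigned char, so there is no alignment
// requirement, no aliasing through int*, and no access past the buffer.
inline void fill_scanline(unsigned char* row, std::size_t width,
                          unsigned long colour)
{
    assert(row != nullptr || width == 0);
    for (std::size_t i = 0; i < width; ++i) {
        unsigned char* p = row + 3 * i;
        p[0] = static_cast<unsigned char>(colour & 0xFFu);          // blue
        p[1] = static_cast<unsigned char>((colour >> 8) & 0xFFu);   // green
        p[2] = static_cast<unsigned char>((colour >> 16) & 0xFFu);  // red
    }
}
```

unsigned long is used for the colour because it is guaranteed to have at
least 32 bits, which also takes care of point 6.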

On Apr 1, 3:08 pm, Pavel Minaev <int...@gmail.com> wrote:
> On Apr 1, 9:05 am, Martin Bonner <martinfro...@yahoo.co.uk> wrote:
>
> > Yes. Access via pointer to integer is undefined behaviour if the
> > underlying object was actually type array of char.
>
> Is it, however? int is POD, and therefore its lifetime begins as soon
> as we have a properly aligned block of memory of suitable size, per
> the Standard. I always took it to mean that any such suitable block
> could have its address cast to int*, and used as such.

This is my understanding. I may be incorrect.

//ex 1
#include <new>
using namespace std;
struct foo { int x; };
int main()
{
    char * c = new char[sizeof(foo)];
    foo * p = new (c) foo;
    p->x = 1;
    int y = p->x;
    p->~foo();
    delete[] c;
    return y;
}

//ex 2
#include <new>
using namespace std;
struct foo { int x; };
int main()
{
    char * c = new char[sizeof(foo)];
    foo * p = new (c) foo;
    p->x = 1;
    int y = p->x;
    // p->~foo();
    delete[] c;
    return y;
}

//ex 3
#include <new>
using namespace std;
struct foo { int x; };
int main()
{
    char * c = new char[sizeof(foo)];
    // foo * p = new (c) foo;
    foo * p = reinterpret_cast<foo*>(c);
    p->x = 1;
    int y = p->x;
    // p->~foo();
    delete[] c;
    return y;
}

Ex 1 is well formed and has no unspecified nor implementation defined
behavior. (Do I need to include <new>? Comeau complains if I don't.)

Ex 2 is well formed and has no unspecified nor implementation defined
behavior. It may have a leak. If foo was not POD, would it necessarily
have a leak?

Ex 3 has undefined behavior. It accesses a char[] through an int
pointer, and this violates the strict aliasing rule, see C++03
3.10.15.

I also don't know which objects we are allowed to placement-new other
objects into. Any suitably aligned object of sufficient size
(presumably no)? Properly aligned PODs of sufficient size? (How do we
know it's properly aligned besides implementation docs?) Just new'ed
char arrays and new'ed unsigned char arrays of sufficient size?

I don't fully understand what C++ 5.3.4.10 is saying. It mentions that
the difference between the result of the new expression and the
allocation function is an integer multiple of the most stringent
alignment type. It has a note explaining this allows the common idiom
of placement-newing objects into char arrays and unsigned char arrays.
I don't understand how the rules imply the note in parentheses.

On Apr 1, 3:08 pm, Pavel Minaev <int...@gmail.com> wrote:
> Of course, in this particular case, the block isn't properly aligned,
> so it doesn't apply here. But in general?

But it is suitably aligned. See C++ 5.3.4.10. The return of new char[]
is suitably aligned for any type (for an allocation size sufficient to
hold the object). (However, the op's cast to char*, increment by 3,
and cast back to int* is another story, which is still broken.)


--

litb

Apr 1, 2009, 9:16:09 PM
On 2 Apr., 00:08, Pavel Minaev <int...@gmail.com> wrote:
> On Apr 1, 9:05 am, Martin Bonner <martinfro...@yahoo.co.uk> wrote:
>
> > Yes. Access via pointer to integer is undefined behaviour if the
> > underlying object was actually type array of char.
>
> Is it, however? int is POD, and therefore its lifetime begins as soon
> as we have a properly aligned block of memory of suitable size, per
> the Standard. I always took it to mean that any such suitable block
> could have its address cast to int*, and used as such.
>
> Of course, in this particular case, the block isn't properly aligned,
> so it doesn't apply here. But in general?
>

It even is properly aligned. The pointer that a new-expression yields for
arrays of char and unsigned char is always correctly aligned for any
object that fits in the allocated block. And if we reuse the storage of
the char array for objects of type int, then the lifetime of the char
array ends, and the lifetime of the int starts :) I think it is in fact
this simple.

What I think is not guaranteed is that the pointer conversion to int*
works. But it pretty much follows from 3.9/2. However, I wish
the Standard had an explicit rule about that in the
reinterpret_cast cases, or somewhere else (like: pointer conversions
to/from pointers that have identical value representation and
alignment requirements won't change the pointer value). So far there
is no such rule I know of. But anyway, the worst case would be
unspecified behavior.

Pavel Minaev

Apr 2, 2009, 8:08:24 AM
On Apr 1, 6:19 pm, joshuamaur...@gmail.com wrote:
> 1- I think the standard says that you can cast from T* to void* and
> back to T* and have that work for all types T. (This may also work for
> char*. Does it? See C++03 3.9.2.4. It says that char* and void* shall
> have the same representation and alignment constraints. Odd that
> unsigned char* isn't mentioned here.) This doesn't work for T* to U*
> and back to T* for all types T and U, specifically int*. This is
> undefined behavior.
[snip]

> Ex 3 has undefined behavior. It accesses a char[] through an int
> pointer, and this violates the strict aliasing rule, see C++03
> 3.10.15.

Now this gets interesting. On one hand, we do indeed have 3.10
[basic.lval]/15, which is pretty clear in that it applies to all
types, including POD, and specifically int (since it speaks of
unsigned counterparts). On the other hand:

1.8[intro.object]: An object is a region of storage.

3.8[basic.life]: the lifetime of an array object or of an object of
POD type (3.9) starts as soon as storage with proper size and
alignment is obtained

If these two are combined, it seems to follow that any region of
storage of proper size and alignment for POD type T is effectively an
object of type T...

Pavel Minaev

Apr 2, 2009, 8:06:03 AM
On Apr 1, 6:16 pm, litb <Schaub-Johan...@web.de> wrote:
> > Of course, in this particular case, the block isn't properly aligned,
> > so it doesn't apply here. But in general?
>
> It even is properly aligned. The expression new evaluates to for char
> and unsigned char is always correctly aligned for any object that fits
> in the allocated block.

You misinterpret that requirement. It only applies if you take the
exact pointer value returned to you by operator new, and use that as
the address of a memory location to store your objects into. But once you
start doing pointer arithmetic on it, all bets are off. Simply put,
while the block of memory at "p = new char[sizeof(int) + 1]" is definitely
correctly aligned for int, the block of memory at "p + sizeof(int) -
1" need not be (and, in fact, isn't on many popular architectures).

> And if we reuse the storage of the char array,
> for objects of type int, then the lifetime of the char array ends, and
> the lifetime of the int starts :) I think it in fact is this simple.

The lifetime of the char array does not end. Since it is also a POD,
it lives for as long as the corresponding storage is there. The fact
that the same storage is also used for int is irrelevant.

> What i think is not granted is that the pointer conversion to int*
> works. But it pretty much follows from 3.9/2. However, i would wish
> the Standard would put an explicit rule about that into
> reinterpret_cast cases, or somewhere else (like, pointer conversions
> to/from pointers that have identical value representation and
> alignment requirements won't change the pointer value). So far there
> is no such rule i know of. But anyway, worst would be unspecified
> behavior.

I believe that for PODs you can always use the following well-defined
pattern (again, assuming all the alignment requirements are
satisfied):

static_cast<pod2_t*>(static_cast<void*>(&pod1));

litb

Apr 2, 2009, 4:04:14 PM
On 2 Apr., 14:06, Pavel Minaev <int...@gmail.com> wrote:
> On Apr 1, 6:16 pm, litb <Schaub-Johan...@web.de> wrote:
>
> > > Of course, in this particular case, the block isn't properly aligned,
> > > so it doesn't apply here. But in general?
>
> > It even is properly aligned. The expression new evaluates to for char
> > and unsigned char is always correctly aligned for any object that fits
> > in the allocated block.
>
> You misinterpret that requirement. It only applies if you take the
> exact pointer value returned to you by operator new, and use that as
> an address of memory location to store your objects into.

Yes, that is exactly what I said :) That statement was not intended to
include the cast to int* yet.

> But once you
> start doing pointer arithmetics on it, all bets are off. Simply put,
> while block of memory at "p = new char[sizeof(int) + 1]" is definitely
> correctly aligned for int, the block of memory at "p + sizeof(int) -
> 1" need not be (and, in fact, isn't on many popular architectures.
>

Yes, I agree with you. My answer was meant as a rejection of the quoted
text. I didn't want to say that the original poster's code was correct and
wasn't breaking any alignment rules.

> > And if we reuse the storage of the char array,
> > for objects of type int, then the lifetime of the char array ends, and
> > the lifetime of the int starts :) I think it in fact is this simple.
>
> The lifetime of the char array does not end. Since it is also a POD,
> it lives for as long as the corresponding storage is there. The fact
> that the same storage is also used for int is irrelevant.
>

I disagree with you. The Standard says the lifetime ends as soon as
the storage is reused. If we reuse it for other purposes, then the
lifetime of the previously living object ends exactly there,
regardless of whether we had a POD there originally or not. (We
don't need to call any destructors even if there is a non-trivial
destructor.)

> > What i think is not granted is that the pointer conversion to int*
> > works. But it pretty much follows from 3.9/2. However, i would wish
> > the Standard would put an explicit rule about that into
> > reinterpret_cast cases, or somewhere else (like, pointer conversions
> > to/from pointers that have identical value representation and
> > alignment requirements won't change the pointer value). So far there
> > is no such rule i know of. But anyway, worst would be unspecified
> > behavior.
>
> I believe that for PODs you can always use the following well-defined
> pattern (again, assuming all the alignment requirements are
> satisfied):
>
> static_cast<pod2_t*>(static_cast<void*>(&pod1));
>

I would like to get a reference to the appropriate Standard section
please. This looks interesting.

Arne Mertz

Apr 8, 2009, 9:51:12 AM
litb wrote:

>>> And if we reuse the storage of the char array,
>>> for objects of type int, then the lifetime of the char array ends, and
>>> the lifetime of the int starts :) I think it in fact is this simple.
>> The lifetime of the char array does not end. Since it is also a POD,
>> it lives for as long as the corresponding storage is there. The fact
>> that the same storage is also used for int is irrelevant.
>>
>
> I disagree with you. The Standard says the lifetime ends as soon as
> the storage is reused. If we reuse it for other purposes, then the
> lifetime of the previously living object ends exactly there
> (disregarding of whether we have got a POD there originally or not. We
> don't need to call any destructors even if there is a non-trivial
> destructor).
>

So as a conclusion, if we have two pointers of different types
pointing to the same storage location (e.g. as a result of a
reinterpret_cast) and actually access that storage using the pointers
alternately, we have two alternating lifetime cycles of objects of
the two types.
In the presence of concurrency, having pointers to the same storage
location in different threads/processes would mean it is
indeterminate whether the lifetime of that object has ended or not.
Sounds a bit like "Schrödinger's pointer".

greets
A

SG

Apr 8, 2009, 2:54:36 PM
On 8 Apr., 15:51, Arne Mertz <n...@arne-mertz.de> wrote:
> So as a conclusion, if we have two pointers of different Types
> pointing to the same storage location (e.g. as a result of an
> interpret_cast) and actually access that storage using the pointers
> alternately, we have two alternating lifetime-cycles of Objects of
> the two types.
> In the presence of concurrency, having pointers to the same storage
> location in different threads/processes would mean it was
> indeterminate if the lifetime of that object has ended or not.
> Sounds a bit like "Schrödingers pointer".

You don't need concurrent access to get weird behaviour. The
compiler is allowed to assume that a pointer T* and a pointer U* don't
point to the same memory location if T and U are different, unrelated
types and neither is char nor unsigned char (strict aliasing rule). Under
this assumption the compiler is allowed to reorder instructions. If you
violate this aliasing rule, instruction reorderings may become observable.
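
A common way to stay within the aliasing rule is to copy the bytes with
std::memcpy instead of dereferencing a converted pointer. A minimal
sketch; the function name load_unsigned is hypothetical, and the caller is
assumed to guarantee that sizeof(unsigned) bytes are readable at the given
address:

```cpp
#include <cassert>
#include <cstring>

// Reads an unsigned from an arbitrary (possibly unaligned) byte address.
// No unsigned* is ever formed that points into the byte buffer, so neither
// the alignment rules nor the strict aliasing rule are violated.
inline unsigned load_unsigned(const unsigned char* bytes)
{
    assert(bytes != nullptr);
    unsigned value;
    std::memcpy(&value, bytes, sizeof value);
    return value;
}
```

The value obtained still depends on the byte order of the machine, so this
addresses aliasing and alignment but not endianness.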

Cheers!
SG

Matthias Hofmann

Apr 9, 2009, 6:44:25 PM
"litb" <Schaub-...@web.de> wrote in message
news:7ae1bd49-c12e-4e9d...@c9g2000yqm.googlegroups.com...

>> I believe that for PODs you can always use the following well-defined
>> pattern (again, assuming all the alignment requirements are
>> satisfied):
>>
>> static_cast<pod2_t*>(static_cast<void*>(&pod1));
>>
>
> I would like to get a reference to the appropriate Standard section
> please. This looks interesting.

According to 3.9.2/4, a void pointer shall be able to hold any object
pointer. In addition, 4.10/2 says that a pointer to any object type can be
converted to a void pointer and that the result is a pointer to the start of
the storage location where the object resides. This is confirmed by 3.9.2/4,
which furthermore says that a pointer to void shall have the same
representation and alignment requirements as a pointer to char. The
conversion from pointer to object to pointer to void can be done implicitly
or using a static_cast (5.2.9/4).

As far as casting from pointer to void to pointer to object is concerned,
the standard only guarantees that casting from T* to void* and back to T*
will work (5.2.9/10). However, there is an exception for char, which is
scattered all over the standard; most of it can be found in 3.9 and has to
do with the basic properties of objects being storage regions. Also note
that there is confusion about whether unsigned char or just plain char
should be used to access memory, because plain char might be signed or
unsigned, and in the former case certain bit patterns might be adjusted
when the memory is accessed. The following two postings might be
interesting for you:

http://groups.google.com/group/comp.lang.c++.moderated/msg/81b1187f784d7274
http://groups.google.com/group/comp.lang.c++.moderated/msg/aa874558058b127c

Anyway, it seems like you can cast from a T* to a char* with the following
two convenience functions:

// Converts a pointer of any non-const
// type to a non-const char pointer.
inline char* char_ptr( void* p ) throw()
{ return static_cast<char*>( p ); }

// Converts a pointer of any constant
// type to a constant char pointer.
inline const char* char_ptr( const void* p ) throw()
{ return static_cast<const char*>( p ); }

--
Matthias Hofmann
Anvil-Soft, CEO
http://www.anvil-soft.com - The Creators of Toilet Tycoon
http://www.anvil-soft.de - Die Macher des Klomanagers

joshua...@gmail.com

Apr 9, 2009, 10:10:59 PM
On Apr 9, 3:44 pm, "Matthias Hofmann" <hofm...@anvil-soft.com> wrote:
> "litb" <Schaub-Johan...@web.de> wrote in message
> news:7ae1bd49-c12e-4e9d...@c9g2000yqm.googlegroups.com...
>
> >> I believe that for PODs you can always use the following well-defined
> >> pattern (again, assuming all the alignment requirements are
> >> satisfied):
>
> >> static_cast<pod2_t*>(static_cast<void*>(&pod1));
>
> > I would like to get a reference to the appropriate Standard section
> > please. This looks interesting.

Note that the example was for a pod object of type pod1_t, taking the
address, and casting to pod2_t*, a pointer to a different POD type.
This is not allowed in the general case. Your reply is as though the
example was a cast from pod_t* to void*, and then back to the same
pod_t*.


On Apr 9, 3:44 pm, "Matthias Hofmann" <hofm...@anvil-soft.com> wrote:
> As far as casting from pointer to void to pointer to object is concerned,
> the standard only guarantees that casting from T* to void and back to T*
> will work (5.2.9/10).

This is what I thought too. However, Alf P. Steinbach was kind enough
to correct me here:
http://groups.google.com/group/comp.lang.c++.moderated/msg/a6c8c48c140d668e

9.2/17
> A pointer to a POD-struct object, suitably converted using a
> reinterpret_cast, points to its initial member (or if that member is a
> bit-field, then to the unit in which it resides) and vice versa.


On Apr 9, 3:44 pm, "Matthias Hofmann" <hofm...@anvil-soft.com> wrote:
> However, an exception for char which is scattered all
> over the standard, but most of it can be found in 3.9 and has to do with the
> basic properties of objects being storage regions.

As Alf P. Steinbach noted when he corrected me, neither of us could
actually find any sort of explicit exception allowing taking a pointer
to POD, casting to char* (or unsigned char*), and then reading or
writing through the char*. However, it's quite heavily implied, enough
so I feel safe using it.

3.8/5 says that after memory allocation has occurred, but before
construction of a non-POD object in that memory, you may static_cast a
pointer to the memory to void*, then static_cast to char* and unsigned
char*. Presumably you are allowed to read and write through this char*
or unsigned char*.

3.9/2 strongly implies that we can take the address of any POD, cast
to char* or unsigned char*, and read or write through the char* or
unsigned char*. It uses memcpy in the example, and the note also
mentions memcpy and memmove. It does not state that you can read or
write to a POD through a char* or unsigned char*.

3.9.2/4 says a void* can point at any object, and strongly implies a
char* can point at any object by stating it has the same alignment
restrictions and representation as void*. (unsigned char* is absent.
An oversight I presume.)

3.10/15, the strict aliasing rule, says that accessing any object
through the wrong lvalue type results in undefined behavior, and lists
exceptions. Accessing through a char lvalue and an unsigned char lvalue
are listed as exceptions.

From this, I feel it is at least intended, if not explicitly allowed, to
do as follows: take a pointer to a POD object, cast it to char* or
unsigned char*, and read or write through the char* or unsigned char*.
Reads and writes to the POD object and through the char* or unsigned char*
may be done in any order. The bytes read through the char* or unsigned
char* are implementation defined. The result of reading from the POD
object after writing arbitrary bytes through the char* or unsigned
char* is specific to each implementation, possibly undefined behavior
(?? someone correct me here). For example, you might have just written
a trap representation of the POD type via the char*. Otherwise there
is no undefined behavior, such as alignment issues or aliasing
issues.
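
A small sketch of the read side of this pattern. The type Point and the
function byte_sum are hypothetical names used only for illustration, and
the sketch assumes that Point has no padding bytes (unspecified padding
would make the sum meaningless):

```cpp
#include <cassert>
#include <cstddef>

struct Point { int x; int y; };  // hypothetical POD type

// Walks the object representation through an unsigned char lvalue, which
// 3.10/15 lists as a permitted way to access any object.
inline unsigned long byte_sum(const void* object, std::size_t size)
{
    assert(object != nullptr);
    const unsigned char* bytes = static_cast<const unsigned char*>(object);
    unsigned long sum = 0;
    for (std::size_t i = 0; i < size; ++i)
        sum += bytes[i];
    return sum;
}
```

Which individual bytes hold which part of each int is implementation
defined, as noted above; only the sum over all bytes is used here so that
the result does not depend on byte order.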


On Apr 9, 3:44 pm, "Matthias Hofmann" <hofm...@anvil-soft.com> wrote:
> Also note that there is
> confusion whether unsigned char or just plain char should be used to access
> memory, because plain char might be signed or unsigned, and in the former
> case certain bit patterns might be adjusted when the memory is accessed. The
> following two postings might be interesting for you:

[snip]

I don't think there's confusion amongst knowledgeable programmers on
char* vs unsigned char*. I believe there is a common mistake in which
people use char* where char is a signed type, which can produce
interesting sign extension errors, whereas most knowledgeable
programmers will tell you to prefer unsigned char* for such things to
avoid sign extension errors.
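
The kind of error meant here can be shown with two otherwise identical
packing functions. This is only a sketch with hypothetical names
(pack_unsigned, pack_signed); it assumes 8-bit bytes and uses signed char
explicitly to stand in for a plain char that happens to be signed:

```cpp
#include <cassert>

// Packs three bytes into 0x00RRGGBB; each byte is in the range 0..255.
inline unsigned long pack_unsigned(const unsigned char* p)
{
    assert(p != nullptr);
    return static_cast<unsigned long>(p[0])
         | (static_cast<unsigned long>(p[1]) << 8)
         | (static_cast<unsigned long>(p[2]) << 16);
}

// The mistake: a byte with the top bit set is negative here, so converting
// it to unsigned long sets all the higher bits as well.
inline unsigned long pack_signed(const signed char* p)
{
    assert(p != nullptr);
    return static_cast<unsigned long>(p[0])
         | (static_cast<unsigned long>(p[1]) << 8)
         | (static_cast<unsigned long>(p[2]) << 16);
}
```

With a first byte of 0xA0 (which is -96 as a signed char), pack_unsigned
yields 0xA0 while pack_signed yields a value whose upper bits are all set.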


--
