conversions between "const char *" and "const unsigned char *"

Alan McKenney

Jul 29, 2007, 11:09:58 PM
In some code that we use, some of our libraries use "const char"
pointers to pass strings (arrays) of bytes and others use
"const unsigned char" pointers. Wherever the signedness of a byte
matters, the value of the individual byte is cast to the appropriate
type (unsigned char or char).


As far as I can tell, the only way that C++ allows one to convert
from one to the other is via "reinterpret_cast", so there are
reinterpret_casts whenever we have to pass data from one library
to another. E.g.,

    void f( const char *, size_t );
    ....
    void g( const unsigned char *str, size_t len ) {
        // ...
        f( reinterpret_cast<const char *>( str ), len );
        // ...
    }

So far, this has not caused us any problems, because on all
implementations we use (or that I know of), there's no difference
in how unsigned and (signed?) char are stored or in the format of
the pointers.
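
As a minimal sketch of the pattern described above: the two functions below stand in for the two libraries. Their names (sum_bytes, forward_bytes) and the byte-summing behaviour are hypothetical, chosen only to show the pointer cast at the boundary and the per-byte cast where signedness matters.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical "library A" routine: takes bytes as const char*.
// Signedness matters for the sum, so each byte is cast to unsigned char.
unsigned long sum_bytes(const char* data, std::size_t len) {
    unsigned long total = 0;
    for (std::size_t i = 0; i != len; ++i) {
        total += static_cast<unsigned char>(data[i]);
    }
    return total;
}

// Hypothetical "library B" routine: holds bytes as const unsigned char*
// and forwards them to library A through a reinterpret_cast.
unsigned long forward_bytes(const unsigned char* str, std::size_t len) {
    return sum_bytes(reinterpret_cast<const char*>(str), len);
}

// Small check: the bytes arrive unchanged on the other side of the cast.
void demo_forwarding() {
    const unsigned char bytes[] = {0x01, 0x7F, 0x80, 0xFF};
    assert(forward_bytes(bytes, sizeof bytes) == 511UL);
}
```

The per-byte static_cast to unsigned char is a value conversion that is well defined regardless of whether plain char is signed.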

1. Are there any (real) implementations where using the
reinterpret_cast this way will cause any problems?

2. Do the requirements of the C++ standard pretty much ensure that
what we are doing will work on any (non-pathological) implementation?

-- Alan McKenney


--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

Alberto Ganesh Barbati

Jul 30, 2007, 8:52:59 PM
Alan McKenney wrote:

>
> 1. Are there any (real) implementations where using the
> reinterpret_cast this way will cause any problems?
>
> 2. Do the requirements of the C++ standard pretty much ensure that
> what we are doing will work on any (non-pathological) implementation?
>

3.9.1/1 says "A char, a signed char, and an unsigned char occupy the
same amount of storage and have the same alignment requirements (3.9);
that is, they have the same object representation. For character types,
all bits of the object representation participate in the value
representation."

If I interpret this statement correctly, the answers to your questions
are, respectively, no and yes.
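
The size and alignment part of that quotation can be checked at compile time. This is only a sketch and assumes a C++11 or later compiler (static_assert and alignof did not exist in the C++ of 2007); demo_same_storage is a hypothetical helper name.

```cpp
#include <cassert>

// Compile-time checks of the storage guarantees quoted from 3.9.1/1.
static_assert(sizeof(char) == sizeof(unsigned char), "same storage size");
static_assert(sizeof(signed char) == sizeof(unsigned char), "same storage size");
static_assert(alignof(char) == alignof(unsigned char), "same alignment");
static_assert(alignof(signed char) == alignof(unsigned char), "same alignment");

// Hypothetical helper: reads one stored byte through a const char* and
// converts the value back to unsigned char for comparison.
void demo_same_storage() {
    const unsigned char byte = 0x41;
    const char* as_char = reinterpret_cast<const char*>(&byte);
    assert(static_cast<unsigned char>(*as_char) == byte);
}
```

These checks say nothing about the pointer types themselves, which is the point raised later in the thread.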

HTH,

Ganesh

Maciej Sobczak

Jul 31, 2007, 10:04:41 AM
On 31 Jul, 02:52, Alberto Ganesh Barbati <AlbertoBarb...@libero.it>
wrote:

> 3.9.1/1 says "A char, a signed char, and an unsigned char occupy the
> same amount of storage and have the same alignment requirements (3.9);
> that is, they have the same object representation. For character types,
> all bits of the object representation participate in the value
> representation."

And the next sentence in the same paragraph states that only for
unsigned char does every combination of bits form a valid value. This
will not cause any problems with pointers, but can be a potential problem
(on a very pathological platform, of course) when char is used to
*hold* a piece of another object.

--
Maciej Sobczak
http://www.msobczak.com/

Greg Herlihy

Jul 31, 2007, 10:08:49 AM
On 7/29/07 8:09 PM, in article
1185743129....@19g2000hsx.googlegroups.com, "Alan McKenney"
<alan_mc...@yahoo.com> wrote:

> As far as I can tell, the only way that C++ allows one to convert
> from one to the other is via "reinterpret_cast", so there are
> reinterpret_cast's
> whenever we have to pass data from one library to another. E.g.,
>
> f( const char *, size_t );
> ....
> g( const unsigned char *str, size_t len ) {
> // ...
> f( reinterpret_cast<const char *>( str ), len );
> // ...
> }
>

By my count, C++ offers (at least) three additional casts capable of
converting a const unsigned char pointer to a const char pointer. They are:

A static_cast:

f( static_cast<const char*>(static_cast<const void*>(str)), len);

A "C-style" cast:

f( (const char *) str, len );

and a functional cast:

typedef const char * ConstCharPtr;

f( ConstCharPtr( str ), len );

In my opinion, either one of the last two casts would likely be a more
appropriate choice for the cast in question than a reinterpret_cast. A
reinterpret_cast usually signifies a questionable or potentially dangerous
type conversion. Generally, a garden-variety signed-to-unsigned char pointer
conversion would not merit such a high level of concern (although
conceivably, it might).

At any rate, there is no "wrong" choice when selecting a cast in this
situation. None of these casts actually has any effect upon the program's
execution - and each one compiles away to nothing.
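
A small sketch of the four spellings side by side, checking that they name the same address on the implementation at hand. The function name demo_cast_spellings is hypothetical; the typedef mirrors the one in the post.

```cpp
#include <cassert>

typedef const char* ConstCharPtr;  // alias needed for the functional cast

// Applies each spelling of the conversion to the same source pointer.
void demo_cast_spellings() {
    const unsigned char data[] = {'a', 'b', 0};
    const unsigned char* str = data;

    const char* via_reinterpret = reinterpret_cast<const char*>(str);
    const char* via_static =
        static_cast<const char*>(static_cast<const void*>(str));
    const char* via_c_style = (const char*) str;
    const char* via_functional = ConstCharPtr(str);

    // All four results compare equal and read back the same first byte.
    assert(via_reinterpret == via_static);
    assert(via_reinterpret == via_c_style);
    assert(via_reinterpret == via_functional);
    assert(via_reinterpret[0] == 'a');
}
```

Equal results here show only what one compiler does; whether the standard requires it is what the rest of the thread debates.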

> f( const char *, size_t );
> ....
> g( const unsigned char *str, size_t len ) {
> // ...
> f( reinterpret_cast<const char *>( str ), len );
> // ...
> }

> 1. Are there any (real) implementations where using the
> reinterpret_cast
> this way will cause any problems?

How could it? The only C++ cast operator that performs actual work is
dynamic_cast<>. The other cast operators are purely language constructs -
whose only purpose is to make type conversions in C++ inconvenient and
unsightly - and thereby discourage their use in C++ programming.



> 2. Do the requirements of the C++ standard pretty much insure that
> what
> we are doing will work on any (non-pathological) implementation?

The cast "operators" themselves pose little risk to a C++ program. The
converted values themselves, on the other hand, do pose a potential danger.
The degree of risk depends on whether the converted values meet the
program's expectations for data of that type.

To illustrate: a C++ program might assume that all valid char values are
non-negative. Yet a large unsigned char value when converted to a signed
char representation acquires a negative value. So a valid char before the
conversion - becomes an invalid char (from the program's point of view)
after the conversion. Now the C++ cast operators can help a programmer
identify where in the source code, the program is performing type
conversions (and which of those conversions are the most likely to cause
problems). But it falls to the C++ programmer to ensure that every type
conversion undertaken by a C++ program - is safe.
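
To make the illustration concrete, here is a sketch that counts how many bytes look negative once they are viewed as signed char. It assumes a two's complement machine with 8-bit bytes; count_negative is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>

// Counts the bytes that read as negative through a signed char pointer.
// Assumes two's complement, where a set top bit means a negative value.
std::size_t count_negative(const unsigned char* bytes, std::size_t len) {
    const signed char* as_signed =
        reinterpret_cast<const signed char*>(bytes);
    std::size_t n = 0;
    for (std::size_t i = 0; i != len; ++i) {
        if (as_signed[i] < 0) {
            ++n;
        }
    }
    return n;
}

// 0xE9 and 0xFF have the top bit set; 0x41 and 0x7A do not.
void demo_negative_bytes() {
    const unsigned char text[] = {0x41, 0xE9, 0x7A, 0xFF};
    assert(count_negative(text, sizeof text) == 2);
}
```

Code that indexes a table with such a value, or compares it against a non-negative range, is where the surprise usually shows up.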

Greg

Roman.Pe...@gmail.com

Jul 31, 2007, 10:04:56 AM
On 31 Jul, 04:52, Alberto Ganesh Barbati <AlbertoBarb...@libero.it>
wrote:

> Alan McKenney wrote:
>
>
>
> > 1. Are there any (real) implementations where using the
> > reinterpret_cast
> > this way will cause any problems?
>
> > 2. Do the requirements of the C++ standard pretty much insure that
> > what
> > we are doing will work on any (non-pathological) implementation?
>
> 3.9.1/1 says "A char, a signed char, and an unsigned char occupy the
> same amount of storage and have the same alignment requirements (3.9);
> that is, they have the same object representation. For character types,
> all bits of the object representation participate in the value
> representation."
>
> If I interpret this statement correctly, the answers to your questions
> are, respectively, no and yes.
>
> HTH,
>
> Ganesh

I'm curious how one can access an object of type 'char' through a
pointer of type 'unsigned char *' without causing UB and without
relying on implementation-defined behaviour. I believe one should use
static_cast<unsigned char *>(static_cast<void*>(&c)) and should
not use reinterpret_cast. Is that correct? Can someone point to the
paragraph in the standard that makes the static_cast trick legal?

Roman Perepelitsa.

Tjark Weber

Jul 31, 2007, 10:11:28 AM
On Jul 31, 2:52 am, Alberto Ganesh Barbati <AlbertoBarb...@libero.it>
wrote:

> Alan McKenney wrote:
> > 1. Are there any (real) implementations where using the
> > reinterpret_cast this way will cause any problems?
>
> > 2. Do the requirements of the C++ standard pretty much ensure that
> > what we are doing will work on any (non-pathological)
> > implementation?
>
> 3.9.1/1 says "A char, a signed char, and an unsigned char occupy the
> same amount of storage and have the same alignment requirements (3.9);
> that is, they have the same object representation. For character types,
> all bits of the object representation participate in the value
> representation."
>
> If I interpret this statement correctly, the answers to your questions
> are, respectively, no and yes.

You're talking about char/unsigned char here, while the original
question is about *pointers* to (const) char/unsigned char.

The standard says little about conversions between pointers of
different types. The (only) relevant paragraph seems to be 5.2.10
(7): "A pointer to an object can be explicitly converted to a pointer
to an object of different type. Except that converting an rvalue of
type 'pointer to T1' to the type 'pointer to T2' [...] and back to its
original type yields the original pointer value, the result of such a
pointer conversion is unspecified."

In other words: the *only* thing that we know about

reinterpret_cast<const char *>( str )

is that

reinterpret_cast<const unsigned char *>( reinterpret_cast<const char *>( str ) )

will be equal to

str

Nothing more. In particular, we have *no idea* what will happen when
we dereference

reinterpret_cast<const char *>( str )

So the answer to Alan's second question, as far as I can see, seems to
be no.
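
The one guarantee quoted from 5.2.10 (7), the round trip, can be written as a check. This is a sketch; round_trip_preserved is a hypothetical name, and the check deliberately never dereferences the intermediate pointer.

```cpp
#include <cassert>

// Converts to const char* and back, then compares with the original.
// Only the round trip is examined; the intermediate pointer is not used.
bool round_trip_preserved(const unsigned char* str) {
    const char* there = reinterpret_cast<const char*>(str);
    const unsigned char* back =
        reinterpret_cast<const unsigned char*>(there);
    return back == str;
}

void demo_round_trip() {
    const unsigned char data[] = {1, 2, 3};
    assert(round_trip_preserved(data));
    assert(round_trip_preserved(data + 2));
}
```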

Tjark

Greg Herlihy

Jul 31, 2007, 7:04:53 PM


On 7/31/07 7:04 AM, in article
1185863623.6...@q75g2000hsh.googlegroups.com, "Maciej Sobczak"
<see.my....@gmail.com> wrote:

> On 31 Jul, 02:52, Alberto Ganesh Barbati <AlbertoBarb...@libero.it>
> wrote:
>
>> 3.9.1/1 says "A char, a signed char, and an unsigned char occupy the
>> same amount of storage and have the same alignment requirements (3.9);
>> that is, they have the same object representation. For character types,
>> all bits of the object representation participate in the value
>> representation."
>
> And the next sentence in the same paragraph states that only in
> unsigned char any combination of bits forms a valid value. This will
> not cause any problems with pointers, but can be a potential problem
> (on a very pathological platform, of course) when char is used to
> *hold* a piece of other object.

No, a char works just as well as an unsigned char when it comes to holding
the value of a (POD) object. §3.9.1/2 simply states that every bit pattern
of an unsigned char has to represent a number. Therefore an implementation
may choose to reserve certain bit patterns to represent values other than
numbers - whenever one of those bit patterns is interpreted as a char (or
signed char) value. But regardless of how the bits of a char value are
interpreted, §3.9/2 guarantees that the value of a POD object whose bytes
are copied into an array of chars (or into an array of unsigned chars) - and
back to the object - will retain its original value.

After all, the value of a byte in memory is not inherently signed or
unsigned - instead it is the operation which may be performed upon that
value that comes in two varieties: signed or unsigned. In other words,
"addition" really comprises two distinct operations: signed addition and
unsigned addition - just as "multiplication" may be either signed or
unsigned. So the signedness of a type in C++ is really just a directive to
the compiler - a directive that specifies which set (signed or unsigned) of
operations the compiler should choose from - whenever an operation is
performed upon that value. Since copying byte values involves no operations
upon the values of the bytes being copied themselves - it stands to reason
that whatever signedness a program may choose to attach to those values - is
irrelevant.
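
The §3.9/2 guarantee mentioned above can be sketched as a copy through a plain char buffer and back. The struct Sample and the function copy_through_chars are hypothetical names used only for this illustration.

```cpp
#include <cassert>
#include <cstring>

// A simple POD type used only for this illustration.
struct Sample {
    int id;
    double weight;
};

// Copies the object's bytes into an array of char and back again.
Sample copy_through_chars(const Sample& original) {
    char buffer[sizeof(Sample)];
    std::memcpy(buffer, &original, sizeof(Sample));

    Sample restored;
    std::memcpy(&restored, buffer, sizeof(Sample));
    return restored;
}

void demo_copy_through_chars() {
    Sample s;
    s.id = 7;
    s.weight = 2.5;
    const Sample r = copy_through_chars(s);
    assert(r.id == 7);
    assert(r.weight == 2.5);
}
```

The sketch uses memcpy rather than a loop over individual char objects, so it does not settle the question of reading each char value.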

Greg

Alan McKenney

Aug 1, 2007, 1:18:37 AM
On Jul 31, 10:11 am, Tjark Weber <tjark.we...@gmail.com> wrote:
> > Alan McKenney ha scritto:
> > > 1. Are there any (real) implementations where using the
> > > reinterpret_cast this way will cause any problems?
>
> > > 2. Do the requirements of the C++ standard pretty much insure that
> > > what we are doing will work on any (non-pathological)
> > > implementation?
>
> ... In particular, we have *no idea* what will happen when
> we dereference
> reinterpret_cast<const char *>( str )

[ where str is "const unsigned char*"]

> So the answer to Alan's second question, as far as I can see, seems to
> be no.
>
> Tjark

Well, by putting in "(non-pathological)" I was trying to go
beyond the trivial question of whether this was explicitly
allowed by the standard.

I think most readers of this group know that for general types S
and T, casting a T* to an S* with "reinterpret_cast" and then
dereferencing it is UB (which includes working as a naive user
might expect).

I'm guessing that an implementation which represented
"const char *" with the byte address of the value (or
first value for arrays) and "const unsigned char *"
with the one's complement of the byte address
would be standard-conformant. But I believe it
would be pathological -- I cannot imagine any reason
to do it other than to annoy people.

My question was whether the requirements of the standard
make it pointless for "const char *" and "const unsigned
char *" to have different representations, or some other
problem that would break our code.

By comparison, on a 64-bit word-addressed computer (e.g., Crays),
it makes perfect sense for "char" pointers to have a
different representation from "int" pointers: for char
pointers, you shift the word address left three bits and
use the last three to index within the word. (It's not the
only way to do it, but it's a reasonable one.)
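
The arithmetic behind that scheme can be sketched with plain integers. This is a toy model of the encoding described in the paragraph above, not the actual representation used by any Cray compiler; all three function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Toy model: a byte pointer is the word address shifted left three bits,
// with the low three bits selecting one of the 8 bytes in a 64-bit word.
std::uint64_t make_byte_pointer(std::uint64_t word_address,
                                unsigned byte_in_word) {
    return (word_address << 3) | (byte_in_word & 7u);
}

// Recovers the word address from the toy byte pointer.
std::uint64_t word_part(std::uint64_t byte_pointer) {
    return byte_pointer >> 3;
}

// Recovers the byte index within the word.
unsigned byte_part(std::uint64_t byte_pointer) {
    return static_cast<unsigned>(byte_pointer & 7u);
}

void demo_byte_pointer() {
    const std::uint64_t p = make_byte_pointer(0x1000, 5);
    assert(p == 0x8005);
    assert(word_part(p) == 0x1000);
    assert(byte_part(p) == 5);
}
```

Under such a scheme an "int" pointer and a "char" pointer to the same word hold different bit patterns, so converting between them has to do real work.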

On the other hand, at one point I convinced myself that there
is no value in having pointers to class objects of
different types have different representations, because
there are so many cases where a legal pointer to one type
can be converted to a legal pointer to another.


-- Alan McKenney

Alf P. Steinbach

Aug 1, 2007, 1:19:21 AM
* Roman.Pe...@gmail.com:

>>
> I'm curious how one can access object of type 'char' through
> pointer of type 'unsigned char *' without causing UB and without
> relying on implementation defined behaviour.

Copying values back and forth is well-defined by the requirement that
copying POD to char values and back yields original POD.

Doing arithmetic on the values formally involves implementation defined
behavior.

However, computers that don't use two's complement form for signed
integers are largely extinct, so in practice this issue is moot.


> I believe, one should use
> static_cast<unsigned char *>(static_cast<void*>(&c)) and should
> not use reinterpret_cast. Is it correct? Can someone point paragraph
> in standart which makes static_cast trick legal?

IMHO it's rather silly to introduce unneeded verbosity that reduces
clarity (so much that the author wonders whether it's formally correct).

--
A: Because it messes up the order in which people normally read text.
Q: Why is it such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?

Maciej Sobczak

Aug 1, 2007, 9:20:48 AM
On 1 Aug, 01:04, Greg Herlihy <gre...@pacbell.net> wrote:

> > And the next sentence in the same paragraph states that only in
> > unsigned char any combination of bits forms a valid value. This will
> > not cause any problems with pointers, but can be a potential problem
> > (on a very pathological platform, of course) when char is used to
> > *hold* a piece of other object.
>
> No, a char works just as well as an unsigned char when it comes to holding
> the value of a (POD) object.

What about referencing it (converting to r-value)?

> §3.9.1/2 simply states that every bit pattern
> of an unsigned char has to represent a number.

Yes. Nothing like this for char.

> Therefore an implementation
> may choose to reserve certain bit patterns to represent values other than
> numbers

Like for example *trap values*?
The way I read the standard makes me think that not every bit
combination is guaranteed to be a valid char value.

> But regardless of how the bits of a char value are
> interpreted, §3.9/2 guarantees that the value of a POD object whose bytes
> are copied into an array of chars (or into an array of unsigned chars) -
> and back to the object - will retain its original value.

Yes, but that does not involve reading the char object (converting it
to r-value). Note that the standard's examples use the memcpy function
for the copy, not an explicit for loop. Why?

I would say that the following:

    T t;
    unsigned char buf[sizeof(T)];
    unsigned char *pt = reinterpret_cast<unsigned char*>(&t);
    for (std::size_t i = 0; i != sizeof(T); ++i)
    {
        buf[i] = *(pt + i);
    }

is guaranteed to be valid, whereas the analogous version with char
instead of unsigned char is not (we are talking about permission for
pathology, of course).

However, the following:

T t;
char buf[sizeof(T)];
memcpy(buf, &t, sizeof(T));

(and back) is fine, since memcpy does not involve referencing the
particular char objects (please note that the interface for memcpy
does not use char* but void* and we can rely that it does "the right
thing" no matter what are the pathologies of char).

Is this over-paranoid, or just strict reading of the standard?

--
Maciej Sobczak
http://www.msobczak.com/

Martin Bonner

Aug 1, 2007, 4:24:39 PM
On Jul 31, 3:08 pm, Greg Herlihy <gre...@pacbell.net> wrote:
> On 7/29/07 8:09 PM, in article
> 1185743129.117776.88...@19g2000hsx.googlegroups.com, "Alan McKenney"
> <alan_mckenn...@yahoo.com> wrote:
> > 1. Are there any (real) implementations where using the
> > reinterpret_cast
> > this way will cause any problems?
>
> How could it? The only C++ cast operator that performs actual work is
> dynamic_cast<>. The other cast operators are purely language constructs -
> whose only purpose is to make type conversions in C++ inconvenient and
> unsightly - and thereby discourage their use in C++ programming.

That's not true. Consider:
struct A { int a; }; struct B { int b; };
struct Derived : A, B {};

A* pa = ...; B* pb = ...;

std::cout << pa << " " << static_cast<Derived*>(pa) << std::endl;
std::cout << pb << " " << static_cast<Derived*>(pb) << std::endl;

At least one of the static casts will change the actual numerical
value of the pointer.
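
A self-contained variant of that example, with the elided pointers filled in from a single Derived object (an assumption made only so that the sketch can run), might look like this:

```cpp
#include <cassert>

struct A { int a; };
struct B { int b; };
struct Derived : A, B {};

// Shows that static_cast between base and derived pointers can adjust
// the address when the base subobject is not at the start of the object.
void demo_pointer_adjustment() {
    Derived d;
    A* pa = &d;  // implicit derived-to-base conversion
    B* pb = &d;  // likewise, but to the second base

    // Both downcasts recover the address of the complete object.
    assert(static_cast<Derived*>(pa) == &d);
    assert(static_cast<Derived*>(pb) == &d);

    // The two base subobjects cannot share one address, so at least
    // one of the casts above had to change the raw pointer value.
    const void* raw_a = pa;
    const void* raw_b = pb;
    assert(raw_a != raw_b);
}
```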

Brian Neal

Aug 1, 2007, 8:37:11 PM
On Aug 1, 12:19 am, "Alf P. Steinbach" <al...@start.no> wrote:
> * Roman.Perepeli...@gmail.com:

> > I believe, one should use
> > static_cast<unsigned char *>(static_cast<void*>(&c)) and should
> > not use reinterpret_cast. Is it correct? Can someone point paragraph
> > in standart which makes static_cast trick legal?
>
> IMHO it's rather silly to introduce unneeded verbosity that reduces
> clarity (so much that the author wonders whether it's formally correct).
>

Sutter and Alexandrescu recommend something like this in item 92 of
their book "C++ Coding Standards" to avoid reinterpret_cast in this
situation:

T1* p1 = ...;
void* vp = p1;
T2* p2 = static_cast<T2*>(vp);

I've seen others just do this: T2* p2 = static_cast<T2*>((void*) p1);

It is frustrating that this question comes up all the time and we
never hear a definitive answer. In low level programming you
frequently have to cast char* to unsigned char*. In C you would just
(unsigned char*) cast it. In C++, no one seems to know what to do. I
have talked my colleagues out of reinterpret_cast and have gotten them
to do the Sutter/Alexandrescu advice. But it seems like a lot of extra
code to write.

I made a post a long time ago here asking for comments on a
pointer_cast<> template function that does the Sutter and Alexandrescu
technique to reduce typing, but got few responses. So I'll repeat it
here. :-)

template<typename T>
inline T pointer_cast(void* p)
{
    return static_cast<T>(p);
}

Usage:

char* cp = ...;
unsigned char* ucp = pointer_cast<unsigned char*>(cp);

(I had const and volatile variations too). At least the use of
pointer_cast makes my intent clear: I am trying to cast between
unrelated pointer types that I know "should" work (e.g. char* and
unsigned char*), and I am avoiding reinterpret_cast and C-style casts.
Comments?
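
The const variation is not shown in the post. One guess at how it could look is an overload taking const void*, sketched below together with the original; the overload and demo_pointer_cast are assumptions, not the poster's code.

```cpp
#include <cassert>

// The template from the post: goes through void* with static_cast.
template <typename T>
inline T pointer_cast(void* p) {
    return static_cast<T>(p);
}

// Guessed const variant: chosen when the argument points to const data.
// T must then itself be a pointer to const.
template <typename T>
inline T pointer_cast(const void* p) {
    return static_cast<T>(p);
}

void demo_pointer_cast() {
    char text[] = "abc";

    char* cp = text;
    unsigned char* ucp = pointer_cast<unsigned char*>(cp);

    const char* ccp = text;
    const unsigned char* cucp = pointer_cast<const unsigned char*>(ccp);

    assert(ucp[0] == 'a');
    assert(cucp == ucp);
}
```

Overload resolution prefers the void* version for non-const arguments, so only the matching body is instantiated in each call.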


Alf P. Steinbach

Aug 2, 2007, 1:11:06 PM
* Brian Neal:

> On Aug 1, 12:19 am, "Alf P. Steinbach" <al...@start.no> wrote:
>> * Roman.Perepeli...@gmail.com:
>>> I believe, one should use
>>> static_cast<unsigned char *>(static_cast<void*>(&c)) and should
>>> not use reinterpret_cast. Is it correct? Can someone point paragraph
>>> in standart which makes static_cast trick legal?
>> IMHO it's rather silly to introduce unneeded verbosity that reduces
>> clarity (so much that the author wonders whether it's formally correct).
>>
>
> Sutter and Alexandrescu recommend something like this in item 92 of
> their book "C++ Coding Standards" to avoid reinterpret_cast in this
> situation:
>
> T1* p1 = ...;
> void* vp = p1;
> T2* p2 = static_cast<T2*>(vp);

Without the context it's difficult to say whether this is reasonable.

It would need some compelling technical reason to avoid naming the cast
as a reinterpret_cast, and none is apparent.

I suspect that the recommendation's rationale, whatever it is, is not
equally endorsed by both authors, and furthermore that it's limited to
some very narrow context: not a general rule but a special case rule.


> I've seen others just do this: T2* p2 = static_cast<T2*>((void*) p1);
>
> It is frustrating that this question comes up all the time and we
> never hear a definitive answer.

What's the question?

The above construction as a general replacement for reinterpret_cast is
just wasteful muddling-the-waters verbosity.


> In low level programming you
> frequently have to cast char* to unsigned char*. In C you would just
> (unsigned char*) cast it. In C++, no one seems to know what to do.

On the contrary, in C++ you can also do

(unsigned char*)p

just as in C, and that is equivalent to (it /means/, in this context),

reinterpret_cast<unsigned char*>( p )

which is better because it makes it clear what the intention is, and can
be searched for.


> I
> have talked my colleagues out of reinterpret_cast and have gotten them
> to do the Sutter/Alexandrescu advice. But it seems like a lot of extra
> code to write.

Talk them back. ;-)


> I made a post a long time ago here asking for comments on a
> pointer_cast<> template function that does the Sutter and Alexandrescu
> technique to reduce typing, but got few responses. So I'll repeat it
> here. :-)
>
> template<typename T>
> inline T pointer_cast(void* p)
> {
> return static_cast<T>(p);
> }
>
> Usage:
>
> char* cp = ...;
> unsigned char* ucp = pointer_cast<unsigned char*>(cp);

That's what reinterpret_cast is for.


> (I had const and volatile variations too). At least the use of
> pointer_cast makes it clear my intent: I am trying to cast between
> unrelated pointer types that I know "should" work (e.g. char* and
> unsigned char*), and I am avoiding reinterpret_cast and C-style casts.
> Comments?

That's what reinterpret_cast is for. Mainly, that is. You can also use
reinterpret_cast to cast between pointers/refs to a POD struct type and
the type of its first element, with implementation-defined result.

§5.2.10/10 guarantees that

*reinterpret_cast<T*>( &x )

is an lvalue that refers to "the same object as the source lvalue, but
with a different type".

As far as I know no such guarantee exists for

*static_cast<T*>( static_cast<void*>( &x ) )

The only guarantee I know is that converting T* to void* and back to T*,
yields the original pointer value, per §5.2.9/10.

Using the static_cast is more verbose, it's far less clear, and AFAICS
it's also formally Undefined Behavior -- although in practice there's
not much room to wiggle here for a compiler.

In short, it's just silly as a replacement for reinterpret_cast.

There is one situation where conversion to void* is called for, and
that's when storing pointers to objects of different types in e.g. hash
table, used only for object identification. Then the cast to employ is
neither static_cast nor reinterpret_cast, but dynamic_cast, ensuring
that you get a pointer to the most derived sub-object. I'm not sure
whether that has anything to do with the Herb/Andrei recommendation?
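
A sketch of that object-identification use of dynamic_cast to void*: the class names and the identity_of helpers are hypothetical, and the classes are made polymorphic because dynamic_cast requires that.

```cpp
#include <cassert>

// Polymorphic bases, so that dynamic_cast is permitted on them.
struct Shape { virtual ~Shape() {} };
struct Named { virtual ~Named() {} };
struct Circle : Shape, Named {};

// dynamic_cast to (const) void* yields the address of the most derived
// object, whichever base subobject the pointer currently refers to.
const void* identity_of(const Shape* p) {
    return dynamic_cast<const void*>(p);
}
const void* identity_of(const Named* p) {
    return dynamic_cast<const void*>(p);
}

void demo_identity() {
    Circle c;
    Shape* ps = &c;
    Named* pn = &c;

    // Different base pointers, same identity.
    assert(identity_of(ps) == identity_of(pn));
    assert(identity_of(ps) == static_cast<const void*>(&c));
}
```

A key built this way is suitable for a hash table lookup because it does not depend on which base pointer the caller happened to hold.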

--
A: Because it messes up the order in which people normally read text.
Q: Why is it such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?


Francis Glassborow

Aug 2, 2007, 1:28:47 PM
Brian Neal wrote:

> (I had const and volatile variations too). At least the use of
> pointer_cast makes it clear my intent: I am trying to cast between
> unrelated pointer types that I know "should" work (e.g. char* and
> unsigned char*), and I am avoiding reinterpret_cast and C-style casts.
> Comments?
>
>

That is what reinterpret_cast<> is designed for. Now it would be
different if you were casting between types where the cast MUST work.
However in C++ there are very few cases where this might be true yet a
static_cast<> fails.


--
Note that robinton.demon.co.uk addresses are no longer valid.

Falk Tannhäuser

Aug 2, 2007, 1:30:14 PM
Brian Neal wrote:
> Sutter and Alexandrescu recommend something like this in item 92 of
> their book "C++ Coding Standards" to avoid reinterpret_cast in this
> situation:
>
> T1* p1 = ...;
> void* vp = p1;
> T2* p2 = static_cast<T2*>(vp);
>
> I've seen others just do this: T2* p2 = static_cast<T2*>((void*) p1);
>
> It is frustrating that this question comes up all the time and we
> never hear a definitive answer. In low level programming you
> frequently have to cast char* to unsigned char*. In C you would just
> (unsigned char*) cast it. In C++, no one seems to know what to do.

The definitive answer is that reinterpret_cast *is* the Right Thing for
this purpose :-), although I agree that the Standard's wording in
[expr.reinterpret.cast] could probably be clearer. (I believe there
already was a similar discussion on this subject in this newsgroup,
back in February 2006.)

While it is true that, concerning pointer reinterpret_casts, the
standard only says:
"A pointer to an object can be explicitly converted to a pointer to an
object of different type. Except that converting an rvalue of type
"pointer to T1" to the type "pointer to T2" (where T1 and T2 are object
types and where the alignment requirements of T2 are no stricter than
those of T1) and back to its original type yields the original pointer
value, the result of such a pointer conversion is unspecified.",

there is an important supplementary guarantee hidden a few paragraphs
later, when it comes to reference reinterpret_casts:
"An lvalue expression of type T1 can be cast to the type "reference to
T2" if an expression of type "pointer to T1" can be explicitly converted
to the type "pointer to T2" using a reinterpret_cast. That is, a
reference cast reinterpret_cast<T&>(x) has the same effect as the
conversion *reinterpret_cast<T*>(&x) with the built-in & and * operators
[...]. The result [...] refers to the same object as the source lvalue,
but with a different type. [...] No temporary is created, no copy is
made, and constructors or conversion functions are not called."

The most important part here is the "refers to the same object", from
which (together with the equivalence between reference and pointer
reinterpret_casts) we can infer that the result of a pointer
reinterpret_cast points to the same object as the source value.
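
The equivalence between the reference form and the pointer form can be sketched for the char / unsigned char case. The function name demo_reference_cast is hypothetical.

```cpp
#include <cassert>

// Compares reinterpret_cast<T&>(x) with *reinterpret_cast<T*>(&x)
// for T = unsigned char and x of type char.
void demo_reference_cast() {
    char c = 'x';

    unsigned char& as_ref = reinterpret_cast<unsigned char&>(c);
    unsigned char* as_ptr = reinterpret_cast<unsigned char*>(&c);

    // Both forms designate the same object.
    assert(&as_ref == as_ptr);

    // A write through the unsigned char lvalue is visible in c.
    as_ref = 'y';
    assert(c == 'y');
}
```

Access through unsigned char is one of the cases the aliasing rules explicitly permit, which is why the write above is allowed.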

Without further knowledge about the types of the source and destination
pointers, this still leaves us with undefined or unspecified behaviour
when we try to access an object of some type through an lvalue of a
different type, due to:
* different sizes and alignment requirements of the types,
* the fact that some bit patterns being legal object representation of a
value of one type may not correspond to a legal object representation of
another type,
* even if it does, the actual value is platform dependent (endian-ness...),
* aliasing rules allow the compiler to optimise accesses in such a
manner that modifications of an object through an lvalue of one type may
not always be "seen" through an lvalue of another type.

However, all these problems do exist no matter how the alias is obtained
(through reinterpret_cast, C-style cast, pointer conversion to void*
followed by static_cast to the target pointer type, unions ...). BTW,
C-style casts are equivalent to reinterpret_cast, possibly combined with
const_cast (unless they can be interpreted as static_cast, possibly
combined with const_cast, which is not the case for conversions between
"const char*" and "const unsigned char*"), as stated in [expr.cast].

In the special case of conversion between "const char*" and "const
unsigned char*", problems of size, alignment and aliasing don't occur.
There remains the potential problem of two different bit patterns
representing the same value (+0 or -0) when plain char is signed and
sign/magnitude or one's complement is used in the machine, as it already
has been mentioned in this thread.

> I have talked my colleagues out of reinterpret_cast and have gotten them
> to do the Sutter/Alexandrescu advice. But it seems like a lot of extra
> code to write.
>
> I made a post a long time ago here asking for comments on a
> pointer_cast<> template function that does the Sutter and Alexandrescu
> technique to reduce typing, but got few responses. So I'll repeat it
> here. :-)
>
> template<typename T>
> inline T pointer_cast(void* p)
> {
> return static_cast<T>(p);
> }
>
> Usage:
>
> char* cp = ...;
> unsigned char* ucp = pointer_cast<unsigned char*>(cp);

Well, the advantage is that it is easy to search and replace
pointer_cast by reinterpret_cast :-). More seriously, I would be very
interested to learn which problem S&A (yes I know they are probably
among the most knowledgeable people on the planet when it comes to C++,
and I have great respect for them) exactly intended to address in their
book and how their advice does solve it.

Falk

Brian Neal

Aug 3, 2007, 1:41:13 AM
On Aug 2, 12:11 pm, "Alf P. Steinbach" <al...@start.no> wrote:
> * Brian Neal:

> > Sutter and Alexandrescu recommend something like this in item 92 of
> > their book "C++ Coding Standards" to avoid reinterpret_cast in this
> > situation:
>
> > T1* p1 = ...;
> > void* vp = p1;
> > T2* p2 = static_cast<T2*>(vp);
>
> Without the context it's difficult to say whether this is reasonable.

The context is item 92 from their book: avoid reinterpret_cast. In the
above case, how to cast between unrelated pointer types without
reinterpret_cast.

>
> It would need some compelling technical reason to avoid naming the cast
> as a reinterpret_cast, and none is apparent.
>
> I suspect that the recommendation's rationale, whatever it is, is not
> equally endorsed by both authors, and furthermore that it's limited to
> some very narrow context: not a general rule but a special case rule.

Well I cannot speak for how well both authors endorse their own
advice, but they recommend that you in general cast between unrelated
pointer types with a static_cast and an intermediate void*. It is not
a special case. They do not advocate the use of reinterpret_cast for
casting pointers of unrelated types to each other.

> > It is frustrating that this question comes up all the time and we
> > never hear a definitive answer.
>
> What's the question?

How to safely cast between pointers of unrelated types like char* and
unsigned char* without resorting to reinterpret_cast or C-style casts.
C++ programmers have been warned by many sources (like Sutter and
Alexandrescu) not to use reinterpret_cast when you don't have to. In
particular, in this very case in item 92 of their book. And of course
in C++ it is better to use C++ style casts instead of C style casts to
better broadcast intent. That is what led me to the pointer_cast<>
template. Other opinions, such as yours, are welcome.

In the book they say that reinterpret_cast comes with very few
guarantees, that how it works is largely up to the implementation, and
that it should be avoided when you can. Again, I don't want to put
words in their mouths, or post their text without permission, so please
see their item 92 if you can.

>
> The above construction as a general replacement for reinterpret_cast is
> just wasteful muddling-the-waters verbosity.

[...]


>
>
> That's what reinterpret_cast is for.

[...]


>
>
> That's what reinterpret_cast is for. Mainly, that is.

[...]

> Using the static_cast is more verbose, it's far less clear, and AFAICS
> it's also formally Undefined Behavior -- although in practice there's
> not much room to wiggle here for a compiler.
>
> In short, it's just silly as a replacement for reinterpret_cast.

Thank you for your interpretation.



Alf P. Steinbach

Aug 3, 2007, 12:03:37 PM
* Brian Neal:

> On Aug 2, 12:11 pm, "Alf P. Steinbach" <al...@start.no> wrote:
>> * Brian Neal:
>>> Sutter and Alexandrescu recommend something like this in item 92 of
>>> their book "C++ Coding Standards" to avoid reinterpret_cast in this
>>> situation:
>>> T1* p1 = ...;
>>> void* vp = p1;
>>> T2* p2 = static_cast<T2*>(vp);
>> Without the context it's difficult to say whether this is reasonable.
>
> The context is item 92 from their book: avoid reinterpret_cast. In the
> above case, how to cast between unrelated pointer types without
> reinterpret_cast.

Well yes, you already said that.

When I (implicitly) asked for more, that meant a short explanation,
perhaps with relevant quotes, or quoting the whole thing if it's small
enough.


>> It would need some compelling technical reason to avoid naming the cast
>> as a reinterpret_cast, and none is apparent.
>>
>> I suspect that the recommendation's rationale, whatever it is, is not
>> equally endorsed by both authors, and furthermore that it's limited to
>> some very narrow context: not a general rule but a special case rule.
>
> Well I cannot speak for how well both authors endorse their own
> advice, but they recommend that you in general cast between unrelated
> pointer types with a static_cast and an intermediate void*. It is not
> a special case. They do not advocate the use of reinterpret_cast for
> casting pointers of unrelated types to each other.

Ouch.


[snip]


> In the book they say that reinterpret_cast comes with very few
> guarantees, that how it works is largely up to the implementation, and
> that it should be avoided when you can. Again, I don't want to put
> words in their mouths, or post their text without permission, so please
> see their item 92 if you can.

Small quotes are "fair use" under US copyright law, and perfectly legal.


>> The above construction as a general replacement for reinterpret_cast is
>> just wasteful muddling-the-waters verbosity.
> [...]
>>
>> That's what reinterpret_cast is for.
> [...]
>>
>> That's what reinterpret_cast is for. Mainly, that is.
>
> [...]
>
>> Using the static_cast is more verbose, it's far less clear, and AFAICS
>> it's also formally Undefined Behavior -- although in practice there's
>> not much room to wiggle here for a compiler.
>>
>> In short, it's just silly as a replacement for reinterpret_cast.
>
> Thank you for your interpretation.

You would be welcome to an interpretation, but for that you'll have to
wait until I see that item 92 explained or quoted.

What you got from me -- and snipped! -- was the standard's very
clear guarantees.

That is /not/ an interpretation.

It's the absolute authority on the matter.

And if that's not good enough for you, then you'll never have an
acceptable answer, because there's nothing more authoritative.

--
A: Because it messes up the order in which people normally read text.
Q: Why is it such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?


Bo Persson

Aug 3, 2007, 4:25:14 PM
Alf P. Steinbach wrote:
:: * Brian Neal:

::: On Aug 2, 12:11 pm, "Alf P. Steinbach" <al...@start.no> wrote:
:::: * Brian Neal:
::::: Sutter and Alexandrescu recommend something like this in item
::::: 92 of their book "C++ Coding Standards" to avoid
::::: reinterpret_cast in this situation:
::::: T1* p1 = ...;
::::: void* vp = p1;
::::: T2* p2 = static_cast<T2*>(vp);
:::: Without the context it's difficult to say whether this is
:::: reasonable.
:::
::: The context is item 92 from their book: avoid reinterpret_cast.
::: In the above case, how to cast between unrelated pointer types
::: without reinterpret_cast.
::
:: Well yes, you already said that.
::
:: When I (implicitly) asked for more, that meant a short explanation,
:: perhaps with relevant quotes, or quoting the whole thing if it's
:: small enough.

There isn't much to quote, actually. Item 92 is a page full of the
good reasons not to use reinterpret_cast, unless we just have to.

And then finally they just say that when casting unrelated pointers,
we should prefer the code shown above. Then the page ends, without a
motivation.

Next page is item 93, "Avoid using static_cast on pointers". :-)

::
:: You would be welcome to an interpretation, but for that you'll
:: have to wait until I see that item 92 explained or quoted.

Second edition of the book?


Bo Persson


Brian Neal

Aug 6, 2007, 6:13:35 PM
On Aug 3, 3:25 pm, "Bo Persson" <b...@gmb.dk> wrote:
> Alf P. Steinbach wrote:
>
> :: * Brian Neal:
> ::: On Aug 2, 12:11 pm, "Alf P. Steinbach" <al...@start.no> wrote:
> :::: * Brian Neal:
> ::::: Sutter and Alexandrescu recommend something like this in item
> ::::: 92 of their book "C++ Coding Standards" to avoid
> ::::: reinterpret_cast in this situation:
> ::::: T1* p1 = ...;
> ::::: void* vp = p1;
> ::::: T2* p2 = static_cast<T2*>(vp);
> :::: Without the context it's difficult to say whether this is
> :::: reasonable.
> :::
> ::: The context is item 92 from their book: avoid reinterpret_cast.
> ::: In the above case, how to cast between unrelated pointer types
> ::: without reinterpret_cast.
> ::
> :: Well yes, you already said that.
> ::
> :: When I (implicitly) asked for more, that meant a short explanation,
> :: perhaps with relevant quotes, or quoting the whole thing if it's
> :: small enough.
>
> There isn't much to quote, actually. Item 92 is a page full of the
> good reasons not to use reinterpret_cast, unless we just have to.

Okay, I finally have the book in front of me. Bo is right, there isn't
much rationale given. Here is a quote:

"...the reality is that reinterpret_cast's effects are worse than
reinterpreting objects' bit patterns (which would be bad enough).
Except that some conversions are guaranteed to be reversible, its
effects are actually implementation-defined, so you don't know if it
will do even that. It is unreliable and nonportable."
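
To make the "guaranteed to be reversible" part concrete, here is a minimal
sketch of one such round trip. Header and round_trip are hypothetical names
used only for illustration. The conversion goes to unsigned char*, whose
alignment requirement is no stricter than that of any other object type,
and then back to the original pointer type.

```cpp
#include <cassert>

// Hypothetical type, for illustration only.
struct Header {
    int id;
    int length;
};

// Round trip: Header* -> unsigned char* -> Header*.
// Converting to a pointer type with no stricter alignment and back again
// is required to yield the original pointer value.
Header* round_trip(Header* h) {
    unsigned char* bytes = reinterpret_cast<unsigned char*>(h);
    Header* back = reinterpret_cast<Header*>(bytes);
    assert(back == h);  // the reversible part of the guarantee
    return back;
}
```

What the intermediate pointer value means in between is the part the quote
calls implementation-defined; the sketch only shows that the value comes
back unchanged.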

>
> And then finally they just say that when casting unrelated pointers,
> we should prefer the code shown above. Then the page ends, without a
> motivation.
>

Yes. I wish there were a bit more here, especially after reading a few
posts in this thread.

> Next page is item 93, "Avoid using static_cast on pointers". :-)

:-) True, but in their defense they are talking about navigating an
inheritance hierarchy with pointers (static_cast vs dynamic_cast).
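
For readers without the book, the distinction I mean is roughly the one in
the sketch below. Base, Derived, Other and checked_downcast are hypothetical
names, and this is my own illustration rather than code from item 93.

```cpp
#include <cassert>

// Hypothetical polymorphic hierarchy, for illustration only.
struct Base {
    virtual ~Base() {}
};

struct Derived : Base {
    int value;
    Derived() : value(42) {}
};

struct Other : Base {};

// Checked downcast: dynamic_cast inspects the dynamic type and yields a
// null pointer when b does not actually point to a Derived.
Derived* checked_downcast(Base* b) {
    return dynamic_cast<Derived*>(b);
}

// A static_cast<Derived*>(b) would compile for any Base*, but it performs
// no run-time check, so using its result is undefined behavior when b
// really points to an Other.
```

With a Derived object the checked cast returns the same address; with an
Other object it returns a null pointer, which the caller can test for.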
