history of pointer to char

j0mbolar

Sep 5, 2004, 3:31:35 AM

prior to c89 there was not a generic pointer
which could be used to point to any object type
so pointer to char seemingly acted as a
substitute for this. However, because of the
lack of a generic pointer(yes, some implementations
might have offered a generic pointer but this
was not standardized) and that a pointer to char
acted as a substitute in this respect, does
this explain why basically the standard hints at
pointer to char having the same alignment requirements
and such as a pointer to void? and as such does this
really mean that a pointer to char can be substituted
anywhere a pointer to void is used? maybe I'm over-
looking some important detail but still making
this query out of curiosity.

James Kuyper

Sep 5, 2004, 11:17:12 AM

j0mb...@engineer.com (j0mbolar) wrote in message news:<2d31a9f9.04090...@posting.google.com>...

> prior to c89 there was not a generic pointer
> which could be used to point to any object type
> so pointer to char seemingly acted as a
> substitute for this. However, because of the
> lack of a generic pointer(yes, some implementations
> might have offered a generic pointer but this
> was not standardized) and that a pointer to char
> acted as a substitute in this respect, does
> this explain why basically the standard hints at
> pointer to char having the same alignment requirements
> and such as a pointer to void? and as such does this

6.2.5p26: "A pointer to *void* shall have the same representation and
alignment requirements as a pointer to a character type."
If you consider that to be only a "hint", what does it take for you to
consider something as having been explicitly required?

> really mean that a pointer to char can be substituted
> anywhere a pointer to void is used? maybe I'm over-
> looking some important detail but still making
> this query out of curiosity.

The standard requires in several places that various sets of types
must have the same representation and alignment:

6.2.5p6,9: corresponding signed and unsigned types; "same
representation" is restricted to those values that can be represented
in both types.
6.2.5p13: each complex type, and an array of exactly 2 elements of the
corresponding real type
6.2.5p25: The qualified and unqualified versions of a type. 39)
6.2.5p26: pointer to void and pointer to a character type.
All pointers to struct types.
All pointers to union types.
G.2p3: an imaginary type and the corresponding real type.

In each of those locations except G.2p3, there's a footnote
indicating that:
"The same representation and alignment requirements are meant to imply
interchangeability as arguments to functions, return values from
functions, and members of unions." However, while those statements are
"meant to imply" those things, interchangeability isn't actually
required, which I consider to be an error, though others disagree. In
particular, such types are not considered to be compatible types,
which matters for a great many rules. If an implementation provided
"same representation and alignment" without allowing
interchangeability, it would be fully conforming to the letter of the
standard.

As far as I can tell, it's perfectly feasible to do so. For instance,
the implementation could use function interface conventions that
pass arguments using different methods, depending upon which
qualifications are applied to them.

There are many situations where types that can be converted to each
other, but are otherwise unrelated, would cause a problem (such as a
constraint violation or undefined behavior). In each of those
situations, I believe it's legal for a conforming implementation to
produce the same kind of problems for types that are only required to
"have the same representation and alignment".

For instance, the type "function returning void*" is not compatible
with "function returning char*" (6.7.5.3p15). Therefore, while code
that converts a pointer to one of those function types into a pointer
to the other is perfectly legal, if the converted pointer is used to
call the function before being converted back to its original type,
the behavior is undefined (6.3.2.3p8).
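
A minimal sketch of that rule, assuming a made-up function first_byte
and made-up typedef names (none of these come from the standard): the
conversion between the two function pointer types is fine, but the call
is only made after converting back.

```c
#include <stddef.h>

/* Hypothetical function for illustration: returns its argument. */
static char *first_byte(char *s)
{
    return s;
}

typedef char *(*char_fn)(char *);   /* pointer to function returning char* */
typedef void *(*void_fn)(char *);   /* pointer to function returning void* */

/* Stores a function pointer under the "wrong" type, then converts it
   back to the original type before calling through it. */
char *call_after_round_trip(char *s)
{
    void_fn stored = (void_fn)first_byte;  /* the conversion itself is allowed */

    /* Calling stored(s) here would be undefined behavior (6.3.2.3p8),
       because the two function types are not compatible. */

    char_fn restored = (char_fn)stored;    /* back to the original type */
    return restored(s);                    /* well-defined call */
}
```

On most implementations the direct call through stored would probably
appear to work, which is exactly why the undefined behavior is easy to
overlook.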

If a return statement's expression has a type that doesn't match the
type returned by the function, it is implicitly converted, as if by
assignment, to the return type of the function.
If a prototype for a function is in scope, arguments corresponding to
the explicitly listed parameters are implicitly converted, as if by
assignment, to the corresponding parameter's type.
For each of the cases listed above except complex numbers, there is an
implicit conversion allowed between the types that are said to have
the "same representation and alignment". Therefore, a problem can
occur in the following cases:

1. An argument is passed to a function taking a parameter of the
other type, when there is no prototype in scope for that function.

2. The relevant argument is one of the variable arguments of a
function that allows a variable number of arguments, and the function
itself uses the other type to unpack it with the va_arg() macro.

3. The relevant types are an imaginary type and the corresponding real
type, in which case the implicit conversion always produces a 0.0,
which is probably not what was meant to be implied by
"interchangeable". This is the one case where the footnote about
interchangeability is missing, but it does provide a good
counter-example to the claim that "same representation and alignment"
necessarily implies interchangeability.
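
For case 2, one way to sidestep the question is to make the type
passed match the type unpacked. A sketch, assuming a hypothetical
helper nth_pointer that unpacks every variable argument as void*:

```c
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical helper: returns the n-th (0-based) of `count` pointer
   arguments, or NULL if n is out of range. Every variable argument
   must be passed as void*, because that is the type given to va_arg. */
void *nth_pointer(int count, int n, ...)
{
    va_list ap;
    void *result = NULL;

    va_start(ap, n);
    for (int i = 0; i < count; i++) {
        void *p = va_arg(ap, void *);   /* unpacked as void* */
        if (i == n)
            result = p;
    }
    va_end(ap);
    return result;
}
```

A call would then look like nth_pointer(2, 1, (void *)&x, (void *)&y).
The casts are needed because variable arguments get no implicit
conversion to the type the callee expects.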

Douglas A. Gwyn

Sep 6, 2004, 4:13:26 AM

j0mbolar wrote:
> prior to c89 there was not a generic pointer
> which could be used to point to any object type
> so pointer to char seemingly acted as a
> substitute for this.

Yes, because the C model for data objects was that
they were suitably aligned byte arrays with additional
properties (such as value interpretation) impressed
upon them by the type system. Most contemporary
hardware implemented useful data access operations
(e.g. read/write/increment/multiply) in a manner that
closely conformed to that model, and that remains
true today.

When the initial C standard was being prepared, the
utility of separating denotation of byte pointers
from pointers to character type was identified, and
void* was invented for the former, retaining char*
for the latter. You can see this distinction most
clearly in the mem... versus str... function
prototypes.
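
A small sketch of that distinction, using the standard memcpy and
strlen; the struct and the wrapper names are made up for illustration:

```c
#include <string.h>

/* Hypothetical record type for illustration. */
struct point { int x, y; };

/* memcpy takes void* parameters: it treats both objects as plain
   bytes, and any object pointer converts to void* without a cast. */
void copy_point(struct point *dst, const struct point *src)
{
    memcpy(dst, src, sizeof *dst);
}

/* strlen takes const char*: it expects character data that is
   terminated by a null character. */
size_t text_length(const char *s)
{
    return strlen(s);
}
```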

> does this explain why basically the standard hints at
> pointer to char having the same alignment requirements
> and such as a pointer to void?

It's not a hint, it's a requirement. The new byte-
pointer type was used in e.g. the mem... interfaces
whereas previously written code had been using char*,
and we didn't want to introduce a serious
incompatibility in such interfaces. Note that such
code was using non-prototype ("old style") function
declarations, where the interface would work with data
created "any old way", just so long as the same
argument representations and argument-passing
conventions were used for both the function call and
the function definition. The same-representation
requirement ensures that a char* argument can be safely
passed to a function defined with a void* argument.

In fact a lot of older code passed int* or other such
address types to functions expecting byte pointers.
We ensured that such code would not be badly broken by
conforming implementations by introducing automatic
conversion between void* and other types of object
pointer; that was especially helpful when prototype
forms of function declaration started becoming visible
to such old code (mainly via revised standard headers).

> and as such does this
> really mean that a pointer to char can be substituted
> anywhere a pointer to void is used?

Not quite, because as James Kuyper pointed out, there
are (in some contexts) type *compatibility* requirements
that might be violated by such a substitution. You
should, however, be safe if you keep in mind that void*
should be used to point to undifferentiated arrays of
bytes, while char* should be used only with (narrow)
character-coded data (such as string literals). One
further thing: If you for some reason need to write code
that picks up undifferentiated objects, for example an
analogue to the memcpy function, you should treat the
type as array of unsigned char, which means accessing its
constituent bytes via a pointer of type unsigned char*.
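
A minimal sketch of such a memcpy analogue (copy_bytes is a made-up
name, and like memcpy it assumes the two regions do not overlap):

```c
#include <stddef.h>

/* Hypothetical memcpy analogue: copies n bytes from src to dst and
   returns dst. The regions must not overlap. */
void *copy_bytes(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;          /* void* converts implicitly */
    const unsigned char *s = src;

    while (n-- > 0)
        *d++ = *s++;                 /* access each constituent byte */
    return dst;
}
```

The interface uses void* so that callers can pass any object pointer
without a cast, while the body uses unsigned char* to do the actual
byte accesses.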

Dan Pop

Sep 6, 2004, 11:14:34 AM

In <2d31a9f9.04090...@posting.google.com> j0mb...@engineer.com (j0mbolar) writes:

>prior to c89 there was not a generic pointer
>which could be used to point to any object type
>so pointer to char seemingly acted as a
>substitute for this. However, because of the
>lack of a generic pointer(yes, some implementations
>might have offered a generic pointer but this
>was not standardized) and that a pointer to char
>acted as a substitute in this respect, does
>this explain why basically the standard hints at
>pointer to char having the same alignment requirements
>and such as a pointer to void?

This requirement naturally follows from the fact that stand alone C
objects are contiguous sequences of bytes (characters), so a pointer to
character must contain all the information required to access any object
type. Therefore, the representation of character pointers is also
suitable for void pointers.

>and as such does this
>really mean that a pointer to char can be substituted
>anywhere a pointer to void is used?

Nope. Void pointers have special semantics in C, that are not shared by
character pointers (conversions to and from void pointers are performed
automatically, without requiring a cast). So, you cannot replace

int i;
void *p = &i;

by

int i;
char *p = &i;

because there is no automatic conversion between pointer to int and
pointer to char. Such code requires a diagnostic.

However, if you also insert *all* the required casts, you can replace all
the void pointers in a program by character pointers (char, signed char,
unsigned char all work for this purpose) without changing the program
semantics.
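
As a sketch of that replacement (the two function names are made up
for illustration), both versions below read the same int; only the
casts differ:

```c
/* With void*, the conversions in both directions are implicit. */
int read_via_void(int *ip)
{
    void *p = ip;
    int *q = p;
    return *q;
}

/* With char*, each conversion needs an explicit cast, but the
   result is the same. */
int read_via_char(int *ip)
{
    char *p = (char *)ip;
    int *q = (int *)p;
    return *q;
}
```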

Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Dan...@ifh.de

Douglas A. Gwyn

Sep 6, 2004, 2:27:47 PM

Dan Pop wrote:
> This requirement naturally follows from the fact that stand alone C
> objects are contiguous sequences of bytes (characters), so a pointer to
> character must contain all the information required to access any object
> type.

That would be, all the information necessary to
*locate* an object of any type. An access in
general needs to know the specific type, also.

Note that this isn't as clear-cut as it appears.
For example, some architectures, especially on
embedded platforms, have multiple kinds of data
address space, which might require additional
bits be used in a generic pointer representation,
which might not be desirable from the viewpoint
of efficiency. (The Embedded C TR treats this.)

> Therefore, the representation of character pointers is also
> suitable for void pointers.

Therefore it is also sufficient for them. The
suitability is actually a matter of some debate.

Dan Pop

Sep 7, 2004, 8:18:30 AM

In <Q_-dnfPo6qm...@comcast.com> "Douglas A. Gwyn" <DAG...@null.net> writes:

>Dan Pop wrote:
>> This requirement naturally follows from the fact that stand alone C
>> objects are contiguous sequences of bytes (characters), so a pointer to
>> character must contain all the information required to access any object
>> type.
>
>That would be, all the information necessary to
>*locate* an object of any type. An access in
>general needs to know the specific type, also.

The information about the type need not be stored in the pointer value.
The compiler *knows* the type of each pointer, so this information need
not be obtained at run time from the pointer value.

Keith Thompson

Sep 7, 2004, 5:05:00 PM

But the discussion was about generic pointers (implemented as char* in
pre-ANSI C, void* in C89 and later standards). For example, qsort has
no way of knowing the type of array passed to it via its "void *base"
argument; the other arguments provide partial implicit information
about the actual type. ("Partial" because the only information
actually passed is the element size and a comparison routine; qsort
still doesn't know the actual type.)
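
A short sketch of such a qsort call (sort_ints and compare_ints are
made-up names): the element type appears only in the comparison
routine and in the caller, never in what qsort itself receives.

```c
#include <stdlib.h>

/* Comparison routine: the one place where the element type (int)
   is named. qsort only ever hands it two const void* values. */
static int compare_ints(const void *a, const void *b)
{
    const int *x = a;
    const int *y = b;
    return (*x > *y) - (*x < *y);
}

/* Hypothetical wrapper: qsort is told the base address, the element
   count, the element size and the comparison routine, nothing more. */
void sort_ints(int *values, size_t count)
{
    qsort(values, count, sizeof values[0], compare_ints);
}
```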

Type information isn't typically stored in the pointer value itself,
but it has to come from somewhere.

--
Keith Thompson (The_Other_Keith) ks...@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.

Dan Pop

Sep 8, 2004, 7:51:59 AM

In <ln8ybl3...@nuthaus.mib.org> Keith Thompson <ks...@mib.org> writes:

>Dan...@cern.ch (Dan Pop) writes:
>> In <Q_-dnfPo6qm...@comcast.com> "Douglas A. Gwyn"
>> <DAG...@null.net> writes:
>>>Dan Pop wrote:
>>>> This requirement naturally follows from the fact that stand alone C
>>>> objects are contiguous sequences of bytes (characters), so a pointer to
>>>> character must contain all the information required to access any object
>>>> type.
>>>
>>>That would be, all the information necessary to
>>>*locate* an object of any type. An access in
>>>general needs to know the specific type, also.
>>
>> The information about the type need not be stored in the pointer value.
>> The compiler *knows* the type of each pointer, so this information need
>> not be obtained at run time from the pointer value.
>
>But the discussion was about generic pointers (implemented as char* in
>pre-ANSI C, void* in C89 and later standards).

Generic pointers are not dereferenced as such. The code dereferencing
them must know the real type.

>For example, qsort has
>no way of knowing the type of array passed to it via its "void *base"
>argument; the other arguments provide partial implicit information
>about the actual type. ("Partial" because the only information
>actually passed is the element size and a comparison routine; qsort
>still doesn't know the actual type.)

Bogus example: qsort doesn't dereference those pointers, therefore it
doesn't need any type information. The function that actually dereferences
the pointers is user supplied and it does know the actual type of the
objects.
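
To illustrate, here is a sketch of a qsort-like routine. It is a
simple insertion sort with made-up names, not how any real qsort is
implemented. The sorting code only moves bytes around; the one
function that interprets an element is the user-supplied comparison.

```c
#include <stddef.h>

/* Swaps two elements of `size` bytes each without knowing their type. */
static void swap_bytes(unsigned char *a, unsigned char *b, size_t size)
{
    while (size-- > 0) {
        unsigned char tmp = *a;
        *a++ = *b;
        *b++ = tmp;
    }
}

/* Hypothetical qsort-like routine (insertion sort). It computes element
   addresses from `size` and moves bytes, but never reads an element as
   a value of its real type. */
void sort_generic(void *base, size_t count, size_t size,
                  int (*compare)(const void *, const void *))
{
    unsigned char *bytes = base;

    for (size_t i = 1; i < count; i++) {
        /* Move element i to the left until it is in order. */
        for (size_t j = i; j > 0; j--) {
            unsigned char *prev = bytes + (j - 1) * size;
            unsigned char *cur = bytes + j * size;
            if (compare(prev, cur) <= 0)
                break;
            swap_bytes(prev, cur, size);
        }
    }
}

/* User-supplied comparison: the only code that knows the elements
   are doubles. */
int compare_doubles(const void *a, const void *b)
{
    const double *x = a;
    const double *y = b;
    return (*x > *y) - (*x < *y);
}
```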

>Type information isn't typically stored in the pointer value itself,
>but it has to come from somewhere.

And it always does come from the source code, being therefore available
to the compiler.

Dan
--
Dan Pop
DESY Zeuthen, RZ group
Email: Dan...@ifh.de

Currently looking for a job in the European Union