
Is there a bounded memcpy()-like function?


Rick C. Hodgin

Dec 26, 2017, 3:20:20 PM

Is there a bounded memcpy()-like function which allows me to pass in a
src pointer, the length of the data at that pointer to copy, a dst
pointer, the maximum size of the buffer at dst to receive the data, and
a starting offset into dst, which will copy as much as is allowed and,
if the starting offset exceeds the bounded size, perform no copy?

// Something like:
size_t bounded_memcpy(void *dst, size_t dst_size, size_t offset,
                      void *src, size_t src_size);

So that if I call it like this, it will only copy up to the maximum:

char buffer[2];
size_t offset;

// Populate buffer[]
offset = bounded_memcpy(buffer, sizeof(buffer), 0, "Hi, ", 4);
offset = bounded_memcpy(buffer, sizeof(buffer), offset, "mom!", 4);

The end result is that buffer[] contains only "Hi" even though much
more data was attempted.

--
Rick C. Hodgin

Richard Damon

Dec 26, 2017, 4:41:46 PM

I know of no such function in the C standard, and it doesn't sound like
it is defined the way C tends to define standard library functions. It
wouldn't take much effort to define such a function, perhaps using
memcpy as a core to do the actual copy.

Providing a base buffer and an offset isn't the normal way C libraries
work; they tend to just have you pass base+offset instead. Also, the
memory functions don't have size limits; that capability is more normal
in the string functions, where you don't know for sure how long the
source data string is.

herrman...@gmail.com

Dec 26, 2017, 5:37:27 PM

On Tuesday, December 26, 2017 at 12:20:20 PM UTC-8, Rick C. Hodgin wrote:
> Is there a bounded memcpy()-like function which allows me to pass in a
> src pointer, starting offset, length of data at the pointer to copy,
> and a dst pointer and maximum size of the buffer in dst to receive the
> data, which will copy as much as is allowed, and if the starting
> offset exceeds the bounded size, perform no copy?
>
> // Something like:
> size_t bounded_memcpy(void *dst, size_t dst_size, size_t offset,
> void *src, size_t src_size);

I don't know about the offset, but if C had a min() function that
gave the minimum of two arguments, you could use that for the
length passed to memcpy(). Instead, use the conditional operator to
select the appropriate length.
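
For instance, a minimal sketch of that approach (the function name and
parameters below are made up for illustration):

#include <string.h>

void copy_clamped(void *dst, size_t dst_size, const void *src, size_t src_size)
{
    /* The conditional operator stands in for min(): never copy more than fits. */
    memcpy(dst, src, src_size < dst_size ? src_size : dst_size);
}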

Manfred

Dec 26, 2017, 6:22:09 PM

On 12/26/2017 9:20 PM, Rick C. Hodgin wrote:
> Is there a bounded memcpy()-like function which allows me to pass in a
> src pointer, starting offset, length of data at the pointer to copy,
> and a dst pointer and maximum size of the buffer in dst to receive the
> data, which will copy as much as is allowed, and if the starting
> offset exceeds the bounded size, perform no copy?

memcpy_s

http://en.cppreference.com/w/c/string/byte/memcpy
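
For reference, memcpy_s comes from C11's optional Annex K bounds-checked
interfaces, so an implementation may not provide it; a minimal sketch of
how a call would look (note that it reports an error rather than copying
a truncated prefix):

#define __STDC_WANT_LIB_EXT1__ 1   /* request Annex K, if the library provides it */
#include <string.h>
#include <stdio.h>

int main(void)
{
#ifdef __STDC_LIB_EXT1__
    char buf[4];
    /* A too-small destination is a runtime-constraint violation: memcpy_s
       returns nonzero (and may invoke the constraint handler) instead of
       copying a prefix. */
    if (memcpy_s(buf, sizeof buf, "Hi, mom!", 8) != 0)
        puts("copy refused: destination too small");
#else
    puts("this implementation does not provide Annex K");
#endif
    return 0;
}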

fir

Dec 26, 2017, 7:48:59 PM

On Tuesday, December 26, 2017 at 10:41:46 PM UTC+1, Richard Damon wrote:
> (..)

you should know (and anybody else here; this i wrote to any troll-feeder who answers to rick the troll) that hodgin is a local cretin who abuses this group by flooding it with pathological brainless dumbness and so on

don't wake this depressing, harmful cretin up again

i propose, if you want to answer some topic which incidentally is mentioned by this pathological idiot, cut his text out, cut his name, and answer that, but do not
direct the answer to this abusing idiot;
direct it more neutrally, skipping this well-known moron

this group is really better without this depressing troll, and this troll is not worth talking to because of his numerous abuses... be responsible

Rick C. Hodgin

Dec 27, 2017, 12:37:08 PM

Very close to what I wanted. Thank you.

--
Rick C. Hodgin

Rick C. Hodgin

Dec 27, 2017, 12:50:22 PM

The purpose is to allow a series of operations to silently pass through
copy operations even if there's not enough space in the target, while
still returning the size the buffer needs even if it wasn't yet
allocated (or wasn't yet allocated sufficiently).

-----
I sent supercat an email on an optimization question, but I don't think
I have a valid email address for him ... or he's ignoring me. I had a
thought regarding writing this type of function this way...:

#include <stdint.h>

#define u32 uint32_t
#define s8  char

u32 bounded_memcpy(s8* dst, u32 dst_size, u32 dst_offset,
                   s8* src, u32 src_size)
{
    u32 src_offset;


    // Iterate for each character in src
    for (src_offset = 0; src_offset < src_size; ++src_offset,
                         ++dst_offset)
    {
        // Are we still bounded?
        if (dst_offset < dst_size)
            dst[dst_offset] = src[src_offset];    // Yes
    }

    // Indicate our new offset
    return(dst_offset);
}

...and have the optimizing compiler recognize that dst, dst_size, and
dst_offset are constraints on the target, and that src and src_size
are constraints on the input, recognize that it's a copy operation,
and rewrite this code as a series of correctly bounded memcpy()
calls plus bypass logic, extracting the logic of the for() loop and
if() statement entirely in the optimizer.
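
Many optimizers can already recognize the unconditional form of such a
copy loop and turn it into a library copy; a sketch of that form for
comparison (whether the bounded variant above, with the embedded if(),
gets the same treatment is compiler-specific):

#include <stddef.h>

/* The plain byte-copy loop that compilers such as gcc and clang can
   replace with a memcpy()/memmove() call via loop idiom recognition. */
void plain_copy(char *dst, const char *src, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        dst[i] = src[i];
}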

--
Rick C. Hodgin

Richard Damon

Dec 27, 2017, 4:43:45 PM

I was not saying it wasn't a useful way to think of it, just that it
didn't look like the model the standard library seems to have been built
on. The library seems much more based on moving pointers than on arrays
with offsets; not that arrays with offsets don't work.

> -----
>
> #define u32 unsigned _int32_t
> #define s8 char
>
> u32 bounded_memcpy(s8* dst, u32 dst_size, u32 dst_offset,
> s8* src, u32 src_size)
> {
> u32 src_offset;
>
>
> // Iterate for each character in src
> for (src_offset = 0; src_offset < src_size; ++src_offset,
> ++dst_offset)
> {
> // Are we still bounded?
> if (dst_offset < dst_size)
> dst[dst_offset] = src[src_offset]; // Yes
> }
>
> // Indicate our new offset
> return(dst_offset);
> }
>
> ...and have the optimizing compiler recognize that dst, dst_size, and
> dst_offset are constraints on the target, and that src and src_size
> are constraints on the input, and to recognize it's a copy operation
> and rewrite this code as a fundamental series of correct single
> memcpy() calls and bypass logic by extracting the logic in the for()
> loop and if() statement all in the optimizer.
>

A compiler MIGHT be able to recognize the memcpy, but in no way would I
expect it must do so. Much better to define the function in a way that
makes it do so:

I would define it something like: (untested and uncompiled code)

size_t bounded_memcpy(void* dst, size_t dst_size, size_t dst_offset,
                      void const* src, size_t src_size) {

    if(dst_offset >= dst_size) {
        /* Already overflowed, no copy */
        src_size = 0;
    } else if(dst_offset + src_size > dst_size) {
        /* More to copy than room */
        src_size = dst_size - dst_offset;
    }

    if(src_size > 0) memcpy(((char *)dst) + dst_offset, src, src_size);
    return src_size;
}

This version is likely better if copies tend to be large; yours might be
better if most copies are short.

supe...@casperkitty.com

Dec 27, 2017, 7:21:35 PM

On Wednesday, December 27, 2017 at 11:50:22 AM UTC-6, Rick C. Hodgin wrote:
> I sent supercat an email on an optimization question, but I don't think
> I have a valid email address ... or he's ignoring me. I had a thought
> regarding writing this type of function this way...:

I sent you an email with a better email address for reaching me; did you
not receive it?

> #define u32 unsigned _int32_t
> #define s8 char
>
> u32 bounded_memcpy(s8* dst, u32 dst_size, u32 dst_offset,
> s8* src, u32 src_size)
> {
> u32 src_offset;
>
>
> // Iterate for each character in src
> for (src_offset = 0; src_offset < src_size; ++src_offset,
> ++dst_offset)
> {
> // Are we still bounded?
> if (dst_offset < dst_size)
> dst[dst_offset] = src[src_offset]; // Yes
> }
>
> // Indicate our new offset
> return(dst_offset);
> }
>
> ...and have the optimizing compiler recognize that dst, dst_size, and
> dst_offset are constraints on the target, and that src and src_size
> are constraints on the input, and to recognize it's a copy operation
> and rewrite this code as a fundamental series of correct single
> memcpy() calls and bypass logic by extracting the logic in the for()
> loop and if() statement all in the optimizer.

Better than memcpy would be a proper typed array-copy intrinsic defined in
a way that existing implementations could process it by simply defining
a suitable macro.

As it is, memcpy is horribly overloaded for a number of purposes:

1. copying objects of a known type for use as that same type

2. copying objects of an unknown type for use as that same type

3. copying bytes from objects of unknown type for re-interpretation
as a known type

4. copying bytes from objects of known type for re-interpretation as
an unknown type

5. copying bytes from an object of one unknown type for use or
reinterpretation as some other unknown (possibly different) type

Having distinct operations for copying typed and untyped data would allow
aliasing-based optimizations in cases where memcpy is used to copy data
between arrays of the same type, without changing the behavior in cases
where it was used to bytewise-convert data between types--something that
was unambiguously allowed in C89 (since memcpy is defined in terms of
reading and writing characters) but became needlessly ambiguous in C99.
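
As a concrete illustration of case 3 (copying bytes for re-interpretation
as a known type), a sketch of the usual memcpy-based type-punning idiom:

#include <inttypes.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    float f = 1.0f;
    uint32_t bits;

    _Static_assert(sizeof f == sizeof bits, "assumes a 32-bit float");

    /* Copy the object's bytes so they can be re-interpreted as a known
       type, without violating the aliasing rules. */
    memcpy(&bits, &f, sizeof bits);
    printf("0x%08" PRIX32 "\n", bits);   /* 0x3F800000 on IEEE-754 targets */
    return 0;
}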

Rick C. Hodgin

Dec 27, 2017, 10:00:16 PM

On Wednesday, December 27, 2017 at 7:21:35 PM UTC-5, supe...@casperkitty.com wrote:
> On Wednesday, December 27, 2017 at 11:50:22 AM UTC-6, Rick C. Hodgin wrote:
> > I sent supercat an email on an optimization question, but I don't think
> > I have a valid email address ... or he's ignoring me. I had a thought
> > regarding writing this type of function this way...:
>
> I sent you an email with a better email address for reaching me; did you
> not receive it?

I did not ... unless I've missed it or forgotten. Let me double-check.
[virtual pause] Nope. Can't find it.

I tend to think of memcpy() as a more low-level function, a basic
building block of software. I abstract its use on data (typed or
otherwise) down to fundamental byte operations.

But what do you think about the possibility of creating an optimization
which examines functions and internal data processing from this
perspective? The general idea of recognizing that some parameters are
input, some are output, some are bounding, and then, based on the logic
in use within the function, being able to extract out bounded ranges of
data movement that resolve to something like base memcpy() functions
(be they in the form of intrinsics or otherwise)?

--
Rick C. Hodgin

Patrick.Schluter

Dec 29, 2017, 6:41:56 AM

size_t bounded_memcpy(void *dst, size_t dst_size, size_t offset,
                      const void *src, size_t src_size)
{
    char *dest = (char*)dst; /* for simpler pointer arithmetic */

    if(offset < dst_size) {
        if(dst_size - offset < src_size)
            src_size = dst_size - offset;
        memcpy(dest+offset, src, src_size);
    }
    return offset+src_size;
}

Patrick.Schluter

Dec 29, 2017, 7:00:14 AM

Yes, one can believe in fairies or one can implement it with memcpy

size_t bounded_memcpy(void *dst, size_t dst_size, size_t offset,
                      const void *src, size_t src_size)
{
    if(offset < dst_size) {
        memcpy((char*)dst+offset, src, dst_size - offset < src_size ?
                                           dst_size - offset :
                                           src_size);
    }
    return offset+src_size;
}

which has the advantage of touching the source only when there's really
something to copy.
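
For what it's worth, a self-contained sketch of how the original
"Hi, mom!" example plays out with this version (the function above is
repeated verbatim so the example compiles on its own):

#include <stdio.h>
#include <string.h>

size_t bounded_memcpy(void *dst, size_t dst_size, size_t offset,
                      const void *src, size_t src_size)
{
    if(offset < dst_size) {
        memcpy((char*)dst+offset, src, dst_size - offset < src_size ?
                                           dst_size - offset :
                                           src_size);
    }
    return offset+src_size;
}

int main(void)
{
    char buffer[2];
    size_t offset;

    offset = bounded_memcpy(buffer, sizeof buffer, 0, "Hi, ", 4);
    offset = bounded_memcpy(buffer, sizeof buffer, offset, "mom!", 4);

    /* buffer holds "Hi" (no terminating NUL); offset is 8, the size the
       destination would have needed to take everything. */
    printf("%.2s, needed %zu bytes\n", buffer, offset);
    return 0;
}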

Rick C. Hodgin

Dec 29, 2017, 7:44:47 AM


William Ahern

Dec 29, 2017, 9:30:13 PM

Both implementations should probably check for overflow--dst_offset++ in the
former, offset + src_size in the latter.

One aspect about C programming that bothers me is that pointer arithmetic is
signed (more or less) but constraints often use unsigned arithmetic. Using
unsigned arithmetic for constraints makes plenty of sense to me, especially
in code that needs to track out-of-bounds positions. Nonetheless, on its
face I'm not sure that dst_offset<dst_size necessarily implies that
dst+dst_offset is a valid address or that it can even be safely evaluated.

On the one hand, C11 6.5.6p8 says,

If both the pointer operand and the result point to elements of the same
array object, or one past the last element of the array object, the
evaluation shall not produce an overflow

On the other hand p9 says,

When two pointers are subtracted ... [if] the result is not representable
in an object of [ptrdiff_t (a signed integer type)], the behavior is
undefined.

It's a subtle issue--inapplicable to the code using memcpy but relevant to
the looping code, yet why that should be is not at all obvious.

James Kuyper

Dec 29, 2017, 9:58:27 PM

There's nothing that bounded_memcpy() can do to avoid that problem. It's
the responsibility of the caller to make sure that dst+dst_size is an
expression with defined behavior. If the caller has in fact arranged
that, then given the way the standard defines addition of an integer to
a pointer, and the fact that dst_size and offset are both unsigned,
offset<=dst_size is sufficient to guarantee that dst+offset has defined
behavior.

> On the one hand, C11 6.5.6p8 says,
>
> If both the pointer operand and the result point to elements of the same
> array object, or one past the last element of the array object, the
> evaluation shall not produce an overflow
>
> On the other hand p9 says,
>
> When two pointers are subtracted ... [if] the result is not representable
> in an object of [ptrdiff_t (a signed integer type)], the behavior is
> undefined.
>
> It's a subtle issue--inapplicable to the code using memcpy but relevant to
> the looping code, yet why that should be is not at all obvious.

I don't see any pointer subtractions anywhere in the above code. The
only subtractions are of unsigned integers, and are performed only when
the result of the subtraction is guaranteed to be non-negative. Does
"the looping code" that you're referring to reside in some other message?

Keith Thompson

Dec 29, 2017, 10:37:01 PM

William Ahern <wil...@25thandClement.com> writes:
[...]
> One aspect about C programming that bothers me is that pointer arithmetic is
> signed (more or less) but constraints often use unsigned arithmetic. Using
> unsigned arithmetic for constraints makes plenty of sense to me, especially
> in code that needs to track out-of-bounds positions. Nonetheless, on its
> face I'm not sure that dst_offset<dst_size necessarily implies that
> dst+dst_offset is a valid address or that it can even be safely evaluated.
[...]

In what sense is pointer arithmetic either signed or unsigned?

--
Keith Thompson (The_Other_Keith) ks...@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

Richard Damon

Dec 29, 2017, 11:01:30 PM

On 12/29/17 9:16 PM, William Ahern wrote:
> Both implementations should probably check for overflow--dst_offset++ in the
> former, offset + src_size in the latter.
>
> One aspect about C programming that bothers me is that pointer arithmetic is
> signed (more or less) but constraints often use unsigned arithmetic. Using
> unsigned arithmetic for constraints makes plenty of sense to me, especially
> in code that needs to track out-of-bounds positions. Nonetheless, on its
> face I'm not sure that dst_offset<dst_size necessarily implies that
> dst+dst_offset is a valid address or that it can even be safely evaluated.
>
> On the one hand, C11 6.5.6p8 says,
>
> If both the pointer operand and the result point to elements of the same
> array object, or one past the last element of the array object, the
> evaluation shall not produce an overflow
>
> On the other hand p9 says,
>
> When two pointers are subtracted ... [if] the result is not representable
> in an object of [ptrdiff_t (a signed integer type)], the behavior is
> undefined.
>
> It's a subtle issue--inapplicable to the code using memcpy but relevant to
> the looping code, yet why that should be is not at all obvious.
>

Pointers are more like unsigned numbers, but not really. Pointer
arithmetic is defined only for pointer values that point within, or one
past, a given array.

dst+dst_offset is defined as the equivalent of &dst[dst_offset], which,
if dst_offset is no bigger than the dimension of the array, must be a
valid value.

The issue with overflow of ptrdiff_t can only occur for a type with
sizeof = 1, and an array size greater than SIZE_MAX/2 (i.e. an object
over 1/2 of the size of a maximum size object).

On 'flat' architectures, size_t normally spans the full addressable
range of the processor (thus size_t is big enough to be uintptr_t), and
thus this object that could overflow ptrdiff_t occupies over 1/2 of the
memory space, which is fairly unlikely (but not impossible). On
segmented architectures, often size_t was just big enough for a maximum
sized segment, and this case was a bit more likely. (Perhaps 16 bit x86
code in large memory model was a good example of this, size_t was 16
bits, pointers were 32 bits, 16 bits being a segment, and 16 bits being
the offset).

This is one reason to write loops with traveling pointers and pointer
comparisons, rather than testing pointer - base_pointer < size.
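
A minimal sketch of the two loop shapes being contrasted here; the fill
functions are made up purely to show the comparison styles:

#include <stddef.h>

/* Style 1: march a pointer and compare against an end pointer. */
void fill_fwd(char *p, size_t n, char v)
{
    char *end = p + n;               /* one past the end is a valid pointer to form */
    for (; p < end; ++p)
        *p = v;
}

/* Style 2: compare the pointer difference against the size.  p - base has
   type ptrdiff_t, which can overflow for objects larger than PTRDIFF_MAX
   bytes, the case discussed above. */
void fill_diff(char *base, size_t n, char v)
{
    char *p;
    for (p = base; (size_t)(p - base) < n; ++p)
        *p = v;
}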

supe...@casperkitty.com

Dec 29, 2017, 11:51:48 PM

On Friday, December 29, 2017 at 10:01:30 PM UTC-6, Richard Damon wrote:
> On 'flat' architectures, size_t normally spans the full addressable
> range of the processor (thus size_t is big enough to be uintptr_t), and
> thus this object that could overflow ptrdiff_t occupies over 1/2 of the
> memory space, which is fairly unlikely (but not impossible). On
> segmented architectures, often size_t was just big enough for a maximum
> sized segment, and this case was a bit more likely. (Perhaps 16 bit x86
> code in large memory model was a good example of this, size_t was 16
> bits, pointers were 32 bits, 16 bits being a segment, and 16 bits being
> the offset).

On x86 architectures, if "p" was the start of a 50K allocation, then
(p+49152)-p would yield -16384, but on the flip side, computing p+(-16384)
would yield p+49152. A little weird, but more useful than limiting
allocations to 32767 bytes, and more practical than making ptrdiff_t be
32 bits.

Richard Damon

Dec 30, 2017, 7:37:02 AM

p needs to point to an array of a type like char. If p was an array of
shorts, then the biggest value you could add would have been 25000, which
won't cause overflow.


James Kuyper

Dec 30, 2017, 12:38:05 PM

On 12/29/2017 10:36 PM, Keith Thompson wrote:
> William Ahern <wil...@25thandClement.com> writes:
> [...]
>> One aspect about C programming that bothers me is that pointer arithmetic is
>> signed (more or less) but constraints often use unsigned arithmetic. Using
>> unsigned arithmetic for constraints makes plenty of sense to me, especially
>> in code that needs to track out-of-bounds positions. Nonetheless, on its
>> face I'm not sure that dst_offset<dst_size necessarily implies that
>> dst+dst_offset is a valid address or that it can even be safely evaluated.
> [...]
>
> In what sense is pointer arithmetic either signed or unsigned?

The only thing I can think of that's relevant is the fact that pointer
subtraction has a result that is of a signed integer type. Otherwise,
the signedness of pointers is a pretty meaningless concept.

bartc

Dec 30, 2017, 1:15:02 PM

Like the signedness of characters should have been.

Although an easy fix there would have been to call them bytes instead.


Ben Bacarisse

Dec 30, 2017, 3:18:42 PM

C was designed with the PDP-11 in mind as one (major) target. The
PDP-11's MOVB instruction sign extends the value when the destination is
a register -- in effect the hardware has decided that bytes are signed.

Other targets had unsigned bytes. It made sense to leave the signedness
of char as an implementation issue.

> Although an easy fix there would have been to call them bytes instead.

How would calling them bytes have helped? Do you mean you would not be
making this point if C had a "byte" type exactly like its current
"char" type? Is it only the fact that they are called "char" that bugs
you?

--
Ben.

bartc

Dec 30, 2017, 4:05:19 PM

I think that would have helped. But 'byte' would probably have needed
to be signed by default to match the other int types. And there would
have been only two variations, just like int.

(How was it even possible to have a char type - really a small int type
- which could have been either signed or unsigned, and compatible with
neither? In effect three char types.)

In the absence of a separate 'char' type, it would also have forced
people to pay more attention to the type of a string constant. So that
they might always have ended up as arrays of unsigned byte, solving a
few problems.

Standard C might then have introduced 'char' as a synonym for unsigned
byte. Then you would always know where you were.

--
bartc

supe...@casperkitty.com

Dec 30, 2017, 5:31:36 PM

On Saturday, December 30, 2017 at 2:18:42 PM UTC-6, Ben Bacarisse wrote:
> How would calling them bytes have helped? Do you mean you would not be
> making this point if C had a "byte" type exactly like it's current
> "char" type? Is it only the fact that they are called "char" that bugs
> you?

Having distinct types for characters and bytes would have been helpful in
at least three ways:

1. It would accommodate systems where the smallest addressable chunk of
storage is less than 8 bits. Having a 4-bit or 6-bit "char" would not
make much sense, but if a platform can conveniently address 4-bit or
6-bit chunks of storage, or even individual bits (I've seen a TI chip
that used bit addressing) it would make sense to allow for that. Note
that on such platforms, a loop that copies individual bytes might be
an order of magnitude slower than memcpy, but that would simply mean
that code should avoid such loops on such platforms.

2. Once aliasing rules were added to the language, it would have made sense
to have a separate "may-alias-everything" type [and perhaps optional
chunked types as well] rather than overloading the types used for
manipulating text characters. I don't think there's much benefit to
requiring that a compiler allow for strcpy(*p, "Hey") overwriting
objects of arbitrary type, rather than bona fide characters, but the
Standard requires that compilers recognize aliasing in such cases.

3. It would have made things cleaner on systems where the normal type used
for text characters is larger than one byte.

Les Cargill

Dec 30, 2017, 5:55:42 PM

Ben Bacarisse wrote:
> bartc <b...@freeuk.com> writes:
>
>> On 30/12/2017 17:37, James Kuyper wrote:
>>> On 12/29/2017 10:36 PM, Keith Thompson wrote:
>>>> William Ahern <wil...@25thandClement.com> writes:
>>>> [...]
>>>>> One aspect about C programming that bothers me is that pointer arithmetic is
>>>>> signed (more or less) but constraints often use unsigned arithmetic. Using
>>>>> unsigned arithmetic for constraints makes plenty of sense to me, especially
>>>>> in code that needs to track out-of-bounds positions. Nonetheless, on its
>>>>> face I'm not sure that dst_offset<dst_size necessarily implies that
>>>>> dst+dst_offset is a valid address or that it can even be safely evaluated.
>>>> [...]
>>>>
>>>> In what sense is pointer arithmetic either signed or unsigned?
>>>
>>> The only thing I can think of that's relavent is the fact that pointer
>>> subtraction has a result that is of a signed integer type. Otherwise,
>>> the signedness of pointers is a pretty meaningless concept.
>>
>> Like the signedness of characters should have been.
>
> C was designed with the PDP-11 in mind as one (major) target. The
> PDP-11's MOVB instruction sign extends the value when the destination is
> a register -- in effect the hardware has decided that bytes are signed.
>
> Other targets had unsigned bytes. It made sense to leave the signedness
> of char as an implementation issue.
>

On some toolchains (read: GNU for ARM), you can reconfigure
(recompile?) the toolchain such that unsigned is the default. It's
kind of annoying...

>> Although an easy fix there would have been to call them bytes instead.
>
> How would calling them bytes have helped? Do you mean you would not be
> making this point if C had a "byte" type exactly like it's current
> "char" type? Is it only the fact that they are called "char" that bugs
> you?
>

*Sigh* K&R had it right - unless it's a float/double, it's an *int* and
a *char* is an 8-bit int.

Such a nice simplifying assumption and yet so confusing....?

And don't get me started about C11 whinging about uint8_t * to calls
like strlen() :)

--
Les Cargill

Les Cargill

Dec 30, 2017, 5:56:49 PM

bartc wrote:
> On 30/12/2017 20:18, Ben Bacarisse wrote:
<snip>
>
> Standard C might then have introduced 'char' as a synonym for unsigned
> byte. Then you would always know where you were.
>

K&R had char as a synonym for int of 8 bits. That should
have continued, IMO...

--
Les Cargill

David Brown

Dec 30, 2017, 6:33:41 PM

How can it be annoying that plain char is unsigned? If it matters, then
use "signed char". And if it matters that it is 8-bit, then use "int8_t".

The only reason why "char" is signed on some systems is historic - it
stretches back to a time when C did not have the full complement of
signed/unsigned char/short/int/long. There is certainly no sensible
logical reason for a type called "char" to be signed - and very little
justification for it being "unsigned" or having arithmetic support at
all. It's just an artefact of the way C was first made.

>
>>> Although an easy fix there would have been to call them bytes instead.
>>
>> How would calling them bytes have helped?  Do you mean you would not be
>> making this point if C had a "byte" type exactly like it's current
>> "char" type?  Is it only the fact that they are called "char" that bugs
>> you?
>>
>
> *Sigh* K&R had it right - unless it's a float/double, it's an *int* and
> a *char* is an 8-bit int.
>
> Such a nice simplifying assumption and yet so confusing....?

And so limited! If C had not gained more types, it would have been
replaced.

>
> And don't get me started about C11 whinging about uint8_t * to calls
> like strlen() :)
>

It sometimes seems that whinging is the main purpose of this group.

Keith Thompson

Dec 30, 2017, 6:39:35 PM

Les Cargill <lcarg...@comcast.com> writes:
[...]
> K&R had char as a synonym for int of 8 bits. That should
> have continued, IMO...

"int" is a specific integer type, not a general term for "integer".

char is an integer type, but even in K&R1 (1978) its signedness
is implementation-defined.

Appendix A section 6.1:

Whether or not sign-extension occurs for characters is machine
dependent, but it is guaranteed that a member of the standard
character set is non-negative. Of the machines treated by this
manual, only the PDP-11 sign-extends.

The machines mentioned are PDP-11, Honeywell 6000, IBM 370, and
Interdata 8/32. (IBM 370 uses EBCDIC with 8-bit char, so plain
char pretty much has to be unsigned.)

Les Cargill

Jan 1, 2018, 12:42:44 AM

Keith Thompson wrote:
> Les Cargill <lcarg...@comcast.com> writes:
> [...]
>> K&R had char as a synonym for int of 8 bits. That should
>> have continued, IMO...
>
> "int" is a specific integer type, not a general term for "integer".
>

One would hope so.

> char is an integer type, but even in K&R1 (1978) its signedness
> is implementation-defined.
>

Even with GCC, the signedness is implementation-optional. I don't
recall the details, but one toolchain defaulted to unsigned, and it was
maddening.

> Appendix A section 6.1:
>
> Whether or not sign-extension occurs for characters is machine
> dependent, but it is guaranteed that a member of the standard
> character set is non-negative. Of the machines treated by this
> manual, only the PDP-11 sign-extends.
>
> The machines mentioned are PDP-11, Honeywell 6000, IBM 370, and
> Interdata 8/32. (IBM 370 uses EBCDIC with 8-bit char, so plain
> char pretty much has to be unsigned.)
>

I have to wonder how many machines from that list are still in the wild?

--
Les Cargill

Les Cargill

Jan 1, 2018, 12:55:40 AM

I do use uint8_t/int8_t, but they didn't. It's annoying because you
expect sign extension to work in some cases, and then it doesn't...

> The only reason why "char" is signed on some systems is historic - it
> stretches back to a time when C did not have the full complement of
> signed/unsigned char/short/int/long.  There is certainly no sensible
> logical reason for a type called "char" to be signed - and very little
> justification for it being "unsigned" or having arithmetic support at
> all.  It's just an artefact of the way C was first made.
>

Well, except that a char is just a constrained int, and ints are signed.

I agree that's not a good reason, but it's still a reason.

It's an arbitrary, axiomatic thing.

>>
>>>> Although an easy fix there would have been to call them bytes instead.
>>>
>>> How would calling them bytes have helped?  Do you mean you would not be
>>> making this point if C had a "byte" type exactly like it's current
>>> "char" type?  Is it only the fact that they are called "char" that bugs
>>> you?
>>>
>>
>> *Sigh* K&R had it right - unless it's a float/double, it's an *int* and
>> a *char* is an 8-bit int.
>>
>> Such a nice simplifying assumption and yet so confusing....?
>
> And so limited!  If C had not gained more types, it would have been
> replaced.
>

It's kind of a mad consumerist frenzy in a way. Once you start having
types, there's all kinds of arguments you can get into with yourself.

No, seriously: I watched a thing in defense of the Sherman tank
recently, and supporting 76 mm ammo plus the 75 mm ammo it already
supported would have caused all sorts of logistics problems, so they
didn't.

The 76 mm round would have made it better in tank-to-tank battles with
Panzers, but the use cases for tanks really didn't support that,
surprisingly. It didn't come up that much.

No, I don't want to explain that to the tankers either.


>>
>> And don't get me started about C11 whinging about uint8_t * to calls
>> like strlen() :)
>>
>
> It sometimes seems that whinging is the main purpose of this group.

It probably is. :)

--
Les Cargill

Robert Wessel

Jan 1, 2018, 2:29:40 AM

Certainly the S/370's descendants are still with us.

PDP-11 code lives on in at least a few places, although I suspect it's
mostly emulation now.

The Interdata stuff ended up with Concurrent Computer Corporation, who
were producing the 3200 series, which I *think* (no personal
experience) are at least partial descendants of the 7/32 and 8/32,
into at least the early 2000s. No idea as to the current status.

The GE-600/Honeywell 6000 ended up at Groupe Bull, who were supporting
it at least to the late nineties. Again, no idea on current status,
although I believe Groupe Bull is now owned by someone else.

David Brown

Jan 1, 2018, 11:08:04 AM

On 01/01/18 06:45, Les Cargill wrote:
> Keith Thompson wrote:
>> Les Cargill <lcarg...@comcast.com> writes:
>> [...]
>>> K&R had char as a synonym for int of 8 bits. That should
>>> have continued, IMO...
>>
>> "int" is a specific integer type, not a general term for "integer".
>>
>
> One would hope so.
>
>> char is an integer type, but even in K&R1 (1978) its signedness
>> is implementation-defined.
>>
>
> Even with GCC, the signedness is implementation-optional. I don't
> recall details but one toolchain defaulted to unsigned, and it was
> maddening,

I really don't get that. I think more of gcc's targets have unsigned by
default for their chars - and to the extent that the signedness of chars
makes sense, that is the sane choice. But not only do I not find it
"maddening" if an implementation has signed plain chars, or unsigned
plain chars - often I do not even /know/ which it is. If your code
depends on one or other choice here, your code is wrong.

>
>> Appendix A section 6.1:
>>
>>      Whether or not sign-extension occurs for characters is machine
>>      dependent, but it is guaranteed that a member of the standard
>>      character set is non-negative.  Of the machines treated by this
>>      manual, only the PDP-11 sign-extends.
>>
>> The machines mentioned are PDP-11, Honeywell 6000, IBM 370, and
>> Interdata 8/32.  (IBM 370 uses EBCDIC with 8-bit char, so plain
>> char pretty much has to be unsigned.)
>>
>
> I have to wonder how many machines from that list are still in the wild?
>

Who cares? Plain chars are usually unsigned on modern ISA's, and signed
on more legacy ISA's. It is not a fixed rule, and it doesn't matter anyway.

David Brown

Jan 1, 2018, 11:16:16 AM

No, you don't expect it to work. At least, /I/ don't expect it to work.
Code written long ago for a single implementation, written by someone
unaware of "signed char" might fail to work on other targets now. And
that is /exactly/ why gcc gives you the option of picking the signedness
of plain char - so that you can use such old code.

>
>> The only reason why "char" is signed on some systems is historic - it
>> stretches back to a time when C did not have the full complement of
>> signed/unsigned char/short/int/long.  There is certainly no sensible
>> logical reason for a type called "char" to be signed - and very little
>> justification for it being "unsigned" or having arithmetic support at
>> all.  It's just an artefact of the way C was first made.
>>
>
> Well, except that a char is just a constrained int, and ints are signed.
>
> I agree that's not a good reason, but it's still a reason.

That is stretching "reason" very thinly.

>
> It's an arbitrary, axiomatic thing.

It has never been arbitrary, and certainly is not axiomatic (if it were
axiomatic, there would be /one/ accepted correct choice). Having plain
char being signed was a logical choice in the days of pure ASCII (from
the same era as 7-bit communication protocols) and on machines that
automatically sign-extended all byte loads. Those days are long gone,
and unsigned plain chars make far more sense for 8-bit character sets.
For most uses, it simply does not matter - and in cases where it /is/
relevant, use "signed char" or "unsigned char".

>
>>>
>>>>> Although an easy fix there would have been to call them bytes instead.
>>>>
>>>> How would calling them bytes have helped?  Do you mean you would not be
>>>> making this point if C had a "byte" type exactly like it's current
>>>> "char" type?  Is it only the fact that they are called "char" that bugs
>>>> you?
>>>>
>>>
>>> *Sigh* K&R had it right - unless it's a float/double, it's an *int* and
>>> a *char* is an 8-bit int.
>>>
>>> Such a nice simplifying assumption and yet so confusing....?
>>
>> And so limited!  If C had not gained more types, it would have been
>> replaced.
>>
>
> It's kind of a mad consumerist frenzy in a way. Once you start having
> types there's all kind of arguments you can get in with yourself.
>
> No, seriously- I watched a thing in defense of the Sherman tank
> recently, and supporting 76MM ammo plus the 75 MM ammo it supported
> would have caused all sort of logistics problem, so they didn't.
>
> The 76 MM round would have made it better in tank-tank battles with
> Panzers but the use cases for tanks really didn't support that,
> surprisingly. It didn't come up that much.
>
> No, I don't want to explain that to the tankers either.
>

Electrons are more flexible than tank ammo. And no one is suggesting
that along with 16-bit and 32-bit types we also need 33-bit types.

Ben Bacarisse

Jan 1, 2018, 3:03:28 PM

Les Cargill <lcarg...@comcast.com> writes:
<text removed>
> Even with GCC, the signedness is implementation-optional. I don't
> recall details but one toolchain defaulted to unsigned, and it was
> maddening,

Why?

<text removed>
--
Ben.

Keith Thompson

Jan 1, 2018, 5:05:43 PM

David Brown <david...@hesbynett.no> writes:
[...]
> I really don't get that. I think more of gcc's targets have unsigned by
> default for their chars - and to the extent that signedness of char's
> makes sense, that is the sane choice. But not only do I not find it
> "maddening" if an implementation has signed plain chars, or unsigned
> plain chars - often I do not even /know/ which it is. If your code
> depends on one or other choice here, your code is wrong.
[...]

I've found that plain char is signed for most targets (at least for most
targets I use). But as you say, it shouldn't matter for most purposes.

David Brown

Jan 2, 2018, 3:21:08 AM

On 01/01/18 23:05, Keith Thompson wrote:
> David Brown <david...@hesbynett.no> writes:
> [...]
>> I really don't get that. I think more of gcc's targets have unsigned by
>> default for their chars - and to the extent that signedness of char's
>> makes sense, that is the sane choice. But not only do I not find it
>> "maddening" if an implementation has signed plain chars, or unsigned
>> plain chars - often I do not even /know/ which it is. If your code
>> depends on one or other choice here, your code is wrong.
> [...]
>
> I've found that plain char is signed for most targets (at least for most
> targets I use).

I believe you work mostly with "big" targets (the kind that run *nix
style OS's) - is that correct? It may be that there is a stronger
tradition for signed plain char in that area. I work mainly with small
embedded systems, and we make a lot more use of unsigned types than is
common in "traditional" C coding - plain char is often unsigned on such
systems. But the category split is not rigid or absolute, and certainly
not something to rely on.

> But as you say, it shouldn't matter for most purposes.
>

Ideally, it shouldn't matter for /any/ purpose. Unfortunately,
sometimes you might have to compile code that makes assumptions about
the signedness of plain char, and then it does matter.
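
A tiny made-up example of code whose observable behavior depends on that
choice (the commented results assume the usual two's-complement
conversions):

#include <stdio.h>

int main(void)
{
    char c = (char)0xE9;             /* 0xE9 is 'e'-acute in Latin-1 */

    if (c < 0)
        puts("plain char is signed here");
    else
        puts("plain char is unsigned here");

    printf("as an int: %d\n", c);    /* typically -23 if signed, 233 if unsigned */
    return 0;
}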

Scott Lurndal

Jan 2, 2018, 10:34:53 AM

"Rick C. Hodgin" <rick.c...@gmail.com> writes:
>On Tuesday, December 26, 2017 at 4:41:46 PM UTC-5, Richard Damon wrote:
>> On 12/26/17 3:20 PM, Rick C. Hodgin wrote:
>> > Is there a bounded memcpy()-like function which allows me to pass in a
>> > src pointer, starting offset, length of data at the pointer to copy,
>> > and a dst pointer and maximum size of the buffer in dst to receive the
>> > data, which will copy as much as is allowed, and if the starting
>> > offset exceeds the bounded size, perform no copy?

>
>The purpose is to allow a series of operations to silently pass through
>copy operations even if there's not enough space in the target, while
>still returning the size of the buffer's needs even if it wasn't yet
>allocated (or wasn't yet allocated sufficiently).

snprintf, using '%s' as the format specifier, fills all your
aforementioned needs.
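
A minimal sketch of the snprintf behavior being referred to, i.e.
truncation plus the "would have been written" return value:

#include <stdio.h>

int main(void)
{
    char buf[4];

    /* snprintf truncates to what fits (NUL-terminated) and returns the
       number of characters that would have been written, i.e. the
       needed length. */
    int needed = snprintf(buf, sizeof buf, "%s", "Hi, mom!");

    printf("buf=\"%s\", needed=%d\n", buf, needed);   /* buf="Hi,", needed=8 */
    return 0;
}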

Rick C. Hodgin

Jan 2, 2018, 10:38:08 AM

No. It falls short in a few ways, namely that you don't learn the full
offset needed, nor can you pass in the starting offset and bounds, which
are then tested to see whether a write into the target is needed at all.
It will also stop copying data once it reaches a NUL for the %s parameter.

The other examples people have given will work. And, I've written my
own as well which I am currently using.

--
Thank you, | Indianapolis, Indiana | God is love -- 1 John 4:7-9
Rick C. Hodgin | http://www.libsf.org/ | http://tinyurl.com/yaogvqhj

Keith Thompson

Jan 2, 2018, 12:21:14 PM

David Brown <david...@hesbynett.no> writes:
> On 01/01/18 23:05, Keith Thompson wrote:
>> David Brown <david...@hesbynett.no> writes:
>> [...]
>>> I really don't get that. I think more of gcc's targets have unsigned by
>>> default for their chars - and to the extent that signedness of char's
>>> makes sense, that is the sane choice. But not only do I not find it
>>> "maddening" if an implementation has signed plain chars, or unsigned
>>> plain chars - often I do not even /know/ which it is. If your code
>>> depends on one or other choice here, your code is wrong.
>> [...]
>>
>> I've found that plain char is signed for most targets (at least for most
>> targets I use).
>
> I believe you work mostly with "big" targets (the kind that run *nix
> style OS's) - is that correct?

Yes. Even the embedded systems I work on run Linux kernels.

> It may be that there is a stronger
> tradition for signed plain char in that area. I work mainly with small
> embedded systems, and we make a lot more use of unsigned types than is
> common in "traditional" C coding - plain char is often unsigned on such
> systems. But the category split is not rigid or absolute, and certainly
> not something to rely on.
>
>> But as you say, it shouldn't matter for most purposes.
>
> Ideally, it shouldn't matter for /any/ purpose. Unfortunately,
> sometimes you might have to compile code that makes assumptions about
> the signedness of plain char, and then it does matter.

It can be mildly inconvenient that string literals have elements
that are signed. I/O is defined in terms of unsigned chars; for
example fgetc() returns the next character as an unsigned char
converted to an int. strcmp() operates on arrays of plain char,
but treats the elements as unsigned char. 7.24.4 says that the
characters are "interpreted as unsigned char", not converted.

In practice we depend on conversions between char and unsigned
char to behave sensibly. I'm not sure that all such assumptions
are guaranteed to be valid on non-2's-complement systems.

One place where this becomes visible in code is that the argument
to isdigit() et al has to be converted to unsigned char.
`isdigit(s[i])` is unsafe if s is a pointer to a string that
might contain characters with negative values; you have to write
`isdigit((unsigned char)s[i])`.

Another potential glitch, not related to signedness: The conventional
input loop:
int c;
while ((c = getchar()) != EOF) { /* ... */ }
can terminate early if CHAR_BIT>=16 and sizeof(int)==1.
(Few programmers are likely to encounter such systems.)
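
A small self-contained sketch of the isdigit() hazard and the cast that
avoids it (the Latin-1 byte is just an illustrative example):

#include <ctype.h>
#include <stdio.h>

int main(void)
{
    const char s[] = "caf\xE9";      /* 0xE9: negative if plain char is signed */
    size_t i;

    for (i = 0; s[i] != '\0'; i++) {
        /* isdigit(s[i]) could pass a negative value (undefined behavior);
           the cast keeps the argument representable as unsigned char. */
        if (isdigit((unsigned char)s[i]))
            printf("digit at %zu\n", i);
    }
    return 0;
}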

William Ahern

Jan 4, 2018, 7:30:22 PM

James Kuyper <james...@verizon.net> wrote:
> On 12/29/2017 09:16 PM, William Ahern wrote:
>> Patrick.Schluter <Patrick....@free.fr> wrote:
>>> Le 27.12.2017 à 18:50, Rick C. Hodgin a écrit :
<snip>
>>>> // Are we still bounded?
>>>> if (dst_offset < dst_size)
>>>> dst[dst_offset] = src[src_offset]; // Yes
>>>> }
<snip>
>> One aspect about C programming that bothers me is that pointer arithmetic is
>> signed (more or less) but constraints often use unsigned arithmetic. Using
>> unsigned arithmetic for constraints makes plenty of sense to me, especially
>> in code that needs to track out-of-bounds positions. Nonetheless, on its
>> face I'm not sure that dst_offset<dst_size necessarily implies that
>> dst+dst_offset is a valid address or that it can even be safely evaluated.
>
> There's nothing that bounded_memcpy() can do to avoid that problem. It's
> the responsibility of the caller to make sure that dst+dst_size is an
> expression with defined behavior. If the caller has in fact arranged
> that, then given the way the standard defines addition of an integer to
> a pointer, and the fact that dst_size and offset are both unsigned,
> offset<=dst_size is sufficient to guarantee that dst+offset has defined
> behavior.

C11 6.5.6p9 says,

... if the expression P points either to an element of an array object
or one past the last element of an array object, and the expression Q
points to the last element of the same array object, the expression
((Q)+1)-(P) has the same value as ((Q)-(P))+1

On a typical x86 32-bit system PTRDIFF_MAX is 2^31-1 and SIZE_MAX is 2^32-1.
With PAE regular user processes should be able to address the full 4GB, or
at least mmap files at least as large as 3GB. (I put together a test program
using ftruncate and mmap, but the only 32-bit system I have access to is
OpenBSD, which won't allow me to mmap more than 2^30.)

So you should be able to allocate a char array of size PTRDIFF_MAX+1. Say P
points to the first element and Q points to the last element. That means Q-P
is precisely PTRDIFF_MAX, (Q-P)+1 evaluates to PTRDIFF_MAX+1 in a
well-defined manner, but (Q+1)-P isn't representable as ptrdiff_t and thus
invokes undefined behavior.

But that violates the identity declared by the standard at least if we take
it literally. And I don't see why we shouldn't take it literally. I don't
think the standard refers to a particular "value" in any other context where
deriving the value may invoke undefined behavior. Where else does the
standard effectively say, "expression A produces the same values as
expression B for all valid inputs, except where evaluation of B might
produce undefined behavior where it wouldn't for A." I'm not saying that you
couldn't come up with such wording, just that where the standard discusses
such inconsistencies it does so in a different manner.

>> On the one hand, C11 6.5.6p8 says,
>>
>> If both the pointer operand and the result point to elements of the same
>> array object, or one past the last element of the array object, the
>> evaluation shall not produce an overflow
>>
>> On the other hand p9 says,
>>
>> When two pointers are subtracted ... [if] the result is not representable
>> in an object of [ptrdiff_t (a signed integer type)], the behavior is
>> undefined.
>>
>> It's a subtle issue--inapplicable to the code using memcpy but relevant to
>> the looping code, yet why that should be is not at all obvious.
>
> I don't see any pointer subtractions anywhere in the above code. The
> only subtractions are of unsigned integers, and are performed only when
> the result of the subtraction is guaranteed to be non-negative. Does
> "the looping code" that you're referring to reside in some other message?

It's not that the code itself does subtraction, it's that the identity
between ((Q)+1)-(P) and ((Q)-(P))+1 stated by the standard implies an
equivalency between the signed (or signed-like) arithmetic of
pointer-pointer subtraction and pointer-integer addition.[1] Thus, it's not
clear that (P)+N necessarily behaves like unsigned arithmetic. At the same
time, the identity and other rules don't necessarily exclude implementations
which support arrays with more than PTRDIFF_MAX elements.
malloc(PTRDIFF_MAX+1) might work just fine, but to derive pointers past
PTRDIFF_MAX+1 you may have to do so using multiple expressions.

[1] The point of equating (Q+1)-P to (Q-P)+1 is to highlight the symmetry of
the arithmetic behavior, particularly where P>Q, i.e. where Q-P results in
(ptrdiff_t)-1. You could argue that I'm unnecessarily assuming that the
undefinedness of signed arithmetic overflow applies to pointer arithmetic.
But it's the standard that says P-Q is undefined where it's not
representable as ptrdiff_t, which is much like (if not precisely like) how
it describes the rules regarding integer arithmetic overflow.

James R. Kuyper

Jan 5, 2018, 10:12:46 AM

On 01/04/2018 07:19 PM, William Ahern wrote:
> James Kuyper <james...@verizon.net> wrote:
>> On 12/29/2017 09:16 PM, William Ahern wrote:
...
>>> One aspect about C programming that bothers me is that pointer arithmetic is
>>> signed (more or less) but constraints often use unsigned arithmetic. Using
>>> unsigned arithmetic for constraints makes plenty of sense to me, especially
>>> in code that needs to track out-of-bounds positions. Nonetheless, on its
>>> face I'm not sure that dst_offset<dst_size necessarily implies that
>>> dst+dst_offset is a valid address or that it can even be safely evaluated.
>>
>> There's nothing that bounded_memcpy() can do to avoid that problem. It's
>> the responsibility of the caller to make sure that dst+dst_size is an
>> expression with defined behavior. If the caller has in fact arranged
>> that, then given the way the standard defines addition of an integer to
>> a pointer, and the fact that dst_size and offset are both unsigned,
>> offset<=dst_size is sufficient to guarantee that dst+offset has defined
>> behavior.
>
> C11 6.5.6p9 says,
>
> ... if the expression P points either to an element of an array object
> or one past the last element of an array object, and the expression Q
> points to the last element of the same array object, the expression
> ((Q)+1)-(P) has the same value as ((Q)-(P))+1
>
> On a typical x86 32-bit system PTRDIFF_MAX is 2^31-1 and SIZE_MAX is 2^32-1.
> With PAE regular user processes should be able to address the full 4GB, or
> at least mmap files at least as large as 3GB. (I put together a test program
> using ftruncate and mmap, but the only 32-bit system I have access to is
> OpenBSD, which won't allow me to mmap more than 2^30.)
>
> So you should be able to allocate a char array of size PTRDIFF_MAX+1. Say P

Since ptrdiff_t is a signed type, unless PTRDIFF_MAX < INT_MAX,
PTRDIFF_MAX+1 involves signed overflow, and therefore undefined
behavior; on typical 2's complement systems the undefined behavior will
often take the form of giving a result that's negative.

> points to the first element and Q points to the last element. That means Q-P
> is precisely PTRDIFF_MAX, (Q-P)+1 evaluates to PTRDIFF_MAX+1 in a
> well-defined manner, ...

As indicated above, no, it does not.

> ... but (Q+1)-P isn't representable as ptrdiff_t and thus
> invokes undefined behavior.

Correct, and when that is the case, the undefined behavior of that
expression overrides the promise implied by the identity that is
expressed in terms of that expression.

>>> On the one hand, C11 6.5.6p8 says,
>>>
>>> If both the pointer operand and the result point to elements of the same
>>> array object, or one past the last element of the array object, the
>>> evaluation shall not produce an overflow
>>>
>>> On the other hand p9 says,
>>>
>>> When two pointers are subtracted ... [if] the result is not representable
>>> in an object of [ptrdiff_t (a signed integer type)], the behavior is
>>> undefined.
>>>
>>> It's a subtle issue--inapplicable to the code using memcpy but relevant to
>>> the looping code, yet why that should be is not at all obvious.
>>
>> I don't see any pointer subtractions anywhere in the above code. The
>> only subtractions are of unsigned integers, and are performed only when
>> the result of the subtraction is guaranteed to be non-negative. Does
>> "the looping code" that you're referring to reside in some other message?
>
> It's not that the code itself does subtraction, it's that the identity
> between ((Q)+1)-(P) and ((Q)-(P))+1 stated by the standard implies an
> equivalency between the signed (or signed-liked) arithmetic of
> pointer-pointer subtraction and pointer-integer addition.[1] Thus, it's not
> clear that (P)+N necessarily behaves like unsigned arithmetic.

Unsigned types:
U1. Cannot store a negative value.
U2. If the promoted type is unsigned, the difference cannot be negative.
U3. An expression with a mathematical value not representable in that
type has a well-defined result.

Signed types:
S1. Can store a negative value.
S2. The promoted type will always be signed, and the difference can be
negative.
S3. An expression with a mathematical value not representable in that
type has undefined behavior

Pointer types:
P1. The standard doesn't attach any meaning to the concept of a negative
pointer value (or, for that matter, a positive one).
P2. The type of the difference between pointer values is guaranteed to
be signed, and the result can be negative.
P3. Addition of an integer value to a pointer value, or subtraction of
an integer value from a pointer value, has undefined behavior if the
result doesn't point into or one past the end of the same array that the
pointer operand pointed into (or one past the end of). Representability
of the result is completely irrelevant.
Subtraction of two pointer values has undefined behavior unless they
both point into or one past the end of the same array. Representability
of the result does matter, but it's not the only thing that determines
whether the behavior is undefined.

I'd say that P2 is more similar to S2 than to U2. However, while P3 and
S3 both involve undefined behavior, making them more similar to each
other than either of them is to U3, the circumstances under which that
behavior occurs are quite different. Therefore, I think that making any
analogy between signed arithmetic and pointer arithmetic isn't
particularly useful. This is particularly true because most arithmetic
operators defined for integer types are constraint violations when
applied to pointer types: unary: + - ~ binary: * / % >> << & | ^
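
A tiny sketch contrasting U2/U3 with P2 from the list above:

#include <stdio.h>
#include <stddef.h>

int main(void)
{
    /* U2/U3: an unsigned difference can't be negative; it wraps with a
       well-defined result. */
    unsigned a = 2, b = 5;
    printf("%u\n", a - b);           /* UINT_MAX - 2 */

    /* P2: a pointer difference has the signed type ptrdiff_t and can be
       negative. */
    int arr[5];
    int *p = &arr[1], *q = &arr[4];
    printf("%td\n", p - q);          /* -3 */

    return 0;
}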

supe...@casperkitty.com

Jan 5, 2018, 1:40:37 PM

On Friday, January 5, 2018 at 9:12:46 AM UTC-6, James R. Kuyper wrote:
> Since ptrdiff_t is a signed type, unless PTRDIFF_MAX < INT_MAX,
> PTRDIFF_MAX+1 involves signed overflow, and therefore undefined
> behavior; on typical 2's complement systems the undefined behavior will
> often take the form of giving a result that's negative.

On many implementations there will be no defined sequence of events
that could create an object larger than PTRDIFF_MAX bytes. If no such
objects can exist on a particular implementation, there would be no need
to define the behavior of subtracting pointers that differ by more than
that amount.

On an implementation which uses 16-bit word addresses, and where a byte
pointer consists of an 16-bit word address along with a byte offset, the
difference between two pointers could be larger than UINT_MAX. If such
an implementation processes calloc(4, 20000u) by returning a 40000-word
allocation, it may be able to properly evaluate the difference between
uint32_t* pointers which identify the start and end of the allocation.
Forbidding such an implementation from honoring the allocation request
would seem less helpful than allowing it, but recognizing that casting
the pointers to char* before the subtract may yield meaningless results.

On linear-address implementations where a single object could occupy more
than half the address space, but where integer math uses silent wraparound
two's-complement semantics, the difference between two character pointers
might overflow, but would always do so in a way that still obeys all the
expected transitive and associative relations.

I don't think the authors of the Standard wanted to imply that
implementations of the first sort should be viewed as defective if they
allow calloc() to return non-null when code asks for an object larger than
PTRDIFF_MAX bytes, but nor did they want to imply that code targeting the
second type of implementation should be viewed as defective if it requests
objects between PTRDIFF_MAX and SIZE_MAX bytes and expects that pointer
arithmetic involving them will behave transitively. Unfortunately, the Standard
generally ignores behaviors which would naturally be defined on some
implementations but not others, based upon things like the relative sizes
and storage formats of their different types.

William Ahern

Jan 5, 2018, 5:45:19 PM

James R. Kuyper <james...@verizon.net> wrote:
> On 01/04/2018 07:19 PM, William Ahern wrote:
<snip>
>> So you should be able to allocate a char array of size PTRDIFF_MAX+1. Say P
>
> Since ptrdiff_t is a signed type, unless PTRDIFF_MAX < INT_MAX,
> PTRDIFF_MAX+1 involves signed overflow, and therefore undefined
> behavior; on typical 2's complement systems the undefined behavior will
> often take the form of giving a result that's negative.

I should have written (size_t)PTRDIFF_MAX+1, or conversely used different
language or notation that couldn't be conflated with compliant C code.

But thank you for interpreting my pseudo-code literally and then deducing
falsehood by illustrating the presence of undefined behavior. That more than
anything shows that you recognize my point ;)

>> points to the first element and Q points to the last element. That means Q-P
>> is precisely PTRDIFF_MAX, (Q-P)+1 evaluates to PTRDIFF_MAX+1 in a
>> well-defined manner, ...
>
> As indicated above, no, it does not.
>
>> ... but (Q+1)-P isn't representable as ptrdiff_t and thus
>> invokes undefined behavior.
>
> Correct, and when that is the case, the undefined behavior of that
> expression overrides the promise implied by the identity that is
> expressed in terms of that expression.

Conversely, we can avoid extrinsic qualification by not assuming that the
standard defines the behavior of P+N where N is greater than PTRDIFF_MAX.

I originally said that it wasn't clear to me that such a case was
well-defined, not that I didn't think it could be interpreted to be
well-defined or that implementations didn't behave as such.
I agree that the analogies are unproductive, if not unwarranted. But the
particular manner in which the standard rigorously defines pointer
arithmetic invites comparison and equivocation to integer arithmetic
behaviors.

In the context of a discussion about designing what are often called "safe"
primitives, we should be especially concerned with not only pointing out
flaws, but pointing out a failure to affirmatively prove correctness. Take a
function like the following

char *lc(char *src, size_t len) {
    char *p = src;
    while (p - src < srclen) {
        unsigned char c = *p;
        *(unsigned char *)p++ = tolower(c);
    }
    return src;
}

which uses identical bounds-checking logic to the supposedly correct
solution used in CERT C Coding Standard rule STR37-C.[1]

It's not "safe" for all valid inputs, presuming the implementation properly
supports objects greater than PTRDIFF_MAX in size. You explain the reason
why it's problematic and propose

char *lc(char *src, size_t len) {
    char *p = src, *pe = src + len;
    while (p < pe) {
        unsigned char c = *p;
        *(unsigned char *)p++ = tolower(c);
    }
    return src;
}

The engineer smartly asks, "but if p - src might be undefined, why wouldn't
src + len be undefined for the same inputs". The best we can say in defense
of the supposedly better implementation is, apparently, that it is because
it is.[2] Alternatively, we could say the situation will never happen--that
you'll never see such a scenario--but in that case there's nothing wrong
with the first implementation. But how do you positively prove it would
never happen? You can't, because the standard doesn't necessarily rule it
out either, and in fact it's absolutely something we should expect. So maybe
we use

char *lc(char *src, size_t len) {
    char *p = src;
    while (len--) {
        unsigned char c = *p;
        *(unsigned char *)p++ = tolower(c);
    }
    return src;
}

But _why_ we wrote it that way is difficult to explain, to say the least.

[1] From https://wiki.sei.cmu.edu/confluence/display/c/STR37-C.+Arguments+to+character-handling+functions+must+be+representable+as+an+unsigned+char

Compliant Solution

This compliant solution casts the character to unsigned char before
passing it as an argument to the isspace() function:

#include <ctype.h>
#include <string.h>

size_t count_preceding_whitespace(const char *s) {
    const char *t = s;
    size_t length = strlen(s) + 1;
    while (isspace((unsigned char)*t) && (t - s < length)) {
        ++t;
    }
    return t - s;
}

[2] We could say that if such a large object were ever validly instantiated
in the environment (by malloc or something non-standard like mmap) that it
must be okay at least in that environment. But especially in environments
where the left hand rarely knows what the right hand is doing (one group
writes the kernel, another writes libc, a third writes the compiler) it's
not very prudent to assume that everybody is on the same page, particularly
when you're tasked with writing "safe" (resilient, robust, simple,
transparent, whatever) library code. The point of a standard is to provide a
shared target, which is a far more reliable, consistent, and most of all
efficient way to coordinate at scale.

supe...@casperkitty.com

unread,
Jan 5, 2018, 6:47:52 PM1/5/18
to
The Standard does not specify that an expression with integer type is
converted to ptrdiff_t before it is added to or subtracted from a pointer.
If the size of an object would fit in ptrdiff_t, then adding a value which
is too large to fit in ptrdiff_t would result in Undefined Behavior. The
Standard could perhaps benefit from clarification about whether:

    char *p = calloc(2, PTRDIFF_MAX+1uLL);
    if (p)
        p[PTRDIFF_MAX*2uLL+1] = 5;

would be required to either set p to a null pointer or else set it to the
start of an allocation whose last byte gets the value 5 written to it.
Nothing in the Standard would forbid calloc() from returning a null pointer
in this case. I was a bit surprised to find, searching for "calloc", that the
Standard doesn't specify whether implementations are required to return null
in all cases where the arithmetically-correct size of the object would
exceed the largest size the allocator can handle (as opposed to multiplying
the two operands using type size_t and allocating that amount of space).
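
The check in question is easy to express; a sketch (checked_calloc is a
hypothetical wrapper name, not a standard function):

#include <stdint.h>
#include <stdlib.h>

/* Sketch only: return NULL whenever the arithmetically-correct size
   nmemb * size cannot be represented in size_t; otherwise defer to
   calloc, which may still refuse the request for its own reasons. */
void *checked_calloc(size_t nmemb, size_t size) {
    if (size != 0 && nmemb > SIZE_MAX / size)
        return NULL;
    return calloc(nmemb, size);
}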

> [2] We could say that if such a large object were ever validly instantiated
> in the environment (by malloc or something non-standard like mmap) that it
> must be okay at least in that environment. But especially in environments
> where the left hand rarely knows what the right hand is doing (one group
> writes the kernel, another writes libc, a third writes the compiler) it's
> not very prudent to assume that everybody is on the same page, particularly
> when you're tasked with writing "safe" (resilient, robust, simple,
> transparent, whatever) library code. The point of a standard is to provide a
> shared target, which is a far more reliable, consistent, and most of all
> efficient way to coordinate at scale.

What is necessary in many cases is not to have a Standard specify individual
behaviors, but specify the relationships among behaviors. For example, it
would make sense to say that an implementation must define the behavior of
adding 0 to a pointer returned by a successful malloc/calloc/etc. call as
yielding a pointer matching the original, and likewise define the behavior of
passing such a pointer to memcpy/memmove/fread/fwrite/etc. when the size is
zero. If a successful malloc(0) returns a null pointer, then implementations
would have to define the behavior of such operations on null pointers. If
a successful malloc(0) can never return a null pointer, such obligations need
not apply.
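
In the meantime, portable code has to supply such guarantees itself; a minimal
sketch of the zero-size case (the helper name is illustrative only):

#include <stddef.h>
#include <string.h>

/* Sketch: tolerate n == 0 together with a possibly-null pointer (e.g. a
   result of malloc(0) on implementations where it returns NULL), since
   memcpy itself requires valid pointers even when the size is zero. */
void copy_bytes(void *dst, const void *src, size_t n) {
    if (n != 0)
        memcpy(dst, src, n);
}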

James Kuyper

unread,
Jan 6, 2018, 5:04:09 PM1/6/18
to
On 01/05/2018 05:30 PM, William Ahern wrote:
> James R. Kuyper <james...@verizon.net> wrote:
>> On 01/04/2018 07:19 PM, William Ahern wrote:
> <snip>
>>> So you should be able to allocate a char array of size PTRDIFF_MAX+1. Say P
>>
>> Since ptrdiff_t is a signed type, unless PTRDIFF_MAX < INT_MAX,
>> PTRDIFF_MAX+1 involves signed overflow, and therefore undefined
>> behavior; on typical 2's complement systems the undefined behavior will
>> often take the form of giving a result that's negative.
>
> I should have written (size_t)PTRDIFF_MAX+1, or conversely used different
> language or notation that couldn't be conflated with compliant C code.
>
> But thank you for interpreting my pseudo-code literally and then deducing
> falsehood by illustrating the presence of undefined behavior.

My first draft treated it as just pseudo-code, so I simply pointed out
that you should have warned people that PTRDIFF_MAX+1 should not be
interpreted as a C expression. However, I then noticed your claim that
"(Q-P)+1 evaluates to PTRDIFF_MAX+1 in a well-defined manner", and
concluded that you were in fact unaware that the addition has undefined
behavior. That's when I decided to re-write my response based upon the
possibility that you didn't know that PTRDIFF_MAX+1 probably has
undefined behavior.

Other things you've written, in both this message and the previous one,
suggest that you were in fact aware that it has undefined behavior - but
then why use the incorrect phrase "well-defined" in that sentence? I
was, and remain, confused by that conflict.

...
>>> ... but (Q+1)-P isn't representable as ptrdiff_t and thus
>>> invokes undefined behavior.
>>
>> Correct, and when that is the case, the undefined behavior of that
>> expression overrides the promise implied by the identity that is
>> expressed in terms of that expression.
>
> Conversely, we can avoid extrinsic qualification by not assuming that the
> standard defines the behavior of P+N where N is greater than PTRDIFF_MAX.

That the standard defines the behavior of that expression is not an
assumption, it's a conclusion derived from the words from the standard
containing that definition:

"... if the expression P points to the i-th element of an array object,
the expressions (P)+N ... (where N has the value n) point[s] to ... the
i+n-th ... element of the array object, provided [it] exist[s]." (6.5.6p8).

In this definition, i+n must be interpreted as a mathematical
expression, rather than a C expression. If the array object contains at
least i+n objects, then it's perfectly clear what that definition means,
even if n>PTRDIFF_MAX. There are no direct constraints on the value of
n. The only indirect constraints on n are that "If both the pointer
operand and the result point to elements of the same array object, or
one past the last element of the array object, the evaluation shall not
produce an overflow; otherwise, the behavior is undefined." There's no
constraint on n based upon comparing it with PTRDIFF_MAX.
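
A sketch of what that definition permits, assuming an implementation on which
such an allocation can actually succeed (the function name is illustrative
only):

#include <stdint.h>
#include <stdlib.h>

void large_index_sketch(void) {
    size_t n = (size_t)PTRDIFF_MAX + 2;
    unsigned char *p = malloc(n);
    if (p) {
        p[n - 1] = 0;   /* p + (n-1): defined by 6.5.6p8, since the element
                           exists, even though n-1 exceeds PTRDIFF_MAX      */
        /* (&p[n-1]) - p, by contrast, is not representable as ptrdiff_t,
           so 6.5.6p9 makes that subtraction undefined.                     */
    }
    free(p);
}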

...
> In the context of a discussion about designing what are often called "safe"
> primitives, we should be especially concerned with not only pointing out
> flaws, but pointing out a failure to affirmatively prove correctness. Take a
> function like the following
>
> char *lc(char *src, size_t len) {
>     char *p = src;
>     while (p - src < srclen) {

I'm going to assume that srclen and len were supposed to be the same. If
that assumption is incorrect, please explain.

>         unsigned char c = *p;
>         *(unsigned char *)p++ = tolower(c);
>     }
>     return src;
> }
>
> which uses identical bounds-checking logic to the supposedly correct
> solution used in CERT C Coding Standard rule STR37-C.[1]
>
> It's not "safe" for all valid inputs, presuming the implementation properly
> supports objects greater than PTRDIFF_MAX in size. You explain the reason
> why it's problematic and propose
>
> char *lc(char *src, size_t len) {
>     char *p = src, *pe = src + len;
>     while (p < pe) {
>         unsigned char c = *p;
>         *(unsigned char *)p++ = tolower(c);
>     }
>     return src;
> }
>
> The engineer smartly asks, "but if p - src might be undefined, why wouldn't
> src + len be undefined for the same inputs".

p-src can have undefined behavior because "If the result is not
representable in an object of that type [ptrdiff_t], the behavior is
undefined." (6.5.6p9). The engineer cannot cite a corresponding clause
that makes the behavior of src+len undefined.

Let me moderate that statement slightly. The engineer cannot point to
any restriction based, directly or indirectly, upon comparing the value
of len with PTRDIFF_MAX. However, he can point to undefined behavior so
long as len > 1, by "omission of any explicit definition of behavior"
(4p6), based upon the fact that len is (presumably) the actual length of
the array. I presume that this point is NOT the one you're arguing
about. Let me explain that point:

The standard, as written, implies that the only way to create a pointer
one past the end of an array is by adding 1 to a pointer to the last
element of the array. Many people believe that if P points to the end
of the array, and P-n points within the array, then the standard
requires (P-n)+(n+1) to be an alternative way of calculating such a
pointer. They reach this conclusion by rearranging the math into P-n+n+1
=> P+(-n+n+1) => P+1; but the standard doesn't provide justification for
such a rearrangement. Its definition of how pointer addition and
subtraction works in general would require that rearrangement to be
correct, if P pointed anywhere else in that array, so long as P-n also
pointed inside the array. However, note that the definition (which I
quoted earlier), ends with the phrase "provided they exist." When the
result would point one past the end of the array, the corresponding
element does NOT exist, and that definition does not apply, and
therefore, cannot be used to justify such rearrangements.
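
In code form, for a small array, the two formations being contrasted look like
this (both yield the same pointer on essentially all real implementations; the
question is only whether the second is covered by the quoted wording; the
function name is illustrative only):

#include <assert.h>

void one_past_end_sketch(void) {
    int a[10];
    int *last = &a[9];
    int *end1 = last + 1;  /* adding 1 to a pointer to the last element:
                              the formation the wording explicitly covers */
    int *mid  = last - 4;  /* points within the array                     */
    int *end2 = mid + 5;   /* (P-n)+(n+1): also lands one past the end,
                              but "provided they exist" arguably does not
                              cover this formation                        */
    assert(end1 == end2);
}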

However, the committee intended that this should work, and it does in
fact work for essentially all real world implementations. I have, in
fact, had a great deal of trouble convincing many people that the
wording of the standard doesn't match that intent. Even if I'm right,
it's certainly not what your argument is about.

> ... The best we can say in defense
> of the supposedly better implementation is, apparently, that it is because
> it is.[2]

I prefer saying "the worse implementation can have undefined behavior in
cases where the better one only has undefined behavior if you believe
the arguments of one crazy pedant that the words of the standard don't
correctly implement the intent of the committee. No real world
implementation has a problem with src+len, where src points at the start
of an array and len is the number of elements in that array."

Tim Rentsch

unread,
Jan 7, 2018, 1:19:18 PM1/7/18
to
William Ahern <wil...@25thandClement.com> writes:

> Patrick.Schluter <Patrick....@free.fr> wrote:
>
>> On 27.12.2017 at 18:50, Rick C. Hodgin wrote:
>>
>>> On Tuesday, December 26, 2017 at 4:41:46 PM UTC-5, Richard Damon wrote:
>>>
>>>> On 12/26/17 3:20 PM, Rick C. Hodgin wrote:
>>>>
>>>>> Is there a bounded memcpy()-like function which allows me to pass in a
>>>>> src pointer, starting offset, length of data at the pointer to copy,
>>>>> and a dst pointer and maximum size of the buffer in dst to receive the
>>>>> data, which will copy as much as is allowed, and if the starting
>>>>> offset exceeds the bounded size, perform no copy?
>>>>>
>>>>> // Something like:
>>>>> size_t bounded_memcpy(void *dst, size_t dst_size, size_t offset,
>>>>> void *src, size_t src_size);
>>>>>
>>>>> So that if I call with these it will only copy up to the max:
>>>>>
>>>>> char buffer[2];
>>>>> size_t offset;
>>>>>
>>>>> // Populate buffer[]
>>>>> offset = bounded_memcpy(buffer, sizeof(buffer), 0, "Hi, ", 4);
>>>>> offset = bounded_memcpy(buffer, sizeof(buffer), offset, "mom!", 4);
>>>>>
>>>>> The end result is that buffer[3] contains only "Hi" even though much
>>>>> more data was attempted.
>>>>
>>>> I know of no such function in the C standard, and it doesn't sound like
>>>> it is defined the way C tends to define standard libraryy functions. It
>>>> wouldn't take much effort to define such a function, perhaps using
>>>> memcpy as a core to do the actual copy.
>>>>
>>>> Providing a base buffer and an offset isn't the normal way C libraries
>>>> work, they tend to just have you pass base+offset instead. Also the
>>>> memory functions don't have size limits, that capability is more normal
>>>> in the string functions where you don't know for sure long the source
>>>> data string is.
>>>
>>> The purpose is to allow a series of operations to silently pass through
>>> copy operations even if there's not enough space in the target, while
>>> still returning the size of the buffer's needs even if it wasn't yet
>>> allocated (or wasn't yet allocated sufficiently).
>>>
>>> -----
>>> I sent supercat an email on an optimization question, but I don't think
>>> I have a valid email address ... or he's ignoring me. I had a thought
>>> regarding writing this type of function this way...:
>>>
>>> #define u32 unsigned _int32_t
>>> #define s8 char
>>>
>>> u32 bounded_memcpy(s8* dst, u32 dst_size, u32 dst_offset,
>>> s8* src, u32 src_size)
>>> {
>>> u32 src_offset;
>>>
>>>
>>> // Iterate for each character in src
>>> for (src_offset = 0; src_offset < src_size; ++src_offset,
>>> ++dst_offset)
>>> {
>>> // Are we still bounded?
>>> if (dst_offset < dst_size)
>>> dst[dst_offset] = src[src_offset]; // Yes
>>> }
>>>
>>> // Indicate our new offset
>>> return(dst_offset);
>>> }
>>>
>>> ...and have the optimizing compiler recognize that dst, dst_size, and
>>> dst_offset are constraints on the target, and that src and src_size
>>> are constraints on the input, and to recognize it's a copy operation
>>> and rewrite this code as a fundamental series of correct single
>>> memcpy() calls and bypass logic by extracting the logic in the for()
>>> loop and if() statement all in the optimizer.
>>
>> Yes, one can believe in fairies or one can implement it with memcpy
>>
>> size_t bounded_memcpy(void *dst, size_t dst_size, size_t offset,
>> const void *src, size_t src_size)
>> {
>> if(offset < dst_size) {
>> memcpy((char*)dst+offset, src, dst_size - offset < src_size ?
>> dst_size - offset :
>> src_size);
>> }
>> return offset+src_size;
>> }
>>
>> which has the advantage to touch the source only when there's really
>> something to copy.
>
> Both implementations should probably check for overflow--dst_offset++ in the
> former, offset + src_size in the latter.
>
> One aspect about C programming that bothers me is that pointer arithmetic is
> signed (more or less) but constraints often use unsigned arithmetic. Using
> unsigned arithmetic for constraints makes plenty of sense to me, especially
> in code that needs to track out-of-bounds positions. Nonetheless, on its
> face I'm not sure that dst_offset<dst_size necessarily implies that
> dst+dst_offset is a valid address or that it can even be safely evaluated.
>
> On the one hand, C11 6.5.6p8 says,
>
> If both the pointer operand and the result point to elements of the same
> array object, or one past the last element of the array object, the
> evaluation shall not produce an overflow
>
> On the other hand p9 says,
>
> When two pointers are subtracted ... [if] the result is not representable
> in an object of [ptrdiff_t (a signed integer type)], the behavior is
> undefined.

To me it seems clear what is intended: adding a pointer and an
integer must work in all cases (inside the array limits, of
course), but subtracting one pointer from another is defined
only when the difference is inside the range of ptrdiff_t.

> It's a subtle issue--inapplicable to the code using memcpy but relevant to
> the looping code, yet why that should be is not at all obvious.

Because all pointers are wide enough to point to any valid
address, but ptrdiff_t might not be wide enough to capture all
differences. For addition, undersized integers are promoted "up"
to the width of a pointer, but for subtraction, results are
converted "down" to the width of ptrdiff_t.

Why the Standard chooses to allow such an incongruity is not
obvious (historical precedent?), but given that it does allow it
the stated rules seem a natural consequence.

Tim Rentsch

unread,
Jan 7, 2018, 1:33:36 PM1/7/18
to
Richard Damon <Ric...@Damon-Family.org> writes:

> On 12/29/17 9:16 PM, William Ahern wrote:
>
>> Both implementations should probably check for overflow--dst_offset++ in the
>> former, offset + src_size in the latter.
>>
>> One aspect about C programming that bothers me is that pointer arithmetic is
>> signed (more or less) but constraints often use unsigned arithmetic. Using
>> unsigned arithmetic for constraints makes plenty of sense to me, especially
>> in code that needs to track out-of-bounds positions. Nonetheless, on its
>> face I'm not sure that dst_offset<dst_size necessarily implies that
>> dst+dst_offset is a valid address or that it can even be safely evaluated.
>>
>> On the one hand, C11 6.5.6p8 says,
>>
>> If both the pointer operand and the result point to elements of the same
>> array object, or one past the last element of the array object, the
>> evaluation shall not produce an overflow
>>
>> On the other hand p9 says,
>>
>> When two pointers are subtracted ... [if] the result is not representable
>> in an object of [ptrdiff_t (a signed integer type)], the behavior is
>> undefined.
>>
>> It's a subtle issue--inapplicable to the code using memcpy but relevant to
>> the looping code, yet why that should be is not at all obvious.
>
> Pointers are more like unsigned numbers, but not really. Pointer
> arithmetic is defined only for pointer values that point within, or
> one past, a given array.
>
> dst+dst_offset is defined as the equivalent of &dst[dst_offset] which
> if dst_offset is no bigger than the dimension of the array, must be a
> valid value.
>
> The issue with overflow of ptrdiff_t can only occur for a type with
> sizeof == 1, and an array size greater than SIZE_MAX/2 (i.e. an object
> over 1/2 of the size of a maximum size object).

I think you're assuming that ptrdiff_t and size_t are somehow tied
together with respect to their ranges. I believe the Standard does
not require any such relationship. An implementation could have
a ptrdiff_t type with a width of 17 bits, and a size_t type with a
width of 37 bits, and still be conforming (assuming I haven't
missed something, which of course I may have).
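
Code that leans on the usual relationship can at least state the assumption
explicitly; a sketch using a C11 static assertion:

#include <assert.h>
#include <stdint.h>

/* Sketch: document the assumption that ptrdiff_t is at least as wide as
   size_t, as it is on the implementations being discussed.  The Standard
   does not guarantee this relationship. */
static_assert((uintmax_t)PTRDIFF_MAX >= (uintmax_t)SIZE_MAX / 2,
              "ptrdiff_t is unusually narrow relative to size_t");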

Richard Damon

unread,
Jan 7, 2018, 2:33:34 PM1/7/18
to
ptrdiff_t and size_t may not be required to be the same size by any
wording in the standard, but typically (I know of no exceptions) they
are, and that generally makes sense, as except for the (unusual) case of
an array of char (or other type with sizeof == 1) with a bound exceeding
SIZE_MAX/2, it works. For most machines, to handle this case would force
ptrdiff_t to be twice that size, and bigger than a typical register, and
thus somewhat inefficient to process. So, given that the standard
gives them this out (which is likely the exact reason for that wording),
they implement things in the way that is efficient and works for the
vast majority of cases.

supe...@casperkitty.com

unread,
Jan 8, 2018, 10:49:54 AM1/8/18
to
On Sunday, January 7, 2018 at 1:33:34 PM UTC-6, Richard Damon wrote:
> ptrdiff_t and size_t may not be required to be the same size by any
> wording in the standard, but typically (I know of no exceptions) they
> are, and that generally makes sense, as except for the (unusual) case of
> an array of char (or other type with sizeof == 1) with a bound exceeding
> SIZE_MAX/2, it works. For most machines, to handle this case would force
> ptrdiff_t to be twice that size, and bigger than a typical register, and
> thus somewhat inefficient to process. So, given that the standard
> gives them this out (which is likely the exact reason for that wording),
> they implement things in the way that is efficient and works for the
> vast majority of cases.

The C11 Standard requires that on platforms where "int" is 16 bits,
ptrdiff_t must be a larger type, even if no objects could exceed 32767
bytes. On such systems, however, size_t would most likely be a 16-bit
type. On a platform where no object can exceed 32767 bytes, I'd regard
a non-conforming implementation where ptrdiff_t is 16 bits as more useful
than one where it is 32 bits, but the Standard would not allow such
implementations.

Patrick.Schluter

unread,
Jan 8, 2018, 1:36:53 PM1/8/18
to
I would imagine that a compiler targeting x86 in real mode would define
sizeof (ptrdiff_t) > sizeof (size_t) in the huge memory model, where
pointers can address the whole 20-bit address range but it cannot define
size_t bigger than 16 bits, for consistency with the other memory models.

While x86 in real mode seems obsolete, there are still legacy projects
using 80186 and derivatives to this day. I know at least 3 projects
still actively supported using such processors (Am186EM, 80186EX and
plain old 80188).

Patrick.Schluter

unread,
Jan 8, 2018, 1:42:47 PM1/8/18
to
> obvious (historical precedent?), but given that it does allow it
> the stated rules seem a natural consequence.
>
x86 in real mode's HUGE memory model requires it. It has
sizeof(void*) == 4 (but only a 20-bit range of values) and
sizeof(size_t) == 2, and it can have aliased pointers, i.e. any object
can be addressed with up to 4096 different pointers.



supe...@casperkitty.com

unread,
Jan 8, 2018, 3:36:12 PM1/8/18
to
On Monday, January 8, 2018 at 12:36:53 PM UTC-6, Patrick.Schluter wrote:
> On 08.01.2018 at 16:49, supe...@casperkitty.com wrote:
> > The C11 Standard requires that on platforms where "int" is 16 bits,
> > ptrdiff_t must be a larger type, even if no objects could exceed 32767
> > bytes. On such systems, however, size_t would most likely be a 16-bit
> > type. On a platform where no object can exceed 32767 bytes, I'd regard
> > a non-conforming implementation where ptrdiff_t is 16 bits as more useful
> > than one where it is 32 bits, but the Standard would not allow such
> > implementations.
>
> I would imagine that a compiler targetting x86 in real mode would define
> sizeof (ptrdiff_t) > sizeof (size_t) in the huge memory model, where
> pointers can address the whole 20 bits address range but cannot define
> size_t bigger that 16 bit for consistency with the other memory models.

An option to make ptrdiff_t be larger than size_t could be useful on any
platform where an object's size could exceed SIZE_MAX/2. I do find it
curious the Standard would not require that an implementation uphold the
identity p+(q-p)==q any time p and q point to or just past the same array
object--even if that would require defining other behaviors that would not
otherwise be defined(*)--but requires small implementations to use a ptrdiff_t
which is more expensive than would be necessary to ensure that subtraction
of arbitrary array elements fits within it.

(*) On most non-huge x86 implementations, given `char foo[49152];`,
(foo+49152u)-foo would yield -16384, but foo+(-16384) would yield
foo+49152u, thus upholding the aforementioned identity **even in cases
where the difference between two pointers would exceed PTRDIFF_MAX**.

Richard Damon

unread,
Jan 9, 2018, 9:10:36 PM1/9/18
to
I had forgotten that ptrdiff_t had a minimum range of +-65535. I guess
they consider that char arrays (or objects with pointers converted to
char) bigger than 32767 were common enough that implementations should
handle them.

Note that your example can't hold for a hosted implementation, as the
translation limits now require being able to create an object of 65535
bytes.

supe...@casperkitty.com

unread,
Jan 11, 2018, 10:57:18 AM1/11/18
to
The required minimum used to be 32767. I wonder why C99 changed it
without also requiring that hosted implementations use 32-bit "int". I
can't think of any platforms that couldn't efficiently use 32-bit "int"
but could efficiently handle 65,535-byte objects (MS-DOS implementations
were often limited to 65,520-byte objects except when using horribly-
inefficient "huge" more or using platform-specific extensions).

The only time it makes sense to use 16-bit "int" is when targeting or
emulating platforms where the expanded requirement would be hardship.
Implementations that could handle objects larger than 32767 bytes would
almost certainly do so whether or not the Standard required it, so all the
new requirement did was reduce the range of platforms that could host an
efficient conforming C implementation.

If the Standard recognized more than two categories of implementation the
issue could have been resolved sensibly by having a category of "minimal
hosted implementation" which could use 16-bit "int" and did not need to
support objects larger than 32767 bytes, and a "full hosted implementation"
which would need to use 32-bit or larger "int", and would be required to
support larger objects.

Richard Damon

unread,
Jan 11, 2018, 11:18:38 PM1/11/18
to
Note, the requirement is to create an object of that size, so it could
be a static/global variable, which could be a full segment big. malloc()
is allowed to fail for any of a variety of reasons, so I guess being too
big for malloc would be ok. This means that 16-bit x86 could be
conforming in a 'large' data model, and for this case ptrdiff_t would be
an alias for long (32 bits) while size_t might be unsigned int (16 bits).

C99 did basically say that 64k byte machines could not have conforming
hosted implementations, perhaps there weren't any in existence at that
point.

Tim Rentsch

unread,
Jan 13, 2018, 10:38:43 PM1/13/18
to
Next time perhaps you can preface your comments with "In most
implementations", so that there will be no confusion.

> and that generally makes sense, as except for the (unusual) case
> of an array of char (or other type with sizeof == 1) with a bound
> exceeding SIZE_MAX/2, it works. For most machines, to handle this case
> would force ptrdiff_t to be twice that size, and bigger than a typical
> register, and thus somewhat inefficient to process. So, given that
> the standard gives them this out (which is likely the exact reason for
> that wording), they implement things in the way that is efficient
> and works for the vast majority of cases.

This sounds like another assumption, that implementations typically
allow allocation up to SIZE_MAX bytes (stipulating that ptrdiff_t
and size_t are the same width). I believe many implementations
impose a lower threshold, e.g. SIZE_MAX/2 bytes, which avoids the
problem. (In fact the C compiler on my old workhorse linux server
enforces that limit.) Even granting that ptrdiff_t and size_t
have the same width, the "out" is not necessary.
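
The same threshold can be imposed in portable code, too; a sketch
(bounded_malloc is a hypothetical wrapper name):

#include <stdint.h>
#include <stdlib.h>

/* Sketch: refuse any allocation larger than PTRDIFF_MAX bytes, so that
   subtracting pointers within the object can never overflow ptrdiff_t.
   Assumes PTRDIFF_MAX is representable as a size_t, as it is on the
   implementations discussed in this thread. */
void *bounded_malloc(size_t size) {
    if (size > (size_t)PTRDIFF_MAX)
        return NULL;
    return malloc(size);
}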

Tim Rentsch

unread,
Jan 13, 2018, 10:55:11 PM1/13/18
to
"Patrick.Schluter" <Patrick....@free.fr> writes:

> On 07.01.2018 at 19:18, Tim Rentsch wrote:
What I think you mean is that (some) implementations that work
with a real-mode huge memory model choose to have size_t be 16
bits. The memory model doesn't make size_t be 16 bits; that
decision belongs to the implementation.

In any case, what you're talking about is not the same as what I
was talking about. There is no problem with making ptrdiff_t
_larger_ than size_t. The weirdness is in making ptrdiff_t too
small to span a single object. An implementation with a 16-bit
size_t cannot possibly run afoul of this problem (as was pointed
out, assuming conformance to C99 or later - C90 apparently didn't
specify any limits on the ranges of size_t or ptrdiff_t).

supe...@casperkitty.com

unread,
Jan 15, 2018, 12:56:11 PM1/15/18
to
On Saturday, January 13, 2018 at 9:55:11 PM UTC-6, Tim Rentsch wrote:
> In any case, what you're talking about is not the same as what I
> was talking about. There is no problem with making ptrdiff_t
> _larger_ than size_t. The weirdness is in making ptrdiff_t too
> small to span a single object. An implementation with a 16-bit
> size_t cannot possibly run afoul of this problem (as was pointed
> out, assuming conformance to C99 or later - C90 apparently didn't
> specify any limits on the ranges of size_t or ptrdiff_t).

Making ptrdiff_t larger than size_t would likely have significant negative
performance implications on systems where it would be necessary, and in
many cases would offer little or no benefit. On a 16-bit system, machine
code for "p1-p2 > 8000" that would would work with objects up to 65,535 bytes
would be much more expensive than code which only had to work in cases where
the difference was in the range +/-32767. Further, if a system uses the kind
of silent wraparound two's-complement semantics that were, according to the
C89 rationale, used by most then-current C implementations, programmers that
need to handle larger differences may be able to write "p1-p2 > 8000u" if
p2 can't be smaller than p1, or "(p1>p2) && (p1-p2 > 8000u)" if it might be.
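
For concreteness, a sketch of those idioms (the function names are
illustrative; the technique leans on the implementation-specific wraparound
behavior described above, so it is not portable C):

#include <stddef.h>

/* p1 and p2 point into the same object, and p2 is known not to be above
   p1; comparing as an unsigned quantity recovers the full-range difference
   on the wraparound implementations being described. */
int far_apart(const char *p1, const char *p2) {
    return (size_t)(p1 - p2) > 8000;
}

/* Same test when the relative order of p1 and p2 is not known. */
int far_apart_either_way(const char *p1, const char *p2) {
    return p1 > p2 && (size_t)(p1 - p2) > 8000;
}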