Does a double cast to unsigned make sense in any circumstance?

Meredith Montgomery

Jan 3, 2022, 12:18:57 PM

I've seen this somewhere or I wrote this some time in the past for some
reason and I can't see how this makes sense any longer. In a procedure
to read a numeric string and turn it into a number, we need to convert
each char-digit to some kind of integer:

(uint64_t) (unsigned char) (s[pos] - '0');

But the double cast here seems odd. I could have written this by
copying it from somewhere else. Today, what I would write is just

(uint64_t) (s[pos] - '0');

Am I making a mistake now? I can't see anything wrong with this. Isn't a
char just an int? I'm turning an int into an unsigned integer (of a
larger size).

Thank you!

Bart

Jan 3, 2022, 12:33:02 PM

Have you tried it? With this code:

signed char c=-10;
printf("%llu\n", (unsigned long long) c);
printf("%llu\n", (unsigned long long) (unsigned char) c);

The output is

18446744073709551606
246

So clearly there's a difference.

(I used 'long long' so that I could use the llu print format with more
confidence.)

Andrey Tarasevich

Jan 3, 2022, 1:06:17 PM

Yes, it does make sense. For example, if the outer type is wider than
the inner type, as in your example.

A cast to unsigned type is not just a conceptual type change. It also
might transform (wrap) the value in accordance with the rules of modulo
arithmetic, with modulo equal to 2^(type width).

So, the first cast is intended to wrap the value. The second cast is
intended to "expand" the result to a desired target type.

--
Best regards,
Andrey Tarasevich

Bonita Montero

Jan 3, 2022, 1:31:53 PM

Am 03.01.2022 um 18:15 schrieb Meredith Montgomery:
> I've seen this somewhere or I wrote this some time in the past for some
> reason and I can't see how this makes sense any longer. In a procedure
> to read a numeric string and turn it into a number, we need to convert
> each char-digit to some kind of integer:
>
> (uint64_t) (unsigned char) (s[pos] - '0');

(s[pos] - '0') might be signed and without the cast to unsigned
char it might be sign-extended to int64_t before it is converted
to uint64_t. But there are no guarantees when converting negative
values to unsigneds.

Richard Damon

Jan 3, 2022, 2:37:45 PM

As others have said, it does change the behavior, but only if s[pos]
might be less than '0', which means it holds something other than a
digit, as the characters 0..9 are required to be a consecutive
increasing sequence (which the code is likely counting on).

If you know that s[pos] will always be > 0, then the double cast may
produce faster code, as unless the compiler can also determine this, it
may need to first sign extend to int, then zero extend to uint64_t,
versus just zero extending to uint64_t.

Meredith Montgomery

Jan 15, 2022, 8:09:33 PM

Indeed. Thank you. I had tried it, but I wasn't quite thinking of
negative numbers because I was thinking of just my chars and my chars
are never negative. So the purpose there is to not let the number grow
to 64 bits. That's my understanding right now. Thank you.

Meredith Montgomery

Jan 15, 2022, 8:12:58 PM

Interesting. I got another question here. If s[pos] is always a char
and '0' is always a char, then s[pos] - '0' will never be greater than a
char. It might be negative. If it's negative and if I do

(unsigned char) s[pos] - '0',

can this do any wrapping at all? I'd say not in this case. I think you
are in more general waters than I took your message. Can you clarify a
bit? I'm a bit lost.

Meredith Montgomery

Jan 15, 2022, 8:14:23 PM

Really, no guarantees? So a procedure that does

(uint64_t) (unsigned char) (s[pos] - '0')

is not a C program if a machine uses negative numbers for chars?

Meredith Montgomery

Jan 15, 2022, 8:15:30 PM

What do you mean by ``zero extend''? Are you qualifying the verb ``to
extend''? You lost me there. Thank you.

Richard Damon

Jan 15, 2022, 8:55:35 PM

When the processor loads the 8 bit character into the bottom of the
register, it generally leaves the rest of the register alone.

If char is signed, and int is 32 bits, then the implementation can
convert it to an int by extending the sign of the bottom byte into the
rest of the register, and then to make it an unsigned 64 bit value, it
needs to put zeros into the upper 32 bits of the register.

By casting to unsigned char, the implementation only needs to
use an instruction that fills the upper 56 bits of the register with 0.

Setting the upper bits of a register with a value is called 'extending'

If it is copying the sign bit of the lower part of the register, it is
called sign extending.

If it is just fixing them to zero, it is called zero extending, because
we are thinking of the always 0 'sign bits' that extend beyond the bits
of the unsigned value.

Tim Rentsch

Jan 15, 2022, 9:53:21 PM

The original context for the expression in question is this
function:

> [posted by Meredith Montgomery]
>
> #include <limits.h>
> #include <inttypes.h>
>
> int scan_ulong(register char *s, register unsigned long *u)
> {
>     register unsigned int pos;
>     register unsigned long r;
>     register unsigned long c;
>
>     pos = 0; r = 0;
>
>     for ( ;; ) {
>         c = (unsigned long) (unsigned char) (s[pos] - '0');
>         if (c < 10) {
>             if( ((ULONG_MAX - c) / 10) >= r)
>                 r = r * 10 + c;
>             else return -1; /* lack of space */
>             ++pos; continue;
>         }
>         break;
>     }
>
>     *u = r;
>     return pos;
> }

In this context, the casts are superfluous. Just write

c = s[pos] - '0';

and the right thing will happen, because the assignment to 'c'
converts the value on the right hand side to 'unsigned long',
and so takes care of any negative values.

James Kuyper

Jan 15, 2022, 11:13:08 PM

On 1/15/22 8:12 PM, Meredith Montgomery wrote:
...

> Interesting. I got another question here. If s[pos] is always a char
> and '0' is always a char, then s[pos] - '0' will never be greater than a
> char. It might be negative. If it's negative and if I do

Note that while members of the basic character set are required to be
represented by non-negative values (6.2.5p3), members of the extended
character set are not. If char is signed, they can be as low as
CHAR_MIN, in which case subtracting '0' necessarily results in undefined
behavior. The following discussion applies only when that isn't the case:

> (unsigned char) s[pos] - '0',
>
> can this do any wrapping at all? I'd say not in this case. I think you
> are in more general waters than I took your message. Can you clarify a
> bit? I'm a bit lost.

Subtraction involves (6.5p6) the usual arithmetic conversions
(6.3.1.8p1), and the first part of that is the integer promotions
(6.3.1.1p2). Both overflow and wrap-around are possible results, but
only on systems where CHAR_BIT >= 16. Such systems do exist - the ones
I've heard of are embedded systems specialized for digital signal
processing.

If all values of type unsigned char can be represented as an int,
(unsigned char)s[pos] will be promoted to int before performing the
subtraction (6.3.1.1p2). In the unlikely event that CHAR_MAX == INT_MAX,
(int)(unsigned char)s[pos] will have the same value as s[pos].
Therefore, if s[pos] is sufficiently close to INT_MIN, subtracting '0'
from it will have undefined behavior.

In the unlikely event that UCHAR_MAX > INT_MAX, s[pos] is promoted to an
unsigned int (6.3.1.1p2). Since '0' has the type 'int', and 'int' and
'unsigned int' have the same integer conversion rank, the usual
arithmetic conversions specify that '0' is converted to unsigned int,
and the subtraction is carried out in that type. If s[pos] is less than
'0', then s[pos] - '0' will have a mathematical value that is negative,
so it will have to wrap around.

Bonita Montero

Jan 15, 2022, 11:13:15 PM

Am 03.01.2022 um 20:37 schrieb Richard Damon:

> If you know that s[pos] will always be > 0, then the double cast may
> produce faster code, as unless the compiler can also determine this,
> it may need to first sign extend to int, then zero extend to uint64_t,
> versus just zero extending to uint64_t.

On x86 there are widening instructions to zero- / sign-extend a shorter
value, and they all have the same performance.

James Kuyper

Jan 15, 2022, 11:24:29 PM

On 1/15/22 8:14 PM, Meredith Montgomery wrote:
> Bonita Montero <Bonita....@gmail.com> writes:
>
>> Am 03.01.2022 um 18:15 schrieb Meredith Montgomery:
>>> I've seen this somewhere or I wrote this some time in the past for some
>>> reason and I can't see how this makes sense any longer. In a procedure
>>> to read a numeric string and turn it into a number, we need to convert
>>> each char-digit to some kind of integer:
>>> (uint64_t) (unsigned char) (s[pos] - '0');
>>
>> (s[pos] - '0') might be signed and without the cast to unsigned
>> char it might be sign-extended to int64_t before it is converted
>> to uint64_t. ...

He's right about it being signed - that's normally the case. It normally
has the type `int`, except in the extremely unlikely case that CHAR_MAX
> INT_MAX, in which case it will have the type `unsigned int`. However,
int64_t comes into play only if it's typedef for `int`.

>> ... But there are no guarantees when converting negative
>> values to unsigneds.
>
> Really, no guarantees? So a procedure that does

No, Bonita is mistaken. The standard provides a strong explicit
guarantee of what the behavior is when converting negative values to
unsigned type:

"... the value is converted by repeatedly adding ... one more than the
maximum value that can be represented in the new type until the value is
in the range of the new type." (6.3.1.3p2).

It's the other direction that's dangerous: conversion of an integer
value to a signed integer type that is not representable in that type
has undefined behavior.

Keith Thompson

Jan 16, 2022, 4:31:12 PM

James Kuyper <james...@alumni.caltech.edu> writes:
[...]

> "... the value is converted by repeatedly adding ... one more than the
> maximum value that can be represented in the new type until the value is
> in the range of the new type." (6.3.1.3p2).
>
> It's the other direction that's dangerous: conversion of an integer
> value to a signed integer type that is not representable in that type
> has undefined behavior.

No, converting an out-of-range integer value to a signed integer type
yields an implementation-defined result or raises an
implementation-defined signal. (C99 added the option of raising a
signal; in my opinion that was a bad idea.)

--
Keith Thompson (The_Other_Keith) Keith.S.T...@gmail.com
Working, but not speaking, for Philips
void Void(void) { Void(); } /* The recursive call of the void */

James Kuyper

Jan 16, 2022, 11:25:24 PM

On 1/16/22 4:30 PM, Keith Thompson wrote:
> James Kuyper <james...@alumni.caltech.edu> writes:
> [...]
>> "... the value is converted by repeatedly adding ... one more than the
>> maximum value that can be represented in the new type until the value is
>> in the range of the new type." (6.3.1.3p2).
>>
>> It's the other direction that's dangerous: conversion of an integer
>> value to a signed integer type that is not representable in that type
>> has undefined behavior.
>
> No, converting an out-of-range integer value to a signed integer type
> yields an implementation-defined result or raises an
> implementation-defined signal. (C99 added the option of raising a
> signal; in my opinion that was a bad idea.)

Sorry - you're correct, and I actually knew that - but I was too tired
when I wrote that message.

Meredith Montgomery

Jan 17, 2022, 7:41:58 AM

That makes perfect sense. Thanks for the info. Very appreciated.

Meredith Montgomery

Jan 17, 2022, 7:43:27 AM

Keith Thompson <Keith.S.T...@gmail.com> writes:

> James Kuyper <james...@alumni.caltech.edu> writes:
> [...]
>> "... the value is converted by repeatedly adding ... one more than the
>> maximum value that can be represented in the new type until the value is
>> in the range of the new type." (6.3.1.3p2).
>>
>> It's the other direction that's dangerous: conversion of an integer
>> value to a signed integer type that is not representable in that type
>> has undefined behavior.
>
> No, converting an out-of-range integer value to a signed integer type
> yields an implementation-defined result or raises an
> implementation-defined signal. (C99 added the option of raising a
> signal; in my opinion that was a bad idea.)

Just curious --- why do you think a signal is a bad idea?

Meredith Montgomery

Jan 17, 2022, 7:44:53 AM

Awesome information. Thank you so much.

Meredith Montgomery

Jan 17, 2022, 7:47:07 AM

Oh, interesting! Thanks! I did not think of that.

Keith Thompson

Jan 17, 2022, 5:00:02 PM

A signal isn't inherently a bad idea, particularly if portable code can
handle it. But the fact that the signal is implementation-defined makes
that impossible. (And freestanding implementations needn't support
<signal.h>.)

And I've never heard of an implementation that raises a signal on an out
of range conversion, so having that option has not turned out to be
useful.

To make it potentially useful, the standard could have required a
specific signal to be raised (say, SIGCONV, which could be an alias for
some other signal), and specified a predefined macro that tells you
which choice the implementation made.

Even more useful would be to require the signal (but that would break
existing code) or provide a mechanism to enable it (but that might be
too great a burden on existing implementations). But again, that would
leave freestanding implementations out in the cold.

Reliable handling of numeric overflow has always been difficult in C.
This feature seems like a single very small step in making it easier,
but without more complete support I think it would have been better to
leave it the way it was in C90.

Tim Rentsch

Jan 20, 2022, 10:37:33 PM

James Kuyper <james...@alumni.caltech.edu> writes:

> On 1/15/22 8:12 PM, Meredith Montgomery wrote:
> ...
>
>> Interesting. I got another question here. If s[pos] is always a char
>> and '0' is always a char, then s[pos] - '0' will never be greater than a
>> char. It might be negative. If it's negative and if I do
>
> Note that while members of the basic character set are required to be
> represented by non-negative values (6.2.5p3), members of the extended
> character set are not. If char is signed, they can be as low as
> CHAR_MIN, in which case subtracting '0' necessarily results in undefined
> behavior. [...]

No, it doesn't. Subtracting one character value from another can
give undefined behavior, but it doesn't have to, and indeed
usually cannot. The only circumstance where it can is when char
is signed and CHAR_MAX == INT_MAX.

james...@alumni.caltech.edu

Jan 21, 2022, 12:35:28 PM

On Thursday, January 20, 2022 at 10:37:33 PM UTC-5, Tim Rentsch wrote:
> James Kuyper <james...@alumni.caltech.edu> writes:
...

> > Note that while members of the basic character set are required to be
> > represented by non-negative values (6.2.5p3), members of the extended
> > character set are not. If char is signed, they can be as low as
> > CHAR_MIN, in which case subtracting '0' necessarily results in undefined
> > behavior. [...]

As Keith pointed out (and I had already conceded four days before your
message) that was a mistake on my part - the relevant clause says
"Otherwise, the new type is signed and the value cannot be represented in it;
either the result is implementation-defined or an implementation-defined
signal is raised." (6.3.1.3p3).

> No, it doesn't. Subtracting one character value from another can
> give undefined behavior, but it doesn't have to, and indeed
> usually cannot. The only circumstance where it can is when char
> is signed and CHAR_MAX == INT_MAX.

You're right - I was thinking about the possibility that CHAR_MAX == INT_MAX,
but I forgot to specify that this was a pre-condition for my claim that the
undefined behavior (which should instead have been a reference to
6.3.1.3p3) was necessarily the result. Thank you for pointing this out.

Tim Rentsch

Jan 21, 2022, 8:20:03 PM

"james...@alumni.caltech.edu" <james...@alumni.caltech.edu> writes:

> On Thursday, January 20, 2022 at 10:37:33 PM UTC-5, Tim Rentsch wrote:
>
>> James Kuyper <james...@alumni.caltech.edu> writes:
>
> ...
>
>>> Note that while members of the basic character set are
>>> required to be represented by non-negative values (6.2.5p3),
>>> members of the extended character set are not. If char is
>>> signed, they can be as low as CHAR_MIN, in which case
>>> subtracting '0' necessarily results in undefined behavior.
>>> [...]
>
> As Keith pointed out (and I had already conceded four days
> before your message) that was a mistake on my part - the
> relevant clause says "Otherwise, the new type is signed and the
> value cannot be represented in it; either the result is
> implementation-defined or an implementation-defined signal is
> raised." (6.3.1.3p3).

I noticed that, and saw Keith's response, and so chose not to
post a followup myself. But I wasn't saying anything about
converting back to a char, just about the subtraction.

>> No, it doesn't. Subtracting one character value from another can
>> give undefined behavior, but it doesn't have to, and indeed
>> usually cannot. The only circumstance where it can is when char
>> is signed and CHAR_MAX == INT_MAX.
>
> You're right - I was thinking about the possibility that
> CHAR_MAX == INT_MAX, but I forgot to specify that this was a
> pre-condition for my claim that the undefined behavior (which
> should instead have been a reference to 6.3.1.3p3) was
> necessarily the result. Thank you for pointing this out.

I was confident you would understand the point once I mentioned
it. The followup was meant mainly for other readers who may
have been confused.

Meredith Montgomery

Jan 28, 2022, 8:14:04 PM

Keith Thompson <Keith.S.T...@gmail.com> writes:

> Meredith Montgomery <mmont...@levado.to> writes:
>> Keith Thompson <Keith.S.T...@gmail.com> writes:
>>> James Kuyper <james...@alumni.caltech.edu> writes:
>>> [...]
>>>> "... the value is converted by repeatedly adding ... one more than the
>>>> maximum value that can be represented in the new type until the value is
>>>> in the range of the new type." (6.3.1.3p2).
>>>>
>>>> It's the other direction that's dangerous: conversion of an integer
>>>> value to a signed integer type that is not representable in that type
>>>> has undefined behavior.
>>>
>>> No, converting an out-of-range integer value to a signed integer type
>>> yields an implementation-defined result or raises an
>>> implementation-defined signal. (C99 added the option of raising a
>>> signal; in my opinion that was a bad idea.)
>>
>> Just curious --- why do you think a signal is a bad idea?
>
> A signal isn't inherently a bad idea, particularly if portable code can
> handle it. But the fact that the signal is implementation-defined makes
> that impossible. (And freestanding implementations needn't support
> <signal.h>.)

What's a freestanding implementation?

James Kuyper

Jan 28, 2022, 9:29:24 PM

On 1/28/22 20:13, Meredith Montgomery wrote:
...

> What's a freestanding implementation?

"The two forms of conforming implementation are hosted and freestanding.
A conforming hosted implementation shall accept any strictly conforming
program. A conforming freestanding implementation shall accept any
strictly conforming program in which the use of the features specified
in the library clause (Clause 7) is confined to the contents of the
standard headers <float.h> , <iso646.h> , <limits.h> , <stdalign.h> ,
<stdarg.h> , <stdbool.h> , <stddef.h> , <stdint.h> , and <stdnoreturn.h>
. Additionally, a conforming freestanding implementation shall accept
any strictly conforming program in which the use of the features
specified in the header <string.h> , except the following functions:
strdup , strndup , strcoll , strxfrm , strerror ." (4p6)

That last sentence is new in n2731.pdf, the latest draft of the CX 202X
that I have. It's not grammatically correct; I suspect that the word
"except" in the final sentence should have been "is confined to the use
of", paralleling the structure of the previous sentence.

Basically, freestanding implementations have no obligation to implement
most of the C standard library.

Keith Thompson

Jan 28, 2022, 10:37:54 PM

See section 4 of any edition of the C standard. I usually use
http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf, a draft
that's close to C11.

The two forms of *conforming implementation* are hosted and
freestanding. A *conforming hosted implementation* shall accept any
strictly conforming program. A *conforming freestanding implementation*
shall accept any strictly conforming program in which the use of the
features specified in the library clause (clause 7) is confined to the
contents of the standard headers <float.h>, <iso646.h>, <limits.h>,
<stdalign.h>, <stdarg.h>, <stdbool.h>, <stddef.h>, <stdint.h>, and
<stdnoreturn.h>. A conforming implementation may have extensions
(including additional library functions), provided they do not alter
the behavior of any strictly conforming program.

Basically a hosted implementation is the kind that you're most likely to
encounter. It generates code that runs under an operating system, and
it supports the entire standard library. A freestanding implementation
targets an embedded system that might not have an operating system at
all. The only standard library headers that must be supported are the
ones that don't declare any functions. A freestanding implementation
might provide some library functions, but it isn't required to.

Bonita Montero

Jan 29, 2022, 10:57:34 AM

This gives different results if "s[pos] - '0'" is negative.

Öö Tiib

Jan 29, 2022, 12:54:58 PM

Your usual inability to read the next sentence below, which addresses
exactly that, is noted.

> >
> > and the right thing will happen, because the assignment to 'c'
> > converts the value on the right hand side to 'unsigned long',
> > and so takes care of any negative values.

Works, unlike your pathetically borken functions in another thread:
https://groups.google.com/g/comp.lang.c/c/pfgpeqAm0h8/m/y9MHNBwuAQAJ

Manfred

Jan 29, 2022, 9:45:20 PM

A freestanding implementation might also target the OS itself of a
common system (not necessarily embedded), right?

Keith Thompson

Jan 30, 2022, 3:02:12 AM

Sure. Any implementation that meets the standard's requirements can be
a conforming freestanding implementation -- even one that targets an OS.
You could call an implementation "freestanding" even if it's just
because you haven't implemented all the required parts of the standard
library.

The standard does have more to say about freestanding implementations
(N1570 5.1.2.1):

In a freestanding environment (in which C program execution
may take place without any benefit of an operating system),
the name and type of the function called at program startup are
implementation-defined. Any library facilities available to a
freestanding program, other than the minimal set required by
clause 4, are implementation-defined.

The effect of program termination in a freestanding environment
is implementation-defined.

And the ANSI C Rationale:

By defining conforming implementations in terms of the programs they
accept, the Standard leaves open the door for a broad class of
extensions as part of a conforming implementation. By defining both
conforming hosted and conforming freestanding implementations, the
Standard recognizes the use of C to write such programs as operating
systems and ROM-based applications, as well as more conventional
hosted applications. Beyond this two-level scheme, no additional
subsetting is defined for C, since the Committee felt strongly that
too many levels dilutes the effectiveness of a standard.

I think the intent is that freestanding implementations are for embedded
systems, likely with no OS, or you're using the implementation to build
the OS, but it's not a strict requirement.

Bonita Montero

Jan 30, 2022, 7:17:37 AM

That's not true. The compiler may give a negative value
converted to an unsigned value:

__declspec(noinline)
unsigned long f( char c )
{
    return c - '0';
}

    movsx eax, cl
    sub eax, 48
    ret 0

Richard Damon

Jan 30, 2022, 12:59:25 PM

I suppose that means that most 'Windows' implementations are technically
'freestanding' since the program starts at winmain not main (or just not
conforming, which also sounds sort of right).

Öö Tiib

Jan 31, 2022, 2:53:39 AM

And a small negative value converted to an unsigned value does not
satisfy < 10. Works, unlike your pathetically borken functions

Tim Rentsch

Feb 4, 2022, 1:29:20 AM

Surely what is intended is just the opposite: interfaces in <string.h>
may be used, except that strdup, strndup, strcoll, strxfrm, strerror
may not be used.

james...@alumni.caltech.edu

Feb 4, 2022, 1:44:18 PM

On Friday, February 4, 2022 at 1:29:20 AM UTC-5, Tim Rentsch wrote:
> James Kuyper <james...@alumni.caltech.edu> writes:
...

> > . Additionally, a conforming freestanding implementation shall accept
> > any strictly conforming program in which the use of the features
> > specified in the header <string.h> , except the following functions:
> > strdup , strndup , strcoll , strxfrm , strerror ." (4p6)
> >
> > That last sentence is new in n2731.pdf, the latest draft of the CX 202X
> > that I have. It's not grammatically correct; I suspect that the word
> > "except" in the final sentence should have been "is confined to the use
> > of", paralleling the structure of the previous sentence.
> Surely what is intended is just the opposite: interfaces in <string.h>
> may be used, except that strdup, strndup, strcoll, strxfrm, strerror
> may not be used.

You might be right about that; it does make sense. However, regardless of what
they actually meant, the actual wording was messed up. It doesn't clearly
express either of our guesses as to the intended meaning.