Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Richard Heathfield's setbits()

44 views
Skip to first unread message

Alex Vinokur

unread,
May 2, 2010, 7:46:07 AM5/2/10
to
Hi,

Here is Richard Heathfield's function from http://users.powernet.co.uk/eton/kandr2/krx206.html

#include <stdio.h>
unsigned setbits(unsigned x, int p, int n, unsigned y)
{
return (x & ((~0 << (p + 1)) | (~(~0 << (p + 1 - n))))) | ((y & ~(~0
<< n)) << (p + 1 - n));
}

setbits(x,p,n,y) returns x with the n bits that begin at position p
set to the rightmost n bits of y, leaving the other bits unchanged.

It seems that setbits (x, 31, n, y) may produce undefined behavior.

According to the C++ standard:
Behavior of shift operators << and >> is undefined if the right
operand is negative, or
greater than or equal to the length in bits of the promoted left
operand.

So, result ((~0 << (p + 1)) may be undefined.

Regards,

Alex Vinokur


Ben Bacarisse

unread,
May 2, 2010, 11:39:26 AM5/2/10
to
Alex Vinokur <alex.v...@gmail.com> writes:

> Here is Richard Heathfield's function from http://users.powernet.co.uk/eton/kandr2/krx206.html
>
> #include <stdio.h>
> unsigned setbits(unsigned x, int p, int n, unsigned y)
> {
> return (x & ((~0 << (p + 1)) | (~(~0 << (p + 1 - n))))) | ((y & ~(~0
> << n)) << (p + 1 - n));
> }
>
> setbits(x,p,n,y) returns x with the n bits that begin at position p
> set to the rightmost n bits of y, leaving the other bits unchanged.
>
> It seems that setbits (x, 31, n, y) may produce undefined behavior.

Yes, it does.

> According to the C++ standard:

K&R is about C so you should probably quote the C standard. The intent
is that C and C++ remain synchronised about this sort of thing so won't
matter much, but a solution to a K&R exercise must be assumed to be C.
As it happens, modern C (C99) has diverged from C++ in the case of left
shifting negative numbers but the above is, presumably, not C99.

> Behavior of shift operators << and >> is undefined if the right
> operand is negative, or
> greater than or equal to the length in bits of the promoted left
> operand.
>
> So, result ((~0 << (p + 1)) may be undefined.

It can be undefined for other reasons, though none that matter in the
context of K&R2. ~0 is often signed and negative in which case it is
undefined in C99 but not in C90 or in C++ (at least up to and including
2003). It's safer to shift unsigned types so I'd suggest ~0u.

I think it's hard to do this without knowing the width of the type. I'd
probably write:

unsigned width = CHAR_BIT * sizeof x;
unsigned mask = ~0u >> width - n << p - n + 1;
return x & ~mask | y & mask;

This goes wrong if there are padding bits, but at least we can check for
that (UINT_MAX will be == ~0u if there are none).

There is a bit-twiddling version that is one operation shorter:

return x ^ (x ^ a) & mask;

but you'd need to justify the "eh?" this might prompt in the reader.

--
Ben.

Jonathan Lee

unread,
May 2, 2010, 11:47:57 AM5/2/10
to
On May 2, 7:46 am, Alex Vinokur <alex.vino...@gmail.com> wrote:
> Here is Richard Heathfield's function
> ...

> setbits(x,p,n,y) returns x with the n bits that begin at position p
> set to the rightmost n bits of y, leaving the other bits unchanged.
>
> It seems that setbits (x, 31, n, y) may produce undefined behavior.

Assuming p, n are in [0, INT_BIT)

unsigned mask = (~((~0U) << n)) << p;
return (x & ~mask) + ((y << p) & mask);
or
return x ^ ((x ^ (y << p)) & mask)

--Jonathan

Eric Sosman

unread,
May 2, 2010, 1:54:16 PM5/2/10
to
On 5/2/2010 11:39 AM, Ben Bacarisse wrote:
> [...]
> ~0 is often signed and negative [...]

In C, s/often/always/.

> [...]


> This goes wrong if there are padding bits, but at least we can check for
> that (UINT_MAX will be == ~0u if there are none).

Also if there are twenty. 6.5.3.3p4: "[...] If the promoted
type is an unsigned type, the expression ~E is equivalent to the
maximum value representable in that type minus E." As always, the
settings of any padding bits in the result of ~E (of any arithmetic
operation) are unspecified.

--
Eric Sosman
eso...@ieee-dot-org.invalid

Ben Bacarisse

unread,
May 2, 2010, 2:15:34 PM5/2/10
to
Eric Sosman <eso...@ieee-dot-org.invalid> writes:

> On 5/2/2010 11:39 AM, Ben Bacarisse wrote:
>> [...]
>> ~0 is often signed and negative [...]
>
> In C, s/often/always/.

I'll take your word for it! I was not sure about ~0 on a 1's complement
machine that supports negative zero. It's called "negative" but is it
less than zero for the purposes of a shift operation? I was not sure.

>> [...]
>> This goes wrong if there are padding bits, but at least we can check for
>> that (UINT_MAX will be == ~0u if there are none).
>
> Also if there are twenty. 6.5.3.3p4: "[...] If the promoted
> type is an unsigned type, the expression ~E is equivalent to the
> maximum value representable in that type minus E." As always, the
> settings of any padding bits in the result of ~E (of any arithmetic
> operation) are unspecified.

Ah, yes, of course. Do you know a good way to determine the width of an
unsigned type? By "good" I probably mean "other than the obvious"
iterative one.

--
Ben.

Eric Sosman

unread,
May 2, 2010, 2:58:24 PM5/2/10
to
On 5/2/2010 2:15 PM, Ben Bacarisse wrote:
> Eric Sosman<eso...@ieee-dot-org.invalid> writes:
>
>> On 5/2/2010 11:39 AM, Ben Bacarisse wrote:
>>> [...]
>>> ~0 is often signed and negative [...]
>>
>> In C, s/often/always/.
>
> I'll take your word for it! I was not sure about ~0 on a 1's complement
> machine that supports negative zero. It's called "negative" but is it
> less than zero for the purposes of a shift operation? I was not sure.

Ah! Okay, "negative zero" might not be "negative" (since it
compares equal to "positive zero"). Point taken.

>>> [...]
>>> This goes wrong if there are padding bits, but at least we can check for
>>> that (UINT_MAX will be == ~0u if there are none).
>>
>> Also if there are twenty. 6.5.3.3p4: "[...] If the promoted
>> type is an unsigned type, the expression ~E is equivalent to the
>> maximum value representable in that type minus E." As always, the
>> settings of any padding bits in the result of ~E (of any arithmetic
>> operation) are unspecified.
>
> Ah, yes, of course. Do you know a good way to determine the width of an
> unsigned type? By "good" I probably mean "other than the obvious"
> iterative one.

Hallvard B. Furuseth came up with

"

/* Number of bits in inttype_MAX, or in any (1<<k)-1 where
* 0 <= k < 3.2E+10 */
#define IMAX_BITS(m) ((m) /((m)%0x3fffffffL+1) /0x3fffffffL %0x3fffffffL
*30 \
+ (m)%0x3fffffffL /((m)%31+1)/31%31*5 +
4-12/((m)%31+3))

Or if you are less paranoid about how large UINTMAX_MAX can get:

/* Number of bits in inttype_MAX, or in any (1<<k)-1 where 0 <= k < 2040 */
#define IMAX_BITS(m) ((m)/((m)%255+1) / 255%255*8 + 7-86/((m)%255+12))

." (Sorry about the line-wrapping.) Dunno whether you'd deem this
good, but it's certainly a jaw-dropper.

--
Eric Sosman
eso...@ieee-dot-org.invalid

Ersek, Laszlo

unread,
May 2, 2010, 3:15:22 PM5/2/10
to
On Sun, 2 May 2010, Ben Bacarisse wrote:

> Do you know a good way to determine the width of an unsigned type? By
> "good" I probably mean "other than the obvious" iterative one.

You can count the number of set bits in logarithmic time. (*)

Counting the bits set in (uintmax_t)(unsigned_type)-1 should give the
number of value bits in "unsigned_type". It should cover all unsigned
types, and it should require not more steps than log2(sizeof(uintmax_t) *
(size_t)CHAR_BIT) rounded up or so.

(*) To see this, divide the entire bit-string in adjacent bit-pairs. A
bit-pair can have 0, 1, or 2 bits set. This sum can be represented in two
bits as 00_b, 01_b, or 10_b; there is no carry from this addition. Then
you can add adjacent bit-pairs representing the sums, in order to extend
the sum to the originally adjacent bit-pair. The highest value will be
4_d, or 0100_b, so it will fit into each four-bit nibble. And so on. You
basically do multiple additions at once. Disjunct bit-segments can be used
in parallel, because there is never a carry. As you double the segment
length in each step, the representible range is squared per segment, but
the maximum sum is only doubled; double exponential vs. exponential.

For uin32_t, it should look something like this:

unsigned
countbits32(uint32_t u32)
{
/*
Compute initial sums. A single bit is identical to the number of bits
set in it.
*/

/* 10101010 ... 01010101 ... */
u32 = (u32 & 0xAAAAAAAAlu) >> 1 + (u32 & 0x55555555lu);

/* 11001100 ... 00110011 ... */
u32 = (u32 & 0xCCCCCCCClu) >> 2 + (u32 & 0x33333333lu);

/* 11110000 ... 00001111 ... */
u32 = (u32 & 0xF0F0F0F0lu) >> 4 + (u32 & 0x0F0F0F0Flu);

u32 = (u32 & 0xFF00FF00lu) >> 8 + (u32 & 0x00FF00FFlu);
u32 = (u32 & 0xFFFF0000lu) >> 16 + (u32 & 0x0000FFFFlu);
}

(I probably broke this, so caveat emptor.)

Another way to look at this is a complete binary tree of six levels (five
levels plus root). The leaves are the individual original bits (also
representing the number of bits set in each individual bit). Each internal
node sums the value of its two direct children, representing the number of
all nonzero leaves of the subtree rooted at that node. Leveling up happens
in parallel.

(So much ranting about such a small thing, and even so, probably
incorrect. Sorry!)

Cheers,
lacos

Ben Bacarisse

unread,
May 2, 2010, 6:18:51 PM5/2/10
to
Eric Sosman <eso...@ieee-dot-org.invalid> writes:

> On 5/2/2010 2:15 PM, Ben Bacarisse wrote:

<snip>


>> Do you know a good way to determine the width of an
>> unsigned type? By "good" I probably mean "other than the obvious"
>> iterative one.
>
> Hallvard B. Furuseth came up with
>
> "
>
> /* Number of bits in inttype_MAX, or in any (1<<k)-1 where
> * 0 <= k < 3.2E+10 */
> #define IMAX_BITS(m) ((m) /((m)%0x3fffffffL+1) /0x3fffffffL
> %0x3fffffffL *30 \
> + (m)%0x3fffffffL /((m)%31+1)/31%31*5 +
> 4-12/((m)%31+3))
>
> Or if you are less paranoid about how large UINTMAX_MAX can get:
>
> /* Number of bits in inttype_MAX, or in any (1<<k)-1 where 0 <= k < 2040 */
> #define IMAX_BITS(m) ((m)/((m)%255+1) / 255%255*8 + 7-86/((m)%255+12))
>
> ." (Sorry about the line-wrapping.) Dunno whether you'd deem this
> good, but it's certainly a jaw-dropper.

I remember seeing that now... And I was worried about suggesting

x ^ (x ^ y) & mask
for


x & ~mask | y & mask

!!

--
Ben.

Ben Bacarisse

unread,
May 2, 2010, 6:41:42 PM5/2/10
to
Ben Bacarisse <ben.u...@bsb.me.uk> writes:

> Alex Vinokur <alex.v...@gmail.com> writes:
<snip>


>> setbits(x,p,n,y) returns x with the n bits that begin at position p
>> set to the rightmost n bits of y, leaving the other bits unchanged.

^^^^^^^^^^^^^^^^^^^^^
I missed this bit of the spec. You need y << p - n + 1 in:

> unsigned width = CHAR_BIT * sizeof x;
> unsigned mask = ~0u >> width - n << p - n + 1;
> return x & ~mask | y & mask;

return x & ~mask | (y << p - n + 1) & mask;

but you should also use:

unsigned mask = ~(~0u << n) << p - n + 1;

as this does not need the width of the type.

--
Ben.

Ben Bacarisse

unread,
May 2, 2010, 6:59:41 PM5/2/10
to
Jonathan Lee <cho...@shaw.ca> writes:

> On May 2, 7:46 am, Alex Vinokur <alex.vino...@gmail.com> wrote:
>> Here is Richard Heathfield's function
>> ...
>> setbits(x,p,n,y) returns x with the n bits that begin at position p
>> set to the rightmost n bits of y, leaving the other bits unchanged.
>>
>> It seems that setbits (x, 31, n, y) may produce undefined behavior.
>
> Assuming p, n are in [0, INT_BIT)
>
> unsigned mask = (~((~0U) << n)) << p;

This is a better way to make the mask but I think you've altered what p
means. Alex (based on Richard's code) seems to take p to be the
position of the most significant bit of those changed.

It's simpler (and consistent with the problem wording) to interpret p as
the least significant bit of the changed bits (as you have done) but
some people might be confused by this change of meaning.

It easy to switch to the other interpretation of the problem: substitute
p - n + 1 for p.

> return (x & ~mask) + ((y << p) & mask);
> or
> return x ^ ((x ^ (y << p)) & mask)

--
Ben.

Alex Vinokur

unread,
May 2, 2010, 7:57:52 PM5/2/10
to

"Alex Vinokur" <alex.v...@gmail.com> wrote in message
news:b76469ca-9a52-4f7e...@i10g2000yqh.googlegroups.com...

Replace ((~0 << (p + 1)) with (((~0 << (p)) << 1)

>
> Regards,
>
> Alex Vinokur
>
>
>
>

Phil Carmody

unread,
May 3, 2010, 2:55:31 AM5/3/10
to

Modulo caveats, so would I.

> unsigned width = CHAR_BIT * sizeof x;
> unsigned mask = ~0u >> width - n << p - n + 1;

Alas that can't portably inject a zero-length bitstring.

> return x & ~mask | y & mask;
>
> This goes wrong if there are padding bits, but at least we can check for
> that (UINT_MAX will be == ~0u if there are none).
>
> There is a bit-twiddling version that is one operation shorter:
>
> return x ^ (x ^ a) & mask;

One C operation shorter, yes. One instruction deeper (3 rather than 2)
on architectures sufficiently rich to parallelise the '&'s in the former.

> but you'd need to justify the "eh?" this might prompt in the reader.

Anyone who can't read that expression from left to right as
"change in x the bits that are different between x and a within the mask"
on the second reading shouldn't be reading code, but should be clicking
on buttons in some GUI instead. It's a perfectly standard idiom amongst
those who have used C as a high level assembler in every-tick-counts
embedded work. Or at least should be.

Phil
--
I find the easiest thing to do is to k/f myself and just troll away
-- David Melville on r.a.s.f1

Phil Carmody

unread,
May 3, 2010, 3:27:25 AM5/3/10
to
Ben Bacarisse <ben.u...@bsb.me.uk> writes:
> context of K&R2. ~0 is often signed and negative in which case it is
> undefined in C99 but not in C90 or in C++ (at least up to and including
> 2003). It's safer to shift unsigned types so I'd suggest ~0u.

~0u may end up being converted to being signed on sufficiently
bizarre architectures which permit the usual arithmetic
conversions (6.3.1.8) to fall through to, and be caught by:

Otherwise, if the type of the operand with signed
integer type can represent all of the values of
the type of the operand with unsigned integer
type, then the operand with unsigned integer type
is converted to the type of the operand with
signed integer type.

But that probably only affects people in 33-bit la-la-land.

Ben Bacarisse

unread,
May 3, 2010, 6:36:18 AM5/3/10
to
Phil Carmody <thefatphi...@yahoo.co.uk> writes:

> Ben Bacarisse <ben.u...@bsb.me.uk> writes:
>> Alex Vinokur <alex.v...@gmail.com> writes:

<snip>


>>> setbits(x,p,n,y) returns x with the n bits that begin at position p
>>> set to the rightmost n bits of y, leaving the other bits unchanged.

<snip>


>> unsigned width = CHAR_BIT * sizeof x;
>> unsigned mask = ~0u >> width - n << p - n + 1;
>
> Alas that can't portably inject a zero-length bitstring.

True. I don't know if that matters or not (the specification is a
little loose) but I have already noted that JL's construction of the
mask is superior and, since it handles 0 naturally, it wins all round:

unsigned width = ~(~0u << n) << p - n + 1;


return x & ~mask | (y << p - n + 1) & mask;

If one goes on to interpret p as the least significant bit of those
injected then there is no need for p - n + 1; and the solution is
simpler still:

unsigned mask = ~(~0u << n) << p;
return x & ~mask | (y << p) & mask;

which is what he posted (modulo parentheses).

[I'm posting just to summarise what I think is the neatest solution.]

<snip>
--
Ben.

Ersek, Laszlo

unread,
May 3, 2010, 8:02:14 AM5/3/10
to

On Mon, 3 May 2010, Phil Carmody wrote:

> Ben Bacarisse <ben.u...@bsb.me.uk> writes:

>> context of K&R2. ~0 is often signed and negative in which case it is
>> undefined in C99 but not in C90 or in C++ (at least up to and including
>> 2003). It's safer to shift unsigned types so I'd suggest ~0u.
>
> ~0u may end up being converted to being signed

0u has type "unsigned int", it has the conversion rank of "int", so it is
no candidate for integer promotion. Thus ~0u is still of type "unsigned
int" (and has value UINT_MAX, C99 "6.5.3.3 Unary arithmetic operators"
p4). If you use ~0u as one operand of a binary (as in, taking two
arguments) operator that implies the usual arithmetic conversions, and the
other operand is also of integer type, then

- If the other operand has lesser conversion rank than "int", it is
converted to "int" or "unsigned int", dependent on whether "int" can
represent all values of the original type with the small conversion rank.
If it is converted to "int", then that "int" is converted to "unsigned
int", and the operator is evaluated.

- If the other operand has the conversion rank of "int", it is either
"int" or "unsigned int", and then see the last sentence the previous
paragraph.

- If the other operand has greater rank than that of "int", eg. it's a
"long" (long long) or "long unsigned" (long long unsigned), or some other
extended integer type, then ~0u is converted. If the other operand is
"long", and "long" can represent all values of "unsigned int", then ~0u
will be converted to "long", but the value won't change: ((long)UINT_MAX).
If the other operand is "unsigned long", or it's "long" but "long" can't
represent all "unsigned int" values, then the other operand will be
converted to (or stay) "unsigned long", ~0u will be also converted to
"unsigned long" (-> ((unsigned long)UINT_MAX)), and the operator will be
evaluated.

In short, ~0u always has type "unsigned int" in itself.

Shifting doesn't imply the usual arithmetic conversions, see C99 6.5.7
"Bitwise shift operators" p2-3: "Each of the operands shall have integer
type. [...] The integer promotions are performed on each of the operands.
The type of the result is that of the promoted left operand."

Cheers,
lacos

Ben Bacarisse

unread,
May 3, 2010, 8:19:14 AM5/3/10
to
"Ersek, Laszlo" <la...@caesar.elte.hu> writes:

> On Mon, 3 May 2010, Phil Carmody wrote:
>
>> Ben Bacarisse <ben.u...@bsb.me.uk> writes:
>
>>> context of K&R2. ~0 is often signed and negative in which case it
>>> is undefined in C99 but not in C90 or in C++ (at least up to and
>>> including 2003). It's safer to shift unsigned types so I'd suggest
>>> ~0u.
>>
>> ~0u may end up being converted to being signed
>
> 0u has type "unsigned int", it has the conversion rank of "int", so it
> is no candidate for integer promotion.

n1256.pdf 6.3.1.1 p2 says:

An object or expression with an integer type whose integer conversion
rank is less than /or equal to/ the rank of int and unsigned int

(my emphasis). This is part of the paragraph that determines what types
are candidates for integer promotion.

This may be a change from the published C99 causes by one of the
Technical Corrigenda because my copy of n1256.pdf has change bars here.
None the less, it is still official "C".

I was aware of the odd possibility that ~0u could be a signed int but it
seemed way to fussy to bring it up! For one thing, there is nothing you
can do about except hope that the implementation does the shifts as
you'd expect anyway.

BTW, I'd like to be wrong about this.

<snip>
--
Ben.

Ersek, Laszlo

unread,
May 3, 2010, 9:52:27 AM5/3/10
to
On Mon, 3 May 2010, Ben Bacarisse wrote:

> "Ersek, Laszlo" <la...@caesar.elte.hu> writes:
>
>> On Mon, 3 May 2010, Phil Carmody wrote:
>>
>>> Ben Bacarisse <ben.u...@bsb.me.uk> writes:
>>
>>>> context of K&R2. ~0 is often signed and negative in which case it
>>>> is undefined in C99 but not in C90 or in C++ (at least up to and
>>>> including 2003). It's safer to shift unsigned types so I'd suggest
>>>> ~0u.
>>>
>>> ~0u may end up being converted to being signed
>>
>> 0u has type "unsigned int", it has the conversion rank of "int", so it
>> is no candidate for integer promotion.
>
> n1256.pdf 6.3.1.1 p2 says:
>
> An object or expression with an integer type whose integer conversion
> rank is less than /or equal to/ the rank of int and unsigned int
>
> (my emphasis). This is part of the paragraph that determines what types
> are candidates for integer promotion.
>
> This may be a change from the published C99 causes by one of the
> Technical Corrigenda because my copy of n1256.pdf has change bars here.
> None the less, it is still official "C".

Yes, my copy of C99 doesn't contain the emphasized words. TC2 entry 10
adds them.

Even so, 0u could be promoted to "int" only, and only if INT_MAX >=
UINT_MAX. Considering sizeof(int) == sizeof(unsigned), this would require
strictly more padding bits in "unsigned int" than in "signed int". That
would fly in the face of the conversion rules:

----v----
Otherwise, if the operand that has unsigned integer type has rank greater
or equal to the rank of the type of the other operand, then the operand
with signed integer type is converted to the type of the operand with
unsigned integer type.
----^----

and the candidate types for integer constants suffixed with "u" (unsigned,
unsigned long, unsigned long long).

Furthermore, INT_MAX > UINT_MAX is impossible as per C99 6.2.5 "Types" p9.
Hence, for such a promotion to be possible, the implementation must make
UINT_MAX = INT_MAX, and place strictly more padding bits in "unsigned int"
than in "int", and the conversion from "int" to "unsigned int" would be
lossy.

I have no idea why TC2 entry 10 came to be.


> I was aware of the odd possibility that ~0u could be a signed int but it
> seemed way to fussy to bring it up! For one thing, there is nothing you
> can do about except hope that the implementation does the shifts as
> you'd expect anyway.
>
> BTW, I'd like to be wrong about this.

Me too, absolutely. My C89 unsigned's suddenly silently turning into int's
would be catastrophic. "unsigned" was the safe haven you once reached
(after considering range) and stayed in.

... Ah, I think it's a wording bug in TC2 entry 10 (or not, if I also
misunderstand 6.3.1.1 p3 "The integer promotions preserve value including
sign").

6.3.1.1 p2 has two sections. The first section says what may be used "in
an expression wherever an int or unsigned int may be used". The strict
less-than rank restriction might have exluded "int" and "unsigned"
themselves, which is (considered very formally) nonsensical: "an int may
not be used where an int may be used". TC2 might have fixed this.

The second section describes the circumstances of the promotion, eg. the
target type.

The question is whether section 1 *implies* section 2, that is, if an
object or expression with integer type may be used in place of an int or
unsigned int, does that necessarily invoke promotion? This didn't matter
with pre-TC2 C99, because even if there has been such an implication,
"int" and "unsigned int" falsified the premise.

I think the intent of the second section could be reformulated like this:

----v----
/Except when the original type is unsigned int,/ [i]f an int can represent
all values of the original type, the value is converted to an int;
otherwise, it is converted to an unsigned int. These are called the
integer promotions. 48) All other types are unchanged by the integer
promotions.
----^----

This introduces some ambiguity in interpreting the "otherwise" branch, but
I don't think it matters. There should be no difference between "an
unsigned int is not promoted" and "an unsigned int is promoted to an
unsigned int".

Cheers,
lacos

Tim Rentsch

unread,
May 3, 2010, 2:15:24 PM5/3/10
to
Phil Carmody <thefatphi...@yahoo.co.uk> writes:

> Ben Bacarisse <ben.u...@bsb.me.uk> writes:
>> context of K&R2. ~0 is often signed and negative in which case it is
>> undefined in C99 but not in C90 or in C++ (at least up to and including
>> 2003). It's safer to shift unsigned types so I'd suggest ~0u.
>
> ~0u may end up being converted to being signed on sufficiently
> bizarre architectures which permit the usual arithmetic
> conversions (6.3.1.8) to fall through to, and be caught by:
>
> Otherwise, if the type of the operand with signed
> integer type can represent all of the values of
> the type of the operand with unsigned integer
> type, then the operand with unsigned integer type
> is converted to the type of the operand with
> signed integer type.
>

Right idea but not quite the right section; the usual
arithmetic conversions don't apply to operands of ~.

Tim Rentsch

unread,
May 3, 2010, 2:12:29 PM5/3/10
to
Eric Sosman <eso...@ieee-dot-org.invalid> writes:

> On 5/2/2010 11:39 AM, Ben Bacarisse wrote:
>> [...]
>> ~0 is often signed and negative [...]
>
> In C, s/often/always/.

Trivia question 1: Under what circumstances may the C expression

~0 > 0

evaluate to 1?


>> [...]
>> This goes wrong if there are padding bits, but at least we can check for
>> that (UINT_MAX will be == ~0u if there are none).
>
> Also if there are twenty. 6.5.3.3p4: "[...] If the promoted
> type is an unsigned type, the expression ~E is equivalent to the
> maximum value representable in that type minus E." As always, the
> settings of any padding bits in the result of ~E (of any arithmetic
> operation) are unspecified.

Trivia question 2: Under what circumstances may the C expression

UINT_MAX == (unsigned) -1 && UINT_MAX != ~0u

evaluate to 1?

Eric Sosman

unread,
May 3, 2010, 4:03:32 PM5/3/10
to
On 5/3/2010 2:12 PM, Tim Rentsch wrote:
> Eric Sosman<eso...@ieee-dot-org.invalid> writes:
>
>> On 5/2/2010 11:39 AM, Ben Bacarisse wrote:
>>> [...]
>>> ~0 is often signed and negative [...]
>>
>> In C, s/often/always/.
>
> Trivia question 1: Under what circumstances may the C expression
>
> ~0> 0
>
> evaluate to 1?

Ben Bacarisse gave the necessary hint in a response yesterday
(which I acknowledged as correct yesterday). If ~0 produces a
"minus zero" trap representation, the behavior is undefined -- in
which case the `>' can yield 0, 1, or even -98.6 in blinking lights.

> Trivia question 2: Under what circumstances may the C expression
>

> UINT_MAX == (unsigned) -1&& UINT_MAX != ~0u
>
> evaluate to 1?

Three that come to mind:

- When `UINT_MAX' is a user-defined macro or variable, not
the macro defined by <limits.h>

- When the code has (invalidly) #define'd the `unsigned'
keyword to something screwy

- When the implementation is non-conforming

Can't think of any others, though.

--
Eric Sosman
eso...@ieee-dot-org.invalid

Eric Sosman

unread,
May 3, 2010, 5:20:10 PM5/3/10
to
On 5/3/2010 4:03 PM, Eric Sosman wrote:
> On 5/3/2010 2:12 PM, Tim Rentsch wrote:
>> [...]

>> Trivia question 2: Under what circumstances may the C expression
>>
>> UINT_MAX == (unsigned) -1&& UINT_MAX != ~0u
>>
>> evaluate to 1?
>
> Three that come to mind:
>
> - When `UINT_MAX' is a user-defined macro or variable, not
> the macro defined by <limits.h>
>
> - When the code has (invalidly) #define'd the `unsigned'
> keyword to something screwy
>
> - When the implementation is non-conforming
>
> Can't think of any others, though.

Okay, I've thought of another -- but I'm not sure it's an
actual possibility, or just the imprecision of natural language.
6.2.5p9 says "The range of nonnegative values of a signed integer
type is a subrange of the corresponding unsigned integer type [...]"
If "subrange" is taken to mean "proper subrange," then it follows
that there exist `unsigned int' values not representable as `int'
(namely, those not in the subrange). In that case, the language
of 6.3.1.1p2 "If an int can represent all values of the original
type [...]" does not apply, and 0u does not promote to signed int
but remains an unsigned int operand.

But if "subrange" includes "improper subrange," that is, the
entire range (so INT_MAX==UINT_MAX), then 6.3.1.1p2 *does* apply
and the integer promotions convert 0u to 0, a signed int, making
the expression ~0u ill-defined because of the usual uncertainties
about bit-flipping signed integers. Indeed, most operators with
unsigned int operands become problematic in the face of possible
promotion to (signed) int. To be sure of unsigned semantics, you'd
need to do everything in an unsigned type of greater conversion rank
than int (unsigned long, for example).

... on the other hand, even if this reading of "subrange"
makes unsigned int arithmetic largely useless, it has precedent.
In 6.3.1.1p1 we see "No two signed integer types shall have the
same rank, even if they have the same representation," showing that
types with identical ranges can have different ranks. And 6.2.5p8
says "[...] the range of values of the type with smaller integer
conversion rank is a subrange of the values of the other type,"
which when applied to types with different ranks but identical
ranges shows that "subrange" is (at least sometimes) used in a
sense that covers "improper subrange," that is, identity.

So, I guess you pays your money and takes your choice: Either
you believe that "subrange" really means "sub" (in 6.2.5p9, at any
rate), and unsigned int arithmetic works, or else you take it in
the 6.3.1.1p1/6.2.5p8 sense, allow 0u to promote to 0, and give
up on unsigned int arithmetic. In my heart of hearts I can't
believe that the Committee intended to forbid unsigned int arithmetic
in strictly conforming programs, but it's at least *possible* to
read the Standard that way.

--
Eric Sosman
eso...@ieee-dot-org.invalid

Ersek, Laszlo

unread,
May 3, 2010, 8:32:15 PM5/3/10
to
On Mon, 3 May 2010, Eric Sosman wrote:

[snip]

> So, I guess you pays your money and takes your choice: Either you
> believe that "subrange" really means "sub" (in 6.2.5p9, at any rate),
> and unsigned int arithmetic works, or else you take it in the
> 6.3.1.1p1/6.2.5p8 sense, allow 0u to promote to 0, and give up on
> unsigned int arithmetic. In my heart of hearts I can't believe that the
> Committee intended to forbid unsigned int arithmetic in strictly
> conforming programs, but it's at least *possible* to read the Standard
> that way.

I was curious how it went historically.

C89 6.2.1.1 Characters and integers

----v----
A char, a short int, or an int bit-field, or their signed or unsigned
varieties, or an enumeration type, may be used in an expression wherever
an int or unsigned int may be used. If an int can represent all values of

the original type, the value is converted to an int; otherwise, it is

converted to an unsigned int. These are called the integral promotions.
^27 All other arithmetic types are unchanged by the integral promotions.

The integral promotions preserve value including sign. As discussed
earlier, whether a "plain" char is treated as signed is
implementation-defined.
----^----

(That is, "int" and "unsigned int" are not candidates for promotion.)

This passage was touched by neither TCOR1 nor TCOR2.

The C89 Rationale examines this in great detail in 3.2.1.1 "Characters and
integers", and says

----v----
QUIET CHANGE

A program that depends upon unsigned preserving arithmetic conversions
will behave differently, probably without complaint. This is considered
the most serious semantic change made by the Committee to a widespread
current practice.
----^----

C99 with TC1 applied (but not TC2) still excluded "int" and "unsigned int"
from the integer promotions as "source types". Then TC2 came along and
included them. I tried to find some rationale for this, but the most
recent C99 Rationale I know of is version 5.10, released in April 2003,
and C99 TC2 is younger: it was published on 2004-11-15. Thus the most
recent C99 Rationale has no comment on C99 TC2; in fact, here's the
unified diff between the corresponding passages of the two rationales,
after some cautious reformatting. (I recommend piping the diff through
"colordiff | less -R" or something similar.)

----v----
--- c89rat.txt 2010-05-04 02:15:56.000000000 +0200
+++ c99rat.txt 2010-05-04 02:16:05.000000000 +0200
@@ -1,25 +1,22 @@
-Since the publication of K&R,
+Between the publication of K&R and the development of C89,
a serious divergence
-has
+had
occurred among implementations
-of C
in the evolution of
-integral
+integer
promotion rules. Implementations
-fall
+fell
into two major camps
-,
which may be characterized as unsigned preserving and value preserving. The
difference between these approaches
-centers
+centered
on the treatment of unsigned char and unsigned short
-,
when widened by the
-integral
+integer
promotions, but the decision
-has
+had
an impact on the typing of constants as well (see ďż˝
-3.1.3.2
+6.4.4.1
).

The unsigned preserving approach calls for promoting the two smaller unsigned
@@ -27,7 +24,6 @@
independent of execution environment.

The value preserving approach calls for promoting those types to signed int
-,
if that type can properly represent all the values of the original type, and
otherwise for promoting those types to unsigned int. Thus, if the execution
environment represents short as something smaller than int, unsigned short
@@ -35,14 +31,14 @@

Both schemes give the same answer in the vast majority of cases, and both give
the same effective result in even more cases in implementations with
-twos
+two's
-complement arithmetic and quiet wraparound on signed overflow --- that is, in
most current implementations. In such implementations, differences between the
two only appear when these two conditions are both true:

1. An expression involving an unsigned char or unsigned short produces an
int-wide result in which the sign bit is set
-: i.e.
+, that is
, either a unary operation on such a type, or a binary operation in which the
other operand is an int or ``narrower'' type.

@@ -53,9 +49,7 @@
a long type, or

* it is the left operand of the right-shift operator
-(
in an implementation where this shift is defined as arithmetic
-)
, or

* it is either operand of /, %, <, <=, >, or >=.
@@ -65,15 +59,13 @@
signed or unsigned interpretation. Exactly the same ambiguity arises whenever
an unsigned int confronts a signed int across an operator, and the signed int
has a negative value.
-(
Neither scheme does any better, or any worse, in resolving the ambiguity of
this confrontation.
-)
Suddenly, the negative signed int becomes a very large unsigned int, which may
be surprising
----
+,
or it may be exactly what is desired by a
-knowledgable
+knowledgeable
programmer. Of course, all of these ambiguities can be avoided by a judicious
use of casts.

@@ -87,17 +79,20 @@
whereas the value preserving rules minimize such confrontations. Thus, the
value preserving rules were considered to be safer for the novice, or unwary,
programmer. After much discussion, the
+C89
Committee decided in favor of value preserving rules, despite the fact that the
UNIX C compilers had evolved in the direction of unsigned preserving.

QUIET CHANGE
+IN C89

A program that depends upon unsigned preserving arithmetic conversions will
behave differently, probably without complaint. This
-is
+was
considered the most serious semantic change made by the
+C89
Committee to a widespread current practice.

The Standard clarifies that the
-integral
+integer
promotion rules also apply to bit-fields.
----^----

With no rationale available on C99 TC2, I'm convinced that the words "or
equal to" were added by TC2 entry 10 only in order to fix a superficial
contradiction, and not to permit the promotability of unsigned int. The
latter would have been a huge "quiet" change, and it surely would have
deserved an update to the C99 Rationale.

Cheers,
lacos

Keith Thompson

unread,
May 4, 2010, 3:03:31 AM5/4/10
to
"Ersek, Laszlo" <la...@caesar.elte.hu> writes:
[...]

> I was curious how it went historically.
[snip]

> The C89 Rationale examines this in great detail in 3.2.1.1 "Characters and
> integers", and says
>
> ----v----
> QUIET CHANGE
>
> A program that depends upon unsigned preserving arithmetic conversions
> will behave differently, probably without complaint. This is considered
> the most serious semantic change made by the Committee to a widespread
> current practice.
> ----^----
>
> C99 with TC1 applied (but not TC2) still excluded "int" and "unsigned int"
> from the integer promotions as "source types". Then TC2 came along and
> included them. I tried to find some rationale for this, but the most
> recent C99 Rationale I know of is version 5.10, released in April 2003,
> and C99 TC2 is younger: it was published on 2004-11-15. Thus the most
> recent C99 Rationale has no comment on C99 TC2; in fact, here's the
> unified diff between the corresponding passages of the two rationales,
> after some cautious reformatting. (I recommend piping the diff through
> "colordiff | less -R" or something similar.)
>
[snip]

>
> With no rationale available on C99 TC2, I'm convinced that the words "or
> equal to" were added by TC2 entry 10 only in order to fix a superficial
> contradiction, and not to permit the promotability of unsigned int. The
> latter would have been a huge "quiet" change, and it surely would have
> deserved an update to the C99 Rationale.

Changes in the TCs (Technical Corrigenda) come
from DRs (Defect Reports), which are listed at
<http://www.open-std.org/jtc1/sc22/wg14/www/docs/summary.htm>.

The change in question was made in response to DR #230,
<http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_230.htm>.
It addresses enumerate types with the same integer conversion rank
as int or unsigned int.

--
Keith Thompson (The_Other_Keith) ks...@mib.org <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

Ersek, Laszlo

unread,
May 4, 2010, 5:24:15 AM5/4/10
to
On Tue, 4 May 2010, Keith Thompson wrote:

> "Ersek, Laszlo" <la...@caesar.elte.hu> writes:

>> With no rationale available on C99 TC2, I'm convinced that the words
>> "or equal to" were added by TC2 entry 10 only in order to fix a
>> superficial contradiction, and not to permit the promotability of
>> unsigned int. The latter would have been a huge "quiet" change, and it
>> surely would have deserved an update to the C99 Rationale.
>
> Changes in the TCs (Technical Corrigenda) come from DRs (Defect
> Reports), which are listed at
> <http://www.open-std.org/jtc1/sc22/wg14/www/docs/summary.htm>.
>
> The change in question was made in response to DR #230,
> <http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_230.htm>. It
> addresses enumerate types with the same integer conversion rank as int
> or unsigned int.

That's superb, thank you very much! So it was all about enums with greater
rank than that of int. (Even though "[t]he identifiers in an enumerator
list are declared as constants that have type int" (6.7.2.2p3).)

If I understand the "Committee Discussion" correctly, the two
(alternative) proposed changes were considered equivalent, and the simpler
one was chosen for, well, simplicity. Unfortunately, it seems to have
allowed, unintentionally, the "promotion" of 0u to 0 under the "right"
circumstances.

Thanks!
lacos

Ersek, Laszlo

unread,
May 4, 2010, 5:28:40 AM5/4/10
to
On Tue, 4 May 2010, Ersek, Laszlo wrote:

> On Tue, 4 May 2010, Keith Thompson wrote:

>> The change in question was made in response to DR #230,
>> <http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_230.htm>. It
>> addresses enumerate types with the same integer conversion rank as int

^^^^


>> or unsigned int.
>
> That's superb, thank you very much! So it was all about enums with
> greater rank than that of int.

^^^^^^^

I seem to have some problems with reading comprehension this morning
(too). Sorry.

lacos

Tim Rentsch

unread,
May 5, 2010, 11:19:59 PM5/5/10
to
Eric Sosman <eso...@ieee-dot-org.invalid> writes:

A sentence in 6.2.6.2p2 makes it pretty clear that "subrange" allows
the ranges to be the same:

Each bit that is a value bit shall have the same value as the
same bit in the object representation of the corresponding
unsigned type (if there are M value bits in the signed type
and N in the unsigned type, then M <= N).

If M == N then the non-negative range of the signed type matches
the range of the unsigned type. If "subrange" were meant to
be only proper subranges, the constraint would have been M < N.

0 new messages