
arithmetic overflow revisited


blmblm.m...@gmail.com
Aug 7, 2018, 11:10:35 PM
I teach C programming to undergraduates, and while I do my best
to teach them how to use the language correctly, the recent thread
about undefined behavior reminds me that I never know quite what to
say about overflow in arithmetic on signed integers. Usually I've
just muttered something about how several languages (Java and Scala
come to mind) just give "wrong" answers, and there's not a lot one
can easily do about it. But if in C it's undefined behavior, hm,
that's (strictly speaking!) more serious, and I'm curious about what
one *can* do. Consider this code fragment:

int a, b;
/* code to assign values to a and b */
printf("%d + %d is %d\n", a, b, a + b);

I noticed in the other thread a suggestion (I think -- I may have
misunderstood it) that one can avoid UB in addition of signed integers
by first casting to unsigned integers, adding, and then casting back
to signed. That seems like it ought to give a result consistent with
the behavior of those Other Languages (that quietly wrap around),
and is perhaps the best one can reasonably do. Is that what a truly
pedantic and careful programmer would do?
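For concreteness, here is a minimal sketch of the cast-to-unsigned trick being asked about (the helper name is mine, not from the thread). The unsigned addition itself is fully defined, but converting the result back to int is implementation-defined when the value exceeds INT_MAX; common two's-complement compilers define it as wraparound.

```c
#include <limits.h>

/* Hypothetical helper: add via unsigned arithmetic, which is defined
   to be modulo 2^N.  Converting the result back to int is
   implementation-defined when it exceeds INT_MAX; common compilers
   (gcc, clang, MSVC) define it as two's-complement wraparound. */
int wrapping_add(int a, int b)
{
    return (int)((unsigned)a + (unsigned)b);
}
```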

Mostly I'm curious, because I think for my students the most appropriate
thing to do is just to continue to say that many commonly-used programming
languages (not all!) don't deal very gracefully with this kind of thing
and that a full discussion is beyond the scope of the course. But it
would be nice to give some hints about how to write truly careful code.

--
B. L. Massingill
ObDisclaimer: I don't speak for my employers; they return the favor.

Reinhardt Behm
Aug 7, 2018, 11:48:08 PM
I would word it in the following way:
What usually happens in the CPU (maybe not in all, but in most) is a silent
wraparound. Many languages handle it the same way.
In C the compiler is free to do it the same way and ignore any overflow, but
it is also free to do any kind of nasty thing (the nasal daemons...).

Ignoring overflow and wrapping around will lead to mathematically incorrect
results. These can lead to embarrassing output from your program, like the
websites we have all seen showing totally nonsensical data ("Your
subscription will end in -32767 days"). In real life such results can lead
to catastrophic outcomes, even with people getting killed and the
programmer going to jail.

The responsible way of handling this is to always check the possible range
of inputs and results of calculations - also intermediate ones - and be
prepared that such overflows can happen and choose your data types
accordingly to prevent them. In critical programs document this to make sure
nobody can accuse you and the next programmer modifying the software knows
about it.
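As a sketch of that advice (the helper name is hypothetical): the ranges can be checked before the addition is ever evaluated, so the overflowing operation never happens at all.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: returns true iff a + b fits in an int.
   The subtractions used for the check cannot themselves overflow,
   so no UB is ever evaluated. */
bool add_in_range(int a, int b)
{
    if (b > 0)
        return a <= INT_MAX - b;
    return a >= INT_MIN - b;
}
```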

A question to the standard experts here:
Arithmetic operations with int types can inevitably lead to overflow and
thus UB, which allows the compiler to do anything nasty. Can we expect
that, if the inputs are such that no overflow will happen, we are
guaranteed the mathematically correct values and no nasal daemons, even if
the compiler has no chance to deduce that no overflow will happen?

--
Reinhardt

Malcolm McLean
Aug 8, 2018, 5:20:26 AM
On Wednesday, August 8, 2018 at 4:48:08 AM UTC+1, Reinhardt Behm wrote:
>
> A question to the standard experts here:
> Arithmetic operations with int types can inevitably lead to overflow and
> thus UB. Which allows the compiler do to anything nasty. Can we expect that
> if the inputs are such that no overflow will happen, we are guaranteed the
> mathematically correct values and no nasal daemons, even if the compiler has
> no chance to deduce that no OV will happen.
>
Take this code.

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    printf("%d\n", atoi(argv[1]) + atoi(argv[2]));
    return 0;
}

The compiler has no way of detecting overflow at compile time. However
if the inputs are within range, it must act as you would expect.
Otherwise practically any calculation would be UB, because it's generally
too hard to reason about values set in distant parts of the program,
even if the programmer knows that they must be small.

However if the results overflow, output can be anything. Generally it
will be a wrap, but a high quality implementation might terminate with
an error message, because, as you say, wrong results are often worse
than no results at all.

David Brown
Aug 8, 2018, 5:33:56 AM
On 08/08/18 05:48, Reinhardt Behm wrote:

> A question to the standard experts here:
> Arithmetic operations with int types can inevitably lead to overflow and
> thus UB. Which allows the compiler do to anything nasty. Can we expect that
> if the inputs are such that no overflow will happen, we are guaranteed the
> mathematically correct values and no nasal daemons, even if the compiler has
> no chance to deduce that no OV will happen.
>

Yes.

David Brown
Aug 8, 2018, 5:41:07 AM
Teach them not to allow signed integer overflow in their programs.
There are, as you say, languages that define the behaviour of signed
integer overflow - but they give incorrect answers. Defined answers,
but wrong answers (for most purposes).

Teach them that they should not add two numbers if they might overflow -
just as they should not empty one bowl of apples into another bowl of
apples if that bowl is not big enough to hold them all. In C, doing so
may result in a troll coming along and eating you, and then the apples.
In Java, you'd be guaranteed a result - a negative number of apples in
the bowl. /Neither/ result is good. So don't do it.

Casting to unsigned integers, doing the arithmetic, and casting back is
equivalent to the Java "solution". It's adding some more apples to a
pile of apples and getting a negative result. Don't do that.
Consistently, predictably wrong and meaningless is still wrong and
meaningless.


There /are/ occasions when you actively want wraparound behaviour, and
unsigned integers give you that. But they are rare - very rare. In
almost all cases, if it overflows, it's a bug.


Bart
Aug 8, 2018, 6:21:49 AM
On 08/08/2018 04:10, blm...@myrealbox.com wrote:
> I teach C programming to undergraduates, and while I do my best
> to teach them how to use the language correctly, the recent thread
> about undefined behavior reminds me that I never know quite what to
> say about overflow in arithmetic on signed integers. Usually I've
> just muttered something about how several languages (Java and Scala
> come to mind) just give "wrong" answers, and there's not a lot one
> can easily do about it. But if in C it's undefined behavior, hm,
> that's (strictly speaking!) more serious, and I'm curious about what
> one *can* do. Consider this code fragment:
>
> int a, b;
> /* code to assign values to a and b */
> printf("%d + %d is %d\n", a, b, a + b);
>
> I noticed in the other thread a suggestion (I think -- I may have
> misunderstood it) that one can avoid UB in addition of signed integers
> by first casting to unsigned integers, adding, and then casting back
> to signed. That seems like it ought to give a result consistent with
> the behavior or those Other Languages (that quietly wrap around),
> and is perhaps the best one can reasonably do.

This is what makes it so silly. You have to obfuscate your code just to
get to the starting point of those other languages. And do it in a
million places (eg. everywhere you might use ++i).

Those other languages had the benefit of being invented when pretty much
every machine used twos complement arithmetic, and they could afford to
ignore the oddball ones. (Although I think Java specifically makes its
behaviour identical on all machines.)

There's no reason why the same assumption couldn't be made with C, if
you are never going to run it on anything different. But no, C has to
play the UB card.

> Is that what a truly
> pedantic and careful programmer would do?

If you're that worried about it, you need to use a language like Python,
where overflow is impossible: the numbers just get bigger and bigger
until you run out of memory.

Or one like Ada, where I can't imagine there isn't a facility to deal
with overflow systematically.

Otherwise, just remember this is a low level language and you're on your
own. There are no exceptions to trap an overflow and jump to a part of
your program to deal with that.

(And even if there were, it would make programs much more complicated.
How exactly could you proceed from such an event anyway? I think the
best is gcc's -ftrapv which will abort your program.)

If the correct result is absolutely crucial, and you can't ensure no
overflow at coding time, then you will need to put in validation checks;
in your example, something like:

if (addoverflows(a,b)) ...

addoverflows() will check whether a+b would overflow, and one way of doing
that involves performing the addition - but if it does overflow, that
itself invokes UB! So here you might use the unsigned version, but it
would make life much easier if it weren't UB.
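On gcc and clang (a toolchain assumption; this is not standard C), an addoverflows() of the kind Bart describes can be written with the __builtin_add_overflow intrinsic, which reports overflow without ever invoking UB:

```c
#include <limits.h>
#include <stdbool.h>

/* gcc/clang extension: __builtin_add_overflow computes the sum as if
   with infinite precision, stores the wrapped value through the
   pointer, and returns true iff the true result did not fit. */
bool addoverflows(int a, int b)
{
    int result;
    return __builtin_add_overflow(a, b, &result);
}
```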

This is where I disagree with certain people here (although I
acknowledge that UB can be used to generate marginally more efficient code).

>
> Mostly I'm curious, because I think for my students the most appropriate
> thing to do is just to continue to say that many commonly-used programming
> languages (not all!) don't deal very gracefully with this kind of thing
> and that a full discussion is beyond the scope of the course. But it
> would be nice to give some hints about how to write truly careful code.

If specifically coding in such a low level language, one should be aware
of how overflows are handled inside a machine, which involves looking
beyond what the language says about it.

And actually, the way it works is that addition (signed /or/ unsigned,
it's usually exactly the same operation) wraps around. There may or may
not be internal flags set. But usually, INT_MAX+1 gives you INT_MIN.

One thing that bugs me with this, is that exactly the same limitation of
fixed width numbers also causes overflow with unsigned integers. Imagine
your program was like this:

unsigned int a, b;
/* code to assign values to a and b */
printf("%u + %u is %u\n", a, b, a + b);

Does making a and b unsigned magically make the problem of an incorrect
result disappear? It doesn't.

Except that C says a+b can NEVER overflow, and the result is ALWAYS
correct. Which means that UINT_MAX+1 resulting in zero is fine; it
doesn't represent any kind of overflow.

Good luck putting that across...
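To make the point concrete (the function name is mine): in C's own terms this is not overflow at all, because unsigned arithmetic is defined to be modulo UINT_MAX + 1.

```c
#include <limits.h>

/* Always defined: unsigned addition is reduced modulo UINT_MAX + 1,
   so UINT_MAX + 1u is exactly 0 - "correct" in the modular model,
   wrong if you meant ordinary counting. */
unsigned wrap_demo(unsigned a, unsigned b)
{
    return a + b;
}
```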

--
bart

Bart
Aug 8, 2018, 6:25:55 AM
On 08/08/2018 10:40, David Brown wrote:

> Teach them that they should not add two numbers if they might overflow -
> just as they should not empty one bowl of apples into another bowl of
> apples if that bowl is not big enough to hold them all.

Unless the bowls are unsigned.

Then the problem becomes one of explaining why it doesn't matter if the
numbers are unsigned, even though you still don't get the total of all
the apples. (And not even the saturated amount; you might end up with
fewer than were in either bowl.)

--
bart

Reinhardt Behm
Aug 8, 2018, 6:43:10 AM
Thank you and David for this clarification. I am no language lawyer, but
sometimes one can get the impression in this group that everything is UB.

--
Reinhardt

Malcolm McLean
Aug 8, 2018, 6:44:35 AM
On Wednesday, August 8, 2018 at 11:21:49 AM UTC+1, Bart wrote:
>
> unsigned int a, b;
> /* code to assign values to a and b */
> printf("%u + %u is %u\n", a, b, a + b);
>
> Does making a and b unsigned magically make the problem of an incorrect
> result disappear? It doesn't.
>
> Except that C says a+b can NEVER overflow, and the result is ALWAYS
> correct. Which means that UINT_MAX+1 resulting in zero is fine; it
> doesn't represent any kind of overflow.
>
> Good luck putting that across...
>
Another issue is that most integers in a program represent either counts
of things in memory, indices into those arrays, or sizes of memory in bytes.
So they should be size_t. Which means that C defines wrong results on
overflow. To be fair, there is ssize_t, but I have seldom seen it used.


David Brown
Aug 8, 2018, 6:45:05 AM
On 08/08/18 12:25, Bart wrote:
> On 08/08/2018 10:40, David Brown wrote:
>
>> Teach them that they should not add two numbers if they might overflow -
>> just as they should not empty one bowl of apples into another bowl of
>> apples if that bowl is not big enough to hold them all.
>
> Unless the bowls are unsigned.

Read the rest of my post.

Teach them not to add two numbers if they might overflow. That
/includes/ unsigned numbers. Unsigned overflow is defined behaviour in
C - but it still gives you the wrong result for almost all cases.

>
> Then the problem becomes one of explaining why it doesn't matter if the
> numbers are unsigned, even though you still don't get the total of all
> the apples. (And not even the saturated amount; you might end with less
> than in either bowl.)
>

No, the key point to explain is how to avoid overflow. That's all. It
doesn't really matter if the integers are signed or unsigned - if you
are overflowing, you have a bug in the code.

Once you get onto more advanced stuff, you can start looking into why
wraparound unsigned overflow is occasionally useful, but that is for
much later on.

Bart
Aug 8, 2018, 6:50:33 AM
On 08/08/2018 11:44, David Brown wrote:
> On 08/08/18 12:25, Bart wrote:
>> On 08/08/2018 10:40, David Brown wrote:
>>
>>> Teach them that they should not add two numbers if they might overflow -
>>> just as they should not empty one bowl of apples into another bowl of
>>> apples if that bowl is not big enough to hold them all.
>>
>> Unless the bowls are unsigned.
>
> Read the rest of my post.
>
> Teach them not to add two numbers if they might overflow. That
> /includes/ unsigned numbers. Unsigned overflow is defined behaviour in
> C - but it still gives you the wrong result for almost all cases.

Thanks, that must be the first time anyone has agreed with me on that point.


--
bart

mark.b...@gmail.com
Aug 8, 2018, 6:50:49 AM
On Wednesday, 8 August 2018 11:25:55 UTC+1, Bart wrote:
> On 08/08/2018 10:40, David Brown wrote:
>
> > Teach them that they should not add two numbers if they might overflow -
> > just as they should not empty one bowl of apples into another bowl of
> > apples if that bowl is not big enough to hold them all.
>
> Unless the bowls are unsigned.

What on earth do you think you mean by this?

David's point is simple.
1) Integers, in practically any programming language, have finite capacity.
(Naturally there are various Big Number implementations, but native integers
are finite).
2) If you exceed that capacity the result can not be arithmetically
correct.
3) Ergo, you should not exceed that capacity.

The fact that for certain languages, or certain implementations of languages,
the result of exceeding the capacity is a predictable, but still
arithmetically incorrect, value does not invalidate David's point.

The defined behaviour of signed arithmetic overflow in some languages is, I
suspect, largely a result of them developing in an environment where it was
reasonable to restrict the target hardware to that implementing
twos-complement arithmetic. I'd be interested to know if anyone has tried
implementing, say, Java on ones-complement or sign-and-magnitude environments
and if so how much that hurt.

The defined behaviour doesn't make the arithmetic results any more correct,
nor, in my mind, does it invalidate the decision of the designers of C (see
my other post "from the horse's mouth" for some background on the
design/evolution of C) that the behaviour of such overflow should not be
defined, so as to avoid the need for compiler-writers to jump through hoops
in environments where twos-complement wasn't the default.

Bart
Aug 8, 2018, 6:59:02 AM
C implements array indices as pointer offsets, which may need to be
positive or negative. So the use of size_t might be a problem.

Actually, it needs a signed type one bit wider than size_t: given a
pointer to location zero, you might need to access an element at the top
of memory via an offset.

And a pointer to the top of memory, and needing to access a location at
(or near) the bottom of memory.

--
bart

James Kuyper
Aug 8, 2018, 7:02:00 AM
On 08/07/2018 11:10 PM, blm...@myrealbox.com wrote:
> I teach C programming to undergraduates, and while I do my best
> to teach them how to use the language correctly, the recent thread
> about undefined behavior reminds me that I never know quite what to
> say about overflow in arithmetic on signed integers. Usually I've
> just muttered something about how several languages (Java and Scala
> come to mind) just give "wrong" answers, and there's not a lot one
> can easily do about it. But if in C it's undefined behavior, hm,
> that's (strictly speaking!) more serious, and I'm curious about what
> one *can* do. Consider this code fragment:
>
> int a, b;
> /* code to assign values to a and b */
> printf("%d + %d is %d\n", a, b, a + b);
>
> I noticed in the other thread a suggestion (I think -- I may have
> misunderstood it) that one can avoid UB in addition of signed integers
> by first casting to unsigned integers, adding, and then casting back
> to signed. That seems like it ought to give a result consistent with
> the behavior or those Other Languages (that quietly wrap around),
> and is perhaps the best one can reasonably do. Is that what a truly
> pedantic and careful programmer would do?

No.

The problem is that any bit pattern that represents a negative value in
the signed type will represent a positive value in the corresponding
unsigned type that is too big to be represented in the signed type.
Therefore, when such a value is converted from the unsigned type to the
signed type "either the result is implementation-defined or an
implementation-defined signal is raised." (6.3.1.3p3). That's marginally
better than "undefined behavior": "implementation-defined" behavior is
unspecified behavior for which an implementation is required to document
which choice it makes. Therefore, if your code is targeted to only a
small number of implementations, you can read the documentation for each
one, determine what it does with signed overflow, and write your code
accordingly. But such code is not guaranteed to work as desired on all
implementations of C.

I've heard it claimed that there is no C compiler, anywhere, that takes
the "raise a signal" option. I'm not sure how anyone could be
justifiably certain about that (unjustified certainty, on the other
hand, is easy to come by). But it is the case that most implementations
choose "implementation-defined result", and most of them define the
result as 2's complement wrap-around, in precisely the same fashion as
the other languages you're talking about.

The right way to deal with signed integer overflow portably is to
prevent it from happening. This can be trivial if the numbers you're
working with have a limited range: if you use a week numbering
convention that guarantees that week_number is always positive and less
than 53, then you can calculate 7*week_number without having to worry
about overflow. However, avoiding overflow can be annoyingly difficult
if you need to write code that works with arbitrary values.
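The week-number example above might look like this (a sketch; the function name and the 1..52 convention are illustrative):

```c
#include <assert.h>

/* With the documented range 1 <= week_number <= 52, the product is
   at most 364 - far below INT_MAX on any conforming implementation
   (INT_MAX is at least 32767), so overflow is impossible. */
int days_from_weeks(int week_number)
{
    assert(week_number >= 1 && week_number <= 52);
    return 7 * week_number;
}
```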

Bart
Aug 8, 2018, 7:07:34 AM
On 08/08/2018 11:50, mark.b...@gmail.com wrote:
> On Wednesday, 8 August 2018 11:25:55 UTC+1, Bart wrote:
>> On 08/08/2018 10:40, David Brown wrote:
>>
>>> Teach them that they should not add two numbers if they might overflow -
>>> just as they should not empty one bowl of apples into another bowl of
>>> apples if that bowl is not big enough to hold them all.
>>
>> Unless the bowls are unsigned.
>
> What on earth do you think you mean by this?
...
> The defined behaviour doesn't make the arithmetic results any more correct,

All I ever remember reading on this group is that there is no such thing
as overflow with unsigned integers, so the wraparound results are always
correct.

That would presume that everyone writing such code always had modular
arithmetic in mind.

You're right of course in that the overflow of both signed and unsigned
numbers, when people are trying to emulate pen and paper results, is
what is important and what you should strive to avoid.

But discussion of modular arithmetic and UB tends to side-line that.

--
bart

Ben Bacarisse
Aug 8, 2018, 7:23:58 AM
ssize_t is from POSIX; C has no such type. Even if it had, using it
would not help. How can switching from wraparound (which can easily be
tested for) to undefined overflow be any better?
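A sketch of the easy test alluded to here (the helper name is mine): unsigned wraparound can be detected after the fact, because the sum wrapped exactly when it ends up smaller than an operand.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true and stores the sum if a + b did not wrap; returns
   false (with the wrapped value stored) if it did.  The comparison
   works because the sum wraps iff it comes out smaller than a. */
bool size_add(size_t a, size_t b, size_t *sum)
{
    *sum = a + b;        /* defined: modulo SIZE_MAX + 1 */
    return *sum >= a;
}
```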

But both miss the point. You are right -- use size_t for counts in
algorithms but there is usually no need to test for wraparound when
iterating because you will have tested or limited the size of the
problem first.

--
Ben.

David Brown
Aug 8, 2018, 7:24:54 AM
It's great to see people who ask, and then appreciate the answer!

There are more things that are UB in C than in most programming
languages. However, some people here (and outside c.l.c.) seem to think
this is necessarily a bad thing, or at least that defined behaviour is
always better. Writing code that runs something with undefined
behaviour is, of course, a bad thing - but the UB itself is not.


David Brown
Aug 8, 2018, 7:27:22 AM
No, I don't think so. For one thing, I disagreed with you. I said
don't overflow the apples, and you said "unless the bowls are unsigned".
Don't overflow the bowls even if they are unsigned.

Secondly, I have on a number of occasions pointed out that unsigned
overflow - defined behaviour though it is - is usually incorrect code.




David Brown
Aug 8, 2018, 7:36:51 AM
On 08/08/18 13:07, Bart wrote:
> On 08/08/2018 11:50, mark.b...@gmail.com wrote:
>> On Wednesday, 8 August 2018 11:25:55 UTC+1, Bart wrote:
>>> On 08/08/2018 10:40, David Brown wrote:
>>>
>>>> Teach them that they should not add two numbers if they might
>>>> overflow -
>>>> just as they should not empty one bowl of apples into another bowl of
>>>> apples if that bowl is not big enough to hold them all.
>>>
>>> Unless the bowls are unsigned.
>>
>> What on earth do you think you mean by this?
> ...
>> The defined behaviour doesn't make the arithmetic results any more
>> correct,
>
> All I ever remember reading on this group is that there is no such thing
> as overflow with unsigned integers, so the wraparound results are always
> correct.

The wraparound behaviour of unsigned integers means that the behaviour
is always defined.

Whether it is /correct/ depends on what you are trying to model with
them. As Mark says, they have finite capacity, unlike mathematical
non-negative integers. The unsigned integers in C have arithmetic
modulo 2^n for some n. Their arithmetic on overflow is correct in this
model - no matter what values two unsigneds have, adding them will give
the correct answer modulo 2^n. But if you are wanting to model
mathematical non-negative integers, the answer will be incorrect on
overflow.

>
> That would presume that everyone writing such code always had modular
> arithmetic in mind.
>
> You're right of course in that the overflow of both signed and unsigned
> numbers, when people are trying to emulate pen and paper results, is
> what is important and what you should strive to avoid.
>
> But discussion of modular arithmetic and UB tends to side-line that.
>

When discussing the behaviour of integer overflow in C, the undefined
nature of signed integer overflows compared to the defined nature of
unsigned integer overflows is usually the key point. In both cases, the
answer is probably wrong from the programmer's viewpoint. I can agree
that this correctness issue is often sidelined, but not by me.


David Brown
Aug 8, 2018, 7:39:44 AM
On 08/08/18 12:44, Malcolm McLean wrote:
> On Wednesday, August 8, 2018 at 11:21:49 AM UTC+1, Bart wrote:
>>
>> unsigned int a, b;
>> /* code to assign values to a and b */
>> printf("%u + %u is %u\n", a, b, a + b);
>>
>> Does making a and b unsigned magically make the problem of an incorrect
>> result disappear? It doesn't.
>>
>> Except that C says a+b can NEVER overflow, and the result is ALWAYS
>> correct. Which means that UINT_MAX+1 resulting in zero is fine; it
>> doesn't represent any kind of overflow.
>>
>> Good luck putting that across...
>>
> Another issue is that most integers in a program represent either counts
> of things in memory, indices into those arrays, or sizes of memory in bytes.

Nonsense. I know you believe this - you post this claim regularly. It
may be true in /your/ programs, but not other ones. And even if it
/were/ true, it would be mostly irrelevant, except in that overflows of
counts, indices or sizes would be program errors regardless of whether
you use signed or unsigned types.

> So they should be size_t. Which means that C defines wrong results on
> overflow. To be fair, there is ssize_t, but I have seldom seen it used.
>

ssize_t is a POSIX type, not part of the C standards.

David Brown
Aug 8, 2018, 7:48:06 AM
There may be a gcc/clang "-fsanitize" option that would count. Normally
gcc and clang define the conversion from unsigned to signed types to use
two's complement wraparound (this is not in any way dependent on -fwrapv
or other flags affecting signed integer overflow). But it is
conceivable that -fsanitize=signed-integer-overflow would change that
behaviour, and count as "raising a signal".


> The right way to deal with signed integer overflow portably is to
> prevent it from happening. This can be trivial if the numbers you're
> working with have a limited range: if you use a week numbering
> convention that guarantees that week_number is always positive and less
> than 53, then you can calculate 7*week_number without having to worry
> about overflow. However, avoiding overflow can be annoyingly difficult
> if you need to write code that works with arbitrary values.
>

Agreed.

John Forkosh
Aug 8, 2018, 8:06:43 AM
#include <limits.h>
#include <stdio.h>

int a, b;
/* code to assign values to a and b */
if ( a >= 0 ? b <= INT_MAX - a : b >= INT_MIN - a )
    printf("%d + %d is %d\n", a, b, a + b);
else
    printf("don't do that\n");

But since they're apparently pretty new to programming,
it's -- as you seem to be recognizing -- ridiculous to go off
on some long-winded detour:
"> ...a full discussion is beyond the scope of the course"
seems exactly right to me.
--
John Forkosh ( mailto: j...@f.com where j=john and f=forkosh )

David Brown
Aug 8, 2018, 9:03:34 AM
On 08/08/18 14:35, Stefan Ram wrote:
> James Kuyper <james...@alumni.caltech.edu> writes:
>> The problem is that any bit pattern that represents a negative value in
>> the signed type will represent a positive value in the corresponding
>> unsigned type that is too big to be represented in the signed type.
>
> So, does this mean the old trick to check whether an
> int-value is between 0 and and a positive int value
> (3 is used in the example below) with just one comparison
> really is portable in C and correct for all possible values
> of the argument (and all possible int values of the end of
> the range)?
>
> int is_in_range( int const n ){ return( unsigned )n < 3; }
>
> And can we expect this to be more efficient than "0 <= n &&
> n < 3" or is the function just trying in vain to do the work
> of the optimizer?
>
> In practice, with GCC:
>
> __attribute__((always_inline))
> static inline int is_in_range( int const n )
> { return( unsigned )n < 3; }
>

In practice, with gcc:

static bool is_in_range(int n) {
return (n >= 0) && (n < 3);
}

Write what you /mean/. You are writing a test function - the sensible
return type (for the last 20 years) is "bool".

Let the compiler handle the little stuff. It will do the optimisation
(generating the same code each time). It will decide when inlining
makes sense.

This is in the context of teaching - you don't teach "tricks" like
messing about with casts unless you are doing very advanced classes and
your students already understand the importance of writing clear code.
There /are/ situations when such tricks might be acceptable - buried
deep within the implementations of performance critical libraries,
surrounded by comments explaining what is going on. There is no place
for them in normal coding.

David Brown
Aug 8, 2018, 9:46:44 AM
On 08/08/18 15:12, Stefan Ram wrote:
> blm...@myrealbox.com <blmblm.m...@gmail.com> writes:
>> Mostly I'm curious, because I think for my students the most appropriate
>> thing to do is just to continue to say that many commonly-used programming
>> languages (not all!) don't deal very gracefully with this kind of thing
>> and that a full discussion is beyond the scope of the course. But it
>> would be nice to give some hints about how to write truly careful code.
>
> C is a language for experts. It takes a lot of learning
> (time) to become an expert. Most students will not reach
> this level. (Especially when they do not study computer
> science but, say, mechanical engineering.)
>

So teach them simple rules - don't let your numbers overflow. That
works for /all/ programming languages, regardless of whether there is
run-time checking for overflow or not.

And teach them to format their code in commonly used, readable styles.

> Why - of all the programming languages - does it have to be
> C which these poor folks have to learn?
>

I agree with this in principle (though I have no idea why the students
on the OP's course are learning C - it could be for good reasons). I
would not suggest C++, Java or BASIC as sensible alternatives, but there
are many other languages out there which can be a better choice.

However, I would not consider more or less UB as a reason for choosing a
language - nor would I consider integer overflow behaviour to be an
issue in itself, as overflows are errors in /all/ languages. A language
that detects integer overflow at runtime can be convenient for debugging
and make it easier to find problems in the code, which is a definite plus.


Reinhardt Behm
Aug 8, 2018, 9:56:11 AM
That's what my parents taught me.

> There are more things that are UB in C than in most programming
> languages. However, some people here (and outside c.l.c.) seem to think
> this is necessarily a bad thing, or at least that defined behaviour is
> always better. Writing code that runs something with undefined
> behaviour is, of course, a bad thing - but the UB itself is not.

Agreed. As far as I understand it, UB usually comes from situations which
are often not clearly definable in the language, at least not generally.
From what I have seen so far, they are often in some obscure corners and
indicate possible programming errors due to sloppiness.

--
Reinhardt

Bart
Aug 8, 2018, 10:21:36 AM
On 08/08/2018 14:12, Stefan Ram wrote:
> blm...@myrealbox.com <blmblm.m...@gmail.com> writes:
>> Mostly I'm curious, because I think for my students the most appropriate
>> thing to do is just to continue to say that many commonly-used programming
>> languages (not all!) don't deal very gracefully with this kind of thing
>> and that a full discussion is beyond the scope of the course. But it
>> would be nice to give some hints about how to write truly careful code.
>
> C is a language for experts.

Experts in what? In the C standard?

Somebody may have a small low-level task, and requires a subset of the
features of C, and would like it to be done at the speed of C.

Or they may already have an algorithm, and want to implement it in C,
perhaps as just one function to be used from another language, while being
aware of the limitations of fixed-width arithmetic.

Someone should be able to use the language knowing the basics. Or by
tweaking an existing working program (how I usually tinker with a
language I don't know).


> It takes a lot of learning
> (time) to become an expert. Most students will not reach
> this level. (Especially when they do not study computer
> science but, say, mechanical engineering.)
>
> Why - of all the programming languages - does it have to be
> C which these poor folks have to learn?
>
> LIST
> 10 REM READ TWO NUMBERS AND PRINT THEIR POWER
> 20 INPUT I%
> 30 INPUT J%
> 40 PRINT I% ^ J%

> ? 32767
> ? 32767
> ?OVERFLOW ERROR IN 40

I notice your Java version didn't attempt to do the same thing (using
hardcoded values, and doing add rather than power).

I can try this in one of my languages (the one that apparently can't
hold a candle to Python), where I normally use 64-bit signed ints with
no provision for dealing with overflow:

print "Enter two integers:"
readln a, b
println a ** b

If I enter 10,20, the result is 7766279631452241920 (truncated to 64
bits at each step of the process). If I want a true result that copes
with big numbers, I can write that last line like this:

println longint(a) ** longint(b)

Result is now 100000000000000000000. This is the approach I favour: only
do the extra work if really necessary, like when dealing with user input.

In C, you would probably fall back to 'double' rather than trying to
work with big integers. There are still limitations, but ones people are
already familiar with from using calculators. So 10**20 + 1 might still be 10**20.

--
bart

Malcolm McLean

unread,
Aug 8, 2018, 12:21:40 PM8/8/18
to
On Wednesday, August 8, 2018 at 2:46:44 PM UTC+1, David Brown wrote:
>
> However, I would not consider more or less UB as a reason for choosing a
> language - nor would I consider integer overflow behaviour to be an
> issue in itself, as overflows are errors in /all/ languages. A language
> that detects integer overflow at runtime can be convenient for debugging
> and make it easier to find problems in the code, which is a definite plus.
>
Eliminating UB in favour of a defined exit with an error message is a
good idea, but not practical for C. It's only a small win, however: the
program still essentially crashes out, but at least you can't get a
confusing result which appears to work while corrupting something
elsewhere in the program.
Whilst no finite computer can represent all the integers from 0 to infinity,
that's more a theoretical than a practical restriction. The issue with
C overflows is that numbers which represent something - like the number
of people in the world - can overflow on types which a reasonable but
inexperienced person might assume can hold them, like a long int. In
languages which allow for huge integers by default, that's not a problem.



Rick C. Hodgin

unread,
Aug 8, 2018, 12:32:45 PM8/8/18
to
On Wednesday, August 8, 2018 at 12:21:40 PM UTC-4, Malcolm McLean wrote:
> Eliminating UB in favour of a defined exit with an error message is a
> good idea, but not practical for C.

CAlive's design addresses this with marker points you can drop
into your code, and then return to the marker point on any range
of caught exceptions. You then do a single test manually, or
have it implicitly set up, to handle caught errors. The location
of the error and other information is available, and there's an
unwindTo markerName ability to return up the call stack safely.

The ability to auto-trap to the debugger through an inquiry state
is also available.

These all exist as extensions to the base C language. No C code
changes, except for the addition of the new features. All other code
and logic remains the same.

--
Rick C. Hodgin

David Brown

unread,
Aug 8, 2018, 1:28:40 PM8/8/18
to
On 08/08/18 18:21, Malcolm McLean wrote:
> On Wednesday, August 8, 2018 at 2:46:44 PM UTC+1, David Brown wrote:
>>
>> However, I would not consider more or less UB as a reason for choosing a
>> language - nor would I consider integer overflow behaviour to be an
>> issue in itself, as overflows are errors in /all/ languages. A language
>> that detects integer overflow at runtime can be convenient for debugging
>> and make it easier to find problems in the code, which is a definite plus.
>>
> Eliminating UB in favour of a defined exit with an error message is a
> good idea, but not practical for C.

It is not a practical idea for /any/ general purpose programming
language. It is possible to do it for some types of UB, maybe even (if
you are willing to pay the run-time costs) for /most/ UB - but not for
all UB.

But it /is/ possible to choose to exit with an error message on some
types of UB - that is perfectly valid for C, and exists in some
implementations (such as gcc with "-ftrapv" or
"-fsanitize=signed-integer-overflow" options).

Whether it is practical for any given use will vary.

> It's only a small win, however: the
> program still essentially crashes out, but at least you can't get a
> confusing result which appears to work while corrupting something
> elsewhere in the program.
> Whilst no finite computer can represent all the integers from 0 to infinity,
> that's more a theoretical than a practical restriction. The issue with
> C overflows is that numbers which represent something - like the number
> of people in the world - can overflow on types which a reasonable but
> inexperienced person might assume can hold them, like a long int. In
> languages which allow for huge integers by default, that's not a problem.
>

There are not many uses for which 32-bit integers are not big enough -
and very, very few for which 64-bit (the most common "long int" these
days) is not suitable. Still, I agree with the principle of your point.

Keith Thompson

unread,
Aug 8, 2018, 1:47:36 PM8/8/18
to
David Brown <david...@hesbynett.no> writes:
[...]
> When discussing the behaviour of integer overflow in C, the undefined
> nature of signed integer overflows compared to the defined nature of
> unsigned integer overflows is usually the key point. In both cases, the
> answer is probably wrong from the programmer's viewpoint. I can agree
> that this correctness issue is often sidelined, but not by me.

A note on terminology.

The C standard (N1570 6.2.5p9) says:

A computation involving unsigned operands can never overflow,
because a result that cannot be represented by the resulting
unsigned integer type is reduced modulo the number that is one
greater than the largest value that can be represented by the
resulting type.

What this means is that the standard does not use the word "overflow"
to refer to, for example, the behavior of `UINT_MAX + 1U`.

It would have been equally reasonable for the standard to say
instead that operations on unsigned operands *can* "overflow",
and that the behavior is well defined.

If I say that unsigned operations cannot overflow, I'm simply using
the word "overflow" in the sense that the standard uses it. If,
on the other hand, I say that the behavior of unsigned overflow is
well defined, I'm using the word "overflow" in a more informal way,
one that should be easily understood even by a language lawyer.
(I personally prefer to stick to the standard's terminology when
possible, but there's nothing wrong with a little informality --
and if I use the word "overflow" informally, I'll probably note in
passing that the standard doesn't use the word that way.)

And there *are* contexts (not all contexts, but some) in which the
wraparound behavior defined by the language is exactly what you want,
and having `UINT_MAX + 1U` yield `0U` is absolutely correct.

--
Keith Thompson (The_Other_Keith) ks...@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

Keith Thompson

unread,
Aug 8, 2018, 1:49:27 PM8/8/18
to
Malcolm McLean <malcolm.ar...@gmail.com> writes:
[...]
> Another issue is that most integers in a program represent either counts
> of things in memory, indices into those arrays, or sizes of memory in bytes.
> So they should be size_t. Which means that C defines wrong results on
> overflow. To be fair, there is ssize_t, but I have seldom seen it used.

ssize_t is defined by POSIX, not by ISO C.

Keith Thompson

unread,
Aug 8, 2018, 1:56:27 PM8/8/18
to
James Kuyper <james...@alumni.caltech.edu> writes:
[...]
> The problem is that any bit pattern that represents a negative value in
> the signed type will represent a positive value in the corresponding
> unsigned type that is too big to be represented in the signed type.
> Therefore, when such a value is converted from the unsigned type to the
> signed type "either the result is implementation-defined or an
> implementation-defined signal is raised." (6.3.1.3p3). That's marginally
> better than "undefined behavior": "implementation-defined" behavior is
> unspecified behavior for which an implementation is required to document
> which choice it makes. Therefore, if your code is targeted to only a
> small number of implementations, you can read the documentation for each
> one, determine what it does with signed overflow, and write your code
> accordingly. But such code is not guaranteed to work as desired on all
> implementations of C.

Ignoring the signal clause in that sentence, such a conversion
yields an implementation-defined *result*, not implementation-defined
*behavior*. That means that you can be certain that the conversion
will yield *some* value (and that the implementation must document
how it determines that value), and that, for example, a statement
following the conversion will be executed.

> I've heard it claimed that there is no C compiler, anywhere, that takes
> the "raise a signal" option. I'm not sure how anyone could be
> justifiably certain about that (unjustified certainty, on the other
> hand, is easy to come by). But it is the case that most implementations
> choose "implementation-defined result", and most of them define the
> result as 2's complement wrap-around, in precisely the same fashion as
> the other languages you're talking about.

I don't think I've ever heard that particular claim. I've *made* a
similar claim several times, but I've always been careful to qualify
it by saying that *as far as I know* no C compiler takes the "raise
a signal" option. (And nobody has ever provided a counterexample
in response, which somewhat increases my confidence that there are
no such implementations.)

[...]

Keith Thompson

unread,
Aug 8, 2018, 2:00:08 PM8/8/18
to
r...@zedat.fu-berlin.de (Stefan Ram) writes:
> James Kuyper <james...@alumni.caltech.edu> writes:
>>The problem is that any bit pattern that represents a negative value in
>>the signed type will represent a positive value in the corresponding
>>unsigned type that is too big to be represented in the signed type.
>
> So, does this mean the old trick to check whether an
> int-value is between 0 and and a positive int value
> (3 is used in the example below) with just one comparison
> really is portable in C and correct for all possible values
> of the argument (and all possible int values of the end of
> the range)?
>
> int is_in_range( int const n ){ return( unsigned )n < 3; }

Conversion from int to unsigned int is well defined (and is not
defined in terms of bit representations), so that should work
as intended.

> And can we expect this to be more efficient than "0 <= n &&
> n < 3" or is the function just trying in vain to do the work
> of the optimizer?

It's probably just trying in vain to do the work of the optimizer.

> In practice, with GCC:
>
> __attribute__((always_inline))
> static inline int is_in_range( int const n )
> { return( unsigned )n < 3; }

If you find (by examining the generated assembly code and/or
measuring performance) that the (unsigned) trick yields significantly
faster code than the more obvious `n >= 0 && n < 3`, then consider
using it -- and adding a comment explaining why you're not using
the more straightforward method.

David Brown

unread,
Aug 8, 2018, 2:23:26 PM8/8/18
to
I agree with everything you say here (after all, it is straight from the
standards). But I think from the context it was clear that the
"overflow" under discussion was the behaviour of expressions like
INT_MAX + 1 for signed integers and UINT_MAX + 1u for unsigned integers.
We could have been more precise about the terms used, however.

> And there *are* contexts (not all contexts, but some) in which the
> wraparound behavior defined by the language is exactly what you want,
> and having `UINT_MAX + 1U` yield `0U` is absolutely correct.
>

Yes, I said as much, and can provide examples if they are of interest to
anyone. But in most contexts (including counting apples...) modulo
wrapping in unsigned integers gives answers that do not fit the logic of
the problem the code is supposed to handle.

Ben Bacarisse

unread,
Aug 8, 2018, 4:20:27 PM8/8/18
to
John Forkosh <for...@panix.com> writes:

> blm...@myrealbox.com <blmblm.m...@gmail.com> wrote:
>> I teach C programming to undergraduates, and while I do my best
>> to teach them how to use the language correctly, the recent thread
>> about undefined behavior reminds me that I never know quite what to
>> say about overflow in arithmetic on signed integers.
<snip>
>> int a, b;
>> /* code to assign values to a and b */
>> printf("%d + %d is %d\n", a, b, a + b);
<snip>
> #include <limits.h>
> int a, b;
> /* code to assign values to a and b */
> if ( (a>=0? b < INT_MAX - a : b > INT_MIN - a) )
> printf("%d + %d is %d\n", a, b, a + b);
> else printf("don't do that\n");

This prohibits a lot of valid additions. I think you want <= and >= in
the arms of the conditional.

> But since they're apparently pretty new to programming,
> it's -- as you seem to be recognizing -- ridiculous to go off
> on some long-winded detour:
> "> ...a full discussion is beyond the scope of the course"
> seems exactly right to me.

ACK!

--
Ben.

blmblm.m...@gmail.com

unread,
Aug 8, 2018, 10:15:48 PM8/8/18
to
In article <pkdp5i$3ba$1...@dont-email.me>,
Reinhardt Behm <rb...@hushmail.com> wrote:
> AT Wednesday 08 August 2018 11:10, , wrote:
>
> > I teach C programming to undergraduates, and while I do my best
> > to teach them how to use the language correctly, the recent thread
> > about undefined behavior reminds me that I never know quite what to
> > say about overflow in arithmetic on signed integers. Usually I've
> > just muttered something about how several languages (Java and Scala
> > come to mind) just give "wrong" answers, and there's not a lot one
> > can easily do about it. But if in C it's undefined behavior, hm,
> > that's (strictly speaking!) more serious, and I'm curious about what
> > one *can* do. Consider this code fragment:
> >
> > int a, b;
> > /* code to assign values to a and b */
> > printf("%d + %d is %d\n", a, b, a + b);
> >
> > I noticed in the other thread a suggestion (I think -- I may have
> > misunderstood it) that one can avoid UB in addition of signed integers
> > by first casting to unsigned integers, adding, and then casting back
> > to signed. That seems like it ought to give a result consistent with
> > the behavior of those Other Languages (that quietly wrap around),
> > and is perhaps the best one can reasonably do. Is that what a truly
> > pedantic and careful programmer would do?
> >
> > Mostly I'm curious, because I think for my students the most appropriate
> > thing to do is just to continue to say that many commonly-used programming
> > languages (not all!) don't deal very gracefully with this kind of thing
> > and that a full discussion is beyond the scope of the course. But it
> > would be nice to give some hints about how to write truly careful code.
>
> I would word it in the following way:
> What usually happens in the CPU (maybe not in all, but in most) is a silent
> wrap-around. Many languages handle it the same way.
> In C the compiler is free to do it the same way and ignore any overflow but
> it is also free to do any kind of nasty things (the nasal daemons..)
>
> Ignoring overflow and wrapping around will lead to mathematically incorrect
> results. These can lead to embarrassing output from your program, like the
> websites we have all seen showing totally nonsensical data ("Your
> subscription will end in -32767 days"). In real life such results can lead
> to catastrophic outcomes, even with people getting killed and the
> programmer going to jail.
>
> The responsible way of handling this is to always check the possible range
> of inputs and results of calculations - intermediate ones as well - to be
> prepared for such overflows, and to choose your data types accordingly to
> prevent them. In critical programs, document this so that nobody can accuse
> you and so that the next programmer modifying the software knows about it.

Agreed -- if it matters, this is the responsible thing to do. (But for
the purposes of a beginning programming class -- "pretend it won't happen"
seems more appropriate.)

One question in my mind was *how* to check for potential overflow, but
I notice that a couple of replies give methods for that (which on reading
make me wonder why I thought the problem was hard :-( ).

> [ snip ]

--
B. L. Massingill
ObDisclaimer: I don't speak for my employers; they return the favor.

blmblm.m...@gmail.com

unread,
Aug 8, 2018, 10:19:03 PM8/8/18
to
In article <VuzaD.2312612$I77.1...@fx44.am4>, Bart <b...@freeuk.com> wrote:
> On 08/08/2018 04:10, blm...@myrealbox.com wrote:
> > I teach C programming to undergraduates, and while I do my best
> > to teach them how to use the language correctly, the recent thread
> > about undefined behavior reminds me that I never know quite what to
> > say about overflow in arithmetic on signed integers. Usually I've
> > just muttered something about how several languages (Java and Scala
> > come to mind) just give "wrong" answers, and there's not a lot one
> > can easily do about it. But if in C it's undefined behavior, hm,
> > that's (strictly speaking!) more serious, and I'm curious about what
> > one *can* do. Consider this code fragment:
> >
> > int a, b;
> > /* code to assign values to a and b */
> > printf("%d + %d is %d\n", a, b, a + b);
> >
> > I noticed in the other thread a suggestion (I think -- I may have
> > misunderstood it) that one can avoid UB in addition of signed integers
> > by first casting to unsigned integers, adding, and then casting back
> > to signed. That seems like it ought to give a result consistent with
> > the behavior or those Other Languages (that quietly wrap around),
> > and is perhaps the best one can reasonably do.
>
> This is what makes it so silly. You have to obfuscate your code just to
> get to the starting point of those other languages. And do it in a
> million places (eg. everywhere you might use ++i).
>

I disagree about ++i. I don't know about your use cases, but for
me by far the most common one for ++i is as the increment part of a
"for" loop, and I'm not sure I can think of any way you could get
overflow in the simple and most-common-for-me case:

for (int i = 0; i < N; ++i) { .. }

where N is an "int".

(Arguably if i is being used as an array index the right type for it is
size_t, but leave that for now.)

> [ snip ]

(I read the rest with interest but in the interest of replying to
everyone won't comment more. I shouldn't be surprised by the number
and length of replies, but -- I was.)

> Good luck putting that across...
>

Yeah. But really, I'm inclined to think that in any difficult and
technical subject there are going to be points in a first course where
you just have to say "the details here are beyond the scope of this
course, but be aware that what we do in this course is sort of a first
approximation".

blmblm.m...@gmail.com

unread,
Aug 8, 2018, 10:19:33 PM8/8/18
to
In article <pkehfm$t60$1...@dont-email.me>,
Reinhardt Behm <rb...@hushmail.com> wrote:
> AT Wednesday 08 August 2018 17:20, Malcolm McLean wrote:
>
> > On Wednesday, August 8, 2018 at 4:48:08 AM UTC+1, Reinhardt Behm wrote:
> >>
> >> A question to the standard experts here:
> >> Arithmetic operations with int types can inevitably lead to overflow and
> >> thus UB. Which allows the compiler do to anything nasty. Can we expect
> >> that if the inputs are such that no overflow will happen, we are
> >> guaranteed the mathematically correct values and no nasal daemons, even
> >> if the compiler has no chance to deduce that no OV will happen.
> >>
> > Take this code.
> >
> > int main(int argc, char **argv)
> > {
> > printf("%d\n", atoi(argv[1]) + atoi(argv[2]));
> > }
> >
> > The compiler has no way of detecting overflow at compile time. However
> > if the inputs are within range, it must act as you would expect.
> > Otherwise practically any calculation would be UB because it's generally
> > too hard to reason about values set in distant parts of the program,
> > even if the programmer knows that they must be small.
> >
> > However if the results overflow, output can be anything. Generally it
> > will be a wrap, but a high quality implementation might terminate with
> > an error message, because, as you say, wrong results are often worse
> > than no results at all.
>
> Thank you and David for this clarification. I am no language lawyer, but
> sometimes one can get the impression in this group that everything is UB.
>

I wondered about that too, so I'm glad you asked and that someone answered.

blmblm.m...@gmail.com

unread,
Aug 8, 2018, 10:20:23 PM8/8/18
to
In article <pkeij0$bi0$1...@dont-email.me>,
Oh. Duh. I should have thought to check that. Instead I did a couple
of experiments .... And I *KNOW* (and try to teach my students!) that
that isn't a foolproof way of finding out something about C!! So thanks.


> I've heard it claimed that there is no C compiler, anywhere, that takes
> the "raise a signal" option. I'm not sure how anyone could be
> justifiably certain about that (unjustified certainty, on the other
> hand, is easy to come by). But it is the case that most implementations
> choose "implementation-defined result", and most of them define the
> result as 2's complement wrap-around, in precisely the same fashion as
> the other languages you're talking about.
>
> The right way to deal with signed integer overflow portably is to
> prevent it from happening. This can be trivial if the numbers you're
> working with have a limited range: if you use a week numbering
> convention that guarantees that week_number is always positive and less
> than 53, then you can calculate 7*week_number without having to worry
> about overflow. However, avoiding overflow can be annoyingly difficult
> if you need to write code that works with arbitrary values.
>

blmblm.m...@gmail.com

unread,
Aug 8, 2018, 10:21:02 PM8/8/18
to
In article <overflow-20...@ram.dialup.fu-berlin.de>,
Stefan Ram <r...@zedat.fu-berlin.de> wrote:
> James Kuyper <james...@alumni.caltech.edu> writes:
> > However, avoiding overflow can be annoyingly difficult
> >if you need to write code that works with arbitrary values.
>
> Just because I cannot resist to repost my old code ...
>
> The following program will read two integral numbers and
> then print their sum. The longest function is dedicated
> to reading a number:
>
> #include <stdio.h>
> #include <stdlib.h>
> #include <limits.h>
> #include <ctype.h>
>
> int addintintoverflow( int const i, int const j )
> { return
> i > 0 && j > 0 && i >( INT_MAX - j )||
> i < 0 && j < 0 && i <( INT_MIN - j ); }
>
> int mulintintoverflow( int const i, int const j )
> { if( i > 0 )
> { if( j > 0 ){ if( i > INT_MAX / j )return 1; }
> else { if( j < INT_MIN / i )return 2; }}
> else
> { if( j > 0 ){ if ( i < INT_MIN / j )return 3; }
> else if ( i != 0 && j < INT_MAX / i )return 4; }
> return 0; }


OHHHH. Duh. Why did I think it would be difficult if even possible
to detect overflow? :-( Anyway thanks.

>
> int add( int const i, int const j )
> { if( addintintoverflow( i, j ))
> return fprintf( stderr, "overflow.\n" ), EXIT_FAILURE;
> return printf( "%d\n", i+j ) > 0 ? EXIT_SUCCESS : EXIT_FAILURE; }
>
> int read( int * p )
> { int ch;
> while( ch = getchar(), isspace( ch ));
> if( ch < 0 )return 0;
> if( ch == '-' )
> fprintf( stderr, "minus sign not implemented yet.\n" ), exit( 99 );
> int value;
> if( !isdigit(( unsigned char )ch ))return 0; else
> { value = ch - '0';
> int ch; while( ch = getchar(), isdigit(( unsigned char )ch ))
> { if( mulintintoverflow( value, 10 ))return 0;
> value = value * 10;
> int const v = ch - '0';
> if( addintintoverflow( value, v ))return 0;
> value = value + v;
> continue; }}
> *p = value; return 1; }
>
> int main( void )
> { int i, j; if( read( &i )&& read( &j ))return add( i, j );
> fprintf( stderr, "input error.\n" ); return EXIT_FAILURE; }

blmblm.m...@gmail.com

unread,
Aug 8, 2018, 10:25:57 PM8/8/18
to
In article <C-201808...@ram.dialup.fu-berlin.de>,
Stefan Ram <r...@zedat.fu-berlin.de> wrote:
> blm...@myrealbox.com <blmblm.m...@gmail.com> writes:
> >Mostly I'm curious, because I think for my students the most appropriate
> >thing to do is just to continue to say that many commonly-used programming
> >languages (not all!) don't deal very gracefully with this kind of thing
> >and that a full discussion is beyond the scope of the course. But it
> >would be nice to give some hints about how to write truly careful code.
>
> C is a language for experts. It takes a lot of learning
> (time) to become an expert. Most students will not reach
> this level. (Especially when they do not study computer
> science but, say, mechanical engineering.)
>
> Why - of all the programming languages - does it have to be
> C which these poor folks have to learn?
>

Well. That's a good question, and I'm happy to answer. This will get
long ....

I teach (at a 4-year US college) two courses using C:

One is a one-credit-hour course in C programming required as part of
our BS/CS degree program. Our department thinks it's important that
people getting this degree have some exposure to straight C, *because*
it's fairly low-level. (For the curious, we teach our beginning
courses for majors using Scala and some of the later ones using C++.)
Students in this course know how to write programs in *some* language
but are mostly not yet expert programmers. I try to really emphasize
with this group that C is full of traps for the unwary and suggest
ways to avoid them, but sometimes "beyond the scope of this course"
seems like the right approach. It's also not surprising for one
of these students to want to know more, so I like to feel confident
enough about my own knowledge to say what the "beyond" might be.

The other is a beginning programming course for engineering majors.
Up until a few years ago, these students took our intro course for
CS majors, but when we switched that to Scala, well, ENGR was not
happy. I wasn't involved in the ensuing discussion, but what was
reported to me was that they insisted on a language "with pointers",
which to us means C++ or C, and we all pretty much agreed that for
beginners C++ is just way too big and complex. I personally think
Python might be a better choice for this group -- they might actually
use it, and you can write much more interesting programs in it --
but "they" want "pointers", and we don't want to teach C++ as a first
programming language, so C it is.

I tell both groups that as far as I know there *is* still a small
market for C programmers. Some of it's operating-systems stuff, but
my understanding is that there are also some embedded systems that
are programmed in C, and that's a niche market where an engineer
might end up. And some of our industry contacts (the ones who work
in security) say they want people who know some C (not sure why).

For the CS majors I add that we think exposure to programming at this
level is useful in giving them the broad conceptual understanding of
the field that the degree is supposed to represent, even if they never
write another C program.

I also tell both groups that only a C fanatic would use C for general
application programs; there are plenty of other choices that are
more suitable for that. I add, for the engineers, that the first
programming language is the hardest; learning a second will be
much easier.

> [ snip ]

blmblm.m...@gmail.com

unread,
Aug 8, 2018, 10:26:55 PM8/8/18
to
In article <87zhxw3...@bsb.me.uk>,
Ben Bacarisse <ben.u...@bsb.me.uk> wrote:
> John Forkosh <for...@panix.com> writes:
>
> > blm...@myrealbox.com <blmblm.m...@gmail.com> wrote:
> >> I teach C programming to undergraduates, and while I do my best
> >> to teach them how to use the language correctly, the recent thread
> >> about undefined behavior reminds me that I never know quite what to
> >> say about overflow in arithmetic on signed integers.
> <snip>
> >> int a, b;
> >> /* code to assign values to a and b */
> >> printf("%d + %d is %d\n", a, b, a + b);
> <snip>
> > #include <limits.h>
> > int a, b;
> > /* code to assign values to a and b */
> > if ( (a>=0? b < INT_MAX - a : b > INT_MIN - a) )
> > printf("%d + %d is %d\n", a, b, a + b);
> > else printf("don't do that\n");
>
> This prohibits a lot of valid additions. I think you want <= and >= in
> the arms of the conditional.


OHHHH. Duh. Why did I think it would be difficult if even possible
to detect overflow? :-( Anyway thanks to both of you.


> > But since they're apparently pretty new to programming,
> > it's -- as you seem to be recognizing -- ridiculous to go off
> > on some long-winded detour:
> > "> ...a full discussion is beyond the scope of the course"
> > seems exactly right to me.
>
> ACK!
>

Glad to hear agreement. But it's nice to have something in reserve
for the occasional student who wants to know more -- "beyond the scope,
but the short version of a better answer is .... ".

james...@alumni.caltech.edu

unread,
Aug 8, 2018, 10:43:31 PM8/8/18
to
On Wednesday, August 8, 2018 at 10:19:03 PM UTC-4, blm...@myrealbox.com wrote:
...
> I disagree about ++i. I don't know about your use cases, but for
> me by far the most common one for ++i is as the increment part of a
> "for" loop, and I'm not sure I can think of any way you could get
> overflow in the simple and most-common-for-me case:
>
> for (int i = 0; i < N; ++i) { .. }
>
> where N is an "int".

That's easy - in fact, it was discussed here just recently:

for(int i = INT_MIN; i<=INT_MAX; i++) { ... }

Obviously, that loop won't work as desired, no matter how int
overflow is handled. The question under discussion was how to fix it.

Richard Damon

unread,
Aug 8, 2018, 10:57:37 PM8/8/18
to
Not quite. Many forms of UB could fairly easily be caught at run time,
but with a cost, and due to the design of C, the standard doesn't want
to impose that cost, so it doesn't require the check and leaves it as UB.

Most of the decisions in the language design make a lot of sense when
you think of the original purpose of the language.

It was intended as a useful tool in the hands of the expert. Adding a
safety net for the less experienced wouldn't generally be done if that
net got in the way of the expert. This means the compiler will normally
'trust' the programmer, and not 'complain' about code that might be
questionable. Questionable code might be pointed out by a separate
analysis tool such as lint.

The language was designed so that most things have the possibility of
being done reasonably efficiently on most machines, but implementations
are not required to be efficient. This means that the breadth of
behavior of the early machines around when the language was being
defined limited what the language would define. Signed overflow is one
such condition, so rather than trying to define what it should do and
impose a cost on all implementation that behaved differently, it was
left undefined. Unsigned arithmetic didn't leave this undefined because
there are cases where you want to be able to handle the overflow, and
unsigned arithmetic is relatively rare, so the added cost isn't normally
in effect.

A big purpose of the language was to be able to write reasonably
portable code that was reasonably efficient on many machines.
Portability didn't need to be perfect, but if you stayed within the
bounds of defined behavior things would generally work,

It was also intended that implementations for a given machine could (but
didn't have to) define that natural implementation behavior was
preserved (like signed overflow wrapping) so that programs that didn't
need to be portable could gain addition efficiencies.

Because the language was defined to allow porting system programs to new
machines, the C language was defined so that a very simple (and dumb)
implementation was possible. Once you have this going, you could then
port over the faster and more efficient implementations. A C compiler
can be built as a one-pass operation, possibly broken up into a number
of one-pass phases, never needing to go back to reprocess previous results
(the final linking step will need to be able to go back and patch
previously generated output).

blmblm.m...@gmail.com

unread,
Aug 8, 2018, 11:41:47 PM8/8/18
to
In article <6b247311-f0f4-4dd0...@googlegroups.com>,
True. But that's not quite a kind of loop I typically write, and the
loop I wrote doesn't have that problem, even if N is INT_MAX, does it?

(I admit that I halfway expect a reply that produces another head-slap
moment. :-)? )

Reinhardt Behm

unread,
Aug 9, 2018, 1:37:38 AM8/9/18
to
Full ACK.

As the programmer I want to have control over when and how checks are
done. I don't want to have any checking introduced by the compiler that
is not needed, because I know it (e.g. overflow) cannot happen.
Even if there were some kind of error handling, how would I make sure it
is correct and gets tested?
In many embedded systems there is nothing the compiler/run-time library
could usefully do about such a "problem" anyway.
There is no display device for an error message, and the system is
located somewhere nobody could look at a display.
Even a simple program abort is prohibitive. You don't want the autopilot
just shutting itself off when airborne.

Usually the complicated tests for possible overflow covering all corner
cases shown in this thread are not needed. We don't process arbitrary values
as in academic examples; our variables will never be INT_MAX.
When my input comes from a 10 bit ADC I know the values lie within 0..1023.
If I scale that by 25 / 7, I _know_ the intermediate product (adc*25) fits
into a 16 bit int.
So I don't need and don't want any checks created by the compiler.
And this knowledge is documented with comments and in the requirements. So
even if later somebody wants to use this module in an application with a 12
bit ADC where an overflow would happen, the constraints and assumptions can
easily be found. We do not want the Ariane V thing to happen again.

--
Reinhardt

Tim Rentsch

unread,
Aug 9, 2018, 2:24:25 AM8/9/18
to
blm...@myrealbox.com <blmblm.m...@gmail.com> writes:

> I teach C programming to undergraduates, and while I do my best to
> teach them how to use the language correctly, the recent thread
> about undefined behavior reminds me that I never know quite what to
> say about overflow in arithmetic on signed integers. Usually I've
> just muttered something about how several languages (Java and Scala
> come to mind) just give "wrong" answers, and there's not a lot one
> can easily do about it. But if in C it's undefined behavior, hm,
> that's (strictly speaking!) more serious, and I'm curious about what
> one *can* do. Consider this code fragment:
>
> int a, b;
> /* code to assign values to a and b */
> printf("%d + %d is %d\n", a, b, a + b);
>
> I noticed in the other thread a suggestion (I think -- I may have
> misunderstood it) that one can avoid UB in addition of signed
> integers by first casting to unsigned integers, adding, and then
> casting back to signed. That seems like it ought to give a result
> consistent with the behavior or those Other Languages (that quietly
> wrap around), and is perhaps the best one can reasonably do. Is
> that what a truly pedantic and careful programmer would do?
>
> Mostly I'm curious, because I think for my students the most
> appropriate thing to do is just to continue to say that many
> commonly-used programming languages (not all!) don't deal very
> gracefully with this kind of thing and that a full discussion is
> beyond the scope of the course. But it would be nice to give some
> hints about how to write truly careful code.

There are a couple of different issues here, and it's important
to distinguish and prioritize them.

For programming in C, it's important to understand the notion of
undefined behavior. This topic deserves treatment on its own, not
as a footnote or parenthetical explanation of something else (ie,
such as integer arithmetic in this case). And that explanation
should come before other topics, like arithmetic operations, where
undefined behavior is part of the definition (or lack thereof) of
the operations involved.

This explanation needn't go into all the gory details. You might
say that some programming languages are "safe" (or perhaps mostly
safe), in that if a program does something wrong what happens is
still fairly well delineated. C isn't like that: in C some
operations are "unsafe", and the only thing that can be relied on
is that these cases should not be relied on. Follow with a couple
of examples in each of the two categories, safe and unsafe. After
(and only after) introducing the concept of undefined behavior
should the class then get on to signed arithmetic and how to deal
with it.

As for how to steer clear of the danger zones, there are different
ways and different ideas for how one should or might do that. One
way is to stick to common patterns, like the example you give in
a later posting

for( i = something; i < N; i++ ){ ... }

If N has the same type that i does, there will never be a problem
with overflow or undefined behavior (of course, not counting
cases where 'i' might be assigned inside the loop body).

Another way that some people favor is to prefer using unsigned
types to using signed types. I find this technique very useful,
especially for variables used for indexing. Of course, using
unsigned types has its own set of gotchas, and those must be
guarded against, but usually the consequences are more benign
than direct undefined behavior.

Another idea is to use a type with a large range, like long long.
Doing this doesn't eliminate the problem of undefined behavior,
but it does greatly reduce it. In practice just using long long
may give better ROI, safety-wise, than writing very careful code
but with shorter types.

I'm sure there are other practices that could go on this list.
There isn't any fixed set of rules one can follow that always
guarantees a good result, partly because the problem is
multi-dimensional. For example, code that is safer might also run
slower, and in some cases speed considerations dominate. For
purposes of your class, I think the key point to emphasize is that
one should be conscious of the potential dangers, and be aware of
when code is starting to get outside of one's personal safety
envelope. Always staying inside is okay. Going outside but
always being very careful is okay. What isn't okay is wandering
back and forth across the edge without realizing it and without
taking any extra precautions.

So, for whatever they may be worth, those are my suggestions.

Tim Rentsch

unread,
Aug 9, 2018, 2:28:16 AM8/9/18
to
>> On Wednesday, August 8, blm...@myrealbox.com wrote:
>> ...
>>
>>> I disagree about ++i. I don't know about your use cases, but for
>>> me by far the most common one for ++i is as the increment part of a
>>> "for" loop, and I'm not sure I can think of any way you could get
>>> overflow in the simple and most-common-for-me case:
>>>
>>> for (int i = 0; i < N; ++i) { .. }
>>>
>>> where N is an "int".
>>
>> That's easy - in fact, it was discussed here just recently:
>>
>> for(int i = INT_MIN; i<=INT_MAX; i++) { ... }
>>
>> Obviously, that loop won't work as desired, no matter how int
>> overflow is handled. The question under discussion was how to fix it.
>
> True. But that's not quite a kind of loop I typically write, and the
> loop I wrote doesn't have that problem, even if N is INT_MAX, does it?
>
> (I admit that I halfway expect a reply that produces another head-slap
> moment. :-)? )

FWIW I think your comments here are right on the money. IMO
if anyone merits a head-slap it is James, for giving an
example that is not on-point for your posted use-pattern.

Tim Rentsch

unread,
Aug 9, 2018, 2:34:14 AM8/9/18
to
James Kuyper <james...@alumni.caltech.edu> writes:

> On 08/07/2018 11:10 PM, blm...@myrealbox.com wrote:
>
>> I teach C programming to undergraduates, and while I do my best
>> to teach them how to use the language correctly, the recent thread
>> about undefined behavior reminds me that I never know quite what to
>> say about overflow in arithmetic on signed integers. Usually I've
>> just muttered something about how several languages (Java and Scala
>> come to mind) just give "wrong" answers, and there's not a lot one
>> can easily do about it. But if in C it's undefined behavior, hm,
>> that's (strictly speaking!) more serious, and I'm curious about what
>> one *can* do. Consider this code fragment:
>>
>> int a, b;
>> /* code to assign values to a and b */
>> printf("%d + %d is %d\n", a, b, a + b);
>>
>> I noticed in the other thread a suggestion (I think -- I may have
>> misunderstood it) that one can avoid UB in addition of signed integers
>> by first casting to unsigned integers, adding, and then casting back
>> to signed. That seems like it ought to give a result consistent with
>> the behavior or those Other Languages (that quietly wrap around),
>> and is perhaps the best one can reasonably do. Is that what a truly
>> pedantic and careful programmer would do?
>
> No.
>
> The problem is that any bit pattern that represents a negative value in
> the signed type will represent a positive value in the corresponding
> unsigned type that is too big to be represented in the signed type.

It can be. But it doesn't have to be.

Tim Rentsch

unread,
Aug 9, 2018, 2:37:52 AM8/9/18
to
Keith Thompson <ks...@mib.org> writes:

> r...@zedat.fu-berlin.de (Stefan Ram) writes:
>
>> James Kuyper <james...@alumni.caltech.edu> writes:
>>
>>> The problem is that any bit pattern that represents a negative value in
>>> the signed type will represent a positive value in the corresponding
>>> unsigned type that is too big to be represented in the signed type.
>>
>> So, does this mean the old trick to check whether an
>> int-value is between 0 and and a positive int value
>> (3 is used in the example below) with just one comparison
>> really is portable in C and correct for all possible values
>> of the argument (and all possible int values of the end of
>> the range)?
>>
>> int is_in_range( int const n ){ return( unsigned )n < 3; }
>
> Conversion from int to unsigned int is well defined (and is not
> defined in terms of bit representations), so that should work
> as intended.

On implementations where UINT_MAX > INT_MAX. Others, not
so much.

John Forkosh

unread,
Aug 9, 2018, 2:38:47 AM8/9/18
to
I think Reinhardt Behm already gave you a pretty good answer for that,
which I believe you agreed with, and that I'd also agree with...
o choose your datatypes appropriately and check your inputs
(i.e., checking for a+b overflow after some long calculation for
a and b is "closing the stable door after the horse has bolted")
So then you additionally asked
o assume that won't happen for beginning programmers, so then what?
Yeah, but then for beginning programmers you're back to your own remark,
with which we all agreed,
o "a full discussion is beyond the scope of the course"
And so now, for "the occasional student who wants to know more", you can
presumably give that more capable student Behm's advice, and for this
more capable student
o "assume that it will happen"
Or something like that.
--
John Forkosh ( mailto: j...@f.com where j=john and f=forkosh )

David Brown

unread,
Aug 9, 2018, 4:23:26 AM8/9/18
to
Just to be clear, the "costs" here include run-time checking costs,
missed optimisation opportunities, and (usually less relevant today)
awkward code generation on some targets.

And also many forms of UB could /not/ easily be caught at run time, such
as most pointer issues.

Some other forms of UB could be caught at compile time, or link time
(some compilers do better at this than others).

>> Most of the decisions in the language design make a lot of sense when
>> you think of the original purpose of the language.
>>
>> It was intended as a useful tool in the hands of the expert. Adding a
>> safety net for the less experienced wouldn't generally be done if that
>> net got in the way of the expert. This means the compiler will normally
>> 'trust' the programmer, and not 'complain' of code that might be
>> questionable. Questionable code might be pointed out be a separate
>> analysis with something like lint.

These days, good compilers do quite a bit of the work that used to
require separate programs like "lint". But there are many programs
around for doing extra static analysis - some general, some specialised.
And there are compiler/library combinations for doing a fair amount of
run-time UB detection for those that want to use them.

But for those that don't want them - or just turn them off - UB and the
"trust the programmer" attitude are key to C's efficiency.

>>
>> The language was designed so that most things have the possibility of
>> being done reasonably efficiently on most machines, but implementations
>> are not required to be efficient. This means that the breadth of
>> behavior of the early machines around when the language was being
>> defined limited what the language would define. Signed overflow is one
>> such condition, so rather than trying to define what it should do and
>> impose a cost on all implementation that behaved differently, it was
>> left undefined. Unsigned arithmetic didn't leave this undefined because
>> there are cases where you want to be able to handle the overflow, and
>> unsigned arithmetic is relatively rare, so the added cost isn't normally
>> in effect.

It is not right to say that signed overflow is UB to avoid extra costs
on some odd implementations. That is one of the reasons, yes, but it is
not the only one. Others include there being no logical, rational
choice for the behaviour, it allows a range of optimisations, and by
making signed overflow a mistake rather than defined behaviour, it is
far easier for compiler warnings, linters and other analysers to point
out the error.
Exactly, yes. This is something often missed by people who think UB
should be detected when possible. Equally, you don't want your
autopilot's altitude to roll over from 32,767 feet to -32,768 feet as
you climb - defining signed integer overflow would not be at all helpful
in most real-world cases.

And for those not working on embedded systems - you don't want your
database server or your sound card driver throwing up error messages and
asking if it is OK to restart.

>
> Usually the complicated tests for possible overflow covering all corner
> cases shown in this thread are not needed. We don't process arbitrary values
> as in academic examples, our variables will never be INT_MAX.

And usually if you /have/ big values, you can just use bigger types.
Programming in C has got easier over the years!

> When my input comes from a 10 bit ADC I know the values lie within 0..1023.
> If I scale that by 25 / 7, I _know_ the intermediate product (adc*25) fits
> into a 16 bit int.
> So I don't need and don't want any checks created by the compiler.
> And this knowledge is documented with comments and in the requirements. So
> even if later somebody wants to use this module in an application with a 12
> bit ADC where an overflow would happen, the constraints and assumptions can
> easily be found. We do not want the Ariane V thing to happen again.
>

I agree with you up to a point - but not the "document with comments"
part. Document it with static assertions or other compile-time checks
whenever possible :

#define ADC_MAX 1023
#define SCALE_TOP 25
#define SCALE_BOTTOM 7

static_assert((ADC_MAX * SCALE_TOP < INT_MAX),
              "Check the types for adc scaling");

This is /far/ safer than:

// Make sure ADC_MAX * SCALE_TOP is within the range of int



David Brown

unread,
Aug 9, 2018, 4:36:44 AM8/9/18
to
It /always/ matters, so you should /always/ check your ranges. That
does not mean you always need to add run-time checks to the code - it
means using appropriate checks for the task in hand. That might mean
simply knowing the ranges in question. It might mean comments or
documentation. It might mean notes in the answer to the homework
question. It might mean careful choices and naming of types or
functions to keep things clear. It might be entirely obvious due to the
nature of the task. There is a huge difference in the effort needed
here for a quick test program and a rocket guidance system. But you
should /always/ think about it and be sure your code is safe enough for
the task. And that should, IMHO, be drilled into the students from the
start of their first class - it should not be an afterthought!

>
> One question in my mind was *how* to check for potential overflow, but
> I notice that a couple of replies give methods for that (which on reading
> make me wonder why I thought the problem was hard :-( ).
>

General checking for potential overflow is often hard to do well - it's
easy to get things slightly wrong, or end up with ugly and messy code.
It is not often a good idea to try to do the kind of "is it safe to add
these two int's" checks - it is error prone coding and usually not the
logical thing to do. Instead, look directly at the parameters for the
function being called - check those for sensible values. Then make sure
you are using types that are big enough for the purpose - fixed size
types in <stdint.h> are often a very easy way to get that.


David Brown

unread,
Aug 9, 2018, 4:45:06 AM8/9/18
to
On 09/08/18 05:41, blm...@myrealbox.com wrote:
> In article <6b247311-f0f4-4dd0...@googlegroups.com>,
> <james...@alumni.caltech.edu> wrote:
>> On Wednesday, August 8, 2018 at 10:19:03 PM UTC-4, blm...@myrealbox.com wrote:
>> ...
>>> I disagree about ++i. I don't know about your use cases, but for
>>> me by far the most common one for ++i is as the increment part of a
>>> "for" loop, and I'm not sure I can think of any way you could get
>>> overflow in the simple and most-common-for-me case:
>>>
>>> for (int i = 0; i < N; ++i) { .. }
>>>
>>> where N is an "int".
>>
>> That's easy - in fact, it was discussed here just recently:
>>
>> for(int i = INT_MIN; i<=INT_MAX; i++) { ... }
>>
>> Obviously, that loop won't work as desired, no matter how int
>> overflow is handled. The question under discussion was how to fix it.
>
> True. But that's not quite a kind of loop I typically write, and the
> loop I wrote doesn't have that problem, even if N is INT_MAX, does it?
>

Yes, but it is worth knowing (and showing your students) some examples
of what /not/ to write, and why they are wrong. Learning to write good
code also involves learning not to write bad code.

Since loops like that example are rarely useful, there is no need to
jump through hoops to try to get "elegant" alternatives - such
discussions are fine for a c.l.c. thread, but not for a C programming
course.

Reinhardt Behm

unread,
Aug 9, 2018, 5:00:20 AM8/9/18
to
AT Thursday 09 August 2018 16:23, David Brown wrote:

>> When my input comes from a 10 bit ADC I know the values lie within
>> 0..1023. If I scale that by 25 / 7, I know the intermediate product
>> (adc*25) fits into a 16 bit int.
>> So I don't need and don't want any checks created by the compiler.
>> And this knowledge is documented with comments and in the requirements.
>> So even if later somebody wants to use this module in an application with
>> a 12 bit ADC where an overflow would happen, the constraints and
>> assumptions can easily be found. We do not want the Ariane V thing to
>> happen again.
>>
>
> I agree with you up to a point - but not the "document with comments"
> part. Document it with static assertions or other compile-time checks
> whenever possible :
>
> #define ADC_MAX 1023
> #define SCALE_TOP 25
> #define SCALE_BOTTOM 7
>
> static_assert((ADC_MAX * SCALE_TOP < INT_MAX),
>               "Check the types for adc scaling");
>
> This is far safer than:
>
> // Make sure ADC_MAX * SCALE_TOP is within the range of int

ACK
Just another question:
How far can you trust the compiler with the expression in your
static_assert? Will it use large enough types internally?

--
Reinhardt

David Brown

unread,
Aug 9, 2018, 5:24:39 AM8/9/18
to
Don't try to detect overflow - try to /avoid/ overflow. If you are
using appropriate values, appropriate operations, and appropriate types,
then there will be no overflow. If that is not the case, then fix the
problem rather than trying to detect overflow.

>>
>> int add( int const i, int const j )
>> { if( addintintoverflow( i, j ))
>> return fprintf( stderr, "overflow.\n" ), EXIT_FAILURE;
>> return printf( "%d\n", i+j ) > 0 ? EXIT_SUCCESS : EXIT_FAILURE; }
>>

Whatever you teach your students, I hope you don't teach them to write a
function called "add" with two "int" parameters and an "int" return type
that returns the length of a string printed out!

And I /really/ hope you don't teach them to follow the incredible
formatting style shown in these examples.

>> int read( int * p )
>> { int ch;
>> while( ch = getchar(), isspace( ch ));
>> if( ch < 0 )return 0;
>> if( ch == '-' )
>> fprintf( stderr, "minus sign not implemented yet.\n" ), exit( 99 );
>> int value;
>> if( !isdigit(( unsigned char )ch ))return 0; else
>> { value = ch - '0';
>> int ch; while( ch = getchar(), isdigit(( unsigned char )ch ))
>> { if( mulintintoverflow( value, 10 ))return 0;
>> value = value * 10;
>> int const v = ch - '0';
>> if( addintintoverflow( value, v ))return 0;
>> value = value + v;
>> continue; }}
>> *p = value; return 1; }
>>

I also hope you would never recommend a mess like that for an algorithm
for reading an integer. Try this:

bool readIntValidated(int * p)
{
    int ch;

    // Skip white space
    do {
        ch = getchar();
        if (ch < 0) return false;
    } while (isspace(ch));

    // Handle minus sign
    bool minus = false;
    if (ch == '-') {
        minus = true;
        ch = getchar();
        if (ch < 0) return false;
    }

    // If no digits are found at all, return false
    if (!isdigit(ch)) return false;

    int value = 0;
    while (true) {
        if (isdigit(ch)) {
            const int v = ch - '0';
            if (minus) {
                if (value < INT_MIN / 10) return false;
                value *= 10;
                if (value < (INT_MIN + v)) return false;
                value -= v;
            } else {
                if (value > INT_MAX / 10) return false;
                value *= 10;
                if (value > (INT_MAX - v)) return false;
                value += v;
            }
        } else {
            // Non-digit ends input
            *p = value;
            return true;
        }
        ch = getchar();
        if (ch < 0) {
            // End of input after at least one digit is a valid number
            *p = value;
            return true;
        }
    }
}

(The code is not tested, and may have bugs.)

Some key points here are:

1. Don't use complicated general check functions unnecessarily. Check
what you want to know.

2. Don't obsess about DRY ("don't repeat yourself"). If the natural
flow of the code involves calling "getchar()" or "isdigit" in several
places, that's fine - don't use convoluted loop structures just to try
to avoid them. The compiler will handle the efficiency for you.

3. Use brackets and indentation to make the structure clear.


An alternative here would be to use "long long int value = 0;" and then
you can do the arithmetic safely and test afterwards:

    if (minus) {
        value = (value * 10) - v;
        if (value < INT_MIN) return false;
    } else {
        value = (value * 10) + v;
        if (value > INT_MAX) return false;
    }

(Theoretically, "long long int" does not have to be larger than "int".
In practice, it is.)

David Brown

unread,
Aug 9, 2018, 5:45:41 AM8/9/18
to
The first parameter of static_assert (or _Static_assert) is an "integer
constant expression". It can use any integer types supported by the
compiler. You can trust it further than you can trust me - to be safe,
you may want the static_assert here to be:

static_assert(((long long int) ADC_MAX * SCALE_TOP < INT_MAX),
              "Check the types for adc scaling");

Otherwise since ADC_MAX and SCALE_TOP are both constants of type "int",
the multiplication will be done as "int" - and overflow if they are too
big. (Your compiler will probably warn you about that.)


David Brown

unread,
Aug 9, 2018, 5:55:39 AM8/9/18
to
I agree that Python is likely to be a better choice here. It is good
for engineering and maths.

>
> I tell both groups that as far as I know there *is* still a small
> market for C programmers. Some of it's operating-systems stuff, but
> my understanding is that there are also some embedded systems that
> are programmed in C, and that's a niche market where an engineer
> might end up. And some of our industry contacts (the ones who work
> in security) say they want people who know some C (not sure why).

There is still a very big market for C programming. A fair amount of
what is done in C could probably be better done in other languages, but
inertia is high - and there are some people that insist on C everywhere,
regardless of how appropriate it is. You are right about its use in OS
and low-level stuff (drivers, etc.), and for embedded work (where C is
dominant - and there is a /lot/ of embedded programming going on). But it
is also highly important in a few other areas. Code that needs to be
very efficient, especially in library code, is typically done in C.
Code that needs to be accessible from many languages (again, often in
libraries) is also often written in C - it is the language of choice for
interfaces due to the simplicity of its data types and function calling
conventions. Code for the libraries and run-time systems for other
languages is often written in C for speed and portability, and
extensions for other languages are generally written in C. (The reason
Python is a good choice for engineering is the existence of libraries
like numpy and scipy, which give a nice Python interface while having
fast C routines underneath.)



>
> For the CS majors I add that we think exposure to programming at this
> level is useful in giving them the broad conceptual understanding of
> the field that the degree is supposed to represent, even if they never
> write another C program.
>
> I also tell both groups that only a C fanatic would use C for general
> application programs; there are plenty of other choices that are
> more suitable for that.

Agreed - but there are a lot of C fanatics around...

John Forkosh

unread,
Aug 9, 2018, 6:44:37 AM8/9/18
to
blm...@myrealbox.com <blmblm.m...@gmail.com> wrote:
> The other is a beginning programming course for engineering majors.
> Up until a few years ago, these students took our intro course for
> CS majors, but when we switched that to Scala, well, ENGR was not
> happy. I wasn't involved in the ensuing discussion, but what was
> reported to me was that they insisted on a language "with pointers",
> which to us means C++ or C


" ...they[ENGR] insisted on a language "with pointers" " ???

Why??? If this insistence ultimately dictated your choice of
language to teach, and it wasn't the language you'd have otherwise
chosen, then maybe the answer to that "why?" is pretty significant.

I've done >>lots<< of physics/math programming, though that was
many years ago, and primarily in Fortran, which at that time
had no pointers whatsoever (and even now I personally find its
pointer syntax pretty ugly). But no problem, because no pointers
were ever needed for anything we were ever doing. Moreover, years
later on, I programmed (and still program) lots of math-heavy
"financial engineering" stuff in C, and again never needed (nor need)
any pointers for anything.

It makes zero sense to me that your engineering dept's #1 big ask is
for pointers. That's what pops into their minds before anything else???
I know you say, "I wasn't involved in the ensuing discussion",
but if the outcome of that discussion has had a significant negative
effect on your subsequent teaching, then I'd suggest you re-visit
the issue and get some concrete reasons why they want/need pointers.
I'd wager you a couple of "free beers" they ain't gonna have any
reasons that hold up under legitimate scrutiny.

Bart

unread,
Aug 9, 2018, 6:58:55 AM8/9/18
to
On 09/08/2018 03:18, blm...@myrealbox.com wrote:
> In article <VuzaD.2312612$I77.1...@fx44.am4>, Bart <b...@freeuk.com> wrote:

>> This is what makes it so silly. You have to obfuscate your code just to
>> get to the starting point of those other languages. And do it in a
>> million places (eg. everywhere you might use ++i).
>>
>
> I disagree about ++i. I don't know about your use cases, but for
> me by far the most common one for ++i is as the increment part of a
> "for" loop, and I'm not sure I can think of any way you could get
> overflow in the simple and most-common-for-me case:
>
> for (int i = 0; i < N; ++i) { .. }
>
> where N is an "int".

Checking ++i would be an extreme case. But it can happen if i is
modified inside the loop. Or there is perhaps a conditional --i in the
loop (so it does many more iterations) and there is also ++j. Or you
don't know what N is (result of a complex expression perhaps).

Or, when the for-loop header is much more complicated than this. Because
it allows it, it seems to encourage some people to cram so much into the
loop header, you can't even be certain which if any is the 'loop index'.

Or even you write a loop with an unknown start index:

for(; (c=(*fmt))!=0; ++fmt){

This is an actual example where fmt is a function parameter. In this
case it's a pointer, but it could equally have been an int.
>
> (Arguably if i is being used an array index the right type for it is
> size_t, but leave that for now.)
>
>> [ snip ]
>
> (I read the rest with interest but in the interest of replying to
> everyone won't comment more. I shouldn't be surprised by the number
> and length of replies, but -- I was.)

Overflow and UB is one of those topics...

>> Good luck putting that across...
>>
>
> Yeah. But really, I'm inclined to think that in any course in a
> difficult and technical subject there are going to be points in a
> first course where you just have to say "the details here are beyond
> the scope of this course, but be aware that what we do in this course
> is sort of a first approximation".

I can't remember that being covered at all in my first programming
course (or any subsequent ones, for that matter). It's enough to know that
numbers in such languages have limited range, and being binary have
funny-looking limits when expressed as decimal.

But I think it could be useful in such courses to use a friendly
implementation that traps on overflows and makes other runtime checks.

--
bart

Ben Bacarisse

unread,
Aug 9, 2018, 7:39:32 AM8/9/18
to
This bit confused me. The evaluation of an integer constant expression
will use the same types as any other, so for a 16-bit int (the case in
question) ADC_MAX * SCALE_TOP might overflow. The compiler might tell
you about this, but it also might simply generate a negative number that
will be less than INT_MAX.

> - to be safe,
> you may want the static_assert here to be:
>
> static_assert(((long long int) ADC_MAX * SCALE_TOP < INT_MAX),
> "Check the types for adc scaling");

Obviously it's likely that for small systems this will be safe, but the
general advice should be different -- you can't safely test for possible
overflow using the very expression that might overflow and C permits
int, long and long long to be the same size, so there is the same danger
here, at least theoretically, as in the original.

It's better to test that ADC_MAX is no greater than INT_MAX / SCALE_TOP.
You may never be scaling 64-bit ADC values, but it's better to keep to a
general rule than to work out what's likely to be safe from case to
case. (I'd also check for typos by adding in ADC_MAX > 0 && SCALE_TOP >
0.)

> Otherwise since ADC_MAX and SCALE_TOP are both constants of type "int",
> the multiplication will be done as "int" - and overflow if they are too
> big. (Your compiler will probably warn you about that.)

Is there something about static_assert (I've not studied that yet) or
integer constant expressions in general that justifies the "might"?
Surely the multiplication must be done as int (in so far as that makes
any sense)?

--
Ben.

james...@alumni.caltech.edu

unread,
Aug 9, 2018, 8:03:05 AM8/9/18
to
On Thursday, August 9, 2018 at 2:37:52 AM UTC-4, Tim Rentsch wrote:
> Keith Thompson <ks...@mib.org> writes:
>
> > r...@zedat.fu-berlin.de (Stefan Ram) writes:
...
> >> So, does this mean the old trick to check whether an
> >> int-value is between 0 and and a positive int value
> >> (3 is used in the example below) with just one comparison
> >> really is portable in C and correct for all possible values
> >> of the argument (and all possible int values of the end of
> >> the range)?
> >>
> >> int is_in_range( int const n ){ return( unsigned )n < 3; }
> >
> > Conversion from int to unsigned int is well defined (and is not
> > defined in terms of bit representations), so that should work
> > as intended.
>
> On implementations where UINT_MAX > INT_MAX. Others, not
> so much.

"The range of nonnegative values of a signed integer type is a subrange
of the corresponding unsigned integer type, and the representation of
the same value in each type is the same." (6.2.5p9). Therefore,
UINT_MAX<INT_MAX isn't permitted. UINT_MAX == INT_MAX is, so I presume
that's the case you're talking about?

"1 When a value with integer type is converted to another integer type
other than _Bool, if the value can be represented by the new type, it is
unchanged.
2 Otherwise, if the new type is unsigned, the value is converted by
repeatedly adding or subtracting one more than the maximum value that
can be represented in the new type until the value is in the range of
the new type." (6.3.1.3).

The first paragraph applies to any non-negative int value, and gives a
well-defined result. The second paragraph applies to any negative int
value, and describes a procedure that terminates with a well-defined
result for any such value, and both of those statements remain true
regardless of the value of INT_MAX. What are you talking about?

james...@alumni.caltech.edu

unread,
Aug 9, 2018, 8:07:46 AM8/9/18
to
On Thursday, August 9, 2018 at 2:34:14 AM UTC-4, Tim Rentsch wrote:
> James Kuyper <james...@alumni.caltech.edu> writes:
...
> > The problem is that any bit pattern that represents a negative value in
> > the signed type will represent a positive value in the corresponding
> > unsigned type that is too big to be represented in the signed type.
>
> It can be. But it doesn't have to be.

Example, please? And don't say "negative zero"; that's not really negative, despite the presence of that word in its name.

jacobnavia

unread,
Aug 9, 2018, 8:29:10 AM8/9/18
to
Le 08/08/2018 à 13:01, James Kuyper a écrit :
> I've heard it claimed that there is no C compiler, anywhere, that takes
> the "raise a signal" option.

The lcc-win compiler (using a compile time option) will raise the
overflow signal and call a user defined function. If the user hasn't
defined an overflow handler, a system-provided default function will be
called.

jacobnavia

unread,
Aug 9, 2018, 8:35:06 AM8/9/18
to
Le 08/08/2018 à 15:12, Stefan Ram a écrit :
> Why - of all the programming languages - does it have to be
> C which these poor folks have to learn?

Mmmm, let's see...

Ranting about how bad C is?

That message must have been posted to comp.lang.c, for sure. A newsgroup
where people rant about C.

Great!

David Brown

unread,
Aug 9, 2018, 9:17:41 AM8/9/18
to
I meant that you can "trust" static_assert to work as it should - but
you can't necessarily trust that the examples given by me are correct
(since my example was not accurate).

And the example here was for programming rocket control systems. Your
compiler /will/ give you an error if there is an overflow in the
constant expression here - a hard error, halting compilation. That
isn't the kind of work you do using Bart's home-made compiler or gcc in
its most permissive modes - you will have such static analysis in place.

>> - to be safe,
>> you may want the static_assert here to be:
>>
>> static_assert(((long long int) ADC_MAX * SCALE_TOP < INT_MAX),
>> "Check the types for adc scaling");
>
> Obviously it's likely that for small systems this will be safe, but the
> general advice should be different -- you can't safely test for possible
> overflow using the very expression that might overflow and C permits
> int, long and long long to be the same size, so there is the same danger
> here, at least theoretically, as in the original.

Yes, that is theoretically possible. You need to match your checks
appropriately. However, the only system I have ever heard of with
64-bit int is an early Cray, and I am not sure there was ever a C99
compiler for it (it's not the kind of system I work with). I am
confident that you will not find a system with a 10-bit ADC and 64-bit
int. And clearly you cannot have a system with 16-bit int and long long
int being the same size.

I think it is a good thing to have widely portable code, and a bad thing
to be non-portable unnecessarily. But equally I think it is a bad idea
to be obsessively portable towards all possible and hypothetical past,
present and future systems - if such portability works against the
clarity or simplicity of your code, then it should be dropped.
(Protecting the code by compile-time checks of your assumptions and
limitations can be a good idea.)

So you are correct that you can't use the static assert to check the
same expression for overflow - that's why I added the cast. But - for
most people's programming needs - the usefulness of such checks far
outweighs any issues you might have with extreme portability.

>
> It's better to test that ADC_MAX is no greater than INT_MAX / SCALE_TOP.

If you like. It was a quick example - real life details will depend on
the rest of the code. The key point is that you don't use comments to
express something that can be written in code.

> You may never be scaling 64-bit ADC values, but it's better to keep to a
> general rules than working out what's likely to be safe from case to
> case.

I would /love/ to see an ADC that gave 64-bit values!

No, general rules are /not/ always helpful. Too much generality is
counter-productive.

> (I'd also check for typos by adding in ADC_MAX > 0 && SCALE_TOP >
> 0.)

What typos would that check? Again, let's be realistic. Someone might
define ADC_MAX to be 1024 when they meant 1023, but no one is going to
write it as -1023 by mistake.

>
>> Otherwise since ADC_MAX and SCALE_TOP are both constants of type "int",
>> the multiplication will be done as "int" - and overflow if they are too
>> big. (Your compiler will probably warn you about that.)
>
> Is there something about static_assert (I've not studied that yet) or
> integer constant expressions in general that justifies the "might"?

I'm sorry, I can't see which "might" you are referring to.

> Surely the multiplication must be done as int (in so far as that makes
> any sense)?
>

Yes, that is my understanding.


Bart

unread,
Aug 9, 2018, 10:46:52 AM8/9/18
to
On 09/08/2018 14:17, David Brown wrote:

> And the example here was for programming rocket control systems. Your
> compiler /will/ give you an error if there is an overflow in the
> constant expression here - a hard error, halting compilation. That
> isn't the kind of work you do using Bart's home-made compiler or gcc in
> its most permissive modes - you will have such static analysis in place.

If I try this program:

#include <stdio.h>
#include <limits.h>

#define ADC_MAX 65535
#define SCALE_TOP 1000000
#define SCALE_BOTTOM 7

_Static_assert((ADC_MAX * SCALE_TOP < INT_MAX),
"Check the types for adc scaling");

with various compilers run with minimum options (just compile), then my
compiler gives a fatal error, gcc gives a mere warning, one which can be
swamped by other warnings[**].

Only one other compiler gave a fatal error:

bcc (mine) Fatal error
Pelles C Fatal error
gcc Warning (Windows)
clang Nothing (rextester)
icc Warning (godbolt)

lccwin Syntax error (these don't support it)
DMC Syntax error
MSVC Syntax error

I'm not proposing mine for mission critical apps. (Such a compiler needs
to be thoroughly exercised to iron out problems and that's only possible
by using it on many millions of lines of code over many different
applications over years. But then, becoming large and complex and highly
patched can also be counter-productive.)

But you shouldn't be so dismissive. It was interesting that gcc would
pass that code fragment (in that it proceeds to produce an output file)
unless given who knows what options. Presumably a blanket -Werror so
that even a temporarily unused variable because of temporarily commented
out code will halt compilation.

Although I can tell you that overflow when evaluating such expressions
inside my compiler is not done. Unless the compiler itself is built with
overflow trapping. For mission critical purposes, I would suggest a
different language, or a restricted subset of this one.

(** Which has been a problem with gcc recently; after screens full of
warnings, was there also an error that you missed? There's no message at
the end as to whether it completed the compilation or not. The only clue
is less of a delay than normal. So you run the program, and get the same
problem, because it's the same program as before!)


--
bart

Bart

unread,
Aug 9, 2018, 10:58:00 AM8/9/18
to
On 09/08/2018 15:46, Bart wrote:
> On 09/08/2018 14:17, David Brown wrote:
>
>> And the example here was for programming rocket control systems.  Your
>> compiler /will/ give you an error if there is an overflow in the
>> constant expression here - a hard error, halting compilation.  That
>> isn't the kind of work you do using Bart's home-made compiler or gcc in
>> its most permissive modes - you will have such static analysis in place.
>
> If I try this program:
>
>   #include <stdio.h>
>   #include <limits.h>
>
>   #define ADC_MAX 65535
>   #define SCALE_TOP 1000000
>   #define SCALE_BOTTOM 7
>
>   _Static_assert((ADC_MAX * SCALE_TOP < INT_MAX),

I should say that I evaluate such expressions using 64-bit integers
(either signed or unsigned), like in preprocessor expressions.

I don't know what the Standard says and I'm not about to find out since
implementing everything to the letter would take years of effort. My
product merely compiles a subset and dialect of C (and it is useful as a
test bed for experimental features), so I can do what I like.

But if int*int is normally calculated within the compiler at the same
size as int then either those constants need an LL suffix or a cast is
needed in the assert.

--
bart

Keith Thompson

unread,
Aug 9, 2018, 2:38:40 PM8/9/18
to
Bart <b...@freeuk.com> writes:
> On 09/08/2018 14:17, David Brown wrote:
>> And the example here was for programming rocket control systems. Your
>> compiler /will/ give you an error if there is an overflow in the
>> constant expression here - a hard error, halting compilation. That
>> isn't the kind of work you do using Bart's home-made compiler or gcc in
>> its most permissive modes - you will have such static analysis in place.

Correction: Any conforming compiler must produce a *diagnostic*, not
necessarily an error (N1570 5.1.1.3). An overflow in a constant
expression is a constraint violation. (Note that this applies only in
contexts that syntactically require a constant expression -- which
includes the first argument to _Static_assert (N1570 6.7.10)).

> If I try this program:
>
> #include <stdio.h>
> #include <limits.h>
>
> #define ADC_MAX 65535
> #define SCALE_TOP 1000000
> #define SCALE_BOTTOM 7
>
> _Static_assert((ADC_MAX * SCALE_TOP < INT_MAX),
> "Check the types for adc scaling");
>
> with various compilers run with minimum options (just compile), then my
> compiler gives a fatal error, gcc gives a mere warning, one which can be
> swamped by other warnings[**].

I know Bart doesn't care about this, but others might want to know
that by producing a warning gcc has met its obligation to produce
a diagnostic message for this constraint violation, and that the
"-pedantic-errors" option will cause gcc to produce a fatal error
message.

[...]

--
Keith Thompson (The_Other_Keith) ks...@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

Keith Thompson

unread,
Aug 9, 2018, 2:47:02 PM8/9/18
to
Bart <b...@freeuk.com> writes:
> On 09/08/2018 15:46, Bart wrote:
>> On 09/08/2018 14:17, David Brown wrote:
>>
>>> And the example here was for programming rocket control systems.  Your
>>> compiler /will/ give you an error if there is an overflow in the
>>> constant expression here - a hard error, halting compilation.  That
>>> isn't the kind of work you do using Bart's home-made compiler or gcc in
>>> its most permissive modes - you will have such static analysis in place.
>>
>> If I try this program:
>>
>> #include <stdio.h>
>> #include <limits.h>
>>
>> #define ADC_MAX 65535
>> #define SCALE_TOP 1000000
>> #define SCALE_BOTTOM 7
>>
>> _Static_assert((ADC_MAX * SCALE_TOP < INT_MAX),
>
> I should say that I evaluate such expressions using 64-bit integers
> (either signed or unsigned), like in preprocessor expressions.
>
> I don't know what the Standard says and I'm not about to find out since
> implementing everything to the letter would take years of effort. My
> product merely compiles a subset and dialect of C (and it is useful as a
> test bed for experimental features), so I can do what I like.

I know that Bart doesn't care about this, but the type rules
for the constant-expression in a _Static_assert declaration are
the same as for any other ordinary C expression. Assuming that
INT_MAX >= 1000000, all subexpressions in the _Static_assert() are
of type int, and a conforming compiler must diagnose the overflow
if ADC_MAX * SCALE_TOP exceeds INT_MAX.

The fact that _Static_assert is handled at compile time makes
it easy to mistakenly assume that the expression follows the
rules of preprocessor expressions, where all signed and unsigned
integer types are treated as intmax_t and uintmax_t, respectively.
But the expression is evaluated during translation phase 7, not 4
(N1570 5.1.1.2).

> But if int*int is normally calculated within the compiler at the same
> size as int

It is (unless INT_MAX < 1000000).

> then either those constants need an LL suffix or a cast is
> needed in the assert.

Yes.

Keith Thompson

unread,
Aug 9, 2018, 3:01:50 PM8/9/18
to
For example, if UINT_MAX and INT_MAX are both 32767 (which implies
that unsigned int has at least one padding bit), `(unsigned)-32767`
yields 1U, and is_in_range incorrectly returns 1.

You could have explained that, but you chose to be obscure.

I recently made a remark in another thread about adding you to my
killfile. I expressed it somewhat facetiously, but I was serious
about it -- something that I might not have made sufficiently clear.
To be clear, you are now in my killfile. That doesn't necessarily
mean I'll never see anything you post (for boring reasons having to
do with how my newsreader works), but I do not intend to reply to
you in the future. I think the reasons for this are sufficiently
obvious.

Keith Thompson

unread,
Aug 9, 2018, 3:07:57 PM8/9/18
to
Interesting. What is the "overflow signal"? Is it a signal in the
sense defined in <signal.h>, or is it something else? If the latter,
then lcc-win would seem not to be a counterexample to the claim.

(I don't expect jacob will reply to me, so I'll take a look at it later
when I have access to my Windows system.)

David Brown

unread,
Aug 9, 2018, 3:10:00 PM8/9/18
to
On 09/08/18 16:46, Bart wrote:
> On 09/08/2018 14:17, David Brown wrote:
>
>> And the example here was for programming rocket control systems.  Your
>> compiler /will/ give you an error if there is an overflow in the
>> constant expression here - a hard error, halting compilation.  That
>> isn't the kind of work you do using Bart's home-made compiler or gcc in
>> its most permissive modes - you will have such static analysis in place.
>
> If I try this program:
>
>   #include <stdio.h>
>   #include <limits.h>
>
>   #define ADC_MAX 65535
>   #define SCALE_TOP 1000000
>   #define SCALE_BOTTOM 7
>
>   _Static_assert((ADC_MAX * SCALE_TOP < INT_MAX),
>               "Check the types for adc scaling");
>
> with various compilers run with minimum options (just compile), then my
> compiler gives a fatal error, gcc gives a mere warning, one which can be
> swamped by other warnings[**].

My gcc gives me a fatal error, and there are no other warnings. I would
expect the same to be true for someone working with rocket control systems.

Why does my gcc give me a fatal error while your gcc gives a warning?
Because I know how to use my compiler, and you don't.

Why does my gcc not give me swamps of other warnings? Because I know
how to use my compiler, and how to write quality C code, and you don't.

I expect people writing serious code to use their tools, and write the C
code, in a manner far closer to the way I do than the way /you/ do.

> But you shouldn't be so dismissive.

Of course I should be dismissive of your messing about with compilers -
it is of no use to anyone until you learn to use tools properly.

> It was interesting that gcc would
> pass that code fragment (in that it proceeds to produce an output file)
> unless given who knows what options. Presumably a blanket -Werror so
> that even a temporarily unused variable because of temporarily commented
> out code will halt compilation.
>
> Although I can tell you that overflow when evaluating such expressions
> inside my compiler is not done. Unless the compiler itself is built with
> overflow trapping. For mission critical purposes, I would suggest a
> different language, or a restricted subset of this one.
>
> (** Which has been a problem with gcc recently; after screens full of
> warnings, was there also an error that you missed? There's no message at
> the end as to whether it completed the compilation or not. The only clue
> is less of a delay than normal. So you run the program, and get the same
> problem, because it's the same program as before!)
>

Screens of warnings for C has never been an issue with gcc, nor has
there ever been a confusion about warnings and errors amongst people who
can use development tools properly. Blame the user here, not the tools.

(gcc used to have a big problem with unreadable warnings and errors in
C++, especially in template libraries. This is partly a problem with
the way C++ works, partly a limitation of the way gcc handles these
errors. gcc has improved, and the language is gaining features like
"concepts" (don't blame me for the choice of name here) that greatly
simplify the error messages.)

Ben Bacarisse

unread,
Aug 9, 2018, 3:37:53 PM8/9/18
to
David Brown <david...@hesbynett.no> writes:

> On 09/08/18 13:39, Ben Bacarisse wrote:
<snip>
>> It's better to test that ADC_MAX is no greater than INT_MAX / SCALE_TOP.
>
> If you like. It was a quick example - real life details will depend on
> the rest of the code. The key point is that you don't use comments to
> express something that can be written in code.

Yes, but I was not commenting on that point. I was commenting on how
best to test the values in an assert.

>> You may never be scaling 64-bit ADC values, but it's better to keep to a
>> general rules than working out what's likely to be safe from case to
>> case.
>
> I would /love/ to see an ADC that gave 64-bit values!
>
> No, general rules are /not/ always helpful. Too much generality is
> counter-productive.

I think you may be reading more into what I wrote than I intended. The
general rule was just to avoid the potential UB that you are testing
for. I don't think that's counter-productively over general. A compile
time divide is as simple as a multiply, so I'm not advocating something
hugely complex to cope with an extremely rare possibility. I'm
advocating something simple so you don't have to persuade anyone that
it's safe.

Now you know your compiler will stop on compile-time overflow, but
that's just the sort of thing I'd comment because it can't be tested
for. Better to avoid the issue altogether in my book.

<snip>
> What typos would that check? Again, let's be realistic. Someone might
> define ADC_MAX to be 1024 when they meant 1023, but no one is going to
> write it as -1023 by mistake.

Is your list of possible mistakes publicly available? It would make
everyone's life simpler if we all knew which bugs simply won't occur!
:-)

>>> Otherwise since ADC_MAX and SCALE_TOP are both constants of type "int",
>>> the multiplication will be done as "int" - and overflow if they are too
>>> big. (Your compiler will probably warn you about that.)
>>
>> Is there something about static_assert (I've not studied that yet) or
>> integer constant expressions in general that justifies the "might"?
>
> I'm sorry, I can't see which "might" you are referring to.

Now this is odd. There is no "might" there -- quite clearly. So I went
back up the thread to read the original and there I read "might" instead
of "will" (on /two/ readings!). Something about the context was making
me misread it even when I was looking hard to be sure what you'd said.
Truly mysterious. Anyway, sorry about that!

--
Ben.

Keith Thompson

unread,
Aug 9, 2018, 3:42:10 PM8/9/18
to
David Brown <david...@hesbynett.no> writes:
> On 09/08/18 16:46, Bart wrote:
[snip]
> My gcc gives me a fatal error, and there are no other warnings. I would
> expect the same to be true for someone working with rocket control systems.
>
> Why does my gcc give me a fatal error while your gcc gives a warning?
> Because I know how to use my compiler, and you don't.

Do you expect Bart to start caring about gcc options the 137th time he's
told about them? (No, I haven't actually been counting.)

[...]

>> But you shouldn't be so dismissive.
>
> Of course I should be dismissive of your messing about with compilers -
> it is of no use to anyone until you learn to use tools properly.

By all means, be dismissive -- but please, only a finite number of
times. The point has been made. Move on.

[...]

Bart

unread,
Aug 9, 2018, 4:19:04 PM8/9/18
to
On 09/08/2018 20:09, David Brown wrote:
> On 09/08/18 16:46, Bart wrote:
>> On 09/08/2018 14:17, David Brown wrote:
>>
>>> And the example here was for programming rocket control systems.  Your
>>> compiler /will/ give you an error if there is an overflow in the
>>> constant expression here - a hard error, halting compilation.  That
>>> isn't the kind of work you do using Bart's home-made compiler or gcc in
>>> its most permissive modes - you will have such static analysis in place.
>>
>> If I try this program:
>>
>>    #include <stdio.h>
>>    #include <limits.h>
>>
>>    #define ADC_MAX 65535
>>    #define SCALE_TOP 1000000
>>    #define SCALE_BOTTOM 7
>>
>>    _Static_assert((ADC_MAX * SCALE_TOP < INT_MAX),
>>                "Check the types for adc scaling");
>>
>> with various compilers run with minimum options (just compile), then
>> my compiler gives a fatal error, gcc gives a mere warning, one which
>> can be swamped by other warnings[**].
>
> My gcc gives me a fatal error, and there are no other warnings.  I would
> expect the same to be true for someone working with rocket control systems.
>
> Why does my gcc give me a fatal error while your gcc gives a warning?
> Because I know how to use my compiler, and you don't.

You're missing my point.

Yes we all know gcc or your favourite compiler can be told to behave in
any manner your desire by cajoling it into taking some things more
seriously.

But anyone else, who runs a compiler in default mode, for example using:

cc program.c

or who types in code into godbolt.org or rextester.com without touching
any settings other than language and compiler, they will not see an
error (unless using pelles C).

And the other part of it is that a major compiler like MSVC didn't even
support that feature.

> Why does my gcc not give me swamps of other warnings?  Because I know
> how to use my compiler, and how to write quality C code, and you don't.

Yes I know. That means that when creating, modifying or extending an
application of tens of thousands of lines, every single line of it must
be perfect, with no warnings, before you can start testing.

Including all the code you've just added, that you don't know works yet,
and that you might have to strip out and rewrite half an hour later,
after spending all that time to remove all the possible things that
might give a warning.

Some of us just work in different ways which are not necessarily wrong
just because you don't work the same way.

> I expect people writing serious code to use their tools, and write the C
> code, in a manner far closer to the way I do than the way /you/ do.
>
>> But you shouldn't be so dismissive.
>
> Of course I should be dismissive of your messing about with compilers -
> it is of no use to anyone until you learn to use tools properly.

Excuse me. It is MY compiler that will show that error without doing
anything at all except run it with the name of the source file as input.
Could it get more straightforward than that?

You are saying YOUR tool is superior because /you have to tell it how to
compile programs and which errors to detect and which to ignore/? It
sounds like it doesn't know its job! I guess you would call that
'flexibility', because sometimes it really doesn't matter if that rocket
crashes.

> Screens of warnings for C has never been an issue with gcc,

Yes, you said. You never get warnings.

So, what is the purpose in having warnings if all of them have to be
taken seriously according to you? Do you really never see them? Or do
you see them before they are fixed? In which case you will also get
screenfuls of them from time to time. Then what; do you print them off
and go through them one by one and not start to actually run your
program until they've all gone?

(And then you find there are bugs as I suggested above.)

My compiler never gives warnings. It gives errors and stops at the first
error. I fix that and move on, since I can't keep more than one in mind
anyway, and I'm not going to print out a long list to go through. You
just compile again; if there is another error, you will find out in 50 msec.

--
bart

Tim Rentsch

unread,
Aug 9, 2018, 4:55:08 PM8/9/18
to
A _Static_assert (and presumably also the static_assert here) in
C requires an integer constant expression. The Standard doesn't
make this completely clear, but an early Defect Report does, that
in cases where a constant expression is required, as here, the
compiler must identify (ie, with a diagnostic) any overflow, etc
(eg, division by zero). The constraint in 6.6 p4 says

Each constant expression shall evaluate to a constant that
is in the range of representable values for its type.

and Defect Report #31 says, in part, in the Response section

the Committee's judgement of the intent is that the
``representable'' requirement applies to each subexpression
of a constant expression,

and gives a specific example

case INT_MAX + 2:

as being a constraint violation (and therefore requiring a
diagnostic). So if there is overflow in this static_assert
predicate, a diagnostic is mandatory.

Ben Bacarisse

unread,
Aug 9, 2018, 8:41:33 PM8/9/18
to
Thanks, I didn't know that.

It's tempting me to revise my advice since, now, the clearest way to
assert no overflow is to try to provoke it. However, I'm still not sure
I'd do that. If, for example, someone decides to make the constants
unsigned (1024u), the multiplication will no longer overflow, but the
division would still detect the problem.

--
Ben.

Reinhardt Behm

unread,
Aug 9, 2018, 9:37:27 PM8/9/18
to
That happens only if you just hack together something. People who do serious
software development have serious development processes, and we write code
that mostly goes without warnings the first time.
And yes, before doing any test I try to get rid of any warning. My time is
too valuable to test code that already looks suspicious to the compiler.

> Including all the code you've just added, that you don't know works yet,
> and that you might have to strip out and rewrite half an hour later,
> after spending all that time to remove all possible things might give a
> warning.
>
> Some of us just work in different ways which are not necessarily wrong
> just because you don't work the same way.

Yes some just hack something together, others develop software.

>> I expect people writing serious code to use their tools, and write the C
>> code, in a manner far closer to the way I do than the way /you/ do.
>>
>>> But you shouldn't be so dismissive.
>>
>> Of course I should be dismissive of your messing about with compilers -
>> it is of no use to anyone until you learn to use tools properly.
>
> Excuse me. It is MY compiler that will show that error without doing
> anything at all except run it with the name of the source file as input.
> Could it get more straightforward than that?
>
> You are saying YOUR tool is superior because /you have to tell it how to
> compile programs and which errors to detect and which to ignore/? It
> sounds like it doesn't know its job! I guess you would call that
> 'flexibility', because sometimes it really doesn't matter if that rocket
> crashes.
>
>> Screens of warnings for C has never been an issue with gcc,
>
> Yes, you said. You never get warnings.
>
> So, what is the purpose in having warnings if all of them have to be
> taken seriously according to you? Do you really never see them? Or do
> you see them before they are fixed? If which case you will also get
> screenfuls of them from time to time. Then what; do you print them off
> and go through them one by one and not start to actually run your
> program until they've all gone?

Yes, this is how it's done. Why should I try to find errors by running the
program when the compiler has already found them for me?

> (And then you find there are bugs as I suggested above.)
>
> My compiler never gives warnings. It gives errors and stops at the first
> error. I fix that and move on. Since I can't keep in mind more than one
> anyway, and I'm not going to print out a long list to go through. You
> just compile again; if there is another error, you will find out in 50
> msec.

So if your compiler is so great and fast, why don't you just use it properly
and fix all these errors before wasting time finding them again by running
the program?

--
Reinhardt

Richard Damon

unread,
Aug 9, 2018, 9:43:49 PM8/9/18
to
On 8/9/18 10:46 AM, Bart wrote:
>
> (** Which has been a problem with gcc recently; after screens full of
> warnings, was there also an error that you missed? There's no message at
> the end as to whether it completed the compilation or not. The only clue
> is less of a delay than normal. So you run the program, and get the same
> problem, because it's the same program as before!)
>

If you get swamped with warnings you are doing something wrong. You have
several options:

1) Go through and fix the conditions for each warning (or at least most
of them) so you don't get swamped with warnings.

2) Selectively disable the warning around the code in question,
preferably with a comment on why you are doing this

3) Turn off the specific warnings that are being over-sensitive for your
code. Not all warning options are useful for all code bases.

4) Most likely, a bit of a combination of the above.

If you get so many warnings that you don't look at them, you are doing
it wrong and not using the tool right. This DOES mean you may need to
read the documentation for the compiler.
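Option 2 above can be done quite surgically with GCC and Clang diagnostic
pragmas (the pragma spelling below is the real GCC/Clang one; the callback
itself is a made-up example):

```c
#include <stddef.h>

/* Suppose this callback must match a fixed signature, so 'context'
   is deliberately unused and -Wunused-parameter would fire here. */
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wunused-parameter"
static int double_cb(int value, void *context)
{
    return value * 2;
}
#pragma GCC diagnostic pop
/* Warning settings are back to normal from here on. */

int doubled(int value)
{
    return double_cb(value, NULL);
}
```

The push/pop pair keeps the suppression local, and the comment records why
the warning is being silenced, per point 2.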

blmblm.m...@gmail.com

Aug 9, 2018, 9:56:59 PM
In article <kfnpnys...@x-alumni2.alumni.caltech.edu>,
Tim Rentsch <t...@alumni.caltech.edu> wrote:
> blm...@myrealbox.com <blmblm.m...@gmail.com> writes:
>
> > I teach C programming to undergraduates, and while I do my best to
> > teach them how to use the language correctly, the recent thread
> > about undefined behavior reminds me that I never know quite what to
> > say about overflow in arithmetic on signed integers. Usually I've
> > just muttered something about how several languages (Java and Scala
> > come to mind) just give "wrong" answers, and there's not a lot one
> > can easily do about it. But if in C it's undefined behavior, hm,
> > that's (strictly speaking!) more serious, and I'm curious about what
> > one *can* do. Consider this code fragment:
> >
> > int a, b;
> > /* code to assign values to a and b */
> > printf("%d + %d is %d\n", a, b, a + b);
> >
> > I noticed in the other thread a suggestion (I think -- I may have
> > misunderstood it) that one can avoid UB in addition of signed
> > integers by first casting to unsigned integers, adding, and then
> > casting back to signed. That seems like it ought to give a result
> > consistent with the behavior of those Other Languages (that quietly
> > wrap around), and is perhaps the best one can reasonably do. Is
> > that what a truly pedantic and careful programmer would do?
> >
> > Mostly I'm curious, because I think for my students the most
> > appropriate thing to do is just to continue to say that many
> > commonly-used programming languages (not all!) don't deal very
> > gracefully with this kind of thing and that a full discussion is
> > beyond the scope of the course. But it would be nice to give some
> > hints about how to write truly careful code.
>
> There are a couple of different issues here, and it's important
> to distinguish and prioritize them.
>
> For programming in C, it's important to understand the notion of
> undefined behavior. This topic deserves treatment on its own, not
> as a footnote or parenthetical explanation of something else (ie,
> such as integer arithmetic in this case). And that explanation
> should come before other topics, like arithmetic operations, where
> undefined behavior is part of the definition (or lack thereof) of
> the operations involved.
>

Mmph. I think you have a point for students with some previous
programming experience. But for total beginners? and at least
a few of the students in that course for ENGR majors are in that
category.

> This explanation needn't go into all the gory details. You might
> say that some programming languages are "safe" (or perhaps mostly
> safe), in that if a program does something wrong what happens is
> still fairly well delineated. C isn't like that: in C some
> operations are "unsafe", and the only thing that can be relied on
> is that these cases should not be relied on. Follow with a couple
> of examples in each of the two categories, safe and unsafe. After
> (and only after) introducing the concept of undefined behavior
> should the class then get on to signed arithmetic and how to deal
> with it.

So can you think of examples of "safe" versus "unsafe" suitable for
total beginners? that can be taught before teaching anything about ....
Well, pretty much anything, right? signed integer arithmetic,
reading input (which is a whole can of worms by itself :-( -- there's
a reason I didn't put any attempt at that in my example :-) ).

> As for how to steer clear of the danger zones, there are different
> ways and different ideas for how one should or might do that. One
> way is to stick to common patterns, like the example you give in
> a later posting
>
> for( i = something; i < N; i++ ){ ... }
>
> If N has the same type that i does, there will never be a problem
> with overflow or undefined behavior (of course, not counting
> cases where 'i' might be assigned inside the loop body).
>
> Another way that some people favor is to prefer using unsigned
> types to using signed types. I find this technique very useful,
> especially for variables used for indexing. Of course, using
> unsigned types has its own set of gotchas, and those must be
> guarded against, but usually the consequences are more benign
> than direct undefined behavior.
>
> Another idea is to use a type with a large range, like long long.
> Doing this doesn't eliminate the problem of undefined behavior,
> but it does greatly reduce it. In practice just using long long
> may give better ROI, safety-wise, than writing very careful code
> but with shorter types.
>
> I'm sure there are other practices that could go on this list.
> There isn't any fixed set of rules one can follow that always
> guarantees a good result, partly because the problem is
> multi-dimensional. For example, code that is safer might also run
> slower, and in some cases speed considerations dominate. For
> purposes of your class, I think the key point to emphasize is that
> one should be conscious of the potential dangers, and be aware of
> when code is starting to get outside of one's personal safety
> envelope. Always staying inside is okay. Going outside but
> always being very careful is okay. What isn't okay is wandering
> back and forth across the edge without realizing it and without
> taking any extra precautions.
>
> So for that they may be worth, there are my suggestions.

I think we're on the same page with regard to goals -- make sure the
students are aware that there are traps for the unwary, and give some
guidance about how to avoid them -- it's just that teaching them this
stuff while also trying to get across the basic ideas of programming
(variables, assignment, conditional execution, repetition) seems kind
of daunting! Not that I can really object to that given that I get
paid pretty well to do this stuff, but still.

Anyway thanks for the long reply.

--
B. L. Massingill
ObDisclaimer: I don't speak for my employers; they return the favor.

blmblm.m...@gmail.com

Aug 9, 2018, 9:57:41 PM
In article <pkgnhg$jfq$1...@reader1.panix.com>,
John Forkosh <for...@panix.com> wrote:
> blm...@myrealbox.com <blmblm.m...@gmail.com> wrote:
> > In article <87zhxw3...@bsb.me.uk>,
> > Ben Bacarisse <ben.u...@bsb.me.uk> wrote:
> >> John Forkosh <for...@panix.com> writes:
> >> > blm...@myrealbox.com <blmblm.m...@gmail.com> wrote:
> >> >> I teach C programming to undergraduates, and while I do my best
> >> >> to teach them how to use the language correctly, the recent thread
> >> >> about undefined behavior reminds me that I never know quite what to
> >> >> say about overflow in arithmetic on signed integers. <snip>
> >>
> >> >> int a, b;
> >> >> /* code to assign values to a and b */
> >> >> printf("%d + %d is %d\n", a, b, a + b);
> >> <snip>
> >>
> >> > #include <limits.h>
> >> > int a, b;
> >> > /* code to assign values to a and b */
> >> > if ( (a>=0? b < INT_MAX - a : b > INT_MIN - a) )
> >> > printf("%d + %d is %d\n", a, b, a + b);
> >> > else printf("don't do that\n");
> >>
> >> This prohibits a lot of valid additions. I think you want <= and >= in
> >> the arms of the conditional.
> >
> > OHHHH. Duh. Why did I think it would be difficult if even possible
> > to detect overflow? :-( Anyway thanks to both of you.
> >
> >> > But since they're apparently pretty new to programming,
> >> > it's -- as you seem to be recognizing -- ridiculous to go off
> >> > on some long-winded detour:
> >> > "> ...a full discussion is beyond the scope of the course"
> >> > seems exactly right to me.
> >>
> >> ACK!
> >
> > Glad to hear agreement. But it's nice to have something in reserve
> > for the occasional student who wants to know more -- "beyond the scope,
> > but the short version of a better answer is .... ".
>
> I think Reinhardt Behm already gave you a pretty good answer for that,
> which I believe you agreed with, and that I'd also agree with...
> o choose your datatypes appropriately and check your inputs
> (i.e., checking for a+b overflow after some long calculation for
> a and b is "closing the stable door after the horse has bolted")
> So then you additionally asked
> o assume that won't happen for beginning programmers, so then what?
> Yeah, but then for beginning programmers you're back to your own remark,
> with which we all agreed,
> o "a full discussion is beyond the scope of the course"
> And so now, for "the occasional student who wants to know more", you can
> presumably give that more capable student Behm's advice, and for this
> more capable student
> o "assume that it will happen"
> Or something like that.

I like that last point. If something *can* go wrong .... :-).
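For the record, the corrected pre-check (with <= and >=, as Ben suggested)
can be wrapped up like this -- the helper name is my own:

```c
#include <limits.h>
#include <stdbool.h>

/* True when a + b is representable in int, so the addition is
   well-defined.  Neither subtraction here can itself overflow:
   INT_MAX - a is safe when a >= 0, and INT_MIN - a when a < 0. */
static bool add_is_safe(int a, int b)
{
    return (a >= 0) ? (b <= INT_MAX - a)
                    : (b >= INT_MIN - a);
}
```

Then `if (add_is_safe(a, b)) printf("%d + %d is %d\n", a, b, a + b);` never
triggers undefined behavior, and the boundary cases (a == INT_MAX, b == 0,
and so on) are accepted rather than rejected.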

blmblm.m...@gmail.com

Aug 9, 2018, 9:58:19 PM
In article <pkguek$2le$1...@dont-email.me>,
David Brown <david...@hesbynett.no> wrote:
> On 09/08/18 04:15, blm...@myrealbox.com wrote:
> > In article <pkdp5i$3ba$1...@dont-email.me>,
> > Reinhardt Behm <rb...@hushmail.com> wrote:
> >> AT Wednesday 08 August 2018 11:10, , wrote:
> >>
> >>> I teach C programming to undergraduates, and while I do my best
> >>> to teach them how to use the language correctly, the recent thread
> >>> about undefined behavior reminds me that I never know quite what to
> >>> say about overflow in arithmetic on signed integers. Usually I've
> >>> just muttered something about how several languages (Java and Scala
> >>> come to mind) just give "wrong" answers, and there's not a lot one
> >>> can easily do about it. But if in C it's undefined behavior, hm,
> >>> that's (strictly speaking!) more serious, and I'm curious about what
> >>> one *can* do. Consider this code fragment:
> >>>
> >>> int a, b;
> >>> /* code to assign values to a and b */
> >>> printf("%d + %d is %d\n", a, b, a + b);
> >>>
> >>> I noticed in the other thread a suggestion (I think -- I may have
> >>> misunderstood it) that one can avoid UB in addition of signed integers
> >>> by first casting to unsigned integers, adding, and then casting back
> >>> to signed. That seems like it ought to give a result consistent with
> >>> the behavior of those Other Languages (that quietly wrap around),
> >>> and is perhaps the best one can reasonably do. Is that what a truly
> >>> pedantic and careful programmer would do?
> >>>
> >>> Mostly I'm curious, because I think for my students the most appropriate
> >>> thing to do is just to continue to say that many commonly-used programming
> >>> languages (not all!) don't deal very gracefully with this kind of thing
> >>> and that a full discussion is beyond the scope of the course. But it
> >>> would be nice to give some hints about how to write truly careful code.
> >>
> >> I would word it in the following way:
> >> What usually happens in the CPU (maybe not in all, but in most) is a
> >> silent wrap-around. Many languages handle it the same.
> >> In C the compiler is free to do it the same way and ignore any overflow but
> >> it is also free to do any kind of nasty things (the nasal daemons..)
> >>
> >> Ignoring overflow and wrapping around will lead to mathematically incorrect
> >> results. They can lead to embarrassing outputs of your program like websites
> >> we all have seen to show totally nonsensical data (Your subscription will
> >> end in -32767 days). In real life such results can lead to catastrophic
> >> outcomes even with people getting killed and the programmer going to jail.
> >>
> >> The responsible way of handling this is to always check the possible range
> >> of inputs and results of calculations - also intermediate ones - and be
> >> prepared that such overflows can happen and choose your data types
> >> accordingly to prevent them. In critical programs document this to make sure
> >> nobody can accuse you and the next programmer modifying the software knows
> >> about it.
> >
> > Agreed -- if it matters, this is the responsible thing to do. (But for
> > the purposes of a beginning programming class -- "pretend it won't happen"
> > seems more appropriate.)
>
> It /always/ matters, so you should /always/ check your ranges. That
> does not mean you always need to add run-time checks to the code - it
> means using appropriate checks for the task in hand. That might mean
> simply knowing the ranges in question. It might mean comments or
> documentation. It might mean notes in the answer to the homework
> question. It might mean careful choices and naming of types or
> functions to keep things clear. It might be entirely obvious due to the
> nature of the task. There is a huge difference in the effort needed
> here for a quick test program and a rocket guidance system. But you
> should /always/ think about it and be sure your code is safe enough for
> the task. And that should, IMHO, be drilled into the students from the
> start of their first class - it should not be an afterthought!

Good point. Arguably I should stress that more, though I can be kind of
fanatical about checking for errors in user-supplied program inputs,
to the point where some students complain. Not exactly the same thing,
maybe.

> > One question in my mind was *how* to check for potential overflow, but
> > I notice that a couple of replies give methods for that (which on reading
> > make me wonder why I thought the problem was hard :-( ).
> >
>
> General checking for potential overflow is often hard to do well - it's
> easy to get things slightly wrong, or end up with ugly and messy code.
> It is not often a good idea to try to do the kind of "is it safe to add
> these two int's" checks - it is error prone coding and usually not the
> logical thing to do. Instead, look directly at the parameters for the
> function being called - check those for sensible values. Then make sure
> you are using types that are big enough for the purpose - fixed size
> types in <stdint.h> are often a very easy way to get that.

If you're writing, say, a simple four-function calculator program,
I think you pretty much have to check all operations for overflow,
don't you? I guess fixed-size types might make that a little easier --
compute a result twice the size of the inputs and check?

Silly example? yeah, but not atypical of the kinds of toy programs
often used in beginning courses.
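A sketch of that "compute a result twice the size of the inputs and check"
idea, assuming the calculator works on int32_t values (the helper name is
hypothetical):

```c
#include <stdint.h>
#include <stdbool.h>

/* int64_t holds any sum of two int32_t values exactly, so the wide
   addition cannot overflow; only the narrowing back needs a check. */
static bool checked_add32(int32_t a, int32_t b, int32_t *result)
{
    int64_t wide = (int64_t)a + (int64_t)b;
    if (wide < INT32_MIN || wide > INT32_MAX)
        return false;               /* report overflow to the caller */
    *result = (int32_t)wide;
    return true;
}
```

The same shape works for subtraction and multiplication, since the product
of two 32-bit values also always fits in 64 bits.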

blmblm.m...@gmail.com

Aug 9, 2018, 9:59:04 PM
In article <pkguua$5ed$1...@dont-email.me>,
David Brown <david...@hesbynett.no> wrote:
> On 09/08/18 05:41, blm...@myrealbox.com wrote:
> > In article <6b247311-f0f4-4dd0...@googlegroups.com>,
> > <james...@alumni.caltech.edu> wrote:
> >> On Wednesday, August 8, 2018 at 10:19:03 PM UTC-4, blm...@myrealbox.com wrote:
> >> ...
> >>> I disagree about ++i. I don't know about your use cases, but for
> >>> me by far the most common one for ++i is as the increment part of a
> >>> "for" loop, and I'm not sure I can think of any way you could get
> >>> overflow in the simple and most-common-for-me case:
> >>>
> >>> for (int i = 0; i < N; ++i) { .. }
> >>>
> >>> where N is an "int".
> >>
> >> That's easy - in fact, it was discussed here just recently:
> >>
> >> for(int i = INT_MIN; i<=INT_MAX; i++) { ... }
> >>
> >> Obviously, that loop won't work as desired, no matter how int
> >> overflow is handled. The question under discussion was how to fix it.
> >
> > True. But that's not quite a kind of loop I typically write, and the
> > loop I wrote doesn't have that problem, even if N is INT_MAX, does it?
> >
>
> Yes, but it is worth knowing (and showing your students) some examples
> of what /not/ to write, and why they are wrong. Learning to write good
> code also involves learning not to write bad code.

Interesting point. I'll have to try to think of some good examples! and
I guess this could be one.

> Since loops like that example are rarely useful, there is no need to
> jump through hoops to try to get "elegant" alternatives - such
> discussions are fine for a c.l.c. thread, but not for a C programming
> course.
>
> > (I admit that I halfway expect a reply that produces another head-slap
> > moment. :-)? )
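The full-range loop quoted earlier in the thread, for(int i = INT_MIN;
i<=INT_MAX; i++), makes a nice what-not-to-write example: the condition is
always true (every int is <= INT_MAX), so the loop can only end via i++
overflowing, which is UB. One conventional repair is to exit after
processing the last value, before the increment runs (the function name
here is just for illustration):

```c
#include <limits.h>

/* Count the ints from 'lo' up to INT_MAX inclusive without ever
   incrementing past INT_MAX.  The empty loop condition is deliberate:
   the exit test happens *before* i++ could overflow. */
static long long count_to_int_max(int lo)
{
    long long n = 0;
    for (int i = lo; ; i++) {
        n++;                    /* loop body goes here */
        if (i == INT_MAX)
            break;              /* leave before the overflowing i++ */
    }
    return n;
}
```

The broken version and the fix side by side make the off-by-one nature of
the trap concrete for students.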

blmblm.m...@gmail.com

Aug 9, 2018, 10:00:43 PM
In article <pkh18e$if8$1...@dont-email.me>,
David Brown <david...@hesbynett.no> wrote:
> On 09/08/18 04:20, blm...@myrealbox.com wrote:
> > In article <overflow-20...@ram.dialup.fu-berlin.de>,
> > Stefan Ram <r...@zedat.fu-berlin.de> wrote:
> >> James Kuyper <james...@alumni.caltech.edu> writes:
> >>> However, avoiding overflow can be annoyingly difficult
> >>> if you need to write code that works with arbitrary values.
> >>
> >> Just because I cannot resist to repost my old code ...
> >>
> >> The following program will read two integral numbers and
> >> then print their sum. The longest functions is dedicated
> >> to reading a number:
> >>
> >> #include <stdio.h>
> >> #include <stdlib.h>
> >> #include <limits.h>
> >> #include <ctype.h>
> >>
> >> int addintintoverflow( int const i, int const j )
> >> { return
> >> i > 0 && j > 0 && i >( INT_MAX - j )||
> >> i < 0 && j < 0 && i <( INT_MIN - j ); }
> >>
> >> int mulintintoverflow( int const i, int const j )
> >> { if( i > 0 )
> >> { if( j > 0 ){ if( i > INT_MAX / j )return 1; }
> >> else { if( j < INT_MIN / i )return 2; }}
> >> else
> >> { if( j > 0 ){ if ( i < INT_MIN / j )return 3; }
> >> else if ( i != 0 && j < INT_MAX / i )return 4; }
> >> return 0; }
> >
> >
> > OHHHH. Duh. Why did I think it would be difficult if even possible
> > to detect overflow? :-( Anyway thanks.
> >
>
> Don't try to detect overflow - try to /avoid/ overflow. If you are
> using appropriate values, appropriate operations, and appropriate types,
> then there will be no overflow. If that is not the case, then fix the
> problem rather than trying to detect overflow.

Maybe I'm just being dense here, but consider the toy problem of writing
a simple four-function calculator. How do you avoid overflow? the only
thing that occurs to me is something prompted by another poster, who
mentioned fixed-size types -- one could use one size for inputs and
the running total and compute intermediate results using a size twice
as big?
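Another option for the calculator case, at least with GCC and Clang, is the
checked-arithmetic builtins (a compiler extension, not portable C; C23
standardizes the same idea as ckd_add in <stdckdint.h>):

```c
#include <stdbool.h>
#include <limits.h>

/* __builtin_add_overflow computes a + b as if in infinite precision,
   stores the (possibly wrapped) result in *sum, and returns true if
   the true sum did not fit.  GCC/Clang extension. */
static bool safe_add(int a, int b, int *sum)
{
    return !__builtin_add_overflow(a, b, sum);
}
```

Matching __builtin_sub_overflow and __builtin_mul_overflow exist for the
other calculator operations.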

> >> int add( int const i, int const j )
> >> { if( addintintoverflow( i, j ))
> >> return fprintf( stderr, "overflow.\n" ), EXIT_FAILURE;
> >> return printf( "%d\n", i+j ) > 0 ? EXIT_SUCCESS : EXIT_FAILURE; }
> >>
>
> Whatever you teach your students, I hope you don't teach them to write a
> function called "add" with two "int" parameters and an "int" return type
> that returns the length of a string printed out!

Ouch. I didn't really notice that, but yeah, perhaps not the most
self-documenting function name, is it?

> And I /really/ hope you don't teach them to follow the incredible
> formatting style shown in these examples.

In the examples I show them I follow my own preferred style, which is
less, um, "dense" than the code below.

I tell them there are many possible formatting styles and that they
should choose one that's readable and use it consistently. Arguably
I could do a lot more in the way of explaining what I mean by that
and pushing them harder to do it.

> >> int read( int * p )
> >> { int ch;
> >> while( ch = getchar(), isspace( ch ));

I like using assignments in the test of a "while", but this one's too
dense/tricky even for me.

> >> if( ch < 0 )return 0;

Why test < 0? is this meant to be an EOF check? Why not *use* the
EOF constant?

The rest -- eh, aside from the format and the use of int where logically
boolean makes more sense, it doesn't seem bad to me. Your version's more
readable, IMO, if only because of formatting.

> >> if( ch == '-' )
> >> fprintf( stderr, "minus sign not implemented yet.\n" ), exit( 99 );
> >> int value;
> >> if( !isdigit(( unsigned char )ch ))return 0; else
> >> { value = ch - '0';
> >> int ch; while( ch = getchar(), isdigit(( unsigned char )ch ))
> >> { if( mulintintoverflow( value, 10 ))return 0;
> >> value = value * 10;
> >> int const v = ch - '0';
> >> if( addintintoverflow( value, v ))return 0;
> >> value = value + v;
> >> continue; }}
> >> *p = value; return 1; }
> >>
>
> I also hope you would never recommend a mess like that for an algorithm
> for reading an integer. Try this:
>
> bool readIntValidated(int * p)
> {
> int ch;
>
> // Skip white space
> do {
> ch = getchar();
> if (ch < 0) return false;
> } while (isspace(ch));
>
> // Handle minus sign
> bool minus = false;
> if (ch == '-') {
> minus = true;
> ch = getchar();
> if (ch < 0) return false;
> }
>
> // If no digits are found at all, return false
> if (!isdigit(ch)) return false;
>
> int value = 0;
> while (true) {
> if (isdigit(ch)) {
> const int v = ch - '0';
> if (minus) {
> if (value < INT_MIN / 10) return false;
> value *= 10;
> if (value < (INT_MIN + v)) return false;
> value -= v;
> } else {
> if (value > INT_MAX / 10) return false;
> value *= 10;
> if (value > (INT_MAX - v)) return false;
> value += v;
> }
> } else {
> // Non-digit ends input
> *p = value;
> return true;
> }
> ch = getchar();
> if (ch < 0) return false;
> }
> }
>
> (The code is not tested, and may have bugs.)
>
> Some key points here are:
>
> 1. Don't use complicated general check functions unnecessarily. Check
> what you want to know.
>
> 2. Don't obsess about DRY ("don't repeat yourself"). If the natural
> flow of the code involves calling "getchar()" or "isdigit" in several
> places, that's fine - don't use convoluted loop structures just to try
> to avoid them. The compiler will handle the efficiency for you.
>
> 3. Use brackets and indentation to make the structure clear.
>
>
> An alternative here would be to use "long long int value = 0;" and then
> you can do the arithmetic safely and test afterwards:
>
> if (minus) {
> value = (value * 10) - v;
> if (value < INT_MIN) return false;
> } else {
> value = (value * 10) + v;
> if (value > INT_MAX) return false;
> }
>
> (Theoretically, "long long int" does not have to be larger than "int".
> In practice, it is.)
>
>
>
> >> int main( void )
> >> { int i, j; if( read( &i )&& read( &j ))return add( i, j );
> >> fprintf( stderr, "input error.\n" ); return EXIT_FAILURE; }

blmblm.m...@gmail.com

Aug 9, 2018, 10:01:29 PM
In article <pkh32j$sna$1...@dont-email.me>,
David Brown <david...@hesbynett.no> wrote:
> On 09/08/18 04:25, blm...@myrealbox.com wrote:
> > In article <C-201808...@ram.dialup.fu-berlin.de>,
> > Stefan Ram <r...@zedat.fu-berlin.de> wrote:
> >> blm...@myrealbox.com <blmblm.m...@gmail.com> writes:
> >>> Mostly I'm curious, because I think for my students the most appropriate
> >>> thing to do is just to continue to say that many commonly-used programming
> >>> languages (not all!) don't deal very gracefully with this kind of thing
> >>> and that a full discussion is beyond the scope of the course. But it
> >>> would be nice to give some hints about how to write truly careful code.
> >>
> >> C is a language for experts. It takes a lot of learning
> >> (time) to become an expert. Most students will not reach
> >> this level. (Especially when they do not study computer
> >> science but, say, mechanical engineering.)
> >>
> >> Why - of all the programming languages - does it have to be
> >> C which these poor folks have to learn?
> >>
> >
> > Well. That's a good question, and I'm happy to answer. This will get
> > long ....
> >
> > I teach (at a 4-year US college) two courses using C:
> >
> > One is a one-credit-hour course in C programming required as part of
> > our BS/CS degree program. Our department thinks it's important that
> > people getting this degree have some exposure to straight C, *because*
> > it's fairly low-level. (For the curious, we teach our beginning
> > courses for majors using Scala and some of the later ones using C++.)
> > Students in this course know how to write programs in *some* language
> > but are mostly not yet expert programmers. I try to really emphasize
> > with this group that C is full of traps for the unwary and suggest
> > ways to avoid them, but sometimes "beyond the scope of this course"
> > seems like the right approach. It's also not surprising for one
> > of these students to want to know more, so I like to feel confident
> > enough about my own knowledge to say what the "beyond" might be.
> >
> > The other is a beginning programming course for engineering majors.
> > Up until a few years ago, these students took our intro course for
> > CS majors, but when we switched that to Scala, well, ENGR was not
> > happy. I wasn't involved in the ensuing discussion, but what was
> > reported to me was that they insisted on a language "with pointers",
> > which to us means C++ or C, and we all pretty much agreed that for
> > beginners C++ is just way too big and complex. I personally think
> > Python might be a better choice for this group -- they might actually
> > use it, and you can write much more interesting programs in it --
> > but "they" want "pointers", and we don't want to teach C++ as a first
> > programming language, so C it is.
>
> I agree that Python is likely to be a better choice here. It is good
> for engineering and maths.
>

That's my impression too -- or at least that it's popular there, which
maybe shouldn't matter but does.

> > I tell both groups that as far as I know there *is* still a small
> > market for C programmers. Some of it's operating-systems stuff, but
> > my understanding is that there are also some embedded systems that
> > are programmed in C, and that's a niche market where an engineer
> > might end up. And some of our industry contacts (the ones who work
> > in security) say they want people who know some C (not sure why).
>
> There is still a very big market for C programming. A fair amount of
> what is done in C could probably be better done in other languages, but
> inertia is high - and there are some people that insist on C everywhere,
> regardless of how appropriate it is. You are right about its use in OS
> and low-level stuff (drivers, etc.), and for embedded work (where C is
> dominant - and there is a /lot/ of embedded programming going on). But
> is also highly important in a few other areas. Code that needs to be
> very efficient, especially in library code, is typically done in C.
> Code that needs to be accessible from many languages (again, often in
> libraries) is also often written in C - it is the language of choice for
> interfaces due to the simplicity of its data types and function calling
> conventions. Code for the libraries and run-time systems for other
> languages is often written in C for speed and portability, and
> extensions for other languages are generally written in C. (The reason
> Python is a good choice for engineering is the existence of libraries
> like numpy and scipy, which give a nice Python interface while having
> fast C routines underneath.)

I hadn't thought about the libraries issue, but yeah, good point.

> > For the CS majors I add that we think exposure to programming at this
> > level is useful in giving them the broad conceptual understanding of
> > the field that the degree is supposed to represent, even if they never
> > write another C program.
> >
> > I also tell both groups that only a C fanatic would use C for general
> > application programs; there are plenty of other choices that are
> > more suitable for that.
>
> Agreed - but there are a lot of C fanatics around...
>
> > I add, for the engineers, that the first
> > programming language is the hardest; learning a second will be
> > much easier.

blmblm.m...@gmail.com

Aug 9, 2018, 10:03:55 PM
In article <pkh5ub$s2l$1...@reader1.panix.com>,
> > The other is a beginning programming course for engineering majors.
> > Up until a few years ago, these students took our intro course for
> > CS majors, but when we switched that to Scala, well, ENGR was not
> > happy. I wasn't involved in the ensuing discussion, but what was
> > reported to me was that they insisted on a language "with pointers",
> > which to us means C++ or C
>
>
> " ...they[ENGR] insisted on a language "with pointers" " ???
>
> Why??? If this insistence ultimately dictated your choice of
> language to teach, and it wasn't the language you'd have otherwise
> chosen, then maybe the answer to that "why?" is pretty significant.
>
> I've done >>lots<< of physics/math programming, though that was
> many years ago, and primarily in Fortran, which at that time
> had no pointers whatsoever (and even now I personally find its
> pointer syntax pretty ugly). But no problem, because no pointers
> were ever needed for anything we were ever doing. Moreover, years
> later on, I programmed (and still program) lots of math-heavy
> "financial engineering" stuff in C, and again never needed (nor need)
> any pointers for anything.
>
> It makes zero sense to me that your engineering dept's #1 big ask is
> for pointers. That's what pops into their minds before anything else???
> I know you say, "I wasn't involved in the ensuing discussion",
> but if the outcome of that discussion has had a significant negative
> effect on your subsequent teaching, then I'd suggest you re-visit
> the issue and get some concrete reasons why they want/need pointers.
> I'd wager you a couple of "free beers" they ain't gonna have any
> reasons that hold up under legitimate scrutiny.

I wouldn't take that bet :-).

Curiously enough, just in the last few days I've been involved in an
e-mail discussion with colleagues about this course and what language
we use (the person who may teach it next semester prefers C++), and
I got a bit more of the backstory that led us to choose C ....

First a disclaimer: This is all second-hand, and even if it weren't
I'd be hesitant to say anything that would reflect badly on my
colleagues in ENGR.

But apparently ENGR's request for a separate course for their
majors was driven almost entirely by a dislike for the language we
use in our first course for CS majors, namely Scala. Pushed for a
pedagogical reason that Scala wasn't okay, they came up with "doesn't
have pointers". We have our suspicions about whether the people making
this request have any idea what they mean by it, but like I said,
I don't want to bad-mouth people on the basis of a second-hand report.

Curiously enough, I'm hearing now for the first time that they (ENGR)
also mentioned Python as an option, which -- um, this would seem
to contradict the "must have pointers" requirement?! So now I'm
completely baffled about what they really want and why. If Python
was an option I'm not sure why it apparently wasn't considered
more seriously when we first started offering this course (not many
years ago).

Even more curious, the person I'm hearing this from (who pays more
attention to pedagogy than I do) cited a scholarly paper purporting
to demonstrate that C++ is actually a better language for beginners
than Python. Seems totally counter-intuitive, doesn't it? Possibly
digging into details would lead one to question what they measured
and how, but still, interesting?? I've taught CS1 in C++ and it's
not something I'm eager to do again.

Anyway, I'm pushing just a bit for us to reopen the discussion with
ENGR, but we have so much other stuff to deal with these days that
it may not happen. Enrollments are going up and up, and while it's
great that so many students want to study computer science, finding
people to teach them given university politics(?) and the labor market
is a challenge.

blmblm.m...@gmail.com

Aug 9, 2018, 10:05:23 PM
In article <F7VaD.1975$%L2....@fx15.am4>, Bart <b...@freeuk.com> wrote:
> On 09/08/2018 03:18, blm...@myrealbox.com wrote:
> > In article <VuzaD.2312612$I77.1...@fx44.am4>, Bart <b...@freeuk.com> wrote:
>
> >> This is what makes it so silly. You have to obfuscate your code just to
> >> get to the starting point of those other languages. And do it in a
> >> million places (eg. everywhere you might use ++i).
> >>
> >
> > I disagree about ++i. I don't know about your use cases, but for
> > me by far the most common one for ++i is as the increment part of a
> > "for" loop, and I'm not sure I can think of any way you could get
> > overflow in the simple and most-common-for-me case:
> >
> > for (int i = 0; i < N; ++i) { .. }
> >
> > where N is an "int".
>
> Checking ++i would be an extreme case. But it can happen if i is
> modified inside the loop. Or there is perhaps a conditional --i in the
> loop (so it does many more iterations) and there is also ++j. Or you
> don't know what N is (result of a complex expression perhaps).

Okay, I wasn't specific enough about the use case. I tend to regard
modifying the loop counter in the body of the loop as tricky/unclear
code to be avoided. And if evaluating N might go wrong, then the
problem isn't with the ++i, is it?

> Or, when the for-loop header is much more complicated than this. Because
> it allows it, it seems to encourage some people to cram so much into the
> loop header, you can't even be certain which if any is the 'loop index'.
>
> Or even you write a loop with an unknown start index:
>
> for(; (c=(*fmt))!=0; ++fmt){
>
> This is an actual example where fmt is a function parameter. In this
> case it's a pointer, but it could equally have been an int.

Yipes. Just .... yipes. Is it even legal to increment function
pointers?! (I could look it up, but here in clc someone may know
without consulting the standard. :-) )

> > (Arguably if i is being used an array index the right type for it is
> > size_t, but leave that for now.)
> >
> >> [ snip ]
> >
> > (I read the rest with interest but in the interest of replying to
> > everyone won't comment more. I shouldn't be surprised by the number
> > and length of replies, but -- I was.)
>
> Overflow and UB is one of those topics...

Apparently. :-)?

> >> Good luck putting that across...
> >>
> >
> > Yeah. But really, I'm inclined to think that in any course in a
> > difficult and technical subject there are going to be points in a
> > first course where you just have to say "the details here are beyond
> > the scope of this course, but be aware that what we do in this course
> > is sort of a first approximation".
>
> I can't remember in my first programming course that that was covered at
> all (or any subsequent ones for that matter). It's enough to know that
> numbers in such languages have limited range, and being binary have
> funny-looking limits when expressed as decimal.
>
> But I think it could be useful in such courses to use a friendly
> implementation that traps on overflows and makes other runtime checks.

Could be, though then the students aren't really learning to write
proper C, are they? FSVO "proper", maybe. :-)

David Brown

Aug 10, 2018, 5:18:27 AM8/10/18
to
And again, /you/ are missing the point. No one working on serious
programming /does/ run their compiler like that.

People writing code that will work on a range of platforms and compilers
may aim for such "cc program.c" compilations to work. But they don't
use that for their development - they enable warnings, they use
debugging options, they use memory leak detectors, they use linters and
checkers - whatever makes their development process simpler.

And people writing code for a rocket control system would never even
bother with how the code compiles without the right options - the
compiler and the options used are considered critical parts of the
project and are not left to the whims of a random user.

> And the other part of it is that a major compiler like MSVC didn't even
> support that feature.

MSVC is a major C++ compiler - it is not a major C compiler, and is
badly out of date for C. AFAIK, it does not even support all of C99,
never mind C11. But it has static_assert in C++11 mode.

>
>> Why does my gcc not give me swamps of other warnings? Because I know
>> how to use my compiler, and how to write quality C code, and you don't.
>
> Yes I know. That means that when creating, modifying or extending an
> application of tens of thousands of lines, every single line of it must
> be perfect and with no warning before you can start testing.
>

Why would I want to test code that isn't as good as I can get it at the
coding stage? Why would I want to write tens of thousands of lines
without testing underway? Why would I want to write code that triggers
warnings in the first place?

You seem to imagine my development as a process of writing huge swaths
of poor quality code, then generating vast numbers of warnings, then
going back and changing the code. I don't - I only get warnings if I've
made a mistake in my coding, or while I've not yet completed a
particular part of the code.

> Including all the code you've just added, that you don't know works yet,
> and that you might have to strip out and rewrite half an hour later,
> after spending all that time to remove all possible things might give a
> warning.
>
> Some of us just work in different ways which are not necessarily wrong
> just because you don't work the same way.

You might well have a different choice of which warnings to use - some
of the warnings I use are definitely in the "style" category. That's
fair enough. But warnings are there to show you have got something
wrong - and I am sure you too aim to write correct code at the start
rather than writing a jumble.

About the only warnings I can think of where you might want to disable
them early in the development process, then enable them later on and
perhaps get a bunch of warnings is the various "unused" warnings. If
you write in a style where these are triggered often, just disable those
warnings.

>
>> I expect people writing serious code to use their tools, and write the
>> C code, in a manner far closer to the way I do than the way /you/ do.
>>
>>> But you shouldn't be so dismissive.
>>
>> Of course I should be dismissive of your messing about with compilers
>> - it is of no use to anyone until you learn to use tools properly.
>
> Excuse me. It is MY compiler that will show that error without doing
> anything at all except run it with the name of the source file as input.
> Could it get more straightforward than that?
>
> You are saying YOUR tool is superior because /you have to tell it how to
> compile programs and which errors to detect and which to ignore/? It
> sounds like it doesn't know its job! I guess you would call that
> 'flexibility', because sometimes it really doesn't matter if that rocket
> crashes.

Again, you are letting the point fly past you in your eagerness to feel
wronged.

I have said /many/ times that I would prefer gcc to have more warnings
and stricter checking by default. It would make almost no difference to
serious developers who understand the importance of using languages and
tools correctly, but it might help some amateurs.

Yes, gcc is superior to your compiler in many ways. Better warnings and
error checking is most certainly one of them. Your tool gets a plus for
convenience by enabling some checks automatically, but that does not
outweigh the fact that gcc has many more useful checks (and other
options) that are available.

I am dismissive of your continual complaints about the problems you have
with gcc, because you fail to use it correctly despite all the help and
advice you are given. Don't you think that it is fair?

(It has been pointed out to me that repeatedly telling you this is not
helpful. That is probably correct.)

>
>> Screens of warnings for C has never been an issue with gcc,
>
> Yes, you said. You never get warnings.

You snipped the bit about "for people who can use development tools
properly".

Warnings are helpful. Huge piles of warnings are not helpful - so it is
a good idea to use development practices that don't give you mountains
of warnings. Write some code, compile it, fix any errors or warnings,
test it. Rinse and repeat. Don't write mountains of code and then
generate mountains of warnings.

And use a decent IDE. When you do a build, you will have a list of
errors, warnings, and other messages - neatly sorted. You'll have
markers in your editor windows, you can jump directly to the part of
your code that triggered the message (it is not necessarily the part
with the problem, but it's a start). Many warnings can be generated by
the IDE as you type, making it as fast as possible to see and correct
the error.

>
> So, what is the purpose in having warnings if all of them have to be
> taken seriously according to you? Do you really never see them? Or do
> you see them before they are fixed?

I see them and then fix them.

> If which case you will also get
> screenfuls of them from time to time.

That is very rare - and only in cases where I know exactly what change I
have made that caused the effect. For example, if I rename a commonly
used header file, I'd expect piles of warnings from the build due to
missing include files, undefined identifiers, and so on.

> Then what; do you print them off
> and go through them one by one and not start to actually run your
> program until they've all gone?

I don't print them - I use an IDE for most of my programming, with two
large screens. (My desk is covered in printouts of other things, paper
with notes, etc., but not for a list of warnings.) The warnings are
there in the IDE, so I can click on them to navigate to the problem if
necessary. (It is rarely necessary - with short edit/compile cycles the
warnings would most likely be in the code being worked on.)

And no, I rarely bother running my code until the warnings are fixed.

>
> (And then you find there are bugs as I suggested above.)
>
> My compiler never gives warnings. It gives errors and stops at the first
> error. I fix that and move on. I can't keep in mind more than one
> anyway, and I'm not going to print out a long list to go through. You
> just compile again; if there is another error, you will find out in 50
> msec.
>

Once I have got into the main part of development for a project, I also
turn all warnings into errors - I don't want to accidentally accept code
with known problems. But my tools don't stop on the first error - that
would make the whole process a good deal slower. (And before you
mention compilation speed, it is the psychological break that is the
issue, not the compile speed. If my projects were a lot bigger and full
of C++ header libraries, then the compile speed would be relevant.)


David Brown

Aug 10, 2018, 6:31:24 AM8/10/18
to
On 09/08/18 21:37, Ben Bacarisse wrote:
> David Brown <david...@hesbynett.no> writes:
>
>> On 09/08/18 13:39, Ben Bacarisse wrote:
> <snip>
>>> It's better to test that ADC_MAX is no greater than INT_MAX / SCALE_TOP.
>>
>> If you like. It was a quick example - real life details will depend on
>> the rest of the code. The key point is that you don't use comments to
>> express something that can be written in code.
>
> Yes, but I was not commenting on that point. I was commenting on how
> best to test the values in an assert.
>

I certainly agree that you can't check for overflow using the same
expression in the static assert as in the original code, because the
same rules apply for ranges, overflows, etc.

>>> You may never be scaling 64-bit ADC values, but it's better to keep to a
>>> general rules than working out what's likely to be safe from case to
>>> case.
>>
>> I would /love/ to see an ADC that gave 64-bit values!
>>
>> No, general rules are /not/ always helpful. Too much generality is
>> counter-productive.
>
> I think you may be reading more into what I wrote than I intended.

That seems to have been the case.

> The
> general rule was just to avoid the potential UB that you are testing
> for. I don't think that's counter-productively over general. A compile
> time divide is as simple as a multiply, so I'm not advocating something
> hugely complex to cope with an extremely rare possibility. I'm
> advocating something simple so you don't have to persuade anyone that
> it's safe.
>

Agreed.

> Now you know your compiler will stop on compile-time overflow, but
> that's just the sort of thing I'd comment because it can't be tested
> for. Better to avoid the issue altogether in my book.
>

Fair enough.

> <snip>
>> What typos would that check? Again, let's be realistic. Someone might
>> define ADC_MAX to be 1024 when they meant 1023, but no one is going to
>> write it as -1023 by mistake.
>
> Is your list of possible mistakes publicly available? It would make
> everyone's life simpler if we all knew which bugs simply won't occur!
> :-)
>

If I'm right that no one would make the mistake of writing -1023, then
this one is already on /your/ list. Figuring out what might be likely
mistakes that should be checked, and what is so unlikely that the
possibility could be ignored, is not an easy task. In this case I would
put a check that ADC_MAX is one less than a power of two as higher
priority than a test for it being negative - but that is due to the
nature of common ADC's and would not be a general rule. (And there are
uncommon ADC's with different maximums.)


>>>> Otherwise since ADC_MAX and SCALE_TOP are both constants of type "int",
>>>> the multiplication will be done as "int" - and overflow if they are too
>>>> big. (Your compiler will probably warn you about that.)
>>>
>>> Is there something about static_assert (I've not studied that yet) or
>>> integer constant expressions in general that justifies the "might"?
>>
>> I'm sorry, I can't see which "might" you are referring to.
>
> Now this is odd. There is no "might" there -- quite clearly. So I went
> back up the thread to read the original and there I read "might" instead
> of "will" (on /two/ readings!). Something about the context was making
> me misread it even when I was looking hard to be sure what you'd said.
> Truly mysterious. Anyway, sorry about that!
>

Not a problem at all. I managed to read a whole bunch of stuff in your
post that wasn't there at all, so getting one word wrong is nothing!



David Brown

Aug 10, 2018, 6:34:29 AM8/10/18
to
Nor did I, so I've learned something new here.

Roll on, C17 with the defect reports integrated in the document (and a
rather nicer typesetting, IMHO).

>
> It's tempting me to revise my advice since, now, the clearest way to
> assert no overflow is to try to provoke it. However, I'm still not sure
> I'd do that. If, for example, someone decides to make the constants
> unsigned (1024u) the multiplication will now not overflow but the
> division would still detect the problem.
>

Agreed.

fr31...@gmail.com

Aug 10, 2018, 6:37:48 AM8/10/18
to
On Tuesday, August 7, 2018 at 11:10:35 PM UTC-4, blm...@myrealbox.com wrote:

> the recent thread
> about undefined behavior reminds me that I never know quite what to
> say about overflow in arithmetic on signed integers.
>

As with any math-critical section of code, intrinsics or in-line
assembly offer the best solution.

Integer addition, signed or unsigned, and overflow checking can be
done directly with intrinsics.

The development of processor technology has far outstripped the
capacity of both C and C compilers. In many kinds of problems
intrinsics offer the best solution.

Reinhardt Behm

Aug 10, 2018, 6:40:49 AM8/10/18
to
Just tested this with GCC 4.7. It complains if there is an overflow in the
expression.
This is misleading for the purpose of this assert, but should give a hint
that there is something wrong with the values ADC_MAX or SCALE_TOP.

--
Reinhardt

Ben Bacarisse

Aug 10, 2018, 6:45:45 AM8/10/18
to
blm...@myrealbox.com <blmblm.m...@gmail.com> writes:

> In article <F7VaD.1975$%L2....@fx15.am4>, Bart <b...@freeuk.com> wrote:
<snip>
>> Or even you write a loop with an unknown start index:
>>
>> for(; (c=(*fmt))!=0; ++fmt){
>>
>> This is an actual example where fmt is a function parameter. In this
>> case it's a pointer, but it could equally have been an int.
>
> Yipes. Just .... yipes. Is it even legal to increment function
> pointers?! (I could look it up, but here in clc someone may know
> without consulting the standard. :-) )

It's not permitted, and any attempt to do so must be diagnosed. It's
what C calls a constraint violation -- a compile-time checked error that
is as close to "illegal" as the standard gets.

But that's not what Bart said. fmt is a function /parameter/; something
like

void function(char *fmt, ...)
{
...
}

He did say "it's a pointer" but it could be a pointer to any complete
object type, as in my sketch above.

Incomplete types have unknown size so you can't increment pointers to
them. void is the classic example, as well as structs whose members are
not yet known. After only seeing

struct link;

you can declare

struct link *lp;

but you can't write ++lp.

<snip>
--
Ben.

David Brown

Aug 10, 2018, 7:05:00 AM8/10/18
to
On 10/08/18 03:58, blm...@myrealbox.com wrote:
> In article <pkguek$2le$1...@dont-email.me>,
> David Brown <david...@hesbynett.no> wrote:
<snip>
>>
>> It /always/ matters, so you should /always/ check your ranges. That
>> does not mean you always need to add run-time checks to the code - it
>> means using appropriate checks for the task in hand. That might mean
>> simply knowing the ranges in question. It might mean comments or
>> documentation. It might mean notes in the answer to the homework
>> question. It might mean careful choices and naming of types or
>> functions to keep things clear. It might be entirely obvious due to the
>> nature of the task. There is a huge difference in the effort needed
>> here for a quick test program and a rocket guidance system. But you
>> should /always/ think about it and be sure your code is safe enough for
>> the task. And that should, IMHO, be drilled into the students from the
>> start of their first class - it should not be an afterthought!
>
> Good point. Arguably I should stress that more, though I can be kind of
> fanatical about checking for errors in user-supplied program inputs,
> to the point where some students complain. Not exactly the same thing,
> maybe.

It can be enough simply to point out that there are limits to the inputs
of a sample program. But you want your students to be in the habit of
thinking about this sort of thing from early on.

>
>>> One question in my mind was *how* to check for potential overflow, but
>>> I notice that a couple of replies give methods for that (which on reading
>>> make me wonder why I thought the problem was hard :-( ).
>>>
>>
>> General checking for potential overflow is often hard to do well - it's
>> easy to get things slightly wrong, or end up with ugly and messy code.
>> It is not often a good idea to try to do the kind of "is it safe to add
>> these two int's" checks - it is error prone coding and usually not the
>> logical thing to do. Instead, look directly at the parameters for the
>> function being called - check those for sensible values. Then make sure
>> you are using types that are big enough for the purpose - fixed size
>> types in <stdint.h> are often a very easy way to get that.
>
> If you're writing, say, a simple four-function calculator program,
> I think you pretty much have to check all operations for overflow,
> don't you?

I said that you don't /often/ have to check for general arithmetic
overflow, not that it is never the case!

> I guess fixed-size types might make that a little easier --
> compute a result twice the size of the inputs and check?

Often that is quite a convenient way to handle it.

>
> Silly example? yeah, but not atypical of the kinds of toy programs
> often used in beginning courses.
>

You need to present a variety of methods in your course.

David Brown

Aug 10, 2018, 7:14:22 AM8/10/18
to
On 10/08/18 04:05, blm...@myrealbox.com wrote:
> In article <F7VaD.1975$%L2....@fx15.am4>, Bart <b...@freeuk.com> wrote:
>> On 09/08/2018 03:18, blm...@myrealbox.com wrote:
>>> In article <VuzaD.2312612$I77.1...@fx44.am4>, Bart <b...@freeuk.com> wrote:
>>
>>>> This is what makes it so silly. You have to obfuscate your code just to
>>>> get to the starting point of those other languages. And do it in a
>>>> million places (eg. everywhere you might use ++i).
>>>>
>>>
>>> I disagree about ++i. I don't know about your use cases, but for
>>> me by far the most common one for ++i is as the increment part of a
>>> "for" loop, and I'm not sure I can think of any way you could get
>>> overflow in the simple and most-common-for-me case:
>>>
>>> for (int i = 0; i < N; ++i) { .. }
>>>
>>> where N is an "int".
>>
>> Checking ++i would be an extreme case. But it can happen if i is
>> modified inside the loop. Or there is perhaps a conditional --i in the
>> loop (so it does many more iterations) and there is also ++j. Or you
>> don't know what N is (result of a complex expression perhaps).
>
> Okay, I wasn't specific enough about the use case. I tend to regard
> modifying the loop counter in the body of the loop as tricky/unclear
> code to be avoided.

Yes. Modifying the loop counter in the body is one of these points
where you have to teach the students that it is /possible/ in C, and
therefore they might come across it in other people's code - but they
should never do it themselves. (I'm sure there are people who could
tell you why they think it is a good technique in some cases - either
they are wrong, or the examples are significantly more advanced than
the level you are teaching.)

> And if evaluating N might go wrong, then the
> problem isn't with the ++i, is it?
>
>> Or, when the for-loop header is much more complicated than this. Because
>> it allows it, it seems to encourage some people to cram so much into the
>> loop header, you can't even be certain which if any is the 'loop index'.
>>
>> Or even you write a loop with an unknown start index:
>>
>> for(; (c=(*fmt))!=0; ++fmt){
>>
>> This is an actual example where fmt is a function parameter. In this
>> case it's a pointer, but it could equally have been an int.
>
> Yipes. Just .... yipes. Is it even legal to increment function
> pointers?! (I could look it up, but here in clc someone may know
> without consulting the standard. :-) )

He said function /parameter/, not function /pointer/. But yes, you can
increment your way along an array of function pointers (of the right
kind) - what gets incremented there is a pointer /to/ a function
pointer, which is an ordinary object pointer.

>
>>> (Arguably if i is being used an array index the right type for it is
>>> size_t, but leave that for now.)
>>>
>>>> [ snip ]
>>>
>>> (I read the rest with interest but in the interest of replying to
>>> everyone won't comment more. I shouldn't be surprised by the number
>>> and length of replies, but -- I was.)
>>
>> Overflow and UB is one of those topics...
>
> Apparently. :-)?
>
>>>> Good luck putting that across...
>>>>
>>>
>>> Yeah. But really, I'm inclined to think that in any course in a
>>> difficult and technical subject there are going to be points in a
>>> first course where you just have to say "the details here are beyond
>>> the scope of this course, but be aware that what we do in this course
>>> is sort of a first approximation".
>>
>> I can't remember in my first programming course that that was covered at
>> all (or any subsequent ones for that matter). It's enough to know that
>> numbers in such languages have limited range, and being binary have
>> funny-looking limits when expressed as decimal.
>>
>> But I think it could be useful in such courses to use a friendly
>> implementation that traps on overflows and makes other runtime checks.
>
> Could be, though then the students aren't really learning to write
> proper C, are they? FSVO "proper", maybe. :-)
>

Surely they are learning to write correct code? If not, what's the point?

They should probably learn some basic debugging techniques while they
are at it, and certainly should learn about the use of tools like gcc
and clang sanitizers to help catch problems.


David Brown

Aug 10, 2018, 7:45:54 AM8/10/18
to
(I think /I/ was the "other poster" who mentioned fixed size types :-) )

Sometimes you can't have types and numbers for which you are sure
overflow cannot occur, and then you are stuck with trying to predict it
or detect it. Checks on ranges before the operation before carrying it
out is, in fact, a way of avoiding the overflow. /Detecting/ would be
to carry out the operation in a defined manner (typically by using
bigger types, or by converting to unsigned types) to avoid UB, and then
determining afterwards if it worked correctly.

I am just advising that you don't use inappropriate or overly complex
checks.


>>>> int add( int const i, int const j )
>>>> { if( addintintoverflow( i, j ))
>>>> return fprintf( stderr, "overflow.\n" ), EXIT_FAILURE;
>>>> return printf( "%d\n", i+j ) > 0 ? EXIT_SUCCESS : EXIT_FAILURE; }
>>>>
>>
>> Whatever you teach your students, I hope you don't teach them to write a
>> function called "add" with two "int" parameters and an "int" return type
>> that returns the length of a string printed out!
>
> Ouch. I didn't really notice that, but yeah, not perhaps the most
> self-documenting function name, is it?
>
>> And I /really/ hope you don't teach them to follow the incredible
>> formatting style shown in these examples.
>
> In the examples I show them I follow my own preferred style, which is
> less, um, "dense" than the code below.

Less "dense" is usually good.

>
> I tell them there are many possible formatting styles and that they
> should choose one that's readable and use it consistently. Arguably
> I could do a lot more in the way of explaining what I mean by that
> and pushing them harder to do it.

You can't (or at least shouldn't) be too forceful about formatting
styles in a teaching class. Different people, different groups,
different projects use different styles. And when a programmer starts
working with an existing project, group, department, etc., they usually
have to adopt the existing style. Consistency is often more important
than anything else. All you can do is for /you/ to use a neat style,
and to encourage them to use a style that is clear, easy to follow, and
has plenty of space to improve readability - and to give you space for
your red pen on printed out assignments :-)

>
>>>> int read( int * p )
>>>> { int ch;
>>>> while( ch = getchar(), isspace( ch ));
>
> I like using assignments in the test of a "while", but this one's too
> dense/tricky even for me.
>

Personally, I have a greater aversion to it than most people. Teach
your students that some people write code like that, and they need to
know what it means, but they don't have to choose to write it themselves.

>>>> if( ch < 0 )return 0;
>
> Why test < 0? Is this meant to be an EOF check? Why not *use* the
> EOF constant?
>

EOF is guaranteed to be less than zero, and valid character returns are
guaranteed to be non-negative. But testing explicitly for EOF would
work too.

Tim Rentsch

Aug 10, 2018, 7:53:32 AM8/10/18
to
blm...@myrealbox.com <blmblm.m...@gmail.com> writes:

[...]
>>> In article <overflow-20...@ram.dialup.fu-berlin.de>,
>>> Stefan Ram <r...@zedat.fu-berlin.de> wrote:
>>>>
>>>> int read( int * p )
>>>> { int ch;
>>>> while( ch = getchar(), isspace( ch ));
>
> I like using assignments in the test of a "while", but this one's
> too dense/tricky even for me.

A funny combination of views. As far as using assignments in
while() conditions go, this use is among the simpler ones.

This particular case though is probably better written thusly:

do ch = getchar(); while( isspace( ch ) );

which kills several semantic birds with one syntactic stone.

David Brown

Aug 10, 2018, 8:00:00 AM8/10/18
to
The first programming course for CS majors should have very different
requirements, and very different contents, from a first programming
course for ENGR students.

Engineering students need something practical to help do engineering
problems. That means something that works well with lots of numerical
operations, matrices, vectors, floating point arithmetic, etc. It
should be a well-known language with plenty of tools. Fortran was a
popular choice, but now I would recommend Python with numpy/scipy. C is
a poor choice - you spend too much effort faffing around with manual
memory handling, limited file support and almost non-existent string and
text handling. With Python and numpy, code to read in a table of
numbers from a csv file is done in a few lines - you can concentrate on
the important stuff rather than the mechanics of the language.

Programming students need something with a strong theoretical basis -
they should be looking at verifying correctness of their algorithms,
separating functions pre-conditions and post-conditions, identifying
invariants in data structures, and that sort of stuff. It is a /good/
thing if the language is not so well known - it is a benefit if no one
in the class has heard of it before starting the course. It does not
have to be a language that anyone would ever use after leaving
university - its purpose is to help teach about programming, not to
train code monkeys. So Scala is a fine choice here. When I was at uni,
we used something akin to Haskell.

Programming students should take up C when learning about embedded
programming, OS design, interfacing, low-level programming, etc. That
comes later.


>
> Curiously enough, I'm hearing now for the first time that they (ENGR)
> also mentioned Python as an option, which -- um, this would seem
> to contradict the "must have pointers" requirement?! So now I'm
> completely baffled about what they really want and why. If Python
> was an option I'm not sure why it apparently wasn't considered
> more seriously when we first started offering this course (not many
> years ago).
>
> Even more curious, the person I'm hearing this from (who pays more
> attention to pedagogy than I do) cited a scholarly paper purporting
> to demonstrate that C++ is actually a better language for beginners
> than Python. Seems totally counter-intuitive, doesn't it? Possibly
> digging into details would lead one to question what they measured
> and how, but still, interesting?? I've taught CS1 in C++ and it's
> not something I'm eager to do again.
>

C++ is a huge language. You can teach some of it to beginners, but you
can't expect to cover everything in a single course.

If they ever ask you to teach C++, insist that you get to teach modern
C++ - at least C++11, but preferably newer. It makes things much easier.

Tim Rentsch

Aug 10, 2018, 8:34:33 AM8/10/18
to
blm...@myrealbox.com <blmblm.m...@gmail.com> writes:

> In article <kfnpnys...@x-alumni2.alumni.caltech.edu>,
> Tim Rentsch <t...@alumni.caltech.edu> wrote:
>
>> blm...@myrealbox.com <blmblm.m...@gmail.com> writes:
>>
>>> I teach C programming to undergraduates, and while I do my best to
>>> teach them how to use the language correctly, the recent thread
>>> about undefined behavior reminds me that I never know quite what to
>>> say about overflow in arithmetic on signed integers. [...]
>>
>> For programming in C, it's important to understand the notion of
>> undefined behavior. This topic deserves treatment on its own, not
>> as a footnote or parenthetical explanation of something else (ie,
>> such as integer arithmetic in this case). And that explanation
>> should come before other topics, like arithmetic operations, where
>> undefined behavior is part of the definition (or lack thereof) of
>> the operations involved.
>
> Mmph. I think you have a point for students with some previous
> programming experience. But for total beginners? and at least
> a few of the students in that course for ENGR majors are in that
> category.

IMO, yes, even for total beginners. I'm talking about 10 or
maybe 15 minutes out of one lecture. They should be made aware
of the general notion, even if they don't understand all the ins
and outs and whys and wherefores. One change: I now think
"safe" and "unsafe" may not be as good as "well behaved" and
"poorly behaved". (I would like something better than "poorly
behaved" but haven't thought of anything yet.)

>> This explanation needn't go into all the gory details. You might
>> say that some programming languages are "safe" (or perhaps mostly
>> safe), in that if a program does something wrong what happens is
>> still fairly well delineated. C isn't like that: in C some
>> operations are "unsafe", and the only thing that can be relied on
>> is that these cases should not be relied on. Follow with a couple
>> of examples in each of the two categories, safe and unsafe. After
>> (and only after) introducing the concept of undefined behavior
>> should the class then get on to signed arithmetic and how to deal
>> with it.
>
> So can you think of examples of "safe" versus "unsafe" suitable for
> total beginners? that can be taught before teaching anything about ....
> Well, pretty much anything, right? signed integer arithmetic,
> reading input (which is a whole can of worms by itself :-( -- there's
> a reason I didn't put any attempt at that in my example :-) ).

For safe: testing whether a variable is equal to zero. This is
safe in C as well as in more well-behaved languages. (I am
assuming here that the idea of variables has at least been
introduced.)

(Strictly speaking, in C this is potentially UB if the variable
holds a trap representation or hasn't been initialized, but for
this course it seems okay to leave out that level of detail.)

For unsafe: dividing by zero. In C this is outright UB.
In more well-behaved languages it at least guarantees a run-time
trap of some variety. The operation of division is so familiar
that it shouldn't need any previous coverage.

That's only one example in each category, the best I could come
up with on short notice. As the course outline develops I
expect other examples will occur to you, perhaps drawing on
some area covered later but appealing to students' intuition
about how things may work. Your students may be ignorant
about programming but presumably they are not stupid.

And just as I was writing this another example presented itself:
using an uninitialized variable. Obvious eh? :) Some other
languages simply don't have uninitialized variables, but C
certainly does, and in most cases using one is UB, or at least
potentially UB.

> [...]
> Anyway thanks for the long reply.

Thank you, I'm glad to hear my comments found some resonance.