
mixed-sign arithmetic and auto


Andrei Alexandrescu (See Website For Email)

Jan 5, 2008, 2:57:02 AM

I have had a mini- or probably micro- or even milli-epiphany: using
"auto" will exacerbate C's broken unsigned arithmetic, which C++ also
inherited.

As we all know, in C, any expression that has unsigned within a radius
of a mile will also have type unsigned. This is a simple rule but one of
remarkable bluntness because it assigns many operators the wrong result
type. Consider u an unsigned int value and i a signed int value.

1. u * i and i * u yield unsigned, although they should yield a signed value.

2. u / i and i / u also yield unsigned, although again they should both
return a signed value.

3. u + i and i + u again yield unsigned. Here it is not clear which
signedness would be more helpful. In my personal opinion, the
tie-breaker should be a rule that "does not yield the wrong result for
small, reasonable inputs". I'm basing this on the assumption that most
integrals are small in absolute value, something that I recall was
measured in the context of conservative garbage collectors. By that
rule, u + i and i + u should be typed as signed. Typing them as unsigned
makes the operation fail for small numbers, e.g. 0 - 1u yields a large
unsigned number.

4. This is the funniest one: -u actually returns unsigned!

(As an aside to my point: comparisons convert both numbers to unsigned,
so i < u will first convert i to unsigned. This again fails for small
reasonable numbers because -1 will never be smaller than anything. Some
compilers warn about mixed-signed comparisons.)
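
A short test program makes the surprises concrete (the printed values
assume a 32-bit int):

#include <iostream>

int main() {
    int i = -1;
    unsigned u = 1u;

    std::cout << 0 - u << '\n';   // 4294967295: 0 - 1u wraps around
    std::cout << (i < u) << '\n'; // 0: i converts to 4294967295u first
    std::cout << -u << '\n';      // 4294967295: -u is still unsigned
}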

C and C++ partly compensate for their mishandling of mixed-sign operations
by being generous with implicit conversions: int converts to unsigned
and back no problem. So to avoid the whole "I got the wrong signedness"
business, you don't even need a cast - only a named value:

int a = i + u; // fine
unsigned b = i + u; // also fine

Complex expressions are still exposed to issues, but they are only a
subset of mixed-sign code. Here's an example that might surprise some:

int i = -3;
unsigned u = 2;
int x = (i + u) / 2;

The "correct" value is zero, but x receives the largest integer value.

This all is hardly news to anyone hanging out around here. My
milli-epiphany is that "auto" will make all of the ambiguities worse.
Why? Because C and C++98 require a type specification whenever a value
is defined. But in C++0x, if auto is successful, people will use "auto"
knowing it does the right thing without so much as thinking about it:

auto a = i + u;

Oops... a will be unsigned, even though the user meant (and, without
being an expert in the vagaries of integral arithmetic, sincerely
thought) it is int. After all, the code:

int a = i + u;

compiles without complaint, so it's intuitive that replacing "int" with
"auto" is harmless and actually better, because it will nicely become
"long" if necessary. So, all things considered, "auto" does not always
do the right thing!

So I thought I'd share this thought with you all and ask if there are
any ideas on how to solve the problem elegantly. My prediction is that,
if we keep the current rules, "auto" will actually do more harm than
good for mixed-sign arithmetic. As changing semantics is not an option,
it might be useful to look into statically disabling certain mixed-sign
operations.


Andrei

--
[ See http://www.gotw.ca/resources/clcm.htm for info about ]
[ comp.lang.c++.moderated. First time posters: Do this! ]

kwikius

Jan 5, 2008, 10:47:20 AM

On Jan 5, 7:57 am, "Andrei Alexandrescu (See Website For Email)"
<SeeWebsiteForEm...@erdani.org> wrote:

<...>

> So I thought I'd share this thought with you all and ask if there are
> any ideas on how to solve the problem elegantly. My prediction is that,
> if we keep the current rules, "auto" will actually do more harm than
> good for mixed-sign arithmetic. As changing semantics is not an option,
> it might be useful to look into statically disabling certain mixed-sign
> operations.

The simple answer is to avoid inbuilt 'C' arithmetic types. Create a
framework of Concepts representing the semantics of operations on UDT
ints, and allow the user to create custom types with the semantics
(conversion, binary ops etc.) they want. (Also, for example, I would love
types of a guaranteed size, overflow behaviour, etc.)

The problem is to try to make defining your own types as painless as
possible. The current methodology is to define operations directly on
types, which is all wrong IMO. It is possible to overload operations
based on Concepts; it is also possible to create function definitions
based solely on Concepts (including the ability to opt out or override
for certain types or 'archetype' combinations).

With the combination of these mechanisms, it is possible to create
very lightweight type definitions with a large amount of functionality
and well documented semantics, without too much effort.
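
To give a flavour of it in current C++ (a sketch only: the names are
mine, not from quan, and a real framework would drive the overloads from
Concepts rather than a single template):

// A UDT integer whose mixed-sign operations simply do not compile.
template <typename T>
struct strict_int {
    T value;
    explicit strict_int(T v) : value(v) {}
};

// Addition is defined only for operands of identical type, so
// signed/unsigned mixtures are rejected at compile time.
template <typename T>
strict_int<T> operator+(strict_int<T> a, strict_int<T> b) {
    return strict_int<T>(a.value + b.value);
}

int main() {
    strict_int<int> i(-3);
    strict_int<unsigned> u(2u);
    strict_int<int> ok = i + i; // fine: same signedness
    // i + u;                   // error: no matching operator+
    (void)ok;
}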

As to how well this functionality plays with C++0x Concepts I don't
know, but I hope to publish a version of my quan physical quantities
library at some stage which demonstrates the techniques in current C++
(tested on VC7.1, VC8.0 and gcc4).

(quan models physical quantities, but the basic problems are similar
for fundamental arithmetic types, as you described above)

Side Note for D. Making operator functions only permissible as member
functions of types (think about it) knackers this methodology AFAICS,
which is a shame!

regards
Andy Little

Francis Glassborow

Jan 5, 2008, 10:50:02 AM

Andrei Alexandrescu (See Website For Email) wrote:

> So I thought I'd share this thought with you all and ask if there are
> any ideas on how to solve the problem elegantly. My prediction is that,
> if we keep the current rules, "auto" will actually do more harm than
> good for mixed-sign arithmetic. As changing semantics is not an option,
> it might be useful to look into statically disabling certain mixed-sign
> operations.
>

I suppose that we could require a cast when auto is used in the context
of a mixed arithmetic initialisation expression. However, as you
illustrated, that still leaves the door open for errors.

I think I would prefer to strongly encourage compilers to warn any time
that mixed mode arithmetic is used in an initialiser expression.

That gives me pause for a moment's thought: perhaps we could require a
cast when using the new initialisation syntax with a mixed mode
initialiser expression:

int something { u + i}; // error
requires that you write it as either

int something(u + i); // hopefully generating a compile time warning

or

int something { int (u + i) };

and:

auto something { u + i}; // error
requires that you write it as either

auto something(u + i); // hopefully generating a compile time warning

or

auto something { int (u + i) };


--
Note that robinton.demon.co.uk addresses are no longer valid.

Andrei Alexandrescu (See Website For Email)

Jan 5, 2008, 6:32:10 PM

Francis Glassborow wrote:
> Andrei Alexandrescu (See Website For Email) wrote:
>
>> So I thought I'd share this thought with you all and ask if there are
>> any ideas on how to solve the problem elegantly. My prediction is that,
>> if we keep the current rules, "auto" will actually do more harm than
>> good for mixed-sign arithmetic. As changing semantics is not an option,
>> it might be useful to look into statically disabling certain mixed-sign
>> operations.
>>

(All -- sorry for the few incoherent sentences toward the end of my
previous message. I haven't proofread it, and it shows.)

> I suppose that we could require a cast when auto is used in the context
> of a mixed arithmetic initialisation expression. However, as you
> illustrated, that still leaves the door open for errors.
>
> I think I would prefer to strongly encourage compilers to warn any time
> that mixed mode arithmetic is used in an initialiser expression.

There's a possibility for the compiler to properly track ambiguous-sign
results. Imagine the compiler defines internal types intbits and
longbits, which mean "value of ambiguous signedness". Then any of the
mixed-sign operations yields either intbits or longbits.

These types would not be accessible to user code, so if somebody tries
to write this:

auto a = u + i;

they see the error message: "Cannot infer type of a from a value of
ambiguous signedness".

The beauty of the scheme is that intbits does implicitly convert to int
and unsigned int, so as long as the user _does_ decide the desired
signedness of the result, the code goes through:

int a = u + i; // fine, intbits -> int
unsigned b = u + i; // fine, intbits -> unsigned

Another nice element of the scheme is that the sign ambiguity is
properly taken care of in complex expressions:

int a = (u + i) & i;

This works because:

a) u + i returns intbits

b) it's legal to do bitwise AND between intbits and int (the sign is
irrelevant) returning intbits

c) the intbits result gets converted to int to initialize a

Again, if a were auto, the code would not compile.

So in a nutshell the compiler would use these two types to transport the
information that ambiguous signedness is in force. As soon as the user
tries something that has sign-dependent semantics, an error would occur
(or warning for legacy code):

int a = (u + i) / 2; // warning: ambiguous-sign operation
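
To visualize it, here is a library-only approximation (a sketch: in the
actual scheme intbits would be a compiler-internal type, and mixed_add
below merely stands in for what the compiler would do for u + i):

struct intbits {
    unsigned bits; // raw two's-complement bits, signedness undecided
    // The conversion to int is implementation-defined for large values
    // in today's C++, but it illustrates the intended implicit
    // conversions.
    operator int() const { return static_cast<int>(bits); }
    operator unsigned() const { return bits; }
};

// Stand-in for the compiler's typing of u + i.
intbits mixed_add(unsigned u, int i) {
    intbits r = { u + static_cast<unsigned>(i) };
    return r;
}

int a = mixed_add(2u, -3);      // fine: intbits -> int
unsigned b = mixed_add(2u, -3); // fine: intbits -> unsigned
// auto c = mixed_add(2u, -3);  // deduces intbits; the compiler version
//                              // would reject this deduction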

> That gives me pause for a moments thought, perhaps we could require a
> cast when using the new initialisation syntax with a mixed mode
> initialiser expression:
>
> int something { u + i}; // error
> requires that you write it as either
>
> int something(u + i); // hopefully generating a compile time warning
>
> or
>
> int something { int (u + i) };
>
> and:
>
> auto something { u + i}; // error
> requires that you write it as either
>
> auto something(u + i); // hopefully generating a compile time warning
>
> or
>
> auto something { int (u + i) };

This scheme is imperfect because it requires a cast to an actual type,
so the code is brittle when "something" changes type from int to long.
There should be library functions that do the cast taking size into account:

auto something { std::tosigned (u + i) };

or

auto something { std::tounsigned (u + i) };
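
Such helpers might be sketched as follows (std::tosigned and
std::tounsigned do not exist today; the trait below is purely
illustrative):

template <typename T> struct signed_of;
template <> struct signed_of<unsigned int>  { typedef int type; };
template <> struct signed_of<unsigned long> { typedef long type; };

template <typename T>
typename signed_of<T>::type tosigned(T v) {
    // Picks the signed type of the same size, so the cast keeps
    // working if the operands later change from int to long.
    return static_cast<typename signed_of<T>::type>(v);
}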


Andrei


Nevin :-] Liber

Jan 6, 2008, 11:20:16 AM

In article <477F1858...@erdani.org>,
"Andrei Alexandrescu (See Website For Email)"
<SeeWebsit...@erdani.org> wrote:

> So I thought I'd share this thought with you all and ask if there are
> any ideas on how to solve the problem elegantly. My prediction is that,
> if we keep the current rules, "auto" will actually do more harm than
> good for mixed-sign arithmetic. As changing semantics is not an option,
> it might be useful to look into statically disabling certain mixed-sign
> operations.

While auto might do "more harm than good" for those folks who mix
fundamental types, in my opinion it is more important that the rules for
auto remain as close as possible to those of template deduction. Making
inconsistent rules is the kind of thing that makes the language more
expert-friendly at the expense of everyone else, because only the
experts will spend enough time learning the deep dark corners of the
language to even know about the existence of these tricks.

We are talking about people who already mix types without understanding
the ramifications of doing so. Do you really expect them to discover
this feature of auto? Or worse, their fix might be along the lines of:

auto x = static_cast<int>((i + u) / 2);

because they "know" that casts always "fix" these kinds of problems.


Your change would make the following code transformation very fragile:

template<typename T> void func(T const& t) { /* ... */ }
//...
func(a + b);

into

template<typename T> void func(T const& t) { /* ... */ }
//...
auto c = a + b;
func(c);

Having that exception to the use of auto makes the overall language
harder, not easier to use.

--
Nevin ":-)" Liber <mailto:ne...@eviloverlord.com> 773 961-1620

Andrei Alexandrescu (See Website For Email)

Jan 6, 2008, 1:48:06 PM

Nevin :-] Liber wrote:
> In article <477F1858...@erdani.org>,
> "Andrei Alexandrescu (See Website For Email)"
> <SeeWebsit...@erdani.org> wrote:
>
>> So I thought I'd share this thought with you all and ask if there are
>> any ideas on how to solve the problem elegantly. My prediction is that,
>> if we keep the current rules, "auto" will actually do more harm than
>> good for mixed-sign arithmetic. As changing semantics is not an option,
>> it might be useful to look into statically disabling certain mixed-sign
>> operations.
>
> While auto might do "more harm than good" for those folks who mix
> fundamental types, in my opinion it is more important that the rules for
> auto remain as close as possible to that of template deduction. Making
> rules that are inconsistent are the things that make the language more
> expert friendly at the expense of everyone else, because only the
> experts will spend enough time learning the deep dark corners of the
> language to even know about the existence of these tricks.

This might be a misunderstanding. My idea was to _disable_ "auto" (i.e.,
render it uncompilable) when combined with certain mixed-sign
operations, not to impart to it different semantics than the rest of the
type inference mechanism.

I agree that making auto smarter than template deduction would be the
wrong fight to fight.

> We are talking about people who already mix types without understanding
> the ramifications of doing so. Do you really expect them to discover
> this feature of auto? Or worse, their fix might be along the lines of:
>
> auto x = static_cast<int>((i + u) / 2);
>
> because they "know" that casts always "fix" these kinds of problems.

People tend to take the path of least resistance. In this case, I think
they'd write:

int c = (i + u) / 2;

which is shorter and does the same thing.

> Your change would make the following code transformation very fragile:
>
> template<typename T> void func(T const& t) { /* ... */ }
> //...
> func(a + b);
>
> into
>
> template<typename T> void func(T const& t) { /* ... */ }
> //...
> auto c = a + b;
> func(c);

Nonono, that's certainly not what I had in mind. Again: in my opinion,
it might be useful to just disallow (or warn on) the use of auto in the
most flagrant cases of unsigned type mishandling.


Andrei


Greg Herlihy

Jan 7, 2008, 11:00:00 AM

On Jan 4, 11:57 pm, "Andrei Alexandrescu (See Website For Email)"
<SeeWebsiteForEm...@erdani.org> wrote:
> I have had a mini- or probably micro- or even milli-epiphany: using
> "auto" will exacerbate C's broken unsigned arithmetic, which C++ also
> inherited.

The new use of the "auto" keyword does not change anything about C++'s
unsigned arithmetic - and that is exactly how it should be.

> As we all know, in C, any expression that has unsigned within a radius
> of a mile will also have type unsigned. This is a simple rule but one of
> remarkable bluntness because it assigns many operators the wrong result
> type. Consider u an unsigned int value and i a signed int value.
>
> 1. u * i and i * u yield unsigned, although they should yield a signed
> value.

No. The product of u and i should not be signed. Here's why: an
unsigned int in C++ is not simply a signed integer value that happens
to have a non-negative value. In C++, an unsigned int is a member of a
"finite field". Signed values, in contrast, are members of a non-
finite field (the set of integers) - even though an int type in C++
can hold only a finite number of values.

Finite fields have some interesting properties. For one, all
operations performed within a finite field result in an element within
that field. Therefore, it must be the case that all arithmetic
operations involving an unsigned int must yield an unsigned int. So, i
* u has to produce an unsigned value - even if the multiplication has
to "wrap around" the edges of the field in either a forward (for
positive multipliers) or a backward (for negative multipliers)
direction in order to ensure that the result of the multiplication is
a member of the finite field.

> 2. u / i and i / u also yield unsigned, although again they should both
> return a signed value.

No. Just like any other arithmetic operation performed over a finite
field, the quotient yielded by division must yield a member of the
finite field, that is, an unsigned value.

> 3. u + i and i + u again yield unsigned. Here it is not clear which
> signedness would be more helpful. In my personal opinion, the
> tie-breaker should be a rule that "does not yield the wrong result for
> small, reasonable inputs". I'm basing this on the assumption that most
> integrals are small in absolute value, something that I recall was
> measured in the context of conservative garbage collectors. By that
> rule, u + i and i + u should be typed as signed. Typing them as unsigned
> makes the operation fail for small numbers, e.g. 0 - 1u yields a large
> unsigned number.
>
> 4. This is the funniest one: -u actually returns unsigned!

Naturally, for anyone familiar only with standard arithmetic, finite
field arithmetic does seem odd. But being different does not imply
being wrong. As it turns out, finite field arithmetic is just as well-
defined (but not as well-known) as the "ordinary" arithmetic taught in
first grade. So -u yielding an unsigned value in a finite field is
really no more surprising than -i yielding a signed value over a non-
finite field.

> This all is hardly news to anyone hanging out around here. My
> milli-epiphany is that "auto" will make all of the ambiguities worse.
> Why? Because C and C++98 require a type specification whenever a value
> is defined. But in C++0x, if auto is successful, people will use "auto"
> knowing it does the right thing without so much as thinking about it:
>
> auto a = i + u;

The above expression adds i and u and stores the result in a variable
named "a" of some integral type. Now, clearly the programmer does not
care whether "a" happens to be signed or unsigned. After all, if the
programmer had a preference regarding a's signedness, then the
programmer would have specified whether "a" was a signed or unsigned
type. But since no such type is specified, it must be the case that
whatever type the compiler does select for "a" - is a matter of
complete indifference to the programmer.

> Oops... a will be unsigned, even though the user meant (and, without
> being an expert in the vagaries of integral arithmetic, sincerely
> thought) it is int. After all, the code:

> int a = i + u;
>
> compiles without complaint, so it's intuitive that replacing "int" with
> "auto" is harmless and actually better, because it will nicely become
> "long" if necessary. So, all things considered, "auto" does not always
> do the right thing!

If by the "right thing" you mean that "auto" should somehow intuit
whether the programmer wants the result of a particular expression to
be stored as a signed or unsigned value - then, yes, the "auto"
keyword clearly does not do the right thing. Nor will it ever. But the
purpose of the "auto" keyword is not to declare a variable of a type
identical to the type that the programmer would have declared - if the
programmer had been required to specify a type.

The presence of the "auto" keyword therefore is not a shorthand
notation for the programmer's preferred type - instead, "auto"
indicates that no such preferred type exists. For example, when
storing a result of an unspecified type, the programmer would clearly
have no preference with regard to type. Another example: when
declaring a variable to hold an intermediate result of a longer
calculation, the programmer would want the type of the intermediate
result to be the same as the type the result would have had as a
subexpression of the entire calculation.

> So I thought I'd share this thought with you all and ask if there are
> any ideas on how to solve the problem elegantly. My prediction is that,
> if we keep the current rules, "auto" will actually do more harm than
> good for mixed-sign arithmetic. As changing semantics is not an option,
> it might be useful to look into statically disabling certain mixed-sign
> operations.

The only potential problem that I can foresee is that C++ programmers
might not understand how to use the "auto" keyword appropriately. The
new use of "auto" does not mean that programmers will be able to
replace explicit type declarations with vague ones. Yet, I doubt that
many programmers would use "auto" with such an expectation.

Programmers after all are quite familiar with the penalties of being
vague in their programming. So how many programmers, needing to
declare an "int" variable, would instead of declaring the "int"
variable - opt to declare an "auto" variable instead? Most programmers,
I would think, would instinctively favor the explicit
declaration over the implicit one.

Greg

Francis Glassborow

Jan 8, 2008, 9:17:21 AM

Greg Herlihy wrote:
> The presence of the "auto" keyword therefore is not a shorthand
> notation for the programmer's preferred type - instead, "auto"
> indicates that no such preferred type exists.

I am not sure that I agree with that assertion. One of the main purposes
of auto is to simplify the writing of template code.

template<typename T, typename U>
auto foo(T t, U u) -> decltype(t * u) {
    auto temp(t * u);
    // do something
    return temp;
}

The point is that as a programmer I do care what the type is but I
cannot hard code it because I do not know what it will be.

And yes I understand your rationale of why unsigned v signed works that
way but nonetheless it is not the only sane choice and other languages
do it differently. Indeed I remain of the opinion that the way overflow
works for signed integer types is dangerous and not understood by many
programmers.

Carlos Moreno

Jan 8, 2008, 9:18:16 AM

> > 1. u * i and i * u yield unsigned, although it should yield a signed
> > value.
>
> No. The product of u and i should not be signed. Here's why: an
> unsigned int in C++ is not simply a signed integer value that happens
> to have a non-negative value. In C++, an unsigned int is a member of a
> "finite field". Signed values, in contrast, are members of a non-
> finite field (the set of integers) - even though an int type in C++
> can hold only a finite number of values.

This does not make sense --- I mean, it makes sense only when "forcing"
the logic that way.

If you claim that int is part of a non-finite field even though the
set of values representable by the int data type is finite, how can
you possibly justify that unsigned int is different? Why do you
simply "choose" to say that unsigned int is a part of a finite set
and not part of a non-finite set?

Both int and unsigned int are finite sets. As such, they are subsets
of any finite set that includes them, and they're both also part of
the same non-finite set --- the integer numbers, Z. All of the values
representable by int are integer numbers; all of the values
representable by unsigned int are integer numbers.

Why do you choose to *see them* in such an unintuitive way?

Technically speaking, promotion either way would be wrong; normally,
when mixing types in an expression, the result has the type that is
the superset of the other one (at least conceptually). Consider int
and double: integer numbers are a subset of real numbers; though
double is far from truly representing real numbers, the intended
meaning is that of real numbers.

The problem is that neither the actual set of int values nor the
actual set of unsigned int values is a subset of the other one. But
then, given that *conceptually* the set of non-negative integer
values is a subset of the set of integer values, it would only make
sense that promotion (i.e., choosing the result type of an expression
mixing unsigned and signed) chooses signed.


Putting aside all these theoretical/conceptual arguments, I think
anyone simply needs to agree that there must be something wrong
with a language where the following does not correctly give the
average of a set of integer numbers:

double avg(const vector<int>& values)
{
    // accumulate returns int, but the unsigned size() drags the
    // division into unsigned arithmetic
    return accumulate(values.begin(), values.end(), 0) /
           values.size();
}
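
One conventional repair is to force the division out of unsigned
arithmetic (a sketch; avg_fixed is a name of my choosing):

#include <numeric>
#include <vector>

double avg_fixed(const std::vector<int>& values)
{
    // Casting size() to double makes the division floating-point,
    // so a negative accumulated sum stays negative.
    return std::accumulate(values.begin(), values.end(), 0) /
           static_cast<double>(values.size());
}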


Yes, you could argue that the one detail that is wrong with such
a language is that vector<>::size() returns unsigned. I do not
think that's such a bad idea, given that the size can not be
negative. However, a negative integer value being divided by a
positive number and yielding a positive value, yes, I call that
a severe flaw.

IMHO, your argument about finite and non-finite fields would be
valid if the fact were explicit --- *very explicit*. With int
and unsigned int, intuition kicks in, and the whole thing ends
up being one additional entry in the "Common Mistakes" Appendix
at the end of most C books.

Carlos

Andrei Alexandrescu (See Website For Email)

Jan 8, 2008, 9:19:07 AM

Greg Herlihy wrote:
>> As we all know, in C, any expression that has unsigned within a radius
>> of a mile will also have type unsigned. This is a simple rule but one of
>> remarkable bluntness because it assigns many operators the wrong result
>> type. Consider u an unsigned int value and i a signed int value.
>>
>> 1. u * i and i * u yield unsigned, although they should yield a signed
>> value.
>
> No. The product of u and i should not be signed. Here's why: an
> unsigned int in C++ is not simply a signed integer value that happens
> to have a non-negative value. In C++, an unsigned int is a member of a
> "finite field". Signed values, in contrast, are members of a non-
> finite field (the set of integers) - even though an int type in C++
> can hold only a finite number of values.

Interesting! As your entire argument hinges on the fact that unsigned
models finite fields, let's focus on that. First off, I did not even
know what a finite field is, so I searched around and saw that it's the
same as a Galois field, at which point a lonely neuron fired reminding
me of a class taken a long time ago.

I could check quite easily that indeed unsigned int models the finite
field e.g. 2**32 (on 32-bit machines). So, point taken.

However, your arguments fail to convince me for the following reasons.

First, one issue with unsigned is that it converts to and from int. I
agree that there is an isomorphism between int and unsigned, as the sets
have the same number of elements; but in order to derive anything
useful, we must make sure that the isomorphism is interesting. If you
consider int to model, as you say, integers small in absolute value,
then I fail to find the isomorphism between int and unsigned very
interesting.

Second (and I agree that this is an argument by authority) I failed to
find much evidence that people generally use unsigned to model a finite
field in actual programs. To the best of my knowledge, the uses of
unsigned types I've seen were:

1. As a model for natural numbers

2. As a "bag of bits" where the sign is irrelevant

3. As a natural number modulo something. (This use would be closest to
the finite field use.)

For example, I doubt that somebody said: "I need to model the number of
elements in a container, so a finite field would be exactly what the
doctor prescribed." More likely, the person has thought of a natural
number. I conjecture that more people mean "natural number" than "finite
field" when using unsigned types.

> Finite fields have some interesting properties. For one, all
> operations performed within a finite field result in an element within
> that field. Therefore, it must be the case that all arithmetic
> operations involving an unsigned int must yield an unsigned int.

This argument does not even follow. Int is also a finite field
isomorphic with unsigned, so it's completely arbitrary in a mixed
operation which field you want the result to "fall". On what grounds was
unsigned preferred? For all I know, u1 - u2 produces a useful result for
small values of u1 and u2 if it's typed as int.

> So, i
> * u has to produce an unsigned value - even if the multiplication has
> to "wrap around" the edges of the field in either a forward (for
> positive multipliers) or a backward (for negative multipliers)
> direction in order to ensure that the result of the multiplication is
> a member of the finite field.
>
>> 2. u / i and i / u also yield unsigned, although again they should both
>> return a signed value.
>
> No. Just like any other arithmetic operation performed over a finite
> field, the quotient yielded by division must yield a member of the
> finite field, that is, an unsigned value.

Nope. You again assume the same thing without proving it: why would the
result fall in the finite field unsigned and not in the finite field
int? And if you claim that int is not intended to model a finite field,
then I come and ask: on what grounds do you define a morphism
from int to unsigned?

If we continue to pull on that string, it pretty much unweaves your
entire finite-field-based argument. So I snipped some of it while
waiting for more information.

> Another example: when
> declaring a variable to hold an intermediate result of a longer
> calculation, the programmer would want the type of the intermediate
> result to be the same as the type the result would have had as a
> subexpression of the entire calculation.

I agree that this is a good argument. It does not dilute my point, which
was: since the rules for typing mixed-sign arithmetic might surprise
some - a fact that explicit typing has cloaked by allowing free
conversions to and fro - it might be useful to disallow certain uses of auto.

>> So I thought I'd share this thought with you all and ask if there are
>> any ideas on how to solve the problem elegantly. My prediction is that,
>> if we keep the current rules, "auto" will actually do more harm than
>> good for mixed-sign arithmetic. As changing semantics is not an option,
>> it might be useful to look into statically disabling certain mixed-sign
>> operations.
>
> The only potential problem that I can foresee is that C++ programmers
> might not understand how to use the "auto" keyword appropriately. The
> new use of "auto" does not mean that programmers will be able to
> replace explicit type declarations with vague ones. Yet, I doubt that
> many programmers would use "auto" with such an expectation.
>
> Programmers after all are quite familiar with the penalties of being
> vague in their programming. So how many programmers, needing to
> declare an "int" variable, would instead of declaring the "int"
> variable - opt to declare an "auto" variable instead? Most programmers,
> I would think, would instinctively favor the explicit
> declaration over the implicit one.

With this point I flat out disagree as I have extensive experience with
"auto" in another language. Defining symbols with "auto" makes the code
more robust - if the operands change type from int to long or even
double, MyNum or whatnot, the result would follow. If you explicitly
type the result as int, then a long will be silently truncated, and all
you can rely on for debugging are non-standard compiler warnings.


Andrei

Jiri Palecek

Jan 8, 2008, 9:21:56 AM

Greg Herlihy wrote:

> On Jan 4, 11:57 pm, "Andrei Alexandrescu (See Website For Email)"
> <SeeWebsiteForEm...@erdani.org> wrote:
>> I have had a mini- or probably micro- or even milli-epiphany: using
>> "auto" will exacerbate C's broken unsigned arithmetic, which C++ also
>> inherited.
>
> The new use of the "auto" keyword does not change anything about C++'s
> unsigned arithmetic - and that is exactly how it should be.

Yes, auto doesn't change anything about C++ arithmetic, which is good. But
that doesn't mean that C++ arithmetic should not change per se.

>> As we all know, in C, any expression that has unsigned within a radius
>> of a mile will also have type unsigned. This is a simple rule but one of
>> remarkable bluntness because it assigns many operators the wrong result
>> type. Consider u an unsigned int value and i a signed int value.
>>
>> 1. u * i and i * u yield unsigned, although they should yield a signed
>> value.
>
> No. The product of u and i should not be signed. Here's why: an
> unsigned int in C++ is not simply a signed integer value that happens
> to have a non-negative value. In C++, an unsigned int is a member of a
> "finite field". Signed values, in contrast, are members of a non-
> finite field (the set of integers) - even though an int type in C++
> can hold only a finite number of values.

Sorry, but that is a misunderstanding. unsigneds with their usual operations
do not form a field, but a ring.

> Finite fields have some interesting properties. For one, all
> operations performed within a finite field result in an element within
> that field. Therefore, it must be the case that all arithmetic
> operations involving an unsigned int must yield an unsigned int. So, i
> * u has to produce an unsigned value - even if the multiplication has
> to "wrap around" the edges of the field in either a forward (for
> positive multipliers) or a backward (for negative multipliers)
> direction in order to ensure that the result of the multiplication is
> a member of the finite field.

However, be it a ring or field or whatever, any properties of operations
on it apply only to the operations which are part of the structure. In case of
rings, these are (let U denote unsigned):

+: U x U -> U ... the usual addition (modulo)
-: U -> U ... the opposite element (as in -1==0xFFFF)
*: U x U -> U ... the usual multiplication (modulo)

So, the properties of rings tell us absolutely _nothing_ about how u*i, u+i
or u/i should behave. (They also tell us nothing about u/u, because rings
have no division; and nothing about the relationals, because there's no way
a finite additive group can be ordered consistently with the addition
operation.)

> Naturally, for anyone familiar only with standard arithmetic, finite
> field arithmetic does seem odd. But being different does not imply
> being wrong. As it turns out, finite field arithmetic is just as well-
> defined (but not as well-known) as the "ordinary" arithmetic taught in
> first grade. So -u yielding an unsigned value in a finite field is
> really no more surprising than -i yielding a signed value over a non-
> finite field.

This is OK. However, if, in C++, it held that for each unsigned u,
(int)(-u)==-(int)u (which it does not, unfortunately), it would also hold that
any calculation that only uses +, -, * would yield the same result
regardless of signedness/unsignedness of the arguments or any
subexpressions, provided the "signed" subexpressions do not overflow.
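
On a two's complement machine this is easy to check (a sketch; the cast
back to int is implementation-defined in today's C++, but gives the
expected value on two's complement hardware):

#include <cstdio>

int main() {
    int i = -3, j = 7;
    unsigned u = static_cast<unsigned>(i), v = static_cast<unsigned>(j);

    // +, - and * produce the same bit pattern in int and in unsigned.
    std::printf("%d %d\n", i * j + i - j,
                static_cast<int>(u * v + u - v)); // prints -31 -31
}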

>> This all is hardly news to anyone hanging out around here. My
>> milli-epiphany is that "auto" will make all of the ambiguities worse.
>> Why? Because C and C++98 require a type specification whenever a value
>> is defined. But in C++0x, if auto is successful, people will use "auto"
>> knowing it does the right thing without so much as thinking about it:
>>
>> auto a = i + u;
>
> The above expression adds i and u and stores the result in a variable
> named "a" of some integral type. Now, clearly the programmer does not
> care whether "a" happens to be signed or unsigned. After all, if the
> programmer had a preference regarding a's signedness, then the
> programmer would have specified whether "a" was a signed or unsigned
> type. But since no such type is specified, it must be the case that
> whatever type the compiler does select for "a" - is a matter of
> complete indifference to the programmer.

No, that is not the purpose of auto. If the programmer really didn't care,
you could "infer" some type like void for every auto in every program. The
thing a programmer wants to accomplish using auto is to infer the most
generic type that can hold the value of rhs. Something like if E is an
expression and S is an expression containing E as a subexpression, and

auto a = E;

then S and S with the subexpression E replaced by a should be equivalent if
both have defined behaviour.

However, auto does that quite well, so this is really not the problem. The
problem is that C++ lets you mix signed and unsigned types in expressions
even if it has an effect on the value, which is what Andrei's proposal
would solve.

> Programmers after all are quite familiar with the penalties of being
> vague in their programming. So how many programmers, needing to
> declare an "int" variable, would instead of declaring the "int"
> variable - opt to declare an "auto" variable instead? Most programmers,
> I would think, would instinctively favor the explicit
> declaration over the implicit one.

I don't think there will be many programmers trying to write "auto" instead
of "int" (after all, it is one character longer :-) However, there might be
programmers who, in templated code, would think at this place

T t = getT();
unsigned u = ...;
T something = t * u;

something like "what if the class T is so clever it returns something
magical from t*u, like an expression template? I can make use of that,
after all, I only need it for further computation, I don't extract the
value." And then, changes it into

T t = getT();
unsigned u = ...;
auto something = t * u;
... do more computation with something ...

Regards
Jiri Palecek

Walter Bright

Jan 8, 2008, 9:24:35 AM

Greg Herlihy wrote:
> Here's why: an
> unsigned int in C++ is not simply a signed integer value that happens
> to have a non-negative value. In C++, an unsigned int is a member of a
> "finite field". Signed values, in contrast, are members of a non-
> finite field (the set of integers) - even though an int type in C++
> can hold only a finite number of values.

What's the basis for the assertion that unsigned ints are finite fields
and signed ints are infinite fields?

I always thought the difference between signed and unsigned ints was the
bias in the range of values, not any theoretical difference that is not
reflected in the actual machine. ints in C++ reflect the underlying
reality of the hardware. Pretending that reality doesn't exist will get
one into big (programming) trouble.

For example, many program bugs result from integer overflow and
subsequent wraparound (the language offers no straightforward way to
detect and trap such errors). You can't program as if ints had infinite
range.

--------
Walter Bright
http://www.digitalmars.com
C, C++, D programming language compilers

Francis Glassborow

Jan 8, 2008, 5:39:50 PM

Andrei Alexandrescu (See Website For Email) wrote:
> Greg Herlihy wrote:

> This argument does not even follow. Int is also a finite field
> isomorphic with unsigned, so it's completely arbitrary in a mixed
> operation which field you want the result to "fall". On what grounds was
> unsigned preferred? For all I know, u1 - u2 produces a useful result for
> small values of u1 and u2 if it's typed as int.
>


I wish that int did model a finite field. Unfortunately it does not
because of its behaviour when you go outside the supported range.
Yes it is only a technical nit but errors weaken your argument.

Nonetheless you are completely correct when you say that in general
programmers view unsigned int as modelling a range of natural numbers.
Indeed many are surprised the first time they come across wrap around.
They get even more surprised when they discover that there is no such
requirement for int. And worse, there is no requirement for the
implementation to tell you what it does when an int expression evaluates
out of the range of supported values.

Francis Glassborow

Jan 8, 2008, 5:40:00 PM

Walter Bright wrote:
> Greg Herlihy wrote:
>> Here's why: an
>> unsigned int in C++ is not simply a signed integer value that happens
>> to have a non-negative value. In C++, an unsigned int is a member of a
>> "finite field". Signed values, in contrast, are members of a non-
>> finite field (the set of integers) - even though an int type in C++
>> can hold only a finite number of values.
>
> What's the basis for the assertion that unsigned are finite fields and
> signed are infinite fields?
>
> I always thought the difference between signed and unsigned ints was the
> bias in the range of values, not any theoretical difference that is not
> reflected in the actual machine. ints in C++ reflect the underlying
> reality of the hardware. Pretending that reality doesn't exist will get
> one into big (programming) trouble.
>
> For example, many program bugs result from integer overflow and
> subsequent wraparound (the language offers no straightforward way to
> detect and trap such errors). You can't program as if ints had infinite
> range.

No, much, much worse; you have no assurance that you will get
wrap-around on overflow of an int. You can get absolutely anything,
however whimsical, because your program has entered UB land.

Nevin :-] Liber

Jan 8, 2008, 5:42:45 PM

In article <478114A...@erdani.org>,
"Andrei Alexandrescu (See Website For Email)"
<SeeWebsit...@erdani.org> wrote:

> Nevin :-] Liber wrote:
> > auto x = static_cast<int>((i + u) / 2);
> >
> > because they "know" that casts always "fix" these kinds of problems.
>
> People tend to take the path of least resistance. In this case, I think
> they'd write:
>
> int c = (i + u) / 2;
>
> which is shorter and does the same thing.

It means that while this compiles:

std::vector<X> v;
//...
auto middle_index_round_down = v.size() / 2;

the slightly different:

auto middle_index_round_up = (1 + v.size()) / 2;

would not (since it is (i + u) / 2).

Heck, even

if (!v.empty)
{
auto back_index = v.size() - 1;
// ...
}

would fail to compile.

In order to ensure that these types of things would compile, they would
have to go back to using std::vector<X>::size_type (or worse, they would
follow the path of least resistance and choose one of int, unsigned,
long, unsigned long or size_t). Isn't the whole point of auto so that
we don't have to use those kinds of expressions?

Heck, if I'm using std::bitset, I have to know that the type returned by
size() is size_t, since there isn't a typedef for size_type inside of it
to use. But I digress.

> Nonono, that's certainly not what I had in mind. Again: in my opinion,
> it might be useful to just disallow (or warn on) the use of auto in the
> most flagrant cases of unsigned type mishandling.

I just don't know how to consistently tell at compile time which ones
are flagrant.

--
Nevin ":-)" Liber <mailto:ne...@eviloverlord.com> 773 961-1620


Walter Bright

Jan 9, 2008, 10:29:25 AM

Francis Glassborow wrote:
> Indeed many are surprised the first time they come across wrap around.
> They get even more surprised when they discover that there is no such
> requirement for int. And worse, there is no requirement for the
> implementation to tell you what it does when an int expression evaluates
> out of the range of supported values.

I've programmed asm on many machines, 8, 10, 16, and 32 bit. Every last
one of them used the same instruction for adding ints and adding
unsigneds. So how can one wrap and the other not?

--------
Walter Bright
http://www.digitalmars.com
C, C++, D programming language compilers

Walter Bright

Jan 9, 2008, 10:29:02 AM

Francis Glassborow wrote:
> No, much, much worse; you have no assurance that you will get
> wrap-around on overflow of an int. You can get absolutely anything
> however whimsical because your program has entered UB land.

I've never heard of a machine that did otherwise.


--------
Walter Bright
http://www.digitalmars.com
C, C++, D programming language compilers


James Dennett

Jan 9, 2008, 10:45:34 AM

Walter Bright wrote:
> Greg Herlihy wrote:
>> Here's why: an
>> unsigned int in C++ is not simply a signed integer value that happens
>> to have a non-negative value. In C++, an unsigned int is a member of a
>> "finite field". Signed values, in contrast, are members of a non-
>> finite field (the set of integers) - even though an int type in C++
>> can hold only a finite number of values.
>
> What's the basis for the assertion that unsigned are finite fields and
> signed are infinite fields?

unsigned types have guaranteed modular behaviour, so they act
like perfectly healthy rings (but not fields, as there are
zero divisors); they are Z/nZ for some n.

signed integral types don't have such defined semantics: they are just
a model of a subset of Z, with restrictions of the operators.

> I always thought the difference between signed and unsigned ints was the
> bias in the range of values, not any theoretical difference that is not
> reflected in the actual machine.

Actual machines reflect this theory. That's why this theory
is used by C and C++. Many actual machines actually go further
and make signed integral types also have modular semantics,
effectively eliminating most of the distinction.

> ints in C++ reflect the underlying
> reality of the hardware. Pretending that reality doesn't exist will get
> one into big (programming) trouble.

Nobody is arguing for ignoring reality.

> For example, many program bugs result from integer overflow and
> subsequent wraparound (the language offers no straightforward way to
> detect and trap such errors). You can't program as if ints had infinite
> range.

Indeed, and obviously so.

-- James

Greg Herlihy

Jan 9, 2008, 10:43:24 AM

On Jan 8, 6:24 am, Walter Bright <wal...@digitalmars-nospamm.com>
wrote:

> Greg Herlihy wrote:
> > Here's why: an unsigned int in C++ is not simply a signed integer value
> > that happens to have a non-negative value. In C++, an unsigned int is
> > a member of a "finite field". Signed values, in contrast, are members
> > of a non-finite field (the set of integers) - even though an int type
> > in C++ can hold only a finite number of values.
>
> What's the basis for the assertion that unsigned are finite fields and
> signed are infinite fields?

The most obvious clue is that signed values in C++ can "overflow" (and
lead to undefined behavior) - while unsigned values cannot overflow
(and therefore cannot lead to undefined behavior).

The fact that signed arithmetic in C++ can "overflow" means that a
signed integral type can represent only a finite subset of values
drawn from an infinite set (the set of integers). "Overflow" occurs
when an expression yields an integer value that is a member of the
superset, but is not a member of the subset of integer values that the
type can represent.

The fact that unsigned arithmetic in C++ cannot overflow means that an
unsigned integer type represents the complete set of values drawn from
a finite set (specifically: the set of integers [0...2^n-1] with n
being the number of bits in the unsigned type's value representation).
So overflow is never possible with unsigned types in C++; instead,
calculations appear to "wrap" around, and always yield a value that is
a member of the finite set.
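
For example (assuming a 32-bit unsigned int):

unsigned max_u = 4294967295u;  // 2^32 - 1, the top of the finite set
unsigned wrapped = max_u + 1u; // well-defined: reduced modulo 2^32 to 0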

> I always thought the difference between signed and unsigned ints was the
> bias in the range of values, not any theoretical difference that is not
> reflected in the actual machine. ints in C++ reflect the underlying
> reality of the hardware. Pretending that reality doesn't exist will get
> one into big (programming) trouble.

If signed and unsigned types in C++ represented values drawn from the
same superset - then the rules for signed and unsigned arithmetic
would have to be identical. There could be no difference between
signed and unsigned type arithmetic because the rules that apply to
the superset of values as a whole - would naturally govern any subset
of values thereof.

Andrei in his original post starts with the assumption that
signed and unsigned values in C++ are drawn from a common superset of
values. So, on the basis of this assumption, he interprets the fact
that unsigned arithmetic differs from signed arithmetic in C++ - as a
flaw in the language.

Instead of concluding that unsigned arithmetic in C++ is broken,
however, one can reach the more logical conclusion that the difference
between signed and unsigned arithmetic (combined with the fact that
the signed values can overflow while the unsigned values cannot) is
proof that signed and unsigned types are drawn from two -different-
supersets of values: one infinite, and the other finite. So, in C++, a
signed type represents a finite range of values in an infinite set,
whereas an unsigned type represents a complete range of values of a
finite set.

> For example, many program bugs result from integer overflow and
> subsequent wraparound (the language offers no straightforward way to
> detect and trap such errors). You can't program as if ints had infinite
> range.

Overflow is always possible whenever a type represents a finite number
of values drawn from an infinite set. Note also that the consequences of
integer overflow are undefined in C++ (although most implementations
ignore the condition). Moreover, the fact that signed arithmetic
overflow has no defined behavior proves that signed values are not
drawn from a finite set, just as the absence of overflow for unsigned
arithmetic proves that unsigned values in C++ are drawn from a finite
set.

Greg

Andrei Alexandrescu (See Website For Email)

Jan 9, 2008, 10:44:39 AM

Nevin :-] Liber wrote:
> In article <478114A...@erdani.org>,
> "Andrei Alexandrescu (See Website For Email)"
> <SeeWebsit...@erdani.org> wrote:
>
>> Nevin :-] Liber wrote:
>>> auto x = static_cast<int>((i + u) / 2);
>>>
>>> because they "know" that casts always "fix" these kinds of problems.
>> People tend to take the path of least resistance. In this case, I think
>> they'd write:
>>
>> int c = (i + u) / 2;
>>
>> which is shorter and does the same thing.
>
> It means that while this compiles:
>
> std::vector<X> v;
> //...
> auto middle_index_round_down = v.size() / 2;

(Background: below, by "my scheme" I denote the hypothetical use of
types intbits and longbits to encode an integral of ambiguous,
as-of-yet-undecided signedness.)

This is u / i, which (in my scheme) would return intbits. Given,
however, that 2 is a compile-time constant, the compiler could relax the
rule and type it as u / u which yields u.

> the slightly different:
>
> auto middle_index_round_up = (1 + v.size()) / 2;
>
> would not (since it is (i + u) / 2).

Following the rule above and considering that 1 is a positive
compile-time constant, the expression would be typed as (u + u) / u
which is u again. But let's say for the argument that you use a variable:

int x = 1;
auto middle_index_round_up = (x + v.size()) / 2;

In my system that would now issue the warning: "division has
sign-dependent semantics and was used with an expression of ambiguous sign."

Which I think is pretty neat.

> Heck, even
>
> if (!v.empty)
> {
> auto back_index = v.size() - 1;
> // ...
> }
>
> would fail to compile.

First off, the branch is never taken because you take the address of a
member function which is never null :o).

Assuming parens after empty, the code would not fail to compile. It all
depends on what you do with back_index. That value will be typed as
intbits, and so it will have ambiguous signedness. If you later try an
operation with sign-dependent semantics (e.g. shift, promotion,
division, or modulus), a warning would ensue. Otherwise (add, subtract,
multiply), there is no warning.

> In order to ensure that these types of things would compile, they would
> have to go back to using std::vector<X>::size_type (or worse, they would
> follow the path of least resistance and choose one of int, unsigned,
> long, unsigned long or size_t). Isn't the whole point of auto so that
> we don't have to use those kinds of expressions?

In my system, auto would do the right thing and also transport
information that is useful during the downstream usage of the computed
value.

> Heck, if I'm using std::bitset, I have to know that the type returned by
> size() is size_t, since there isn't a typedef for size_type inside of it
> to use. But I digress.
>
>> Nonono, that's certainly not what I had in mind. Again: in my opinion,
>> it might be useful to just disallow (or warn on) the use of auto in the
>> most flagrant cases of unsigned type mishandling.
>
> I just don't know how to consistently tell at compile time which ones
> are flagrant.

My criterion is this: if you compute a value for which int and unsigned
are interchangeable as the result type (e.g. u + i) or one for which choosing
unsigned is arguably the wrong choice (e.g. i / u), and later use that
value in ways that have sign-dependent semantics, issue a warning and
ask for the programmer to make signedness explicit. It's actually
amazing how many two's complement operations don't even care about
signedness and "just work". That's why I believe my system has low
impact on correct programs.
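
To illustrate which operations care (the values shown assume a 32-bit,
two's complement int):

int i = -8;
unsigned u = static_cast<unsigned>(i); // same bits as i

// Sign-agnostic, same bit patterns either way: i + 1, i - 1, i * 2.
// Sign-dependent, same bits but different answers:
//   i / 2  == -4,  but u / 2  == 2147483644
//   i >> 1 == -4 (typically), but u >> 1 == 2147483644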


Andrei


Andrei Alexandrescu (See Website For Email)

Jan 9, 2008, 10:44:14 AM

Ouch. I thought int operations are guaranteed to work the modulo way.
Knowing that you know better than I do, I searched the standard for
"modulo" and found 3.9.1 para 4, which says:

"Unsigned integers, declared unsigned, shall obey the laws of arithmetic
modulo 2^n where n is the number of bits in the value representation of
that particular size of integer. (Footnote: This implies that unsigned
arithmetic does not overflow because a result that cannot be represented
by the resulting unsigned integer type is reduced modulo the number that
is one greater than the largest value that can be represented by the
resulting unsigned integer type.)"

So far so good. Skipping down to para 7 on the same page reveals:

"The representations of integral types shall define values by use of a
pure binary numeration system. [Example: this International Standard
permits 2's complement, 1's complement and signed magnitude
representations for integral types.]"

I thought 2's complement is required, but I was wrong. So now I think
that indeed overflow on integers might produce quite a variety of results.

Interestingly, a conversion from int to unsigned int is required to
behave as if two's complement were used! See 4.7/2:

"If the destination type is unsigned, the resulting value is the least
unsigned integer congruent to the source integer (modulo 2^n where n is
the number of bits used to represent the unsigned type). [Note: In a
two's complement representation, this conversion is conceptual and there
is no change in the bit pattern (if there is no truncation).]"
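
Concretely (assuming 32-bit int):

int i = -1;
unsigned u = i; // guaranteed: u == 4294967295, the least unsigned value
                // congruent to -1 modulo 2^32, whatever the signed
                // representation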

Thanks!


Andrei

Andrei Alexandrescu (See Website For Email)

Jan 9, 2008, 6:44:30 PM

Greg Herlihy wrote:
[snip to the grand finale:]

> Overflow is always possible whenever a type represents a finite number
> of values drawn from an infinite set. Note also that consequences of
> integer overflow is undefined in C++ (although most implementations
> ignore the condition). Moreover, the fact that signed arithmetic
> overflow has no defined behavior proves that signed values are not
> drawn from an finite set, just as the absence of overflow for unsigned
> arithmetic proves that unsigned values in C++ are drawn from a finite
> set.

This might as well be the first time in history that someone convinced
someone else of something on the Usenet, so - thanks a million.

I think I'm getting what you say. Let me see if I understand your point
correctly.

* Unsigned has defined behavior for all values and all operations, save
for division by zero.

* Int does NOT enjoy that property; if it were guaranteed to use 2's
complement representation, it WOULD have. That was the fatal flaw in my
reasoning - I thought C++ always uses 2's complement for int.

* When doing a mixed-sign operation, there were two possible choices:
(a) convert both to unsigned and always obtain a well-defined result, or
(b) convert both to int and step into undefined behavior. Clearly, the
better choice was (a).

Is this correct?

Thanks again!


Andrei

Bart van Ingen Schenau

Jan 9, 2008, 6:45:13 PM

Walter Bright wrote:

> Francis Glassborow wrote:
>> Indeed many are surprised the first time they come across wrap
>> around. They get even more surprised when they discover that there is
>> no such requirement for int. And worse, there is no requirement for
>> the implementation to tell you what it does when an int expression
>> evaluates out of the range of supported values.
>
> I've programmed asm on many machines, 8, 10, 16, and 32 bit. Every
> last one of them used the same instruction for adding ints and adding
> unsigneds. So how can one wrap and the other not?

On hardware that uses 2's complement representation of negative numbers,
you can perform the large majority of operations as if the operands were
unsigned, and the result is still correct when the operands are actually
signed (even when negative).
As a side-effect, the wrap-around behaviour of unsigned arithmetic also
gets used for signed arithmetic.
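
To illustrate, a minimal sketch assuming a 32-bit 2's complement
machine; only well-defined unsigned arithmetic is performed, and the
comments describe the signed reinterpretation:

#include <iostream>

int main()
{
    unsigned a = 0xFFFFFFFEu;  // the bit pattern of -2 in 32-bit 2's complement
    unsigned b = 5u;
    unsigned sum = a + b;      // wraps modulo 2^32 - the same ADD instruction
                               // the CPU would execute for signed operands
    std::cout << sum << '\n';  // prints 3, exactly the bits of -2 + 5
}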

On the other hand, I have used a compiler for a DSP that supports
saturating arithmetic (overflow gets clipped to the largest value).
Although the compiler writers decided otherwise, this mode of
arithmetic could have been selected to be used for signed operands
without any violation of the standard.

>
> --------
> Walter Bright

Bart v Ingen Schenau
--
a.c.l.l.c-c++ FAQ: http://www.comeaucomputing.com/learn/faq
c.l.c FAQ: http://c-faq.com/
c.l.c++ FAQ: http://www.parashift.com/c++-faq-lite/

Greg Herlihy

unread,
Jan 10, 2008, 9:55:42 AM1/10/08
to
On Jan 9, 7:29 am, Walter Bright <wal...@digitalmars-nospamm.com>
wrote:

> Francis Glassborow wrote:
> > Indeed many are surprised the first time they come across wrap around.
> > They get even more surprised when they discover that there is no such
> > requirement for int. And worse, there is no requirement for the
> > implementation to tell you what it does when an int expression evaluates
> > out of the range of supported values.
>
> I've programmed asm on many machines, 8, 10, 16, and 32 bit. Every last
> one of them used the same instruction for adding ints and adding
> unsigneds. So how can one wrap and the other not?

The short answer is that signed arithmetic might not be allowed to
wrap. Specifically, a C++ compiler could (at the very least) insert
code to test for signed integer overflow after every signed arithmetic
operation that the user's program performs. If one of these tests
does detect that signed integer overflow has occurred, then the C++
compiler is at liberty to have the program do anything it likes.
The C++ Standard does not define any behavior for a program once
signed overflow has occurred. So, for example, the C++ compiler might
have the program abort, or trap, or continue executing - while
supplying any arbitrary (or not-so-arbitrary) value to represent the
result of the signed arithmetic operation that overflowed. The
compiler is certainly not obligated in this situation to supply the
wrapped value as the result of the calculation. So for these reasons,
user code cannot count on signed integer overflow necessarily yielding
any particular value; and in fact, the user's program cannot really
count on anything - once the value of a signed integer type has
overflowed.

Moreover, at least one C++ compiler, g++ 4.0, performs optimizations
based on the knowledge that signed arithmetic operations are not
guaranteed to wrap around like their unsigned counterparts. In fact,
to disable these optimizations (and force g++ to assume that signed
operations do, in fact, wrap), the programmer has to pass the "-
fwrapv" command line switch to the g++ compiler.

Greg

James Dennett

unread,
Jan 10, 2008, 9:57:09 AM1/10/08
to
Walter Bright wrote:
> Francis Glassborow wrote:
>> No, much, much worse; you have no assurance that you will get
>> wrap-around on overflow of an int. You can get absolutely anything
>> however whimsical because your program has entered UB land.
>
> I've never heard of a machine that did otherwise.

But C and C++ implementors have, and so the standards
deliberately support those machines. This is one of the
advantages of having a diverse group working on a language
(which offsets some of the *disadvantages*).

-- James

Walter Bright

unread,
Jan 10, 2008, 10:06:53 AM1/10/08
to
Bart van Ingen Schenau wrote:
> On the other hand, I have used a compiler for a DSP that supports
> saturating arithmetic (overflow gets clipped to the largest value).
> Although the compiler writers decided otherwise, this mode of
> arithmetic could have been selected to be used for signed operands
> without any violation of the standard.

The compiler writers made a wise move. Such an unusual mode could
silently introduce pernicious bugs when porting existing, debugged code
to it.

Since there is no way to defend against such possible errors in one's
code, and the overwhelming majority (dare I say all?) compilers
implement it in one way, that way should be standardized.

There are a lot of choices one must make when designing a compiler that
go far beyond what the standard says. Being compatible above and beyond
the standard as much as possible with existing practice is a solid win.

--------
Walter Bright
http://www.digitalmars.com
C, C++, D programming language compilers

--

Francis Glassborow

unread,
Jan 10, 2008, 10:08:48 AM1/10/08
to
Bart van Ingen Schenau wrote:
> Walter Bright wrote:
>
>> Francis Glassborow wrote:
>>> Indeed many are surprised the first time they come across wrap
>>> around. They get even more surprised when they discover that there is
>>> no such requirement for int. And worse, there is no requirement for
>>> the implementation to tell you what it does when an int expression
>>> evaluates out of the range of supported values.
>> I've programmed asm on many machines, 8, 10, 16, and 32 bit. Every
>> last one of them used the same instruction for adding ints and adding
>> unsigneds. So how can one wrap and the other not?
>
> On hardware that uses 2-s complement representation of negative numbers,
> you can perform the large majority of operations as if the operands are
> unsigned without affecting the result if the operands actually were
> signed (and with a negative value).
> As a side-effect, the wrap-around behaviour of unsigned arithmetic also
> gets used for signed arithmetic.
>
> On the other hand, I have used a compiler for a DSP that supports
> saturating arithmetic (overflow gets clipped to the largest value).
> Although the compiler writers decided otherwise, this mode of
> arithmetic could have been selected to be used for signed operands
> without any violation of the standard.
>


Yes, but that result is fully defined, so it is not a reason for making
overflow undefined behaviour. The only justification for the UB is that
there have been processors that detect overflow and then terminate the
process (or perhaps worse). Now my argument is that we should make the
result implementation defined; if C or C++ is to run on such a processor,
either you provide a non-standard implementation (not exactly an uncommon
thing to do when a processor has some uncommon feature) or you provide
either 2's complement or saturation semantics by emulation (yes, that has
a performance cost, but the payback for programmers writing clean
software is considerable).

Note that where you have saturation semantics the programmer has a
simple test to discover if that has happened:

#include <iostream>

int main(){
    int i, j;
    std::cin >> i >> j;
    int temp = i + j;
    if(temp - i != j) {
        std::cout << "The evaluation saturated\n";
    }
    else {
        std::cout << temp << std::endl;
    }
}

--

Martin Bonner

unread,
Jan 10, 2008, 10:07:11 AM1/10/08
to
On Jan 9, 3:29 pm, Walter Bright <wal...@digitalmars-nospamm.com>
wrote:

> Francis Glassborow wrote:
> > No, much, much worse; you have no assurance that you will get
> > wrap-around on overflow of an int. You can get absolutely anything
> > however whimsical because your program has entered UB land.
>
> I've never heard of a machine that did otherwise.

- I'm pretty sure that there were machines that raised a machine check
or software interrupt whenever integers overflowed.

- A machine that used sign+magnitude could reasonably interrupt
whenever -0 was loaded from memory into a register (which means the
error could happen quite a bit later, when the overflowed result was /
loaded/).

- I believe there used to be a machine that manipulated integers with
floating point instructions (using a zero exponent or something).
Overflow might have /very/ interesting properties on such a machine.

Having said all that, I am pretty sure that anybody implementing a
compiler will try and use wrapping arithmetic if possible (because so
many people write unportable code which assumes it), and anybody
designing a new instruction set will provide 2's complement wrapping
arithmetic (for the same reason).

Ron Natalie

unread,
Jan 10, 2008, 10:08:57 AM1/10/08
to
Walter Bright wrote:
> Francis Glassborow wrote:
>> Indeed many are surprised the first time they come across wrap around.
>> They get even more surprised when they discover that there is no such
>> requirement for int. And worse, there is no requirement for the
>> implementation to tell you what it does when an int expression evaluates
>> out of the range of supported values.
>
> I've programmed asm on many machines, 8, 10, 16, and 32 bit. Every last
> one of them used the same instruction for adding ints and adding
> unsigneds. So how can one wrap and the other not?
>

We actually had a processor (Gould SEL) that had exceptions on integer
overflow. So much code is sloppily written (at least in the UNIX of
the day) that we had to disable that one. It was a *WAY* more prevalent
bug in programs than the "All the world's a freaking VAX" assumption
that *(int*)0 is 0.

Francis Glassborow

unread,
Jan 10, 2008, 10:07:59 AM1/10/08
to
Walter Bright wrote:
> Francis Glassborow wrote:
>> Indeed many are surprised the first time they come across wrap around.
>> They get even more surprised when they discover that there is no such
>> requirement for int. And worse, there is no requirement for the
>> implementation to tell you what it does when an int expression evaluates
>> out of the range of supported values.
>
> I've programmed asm on many machines, 8, 10, 16, and 32 bit. Every last
> one of them used the same instruction for adding ints and adding
> unsigneds. So how can one wrap and the other not?
>


I largely agree yet both WG14 and WG21 adamantly insist that overflow of
a signed integer type is undefined behaviour. It deeply irritates me
because it means that this simple novice program has potential undefined
behaviour (and avoiding it is real tough -- for all but expert programmers)

#include <iostream>

int main(){
    int i, j;
    std::cin >> i >> j;

    std::cout << i + j << std::endl;
}

Peter Dimov

unread,
Jan 10, 2008, 5:08:31 PM1/10/08
to
On Jan 9, 5:29 pm, Walter Bright <wal...@digitalmars-nospamm.com>
wrote:

> Francis Glassborow wrote:
> > No, much, much worse; you have no assurance that you will get
> > wrap-around on overflow of an int. You can get absolutely anything
> > however whimsical because your program has entered UB land.
>
> I've never heard of a machine that did otherwise.

Machines may not, but some optimizers apparently do.

Andrei Alexandrescu (See Website For Email)

unread,
Jan 10, 2008, 5:06:47 PM1/10/08
to
Ron Natalie wrote:
> Walter Bright wrote:
>> Francis Glassborow wrote:
>>> Indeed many are surprised the first time they come across wrap around.
>>> They get even more surprised when they discover that there is no such
>>> requirement for int. And worse, there is no requirement for the
>>> implementation to tell you what it does when an int expression evaluates
>>> out of the range of supported values.
>>
>> I've programmed asm on many machines, 8, 10, 16, and 32 bit. Every
>> last one of them used the same instruction for adding ints and adding
>> unsigneds. So how can one wrap and the other not?
>>
>
> We actually had a processor (Gould SEL) that had exceptions on integer
> overflow. So much code is sloppily written (at least in the UNIX of
> the day) that we had to disable that one. It was *WAY* more prevalent
> of a bug in programs that the "All the worlds a freaking vax" assumption
> that *(int*)0 is 0.


Let's then restart the discussion. From what I've read, all references
to non-modulo-int or non-2's-complement machines were either historical
or described machines that offered the ability to choose overflow mode.

Is it reasonable to say that de facto, all of today's machines offer 2's
complement int and modulo overflow?

Taking this further: if the C++ standard were defined today, would it be
better to rely on 2's complement and modulo throughout?


Andrei

Bart van Ingen Schenau

unread,
Jan 10, 2008, 5:07:36 PM1/10/08
to
Walter Bright wrote:

> Bart van Ingen Schenau wrote:
>> On the other hand, I have used a compiler for a DSP that supports
>> saturating arithmetic (overflow gets clipped to the largest value).
>> Although the compiler writers decided otherwise, this mode of
>> arithmetic could have been selected to be used for signed operands
>> without any violation of the standard.
>
> The compiler writers made a wise move. Such an unusual mode could
> silently introduce pernicious bugs when porting existing, debugged
> code to it.

I don't think everyone would agree with that assessment.
One of the major fields where DSP's get used is in audio processing. In
that field, saturation actually gives better results than wraparound.

If we were to define the behaviour for overflow of signed integers, I
would much rather prefer that it is made implementation defined than
that one particular behaviour get chosen.

>
> Since there is no way to defend against such possible errors in one's
> code, and the overwhelming majority (dare I say all?) compilers
> implement it in one way, that way should be standardized.

As long as there are niche markets where some other behaviour gives
better results, we should allow the compiler writers the choice of what
they implement.

And if the behaviour is implementation defined, you can test for the
behaviour that is provided:

#include <climits>
#include <iostream>

int main()
{
    int i = INT_MAX;
    int test = i + 1;

    if (test == INT_MIN)
    {
        std::cout << "Wraparound on overflow\n";
    }
    else if (test == INT_MAX)
    {
        std::cout << "Saturating arithmetic\n";
    }
    else
    {
        std::cout << "Weird. Is this compiler conforming?\n";
    }
}

> --------
> Walter Bright

Bart v Ingen Schenau
--
a.c.l.l.c-c++ FAQ: http://www.comeaucomputing.com/learn/faq
c.l.c FAQ: http://c-faq.com/
c.l.c++ FAQ: http://www.parashift.com/c++-faq-lite/

Walter Bright

unread,
Jan 10, 2008, 5:17:32 PM1/10/08
to
James Dennett wrote:
> Walter Bright wrote:
>> Francis Glassborow wrote:
>>> No, much, much worse; you have no assurance that you will get
>>> wrap-around on overflow of an int. You can get absolutely anything
>>> however whimsical because your program has entered UB land.
>>
>> I've never heard of a machine that did otherwise.
>
> But C and C++ implementors have, and so the standards
> deliberately support those machines. This is one of the
> advantages of having a diverse group working on a language
> (which offsets some of the *disadvantages*).

It's not a clear advantage to support those machines. It's not practical
to detect dependency on a particular overflow behavior, so what happens
when your code is ported is that you're at risk for serious bugs
*silently* being inserted into your otherwise working code.

C++ would better serve programmers by standardizing much of the
undefined behavior and catering to the needs of the 99.999% of C++
programmers out there, rather than some wacky, obsolete machine.

The interesting thing about this is that while one might think this
pulls the rug out from under such machines, in reality it does not.
There's nothing wrong with a compiler vendor for Wacky Obsolete CPU to
state that "This compiler is C++ Standard Conforming except for the
following behaviors ... 1) integer overflow is different in this manner
...." The problems facing the programmer for such a WOCPU won't be any
different than if the Standard allowed such unusual behavior, and it
would be better because at least (presumably) the documentation would
list the non-conforming behavior and the programmer can keep an eye out
for it.

For the other 99.999% of programmers, they have behavior they can rely
on. It's a win-win all around.

--------
Walter Bright
http://www.digitalmars.com
C, C++, D programming language compilers

--

Edward Rosten

unread,
Jan 10, 2008, 5:13:18 PM1/10/08
to
On Jan 9, 8:29 am, Walter Bright <wal...@digitalmars-nospamm.com>
wrote:

> Francis Glassborow wrote:
> > No, much, much worse; you have no assurance that you will get
> > wrap-around on overflow of an int. You can get absolutely anything
> > however whimsical because your program has entered UB land.
>
> I've never heard of a machine that did otherwise.

Slightly facetious, but you've never heard of Intel?

The MMX instruction set provides instructions like (written using MMX
intrinsics notation) _mm_adds_pi16, which adds with saturation.
There's no reason a compiler couldn't use this instruction. On the
other hand, are there machines which do not support wrap-around at
all? On a 2's complement machine, wraparound without a trap is used
for adding integers wider than the machine's word size, so it would be
quite strange if those CPUs didn't provide that facility.
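
For readers who have not met saturation, a minimal scalar sketch of
what an instruction like _mm_adds_pi16 does per 16-bit lane (an
illustration only, not the intrinsic itself; it assumes the usual case
where int is wider than short):

#include <climits>
#include <iostream>

short saturating_add(short a, short b)
{
    int sum = int(a) + int(b);            // wide enough, cannot overflow
    if (sum > SHRT_MAX) return SHRT_MAX;  // clip instead of wrapping
    if (sum < SHRT_MIN) return SHRT_MIN;
    return (short)sum;
}

int main()
{
    std::cout << saturating_add(30000, 10000) << '\n';  // prints 32767
}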

I suppose it depends if there are extant machines which use one's
complement.


-Ed
--
(You can't go wrong with psycho-rats.)(http://mi.eng.cam.ac.uk/~er258)

/d{def}def/f{/Times s selectfont}d/s{11}d/r{roll}d f 2/m{moveto}d -1
r 230 350 m 0 1 179{ 1 index show 88 rotate 4 mul 0 rmoveto}for/s 12
d f pop 235 420 translate 0 0 moveto 1 2 scale show showpage

Greg Herlihy

unread,
Jan 11, 2008, 2:03:12 AM1/11/08
to
On Jan 9, 3:44 pm, "Andrei Alexandrescu (See Website For Email)"
<SeeWebsiteForEm...@erdani.org> wrote:

> Greg Herlihy wrote:
>
> > Overflow is always possible whenever a type represents a finite number
> > of values drawn from an infinite set. Note also that consequences of
> > integer overflow is undefined in C++ (although most implementations
> > ignore the condition). Moreover, the fact that signed arithmetic
> > overflow has no defined behavior proves that signed values are not
> > drawn from a finite set, just as the absence of overflow for unsigned
> > arithmetic proves that unsigned values in C++ are drawn from a finite
> > set.
>
> This might as well be the first time in history that someone convinced
> someone else of something on the Usenet, so - thanks a million.

This occasion marks a personal first as well - yours is the first
response to one of my USENET posts that ever expressed some measure of
agreement with what I had written. :-)

> I think I'm getting what you say. Let me see if I understand your point
> correctly.
>
> * Unsigned has defined behavior for all values and all operations, save
> for division by zero.

Yes. I would say that unsigned values in C++ form a "closed" set of
values, such that all (defined) arithmetic operations within this
closed set yield a value that is a member of this set. Signed values
in C++ form an "open" set of values; so - even though all (defined)
operations with signed types yield a member of this open set - not all
members of the set can be represented by a signed type. And whenever a
calculation yields one of these un-representable values, overflow is
said to have occurred.

> * Int does NOT enjoy that property; if it were guaranteed to use 2's
> complement representation, it WOULD have. That was the fatal flaw in my
> reasoning - I thought C++ always uses 2's complement for int.

Even 2's complement representation does not guarantee that signed
operations will wrap. A C++ compiler could generate "add with
overflow" instructions for signed types, and generate "add - no
overflow" instructions for unsigned types and then trap on the
overflow condition flag.

As an aside, C# has two keywords "checked" and "unchecked" that let
the programmer specify whether arithmetic overflow should be ignored -
or cause an exception to be thrown - for a specific calculation or
block of calculations.

> * When doing a mixed-sign operation, there were two possible choices:
> (a) convert both to unsigned and always obtain a well-defined result, or
> (b) convert both to int and step into undefined behavior. Clearly, the
> better choice was (a).

If one of the operands in an expression is unsigned, then the
arithmetic operation is usually unsigned. Granted, C++ does not
strictly adhere to this principle. For example, a signed long operand
trumps an unsigned int operand (but only if a long is able to
represent all the values of an unsigned int - oh well, so much for
portability). Otherwise, all things being equal, the use of an
unsigned type in an expression is usually "infectious" - propagating
unsignedness throughout the entire expression in which it appears.
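
A small sketch of that "infection"; the printed value assumes a 32-bit
int, and the typeid name is implementation-specific:

#include <iostream>
#include <typeinfo>

int main()
{
    int i = -2;
    unsigned u = 1;
    std::cout << i + u << '\n';                 // i converts to unsigned:
                                                // prints 4294967295, not -1
    std::cout << typeid(i + u).name() << '\n';  // e.g. "j" (unsigned int)
                                                // with g++
}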

Greg

Walter Bright

unread,
Jan 11, 2008, 2:00:56 AM1/11/08
to
Bart van Ingen Schenau wrote:
> Walter Bright wrote:
>
>> Bart van Ingen Schenau wrote:
>>> On the other hand, I have used a compiler for a DSP that supports
>>> saturating arithmetic (overflow gets clipped to the largest value).
>>> Although the compiler writers decided otherwise, this mode of
>>> arithmetic could have been selected to be used for signed operands
>>> without any violation of the standard.
>> The compiler writers made a wise move. Such an unusual mode could
>> silently introduce pernicious bugs when porting existing, debugged
>> code to it.
>
> I don't think everyone would agree with that assessment.
> One of the major fields where DSP's get used is in audio processing. In
> that field, saturation actually gives better results than wraparound.

I agree that makes perfect sense for someone writing code specifically
for that DSP.

But that doesn't help you when you're porting an existing, working code
base to it. Let's say you're porting an MP3 compressor that does heavy
integer manipulation. You have the source code, but have no idea how it
works, nor do you care. You compile it, and it doesn't work. Now what?
You've got a major investment of your time ahead of you.


> If we were to define the behaviour for overflow of signed integers, I
> would much rather prefer that it is made implementation defined than
> that one particular behaviour get chosen.

That doesn't help anyone who is faced with code that breaks when ported,
nor does it help anyone test to see if they are dependent on such
implementation defined behavior.

Not only that, if one were to write code that *relied* on that DSP's
behavior, that code becomes inherently non-portable, so why must this be
standardized? What advantage is there for anyone?


>> Since there is no way to defend against such possible errors in one's
>> code, and the overwhelming majority (dare I say all?) compilers
>> implement it in one way, that way should be standardized.
>
> As long as there are niche markets where some other behaviour gives
> better results, we should allow the compiler writers the choice of what
> they implement.

This is not a law. There's nothing preventing a compiler vendor from
having non-standard behavior that's specific to a particular niche and
is specifically there to aid programmers for that niche. Compiler
vendors do it all the time, in fact.


> And if the behaviour is implementation defined, you can test for the
> behaviour that is provided:
>
> int main()
> {
>     int i = INT_MAX;
>     int test = i + 1;
>
>     if (test == INT_MIN)
>     {
>         std::cout << "Wraparound on overflow\n";
>     }
>     else if (test == INT_MAX)
>     {
>         std::cout << "Saturating arithmetic\n";
>     }
>     else
>     {
>         std::cout << "Weird. Is this compiler conforming?\n";
>     }
> }

I agree it's easy enough to detect what behavior is there, but I contend
it is impractical (or even impossible?) to mechanically detect whether
any particular section of code depends on particular integer overflow
behavior or not.

I propose that for such types of behavior, it is better for the standard
to standardize it as much as possible, so programmers do not have to
worry about bugs that cannot be detected. Or even worse, implement
"portable" solutions that don't work because the programmer doesn't have
a machine to test it on (it was common in the 16 bit days for people to
write code that was "portable" to 32 bits only to find when 32 bit
machines became available that they'd misunderstood the issues completely).

--------
Walter Bright
http://www.digitalmars.com
C, C++, D programming language compilers

--

Walter Bright

unread,
Jan 11, 2008, 2:59:49 PM1/11/08
to
Greg Herlihy wrote:
> Moreover, at least one C++ compiler, g++ 4.0, performs optimizations
> based on the knowledge that signed arithmetic operations are not
> guaranteed to wrap around like their unsigned counterparts. In fact,
> to disable these optimizations (and force g++ to assume that signed
> operations do, in fact, wrap), the programmer has to pass the "-
> fwrapv" command line switch to the g++ compiler.

Can you give an example of what those optimizations are?

James Dennett

unread,
Jan 11, 2008, 2:55:18 PM1/11/08
to
Walter Bright wrote:
> James Dennett wrote:
>> Walter Bright wrote:
>>> Francis Glassborow wrote:
>>>> No, much, much worse; you have no assurance that you will get
>>>> wrap-around on overflow of an int. You can get absolutely anything
>>>> however whimsical because your program has entered UB land.
>>>
>>> I've never heard of a machine that did otherwise.
>>
>> But C and C++ implementors have, and so the standards
>> deliberately support those machines. This is one of the
>> advantages of having a diverse group working on a language
>> (which offsets some of the *disadvantages*).
>
> It's not a clear advantage to support those machines. It's not practical
> to detect dependency on a particular overflow behavior, so what happens
> when your code is ported is that you're at risk for serious bugs
> *silently* being inserted into your otherwise working code.
>
> C++ would better serve programmers by standardizing much of the
> undefined behavior and catering to the needs of the 99.999% of C++
> programmers out there, rather than some wacky, obsolete machine.

That's subjective: C and C++ have opted *not* to limit themselves to
mainstream architectures, and are probably more widespread than any
other languages partly as a result of that.

There are certainly advantages in simplicity to restricting
a language to supporting only more "normal" architectures,
and if I were designing a language I'd do as you did with D,
and assume 2's complement, word sizes being powers of 2, and
so on. That doesn't mean that C or C++ made a "wrong" choice,
it just means that their design goals aren't the same.

> The interesting thing about this is that while one might think this
> pulls the rug out from under such machines, in reality it does not.
> There's nothing wrong with a compiler vendor for Wacky Obsolete CPU to
> state that "This compiler is C++ Standard Conforming except for the
> following behaviors ... 1) integer overflow is different in this manner
> ...."

Exceptions to conformance are major issues for those who write
widely ported code (which is many of the organizations for
which I've worked). Working around non-compliance is a very
expensive game.

> The problems facing the programmer for such a WOCPU won't be any
> different than if the Standard allowed such unusual behavior

That's false. It would require special knowledge of that system,
hence every such system, whereas currently knowledge of just one
document, the C++ standard, suffices. That's the strength of
having a standard.

> and it
> would be better because at least (presumably) the documentation would
> list the non-conforming behavior and the programmer can keep an eye out
> for it.
>
> For the other 99.999% of programmers, they have behavior they can rely
> on. It's a win-win all around.

Your 99.999% is rather optimistic, giving only 10 per million who
need to care about such systems. Some languages are needed to cater
for *very* portable code. D isn't such a language, and is not
intended to be. C and C++ are. Swings and roundabouts.

-- James

Francis Glassborow

unread,
Jan 11, 2008, 3:02:54 PM1/11/08
to
> Not only that, if one were to write code that *relied* on that DSP's
> behavior, that code becomes inherently non-portable, so why must this be
> standardized? What advantage is there for anyone?

As, for example, does code that assumes a 32-bit int. It is part of the
design philosophy of both C and C++ to support all reasonable hardware
designs and not endeavour to force hardware to conform to the
expectations of the language.

If you want fully portable code you do not use either C or C++. However
a competent programmer knows what the hardware requirements are for the
code he is writing and will check that they are met.

Anyone who simply re-compiles code for different hardware without
reading the documentation (which should specify what it was written for
and what it has been tested on) deserves a bit of pain :) OTOH using
third party software that is inadequately documented also deserves pain
:) If the original programmer cannot take the trouble to document
his/her code then why would you expect it to have sufficient quality to
be useful?

Francis Glassborow

unread,
Jan 11, 2008, 3:02:31 PM1/11/08
to
Bart van Ingen Schenau wrote:
> Walter Bright wrote:
>
>> Bart van Ingen Schenau wrote:
>>> On the other hand, I have used a compiler for a DSP that supports
>>> saturating arithmetic (overflow gets clipped to the largest value).
>>> Although the compiler writers decided otherwise, this mode of
>>> arithmetic could have been selected to be used for signed operands
>>> without any violation of the standard.
>> The compiler writers made a wise move. Such an unusual mode could
>> silently introduce pernicious bugs when porting existing, debugged
>> code to it.
>
> I don't think everyone would agree with that assessment.
> One of the major fields where DSP's get used is in audio processing. In
> that field, saturation actually gives better results than wraparound.
>


And that is also the case for image processing, saturating on a colour
makes much more sense than wrap-around.

There are numerous fields where saturated arithmetic makes more sense
than wrap-around (i.e. modulus) and where hardware is designed to work
in such fields it makes sense to allow C and C++ implementations support
such behaviour.

However I will continue to say that the current UB is not sensible even
if some experts want to justify it on the grounds that there are or have
been processors that raise a hardware exception on integer overflow.

--

Edward Rosten

unread,
Jan 11, 2008, 10:57:41 PM1/11/08
to
On Jan 10, 3:07 pm, Bart van Ingen Schenau <b...@ingen.ddns.info>
wrote:

> > Since there is no way to defend against such possible errors in one's
> > code, and the overwhelming majority (dare I say all?) compilers
> > implement it in one way, that way should be standardized.
>
> As long as there are niche markets where some other behaviour gives
> better results, we should allow the compiler writers the choice of what
> they implement.

In some ways. If all CPUs can provide integer arithmetic with
wraparound, then I personally think it would be worth standardising
on, so consistent behaviour can be achieved. If, however saturated
arithmetic is sufficiently useful (it probably is), then it can be
provided as an additional type, or a std library type. Compiler
writers are free to implement saturating arithmetic in the most
efficient manner possible (eg by using builtin types, inline assembly
or whatever).

Does this way have any significant disadvantages?

-Ed

--

Greg Herlihy

unread,
Jan 12, 2008, 5:17:31 AM1/12/08
to
On Jan 11, 11:59 am, Walter Bright <wal...@digitalmars-nospamm.com>
wrote:

> Greg Herlihy wrote:
> > Moreover, at least one C++ compiler, g++ 4.0, performs optimizations
> > based on the knowledge that signed arithmetic operations are not
> > guaranteed to wrap around like their unsigned counterparts. In fact,
> > to disable these optimizations (and force g++ to assume that signed
> > operations do, in fact, wrap), the programmer has to pass the "-
> > fwrapv" command line switch to the g++ compiler.
>
> Can you give an example of what those optimizations are?

Yes. For example, a C++ compiler is currently free to reduce an
expression like "(2 * s)/2" to "s" - whenever s is a signed integer
type. (Note that there are endless variations of this optimization:
replacing "(4 * x)/2" with "2 * x", for example). Because signed
overflow is undefined, the compiler can assume that (2 * s) does not
overflow. Therefore the compiler can also assume that "(2 * s)/2"
equals "s". In the event that 2 * s -does- overflow, however, the full
expression would then have no defined value. So in the event of "(2 *
s)" overflowing, returning "s" as the result of "(2 * s)/2" works just
as well as returning any other value.

Now, if signed overflow were to "wrap" instead of being undefined,
then a C++ compiler would no longer be able to substitute "s" for
"(2 * s)/2". Instead, the compiler would actually have to perform the
multiplication and division in order to obtain the correct result.
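
In code, the transformation being described looks like this (a sketch;
whether a given compiler performs it depends on version and flags):

int f(int s)
{
    return (2 * s) / 2;  // a conforming compiler may emit just "return s;"
}

// With wrapping semantics the substitution would be wrong: on a 32-bit
// int, f(0x40000001) would wrap 2*s to a negative value, and the
// division would then yield -1073741823 rather than 0x40000001.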

Here's another example of an optimization that relies on signed
overflow being undefined:

int f(int x)
{
    return 0x7ffffff0 < x && x + 32 < 0x7fffffff;
}

The gcc compiler, for example, will optimize f() down to a single
"return 0;" statement. The logic of this optimization is
straightforward: if x is greater than 0x7ffffff0 then x+32 must yield
an integer value greater than 0x7fffffff - or (if the addition
overflows) an undefined value. In no case, therefore, does x+32 ever
have to yield a value less than 0x7fffffff, so the right hand side of
the logical "and" expression never needs to be true - for any value of
"x".

If signed overflow were to "wrap around" however, then x+32 could
yield a negative integer - an integer value which would obviously be
less than 0x7fffffff. Therefore, in this hypothetical situation, f()
could in fact return "1" as its result. Therefore signed overflow has
to be undefined in order for f() to return 0 for all values of x.

Along the same lines, gcc optimizes this routine:

int f()
{
    int i;
    int j = 0;
    for (i = 1; i > 0; i += i)
        ++j;
    return j;
}

into an infinite loop. Because no matter how many times "i" is
doubled, the value of "i" will either remain positive - or become
undefined. So the programmer should have no expectation that "i" will
ever have the negative value that is required to terminate the "for"
loop.

I was able to find more technical optimizations that depend on signed
overflow being undefined. Not being an expert in compiler
optimizations, I will not attempt to explain them here. I will instead
refer to the discussion found in "Advances in Computer Systems
Architecture" on page 241. A preview of this book is available online
from Google Books at:

http://books.google.com/books?id=Zo0KRa-22ggC&printsec=frontcover

Greg

Walter Bright

unread,
Jan 12, 2008, 7:12:33 AM1/12/08
to
Francis Glassborow wrote:
> As, for example, does code that assumes a 32-bit int. It is part of the
> design philosophy of both C and C++ to support all reasonable hardware
> designs and not endeavour to force hardware to conform to the
> expectations of the language.

Compiler design and language design co-evolve with CPU design. I submit
as proof of that the obsolescence of 36 bit machines, 10 bit bytes,
non-IEEE arithmetic, EBCDIC, special "pascal" CPU instructions, BCD
opcodes, etc. I submit as proof the evolution of orthogonal register
sets, opcodes for the simple setup/teardown of stack frames, etc.


> If you want fully portable code you do not use either C or C++. However
> a competent programmer knows what the hardware requirements are for the
> code he is writing and will check that they are met.

Of course. The only problem is the shortage of competent C++
programmers, and difficulty of recognizing them when you run across one.
And frankly I'd want my (rare and expensive) competent programmers
working on more productive things than trying to wring all the UB out of
the code. I'd rather *define* UB out of existence. Voila, problem gone.


> Anyone who simply re-compiles code for different hardware without
> reading the documentation (which should specify what it was written for
> and what it has been tested on) deserves a bit of pain :)
> OTOH using
> third party software that is inadequately documented also deserves pain
> :)
> If the original programmer cannot take the trouble to document
> his/her code then why would you expect it to have sufficient quality to
> be useful?

I should think you'd be pushing to remove const, static type checking,
etc., from C++. Because, after all, if programmers would only get off
their lazy rears and become competent, they wouldn't need those crutches
and defenses against poor documentation.

--------
Walter Bright
http://www.digitalmars.com
C, C++, D programming language compilers

--

Walter Bright

unread,
Jan 12, 2008, 7:12:44 AM1/12/08
to
James Dennett wrote:
> Walter Bright wrote:
>> James Dennett wrote:
>>> Walter Bright wrote:
>> C++ would better serve programmers by standardizing much of the
>> undefined behavior and catering to the needs of the 99.999% of C++
>> programmers out there, rather than some wacky, obsolete machine.
>
> That's subjective: C and C++ have opted *not* to limit itself to
> mainstream architecture, and are probably more widespread than any
> other languages partly as a result of that.

On the other hand, Java went the route of eliminating undefined and
implementation defined behavior, and has had astonishing success
(largely at C++'s expense).

Look at any non-trivial C++ app - it's larded up with #ifdef's for
various local variations. Aiming to reduce the need for these will make
C++ more portable, reliable and useful.


> There are certainly advantages in simplicity to restricting
> a language to supporting only more "normal" architectures,
> and if I were designing a language I'd do as you did with D,
> and assume 2's complement, word sizes being powers of 2, and
> so on.

The D programming language does nail down many UBs and IDBs:

1) integer sizes are fixed
2) bytes are 8 bit
3) source code set is unicode
4) floating point is IEEE
5) char's are unsigned

I intend to take this further and define the order of evaluation of
expressions, too. D won't eliminate all UB and IDB, because things like
defining endianness are not practical, but it will go as far as it can.
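
A classic instance of the evaluation-order issue in question; the line
below is undefined in C++98/03, whereas a fixed order would give it one
portable meaning:

#include <iostream>

int main()
{
    int i = 0;
    int a[2] = { 10, 20 };
    int r = a[i] + i++;  // reads and modifies i with no sequence point
                         // between them: undefined behavior in C++98/03
    std::cout << r << '\n';
}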

I'm old enough to have programmed on 36 bit PDP-10s, processors with 10
bit bytes, DOS near/far/ss programming, and EBCDIC. But those machines
are dead. Nobody is designing new nutburger machines.

The advantages you cite aren't simplicity - they are:

1) portability
2) robustness
3) predictability
4) reliability
5) correctness

The larger a system one is working on, the more important these become.
Unless you can mechanically detect reliance on UB or IDB, you by
definition cannot have a reliable or portable program.

> That doesn't mean that C or C++ made a "wrong" choice,
> it just means that their design goals aren't the same.

Whenever doing an update to the standard, it is worthwhile revisiting
old design goals and assumptions to see if they still make sense. I
contend that supporting such other integer arithmetic no longer makes
sense, and am fairly convinced that it never did.

I view a C++ compiler as being a tool that either is useful or it is
not. That is not quite the same as if it is standard conforming or not -
standards conformance is only worthwhile if it serves the need for
making a compiler useful.

If on Wacky CPU X, integer arithmetic is not 2's complement, and the
standard says it must be, does that mean one cannot implement C++ on
that platform? No, it means one can still implement a variant of C++ on
that platform. This variant will be *no less useful* than the current
situation of undefined integer behavior. For all the other 99.999% of
the programmers out there, C++ is *more useful* because the integer
arithmetic will be more portable and reliable.

Every useful C++ compiler in the DOS world had extensions to C++
specifically for that platform, C++ compilers without those extensions
were useless toys, and support for DOS was what gave C++ critical mass
to succeed. Furthermore, some C++ features are impractical on DOS -
exception handling and templates. A useful C++ compiler for DOS will
disable those standard features.


>> The interesting thing about this is that while one might think this
>> pulls the rug out from under such machines, in reality it does not.
>> There's nothing wrong with a compiler vendor for Wacky Obsolete CPU to
>> state that "This compiler is C++ Standard Conforming except for the
>> following behaviors ... 1) integer overflow is different in this
>> manner ...."
>
> Exceptions to conformance are major issues for those who write
> widely ported code (which is many of the organizations for
> which I've worked).

The *exact same* issue exists if the standard says "UB" for integer
overflow. All the standard is doing here is dumping the problem off onto
the user, the problem does not go away. It does not aid the user by
allowing a program to "launch nuclear missiles" to be standard
conforming behavior upon integer overflow.


> Working around non-compliance is a very expensive game.

Working around UB and IDB is just as expensive, and I'd argue it's more
expensive because:

1) reliance on UB or IDB is not mechanically detectable, making programs
*inherently* unreliable

2) one cannot rely on what the compiler does from machine to machine,
version to version, or even compiler switch to compiler switch

How much effort have you seen, time and again, going into dealing with
the implementation defined size of an int? Everybody deals with it, 90%
of them get it wrong, and nobody solves it the same way as anybody else.
How is this not very expensive?

>> The problems facing the programmer for such a WOCPU won't be any
>> different than if the Standard allowed such unusual behavior
>
> That's false. It would require special knowledge of that system,
> hence every such system, whereas currently knowledge of just one
> document, the C++ standard, suffices. That's the strength of
> having a standard.

As this thread has demonstrated, even C++ experts do not know this
corner of the language spec. Worse, workarounds for this issue are
difficult and rarely discussed. And disastrously, reliance on such UB
behavior cannot be mechanically detected.

Conversely, defining the behavior means that one does not have to know
how other systems work. The less UB and IDB, the easier the porting
gets, reducing costs.

For another example, can you guarantee your C++ programs aren't
dependent on the signedness of 'char'? How is knowledge of the standard
going to help you with this? I can guarantee you from decades of
experience with this, that you can understand every detail of the spec
and very carefully not depend on the sign of 'char', yet until you
actually try out your code on a compiler with a different sign, you have
no idea if your code will work or not.
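
A tiny example of the char-signedness trap; both outputs are
conforming, and which one you get is implementation defined:

#include <iostream>

int main()
{
    char c = '\xFF';
    // prints -1 where plain char is signed, 255 where it is unsigned
    std::cout << static_cast<int>(c) << '\n';
}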

UB and IDB are not strengths of the standard. They are costly weaknesses.

>> and it would be better because at least (presumably) the documentation
>> would list the non-conforming behavior and the programmer can keep an
>> eye out for it.
>>
>> For the other 99.999% of programmers, they have behavior they can rely
>> on. It's a win-win all around.
>
> Your 99.999% is rather optimistic, giving only 10 per million who
> need to care about such systems. Some languages are needed to cater
> for *very* portable code. D isn't such a language, and is not
> intended to be. C and C++ are. Swings and roundabouts.

My experience porting D code between platforms is it ports easier than
the equivalent C++ code. UB means a program operates in an unpredictably
different way on different platforms. I agree that this makes the
language *spec* more portable, but I disagree that it makes language
source code more portable.

--------
Walter Bright
http://www.digitalmars.com
C, C++, D programming language compilers

--

James Dennett

unread,
Jan 12, 2008, 6:29:25 PM1/12/08
to
Walter Bright wrote:
> James Dennett wrote:
>> Walter Bright wrote:
>>> James Dennett wrote:
>>>> Walter Bright wrote:
>>> C++ would better serve programmers by standardizing much of the
>>> undefined behavior and catering to the needs of the 99.999% of C++
>>> programmers out there, rather than some wacky, obsolete machine.
>>
>> That's subjective: C and C++ have opted *not* to limit itself to
>> mainstream architecture, and are probably more widespread than any
>> other languages partly as a result of that.
>
> On the other hand, Java went the route of eliminating undefined and
> implementation defined behavior, and has had astonishing success
> (largely at C++'s expense).

Some of it has been as C++'s expense. But that's fine. A
monoculture isn't the best solution.

> Look at any non-trivial C++ app - it's larded up with #ifdef's for
> various local variations.

Some avoid #ifdefs by using other approaches, but it's true that
variations for different platforms are the norm. (They exist in
Java code too, when it integrates well with a given platform.)

> Aiming to reduce the need for these will make
> C++ more portable, reliable and useful.

In some senses of the terms, yes, and in some, no.

>
>> There are certainly advantages in simplicity to restricting
>> a language to supporting only more "normal" architectures,
>> and if I were designing a language I'd do as you did with D,
>> and assume 2's complement, word sizes being powers of 2, and
>> so on.
>
> The D programming language does nail down many UBs and IDBs:
>
> 1) integer sizes are fixed
> 2) bytes are 8 bit
> 3) source code set is unicode
> 4) floating point is IEEE
> 5) char's are unsigned
>
> I intend to take this further and define the order of evaluation of
> expressions, too. D won't eliminate all UB and IDB, because things like
> defining endianness are not practical, but it will go as far as it can.
>
> I'm old enough to have programmed on 36 bit PDP-10s, processors with 10
> bit bytes, DOS near/far/ss programming, and EBCDIC. But those machines
> are dead. Nobody is designing new nutburger machines.

DSPs are far from dead, and many don't have support for 8-bit
bytes in any reasonable fashion.

> The advantages you cite aren't simplicity - they are:
>
> 1) portability
> 2) robustness
> 3) predictability
> 4) reliability
> 5) correctness

No, please allow me to speak for myself. The advantage of which
I speak *was* simplicity. You are free to argue for others.

Simplicity is a huge consideration, and can enhance all of the
above. A simple language can also push complexity into the code.

> The larger a system one is working on, the more important these become.
> Unless you can mechanically detect reliance on UB or IDB, you by
> definition cannot have a reliable or portable program.

For suitable definitions. Fortunately in the real world it's
not hard for good programmers to have reliable and portable
programs.

>> That doesn't mean that C or C++ made a "wrong" choice,
>> it just means that their design goals aren't the same.
>
> Whenever doing an update to the standard, it is worthwhile revisiting
> old design goals and assumptions to see if they still make sense.

Indeed, but with an ISO Standard, millions of users, and untold
billions of lines of code there is a huge amount of inertia --
more than I'd like, and more than I'd have believed a decade ago.

> I contend that supporting such other integer arithmetic no longer makes
> sense, and am fairly convinced that it never did.

For C++, I'm not sure that I agree. For D, I do. Vive la
difference.

> I view a C++ compiler as being a tool that either is useful or it is
> not.

"Useful" is not a boolean attribute. There are gradations of
utility. Some compilers are more useful than others in particular
circumstances.

> That is not quite the same as if it is standard conforming or not -
> standards conformance is only worthwhile if it serves the need for
> making a compiler useful.

Indeed, standards conformance is a goal because of benefits it
can bring, not because it's valuable in an abstract sense.

[snip]

>>> The interesting thing about this is that while one might think this
>>> pulls the rug out from under such machines, in reality it does not.
>>> There's nothing wrong with a compiler vendor for Wacky Obsolete CPU
>>> to state that "This compiler is C++ Standard Conforming except for
>>> the following behaviors ... 1) integer overflow is different in this
>>> manner ...."
>>
>> Exceptions to conformance are major issues for those who write
>> widely ported code (which is many of the organizations for
>> which I've worked).
>
> The *exact same* issue exists if the standard says "UB" for integer
> overflow. All the standard is doing here is dumping the problem off onto
> the user, the problem does not go away. It does not aid the user by
> allowing a program to "launch nuclear missiles" to be standard
> conforming behavior upon integer overflow.

It allows for implementations which diagnose all overflows; it
allows for optimizations based on assumptions of non-overflow,
and diagnostics when such optimizations are made in ways that
could alter behaviour of code; it encourages implementations to
provide diagnostics for unsafe use. Wrapping semantics may be
well-defined, but are *NOT* always safe. Safety depends on what
the specification/requirements call for. Pretending that
modular arithmetic is always the right solution is simplistic.
Allowing for diagnostics of overflow in many senses can serve
a broader community better than oversimplifying. For smaller
communities, the simpler approach can be better.

>> Working around non-compliance is a very expensive game.
>
> Working around UB and IDB is just as expensive, and I'd argue it's more
> expensive because:
>
> 1) reliance on UB or IDB is not mechanically detectable, making programs
> *inherently* unreliable
>
> 2) one cannot rely on what the compiler from machine to machine, version
> to version, or even compiler switch to compiler switch
>
> How much effort have you seen, time and again, going into dealing with
> the implementation defined size of an int?

Very little; it's a trivial thing. Good programmers use a type
which is guaranteed to have the properties they need, so they
won't use unadorned int for anything more than a -32767 to +32767
range unless they know that their target implementations support
a larger range, they'll just use an int32_t-like type or a long,
or long long, as needed. Certainly I've seen mistakes made, but
I've seen mistakes made in languages with fixed-sized types too.
Ada seems to do better here than most (again).

> Everybody deals with it, 90%
> of them get it wrong, and nobody solves it the same way as anybody else.
> How is this not very expensive?

Your perspective/experience do not match mine. Many places deal
with it, most of them get it right, and most of them solve it in
very similar ways, moreso since C99 et al standardized typedefs
for various integral types. The expense is insignificant in all
competently run projects I've seen.

>>> The problems facing the programmer for such a WOCPU won't be any
>>> different than if the Standard allowed such unusual behavior
>>
>> That's false. It would require special knowledge of that system,
>> hence every such system, whereas currently knowledge of just one
>> document, the C++ standard, suffices. That's the strength of
>> having a standard.
>
> As this thread has demonstrated, even C++ experts do not know this
> corner of the language spec.

I noticed one expert who was surprised by it (which did
surprise me). But then even you make incorrect claims
about basic aspects of C and C++ on occasion -- it doesn't
always mean that things are too complicated, just that
people are (all) fallible.

> For another example, can you guarantee your C++ programs aren't
> dependent on the signedness of 'char'? How is knowledge of the standard
> going to help you with this? I can guarantee you from decades of
> experience with this, that you can understand every detail of the spec
> and very carefully not depend on the sign of 'char', yet until you
> actually try out your code on a compiler with a different sign, you have
> no idea if your code will work or not.

I can assure you from decades of experience that, while this
problem can exist, I've never suffered portability issues because
of it, because compilers have warned in any marginal situation,
and it's rare to use unadorned "char" for anything where
signedness matters. (Exception: <ctype.h>-related matters.)

> UB and IDB are not strengths of the standard. They are costly weaknesses.

An oversimplification. There are pros and cons to them. C++
has too much UB, but *most* of it is there for good reason, and
in many cases has allowed for the language to move forward while
*also* providing excellent stability.

>>> and it would be better because at least (presumably) the
>>> documentation would list the non-conforming behavior and the
>>> programmer can keep an eye out for it.
>>>
>>> For the other 99.999% of programmers, they have behavior they can
>>> rely on. It's a win-win all around.
>>
>> Your 99.999% is rather optimistic, giving only 10 per million who
>> need to care about such systems. Some languages are needed to cater
>> for *very* portable code. D isn't such a language, and is not
>> intended to be. C and C++ are. Swings and roundabouts.
>
> My experience porting D code between platforms is it ports easier than
> the equivalent C++ code.

Porting Java is easy too, if your target platform supports it.
It's true that writing portable C++ isn't trivial. However,
the most portable C++ code (or, even better, the most portable
C code) is *far* more portable than any Java code, simply because
implementations are viable for so many more platforms.

> UB means a program operates in an unpredictably
> different way on different platforms. I agree that this makes the
> language *spec* more portable, but I disagree that it makes language
> source code more portable.

Sometimes it does, sometimes it doesn't, for reasons I've somewhat
covered. This is not a trivial issue, and reducing it to soundbites
doesn't do it justice.

-- James

Pete Becker

unread,
Jan 12, 2008, 6:28:29 PM1/12/08
to
On 2008-01-12 01:12:44 -0500, Walter Bright
<wal...@digitalmars-nospamm.com> said:

>
> On the other hand, Java went the route of eliminating undefined and
> implementation defined behavior, and has had astonishing success
> (largely at C++'s expense).

Post hoc ergo propter hoc.

Programmers who did serious numeric computations hated Java in its
original incarnation, because the restrictions it imposed on
floating-point math made it abominably slow on Intel processors.


--
Pete
Roundhouse Consulting, Ltd. (www.versatilecoding.com) Author of "The
Standard C++ Library Extensions: a Tutorial and Reference"
(www.petebecker.com/tr1book)

Jerry Coffin

unread,
Jan 12, 2008, 6:42:16 PM1/12/08
to
In article <zIOdnW_qHabJjhXa...@comcast.com>,
wal...@digitalmars-nospamm.com says...

[ ... ]

> On the other hand, Java went the route of eliminating undefined and
> implementation defined behavior, and has had astonishing success
> (largely at C++'s expense).

Yes and no. To a large extent, undefined behavior is simply defining the
limits of what the standard attempts to cover. It's certainly true that
Java defines a _lot_ more of its environment than C++ does. OTOH, it's
also true that in many cases where C++ explicitly states that something
has undefined behavior, the Java standard (at least the last time I
looked at it) simply ignores the issue entirely or (more often) includes
some phrase like "IEEE floating point", that makes it sound like the
issue has been dealt with, but when you get down to it doesn't really
mean much at all.

> Look at any non-trivial C++ app - it's larded up with #ifdef's for
> various local variations. Aiming to reduce the need for these will make
> C++ more portable, reliable and useful.

While true, at least in my experience, this is primarily to deal with
entirely different issues, such as the simple fact that C and C++ only
standardize relatively small standard libraries. Even something as
simple as timing to higher precision than one second generally requires
selecting among half a dozen different platform-specific headers,
libraries, etc.
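
For instance, a sketch of the sort of #ifdef ladder that sub-second
timing forces on portable code (error handling omitted; these are the
usual platform APIs):

#if defined(_WIN32)
#include <windows.h>
double now_seconds()
{
    LARGE_INTEGER freq, count;
    QueryPerformanceFrequency(&freq);
    QueryPerformanceCounter(&count);
    return double(count.QuadPart) / double(freq.QuadPart);
}
#else
#include <sys/time.h>
double now_seconds()
{
    timeval tv;
    gettimeofday(&tv, 0);
    return tv.tv_sec + tv.tv_usec / 1e6;
}
#endif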

Java certainly has a _much_ larger standard library, but this has little
to do with undefined or implementation defined behavior.

[ ... ]

> 1) integer sizes are fixed
> 2) bytes are 8 bit
> 3) source code set is unicode
> 4) floating point is IEEE
> 5) char's are unsigned
>
> I intend to take this further and define the order of evaluation of
> expressions, too. D won't eliminate all UB and IDB, because things like
> defining endianness are not practical, but it will go as far as it can.

These requirements still limit practical portability, even among modern,
widely-used architectures. Quite a few DSPs and even some PDAs, cell
phones, etc., don't support any 8-bit type, just to give one example.

> I'm old enough to have programmed on 36 bit PDP-10s, processors with 10
> bit bytes, DOS near/far/ss programming, and EBCDIC. But those machines
> are dead. Nobody is designing new nutburger machines.

People are still designing and using machines that don't fit the
limitations above. In fact, I'd guess those limitations would cause
problems for the _majority_ of CPUs (though not for the CPUs in things
that are generally thought of as "computers").

[ ... ]

> The larger a system one is working on, the more important these become.
> Unless you can mechanically detect reliance on UB or IDB, you by
> definition cannot have a reliable or portable program.

If you rephrased that as "reliable _and_ portable", you'd at least have
a point. Programs that depend on UB or IDB can certainly be reliable as
long as portability isn't required.

> If on Wacky CPU X, integer arithmetic is not 2's complement, and the
> standard says it must be, does that mean one cannot implement C++ on
> that platform? No, it means one can still implement a variant of C++ on
> that platform. This variant will be *no less useful* than the current
> situation of undefined integer behavior. For all the other 99.999% of
> the programmers out there, C++ is *more useful* because the integer
> arithmetic will be more portable and reliable.

I can't say I agree. For most programmers most of the time, the only
interesting point is to ensure that the number of bits is sufficient
that overflow just doesn't happen. As long as that's the case, the
difference between one's complement and two's complement (for example)
is entirely irrelevant.

Nearly the only time anybody really cares is when writing an extended
precision integer library. You'd accomplish far more by defining (for
one example) the result of the remainder operator when dealing with
negative numbers.
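
For instance, in C++98 the results below are implementation-defined,
while C99 (and presumably C++0x) nail them down to truncation toward
zero:

int q = -7 / 2;   // -3 or -4 in C++98; -3 under C99 rules
int r = -7 % 2;   // -1 or  1 in C++98; -1 under C99 rules
// Either way the identity (a/b)*b + a%b == a must hold.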

> Every useful C++ compiler in the DOS world had extensions to C++
> specifically for that platform, C++ compilers without those extensions
> were useless toys, and support for DOS was what gave C++ critical mass
> to succeed. Furthermore, some C++ features are impractical on DOS -
> exception handling and templates. A useful C++ compiler for DOS will
> disable those standard features.

You dismiss quite a few architectures that are currently in _wide_ use
as "nutburger", but then you act as if _anybody_ cared about MS-DOS
anymore?

[ ... ]

> The *exact same* issue exists if the standard says "UB" for integer
> overflow. All the standard is doing here is dumping the problem off onto
> the user, the problem does not go away. It does not aid the user by
> allowing a program to "launch nuclear missiles" to be standard
> conforming behavior upon integer overflow.

The way you write, you'd think the average programmer spent a
substantial part of his time dealing with integer overflow. This just
isn't the case -- I can hardly remember the last time I wrote anything
where integer overflow was an issue at all. Limiting portability to deal
with something that isn't a problem to start with is a poor trade off.

> > Working around non-compliance is a very expensive game.
>
> Working around UB and IDB is just as expensive, and I'd argue it's more
> expensive because:
>
> 1) reliance on UB or IDB is not mechanically detectable, making programs
> *inherently* unreliable

You're overstating the situation. Certainly some reliance on some UB
> and/or IDB is mechanically detectable.

> 2) one cannot rely on what the compiler does from machine to machine, version
> to version, or even compiler switch to compiler switch

This is true only in a purely theoretical sense, and you know it. Yes, a
few things change with compiler switches, but 1) only a few, and 2)
compiler switches don't just happen randomly or by accident.

Yes, when you're writing a library that's intended to be portable to
anything anywhere under any circumstances, these can be major issues.
For most people writing end programs, the major issues are things like
keeping track of the make files to ensure that the correct compiler
switches get used on various machines -- and in most cases (at least
IME) these have little to do with UB or IDB and a great deal to do with
the fact that some compilers break perfectly well defined code under
certain circumstances (especially overeager optimization).

> How much effort have you seen, time and again, going into dealing with
> the implementation defined size of an int? Everybody deals with it, 90%
> of them get it wrong, and nobody solves it the same way as anybody else.
> How is this not very expensive?

I've seen a lot of effort put into it repeatedly, but I'd say over 99%
of the time, it's been entirely unnecessary from beginning to end. If C
and C++ made it even _more_ difficult to deal with, so people would
learn to keep it from being an issue at all, everybody would really be
better off most of the time.

In any case, C99 and C++ TR1 have both dealt with this for the rare
occasion that it really is an issue (and, unfortunately, made it still
easier to write size-dependent code when it's completely unnecessary).
C++ 0x will undoubtedly add this to the base C++ language as well. IMO,
this is almost certain to hurt portability, but at least those of us who
are competent can ignore it the majority of the time when it's
counterproductive; languages like Java and D don't even allow that.

> Conversely, defining the behavior means that one does not have to know
> how other systems work. The less UB and IDB, the easier the porting
> gets, reducing costs.

It gets easier, to a narrower range of targets. Outside that range of
targets, it becomes either drastically more difficult, or truly
impossible. Contrary to your previous claims, targets you see fit to
ignore have not gone away, nor are they likely to do so anytime soon.

> For another example, can you guarantee your C++ programs aren't
> dependent on the signedness of 'char'? How is knowledge of the standard
> going to help you with this? I can guarantee you from decades of
> experience with this, that you can understand every detail of the spec
> and very carefully not depend on the sign of 'char', yet until you
> actually try out your code on a compiler with a different sign, you have
> no idea if your code will work or not.

Oh come on. Typical compilers have had switches to control the
signedness of char for years.

> UB and IDB are not strengths of the standard. They are costly weaknesses.

They are not strengths or weaknesses -- they are simply boundaries.
Nothing more and nothing less. C and C++ are nearly unique in that
they make a serious attempt at specifying the boundaries of what they
do and don't define, whereas most other language specs simply ignore
the boundaries between what they do and don't define.

[ ... ]

> My experience porting D code between platforms is it ports easier than
> the equivalent C++ code. UB means a program operates in an unpredictably
> different way on different platforms. I agree that this makes the
> language *spec* more portable, but I disagree that it makes language
> source code more portable.

This seems to indicate little more than that you've ported code only
within a relatively small range. At least based on what you've said,
porting D code to (say) a Microchip PIC or any of a large number of DSPs
would be somewhere between excruciating and impossible.

Just for a few examples, try to find a reasonable way to support an
8-bit char in:

http://www.analog.com/UploadedFiles/Associated_Docs/352228244SHARC_getstart_online.pdf

or:

http://focus.ti.com/lit/ug/spru731/spru731.pdf

Note that these are not ancient "nutburger" architectures -- these are
both current and in _wide_ use. Just for an obvious example, the last
time I was in Costco, they had a brand new HD-DVD player that (on the
outside of the box!) bragged about using a SHARC processor.

--
Later,
Jerry.

The universe is a figment of its own imagination.

Walter Bright

unread,
Jan 13, 2008, 6:48:19 AM1/13/08
to
James Dennett wrote:
> DSPs are far from dead, and many don't have support for 8-bit
> bytes in any reasonable fashion.

I don't know much of anything about DSPs. But I know that many
specialized CPU chips tend to have specialized languages that come with
them, and that's perfectly reasonable. I don't think anyone wants to
recompile Office for a DSP, anyway :-)


> Fortunately in the real world it's
> not hard for good programmers to have reliable and portable
> programs.

I disagree. For example, I've never found a non-trivial C++ program that
would port successfully between 16 and 32 bits, even by expert
programmers (far better than just good ones), without doing some
adjustments and bug fixing.
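
A made-up fragment, but representative of what bites people in a
32-to-16-bit port:

int area(int width, int height)
{
    // Fine when int is 32 bits; with 16-bit ints,
    // 300 * 200 = 60000 silently overflows.
    return width * height;
}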

Changing endianness often breaks C++ code, as do changes in struct
member padding. Varying int sizes, and char signedness, also break
programs. The reason is simple: it is really hard to look at a piece
of code and verify it doesn't have these issues. It is impossible to
test for portability issues without actually porting the code.

I have a large piece of code that works for DMC++, VC++, and an older
g++. Upgrading to the latest g++ breaks it. It still compiles, it just
produces wrong answers. I don't know yet what went wrong, but portable
C++ ain't.


> It allows for implementations which diagnose all overflows;

Are there any such implementations?

> it allows for optimizations based on assumptions of non-overflow,

Apparently the new g++ does that, though my question on what those were
is so far unanswered. My experience with optimizations that change the
behavior is that customers call it an optimizer bug, even if the fault
lies with their reliance on UB.


> and diagnostics when such optimizations are made in ways that
> could alter behaviour of code; it encourages implementations to
> provide diagnostics for unsafe use.

I don't see any way to issue warnings on unsafe overflow use based on
static analysis of code.


> Wrapping semantics may be well-defined, but are *NOT* always safe.

I'm not arguing that they are safe. I'm saying that well-defined
semantics make code that, once tested, can be reliably ported.


> Safety depends on what
> the specification/requirements call for. Pretending that
> modular arithmetic is always the right solution is simplistic.

I'm not arguing that a specific behavior is always the right solution.
I'm arguing that undefined behavior is the wrong solution because it
is, by definition, not the "right" solution.


> Allowing for diagnostics of overflow in many senses can serve
> a broader community better than oversimplifying.

Does any C++ implementation diagnose integer overflow at runtime?


>> How much effort have you seen, time and again, going into dealing with
>> the implementation defined size of an int?
> Very little; it's a trivial thing. Good programmers use a type
> which is guaranteed to have the properties they need, so they
> won't use unadorned int for anything more than a -32767 to +32767
> range

There aren't very many "good" C++ programmers, then <g>.

> unless they know that their target implementations support
> a larger range, they'll just use an int32_t-like type or a long,
> or long long, as needed. Certainly I've seen mistakes made,

I was in the trenches when the big shift from 16 bit C++ to 32 bit C++
took place, and ints doubled in size. I can tell you for a fact that
various schemes for portably doing this were debated ad nauseam, and
that almost none of them actually worked when it became time to do the
real port. If you want an example, look no further than windows.h.

The converse was also true, C++ code developed for 32 bit machines
rarely ported to 16 bits without major effort, often a rewrite was required.

Even in D, people cannot seem to shake their C/C++ heritage in worrying
about the size of an int, and typedef it "just in case". I know I am
much happier with "int" than "int32_t". The latter just stinks. Sorry.

It's at least possible you aren't seeing actual problems with int sizes
these days because practically every C++ compiler sets them at 32 bits,
even for 64 bit CPUs. So you never know if your use of typedefs is
correct or not.


> but I've seen mistakes made in languages with fixed-sized types too.

So have I. But the question is how prevalent are such mistakes, versus
mistakes from the int sizes changing?


>> Everybody deals with it, 90%
>> of them get it wrong, and nobody solves it the same way as anybody else.
>> How is this not very expensive?
>
> Your perspective/experience do not match mine. Many places deal
> with it, most of the get it right, and most of them solve it in
> very similar ways, moreso since C99 et al standardized typedefs
> for various integral types. The expense is insignificant in all
> competently run projects I've seen.

I'll bet that in most of those competently run projects, the code has
never been ported to a compiler with different int sizes, so how good a
job they did has never been tested. I saw how well (i.e. badly) it
worked in the last big shift from 16 to 32 bit code.

Would you like to try porting one of the ones that get it right to 16
bits? I've got a beer that says it fails <g>.

And that brings us back to the fundamental problem with UB and IDB - how
do you *know* you did it right? It isn't testable.

Me, I'd rather define the problem out of existence, and have my good,
competent engineers working on something more worthy of their talents.


>> As this thread has demonstrated, even C++ experts do not know this
>> corner of the language spec.
> I noticed one expert who was surprised by it (which did
> surprise me). But then even you make incorrect claims
> about basic aspects of C and C++ on occasion -- it doesn't
> always mean that things are too complicated, just that
> people are (all) fallible.

Yes, although I've read every detail of the specs and have implemented
them, I sometimes mis-recall bits of it. I bet if I sat down and quizzed
you on arcane details of the spec, I'd find a way to trip you up, too.
The point of all this is that dismissing problems with the language by
saying that "good" or "competent" programmers won't trip over them is
not good enough. Humans, no matter how good they are, screw up now and
then. I view the job of the language designer as being, at least in
part, to make the design resistant to human failure.

For example, airplane pilots use checklists for everything. Is it
because they aren't good pilots? Absolutely not. They are good pilots
because they *use* the checklist, even though it seems silly. Even the
best pilots would (and have) made monumental mistakes like forgetting to
put gas in the tanks. Even though their very lives are forfeit, they
still make stupid mistakes.

I read an article recently about attempts to introduce checklists into
hospital procedures. The doctors are resisting because they feel
checklists are insulting and demeaning to their exalted expertise. The
reality is that hospitals that use checklists reduce mistakes by
something like 30% (I forgot the exact figure).

The best programmers are not gods. They make stupid mistakes, too. I
make them, you make them, Bjarne makes them. The "checklist" is the
compiler. The more the language can be designed so that mistakes get
caught by the compiler or in test, rather than being UB, the more
reliable software we can make.


> I can assure you from decades of experience that, while this
> problem can exist, I've never suffered portability issues because
> of it,

Do you use compilers that differ in the signedness of char? In the Windows world,
all the compilers, over time, gravitated towards using the same
signedness for char (signed) not because of happenstance, but because it
made real code more portable. I'm not in the least surprised that g++ on
x86 Linux also has chars signed. It's pretty easy to never actually
encounter a compiler with a different char sign, and hence have
undiscovered bugs.


> because compilers have warned in any marginal situation,

Warnings are a good sign that there's something wrong with the language
design. BTW, I just tried this:

int test(char c) { return c; }

with:

g++ -c foo.cpp -Wall

and it compiled without error or warning. (gcc-4.1)

> and it's rare to use unadorned "char" for anything where
> signedness matters.

That's because we try to avoid that like the plague.

> (Exception: <ctype.h>-related matters.)

String literals are char based (the sign gets you in hot water when
you're doing utf-8). The standard library (especially the C one) is
replete with char*. If you avoid using char, you wind up using a lot of
casts. It's not practical to avoid char.
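
To be concrete, a sketch of the trap (illustrative only):

const char* s = "\xC3\xA9";   // UTF-8 for e-acute; both bytes >= 0x80
if (s[0] < 0x80) {
    // Meant to detect a one-byte ASCII character, but this branch
    // is taken when plain char is signed: s[0] is -61 there, not 195.
}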

--------
Walter Bright
http://www.digitalmars.com
C, C++, D programming language compilers

--

Walter Bright

unread,
Jan 13, 2008, 6:48:54 AM1/13/08
to
Pete Becker wrote:

> On 2008-01-12 01:12:44 -0500, Walter Bright said:
>> On the other hand, Java went the route of eliminating undefined and
>> implementation defined behavior, and has had astonishing success
>> (largely at C++'s expense).
>
> Post hoc ergo propter hoc.

I agree I can't prove it. But neither can one prove James' remark:

"That's subjective: C and C++ have opted *not* to limit itself to
mainstream architecture, and are probably more widespread than any
other languages partly as a result of that."

C++ (with a few extensions) was remarkably well suited to writing apps
for DOS, and C++ rode the big surge in DOS and PCs up. I don't believe
C++'s success is due to it being supported on nutburger CPU designs.
Let's face it - the PDP-10 is dead.


> Programmers who did serious numeric computations hated Java in its
> original incarnation, because the restrictions it imposed on
> floating-point math made it abominably slow on Intel processors.

Yeah, I know about that. They went too tight with the floating point in
eliminating IDB, and have since backed off a turn of the screw. C++
could easily tighten down the screws several full turns, though. For
Bob's sake, why are chars still optionally signed?

I've read many diatribes against Java, from the ignorant to the
well-informed, and not one ever complained about the integer math behavior
being nailed down.

--------
Walter Bright
http://www.digitalmars.com
C, C++, D programming language compilers

--

Bo Persson

unread,
Jan 13, 2008, 6:43:14 AM1/13/08
to
James Dennett wrote:
> Walter Bright wrote:
>>
>> The D programming language does nail down many UBs and IDBs:
>>
>> 1) integer sizes are fixed
>> 2) bytes are 8 bit
>> 3) source code set is unicode
>> 4) floating point is IEEE
>> 5) char's are unsigned
>>
>> I intend to take this further and define the order of evaluation of
>> expressions, too. D won't eliminate all UB and IDB, because things
>> like defining endianness are not practical, but it will go as far
>> as it can. I'm old enough to have programmed on 36 bit PDP-10s,
>> processors
>> with 10 bit bytes, DOS near/far/ss programming, and EBCDIC. But
>> those machines are dead. Nobody is designing new nutburger
>> machines.

Except IBM. :-)

On the other hand, requiring IEEE floating point has already
disqualified the EBCDIC systems.

[...]

>> My experience porting D code between platforms is it ports easier
>> than the equivalent C++ code.

Porting is easier if you limit the number of potential platforms.

>
> Porting Java is easy too, if your target platform supports it.

Porting Java is hard, if you haven't ported its platform first!

We had a discussion just last week with a Java developer on reusing
his web server code on the mainframe.

- "Oh dear! That's just Java 1.5, I need 1.6 generics for my code.
Limiting myself to 1.5 features will cost you a lot more!"

> It's true that writing portable C++ isn't trivial. However,
> the most portable C++ code (or, even better, the most portable
> C code) is *far* more portable than any Java code, simply because
> implementations are viable for so many more platforms.

Porting is easier if you limit the number of potential platforms. :-)


Bo Persson

Walter Bright

unread,
Jan 13, 2008, 6:52:05 AM1/13/08
to
Jerry Coffin wrote:
> the Java standard (at least the last time I
> looked at it) simply ignores the issue entirely or (more often) includes
> some phrase like "IEEE floating point", that makes it sound like the
> issue has been dealt with, but when you get down to it doesn't really
> mean much at all.

IEEE 754 floating point arithmetic means a lot more than nothing, and
certainly far more than C++ floating point, but you're right that it
doesn't nail it down 100%.


> Java certainly has a _much_ larger standard library, but this has little
> to do with undefined or implementation defined behavior.

I agree with that.


>> 1) integer sizes are fixed
>> 2) bytes are 8 bit
>> 3) source code set is unicode
>> 4) floating point is IEEE
>> 5) char's are unsigned
>>
>> I intend to take this further and define the order of evaluation of
>> expressions, too. D won't eliminate all UB and IDB, because things like
>> defining endianness are not practical, but it will go as far as it can.
>
> These requirements still limit practical portability, even among modern,
> widely-used architectures. Quite a few DSP and even some PDAs, cell
> phones, etc., don't support any 8-bit type, just to give one example.

I disagree, because nothing would prevent a D *variant* from being
customized to unusual architectures. Such a variant would be no less
useful than simply dumping the problem off onto all users.

I suspect that this is an even better solution for those programming
such machines, because they won't be under the delusion that code that
has never been tested under such conditions was somehow "portably"
written under some wrong notion of portability.


> People are still designing and using machines that don't fit the
> limitations above. In fact, I'd guess those limitations would cause
> problems for the _majority_ of CPUs (though not for the CPUs in things
> that are generally thought of as "computers").

A more relevant question is how many programmers are programming for
these oddballs, vs programming for mainstream computers?

(I get asked now and then to produce a custom compiler for some oddball
CPU, but I ask for all the development money up front because I know
there is no market for such a compiler.)


>> The larger a system one is working on, the more important these become.
>> Unless you can mechanically detect reliance on UB or IDB, you by
>> definition cannot have a reliable or portable program.
>
> If you rephrased that as "reliable _and_ portable", you'd at least have
> a point. Programs that depend on UB or IDB can certainly be reliable as
> long as portability isn't required.

UB does not imply reliable or repeatable behavior, so any dependence on
UB is inherently unreliable _and_ unportable.


> I can't say I agree. For most programmers most of the time, the only
> interesting point is to ensure that the number of bits is sufficient
> that overflow just doesn't happen. As long as that's the case, the
> difference between one's complement and two's complement (for exmaple)
> is entirely irrelevant.

Most of the time, sure. Even 99% of the time. But when you've got a
million lines of code, suddenly even the obscure cases become probable.
And when you don't have a thorough test suite (who does?) how can you be
*sure* you don't have an issue there?

I'm very interested in building languages which can offer a high degree
of reliability. While D isn't a language that gets one there 100%, it
gets a lot closer than C++ does. I am a little surprised at the
resistance to improving C++ along these lines. It's not like pinning
down UB is going to break existing code - by definition, it won't.


> Nearly the only time anybody really cares is when writing an extended
> precision integer library. You'd accomplish far more by defining (for
> one example) the result of the remainder operator when dealing with
> negative numbers.

Did that, too, just forgot to mention it.


> You dismiss quite a few architectures that are currently in _wide_ use
> as "nutburger", but then you act as if _anybody_ cared about MS-DOS
> anymore?

The D programming language explicitly does not support 16 bit platforms.
That should leave no doubt about my position on that <g>. C++ was
designed to support it, and so is fair game for criticizing its
shortcomings in doing so.

I would vote for C++ to explicitly ditch 16 bit support. No problem there!


>> The *exact same* issue exists if the standard says "UB" for integer
>> overflow. All the standard is doing here is dumping the problem off onto
>> the user, the problem does not go away. It does not aid the user by
>> allowing a program to "launch nuclear missiles" to be standard
>> conforming behavior upon integer overflow.
>
> The way you write, you'd think the average programmer spent a
> substantial part of his time dealing with integer overflow. This just
> isn't the case -- I can hardly remember the last time I wrote anything
> where integer overflow was an issue at all.

I rarely worry about it either, but then again I've had several bugs due
to it. One in particular is in a storage allocator:

nbytes = dimension * element_size;

Crud, that overflows. I do lots of hash computations, too, which rely on
wraparound overflow.
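
Here's a sketch of the check that line needed (assuming size_t
operands; unsigned wraparound is at least well-defined, so the
overflow can be detected after the fact):

#include <cstddef>
#include <new>

void* alloc_array(std::size_t dimension, std::size_t element_size)
{
    std::size_t nbytes = dimension * element_size;
    // The unsigned multiply wraps on overflow, so dividing back
    // detects it:
    if (element_size != 0 && nbytes / element_size != dimension)
        return 0;   // overflowed; refuse the allocation
    return ::operator new(nbytes);
}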

> Limiting portability to deal
> with something that isn't a problem to start with is a poor trade off.

This may be the root of our different ideas: We have different
definitions of portability.

You (if you don't mind me putting words in your mouth) define it as the
likelihood of if a program compiles on X that it will also compile on Y.
Whether it works or not depends on how good the programmer is.

I define it as the likelihood of if a program compiles *and* works on X
that it will also compile *and* work on Y, regardless of how good the
programmer is.


>> 1) reliance on UB or IDB is not mechanically detectable, making programs
>> *inherently* unreliable
> You're overstating the situation. Certainly some reliance on some UB
> and/or IDB is mechanically detectable.

Runtime integer overflow isn't.

>> 2) one cannot rely on what the compiler does from machine to machine, version
>> to version, or even compiler switch to compiler switch
>
> This is true only in a purely theoretical sense, and you know it.

In this thread, it was pointed out that new for g++ are optimizations
that change the behavior of integer overflow.

My own code has broken from one g++ version to the next from reliance on UB.


> Yes, a
> few things change with compiler switches, but 1) only a few, and 2)
> compiler switches don't just happen randomly or by accident.

A typical C++ compiler has a bewildering array of switches that change
its behavior.


> Yes, when you're writing a library that's intended to be portable to
> anything anywhere under any circumstances, these can be major issues.

They're major issues for people who need to develop reliable programs
such as, say, a flight control system, or banking software. Such
applications need more than reliance on "good" programmers and prayer.

If you're writing a game, who gives a darn if it fails now and then.


> For most people writing end programs, the major issues are things like
> keeping track of the make files to ensure that the correct compiler
> switches get used on various machines -- and in most cases (at least
> IME) these have little to do with UB or IDB and a great deal to do with
> the fact that some compilers break perfectly well defined code under
> certain circumstances (especially overeager optimization).

g++ 4.1 has 40 options that explicitly modify C++ language behavior.
That's 40 factorial interactions. I suspect there are more, like the
aforementioned integer optimizations.


>> How much effort have you seen, time and again, going into dealing with
>> the implementation defined size of an int? Everybody deals with it, 90%
>> of them get it wrong, and nobody solves it the same way as anybody else.
>> How is this not very expensive?
>
> I've seen a lot of effort put into it repeatedly, but I'd say over 99%
> of the time, it's been entirely unnecessary from beginning to end.

In D, the effort to deal with it is 0 because the problem is defined out
of existence.

> If C
> and C++ made it even _more_ difficult to deal with, so people would
> learn to keep it from being an issue at all, everybody would really be
> better off most of the time.

The way to make things more difficult is to make them compile time
errors. Then they cannot be avoided or overlooked. Ideally, if a program
compiles, then its output should be defined by the language.


> In any case, C99 and C++ TR1 have both dealt with this for the rare
> occasion that it really is an issue (and, unfortunately, made it still
> easier to write size-dependent code when it's completely unnecessary).

It's rarely an issue now because:

1) C++ compilers have dropped 16 bit support (and 16 bit ints).
2) 32 bit C++ compilers all use 32 bit ints.
3) 64 bit C++ compilers also use 32 bit ints.

In other words, C++ has de facto standardized around 32 bit ints.

> C++ 0x will undoubtedly add this to the base C++ language as well. IMO,
> this is almost certain to hurt portability, but at least those of us who
> are competent can ignore it the majority of the time when it's
> counterproductive; languages like Java and D don't even allow that.

In D, you can use a variable sized int if you want to:

typedef int myint;

and use myint everywhere instead of int. To change the size, change the
typedef. Nothing is taken away from you by fixing the size of int. It
just approaches it from the opposite direction:

C++: use int for variable sizes, typedef for fixed sizes
D: use int for fixed sizes, typedef for variable sizes
Java: doesn't have typedefs, oh well :-)


>> Conversely, defining the behavior means that one does not have to know
>> how other systems work. The less UB and IDB, the easier the porting
>> gets, reducing costs.
>
> It gets easier, to a narrower range of targets. Outside that range of
> targets, it becomes either drastically more difficult, or truly
> impossible.

How does it become harder or impossible?


> Contrary to your previous claims, targets you see fit to
> ignore have not gone away, nor are they likely to do so anytime soon.

I think 16 bit DOS and 36 bit PDP-10's are dead and are not likely to
rise from their graves.


>> For another example, can you guarantee your C++ programs aren't
>> dependent on the signedness of 'char'? How is knowledge of the standard
>> going to help you with this? I can guarantee you from decades of
>> experience with this, that you can understand every detail of the spec
>> and very carefully not depend on the sign of 'char', yet until you
>> actually try out your code on a compiler with a different sign, you have
>> no idea if your code will work or not.
> Oh come on. Typical compilers have had switches to control the
> signedness of char for years.

Yes, and Digital Mars C++ does, too. I know of nobody who actually tests
their code using those switches. You and I can argue that "good"
programmers will, but we both know they won't.

Most switches of that sort are of limited utility anyway because they
screw up the ABI with existing compiled libraries.


>> UB and IDB are not strengths of the standard. They are costly weaknesses.
>
> They are not strengths or weaknesses -- they are simply boundaries.
> Nothing more and nothing less. C and C++ are nearly unique only in the
> fact that they make a serious attempt at specifying the boundaries of
> what they do and don't define, whereas most other language specs simply
> ignore the boundaries between what they do and don't define.

I agree that C and C++ do an unusually good job at specifying the language.


> Just for a few examples, try to find a reasonable way to support an
> 8-bit char in:
>
> http://www.analog.com/UploadedFiles/Associated_Docs/352228244SHARC_getstart_online.pdf
>
> or:
>
> http://focus.ti.com/lit/ug/spru731/spru731.pdf
>
> Note that these are not ancient "nutburger" architectures -- these are
> both current and in _wide_ use. Just for an obvious example, the last
> time I was in Costco, they had a brand new HD-DVD player that (on the
> outside of the box!) bragged about using a SHARC processor.

Here's the C++ compiler for the sharc:

http://www.analog.com/UploadedFiles/Associated_Docs/75285036450_SHARC_cc_man.pdf

The C++ compiler for sharc has many sharc specific extensions. It isn't
hard to imagine a D variant that would do the same. You'd have the same
difficulties porting C++ code to the sharc C++ compiler as you would
porting standard D to sharc specific D.

As for the 32 bit sharc characters, they would map on to the "dchar" 32
bit D character type. I imagine a D for sharc would issue a compile
error on encountering a "char". At least the programmer then has a clue
he needs to use "dchar" instead, and perhaps double check the code using
that variable.

As for sharc "shorts" being 32 bits, that doesn't help you if your code
needs to address or manipulate 16 bit data. Just taking away the 16 bit
type doesn't magically make the code work, even if it is in C++ and
still compiles.

Again, just being able to compile the code doesn't mean it's portable.

Also, I wish to point out the difference between "wide use" meaning
there are a lot of CPUs in circulation and "wide use" meaning a lot of
programmers are writing code for it. Only one programmer might be
writing the sharc code that is stamped out into millions of HD-DVD
units. HD-DVDs in Costco gives no clue about how many programmers write
sharc code, other than it is greater than 0. On the other hand, I've
shipped several hundred thousand C++ compilers for Windows.

--------
Walter Bright
http://www.digitalmars.com
C, C++, D programming language compilers

--

kwikius

unread,
Jan 13, 2008, 3:59:33 PM1/13/08
to
On Jan 12, 11:42 pm, Jerry Coffin <jcof...@taeus.com> wrote:
> In article <zIOdnW_qHabJjhXanZ2dnUVZ_uWln...@comcast.com>,

> > How much effort have you seen, time and again, going into dealing with
> > the implementation defined size of an int? Everybody deals with it, 90%
> > of them get it wrong, and nobody solves it the same way as anybody else.
> > How is this not very expensive?
>
> I've seen a lot of effort put into it repeatedly, but I'd say over 99%
> of the time, it's been entirely unnecessary from beginning to end.

As an example, look at Boost's rational class: it effectively cubes
the problem, and it's trivial to get an invalid result (using built-in
ints) unless you (the programmer) work very hard. The rational class
simply ignores the issue and leaves you to do all the work. Why does
it do that? Because the C++ integer types are a mess that takes a lot
of work to deal with.

> If C
> and C++ made it even _more_ difficult to deal with, so people would
> learn to keep it from being an issue at all, everybody would really be
> better off most of the time.

:-) Love it !!! sand(head)->bury() :-)


> This seems to indicate little more than that you've ported code only
> within a relatively small range. At least based on what you've said,
> porting D code to (say) a Microchip PIC or any of a large number of DSPs
> would be somewhere between excruciating and impossible.

IIRC it's difficult to implement 8-bit integer math in a
standard-conforming way on an 8-bit PIC using fundamental types,
because the C standard requires conversion to an int (16 bits
minimum). Again IIRC, many Microchip compilers for the 8-bit PIC
don't convert 8-bit operands to 16 bits before the calculation; some
PICs have an 8 x 8 hardware multiply, so you can see their logic.
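
E.g., a sketch of the promotion rule in question:

unsigned char a = 200, b = 2;
int r = a * b;  /* both operands are promoted to int, so the
                   standard says r is 400; a compiler that
                   multiplies in 8 bits and wraps gives 144 */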

> Just for a few examples, try to find a reasonable way to support an
> 8-bit char in:
>
> http://www.analog.com/UploadedFiles/Associated_Docs/352228244SHARC_getstart_online.pdf

Again, some available instructions are not useful due to the semantics
of C ints, e.g. signed x unsigned multiply, IIRC.

There is a nice solution to all this. Make int (and family) an
optional typedef for some lower-level UDT (which must fit in the loose
boundaries of 'int' defined in the standard). If the standard
semantics are not useful (as in the 8-bit PIC example), simply don't
define the typedef. Where it is defined, if the chosen representation
is not implemented on that platform, it doesn't compile.

Generic algorithms for, e.g., int-like types can where necessary be
implemented using templates/Concepts, which leave the detailed
semantics to the types.

Int-like types can disallow expressions (e.g. signed + unsigned fails
to compile, or yields signed, or promotes internally to a safe
containing type and throws if the result is out of range, etc.; see
the sketch below).

OTOH, if the current int semantics are acceptable, they are there, and
can be customised within the confines of the standard spec, if
required.

IOW it is possible to have your cake and eat it. I believe the
mechanisms are all there, except the one to make fundamental types
optional, or to choose the representation.
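
A rough sketch of the sort of int-like UDT I mean (written with
C++0x deleted functions for brevity; names are illustrative):

struct Int {
    int value;
    explicit Int(int v) : value(v) {}
};

inline Int operator+(Int a, Int b) { return Int(a.value + b.value); }

// Mixed-sign expressions become compile-time errors:
Int operator+(Int, unsigned) = delete;
Int operator+(unsigned, Int) = delete;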

regards
Andy Little

--

Lance Diduck

unread,
Jan 13, 2008, 4:04:16 PM1/13/08
to
> > I think I'm getting what you say. Let me see if I understand your point
> > correctly.
>
> > * Unsigned has defined behavior for all values and all operations, save
> > for division by zero.
>
> Yes. I would say that unsigned values in c++ form a "closed" set of
> values, such that all (defined) arithmetic operations within this
> closed set, yield a value that is a member this set. Signed values in C
> ++ form an "open" set of values; so - even though all (defined)
> operations with signed types yield a member of this open set - not all
> members of the set can be represented by a signed type. And whenever a
> calculation yields one of these un-representable values, overflow is
> said to have occurred.
>
> > * Int does NOT enjoy that property; if it were guaranteed to use 2's
> > complement representation, it WOULD have. That was the fatal flaw in my
> > reasoning - I thought C++ always uses 2's complement for int.
>
> Even 2's complement representation does not guarantee that signed
> operations will wrap. A C++ compiler could generate "add with
> overflow" instructions for signed types, and generate "add - no
> overflow" instructions for unsigned types and then trap on the
> overflow condition flag.
Some processors do not wrap integers. Rather, they "saturate",
meaning that an overflow condition results in the maximum value for
that type, i.e.
int g = INT_MAX;
g += 2;
assert(g == INT_MAX);
This is useful in DSP, where an overflow condition then results in a
still usable value, rather than something totally wrong. But it is
certainly no longer a finite field.
It is not just found in dedicated DSPs: anybody who uses Intel MMX
could play around with saturation arithmetic at the hardware level.
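
Emulated portably, saturating addition looks something like this (a
sketch; the DSP gives you the same thing in one instruction):

#include <climits>

int sat_add(int a, int b)
{
    // Clamp instead of wrapping, checking before the add so no
    // signed overflow ever occurs:
    if (b > 0 && a > INT_MAX - b) return INT_MAX;
    if (b < 0 && a < INT_MIN - b) return INT_MIN;
    return a + b;
}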

"int, unsigned, signed" etc are really shorthand for "do whatever
arithmetic is the most natural for my processor." I think this has
always been the intent (back in the early days of C, the choice was
between "one or two complement," rather than "saturate or wrap."

Most processors of course use finite fields for unsigned, and the
signed part uses two's complement. It is always preferable to use
unsigned -- if nothing else an optimizer has a far easier time
manipulating unsigned values -- there shifts and mult/div are
interchangeable, and implicit conversions do not have to be sign
extended.

Lance

James Dennett

unread,
Jan 13, 2008, 4:08:58 PM1/13/08
to
Walter Bright wrote:
> Pete Becker wrote:
>> On 2008-01-12 01:12:44 -0500, Walter Bright said:
>>> On the other hand, Java went the route of eliminating undefined and
>>> implementation defined behavior, and has had astonishing success
>>> (largely at C++'s expense).
>>
>> Post hoc ergo propter hoc.
>
> I agree I can't prove it. But neither can one prove James' remark:
>
> "That's subjective: C and C++ have opted *not* to limit itself to
> mainstream architecture, and are probably more widespread than any
> other languages partly as a result of that."
>
> C++ (with a few extensions) was remarkably well suited to writing apps
> for DOS, and C++ rode the big surge in DOS and PCs up. I don't believe
> C++'s success is due to it being supported on nutburger CPU designs.
> Let's face it - the PDP-10 is dead.

But most processors are embedded, and C and C++ have huge
market share in the embedded world (where C is still likely
more widely used, but C++ has been increasing for many years).

[snip]

> I've read many diatribes against Java, from the ignorant to the well
> -informed, and not one ever complained about the integer math behavior
> being nailed down.

Compared to Java's other problems for numerical code, that
one doesn't even make the radar. (But let's be fair, Java
has improved a lot in its 7 versions to date.)

-- James

Andrei Alexandrescu (See Website For Email)

unread,
Jan 13, 2008, 4:09:39 PM1/13/08
to
Walter Bright wrote:
> James Dennett wrote:
>> it allows for optimizations based on assumptions of non-overflow,
>
> Apparently the new g++ does that, though my question on what those were
> is so far unanswered. My experience with optimizations that change the
> behavior is that customers call it an optimizer bug, even if the fault
> lies with their reliance on UB.

I searched a bit online and found such a case. Compile this code:

#include <cstdio>
using namespace std;

int main() {
    int u = 2000000000;
    int v = (u * 2) / 2;
    printf("%d\n", v);
}

with and without the -fwrapv flag. It produces different results.


Andrei

James Dennett

unread,
Jan 13, 2008, 4:08:43 PM1/13/08
to
Walter Bright wrote:
> James Dennett wrote:
>> DSPs are far from dead, and many don't have support for 8-bit
>> bytes in any reasonable fashion.
>
> I don't know much of anything about DSPs. But I know that many
> specialized CPU chips tend to have specialized languages that come with
> them, and that's perfectly reasonable. I don't think anyone wants to
> recompile Office for a DSP, anyway :-)
>
>
>> Fortunately in the real world it's
>> not hard for good programmers to have reliable and portable
>> programs.
>
> I disagree. For example, I've never found a non-trivial C++ program that
> would port successfully between 16 and 32 bits, even by expert
> programmers (far better than just good ones), without doing some
> adjustments and bug fixing.

I have seen numerous such programs; in fact, they've been pretty
much the norm (though 32- to 64-bit portability was the last such
question, and some OS vendors threw spanners into the works with
their APIs there).

> Changing endianness often breaks C++ code, as do changes in struct
> member padding.

It often breaks low-level C++ code that was written carelessly.
I've ported millions of lines of code that do NOT have issues
here. It seems that you've somehow been exposed to less good
code than one would hope.

> Varying int sizes, and char signedness, also break
> programs. The reason is simple, it is really really hard to look at a
> piece of code and verify it doesn't have these issues. It is impossible
> to test for portability issues without actually porting the code.

Your estimate of how much of a problem these things are does
not match mine.

> I have a large piece of code that works for DMC++, VC++, and an older
> g++. Upgrading to the latest g++ breaks it. It still compiles, it just
> produces wrong answers. I don't know yet what went wrong, but portable
> C++ ain't.

I'd bet it's not a problem with integral type sizes if you're
using reasonable coding practices.

>> It allows for implementations which diagnose all overflows;
>
> Are there any such implementations?

I don't know. Even if there aren't, the standard allows for one
in future. The C++ market has been around a long time, and will
continue to be significant for a long time yet, and a standard
that chooses too often to fix things when it could allow for
better QoI is not a good standard.

>> it allows for optimizations based on assumptions of non-overflow,
>
> Apparently the new g++ does that, though my question on what those were
> is so far unanswered.

There are (at least experimentally) some flags which will alert
you when g++ optimizes based on wraparound. There's also -fwrapv,
which disables those optimizations at some cost in code speed (but
the cost is <<< 1% for typical application code).

> My experience with optimizations that change the
> behavior is that customers call it an optimizer bug, even if the fault
> lies with their reliance on UB.

C++ users also often call RVO a bug, and they call reordering
their operations between sequence points bugs, and they call
hiding by name a bug, and they call diagnostics for their buggy
code bugs. There are trade-offs to be made: there's some merit
in fewer complaints from customers, to be sure, but there's cost
to disabling optimizations too. Maybe what's needed is a
"simple" mode where non-obvious optimizations are disabled and
code runs more slowly, and a "strict" mode where the optimizer
is allowed to do anything consistent with the language spec.

>> and diagnostics when such optimizations are made in ways that
>> could alter behaviour of code; it encourages implementations to
>> provide diagnostics for unsafe use.
>
> I don't see any way to issue warnings on unsafe overflow use based on
> static analysis of code.

Please note that I did not restrict this to *static* analysis.
However, range checking in static analysers is fairly common,
though the type systems of C, C++ (and Java, C#, D, etc.) make
it rather hard to be strict without generating a huge number
of false positives.

I'll mention again: turning an overflow into wraparound is
not generally safe -- code which assumes no overflow is broken
in either case.
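
The canonical example, sketched:

#include <climits>

// Intended as an overflow test. Under wrapping semantics it works;
// when signed overflow is undefined, the compiler may assume
// x + 1 > x and remove the test entirely:
bool next_would_overflow(int x) { return x + 1 < x; }

// Well-defined under either regime:
bool next_would_overflow_ok(int x) { return x == INT_MAX; }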

>> Wrapping semantics may be well-defined, but are *NOT* always safe.
>
> I'm not arguing that they are safe. I'm saying that well-defined
> semantics make code that, once tested, can be reliably ported.

I'd prefer to have code that can be made safe. Porting is
secondary (though the average piece of code I've written in
my career runs on maybe 5 different platforms).

>> Safety depends on what
>> the specification/requirements call for. Pretending that
>> modular arithmetic is always the right solution is simplistic.
>
> I'm not arguing that a specific is always the right solution. I'm
> arguing that undefined behavior is the wrong solution because it is, by
> definition, not the "right" solution.

But it's a meta-solution: it allows implementations to offer
a choice of solutions, and for the market (rather than a BDFL
or a committee) to determine which is most useful.

>> Allowing for diagnostics of overflow in many senses can serve
>> a broader community better than oversimplifying.
>
> Does any C++ implementation diagnose integer overflow at runtime?

I do not know of any. My knowledge is far from perfect.
I wouldn't be surprised if some tools attempted to do this.

>>> How much effort have you seen, time and again, going into dealing with
>>> the implementation defined size of an int?
>> Very little; it's a trivial thing. Good programmers use a type
>> which is guaranteed to have the properties they need, so they
>> won't use unadorned int for anything more than a -32767 to +32767
>> range
>
> There aren't very many "good" C++ programmers, then <g>.

There's sadly a shortage of competent programmers, and it's
somewhat independent of the implementation language.

>> unless they know that their target implementations support
>> a larger range, they'll just use an int32_t-like type or a long,
>> or long long, as needed. Certainly I've seen mistakes made,
>
> I was in the trenches when the big shift from 16 bit C++ to 32 bit C++
> took place, and ints doubled in size. I can tell you for a fact that
> various schemes for portably doing this were debated ad nauseam, and
> that almost none of them actually worked when it became time to do the
> real port. If you want an example, look no further than windows.h.

I saw the 16- to 32-bit shift, and the 32- to 64-bit shift, and
how well it was handled seemed to vary by community, with the
Windows world having huge problems and the Unix world having
relatively few (because Unix was diverse from early on, and
Windows was a monoculture). The very uniformity and guarantees
that Win16 (and later Win32) laid down caused portability
problems to later generations of machines; the flexibility
that was built in to Unix-style specifications aided that
same portability.

> The converse was also true, C++ code developed for 32 bit machines
> rarely ported to 16 bits without major effort, often a rewrite was
> required.
>
> Even in D, people cannot seem to shake their C/C++ heritage in worrying
> about the size of an int, and typedef it "just in case". I know I am
> much happier with "int" than "int32_t". The latter just stinks. Sorry.

It stinks to use "int" if it's not going to be as fast as "int64_t"
in some context, certainly. And how am I to get the fastest type
for operations that could be done in 16 bits? int_fast16_t, or...
I have to guess with D whether to use short or int, and profile on
every platform? If I say that I need a type of exactly 32 bits, I'm
overspecifying the size while underspecifying optimization goals
(for speed or for space).

But if your focus is narrow enough that you care only about mainstream
desktop and server platforms, it's a perfectly reasonable trade-off.

> It's at least possible you aren't seeing actual problems with int sizes
> these days because practically every C++ compiler sets them at 32 bits,
> even for 64 bit CPUs. So you never know if your use of typedefs is
> correct or not.

sizeof(long) varies still, on common systems, as does sizeof(void*).
But my knowledge doesn't come just from "these days" -- as with many,
I started working on 8-bit systems with 16-bit addressing. Please
don't assume that those who disagree with you do so because they
lack experience or knowledge. It's common that they have knowledge
which you don't (and vice versa).
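
To make the first point concrete, a trivial check (results shown for
today's mainstream data models):

#include <cstdio>

int main()
{
    // Prints 4 4 4 on ILP32, 4 8 8 on LP64 Unix, 4 4 8 on
    // LLP64 Win64:
    std::printf("%u %u %u\n",
                (unsigned)sizeof(int),
                (unsigned)sizeof(long),
                (unsigned)sizeof(void*));
}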

>> but I've seen mistakes made in languages with fixed-sized types too.
>
> So have I. But the question is how prevalent are such mistakes, versus
> mistakes from the int sizes changing?
>
>
>>> Everybody deals with it, 90%
>>> of them get it wrong, and nobody solves it the same way as anybody else.
>>> How is this not very expensive?
>>
>> Your perspective/experience do not match mine. Many places deal
> with it, most of them get it right, and most of them solve it in
>> very similar ways, moreso since C99 et al standardized typedefs
>> for various integral types. The expense is insignificant in all
>> competently run projects I've seen.
>
> I'll bet that in most of those competently run projects, the code has
> never been ported to a compiler with different int sizes, so how good a
> job they did has never been tested. I saw how well (i.e. badly) it
> worked in the last big shift from 16 to 32 bit code.

I'd take that bet (though I'll also say that it's common for other
reasons just to say that platforms smaller than 32-bits are
unsupported because they don't have the horsepower to run the
application).

> Would you like to try porting one of the ones that get it right to 16
> bits? I've got a beer that says it fails <g>.

I'll have to dig up an emulator for a 16-bit platform to take
that bet, but it sounds like a good excuse for a beer... I'll
keep you posted.

> And that brings us back to the fundamental problem with UB and IDB - how
> do you *know* you did it right? It isn't testable.
>
> Me, I'd rather define the problem out of existence, and have my good,
> competent engineers working on something more worthy of their talents.

If I were making a new language, I'd do the same. The cost of
supporting as many platforms as C and C++ do is enormous (but
the benefit to many users of those platforms has been huge).

>>> As this thread has demonstrated, even C++ experts do not know this
>>> corner of the language spec.
>> I noticed one expert who was surprised by it (which did
>> surprise me). But then even you make incorrect claims
>> about basic aspects of C and C++ on occasion -- it doesn't
>> always mean that things are too complicated, just that
>> people are (all) fallible.
>
> Yes, although I've read every detail of the specs and have implemented
> them, I sometimes mis-recall bits of it.

Me too.

> I bet if I sat down and quizzed
> you on arcane details of the spec, I'd find a way to trip you up, too.

Yup; you might start by grilling me on arcana of name lookup,
there are enough landmines there I'd fail on.

> The point of all this is that dismissing problems with the language by
> saying that "good" or "competent" programmers won't trip over them is
> not good enough. Humans, no matter how good they are, screw up now and
> then. I view the job of the language designer is, at least in part, to
> make the design resistant to human failure.

A noble goal. Probably good PL design can make 10% as much
difference as the variation between programmers does, but that's
still a huge potential benefit.

> For example, airplane pilots use checklists for everything. Is it
> because they aren't good pilots? Absolutely not. They are good pilots
> because they *use* the checklist, even though it seems silly. Even the
> best pilots would (and have) made monumental mistakes like forgetting to
> put gas in the tanks. Even though their very lives are forfeit, they
> still make stupid mistakes.
>
> I read an article recently about attempts to introduce checklists into
> hospital procedures. The doctors are resisting because they feel
> checklists are insulting and demeaning to their exalted expertise. The
> reality is that hospitals that use checklists reduce mistakes by
> something like 30% (I forgot the exact figure).

And software processes have a similar (or even more pronounced)
effect on quality in spite of some complaining that they somehow
suppress creativity (when I think, rather, that they free our
minds from the dull, mechanizable details and free us to use
them creatively).

> The best programmers are not gods. They make stupid mistakes, too. I
> make them, you make them, Bjarne makes them. The "checklist" is the
> compiler. The more the language can be designed so that mistakes get
> caught by the compiler or in test, rather than being UB, the more
> reliable software we can make.

A compiler is part of the checklist. Static analysis tools go
further, design and code reviews help, unit testing helps (and
I know you agree on that, as D built it right into the language).

>> I can assure you from decades of experience that, while this
>> problem can exist, I've never suffered portability issues because
>> of it,
>
> Do you use compilers that have different signs?

I have done (and if I want to do so today, I'll fire up gcc and
tell it to make char an unsigned type with -funsigned-char or
whatever the option is).

> In the Windows world,
> all the compilers, over time, gravitated towards using the same
> signedness for char (signed) not because of happenstance, but because it
> made real code more portable. I'm not in the least surprised that g++ on
> x86 Linux also has chars signed. It's pretty easy to never actually
> encounter a compiler with a different char sign, and hence have
> undiscovered bugs.

Ironically, I've seen a fair number of bugs because of code using
char and assuming its values could not be negative (and g++, I
think, warns when using plain char as an index into an array).

>> because compilers have warned in any marginal situation,
>
> Warnings are a good sign that there's something wrong with the language
> design. BTW, I just tried this:
>
> int test(char c) { return c; }
>
> with:
>
> g++ -c foo.cpp -Wall
>
> and it compiled without error or warning. (gcc-4.1)

What's "marginal" about that situation? It's fine except on
a platform where sizeof(int)==1 and char is signed, and even
then it's fine unless c > MAX_INT.

>> and it's rare to use unadorned "char" for anything where
>> signedness matters.
>
> That's because we try to avoid that like the plague.

I'd say it's because char is morally not a numeric type :)

>> (Exception: <ctype.h>-related matters.)
>
> String literals are char based (the sign gets you in hot water when
> you're doing utf-8). The standard library (especially the C one) is
> replete with char*. If you avoid using char, you wind up using a lot of
> casts. It's not practical to avoid char.

True, that is something of a pain, and would have all gone
away if char were mandated to be unsigned.

-- James

Peter Dimov

Jan 13, 2008, 4:06:27 PM
On Jan 13, 1:48 pm, Walter Bright <wal...@digitalmars-nospamm.com>
wrote:

> I've read many diatribes against Java, from the ignorant to the well
> -informed, and not one ever complained about the integer math behavior
> being nailed down.

That's because the potential performance loss is hidden, doubly so if
the C++ compilers in wide use don't take advantage of the license to
optimize under the no-overflow assumption.

A similar example is that certain Fortran code runs circles around C
because of the no-alias assumption, yet few programmers have
complained that C doesn't un-define the behavior in the presence of
aliasing.
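
C99 did later give C a way to opt back in to the Fortran assumption:
'restrict' (spelled __restrict as an extension in most C++ compilers)
promises the compiler that the pointers don't overlap. A sketch, with a
made-up axpy-style loop:

void axpy(float* __restrict y, const float* __restrict x, float a, int n)
{
    // The compiler may reorder and vectorize freely: it is allowed to
    // assume y and x do not alias, the guarantee Fortran gets for free.
    for (int i = 0; i < n; ++i)
        y[i] += a * x[i];
}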

The slight performance loss in the integer arithmetic case may still
be tolerable in exchange for a fully nailed-down specification, of
course.

Peter Dimov

Jan 13, 2008, 4:05:18 PM
On Jan 13, 1:48 pm, Walter Bright <wal...@digitalmars-nospamm.com>
wrote:

> > it allows for optimizations based on assumptions of non-overflow,


>
> Apparently the new g++ does that, though my question on what those were
> is so far unanswered. My experience with optimizations that change the
> behavior is that customers call it an optimizer bug, even if the fault
> lies with their reliance on UB.

Greg Herlihy answered this question in another subthread. In short,
2*x/2 == x. This is the kind of optimization people expect, not the
kind they object to.

Francis Glassborow

Jan 13, 2008, 4:03:17 PM
Walter Bright wrote:
> g++ 4.1 has 40 options that explicitly modify C++ language behavior.
> That's 40 factorial interactions. I suspect there are more, like the
> aforementioned integer optimizations.


Not according to my math. Assuming that each switch is simply on/off,
there are 2^40 combinations: yes, a very big number, but many orders of
magnitude less than the one you state. 2^40 will fit without overflow
into a 64-bit integer type; 40 factorial won't. That matters in the
context of what we are discussing.
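
For a quick sanity check of the two magnitudes, a throwaway sketch:

#include <cstdio>

int main()
{
    unsigned long long combos = 1ULL << 40;
    std::printf("2^40 = %llu\n", combos);  // 1099511627776, fits in 64 bits

    long double fact = 1;
    for (int i = 1; i <= 40; ++i)
        fact *= i;                         // 40! is roughly 8.16e47
    std::printf("40! ~= %Lg\n", fact);     // far beyond any 64-bit type
}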

Furthermore I am not sure that all the options are compatible with each
other. If they are not then the alternatives get further reduced.

Yechezkel Mett

Jan 13, 2008, 4:00:46 PM
On Jan 10, 5:06 pm, Walter Bright <wal...@digitalmars-nospamm.com>
wrote:

> Bart van Ingen Schenau wrote:
>
> > On the other hand, I have used a compiler for a DSP that supports
> > saturating arithmetic (overflow gets clipped to the largest value).
> > Although the compiler writers decided otherwise, this mode of
> > arithmetic could have been selected to be used for signed operands
> > without any violation of the standard.
>
> The compiler writers made a wise move. Such an unusual mode could
> silently introduce pernicious bugs when porting existing, debugged code
> to it.
>
> Since there is no way to defend against such possible errors in one's
> code, and the overwhelming majority (dare I say all?) compilers
> implement it in one way, that way should be standardized.

I would not be happy with that. I suspect that most uses of overflow
in signed integers are unintentional, and such porting would not be
introducing bugs -- the software is probably already buggy. I
personally would much prefer a compiler that trapped unintentional
overflow rather than allowing the program to continue blithely on. I
understand that for performance reasons, or ease of implementation, or
some other reason most compiler implementors currently prefer wrap-
around, but for the standard to mandate it would be a major
disservice.

For the (I suspect) rare cases where wrap-around is useful unsigned
types are available, and the use of a signed type in such a situation
suggests to me that the programmer simply didn't take the possibility
into account.

Yechezkel Mett

Peter Dimov

Jan 13, 2008, 4:08:07 PM
On Jan 12, 2:12 pm, Walter Bright <wal...@digitalmars-nospamm.com>
wrote:

> The D programming language does nail down many UBs and IDBs:

...

> 4) floating point is IEEE

I wonder what it does on x86/x87 with the examples cited in

http://hal.archives-ouvertes.fr/docs/00/15/88/63/PDF/floating-point.pdf

Apparently, correct and fast IEEE is much harder than it looks.

Pete Becker

Jan 13, 2008, 4:03:05 PM
On 2008-01-13 00:48:54 -0500, Walter Bright
<wal...@digitalmars-nospamm.com> said:

> Pete Becker wrote:
>> On 2008-01-12 01:12:44 -0500, Walter Bright said:
>>> On the other hand, Java went the route of eliminating undefined and
>>> implementation defined behavior, and has had astonishing success
>>> (largely at C++'s expense).
>>
>> Post hoc ergo propter hoc.
>
> I agree I can't prove it. But neither can one prove James' remark:

I didn't say you can't prove it. I said your argument is nonsense.

>
>> Programmers who did serious numeric computations hated Java in its
>> original incarnation, because the restrictions it imposed on
>> floating-point math made it abominably slow on Intel processors.
>
> Yeah, I know about that. They went too tight with the floating point in
> eliminating IDB, and since backed off a turn of the screw. C++ could
> easily tighten down the screws several full turns, though.

Maybe. But your claim was that eliminating undefined behavior and
implementation defined behavior was somehow responsible for Java's
success. This example shows just the opposite.

>
> I've read many diatribes against Java, from the ignorant to the well
> -informed, and not one ever complained about the integer math behavior
> being nailed down.

Again irrelevant to your claim that these things were responsible for
Java's success.

Java's success was largely the result of having a marketing department.
Naive programmers believe their claims that tightening the rules makes
programming far easier, despite the absence of concrete evidence.

--
Pete
Roundhouse Consulting, Ltd. (www.versatilecoding.com) Author of "The
Standard C++ Library Extensions: a Tutorial and Reference
(www.petebecker.com/tr1book)


Andrei Alexandrescu (See Website For Email)

Jan 13, 2008, 4:10:36 PM
Walter Bright wrote:

That's a good point. I think the crux of the matter is this:

Kernighan & Ritchie fostered mapping C integral types (short, int, long)
to the most "fit" types on the target machine in a highly flexible way
(e.g. they could all bear the same size), with int modeling the natural
word size. That strategy made it feasible to define a compiler for a
wide range of machines and to write code for each. The same strategy
conjectures that differences among integral type sizes on different
platforms are either of little importance or can be reliably taken care
of by the programmer.

In my opinion, experience has shown that C's considerable flexibility
in mapping names to machine types did indeed make it highly
implementable on all sorts of machines, but that the second conjecture
turned out to be less successful.

Other languages chose to fix the sizes of integral types, a strategy
that made them less implementable on various machines, but made it easy
to port existing code among the machines on which the language could be
successfully implemented.

I wonder what would be the "best" way. In the latter systems, for
example, an implementation for a certain CPU might prohibit use of some
types, e.g. "char". Everything in the client code and standard library
of that language relying on "char" must go. An interesting conclusion is
that such a language must have generic (a.k.a. template) function
capabilities, such that commonly useful functions (such as string
functions, parsing and conversion functions etc.) do not commit to any
types in particular. I think such a system would be a net improvement.
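
For illustration, a length function that commits to no character type
(str_length being a name I just made up):

#include <cstddef>

template <typename Char>
std::size_t str_length(const Char* s)
{
    // Counts code units up to the terminator, whatever Char may be.
    std::size_t n = 0;
    while (s[n] != Char(0))
        ++n;
    return n;
}

An implementation that prohibits "char" simply never instantiates the
template with it; client code written against such generics needs no
changes to port.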


Andrei

Walter Bright

Jan 13, 2008, 4:14:41 PM
Bo Persson wrote:
>>> My experience porting D code between platforms is it ports easier
>>> than the equivalent C++ code.
> Porting is easier if you limit the number of potential platforms.

Sure, but I was comparing porting C++ from platform A to B with porting
D from A to B. The latter was noticeably easier.


>> Porting Java is easy too, if your target platform supports it.
> Porting Java is hard, if you haven't ported its platform first!

Porting C++ compilers is pretty hard, too. How many programmers do you
know who can write a code generator?


> We had a discussion just last week with a Java developer on reusing
> his web server code on the mainframe.
>
> - "Oh dear! That's just Java 1.5, I need 1.6 generics for my code.
> Limiting myself to 1.5 features will cost you a lot more!"

But isn't Java implemented in C? C is more portable and available on
every platform, so he should just recompile it and he's good to go.

A look through the source of Boost and STL will show that compiler
support for various C++ features is a perennial problem. Source code
portability, when using advanced C++ features, has always been a serious
problem. How long do you think it will take for all C++ compilers to
implement all of C++0x?

--------
Walter Bright
http://www.digitalmars.com
C, C++, D programming language compilers

--

Walter Bright

Jan 14, 2008, 3:27:19 PM
Pete Becker wrote:
> Java's success was largely the result of having a marketing department.

While a great marketing department is certainly helpful, it is a serious
mistake to dismiss Java that way (at least for a language designer, it is).

I regularly talk to heavy Java developers, with the aim of finding out
what works and what doesn't work for them. The same goes for C++. I even
talk with the Lisp guys when I can find one!


> Naive programmers believe their claims that tightening the rules makes
> programming far easier, despite the absence of concrete evidence.

Given the code in C++ and Java:

a = foo() + bar();

which one would I have to rewrite as:

tmp = foo();
a = tmp + bar();

if there is an order of evaluation issue? How would I reliably detect
such an issue in the C++ version (putting on my QA hat)?

From a QA standpoint, it's clearly easier to verify the Java version
than the C++ one. foo() is evaluated first, then bar(). That's easier to
deal with than hmm, which one happens first, and when might that change,
and do I have any order dependencies here?
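
To make the hazard concrete, a contrived sketch in which foo() and
bar() share state:

int g = 0;
int foo() { return g += 1; }   // increments g, returns new value
int bar() { return g *= 2; }   // doubles g, returns new value

int a = foo() + bar();
// foo() first: 1 + 2 == 3.  bar() first: 0 + 1 == 1.

A conforming C++ compiler may produce either 3 or 1, and may change its
mind between releases or optimization levels. Java pins it down to 3.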

Java does have a predictability issue with the gc, but that is separate
from integer, OOE, etc., issues.

--------
Walter Bright
http://www.digitalmars.com
C, C++, D programming language compilers

--

Walter Bright

Jan 14, 2008, 3:25:29 PM
Walter Bright wrote:
>> Note that these are not ancient "nutburger" architectures -- these are
>> both current and in _wide_ use. Just for an obvious example, the last
>> time I was in Costco, they had a brand new HD-DVD player that (on the
>> outside of the box!) bragged about using a SHARC processor.
>
> Here's the C++ compiler for the sharc:
>
> http://www.analog.com/UploadedFiles/Associated_Docs/75285036450_SHARC_cc_man.pdf

I'd like to expand on this issue a bit. The C++ sharc compiler has:

1) 32 bit shorts
2) 32 bit chars
3) 40 bit doubles

While this is legal for a C++ compiler, and arbitrary C++ code may
compile successfully, that doesn't at all mean it will run. For example,
take zlib, the open source compression library, that's been ported to
many platforms. What are the odds that is going to work out of the box
with this compiler? I'd say, zero. You're going to have to redesign
large sections of it.

For example, let's look at shorts. The only reasons people use shorts
these days are:

1) to reduce memory consumption
2) to manipulate 16 bit values in an existing data structure
3) to specifically use 16 bit math
4) to manipulate 2 bytes at once

because shorts are often slower than ints. So any code using shorts for
reasons 2) through 4) will have to be recoded.

Similar reasoning applies to chars. Chars are very often used, not to
store characters, but to do byte manipulation. You cannot do byte
addressing on the sharc. All the C++ code you wrote to do byte
manipulation will need to be thoroughly re-engineered.
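
For instance, a perfectly ordinary little-endian fetch (read_le32 is
just an illustrative name):

typedef unsigned int uint32;

uint32 read_le32(const unsigned char* p)
{
    // Assumes 8 bit, byte-addressable chars. On the sharc, each p[i]
    // is a 32 bit word and the buffer layout itself is different, so
    // both the shifts and the indexing are wrong.
    return (uint32)p[0]
         | (uint32)p[1] << 8
         | (uint32)p[2] << 16
         | (uint32)p[3] << 24;
}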

Let's look at a hypothetical D compiler for the sharc. A straightforward
implementation would probably simply make chars and shorts illegal. Code
that uses characters could be edited to use dchars instead (D's 32 bit
character type). Code that manipulates bytes and 16 bit values is going
to have to be re-engineered, just like the C++ code will be. The only
real difference to the programmer is that the D compiler will *tell* you
you cannot use chars or shorts, while the C++ compiler will compile it
anyway and produce code that won't work properly, leaving you to find it
with a debugger.

The 40 bit doubles are a bit more problematic. I'd recommend for D to
just go with 40 bit doubles on that platform, and then let the user deal
with the possible precision problems just as he would have to with C++.

The bottom line is, which language would be more work for the sharc
programmer to port code to? Only experience can tell for sure, but I
suspect D would come out ahead as the compiler will flag code that won't
work on sharc, while C++ will compile it anyway and leave it at the
mercy of his test suite to find the problems.

Walter Bright

Jan 14, 2008, 3:28:42 PM
Andrei Alexandrescu (See Website For Email) wrote:
> Walter Bright wrote:
>> James Dennett wrote:
>>> it allows for optimizations based on assumptions of non-overflow,
>>
>> Apparently the new g++ does that, though my question on what those were
>> is so far unanswered. My experience with optimizations that change the
>> behavior is that customers call it an optimizer bug, even if the fault
>> lies with their reliance on UB.
>
> I searched a bit online and found such a case. Compile this code:
>
> #include <cstdio>
> using namespace std;
>
> int main() {
> int u = 2000000000;
> int v = (u * 2) / 2;
> printf("%d\n", v);
> }
>
> with and without the -fwrapv flag. It produces different results.

I did a google search for: [optimizer fwrapv printf]

The very first hit was:
http://www.nabble.com/GCC-4-compiler-bug-td13980364.html

Note that the user calls it a compiler bug, just as I predicted. Reading
the followups to that is very relevant to this discussion.

Here's a very informative article about it:
http://gcc.gnu.org/ml/gcc/2007-01/msg00120.html

--------
Walter Bright
http://www.digitalmars.com
C, C++, D programming language compilers

--

Walter Bright

Jan 14, 2008, 3:27:39 PM
Francis Glassborow wrote:
> Walter Bright wrote:
>> g++ 4.1 has 40 options that explicitly modify C++ language behavior.
>> That's 40 factorial interactions. I suspect there are more, like the
>> aforementioned integer optimizations.
>
>
> Not according to my math. Assuming that each switch is simply on/off
> there are 2^40 (yes a very big number but many orders of magnitude less
> than the one you state. 2^40 will fit without overflow into a 64-bit
> integer type 40 factorial won't. That matters in the context of what we
> are discussing)

While your math is right (and mine was wrong), it doesn't matter. The
universe will end before you can test all combinations of those switches
with the test suite. Then, there are all the other g++ switches, which
go on for pages.

> Furthermore I am not sure that all the options are compatible with each
> other. If they are not then the alternatives get further reduced.

Sure, but can you get them all tested before the sun turns into a red giant?

--------
Walter Bright
http://www.digitalmars.com
C, C++, D programming language compilers

--

Bo Persson

Jan 14, 2008, 3:39:00 PM
Walter Bright wrote:
> Bo Persson wrote:
>>>> My experience porting D code between platforms is it ports easier
>>>> than the equivalent C++ code.
>> Porting is easier if you limit the number of potential platforms.
>
> Sure, but I was comparing porting C++ from platform A to B with
> porting D from A to B. The latter was noticeably easier.

It could be because the variation is smaller. There are some platforms
where you cannot easily implement D at all, like IBM zSeries with its
EBCDIC character set and non-IEEE floating point. C++ has no problem
with that.

>
>
>>> Porting Java is easy too, if your target platform supports it.
>> Porting Java is hard, if you haven't ported its platform first!
>
> Porting C++ compilers is pretty hard, too. How many programmers do
> you know who can write a code generator?

The point was rather that Java is very hard, if the intended platform
doesn't support the spec. It might require adding dedicated hardware:

http://www-03.ibm.com/systems/z/zaap/


>
>
>> We had a discussion just last week with a Java developer on reusing
>> his web server code on the mainframe.
>>
>> - "Oh dear! That's just Java 1.5, I need 1.6 generics for my code.
>> Limiting myself to 1.5 features will cost you a lot more!"
>
> But isn't Java implemented in C? C is more portable and available on
> every platform, so he should just recompile it and he's good to go.

Well, IBM believe they should decide what Java version to run on z/OS.
It also needs to access the special Application Assist hardware to do
IEEE floating point. C and C++ don't have to do that.

For our inhouse applications portability is no concern, but Java still
insists on using a portable data format. Even if it is more expensive.
A lot more, in this case!

>
> A look through the source of Boost and STL will show that compiler
> support for various C++ features is a perennial problem. Source code
> portability, when using advanced C++ features, has always been a
> serious problem.

IMO, Boost is trying much too hard to support pre-standard compilers.
If they tried to support 1-2 releases of each compiler, instead of
4-5, the number of workarounds needed would be drastically reduced.


> How long do you think it will take for all C++
> compilers to implement all of C++0x?

A long time, no doubt. :-(


Bo Persson

Greg Herlihy

Jan 14, 2008, 3:32:05 PM

I agree completely: C++ should not define signed arithmetic overflow.
After all, a C++ program already has the option of performing unsigned
arithmetic in order to avoid overflow. Signed arithmetic, therefore,
should be reserved for those calculations in which an overflowed value
has no sensible representation (because the value falls outside of the
range of integers that the integral type can represent).

I would guess that for many - if not most - programs that perform
integer arithmetic, defining overflow to "wrap around" would be
counter-productive. After all, it is hard to imagine how an
accounting, payroll or billing program would benefit if the sum of a
large number of positive integers were to turn out to be negative.
Clearly, in such a situation, having the program abort would be
preferable to having the program carry on, with this negative sum
filling in for the expected (but unrepresentable) value.

Moreover, having a C++ program trap on signed overflow is not some
hypothetical possibility. At least one C++ compiler (g++) offers an
option (-ftrapv) to trap on signed arithmetic overflow.

For example, compiling the source file below:

// overflow.cc

#include <limits.h>
#include <iostream>

unsigned unsigned_overflow()
{
    unsigned u = UINT_MAX;

    std::cout << "Testing unsigned overflow...\n";
    return u + 1;
}

signed signed_overflow()
{
    int s = INT_MAX;

    std::cout << "Testing signed overflow...\n";
    return s + 1;
}

int main()
{
    int u = unsigned_overflow();

    std::cout << "Unsigned overflow value: " << u << "\n";

    int s = signed_overflow();

    std::cout << "Signed overflow value: " << s << "\n";
}

with this command:

g++ -ftrapv overflow.cc

produces the following output when run:

Testing unsigned overflow...
Unsigned overflow value: 0
Testing signed overflow...
Abort trap

Greg

Jerry Coffin

Jan 14, 2008, 3:35:08 PM
In article <5dSdnf-pHcH0MhTa...@comcast.com>,
wal...@digitalmars-nospamm.com says...

> Jerry Coffin wrote:
> > the Java standard (at least the last time I
> > looked at it) simply ignores the issue entirely or (more often) includes
> > some phrase like "IEEE floating point", that makes it sound like the
> > issue has been dealt with, but when you get down to it doesn't really
> > mean much at all.
>
> IEEE 754 floating point arithmetic means a lot more than nothing, and
> certainly far more than C++ floating point, but you're right that it
> doesn't nail it down 100%.

For Sun it meant Java code executed far better on hardware built by Sun
than by many of their competitors. For most Java users, however, it
meant nothing more than a false sense of security (then again, Java in
general seems to have been designed specifically and marketed with the
idea of instilling a false sense of security).

[ ... ]

> > These requirements still limit practical portability, even among modern,
> > widely-used architectures. Quite a few DSP and even some PDAs, cell
> > phones, etc., don't support any 8-bit type, just to give one example.
>
> I disagree, because nothing would permit a D *variant* from being
> customized to unusual architectures. Such would not be less useful than
> simply dumping the problem off to all users.

Yes, it would. It's entirely possible and indeed quite practical and
reasonable to write C and C++ code that works perfectly fine depending
only upon what C and C++ guarantee for char (for one example). The fact
that under some circumstances char might be 16, 32 or even 64 bits
doesn't bother the code a bit. Even code that is written to depend on
8-bit chars is usually quite easy to convert to remove that dependency.

D goes the opposite route: by guaranteeing that char will always be 8
bits, it (tacitly?) encourages people to write their code to depend on
that fact. It's a bit difficult to guess at how easy it is to write D
code that won't break with other sizes of char, since there's apparently
no way to test such code right now.

> I suspect that this is an even better solution for those programming
> such machines, because they won't be under the delusion that code that
> has never been tested under such conditions would have been "portably"
> written with some wrong notion of portability.

I think this is a delusion. People who normally work with such machines
are unlikely to be deluded about the amount of code that's really
portable. Nonetheless, if somebody _wants_ to do so, they most certainly
_can_ write C and/or C++ code that works fine with various sizes of
char. With D they can't do any such thing, because anything with a
different size of char, by definition, isn't D.

[ ... ]

> A more relevant question is how many programmers are programming for
> these oddballs, vs programming for mainstream computers?

That, of course, is a difficult question to answer. The ratio of
programmers to CPUs is almost certainly smaller than with mainstream
computers, but sales volume is also _quite_ high in many cases, making
it difficult to figure out the actual number.

[ ... ]

> UB does not imply reliable or repeatable behavior, so any dependence on
> UB is inherently unreliable _and_ unportable.

This is nonsense and you know it. UB does not _imply_ reliable or
repeatable, but it does not preclude either one. The standard
specifically allows an implementation to define the result of particular
undefined behavior -- and when portability isn't a concern, the fact
that it's defined only by the implementation is entirely irrelevant.

[ ... ]

> But when you've got a
> million lines of code, suddenly even the obscure cases become probable.
> And when you don't have a thorough test suite (who does?) how can you be
> *sure* you don't have an issue there?

Being truly "*sure*" of anything is pretty rare -- even if it's required
by the language, it's nearly impossible to be sure you couldn't run
across some obscure bug in the compiler. Even on projects considerably
less than a million lines, I've certainly done so.

> I'm very interested in building languages which can offer a high degree
> of reliability. While D isn't a language that gets one there 100%, it
> gets a lot closer than C++ does.

I've yet to see convincing evidence of that -- though I'll admit I
haven't looked very hard, and such evidence would be off-topic here in
any case.

> I am a little surprised at the
> resistance to improving C++ along these lines. It's not like pinning
> down UB is going to break existing code - by definition, it won't.

Yes and no -- it won't break any portable code. It could easily break a
lot of code that depends on behavior of a specific implementation.

> > Nearly the only time anybody really cares when when writing an extended
> > precision integer library. You'd accomplish far more by defining (for
> > one example) the result of the remainder operator when dealing with
> > negative numbers.
>
> Did that, too, just forgot to mention it.

I'm glad to hear that.

> The D programming language explicitly does not support 16 bit platforms.
> That should leave no doubt about my position on that <g>. C++ was
> designed to support it, and so is fair game for criticizing its
> shortcomings in doing so.
>
> I would vote for C++ to explicitly ditch 16 bit support. No problem there!

To accomplish what? For most practical purposes, 16-bit platforms were
"ditched" when exception handling was added. OTOH, if somebody's willing
to go to the time, effort, etc., of implementing it for a 16-bit
platform, and meets all the requirements, more power to them.

I see no reason to warp the requirements specifically to support 16-bit
platforms -- but I also see no reason to explicitly rule them out.

[ ... ]

> Crud, that overflows. I do lots of hash computations, too, which rely on
> wraparound overflow.

I've done quite a few hashes too -- but on unsigned types. For unsigned
types, the rules in C and C++ are quite explicit and have always been
quite sufficient for my purposes.
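
Just for example, a 32-bit FNV-1a (chosen only because it's short)
depends on exactly that defined, modulo-2^32 wraparound:

unsigned long fnv1a(const unsigned char* data, unsigned long len)
{
    unsigned long h = 2166136261UL;        // FNV offset basis
    for (unsigned long i = 0; i < len; ++i) {
        h ^= data[i];
        h *= 16777619UL;                   // FNV prime; wraps by definition
    }
    return h & 0xFFFFFFFFUL;   // in case unsigned long is wider than 32 bits
}

Written with a signed type instead, every multiplication that overflows
would be UB.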

[ ... ]

> You (if you don't mind me putting words in your mouth) define it as the
> likelihood of if a program compiles on X that it will also compile on Y.
> Whether it works or not depends on how good the programmer is.

That's not really correct at all. I think portability depends not only
on how "good the programmer is", but upon the programmer's intent. Java,
for one example, attempts to ensure that every correctly written program
is portable.

While I can see the motivation for that, I disagree with it. While
portability is a useful attribute, there are also many perfectly good
reasons to write code that's not portable.

Just for example, for work I've written a number of custom debuggers for
specific platforms (Windows more often than anything else). As almost
anybody can guess, virtually none of this code is portable -- and I
can't imagine any reasonable change to the language that would enhance
its portability in any way.

I should note that although the code doesn't port to other systems, back
when Windows NT was current, I had no problem at all with compiling and
_using_ quite a bit of it (completely unchanged) on Intel, MIPS and
Alpha based machines. On Intel, I believe it worked fine with every
compiler I had handy at the time as well (at least with MS, Borland,
Intel and GNU).

> I define it as the likelihood of if a program compiles *and* works on X
> that it will also compile *and* work on Y, regardless of how good the
> programmer is.

That makes it sound a great deal like D probably won't support most of
what I develop at all.

> >> 1) reliance on UB or IDB is not mechanically detectable, making programs
> >> *inherently* unreliable
> > You're overstating the situation. Certainly some reliance on some UB
> > and/or IDB is mechnically detectable.
>
> Runtime integer overflow isn't.

Yes and no. Static analysis can detect when runtime integer overflow
becomes a possibility, and even (for example) show the source of the
values that could result in the overflow.

[ ... ]

> A typical C++ compiler has a bewildering array of switches that change
> its behavior.

Change what behavior? I've certainly run across switches that simply
broke the compiler for certain code, but these are more or less
irrelevant -- the language standard means little with respect to a
compiler that doesn't (as used) implement that language.

> > Yes, when you're writing a library that's intended to be portable to
> > anything anywhere under any circumstances, these can be major issues.
>
> They're major issues for people who need to develop reliable programs
> such as, say, a flight control system, or banking software. Such
> applications need more than reliance on "good" programmers and prayer.

We must have radically different experiences. When I've dealt with high
reliability systems, portability was _never_ an issue. Rather the
contrary, high-reliability hardware was one of the first requirements,
and quite a lot of the code stood no chance of ever running on a typical
off-the-shelf PC or anything very similar at all.

Likewise, what was defined as part of the language was largely
irrelevant as well -- even when the language required particular
behavior, that behavior had to be verified with the specified
implementation before it could be depended upon. Granted, attempts have
been made (at times) at verifying/certifying implementations to assure
that they really conformed with the language requirements. Few (if any)
of these efforts seems to survive today though...

[ ... ]

> g++ 4.1 has 40 options that explicitly modify C++ language behavior.
> That's 40 factorial interactions. I suspect there are more, like the
> aforementioned integer optimizations.

Even at best, your math is flawed. Though I don't use g++ on a regular
basis, my experience with other compilers indicates that many of the
switches have no interaction at all.

> >> How much effort have you seen, time and again, going into dealing with
> >> the implementation defined size of an int? Everybody deals with it, 90%
> >> of them get it wrong, and nobody solves it the same way as anybody else.
> >> How is this not very expensive?
> >
> > I've seen a lot of effort put into it repeatedly, but I'd say over 99%
> > of the time, it's been entirely unnecessary from beginning to end.
>
> In D, the effort to deal with it is 0 because the problem is defined out
> of existence.

I suppose that depends on your viewpoint. At least in Java, the attempt
at defining the problem out of existence has created a problem that I'd
say is at least twice as bad.

> > If C
> > and C++ made it even _more_ difficult to deal with, so people would
> > learn to keep it from being an issue at all, everybody would really be
> > better off most of the time.
>
> The way to make things more difficult is to make them compile time
> errors. Then they cannot be avoided or overlooked.

You obviously missed my point. I mean doing something to make it more
difficult to even create an 'int32_t' (or whatever) at all.

> Ideally, if a program
> compiles, then its output should be defined by the language.

I disagree. There is a great deal of room for non-portable code in the
world. Attempting to define the output for all code is pointless and
foolish.

> > In any case, C99 and C++ TR1 have both dealt with this for the rare
> > ocassion that it really is an issue (and, unfortunately, made it still
> > easier to write size-dependent code when it's completely unnecessary).
>
> It's rarely an issue now because:
>
> 1) C++ compilers have dropped 16 bit support (and 16 bit ints).
> 2) 32 bit C++ compilers all use 32 bit ints.
> 3) 64 bit C++ compilers also use 32 bit ints.
>
> In other words, C++ has de facto standardized around 32 bit ints.

I disagree. If anything, the prevalence of 32-bit ints has created
problems, not cured them.

> > C++ 0x will undoubtedly add this to the base C++ language as well. IMO,
> > this is almost certain to hurt portability, but at least those of us who
> > are competent can ignore it the majority of the time when it's
> > counterproductive; languages like Java and D don't even allow that.
>
> In D, you can use a variable sized int if you want to:
>
> typedef int myint;
>
> and use myint everywhere instead of int. To change the size, change the
> typedef. Nothing is taken away from you by fixing the size of int. It
> just approaches it from the opposite direction:

That doesn't fix the problem.

> C++: use int for variable sizes, typedef for fixed sizes
> D: use int for fixed sizes, typedef for variable sizes
> Java: doesn't have typedefs, oh well :-)

You've got things backwards: in C++ you get code that works correctly on
different sizes of machines, unless you take steps to stop it from doing
so. In D you get code that works correctly on different sizes of
machines only by going through massive brain damage to undo its
mistakes. In Java, you can't even undo its mistakes.

> >> Conversely, defining the behavior means that one does not have to know
> >> how other systems work. The less UB and IDB, the easier the porting
> >> gets, reducing costs.
> >
> > It gets easier, to a narrower range of targets. Outside that range of
> > targets, it becomes either drastically more difficult, or truly
> > impossible.
>
> How does it become harder or impossible?

By making it impossible to get a D implementation for that platform, of
course.

> > Contrary to your previous claims, targets you see fit to
> > ignore have not gone away, nor are they likely to do so anytime soon.
>
> I think 16 bit DOS and 36 bit PDP-10's are dead and are not likely to
> rise from their graves.

So why did you make statements about 16-bit DOS as if it was a relevant
target? Nonetheless, the set of requirements you set (such as an 8-bit
char) rule out a LOT more than just 16-bit DOS and 36-bit PDP-10's. In
fact, 8-bit char does NOT rule out DOS, but does rule out quite a few
targets that are currently in wide use. If you want to rule out only
dead systems, that's fine -- but you seem to be throwing out the baby
with the bath water.

[ ... ]

> Yes, and Digital Mars C++ does, too. I know of nobody who actually tests
> their code using those switches.

You do now! Well, technically I don't use the Digital Mars C++ switch
for the job, but I certainly use Microsoft's. In fact, I make char
unsigned more often than not...

> You and I can argue that "good"
> programmers will, but we both know they won't.

So you're claiming that because I do so, I'm _not_ a good programmer? If
there's logic in that claim, I've completely missed it.

> Most switches of that sort are of limited utility anyway because they
> screwup the abi to existing compiled libraries.

What problems have you verified along that line? I've used unsigned char
quite a bit without seeing such problems.

[ ... ]

> The C++ compiler for sharc has many sharc specific extensions. It isn't
> hard to imagine a D variant that would do the same. You'd have the same
> difficulties porting C++ code to the sharc C++ compiler as you would
> porting standard D to sharc specific D.

You've got things backwards -- while the SHARC extensions certainly make
it easy to write C++ code for it that won't port elsewhere, they do NOT
prevent portable code from working on the SHARC.

In the D case, however, there seems to be no way to write code that's
reasonably portable TO the SHARC.

[ ... ]

> Again, just being able to compile the code doesn't mean it's portable.

Just being able to compile doesn't mean it's necessarily portable,
that's true. OTOH, just NOT being able to compile definitely means it
is NOT portable.

With C++ you at least have a possibility of writing code that's portable
between Windows and that SHARC (for example) and doing what you want on
both. At best D appears to make that much more difficult, and for most
practical purposes appears to rule it out completely.

> Also, I wish to point out the difference between "wide use" meaning
> there are a lot of CPUs in circulation and "wide use" meaning a lot of
> programmers are writing code for it. Only one programmer might be
> writing the sharc code that is stamped out into millions of HD-DVD
> units. HD-DVDs in Costco gives no clue about how many programmers write
> sharc code, other than it is greater than 0. On the other hand, I've
> shipped several hundred thousand C++ compilers for Windows.

Already addressed to at least some extent above.

--
Later,
Jerry.

The universe is a figment of its own imagination.


Walter Bright

Jan 14, 2008, 3:30:45 PM
James Dennett wrote:
>>> It allows for implementations which diagnose all overflows;
>> Are there any such implementations?
> I don't know. Even if there aren't, the standard allows for one
> in future. The C++ market has been around a long time, and will
> continue to be significant for a long time yet, and a standard
> that chooses too often to fix things when it could allow for
> better QoI is not a good standard.

Yes, C++ has been around for 20+ years. If a particular characteristic
of C++ compilers, allowed for by the standard, and not too hard to
implement, has not emerged in that time, then I argue there is no
significant interest in it. And so allowing for it in the standard is
not compelling, particularly if there are costs associated with allowing it.


> There's also something
> like -fwrapv which disables those optimizations at some cost in
> code speed (but the cost is <<< 1% for typical application code).

An optimization which breaks code and offers <<< 1% improvement is a bad
idea to implement, even if allowed by the standard.


> Maybe what's needed is a
> "simple" mode where non-obvious optimizations are disabled and
> code runs more slowly, and a "strict" mode where the optimizer
> is allowed to do anything consistent with the language spec.

The siren song of more compiler switches should be resisted as much as
possible. Put wax in your ears and rope yourself to the mast :-)

> I'll mention again: turning an overflow into wraparound is
> not generally safe -- code which assumes no overflow is broken
> in either case.

The code this optimization breaks is not always incorrect code. See the
link I posted in my reply to Andrei.


> I'd prefer to have code that can be made safe.

Leaving it as UB doesn't help at all with that.

> But it's a meta-solution: it allows implementations to offer
> a choice of solutions, and for the market (rather than a BDFL
> or a committee) to determine which is most useful.

But that means you're encouraging *reliance* on undefined behavior.


>> There aren't very many "good" C++ programmers, then <g>.
> There's sadly a shortage of competent programmers, and it's
> somewhat independent of the implementation language.

The bar for competence in C++ is a lot higher than for other languages.
That's pretty marvy if you're a top C++ expert and can command $5000/day
in consulting fees, but it stinks if you're on the other side of that
having to write the checks. Hence the demand for languages where the
costs are lower, a demand that I think C++ dismisses a little too easily.


> The very uniformity and guarantees
> that Win16 (and later Win32) laid down caused portability
> problems to later generations of machines; the flexibility
> that was built in to Unix-style specifications aided that
> same portability.

My point was that 16 bit programs *tried* to be portable to 32 bits. All
those typedefs in windows.h were there for a reason.


> It stinks to use "int" if it's not going to be as fast as "int64_t"
> in some context, certainly. And how am I to get the fastest type
> for operations that could be done in 16 bits? int_fast16_t, or...
> I have to guess with D whether to use short or int, and profile on
> every platform? If I say that I need a type of exactly 32 bits, I'm
> overspecifying the size while underspecifying optimization goals
> (for speed or for space).

You can still use typedefs in D for your various tradeoffs. Like I said
in another post, D approaches this from the opposite direction than C++.

C++: int sizes variable, typedef sizes fixed
D: int sizes fixed, typedef sizes variable

> But if your focus is narrow enough that you care only about mainstream
> desktop and server platforms, it's a perfectly reasonable trade-off.

It's a false choice. D doesn't prevent you from using varying integer
sizes. It's just not the *default*.

> Please
> don't assume that those who disagree with you do so because they
> lack experience or knowledge. It's common that they have knowledge
> which you don't (and vice versa).

Point taken.

>> I bet if I sat down and quizzed
>> you on arcane details of the spec, I'd find a way to trip you up, too.
> Yup; you might start by grilling me on arcana of name lookup,
> there are enough landmines there I'd fail on.

It's fun to watch even the experts' mouths drop when I point some of
these things out <g>. Ok, so I enjoy a little schadenfreude here and
there; I'm going to hell.


> A noble goal. Probably good PL design can make 10% as much
> difference as the variation between programmers does, but that's
> still a huge potential benefit.

My goal is 10%, and I agree it's huge. If you've got a million dollar
budget, that's $100,000 going straight to profit.


> A compiler is part of the checklist. Static analysis tools go
> further, design and code reviews help, unit testing helps (and
> I know you agree on that, as D built it right into the language).

Yes. I found that putting such in the language makes it much more likely
that people will use it. Built in unit testing has become a very popular
feature of D. Bruce Eckel gets the credit for talking me into that one.


>>> because compilers have warned in any marginal situation,
>>
>> Warnings are a good sign that there's something wrong with the language
>> design. BTW, I just tried this:
>>
>> int test(char c) { return c; }
>>
>> with:
>>
>> g++ -c foo.cpp -Wall
>>
>> and it compiled without error or warning. (gcc-4.1)
>
> What's "marginal" about that situation?

It gives different answers depending on the signedness of char, when c
has a value 0x80 <= c <= 0xFF. I should think if a compiler were to
usefully warn about these things, that example would be first on the list.
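
Spelled out:

#include <cstdio>

int test(char c) { return c; }

int main()
{
    std::printf("%d\n", test('\xE9'));
    // prints 233 where plain char is unsigned, -23 where it is signed
}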

--------
Walter Bright
http://www.digitalmars.com
C, C++, D programming language compilers

Peter Dimov

Jan 14, 2008, 8:29:32 PM
On Jan 14, 10:27 pm, Walter Bright <wal...@digitalmars-nospamm.com>
wrote:

> Given the code in C++ and Java:


>
> a = foo() + bar();
>
> which one would I have to rewrite as:
>
> tmp = foo();
> a = tmp + bar();
>
> if there is an order of evaluation issue? How would I reliably detect
> such an issue in the C++ version (putting on my QA hat)?

You make the source code and the unit tests available and someone is
sure to report it to you - if the code is worth using.

Consider the opposite example.

int a = b * c;

where the 'Java' code has int fixed at 32 bits, and the C++ port has a
64 bit int. The 'Java' code may have an overflow-induced bug in some
rare cases, whereas the C++ port will not suffer from this issue. So
it actually works both ways. Closed source vendors that support and
test on few platforms will much prefer a fixed int, whereas open
source authors whose code runs on a variety of platforms would prefer
a variable int in the original C spirit (the largest integral type
that does not impose significant performance loss).
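
To put numbers on it (invented ones):

int b = 100000, c = 100000;
int a = b * c;
// exact (10000000000) with a 64 bit int; UB, typically wrapping to
// 1410065408, with a 32 bit int

The 32 bit 'Java' build may have shipped for years with that latent
bug; the 64 bit C++ port silently avoids it.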

Walter Bright

Jan 14, 2008, 8:36:53 PM
Bo Persson wrote:
> Walter Bright wrote:
>> Bo Persson wrote:
>>>>> My experience porting D code between platforms is it ports easier
>>>>> than the equivalent C++ code.
>>> Porting is easier if you limit the number of potential platforms.
>> Sure, but I was comparing porting C++ from platform A to B with
>> porting D from A to B. The latter was noticeably easier.
>
> It could be because the variation is smaller. There are some platforms
> were you cannot easily implement D at all, like IBM zSeries with
> EBCDIC character set and non-IEEE floating point. C++ has no problem
> with that.

No problem compiling the code, that is. It offers no guarantee your
ascii code will run. All the porting problems are dumped on the programmer.

So yes, C++ 'supports' EBCDIC. But C++ doesn't support Unicode or UTF
encodings. D's native character types use Unicode's UTF-8, UTF-16, and
UTF-32 encodings. EBCDIC is the past, Unicode is the future.

Java, based on ascii, runs on EBCDIC machines. So it clearly can't be
that hard.


Floating point is another issue. FP code is often very sensitive to
precision and other details. You might even need to use different
algorithms for non-IEEE arithmetic. The fact that the code manages to
compile on those machines is of no help at all in finding/fixing such
dependencies.


>>>> Porting Java is easy too, if your target platform supports it.
>>> Porting Java is hard, if you haven't ported its platform first!
>> Porting C++ compilers is pretty hard, too. How many programmers do
>> you know who can write a code generator?
>
> The point was rather that Java is very hard, if the intended platform
> doesn't support the spec. It might require adding dedicated hardware:
>
> http://www-03.ibm.com/systems/z/zaap/

zaap is not required to run Java on those machines (see the zaap faq).
All it is is a hardware accelerator specific to Java bytecodes.


>>> We had a discussion just last week with a Java developer on reusing
>>> his web server code on the mainframe.
>>>
>>> - "Oh dear! That's just Java 1.5, I need 1.6 generics for my code.
>>> Limiting myself to 1.5 features will cost you a lot more!"
>> But isn't Java implemented in C? C is more portable and available on
>> every platform, so he should just recompile it and he's good to go.
>
> Well, IBM believe they should decide what Java version to run on z/OS.

That's a problem with IBM, not any particular language.

> It also needs to access the special Application Assist hardware to do
> IEEE floating point. C and C++ doesn't have to do that.

So, with the JVM implemented in C, how does C being compilable on that
machine mean that we can just recompile the JVM and have Java work? We
both know that doesn't happen - and that just having code compile
doesn't mean it is portable.


> For our inhouse applications portability is no concern, but Java still
> insists on using a portable data format.

Then I believe we agree that there's more (a lot more) to portable C++
code than being able to recompile it.

> Even if it is more expensive.
> A lot more, in this case!

It boggles the mind to think people will pay $125,000 for zaap which is
only good for accelerating Java apps, when they can use cheap linux
boxes instead. It's not like Java apps are legacy apps from the 60's.

--------
Walter Bright
http://www.digitalmars.com
C, C++, D programming language compilers

--

emildot...@gmail.com

Jan 14, 2008, 8:34:47 PM
> The only problem is the shortage of competent C++
> programmers, and difficulty of recognizing them when you run across one.
> And frankly I'd want my (rare and expensive) competent programmers
> working on more productive things than trying to wring all the UB out of
> the code. I'd rather *define* UB out of existence. Voila, problem gone.

Problem gone, yes -- but at what price?

Undefined behavior exists in the standard not because most platforms
differ, but because it allows for optimizations even in the presence
of slight deviations from the "norm".

"Solving" the problem the way you're suggesting would make programmers
more tolerant of the overhead introduced on some platforms. For
example, most C++ programmers are ignorant of the issue of pointer
aliasing simply because it's defined with safety in mind, yet everyone
is paying the price of this definition in terms of overhead. What's
more, they look at the code generated by the compiler and think "duh,
this is a lame optimizer" when in fact the optimizer can't be less lame
while still complying with the language definition.
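
A small illustration (scale_all is a made-up example):

void scale_all(float* out, const float* in, const float* s, int n)
{
    for (int i = 0; i < n; ++i)
        out[i] = in[i] * *s;   // *s may alias out[i], so a conforming
                               // compiler must reload it every iteration
}

The programmer may know that s never points into out, but the language
definition forces the pessimistic code on everyone.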

Emil Dotchevski
Reverge Studios, Inc.
http://www.revergestudios.com/reblog/index.php?n=ReCode

Pete Becker

Jan 14, 2008, 8:33:56 PM
On 2008-01-14 09:27:19 -0500, Walter Bright
<wal...@digitalmars-nospamm.com> said:

> Pete Becker wrote:
>> Java's success was largely the result of having a marketing department.
>
> While a great marketing department is certainly helpful, it is a serious
> mistake to dismiss Java that way (at least for a language designer, it is).

Funny how you removed the context from what I said. Your claim, which
is what I was responding to, was:

>>> On the other hand, Java went the route of eliminating undefined and
>>> implementation defined behavior, and has had astonishing success
>>> (largely at C++'s expense).

I stand by my statement. Java's "astonishing success (largely at C++'s
expense)" was largely the result of having a marketing department.
While Java was on the rise, just about every article you read about
Java started out with a paragraph that was a cheap shot at C++ (My
favorite was the claim that C++ couldn't have garbage collection
because it doesn't run in a virtual machine). You had to read the
second paragraph to find out what the article was actually about.

--
Pete
Roundhouse Consulting, Ltd. (www.versatilecoding.com) Author of "The
Standard C++ Library Extensions: a Tutorial and Reference
(www.petebecker.com/tr1book)


Walter Bright

Jan 15, 2008, 8:52:30 AM
Jerry Coffin wrote:
> Yes, it would. It's entirely possible and indeed quite practical and
> reasonable to write C and C++ code that works perfectly fine depending
> only upon what C and C++ guarantee for char (for one example). The fact
> that under some circumstances char might be 16, 32 or even 64 bits
> doesn't bother the code a bit. Even code that is written to depend on 8-
> bit chars is usually quite easy to convert to remove that dependency.

Except when you're doing the (very common) practice of using char types
to do byte manipulation. Or if you're trying to deal with Unicode encodings.
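
With signed chars, UTF-8 bites immediately; a sketch:

bool is_ascii(char c)
{
    // return c < 128;   // wrong with signed char: the UTF-8 trail
    //                   // byte '\xA9' compares as -87 and "passes"
    return (unsigned char)c < 128;
}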

> D goes the opposite route: by guaranteeing that char will always be 8
> bits, it (tacitly?) encourages people to write their code to depend on
> that fact. It's a bit difficult to guess at how easy it is to write D
> code that won't break with other sizes of char, since there's apparently
> no way to test such code right now.

If you were worried about that, you could:
typedef char mychar; // 8 bit chars
or:
typedef wchar mychar; // 16 bit chars
or:
typedef dchar mychar; // 32 bit chars


>> I suspect that this is an even better solution for those programming
>> such machines, because they won't be under the delusion that code that
>> has never been tested under such conditions would have been "portably"
>> written with some wrong notion of portability.
> I think this is a delusion. People who normally work with such machines
> are unlikely to be deluded about the amount of code that's really
> portable.

I agree that people who are used to porting between machines A and B
will have long since figured out how to write code that is portable from
A to B. My point was for people who had no experience with such ports
attempting to write code that is portable.

Writing code that is portable in C++ is a process of accumulating
techniques, tricks, and methods over a period of time from experience.
Just reading the spec isn't sufficient (and few C++ programmers have
actually read the whole thing).


> Nonetheless, if somebody _wants_ to do so, they most certainly
> _can_ write C and/or C++ code that works fine with various sizes of
> char. With D they can't do any such thing, because anything with a
> different size of char, by definition, isn't D.

See the above typedef's.

>> A more relevant question is how many programmers are programming for
>> these oddballs, vs programming for mainstream computers?
>
> That, of course, is a difficult question to answer. The ratio of
> programmers to CPUs is almost certainly smaller than with mainstream
> computers, but sales volume is also _quite_ high in many cases, making
> it difficult to figure out the actual number.

The sales volume for embedded processors is quite meaningless in
determining the number of programmers working on them. It's just as
worthless as trying to determine the number of Apple engineers by
counting iPod sales.

>> UB does not imply reliable or repeatable behavior, so any dependence on
>> UB is inherently unreliable _and_ unportable.
> This is nonsense and you know it. UB does not _imply_ reliable or
> repeatable, but it does not preclude either one. The standard
> specifically allows an implementation to define the result of particular
> undefined behavior -- and when portability isn't a concern, the fact
> that it's defined only by the implementation is entirely irrelevant.

The compiler implementor can guarantee whatever he wants to, but the
specification does not guarantee anything with regards to UB. Erratic,
random behavior is certainly allowed by the spec, and indeed happens
with UB like buffer overflows.


>> But when you've got a
>> million lines of code, suddenly even the obscure cases become probable.
>> And when you don't have a thorough test suite (who does?) how can you be
>> *sure* you don't have an issue there?
>
> Being truly "*sure*" of anything is pretty rare -- even if it's required
> by the language, it's nearly impossible to be sure you couldn't run
> across some obscure bug in the compiler.

I agree, but that doesn't justify not doing what we can to improve the
odds of things being correct. Boeing can never guarantee their planes
won't crash, but they work very hard at addressing every cause of
failure that they know about.

>> I'm very interested in building languages which can offer a high degree
>> of reliability. While D isn't a language that gets one there 100%, it
>> gets a lot closer than C++ does.
> I've yet to see convincing evidence of that -- though I'll admit I
> haven't looked very hard, and such evidence would be off-topic here in
> any case.

I don't believe such evidence would be off-topic in a discussion about
whether C++ should fix its UB problems or not.

>> I would vote for C++ to explicitly ditch 16 bit support. No problem there!
> To accomplish what?

To get rid of the possibility of 16 bit 'int' types.

> For most practical purposes, 16-bit platforms were
> "ditched" when exception handling was added. OTOH, if somebody's willing
> to go to the time, effort, etc., of implementing it for a 16-bit
> platform, and meets all the requirements, more power to them.

Digital Mars C++ supports 16 bit code with exception handling. It works
technically, but is not practical. I agree that 32 bits is needed for EH.

> While
> portability is a useful attribute, there are also many perfectly good
> reasons to write code that's not portable.

I agree. But each issue of UB and IDB should be evaluated in terms of
its costs and benefits. C++ has too much UB and IDB that has significant
costs, and benefits which are dubious. D also has UB and IDB, just a lot
less of it.

> On Intel, I believe it worked fine with every
> compiler I had handy at the time as well (at least with MS, Borland,
> Intel and GNU).

Windows compilers are pretty compatible in their handling of UB and IDB,
quite deliberately so as each tried to lure customers away from their
competitors. It's a lot easier to lure a customer if their code compiles
and *works* without changes.

Hence, evidence of portability among Windows C++ compilers is not much
evidence in favor of allowing UB and IDB.

>> I define it as the likelihood of if a program compiles *and* works on X
>> that it will also compile *and* work on Y, regardless of how good the
>> programmer is.
> That makes it sound a great deal like D probably won't support most of
> what I develop at all.

I was defining portability with that statement, not D. D isn't intended
to be 100% portable. But D does intend to remove non-portable aspects of
language design whose costs exceed the benefits.


>>>> 1) reliance on UB or IDB is not mechanically detectable, making programs
>>>> *inherently* unreliable
>>> You're overstating the situation. Certainly some reliance on some UB
>>> and/or IDB is mechnically detectable.
>> Runtime integer overflow isn't.
>
> Yes and no. It can detect when runtime integer overflow becomes a
> possibility, and even (for example) show the source of the values that
> could result in the overflow.

This could result in overflow:

int sum(int a, int b) { return a + b; }

A compiler that nagged about this would be more of a nuisance than a help.


>> A typical C++ compiler has a bewildering array of switches that change
>> its behavior.
> Change what behavior?

man g++

will list 40 of them under "C++ Language Options".


>>>> How much effort have you seen, time and again, going into dealing with
>>>> the implementation defined size of an int? Everybody deals with it, 90%
>>>> of them get it wrong, and nobody solves it the same way as anybody else.
>>>> How is this not very expensive?
>>> I've seen a lot of effort put into it repeatedly, but I'd say over 99%
>>> of the time, it's been entirely unnecessary from beginning to end.
>> In D, the effort to deal with it is 0 because the problem is defined out
>> of existence.
> I suppose that depends on your viewpoint. At least in Java, the attempt
> at defining the problem out of existence has created a problem that I'd
> say is at least twice as bad.

The size of an int is a bad problem in Java?

>> Ideally, if a program
>> compiles, then its output should be defined by the language.
> I disagree. There is a great deal of room for non-portable code in the
> world. Attempting to define the output for all code is pointless and
> foolish.

D doesn't eliminate all UB or IDB. But it does try to eliminate all the
ones where the costs exceed the benefits.


>> In other words, C++ has de facto standardized around 32 bit ints.
> I disagree. If anything, the prevalence of 32-bit ints has created
> problems, not cured them.

What problems?


>>> languages like Java and D don't even allow that.
>> In D, you can use a variable sized int if you want to:
>> typedef int myint;
>> and use myint everywhere instead of int. To change the size, change the
>> typedef. Nothing is taken away from you by fixing the size of int. It
>> just approaches it from the opposite direction:
> That doesn't fix the problem.

Why not?


>> C++: use int for variable sizes, typedef for fixed sizes
>> D: use int for fixed sizes, typedef for variable sizes
>> Java: doesn't have typedefs, oh well :-)
>
> You've got things backwards: in C++ you get code that works correctly on
> different sizes of machines, unless you take steps to stop it from doing
> so. In D you get code that works correctly on different sizes of
> machines only by going through massive brain damage to undo its
> mistakes.

A typedef is massive brain damage? I'm not following this at all.


>>> Contrary to your previous claims, targets you see fit to
>>> ignore have not gone away, nor are they likely to do so anytime soon.
>> I think 16 bit DOS and 36 bit PDP-10's are dead and are not likely to
>> rise from their graves.
> So why did you make statements about 16-bit DOS as if it was a relevant
> target?

I used it as an example of where C++ tried to be portable, but failed.
C++98 *did* try to accommodate 16 bit targets.


>> Yes, and Digital Mars C++ does, too. I know of nobody who actually tests
>> their code using those switches.
> You do now!

There's one!

>> Most switches of that sort are of limited utility anyway because they
>> screwup the abi to existing compiled libraries.
> What problems have you verified along that line? I've used unsigned char
> quite a bit without seeing such problems.

Well, anything that depends on CHAR_MAX and CHAR_MIN being constant
throughout the program, for example.
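
For instance (a sketch, assuming g++'s -funsigned-char on one side of
the build and the default signed char on the other):

inline bool is_high_bit(char c) { return c >= 0x80; }

If that lives in a shared header, it is always false in the translation
units where plain char is signed, and true for half of all values where
it is unsigned -- the "same" inline function quietly disagrees with
itself across the program.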

>> The C++ compiler for sharc has many sharc specific extensions. It isn't
>> hard to imagine a D variant that would do the same. You'd have the same
>> difficulties porting C++ code to the sharc C++ compiler as you would
>> porting standard D to sharc specific D.
>
> You've got things backwards -- while the SHARC extensions certainly make
> it easy to write C++ code for it that won't port elsewhere, they do NOT
> prevent portable code from working on the SHARC.

That's kinda obvious: code that's portable to the SHARC works on the
SHARC <g>.


> In the D case, however, there seems to be no way to write code that's
> reasonably portable TO the SHARC.

If you eschew char and short in favor of dchar and int, your code will
port to the SHARC.

> At best D appears to make that much more difficult, and for most
> practical purposes appears to rule it out completely.

That's just false. If the SHARC had 7 bit bytes, I'd concede the point.
But that's not the case. If you stick with D types that do match SHARC
types, it's just as portable as the C++ code is.

I challenge you to port zlib (written in C) to the SHARC. I think you'll
find it every bit as much work as if it were in D.

--------
Walter Bright
http://www.digitalmars.com
C, C++, D programming language compilers

--

Walter Bright

Jan 15, 2008, 8:52:54 AM
Pete Becker wrote:
> On 2008-01-14 09:27:19 -0500, Walter Bright
> <wal...@digitalmars-nospamm.com> said:
>
>> Pete Becker wrote:
>>> Java's success was largely the result of having a marketing department.
>>
>> While a great marketing department is certainly helpful, it is a serious
>> mistake to dismiss Java that way (at least for a language designer, it
>> is).
>
> Funny how you removed the context from what I said. Your claim, which
> is what I was responding to, was:
>
>>>> On the other hand, Java went the route of eliminating undefined and
>>>> implementation defined behavior, and has had astonishing success
>>>> (largely at C++'s expense).
>
> I stand by my statement. Java's "astonishing success (largely at C++'s
> expense)" was largely the result of having a marketing department.

And I don't believe I misrepresented your statement. I went back and
read your earlier article, and I'm still not sure how to construe it any
other way than what it obviously says. I apologize if I misrepresented
it, but there was no intention to do so, and I still don't see the
misrepresentation.

> While Java was on the rise, just about every article you read about
> Java started out with a paragraph that was a cheap shot at C++ (My
> favorite was the claim that C++ couldn't have garbage collection
> because it doesn't run in a virtual machine). You had to read the
> second paragraph to find out what the article was actually about.

You can get people to look at a product, kick the tires, and evaluate it
based on lies. But you won't get a sustained success that way, and Java
has been a sustained success.

Even VB has things to teach a language designer.

--------
Walter Bright
http://www.digitalmars.com
C, C++, D programming language compilers

--

James Dennett

Jan 15, 2008, 9:04:03 AM
Walter Bright wrote:
> James Dennett wrote:
>>>> It allows for implementations which diagnose all overflows;
>>> Are there any such implementations?
>> I don't know. Even if there aren't, the standard allows for one
>> in future. The C++ market has been around a long time, and will
>> continue to be significant for a long time yet, and a standard
>> that chooses too often to fix things when it could allow for
>> better QoI is not a good standard.
>
> Yes, C++ has been around for 20+ years. If a particular characteristic
> of C++ compilers, allowed for by the standard, and not too hard to
> implement, has not emerged in that time, then I argue there is no
> significant interest in it. And so allowing for it in the standard is
> not compelling, particularly if there are costs associated with allowing
> it.
>
>
>> There's also something
>> like -fwrapv which disables those optimizations at some cost in
>> code speed (but the cost is <<< 1% for typical application code).
>
> An optimization which breaks code and offers <<< 1% improvement is a bad
> idea to implement, even if allowed by the standard.

There are many optimizations implemented by compilers (including,
I would suspect, yours) which give <<<1% improvement on running
times for typical programs -- but cumulatively they make a big
difference.

>> Maybe what's needed is a
>> "simple" mode where non-obvious optimizations are disabled and
>> code runs more slowly, and a "strict" mode where the optimizer
>> is allowed to do anything consistent with the language spec.
>
> The siren song of more compiler switches should be resisted as much as
> possible. Put wax in your ears and rope yourself to the mast :-)

I disagree, both as an implementor and as a user.

>> I'll mention again: turning an overflow into wraparound is
>> not generally safe -- code which assumes no overflow is broken
>> in either case.
>
> The code this optimization breaks is not always incorrect code. See the
> link I posted in my reply to Andrei.

By definition, it is incorrect code. It violates language rules.

>> I'd prefer to have code that can be made safe.
>
> Leaving it as UB doesn't help at all with that.

Taking my quote in context, it does. (But it's out of context now.)

>> But it's a meta-solution: it allows implementations to offer
>> a choice of solutions, and for the market (rather than a BDFL
>> or a committee) to determine which is most useful.
>
> But that means you're encouraging *reliance* on undefined behavior.

No.

>>> There aren't very many "good" C++ programmers, then <g>.
>> There's sadly a shortage of competent programmers, and it's
>> somewhat independent of the implementation language.
>
> The bar for competence in C++ is a lot higher than for other languages.

I fear that I have to agree with that.

> That's pretty marvy if you're a top C++ expert and can command $5000/day
> in consulting fees, but it stinks if you're on the other side of that
> having to write the checks. Hence the demand for languages where the
> costs are lower, a demand that I think C++ dismisses a little too easily.

I don't know that it does. Frankly, being competent to write complex
code requires about the same level of investment whichever language
you choose, as the complexity isn't in the language.

>> The very uniformity and guarantees
>> that Win16 (and later Win32) laid down caused portability
>> problems to later generations of machines; the flexibility
>> that was built in to Unix-style specifications aided that
>> same portability.
>
> My point was that 16 bit programs *tried* to be portable to 32 bits. All
> those typedefs in windows.h were there for a reason.

It's harsh to call windows.h amateurish, but historically it
has been. (The "amateurs" in question got very rich from very
bad software. Which highlights the fact that being technically
smart isn't everything.)

>> It stinks to use "int" if it's not going to be as fast as "int64_t"
>> in some context, certainly. And how am I to get the fastest type
>> for operations that could be done in 16 bits? intfast16_t, or...
>> I have to guess with D whether to use short or int, and profile on
>> every platform? If I say that I need a type of exactly 32 bits, I'm
>> overspecifying the size while underspecifying optimization goals
>> (for speed or for space).
>
> You can still use typedefs in D for your various tradeoffs. Like I said
> in another post, D approaches this from the opposite direction than C++.

C99 provides appropriate typedefs for various trade-offs, as
standard. C++0x will inherit those.
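
For illustration, the C99 spellings (from <stdint.h>, slated to appear
in C++0x as <cstdint>):

#include <stdint.h>

int_fast16_t counter;   /* the fastest type of at least 16 bits */
int_least32_t total;    /* the smallest type of at least 32 bits */
int64_t mask;           /* exactly 64 bits, where the target has one */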

> C++: int sizes variable, typedef sizes fixed
> D: int sizes fixed, typedef sizes variable

But in D, you have no built-in support for requesting "the fastest
integral type please" or "the smallest of at least 48 bits" (except
in that D doesn't support types between 32 and 64 bits, as far as
I understand).

>> But if your focus is narrow enough that you care only about mainstream
>> desktop and server platforms, it's a perfectly reasonable trade-off.
>
> It's a false choice. D doesn't prevent you from using varying integer
> sizes. It's just not the *default*.

It's not a false choice if D has such a limited selection as I
believe it does.

>> Please
>> don't assume that those who disagree with you do so because they
>> lack experience or knowledge. It's common that they have knowledge
>> which you don't (and vice versa).
>
> Point taken.
>
>>> I bet if I sat down and quizzed
>>> you on arcane details of the spec, I'd find a way to trip you up, too.
>> Yup; you might start by grilling me on arcana of name lookup,
>> there are enough landmines there I'd fail on.
>
> It's fun to watch even the experts' mouths drop when I point some of
> these things out <g>. Ok, so I enjoy a little schadenfreude here and
> there; I'm going to hell.

I hope there's a bar there. We can continue our discussion >:)

>> A noble goal. Probably good PL design can make 10% as much
>> difference as the variation between programmers does, but that's
>> still a huge potential benefit.
>
> My goal is 10%, and I agree it's huge. If you've got a million dollar
> budget, that's $100,000 going to the profit.

And a million dollars doesn't buy all that much software development
these days.

>> A compiler is part of the checklist. Static analysis tools go
>> further, design and code reviews help, unit testing helps (and
>> I know you agree on that, as D built it right into the language).
>
> Yes. I found that putting such in the language makes it much more likely
> that people will use it. Built in unit testing has become a very popular
> feature of D. Bruce Eckel gets the credit for talking me into that one.

I have newfound respect for Mr Eckel.

>>>> because compilers have warned in any marginal situation,
>>>
>>> Warnings are a good sign that there's something wrong with the language
>>> design. BTW, I just tried this:
>>>
>>> int test(char c) { return c; }
>>>
>>> with:
>>>
>>> g++ -c foo.cpp -Wall
>>>
>>> and it compiled without error or warning. (gcc-4.1)
>>
>> What's "marginal" about that situation?
>
> It gives different answers for different signedness of char's, when c
> has a value 0x80<=c<=0xFF. I should think if a compiler was to usefully
> warn about these things, that example would be first on the list.

For a signed 8-bit char, that never occurs, so I don't know
what you mean. And a conversion from a char to an int will
preserve values if possible.

-- James

Jerry Coffin

Jan 15, 2008, 9:02:49 AM
In article <suWdnfHdOtlF5xfa...@comcast.com>,
wal...@digitalmars-nospamm.com says...

[ ... ]

> I'd like to expand on this issue a bit. The C++ sharc compiler has:
>
> 1) 32 bit shorts
> 2) 32 bit chars
> 3) 40 bit doubles
>
> While this is legal for a C++ compiler, and arbitrary C++ code may
> compile successfully, that doesn't at all mean it will run.

Of course it doesn't mean it will run, but it means that if you care,
you CAN make it run correctly.

Compiling is necessary but not sufficient.

With C++ you can meet the necessary condition easily. Meeting the
sufficient condition may or may not take extra time, effort, skill, etc.

With D you cannot meet the necessary condition (short of something like
writing a virtual machine that provides the conditions it requires, of
course). The sufficient condition becomes irrelevant when the necessary
condition cannot be met.

Your argument based on creating a new language that does not include
char, short, etc., has little to do with either D or C++, since that
language obviously would NOT be either one.

In any case, simply providing an error where/when anybody tries to do
anything with a char (for example) is relatively easy to manage in C++
as well:

#define char   /* every spelled-out "char" now expands to nothing */

Now, something like:

char x;

or:

char func(void);

or:

void func(char x) {}

becomes ill-formed, so the compiler complains about it. Yes, there are a
FEW things that can escape this, such as a function that specifies the
type but not the name of an (unused) parameter:


void func(char) {}

...but this is fairly rare, and since (by definition) we're talking
about an unused parameter, changing its size seems extremely unlikely to
lead to problems anyway.

In the end, writing C++ that's portable between this and various other
architectures is entirely possible, though doing so may be more
difficult than writing less portable code.

Writing D that's portable to this and various other architectures is
simply impossible. The language specification precludes D from being
implemented on this architecture.

--
Later,
Jerry.

The universe is a figment of its own imagination.


Dave Harris

Jan 15, 2008, 4:39:33 PM
wal...@digitalmars-nospamm.com (Walter Bright) wrote (abridged):

> Yes, C++ has been around for 20+ years. If a particular
> characteristic of C++ compilers, allowed for by the standard, and
> not too hard to implement, has not emerged in that time, then I
> argue there is no significant interest in it.

I'd be interested. I've been hurt by int overflow bugs, and I'd have been
helped by a DEBUG option to trap it at run-time. (Or a RELEASE option, if
it didn't affect performance - there's no good reason to overflow int in
C++.)
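
Something along these lines is what I have in mind, as a self-written
stopgap in the meantime (the helper is mine, not a standard facility):

#include <cassert>
#include <climits>

int checked_add(int a, int b)
{
    assert(!(b > 0 && a > INT_MAX - b));   // would overflow upwards
    assert(!(b < 0 && a < INT_MIN - b));   // would overflow downwards
    return a + b;
}

It compiles away under NDEBUG, but it only guards the additions you
remember to route through it -- which is why compiler support would be
so much better.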

I think to some extent these things go in fashion. I use a Microsoft
compiler, and it seems every new release has more and better
bug-detection. A feature which impacts performance becomes increasingly
likely to get supported as hardware gets faster. Also, in this case, as
hardware gets bigger. Now file systems and even address spaces bigger
than 32 bits are becoming common, so int overflows dealing with them are
more likely. Just because something hasn't been done in the past doesn't
mean it won't happen in the future.

I gather you are a compiler vendor, and didn't implement it because you
weren't aware the standard permitted it. So maybe now you know it's
allowed and that at least some users want it, you'll consider adding it
to your product? If not you, then perhaps other vendors reading this.

-- Dave Harris, Nottingham, UK.

Bo Persson

Jan 15, 2008, 4:36:48 PM
Walter Bright wrote:
> Bo Persson wrote:
>> Walter Bright wrote:
>>> Bo Persson wrote:
>>>>>> My experience porting D code between platforms is it ports
>>>>>> easier than the equivalent C++ code.
>>>> Porting is easier if you limit the number of potential platforms.
>>> Sure, but I was comparing porting C++ from platform A to B with
>>> porting D from A to B. The latter was noticeably easier.
>>
>> It could be because the variation is smaller. There are some
>> platforms were you cannot easily implement D at all, like IBM
>> zSeries with EBCDIC character set and non-IEEE floating point. C++
>> has no problem with that.
>
> No problem compiling the code, that is. It offers no guarantee your
> ascii code will run. All the porting problems are dumped on the
> programmer.

My code isn't coded for ASCII, and I don't want to port it. This is
all about in-house applications. We just want to write code for the
hardware we have.

>
> So yes, C++ 'supports' EBCDIC. But C++ doesn't support Unicode or
> UTF encodings. D's native character set is Unicode UTF-8, UTF-16,
> and UTF-32. EBCDIC is the past, Unicode is the future.

But we live with the past. 40 years of EBCDIC code just doesn't go
away.

>
> Java, based on ascii, runs on EBCDIC machines. So it clearly can't
> be that hard.

For some definition of "runs", see below.

>
> Floating point is another issue. FP code is often very sensitive to
> precision and other details. You might even need to use different
> algorithms for non-IEEE arithmetic. The fact that the code manages
> to compile on those machines is of no help at all in finding/fixing
> such dependencies.

It's not about porting, it's about being allowed to write custom code
for the machine. Too bad if the language spec says that you cannot!

>
>
>>>>> Porting Java is easy too, if your target platform supports it.
>>>> Porting Java is hard, if you haven't ported its platform first!
>>> Porting C++ compilers is pretty hard, too. How many programmers do
>>> you know who can write a code generator?
>>
>> The point was rather that Java is very hard, if the intended
>> platform doesn't support the spec. It might require adding
>> dedicated hardware: http://www-03.ibm.com/systems/z/zaap/
>
> zaap is not required to run Java on those machines (see the zaap
> faq). All it is is a hardware accelerator specific to Java
> bytecodes.

"All it is" is that it enables you to run Java code on the mainframe,
with reasonable efficiency. Without the accelerator, even JIT-ed code
made the WebSphere server consume 50% of the CPU resources on
Enterprise level z9 hardware. The other several hundred applications
had to be content with the other half of the machine.

>
>
>>>> We had a discussion just last week with a Java developer on
>>>> reusing his web server code on the mainframe.
>>>>
>>>> - "Oh dear! That's just Java 1.5, I need 1.6 generics for my
>>>> code. Limiting myself to 1.5 features will cost you a lot more!"
>>> But isn't Java implemented in C? C is more portable and available
>>> on every platform, so he should just recompile it and he's good
>>> to go.
>>
>> Well, IBM believe they should decide what Java version to run on
>> z/OS.
>
> That's a problem with IBM, not any particular language.
>
>> It also needs to access the special Application Assist hardware to
>> do IEEE floating point. C and C++ doesn't have to do that.
>
> So, with the JVM implemented in C, how does C being compilable on
> that machine mean that we can just recompile the JVM and have Java
> work? We both know that doesn't happen - and that just having code
> compile doesn't mean it is portable.

And the absurdity here is that the Java spec makes it less portable.
Having the JVM implemented in C isn't enough, if it means that you
have to emulate the hardware. C and C++ can use the hardware, Java
cannot.

Also, we have no requirement for the code to be portable. It only runs
on one set of hardware!

>
>
>> For our inhouse applications portability is no concern, but Java
>> still insists on using a portable data format.
>
> Then I believe we agree that there's more (a lot more) to portable
> C++ code than being able to recompile it.
>
>> Even if it is more expensive.
>> A lot more, in this case!
>
> It boggles the mind to think people will pay $125,000 for zaap
> which is only good for accelerating Java apps, when they can use
> cheap linux boxes instead. It's not like Java apps are legacy apps
> from the
> 60's.

And some people buy several of these, to solve the other problem -
access to the data.

We started out running web applications on a room full of PC level
hardware. Using C++, and everything. :-)

Turned out there was a scaling problem. The databases were still on
the mainframe, because that is where the data is produced. The
bottleneck turned out to be the communications between the PC servers
and the back end database. The transactions timed out.

So, by moving the web server over to the mainframe that was solved. At
the expense of rewriting everything in Java, and investing a couple of
$100k in application specific hardware (about 4 zAAPs I guess). So
Java's portable spec cost us that amount, for code that isn't portable
anyway!


Bo Persson

Sean Hunt

Jan 15, 2008, 4:33:20 PM
On Jan 14, 6:36 pm, Walter Bright <wal...@digitalmars-nospamm.com>
wrote:

> So yes, C++ 'supports' EBCDIC. But C++ doesn't support Unicode or UTF
> encodings. D's native character set is Unicode UTF-8, UTF-16, and
> UTF-32. EBCDIC is the past, Unicode is the future.

It's possible to write a compiler that uses 32-bit char encoded in
UTF-32...

Or, you could wait until C++09, when C++ will support all three of
those encodings you just mentioned.

Pete Becker

Jan 15, 2008, 4:32:05 PM
On 2008-01-15 02:52:54 -0500, Walter Bright
<wal...@digitalmars-nospamm.com> said:

>
> You can get people to look at a product, kick the tires, and evaluate it
> based on lies. But you won't get a sustained success that way, and Java
> has been a sustained success.
>

That's why nobody smokes cigarettes any more: they evaluate smoking
objectively, see the dangers, and make a rational decision.

Language choices can be strongly influenced by non-technical criteria.
Sustained success, in itself, is not evidence of technical superiority,
just of technical adequacy.

--
Pete
Roundhouse Consulting, Ltd. (www.versatilecoding.com) Author of "The
Standard C++ Library Extensions: a Tutorial and Reference"
(www.petebecker.com/tr1book)


Walter Bright

Jan 16, 2008, 8:06:19 AM
Dave Harris wrote:
> I gather you are a compiler vendor,

Yes, Digital Mars.

> and didn't implement it because you
> weren't aware the standard permitted it.

I've never had a request for it until now.

> So maybe now you know it's
> allowed and that at least some users want it, you'll consider adding it
> to your product? If not you, then perhaps other vendors reading this.

I'm always open to doing custom compiler work for people. Email me if
you're interested.

--------
Walter Bright
http://www.digitalmars.com
C, C++, D programming language compilers

--

Jerry Coffin

Jan 16, 2008, 8:08:40 AM
In article <-92dnbFuPZ_oixHa...@comcast.com>,

wal...@digitalmars-nospamm.com says...
> Jerry Coffin wrote:

[ ... ]

> > Even code that is written to depend on 8-
> > bit chars is usually quite easy to convert to remove that dependency.
>
> Except when you're doing the (very common) practice of using char types
> to do byte manipulation. Or if you're trying to deal with Unicode encodings.

Yes, it's possible to write such code, but (IMO, of course) much like most
code that deals with specifics of the hardware, I think most of that
should be isolated to specific modules that interface to the outside
world.

[ ... ]

> I agree that people who are used to porting between machines A and B
> will have long since figured out how to write code that is portable from
> A to B. My point was for people who had no experience with such ports
> attempting to write code that is portable.

I don't think changing the language is likely to have much effect on
this. IME, most people write code that's conceptually non-portable (so
to speak) and changing the language isn't going to affect that. Just for
an obvious example, there's nearly no possible change in the language
that will make code portable that has calls to Windows, UNIX, or
whatever everywhere, depends heavily on the OS's memory management data
structures, process and/or thread scheduling, etc.

From my own experience in porting C and C++ code, I'd say probably a
single digit percentage of the effort was devoted to issues with the
language itself; the vast majority was devoted to things that no
change in the language could have cured (at least short of a standard
library at least the size of Java's).

> > Nonetheless, if somebody _wants_ to do so, they most certainly
> > _can_ write C and/or C++ code that works fine with various sizes of
> > char. With D they can't do any such thing, because anything with a
> > different size of char, by definition, isn't D.
>
> See the above typedef's.

I'll take your word for their being effective.

> The compiler implementor can guarantee whatever he wants to, but the
> specification does not guarantee anything with regards to UB. Erratic,
> random behavior is certainly allowed by the spec, and indeed happens
> with UB behavior like buffer overflowing.

Of course _some_ UB causes erratic behavior, and I doubt anybody (doing
legitimate coding anyway) has any interest in depending on buffer
overflows. That's hardly proof that all UB leads to erratic behavior for
those who don't care about portability.

[ ... ]

> >> I would vote for C++ to explicitly ditch 16 bit support. No problem there!
> > To accomplish what?
>
> To get rid of the possibility of 16 bit 'int' types.

Why? If you want something to be 32 bits, why are you using int in the
first place? Why not use long if that's what you really want?

[ ... ]

> [...] each issue of UB and IDB should be evaluated in terms of
> its costs and benefits. C++ has too much UB and IDB that has significant
> costs, and benefits which are dubious. D also has UB and IDB, just a lot
> less of it.

I'm not sure I agree with all your conclusions, but I certainly agree
with the idea of how to reach them. Given that nearly none of the costs
or benefits is easily measured, I think differences of opinion on the
subject are quite reasonable though.

That's certainly not to say that I disagree completely either: I
think some things in C++ could be defined more tightly without any harm.
In some cases, such changes would be opposed. In many cases, however,
they remain undefined (or loosely defined) simply because nobody's been
sufficiently affected that they're willing to go to the work of writing
a tighter specification.

> > On Intel, I believe it worked fine with every
> > compiler I had handy at the time as well (at least with MS, Borland,
> > Intel and GNU).
>
> Windows compilers are pretty compatible in their handling of UB and IDB,
> quite deliberately so as each tried to lure customers away from their
> competitors.

Yes and no -- certainly MS, Borland and Intel fit that description. I'm
not sure I'd describe g++ as a Windows compiler at all though -- for the
most part, it's a UNIX compiler that runs on Windows only because
Windows can do a semi-passable imitation of UNIX when/if required.

[ ... ]

> > Yes and no. It can detect when runtime integer overflow becomes a
> > possibility, and even (for example) show the source of the values that
> > could result in the overflow.
>
> This could result in overflow:
>
> int sum(int a, int b) { return a + b; }
>
> A compiler that nagged about this would be more of a nuisance than a help.

I'm talking about doing dataflow analysis, so if it can prove that sum
is only ever called with numbers smaller than 100, it gives no warning.
OTOH, if it can prove that sum is called with an external input, it does
provide such a warning.
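
In other words (hypothetical warning behavior, not any particular
compiler):

#include <istream>

int sum(int a, int b) { return a + b; }

int f() {
    return sum(3, 4);          // inputs provably small: no warning
}

int g(std::istream& in) {
    int a, b;
    in >> a >> b;
    return sum(a, b);          // inputs from the outside world: warn
}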

> >> A typical C++ compiler has a bewildering array of switches that change
> >> its behavior.
> > Change what behavior?
>
> man g++
>
> will list 40 of them under "C++ Language Options".

Quite a few of these simply aren't allowed by C++ at all. If they want
to call what it compiles C++ anyway, I'm certainly not going to stop
them, but it's clear that things like -fno-enforce-eh-specs,
-fms-extensions and either -ffor-scope or else -fno-for-scope simply make
the compiler process a language that's not really standard C++ at all.

Most of the rest shouldn't have any effect on the language per se. Just
for example things like -fabi-version and -fno-default-inline at least
should fall into this category (i.e. I haven't investigated them in
detail, but the ABI version _should_ only affect, well, the ABI, not
anything related to the language itself. If it does affect the language
itself, it probably falls into the category above, where only one
setting really follows the standard, and the others are meaningless from
the viewpoint of the standard language.

About the only way a compiler will always strictly follow a language
definition is if the compiler IS the language definition. While switches
like most you've mentioned are probably a practical necessity, they
don't really have much to do with the language definition itself -- the
language definition clearly can't have much effect on people who've
chosen to ignore it.

[ ... ]

> > I suppose that depends on your viewpoint. At least in Java, the attempt
> > at defining the problem out of existence has created a problem that I'd
> > say is at least twice as bad.
>
> The size of an int is a bad problem in Java?

Yes -- it requires that the source code be rewritten to take advantage
of the underlying machine, and frequently hurts performance without
accomplishing anything in return.

[ ... ]

> >> In other words, C++ has de facto standardized around 32 bit ints.
> > I disagree. If anything, the prevalence of 32-bit ints has created
> > problems, not cured them.
>
> What problems?

People writing non-portable code for no good reason.

> >>> languages like Java and D don't even allow that.
> >> In D, you can use a variable sized int if you want to:
> >> typedef int myint;
> >> and use myint everywhere instead of int. To change the size, change the
> >> typedef. Nothing is taken away from you by fixing the size of int. It
> >> just approaches it from the opposite direction:
> > That doesn't fix the problem.
>
> Why not?

It prevents me from specifying to the compiler what I really want. A lot
of the time, I use an int in C or C++ to get something that's tailored
to the size the machine can use best -- 16 bits would be enough, but if
it happens to do 32- or 64-bit math on it, that's fine too. In Java (and
apparently D) there's just no way for me to specify that type directly.
To get it, I have to use a typedef, and modify my code to fit the
machine at hand -- the very antithesis of portability, and all I get for
my trouble is what a C or C++ compiler would have handled for me
entirely automatically.

> > You've got things backwards: in C++ you get code that works correctly on
> > different sizes of machines, unless you take steps to stop it from doing
> > so. In D you get code that works correctly on different sizes of
> > machines only by going through massive brain damage to undo its
> > mistakes.
>
> A typedef is massive brain damage? I'm not following this at all.

If it was one typedef in one place for one purpose, the brain damage
would be relatively minor. In reality, of course, on anything more than
a truly trivial program, that's not the case, and getting sensible
behavior would involve a lot of brain damage.

> >> Most switches of that sort are of limited utility anyway because they
> >> screwup the abi to existing compiled libraries.
> > What problems have you verified along that line? I've used unsigned char
> > quite a bit without seeing such problems.
>
> Well, anything that depends on CHAR_MAX and CHAR_MIN being constant
> throughout the program, for example.

Hmmm? They remain constant through the entire program -- are you
suggesting that one would change the signedness of char in different
parts of the program? That sounds a bit like the people who try to
change the range produced by rand() by editing the definition of
RAND_MAX...

> > In the D case, however, there seems to be no way to write code that's
> > reasonably portable TO the SHARC.
>
> If you eschewed char and short in favor of dchar and int, your code will
> port to the SHARC.

Hmmm...okay, I buy the idea that at that point it might at least be
theoretically possible (at least provided there was a D compiler for the
SHARC, of course).

[ ... ]

> I challenge you to port zlib (written in C) to the SHARC. I think you'll
> find it every bit as much work as if it were in D.

This sounds pretty silly to me. First of all, to port it in D, I'd have
to start by writing a D compiler for the SHARC, and I'd be _very_
surprised if that wasn't more work than porting zlib.

Second, the SHARC is a DSP. On a DSP, you generally care about signal
processing types of code -- things like forward and inverse transforms
(e.g. FFT or DCT) as well as filters (IIR and FIR). You also frequently
do things like quantization and error-encoding. If you had zlib working on it,
what in the world would you _do_ with it?

--
Later,
Jerry.

The universe is a figment of its own imagination.


Stephen Howe

Jan 16, 2008, 8:04:10 AM
> I'd be interested. I've been hurt by int overflow bugs, and I'd have been
> helped by a DEBUG option to trap it at run-time. (Or a RELEASE option, if
> it didn't affect performance - there's no good reason to overflow int in
> C++.)

A compiler could certainly do it on the Intel platform. Just emit an INTO
instruction after every integer operation. It forces an interrupt if
overflow occurred.
There is some performance loss and some size increase in the program (so
probably useful as a debug option).
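
A sketch of what the emitted code might look like (32-bit x86, gcc
inline-assembly syntax; the wrapper function is mine):

int add_trapping(int a, int b)
{
    int r;
    __asm__("addl %2, %0\n\t"
            "into"             // INT 4 if the add set the overflow flag
            : "=r"(r)
            : "0"(a), "r"(b)
            : "cc");
    return r;
}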

Stephen Howe

Walter Bright

Jan 16, 2008, 8:14:58 AM
James Dennett wrote:
> There are many optimizations implemented by compilers (including,
> I would suspect, yours) which give <<<1% improvement on running
> times for typical programs -- but cumulatively the make a big
> difference.

Sure. But I don't implement ones which break code. Here's what customers
say when that happens:

"I compile my code without optimization and it works. I turn on
optimization and it fails. Therefore, your compiler is busted."

It can be a hard lesson to not be too much of a language lawyer when
implementing an optimizer. Not if you want people to use it. There are
enough problems with not emulating blatant bugs in other compilers (such
as other compilers not implementing two level lookup properly).


> By definition, it is incorrect code. It violates language rules.

If you manufacture a car with the brake pedal on the right and the gas
on the left, your customers will be very unhappy with you. It doesn't
matter if the small print in the back of the owner's manual says they
are reversed.

For another example, the C++ spec allows the compiler to reorder
floating point expressions. Doing this, however, is a very bad idea, as
floating point ops are not associative.
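
A concrete case (illustrative; the exact values depend on the floating
point format):

float a = 1e20f, b = -1e20f, c = 1.0f;
float x = (a + b) + c;   // 0 + 1 == 1
float y = a + (b + c);   // c is absorbed into b, so y == 0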


>> That's pretty marvy if you're a top C++ expert and can command $5000/day
>> in consulting fees, but it stinks if you're on the other side of that
>> having to write the checks. Hence the demand for languages where the
>> costs are lower, a demand that I think C++ dismisses a little too easily.
>
> I don't know that it does. Frankly, being competent to write complex
> code has about the same level of investment required whichever language
> you choose, as the complexity isn't in the language.

I don't agree. C++ is loaded with unnecessary complexity (*) that takes
years to master. Why not spend that effort learning how to be a better
programmer? Who actually understands two level lookup?

(*) The complexity is usually there for legacy reasons, but that doesn't
aid the developer of new code.


> It's harsh to call windows.h amateurish, but historically it
> has been.
> (The "amateurs" in question got very rich from very bad software.

Windows for 16 bits had some extremely difficult technical hurdles to
overcome. It's easy to say it was "very bad" from 20 years of hindsight,
but I think that is very unfair.


>> C++: int sizes variable, typedef sizes fixed
>> D: int sizes fixed, typedef sizes variable
> But in D, you have no built-in support for requesting "the fastest
> integral type please"

Yes:
std.stdint.int_fastNN_t, where NN is one of (8,16,32,64)

> or "the smallest of at least 48 bits"

Yes:
std.stdint.int_leastNN_t, where NN is one of (8,16,32,64)

> (except
> in that D doesn't support types between 32 and 64 bits, as far as
> I understand).

C++ has no standard way of saying "at least 48 bits", either.

> It's not a false choice if D has such a limited selection as I
> believe it does.

D supports more integer types than C++ does. C++ has no requirement for
a 64 bit integral type, for example.


>>>> Warnings are a good sign that there's something wrong with the language
>>>> design. BTW, I just tried this:
>>>>
>>>> int test(char c) { return c; }
>>>>
>>>> with:
>>>>
>>>> g++ -c foo.cpp -Wall
>>>>
>>>> and it compiled without error or warning. (gcc-4.1)
>>>
>>> What's "marginal" about that situation?
>>
>> It gives different answers for different signedness of char's, when c
>> has a value 0x80<=c<=0xFF. I should think if a compiler was to usefully
>> warn about these things, that example would be first on the list.
>
> For a signed 8-bit char, that never occurs, so I don't know
> what you mean.

This comes up regularly with code that reads multibyte character data.
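
For concreteness (on a typical two's-complement machine; 0xE9 is an
ordinary lead byte in Latin-1 or UTF-8 data):

int test(char c) { return c; }

int r = test('\xE9');   // r == -23 if plain char is signed,
                        // r == 233 if plain char is unsigned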

--------
Walter Bright
http://www.digitalmars.com
C, C++, D programming language compilers

--

ThosRTanner

Jan 16, 2008, 8:15:14 AM
On Jan 10, 3:07 pm, Francis Glassborow
<francis.glassbo...@btinternet.com> wrote:
> Walter Bright wrote:
> > Francis Glassborow wrote:
> >> Indeed many are surprised the first time they come across wrap around.
> >> They get even more surprised when they discover that there is no such
> >> requirement for int. And worse, there is no requirement for the
> >> implementation to tell you what it does when an int expression evaluates
> >> out of the range of supported values.
>
> > I've programmed asm on many machines, 8, 10, 16, and 32 bit. Every last
> > one of them used the same instruction for adding ints and adding
> > unsigneds. So how can one wrap and the other not?
>
> I largely agree yet both WG14 and WG21 adamantly insist that overflow of
> a signed integer type is undefined behaviour. It deeply irritates me
> because it means that this simple novice program has potential undefined
> behaviour (and avoiding it is real tough -- for all but expert programmers)
>
> #include <iostream>
>
> int main() {
>     int i, j;
>     std::cin >> i >> j;
>     std::cout << i + j << std::endl;
> }

How would you define it to behave? Given appropriate input values,
that is certainly going to behave differently on 16/32/64 bit
implementations.
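
The only fully portable spelling today routes through unsigned
arithmetic, which is defined to wrap; the conversion back to int is
implementation-defined rather than undefined (a sketch):

#include <iostream>

int main() {
    int i, j;
    std::cin >> i >> j;
    unsigned sum = static_cast<unsigned>(i) + static_cast<unsigned>(j);
    std::cout << static_cast<int>(sum) << std::endl;
}

And even that still prints different numbers for the same inputs on
16/32/64 bit implementations, which is rather the point.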

Andrei Alexandrescu (See Website For Email)

Jan 16, 2008, 2:21:32 PM
Jerry Coffin wrote:
> In article <-92dnbFuPZ_oixHa...@comcast.com>,
> wal...@digitalmars-nospamm.com says...
>> I agree that people who are used to porting between machines A and B
>> will have long since figured out how to write code that is portable from
>> A to B. My point was for people who had no experience with such ports
>> attempting to write code that is portable.
>
> I don't think changing the language is likely to have much effect on
> this. IME, most people write code that's conceptually non-portable (so
> to speak) and changing the language isn't going to affect that. Just for
> an obvious example, there's nearly no possible change in the language
> that will make code portable that has calls to Windows, UNIX, or
> whatever everywhere, depends heavily on the OS's memory management data
> structures, process and/or thread scheduling, etc.
>
> From my own experience in porting C and C++ code, I'd say probably a
> single digit percentage of the effort was devoted to issues with the
> language itself; the vast majority was devoted to things that no
> change in the language could have cured (at least short of a standard
> library at least the size of Java's).

But clearly there's a world of difference between the two situations.
The dependence on APIs etc. will render the program being ported
uncompilable, not compilable but producing different outputs.

Also, I can't understand how the benefits of static checking are so
easily overlooked for the case of integral type sizes. The argument that
a massive-sized program can be written "a l'aveugle" (blindly, i.e.,
without any means for mechanical verification) to be directly or easily
portable to whatever odd-sized architectures you might have out there is
IMHO tenuous. Reminds me a lot of old school C programmers who didn't
like new-school function declarations that check their argument number
and types.

(I indulge myself in a game sometimes: I try to write as many lines of
code as I can and bet with myself that it will compile. (Just compile,
never mind run and produce correct results.) My personal record is in
the dozens of lines of code - there's always something that I don't get
quite right. Clearly I'm not the kind of guy who can write a large
program that navigates the intricacies of integral arithmetic and stays
portable for various integral size combinations.)

On a meta-note, I wish there was more out-of-the-box-ness in this
thread. Most answers I saw come straight from within some box - either
the D box or the C++ box - and obstinately refuse to look out one iota.
If I were to believe what's being said, the C++ standard got it
absolutely right and we can all go home, as improvement over the status
quo is just impossible and inconceivable. It would be preferable to look
for the truth (e.g. a solution that is outside both boxes), not to
defend tooth and nail whatever is in the standard, dangerously
fluidifying principles in the process.

Given the circumstances (the immutability of the standard at this point
on such fundamental matters), the easiest way to assuage any cognitive
dissonance is to align belief (changeable) to reality (unchangeable) and
consequently argue for the correctness of whatever is in the spec. Would
be great for us to discount that psychological effect and pursue the
truth and the "good" instead. For my money, if this thread finds a
design that is unimplementable in either C++ or D, but is better than
both, then that's a success.


Andrei

James Dennett

Jan 16, 2008, 2:24:11 PM
Walter Bright wrote:
> James Dennett wrote:
>> There are many optimizations implemented by compilers (including,
>> I would suspect, yours) which give <<<1% improvement on running
>> times for typical programs -- but cumulatively they make a big
>> difference.
>
> Sure. But I don't implement ones which break code. Here's what customers
> say when that happens:
>
> "I compile my code without optimization and it works. I turn on
> optimization and it fails. Therefore, your compiler is busted."

You don't even implement RVO? That can be a major performance
issue. Or re-ordering of operations between sequence points?
(Less of a case there, but it's not specified by the standard,
so whatever you do can "break" some code.) Do you always force
the FPU to discard additional information that would not be
stored to memory? Users complain of that breaking code on x86
all the time.

> It can be a hard lesson to not be too much of a language lawyer when
> implementing an optimizer. Not if you want people to use it. There are
> enough problems with not emulating blatant bugs in other compilers (such
> as other compilers not implementing two level lookup properly).
>
>
>> By definition, it is incorrect code. It violates language rules.
>
> If you manufacture a car with the brake pedal on the right and the gas
> on the left, your customers will be very unhappy with you. It doesn't
> matter if the small print in the back of the owner's manual says they
> are reversed.

Proof by bad analogy is bad fraud ;)

Where there is a near-universally agreed standard, such as
for some aspects of car instrumentation, there's very strong
reason to conform. But if someone changes where I turn on
my headlights, that's fine -- I expect to have to read the
manual when I buy a car to find out how to operate all of
its features, even though I wouldn't buy one where I could
not operate its basic mechanisms from experience.

> For another example, the C++ spec allows the compiler to reorder
> floating point expressions. Doing this, however, is a very bad idea, as
> floating point ops are not associative.
>
>
>>> That's pretty marvy if you're a top C++ expert and can command $5000/day
>>> in consulting fees, but it stinks if you're on the other side of that
>>> having to write the checks. Hence the demand for languages where the
>>> costs are lower, a demand that I think C++ dismisses a little too
>>> easily.
>>
>> I don't know that it does. Frankly, being competent to write complex
>> code has about the same level of investment required whichever language
>> you choose, as the complexity isn't in the language.
>
> I don't agree. C++ is loaded with unnecessary complexity (*) that takes
> years to master. Why not spend that effort learning how to be a better
> programmer? Who actually understands two level lookup?

Many people understand two phase lookup; nobody really understood
the previous status quo well enough.

> (*) The complexity is usually there for legacy reasons, but that doesn't
> aid the developer of new code.

The learning curve for C++ is not mostly due to the baggage it
carries, though some is. Mastering Ada takes years. Mastering
D will take years, when it stabilizes enough to stop being a
moving target.

>> It's harsh to call windows.h amateurish, but historically it
>> has been.
>> (The "amateurs" in question got very rich from very bad software.
>
> Windows for 16 bits had some extremely difficult technical hurdles to
> overcome. It's easy to say it was "very bad" from 20 years of hindsight,
> but I think that is very unfair.

It was also easy to see at the time, and that's fair.

>>> C++: int sizes variable, typedef sizes fixed
>>> D: int sizes fixed, typedef sizes variable
>> But in D, you have no built-in support for requesting "the fastest
>> integral type please"
>
> Yes:
> std.stdint.int_fastNN_t, where NN is one of (8,16,32,64)

So the fastest is std.stdint.int_fast8_t then? Good to know.
Is that usually a 32-bit type on current hardware?

>> or "the smallest of at least 48 bits"
>
> Yes:
> std.stdint.int_leastNN_t, where NN is one of (8,16,32,64)

48 is not one of those. So programs have to ask for at least
64 bits, even if the hardware had an efficient 48 bit type but
no efficient 64-bit type.

>> (except
>> in that D doesn't support types between 32 and 64 bits, as far as
>> I understand).
>
> C++ has no standard way of saying "at least 48 bits", either.

C++0x does, as does C99. (It wouldn't be fair to compare an
unstandardized D to the last ISO standard for C++, which is
essentially from 1998. More reasonable to compare D from the
21st century to 21st-century C++, no?)

>> It's not a false choice if D has such a limited selection as I
>> believe it does.
>
> D supports more integer types than C++ does. C++ has no requirement for
> a 64 bit integral type, for example.

C++0x does support an integral type of at least 64 bits, as does C99,
and implementations can and do support sizes which aren't in
{8,16,32,64}. D might mandate more than C++98, but not more than
C++0x, and D disallows types which are supported by C++ compilers.

>
>>>>> Warnings are a good sign that there's something wrong with the
>>>>> language
>>>>> design. BTW, I just tried this:
>>>>>
>>>>> int test(char c) { return c; }
>>>>>
>>>>> with:
>>>>>
>>>>> g++ -c foo.cpp -Wall
>>>>>
>>>>> and it compiled without error or warning. (gcc-4.1)
>>>>
>>>> What's "marginal" about that situation?
>>>
>>> It gives different answers for different signedness of char's, when c
>>> has a value 0x80<=c<=0xFF. I should think if a compiler was to usefully
>>> warn about these things, that example would be first on the list.
>>
>> For a signed 8-bit char, that never occurs, so I don't know
>> what you mean.
>
> This comes up regularly with code that reads multibyte character data.

No; my point was not that it doesn't come up often, but that
the example you described NEVER occurs because it is mathematically
impossible. An 8-bit signed char does not have values in the
range you cite as problematic. If you can offer a complete
example showing a bug caused by this, it might be helpful.

-- James

Thomas Richter

Jan 16, 2008, 2:20:21 PM
Jerry Coffin wrote:

>>>> I would vote for C++ to explicitly ditch 16 bit support. No problem there!
>>> To accomplish what?
>> To get rid of the possibility of 16 bit 'int' types.
>
> Why? If you want something to be 32 bits, why are you using int in the
> first place? Why not use long if that's what you really want?

Cough, cough. Because it would be wrong. On the platform I'm using, a long
is 64 bit wide, and an int is 32 bit. Nothing wrong with that, from the
language perspective. Quite wrong from the user's perspective - as
demonstrated... (-;

The problem is really that, even when using an "int", you cannot be sure
that it fits your needs *unless* you are really, really careful with your
steps. It is "easy" (well, not exactly, but as far as the type system is
concerned, it is) to write a C++ compiler for a generic platform, but it
is hard to write programs that are truly portable. In almost all cases,
at least in my attempts, it means that you have to define your own types
with *known* ranges. And if the target does not support those types, you
have to invest a lot of work. In the end, what difference does it make
whether I cannot port my program because the compiler doesn't support it,
or I cannot port because the types I need aren't available.
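
The usual shape of those self-defined types, for what it's worth (a
sketch; each typedef is only right on platforms matching the assumption
it encodes):

typedef signed char  INT8;    /* assumes an 8-bit char  */
typedef short        INT16;   /* assumes a 16-bit short */
typedef int          INT32;   /* assumes a 32-bit int   */

Every port means re-auditing these few lines -- cheap, but somebody has
to know to do it.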

I understand, from the perspective of the C programming language, this
particular choice for the definition of the language. C is a low-level
language, or a high-level assembler, and is *supposed* to support any
half way reasonable platform. The type system is something you pay for,
and it makes sense for the tasks C is designed for.

But for C++? Is a low-complexity, low-speed chip like a DSP really the
"typical" target architecture? Is it worth the price?


>> int sum(int a, int b) { return a + b; }
>>
>> A compiler that nagged about this would be more of a nuisance than a help.
>
> I'm talking about doing dataflow analysis, so if it can prove that sum
> is only ever called with numbers smaller than 100, it gives no warning.
> OTOH, if it can prove that sum is called with an external input, it does
> provide such a warning.

Is this really realistic? I mean, all the simplicity of the function
aside, in any real code I wouldn't expect the compiler to be able to
prove anything about the limits of the sum.

>>>> A typical C++ compiler has a bewildering array of switches that change
>>>> its behavior.
>>> Change what behavior?
>> man g++
>>
>> will list 40 of them under "C++ Language Options".
>
> Quite a few of these simply aren't allowed by C++ at all. If they want
> to call what it compiles C++ anyway, I'm certainly not going to stop
> them, but it's clear that things like -fno-enforce-eh-specs, -fms-
> extensions and either -ffor-scope or else -fno-for-scope simply makes
> the compiler process a language that's not really standard C++ at all.

Besides, not all of them are orthogonal, i.e. usable independently of
each other. And the effects of some switches are again *completely*
unrelated, so you do not need to test all combinations either. I don't
think "counting options" tells us anything.

> [ ... ]
>
>>> I suppose that depends on your viewpoint. At least in Java, the attempt
>>> at defining the problem out of existence has created a problem that I'd
>>> say is at least twice as bad.
>> The size of an int is a bad problem in Java?
>
> Yes -- it requires that the source code be rewritten to take advantage
> of the underlying machine, and frequently hurts performance without
> accomplishing anything in return.

Portable code?

Java *does* have problems, but exactly in situations where it does not
define things strictly, for example in its GUI library. I've seen Java programs
not working on Windows even though they worked on Linux, and vice versa,
and the reason was that the programs expected some specific behavior of the
library which was left "undefined".

>>>> In other words, C++ has de facto standardized around 32 bit ints.
>>> I disagree. If anything, the prevelance of 32-bit ints has created
>>> problems, not cured them.
>> What problems?
>
> People writing non-portable code for no good reason.

I think efficient development (or, fast development) is a pretty good
reason. The compiler should help me to avoid those cases, but it simply
can't go all the way.


>>>> and use myint everywhere instead of int. To change the size, change the
>>>> typedef. Nothing is taken away from you by fixing the size of int. It
>>>> just approaches it from the opposite direction:
>>> That doesn't fix the problem.
>> Why not?
>
> It prevents me from specifying to the compiler what I really want. A lot
> of the time, I use an int in C or C++ to get something that's tailored
> to the size the machine can use best -- 16 bits would be enough, but if
> it happens to do 32- or 64-bit math on it, that's fine too. In Java (and
> apparently D) there's just no way for me to specify that type directly.
> To get it, I have to use a typedef, and modify my code to fit the
> machine at hand -- the very antithesis of portability, and all I get for
> my trouble is what a C or C++ compiler would have handled for me
> entirely automatically.

Interesting argument, but it can also be turned around. If you use --
say -- an int in Java, and the machine is 16 bits wide, a "smart
compiler" could run a data analysis and determine that a 16 bit register on this machine
would be good enough. That said, the choice of the proper representation
should be left to the compiler.

Thus, the typical C++ argument "let the compiler be smart enough" also works
the other way 'round.

>>>> Most switches of that sort are of limited utility anyway because they
>>>> screwup the abi to existing compiled libraries.
>>> What problems have you verified along that line? I've used unsigned char
>>> quite a bit without seeing such problems.
>> Well, anything that depends on CHAR_MAX and CHAR_MIN being constant
>> throughout the program, for example.
>
> Hmmm? They remain constant through the entire program -- are you
> suggesting that one would change the signedness of char in different
> parts of the program? That sounds a bit like the people who try to
> change the range produced by rand() by editing the definition of
> RAND_MAX...

Walter is probably talking about cases where you link separately compiled
modules, built with differing options, into one program? Yes, that's of
course UB, or rather, the language specs as I read them do not address
this at all.
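
A minimal sketch of that failure mode, assuming GCC's -fsigned-char and
-funsigned-char switches (the function name is made up):

// header shared by two translation units
#include <climits>

inline bool fits_in_char(int v) {
    // CHAR_MIN is -128 under -fsigned-char but 0 under -funsigned-char
    return v >= CHAR_MIN && v <= CHAR_MAX;
}

// lib.cpp, built with -fsigned-char:   fits_in_char(-1) is true
// app.cpp, built with -funsigned-char: fits_in_char(-1) is false
// The linker keeps one of the two inline copies, so which answer the
// program gives is anyone's guess -- an ODR violation in C++ terms.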

>> I challenge you to port zlib (written in C) to the SHARC. I think you'll
>> find it every bit as much work as if it were in D.
>
> This sounds pretty silly to me. First of all, to port it in D, I'd have
> to start by writing a D compiler for the SHARC, and I'd be _very_
> surprised if that wasn't more work than porting zlib.
>
> Second, the SHARC is a DSP. On a DSP, you generally care about signal
> processing types of code -- things like forward and inverse transforms
> (e.g. FFT or DCT) as well as filters (IIR and FIR). You also frequently
> do things like quantization and error-encoding. If you had zlib working
> on it, what in the world would you _do_ with it?

Exactly. What would you do with C++ on it? (-: Yes, that's provocative,
but is this DSP really a *typical* target for the C++ language, such that
it's worth making things a lot harder for the rest of the world?

It would make sense to have a stricter C++ for "non-embedded" systems that
provides more guarantees for the programmer than C++ does today. There's
no problem with relaxing those constraints for other systems, provided the
relaxations are documented.


So long,
Thomas


--

Zeljko Vrba

unread,
Jan 16, 2008, 2:25:53 PM1/16/08
to
{ Accepted as follow-up, but further discussion in this direction will
probably be off-topic for clc++m. -mod }

On 2008-01-16, Stephen Howe <sjhoweATdialD...@giganews.com> wrote:
>
> A compiler could certainly do it on the Intel platform. Just emit an INTO
>

AFAIK, the INTO instruction is invalid in 64-bit mode. And I don't see any
functional difference between INTO and a JO (jump on overflow) to some near
label that proceeds with abort() code from there.
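
For illustration, here is the check such a compiler could emit after every
signed addition, spelled portably in C++ instead of as a single JO
(checked_add is a made-up name, not anything a compiler exposes):

#include <climits>
#include <cstdlib>

int checked_add(int a, int b) {
    // equivalent to "add, then jump on overflow to the abort stub"
    if ((b > 0 && a > INT_MAX - b) ||
        (b < 0 && a < INT_MIN - b))
        std::abort();  // the JO target
    return a + b;      // now known not to overflow
}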

Bo Persson

unread,
Jan 16, 2008, 7:01:37 PM1/16/08
to

Cough, too!

Do we really want a language that can only be implemented on "typical"
architectures?!

>>> I challenge you to port zlib (written in C) to the SHARC. I think
>>> you'll find it every bit as much work as if it were in D.
>>
>> This sounds pretty silly to me. First of all, to port it in D, I'd
>> have to start by writing a D compiler for the SHARC, and I'd be
>> _very_ surprised if that wasn't more work than porting zlib.
>>
>> Second, the SHARC is a DSP. On a DSP, you generally care about
>> signal processing types of code -- things like forward and inverse
>> transforms (e.g. FFT or DCT) as well as filters (IIR and FIR). You
>> also frequently do things like quantization and error-encoding. If
>> you had zlib working on it, what in the world would you _do_ with it?
>
> Exactly. What would you do with C++ on it? (-: Yes, that's
> provocative, but is this DSP really a *typical* target for the C++
> language, such that it's worth making things a lot harder for the rest
> of the world?

Is making it easier for some systems *really* worth making it
impossible for other systems?

>
> It would make sense to have a stricter C++ for "non-embedded"
> systems that provides more guarantees for the programmer than C++
> does today. There's
> no problem with relaxing the constraints for other systems provided
> they are documented.

How is having one standard for some systems and another standard for
other systems better than UB?

Nothing stops an implementation from defining the behavior in the
first place.


Bo Persson

Andrei Alexandrescu (See Website For Email)

unread,
Jan 17, 2008, 6:11:52 AM1/17/08
to
James Dennett wrote:

> Walter Bright wrote:
>>> But in D, you have no built-in support for requesting "the fastest
>>> integral type please"
>>
>> Yes:
>> std.stdint.int_fastNN_t, where NN is one of (8,16,32,64)
>
> So the fastest is std.stdint.int_fast8_t then? Good to know.
> Is that usually a 32-bit type on current hardware?
>
>>> or "the smallest of at least 48 bits"
>>
>> Yes:
>> std.stdint.int_leastNN_t, where NN is one of (8,16,32,64)
>
> 48 is not one of those. So programs have to ask for at least
> 64 bits, even if the hardware had an efficient 48-bit type but
> no efficient 64-bit type.
>
>>> (except
>>> in that D doesn't support types between 32 and 64 bits, as far as
>>> I understand).
>>
>> C++ has no standard way of saying "at least 48 bits", either.
>
> C++0x does, as does C99. (It wouldn't be fair to compare an
> unstandardized D to the last ISO standard for C++, which is
> essentially from 1998. More reasonable to compare D from the
> 21st century to 21-st century C++, no?)

This is true but misses the point, or at least drives in a different
direction than the one I'd hoped for. D can solve the issue by just adding
a couple more typedefs or a template BoundedIntegral to the standard
library. We're talking type system design here, and the issue at hand
(in my opinion) is: what's better, a design that leaves "usual" integral
types to the discretion of the compiler and provides additional "least"
or "fixed" types, or one that confines the "usual" types to specific
sizes and also provides "least" types (and perhaps "fixed" types of
exotic sizes)?
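
A sketch of what such a BoundedIntegral could look like as a C++03-style
template -- my own fabrication, not a proposed spec, and handling only
N <= 64:

template <int N> struct BoundedIntegral {
    // default case: try the next width up until a specialization hits
    typedef typename BoundedIntegral<N + 1>::type type;
};
template <> struct BoundedIntegral<8>  { typedef signed char type; };
template <> struct BoundedIntegral<16> { typedef short type;       };
template <> struct BoundedIntegral<32> { typedef int type;         };
template <> struct BoundedIntegral<64> { typedef long long type;   };

// BoundedIntegral<48>::type is then the smallest built-in type with
// at least 48 bits -- long long on the usual platforms.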


Andrei

kwikius

unread,
Jan 17, 2008, 6:12:00 AM1/17/08
to
On 16 Jan, 19:21, "Andrei Alexandrescu (See Website For Email)"
<SeeWebsiteForEm...@erdani.org> wrote:

> For my money, if this thread finds a
> design that is unimplementable in either C++ or D,
> but is better than both, then that's a success.

Such a language has many of C++'s features, but it removes the fundamental
types inherited from C completely.

It replaces the built-in types with a virtual machine, IOW an abstract
assembly language, from which you can construct your own primitive
types. The machine should also allow metaprogramming of UDTs.

It has Concepts, but with a properly scoped mechanism for composition
(the C++ template metaprogramming mechanism is a good model).

Sources of inspiration... BCPL, C++, Java, .NET, LLVM

Any day now .... ;_0

regards
Andy Little

Walter Bright

unread,
Jan 17, 2008, 4:44:53 PM1/17/08
to
Bo Persson wrote:
> How is having one standard for some systems and another standard for
> other systems better than UB.

Good question. It's better because it places the extra work for the
unusual systems on the relatively few people who program for them, who
will have that extra work anyway, rather than placing the burden on all
programmers.

--------
Walter Bright
http://www.digitalmars.com
C, C++, D programming language compilers

--

Walter Bright

unread,
Jan 17, 2008, 4:46:21 PM1/17/08
to
Replies to both Thomas and Jerry embedded here:

Thomas Richter wrote:
> Jerry Coffin schrieb:

>>> int sum(int a, int b) { return a + b; }
>>> A compiler that nagged about this would be more of a nuisance than a help.
>> I'm talking about doing dataflow analysis, so if it can prove that sum
>> is only ever called with numbers smaller than 100, it gives no warning.
>> OTOH, if it can prove that sum is called with an external input, it does
>> provide such a warning.
>
Is this really realistic? I mean, all the simplicity of the function
aside, in any real code I wouldn't expect the compiler to be able to
prove anything about the limits of the sum.

I've done a lot of work with data flow analysis. You're right, it isn't
realistic. Only in a small minority of cases is such analysis able to
narrow the range of values a variable might take. (I once spent a lot of
time trying to automatically remove array bounds checks by being able to
prove the index was within range.)
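
For what it's worth, a hypothetical sketch of the rare easy case, where
the index range is obvious; most real indexing is nowhere near this
transparent:

int sum100(const int (&a)[100]) {
    int s = 0;
    for (int i = 0; i < 100; ++i)
        s += a[i];  // i is provably in [0, 99], so a checked
                    // implementation could elide the bounds test
    return s;
}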


> Besides, not all of them are orthogonal, i.e. usable independently of
> each other. And the effects of some switches are *completely*
> unrelated, so you do not need to test all combinations either.

You cannot test all combinations; it's combinatorially impractical. So
you take a guess at which ones are most likely to interact. But as with
all complex software, it isn't always clear that things don't actually
interact.


>> It prevents me from specifying to the compiler what I really want. A lot
>> of the time, I use an int in C or C++ to get something that's tailored
>> to the size the machine can use best -- 16 bits would be enough, but if
>> it happens to do 32- or 64-bit math on it, that's fine too. In Java (and
>> apparently D) there's just no way for me to specify that type directly.

Yes, there is. Use std.stdint.int_fast32_t

>> To get it, I have to use a typedef,

Sorry, I don't get what the problem with using a standard library
typedef is.

>> and modify my code to fit the machine at hand --

The standard library does that for you, just like using size_t and
ptrdiff_t.

>> the very antithesis of portability, and all I get for
>> my trouble is what a C or C++ compiler would have handled for me
>> entirely automatically.

Is it really so hard to use a typedef from the standard library? After
all, you (i.e. C++ programmers) regularly use size_t and ptrdiff_t.
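
For comparison, the C99 <stdint.h> spellings, which C++0x picks up as
<cstdint>, draw the same distinctions; a minimal sketch:

#include <stdint.h>

int_fast16_t counter = 0;  // at least 16 bits, whatever is fastest here
int_least32_t total = 0;   // the smallest type with at least 32 bits
int32_t exact = 0;         // exactly 32 bits, where such a type exists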


>>>>> Most switches of that sort are of limited utility anyway because they
>>>>> screwup the abi to existing compiled libraries.
>>>> What problems have you verified along that line? I've used unsigned char
>>>> quite a bit without seeing such problems.
>>> Well, anything that depends on CHAR_MAX and CHAR_MIN being constant
>>> throughout the program, for example.
>> Hmmm? They remain constant through the entire program

Not if you compile some of your code with one char sign setting and your
precompiled library was compiled with another setting.

>> -- are you
>> suggesting that one would change the signedness of char in different
>> parts of the program?

That's exactly what happens if you're using third-party precompiled
libraries. Those aren't exactly rare.


>>> I challenge you to port zlib (written in C) to the SHARC. I think you'll
>>> find it every bit as much work as if it were in D.
>> This sounds pretty silly to me. First of all, to port it in D, I'd have
>> to start by writing a D compiler for the SHARC,

That is silly, because porting an application implies the existence of
the appropriate compiler on the target. Somebody has to do the work to
write a C++ compiler for SHARC, just as somebody has to do the work to
write a D compiler for it. It's not relevant to the discussion of
porting code.


>> Second, the SHARC is a DSP. On a DSP, you generally care about signal
>> processing types of code -- things like forward and inverse transforms
>> (e.g. FFT or DCT) as well as filters (IIR and FIR). You also frequently
>> do things like quantization and error-encoding. If you had zlib working
>> on it, what in the world would you _do_ with it?

It sounds like you agree that C++ language rules don't make source code
portable. If the DSP coder is always going to be writing custom code for
that DSP that he would not need to port anywhere else, and no other C++
code could be ported to that DSP, then what is the point of having a
portable language specification? He'd be no worse off having a language
customized for that DSP.


> It would make sense to have a stricter C++ for "non-embedded" systems that
> provides more guarantees for the programmer than C++ does today. There's
> no problem with relaxing the constraints for other systems provided they
> are documented.

Yes, I agree that is sensible.

--------
Walter Bright
http://www.digitalmars.com
C, C++, D programming language compilers

--

Zeljko Vrba

unread,
Jan 17, 2008, 4:52:34 PM1/17/08
to
On 2008-01-16, Andrei Alexandrescu (See Website For Email)
<SeeWebsit...@erdani.org> wrote:
>
> On a meta-note, I wish there was more out-of-the-box-ness in this
>
I once had the idea that integral types should be defined in terms of the
natural register width of the target CPU:

char = 8 bits (exactly!)
short = half register
int = register
long = max(2*register, pointer)

I got an objection that not all CPUs have something that could be uniquely
defined as "natural register width". Maybe the current state of C and C++
integral types should be blamed on CPU designers :-)

===

An alternative solution would be to kill ALL integer types and replace them
with (s|u)bitstring<N> types (signed and unsigned variants) which support
all of the usual integer arithmetic operators and conversions. The type
would perform arithmetic on N-bit numbers, but the actual _representation_
of the type would be left unspecified (perhaps defined by the ABI).
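
A rough sketch of what the unsigned variant might look like as a library
type (the names and the uint64_t backing store are mine, not part of the
proposal, and this toy version only handles N <= 64):

#include <stdint.h>

template <unsigned N>
struct ubitstring {
    uint64_t rep;  // one possible representation; the idea leaves it open

    static uint64_t mask() {
        // guard avoids the undefined 64-bit shift when N == 64
        return N >= 64 ? ~(uint64_t)0 : ((uint64_t)1 << N) - 1;
    }
    ubitstring operator+(ubitstring rhs) const {
        ubitstring r = { (rep + rhs.rep) & mask() };  // arithmetic mod 2^N
        return r;
    }
};

// usage: ubitstring<12> a = { 4000 }, b = { 200 };
//        ubitstring<12> c = a + b;   // wraps at 2^12, giving 104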
