
What's wrong with this picture?


lon...@gmail.com

Jan 6, 2008, 2:52:55 AM
Below is a copy of the reference implementation from the "Improved min/max"
proposal for the upcoming C++0x standard library (http://www.open-std.org/
JTC1/SC22/WG21/docs/papers/2007/n2199.html). Does anyone else think that
something must be wrong with a language if this is how min/max has to
be implemented?


namespace detail
{

template <class T, bool make_const, bool make_volatile>
struct union_cv_helper;

template <class T>
struct union_cv_helper<T, false, false>
{
    typedef T type;
};

template <class T>
struct union_cv_helper<T, false, true>
{
    typedef volatile T type;
};

template <class T>
struct union_cv_helper<T, true, false>
{
    typedef const T type;
};

template <class T>
struct union_cv_helper<T, true, true>
{
    typedef const volatile T type;
};

} // detail

template <class T, class U>
struct union_cv
{
    static const bool make_const = std::tr1::is_const<T>::value ||
                                   std::tr1::is_const<U>::value;
    static const bool make_volatile = std::tr1::is_volatile<T>::value ||
                                      std::tr1::is_volatile<U>::value;
    typedef typename std::tr1::remove_cv<T>::type Tr;
    typedef typename std::tr1::remove_cv<U>::type Ur;
    typedef typename detail::union_cv_helper<Tr, make_const, make_volatile>::type type;
};

template <class T, class U>
struct promote
{
    static T t;
    static U u;
    static bool b;
    typedef typename std::tr1::remove_cv<decltype(b ? t : u)>::type type;
};

namespace detail
{

template <class T, class Tr, class U, class Ur>
struct min_max_return_helper
{
    typedef typename promote<T&, U&>::type type;
};

template <class T, class Tr, class U>
struct min_max_return_helper<T, Tr, U, Tr>
{
    typedef typename union_cv<T, U>::type& type;
};

template <class T, T t, class U, U u>
struct safe_less_equal
{
    static const bool Tsigned = std::tr1::is_signed<T>::value;
    static const bool Usigned = std::tr1::is_signed<U>::value;
    static const bool Tneg = Tsigned && t < T(0);
    static const bool Uneg = Usigned && u < U(0);
    static const bool value = Tneg == Uneg ? t <= u : Tneg;
};

template <class T>
struct int_min
{
    static const T value = std::tr1::is_signed<T>::value
        ? T(T(1) << std::numeric_limits<T>::digits)
        : T(0);
};

template <class T>
struct int_max
{
    static const T value = T(~int_min<T>::value);
};

template <class T, class U, bool = std::tr1::is_integral<T>::value &&
                                   std::tr1::is_integral<U>::value>
struct safe_compare_imp
{
    typedef typename promote<T, U>::type R;
    static const R Rmin = int_min<R>::value;
    static const R Rmax = int_max<R>::value;

    static const T Tmin = int_min<T>::value;
    static const T Tmax = int_max<T>::value;

    static const U Umin = int_min<U>::value;
    static const U Umax = int_max<U>::value;

    static const bool value = safe_less_equal<R, Rmin, T, Tmin>::value &&
                              safe_less_equal<R, Rmin, U, Umin>::value &&
                              safe_less_equal<T, Tmax, R, Rmax>::value &&
                              safe_less_equal<U, Umax, R, Rmax>::value;
};

template <class T, class U>
struct safe_compare_imp<T, U, false>
{
    static const bool value = true;
};

template <class T>
struct safe_compare_imp<T, T, true>
{
    static const bool value = true;
};

template <class T>
struct safe_compare_imp<T, T, false>
{
    static const bool value = true;
};

template <class T, class U>
struct safe_compare
{
private:
    typedef typename std::tr1::remove_cv<
        typename std::tr1::remove_reference<T>::type>::type Tr;
    typedef typename std::tr1::remove_cv<
        typename std::tr1::remove_reference<U>::type>::type Ur;
public:
    static const bool value = detail::safe_compare_imp<Tr, Ur>::value;
};

} // detail

template <class T, class U, bool = detail::safe_compare<T, U>::value>
struct min_max_return {};

template <class T, class U>
struct min_max_return<T&&, U&&, true>
{
    typedef typename promote<T&&, U&&>::type type;
};

template <class T, class U>
struct min_max_return<T&&, U&, true>
{
    typedef typename promote<T&&, U&>::type type;
};

template <class T, class U>
struct min_max_return<T&, U&&, true>
{
    typedef typename promote<T&, U&&>::type type;
};

template <class T, class U>
struct min_max_return<T&, U&, true>
{
private:
    typedef typename std::tr1::remove_cv<T>::type Tr;
    typedef typename std::tr1::remove_cv<U>::type Ur;
public:
    typedef typename detail::min_max_return_helper<T, Tr, U, Ur>::type type;
};

template <class T, class U>
inline
typename min_max_return<T&&, U&&>::type
min(T&& a, U&& b)
{
    if (b < a)
        return std::forward<U>(b);
    return std::forward<T>(a);
}

template <class T, class U, class Compare>
inline
typename min_max_return<T&&, U&&>::type
min(T&& a, U&& b, Compare comp)
{
    if (comp(b, a))
        return std::forward<U>(b);
    return std::forward<T>(a);
}

template <class T, class U>
inline
typename min_max_return<T&&, U&&>::type
max(T&& a, U&& b)
{
    if (a < b)
        return std::forward<U>(b);
    return std::forward<T>(a);
}

template <class T, class U, class Compare>
inline
typename min_max_return<T&&, U&&>::type
max(T&& a, U&& b, Compare comp)
{
    if (comp(a, b))
        return std::forward<U>(b);
    return std::forward<T>(a);
}

---
[ comp.std.c++ is moderated. To submit articles, try just posting with ]
[ your news-reader. If that fails, use mailto:std...@ncar.ucar.edu ]
[ --- Please see the FAQ before posting. --- ]
[ FAQ: http://www.comeaucomputing.com/csc/faq.html ]

Francis Glassborow

Jan 6, 2008, 11:29:11 AM
lon...@gmail.com wrote:
> Below is a copy of the reference implementation from the "Improved min/max"
> proposal for the upcoming C++0x standard library (http://www.open-std.org/
> JTC1/SC22/WG21/docs/papers/2007/n2199.html). Does anyone else think that
> something must be wrong with a language if this is how min/max has to
> be implemented?
>


min and max have a long history of being difficult to implement without
gotchas lurking. This is often a surprise to programmers who think that
all they need is something like:

template<class T>
T min(T t1, T t2){return t1 < t2 ? t1 : t2;}

It is the multitude of corner cases that causes the complexity.

For example, consider:

min(a, b) = c;

for all possible types of a, b, and c.

I actually think that a language that can ensure a diagnostic for the
above iff it is erroneous is streets ahead of most (which cannot).
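
For instance, with a naive by-value template like the one above (renamed
naive_min here purely for illustration), the assignment is diagnosed for
built-in types but silently accepted for class types; a sketch:

#include <iostream>
#include <string>

template <class T>
T naive_min(T t1, T t2) { return t1 < t2 ? t1 : t2; }

int main()
{
    int a = 1, b = 2;
    std::cout << naive_min(a, b) << '\n';   // fine, prints 1
    // naive_min(a, b) = 3;                 // error: cannot assign to an rvalue int

    std::string x = "a", y = "b", z = "c";
    naive_min(x, y) = z;                    // compiles, but only assigns to a temporary
    std::cout << x << '\n';                 // still prints "a"
}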

Pete Becker

Jan 6, 2008, 11:30:18 AM
On 2008-01-05 20:52:55 -0500, lon...@gmail.com said:

> Below is a copy of reference implementation from "Improved min/max"
> proposal for upcoming C++0x standard library (http://www.open-std.org/
> JTC1/SC22/WG21/docs/papers/2007/n2199.html). Anyone else thinks that
> something must be wrong with a language if this is how min/max has to
> be implemented?

min and max do not have to be implemented this way. The problem with
this proposal (which was rejected when it was considered) is not the
language, but the specification that the proposal tries to implement.
The current versions of min and max work just fine, and are trivial to
implement, without needing layers and layers of clever template magic.
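
(For reference, the current std::min is specified essentially as

template <class T>
const T& min(const T& a, const T& b)
{
    return b < a ? b : a;
}

and max is its mirror image.)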


--
Pete
Roundhouse Consulting, Ltd. (www.versatilecoding.com) Author of "The
Standard C++ Library Extensions: A Tutorial and Reference"
(www.petebecker.com/tr1book)

John Nagle

Jan 6, 2008, 3:19:13 PM

===================================== MODERATOR'S COMMENT:

Please trim unnecessary context when replying.


===================================== END OF MODERATOR'S COMMENT


lon...@gmail.com wrote:
> Below is a copy of the reference implementation from the "Improved min/max"
> proposal for the upcoming C++0x standard library (http://www.open-std.org/
> JTC1/SC22/WG21/docs/papers/2007/n2199.html). Does anyone else think that
> something must be wrong with a language if this is how min/max has to
> be implemented?

Oh, but it's so l33t. If it were just something like

#define min(x,y) ((x) < (y) ? (x) : (y))

then C++ programming could be outsourced to Bangalore.

C++ has succumbed to the LISP disease - trying to implement the compiler
in macros. Here's the final version of the MIT "LOOP" macro, from near
LISP's end of life.

http://www.cs.cmu.edu/afs/cs/project/ai-repository/ai/lang/lisp/code/iter/loop/mit/mit_loop.cl

LISP went downhill in much the same way.

John Nagle
Animats

Howard Hinnant

Jan 6, 2008, 3:19:11 PM
In article
<bd5b2ab2-5e9c-445e...@q77g2000hsh.googlegroups.com>,
lon...@gmail.com wrote:

> Below is a copy of the reference implementation from the "Improved min/max"
> proposal for the upcoming C++0x standard library (http://www.open-std.org/
> JTC1/SC22/WG21/docs/papers/2007/n2199.html). Does anyone else think that
> something must be wrong with a language if this is how min/max has to
> be implemented?

Ah, you'll be pleased to know that the LWG declined to accept this
proposal: too complicated.

Fwiw, it did match the flexibility of a macro-based implementation while
detecting at compile time some dangerous situations which the macro
version silently ignores.

#include <iostream>

#define min(x, y) ((y) < (x) ? (y) : (x))

int main()
{
    int i = -1;
    unsigned j = 3;
    std::cout << min(i, j) << '\n'; // outputs 3
}

The proposal refuses to compile the above, but does allow:

int i = -1;
char c = 'A';
int x = std::min(i, c); // x == -1

(unless char is unsigned and sizeof(int) == sizeof(char))

-Howard

kwikius

Jan 6, 2008, 6:35:38 PM
On 6 Jan, 07:52, lont...@gmail.com wrote:
> Below is a copy of the reference implementation from the "Improved min/max"
> proposal for the upcoming C++0x standard library (http://www.open-std.org/
> JTC1/SC22/WG21/docs/papers/2007/n2199.html). Does anyone else think that
> something must be wrong with a language if this is how min/max has to
> be implemented?

1) Though there is tr1::type_traits, there is no std:: construct to
combine the traits elegantly. You really need 'and', 'or' and 'not'
"metafunctions" that work directly on traits classes. The code looks
pretty ugly without that (see the sketch below).

2) Does min need to return a non-const reference where it could? No;
IMO it should return by value. Although it's doable, it's not the
complexity, it's the confusion. It's similar to

if (x = 1) // An error?

Now consider:

if (min(a,b) = 1) // An error?

and if this is convenient, how about abs(T) where T is unsigned? etc,
etc...

3) Many of the associated functions (e.g. promote, int_max, int_min)
are included, though they are really generic. IOW it looks more
complicated because they're all in the same source file.

4) It's unclear what happens with UDTs. In line with other std
functions, e.g. abs, it would probably be best to exclude them and let
the user define min for their own types (visible in the namespace of the
type). There is no way AFAICS to deduce the generic return type from two
UDTs that are LessThanComparable but not the same type.

Note however that for two identical UDTs it's trivial. However, allowing
two identical UDTs while not also catering for different but
LessThanComparable UDTs seems confusing.
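
A minimal sketch of the kind of combining metafunctions meant in point 1
(the names and_, or_ and not_ are made up for illustration; TR1 itself
provides no such helpers):

template <class A, class B>
struct and_ { static const bool value = A::value && B::value; };

template <class A, class B>
struct or_  { static const bool value = A::value || B::value; };

template <class A>
struct not_ { static const bool value = !A::value; };

// union_cv's flags could then read, for example:
//     static const bool make_const =
//         or_<std::tr1::is_const<T>, std::tr1::is_const<U> >::value;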

my 2p's worth

regards
Andy Little

Peter Dimov

Jan 6, 2008, 6:44:11 PM
On Jan 6, 9:52 am, lont...@gmail.com wrote:
> Below is a copy of the reference implementation from the "Improved min/max"
> proposal for the upcoming C++0x standard library (http://www.open-std.org/
> JTC1/SC22/WG21/docs/papers/2007/n2199.html). Does anyone else think that
> something must be wrong with a language if this is how min/max has to
> be implemented?


Indeed. Specifically, what's wrong with the language is that there is
no way to write a function that has a single return statement without
specifying its return type. min is simply

function min( x, y ) { return x < y? x: y; }

and while it's (or will be in C++0x) possible to somewhat approximate
the lack of explicit argument types:

template<class X, class Y> ... min( X && x, Y && y )
{
    return x < y? std::forward<X>( x ): std::forward<Y>( y );
}

the ... part is harder. It's possible that we'll be able to write min
this way:

template<class X, class Y> auto min( X && x, Y && y )
    -> decltype( x < y? std::forward<X>( x ): std::forward<Y>( y ) )
{
    return x < y? std::forward<X>( x ): std::forward<Y>( y );
}

There's still something wrong with the language though. :-)

It will be funny if we end up with

auto min = <>( x, y ) { x < y? x: y }

as the easiest way to write min. :-)
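
For what it's worth, here is a self-contained sketch of the
trailing-return-type version above; the test in main is only an
illustration, not part of any proposal:

#include <iostream>
#include <utility>

template <class X, class Y>
auto min(X&& x, Y&& y)
    -> decltype(x < y ? std::forward<X>(x) : std::forward<Y>(y))
{
    return x < y ? std::forward<X>(x) : std::forward<Y>(y);
}

int main()
{
    int a = 2, b = 3;
    std::cout << min(a, b) << '\n';    // lvalues: deduces int&, prints 2
    std::cout << min(4, 7) << '\n';    // rvalues: prints 4
    min(a, b) = 0;                     // assignment through the returned int&,
                                       // for better or worse
    std::cout << a << '\n';            // prints 0
}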

lon...@gmail.com

Jan 6, 2008, 6:44:11 PM
> min and max have a long history of being difficult to implement without
> gotchas lurking. This is often a surprise to programmers who think that
> all they need is something like: ...

In case you didn't notice, my post was not about min/max. Recently
there is more and more code like that (and much worse) in the standard
library and other "advanced" C++ projects. I have written such code
myself and I enjoyed the process just like the next guy. But unlike
the next guy, I also understand how little practical value lies in
such time-consuming, hugely overcomplicated and barely readable
piles of garbage, which we so lovingly call "clever template magic".

In a platform for testing new language and library design ideas we
could very well sacrifice clarity and brevity for other
requirements, but for a practical programming language clarity and
brevity have to remain high on the list of priorities. So next time
you see code like the "Improved min/max" above, just take a step
back, look at it again, then repeat after the great John McEnroe: "You
cannot be serious!"

lon...@gmail.com

Jan 6, 2008, 6:44:07 PM
> C++ has succumbed to the LISP disease - trying to implement the compiler
> in macros. Here's the final version of the MIT "LOOP" macro, from near
> LISP's end of life.
>
> http://www.cs.cmu.edu/afs/cs/project/ai-repository/ai/lang/lisp/code/...

>
> LISP went downhill in much the same way.
>
> John Nagle
> Animats

Thanks for the link, quite funny.
I was beginning to think I was the only one who saw this trend as a
problem.

Alberto Ganesh Barbati

Jan 6, 2008, 11:20:22 PM
lon...@gmail.com wrote:

> Below is a copy of the reference implementation from the "Improved min/max"
> proposal for the upcoming C++0x standard library (http://www.open-std.org/
> JTC1/SC22/WG21/docs/papers/2007/n2199.html). Does anyone else think that
> something must be wrong with a language if this is how min/max has to
> be implemented?

In fact, that proposal has been rejected by the LWG, according to
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n2433.html

Ganesh

Andrei Alexandrescu (See Website For Email)

Jan 7, 2008, 2:09:36 AM
lon...@gmail.com wrote:
>> min and max have a long history of being difficult to implement without
>> gotchas lurking. This is often a surprise to programmers who think that
>> all they need is something like: ...
>
> In case you didn't notice, my post was not about min/max. Recently
> there is more and more code like that (and much worse) in the standard
> library and other "advanced" C++ projects. I have written such code
> myself and I enjoyed the process just like the next guy. But unlike
> the next guy, I also understand how little practical value lies in
> such time-consuming, hugely overcomplicated and barely readable
> piles of garbage, which we so lovingly call "clever template magic".
>
> In a platform for testing new language and library design ideas we
> could very well sacrifice clarity and brevity for other
> requirements, but for a practical programming language clarity and
> brevity have to remain high on the list of priorities. So next time
> you see code like the "Improved min/max" above, just take a step
> back, look at it again, then repeat after the great John McEnroe: "You
> cannot be serious!"

I agree; this is perhaps a good example of the law of diminishing
returns at work. The min/max solution in the (rejected) proposal
painstakingly handles by hand all of the exceedingly uninteresting
corner cases and also all semantics-dependent static typing (e.g. having
min invoked against an unsigned short and a signed char yields a signed
char, unless I misunderstood the code). It's a sort of assembler of
metaprogramming that achieves the right result by appropriately handling
on a case-by-case basis all of the possible situations.

A second-order, and very interesting, point that lurks beneath this
discussion is: what do we compare the solution against? As far as I
know, no language except the nascent D 2.0 does a better job without
paying in some way, usually in terms of efficiency, expressiveness, or
both. And I think we agree that costly abstractions are easy to come
across.

So probably the right question to ask is not "what's wrong with C++?"
but instead "what's wrong with static typing technology at large?" It's
a criticism directed more at language design in the large, rather than a
criticism of C++ in particular. If the programming languages community
has not gotten to the point where the min and max functions can easily
and painlessly be expressed and typechecked, it is probably in fact a tour
de force that C++ managed to do it, even if it huffed, puffed, and blew
a gasket in the process.

The third-order, and even more interesting, point lurking here is the
forensics perspective: where did C++ make the wrong turn, and what
better type system could it have devised? And I have a little opinion on
that. It's a combination of factors, of which mishandling integral types
is a part, but the killer is IMHO the binding of an rvalue to a
reference to const. While such binding does help certain situations and
avoids code duplication, it turns out it's lethal for others because it
allows rvalues to cross back over the Styx and show themselves as valid
lvalues, claiming to have an address, when in fact they are zombies.
Most of the machinery in any improved min/max implementation is
dedicated to this one issue.
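
A small sketch of the kind of zombie in question (std::min here is
today's two-argument const& version):

#include <algorithm>

int main()
{
    int x = 5;
    const int& m = std::min(x + 1, x + 2);
    // Both arguments are rvalues, yet they bind happily to std::min's
    // const int& parameters. The returned reference refers to one of those
    // temporaries, and both die at the end of the full expression, so m is
    // left dangling; reading it below is undefined behavior.
    return m;
}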

If I could turn the time back, I'd try to suggest looking into other
means for handling rvalues and disallow rvalue binding to const
reference. It just did too much harm.

Just for the record, one possible way to handle rvalues and lvalues
without code duplication is staged type inference, which is what D 2.0
does. A complete implementation of the min function in D that achieves
all of the objective of the mentioned C++ proposal (save for always
computing the tightest type for the result) is:

typeof(b < a ? b : a)
min(A : ref T || T, B : ref U || U, T, U)(A a, B b)
{
    return b < a ? b : a;
}

The key is the "A : ref T || T" type matching. It means that during type
deduction, type A is matched against the pattern ref T (T& in C++), and
should that fail, try the second match, which always succeeds. This
staged type matching, combined with D's refusal to bind an rvalue to a
reference, is what makes it possible to implement min in a manner that's
at the same time sound, efficient, and compact. I'm sure there should be
ways to do things even better than that, but they're not that easy to
find (and then again the law of diminishing returns would apply, as it's
hard to get much more compact than that).

Backporting that to C++ would look like this:

template <class A = T& || T, class B = U& || U, class T, class U>
common_type<A, B> min(A a, B b)
{
    return b < a ? b : a;
}

which would again specify what patterns the types A and B should try and
in what sequence.


Andrei

John Nagle

Jan 8, 2008, 1:09:38 AM
Andrei Alexandrescu (See Website For Email) wrote:
> lon...@gmail.com wrote:

> So probably the right question to ask is not "what's wrong with C++?"
> but instead "what's wrong with static typing technology at large?" It's
> a criticism directed more at language design in the large, rather than a
> criticism of C++ in particular. If the programming languages community
> has not gotten to the point where the min and max functions can easily
> and painlessly be expressed and typechecked, it is probably in fact a tour
> de force that C++ managed to do it, even if it huffed, puffed, and blew
> a gasket in the process.

A long time ago, back when Ada was being designed, I wrote a short paper
titled "Type Integer Considered Harmful", in which I addressed this issue.
The fundamental problem here is that typing addresses a numeric problem
by linguistic means.

The position I took was that the basic integer numeric concept should be
the "range", the range of values a variable can contain. Pascal has the
construct

type T 1..100;

which is considered a "subrange" of the integers. I suggested that this
simply be treated as a "range", with the implementation having the
responsibility to choose a suitable machine representation, or reporting
a compile time error if the range was too big for the hardware.

Constants have a range of their value; i.e. 100 has a range 100..100,
which is simple enough. Intermediate expressions are the tough problem.

I proposed that for intermediate expressions, the compiler be required
to provide a representation large enough to prevent overflow UNLESS overflow
would occur when the value was assigned to a variable of known range. So,
for Pascal-like forms:

type onebyte = 0..255; twobytes = 0..2^16-1; fourbytes = 0..2^32-1;
var y1, y2, y3: twobytes;
    z: fourbytes;

z := y1 + y2;   (* intermediate for y1 + y2 must be at least
                   0..(2^16-1)*2 *)
y3 := y1 + y2;  (* don't have to promote intermediate to longer value,
                   just detect overflow. *)
y1 := (y1 + 1) mod (2^16-1)
                (* modular add, written explicitly. The compiler is free
                   to optimize this into a 16-bit modular add without
                   range checking. If the programmer really wants an add
                   which wraps, this is how it's done. *)

The compiler has to know how to compute the output ranges for the built-in
operators. For user-written functions, the return type of the function
is explicit, so that's determined by the programmer.

I wrote this back when there were still 36 bit machines, and part of the
point of this was that the same results are obtained on all platforms for
all programs that compile and run without raising range errors. One
could have a generic programming mechanism in which the output range of
a function was computed at compile time from the input ranges, but
I didn't propose that back in the early 1980s.

I don't see C++ going this route, but this, fundamentally, is the right
answer to numeric range issues. Because it gets the right answers.

(Pascal ran into problems with subranges because, originally, Wirth
designed Pascal for the CDC 6600, which had integer hardware for one
size, 60 bits. 60 bits was big enough that needing larger integers wasn't a
serious problem. When Pascal was brought to the 16-bit world, on machines
where a 32-bit integer add was expensive, that problem was dealt with using
various hacks, rather than rethinking this issue. C kept the hacks
but dropped the "subrange" concept. And so, here we are, three decades
later, still dealing with the problem.)

John Nagle
Animats

lon...@gmail.com

Jan 8, 2008, 9:44:42 AM

I am glad someone not only sees that something is wrong but actually
has a pretty good idea of what exactly is wrong. By the way, I have
looked at D language and I am impressed. Seems like a huge improvement
on C++ (for the most part). But until we can all switch to D [:-)], do
you think something can be done for C++ to at least mitigate this
problem?

Andrei Alexandrescu (See Website For Email)

Jan 8, 2008, 1:57:05 PM
John Nagle wrote:
> A long time ago, back when Ada was being designed, I wrote a short paper
> titled "Type Integer Considered Harmful", in which I addressed this issue.
> The fundamental problem here is that typing addresses a numeric problem
> by linguistic means.
>
> The position I took was that the basic integer numeric concept should be
> the "range", the range of values a variable can contain. Pascal has the
> construct
>
> type T 1..100;
>
> which is considered a "subrange" of the integers. I suggested that this
> simply be treated as a "range", with the implementation having the
> responsibility to choose a suitable machine representation, or reporting
> a compile time error if the range was too big for the hardware.

Interesting. I couldn't find the paper, but you did a great job at
conveying the gist of it.

There is one element that could help your case in C++ (e.g. as a library
implementation) but a few that work against:

+ The advent of auto would allow people to let the compiler take care of
computing the result type of an expression. Otherwise it would be
terrible to compute the type of e.g. "a * b + c ^ d" by hand.

- Types will proliferate - there will not be a handful of integral
types, but myriads of'em! Template code would be hard pressed to avoid
bloating. Tons of things will not typecheck - what if a library accepts
a range slightly different than yours? You'll have to insert some
explicit conversions etc.

- Magic limits for integers work similarly to magic limits for array
sizes. When somebody defines an int, they might have an inkling on what
the range could be, but most of the time it would be nothing but a
ballpark. So probably many of these limits would be either wrong or
quite large. What follows in the second case is that the first
multiplication takes you in the 64-bit range, and the second is a
compile-time error. This way of doing things just doesn't scale. So just
like instead of statically-sized arrays we prefer dynamically-sized
arrays and hope we have enough memory, similarly, instead of fixed-sized
ints we prefer "dynamically-sized" ints and hope we have enough bits
real estate in'em.

- Inefficiency. I think that with all casting and impedance mismatches
at function and module boundaries there will be a significant amount of
checking and casting back and forth.

Also, I think your scheme fails to solve many (most?) interesting cases.
Consider:

vector<string> vec;
// take the last element
enforce(vec.size() >= 1, "Vector can't be empty");
auto e = vec[vec.size() - 1];

(Could be written as vec.back() but I'm using the index to make my point.)

What's the type of vec.size() in your system? Let's say 0..MAXSIZE,
where MAXSIZE is a constant cleverly divined for the particular system,
sizeof(string) etc. What's the type of an index into the vector? Should
be 0..vec.size(), but you can't type things dynamically so you'll have
to go with 0..MAXSIZE-1. At this point we make the nasty realization
that the operation type is -1..MAXSIZE-1 so there's a need to use some
cast to patch things up.

> I don't see C++ going this route, but this, fundamentally, is the
> right answer to numeric range issues. Because it gets the right answers.

This is a strong argument. So I understand that system does a mix of
compile-time checking and runtime checking. The fact that you know
ranges during compilation helps avoid some of the run-time checks. Other
languages have chosen to fix the size of integral types so at least they
get the same answers on all platforms, but not necessarily the "right"
or at least let's say "commonly expected" answers. Yet some other
languages forewent the ranges but do dynamic overflow checking after
each operation, so they get the "right" answers but at a higher runtime
cost than your system. Did I understand it correctly?

I guess you could make the case for applying your integral typing system
to C++ by implementing it in the form of a library.
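
To make that concrete, a very rough sketch of what such a library type
might look like (the name ranged_int and every detail are made up for
illustration; only addition is shown, and C++0x static_assert/auto are
used for brevity):

#include <cstdint>
#include <iostream>
#include <stdexcept>

// A compile-time ranged integer: the legal interval is part of the type,
// and arithmetic computes the exact interval of the result.
template <std::int64_t Lo, std::int64_t Hi>
struct ranged_int
{
    static_assert(Lo <= Hi, "empty range");

    std::int64_t value;   // a real library would pick the narrowest storage

    explicit ranged_int(std::int64_t v) : value(v)
    {
        // Runtime checks happen only where a value enters a range;
        // the in-range arithmetic below needs no checks at all.
        if (v < Lo || v > Hi)
            throw std::out_of_range("value outside declared range");
    }
};

// [Lo1,Hi1] + [Lo2,Hi2] is exactly [Lo1+Lo2, Hi1+Hi2]: the result type is
// wide enough by construction, so the addition itself cannot overflow.
template <std::int64_t Lo1, std::int64_t Hi1, std::int64_t Lo2, std::int64_t Hi2>
ranged_int<Lo1 + Lo2, Hi1 + Hi2>
operator+(ranged_int<Lo1, Hi1> a, ranged_int<Lo2, Hi2> b)
{
    return ranged_int<Lo1 + Lo2, Hi1 + Hi2>(a.value + b.value);
}

int main()
{
    ranged_int<0, 255> a(200), b(100);
    auto c = a + b;                    // c has type ranged_int<0, 510>
    std::cout << c.value << '\n';      // 300; no overflow is possible here
}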


Andrei

Andrei Alexandrescu (See Website For Email)

Jan 8, 2008, 4:17:25 PM
lon...@gmail.com wrote:
> I am glad someone not only sees that something is wrong but actually
> has a pretty good idea of what exactly is wrong. By the way, I have
> looked at D language and I am impressed. Seems like a huge improvement
> on C++ (for the most part). But until we can all switch to D [:-)], do
> you think something can be done for C++ to at least mitigate this
> problem?

A while ago I discussed with Howard Hinnant the scenario in which
functions could overload on value vs. reference, e.g.:

struct A {};
void foo(A &);
void foo(const A &);
void foo(A);

would describe three distinct overloads, which would be selected based
on the "true" constness and lvalueness of the passed-in argument. Then
people would have been able to figure out things properly, and probably
min and max would have looked nicer.

Howard explained to me that that would have been a breaking change. I
forgot the actual cause, but I know the pattern - there was a subtle
complicated interaction between the suggested rule and some others. And
of course it goes without saying that at this point, breaking backwards
compatibility is a very tall order.

Rvalue references properly navigate the type system and manage to pass
around rvalue information without breaking compatibility. Of course,
adding a new type constructor is less desirable but I doubt there would
have been an acceptable alternative. Apparently, however, rvalue
references offer either a perfectly good min/max or a perfectly simple
one, just not both in the same package. But rvalue references do offer
an ok solution to swap.
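
For instance, the rvalue-reference swap is simply (essentially what the
C++0x std::swap does, minus the exception specification):

#include <utility>

template <class T>
void swap(T& a, T& b)
{
    T tmp(std::move(a));   // steal a's guts instead of copying them
    a = std::move(b);
    b = std::move(tmp);
}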

Stepanov was quite right when he said that a good language should offer
compelling generic solutions to (1) min and max, (2) swap, and (3)
finding a value in a collection through linear search. C++98 gets (IMHO
of course) 2/5 stars for the first challenge, 1/5 stars for the second,
and 4/5 for the third. (It's remarkable it had the best overall score.)
C++0x gets 3, 3, and 4 stars respectively. The only language I know that
gets all fives is D 2.0, but comparison would be unfair because it was
_designed_ observing those challenges.

So what can C++ programmers do to mitigate the problem? Probably the
best answer is to rely on libraries, which are in a sense "free". The
C++ language allows library writers to put together pretty much any kind
of delicious sausage using means that would make casual users shriek in
horror if they saw them.

I know of another successful language that has two even more distinct
user categories: LaTeX. There are users and there are package writers.
I'm probably a LaTeX "advanced beginner" and all I can do when I need to
solve a problem is to download a package that does it for me in ways
that I have no understanding of, use it, and scramble my way out of
whatever errors I get into.


Andrei

John Nagle

Jan 9, 2008, 2:16:03 AM
Andrei Alexandrescu (See Website For Email) wrote:
> John Nagle wrote:
>> A long time ago, back when Ada was being designed, I wrote a short
>> paper
>> titled "Type Integer Considered Harmful", in which I addressed this
>> issue.
>> The fundamental problem here is that typing addresses a numeric problem
>> by linguistic means.
>>
>> The position I took was that the basic integer numeric concept
>> should be
>> the "range", the range of values a variable can contain. Pascal has the
>> construct
>>
>> type T 1..100;
>>
>> which is considered a "subrange" of the integers. I suggested that this
>> simply be treated as a "range", with the implementation having the
>> responsibility to choose a suitable machine representation, or reporting
>> a compile time error if the range was too big for the hardware.
>
> Interesting. I couldn't find the paper, but you did a great job at
> conveying the gist of it.

I think it made it into the Ada discussion group documents, which
went out on microfiche, and somewhere in some obscure library cabinet
those probably still exist. Not that anybody cares at this late date.

> - Types will proliferate - there will not be a handful of integral
> types, but myriads of'em! Template code would be hard pressed to avoid
> bloating. Tons of things will not typecheck - what if a library accepts
> a range slightly different than yours? You'll have to insert some
> explicit conversions etc.

The intention is not that these are all different types. They're all
type "int" with different attributes. The space used to store them is
an implementation detail. This works for Ada, which doesn't have explicit
pass by reference. "In-Out" variables may be either passed by reference
or copied, as the implementation pleases. C/C++ have explicit pass
by reference, which exposes that implementation detail and prevents
a size conversion when necessary.

> - Magic limits for integers work similarly to magic limits for array
> sizes. When somebody defines an int, they might have an inkling on what
> the range could be, but most of the time it would be nothing but a
> ballpark. So probably many of these limits would be either wrong or
> quite large. What follows in the second case is that the first
> multiplication takes you in the 64-bit range, and the second is a
> compile-time error. This way of doing things just doesn't scale.

It's only a problem with integer expressions with more than one
operator. With one operator, the result type controls the range.
With two operators, it's an issue.

The classic headache in the 16-bit era was

int w,x,y,z; /* "int" being 16 bits */

w = (x * y) / z;

Under C rules, "x * y" is forced silently into 16 bits. This
is numerically wrong.

For expressions like

e = a*b*c*d;

you might have to write

e = long(a*b) * long(c*d)

to get the program to compile. This is annoying, but makes it clear
to the programmer that they are forcing a range limit.
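
Incidentally, the same silent truncation still bites in standard C++
today, assuming the usual 32-bit int; a small sketch:

#include <cstdint>
#include <iostream>

int main()
{
    std::int32_t x = 60000, y = 60000;
    std::int64_t bad  = x * y;                  // multiplied in 32 bits first:
                                                // overflows before widening
    std::int64_t good = std::int64_t(x) * y;    // widen an operand first
    std::cout << good << '\n';                  // 3600000000
}

The initialization of "bad" overflows in 32-bit arithmetic before the
result is widened to 64 bits, which is exactly the (x * y) / z problem
above.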

What actually killed this in Pascal, by the way, was an insistence by
Wirth that the compiler should not perform compile-time arithmetic.
So, in ISO Pascal, you couldn't write things like

const cnt = 100;
var tab: Array [0..cnt-1] of Real;

You had to write:

const cnt = 100, cntminusone=99;
var tab: Array [0..cntminusone] of Real;

That was just silly, and led to a proliferation of incompatible
Pascal dialects.

> - Inefficiency. I think that with all casting and impedance mismatches
> at function and module boundaries there will be a significant amount of
> checking and casting back and forth.

It's a problem. Although, as I point out occasionally, about 90% of
subscript and range checks can usually be optimized out. I used to
work on this sort of thing; see

http://www.animats.com/papers/verifier/verifiermanual.pdf

> Also, I think your scheme fails to solve many (most?) interesting cases.
> Consider:
>
> vector<string> vec;
> // take the last element
> enforce(vec.size() >= 1, "Vector can't be empty");
> auto e = vec[vec.size() - 1];
>
> (Could be written as vec.back() but I'm using the index to make my point.)
>
> What's the type of vec.size() in your system? Let's say 0..MAXSIZE,
> where MAXSIZE is a constant cleverly divined for the particular system,
> sizeof(string) etc. What's the type of an index into the vector? Should
> be 0..vec.size(), but you can't type things dynamically so you'll have
> to go with 0..MAXSIZE-1. At this point we make the nasty realization
> that the operation type is -1..MAXSIZE-1 so there's a need to use some
> cast to patch things up.

That's just a situation that forces a run-time check. Actually, two
run-time checks, which may be optimized into one, and maybe optimized
out entirely.

One option is to not export the "range" concept to the user level.
Programmers only write "int", "short", etc., but internally, the
compiler treats them as ranges when deciding what intermediate type
to use.

>> I don't see C++ going this route, but this, fundamentally, is the
>> right answer to numeric range issues. Because it gets the right answers.
>
> This is a strong argument. So I understand that system does a mix of
> compile-time checking and runtime checking. The fact that you know
> ranges during compilation helps avoid some of the run-time checks. Other
> languages have chosen to fix the size of integral types so at least they
> get the same answers on all platforms, but not necessarily the "right"
> or at least let's say "commonly expected" answers. Yet some other
> languages forewent the ranges but do dynamic overflow checking after
> each operation, so they get the "right" answers but at a higher runtime
> cost than your system. Did I understand it correctly?

Basically, yes. When there were machines in active use with 8,
12, 16, 18, 24, 32, 36, 48, and 60 bit integer arithmetic, and with
ones complement, twos complement, and signed magnitude representations,
this offered a way to get uniform answers. Also, for a while it looked
like hardware with integer overflow checking was becoming mainstream;
the DEC VAX line had it. Given that machine base, this made sense.
Hardware is now standardized enough that it's much less of an issue.
In fact, it may be time for C and C++ to formally drop support for
odd-sized machines. You can still buy a Unisys ClearPath server
and run OS 2200 (the 40th anniversary of that OS was April 4, 2007),
but in 2007 Unisys finally switched to running the 36-bit code
in software emulation on Intel CPUs, instead of manufacturing their
own custom CMOS processors. I think that's the end of the 36-bit machines.

Again, I don't see C++ going this way; it's far too late.
But it's worth realizing that the problem does have solutions.

My basic point to all this is that trying to deal with numeric
ranges by syntactic means is like pounding a screw. It may work,
but not very well. That's why trying to deal with ranges via
templates is so ugly.

John Nagle

Walter Bright

Jan 9, 2008, 10:27:39 AM
Andrei Alexandrescu (See Website For Email) wrote:
> I know of another successful language that has two even more distinct
> user categories: LaTeX. There are users and there are package writers.
> I'm probably a LaTeX "advanced beginner" and all I can do when I need to
> solve a problem is to download a package that does it for me in ways
> that I have no understanding of, use it, and scramble my way out of
> whatever errors I get into.

The difference with Latex, however, is that the user can simply look at
the output of Latex and see if it is right or not. This is not true of
programming language libraries, because our test suites are never good
enough.

Pete Becker

Jan 9, 2008, 12:11:57 PM

===================================== MODERATOR'S COMMENT:

Please take care when replying to ensure that this thread remains topical.


===================================== END OF MODERATOR'S COMMENT

On 2008-01-09 10:27:39 -0500, wal...@digitalmars-nospamm.com (Walter
Bright) said:

> Andrei Alexandrescu (See Website For Email) wrote:
>> I know of another successful language that has two even more distinct
>> user categories: LaTeX. There are users and there are package writers.
>> I'm probably a LaTeX "advanced beginner" and all I can do when I need
>> to solve a problem is to download a package that does it for me in ways
>> that I have no understanding of, use it, and scramble my way out of
>> whatever errors I get into.
>
> The difference with Latex, however, is that the user can simply look at
> the output of Latex and see if it is right or not. This is not true of
> programming language libraries, because our test suites are never good
> enough.
>

Our document test suites are never good enough, either. That's why so
many publications have errata pages. People, and programmers in
particular, tend to underestimate the difficulty of producing
documents that are correct, readable, and beautiful. Programming in
LaTeX is, in some ways, much harder than programming in [insert name of
your favorite programming language here]. It's poorly structured and
often poorly documented. Many writers do "download a package ... that
[they] have no understanding of" and manage to get minimally acceptable
results, but doing a good job requires study, experimentation, and
testing, just like writing solid code does.

--
Pete
Roundhouse Consulting, Ltd. (www.versatilecoding.com) Author of "The
Standard C++ Library Extensions: A Tutorial and Reference"
(www.petebecker.com/tr1book)
