That is a lie.
Compilation only makes sure that values provided at compilation time
are of the right datatype.
What happens, though, is that in the real world pretty much all
computation depends on user-provided values at runtime.  See where
we are heading?
This works at compilation time without warnings:
int m=numbermax( 2, 6 );
This too:
int a, b, m;
scanf( "%d", &a );
scanf( "%d", &b );
m=numbermax( a, b );
No compiler issues, but it will fail just as much as in Python if the
user provides "foo" and "bar" for a and b.
What do you do if you're feeling insecure and paranoid?  Just what
dynamically typed languages do:  add runtime checks.  Unit tests are
great for asserting those.
Fact is:  almost all user data from the external world comes into
programs as strings.  No type system or compiler handles this fact all
that gracefully...
I would even go further.
Types are only part of the story.  You may distinguish between integers
and floating points, fine.  But what about distinguishing between
floating points representing lengths and floating points representing
volumes?  Worse, what about distinguishing and converting between floating
points representing lengths expressed in feet and floating points
representing lengths expressed in meters?
If you start with the mindset of static type checking, you will consider
that your types are checked, and if the types at the interface of two
modules match you'll think that everything's ok.  And six months later
your Mars mission will crash.
On the other hand, with the dynamic typing mindset, you might even wrap
your values (of whatever numerical type) in a symbolic expression
mentioning the unit and perhaps other metadata, so that when the other
module receives it, it may notice (dynamically) that two values are not
of the same unit, but if compatible, it could (dynamically) convert into
the expected unit.  Mission saved!
-- 
__Pascal Bourguignon__                     http://www.informatimago.com/
In fairness, you could do this statically too, and without the consing 
required by the dynamic approach.
-- Scott
I don't deny it. My point is that it's a question of mindset.
    The trouble with that essay is that he's comparing with C++.
C++ stands alone as offering hiding without memory safety.
No language did that before C++, and no language has done it
since.
    The basic problem with C++ is that it takes C's rather lame
concept of "array=pointer" and wallpapers over it with
objects.  This never quite works.  Raw pointers keep seeping
out.  The mold always comes through the wallpaper.
    There have been better strongly typed languages.  Modula-3
was quite good, but it was from DEC's R&D operation, which was
closed down when Compaq bought DEC.
				John Nagle
The scanf() family of functions is fine for everyday use, but not
robust enough for potentially hostile inputs. atoi() had to be
replaced by strtol(), but there's a need for a higher-level function
built on strtol().
I wrote a generic commandline parser once, however it's almost
impossible to achieve something that is both usable and 100%
bulletproof.
Or simply use C++ etc. and use overloaded operators that pick the correct
algorithm...
-- 
"Avoid hyperbole at all costs, its the most destructive argument on
the planet" - Mark McIntyre in comp.lang.c
> I'd like to design a language like this. If you add a quantity in
> inches to a quantity in centimetres you get a quantity in (say)
> metres. If you multiply them together you get an area, if you divide
> them you get a dimensionless scalar. If you divide a quantity in metres
> by a quantity in seconds you get a velocity, if you try to subtract
> them you get an error.
There are several existing systems which do this.  The HP48 (and 
descendants I expect) support "units" which are essentially dimensions. 
 I don't remember if it signals errors for incoherent dimensions.  
Mathematica also has some units support, and it definitely does not 
indicate an error: "1 Inch + 1 Second" is fine.  There are probably 
lots of other systems which do similar things.
"Malcolm McLean" <malcolm...@btinternet.com> wrote in message 
news:1d6e115c-cada-46fc...@c10g2000yqh.googlegroups.com...
As you suggested in 'Varaibles with units' comp.programming Feb 16 2008? 
[Yes with that spelling...]
I have a feeling that would quickly make programming impossible (if you 
consider how many combinations of dimensions/units, and operators there 
might be).
One approach I've used is to specify a dimension (i.e. unit) only for 
constant values, which are then immediately converted (at compile time) to a 
standard unit:
a:=sin(60°)       # becomes sin(1.047... radians)
d:=6 ins          # becomes 152.4 mm
Here the standard units are radians, and mm. Every other calculation uses 
implied units.
-- 
Bartc 
On the other hand sqrt(4 inches^2) is quite well defined. The question
is whether to allow sqrt(1 inch). It means using rationals rather than
integers for unit superscripts.
(You can argue that you can get things like km^9s^-9g^3 even in a
simple units system. The difference is that these won't occur very
often in real programs, just when people are messing about with the
system, and we don't need to make messing about efficient or easy to
use).
> The problem is that if you allow expressions rather than terms then
> the expressions can get arbitrarily complex. sqrt(1 inch + 1 Second),
> for instance.
I can't imagine a context where 1 inch + 1 second would not be an 
error, so this is a slightly odd example.  Indeed I think that in 
dimensional analysis summing (or comparing) things with different 
dimensions is always an error.
> 
> On the other hand sqrt(4 inches^2) is quite well defined. The question
> is whether to allow sqrt(1 inch). It means using rationals rather than
> integers for unit superscripts.
There's a large existing body of knowledge on dimensional analysis 
(it's a very important tool for physics, for instance), and obviously 
the answer is to do whatever it does.  Raising to any power is fine, I 
think (but transcendental functions, for instance, are never fine, 
because they are equivalent to summing things with different 
dimensions, which is obvious if you think about the Taylor expansion of 
a transcendental function).
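The Taylor-series argument can be written out for the exponential (a worked illustration, not part of the original post). If x has dimension L, the n-th term has dimension L^n, so the terms can only be summed when x is dimensionless:

```latex
e^{x} = \sum_{n=0}^{\infty} \frac{x^{n}}{n!}
      = 1 + x + \frac{x^{2}}{2} + \frac{x^{3}}{6} + \cdots,
\qquad
[x] = \mathsf{L} \;\Longrightarrow\; \left[\frac{x^{n}}{n!}\right] = \mathsf{L}^{n}.
```

So e^x, sin x and similar functions only make sense for a dimensionless argument such as x = l/l_0, whereas x^n on its own is fine and simply has dimension L^n.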
--tim
It is cumbersome to do it statically, in the current Ada standard. Doing 
it by run-time checks in overloaded operators is easier, but of course 
has some run-time overhead. There are proposals to extend Ada a bit to 
make a static check of physical units ("dimensions") simpler. See 
http://www.ada-auth.org/cgi-bin/cvsweb.cgi/acs/ac-00184.txt?rev=1.3&raw=Y 
and in particular the part where Edmond Schonberg explains a suggestion 
for the GNAT Ada compiler.
> A mission failure is a failure of management. The Ariadne crash was.
Just a nit, the launcher is named "Ariane".
-- 
Niklas Holsti
Tidorum Ltd
niklas holsti tidorum fi
       .      @       .
> I'd like to design a language like this. If you add a quantity in
> inches to a quantity in centimetres you get a quantity in (say)
> metres. If you multiply them together you get an area, if you divide
> them you get a dimensionless scalar. If you divide a quantity in metres
> by a quantity in seconds you get a velocity, if you try to subtract
> them you get an error.
Done in 1992.
See 
 <http://www.cs.cmu.edu/afs/cs/project/ai-repository/ai/lang/lisp/code/syntax/measures/0.html>
 citation at <http://portal.acm.org/citation.cfm?id=150168>
and my extension to it as part of the Loom system:
   <http://www.isi.edu/isd/LOOM/documentation/loom4.0-release-notes.html#Units>
-- 
Thomas A. Russ,  USC/Information Sciences Institute
>I would say the dimensional checking is underrated. It must be
>complemented with a hard and fast rule about only using standard
>(SI) units internally.
>
>Oil output internal : m^3/sec
>Oil output printed:  kbarrels/day
"barrel" is not an SI unit.  And when speaking about oil there isn't
even a simple conversion.
42 US gallons ? 34.9723 imp gal ? 158.9873 L
[In case those marks don't render, they are meant to be the
double-tilde sign meaning "approximately equal".]
George
42 US gallons ≈ 34.9723 imp gal ≈ 158.9873 l
The post as I received it was encoded as 7-bit us-ascii, so definitely
no double-tilde. This post was encoded as utf-8.
I didn't go as far as that, but:
$ cat test.can
database input 'canal.sqlite'
for i=link 'Braunston Turn' to '.*'
	print 'It is ';i.distance into 'distance:%M';' miles (which is '+i.distance into 'distance:%K'+' km) to ';i.place2 into 'name:place'
end for i
$ canal test.can
It is 0.10 miles (which is 0.16 km) to London Road Bridge No 90
It is 0.08 miles (which is 0.13 km) to Bridge No 95
It is 0.19 miles (which is 0.30 km) to Braunston A45 Road Bridge No 91
-- 
Online waterways route planner            | http://canalplan.eu
Plan trips, see photos, check facilities  | http://canalplan.org.uk
He didn't say it was.  Internal calculations are done in SI units (in
this case, m^3/sec); on output, the internal units can be converted to
whatever is convenient.
>                              And when speaking about oil there isn't
> even a simple conversion.
>
>   42 US gallons  ?  34.9723 imp gal  ?  158.9873 L
>
> [In case those marks don't render, they are meant to be the
> double-tilde sign meaning "approximately equal".]
There are multiple different kinds of "barrels", but "barrels of oil"
are (consistently, as far as I know) defined as 42 US liquid gallons.
A US liquid gallon is, by definition, 231 cubic inches; an inch
is, by definition, 0.0254 meter.  So a barrel of oil is *exactly*
0.158987294928 m^3, and 1 m^3/sec is exactly 86.4/0.158987294928
(about 543.4397) kbarrels/day.  (Please feel free to check my math.)  That's
admittedly a lot of digits, but there's no need for approximations
(unless they're imposed by the numeric representation you're using).
-- 
Keith Thompson (The_Other_Keith) ks...@mib.org  <http://www.ghoti.net/~kst>
Nokia
"We must do something.  This is something.  Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
> Fact is:  almost all user data from the external words comes into
> programs as strings.
Sorry, sent this to the individual, not the group.
I'd be very surprised if that were true. I suspect the majority of
programs are in embedded systems, and they get their "user data"
straight from the hardware. I doubt the engine controller in your
car or the computer inside your TV passes much data around as strings.
Of course, Python probably isn't the ideal language for real-time
embedded systems (I expect an advocate will be along in a moment to
tell me I'm wrong) but it's important to remember that the desktop
isn't the be-all and end-all of computing.
For what it's worth, all the research I've seen has found that
compile-time checking and testing tend to catch different bugs, so if
correctness is really important to you then you'll want both. The
truth of the matter is that correctness is only one factor in the
equation, and cost is another. Python tilts the balance in favour of
cost -- it's really fast to develop in Python compared to some other
languages, but you lose compiler checks. That's the right balance for
a lot of applications, but not for all. If it's really critical that
the program be correct then you'll want a bondage-and-discipline
language that does masses of checks, you might even do separate static
analysis, and you'll *still* do all the dynamic testing you would have
in Python.
-- 
Tim Rowe
There are already numerous libraries that help you with this kind of 
thing in various languages; Python (you're crossposting to 
comp.lang.python), for instance, has several, such as Unum, and 
including one I've written but not yet released.  It's not clear why one 
would need this built into the language:
 >>> print si
kg m s A K cd mol
 >>> length = 3*si.in_ # underscore is needed since `in` is a keyword
 >>> print length
3.0 in_
 >>> lengthInCentimeters = length.convert(si.cm)
 >>> print lengthInCentimeters
7.62 cm
 >>> area = lengthInCentimeters*lengthInCentimeters
 >>> print area
58.0644 cm**2
 >>> biggerArea = 10.0*area
 >>> ratio = area/biggerArea
 >>> print ratio
0.1
 >>> speed = (3.0*si.m)/(1.5*si.s)
 >>> print speed
2.0 m/s
 >>> ratio - speed
Traceback (most recent call last):
   File "<stdin>", line 1, in ?
   File "unity.py", line 218, in __sub__
     converted = other.convert(self.strip())
   File "unity.py", line 151, in convert
     raise IncompatibleUnitsError, "%r and %r do not have compatible units" % (self, other)
__main__.IncompatibleUnitsError: <Quantity @ 0x-4814a834 (2.0 m/s)> and <Quantity @ 0x-4814a7d4 (1.0)> do not have compatible units
And everybody's favorite:
 >>> print ((epsilon_0*mu_0)**-0.5).simplify()
299792458.011 m/s
 >>> print c # floating point accuracy aside
299792458.0 m/s
-- 
Erik Max Francis && m...@alcyone.com && http://www.alcyone.com/max/
  San Jose, CA, USA && 37 18 N 121 57 W && AIM/Y!M/Skype erikmaxfrancis
   In Heaven all the interesting people are missing.
    -- Friedrich Nietzsche
Actually, the speed of light is exactly 299792458.0 m/s by
definition.  (Since 1983 the meter has been defined as the distance
light travels in a vacuum in 1/299792458 of a second.)
1 inch + 1 second = ~4.03e38 grams.
GORY DETAILS:
Tim Bradshaw  <t...@tfeb.org> wrote:
+---------------
| Malcolm McLean said:
| > The problem is that if you allow expressions rather than terms then
| > the expressions can get arbitrarily complex. sqrt(1 inch + 1 Second),
| > for instance.
| 
| I can't imagine a context where 1 inch + 1 second would not be an 
| error, so this is a slightly odd example.  Indeed I think that in 
| dimensional analysis summing (or comparing) things with different 
| dimensions is always an error.
+---------------
Unless you convert them to equivalent units first. For example, in
relativistic or cosmological physics, one often uses a units basis
wherein (almost) everything is scaled to "1":
http://en.wikipedia.org/wiki/Natural_units
When you set c = 1, then:
    Einstein's equation E = mc^2 can be rewritten in Planck units as E = m.
    This equation means "The rest-energy of a particle, measured in Planck
    units of energy, equals the rest-mass of a particle, measured in
    Planck units of mass."
See also:
    http://en.wikipedia.org/wiki/Planck_units
    ...
    The constants that Planck units, by definition, normalize to 1 are the:
    * Gravitational constant, G;
    * Reduced Planck constant, h-bar;  [h/(2*pi)]
    * Speed of light in a vacuum, c;
    * Coulomb constant, 1/(4*pi*epsilon_0) (sometimes k_e or k);
    * Boltzmann's constant, k_B (sometimes k).
This sometimes leads people to do things that would appear sloppy
or even flat-out wrong in MKS or CGS units, such as expressing mass
in terms of length:
    Consider the equation A=1e10 in Planck units. If A represents a
    length, then the equation means A=1.6e-25 meters. If A represents
    a mass, then the equation means A=220 kilograms. ...
    In fact, natural units are especially useful when this ambiguity
    is *deliberate*: For example, in special relativity space and time
    are so closely related that it can be useful to not specify whether
    a variable represents a distance or a time.
So it is that we find that the mass of the Sun is 1.48 km or 4.93 us, see:
http://en.wikipedia.org/wiki/Solar_mass#Related_units
In this limited sense, then, one could convert both 1 inch and 1 second
to masses[1], and *then* add them, hence:
1 inch + 1 second = ~4.03e38 grams.
;-} ;-}
-Rob
[1] 1 inch is "only" ~3.41e28 g, whereas 1 second is ~4.03e38 g,
    so the latter completely dominates in the sum.
-----
Rob Warnock			<rp...@rpw3.org>
627 26th Avenue			<URL:http://rpw3.org/>
San Mateo, CA 94403		(650)572-2607
Sounds just like Frink:
http://futureboy.us/frinkdocs/
http://en.wikipedia.org/wiki/Frink
Cheers,
Chris
--
http://blog.rebertia.com
>George Neuner <gneu...@comcast.net> writes:
>> On 28 Sep 2010 12:42:40 GMT, Albert van der Horst
>> <alb...@spenarnc.xs4all.nl> wrote:
>>>I would say the dimensional checking is underrated. It must be
>>>complemented with a hard and fast rule about only using standard
>>>(SI) units internally.
>>>
>>>Oil output internal : m^3/sec
>>>Oil output printed:  kbarrels/day
>>
>> "barrel" is not an SI unit.
>
>He didn't say it was.  Internal calculations are done in SI units (in
>this case, m^3/sec); on output, the internal units can be converted to
>whatever is convenient.
That's true.  But it is a situation where the conversion to SI units
loses precision and therefore probably shouldn't be done.
>
>>                              And when speaking about oil there isn't
>> even a simple conversion.
>>
>>   42 US gallons  ?  34.9723 imp gal  ?  158.9873 L
>>
>> [In case those marks don't render, they are meant to be the
>> double-tilde sign meaning "approximately equal".]
>
>There are multiple different kinds of "barrels", but "barrels of oil"
>are (consistently, as far as I know) defined as 42 US liquid gallons.
>A US liquid gallon is, by definition, 231 cubic inches; an inch
>is, by definition, 0.0254 meter.  So a barrel of oil is *exactly*
>0.158987294928 m^3, and 1 m^3/sec is exactly 86.4/0.158987294928
>(about 543.4397) kbarrels/day.  (Please feel free to check my math.)  That's
>admittedly a lot of digits, but there's no need for approximations
>(unless they're imposed by the numeric representation you're using).
I don't care to check it ... the fact that the SI unit involves 12
decimal places whereas the imperial unit involves 3 tells me the
conversion probably shouldn't be done in a program that wants
accuracy.
George
I know.  Hence why I wrote the comment "floating point accuracy aside" 
when printing it.
-- 
Erik Max Francis && m...@alcyone.com && http://www.alcyone.com/max/
  San Jose, CA, USA && 37 18 N 121 57 W && AIM/Y!M/Skype erikmaxfrancis
   If the past sits in judgment on the present, the future will be lost.
    -- Winston Churchill
A currently developed language with units is curl: see
http://developers.curl.com/userdocs/docs/en/dguide/quantities-basic.html
Frink's most recent version is only 17 days old. (You seemed to imply
Frink isn't under active development.)
Cheers,
Chris
>>  >>> print c # floating point accuracy aside
>> 299792458.0 m/s
> 
> Actually, the speed of light is exactly 299792458.0 m/s by
> definition.  
Yes, but just in vacuum.
Greetings,
Torsten
-- 
http://www.dddbl.de - a database layer that abstracts the work with 8 
different database systems,
separates queries from applications, and can automatically evaluate 
the query results.
Because perhaps you're thinking that oil is sent over the oceans, and
sold retail in barrels of 42 gallons?
Actually, when I buy oil, it's from a pump that's graduated in liters!
It comes from trucks with cisterns containing 24 m³.
And these trucks get it from reservoirs of 23,850 m³.
"Tankers move approximately 2,000,000,000 metric tons" says the English
Wikipedia page...
Now perhaps it all depends on whether you buy your oil from Total or
from Texaco, but in my opinion, you're forgetting something: the last
drop.  You never get exactly 42 gallons of oil, there's always a little
drop more or less, so what you get is perhaps 158.987 liter or
41.9999221 US gallons, or even 158.98 liter = 41.9980729 US gallons,
where you need more significant digits.
-- 
__Pascal Bourguignon__                     http://www.informatimago.com/
And even that pales in comparison to the expansion and contraction of 
petroleum products with temperature. Compensation to standard temp is 
required in some jurisdictions but not in others...
Ok.  I took the comment to be an indication that the figure was
subject to floating point accuracy concerns; in fact you meant just
the opposite.
> On Tue, 28 Sep 2010 12:15:07 -0700, Keith Thompson <ks...@mib.org>
> wrote:
> >He didn't say it was.  Internal calculations are done in SI units (in
> >this case, m^3/sec); on output, the internal units can be converted to
> >whatever is convenient.
> 
> That's true.  But it is a situation where the conversion to SI units
> loses precision and therefore probably shouldn't be done.
I suppose that one has to choose between two fundamental designs for any
computational system of units.  One can either store the results
internally in a canonical form, which generally means an internal
representation in SI units.  Then all calculations are performed using
the internal units representation and conversion happens only on input or
output.
Or one can store the values in their original input form, and perform
conversions on the fly during calculations.  For calculations one will
still need to have some canonical representation for cases where the
result value doesn't have a preferred unit provided.  For internal
calculations this will often be the case.
Now whether one will necessarily have a loss of precision depends on
whether the conversion factors are exact or approximations.  As long as
the factors are exact, one can have the internal representation be exact
as well.  One method would be to use something like the Common Lisp
rational numbers or the GNU MP library.
And a representation where one preserves the "preferred" unit for
display purposes based on the original data as entered is also nice.
Roman Cunis' Common Lisp library does that, and with the use of rational
numbers for storing values and conversion factors allows one to do nice
things like make sure that
30mph * 3h = 90mi
even when the internal representation is in SI units (m/s, s, m).
No.  I'm just reacting to the "significant figures" issue.   Real
world issues like US vs Eurozone and measurement error aside - and
without implying anyone here - many people seem to forget that
multiplying significant figures doesn't add them, and results to 12
decimal places are not necessarily any more accurate than results to 2
decimal places.
It makes sense to break macro barrel into micro units only when
necessary.  When a refinery purchases 500,000 barrels, it is charged a
barrel price, not some multiple of gallon or liter price and
regardless of drop over/under.  The refinery's process is continuous
and it needs a delivery if it has less than 20,000 barrels - so the
current reserve figure of 174,092 barrels is as accurate as is needed
(they need to order by tomorrow because delivery will take 10 days).
OTOH, because the refinery sells product to commercial vendors of
gasoline/petrol and heating oil in gallons or liters, it does make
sense to track inventory and sales in (large multiples of) those
units.
Similarly, converting everything to m³ simply because you can does not
make sense.  When talking about the natural gas reserve of the United
States, the figures are given in Km³ - a few thousand m³ either way is
irrelevant.
George
I disagree with your conclusion.  Sure, the data was textual when it
was initially read by the program, but that should only be relevant to
the input processing code.  The data is likely converted to some
internal representation immediately after it is read and validated,
and in a sanely-designed program, it maintains this representation
throughout its lifetime.  If the structure of some data needs to
change during development, the compiler of a statically-typed language
will automatically tell you about any client code that was not updated
to account for the change.  Dynamically typed languages do not provide
this assurance.
This is a red herring.  You don't have to invoke run-time input to 
demonstrate bugs in a statically typed language that are not caught by 
the compiler.  For example:
[ron@mighty:~]$ cat foo.c
#include <stdio.h>
int maximum(int a, int b) {
  return (a > b ? a : b);
}
int foo(int x) { return 9223372036854775807+x; }
int main () {
  printf("%d\n", maximum(foo(1), 1));
  return 0;
}
[ron@mighty:~]$ gcc -Wall foo.c
[ron@mighty:~]$ ./a.out 
1
Even simple arithmetic is Turing-complete, so catching all type-related 
errors at compile time would entail solving the halting problem.
rg
In short, static typing doesn't solve all conceivable problems.
We are all aware that there is no perfect software development process
or tool set.  I'm interested in minimizing the number of problems I
run into during development, and the number of bugs that are in the
finished product.  My opinion is that static typed languages are
better at this for large projects, for the reasons I stated in my
previous post.
More specifically, the claim made above:
> in C I can have a function maximum(int a, int b) that will always
> work. Never blow up, and never give an invalid answer.
is false.  And it is not necessary to invoke the vagaries of run-time 
input to demonstrate that it is false.
> We are all aware that there is no perfect software development process
> or tool set.  I'm interested in minimizing the number of problems I
> run into during development, and the number of bugs that are in the
> finished product.  My opinion is that static typed languages are
> better at this for large projects, for the reasons I stated in my
> previous post.
More power to you. What are you doing here on cll then?
rg
But the above maximum() function does exactly that.  The program's
behavior happens to be undefined or implementation-defined for reasons
unrelated to the maximum() function.
Depending on the range of type int on the given system, either the
behavior of the addition in foo() is undefined (because it overflows),
or the implicit conversion of the result to int either yields an
implementation-defined result or (in C99) raises an
implementation-defined signal; the latter can lead to undefined
behavior.
Since 9223372036854775807 is 2**63-1, what *typically* happens is that
the addition yields the value 0, but the C language doesn't require that
particular result.  You then call maximum with arguments 0 and 1, and
it quite correctly returns 1.
>> We are all aware that there is no perfect software development process
>> or tool set.  I'm interested in minimizing the number of problems I
>> run into during development, and the number of bugs that are in the
>> finished product.  My opinion is that static typed languages are
>> better at this for large projects, for the reasons I stated in my
>> previous post.
>
> More power to you.  What are you doing here on cll then?
This thread is cross-posted to several newsgroups, including
comp.lang.c.
> In short, static typing doesn't solve all conceivable problems.
>
> We are all aware that there is no perfect software development process
> or tool set.  I'm interested in minimizing the number of problems I
> run into during development, and the number of bugs that are in the
> finished product.  My opinion is that static typed languages are
> better at this for large projects, for the reasons I stated in my
> previous post.
Our experience is that a garbage collector and native bignums are much
more important to minimize the number of problems we run into during
development and the number of bugs that are in the finished products.
This all hinges on what you consider to be "a function maximum(int a, 
int b) that ... always work[s] ... [and] never give[s] an invalid 
answer."  But if you don't consider an incorrect answer (according to 
the rules of arithmetic) to be an invalid answer then the claim becomes 
vacuous.  You could simply ignore the arguments and return 0, and that 
would meet the criteria.
If you try to refine this claim so that it is both correct and 
non-vacuous you will find that static typing does not do nearly as much 
for you as most of its adherents think it does.
> >> We are all aware that there is no perfect software development process
> >> or tool set.  I'm interested in minimizing the number of problems I
> >> run into during development, and the number of bugs that are in the
> >> finished product.  My opinion is that static typed languages are
> >> better at this for large projects, for the reasons I stated in my
> >> previous post.
> >
> > More power to you.  What are you doing here on cll then?
> 
> This thread is cross-posted to several newsgroups, including
> comp.lang.c.
Ah, so it is. My bad.
rg
This thread is massively cross-posted.
int maximum(int a, int b) { return a > b ? a : b; }
>           But if you don't consider an incorrect answer (according to 
> the rules of arithmetic) to be an invalid answer then the claim becomes 
> vacuous.  You could simply ignore the arguments and return 0, and that 
> would meet the criteria.
I don't believe it's possible in any language to write a maximum()
function that returns a correct result *when given incorrect argument
values*.
The program (assuming a typical implementation) calls maximum() with
arguments 0 and 1.  maximum() returns 1.  It works.  The problem
is elsewhere in the program.
(And on a hypothetical system with INT_MAX >= 9223372036854775808,
the program's entire behavior is well defined and mathematically
correct.  C requires INT_MAX >= 32767; it can be as large as the
implementation chooses.  In practice, the largest value I've ever
seen for INT_MAX is 9223372036854775807.)
> If you try to refine this claim so that it is both correct and 
> non-vacuous you will find that static typing does not do nearly as much 
> for you as most of its adherents think it does.
Speaking only for myself, I've never claimed that static typing solves
all conceivable problems.  My point is only about this specific example
of a maximum() function.
[...]
OK.  You finished your post with a reference to the halting problem,
which does not help to bolster any practical argument.  That is why I
summarized your post in the manner I did.
I agree that static typed languages do not prevent these types of
overflow errors.
That the problem is "elsewhere in the program" ought to be small 
comfort.  But very well, try this instead:
[ron@mighty:~]$ cat foo.c
#include <stdio.h>
int maximum(int a, int b) { return a > b ? a : b; }
int main() {
  long x = 8589934592;
  printf("Max of %ld and 1 is %d\n", x, maximum(x,1));
  return 0;
}
[ron@mighty:~]$ gcc -Wall foo.c 
[ron@mighty:~]$ ./a.out 
Max of 8589934592 and 1 is 1
It is, perhaps, but it's also an important technical point:  You CAN write
correct code for such a thing.
> int maximum(int a, int b) { return a > b ? a : b; }
> int main() {
>   long x = 8589934592;
>   printf("Max of %ld and 1 is %d\n", x, maximum(x,1));
You invoked implementation-defined behavior here by calling maximum() with
a value which was outside the range.  The defined behavior is that the
arguments are converted to the given type, namely int.  The conversion
is implementation-defined and could include yielding an implementation-defined
signal which aborts execution.
Again, the maximum() function is 100% correct -- your call of it is incorrect.
You didn't pass it the right sort of data.  That's your problem.
(And no, the lack of a diagnostic doesn't necessarily prove anything; see
the gcc documentation for details of what it does when converting an out
of range value into a signed type, it may well have done exactly what it
is defined to do.)
-s
-- 
Copyright 2010, all wrongs reversed.  Peter Seebach / usenet...@seebs.net
http://www.seebs.net/log/ <-- lawsuits, religion, and funny pictures
http://en.wikipedia.org/wiki/Fair_Game_(Scientology) <-- get educated!
I am not speaking for my employer, although they do rent some of my opinions.
I don't claim that it's comforting, merely that it's true.
>           But very well, try this instead:
>
> [ron@mighty:~]$ cat foo.c
> #include <stdio.h>
>
> int maximum(int a, int b) { return a > b ? a : b; }
>
> int main() {
>   long x = 8589934592;
>   printf("Max of %ld and 1 is %d\n", x, maximum(x,1));
>   return 0;
> }
> [ron@mighty:~]$ gcc -Wall foo.c 
> [ron@mighty:~]$ ./a.out 
> Max of 8589934592 and 1 is 1
That exhibits a very similar problem.
8589934592 is 2**33.
Given the output you got, I presume your system has 32-bit int and
64-bit long.  The call maximum(x, 1) implicitly converts the long
value 8589934592 to int.  The result is implementation-defined,
but typically 0.  So maximum() is called with arguments of 0 and 1,
as you could see by adding a printf call to maximum().
Even here, maximum() did exactly what was asked of it.
I'll grant you that having a conversion from a larger type to a smaller
type quietly discard high-order bits is unfriendly.  But it matches the
behavior of most CPUs.
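To see the "discard high-order bits" behavior in a form C actually defines, use unsigned types: conversion to an unsigned type is reduction modulo 2^N, so nothing below is implementation-defined. A minimal sketch; low32() is a hypothetical helper, and uint32_t is assumed to exist (it does on common platforms).

```c
#include <stdint.h>

/* Hypothetical helper: keep only the low 32 bits of a 64-bit value.
 * C defines conversion to an unsigned type as reduction modulo 2^32,
 * so this result is portable wherever uint32_t exists. */
uint32_t low32(uint64_t v)
{
    return (uint32_t)v;
}
```

low32(8589934592) yields 0, since 8589934592 = 2^33 is a multiple of 2^32; that matches the "typically 0" above for the signed case.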
Here's another example:
#include <stdio.h>
int maximum(int a, int b) { return a > b ? a : b; }
int main(void) {
  double x = 1.8;
  printf("Max of %f and 1 is %d\n", x, maximum(x, 1));
  return 0;
}
Output:
Max of 1.800000 and 1 is 1
Note that the mistake can be diagnosed:
lint /tmp/u.c -m64 -errchk=all
(7) warning: passing 64-bit integer arg, expecting 32-bit integer: 
maximum(arg 1)
-- 
Ian Collins
Of course.  Computers always do only exactly what you ask of them.  On 
this view there is, by definition, no such thing as a bug, only 
specifications that don't correspond to one's intentions.  
Unfortunately, correspondence to intentions is the thing that actually 
matters when writing code.
> I'll grant you that having a conversion from a larger type to a smaller
> type quietly discard high-order bits is unfriendly.
"Unfriendly" is not the adjective that I would choose to describe this 
behavior.
There is a whole hierarchy of this sort of "unfriendly" behavior, some 
of which can be caught at compile time using a corresponding hierarchy 
of ever more sophisticated tools.  But sooner or later if you are using 
Turing-complete operations you will encounter the halting problem, at 
which point your compile-time tools will fail.  (cf. the Collatz
problem)
I'm not saying one should not use compile-time tools, only that one 
should not rely on them.  "Compiling without errors" is not -- and 
cannot ever be -- a synonym for "bug-free."
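The Collatz reference can be made concrete. The loop below reaches 1 for every starting value anyone has tried, but nobody has proved that it always does, so no static tool can currently certify that collatz_steps() terminates for all inputs. A short sketch; the function name is hypothetical, and overflow of 3n+1 for enormous n is ignored.

```c
#include <stdint.h>

/* Count the steps for n (precondition: n >= 1) to reach 1 under the
 * Collatz map: n -> n/2 if n is even, n -> 3n+1 if n is odd.
 * Whether this loop halts for every n is an open problem.
 * Note: 3n+1 can overflow uint64_t for huge n; this sketch ignores that. */
unsigned collatz_steps(uint64_t n)
{
    unsigned steps = 0;
    while (n != 1) {
        n = (n % 2 == 0) ? n / 2 : 3 * n + 1;
        steps++;
    }
    return steps;
}
```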
rg
f00f.
That said... I think you're missing Keith's point.
> Unfortunately, correspondence to intentions is the thing that actually 
> matters when writing code.
Yes.  Nonetheless, the maximum() function does exactly what it is intended
to do *with the inputs it receives*.  The failure is outside the function;
it did the right thing with the data actually passed to it, the problem
was a user misunderstanding as to what data were being passed to it.
So there's a bug -- there's code which does not do what it was intended
to do.  However, that bug is in the caller, not in the maximum()
function.
This is an important distinction -- it means we can write a function
which performs that function reliably.  Now we just need to figure out
how to call it with valid data... :)
Which is why we all have run-time tools called unit tests, don't we?
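Such a unit test might look like this, using plain assert() and probing the edges of int's range. test_maximum() is a hypothetical name. Note what it cannot do: every call already passes an int, so it would never see the out-of-range long from RG's example.

```c
#include <assert.h>
#include <limits.h>

int maximum(int a, int b) { return a > b ? a : b; }

/* Hypothetical unit test: ordinary cases, a tie, and int's extremes. */
void test_maximum(void)
{
    assert(maximum(1, 2) == 2);
    assert(maximum(2, 1) == 2);
    assert(maximum(-3, -3) == -3);
    assert(maximum(INT_MIN, INT_MAX) == INT_MAX);
    assert(maximum(INT_MAX, INT_MIN) == INT_MAX);
    assert(maximum(INT_MIN, 0) == 0);
}
```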
-- 
Ian Collins
That argument can be made for dynamic languages as well. If you write in a
dynamic language (e.g. Python):
def maximum(a, b):
    return a if a > b else b
The dynamic language's version of maximum() function is 100% correct --
if you passed an uncomparable object, instead of a number, your call of
it is incorrect; you just didn't pass the right sort of data. And that's
your problem as a caller.
In fact, since Python's integers have infinite precision (bounded only by
available memory), in practice Python's version of maximum() is less
likely to produce an erroneous result.
The /most/ correct version of maximum() function is probably one written
in Haskell as:
maximum :: Integer -> Integer -> Integer
maximum a b = if a > b then a else b
Integer in Haskell has infinite precision (like Python's int, bounded only
by memory), but Haskell also has static type checking, so you
can't pass just any arbitrary object.
But even then, it's still not 100% correct. If you pass really large
values that exhaust the memory, maximum() could still produce an
unwanted result.
The second problem is that Haskell also has Int, the bounded integer, and if
you have a calculation in Int that overflowed in some previous calculation,
then you can still get an incorrect result. In practice, a
type-agnostic language with *mandatory* infinite precision arithmetic
wins in terms of correctness. Any language which only has optional
infinite precision arithmetic can always produce erroneous results.
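C has the same hazard, made worse by the fact that signed overflow is undefined behavior rather than a defined wraparound, so the usual defense is to test before the operation. A sketch under that approach; checked_add() is a hypothetical helper, not a standard function.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: store a + b in *sum and return true, or return
 * false (leaving *sum untouched) if the result would not fit in int.
 * The range test happens before adding, because signed overflow is
 * undefined behavior in C. */
bool checked_add(int a, int b, int *sum)
{
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return false;
    *sum = a + b;
    return true;
}
```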
Anyone can dream of a 100% correct program; but anyone who believes they
can write a 100% correct program is just a dreamer. In reality, we don't
usually need a 100% correct program; we just need a program that runs
correctly often enough that the 0.0000001% chance of producing an
erroneous result becomes irrelevant.
In summary, in this particular case with maximum() function, static
checking does not help in producing the most correct code; if you need
to ensure the highest correctness, you must use a language with
*mandatory* infinite precision integers.
Of course there's such a thing as a bug.
This version of maximum:
    int maximum(int a, int b) {
        return a > b ? a : a;
    }
has a bug. This version:
    int maximum(int a, int b) {
        return a > b ? a : b;
    }
I would argue, does not.  The fact that it might be included in a
buggy program does not mean that it is itself buggy.
[...]
> I'm not saying one should not use compile-time tools, only that one 
> should not rely on them.  "Compiling without errors" is not -- and 
> cannot ever be -- a synonym for "bug-free."
Agreed.  (Though C does make it notoriously easy to sneak buggy code
past the compiler.)
"in C I can have a function maximum(int a, int b) that will always
work. Never blow up, and never give an invalid answer. "
Dynamic typed languages like Python fail in this case on "Never blows
up".
> > I'm not saying one should not use compile-time tools, only that one 
> > should not rely on them.  "Compiling without errors" is not -- and 
> > cannot ever be -- a synonym for "bug-free."
> 
> Agreed.  (Though C does make it notoriously easy to sneak buggy code
> past the compiler.)
Let's just leave it at that then.
rg
> On 2010-09-30, RG <rNOS...@flownet.com> wrote:
> > Of course.  Computers always do only exactly what you ask of them.  On 
> > this view there is, by definition, no such thing as a bug, only 
> > specifications that don't correspond to one's intentions.  
> 
> f00f.
> 
> That said... I think you're missing Keith's point.
> 
> > Unfortunately, correspondence to intentions is the thing that actually 
> > matters when writing code.
> 
> Yes.  Nonetheless, the maximum() function does exactly what it is intended
> to do *with the inputs it receives*.  The failure is outside the function;
> it did the right thing with the data actually passed to it, the problem
> was a user misunderstanding as to what data were being passed to it.
> 
> So there's a bug -- there's code which does not do what it was intended
> to do.  However, that bug is in the caller, not in the maximum()
> function.
> 
> This is an important distinction -- it means we can write a function
> which performs that function reliably.  Now we just need to figure out
> how to call it with valid data... :)
We lost some important context somewhere along the line:
> > > in C I can have a function maximum(int a, int b) that will always
> > > work. Never blow up, and never give an invalid answer. If someone
> > > tries to call it incorrectly it is a compile error.
Please take note of the second sentence.
One way or another, this claim is plainly false.  The point I was trying 
to make is not so much that the claim is false (someone else was already 
doing that), but that it can be demonstrated to be false without having 
to rely on any run-time input.
rg
But you have to know a lot about the language to know that there's a
problem.  You cannot sensibly test your max function on every
combination of (even int) input which it's designed for (and, of course,
it works for those).
-- 
Online waterways route planner            | http://canalplan.eu
Plan trips, see photos, check facilities  | http://canalplan.org.uk
Or using the new suffix return syntax in C++0x. Something like
template <typename T0, typename T1>
auto maximum(T0 a, T1 b) -> decltype(a > b ? a : b) { return a > b ? a : b; }
Where the return type is deduced at compile time.
-- 
Ian Collins
The second sentence is not disproved by a cast from one datatype to
another (which changes the value) that happens before maximum() is
called.
int maximum(int a, int b);
    int foo() {
      int (*barf)() = maximum;
      return barf(3);
    }
This compiles fine for me.  Where is the cast?  Where is the error message?
Are you saying barf(3) doesn't call maximum?
Indeed.  This is generic programming.   And it happens that in Lisp (and
I assume in languages such as Python), since types are not checked at
compilation time, all the functions you write are always generic
functions.
In particular, the property "arguments are not comparable" is not
something that can be determined at compilation time, since the program
may add a compare method for the given argument at run-time (if the
comparison operator used is a generic function).
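C has no generic functions in this sense, but a rough analogue is to choose the comparison at run time and pass it in, the way the standard qsort() takes a comparator. A sketch of that analogue; max_generic() and cmp_int() are hypothetical names.

```c
/* Comparison callback in the style of qsort(): negative, zero, or
 * positive as *a is less than, equal to, or greater than *b. */
typedef int (*compare_fn)(const void *a, const void *b);

/* Hypothetical generic maximum: the comparison is supplied by the
 * caller at run time, not fixed when max_generic is compiled.
 * On a tie it returns b. */
const void *max_generic(const void *a, const void *b, compare_fn cmp)
{
    return cmp(a, b) > 0 ? a : b;
}

/* Comparator for ints; avoids the overflow of returning x - y. */
int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}
```

The price is visible in the signature: once everything goes through void *, the compiler no longer checks that the comparator matches the data it is handed.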
You can't have it both ways.  Either I am calling it incorrectly, in 
which case I should get a compiler error, or I am calling it correctly, 
and I should get the right answer.  That I got neither does in fact 
falsify the claim.  The only way out of this is to say that 
maximum(8589934592, 1) returning 1 is in fact "correct", in which case 
we'll just have to agree to disagree.
rg
With Tiny C on my system, your code does not cause maximum to give an
incorrect value, or to blow up:
#include <stdio.h>
int maximum(int a, int b)
{
  printf("entering maximum %d %d\n",a,b);
  if ( a > b )
    return a;
  else
    return b;
}
int foo()
{
   int (*barf)() = maximum;
   return barf(3);
}
int main (int argc, char *argv[])
{
  printf("maximum is %d\n",foo());
}
------------- output -----------------------------------
entering maximum 3 4198400
maximum is 4198400
How do you define "Never blows up"?
Personally, I'd consider maximum(8589934592, 1) returning 1 as a blow
up, and of the worst kind since it passes silently.
I think we have to agree to disagree, because I don't see the lack of
a compiler error at step 2 as a problem with the maximum() function.
They don't "blow up". They may throw an exception, on which you can act. 
You make it sound like a core dump, which it isn't.
Pascal
-- 
My website: http://p-cos.net
Common Lisp Document Repository: http://cdr.eurolisp.org
Closer to MOP & ContextL: http://common-lisp.net/project/closer/
Try a language with stricter type checking:
CC /tmp/u.c
"/tmp/u.c", line 7: Error: Cannot use int(*)(int,int) to initialize 
int(*)().
"/tmp/u.c", line 8: Error: Too many arguments in call to "int(*)()".
-- 
Ian Collins
Execution never halts.
I think a key reason for the big rise in the popularity of interpreted
languages is that when execution halts, they normally give a call
stack and usually a good reason why things couldn't continue. This is
as opposed to compiled languages, which present you with a blank screen
and force you to fire up a debugger (or, much much worse, look at a
core dump) to try to discern all the information the interpreter
presents to you immediately.
>
> Personally, I'd consider maximum(8589934592, 1) returning 1 as a blow
> up, and of the worst kind since it passes silently.
If I had to choose between "blow up" or "invalid answer" I would pick
"invalid answer".
In this example RG is passing a long literal greater than INT_MAX to a
function that takes an int and the compiler apparently didn't give a
warning about the change in value as it created the cast to an int,
even with the option -Wall (all warnings). I think it's legitimate to
consider that an option for a warning/error on this condition should
be available. As for the compiler generating code that checks for a
change in value at runtime when a number is cast to a smaller data
type, I think that's also a legitimate request for a C compiler option
(in addition to other runtime check options like array subscript out
of bounds).
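The runtime check proposed here can already be written by hand. A sketch; long_to_int() is a hypothetical helper that refuses out-of-range values instead of performing the implementation-defined conversion. On the compile-time side, GCC's -Wconversion option (not enabled by -Wall) warns about implicit conversions that may change a value.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: convert v to int only if the value survives.
 * Returns false, leaving *out untouched, when v lies outside
 * [INT_MIN, INT_MAX], instead of performing the implementation-defined
 * conversion that turned 8589934592 into a small int. */
bool long_to_int(long v, int *out)
{
    if (v < INT_MIN || v > INT_MAX)
        return false;
    *out = (int)v;
    return true;
}
```

A caller would then check the result first: if long_to_int(x, &a) fails, report the error; otherwise call maximum(a, 1).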
<snip>
> > Fact is:  almost all user data from the external world comes into
> > programs as strings.  No type system or compiler handles this fact all
> > that gracefully...
>
> I would even go further.
>
> Types are only part of the story.  You may distinguish between integers
> and floating points, fine.  But what about distinguishing between
> floating points representing lengths and floating points representing
> volumes?  Worse, what about distinguishing and converting floating
> points representing lengths expressed in feet and floating points
> representing lengths expressed in meters.
fair points
> If you start with the mindset of static type checking, you will consider
> that your types are checked and if the types at the interface of two
> modules matches you'll think that everything's ok.  And six months later
> your Mars mission will crash.
do you have any evidence that this is actually so? That people who
program in statically typed languages actually are prone to this "well
it compiles so it must be right" attitude?
> On the other hand, with the dynamic typing mindset, you might even wrap
> your values (of whatever numerical type) in a symbolic expression
> mentioning the unit and perhaps other meta data, so that when the other
> module receives it, it may notice (dynamically) that two values are not
> of the same unit, but if compatible, it could (dynamically) convert into
> the expected unit.  Mission saved!
they *may* do this but do they *actually* do it? My (limited)
experience of dynamically typed languages is that every now and again you
attempt to apply an operator to the wrong type of operand and kerblam!
If your testing is inadequate then it's inadequate whatever the
typiness of your language.
there are some application domains where neither option would be
viewed as a satisfactory error handling strategy. Fly-by-wire, petro-
chemicals, nuclear power generation. Hell you'd expect better than
this from your phone!
> In this example RG is passing a long literal greater than INT_MAX to a
> function that takes an int and the compiler apparently didn't give a
> warning about the change in value as it created the cast to an int,
> even with the option -Wall (all warnings). I think it's legitimate to
> consider that an option for a warning/error on this condition should
> be available. As for the compiler generating code that checks for a
> change in value at runtime when a number is cast to a smaller data
> type, I think that's also a legitimate request for a C compiler option
> (in addition to other runtime check options like array subscript out
> of bounds).
I think that it's a legitimate request, in this day and age, for a C
programmer to require that a C compiler always give an error for this
and similar cases, with no option to turn it off.
(And we should just kill all the programs that don't pass this check,
which, I'm afraid, would be a big number; that, I understand, is the
reason why C compilers don't change.)
-- 
__Pascal Bourguignon__
http://www.informatimago.com
> On 27 Sep, 20:29, p...@informatimago.com (Pascal J. Bourguignon)
> wrote:
>> If you start with the mindset of static type checking, you will consider
>> that your types are checked and if the types at the interface of two
>> modules matches you'll think that everything's ok.  And six months later
>> your Mars mission will crash.
>
> do you have any evidence that this is actually so? That people who
> program in statically typed languages actually are prone to this "well
> it compiles so it must be right" attitude?
Yes, I can witness that it's in the mindset.
Well, the problem is always the same: the time pressure coming from
the sales people (who can sell products for which the first line of the
specification has not been written yet, much less the code).  It's always
a battle to explain that once the code is written, a lot of time is still
needed to run tests and debug it.  I've even had technical managers,
who should know better, expecting us to write bug-free code in the
first place (when we didn't even have a specification to begin with!).
>> On the other hand, with the dynamic typing mindset, you might even wrap
>> your values (of whatever numerical type) in a symbolic expression
>> mentioning the unit and perhaps other meta data, so that when the other
>> module receives it, it may notice (dynamically) that two values are not
>> of the same unit, but if compatible, it could (dynamically) convert into
>> the expected unit.  Mission saved!
>
> they *may* do this but do they *actually* do it? My (limited)
> experience of dynamically typed languages is that every now and again you
> attempt to apply an operator to the wrong type of operand and kerblam!
> If your testing is inadequate then it's inadequate whatever the
> typiness of your language.
Unfortunately, a lot of programmers using dynamic programming languages
were trained on static programming languages and bring their old mindset
with them.  Moreover, when the syntax of the newer dynamic programming
languages is explicitly designed to resemble an older static programming
language, in order to attract these programmers toward the better
technologies, this does not help change the mindset either.
Unfortunately, you can write FORTRAN code in any programming language.
But my point is that at least with dynamic programming languages,
there's an alternative mindset and it is easier to implement such
a scheme than with static programming languages.
In Lisp, which stresses the symbolic computing part (S-expr are Symbolic
Expressions), it is almost trivial to implement.
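The same scheme can be approximated in C with a tagged struct checked at run time, though with far more ceremony than in Lisp. A sketch under stated assumptions: all names are hypothetical, only meters, feet, and one volume unit are modeled, and feet are converted with the exact definition 1 ft = 0.3048 m.

```c
#include <stdbool.h>

/* Hypothetical unit tags; a real system would need many more. */
enum unit { METER, FOOT, CUBIC_METER };

struct quantity {
    double value;
    enum unit unit;
};

/* Convert q to meters if it is a length; return false if it is not. */
static bool to_meters(struct quantity q, double *m)
{
    switch (q.unit) {
    case METER: *m = q.value;          return true;
    case FOOT:  *m = q.value * 0.3048; return true; /* exact by definition */
    default:    return false;                       /* not a length */
    }
}

/* Add two lengths, converting both to meters first. Returns false when
 * either operand is not a length (e.g. a volume), which is the run-time
 * check that catches a length/volume mix-up. */
bool add_lengths(struct quantity a, struct quantity b, struct quantity *sum)
{
    double ma, mb;
    if (!to_meters(a, &ma) || !to_meters(b, &mb))
        return false;
    sum->value = ma + mb;
    sum->unit = METER;
    return true;
}
```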
I wasn't speaking generally, just in the case of which of only two
choices RG's code should be referred to - "blowing up" or "giving an
invalid answer".
I think error handling in personal computer and website software has
improved over the years but there is still some room for improvement
as you will still get error messages that tell you nothing you
can relay to tech support beyond the fact that an error occurred or that
some operation can't be performed.
But I worked with programmers doing in-house software who were
incredibly turned off by exception handling in C++. I thought that
meant that they preferred to return and check error codes from
functions as they had done in C, and for some of them it did seem to
mean that. But for others it seemed that they didn't want to
anticipate errors at all ("that file is always gonna be there!"). I
read a Java book by Deitel and Deitel and they pointed out what might
have led to that attitude - the homework and test solutions in
college usually didn't require much if any error handling - the
student could assume files were present, data was all there and in the
format expected, user input was valid and complete, etc.
I think I'd prefer termination if those were my only choices. What's
the rest of the program going to do with the wrong result? When the
program finally gives up the cause is lost in the mists of time, and
those are hard to debug!
> I think error handling in personal computer and website software has
> improved over the years but there is still some room for improvement
> as you will still get error messages that tell you nothing you
> can relay to tech support beyond the fact that an error occurred or that
> some operation can't be performed.
>
> But I worked with programmers doing in-house software who were
> incredibly turned off by exception handling in C++. I thought that
> meant that they preferred to return and check error codes from
> functions as they had done in C, and for some of them it did seem to
> mean that. But for others it seemed that they didn't want to
> anticipate errors at all ("that file is always gonna be there!").
that was one of the reasons I liked exceptions. If my library threw an
exception then the caller *had* to do something about it. Even to
ignore it he had to write some code.
> I
> read a Java book by Deitel and Deitel and they pointed out what might
> have led to that attitude - the homework and test solutions in
> college usually didn't require much if any error handling - the
> student could assume files were present, data was all there and in the
> format expected, user input was valid and complete, etc.
plausible. Going from beginner to <whatever> I probably steadily
increased the pessimism of my code. The file might not be there. That
other team might send us syntactically invalid commands. Even if it
can't go wrong it will go wrong. Fortunately my college stuff included
some OS kernel stuff. There, anything that can go wrong will go wrong.
> there are some application domains where neither option would be
> viewed as a satisfactory error handling strategy. Fly-by-wire, petro-
> chemicals, nuclear power generation. Hell you'd expect better than
> this from your phone!
People always give these kind of scenarios, but actually there are far 
more mundane ones.  In my day job I'm a sysadmin and I spend a bunch of 
time writing code (typically what would nowadays be called "scripts" 
rather than programs, but there's no real difference) which does things 
of the form
	for every machine in <several hundred systems>
	do <something>
where <something> is fairly often "modify critical system configuration file".
Programs like that have some absolute, non-negotiable requirements:
- they must never fail silently;
- they must check everything they do, however unlikely it seems that it
  would fail, because they will come across systems which have almost
  arbitrary misconfiguration;
- they should be idempotent if possible;
- if they come across something odd they either need to handle it,
  or put things back the way they were and back out;
- if they absolutely cannot put things back, they need to report this
  very clearly and carefully preserve any detritus in such a way that a
  human can pick up the bits;
- whatever they do, they need to report in a completely parsable way
  what happened (success, failure, already done, backed out, not backed
  out, and so on).
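Two of these requirements, idempotence and parsable reporting, can be sketched for one common <something>: making sure a line is present in a configuration file. The names and outcome strings are hypothetical; backing out and preserving detritus are not handled, and lines are assumed to fit in the buffer.

```c
#include <stdio.h>
#include <string.h>

/* Parsable outcome codes for a per-host report line. */
enum outcome { OUT_DONE, OUT_ALREADY_DONE, OUT_FAILED };

const char *outcome_name(enum outcome o)
{
    switch (o) {
    case OUT_DONE:         return "done";
    case OUT_ALREADY_DONE: return "already-done";
    default:               return "failed";
    }
}

/* Hypothetical idempotent step: make sure `line` appears in the file at
 * `path`, appending it only if absent, so running it twice is harmless.
 * Every failure path returns OUT_FAILED; nothing fails silently.
 * Assumes lines shorter than the buffer. */
enum outcome ensure_line(const char *path, const char *line)
{
    char buf[4096];
    int last = '\n';                 /* an empty file needs no separator */
    FILE *f = fopen(path, "a+");     /* create if missing; writes append */

    if (f == NULL)
        return OUT_FAILED;
    rewind(f);
    while (fgets(buf, sizeof buf, f) != NULL) {
        size_t len = strlen(buf);
        if (len > 0)
            last = (unsigned char)buf[len - 1];
        buf[strcspn(buf, "\n")] = '\0';
        if (strcmp(buf, line) == 0) {
            fclose(f);
            return OUT_ALREADY_DONE;
        }
    }
    /* Add a newline first if the file's last line lacks one. */
    if (ferror(f) || fseek(f, 0, SEEK_END) != 0
        || (last != '\n' && fputc('\n', f) == EOF)
        || fprintf(f, "%s\n", line) < 0) {
        fclose(f);
        return OUT_FAILED;
    }
    return fclose(f) == 0 ? OUT_DONE : OUT_FAILED;
}
```

A driver would print one tab-separated line per host, for instance printf("%s\t%s\n", host, outcome_name(ensure_line(path, line))) with a hypothetical host variable, so a second run over the same machines reports already-done instead of appending a duplicate.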
These are quite mundane everyday things, but the consequences of 
getting them wrong can be quite nasty (the worst ones being "the 
machines will still run, but won't boot").
You are calling maximum() incorrectly, but you are doing so in a way
that the compiler is not required to diagnose.
If you want to say that the fact that the compiler is not required
to diagnose the error is a flaw in the C language, I won't
argue with you.  It's just not a flaw in the maximum() function.
If I write:
    const double pi = 22.0/7.0;
    printf("pi = %f\n", pi);
then I suppose I'm calling printf() incorrectly, but I wouldn't
expect my compiler to warn me about it.
If you're arguing that
int maximum(int a, int b) { return a > b ? a : b; }
is flawed because it's too easy to call it incorrectly, you're
effectively arguing that it's not possible to write correct
code in C at all.
For various historical reasons, "-Wall" has the semantics you might
expect from an option named "-Wsome-common-warnings-but-not-others".
-s
>> > > in C I can have a function maximum(int a, int b) that will always
>> > > work. Never blow up, and never give an invalid answer. If someone
>> > > tries to call it incorrectly it is a compile error.
> Please take note of the second sentence.
I did. That is entirely correct.
> One way or another, this claim is plainly false.  The point I was trying 
> to make is not so much that the claim is false (someone else was already 
> doing that), but that it can be demonstrated to be false without having 
> to rely on any run-time input.
It is not at all obvious to me that it is, in fact, false.  So far as
I can tell, *if* the function is successfully called, then it will take
two integers, compare them, and return the larger one.  It will never
return something which is not an integer.  It will never raise an exception.
It will never return a value which, if you try to treat it as an integer,
raises an exception.
Now, if you pass the wrong values to it, you will get wrong answers -- but
that's your problem for passing it wrong values.
I would understand an "invalid" answer to be one of the wrong category.  For
instance, if I have a function in Python that I expect to return a string,
and it returns None, I have gotten an answer that is "invalid" -- it's not
a string.
> This compiles fine for me. Where is the cast?
On the first line of code inside foo().
> Where is the error message?
You chose to use a form that suppresses the error message.
> Are you saying barf(3) doesn't call maximum?
I would say that it is undefined whether or not it calls maximum, because
you called a function through a function pointer of a different sort,
which invoked undefined behavior.
There exist real compilers on which code much like this will coredump
without ever once trying to jump to the address of the maximum function,
because the compiler caught your error.
You get a warning if you ask for it.  If you choose to run without all
the type checking on, that's your problem.
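For contrast, here is RG's example with a prototyped pointer type. Once the pointer's type carries the parameter list, calling it with the wrong number of arguments violates a constraint, so a conforming compiler must issue a diagnostic instead of silently accepting the call. The names follow RG's code; the two-argument call is ours.

```c
int maximum(int a, int b) { return a > b ? a : b; }

int foo(void)
{
    /* The pointer type now includes the parameter list. */
    int (*barf)(int, int) = maximum;
    /* barf(3) would now be rejected at compile time: too few arguments. */
    return barf(3, 4);
}
```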
> Yes, I can witness that it's in the mindset.
Huh.
So here I am, programming in statically typed languages, and I have never
in my life thought that things which compiled were necessarily right.  Not
even when I was an arrogant teenager.
I guess I don't exist. *sob*
> Well, the problem is always the same: the time pressure coming from
> the sales people (who can sell products for which the first line of the
> specification has not been written yet, much less the code).  It's always
> a battle to explain that once the code is written, a lot of time is still
> needed to run tests and debug it.
At $dayjob, they give us months between feature complete and shipping,
because they expect us to spend a lot of time testing, debugging, and
cleaning up.  But during that time we are explicitly not adding features...
> But my point is that at least with dynamic programming languages,
> there's an alternative mindset and it is easier to implement such
> a scheme than with static programming languages.
I think this grossly oversimplifies things.
> RG <rNOS...@flownet.com> writes:
> [...]
> > You can't have it both ways.  Either I am calling it incorrectly, in 
> > which case I should get a compiler error, or I am calling it correctly, 
> > and I should get the right answer.  That I got neither does in fact 
> > falsify the claim.  The only way out of this is to say that 
> > maximum(8589934592, 1) returning 1 is in fact "correct", in which case 
> > we'll just have to agree to disagree.
> 
> You are calling maximum() incorrectly, but you are doing so in a way
> that the compiler is not required to diagnose.
Yes.  I know.  That was my whole point.  There are ways to call a 
function incorrectly (more broadly, there are errors in code) that a C 
compiler is not required to diagnose.
> If you want to say that the fact that the compiler is not required
> to diagnose the error is a flaw in the C language, I won't
> argue with you.
I'm not even saying it's a flaw in the language.  All I'm saying is that 
the original claim -- that any error in a C program will be caught by 
the compiler -- is false, and more specifically, that it can be 
demonstrated to be false without appeal to unknown run-time input.
As an aside, this particular error *could* be caught (and in fact would 
be caught by other tools like lint), but there are errors that can not 
be caught by any static analysis, and that therefore one should not be 
lulled into a false sense of security by the fact that your code is 
written in a statically typed language and compiled without errors or 
warnings.  That's all.
> If I write:
> 
>     const double pi = 22.0/7.0;
>     printf("pi = %f\n", pi);
> 
> then I suppose I'm calling printf() incorrectly, but I wouldn't
> expect my compiler to warn me about it.
> 
> If you're arguing that
> 
>     int maximum(int a, int b) { return a > b ? a : b; }
> 
> is flawed because it's too easy to call it incorrectly, you're
> effectively arguing that it's not possible to write correct
> code in C at all.
I would say that it is very, very hard to write correct code in C for 
any non-vacuous definition of "correct".  That is the reason that core 
dumps and buffer overflows are so ubiquitous.  I prefer Lisp or Python, 
where core dumps and buffer overflows are virtually nonexistent.  One 
does get the occasional run-time error that might have been caught at 
compile time, but I much prefer that to a core dump or a security hole.
One might hypothesize that the best of both worlds would be a dynamic 
language with a static analyzer layered on top.  Such a thing does not 
exist.  It makes an instructive exercise to try to figure out why.  (For 
the record, I don't know the answer, but I've learned a lot through the 
process of pondering this conundrum.)
rg
After thinking for a bit, I believe I can demonstrate a situation
where maximum could indeed return the wrong answer without
being passed incorrect input.
Suppose that in maximum, after entry to the function but right
before the comparison, a signal handler gets invoked, walks the stack,
swaps the values of a and b, and returns back into maximum. Then
maximum will do the wrong thing.  Since control flow was always in
a subgraph of the control flow graph through maximum, this would
count as a failure under your strict view. (As an aside, one can
do the same thing with a debugger.)
Blocking the signals around the comparison and assignment of the
result to a temporary variable that you will return won't fix it.
This is because (in C) you must have a sequence point after the
unblocking of the signals and before the assignment of a temporary
variable holding the result to the return register, where, in fact,
another signal could arrive and again corrupt the results. Depending
upon the optimization settings of the compiler, it may or may not reorder
the assignment to the return register relative to the call that unblocks
the signals. The assignment of a result to a return register is not
something C defines, and it can happen anywhere. But the C statements you
used to write it must adhere to sequence evaluation.
Since the signal handler could do anything, including completely
replacing the text segments and/or loaded libraries of the code or
moving the PC to an arbitrary place, I don't think you can "fix" this
problem. Ever.
If you think this is a pedantic case which never happens in practice,
I'm the maintainer of a well-known user space checkpointing system
where these types of problems have to be thought about deeply because
they happen.
In addition, there are other modes of error injection: in compute
clusters with very high density memory that is not ECC, you can
actually calculate the probability that a bit will flip at an address
in memory due to cosmic rays. That probability is disturbingly high.
Just an idle search online produced this article:
http://news.cnet.com/8301-30685_3-10370026-264.html
which mentions some statistics. Think 1 billion hours is a lot and
"it'll never happen"?
There are 8760 hours in a year. So, you'd only need 114,156 computers
in a cluster running for one year before amassing 1 billion hours
of computation. That isn't a big number for large financial companies,
google, etc, etc, etc to own.
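The machine count is integer ceiling division of 10^9 hours by 8760 hours per machine-year; machines_needed() is a hypothetical name.

```c
/* Smallest number of machines whose combined uptime over one year
 * (24 * 365 = 8760 hours each) reaches target_hours, using integer
 * ceiling division. */
unsigned long long machines_needed(unsigned long long target_hours)
{
    const unsigned long long hours_per_year = 24ULL * 365ULL;
    return (target_hours + hours_per_year - 1) / hours_per_year;
}
```

machines_needed(1000000000) returns 114156, matching the figure above: 114155 machines would accumulate only 999,997,800 hours.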
As a fun statistic, the BlueGene/P supercomputer can have 884,736
processors with associated memory modules. According to the math
in the article, one BlueGene/P should see a max of ~600,000 memory
errors per year.
Sure, you might not think any of this is a problem, because your
home desktop always produces the right answer when balancing your
checkbook, but it is a matter of perception of scale. Lots of large
clusters and data movement houses go to great lengths to ensure data
integrity. Injecting wrong data 4 hours into a 6 month long workflow
running on thousands of computers really upsets the hell out of people.
I've run into physicists who simply run their buggy software over
and over and over again on the same data and do statistical analysis
on the results. They've come to the realization that they can't
find/fix/verify all the bugs in their code, so they assume they are
there and write systems which try to be mathematically robust to the
nature of the beast. It is cheaper to wait for 1000 runs of a program
to be computed on a cluster than to spend human time debugging a
difficult bug in the code.
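The post does not say which statistics those physicists apply. Purely as an assumed illustration, taking the median over many runs is one simple estimator that tolerates a minority of wrong answers, whether they come from bugs or from flipped bits; `median_of_runs` is a hypothetical helper, not anything from the thread.

```c
#include <stdlib.h>

/* qsort comparator for doubles (no NaN handling). */
static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Hypothetical helper: median of n > 0 run results. Sorts the array in
   place. A few wildly wrong runs shift the mean but not the median. */
double median_of_runs(double *results, size_t n)
{
    qsort(results, n, sizeof *results, cmp_double);
    if (n % 2 != 0)
        return results[n / 2];
    return (results[n / 2 - 1] + results[n / 2]) / 2.0;
}
```

For five runs returning 1, 1, 1e9, 1, 1, the median is 1, while the mean would be about 2e8.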
So, mathematically, maximum can't fail inside of itself; realistically,
while executing on a physical machine, you bet it'll fail. :)
-pete
> How do you define "Never blows up"?
I would say "blow up" would be "raise an exception".
> Personally, I'd consider maximum(8589934592, 1) returning 1 as a blow
> up, and of the worst kind since it passes silently.
So run your compiler with a decent set of warning levels, and watch as
you are magically warned that you're passing an object of the wrong type.
On any given system, one or the other is true:
1.  The constant 8589934592 is of type int, and the function will
"work" -- will give that result.
2.  The constant is not of type int, and the compiler will warn you about
this if you ask.
> On 2010-09-30, Lie Ryan <lie....@gmail.com> wrote:
> > On 09/30/10 16:09, TheFlyingDutchman wrote:
> >> Dynamic typed languages like Python fail in this case on "Never blows
> >> up".
> 
> > How do you define "Never blows up"?
> 
> I would say "blow up" would be "raise an exception".
> 
> > Personally, I'd consider maximum(8589934592, 1) returning 1 as a blow
> > up, and of the worst kind since it passes silently.
> 
> So run your compiler with a decent set of warning levels, and watch as
> you are magically warned that you're passing an object of the wrong type.
My code compiles with no warnings under gcc -Wall.
> On any given system, one or the other is true:
> 
> 1.  The constant 8589934592 is of type int, and the function will
> "work" -- will give that result.
> 2.  The constant is not of type int, and the compiler will warn you about
> this if you ask.
It would be nice if this were true, but my example clearly demonstrates 
that it is not.  And if your response is to say that I should have used 
lint, then my response to that will be that because of the halting 
problem, for any static analyzer that you present, I can construct a 
program that either contains an error that your analyzer will not 
catch, or for which it will generate a false positive.  It just so 
happens that constructing such examples for standard C is very easy.
rg
> On 2010-09-30, RG <rNOS...@flownet.com> wrote:
> > You can't have it both ways.  Either I am calling it incorrectly, in 
> > which case I should get a compiler error,
> 
> You get a warning if you ask for it.  If you choose to run without all
> the type checking on, that's your problem.
My example compiles with no warnings under gcc -Wall.
Yes, I know I could have used lint.  But that misses the point.  For any 
static analyzer, because of the halting problem, I can construct a 
program that either contains an error that the analyzer will not catch, 
or for which the analyzer will produce a false positive.
rg
That's nice.  gcc -Wall uses only a small subset of warnings that fit
the usual expectations of C code that's trying to work on common
architectures.
>> 2.  The constant is not of type int, and the compiler will warn you about
>> this if you ask.
> It would be nice if this were true, but my example clearly demonstrates 
> that it is not.
No, it doesn't, because you didn't ask for the relevant kind of warnings.
> And if your response is to say that I should have used 
> lint, then my response to that will be that because of the halting 
> problem, for any static analyzer that you present, I can construct a 
> program that either contains an error that your analyzer will not 
> catch, or for which it will generate a false positive.  It just so 
> happens that constructing such examples for standard C is very easy.
I'm not sure that that's actually a halting problem case.  The thing about
static typing is that we don't actually HAVE to solve the halting problem;
we only have to look at the types of the components, all of which are knowable
at compile time, and we can tell you whether there are any unsafe conversions.
And that's the magic of static typing:  It is not a false positive to
warn you that "2L" is not of type int.  There are things which would be a
false positive in trying to determine whether something will be out of range
in a runtime expression, but which are not false positives in a statically
typed language.
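The point that the type of an expression is fixed at compile time, independent of its value, can be shown with C11's `_Generic` (which postdates this 2010 thread). The macro name `TYPE_NAME` is made up for the example.

```c
/* Requires C11. _Generic selects a branch from the static type of its
   controlling expression; the value itself is never consulted. */
#define TYPE_NAME(x) _Generic((x),           \
    int: "int",                              \
    long: "long",                            \
    long long: "long long",                  \
    default: "other")
```

Here `TYPE_NAME(2)` is `"int"` and `TYPE_NAME(2L)` is `"long"`, even though both hold the value 2. On a platform with 32-bit `int`, `TYPE_NAME(8589934592)` is `"long"` or `"long long"` depending on the data model, never `"int"`.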
My post that kicked off this thread was not cross-posted, so many of the 
participants may not have seen it.  Here it is again, for your convenience:
---------------------
This might have been mentioned here before, but I just came across it: a 
2003 essay by Bruce Eckel on how reliable systems can get built in 
dynamically-typed languages.  It echoes things we've all said here, but 
I think it's interesting because it describes a conversion experience: 
Eckel started out in the strong-typing camp and was won over.
https://docs.google.com/View?id=dcsvntt2_25wpjvbbhk
-- Scott
> And that's the magic of static typing:  It is not a false positive to
> warn you that "2L" is not of type int.
We'll have to agree to disagree about that.  The numerical value 2 can 
safely be represented as an int, so I would consider this a false 
positive.
rg
Here's a post I wrote earlier, before the conversation got cross-posted.
To me, this is the essence of the matter.
-----------------------
Norbert_Paul wrote:
 >
 > OK, but sometimes it is handy to have the possibility to make compile-time
 > assertions which prevent you from committing easily avoidable simple
 > mistakes.
Agreed.  I actually don't see this issue in black and white terms; I've 
written lots of Lisp, and I've written lots of code in statically typed 
languages, and they all have advantages and disadvantages.  In the end 
it all comes back to my time: how much time does it take me to ship a 
debugged system?  Working in Lisp, sometimes I don't get immediate 
feedback from the compiler that I've done something stupid, but this is 
generally counterbalanced by the ease of interactive testing, that 
frequently allows me to run a new piece of code several times in the 
time it would have taken me to do a compile-and-link in, say, C++.
So while I agree with you that compiler warnings are sometimes handy, 
and there are occasions, working in Lisp, that I would like to have more 
of them(*), it really doesn't happen to me very often that the lack of 
one is more than a minor problem.
(*) Lisp compilers generally do warn about some things, like passing the 
wrong number of arguments to a function, or inconsistent spelling of the 
name of a local variable.  In my experience, these warnings cover a 
substantial fraction of the stupid mistakes I actually make.
-- Scott
Look again; there's no cast in foo().
That first line declares barf as an object of type "pointer to
function returning int", or more precisely, "pointer to function with
an unspecified but fixed number and type of parameters returning int"
(i.e., an old-style non-prototype declaration, still legal but
deprecated in both C90 and C99).  It then initializes it to point
to the "maximum" function.  I *think* the types are sufficiently
"compatible" (not necessarily using that word the same way the
standard does) for the initialization to be valid and well defined.
I might check the standard later.
It would have been better to use a prototype (for those of you
in groups other than comp.lang.c, that's a function declaration that
specifies the types of any parameters):
int (*barf)(int, int) = maximum;
IMHO it's better to use prototypes consistently than to figure out the
rules for interactions between prototyped vs. non-prototyped function
declarations.
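For contrast with the old-style declaration, a fully prototyped version of the pointer might look like this. `call_barf` is a hypothetical caller added only to show a call through the pointer.

```c
int maximum(int a, int b) { return a > b ? a : b; }

/* Prototyped pointer: the compiler knows the parameter types, checks
   the number of arguments, and converts each argument to int exactly
   as it would for a direct call to maximum(). */
int (*barf)(int, int) = maximum;

/* Hypothetical caller, only here to exercise the pointer. */
int call_barf(int x, int y)
{
    return barf(x, y);
}
```

With the prototype in scope, an argument like 8589934592 passed to `barf` goes through the same conversion to `int` as in a direct call to `maximum`, so the compiler is in a position to diagnose an out-of-range constant instead of silently passing a wider object.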
[...]
Can you describe any plausible real-world programs where the effort of
complicated static analysis is justified, and for which the halting problem
gets in the way of analysis?  By "real world", I mean that I wouldn't consider
searching for counterexamples to the Collatz conjecture to qualify as
sufficiently real-world and sufficiently complex for fancy static
analysis.  And even if it did, the static analyzer could deliver a
partial result, like "this function either returns a counterexample to
the Collatz conjecture or else it doesn't return".  
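To make the "partial result" idea concrete, here is a small function, hypothetical and not from the thread, whose termination is an open problem: setting aside overflow of `3 * n + 1` for very large inputs, proving that the loop ends for every n >= 1 amounts to proving the Collatz conjecture.

```c
#include <stdint.h>

/* Hypothetical example: number of Collatz steps for n (n >= 1) to reach
   1. No one has proved this loop terminates for every n, so the most an
   analyzer can honestly report is "returns a count, or does not return".
   Overflow of 3 * n + 1 for huge n is ignored in this sketch. */
unsigned collatz_steps(uint64_t n)
{
    unsigned steps = 0;
    while (n != 1) {
        n = (n % 2 == 0) ? n / 2 : 3 * n + 1;
        ++steps;
    }
    return steps;
}
```

For example, `collatz_steps(6)` follows 6, 3, 10, 5, 16, 8, 4, 2, 1 and returns 8.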
D. Turner wrote a famous paper arguing something like the above, saying
basically that Turing completeness of programming languages is
overrated:
http://www.jucs.org/jucs_10_7/total_functional_programming
The main example of a sensible program that can't be written in a
non-complete language is an interpreter for a Turing-complete language.
But presumably a high-assurance application should never contain such a
thing, since the interpreted programs themselves then wouldn't have
static assurance.
> One might hypothesize that the best of both worlds would be a dynamic 
> language with a static analyzer layered on top.  Such a thing does not 
> exist.  It makes an instructive exercise to try to figure out why.  (For 
> the record, I don't know the answer, but I've learned a lot through the 
> process of pondering this conundrum.)
There are static analysis tools for Common Lisp:
http://www.cs.cmu.edu/afs/cs/project/ai-repository/ai/lang/lisp/code/tools/lint/0.html
or for Lisp in general.  For example, PHENARETE is in the category of static
analysis tools.
One could regret that they're not more developed, but I guess this only
proves the success of using dynamic programming languages: if there were
a real need for these tools, along with a good ROI, they would be more
developed.  In the meantime, several test frameworks have been developed.
-- 
__Pascal Bourguignon__                     http://www.informatimago.com/
> In article <slrnia9dbo.2uq...@guild.seebs.net>,
>  Seebs <usenet...@seebs.net> wrote:
>
>> On 2010-09-30, RG <rNOS...@flownet.com> wrote:
>> > You can't have it both ways.  Either I am calling it incorrectly, in 
>> > which case I should get a compiler error,
>> 
>> You get a warning if you ask for it.  If you choose to run without all
>> the type checking on, that's your problem.
>
> My example compiles with no warnings under gcc -Wall.
IIRC, -Wall is not really ALL.
Try with: gcc -Wall -Wextra -Werror
I would still argue that this should be the default, and if there really
were a need, there could be options to disable some warnings, or to have
some errors be warnings...
Of course.
>> If you want to say that the fact that the compiler is not required
>> to diagnose the error is a flaw in the C language, I won't
>> argue with you.
>
> I'm not even saying it's a flaw in the language.  All I'm saying is that 
> the original claim -- that any error in a C program will be caught by 
> the compiler -- is false, and more specifically, that it can be 
> demonstrated to be false without appeal to unknown run-time input.
Did someone *really* claim that "any error in a C program will
be caught by the compiler"?  If so, I must have missed that.
It's certainly not true; code that compiles cleanly can be riddled
with errors.  That's true in any language, but more so in C than
in some others.
> As an aside, this particular error *could* be caught (and in fact would 
> be caught by other tools like lint), but there are errors that can not 
> be caught by any static analysis, and that therefore one should not be 
> lulled into a false sense of security by the fact that your code is 
> written in a statically typed language and compiled without errors or 
> warnings.  That's all.
I don't believe anyone has said otherwise.
>> If I write:
>> 
>>     const double pi = 22.0/7.0;
>>     printf("pi = %f\n", pi);
>> 
>> then I suppose I'm calling printf() incorrectly, but I wouldn't
>> expect my compiler to warn me about it.
>> 
>> If you're arguing that
>> 
>>     int maximum(int a, int b) { return a > b ? a : b; }
>> 
>> is flawed because it's too easy to call it incorrectly, you're
>> effectively arguing that it's not possible to write correct
>> code in C at all.
>
> I would say that it is very, very hard to write correct code in C for 
> any non-vacuous definition of "correct".  That is the reason that core 
> dumps and buffer overflows are so ubiquitous.  I prefer Lisp or Python, 
> where core dumps and buffer overflows are virtually nonexistent.  One 
> does get the occasional run-time error that might have been caught at 
> compile time, but I much prefer that to a core dump or a security hole.
I would say that it can certainly be difficult to write correct
code in C, but I don't believe it's nearly as hard as you think
it is.  It requires more discipline than some other languages,
and it can require some detailed knowledge of the language itself,
particularly what it defines and what it doesn't.  And it's not
always worth the effort if another language can do the job as well
or better.
> One might hypothesize that the best of both worlds would be a dynamic 
> language with a static analyzer layered on top.  Such a thing does not 
> exist.  It makes an instructive exercise to try to figure out why.  (For 
> the record, I don't know the answer, but I've learned a lot through the 
> process of pondering this conundrum.)
-- 
>>
>> > "in C I can have a function maximum(int a, int b) that will always
>> > work. Never blow up, and never give an invalid answer. "
>>
>> > Dynamic typed languages like Python fail in this case on "Never blows
>> > up".
>>
>> How do you define "Never blows up"?
>
> Never has execution halt.
>
> I think a key reason in the big rise in the popularity of interpreted
> languages 
This is a false conception.
Whether a program is executed by a processor for its own programming
language, or by a processor for another programming language (and
therefore requires a translation phase), is NOT a characteristic of
the programming language, but only of the execution environment.
1- There are C interpreters
    CINT - http://root.cern.ch/root/Cint.html
    EiC - http://eic.sourceforge.net/
    Ch - http://www.softintegration.com
2- All the current Common Lisp implementations have compilers.
3- Most current Common Lisp implementations actually compile to native
   code (i.e. they chose to translate to programming languages that are
   implemented by Intel, AMD or Motorola. (Notice that these programming
   languages are NOT implemented in hardware, but in software, called
   micro-code, stored on the real hardware inside the
   micro-processors); some choose to translate to C and call an external
   C compiler to eventually translate to "native" code).
4- Actually, there is NO current Common Lisp implementation having only
   an interpreter.  On the contrary, most of them don't have any
   interpreter (but all of them have a REPL, this is an orthogonal
   concept).
5- Even the first LISP implementation made in 1959 had a compiler.
6- I know less about the situation for other dynamic programming languages,
   but for example, even if CPython weren't a compiler, you should know that
   CLPython is a compiler (it's an implementation of Python written in
   Common Lisp, which translates Python into Common Lisp and compiles it).
> is that when execution halts, they normally give a call
> stack and usually a good reason for why things couldn't continue. As
> opposed to compiled languages which present you with a blank screen
> and force you to - fire up a debugger, or much much worse, look at a
> core dump - to try and discern all the information the interpreter
> presents to you immediately.
Theoretically, a compiler for a static programming language has even more
information about the program, so it should be able to produce even
better backtraces in case of problems...
> We'll have to agree to disagree about that.
No, we won't.  It's the *definition* of static typing.  Static typing
is there to give you some guarantees at the expense of not being able
to express some things without special extra effort.  That's why it's
static.
> The numerical value 2 can 
> safely be represented as an int, so I would consider this a false 
> positive.
That's nice for you, I guess.
The point of static typing is that it makes it possible to ensure that
the values that reach a function are in fact of the correct type -- at
the cost of not being able to rely on free runtime conversions.
If you want to write safe conversions, you can do that.  If you don't
bother to do that, you end up with errors -- by definition.
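A safe conversion of the kind described might look like the sketch below. `to_int_checked` and `checked_maximum` are hypothetical names, and the 8589934592 case assumes a 32-bit `int`.

```c
#include <limits.h>
#include <stdbool.h>

int maximum(int a, int b) { return a > b ? a : b; }

/* Hypothetical helper: narrow a long long to int only when the value
   fits, instead of relying on an implicit conversion whose result is
   implementation-defined. On failure, *out is left untouched. */
bool to_int_checked(long long v, int *out)
{
    if (v < INT_MIN || v > INT_MAX)
        return false;
    *out = (int)v;
    return true;
}

/* Accepts wide inputs, but refuses values that int cannot represent
   rather than silently truncating them before calling maximum(). */
bool checked_maximum(long long a, long long b, int *out)
{
    int ia, ib;
    if (!to_int_checked(a, &ia) || !to_int_checked(b, &ib))
        return false;
    *out = maximum(ia, ib);
    return true;
}
```

With 32-bit `int`, `checked_maximum(8589934592LL, 1, &r)` returns false instead of quietly producing 1, while `checked_maximum(2, 6, &r)` succeeds and sets `r` to 6.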