Specifically, the numerically-intensive spectral-norm and n-body benchmarks
from the shootout. I get:
               ocamlopt   g++ -O3   F#
Spectral-norm  14.94s     9.34s     9.37s
N-body          9.23s     8.21s     6.87s
I find this particularly interesting because these are best-case results for
OCaml and C++. Specifically, the machine is a 2x Athlon 64 (OCaml does
worse on Intels) and the F# is running in 32-bit Win XP but OCaml and g++
are in 64-bit Linux.
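To put the timings quoted above on a common scale, here is a minimal OCaml sketch that divides each native-code time by the F# time for the same benchmark. The numbers are copied from the post; the names `timings` and `relative_to_fsharp` are made up for illustration.

```ocaml
(* Timings in seconds, copied from the post:
   (benchmark, ocamlopt, g++ -O3, F#). *)
let timings =
  [ ("spectral-norm", 14.94, 9.34, 9.37);
    ("n-body",         9.23, 8.21, 6.87) ]

(* Runtime of each native compiler relative to F#.
   A value above 1.0 means slower than F# on that benchmark. *)
let relative_to_fsharp (name, ocamlopt, gpp, fsharp) =
  (name, ocamlopt /. fsharp, gpp /. fsharp)

let () =
  List.iter
    (fun row ->
      let name, o, g = relative_to_fsharp row in
      Printf.printf "%-14s ocamlopt/F# = %.2f  g++/F# = %.2f\n" name o g)
    timings
```

On these figures ocamlopt comes out at roughly 1.6x the F# time on spectral-norm and roughly 1.3x on n-body. The ratios say nothing about how much of the gap is due to the different operating systems and word sizes.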
Does anyone have data from running F# on 64-bit Windows?
--
Dr Jon D Harrop, Flying Frog Consultancy
OCaml for Scientists
http://www.ffconsultancy.com/products/ocaml_for_scientists/?usenet
> I find this particularly interesting because these are best-case results for
> OCaml and C++. Specifically, the machine is a 2x Athlon 64 (OCaml does
> worse on Intels) and the F# is running in 32-bit Win XP but OCaml and g++
> are in 64-bit Linux.
Hm. I've heard about slowdowns on 64bit because there is more data to
be shovelled over the bus (and the effective cache size in ints/pointers
is only half). So I would be cautious about calling those best-case results:
Try comparing 32bit with 32bit or 64bit with 64bit and then we can talk.
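The word-size concern can at least be made visible in the reported results. A minimal sketch, assuming only the standard `Sys` module, that prints which kind of build produced a timing (the names `describe_platform` and `int_bits` are illustrative):

```ocaml
(* Describe the build that is running: compiler version, OS family,
   and word size in bits (32 or 64 for native code). *)
let describe_platform () =
  Printf.sprintf "OCaml %s, %s, %d-bit words, max_int = %d"
    Sys.ocaml_version Sys.os_type Sys.word_size max_int

(* OCaml reserves one tag bit, so a native int is one bit narrower
   than the machine word. *)
let int_bits = Sys.word_size - 1

let () = print_endline (describe_platform ())
```

Note that OCaml float arrays hold unboxed 8-byte doubles on both 32-bit and 64-bit builds, so the halved effective cache size described above applies mainly to pointers and boxed values. How much that matters for these two benchmarks is not something this sketch measures.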
Regards -- Markus
I think you're blazing new ground!
I don't remember seeing side-by-side programming language timings
where the measurements were made on different OSs.
Should we look forward to side-by-side comparisons where OCaml was
measured on Athlon and F# was measured on Xeon? ;-)
I think that's awesome, but how useful is performance these days? I
don't think people use python because it's fast. *I* certainly don't
use ocaml because it *is* fast. I use it because it's damn faster to
write code in than anything else. Testing and debugging python code
totally sucks. I rarely test/debug ocaml code. When I do, it's
fairly simple and straightforward.
I'll be sure to learn how to benchmark properly from you, Isaac:
http://shootout.alioth.debian.org/sandbox/benchmark.php?test=binarytrees&lang=ocaml&id=0
- "De-optimized by Isaac Gouy"
Well, performance is still vital for a wide variety of important
applications in science and engineering. So the performance of languages in
their respective environments is significant for anyone wanting to do such
work.
> Isaac Gouy wrote:
>> I think you're blazing new ground!
>>
>> I don't remember seeing side-by-side programming language timings
>> where the measurements were made on different OSs.
>>
>> Should we look forward to side-by-side comparisons where OCaml was
>> measured on Athlon and F# was measured on Xeon? ;-)
>
> I'll be sure to learn how to benchmark properly from you, Isaac:
>
> http://shootout.alioth.debian.org/sandbox/benchmark.php?test=binarytrees&lang=ocaml&id=0
> - "De-optimized by Isaac Gouy"
Oh, come on. This is becoming stale. Not that I could say whether
I.G. can benchmark or does benchmark right. We know that is a pet
peeve of yours, but instead of focusing on what Isaac does wrong, do
it better. It shouldn't be difficult to build a better benchmarking
site _if_ you do it right, that means (a) don't compare apples with
oranges, (b) keep the proper scientific spirit regarding the openness
of results and (c) state clearly what the purpose of the benchmarks
is.
Regards -- Markus
> baguas...@gmail.com wrote:
>> I think that's awesome, but how useful is performance these days? I
>> don't think people use python because it's fast. *I* certainly don't
>> use ocaml because it *is* fast. I use it because it's damn faster to
>> write code in than anything else. Testing and debugging python code
>> totally sucks. I rarely test/debug ocaml code. When I do, it's
>> fairly simple and straightforward.
>
> Well, performance is still vital for a wide variety of important
> applications in science and engineering. So the performance of languages in
> their respective environments is significant for anyone wanting to do such
> work.
It is not unimportant but not important beyond everything else. I
think it has been suggested before that you're exhibiting a rather
unhealthy fixation on performance (on performance almost alone) and I
repeat it: Performance is not all. It depends on the overall equation
including developer time, number of deployments, relative cost of
hardware upgrades to say which solution is efficient. A factor of
three often doesn't mean a thing (especially in the more logically
oriented problems where the clear and maintainable implementation of a
specific model is more important than speed). E.g. I don't care whether
my package installer takes 5 seconds or fifteen to figure out all the
dependencies, but I care about a dependency model smart and automatic
enough that it gets things right without too much user input.
So please, Jon, get away from that fixation on performance and if you
really need to benchmark performance because it's so important to you
(to you, mind you) then don't compare apples with oranges and don't
benchmark one language on a 64bit operating system against another one
on a 32bit operating system and then talk about differences in
language implementation. Since word and cache size have a profound
influence, as does the granularity with which the OS does operations,
data allocations and transfers, it's really difficult to do a
comparison in this case since you changed two factors at the same
time: environment and language. Experimental physicists know that one
shouldn't do that.
Regards -- Markus
I already did.
What version of gcc did you use? I find it very hard to believe that F#
produces faster code than g++ ... But gcc 3.4.5 (mingw) has trouble
optimising the C++ (not C) version of the n-body program. On my
machine, the newer mingw 4.2.1 (dw2) compiler produces code that is more
than twice as fast (~70 s for 3.4.5 vs. ~25 s for 4.2.1).
And as other people said, it would be nice if you did the benchmarks
again, running the programs on the same operating system. I am curious
about the results.
GCC 4.1.3
> I find it very hard to believe that F#
> produces faster code than g++ ...
I agree. I tried various other compiler options (-O2 and -ffast-math) and
they made it no faster or significantly slower.
> But gcc 3.4.5 (mingw) has trouble
> optimising the C++ (not C) version of the n-body program. On my
> machine, the newer mingw 4.2.1 (dw2) compiler produces code that is more
> than twice as fast (~70 s for 3.4.5 vs. ~25 s for 4.2.1).
>
> And as other people said, it would be nice if you did the benchmarks
> again, running the programs on the same operating system. I am curious
> about the results.
I am not curious about such results. I would not try to use OCaml under
Windows or F# under Linux.
A month ago I was able to congratulate you on enjoying the joke
http://groups.google.com/group/comp.programming/msg/b094c27b4349b798
Now you seem to have lost your sense of humour.
That's an interesting kind of excuse, I think it would also work (at
least as well) for someone with no interest running OCaml on Xeon or
F# on Athlon.
Perhaps your sense of humour is intact after all, and the joke - as
usual - is on the rest of us :-)
While I don't disagree with your statement as such...
I had the opportunity to present a case where we used a multicore
architecture to speed up a commercial product. The newsworthy
item was that we've also shipped it to the customer (and the
work to port half a million lines of code to a new architecture
was insignificant). This product uses about 10,000 concurrent
processes in normal operation.
In most benchmarks (e.g. the shootout), such a high degree of
concurrency is not tested, partly because many programming
environments simply cannot cope with it. It is not considered
especially taxing for an Erlang environment.
In _this_ particular product, performance is almost entirely
decided by the ability to handle a very large number of
extremely complex state machines, but most importantly,
getting it to work _at all_ is the big challenge, and also
hinges on the ability to program these FSMs in the right
way. Thus, performance, important as it is, is secondary.
(or tertiary - robustness and debugging support are more
important.)
As regards concurrency, OCaml does not appear to do that
well:
whereas Haskell does (at least up to 3000 processes).
BR,
Ulf W
The critical difference is, of course, that F# is designed to run under
Windows and OCaml is designed to run under Unix.
I don't know what to make of this.
On one level, this is just hilarious. There's no reason why F# shouldn't
run just fine on Unix, or OCaml on Windows; in practice there might be
differences, of course, but that's nothing that a different process
handling library couldn't fix. (I'd expect differences for languages
with direct support for multithreading, but AFAIK both OCaml and F# rely
on libraries for that.)
On the other hand, I know that your claims are often backed by personal
experience, so what's the actual difference that you mean?
Regards,
Jo
Yes, massive concurrency is certainly an interesting area.
However, because the cheap-concurrency benchmark is ill-formed (like most of
the benchmarks on the shootout) the subjectively permitted OCaml "solution"
is nothing more than a grossly inefficient way to print 500*n.
In order to make any useful statements about the abilities of each language
to handle concurrent applications (both CPU bound and massively concurrent)
we need implementations of an objective and quantitative benchmark. Massive
concurrency is well outside my remit but perhaps you could propose a
problem that could be solved objectively as a test?
While I agree that the benchmark is ill-formed (it allows
for too much cheating), most entries actually try to use
the most intuitive concurrency constructs available to
each. This should really penalize entries that needlessly
use memory-protected threads which maintain debugging
metadata, etc. (like Erlang). But three of the fastest
entries - Erlang, Haskell, Mozart - do no form of cheating,
and still come out on top.
It isn't particularly interesting to compare a solution
with full-fledged processes with one that pokes a bunch
of coroutines. Both may be useful, but for entirely
different things. What _is_ interesting is when the
process-based solutions outperform the coroutines.
BTW, some of the rejected programs (two of them written
in Ocaml) do fit your description rather well.
> In order to make any useful statements about the
> abilities of each language to handle concurrent
> applications (both CPU bound and massively concurrent)
> we need implementations of an objective and quantitative
> benchmark. Massive concurrency is well outside my
> remit but perhaps you could propose a problem that
> could be solved objectively as a test?
That's a bit difficult. I don't have an answer ready.
Even in an ambitious comparison like the one done by
Trinder, Nyström et al
(http://www.macs.hw.ac.uk/~trinder/papers/ICFP2007.pdf),
the authors hesitate to draw too many conclusions when
comparing Erlang and Haskell (a performance comparison
was not done due to time constraints), to wit:
"These results should, however, be viewed with some caution.
GdH is a research language and lacks a production-quality
implementation. More significantly, GdH’s distributed paradigm
is limited in a number of ways. GdH uses a distributed virtual
shared-memory model, so distributed performance
will not scale as well as a distributed-memory model
like ERLANG. GdH has conventional fault tolerance using
distributed exceptions, where other languages including
ERLANG have more advanced models. GdH supports only closed
systems: i.e. while an arbitrary number of processes can be
created on an arbitrary number of processors, all of the
distributed processes are part of a single program and no
new programs nor new processors can be added (or removed).
Finally, GdH is lazy and hence it’s harder to statically
predict program performance, often a crucial aspect of
distributed systems".
BR,
Ulf W
> While I agree that the benchmark is ill-formed (it allows
> for too much cheating), most entries actually try to use
> the most intuitive concurrency constructs available to
> each. This should really penalize entries that needlessly
> use memory-protected threads which maintain debugging
> metadata, etc. (like Erlang). But three of the fastest
> entries - Erlang, Haskell, Mozart - do no form of cheating,
> and still come out on top.
>
> It isn't particularly interesting to compare a solution
> with full-fledged processes with one that pokes a bunch
> of coroutines. Both may be useful, but for entirely
> different things. What _is_ interesting is when the
> process-based solutions outperform the coroutines.
There do seem to be a lot of quite different approaches to
concurrency.
> > Massive concurrency is well outside my
> > remit but perhaps you could propose a problem that
> > could be solved objectively as a test?
>
> That's a bit difficult. I don't have an answer ready.
Maybe anything more than evaluating a particular technique is too much
to ask?
"a benchmark for evaluating software transactional memory (STM)
implementations"
http://lpd.epfl.ch/kapalka/index.php?page=stmbench7
Yes it is just hilarious.
F# does run fine on Mono - but I don't suppose any of us would be
surprised if MS .Net performance was different than Mono performance -
so we'd stay with F# on Win XP.
The puzzle is why we would avoid all four OCaml for Microsoft Windows
ports on the download page - are they too slow or too fast :-)
Essentially, OCaml is native to Linux and Mac OS X whereas F# is native to
Windows.
To me, comparing OCaml and F# under Linux (with F# running under Mono) is
equivalent to comparing French and Italian food bought from a French
restaurant using the justification that buying from the same restaurant
avoids bias.
And how does "being native" to a platform change the relevant properties?
Regards,
Jo
According to this page, the ocaml compiler works fine on Windows:
http://caml.inria.fr/ocaml/portability.en.html
> To me, comparing OCaml and F# under Linux (with F# running under Mono) is
> equivalent to comparing French and Italian food bought from a French
> restaurant using the justification that buying from the same restaurant
> avoids bias.
OK, Mono and Microsoft's runtime do not perform the same. So benchmark
ocaml and g++ on Windows instead.
As you said, performance matters in scientific applications.
*Scientists*, who probably had to do a lab exercise each week during
their undergrad years, are not going to be impressed by such sloppy
benchmarking.
What a great way to figure out whether the French food from the French
restaurant is better than the Italian food from the French
restaurant!
Yes, and to further the discussion on how difficult it is
to benchmark different concurrency strategies against
each other, here's a nice little comment by Richard O'Keefe
on Simon Peyton Jones' article about Software Transactional
Memory in Haskell. It presents two solutions to the Santa
Claus Problem, one with STM and one with Erlang message
passing.
http://www.cs.otago.ac.nz/staffpriv/ok/santa/index.htm
BR,
Ulf W
> Isaac Gouy wrote:
>> On Sep 2, 9:26 am, Jon Harrop <j...@ffconsultancy.com> wrote:
>>> Szabolcs wrote:
>> -snip-
>>> > And as other people said, it would be nice if you did the benchmarks
>>> > again, running the programs on the same operating system. I am curious
>>> > about the results.
>>>
>>> I am not curious about such results. I would not try to use OCaml under
>>> Windows or F# under Linux.
>>
>> That's an interesting kind of excuse, I think it would also work (at
>> least as well) for someone with no interest running OCaml on Xeon or
>> F# on Athlon.
>>
>> Perhaps your sense of humour is intact after all, and the joke - as
>> usual - is on the rest of us :-)
>
> The critical difference is, of course, that F# is designed to run under
> Windows and OCaml is designed to run under Unix.
Care to elaborate on that? AFAIK Ocaml runs rather OK under Windows.
-- Markus
> Joachim Durchholz wrote:
>> On the other hand, I know that your claims are often backed by personal
>> experience, so what's the actual difference that you mean?
>
> Essentially, OCaml is native to Linux and Mac OS X whereas F# is native to
> Windows.
Oh no. Don't you think you're stretching the term "native" a bit here?
What does "native" mean in this context?
- The way the language runs on the platform (native in the sense that
  ocamlopt produces native code) => Ocaml on Windows has at least
  two native ports (MinGW and VisualC). On the other side F# runs in
  a virtual machine of some kind. Can't be what you mean.
- Native in the sense of "has grown there": You have a point here, but
  on that argument we'd have to compare C++ with C#: The first was
  invented on Unix (AFAIK), the latter is a genuine Microsoft
  invention.
Also I don't see that the languages themselves should run better or
worse on Linux or Windows: That is just a question of (host) compiler
or VM quality. The only sense in which F# is a Windows citizen is the
amount of integration already achieved in terms of libraries (and the
same applies to OCaml as a Unix citizen). That has nothing to do with
the core language itself.
> To me, comparing OCaml and F# under Linux (with F# running under Mono) is
> equivalent to comparing French and Italian food bought from a French
> restaurant using the justification that buying from the same restaurant
> avoids bias.
Uuh. Bad analogies.
Well, if you have to compare, then please provide 4 data points (or better 8):
- Ocaml, F#
- 64bit and 32bit OS
- Linux and Windows
And on the same hardware, please.
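The full set of measurements requested above is a small factorial design. A sketch that enumerates the eight cells and lists those not covered by the timings posted at the start of the thread (all labels and names here are illustrative, not part of any benchmark suite):

```ocaml
(* The three factors listed above. *)
let languages = ["OCaml"; "F#"]
let word_sizes = [32; 64]
let systems = ["Linux"; "Windows"]

(* Full factorial design: every combination of language, word size and OS. *)
let configurations =
  List.concat
    (List.map
       (fun lang ->
         List.concat
           (List.map
              (fun bits -> List.map (fun os -> (lang, bits, os)) systems)
              word_sizes))
       languages)

(* The two cells reported in the original post. *)
let measured = [("OCaml", 64, "Linux"); ("F#", 32, "Windows")]

(* Cells that would still have to be run, on the same hardware. *)
let missing =
  List.filter (fun c -> not (List.mem c measured)) configurations

let () =
  List.iter
    (fun (lang, bits, os) -> Printf.printf "%s, %d-bit %s\n" lang bits os)
    missing
```

With two of eight cells filled, language, word size and OS all change together between the two measurements, which is the confounding the surrounding posts object to.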
The data slice that is of primary interest to the practical programmer is
always the same language on the same hardware under Linux and Windows.
... if one wants to develop a portable code base. And to that end I
doubt that sticking to a common subset would be feasible: No classes
in F# and Ocaml is missing some of the .NET integration features, so
portability of whole programs between F# and OCaml would be painful.
I liked your contributions so far (and I take back some of my critique
regarding missing scientific attitude, but not all, because I still
have the impression you're starting out "to prove that OCaml (or F#)
is faster than X" instead of going back to "I want to compare 2
languages. As a scientist I have to ask: What are the important
properties that matter in this context and how can I measure them") ...
as I said I liked your contributions so far, but recently I've got the
impression you're drifting away into obscurantism.
Sorry. I know you can do better :-). Please don't "publish" incomplete
data and then start defending that with bad analogies. There is no
natural homeland for a given programming language. They have to
perform where we want to use them. Not being at home at X is no excuse
for bad performance. We want to know how and how well they work on any
platform, not where they come from (the latter being totally
uninteresting. BTW: Is the native homeland of Java the coffee machine
or the toaster: That would explain a lot ...)
Regards -- Markus
Seems like Jon can't win. If he did compare OCaml on Windows and F# on
Windows, I'm 100% sure someone would complain the comparison is unfair
because OCaml's *runtime* system has not been optimized for the Win32 API.
So Jon did what I think was a completely reasonable thing: he benchmarked
on each "native" OS, where native means the primary platform for which
each system is likely optimized.
I guess the only way for Jon to convince everyone is to produce the full
matrix of language and OS. I think it's completely unfair to accuse Jon
of trying to rig the results. If anything I think he tried to make it a
"fair" comparison. I find it amazing that there is so much controversy
about this.
That would be best.
> I think it's completely unfair to accuse Jon
> of trying to rig the results.
Nobody has said that.
> If anything I think he tried to make it a
> "fair" comparison.
Well, I think he tried and failed.
There is no such thing as a fair comparison here - it's really a
function of how far each language has been optimized for each platform.
So first, the differences may be due to different manpower applied, so
it may be really a property of the implementation and not the language
per se (but he made a statement about the languages); second, it's
unclear how much of these differences are going to persist (but his
statement was made as if the differences were cast in concrete); third,
the matrix was incomplete (but he was already drawing conclusions).
Regards,
Jo
There is no well-defined mathematical definition of the "speed of a language".
There is only the speed of a given implementation of a language. Jon
picked what he considered the best implementations of various languages
and measured them for a particular benchmark. From that
experiment he concluded that, because there exist F# programs that are
faster, it is not unreasonable to believe that on average F# programs are
not much slower than C++ programs compiled with gcc.
Of course not, but language design can still help or hinder efficiency.
E.g. languages with a required reflection API such as Smalltalk, Java
and Lisp can't easily optimize away uncalled functions and never
accessed fields. Java's concurrency model has been described as "too
heavyweight for efficient". Etc.
That's not a clearcut distinction, but nevertheless a useful one.
Regards,
Jo
Both are single-implementation languages.
> second, it's
> unclear how much of these differences are going to persist (but his
> statement was made as if the differences were cast in concrete);
I did not state that these properties would persist and it would be stupid
for anyone to assume that.
> third,
> the matrix was incomplete (but he was already drawing conclusions).
You can draw valid conclusions with an incomplete matrix. In this case, I
have shown that F# is not always slower than OCaml, which is an important
conclusion and contrary to many people's beliefs.
None of these rebuttals have logical foundations: they are born of contempt.
If you devote unbounded resources to compiler development, irrespective
of language you can always produce optimal code for any arbitrarily
large class of programs. You just can't do it for all programs because
of the halting problem. But in the limit there's no real distinction to
be made between languages, unless you bound the resources devoted to the
compiler.
Perhaps you can prove that it takes asymptotically more resources to
optimize for one language over the other but I don't know of any results
in that area.
Hence it makes no sense to compare "language speed". It only makes sense
to compare implementations. The most fair comparison is to pick
implementations for which you reasonably expect the most resources were
spent in optimizing those implementations.
Which goes back to the underlying reason I assume Jon picked the targets
he did. Makes perfect sense to me! Yet his choice seems to have caused
some controversy. I will admit, I'd prefer the word-size to be the
same, but I find the notion that what Jon did was unscientific a little
silly.
What is particularly annoying are the claims that his comparison of
implementations is an unfair comparison of the languages, and is
unscientific. What is logically unfounded is the notion of "language
speed".
All you can compare is implementations, and the primary factor to
consider is level of investment in making those implementations fast.
Account for that first then worry about the second-order effects like
word-size and OS.
You can draw the valid conclusion that F# on 32 bit XP is not always
slower than OCaml on 64 bit Linux.
I would guess the OCaml nbody comparison with several other language
implementations would make many people wonder how that particular
OCaml program could be improved:
http://shootout.alioth.debian.org/gp4/benchmark.php?test=nbody&lang=all
http://shootout.alioth.debian.org/debian/benchmark.php?test=nbody&lang=all
> None of these rebuttals have logical foundations: they are born of contempt.
They are born of abused credulity.
You can bludgeon away with "OCaml is native to Linux" all you like -
it won't change the fact that on my old machine the OCaml nbody
program was /faster/ on 32 bit XP than on 32 bit Linux.
I can understand that someone might not want to mess about installing
"Visual C++ 2005 Express", and MASM, and 600MB of "Microsoft Platform
SDK for Windows Server 2003 R2" (for a couple of library files), as
well as installing the OCaml win32 binaries - but that is how we get
beyond speculation about OCaml on XP.
> Markus E L wrote:
>> Jon Harrop wrote:
>>
>>> Joachim Durchholz wrote:
>>>> On the other hand, I know that your claims are often backed by personal
>>>> experience, so what's the actual difference that you mean?
>>> Essentially, OCaml is native to Linux and Mac OS X whereas F# is native to
>>> Windows.
What does that quote have to do with me?
Regards - M
> targets he did. Makes perfect sense to me! Yet his choice seems to
> have caused some controversy. I will admit, I'd prefer the
His choice doesn't cause controversy, because he picked those targets
for development. The controversy is about making _language_ or
_language implementation_ comparisons based on those choices.
> word-size to be the same, but I find the notion of what Jon did to be
> unscientific a little silly.
Not what he did, but the conclusions he drew.
> What is particularly annoying are the claims that his comparisons of
> implementations is an unfair comparison of the languages, and is
> unscientific.
> What is logically unfounded is the notion of "language speed".
But you note that Jon almost exclusively is aiming for speed
comparisons and he does them with an eye on the languages under
scrutiny?
> All you can compare is implementations, and the primary factor to
That's a truism and a useless one at that.
> consider is level of investment in making those implementations
> fast. Account for that first then worry about the second-order effects
> like word-size and OS.
You can well consider the level of investment etc. As a _user_ of the
language I'm primarily interested in the performance, advantages and
disadvantages of the existing implementations.
Regards -- Markus
> Java's concurrency model has been described as "too
> heavyweight for efficient". Etc.
While that may be true, the biggest problem with Java's
concurrency model is that it doesn't handle complexity
well. This is the main reason why the standing advice
to Java programmers is to avoid concurrency as much
as possible.
"Writing a thread-safe class is hard, but analyzing
an existing class for thread-safety is even harder,
as is enhancing it so that it remains thread-safe."
http://www.ibm.com/developerworks/java/library/j-jtp06294.html
BR,
Ulf W
> Joachim Durchholz wrote:
>> There is no such thing as a fair comparison here - it's really a
>> function of how far each language has been optimized for each platform.
>> So first, the differences may be due to different manpower applied, so
>> it may be really a property of the implementation and not the language
>> per se (but he made a statement about the languages);
>
> Both are single-implementation languages.
Ah, not really: There is an implementation on 32bit and one on 64bit,
one on Windows and one on Linux and others on other Unixes. They are
built from the same source, but they are different.
>> second, it's
>> unclear how much of these differences are going to persist (but his
>> statement was made as if the differences were cast in concrete);
> I did not state that these properties would persist and it would be stupid
> for anyone to assume that.
Uh, oh.
>> third,
>> the matrix was incomplete (but he was already drawing conclusions).
> You can draw valid conclusions with an incomplete matrix. In this case, I
> have shown that F# is not always slower than OCaml, which is an important
That again is a truism. Useless, too. Put Ocaml on completely
different and weak hardware and then it will certainly be slower than
F#. You see: The value of your propositions is inversely proportional to
the number of uncontrolled/variable factors in your test setup. And
the point is that you've got too many variable factors with only 2 data
points to say anything beyond "I can make Ocaml slower than F#". Not
that I doubt that F# is or can be on a par with OCaml. But certainly
I'd like to see that better and more exhaustively documented than the skewed
Ocaml on Unix and 64bit and F# on Windows and 32bit comparison you
delivered (if I remember right) and would prefer to refrain from
premature conclusions and/or speculations until then. At least I would
not insist on being scientific at that moment but make it clear that
I've got my discoverer's hat on ("Wow look, it's faster, who would have
expected this") and announce a more complete data set for later.
> conclusion and contrary to many people's beliefs.
Nothing to do with belief, but proper (scientific) methods of
comparison.
> None of these rebuttals have logical foundations: they are born of contempt.
Oh, come on. Some of the people (like me or Jo) that criticize you
now, have been arguing for you just recently. Now that we criticize
you, we're suddenly holding you in contempt? That is just stupid, Jon,
and not the proper scientific spirit. I also would like to add, that
it is not in accordance with the degree of professionalism you try to
exhibit/build at your Frog consultancy site. Criticism is a part of
science (and I have supplied some constructive criticism, I think, by
pointing out the differences in cache and word size and their effect
between 32 and 64 bit) and it is also, by necessity, part of business. If
every business got into a huff whenever its product got a
not-quite-so favourable review, we'd all be sulking instead of selling
anything. Or getting anything published.
Regards -- Markus
> On Sep 5, 12:30 pm, Jon Harrop <j...@ffconsultancy.com> wrote:
> -snip-
>> You can draw valid conclusions with an incomplete matrix. In this case, I
>> have shown that F# is not always slower than OCaml, which is an important
>> conclusion and contrary to many people's beliefs.
>
> You can draw the valid conclusion that F# on 32 bit XP is not always
> slower than OCaml on 64 bit Linux.
>
> I would guess the OCaml nbody comparison with several other language
> implementations would make many people wonder how that particular
> OCaml program could be improved:
>
> http://shootout.alioth.debian.org/gp4/benchmark.php?test=nbody&lang=all
>
> http://shootout.alioth.debian.org/debian/benchmark.php?test=nbody&lang=all
I'm not sure what this comment is doing here, except perhaps trying
to provoke JH. But BTW, I suggest that avoiding stuff like
px := !px +. bodies.(i).vx *. bodies.(i).mass;
py := !py +. bodies.(i).vy *. bodies.(i).mass;
pz := !pz +. bodies.(i).vz *. bodies.(i).mass;
...
bodies.(0).vx <- -. !px /. solar_mass;
bodies.(0).vy <- -. !py /. solar_mass;
bodies.(0).vz <- -. !pz /. solar_mass
might help. This looks like a very literal translation from C and my
suspicion is: It's not nice to the compiler. But I don't know.
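For comparison, a hedged sketch of the same momentum-offset step written without ref cells, using a fold. The record layout is an assumption meant to mirror the shootout n-body program, and `total_momentum` is a hypothetical helper. Whether this is any faster is untested: ocamlopt typically keeps local float refs unboxed, whereas the tuple accumulator below may allocate on every step.

```ocaml
(* Assumed record layout, modelled on the shootout n-body program. *)
type body = {
  mutable x : float; mutable y : float; mutable z : float;
  mutable vx : float; mutable vy : float; mutable vz : float;
  mass : float;
}

let pi = 3.141592653589793
let solar_mass = 4. *. pi *. pi

(* Total momentum as a single fold over the bodies, with no ref cells. *)
let total_momentum bodies =
  Array.fold_left
    (fun (px, py, pz) b ->
      (px +. b.vx *. b.mass, py +. b.vy *. b.mass, pz +. b.vz *. b.mass))
    (0., 0., 0.) bodies

(* Give body 0 (the sun, assumed to have mass solar_mass) the velocity
   that cancels the net momentum of the system. *)
let offset_momentum bodies =
  let px, py, pz = total_momentum bodies in
  bodies.(0).vx <- -. px /. solar_mass;
  bodies.(0).vy <- -. py /. solar_mass;
  bodies.(0).vz <- -. pz /. solar_mass
```

This step runs once at start-up, so any difference here is unlikely to show up in the benchmark time; the per-step advance loop is where the style of the code would matter.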
>> None of these rebuttals have logical foundations: they are born of contempt.
>
> They are born of abused credulity.
Ah, nonsense. So far we didn't need to _believe_ in Jon.
> You can bludgeon away with "OCaml is native to Linux" all you like -
> it won't change the fact that on my old machine the OCaml nbody
> program was /faster/ on 32 bit XP than on 32 bit Linux.
That's funny. I wonder why? Background processes disrupting the cache?
Different gcc's as backends? And BTW: Byte code or native? Note that
the details given by you are also incomplete.
> I can understand that someone might not want to mess about installing
> "Visual C++ 2005 Express", and MASM, and 600MB of "Microsoft Platform
> SDK for Windows Server 2003 R2" (for a couple of library files), as
> well as installing the OCaml win32 binaries - but that is how we get
> beyond speculation about OCaml on XP.
Just do it, OK?
Regards -- Markus
I think you understate the importance of word size: Implementations
can be specifically optimised for a particular size & this can make a
big difference to the observed performance. Jon & I have had some
exchanges where we've seen very different levels of performance from
the same code which have turned out to be down to word size
differences.
Phil
--
http://www.kantaka.co.uk/ .oOo. public key: http://www.kantaka.co.uk/gpg.txt
Absolutely, and being 64-bit gives every advantage to OCaml in the context
of numerical algorithms such as these yet F# still outperforms OCaml. This
is precisely why I find these results fascinating.
That is indefensible. For everything I say, you can claim that you already
knew it or that it was obvious.
> Useless, too.
Stating "truism" is useless. You can always state it so it conveys no
information.
> Put OCaml on completely different and weak hardware, then it will certainly
> be slower than F#.
Yes.
> You see: The value of your propositions is inversely proportional to
> the number of uncontrolled/variable factors in your test setup.
This is about satisfying an inequality. I have shown that it can be
satisfied (F# can be faster than OCaml) and, moreover, that this can even
be true when OCaml is on its best-case setup.
> And
> the point is that you have too many variable factors with only 2 data
> points to say anything beyond "I can make OCaml slower than F#". Not
> that I doubt that F# is or can be on a par with OCaml. But certainly
> I'd like to see that better and more exhaustively documented than the
> skewed OCaml on Unix and 64bit and F# on Windows and 32bit comparison you
> delivered (if I remember right) and would prefer to refrain from
> premature conclusions and/or speculations until then.
Firstly, my conclusion is not premature. On the contrary, it is completely
objectively and quantitatively justified. Secondly, of course everyone
would always like to see infinite evidence to justify each and every
hypothesis. That is a practical impossibility and you can draw perfectly
valid conclusions at every step.
>Phil Armstrong wrote:
>> I think you understate the importance of word size: Implementations
>> can be specifically optimised for a particular size & this can make a
>> big difference to the observed performance. Jon & I have had some
>> exchanges where we've seen very different levels of performance from
>> the same code which have turned out to be down to word size
>> differences.
>Absolutely, and being 64-bit gives every advantage to OCaml in the context
>of numerical algorithms such as these yet F# still outperforms OCaml. This
>is precisely why I find these results fascinating.
I don't understand this. Could someone perhaps explain why 64-bit is
an advantage for numerical algorithms ? Are you dealing with integers
larger than 31 bits ?
Thanx in advance,
/Finn
Numerical algorithms that fetch data from arrays these days are limited
by bandwidth to the memory subsystem. The wider data path of the 64-bit
CPU can provide a noticeable boost in performance. Or, as others have
pointed out, a slowdown because of the extra traffic and caching issues
that arise because pointers get uniformly bigger.
>Finn Schiermer Andersen wrote:
>{stuff deleted}
>> I don't understand this. Could someone perhaps explain why 64-bit is
>> an advantage for numerical algorithms ? Are you dealing with integers
>> larger than 31 bits ?
>Numerical algorithms that fetch data from arrays these days are limited
>by bandwidth to the memory subsystem. The wider data path of the 64-bit
>CPU can provide a noticeable boost in performance.
Hmmm. But do you know for a fact that the datapath between the floating
point unit and the data cache is wider on a 64-bit machine than on a 32-bit
machine ? I know only a little about x86 implementations, but on most MIPS
and PowerPC implementations I know of, the path between the FPU and the
data cache is 64 bit regardless of the width of the integer data path.
The width of the data path between the cache and the floating point unit
depends on whether the machine has been designed for single or double
precision.
> Or as others have
>pointed out a slowdown because of the extra traffic and caching issues
>because pointers get uniformly bigger.
Exactly.
Best regards,
/Finn
{stuff deleted}
> Criticism is a part of
> science (and I have supplied some constructive criticism, I think, by
> pointing out the differences in cache and word size and their effect
> between 32 and 64 bit) and it is also by necessity of business.
At the end of the day science is about data. Criticism without new data
is useless. Jon tends to make some claim, and he usually has some reason
to believe it based on some sort of measured evidence. It would simply
be better if you carried out some experiments to bolster your criticism
about possible flaws in Jon's assumptions and present them.
However, given the set of assumptions Jon had in his head, he carried out
the obvious experiment under those assumptions. The assumptions Jon made
can all be falsified with data. Falsify one of them and stop lecturing
us about experimental methods.
For the x64 vs the x86 comparisons, you also get more integer registers
which reduces the need of the compiler to spill to main memory when it
runs out of registers. I would expect the availability of more registers
to improve performance all things being equal by a few percent.
Given the code for n-body, and the small data-set I would be surprised
if the larger pointers were a significant issue. I would be less
surprised if the pointer size were an issue for a pointer-intensive
benchmark, but n-body should not be pointer-intensive.
> >> None of these rebuttals have logical foundations: they are born of contempt.
>
> > They are born of abused credulity.
>
> Ah, nonsense. So far we didn't need to _believe_ in Jon.
As you previously noted, we need to believe his assertion that 64-bit
Linux measurements are the best case results for OCaml and G++:
http://groups.google.com/group/comp.lang.functional/msg/a5c82e4300ae6227
> > You can bludgeon away with "OCaml is native to Linux" all you like -
> > it won't change the fact that on my old machine the OCaml nbody
> > program was /faster/ on 32 bit XP than on 32 bit Linux.
>
> That's funny. I wonder why? background processes disrupting the cache?
> Different gcc's as backends? And BTW: Byte code or native? Note that
> the details given by you are also incomplete?
I guess you wrote that before reading the paragraph that followed -
gcc on Linux, vc on XP - if it was bytecode we wouldn't need the MS
toolchain.
> > I can understand that someone might not want to mess about installing
> > "Visual C++ 2005 Express", and MASM, and 600MB of "Microsoft Platform
> > SDK for Windows Server 2003 R2" (for a couple of library files), as
> > well as installing the OCaml win32 binaries - but that is how we get
> > beyond speculation about OCaml on XP.
>
> Just do it, OK?
I did already.
I think 64-bit cpus tend to have larger caches (not wider fpu paths
because float size never increased). They also tend to make use of newer
technology, so at least part of that slowdown is made up for.
(BTW architectures with variable pointer sizes have invariably met with
performance problems when decoding addresses, and generally made the
programmer's life miserable. Since 32 bits are definitely not enough, I
think the move to 64 bits is still warranted. 64 bits should be enough -
you can address every subatomic particle in the universe with that IIRC,
much more every memory cell in it.)
Regards,
Jo
When someone says "I assume X", we can just shrug and say "we don't assume
X; come back when you have more than assumptions."
The assumptions Jon made can be verified with data - is the burden on
him to demonstrate his assumptions hold or on others to demonstrate
his assumptions don't hold?
>Finn Schiermer Andersen wrote:
>{stuff deleted}
>> Hmmm. But do you know for a fact that the datapath between the floating
>> point unit and the data cache is wider on a 64-bit machine than on a 32-bit
>> machine ?
>It's not just the data path between the cache and the FPU, but the
>data-path/bus between the cache and main memory. In the end it's all
>about memory subsystem bandwidth and you get more with 64-bit machines,
Really. I'm a bit surprised, because there is no inherent connection
between the 64/32-bit distinction on the ISA level and the design
of the rest of the memory hierarchy, except for the datapath between
the L1 cache and the integer datapath, which is almost always designed
to match the width of an integer.
/Finn
Yes I can see that providing infinite evidence would be a practical
impossibility. But I'm finding it quite difficult to see why this
small number of possibilities need give rise to infinite evidence ;-)
AFAICT just 2 data points - native OCaml on 32-bit Win XP - would
provide direct comparison with F# .Net on 32-bit Win XP (admittedly at
the risk of discovering 64-bit Linux isn't always the best-case setup
for OCaml on your hardware).
Just to provide some more data, under 32-bit Win XP:
ocamlopt
Spectral-norm 34.19s
N-body 16.80s
The assumption passes the "it's debatable" test. The scientific method
doesn't dictate who is responsible for verifying every *reasonable*
assumption. If you disbelieve it and Jon's results because you *know*
it's *definitely* and *obviously* wrong then just say so and ignore Jon.
If you are uncertain, I think it would be polite to help out the greater
effort of getting at the truth and do some experiments yourself. Jon did
something quick and dirty to satisfy his curiosity and shared it with
us. If you think he is in error, some data to show it would be much more
worthwhile than complaining that Jon's little experiment is "unscientific."
There has been much more heat/criticism than light/data. I believe that if
you claim to be participating in a scientific discourse, you have some
responsibility to generate some light/data in proportion to your
heat/criticism.
I personally think it's unfair to expect Jon to do all he can to
convince you! Jon just wanted to convince himself from what I can tell
and just shared the results.
To be fair, I think you've provided some experimental evidence against
the "64-bit is best case", but it's hard to know exactly what's going on.
BTW just to be clear, I'm not sure what exactly to make of Jon's
results, but to me they seem like an honest attempt to provide some much
needed light on these issues. It's a shame they generate more heat than
I think they should.
I don't disagree with you, but once you start building 64-bit machines,
you start optimizing the memory hierarchy to provide more memory
bandwidth. If you don't you get a degradation in performance because the
required bandwidth to maintain the same CPI for a 32-bit machine
naturally increases. Also, the 64-bit machines end up in multi-proc
server configurations so again you architect for that scenario which
again is memory bound. This is why AMD developed HyperTransport and
on-die memory controllers.
Presumably on the same hardware? Thank you!
Now here's a puzzle -
MS native OCaml on my old 2GHz P4, on 32-bit Win XP,
using ntimer from the "Windows 2003 Resource Kit"
spectral-norm 30.60s
n-body 33.70s
I suspect something's wrong when spectral-norm is faster on my old
computer than on your dual-core AMD Athlon 64 :-)
But it's difficult to suggest what the problem might be because
- you haven't told us whether this is the MinGW-based native Win32
port, the Microsoft-based native Win32 port, or Cygwin-based port
- you haven't told us the compiler options used for each program (or
for the programs on Linux)
As a wild guess, are you using the compiler option: -inline 10?
I think the suggestion over and over has been that it would be
slightly easier to know "what exactly to make of Jon's results" if
there were fewer differences between the things being measured ;-)
Like stick with the MS toolchain:
F# on .Net on Win XP
OCaml MS based native on Win XP
or stick with the gcc toolchain on Linux, or do both.
What if F# on Mono is faster than that OCaml nbody program? :-)
That's always the case, but Jon did what he did for a reason. He didn't
do it accidentally. He did it for a reason consistent with a particular
set of assumptions he had in mind. The fact that everyone else seems to
disagree with those assumptions is the problem. It's not because Jon
was being "unscientific" in his approach.
I'd bet that if he did the XP OCaml vs. F# comparison, someone else would
ask about the Linux numbers, because they'd think OCaml on XP is
"suboptimal". The full matrix is the only non-controversial thing. It's
unfortunate that producing that matrix requires non-trivial effort.
People's motives are their own - they do what they do.
I don't think agreement or disagreement with those assumptions is the
problem - the problem is figuring out "what exactly to make of Jon's
results" given they include those assumptions. Thankfully that problem
seems to be going away.
Ok - violent agreement then. How nice.
/Finn
--
"Inside every man a human is fighting to get out" -- sch...@diku.dk
"No language in Section 2.05 allows Daimler to avoid its obligation to
certify a list of Designated CPUs by stating that none are currently
in use..." --- SCO's Memo in Opposition to DC's Motion to Dismiss
No compiler options, native Win32 port, argument is n=5500.
Please help us understand what you were "pleasantly surprised" about
by being a little more forthcoming with information:
When you say "native Win32 port":
- do you mean the MS based "native Win32 port"?
- do you mean the MinGW-based "native Win32 port"?
- do you mean the Cygwin-based port?
When you say "No compiler options" is that for
- ocamlopt on Win XP?
- f# on Win XP?
- ocamlopt on Linux?
- g++ on Linux?
- all of the above?
> Firstly, my conclusion is not premature. On the contrary, it is completely
> objectively and quantitatively justified. Secondly, of course everyone
> would always like to see infinite evidence to justify each and every
> hypothesis. That is a practical impossibility and you can draw perfectly
> valid conclusions at every step.
Unfortunately in my book you are still missing a step: But have it
your way: We will see whether anybody "buys" that conclusion (and how
many do so).
Regards -- Markus
> Phil Armstrong wrote:
>> I think you understate the importance of word size: Implementations
>> can be specifically optimised for a particular size & this can make a
>> big difference to the observed performance. Jon & I have had some
>> exchanges where we've seen very different levels of performance from
>> the same code which have turned out to be down to word size
>> differences.
>
> Absolutely, and being 64-bit gives every advantage to OCaml in the context
> of numerical algorithms such as these yet F# still outperforms OCaml. This
> is precisely why I find these results fascinating.
Cache sizes? Optimizations for GC not applying any more? All this kind
of stuff? I find the "yet" questionable in your conclusions/results,
that's all. You assume something ("being 64-bit gives every advantage
to OCaml ...") that hasn't been understood completely yet and would
have been demonstrated by benchmarking 32 bit OCaml against 64 bit
OCaml => voila, and that would give us exactly the missing link/data
point everyone has been clamouring for.
Regards -- Markus
MS based.
> When you say "No compiler options" is that for
> - ocamlopt on Win XP?
> - f# on Win XP?
> - ocamlopt on Linux?
> - g++ on Linux?
> - all of the above?
I already gave the compile options explicitly. F# is using default Release
mode. G++ had -O3 and ocamlopt had no options.
And can we please have all the relevant software versions? ocamlopt on
Windows uses the native compiler (MinGW gcc from Cygwin or VC) as
backend and I think some gcc versions might suck.
Regards -- Markus
On my old 2GHz P4, this is how much /faster/ the programs are using
the compile options from the benchmarks game, instead of just -O3 for
g++ and no compile options for ocamlopt that you used.
             nbody        spectralnorm
g++          8.5% faster  11.3% faster
ocamlopt     2.3% faster  54.0% faster
ocamlopt xp  2.1% faster  58.7% faster
http://shootout.alioth.debian.org/gp4/benchmark.php?test=nbody&lang=all
http://shootout.alioth.debian.org/gp4/benchmark.php?test=spectralnorm&lang=all
Inlining makes a big difference on spectral-norm, maybe you'll see a
50-60% improvement on your ocamlopt measurements!
I'm happy to acknowledge that you have nothing to learn about
benchmarking from me, and I'm confident that you know far more about
g++ and ocamlopt compiler options than I'll ever be interested in
learning.
It would be surprising if you needed me to say - did you try setting
g++ -march=athlon64-sse3?
It would be astonishing if you needed me to say - did you try setting
ocamlopt -inline? - because you do set ocamlopt -inline 100 for the
ray tracer program on your website.
Which leaves us with a puzzle: how can it be that on your dual-core
AMD Athlon 64, the MS based native ocamlopt spectral-norm program is
slower at N=5,500 than on my old 2GHz P4?
Once more, here are the measurements you reported for MS native OCaml,
on 32-bit Win XP:
spectral-norm 34.19s
n-body 16.80s
and here are the corresponding measurements for my slower computer
N=5,500 and N=20,000,000
spectral-norm 30.60s
n-body 33.70s
I can easily make the spectral-norm program take twice as long simply
by omitting the ocamlopt compiler options, but I'm sure you would
agree that couldn't be described as best-case for OCaml.
I hope you take another look at the g++ and ocamlopt compiler options
and report updated measurements.
I made a mistake here: ocamlopt was already using -inline 10 on 64-bit.
> Just to provide some more data, under 32-bit Win XP:
>
> ocamlopt
> Spectral-norm 34.19s
> N-body 16.80s
With -inline 10 under 32-bit WinXP I get:
ocamlopt
Spectral-norm 14.66s
Lest we forget - do those measurements for spectral-norm on 64-bit
Linux and 32-bit Win XP demonstrate that 64-bit Linux is not always
the best-case setup for OCaml on your hardware either?
Isaac Gouy wrote:
> On Sep 8, 6:59 pm, Jon Harrop <j...@ffconsultancy.com> wrote:
>> Jon Harrop wrote:
>>> Jon Harrop wrote:
>>>>                ocamlopt  g++ -O3  F#
>>>> Spectral-norm  14.94s    9.34s    9.37s
>>>> N-body         9.23s     8.21s    6.87s
{stuff deleted}
>> With -inline 10 under 32-bit WinXP I get:
>>
>> ocamlopt
>> Spectral-norm 14.66s
>> N-body 16.80s
>
>
> Lest we forget - do those measurements for spectral-norm on 64-bit
> Linux and 32-bit Win XP demonstrate that 64-bit Linux is not always
> the best-case setup for OCaml on your hardware either?
I would say the difference between spectral norm on 32/64 bit is likely
to be within experimental error, and it is clearly the case that n-body
does better on 64 bit. If anything I would say it confirms Jon's
assumptions.
The difference of 1.9% is well within experimental error. For example,
running the program again on my Linux box after pausing some other programs
I get 14.461s, which is 1.4% faster than Win XP again.
Also, you can improve performance significantly by adding -unsafe:
64-bit Linux: 9.818s
Win XP: 11.921s
So I would say there is still no evidence that OCaml programs running on
32-bit Windows can be faster than on 64-bit Linux for numerically-intensive
programs like these.
Jon Harrop wrote:
> Isaac Gouy wrote:
>> Lest we forget - do those measurements for spectral-norm on 64-bit
>> Linux and 32-bit Win XP demonstrate that 64-bit Linux is not always
>> the best-case setup for OCaml on your hardware either?
>
> The difference of 1.9% is well within experimental error. For example,
> running the program again on my Linux box after pausing some other programs
> I get 14.461s, which is 1.4% faster than Win XP again.
>
> Also, you can improve performance significantly by adding -unsafe:
>
> 64-bit Linux: 9.818s
> Win XP: 11.921s
>
> So I would say there is still no evidence that OCaml programs running on
> 32-bit Windows can be faster than on 64-bit Linux for numerically-intensive
> programs like these.
I'm a bit confused now: Do we have a complete table of times with
hardware, software versions, OS, compilation options and finally time
somewhere?
Regards -- Markus
No, but we have more data, and the data does not disprove any of Jon's
assumptions. At this point, I think the burden is on those disbelieving
Jon's numbers/conclusions to do some experiments.
Honey, this was not about doubting Jon's results, but rather that I had
lost the overview (as my text said, BTW). It was also a request to tabulate
all results together for convenience.
You're seeing the whole thing too much as a fight. Me, I think Jon
wants to communicate something and I'm just happy to give feedback
when his message didn't arrive at my side, but at the same time I'm
rather interested in Jon's message. Whether he is right or wrong
remains to be seen, and has nothing to do with the fact that I really
don't know any more who tested what under which circumstances last
week. Not being a sucker for performance myself, this question is
completely secondary to me, so I'm too lazy to go back through all
those umpteen posts to pick out all the data and corrections and
counter-corrections.
Regards -- Markus
What are you actually reporting - fastest of 3 or 5 or ... consecutive
timings? mean of 3 or 5 or ... ? Elapsed time? usr time? usr+sys
time?
(Maybe "pausing some other programs" would also reduce your Win XP
timings!)
> Also, you can improve performance significantly by adding -unsafe:
>
> 64-bit Linux: 9.818s
> Win XP: 11.921s
The puzzle is why the previous g++ and ocamlopt measurements were
described as "best case" when we can improve performance so
significantly.
Can we help the compiler some more:
- does -march=athlon64-sse3 help?
- does -noassert help?
Which of Jon's numbers would you like us to believe:
- the numbers that show ocamlopt spectralnorm on 64-bit Linux was 56%
faster than on 32-bit Win XP, or the ones that show it was 1.9%
slower, or the ones that show it was 17% faster?
- the numbers that show F# on Win 32 was 37% faster than "best case"
ocamlopt spectral norm on 64-bit Linux, or the ones that show it was
4.5% faster?
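For anyone trying to follow where these percentages come from, here is a small sketch. It assumes "X% faster" means the time saved relative to the slower run, (slow - fast) / slow, which reproduces the figures quoted above to within rounding or truncation. The labels are my reading of which timings from the thread are being compared, not something stated in the posts.

```ocaml
(* Percentage by which the faster run beats the slower one, relative to
   the slower time. This convention is an assumption; it matches the
   figures quoted in the thread to within rounding. *)
let percent_faster ~fast ~slow = (slow -. fast) /. slow *. 100.

let () =
  List.iter
    (fun (label, fast, slow) ->
       Printf.printf "%-44s %5.1f%%\n" label (percent_faster ~fast ~slow))
    [ "ocamlopt: Linux vs XP (XP without -inline)", 14.94, 34.19;
      "ocamlopt: XP vs Linux (both -inline 10)", 14.66, 14.94;
      "ocamlopt: Linux vs XP (with -unsafe)", 9.818, 11.921;
      "F# vs ocamlopt on Linux (original post)", 9.37, 14.94;
      "F# vs ocamlopt on Linux (with -unsafe)", 9.37, 9.818 ]
```

Note that the same pair of timings gives a different percentage if the difference is divided by the faster time instead, which is one more reason to state the convention along with the numbers.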
There is no need to: my original post summarized everything.
The table is far from complete, of course. You still have dozens of
platforms and architectures to examine, a huge number of permutations of
compiler options, compiler versions, OS versions and so forth. However, all
of the important information was neatly summarized in my original post.
CPU time. Best of three.
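A minimal sketch of what "CPU time, best of three" could look like when measured from inside an OCaml program. How the timings in this thread were actually taken is not stated, so the in-process approach, the helper names cpu_time and best_of, and the stand-in workload are all assumptions for illustration.

```ocaml
(* Processor time, as reported by Sys.time, consumed by one call of f. *)
let cpu_time f =
  let t0 = Sys.time () in
  f ();
  Sys.time () -. t0

(* Smallest CPU time over [runs] consecutive calls of [f]. *)
let best_of runs f =
  let rec loop k best =
    if k = 0 then best else loop (k - 1) (min best (cpu_time f))
  in
  loop runs infinity

let () =
  (* Hypothetical workload standing in for a benchmark's main function. *)
  let work () =
    let s = ref 0. in
    for i = 1 to 1_000_000 do
      s := !s +. sqrt (float_of_int i)
    done;
    ignore !s
  in
  Printf.printf "best of 3: %.3fs CPU\n" (best_of 3 work)
```

Taking the minimum rather than the mean is the usual reason for "best of N": it discards runs that were disturbed by other activity on the machine, though it says nothing about how variable the measurements are.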
> (Maybe "pausing some other programs" would also reduce your Win XP
> timings!)
The Win XP box was already unladen.
>> Also, you can improve performance significantly by adding -unsafe:
>>
>> 64-bit Linux: 9.818s
>> Win XP: 11.921s
>
> The puzzle is why the previous g++ and ocamlopt measurements were
> described as "best case" when we can improve performance so
> significantly.
My original assumption was that AMD64 Linux is the best case system for
numerical OCaml code. All we have done since is find more evidence that
corroborates that assumption.
Of course, you may now demand that I back up my assumption by providing
benchmark results for OCaml and F#/Mono running under Solaris...
> Can we help the compiler some more:
> - does -march=athlon64-sse3 help?
> - does -noassert help?
Neither of those options will have any effect on this program.
Maybe you're saying that it was a mistake to do timing measurements on
a busy Linux box but you didn't make that mistake on the Win XP box?
(Are those different computers?)
It seems that when you say "well within experimental error" it's just
a nice sounding phrase - it doesn't actually tell us about the
variability of measurements made under the same conditions - someone
could come along and say the 4.5% difference between F# and ocamlopt
spectral-norm is "well within experimental error" without much fear of
factual contradiction.
> >> Also, you can improve performance significantly by adding -unsafe:
>
> >> 64-bit Linux: 9.818s
> >> Win XP: 11.921s
>
> > The puzzle is why the previous g++ and ocamlopt measurements were
> > described as "best case" when we can improve performance so
> > significantly.
>
> My original assumption was that AMD64 Linux is the best case system for
> numerical OCaml code. All we have done since is find more evidence that
> corroborates that assumption.
Maybe you're saying that when you wrote "these are bast-case results
for OCaml and C++" what you meant was that these are the best-case
results when compiler options which would improve performance
significantly are ignored?
> Of course, you may now demand that I back up my assumption by providing
> benchmark results for OCaml and F#/Mono running under Solaris...
No thanks. Squeezing blood out of this stone is tedious enough.
> > Can we help the compiler some more:
> > - does -march=athlon64-sse3 help?
> > - does -noassert help?
>
> Neither of those options will have any effect on this program.
Is that just a prediction or something you already tried (after you
paused some other programs) ?
It seems like you've told us:
- the Linux timings given in your original post were made on a busy
machine, and can be improved simply by pausing some programs while you
do the timing measurements
- the Linux timings given in your original post can be improved
significantly just by setting a compiler option
It seems like you've told us the important information neatly
summarized in your original post is wrong.
That will always be true. You could improve performance further by tweaking
the Linux kernel.
> - the Linux timings given in your original post can be improved
> significantly just by setting a compiler option
Or optimizing the program properly, yes. This has no effect on my original
statements, as we have shown.
> It seems like you've told us the important information neatly
> summarized in your original post is wrong.
On the contrary, you have completely failed to undermine my assumptions or
conclusions.
Yes it will always be true that doing timing measurements on a machine
busy with other programs is daft!
Implying that the difficulty of timing on an unloaded machine is in
any way comparable to tweaking the Linux kernel is transparent
hyperbole.
> > - the Linux timings given in your original post can be improved
> > significantly just by setting a compiler option
>
> Or optimizing the program properly, yes. This has no effect on my original
> statements, as we have shown.
The original statement gave measurements showing F# spectralnorm was
37% faster than ocamlopt spectralnorm, and now we know that drops to
4.5% when expert OCaml programmers (like yourself) start to use their
knowledge of ocamlopt compiler options - that's a quite substantial
"no effect".
> > It seems like you've told us the important information neatly
> > summarized in your original post is wrong.
>
> On the contrary, you have completely failed to undermine my assumptions or
> conclusions.
I have completely failed to undermine your lack of curiosity.
I think that's sad - a genuine effort to get the best from the g++ and
ocamlopt compilers would be a much stronger basis for F# advocacy.
You could do a proper statistical test for significance from many
measurements if a case arose where the result was not obvious, as it is
here.
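As a rough illustration of what such a test might involve, the sketch below computes Welch's t statistic for two sets of repeated timings. The choice of Welch's test is an assumption (no particular test is named above), and the sample timings are made up for illustration; they are NOT measurements from this thread.

```ocaml
let mean xs = List.fold_left (+.) 0. xs /. float_of_int (List.length xs)

(* Unbiased sample variance (divides by n - 1). *)
let variance xs =
  let m = mean xs in
  let ss = List.fold_left (fun acc x -> acc +. (x -. m) ** 2.) 0. xs in
  ss /. float_of_int (List.length xs - 1)

(* Welch's t statistic for two independent samples. *)
let welch_t a b =
  let na = float_of_int (List.length a)
  and nb = float_of_int (List.length b) in
  (mean a -. mean b) /. sqrt (variance a /. na +. variance b /. nb)

let () =
  (* Made-up timings in seconds, for illustration only. *)
  let setup_a = [10.0; 10.2; 9.9; 10.1]
  and setup_b = [10.4; 10.6; 10.3; 10.5] in
  Printf.printf "mean a = %.3f, mean b = %.3f, t = %.2f\n"
    (mean setup_a) (mean setup_b) (welch_t setup_a setup_b)
```

A large absolute value of t, compared against the t distribution with the appropriate degrees of freedom, suggests the difference in means is unlikely to be noise; with only three runs per setup the test has little power either way.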
>> > The puzzle is why the previous g++ and ocamlopt measurements were
>> > described as "best case" when we can improve performance so
>> > significantly.
>>
>> My original assumption was that AMD64 Linux is the best case system for
>> numerical OCaml code. All we have done since is find more evidence that
>> corroborates that assumption.
>
> Maybe you're saying that when you wrote "these are bast-case results
> for OCaml and C++" what you meant was that these are the best-case
> results when compiler options which would improve performance
> significantly are ignored?
In OCaml, you should remove bounds checks explicitly at specific points
rather than globally altering the language semantics via a compiler flag.
If you're going to do that then you might as well optimize the program
properly but then you're on the slippery slope that most of the shootout
benchmarks are trivially reducible.
For example, just writing the inner loops idiomatically in OCaml brings the
time down to 12.00s:
  let eval_A_times_u u v =
    let n = Array.length v - 1 in
    for i = 0 to n do
      let vi = ref 0. in
      for j = 0 to n do
        vi := !vi +. eval_A i j *. u.(j)
      done;
      v.(i) <- !vi
    done
Manually hoisting the bounds checks brings the time down to 9.44s:
  let eval_A_times_u u v =
    let n = Array.length u - 1 in
    for i = 0 to n do
      let vi = ref 0. in
      for j = 0 to n do
        vi := !vi +. eval_A i j *. Array.unsafe_get u j
      done;
      v.(i) <- !vi
    done
If you go down this route then you must add another dimension to your
results matrix with an infinite variety of different programs on it...
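For readers who want to try the two variants side by side, here is a self-contained sketch. The definition of eval_A is not shown in this thread; the one below is the usual formula from the spectral-norm benchmark and should be treated as an assumption, as should the function names with the _checked and _unchecked suffixes. The unchecked read is safe here because j only ranges over 0 .. Array.length u - 1.

```ocaml
(* Assumed definition: entry (i, j) of the benchmark's infinite matrix A. *)
let eval_A i j = 1. /. float_of_int ((i + j) * (i + j + 1) / 2 + i + 1)

(* Bounds-checked version: every u.(j) is checked at run time. *)
let eval_A_times_u_checked u v =
  let n = Array.length v - 1 in
  for i = 0 to n do
    let vi = ref 0. in
    for j = 0 to n do
      vi := !vi +. eval_A i j *. u.(j)
    done;
    v.(i) <- !vi
  done

(* Unchecked read of u: n is derived from u's length, so j stays in
   range. The write to v.(i) is still bounds-checked. *)
let eval_A_times_u_unchecked u v =
  let n = Array.length u - 1 in
  for i = 0 to n do
    let vi = ref 0. in
    for j = 0 to n do
      vi := !vi +. eval_A i j *. Array.unsafe_get u j
    done;
    v.(i) <- !vi
  done

let () =
  let n = 100 in
  let u = Array.make n 1. in
  let v1 = Array.make n 0. and v2 = Array.make n 0. in
  eval_A_times_u_checked u v1;
  eval_A_times_u_unchecked u v2;
  (* Same operations in the same order, so the results are identical. *)
  assert (v1 = v2);
  Printf.printf "v.(0) = %.6f\n" v1.(0)
```

Unlike the -unsafe flag, which removes bounds checks from the whole program, this removes only the one check in the inner loop whose index range is evident from the loop bounds.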
>> > Can we help the compiler some more:
>> > - does -march=athlon64-sse3 help?
>> > - does -noassert help?
>>
>> Neither of those options will have any effect on this program.
>
> Is that just a prediction or something you already tried (after you
> paused some other programs) ?
The former option affects C code compiled with ocamlc/opt (there is none)
and the latter removes assertions (there are none). If you would like to
learn more about the ocamlopt compiler options then please read the manual.
The nice thing about "obvious" is that we get to discard what we wish
to discard and keep what we wish to keep.
-snip-
> > Maybe you're saying that when you wrote "these are bast-case results
> > for OCaml and C++" what you meant was that these are the best-case
> > results when compiler options which would improve performance
> > significantly are ignored?
>
> In OCaml, you should remove bounds checks explicitly at specific points
> rather than globally altering the language semantics via a compiler flag.
So why do you suppose that compiler flag exists?
> >> > Can we help the compiler some more:
> >> > - does -march=athlon64-sse3 help?
> >> > - does -noassert help?
>
> >> Neither of those options will have any effect on this program.
>
> > Is that just a prediction or something you already tried (after you
> > paused some other programs) ?
>
> The former option affects C code compiled with ocamlc/opt (there is none)
> and the latter removes assertions (there are none). If you would like to
> learn more about the ocamlopt compiler options then please read the manual.
-march=athlon64-sse3: did you forget that you timed C++ programs?
If you're willing to compare different programs and put effort into
optimizing the OCaml but not the F#, yes.
>> > It seems like you've told us the important information neatly
>> > summarized in your original post is wrong.
>>
>> On the contrary, you have completely failed to undermine my assumptions
>> or conclusions.
>
> I have completely failed to undermine your lack of curiosity.
You are observing deliberate choices, not a lack of curiosity.
> I think that's sad - a genuine effort to get the best from the g++ and
> ocamlopt compilers would be a much stronger basis for F# advocacy.
The -unsafe option alters the semantics of the language and does not reflect
typical use. If you want to avoid bounds checks then do it properly and do
it in both languages.
> Isaac Gouy wrote:
>> It seems like you've told us:
>>
>> - the Linux timings given in your original post were made on a busy
>> machine, and can be improved simply by pausing some programs while you
>> do the timing measurements
>
> That will always be true. You could improve performance further by tweaking
> the Linux kernel.
>
>> - the Linux timings given in your original post can be improved
>> significantly just by setting a compiler option
>
> Or optimizing the program properly, yes. This has no effect on my original
> statements, as we have shown.
>
>> It seems like you've told us the important information neatly
>> summarized in your original post is wrong.
>
> On the contrary, you have completely failed to undermine my assumptions or
> conclusions.
Jon, my confusion was honest. Your original post doesn't mention
software versions (I'd even suggest that you say which Windows version
you use ...) and your details of hardware configuration were scarce.
This is (from my side) not about trying to undermine your assumptions
but about getting complete data, and if possible, not scattered in
bits over a long thread. But don't go through too much effort on my
account: As I said, I'm interested, but performance is not a primary
consideration with me, so I can live w/o if it is too much trouble.
My impression is that you, too, are seeing this too much as a fight instead
of an opportunity to produce and exchange knowledge.
Regards -- Markus
> On Sep 9, 6:58 pm, Jon Harrop <j...@ffconsultancy.com> wrote:
> -snip-
>> > (Maybe "pausing some other programs" would also reduce your Win XP
>> > timings!)
>>
>> The Win XP box was already unladen.
>
> Maybe you're saying that it was a mistake to do timing measurements on
> a busy Linux box but you didn't make that mistake on the Win XP box?
> (Are those different computers?)
As I understood the original post, they aren't, but it would be good
to see that stated explicitly.
> It seems that when you say "well within experimental error" it's just
> a nice sounding phrase - it doesn't actually tell us about the
> variability of measurements made under the same conditions - someone
> could come along and say the 4.5% difference between F# and ocamlopt
> spectral-norm is "well within experimental error" without much fear of
> factual contradiction.
It probably is. :-).
- M
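Whether a gap between two timings is "within experimental error" can only be judged against the spread of repeated runs made under the same conditions. The following is a rough sketch of such a check, assuming several timings per program are available. The function names are hypothetical, the sample numbers are made-up placeholders rather than measurements from this thread, and the comparison is a crude screen, not a proper significance test.

```ocaml
(* Arithmetic mean of a non-empty list of timings. *)
let mean xs =
  match xs with
  | [] -> invalid_arg "mean: no samples"
  | _ -> List.fold_left ( +. ) 0.0 xs /. float_of_int (List.length xs)

(* Sample standard deviation (n - 1 in the denominator). *)
let stddev xs =
  let n = List.length xs in
  if n < 2 then invalid_arg "stddev: need at least two samples";
  let m = mean xs in
  let ss = List.fold_left (fun acc x -> acc +. ((x -. m) ** 2.0)) 0.0 xs in
  sqrt (ss /. float_of_int (n - 1))

(* Gap between the two means, as a fraction of the smaller mean. *)
let relative_difference xs ys =
  let mx = mean xs and my = mean ys in
  abs_float (mx -. my) /. min mx my

(* True when the gap between the means exceeds the larger of the two
   run-to-run standard deviations. *)
let difference_exceeds_noise xs ys =
  abs_float (mean xs -. mean ys) > max (stddev xs) (stddev ys)

let () =
  (* Made-up placeholder timings in seconds, not data from this thread. *)
  let a = [ 10.0; 10.2; 9.8 ] and b = [ 10.4; 10.6; 10.5 ] in
  Printf.printf "difference %.1f%%, spread %.1f%% and %.1f%%, exceeds noise: %b\n"
    (100.0 *. relative_difference a b)
    (100.0 *. stddev a /. mean a)
    (100.0 *. stddev b /. mean b)
    (difference_exceeds_noise a b)
```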
> Markus E L wrote:
>> I'm a bit confused now: Do we have a complete table of times with
>> hardware, software versions, OS, compilation options and finally time
>> somewhere?
>
> The table is far from complete, of course. You still have dozens of
> platforms and architectures to examine, a huge number of permutations of
> compiler options, compiler versions, OS versions and so forth. However, all
> of the important information was neatly summarized in my original post.
Correction: Do we have a complete and self contained table of the
measurements you've taken so far?
Years ago I had the dubious joy of supervising student lab work. It
was similarly tedious to get them to submit complete reports (giving
a sketch of theory used as a basis including complete measurements and
results calculated from that including non obvious intermediate
results). Two of the most recurring themes: "But this is already in
the lab instructions" and "but there is no other way to do it anyway,
-- or to do it with this equipment -- why document it".
I have to admit I'm reminded of that. I always told them that
it's them that want to make a (scientific) case, not me who wants to
know the numbers. Sometimes it even helped.
And next time anybody submits benchmarking numbers here, I'd kindly
ask to see this as a piece of lab work / measurement, so the need
arises to discuss
(a) the state of affairs from which one starts (previous work, state
of the art / scientific knowledge),
(b) the motivation ("... so we'd want to know ..."),
(c) the method ("therefore we ..., this works because, ... and
applying foobar averaging during analysis ... unfolding the
baz-master artefacts ... gives us a good measure of ..."),
(d) the setup ("we are using a double refracted bit stream
frobnicator in the wiring given by Foomaster[1989] ..."),
(e) the numbers ("we've been measuring three times for each of the
..., the raw results are given in table 1"),
(f) the analysis ("applying the transformations (I) and (II) as
outlined in section (c) ... we get the values given in table 5
as foobar indicators ..."),
(g) _separate_ discussion ("... is in good agreement with the theory
... but ... contrary to the expectations outlined in (b) ...").
Of course some of the sections might be empty or very short (a
reference to an external source or another published paper). But the
general suggestion is to keep it self contained in a way that the
"casual" reader, who is nonetheless trained in the art, gets a
complete picture (even if not exhaustive at first glance).
BTW: This now has only remotely to do with Jon's submissions. But
coming back to them, Jon, I have the impression that you're mixing up
exhaustive and complete in your comments at the beginning of the
post. I did not ask for exhaustive measurements on all platforms. I
did ask for a complete tabulation of your results. And maybe it's
already somewhere: If so, my apologies.
Sorry for going on at such length here, but I suddenly had this
blinding flash of insight that benchmarking should be treated like any
other piece of lab work (and seldom is, even "professional"
benchmarking).
Regards -- Markus
What exactly do you mean by "complete" and "self-contained"?
Perhaps this part of the MICROSOFT .NET FRAMEWORK 2.0 EULA covers what
you'd like to see:
"(1) you must disclose all the information necessary for replication
of the tests, including complete and accurate details of your
benchmark testing methodology, the test scripts/cases, tuning
parameters applied, hardware and software platforms tested, the name
and version number of any third party testing tool used to conduct the
testing, and complete source code for the benchmark suite/harness that
is developed by or for you and used to test both the .NET Component
and the competing implementation(s)"
http://download.microsoft.com/download/4/7/c/47c23122-8647-4277-bec7-5164a6714f12/FX20.ENU.htm
> Markus E L wrote:
>> Jon Harrop wrote:
>>> Markus E L wrote:
>>>> I'm a bit confused now: Do we have a complete table of times with
>>>> hardware, software versions, OS, compilation options and finally time
>>>> somewhere?
>>>
>>> The table is far from complete, of course. You still have dozens of
>>> platforms and architectures to examine, a huge number of permutations of
>>> compiler options, compiler versions, OS versions and so forth. However,
>>> all of the important information was neatly summarized in my original
>>> post.
>>
>> Correction: Do we have a complete and self contained table of the
>> measurements you've taken so far?
>
> What exactly do you mean by "complete" and "self-contained"?
Something which can be used as the basis for discussion, has all
measurements (or the finally valid ones) in it and their context
(hardware, OS version(s), software versions). Single document, you
know. As I said: Something that could perhaps be seen as a lab report,
not something where I'd have to search through the thread to find out
what the current state of affairs is and patch it together with
descriptions and measurements from earlier posts.
You're only pretending to be dense now, aren't you? Really, really,
disappointing that is.
I'm not so much interested in performance measurements at the moment
that I need to play those games. Sorry. Just ignore my request.
Regards -- Markus
Perhaps it does. Though _I_ don't care about enforcing the MS EULA
with others: First it remains to be seen which parts of the EULA are
actually binding in which country and second it is my experience that
the MS hotline doesn't understand their own EULA either (though they
are terribly friendly).
I'm not interested in putting out a small fire with a big tank of
gasoline, thanks. As I already wrote to Jon: If it is all soooooo
difficult to put together a proper, comprehensive and self contained
report, then he can leave it. I'm not so fixated on performance anyway.
(I think, though, OCaml could use an all-in-one distribution with a
rich library on Windows).
Regards -- Markus
I see. You want enough information to be able to reproduce the results all
neatly collated. That's perfectly reasonable and I'll endeavour to do that
in the future.
In fact, I'll try to do that here. The hardware is:
AMD Athlon(tm) 64 X2 Dual Core Processor 4400+ 2.211GHz
2 GB RAM
Under Linux, each CPU is reported as:
cpu family : 15
model : 35
model name : AMD Athlon(tm) 64 X2 Dual Core Processor 4400+
stepping : 2
cpu MHz : 2211.376
cache size : 1024 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 2
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca
cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt lm
3dnowext 3dnow pni lahf_lm cmp_legacy
bogomips : 4424.58
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp
Benchmarking current code from the shootout with bounds checking, measuring
real time best of three:
         SpectralNorm   N-Body
OCaml    14.353s        9.469s
F#        9.374s        6.933s
The following compiler options were used because they gave the fastest
results:
SpectralNorm:
  ocamlopt -inline 10
  fsc -O3
N-Body:
  ocamlopt
  fsc -O3
The following systems were used for the two languages:
OCaml:
  x86_64 GNU/Linux 2.6.18 SMP
F#:
  Windows XP Pro 2002 SP2
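The "best of three" measurement described above could be scripted along the following lines. This is only a sketch under stated assumptions: best_of and workload are hypothetical names, the workload is a stand-in rather than one of the shootout programs, and the default clock is Sys.time (processor time) so that the example needs nothing beyond the standard library. To measure real time as in the table, one would pass Unix.gettimeofday as the clock and link the unix library.

```ocaml
(* Run [f] [runs] times and return the smallest elapsed time in seconds.
   [clock] defaults to Sys.time (processor time); pass Unix.gettimeofday
   and link the unix library to measure real time instead. *)
let best_of ?(clock = Sys.time) runs f =
  if runs < 1 then invalid_arg "best_of: runs must be positive";
  let time_once () =
    let start = clock () in
    ignore (f ());
    clock () -. start
  in
  let rec loop best remaining =
    if remaining = 0 then best
    else loop (min best (time_once ())) (remaining - 1)
  in
  loop (time_once ()) (runs - 1)

(* Hypothetical workload standing in for a benchmark program. *)
let workload () =
  let acc = ref 0.0 in
  for i = 1 to 1_000_000 do
    acc := !acc +. sqrt (float_of_int i)
  done;
  !acc

let () =
  Printf.printf "best of three: %.3fs\n" (best_of 3 workload)
```

Taking the minimum rather than the mean reduces the influence of other programs running on the machine, which is relevant to the earlier exchange about timing on a busy box.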
> You're only pretending to be dense now, aren't you? Really, really,
> disappointing that is.
Not at all, no. I had been assuming that by "complete" you
meant "benchmarked on every permutation of setup" which is, of course,
infeasible.
Yes.
> -snip-
>> > Maybe you're saying that when you wrote "these are best-case results
>> > for OCaml and C++" what you meant was that these are the best-case
>> > results when compiler options which would improve performance
>> > significantly are ignored?
>>
>> In OCaml, you should remove bounds checks explicitly at specific points
>> rather than globally altering the language semantics via a compiler flag.
>
> So why do you suppose that compiler flag exists?
Xavier would know.
>> >> > Can we help the compiler some more:
>> >> > - does -march=athlon64-sse3 help?
>> >> > - does -noassert help?
>>
>> >> Neither of those options will have any effect on this program.
>>
>> > Is that just a prediction or something you already tried (after you
>> > paused some other programs) ?
>>
>> The former option affects C code compiled with ocamlc/opt (there is none)
>> and the latter removes assertions (there are none). If you would like to
>> learn more about the ocamlopt compiler options then please read the
>> manual.
>
> -march=athlon64-sse3 Did you forget that you timed c++ programs?
If you're referring to C++, that compiler switch is not valid here:
$ g++ -O3 -march=athlon64-sse3 spectralnorm.cpp -o spectralnorm
spectralnorm.cpp:1: error: bad value (athlon64-sse3) for -march= switch
spectralnorm.cpp:1: error: bad value (athlon64-sse3) for -mtune= switch
I'm sure there are a plethora of valid switches I could try...