[Haskell-cafe] Can Haskell outperform C++?


Janek S.

May 6, 2012, 2:40:01 AM
to haskel...@haskell.org
Hi,

a couple of times I've encountered a statement that Haskell programs can have performance
comparable to programs in C/C++. I've even read that, thanks to the functional nature of Haskell,
the compiler can reason and make guarantees about the code and use that knowledge to automatically
parallelize the program without any explicit parallelizing commands in the code. I haven't seen
any sort of evidence that would support such claims. Can anyone provide code in Haskell that
performs better in terms of execution speed than a well-written C/C++ program? Both the Haskell and C
programs should implement the same algorithm (merge sort in Haskell outperforming bubble sort in
C doesn't count), though I guess that using Haskell-specific idioms and optimizations is of
course allowed.

Jan

_______________________________________________
Haskell-Cafe mailing list
Haskel...@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Ivan Lazar Miljenovic

May 6, 2012, 3:44:23 AM
to Janek S., haskel...@haskell.org
On 6 May 2012 16:40, Janek S. <freme...@poczta.onet.pl> wrote:
> Hi,
>
> a couple of times I've encountered a statement that Haskell programs can have performance
> comparable to programs in C/C++. I've even read that thanks to functional nature of Haskell,
> compiler can reason and make guarantess about the code and use that knowledge to automatically
> parallelize the program without any explicit parallelizing commands in the code. I haven't seen
> any sort of evidence that would support such claims. Can anyone provide a code in Haskell that
> performs better in terms of execution speed than a well-written C/C++ program? Both Haskell and C
> programs should implement the same algorithm (merge sort in Haskell outperforming bubble sort in
> C doesn't count), though I guess that using Haskell-specific idioms and optimizations is of
> course allowed.

How about http://donsbot.wordpress.com/2007/11/29/use-those-extra-cores-and-beat-c-today-parallel-haskell-redux/
?

--
Ivan Lazar Miljenovic
Ivan.Mi...@gmail.com
http://IvanMiljenovic.wordpress.com

Artur

May 6, 2012, 4:41:58 AM
to haskel...@haskell.org
On 06.05.2012 10:44, Ivan Lazar Miljenovic wrote:
> On 6 May 2012 16:40, Janek S.<freme...@poczta.onet.pl> wrote:
>> Hi,
>>
>> a couple of times I've encountered a statement that Haskell programs can have performance
>> comparable to programs in C/C++. I've even read that thanks to functional nature of Haskell,
>> compiler can reason and make guarantess about the code and use that knowledge to automatically
>> parallelize the program without any explicit parallelizing commands in the code. I haven't seen
>> any sort of evidence that would support such claims. Can anyone provide a code in Haskell that
>> performs better in terms of execution speed than a well-written C/C++ program? Both Haskell and C
>> programs should implement the same algorithm (merge sort in Haskell outperforming bubble sort in
>> C doesn't count), though I guess that using Haskell-specific idioms and optimizations is of
>> course allowed.
> How about http://donsbot.wordpress.com/2007/11/29/use-those-extra-cores-and-beat-c-today-parallel-haskell-redux/
> ?
>
Hi,

isn't it that this particular Haskell code is outperforming C (22 seconds
vs. 33) just because the author uses recursion in C? I surely love
Haskell, and the way its code is easily parallelized, but that example
seems unfair.
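
For readers who haven't followed the link, the Haskell side of that post has roughly this shape (a sketch from memory, not the exact benchmark code; types here are illustrative):

import Control.Parallel (par, pseq)

-- Plain doubly recursive fib, as in the post.
fib :: Integer -> Integer
fib n | n < 2     = 1
      | otherwise = fib (n - 1) + fib (n - 2)

-- Parallel version: spark one branch, evaluate the other locally.
parfib :: Integer -> Integer
parfib n
  | n < 2     = 1
  | otherwise = l `par` (r `pseq` l + r)
  where
    l = parfib (n - 1)
    r = parfib (n - 2)

Built with -threaded and run with +RTS -N, the two sparked branches can be evaluated on different cores.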

Yves Parès

May 6, 2012, 4:58:45 AM
to Artur, haskel...@haskell.org
I do not agree: the fib function is tail-recursive; any good C compiler is able to optimize away the calls and reduce it to a mere loop.
At least that's what I learnt about tail recursion in C with GCC.

2012/5/6 Artur <apek...@gmail.com>

Roman Cheplyaka

May 6, 2012, 5:09:39 AM
to Yves Parès, Artur, haskel...@haskell.org
It is not tail-recursive.

* Yves Parès <yves....@gmail.com> [2012-05-06 10:58:45+0200]
> I do not agree: the fib function is tail-recursive, any good C compiler is
> able to optimize away the calls and reduce it to a mere loop.
> At least that's what I learnt about tail recursion in C with GCC.
>
> 2012/5/6 Artur <apek...@gmail.com>
>
> > On 06.05.2012 10:44, Ivan Lazar Miljenovic wrote:
> >
> >> On 6 May 2012 16:40, Janek S.<freme...@poczta.onet.pl> wrote:
> >>
> >>> Hi,
> >>>
> >>> a couple of times I've encountered a statement that Haskell programs can
> >>> have performance
> >>> comparable to programs in C/C++. I've even read that thanks to
> >>> functional nature of Haskell,
> >>> compiler can reason and make guarantess about the code and use that
> >>> knowledge to automatically
> >>> parallelize the program without any explicit parallelizing commands in
> >>> the code. I haven't seen
> >>> any sort of evidence that would support such claims. Can anyone provide
> >>> a code in Haskell that
> >>> performs better in terms of execution speed than a well-written C/C++
> >>> program? Both Haskell and C
> >>> programs should implement the same algorithm (merge sort in Haskell
> >>> outperforming bubble sort in
> >>> C doesn't count), though I guess that using Haskell-specific idioms and
> >>> optimizations is of
> >>> course allowed.
> >>>
> >> How about http://donsbot.wordpress.com/2007/11/29/use-those-extra-cores-and-beat-c-today-parallel-haskell-redux/
> >> ?
> >>
> >> Hi,
> >
> > isn't it that particular Haskell code is outperforming C (22 seconds vs.
> > 33), just because the author uses recursion in C? I surely love Haskell,
> > and the way it's code is easy parallelized, but that example seams not fair.
> >



--
Roman I. Cheplyaka :: http://ro-che.info/

Yves Parès

May 6, 2012, 5:12:15 AM
to Roman Cheplyaka, Artur, haskel...@haskell.org
Sorry sorry sorry ^^
I read too fast and realized I was wrong before you sent this.

Okay, so then it would be nice if someone with higher skills told us where Haskell's recursive calls stand when compared to C's.
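
For concreteness, a sketch of the distinction being pointed out (illustrative code, not from the post): the benchmark's fib calls itself twice and then still has an addition to perform, so the recursive calls are not in tail position. A tail-recursive variant would thread accumulators instead, roughly:

fibIter :: Integer -> Integer
fibIter n = go n 1 1
  where
    -- The recursive call is the entire result of each step, so it can be
    -- compiled to a loop without growing the stack.
    go 0 a _ = a
    go k a b = go (k - 1) b (a + b)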

2012/5/6 Roman Cheplyaka <ro...@ro-che.info>

Roman Cheplyaka

May 6, 2012, 5:23:06 AM
to Artur, haskel...@haskell.org
* Artur <apek...@gmail.com> [2012-05-06 11:41:58+0300]

> isn't it that particular Haskell code is outperforming C (22 seconds
> vs. 33), just because the author uses recursion in C? I surely love
> Haskell, and the way it's code is easy parallelized, but that example
> seams not fair.

I think the point was to take two programs of similar code complexity
(or, rather, simplicity) implementing the same algorithm (modulo
parallelism).

So I'm not sure what exactly you're objecting to.

If you're saying that there are better algorithms to compute Fibonacci
numbers, it's not relevant — the important thing is that the two
programs are computing the same thing in the same way.

If you're saying that in C an explicit stack should have been used
instead of recursion, then it would increase the code complexity while
having non-obvious performance benefits.

In any case, assuming that on this particular task Haskell is x times
slower than C, taking sufficiently many cores and large enough N would
outweigh that.

Then again, I don't think that article was meant as a fair comparison
between Haskell and C on an average task. (The chosen example consists of
one loop which is easily parallelisable.) All it demonstrates is that
it is very easy to exploit parallelism in Haskell when there's an
opportunity.

--
Roman I. Cheplyaka :: http://ro-che.info/


Austin Seipp

May 6, 2012, 5:51:31 AM
to Roman Cheplyaka, Artur, Yves Parès, haskel...@haskell.org
In this case it doesn't matter; while it isn't technically tail
recursive, GCC is very capable of transforming it into a direct loop,
likely because it knows about the associative/commutative properties
of "+", so it's able to re-arrange the body as it sees fit since,
combined, both calls are in 'tail position.'

With GCC 4.6.3 at the -O3 optimization level on my Ubuntu 12.04 machine,
the example in the linked article executes in 9s (Intel Core i5-2520M
CPU @ 2.50GHz, dual core with 4 total logical cores) and the body of
fib does not contain any call instructions - in fact, it seems to not
only transform the body into a loop, but unroll a very good portion of
it, leading to very, very fast code.

Technically, if you want to be a pedant, the benchmarks aren't even
equivalent anyway; GCC is able to use the native 'long long' datatype
while GHC promotes the results of 'fib' to Integer, which is thus backed
by GMP and a lot of extra code. GMP is fast, but it's not as fast as a
native data type. It should use Int64. That change alone speeds up the
benchmark by approximately 30% for me using GHC 7.4.1 (~30s at -N4 vs
~19s at -N4).

I don't think the point of that post was to outwit the C compiler, but to
show how easy it is to add parallelism to a Haskell program
(ironically, pseq/par has proven quite difficult to leverage for nice
wall-clock speedups in practice, hence the advent of libraries like
strategies and, more recently, monad-par, which make speedups easier to
get.) I do think it's much easier to add parallelism to Haskell
programs - even if they are not as fast, it's far easier to write,
understand, and safer to do so. And ultimately all this code is very
old (5+ years) and not reflective of either toolchain anymore. GHC has
had immeasurable amounts of overhaul in its parallelism support as
well as the libraries supporting it, and both GHC and GCC have far
better optimisation capabilities.
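
As an illustration of the monad-par style mentioned above (a sketch, not Austin's code; the sequential cutoff of 25 is an arbitrary assumption):

import Control.Monad.Par (Par, runPar, spawn_, get)

fib :: Int -> Int
fib n | n < 2     = 1
      | otherwise = fib (n - 1) + fib (n - 2)

parfib :: Int -> Par Int
parfib n
  | n < 25    = return (fib n)              -- sequential cutoff for small inputs
  | otherwise = do
      xf <- spawn_ (parfib (n - 1))          -- evaluate one branch in parallel
      y  <- parfib (n - 2)
      x  <- get xf
      return (x + y)

main :: IO ()
main = print (runPar (parfib 40))
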
--
Regards,
Austin

Axel

May 6, 2012, 6:03:36 AM
to Roman Cheplyaka, haskel...@haskell.org
Well, I believe that comparison is about solving a task, not writing code in some particular way. I get your point, but I think that when comparing language speed, you should use the best techniques each one has to offer; it makes no sense otherwise.

If there was some kind of comparison made recently between Haskell and C++, I'd be really glad to read a report. I like Haskell much more than C++, and it'd be good to know how to write code in Haskell with speed equal or greater to that of C++ (at least in some typical functional language tasks, like math and data processing).

Jerzy Karczmarczuk

May 6, 2012, 6:09:39 AM
to haskel...@haskell.org
Roman Cheplyaka:
> If you're saying that in C an explicit stack should have been used
> instead of recursion, then it would increase the code complexity while
> having non-obvious performance benefits.
This is a fragment of a bigger message I won't comment on.

But THIS is a little dubious. You may accuse anything of anything with
such vacuous statements as "non-obvious performance benefits". If the
stack frame allocation policy is lousy (not because of incompetence of
the compiler writers, but because of its "universalism"), then the gain
may be quite visible. My favourite examples are the classical "flood
fill" algorithms for filling closed regions in computer graphics, and
also some models of percolation (finding paths in rather big graphs).
Everything depends on the language used, but I have seen acceleration
by a factor of 5 after replacing recursion with explicit stacks and
loops.
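
For illustration, the worklist idea looks roughly like this in Haskell (a sketch only; 'neighbours' is a hypothetical function, and the point above is really about C-like languages, where the explicit stack replaces costly call frames):

import Data.Set (Set)
import qualified Data.Set as Set

-- Worklist-driven fill: the explicit list plays the role of the call
-- stack, and the traversal itself is a plain loop.
fill :: Ord a => (a -> [a]) -> a -> Set a
fill neighbours start = go Set.empty [start]
  where
    go visited []       = visited
    go visited (x : xs)
      | x `Set.member` visited = go visited xs
      | otherwise              = go (Set.insert x visited) (neighbours x ++ xs)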

Jerzy Karczmarczuk

Ertugrul Söylemez

May 6, 2012, 7:40:59 AM
to haskel...@haskell.org
"Janek S." <freme...@poczta.onet.pl> wrote:

> a couple of times I've encountered a statement that Haskell programs
> can have performance comparable to programs in C/C++. I've even read
> that thanks to functional nature of Haskell, compiler can reason and
> make guarantess about the code and use that knowledge to automatically
> parallelize the program without any explicit parallelizing commands in
> the code. I haven't seen any sort of evidence that would support such
> claims. Can anyone provide a code in Haskell that performs better in
> terms of execution speed than a well-written C/C++ program? Both
> Haskell and C programs should implement the same algorithm (merge sort
> in Haskell outperforming bubble sort in C doesn't count), though I
> guess that using Haskell-specific idioms and optimizations is of
> course allowed.

While Haskell usually gets close to C/C++ speed for number crunching
algorithms it seldomly beats them. Of course the execution model gives
potential to do so, but currently GHC doesn't do enough low level
optimization. It is amazing at high level optimization so you often get
similar speeds for much less code that is close to the problem domain.
Combined with the fact that it's more difficult to write incorrect
programs in Haskell there is an overall save in time to get to the
result and that is amplified whenever you have to change the program
later.

Where Haskell really gets beyond C and C++ is in concurrency.
Multithreaded network servers written in Haskell are not only a lot
easier to write, but they also perform better than well written server
programs in C/C++. This is because Haskell's concurrency goes well with
its parallel execution model, where in machine code you don't really
have a notion of procedures. You have thunks and they can not only be
executed in parallel, but they can also be moved between threads freely.

Imagine that in C/C++ you would move a client to another thread in the
middle of a function. The Haskell run-time system does this all the
time to distribute computing time evenly between all CPUs. Of course
this preemptive execution model can be replicated in C/C++, but that is
an amazingly difficult task, so difficult that you would practically
write very verbose Haskell in C/C++.


Greets,
Ertugrul

--
nightmare = unsafePerformIO (getWrongWife >>= sex)
http://ertes.de/

Ertugrul Söylemez

May 6, 2012, 7:52:47 AM
to haskel...@haskell.org
Roman Cheplyaka <ro...@ro-che.info> wrote:

> > isn't it that particular Haskell code is outperforming C (22
> > seconds vs. 33), just because the author uses recursion in C? I
> > surely love Haskell, and the way it's code is easy parallelized, but
> > that example seams not fair.
>
> I think the point was to take two programs of similar code complexity
> (or, rather, simplicity) implementing the same algorithm (modulo
> parallelism).
>
> So I'm not sure what exactly you're objecting to.

I think while this is a valid argument it is not very convincing. You
have to acknowledge that C and C++ can give better performance at number
crunching. The gain is small enough that personally I wouldn't go
through the trouble of writing my algorithms in C/C++, but simple-minded
people often have a desire to get the best performance possible, in
which case you really want to use C, C++, Fortran or whatever high level
assembler language you like.

In other words: Writing Haskell in C is just as wrong as writing C in
Haskell. A comparison of algorithm /implementation styles/ does not
have much semantic content. It just shows that Haskell rocks at high
level optimization while C rocks at low level optimization. By the same
experiment you can see that Haskell is bad at low level optimization
while C is bad at high level optimization.

In order to compare code performance you have to implement the same
algorithm in the idiomatic style in both languages. This experiment
would show that Haskell, even though it gets close, is still slower than
C. It would however show that interpreted Python and Ruby are
considerably slower than both.

Bartosz Milewski

May 6, 2012, 4:15:34 PM
to Austin Seipp, Artur, haskel...@haskell.org, Yves Parès
An equivalent parallel version in C++11 would look something like this:

#include <future>

long long fib(long long n) {
    if (n < 2) {
        return 1;
    }
    // Evaluate fib(n-2) asynchronously while this thread computes fib(n-1).
    std::future<long long> r = std::async(fib, n - 2);
    long long l = fib(n - 1);
    return r.get() + l;
}

My bet though is that it would perform worse than the serial version
because of much higher overheads in starting threads in C++.

[:Bartosz:]

wren ng thornton

May 6, 2012, 7:32:25 PM
to haskel...@haskell.org
On 5/6/12 2:40 AM, Janek S. wrote:
> Hi,
>
> a couple of times I've encountered a statement that Haskell programs can have performance
> comparable to programs in C/C++. I've even read that thanks to functional nature of Haskell,
> compiler can reason and make guarantess about the code and use that knowledge to automatically
> parallelize the program without any explicit parallelizing commands in the code. I haven't seen
> any sort of evidence that would support such claims. Can anyone provide a code in Haskell that
> performs better in terms of execution speed than a well-written C/C++ program? Both Haskell and C
> programs should implement the same algorithm (merge sort in Haskell outperforming bubble sort in
> C doesn't count), though I guess that using Haskell-specific idioms and optimizations is of
> course allowed.

For a quantitative comparison in a very narrow domain, check out:

Duncan Coutts, Don Stewart, Roman Leshchinskiy (2007).
Rewriting Haskell Strings.
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.90.3166

Fundamentally, neither language can be faster than the other since it's
possible to hack around in assembly code with both of them. Therefore,
asking which is faster when implementing identical algorithms is not a
meaningful question. In reality, most people don't spend time
microoptimizing assembly code these days; instead, people implement
whatever seems good enough given the constraints on development time and
execution time. Thus, the real question should be: which algorithms are
made easy to implement by the language?--- either in terms of raw syntax
(hence maintainability), or in terms of man-hours worked (hence
development time).

The ease of parallelism in Haskell is a prime example of the fact that,
while all of it is technically possible to re-implement in C with
identical performance, Haskell is simply better because the parallel
code is already implemented and programs using it are statically checked
for type safety, which reduces the development time significantly. But
for the most part, that isn't a comparison of languages, it's a
comparison of libraries (for which the language of Haskell facilitates
making good libraries).

Once you approach the real question and try to quantify it, you're going
to run into the issues that arise any time you try to quantify human
populations. What works well for one project or company is not
necessarily going to generalize to a different set of developers; thus,
in order to ensure ecological validity in your comparison, you'll need
to compare lots of different kinds of groups. But of course, your
ability to do so is biased by the social environment; e.g., imperative
languages are more mainstream and therefore afford different developer
communities than functional languages do. Thus, not only can you not
come up with a simple metric that does justice to the question, but the
answer to the question is going to evolve and change over time ---even
if the underlying languages and compilers do not change--- because
society changes and evolves over time.

For as often as this question is asked, and for as much as it could be
interesting to actually get an answer, this isn't the sort of research
that computer scientists are (generally) trained to do. Which is why it
isn't generally done.

--
Live well,
~wren

Silvio Frischknecht

May 8, 2012, 1:01:17 AM
to haskel...@haskell.org

> Can anyone provide a code in Haskell that performs better in terms
> of execution speed than a well-written C/C++ program?

http://shootout.alioth.debian.org/ compares a lot of programming
languages on a lot of problems. Haskell is never as fast as C++, but
for some problems it comes close.

Also I challenge anyone to improve one of the Haskell programs there.
It'd be cool if we could make Haskell get a higher rank. I recently
managed to improve the Fasta program, but not by much. Also I think
the benchmarks don't use the LLVM flag. It says somewhere that they
don't measure LLVM because they don't have time, but I think they are
referring to Clang. So maybe someone should tell them to turn on the
LLVM flag, since it makes a lot of Haskell programs faster.

Silvio

Yves Parès

May 8, 2012, 6:18:28 AM
to Silvio Frischknecht, haskel...@haskell.org
One thing that baffles me is the comparison of Haskell vs. Java: http://shootout.alioth.debian.org/u32/benchmark.php?test=all&lang=ghc&lang2=java

I would have expected consistently shorter code and better performance on average.

2012/5/8 Silvio Frischknecht <silvio....@gmail.com>

Simon Marlow

May 8, 2012, 8:41:26 AM
to Janek S., haskel...@haskell.org
On 06/05/2012 07:40, Janek S. wrote:
> a couple of times I've encountered a statement that Haskell programs
> can have performance comparable to programs in C/C++. I've even read
> that thanks to functional nature of Haskell, compiler can reason and
> make guarantess about the code and use that knowledge to
> automatically parallelize the program without any explicit
> parallelizing commands in the code. I haven't seen any sort of
> evidence that would support such claims. Can anyone provide a code in
> Haskell that performs better in terms of execution speed than a
> well-written C/C++ program? Both Haskell and C programs should
> implement the same algorithm (merge sort in Haskell outperforming
> bubble sort in C doesn't count), though I guess that using
> Haskell-specific idioms and optimizations is of course allowed.

On the subject of parallelism in particular, there is no fully implicit
parallelism in Haskell at the moment. When people ask about this I
typically point to this paper:

http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.145.8183

which shows that it is possible to get some gains doing this, but it is
hard and the gains were not great.


However, what Haskell does give you is a guarantee that you can safely
call any function in parallel (with itself or with something else), as
long as the function does not have an IO type. This is about as close
to automatic parallelisation as you can practically get. Take any pure
function from a library that someone else wrote, and you can use it with
parMap or call it from multiple threads, and reasonably expect to get a
speedup. Furthermore with parMap you're guaranteed that the result will
be deterministic, and there are no race conditions or deadlocks to worry
about.
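
A minimal sketch of what this looks like in practice, using parMap from the parallel package's Control.Parallel.Strategies ('expensive' is just a stand-in for any pure library function):

import Control.Parallel.Strategies (parMap, rdeepseq)

-- A placeholder pure function; any pure function from any library would do.
expensive :: Int -> Integer
expensive n = sum (map fromIntegral [1 .. n * 100000])

main :: IO ()
main = print (sum (parMap rdeepseq expensive [1 .. 64]))

Compiled with -threaded and run with +RTS -N, the calls to 'expensive' can proceed on separate cores, and the result is the same as with a plain map.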

Cheers,
Simon

Daniël de Kok

May 8, 2012, 8:43:48 AM
to haskell-cafe
On May 8, 2012, at 12:18 PM, Yves Parès wrote:
> Would've expected always shorter code

It's not so surprising if you consider that some of the programs are practically imperative programs in Haskell. To give an example:

http://shootout.alioth.debian.org/u32/program.php?test=fannkuchredux&lang=ghc&id=3

-- Daniël

Isaac Gouy

May 8, 2012, 2:03:59 PM
to haskel...@haskell.org
2012/5/8 Silvio Frischknecht

> Also I challenge anyone to improve one of the Haskell programs there.
> It'd be cool if we could make Haskell get a higher rank. I recently
> managed to improve the Fasta program, but not by much. Also I think
> the benchmarks don't use the LLVM flag. It says somewhere that they
> don't measure LLVM because they don't have time, but I think they are
> referring to Clang. So maybe someone should tell them to turn on the
> LLVM flag, since it makes a lot of Haskell programs faster.


Several GHC versions have come and gone since the Haskell benchmarks game programs were written, so perhaps it is time that a dozen new programs were written to replace those old programs - new programs that take advantage of GHC 7.4.1 and -llvm.

Ryan Newton

May 10, 2012, 8:44:09 PM
to Ertugrul Söylemez, haskel...@haskell.org
> through the trouble of writing my algorithms in C/C++, but simple-minded
> people often have a desire to get the best performance possible, in
> which case you really want to use C, C++, Fortran or whatever high level
> assembler language you like.

I think this is a bit of an unfair accusation ("simple-minded").  Performance is only relevant to certain domains, sure.  But program performance is an entire *industry*.  And I'd argue it's of massive importance to the world at large.  Intel has an army of "AE"s (application engineers) that go out to other companies to make applications perform better.  There are plenty of compute bound industries -- i.e. movie companies are limited by what kind of frame they can render in ~24 hrs; sequencing centers are limited by certain very slow bioinformatics algorithms; you can look almost anywhere you like for examples.

As a community I think we have to face the fact that writing the hot inner loop of your application as idiomatic Haskell is not [yet] going to give you C/Fortran performance off the bat.  Though in some cases there's not really anything stopping us but more backend/codegen work (I'm thinking of arithmetically intensive loops with scalars only).  For example, the following Mandel kernel is in many ways the *same* as the C version:


We have the types; we've got strictness (for this loop); but the C version was 6X faster when I tested it.

  -Ryan

Manuel M T Chakravarty

May 10, 2012, 10:09:51 PM
to rrne...@gmail.com, Ertugrul Söylemez, haskel...@haskell.org
Ryan Newton:
As a community I think we have to face the fact that writing the hot inner loop of your application as idiomatic Haskell is not [yet] going to give you C/Fortran performance off the bat.  Though in some cases there's not really anything stopping us but more backend/codegen work (I'm thinking of arithmetically intensive loops with scalars only).  For example, the following Mandel kernel is in many ways the *same* as the C version:


We have the types; we've got strictness (for this loop); but the C version was 6X faster when I tested it.

Did you use the LLVM backend? I am asking because I have seen dramatic differences between the code generators in similar example. Have a look at


With GHC's native code generator, the C version is much faster than the Haskell version (by a factor of 2 or 3 IIRC), but using GHC's LLVM backend, the Haskell version is even a few percent faster than the C version on my machine. (This is on OS X using llvm-gcc to compile the C code — that is, the code generator for the C and Haskell version goes via LLVM.)

I think that is an interesting example, because it shows (a) just how much of a difference a good code generator can make and (b) that using the same code generator, a Haskell compiler has no inherent disadvantage to a C compiler. (Nevertheless, especially for small examples, it is much easier to write a slow Haskell than to write a slow C program.)

Manuel

Ertugrul Söylemez

May 10, 2012, 11:53:07 PM
to haskel...@haskell.org
Ryan Newton <rrne...@gmail.com> wrote:

> I think this is a bit of an unfair accusation ("simple-minded").
> Performance is only relevant to certain domains, sure. But program
> performance is an entire *industry*. And I'd argue it's of massive
> importance to the world at large. Intel has an army of "AE"s
> (application engineers) that go out to other companies to make
> applications perform better. There are plenty of compute bound
> industries -- i.e. movie companies are limited by what kind of frame
> they can render in ~24 hrs; sequencing centers are limited by certain
> very slow bioinformatics algorithms; you can look almost anywhere you
> like for examples.

I'm sorry for the rough words. I didn't mean to overgeneralize. Of
course in certain domains a reasonable performance is desirable, but the
first question you should ask is whether you want high level or low
level performance. Haskell performs amazingly at the high level and
suboptimal at the low level.

My point is: If you need C-like performance at a certain spot there is
really no excuse for writing the entire application in C. Haskell has a
working, powerful enough FFI. Also idiomatic Haskell code nowadays
performs close to C. If your code doesn't, chances are that it's not
even idiomatic. Sorting a list is easy and beautiful in code. But it's
far from idiomatic. Using ST with destructive update is also not
idiomatic. One of my performance masterpieces is the "instinct" AI
library (see Hackage). It uses only immutable vectors and performs very
heavy Double calculations, yet performs better than the same code with
mutable arrays did.

With a few years of Haskell experience in my backpack I know how to
utilize laziness to get amazing performance for code that most people
would feel must be written with destructively updating loop. And the
resulting code is just beautiful to read and watch at work, a great
demonstration of the wonders the GHC developers have created.


> As a community I think we have to face the fact that writing the hot
> inner loop of your application as idiomatic Haskell is not [yet] going
> to give you C/Fortran performance off the bat. Though in some cases
> there's not really anything stopping us but more backend/codegen work
> (I'm thinking of arithmetically intensive loops with scalars only).
> For example, the following Mandel kernel is in many ways the *same* as
> the C version:
>
> https://github.com/simonmar/monad-par/blob/662fa05b2839c8a0a6473dc490ead8dd519ddd1b/examples/src/mandel.hs#L24H
>
> We have the types; we've got strictness (for this loop); but the C
> version was 6X faster when I tested it.

Well, if it's "in many ways the same as C", then again it's probably not
idiomatic Haskell. I don't know what a Mandel kernel is, so I can't
really tell you how I would write the code without further study.
However, I'm doing a lot of number crunching and my code usually gets
really close to C, and especially for Integer-heavy calculations
actually outperforms C. I have done the comparison. Using GMP with all
the features it provides in the mpz_* family of functions is in fact
slower than the same in Haskell (GHC 7.4.1 on Intel i5 64 bits here),
and from my C times I have experience using GMP properly including smart
allocation, etc.

If you want high performance Haskell, writing C in Haskell is /not/ the
solution. It /will/ result in suboptimal code, likely considerably
slower than the same code in C.


Greets,
Ertugrul

--
Key-ID: E5DD8D11 "Ertugrul Soeylemez <e...@ertes.de>"
FPrint: BD28 3E3F BE63 BADD 4157 9134 D56A 37FA E5DD 8D11
Keysrv: hkp://subkeys.pgp.net/

Gregg Lebovitz

May 11, 2012, 8:41:12 AM
to haskel...@haskell.org
This is an outstanding discussion. I am learning a lot from it.

I would find it useful to pull all this information together into a single document that discusses all the performance issues in one place and shares the real-life experience in dealing with each issue. I see this as a best-practices paper rather than a research document.

Does such a document exist? If not, I am willing to try and start one.

Ryan writes:

With a few years of Haskell experience in my backpack I know how to
utilize laziness to get amazing performance for code that most people
would feel must be written with destructively updating loop.


I have heard similar comments from other experienced Haskell users about how to get better performance from their code by understanding the details of applying laziness. I would like to try and capture this.

Are there blogs, wiki postings, and list postings that provide information about particular issues?

Gregg

Chris Wong

May 12, 2012, 1:02:07 AM
to Gregg Lebovitz, haskel...@haskell.org
On Sat, May 12, 2012 at 12:41 AM, Gregg Lebovitz <gleb...@gmail.com> wrote:
> I would find it useful to pull all this information together into a single
> document that discusses all the performance issues in one place and shares
> the real life experience is dealing with each issue. I see this as a best
> practice paper rather than a research document.
>
> Does such a document exist? If not, I am willing try and start one.

http://www.haskell.org/haskellwiki/Performance

;)

Gregg Lebovitz

May 12, 2012, 11:20:00 AM
to Chris Wong, haskel...@haskell.org
Chris,

Thank you for indulging me. I will eventually get used to the idea that
there is a wiki page for almost every topic :-)

Gregg

Ertugrul Söylemez

May 13, 2012, 12:55:17 AM
to haskel...@haskell.org
Gregg Lebovitz <gleb...@gmail.com> wrote:

> Ryan writes:
>
> With a few years of Haskell experience in my backpack I know how to
> utilize laziness to get amazing performance for code that most people
> would feel must be written with destructively updating loop.

That was me actually. Just a minor side note that some people may
consider a nitpick, but I'd like to keep being the author of my own
statements. Thanks.

Dmitry Vyal

May 14, 2012, 4:03:23 AM
to Ertugrul Söylemez, Haskell-Cafe
On 05/11/2012 07:53 AM, Ertugrul Söylemez wrote:
> My point is: If you need C-like performance at a certain spot there is
> really no excuse for writing the entire application in C. Haskell has
> a working, powerful enough FFI. Also idiomatic Haskell code nowadays
> performs close to C. If your code doesn't, chances are that it's not
> even idiomatic. Sorting a list is easy and beautiful in code. But it's
> far from idiomatic. Using ST with destructive update is also not
> idiomatic. One of my performance masterpieces is the "instinct" AI
> library (see Hackage). It uses only immutable vectors and performs
> very heavy Double calculations, yet performs better than the same code
> with mutable arrays did. With a few years of Haskell experience in my
> backpack I know how to utilize laziness to get amazing performance for
> code that most people would feel must be written with destructively
> updating loop. And the resulting code is just beautiful to read and
> watch at work, a great demonstration of the wonders the GHC developers
> have created.
Hello Ertugrul,

Out of curiosity, did you compare the performance of Instinct with
implementations in languages associated with numerical computations,
like Octave?

Ertugrul Söylemez

May 14, 2012, 5:27:28 AM
to haskel...@haskell.org
Dmitry Vyal <aka...@gmail.com> wrote:

> > My point is: If you need C-like performance at a certain spot there
> > is really no excuse for writing the entire application in C.
> > Haskell has a working, powerful enough FFI. Also idiomatic Haskell
> > code nowadays performs close to C. If your code doesn't, chances are
> > that it's not even idiomatic. Sorting a list is easy and beautiful
> > in code. But it's far from idiomatic. Using ST with destructive
> > update is also not idiomatic. One of my performance masterpieces is
> > the "instinct" AI library (see Hackage). It uses only immutable
> > vectors and performs very heavy Double calculations, yet performs
> > better than the same code with mutable arrays did. With a few years
> > of Haskell experience in my backpack I know how to utilize laziness
> > to get amazing performance for code that most people would feel must
> > be written with destructively updating loop. And the resulting code
> > is just beautiful to read and watch at work, a great demonstration
> > of the wonders the GHC developers have created.
>
> Out of the curios, did you compare the performance of Instinct with
> implementations in languages, associated with numerical computations,
> like Octave?

No, sorry.

Ryan Newton

May 14, 2012, 10:26:35 AM
to Ertugrul Söylemez, Manuel Chakravarty, haskel...@haskell.org
Well, if it's "in many ways the same as C", then again it's probably not
idiomatic Haskell.  

It's just a recursive equation for Mandelbrot fractals.  I should have been more precise: I didn't mean that the source is literally the same as the C source (i.e. there's no for loop, no mutable variables), rather that it presents the compiler backend with the same advantages as a C compiler would have with the equivalent loop.  That is, there's no control flow obfuscation (dictionaries, calls to unknown targets), no problems with laziness, and the data representations are completely known.


{-# LANGUAGE BangPatterns #-}
import Data.Complex (Complex, magnitude)

mandel :: Int -> Complex Double -> Int
mandel max_depth c = loop 0 0
  where
   loop i !z
    | i == max_depth     = i
    | magnitude z >= 2.0 = i
    | otherwise          = loop (i+1) (z*z + c)

It's also a loop that is readily recognized as a loop.  Now, to my knowledge, GHC doesn't explicitly recognize loops in any stage of the compiler (so as to perform loop optimizations), something which other functional compilers sometimes do.  

But, anyway, it turns out that my example above is easily transformed from a bad GHC performance story into a good one.  If you'll bear with me, I'll show how below.
   First, Manuel makes a good point about the LLVM backend.  My "6X" anecdote was from a while ago and I didn't use llvm [1].  I redid it just now with 7.4.1+LLVM, results below.  (The below table should read correctly in fixed width font, but you can also see the data in the spreadsheet here.)

                   Time (ms)   Compiled File size   Compile+Runtime (ms)
GHC 7.4.1 O0       2444        1241K
GHC 7.4.1 O2       925         1132K                1561
GHC 7.4.1 O2 llvm  931         1133K
GHC 7.0.4 O2 via-C 684         974K

So LLVM didn't help [1].  And in fact the deprecated via-C backend did the best!  Compare with GCC: 

G++ O0             300         9K                   531
G++ O3             110         7K                   347
G++ O3 recursive   116         9K

Uh oh, the "6X" gap I mentioned is now closer to 9X.  And, in fact, performing a mini "language shootout" on the above code, reveals that GHC is doing worse than not only OCaml, but Chez Scheme, in spite of dynamic type checks, and a necessarily boxed representation of complex numbers:

Chez Scheme 8.4    284         2.7K (not standalone) 372
OCaml              166         180K                 301

At least Python does worse!

Python 2.6         1973        NA                   1973

So here's the catch:  If you check the Core and STG GHC 7.4 is actually compiling the above loop very well.  This microbenchmark turns into just a "magnitude" microbenchmark.  The implementation of Data.Complex uses an EXTREMELY expensive method to avoid overflow [2].

Since OCaml did well above, I took a look at their standard library's implementation, on line 51 here.  They use a nice little math trick (the extra division) that is also mentioned on Wikipedia.  By implementing the same trick in Haskell, replacing the standard "magnitude" function, we get the following results.

GHC 7.4.1, no overflow avoidance          39         1127K               674
GHC 7.4.1, OCaml-style overflow avoidance 74         1127K

Wow, now not only is the Haskell version faster than OCaml/Scheme, but it is 48% faster than C++, which is appropriate to the title of this email!  Of course, this probably means that C++'s abs is also doing something overly expensive for overflow avoidance (or that their representation of complex numbers is not fully unpacked and stored in registers).  I haven't tracked it down yet.
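
For concreteness, the scaling trick being described looks roughly like this (an illustrative sketch with a hypothetical name, not necessarily the patch that was submitted):

-- Scale by the larger component so the squares cannot overflow; this
-- avoids the expensive scaleFloat/exponent dance in Data.Complex.
magnitude' :: Double -> Double -> Double
magnitude' re im
  | r == 0    = i
  | i == 0    = r
  | r >= i    = let q = i / r in r * sqrt (1 + q * q)
  | otherwise = let q = r / i in i * sqrt (1 + q * q)
  where
    r = abs re
    i = abs im

Called on the real and imaginary parts of z, it gives essentially the same result as the library magnitude for ordinary inputs while skipping the costly overflow machinery.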

But in any case, I'll be submitting a library patch.  The moral, I think, is that community members could do a great deal to help "Haskell" performance by simply microbenchmarking and optimizing library routines in base!

Cheers,
  -Ryan

P.S. You can check out the above benchmarks from here: https://github.com/rrnewton/MandelMicrobench

[1] P.P.S. Most concerning to me about Haskell/C++ comparisons are David Peixotto's findings that LLVM optimizations are not very effective on Haskell-generated LLVM compared with typical clang-generated LLVM.

[2]  P.P.P.S. It turns out there was already a ticket (http://hackage.haskell.org/trac/ghc/ticket/2450) regarding magnitude's performance.  But it still has bad performance even after a small refactoring was performed.

Manuel M T Chakravarty

May 14, 2012, 8:55:22 PM
to rrne...@gmail.com, Ertugrul Söylemez, haskel...@haskell.org
Ryan Newton:
But, anyway, it turns out that my example above is easily transformed from a bad GHC performance story into a good one.  If you'll bear with me, I'll show how below.
   First, Manuel makes a good point about the LLVM backend.  My "6X" anecdote was from a while ago and I didn't use llvm [1].  I redid it just now with 7.4.1+LLVM, results below.  (The below table should read correctly in fixed width font, but you can also see the data in the spreadsheet here.)

                   Time (ms)   Compiled File size   Compile+Runtime (ms)
GHC 7.4.1 O0       2444        1241K
GHC 7.4.1 O2       925         1132K                1561
GHC 7.4.1 O2 llvm  931         1133K
GHC 7.0.4 O2 via-C 684         974K

So LLVM didn't help [1].  And in fact the deprecated via-C backend did the best!  

I would classify that as a bug.


[1] P.P.S. Most concerning to me about Haskell/C++ comparisons are David Peixotto's findings that LLVM optimizations are not very effective on Haskell-generated LLVM compared with typical clang-generated LLVM.

There is some work underway to improve the situation, but I am sure more could be done.

Manuel

Yves Parès

May 15, 2012, 5:59:50 AM
to Chris Wong, haskel...@haskell.org
Yet this resource seems a little outdated:
http://www.haskell.org/haskellwiki/Performance/Strings does not speak about Text,
and http://www.haskell.org/haskellwiki/Performance/Arrays does not mention Vector.
And what's the status of the Streams IO approach detailed in http://www.haskell.org/haskellwiki/Performance/IO ? Has it evolved into enumerators/conduits, or was that an unrelated approach?

2012/5/12 Chris Wong <chrisyco+h...@gmail.com>

wren ng thornton

May 16, 2012, 6:54:08 AM
to haskel...@haskell.org
On 5/10/12 8:44 PM, Ryan Newton wrote:
>> through the trouble of writing my algorithms in C/C++, but simple-minded
>> people often have a desire to get the best performance possible, in
>> which case you really want to use C, C++, Fortran or whatever high level
>> assembler language you like.
>
> I think this is a bit of an unfair accusation ("simple-minded").

Well, yes and no.

On the one hand, characterizing those who desire the best performance
possible as "simple-minded" is, at best, a gross over-generalization.
Like you, I work in a field where optimization is king (e.g., in machine
translation, program runtimes are measured in days).

But on the other hand, there are quite a lot of people who focus
excessively on "optimization" when the actual differences for the code
they're writing are either negligible (e.g., because of I/O bottlenecks)
or uninteresting (e.g., a 2x slowdown is a nuisance rather than a
crisis). For those who focus on optimization when the benefits are
negligible or uninteresting, I think it's fair to characterize that
focus as "simple-minded". This focus seems especially common when people
start talking about "which language is faster"--- as opposed to, say,
which library is faster, or which implementation of a given algorithm is
faster. In some cases the question of language speed is legitimate, but
IME it's far more often used to prop up prejudices about which language
is "better"--- i.e., which group of human beings (defined by their
choice of programming language) is "better".


I agree that, as a community, we need to come to a realistic
understanding of the performance characteristics of Haskell compared to
C etc. I don't like prejudice and propaganda for Haskell any more than I
like prejudice and propaganda for C etc. In some cases it's true that
Haskell out-performs C (or C++, or Java); but in many cases it does not,
and we need to acknowledge that. The real problem, I feel, is that it's
not always clear which category any given program falls into. In some
cases the "idiomatic" Haskell is the fast one, in some cases its the
slow one; but in all cases, what is meant by "idiomatic Haskell" is a
moving target with poorly specified meaning.

The advent of ByteStrings was a major coup in figuring out exactly where
and why Haskell is slow and how to fix it. Iteratees are another one,
though the details are still being worked out. However, things like
strictness annotations on (non-accumulator) function arguments, the
choice whether to use ST or not, the choice whether to CPS-transform or
not, etc, are still very much a black art. Even with a thorough
understanding of the STG-machine and the GHC compilation pipeline, it's
not always clear whether they will help or hurt. ST in particular is
especially finicky, to the point where I wonder if it's ever a win
(outside of in-place updates on arrays).

So while I think most language performance comparisons are dubious to
begin with, I don't think the comparison of Haskell to C++ (or C, or
Java,...) can even be construed as something with a meaningful answer.
Haskell is just too different, and its in-the-large cost model is too
poorly understood. I'm not even aware of any helpful generalizations
over comparing two specific programs. Even when restricting to things
like parsing or compute-intensive numeric code, the comparison comes
back with mixed answers.

--
Live well,
~wren

Yves Parès

May 16, 2012, 7:43:30 AM
to wren ng thornton, haskel...@haskell.org
> On the one hand, characterizing those who desire the best performance possible as "simple-minded" is, at best, a gross over-generalization. Like you, I work in a field where optimization is king (e.g., in machine translation, program runtimes are measured in days).

You misread the logical implication, Ertugrul said:

> simple-minded people often have a desire to get the best performance possible

Which means:
A is simple-minded ==> A will (most probably) want to get the best perf.

You interpreted it as:
A wants to get the best perf. ==> A is simple-minded

I agree with you, the second implication is wrong, but that's not what was meant.

(Moral: if people used more logical operators and fewer words, we would understand each other better.)

2012/5/16 wren ng thornton <wr...@freegeek.org>

Gregg Lebovitz

May 16, 2012, 10:40:26 AM
to haskel...@haskell.org
Wren,

I see at least three different issues being discussed here. I think it
is important to delineate them:

1) Do Haskell and its libraries need performance improvements?
Probably yes. Some of the performance issues seem to be related to the
way the language is implemented and others to how it is defined.
Developers really do run into performance issues with Haskell and either
learn to work around the issue or try to fix the offending
implementation. The wiki performance page gives insight into some of the
performance issues and how to address them.

2) Are language performance comparisons useful? They can be, if the
comparison focuses on what each language does best. They aren't
useful if they focus on benchmarks that aren't relevant to the target
domain of each language. I think the problem with current comparisons
is that they are designed to favor imperative languages.

3) Are the negative perceptions generated by popular performance
comparisons important? Only if you care about the broader adoption of
the language. If you don't care, then the point is moot. If you do care,
then yes, the comparisons are unfair and simple-minded, but they do have
an effect on the perception of the language. It is not uncommon for
management to raise roadblocks to the introduction of new technologies
due to flawed reports that were published in popular magazines.

If Haskell were a product and I were responsible for its success, then I
would be using common marketing techniques to combat the negative
perceptions. One is a technique called "changing the playing field",
which means producing documents that reset the criteria for the
evaluations and comparisons. This is not "deception", but merely making
sure that potential users understand where and how to make the proper
comparisons, or, more importantly, that my "champion" is well armed with
documentation to counter-argue popular but flawed comparisons.

On 5/16/2012 6:54 AM, wren ng thornton wrote:
> Haskell is just too different, and its in-the-large cost model is too
> poorly understood. I'm not even aware of any helpful generalizations
> over comparing two specific programs. Even when restricting to things
> like parsing or compute-intensive numeric code, the comparison comes
> back with mixed answers.

Isaac Gouy

May 16, 2012, 12:59:54 PM
to haskel...@haskell.org
Wed May 16 16:40:26 CEST 2012, Gregg Lebovitz wrote:

> 2) ... I think the problem with current comparisons is that they are
> designed to favor imperative languages.


Please be specific:

- Which current comparisons?

- How do you know what they are designed to favor?

Kevin Charter

May 16, 2012, 1:44:31 PM
to haskell-cafe
On Wed, May 16, 2012 at 8:40 AM, Gregg Lebovitz <gleb...@gmail.com> wrote:
1) Does Haskell and its libraries need performance improvements?  Probably yes. Some of the performance issues seem to be related to the way the language is implemented and others by how it is defined. Developers really do run into performance issues with Haskell and either learn to work around the issue or try to fix the offending implementation. The wiki performance page gives insight into some of the performance issues and how address them.

I think there is a closely-related issue: that you'll need to learn how to debug subtle performance problems early in your Haskell programming career. In particular

  • you will have space leaks and related performance problems in naively-written programs
  • you will consequently need to learn how to use strictness annotations and other related language features, and how to use the profiler to determine where they're needed
For example, imagine you're new to the language, and as an exercise decide to write a program that counts the characters on standard input and writes the count to standard output. A naive program in, say, Python will probably use constant space and be fairly fast. A naive program in Haskell stands a good chance of having a space leak, building a long chain of thunks that isn't forced until it needs to write the final answer.  On small inputs, you won't notice. The nasty surprise comes when your co-worker says "cool, let's run it on this 100 MB log file!" and your program dies a horrible death. If your friend is a sceptic, she'll arch an eyebrow and secretly think your program -- and Haskell -- are a bit lame.
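
A minimal sketch of that scenario (illustrative code, not from the original exercise): a lazy left fold builds a chain of thunks proportional to the input, while forcing the accumulator keeps the count in constant space.

import Data.List (foldl')

-- Naive, leaky version: foldl (\n _ -> n + 1) 0 accumulates unevaluated
-- (+1) thunks for the whole input before 'show' finally forces them.
-- Strict version below: foldl' forces the running count at every step.
main :: IO ()
main = interact (\s -> show (foldl' (\n _ -> n + 1) (0 :: Int) s) ++ "\n")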

The example is contrived, but it's a very common task to consume some big stream of input and produce a much smaller summary of it. I think it's fair to say that you have to be a slightly sophisticated Haskell programmer to do those kinds of things correctly, at least compared to more mainstream languages.

My experience is that this is perhaps the most important  'problem' with Haskell performance. Not so much that it's typically two or three or ten times slower than language X, but that it's easy to have a bad experience early on, one that seems inconsistent with the language shoot-out and other performance comparisons.

Kevin
-- 
Kevin Charter
kevin....@acm.org

Gregg Lebovitz

May 16, 2012, 3:02:25 PM
to haskel...@haskell.org
Isaac,

I was looking at the debian coding contest benchmarks referenced by
others in this discussion. All of the benchmark algorithms appear to
be short, computationally intensive programs with a fairly low level of
abstraction.

In almost all examples, the requirement says: you must implement the X
functions as implemented in Java, or C#, or C++. The k-nucleotide benchmark
even specifies a requirement to use and update a hash table.

I wonder if someone here could come up with a set of benchmarks that
would outperform C++.

Interesting that you would focus on this one comment in my post and not
on my point about countering the benchmarks with new criteria for
comparing languages.

Gregg

Gregg Lebovitz

May 16, 2012, 3:17:59 PM
to kevin....@acm.org, Kevin Charter, haskell-cafe
Kevin,

Interesting point.

Over the past few weeks, as part of my work, I have interviewed a large number of Haskell developers from many different industries and have been hearing the same points you are making: space leaks that were addressed by learning how to use laziness, and performance issues cleared up by using strictness.

Also interesting is that in all my interviews, GHC performance was never raised. No one said "I have to drop into C to solve that performance problem."

Gregg

Bardur Arantsson

May 16, 2012, 3:57:40 PM
to haskel...@haskell.org
On 05/16/2012 09:02 PM, Gregg Lebovitz wrote:
> Isaac,
>
> I was looking at the debian coding contest benchmarks referenced by
> others in this discussion. All of the benchmarks algorithms, appear to
> be short computationally intensive programs with a fairly low level of
> abstraction.
>
> In almost all examples, the requirement says: you must implement the X
> functions as implemented in Java, or C#, or C++. The k-nucleotide even
> specifies a requirement to use an update a hash-table.
>
> I wonder if someone here could come up with a set of benchmarks that
> would out perform C++.
>

That's easy:

> let ones = 1 : ones
> take 5 ones
[1,1,1,1,1]

I'm not sure how much C++ code you'd have to write to produce the
correct answer without butchering the intent too much, but the naïve
translation to C++ loops infinitely. Obviously Haskell is infinitely
better than C++!11111oneone!

> Interesting that you would focus on this one comment in my post and not
> comment on one on countering the benchmarks with a new criteria for
> comparing languages.

Comparing languages is a highly non-trivial matter involving various
disciplines (including various squidgy ones) and rarely makes sense
without a very specific context for comparison.

So the short answer is: mu. Discovering the long answer requires a
lifetime or more of research and may not actually result in an answer.

Regards,

Kevin Charter

May 16, 2012, 4:00:26 PM
to Gregg Lebovitz, haskell-cafe
On Wed, May 16, 2012 at 1:17 PM, Gregg Lebovitz <gleb...@gmail.com> wrote:
Also interesting is that in all my interviews, GHC performance was never raised. No one said "I have to drop into C to solve that performance problem."

That's been my experience too. I've so far been able to get acceptable performance with GHC when I've made good use of the tools for finding space and time leaks. And it hasn't been hard, just something I had to learn to do.
 

Gregg

Kevin

--
Kevin Charter
kevin....@acm.org

Ben Gamari

May 16, 2012, 4:00:25 PM
to kevin....@acm.org, haskell-cafe
Kevin Charter <kcha...@gmail.com> writes:

snip
> For example, imagine you're new to the language, and as an exercise decide
> to write a program that counts the characters on standard input and writes
> the count to standard output. A naive program in, say, Python will probably
> use constant space and be fairly fast. A naive program in Haskell stands a
> good chance of having a space leak, building a long chain of thunks that
> isn't forced until it needs to write the final answer. On small inputs,
> you won't notice. The nasty surprise comes when your co-worker says "cool,
> let's run it on this 100 MB log file!" and your program dies a horrible
> death. If your friend is a sceptic, she'll arch an eyebrow and secretly
> think your program -- and Haskell -- are a bit lame.
>
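A minimal sketch of the two versions Kevin alludes to (hypothetical code, not from his message): the naive left fold piles up one thunk per input character, while Data.List.foldl' forces the running count as it goes:

    import Data.List (foldl')

    -- Leaky: foldl defers every increment, building a thunk chain as long as
    -- the input (most visible without -O, or with a less obvious accumulator).
    countLazy :: String -> Int
    countLazy = foldl (\n _ -> n + 1) 0

    -- Constant space: foldl' forces the accumulator at every step.
    countStrict :: String -> Int
    countStrict = foldl' (\n _ -> n + 1) 0

    main :: IO ()
    main = getContents >>= print . countStrict
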
I, for one, can say that my initial impressions of Haskell were quite
scarred by exactly this issue. Being in experimental physics, I often
find myself writing data analysis code doing relatively simple
manipulations on large(ish) data sets. My first attempt at tackling a
problem in Haskell took nearly three days to get running with reasonable
performance. I can only thank the wonderful folks in #haskell profusely
for all of their help getting through that period. That being said, it
was quite frustrating at the time and I often wondered how I could
tackle a reasonably large project if I couldn't solve even a simple
problem with halfway decent performance. If it weren't for #haskell, I
probably would have given up on Haskell right then and there, much to my
detriment.

While things have gotten easier, even now, nearly a year after writing
my first line of Haskell, I still occasionally find a performance
issue^H^H^H^H surprise. I'm honestly not really sure what technical
measures could be taken to ease this learning curve. The amazing
community helps quite a bit but I feel that this may not be a
sustainable approach if Haskell gains greater acceptance. The profiler
is certainly useful (and much better with GHC 7.4), but there are still
cases where the performance hit incurred by the profiling
instrumentation precludes this route of investigation without time
consuming guessing at how to pare down my test case. It's certainly not
an easy problem.

Cheers,

- Ben

Gregg Lebovitz

unread,
May 16, 2012, 4:37:48 PM5/16/12
to haskel...@haskell.org
Ben,

This is precisely the kind of problem I am currently thinking about and
why I asked for pointers to documents on best practices. The current
model for gaining the necessary experience to use Haskell effectively
does not scale well.

Here are some ideas on how to address the knowledge issue:

1) Outstanding best practices documents that capture this knowledge and
provide useful answers. Organizing this information in an online
document that can be searched by keyword or index would be really
helpful. The hard part will be maintaining it. As someone already
pointed out the wiki entry on performance is already dated.

2) Training presentations derived from #1 that provide lots of examples.

3) Online Training Videos based on #2 above

4) Profiling tools that use the captured knowledge above, highlight
problem areas, and provide guidance on correcting them.

As I mentioned in a previous email, I am willing to take on organizing
#1, but I will need help from people on this list. I can start with the
performance wiki, but we will still need examples of where people get
into trouble and the ways around them.

Gregg

Yves Parès

unread,
May 16, 2012, 4:45:51 PM5/16/12
to Ben Gamari, haskell-cafe
> The profiler is certainly useful (and much better with GHC 7.4)

What are the improvements in that matter? (I just noticed that some GHC flags wrt profiling have been renamed)

2012/5/16 Ben Gamari <bgamar...@gmail.com>

Ben Gamari

unread,
May 16, 2012, 5:18:35 PM5/16/12
to Yves Parès, haskell-cafe
Yves Parès <yves....@gmail.com> writes:

>> The profiler is certainly useful (and much better with GHC 7.4)
>
> What are the improvements in that matter? (I just noticed that some GHC
> flags wrt profiling have been renamed)
>

The executive summary can be found in the release notes[1]. There was
also a talk I remember watching a while ago which gave a pretty nice
overview. I can't recall, but it might have been this[2]. Lastly,
profiling now works with multiple capabilities[3].

Cheers,

- Ben


[1] http://www.haskell.org/ghc/docs/7.4.1/html/users_guide/release-7-4-1.html
[2] http://www.youtube.com/watch?v=QBFtnkb2Erg
[3] https://plus.google.com/107890464054636586545/posts/hdJAVufhKrD

Gregg Lebovitz

unread,
May 16, 2012, 5:23:30 PM5/16/12
to haskel...@haskell.org

On 5/16/2012 3:57 PM, Bardur Arantsson wrote:
> Comparing languages is a highly non-trivial matter involving various
> disciplines (including various squidgy ones) and rarely makes sense
> without a very specific context for comparison. So the short answer
> is: mu. Discovering the long answer requires a lifetime or more of
> research and may not actually result in an answer.
Depends on your goal.

From the discussion on the cafe, it appears that many believe
performance is not the only criterion for evaluating languages. I agree
with the comment made by Ertugrul Söylemez. Some people focus on
getting the best performance because they don't know any better. If
given better evaluation criteria they might think differently.

Asserting the other criteria publicly helps others understand the
benefits of Haskell; otherwise they just see the lower performance
numbers from places like the debian contest.

Isaac Gouy

unread,
May 16, 2012, 6:59:01 PM5/16/12
to haskel...@haskell.org
> From: Gregg Lebovitz <gleb...@gmail.com>

> Sent: Wednesday, May 16, 2012 12:02 PM


> I was looking at the debian coding contest benchmarks referenced by others in
> this discussion.

"debian coding contest" ?

It's been called many things but, until now, not that.



> All of the benchmark algorithms appear to be short,
> computationally intensive programs with a fairly low level of abstraction.

Tiny tiny toy programs - more or less 100 lines - not quite small enough to be microbenchmarks. Why might that be?

Well, not IO bound. Do people usually mean IO performance when they compare programming languages?

(I guess you must have excluded meteor-contest from your consideration. It's a coding contest and so doesn't specify an algorithm.)



> In almost all examples, the requirement says: you must implement the X functions
> as implemented in Java, or C#, or C++.

binary-trees - No, it doesn't say that.
thread-ring - No.
chameneos-redux - No.
pidigits - No.
regex-dna - No.
k-nucleotide - No.
mandelbrot - No.
reverse-complement - No.

spectral-norm - "Each program must implement 4 separate functions / procedures / methods like the C# program." (The point being cross function calling so don't amalgamate 4 functions into a single function.)

fasta-redux - No.
fasta - No.
meteor-contest - No.

fannkuch-redux - "defined by programs in Performing Lisp Analysis of the FANNKUCH Benchmark"

n-body - "Each program should model the orbits of Jovian planets, using the same simple symplectic-integrator - see the Java program."


> The k-nucleotide even specifies a requirement to use and update a hash-table.

Probably not too onerous a requirement for a test of hash table lookup :-)

Some people like hash tables, go figure.

http://gregorycollins.net/posts/2011/06/11/announcing-hashtables



-snip-
> Interesting that you would focus on this one comment in my post and not comment
> on one on countering the benchmarks with a new criteria for comparing languages.

Too obvious to be interesting.

Interesting that you haven't said how you know they are "designed to favor imperative languages" ;-)

Richard O'Keefe

unread,
May 16, 2012, 9:13:33 PM5/16/12
to Gregg Lebovitz, haskel...@haskell.org
In a lecture today I presented (for quite other reasons) a simple combinatorial enumeration problem where the difference between two algorithms was far larger than any plausible difference between programming languages. If a programming language makes it easier to explore high level alternative algorithms, you may very easily end up with faster programs in a "slower" language. (Sorry about the very long line: broken return key.)

Gregg Lebovitz

unread,
May 16, 2012, 10:04:47 PM5/16/12
to Richard O'Keefe, haskel...@haskell.org
Richard,

Thank you. This is an example of what I had in mind when I talked about
changing the playing field. Do you have a slide deck for this lecture
that you would be willing to share with me? I am very interested in
learning more.

Gregg

Richard O'Keefe

unread,
May 17, 2012, 1:11:42 AM5/17/12
to Haskell Cafe
On 17/05/2012, at 2:04 PM, Gregg Lebovitz wrote:

> Richard,
>
> Thank you. This is an example of what I had in mind when I talked about changing the playing field. Do you have a slide deck for this lecture that you would be willing to share with me? I am very interested in learning more.


No slide deck required. The task is "generating alternating permutations".

Method 1: generate permutations using a backtracking search;
when a permutation is generated, check if it is alternating.

Method 2: use the same backtracking search, but only allow extensions
that preserve the property of being an alternating permutation.

For n=12 the second method is 94 times faster than the first, and the
ratio increases with growing n. At the time I wrote the program I had not
heard of Bauslaugh and Ruskey's constant-average-time algorithm for generating
alternating permutations. Experimentally the Method 2 backtracking search
appears to take constant time per solution anyway.

n (time ms): En n!

1 ( 0.0): 1 1 <- method 1
1 ( 0.0): 1 <- method 2
2 ( 0.0): 1 2
2 ( 0.0): 1
3 ( 0.0): 2 6
3 ( 0.0): 2
4 ( 0.0): 5 24
4 ( 0.0): 5
5 ( 0.0): 16 120
5 ( 0.0): 16
6 ( 0.1): 61 720
6 ( 0.0): 61
7 ( 0.6): 272 5040
7 ( 0.1): 272
8 ( 4.4): 1385 40320
8 ( 0.3): 1385
9 ( 35.2): 7936 362880
9 ( 1.4): 7936
10 ( 359.7): 50521 3628800
10 ( 9.3): 50521
11 ( 4035.7): 353792 39916800
11 ( 67.0): 353792
12 (49584.6): 2702765 479001600 <- method 1
12 ( 528.1): 2702765 <- method 2

Those times are in C; SWI Prolog (which compiles to an abstract instruction
set that is then interpreted by portable C) was about 24 times slower.
A factor of 94 comfortably exceeds a factor of 24...
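For readers who want to experiment, here is a minimal Haskell sketch of the two methods (illustrative only; the timings above are from the C and Prolog programs, not from this code):

    import Data.List (delete, permutations)

    -- A permutation p1,p2,... is (up-down) alternating if p1 < p2 > p3 < p4 > ...
    isAlternating :: [Int] -> Bool
    isAlternating xs =
      and [ cmp a b | (cmp, a, b) <- zip3 (cycle [(<), (>)]) xs (tail xs) ]

    -- Method 1: generate every permutation, then test.
    method1 :: Int -> [[Int]]
    method1 n = filter isAlternating (permutations [1 .. n])

    -- Method 2: backtracking search that only extends prefixes which are
    -- still alternating, so dead branches are pruned immediately.
    method2 :: Int -> [[Int]]
    method2 n = go 1 [] [1 .. n]
      where
        go _ prefix []   = [reverse prefix]
        go k prefix rest =
          [ p | x <- rest, ok k prefix x
              , p <- go (k + 1) (x : prefix) (delete x rest) ]
        ok _ []      _ = True            -- first position: anything goes
        ok k (p : _) x                   -- p is the element at position k-1
          | even k    = p < x            -- comparisons rise into even positions
          | otherwise = p > x            -- and fall into odd ones

    main :: IO ()
    main = print (map (length . method2) [1 .. 8])  -- [1,1,2,5,16,61,272,1385]
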

Roman Werpachowski

unread,
May 17, 2012, 2:12:18 AM5/17/12
to haskel...@haskell.org
> Date: Wed, 16 May 2012 06:54:08 -0400
> From: wren ng thornton <wr...@freegeek.org>
> Subject: Re: [Haskell-cafe] Can Haskell outperform C++?
> To: haskel...@haskell.org
> Message-ID: <4FB38750...@freegeek.org>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed

> But on the other hand, there are quite a lot of people who focus
> excessively on "optimization" when the actual differences for the code
> they're writing are either negligible (e.g., because of I/O bottlenecks)
> or uninteresting (e.g., a 2x slowdown is a nuisance rather than a
> crisis).

It depends on what you use the code for. If I run an overnight report
for the trading book of my bank in 16 hours, it is
not acceptable and is a disaster. If I run it in 8h, it's OK-ish. In
business settings, you often have strict deadlines and
optimizing the code to be 2x faster can make a huge difference, as you
change from missing the deadline to
meeting a deadline.

Regards,
RW

Colin Adams

unread,
May 17, 2012, 3:44:15 AM5/17/12
to Haskell Cafe
Oops. Forgot to reply to all.

---------- Forwarded message ----------
From: Colin Adams <colinpa...@gmail.com>
Date: 17 May 2012 08:43
Subject: Re: [Haskell-cafe] Can Haskell outperform C++?
To: Roman Werpachowski <roman.wer...@gmail.com>





On 17 May 2012 07:12, Roman Werpachowski <roman.wer...@gmail.com> wrote:

It depends on what you use the code for. If I run an overnight report
for the trading book of my bank in 16 hours, it is
not acceptable and is a disaster. If I run it in 8h, it's OK-ish. In
business settings, you often have strict deadlines and
optimizing the code to be 2x faster can make a huge difference, as you
change from missing the deadline to
meeting a deadline.


Seconded.
I remember once being asked to examine a certain batch program for performance improvements. I spotted an obvious inefficiency, and made a recommendation, which was implemented. It saved something like 10 - 30 minutes on the overnight run (which took hours), so I was disappointed with the result. But the client was delighted, as the run now fitted into the batch window.

Roman Werpachowski

unread,
May 17, 2012, 6:07:37 AM5/17/12
to haskel...@haskell.org
> From: "Richard O'Keefe" <o...@cs.otago.ac.nz>
> Subject: Re: [Haskell-cafe] Can Haskell outperform C++?
> To: Haskell Cafe <haskel...@haskell.org>
> Message-ID: <5F6605A2-DFE0-4AEA...@cs.otago.ac.nz>
> Content-Type: text/plain; charset=us-ascii
>
> On 17/05/2012, at 2:04 PM, Gregg Lebovitz wrote:
>
>> Richard,
>>
>> Thank you. This is an example of what I had in mind when I talked about changing the playing field. Do you have a slide deck for this lecture that you would be willing to share with me? I am very interested in learning more.
>
>
> No slide deck required.  The task is "generating alternating permutations".
>
> Method 1: generate permutations using a backtracking search;
>          when a permutation is generated, check if it is alternating.
>
> Method 2: use the same backtracking search, but only allow extensions
>          that preserve the property of being an alternating permutation.

Gregg,

what makes Method 2 so much harder than Method 1 to implement in C or C++?

Gregg Lebovitz

unread,
May 17, 2012, 8:24:46 AM5/17/12
to haskel...@haskell.org
Roman,

I think this question is for Richard. I haven't had a chance to play
with these methods. I will try to do that today.

Gregg

Gregg Lebovitz

unread,
May 17, 2012, 8:50:58 AM5/17/12
to haskel...@haskell.org
Isaac,

I see your point. Probably I shouldn't have made that assertion given my
limited understanding of the benchmarks. I want to thank you for your
kind and gentle way of pointing this out to me. I feel very welcomed and
encouraged.

I still plan to work on the performance paper with the help of others on
this list. I look forward to your support and constructive feedback.

Gregg


On 5/16/2012 6:59 PM, Isaac Gouy wrote:
> -snip-
>> Interesting that you would focus on this one comment in my post and not comment
>> on one on countering the benchmarks with a new criteria for comparing languages.
> Too obvious to be interesting.
>
> Interesting that you haven't said how you know they are "designed to favor imperative languages" ;-)
>
>
>

Isaac Gouy

unread,
May 17, 2012, 11:25:54 AM5/17/12
to haskel...@haskell.org
> From: Gregg Lebovitz <gleb...@gmail.com>

> Sent: Thursday, May 17, 2012 5:50 AM
> Subject: Re: [Haskell-cafe] Can Haskell outperform C++?
>
> Isaac,
>
> I see your point. Probably I shouldn't have made that assertion given my
> limited understanding of the benchmarks. I want to thank you for your
> kind and gentle way of pointing this out to me. I feel very welcomed and
> encouraged.
>
> I still plan to work on the performance paper with the help of others on
> this list. I look forward to your support and constructive feedback.
>
> Gregg



Gregg,

You wrote that you were looking at the benchmarks and then made a definite assertion about what was shown on the website. Unsuspecting readers would assume that you were truthfully reporting what you saw on the website.

I cannot explain to them why you made an assertion which could be seen not to be true, simply by looking at the benchmarks game website.

best wishes, Isaac

Gregg Lebovitz

unread,
May 17, 2012, 11:36:42 AM5/17/12
to haskel...@haskell.org
Isaac,

I understand. Thank you. I will be more careful about my wording in the
future. I really do appreciate your taking the time to point this out to
me. I am here to learn and help where I can.

Gregg

Richard O'Keefe

unread,
May 17, 2012, 11:30:09 PM5/17/12
to Roman Werpachowski, haskel...@haskell.org

On 17/05/2012, at 10:07 PM, Roman Werpachowski wrote:
>> No slide deck required. The task is "generating alternating permutations".
>>
>> Method 1: generate permutations using a backtracking search;
>> when a permutation is generated, check if it is alternating.
>>
>> Method 2: use the same backtracking search, but only allow extensions
>> that preserve the property of being an alternating permutation.
>
> Gregg,
>
> what makes Method 2 so much harder than Method 1 to implement in C or C++?


It was me, not Gregg. There was and is no claim that method 2 is "much harder
to implement in C or C++". In fact both methods *were* implemented easily in C.
The claim was and remains solely that
THE TIME DIFFERENCE BETWEEN *ALGORITHMS*
can be bigger than
THE TIME DIFFERENCE BETWEEN *LANGUAGES*
and is in this particular case.

(And that's despite the fact that the C version kept the set of unused
elements as a one-native-word bit mask, while the Prolog version kept it
as a linked list.)

There is also a second claim, which I thought was uncontentious:
the relative difficulty of tasks varies with language.

Roman Werpachowski

unread,
May 18, 2012, 7:45:12 AM5/18/12
to haskel...@haskell.org
> Date: Fri, 18 May 2012 15:30:09 +1200
> From: "Richard O'Keefe" <o...@cs.otago.ac.nz>
> Subject: Re: [Haskell-cafe] Can Haskell outperform C++?
> To: Roman Werpachowski <roman.wer...@gmail.com>
> Cc: haskel...@haskell.org
> Message-ID: <B43B8CE3-9F90-4DC3...@cs.otago.ac.nz>
> Content-Type: text/plain; charset=us-ascii
>
>
> On 17/05/2012, at 10:07 PM, Roman Werpachowski wrote:
>>> No slide deck required.  The task is "generating alternating permutations".
>>>
>>> Method 1: generate permutations using a backtracking search;
>>>          when a permutation is generated, check if it is alternating.
>>>
>>> Method 2: use the same backtracking search, but only allow extensions
>>>          that preserve the property of being an alternating permutation.
>>
>> Gregg,
>>
>> what makes Method 2 so much harder than Method 1 to implement in C or C++?
>
>
> It was me, not Gregg.

My apologies to you and Gregg.

> There was and is no claim that method 2 is "much harder
> to implement in C or C++".  In fact both methods *were* implemented easily in C.

OK, got that now. So Haskell doesn't have a *big* advantage over C w/r
to the ease of implementation of both algorithms?

> The claim was and remains solely that
> THE TIME DIFFERENCE BETWEEN *ALGORITHMS*
>   can be bigger than
> THE TIME DIFFERENCE BETWEEN *LANGUAGES*
>   and is in this particular case.

Yes, but aren't the differences in the same ballpark, and if we want
to compare *languages*, we should use identical algorithms to make the
comparison fair.

>
> (And that's despite the fact that the C version kept the set of unused
> elements as a one-native-word bit mask, while the Prolog version kept it
> as a linked list.)
>
> There is also a second claim, which I thought was uncontentious:
> the relative difficulty of tasks varies with language.

It matters much less if you write the code to be run multiple times.

Regards,
RW

Isaac Gouy

unread,
May 18, 2012, 12:37:17 PM5/18/12
to haskel...@haskell.org




----- Original Message -----
> From: Richard O'Keefe <o...@cs.otago.ac.nz>
> Sent: Thursday, May 17, 2012 8:30 PM
> Subject: Re: [Haskell-cafe] Can Haskell outperform C++?
-snip-

> The claim was and remains solely that
> THE TIME DIFFERENCE BETWEEN *ALGORITHMS*
>   can be bigger than
> THE TIME DIFFERENCE BETWEEN *LANGUAGES*
>   and is in this particular case.

That seems like a modest and uncontentious claim.


> There is also a second claim, which I thought was uncontentious:
> the relative difficulty of tasks varies with language.

That doesn't seem unlikely; but I think you'd have to spell out just what you mean by language, and it also doesn't seem unlikely that for the same task the relative difficulty might flip when other details change (the people doing the task, the language implementation, the libraries available for the language implementation...)

It all seems so very particular ;-)

o...@cs.otago.ac.nz

unread,
May 18, 2012, 12:38:29 PM5/18/12
to Roman Werpachowski, haskel...@haskell.org

>> There was and is no claim that method 2 is "much harder
>> to implement in C or C++".  In fact both methods *were* implemented
>> easily in C.
>
> OK, got that now. So Haskell doesn't have a *big* advantage over C w/r
> to the ease of implementation of both algorithms?

In the case of these specific algorithms, no.
In the case of other backtracking algorithms, it could have
quite a big advantage.
>
>> The claim was and remains solely that
>> THE TIME DIFFERENCE BETWEEN *ALGORITHMS*
>>   can be bigger than
>> THE TIME DIFFERENCE BETWEEN *LANGUAGES*
>>   and is in this particular case.
>
> Yes, but aren't the differences in the same ballpark,

(a) only to the degree that you are willing to call
all language differences "in the same ballpark".
(b) no, because the speed difference between the algorithms
grows without bound as the problem size increases, while
the speed difference between the languages does not.

Here's another example: the UNIX 'tsort' command.
I have versions in Java, Smalltalk, and C. The Java and
Smalltalk versions are about the same speed, linear in
the size of the graph. The C version appears to be the
original UNIX code, and it is *quadratic* in the number
of nodes. Result: Smalltalk running faster than C by
a ratio that increases without bound as the input grows.
This is actually a case where it was easier to write fast
code than slow code in Smalltalk, easier to write slow code
than fast code in C. (Built in hash tables, what else?)
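For comparison, a minimal tsort-like sketch in Haskell (hypothetical; it assumes one "before after" pair per input line and leans on the containers library instead of a hand-rolled table):

    import           Data.Graph      (graphFromEdges, topSort)
    import qualified Data.Map.Strict as M

    -- Read "a b" pairs (a must come before b) and print one topological order.
    main :: IO ()
    main = do
      pairs <- map words . lines <$> getContents
      let edges = [ (a, b) | [a, b] <- pairs ]
          succs = M.fromListWith (++) ([ (a, [b]) | (a, b) <- edges ]
                                    ++ [ (b, []) | (_, b) <- edges ])
          (g, fromVertex, _) =
            graphFromEdges [ (k, k, vs) | (k, vs) <- M.toList succs ]
      mapM_ (putStrLn . (\(name, _, _) -> name) . fromVertex) (topSort g)
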

> and if we want
> to compare *languages*, we should use identical algorithms to make the
> comparison fair.

In the permutation generation example, I was talking about
four programs:
          Language X  Language Y
Method 1  Program X1  Program Y1  -- identical algorithms
Method 2  Program X2  Program Y2  -- identical algorithms

However, "identical" isn't clearly defined.
For example, is a Java program that uses HashMap<String,Integer>
-- and thus allocates millions of Integer objects --
"identical" to a Smalltalk program using Dictionary
-- where the integers fit into the immediate range, so that
no integer boxes are needed at all -- ?
In the 'tsort' case, it turns out that the Java and Smalltalk
versions are I/O bound with over 90% of the time spent just
reading the data. They have I/O libraries with very different
structures, so what does "identical algorithms" mean? If you
are using dictionaries/hashmaps, and the two languages have
implementations that compute different hash functions for strings,
is _that_ using the same implementation? (Some hash functions
are amazingly bad, see Andres Valloud's book.)

>> There is also a second claim, which I thought was uncontentious:
>> the relative difficulty of tasks varies with language.
>
> It matters much less if you write the code to be run multiple times.

You misunderstand. The issue is whether you bother to write the
more efficient code *at all*. The tsort code in C I was looking
at was real production code from a UNIX system that is still on
the market, and in the 20 or so years since it was written, nobody
had ever bothered to fix the blatant efficiency bug. In other
languages, it would have been easier to write the program without
the bug in the first place.

Isaac Gouy

unread,
May 18, 2012, 1:51:44 PM5/18/12
to haskel...@haskell.org




----- Original Message -----
> From: "o...@cs.otago.ac.nz" <o...@cs.otago.ac.nz>
> Sent: Friday, May 18, 2012 9:38 AM
> Subject: Re: [Haskell-cafe] Can Haskell outperform C++?

-snip-
>> and if we want
>> to compare *languages*, we should use identical algorithms to make the
>> comparison fair.
>
> In the permutation generation example, I was talking about
> four programs:
>           Language X  Language Y
> Method 1  Program X1  Program Y1  -- identical algorithms
> Method 2  Program X2  Program Y2  -- identical algorithms
>
> However, "identical" isn't clearly defined.

Moreover, being absolutely sure that the algorithms are in some sense
"identical" might make comparison pointless - for example, when the same assembly
is generated by gcc from a C program and gnat from an Ada program.


-snip-
> In the 'tsort' case, it turns out that the Java and Smalltalk
> versions are I/O bound with over 90% of the time spent just
> reading the data.

My guess is that they could be written to do better than that - but it's
idiotic of me to say so without understanding the specifics, please
forgive me ;-)


> They have I/O libraries with very different
> structures, so what does "identical algorithms" mean?  If you
> are using dictionaries/hashmaps, and the two languages have
> implementations that compute different hash functions for strings,
> is _that_ using the same implementation?

Of course, to some degree, user defined hash functions remedy that specific problem.
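In Haskell that remedy might look like the following sketch (assuming the hashable and unordered-containers packages; the hash itself is a deliberately simple, hypothetical one):

    import           Data.Hashable       (Hashable (..))
    import qualified Data.HashMap.Strict as HM

    newtype Key = Key String deriving (Eq, Show)

    -- Pin the hash function down ourselves instead of taking whatever the
    -- library ships, so two implementations can at least share the same hash.
    instance Hashable Key where
      hashWithSalt salt (Key s) = foldl (\h c -> 33 * h + fromEnum c) salt s

    table :: HM.HashMap Key Int
    table = HM.fromList [(Key "tsort", 1), (Key "hash", 2)]
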


I agree with the thrust of your comments - even programming languages (and implementations) that seem similar, are often so different (when we get down to specifics) that comparison between programs written in different languages is a matter of making the best of a bad job.

But we're still going to ask - Will my program be faster if I write it in language X? - and we're
still going to wish for a simpler answer than - It depends how you write it!

Paulo Pocinho

unread,
May 18, 2012, 6:53:33 PM5/18/12
to Isaac Gouy, haskel...@haskell.org
I've been following the topic in both threads. Very nice discussion.

On 18 May 2012 18:51, Isaac Gouy <igo...@yahoo.com> wrote:
>
> Moreover, being absolutely sure that the algorithms are in some sense
> "identical" might make comparison pointless - for example, when the same assembly
> is generated by gcc from a C program and gnat from an Ada program.

Suppose we take the most efficient algorithm, in terms of time and
space. The demonstrations given before this post indicate the language
can be more important than the algorithm. There are two different
takes on the situation.
On one hand we have time and space. On the other, man-hours and
reliability. Both languages deal with this trade-off.

> But we're still going to ask - Will my program be faster if I write it in language X? - and we're
> still going to wish for a simpler answer than - It depends how you write it!

Change that question to: Will my language be faster if I write program X?

Good C++ implementations are easy to find. However, everything is
exposed. Debugging is taken for granted. Unless everyone in a team
follows some strict practice, productivity may be at risk. It is more
difficult to provide reliability (i.e. programs as proofs [1]). C++ is
everywhere, and that only increases the probability of finding more
optimized algorithms in C++ libraries.

The Haskell implementation enforces an interface on the programmer
(the low-level time-space management, i.e. GC, IO monads,
parallelism). Improvements in the implementation increase the
efficiency of the program (i.e. optimizations in standard
libraries [2]). The risk of damaging productivity is low.

Given that you could derive one efficient algorithm down to assembly,
the fastest language is a matter of implementation. Then, on equal
grounds, it's a rather easy choice.

--
[1] http://homepages.inf.ed.ac.uk/wadler/papers/frege/frege.pdf
[2] https://groups.google.com/d/msg/haskell-cafe/KIxGd4babKE/J7LV3EGtutsJ

wren ng thornton

unread,
May 19, 2012, 1:57:20 AM5/19/12
to haskel...@haskell.org
On 5/16/12 7:43 AM, Yves Parès wrote:
>> On the one hand, characterizing those who desire the best performance
>> possible as "simple-minded" is, at best, a gross over-generalization. Like
>> you, I work in a field where optimization is king (e.g., in machine
>> translation, program runtimes are measured in days).
>
> You misread the logical implication, Ertugrul said:
>
>> simple-minded people often have a desire to get the best performance
>> possible

For the record, I was replying to Ryan's response to Ertugrul, rather
than to Ertugrul directly. I make no claims about what Ertugrul said or
the (in)validity thereof.

However, while the "logical" interpretation of Ertugrul's words may be
that simple-mindedness implies performance-desire, that interpretation
is not the only one available to the standard interpretation of his
words, nor IMO the dominant interpretation. It is equally valid to
interpret them as saying that the people under discussion are
simpletons, and that those people desire the best performance possible.
(I.e., an attributive rather than restrictive reading of the adjective.)
This latter interpretation is perfectly valid ---as the semantics of the
utterance---, but is pejorative of the people under discussion; and that
pejoration is what Ryan was (fairly) calling Ertugrul out on.

While I think it was fair for Ryan to call Ertugrul out on the matter, I
also thought the subject warranted further discussion since the
pejorative claim comes from somewhere, and so dismissing it entirely
fails to address the underlying issue. Not that I think Ryan was
dismissing it outright, just that it would be easy to interpret his
words as doing so.

--
Live well,
~wren

Ertugrul Söylemez

unread,
May 19, 2012, 2:57:38 AM5/19/12
to haskel...@haskell.org
wren ng thornton <wr...@freegeek.org> wrote:

> However, while the "logical" interpretation of Ertugrul's words may be
> that simple-mindedness implies performance-desire, that interpretation
> is not the only one available to the standard interpretation of his
> words, nor IMO the dominant interpretation. It is equally valid to
> interpret them as saying that the people under discussion are
> simpletons, and that those people desire the best performance
> possible. (I.e., an attributive rather than restrictive reading of the
> adjective.) This latter interpretation is perfectly valid ---as the
> semantics of the utterance---, but is pejorative of the people under
> discussion; and that pejoration is what Ryan was (fairly) calling
> Ertugrul out on.

Well, the meaning was the logical one: Simple-mindedness implies desire
for maximum performance possible. Even that is an overgeneralization.
People can be simple-minded in many ways. I hoped that my point would
come across.

Ryan got it right that it was indeed an accusation. I did not specify to
what group it was directed, which was probably the reason for the
confusion. The unconditional desire for maximum possible object code
performance is usually very stupid, not to mention impossible to reach
with any high level language and any multi-tasking operating system. To
get the maximum possible performance you must write a ring 0
application in assembly language and boot directly into it.

Haskell delivers reasonable performance for almost all non-embedded
applications, and for the extreme edge cases one can still switch to C
or assembly using the FFI. Haskell's average penalty compared to C is
no reason to write the entire application in C. That was my main point.
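A minimal sketch of that escape hatch (hypothetical names: a hot inner loop written in C as "dot", compiled and linked alongside the Haskell):

    {-# LANGUAGE ForeignFunctionInterface #-}
    import Foreign.C.Types       (CDouble (..), CInt (..))
    import Foreign.Marshal.Array (withArray)
    import Foreign.Ptr           (Ptr)

    -- C side: double dot(const double *xs, const double *ys, int n);
    foreign import ccall unsafe "dot"
      c_dot :: Ptr CDouble -> Ptr CDouble -> CInt -> IO CDouble

    dotProduct :: [Double] -> [Double] -> IO Double
    dotProduct xs ys =
      withArray (map realToFrac xs) $ \px ->
        withArray (map realToFrac ys) $ \py ->
          realToFrac <$> c_dot px py (fromIntegral (min (length xs) (length ys)))
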


Greets,
Ertugrul

--
nightmare = unsafePerformIO (getWrongWife >>= sex)
http://ertes.de/
signature.asc

wren ng thornton

unread,
May 19, 2012, 3:30:05 AM5/19/12
to haskel...@haskell.org
On 5/16/12 3:57 PM, Bardur Arantsson wrote:
> Comparing languages is a highly non-trivial matter involving various
> disciplines (including various squidgy ones) and rarely makes sense
> without a very specific context for comparison.

Exactly. That's what I was trying to get at re the problems of comparing
Haskell to C++ (or indeed any pair of dissimilar languages). A
legitimate comparison will involve far more than microbenchmarks, but
then a legitimate comparison must always have a specific focus and
context in order to be able to say anything interesting. The problems I
see in language comparisons is that by and large they tend not to have
any such focus[1], and consequently they shed little light on how the
languages compare and shed a bit of darkness by serving only to confirm
initial biases.


[1] A nice counter-example to this trend are papers like:

http://www.cse.unsw.edu.au/~chak/papers/modules-classes.pdf

There was another one comparing the expressivity of Java-style
polymorphism vs Haskell-style polymorphism, based on an analysis of
in-the-wild code; but I can't seem to pull it up at the moment.

wren ng thornton

unread,
May 19, 2012, 3:49:02 AM5/19/12
to haskel...@haskell.org
On 5/18/12 7:45 AM, Roman Werpachowski wrote:
> On Fri, 18 May 2012 15:30:09 +1200, "Richard O'Keefe"<o...@cs.otago.ac.nz> wrote:
>> The claim was and remains solely that
>> THE TIME DIFFERENCE BETWEEN *ALGORITHMS*
>> can be bigger than
>> THE TIME DIFFERENCE BETWEEN *LANGUAGES*
>> and is in this particular case.
>
> Yes, but aren't the differences in the same ballpark, and if we want
> to compare *languages*, we should use identical algorithms to make the
> comparison fair.

"Fair" in what sense? That is, what _exactly_ are you hoping to compare?

If the goal is to benchmark the implementation of the runtime, VM, or
built-in types, then requiring the same algorithm makes sense--- because
the algorithm is irrelevant other than to provide a bunch of calls to
the runtime/vm/etc. However, benchmarking a language's implementation in
this way is rarely that helpful. It's great for comparing CPython to
PyPy (or any other in-language cross-compiler comparison), but what
would it tell you about Haskell vs C++?

If the goal is to compare, say, production costs for a given level of
performance, then fixing the algorithm is not at all fair. The fact of
the matter is that different languages make different algorithms easier
to (a) implement, and (b) discover/identify/generalize. Thus, when it
comes to real-world software, the language that makes it easy to
implement good algorithms has a major advantage--- an advantage which is
being specifically ignored by fixing the algorithm aforehand.

In order for "fair" to have any meaning whatsoever, we must first
specify what is being compared, so that we know what it is that things
are supposed to be fair with regard to.

--
Live well,
~wren

wren ng thornton

unread,
May 19, 2012, 4:15:48 AM5/19/12
to haskel...@haskell.org
On 5/16/12 4:37 PM, Gregg Lebovitz wrote:
> 1) Outstanding best practices documents that capture this knowledge and
> provides useful answers. Organizing this information in an online
> document that can be searched by keyword or index would be really
> helpful. The hard part will be maintaining it. As someone already
> pointed out the wiki entry on performance is already dated.

In light of that fact, and the general high-pace of innovation in
Haskell, I think that rather than trying to keep a wiki up-to-date, it
would probably be better to create a series of time-stamped "formal"
documents--- like how HCAR is published.

The benefit of such an approach is two-fold. On the one hand, there's a
date right there which shows when the contents were relevant (e.g., on
which version of GHC one particular style performed better than
another). On the other hand, when putting out the next issue/version the
authors are forced to revisit the old content and ask whether it's still
relevant or whether the tides have shifted; and if they're not willing
to commit one way or another, the content can be dropped without eliding
the fact (written in the previous issue) that it used to be an optimization.

Another thing I think is important is to give the actual reasons why
some particular thing is an optimization, and more particularly why it
is no longer an optimization once things have changed. A big issue I've
run into with blog-posts etc on performance in Haskell is that there's a
lot of folkloreism and just-so stories which tell people "just do X",
without explaining how X differs from Y, why X is better or worse than
Y, or the circumstances under which you may prefer Y to X. Without
providing the details about why something is an
optimization/pessimization, we don't provide users with the tools to
figure things out for themselves in this everchanging world.

--
Live well,
~wren

Paolino

unread,
May 19, 2012, 6:44:12 AM5/19/12
to Janek S., haskel...@haskell.org
Hello,

I would like to add some questions to yours. I'm not sure that C++ programs
actually have the same performance as C, so my reasoning may be off.

How hard is it to port a Haskell program to C?
If it becomes harder and harder (e.g. for parallelization), then
it's fair to choose Haskell for performance; but if it's not, I think
it's hard to believe that such a high-level language could ever compile
down to something running faster than its port to C.

Many of the abstractions used for nice Haskell code are based on
boxing values, so we have to take away all the boxing to make it run
faster, and in doing so we are effectively porting to C.
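A small illustration of the boxing point (assuming the vector package): the list version walks boxed, pointer-linked Doubles, while the unboxed vector version compiles to a tight loop over a flat array:

    import qualified Data.Vector.Unboxed as U

    sumBoxed :: [Double] -> Double           -- boxed elements behind pointers
    sumBoxed = sum

    sumUnboxed :: U.Vector Double -> Double  -- flat array, fused by GHC
    sumUnboxed = U.sum

    main :: IO ()
    main = print (sumUnboxed (U.enumFromN 0 (10 ^ 7)))
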

Will hardware really go to hundreds of cores?
If so, then all languages supporting simple parallelization will be
used to code, but this will not make C slower, just more difficult to
use in contests.

Also, a leading point for Haskell is how long it takes to write
correct code that performs well in the wild.

So it makes some sense to me to choose Haskell for performance, but
not to try to compete in sequential algorithmic performance, except
for those who do it to learn what is behind the abstractions that
make their code slow and keep it from compiling down to perfect C.

paolino

P.S. The real performance problems are learning Haskell's
programming abstractions and developing the intuition for when to use
each.



2012/5/6 Janek S. <freme...@poczta.onet.pl>:
> Hi,
>
> a couple of times I've encountered a statement that Haskell programs can have performance
> comparable to programs in C/C++. I've even read that thanks to functional nature of Haskell,
> compiler can reason and make guarantess about the code and use that knowledge to automatically
> parallelize the program without any explicit parallelizing commands in the code. I haven't seen
> any sort of evidence that would support such claims. Can anyone provide a code in Haskell that
> performs better in terms of execution speed than a well-written C/C++ program? Both Haskell and C
> programs should implement the same algorithm (merge sort in Haskell outperforming bubble sort in
> C doesn't count), though I guess that using Haskell-specific idioms and optimizations is of
> course allowed.
>
> Jan

Roman Werpachowski

unread,
May 19, 2012, 6:58:17 AM5/19/12
to haskel...@haskell.org
> Date: Sat, 19 May 2012 08:57:38 +0200
> From: Ertugrul S?ylemez <e...@ertes.de>
> Subject: Re: [Haskell-cafe] Can Haskell outperform C++?
> To: haskel...@haskell.org
> Message-ID: <20120519085...@tritium.streitmacht.eu>
> Content-Type: text/plain; charset="us-ascii"


> Haskell delivers reasonable performance for almost all non-embedded
> applications, and for the extreme edge cases one can still switch to C
> or assembly using the FFI.

At the cost of introducing another dimension of complexity (two
codebases, two toolchains, need for programmers specialising in two
languages). There is a reason why many businesses go to great pains to
ensure their codebase is homogeneous (and it's not "managers are
dumb").

RW

Janis Voigtländer

unread,
May 19, 2012, 9:07:14 AM5/19/12
to haskel...@haskell.org
On 19.05.2012 12:00, wren ng thornton wrote:
> Exactly. That's what I was trying to get at re the problems of comparing
> Haskell to C++ (or indeed any pair of dissimilar languages). A
> legitimate comparison will involve far more than microbenchmarks, but
> then a legitimate comparison must always have a specific focus and
> context in order to be able to say anything interesting. The problems I
> see in language comparisons is that by and large they tend not to have
> any such focus[1], and consequently they shed little light on how the
> languages compare and shed a bit of darkness by serving only to confirm
> initial biases.
>
>
> [1] A nice counter-example to this trend are papers like:
>
> http://www.cse.unsw.edu.au/~chak/papers/modules-classes.pdf
>
> There was another one comparing the expressivity of Java-style
> polymorphism vs Haskell-style polymorphism, based on an analysis of
> in-the-wild code; but I can't seem to pull it up at the moment.

Possibly "Software extension and integration with type classes" by
Lämmel and Ostermann?

http://doi.acm.org/10.1145/1173706.1173732

Best,
Janis.

--
Jun.-Prof. Dr. Janis Voigtländer
http://www.iai.uni-bonn.de/~jv/

Isaac Gouy

unread,
May 19, 2012, 1:07:25 PM5/19/12
to wren ng thornton, haskel...@haskell.org
> From: wren ng thornton <wr...@freegeek.org>
> Sent: Saturday, May 19, 2012 12:49 AM
> Subject: Re: [Haskell-cafe] Can Haskell outperform C++?


-snip-
> "Fair" in what sense? That is, what _exactly_ are you hoping to
> compare?
>
> If the goal is to benchmark the implementation of the runtime, VM, or built-in
> types, then requiring the same algorithm makes sense--- because the algorithm is
> irrelevant other than to provide a bunch of calls to the runtime/vm/etc.
> However, benchmarking a language's implementation in this way is rarely that
> helpful. It's great for comparing CPython to PyPy (or any other in-language
> cross-compiler comparison), but what would it tell you about Haskell vs C++?

The PyPy crowd won't like it if you take programs written for CPython and measure how they run with PyPy - that's "not fair". But it might take a couple of years before they contribute programs optimised for PyPy :-(

(You already said what it would tell you, but questioned how helpful that would be.)


> If the goal is to compare, say, production costs for a given level of
> performance, then fixing the algorithm is not at all fair. The fact of the
> matter is that different languages make different algorithms easier to (a)
> implement, and (b) discover/identify/generalize. Thus, when it comes to
> real-world software, the language that makes it easy to implement good
> algorithms has a major advantage--- an advantage which is being specifically
> ignored by fixing the algorithm aforehand.

Let's just say that's true - Is it useful? What would we need to do to make the comparison?

We could do something like - "Plat_Forms: Is there a single best web development technology? A professional programming contest"

http://page.mi.fu-berlin.de/prechelt/Biblio/platforms07-cacm-2010.pdf

But that was just once, with very few teams, and only one problem -- seems like it would need to be repeated more often than is affordable, and with more teams than can be persuaded to donate their time.

Maybe your point is true but practically useless? :-(


> In order for "fair" to have any meaning whatsoever, we must first
> specify what is being compared, so that we know what it is that things are
> supposed to be fair with regard to.

'What does "not fair" mean? (A fable)'    http://stackoverflow.com/a/6380299

Richard O'Keefe

unread,
May 20, 2012, 6:41:14 PM5/20/12
to Haskell Cafe

On 19/05/2012, at 5:51 AM, Isaac Gouy wrote:
>> In the 'tsort' case, it turns out that the Java and Smalltalk
>> versions are I/O bound with over 90% of the time spent just
>> reading the data.
>
> My guess is that they could be written to do better than that - but it's
> idiotic of me to say so without understanding the specifics, please
> forgive me ;-)

Actually, I/O bound is *good*.

Here are the times from the C version, which has been hacked hard in order
to be as fast as I could make it.

N total input process output
1000; 0.004618 = 0.004107 + 0.000438 + 0.000073
2000; 0.014467 = 0.012722 + 0.001609 + 0.000136
5000; 0.059810 = 0.051308 + 0.008199 + 0.000303
10000; 0.204111 = 0.150638 + 0.052800 + 0.000673
20000; 0.717362 = 0.518343 + 0.197655 + 0.001364
50000; 3.517340 = 2.628550 + 0.885456 + 0.003331

N here is the number of nodes in the graph;
the number of edges is floor(N**1.5 * 0.75).
Input is the read-word + look up in hash table time.
Process is the compute-the-transitive-closure time.
Output is the time to write the node names in order.
All node names had the form x## with ## being 1..10000.
This is with my own low level code; using scanf(%...s)
pushed the input time up by 40%.

The Mac OS X version of the tsort command took
31.65 CPU seconds for N=10,000, of which
28.74 CPU seconds was 'system'.

Like I said, the languages I used in this test
>> ... have I/O libraries with very different
>> structures, so what does "identical algorithms" mean? If you
>> are using dictionaries/hashmaps, and the two languages have
>> implementations that compute different hash functions for strings,
>> is _that_ using the same implementation?
>
> Of course, to some degree, user defined hash functions remedy that specific problem.

While creating other, and perhaps worse, ones.

For example, in the Smalltalk code, if you use a Dictionary of Strings,
you're getting Robert Jenkins's hash function in optimised C. If you
supply your own, you're getting a very probably worse hash function
and it's going to run rather slower. And above all, the stuff you are
benchmarking is no longer code that people are actually likely to write.

> But we're still going to ask - Will my program be faster if I write it in language X? - and we're
> still going to wish for a simpler answer than - It depends how you write it!

Here's another little example. I had a use for the Singular Value Decomposition
in a Java program. Should I use pure Java or native C?

Pure Java taken straight off the CD-ROM that came with a large
book of numerical algorithms in Java: T seconds.

After noticing that the code was just warmed over Fortran, and was
varying the leftmost subscript fastest (which is good for Fortran,
bad for most other languages) and swapping all the matrix dimensions: T/2 seconds.

After rewriting in C: T/4 seconds.

After rewriting the C code to call the appropriate BLAS
and thereby using tuned code for the hardware, T/7 seconds.

Since this was going to take hundreds of seconds per run, the answer was easy.

A simple little thing like using a[i][j] vs a[j][i] made a
factor of 2 difference to the overall speed.

"It depends" is the second best answer we can get.
The best answer is one that says _what_ it depends on.

Richard O'Keefe

unread,
May 20, 2012, 10:17:56 PM5/20/12
to Paolino, haskel...@haskell.org, Janek S.

> How much is hard to port a haskell program to C ?
> If it will become harder and harder, (i.e. for parallelizations) than
> it's fair to choose haskell for performance, but if it's not, I think
> it's hard to think that such a high level language could ever compile
> down to something running faster than its port to C.

There is a logic programming language called Mercury;
it has strict polymorphic types and strict modes and it supports
functional syntax as well as Horn clause syntax. You could think
of it as 'strict Clean with unification'.

In the early days, they had a list processing benchmark where
the idiomatic Mercury version of the program was faster than
the idiomatic C version of the program, despite the fact that
at the time Mercury was compiling via C.

The answer was that the kind of C code being generated by Mercury
was not the kind of code any sane programmer would ever have written
by hand. It really does depend on how you write it.

> Will hardware really go for hundreds of cores ?

You can already buy a >700 core machine (I have _no_ idea how many
chips are involved in that) for Java.

Ryan Newton

unread,
May 21, 2012, 10:33:14 AM5/21/12
to Ertugrul Söylemez, haskel...@haskell.org
The unconditional desire for maximum possible object code
performance is usually very stupid, not to mention impossible to reach
with any high level language and any multi-tasking operating system.  

Definitely.  I don't know if we have a catchy term for "kneejerk optimization" or if it falls under the broader umbrella of "premature optimization" [including misdirected or unneeded optimization].  

I do think we have the opposite problem, however, in much Haskell code -- people are using the clean, obviously correct, but inefficient code even in standard library functions that really should be optimized like crazy! 
 
 Haskell's average penalty compared to C is
no reason to write the entire application in C. 

Yes, this seems to be a separate disease.  Not just using low-level langs, per se, but using them for *everything*.  I have worked at places in industry where teams automatically use C++ for everything.  For example, they use it for building all complete GUI applications, which surprises me a little bit.  I would have thought in recent years that almost everyone was using *something* else (Java,Python, whatever) to do much of the performance-non-critical portions of their application logic.
 
  -Ryan

Ertugrul Söylemez

unread,
May 21, 2012, 10:43:57 AM5/21/12
to haskel...@haskell.org
Ryan Newton <rrne...@gmail.com> wrote:

> I do think we have the opposite problem, however, in much Haskell code
> -- people are using the clean, obviously correct, but inefficient code
> even in standard library functions that really should be optimized
> like crazy!

Not necessarily. For example the 'nub' function from Data.List could be
much faster. Unfortunately this would also change its type. O(n²)
complexity is really the best you can get with the Eq constraint. You
have to change to Ord for better performance.

In other words: Some optimizations change the semantics, and semantics
is taken very seriously in Haskell, for which I'm grateful.
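For concreteness, an Ord-based variant might look like this minimal sketch:

    import qualified Data.Set as Set

    -- O(n log n) de-duplication that keeps first occurrences, like nub,
    -- but requires Ord instead of Eq, the type change mentioned above.
    nubOrd :: Ord a => [a] -> [a]
    nubOrd = go Set.empty
      where
        go _    []     = []
        go seen (x:xs)
          | x `Set.member` seen =     go seen xs
          | otherwise           = x : go (Set.insert x seen) xs
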


Greets,
Ertugrul

--
Key-ID: E5DD8D11 "Ertugrul Soeylemez <e...@ertes.de>"
FPrint: BD28 3E3F BE63 BADD 4157 9134 D56A 37FA E5DD 8D11
Keysrv: hkp://subkeys.pgp.net/
signature.asc

Yves Parès

unread,
May 21, 2012, 10:51:05 AM5/21/12
to rrne...@gmail.com, Ertugrul Söylemez, haskel...@haskell.org
> I do think we have the opposite problem, however, in much Haskell code -- people are using the clean, obviously correct, but inefficient code even in standard library functions that really should be optimized like crazy!

And even before optimizing "like crazy", I think the functions that are "more often" good should be emphasized: Prelude should export foldl' together with/instead of foldl, sum should have its sum' counterpart (or even be rewritten to use foldl'), and so on...
It baffles me that functions that are said to be more efficient in the majority of cases are not the default.


> I have worked at places in industry where teams automatically use C++ for everything.

Have they looked at you as if you were an alien (and even said you were not a serious developer) when you raised the possibility of evaluating the feasibility of using/making a more expressive language for a specific task?

2012/5/21 Ryan Newton <rrne...@gmail.com>

Yves Parès

unread,
May 21, 2012, 10:53:25 AM5/21/12
to Ertugrul Söylemez, haskel...@haskell.org
> Not necessarily.  For example the 'nub' function from Data.List could be
> much faster.  Unfortunately this would also change its type.  O(n²)
> complexity is really the best you can get with the Eq constraint.

Why not, in that kind of case, provide a second function (named differently), together with the original function, and specify their differences (i.e. w.r.t. performance) in the doc?
It seems like a pretty quick and honest trade-off to me.

2012/5/21 Ertugrul Söylemez <e...@ertes.de>

Thomas DuBuisson

unread,
May 21, 2012, 11:39:52 AM5/21/12
to Yves Parès, Ertugrul Söylemez, haskel...@haskell.org
On Mon, May 21, 2012 at 7:53 AM, Yves Parès <yves....@gmail.com> wrote:
>> Not necessarily.  For example the 'nub' function from Data.List could be
>> much faster.  Unfortunately this would also change its type.  O(n²)
>> complexity is really the best you can get with the Eq constraint.
>
> Why not in that kind of cases provide a second function (named differently),
> together with the original function, and specify they're differences (i.e.
> wrt performances) in the doc?
> It seems like a pretty quick and honest trade-off to me.


WRT nub, Bart Massey did exactly this in his "nubOrd" proposal. He
obtained consensus then failed to finish the ticket [1]. If this
particular case is of interest to you or anyone else then I suggest
you take the patches, re-propose and see it finished. If you are
interested in this general category of issue, I think this is a case
study in how costly even our seemingly lightweight proposals process
is in terms of proposer time investment.

Cheers,
Thomas

[1] http://hackage.haskell.org/trac/ghc/ticket/2717

Sam Martin

unread,
May 21, 2012, 11:47:06 AM5/21/12
to rrne...@gmail.com, haskel...@haskell.org

> Yes, this seems to be a separate disease.  Not just using low-level langs, per se,
> but using them for *everything*.  I have worked at places in industry where teams
> automatically use C++ for everything.  For example, they use it for building all
> complete GUI applications, which surprises me a little bit.  I would have thought
> in recent years that almost everyone was using *something* else (Java,Python,
> whatever) to do much of the performance-non-critical portions of their application logic.

I think "disease" might be overstating this somewhat :) In defence of using C++ for everything: Interfacing different languages is not exactly trivial, and in some cases, impossible.

If you are writing a program or system that has significant performance requirements, you might just be better off doing the whole thing in C/C++ and living with the annoyance of doing GUIs, or whatever, in C++. The overhead of pulling in another language may simply outweigh the convenience.

In addition, it's pretty much impossible for me to use Haskell to write portions (or all of) a game that would include a console version. This is largely for build system and platform support issues. Doing everything in C++ is nearly the only option here.

This situation could be improved though, by making it far easier to embed Haskell within C/C++. It's not difficult by design, but there are large engineering obstacles in the way, and it's hard to see where the effort to remove these could come from. But then it would be more plausible to argue that people are missing out by not using a mix of C++ and Haskell.
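The plumbing for that already exists in outline; a minimal sketch of exposing a Haskell function to C/C++ (module and function names are hypothetical):

    {-# LANGUAGE ForeignFunctionInterface #-}
    module Embedded where

    import Foreign.C.Types (CInt (..))

    fibs :: [Integer]
    fibs = 0 : 1 : zipWith (+) fibs (tail fibs)

    fib :: CInt -> CInt
    fib n = fromIntegral (fibs !! fromIntegral n)

    -- GHC generates Embedded_stub.h; the C++ side includes it and HsFFI.h,
    -- calls hs_init before the first call and hs_exit when done.
    foreign export ccall fib :: CInt -> CInt
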

Cheers,
Sam

Isaac Gouy

unread,
May 21, 2012, 12:15:32 PM5/21/12
to Haskell Cafe
> From: Richard O'Keefe <o...@cs.otago.ac.nz>
> Sent: Sunday, May 20, 2012 3:41 PM

> On 19/05/2012, at 5:51 AM, Isaac Gouy wrote:
>>> In the 'tsort' case, it turns out that the Java and Smalltalk
>>> versions are I/O bound with over 90% of the time spent just
>>> reading the data.
>>
>> My guess is that they could be written to do better than that - but
> it's
>> idiotic of me to say so without understanding the specifics, please
>> forgive me ;-)
>
> Actually, I/O bound is *good*.

Why would that be good or bad?

I suppose you're just suggesting that, in this case, the performance characteristics of the Java and Smalltalk programs are similar to the C program; but, for whatever reason, you're leaving us guessing about the timings for those other programs.


-snip-
>> Of course, to some degree, user defined hash functions remedy that specific problem.
>
> While creating other, and perhaps worse, ones.
>
> For example, in the Smalltalk code, if you use a Dictionary of Strings,
> you're getting Robert Jenkin's hash function in optimised C.  If you
> supply your own, you're getting a very probably worse hash function
> and it's going to run rather slower.  And above all, the stuff you are
> benchmarking is no longer code that people are actually likely to write.

I think you're being a touch quarrelsome :-)

Do *all* Smalltalk language implementations provide that same hash function in optimised C?

Have Smalltalk language implementations *always* provided that same hash function in optimised C?

How might that hash function be used when the (not necessarily Smalltalk) language implementation does not provide it?

Have you researched what code people are actually likely to write, or are you just speculating? ;-)



-snip-
>> But we're still going to ask - Will my program be faster if I write it
>> in language X? - and we're still going to wish for a simpler answer
>> than - It depends how you write it!
>
> Here's another little example.  I had a use for the Singular Value
> Decomposition in a Java program.  Should I use pure Java or native C?
>
> Pure Java taken straight off the CD-ROM that came with a large
> book of numerical algorithms in Java:  T seconds.
>
> After noticing that the code was just warmed over Fortran, and was
> varying the leftmost subscript fastest (which is good for Fortran,
> bad for most other languages) and swapping all the matrix dimensions: T/2
> seconds.
>
> After rewriting in C:  T/4 seconds.
>
> After rewriting the C code to call the appropriate BLAS
> and thereby using tuned code for the hardware, T/7 seconds.
>
> Since this was going to take hundreds of seconds per run, the answer was easy.

Maybe the more interesting question was - Should I use a scripting language + BLAS.
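The same move is available from Haskell; a minimal sketch assuming the hmatrix bindings to LAPACK:

    import Numeric.LinearAlgebra (Matrix, svd, (><))

    main :: IO ()
    main = do
      let a = (3 >< 2) [1, 2, 3, 4, 5, 6] :: Matrix Double
          (_, sigma, _) = svd a    -- LAPACK does the heavy lifting
      print sigma                  -- the singular values
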


> "It depends" is the second best answer we can get.
> The best answer is one that says _what_ it depends on.

That may be the best answer to some other question - but for the stated question I think we're looking for a Yes or a No :-)

Yves Parès

unread,
May 21, 2012, 12:27:57 PM5/21/12
to Sam Martin, haskel...@haskell.org
> If you are writing a program or system that has significant performance requirements, you might just be better off doing the whole thing in C/C++ and living with the annoyance of doing GUIs

I fail to see how the GUI part would suffer from a lack of performance if the rest of the system is fine. I don't mean to be blunt, but to me this sounds a little like "MVC done wrong" if breaking the GUI apart from the rest of the software is really that impossible.
Anyway, only benchmarks could tell whether, in such a case, writing the GUI in Haskell and integrating it with the rest would burn performance to the ground.


> In addition, it's pretty much impossible for me to use Haskell to write portions (or all) of a game that would include a console version. This is largely due to build-system and platform-support issues. Doing everything in C++ is nearly the only option here.

I worked in a video game company too (I referred to it when I answered Ryan about companies automatically using C++), and I agree: the first unbreakable obstacle that comes to mind to implementing some parts of the application in Haskell (or in anything other than C/C++) is that in the end it must run on more than just personal computers.
The main issue is that those systems are kept far too closed by their manufacturers. A second issue may be RAM (far scarcer than on PCs: e.g. 512 MB in total on the PS3), but, again, benchmarks would be more enlightening than my own opinion.

2012/5/21 Sam Martin <sam.m...@geomerics.com>

Stephen Tetley

unread,
May 21, 2012, 1:08:18 PM5/21/12
to haskel...@haskell.org
On 21 May 2012 17:27, Yves Parès <yves....@gmail.com> wrote:

> I fail to see how the GUI part would suffer from a lack of performance if the
> rest of the system is fine. I don't mean to be blunt, but to me this
> sounds a little like "MVC done wrong" if breaking the GUI apart from the
> rest of the software is really that impossible.

A few years ago one of the architects at Adobe published some slides
on the software engineering aspects of Photoshop. Unfortunately I
couldn't find them on the web when I tried recently, but I believe they
stated that the codebase was well over 1 million lines of C++ and that
the GUI (including Adobe's own frameworks) accounted for more than half
of that...

GUIs often *are* the program rather than a way in to use it.

Isaac Gouy

unread,
May 21, 2012, 2:42:19 PM5/21/12
to haskel...@haskell.org
> From: Stephen Tetley <stephen...@gmail.com>

> Sent: Monday, May 21, 2012 10:08 AM
> Subject: Re: [Haskell-cafe] Can Haskell outperform C++?
>
> On 21 May 2012 17:27, Yves Parès <yves....@gmail.com> wrote:
>
>> I fail to see how the GUI part would suffer from a lack of performance if the
>> rest of the system is fine. I don't mean to be blunt, but to me this
>> sounds a little like "MVC done wrong" if breaking the GUI
>> apart from the rest of the software is really that impossible.
>
> A few years ago one of the architects at Adobe published some slides
> on the software engineering aspects of Photoshop. Unfortunately I
> couldn't find them on the web when I tried recently, but I believe they
> stated that the codebase was well over 1 million lines of C++ and that
> the GUI (including Adobe's own frameworks) accounted for more than half
> of that...
>
> GUIs often *are* the program rather than a way in to use it.


What about a more recent code base, this is from a 2008 presentation on Adobe Lightroom 2:

- "63% of the main Lightroom-team authored code is Lua"
- "16% C++"
- "12% ObjC"
- "9% C"

http://www.troygaul.com/LrExposedC4.html

Richard O'Keefe

unread,
May 21, 2012, 9:54:58 PM5/21/12
to Isaac Gouy, Haskell Cafe

On 22/05/2012, at 4:15 AM, Isaac Gouy wrote:
>> Actually, I/O bound is *good*.
>
> Why would that be good or bad?

The context here is a UNIX-style topological sorting program.
Being I/O bound means that the program is limited by how fast
it can read the data. If 90% of the time goes into reading
the data, that means that the *algorithmic* part is fast enough.

There may be very little one can do about the I/O part.

> I suppose you're just suggesting that, in this case, the performance characteristics of the Java and Smalltalk programs are similar to the C program; but, for whatever reason, you're leaving us guessing about the timings for those other programs.

I didn't _think_ I'd omitted anything important. Oh well.

For 25,000 nodes and 2,989,635 edges,
    Java (first version)    4.83 seconds   -- uses ArrayList, HashMap
    Java (2nd version)      4.17 seconds   -- uses own IntList
    Java (3rd version)      4.09 seconds   -- different approach
    Smalltalk               4.40 seconds
    C                       0.92 seconds   -- my code
    Mac OS X tsort(1)     150.35 seconds   -- 11.84 user + 138.51 system

For 50,000 nodes and 8,385,254 edges,
    Java (first version)   ran out of memory after 89.54 seconds (default heap)
    Java (2nd version)     13.31 seconds   -- avoid Integer boxing!
    Java (3rd version)     15.07 seconds
    Smalltalk              14.52 seconds
    C                       2.36 seconds
    Mac OS X tsort(1)     482.32 seconds   -- 41.55 user + 440.77 system

The Mac OS X tsort(1) is presumably written in C. Symbols have been
stripped, so even with Instruments I can't see where the time is going.
Judging from its memory behaviour, I believe that most of the time is
going in the input phase, which suggests a spectacularly inefficient
symbol table, which suggests that it might be using a not-very-modified
version of the Unix V7 code, which used a linear search...

>>> Of course, to some degree, user defined hash functions remedy that specific problem.
>>
>> While creating other, and perhaps worse, ones.
>>
>> For example, in the Smalltalk code, if you use a Dictionary of Strings,
>> you're getting Robert Jenkins's hash function in optimised C. If you
>> supply your own, you're getting a very probably worse hash function
>> and it's going to run rather slower. And above all, the stuff you are
>> benchmarking is no longer code that people are actually likely to write.
>
> I think you're being a touch quarrelsome :-)

That upset me. All I was saying is that surely the only *point* of
talking about the performance of *languages* is to get some idea of
how well programs are likely to do, and not any old specially crafted
code, but the kind of code that is likely to be written for purposes
other than benchmarking. So the more you bash on a particular example
to get the time down, the less representative it is of the kind of code
people are likely to write *for purposes other than benchmarking*.

Just today I marked a student's submission where their program took 15 minutes
and my model answer took a touch under 4 milliseconds. I explained to
them _that_ their program was spectacularly slow. They already knew _why_
it was. I also explained the one trick (lazy initialisation) that could
have hugely improved the time. I also told them that they had made the
right decision, to optimise *their* time, not the computer's, in their
particular context.

> Do *all* Smalltalk language implementations provide that same hash function in optimised C?

*That* function, no. *Some* hash function as a "primitive" (meaning
optimised C), well, I don't have every Smalltalk, but the ones I do
have, I've checked, and yes they do.
>
> Have Smalltalk language implementations *always* provided that same hash function in optimised C?

Obviously not: Smalltalk-80 ran on Xerox D-machines, which didn't *have*
C. "Primitives" were implemented in microcode.
>
> Have you researched what code people are actually likely to write, or are you just speculating? ;-)

I'm in my 6th decade. I've seen a lot of code in a lot of languages.
>
>> "It depends" is the second best answer we can get.
>> The best answer is one that says _what_ it depends on.
>
> That may be the best answer to some other question - but for the stated question I think we're looking for a Yes or a No :-)

The subject line has the obvious and boring answer "yes, of course".

I recall one time when I wanted to analyse some data and typed up
Fortran code for a suitable technique from a book. It didn't work.
After struggling to debug it for a while, I implemented the whole
thing from scratch in Prolog. Then I went back to the Fortran
code, spotted my typing mistake, and fixed it. Result, two working
programs. The Prolog program (compiled to a VM which was emulated;
the compiler was not very clever) ran faster than the Fortran program
(compiled to native code by a seriously optimising compiler from a
major computer manufacturer).

Reason? Same algorithm, different data structure. The data structure
was one that was easy to express in Prolog (or Haskell!) but would
have been insanely difficult in Fortran 77. This century's Fortran is
of course another matter.

The point here of course is that there are data structures and algorithms
that are easy to express in some languages but hard in others, so that
in code written *for purposes other than benchmarking*, they just would
not be _used_ in one of those languages.

Isaac Gouy

unread,
May 22, 2012, 12:54:52 PM5/22/12
to Haskell Cafe
> From: Richard O'Keefe
> Sent: Monday, May 21, 2012 6:54 PM

> On 22/05/2012, at 4:15 AM, Isaac Gouy wrote:
>>>  Actually, I/O bound is *good*.
>>
>>  Why would that be good or bad?
>
> The context here is a UNIX-style topological sorting program.
> Being I/O bound means that the program is limited by how fast
> it can read the data.  If 90% of the time goes into reading
> the data, that means that the *algorithmic* part is fast enough.
>
> There may be very little one can do about the I/O part.

Maybe you could say how the Java I/O is being done.


> I didn't _think_ I'd omitted anything important.  Oh well.

-snip-
>     For 50,000 nodes and 8,385,254 edges,
>     Java (first version) ran out of memory after 89.54 seconds (default heap)
>     Java (2nd version)  13.31 seconds   -- avoid Integer boxing!
>         Java (3rd version)  15.07 seconds
>         Smalltalk           14.52 seconds
>         C                    2.36 seconds

fwiw my expectation is that Java would not be that much slower than C, and would be considerably faster than Smalltalk.

fwiw my expectation is that it should be possible to make the Java program considerably faster. Of course, what I expect and what happens are often very different ;-)


>>>>  Of course, to some degree, user defined hash functions remedy that
> specific problem.
>>>
>>>  While creating other, and perhaps worse, ones.
>>>
>>>  For example, in the Smalltalk code, if you use a Dictionary of Strings,
>>>  you're getting Robert Jenkins's hash function in optimised C. 
> If you
>>>  supply your own, you're getting a very probably worse hash function
>>>  and it's going to run rather slower.  And above all, the stuff you
> are
>>>  benchmarking is no longer code that people are actually likely to
> write.
>>
>>  I think you're being a touch quarrelsome :-)
>
> That upset me.

I'm sorry that gentle comment upset you.


> All I was saying is that surely the only *point* of
> talking about the performance of *languages* is to get some idea of
> how well programs are likely to do, and not any old specially crafted
> code, but the kind of code that is likely to be written for purposes
> other than benchmarking.  So the more you bash on a particular example
> to get the time down, the less representative it is of the kind of code
> people are likely to write *for purposes other than benchmarking*.

Certainly less representative of the kind of code students are likely to write for course credits, and the kind of code people are likely to write to complete some project task before they hand it off to someone else, and the kind of code people are likely to write until their program is actually put into use and suddenly they are working the weekend to make their program faster.

A more positive way to think of code written for purposes of benchmarking is that it's like code written to address a performance hot spot.


> Just today I marked a student program where their program took 15 minutes
> and my model answer took a touch under 4 milliseconds.  I explained to
> them _that_ their program was spectacularly slow.  They already knew _why_
> it was.  I also explained the one trick (lazy initialisation) that could
> have hugely improved the time.  I also told them that they had made the
> right decision, to optimise *their* time, not the computer's, in their
> particular context.

The whole point is carried by your final assertion.

Here's another anecdote - in my first week on site, I overheard a group of engineers arguing that Smalltalk should be abandoned because calculation times were incredibly slow. I peeked at the code, saw that it was littered with unnecessary numeric conversions (plus unnecessary arbitrary precision arithmetic), fixed and timed a sample, and left them with a lot of rework to do all across their library code.

The "kind of code people are likely to write" is sometimes best described as bad.

That can have consequences which spiral out of proportion -- an engineer writes some bad Smalltalk, an app performs so slowly the business group loses money, the manager of the business group notices and is shown that the slow app is the problem (and it really is the problem), the manager now "knows" that "Smalltalk is slow", the manager moves the business group away from Smalltalk, the manager is promoted and moves the whole organization away from Smalltalk. That's also an anecdote.


> *That* function, no.  *Some* hash function as a "primitive" (meaning
> optimised C), well, I don't have every Smalltalk, but the ones I do
> have, I've checked, and yes they do.

Maybe I misunderstood the thrust of your original comment - Were you trying to make a point about writing in C and calling that code from Smalltalk as a user written primitive; or were you trying to make a point about the importance of using a better hash function?


>>  Have you researched what code people are actually likely to write, or are
> you just speculating? ;-)
>
> I'm in my 6th decade.  I've seen a lot of code in a lot of languages.

So just speculating.


> The subject line has the obvious and boring answer "yes, of course".

I agree, because of the implicit qualification - "if we write the C++ badly enough" :-)


-snip-
> The point here of course is that there are data structures and algorithms
> that are easy to express in some languages but hard in others, so that
> in code written *for purposes other than benchmarking*, they just would
> not be _used_ in one of those languages.

Let's say that's true - until we show how much better programs using other data structures and algorithms perform those specific tasks, it's just an excuse. 

(And, depending on how hard it is to express those different data structures and algorithms in other languages, it might not even be a good excuse.)

Richard O'Keefe

unread,
May 22, 2012, 10:59:28 PM5/22/12
to Isaac Gouy, Haskell Cafe

On 23/05/2012, at 4:54 AM, Isaac Gouy wrote:
>> There may be very little one can do about the I/O part.
>
> Maybe you could say how the Java I/O is being done.

>> For 50,000 nodes and 8,385,254 edges,
>> Java (first version) ran out of memory after 89.54 seconds (default heap)
>> Java (2nd version) 13.31 seconds -- avoid Integer boxing!
>> Java (3rd version) 15.07 seconds
>> Smalltalk 14.52 seconds
>> C 2.36 seconds
>
> fwiw my expectation is that Java would not be that much slower than C, and would be considerably faster than Smalltalk.

My own experience is that Java is anywhere between 2 times slower and 150 times
slower than C. This is not for number crunching; one _would_ expect Java to be
about the same as C for that because it is working at the same level as C. But
string processing and text I/O using the java.io.* classes aren't brilliant.

Reader r = new BufferedReader(new InputStreamReader(System.in));

This "layers of wrappers" approach to text I/O adds overheads of its own, but
less than I'd thought. Using System.in directly takes the time down from
15.07 seconds to 11.11 seconds. The code used Character.isWhitespace(c) to
test for a white space character; replacing that by c <= ' ' brought the time
down further to 10.87 seconds.

With both of these changes, we are moving away from recommended good practice;
the faster code is the kind of code people are not supposed to write any more.

Characters are read one at a time using r.read(). There are no fast "skip
characters in this set" or "take characters in this set and return them as
a string" methods in the Reader or InputStream interfaces, and StreamTokenizer
reads characters one at a time using .read() also, so would be slower.

As for Smalltalk, you must be thinking of free Smalltalk systems that lacked a
JIT. Commercial Smalltalks have excellent JITs (many HotSpot ideas were copied
from them) and there is now a pretty good one for the free systems Squeak and
Pharo. These particular measurements were made using my own Smalltalk compiler
which is an oddity amongst Smalltalks: a whole program compiler that compiles
via C. Yes, most of the good ideas came from INRIA, although ST/X does
something not entirely dissimilar.

> fwiw my expectation is that it should be possible to make the Java program considerably faster.

I have used profiling to find where the time was going.
Approximately 70% is spent in the one method that reads a "word":
- a loop skipping white space characters
- a loop reading other characters and adding them to a StringBuilder
- [looking the string up in a HashMap and creating and entering a
new Node if necessary -- this time is not part of that 70%].

Any reasonable person would expect file reading to be buffered and
for the read-one-byte method to usually just fiddle a few integers
and fetch an element of an array. In fact the method is native, so
every character requires switching from the Java universe to the C
one and back. (The Smalltalk equivalent is pretty close to fgetc().)

The only way to make this Java program 'considerably faster' is to
not read characters (or bytes) in the natural Java way. (The way,
in fact, that java.io.StreamTokenizer does.)

And it's not INTERESTING, and it's not about LANGUAGES.
There is NOTHING about the Java language that makes code like this
necessarily slow. It's the LIBRARY. The java.io library was
designed for flexibility, not speed. That's why there is a java.nio
library.

Haskell is in pretty much the same boat. I've always liked the
I/O model in the Haskell report. I've expected simplicity from it,
not speed, and that's what I get. But as all the new approaches
to I/O (like iteratees, which make my head hurt) make perfectly
clear, the limitations of the IO module are not limitations of
Haskell, or of JHC, but of a particular library.

And that's the point I was making with this example. Why does
Smalltalk come out in the middle of the Java results? A balance
between a language penalty (tagged integer arithmetic is a lot
slower than native integer arithmetic) and a library bonus (a
leaner meaner I/O design where there are wrappers if you want
them but you very seldom need them). It's the great advantage
of using libraries rather than syntax: libraries can be changed.
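To put the Haskell half of that point in code: a minimal sketch (assuming GHC and the bytestring package, with word counting standing in for the real parsing) in which the "algorithm" is identical and only the I/O library changes:

import qualified Data.ByteString.Char8 as B

-- Prelude version: lazy String I/O, as in the Haskell Report.
countWordsString :: IO Int
countWordsString = fmap (length . words) getContents

-- Same shape, but reading strict ByteStrings instead.
countWordsBytes :: IO Int
countWordsBytes = fmap (length . B.words) B.getContents

main :: IO ()
main = print =<< countWordsBytes    -- swap in countWordsString to compare

On a large input the ByteString version is typically several times faster, for library reasons rather than language reasons, just as with java.io above.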
>
>> All I was saying is that surely the only *point* of
>> talking about the performance of *languages* is to get some idea of
>> how well programs are likely to do, and not any old specially crafted
>> code, but the kind of code that is likely to be written for purposes
>> other than benchmarking. So the more you bash on a particular example
>> to get the time down, the less representative it is of the kind of code
>> people are likely to write *for purposes other than benchmarking*.
>
> Certainly less representative of the kind of code students

Leave the students out of it. It's less representative of the kind of
code I see written by Sun or in Apache stuff.
>
> The "kind of code people are likely to write" is sometimes best described as bad.

At last, something we can agree about!
>
>> *That* function, no. *Some* hash function as a "primitive" (meaning
>> optimised C), well, I don't have every Smalltalk, but the ones I do
>> have, I've checked, and yes they do.
>
> Maybe I misunderstood the thrust of your original comment - Were you trying to make a point about writing in C and calling that code from Smalltalk as a user written primitive; or were you trying to make a point about the importance of using a better hash function?

Neither. I am making the point that many benchmarks benchmark library
code rather than languages or compilers per se, and that the concept of
"same algorithm" is as a result very fuzzy.
>
>
>>> Have you researched what code people are actually likely to write, or are
>> you just speculating? ;-)
>>
>> I'm in my 6th decade. I've seen a lot of code in a lot of languages.
>
> So just speculating.

How is "I have seen a lot of code" construed as "just speculating"?
I know what code people are likely to write because I have seen the
code that many people DID write. I know what code people are likely
to write in Java because I have seen more Java (written by people
who really should know what they are doing) than I ever wanted to see
AND because I know how they are TOLD to write.
>
>
>> The subject line has the obvious and boring answer "yes, of course".
>
> I agree, because of the implicit qualification - "if we write the C++ badly enough" :-)

Otherwise construed, "the way C++ is usually written".
C++ *can* be extremely fast. One of my colleagues is very good at it.
But one of his key rules is "never use the standard template library,
including C++ strings."

>> The point here of course is that there are data structures and algorithms
>> that are easy to express in some languages but hard in others, so that
>> in code written *for purposes other than benchmarking*, they just would
>> not be _used_ in one of those languages.
>
> Let's say that's true - until we show how much better programs using other data structures and algorithms perform those specific tasks, it's just an excuse.
>
> (And, depending on how hard it is to express those different data structures and algorithms in other languages, it might not even be a good excuse.)

My evidence may be anecdotal, but it is better than arguing with
NO evidence.

The example you are commenting on here is one where I reported an
emulated Prolog program running faster than native Fortran 77.
Recall that Fortran 77
- did not include recursion
- did not include pointers
- did not include user defined data types
The kind of dynamic tree construction and walking that is so
trivial and everyday in Haskell was extremely difficult to
express in Fortran 77.

Not necessarily impossible. A parser for English was once written
in Fortran, and I think that was before Fortran 77 even. But hard.

A programming language that supports concurrency, or backtracking,
or constraint programming, or type level arithmetic, or dependent
types, or ... will make some things thinkable and straightforward
to express that would be neither in a language without such
support.

wren ng thornton

unread,
May 22, 2012, 11:23:42 PM5/22/12
to haskel...@haskell.org
On 5/21/12 10:51 AM, Yves Parès wrote:
>> I do think we have the opposite problem, however, in much Haskell code --
> people are using the clean, obviously correct, but inefficient code even in
> standard library functions that really should be optimized like crazy!
>
> And even before optimizing "like crazy", I think the functions that are
> "more often" good should be emphasized: Prelude should export foldl'
> together with/instead of foldl, sum should have its sum' counterpart (or
> even be rewritten to use foldl'), and so on...
> It baffles me that functions that are said to be more efficient in the
> majority of cases are not the default.

Part of this is due to legacy and the Haskell Report. For example, the
definition of sum is specified by the Report. And unfortunately there
exist (rare) cases where the semantics of sum and sum' differ: namely
when the type supports lazy addition (e.g., Peano numbers) and therefore
can return a partial answer to an infinite summation.

That said, the GHC philosophy in most of these cases has been to:

(a) use their own definitions with a CPP flag USE_REPORT_PRELUDE in case
you *really* want the Report's definition[1], and

(b) to "secretly" replace foldl by foldl' in cases where it can
determine the function argument to be strict (and therefore that the
change doesn't alter the semantics).

That doesn't fix everything, and it's no justification for not having
the newer Reports give more sane definitions, but it's better than nothing.
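As a minimal sketch of why the distinction matters in practice (assuming GHC, compiled without -O so that the rewriting in (b) doesn't kick in):

import Data.List (foldl')

sumLazy, sumStrict :: [Int] -> Int
sumLazy   = foldl  (+) 0    -- accumulator grows as a chain of thunks
sumStrict = foldl' (+) 0    -- accumulator is forced at every step

main :: IO ()
main = do
  print (sumStrict [1 .. 10000000])   -- runs in constant space
  print (sumLazy   [1 .. 10000000])   -- same answer, but may need a huge
                                      -- amount of memory to evaluate

Both compute the same result; the only difference is how much memory it takes to get there, which is exactly the sum/sum' issue above.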

Part of the issue re fixing the Report is that there's a tension between
correctness and expediency, as well as between different notions of
"correctness". By and large, Haskell has managed to stay out of
theoretical arguments about choosing what "correct" means when it is
still controversial in mathematics. (Whence many of the complaints about
the non-mathematical nature of the numeric type classes.) But which of
the foldr vs foldl vs foldl' definitions of summation is the "correct"
definition? Mathematically, infinite summations should be allowed, and
summations of infinite quantities should be allowed. However,
pragmatically, infinite summations are of a different nature than finite
summations; e.g., when they converge it'd be nice to evaluate them in
finite time. So on the one hand, a naive view of "correctness" suggests
that we ought to allow these summations as just more of the same; but on
the other hand, an alternative notion of "correctness" suggests that
while infinite sums and sums of infinites should be handled, they should
be handled by different means than traditional finite sums of finite values.

The foldl' definition of summation only allows finite summations of
finite[2] values; the foldl definition only allows finite summations,
but they can be summations of infinite values; the foldr definition
allows both infinite values and infinitely many summands, though it's
not especially useful for handling infinite summations[3]. We can
probably exclude foldr with all fairness, and tell folks to handle
infinite summations in another way. But between foldl and foldl' there's
(room for) a lot more debate about which should be blessed by the
Prelude. Do we want to support lazy numbers out of the box, or do we
want to support performance for strict numbers out of the box? While
there's been a growing strictification of Haskell for performance
reasons, do recall that laziness was one of the initial reasons for
defining Haskell in the first place. Hence the old Report's definition.
The role of Haskell as a research engine has moved on since then, but
the decision to codify that is still non-trivial.


[1]
http://hackage.haskell.org/packages/archive/base/4.5.0.0/doc/html/src/Data-List.html#sum

[2] For the purpose of discussion, I'm considering the Float and Double
values for infinities to be "finite", because they are identical to the
finite values with respect to strictness and size of representation.

[3] It could take infinitely long to return, even for types which allow
lazily returning partial values. For example,

data Peano = Z | S Peano
instance Num Peano where ...

zeros = Z : zeros

-- Has to traverse the whole list, just in case there's a (S _) in
-- there somewhere, before it knows whether to return Z or (S _).
main = print (foldr (+) Z zeros)

--
Live well,
~wren

wren ng thornton

unread,
May 23, 2012, 12:30:17 AM5/23/12
to Haskell Cafe
On 5/22/12 12:54 PM, Isaac Gouy wrote:
> On 5/21/2012 6:54 PM, Richard O'Keefe wrote:
>> For 50,000 nodes and 8,385,254 edges,
>> Java (first version) ran out of memory after 89.54 seconds (default heap)
>> Java (2nd version) 13.31 seconds -- avoid Integer boxing!
>> Java (3rd version) 15.07 seconds
>> Smalltalk 14.52 seconds
>> C 2.36 seconds
>
> fwiw my expectation is that Java would not be that much slower than C, and would be considerably faster than Smalltalk.

FWIW, that matches my expectations pretty well. Naive/standard Java
performing slower than Smalltalk; highly tweaked Java using non-standard
data types performing on-par with or somewhat faster than Smalltalk.
That C is 7x faster is a bit on the high end, but for something like
tsort I could imagine it'd be possible.

Do bear in mind that Java doesn't optimize ---that's the JIT's job---
and that the standard datatypes for Java are only good enough to suffice
for casual use and to be better than *naively* implementing them
yourself. They are extremely mediocre if you have any sort of
"specialized" needs. If you know what you're doing in designing data
structures, you can get amazing mileage out of writing a hand-tuned
implementation that (1) avoids boxing native types, (2) uses a
non-generic structure which has good complexities for the kinds of
operations you'll actually be using, and (3) doesn't bother trying to
implement the ridiculous APIs of the standard data types.

But even still, in my experience of using Smalltalk, the standard data
structures are much better done and so they will be on-par with what
you'd get from hand-tuning for Java. I've spent a lot of time trying to
get decent performance out of Java, not so much with Smalltalk; but the
performance with Smalltalk was sufficient that it wasn't needed so badly.

--
Live well,
~wren

Yves Parès

unread,
May 23, 2012, 3:51:35 AM5/23/12
to wren ng thornton, haskel...@haskell.org
I understand your concerns about modifying the current ideology; fair enough.
Actually I'm more in favor of adding strict counterparts and exporting them alongside the lazy ones, to support both the mathematical roots and the performance of the operations that are done most of the time. (Which kind of numbers do developers work with more often: regular Ints and Doubles, or lazy Peano numbers?)


2012/5/23 wren ng thornton <wr...@freegeek.org>

Isaac Gouy

unread,
May 23, 2012, 12:39:42 PM5/23/12
to Haskell Cafe
> From: Richard O'Keefe <o...@cs.otago.ac.nz>
> Sent: Tuesday, May 22, 2012 7:59 PM

> But string processing and text I/O using the java.io.* classes aren't brilliant.

Wait just a moment - Are you comparing text I/O for C programs that process bytes against Java programs that process double-byte unicode?


-snip-

>  Using System.in directly takes the time down from 15.07 seconds to 11.11 seconds.
-snip-
> With both of these changes, we are moving away from recommended good practice;
> the faster code is the kind of code people are not supposed to write any more.

Says who? Is that on your own authority or some other source you can point us to?


-snip-
> As for Smalltalk, you must be thinking of free Smalltalk systems that lacked a
> JIT.  Commercial Smalltalks have excellent JITs (many HotSpot ideas were copied
> from them) ...

As for Smalltalk, I earned my crust with commercial Smalltalks for a decade.


> These particular measurements were made using my own Smalltalk compiler
> which is an oddity amongst Smalltalks: a whole program compiler that compiles
> via C.  Yes, most of the good ideas came from INRIA, although ST/X does
> something not entirely dissimilar.

Wait just a moment - you wrote "I didn't _think_ I'd omitted anything important" and now it turns out that the measurements were made using your personal Smalltalk implementation!

You have got to be joking.


>> fwiw my expectation is that it should be possible to make the Java program
>> considerably faster.

-snip-
> Any reasonable person would expect ...

I'm happy to hear what *you* would expect.

-snip-
> And it's not INTERESTING, and it's not about LANGUAGES.
> There is NOTHING about the Java language that makes code like this
> necessarily slow.  It's the LIBRARY.  The java.io library was
> designed for flexibility, not speed.  That's why there is a java.nio
> library. 

Here's the gorilla in the room question - So why doesn't your program use java.nio?


> And that's the point I was making with this example.  Why does
> Smalltalk come out in the middle of the Java results?  A balance
> between a language penalty (tagged integer arithmetic is a lot
> slower than native integer arithmetic) and a library bonus (a
> leaner meaner I/O design where there are wrappers if you want
> them but you very seldom need them).  It's the great advantage
> of using libraries rather than syntax: libraries can be changed.

No, that doesn't seem to be the case - if I'm misunderstanding what you've done then please correct me, but it seems that Smalltalk comes out in the middle of the Java results because you chose to use a Java library "designed for flexibility, not speed" and you chose to use that library in a way that slows the program down.


-snip-
> Neither.  I am making the point that many benchmarks benchmark library
> code rather than languages or compilers per se, and that the concept of
> "same algorithm" is as a result very fuzzy.

Thank you, for stating your point clearly.


-snip-
> How is "I have seen a lot of code" construed as "just speculating"?

You seem to be generalizing from what you recollect without any way to control for the particularities of the situations you observed, or the particularities of your recollection. You don't seem to have data - just memories.


-snip-
> My evidence may be anecdotal, but it is better than arguing with NO evidence.

imo It would be better to "show how much better programs using other data structures and algorithms perform those specific tasks" than brandish anecdotes from a past century.

Isaac Gouy

unread,
May 23, 2012, 4:26:47 PM5/23/12
to Haskell Cafe
> From: wren ng thornton <wr...@freegeek.org>

> Sent: Tuesday, May 22, 2012 9:30 PM

-snip-
> FWIW, that matches my expectations pretty well. Naive/standard Java performing
> slower than Smalltalk; highly tweaked Java using non-standard data types
> performing on-par with or somewhat faster than Smalltalk.

I have no difficulty believing that if you are talking about a 1996 Java reference implementation and a 1996 Smalltalk JIT VM.

I could believe that if you are comparing a naive Java program with a highly tweaked Smalltalk program.


> That C is 7x faster is a bit on the high end, but for something like tsort I could imagine it'd be possible.

It's possible because it's possible to write a Java program to be slower than it need be :-)


> Do bear in mind that Java doesn't optimize ---that's the JIT's job

What are we supposed to make of that?

Why write that and not -- Do bear in mind that Smalltalk doesn't optimize that's the JIT's job -- or -- Do bear in mind that C doesn't optimize that's the compiler's job.


-snip-
> But even still, in my experience of using Smalltalk, the standard data
> structures are much better done and so they will be on-par with what you'd
> get from hand-tuning for Java. I've spent a lot of time trying to get decent
> performance out of Java, not so much with Smalltalk; but the performance with
> Smalltalk was sufficient that it wasn't needed so badly.

Do you have a specific example that you can share?

Richard O'Keefe

unread,
May 24, 2012, 1:52:13 AM5/24/12
to Isaac Gouy, Haskell Cafe

On 24/05/2012, at 4:39 AM, Isaac Gouy wrote:

>> From: Richard O'Keefe <o...@cs.otago.ac.nz>
>> Sent: Tuesday, May 22, 2012 7:59 PM
>
>> But string processing and text I/O using the java.io.* classes aren't brilliant.
>
> Wait just a moment - Are you comparing text I/O for C programs that process bytes against Java programs that process double-byte unicode?

No. Amongst other things, I have my own ByteString and ByteStringBuilder
classes that are basically clones of String and StringBuilder, and using
them makes surprisingly little direct difference; the point is saving
memory.

I have obtained large speedups in Java using Java by dodging around the
Java libraries. Other people have reported the same to me.

>> With both of these changes, we are moving away from recommended good practice;
>> the faster code is the kind of code people are not supposed to write any more.
>
> Says who? Is that on your own authority or some other source you can point us to?

It looks increasingly as though there is no point in this discussion.
Is there ANY conceivable criticism of Java that will not elicit
ad hominem attacks from you?

I have read more Java textbooks than I wished to. I was on Sun's
Java techniques and tips mailing list for years. I could go on,
but is there, *really*, any point?
>
>> These particular measurements were made using my own Smalltalk compiler
>> which is an oddity amongst Smalltalks: a whole program compiler that compiles
>> via C. Yes, most of the good ideas came from INRIA, although ST/X does
>> something not entirely dissimilar.
>
> Wait just a moment - you wrote "I didn't _think_ I'd omitted anything important" and now it turns out that the measurements were made using your personal Smalltalk implementation!
>
> You have got to be joking.

Why? On various benchmarks, sometimes VisualWorks is better,
sometimes my system is better. My system is utterly naive,
incorporating almost none of the classic Smalltalk optimisations.

I redid the test using VisualWorks NonCommercial.
It took about twice as long as my Smalltalk did.
According to 'TimeProfiler profile: [...]',
98% of the time is in the load phase; half of that
is down to the hash table. A surprisingly small part
of the rest is due to actual input (ExternalReadStream>>next);
quite a bit goes into building strings and testing characters.

Why the difference?
With all due respect, VisualWorks still has the classic Smalltalk
implementation of hash tables. Mine is different. This is a
library issue, not a language issue.
One of the tasks in reading is skipping separators.
Since it's used a lot in parsing input, my library pushes that
right down to the bottom level of ReadStream and ChannelInputStream.
VisualWorks uses a single generic implementation that doesn't get
up to the low level dodges mine does. And so on.

All *library* issues, not *compiler* or *language* issues.

Which is the whole point of this thread, as far as I am concerned.
C, Java, Smalltalk: this real example is dominated by *library*
level issues, not language issues or compiler issues.

>> And it's not INTERESTING, and it's not about LANGUAGES.
>> There is NOTHING about the Java language that makes code like this
>> necessarily slow. It's the LIBRARY. The java.io library was
>> designed for flexibility, not speed. That's why there is a java.nio
>> library.
>
> Here's the gorilla in the room question - So why doesn't your program use java.nio?
>
Because that would be insane.

This is a program I originally whipped up in less than an hour
for two reasons:

(A) I wanted to provide some students with an example of a
"work-list" algorithm that had some realism to it.
For that purpose, the program had to be READABLE.

(B) To my astonishment, the tsort(1) programs in OpenSolaris
and Mac OS X 10.6.8 turned out to be grotesquely slow for
non-toy graphs. I was expecting to have a use for the
program myself, so as it stood, the Java version was
already quite fast enough to be useful. (As in, a LOT
faster than the system version, even though the system
version was written in C.)

The one issue I had with the first version was not time, but
space, so I explored two ways of making it take less space.

There is no NEED to rewrite the program to use java.nio;
having replaced the system version of the command the Java
version was no longer the bottleneck in my intended use.

For me personally, having no experience with java.nio,
it was *easier* to rewrite the program from scratch in C
than to overcome the java.nio learning curve. And in any
case, I knew very well that I could get near enough to the
same order of improvement using InputStream and wrapping
my own buffering code over that (I've done that before).
Above all, since the students were even less familiar with
nio than I am, using nio would have destroyed the program's
utility for purpose (A).

As for the Smalltalk version, I often rewrite small things
into Smalltalk in order to find out what I'm doing wrong in
my implementation.


>
>> And that's the point I was making with this example. Why does
>> Smalltalk come out in the middle of the Java results? A balance
>> between a language penalty (tagged integer arithmetic is a lot
>> slower than native integer arithmetic) and a library bonus (a
>> leaner meaner I/O design where there are wrappers if you want
>> them but you very seldom need them). It's the great advantage
>> of using libraries rather than syntax: libraries can be changed.
>
> No, that doesn't seem to be the case - if I'm misunderstanding what you've done then please correct me, but it seems that Smalltalk comes out in the middle of the Java results because you chose to use a Java library "designed for flexibility, not speed" and you chose to use that library in a way that slows the program down.

No, I chose to
- use the official Java plain text I/O library
- the way the official Java series books and tutorials
say it should be used
- with a MINIMUM of wrapper layers.

And it was FAST ENOUGH TO BE USEFUL.
No, I chose to use that library THE WAY IT IS INTENDED TO BE USED.
It is the simplest, most straightforward way to go.
It's the *same* "algorithm" that the C and Smalltalk versions use.

> imo It would be better to "show how much better programs using other data structures and algorithms perform those specific tasks" than brandish anecdotes from a past century.

"Past century"? Insults, is it?

As for "how much better programs using other data structures and algorithms
perform", this whole thread is about how well programs using the SAME data
structures and algorithms perform, and whether we can assign much meaning
to that. How could it possibly be better to do something irrelevant to the
topic?

Bryan O'Sullivan

unread,
May 24, 2012, 2:16:29 AM5/24/12
to Richard O'Keefe, Haskell Cafe
On Wed, May 23, 2012 at 10:52 PM, Richard O'Keefe <o...@cs.otago.ac.nz> wrote:
"Past century"?  Insults, is it?

Do you fine gentlemen absolutely have to continue this endless, offtopic, unedifying back-and-forth in public? Please.

Isaac Gouy

unread,
May 24, 2012, 3:00:28 AM5/24/12
to Haskell Cafe
Sorry Bryan, there are a couple of comments I should make a final reply to - I'll ignore the rest.




> From: Richard O'Keefe <o...@cs.otago.ac.nz>
> Sent: Wednesday, May 23, 2012 10:52 PM

-snip-
>> Says who? Is that on your own authority or some other source you can point
>> us to?
>
> It looks increasingly as though there is no point in this discussion.
> Is there ANY conceivable criticism of Java that will not elicit
> ad hominem attacks from you?

It isn't an ad hominem attack to ask you who's the authority that made some recommendation.


-snip-
>> Wait just a moment - you wrote "I didn't _think_ I'd omitted
>>  anything important" and now it turns out that the measurements were made
>>  using your personal Smalltalk implementation!
>>
>> You have got to be joking.
>
> Why?

Because you omitted basic information about the measurements you presented.


-snip-
>> imo It would be better to "show how much better programs using other
> data structures and algorithms perform those specific tasks" than brandish
> anecdotes from a past century.
>
> "Past century"?  Insults, is it?

No, it's an echo of the words you used - "...insanely difficult in Fortran 77.  This century's Fortran is of course another matter."

Ryan Newton

unread,
May 24, 2012, 10:51:18 AM5/24/12
to Isaac Gouy, Haskell Cafe
Oops, forgot to reply-to-all.  This was a minor clarification on Wren's behalf (he can correct me if I'm wrong).  But I agree with Bryan that it's time for the thread to die:
 
> Do bear in mind that Java doesn't optimize ---that's the JIT's job

What are we supposed to make of that?

Why write that and not -- Do bear in mind that Smalltalk doesn't optimize that's the JIT's job -- or -- Do bear in mind that C doesn't optimize that's the compiler's job.

I believe this was referring to the fact that javac isn't an aggressive optimizing compiler on the way from source to bytecode, i.e. it's the bytecode->asm leg where the optimization effort is focused.  

As an outsider to things Java that's something I've had trouble understanding, actually.  It doesn't seem like an either-or choice to me...

   -Ryan

Chris Dornan

unread,
May 24, 2012, 12:17:19 PM5/24/12
to Haskell Cafe
> Oops, forgot to reply-to-all.

Noooooooooooo! You had the right idea the first time. :-)

(Please excuse us while we chide you as humorously as we can into putting this thread out of its misery.)

Chris