
Re: Alternatives to C: ObjectPascal, Eiffel, Ada or Modula-3?


Mark T.B. Carroll

Jul 21, 2009, 9:25:47 AM
Jon Harrop <j...@ffconsultancy.com> writes:

> Pascal J. Bourguignon wrote:
>> Have a look at Haskell.
>
> He said performance was important.

And it's achievable in Haskell, especially with a bunch of the work
that's been done in GHC 6.6, 6.8, 6.10, and especially if you have
multiple cores. (I'm not just talking about things like short cut
fusion, but a bunch of library improvements like in ByteStrings.)
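For instance, a minimal sketch of why the strict ByteString API helps
(sumBytes is an illustrative name, not a library function): the whole
file sits in one contiguous packed buffer rather than a lazy list of
boxed Char cells.

import qualified Data.ByteString as B

-- Sum the bytes of a file using a strict, packed buffer.
sumBytes :: FilePath -> IO Int
sumBytes path = do
  bytes <- B.readFile path
  return (B.foldl' (\acc w -> acc + fromIntegral w) 0 bytes)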

The caveat - and you may consider it a showstopper, but I don't - is
that it's easy for a newbie to write grossly inefficient Haskell. It
takes some understanding of what GHC's up to and what fancy stuff is
available, and some use of the profiler, to really be able to eke good
performance out of the system, and you may not think it acceptable that
(a) it's not easy for new users to figure out how to get good
performance from it and (b) if a part of the code has to be much changed
to make it fast, that part often ends up looking rather ugly. (I'd be
interested to see if Haskell fans can rebut me on those points, but
I bet some of the language shootout code still looks horrifying.)

Admittedly, the last time I looked, Data.HashTable was still slower
than I'd have thought reasonable. But I figure that if I use it anyway
then someone'll fix it and I'll notice the improvement when I upgrade
GHC. (-:

[ followups trimmed ]

Mark

Jon Harrop

Jul 21, 2009, 11:41:00 AM
Mark T.B. Carroll wrote:
> Jon Harrop <j...@ffconsultancy.com> writes:
>> Pascal J. Bourguignon wrote:
>>> Have a look at Haskell.
>>
>> He said performance was important.
>
> And it's achievable in Haskell, especially with a bunch of the work
> that's been done in GHC 6.6, 6.8, 6.10, and especially if you have
> multiple cores. (I'm not just talking about things like short cut
> fusion, but a bunch of library improvements like in ByteStrings.)

To me, there seems to be overwhelming evidence to the contrary.

For example, the Burrows-Wheeler Transform has been an outstanding challenge
since it appeared on the Haskell Cafe mailing list over 2 years ago:

http://www.haskell.org/pipermail/haskell-cafe/2007-June/027387.html

In 30 minutes, I wrote an F# implementation that is as fast as C and an
OCaml implementation that gets within 3x the performance of C:

http://flyingfrogblog.blogspot.com/2009/07/ocaml-vs-f-burrows-wheeler.html

Two years later and the fastest known Haskell is still 200x slower than my
F#.

Quicksort is another trivial example where the reflex is to cite:

qsort [] = []
qsort (x:xs) = qsort (filter (< x) xs) ++ [x] ++ qsort (filter (>= x) xs)

which is obviously not quicksort and, by my measurements, is over 6,000x
slower than my F# when sorting 20k elements. You could not hope to sort
even 1M elements with that function.

> The caveat - and you may consider it a showstopper, but I don't - is
> that it's easy for a newbie to write grossly inefficient Haskell.

The problem is not that newbies can easily write inefficient Haskell but
that even the most seasoned experts cannot make all Haskell competitively
efficient.

> It
> takes some understanding of what GHC's up to and what fancy stuff is
> available, and some use of the profiler, to really be able to eke good
> performance out of the system, and you may not think it acceptable that
> (a) it's not easy for new users to figure out how to get good
> performance from it and (b) if a part of the code has to be much changed
> to make it fast, that part often ends up looking rather ugly. (I'd be
> interested to see if Haskell fans can rebut me on those points, but
> I bet some of the language shootout code still looks horrifying.)

I think it is way beyond the realms of "taking some understanding". Look at
the complete lack of performance-related information in all of the
literature on Haskell. Real World Haskell had the opportunity to pioneer
this but, instead, it contains misinformation about hash tables.

> Admittedly, the last time I looked, Data.HashTable was still slower
> than I'd have thought reasonable. But I figure that if I use it anyway
> then someone'll fix it and I'll notice the improvement when I upgrade
> GHC. (-:

Sure. Data.HashTable was never the real issue though. The perf bug in the
GHC run-time that afflicts all boxed arrays and has been outstanding for
over 6 years was the real problem.

I have consulted some people with far more Haskell-specific experience than
myself about the potential of using Haskell in the context of technical
computing and they unanimously agreed that it was nowhere near usable yet,
essentially for these reasons (and unpredictable space).

--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?u

Mark T.B. Carroll

Jul 21, 2009, 11:17:18 AM
Jon Harrop <j...@ffconsultancy.com> writes:

> Mark T.B. Carroll wrote:
>> Jon Harrop <j...@ffconsultancy.com> writes:
>>> Pascal J. Bourguignon wrote:
>>>> Have a look at Haskell.
>>>
>>> He said performance was important.
>>
>> And it's achievable in Haskell, especially with a bunch of the work
>> that's been done in GHC 6.6, 6.8, 6.10, and especially if you have
>> multiple cores. (I'm not just talking about things like short cut
>> fusion, but a bunch of library improvements like in ByteStrings.)
>
> To me, there seems to be overwhelming evidence to the contrary.
>
> For example, the Burrows-Wheeler Transform has been an outstanding challenge
> since it appeared on the Haskell Cafe mailing list over 2 years ago:
>
> http://www.haskell.org/pipermail/haskell-cafe/2007-June/027387.html

Hang on, you cite a thread where you replied to someone saying,

] The point being that lots of people look at Haskell and go "oh, that's
] very cute for writing trivial example code, but it can never be fast;
] for that you must use C or C++".

] Well, I altered the code, and it's *still* very short and very
] readable, and it's just as fast as the (3 pages long) C++ version.
] >:-D

and where it was also mentioned that BWT may be unusual because,

] bwt transformation is very good researched area, so probably you will
] not get decent performance (megabytes per second) without lot of work.

You're also picking a thread from a couple of years ago, which I'm
guessing doesn't include many of the GHC improvements I'm talking about
above. Or do you seriously imagine that the Haskell community has been
busily beavering away on this specific problem in the two years since
and made little more progress?

So, I'm sorry, you're losing me here, if you think that's a good example
of what you mean.

(snip)


> Two years later and the fastest known Haskell is still 200x slower than my
> F#.

Oooh, your F# is around 200 times faster than the C++ mentioned above
(seeing as the Haskell is described as `just as fast')?

> Quicksort is another trivial example where the reflex is to cite:
>
> qsort [] = []
> qsort (x:xs) = qsort (filter (< x) xs) ++ [x] ++ qsort (filter (>= x) xs)
>
> which is obviously not quicksort

(I'm curious: why isn't it?)

> and, by my measurements, is over 6,000x slower than my F# when sorting
> 20k elements. You could not hope to sort even 1M elements with that
> function.

I'd have hoped that the reflex is simply to use the library function
from Data.Sort, which has indeed been receiving attention over the past
couple of years - I think people have even played with mutable array
implementations, I don't know what the latest one uses. But, absolutely,
naive reflex code in Haskell can be grossly inefficient, as I already
said.

(snip)


> The problem is not that newbies can easily write inefficient Haskell but
> that even the most seasoned experts cannot make all Haskell competitively
> efficient.

Then why does it perform comparably with OCaml at
http://shootout.alioth.debian.org/gp4/benchmark.php?test=all&lang=all ?

(snip)


> I think it is way beyond the realms of "taking some understanding". Look at
> the complete lack of performance-related information in all of the
> literature on Haskell.

I'm sorry, but that really is absolute rubbish. Google is your friend.
Or just start at http://www.haskell.org/haskellwiki/Performance and
follow the links to the more specific pages.

Or, if you weirdly consider literature online to not be `literature',
search academic journals instead. For instance, consider
http://portal.acm.org/citation.cfm?id=1481845
Mind you, a bunch of the more recent papers tend to be multi-core
oriented.

I'm amused to read at http://www.haskell.org/haskellwiki/Performance/GHC

] Since GHC doesn't have any credible competition in the performance
] department these days it's hard to say what overly-slow means, so just
] use your judgement!

A significant fraction of the GHC effort over the past couple of years
has been on performance improvements. Various aspects of that routinely
get mentioned on the mailing lists.

(snip)


> Sure. Data.HashTable was never the real issue though. The perf bug in the
> GHC run-time that afflicts all boxed arrays and has been outstanding for
> over 6 years was the real problem.

I wonder how much difference it actually makes these days since the
relevant changes in GHC 6.6.

> I have consulted some people with far more Haskell-specific experience than
> myself about the potential of using Haskell in the context of technical
> computing and they unanimously agreed that it was nowhere near usable yet,
> essentially for these reasons (and unpredictable space).

You don't /have/ to predict space - the profiler's very nice. You just
follow some rules of thumb (some of which will be on the performance
pages I pointed you to) and squash any remaining leaks afterward once
you've seen where they happen. I'm hardly a Haskell expert - I don't
even understand half the mailing list traffic - and I'm already finding
that I introduce very few space leaks any more and can squash them all
without usually even needing the ticky-ticky profiling.
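As a concrete instance of those rules of thumb, consider the classic
accumulator leak (a sketch; the names are illustrative, not from the
post):

import Data.List (foldl')

-- foldl defers every (+) until the result is demanded, building a long
-- chain of thunks; foldl' forces the accumulator at each step and runs
-- in constant space.
leakySum, strictSum :: [Double] -> Double
leakySum  = foldl  (+) 0
strictSum = foldl' (+) 0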

Mark

Mark T.B. Carroll

Jul 21, 2009, 11:22:46 AM
"Mark T.B. Carroll" <Mark.C...@Aetion.com> writes:

> Jon Harrop <j...@ffconsultancy.com> writes:
(snip)
>> I think it is way beyond the realms of "taking some understanding". Look at
>> the complete lack of performance-related information in all of the
>> literature on Haskell.
>
> I'm sorry, but that really is absolute rubbish. Google is your friend.
> Or just start at http://www.haskell.org/haskellwiki/Performance and
> follow the links to the more specific pages.

(snip)

I should add, the Haskell Performance /chapter/ of
http://en.wikibooks.org/wiki/Haskell may also be of interest -
it's only the last couple of sections now that remain unfinished.

(It's a pity the parallelism one isn't done yet - GHC's threading these
days performs wonderfully quickly, at least on Linux where I use it, and
with STM as well it's so easy to write multithreaded code that works
well.)
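A minimal sketch of that STM style (transfer is an illustrative name,
not from any package): a composite read-check-write that retries
atomically until it can proceed.

import Control.Concurrent.STM

-- Move amount between two shared counters; check blocks and retries
-- the whole transaction until the source balance suffices.
transfer :: TVar Int -> TVar Int -> Int -> IO ()
transfer from to amount = atomically $ do
  balance <- readTVar from
  check (balance >= amount)
  writeTVar from (balance - amount)
  total <- readTVar to
  writeTVar to (total + amount)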

Mark

Jon Harrop

Jul 22, 2009, 5:10:45 AM
Mark T.B. Carroll wrote:
> Jon Harrop <j...@ffconsultancy.com> writes:
>> To me, there seems to be overwhelming evidence to the contrary.
>>
>> For example, the Burrows-Wheeler Transform has been an outstanding
>> challenge since it appeared on the Haskell Cafe mailing list over 2 years
>> ago:
>>
>> http://www.haskell.org/pipermail/haskell-cafe/2007-June/027387.html
>
> Hang on, you cite a thread where you replied to someone saying,
>
> ] The point being that lots of people look at Haskell and go "oh, that's
> ] very cute for writing trivial example code, but it can never be fast;
> ] for that you must use C or C++".
>
> ] Well, I altered the code, and it's *still* very short and very
> ] readable, and it's just as fast as the (3 pages long) C++ version.
> ] >:-D

Turns out that was a Haskeller comparing with bad C++ in order to make
Haskell look good:

http://www.mail-archive.com/haskell-cafe%40haskell.org/msg25615.html

> and where it was also mentioned that BWT may be unusual because,
>
> ] bwt transformation is very good researched area, so probably you will
> ] not get decent performance (megabytes per second) without lot of work.

I got decent performance without a lot of work.

> You're also picking a thread from a couple of years ago, which I'm
> guessing doesn't include many of the GHC improvements I'm talking about
> above.

We redid the test this week on c.l.functional and the results were the same:
the fastest Haskell is still orders of magnitude slower.

> (snip)
>> Two years later and the fastest known Haskell is still 200x slower than
>> my F#.
>
> Oooh, your F# is around 200 times faster than the C++ mentioned above
> (seeing as the Haskell is described as `just as fast')?

Exactly, yes.

>> Quicksort is another trivial example where the reflex is to cite:
>>
>> qsort [] = []
>> qsort (x:xs) = qsort (filter (< x) xs) ++ [x] ++ qsort (filter (>= x)
>> xs)
>>
>> which is obviously not quicksort
>
> (I'm curious: why isn't it?)

Lennart Augustsson explained:

http://augustss.blogspot.com/2007/08/quicksort-in-haskell-quicksort-is.html

In essence, quicksort is an in-place algorithm. Indeed, that is what puts
the "quick" in "quicksort".

>> and, by my measurements, is over 6,000x slower than my F# when sorting
>> 20k elements. You could not hope to sort even 1M elements with that
>> function.
>
> I'd have hoped that the reflex is simply to use the library function
> from Data.Sort, which has indeed been receiving attention over the past
> couple of years

List.sort is 300x slower than my F# here.

> I think people have even played with mutable array implementations, I
> don't know what the latest one uses.

Great! Do any of them get within 10x the performance of conventional
implementations (e.g. my F#)?

> (snip)
>> The problem is not that newbies can easily write inefficient Haskell but
>> that even the most seasoned experts cannot make all Haskell competitively
>> efficient.
>
> Then why does it perform comparably with OCaml at
> http://shootout.alioth.debian.org/gp4/benchmark.php?test=all&lang=all ?

That does not address my counter examples.

> (snip)
>> I think it is way beyond the realms of "taking some understanding". Look
>> at the complete lack of performance-related information in all of the
>> literature on Haskell.
>
> I'm sorry, but that really is absolute rubbish. Google is your friend.
> Or just start at http://www.haskell.org/haskellwiki/Performance and
> follow the links to the more specific pages.

You're joking. One article is about stack overflows, which is irrelevant.
Another is about basic tail recursion. The largest article essentially
says "don't evaluate what you don't need". There is only one actual
comparison of performance measurements in the entire thing!

Here is a list of all of the case studies given in those articles:

1. Sum a list of numbers.

2. Extract even and odd elements from a list.

3. Mergesort.

4. List length.

5. Average a list of numbers.

In each case, they examined fewer than a dozen lines of code. Compare that
with even the literature on OCaml performance that I wrote myself in the
past 2 years:

1. Optimized bytecode interpreting.

2. Optimized bytecode compilation to native code.

3. FFT.

4. Lexing and parsing.

5. Ray tracing.

6. FFI.

7. SciMark2 benchmark.

8. FEMS.

9. Fork-based parallelism with marshalling.

10. Mesh generation.

11. Huffman.

12. Traveling salesman with simulated annealing.

13. Regular expressions.

14. Fluid dynamics.

15. Flocking.

16. Global optimization.

17. LZW.

18. Singular value decomposition.

19. QR decomposition.

Each case study is 100-200 lines of code with performance measurements and a
walkthrough of the optimization process. In all cases, the performance
obtained from OCaml is within 3x that of C.

To the best of my knowledge there is nothing comparable in the Haskell
world. For example, I cannot even find a Haskell port of SciMark2.

> Or, if you weirdly consider literature online to not be `literature',
> search academic journals instead. For instance, consider
> http://portal.acm.org/citation.cfm?id=1481845
> Mind you, a bunch of the more recent papers tend to be multi-core
> oriented.

That discussion of linked lists is better than nothing but I would much
rather see case studies of the optimization of solutions to real problems,
like the ones I just listed.

> I'm amused to read at http://www.haskell.org/haskellwiki/Performance/GHC
>
> ] Since GHC doesn't have any credible competition in the performance
> ] department these days it's hard to say what overly-slow means, so just
> ] use your judgement!
>
> A significant fraction of the GHC effort over the past couple of years
> has been on performance improvements. Various aspects of that routinely
> get mentioned on the mailing lists.

I read the mailing list. The Burrows-Wheeler challenge from 2007 was one of
the few threads that caught my eye because it is a nice real-world example.

> (snip)
>> Sure. Data.HashTable was never the real issue though. The perf bug in the
>> GHC run-time that afflicts all boxed arrays and has been outstanding for
>> over 6 years was the real problem.
>
> I wonder how much difference it actually makes these days since the
> relevant changes in GHC 6.6.

Why would it make a difference if the bug was never fixed?

Adrian Hey

Jul 22, 2009, 11:39:57 AM
Hello Mark,

Mark T.B. Carroll wrote:
> You don't /have/ to predict space - the profiler's very nice. You just
> follow some rules of thumb (some of which will be on the performance
> pages I pointed you to) and squash any remaining leaks afterward once
> you've seen where they happen.

But a Haskeller (of all people :-) should realise you can't prove
absence of space leaks empirically. Anyway, what will heap profiling
with ghc tell you about space behaviour with jhc or any other compiler?
(not much, see my other post). In that sense profiling is even less
satisfactory than testing correctness with quickcheck/smallcheck or
whatever.

OK, I admit making precise statements about space behaviour (let alone
proving them) is not a trivial thing. But one might reasonably hope for
some consistency across implementations, optimisation settings,
versions etc.

Also, squashing leaks may not be so simple in practice. Do you remember
the hoops folk had to jump through to implement a non-leaky histogram
with Data.Map before insertWith' was added?
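For those who missed that episode, the histogram in question looks
something like this (a sketch; histogram and bump are illustrative
names):

import Data.List (foldl')
import qualified Data.Map as M

-- With the lazy insertWith, each count becomes a deeper (+) thunk;
-- insertWith' forces the combined value as it is inserted.
histogram :: Ord a => [a] -> M.Map a Int
histogram = foldl' bump M.empty
  where bump m k = M.insertWith' (+) k 1 m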

Regards
--
Adrian Hey

Isaac Gouy

Jul 22, 2009, 1:14:39 PM

Jon Harrop

Jul 22, 2009, 2:48:43 PM
Jon Harrop wrote:

> Mark T.B. Carroll wrote:
>> Then why does it perform comparably with OCaml at
>> http://shootout.alioth.debian.org/gp4/benchmark.php?test=all&lang=all ?
>
> That does not address my counter examples.

Actually, it provides another example. Haskell is 3x slower and 4x more
verbose on Fannkuch:

http://shootout.alioth.debian.org/u32/benchmark.php?test=fannkuch&lang=all

Note that OCaml gets within 20% of C's performance.

Isaac Gouy

Jul 22, 2009, 2:26:59 PM
On Jul 22, 11:48 am, Jon Harrop <j...@ffconsultancy.com> wrote:
> Jon Harrop wrote:
> > Mark T.B. Carroll wrote:
> >> Then why does it perform comparably with OCaml at
> >> http://shootout.alioth.debian.org/gp4/benchmark.php?test=all&lang=all ?

>
> > That does not address my counter examples.
>
> Actually, it provides another example. Haskell is 3x slower and 4x more
> verbose on Fannkuch:
>
> http://shootout.alioth.debian.org/u32/benchmark.php?test=fannkuch&lan...

>
> Note that OCaml gets within 20% of C's performance.

Actually, it provides another /selective/ example.

Apparently being selective with the evidence wasn't enough because you
then rounded up the multiples in favour of OCaml.


The same OCaml program that is faster than the Haskell #5 program when
both are forced to use a single core is slower than the Haskell
program when both are allowed to use quadcore -

http://shootout.alioth.debian.org/u32q/fulldata.php?test=fannkuch&p1=ghc-5&p2=ocaml-1&p3=ghc-5&p4=ocaml-1


And the more verbose re-written OCaml #2 program is also slower than
the Haskell #5 program when both are allowed to use quadcore -

http://shootout.alioth.debian.org/u32q/fulldata.php?test=fannkuch&p1=ghc-5&p2=ocaml-2&p3=ghc-5&p4=ocaml-2


Mark T.B. Carroll

Jul 26, 2009, 12:42:47 AM
Jon Harrop <j...@ffconsultancy.com> writes:

> Mark T.B. Carroll wrote:
(snip)


>> You're also picking a thread from a couple of years ago, which I'm
>> guessing doesn't include many of the GHC improvements I'm talking about
>> above.
>
> We redid the test this week on c.l.functional and the results were the same:
> the fastest Haskell is still orders of magnitude slower.

Given that it's over a decade since I last implemented the BWT, I had a
quick go today to see how mine did relative to the one you mentioned. In
fairness to the one in the mailing list article you posted a link to, I
followed it in going via the inefficient native String implementation
instead of just reading and writing IOUArrays of Word8 directly from
disk, I maintained its inclusion of an explicit EOF, I found myself
inventing a really weird mergesort that I'm sure is suboptimally
nonstandard, and I resisted paralellizing my sort code. There might be
bugs, but not apparently very consequential ones - I did check against a
couple of small test vectors I found online, and do note the asserts at
the end of the code. But even given all that, before starting to profile
and optimize I seem to be beating that other version by over 10x for a
block size of 2Mb, and I haven't even bothered considering trying
something like a suffix tree sort instead. However, I've now put enough
time in that I'll leave it for others to point out room for
improvement in my code. Given your previous query, I chose to use
IOUArrays so you could see them in action.

(snip)


> Lennart Augustsson explained:
>
> http://augustss.blogspot.com/2007/08/quicksort-in-haskell-quicksort-is.html
>
> In essence, quicksort is an in-place algorithm. Indeed, that is what puts
> the "quick" in "quicksort".

Looking around the web, people seem to have their own ideas on what
makes a sort a quicksort. However, for your interest, I included a
simple in-place quicksort implementation in my code. (Which is only used
for small lists, given that the BWT needs good worst-case complexity.)

>> I think people have even played with mutable array implementations, I
>> don't know what the latest one uses.
>
> Great! Do any of them get within 10x the performance of conventional
> implementations (e.g. my F#)?

I don't know off-hand. It hasn't been an issue for me yet; you'd have to
ask others. (I can't check your F# because we're Linux all the way.)

(snip)


>> Then why does it perform comparably with OCaml at
>> http://shootout.alioth.debian.org/gp4/benchmark.php?test=all&lang=all ?
>
> That does not address my counter examples.

It addresses your point about competitive efficiency with a large
number of examples.

(I've only noticed you mention BWT so far.)

>> (snip)
>>> I think it is way beyond the realms of "taking some understanding". Look
>>> at the complete lack of performance-related information in all of the
>>> literature on Haskell.
>>
>> I'm sorry, but that really is absolute rubbish. Google is your friend.
>> Or just start at http://www.haskell.org/haskellwiki/Performance and
>> follow the links to the more specific pages.
>
> You're joking. One article is about stack overflows, which is irrelevant.
> Another is about basic tail recursion. The largest article essentially
> says "don't evaluate what you don't need". There is only one actual
> comparison of performance measurements in the entire thing!

You're obviously not actually looking. There's a /lot/ more than that
just one click from that URL, let alone the other I also provided. I
think I shall be content to let other readers look for themselves rather
than relying on our characterizations though.

(snip)


>> Or, if you weirdly consider literature online to not be `literature',
>> search academic journals instead. For instance, consider
>> http://portal.acm.org/citation.cfm?id=1481845
>> Mind you, a bunch of the more recent papers tend to be multi-core
>> oriented.
>
> That discussion of linked lists is better than nothing

... and was just an example of what can be found, hence my suggestion to
`search'! Indeed, a good number of the papers are in
performance-oriented journals. (Again, I don't think I'm going to try to
seek your agreement there, I'll be content for interested readers to
look for themselves.)

> I read the mailing list. The Burrows Wheeler challenge from 2007 was one of
> the few threads that caught my eye because it is a nice real-world
> example.

Do you really think I'm suggesting that performance can be got from
Haskell without having real-world examples in mind? My employer's been
using Haskell since 2001 for most software development. For instance,
two large projects we have going right now in Haskell are (a) fusion of
incoming sensor reports for battlefield situation awareness and enemy
intent inference, and (b) computational docking of a large set of
ligands with proteins, without prior hints as to where the active sites
are - for both of which, performance is rather critical. However, this
is closed-source proprietary code, so I can't really share much about
it. But certainly don't imagine that there aren't real people using
Haskell to create performance-sensitive code for paying customers who
have real-world problems that need to be solved (hence my amazement at
your assertions). (An aside: I find it hard to imagine that the
investment bankers' interest in it isn't performance-sensitive either,
given the high speed at which algorithmic trading occurs these days -
and they're going to tell you about as much about their code as I can
about my employer's.)

>> (snip)
>>> Sure. Data.HashTable was never the real issue though. The perf bug in the
>>> GHC run-time that afflicts all boxed arrays and has been outstanding for
>>> over 6 years was the real problem.
>>
>> I wonder how much difference it actually makes these days since the
>> relevant changes in GHC 6.6.
>
> Why would it make a difference if the bug was never fixed?

Because, as you can well see from the activity on the bug report you
linked to, other relevant changes in GHC 6.6 touched it and might have
ameliorated it.

I'm getting the sense that you've already made your mind up, though. I'm
happy to support open-minded inquiry, but I'm not about to try to force
someone's mind to change. I'll leave it to others to correct any
remaining misapprehensions you have.

Of course, you will insist on seeing my code. Most of it's just a pair
of in-place sorts given that you seemed interested in such.

module BWT (main) where
import Control.Exception
import Control.Monad
import Control.Parallel.Strategies
import Data.Array.IO
import Data.Array.Unboxed
import Data.Char
import Data.Function (on)
import Data.List
import System.Time

-- previous code

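-- zipWith' is lazy in its second list (note the irrefutable ~ pattern),
-- which lets rbwt1 below tie the knot on its own recursive output.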
zipWith' f [] _ = []
zipWith' f (x:xs) ~(y:ys) = f x y : zipWith' f xs ys

bwt1 :: String -> String

bwt1 = map head . sortBy (compare `on` tail) . init . tails . ('\0':)

rbwt1 :: String -> String

rbwt1 xs = tail $ zipWith' (flip const) xs $ head res
where res = sortBy (compare `on` head) (zipWith' (:) xs res)

-- new code

foldM' :: Monad m => (b -> a -> m b) -> b -> [a] -> m b

foldM' _ a _ | a `seq` False = error "never get here"

foldM' _ a [] = return a

foldM' f a (x:xs) =
do fa <- f a x
foldM' f fa xs

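-- Order two cyclic rotations of the block, identified by their start
-- indices; i0 marks where the walk began so we stop after one full cycle.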
comparatorFor :: UArray Int Char -> Int -> Int -> Ordering

comparatorFor characters i1 i2 =
compareStrings i1 i1 i2
where
(start, end) = bounds characters
compareStrings i0 i1 i2 =
case compare (characters ! i1) (characters ! i2) of
EQ -> let i1' = if i1 == end then start else succ i1
i2' = if i2 == end then start else succ i2
in if i0 == i1' then EQ else compareStrings i0 i1' i2'
cmp -> cmp

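-- Mergesort on a mutable array with a scratch buffer of the same size,
-- dropping to the in-place quickSort below for small slices.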
{-# SPECIALIZE arraySort :: (Int -> Int -> Ordering) -> IOUArray Int Int -> IO () #-}
{-# SPECIALIZE arraySort :: (Char -> Char -> Ordering) -> IOUArray Int Char -> IO () #-}

arraySort compare array =
do bounds@(left, right) <- getBounds array
temp <- newArray_ bounds
inTemp <- sortSlice array temp left right False
when inTemp $ sequence_ [ readArray temp i >>= writeArray array i | i <- [left..right] ]
where
sortSlice array temp left right inTemp =
do let mid = div (left + right) 2
if right - left > 25
then do inTemp1 <- sortSlice array temp left mid inTemp
inTemp2 <- sortSlice array temp (succ mid) right inTemp
case (inTemp1, inTemp2) of
(False, False) -> merge array temp left left mid (succ mid) right >> return True
(True , False) -> do sequence_ [ readArray array i >>= writeArray temp i | i <- [ succ mid .. right ] ]
merge temp array left left mid (succ mid) right >> return False
(False, True ) -> do sequence_ [ readArray array i >>= writeArray temp i | i <- [ left .. mid ] ]
merge temp array left left mid (succ mid) right >> return False
(True , True ) -> merge temp array left left mid (succ mid) right >> return False
else do quickSort (if inTemp then temp else array) left right
return inTemp
merge from to z sx ex sy ey =
case (sx <= ex, sy <= ey) of
(False, False) -> return ()
(True , False) -> sequence_ [ readArray from xi >>= writeArray to zi | (xi, zi) <- zip [sx..ex] [z..] ]
(False, True ) -> sequence_ [ readArray from yi >>= writeArray to zi | (yi, zi) <- zip [sy..ey] [z..] ]
(True, True ) -> do xElement <- readArray from sx
yElement <- readArray from sy
if compare xElement yElement /= GT
then writeArray to z xElement >> merge from to (succ z) (succ sx) ex sy ey
else writeArray to z yElement >> merge from to (succ z) sx ex (succ sy) ey
quickSort array left right =
when (left < right) $
do pivot <- partition array left right (div (left + right) 2)
quickSort array left (pred pivot)
quickSort array (succ pivot) right
partition array left right pivot
| left < pivot =
do leftElement <- readArray array left
pivotElement <- readArray array pivot
if compare leftElement pivotElement == GT
then do writeArray array left pivotElement
writeArray array pivot leftElement
partition array left right left
else partition array (succ left) right pivot
| right > pivot =
do rightElement <- readArray array right
pivotElement <- readArray array pivot
if compare rightElement pivotElement == LT
then do writeArray array right pivotElement
writeArray array pivot rightElement
partition array left right right
else partition array left (pred right) pivot
| otherwise = return pivot

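-- Forward transform: append a '\0' sentinel, sort the rotation start
-- indices with arraySort, and emit the character preceding each sorted
-- rotation (the last column).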
bwt2 :: String -> IO String

bwt2 string =
do let size = length string
let packedString = listArray (0, size) (string ++ "\0")
indices <- newListArray (0, size) [0..] :: IO (IOUArray Int Int)
arraySort (comparatorFor packedString) indices
indexList <- getElems indices
return $ map ((packedString !) . (`mod` (succ size)) . pred) indexList

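-- Inverse transform: sort the transformed string to recover the first
-- column, record where each character's run of rows begins, then walk
-- the resulting permutation back from the sentinel row.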
rbwt2 :: String -> IO String

rbwt2 string =
do let size = (1, length string)
let packedString = listArray size string
sortedString <- thaw packedString
arraySort compare sortedString
firstOccurrences <- newArray_ ('\0', '\255')
foldM_ (findFirstOccurrences sortedString firstOccurrences) '\256' [ 1 .. length string ]
howManyUsed <- newArray ('\0', '\255') 0
paths <- newArray_ size
forM_ [ 1 .. length string ] $ findPaths paths packedString firstOccurrences howManyUsed
let i = head [ i | i <- [ 1 .. length string ], packedString ! i == '\0' ]
decodeString sortedString paths i
where
findFirstOccurrences sortedString firstOccurrences previous i =
do char <- readArray sortedString i
when (char /= previous) $ writeArray firstOccurrences char i
return char
findPaths :: IOUArray Int Int -> UArray Int Char -> IOUArray Char Int -> IOUArray Char Int -> Int -> IO ()
findPaths paths packedString firstOccurrences howManyUsed i =
do let char = packedString ! i
firstOfChar <- readArray firstOccurrences char
howManyOfChar <- readArray howManyUsed char
writeArray howManyUsed char (succ howManyOfChar)
writeArray paths (firstOfChar + howManyOfChar) i
decodeString :: IOUArray Int Char -> IOUArray Int Int -> Int -> IO String
decodeString sortedString paths i =
do originalHead <- readArray sortedString i
if originalHead == '\0'
then return []
else do originalTail <- readArray paths i >>= decodeString sortedString paths
return $ originalHead : originalTail

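-- Read the first 2^21 bytes of the kernel image, run both BWT round
-- trips over it, and time each.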
main =
do linux <- readFile "/boot/vmlinuz"
let testString = take (2^21) linux
seq (sum [ ord char | char <- testString ]) $ putStrLn "Read test string into memory"
putStrLn "Starting first test"
start1 <- getClockTime
let testString1 = rbwt1 (bwt1 testString)
putStrLn "Finished first test" `demanding` rnf testString1
end1 <- getClockTime
putStrLn "Starting second test"
start2 <- getClockTime
testString2 <- bwt2 testString >>= rbwt2
putStrLn "Finished second test" `demanding` rnf testString2
end2 <- getClockTime
assert (testString == testString1) $
putStrLn $ "Time for first test: " ++ timeDiffToString (diffClockTimes end1 start1)
assert (testString == testString2) $
putStrLn $ "Time for second test: " ++ timeDiffToString (diffClockTimes end2 start2)

Here's a use of it -

~/src/haskell$ ghc --version
The Glorious Glasgow Haskell Compilation System, version 6.10.3
~/src/haskell$ ghc --make -O2 -o bwt BWT.hs -main-is BWT
[1 of 1] Compiling BWT ( BWT.hs, BWT.o )
Linking bwt ...
~/src/haskell$ ./bwt +RTS -K1G
Read test string into memory
Starting first test
Finished first test
Starting second test
Finished second test
Time for first test: 183 secs
Time for second test: 17 secs
~/src/haskell$

Note that the time is for doing /and/ reversing the BWT. I'd be
interested to know how it compares to your F# code.

My implementation's probably a bit on crack, I know; I just knocked it
together with half-remembered, half-understood stuff. But it's really
not as hard as you make out, given that it's already working fairly
quickly and correctly despite obvious room for improvement along the
lines I mentioned and others. That thing you pointed to on the mailing
list certainly wasn't representative - otherwise we'd have dropped
Haskell use years ago.

Mark

Mark T.B. Carroll

Jul 26, 2009, 12:44:16 AM
Isaac Gouy <igo...@yahoo.com> writes:

> Those are year old measurements which are not updated.
>
> These are the current measurements:

...

Ah, thank you. They're encouraging. (-:

Mark

Mark T.B. Carroll

Jul 26, 2009, 12:50:33 AM
Adrian Hey <ah...@NoSpicedHam.iee.org> writes:

> Hello Mark,
>
> Mark T.B. Carroll wrote:
>> You don't /have/ to predict space - the profiler's very nice. You just
>> follow some rules of thumb (some of which will be on the performance
>> pages I pointed you to) and squash any remaining leaks afterward once
>> you've seen where they happen.
>
> But a Haskeller (of all people :-) should realise you can't prove
> absence of space leaks empirically. Anyway, what will heap profiling
> with ghc tell you about space behaviour with jhc or any other
> compiler?

Not much, I assume. Well, I expect good things of jhc once you can
actually get a compiled binary out of it, and yhc can be surprisingly
good at times. But the Haskell 98 spec does leave some wiggle room over
when evaluation happens, among other things, so you probably can't be
sure without some rather more careful thinking than I usually perform.
You'd certainly want to look again if you switched compiler (which I
don't think we've ever done).

(snip)


> OK, I admit making precise statements about space behaviour (let alone
> proving them) is not a trivial thing. But one might reasonably hope for
> some consistency across implementations, optimisation settings,
> versions etc.

Yes, though it is nice that multiple implementations exist at all! And
frequently it's just a matter of adding a bit more strictness here and
there if there is some odd problem.

> Also, squashing leaks may not be so simple in practice. Do you remember
> the hoops folk had to jump through to implement a non-leaky histogram
> with Data.Map before insertWith' was added?

Oh, there's an insertWith' now? Cool. Data.Set and Data.Map have been
bugbears of mine: I have to remember to force them each step if I am
accumulating them in a fold or something, and the lazy `With' operator
stuff on the values has on occasion had me just grab the source and make
my own tweaked version that does what I'm guessing insertWith' now does.

Mark

Mark T.B. Carroll

Jul 26, 2009, 1:19:54 AM
"Mark T.B. Carroll" <Mark.C...@Aetion.com> writes:

> Jon Harrop <j...@ffconsultancy.com> writes:
(snip)

>> In essence, quicksort is an in-place algorithm. Indeed, that is what puts
>> the "quick" in "quicksort".

(snip)
> import Data.Array.IO

I should mention, for fast mutation of arrays of 8-bit characters,
you'll find the Data.ByteString libraries impressive. A more sensible
BWT implementation would probably use them, or at least Word8 instead
of the arrays of Char I used, given that a Char may be rather wider
than 8 bits; most of my code should be fine with that change - just
fiddle the '\256', get rid of String, and work directly on files with
the rapid I/O for arrays of Word8 that the libraries offer.
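A sketch of that change (loadBlock and byteAt are illustrative names,
not part of the code above):

import qualified Data.ByteString as B
import Data.Word (Word8)

-- Read a block of the input straight into a strict, packed buffer.
loadBlock :: FilePath -> Int -> IO B.ByteString
loadBlock path blockSize = fmap (B.take blockSize) (B.readFile path)

-- O(1) indexing, with no Char boxing in sight.
byteAt :: B.ByteString -> Int -> Word8
byteAt = B.index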

Though, my larger point is that if you're interested in fast Haskell
code that solves real-world problems, you may be illuminated by some of
the links from http://www.cse.unsw.edu.au/~dons/papers/CSL06.html -
at least then I don't have to appeal to things I can't share here!

Mark

Mark T.B. Carroll

Jul 26, 2009, 8:53:01 AM
"Mark T.B. Carroll" <Mark.C...@Aetion.com> writes:

> Given that it's over a decade since I last implemented the BWT, I had a
> quick go today to see how mine did relative to the one you mentioned.

Jon - I should have mentioned, I picked one near the end of that thread
that people seemed to like, guessing it was one of the better ones - I
didn't know which one you considered `best'. It's easy enough to plug in
others, of course.

Mark

Mark T.B. Carroll

Jul 26, 2009, 2:41:30 PM
"Mark T.B. Carroll" <Mark.C...@Aetion.com> writes:

> ~/src/haskell$ ./bwt +RTS -K1G

I was thinking about how I was a bit disappointed by the speed, and it
occurs to me that it's probably hemorrhaging stack somewhere. My usual approach
to that would be to lay some cost centers around to profile the stack
usage; usually it means I need to force evaluation in each step of some
large loop or recursion, and in this case I wonder if the arithmetic's
somehow to blame. That's the kind of efficiency bug that I seem to spend
time creating and squashing, not that I have time right now for things
people aren't paying me for.

Still, it opens the larger point of what rules of thumb I still need to
learn to manage stack usage better.
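One such rule of thumb, sketched on a toy loop (sumTo is an
illustrative name): force the accumulator on every iteration so
deferred arithmetic can't pile up.

sumTo :: Int -> Int
sumTo n = go 0 1
  where
    go acc i
      | i > n     = acc
      | otherwise = let acc' = acc + i
                    in acc' `seq` go acc' (i + 1)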

By coincidence, the actual work I'm doing today is also full of arrays.

Mark

Paul Rubin

Jul 26, 2009, 3:54:45 PM
"Mark T.B. Carroll" <Mark.C...@Aetion.com> writes:
> Looking around the web, people seem to have their own ideas on what
> makes a sort a quicksort. However, for your interest, I included a
> simple in-place quicksort implementation in my code. (Which is only used
> for small lists, given that the BWT needs good worst-case complexity.)

Why not use uvector-algorithms? It has some in-place sorts, it
supports stream fusion, etc. I had been thinking of trying to adapt
Ertugrul's BWT to use uvector and run a few measurements.

http://hackage.haskell.org/package/uvector-algorithms
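uvector-algorithms was later superseded by the vector-algorithms
package; the equivalent in-place sort there looks like this (a sketch
of the successor API, not the 2009 one):

import qualified Data.Vector.Unboxed as V
import qualified Data.Vector.Algorithms.Intro as Intro

-- V.modify runs the in-place introsort on a mutable copy and freezes it.
sortUnboxed :: V.Vector Int -> V.Vector Int
sortUnboxed = V.modify Intro.sort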

Mark T.B. Carroll

Jul 26, 2009, 5:44:46 PM
Paul Rubin <http://phr...@NOSPAM.invalid> writes:

> "Mark T.B. Carroll" <Mark.C...@Aetion.com> writes:
>> Looking around the web, people seem to have their own ideas on what
>> makes a sort a quicksort. However, for your interest, I included a
>> simple in-place quicksort implementation in my code. (Which is only used
>> for small lists, given that the BWT needs good worst-case complexity.)
>
> Why not use uvector-algorithms? It has some in-place sorts, it
> supports stream fusion, etc.

Mostly because I'd never noticed it before. (-: It even seems to have
radix sort which I think the original BWT report suggested might be
useful for some part.

Though I don't know if we do have to get into suffix trees for something
really good.

> I had been thinking of trying to adapt Ertugul's BWT to use uvector
> and run a few measurements.

That's an interesting idea. I'd be astonished if something fairly
efficient were very far from something somebody's already posted.

Mark

Jon Harrop

Jul 28, 2009, 1:02:42 PM

I would be *very* interested to see this if anyone does it!

Jon Harrop

Jul 28, 2009, 1:03:26 PM
Mark T.B. Carroll wrote:
> I'd be astonished if something fairly efficient were very far from
> something somebody's already posted.

Dude, you should post that on the wiki as some more performance advice. ;-)
