http://enfranchisedmind.com/blog/2008/05/07/why-ocaml-sucks/
I think it is interesting to discuss which aspects of OCaml can be improved
upon and how, but I disagree with some of his points. I'll address each of the
original points in turn:
1. Lack of Parallelism: Yes, this is already a complete show stopper.
Exploiting multicores requires a scalable concurrent GC and message passing
(like JoCaml) is not a substitute. Unfortunately, this is now true of all
functional languages available for Linux, which is why we have now migrated
entirely to Windows and F#. I find it particularly ironic that the Haskell
community keeps hyping the multicore capabilities of pure code when the
rudimentary GC in Haskell's only usable implementation has already stopped
scaling.
2. Printf: I like OCaml's printf. So much, in fact, that I wish it were in
Pervasives (as it is in F#) so I didn't have to do "open Printf" all the time
in OCaml. While there are theoretically-elegant functional equivalents they
all suck in practical terms, primarily due to hideous error messages. I think
printf is one of the reasons OCaml dominates over languages like Haskell and
SML. Easy hash tables are another.
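Both conveniences are a handful of lines in practice; a minimal sketch (the table contents are made up for illustration):

```ocaml
(* Typed format strings and the polymorphic Hashtbl, straight from the
   standard library; only Printf needs qualifying (or an "open"). *)
let tbl = Hashtbl.create 16

let () = Hashtbl.add tbl "OCaml" 1996

let line =
  Printf.sprintf "%s first appeared in %d" "OCaml" (Hashtbl.find tbl "OCaml")

let () = print_endline line
```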
3. Lack of multi-file modules: I have never found this to be a problem. Nor do
I find filenames implying module names to be a problem, as many SML advocates
seem to believe (yes, both of them ;-).
4. Mutable data: I believe the exact opposite. The ability to drop down to
mutable data structures for performance without leaving the language is
essential and the ability to predict memory consumption is essential, both of
which Haskell lacks. Consequently, Haskell's inability to handle mutation
efficiently and safely has doomed it to failure for practical applications.
5. Strings: pushing unicode throughout a general purpose language is a
mistake, IMHO. This is why languages like Java and C# are so slow.
6. Shift-reduce conflicts: although there are aspects of OCaml's syntax that I
would like to tweak (e.g. adding an optional "end" after a "match"
or "function" to make them easier to nest), I am not bothered by the
shift-reduce conflicts. Mainstream languages get by with far more serious
syntactic issues (like <<...>> in C++).
7. Not_found: I like this, and Exit and Invalid_argument. Brian's point that
the name of this exception does not convey its source is fallacious: that's
what exception traces are for.
8. Exceptions: I love OCaml's extremely fast exception handling (6x faster
than C++, 30x faster than Java and 600x faster than C#/F#!). I hate
the "exceptions are for exceptional circumstances" line promoted by the
advocates of any language implementation with cripplingly-slow exception
handlers. I really miss fast exception handling in F#. Brian gives an example
of exception handling with recursive IO functions failing to be tail
recursive here and advocates option types. But recursion is the wrong tool
for the job here and option types are even worse. You should use mutation
and, failing that, CPS.
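For the I/O loop in question, the idiomatic shape Jon is alluding to looks something like this (a sketch; the function name is mine, the channel API is the standard library's):

```ocaml
(* Read every line of a channel using a mutable accumulator and the
   End_of_file exception as the loop terminator: constant stack space,
   no non-tail recursion, no option wrapping on the hot path. *)
let read_lines ic =
  let lines = ref [] in
  (try
     while true do
       lines := input_line ic :: !lines
     done
   with End_of_file -> ());
  List.rev !lines
```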
9. Deforestation: Brian says "Haskell has introduced a very interesting and
(to my knowledge) unique layer of optimization, called deforrestation". True,
of course, but useless theoretical piffle because we know that Haskell is
slow in practice and prohibitively difficult to optimize to boot. Deforesting
is really easy to do by hand.
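The manual rewrite Jon means is the classic map-fusion, e.g.:

```ocaml
let f x = x + 1
let g x = x * 2

(* The naive composition builds an intermediate list... *)
let slow xs = List.map f (List.map g xs)

(* ...manual deforestation fuses it into a single traversal. *)
let fast xs = List.map (fun x -> f (g x)) xs
```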
10. Limited standard library: I agree but this is only an issue because we are
not able to fix the problem by contributing to the OCaml distribution.
11. Slow lazy: I had never noticed.
The only major gripe that I have with OCaml is lack of a concurrent GC. I
think this general deficit is going to have a massive adverse impact on the
whole of Linux and lots of people will migrate to Windows and .NET when they
see how much faster their code can run.
I have other wish-list items of my own to add:
. JIT compilation for metaprogramming.
. Type specialization.
. Unboxed types (structs).
. No 16Mb limit.
. Inlining.
. Custom per-type functions for comparison, equality and hashing.
. An intermediate representation that I can sell software in to earn a living.
. Pattern matching over lazy values.
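On the comparison/equality/hashing item: the functorial interfaces already allow this per type, at some syntactic cost; the wish is presumably for the polymorphic `compare`, `(=)` and `Hashtbl.hash` to be overridable directly. A sketch of what works today (the module names are mine):

```ocaml
(* A hash table keyed case-insensitively, via the Hashtbl.Make functor:
   custom equality and hashing are supplied per key type. *)
module CaseFold = struct
  type t = string
  let norm = String.lowercase_ascii  (* String.lowercase before OCaml 4.03 *)
  let equal a b = norm a = norm b
  let hash s = Hashtbl.hash (norm s)
end

module CITbl = Hashtbl.Make (CaseFold)
```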
I believe these can be fixed by creating a new open source functional language
for Linux based upon LLVM. However, the lack of a suitable GC is a complete
show stopper. The JVM is the only thing that comes close and it is unable to
support tail calls without a catastrophic performance cost, i.e. so bad that
you might as well write an interpreter.
--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/products/?e
_______________________________________________
Caml-list mailing list. Subscription management:
http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
Archives: http://caml.inria.fr
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
Bug reports: http://caml.inria.fr/bin/caml-bugs
I would add a 3a: the inability to have a "root"-level functor. As it
stands, all functors are (sometimes naturally, sometimes not) embedded
in a "root" module. I'm looking at you, Map.Make. You too, Set.Make.
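For readers following along, the pattern being complained about (a functor only reachable inside its root module) looks like:

```ocaml
(* Map.Make lives inside the root module Map; you must apply the
   functor to obtain a usable map type for your key. *)
module StringMap = Map.Make (String)

let m = StringMap.(empty |> add "answer" 42)
```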
> 4. Mutable data: I believe the exact opposite. The ability to drop down to
> mutable data structures for performance without leaving the language is
> essential and the ability to predict memory consumption is essential, both of
> which Haskell lacks. Consequently, Haskell's inability to handle mutation
> efficiently and safely has doomed it to failure for practical applications.
Well, in fairness, there's always the ST monad. However, the rank-2
polymorphism involved is nontrivial, and I agree with you [Jon].
> 5. Strings: pushing unicode throughout a general purpose language is a
> mistake, IMHO. This is why languages like Java and C# are so slow.
This is simply ridiculous. Using heavy-weight unicode-aware functions
for character operations may slow down string-intensive operations in
those languages, but the only alternative is to be broken. See items 3
and 4 here:
http://www.moserware.com/2008/02/does-your-code-pass-turkey-test.html
In both cases, we need locale-aware character processing. That means
Unicode these days. Unless you code your own routines for processing
every 8-bit character set out there. I don't.
> 6. Shift-reduce conflicts: although there are aspects of OCaml's syntax that I
> would like to tweak (e.g. adding an optional "end" after a "match"
> or "function" to make them easier to nest), I am not bothered by the
> shift-reduce conflicts. Mainstream languages get by with far more serious
> syntactic issues (like <<...>> in C++).
I've been bitten by exactly these sorts of problems. Sometimes I get an
obscure type error, sometimes I don't. I've just gotten used to not
placing a match-like construct in the middle of a sequence expression
when I can avoid it.
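The workaround, for the record, is to delimit the inner match explicitly, since otherwise the trailing cases attach to it (a minimal sketch):

```ocaml
(* Without begin...end the final case would belong to the inner match,
   silently changing the meaning (or producing an obscure type error). *)
let describe x y =
  match x with
  | 0 ->
      begin match y with
      | 0 -> "both zero"
      | _ -> "only x is zero"
      end
  | _ -> "x is nonzero"
```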
> 8. Exceptions: I love OCaml's extremely fast exception handling (6x faster
> than C++, 30x faster than Java and 600x faster than C#/F#!). I hate
> the "exceptions are for exceptional circumstances" line promoted by the
> advocates of any language implementation with cripplingly-slow exception
> handlers. I really miss fast exception handling in F#. Brian gives an example
> of exception handling with recursive IO functions failing to be tail
> recursive here and advocates option types. But recursion is the wrong tool
> for the job here and option types are even worse. You should use mutation
> and, failing that, CPS.
I suspect his reaction to exceptions comes from unfamiliarity with
their use in OCaml. In C++/Java land, the gospel is to use them only
for exceptional circumstances. In the case of C++, this is probably
because of the difficulty of writing exception-safe code: if you're
going to throw an exception, your code is likely to be broken as a
result, so exceptions end up reserved for situations where normal
processing must terminate anyway.
> 9. Deforestation: Brian says "Haskell has introduced a very interesting and
> (to my knowledge) unique layer of optimization, called deforrestation". True,
> of course, but useless theoretical piffle because we know that Haskell is
> slow in practice and prohibitively difficult to optimize to-boot. Deforesting
> is really easy to do by hand.
Yes, it's easy to do by hand. It's also time consuming. Don't you make a
living off OCaml? I'm surprised you can justify using your time on such a
mechanical task. Granted, deforesting something like map f . map g into
map (f . g) is trivial, but we can come up with both trivial and
nontrivial examples for almost anything.
Cheers,
Matt
>
> Brian Hurt recently published the following blog post "Why OCaml sucks":
>
> http://enfranchisedmind.com/blog/2008/05/07/why-ocaml-sucks/
>
> I think it is interesting to discuss which aspects of OCaml can be improved
> upon and how but I disagree with some of his points. I'll address each of
> the
> original points in turn:
>
> 1. Lack of Parallelism: Yes, this is already a complete show stopper.
> Exploiting multicores requires a scalable concurrent GC and message passing
> (like JoCaml) is not a substitute. Unfortunately, this is now true of all
> functional languages available for Linux, which is why we have now migrated
> entirely to Windows and F#. I find it particularly ironic that the Haskell
> community keep hyping the multicore capabilities of pure code when the
> rudimentary GC in Haskell's only usable implementation already stopped
> scaling.
Fork? For something like a raytracer, I do not see how threads would be any
more useful than fork. For the UI, the native threads suffice. For
windows... well f*ck windows. When was the last time you heard of a cool
new windows app anyway? MSFT has killed their own platform with bad
engineering decisions.
That said... I guess you could say we should support native multithreading,
just because there are apparently rare niche applications (I'm told
webservers are one), but I suspect that fork may still be better, and
there are other things that would be nicer, like full support for recursive
modules/functors. There's also a nasty bug involving the typechecker with
labeled arguments and classes. How do you file a bug against OCaml anyway?
>
>
> 2. Printf: I like OCaml's printf. So much, in fact, that I wish it were in
> Pervasives (as it is in F#) so I didn't have to do "open Printf" all the
> time
> in OCaml. While there are theoretically-elegant functional equivalents they
> all suck in practical terms, primarily due to hideous error messages. I
> think
> printf is one of the reasons OCaml dominates over languages like Haskell
> and
> SML. Easy hash tables are another.
Easy hash tables would indeed be nice. I think that the Python syntax for
printf might be better, but I'm uncertain on that.
>
>
> 3. Lack of multi-file modules: I have never found this to be a problem. Nor
> do
> I find filenames implying module names to be a problem, as many SML
> advocates
> seem to believe (yes, both of them ;-).
Why in god's name would you have modules spread across multiple files?
That's like having a 1000-line function. This is why I want recursive
modules, btw. Not only do I want them recursive, I would like them to be
recursive across files, but I'm guessing that one would be a long shot.
That having been said, I've always managed to work
around it elegantly in the end. The sort of situations that call for a
truly humongous module will often tend to favor OO style programming anyway.
>
>
> 4. Mutable data: I believe the exact opposite. The ability to drop down to
> mutable data structures for performance without leaving the language is
> essential and the ability to predict memory consumption is essential, both
> of
> which Haskell lacks. Consequently, Haskell's inability to handle mutation
> efficiently and safely has doomed it to failure for practical
> applications.
>
I freaking hate mutable data. I couldn't care less about performance.
Excessive use of mutable data makes code difficult to understand, and it is
why C code is so hard to understand. Mind you, I do C for a living and have
been writing code in it for nearly 10 years. While that doesn't make me a
master, I am not a mere n00b either. Making everything mutable means you
have to keep track of every variable in a function or class. It's not so
bad if you're looking at good code, but not all the programmers out there
are good.
And it is occasionally strangely difficult to write good code when your only
tools are mutable single variables (no tuples), pointers and function calls.
>
> 5. Strings: pushing unicode throughout a general purpose language is a
> mistake, IMHO. This is why languages like Java and C# are so slow.
This is actually a serious letdown. There should be built-in support for
Unicode in OCaml. If not, then there should at least be some maintained
"distro" for OCaml where Camomile is packaged in. Indeed, I would go so far
as to say that the Linux distro maintainers should do their bit and make the
OCaml metapackages install Camomile or some other Unicode package. I do not
use it myself, but Unicode is terribly important for businesses.
>
>
> 6. Shift-reduce conflicts: although there are aspects of OCaml's syntax that
> I would like to tweak (e.g. adding an optional "end" after a "match"
> or "function" to make them easier to nest), I am not bothered by the
> shift-reduce conflicts. Mainstream languages get by with far more serious
> syntactic issues (like <<...>> in C++).
they should use parsers that aren't brain dead
>
> 7. Not_found: I like this, and Exit and Invalid_argument. Brian's point
> that
> the name of this exception does not convey its source is fallacious: that's
> what exception traces are for.
I actually find it a bit annoying, but it's not serious enough for me to
complain about it normally. I would have preferred that the file open
functions return a Haskell-style Either of the file or the error itself.
Say, why don't we have an either datatype in ocaml?
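Defining one is a two-line variant declaration (modern OCaml later standardized exactly this shape as the `result` type); a sketch, with `open_file` as a hypothetical wrapper of my own:

```ocaml
(* A Haskell-style Either as a plain variant. *)
type ('a, 'e) either = Ok of 'a | Error of 'e

(* Wrap the exception-raising opener into an either-returning one. *)
let open_file name =
  try Ok (open_in name)
  with Sys_error msg -> Error msg
```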
>
>
> 8. Exceptions: I love OCaml's extremely fast exception handling (6x faster
> than C++, 30x faster than Java and 600x faster than C#/F#!). I hate
> the "exceptions are for exceptional circumstances" line promoted by the
> advocates of any language implementation with cripplingly-slow exception
> handlers. I really miss fast exception handling in F#. Brian gives an
> example
> of exception handling with recursive IO functions failing to be tail
> recursive here and advocates option types. But recursion is the wrong tool
> for the job here and option types are even worse. You should use mutation
> and, failing that, CPS.
::blink:: exceptions? you guys use those things? xD
I dunno what that guy is bitching about. The code for exception handling
looks quite nearly like what the code would be if you were to do a match
blah with None -> () | Some x -> f x, which is what you'd probably replace
the exception-ish code with anyway.
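The equivalence being described, side by side (a sketch; the function names are mine):

```ocaml
(* Exception style wrapped into option style: the caller's match is
   nearly identical to the try...with it replaces. *)
let lookup tbl k =
  try Some (Hashtbl.find tbl k) with Not_found -> None

let greet tbl k =
  match lookup tbl k with
  | None -> ()
  | Some name -> print_endline ("hello " ^ name)
```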
> I have other wish-list items of my own to add:
>
> . JIT compilation for metaprogramming.
> . Type specialization.
> . Unboxed types (structs).
> . No 16Mb limit.
What do you mean by 16mb limit?
>
> . Inlining.
isn't it best for the compiler to handle that? I wouldn't mind hearing
another perspective on this, but I thought that compilers were smarter these
days.
>
> . Custom per-type functions for comparison, equality and hashing.
> . An intermediate representation that I can sell software in to earn a
> living.
> . Pattern matching over lazy values.
>
> I believe these can be fixed by creating a new open source functional
> language
> for Linux based upon LLVM. However, the lack of a suitable GC is a complete
> show stopper. The JVM is the only thing that comes close and it is unable
> to
> support tail calls without a catastrophic performance cost, i.e. so bad
> that
> you might as well write an interpreter.
--
-----------------------------------------------------------------------
(\__/)
(='.'=)This is Bunny. Copy and paste Bunny into your
(")_(")signature to help him gain world domination.
------------------------------------------------------------------------
There are two problems with that:
1. You go back to manual memory management between parallel threads/processes.
2. Parallelism is for performance, and performance requires mutable data
structures.
Then you almost always end up copying data unnecessarily because you cannot
collect it otherwise, which increases memory consumption and massively
degrades performance, which in turn completely undermines the original point
of parallelism. The cost of interthread communication is then so high in
OCaml that you will rarely be able to obtain any performance improvement for
the number of cores desktop machines are going to see over the next ten
years, by which time OCaml will be 10-100x slower than the competition.
> When was the last time you heard of a cool new windows app anyway?
The last time we released a product. :-)
> > . No 16Mb limit.
>
> What do you mean by 16mb limit?
OCaml's strings and arrays are limited to 16Mb in 32-bit.
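The limits come from the runtime's block-header representation, and the runtime exposes them directly (on 32-bit, `Sys.max_string_length` is about 16 MB; on 64-bit it is astronomically large):

```ocaml
(* Query the runtime's own limits on string and array sizes. *)
let () =
  Printf.printf "max string length: %d\n" Sys.max_string_length;
  Printf.printf "max array length:  %d\n" Sys.max_array_length
```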
> > . Inlining.
>
> isn't it best for the compiler to handle that? I wouldn't mind hearing
> another perspective on this, but I thought that compilers were smarter
> these days.
Definitely not. Compilers uniformly suck at inlining. For example, aggressive
inlining is often beneficial in numerical code and often damaging in symbolic
code. Compilers cannot tell the difference.
This is very similar to "unboxed data structures are always better", which
also isn't generally true.
I've got more gripes to add:
Missing types, like float32 and int16.
DLLs.
Your definition of "broken" is broken.
> See items 3 and 4 here:
>
> http://www.moserware.com/2008/02/does-your-code-pass-turkey-test.html
>
> In both cases, we need locale-aware character processing. That means
> Unicode these days. Unless you code your own routines for processing
> every 8-bit character set out there. I don't.
That is not a justification for pushing Unicode throughout a programming
language. You can support Turkish and Arabic without degrading the
performance of my software.
For us, this is about two things:
1. Profit.
2. Performance.
Our sales already correlate with the world distribution of wealth even though
we only support English. There is literally no value in us supporting other
languages. So we have no need of unicode.
What you call "Using heavy-weight unicode-aware functions for character
operations" refers to all lexing and parsing. OCaml provides incredibly fast
lexers and parsers and our customers make heavy use of that. Other languages
do not. This is a major advantage of OCaml, IMHO. So imposing unicode upon us
is not only unnecessary but is actually damaging.
> > 8. Exceptions: I love OCaml's extremely fast exception handling (6x
> > faster than C++, 30x faster than Java and 600x faster than C#/F#!). I
> > hate the "exceptions are for exceptional circumstances" line promoted by
> > the advocates of any language implementation with cripplingly-slow
> > exception handlers. I really miss fast exception handling in F#. Brian
> > gives an example of exception handling with recursive IO functions
> > failing to be tail recursive here and advocates option types. But
> > recursion is the wrong tool for the job here and option types are even
> > worse. You should use mutation and, failing that, CPS.
>
> I suspect his reaction to exceptions comes from an unfamiliarity with
> their uses in Ocaml.
Except Brian has been using OCaml for many years.
> > 9. Deforestation: Brian says "Haskell has introduced a very interesting
> > and (to my knowledge) unique layer of optimization, called
> > deforrestation". True, of course, but useless theoretical piffle because
> > we know that Haskell is slow in practice and prohibitively difficult to
> > optimize to-boot. Deforesting is really easy to do by hand.
>
> Yes, its easy to do by hand. It's also time consuming.
Do you spend >1% of your time deforesting?
> Don't you make a living off ocaml?
I use OCaml professionally in industry but our OCaml-based sales have declined
since Q3 2007 to only £6k p.a. now.
> I'm surprised you can justify using your time on such a mechanical task.
By my estimates, <<1% of my time is spent deforesting. It is irrelevant, just
like all of the other "benefits" of Haskell that have debilitating hidden
costs.
> Granted, deforesting something like map f . map g into
> map (f . g) is trivial, but we can come up with both trivial and
> nontrivial examples for almost anything.
Predictable performance and memory consumption are infinitely more valuable
than the odd trivial rewrite.
Incidentally, I forgot a couple of things for my wish list:
Overloading.
IDE-friendly, e.g. Intellisense for GUI code.
2008/5/9 Jon Harrop <j...@ffconsultancy.com>:
>
> I believe these can be fixed by creating a new open source functional
> language
> for Linux based upon LLVM. However, the lack of a suitable GC is a complete
> show stopper. The JVM is the only thing that comes close and it is unable
> to
> support tail calls without a catastrophic performance cost, i.e. so bad
> that
> you might as well write an interpreter.
>
I completely agree with you! I had this idea a long time ago, but I was put
off by the enormous code-base one would have to write to make even a simple
usable implementation. However, if there was a team gathered that would
create, let's say, a "next generation ML" on top of LLVM, count me in.
As far as multithreading is concerned, I've gathered some interesting
ideas... Primarily, soft threads are a great thing to have, and we would
have to look no further than our colleague's Felix language implementation,
which supports amazingly fast green threads. For the GC, we can turn to our
"father", Xavier Leroy, and his paper about the parallel GC for OCaml Light
[1]. I'm sure that since that paper was written there have been some
improvements and new ideas in GC for multithreading, so by doing enough
research and combining the best ideas, we could easily pull it off.
A great thing would be to support a "pluggable GC", i.e. the programmer
would have the option to select the GC that best suits his program (for
example, a Metronome-like GC [2] for real-time programs).
[1] http://citeseer.ist.psu.edu/doligez93concurrent.html
[2] http://citeseer.ist.psu.edu/bacon03metronome.html
My $100, Tom
Regards
Elliott
> . JIT compilation for metaprogramming.
> . Type specialization.
> . Unboxed types (structs).
> . No 16Mb limit.
> . Inlining.
> . Custom per-type functions for comparison, equality and hashing.
> . An intermediate representation that I can sell software in to earn a living.
> . Pattern matching over lazy values.
>
> I believe these can be fixed by creating a new open source functional language
> for Linux based upon LLVM. However, the lack of a suitable GC is a complete
> show stopper. The JVM is the only thing that comes close and it is unable to
> support tail calls without a catastrophic performance cost, i.e. so bad that
> you might as well write an interpreter.
>
>
Why can't you just fork off enough processes to cover each core?
> 5. Strings: pushing unicode throughout a general purpose language is a
> mistake, IMHO. This is why languages like Java and C# are so slow.
I wrote an entire wiki in OCaml; it's UTF-8 clean throughout and
demonstrably handles German, French, Italian, Japanese, Korean and
Chinese (http://www.ocaml-tutorial.org/). I wrote another Japanese
website in OCaml. To do it, I just treated strings as UTF-8. The
only UTF-8-specific code in the whole thing was tiny, I think a
UTF-8-sensitive string truncation function, and a little bit of
collation (mostly handled by the database, but could easily have been
done using Camomile). My conclusion: UTF-8 is easy.
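The kind of tiny UTF-8-aware helper Rich mentions might look like this (a hypothetical sketch, not his actual code): truncation must not cut a multi-byte sequence, and UTF-8 continuation bytes are recognizable by their `10xxxxxx` bit pattern.

```ocaml
(* Truncate a UTF-8 string to at most [n] bytes without splitting a
   multi-byte character: back up past any continuation bytes. *)
let utf8_truncate s n =
  if String.length s <= n then s
  else begin
    let i = ref n in
    while !i > 0 && Char.code s.[!i] land 0xC0 = 0x80 do decr i done;
    String.sub s 0 !i
  end
```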
Rich.
--
Richard Jones
Red Hat
My very own set of gripes includes:
_ Not being able to include a module defining a type t in another module
defining a type t (there was a patch for this floating around on the
mailing list that allowed one to `hide` part of a module). And since we
are talking about killer module extensions: mixins and Russo-style
modules have also been tested as extensions to the module system...
_ Paying a very high performance price when programming with a high
level of abstraction (combinator libraries, functors...). MLton fixes
this at the expense of being a slow compiler.
_ HM(x) takes no hints, and therefore dancing around recursive
polymorphism issues and value restrictions with records and eta
expansion is a harassing dance that has had the best of me on
numerous occasions. But let's not despair: there might be light at the
end of the tunnel. MLF is our knight in shining armor here.
The combination of these three is a big brick wall I've slammed into often
when trying to program at a high level of abstraction. They all have
known solutions (except maybe the second). I do feel privileged that these are
even on my radar; the complaints I have to voice against, say, Java
are so down to earth compared to this....
Till
--
http://till-varoquaux.blogspot.com/
> 2. Printf: I think
> printf is one of the reasons OCaml dominates over languages like Haskell and
> SML.
I'm not sure about "dominate", but yes, it's definitely one reason why I
code in OCaml rather than Haskell.
> 5. Strings: pushing unicode throughout a general purpose language is a
> mistake, IMHO. This is why languages like Java and C# are so slow.
We have a good Unicode library (see Camomile), good ropes libraries (see Community Caml)
and someone is bound to write a syntax extension to provide a natural syntax for Rope
literal constants.
> 6. Shift-reduce conflicts: although there are aspects of OCaml's syntax that I
> would like to tweak (e.g. adding an optional "end" after a "match"
> or "function" to make them easier to nest), I am not bothered by the
> shift-reduce conflicts. Mainstream languages get by with far more serious
> syntactic issues (like <<...>> in C++).
That's something we may need to discuss at some point. I would really like the ability to write
match a with
( | bla
| bla
| bla )
> 7. Not_found: I like this, and Exit and Invalid_argument. Brian's point that
> the name of this exception does not convey its source is fallacious: that's
> what exception traces are for.
I personally prefer ExtLib's approach of redefining exceptions per-module.
> 8. Exceptions: I love OCaml's extremely fast exception handling (6x faster
> than C++, 30x faster than Java and 600x faster than C#/F#!).
Yep.
> 9. Deforestation: Brian says "Haskell has introduced a very interesting and
> (to my knowledge) unique layer of optimization, called deforrestation". True,
> of course, but useless theoretical piffle because we know that Haskell is
> slow in practice and prohibitively difficult to optimize to-boot. Deforesting
> is really easy to do by hand.
Are you sure, or is that just a troll? Supero seems to improve Haskell's
performance enormously, and the Shootout already shows Haskell beating OCaml
in several tests.
> 10. Limited standard library: I agree but this is only an issue because we are
> not able to fix the problem by contributing to the OCaml distribution.
That's the whole idea of Community Caml / Batteries Included. Really, feel free to contribute.
> . Pattern matching over lazy values.
Have you looked at the Patterns project on Google? It provides pattern-matching
over lazy values. I've used it in conjunction with my own lazy list module [1]
to port Graham Hutton's Countdown problem from Haskell, and it works.
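Native lazy patterns actually landed in OCaml 3.11; with them the explicit force disappears from the match (a sketch with a hypothetical stream type of my own):

```ocaml
type 'a stream = Cons of 'a * 'a stream Lazy.t

let rec nats n = Cons (n, lazy (nats (n + 1)))

(* Explicit forcing... *)
let rec take k (Cons (x, rest)) =
  if k = 0 then [] else x :: take (k - 1) (Lazy.force rest)

(* ...versus a lazy pattern, which forces the suspension as part of
   the match itself. *)
let rec take' k (Cons (x, lazy rest)) =
  if k = 0 then [] else x :: take' (k - 1) rest
```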
> I believe these can be fixed by creating a new open source functional language
> for Linux based upon LLVM. However, the lack of a suitable GC is a complete
> show stopper. The JVM is the only thing that comes close and it is unable to
> support tail calls without a catastrophic performance cost, i.e. so bad that
> you might as well write an interpreter.
Why a full new language ? I may understand the interest of writing a new
compiler for OCaml (or whichever other language) and gradually improving
the forked compiler, but that's a different story altogether.
Cheers,
David
[1] https://forge.ocamlcore.org/frs/shownotes.php?release_id=12
--
David Teller
Security of Distributed Systems
http://www.univ-orleans.fr/lifo/Members/David.Teller
Angry researcher: French Universities need reforms, but the LRU act brings liquidations.
They need to share data, e.g. write results into the same matrix or mutate the
same array.
--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/products/?e
A friendly word of advice from a member of the Erlang
Community, which just suffered an endless thread with the
subject "Why Erlang sucks", triggered by a similar blog
article on Erlang:
Please change the subject Now!
The surely unintended consequence of such a blog article
(which was otherwise ok, btw) is that Google will now
find 336 hits on "ocaml sucks", threatening to exceed
Erlang's corresponding 337 hits any minute now.
But it's your party. I'll go back to lurking now. (:
BR,
Ulf W
You might want to look at the 'ancient' module. While the common
understanding is that the shared data should be immutable, in fact
you'd be OK if all you want to do is mutate a simple array of unboxed
ints or floats[1]. Of course to an extent you're back to using manual
memory management, but if it's only a few structures that may be
acceptable. Getting parallelism right is hard enough that programmers
should be forced to think about exactly which structures need to be
shared, rather than threads where the default is to share everything
and make the naive assumption that the programmer knows what they're
doing.
Rich.
[1] The real basis for the 'immutable restriction' is explained in the
documentation, along with exceptions, but it's complicated so I won't
go into it here.
--
Richard Jones
Red Hat
yeah, a new troll post (!)
> 1. Lack of Parallelism: Yes, this is already a complete show stopper.
no it's not. it's in your fantasy world. lots of applications don't (or
only marginally) benefit from parallelism, and that your specific turf
would benefit from it is not a good reason to impose its drawbacks
on everybody else.
> 5. Strings: pushing unicode throughout a general purpose language is a
> mistake, IMHO. This is why languages like Java and C# are so slow.
unicode strings should not be the default string, but unicode strings need
to be available as first-class citizens. again, ocaml is not about doing
raytracers in opengl only.
> 7. Not_found: I like this, and Exit and Invalid_argument. Brian's point that
> the name of this exception does not convey its source is fallacious: that's
> what exception traces are for.
exception traces are *not* available in a long running program (daemon).
and having a Not_found cropping up somewhere is just plain annoying.
even having something like a List.Not_found/Hashtbl.Not_found would make
things a bit easier.
> 8. Exceptions: I love OCaml's extremely fast exception handling (6x faster
> than C++, 30x faster than Java and 600x faster than C#/F#!). I hate
> the "exceptions are for exceptional circumstances" line promoted by the
> advocates of any language implementation with cripplingly-slow exception
> handlers.
exceptions are for exceptional circumstances. using them as a fancy goto
mechanism is just plain stupid and really bad programming style.
> 9. Deforestation: Brian says "Haskell has introduced a very interesting and
> (to my knowledge) unique layer of optimization, called deforrestation". True,
> of course, but useless theoretical piffle because we know that Haskell is
> slow in practice and prohibitively difficult to optimize to-boot. Deforesting
> is really easy to do by hand.
have you been hiding in a cave lately? Haskell has improved its
performance recently; not on everything, but it can still beat OCaml on
some micro-benchmarks.
> I have other wish-list items of my own to add:
>
> . No 16Mb limit.
use 64 bits.
--
Vincent
That's just crazy talk. Nobody can afford to ignore the multicore era that we
have been in for some time now.
> is not a good reason to impose their drawbacks on everybody else.
What drawbacks?
> > 5. Strings: pushing unicode throughout a general purpose language is a
> > mistake, IMHO. This is why languages like Java and C# are so slow.
>
> Unicode strings should not be the default string type, but Unicode
> strings need to be available as first-class citizens.
Agreed.
> > 7. Not_found: I like this, and Exit and Invalid_argument. Brian's point
> > that the name of this exception does not convey its source is fallacious:
> > that's what exception traces are for.
>
> Exception traces are *not* available in long-running programs (daemons).
Because you compiled it wrongly or because you lost the output?
> > 8. Exceptions: I love OCaml's extremely fast exception handling (6x
> > faster than C++, 30x faster than Java and 600x faster than C#/F#!). I
> > hate the "exceptions are for exceptional circumstances" line promoted by
> > the advocates of any language implementation with cripplingly-slow
> > exception handlers.
>
> exceptions are for exceptional circumstances.
Bah, nonsense. Exceptions are used extensively for non-exceptional
circumstances in idiomatic OCaml and it works beautifully.
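To illustrate the idiom being defended here, a small sketch of mine (not
from the thread): the predefined Exit exception aborting an iteration
early, which is idiomatic OCaml precisely because exceptions are so cheap:

```ocaml
(* Early exit from Array.iter via the predefined Exit exception,
   i.e. an exception used for perfectly ordinary control flow. *)
let exists_negative arr =
  try
    Array.iter (fun x -> if x < 0 then raise Exit) arr;
    false
  with Exit -> true

let () =
  assert (exists_negative [| 1; 2; -3 |]);
  assert (not (exists_negative [| 1; 2; 3 |]))
```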
> > 9. Deforestation: Brian says "Haskell has introduced a very interesting
> > and (to my knowledge) unique layer of optimization, called
> > deforrestation". True, of course, but useless theoretical piffle because
> > we know that Haskell is slow in practice and prohibitively difficult to
> > optimize to-boot. Deforesting is really easy to do by hand.
>
> have you been hiding in a cave lately?
With yo mamma.
> Haskell has improved its performance recently; not on everything, but it
> can still beat OCaml on some micro-benchmarks.
Look at the objective and quantitative results using the latest GHC on a
modern machine. Haskell can't even touch OCaml, let alone F#. ;-)
> > I have other wish-list items of my own to add:
> >
> > . No 16Mb limit.
>
> use 64 bits.
You aren't customer facing are you?
--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/products/?e
_______________________________________________
My point is that actual performance on real code is what matters and not the
number of optimizations that might theoretically apply. From the measurements
I have seen and done, Haskell is substantially slower despite its many extra
optimizations. Also, the Haskell community is very quick to cite
superfluous academic piffle of no practical use. They have a lot of
very fast Fibonacci number generators, for example.
We investigated alternative languages to diversify into last year and Haskell
was one of them. The single biggest problem with Haskell is that it is wildly
unpredictable in terms of performance and memory consumption. I asked many
specialists what they thought of Haskell for Scientists and they all advised
me to forget the idea. The next biggest problem is that it is as
commerce-unfriendly as OCaml (e.g. no DLLs).
At last year's CUFP, a Haskell user from industry explained how they had to
drop to C/C++ whenever performance was relevant.
I received dozens of Haskell implementations of our ray tracer benchmark and
performance varied enormously but only one person managed to write an
implementation that gets within 3x the performance of OCaml, and he had spent
his life writing optimizing Haskell compilers. I asked why his code was fast
and he said it is too difficult to explain why. And that is a tiny program.
The Haskell mailing lists are full of people asking why their programs run so
slowly. The response is generally to litter the code with strictness
annotations and then resort to unsafe operations. There is virtually no
usable information explaining how to optimize Haskell code.
In contrast, OCaml is easy to optimize by following simple well-documented
rules and can achieve excellent performance.
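One of those simple rules, as a concrete sketch (my example, not from the
post): keep loops tail-recursive by threading an accumulator, so the
compiler emits a jump and the stack stays constant. (List.init assumes
OCaml >= 4.06, and the sum assumes a 64-bit build.)

```ocaml
(* Not tail-recursive: each element adds a stack frame. *)
let rec sum_naive = function
  | [] -> 0
  | x :: rest -> x + sum_naive rest

(* Tail-recursive: the accumulator turns the recursive call into a jump. *)
let sum_tail xs =
  let rec go acc = function
    | [] -> acc
    | x :: rest -> go (acc + x) rest
  in
  go 0 xs

let () =
  assert (sum_naive [1; 2; 3] = 6);
  (* An input this size would blow the stack of sum_naive. *)
  let big = List.init 1_000_000 (fun i -> i) in
  assert (sum_tail big = 499_999_500_000)
```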
> Supero seems to improve enormously Haskell's performances and the Shootout
> already shows Haskell beating OCaml in several tests.
The shootout is completely flawed by design. Just ignore it.
Also, they are running legacy hardware and OCaml does much better on modern
64-bit machines. For example, they claim that Haskell is 50% faster on binary
trees but, running the test here, OCaml is 40% faster. They claim
Haskell is 60% faster on nbody but here OCaml is 60% faster.
> > 10. Limited standard library: I agree but this is only an issue because
> > we are not able to fix the problem by contributing to the OCaml
> > distribution.
>
> That's the whole idea of Community Caml / Batteries Included. Really, feel
> free to contribute.
I thought Community OCaml was about tacking things on externally using macros
and extra libraries.
> > . Pattern matching over lazy values.
>
> Have you looked at the Patterns project on Google? It provides
> pattern-matching over lazy values. I've used it in conjunction with my own
> lazy list module [1] to port Graham Hutton's Countdown problem from
> Haskell, and it works.
I had not seen that. I'll check it out, thanks.
> > I believe these can be fixed by creating a new open source functional
> > language for Linux based upon LLVM. However, the lack of a suitable GC is
> > a complete show stopper. The JVM is the only thing that comes close and
> > it is unable to support tail calls without a catastrophic performance
> > cost, i.e. so bad that you might as well write an interpreter.
>
> Why a full new language? I may understand the interest of writing a new
> compiler for OCaml (or whichever other language) and gradually improving
> the forked compiler, but that's a different story altogether.
A concurrent GC is the only major issue. Everything else is comparatively
easy.
--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/products/?e
_______________________________________________
I think the parallelism capabilities are already excellent. We have been
able to implement the application backend of Wink's people search in
O'Caml, and it is of course a highly parallel system of programs. This
is not the same class that raytracers or desktop parallelism fall into - this
is highly professional supercomputing. I'm talking about a cluster of
~20 computers with something like 60 CPUs.
Of course, we did not use multithreading very much. We are relying on
multi-processing (both "fork"ed style and separately started programs),
and multiplexing (i.e. application-driven micro-threading). I especially
like the latter: Doing multiplexing in O'Caml is fun, and a substitute
for most applications of multithreading. For example, you want to query
multiple remote servers in parallel: Very easy with multiplexing,
whereas the multithreaded counterpart would quickly run into scalability
problems (threads are heavy-weight, and need a lot of resources).
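To make the multiplexing idea concrete, here is a toy sketch of my own
(not Wink's code, which uses ocamlnet's event queues): a single process
watches several descriptors with Unix.select and reacts as each becomes
readable, with no threads involved.

```ocaml
(* Single-process multiplexing: watch two pipes with Unix.select,
   collecting each reply as it arrives. The pipes stand in for two
   remote servers answering at different times.
   (Requires linking the unix library.) *)
let () =
  let r1, w1 = Unix.pipe () in
  let r2, w2 = Unix.pipe () in
  ignore (Unix.write w1 (Bytes.of_string "one") 0 3);
  ignore (Unix.write w2 (Bytes.of_string "two") 0 3);
  let pending = ref [r1; r2] and replies = ref [] in
  while !pending <> [] do
    let ready, _, _ = Unix.select !pending [] [] 1.0 in
    List.iter
      (fun fd ->
         let buf = Bytes.create 16 in
         let n = Unix.read fd buf 0 16 in
         replies := Bytes.sub_string buf 0 n :: !replies;
         pending := List.filter (fun fd' -> fd' <> fd) !pending)
      ready
  done;
  assert (List.sort compare !replies = ["one"; "two"])
```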
> There are two problems with that:
>
> . You go back to manual memory management between parallel threads/processes.
I guess you refer to explicit references between processes. This is a
kind of problem, and best handled by avoiding it. We have some cases
where we have to keep remote state. The solution was to attach a timer,
and delete the state after some time of not accessing it.
After all, most state is only temporary, and if it is lost, it can be
created again (at some cost, of course).
> . Parallelism is for performance and performance requires mutable data
> structures.
In our case, the mutable data structures that count are on disk.
Everything else is only temporary state.
I admit that it is a challenge to structure programs in a way such that
parallel programs not sharing memory profit from mutable state. Note
that it is also a challenge to debug locks in a multithreaded program so
that they run 24/7. Parallelism is not easy after all.
> Then you almost always end up copying data unnecessarily because you cannot
> collect it otherwise, which increases memory consumption and massively
> degrades performance that, in turn, completely undermines the original point
> of parallelism.
Ok, I understand. We are complete fools. :-)
I think that the cost of copying data is totally overrated. We are doing
this often, and even over the network, and hey, we are breaking every
speed limit.
> The cost of interthread communication is then so high in
> OCaml that you will rarely be able to obtain any performance improvement for
> the number of cores desktop machines are going to see over the next ten
> years, by which time OCaml will be 10-100x slower than the competition.
This is a quite theoretical statement. We will rather see that most
application programmers will not learn parallelism at all, that
consumers will start questioning the sense of multicores, and that the
chip industry will search for alternatives.
And _if_ application programmers learn parallelism, it will rather be in
the multi-processing/multiplexing setup we use, not in the multithreading
style you advocate. And on servers (where parallelism ought to happen),
Windows' poor support for it (lacking "fork" and other cool features) is
no problem.
Gerd
>
> > When was the last time you heard of a cool new windows app anyway?
>
> The last time we released a product. :-)
>
> > > . No 16Mb limit.
> >
> > What do you mean by 16mb limit?
>
> OCaml's strings and arrays are limited to 16Mb in 32-bit.
>
> > > . Inlining.
> >
> > isn't it best for the compiler to handle that? I wouldn't mind hearing
> > another perspective on this, but I thought that compilers were smarter
> > these days.
>
> Definitely not. Compilers uniformly suck at inlining. For example,
> aggressive inlining is often beneficial in numerical code and often
> damaging in symbolic code. Compilers cannot tell the difference.
>
> This is very similar to "unboxed data structures are always better", which
> also isn't generally true.
>
> I've got more gripes to add:
>
> . Missing types, like float32 and int16.
> . DLLs.
>
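As an aside on the 16Mb limit mentioned above: the ceilings are exposed in
the standard library and can be checked directly. The assertion below
assumes nothing about the build; the restriction effectively disappears on
64-bit.

```ocaml
(* Inspect the string/array size ceilings. On a 32-bit build,
   Sys.max_string_length is roughly 16 million bytes (the block size
   field in a value's header is narrow there); on 64-bit it is huge. *)
let () =
  Printf.printf "word size:         %d\n" Sys.word_size;
  Printf.printf "max string length: %d\n" Sys.max_string_length;
  Printf.printf "max array length:  %d\n" Sys.max_array_length;
  assert (Sys.word_size = 32 || Sys.max_string_length > 16_777_211)
```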
--
------------------------------------------------------------
Gerd Stolpmann * Viktoriastr. 45 * 64293 Darmstadt * Germany
ge...@gerd-stolpmann.de http://www.gerd-stolpmann.de
Phone: +49-6151-153855 Fax: +49-6151-997714
------------------------------------------------------------
--
Ralph
Do you have any pointer on multiplexing (or some piece of code you could
show)? This seems interesting but I can't figure out what it looks like.
Thanks,
--
Gabriel
~~ Robert.
Dear Jon,
I will keep reminding you that Erlang is a functional language
(just not a statically typed one). It has very lightweight
processes, concurrent schedulers on multicore, and per-process
GC. It scales very well on multicore.
I can understand if you don't consider Erlang a competitive
alternative for building ray tracers, but I will protest
while you insist on making sweeping statements about *all*
functional languages that are patently untrue for Erlang.
Both in the Haskell and Erlang communities, there is
occasional reference to Erlang/Haskell/OCaml as a very
useful triple, having much in common, but each being
especially good in some fairly well-defined area.
I think this perspective is useful for all three languages.
BR,
Ulf W
BTW, the ErlOcaml project is beginning to reach a state of
usefulness, at least for experiments:
http://code.google.com/p/erlocaml/
It allows an Ocaml program to act as a "hidden erlang node",
talking to Erlang programs via Erlang message passing.
One possible use for this could be to use Erlang as a
systems programming language, keeping your application
logic in OCaml. Erlang/OTP comes with practically everything
you need to turn your product into a robust cluster,
including a distributed database and some very useful logic
for error recovery and in-service upgrade. And it's all
Open Source.
Many thanks to Ludovic Coquelle for all the coding he has done.
The rest of us have kept to the cheerleading section for now.
For some background information look here:
- Asynchronous RPC:
http://projects.camlcity.org/projects/dl/ocamlnet-2.2.9/doc/html-main/Rpc_intro.html
Look for the asynchronous examples
- The low-level side is enlightened here:
http://projects.camlcity.org/projects/dl/ocamlnet-2.2.9/doc/html-main/Equeue_intro.html
Here is a code snippet from our production code. It is from a server
that continuously monitors RPC ports:
let check_port esys p =
(* Checks the port p continuously until mon_until - then the port is deleted
from the list
*)
let (ba,ua) = Lazy.force shmem in
let rec next_check() =
let now = Unix.gettimeofday() in
if now < Int64.to_float p.mon_until then (
let conn =
match p.mon_port with
| `local_port _ ->
failwith "Monitoring local ports is not supported"
| `inet4_port a ->
Rpc_client.Inet(a.inet4_host, a.inet4_port) in
let rpc_proto =
match p.mon_protocol with
| `tcp -> Rpc.Tcp
| `udp -> Rpc.Udp in
let client =
Capi_clnt.Generic.V1.create_client2
~esys
(`Socket(rpc_proto, conn, Rpc_client.default_socket_config)) in
Rpc_helpers.set_exn_handler "generic" client;
Rpc_client.configure client 0 5.0; (* 5 secs timeout *)
Capi_clnt.Generic.V1.ping'async
client
()
(fun get_reply ->
(* We are called back when the "ping" is replied or times out *)
let successful = try get_reply(); true with _ -> false in
Rpc_client.shut_down client;
p.mon_alive <- successful;
ba.{ p.mon_shmpos } <- (if successful then 1L else 0L);
let g = Unixqueue.new_group esys in
Unixqueue.once esys g 1.0 next_check
);
)
else
remove_port p
in
next_check()
This (single-process) server can watch a high number of ports at the
same time and consumes almost no resources. Note how the loop is
programmed, essentially we have
let rec next_check() =
...
Capi_clnt.Generic.V1.ping'async
client
()
(fun get_reply ->
...
Unixqueue.once esys g 1.0 next_check
)
("once" calls the passed functions once in the future, here after 1
second). Basically, you have a closure with the action in the event
queue, and when the action is done, a new closure is appended to this
queue to schedule the next task. Using closures is important because
this enforces that all stack values are copied to the heap (I view
closures in this context as a means to "heapify" values), so the
recursion is stackless.
This server is also interesting because we actually use shared memory
for speeding up the communication path between client and server (look
at the "ba.{ p.mon_shmpos } <- ..." line). The client is here the
program that wants to know whether a port is alive. This is an
optimization for the most frequent case, but works only if client and
server reside on the same node. (Actually, we run this server on every
node, so this is always the case.) Effectively, it is no big problem to
combine shared memory style and multi-processing.
Gerd
--
------------------------------------------------------------
Gerd Stolpmann * Viktoriastr. 45 * 64293 Darmstadt * Germany
ge...@gerd-stolpmann.de http://www.gerd-stolpmann.de
Phone: +49-6151-153855 Fax: +49-6151-997714
------------------------------------------------------------
Cheers,
David
On Fri, 2008-05-09 at 13:58 +0200, Gabriel Kerneis wrote:
> Do you have any pointer on multiplexing (or some piece of code you could
> show)? This seems interesting but I can't figure out what it looks like.
>
> Thanks,
--
David Teller
Security of Distributed Systems
http://www.univ-orleans.fr/lifo/Members/David.Teller
Angry researcher: French Universities need reforms, but the LRU act brings liquidations.
_______________________________________________
A Result module complements this with monadic operations for whoever is
interested.
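For the curious, such a Result module might look like this minimal sketch
(my own illustration, not Core's actual interface):

```ocaml
(* A result type with return/bind, so error handling composes
   monadically instead of via exceptions. *)
type ('a, 'e) result = Ok of 'a | Error of 'e

let return x = Ok x

let bind r f = match r with
  | Ok x -> f x
  | Error e -> Error e

let ( >>= ) = bind

(* Example client: division that reports its failure as a value. *)
let safe_div x y =
  if y = 0 then Error "division by zero" else Ok (x / y)

let () =
  assert ((safe_div 10 2 >>= fun q -> return (q + 1)) = Ok 6);
  assert ((safe_div 1 0 >>= fun q -> return (q + 1)) = Error "division by zero")
```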
Cheers,
David
On Fri, 2008-05-09 at 07:37 -0400, Ralph Douglass wrote:
> Not_found and Failure _ can be a bit annoying to watch for, so I
> suggest using Core. The default behavior for most functions is to
> return an option instead of raising an exception. There are _exn
> versions of the functions if you want exceptions.
--
David Teller
Security of Distributed Systems
http://www.univ-orleans.fr/lifo/Members/David.Teller
Angry researcher: French Universities need reforms, but the LRU act brings liquidations.
_______________________________________________
I disagree. SMP Erlang has no mutable data structures,
but achieves very good scalability anyway.
http://www.franklinmint.fm/blog/archives/000792.html
Performance implies different things in different
applications, of course. In some cases, I'm sure mutable
data structures might seem unavoidable. But you can't
equate them with performance.
The following quote from a recent post on the ErlyWeb list
is an example of both multicore scalability and some
wicked performance - without mutable data structures:
"Under some pretty extreme loads - around 20,000 open
mainframe connections - I was blowing up Erlyweb/Yaws.
As a sanity check, this was when Erlyweb/Yaws was
consuming ~90% of all 4 CPUs on a dedicated newish
Dell server running Ubuntu 7.10 server; there was
probably smoke coming out of the box at the time ;->
On occasion, I was also blowing up CICS on the mainframe
(a Z9 series, so pretty new gear) with workload supplied
by this single Erlyweb box on occasion, so maybe we were
getting into some weird sort of temporal, black-hole type
situation where the universe was starting to fold in on
itself ;-> There's nothing like applying extreme workloads
to show up some really strange problems. " (David Mitchell)
http://groups.google.com/group/erlyweb/browse_thread/thread/587d010c7b3e024
BR,
Ulf W
Ok, I'll take your word on that.
> > > 10. Limited standard library: I agree but this is only an issue because
> > > we are not able to fix the problem by contributing to the OCaml
> > > distribution.
> >
> > That's the whole idea of Community Caml / Batteries Included. Really, feel
> > free to contribute.
>
> I thought Community OCaml was about tacking things on externally using macros
> and extra libraries.
Yes and no.
* Batteries Included =
** take the legacy standard library, ExtLib, Camomile, SDFlow and other
libraries (list undecided so far, but pcre, ocamlnet, camlzip are strong
candidates), add some glue to get them all to work together, and build a
comprehensive external library
** do the same thing for macros
** distribute everything as a GODI package with lots of dependencies
* Community OCaml = use the same codebase but
** completely replace the legacy standard library
** turn this into a full, community-maintained, OCaml distribution.
In either case, the objective is to achieve something JDK-like in its
reach.
> > Why a full new language? I may understand the interest of writing a new
> > compiler for OCaml (or whichever other language) and gradually improving
> > the forked compiler, but that's a different story altogether.
>
> A concurrent GC is the only major issue. Everything else is comparatively
> easy.
Well, if you get to write a concurrent GC for the core of OCaml, feel
free to alert the community. People might be willing to help with
advanced features.
Cheers,
David
--
David Teller
Security of Distributed Systems
http://www.univ-orleans.fr/lifo/Members/David.Teller
Angry researcher: French Universities need reforms, but the LRU act
brings liquidations.
_______________________________________________
> We investigated alternative languages to diversify into last year and
> Haskell was one of them. The single biggest problem with Haskell is that
> it is wildly unpredictable in terms of performance and memory consumption.
>
This is only the case if you don't understand lazy evaluation. This is no
different from OCaml, or any language. One must understand the operational
semantics to write efficient code. Imagine how a C programmer feels when
writing OCaml without knowing to make functions tail-recursive.
> The Haskell mailing lists are full of people asking why their programs
> run so slowly. The response is generally to litter the code with
> strictness annotations and then resort to unsafe operations. There is
> virtually no usable information explaining how to optimize Haskell code.
>
Many people using Haskell don't fully appreciate the enormous difference
between eager and lazy evaluation; furthermore, most languages, functional
or otherwise, use some sort of eager evaluation. Strictness annotations
and unsafe operations are rarely necessary to write efficient code (but
they are necessary to make code written for an eager language run fast in
a lazy language).
In any case, I'm not trying to push Haskell or OCaml; they are both useful
in the real-world.
I wonder if similar complaints (unpredictable performance, memory use,
dearth of practical information) will arise about F# as it starts to be
widely adopted in the real world.
-Jeff
Scalability != performance. For CPU intensive tasks, Erlang is uniformly slow.
--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/products/?e
_______________________________________________
I don't see how you can say that. If you introduce the
restriction that there can be only one CPU, then this
might be true. Some applications are very difficult to
scale to multiple CPUs, but others are inherently
scalable. For some problems, mutable data structures
make all the difference, and in others, they just make
a mess of things.
If you say "parallelism is for performance", you imply
that the program is scalable, and that it actually makes
a difference to add more CPUs. In this case, the presence of
mutable data structures will make scaling more
difficult. Most problems involved in utilizing multicore
boil down to the widespread use of mutable data structures.
If the problem isn't scalable, then other tricks are needed
to achieve performance, and mutable data structures may
be indispensable.
BR,
Ulf W
If OCaml is good for concurrency on distributed systems that is great but it
is completely different to CPU-bound parallelism on multicores.
> > There are two problems with that:
> >
> > . You go back to manual memory management between parallel
> > threads/processes.
>
> I guess you refer to explicit references between processes. This is a
> kind of problem, and best handled by avoiding it.
Then you take a massive performance hit.
> > . Parallelism is for performance and performance requires mutable data
> > structures.
>
> In our case, the mutable data structures that count are on disk.
> Everything else is only temporary state.
Exactly. That is a completely different kettle of fish to writing high
performance numerical codes for scientific computing.
> I admit that it is a challenge to structure programs in a way such that
> parallel programs not sharing memory profit from mutable state. Note
> that it is also a challenge to debug locks in a multithreaded program so
> that they run 24/7. Parallelism is not easy after all.
Parallelism is easy in F#.
> > Then you almost always end up copying data unnecessarily because you
> > cannot collect it otherwise, which increases memory consumption and
> > massively degrades performance that, in turn, completely undermines the
> > original point of parallelism.
>
> Ok, I understand. We are complete fools. :-)
>
> I think that the cost of copying data is totally overrated. We are doing
> this often, and even over the network, and hey, we are breaking every
> speed limit.
You cannot afford to pay that price for parallel implementations of most
numerical algorithms.
> > The cost of interthread communication is then so high in
> > OCaml that you will rarely be able to obtain any performance improvement
> > for the number of cores desktop machines are going to see over the next
> > ten years, by which time OCaml will be 10-100x slower than the
> > competition.
>
> This is a quite theoretical statement. We will rather see that most
> application programmers will not learn parallelism at all, and that
> > consumers will start questioning the sense of multicores, and the chip
> industry will search for alternatives.
On the contrary, that is not a theoretical statement at all: it already
happened. F# already makes it much easier to write high performance parallel
algorithms and its concurrent GC is the crux of that capability.
--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/products/?e
_______________________________________________
I will keep reminding you that Erlang is not competitive, performance-wise,
for CPU-bound computation like number crunching. The fact that it scales well on
distributed clusters for massively concurrent applications is irrelevant:
that has nothing to do with multicore computing.
> "Under some pretty extreme loads - around 20,000 open
> mainframe connections - I was blowing up Erlyweb/Yaws.
> As a sanity check, this was when Erlyweb/Yaws was
> consuming ~90% of all 4 CPUs on a dedicated newish
> Dell server running Ubuntu 7.10 server; there was
> probably smoke coming out of the box at the time ;->
If this were relevant to multicore computing, an "extreme load" would
certainly be 100% CPU usage and not 90%.
--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/products/?e
_______________________________________________
Simon Peyton-Jones told me that. I am sure you will agree that he understands
lazy evaluation.
> This is no
> different from OCaml, or any language. One must understand the operational
> semantics to write efficient code. Imagine how a C programmer feels when
> writing OCaml without knowing to make functions tail-recursive.
Tail calls have simple rules. Graph reduction of a lazy program with
optimizing rewrites and strictness analysis does not adhere to simple rules.
Simple rules => predictable.
> I wonder if similar complaints (unpredictable performance, memory use,
> dearth of practical information) will arise about F# as it starts to be
> widely adopted in the real world.
F# has long since overtaken all other functional languages in terms of
industrial uptake and I have not heard that complaint from anyone. Like
OCaml, it follows simple rules and is predictable as a consequence.
Some of the rules are very different in F#. For example, nested closures
degrade performance in OCaml but improve performance in F#.
This may well change as we transition to manycore and parallelism makes
performance unpredictable.
There is a vast number of applications where performance
is not about number crunching. OCaml stands a good chance
of expanding into some of them, e.g. if it can grow into
providing better support for concurrency.
> The fact that it scales well on
> distributed clusters for massively concurrent applications
> is irrelevant: that has nothing to do with multicore
> computing.
Who's been talking about distributed clusters?
Erlang does scale on distributed clusters, but the
link I provided was about scaling on multicore.
OCaml could too, but I don't think that insisting on
the use of mutable data structures is the way to go.
This may surprise you, but I don't subscribe to this list
in order to try to convert OCaml programmers to Erlang.
I think Erlang, OCaml and Haskell can benefit from
an exchange of ideas. I will continue to believe that at
least some members of the OCaml community share this view.
BR,
Ulf W
> On Friday 09 May 2008 16:38:55 Jeff Polakow wrote:
> > Hello,
> >
> > > We investigated alternative languages to diversify into last year and
> > > Haskell
> > > was one of them. The single biggest problem with Haskell is that it is
> > > wildly
> > > unpredictable in terms of performance and memory consumption.
> >
> > This is only the case if you don't understand lazy evaluation.
>
> Simon Peyton-Jones told me that. I am sure you will agree that he
> understands lazy evaluation.
>
> > This is no different from OCaml, or any language. One must understand
> > the operational semantics to write efficient code. Imagine how a C
> > programmer feels when writing OCaml without knowing to make functions
> > tail-recursive.
>
> Tail calls have simple rules. Graph reduction of a lazy program with
> optimizing rewrites and strictness analysis does not adhere to simple
> rules.
> Simple rules => predictable.
>
I concur. Having your programming language require a compiler with
unspecifiably sophisticated optimizations to yield useful performance is
not a very good thing.
I wrote something about this here:
http://abaababa.blogspot.com/2008/03/transparent-performance.html
> I wonder if similar complaints (unpredictable performance, memory use,
> dearth of practical information) will arise about F# as it starts to be
> widely adopted in the real world.
> F# has long since overtaken all other functional languages in terms of
> industrial uptake and I have not heard that complaint from anyone. Like
> OCaml, it follows simple rules and is predictable as a consequence.
Jon, I imagine that the reasons for your unending praise of F# concurrent
goodness are specific to your peculiar number-crunching, cash-earning
applications, of which we don't know much.
Would you be kind enough to detail a little bit, on this list, the kind of
things you do with F# and Ocaml, what kind of customers you have, what
their attitudes are towards Ocaml and functional programming, and what
horrible lengths Ocaml forces you to go to in order to ship sellable
closed-source code, if any? And please, don't ask us to spend thirty
pounds (is that still a valid currency?) for that information :)
Meanwhile the world has been going free & open source for 10 years, and F#
is about as FOSS as the Coca-Cola company sells organic broccoli. Also, I'm
not sure of the speed at which the Windows platform will go down the drain,
but it doesn't seem to be doing very well. Agreed, Windows has its niches
in the industry (finance, CAD, electronics), but some subniches could
quickly switch to Linux/BSD/whatever the moment their major commercial
applications (say, Autocad) are ported to Linux.
However, 10 years from now we'll still have Linux, and Ocaml will still be
running on it.
As the Ocaml community, with some effort on our part, we could make Ocaml
a substantial alternative to the "P" in LAMP. The monadic prison is too
tough for the layman, yet laymen are starting to ask questions about the
adequacy of "dynamic typing" w.r.t. software engineering problems
(maintainability, cooperation, monkey patching issues) and performance
problems.
So let's continue the good work on Batteries, Core Lib, etc. and not worry
too much about the concurrency of the GC. If that gets fixed, it'll be good
to have around, but it's not critical.
--
Berke
You sound like somebody who tries to sell hardware :-)
Well, our algorithms are quite easy to parallelize. I don't see a
difference in whether they are CPU-bound or disk-bound - we also have
lots of CPU-bound stuff, and the parallelization strategies are the
same.
The important thing is whether the algorithm can be formulated in a way
so that state mutations are rare, or can at least be done in a
"cache-friendly" way. Such algorithms exist for a lot of problems. I
don't know which problems you want to solve, but it sounds as if they
were special problems. Like for most industries, most of our problems
are simply "do the same for N objects" where N is very large, and
sometimes "sort data", also for large N.
> > In our case, the mutable data structures that count are on disk.
> > Everything else is only temporary state.
>
> Exactly. That is a completely different kettle of fish to writing high
> performance numerical codes for scientific computing.
I don't understand. Relying on disk for sharing state is a big problem
for us, but unavoidable. Disk is slow memory with a very special timing.
Experience shows that even accessing state over the network is cheaper
than over disk. Often, we end up designing our algorithms around the
disk access characteristics. Compared to that the access to RAM-backed
state over network is fast and easy.
> > I admit that it is a challenge to structure programs in a way such that
> > parallel programs not sharing memory profit from mutable state. Note
> > that it is also a challenge to debug locks in a multithreaded program so
> > that they run 24/7. Parallelism is not easy after all.
>
> Parallelism is easy in F#.
Wonders must have happened that I'm not aware of. How does F# prevent
deadlocks?
> > This is a quite theoretical statement. We will rather see that most
> > application programmers will not learn parallelism at all, and that
> consumers will start to question the sense of multicores, and the chip
> > industry will search for alternatives.
>
> On the contrary, that is not a theoretical statement at all: it already
> happened. F# already makes it much easier to write high performance parallel
> algorithms and its concurrent GC is the crux of that capability.
Don't misunderstand me, I'm not anti-F#. I only have no interests right
now in taking advantage of multicores by concurrent GC's. I rather want
to have an ultra-fast single-core execution. I can do the
parallelization myself.
Gerd
--
------------------------------------------------------------
Gerd Stolpmann * Viktoriastr. 45 * 64293 Darmstadt * Germany
ge...@gerd-stolpmann.de http://www.gerd-stolpmann.de
Phone: +49-6151-153855 Fax: +49-6151-997714
------------------------------------------------------------
I'm thinking of a more detailed practical use-case with more contextual
detail ("we're indexing people"), numbers ("we have 1,234,567 persons in our
database, distributed over X machines with Y processors"), performance
figures (each CPU runs x Ocaml processes compiled in native code, those eat
95% of CPU and spend 5% of their time in IO with an RSS of 80MB; profiling
shows that most of their time is spent there; we figured that if you do this
and that, it goes faster, etc.).
That kind of information would be very welcome and could do much good to
Ocaml, instead of flamewar-provoking posts.
--
Berke
shm_open shares memory through file descriptors and, under
linux/glibc, this is done using /dev/shm. You can mmap this as a bigarray
and, voila, shared memory. This is quite nice for numerical
computation, plus you get closures etc... in your forks. Oh and COW on
modern OS's makes this very cheap.
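A minimal sketch of the mmap-as-bigarray idea (the path and size are
illustrative, and `Unix.map_file` assumes a reasonably recent OCaml):

```ocaml
(* Sketch: map a shared-memory file as a float Bigarray.  On Linux/glibc
   a path under /dev/shm lands in a tmpfs, so two processes mapping the
   same path share the pages. *)
let map_shared path n =
  let fd = Unix.openfile path [Unix.O_RDWR; Unix.O_CREAT] 0o600 in
  (* [true] = MAP_SHARED; the file is grown to hold n floats *)
  let a =
    Bigarray.array1_of_genarray
      (Unix.map_file fd Bigarray.float64 Bigarray.c_layout true [| n |])
  in
  Unix.close fd;  (* the mapping survives closing the descriptor *)
  a
```

Any process that calls `map_shared "/dev/shm/demo" 1024` sees writes made
by the others through the same mapping.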
--
http://till-varoquaux.blogspot.com/
Yes, that's the kind of approach I like.
- Do not forget to do a Gc.compact before forking to avoid collecting the
same unreachable data in each fork.
- For sharing complex data, you can marshal into a shared Bigarray.
If the speed of Marshal becomes a bottleneck, a specialized Marshal that
skips most of the checks and byte-oriented, compact-serialization things
that extern.c currently does could speed things up.
- A means for inter-process synchronization/communication is still needed.
A userland solution using a shared-memory consensus algorithm (which would
probably require some C or assembly for atomic operations) could be cheap.
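The marshal-into-a-shared-Bigarray idea can be sketched as follows; the
mailbox layout (a 4-byte length prefix followed by the payload) is purely
an assumption for illustration:

```ocaml
(* Sketch: a shared char Bigarray as a crude one-message mailbox.
   The layout (little-endian length word, then the marshalled payload)
   is made up; a real mailbox also needs the synchronization mentioned
   in the last bullet, which this sketch ignores entirely. *)
type buf =
  (char, Bigarray.int8_unsigned_elt, Bigarray.c_layout) Bigarray.Array1.t

let write_msg (buf : buf) v =
  let s = Marshal.to_bytes v [] in
  let n = Bytes.length s in
  assert (n + 4 <= Bigarray.Array1.dim buf);
  for i = 0 to 3 do
    buf.{i} <- Char.chr ((n lsr (8 * i)) land 0xff)
  done;
  for i = 0 to n - 1 do buf.{4 + i} <- Bytes.get s i done

let read_msg (buf : buf) =
  let n = ref 0 in
  for i = 3 downto 0 do n := (!n lsl 8) lor Char.code buf.{i} done;
  let s = Bytes.create !n in
  for i = 0 to !n - 1 do Bytes.set s i buf.{4 + i} done;
  Marshal.from_bytes s 0
```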
--
Berke
I can find lots of interesting use of all my cores without having all my
programs be _automatically_ multi-threaded.
> > exception traces are *not* available in long running program (daemon).
>
> Because you compiled it wrongly or because you lost the output?
i see you have no idea what a daemon is, and how ocaml can print its
stack trace only on stderr when reaching toplevel (i.e. quitting).
(on a non-modified version of ocaml)
> > have you been hiding in a cave lately?
>
> With yo mamma.
this actually sums up your posts and the ideas contained in them pretty well.
> You aren't customer facing are you?
funny enough that you say that. we both know which one of us is actually
producing some ocaml software that is really selling.
--
Vincent
Please stop that. Both of you.
--
David Teller
Security of Distributed Systems
http://www.univ-orleans.fr/lifo/Members/David.Teller
Angry researcher: French Universities need reforms, but the LRU act brings liquidations.
_______________________________________________
Now, that's a cliffhanger. Could you elaborate ?
Cheers,
David
> > I think that the cost of copying data is totally overrated. We are doing
> > this often, and even over the network, and hey, we are breaking every
> > speed limit.
>
> You cannot afford to pay that price for parallel implementations of most
> numerical algorithms.
Er... Not being a specialist, I may be wrong, but I seem to remember
that you can afford that, as long as you're also doing something else
during that copy.
> On the contrary, that is not a theoretical statement at all: it
> already
> happened. F# already makes it much easier to write high performance
> parallel
> algorithms and its concurrent GC is the crux of that capability.
Examples ? Pretty please ?
Cheers,
David
--
David Teller
Security of Distributed Systems
http://www.univ-orleans.fr/lifo/Members/David.Teller
Angry researcher: French Universities need reforms, but the LRU act
brings liquidations.
At the risk of sounding like a broken record ... OR you could use
Ancient which does the above, right.
Rich.
--
Richard Jones
Red Hat
Figures to back up this extraordinary claim? (And I don't mean the
unverifiable figures of a certain Cambridge-based consultancy).
These commercial enterprises better hope they don't need to modify the
F# compiler at all, and that MS keep releasing new versions and fixes
forever, because the terms of the F# license would prevent them from
fixing it themselves (unlike if they'd decided to go with an open
source solution).
Rich.
--
Richard Jones
Red Hat
> On Fri, May 09, 2008 at 11:13:26PM +0200, Berke Durak wrote:
> > - For sharing complex data, you can marshall into a shared Bigarray.
> >
> > If the speed of Marshal becomes a bottleneck, a specialized Marshal that
> > skips most of the checks/byte-oriented, compact serialization things that
> > extern.c currently does could speed
> > things up.
>
> At the risk of sounding like a broken record ... OR you could use
> Ancient which does the above, right.
>
I just looked at the interface and in spite of an initial difference in aims
(sharing large, static data between processes), Ancient seems to be almost
what's needed for parallelism.
It could be used as an IPC mailbox, provided you map /dev/shm, come up with
some cheap means of synchronizing your processes (possibly by using a
futex?) and ensure that the creation/deletion functions do not have too
much overhead.
But you are saying in the README that values in the ancient heap have some
limitations, namely no ad-hoc polymorphic primitives. That may be acceptable
for sharing a large static database but won't cut the mustard for cheap
message-passing. So we need a function to copy an ancient value back into
the heap so that it can be used normally, which is what I was talking about:
a simplified, cheaper Marshal module.
--
Berke
and please stop throwing him bones to carry on this _stupid_ troll thread.
--
Vincent
On 1 CPU Erlang is currently ~5x slower in this context. So even if we ignore
the cost of message passing we know Erlang cannot be competitive for <6
cores. In fact, you find that the cost of message passing over mutation makes
Erlang not competitive for any existing computers in this context, even 256
CPU shared memory supercomputers.
This is why Erlang has not (and will not) penetrate the market of scientists
programming shared memory supercomputers. For the same reason, Erlang is not
relevant for exploiting multicore: its remit is massive concurrency which is
a completely different problem.
The theoretical scaling of Erlang's performance in terms of asymptotic
complexity is irrelevant because we're still looking at small numbers (<<256)
of cores and the prefactor is huge. Moreover, this will never change: even
when we have 1,024-core machines there will still be locality effects best
exploited by sharing memory between physically-close cores.
So neither approach is a panacea but concurrent GC is essential now and
message passing will be essential in 10 years.
> Some applications are very difficult to
> scale to multiple CPUs, but others are inherently
> scalable. For some problems, mutable data structures
> make all the difference, and in others, they just make
> a mess of things.
>
> If you say "parallelism is for performance", you imply
> that the program is scalable, and that it actually makes
> a difference to add more CPUs. In this case, the presence
> of mutable data structures will make scaling more
> difficult.
That is commonly claimed by the proponents of alternative approaches (purity,
STM, message passing etc.) but I have found it to be untrue in the
case of F# although I can believe it applies when there is uncontrolled
mutation.
--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/products/?e
You misunderstand this limitation, but anyway ...
> So we need a function to copy back an ancient value in the heap so
> that it can be used normally
Should you want to copy a value back to the heap (almost always
unnecessary), you can just copy it like you would any other value.
> On Sat, May 10, 2008 at 01:01:03AM +0200, Berke Durak wrote:
> > But you are saying in the README that values in the ancient heap have
> some
> > limitations, namely no ad-hoc polymorphic primitives.
>
> You misunderstand this limitation, but anyway ...
The paragraph from your README then needs some clarification.
> (1) Ad-hoc polymorphic primitives (structural equality, marshalling
> and hashing) do not work on ancient data structures, meaning that you
> will need to provide your own comparison and hashing functions. For
> more details see Xavier Leroy's response here:
As you cannot mutate anything that is ancient (since it might be concurrently
accessed), you cannot mark or modify them in-place for ad-hoc marshalling or
deep copying. Is that correct?
Comparison does not mark (and thus does not work on cyclic structures).
Does it work on values in the ancient heap (I'm not talking of handles
here)?
So it seems that adding a generic copy-out-of-the-ancient heap function
(which marks in a private area) would be worthwhile. Should not be too
difficult.
--
Berke
Is it what you want: http://caml.inria.fr/mantis/view.php?id=4265 ?
Cheers,
ChriS
There are various restrictions on mutating the ancient data, which are
explained in the README. It is not true that ancient data is
completely immutable, just that you really need to understand what
you're doing in order to do it correctly. There are additional
restrictions to mutating [any] data concurrently, but I didn't explain
those because they are obvious, and can be solved with standard
threading techniques (mutexes, etc.).
> you cannot mark or modify them in-place for ad-hoc marshalling or
> deep copying. Is that correct?
I'm not sure what this means. I haven't tried to Marshal ancient
data, because ancient data can already be persisted, so marshaling it
doesn't make much sense.
'Deep copying' of ancient data can be done just like deep copying any
other OCaml value.
> Comparison does not mark (and thus does not work on cyclic structures).
> Does it work on values in the ancient heap (I'm not talking of handles
> here)?
Your use of 'value', 'handle' etc. is confusing me. I suggest you
take a look at how OCaml values are stored (eg. <caml/mlvalues.h> is a
good place to start).
Anyhow, the polymorphic primitives (like %compare) don't work, just
because they make the assumption that anything outside the normal
OCaml heap is incomparable, but you can certainly write your own
comparison functions to replace those, eg. comparing
character-by-character for strings.
This has nothing to do with 'marking'.
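A hand-rolled comparison of the kind described here might look like this
(a sketch that never touches the polymorphic primitive, so it is safe on
values outside the normal heap):

```ocaml
(* Sketch: character-by-character string comparison that avoids the
   polymorphic %compare primitive.  Only immediate values (ints, chars)
   are ever compared. *)
let compare_strings (a : string) (b : string) =
  let la = String.length a and lb = String.length b in
  let rec go i =
    if i >= la || i >= lb then compare la lb   (* int compare: immediate *)
    else
      let c = Char.compare a.[i] b.[i] in
      if c <> 0 then c else go (i + 1)
  in
  go 0
```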
> So it seems that adding a generic copy-out-of-the-ancient heap function
> (which marks in a private area) would be worthwhile. Should not be too
> difficult.
As I said earlier, you can just copy values from the ancient heap as
you would any other value, eg. using { ... with ... } syntax or
Array.copy / String.copy etc.
Let's say this again. Values on the ancient heap look just like
values anywhere else in the program. You can pass them to functions,
print them out, add them up, do whatever else you would normally do,
with very few restrictions. The differences are:
- the polymorphic primitives don't work (so you can't compare or hash them)
- they don't get garbage collected
- you should be very careful about mutating them
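Such a copy can be sketched with ordinary OCaml code; the record type here
is hypothetical:

```ocaml
(* Sketch: copying a (possibly ancient) value back onto the normal OCaml
   heap using only ordinary operations, as Rich suggests.  The [person]
   type is made up for illustration. *)
type person = { name : string; age : int }

let copy_person p =
  (* { ... with ... } allocates a fresh record; copying the string too
     makes the result fully resident on the normal heap *)
  { p with name = String.sub p.name 0 (String.length p.name) }

let copy_matrix m = Array.map Array.copy m
```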
yes, and that's what we use already; but it's not yet in the default
version of ocaml, which means that my debian upstream ocaml version is
not able to do it.
actually, i just saw the update from 18th march about the tentative
implementation in CVS. i really hope that's going to make it.
--
Vincent
Sorry, but it's not us who are provoking. There is this F# fanboy who tries
to play his games, and doesn't say what his motivations are. You know, when
there is money involved, there are interests.
I'll try to write something. I have to be careful, however, because we
have competitors, and they are also after such information. Maybe I pick
one of our subsystems. That should be good enough for a case study while
not revealing too many details.
I thought "this context" for this thread was a blog article
that discussed OCaml's weaknesses in general terms.
Just looking at the default weighting of the shootout
benchmarks, the Erlang/OCaml ratio is 3.17; excluding
the only concurrency-related benchmark, where Erlang
is 10.6x faster than OCaml, it's 4.0.
(Not that I consider the shootout representative of the
products we build...)
> So even if we ignore the cost of message passing we know
> Erlang cannot be competitive for <6 cores.
[...]
> This is why Erlang has not (and will not) penetrate the
> market of scientists programming shared memory
> supercomputers.
The problem with your argumentation is that you make
sweeping general statements and later, when challenged
justify them by claiming them to be true in a very
specific context - presumably your favourite problem.
I can only assume that you are perfectly happy with
OCaml staying focused on scientific computation and
applications that have the same characteristics.
> For the same reason, Erlang is not relevant for exploiting
> multicore: its remit is massive concurrency which is
> a completely different problem.
Perhaps we live on different planets...?
In my world, products are never this black and white.
There are constant tradeoffs, where a high degree of
concurrency is almost always a factor, but also fairly
big chunks of sequential processing where raw performance
is important. The problem is that we can't find one
language that does it all. Erlang is admittedly slow on
things like text parsing, math, etc., but jumping out of
the shared memory space and executing the code in something
like C or OCaml, we pay so much for heavyweight
communication that it's usually not worth the effort - never
mind the fact that debugging becomes much, much harder.
So on total system performance, we usually do very well,
even though it is pretty slow on some specific parts of
the problem. And we get superior robustness and
maintenance cost, which for our customers is far more
important than raw performance.
Right now, the main language that I see as an interesting
contender for our type of products is Haskell, because it
combines very good performance with very good support for
lightweight concurrency /and/ offers very high productivity.
OCaml, sadly, cannot even be considered for anything but
specialized tasks, since it has no credible support for
concurrency. I don't really see why it'd have to stay
that way, except that interest in message-passing
concurrency seems to have been very low so far on this
mailing list.
(Actually, I'd rank Felix higher, if it could ever rise
from obscurity, since it was designed to run safely as
a dynamically linked component. It could work as a very
nice complement to Erlang.)
And as for exploiting multicore, we simply cannot get
our hands on enough cores quickly enough (mainly because
we need them in compact NEBS-compliant embedded systems).
But we ship dual-core systems today, and got a 1.7x
speedup without even recompiling the code. Very soon,
we'll move to quad-cores, and expect corresponding
speedups again. 8-core boards are on the horizon.
Just recently, we noted that an Erlang-based product may
well surpass what's been the undisputed market leader on
performance using an 8-core machine, since that product
(hand-written C++) cannot use more than 1 core due to
timing issues. And the Erlang product in question wasn't
even designed for raw performance, but for maximum
convenience.
Perhaps this is all irrelevant to your particular
consultancy company? I know a number of companies that
consider it extremely relevant /today/ - in market
segments worth many billions of dollars.
BR,
Ulf W
So we agree that Erlang is not in the same league as OCaml for CPU-intensive
tasks on <6 cores?
> > So even if we ignore the cost of message passing we know
> > Erlang cannot be competitive for <6 cores.
>
> [...]
>
> > This is why Erlang has not (and will not) penetrate the
> > market of scientists programming shared memory
> > supercomputers.
>
> The problem with your argumentation is that you make
> sweeping general statements and later, when challenged
> justify them by claiming them to be true in a very
> specific context - presumably your favourite problem.
I specifically said "For CPU intensive tasks...". I was not making a "sweeping
general statement".
I do not believe that not-massively-concurrent applications are a niche
either. They constitute the vast majority of programs.
> > For the same reason, Erlang is not relevant for exploiting
> > multicore: its remit is massive concurrency which is
> > a completely different problem.
>
> Perhaps we live on different planets...?
> In my world, products are never this black and white.
>
> There are constant tradeoffs, where a high degree of
> concurrency is almost always a factor,
In the specific context of your products, yes. Few general programs require a
high degree of concurrency though.
> but also fairly
> big chunks of sequential processing where raw performance
> is important. The problem is that we can't find one
> language that does it all. Erlang is admittedly slow on
> things like text parsing, math, etc., but jumping out of
> the shared memory space and executing the code in something
> like C or OCaml, we pay so much for heavyweight
> communication that it's usually not worth the effort - never
> mind the fact that debugging becomes much, much harder.
Perhaps. Comparing debugging is apples and oranges though. F# inherited great
debugging facilities from .NET but I never use them because static type
checking catches my errors instead. I believe a parallel debugger for OCaml
would be equally useless (if OCaml supported parallelism).
> So on total system performance, we usually do very well,
> even though it is pretty slow on some specific parts of
> the problem. And we get superior robustness and
> maintenance cost, which for our customers is far more
> important than raw performance.
I don't doubt that but I do not believe that Erlang's success with massively
concurrent applications has anything to do with adding better multicore
support to OCaml.
> Right now, the main language that I see as an interesting
> contender for our type of products is Haskell, because it
> combines very good performance with very good support for
> lightweight concurrency /and/ offers very high productivity.
I'm surprised. When I studied Haskell I felt that it was an academic language
with the same implementation problems as OCaml. There are no decent
development environments (Haskell did not even have type throwback).
Performance and memory consumption are wildly unpredictable, even between
compiler versions. Virtually nobody uses it outside academia (unlike OCaml,
which has been seeing substantial penetration in industry for many years).
Moreover, the paper describing Haskell's parallel GC gives performance figures
showing some applications degrading in performance when moving from 4 to 8
cores. So I think the Haskell community's claim that it is "good for
parallelism" is a triumph of hope over reality.
Haskell is also commerce unfriendly. Nobody sells Haskell DLLs for other
Haskell programmers (same with OCaml). There is a book market but only for
academics. Maybe "Real World Haskell" will change that but I won't hold my
breath, not least because I cringe at the idea of a product with "real world"
in the title.
> OCaml, sadly, cannot even be considered for anything but
> specialized tasks, since it has no credible support for
> concurrency. I don't really see why it'd have to stay
> that way, except that interest in message-passing
> concurrency seems to have been very low so far on this
> mailing list.
If that were true you would expect to see OCaml's use being restricted to
certain domains when, in fact, it appears to be much more broadly used than
Erlang.
> (Actually, I'd rank Felix higher, if it could ever rise
> from obscurity, since it was designed to run safely as
> a dynamically linked component. It could work as a very
> nice complement to Erlang.)
>
> And as for exploiting multicore, we simply cannot get
> our hands on enough cores quickly enough (mainly because
> we need them in compact NEBS-compliant embedded systems).
> But we ship dual-core systems today, and got a 1.7x
> speedup without even recompiling the code. Very soon,
> we'll move to quad-cores, and expect corresponding
> speedups again. 8-core boards are on the horizon.
Dell's eight core desktops are now only £1,080! :-)
> Just recently, we noted that an Erlang-based product may
> well surpass what's been the undisputed market leader on
> performance using an 8-core machine, since that product
> (hand-written C++) cannot use more than 1 core due to
> timing issues. And the Erlang product in question wasn't
> even designed for raw performance, but for maximum
> convenience.
>
> Perhaps this is all irrelevant to your particular
> consultancy company?
Without knowing what problem was being solved I cannot say whether or not it
is relevant to our work. I suspect it is another massively concurrent
application, in which case it is irrelevant for us, yes.
--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/products/?e
Now, if it were not only possible but really easy for me to do fine-grain
parallelization, like with OpenMP, Task Parallel Library, Cilk, etc., then
there'd be little reason for me not to take advantage of current multicores.
But again, it's not like this would enable me to attack all these problems
that I can't now. I don't doubt such problems exist, but they don't for me
(or really anyone I know). So, to me, OCaml's lack of shared-memory
concurrency is a shame but not a show-stopper.
-Mike Lin
PS. Apologies in advance, but I can never resist the urge to good-naturedly
hoist one by his own petard,
"I'm hard pressed to find a practically-important application that
will be better off with a concurrent GC..."
-Jon Harrop, 11 months ago
http://groups.google.com/group/comp.lang.functional/browse_thread/thread/a288a989c5cd6acb/50b6ca2607173c91
On Sat, May 10, 2008 at 10:51 AM, Ulf Wiger (TN/EAB) <ulf....@ericsson.com>
wrote:
Sure. Review the concerns cited regarding parallel programming in OCaml:
1. When do we fork? Earlier to amortize the overhead or later to enable
sharing of data structures?
2. How do we avoid excessive copying? What if each parallel thread only
requires part of a data structure, should we dissect the data structure to
alleviate message passing?
3. How do we pass values like heap allocated custom blocks?
4. Do we use the Ancient module and manage our memory manually so that we can
mutate shared data?
These all make parallel programming much harder in OCaml than it needs to be.
All of these concerns simply disappear when you have a concurrent GC. F#
already has one. Hopefully OCaml will soon.
Addressing each point in turn:
1. Use the high-performance thread pool provided by the Task Parallel Library.
2. There is no copying.
3. You can pass any value to another thread because everything is uniformly
represented.
4. Memory management is fully automated and shared mutable data is easy.
Consider the embarrassingly parallel task of matrix-matrix multiplication. Note
that this is absolutely best-case for OCaml because the overheads are as
small as possible thanks to the complete absence of fine-grained parallelism.
Most real applications require much more fine-grained parallelism and OCaml's
performance is much worse as a consequence.
In F#, this is really easy to write:
let c = Array2.zero_create n n
Parallel.For(0, n, fun i ->
    for j = 0 to n - 1 do
        let mutable r = 0.0
        for k = 0 to n - 1 do
            r <- r + a.[i,k] * b.[k,j]
        c.[i,j] <- r)
There are several things to notice about this:
- The input matrices are immediately accessible from parallel threads without
us having to do anything. There is no copying, so memory consumption is lower
and there is no overhead to worry about.
- The output matrix is filled directly using mutation. Again, there is no
copying, no subsequent single-threaded collation of results and no overhead.
- No manual memory management is required.
There is no easy solution in OCaml but you might:
- Fork a process for each CPU at the start of the multiply in order to avoid
copying the input matrices, use message passing to return the result and
collate it serially.
- Prefork a process for each CPU and use message passing to queue work items,
pass the input matrices using message passing, compute results, pass results
back using message passing and collate them serially.
So we might have an O(1) hit from forking, an O(n^2) hit from message passing
and collation and an O(n^3) actual algorithm. Like Ulf said, message passing
scales well. However, its raw performance is awful compared to leveraging
shared memory.
Here is an OCaml implementation of the above F#:
let invoke (f : 'a -> 'b) x : unit -> 'b =
  let input, output = Unix.pipe() in
  match Unix.fork() with
  | -1 -> (let v = f x in fun () -> v)
  | 0 ->
      Unix.close input;
      let output = Unix.out_channel_of_descr output in
      Marshal.to_channel output (try `Res(f x) with e -> `Exn e) [];
      close_out output;
      exit 0
  | pid ->
      Unix.close output;
      let input = Unix.in_channel_of_descr input in
      fun () ->
        let v = Marshal.from_channel input in
        ignore (Unix.waitpid [] pid);
        close_in input;
        match v with
        | `Res x -> x
        | `Exn e -> raise e
let parmul_aux i0 i1 n a b =
  let c = Array.make_matrix (i1 - i0) n 0. in
  for i = i0 to i1 - 1 do
    let ai = a.(i) in
    for j = 0 to n - 1 do
      let r = ref 0.0 in
      for k = 0 to n - 1 do
        r := !r +. Array.unsafe_get ai k
                *. Array.unsafe_get (Array.unsafe_get b k) j
      done;
      c.(i - i0).(j) <- !r
    done
  done;
  c

let parmul n a b =
  let c1 = invoke (fun () -> parmul_aux 0 (n/2) n a b) () in
  let c2 = parmul_aux (n/2) n n a b in
  Array.concat [c1 (); c2]
Note that we must manually insert unsafe array accesses and hoist loop
invariants in order to obtain comparable performance. This code is clearly
much more complicated than the F# and that complexity is completely
unnecessary.
> > > I think that the cost of copying data is totally overrated. We are
> > > doing this often, and even over the network, and hey, we are breaking
> > > every speed limit.
> >
> > You cannot afford to pay that price for parallel implementations of most
> > numerical algorithms.
>
> Er... Not being a specialist, I may be wrong, but I seem to remember
> that you can afford that, as long as you're also doing something else
> during that copy.
Here are some actual performance results for this task on my machine:
  n      OCaml               F#
         Serial    Parallel  Serial    Parallel
  16     0.00005   0.00022   0.00002   0.000045
  32     0.00040   0.00067   0.00015   0.00015
  64     0.0033    0.0021    0.00090   0.00068
  512    4.0       2.4       2.5       1.4
As you can see, F# is much faster in every case. Larger "n" approaches the
limit where F# is only 1.6x faster than OCaml. For smaller "n" the gap
quickly widens with F# being 5x faster for n=16.
> > On the contrary, that is not a theoretical statement at all: it
> > already happened. F# already makes it much easier to write high
> > performance parallel algorithms and its concurrent GC is the crux of that
> > capability.
>
> Examples ? Pretty please ?
The above examples are a good start. The latest F#.NET Journal article covers
parallelism using Microsoft's TPL. A future OCaml Journal article will cover
the use of fork-based parallelism. I think you can see what I mean from the
results I've presented in this post though.
To put this another way, multicore computing provides hardware accelerated
shared memory and a concurrent GC makes it easy to exploit that. Message
passing is fine for concurrent applications that are not CPU bound or for
distributed computing but it is not competitive on today's multicore machines
and will not become competitive in the next decade.
--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/products/?e
i don't understand any of this.
>2. How do we avoid excessive copying? What if each parallel thread only
>requires part of a data structure, should we dissect the data structure to
>alleviate message passing?
this seems to suggest that message passing implies serialising the data structure.
perhaps i missed something, though.
For compute-intensive tasks that don't require coordination
with other tasks, sure.
> I specifically said "For CPU intensive tasks...".
> I was not making a "sweeping general statement".
Perhaps we simply read different things into the word
"CPU-intensive". If a program is able to make full use
of the CPU and is limited by available CPU capacity,
I will call it CPU-intensive.
In another post you compared an F# program with an
OCaml program, where the OCaml program had to do
marshaling and unmarshaling in order to achieve
cooperation between tasks. This is of course also
CPU-intensive, and if a significant amount of this
is required, OCaml will lose against Erlang big time,
just as it did against F#.
> In the specific context of your products, yes. Few
> general programs require a high degree of
> concurrency though.
One of the things you quickly notice when programming
in a language that really encourages you to explore
the concurrency patterns in your problem, is that
there is much more concurrency than you'd think, even
in conventional programs.
It's just that we've grown so accustomed to mapping
problems onto a sequential model, simply because
dealing with concurrency is a nightmare in most
conventional programming languages.
Here's another way of looking at it: While programming
so far has been very much about automating singleton tasks
that would otherwise take a long time to perform manually
(e.g. scientific computing), there is a huge number of
problems that are normally solved by groups of people
cooperating. These can be automated too, but without
mirroring the concurrency patterns used by people when
solving these problems manually, it gets very difficult.
We'll see a great number of web-based applications where
you not only need to handle hundreds or thousands of
simultaneous visitors, but each session will require
(real-time) interaction with other systems, as well as
perhaps other sessions within the system.
I think the term "massive concurrency" tends to lead to
misunderstandings. The key characteristic of a concurrency-
oriented language is that it's so easy and cheap to create
processes and coordinate between tasks, that you don't
think twice about creating just as many processes as you
need in order to solve the problem neatly. If this means
creating 10,000 processes, that shouldn't be an issue, just
like creating 10,000 objects shouldn't be for an OO language.
But to fully utilize 4 or 8 cores, you don't need thousands
of processes.
>> but jumping out of
>> the shared memory space and executing the code in something
>> like C or OCaml, we pay so much for heavyweight
>> communication that it's usually not worth the effort - never
>> mind the fact that debugging becomes much, much harder.
>
> Perhaps. Comparing debugging is apples and oranges though.
I simply meant that combining different languages and
execution environments makes debugging much more difficult
than using just one.
>> So on total system performance, we usually do very well,
>> even though it is pretty slow on some specific parts of
>> the problem. And we get superior robustness and
>> maintenance cost, which for our customers is far more
>> important than raw performance.
>
> I don't doubt that but I do not believe that Erlang's success
> with massively concurrent applications has anything to do
> with adding better multicore support to OCaml.
I agree that this is something very different from trying to
speed up scientific calculations using shared-memory
parallelism. But I would encourage a look at how lightweight
message passing concurrency can be used to build very complex
and robust systems. Jocaml is a very interesting start.
>> Right now, the main language that I see as an interesting
>> contender for our type of products is Haskell, because it
>> combines very good performance with very good support for
>> lightweight concurrency /and/ offers very high productivity.
>
> I'm surprised. [...]
> Moreover, the paper describing Haskell's parallel GC gives
> performance figures showing some applications degrading
> in performance when moving from 4 to 8 cores. So I
> think the Haskell community's claim that it is "good for
> parallelism" is a triumph of hope over reality.
I'm not into shared-memory parallelism. For me, it's much
more important that the language be good at coordination
between tasks - since failure in that area means death
by complexity in my domain.
There was a project comparing Erlang and C++ for telecoms
software, which also looked at Glasgow Distributed Haskell
for the same tasks. While it came to some fairly concrete
conclusions re. Erlang vs C++, the work on Haskell was not
exhaustive enough to be directly comparable. Still, I would
view the preliminary results as very interesting.
F# could also potentially be interesting (if the underlying
concurrency support improves substantially in terms
of thread creation, number of threads, scheduling cost, etc.,
and I have reason to believe it will). The main obstacle
is that we are extremely wary of vendor lock-in, both in
terms of programming language and operating system.
>> And as for exploiting multicore, we simply cannot get
>> our hands on enough cores quickly enough (mainly because
>> we need them in compact NEBS-compliant embedded systems).
>> But we ship dual-core systems today, and got a 1.7x
>> speedup without even recompiling the code. Very soon,
>> we'll move to quad-cores, and expect corresponding
>> speedups again. 8-core boards are on the horizon.
>
> Dell's eight core desktops are now only £1,080! :-)
Well, a single-core NEBS-compliant rack-mountable Dell
server with 128 MB RAM is over $5,000! (:
(Not that we could use those either. We need Micro TCA
form factor.)
BR,
Ulf W
Copying is the obvious solution in OCaml because there is no easy way to share
OCaml values between processes.
There are alternatives (like referencing shared memory as custom blocks and
managing deallocation manually) but they are substantially more complicated
and it is not clear whether or not they will be faster in OCaml because
indirection is required to disambiguate the heaps for the GC and prevent it
from traversing the shared memory.
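For concreteness, one of the alternatives alluded to above can be sketched with a memory-mapped Bigarray shared across fork. This is my illustration (using the modern Unix.map_file API), not Jon's code: Bigarray data lives outside the OCaml heap, so the GC never traverses it, which sidesteps the heap-disambiguation problem mentioned above, at the cost of being restricted to flat numeric buffers.

```ocaml
(* Sketch: share a flat float buffer between forked processes via a
   memory-mapped Bigarray instead of marshaling OCaml values.
   The file name and sizes are illustrative; real code needs error
   handling and cleanup. *)
let shared_floats n =
  let fd =
    Unix.openfile "/tmp/shared.dat"
      [Unix.O_RDWR; Unix.O_CREAT; Unix.O_TRUNC] 0o644
  in
  (* shared = true: writes in either process are visible to the other. *)
  Bigarray.array1_of_genarray
    (Unix.map_file fd Bigarray.float64 Bigarray.c_layout true [| n |])

let () =
  let a = shared_floats 4 in
  (match Unix.fork () with
   | 0 -> a.{0} <- 42.0; exit 0        (* child writes into shared pages *)
   | _ -> ignore (Unix.wait ()));
  assert (a.{0} = 42.0)                (* parent sees the child's write *)
```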
--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/products/?e
_______________________________________________
>There are alternatives (like referencing shared memory as custom blocks and
>managing deallocation manually) but they are substantially more complicated
ok, i understand now: it would be possible in principle given the language,
but the current implementation makes it difficult.
Unicode by itself, when wider-than-byte encodings are used, adds "zero"
runtime overhead; the only overhead is storage (2 or 4 bytes per character).
Given that storage is cheap, I'd much rather have Unicode support than lack of
it.
How exactly does Unicode support make Java and C# slow?! Jon, I'd have thought
that you knew better than that :( Oh well, we all have bad days.
Cheers, Kuba
Yet, if you look at things in the light of "optimization is depessimization",
you'd much rather have easier to read code, than code which is ugly because
you preoptimized it by hand. This is why, for me, Ocaml has a long way to go
to make it useful for run-of-the-mill production code. My pet peeve is the
performance penalty paid for writing in functional style where it actually
makes sense -- say passing an arithmetic operator to a map-style function.
I wouldn't generalize like that. Where I work we use Qt (a C++ framework) and
can relatively trivially deploy on three major platforms (Linux, Windows and
OS X). It's been a long time since I dealt directly with any Windows
deficiencies per se. Trolls had to deal with them for sure, but they did the
job so that I wouldn't have to. Most cool applications are multi-platform...
I can't comment about the original lack of parallelism issue, but the
platform-bashing comment is just gibberish to me. I'm all for cool, inspiring
platforms, and I use OS X and Linux exclusively at home/school, but Windows
has made some headway in recent years and for me XP is quite reasonable as
long as you don't deal with driver writing. The latter is still better than
in WIN95 DDK days, but the toolset (apart from the F#-based tools) is
obscenely out-of-date, just as it was in 1995. Then there are some regressed
pieces of Linux which are quite a reality check given the whole "we support
old hardware" propaganda (steer clear of SCSI tape drives), so for me
personally I have equally many bash-Windows arguments as I have bash-Linux
ones.
----------------------------------------------------------------------
let n = 1024
let a = Array.make_matrix n n 6.7
let b = Array.make_matrix n n 8.9

(* Result array, stored in shared memory. *)
let c =
  let c = Array.make_matrix n n 0. in
  let fd = Unix.openfile "/tmp/zero" [Unix.O_RDWR; Unix.O_TRUNC; Unix.O_CREAT] 0o644 in
  let md = Ancient.attach fd 0x440000000000n in
  Ancient.follow (Ancient.share md 0 c)

let parmul_aux i0 i1 n a b =
  for i = i0 to i1 - 1 do
    let ai = a.(i) in
    for j = 0 to n - 1 do
      let r = ref 0.0 in
      for k = 0 to n - 1 do
        r := !r +. Array.unsafe_get ai k *.
                   Array.unsafe_get (Array.unsafe_get b k) j
      done;
      c.(i).(j) <- !r
    done;
  done

let parmul n a b =
  (match Unix.fork () with 0 -> parmul_aux 0 (n/2) n a b; exit 0 | _ -> ());
  parmul_aux (n/2) n n a b;
  ignore (Unix.wait ())
;;
parmul n a b
----------------------------------------------------------------------
This is just barely faster than Jon's OCaml version using message
passing (12% faster on my test machine[0]). Which just seems to show
that the overhead of message passing _isn't_ the problem here[1].
Perhaps it's the bounds checking in the assignment back to the matrix?
Anyhow, in real life situations we'd all be using a super-optimized
hand-coded-in-assembly matrix multiplication library (LAPACK?), so
this is all very academic.
Rich.
[0] Quad core Intel hardware:
model name : Intel(R) Core(TM)2 Quad CPU Q9450 @ 2.66GHz
[1] Creation of the result matrix and copying it to shared memory is
almost instantaneous in my tests.
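The message-passing baseline being compared against amounts to forking a worker and marshaling its result back through a pipe. A minimal sketch of that pattern (my names, not the benchmark's actual code; note that Marshal is untyped, so the reader must know the expected type):

```ocaml
(* Fork a child to run [compute], marshal the result back over a pipe,
   and reap the child. The serialization cost grows with the result. *)
let with_worker compute =
  let rd, wr = Unix.pipe () in
  match Unix.fork () with
  | 0 ->
      (* Child: run the computation, send the result, exit. *)
      Unix.close rd;
      let oc = Unix.out_channel_of_descr wr in
      Marshal.to_channel oc (compute ()) [];
      close_out oc;
      exit 0
  | _ ->
      (* Parent: read the marshaled result, then wait for the child. *)
      Unix.close wr;
      let ic = Unix.in_channel_of_descr rd in
      let result = Marshal.from_channel ic in
      close_in ic;
      ignore (Unix.wait ());
      result

let () = assert (with_worker (fun () -> 6 * 7) = 42)
```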
--
Richard Jones
Red Hat
_______________________________________________
I think people forget Jon's background/interests: numerical methods.
Think of working with large amounts of numbers, where the cost of some
operations is on par with copying a few words of memory, and good
non-algorithmic (!) optimizations can increase performance 10-fold (see
Mersenne project, for example). Moreover, a lot of numerical methods are
applied in real-time applications, and there a 50% loss in performance simply
means that the game becomes unplayable, or that the phone connection sounds
like excrement dropping into the hole. For web applications (generally
speaking), a 50% loss of performance means you pay 2x more money for servers
and power. Up to a certain point, this is small compared to the costs of
maintaining the software. Where numerical methods are widely applied, a 50%
performance loss may mean you're out of the market.
I'm no F# fanboy; I use Ocaml exclusively for most of my numerical work (some
FEM, all numerical methods courses I take), but as a language for packaged
application development (bread-and-butter stuff that sells in boxes or via
downloads) it's simply not there yet for me. F# would be more like it, if I
weren't wary of .net platform lock-in. Give Mono a few more years and it
won't be an issue anymore...
For server-side use Ocaml is probably fine, but so is a number of languages
that for me pack way more punch (Lisp, for one). In any event, these days I
see little reason to develop web-facing stuff on non-google platforms (at
least if you're a startup), so I'm in Python lock-in in that area anyway now.
Cheers, Kuba
You cannot degrade memory consumption without also degrading performance.
Moreover, there are hidden costs such as the added complexity in a lexer
which potentially has 256x larger dispatch tables or an extra indirection for
every byte read.
> Given that storage is cheap, I'd much rather have Unicode support than lack
> of it.
Sure. I don't mind unicode being available. I just don't want to have to use
it myself because it is of no benefit to me (or many other people) but is a
significant cost.
--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/products/?e
_______________________________________________
On this machine, the Ancient-based implementation is 30-40% faster than the
message passing OCaml.
The important question is: how many operations must we be doing for
parallelism to be worthwhile? The answer is of fundamental importance.
Plotting the graphs, we find that parallelism is faster in F# for n>32 and in
OCaml for n>150 for this O(n^3) task. Therefore, OCaml is two orders of
magnitude slower than F# at spawning parallel computations. Moreover, to get
this performance from OCaml you had to sacrifice safety.
> Which just seems to show that the overhead of message passing _isn't_ the
> problem here[1].
Well, OCaml currently offers several different approaches to parallelism that
are all extremely inefficient. The parallel GC implementation being funded by
Jane St. will completely remove this performance problem with OCaml in this
context but it will not scale as well as Microsoft's concurrent GC and will
suffer from the same stalling.
This is why I believe that project is of incredible importance.
> Perhaps it's the bounds checking in the assignment back to the matrix?
Actually it was the polymorphic array maps and copies rather than the bounds
checking within them. So I hand rolled optimized float array array copying
code into both of the new parallel OCaml implementations to get the best
possible performance out of OCaml for the above comparison.
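A sketch of the sort of hand-rolled, monomorphic copy being described (my illustration, not Jon's actual benchmark code): by fixing the type to float array array, each row stays an unboxed float array and Array.blit replaces the polymorphic per-element path.

```ocaml
(* Monomorphic deep copy of a float matrix. The type annotation keeps
   rows as unboxed float arrays; Array.blit copies each row wholesale. *)
let copy_matrix (m : float array array) : float array array =
  let n = Array.length m in
  let dst = Array.make n [||] in
  for i = 0 to n - 1 do
    let src = m.(i) in
    let len = Array.length src in
    let row = Array.make len 0.0 in
    Array.blit src 0 row 0 len;
    dst.(i) <- row
  done;
  dst

let () =
  let m = Array.make_matrix 2 2 1.5 in
  let m' = copy_matrix m in
  m'.(0).(0) <- 9.0;              (* mutating the copy... *)
  assert (m.(0).(0) = 1.5)        (* ...leaves the original intact *)
```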
> Anyhow, in real life situations we'd all be using a super-optimized
> hand-coded-in-assembly matrix multiplication library (LAPACK?), so
> this is all very academic.
If you ignore the impedance mismatch between the data structures, the
performance overhead of invoking external functions and all applications that
require different functionality that has not already been written, optimized
and packaged by someone else, yes.
The fact is, this is hugely important to the vast majority of the software
industry and companies are investing massive amounts of time and effort into
parallelizing their software. Developers are flocking to .NET because it
makes parallelism easy and OCaml has the potential to provide these benefits
if the parallel GC project flies.
--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/products/?e
_______________________________________________
This raises the question of whether or not a Mono backend for OCaml might
pay off. My impression is that Mono is going nowhere but I have been known to
be overly skeptical in the past. ;-)
Speaking of which, it would also be interesting to compare the performance of
OCaml with F#/Mono on that matrix-matrix multiply benchmark...
--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/products/?e
_______________________________________________
What do you mean by this? What language would not incur this kind of
performance hit? Is F# able to optimize this out or were you referring to
something else?
>
> Cheers, Kuba
>
> _______________________________________________
> Caml-list mailing list. Subscription management:
> http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
> Archives: http://caml.inria.fr
> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
> Bug reports: http://caml.inria.fr/bin/caml-bugs
>
--
-----------------------------------------------------------------------
(\__/)
(='.'=)This is Bunny. Copy and paste Bunny into your
(")_(")signature to help him gain world domination.
------------------------------------------------------------------------
Let's not start a platform war on this thread. I have *plenty* of gripes
with Linux as well. I was merely digressing, and was sniping at the high
cost of platform specific development. If you want to discuss Windows
programming gripes, Joel Spolsky's forum has plenty, even with regard to the
idea that things have been better lately. ocaml+qt does not count. If
you're using ocaml+qt, then you've basically sidestepped the issues with
writing code for Windows.
For Ocaml: "when inlining higher-order functions with known function
arguments, those known function arguments are not themselves inlined."
http://camltest.inria.fr/pub/old_caml_site/caml-list/1048.html
(This is an old post, if things have changed I would love to be corrected.)
sml can inline such functions, making passing + to a map style function
potentially as efficient as writing a procedural loop.
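A minimal illustration of the cost being discussed (the names are mine, not from the cited post): every element pays an indirect call through the closure, and ocamlopt, at least as of that post, does not specialize the caller for a known argument, so the first version cannot collapse into the second.

```ocaml
(* Higher-order version: indirect call through f on every element. *)
let sum_with f a =
  let s = ref 0 in
  for i = 0 to Array.length a - 1 do
    s := !s + f a.(i)
  done;
  !s

(* The first-order loop the compiler could in principle have produced
   had it inlined the known argument (fun x -> 2 * x). *)
let sum_double a =
  let s = ref 0 in
  for i = 0 to Array.length a - 1 do
    s := !s + 2 * a.(i)
  done;
  !s

let () =
  let a = [| 1; 2; 3 |] in
  assert (sum_with (fun x -> 2 * x) a = 12);
  assert (sum_double a = 12)
```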
I think you are referring to the Ocaml summer project which is to be
done by Emmanuel Chailloux's student.
Till
2008/5/12 Arthur Chan <baguas...@gmail.com>:
--
http://till-varoquaux.blogspot.com/
I've just written my own distributed version. You find my comments and
timings here:
http://blog.camlcity.org/blog/parallelmm.html
The code is here:
https://godirepo.camlcity.org/svn/lib-ocamlnet2/trunk/code/examples/rpc/matrixmult/
In this (very unoptimized) multiplier message passing accounts for ~25%
of the runtime. Even for 2 cores there is already a speedup. 10 cores
(over a network) are about 4 times faster than a single core without
message passing.
Gerd
> Perhaps it's the bounds checking in the assignment back to the matrix?
>
> Anyhow, in real life situations we'd all be using a super-optimized
> hand-coded-in-assembly matrix multiplication library (LAPACK?), so
> this is all very academic.
>
> Rich.
>
> [0] Quad core Intel hardware:
> model name : Intel(R) Core(TM)2 Quad CPU Q9450 @ 2.66GHz
>
> [1] Creation of the result matrix and copying it to shared memory is
> almost instantaneous in my tests.
>
--
------------------------------------------------------------
Gerd Stolpmann * Viktoriastr. 45 * 64293 Darmstadt * Germany
ge...@gerd-stolpmann.de http://www.gerd-stolpmann.de
Phone: +49-6151-153855 Fax: +49-6151-997714
------------------------------------------------------------
For what values of "n"?
--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/products/?e
_______________________________________________
It's in the article. n=1000, 2000, 3000. The "4 times faster" statement
is for n=3000.
Gerd
--
------------------------------------------------------------
Gerd Stolpmann * Viktoriastr. 45 * 64293 Darmstadt * Germany
ge...@gerd-stolpmann.de http://www.gerd-stolpmann.de
Phone: +49-6151-153855 Fax: +49-6151-997714
------------------------------------------------------------
Can you find a more accurate estimate of the threshold value of "n" above
which there is a speedup on 2 cores?
I think that would be very useful.
--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/products/?e
_______________________________________________
:-)
> Well, our algorithms are quite easy to parallelize. I don't see a
> difference in whether they are CPU-bound or disk-bound - we also have
> lots of CPU-bound stuff, and the parallelization strategies are the
> same.
>
> The important thing is whether the algorithm can be formulated in a way
> so that state mutations are rare, or can at least be done in a
> "cache-friendly" way. Such algorithms exist for a lot of problems. I
> don't know which problems you want to solve, but it sounds like as if it
> were special problems. Like for most industries, most of our problems
> are simply "do the same for N objects" where N is very large, and
> sometimes "sort data", also for large N.
Right. Those are embarrassingly parallel problems, which is why you are not
suffering from a lack of fine-grained parallelism as we are. Some problems
require fine-grained parallelism but, on a more general note, we are trying
to push parallelism as deep into our libraries as possible so that users can
benefit from their multicore machines whatever they are doing.
The ability to spawn parallel computations efficiently is of the utmost
importance here. Without it, multicore holds no advantages (e.g. for that
matrix multiply benchmark with n<100).
> > Parallelism is easy in F#.
>
> Wonders must have happened I'm not aware of. How does F# prevent
> deadlocks?
Parallel programming is typically extremely well suited to data parallel
constructs (parallel for and so forth) so deadlocks are not a problem.
However, F# is also unusually good for improving robustness even in the
presence of low-level threading constructs because, just like OCaml, mutation
is contained and typically reserved for performance-critical sections of
code. So the number of locks in a large parallel F# application is tiny
compared to Java/C#, and it is perfectly feasible to manage their potential
interactions (i.e. to avoid deadlocks) by hand.
> > > This is a quite theoretical statement. We will rather see that most
> > > application programmers will not learn parallelism at all, and that
> > > consumers will start question the sense of multicores, and the chip
> > > industry will search for alternatives.
> >
> > On the contrary, that is not a theoretical statement at all: it already
> > happened. F# already makes it much easier to write high performance
> > parallel algorithms and its concurrent GC is the crux of that capability.
>
> Don't misunderstand me, I'm not anti-F#.
This will all apply to OCaml when it gets a parallel GC this summer. :-)
> I only have no interests right
> now in taking advantage of multicores by concurrent GC's. I rather want
> to have an ultra-fast single-core execution. I can do the
> parallelization myself.
Right. Our customers are loving parallelism right now and want to make the
most of their multicore machines today. This is pushing us to make everything
as multicore friendly as possible.
It's not so much about the language as about the implementation. IIRC some Lisps
can do this kind of an optimization, gcc could do it as well -- whenever
the value of the argument to a function is known, a potentially
call-site-specific version of the function can be generated, and in such cases
it'd be much simpler, emitted-code-wise, than the version which explicitly
emits the arguments and calls the operator.
In a typical programming language which only accepts ASCII characters outside
of string constants, your dispatch table will be short anyway (covers ASCII
subset only), and there will be an extra comparison or two, active only when
lexing strings. So no biggie.
> > Given that storage is cheap, I'd much rather have Unicode support than
> > lack of it.
>
> Sure. I don't mind unicode being available. I just don't want to have to
> use it myself because it is of no benefit to me (or many other people) but
> is a significant cost.
Let's look at a relatively widely deployed example: Qt toolkit.
Qt uses a 16 bit Unicode representation, and I really doubt that there are any
runtime-measurable costs associated with it. By "runtime measurable" I mean
that, say, application startup would take longer. A typical Qt application
will do quite a bit of string manipulation on startup (even file names
are stored in Unicode and converted to/from OS's code page), and they have
slashed startup time by half on "major" applications, between Qt 3 and Qt 4,
by doing algorithmic-style optimizations unrelated to strings (reducing number
of malloc's, for one). So, unless you can show that one of your applications
actually runs faster when you use non-Unicode strings as compared to well
implemented Unicode ones, I will not really consider Unicode to be a problem.
I do agree that many tools, like lexer generators, may not be Unicode-aware or
have poorly implemented Unicode awareness. The 256x lexer table blowup
shouldn't happen even if you were implementing APL with fully Unicode-aware
lexer. The 1st level lexer table should be split into two pieces (ASCII and
APL ranges), and everything else is either an error or goes opaquely into
string constants.
A lexer jump table only makes sense when it actually saves time compared to
a bunch of compare-and-jumps. On modern architectures some jump lookup tables
may actually be slower than compare-and-jumps, because some hardware
optimizations done by CPU (say branch prediction) may simply ignore branch
lookup tables, or only handle tables commonly generated by compilers...
Cheers, Kuba
Okay, I was going to let this slide, but it kept resurfacing and annoying me.
Is there any empirical support for the assertion that Java and C# are slow because of *unicode*? Of
all things, *unicode*? The fact that they're bytecode languages isn't a bigger hit? At least with
the JVM, the hypercomplicated GC should probably take some of the blame, too -- I've seen 2x speed
increases by *reducing* the space available to the GC, and 10x speed increases by boosting the space
available to ridiculous levels so that the full GC barely ever has to fire. Then the nigh-universal
optimization-ruining mutable data and virtual function (i.e. method) dispatch surely doesn't help
either. And this is to say nothing of user-space problems like the explosion of nontrivial types
associated with the object-driven style. With all that going on, you're blaming their *Unicode
support* for why they're slow? "This is why languages like Java and C# are so slow." Really? Got
evidence for that?
~~ Robert.
>Is there any empirical support for the assertion that Java and C# are slow
>because of *unicode*?
>
>~~ Robert.
The problem, as I understand it, is in writing parsers. Your standard
finite-automata-based regular expression library or lexical analyzer is
based, at its heart, on a table lookup- you have a 2D array whose size
is the number of input characters times the number of states. For ASCII
input, the number of possible input characters is small- 256 at most.
256 input characters times hundreds of states isn't that big of a table-
we're looking at sizes in 10's of K- easily handleable even in the bad
old days of 64K segments. Even going to UTF-16 ups the number of input
characters from 256 to 65,536- and now a moderately large state machine
(hundreds of states) weighs in at tens of megabytes of table space.
And, of course, if you try to handle the entire 31-bit Unicode code
point space, welcome to really large tables :-).
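The flat-table core being described can be sketched as next_state = table.(state).(char). A toy recognizer for runs of ASCII digits (my example) shows why the table is states x alphabet-size: here it is 3 x 256 entries, but with a UTF-16 alphabet the same machine would need 3 x 65,536.

```ocaml
(* Table-driven DFA over a byte alphabet.
   States: 0 = start, 1 = in-number (accepting), 2 = reject. *)
let alphabet = 256
let nstates = 3

let table =
  let t = Array.make_matrix nstates alphabet 2 in  (* default: reject *)
  for c = Char.code '0' to Char.code '9' do
    t.(0).(c) <- 1;          (* first digit enters the number *)
    t.(1).(c) <- 1           (* further digits stay in it *)
  done;
  t

(* Run the machine: one table lookup per input byte. *)
let accepts (s : string) =
  let state = ref 0 in
  String.iter (fun ch -> state := table.(!state).(Char.code ch)) s;
  !state = 1

let () =
  assert (accepts "42");
  assert (not (accepts "4x"))
```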
The solution, I think, is to change the implementation of your finite
automata to use some data structure smarter than a flat 2D array, but
that's me.
Brian
A slightly more involved analysis is probably in order, so let's ask Wikipedia for some more light.
http://en.wikipedia.org/wiki/UTF-8#Rationale_behind_UTF-8.27s_design
As Kuba pointed out, the high bit is 0 on any ASCII characters, and the significant bits of a
multi-byte sequence determine the length of the sequence. There are also a few large classes of bit
sequences which are simply not allowed. Now, I'm not an expert in writing parsers, but these
qualities certainly sound like nice optimization hooks.
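The property cited from the UTF-8 design can be made concrete (my sketch): the length of a sequence is determined by its first byte alone, and ASCII bytes always have the high bit clear, so a byte-oriented scanner never confuses ASCII with multi-byte data.

```ocaml
(* Classify a UTF-8 lead byte by its high bits.
   Returns the sequence length, or 0 for a continuation/invalid byte. *)
let utf8_seq_len b =
  if b < 0x80 then 1                  (* 0xxxxxxx: ASCII *)
  else if b land 0xE0 = 0xC0 then 2   (* 110xxxxx *)
  else if b land 0xF0 = 0xE0 then 3   (* 1110xxxx *)
  else if b land 0xF8 = 0xF0 then 4   (* 11110xxx *)
  else 0                              (* 10xxxxxx: continuation, or invalid *)

let () =
  assert (utf8_seq_len (Char.code 'a') = 1);
  assert (utf8_seq_len 0xC3 = 2);     (* e.g. lead byte of U+00E9 *)
  assert (utf8_seq_len 0xE2 = 3);
  assert (utf8_seq_len 0x80 = 0)
```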
Getting back to the original question, though -- is there any evidence that Java/C# are slow because
of unicode support, and not because of other aspects of the languages? Because that assertion seems
flat-out bogus to me.
Note that we have such a lexer already: ulex. It uses binary decision
trees, AFAIK. The resulting code has moderate size. It can take some
time, however, until ulex has converted the NFA to a DFA.
Gerd
--
------------------------------------------------------------
Gerd Stolpmann * Viktoriastr. 45 * 64293 Darmstadt * Germany
ge...@gerd-stolpmann.de http://www.gerd-stolpmann.de
Phone: +49-6151-153855 Fax: +49-6151-997714
------------------------------------------------------------
> Getting back to the original question, though -- is there any evidence that Java/C# are slow because
> of unicode support, and not because of other aspects of the languages? Because that assertion seems
> flat-out bogus to me.
I do not think the JVM is especially slow in practice. However, one potential source of
slowness could be, in some particular cases, the conversions between the internal 16-bit
char-array string representation and UTF-8 when using native code. Similarly, Java strings being immutable,
in-place modification of strings is not possible from native code, so a lot of bindings to C libraries
end up duplicating strings a lot (see e.g. PCRE).
This is why the NIO API exposing mutable and/or externally allocated buffers was introduced in the JVM,
but it remains hard to use.
However it is true that regexes on UTF8 can be quite slow. Compare (on Linux):
/udir/durak> dd if=/dev/urandom bs=10M count=1 of=/dev/shm/z
/udir/durak> time LANG=en_US.UTF-8 grep -c "^[a-z]*$" /dev/shm/z
2.31s user 0.01s system 99% cpu 2.320 total
/udir/durak> time LANG=C grep -c "^[a-z]*$" /dev/shm/z
0.04s user 0.01s system 98% cpu 0.048 total
Lesson 1: when lexing, do not read unicode chars one at a time. Pre-process your regular expression
according to your input encoding, instead.
That being said, I think strings should be represented as they are today, and that the core
Ocaml libraries do not have much business dealing with UTF8. We seldom need letter-indexed
random access to strings.
However, the time is ripe for throwing out old 8-bit charsets such as ISO-8859-x (a.k.a Latin-y)
and whatnot. This simplifies considerably lesson 1: it's either ASCII or UTF8 Unicode. I think
the Ocaml lexer should simply treat any byte with its high bit set as a lowercase letter.
--
Berke DURAK
This is just silly.
This table can usually be trivially degenerated into hardcoded
compare-and-jumps; for most lexers I've dealt with this will astronomically
reduce the size and increase the speed.
The fact that the lexer generator you're using is dumb doesn't help, but
please don't assume that this dumbassedness is somehow a universal constant
and given upon us by $DEITY.
Cheers, Kuba
By "slow code" you could have meant two things:
1. Table lookup globally replaced by compares-and-jumps. The latter benefit
from branch prediction and speculative execution. So it's not slow anymore.
2. Table "compression" used, where a few compare-and-jumps remove
huge "unused" swaths of the table. By "unused" I meant "bomb out with an
internal error".
I think you're being silly. Stop it.
Cheers, Kuba
Availability of source code enables that, but is not a guarantee that a fix
will be forthcoming or economical. Gcc codebase is all for us to see, yet it
would require either a genius or lots of time for the ordinary ones among us
to get up to speed to work with it in general. I've attempted it 2-3 times, and
I gave up after a while (just wrapping your mind around gas's borkedness can
be revolting), even though I have no problem understanding most of the
concepts involved; I maintain a proprietary, half-assed, just-good-enough
implementation of a nonconforming Lisp which produces MCU (eZ8 and 12 bit
pic) assembly on par with what I can write myself, mostly. But it's written
in Lisp too, and while I could probably port it to C, I could never develop
it in C (it'd degenerate in a way which makes gcc code look stellar). So even
if you do have knowledge in the field, but no first-hand exposure to
braindamage involved with writing (and maintaining) a compiler of any sort in
a low-level language like C, you might as well have no access to the source
code -- it won't help much beyond simple recompilation or minor patches
needed to have the code compile on a newer revision of the host platform (say
newer Linux distro).