http://enfranchisedmind.com/blog/2008/05/07/why-ocaml-sucks/
I think it is interesting to discuss which aspects of OCaml can be improved
upon and how, but I disagree with some of his points. I'll address each of the
original points in turn:
1. Lack of Parallelism: Yes, this is already a complete show stopper.
Exploiting multicores requires a scalable concurrent GC and message passing
(like JoCaml) is not a substitute. Unfortunately, this is now true of all
functional languages available for Linux, which is why we have now migrated
entirely to Windows and F#. I find it particularly ironic that the Haskell
community keeps hyping the multicore capabilities of pure code when the
rudimentary GC in Haskell's only usable implementation has already stopped
scaling.
2. Printf: I like OCaml's printf. So much, in fact, that I wish it were in
Pervasives (as it is in F#) so I didn't have to do "open Printf" all the time
in OCaml. While there are theoretically-elegant functional equivalents they
all suck in practical terms, primarily due to hideous error messages. I think
printf is one of the reasons OCaml dominates over languages like Haskell and
SML. Easy hash tables are another.
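Both conveniences are a handful of lines in practice; a minimal sketch (the table contents are made up for illustration):

```ocaml
(* Typed format strings and the polymorphic Hashtbl, straight from the
   standard library; only Printf needs qualifying (or an "open"). *)
let tbl = Hashtbl.create 16

let () = Hashtbl.add tbl "OCaml" 1996

let line =
  Printf.sprintf "%s first appeared in %d" "OCaml" (Hashtbl.find tbl "OCaml")

let () = print_endline line
```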
3. Lack of multi-file modules: I have never found this to be a problem. Nor do
I find filenames implying module names to be a problem, as many SML advocates
seem to believe (yes, both of them ;-).
4. Mutable data: I believe the exact opposite. The ability to drop down to
mutable data structures for performance without leaving the language is
essential and the ability to predict memory consumption is essential, both of
which Haskell lacks. Consequently, Haskell's inability to handle mutation
efficiently and safely has doomed it to failure for practical applications.
5. Strings: pushing unicode throughout a general purpose language is a
mistake, IMHO. This is why languages like Java and C# are so slow.
6. Shift-reduce conflicts: although there are aspects of OCaml's syntax that I
would like to tweak (e.g. adding an optional "end" after a "match"
or "function" to make them easier to nest), I am not bothered by the
shift-reduce conflicts. Mainstream languages get by with far more serious
syntactic issues (like <<...>> in C++).
7. Not_found: I like this, and Exit and Invalid_argument. Brian's point that
the name of this exception does not convey its source is fallacious: that's
what exception traces are for.
8. Exceptions: I love OCaml's extremely fast exception handling (6x faster
than C++, 30x faster than Java and 600x faster than C#/F#!). I hate
the "exceptions are for exceptional circumstances" line promoted by the
advocates of any language implementation with cripplingly-slow exception
handlers. I really miss fast exception handling in F#. Brian gives an example
of exception handling with recursive IO functions failing to be tail
recursive here and advocates option types. But recursion is the wrong tool
for the job here and option types are even worse. You should use mutation
and, failing that, CPS.
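For the I/O loop in question, the idiomatic shape Jon is alluding to looks something like this (a sketch; the function name is mine, the channel API is the standard library's):

```ocaml
(* Read every line of a channel using a mutable accumulator and the
   End_of_file exception as the loop terminator: constant stack space,
   no non-tail recursion, no option wrapping on the hot path. *)
let read_lines ic =
  let lines = ref [] in
  (try
     while true do
       lines := input_line ic :: !lines
     done
   with End_of_file -> ());
  List.rev !lines
```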
9. Deforestation: Brian says "Haskell has introduced a very interesting and
(to my knowledge) unique layer of optimization, called deforrestation". True,
of course, but useless theoretical piffle because we know that Haskell is
slow in practice and prohibitively difficult to optimize to boot. Deforesting
is really easy to do by hand.
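The manual rewrite Jon means is the classic map-fusion, e.g.:

```ocaml
let f x = x + 1
let g x = x * 2

(* The naive composition builds an intermediate list... *)
let slow xs = List.map f (List.map g xs)

(* ...manual deforestation fuses it into a single traversal. *)
let fast xs = List.map (fun x -> f (g x)) xs
```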
10. Limited standard library: I agree but this is only an issue because we are
not able to fix the problem by contributing to the OCaml distribution.
11. Slow lazy: I had never noticed.
The only major gripe that I have with OCaml is lack of a concurrent GC. I
think this general deficit is going to have a massive adverse impact on the
whole of Linux and lots of people will migrate to Windows and .NET when they
see how much faster their code can run.
I have other wish-list items of my own to add:
. JIT compilation for metaprogramming.
. Type specialization.
. Unboxed types (structs).
. No 16Mb limit.
. Inlining.
. Custom per-type functions for comparison, equality and hashing.
. An intermediate representation that I can sell software in to earn a living.
. Pattern matching over lazy values.
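On the comparison/equality/hashing item: the functorial interfaces already allow this per type, at some syntactic cost; the wish is presumably for the polymorphic `compare`, `(=)` and `Hashtbl.hash` to be overridable directly. A sketch of what works today (the module names are mine):

```ocaml
(* A hash table keyed case-insensitively, via the Hashtbl.Make functor:
   custom equality and hashing are supplied per key type. *)
module CaseFold = struct
  type t = string
  let norm = String.lowercase_ascii  (* String.lowercase before OCaml 4.03 *)
  let equal a b = norm a = norm b
  let hash s = Hashtbl.hash (norm s)
end

module CITbl = Hashtbl.Make (CaseFold)
```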
I believe these can be fixed by creating a new open source functional language
for Linux based upon LLVM. However, the lack of a suitable GC is a complete
show stopper. The JVM is the only thing that comes close and it is unable to
support tail calls without a catastrophic performance cost, i.e. so bad that
you might as well write an interpreter.
--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/products/?e
_______________________________________________
Caml-list mailing list. Subscription management:
http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
Archives: http://caml.inria.fr
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
Bug reports: http://caml.inria.fr/bin/caml-bugs
I would add a 3a: the inability to have a "root"-level functor. As it
stands, all functors are (sometimes naturally, sometimes not) embedded
in a "root" module. I'm looking at you, Map.Make. You too, Set.Make.
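For readers following along, the pattern being complained about (a functor only reachable inside its root module) looks like:

```ocaml
(* Map.Make lives inside the root module Map; you must apply the
   functor to obtain a usable map type for your key. *)
module StringMap = Map.Make (String)

let m = StringMap.(empty |> add "answer" 42)
```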
> 4. Mutable data: I believe the exact opposite. The ability to drop down to
> mutable data structures for performance without leaving the language is
> essential and the ability to predict memory consumption is essential, both of
> which Haskell lacks. Consequently, Haskell's inability to handle mutation
> efficiently and safely has doomed it to failure for practical applications.
Well, in fairness, there's always the ST monad. However, the rank-2
polymorphism involved is nontrivial, and I agree with you [Jon].
> 5. Strings: pushing unicode throughout a general purpose language is a
> mistake, IMHO. This is why languages like Java and C# are so slow.
This is simply ridiculous. Using heavy-weight unicode-aware functions
for character operations may slow down string-intensive operations in
those languages, but the only alternative is to be broken. See items 3
and 4 here:
http://www.moserware.com/2008/02/does-your-code-pass-turkey-test.html
In both cases, we need locale-aware character processing. That means
Unicode these days. Unless you code your own routines for processing
every 8-bit character set out there. I don't.
> 6. Shift-reduce conflicts: although there are aspects of OCaml's syntax that I
> would like to tweak (e.g. adding an optional "end" after a "match"
> or "function" to make them easier to nest), I am not bothered by the
> shift-reduce conflicts. Mainstream languages get by with far more serious
> syntactic issues (like <<...>> in C++).
I've been bitten by exactly these sorts of problems. Sometimes I get an
obscure type error, sometimes I don't. I've just gotten used to not
placing a match-like construct in the middle of a sequence expression
when I can avoid it.
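The workaround, for the record, is to delimit the inner match explicitly, since otherwise the trailing cases attach to it (a minimal sketch):

```ocaml
(* Without begin...end the final case would belong to the inner match,
   silently changing the meaning (or producing an obscure type error). *)
let describe x y =
  match x with
  | 0 ->
      begin match y with
      | 0 -> "both zero"
      | _ -> "only x is zero"
      end
  | _ -> "x is nonzero"
```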
> 8. Exceptions: I love OCaml's extremely fast exception handling (6x faster
> than C++, 30x faster than Java and 600x faster than C#/F#!). I hate
> the "exceptions are for exceptional circumstances" line promoted by the
> advocates of any language implementation with cripplingly-slow exception
> handlers. I really miss fast exception handling in F#. Brian gives an example
> of exception handling with recursive IO functions failing to be tail
> recursive here and advocates option types. But recursion is the wrong tool
> for the job here and option types are even worse. You should use mutation
> and, failing that, CPS.
I suspect his reaction to exceptions comes from unfamiliarity with
their use in OCaml. In C++/Java land, the gospel is to use them only
for exceptional circumstances. In the case of C++, this is probably
because of the difficulty of writing exception-safe code: if you're
going to throw an exception, your code is likely to be broken as a
result, so exceptions end up reserved for situations where normal
processing must terminate anyway.
> 9. Deforestation: Brian says "Haskell has introduced a very interesting and
> (to my knowledge) unique layer of optimization, called deforrestation". True,
> of course, but useless theoretical piffle because we know that Haskell is
> slow in practice and prohibitively difficult to optimize to-boot. Deforesting
> is really easy to do by hand.
Yes, it's easy to do by hand. It's also time consuming. Don't you make a
living off OCaml? I'm surprised you can justify using your time on such a
mechanical task. Granted, deforesting something like map f . map g into
map (f . g) is trivial, but we can come up with both trivial and
nontrivial examples for almost anything.
Cheers,
Matt
>
> Brian Hurt recently published the following blog post "Why OCaml sucks":
>
> http://enfranchisedmind.com/blog/2008/05/07/why-ocaml-sucks/
>
> I think it is interesting to discuss which aspects of OCaml can be improved
> upon and how but I disagree with some of his points. I'll address each of
> the
> original points in turn:
>
> 1. Lack of Parallelism: Yes, this is already a complete show stopper.
> Exploiting multicores requires a scalable concurrent GC and message passing
> (like JoCaml) is not a substitute. Unfortunately, this is now true of all
> functional languages available for Linux, which is why we have now migrated
> entirely to Windows and F#. I find it particularly ironic that the Haskell
> community keep hyping the multicore capabilities of pure code when the
> rudimentary GC in Haskell's only usable implementation already stopped
> scaling.
Fork? For something like a raytracer, I do not see how threads would be any
more useful than fork. For the UI, the native threads suffice. For
windows... well f*ck windows. When was the last time you heard of a cool
new windows app anyway? MSFT has killed their own platform with bad
engineering decisions.
That said... I guess you could say we should support native multithreading,
just because there are apparently rare niche applications (I'm told
webservers are one), but I suspect that fork may still be better, and
there are other things that would be nicer, like full support for recursive
modules/functors. There's also a nasty bug involving the typechecker with
labeled arguments and classes. How do you file a bug against OCaml anyway?
>
>
> 2. Printf: I like OCaml's printf. So much, in fact, that I wish it were in
> Pervasives (as it is in F#) so I didn't have to do "open Printf" all the
> time
> in OCaml. While there are theoretically-elegant functional equivalents they
> all suck in practical terms, primarily due to hideous error messages. I
> think
> printf is one of the reasons OCaml dominates over languages like Haskell
> and
> SML. Easy hash tables are another.
Easy hash tables would indeed be nice. I think that the Python syntax for
printf might be better, but I'm uncertain on that.
>
>
> 3. Lack of multi-file modules: I have never found this to be a problem. Nor
> do
> I find filenames implying module names to be a problem, as many SML
> advocates
> seem to believe (yes, both of them ;-).
Why in god's name would you have modules spread across multiple files?
That's like having a 1000-line function. This is why I want recursive
modules, btw. Not only do I want them recursive, I would like them to be
recursive across files, but I'm guessing that one would be a long shot.
That having been said, I've always managed to work
around it elegantly in the end. The sort of situations that call for a
truly humongous module will often tend to favor OO style programming anyway.
>
>
> 4. Mutable data: I believe the exact opposite. The ability to drop down to
> mutable data structures for performance without leaving the language is
> essential and the ability to predict memory consumption is essential, both
> of
> which Haskell lacks. Consequently, Haskell's inability to handle mutation
> efficiently and safely has doomed it to failure for practical
> applications.
>
I freaking hate mutable data. I couldn't care less about performance.
Excessive use of mutable data makes code difficult to understand, and it is
why C code is so hard to understand. Mind you, I do C for a living and have
been writing code in it for nearly 10 years. While that doesn't make me a
master, I am not a mere n00b either. Making everything mutable means you
have to keep track of every variable in a function or class. It's not so
bad if you're looking at good code, but not all the programmers out there
are good.
And it is occasionally strangely difficult to write good code when your only
tools are mutable single variables (no tuples), pointers and function calls.
>
> 5. Strings: pushing unicode throughout a general purpose language is a
> mistake, IMHO. This is why languages like Java and C# are so slow.
This is actually a serious letdown. There should be built-in support for
Unicode in OCaml. If not, then there should at least be some maintained
"distro" for OCaml where Camomile is packaged in. Indeed, I would go so far
as to say that the Linux distro maintainers should do their bit and make the
OCaml metapackages install Camomile or some other Unicode package. I do not
use it myself, but Unicode is terribly important for businesses.
>
>
> 6. Shift-reduce conflicts: although there are aspects of OCaml's syntax that
> I would like to tweak (e.g. adding an optional "end" after a "match"
> or "function" to make them easier to nest), I am not bothered by the
> shift-reduce conflicts. Mainstream languages get by with far more serious
> syntactic issues (like <<...>> in C++).
they should use parsers that aren't brain dead
>
> 7. Not_found: I like this, and Exit and Invalid_argument. Brian's point
> that
> the name of this exception does not convey its source is fallacious: that's
> what exception traces are for.
I actually find it a bit annoying, but it's not serious enough for me to
complain about it normally. I would have preferred that the file open
functions return a Haskell-style Either of the file or the error itself.
Say, why don't we have an either datatype in ocaml?
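Defining one is a two-line variant declaration (modern OCaml later standardized exactly this shape as the `result` type); a sketch, with `open_file` as a hypothetical wrapper of my own:

```ocaml
(* A Haskell-style Either as a plain variant. *)
type ('a, 'e) either = Ok of 'a | Error of 'e

(* Wrap the exception-raising opener into an either-returning one. *)
let open_file name =
  try Ok (open_in name)
  with Sys_error msg -> Error msg
```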
>
>
> 8. Exceptions: I love OCaml's extremely fast exception handling (6x faster
> than C++, 30x faster than Java and 600x faster than C#/F#!). I hate
> the "exceptions are for exceptional circumstances" line promoted by the
> advocates of any language implementation with cripplingly-slow exception
> handlers. I really miss fast exception handling in F#. Brian gives an
> example
> of exception handling with recursive IO functions failing to be tail
> recursive here and advocates option types. But recursion is the wrong tool
> for the job here and option types are even worse. You should use mutation
> and, failing that, CPS.
::blink:: exceptions? you guys use those things? xD
I dunno what that guy is bitching about. The code for exception handling
looks quite nearly like what the code would be if you were to do a match
blah with None -> () | Some x -> f x, which is what you'd probably replace
the exception-ish code with anyway.
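The equivalence being described, side by side (a sketch; the function names are mine):

```ocaml
(* Exception style wrapped into option style: the caller's match is
   nearly identical to the try...with it replaces. *)
let lookup tbl k =
  try Some (Hashtbl.find tbl k) with Not_found -> None

let greet tbl k =
  match lookup tbl k with
  | None -> ()
  | Some name -> print_endline ("hello " ^ name)
```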
> I have other wish-list items of my own to add:
>
> . JIT compilation for metaprogramming.
> . Type specialization.
> . Unboxed types (structs).
> . No 16Mb limit.
What do you mean by 16mb limit?
>
> . Inlining.
isn't it best for the compiler to handle that? I wouldn't mind hearing
another perspective on this, but I thought that compilers were smarter these
days.
>
> . Custom per-type functions for comparison, equality and hashing.
> . An intermediate representation that I can sell software in to earn a
> living.
> . Pattern matching over lazy values.
>
> I believe these can be fixed by creating a new open source functional
> language
> for Linux based upon LLVM. However, the lack of a suitable GC is a complete
> show stopper. The JVM is the only thing that comes close and it is unable
> to
> support tail calls without a catastrophic performance cost, i.e. so bad
> that
> you might as well write an interpreter.
--
-----------------------------------------------------------------------
(\__/)
(='.'=)This is Bunny. Copy and paste Bunny into your
(")_(")signature to help him gain world domination.
------------------------------------------------------------------------
There are two problems with that:
1. You go back to manual memory management between parallel threads/processes.
2. Parallelism is for performance, and performance requires mutable data
structures.
Then you almost always end up copying data unnecessarily because you cannot
collect it otherwise, which increases memory consumption and massively
degrades performance, which in turn completely undermines the original point
of parallelism. The cost of interthread communication is then so high in
OCaml that you will rarely be able to obtain any performance improvement for
the number of cores desktop machines are going to see over the next ten
years, by which time OCaml will be 10-100x slower than the competition.
> When was the last time you heard of a cool new windows app anyway?
The last time we released a product. :-)
> > . No 16Mb limit.
>
> What do you mean by 16mb limit?
OCaml's strings and arrays are limited to 16Mb in 32-bit.
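The limits come from the runtime's block-header representation, and the runtime exposes them directly (on 32-bit, `Sys.max_string_length` is about 16 MB; on 64-bit it is astronomically large):

```ocaml
(* Query the runtime's own limits on string and array sizes. *)
let () =
  Printf.printf "max string length: %d\n" Sys.max_string_length;
  Printf.printf "max array length:  %d\n" Sys.max_array_length
```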
> > . Inlining.
>
> isn't it best for the compiler to handle that? I wouldn't mind hearing
> another perspective on this, but I thought that compilers were smarter
> these days.
Definitely not. Compilers uniformly suck at inlining. For example, aggressive
inlining is often beneficial in numerical code and often damaging in symbolic
code. Compilers cannot tell the difference.
This is very similar to "unboxed data structures are always better", which
also isn't generally true.
I've got more gripes to add:
Missing types, like float32 and int16.
DLLs.
Your definition of "broken" is broken.
> See items 3 and 4 here:
>
> http://www.moserware.com/2008/02/does-your-code-pass-turkey-test.html
>
> In both cases, we need locale-aware character processing. That means
> Unicode these days. Unless you code your own routines for processing
> every 8-bit character set out there. I don't.
That is not a justification for pushing Unicode throughout a programming
language. You can support Turkish and Arabic without degrading the
performance of my software.
For us, this is about two things:
1. Profit.
2. Performance.
Our sales already correlate with the world distribution of wealth even though
we only support English. There is literally no value in us supporting other
languages. So we have no need of unicode.
What you call "Using heavy-weight unicode-aware functions for character
operations" refers to all lexing and parsing. OCaml provides incredibly fast
lexers and parsers and our customers make heavy use of that. Other languages
do not. This is a major advantage of OCaml, IMHO. So imposing unicode upon us
is not only unnecessary but is actually damaging.
> > 8. Exceptions: I love OCaml's extremely fast exception handling (6x
> > faster than C++, 30x faster than Java and 600x faster than C#/F#!). I
> > hate the "exceptions are for exceptional circumstances" line promoted by
> > the advocates of any language implementation with cripplingly-slow
> > exception handlers. I really miss fast exception handling in F#. Brian
> > gives an example of exception handling with recursive IO functions
> > failing to be tail recursive here and advocates option types. But
> > recursion is the wrong tool for the job here and option types are even
> > worse. You should use mutation and, failing that, CPS.
>
> I suspect his reaction to exceptions comes from an unfamiliarity with
> their uses in Ocaml.
Except Brian has been using OCaml for many years.
> > 9. Deforestation: Brian says "Haskell has introduced a very interesting
> > and (to my knowledge) unique layer of optimization, called
> > deforrestation". True, of course, but useless theoretical piffle because
> > we know that Haskell is slow in practice and prohibitively difficult to
> > optimize to-boot. Deforesting is really easy to do by hand.
>
> Yes, its easy to do by hand. It's also time consuming.
Do you spend >1% of your time deforesting?
> Don't you make a living off ocaml?
I use OCaml professionally in industry but our OCaml-based sales have declined
since Q3 2007 to only £6k p.a. now.
> I'm surprised you can justify using your time on such a mechanical task.
By my estimates, <<1% of my time is spent deforesting. It is irrelevant, just
like all of the other "benefits" of Haskell that have debilitating hidden
costs.
> Granted, deforesting something like map f . map g into
> map (f . g) is trivial, but we can come up with both trivial and
> nontrivial examples for almost anything.
Predictable performance and memory consumption are infinitely more valuable
than the odd trivial rewrite.
Incidentally, I forgot a couple of things for my wish list:
Overloading.
IDE-friendly, e.g. Intellisense for GUI code.
2008/5/9 Jon Harrop <j...@ffconsultancy.com>:
>
> I believe these can be fixed by creating a new open source functional
> language
> for Linux based upon LLVM. However, the lack of a suitable GC is a complete
> show stopper. The JVM is the only thing that comes close and it is unable
> to
> support tail calls without a catastrophic performance cost, i.e. so bad
> that
> you might as well write an interpreter.
>
I completely agree with you! I had this idea a long time ago, but I was put
off by the enormous code-base one would have to write to make even a simple
usable implementation. However, if there was a team gathered that would
create, let's say, a "next generation ML" on top of LLVM, count me in.
As far as multithreading is concerned, I've gathered some interesting
ideas... Primarily, soft threads are a great thing to have, and we would
have to look no further than our colleague's Felix language implementation,
which supports amazingly fast green threads. For the GC, we can turn to our
"father", Xavier Leroy, and his paper about the parallel GC for OCaml Light
[1]. I'm sure that since that paper was written there have been some
improvements and new ideas in GC for multithreading, so by doing enough
research and combining the best ideas, we could easily pull it off.
A great thing would be to support a "pluggable GC", i.e. the programmer
would have the option to select the GC that best suits his program (for
example, a Metronome-like GC [2] for real-time programs).
[1] http://citeseer.ist.psu.edu/doligez93concurrent.html
[2] http://citeseer.ist.psu.edu/bacon03metronome.html
My $100, Tom
Regards
Elliott
> . JIT compilation for metaprogramming.
> . Type specialization.
> . Unboxed types (structs).
> . No 16Mb limit.
> . Inlining.
> . Custom per-type functions for comparison, equality and hashing.
> . An intermediate representation that I can sell software in to earn a living.
> . Pattern matching over lazy values.
>
> I believe these can be fixed by creating a new open source functional language
> for Linux based upon LLVM. However, the lack of a suitable GC is a complete
> show stopper. The JVM is the only thing that comes close and it is unable to
> support tail calls without a catastrophic performance cost, i.e. so bad that
> you might as well write an interpreter.
>
>
Why can't you just fork off enough processes to cover each core?
> 5. Strings: pushing unicode throughout a general purpose language is a
> mistake, IMHO. This is why languages like Java and C# are so slow.
I wrote an entire wiki in OCaml; it's UTF-8 clean throughout and
demonstrably handles German, French, Italian, Japanese, Korean and
Chinese (http://www.ocaml-tutorial.org/). I wrote another Japanese
website in OCaml. To do it, I just treated strings as UTF-8. The
only UTF-8-specific code in the whole thing was tiny, I think a
UTF-8-sensitive string truncation function, and a little bit of
collation (mostly handled by the database, but could easily have been
done using Camomile). My conclusion: UTF-8 is easy.
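The kind of tiny UTF-8-aware helper Rich mentions might look like this (a hypothetical sketch, not his actual code): truncation must not cut a multi-byte sequence, and UTF-8 continuation bytes are recognizable by their `10xxxxxx` bit pattern.

```ocaml
(* Truncate a UTF-8 string to at most [n] bytes without splitting a
   multi-byte character: back up past any continuation bytes. *)
let utf8_truncate s n =
  if String.length s <= n then s
  else begin
    let i = ref n in
    while !i > 0 && Char.code s.[!i] land 0xC0 = 0x80 do decr i done;
    String.sub s 0 !i
  end
```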
Rich.
--
Richard Jones
Red Hat
My very own set of gripes includes:
_ Not being able to include a module defining a type t in another module
defining a type t (there was a patch for this floating around on the
mailing list that allowed one to `hide` part of a module). And since we
are talking about killer module extensions: mixins and Russo-style
modules have also been tested as extensions to the module system...
_ Paying a very high performance price when programming with a high
level of abstraction (combinator libraries, functors...). MLton fixes
this at the expense of being a slow compiler.
_ HM(x) takes no hints, and therefore dancing around recursive
polymorphism issues and value restrictions with records and eta
expansion is a harassing dance that has had the best of me on
numerous occasions. But let's not despair: there might be light at the
end of the tunnel. MLF is our knight in shining armor here.
The combination of these three is a big brick wall I've slammed into often
when trying to program at a high level of abstraction. They all have
known solutions (except maybe the second). I do feel privileged that these are
even on my radar; the complaints I have to voice against, say, Java
are so down to earth compared to this....
Till
--
http://till-varoquaux.blogspot.com/
> 2. Printf: I think
> printf is one of the reasons OCaml dominates over languages like Haskell and
> SML.
I'm not sure about "dominate", but yes, it's definitely one reason why I
code in OCaml rather than Haskell.
> 5. Strings: pushing unicode throughout a general purpose language is a
> mistake, IMHO. This is why languages like Java and C# are so slow.
We have a good Unicode library (see Camomile), good ropes libraries (see Community Caml)
and someone is bound to write a syntax extension to provide a natural syntax for Rope
literal constants.
> 6. Shift-reduce conflicts: although there are aspects of OCaml's syntax that I
> would like to tweak (e.g. adding an optional "end" after a "match"
> or "function" to make them easier to nest), I am not bothered by the
> shift-reduce conflicts. Mainstream languages get by with far more serious
> syntactic issues (like <<...>> in C++).
That's something we may need to discuss at some point. I would really like the ability to write
match a with
( | bla
| bla
| bla )
> 7. Not_found: I like this, and Exit and Invalid_argument. Brian's point that
> the name of this exception does not convey its source is fallacious: that's
> what exception traces are for.
I personally prefer ExtLib's approach of redefining exceptions per-module.
> 8. Exceptions: I love OCaml's extremely fast exception handling (6x faster
> than C++, 30x faster than Java and 600x faster than C#/F#!).
Yep.
> 9. Deforestation: Brian says "Haskell has introduced a very interesting and
> (to my knowledge) unique layer of optimization, called deforrestation". True,
> of course, but useless theoretical piffle because we know that Haskell is
> slow in practice and prohibitively difficult to optimize to-boot. Deforesting
> is really easy to do by hand.
Are you sure, or is that just a troll? Supero seems to improve Haskell's
performance enormously, and the Shootout already shows Haskell beating OCaml
in several tests.
> 10. Limited standard library: I agree but this is only an issue because we are
> not able to fix the problem by contributing to the OCaml distribution.
That's the whole idea of Community Caml / Batteries Included. Really, feel free to contribute.
> . Pattern matching over lazy values.
Have you looked at the Patterns project on Google? It provides pattern-matching
over lazy values. I've used it in conjunction with my own lazy list module [1]
to port Graham Hutton's Countdown problem from Haskell, and it works.
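Native lazy patterns actually landed in OCaml 3.11; with them the explicit force disappears from the match (a sketch with a hypothetical stream type of my own):

```ocaml
type 'a stream = Cons of 'a * 'a stream Lazy.t

let rec nats n = Cons (n, lazy (nats (n + 1)))

(* Explicit forcing... *)
let rec take k (Cons (x, rest)) =
  if k = 0 then [] else x :: take (k - 1) (Lazy.force rest)

(* ...versus a lazy pattern, which forces the suspension as part of
   the match itself. *)
let rec take' k (Cons (x, lazy rest)) =
  if k = 0 then [] else x :: take' (k - 1) rest
```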
> I believe these can be fixed by creating a new open source functional language
> for Linux based upon LLVM. However, the lack of a suitable GC is a complete
> show stopper. The JVM is the only thing that comes close and it is unable to
> support tail calls without a catastrophic performance cost, i.e. so bad that
> you might as well write an interpreter.
Why a full new language ? I may understand the interest of writing a new
compiler for OCaml (or whichever other language) and gradually improving
the forked compiler, but that's a different story altogether.
Cheers,
David
[1] https://forge.ocamlcore.org/frs/shownotes.php?release_id=12
--
David Teller
Security of Distributed Systems
http://www.univ-orleans.fr/lifo/Members/David.Teller
Angry researcher: French Universities need reforms, but the LRU act brings liquidations.
They need to share data, e.g. write results into the same matrix or mutate the
same array.
--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/products/?e
A friendly word of advice from a member of the Erlang
Community, which just suffered an endless thread with the
subject "Why Erlang sucks", triggered by a similar blog
article on Erlang:
Please change the subject Now!
The surely unintended consequence of such a blog article
(which was otherwise ok, btw) is that Google will now
find 336 hits on "ocaml sucks", threatening to exceed
Erlang's corresponding 337 hits any minute now.
But it's your party. I'll go back to lurking now. (:
BR,
Ulf W
You might want to look at the 'ancient' module. While the common
understanding is that the shared data should be immutable, in fact
you'd be OK if all you want to do is mutate a simple array of unboxed
ints or floats[1]. Of course to an extent you're back to using manual
memory management, but if it's only a few structures that may be
acceptable. Getting parallelism right is hard enough that programmers
should be forced to think about exactly which structures need to be
shared, rather than threads where the default is to share everything
and make the naive assumption that the programmer knows what they're
doing.
Rich.
[1] The real basis for the 'immutable restriction' is explained in the
documentation, along with exceptions, but it's complicated so I won't
go into it here.
--
Richard Jones
Red Hat
yeah, a new troll post (!)
> 1. Lack of Parallelism: Yes, this is already a complete show stopper.
no it's not. it's in your fantasy world. lots of applications don't (or
only marginally) benefit from parallelism, and that your specific turf
would benefit from it is not a good reason to impose its drawbacks
on everybody else.
> 5. Strings: pushing unicode throughout a general purpose language is a
> mistake, IMHO. This is why languages like Java and C# are so slow.
unicode strings should not be the default string, but unicode strings need
to be available as first-class citizens. again, ocaml is not about doing
raytracers in opengl only.
> 7. Not_found: I like this, and Exit and Invalid_argument. Brian's point that
> the name of this exception does not convey its source is fallacious: that's
> what exception traces are for.
exception traces are *not* available in a long running program (daemon).
and having a Not_found cropping up somewhere is just plain annoying.
even having something like a List.Not_found/Hashtbl.Not_found would make
things a bit easier.
> 8. Exceptions: I love OCaml's extremely fast exception handling (6x faster
> than C++, 30x faster than Java and 600x faster than C#/F#!). I hate
> the "exceptions are for exceptional circumstances" line promoted by the
> advocates of any language implementation with cripplingly-slow exception
> handlers.
exceptions are for exceptional circumstances. using them as a fancy goto
mechanism is just plain stupid and really bad programming style.
> 9. Deforestation: Brian says "Haskell has introduced a very interesting and
> (to my knowledge) unique layer of optimization, called deforrestation". True,
> of course, but useless theoretical piffle because we know that Haskell is
> slow in practice and prohibitively difficult to optimize to-boot. Deforesting
> is really easy to do by hand.
have you been hiding in a cave lately? Haskell has improved its
performance recently; not on everything, but it can still beat OCaml on
some micro-benchmarks.
> I have other wish-list items of my own to add:
>
> . No 16Mb limit.
use 64 bits.
--
Vincent
That's just crazy talk. Nobody can afford to ignore the multicore era that we
have been in for some time now.
> is not a good reason to impose their drawbacks on everybody else.
What drawbacks?
> > 5. Strings: pushing unicode throughout a general purpose language is a
> > mistake, IMHO. This is why languages like Java and C# are so slow.
>
> Unicode strings should not be the default string type, but Unicode
> strings need to be available as first-class citizens.
Agreed.
> > 7. Not_found: I like this, and Exit and Invalid_argument. Brian's point
> > that the name of this exception does not convey its source is fallacious:
> > that's what exception traces are for.
>
> Exception traces are *not* available in long-running programs (daemons).
Because you compiled it wrongly or because you lost the output?
> > 8. Exceptions: I love OCaml's extremely fast exception handling (6x
> > faster than C++, 30x faster than Java and 600x faster than C#/F#!). I
> > hate the "exceptions are for exceptional circumstances" line promoted by
> > the advocates of any language implementation with cripplingly-slow
> > exception handlers.
>
> exceptions are for exceptional circumstances.
Bah, nonsense. Exceptions are used extensively for non-exceptional
circumstances in idiomatic OCaml and it works beautifully.
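To illustrate the idiom being defended here, a small sketch of mine (not
from the thread): the predefined Exit exception aborting an iteration
early, which is idiomatic OCaml precisely because exceptions are so cheap:

```ocaml
(* Early exit from Array.iter via the predefined Exit exception,
   i.e. an exception used for perfectly ordinary control flow. *)
let exists_negative arr =
  try
    Array.iter (fun x -> if x < 0 then raise Exit) arr;
    false
  with Exit -> true

let () =
  assert (exists_negative [| 1; 2; -3 |]);
  assert (not (exists_negative [| 1; 2; 3 |]))
```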
> > 9. Deforestation: Brian says "Haskell has introduced a very interesting
> > and (to my knowledge) unique layer of optimization, called
> > deforrestation". True, of course, but useless theoretical piffle because
> > we know that Haskell is slow in practice and prohibitively difficult to
> > optimize to-boot. Deforesting is really easy to do by hand.
>
> have you been hiding in a cave lately?
With yo mamma.
> Haskell has improved its performance recently; not on everything, but it
> can still beat OCaml on some micro-benchmarks.
Look at the objective and quantitative results using the latest GHC on a
modern machine. Haskell can't even touch OCaml, let alone F#. ;-)
> > I have other wish-list items of my own to add:
> >
> > . No 16Mb limit.
>
> use 64 bits.
You aren't customer facing are you?
--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/products/?e
_______________________________________________
My point is that actual performance on real code is what matters and not the
number of optimizations that might theoretically apply. From the measurements
I have seen and done, Haskell is substantially slower despite its many extra
optimizations. Also, the Haskell community is very quick to cite
superfluous academic piffle of no practical use. They have a lot of
very fast Fibonacci number generators, for example.
We investigated alternative languages to diversify into last year and Haskell
was one of them. The single biggest problem with Haskell is that it is wildly
unpredictable in terms of performance and memory consumption. I asked many
specialists what they thought of Haskell for Scientists and they all advised
me to forget the idea. The next biggest problem is that it is as
commerce-unfriendly as OCaml (e.g. no DLLs).
At last year's CUFP, a Haskell user from industry explained how they had to
drop to C/C++ whenever performance was relevant.
I received dozens of Haskell implementations of our ray tracer benchmark and
performance varied enormously but only one person managed to write an
implementation that gets within 3x the performance of OCaml, and he had spent
his life writing optimizing Haskell compilers. I asked why his code was fast
and he said it is too difficult to explain why. And that is a tiny program.
The Haskell mailing lists are full of people asking why their programs run so
slowly. The response is generally to litter the code with strictness
annotations and then resort to unsafe operations. There is virtually no
usable information explaining how to optimize Haskell code.
In contrast, OCaml is easy to optimize by following simple well-documented
rules and can achieve excellent performance.
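One of those simple rules, as a concrete sketch (my example, not from the
post): keep loops tail-recursive by threading an accumulator, so the
compiler emits a jump and the stack stays constant. (List.init assumes
OCaml >= 4.06, and the sum assumes a 64-bit build.)

```ocaml
(* Not tail-recursive: each element adds a stack frame. *)
let rec sum_naive = function
  | [] -> 0
  | x :: rest -> x + sum_naive rest

(* Tail-recursive: the accumulator turns the recursive call into a jump. *)
let sum_tail xs =
  let rec go acc = function
    | [] -> acc
    | x :: rest -> go (acc + x) rest
  in
  go 0 xs

let () =
  assert (sum_naive [1; 2; 3] = 6);
  (* An input this size would blow the stack of sum_naive. *)
  let big = List.init 1_000_000 (fun i -> i) in
  assert (sum_tail big = 499_999_500_000)
```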
> Supero seems to improve enormously Haskell's performances and the Shootout
> already shows Haskell beating OCaml in several tests.
The shootout is completely flawed by design. Just ignore it.
Also, they are running legacy hardware and OCaml does much better on modern
64-bit machines. For example, they claim that Haskell is 50% faster on binary
trees but, running the test here, OCaml is 40% faster. They claim
Haskell is 60% faster on nbody but here OCaml is 60% faster.
> > 10. Limited standard library: I agree but this is only an issue because
> > we are not able to fix the problem by contributing to the OCaml
> > distribution.
>
> That's the whole idea of Community Caml / Batteries Included. Really, feel
> free to contribute.
I thought Community OCaml was about tacking things on externally using macros
and extra libraries.
> > . Pattern matching over lazy values.
>
> Have you looked at the Patterns project on Google? It provides
> pattern-matching over lazy values. I've used it in conjunction with my own
> lazy list module [1] to port Graham Hutton's Countdown problem from
> Haskell, and it works.
I had not seen that. I'll check it out, thanks.
> > I believe these can be fixed by creating a new open source functional
> > language for Linux based upon LLVM. However, the lack of a suitable GC is
> > a complete show stopper. The JVM is the only thing that comes close and
> > it is unable to support tail calls without a catastrophic performance
> > cost, i.e. so bad that you might as well write an interpreter.
>
> Why a full new language? I may understand the interest of writing a new
> compiler for OCaml (or whichever other language) and gradually improving
> the forked compiler, but that's a different story altogether.
A concurrent GC is the only major issue. Everything else is comparatively
easy.
--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/products/?e
_______________________________________________
I think the parallelism capabilities are already excellent. We have been
able to implement the application backend of Wink's people search in
O'Caml, and it is of course a highly parallel system of programs. This
is not the same class that raytracers or desktop parallelism fall into - this
is highly professional supercomputing. I'm talking about a cluster of
~20 computers with something like 60 CPUs.
Of course, we did not use multithreading very much. We are relying on
multi-processing (both "fork"ed style and separately started programs),
and multiplexing (i.e. application-driven micro-threading). I especially
like the latter: Doing multiplexing in O'Caml is fun, and a substitute
for most applications of multithreading. For example, you want to query
multiple remote servers in parallel: Very easy with multiplexing,
whereas the multithreaded counterpart would quickly run into scalability
problems (threads are heavy-weight, and need a lot of resources).
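To make the multiplexing idea concrete, here is a toy sketch of my own
(not Wink's code, which uses ocamlnet's event queues): a single process
watches several descriptors with Unix.select and reacts as each becomes
readable, with no threads involved.

```ocaml
(* Single-process multiplexing: watch two pipes with Unix.select,
   collecting each reply as it arrives. The pipes stand in for two
   remote servers answering at different times.
   (Requires linking the unix library.) *)
let () =
  let r1, w1 = Unix.pipe () in
  let r2, w2 = Unix.pipe () in
  ignore (Unix.write w1 (Bytes.of_string "one") 0 3);
  ignore (Unix.write w2 (Bytes.of_string "two") 0 3);
  let pending = ref [r1; r2] and replies = ref [] in
  while !pending <> [] do
    let ready, _, _ = Unix.select !pending [] [] 1.0 in
    List.iter
      (fun fd ->
         let buf = Bytes.create 16 in
         let n = Unix.read fd buf 0 16 in
         replies := Bytes.sub_string buf 0 n :: !replies;
         pending := List.filter (fun fd' -> fd' <> fd) !pending)
      ready
  done;
  assert (List.sort compare !replies = ["one"; "two"])
```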
> There are two problems with that:
>
> . You go back to manual memory management between parallel threads/processes.
I guess you refer to explicit references between processes. This is a
kind of problem, and best handled by avoiding it. We have some cases
where we have to keep remote state. The solution was to attach a timer,
and delete the state after some time of not accessing it.
After all, most state is only temporary, and if it is lost, it can be
created again (at some cost, of course).
> . Parallelism is for performance and performance requires mutable data
> structures.
In our case, the mutable data structures that count are on disk.
Everything else is only temporary state.
I admit that it is a challenge to structure programs in a way such that
parallel programs not sharing memory profit from mutable state. Note
that it is also a challenge to debug locks in a multithreaded program so
that they run 24/7. Parallelism is not easy after all.
> Then you almost always end up copying data unnecessarily because you cannot
> collect it otherwise, which increases memory consumption and massively
> degrades performance that, in turn, completely undermines the original point
> of parallelism.
Ok, I understand. We are complete fools. :-)
I think that the cost of copying data is totally overrated. We are doing
this often, and even over the network, and hey, we are breaking every
speed limit.
> The cost of interthread communication is then so high in
> OCaml that you will rarely be able to obtain any performance improvement for
> the number of cores desktop machines are going to see over the next ten
> years, by which time OCaml will be 10-100x slower than the competition.
This is a quite theoretical statement. We will rather see that most
application programmers will not learn parallelism at all, that
consumers will start questioning the sense of multicores, and that the
chip industry will search for alternatives.
And _if_ application programmers learn parallelism, it will rather be in
the multi-processing/multiplexing setup we use, not in the multithreading
style you advocate. And on servers (where parallelism ought to happen),
Windows' poor support for it (lacking "fork" and other cool features) is
no problem.
Gerd
>
> > When was the last time you heard of a cool new windows app anyway?
>
> The last time we released a product. :-)
>
> > > . No 16Mb limit.
> >
> > What do you mean by 16mb limit?
>
> OCaml's strings and arrays are limited to 16Mb in 32-bit.
>
> > > . Inlining.
> >
> > isn't it best for the compiler to handle that? I wouldn't mind hearing
> > another perspective on this, but I thought that compilers were smarter
> > these days.
>
> Definitely not. Compilers uniformly suck at inlining. For example,
> aggressive inlining is often beneficial in numerical code and often
> damaging in symbolic code. Compilers cannot tell the difference.
>
> This is very similar to "unboxed data structures are always better", which
> also isn't generally true.
>
> I've got more gripes to add:
>
> . Missing types, like float32 and int16.
> . DLLs.
>
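As an aside on the 16Mb limit mentioned above: the ceilings are exposed in
the standard library and can be checked directly. The assertion below
assumes nothing about the build; the restriction effectively disappears on
64-bit.

```ocaml
(* Inspect the string/array size ceilings. On a 32-bit build,
   Sys.max_string_length is roughly 16 million bytes (the block size
   field in a value's header is narrow there); on 64-bit it is huge. *)
let () =
  Printf.printf "word size:         %d\n" Sys.word_size;
  Printf.printf "max string length: %d\n" Sys.max_string_length;
  Printf.printf "max array length:  %d\n" Sys.max_array_length;
  assert (Sys.word_size = 32 || Sys.max_string_length > 16_777_211)
```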
--
------------------------------------------------------------
Gerd Stolpmann * Viktoriastr. 45 * 64293 Darmstadt * Germany
ge...@gerd-stolpmann.de http://www.gerd-stolpmann.de
Phone: +49-6151-153855 Fax: +49-6151-997714
------------------------------------------------------------
--
Ralph
Do you have any pointer on multiplexing (or some piece of code you could
show)? This seems interesting but I can't figure out what it looks like.
Thanks,
--
Gabriel
~~ Robert.
Dear Jon,
I will keep reminding you that Erlang is a functional language
(just not a statically typed one). It has very lightweight
processes, concurrent schedulers on multicore, and per-process
GC. It scales very well on multicore.
I can understand if you don't consider Erlang a competitive
alternative for building ray tracers, but I will protest
while you insist on making sweeping statements about *all*
functional languages that are patently untrue for Erlang.
Both in the Haskell and Erlang communities, there is
occasional reference to Erlang/Haskell/OCaml as a very
useful triple, having much in common, but each being
especially good in some fairly well-defined area.
I think this perspective is useful for all three languages.
BR,
Ulf W
BTW, the ErlOcaml project is beginning to reach a state of
usefulness, at least for experiments:
http://code.google.com/p/erlocaml/
It allows an Ocaml program to act as a "hidden erlang node",
talking to Erlang programs via Erlang message passing.
One possible use for this could be to use Erlang as a
systems programming language, keeping your application
logic in OCaml. Erlang/OTP comes with practically everything
you need to turn your product into a robust cluster,
including a distributed database and some very useful logic
for error recovery and in-service upgrade. And it's all
Open Source.
Many thanks to Ludovic Coquelle for all the coding he has done.
The rest of us have kept to the cheerleading section for now.
For some background information look here:
- Asynchronous RPC:
http://projects.camlcity.org/projects/dl/ocamlnet-2.2.9/doc/html-main/Rpc_intro.html
Look for the asynchronous examples
- The low-level side is enlightened here:
http://projects.camlcity.org/projects/dl/ocamlnet-2.2.9/doc/html-main/Equeue_intro.html
Here is a code snippet from our production code. It is from a server
that continuously monitors RPC ports:
let check_port esys p =
(* Checks the port p continuously until mon_until - then the port is deleted
from the list
*)
let (ba,ua) = Lazy.force shmem in
let rec next_check() =
let now = Unix.gettimeofday() in
if now < Int64.to_float p.mon_until then (
let conn =
match p.mon_port with
| `local_port _ ->
failwith "Monitoring local ports is not supported"
| `inet4_port a ->
Rpc_client.Inet(a.inet4_host, a.inet4_port) in
let rpc_proto =
match p.mon_protocol with
| `tcp -> Rpc.Tcp
| `udp -> Rpc.Udp in
let client =
Capi_clnt.Generic.V1.create_client2
~esys
(`Socket(rpc_proto, conn, Rpc_client.default_socket_config)) in
Rpc_helpers.set_exn_handler "generic" client;
Rpc_client.configure client 0 5.0; (* 5 secs timeout *)
Capi_clnt.Generic.V1.ping'async
client
()
(fun get_reply ->
(* We are called back when the "ping" is replied or times out *)
let successful = try get_reply(); true with _ -> false in
Rpc_client.shut_down client;
p.mon_alive <- successful;
ba.{ p.mon_shmpos } <- (if successful then 1L else 0L);
let g = Unixqueue.new_group esys in
Unixqueue.once esys g 1.0 next_check
);
)
else
remove_port p
in
next_check()
This (single-process) server can watch a high number of ports at the
same time and consumes almost no resources. Note how the loop is
programmed, essentially we have
let rec next_check() =
...
Capi_clnt.Generic.V1.ping'async
client
()
(fun get_reply ->
...
Unixqueue.once esys g 1.0 next_check
)
("once" calls the passed functions once in the future, here after 1
second). Basically, you have a closure with the action in the event
queue, and when the action is done, a new closure is appended to this
queue to schedule the next task. Using closures is important because
this enforces that all stack values are copied to the heap (I view
closures in this context as a means to "heapify" values), so the
recursion is stackless.
This server is also interesting because we actually use shared memory
for speeding up the communication path between client and server (look
at the "ba.{ p.mon_shmpos } <- ..." line). The client is here the
program that wants to know whether a port is alive. This is an
optimization for the most frequent case, but works only if client and
server reside on the same node. (Actually, we run this server on every
node, so this is always the case.) Effectively, it is no big problem to
combine shared memory style and multi-processing.
Gerd
--
------------------------------------------------------------
Gerd Stolpmann * Viktoriastr. 45 * 64293 Darmstadt * Germany
ge...@gerd-stolpmann.de http://www.gerd-stolpmann.de
Phone: +49-6151-153855 Fax: +49-6151-997714
------------------------------------------------------------
Cheers,
David
On Fri, 2008-05-09 at 13:58 +0200, Gabriel Kerneis wrote:
> Do you have any pointer on multiplexing (or some piece of code you could
> show)? This seems interesting but I can't figure out what it looks like.
>
> Thanks,
--
David Teller
Security of Distributed Systems
http://www.univ-orleans.fr/lifo/Members/David.Teller
Angry researcher: French Universities need reforms, but the LRU act brings liquidations.
_______________________________________________
A Result module complements this with monadic operations for whoever is
interested.
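For the curious, such a Result module might look like this minimal sketch
(my own illustration, not Core's actual interface):

```ocaml
(* A result type with return/bind, so error handling composes
   monadically instead of via exceptions. *)
type ('a, 'e) result = Ok of 'a | Error of 'e

let return x = Ok x

let bind r f = match r with
  | Ok x -> f x
  | Error e -> Error e

let ( >>= ) = bind

(* Example client: division that reports its failure as a value. *)
let safe_div x y =
  if y = 0 then Error "division by zero" else Ok (x / y)

let () =
  assert ((safe_div 10 2 >>= fun q -> return (q + 1)) = Ok 6);
  assert ((safe_div 1 0 >>= fun q -> return (q + 1)) = Error "division by zero")
```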
Cheers,
David
On Fri, 2008-05-09 at 07:37 -0400, Ralph Douglass wrote:
> Not_found and Failure _ can be a bit annoying to watch for, so I
> suggest using Core. The default behavior for most functions is to
> return an option instead of raising an exception. There are _exn
> versions of the functions if you want exceptions.
--
David Teller
Security of Distributed Systems
http://www.univ-orleans.fr/lifo/Members/David.Teller
Angry researcher: French Universities need reforms, but the LRU act brings liquidations.
_______________________________________________
I disagree. SMP Erlang has no mutable data structures,
but achieves very good scalability anyway.
http://www.franklinmint.fm/blog/archives/000792.html
Performance implies different things in different
applications, of course. In some cases, I'm sure mutable
data structures might seem unavoidable. But you can't
equate them with performance.
The following quote from a recent post on the ErlyWeb list
is an example of both multicore scalability and some
wicked performance - without mutable data structures:
"Under some pretty extreme loads - around 20,000 open
mainframe connections - I was blowing up Erlyweb/Yaws.
As a sanity check, this was when Erlyweb/Yaws was
consuming ~90% of all 4 CPUs on a dedicated newish
Dell server running Ubuntu 7.10 server; there was
probably smoke coming out of the box at the time ;->
On occasion, I was also blowing up CICS on the mainframe
(a Z9 series, so pretty new gear) with workload supplied
by this single Erlyweb box on occasion, so maybe we were
getting into some weird sort of temporal, black-hole type
situation where the universe was starting to fold in on
itself ;-> There's nothing like applying extreme workloads
to show up some really strange problems. " (David Mitchell)
http://groups.google.com/group/erlyweb/browse_thread/thread/587d010c7b3e024
BR,
Ulf W
Ok, I'll take your word on that.
> > > 10. Limited standard library: I agree but this is only an issue because
> > > we are not able to fix the problem by contributing to the OCaml
> > > distribution.
> >
> > That's the whole idea of Community Caml / Batteries Included. Really, feel
> > free to contribute.
>
> I thought Community OCaml was about tacking things on externally using macros
> and extra libraries.
Yes and no.
* Batteries Included =
** take the legacy standard library, ExtLib, Camomile, SDFlow and other
libraries (list undecided so far, but pcre, ocamlnet, camlzip are strong
candidates), add some glue to get them all to work together, and build a
comprehensive external library
** do the same thing for macros
** distribute everything as a GODI package with lots of dependencies
* Community OCaml = use the same codebase but
** completely replace the legacy standard library
** turn this into a full, community-maintained, OCaml distribution.
In either case, the objective is to achieve something JDK-like in its
reach.
> > Why a full new language? I may understand the interest of writing a new
> > compiler for OCaml (or whichever other language) and gradually improving
> > the forked compiler, but that's a different story altogether.
>
> A concurrent GC is the only major issue. Everything else is comparatively
> easy.
Well, if you get to write a concurrent GC for the core of OCaml, feel
free to alert the community. People might be willing to help with
advanced features.
Cheers,
David
--
David Teller
Security of Distributed Systems
http://www.univ-orleans.fr/lifo/Members/David.Teller
Angry researcher: French Universities need reforms, but the LRU act
brings liquidations.
_______________________________________________
> We investigated alternative languages to diversify into last year and
> Haskell was one of them. The single biggest problem with Haskell is that
> it is wildly unpredictable in terms of performance and memory consumption.
>
This is only the case if you don't understand lazy evaluation. This is no
different from OCaml, or any language. One must understand the operational
semantics to write efficient code. Imagine how a C programmer feels when
writing OCaml without knowing to make functions tail-recursive.
> The Haskell mailing lists are full of people asking why their programs
> run so slowly. The response is generally to litter the code with
> strictness annotations and then resort to unsafe operations. There is
> virtually no usable information explaining how to optimize Haskell code.
>
Many people using Haskell don't fully appreciate the enormous difference
between eager and lazy evaluation; furthermore, most languages, functional
or otherwise, use some sort of eager evaluation. Strictness annotations
and unsafe operations are rarely necessary to write efficient code (but
they are necessary to make code written for an eager language run fast in
a lazy language).
In any case, I'm not trying to push Haskell or OCaml; they are both useful
in the real-world.
I wonder if similar complaints (unpredictable performance, memory use,
dearth of practical information) will arise about F# as it starts to be
widely adopted in the real world.
-Jeff
Scalability != performance. For CPU intensive tasks, Erlang is uniformly slow.
--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/products/?e
_______________________________________________
I don't see how you can say that. If you introduce the
restriction that there can be only one CPU, then this
might be true. Some applications are very difficult to
scale to multiple CPUs, but others are inherently
scalable. For some problems, mutable data structures
make all the difference, and in others, they just make
a mess of things.
If you say "parallelism is for performance", you imply
that the program is scalable, and that it actually makes
a difference to add more CPUs. In this case, the presence of
mutable data structures will make scaling more
difficult. Most problems involved in utilizing multicore
boil down to the widespread use of mutable data structures.
If the problem isn't scalable, then other tricks are needed
to achieve performance, and mutable data structures may
be indispensable.
BR,
Ulf W
If OCaml is good for concurrency on distributed systems that is great but it
is completely different to CPU-bound parallelism on multicores.
> > There are two problems with that:
> >
> > . You go back to manual memory management between parallel
> > threads/processes.
>
> I guess you refer to explicit references between processes. This is a
> kind of problem, and best handled by avoiding it.
Then you take a massive performance hit.
> > . Parallelism is for performance and performance requires mutable data
> > structures.
>
> In our case, the mutable data structures that count are on disk.
> Everything else is only temporary state.
Exactly. That is a completely different kettle of fish to writing high
performance numerical codes for scientific computing.
> I admit that it is a challenge to structure programs in a way such that
> parallel programs not sharing memory profit from mutable state. Note
> that it is also a challenge to debug locks in a multithreaded program so
> that they run 24/7. Parallelism is not easy after all.
Parallelism is easy in F#.
> > Then you almost always end up copying data unnecessarily because you
> > cannot collect it otherwise, which increases memory consumption and
> > massively degrades performance that, in turn, completely undermines the
> > original point of parallelism.
>
> Ok, I understand. We are complete fools. :-)
>
> I think that the cost of copying data is totally overrated. We are doing
> this often, and even over the network, and hey, we are breaking every
> speed limit.
You cannot afford to pay that price for parallel implementations of most
numerical algorithms.
> > The cost of interthread communication is then so high in
> > OCaml that you will rarely be able to obtain any performance improvement
> > for the number of cores desktop machines are going to see over the next
> > ten years, by which time OCaml will be 10-100x slower than the
> > competition.
>
> This is a quite theoretical statement. We will rather see that most
> application programmers will not learn parallelism at all, and that
> > consumers will start questioning the sense of multicores, and the chip
> industry will search for alternatives.
On the contrary, that is not a theoretical statement at all: it already
happened. F# already makes it much easier to write high performance parallel
algorithms and its concurrent GC is the crux of that capability.
--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/products/?e
_______________________________________________
I will keep reminding you that Erlang is not competitive, performance-wise,
for CPU-bound computation like number crunching. The fact that it scales well on
distributed clusters for massively concurrent applications is irrelevant:
that has nothing to do with multicore computing.
> "Under some pretty extreme loads - around 20,000 open
> mainframe connections - I was blowing up Erlyweb/Yaws.
> As a sanity check, this was when Erlyweb/Yaws was
> consuming ~90% of all 4 CPUs on a dedicated newish
> Dell server running Ubuntu 7.10 server; there was
> probably smoke coming out of the box at the time ;->
If this were relevant to multicore computing, an "extreme load" would
certainly be 100% CPU usage and not 90%.
--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/products/?e
_______________________________________________
Simon Peyton-Jones told me that. I am sure you will agree that he understands
lazy evaluation.
> This is no
> different from OCaml, or any language. One must understand the operational
> semantics to write efficient code. Imagine how a C programmer feels when
> writing OCaml without knowing to make functions tail-recursive.
Tail calls have simple rules. Graph reduction of a lazy program with
optimizing rewrites and strictness analysis does not adhere to simple rules.
Simple rules => predictable.
> I wonder if similar complaints (unpredictable performance, memory use,
> dearth of practical information) will arise about F# as it starts to be
> widely adopted in the real world.
F# has long since overtaken all other functional languages in terms of
industrial uptake and I have not heard that complaint from anyone. Like
OCaml, it follows simple rules and is predictable as a consequence.
Some of the rules are very different in F#. For example, nested closures
degrade performance in OCaml but improve performance in F#.
This may well change as we transition to manycore and parallelism makes
performance unpredictable.
There is a vast number of applications where performance
is not about number crunching. OCaml stands a good chance
of expanding into some of them, e.g. if it can grow into
providing better support for concurrency.
> The fact that it scales well on
> distributed clusters for massively concurrent applications
> is irrelevant: that has nothing to do with multicore
> computing.
Who's been talking about distributed clusters?
Erlang does scale on distributed clusters, but the
link I provided was about scaling on multicore.
OCaml could too, but I don't think that insisting on
the use of mutable data structures is the way to go.
This may surprise you, but I don't subscribe to this list
in order to try to convert OCaml programmers to Erlang.
I think Erlang, OCaml and Haskell can benefit from
an exchange of ideas. I will continue to believe that at
least some members of the OCaml community share this view.
BR,
Ulf W
> On Friday 09 May 2008 16:38:55 Jeff Polakow wrote:
> > Hello,
> >
> > > We investigated alternative languages to diversify into last year and
> > > Haskell
> > > was one of them. The single biggest problem with Haskell is that it is
> > > wildly
> > > unpredictable in terms of performance and memory consumption.
> >
> > This is only the case if you don't understand lazy evaluation.
>
> Simon Peyton-Jones told me that. I am sure you will agree that he
> understands lazy evaluation.
>
> > This is no different from OCaml, or any language. One must understand
> > the operational semantics to write efficient code. Imagine how a C
> > programmer feels when writing OCaml without knowing to make functions
> > tail-recursive.
>
> Tail calls have simple rules. Graph reduction of a lazy program with
> optimizing rewrites and strictness analysis does not adhere to simple
> rules.
> Simple rules => predictable.
>
I concur. Having your programming language require a compiler with
unspecifiably sophisticated optimizations to yield useful performance is
not a very good thing.
I wrote something about this here:
http://abaababa.blogspot.com/2008/03/transparent-performance.html
> I wonder if similar complaints (unpredictable performance, memory use,
> dearth of practical information) will arise about F# as it starts to be
> widely adopted in the real world.
> F# has long since overtaken all other functional languages in terms of
> industrial uptake and I have not heard that complaint from anyone. Like
> OCaml, it follows simple rules and is predictable as a consequence.
Jon, I imagine that the reasons for your unending praise of F# concurrent
goodness are specific to your peculiar number-crunching, cash-earning
applications, of which we don't know much.
Would you be kind enough to detail a little bit, on this list, the kind of
things you do with F# and Ocaml, what kind of customers you have, what
their attitudes are towards Ocaml and functional programming, and what
horrible lengths Ocaml forces you to go to in order to ship sellable
closed-source code, if any? And please, don't ask us to spend thirty
pounds (is that still a valid currency?) for that information :)
Meanwhile the world has been going free & open source for 10 years, and F#
is about as FOSS as the Coca-Cola company sells organic broccoli. Also, I'm
not sure of the speed at which the Windows platform will go down the drain,
but it doesn't seem to be doing very well. Agreed, Windows has its niches
in the industry (finance, CAD, electronics), but some subniches could
quickly switch to Linux/BSD/whatever the moment their major commercial
applications (say, Autocad) are ported to Linux.
However, 10 years from now we'll still have Linux, and Ocaml will still be
running on it.
As the Ocaml community, with some effort on our part, we could make Ocaml
a substantial alternative to the "P" in LAMP. The monadic prison is too
tough for the layman, yet laymen are starting to ask questions about the
adequacy of "dynamic typing" w.r.t. software engineering problems
(maintainability, cooperation, monkey patching issues) and performance
problems.
So let's continue the good work on Batteries, Core Lib, etc. and not worry
too much about the concurrency of the GC. If that gets fixed, it'll be good
to have around, but it's not critical.
--
Berke
You sound like somebody who tries to sell hardware :-)
Well, our algorithms are quite easy to parallelize. I don't see a
difference in whether they are CPU-bound or disk-bound - we also have
lots of CPU-bound stuff, and the parallelization strategies are the
same.
The important thing is whether the algorithm can be formulated in a way
so that state mutations are rare, or can at least be done in a
"cache-friendly" way. Such algorithms exist for a lot of problems. I
don't know which problems you want to solve, but it sounds as if they
were special problems. Like for most industries, most of our problems
are simply "do the same for N objects" where N is very large, and
sometimes "sort data", also for large N.
> > In our case, the mutable data structures that count are on disk.
> > Everything else is only temporary state.
>
> Exactly. That is a completely different kettle of fish to writing high
> performance numerical codes for scientific computing.
I don't understand. Relying on disk for sharing state is a big problem
for us, but unavoidable. Disk is slow memory with a very special timing.
Experience shows that even accessing state over the network is cheaper
than over disk. Often, we end up designing our algorithms around the
disk access characteristics. Compared to that the access to RAM-backed
state over network is fast and easy.
> > I admit that it is a challenge to structure programs in a way such that
> > parallel programs not sharing memory profit from mutable state. Note
> > that it is also a challenge to debug locks in a multithreaded program so
> > that they run 24/7. Parallelism is not easy after all.
>
> Parallelism is easy in F#.
Wonders must have happened that I'm not aware of. How does F# prevent
deadlocks?
> > This is a quite theoretical statement. We will rather see that most
> > application programmers will not learn parallelism at all, and that
> consumers will start to question the sense of multicores, and the chip
> > industry will search for alternatives.
>
> On the contrary, that is not a theoretical statement at all: it already
> happened. F# already makes it much easier to write high performance parallel
> algorithms and its concurrent GC is the crux of that capability.
Don't misunderstand me, I'm not anti-F#. I only have no interests right
now in taking advantage of multicores by concurrent GC's. I rather want
to have an ultra-fast single-core execution. I can do the
parallelization myself.
Gerd
--
------------------------------------------------------------
Gerd Stolpmann * Viktoriastr. 45 * 64293 Darmstadt * Germany
ge...@gerd-stolpmann.de http://www.gerd-stolpmann.de
Phone: +49-6151-153855 Fax: +49-6151-997714
------------------------------------------------------------
I'm thinking of a more detailed practical use-case with more contextual
detail ("we're indexing people"), numbers ("we have 1,234,567 persons in our
database, distributed over X machines with Y processors"), performance
figures (each CPU runs x Ocaml processes compiled in native code, those eat
95% of CPU and spend 5% of their time in IO with an RSS of 80MB; profiling
shows that most of their time is spent there; we figured that if you do this
and that, it goes faster, etc.).
That kind of information would be very welcome and could do much good to
Ocaml, instead of flamewar-provoking posts.
--
Berke
shm_open shares memory through file descriptors and, under
linux/glibc, this is done using /dev/shm. You can mmap this as a bigarray
and, voila, shared memory. This is quite nice for numerical
computation, plus you get closures etc... in your forks. Oh and COW on
modern OS's makes this very cheap.
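A minimal sketch of the mmap-as-bigarray idea (the path and size are
illustrative, and `Unix.map_file` assumes a reasonably recent OCaml):

```ocaml
(* Sketch: map a shared-memory file as a float Bigarray.  On Linux/glibc
   a path under /dev/shm lands in a tmpfs, so two processes mapping the
   same path share the pages. *)
let map_shared path n =
  let fd = Unix.openfile path [Unix.O_RDWR; Unix.O_CREAT] 0o600 in
  (* [true] = MAP_SHARED; the file is grown to hold n floats *)
  let a =
    Bigarray.array1_of_genarray
      (Unix.map_file fd Bigarray.float64 Bigarray.c_layout true [| n |])
  in
  Unix.close fd;  (* the mapping survives closing the descriptor *)
  a
```

Any process that calls `map_shared "/dev/shm/demo" 1024` sees writes made
by the others through the same mapping.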
--
http://till-varoquaux.blogspot.com/
Yes, that's the kind of approach I like.
- Do not forget to do a Gc.compact before forking to avoid collecting the
same unreachable data in each fork.
- For sharing complex data, you can marshal into a shared Bigarray.
If the speed of Marshal becomes a bottleneck, a specialized Marshal that
skips most of the checks and byte-oriented, compact-serialization things
that extern.c currently does could speed things up.
- A means for inter-process synchronization/communication is still needed.
A userland solution using a shared-memory consensus algorithm (which would
probably require some C or assembly for atomic operations) could be cheap.
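The marshal-into-a-shared-Bigarray idea can be sketched as follows; the
mailbox layout (a 4-byte length prefix followed by the payload) is purely
an assumption for illustration:

```ocaml
(* Sketch: a shared char Bigarray as a crude one-message mailbox.
   The layout (little-endian length word, then the marshalled payload)
   is made up; a real mailbox also needs the synchronization mentioned
   in the last bullet, which this sketch ignores entirely. *)
type buf =
  (char, Bigarray.int8_unsigned_elt, Bigarray.c_layout) Bigarray.Array1.t

let write_msg (buf : buf) v =
  let s = Marshal.to_bytes v [] in
  let n = Bytes.length s in
  assert (n + 4 <= Bigarray.Array1.dim buf);
  for i = 0 to 3 do
    buf.{i} <- Char.chr ((n lsr (8 * i)) land 0xff)
  done;
  for i = 0 to n - 1 do buf.{4 + i} <- Bytes.get s i done

let read_msg (buf : buf) =
  let n = ref 0 in
  for i = 3 downto 0 do n := (!n lsl 8) lor Char.code buf.{i} done;
  let s = Bytes.create !n in
  for i = 0 to !n - 1 do Bytes.set s i buf.{4 + i} done;
  Marshal.from_bytes s 0
```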
--
Berke
I can find lots of interesting use of all my cores without having all my
programs be _automatically_ multi-threaded.
> > exception traces are *not* available in long running program (daemon).
>
> Because you compiled it wrongly or because you lost the output?
i see you have no idea what a daemon is, and how ocaml can print its
stack trace only on stderr when reaching toplevel (i.e. quitting).
(on a non-modified version of ocaml)
> > have you been hiding in a cave lately?
>
> With yo mamma.
this actually sums up your posts and the ideas contained in them pretty well.
> You aren't customer facing are you?
funny enough that you say that. we both know which one of us is actually
producing some ocaml software that is really selling.
--
Vincent
Please stop that. Both of you.
--
David Teller
Security of Distributed Systems
http://www.univ-orleans.fr/lifo/Members/David.Teller
Angry researcher: French Universities need reforms, but the LRU act brings liquidations.
_______________________________________________
Now, that's a cliffhanger. Could you elaborate ?
Cheers,
David
> > I think that the cost of copying data is totally overrated. We are doing
> > this often, and even over the network, and hey, we are breaking every
> > speed limit.
>
> You cannot afford to pay that price for parallel implementations of most
> numerical algorithms.
Er... Not being a specialist, I may be wrong, but I seem to remember
that you can afford that, as long as you're also doing something else
during that copy.
> On the contrary, that is not a theoretical statement at all: it
> already
> happened. F# already makes it much easier to write high performance
> parallel
> algorithms and its concurrent GC is the crux of that capability.
Examples ? Pretty please ?
Cheers,
David
--
David Teller
Security of Distributed Systems
http://www.univ-orleans.fr/lifo/Members/David.Teller
Angry researcher: French Universities need reforms, but the LRU act
brings liquidations.
At the risk of sounding like a broken record ... OR you could use
Ancient which does the above, right.
Rich.
--
Richard Jones
Red Hat
Figures to back up this extraordinary claim? (And I don't mean the
unverifiable figures of a certain Cambridge-based consultancy).
These commercial enterprises better hope they don't need to modify the
F# compiler at all, and that MS keep releasing new versions and fixes
forever, because the terms of the F# license would prevent them from
fixing it themselves (unlike if they'd decided to go with an open
source solution).
Rich.
--
Richard Jones
Red Hat
> On Fri, May 09, 2008 at 11:13:26PM +0200, Berke Durak wrote:
> > - For sharing complex data, you can marshall into a shared Bigarray.
> >
> > If the speed of Marshal becomes a bottleneck, a specialized Marshal that
> > skips most of the checks/byte-oriented, compact serialization things that
> > extern.c currently does could speed
> > things up.
>
> At the risk of sounding like a broken record ... OR you could use
> Ancient which does the above, right.
>
I just looked at the interface and in spite of an initial difference in aims
(sharing large, static data between processes), Ancient seems to be almost
what's needed for parallelism.
It could be used as an IPC mailbox, provided you map /dev/shm, come up with
some cheap means of synchronizing your processes (possibly by using a
futex?) and ensure that the creation/deletion functions do not have too
much overhead.
But you are saying in the README that values in the ancient heap have some
limitations, namely no ad-hoc polymorphic primitives. That may be acceptable
for sharing a large static database but won't cut the mustard for cheap
message-passing. So we need a function to copy an ancient value back into
the heap so that it can be used normally, which is what I was talking about:
a simplified, cheaper Marshal module.
--
Berke
and please stop throwing him bones to carry on this _stupid_ troll thread.
--
Vincent
On 1 CPU Erlang is currently ~5x slower in this context. So even if we ignore
the cost of message passing we know Erlang cannot be competitive for <6
cores. In fact, you find that the cost of message passing over mutation makes
Erlang not competitive for any existing computers in this context, even 256
CPU shared memory supercomputers.
This is why Erlang has not (and will not) penetrate the market of scientists
programming shared memory supercomputers. For the same reason, Erlang is not
relevant for exploiting multicore: its remit is massive concurrency which is
a completely different problem.
The theoretical scaling of Erlang's performance in terms of asymptotic
complexity is irrelevant because we're still looking at small numbers (<<256)
of cores and the prefactor is huge. Moreover, this will never change: even
when we have 1,024-core machines there will still be locality effects best
exploited by sharing memory between physically-close cores.
So neither approach is a panacea but concurrent GC is essential now and
message passing will be essential in 10 years.
> Some applications are very difficult to
> scale to multiple CPUs, but others are inherently
> scalable. For some problems, mutable data structures
> make all the difference, and in others, they just make
> a mess of things.
>
> If you say "parallelism is for performance", you imply
> that the program is scalable, and that it actually makes
> a difference to add more CPUs. In this case, the presence
> of mutable data structures will make scaling more
> difficult.
That is commonly claimed by the proponents of alternative approaches (purity,
STM, message passing etc.) but I have found it to be untrue in the
case of F# although I can believe it applies when there is uncontrolled
mutation.
--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/products/?e
You misunderstand this limitation, but anyway ...
> So we need a function to copy back an ancient value in the heap so
> that it can be used normally
Should you want to copy a value back to the heap (almost always
unnecessary), you can just copy it like you would any other value.
> On Sat, May 10, 2008 at 01:01:03AM +0200, Berke Durak wrote:
> > But you are saying in the README that values in the ancient heap have
> some
> > limitations, namely no ad-hoc polymorphic primitives.
>
> You misunderstand this limitation, but anyway ...
The paragraph from your README then needs some clarification.
> (1) Ad-hoc polymorphic primitives (structural equality, marshalling
> and hashing) do not work on ancient data structures, meaning that you
> will need to provide your own comparison and hashing functions. For
> more details see Xavier Leroy's response here:
As you cannot mutate anything that is ancient (since it might be concurrently
accessed), you cannot mark or modify them in-place for ad-hoc marshalling or
deep copying. Is that correct?
Comparison does not mark (and thus does not work on cyclic structures).
Does it work on values in the ancient heap (I'm not talking of handles
here)?
So it seems that adding a generic copy-out-of-the-ancient heap function
(which marks in a private area) would be worthwhile. Should not be too
difficult.
--
Berke
Is it what you want: http://caml.inria.fr/mantis/view.php?id=4265 ?
Cheers,
ChriS
There are various restrictions on mutating the ancient data, which are
explained in the README. It is not true that ancient data is
completely immutable, just that you really need to understand what
you're doing in order to do it correctly. There are additional
restrictions to mutating [any] data concurrently, but I didn't explain
those because they are obvious, and can be solved with standard
threading techniques (mutexes, etc.).
> you cannot mark or modify them in-place for ad-hoc marshalling or
> deep copying. Is that correct?
I'm not sure what this means. I haven't tried to Marshal ancient
data, because ancient data can already be persisted, so marshaling it
doesn't make much sense.
'Deep copying' of ancient data can be done just like deep copying any
other OCaml value.
> Comparison does not mark (and thus does not work on cyclic structures).
> Does it work on values in the ancient heap (I'm not talking of handles
> here)?
Your use of 'value', 'handle' etc. is confusing me. I suggest you
take a look at how OCaml values are stored (eg. <caml/mlvalues.h> is a
good place to start).
Anyhow, the polymorphic primitives (like %compare) don't work, just
because they make the assumption that anything outside the normal
OCaml heap is incomparable, but you can certainly write your own
comparison functions to replace those, eg. comparing
character-by-character for strings.
This has nothing to do with 'marking'.
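A hand-rolled comparison of the kind described here might look like this
(a sketch that never touches the polymorphic primitive, so it is safe on
values outside the normal heap):

```ocaml
(* Sketch: character-by-character string comparison that avoids the
   polymorphic %compare primitive.  Only immediate values (ints, chars)
   are ever compared. *)
let compare_strings (a : string) (b : string) =
  let la = String.length a and lb = String.length b in
  let rec go i =
    if i >= la || i >= lb then compare la lb   (* int compare: immediate *)
    else
      let c = Char.compare a.[i] b.[i] in
      if c <> 0 then c else go (i + 1)
  in
  go 0
```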
> So it seems that adding a generic copy-out-of-the-ancient heap function
> (which marks in a private area) would be worthwhile. Should not be too
> difficult.
As I said earlier, you can just copy values from the ancient heap as
you would any other value, eg. using { ... with ... } syntax or
Array.copy / String.copy etc.
Let's say this again. Values on the ancient heap look just like
values anywhere else in the program. You can pass them to functions,
print them out, add them up, do whatever else you would normally do,
with very few restrictions. The differences are:
- the polymorphic primitives don't work (so you can't compare or hash them)
- they don't get garbage collected
- you should be very careful about mutating them
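Such a copy can be sketched with ordinary OCaml code; the record type here
is hypothetical:

```ocaml
(* Sketch: copying a (possibly ancient) value back onto the normal OCaml
   heap using only ordinary operations, as Rich suggests.  The [person]
   type is made up for illustration. *)
type person = { name : string; age : int }

let copy_person p =
  (* { ... with ... } allocates a fresh record; copying the string too
     makes the result fully resident on the normal heap *)
  { p with name = String.sub p.name 0 (String.length p.name) }

let copy_matrix m = Array.map Array.copy m
```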
yes, and that's what we use already; but it's not yet in the default
version of ocaml, which means that my debian upstream ocaml version is
not able to do it.
actually, i just saw the update from 18th march about the tentative
implementation in CVS. i really hope that's going to make it.
--
Vincent
Sorry, but it's not us who are provoking. There is this F# fanboy who tries
to play his games, and doesn't say what his motivations are. You know, when
there is money involved, there are interests.
I'll try to write something. I have to be careful, however, because we
have competitors, and they are also after such information. Maybe I pick
one of our subsystems. That should be good enough for a case study while
not revealing too many details.
I thought "this context" for this thread was a blog article
that discussed OCaml's weaknesses in general terms.
Just looking at the default weighting of the shootout
benchmarks, the Erlang/OCaml ratio is 3.17; excluding
the only concurrency-related benchmark, where Erlang
is 10.6x faster than OCaml, it's 4.0.
(Not that I consider the shootout representative of the
products we build...)
> So even if we ignore the cost of message passing we know
> Erlang cannot be competitive for <6 cores.
[...]
> This is why Erlang has not (and will not) penetrate the
> market of scientists programming shared memory
> supercomputers.
The problem with your argumentation is that you make
sweeping general statements and later, when challenged
justify them by claiming them to be true in a very
specific context - presumably your favourite problem.
I can only assume that you are perfectly happy with
OCaml staying focused on scientific computation and
applications that have the same characteristics.
> For the same reason, Erlang is not relevant for exploiting
> multicore: its remit is massive concurrency which is
> a completely different problem.
Perhaps we live on different planets...?
In my world, products are never this black and white.
There are constant tradeoffs, where a high degree of
concurrency is almost always a factor, but also fairly
big chunks of sequential processing where raw performance
is important. The problem is that we can't find one
language that does it all. Erlang is admittedly slow on
things like text parsing, math, etc., but jumping out of
the shared memory space and executing the code in something
like C or OCaml, we pay so much for heavyweight
communication that it's usually not worth the effort - never
mind the fact that debugging becomes much, much harder.
So on total system performance, we usually do very well,
even though it is pretty slow on some specific parts of
the problem. And we get superior robustness and
maintenance cost, which for our customers is far more
important than raw performance.
Right now, the main language that I see as an interesting
contender for our type of products is Haskell, because it
combines very good performance with very good support for
lightweight concurrency /and/ offers very high productivity.
OCaml, sadly, cannot even be considered for anything but
specialized tasks, since it has no credible support for
concurrency. I don't really see why it'd have to stay
that way, except that interest in message-passing
concurrency seems to have been very low so far on this
mailing list.
(Actually, I'd rank Felix higher, if it could ever rise
from obscurity, since it was designed to run safely as
a dynamically linked component. It could work as a very
nice complement to Erlang.)
And as for exploiting multicore, we simply cannot get
our hands on enough cores quickly enough (mainly because
we need them in compact NEBS-compliant embedded systems).
But we ship dual-core systems today, and got a 1.7x
speedup without even recompiling the code. Very soon,
we'll move to quad-cores, and expect corresponding
speedups again. 8-core boards are on the horizon.
Just recently, we noted that an Erlang-based product may
well surpass what's been the undisputed market leader on
performance using an 8-core machine, since that product
(hand-written C++) cannot use more than 1 core due to
timing issues. And the Erlang product in question wasn't
even designed for raw performance, but for maximum
convenience.
Perhaps this is all irrelevant to your particular
consultancy company? I know a number of companies that
consider it extremely relevant /today/ - in market
segments worth many billions of dollars.
BR,
Ulf W
So we agree that Erlang is not in the same league as OCaml for CPU-intensive
tasks on <6 cores?
> > So even if we ignore the cost of message passing we know
> > Erlang cannot be competitive for <6 cores.
>
> [...]
>
> > This is why Erlang has not (and will not) penetrate the
> > market of scientists programming shared memory
> > supercomputers.
>
> The problem with your argumentation is that you make
> sweeping general statements and later, when challenged
> justify them by claiming them to be true in a very
> specific context - presumably your favourite problem.
I specifically said "For CPU intensive tasks...". I was not making a "sweeping
general statement".
I do not believe that not-massively-concurrent applications are a niche
either. They constitute the vast majority of programs.
> > For the same reason, Erlang is not relevant for exploiting
> > multicore: its remit is massive concurrency which is
> > a completely different problem.
>
> Perhaps we live on different planets...?
> In my world, products are never this black and white.
>
> There are constant tradeoffs, where a high degree of
> concurrency is almost always a factor,
In the specific context of your products, yes. Few general programs require a
high degree of concurrency though.
> but also fairly
> big chunks of sequential processing where raw performance
> is important. The problem is that we can't find one
> language that does it all. Erlang is admittedly slow on
> things like text parsing, math, etc., but jumping out of
> the shared memory space and executing the code in something
> like C or OCaml, we pay so much for heavyweight
> communication that it's usually not worth the effort - never
> mind the fact that debugging becomes much, much harder.
Perhaps. Comparing debugging is apples and oranges though. F# inherited great
debugging facilities from .NET but I never use them because static type
checking catches my errors instead. I believe a parallel debugger for OCaml
would be equally useless (if OCaml supported parallelism).
> So on total system performance, we usually do very well,
> even though it is pretty slow on some specific parts of
> the problem. And we get superior robustness and
> maintenance cost, which for our customers is far more
> important than raw performance.
I don't doubt that but I do not believe that Erlang's success with massively
concurrent applications has anything to do with adding better multicore
support to OCaml.
> Right now, the main language that I see as an interesting
> contender for our type of products is Haskell, because it
> combines very good performance with very good support for
> lightweight concurrency /and/ offers very high productivity.
I'm surprised. When I studied Haskell I felt that it was an academic language
with the same implementation problems as OCaml. There are no decent
development environments (Haskell did not even have type throwback).
Performance and memory consumption are wildly unpredictable, even between
compiler versions. Virtually nobody uses it outside academia (unlike OCaml,
which has been seeing substantial penetration in industry for many years).
Moreover, the paper describing Haskell's parallel GC gives performance figures
showing some applications degrading in performance when moving from 4 to 8
cores. So I think the Haskell community's claim that it is "good for
parallelism" is a triumph of hope over reality.
Haskell is also commerce unfriendly. Nobody sells Haskell DLLs for other
Haskell programmers (same with OCaml). There is a book market but only for
academics. Maybe "Real World Haskell" will change that but I won't hold my
breath, not least because I cringe at the idea of a product with "real world"
in the title.
> OCaml, sadly, cannot even be considered for anything but
> specialized tasks, since it has no credible support for
> concurrency. I don't really see why it'd have to stay
> that way, except that interest in message-passing
> concurrency seems to have been very low so far on this
> mailing list.
If that were true you would expect to see OCaml's use being restricted to
certain domains when, in fact, it appears to be much more broadly used than
Erlang.
> (Actually, I'd rank Felix higher, if it could ever rise
> from obscurity, since it was designed to run safely as
> a dynamically linked component. It could work as a very
> nice complement to Erlang.)
>
> And as for exploiting multicore, we simply cannot get
> our hands on enough cores quickly enough (mainly because
> we need them in compact NEBS-compliant embedded systems).
> But we ship dual-core systems today, and got a 1.7x
> speedup without even recompiling the code. Very soon,
> we'll move to quad-cores, and expect corresponding
> speedups again. 8-core boards are on the horizon.
Dell's eight core desktops are now only £1,080! :-)
> Just recently, we noted that an Erlang-based product may
> well surpass what's been the undisputed market leader on
> performance using an 8-core machine, since that product
> (hand-written C++) cannot use more than 1 core due to
> timing issues. And the Erlang product in question wasn't
> even designed for raw performance, but for maximum
> convenience.
>
> Perhaps this is all irrelevant to your particular
> consultancy company?
Without knowing what problem was being solved I cannot say whether or not it
is relevant to our work. I suspect it is another massively concurrent
application, in which case it is irrelevant for us, yes.
--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/products/?e
Now, if it were not only possible but really easy for me to do fine-grain
parallelization, like with OpenMP, Task Parallel Library, Cilk, etc., then
there'd be little reason for me not to take advantage of current multicores.
But again, it's not like this would enable me to attack all these problems
that I can't now. I don't doubt such problems exist, but they don't for me
(or really anyone I know). So, to me, OCaml's lack of shared-memory
concurrency is a shame but not a show-stopper.
-Mike Lin
PS. Apologies in advance, but I can never resist the urge to good-naturedly
hoist one by his own petard,
"I'm hard pressed to find a practically-important application that
will be better off with a concurrent GC..."
-Jon Harrop, 11 months ago
http://groups.google.com/group/comp.lang.functional/browse_thread/thread/a288a989c5cd6acb/50b6ca2607173c91
On Sat, May 10, 2008 at 10:51 AM, Ulf Wiger (TN/EAB) <ulf....@ericsson.com>
wrote:
Sure. Review the concerns cited regarding parallel programming in OCaml:
1. When do we fork? Earlier to amortize the overhead or later to enable
sharing of data structures?
2. How do we avoid excessive copying? What if each parallel thread only
requires part of a data structure, should we dissect the data structure to
alleviate message passing?
3. How do we pass values like heap allocated custom blocks?
4. Do we use the Ancient module and manage our memory manually so that we can
mutate shared data?
These all make parallel programming much harder in OCaml than it needs to be.
All of these concerns simply disappear when you have a concurrent GC. F#
already has one. Hopefully OCaml will soon.
Addressing each point in turn:
1. Use the high-performance thread pool provided by the Task Parallel Library.
2. There is no copying.
3. You can pass any value to another thread because everything is uniformly
represented.
4. Memory management is fully automated and shared mutable data is easy.
Consider the embarrassingly parallel task of matrix-matrix multiplication. Note
that this is absolutely best-case for OCaml because the overheads are as
small as possible thanks to the complete absence of fine-grained parallelism.
Most real applications require much more fine-grained parallelism and OCaml's
performance is much worse as a consequence.
In F#, this is really easy to write:
let c = Array2.zero_create n n
Parallel.For(0, n, fun i ->
    for j = 0 to n - 1 do
        let mutable r = 0.0
        for k = 0 to n - 1 do
            r <- r + a.[i,k] * b.[k,j]
        c.[i,j] <- r)
There are several things to notice about this:
- The input matrices are immediately accessible from parallel threads without
us having to do anything. There is no copying, so memory consumption is lower
and there is no overhead to worry about.
- The output matrix is filled directly using mutation. Again, there is no
copying, no subsequent single-threaded collation of results and no overhead.
- No manual memory management is required.
There is no easy solution in OCaml but you might:
- Fork a process for each CPU at the start of the multiply in order to avoid
copying the input matrices, use message passing to return the result and
collate it serially.
- Prefork a process for each CPU and use message passing to queue work items,
pass the input matrices using message passing, compute results, pass results
back using message passing and collate them serially.
So we might have an O(1) hit from forking, an O(n^2) hit from message passing
and collation and an O(n^3) actual algorithm. Like Ulf said, message passing
scales well. However, its raw performance is awful compared to leveraging
shared memory.
Here is an OCaml implementation of the above F#:
let invoke (f : 'a -> 'b) x : unit -> 'b =
  let input, output = Unix.pipe() in
  match Unix.fork() with
  | -1 -> (let v = f x in fun () -> v)
  | 0 ->
      Unix.close input;
      let output = Unix.out_channel_of_descr output in
      Marshal.to_channel output (try `Res(f x) with e -> `Exn e) [];
      close_out output;
      exit 0
  | pid ->
      Unix.close output;
      let input = Unix.in_channel_of_descr input in
      fun () ->
        let v = Marshal.from_channel input in
        ignore (Unix.waitpid [] pid);
        close_in input;
        match v with
        | `Res x -> x
        | `Exn e -> raise e
let parmul_aux i0 i1 n a b =
  let c = Array.make_matrix (i1 - i0) n 0. in
  for i = i0 to i1 - 1 do
    let ai = a.(i) in
    for j = 0 to n - 1 do
      let r = ref 0.0 in
      for k = 0 to n - 1 do
        r := !r +. Array.unsafe_get ai k
                *. Array.unsafe_get (Array.unsafe_get b k) j
      done;
      c.(i - i0).(j) <- !r
    done
  done;
  c

let parmul n a b =
  let c1 = invoke (fun () -> parmul_aux 0 (n/2) n a b) () in
  let c2 = parmul_aux (n/2) n n a b in
  Array.concat [c1 (); c2]
Note that we must manually insert unsafe array accesses and hoist loop
invariants in order to obtain comparable performance. This code is clearly
much more complicated than the F# and that complexity is completely
unnecessary.
> > > I think that the cost of copying data is totally overrated. We are
> > > doing this often, and even over the network, and hey, we are breaking
> > > every speed limit.
> >
> > You cannot afford to pay that price for parallel implementations of most
> > numerical algorithms.
>
> Er... Not being a specialist, I may be wrong, but I seem to remember
> that you can afford that, as long as you're also doing something else
> during that copy.
Here are some actual performance results for this task on my machine:
  n      OCaml               F#
         Serial    Parallel  Serial    Parallel
  16     0.00005   0.00022   0.00002   0.000045
  32     0.00040   0.00067   0.00015   0.00015
  64     0.0033    0.0021    0.00090   0.00068
  512    4.0       2.4       2.5       1.4
As you can see, F# is much faster in every case. Larger "n" approaches the
limit where F# is only 1.6x faster than OCaml. For smaller "n" the gap
quickly widens with F# being 5x faster for n=16.
> > On the contrary, that is not a theoretical statement at all: it
> > already happened. F# already makes it much easier to write high
> > performance parallel algorithms and its concurrent GC is the crux of that
> > capability.
>
> Examples ? Pretty please ?
The above examples are a good start. The latest F#.NET Journal article covers
parallelism using Microsoft's TPL. A future OCaml Journal article will cover
the use of fork-based parallelism. I think you can see what I mean from the
results I've presented in this post though.
To put this another way, multicore computing provides hardware accelerated
shared memory and a concurrent GC makes it easy to exploit that. Message
passing is fine for concurrent applications that are not CPU bound or for
distributed computing but it is not competitive on today's multicore machines
and will not become competitive in the next decade.
--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/products/?e
i don't understand any of this.
>2. How do we avoid excessive copying? What if each parallel thread only
>requires part of a data structure, should we dissect the data structure to
>alleviate message passing?
this seems to suggest that message passing implies serialising the data structure.
perhaps i missed something, though.
For compute-intensive tasks that don't require coordination
with other tasks, sure.
> I specifically said "For CPU intensive tasks...".
> I was not making a "sweeping general statement".
Perhaps we simply read different things into the word
"CPU-intensive". If a program is able to make full use
of the CPU and is limited by available CPU capacity,
I will call it CPU-intensive.
In another post you compared an F# program with an
OCaml program, where the OCaml program had to do
marshaling and unmarshaling in order to achieve
cooperation between tasks. This is of course also
CPU-intensive, and if a significant amount of this
is required, OCaml will lose against Erlang big time,
just as it did against F#.
> In the specific context of your products, yes. Few
> general programs require a high degree of
> concurrency though.
One of the things you quickly notice when programming
in a language that really encourages you to explore
the concurrency patterns in your problem, is that
there is much more concurrency than you'd think, even
in conventional programs.
It's just that we've grown so accustomed to mapping
problems onto a sequential model, simply because
dealing with concurrency is a nightmare in most
conventional programming languages.
Here's another way of looking at it: While programming
so far has been very much about automating singleton tasks
that would otherwise take a long time to perform manually
(e.g. scientific computing), there is a huge number of
problems that are normally solved by groups of people
cooperating. These can be automated too, but without
mirroring the concurrency patterns used by people when
solving these problems manually, it gets very difficult.
We'll see a great number of web-based applications where
you not only need to handle hundreds or thousands of
simultaneous visitors, but each session will require
(real-time) interaction with other systems, as well as
perhaps other sessions within the system.
I think the term "massive concurrency" tends to lead to
misunderstandings. The key characteristic of a concurrency-
oriented language is that it's so easy and cheap to create
processes and coordinate between tasks, that you don't
think twice about creating just as many processes as you
need in order to solve the problem neatly. If this means
creating 10,000 processes, that shouldn't be an issue, just
like creating 10,000 objects shouldn't be for an OO language.
But to fully utilize 4 or 8 cores, you don't need thousands
of processes.
>> but jumping out of
>> the shared memory space and executing the code in something
>> like C or OCaml, we pay so much for heavyweight
>> communication that it's usually not worth the effort - never
>> mind the fact that debugging becomes much, much harder.
>
> Perhaps. Comparing debugging is apples and oranges though.
I simply meant that combining different languages and
execution environments makes debugging much more difficult
than using just one.
>> So on total system performance, we usually do very well,
>> even though it is pretty slow on some specific parts of
>> the problem. And we get superior robustness and
>> maintenance cost, which for our customers is far more
>> important than raw performance.
>
> I don't doubt that but I do not believe that Erlang's success
> with massively concurrent applications has anything to do
> with adding better multicore support to OCaml.
I agree that this is something very different from trying to
speed up scientific calculations using shared-memory
parallelism. But I would encourage a look at how lightweight
message passing concurrency can be used to build very complex
and robust systems. Jocaml is a very interesting start.
>> Right now, the main language that I see as an interesting
>> contender for our type of products is Haskell, because it
>> combines very good performance with very good support for
>> lightweight concurrency /and/ offers very high productivity.
>
> I'm surprised. [...]
> Moreover, the paper describing Haskell's parallel GC gives
> performance figures showing some applications degrading
> in performance when moving from 4 to 8 cores. So I
> think the Haskell community's claim that it is "good for
> parallelism" is a triumph of hope over reality.
I'm not into shared-memory parallelism. For me, it's much
more important that the language be good at coordination
between tasks - since failure in that area means death
by complexity in my domain.
There was a project comparing Erlang and C++ for telecoms
software, which also looked at Glasgow Distributed Haskell
for the same tasks. While it came to some fairly concrete
conclusions re. Erlang vs C++, the work on Haskell was not
exhaustive enough to be directly comparable. Still, I would
view the preliminary results as very interesting.
F# could also potentially be interesting (if the underlying
concurrency support improves substantially in terms
of thread creation, number of threads, scheduling cost, etc.,
and I have reason to believe it will). The main obstacle
is that we are extremely wary of vendor lock-in, both in
terms of programming language and operating system.
>> And as for exploiting multicore, we simply cannot get
>> our hands on enough cores quickly enough (mainly because
>> we need them in compact NEBS-compliant embedded systems).
>> But we ship dual-core systems today, and got a 1.7x
>> speedup without even recompiling the code. Very soon,
>> we'll move to quad-cores, and expect corresponding
>> speedups again. 8-core boards are on the horizon.
>
> Dell's eight core desktops are now only £1,080! :-)
Well, a single-core NEBS-compliant rack-mountable Dell
server with 128 MB RAM is over $5,000! (:
(Not that we could use those either. We need Micro TCA
form factor.)
BR,
Ulf W
Copying is the obvious solution in OCaml because there is no easy way to share
OCaml values between processes.
There are alternatives (like referencing shared memory as custom blocks and
managing deallocation manually) but they are substantially more complicated
and it is not clear whether or not they will be faster in OCaml because
indirection is required to disambiguate the heaps for the GC and prevent it
from traversing the shared memory.
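For concreteness, one of the alternatives alluded to above can be sketched with a memory-mapped Bigarray shared across fork. This is my illustration (using the modern Unix.map_file API), not Jon's code: Bigarray data lives outside the OCaml heap, so the GC never traverses it, which sidesteps the heap-disambiguation problem mentioned above, at the cost of being restricted to flat numeric buffers.

```ocaml
(* Sketch: share a flat float buffer between forked processes via a
   memory-mapped Bigarray instead of marshaling OCaml values.
   The file name and sizes are illustrative; real code needs error
   handling and cleanup. *)
let shared_floats n =
  let fd =
    Unix.openfile "/tmp/shared.dat"
      [Unix.O_RDWR; Unix.O_CREAT; Unix.O_TRUNC] 0o644
  in
  (* shared = true: writes in either process are visible to the other. *)
  Bigarray.array1_of_genarray
    (Unix.map_file fd Bigarray.float64 Bigarray.c_layout true [| n |])

let () =
  let a = shared_floats 4 in
  (match Unix.fork () with
   | 0 -> a.{0} <- 42.0; exit 0        (* child writes into shared pages *)
   | _ -> ignore (Unix.wait ()));
  assert (a.{0} = 42.0)                (* parent sees the child's write *)
```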
--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/products/?e
_______________________________________________
>There are alternatives (like referencing shared memory as custom blocks and
>managing deallocation manually) but they are substantially more complicated
ok, i understand now: it would be possible in principle given the language,
but the current implementation makes it difficult.
Unicode by itself, when wider-than-byte encodings are used, adds "zero"
runtime overhead; the only overhead is storage (2 or 4 bytes per character).
Given that storage is cheap, I'd much rather have Unicode support than lack of
it.
How exactly does Unicode support make Java and C# slow?! Jon, I'd have thought
that you knew better than that :( Oh well, we all have bad days.
Cheers, Kuba
Yet, if you look at things in the light of "optimization is depessimization",
you'd much rather have easier to read code, than code which is ugly because
you preoptimized it by hand. This is why, for me, Ocaml has a long way to go
to make it useful for run-of-the-mill production code. My pet peeve is the
performance penalty paid for writing in functional style where it actually
makes sense -- say passing an arithmetic operator to a map-style function.
I wouldn't generalize like that. Where I work we use Qt (a C++ framework) and
can relatively trivially deploy on three major platforms (Linux, Windows and
OS X). It's been a long time since I dealt directly with any Windows
deficiencies per se. Trolls had to deal with them for sure, but they did the
job so that I wouldn't have to. Most cool applications are multi-platform...
I can't comment about the original lack of parallelism issue, but the
platform-bashing comment is just gibberish to me. I'm all for cool, inspiring
platforms, and I use OS X and Linux exclusively at home/school, but Windows
has made some headway in recent years and for me XP is quite reasonable as
long as you don't deal with driver writing. The latter is still better than
in WIN95 DDK days, but the toolset (apart from the F#-based tools) is
obscenely out-of-date, just as it was in 1995. Then there are some regressed
pieces of Linux which are quite a reality check given the whole "we support
old hardware" propaganda (steer clear of SCSI tape drives), so for me
personally I have equally many bash-Windows arguments as I have bash-Linux
ones.
----------------------------------------------------------------------
let n = 1024
let a = Array.make_matrix n n 6.7
let b = Array.make_matrix n n 8.9

(* Result array, stored in shared memory. *)
let c =
  let c = Array.make_matrix n n 0. in
  let fd = Unix.openfile "/tmp/zero" [Unix.O_RDWR; Unix.O_TRUNC; Unix.O_CREAT] 0o644 in
  let md = Ancient.attach fd 0x440000000000n in
  Ancient.follow (Ancient.share md 0 c)

let parmul_aux i0 i1 n a b =
  for i = i0 to i1 - 1 do
    let ai = a.(i) in
    for j = 0 to n - 1 do
      let r = ref 0.0 in
      for k = 0 to n - 1 do
        r := !r +. Array.unsafe_get ai k *.
                   Array.unsafe_get (Array.unsafe_get b k) j
      done;
      c.(i).(j) <- !r
    done;
  done

let parmul n a b =
  (match Unix.fork () with 0 -> parmul_aux 0 (n/2) n a b; exit 0 | _ -> ());
  parmul_aux (n/2) n n a b;
  ignore (Unix.wait ())
;;
parmul n a b
----------------------------------------------------------------------
This is just barely faster than Jon's OCaml version using message
passing (12% faster on my test machine[0]). Which just seems to show
that the overhead of message passing _isn't_ the problem here[1].
Perhaps it's the bounds checking in the assignment back to the matrix?
Anyhow, in real life situations we'd all be using a super-optimized
hand-coded-in-assembly matrix multiplication library (LAPACK?), so
this is all very academic.
Rich.
[0] Quad core Intel hardware:
model name : Intel(R) Core(TM)2 Quad CPU Q9450 @ 2.66GHz
[1] Creation of the result matrix and copying it to shared memory is
almost instantaneous in my tests.
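The message-passing baseline being compared against amounts to forking a worker and marshaling its result back through a pipe. A minimal sketch of that pattern (my names, not the benchmark's actual code; note that Marshal is untyped, so the reader must know the expected type):

```ocaml
(* Fork a child to run [compute], marshal the result back over a pipe,
   and reap the child. The serialization cost grows with the result. *)
let with_worker compute =
  let rd, wr = Unix.pipe () in
  match Unix.fork () with
  | 0 ->
      (* Child: run the computation, send the result, exit. *)
      Unix.close rd;
      let oc = Unix.out_channel_of_descr wr in
      Marshal.to_channel oc (compute ()) [];
      close_out oc;
      exit 0
  | _ ->
      (* Parent: read the marshaled result, then wait for the child. *)
      Unix.close wr;
      let ic = Unix.in_channel_of_descr rd in
      let result = Marshal.from_channel ic in
      close_in ic;
      ignore (Unix.wait ());
      result

let () = assert (with_worker (fun () -> 6 * 7) = 42)
```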
--
Richard Jones
Red Hat
_______________________________________________
I think people forget Jon's background/interests: numerical methods.
Think of working with large amounts of numbers, where the cost of some
operations is on par with copying a few words of memory, and good
non-algorithmic (!) optimizations can increase performance 10-fold (see
Mersenne project, for example). Moreover, a lot of numerical methods are
applied in real-time applications, and there a 50% loss in performance simply
means that the game becomes unplayable, or that the phone connection sounds
like excrement dropping into the hole. For web applications (generally
speaking), a 50% loss of performance means you pay 2x more money for servers
and power. Up to a certain point, this is small compared to the costs of
maintaining the software. Where numerical methods are widely applied, a 50%
performance loss may mean you're out of the market.
I'm no F# fanboy; I use Ocaml exclusively for most of my numerical work (some
FEM, all numerical methods courses I take), but as a language for packaged
application development (bread-and-butter stuff that sells in boxes or via
downloads) it's simply not there yet for me. F# would be more like it, if I
weren't wary of .net platform lock-in. Give Mono a few more years and it
won't be an issue anymore...
For server-side use Ocaml is probably fine, but so is a number of languages
that for me pack way more punch (Lisp, for one). In any event, these days I
see little reason to develop web-facing stuff on non-google platforms (at
least if you're a startup), so I'm in Python lock-in in that area anyway now.
Cheers, Kuba
You cannot degrade memory consumption without also degrading performance.
Moreover, there are hidden costs such as the added complexity in a lexer
which potentially has 256x larger dispatch tables or an extra indirection for
every byte read.
> Given that storage is cheap, I'd much rather have Unicode support than lack
> of it.
Sure. I don't mind unicode being available. I just don't want to have to use
it myself because it is of no benefit to me (or many other people) but is a
significant cost.
--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/products/?e
_______________________________________________
On this machine, the Ancient-based implementation is 30-40% faster than the
message passing OCaml.
The important question is: how many operations must we be doing for
parallelism to be worthwhile? The answer is of fundamental importance.
Plotting the graphs, we find that parallelism is faster in F# for n>32 and in
OCaml for n>150 for this O(n^3) task. Therefore, OCaml is two orders of
magnitude slower than F# at spawning parallel computations. Moreover, to get
this performance from OCaml you had to sacrifice safety.
> Which just seems to show that the overhead of message passing _isn't_ the
> problem here[1].
Well, OCaml currently offers several different approaches to parallelism that
are all extremely inefficient. The parallel GC implementation being funded by
Jane St. will completely remove this performance problem with OCaml in this
context but it will not scale as well as Microsoft's concurrent GC and will
suffer from the same stalling.
This is why I believe that project is of incredible importance.
> Perhaps it's the bounds checking in the assignment back to the matrix?
Actually it was the polymorphic array maps and copies rather than the bounds
checking within them. So I hand rolled optimized float array array copying
code into both of the new parallel OCaml implementations to get the best
possible performance out of OCaml for the above comparison.
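A sketch of the sort of hand-rolled, monomorphic copy being described (my illustration, not Jon's actual benchmark code): by fixing the type to float array array, each row stays an unboxed float array and Array.blit replaces the polymorphic per-element path.

```ocaml
(* Monomorphic deep copy of a float matrix. The type annotation keeps
   rows as unboxed float arrays; Array.blit copies each row wholesale. *)
let copy_matrix (m : float array array) : float array array =
  let n = Array.length m in
  let dst = Array.make n [||] in
  for i = 0 to n - 1 do
    let src = m.(i) in
    let len = Array.length src in
    let row = Array.make len 0.0 in
    Array.blit src 0 row 0 len;
    dst.(i) <- row
  done;
  dst

let () =
  let m = Array.make_matrix 2 2 1.5 in
  let m' = copy_matrix m in
  m'.(0).(0) <- 9.0;              (* mutating the copy... *)
  assert (m.(0).(0) = 1.5)        (* ...leaves the original intact *)
```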
> Anyhow, in real life situations we'd all be using a super-optimized
> hand-coded-in-assembly matrix multiplication library (LAPACK?), so
> this is all very academic.
If you ignore the impedance mismatch between the data structures, the
performance overhead of invoking external functions and all applications that
require different functionality that has not already been written, optimized
and packaged by someone else, yes.
The fact is, this is hugely important to the vast majority of the software
industry and companies are investing massive amounts of time and effort into
parallelizing their software. Developers are flocking to .NET because it
makes parallelism easy and OCaml has the potential to provide these benefits
if the parallel GC project flies.
--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/products/?e
_______________________________________________
This raises the question of whether or not a Mono backend for OCaml might
pay off. My impression is that Mono is going nowhere but I have been known to
be overly skeptical in the past. ;-)
Speaking of which, it would also be interesting to compare the performance of
OCaml with F#/Mono on that matrix-matrix multiply benchmark...
--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/products/?e
_______________________________________________
What do you mean by this? What language would not incur this kind of
performance hit? Is F# able to optimize this out or were you referring to
something else?
>
> Cheers, Kuba
>
> _______________________________________________
> Caml-list mailing list. Subscription management:
> http://yquem.inria.fr/cgi-bin/mailman/listinfo/caml-list
> Archives: http://caml.inria.fr
> Beginner's list: http://groups.yahoo.com/group/ocaml_beginners
> Bug reports: http://caml.inria.fr/bin/caml-bugs
>
--
-----------------------------------------------------------------------
(\__/)
(='.'=)This is Bunny. Copy and paste Bunny into your
(")_(")signature to help him gain world domination.
------------------------------------------------------------------------
Let's not start a platform war on this thread. I have *plenty* of gripes
with Linux as well. I was merely digressing, and was sniping at the high
cost of platform specific development. If you want to discuss Windows
programming gripes, Joel Spolsky's forum has plenty, even with regard to the
idea that things have been better lately. ocaml+qt does not count. If
you're using ocaml+qt, then you've basically sidestepped the issues with
writing code for Windows.
For Ocaml: "when inlining higher-order functions with known function
arguments, those known function arguments are not themselves inlined."
http://camltest.inria.fr/pub/old_caml_site/caml-list/1048.html
(This is an old post, if things have changed I would love to be corrected.)
sml can inline such functions, making passing + to a map style function
potentially as efficient as writing a procedural loop.
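A minimal illustration of the cost being discussed (the names are mine, not from the cited post): every element pays an indirect call through the closure, and ocamlopt, at least as of that post, does not specialize the caller for a known argument, so the first version cannot collapse into the second.

```ocaml
(* Higher-order version: indirect call through f on every element. *)
let sum_with f a =
  let s = ref 0 in
  for i = 0 to Array.length a - 1 do
    s := !s + f a.(i)
  done;
  !s

(* The first-order loop the compiler could in principle have produced
   had it inlined the known argument (fun x -> 2 * x). *)
let sum_double a =
  let s = ref 0 in
  for i = 0 to Array.length a - 1 do
    s := !s + 2 * a.(i)
  done;
  !s

let () =
  let a = [| 1; 2; 3 |] in
  assert (sum_with (fun x -> 2 * x) a = 12);
  assert (sum_double a = 12)
```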
I think you are referring to the Ocaml summer project which is to be
done by Emmanuel Chailloux's student.
Till
2008/5/12 Arthur Chan <baguas...@gmail.com>:
--
http://till-varoquaux.blogspot.com/
I've just written my own distributed version. You find my comments and
timings here:
http://blog.camlcity.org/blog/parallelmm.html
The code is here:
https://godirepo.camlcity.org/svn/lib-ocamlnet2/trunk/code/examples/rpc/matrixmult/
In this (very unoptimized) multiplier message passing accounts for ~25%
of the runtime. Even for 2 cores there is already a speedup. 10 cores
(over a network) are about 4 times faster than a single core without
message passing.
Gerd
> Perhaps it's the bounds checking in the assignment back to the matrix?
>
> Anyhow, in real life situations we'd all be using a super-optimized
> hand-coded-in-assembly matrix multiplication library (LAPACK?), so
> this is all very academic.
>
> Rich.
>
> [0] Quad core Intel hardware:
> model name : Intel(R) Core(TM)2 Quad CPU Q9450 @ 2.66GHz
>
> [1] Creation of the result matrix and copying it to shared memory is
> almost instantaneous in my tests.
>
--
------------------------------------------------------------
Gerd Stolpmann * Viktoriastr. 45 * 64293 Darmstadt * Germany
ge...@gerd-stolpmann.de http://www.gerd-stolpmann.de
Phone: +49-6151-153855 Fax: +49-6151-997714
------------------------------------------------------------
For what values of "n"?
--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/products/?e
_______________________________________________
It's in the article. n=1000, 2000, 3000. The "4 times faster" statement
is for n=3000.
Gerd
--
------------------------------------------------------------
Gerd Stolpmann * Viktoriastr. 45 * 64293 Darmstadt * Germany
ge...@gerd-stolpmann.de http://www.gerd-stolpmann.de
Phone: +49-6151-153855 Fax: +49-6151-997714
------------------------------------------------------------
Can you find a more accurate estimate of the threshold value of "n" above
which there is a speedup on 2 cores?
I think that would be very useful.
--
Dr Jon D Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/products/?e
_______________________________________________
:-)
> Well, our algorithms are quite easy to parallelize. I don't see a
> difference in whether they are CPU-bound or disk-bound - we also have
> lots of CPU-bound stuff, and the parallelization strategies are the
> same.
>
> The important thing is whether the algorithm can be formulated in a way
> so that state mutations are rare, or can at least be done in a
> "cache-friendly" way. Such algorithms exist for a lot of problems. I
> don't know which problems you want to solve, but it sounds like as if it
> were special problems. Like for most industries, most of our problems
> are simply "do the same for N objects" where N is very large, and
> sometimes "sort data", also for large N.
Right. Those are embarrassingly parallel problems, which is why you are not
suffering from a lack of fine-grained parallelism as we are. Some problems
require fine-grained parallelism but, on a more general note, we are trying
to push parallelism as deep into our libraries as possible so that users can
benefit from their multicore machines whatever they are doing.
The ability to spawn parallel computations efficiently is of the utmost
importance here. Without it, multicore holds no advantages (e.g. for that
matrix multiply benchmark with n<100).
> > Parallelism is easy in F#.
>
> Wonders must have happened I'm not aware of. How does F# prevent
> deadlocks?
Parallel programming is typically extremely well suited to data parallel
constructs (parallel for and so forth) so deadlocks are not a problem.
However, F# is also unusually good for improving robustness even in the
presence of low-level threading constructs because, just like OCaml, mutation
is contained and typically reserved for performance-critical sections of
code. So the number of locks in a large parallel F# application is tiny
compared to Java/C#, and it is perfectly feasible to manage their potential
interactions (i.e. to avoid deadlocks) by hand.
> > > This is a quite theoretical statement. We will rather see that most
> > > application programmers will not learn parallelism at all, and that
> > > consumers will start question the sense of multicores, and the chip
> > > industry will search for alternatives.
> >
> > On the contrary, that is not a theoretical statement at all: it already
> > happened. F# already makes it much easier to write high performance
> > parallel algorithms and its concurrent GC is the crux of that capability.
>
> Don't misunderstand me, I'm not anti-F#.
This will all apply to OCaml when it gets a parallel GC this summer. :-)
> I only have no interests right
> now in taking advantage of multicores by concurrent GC's. I rather want
> to have an ultra-fast single-core execution. I can do the
> parallelization myself.
Right. Our customers are loving parallelism right now and want to make the
most of their multicore machines today. This is pushing us to make everything
as multicore friendly as possible.
It's not so much about the language as about the implementation. IIRC some Lisps
can do this kind of an optimization, gcc could do it as well -- whenever
the value of the argument to a function is known, a potentially
call-site-specific version of the function can be generated, and in such cases
it'd be much simpler, emitted-code-wise, than the version which explicitly
emits the arguments and calls the operator.
In a typical programming language which only accepts ASCII characters outside
of string constants, your dispatch table will be short anyway (covers ASCII
subset only), and there will be an extra comparison or two, active only when
lexing strings. So no biggie.
> > Given that storage is cheap, I'd much rather have Unicode support than
> > lack of it.
>
> Sure. I don't mind unicode being available. I just don't want to have to
> use it myself because it is of no benefit to me (or many other people) but
> is a significant cost.
Let's look at a relatively widely deployed example: Qt toolkit.
Qt uses a 16 bit Unicode representation, and I really doubt that there are any
runtime-measurable costs associated with it. By "runtime measurable" I mean
that, say, application startup would take longer. A typical Qt application
will do quite a bit of string manipulation on startup (even file names
are stored in Unicode and converted to/from OS's code page), and they have
slashed startup time by half on "major" applications, between Qt 3 and Qt 4,
by doing algorithmic-style optimizations unrelated to strings (reducing number
of malloc's, for one). So, unless you can show that one of your applications
actually runs faster when you use non-Unicode strings as compared to well
implemented Unicode ones, I will not really consider Unicode to be a problem.
I do agree that many tools, like lexer generators, may not be Unicode-aware or
have poorly implemented Unicode awareness. The 256x lexer table blowup
shouldn't happen even if you were implementing APL with fully Unicode-aware
lexer. The 1st level lexer table should be split into two pieces (ASCII and
APL ranges), and everything else is either an error or goes opaquely into
string constants.
A lexer jump table only makes sense when it actually saves time compared to
a bunch of compare-and-jumps. On modern architectures some jump lookup tables
may actually be slower than compare-and-jumps, because some hardware
optimizations done by CPU (say branch prediction) may simply ignore branch
lookup tables, or only handle tables commonly generated by compilers...
Cheers, Kuba
Okay, I was going to let this slide, but it kept resurfacing and annoying me.
Is there any empirical support for the assertion that Java and C# are slow because of *unicode*? Of
all things, *unicode*? The fact that they're bytecode languages isn't a bigger hit? At least with
the JVM, the hypercomplicated GC should probably take some of the blame, too -- I've seen 2x speed
increases by *reducing* the space available to the GC, and 10x speed increases by boosting the space
available to ridiculous levels so that the full GC barely ever has to fire. Then the nigh-universal
optimization-ruining mutable data and virtual function (i.e. method) dispatch surely doesn't help
either. And this is to say nothing of user-space problems like the explosion of nontrivial types
associated with the object-driven style. With all that going on, you're blaming their *Unicode
support* for why they're slow? "This is why languages like Java and C# are so slow." Really? Got
evidence for that?
~~ Robert.
>Is there any empirical support for the assertion that Java and C# are slow
>because of *unicode*?
>
>~~ Robert.
The problem, as I understand it, is in writing parsers. Your standard
finite-automata-based regular expression library or lexical analyzer is
based, at its heart, on a table lookup- you have a 2D array whose size
is the number of input characters times the number of states. For ASCII
input, the number of possible input characters is small- 256 at most.
256 input characters times hundreds of states isn't that big of a table-
we're looking at sizes in 10's of K- easily handleable even in the bad
old days of 64K segments. Even going to UTF-16 ups the number of input
characters from 256 to 65,536- and now a moderately large state machine
(hundreds of states) weighs in at tens of megabytes of table space.
And, of course, if you try to handle the entire 31-bit Unicode code
point space, welcome to really large tables :-).
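The flat-table core being described can be sketched as next_state = table.(state).(char). A toy recognizer for runs of ASCII digits (my example) shows why the table is states x alphabet-size: here it is 3 x 256 entries, but with a UTF-16 alphabet the same machine would need 3 x 65,536.

```ocaml
(* Table-driven DFA over a byte alphabet.
   States: 0 = start, 1 = in-number (accepting), 2 = reject. *)
let alphabet = 256
let nstates = 3

let table =
  let t = Array.make_matrix nstates alphabet 2 in  (* default: reject *)
  for c = Char.code '0' to Char.code '9' do
    t.(0).(c) <- 1;          (* first digit enters the number *)
    t.(1).(c) <- 1           (* further digits stay in it *)
  done;
  t

(* Run the machine: one table lookup per input byte. *)
let accepts (s : string) =
  let state = ref 0 in
  String.iter (fun ch -> state := table.(!state).(Char.code ch)) s;
  !state = 1

let () =
  assert (accepts "42");
  assert (not (accepts "4x"))
```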
The solution, I think, is to change the implementation of your finite
automata to use some data structure smarter than a flat 2D array, but
that's me.
Brian
A slightly more involved analysis is probably in order, so let's ask Wikipedia for some more light.
http://en.wikipedia.org/wiki/UTF-8#Rationale_behind_UTF-8.27s_design
As Kuba pointed out, the high bit is 0 on any ASCII characters, and the significant bits of a
multi-byte sequence determine the length of the sequence. There are also a few large classes of bit
sequences which are simply not allowed. Now, I'm not an expert in writing parsers, but these
qualities certainly sound like nice optimization hooks.
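The property cited from the UTF-8 design can be made concrete (my sketch): the length of a sequence is determined by its first byte alone, and ASCII bytes always have the high bit clear, so a byte-oriented scanner never confuses ASCII with multi-byte data.

```ocaml
(* Classify a UTF-8 lead byte by its high bits.
   Returns the sequence length, or 0 for a continuation/invalid byte. *)
let utf8_seq_len b =
  if b < 0x80 then 1                  (* 0xxxxxxx: ASCII *)
  else if b land 0xE0 = 0xC0 then 2   (* 110xxxxx *)
  else if b land 0xF0 = 0xE0 then 3   (* 1110xxxx *)
  else if b land 0xF8 = 0xF0 then 4   (* 11110xxx *)
  else 0                              (* 10xxxxxx: continuation, or invalid *)

let () =
  assert (utf8_seq_len (Char.code 'a') = 1);
  assert (utf8_seq_len 0xC3 = 2);     (* e.g. lead byte of U+00E9 *)
  assert (utf8_seq_len 0xE2 = 3);
  assert (utf8_seq_len 0x80 = 0)
```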
Getting back to the original question, though -- is there any evidence that Java/C# are slow because
of unicode support, and not because of other aspects of the languages? Because that assertion seems
flat-out bogus to me.
Note that we have such a lexer already: ulex. It uses binary decision
trees, AFAIK. The resulting code has moderate size. It can take some
time, however, until ulex has converted the NFA to a DFA.
Gerd
--
------------------------------------------------------------
Gerd Stolpmann * Viktoriastr. 45 * 64293 Darmstadt * Germany
ge...@gerd-stolpmann.de http://www.gerd-stolpmann.de
Phone: +49-6151-153855 Fax: +49-6151-997714
------------------------------------------------------------
> Getting back to the original question, though -- is there any evidence that Java/C# are slow because
> of unicode support, and not because of other aspects of the languages? Because that assertion seems
> flat-out bogus to me.
I do not think the JVM is especially slow in practice. However, one potential source of
slowness could be, in some particular cases, the conversions between the internal 16-bit
char-array string representation and UTF-8 when using native code. Similarly, Java strings being immutable,
in-place modification of strings is not possible from native code, so a lot of bindings to C libraries
end up duplicating strings a lot (see e.g. PCRE).
This is why the NIO API exposing mutable and/or externally allocated buffers was introduced in the JVM,
but it remains hard to use.
However it is true that regexes on UTF8 can be quite slow. Compare (on Linux):
/udir/durak> dd if=/dev/urandom bs=10M count=1 of=/dev/shm/z
/udir/durak> time LANG=en_US.UTF-8 grep -c "^[a-z]*$" /dev/shm/z
2.31s user 0.01s system 99% cpu 2.320 total
/udir/durak> time LANG=C grep -c "^[a-z]*$" /dev/shm/z
0.04s user 0.01s system 98% cpu 0.048 total
Lesson 1: when lexing, do not read unicode chars one at a time. Pre-process your regular expression
according to your input encoding, instead.
That being said, I think strings should be represented as they are today, and that the core
Ocaml libraries do not have much business dealing with UTF8. We seldom need letter-indexed
random access to strings.
However, the time is ripe for throwing out old 8-bit charsets such as ISO-8859-x (a.k.a Latin-y)
and whatnot. This simplifies considerably lesson 1: it's either ASCII or UTF8 Unicode. I think
the Ocaml lexer should simply treat any byte with its high bit set as a lowercase letter.
--
Berke DURAK
This is just silly.
This table can usually be trivially degenerated into hardcoded
compare-and-jumps; for most lexers I've dealt with this will astronomically
reduce the size and increase the speed.
The fact that the lexer generator you're using is dumb doesn't help, but
please don't assume that this dumbassedness is somehow a universal constant
and given upon us by $DEITY.
Cheers, Kuba
By "slow code" you could have meant two things:
1. Table lookup globally replaced by compares-and-jumps. The latter benefit
from branch prediction and speculative execution. So it's not slow anymore.
2. Table "compression" used, where a few compare-and-jumps remove
huge "unused" swaths of the table. By "unused" I meant "bomb out with an
internal error".
I think you're being silly. Stop it.
Cheers, Kuba
Availability of source code enables that, but is not a guarantee that a fix
will be forthcoming or economical. Gcc codebase is all for us to see, yet it
would require either a genius or lots of time for the ordinary ones among us
to get up to speed to work with it in general. I've attempted it 2-3 times, and
I gave up after a while (just wrapping your mind around gas's borkedness can
be revolting), even though I have no problem understanding most of the
concepts involved; I maintain a proprietary, half-assed, just-good-enough
implementation of a nonconforming Lisp which produces MCU (eZ8 and 12 bit
pic) assembly on par with what I can write myself, mostly. But it's written
in Lisp too, and while I could probably port it to C, I could never develop
it in C (it'd degenerate in a way which makes gcc code look stellar). So even
if you do have knowledge in the field, but no first-hand exposure to
braindamage involved with writing (and maintaining) a compiler of any sort in
a low-level language like C, you might as well have no access to the source
code -- it won't help much beyond simple recompilation or minor patches
needed to have the code compile on a newer revision of the host platform (say
newer Linux distro).