Is it time to start rethinking concurrency in OCaml?
I have followed the argumentation of only using one native thread for the OCaml runtime. I can easily see how this can increase performance and simplify implementation. I can also see that spawning new processes makes sense, so you get a local heap for each task.
However, as we move forward it seems that we will get more than a few cores on the same computational node according to the following article:
As I see it, it is not feasible to spawn a new process with a local heap for each core, when the number of cores increases dramatically.
I am not sure that a parallel GC is a sufficient solution either due to the high contention on memory, at least unless it provide some additional core affinity features. I believe some level of compiler support is needed in the not so distant future such that enough primitives are available to build powerful multi-core aware libraries. One approach could be micro heaps with core affinity and handle mutable memory specially.
> Is it time to start rethinking concurrency in OCaml? > (...)
That's a recurrent topic in this list. I'll summarise the arguments and save us all some time:
i) Someone raises the issue that it's time for Ocaml to go multicore.
ii) A few voices go "yeah!" and state that with a concurrent GC, Ocaml can take over the world. Their argument is as follows:
1) Ocaml gets a concurrent GC; 2) ... 3) Profit!
iii) A voice of reason notes that the devil is in step 2) above. If your programming paradigm for concurrency is Threads+Mutexes, then you are exposing yourself to a world of pain in race conditions. At this point someone usually links to Xavier's standard speech on concurrency (which is a few years old, but as poignant now as it was then).
iv) The voices from step ii) retort that they're über-programmers and that they can make perfectly scalable and race condition-free programmes with Threads+Mutexes if only they had a concurrent GC.
v) The voice of reason remarks that one of the main reasons we all chose Ocaml is because the language ensures safe, resilient programmes. In a way it's the humble recognition that we are human, and that we make mistakes. Choosing the Threads+Mutexes path would be hubris and an abandonment of fundamental reasons why we chose Ocaml in the first place.
While I tend to agree with viewpoints iii) and v), I do think a healthy discussion on the subject of multicore is in order. Though I suggest we focus the discussion on concurrency models that will allow us to take advantage of those multicores in a safe, sane manner:
Well, it's fun to join the old discussion in the new times. The fact that computers go multicore at a greater scale makes it recurrent.
Erlang makes concurrency easy due to purity, and OCaml is famous for being eclectic. Why not embrace Erlang's model by imposing limitations on what can be in threads -- keeping them pure? Erlang model works and attracts people to functional programming in general. Even if some other model of concurrency prevails, it is interesting and useful to interop with Erlang easily. Here's what Erlang folks have started:
Doing this properly can solve a lot of needs out there, and bring lots of debugged, proven, high-quality concurrent server and communication code within reach.
> Erlang makes concurrency easy due to purity, and OCaml is > famous for being eclectic. Why not embrace Erlang's > model by imposing limitations on what can be in threads -- > keeping them pure?
Indeed. Ocaml's imperative features can be very useful in a local context (ie, not crossing the boundaries of a function), but do hinder concurrency when used for modifying global state. I wonder: is there already a tool that verifies Ocaml code to ensure that a function is pure? Something along the lines of Ocamlexc, but applied to purity.
The way I see it, there are (at least) three axis associated with a function: the value axis, the exception axis, and the purity axis (the latter can also be called the side-effect axis). In Ocaml, a function signature cares only about the value axis, though implicitly the other axis are also be present. For example:
val integer_division: int -> int -> int throws Division_by_zero is_pure
val incr_global_counter: unit -> unit throws_nothing is_not_pure
I can imagine a "spawn" statement in a concurrent Caml that expects that the function passed as parameter be pure. And of course a function would only be pure if it did not modify global state and only invoked pure functions itself.
Obviously the other alternative is to go the Haskell route and transform Caml in a pure functional language where monads reign. Though I think it would be interesting to come up with a solution that keeps the advantages of local imperative features, but does ensure purity for the purpose of concurrency.
(and I'm sure there is already a ton of research along these lines)
> I can imagine a "spawn" statement in a concurrent Caml that expects > that the function passed as parameter be pure. And of course a function > would only be pure if it did not modify global state and only invoked > pure functions itself.
Incidentally, it is of course possible for a function to invoke impure functions while still being pure itself (ie, it ensures the impurity does not "leak out"). One question to those more familiar with current language research: any recommended resources out there about this topic?
> Incidentally, it is of course possible for a function to invoke impure > functions while still being pure itself (ie, it ensures the impurity > does not "leak out"). One question to those more familiar with current > language research: any recommended resources out there about this topic?
> Cheers, > Dario Teixeira
You can easily infer purity (meaning no affectation) of an ocaml expression using the same algorithm than for exception (numerous papers on that subkect) ... just tag ":=" and "<-" as being able to raise the fake exception "#Affect" which can not be catch by any try (because its name starts with a "#") ...
> Well, it's fun to join the old discussion in the new times. The fact > that computers go multicore at a greater scale makes it recurrent.
> Erlang makes concurrency easy due to purity, and OCaml is famous
> for being eclectic. Why not embrace Erlang's model by imposing > limitations on what can be in threads -- keeping them pure?
Erlang processes are not pure, but they do have their own memory heap, making it possible to do stop-and-copy GC per process. I think the share-nothing model is more important than purity.
There are some functions that allow Erlang processes to update "global state" (e.g. the ETS hash and ordered_set tables), but they are ensured to be atomic by the runtime system. You could model each of these functions using normal erlang processes.
Also worth reading is Ulrich Drepper's series on current and future system architectures. I've highlighted the important parts of this long series below, but you can find the complete list of links at the end of part 1.
Uli has recently been advocating languages like OCaml (+ fork, numactl and virtualization obviously) for future architectures which will involve massive numbers of cores and very long paths to main memory.
Zitat von Mikkel Fahnøe Jørgensen <mik...@dvide.com>:
[...]
> I am not sure that a parallel GC is a sufficient solution either due > to the high contention on memory, at least unless it provide some > additional core affinity features. I believe some level of compiler > support is needed in the not so distant future such that enough > primitives are available to build powerful multi-core aware > libraries. > One approach could be micro heaps with core affinity and handle > mutable memory specially.
[...]
Not especially for multicore, but for parallel programming, this might be of interest:
(To mention this by me also is recurrent, as the thread we are in...)
Ciao, Oliver
P.S.: During the last multicore discussion, I found that link, but had not tried OCamlp3l. Now I think I will have more time and motivation and it could be compiled and installed without any problems with OCaml 3.10.2.
On Fri, Dec 19, 2008 at 09:37:32PM +0100, Oliver Bandel wrote:
[...]
> P.S.: During the last multicore discussion, I found that link, > but had not tried OCamlp3l. Now I think I will have more > time and motivation and it could be compiled and installed > without any problems with OCaml 3.10.2.
Has anyone tried it with 3.11?
I had an idea to try out some fork-based OCaml programming to exploit the 4 & 8 core machines we have here, but maybe can try this instead.
On Fri, Dec 19, 2008 at 4:27 PM, Richard Jones <r...@annexia.org> wrote: > On Fri, Dec 19, 2008 at 09:37:32PM +0100, Oliver Bandel wrote: > [...] >> P.S.: During the last multicore discussion, I found that link, >> but had not tried OCamlp3l. Now I think I will have more >> time and motivation and it could be compiled and installed >> without any problems with OCaml 3.10.2.
> Has anyone tried it with 3.11?
> I had an idea to try out some fork-based OCaml programming to exploit > the 4 & 8 core machines we have here, but maybe can try this instead.
The prelude.ml project has some fork-based parallel functions for lists, arrays, strings and bigarrays:
While I have not tried OCamlp3l on 3.11 yet, my guess is that it would work. It is a pure-OCaml set of libraries along with some helper scripts/programs and as far as I know there is not any camlp4 involved. After speaking with the authors, the package does seem to be more focused on distributed computing than local parallelism. It is still possible to use it for local parallelism though. OCamlp3l is currently going through a rewrite as Camlp3l though the restructuring is not complete at this point. CVS repositories for both are here -- http://camlcvs.inria.fr/cgi-bin/cvsweb/bazar-ocaml/
Please let us know how it goes if you do try one or both of these out.
ii) People used to choose OCaml because it was fast but lack of support for multicores means that OCaml is no longer competitively performant for many tasks on today's computers.
> iii) A voice of reason notes that the devil is in step 2) above. > If your programming paradigm for concurrency is Threads+Mutexes, > then you are exposing yourself to a world of pain in race > conditions. At this point someone usually links to Xavier's > standard speech on concurrency (which is a few years old, but > as poignant now as it was then).
iii) Your "voice of reason" also said that multicore computers "have never really taken off, at least in the general public":
> ... > While I tend to agree with viewpoints iii) and v), I do think a healthy > discussion on the subject of multicore is in order. Though I suggest we > focus the discussion on concurrency models that will allow us to take > advantage of those multicores in a safe, sane manner:
At this point, you need to distinguish between concurrency and parallelism.
OCaml can already handle concurrency reasonably well, particularly when CPU usage is low. Perhaps OCaml has a future in that specific domain but it is not my remit so I am not the best person to comment on this (ask Gerd).
OCaml is extremely bad at parallelism, even embarassingly parallel tasks because OCaml still has to marshall all the data in a gather operation or resort to manual memory management. However, the problems are entirely with the current implementation and not with the language at all. So a new implementation could address this problem.
> a) Could Jocaml be the future of Ocaml?
For concurrency perhaps but not for parallelism.
> b) How do we handle the global mutability issue?
Avoid mutation when it is likely to get out of control (e.g. Erlang-style concurrent applications) but keep it available because it gives huge performance improvements in many parallel applications. F# already has good solutions in both cases so we can just copy them.
I spent a lot of time thinking about the future of the ML family of languages in the open source world recently. I believe I have an another way forward that some people might find interesting. I think ML has proven its worth and it is time to begin developing a completely new open source ML implementation. There are various different ways to accomplish this and the goals are quite obvious (IMHO). However, finding a route that is likely to lead to those goals that is socially acceptable (i.e. not a fork) and technically feasible (i.e. incrementally adding useful features and garnering users) was not easy. I believe the best course of action is to develop an embedded DSL for OCaml (e.g. for higher-performance numerics) and evolve it into a completely new ML implementation.
I have actually already started this using the excellent LLVM project and I just obtained the first promising results yesterday: a simple benchmark where my compiler runs OCaml code 3x faster than ocamlopt does.
There are a *lot* of low hanging fruit here. So I think it should be relatively easy to create a tool that is useful for a significant number of people. The most interesting aspect is that developing this project as a DLL that is entirely accessible from within OCaml makes it possible to address many of OCaml's own problems for existing OCaml programmers.
If anyone has any ideas about this I'd love to hear them.
Jon Harrop wrote: > I have actually already started this using the excellent LLVM project and I > just obtained the first promising results yesterday: a simple benchmark where > my compiler runs OCaml code 3x faster than ocamlopt does.
Is this going to be an open source project? Is this (or is this going to be) developed in the open with the sources in a publicly available revision control system?
Erik -- ----------------------------------------------------------------- Erik de Castro Lopo ----------------------------------------------------------------- The idea that Bill Gates has appeared like a knight in shining armour to lead all customers out of a mire of technological chaos neatly ignores the fact that it was he who, by peddling second-rate technology, led them into it in the first place. - Douglas Adams in Guardian, 25-Aug-95
On Fri, Dec 19, 2008 at 10:31:33PM +0000, Jon Harrop wrote: > OCaml is extremely bad at parallelism, even embarassingly parallel tasks > because OCaml still has to marshall all the data in a gather operation or > resort to manual memory management.
Merjis implemented the Ancient module to handle this case, and we shared the code with an open source license so everyone can benefit:
On Fri, Dec 19, 2008 at 05:03:30PM -0500, Hezekiah M. Carty wrote: > The prelude.ml project has some fork-based parallel functions for > lists, arrays, strings and bigarrays:
On Friday 19 December 2008 22:36:40 Erik de Castro Lopo wrote:
> Jon Harrop wrote: > > I have actually already started this using the excellent LLVM project and > > I just obtained the first promising results yesterday: a simple benchmark > > where my compiler runs OCaml code 3x faster than ocamlopt does.
> Is this going to be an open source project?
Yes.
> Is this (or is this going to be) developed in the open with the sources in a > publicly available revision control system?
Yes. I will, most likely, work on this as a fun project for as long as possible in an attempt to build something vaguely inspiring before I put the code up somewhere (e.g. OCamlForge). I will be more than happy to give contributors commit access to the source because I am very busy with unrelated projects. If anyone is desperately keen to have a play then I'll consider putting it up immediately.
The only thing that some might object to is that I would like to make this a commerce-friendly platform if possible, i.e. facilitate the buying and selling of libraries built upon this platform. I don't see that this desire has any negative effects but I believe the presence of a commercial platform built around such a foundation would help to provide much-needed resources to fund further work.
I should also mention that several other people are working on similar things, also building upon LLVM. Someone has even tried to contribute an ML implementation to LLVM itself as a configuration tool:
I've used prelude.ml to parallelize my system, and it works fine on Mac OSX. Ilmari has graciously worked with me to add versions of pmap called pmap_init to initialize per-process file descriptors, and pmap_initWithIndex to let the pieces know their IDs -- hopefully they will be rolled into the trunk once it's refactored.
On Dec 19, 2008, at 5:47 PM, Richard Jones wrote:
> On Fri, Dec 19, 2008 at 05:03:30PM -0500, Hezekiah M. Carty wrote: >> The prelude.ml project has some fork-based parallel functions for >> lists, arrays, strings and bigarrays:
While we're on the topic of prelude.ml, I wonder how many folks would find it very nice to just open Prelude and have all these nice functions readily available. I know that the lack of it makes references such as List.map ubiquitous, and many argue it's a good thing. However, it feels refreshing to use short names in an orderly way. Since there's also the Batteries, perhaps a consensus can be worked out on what should be in a such an extension to Pervasives?
> While we're on the topic of prelude.ml, I wonder how many folks would find > it very nice to just open Prelude and have all these nice functions readily > available.
Prelude looks nice, and I certainly do not want to put it down, so forgive me for my critical questions.
How does Prelude handle Windows support, and would we want library components that are not cross platform?
The 3pl project is cross platform if I remember correctly (short of some issues on OS-X that hopefully is corrected or is in CVS).
There is also the Lwt lightweight threads library which is not multi-core. I'm just wondering of some of these efforts can be combined to a more homogenous library solution. I believe there are also other solutions in progress.
Other than that, I wonder if Prelude really is a practical solution as it stands: without looking deep into it, it seems to be targeting inner loop concurrency by spawning processes, which is costly. In some cases this may be fine, such as an easy way to speed up calculating file signatures, but often you would like to have your N processes in a pool that you can tap when needed, such as handling web requests.
Of course we can have different libraries for different purposes, but before including any one such library, I think it would be healthy to discuss requirements.
Finally, I'm also wondering about how to tap platform specific concurrency. OS-X 10.6 will be shipping with Grand Central for concurrent support, and I'm sure there will be, or already are, similar initiatives on other platforms.
>> While we're on the topic of prelude.ml, I wonder how many folks >> would find >> it very nice to just open Prelude and have all these nice functions >> readily >> available.
> Prelude looks nice, and I certainly do not want to put it down, so > forgive me for my critical questions [...]
I agree its solution to parallelism via a simple forking map/reduce is not universal, but in fact I am wondering about having other functional combinators available in shorthand. Prelude.ml is a superb crash course in FP, and in fact I catch myself reinventing these idioms ad hoc quite often. I'm very tempted to just include it always, making it my own Pervasives.
Alexy Khrabrov wrote: > I agree its solution to parallelism via a simple forking map/reduce is > not universal, but in fact I am wondering about having other functional
coThreads is comparatively low-level. You might be able to write your own schedule on top of it. A no-brainer pmap can be defined as follows:
Toplevel trace: ---- # :load unix.cma;; # :path +process;; # :load cothreads.cma;; # let pmap f a = let ea = Array.map (Cothread.spawn f) a in Array.map Event.sync ea;; val pmap : ('a -> 'b) -> 'a array -> 'b array = <fun> # pmap ((+) 1) (Array.init 10 (fun i -> i));; - : int array = [|1; 2; 3; 4; 5; 6; 7; 8; 9; 10|] ----
> combinators available in shorthand. Prelude.ml is a > superb crash course in FP, and in fact I catch myself reinventing these > idioms ad hoc quite often. I'm very tempted to just include it always, > making it my own Pervasives.
Maybe you should also look at some of Haskell's standard lib. Pros: Haskell guys are really good at naming; Cons: costs are always hidden and combinators composition can be terse.
I don't think this is packaged in Debian yet, so I had to make a few decisions about packaging:
- the ocamlfind name is 'preludeml' - the package name (in Fedora) is ocaml-preludeml - no upstream versions, so I packaged it as "version 0.1", dated 20080922
> ii) People used to choose OCaml because it was fast but lack of support for > multicores means that OCaml is no longer competitively performant for many > tasks on today's computers.
That is definitely an argument.
>> If your programming paradigm for concurrency is Threads+Mutexes,
I think this model is necessary for systems programming, but I don't think we should look broader than this.
I'm not really sure what separates threads from functions, but I'd like to be able to chain logic sequentially and concurrently in a way that sync primitives are available but usually not required, and in a way that efficiently handle concurrent operations in C libraries, including transaction operations. I believe Oleg did some work on this, but this work would likely benefit from compiler support.
So the discussion should be
a) is there any real interest in supporting multiple cores - when we look beyond concurrent GC and Mutexes?
b) what are the basic primitives we need from the compiler to support a range of interesting concurrency models in an efficient manner.
c) we are probably not inventing a new vector language - like fortress - super efficient at micro parallelism, but it should be possible to efficiently tap into the processing power available.
d) error handling and asymmetric termination times of concurrent operations is non-trivial - you may not want to wait for all sub-tasks to complete before the main task does starts aggregating. And if you embed distributed concurrency you may want to reschedule a long running operation on a new node. This is not a compiler issue, but the compiler may support a failure model that makes this easier to handle in a generic way for a distributed library. Google often mentions the importance of abandoning failed tasks because a hard-drive is down ever so often, or something.
The spawn multiple processes solutions we have today concerns me slightly because I'm not convinced that processes are cleaned up properly on failure. I may be wrong though.
> I think ML has proven its worth and > it is time to begin developing a completely new open source ML > implementation.
I think this sounds interesting.
But before starting such an effort, it would be nice with more road map information from Inria. Do they want to include LLVM, or are there good reasons not to?
More importantly, I believe Xavier is right about the non-scalability of plain threaded garbage collection. So I really think we need to understand how we want to obtain that parallelism efficiently. And when we understand that, perhaps Inria is more willing to listen to the multi-core argument?
That said, it can't hurt to have an experimental solution to try out new ideas and prove that they work.
> 2008/12/19 Jon Harrop <j...@ffconsultancy.com>: > I think this model is necessary for systems programming, but I don't > think we should look broader than this.
Sorry, I mean the exactly opposite - we should not rely on a tradional threading model, It won't scale. But we still need basic threads for some systems tasks, likely.