[Caml-list] Why systhreads?

Lauri Alanko

Nov 23, 2002, 4:51:09 PM

to caml...@inria.fr

Hello.

A simple, fundamental question: why is native-code threading done using
system threads? Why isn't pure user-level scheduling used as with
bytecode?

Incompatibilities and deficiencies in Win32 threads and pthreads seem to
cause no end of trouble; for instance, they fail to support the
asynchronous exceptions which I yearned for.

Since there is a single heap and threads are run in a strictly
serialized order, system threads don't even give any support for
parallelism. So user-level threading seems like the sensible option. For
instance, the GHC Haskell compiler uses pure user-level threading both
in native code and when interpreted, and it works pretty well. (All
right, there's now talk of adding systhread support, but only for
foreign interface issues.)

I cannot believe that supporting many different system thread interfaces
is easier than managing native-code stacks manually. So could someone
please clarify what the motivation here is?

Thanks.

Lauri Alanko
l...@iki.fi
-------------------
To unsubscribe, mail caml-lis...@inria.fr Archives: http://caml.inria.fr
Bug reports: http://caml.inria.fr/bin/caml-bugs FAQ: http://caml.inria.fr/FAQ/
Beginner's list: http://groups.yahoo.com/group/ocaml_beginners

Sven Luther

Nov 24, 2002, 2:41:17 AM

to Lauri Alanko, caml...@inria.fr

On Sat, Nov 23, 2002 at 11:08:06AM +0200, Lauri Alanko wrote:
> Hello.
>
> A simple, fundamental question: why is native-code threading done using
> system threads? Why isn't pure user-level scheduling used as with
> bytecode?

I don't really know about windows (which is a pain to use ocaml on
anyway) but on unix, you can choose at compile time to use either
systhreads or ocamlthreads.

Friendly,

Sven Luther

Vitaly Lugovsky

Nov 24, 2002, 12:20:51 PM

to Lauri Alanko, caml...@inria.fr

On Sat, 23 Nov 2002, Lauri Alanko wrote:

> A simple, fundamental question: why is native-code threading done using
> system threads? Why isn't pure user-level scheduling used as with
> bytecode?

How will you manage SMP scheduling then? Maybe something like OpenMP would
be nice, but it's not as generic as plain native threads.

Chris Hecker

Nov 24, 2002, 12:48:12 PM

to Sven Luther, Lauri Alanko, caml...@inria.fr

>I don't really know about windows (which is a pain to use ocaml on
>anyway) but on unix, you can choose at compile time to use either
>systhreads or ocamlthreads.

For bytecode or for native? His question was about native.

On a related note, now that the first CPUs with HyperThreading are
shipping, is there any plan to multithread the GC so caml programs can take
advantage of HT? I can understand why it was not a high priority to
support real threads for multiprocessor machines when that was the only way
to get parallelism with threads, but once HT is ubiquitous, it has the
potential to make it worth the trouble to thread a regular application to
increase performance. I don't think this is a high priority now, because
there's 0% penetration of HT right now, but hopefully there's some plan for
the future.

I guess the question is, is a multithreaded GC an open research problem, or
is there a known good solution and it just hasn't gotten to the top of the
priority list yet?

Chris

Basile STARYNKEVITCH

Nov 24, 2002, 1:16:50 PM

to Chris Hecker, Sven Luther, Lauri Alanko, caml...@inria.fr

>>>>> "Chris" == Chris Hecker <che...@d6.com> writes:

Chris> On a related note, now that the first CPUs with
Chris> HyperThreading are shipping, is there any plan to
Chris> multithread the GC so caml programs can take advantage of
Chris> HT? I can understand why it was not a high priority to
Chris> support real threads for multiprocessor machines when that
Chris> was the only way to get parallelism with threads, but once
Chris> HT is ubiquitous, it has the potential to make it worth the
Chris> trouble to thread a regular application to increase
Chris> performance. I don't think this is a high priority now,
Chris> because there's 0% penetration of HT right now, but
Chris> hopefully there's some plan for the future.

I would suppose that HyperThreading chips can already be used
successfully on Linux (I was told, IIRC, that such chips show up as 2 CPUs
in /proc/cpuinfo). So I would suppose that they can already take
advantage of native threads in OCaml.

Chris> I guess the question is, is a multithreaded GC an open
Chris> research problem, or is there a known good solution and it
Chris> just hasn't gotten to the top of the priority list yet?

I have a question for the OCaml team: could they explain the
current (and perhaps near-future) status of multithreading in OCaml,
notably with respect to garbage collection?

I thought that the current GC in OCaml (3.06) was already
multithread-capable, at least because each thread has its own birth
region and can do minor garbage collections independently of other
threads, so threads have to synchronize only on major (i.e. full)
garbage collections. Is my assumption correct? I just had a glance
into ocaml/byterun/minor_gc.c and did not find any thread-local
variables there... Also, ocaml/otherlibs/systhreads/posix.c mentions
/* The global mutex used to ensure that at most one thread is running
Caml code */

What, then, is the use of multithreading in OCaml (with ocamlopt -thread
on x86/Linux)? I really thought that it gave, in practical terms,
biprocessor support, and not only some limited kind of (throwable once
only) continuations.

Regards.
--

Basile STARYNKEVITCH http://starynkevitch.net/Basile/
email: basile<at>starynkevitch<dot>net
alias: basile<at>tunes<dot>org
8, rue de la Faïencerie, 92340 Bourg La Reine, France

Dmitry Bely

Nov 24, 2002, 1:34:07 PM

to caml...@inria.fr

Vitaly Lugovsky <v...@ontil.ihep.su> writes:

>> A simple, fundamental question: why is native-code threading done using
>> system threads? Why isn't pure user-level scheduling used as with
>> bytecode?
>
> How will you manage SMP scheduling then?

AFAIK an OCaml program cannot utilise SMP even with native threads (due to
the single master lock).

- Dmitry Bely

Lauri Alanko

Nov 24, 2002, 3:06:34 PM

to caml...@inria.fr

On Sun, Nov 24, 2002 at 08:14:50PM +0300, Vitaly Lugovsky wrote:
> How will you manage SMP scheduling then? Maybe something like OpenMP would
> be nice, but it's not as generic as plain native threads.

Unless I have _very_ serious misconceptions, ocaml's threads _always_
run in a strictly serialized order, since they share a common heap and
it'd be horrendous to lock the heap at every allocation. So using
systhreads does _not_ buy us any parallelization with SMP.

Lauri Alanko
l...@iki.fi

Christopher Quinn

Nov 24, 2002, 4:15:06 PM

to caml...@inria.fr

amazingly the threading of caml was done back in '93!
here is the paper on it:
http://pauillac.inria.fr/~xleroy/publi/concurrent-gc.ps.gz

but the runtime of the current distro is not so threaded. parallelism is limited to those system functions (for I/O) in the C source files you see surrounded by enter_blocking_section()/leave_blocking_section().
these functions mask whether a new, real thread is created for the duration of the call, or the descriptor is given to select() in the case of bytecode. i think they enforce the 'global lock' on the runtime.

i imagine the performance cost of threading the runtime is rather too high (just what is it that makes java so slow anyway - a multitude of resource locks? )

my particular wish is to see the runtime with a compile option to eliminate static global state (make it thread local?) to enable multiple instances of the runtime to operate in the same address space, albeit completely independently.

- chris

Vitaly Lugovsky

Nov 24, 2002, 6:18:53 PM

to Dmitry Bely, caml...@inria.fr

On Sun, 24 Nov 2002, Dmitry Bely wrote:

> >> A simple, fundamental question: why is native-code threading done using
> >> system threads? Why isn't pure user-level scheduling used as with
> >> bytecode?
> >
> > How will you manage SMP scheduling then?
>
> AFAIK an OCaml program cannot utilise SMP even with native threads (due to
> the single master lock).

I tried OCaml in a non-memory-consuming numerical application on SMP.
It seems to work well enough (100% load on all the processors).

Xavier Leroy

Nov 25, 2002, 5:04:54 AM

to Lauri Alanko, caml...@inria.fr

It seems that the annual discussion on threads started again. Allow
me to deliver again my standard lecture on this topic.

Threads have at least three different purposes:

1- Parallelism on shared-memory multiprocessors.
2- Overlapping I/O and computation (while a thread is blocked on a network
read, other threads may proceed).
3- Supporting the "coroutine" programming style
(e.g. if a program has a GUI but performs long computations,
using threads is a nicer way to structure the program than
trying to wrap the long computation around the GUI event loop).

The goals of OCaml threads are (2) and (3) but not (1) (for reasons
that I'll get into later), with historical emphasis on (2) due to the
MMM (Web browser) and V6 (HTTP proxy) applications.

Pure user-level scheduling, or equivalently control operators (call/cc),
provide (3) but not (2).
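
To illustrate (3) without (2), here is a minimal sketch of a pure
user-level scheduler in OCaml. It is not how the bytecode thread library
is implemented; the names run_queue, spawn, yield and worker are made up
for this example, and tasks are written in continuation-passing style so
that no runtime support is needed.

```ocaml
(* A tiny cooperative scheduler: tasks are closures kept in a FIFO queue.
   All names here are hypothetical, chosen for this illustration. *)
let run_queue : (unit -> unit) Queue.t = Queue.create ()
let log = Buffer.create 16

(* Add a new task to the run queue. *)
let spawn task = Queue.add task run_queue

(* Hand control back to the scheduler; [k] is the rest of the task. *)
let yield k = Queue.add k run_queue

(* Run tasks in round-robin order until none are left. *)
let rec run () =
  match Queue.take_opt run_queue with
  | None -> ()
  | Some task -> task (); run ()

(* A task that records its name [n] times, yielding between steps. *)
let rec worker name n () =
  if n > 0 then begin
    Buffer.add_string log name;
    yield (worker name (n - 1))
  end

let () =
  spawn (worker "a" 2);
  spawn (worker "b" 2);
  run ();
  print_endline (Buffer.contents log)   (* prints "abab" *)
```

The two tasks interleave like coroutines, but if either of them made a
blocking system call inside its step, the run loop and therefore every
other task would stall with it.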

To achieve (2) with a user-level scheduler such as OCaml's bytecode
thread library requires all sorts of hacks, such as non-blocking I/O
and select() under Unix, plus wrapping of all I/O operations so that
they call the user-level scheduler in cases where they are about to
block. (Otherwise, the whole process would block, and not just the
calling thread.)

Not only is this ugly (read the sources of the bytecode thread library
to get an idea) and inefficient, but it interacts very poorly with
external libraries written in C. For instance, deep inside the C
implementation of gethostbyname(), there are network reads that can
block; there is no way to wrap these with scheduler calls, short of
rewriting gethostbyname() entirely.

To make things worse, non-blocking I/O is done completely differently
under Unix and under Win32. I'm not even sure Win32 provides enough
support for async I/O to write a real user-level scheduler.

Another issue with user-level threads, at least in native code, is the
handling of the thread stacks, especially if we wish to have thread
stacks that start small and grow on demand. It can be done, but is
highly processor- and OS-dependent. (For instance, stack handling on
the IA64 is, ah, peculiar: there are actually two stacks that grow in
opposite directions within the same memory area...)

One aspect of wisdom is to know when not to do something oneself, but
leave it to others. Scheduling I/O and computation concurrently, and
managing process stacks, is the job of the operating system. Trying
to do it entirely in a user-mode program is just not reasonable.
(For another reference point, see Java's move away from "green
threads" and towards system threads.)

What about parallelism on SMP machines? The main issue here is that
the runtime system, and in particular the garbage collector and memory
manager, must be MP-safe. This means minimizing global state, and
introducing locking around accesses to shared resources. If done
naively (e.g. locking at each heap allocation), this can be extremely
costly; it also complicates the runtime system a lot. Finally,
garbage collection can become a limiting factor if it is done in the
"stop the world" fashion (all threads stop during GC); a concurrent GC
avoids this problem, but adds tremendous complexity.

(Of course, all this SMP support stuff slows down the runtime system
even if there is only one processor, which is the case for almost all
our users...)

All this has been done before in the context of Caml: that was
Damien Doligez's Concurrent Caml Light system, in the early 90s.
Indeed, the incremental major GC that we have in OCaml is a
simplification of Damien's concurrent GC. If you're interested, have
a look at Damien's publications.

Why was Concurrent Caml Light abandoned? Too complex; too hard to debug
(despite the existence of a machine-checked proof of correctness);
and dubious practical interest. Shared-memory multiprocessors have
never really "taken off", at least in the general public. For large
parallel computations, clusters (distributed-memory systems) are the
norm. For desktop use, monoprocessors are plenty fast. Even if you
have a 4-processor SMP machine, it isn't clear whether you should
write your program using shared memory or using message passing -- the
latter is slightly more expensive, but scales to clusters...

What about hyperthreading? Well, I believe it's the last convulsive
movement of SMP's corpse :-) We'll see how it goes market-wise. At
any rate, the speedups announced for hyperthreading in the Pentium 4
are below a factor of 1.5; probably not enough to offset the overhead
of making the OCaml runtime system thread-safe.

In summary: there is no SMP support in OCaml, and it is very very
unlikely that there will ever be. If you're into parallelism, better
investigate message-passing interfaces.

- Xavier Leroy

Markus Mottl

Nov 25, 2002, 9:23:46 AM

to Xavier Leroy, Lauri Alanko, caml...@inria.fr

On Mon, 25 Nov 2002, Xavier Leroy wrote:
> In summary: there is no SMP support in OCaml, and it is very very
> unlikely that there will ever be. If you're into parallelism, better
> investigate message-passing interfaces.

To make at least some users happy: it is indeed possible to exploit
SMP-machines with native threads in OCaml, but those benefits only occur
when calling external functions that do not interfere with the OCaml
runtime. E.g. LACAML (the LAPACK-interface for OCaml) makes use of this,
which means that you can, say, crunch several matrices in parallel. Due
to the elegant handling of threads in OCaml, this is much nicer to do
than in C.

Regards,
Markus Mottl

--
Markus Mottl mar...@oefai.at
Austrian Research Institute
for Artificial Intelligence http://www.oefai.at/~markus

Blair Zajac

Nov 25, 2002, 2:03:23 PM

to Xavier Leroy, Lauri Alanko, caml...@inria.fr

Xavier Leroy wrote:
>
> It seems that the annual discussion on threads started again. Allow
> me to deliver again my standard lecture on this topic.
>
> Threads have at least three different purposes:
>
> 1- Parallelism on shared-memory multiprocessors.
> 2- Overlapping I/O and computation (while a thread is blocked on a network
> read, other threads may proceed).
> 3- Supporting the "coroutine" programming style
> (e.g. if a program has a GUI but performs long computations,
> using threads is a nicer way to structure the program than
> trying to wrap the long computation around the GUI event loop).

[Discussion on (1), (2) and (3) removed].

To summarize, for (2) system threads are required and you can't
prevent blocking with user level threads easily or at all. For (3),
making the Ocaml system support SMP is "Too complex; too hard to
debug" and SMP boxes aren't all that popular.

Aren't these contradictory statements?

For Ocaml to support a Ocaml program to have one thread to block on a
system call and to allow other threads to continue, doesn't this support
SMP? Does Ocaml support this?

I need the functionality to have multiple threads where one thread can
block and not stop the others, either due to the OS or to the Ocaml
runtime system.

What am I missing here?

Best,
Blair

--
Blair Zajac <bl...@orcaware.com>
Web and OS performance plots - http://www.orcaware.com/orca/

james woodyatt

Nov 25, 2002, 4:09:42 PM

to Blair Zajac, The Trade

[this thread should probably migrate to ocaml_b...@yahoogroups.com]

On Monday, Nov 25, 2002, at 11:01 US/Pacific, Blair Zajac wrote:

> Xavier Leroy wrote:
>>
>> Threads have at least three different purposes:
>> 1- Parallelism on shared-memory multiprocessors.
>

> [Discussion on (1), (2) and (3) removed].
>
> To summarize, for (2) system threads are required and you can't
> prevent blocking with user level threads easily or at all. For (3),
> making the Ocaml system support SMP is "Too complex; too hard to
> debug" and SMP boxes aren't all that popular.
>
> Aren't these contradictory statements?

Assuming you meant (1) not (3), then the answer is: No. They're not.

> For Ocaml to support a Ocaml program to have one thread to block on a
> system call and to allow other threads to continue, doesn't this support
> SMP?

Not necessarily.

> Does Ocaml support this?

No. All threads are serialized, so an SMP machine only loads one
processor at a time.

> I need the functionality to have multiple threads where one thread can
> block and not stop the others, either due to the OS or to the Ocaml
> runtime system.
>
> What am I missing here?

If I had to guess, I would say you are probably missing how your
application is covered by case (2) or case (3) in M. Leroy's standard
lecture on the subject.

I've been a very long way down this road myself, and I agree with him.
If you want your application to parallelize well, the winning design
pattern seems to be message passing between distributed memory
processes.

--
j h woodyatt <j...@wetware.com>
markets are only free to the people who own them.

Chris Hecker

Nov 25, 2002, 5:28:28 PM

to james woodyatt, Blair Zajac, The Trade

>If you want your application to parallelize well, the winning design
>pattern seems to be message passing between distributed memory processes.

I was going to let it drop after the "lecture" (which should be put in a
faq or something), but come on, this is a silly generalization. I have
colleagues who have gotten very large speedups from hyperthreading on
commercial applications, not demos. The point is, it's "free" for Intel to
put it in, and your app is waiting on cache misses and pipeline stalls
anyway, so you might as well do something with those cycles. Now you can
get extra work done during those times in C, but you won't be able to in
caml, and that's a bummer. It's not a showstopper, since you can always
call out to C, but it is yet another thing in the list of features that
aren't natively exploitable in caml. Of course there's a cost to enabling
this in caml, and it may be that there's no good way to do it or that it's
not worth it cost/benefit-wise, but saying "you don't want to do it anyway"
is just apologist.

Xavier saying 1.5x is not worth it is really strange to me; most
performance sensitive programmers I know would kill their mother to get
1.5x. I wonder what factor would be worth it for Xavier?

I think the overriding point here is that in the past SMP has not taken off
on the desktop, so it wasn't worth worrying about for end-user
applications. That will no longer be true, simply because it was so cheap
for Intel to add HT. From now on, almost all chips they ship will be
"logically" SMP (barring some unforeseen thing where HT isn't used at all
and becomes expensive to keep in the chip...I assume this is what Xavier
meant by "last gasp", but I doubt it based on Intel's historic behavior
with other CPU features). For commercial application developers, that
changes the landscape a bit.

It's very similar to MMX and SSE. Neither technology revolutionized the
world (like the hype suggested), but once all viable end user machines have
it, it becomes cost effective to use. HT is even easier, because unlike
MMX and SSE, it involves no compiler changes (for C compilers) and is
backwards compatible.

I am not a big fan of threading; in fact, I think it's almost always a
cost/benefit lose (except when used to simulate async io) for my kinds of
applications (games). However, HT changes the cost/benefit equation. How
much remains to be seen, of course.

Chris

Sven Luther

Nov 26, 2002, 1:51:28 AM

to Chris Hecker, james woodyatt, Blair Zajac, The Trade

On Mon, Nov 25, 2002 at 02:20:11PM -0800, Chris Hecker wrote:
>
> >If you want your application to parallelize well, the winning design
> >pattern seems to be message passing between distributed memory processes.
>
> I was going to let it drop after the "lecture" (which should be put in a
> faq or something), but come on, this is a silly generalization. I have
> colleagues who have gotten very large speedups from hyperthreading on
> commercial applications, not demos. The point is, it's "free" for Intel to
> put it in, and your app is waiting on cache misses and pipeline stalls
> anyway, so you might as well do something with those cycles. Now you can
> get extra work done during those times in C, but you won't be able to in
> caml, and that's a bummer. It's not a showstopper, since you can always
> call out to C, but it is yet another thing in the list of features that
> aren't natively exploitable in caml. Of course there's a cost to enabling
> this in caml, and it may be that there's no good way to do it or that it's
> not worth it cost/benefit-wise, but saying "you don't want to do it anyway"
> is just apologist.
>
> Xavier saying 1.5x is not worth it is really strange to me; most
> performance sensitive programmers I know would kill their mother to get
> 1.5x. I wonder what factor would be worth it for Xavier?

I think he said that the 1.5x would not cover the cost of adding SMP
support in the first place. Besides, the added cost would also be
incurred by the single-processor people. HT technology is all fine, but
it will be some time before it is widely available. Maybe then this
issue will come up again, and another response will be made.

Friendly,

Sven Luther

Xavier Leroy

Nov 26, 2002, 4:07:59 AM

to Blair Zajac, caml...@inria.fr

Blair Zajac wrote:

> To summarize, for (2) system threads are required and you can't
> prevent blocking with user level threads easily or at all. For (3),
> making the Ocaml system support SMP is "Too complex; too hard to
> debug" and SMP boxes aren't all that popular.
> Aren't these contradictory statements?
>
> For Ocaml to support a Ocaml program to have one thread to block on a
> system call and to allow other threads to continue, doesn't this support
> SMP? Does Ocaml support this?

No to the first question. Yes to the second.

By "supporting SMP", I mean having several threads executing Caml code
in parallel, thus using the Caml runtime system in a concurrent fashion.
This is the hard part.

In the current implementation of systhreads, the Caml executor and
runtime system are one big critical section: at most one thread can
execute Caml code at a given time, but arbitrarily many other threads
can be blocked on I/O (and thus aren't calling the Caml runtime system).
Each thread leaves the critical section before calling a potentially
blocking I/O operation, and re-enters it when the I/O completes.
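
The effect can be observed with a small sketch using the standard Thread
and Unix modules (built with the threads and unix libraries, for example
via ocamlfind with -package threads.posix,unix). The helper names time
and sleep_in_threads are invented for this example, and Thread.delay
stands in for a blocking I/O call.

```ocaml
(* Measure the wall-clock time taken by [f ()]. *)
let time f =
  let t0 = Unix.gettimeofday () in
  f ();
  Unix.gettimeofday () -. t0

(* Start [n] threads that each block for [secs] seconds, then wait for
   all of them. Each thread leaves the critical section while blocked. *)
let sleep_in_threads n secs =
  let threads = List.init n (fun _ -> Thread.create Thread.delay secs) in
  List.iter Thread.join threads

let elapsed = time (fun () -> sleep_in_threads 4 0.2)

let () =
  Printf.printf "4 threads x 0.2 s of blocking took %.2f s\n" elapsed
```

If blocked threads held on to the critical section, the four waits would
add up to about 0.8 s; because they release it, the total should stay
close to 0.2 s.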

> I need the functionality to have multiple threads where one thread can
> block and not stop the others, either due to the OS or to the Ocaml
> runtime system.

You have that functionality. What you don't have is the ability to
keep several processors busy running Caml code. (As Markus said, you
can still have C code running concurrently with Caml code, provided
that the C code doesn't call the Caml runtime system.)

Chris Hecker wrote:

> Xavier saying 1.5x is not worth it is really strange to me; most
> performance sensitive programmers I know would kill their mother to get
> 1.5x. I wonder what factor would be worth it for Xavier?

Factors of 10 are always nice :-) Just kidding. What I meant is the
following: assume making the Caml runtime system thread-safe entails a
25% slowdown on program execution. (This can easily happen if e.g. we
have to lock a mutex at each heap allocation.) Further assume that by
doing so, you get a 1.5 speedup from hyperthreading. In the end, your
program will run 1.5 * 0.75 = 1.12 times faster than its equivalent
running on the standard, single-processor Caml runtime. It's not
worth the effort.
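
The same back-of-the-envelope calculation as a small OCaml sketch. The
function names are made up for this illustration, and the 25% and 1.5
figures are the assumptions stated above, not measurements.

```ocaml
(* Net speedup of a thread-safe runtime on a hyperthreaded CPU, relative
   to the standard single-processor runtime. [runtime_slowdown] is the
   fraction of speed lost to locking (0.25 means 25% slower). *)
let net_speedup ~parallel_gain ~runtime_slowdown =
  parallel_gain *. (1.0 -. runtime_slowdown)

(* Largest slowdown for which the thread-safe runtime still breaks even. *)
let break_even_slowdown ~parallel_gain = 1.0 -. 1.0 /. parallel_gain

let () =
  Printf.printf "net speedup: %.3f\n"
    (net_speedup ~parallel_gain:1.5 ~runtime_slowdown:0.25);
  Printf.printf "break-even slowdown: %.3f\n"
    (break_even_slowdown ~parallel_gain:1.5)
```

With these inputs the net speedup is 1.125, and the thread-safe runtime
stops paying off altogether once its overhead reaches one third.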

- Xavier Leroy

Sven Luther

Nov 26, 2002, 4:32:24 AM

to Xavier Leroy, Blair Zajac, caml...@inria.fr

On Tue, Nov 26, 2002 at 10:02:54AM +0100, Xavier Leroy wrote:
> Blair Zajac wrote:
>
> > To summarize, for (2) system threads are required and you can't
> > prevent blocking with user level threads easily or at all. For (3),
> > making the Ocaml system support SMP is "Too complex; too hard to
> > debug" and SMP boxes aren't all that popular.
> > Aren't these contradictory statements?
> >
> > For Ocaml to support a Ocaml program to have one thread to block on a
> > system call and to allow other threads to continue, doesn't this support
> > SMP? Does Ocaml support this?
>
> No to the first question. Yes to the second.
>
> By "supporting SMP", I mean having several threads executing Caml code
> in parallel, thus using the Caml runtime system in a concurrent fashion.
> This is the hard part.
>
> In the current implementation of systhreads, the Caml executor and
> runtime system are one big critical section: at most one thread can
> execute Caml code at a given time, but arbitrarily many other threads
> can be blocked on I/O (and thus aren't calling the Caml runtime system).
> Each thread leaves the critical section before calling a potentially
> blocking I/O operation, and re-enters it when the I/O completes.

In the case i have a multi-threaded lablgtk executable, having one
thread managing the interface and the other running programs, that even
if i had a way of killing a thread (or setting a mutex or whatever to
signal it to stop), that if the running thread is looping, i will never
be able to execute the interface thread which will (through a callback)
set the mutex to the stop option, because the running thread doesn't do
blocking IO ?

Mmm, checking the mutex is a blocking IO though, isn't it ?

keeping the GUI alive even if some other stuff is taking time or looping
forever is a nice application of threading support.

Friendly,

Sven Luther

Xavier Leroy

Nov 26, 2002, 4:36:37 AM

to Sven Luther, Blair Zajac, caml...@inria.fr

> In the case i have a multi-threaded lablgtk executable, having one
> thread managing the interface and the other running programs, that even
> if i had a way of killing a thread (or setting a mutex or whatever to
> signal it to stop), that if the running thread is looping, i will never
> be able to execute the interface thread which will (through a callback)
> set the mutex to the stop option, because the running thread doesn't do
> blocking IO ?

That's a long question. Had to read it three times to see what you mean :-)

The answer to your question is that Caml systhreads do support
preemption: a timer forces the currently running thread to call
Thread.yield() at regular intervals. In turn, Thread.yield()
releases the master mutex, calls sched_yield(), and re-acquires the
master mutex, giving other threads a chance to grab the master mutex
and run.
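
A minimal sketch of the scenario in the question, assuming the standard
Thread module (built with the threads library): a worker loops without
doing any I/O or calling Thread.yield itself, and a second thread sets a
stop flag. The names stop, iterations and busy_loop are made up for this
example. The small allocation in the loop is a precaution, on the
assumption that the runtime only switches threads at points such as
allocations.

```ocaml
let stop = ref false
let iterations = ref 0

(* A compute-only loop: no I/O and no explicit Thread.yield. The small
   allocation gives the runtime a point at which it can switch threads. *)
let busy_loop () =
  while not !stop do
    incr iterations;
    ignore (Sys.opaque_identity (ref !iterations))
  done

let () =
  let worker = Thread.create busy_loop () in
  Thread.delay 0.1;     (* the "interface" thread sleeps, then resumes *)
  stop := true;         (* reached only if the worker was preempted *)
  Thread.join worker;
  Printf.printf "worker stopped after %d iterations\n" !iterations
```

The program terminates only if the looping thread is made to give up the
master mutex, which is the behaviour described above.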

> keeping the GUI alive even if some other stuff is taking time or looping
> forever is a nice application of threading support.

Sure. But this is all taken care of.

- Xavier Leroy

Sven Luther

Nov 26, 2002, 5:55:25 AM

to Xavier Leroy, Sven Luther, Blair Zajac, caml...@inria.fr

On Tue, Nov 26, 2002 at 10:34:01AM +0100, Xavier Leroy wrote:
> > In the case i have a multi-threaded lablgtk executable, having one
> > thread managing the interface and the other running programs, that even
> > if i had a way of killing a thread (or setting a mutex or whatever to
> > signal it to stop), that if the running thread is looping, i will never
> > be able to execute the interface thread which will (through a callback)
> > set the mutex to the stop option, because the running thread doesn't do
> > blocking IO ?
>
> That's a long question. Had to read it three times to see what you mean :-)

Yes, sorry about that.

> The answer to your question is that Caml systhreads do support
> preemption: a timer forces the currently running thread to call
> Thread.yield() at regular intervals. In turn, Thread.yield()
> releases the master mutex, calls sched_yield(), and re-acquires the
> master mutex, giving other threads a chance to grab the master mutex
> and run.

So it is not necessary to call Thread.yield() myself before the blocking
code, right ?

> > keeping the GUI alive even if some other stuff is taking time or looping
> > forever is a nice application of threading support.
>
> Sure. But this is all taken care of.

:)))

Friendly,

Sven Luther

Chris Hecker

Nov 26, 2002, 1:47:04 PM

to Xavier Leroy, Blair Zajac, caml...@inria.fr

>Factors of 10 are always nice :-) Just kidding. What I meant is the
>following: assume making the Caml runtime system thread-safe entails a
>25% slowdown on program execution. (This can easily happen if e.g. we
>have to lock a mutex at each heap allocation.) Further assume that by
>doing so, you get a 1.5 speedup from hyperthreading. In the end, your
>program will run 1.5 * 0.75 = 1.12 times faster than its equivalent
>running on the standard, single-processor Caml runtime. It's not
>worth the effort.

Sure, that's kinda obvious. My original question was whether there was a
known way to do a multithreaded gc that doesn't suck (costing 25% on
nonthreaded applications does not count as not sucking) that ocaml could
use if it became worth it (ie. in the event HT was widely adopted and
actually worked well in practice). If you're saying the above is the state
of the art in multithreaded gc, then yes, it's not worth it. If there was
a multithreaded gc technique that cost 3% for single threaded apps, and all
processors in existence were HT-enabled, then the equation starts to look
different. I never said this was the case now, or somebody should start
typing this new gc in, I just wondered if the technology existed in case it
became interesting.

I find it slightly ironic that I'm the "HyperThread guy" in this thread,
since I'm pretty anti-hype myself. Oh well. Another slightly frustrating
thing is that 90% of this thread was taken up by stuff that's documented
(poorly, but still) on the net (whether SMP is supported now, whether async
io works, whether non-systhreads work in native code, how the global mutex
works, etc.). It would be so nice if the FAQ was better formatted and we
had a way of quickly updating it, but no, I don't have any time for that
and I'm sure nobody else does either. And, of course, nobody reads the FAQ
before posting anyway. :)

Chris

Dave Berry

Nov 26, 2002, 2:06:33 PM

to Xavier Leroy, Lauri Alanko, caml...@inria.fr

At 11:01 25/11/2002, Xavier Leroy wrote:
>For large
>parallel computations, clusters (distributed-memory systems) are the
>norm.

I think this is an exaggeration. I've just started work at the UK National
e-Science Centre, which is linked to the Edinburgh Parallel Computing
Centre. We have several new multiprocessor machines (16 processors, 64
processors, etc.), and there doesn't seem to be a shortage of uses for them.

Dave.

Gregory Morrisett

Nov 26, 2002, 2:25:39 PM

to Xavier Leroy, Blair Zajac, caml...@inria.fr

>Factors of 10 are always nice :-) Just kidding. What I meant is the
>following: assume making the Caml runtime system thread-safe
>entails a 25% slowdown on program execution. (This can easily
>happen if e.g. we have to lock a mutex at each heap
>allocation.)

I would assume that allocation (in the nursery) is done
via a pointer bump, and that it would be easy enough to
have separate nurseries for threads. Then the only
synchronization that's needed is for GC. At least, that's
how we did it with SML/NJ.
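
As an illustration of the scheme described above (not SML/NJ's or OCaml's actual code), a per-thread nursery with pointer-bump allocation might be sketched in C as follows. The `nursery` type and the `nursery_init` and `nursery_alloc` names are hypothetical, chosen only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-thread nursery: a private buffer plus a bump pointer. */
typedef struct {
    uint8_t *start;  /* first byte of the nursery */
    uint8_t *next;   /* next free byte (the "bump" pointer) */
    uint8_t *limit;  /* one past the last byte */
} nursery;

static void nursery_init(nursery *n, void *buf, size_t size) {
    assert(n != NULL && buf != NULL);
    n->start = buf;
    n->next = n->start;
    n->limit = n->start + size;
}

/* Allocate `size` bytes, rounded up to word alignment. No lock is taken,
   because only the owning thread ever touches this nursery. Returns NULL
   when the nursery is full; that is the point where a real runtime would
   have to synchronize with the other threads for a collection. */
static void *nursery_alloc(nursery *n, size_t size) {
    size_t avail = (size_t)(n->limit - n->next);
    if (size > avail) return NULL;
    size_t rounded = (size + sizeof(void *) - 1) & ~(sizeof(void *) - 1);
    if (rounded > avail) return NULL;
    void *p = n->next;
    n->next += rounded;  /* the whole allocation is this one pointer bump */
    return p;
}
```

Each thread would own one such nursery (for example in thread-local storage), so the fast path needs no mutex; only the "nursery full" case requires coordination.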

-Greg

Damien Doligez

Nov 27, 2002, 8:15:29 AM

to caml...@inria.fr

On Monday, Nov 25, 2002, at 23:20 Europe/Paris, Chris Hecker wrote:

> However, HT changes the cost/benefit equation. How much remains to be
> seen, of course.

Do you really think so? In my experience, 95% of the costs of threads
(with shared memory) are in the debugging (of the threads implementation,
AND of the programs). Cheap SMP machines and HT do not change the
cost/benefit equation very much.

More important, you don't need threads and shared memory to make use
of a SMP machine. Any kind of parallelism will do. Several processes
with message-passing can easily get you 100% load on all your processors.
Also, message-passing is more general; for example it will work on clusters.

So my opinion is: multiprocessing good, threads bad.
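
A minimal sketch of the "several processes with message-passing" approach, assuming a POSIX system: the parent forks workers that share no memory with it, and each worker sends its partial result back as one message on a pipe. The names `sum_range`, `parallel_sum` and `MAX_WORKERS` are hypothetical, and the summation is only a stand-in for real work.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

enum { MAX_WORKERS = 64 };

/* Work done by one process: sum the integers in [lo, hi). */
static uint64_t sum_range(uint64_t lo, uint64_t hi) {
    uint64_t s = 0;
    for (uint64_t i = lo; i < hi; i++) s += i;
    return s;
}

/* Sum [0, n) using `nworkers` child processes. Returns 0 on success and
   stores the total in *out, or returns -1 on error. */
static int parallel_sum(uint64_t n, int nworkers, uint64_t *out) {
    assert(nworkers > 0 && nworkers <= MAX_WORKERS && out != NULL);
    int rd[MAX_WORKERS];
    pid_t pid[MAX_WORKERS];
    int started = 0, status = 0;
    uint64_t chunk = n / (uint64_t)nworkers;

    for (int w = 0; w < nworkers; w++) {
        int fd[2];
        if (pipe(fd) != 0) { status = -1; break; }
        pid_t p = fork();
        if (p < 0) { close(fd[0]); close(fd[1]); status = -1; break; }
        if (p == 0) {               /* child: computes its slice, sends it */
            close(fd[0]);
            uint64_t lo = chunk * (uint64_t)w;
            uint64_t hi = (w == nworkers - 1) ? n : lo + chunk;
            uint64_t part = sum_range(lo, hi);
            ssize_t k = write(fd[1], &part, sizeof part);
            _exit(k == (ssize_t)sizeof part ? 0 : 1);
        }
        close(fd[1]);               /* parent keeps only the read end */
        rd[w] = fd[0];
        pid[w] = p;
        started++;
    }

    uint64_t total = 0;
    for (int w = 0; w < started; w++) {
        uint64_t part = 0;
        if (read(rd[w], &part, sizeof part) != (ssize_t)sizeof part)
            status = -1;
        else
            total += part;
        close(rd[w]);
        waitpid(pid[w], NULL, 0);
    }
    if (status == 0) *out = total;
    return status;
}
```

Because the only communication is the message on the pipe, the same structure carries over to sockets between machines, which is the sense in which message-passing also works on clusters.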

-- Damien

Tim Freeman

Nov 27, 2002, 10:40:34 AM

to v...@ontil.ihep.su, db...@mail.ru, caml...@inria.fr

> I tried OCaml in a non memory-consuming numerical applications on SMP.
>Seems to work well enough (100% load of all the processors).

Wrong metric. You want speedup, not CPU utilization. You can get CPU
utilization for free by running an infinite loop. Did the application
run anywhere near N times faster when you were using N processors?
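
To make the distinction concrete, here is a small C sketch of the two quantities one would actually measure; the function names `speedup` and `efficiency` are hypothetical, and the inputs are wall-clock times, not CPU load.

```c
#include <assert.h>

/* Speedup: how many times faster the run on N processors is than the
   run on one processor, from measured wall-clock times. */
static double speedup(double t_one, double t_n) {
    assert(t_one > 0.0 && t_n > 0.0);
    return t_one / t_n;
}

/* Parallel efficiency: speedup divided by the processor count.
   1.0 means perfect scaling; 1.0 / nprocs means no gain at all. */
static double efficiency(double t_one, double t_n, int nprocs) {
    assert(nprocs > 0);
    return speedup(t_one, t_n) / (double)nprocs;
}
```

For example, a job that takes 100 s on one processor and still takes 100 s on four has a speedup of 1 and an efficiency of 0.25, even if all four processors show 100% load.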

--
Tim Freeman
t...@fungible.com
GPG public key fingerprint ECDF 46F8 3B80 BB9E 575D 7180 76DF FE00 34B1 5C78

Chris Hecker

Nov 27, 2002, 1:10:42 PM

to Damien Doligez, caml...@inria.fr

[sorry for the long-winded response]

>Do you really think so ? In my experience, 95% of the costs of threads
>(with shared memory) are in the debugging (of the threads implementation,
>AND of the programs). Cheap SMP machines and HT do not change the
>cost/benefit equation very much.

Like I said in my previous mail, I think it's going to be similar to
MMX/SSE. The performance improvement you get is not worth the development
and support headache, until the technology is ubiquitous. Once it's
everywhere, it becomes worthwhile. I'm using a middleware library for my
game right now that requires MMX. That's finally an acceptable
requirement. On xbox, which is a fixed platform with a known cpu, every
game uses SSE, because it's just guaranteed to be there, and can make a big
difference if you're willing to work with its problems (using structure of
arrays layout, etc.). And let's not even talk about the insanity of the
PS2 architecture. Xbox2 will use a CPU with HT, because there won't be any
Intel CPUs that don't have HT, so it'll get used there by apps.

Now, as you point out, threads are complicated to design, program, and
debug. I agree with this completely. As I said, I never use threaded
designs if I can avoid it. However, if it becomes very easy to spawn very
small scale parallel threads in C on an HT processor, then it could make a
big performance difference for some algorithms. People are working on C
compilers that have these extensions built in. Intel's got one
now. They'll be first, everyone will ignore it until the installed base is
big enough, and then it'll go into msvc. MMX, SSE, and 3dnow followed the
exact same path.

The reason this is different (or has the potential to be different) with HT
compared to discrete cpus is that a) HT is free so it will be ubiquitous
eventually, and b) HT drops the thread context switch time to 0. It's not
worth starting up a thread on another cpu to do a few instructions worth of
work, but it is conceivable that it would be for HT. Again, I think this
will mirror MMX. The original version of MMX had a horrible context switch
time and overloaded the FPU registers. It was worthless. They fixed
it. I assume there are similar gotchas with the first version of HT. But,
in a couple revs, they'll fix it and it will be possible to have a second
thread do half the work in a small loop, with no overhead (there'll be a hw
thread pool, hw wait on mutex/sleep, etc.).

The reason HT can make a performance difference is that your app is
stalling in the CPU all the time anyway. Even tight loops aren't memory
bandwidth bound (unless it's a copy or fill), they're memory access bound;
there's a huge difference between the two. HT can take advantage of the
latter and give you way more utilization, even on a small-scale loop. In
theory, anyway. :) But, as I said, I have [non-Intel] colleagues who have
seen big wins with HT on some applications, enough to make them say, "huh,
this actually works!"

Now, you could just say, "hey, caml's not for that kind of low-level stuff",
which is a fine response. However, I've been doing a lot of low-level stuff
in my game, all in caml (linear algebra, 3d transforms, bitmap operations,
etc.), and it's so close to being good enough to just stay in caml and not
have to drop to C. I understand the point of using the right tool for the
job, but there is overhead (both cognitive and development-process-wise,
both important) associated with hooking something in C, and so it would be
really nice to stay in caml all the time. Bringing this back to HT, this
is the kind of feature that requires inria to do it, because I don't think
anybody else understands the gc. By contrast, I could probably get an SSE
code generator working if I thought it was worth it. But there's no way I
could multithread the gc. :)

>More important, you don't need threads and shared memory to make use
>of a SMP machine. Any kind of parallelism will do. Several processes
>with message-passing can easily get you 100% load on all your processors.
>Also, message-passing is more general; for example it will work on clusters.

Sure, but an HT cpu shares L1 and L2 caches between the threads. This
means that you really want your threads to be working on the same data and
code if you can help it. It'll still work for processes, but you're going
to thrash way more than if you're doing local stuff.

Again, I'm not an HT zealot; I don't even know if it's going to
succeed. But, I do think it has the potential to have a big impact on
performance oriented programming, and it would be great if there's a plan
for supporting it in caml if it actually works. If it's simply not
possible to multithread the gc well, then that's that. But it seems like
something you want to have simmering on the mental back burner in case it
turns out you want it later.

Sorry for the huge post,

Gerd Stolpmann

Nov 27, 2002, 4:07:58 PM

to Chris Hecker, Damien Doligez, caml...@inria.fr

On 2002.11.27 19:04, Chris Hecker wrote:
> Now, as you point out, threads are complicated to design, program, and
> debug. I agree with this completely. As I said, I never use threaded
> designs if I can avoid it. However, if it becomes very easy to spawn very
> small scale parallel threads in C on an HT processor, then it could make a
> big performance difference for some algorithms. People are working on C
> compilers that have these extensions built in. Intel's got one
> now. They'll be first, everyone will ignore it until the installed base is
> big enough, and then it'll go into msvc. MMX, SSE, and 3dnow followed the
> exact same path.

If it is really easy to spawn a second thread (or wake an existing thread up),
this could be useful for OCaml's runtime system internally. I can imagine that
it is not that difficult to rewrite the GC so that it runs in two threads. I
don't mean that it runs in parallel with the rest of the program (expensive
locking problems), but that the runtime system wakes up two GC threads when
necessary and waits until both threads have done their job. That would reduce
the time spent in GC, maybe from 30% to 20% for a typical program. Of course,
this is only possible if there are good ways to parallelize the GC such that
the extra coordination time does not eat up the extra CPU power.

Just an idea; I really do not know whether it is doable (or worth doing).
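
The control flow of that idea (the program stops, two workers split the job, the program resumes when both are done) can be sketched in C with POSIX threads. This is only an illustration of the fork-join pattern, not OCaml's collector: `gc_slice`, `scan_slice` and `scan_in_two_threads` are hypothetical names, counting set bytes in a mark array stands in for real GC work, and the threads are created and joined each time for brevity where a real runtime would more likely wake pre-existing ones.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical unit of work: one worker scans marks[lo, hi). */
typedef struct {
    const unsigned char *marks;
    size_t lo, hi;
    size_t live;  /* result, written only by the worker that owns the slice */
} gc_slice;

static void *scan_slice(void *arg) {
    gc_slice *s = arg;
    size_t live = 0;
    for (size_t i = s->lo; i < s->hi; i++)
        if (s->marks[i]) live++;
    s->live = live;
    return NULL;
}

/* The caller plays the role of the stopped program: it starts two
   workers on disjoint halves, does nothing else, and continues only
   after both have finished. Returns 0 on success, -1 on error. */
static int scan_in_two_threads(const unsigned char *marks, size_t n,
                               size_t *live) {
    assert(marks != NULL && live != NULL);
    gc_slice a = { marks, 0, n / 2, 0 };
    gc_slice b = { marks, n / 2, n, 0 };
    pthread_t ta, tb;
    if (pthread_create(&ta, NULL, scan_slice, &a) != 0) return -1;
    if (pthread_create(&tb, NULL, scan_slice, &b) != 0) {
        pthread_join(ta, NULL);
        return -1;
    }
    pthread_join(ta, NULL);
    pthread_join(tb, NULL);
    *live = a.live + b.live;  /* safe: both workers are done */
    return 0;
}
```

The slices are disjoint, so the workers need no locks between them; the only coordination cost is the start and the two joins, which is the overhead that would have to stay small for the scheme to pay off. On some systems this needs to be compiled with `-pthread`.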

Gerd
------------------------------------------------------------
Gerd Stolpmann * Viktoriastr. 45 * 64293 Darmstadt * Germany
ge...@gerd-stolpmann.de http://www.gerd-stolpmann.de
------------------------------------------------------------

Lauri Alanko

Nov 28, 2002, 1:48:03 AM

to caml...@inria.fr

On Mon, Nov 25, 2002 at 11:01:33AM +0100, Xavier Leroy wrote:
> One aspect of wisdom is to know when not to do something oneself, but
> leave it to others. Scheduling I/O and computation concurrently, and
> managing process stacks, is the job of the operating system. Trying
> to do it entirely in a user-mode program is just not reasonable.

Nevertheless that is the way many language implementations do it, mainly
because their idea of what a thread should look like and how it should
be used differs from what e.g. POSIX threads (or at least their common
implementations) provide. Pthreads are just too heavy.

So if I understand correctly, benefits of user-level threads include:

* Thread creation speed (no context switches)
* Minimal memory footprint
* Flexibility (e.g. inter-thread exceptions)

whereas using system threads gives us:

* Ease of implementation
* Better handling of blocking functions in foreign libraries

Now this is of course a matter of taste, but I'd say that the former
weighs much more than the latter. The problems with gethostbyname can be
averted even with user-level threads (the standard way is spawning an
external server process for each gethostbyname call), whereas there's no
way to get the benefits of user-level threads while using system
threads (short of writing one's own threading system, which is also
pretty much impossible unless you have at least continuations...)

Thankfully it seems like system threads will be much lighter at least in
Linux 2.6...

Lauri Alanko
l...@iki.fi

Quetzalcoatl Bradley

Nov 28, 2002, 4:20:56 AM

to caml...@inria.fr

While the topic of threads is fresh...

Suppose you have an OCaml library (native code) to be embedded in a
multithreaded C application. The library is compiled with -output-obj
unix.cmxa threads.cmxa and the C program calls caml_startup at the
beginning.

Then I create a few C threads and they all call into the OCaml library
occasionally. Before calling in, they first call leave_blocking_section,
and afterwards they call enter_blocking_section.

At this point, the first time a call is made into OCaml,
Mutex.create is called, which crashes during a GC inside
"oldify_local_roots". Hash_retaddr(retaddr) is called, and the result
looked up in the frame_descriptors table, but the result is a NULL
pointer which crashes when dereferenced.

Is there anything special that needs to be done to "bless" external
threads before they call into ocaml?

Unfortunately it isn't really feasible for me to have ocaml create all
the threads and have the C code called from ocaml instead. I presume
that would be easier though.

Thanks,

Quetzalcoatl Bradley
qbra...@blackfen.com

Vitaly Lugovsky

Nov 29, 2002, 8:28:16 AM

to Tim Freeman, db...@mail.ru, caml...@inria.fr

On Wed, 27 Nov 2002, Tim Freeman wrote:

> > I tried OCaml in a non memory-consuming numerical applications on SMP.
> >Seems to work well enough (100% load of all the processors).
>
> Wrong metric. You want speedup, not CPU utilization. You can get CPU
> utilization for free by running an infinite loop. Did the application
> run anywhere near N times faster when you were using N processors?

Yes. But, >80% of the system time was in external C functions, without
any memory management. So, really wrong metric...