Speeding up Nano_mutex

Markus Mottl

May 2, 2014, 4:31:14 PM

to ocaml...@googlegroups.com

Hi,

I've just noticed that the performance of Nano_mutex.lock/unlock could
be improved in several ways. My guess is that, together, these changes
could amount to a speedup of around 25% in most use cases.

"lock" is currently a recursive function, which cannot be inlined. It
should be easy to rewrite the function without "rec", at least for the
overwhelmingly likely case that the mutex is not being held.
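To illustrate the split, here is a minimal sketch assuming a simplified mutex record. The names "holder", "not_held", "lock_slow" and the "wait" callback are hypothetical, and this is not Core's actual Nano_mutex code. Only the rarely taken slow path stays recursive, so the small non-recursive "lock" becomes a candidate for inlining by ocamlopt:

```ocaml
(* Hypothetical simplified mutex: [holder] is the id of the thread
   holding the lock, or [not_held] when the mutex is free. *)
type t = { mutable holder : int }

let not_held = -1

(* Out-of-line slow path: it may loop, so it stays recursive.
   [wait] is a hypothetical stand-in for blocking until release. *)
let rec lock_slow t ~me ~wait =
  if t.holder = not_held then t.holder <- me
  else if t.holder = me then failwith "lock: mutex already held by this thread"
  else begin
    wait ();
    lock_slow t ~me ~wait
  end

(* Fast path without [rec]: small and non-recursive, so it may be
   inlined at call sites. Under the OCaml 4 runtime lock the
   test-and-set is safe only because nothing between the test and the
   assignment allocates or otherwise allows a thread switch. *)
let lock t ~me ~wait =
  if t.holder = not_held then t.holder <- me
  else lock_slow t ~me ~wait
```

In the uncontested case the whole operation is one comparison and one assignment; everything that may loop lives in "lock_slow".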

It is also unnecessary to call "current_thread_id", which is very
expensive, all over again after thread contention (with_blocker),
because its result obviously cannot change.
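A sketch of hoisting the lookup out of the retry loop, again under stated assumptions: "current_thread_id" below is a hypothetical counting stand-in for the expensive "Thread.id (Thread.self ())" call, and "wait" stands in for blocking on the contended mutex. The id is computed once in "lock" and passed down, so retries after contention do not repeat the call:

```ocaml
(* Hypothetical stand-in for the expensive thread-id lookup; it counts
   its calls so the effect of hoisting is observable. *)
let id_calls = ref 0

let current_thread_id () =
  incr id_calls;
  42

type t = { mutable holder : int }

let not_held = -1

(* Retry loop: reuses [me] on every iteration instead of looking the
   id up again after each wait. A top-level function is used rather
   than a local closure, so no closure is allocated per call. *)
let rec lock_loop t ~me ~wait =
  if t.holder = not_held then t.holder <- me
  else if t.holder = me then failwith "lock: mutex already held by this thread"
  else begin
    wait ();
    lock_loop t ~me ~wait
  end

(* The id is looked up exactly once per [lock] call. *)
let lock t ~wait = lock_loop t ~me:(current_thread_id ()) ~wait
```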

The "unlock" function could benefit from reordering its first two "if"
branches. It currently tests for an unlikely error case first; testing
the common case first would usually save one branch.
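A minimal sketch of the reordered "unlock", assuming a simplified record with a hypothetical "num_waiting" counter and a hypothetical "signal" callback for waking a blocked thread. The likely case (the caller holds the lock) is tested first, and both error cases come afterwards:

```ocaml
(* Hypothetical simplified mutex: [holder] is the owning thread's id,
   [num_waiting] counts threads blocked on it. Assumes real thread ids
   are never equal to [not_held]. *)
type t = { mutable holder : int; mutable num_waiting : int }

let not_held = -1

(* [signal] is a hypothetical stand-in for waking one blocked thread. *)
let unlock t ~me ~signal =
  if t.holder = me then begin
    (* Likely case first: a correct program gets here after a single
       comparison. *)
    t.holder <- not_held;
    if t.num_waiting > 0 then signal ()
  end
  else if t.holder = not_held then failwith "unlock: mutex is not locked"
  else failwith "unlock: mutex is held by another thread"
```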

It is currently necessary to call "Thread.id" and "Thread.self"
separately. Performance would improve quite noticeably (> 20%) if a
single external call sufficed. This is a little tricky, because there
are several thread implementations (VM threads, POSIX, Win32), so it
should probably be added to the OCaml runtime.

Last (and probably least), we could avoid some closure allocations due
to "with_blocker". These probably don't make much of a difference,
though, because they only happen during mutex contention, and proper
thread code should never hold mutexes for long.

The likely cases (uncontested mutex, correct usage) should have the
most optimized instruction path.

Regards,
Markus

--
Markus Mottl http://www.ocaml.info markus...@gmail.com

Yaron Minsky

May 2, 2014, 4:52:56 PM

to ocaml...@googlegroups.com

Awesome! Do you have a patch, or is this just from reading?

y

Markus Mottl

May 2, 2014, 9:51:13 PM

to ocaml...@googlegroups.com

No, I don't have a patch, but I read the code and measured Thread.id +
Thread.self, which obviously eat most of the time. Just adding a
"noalloc" external declaration for "Thread.id" gives you a ~20%
performance gain.

I think it might be possible to avoid calling any external C-functions
from lock/unlock except for contested mutexes. You could use the hook
architecture in the OCaml runtime for installing functions that get
called each time a thread acquires the runtime lock. See file
"signals.h" in the exposed runtime API and the hooks
"caml_leave_blocking_section_hook" and
"caml_try_leave_blocking_section_hook".

Just store a function pointer in the hook that first calls the
previously installed function and then, for example, sets an OCaml
reference to the current thread id. Since context switches are
comparatively rare and expensive, this should hardly have any
performance impact. Then lock/unlock no longer have to make expensive
function calls to obtain the current thread id; they just need to read
a ref value. I wouldn't be surprised if you could more than double the
speed of Nano_mutex that way.
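The hooks themselves are C function pointers, so the following is only an OCaml model of the idea, with every name hypothetical: a ref plays the role of "caml_leave_blocking_section_hook", "switch_to" simulates a thread reacquiring the runtime lock, and the installed hook chains to the previous one before refreshing a cached thread id that the lock path then reads:

```ocaml
(* Model of the runtime's hook slot (really a C function pointer). *)
let leave_blocking_section_hook : (unit -> unit) ref = ref (fun () -> ())

(* Model of "which thread now owns the runtime lock". *)
let runtime_owner = ref 0

(* Cached id that lock/unlock read instead of calling externals. *)
let cached_thread_id = ref (-1)

(* Install a hook that runs the previously installed hook first, then
   refreshes the cached thread id. *)
let install_thread_id_hook () =
  let previous = !leave_blocking_section_hook in
  leave_blocking_section_hook :=
    (fun () ->
      previous ();
      cached_thread_id := !runtime_owner)

(* Simulated context switch: thread [id] reacquires the runtime lock. *)
let switch_to id =
  runtime_owner := id;
  !leave_blocking_section_hook ()

type t = { mutable holder : int }

let not_held = -1

(* The uncontested path is now a ref read, not Thread.id (Thread.self ()). *)
let try_lock t =
  if t.holder = not_held then begin
    t.holder <- !cached_thread_id;
    true
  end
  else false
```

A real version would be C code installed into both hooks named above; this model only shows the chaining and the cheap read on the lock path.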

Regards,
Markus

Yaron Minsky

May 4, 2014, 10:45:31 PM

to ocaml...@googlegroups.com, Stephen Weeks

Looping in Sweeks, in case he hasn't seen the thread.

y

Yaron Minsky

May 5, 2014, 8:56:59 AM

to Stephen Weeks, ocaml...@googlegroups.com

Sweeks replied on another thread, but for anyone else: it seems reasonable, and we'll likely get to it in a few months, but it's not top of stack right now.
