Hi,
I've just noticed that the performance of Nano_mutex.lock/unlock could
be improved in several ways. Together these could (my guess) amount
to an improvement of around 25% in most use cases.
"lock" is currently a recursive function, which cannot be inlined. It
should be easy to rewrite the function without "rec", at least for the
overwhelmingly likely case that the mutex is not being held.
It is also unnecessary to call "current_thread_id", which is very
expensive, again after thread contention (with_blocker), because its
result cannot change while the same thread is retrying.
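
To illustrate the two points above, here is a minimal sketch. It is a
simplified single-threaded model, not the actual Nano_mutex code: the
names "holder", "lock_slow", "wait" and "id_calls" are made up, "wait"
stands in for blocking via with_blocker, and the stubbed
"current_thread_id" only counts how often it is called.

```ocaml
(* Simplified single-threaded model, NOT the actual Nano_mutex code.
   All names below are hypothetical. *)
type t = { mutable holder : int }

let not_held = -1 (* sentinel: no thread holds the mutex *)
let create () = { holder = not_held }

(* Stand-in for the expensive thread id lookup; counts its calls. *)
let id_calls = ref 0
let current_thread_id () = incr id_calls; 0

(* Slow path: the only recursive function. [my_id] is passed in, so
   the lookup is not repeated on retry. [wait] stands in for blocking
   via with_blocker. *)
let rec lock_slow t ~my_id ~wait =
  if t.holder = not_held then begin
    t.holder <- my_id;
    Ok ()
  end
  else if t.holder = my_id then Error "recursive lock attempt"
  else begin
    wait ();
    lock_slow t ~my_id ~wait
  end

(* Fast path: not recursive, hence a candidate for inlining. *)
let lock t ~wait =
  let my_id = current_thread_id () in
  if t.holder = not_held then begin
    t.holder <- my_id;
    Ok ()
  end
  else lock_slow t ~my_id ~wait
```

In this model the uncontended case never enters the recursive
function, and a contended "lock" performs exactly one id lookup no
matter how many times it has to wait.
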
The "unlock" function could benefit from reordering the first two "if"
branches. It currently tests for an unlikely error case first. Testing
the likely case (the caller holds the mutex) first would usually save
one branch.
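
A minimal sketch of the reordered test, again on a simplified model
rather than the actual Nano_mutex code. The field "holder", the
sentinel "not_held" and the error strings are made up, and the thread
id is passed in as "my_id" to keep the example self-contained. The
reordering assumes that a real thread id can never equal the sentinel.

```ocaml
(* Simplified model, NOT the actual Nano_mutex code.
   All names below are hypothetical. *)
type t = { mutable holder : int }

(* Sentinel for "not held". Assumption: real thread ids are never
   negative, so [my_id] can never equal it. *)
let not_held = -1

let unlock t ~my_id =
  (* Likely case first: the caller holds the mutex. *)
  if t.holder = my_id then begin
    t.holder <- not_held;
    Ok ()
  end
  (* The error cases are only examined when the likely test fails. *)
  else if t.holder = not_held then
    Error "attempt to unlock an unlocked mutex"
  else Error "attempt to unlock a mutex held by another thread"
```

With correct usage only the first comparison is executed. Both error
cases are still reported, they merely cost one comparison more.
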
Obtaining the thread id currently requires two separate calls,
"Thread.self" and then "Thread.id". It would quite noticeably improve
performance (by more than 20%) if just one external call were
necessary. This is a little tricky, because there are several thread
implementations (VM-threads, POSIX, Win32). Hence this should probably
be added to the OCaml runtime.
Last (and probably least) we could avoid some closure allocations due
to "with_blocker". These probably don't make much of a difference
though, because they only happen during mutex contention. Proper
thread code should never hold mutexes for long.
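
For illustration, one way to avoid such a closure is to pass the mutex
to the callback explicitly, so that the callback can be a top-level
function with nothing to capture. This is only a sketch on a
simplified model: the types and the names "with_blocker_arg",
"wait_if_locked" and "waits" are made up and do not reflect the
actual implementation.

```ocaml
(* Simplified model, NOT the actual Nano_mutex code.
   All names below are hypothetical. *)
type blocker = { mutable waits : int } (* stand-in for the blocker *)
type t = { mutable holder : int; blocker : blocker }

let not_held = -1
let is_locked t = t.holder <> not_held
let wait b = b.waits <- b.waits + 1 (* stand-in for blocking *)

(* Closure-taking style: the anonymous function captures [t], so a
   closure generally has to be allocated on each call (unless the
   compiler manages to inline it away). *)
let with_blocker t f = f t.blocker

let wait_with_closure t =
  with_blocker t (fun b -> if is_locked t then wait b)

(* Closure-free style: [t] is handed to the callback as an argument,
   so the callback has no free variables and can live at top level. *)
let with_blocker_arg t f = f t t.blocker
let wait_if_locked t b = if is_locked t then wait b
let wait_without_closure t = with_blocker_arg t wait_if_locked
```

Both variants behave identically. Whether the first one really
allocates depends on the compiler version and its inlining settings.
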
The likely cases (uncontested mutex, correct usage) should have the
most optimized instruction path.
Regards,
Markus
--
Markus Mottl
http://www.ocaml.info markus...@gmail.com