Calling Convention Benchmarks

Mike Anderson

Jan 15, 2013, 10:34:42 PM
to numerica...@googlegroups.com
Just for fun and possibly to give us some facts to inform API design decisions, I wrote some micro-benchmarks using Hugo Duncan's excellent Criterium library to test the performance of different method calling conventions.

It's clearly an artificial micro-benchmark, so the usual caveats apply.

Preliminary conclusions:
- Primitive functions and Java interface calls using primitives are fastest (around 2 ns)
- Boxed regular functions, boxed Java interface calls and pre-defined protocol calls are next (around 8 ns)
- Protocol calls that have been extended are a bit slower, but still pretty fast (around 14 ns)
- A multi-method that dispatches on class is quite a bit slower (around 90 ns)
- A multi-method that double-dispatches on [(class a) (class b)] is *much* slower (around 230 ns)
- Reflection really sucks... (around 9000 ns)
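For readers less familiar with the JVM side, here is a minimal plain-Java sketch of three of the calling conventions in the list: a primitive interface call, a boxed (Object-based) interface call, and a reflective call. The interface and method names are hypothetical and this is not the benchmark code; it only shows where boxing and reflective lookup enter. Protocols and multimethods are Clojure constructs and are not modelled here.

```java
import java.lang.reflect.Method;

public class CallingConventions {
    // Hypothetical interface over primitive doubles: no boxing, plain interface dispatch.
    interface PrimitiveOp {
        double apply(double a, double b);
    }

    // Hypothetical interface over Objects: arguments and result are boxed,
    // similar in spirit to a non-primitive Clojure fn's invoke(Object, Object).
    interface BoxedOp {
        Object apply(Object a, Object b);
    }

    public static double add(double a, double b) {
        return a + b;
    }

    public static double viaPrimitive(double a, double b) {
        PrimitiveOp op = (x, y) -> x + y;
        return op.apply(a, b);
    }

    public static double viaBoxed(double a, double b) {
        BoxedOp op = (x, y) -> (Double) x + (Double) y;
        // a and b are autoboxed to Double on the way in; the result is unboxed on the way out.
        return (Double) op.apply(a, b);
    }

    public static double viaReflection(double a, double b) throws Exception {
        // Look the method up by name on every call, then invoke it with boxed arguments.
        Method m = CallingConventions.class.getMethod("add", double.class, double.class);
        return (Double) m.invoke(null, a, b);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(viaPrimitive(1.0, 2.0));
        System.out.println(viaBoxed(1.0, 2.0));
        System.out.println(viaReflection(1.0, 2.0));
    }
}
```

All three return the same sum; what differs is the work around the addition: nothing extra for the primitive path, box/unbox allocations for the boxed path, and a method lookup plus boxed `Method.invoke` for the reflective path.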

My takeaway from all this is that we are right to focus on using protocols as the primary dispatch mechanism.

Konrad Hinsen

Jan 16, 2013, 5:51:02 AM1/16/13
to numerica...@googlegroups.com
Mike Anderson writes:

> Just for fun and possibly to give us some facts to inform API design decisions, I wrote
> some micro-benchmarks using Hugo Duncan's excellent Criterium library to test the
> performance of different method calling conventions.

...

> My takeaway from all this is that we are right to focus on using protocols as the
> primary dispatch mechanism.

Nice! The result is not surprising but it's always better to know than to estimate.

Konrad.

Edmund Jackson

Jan 17, 2013, 2:55:39 AM1/17/13
to numerica...@googlegroups.com
Really interesting, thanks for putting that together Mike.

Matthew Willson

Jan 24, 2013, 6:34:54 AM1/24/13
to numerica...@googlegroups.com
Ah, thanks for doing this! Slightly disappointing re multimethods, although not entirely unexpected. And of course nanoseconds are nothing compared to the execution time for operations on a middle-sized or bigger matrix :)

Suspect it might also be possible to speed up the multimethod a bit with a dedicated class for dispatch whose hashCode is given in terms of (.hashCode (.getClass a)) and (.hashCode (.getClass b)). Might give that a try.
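A minimal Java sketch of the idea, under stated assumptions: a hypothetical `ClassPair` key whose `hashCode` is built directly from the two classes' hash codes, used to look up an implementation in a `HashMap`. All names here are illustrative, not the code that was actually benchmarked in this thread.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BinaryOperator;

public class ClassPairDispatch {
    // Hypothetical dispatch key holding the runtime classes of both arguments.
    static final class ClassPair {
        final Class<?> a;
        final Class<?> b;

        ClassPair(Class<?> a, Class<?> b) {
            this.a = a;
            this.b = b;
        }

        @Override
        public int hashCode() {
            // Hash computed straight from the two classes' hash codes.
            return 31 * a.hashCode() + b.hashCode();
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof ClassPair)) return false;
            ClassPair p = (ClassPair) o;
            // Class objects are unique per class loader, so identity comparison is enough.
            return a == p.a && b == p.b;
        }
    }

    private final Map<ClassPair, BinaryOperator<Object>> table = new HashMap<>();

    public void register(Class<?> a, Class<?> b, BinaryOperator<Object> impl) {
        table.put(new ClassPair(a, b), impl);
    }

    public Object invoke(Object x, Object y) {
        // A fresh key object is allocated on every call.
        BinaryOperator<Object> impl = table.get(new ClassPair(x.getClass(), y.getClass()));
        if (impl == null) {
            throw new IllegalArgumentException("No method for " + x.getClass() + ", " + y.getClass());
        }
        return impl.apply(x, y);
    }

    public static void main(String[] args) {
        ClassPairDispatch add = new ClassPairDispatch();
        add.register(Long.class, Long.class, (x, y) -> (Long) x + (Long) y);
        add.register(Double.class, Long.class, (x, y) -> (Double) x + (Long) y);
        System.out.println(add.invoke(1L, 2L));
        System.out.println(add.invoke(1.5, 2L));
    }
}
```

Note that this matches exact classes only; unlike a Clojure multimethod it has no notion of `isa?` hierarchies or `prefer-method`, so it is a narrower tool.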

-Matt



On Wednesday, 16 January 2013 03:34:42 UTC, Mike Anderson wrote:

Mike Anderson

Jan 24, 2013, 7:18:10 AM1/24/13
to numerica...@googlegroups.com
On Thursday, 24 January 2013 19:34:54 UTC+8, Matthew Willson wrote:
> Ah thanks for doing this! Slightly disappointing re multimethods, although not entirely unexpected. And of course nanoseconds are nothing compared to the execution time for operations on a middle sized or bigger matrix :)

Quite right for big matrices.

They really matter for small matrices, though: in vectorz, for example, a 3D matrix addition takes ~1.5 ns, so even protocol dispatch would be the majority of the execution time :-)

> Suspect it might also be possible to speed up the multimethod a bit with a dedicated class for dispatch whose hashCode is given in terms of (.hashCode (.getClass a)) and (.hashCode (.getClass b)). Might give that a try.

Hmmm, try it if you like, but I'm not sure it's worth it. If the matrix is small you need protocols, and (as you correctly observe) if the matrix is medium/large it doesn't matter anyway... Plus, I expect the cost is mostly in the construction/disposal of the temporary object rather than in computing the hash code.

Matthew Willson

Jan 24, 2013, 8:32:57 AM1/24/13
to numerica...@googlegroups.com
> > Suspect it might also be possible to speed up the multimethod a bit with a dedicated class for dispatch whose hashCode is given in terms of (.hashCode (.getClass a)) and (.hashCode (.getClass b)). Might give that a try.

> Hmmm try if you like but not sure if it is worth it. If the matrix is small you need protocols, and (as you correctly observe) if the matrix is medium/large it doesn't matter anyway.... plus I expect the cost is probably in the construction / disposal of the temporary object rather than the computation of the hashcode.

Fair enough. It was fairly quick to try, though: it turns out it speeds up multimethod dispatch by roughly 2x for me (129us vs 266us). I can push the extra benchmark if you want.

-Matt

Mike Anderson

Jan 24, 2013, 9:02:06 AM1/24/13
to numerica...@googlegroups.com
Cool - the more data points the better!

Sure, push it up. It's a useful reference for anyone interested in this stuff more broadly...

Matthew Willson

Jan 24, 2013, 9:40:17 AM1/24/13
to numerica...@googlegroups.com
OK, done. Looks like I accidentally pushed to the main repo; apologies, I didn't realise I had permission!

FWIW, I think it should be possible to write a dispatcher for class-based binary dispatch that is almost as fast as non-interface-based protocol dispatch, along similar lines to the protocol macros, which IIUC take the dispatch table and compile from it a dispatcher consisting of a bunch of instance checks. Bit too much of a yak shave for me at the moment, though :)
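Roughly what such a "compiled" binary dispatcher might look like, written out by hand in Java as a sketch: the dispatch table is unrolled into straight-line `instanceof` tests, most specific first, so a call does no allocation and no hash lookup. The table, class and method names are hypothetical, and whether Clojure's protocol machinery works exactly this way is the poster's understanding, not something this sketch demonstrates.

```java
public class InstanceCheckDispatch {
    // Hypothetical implementations for each entry in the dispatch table.
    static Object addLongs(Long a, Long b) { return a + b; }
    static Object addDoubles(Double a, Double b) { return a + b; }
    static Object addNumbers(Number a, Number b) { return a.doubleValue() + b.doubleValue(); }

    // Dispatcher for the table
    //   [Long Long] -> addLongs, [Double Double] -> addDoubles, [Number Number] -> addNumbers
    // unrolled into instanceof tests, ordered from most to least specific.
    public static Object add(Object a, Object b) {
        if (a instanceof Long && b instanceof Long) return addLongs((Long) a, (Long) b);
        if (a instanceof Double && b instanceof Double) return addDoubles((Double) a, (Double) b);
        if (a instanceof Number && b instanceof Number) return addNumbers((Number) a, (Number) b);
        throw new IllegalArgumentException("No method for " + a.getClass() + ", " + b.getClass());
    }

    public static void main(String[] args) {
        System.out.println(add(1L, 2L));   // Long/Long entry
        System.out.println(add(1.5, 2.5)); // Double/Double entry
        System.out.println(add(1L, 2.5));  // falls through to the Number/Number entry
    }
}
```

Because `instanceof` also accepts subclasses, the order of the tests decides which entry wins when several match; a generator for this code would have to sort entries by specificity.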

-Matt