Protocol fns and concurrency


Christophe Grand

May 10, 2010, 10:34:11 AM
to cloju...@googlegroups.com
Hi all,

If I understand correctly, a protocol fn's fastest path is taken when the class of the first argument is the same as on the previous invocation; this makes sense for tight loops where data tends to be homogeneous.
What bothers me is when two (or more) concurrent loops each process a different datatype (which may happen for low-level fns such as conj): each thread keeps resetting the .mre field in MethodImplCache to its own favorite value, effectively degrading performance.

Is this a real problem? (If not, please disregard the next questions.)
Could this be addressed, one day, with invokedynamic support?
Does the .mre field need to be volatile (since it points to an immutable Entry object), and could making .mre non-volatile alleviate this problem?
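[For readers unfamiliar with the mechanism, the caching in question can be sketched roughly as follows. This is a hypothetical Java illustration with invented names, not Clojure's actual MethodImplCache, whose fields and lookup strategy differ:]

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Rough sketch of a single-entry "most recent" cache in front of a
// class -> implementation table. Hypothetical illustration only.
class ProtocolFnSketch {
    static final class Entry {
        final Class<?> c;
        final Function<Object, Object> fn;
        Entry(Class<?> c, Function<Object, Object> fn) { this.c = c; this.fn = fn; }
    }

    final Map<Class<?>, Function<Object, Object>> impls = new ConcurrentHashMap<>();
    private volatile Entry mre; // most recent entry, shared by ALL threads

    Object invoke(Object arg) {
        Class<?> c = arg.getClass();
        Entry e = mre;
        if (e != null && e.c == c) {
            return e.fn.apply(arg);               // fast path: same class as last call
        }
        Function<Object, Object> fn = impls.get(c); // slow path: table lookup
        mre = new Entry(c, fn);                   // two threads looping over different
                                                  // types keep overwriting this field
        return fn.apply(arg);
    }
}
```

The worry above is exactly the last line: with one shared mre, concurrent loops over different types invalidate each other's fast path on every call.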

Thanks,

Christophe


Sean Devlin

May 10, 2010, 11:15:32 AM
to cloju...@googlegroups.com
Just read your post on all of this; nice write-up. Can you provide a real example with observed performance numbers for the switching penalty? I think that will answer your first question (and help with the others).

Thanks,
Sean

Christophe Grand

May 10, 2010, 3:12:24 PM
to cloju...@googlegroups.com
Sean,

Here are my actual measurements and experiments: http://gist.github.com/396387
--
Brussels, 23-25/6 http://conj-labs.eu/
Professional: http://cgrand.net/ (fr)
On Clojure: http://clj-me.cgrand.net/ (en)

Christophe Grand

May 11, 2010, 4:48:33 AM
to cloju...@googlegroups.com
I guess the answer is that low-level protocols are intended to be extended when datatypes are defined (so that calls go through the interface path).
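[As a hypothetical Java analogy of that interface path — the names below are invented, not Clojure's actual compilation output: a protocol is backed by a JVM interface, and a datatype that supplies its methods at definition time simply implements that interface, so calls are ordinary virtual dispatch with no cache involved.]

```java
// Stand-in for a protocol's backing interface.
interface ICounted { int cnt(); }

// "Inline" extension: the datatype implements the interface directly.
final class MyVec implements ICounted {
    final int n;
    MyVec(int n) { this.n = n; }
    public int cnt() { return n; }           // plain interface call, no lookup
}

class Dispatch {
    static int count(Object o) {
        if (o instanceof ICounted)           // interface path: fastest
            return ((ICounted) o).cnt();
        throw new IllegalArgumentException(  // real code would fall back to the
            "no ICounted impl for " + o);    // cached per-class lookup here
    }
}
```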



Rich Hickey

May 18, 2010, 8:12:01 AM
to Clojure Dev


On May 10, 10:34 am, Christophe Grand <christo...@cgrand.net> wrote:
> Hi all,
>
> If I understand correctly, a protocol fn's fastest path is taken when the
> class of the first argument is the same as on the previous invocation;
> this makes sense for tight loops where data tends to be homogeneous.
> What bothers me is when two (or more) concurrent loops each process a
> different datatype (which may happen for low-level fns such as conj):
> each thread keeps resetting the .mre field in MethodImplCache to its own
> favorite value, effectively degrading performance.
>
> Is this a real problem? (If not, please disregard the next questions.)

No more so than any other caching strategy.

> Could this be addressed, one day, with invokedynamic support?

No, same issues there. Essentially, the usual strategy for inline
caches is that, should they be determined to be heavily polymorphic
(megamorphic), they back out of caching altogether and use the slow
path always. I think what is happening here is better, as the cost of
this (in this case, ineffective) caching is negligible compared to the
overhead of detecting the polymorphism and bailing out of caching.
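[The usual bailout strategy described above can be sketched like this — a hypothetical, simplified illustration, not HotSpot's or invokedynamic's actual machinery: cache per-class targets while few classes are seen, then permanently fall back to the slow path once the site is judged megamorphic.]

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Sketch of an inline cache that detects megamorphism and bails out.
class BailingCallSite {
    static final int MAX_CACHED = 2;         // beyond this: megamorphic
    final Map<Class<?>, Function<Object, Object>> impls = new ConcurrentHashMap<>();
    private final Map<Class<?>, Function<Object, Object>> cache = new ConcurrentHashMap<>();
    volatile boolean megamorphic = false;

    Object invoke(Object arg) {
        Class<?> c = arg.getClass();
        if (!megamorphic) {
            Function<Object, Object> fn = cache.get(c);
            if (fn != null) return fn.apply(arg);    // cached fast path
            if (cache.size() < MAX_CACHED) {
                fn = impls.get(c);
                cache.put(c, fn);
                return fn.apply(arg);
            }
            megamorphic = true;                      // too many classes: stop caching
        }
        return impls.get(c).apply(arg);              // permanent slow path
    }
}
```

The point being made: the bookkeeping needed to detect megamorphism and bail out costs more than simply letting a cheap single-entry cache miss.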

> Does the .mre field need to be volatile (since it points to an immutable
> Entry object), and could making .mre non-volatile alleviate this problem?

It was non-volatile before, and may return to that, but that was
turned off in a general sweep of other (now resolved) issues. Note
that this is not really a concurrency thing at all. You might see it
just from single threaded use over a heterogeneous collection.

I think what is in question is the presumption of uniform performance.
That is not to be expected. The fastest support is inline definition,
as then you have an interface hooked into the class of the target
itself. (Note that even an instanceof check can vary in perf due to
the complexity of the class hierarchies involved). Next fastest is
(temporally) homogeneous use, optimized by the caching. Finally,
heterogeneous use, not helped by the caching, the perf of which is
dominated by the lookup, not the caching of its result. This scales
gracefully and, even in the latter case, is quite fast.

Rich

Christophe Grand

May 28, 2010, 6:47:21 AM
to cloju...@googlegroups.com
On Tue, May 18, 2010 at 2:12 PM, Rich Hickey <richh...@gmail.com> wrote:

> > Does the .mre field need to be volatile (since it points to an immutable
> > Entry object), and could making .mre non-volatile alleviate this problem?
>
> It was non-volatile before, and may return to that, but that was
> turned off in a general sweep of other (now resolved) issues. Note
> that this is not really a concurrency thing at all. You might see it
> just from single threaded use over a heterogeneous collection.

It doesn't bother me that a branch-prediction-style caching strategy fails on a heterogeneous collection.
I was worried about performance degrading with the number of concurrent threads, and about whether the most recent entry could be made more thread-local.
However, I built a clojure.jar with a non-volatile mre field and benchmarked it on a Sun T2000: same performance profile as with a volatile mre.

> I think what is in question is the presumption of uniform performance.

Protocols trade the expression problem against non-uniform performance.
The only thing I was really concerned about was that the shared mre might cause performance to degrade with the number of threads.

> That is not to be expected. The fastest support is inline definition,
> as then you have an interface hooked into the class of the target
> itself. (Note that even an instanceof check can vary in perf due to
> the complexity of the class hierarchies involved). Next fastest is
> (temporally) homogeneous use, optimized by the caching. Finally,
> heterogeneous use, not helped by the caching, the perf of which is
> dominated by the lookup, not the caching of its result. This scales
> gracefully and, even in the latter case, is quite fast.

True: while too focused on my question, I forgot that the real fastest path is through the interface associated with a protocol.

Thanks for your reply,

Christophe
