Calling close! on core.async timer channel considered harmful

Antonin Hildebrand

Apr 11, 2016, 3:46:38 PM
to Clojure Dev
Hello,

Today I hit a tricky, non-deterministic, timing-related bug in my code while using timers from core.async. This happened in ClojureScript, but I believe the same problem applies to the Clojure implementation as well. I had naively called close! on a channel returned from core.async's (timeout ...) call.

Now pause and think about what could possibly go wrong...

The problem is that the timers implementation has a nice optimization: "timeout events are coalesced". If a new timeout is scheduled for a time close to that of an already scheduled timeout, the existing channel is reused. That means a (timeout ...) call may return an existing "recycled" timeout channel which is shared with some other code.
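
The coalescing can be observed from a REPL. The following is a minimal sketch for JVM Clojure, assuming the implementation reuses a channel whose deadline falls within a small window (about 10 ms, as far as I can tell from the timers source) at or after the requested deadline. That window is an implementation detail, not a documented guarantee, so the result depends on timing.

```clojure
(require '[clojure.core.async :refer [timeout]])

;; Ask for the slightly later deadline first, then a slightly earlier one.
;; If both calls run within a few milliseconds of each other, the second
;; deadline lands just before the first one and the channel may be reused.
(def a (timeout 1005))
(def b (timeout 1000))

;; Frequently true, but timing-dependent: do not rely on either outcome.
(println "same channel object?" (identical? a b))
```

When this prints true, any operation performed on b is also performed on a, even though the two calls look independent.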

If you were unaware of the actual timers implementation, you might expect to get a fresh channel and be allowed to do whatever you want with it. The docs are not helpful here.

The problem is that this internal optimization changes which channel operations are valid on the returned timer channel. In this case close! is clearly an unsafe operation.

The trickiness of this issue is in its non-determinism. My code had been working fine until I hit a specific performance profile in which some timers got coalesced. Tracking it down was difficult because the issue was not reproducible deterministically, and after adding a bunch of logging calls the problem went away intermittently (until it returned in another shape or form). The issue was also non-local. My failing code (failing due to timeouts closing unexpectedly early) had nothing to do with the actual "bad" code calling close!, which happened to be completely unrelated and in a different namespace.
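
The early-closing symptom can be sketched as follows. This is an illustration for JVM Clojure, not the original failing code. The function name early-close-demo and the 1005/1000 ms values are made up for the example, and whether the two channels are actually shared depends on timing and on the coalescing window of the implementation.

```clojure
(require '[clojure.core.async :refer [timeout close! <!!]])

(defn early-close-demo
  "Returns whether the two timeouts shared a channel and how long the
  'victim' take actually blocked. Hypothetical helper for illustration."
  []
  (let [victim (timeout 1005)  ; stands in for unrelated code waiting ~1 s
        mine   (timeout 1000)  ; may be the very same channel object
        start  (System/currentTimeMillis)]
    (close! mine)              ; the "bad" call described in this post
    (<!! victim)               ; returns nil at once if the channel was shared
    {:shared?   (identical? victim mine)
     :waited-ms (- (System/currentTimeMillis) start)}))

(println (early-close-demo))
```

When :shared? is true, :waited-ms should be far below one second: the victim's timeout fired early because of a close! it never saw. When :shared? is false, the take blocks for roughly the full second.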

I understand that it is probably too late to change the semantics of the timeout API. I would at least propose implementing an internal channel flag which would mark a channel as "read-only" or "owned". Mutation calls on such a channel would raise an exception / assert with some explanation. The internal timers code would still be able to close the channel because it would "own" it.
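
To make the idea concrete, here is a rough userland approximation for JVM Clojure. It is not the internal flag proposed above: it wraps a channel in an object that delegates reads and rejects close!. The name read-only is hypothetical, and the sketch relies on the protocols in clojure.core.async.impl.protocols, which are implementation details and may change.

```clojure
(require '[clojure.core.async :as async]
         '[clojure.core.async.impl.protocols :as impl])

(defn read-only
  "Wraps ch so that takes are delegated to it but close! throws.
  Hypothetical helper; the wrapper is not a write port at all."
  [ch]
  (reify
    impl/ReadPort
    (take! [_ handler] (impl/take! ch handler))
    impl/Channel
    (close! [_]
      (throw (ex-info "close! called on a read-only channel" {:channel ch})))
    (closed? [_] (impl/closed? ch))))

(def guarded (read-only (async/timeout 50)))

;; A stray close! now fails loudly instead of affecting other code.
(try
  (async/close! guarded)
  (catch clojure.lang.ExceptionInfo e
    (println "rejected:" (.getMessage e))))

;; The timer itself still closes the underlying channel, so this take
;; completes with nil once the timeout elapses.
(println "take returned:" (async/<!! guarded))
```

In this sketch the timers code keeps a reference to the underlying channel and therefore still "owns" it, while callers only ever see the wrapper.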

regards,
Antonin