Unexpected core.async timeout behaviour


Peter Taoussanis

unread,
Mar 28, 2014, 2:48:12 AM3/28/14
to clo...@googlegroups.com
Hi all, quick question:

`(dotimes [_ 50000] (go (<! (async/timeout 100))))` runs as expected.
`(dotimes [_ 50000] (go (<! (async/timeout (+ 50 (rand-int 100))))))` produces an error:

java.lang.AssertionError: Assert failed: No more than 1024 pending takes are allowed on a single channel. (< (.size takes) impl/MAX-QUEUE-SIZE)

It appears (?) that there's a (surprisingly low?) limit to the number of unique timeout timestamps that can be simultaneously queued. 

Is this the expected behaviour running Clojure 1.6.0, core.async 0.1.278.0-76b25b-alpha?

Much appreciated, thanks! Cheers :-)

Tim Visher

unread,
Mar 28, 2014, 7:50:07 AM3/28/14
to Clojure
Hi Peter,
I _still_ have no personal experience with core.async ( :((( ), but I
did spot this message coming through recently which I believe answers
your question with 'yes'.

https://groups.google.com/forum/#!searchin/clojure/1024/clojure/NIPIzJ7l6RA/Idm1_2GMlCMJ

It sounds like you need to use a different kind of channel buffer
(whatever that means :).

--

In Christ,

Timmy V.

http://blog.twonegatives.com/
http://five.sentenc.es/ -- Spend less time on mail

Peter Taoussanis

unread,
Mar 28, 2014, 8:08:12 AM3/28/14
to clo...@googlegroups.com
Hi Tim, thanks for the info!

It's not clear to me that this is the same issue, unfortunately. (Though I may be missing something obvious).

In the example I've provided above, we're actually creating a _new_ channel for each take. The problem appears to be either some interaction between the loop and core.async that I'm not aware of, or something on the _implementation-end_ that is bumping up against the referenced issue (i.e. an insufficiently-buffered channel somewhere).

So there's actually no channel here that I could be buffering, since it's not my channel that's overflowing. Again, modulo me missing something obvious :-)

Does that make sense?

Tim Visher

unread,
Mar 28, 2014, 8:16:04 AM3/28/14
to Clojure
Ah, forgive me for not seeing the subtlety and getting excited about
being able to help in some small way on a core.async problem. :)

Can one of the adults chime in?

Peter Taoussanis

unread,
Mar 28, 2014, 8:21:16 AM3/28/14
to clo...@googlegroups.com
Please, not at all! Appreciate any ideas :-)

Timothy Baldridge

unread,
Mar 28, 2014, 9:24:23 AM3/28/14
to clo...@googlegroups.com
This is caused by an interesting interaction of two things: 1) channels can have no more than 1024 pending takes at a time, and 2) (timeout) "caches" its return value for a given ms argument within a window of time. At the moment, the window is about 5-10ms.

This error message is normally the result of a bug. That is to say, if you have 1024 pending (not buffered, but actually blocking) takes/puts, then your system is normally not considering back pressure in some way. So an arbitrary limit was imposed to keep people from writing bad code. If you think about it, pending takes/puts can create unbounded queues on the input/output of a channel. This limit is an attempt to keep that queue size bounded.
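A minimal sketch of that limit (assuming clojure.core.async is on the classpath; the exact assertion message may vary by version):

```clojure
(require '[clojure.core.async :as async])

;; An unbuffered channel with no producer: each take! registers a
;; pending take. Exceeding 1024 pending takes on the one channel
;; trips the MAX-QUEUE-SIZE assertion described above.
(def c (async/chan))

(try
  (dotimes [_ 1025]
    (async/take! c (fn [_])))   ;; the 1025th pending take throws
  (catch AssertionError e
    (println (.getMessage e))))
```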

However, in your problem you have something else going on as well. In order to make calls to timeout fast inside an inner loop, timeout will often return the same channel from more than one call to the function. So if you call timeout every millisecond, you'll get the same channel about 5-10 times. This increases performance, and since the timeout logic involved isn't highly accurate anyway, this optimization rarely causes problems.
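The coalescing is easy to observe directly. This sketch is timing-dependent, so treat the result as illustrative rather than deterministic:

```clojure
(require '[clojure.core.async :as async])

;; Two timeout calls for the same interval, made back-to-back,
;; usually fall inside the same ~5-10ms coalescing window and so
;; return the *identical* channel object.
(let [t1 (async/timeout 100)
      t2 (async/timeout 100)]
  (identical? t1 t2))   ;; frequently true when called within the window
```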

If this is causing a problem in an actual system I'd love to hear about it. Neither of these issues is caused by a hard limit in the design of core.async, so they could be tweaked, but we'll probably only do so if we have a concrete example of a problem. A use case would be needed, rather than an arbitrary failing test case.

Timothy





--
“One of the main causes of the fall of the Roman Empire was that–lacking zero–they had no way to indicate successful termination of their C programs.”
(Robert Firth)

Peter Taoussanis

unread,
Mar 28, 2014, 9:37:31 AM3/28/14
to clo...@googlegroups.com
Okay, fantastic - appreciate the detailed info Timothy!

This actually came up in staging today; I reduced it to the toy example here. Now that I understand what's happening, let me think about it a little and get back to you.

BTW I don't think I've ever thanked you personally for your work on core.async. It's incredible, a real game changer and a pleasure to work with - so thank you.

Peter Taoussanis

unread,
Mar 28, 2014, 9:52:47 AM3/28/14
to clo...@googlegroups.com
One thing I'm not clear on: if I've understood your explanation correctly, I would expect the 100ms timeout to produce this error _more_ (not less) often.

So can I just confirm some things here?

1. `async/timeout` calls can (always?) get "cached" to the nearest TIMEOUT_RESOLUTION_MS.
2. In this tight loop example, that means that `<!` is sometimes getting called against the same (cached) timeout channel.
3. It's happening sufficiently often (due to the high loop count + speed) to overflow the [unbuffered] timeout channel's implicit take buffer.

Is that all right?

If so, why isn't the fixed `(async/timeout 100)` channel producing the same (or worse) behaviour? Is something preventing it from being cached in the same way?

Ghadi Shayban

unread,
Apr 15, 2014, 8:19:26 PM4/15/14
to clo...@googlegroups.com
Dredging this back up: you've read the scenario properly. Timers are coalesced at a particular resolution.

Use cases needing more than 1024 active handles on a channel can use a mult. For example, if you had to time out every request at the same moment exactly 5 minutes in the future, you could create (mult (timeout 300000)) and then give every request a fresh tap of it.
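The mult/tap pattern above might look like this; `handle-timeout` and `await-deadline` are hypothetical names for illustration:

```clojure
(require '[clojure.core.async :as async :refer [chan mult tap timeout go <!]])

;; One shared 5-minute timeout, fanned out through a mult. Each
;; request taps the mult with its own fresh channel, so no single
;; channel accumulates more than one pending take per tap.
(def deadline-mult (mult (timeout 300000)))

(defn await-deadline [request-id handle-timeout]
  (let [t (tap deadline-mult (chan))]
    (go
      (<! t)                       ;; parks until the shared timeout closes
      (handle-timeout request-id))))
```

When the source timeout channel closes, the mult closes every tapped channel, so each waiting go block unparks at (roughly) the same moment.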

In the fixed-100 case I think you're getting lucky, whereas with the random window you're getting unfavorable coalescing, which seems counterintuitive.