Channels as a superior alternative to event emitters

pcxunl...@gmail.com

Sep 22, 2014, 9:16:05 AM
to strati...@googlegroups.com
Channels are used in Clojure (core.async) and Go. They are based on CSP (Communicating Sequential Processes). Channels are superior to event emitters, and I will now explain why.

Imagine you are writing a program. It probably does all kinds of asynchronous things: reading or writing to the disk, maybe communicating with some other server, maybe waiting for a click to happen on a DOM node...

The normal way of doing this is with callbacks and/or event listeners, which are essentially the same thing. So you use Node's "fs.readFile", or DOM "addEventListener" or whatever.

This all works fine! Because you have multiple parts of your program operating simultaneously, it's difficult to reason about the control flow, but other than that this way of doing things works: you receive events, and you also *never miss events*. This last bit here is very important!

It's very annoying to a user if they click on something and nothing happens. Or if you're receiving data from a server in chunks, it would be terrible if you missed part of it, thereby corrupting things. Or if you're trying to keep your program in sync with some external source, if you missed some events, your internal data is now out of sync!

To make this very concrete, let's talk about a Google Chrome Extension. These extensions are written in JavaScript and can attach event handlers which will be called on various events, like when a tab is opened, updated, moved around, or closed.

So you're making a program which makes it easier to manage tabs in Chrome, and to do so it listens to these tab open/update/close events, and then updates the program's state to match the tabs in Chrome. If you missed any of these events, your program's state is now wrong! This is terrible.

Thankfully, it *never happens*! Ever! Event listeners are clunky and annoying, but they have the critical benefit that they never miss events: even if the user opens 1,000 tabs simultaneously, you will receive each and every single one of those 1,000 events.

Sure, the server might go down half-way through, or various other nasty stuff might happen that would cause events to *not be sent* in the first place, but that's different from *losing* events: if the event is sent at all, it will be received. Period.

Now, moving onto SJS, we have the sjs:event/Emitter function. This gives us a nicer way to deal with events: rather than an event handler that will be called at some random time in the future, we have a stream of events that we can read from. Reading from the stream is synchronous and blocking, so we can use things like waitfor/and, waitfor/or, and try/retract to deal with events. This is great!

There's just one problem with it: it drops events! This means that if you write the normal, idiomatic loop...

    someEventEmitter ..@each(function (x) {
      callSomeFunction(x)
    })

...then you might miss some events from "someEventEmitter"! Why does this happen? If "callSomeFunction" blocks, other code is allowed to run while it waits. That other code then calls "someEventEmitter.emit", which pushes an event into "someEventEmitter". But the "@each" loop is still stuck inside "callSomeFunction", so nothing is waiting to receive the event, and the emitter simply discards it.
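To make the failure mode concrete, here is a small model in plain JavaScript (promises instead of SJS blocking). It is only a sketch: "DroppingEmitter", "next", and "demoDroppedEvents" are hypothetical names, and this is not the sjs:event implementation. It assumes only what is described above: an event reaches the consumer only if the consumer is waiting at the moment of the emit.

```javascript
// Hypothetical model of an emitter that delivers an event only to a
// consumer that is waiting right now. NOT the real sjs:event code.
class DroppingEmitter {
  constructor() {
    this.waiter = null; // resolve function of the waiting consumer, if any
    this.dropped = 0;   // count of lost events, for the demo only
  }

  emit(value) {
    if (this.waiter === null) {
      this.dropped++; // nobody is waiting: the event is silently lost
      return;
    }
    const resolve = this.waiter;
    this.waiter = null;
    resolve(value);
  }

  // Wait for the next event.
  next() {
    return new Promise((resolve) => {
      this.waiter = resolve;
    });
  }
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function demoDroppedEvents() {
  const emitter = new DroppingEmitter();
  const seen = [];

  // Consumer: the equivalent of the @each loop with a blocking body.
  const consumer = (async () => {
    for (;;) {
      const x = await emitter.next();
      if (x === "stop") return;
      seen.push(x);
      await sleep(10); // stands in for a callSomeFunction that blocks
    }
  })();

  emitter.emit(1); // consumer is waiting: delivered
  emitter.emit(2); // consumer is busy: dropped
  emitter.emit(3); // consumer is busy: dropped
  await sleep(30); // by now the consumer is waiting again
  emitter.emit("stop");
  await consumer;
  return { seen, dropped: emitter.dropped };
}
```

Running "demoDroppedEvents()" resolves to one seen event and two dropped ones, and nothing in the consumer's code gives any hint that events went missing.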

And so the default, normal, idiomatic way of dealing with events loses events. This isn't just some rare thing either: it happens commonly. And what's worse, it fails *silently*: you might test your code and it works fine, but then when you deploy it and a user opens a bunch of tabs at once, suddenly you miss events and your internal state is wrong. In the best case you'll trip up something and get an error, in the worst case you won't even realize that it's wrong and you get data corruption.

So, what's the answer to this? Buffers! The problem is that downstream isn't fast enough to keep up with upstream, so we store the unused events in a buffer, so that downstream can then grab them later, when it's ready.

Sounds good, except... buffers have a fixed size. And when the buffer fills up, it starts dropping events again! You can say, "buffer only 1,000 items" or whatever, but that then means that if a user opens 1,001 tabs at once, you now drop events again. You might say, "aw, but that never happens, who opens 1,000 tabs at once?", but you're wrong: some users do. With Chrome's Task Manager, it's possible to keep literally thousands of tabs open at once, without consuming any RAM or CPU.

Well, okay, then I'll just make the buffer really big, like 9,999,999, or maybe even Infinity! Okay, that works, now you'll never miss any events. Great, we're now on par with event listeners!

So now you have the fun of adding a buffer to every single event emitter you use! And if you forget (which is easy to do), then you won't even find out about it until later, because dropped events fail silently!

So, rather than using efficient event listeners, you now have to use inefficient buffers on every single event emitter, even in the situations where you don't really need a buffer in the first place. Why do you *always* have to use a buffer? Well, there are two situations: either upstream adds the buffer, or downstream adds the buffer.

If upstream is supposed to add the buffer, then it has to always add it, because upstream doesn't know whether downstream will block or not. If downstream is supposed to add the buffer, then it only needs to add the buffer if it actually blocks. But because *any* function can block at any time, it's not always trivial to determine whether it blocks or not. And what if it starts off non-blocking, but then later you make it blocking, but forget to add a buffer? It's just easier and safer to *always* buffer.

Phew, what a pain, but that's the price you pay for having nice waitfor/and semantics, right? Not true, we can do better. This is where channels come in.

A channel is *identical* to an event emitter in every way, except for one tiny detail: with event emitters, sending out an event never blocks. But putting something into a channel blocks. That's it. But that one detail changes everything.

The way channels work is, you can take an item from the channel, and you can put an item into the channel. If something tries to take from the channel and there's nothing in it, it will block. This is the same as event emitters. But if something tries to put something into the channel and nothing is waiting to take from it, it will also block. This is the key difference.

This means that a consumer never receives more than it can handle, and a producer never sends more than the consumer can handle. It also means that channels provide synchronization: a put and a take only unblock when one side is trying to take and the other side is trying to put, at the same time.
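The put/take behaviour described above can be sketched in plain JavaScript, with promises standing in for SJS blocking. This is a minimal illustration and not the module linked later in this post; "Channel", "demoChannel", and the field names are made up for the example.

```javascript
// Minimal unbuffered channel sketch. put() resolves only once a take()
// has received the value; take() resolves only once a put() has supplied one.
class Channel {
  constructor() {
    this.putters = []; // pending puts: { value, resolve }
    this.takers = [];  // pending takes: resolve functions
  }

  put(value) {
    return new Promise((resolve) => {
      const taker = this.takers.shift();
      if (taker !== undefined) {
        taker(value); // a take is already waiting: hand the value over
        resolve();
      } else {
        this.putters.push({ value, resolve }); // "block" until a take arrives
      }
    });
  }

  take() {
    return new Promise((resolve) => {
      const putter = this.putters.shift();
      if (putter !== undefined) {
        putter.resolve(); // unblock the waiting put
        resolve(putter.value);
      } else {
        this.takers.push(resolve); // "block" until a put arrives
      }
    });
  }
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function demoChannel() {
  const chan = new Channel();
  const seen = [];

  // Slow consumer: takes three items, doing blocking work after each one.
  const consumer = (async () => {
    for (let i = 0; i < 3; i++) {
      seen.push(await chan.take());
      await sleep(10);
    }
  })();

  // Producer: each put waits until the consumer is ready for it.
  for (const x of [1, 2, 3]) {
    await chan.put(x);
  }
  await consumer;
  return seen; // [1, 2, 3]: nothing is lost even though the consumer is slow
}
```

The producer here is simply held back at each "await chan.put(x)" until the consumer comes around for another take, which is the whole point: the slow side sets the pace, and no event is discarded.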

Now suddenly the default idiom of using "@each" on a channel is totally fine: you will never miss events, because it won't put onto a channel until the take is ready. And you still get the loveliness with waitfor/and, waitfor/or, and try/retract.

You can *optionally* add a buffer to a channel, which can be either fixed size or sliding: if the fixed-size buffer fills up, it doesn't drop events like an event emitter does; instead, the put blocks! The sliding buffer allows a channel to never block, but it does drop events. This is totally fine: sometimes you need to drop events. But it should be a conscious decision, not the default!
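The difference between the two buffer kinds comes down to what happens when the buffer is full. Here is a rough sketch in plain JavaScript; "FixedBuffer", "SlidingBuffer", "offer", and "poll" are hypothetical names, and I'm assuming the Clojure core.async convention that a sliding buffer discards the *oldest* item to make room.

```javascript
// Fixed buffer: refuses new items when full. A channel using it would
// block the put until there is room, rather than drop anything.
class FixedBuffer {
  constructor(size) {
    this.size = size;
    this.items = [];
  }

  // Returns true if the value was stored, false if the buffer is full.
  offer(value) {
    if (this.items.length >= this.size) return false;
    this.items.push(value);
    return true;
  }

  // Removes and returns the oldest item (undefined if empty).
  poll() {
    return this.items.shift();
  }
}

// Sliding buffer: always accepts, so puts never block, but when full it
// discards the oldest item (assumption: core.async-style semantics).
class SlidingBuffer {
  constructor(size) {
    this.size = size;
    this.items = [];
  }

  offer(value) {
    if (this.items.length >= this.size) this.items.shift(); // deliberate drop
    this.items.push(value);
    return true;
  }

  poll() {
    return this.items.shift();
  }
}

const fixed = new FixedBuffer(2);
const fixedResults = [fixed.offer(1), fixed.offer(2), fixed.offer(3)];
// fixedResults is [true, true, false]; fixed.items is still [1, 2]

const sliding = new SlidingBuffer(2);
[1, 2, 3].forEach((x) => sliding.offer(x));
// sliding.items is [2, 3]: the oldest item (1) was dropped on purpose
```

Either way the dropping is something you opted into by picking the sliding buffer, not something that happens behind your back.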

Rather than having to add a buffer to every single event emitter to prevent losing events, it's instead idiomatic to not use buffers with channels unless there's a good reason to.

But wait a minute, what about converting from event listeners (like "addEventListener") to channels? Channels block, but event listeners don't. The answer is that channels keep a queue of pending puts and takes: this essentially acts as a buffer (bounded only by the queue limit described below), so you can put onto a channel from a non-blocking source like "addEventListener" without waiting for the put to complete, and you won't lose events.

----

So! I wrote a channel module, and here's the code: http://pastebin.com/xanjhZTk

Right now, the queues have an arbitrary limit of 1,024 items at any one time, but unlike event emitters, a channel will throw an error when you put too many: no silent failures. And if you really need to, you can configure the limit to be higher.
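As an illustration of the queue limit and of feeding a channel from a callback, here is a sketch in plain JavaScript. It is not the code from the pastebin link; "QueuedChannel", "putNow", and "demoBridge" are hypothetical names, and only the default limit of 1,024 and the throw-on-overflow behaviour are taken from the description above.

```javascript
// Sketch of a channel whose pending-put queue has a hard limit.
// Overflow throws an error instead of silently dropping the value.
class QueuedChannel {
  constructor(limit = 1024) {
    this.limit = limit;
    this.puts = [];   // values put while no take was waiting
    this.takers = []; // pending takes: resolve functions
  }

  // Non-blocking put, for callers (like event listeners) that cannot wait.
  putNow(value) {
    const taker = this.takers.shift();
    if (taker !== undefined) {
      taker(value);
      return;
    }
    if (this.puts.length >= this.limit) {
      throw new Error("channel put queue is full (" + this.limit + " pending puts)");
    }
    this.puts.push(value);
  }

  take() {
    if (this.puts.length > 0) {
      return Promise.resolve(this.puts.shift());
    }
    return new Promise((resolve) => this.takers.push(resolve));
  }
}

async function demoBridge() {
  const chan = new QueuedChannel();

  // Stand-in for a callback passed to addEventListener: it fires
  // synchronously and cannot block, so it queues onto the channel.
  const listener = (event) => chan.putNow(event);
  for (let i = 1; i <= 5; i++) listener("tab-" + i);

  // The consumer drains the queue at its own pace, in order.
  const seen = [];
  for (let i = 0; i < 5; i++) {
    seen.push(await chan.take());
  }
  return seen;
}

function demoOverflow() {
  const chan = new QueuedChannel(2); // tiny limit, just for the demo
  chan.putNow("a");
  chan.putNow("b");
  try {
    chan.putNow("c"); // third pending put exceeds the limit
    return "no error";
  } catch (e) {
    return e.message;
  }
}
```

With the default limit, "demoBridge()" sees all five events in order. With the limit set to 2, "demoOverflow()" gets an error on the third put, so an overloaded channel fails loudly instead of losing data quietly.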

It does work, and I have tested it, but it could still use some polish, documentation, unit tests, etc. which I will provide over time.

It would be nice if this could replace the sjs:event module, but if that's unacceptable, then having both sjs:event and sjs:channel at the same time is acceptable as well.

Regardless, I will be using this module in my own code (which deals with many asynchronous issues), both to improve the module, and also to prove that channels work well.