Hey, this is an update on the state of fibers in nodejs since last month. I've made a lot of improvements to fibers, fixed a lot of bugs, and wanted to send out an update. I'll try to address a lot of the questions and misconceptions people have about fibers here.
= = "What are fibers?" = =
Fibers, similar to coroutines and "green threads", are a way to write asynchronous code in a synchronous way. Essentially they are threads, but without the scheduler. That means the operating system isn't cutting off your threads and starting them again whenever it feels like it. The job of swapping to another fiber is up to the client. And at no point will more than one fiber be running; you don't get to run multiple fibers at once.
= = "How does that help me write asynchronous code?" = =
Consider a simple program which copies one file to another. Ignoring for a moment that stream.pipe() exists, you're going to have to write at least two callbacks nested within each other. You also have to check for errors manually, which is no fun. That's where fibers come in.
Check out this gist:
Note the difference between copyFileWithoutFiber() and copyFileWithFiber(). Without fibers you must check for errors in your callback to ensure you're not doing anything stupid. Then you have to nest another callback which also checks for errors. If your workflow gets even marginally complicated your code will quickly turn into a deeply-nested mess.
But with fibers you can just use a simple try/catch block to catch errors. No callbacks are needed, you just yield execution back to the original code. When the file is read, your fiber will pick back up where it left off. Notice the code looks synchronous, but to the client copyFileWithoutFiber() and copyFileWithFiber() are /indistinguishable/. Both functions will return immediately and call the callback when their job is done. But with fibers you can isolate your callback code succinctly and then just worry about your workflow.
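The gist isn't reproduced here, so what follows is only a rough sketch of the two shapes being compared, not the gist's actual code. The function names come from the text above; everything else is an assumption, including that the module is loaded with `var Fiber = require('fibers')` and exposes `Fiber.current`, `Fiber.yield()`, `run()` and `throwInto()`. The require is wrapped in a try/catch so the callback version still works where the native add-on isn't installed.

```javascript
var fs = require('fs');

// Assumption: fibers is installed and exports the Fiber constructor.
// It is a native add-on, so fall back gracefully when it is missing.
var Fiber = null;
try { Fiber = require('fibers'); } catch (e) { /* callback version still works */ }

// Callback style: every step nests one level deeper and checks err by hand.
function copyFileWithoutFiber(src, dst, callback) {
  fs.readFile(src, function (err, data) {
    if (err) return callback(err);
    fs.writeFile(dst, data, function (err) {
      if (err) return callback(err);
      callback(null);
    });
  });
}

// Fiber style: same contract for the caller, but the body reads top to bottom.
function copyFileWithFiber(src, dst, callback) {
  if (!Fiber) {
    return process.nextTick(function () {
      callback(new Error('fibers module is not available'));
    });
  }
  Fiber(function () {
    var fiber = Fiber.current;
    // Resume the fiber with the result, or throw the error into it.
    function resume(err, value) {
      if (err) fiber.throwInto(err); else fiber.run(value);
    }
    try {
      fs.readFile(src, resume);
      var data = Fiber.yield();   // parked here until readFile finishes
      fs.writeFile(dst, data, resume);
      Fiber.yield();              // parked here until writeFile finishes
    } catch (err) {
      return callback(err);
    }
    callback(null);
  }).run();
}
```

Either function would be called the same way, e.g. `copyFileWithFiber('a.txt', 'b.txt', function (err) { ... })`, which is the point: the caller can't tell which style is behind it.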
= = "How does it work?" = =
Creating a new fiber creates an entirely new execution stack. The first frame on this stack is a function that you pass in. After your fiber has been created you can freely switch into and out of that stack by using run() and yield() respectively. When you switch back into a stack everything is exactly how you left it. You resume in the middle of your function, local variables are intact, closures are fine, and so on. You can pass data into the fiber with run(), and return data back to the caller with yield() (you could also do this with globals). You can even throw exceptions /into/ the fiber with throwInto().
Keep in mind that this is the exact same thing that happens with a thread. Except with a thread you don't control when your thread starts and stops, so you have to deal with peculiarities like locking and race conditions. With a fiber you explicitly switch into and out of fibers, so none of that is a problem.
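To make the run()/yield()/throwInto() flow concrete without needing the native add-on, here is a stand-in built on plain JavaScript generators, which are a different mechanism from fibers. The mapping is an assumption made purely for illustration: `next(value)` plays the role of run(value), the `yield` keyword plays the role of yield(), and `throw(err)` plays the role of throwInto(err). One real difference to keep in mind: a generator suspends only its own frame, whereas a fiber suspends an entire stack.

```javascript
// Stand-in for a fiber body: records what it sees so the order is visible.
function* worker(log) {
  var total = 0;                         // local state survives every suspension
  while (true) {
    try {
      var n = yield total;               // like yield(total): hand a value back, pause
      total += n;                        // resumes here with the value passed in
      log.push('added ' + n);
    } catch (err) {
      log.push('caught ' + err.message); // like an error delivered by throwInto()
    }
  }
}

var log = [];
var task = worker(log);
var first = task.next().value;           // like the first run(): start the body
var second = task.next(5).value;         // like run(5): resume and pass data in
var third = task.throw(new Error('boom')).value; // like throwInto(err)
var fourth = task.next(2).value;         // locals are intact after the exception

console.log(first, second, third, fourth); // 0 5 5 7
console.log(log);                          // [ 'added 5', 'caught boom', 'added 2' ]
```

The exception thrown in from outside lands at the suspended `yield`, gets handled by the ordinary try/catch around it, and `total` is still 5 afterwards.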
= = "Sounds expensive..." = =
It's actually not that expensive. Context switches (switching between stacks/threads/fibers) aren't really that expensive; your computer is generally switching between threads thousands of times per second.
In terms of memory, each fiber (while running) will consume around 64 KB of memory. That memory basically all goes to the stack. Fibers are reused in a pool to avoid constantly creating and deleting fibers; creating lots of short-lived fibers is totally acceptable. The pool is currently set to a maximum of 120 fibers. For those of you keeping track at home that's about 8 MB of memory, which incidentally is the cost of a /single/ pthread (by default).
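The arithmetic behind that comparison, as a quick check. The 64 KB stack size and the 120-fiber pool limit are the figures quoted above; the 8 MB default pthread stack is typical of Linux and is an assumption here, since the real value depends on the platform and ulimit settings.

```javascript
var KB = 1024;
var MB = 1024 * KB;

var fiberStackBytes = 64 * KB;   // per-fiber stack size quoted above
var poolLimit = 120;             // maximum pooled fibers quoted above
var pthreadStackBytes = 8 * MB;  // assumed default pthread stack (platform-dependent)

var poolBytes = fiberStackBytes * poolLimit;
var poolMB = poolBytes / MB;                                 // 7.5
var fibersPerPthread = pthreadStackBytes / fiberStackBytes;  // 128

console.log('full pool: ' + poolMB + ' MB');
console.log('fibers per pthread stack: ' + fibersPerPthread);
```

So a completely full pool comes to 7.5 MB, just under one default-sized pthread stack.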
If you compare fibers to callbacks in a micro-benchmarking situation you're likely to say "oh no this is too slow!" but that mantra is foolish. Yes, fibers will be considerably slower, but that's merely a consequence of the V8 C++/JS membrane. Basically any time you switch between C++ code and JS code you're going to take a relatively large performance hit when compared to a pure JavaScript function. This is why so much of V8's own runtime is written directly in JavaScript. But what you must keep in mind is that this cost, compared to something like reading from a database, is insubstantial.
To put this in perspective, these two functions are about the same in terms of CPU time spent. In fact F2() is slightly slower.
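function F1() {
  // Create a fiber whose body immediately yields back to the caller.
  var fiber = Fiber(function() {
    yield();
  });
  fiber.run(); // enter the fiber; runs until yield()
  fiber.run(); // resume after yield() and let the body finish
}

function F2() {
  // Allocate two small buffers and write into each.
  var buf = new Buffer(5);
  buf.write('hello');
  var buf2 = new Buffer(5);
  buf2.write('world');
}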
In both cases there are 6 switches between JS and C++ (object destructors count as a JS/C++ switch), and that makes up most of the cost. You wouldn't feel bad about calling native node functions, so don't feel bad about using fibers.
However, I would recommend against using fibers for long polling. Having very large numbers of fibers active (many thousands) may begin to impact garbage collection. For long polling it's not hard to write your long poll in a callback and then start a fiber when you need to do some work.
= = "But there are hundreds of libraries which do this already." = =
I'm certain there's nothing out there that will result in code as elegant as you can achieve with fibers. The closest things you can get are transformation-based approaches like streamline.js and narrativejs, but fibers are just easier to work with.
= = "How can I try it out?" = =
npm install fibers
I'm happy to field questions about fibers, just ask. I've been spending a lot of time over the past month or so thinking about this so I've probably got an answer for you.
Thanks for reading!
~ Marcel