All concurrency models suck


Kenton Varda

Nov 15, 2013, 10:33:08 PM
to capnproto
Threads:
- Locking is hard.  Not enough and you corrupt memory randomly, too many and you deadlock randomly.
- Lockless is harder, and only works in restricted use cases.
- OS threads are expensive, and green threads suffer from problems similar to those of event loops -- while still having the other problems of threads.
- Trying to multiplex RPCs over a network connection?  Now you need an extra context switch whenever you receive a message on it.

Event loops / actors:
- Your interface must specify whether it can block.  Realize later you need to block somewhere?  Time to rewrite EVERYTHING.
- If you just assume everything could block and make all APIs asynchronous, performance suffers from bookkeeping overhead and code is painful to write.
- Pretty much everything has to be heap-allocated.
- An uncooperative event callback can cause starvation.
- It's hard to get optimal resource utilization, because in practice you can only use the CPU or main memory or the disk at any particular time.  Sure, there are supposedly APIs for asynchronous disk I/O, but in reality they are hard to use, their implementations are not well-optimized, and lots of disk I/O actually happens through virtual memory (whether swap or mmap()-based) which obviously cannot be asynchronous.  Using main memory and CPU simultaneously requires hyperthreading, which of course requires threading.

Producer/consumer, publish/subscribe:
- Great for Big Data processing, but doesn't fit the request/response model of interactive software or most fine-grained processing.

Pure functional:
- Let's be honest; your program probably has state, and you aren't smart enough to transform it into one that doesn't.
- Despite functional code being inherently parallel, the magical compiler that can parallelize it has yet to be invented.

Transactional Memory:
- May allow event loops to utilize multiple CPUs while still appearing sequential.  Doesn't realistically solve the other problems with event loops.
- May make mutex locking cheaper.  Doesn't make it easier.

============================

After a few months down the EventLoop path I am wondering whether I should have done threads instead.  Initially I wasn't sure how to do pipelining in that case but in retrospect it's not that complicated:  client method calls can still return something promise-like that supports pipelining, but has a wait() method that blocks instead of a then(callback) method.
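Roughly, something like this sketch (names invented, modern C++ for brevity; a real version would also need the pipelining hooks, which I've left out):

#include <condition_variable>
#include <memory>
#include <mutex>
#include <optional>
#include <utility>

// Hypothetical promise-like type whose wait() blocks the calling thread,
// instead of taking a then(callback).
template <typename T>
class BlockingPromise {
public:
  // Blocks until the RPC machinery delivers the result.
  T wait() {
    std::unique_lock<std::mutex> lock(state->mutex);
    state->cv.wait(lock, [&] { return state->value.has_value(); });
    return std::move(*state->value);
  }

  // Called by the transport when the response arrives.
  void fulfill(T value) {
    {
      std::lock_guard<std::mutex> lock(state->mutex);
      state->value = std::move(value);
    }
    state->cv.notify_all();
  }

private:
  struct State {
    std::mutex mutex;
    std::condition_variable cv;
    std::optional<T> value;
  };
  std::shared_ptr<State> state = std::make_shared<State>();
};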

I like my technique of using const-correctness to enforce thread-safety, with MutexGuarded<T> forcing locking when needed (no chance of forgetting to lock).  It seems to make thread-safety somewhat tractable.
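For anyone who hasn't seen the pattern, here's a rough sketch of the idea (simplified; the real kj::MutexGuarded differs in detail):

#include <mutex>
#include <utility>

// The protected value is private; the only way to reach it is through
// lock(), which returns an RAII handle.  Forgetting to lock is therefore
// a compile error, not a latent data race.
template <typename T>
class MutexGuarded {
public:
  template <typename... Args>
  explicit MutexGuarded(Args&&... args) : value(std::forward<Args>(args)...) {}

  class Locked {
  public:
    T* operator->() { return value; }
    T& operator*() { return *value; }
  private:
    friend class MutexGuarded;
    Locked(std::mutex& m, T& v) : guard(m), value(&v) {}
    std::unique_lock<std::mutex> guard;  // unlocks when Locked goes away
    T* value;
  };

  Locked lock() { return Locked(mutex, value); }

private:
  mutable std::mutex mutex;
  T value;
};

// Usage:  MutexGuarded<int> counter(0);  *counter.lock() += 1;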

Starting over now isn't a good idea, though.  Maybe I can add the second model some time in the future, as an option...

-Kenton

Kenton Varda

Nov 15, 2013, 11:22:52 PM
to capnproto
Relatedly, I've decided to significantly relax the thread-safety guarantees in the current event loop design, as handling threads to the degree I've tried turns out to be slow and complicated.

In particular:
- Encourage the use of Cap'n Proto capabilities as the way to do inter-thread communication, since they already provide strong thread-safety guarantees.
- Give up on cross-thread synchronous promise destruction.  Destroying a cross-thread Promise will only cause the task to eventually be canceled, which is anyway the same guarantee you get across RPC.  This is the most significant change, and it massively simplifies the implementation.
- Events queued cross-thread will land in a different queue than local events.  That separate queue will only be checked every now and then, to avoid the need to synchronize on every event.  This is more of an optimization than a semantic change.  (See the sketch after this list.)
- Perhaps remove the ability to have two threads working on building the same message.  This requires an atomic op on every object allocation which has a measurable (not huge, but measurable) performance impact.
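To make the two-queue bullet concrete, a minimal sketch (names invented; not the actual KJ EventLoop):

#include <deque>
#include <functional>
#include <mutex>

// Local events need no synchronization because only the loop's own thread
// touches them; cross-thread events go through a mutex-protected queue
// that is drained only periodically.
class TwoQueueLoop {
public:
  // Must be called from the loop's own thread -- no lock needed.
  void queueLocal(std::function<void()> event) {
    localQueue.push_back(std::move(event));
  }

  // Safe to call from any thread.
  void queueCrossThread(std::function<void()> event) {
    std::lock_guard<std::mutex> lock(crossThreadMutex);
    crossThreadQueue.push_back(std::move(event));
  }

  void run() {
    int turns = 0;
    for (;;) {
      // Take the mutex only when idle or every N turns, not per event.
      if (localQueue.empty() || ++turns % 64 == 0) {
        std::lock_guard<std::mutex> lock(crossThreadMutex);
        for (auto& e : crossThreadQueue) localQueue.push_back(std::move(e));
        crossThreadQueue.clear();
      }
      if (localQueue.empty()) break;  // a real loop would sleep on a condvar
      auto event = std::move(localQueue.front());
      localQueue.pop_front();
      event();
    }
  }

private:
  std::deque<std::function<void()>> localQueue;        // loop thread only
  std::mutex crossThreadMutex;
  std::deque<std::function<void()>> crossThreadQueue;  // any thread
};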

Thoughts?

-Kenton

Andrew Lutomirski

Nov 16, 2013, 12:40:45 AM
to Kenton Varda, capnproto
On Fri, Nov 15, 2013 at 8:22 PM, Kenton Varda <temp...@gmail.com> wrote:
> - Events queued cross-thread will land in a different queue than local
> events. That separate queue will only be checked every now and then, to
> avoid the need to synchronize on every event. This is more of an
> optimization than a semantic change.

I like this approach.

> - Perhaps remove the ability to have two threads working on building the
> same message. This requires an atomic op on every object allocation which
> has a measurable (not huge, but measurable) performance impact.

As a default feature, atomic ops to allocate structs in the arena
sounds like a lot of expense (probably even worse on non-x86) for a
feature that will probably almost never be used.

On the other hand, it might not be so hard to create a variant of
MessageBuilder that allocates separate segments to each thread. I'm
not sure how useful this would be.

--Andy

Kenton Varda

Nov 17, 2013, 5:52:49 AM
to capnproto
I just had a nutty idea...  What if we implemented Promise::wait() using fibers (green threads)?

The current implementation of wait() is highly unsafe to use:  it runs the event loop until the promise resolves.  The problem is that if any other event also does a wait(), then the first wait() cannot return until the second one does, because they're on the same stack and that's how stacks work.  But if wait() instead switched to a new fiber and continued the event loop there, then the waits could actually return in any order.

The neat thing about this is that it allows you to reclaim some of the nicer features of thread-based programming:  the ability to allocate on the stack and not worry about capturing object ownership in lambdas and such.

Meanwhile, however, it's still fundamentally event-loop programming.  Your state can only change out from under you at defined points, not at just any instruction.  No need for lots of mutexes.  Yes, this means it's cooperative multitasking, although you can always create real threads (with separate event loops) when you need them and communicate with them safely via Cap'n Proto interfaces.

If you're using GCC (or a very recent Clang, I think?), you can even use -fsplit-stack to get segmented stacks, which avoid the need to allocate a lot of space per stack, allowing you to create lots of fibers and maybe even get a Go-like feel.

I think this would actually be pretty easy to implement...  Both Unix and Windows have standard APIs for fibers.
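To make the stack-switching mechanics concrete, here's a toy sketch using the ucontext API on glibc (all names invented; a real implementation would manage a pool of fibers and a real event loop):

#include <ucontext.h>
#include <cstdio>

static ucontext_t mainCtx, loopCtx;
static bool promiseResolved = false;

// Stand-in for "continue running the event loop on a fresh stack".
static void eventLoopFiber() {
  std::printf("event loop running on its own stack\n");
  promiseResolved = true;           // some event fulfills the promise
  swapcontext(&loopCtx, &mainCtx);  // wake the fiber that called wait()
}

// Stand-in for Promise::wait(): park this stack, run the loop elsewhere.
static void promiseWait() {
  static char stack[64 * 1024];
  getcontext(&loopCtx);
  loopCtx.uc_stack.ss_sp = stack;
  loopCtx.uc_stack.ss_size = sizeof(stack);
  loopCtx.uc_link = &mainCtx;       // fall back here if the loop returns
  makecontext(&loopCtx, eventLoopFiber, 0);
  while (!promiseResolved) {
    swapcontext(&mainCtx, &loopCtx);  // run the loop until we're woken
  }
}

int main() {
  promiseWait();
  std::printf("wait() returned after the event fired\n");
}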

-Kenton

Andrew Lutomirski

Nov 17, 2013, 2:15:03 PM
to Kenton Varda, capnproto
On Sun, Nov 17, 2013 at 2:52 AM, Kenton Varda <temp...@gmail.com> wrote:
> I just had a nutty idea... What if we implemented Promise::wait() using
> fibers (green threads)?
>
> The current implementation of wait() is highly unsafe to use: it runs the
> event loop until the promise resolves. The problem is that if any other
> event also does a wait(), then the first wait() cannot return until the
> second one, because they're on the same stack and that's how stacks work.
> But if wait() instead switched to a new fiber and continued the event loop
> there, then the waits could actually return in any order.
>
> The neat thing about this is that it allows you to reclaim some of the nicer
> features of thread-based programming: the ability to allocate on the stack
> and not worry about capturing object ownership in lambdas and such.
>
> Meanwhile, however, it's still fundamentally event-loop programming. Your
> state can only change out from under you at defined points, not at just any
> instruction. No need for lots of mutexes. Yes, this means it's cooperative
> multitasking, although you can always create real threads (with separate
> event loops) when you need them and communicate with them safely via Cap'n
> Proto interfaces.
>
> If you're using GCC (or a very recent Clang, I think?), you can even use
> -fsplit-stack to get segmented stacks which avoids the need to allocate a
> lot of space for stacks, allowing you to create lots of fibers and maybe
> even get a Go-like feel.
>
> I think this would actually be pretty easy to implement... Both Unix and
> Windows have standard APIs for fibers.

I think this could be useful for some, but maybe not all, applications.

Are you thinking of allowing fiber switching from inside the event
loop or just from outside? If the former, I think this'll cause
problems (it will break the really nice thing of a callback-less event
loop in that lots of things will become reentrant). If the latter, I
think it could work well.

Can this kind of thing be done outside the core EventLoop? Suppose
that, rather than exposing wait, EventLoop had a lower-level but more
flexible exit condition mechanism. You could add Promises / Jobs /
Events / whatever to the set of conditions that would cause the
EventLoop to exit and then you could ask the EventLoop to run until at
least one exit condition were satisfied. Code outside EventLoop (a
wrapper, for example) could have a function called wait that would add
something to the set of exit conditions, switch to the EventLoop's
fiber, run until exit, and then switch to whatever fiber caused the
wakeup.

Other programs that don't want to use fibers could use a different
wait implementation that just invokes the event loop directly but only
ever adds one exit condition.

One thing I like about C++ is that it doesn't impose any particular
concurrency model. I think that EventLoop could stay
concurrency-model-agnostic without much complexity.

--Andy

Kenton Varda

Nov 17, 2013, 6:29:30 PM
to Andrew Lutomirski, capnproto
I very much meant that fibers would be used inside the event loop.

Yes, this does mean that your code may be reentered during a regular function call.  Clearly, every function would still have to document whether it does any waits, so that callers can be prepared for that.  However, what we have today is that function calls which need to wait on something have to return a promise, and then the caller has to register a continuation on that promise, which requires allocating state on the heap and transferring ownership into the continuation, etc., which is all very ugly.  And obviously, between the time the continuation is registered and the time it actually runs, state could change arbitrarily.  So you aren't actually avoiding any reentrance complication compared to fibers, but you are adding a lot of complication to deal with continuations.
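To illustrate the contrast, here is a self-contained toy sketch of the two styles (every type here is a stub invented for the example, not the real KJ API):

#include <functional>
#include <memory>
#include <string>

// Stubs so the sketch compiles on its own; imagine a real promise and an
// RPC connection behind these.
struct Request { std::string body; };
template <typename T> struct Promise {
  void then(std::function<void(T)> cb) { cb(T{}); }  // stub: fires immediately
  T wait() { return T{}; }                           // stub
};
struct Connection { Promise<Request> readRequest() { return {}; } };
struct RequestState { int counter = 0; };

// Continuation style: state must live on the heap and be captured by the
// callback, and the world can change between then() and the callback firing.
void handleRequestAsync(Connection& conn) {
  auto state = std::make_shared<RequestState>();   // heap + ownership dance
  conn.readRequest().then([state](Request req) {
    state->counter += static_cast<int>(req.body.size());
  });
}

// Fiber style: plain stack allocation and straight-line control flow, but
// the .wait() is a point where other events may reenter your objects.
void handleRequestFiber(Connection& conn) {
  RequestState state;                              // stack is fine
  Request req = conn.readRequest().wait();
  state.counter += static_cast<int>(req.body.size());
}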

The real difference between the fiber and continuation approaches here is that now it's possible to accidentally not realize that a particular function call waits.  You have to be really clear about which functions are allowed to wait and which aren't, maybe with a strict naming convention or some such.  That is a problem, but I think it might be outweighed by the benefits.

-Kenton

Jason Paryani

Nov 17, 2013, 7:20:47 PM
to capn...@googlegroups.com, Andrew Lutomirski
Have you given any thought to how this would play with the Python wrappings? It sounds like I'd have to wrap every call in a PyGILState_Ensure/Release to ensure you don't re-enter the Python interpreter, but that's a blocking operation. Will this work?

Andrew Lutomirski

Nov 17, 2013, 8:20:59 PM
to Kenton Varda, capnproto
On Sun, Nov 17, 2013 at 3:29 PM, Kenton Varda <temp...@gmail.com> wrote:
> I very much meant that fibers would be used inside the event loop.
>
> Yes, this does mean that your code may be reentered during a regular
> function call. Clearly, every function would still have to document whether
> it does any waits, so that callers can be prepared for that. However, what
> we have today is that function calls which need to wait on something have to
> return a promise, and then the caller has to register a continuation on that
> promise, which requires allocating state on the heap and transferring
> ownership into the continuation, etc., which is all very ugly. And
> obviously, between the time the continuation is registered and the time it
> actually runs, state could change arbitrarily. So you aren't actually
> avoiding any reentrance complication compared to fibers, but you are adding
> a lot of complication to deal with continuations.

I'm still unconvinced that you need fibers in the event loop for that.
What if, instead, you added a new method on EventLoop (or a
not-actually-in-EventLoop fiber mechanism) like:

template<typename T, typename Func> Promise<T> runInAFiber(Func&& f)

That function could create a fiber that's set to run f, add that fiber
to the scheduler, and return a Promise for the fiber's return value.
As a shortcut, you could make:

promise.thenInAFiber([] (Value v) {...})

have the same effect as:

promise.then([] (Value v) {
  return whatever.runInAFiber([&v] { return continuation(move(v)); });
});

where continuation is whatever got passed to thenInAFiber. (There's
presumably room to optimize that to remove a few levels of
indirection.)

Benefits:

1. EventLoop stays simple.
2. Fibers are optional.
3. Functions called from the event loop never result in a fiber
switch, so there aren't surprise reentrancy issues.


For Python, this would probably want to work with greenlet / gevent.
Making that work might require a bit of help from the kj
implementation.

--Andy

Kenton Varda

Nov 17, 2013, 8:39:11 PM
to Jason Paryani, capnproto, Andrew Lutomirski
On Sun, Nov 17, 2013 at 4:20 PM, Jason Paryani <jpar...@gmail.com> wrote:
Have you given any thought to how this would play with the Python wrappings? It sounds like I'd have to wrap every call in a PyGILState_Ensure/Release to ensure you don't re-enter the Python interpreter, but that's a blocking operation. Will this work?

Well, first, to be clear, what I'm proposing would be optional.  The Cap'n Proto client interfaces wouldn't use it at all -- they need to return promises to give the caller an opportunity to do pipelining.  But fiber-based callers can easily call .wait() on any promise.

Point is, Python doesn't necessarily have to expose this behavior.  And maybe it's less important in Python anyway since there isn't as much performance benefit to be had from stack allocation, and capture-by-move is a non-issue.

That said, if you did want to expose this in python, the whole point is that you would want to allow reentry into the Python interpreter from another fiber.  You probably wouldn't have to do anything special with the GIL to make this work -- from Python's point of view, it just looks like you made a call into some C++ code and then that code called back into some Python code elsewhere.

That said, I can imagine garbage collectors being not entirely happy about fibers, if they need to scan the stack.  Depends on the implementation, though.

-Kenton

Kenton Varda

Nov 17, 2013, 8:44:42 PM
to Andrew Lutomirski, capnproto
On Sun, Nov 17, 2013 at 5:20 PM, Andrew Lutomirski <an...@luto.us> wrote:
I'm still unconvinced that you need fibers in the event loop for that.
 What if, instead, you added a new method on EventLoop (or a
not-actually-in-EventLoop fiber mechanism) like:

template<typename T, typename Func> Promise<T> runInAFiber(Func&& f)

I think this is actually what I had in mind.  If you have a function which returns a promise, it really has to be non-blocking to make sense, so anyone wanting to implement such a function using fibers would need to call something like runInAFiber() under the hood.

Andrew Lutomirski

Nov 17, 2013, 8:48:09 PM
to Kenton Varda, capnproto
I guess what I mean is that functions passed into Promise::then and
EventLoop::evalLater should not be permitted to cause a fiber switch.
I'm fine with them scheduling a fiber, a thread, or anything else for
later execution.

--Andy

Kenton Varda

Nov 17, 2013, 9:00:17 PM
to Andrew Lutomirski, capnproto
On Sun, Nov 17, 2013 at 5:48 PM, Andrew Lutomirski <an...@luto.us> wrote:
I guess what I mean is that functions passed into Promise::then and
EventLoop::evalLater should not be permitted to cause a fiber switch.
I'm fine with them scheduling a fiber, a thread, or anything else for
later execution.

I am fine with that if it actually simplifies the implementation, but I'm not completely convinced yet that it does.  That is, I'm not convinced that then() couldn't deal gracefully with a callback that just happens to end up calling wait().  But I haven't worked out the details yet; perhaps you're seeing something that I haven't gotten to.

Andrew Lutomirski

Nov 17, 2013, 9:13:08 PM
to Kenton Varda, capnproto
I'm sure the implementation is doable and possibly just as short, but
I'm pretty sure it'll be harder to understand why it's correct. That
being said, one thing I *really* like about the Promise model (as
opposed to explicit callbacks between one object and another) is that
state doesn't change across events that get fired. For example,
network code could do (in code that is called by EventLoop, which, in
the code I write, tends to be essentially everything):

Promise<kj::Array<char>> data = stream.read();

or

fulfiller.fulfill(someValue);

without having to think at all about whether it will be reentered.

If code called from EventLoop can switch fibers, then presumably
fulfiller.fulfill is okay, but you'll have to look at the docs or
implementation of read() to see if that part is okay. IMO this
negates the big advantage of using Promises.

If, on the other hand, it's impossible to switch fibers from inside
the EventLoop, then there's nothing to think about. Code written in
the synchronous style (using wait, etc) expects fiber switches, and
code that lives in Promises knows that fibers won't switch. And you
can call back and forth between those styles with minimal fuss.

--Andy

Kenton Varda

Nov 17, 2013, 11:17:32 PM
to Andrew Lutomirski, capnproto
[re-adding capnproto cc that got lost]

So I just had another thought:  What if wait() requires, as a parameter, a reference to an object representing the calling fiber.  This reference must be passed down the stack all the way from the callback that started the fiber.  Thus, every function along the way is required to make explicit the fact that it might block, by taking a Fiber& as a parameter.  Thus it's impossible to accidentally miss the fact that a particular function blocks.

-Kenton

On Sun, Nov 17, 2013 at 7:12 PM, Andrew Lutomirski <an...@luto.us> wrote:
On Sun, Nov 17, 2013 at 6:56 PM, Kenton Varda <temp...@gmail.com> wrote:

> On Sun, Nov 17, 2013 at 6:13 PM, Andrew Lutomirski <an...@luto.us> wrote:
>>
>> If code called from EventLoop can switch fibers, then presumably
>> fulfiller.fulfill is okay, but you'll have to look at the docs or
>> implementation of read() to see if that part is okay.  IMO this
>> negates the big advantage of using Promises.
>
>
> Right, that's the tradeoff I described.

>
>>
>> If, on the other hand, it's impossible to switch fibers from inside
>> the EventLoop, then there's nothing to think about.  Code written in
>> the synchronous style (using wait, etc) expects fiber switches, and
>> code that lives in Promises knows that fibers won't switch.  And you
>> can call back and forth between those styles with minimal fuss.
>
>
> Yes, that's what I'm proposing.
>
> OK, I see...  are you just saying that you want people to explicitly use
> thenInFiber() so that you know that if you accidentally call wait() from a
> continuation registered with then(), you'll get an exception?  That makes
> sense.

Exactly.

--Andy

Andrew Lutomirski

Nov 18, 2013, 12:44:39 AM
to Kenton Varda, capnproto


On Nov 17, 2013 8:17 PM, "Kenton Varda" <temp...@gmail.com> wrote:
>
> [re-adding capnproto cc that got lost]
>
> So I just had another thought:  What if wait() requires, as a parameter, a reference to an object representing the calling fiber.  This reference must be passed down the stack all the way from the callback that started the fiber.  Thus, every function along the way is required to make explicit the fact that it might block, by taking a Fiber& as a parameter.  Thus it's impossible to accidentally miss the fact that a particular function blocks.
>

There's a risk of cheating, but that may be minor.

What's the advantage?  I think that thenInAFiber is still needed (w/o the Fiber& parameter) so that functions that don't block can nonetheless spawn fibers.

--Andy

> -Kenton
>
> On Sun, Nov 17, 2013 at 7:12 PM, Andrew Lutomirski <an...@luto.us> wrote:
>>
>> On Sun, Nov 17, 2013 at 6:56 PM, Kenton Varda <temp...@gmail.com> wrote:
>> > On Sun, Nov 17, 2013 at 6:13 PM, Andrew Lutomirski <an...@luto.us> wrote:
>> >>
>> >> If code called from EventLoop can switch fibers, then presumably
>> >> fulfiller.fulfill is okay, but you'll have to look at the docs or
>> >> implementation of read() to see if that part is okay.  IMO this
>> >> negates the big advantage of using Promises.
>> >
>> >
>> > Right, that's the tradeoff I described.
>> >
>> >>
>> >> If, on the other hand, it's impossible to switch fibers from inside
>> >> the EventLoop, then there's nothing to think about.  Code written in
>> >> the synchronous style (using wait, etc) expects fiber switches, and
>> >> code that lives in Promises knows that fibers won't switch.  And you
>> >> can call back and forth between those styles with minimal fuss.
>> >
>> >
>> > Yes, that's what I'm proposing.
>> >
>> > OK, I see...  are you just saying that you want people to explicitly use
>> > thenInFiber() so that you know that if you accidentally call wait() from a
>> > continuation registered with then(), you'll get an exception?  That makes
>> > sense.
>>
>> Exactly.
>>
>> --Andy
>
>

Kenton Varda

Nov 18, 2013, 12:51:39 AM
to Andrew Lutomirski, capnproto
Yes, you still need thenInAFiber() -- in fact, that's how you get the Fiber& in the first place.  (Regular then() would not pass a Fiber& to its continuation.)

The point is, even if you are using thenInAFiber(), you need to know which calls that you make may block so that you know where to be prepared for reentrance.  If you call foo() and then bar(), both of which block, you may be aware that foo() blocks but forget that bar() blocks and have all the same problems.  But if both functions take the Fiber& parameter, then you are forced to be aware that they both could block.

Maybe "Fiber" is not the best name for this type.  Maybe "Blocking"?  And by convention, you would always name the parameter "blocking".  Then you get code like this:

  void foo(Blocking& blocking) {
    bar();
    baz(blocking);
    qux();
  }

It's very easy to see which call blocks.

-Kenton

Andrew Lutomirski

Nov 18, 2013, 1:06:46 AM
to Kenton Varda, capnproto
On Sun, Nov 17, 2013 at 9:51 PM, Kenton Varda <temp...@gmail.com> wrote:
> Yes, you still need thenInAFiber() -- in fact, that's how you get the Fiber&
> in the first place. (Regular then() would not pass a Fiber& to its
> continuation.)
>
> The point is, even if you are using thenInAFiber(), you need to know which
> calls that you make may block so that you know where to be prepared for
> reentrance. If you call foo() and then bar(), both of which block, you may
> be aware that foo() blocks but forget that bar() blocks and have all the
> same problems. But if both functions take the Fiber& parameter, then you
> are forced to be aware that they both could block.

Hmm. I was imagining that you'd assume that everything could block
(i.e. pretend that you're using a real thread, not a fiber), but that
could be useful too.

>
> Maybe "Fiber" is not the best name for this type. Maybe "Blocking"? And by
> convention, you would always name the parameter "blocking". Then you get
> code like this:
>
> void foo(Blocking& blocking) {
>   bar();
>   baz(blocking);
>   qux();
> }

To me, "blocking" means "waits w/o busy-looping", which isn't really
the same thing as "may reenter me".

It could be worth looking at how Go handles this.

--Andy

Kenton Varda

Nov 18, 2013, 1:17:09 AM
to Andrew Lutomirski, capnproto
On Sun, Nov 17, 2013 at 10:06 PM, Andrew Lutomirski <an...@luto.us> wrote:
Hmm.  I was imagining that you'd assume that everything could block
(i.e. pretend that you're using a real thread, not a fiber), but that
could be useful too.

That doesn't really work.  In a real thread, you'd use mutexes to protect your state, but you can't use mutexes with fibers because you'd deadlock.  We could invent a kind of mutex explicitly for fibers which switches fibers if the lock is busy.  But, arguably the whole point of using fibers instead of threads is to avoid the need for mutexes and all the errors people make with them.  To avoid mutexes, people need to know exactly where reentrance can happen and handle it there, while being able to skip that extra work everywhere else.
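For what it's worth, a sketch of what such a fiber-switching mutex could look like (invented names; a real one would queue waiting fibers rather than spin):

#include <cassert>
#include <functional>
#include <utility>

// Cooperative mutex: a busy lock() does not block the OS thread, it yields
// back to the fiber scheduler so other fibers can run and release the lock.
class FiberMutex {
public:
  explicit FiberMutex(std::function<void()> yieldToScheduler)
      : yield(std::move(yieldToScheduler)) {}

  void lock() {
    while (locked) yield();  // switches fibers; returns when rescheduled
    locked = true;
  }

  void unlock() {
    assert(locked);
    locked = false;
  }

private:
  bool locked = false;  // no atomics: everything runs on one thread
  std::function<void()> yield;
};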
 
To me, "blocking" means "waits w/o busy-looping", which isn't really
the same thing as "may reenter me".

Fair enough.  Any better ideas?
 
It could be worth looking at how Go handles this.

I think their threads are actually preemptive, and they encourage inter-thread communication via channels.  But I don't know much about Go.

Andrew Lutomirski

Nov 18, 2013, 1:24:45 AM
to Kenton Varda, capnproto


On Nov 17, 2013 10:17 PM, "Kenton Varda" <temp...@gmail.com> wrote:
>
> On Sun, Nov 17, 2013 at 10:06 PM, Andrew Lutomirski <an...@luto.us> wrote:
>>
>> Hmm.  I was imagining that you'd assume that everything could block
>> (i.e. pretend that you're using a real thread, not a fiber), but that
>> could be useful too.
>
>
> That doesn't really work.  In a real thread, you'd use mutexes to protect your state, but you can't use mutexes with fibers because you'd deadlock.  We could invent a kind of mutex explicitly for fibers which switches fibers if the lock is busy.  But, arguably the whole point of using fibers instead of threads is to avoid the need for mutexes and all the errors people make with them.  To avoid mutexes, people need to know exactly where reentrance can happen and handle it there, while being able to skip that extra work everywhere else.

Fair enough.

>  
>>
>> To me, "blocking" means "waits w/o busy-looping", which isn't really
>> the same thing as "may reenter me".
>
>
> Fair enough.  Any better ideas?
>  

How about FiberSwitcher?  IOW, I think the relevant part is the permission to invoke the fiber switching mechanism, rather than knowledge of what fiber you are.

It's too bad that there's no universal name for cooperatively scheduled threads.  Windows has fibers, Linux has whatever the ucontext mechanism is called, Python has greenlets and some obsolete thing that I've forgotten the name of, and Javascript can't even figure out what to call a thread.

--Andy

>>
>> It could be worth looking at how Go handles this.
>
>
> I think their threads are actually preemptive, and they encourage inter-thread communication via channels.  But I don't know much about Go.
>

Kenton Varda

Nov 18, 2013, 3:48:42 AM
to Andrew Lutomirski, capnproto
On Sun, Nov 17, 2013 at 10:24 PM, Andrew Lutomirski <an...@luto.us> wrote:

How about FiberSwitcher?  IOW, I think the relevant part is the permission to invoke the fiber switching mechanism, rather than knowledge of what fiber you are.

Could work.  Feels bulky. 

It's too bad that there's no universal name for cooperatively scheduled threads.  Windows has fibers, Linux has whatever the ucontext mechanism is called, Python has greenlets and some obsolete thing that I've forgotten the name of, and Javascript can't even figure out what to call a thread.

I like "fibers".

I guess the fact that greenlets work (and are implemented as a C extension) implies that Python can handle C libraries screwing with the stack.  Maybe it will work to simply wrap whatever implementation I come up with from Python, then.  Though I'm still not sure it's worth it in Python.

Joakim Johansson

Nov 18, 2013, 4:34:27 AM
to capn...@googlegroups.com
Hi,

I agree, managing concurrency in the general case as a framework developer is hard, as there is usually a wide range of use cases that may have conflicting requirements.

Using threads as the unit of abstraction for a framework is rife with problems, especially when it comes to scalability: different frameworks may implement their own thread pools (or not...), and the usual synchronization primitives (mutex locks / condition variables) are not very efficient or scalable under load. Lockless is, as you point out, harder, and only works for some scenarios - but it can be fast when done right.

Apart from the various approaches outlined below, I would suggest considering one additional one which we (http://www.tbricks.com) have had good experience with - Grand Central Dispatch [1], which is provided as open source [2] [3] by Apple and has been ported to various platforms.

It has very decent abstractions using dispatch queues for specifying serialization, which allow describing concurrency behavior rather than locking, and uses a combination of standard OS synchronization mechanisms and lockless operation for enqueue/dequeue when under load - the higher the load on the system, the lower the overhead... It provides an easy-to-use level of abstraction while yielding good performance. 

It also has abstractions for event sources, which is relevant in the context of IPC. A dispatch queue can be called asynchronously (which may wake a worker thread if the queue is quiescent, or cost only atomic ops if the queue is busy) or synchronously (in which case the calling thread performs the operation inline); choosing between the two is simple.
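To show the flavor of the API, a tiny self-contained example of the queue-as-lock style (real libdispatch calls, using the plain-C function variants rather than blocks; builds on OS X, or on Linux with a libdispatch port):

#include <dispatch/dispatch.h>
#include <cstdio>

static int counter = 0;  // confined to the serial queue below

static void increment(void*) { counter++; }
static void report(void*) { std::printf("counter = %d\n", counter); }

int main() {
  // One serial queue serializes all access to counter; the queue itself
  // expresses the locking discipline, so no mutex appears anywhere.
  dispatch_queue_t queue =
      dispatch_queue_create("com.example.counter", DISPATCH_QUEUE_SERIAL);

  for (int i = 0; i < 10; i++) {
    dispatch_async_f(queue, nullptr, increment);  // enqueue and move on
  }

  // The synchronous variant waits, and may run the function inline.
  dispatch_sync_f(queue, nullptr, report);

  dispatch_release(queue);
  return 0;
}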

One additional benefit of the libdispatch approach in the context of a framework is that the threads used behind the scenes to actually perform work are managed at the system level on operating systems supporting the relevant extensions (today only OS X and FreeBSD, AFAIR). Even on other platforms it allows for shared thread pools in the process scope, so multiple frameworks can share the same thread pool and thus avoid creating too many threads.

If you are not familiar with the APIs provided, the best initial sources would be the Concurrency Programming Guide [4] and the GCD reference [5], and there is also a more packaged solution at [6] that has the required supporting infrastructure for e.g. Linux. There are also a number of introduction sessions on Apple's developer website, but those require a dev account for access.

It is not a silver bullet, but it has many good characteristics, especially for more complex applications when many subsystems may need to cooperate with regard to available CPU resources.

Just one input for consideration.

Cheers,

Joakim

[2] http://libdispatch.macosforge.org

Tobias Hahn

Nov 18, 2013, 11:00:26 AM
to Cap'n Proto
Hi,

just one more data point:

We used fibers (ucontext on OS X, fibers on Windows) at one point in our application, and eventually removed them. The main reason is that they don't seem to be supported very well by the OS and tools these days. Especially on OS X, we ran into hard-to-debug crashes where stacks were smashed. Also, there is no (easy?) way to inspect the state/stack of any fiber other than the current one in the debugger (gdb/lldb). They were obsoleted in POSIX.1-2004 and removed in POSIX.1-2008 in favor of POSIX threads.

Best,
Tobias



Jan Huwald

Nov 18, 2013, 1:03:08 PM
to capn...@googlegroups.com
On 11/17/2013 11:52 AM, Kenton Varda wrote:
> I just had a nutty idea... What if we implemented Promise::wait()
> using fibers (green threads)?

Better than OS threads. But what if you implement it agnostic of the
underlying thread model?

> Meanwhile, however, it's still fundamentally event-loop programming.
> Your state can only change out from under you at defined points, not
> at just any instruction. No need for lots of mutexes. Yes, this
> means it's cooperative multitasking, although you can always create
> real threads (with separate event loops) when you need them and
> communicate with them safely via Cap'n Proto interfaces.

Fibers and threads are orthogonal features. Allowing fibers to run on
arbitrary threads (like e.g. Erlang or GCD) enhances scalability. OTOH,
when all fibers competing for some resource run on the same thread, no
locking is required.

Both variants (and fiberless pure threads) are the optimal solution for
some applications. Implementing a specific model for capnp prevents capnp
from being used (elegantly) in those applications, which will lead to
more (terrible) concurrency-related bugs.

A thread model could be passed as a template parameter plus a
default-constructed function parameter, and would offer types for threads
and mutexes, and functions for creating threads, yielding the CPU, and
mutex handling. Like:

template<typename ThreadModel = CapnpStandardThreadModel>
void foo(int bar, ThreadModel threadModel = ThreadModel());

That would allow the capnp user to choose the thread model that most
benefits him. It allows for thin wrappers around the preferred threading
API. No-ops and empty types (e.g. mutexes for single-threaded fibers) are
optimized away at compile time. And for simple use cases (the default),
the template and function parameters can be omitted and the user does not
need to care. (A rough sketch of such a model follows.)
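A hedged sketch of such pluggable models (names invented):

#include <mutex>
#include <thread>

// Empty types and no-op functions: the compiler optimizes these away.
struct SingleThreadedModel {
  struct Mutex {
    void lock() {}
    void unlock() {}
  };
  static void yieldCpu() {}
};

// The same interface backed by the standard library.
struct StdThreadModel {
  using Mutex = std::mutex;
  static void yieldCpu() { std::this_thread::yield(); }
};

template <typename ThreadModel = SingleThreadedModel>
void foo(int bar, ThreadModel = ThreadModel()) {
  typename ThreadModel::Mutex mutex;
  mutex.lock();    // compiles to nothing for SingleThreadedModel
  // ... work with bar under the lock ...
  (void)bar;
  mutex.unlock();
  ThreadModel::yieldCpu();
}

// Usage:  foo(42);  or  foo(42, StdThreadModel());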


Side notice:

> Both Unix and Windows have standard APIs for fibers.

makecontext() and swapcontext() have been deprecated as of POSIX.1-2008,
so there is no standard fiber API for UNIX. The functions will probably
continue to be available for a long time, but they have been terribly
slow [1]. I can vouch for boost::context [2] on UNIX systems, but have no
experience using it with Windows.

[1]
http://www.boost.org/doc/libs/1_55_0/libs/context/doc/html/context/performance.html
[2] http://www.boost.org/doc/libs/1_55_0/libs/context/doc/html/index.html


With best regards,
Jan


Kenton Varda

Nov 18, 2013, 2:26:05 PM
to Tobias Hahn, Cap'n Proto
Hi Tobias,

On Mon, Nov 18, 2013 at 8:00 AM, Tobias Hahn <tobia...@ableton.com> wrote:
Hi,

just one more data point:

We used fibers (ucontext on OS X, fibers on Win) at one point in our application, and eventually removed them. The main reason is that they don't seem to be supported very well by the OS and tools these days. Especially on OS X, we ran into hard to debug crashes where stacks where smashed. Also, there is no (easy?) way to inspect the state/stack of a fiber other than the current one in the debugger (gdb/lldb). They were obsoleted in POSIX.1-2004 and removed in POSIX.1-2008 in favor of POSIX threads.

Thanks for the data point.

Debuggability would be nice.  That said, examining the event loop state in the debugger is not so easy either.  Fibers would probably improve debuggability overall vs. callbacks, since you could actually walk up the stack to the calling task.

-Kenton

Kenton Varda

Nov 18, 2013, 2:45:29 PM
to Jan Huwald, capnproto
Event loop concurrency, even with fibers, is, as you say, a completely different programming model from threads.  I don't think it's something that could be chosen by a template parameter -- even if I were willing to put the entire RPC implementation in a header file, which I'm not.  An RPC implementation based on threads would be written very differently from one based on event loops; it would need to be a whole new library, really.

In any case, applications can still use threads with Cap'n Proto by having a separate event loop on each thread.  However, the RPC system lives in a single EventLoop (or, at least, the state associated with a particular connection does).

-Kenton

Andrew Lutomirski

Nov 18, 2013, 2:48:35 PM
to Kenton Varda, Jan Huwald, capnproto
On Mon, Nov 18, 2013 at 11:45 AM, Kenton Varda <temp...@gmail.com> wrote:
> Event loop concurrency, even with fibers, is, as you say, a completely
> different programming model to threads. I don't think it's something that
> could be chosen by a template parameter -- even if I were willing to put the
> entire RPC implementation in a header file, which I'm not. An RPC
> implementation based on threads would be written very differently from one
> based on event loops; it would need to be a whole new library, really.
>
> In any case, applications can still use threads with Cap'n Proto by having a
> separate event loop on each thread. However, the RPC system lives in a
> single EventLoop (or, at least, the state associated with a particular
> connection does).
>

One issue with fibers: unless you have something like Scheduler
Activations (or whatever they were called), sending blocking I/O to a
separate fiber will still cause the whole thread to stall, which IMO
negates some of the advantage of using a synchronous style. You'll
still have to use a special I/O library that's fully asynchronous.

(As far as I know, everyone has given up on scheduler activations.
There might be some BSD kernels around that support them.)

--Andy

Kenton Varda

Nov 18, 2013, 2:57:11 PM
to Andrew Lutomirski, Jan Huwald, capnproto
On Mon, Nov 18, 2013 at 11:48 AM, Andrew Lutomirski <an...@luto.us> wrote:
One issue with fibers: unless you have something like Scheduler
Activations (or whatever they were called), sending blocking I/O to a
separate fiber will still cause the whole thread to stall, which IMO
negates some of the advantage of using a synchronous style.  You'll
still have to use a special I/O library that's fully asynchronous.

(As far as I know, everyone has given up on scheduler activations.
There might be some BSD kernels around that support them.)

Yep, I'm aware.  You'd still have to formulate your IO in terms of the async I/O interfaces (kj/async-io.h, kj/async-unix.h).  But if you can split your work reasonably across event loops in separate threads, you might still be able to do blocking disk I/O without causing too much trouble (I'd hate to lose mmap...).