Re: Experiment with Coroutines in //remoting?

15 views
Skip to first unread message

Joe Mason

unread,
Dec 22, 2024, 6:09:20 PM12/22/24
to Kevin Smith, scheduler-dev, cxx, pkas...@chromium.org, rkj...@chromium.org
+scheduler-dev for visibility

On Wed, Dec 18, 2024, 09:43 'Kevin Smith' via cxx <c...@chromium.org> wrote:
I'm also experimenting with a co_await-enabling prototype and I'm currently looking for code that would benefit from co_await. Are there any particular classes (in //remoting or elsewhere) that might be good candidates?

Thanks!

On Friday, December 13, 2024 at 11:02:57 AM UTC-5 pkas...@chromium.org wrote:
On Thursday, December 12, 2024 at 7:10:17 PM UTC-8 Erik Jensen wrote:
Are you familiar with Google's internal coroutines code?

I have no experience or familiarity with C++ in google3 whatsoever.

OK. jacobsa@ is the relevant owner, and wrote the initial k3::Co library that I believe is now renamed c9::. He gave a good talk on coroutine implementation choices: https://www.youtube.com/watch?v=k-A12dpMYHo -- if you want to implement something, I would definitely watch.

 Are you familiar with my coroutines prototype?

I don't see it linked from https://crbug.com/40251667. Where would I find it?

https://chromium-review.googlesource.com/c/chromium/src/+/4227153, but it isn't really the right direction to go. It's a direct port of the Dec. 31 2022 version of the k3 code, so it's not aware of the task scheduler at all. The bug you linked has my comment with thoughts on a better design direction (which is more like what you're doing here).

You didn't mention the promises work, I assume you haven't seen it. There's a lot of overlap between that and what coroutines could do. But let's leave it aside for the moment.

To be clear, I don't think I have the context or capacity to design or implement a new task system in Chromium. While Chrome Remote Desktop lives in-tree and uses the task system from base, it is surely not representative of how that task system is used in the browser. It's also a very small team, so it would be challenging for us to invest the resources needed for a Chromium-wide endeavor.

We definitely don't want to reinvent the whole task system.  Being able to interface with the existing one is the right direction. I do think we need to have an idea of the long-term direction -- relevant base/ OWNERS like gab@ and I should be on the hook to help you with that, it shouldn't be on your shoulders.
 
The implementation of what I have in mind should be pretty straightforward, so maybe the best next step would be to post a CL that “‘merely’ provides a tool to rewrite laboriously broken-apart chains of execution, completion callback arguments, etc. as straight-line coroutine code” (as mentioned in the bug), and then have something concrete to discuss whether it would be okay in //remoting, be desired as an intermediate option for all of Chromium, or is fundamentally flawed.

Yeah, that sounds like a good route. A minimal working prototype is always easiest to think about.

PK

--
You received this message because you are subscribed to the Google Groups "cxx" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cxx+uns...@chromium.org.
To view this discussion visit https://groups.google.com/a/chromium.org/d/msgid/cxx/1b86a1c1-ad6b-449e-a842-9685ea722b04n%40chromium.org.

Joe Mason

unread,
Jan 28, 2025, 10:13:47 PMJan 28
to Erik Jensen, scheduler-dev, cxx, Peter Kasting, etie...@google.com, ksm...@brave.com, Alex Clarke
+scheduler-dev 

Can you include schedu...@chromium.org in this design conversation? Many of the domain experts watch that list more closely than cxx@.

Thanks!


On Tue, Jan 28, 2025, 06:26 Erik Jensen <rkj...@chromium.org> wrote:
I don't know that I'd nearly qualify as an "expert", but I would enjoy being a part of that if it comes to fruition.

In the mean time, here's my draft prototype for implementing coroutines in Chromium: https://chromium-review.googlesource.com/c/chromium/src/+/6205666 (There are surely style, et cetera issues. I was mostly trying to get my thoughts down.)

I ended up changing my approach from what I initially sketched, since the initial sketch was meant to be something that could be used just in //remoting, while the prototype instead tries to integrate into //base to be more widely usable. The basic concept is "posting a coroutine task should be as similar to posting a regular task as possible." Specifically:
  • BindOnce() must be used to bind a coroutine's arguments to it prior to posting.
  • A coroutine OnceCallback can be posted as a task by passing it to a TaskRunner's PostTask() method. (This required updating a few sites that take a reference to PostTask() to explicitly select the desired overload. The alternative would be to have a distinctly named PostCoroutine() or some such.)
  • When a coroutine is passed to PostTaskAndReply(), the reply callback will be invoked when the coroutine has fully executed.
  • All of the code within a posted coroutine task runs on the same runner. If the coroutine awaits on something from a different thread / sequence / et cetera, the implementation ensures that it is still resumed on the expected sequence. (This is in contrast to many implementations, where the coroutine must be prepared to resume on any arbitrary thread.) This allows sequence-bound and thread-affine objects to be used within a coroutine.
    • However, the coroutine may be interleaved with other tasks on the same runner when suspending via co_await, and it may resume on a different physical thread (unless it was posted to a SingleThreadTaskRunner).
  • To do work on a different runner a new task must be posted using that runner.
    • I added PostTaskAndAwait() and PostTaskAndAwaitResult() methods (that work with both coroutine and regular tasks) to make it easy to co_await such a task.
  • Like with regular tasks, if the a task posted to PostTaskAndReply() is canceled, the reply callback will not be invoked.
    • Unlike with regular tasks, a coroutine task can be can canceled at a suspend point after it has started execution. (Or manually canceled by awaiting a special CancelTask value.)
    • If a task posted to PostTaskAndAwait() is canceled, the implementation makes a best-effort attempt to cancel the caller as well to prevent the waiting coroutine from leaking.
Handling WeakPtrs is a little tricky, since a WeakPtr might be invalidated by a interleaved task any time the coroutine suspends. Manually rechecking WeakPtrs after every resume isn't tenable, as the currently-executing coroutine might not even know the pointer / reference it was passed was derived from a WeakPtr, let alone have access to it. This is especially true for methods, which can only be passed their receiver as a raw pointer, and being able to co_await another method in the same class is pretty important for usability.

My solution is to allow coroutines to create a scoped guard declaring the use of a given WeakPtr. The implementation keeps a list of all such guards that are currently live on the coroutine stack, and will automatically check them all each time the task resumes. If any guarded WeakPtrs are invalid, the implementation will immediately cancel the task instead of resuming execution of the coroutine. When a WeakPtr is bound as the receiver for a coroutine method, BindOnce() will automatically ensure that such a guard exists for the lifetime of the bound coroutine's execution.

Like with a regular task, a coroutine task will be destroyed on the appropriate runner if it is canceled. However, since the coroutine task may have already started execution, there's more state to destroy than just the bound arguments in a regular task, and thus more of a chance that a destructor in the coroutine stack might try to access a pointer derived from a now-invalid WeakPtr. A coroutine needs to take care that, for objects that live across a suspend point, their destructors don't access any pointers or references that are not guaranteed to point to objects also on the coroutine stack. The alternative would be to leak any canceled coroutine instead of destroying it, but that seems undesirable.

There is one major caveat with my prototype: it currently crashes if the posted coroutine never actually suspends. I'm pretty sure the crash is due to a miscompilation of ExecuteInTaskImpl, but please let me know if there's a bug in that function that I'm missing. The resulting assembly looks like this:

0x5555593d887c  <+ 2172>        48 8b bd c0 fe ff ff     mov    rdi,QWORD PTR [rbp-0x140]
// Call to delete for the coroutine state (sample value for $rdi: 0x964000f0000)
0x5555593d8883  <+ 2179>        e8 38 12 44 00           call   0x555559819ac0 <_ZdlPv>
0x5555593d8888  <+ 2184>        8b 85 c4 fd ff ff        mov    eax,DWORD PTR [rbp-0x23c]
0x5555593d888e  <+ 2190>        85 c0                    test   eax,eax
0x5555593d8890  <+ 2192>        74 04                    je     0x5555593d8896
0x5555593d8892  <+ 2194>        eb 00                    jmp    0x5555593d8894
0x5555593d8894  <+ 2196>        eb 02                    jmp    0x5555593d8898
0x5555593d8896  <+ 2198>        eb 00                    jmp    0x5555593d8898
0x5555593d8898  <+ 2200>        eb 00                    jmp    0x5555593d889a
0x5555593d889a  <+ 2202>        eb 00                    jmp    0x5555593d889c
// Calls to destructors for copies of |task| and |current_runner| within the already-deleted coroutine state
0x5555593d889c  <+ 2204>        48 8b bd 48 fe ff ff     mov    rdi,QWORD PTR [rbp-0x1b8]
// $rdi here is 0x964000f00d8, within the just-freed allocation.
// Crashes because that memory has been overwritten with 0xcdcdcdcdcdcdcdcd.
0x5555593d88a3  <+ 2211>        e8 d8 79 f8 fd           call   0x555557360280 <_ZN13scoped_refptrIN4base10TaskRunnerEED2Ev>
0x5555593d88a8  <+ 2216>        48 8b bd 40 fe ff ff     mov    rdi,QWORD PTR [rbp-0x1c0]
0x5555593d88af  <+ 2223>        e8 dc e4 f6 fd           call   0x555557346d90 <_ZN4base12OnceCallbackIFNS_5AsyncIvEEvEED2Ev>
0x5555593d88b4  <+ 2228>        48 81 c4 40 02 00 00     add    rsp,0x240
0x5555593d88bb  <+ 2235>        5d                       pop    rbp
0x5555593d88bc  <+ 2236>        c3                       ret

If I uncomment line 219, suspending ExecuteInTaskImpl never to be resumed (leaking its state), then all the tests pass, but obviously that's not a tenable solution.

On Friday, January 10, 2025 at 1:49:13 PM UTC-8 Peter Kasting wrote:
I reached out to ATLs about how to do this The Right Way, hopefully I'll have an update soon on how we can figure this all out. 

It will likely look like a separate "expert committee" to make an initial investigation and recommendation around API shape, so if anyone specifically wants to be part of that -- or knows someone else who should -- let me know. There's a few people like gab and fdoray that I know should be there, but assume I'll miss even obvious choices and speak up :)

PK

On Thu, Jan 9, 2025, 2:04 AM Alex Clarke <alexc...@google.com> wrote:
FWIW with a suitable library in front of them, coroutines in C++ are really quite nice.  You can write code in a similar vein to JS or Kotlin style. I've got a side project where I've used this extensively. 

On Wed, 18 Dec 2024 at 18:50, 'Etienne Pierre-doray' via cxx <c...@chromium.org> wrote:
 Are there any particular classes (in //remoting or elsewhere) that might be good
You'll find some examples in http://bit.ly/cpp-promises

On Wed, Dec 18, 2024 at 1:35 PM Peter Kasting <pkas...@chromium.org> wrote:
Media code has a ton of several-levels-deep callbacks and continuations, and would probably be a place to refactor if we had coroutines.

PK

--
You received this message because you are subscribed to the Google Groups "cxx" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cxx+uns...@chromium.org.

--
You received this message because you are subscribed to the Google Groups "cxx" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cxx+uns...@chromium.org.
Reply all
Reply to author
Forward
0 new messages