[webgpu] Make DawnWireServices optionally thread-safe.
- This change adds an optional lock in DawnWireServices to
make it thread-safe in the case where we want the wire client
to handle GPU return data on a separate thread.
Commit-Queue +1
scoped_refptr<base::SequencedTaskRunner> main_task_runner_;

Corentin Wallez: This comment seems to only explain part of the story. What I understand happens is that we want to receive commands out of order, such that `OnGpuControlReturnData` is called on a different `SequencedTaskRunner`. This means we want the wire to process the return commands immediately, in parallel with whatever the user of WebGPU hosted on that `dawn::wire` is doing. However, to avoid races, we only trigger the spontaneous callbacks on the original sequence.
Later in the CL chain, `WaitAny` is used to wait for and synchronously trigger a callback that is marked complete in parallel with the `WaitAny` call. There is additional complexity in making all the Blink callbacks use `AllowProcessEvents` vs. `AllowSpontaneous`, because we want to control when the callbacks happen in multithreaded mode.
How about instead adding an option to dawn::wire::client to "downgrade" all callbacks to `AllowProcessEvents`? This is transparent for whoever calls dawn::wire, but it groups more of the multithreading logic in `webgpu_implementation` so we don't have to require that all the code adapts to using `AllowProcessEvents`.
TBH there are enough moving parts that a design doc explaining the layering, the mojo filtering, etc. would be useful. I had to dig into the whole CL chain and recall previous CLs in Dawn to get an idea of the direction we're going. (Maybe I missed the DD for this.)
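For context on the two modes discussed here, a minimal sketch (not code from this CL; it assumes a recent Dawn `webgpu_cpp.h` with the futures API) of how `AllowProcessEvents` differs from `AllowSpontaneous`: an `AllowProcessEvents` callback can only fire inside `instance.ProcessEvents()`, so the caller controls which thread observes it.

```cpp
#include <webgpu/webgpu_cpp.h>

void RequestDeviceSketch(const wgpu::Instance& instance,
                         const wgpu::Adapter& adapter) {
  wgpu::DeviceDescriptor desc = {};
  adapter.RequestDevice(
      &desc, wgpu::CallbackMode::AllowProcessEvents,
      [](wgpu::RequestDeviceStatus status, wgpu::Device device,
         wgpu::StringView message) {
        // With AllowProcessEvents this runs only inside ProcessEvents()
        // below. With AllowSpontaneous it could instead run from whatever
        // thread happens to process the wire's return commands.
      });

  // Pump completed events on this thread; this is the only place the
  // callback above is allowed to fire.
  instance.ProcessEvents();
}
```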
void DawnWireServices::ClientHandleCommands(

Corentin Wallez:
```suggestion
void DawnWireServices::HandleClientCommands(
```
// TODO(enga): Instead of a CHECK, this could generate a device lost

Corentin Wallez: This seems to want to harden the renderer against malformed commands coming from the GPU process. IMHO we should remove this TODO and instead comment that we expect commands from the GPU process to be well-formed.
void DawnWireServices::SerializerCommit() {

Corentin Wallez: Any way to make these method names make more sense in the context of `DawnWireServices`? Maybe just removing `Serializer` would help, since we're asking the wire to be committed and don't care about the implementation details.
bool support_locking)

Corentin Wallez: Is the plan to eventually always have `support_locking` be true, so that we can remove the dual-code-path complexity?
raw_ptr<base::Lock> lock_ptr_ = nullptr;

Kai Ninomiya: nit: make this `const` if possible so it's clear it doesn't change state,
or if possible combine these into one member to simplify and avoid having a self-pointer. Either:
- `const std::unique_ptr<base::Lock> lock_;` + `lock_.get()`
- `const std::optional<base::Lock> lock_;` + `base::OptionalToPtr(lock_)`
I would guess the latter is slightly more efficient (cache locality), but I don't think it really makes a difference.
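A sketch of the second option (class and method names are illustrative, not the CL's actual code). `const` is omitted on the member here because `base::Lock::Acquire()` is non-const; `base::AutoLockMaybe` accepts a nullable pointer and is a no-op when it is null.

```cpp
#include <optional>

#include "base/synchronization/lock.h"
#include "base/types/optional_util.h"

class WireServicesSketch {
 public:
  explicit WireServicesSketch(bool support_locking) {
    if (support_locking) {
      lock_.emplace();  // Only construct the lock when locking is requested.
    }
  }

  void DoLockedWork() {
    // OptionalToPtr yields nullptr when lock_ is empty, so the
    // single-threaded path skips locking entirely.
    base::AutoLockMaybe auto_lock(base::OptionalToPtr(lock_));
    // ... touch wire state here ...
  }

 private:
  std::optional<base::Lock> lock_;
};
```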
// TODO(enga): Instead of a CHECK, this could generate a device lost

Kai Ninomiya: +1
scoped_refptr<base::SequencedTaskRunner> main_task_runner_;

Loko Kung: Yea, sorry, I was putting a DD together for this more or less at the same time as implementing it, because it was difficult to understand what was happening without actually trying stuff. The doc is relatively complete now at: https://docs.google.com/document/d/1fd1u4U8z3_tMvSE9UCATNp0mdxyWebqnAo3zkMCNKPQ/edit?usp=sharing
As for the question about downgrading all callbacks to `AllowProcessEvents`: the callbacks that would need to be downgraded are the user callbacks on the client side, not the server ones. The server ones are still all spontaneous. If we wanted to force all client callbacks to be `AllowProcessEvents`, we would need to manually override all the async entry points to not take a callback mode, or something similar. I'm also not sure that's what we want anyway, since we may need to use `WaitAnyOnly` for some things as well.
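For reference, a sketch of the `WaitAnyOnly` flow mentioned above (illustrative, again assuming a recent Dawn `webgpu_cpp.h`): the callback completes synchronously inside `instance.WaitAny()` on the waiting thread, even if the underlying work finished on another thread in parallel.

```cpp
#include <webgpu/webgpu_cpp.h>

void WaitForMapSketch(const wgpu::Instance& instance,
                      const wgpu::Buffer& buffer) {
  wgpu::Future future = buffer.MapAsync(
      wgpu::MapMode::Read, 0, wgpu::kWholeMapSize,
      wgpu::CallbackMode::WaitAnyOnly,
      [](wgpu::MapAsyncStatus status, wgpu::StringView message) {
        // Fires on the waiting thread, inside WaitAny() below.
      });

  // Block (up to 1s) until the future completes; the callback runs here.
  instance.WaitAny(future, 1'000'000'000);
}
```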
raw_ptr<base::Lock> lock_ptr_ = nullptr;

Loko Kung: Done
void DawnWireServices::ClientHandleCommands(

Loko Kung: Done
// TODO(enga): Instead of a CHECK, this could generate a device lost

Loko Kung: Done
void DawnWireServices::SerializerCommit() {

Loko Kung: Done
bool support_locking)

Loko Kung: Maybe? I did this for now so that I could implement the new stuff without changing the existing behavior in most cases. I'm not sure we would ever need to enable this all the time, though, since the additional locking could regress performance a little if we turned it on unconditionally.
That said, if we do some benchmarking and don't see a significant performance issue, it would definitely be nice to use this as the default path and not have to maintain two branches.
scoped_refptr<base::SequencedTaskRunner> main_task_runner_;

Corentin Wallez: Thank you for the DD, it helps a lot to understand where this is going and how it fits into the larger effort. DD LGTM.
Alternatively, could we first make all WebGPU callbacks use `AllowProcessEvents` and call `instance.ProcessEvents` after `dawn::wire::client::Client::HandleCommands`, so that we don't need to carry around two different callback modes that we'll have to think about? This would be done at [1] and, in this CL, turned into the `else` branch of `if (main_task_runner_)`. This is again to make sure we don't end up with duplicate code paths that we have to test and be aware of as we're making changes.
bool support_locking)

Corentin Wallez: My understanding is that all these code paths are fairly cold, so adding a little uncontended locking should be very cheap. We could start with a boolean, then Finch it to eventually set it to true unconditionally.
bool support_locking)

Kai Ninomiya: I agree these code paths seem cold enough that the locking overhead should be insignificant. Finch plan SGTM (as a followup).
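A sketch of what the Finch step could look like (the feature name is invented for illustration; it is not from this CL):

```cpp
#include "base/feature_list.h"

// Hypothetical feature gating the locking path; flipped via Finch, then
// eventually hard-coded to true once benchmarks confirm the overhead is
// negligible.
BASE_FEATURE(kDawnWireServicesLocking,
             "DawnWireServicesLocking",
             base::FEATURE_DISABLED_BY_DEFAULT);

bool ShouldSupportLocking() {
  return base::FeatureList::IsEnabled(kDawnWireServicesLocking);
}
```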
bool WebGPUImplementation::EnsureAwaitingFlush() {

Kai Ninomiya: nit: Maybe this could be moved into DawnWireServices so it doesn't have to take the lock twice.
helper_->FlushLazy();

Kai Ninomiya: Is it OK that this is not locked?
scoped_refptr<base::SequencedTaskRunner> main_task_runner_;

Loko Kung: I don't want to change the callback types in this change, since that affects a lot of the Blink files. But because explicitly calling `ProcessEvents` here is just a no-op given that the existing modes are all `AllowSpontaneous`, I added the call now. In the child CL, I will remove the need to branch on modes and just have everything be `ProcessEvents`, which I think should be equivalent.
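A rough sketch of the shape being agreed on here (function and parameter names are illustrative; only `HandleCommands` and `ProcessEvents` are real API): handle the return commands from the GPU process, then immediately pump client events.

```cpp
#include <cstddef>

#include "base/check.h"
#include "dawn/wire/WireClient.h"
#include <webgpu/webgpu_cpp.h>

void HandleReturnCommandsSketch(dawn::wire::WireClient* wire_client,
                                const wgpu::Instance& instance,
                                const volatile char* data,
                                size_t size) {
  // Deserialize and apply the return commands from the GPU process. We
  // expect these commands to be well-formed, so failure is fatal.
  const volatile char* unused = wire_client->HandleCommands(data, size);
  CHECK(unused != nullptr);

  // With AllowSpontaneous callbacks this is a no-op today; once callbacks
  // are AllowProcessEvents, this is where they fire.
  instance.ProcessEvents();
}
```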
bool support_locking)

Loko Kung: Acknowledged
bool WebGPUImplementation::EnsureAwaitingFlush() {

Loko Kung: Done
helper_->FlushLazy();

Loko Kung: I think it should be OK in this case, because a race would imply that the wire had already been cleaned up anyway.
LGTM but Corentin should probably re-review
Code-Review +1
[webgpu] Make DawnWireServices optionally thread-safe.
- This change adds an optional lock in DawnWireServices to
make it thread-safe in the case where we want the wire client
to handle GPU return data on a separate thread.
- The change also removes the public getters to the internals of
the DawnWireServices object in favor of methods that directly
call into the internals to allow us to take the lock
appropriately when needed.
- Note that we also add calls to ProcessEvents on the Dawn wire
when running in a multithreaded environment since the
callbacks must happen on the thread which created the
DawnWireServices object.