WebNN: Handle WebNN context lost error [chromium/src : main]

73 views
Skip to first unread message

Mingming1 Xu (Gerrit)

unread,
Jun 12, 2024, 4:53:06 AMJun 12
to ningxin hu, Tricium, chromium...@chromium.org, Kentaro Hara, Jiewei Qian, blink-revie...@chromium.org, blink-...@chromium.org, blundell+...@chromium.org, ipc-securi...@chromium.org, jmedle...@chromium.org
Attention needed from ningxin hu

Mingming1 Xu added 1 comment

Patchset-level comments
File-level comment, Patchset 4 (Latest):
Mingming1 Xu . resolved

PTAL, thanks!

Open in Gerrit

Related details

Attention is currently required from:
  • ningxin hu
Submit Requirements:
  • requirement satisfiedCode-Coverage
  • requirement is not satisfiedCode-Review
Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. DiffyGerrit
Gerrit-MessageType: comment
Gerrit-Project: chromium/src
Gerrit-Branch: main
Gerrit-Change-Id: I16775ac8cdbd508e04025c22740d13f01e591b0d
Gerrit-Change-Number: 5602134
Gerrit-PatchSet: 4
Gerrit-Owner: Mingming1 Xu <mingmi...@intel.com>
Gerrit-Reviewer: ningxin hu <ningx...@intel.com>
Gerrit-CC: Jiewei Qian <q...@chromium.org>
Gerrit-CC: Kentaro Hara <har...@chromium.org>
Gerrit-Attention: ningxin hu <ningx...@intel.com>
Gerrit-Comment-Date: Wed, 12 Jun 2024 08:52:57 +0000
Gerrit-HasComments: Yes
Gerrit-Has-Labels: No
satisfied_requirement
unsatisfied_requirement
open
diffy

ningxin hu (Gerrit)

unread,
Jun 13, 2024, 8:59:44 AMJun 13
to Mingming1 Xu, Rafael Cintron, Austin Sullivan, Bryan Bernhart, Reilly Grant, Tricium, chromium...@chromium.org, Kentaro Hara, Jiewei Qian, blink-revie...@chromium.org, blink-...@chromium.org, blundell+...@chromium.org, ipc-securi...@chromium.org, jmedle...@chromium.org
Attention needed from Austin Sullivan, Mingming1 Xu and Rafael Cintron

ningxin hu added 4 comments

Patchset-level comments
ningxin hu . resolved

Adding other reviewers. Please also take a look, thanks!

File services/webnn/webnn_context_impl.cc
Line 98, Patchset 4 (Latest): // receiver_.ReportBadMessage(context_lost_info);
ningxin hu . unresolved

Do we need it?

File third_party/blink/renderer/modules/ml/ml_context.cc
Line 224, Patchset 4 (Latest):void MLContext::OnWebNNContextLostCaptured(const String& context_lost_info) {
ningxin hu . unresolved

Should this info be reported to JS code?

File third_party/blink/renderer/modules/ml/ml_context_lost_info.h
Line 27, Patchset 4 (Latest): String message_;
ningxin hu . unresolved

const String?

Open in Gerrit

Related details

Attention is currently required from:
  • Austin Sullivan
  • Mingming1 Xu
  • Rafael Cintron
Submit Requirements:
    • requirement satisfiedCode-Coverage
    • requirement is not satisfiedCode-Review
    • requirement is not satisfiedNo-Unresolved-Comments
    Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. DiffyGerrit
    Gerrit-MessageType: comment
    Gerrit-Project: chromium/src
    Gerrit-Branch: main
    Gerrit-Change-Id: I16775ac8cdbd508e04025c22740d13f01e591b0d
    Gerrit-Change-Number: 5602134
    Gerrit-PatchSet: 4
    Gerrit-Owner: Mingming1 Xu <mingmi...@intel.com>
    Gerrit-Reviewer: Austin Sullivan <asu...@chromium.org>
    Gerrit-Reviewer: Rafael Cintron <rafael....@microsoft.com>
    Gerrit-Reviewer: ningxin hu <ningx...@intel.com>
    Gerrit-CC: Bryan Bernhart <bryan.b...@intel.com>
    Gerrit-CC: Jiewei Qian <q...@chromium.org>
    Gerrit-CC: Kentaro Hara <har...@chromium.org>
    Gerrit-CC: Reilly Grant <rei...@chromium.org>
    Gerrit-Attention: Austin Sullivan <asu...@chromium.org>
    Gerrit-Attention: Mingming1 Xu <mingmi...@intel.com>
    Gerrit-Attention: Rafael Cintron <rafael....@microsoft.com>
    Gerrit-Comment-Date: Thu, 13 Jun 2024 12:59:32 +0000
    Gerrit-HasComments: Yes
    Gerrit-Has-Labels: No
    satisfied_requirement
    unsatisfied_requirement
    open
    diffy

    Bryan Bernhart (Gerrit)

    unread,
    Jun 13, 2024, 1:45:38 PMJun 13
    to Mingming1 Xu, Rafael Cintron, Austin Sullivan, Reilly Grant, ningxin hu, Tricium, chromium...@chromium.org, Kentaro Hara, Jiewei Qian, blink-revie...@chromium.org, blink-...@chromium.org, blundell+...@chromium.org, ipc-securi...@chromium.org, jmedle...@chromium.org
    Attention needed from Austin Sullivan, Mingming1 Xu and Rafael Cintron

    Bryan Bernhart added 6 comments

    File services/webnn/dml/context_impl_dml.cc
    Line 225, Patchset 4 (Latest): // TODO(crbug.com/41492165): generate error using context.
    Bryan Bernhart . unresolved

    Can we handle context lost here as well?

    File services/webnn/dml/graph_impl_dml.cc
    Line 5755, Patchset 4 (Latest):// TODO(crbug.com/41492165): generate error using context.
    Bryan Bernhart . unresolved

    nit: remove TODO?

    File services/webnn/dml/utils.cc
    Line 369, Patchset 4 (Latest):void HandleWebNNContextLost(HRESULT hr, const WebNNContextImpl* context) {
    Bryan Bernhart . unresolved

    Shouldn't this be off the `WebNNContextImpl`?

    Line 377, Patchset 4 (Latest): } else if (hr == DXGI_ERROR_DEVICE_REMOVED) {
    Bryan Bernhart . unresolved

    It would be helpful to LOG the actual reason too via GetDeviceRemovedReason().

    File services/webnn/webnn_context_impl.cc
    Line 39, Patchset 4 (Latest): context_provider_->OnConnectionError(this);
    Bryan Bernhart . unresolved

    If the context gets destroyed, isn't that considered "lost"?

    File services/webnn/webnn_graph_impl_unittest.cc
    Line 108, Patchset 4 (Latest): FakeWebNNContextImpl(mojo::PendingReceiver<mojom::WebNNContext> receiver,
    Bryan Bernhart . unresolved

    Looks like we have no test cases which generate a context lost (and ensure delivery).

    What's our plan there?

    Gerrit-Comment-Date: Thu, 13 Jun 2024 17:45:27 +0000
    Gerrit-HasComments: Yes
    Gerrit-Has-Labels: No
    satisfied_requirement
    unsatisfied_requirement
    open
    diffy

    Reilly Grant (Gerrit)

    unread,
    Jun 13, 2024, 5:12:33 PMJun 13
    to Mingming1 Xu, Rafael Cintron, Austin Sullivan, Bryan Bernhart, Reilly Grant, ningxin hu, Tricium, chromium...@chromium.org, Kentaro Hara, Jiewei Qian, blink-revie...@chromium.org, blink-...@chromium.org, blundell+...@chromium.org, ipc-securi...@chromium.org, jmedle...@chromium.org
    Attention needed from Austin Sullivan, Bryan Bernhart, Mingming1 Xu, Rafael Cintron and ningxin hu

    Reilly Grant added 10 comments

    File services/webnn/dml/context_impl_dml.cc
    Line 225, Patchset 4 (Latest): // TODO(crbug.com/41492165): generate error using context.
    Bryan Bernhart . unresolved

    Can we handle context lost here as well?

    Reilly Grant

    Are we assuming that any failure to create a buffer or graph means the context has been lost? That seems strange. I expect that context loss is a signal we specifically get from the adapter.

    File services/webnn/public/mojom/webnn_context_lost_capturer.mojom
    Line 7, Patchset 4 (Latest):// This interface runs in the GPU process and is called from the GPU process.
    Reilly Grant . unresolved

    ```suggestion
    // This interface runs in the renderer process and is called from the GPU process.
    ```

    Line 11, Patchset 4 (Latest):interface WebNNContextLostCapturer {
    Reilly Grant . unresolved

    Can we generalize this to `WebNNContextClient` to match the style of other Mojo interfaces where we have a `Foo` interface and a `FooClient` interface for any async events that need to be sent back to the consumer of the `Foo` interface?

    File services/webnn/webnn_context_impl.h
    Line 34, Patchset 4 (Latest): context_lost_capturer_remote = mojo::NullRemote());
    Reilly Grant . unresolved

    Remove this default parameter and pass this parameter through all the subclasses.

    File services/webnn/webnn_context_impl.cc
    Line 39, Patchset 4 (Latest): context_provider_->OnConnectionError(this);
    Bryan Bernhart . unresolved

    If the context gets destroyed, isn't that considered "lost"?

    Reilly Grant

    If this method is being called then the renderer isn't listening anymore.

    Line 98, Patchset 4 (Latest): // receiver_.ReportBadMessage(context_lost_info);
    ningxin hu . unresolved

    Do we need it?

    Reilly Grant

    No, this isn't a bad message.

    File third_party/blink/renderer/modules/ml/ml_context.h
    Line 154, Patchset 4 (Latest): // Reset the remote of `WebNNContext` and the receiver of
    // `WebNNContextLostCapturer`, and then resolve the `lost_property_` which
    // indicates that the WebNN context is lost. It will be called in two cases:
    // 1. The `WebNNContext` remote is cut off from its receiver, for example,
    // when the WebNN Service is crashed.
    // 2. We need to release the WebNNContext in the WebNN Service when the the
    // WebNN Service captured a WebNN context lost error.
    Reilly Grant . unresolved
    ```suggestion
    // Closes the `remote_context_` and `context_lost_capturer_receiver_` pipes
    // because the context has been lost.
    ```
    File third_party/blink/renderer/modules/ml/ml_context.cc
    Line 214, Patchset 4 (Latest): &MLContext::OnWebNNContextDisconnected, WrapWeakPersistent(this)));
    Reilly Grant . unresolved

    I would attach this to `context_lost_capturer_receiver_` so that we have the invariant that either we get a call to `OnWebNNContextLostCaptured()` explaining why the context has been lost or we get a call to this handler to indicate we never will (and assume a GPU process crash).

    Line 239, Patchset 4 (Latest): "WebNN context is lost due to connection error."));
    Reilly Grant . unresolved

    Call `OnWebNNContextLostCaptured()` with this error. Put all the logic for reporting errors in that function.

    File third_party/blink/renderer/modules/ml/ml_context_lost_info.idl
    Line 11, Patchset 4 (Latest):] interface MLContextLostInfo {
    Reilly Grant . unresolved

    Use a dictionary instead of an interface since this type doesn't have any methods. I know there are a number of examples where this sort of type is defined as an interface but most of the ones I'm responsible for (such as the WebUSB transfer result types) I wish I'd defined as dictionaries instead.

    Open in Gerrit

    Related details

    Attention is currently required from:
    • Austin Sullivan
    • Bryan Bernhart
    • Mingming1 Xu
    • Rafael Cintron
    • ningxin hu
    Submit Requirements:
    • requirement satisfiedCode-Coverage
    • requirement is not satisfiedCode-Review
    • requirement is not satisfiedNo-Unresolved-Comments
    Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. DiffyGerrit
    Gerrit-MessageType: comment
    Gerrit-Project: chromium/src
    Gerrit-Branch: main
    Gerrit-Change-Id: I16775ac8cdbd508e04025c22740d13f01e591b0d
    Gerrit-Change-Number: 5602134
    Gerrit-PatchSet: 4
    Gerrit-Owner: Mingming1 Xu <mingmi...@intel.com>
    Gerrit-Reviewer: Austin Sullivan <asu...@chromium.org>
    Gerrit-Reviewer: Rafael Cintron <rafael....@microsoft.com>
    Gerrit-Reviewer: ningxin hu <ningx...@intel.com>
    Gerrit-CC: Bryan Bernhart <bryan.b...@intel.com>
    Gerrit-CC: Jiewei Qian <q...@chromium.org>
    Gerrit-CC: Kentaro Hara <har...@chromium.org>
    Gerrit-CC: Reilly Grant <rei...@chromium.org>
    Gerrit-Attention: Austin Sullivan <asu...@chromium.org>
    Gerrit-Attention: ningxin hu <ningx...@intel.com>
    Gerrit-Attention: Mingming1 Xu <mingmi...@intel.com>
    Gerrit-Attention: Bryan Bernhart <bryan.b...@intel.com>
    Gerrit-Attention: Rafael Cintron <rafael....@microsoft.com>
    Gerrit-Comment-Date: Thu, 13 Jun 2024 21:12:21 +0000
    Gerrit-HasComments: Yes
    Gerrit-Has-Labels: No
    Comment-In-Reply-To: ningxin hu <ningx...@intel.com>
    Comment-In-Reply-To: Bryan Bernhart <bryan.b...@intel.com>
    satisfied_requirement
    unsatisfied_requirement
    open
    diffy

    Bryan Bernhart (Gerrit)

    unread,
    Jun 13, 2024, 6:48:54 PMJun 13
    to Mingming1 Xu, Rafael Cintron, Austin Sullivan, Reilly Grant, ningxin hu, Tricium, chromium...@chromium.org, Kentaro Hara, Jiewei Qian, blink-revie...@chromium.org, blink-...@chromium.org, blundell+...@chromium.org, ipc-securi...@chromium.org, jmedle...@chromium.org
    Attention needed from Austin Sullivan, Mingming1 Xu, Rafael Cintron, Reilly Grant and ningxin hu

    Bryan Bernhart added 2 comments

    File services/webnn/dml/context_impl_dml.cc
    Line 225, Patchset 4 (Latest): // TODO(crbug.com/41492165): generate error using context.
    Bryan Bernhart . unresolved

    Can we handle context lost here as well?

    Reilly Grant

    Are we assuming that any failure to create a buffer or graph means the context has been lost? That seems strange. I expect that context loss is a signal we specifically get from the adapter.

    Bryan Bernhart

    Are we assuming that any failure to create a buffer or graph means the context has been lost?

    We could, like WebGPU. If we do not then the OOM skews to some, possibly a unrelated call, which does handle context lost which is hard to debug.

    File services/webnn/webnn_context_impl.cc
    Line 39, Patchset 4 (Latest): context_provider_->OnConnectionError(this);
    Bryan Bernhart . unresolved

    If the context gets destroyed, isn't that considered "lost"?

    Reilly Grant

    If this method is being called then the renderer isn't listening anymore.

    Bryan Bernhart

    I suppose the renderer or `MLContext` could generate "lost" event beforehand. Not exactly certain why this was needed in WebGPU.

    Open in Gerrit

    Related details

    Attention is currently required from:
    • Austin Sullivan
    • Mingming1 Xu
    • Rafael Cintron
    • Reilly Grant
    • ningxin hu
    Submit Requirements:
    • requirement satisfiedCode-Coverage
    • requirement is not satisfiedCode-Review
    • requirement is not satisfiedNo-Unresolved-Comments
    Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. DiffyGerrit
    Gerrit-MessageType: comment
    Gerrit-Project: chromium/src
    Gerrit-Branch: main
    Gerrit-Change-Id: I16775ac8cdbd508e04025c22740d13f01e591b0d
    Gerrit-Change-Number: 5602134
    Gerrit-PatchSet: 4
    Gerrit-Owner: Mingming1 Xu <mingmi...@intel.com>
    Gerrit-Reviewer: Austin Sullivan <asu...@chromium.org>
    Gerrit-Reviewer: Rafael Cintron <rafael....@microsoft.com>
    Gerrit-Reviewer: ningxin hu <ningx...@intel.com>
    Gerrit-CC: Bryan Bernhart <bryan.b...@intel.com>
    Gerrit-CC: Jiewei Qian <q...@chromium.org>
    Gerrit-CC: Kentaro Hara <har...@chromium.org>
    Gerrit-CC: Reilly Grant <rei...@chromium.org>
    Gerrit-Attention: Austin Sullivan <asu...@chromium.org>
    Gerrit-Attention: Reilly Grant <rei...@chromium.org>
    Gerrit-Attention: ningxin hu <ningx...@intel.com>
    Gerrit-Attention: Mingming1 Xu <mingmi...@intel.com>
    Gerrit-Attention: Rafael Cintron <rafael....@microsoft.com>
    Gerrit-Comment-Date: Thu, 13 Jun 2024 22:48:41 +0000
    Gerrit-HasComments: Yes
    Gerrit-Has-Labels: No
    Comment-In-Reply-To: Reilly Grant <rei...@chromium.org>
    Comment-In-Reply-To: Bryan Bernhart <bryan.b...@intel.com>
    satisfied_requirement
    unsatisfied_requirement
    open
    diffy

    Reilly Grant (Gerrit)

    unread,
    Jun 13, 2024, 7:20:46 PMJun 13
    to Mingming1 Xu, Rafael Cintron, Austin Sullivan, Bryan Bernhart, Reilly Grant, ningxin hu, Tricium, chromium...@chromium.org, Kentaro Hara, Jiewei Qian, blink-revie...@chromium.org, blink-...@chromium.org, blundell+...@chromium.org, ipc-securi...@chromium.org, jmedle...@chromium.org
    Attention needed from Austin Sullivan, Bryan Bernhart, Mingming1 Xu, Rafael Cintron and ningxin hu

    Reilly Grant added 1 comment

    File services/webnn/dml/context_impl_dml.cc
    Line 225, Patchset 4 (Latest): // TODO(crbug.com/41492165): generate error using context.
    Bryan Bernhart . unresolved

    Can we handle context lost here as well?

    Reilly Grant

    Are we assuming that any failure to create a buffer or graph means the context has been lost? That seems strange. I expect that context loss is a signal we specifically get from the adapter.

    Bryan Bernhart

    Are we assuming that any failure to create a buffer or graph means the context has been lost?

    We could, like WebGPU. If we do not then the OOM skews to some, possibly a unrelated call, which does handle context lost which is hard to debug.

    Reilly Grant

    I don't think we should be reporting a context loss for errors like this. That's a separate concept.

    Open in Gerrit

    Related details

    Attention is currently required from:
    • Austin Sullivan
    • Bryan Bernhart
    • Mingming1 Xu
    • Rafael Cintron
    • ningxin hu
    Submit Requirements:
    • requirement satisfiedCode-Coverage
    • requirement is not satisfiedCode-Review
    • requirement is not satisfiedNo-Unresolved-Comments
    Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. DiffyGerrit
    Gerrit-MessageType: comment
    Gerrit-Project: chromium/src
    Gerrit-Branch: main
    Gerrit-Change-Id: I16775ac8cdbd508e04025c22740d13f01e591b0d
    Gerrit-Change-Number: 5602134
    Gerrit-PatchSet: 4
    Gerrit-Owner: Mingming1 Xu <mingmi...@intel.com>
    Gerrit-Reviewer: Austin Sullivan <asu...@chromium.org>
    Gerrit-Reviewer: Rafael Cintron <rafael....@microsoft.com>
    Gerrit-Reviewer: ningxin hu <ningx...@intel.com>
    Gerrit-CC: Bryan Bernhart <bryan.b...@intel.com>
    Gerrit-CC: Jiewei Qian <q...@chromium.org>
    Gerrit-CC: Kentaro Hara <har...@chromium.org>
    Gerrit-CC: Reilly Grant <rei...@chromium.org>
    Gerrit-Attention: Austin Sullivan <asu...@chromium.org>
    Gerrit-Attention: ningxin hu <ningx...@intel.com>
    Gerrit-Attention: Mingming1 Xu <mingmi...@intel.com>
    Gerrit-Attention: Bryan Bernhart <bryan.b...@intel.com>
    Gerrit-Attention: Rafael Cintron <rafael....@microsoft.com>
    Gerrit-Comment-Date: Thu, 13 Jun 2024 23:20:35 +0000
    satisfied_requirement
    unsatisfied_requirement
    open
    diffy

    Bryan Bernhart (Gerrit)

    unread,
    Jun 13, 2024, 7:35:12 PMJun 13
    to Mingming1 Xu, Rafael Cintron, Austin Sullivan, Reilly Grant, ningxin hu, Tricium, chromium...@chromium.org, Kentaro Hara, Jiewei Qian, blink-revie...@chromium.org, blink-...@chromium.org, blundell+...@chromium.org, ipc-securi...@chromium.org, jmedle...@chromium.org
    Attention needed from Austin Sullivan, Mingming1 Xu, Rafael Cintron, Reilly Grant and ningxin hu

    Bryan Bernhart added 1 comment

    File services/webnn/dml/context_impl_dml.cc
    Line 225, Patchset 4 (Latest): // TODO(crbug.com/41492165): generate error using context.
    Bryan Bernhart . unresolved

    Can we handle context lost here as well?

    Reilly Grant

    Are we assuming that any failure to create a buffer or graph means the context has been lost? That seems strange. I expect that context loss is a signal we specifically get from the adapter.

    Bryan Bernhart

    Are we assuming that any failure to create a buffer or graph means the context has been lost?

    We could, like WebGPU. If we do not then the OOM skews to some, possibly a unrelated call, which does handle context lost which is hard to debug.

    Reilly Grant

    I don't think we should be reporting a context loss for errors like this. That's a separate concept.

    Bryan Bernhart

    If this returns E_OUTOFMEMORY, the context will become inoperable and effectively lost. Also, the HR could be any other error result (but unlikely) so I can see it being quite useful to coverage to one error handler.

    Open in Gerrit

    Related details

    Attention is currently required from:
    • Austin Sullivan
    • Mingming1 Xu
    • Rafael Cintron
    • Reilly Grant
    • ningxin hu
    Submit Requirements:
    • requirement satisfiedCode-Coverage
    • requirement is not satisfiedCode-Review
    • requirement is not satisfiedNo-Unresolved-Comments
    Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. DiffyGerrit
    Gerrit-MessageType: comment
    Gerrit-Project: chromium/src
    Gerrit-Branch: main
    Gerrit-Change-Id: I16775ac8cdbd508e04025c22740d13f01e591b0d
    Gerrit-Change-Number: 5602134
    Gerrit-PatchSet: 4
    Gerrit-Owner: Mingming1 Xu <mingmi...@intel.com>
    Gerrit-Reviewer: Austin Sullivan <asu...@chromium.org>
    Gerrit-Reviewer: Rafael Cintron <rafael....@microsoft.com>
    Gerrit-Reviewer: ningxin hu <ningx...@intel.com>
    Gerrit-CC: Bryan Bernhart <bryan.b...@intel.com>
    Gerrit-CC: Jiewei Qian <q...@chromium.org>
    Gerrit-CC: Kentaro Hara <har...@chromium.org>
    Gerrit-CC: Reilly Grant <rei...@chromium.org>
    Gerrit-Attention: Austin Sullivan <asu...@chromium.org>
    Gerrit-Attention: Reilly Grant <rei...@chromium.org>
    Gerrit-Attention: ningxin hu <ningx...@intel.com>
    Gerrit-Attention: Mingming1 Xu <mingmi...@intel.com>
    Gerrit-Attention: Rafael Cintron <rafael....@microsoft.com>
    Gerrit-Comment-Date: Thu, 13 Jun 2024 23:35:00 +0000
    satisfied_requirement
    unsatisfied_requirement
    open
    diffy

    Mingming1 Xu (Gerrit)

    unread,
    Jun 14, 2024, 5:28:19 AMJun 14
    to AyeAye, Rafael Cintron, Austin Sullivan, Bryan Bernhart, Reilly Grant, ningxin hu, Tricium, chromium...@chromium.org, Kentaro Hara, Jiewei Qian, mac-r...@chromium.org, blink-revie...@chromium.org, blink-...@chromium.org, blundell+...@chromium.org, ipc-securi...@chromium.org, jmedle...@chromium.org
    Attention needed from Austin Sullivan, Bryan Bernhart, Rafael Cintron, Reilly Grant and ningxin hu

    Mingming1 Xu added 13 comments

    Patchset-level comments
    File-level comment, Patchset 6 (Latest):
    Mingming1 Xu . resolved

    Addressed some of your comments, will continue addressing others.

    File services/webnn/dml/graph_impl_dml.cc
    Line 5755, Patchset 4:// TODO(crbug.com/41492165): generate error using context.
    Bryan Bernhart . resolved

    nit: remove TODO?

    Mingming1 Xu

    Done

    File services/webnn/public/mojom/webnn_context_lost_capturer.mojom
    Line 7, Patchset 4:// This interface runs in the GPU process and is called from the GPU process.
    Reilly Grant . resolved

    ```suggestion
    // This interface runs in the renderer process and is called from the GPU process.
    ```

    Mingming1 Xu

    Done

    Line 11, Patchset 4:interface WebNNContextLostCapturer {
    Reilly Grant . unresolved

    Can we generalize this to `WebNNContextClient` to match the style of other Mojo interfaces where we have a `Foo` interface and a `FooClient` interface for any async events that need to be sent back to the consumer of the `Foo` interface?

    Mingming1 Xu

    Good idea! Now I delete this file to move the `WebNNContextClient` to `webnn_context_provider.mojom`, PTAL.

    File services/webnn/webnn_context_impl.h
    Line 34, Patchset 4: context_lost_capturer_remote = mojo::NullRemote());
    Reilly Grant . resolved

    Remove this default parameter and pass this parameter through all the subclasses.

    Mingming1 Xu

    Done

    File services/webnn/webnn_context_impl.cc
    Line 39, Patchset 4: context_provider_->OnConnectionError(this);
    Bryan Bernhart . unresolved

    If the context gets destroyed, isn't that considered "lost"?

    Reilly Grant

    If this method is being called then the renderer isn't listening anymore.

    Bryan Bernhart

    I suppose the renderer or `MLContext` could generate "lost" event beforehand. Not exactly certain why this was needed in WebGPU.

    Mingming1 Xu

    Yes, I think this method will only be called when the renderer destroys the WebNNContext pipe.

    Line 98, Patchset 4: // receiver_.ReportBadMessage(context_lost_info);
    ningxin hu . resolved

    Do we need it?

    Reilly Grant

    No, this isn't a bad message.

    Mingming1 Xu

    Done

    File third_party/blink/renderer/modules/ml/ml_context.h
    Line 154, Patchset 4: // Reset the remote of `WebNNContext` and the receiver of

    // `WebNNContextLostCapturer`, and then resolve the `lost_property_` which
    // indicates that the WebNN context is lost. It will be called in two cases:
    // 1. The `WebNNContext` remote is cut off from its receiver, for example,
    // when the WebNN Service is crashed.
    // 2. We need to release the WebNNContext in the WebNN Service when the the
    // WebNN Service captured a WebNN context lost error.
    Reilly Grant . unresolved
    ```suggestion
    // Closes the `remote_context_` and `context_lost_capturer_receiver_` pipes
    // because the context has been lost.
    ```
    Mingming1 Xu

    Done. I move these comments above the `OnWebNNContextLostCaptured` because now `OnWebNNContextDisconnected` will call it. Also rename them as `OnLostCaptured` and `OnDisconnected` to simplify.

    File third_party/blink/renderer/modules/ml/ml_context.cc
    Line 214, Patchset 4: &MLContext::OnWebNNContextDisconnected, WrapWeakPersistent(this)));
    Reilly Grant . resolved

    I would attach this to `context_lost_capturer_receiver_` so that we have the invariant that either we get a call to `OnWebNNContextLostCaptured()` explaining why the context has been lost or we get a call to this handler to indicate we never will (and assume a GPU process crash).

    Mingming1 Xu

    Done

    Line 224, Patchset 4:void MLContext::OnWebNNContextLostCaptured(const String& context_lost_info) {
    ningxin hu . unresolved

    Should this info be reported to JS code?

    Mingming1 Xu

    Now I reported these info related to adapter error in `src\services\webnn\dml\utils.cc`:

      `std::string context_lost_info;
    if (hr == E_OUTOFMEMORY) {
    context_lost_info = "out of memory.";

    } else if (hr == DXGI_ERROR_DEVICE_REMOVED) {
        context_lost_info = "device removed.";
    } else if (hr == DXGI_ERROR_DEVICE_HUNG) {
    context_lost_info = "device hung.";
    } else if (hr == DXGI_ERROR_DEVICE_RESET) {
    context_lost_info = "device reset.";
    } else {
    return;
    }`
    Line 239, Patchset 4: "WebNN context is lost due to connection error."));
    Reilly Grant . resolved

    Call `OnWebNNContextLostCaptured()` with this error. Put all the logic for reporting errors in that function.

    Mingming1 Xu

    Done

    File third_party/blink/renderer/modules/ml/ml_context_lost_info.h
    Line 27, Patchset 4: String message_;
    ningxin hu . resolved

    const String?

    Mingming1 Xu

    Now I use `dictionary MLContextLostInfo`, the header file will be generated automatically.

    File third_party/blink/renderer/modules/ml/ml_context_lost_info.idl
    Line 11, Patchset 4:] interface MLContextLostInfo {
    Reilly Grant . unresolved

    Use a dictionary instead of an interface since this type doesn't have any methods. I know there are a number of examples where this sort of type is defined as an interface but most of the ones I'm responsible for (such as the WebUSB transfer result types) I wish I'd defined as dictionaries instead.

    Mingming1 Xu

    Done. I deleted this file and remove the dictionary definition to `ml_context.idl`, PTAL.

    Open in Gerrit

    Related details

    Attention is currently required from:
    • Austin Sullivan
    • Bryan Bernhart
    • Rafael Cintron
    • Reilly Grant
    • ningxin hu
    Submit Requirements:
    • requirement satisfiedCode-Coverage
    • requirement is not satisfiedCode-Review
    • requirement is not satisfiedNo-Unresolved-Comments
    Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. DiffyGerrit
    Gerrit-MessageType: comment
    Gerrit-Project: chromium/src
    Gerrit-Branch: main
    Gerrit-Change-Id: I16775ac8cdbd508e04025c22740d13f01e591b0d
    Gerrit-Change-Number: 5602134
    Gerrit-PatchSet: 6
    Gerrit-Owner: Mingming1 Xu <mingmi...@intel.com>
    Gerrit-Reviewer: Austin Sullivan <asu...@chromium.org>
    Gerrit-Reviewer: Rafael Cintron <rafael....@microsoft.com>
    Gerrit-Reviewer: ningxin hu <ningx...@intel.com>
    Gerrit-CC: Bryan Bernhart <bryan.b...@intel.com>
    Gerrit-CC: Jiewei Qian <q...@chromium.org>
    Gerrit-CC: Kentaro Hara <har...@chromium.org>
    Gerrit-CC: Reilly Grant <rei...@chromium.org>
    Gerrit-Attention: Austin Sullivan <asu...@chromium.org>
    Gerrit-Attention: Reilly Grant <rei...@chromium.org>
    Gerrit-Attention: ningxin hu <ningx...@intel.com>
    Gerrit-Attention: Bryan Bernhart <bryan.b...@intel.com>
    Gerrit-Attention: Rafael Cintron <rafael....@microsoft.com>
    Gerrit-Comment-Date: Fri, 14 Jun 2024 09:28:08 +0000
    Gerrit-HasComments: Yes
    Gerrit-Has-Labels: No
    Comment-In-Reply-To: Reilly Grant <rei...@chromium.org>
    satisfied_requirement
    unsatisfied_requirement
    open
    diffy

    Reilly Grant (Gerrit)

    unread,
    Jun 14, 2024, 2:45:43 PMJun 14
    to Mingming1 Xu, AyeAye, Rafael Cintron, Austin Sullivan, Bryan Bernhart, Reilly Grant, ningxin hu, Tricium, chromium...@chromium.org, Kentaro Hara, Jiewei Qian, mac-r...@chromium.org, blink-revie...@chromium.org, blink-...@chromium.org, blundell+...@chromium.org, ipc-securi...@chromium.org, jmedle...@chromium.org
    Attention needed from Austin Sullivan, Bryan Bernhart, Mingming1 Xu, Rafael Cintron and ningxin hu

    Reilly Grant added 6 comments

    File services/webnn/dml/graph_impl_dml.cc
    Line 4619, Patchset 6 (Latest): HandleWebNNContextLost(hr, context);
    Reilly Grant . unresolved

    @rafael....@microsoft.com, can you confirm that all graph creation failures should be considered "context lost" scenarios?

    File services/webnn/dml/utils.cc
    Line 374, Patchset 6 (Latest): std::string message;
    Reilly Grant . unresolved

    nit: use `std::string_view` and a `switch` statement.

    File services/webnn/public/mojom/webnn_context_provider.mojom
    Line 113, Patchset 6 (Latest): // Called by the GPU process to create a message pipe for
    // `WebNNContextClient`, the `message` is the error message
    // sent to the renderer process, the renderer process will
    // override this method to handle the lost error.
    Reilly Grant . unresolved
    ```suggestion
    // Called to explain why the `WebNNContextClient` is no longer available.
    ```
    Line 117, Patchset 6 (Latest): OnLostCaptured(string message);
    Reilly Grant . unresolved

    To match WebGPU and better express that the context is unusable after this.

    ```suggestion
    OnDestroyed(string message);
    ```
    File services/webnn/webnn_context_impl.cc
    Line 95, Patchset 6 (Latest): client_remote_->OnLostCaptured(message);
    Reilly Grant . unresolved

    After sending this message we should call `context_provider_->OnConnectionError(this)` so that the context provider destroys this object.

    File services/webnn/webnn_context_provider_impl.cc
    Line 14, Patchset 6 (Latest):#include "services/webnn/public/mojom/webnn_context_lost_capturer.mojom.h"
    Reilly Grant . unresolved

    This file no longer exists.

    Open in Gerrit

    Related details

    Attention is currently required from:
    • Austin Sullivan
    • Bryan Bernhart
    • Mingming1 Xu
    • Rafael Cintron
    • ningxin hu
    Gerrit-Attention: ningxin hu <ningx...@intel.com>
    Gerrit-Attention: Mingming1 Xu <mingmi...@intel.com>
    Gerrit-Attention: Bryan Bernhart <bryan.b...@intel.com>
    Gerrit-Attention: Rafael Cintron <rafael....@microsoft.com>
    Gerrit-Comment-Date: Fri, 14 Jun 2024 18:45:31 +0000
    Gerrit-HasComments: Yes
    Gerrit-Has-Labels: No
    satisfied_requirement
    unsatisfied_requirement
    open
    diffy

    Bryan Bernhart (Gerrit)

    unread,
    Jun 14, 2024, 5:00:07 PMJun 14
    to Mingming1 Xu, AyeAye, Rafael Cintron, Austin Sullivan, Reilly Grant, ningxin hu, Tricium, chromium...@chromium.org, Kentaro Hara, Jiewei Qian, mac-r...@chromium.org, blink-revie...@chromium.org, blink-...@chromium.org, blundell+...@chromium.org, ipc-securi...@chromium.org, jmedle...@chromium.org
    Attention needed from Austin Sullivan, Mingming1 Xu, Rafael Cintron, Reilly Grant and ningxin hu

    Bryan Bernhart added 1 comment

    File services/webnn/webnn_context_impl.cc
    Line 39, Patchset 4: context_provider_->OnConnectionError(this);
    Bryan Bernhart . unresolved

    If the context gets destroyed, isn't that considered "lost"?

    Reilly Grant

    If this method is being called then the renderer isn't listening anymore.

    Bryan Bernhart

    I suppose the renderer or `MLContext` could generate "lost" event beforehand. Not exactly certain why this was needed in WebGPU.

    Mingming1 Xu

    Yes, I think this method will only be called when the renderer destroys the WebNNContext pipe.

    Bryan Bernhart

    Dug into this more. Looks like WebGPU considers destroyed as "lost" to communicate the context/device shouldn't be kept alive, by the web developer, if some transient (but still fatal) failure occurred. A transient error could be OOM, for example.

    Open in Gerrit

    Related details

    Attention is currently required from:
    • Austin Sullivan
    • Mingming1 Xu
    • Rafael Cintron
    • Reilly Grant
    • ningxin hu
    Gerrit-Attention: Reilly Grant <rei...@chromium.org>
    Gerrit-Attention: ningxin hu <ningx...@intel.com>
    Gerrit-Attention: Mingming1 Xu <mingmi...@intel.com>
    Gerrit-Attention: Rafael Cintron <rafael....@microsoft.com>
    Gerrit-Comment-Date: Fri, 14 Jun 2024 20:59:52 +0000
    Gerrit-HasComments: Yes
    Gerrit-Has-Labels: No
    Comment-In-Reply-To: Reilly Grant <rei...@chromium.org>
    Comment-In-Reply-To: Mingming1 Xu <mingmi...@intel.com>
    Comment-In-Reply-To: Bryan Bernhart <bryan.b...@intel.com>
    satisfied_requirement
    unsatisfied_requirement
    open
    diffy

    Rafael Cintron (Gerrit)

    unread,
    Jun 14, 2024, 7:16:20 PMJun 14
    to Mingming1 Xu, AyeAye, Austin Sullivan, Bryan Bernhart, Reilly Grant, ningxin hu, Tricium, chromium...@chromium.org, Kentaro Hara, Jiewei Qian, mac-r...@chromium.org, blink-revie...@chromium.org, blink-...@chromium.org, blundell+...@chromium.org, ipc-securi...@chromium.org, jmedle...@chromium.org
    Attention needed from Austin Sullivan, Mingming1 Xu, Reilly Grant and ningxin hu

    Rafael Cintron added 1 comment

    File services/webnn/dml/graph_impl_dml.cc
    Line 4619, Patchset 6 (Latest): HandleWebNNContextLost(hr, context);
    Reilly Grant . unresolved

    @rafael....@microsoft.com, can you confirm that all graph creation failures should be considered "context lost" scenarios?

    Rafael Cintron

    I'm glad we're having this conversation. :-)

    Here are the three errors Chromium should handle from DirectML and Direct3D:

    1- `DXGI_ERROR_DEVICE_REMOVED`
    "Device removed" can happen for reasons Chromium is not able to control such as a driver update, other application on the system behaving poorly or the adapter actually being physically removed.

    The latter can happen if you're performing computation on the Surface Book's NVidia GPU (which is under the keyboard) and the user detaches the tablet portion from the keyboard. The Intel GPU, which is behind the monitor, is the only one left you can use.

    Since removed devices are basically useless and the D3D12 device is a singleton per process, and per adapter, it is difficult for WebNN, or other individual Chromium system to recover from this on its own. Hence, other code in the GPU process detects the device has become removed (at least on the D3D11 device) and terminates the GPU process with a special error code. The browser process restarts it.

    Separately from this change, we should work with the people who own that code in Chromium to make WebNN be a part of it somehow.

    For this case, #1, the renderer process will see that the GPU process has been disconnected and we should resolve the "lost" promise.

    2- `E_OUTOFMEMORY`
    It is possible to handle `E_OUTOFMEMORY` without everything becoming hosed. _However_, when the GPU process reaches the job commit limit, the browser process is notified by the OS. Upon the browser process being notified, it terminates and restarts the GPU process. So, you have a race between the GPU process handling this gracefully and the browser process terminating it.

    I think we should handle this case gracefully. In practice, the web developer will see the GPU process connection get terminated, same as the "device removed".

    3- <Everything else>
    For any other error, we should treat it as Chromium passing invalid parameters to the API and do a `CHECK` so that we can get crash dumps and diagnose.

    Here is the [Dawn code](https://source.chromium.org/chromium/chromium/src/+/main:third_party/dawn/src/dawn/native/d3d/D3DError.cpp) which handles errors similar to what I described above.


    We should `LOG(ERROR)` any failure we receive from DirectX along with the `logging::SystemErrorCodeToString` friendly string, even ones we CHECK for. This will become critical to root causing failures in the wild from web developers. Pooja has already converted these but we should convert others that might have been missed.

    Open in Gerrit

    Related details

    Attention is currently required from:
    • Austin Sullivan
    • Mingming1 Xu
    • Reilly Grant
    • ningxin hu
    Gerrit-Comment-Date: Fri, 14 Jun 2024 23:16:08 +0000
    satisfied_requirement
    unsatisfied_requirement
    open
    diffy

    Reilly Grant (Gerrit)

    unread,
    Jun 14, 2024, 8:46:56 PMJun 14
    to Mingming1 Xu, AyeAye, Rafael Cintron, Austin Sullivan, Bryan Bernhart, Reilly Grant, ningxin hu, Tricium, chromium...@chromium.org, Kentaro Hara, Jiewei Qian, mac-r...@chromium.org, blink-revie...@chromium.org, blink-...@chromium.org, blundell+...@chromium.org, ipc-securi...@chromium.org, jmedle...@chromium.org
    Attention needed from Austin Sullivan, Mingming1 Xu, Rafael Cintron and ningxin hu

    Reilly Grant added 1 comment

    File services/webnn/dml/graph_impl_dml.cc
    Line 4619, Patchset 6 (Latest): HandleWebNNContextLost(hr, context);
    Reilly Grant . unresolved

    @rafael....@microsoft.com, can you confirm that all graph creation failures should be considered "context lost" scenarios?

    Rafael Cintron

    I'm glad we're having this conversation. :-)

    Here are the three errors Chromium should handle from DirectML and Direct3D:

    1- `DXGI_ERROR_DEVICE_REMOVED`
    "Device removed" can happen for reasons Chromium is not able to control such as a driver update, other application on the system behaving poorly or the adapter actually being physically removed.

    The latter can happen if you're performing computation on the Surface Book's NVidia GPU (which is under the keyboard) and the user detaches the tablet portion from the keyboard. The Intel GPU, which is behind the monitor, is the only one left you can use.

    Since removed devices are basically useless and the D3D12 device is a singleton per process, and per adapter, it is difficult for WebNN, or other individual Chromium system to recover from this on its own. Hence, other code in the GPU process detects the device has become removed (at least on the D3D11 device) and terminates the GPU process with a special error code. The browser process restarts it.

    Separately from this change, we should work with the people who own that code in Chromium to make WebNN be a part of it somehow.

    For this case, #1, the renderer process will see that the GPU process has been disconnected and we should resolve the "lost" promise.

    2- `E_OUTOFMEMORY`
    It is possible to handle `E_OUTOFMEMORY` without everything becoming hosed. _However_, when the GPU process reaches the job commit limit, the browser process is notified by the OS. Upon the browser process being notified, it terminates and restarts the GPU process. So, you have a race between the GPU process handling this gracefully and the browser process terminating it.

    I think we should handle this case gracefully. In practice, the web developer will see the GPU process connection get terminated, same as the "device removed".

    3- <Everything else>
    For any other error, we should treat it as Chromium passing invalid parameters to the API and do a `CHECK` so that we can get crash dumps and diagnose.

    Here is the [Dawn code](https://source.chromium.org/chromium/chromium/src/+/main:third_party/dawn/src/dawn/native/d3d/D3DError.cpp) which handles errors similar to what I described above.


    We should `LOG(ERROR)` any failure we receive from DirectX along with the `logging::SystemErrorCodeToString` friendly string, even ones we CHECK for. This will become critical to root causing failures in the wild from web developers. Pooja has already converted these but we should convert others that might have been missed.

    Reilly Grant

    My concern is mainly that context loss is not an error handling mechanism. If the system is broken in a way that is out of the developer's control then we can report a context loss. If the developer has asked for something unreasonable that's an error we should throw.

    In general we should not `CHECK` on OS return values. If we want to collect crash reports to diagnose unexpected errors we can use `base::debug::DumpWithoutCrashing()`.

    Open in Gerrit

    Related details

    Attention is currently required from:
    • Austin Sullivan
    • Mingming1 Xu
    • Rafael Cintron
    • ningxin hu
    Gerrit-Attention: ningxin hu <ningx...@intel.com>
    Gerrit-Attention: Mingming1 Xu <mingmi...@intel.com>
    Gerrit-Attention: Rafael Cintron <rafael....@microsoft.com>
    Gerrit-Comment-Date: Sat, 15 Jun 2024 00:46:44 +0000
    Gerrit-HasComments: Yes
    Gerrit-Has-Labels: No
    Comment-In-Reply-To: Reilly Grant <rei...@chromium.org>
    Comment-In-Reply-To: Rafael Cintron <rafael....@microsoft.com>
    satisfied_requirement
    unsatisfied_requirement
    open
    diffy

    Mingming1 Xu (Gerrit)

    unread,
    Jun 17, 2024, 5:02:22 AM (12 days ago) Jun 17
    to AyeAye, Rafael Cintron, Austin Sullivan, Bryan Bernhart, Reilly Grant, ningxin hu, Tricium, chromium...@chromium.org, Kentaro Hara, Jiewei Qian, mac-r...@chromium.org, blink-revie...@chromium.org, blink-...@chromium.org, blundell+...@chromium.org, ipc-securi...@chromium.org, jmedle...@chromium.org
    Attention needed from Austin Sullivan, Bryan Bernhart, Rafael Cintron, Reilly Grant and ningxin hu

    Mingming1 Xu added 13 comments

    File services/webnn/dml/context_impl_dml.cc
    Line 225, Patchset 4: // TODO(crbug.com/41492165): generate error using context.
    Bryan Bernhart . unresolved

    Can we handle context lost here as well?

    Reilly Grant

    Are we assuming that any failure to create a buffer or graph means the context has been lost? That seems strange. I expect that context loss is a signal we specifically get from the adapter.

    Bryan Bernhart

    Are we assuming that any failure to create a buffer or graph means the context has been lost?

    We could, like WebGPU. If we do not then the OOM skews to some, possibly a unrelated call, which does handle context lost which is hard to debug.

    Reilly Grant

    I don't think we should be reporting a context loss for errors like this. That's a separate concept.

    Bryan Bernhart

    If this returns E_OUTOFMEMORY, the context will become inoperable and effectively lost. Also, the HR could be any other error result (but unlikely) so I can see it being quite useful to coverage to one error handler.

    Mingming1 Xu

    This discussion should be merged into above one https://chromium-review.googlesource.com/c/chromium/src/+/5602134/comment/ad47c8af_d3f27f05/: whether we should handle the context lost according to the system error code `HRESULT`.

    File services/webnn/dml/graph_impl_dml.cc
    Line 4619, Patchset 6: HandleWebNNContextLost(hr, context);
    Reilly Grant . unresolved

    @rafael....@microsoft.com, can you confirm that all graph creation failures should be considered "context lost" scenarios?

    Rafael Cintron

    I'm glad we're having this conversation. :-)

    Here are the three errors Chromium should handle from DirectML and Direct3D:

    1- `DXGI_ERROR_DEVICE_REMOVED`
    "Device removed" can happen for reasons Chromium is not able to control such as a driver update, other application on the system behaving poorly or the adapter actually being physically removed.

    The latter can happen if you're performing computation on the Surface Book's NVidia GPU (which is under the keyboard) and the user detaches the tablet portion from the keyboard. The Intel GPU, which is behind the monitor, is the only one left you can use.

    Since removed devices are basically useless and the D3D12 device is a singleton per process, and per adapter, it is difficult for WebNN, or other individual Chromium system to recover from this on its own. Hence, other code in the GPU process detects the device has become removed (at least on the D3D11 device) and terminates the GPU process with a special error code. The browser process restarts it.

    Separately from this change, we should work with the people who own that code in Chromium to make WebNN be a part of it somehow.

    For this case, #1, the renderer process will see that the GPU process has been disconnected and we should resolve the "lost" promise.

    2- `E_OUTOFMEMORY`
    It is possible to handle `E_OUTOFMEMORY` without everything becoming hosed. _However_, when the GPU process reaches the job commit limit, the browser process is notified by the OS. Upon the browser process being notified, it terminates and restarts the GPU process. So, you have a race between the GPU process handling this gracefully and the browser process terminating it.

    I think we should handle this case gracefully. In practice, the web developer will see the GPU process connection get terminated, same as the "device removed".

    3- <Everything else>
    For any other error, we should treat it as Chromium passing invalid parameters to the API and do a `CHECK` so that we can get crash dumps and diagnose.

    Here is the [Dawn code](https://source.chromium.org/chromium/chromium/src/+/main:third_party/dawn/src/dawn/native/d3d/D3DError.cpp) which handles errors similar to what I described above.


    We should `LOG(ERROR)` any failure we receive from DirectX along with the `logging::SystemErrorCodeToString` friendly string, even ones we CHECK for. This will become critical to root causing failures in the wild from web developers. Pooja has already converted these but we should convert others that might have been missed.

    Reilly Grant

    My concern is mainly that context loss is not an error handling mechanism. If the system is broken in a way that is out of the developer's control then we can report a context loss. If the developer has asked for something unreasonable that's an error we should throw.

    In general we should not `CHECK` on OS return values. If we want to collect crash reports to diagnose unexpected errors we can use `base::debug::DumpWithoutCrashing()`.

    Mingming1 Xu

    @rafael....@microsoft.com should I handle all the DXGI, D3D11, D3D12 errors just like [Dawn code](https://source.chromium.org/chromium/chromium/src/+/main:third_party/dawn/src/dawn/native/d3d/D3DError.cpp) as context lost errors? Currently I only handle the `E_OUTOFMEMORY`, `DXGI_ERROR_DEVICE_REMOVED`, `DXGI_ERROR_DEVICE_HUNG` and `DXGI_ERROR_DEVICE_RESET`.

    File services/webnn/dml/utils.cc
    Line 369, Patchset 4:void HandleWebNNContextLost(HRESULT hr, const WebNNContextImpl* context) {
    Bryan Bernhart . resolved

    Shouldn't this be off the `WebNNContextImpl`?

    Mingming1 Xu

    Done

    Line 374, Patchset 6: std::string message;
    Reilly Grant . resolved

    nit: use `std::string_view` and a `switch` statement.

    Mingming1 Xu

    Done

    Line 377, Patchset 4: } else if (hr == DXGI_ERROR_DEVICE_REMOVED) {
    Bryan Bernhart . unresolved

    It would be helpful to LOG the actual reason too via GetDeviceRemovedReason().

    Mingming1 Xu

    choice 1: add `LOG(ERROR) << logging::SystemErrorCodeToString(hr);` for each case.
    choice 2: call `ID3D12Device::GetDeviceRemovedReason` and LOG the result of it. But I am not sure which case indicates device removed, is it only the `DXGI_ERROR_DEVICE_REMOVED`?
    choice 3: both 1 and 2

    Any ideas? @rafael....@microsoft.com

    File services/webnn/public/mojom/webnn_context_lost_capturer.mojom
    Line 11, Patchset 4:interface WebNNContextLostCapturer {
    Reilly Grant . resolved

    Can we generalize this to `WebNNContextClient` to match the style of other Mojo interfaces where we have a `Foo` interface and a `FooClient` interface for any async events that need to be sent back to the consumer of the `Foo` interface?

    Mingming1 Xu

    Good idea! Now I delete this file to move the `WebNNContextClient` to `webnn_context_provider.mojom`, PTAL.

    Mingming1 Xu

    Done

    File services/webnn/public/mojom/webnn_context_provider.mojom
    Line 113, Patchset 6: // Called by the GPU process to create a message pipe for

    // `WebNNContextClient`, the `message` is the error message
    // sent to the renderer process, the renderer process will
    // override this method to handle the lost error.
    Reilly Grant . resolved
    ```suggestion
    // Called to explain why the `WebNNContextClient` is no longer available.
    ```
    Mingming1 Xu

    Done

    Line 117, Patchset 6: OnLostCaptured(string message);
    Reilly Grant . resolved

    To match WebGPU and better express that the context is unusable after this.

    ```suggestion
    OnDestroyed(string message);
    ```
    Mingming1 Xu

    Done

    File services/webnn/webnn_context_impl.cc
    Line 39, Patchset 4: context_provider_->OnConnectionError(this);
    Bryan Bernhart . unresolved

    If the context gets destroyed, isn't that considered "lost"?

    Reilly Grant

    If this method is being called then the renderer isn't listening anymore.

    Bryan Bernhart

    I suppose the renderer or `MLContext` could generate "lost" event beforehand. Not exactly certain why this was needed in WebGPU.

    Mingming1 Xu

    Yes, I think this method will only be called when the renderer destroys the WebNNContext pipe.

    Bryan Bernhart

    Dug into this more. Looks like WebGPU considers destroyed as "lost" to communicate the context/device shouldn't be kept alive, by the web developer, if some transient (but still fatal) failure occurred. A transient error could be OOM, for example.

    Mingming1 Xu

    Can we resolve this discussion? @bryan.b...@intel.com

    Line 95, Patchset 6: client_remote_->OnLostCaptured(message);
    Reilly Grant . resolved

    After sending this message we should call `context_provider_->OnConnectionError(this)` so that the context provider destroys this object.

    Mingming1 Xu

    Yes, I intended to trigger the `WebNNContextImpl::OnConnectionError` by the `context_remote_.reset();` in blink side. I am OK to explicitly call it here.

    File services/webnn/webnn_context_provider_impl.cc
    Line 14, Patchset 6:#include "services/webnn/public/mojom/webnn_context_lost_capturer.mojom.h"
    Reilly Grant . resolved

    This file no longer exists.

    Mingming1 Xu

    Done

    File services/webnn/webnn_graph_impl_unittest.cc
    Line 108, Patchset 4: FakeWebNNContextImpl(mojo::PendingReceiver<mojom::WebNNContext> receiver,
    Bryan Bernhart . unresolved

    Looks like we have no test cases which generate a context lost (and ensure delivery).

    What's our plan there?

    Mingming1 Xu

    Yes, agree, I should add tests for that, will do in following patchset.

    File third_party/blink/renderer/modules/ml/ml_context.h
    Line 154, Patchset 4: // Reset the remote of `WebNNContext` and the receiver of

    // `WebNNContextLostCapturer`, and then resolve the `lost_property_` which
    // indicates that the WebNN context is lost. It will be called in two cases:
    // 1. The `WebNNContext` remote is cut off from its receiver, for example,
    // when the WebNN Service is crashed.
    // 2. We need to release the WebNNContext in the WebNN Service when the the
    // WebNN Service captured a WebNN context lost error.
    Reilly Grant . resolved
    ```suggestion
    // Closes the `remote_context_` and `context_lost_capturer_receiver_` pipes
    // because the context has been lost.
    ```
    Mingming1 Xu

    Done. I move these comments above the `OnWebNNContextLostCaptured` because now `OnWebNNContextDisconnected` will call it. Also rename them as `OnLostCaptured` and `OnDisconnected` to simplify.

    Mingming1 Xu

    Done

    Open in Gerrit

    Related details

    Attention is currently required from:
    • Austin Sullivan
    • Bryan Bernhart
    • Rafael Cintron
    • Reilly Grant
    • ningxin hu
    Submit Requirements:
    • requirement satisfiedCode-Coverage
    • requirement is not satisfiedCode-Review
    • requirement is not satisfiedNo-Unresolved-Comments
    Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. DiffyGerrit
    Gerrit-MessageType: comment
    Gerrit-Project: chromium/src
    Gerrit-Branch: main
    Gerrit-Change-Id: I16775ac8cdbd508e04025c22740d13f01e591b0d
    Gerrit-Change-Number: 5602134
    Gerrit-PatchSet: 7
    Gerrit-Owner: Mingming1 Xu <mingmi...@intel.com>
    Gerrit-Reviewer: Austin Sullivan <asu...@chromium.org>
    Gerrit-Reviewer: Rafael Cintron <rafael....@microsoft.com>
    Gerrit-Reviewer: ningxin hu <ningx...@intel.com>
    Gerrit-CC: Bryan Bernhart <bryan.b...@intel.com>
    Gerrit-CC: Jiewei Qian <q...@chromium.org>
    Gerrit-CC: Kentaro Hara <har...@chromium.org>
    Gerrit-CC: Reilly Grant <rei...@chromium.org>
    Gerrit-Attention: Austin Sullivan <asu...@chromium.org>
    Gerrit-Attention: Reilly Grant <rei...@chromium.org>
    Gerrit-Attention: ningxin hu <ningx...@intel.com>
    Gerrit-Attention: Bryan Bernhart <bryan.b...@intel.com>
    Gerrit-Attention: Rafael Cintron <rafael....@microsoft.com>
    Gerrit-Comment-Date: Mon, 17 Jun 2024 09:02:09 +0000
    Gerrit-HasComments: Yes
    Gerrit-Has-Labels: No
    Comment-In-Reply-To: Reilly Grant <rei...@chromium.org>
    Comment-In-Reply-To: Mingming1 Xu <mingmi...@intel.com>
    Comment-In-Reply-To: Bryan Bernhart <bryan.b...@intel.com>
    Comment-In-Reply-To: Rafael Cintron <rafael....@microsoft.com>
    satisfied_requirement
    unsatisfied_requirement
    open
    diffy

    Reilly Grant (Gerrit)

    unread,
    Jun 17, 2024, 5:51:23 PM (12 days ago) Jun 17
    to Mingming1 Xu, Reilly Grant, AyeAye, Rafael Cintron, Austin Sullivan, Bryan Bernhart, ningxin hu, Tricium, chromium...@chromium.org, Kentaro Hara, Jiewei Qian, mac-r...@chromium.org, blink-revie...@chromium.org, blink-...@chromium.org, blundell+...@chromium.org, ipc-securi...@chromium.org, jmedle...@chromium.org
    Attention needed from Austin Sullivan, Bryan Bernhart, Mingming1 Xu, Rafael Cintron and ningxin hu

    Reilly Grant voted and added 4 comments

    Votes added by Reilly Grant

    Code-Review+1

    4 comments

    Patchset-level comments
    File-level comment, Patchset 8 (Latest):
    Reilly Grant . resolved

    LGTM with nits.

    I think I'm satisfied that the situations within the DirectML backend where we treat an error as context loss represent errors which are not the developer's responsibility and are instead situations in which the system has gotten in the an unrecoverable state (e.g. the GPU process is nearly at its commit limit) and so the only reasonable course of action is to clean everything up and start again.

    We will have to see how this works out in real-world usage.

    File services/webnn/webnn_graph_impl.h
    Line 69, Patchset 8 (Latest): raw_ptr<WebNNContextImpl> context_;
    Reilly Grant . unresolved

    Why this change?

    File third_party/blink/renderer/modules/ml/ml_context.cc
    Line 183, Patchset 8 (Latest): "Invalid script state.")));
    Reilly Grant . unresolved
    ```suggestion
    "Context is lost.")));
    ```
    Line 248, Patchset 8 (Latest): // Remote context gets automatically unbound when the execution context
    // destructs.
    if (!context_remote_.is_bound()) {
    return;
    }
    Reilly Grant . unresolved

    This check should move into `MLContext::createBuffer()` and throw a "Context is lost." error. We can then `CHECK()` here that `context_remote_` is bound.

    Open in Gerrit

    Related details

    Attention is currently required from:
    • Austin Sullivan
    • Bryan Bernhart
    • Mingming1 Xu
    • Rafael Cintron
    • ningxin hu
    Submit Requirements:
    • requirement satisfiedCode-Coverage
    • requirement is not satisfiedCode-Review
    • requirement is not satisfiedNo-Unresolved-Comments
    Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. DiffyGerrit
    Gerrit-MessageType: comment
    Gerrit-Project: chromium/src
    Gerrit-Branch: main
    Gerrit-Change-Id: I16775ac8cdbd508e04025c22740d13f01e591b0d
    Gerrit-Change-Number: 5602134
    Gerrit-PatchSet: 8
    Gerrit-Owner: Mingming1 Xu <mingmi...@intel.com>
    Gerrit-Reviewer: Austin Sullivan <asu...@chromium.org>
    Gerrit-Reviewer: Rafael Cintron <rafael....@microsoft.com>
    Gerrit-Reviewer: Reilly Grant <rei...@chromium.org>
    Gerrit-Reviewer: ningxin hu <ningx...@intel.com>
    Gerrit-CC: Bryan Bernhart <bryan.b...@intel.com>
    Gerrit-CC: Jiewei Qian <q...@chromium.org>
    Gerrit-CC: Kentaro Hara <har...@chromium.org>
    Gerrit-Attention: Austin Sullivan <asu...@chromium.org>
    Gerrit-Attention: ningxin hu <ningx...@intel.com>
    Gerrit-Attention: Mingming1 Xu <mingmi...@intel.com>
    Gerrit-Attention: Bryan Bernhart <bryan.b...@intel.com>
    Gerrit-Attention: Rafael Cintron <rafael....@microsoft.com>
    Gerrit-Comment-Date: Mon, 17 Jun 2024 21:51:07 +0000
    Gerrit-HasComments: Yes
    Gerrit-Has-Labels: Yes
    satisfied_requirement
    unsatisfied_requirement
    open
    diffy

    Rafael Cintron (Gerrit)

    unread,
    Jun 17, 2024, 6:45:46 PM (12 days ago) Jun 17
    to Mingming1 Xu, Reilly Grant, AyeAye, Austin Sullivan, Bryan Bernhart, ningxin hu, Tricium, chromium...@chromium.org, Kentaro Hara, Jiewei Qian, mac-r...@chromium.org, blink-revie...@chromium.org, blink-...@chromium.org, blundell+...@chromium.org, ipc-securi...@chromium.org, jmedle...@chromium.org
    Attention needed from Austin Sullivan, Bryan Bernhart, Mingming1 Xu and ningxin hu

    Rafael Cintron added 4 comments

    File services/webnn/dml/graph_impl_dml.cc
    Line 4610, Patchset 8 (Latest): DLOG(ERROR) << error_message << " " << logging::SystemErrorCodeToString(hr);
    Rafael Cintron . unresolved

    I believe that by the time we get here, we should have already `LOG`ed all of the errors encountered. If so, we may not need this particular `DLOG`.

    But, I'm OK with leaving it in and converting to `LOG`. We should prepend the incoming error with a "[WebNN Graph Build Error]" string so people looking at the log know where the error came from.

    OK to be a separate change and also fixup the other "HandleX" methods that follow a similar pattern.

    Line 4619, Patchset 6: HandleWebNNContextLost(hr, context);
    Reilly Grant . unresolved

    @rafael....@microsoft.com, can you confirm that all graph creation failures should be considered "context lost" scenarios?

    Rafael Cintron

    I'm glad we're having this conversation. :-)

    Here are the three errors Chromium should handle from DirectML and Direct3D:

    1- `DXGI_ERROR_DEVICE_REMOVED`
    "Device removed" can happen for reasons Chromium is not able to control such as a driver update, other application on the system behaving poorly or the adapter actually being physically removed.

    The latter can happen if you're performing computation on the Surface Book's NVidia GPU (which is under the keyboard) and the user detaches the tablet portion from the keyboard. The Intel GPU, which is behind the monitor, is the only one left you can use.

    Since removed devices are basically useless and the D3D12 device is a singleton per process, and per adapter, it is difficult for WebNN, or other individual Chromium system to recover from this on its own. Hence, other code in the GPU process detects the device has become removed (at least on the D3D11 device) and terminates the GPU process with a special error code. The browser process restarts it.

    Separately from this change, we should work with the people who own that code in Chromium to make WebNN be a part of it somehow.

    For this case, #1, the renderer process will see that the GPU process has been disconnected and we should resolve the "lost" promise.

    2- `E_OUTOFMEMORY`
    It is possible to handle `E_OUTOFMEMORY` without everything becoming hosed. _However_, when the GPU process reaches the job commit limit, the browser process is notified by the OS. Upon the browser process being notified, it terminates and restarts the GPU process. So, you have a race between the GPU process handling this gracefully and the browser process terminating it.

    I think we should handle this case gracefully. In practice, the web developer will see the GPU process connection get terminated, same as the "device removed".

    3- <Everything else>
    For any other error, we should treat it as Chromium passing invalid parameters to the API and do a `CHECK` so that we can get crash dumps and diagnose.

    Here is the [Dawn code](https://source.chromium.org/chromium/chromium/src/+/main:third_party/dawn/src/dawn/native/d3d/D3DError.cpp) which handles errors similar to what I described above.


    We should `LOG(ERROR)` any failure we receive from DirectX along with the `logging::SystemErrorCodeToString` friendly string, even ones we CHECK for. This will become critical to root causing failures in the wild from web developers. Pooja has already converted these but we should convert others that might have been missed.

    Reilly Grant

    My concern is mainly that context loss is not an error handling mechanism. If the system is broken in a way that is out of the developer's control then we can report a context loss. If the developer has asked for something unreasonable that's an error we should throw.

    In general we should not `CHECK` on OS return values. If we want to collect crash reports to diagnose unexpected errors we can use `base::debug::DumpWithoutCrashing()`.

    Mingming1 Xu

    @rafael....@microsoft.com should I handle all the DXGI, D3D11, D3D12 errors just like [Dawn code](https://source.chromium.org/chromium/chromium/src/+/main:third_party/dawn/src/dawn/native/d3d/D3DError.cpp) as context lost errors? Currently I only handle the `E_OUTOFMEMORY`, `DXGI_ERROR_DEVICE_REMOVED`, `DXGI_ERROR_DEVICE_HUNG` and `DXGI_ERROR_DEVICE_RESET`.

    Rafael Cintron

    In general we should not CHECK on OS return values. If we want to collect crash reports to diagnose unexpected errors we can use base::debug::DumpWithoutCrashing().

    I disagree with this thinking, in general. There are many instances in Chromium today where we CHECK that OS APIs return success values or, in limited cases, known errors. The alternative is to gracefully handle (and, of course, have tests for) all of the combinations of failures you can get.

    In rendering code, an incorrectly handled error case in a stateful API can result in very difficult to debug "my browser window is black" bugs. Correlating these bug reports with `DumpWithoutCrashing` metrics is not straightforward. Aside from being willing to upload a full memory dump of the browser process to a Chromium engineer to investigate, there's not much an end user can do to help themselves (and you) root cause because, well, their browser is black.

    We do need to have provisions for gracefully handling things when Chromium is not at fault. But, in my opinion, errors like `E_INVALIDARG` should always cause a `CHECK`. There is a general trend in Chromium to fail fast in more instances where previously, we would have limped along.

    My concern is mainly that context loss is not an error handling mechanism. If the system is broken in a way that is out of the developer's control then we can report a context loss. If the developer has asked for something unreasonable that's an error we should throw.

    I share this concern and I like how you and others have championed synchronous errors getting returned to web developers at graph building time rather than compilation or inferencing time. WebGPU/WebGL do not quite have this luxury and, in the case of WebGPU, have an "ErrorScope" mechanism to help you make sense of errors asynchronously.

    For WebNN, the only case for "the developer has asked for something unreasonable" that comes to mind is `MLBuffer` creation. We can continue to handle this with "content lost" (similar to WebGL/WebGPU) or we can go further and make buffer creation be asynchronous. I go back and forth on it myself. Might be best to tackle asynchronous buffer creation as a separate discussion thread.

     rafael....@microsoft.com should I handle all the DXGI, D3D11, D3D12 errors just like Dawn code as context lost errors? Currently I only handle the E_OUTOFMEMORY, DXGI_ERROR_DEVICE_REMOVED, DXGI_ERROR_DEVICE_HUNG and DXGI_ERROR_DEVICE_RESET.

    `DEVICE_HUNG` is the application's fault. `DEVICE_RESET` is usually not.

    With that in mind, let's have the final list of errors we handle gracefully be `E_OUTOFMEMORY`, `DXGI_ERROR_DEVICE_REMOVED` and `DXGI_ERROR_DEVICE_RESET`. All others, I think we should CHECK.

    File services/webnn/dml/utils.cc
    Line 370, Patchset 8 (Latest):void HandleWebNNContextLost(HRESULT hr, ContextImplDml* context) {
    Rafael Cintron . unresolved

    The "HandleXFailure" methods which call `HandleWebNNContextLost` have code in common.

    To better unify them, I think we should structure `HandleWebNNContextLost` as follows:
    1-If the `hr` is `E_OUTOFMEMORY`, set the web developer friendly error message to "out of memory", otherwise, set it to "device removed." Having an "internal error" for the non-device removed ones may give an attacker information about whether the error is exploitable but I don't feel strongly about this.
    2-Call the `OnDestroyed` callback.
    3-`LOG(ERROR) << "[WebNN]: " << logging::SystemErrorCodeToString(hr)`.
    4-`CHECK` that the error is either `E_OUTOFMEMORY`, `DXGI_ERROR_DEVICE_REMOVED`, or `DXGI_ERROR_DEVICE_RESET`.

    We could also have this function run the callback if we templatize it on the callback type and error type. If that's too complicated, we can have it return the developer friendly string to the caller and have the caller use it to run the callback.

    WDYT?

    Line 371, Patchset 8 (Latest): if (!context) {
    return;
    }
    Rafael Cintron . unresolved

    `context` is always non-nullptr so we should be able to remove this if statement.

    Open in Gerrit

    Related details

    Attention is currently required from:
    • Austin Sullivan
    • Bryan Bernhart
    • Mingming1 Xu
    • ningxin hu
    Gerrit-Comment-Date: Mon, 17 Jun 2024 22:45:26 +0000
    Gerrit-HasComments: Yes
    Gerrit-Has-Labels: No
    Comment-In-Reply-To: Reilly Grant <rei...@chromium.org>
    Comment-In-Reply-To: Mingming1 Xu <mingmi...@intel.com>
    Comment-In-Reply-To: Rafael Cintron <rafael....@microsoft.com>
    satisfied_requirement
    unsatisfied_requirement
    open
    diffy

    Reilly Grant (Gerrit)

    unread,
    Jun 17, 2024, 10:04:57 PM (12 days ago) Jun 17
    to Mingming1 Xu, Reilly Grant, AyeAye, Rafael Cintron, Austin Sullivan, Bryan Bernhart, ningxin hu, Tricium, chromium...@chromium.org, Kentaro Hara, Jiewei Qian, mac-r...@chromium.org, blink-revie...@chromium.org, blink-...@chromium.org, blundell+...@chromium.org, ipc-securi...@chromium.org, jmedle...@chromium.org
    Attention needed from Austin Sullivan, Bryan Bernhart, Mingming1 Xu, Rafael Cintron and ningxin hu

    Reilly Grant added 1 comment

    File services/webnn/dml/graph_impl_dml.cc
    Reilly Grant

    I'm trying to reconcile this with [our official guidance](https://chromium.googlesource.com/chromium/src/+/main/styleguide/c++/checks.md) because I generally agree that some interactions with OS APIs can be considered invariants, like `E_INVALIDARG` errors, if we're confident that the API isn't going to change out from underneath us.

    Open in Gerrit

    Related details

    Attention is currently required from:
    • Austin Sullivan
    • Bryan Bernhart
    • Mingming1 Xu
    • Rafael Cintron
    • ningxin hu
    Gerrit-Attention: Rafael Cintron <rafael....@microsoft.com>
    Gerrit-Comment-Date: Tue, 18 Jun 2024 02:04:39 +0000
    satisfied_requirement
    unsatisfied_requirement
    open
    diffy

    Rafael Cintron (Gerrit)

    unread,
    Jun 17, 2024, 10:20:43 PM (12 days ago) Jun 17
    to Mingming1 Xu, Reilly Grant, AyeAye, Austin Sullivan, Bryan Bernhart, ningxin hu, Tricium, chromium...@chromium.org, Kentaro Hara, Jiewei Qian, mac-r...@chromium.org, blink-revie...@chromium.org, blink-...@chromium.org, blundell+...@chromium.org, ipc-securi...@chromium.org, jmedle...@chromium.org
    Attention needed from Austin Sullivan, Bryan Bernhart, Mingming1 Xu, Reilly Grant and ningxin hu

    Rafael Cintron added 1 comment

    File services/webnn/dml/graph_impl_dml.cc
    Rafael Cintron

    @rei...@chromium.org, which part of the official guidance were you specifically referring to?

    Open in Gerrit

    Related details

    Attention is currently required from:
    • Austin Sullivan
    • Bryan Bernhart
    • Mingming1 Xu
    • Reilly Grant
    • ningxin hu
    Gerrit-Attention: Reilly Grant <rei...@chromium.org>
    Gerrit-Attention: ningxin hu <ningx...@intel.com>
    Gerrit-Attention: Mingming1 Xu <mingmi...@intel.com>
    Gerrit-Attention: Bryan Bernhart <bryan.b...@intel.com>
    Gerrit-Comment-Date: Tue, 18 Jun 2024 02:20:24 +0000
    satisfied_requirement
    unsatisfied_requirement
    open
    diffy

    Mingming1 Xu (Gerrit)

    unread,
    Jun 18, 2024, 5:18:45 AM (11 days ago) Jun 18
    to Reilly Grant, AyeAye, Rafael Cintron, Austin Sullivan, Bryan Bernhart, ningxin hu, Tricium, chromium...@chromium.org, Kentaro Hara, Jiewei Qian, mac-r...@chromium.org, blink-revie...@chromium.org, blink-...@chromium.org, blundell+...@chromium.org, ipc-securi...@chromium.org, jmedle...@chromium.org
    Attention needed from Austin Sullivan, Bryan Bernhart, Rafael Cintron, Reilly Grant and ningxin hu

    Mingming1 Xu added 9 comments

    File services/webnn/dml/context_impl_dml.cc
    Line 225, Patchset 4: // TODO(crbug.com/41492165): generate error using context.
    Bryan Bernhart . unresolved

    Can we handle context lost here as well?

    Reilly Grant

    Are we assuming that any failure to create a buffer or graph means the context has been lost? That seems strange. I expect that context loss is a signal we specifically get from the adapter.

    Bryan Bernhart

    Are we assuming that any failure to create a buffer or graph means the context has been lost?

    We could, like WebGPU. If we do not then the OOM skews to some, possibly a unrelated call, which does handle context lost which is hard to debug.

    Reilly Grant

    I don't think we should be reporting a context loss for errors like this. That's a separate concept.

    Bryan Bernhart

    If this returns E_OUTOFMEMORY, the context will become inoperable and effectively lost. Also, the HR could be any other error result (but unlikely) so I can see it being quite useful to coverage to one error handler.

    Mingming1 Xu

    This discussion should be merged into above one https://chromium-review.googlesource.com/c/chromium/src/+/5602134/comment/ad47c8af_d3f27f05/: whether we should handle the context lost according to the system error code `HRESULT`.

    Mingming1 Xu

    I am thinking maybe all the methods that return `HRESULT` inside or outside should be handled by `HandleWebNNContextLost`, if the `HRESULT` is context lost error, we should call `OnDestoryed`. WDYT? @rafael....@microsoft.com

    File services/webnn/dml/graph_impl_dml.cc
    Line 4610, Patchset 8: DLOG(ERROR) << error_message << " " << logging::SystemErrorCodeToString(hr);
    Rafael Cintron . resolved

    I believe that by the time we get here, we should have already `LOG`ed all of the errors encountered. If so, we may not need this particular `DLOG`.

    But, I'm OK with leaving it in and converting to `LOG`. We should prepend the incoming error with a "[WebNN Graph Build Error]" string so people looking at the log know where the error came from.

    OK to be a separate change and also fixup the other "HandleX" methods that follow a similar pattern.

    Mingming1 Xu
    File services/webnn/dml/utils.cc
    Line 370, Patchset 8:void HandleWebNNContextLost(HRESULT hr, ContextImplDml* context) {
    Rafael Cintron . unresolved

    The "HandleXFailure" methods which call `HandleWebNNContextLost` have code in common.

    To better unify them, I think we should structure `HandleWebNNContextLost` as follows:
    1-If the `hr` is `E_OUTOFMEMORY`, set the web developer friendly error message to "out of memory", otherwise, set it to "device removed." Having an "internal error" for the non-device removed ones may give an attacker information about whether the error is exploitable but I don't feel strongly about this.
    2-Call the `OnDestroyed` callback.
    3-`LOG(ERROR) << "[WebNN]: " << logging::SystemErrorCodeToString(hr)`.
    4-`CHECK` that the error is either `E_OUTOFMEMORY`, `DXGI_ERROR_DEVICE_REMOVED`, or `DXGI_ERROR_DEVICE_RESET`.

    We could also have this function run the callback if we templatize it on the callback type and error type. If that's too complicated, we can have it return the developer friendly string to the caller and have the caller use it to run the callback.

    WDYT?

    Mingming1 Xu

    I am not clear for the step 1,3,4, please help me understand.
    Currently I handled `E_OUTOFMEMORY`, `DXGI_ERROR_DEVICE_REMOVED`,`DXGI_ERROR_DEVICE_RESET` three ones to call `OnDestroyed`, otherwise I directly return and won't call `OnDestroyed`.

    We could also have this function run the callback if we templatize it on the callback type and error type. If that's too complicated, we can have it return the developer friendly string to the caller and have the caller use it to run the callback.

    The `OnDestroyed` method is called to send context lost message to renderer, I suppose we needn't LOG(ERROR) the context lost message again and let the caller run the callback (`CreateGraphCallback` and `ComputeCallback`)? As current code base, I only want to get the context lost message, and then call `OnDestroyed` with the message in `HandleWebNNContextLost`.

    Line 371, Patchset 8: if (!context) {
    return;
    }
    Rafael Cintron . resolved

    `context` is always non-nullptr so we should be able to remove this if statement.

    Mingming1 Xu

    Done

    Line 377, Patchset 4: } else if (hr == DXGI_ERROR_DEVICE_REMOVED) {
    Bryan Bernhart . unresolved

    It would be helpful to LOG the actual reason too via GetDeviceRemovedReason().

    Mingming1 Xu

    choice 1: add `LOG(ERROR) << logging::SystemErrorCodeToString(hr);` for each case.
    choice 2: call `ID3D12Device::GetDeviceRemovedReason` and LOG the result of it. But I am not sure which case indicates device removed, is it only the `DXGI_ERROR_DEVICE_REMOVED`?
    choice 3: both 1 and 2

    Any ideas? @rafael....@microsoft.com

    Mingming1 Xu

    Think again, we may needn't call `LOG(ERROR) << logging::SystemErrorCodeToString(hr)` again here, it can be called outside, I want to leave this method only responsible for handling specific context lost error message which will be sent by `OnDestroyed` to renderer.

    File services/webnn/webnn_graph_impl.h
    Line 69, Patchset 8: raw_ptr<WebNNContextImpl> context_;
    Reilly Grant . resolved

    Why this change?

    Mingming1 Xu

    Done

    File third_party/blink/renderer/modules/ml/ml_context.cc
    Line 183, Patchset 8: "Invalid script state.")));
    Reilly Grant . resolved
    ```suggestion
    "Context is lost.")));
    ```
    Mingming1 Xu

    Done

    Line 248, Patchset 8: // Remote context gets automatically unbound when the execution context

    // destructs.
    if (!context_remote_.is_bound()) {
    return;
    }
    Reilly Grant . resolved

    This check should move into `MLContext::createBuffer()` and throw a "Context is lost." error. We can then `CHECK()` here that `context_remote_` is bound.

    Mingming1 Xu

    Done

    File third_party/blink/renderer/modules/ml/ml_context_lost_info.idl
    Line 11, Patchset 4:] interface MLContextLostInfo {
    Reilly Grant . resolved

    Use a dictionary instead of an interface since this type doesn't have any methods. I know there are a number of examples where this sort of type is defined as an interface but most of the ones I'm responsible for (such as the WebUSB transfer result types) I wish I'd defined as dictionaries instead.

    Mingming1 Xu

    Done. I deleted this file and remove the dictionary definition to `ml_context.idl`, PTAL.

    Mingming1 Xu

    Done

    Open in Gerrit

    Related details

    Attention is currently required from:
    • Austin Sullivan
    • Bryan Bernhart
    • Rafael Cintron
    • Reilly Grant
    • ningxin hu
    Submit Requirements:
    • requirement satisfiedCode-Coverage
    • requirement is not satisfiedCode-Review
    • requirement is not satisfiedNo-Unresolved-Comments
    Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. DiffyGerrit
    Gerrit-MessageType: comment
    Gerrit-Project: chromium/src
    Gerrit-Branch: main
    Gerrit-Change-Id: I16775ac8cdbd508e04025c22740d13f01e591b0d
    Gerrit-Change-Number: 5602134
    Gerrit-PatchSet: 9
    Gerrit-Owner: Mingming1 Xu <mingmi...@intel.com>
    Gerrit-Reviewer: Austin Sullivan <asu...@chromium.org>
    Gerrit-Reviewer: Rafael Cintron <rafael....@microsoft.com>
    Gerrit-Reviewer: Reilly Grant <rei...@chromium.org>
    Gerrit-Reviewer: ningxin hu <ningx...@intel.com>
    Gerrit-CC: Bryan Bernhart <bryan.b...@intel.com>
    Gerrit-CC: Jiewei Qian <q...@chromium.org>
    Gerrit-CC: Kentaro Hara <har...@chromium.org>
    Gerrit-Attention: Austin Sullivan <asu...@chromium.org>
    Gerrit-Attention: Reilly Grant <rei...@chromium.org>
    Gerrit-Attention: ningxin hu <ningx...@intel.com>
    Gerrit-Attention: Bryan Bernhart <bryan.b...@intel.com>
    Gerrit-Attention: Rafael Cintron <rafael....@microsoft.com>
    Gerrit-Comment-Date: Tue, 18 Jun 2024 09:18:35 +0000
    Gerrit-HasComments: Yes
    Gerrit-Has-Labels: No
    Comment-In-Reply-To: Reilly Grant <rei...@chromium.org>
    Comment-In-Reply-To: Mingming1 Xu <mingmi...@intel.com>
    satisfied_requirement
    unsatisfied_requirement
    open
    diffy

    udantumexico

    unread,
    Jun 18, 2024, 5:20:00 AM (11 days ago) Jun 18
    to Chromium-reviews, Mingming1 Xu (Gerrit), Reilly Grant, AyeAye, Rafael Cintron, Austin Sullivan, Bryan Bernhart, ningxin hu, Tricium, chromium...@chromium.org, Kentaro Hara, Jiewei Qian, mac-r...@chromium.org, blink-revie...@chromium.org, blink-...@chromium.org, blundell+...@chromium.org, ipc-securi...@chromium.org, jmedle...@chromium.org, change-...@chromium-review.googlesource.com

    Bryan Bernhart (Gerrit)

    unread,
    Jun 18, 2024, 6:54:14 PM (11 days ago) Jun 18
    to Mingming1 Xu, Reilly Grant, AyeAye, Rafael Cintron, Austin Sullivan, ningxin hu, Tricium, chromium...@chromium.org, Kentaro Hara, Jiewei Qian, mac-r...@chromium.org, blink-revie...@chromium.org, blink-...@chromium.org, blundell+...@chromium.org, ipc-securi...@chromium.org, jmedle...@chromium.org
    Attention needed from Austin Sullivan, Mingming1 Xu, Rafael Cintron, Reilly Grant and ningxin hu

    Bryan Bernhart added 1 comment

    File services/webnn/webnn_context_impl.cc
    Line 39, Patchset 4: context_provider_->OnConnectionError(this);
    Bryan Bernhart . resolved

    If the context gets destroyed, isn't that considered "lost"?

    Reilly Grant

    If this method is being called then the renderer isn't listening anymore.

    Bryan Bernhart

    I suppose the renderer or `MLContext` could generate "lost" event beforehand. Not exactly certain why this was needed in WebGPU.

    Mingming1 Xu

    Yes, I think this method will only be called when the renderer destroys the WebNNContext pipe.

    Bryan Bernhart

    Dug into this more. Looks like WebGPU considers destroyed as "lost" to communicate the context/device shouldn't be kept alive, by the web developer, if some transient (but still fatal) failure occurred. A transient error could be OOM, for example.

    Mingming1 Xu

    Can we resolve this discussion? @bryan.b...@intel.com

    Bryan Bernhart

    @mingmi...@intel.com Yes, thanks for following up. Thinking some more, I don't think we need to address this until we need `MLContext.destroy()`.

    Open in Gerrit

    Related details

    Attention is currently required from:
    • Austin Sullivan
    • Mingming1 Xu
    Gerrit-Attention: Mingming1 Xu <mingmi...@intel.com>
    Gerrit-Attention: Rafael Cintron <rafael....@microsoft.com>
    Gerrit-Comment-Date: Tue, 18 Jun 2024 22:54:03 +0000
    satisfied_requirement
    unsatisfied_requirement
    open
    diffy

    Bryan Bernhart (Gerrit)

    unread,
    Jun 18, 2024, 7:06:22 PM (11 days ago) Jun 18
    to Mingming1 Xu, Reilly Grant, AyeAye, Rafael Cintron, Austin Sullivan, ningxin hu, Tricium, chromium...@chromium.org, Kentaro Hara, Jiewei Qian, mac-r...@chromium.org, blink-revie...@chromium.org, blink-...@chromium.org, blundell+...@chromium.org, ipc-securi...@chromium.org, jmedle...@chromium.org
    Attention needed from Austin Sullivan, Mingming1 Xu, Rafael Cintron, Reilly Grant and ningxin hu

    Bryan Bernhart added 1 comment

    File services/webnn/dml/utils.cc
    Line 377, Patchset 4: } else if (hr == DXGI_ERROR_DEVICE_REMOVED) {
    Bryan Bernhart . unresolved

    It would be helpful to LOG the actual reason too via GetDeviceRemovedReason().

    Mingming1 Xu

    choice 1: add `LOG(ERROR) << logging::SystemErrorCodeToString(hr);` for each case.
    choice 2: call `ID3D12Device::GetDeviceRemovedReason` and LOG the result of it. But I am not sure which case indicates device removed, is it only the `DXGI_ERROR_DEVICE_REMOVED`?
    choice 3: both 1 and 2

    Any ideas? @rafael....@microsoft.com

    Mingming1 Xu

    Think again, we may needn't call `LOG(ERROR) << logging::SystemErrorCodeToString(hr)` again here, it can be called outside, I want to leave this method only responsible for handling specific context lost error message which will be sent by `OnDestroyed` to renderer.

    Bryan Bernhart

    Any HR that results in "lost" could leverage GetDeviceRemovedReason(). So at-least, HUNG, REMOVED, RESET, INTERNAL_ERROR, or NOT_CURRENTLY_AVAILABLE. We could also, perhaps more simply, always append the reason from any HR returned by it. This is what WebGPU does [1].

    [1] https://source.chromium.org/chromium/chromium/src/+/main:third_party/dawn/src/dawn/native/d3d12/DeviceD3D12.h;drc=eb9fd54dfff30384cac3f208cb1ddaf58afd639a;l=224

    Gerrit-Comment-Date: Tue, 18 Jun 2024 23:06:13 +0000
    Gerrit-HasComments: Yes
    Gerrit-Has-Labels: No
    satisfied_requirement
    unsatisfied_requirement
    open
    diffy

    Reilly Grant (Gerrit)

    unread,
    Jun 18, 2024, 7:12:46 PM (11 days ago) Jun 18
    to Mingming1 Xu, Reilly Grant, AyeAye, Rafael Cintron, Austin Sullivan, Bryan Bernhart, ningxin hu, Tricium, chromium...@chromium.org, Kentaro Hara, Jiewei Qian, mac-r...@chromium.org, blink-revie...@chromium.org, blink-...@chromium.org, blundell+...@chromium.org, ipc-securi...@chromium.org, jmedle...@chromium.org
    Attention needed from Austin Sullivan, Mingming1 Xu, Rafael Cintron and ningxin hu

    Reilly Grant added 1 comment

    File services/webnn/dml/graph_impl_dml.cc
    Reilly Grant

    _Bad/untrusted/changing driver, kernel API, hardware failure:_ A misbehaving GPU driver may cause us to be unable to proceed. This is not an invariant failure. On platforms where we are wary that a kernel API may change without sufficient prior notice we should not CHECK() its result as we expect the rug to be pulled from under our feet. In the case of hardware failure we should not for instance CHECK() that a write succeeded.

    Open in Gerrit

    Related details

    Attention is currently required from:
    • Austin Sullivan
    • Mingming1 Xu
    • Rafael Cintron
    • ningxin hu
    Gerrit-Attention: ningxin hu <ningx...@intel.com>
    Gerrit-Attention: Mingming1 Xu <mingmi...@intel.com>
    Gerrit-Attention: Rafael Cintron <rafael....@microsoft.com>
    Gerrit-Comment-Date: Tue, 18 Jun 2024 23:12:30 +0000
    Gerrit-HasComments: Yes
    Gerrit-Has-Labels: No
    Comment-In-Reply-To: Reilly Grant <rei...@chromium.org>
    Comment-In-Reply-To: Mingming1 Xu <mingmi...@intel.com>
    Comment-In-Reply-To: Rafael Cintron <rafael....@microsoft.com>
    satisfied_requirement
    unsatisfied_requirement
    open
    diffy

    Mingming1 Xu (Gerrit)

    unread,
    Jun 18, 2024, 9:28:47 PM (11 days ago) Jun 18
    to Reilly Grant, AyeAye, Rafael Cintron, Austin Sullivan, Bryan Bernhart, ningxin hu, Tricium, chromium...@chromium.org, Kentaro Hara, Jiewei Qian, mac-r...@chromium.org, blink-revie...@chromium.org, blink-...@chromium.org, blundell+...@chromium.org, ipc-securi...@chromium.org, jmedle...@chromium.org
    Attention needed from Austin Sullivan, Bryan Bernhart, Rafael Cintron and ningxin hu

    Mingming1 Xu added 1 comment

    File services/webnn/dml/utils.cc
    Line 377, Patchset 4: } else if (hr == DXGI_ERROR_DEVICE_REMOVED) {
    Bryan Bernhart . unresolved

    It would be helpful to LOG the actual reason too via GetDeviceRemovedReason().

    Mingming1 Xu

    choice 1: add `LOG(ERROR) << logging::SystemErrorCodeToString(hr);` for each case.
    choice 2: call `ID3D12Device::GetDeviceRemovedReason` and LOG the result of it. But I am not sure which case indicates device removed, is it only the `DXGI_ERROR_DEVICE_REMOVED`?
    choice 3: both 1 and 2

    Any ideas? @rafael....@microsoft.com

    Mingming1 Xu

    Think again, we may needn't call `LOG(ERROR) << logging::SystemErrorCodeToString(hr)` again here, it can be called outside, I want to leave this method only responsible for handling specific context lost error message which will be sent by `OnDestroyed` to renderer.

    Bryan Bernhart

    Any HR that results in "lost" could leverage GetDeviceRemovedReason(). So at-least, HUNG, REMOVED, RESET, INTERNAL_ERROR, or NOT_CURRENTLY_AVAILABLE. We could also, perhaps more simply, always append the reason from any HR returned by it. This is what WebGPU does [1].

    [1] https://source.chromium.org/chromium/chromium/src/+/main:third_party/dawn/src/dawn/native/d3d12/DeviceD3D12.h;drc=eb9fd54dfff30384cac3f208cb1ddaf58afd639a;l=224

    Mingming1 Xu

    So we don't need the `switch-case` here to get the contest lost message, we can directly call `ID3D12Device::GetDeviceRemovedReason` to get the message and send it to renderer by `OnDestroyed`. That sounds good!
    Two questions:
    1. if `ID3D12Device::GetDeviceRemovedReason` handled all the lost types including `E_OUTOFMEMORY`, `DXGI_ERROR_DEVICE_REMOVED` and `DXGI_ERROR_DEVICE_RESET` (@rafael....@microsoft.com mentioned)?
    2. I find there is another API called `IDMLDevice::GetDeviceRemovedReason`, what's the difference and should we use this?

    Open in Gerrit

    Related details

    Attention is currently required from:
    • Austin Sullivan
    • Bryan Bernhart
    • Rafael Cintron
    • ningxin hu
    Gerrit-Attention: Bryan Bernhart <bryan.b...@intel.com>
    Gerrit-Attention: Rafael Cintron <rafael....@microsoft.com>
    Gerrit-Comment-Date: Wed, 19 Jun 2024 01:28:36 +0000
    Gerrit-HasComments: Yes
    Gerrit-Has-Labels: No
    satisfied_requirement
    unsatisfied_requirement
    open
    diffy

    Mingming1 Xu (Gerrit)

    unread,
    Jun 19, 2024, 3:46:39 AM (11 days ago) Jun 19
    to Reilly Grant, AyeAye, Rafael Cintron, Austin Sullivan, Bryan Bernhart, ningxin hu, Tricium, chromium...@chromium.org, Kentaro Hara, Jiewei Qian, mac-r...@chromium.org, blink-revie...@chromium.org, blink-...@chromium.org, blundell+...@chromium.org, ipc-securi...@chromium.org, jmedle...@chromium.org
    Attention needed from Austin Sullivan, Bryan Bernhart, Rafael Cintron and ningxin hu

    Mingming1 Xu added 1 comment

    File services/webnn/webnn_graph_impl_unittest.cc
    Line 108, Patchset 4: FakeWebNNContextImpl(mojo::PendingReceiver<mojom::WebNNContext> receiver,
    Bryan Bernhart . unresolved

    Looks like we have no test cases which generate a context lost (and ensure delivery).

    What's our plan there?

    Mingming1 Xu

    Yes, agree, I should add tests for that, will do in following patchset.

    Mingming1 Xu

    Added in patchset 10, PTAL.

    Open in Gerrit

    Related details

    Attention is currently required from:
    • Austin Sullivan
    • Bryan Bernhart
    • Rafael Cintron
    • ningxin hu
    Submit Requirements:
    • requirement satisfiedCode-Coverage
    • requirement is not satisfiedCode-Review
    • requirement is not satisfiedNo-Unresolved-Comments
    Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. DiffyGerrit
    Gerrit-MessageType: comment
    Gerrit-Project: chromium/src
    Gerrit-Branch: main
    Gerrit-Change-Id: I16775ac8cdbd508e04025c22740d13f01e591b0d
    Gerrit-Change-Number: 5602134
    Gerrit-PatchSet: 10
    Gerrit-Owner: Mingming1 Xu <mingmi...@intel.com>
    Gerrit-Reviewer: Austin Sullivan <asu...@chromium.org>
    Gerrit-Reviewer: Rafael Cintron <rafael....@microsoft.com>
    Gerrit-Reviewer: Reilly Grant <rei...@chromium.org>
    Gerrit-Reviewer: ningxin hu <ningx...@intel.com>
    Gerrit-CC: Bryan Bernhart <bryan.b...@intel.com>
    Gerrit-CC: Jiewei Qian <q...@chromium.org>
    Gerrit-CC: Kentaro Hara <har...@chromium.org>
    Gerrit-Attention: Austin Sullivan <asu...@chromium.org>
    Gerrit-Attention: ningxin hu <ningx...@intel.com>
    Gerrit-Attention: Bryan Bernhart <bryan.b...@intel.com>
    Gerrit-Attention: Rafael Cintron <rafael....@microsoft.com>
    Gerrit-Comment-Date: Wed, 19 Jun 2024 07:46:29 +0000
    satisfied_requirement
    unsatisfied_requirement
    open
    diffy

    Rafael Cintron (Gerrit)

    unread,
    Jun 19, 2024, 7:02:01 PM (10 days ago) Jun 19
    to Mingming1 Xu, Reilly Grant, AyeAye, Austin Sullivan, Bryan Bernhart, ningxin hu, Tricium, chromium...@chromium.org, Kentaro Hara, Jiewei Qian, mac-r...@chromium.org, blink-revie...@chromium.org, blink-...@chromium.org, blundell+...@chromium.org, ipc-securi...@chromium.org, jmedle...@chromium.org
    Attention needed from Austin Sullivan, Bryan Bernhart, Mingming1 Xu, Reilly Grant and ningxin hu

    Rafael Cintron added 3 comments

    File services/webnn/dml/graph_impl_dml.cc
    Rafael Cintron

    @rei...@chromium.org, thank you for the quote.

    I agree with the guidance on not `CHECK`-ing for "hardware failure" or "changing driver". These map, roughly, to `DXGI_ERROR_DEVICE_REMOVED`/`DXGI_ERROR_DEVICE_RESET` and I think we should definitely handle that gracefully by `LOG(ERROR)`-ing and arranging for orderly GPU process shutdown like the rest of the code does.

    Windows does not really have the analogous of "... kernel API may change without sufficient prior notice...". Windows components tend to be stable in terms the input they receive and error out on. If anything, my experience has been they're overly generous about continuing to accept bad input that was accepted by a previous version of the API.

    File services/webnn/dml/utils.cc
    Line 370, Patchset 8:void HandleWebNNContextLost(HRESULT hr, ContextImplDml* context) {
    Rafael Cintron . unresolved

    The "HandleXFailure" methods which call `HandleWebNNContextLost` have code in common.

    To better unify them, I think we should structure `HandleWebNNContextLost` as follows:
    1-If the `hr` is `E_OUTOFMEMORY`, set the web developer friendly error message to "out of memory", otherwise, set it to "device removed." Having an "internal error" for the non-device removed ones may give an attacker information about whether the error is exploitable but I don't feel strongly about this.
    2-Call the `OnDestroyed` callback.
    3-`LOG(ERROR) << "[WebNN]: " << logging::SystemErrorCodeToString(hr)`.
    4-`CHECK` that the error is either `E_OUTOFMEMORY`, `DXGI_ERROR_DEVICE_REMOVED`, or `DXGI_ERROR_DEVICE_RESET`.

    We could also have this function run the callback if we templatize it on the callback type and error type. If that's too complicated, we can have it return the developer friendly string to the caller and have the caller use it to run the callback.

    WDYT?

    Mingming1 Xu

    I am not clear for the step 1,3,4, please help me understand.
    Currently I handled `E_OUTOFMEMORY`, `DXGI_ERROR_DEVICE_REMOVED`,`DXGI_ERROR_DEVICE_RESET` three ones to call `OnDestroyed`, otherwise I directly return and won't call `OnDestroyed`.

    We could also have this function run the callback if we templatize it on the callback type and error type. If that's too complicated, we can have it return the developer friendly string to the caller and have the caller use it to run the callback.

    The `OnDestroyed` method is called to send context lost message to renderer, I suppose we needn't LOG(ERROR) the context lost message again and let the caller run the callback (`CreateGraphCallback` and `ComputeCallback`)? As current code base, I only want to get the context lost message, and then call `OnDestroyed` with the message in `HandleWebNNContextLost`.

    Rafael Cintron

    Currently I handled E_OUTOFMEMORY, DXGI_ERROR_DEVICE_REMOVED, DXGI_ERROR_DEVICE_RESET three ones to call OnDestroyed, otherwise I directly return and won't call OnDestroyed.

    By the time we get here, we know we have a failure HRESULT, correct? If so, we should always call `OnDestroyed` so that the web developer can realize that no more WebNN will happen with the context. The difference is whether the error warrants a `CHECK` or graceful termination of the GPU process.

    The `OnDestroye`d method is called to send context lost message to renderer, I suppose we needn't `LOG(ERROR)` the context lost message again and let the caller run the callback (`CreateGraphCallback` and `ComputeCallback`)? As current code base, I only want to get the context lost message, and then call OnDestroyed with the message in HandleWebNNContextLost.

    The `LOG(ERROR)` gives the about:gpu log additional information about why the error occurred that we are not providing to the renderer process. Also, we can't count on the web developer reporting the error to the end user in a way we can easily diagnose. Having the error in the about:gpu log means we, the WebNN implementation, can control when and what we log, not the web developer.

    Line 377, Patchset 4: } else if (hr == DXGI_ERROR_DEVICE_REMOVED) {
    Bryan Bernhart . unresolved

    It would be helpful to LOG the actual reason too via GetDeviceRemovedReason().

    Mingming1 Xu

    choice 1: add `LOG(ERROR) << logging::SystemErrorCodeToString(hr);` for each case.
    choice 2: call `ID3D12Device::GetDeviceRemovedReason` and LOG the result of it. But I am not sure which case indicates device removed, is it only the `DXGI_ERROR_DEVICE_REMOVED`?
    choice 3: both 1 and 2

    Any ideas? @rafael....@microsoft.com

    Mingming1 Xu

    Think again, we may needn't call `LOG(ERROR) << logging::SystemErrorCodeToString(hr)` again here, it can be called outside, I want to leave this method only responsible for handling specific context lost error message which will be sent by `OnDestroyed` to renderer.

    Bryan Bernhart

    Any HR that results in "lost" could leverage GetDeviceRemovedReason(). So at-least, HUNG, REMOVED, RESET, INTERNAL_ERROR, or NOT_CURRENTLY_AVAILABLE. We could also, perhaps more simply, always append the reason from any HR returned by it. This is what WebGPU does [1].

    [1] https://source.chromium.org/chromium/chromium/src/+/main:third_party/dawn/src/dawn/native/d3d12/DeviceD3D12.h;drc=eb9fd54dfff30384cac3f208cb1ddaf58afd639a;l=224

    Mingming1 Xu

    So we don't need the `switch-case` here to get the contest lost message, we can directly call `ID3D12Device::GetDeviceRemovedReason` to get the message and send it to renderer by `OnDestroyed`. That sounds good!
    Two questions:
    1. if `ID3D12Device::GetDeviceRemovedReason` handled all the lost types including `E_OUTOFMEMORY`, `DXGI_ERROR_DEVICE_REMOVED` and `DXGI_ERROR_DEVICE_RESET` (@rafael....@microsoft.com mentioned)?
    2. I find there is another API called `IDMLDevice::GetDeviceRemovedReason`, what's the difference and should we use this?

    Rafael Cintron

    1) if `ID3D12Device::GetDeviceRemovedReason` handled all the lost types including E_OUTOFMEMORY, DXGI_ERROR_DEVICE_REMOVED and DXGI_ERROR_DEVICE_RESET (@rafael....@microsoft.com mentioned)?

    All DML entrypoints which return an HRESULT first call `ID3D12Device::GetDeviceRemovedReason` and return `DXGI_ERROR_DEVICE_REMOVED` on failure. To be expected since DML is heavily tied to D3D12 for command execution and resource allocation/management.

    2) I find there is another API called `IDMLDevice::GetDeviceRemovedReason` what's the difference and should we use this?

    In the current DML implementation, `IDMLDevice::GetDeviceRemovedReason` calls `ID3D12Device::GetDeviceRemovedReason`. So in today's implementation, DML cannot become removed when D3D12 is still unremoved or vice versa.

    I think we should check the return value of all HRESULT-returning DirectX methods and `LOG(ERROR)` right at the point of failure. This will help us better diagnose problems in the wild. We already largely do this with the macros in `services/webnn/dml/error.h`.

    I can't think of a good reason to also use the "GetDeviceRemovedReason" APIs, except maybe as a sanity check in special codepaths.

    Open in Gerrit

    Related details

    Attention is currently required from:
    • Austin Sullivan
    • Bryan Bernhart
    • Mingming1 Xu
    • Reilly Grant
    • ningxin hu
    Submit Requirements:
    • requirement satisfiedCode-Coverage
    • requirement is not satisfiedCode-Review
    • requirement is not satisfiedNo-Unresolved-Comments
    Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. DiffyGerrit
    Gerrit-MessageType: comment
    Gerrit-Project: chromium/src
    Gerrit-Branch: main
    Gerrit-Change-Id: I16775ac8cdbd508e04025c22740d13f01e591b0d
    Gerrit-Change-Number: 5602134
    Gerrit-PatchSet: 11
    Gerrit-Owner: Mingming1 Xu <mingmi...@intel.com>
    Gerrit-Reviewer: Austin Sullivan <asu...@chromium.org>
    Gerrit-Reviewer: Rafael Cintron <rafael....@microsoft.com>
    Gerrit-Reviewer: Reilly Grant <rei...@chromium.org>
    Gerrit-Reviewer: ningxin hu <ningx...@intel.com>
    Gerrit-CC: Bryan Bernhart <bryan.b...@intel.com>
    Gerrit-CC: Jiewei Qian <q...@chromium.org>
    Gerrit-CC: Kentaro Hara <har...@chromium.org>
    Gerrit-Attention: Austin Sullivan <asu...@chromium.org>
    Gerrit-Attention: Reilly Grant <rei...@chromium.org>
    Gerrit-Attention: ningxin hu <ningx...@intel.com>
    Gerrit-Attention: Mingming1 Xu <mingmi...@intel.com>
    Gerrit-Attention: Bryan Bernhart <bryan.b...@intel.com>
    Gerrit-Comment-Date: Wed, 19 Jun 2024 23:01:49 +0000
    Gerrit-HasComments: Yes
    Gerrit-Has-Labels: No
    Comment-In-Reply-To: Reilly Grant <rei...@chromium.org>
    Comment-In-Reply-To: Mingming1 Xu <mingmi...@intel.com>
    Comment-In-Reply-To: Bryan Bernhart <bryan.b...@intel.com>
    Comment-In-Reply-To: Rafael Cintron <rafael....@microsoft.com>
    satisfied_requirement
    unsatisfied_requirement
    open
    diffy

    Mingming1 Xu (Gerrit)

    unread,
    Jun 19, 2024, 10:37:04 PM (10 days ago) Jun 19
    to Reilly Grant, AyeAye, Rafael Cintron, Austin Sullivan, Bryan Bernhart, ningxin hu, Tricium, chromium...@chromium.org, Kentaro Hara, Jiewei Qian, mac-r...@chromium.org, blink-revie...@chromium.org, blink-...@chromium.org, blundell+...@chromium.org, ipc-securi...@chromium.org, jmedle...@chromium.org
    Attention needed from Austin Sullivan, Bryan Bernhart, Rafael Cintron, Reilly Grant and ningxin hu

    Mingming1 Xu added 3 comments

    File services/webnn/dml/context_impl_dml.cc
    Line 225, Patchset 4: // TODO(crbug.com/41492165): generate error using context.
    Bryan Bernhart . unresolved

    Can we handle context lost here as well?

    Reilly Grant

    Are we assuming that any failure to create a buffer or graph means the context has been lost? That seems strange. I expect that context loss is a signal we specifically get from the adapter.

    Bryan Bernhart

    Are we assuming that any failure to create a buffer or graph means the context has been lost?

    We could, like WebGPU. If we do not then the OOM skews to some, possibly a unrelated call, which does handle context lost which is hard to debug.

    Reilly Grant

    I don't think we should be reporting a context loss for errors like this. That's a separate concept.

    Bryan Bernhart

    If this returns E_OUTOFMEMORY, the context will become inoperable and effectively lost. Also, the HR could be any other error result (but unlikely) so I can see it being quite useful to coverage to one error handler.

    Mingming1 Xu

    This discussion should be merged into above one https://chromium-review.googlesource.com/c/chromium/src/+/5602134/comment/ad47c8af_d3f27f05/: whether we should handle the context lost according to the system error code `HRESULT`.

    Mingming1 Xu

    I am thinking maybe all the methods that return `HRESULT` inside or outside should be handled by `HandleWebNNContextLost`, if the `HRESULT` is context lost error, we should call `OnDestoryed`. WDYT? @rafael....@microsoft.com

    Mingming1 Xu

    Updated, PTAL.

    File services/webnn/dml/utils.cc
    Line 370, Patchset 8:void HandleWebNNContextLost(HRESULT hr, ContextImplDml* context) {
    Rafael Cintron . unresolved

    The "HandleXFailure" methods which call `HandleWebNNContextLost` have code in common.

    To better unify them, I think we should structure `HandleWebNNContextLost` as follows:
    1-If the `hr` is `E_OUTOFMEMORY`, set the web developer friendly error message to "out of memory", otherwise, set it to "device removed." Having an "internal error" for the non-device removed ones may give an attacker information about whether the error is exploitable but I don't feel strongly about this.
    2-Call the `OnDestroyed` callback.
    3-`LOG(ERROR) << "[WebNN]: " << logging::SystemErrorCodeToString(hr)`.
    4-`CHECK` that the error is either `E_OUTOFMEMORY`, `DXGI_ERROR_DEVICE_REMOVED`, or `DXGI_ERROR_DEVICE_RESET`.

    We could also have this function run the callback if we templatize it on the callback type and error type. If that's too complicated, we can have it return the developer friendly string to the caller and have the caller use it to run the callback.

    WDYT?

    Mingming1 Xu

    I am not clear for the step 1,3,4, please help me understand.
    Currently I handled `E_OUTOFMEMORY`, `DXGI_ERROR_DEVICE_REMOVED`,`DXGI_ERROR_DEVICE_RESET` three ones to call `OnDestroyed`, otherwise I directly return and won't call `OnDestroyed`.

    We could also have this function run the callback if we templatize it on the callback type and error type. If that's too complicated, we can have it return the developer friendly string to the caller and have the caller use it to run the callback.

    The `OnDestroyed` method is called to send context lost message to renderer, I suppose we needn't LOG(ERROR) the context lost message again and let the caller run the callback (`CreateGraphCallback` and `ComputeCallback`)? As current code base, I only want to get the context lost message, and then call `OnDestroyed` with the message in `HandleWebNNContextLost`.

    Rafael Cintron

    Currently I handled E_OUTOFMEMORY, DXGI_ERROR_DEVICE_REMOVED, DXGI_ERROR_DEVICE_RESET three ones to call OnDestroyed, otherwise I directly return and won't call OnDestroyed.

    By the time we get here, we know we have a failure HRESULT, correct? If so, we should always call `OnDestroyed` so that the web developer can realize that no more WebNN will happen with the context. The difference is whether the error warrants a `CHECK` or graceful termination of the GPU process.

    The `OnDestroye`d method is called to send context lost message to renderer, I suppose we needn't `LOG(ERROR)` the context lost message again and let the caller run the callback (`CreateGraphCallback` and `ComputeCallback`)? As current code base, I only want to get the context lost message, and then call OnDestroyed with the message in HandleWebNNContextLost.

    The `LOG(ERROR)` gives the about:gpu log additional information about why the error occurred that we are not providing to the renderer process. Also, we can't count on the web developer reporting the error to the end user in a way we can easily diagnose. Having the error in the about:gpu log means we, the WebNN implementation, can control when and what we log, not the web developer.

    Mingming1 Xu

    Thanks! Please see my updates.

    Mingming1 Xu

    I think we should check the return value of all HRESULT-returning DirectX methods and LOG(ERROR) right at the point of failure. This will help us better diagnose problems in the wild. We already largely do this with the macros in services/webnn/dml/error.h.

    Agree, sorry to be unclear, I just mean I don't want to LOG(ERROR) the `logging::SystemErrorCodeToString(hr)` twice. Now I only LOG it in `HandleWebNNContextLost` as you suggested.

    Open in Gerrit

    Related details

    Attention is currently required from:
    • Austin Sullivan
    • Bryan Bernhart
    • Rafael Cintron
    • Reilly Grant
    • ningxin hu
    Submit Requirements:
    • requirement satisfiedCode-Coverage
    • requirement is not satisfiedCode-Review
    • requirement is not satisfiedNo-Unresolved-Comments
    Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. DiffyGerrit
    Gerrit-MessageType: comment
    Gerrit-Project: chromium/src
    Gerrit-Branch: main
    Gerrit-Change-Id: I16775ac8cdbd508e04025c22740d13f01e591b0d
    Gerrit-Change-Number: 5602134
    Gerrit-PatchSet: 12
    Gerrit-Owner: Mingming1 Xu <mingmi...@intel.com>
    Gerrit-Reviewer: Austin Sullivan <asu...@chromium.org>
    Gerrit-Reviewer: Rafael Cintron <rafael....@microsoft.com>
    Gerrit-Reviewer: Reilly Grant <rei...@chromium.org>
    Gerrit-Reviewer: ningxin hu <ningx...@intel.com>
    Gerrit-CC: Bryan Bernhart <bryan.b...@intel.com>
    Gerrit-CC: Jiewei Qian <q...@chromium.org>
    Gerrit-CC: Kentaro Hara <har...@chromium.org>
    Gerrit-Attention: Austin Sullivan <asu...@chromium.org>
    Gerrit-Attention: Reilly Grant <rei...@chromium.org>
    Gerrit-Attention: ningxin hu <ningx...@intel.com>
    Gerrit-Attention: Bryan Bernhart <bryan.b...@intel.com>
    Gerrit-Attention: Rafael Cintron <rafael....@microsoft.com>
    Gerrit-Comment-Date: Thu, 20 Jun 2024 02:36:50 +0000
    satisfied_requirement
    unsatisfied_requirement
    open
    diffy

    Reilly Grant (Gerrit)

    unread,
    Jun 20, 2024, 4:08:03 PM (9 days ago) Jun 20
    to Mingming1 Xu, Reilly Grant, AyeAye, Rafael Cintron, Austin Sullivan, Bryan Bernhart, ningxin hu, Tricium, chromium...@chromium.org, Kentaro Hara, Jiewei Qian, mac-r...@chromium.org, blink-revie...@chromium.org, blink-...@chromium.org, blundell+...@chromium.org, ipc-securi...@chromium.org, jmedle...@chromium.org
    Attention needed from Austin Sullivan, Bryan Bernhart, Mingming1 Xu, Rafael Cintron and ningxin hu

    Reilly Grant added 1 comment

    File services/webnn/dml/graph_impl_dml.cc
    Reilly Grant

    For DirectML, the question is how transparently driver quirks are exposed to API consumers. I trust Windows components more than vendor-specific drivers.

    Open in Gerrit

    Related details

    Attention is currently required from:
    • Austin Sullivan
    • Bryan Bernhart
    • Mingming1 Xu
    • Rafael Cintron
    • ningxin hu
    Gerrit-Attention: ningxin hu <ningx...@intel.com>
    Gerrit-Attention: Mingming1 Xu <mingmi...@intel.com>
    Gerrit-Attention: Bryan Bernhart <bryan.b...@intel.com>
    Gerrit-Attention: Rafael Cintron <rafael....@microsoft.com>
    Gerrit-Comment-Date: Thu, 20 Jun 2024 20:07:54 +0000
    Gerrit-HasComments: Yes
    Gerrit-Has-Labels: No
    Comment-In-Reply-To: Reilly Grant <rei...@chromium.org>
    Comment-In-Reply-To: Mingming1 Xu <mingmi...@intel.com>
    Comment-In-Reply-To: Rafael Cintron <rafael....@microsoft.com>
    satisfied_requirement
    unsatisfied_requirement
    open
    diffy

    Rafael Cintron (Gerrit)

    unread,
    Jun 20, 2024, 8:24:23 PM (9 days ago) Jun 20
    to Mingming1 Xu, Reilly Grant, AyeAye, Austin Sullivan, Bryan Bernhart, ningxin hu, Tricium, chromium...@chromium.org, Kentaro Hara, Jiewei Qian, mac-r...@chromium.org, blink-revie...@chromium.org, blink-...@chromium.org, blundell+...@chromium.org, ipc-securi...@chromium.org, jmedle...@chromium.org
    Attention needed from Austin Sullivan, Bryan Bernhart, Mingming1 Xu, Reilly Grant and ningxin hu

    Rafael Cintron added 4 comments

    File services/webnn/dml/graph_impl_dml.cc
    Rafael Cintron

    If history is any indication, I expect we will encounter driver/hardware quirks for incorrect output. I do not expect rogue, new errors to be returned straight from the driver to the application. The reason DML and D3D runtimes exist is to shield applications from this.

    Would it make you feel more comfortable if we only `CHECK`-ed certain errors and gracefully handled everything else by declaring "context lost" and emitting a LOG + histogram? With a histogram, we can track the values we get over time to see how bad things are in practice.

    File services/webnn/dml/utils.cc
    Line 370, Patchset 8:void HandleWebNNContextLost(HRESULT hr, ContextImplDml* context) {
    Rafael Cintron . resolved

    The "HandleXFailure" methods which call `HandleWebNNContextLost` have code in common.

    To better unify them, I think we should structure `HandleWebNNContextLost` as follows:
    1-If the `hr` is `E_OUTOFMEMORY`, set the web developer friendly error message to "out of memory", otherwise, set it to "device removed." Having an "internal error" for the non-device removed ones may give an attacker information about whether the error is exploitable but I don't feel strongly about this.
    2-Call the `OnDestroyed` callback.
    3-`LOG(ERROR) << "[WebNN]: " << logging::SystemErrorCodeToString(hr)`.
    4-`CHECK` that the error is either `E_OUTOFMEMORY`, `DXGI_ERROR_DEVICE_REMOVED`, or `DXGI_ERROR_DEVICE_RESET`.

    We could also have this function run the callback if we templatize it on the callback type and error type. If that's too complicated, we can have it return the developer friendly string to the caller and have the caller use it to run the callback.

    WDYT?

    Mingming1 Xu

    I am not clear for the step 1,3,4, please help me understand.
    Currently I handled `E_OUTOFMEMORY`, `DXGI_ERROR_DEVICE_REMOVED`,`DXGI_ERROR_DEVICE_RESET` three ones to call `OnDestroyed`, otherwise I directly return and won't call `OnDestroyed`.

    We could also have this function run the callback if we templatize it on the callback type and error type. If that's too complicated, we can have it return the developer friendly string to the caller and have the caller use it to run the callback.

    The `OnDestroyed` method is called to send context lost message to renderer, I suppose we needn't LOG(ERROR) the context lost message again and let the caller run the callback (`CreateGraphCallback` and `ComputeCallback`)? As current code base, I only want to get the context lost message, and then call `OnDestroyed` with the message in `HandleWebNNContextLost`.

    Rafael Cintron

    Currently I handled E_OUTOFMEMORY, DXGI_ERROR_DEVICE_REMOVED, DXGI_ERROR_DEVICE_RESET three ones to call OnDestroyed, otherwise I directly return and won't call OnDestroyed.

    By the time we get here, we know we have a failure HRESULT, correct? If so, we should always call `OnDestroyed` so that the web developer can realize that no more WebNN will happen with the context. The difference is whether the error warrants a `CHECK` or graceful termination of the GPU process.

    The `OnDestroye`d method is called to send context lost message to renderer, I suppose we needn't `LOG(ERROR)` the context lost message again and let the caller run the callback (`CreateGraphCallback` and `ComputeCallback`)? As current code base, I only want to get the context lost message, and then call OnDestroyed with the message in HandleWebNNContextLost.

    The `LOG(ERROR)` gives the about:gpu log additional information about why the error occurred that we are not providing to the renderer process. Also, we can't count on the web developer reporting the error to the end user in a way we can easily diagnose. Having the error in the about:gpu log means we, the WebNN implementation, can control when and what we log, not the web developer.

    Mingming1 Xu

    Thanks! Please see my updates.

    Rafael Cintron

    Acknowledged

    Line 389, Patchset 12 (Latest): CHECK(hr == E_OUTOFMEMORY || hr == DXGI_ERROR_DEVICE_REMOVED ||
    hr == DXGI_ERROR_DEVICE_REMOVED);
    Rafael Cintron . unresolved

    The last two statements are the same.

    Line 391, Patchset 12 (Latest):}
    Rafael Cintron . unresolved

    @mingmi...@intel.com, apologies for misleading. What you have here so far is good.

    If the HRESULT is a failure HRESULT, please have the browser additionally call `GetDeviceRemovedReason` and `LOG(ERROR)` that as well with the string prefix "[WebNN] Device Removed Reason: ". `GetDeviceRemovedReason` returns more information about the error than just the generic device removed error.

    Open in Gerrit

    Related details

    Attention is currently required from:
    • Austin Sullivan
    • Bryan Bernhart
    • Mingming1 Xu
    • Reilly Grant
    • ningxin hu
    Gerrit-Attention: Reilly Grant <rei...@chromium.org>
    Gerrit-Attention: ningxin hu <ningx...@intel.com>
    Gerrit-Attention: Mingming1 Xu <mingmi...@intel.com>
    Gerrit-Attention: Bryan Bernhart <bryan.b...@intel.com>
    Gerrit-Comment-Date: Fri, 21 Jun 2024 00:24:11 +0000
    satisfied_requirement
    unsatisfied_requirement
    open
    diffy

    Austin Sullivan (Gerrit)

    unread,
    Jun 21, 2024, 11:25:56 AM (8 days ago) Jun 21
    to Mingming1 Xu, Reilly Grant, AyeAye, Rafael Cintron, Bryan Bernhart, ningxin hu, Tricium, chromium...@chromium.org, Kentaro Hara, Jiewei Qian, mac-r...@chromium.org, blink-revie...@chromium.org, blink-...@chromium.org, blundell+...@chromium.org, ipc-securi...@chromium.org, jmedle...@chromium.org
    Attention needed from Bryan Bernhart, Mingming1 Xu, Reilly Grant and ningxin hu

    Austin Sullivan added 1 comment

    Patchset-level comments
    File-level comment, Patchset 12 (Latest):
    Austin Sullivan . resolved

    moving myself to CC since this CL already has overlapping reviewers!

    Open in Gerrit

    Related details

    Attention is currently required from:
    • Bryan Bernhart
    • Mingming1 Xu
    • Reilly Grant
    • ningxin hu
    Submit Requirements:
    • requirement satisfiedCode-Coverage
    • requirement is not satisfiedCode-Review
    • requirement is not satisfiedNo-Unresolved-Comments
    Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. DiffyGerrit
    Gerrit-MessageType: comment
    Gerrit-Project: chromium/src
    Gerrit-Branch: main
    Gerrit-Change-Id: I16775ac8cdbd508e04025c22740d13f01e591b0d
    Gerrit-Change-Number: 5602134
    Gerrit-PatchSet: 12
    Gerrit-Owner: Mingming1 Xu <mingmi...@intel.com>
    Gerrit-Reviewer: Rafael Cintron <rafael....@microsoft.com>
    Gerrit-Reviewer: Reilly Grant <rei...@chromium.org>
    Gerrit-Reviewer: ningxin hu <ningx...@intel.com>
    Gerrit-CC: Austin Sullivan <asu...@chromium.org>
    Gerrit-CC: Bryan Bernhart <bryan.b...@intel.com>
    Gerrit-CC: Jiewei Qian <q...@chromium.org>
    Gerrit-CC: Kentaro Hara <har...@chromium.org>
    Gerrit-Attention: Reilly Grant <rei...@chromium.org>
    Gerrit-Attention: ningxin hu <ningx...@intel.com>
    Gerrit-Attention: Mingming1 Xu <mingmi...@intel.com>
    Gerrit-Attention: Bryan Bernhart <bryan.b...@intel.com>
    Gerrit-Comment-Date: Fri, 21 Jun 2024 15:25:46 +0000
    Gerrit-HasComments: Yes
    Gerrit-Has-Labels: No
    satisfied_requirement
    unsatisfied_requirement
    open
    diffy

    Bryan Bernhart (Gerrit)

    unread,
    Jun 21, 2024, 3:42:54 PM (8 days ago) Jun 21
    to Mingming1 Xu, Austin Sullivan, Reilly Grant, AyeAye, Rafael Cintron, ningxin hu, Tricium, chromium...@chromium.org, Kentaro Hara, Jiewei Qian, mac-r...@chromium.org, blink-revie...@chromium.org, blink-...@chromium.org, blundell+...@chromium.org, ipc-securi...@chromium.org, jmedle...@chromium.org
    Attention needed from Mingming1 Xu, Reilly Grant and ningxin hu

    Bryan Bernhart added 1 comment

    File services/webnn/webnn_graph_impl_unittest.cc
    Line 108, Patchset 4: FakeWebNNContextImpl(mojo::PendingReceiver<mojom::WebNNContext> receiver,
    Bryan Bernhart . unresolved

    Looks like we have no test cases which generate a context lost (and ensure delivery).

    What's our plan there?

    Mingming1 Xu

    Yes, agree, I should add tests for that, will do in following patchset.

    Mingming1 Xu

    Added in patchset 10, PTAL.

    Bryan Bernhart

    What we added is good but I think we'll need to go further and add a CTS. To do that, there needs to be a way to force a context lost from script w/o relying on the OS/driver generating a "lost" HRESULT.

    Introducing `MLContext.destroy()` which also loses the context could achieve that. But if we treat destroy as lost, the web developer could mistakenly re-create the context (when in fact, it should remain destroyed). `MLContextLostInfo` will probably need a field to the know the source of the message to avoid that. WDYT?

    Open in Gerrit

    Related details

    Attention is currently required from:
    Gerrit-Comment-Date: Fri, 21 Jun 2024 19:42:42 +0000
    Gerrit-HasComments: Yes
    Gerrit-Has-Labels: No
    Comment-In-Reply-To: Mingming1 Xu <mingmi...@intel.com>
    Comment-In-Reply-To: Bryan Bernhart <bryan.b...@intel.com>
    satisfied_requirement
    unsatisfied_requirement
    open
    diffy

    Rafael Cintron (Gerrit)

    unread,
    Jun 21, 2024, 4:41:46 PM (8 days ago) Jun 21
    to Mingming1 Xu, Austin Sullivan, Reilly Grant, AyeAye, Bryan Bernhart, ningxin hu, Tricium, chromium...@chromium.org, Kentaro Hara, Jiewei Qian, mac-r...@chromium.org, blink-revie...@chromium.org, blink-...@chromium.org, blundell+...@chromium.org, ipc-securi...@chromium.org, jmedle...@chromium.org
    Attention needed from Bryan Bernhart, Mingming1 Xu, Reilly Grant and ningxin hu

    Rafael Cintron added 1 comment

    File services/webnn/webnn_graph_impl_unittest.cc
    Line 108, Patchset 4: FakeWebNNContextImpl(mojo::PendingReceiver<mojom::WebNNContext> receiver,
    Bryan Bernhart . unresolved

    Looks like we have no test cases which generate a context lost (and ensure delivery).

    What's our plan there?

    Mingming1 Xu

    Yes, agree, I should add tests for that, will do in following patchset.

    Mingming1 Xu

    Added in patchset 10, PTAL.

    Bryan Bernhart

    What we added is good but I think we'll need to go further and add a CTS. To do that, there needs to be a way to force a context lost from script w/o relying on the OS/driver generating a "lost" HRESULT.

    Introducing `MLContext.destroy()` which also loses the context could achieve that. But if we treat destroy as lost, the web developer could mistakenly re-create the context (when in fact, it should remain destroyed). `MLContextLostInfo` will probably need a field to the know the source of the message to avoid that. WDYT?

    Rafael Cintron

    Here are the [WebGPU device lost tests](https://gpuweb.github.io/cts/standalone/?q=webgpu:api,operation,device,lost:*). They're implemented with a `destroy` method on the device, which I think we should also have in WebNN.

    The destroy method can come as a separate change.

    Open in Gerrit

    Related details

    Attention is currently required from:
    • Bryan Bernhart
    Gerrit-Attention: Bryan Bernhart <bryan.b...@intel.com>
    Gerrit-Comment-Date: Fri, 21 Jun 2024 20:41:35 +0000
    satisfied_requirement
    unsatisfied_requirement
    open
    diffy

    Reilly Grant (Gerrit)

    unread,
    Jun 21, 2024, 5:18:07 PM (8 days ago) Jun 21
    to Mingming1 Xu, Austin Sullivan, Reilly Grant, AyeAye, Rafael Cintron, Bryan Bernhart, ningxin hu, Tricium, chromium...@chromium.org, Kentaro Hara, Jiewei Qian, mac-r...@chromium.org, blink-revie...@chromium.org, blink-...@chromium.org, blundell+...@chromium.org, ipc-securi...@chromium.org, jmedle...@chromium.org
    Attention needed from Bryan Bernhart, Mingming1 Xu, Rafael Cintron and ningxin hu

    Reilly Grant added 1 comment

    File services/webnn/dml/graph_impl_dml.cc
    Line 4619, Patchset 6: HandleWebNNContextLost(hr, context);
    Reilly Grant . resolved
    Reilly Grant

    That makes sense to me.

    The only other thing I want to make sure we are clear on is separating errors by how we expect developers to react to them. As I understand it developers should respond to context loss by simply redoing what they did before (i.e. if the dGPU is removed, create a new context for the iGPU and _maybe_ use a different model for the smaller chip). On the other hand there might be graph or buffer creation errors which mean the developer should not try again, or should take some specific corrective action (e.g. trying to load a different model). These should not be context loss errors and we'll need to use a different method for reporting them (e.g. rejecting from `build()`, `dispatch()` or `readBuffer()`).

    Open in Gerrit

    Related details

    Attention is currently required from:
    • Bryan Bernhart
    • Mingming1 Xu
    • Rafael Cintron
    • ningxin hu
    Submit Requirements:
    • requirement satisfiedCode-Coverage
    • requirement is not satisfiedCode-Review
    • requirement is not satisfiedNo-Unresolved-Comments
    Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. DiffyGerrit
    Gerrit-MessageType: comment
    Gerrit-Project: chromium/src
    Gerrit-Branch: main
    Gerrit-Change-Id: I16775ac8cdbd508e04025c22740d13f01e591b0d
    Gerrit-Change-Number: 5602134
    Gerrit-PatchSet: 12
    Gerrit-Owner: Mingming1 Xu <mingmi...@intel.com>
    Gerrit-Reviewer: Rafael Cintron <rafael....@microsoft.com>
    Gerrit-Reviewer: Reilly Grant <rei...@chromium.org>
    Gerrit-Reviewer: ningxin hu <ningx...@intel.com>
    Gerrit-CC: Austin Sullivan <asu...@chromium.org>
    Gerrit-CC: Bryan Bernhart <bryan.b...@intel.com>
    Gerrit-CC: Jiewei Qian <q...@chromium.org>
    Gerrit-CC: Kentaro Hara <har...@chromium.org>
    Gerrit-Attention: ningxin hu <ningx...@intel.com>
    Gerrit-Attention: Mingming1 Xu <mingmi...@intel.com>
    Gerrit-Attention: Bryan Bernhart <bryan.b...@intel.com>
    Gerrit-Attention: Rafael Cintron <rafael....@microsoft.com>
    Gerrit-Comment-Date: Fri, 21 Jun 2024 21:17:54 +0000
    Gerrit-HasComments: Yes
    Gerrit-Has-Labels: No
    Comment-In-Reply-To: Reilly Grant <rei...@chromium.org>
    Comment-In-Reply-To: Mingming1 Xu <mingmi...@intel.com>
    Comment-In-Reply-To: Rafael Cintron <rafael....@microsoft.com>
    satisfied_requirement
    unsatisfied_requirement
    open
    diffy

    Dwayne Robinson (Gerrit)

    unread,
    Jun 21, 2024, 7:00:47 PM (8 days ago) Jun 21
    to Mingming1 Xu, Austin Sullivan, Reilly Grant, AyeAye, Rafael Cintron, Bryan Bernhart, ningxin hu, Tricium, chromium...@chromium.org, Kentaro Hara, Jiewei Qian, mac-r...@chromium.org, blink-revie...@chromium.org, blink-...@chromium.org, blundell+...@chromium.org, ipc-securi...@chromium.org, jmedle...@chromium.org
    Attention needed from Bryan Bernhart, Mingming1 Xu, Rafael Cintron, Reilly Grant and ningxin hu

    Dwayne Robinson added 1 comment

    File services/webnn/dml/graph_impl_dml.cc
    Dwayne Robinson

    (I hadn't seen this CR until Rafael brought it to my attention)

    Agreed with `E_INVALIDARG`, as Chromium should not be sending invalid parameters to DML in the first place, and DML would report this error due to bad *caller* parameters, not bubbled up from the driver (which become different error messages).
     
    Some others besides `DXGI_ERROR_DEVICE_REMOVED` to gracefully handle include: `DXGI_ERROR_UNSUPPORTED`, `DXGI_DDI_ERR_UNSUPPORTED`, `DXGI_ERROR_DEVICE_RESET`.

    Though, `DXGI_ERROR_DRIVER_INTERNAL_ERROR` is more serious - "An internal issue prevented the driver from carrying out the specified operation. The driver's state is probably suspect, and the application should not continue." 🤔

    Open in Gerrit

    Related details

    Attention is currently required from:
    • Bryan Bernhart
    • Mingming1 Xu
    • Rafael Cintron
    • Reilly Grant
    • ningxin hu
    Submit Requirements:
    • requirement satisfiedCode-Coverage
    • requirement is not satisfiedCode-Review
    • requirement is not satisfiedNo-Unresolved-Comments
    Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. DiffyGerrit
    Gerrit-MessageType: comment
    Gerrit-Project: chromium/src
    Gerrit-Branch: main
    Gerrit-Change-Id: I16775ac8cdbd508e04025c22740d13f01e591b0d
    Gerrit-Change-Number: 5602134
    Gerrit-PatchSet: 12
    Gerrit-Owner: Mingming1 Xu <mingmi...@intel.com>
    Gerrit-Reviewer: Rafael Cintron <rafael....@microsoft.com>
    Gerrit-Reviewer: Reilly Grant <rei...@chromium.org>
    Gerrit-Reviewer: ningxin hu <ningx...@intel.com>
    Gerrit-CC: Austin Sullivan <asu...@chromium.org>
    Gerrit-CC: Bryan Bernhart <bryan.b...@intel.com>
    Gerrit-CC: Dwayne Robinson <dwa...@microsoft.com>
    Gerrit-CC: Jiewei Qian <q...@chromium.org>
    Gerrit-CC: Kentaro Hara <har...@chromium.org>
    Gerrit-Attention: Reilly Grant <rei...@chromium.org>
    Gerrit-Attention: ningxin hu <ningx...@intel.com>
    Gerrit-Attention: Mingming1 Xu <mingmi...@intel.com>
    Gerrit-Attention: Bryan Bernhart <bryan.b...@intel.com>
    Gerrit-Attention: Rafael Cintron <rafael....@microsoft.com>
    Gerrit-Comment-Date: Fri, 21 Jun 2024 23:00:35 +0000
    satisfied_requirement
    unsatisfied_requirement
    open
    diffy

    Bryan Bernhart (Gerrit)

    unread,
    Jun 21, 2024, 7:27:15 PM (8 days ago) Jun 21
    to Mingming1 Xu, Dwayne Robinson, Austin Sullivan, Reilly Grant, AyeAye, Rafael Cintron, ningxin hu, Tricium, chromium...@chromium.org, Kentaro Hara, Jiewei Qian, mac-r...@chromium.org, blink-revie...@chromium.org, blink-...@chromium.org, blundell+...@chromium.org, ipc-securi...@chromium.org, jmedle...@chromium.org
    Attention needed from Mingming1 Xu, Rafael Cintron, Reilly Grant and ningxin hu

    Bryan Bernhart added 3 comments

    File services/webnn/dml/context_impl_dml.cc
    Line 225, Patchset 4: // TODO(crbug.com/41492165): generate error using context.
    Bryan Bernhart . unresolved

    Can we handle context lost here as well?

    Reilly Grant

    Are we assuming that any failure to create a buffer or graph means the context has been lost? That seems strange. I expect that context loss is a signal we specifically get from the adapter.

    Bryan Bernhart

    Are we assuming that any failure to create a buffer or graph means the context has been lost?

    We could, like WebGPU. If we do not then the OOM skews to some, possibly a unrelated call, which does handle context lost which is hard to debug.

    Reilly Grant

    I don't think we should be reporting a context loss for errors like this. That's a separate concept.

    Bryan Bernhart

    If this returns E_OUTOFMEMORY, the context will become inoperable and effectively lost. Also, the HR could be any other error result (but unlikely) so I can see it being quite useful to coverage to one error handler.

    Mingming1 Xu

    This discussion should be merged into above one https://chromium-review.googlesource.com/c/chromium/src/+/5602134/comment/ad47c8af_d3f27f05/: whether we should handle the context lost according to the system error code `HRESULT`.

    Mingming1 Xu

    I am thinking maybe all the methods that return `HRESULT` inside or outside should be handled by `HandleWebNNContextLost`, if the `HRESULT` is context lost error, we should call `OnDestoryed`. WDYT? @rafael....@microsoft.com

    Mingming1 Xu

    Updated, PTAL.

    Bryan Bernhart

    @mingmi...@intel.com

    It depends if the HRESULT error will be recoverable or not. In this case, E_OUTOFMEMORY is not recoverable and leaves the context inoperable so a "lost" seems appropriate. In other cases, this is not so [1].

    Line 329, Patchset 12 (Latest): HandleWebNNContextLost(hr, this);
    Bryan Bernhart . unresolved

    [1] In certain situations, like command recording, it is not expected to generate a "lost" HRESULT error (hence no TODO), because the error was recoverable.

    The problem here is that HandleWebNNContextLost() really only needs to be called from `CommandRecorder::Execute()`, instead of `CommandRecorder::CloseAndExecute()`, where HandleRecordingError() only needs to LOG.

    File services/webnn/dml/utils.cc
    Line 370, Patchset 12 (Latest):void HandleWebNNContextLost(HRESULT hr, ContextImplDml* context) {
    Bryan Bernhart . unresolved

    Since it's not possible to HandleWebNNContextLost() without a context now and every WebNN object should have access to said context, why must this function be in utils?

    Open in Gerrit

    Related details

    Attention is currently required from:
    Gerrit-Attention: Rafael Cintron <rafael....@microsoft.com>
    Gerrit-Comment-Date: Fri, 21 Jun 2024 23:27:07 +0000
    Gerrit-HasComments: Yes
    Gerrit-Has-Labels: No
    Comment-In-Reply-To: Reilly Grant <rei...@chromium.org>
    Comment-In-Reply-To: Mingming1 Xu <mingmi...@intel.com>
    Comment-In-Reply-To: Bryan Bernhart <bryan.b...@intel.com>
    satisfied_requirement
    unsatisfied_requirement
    open
    diffy

    Rafael Cintron (Gerrit)

    unread,
    Jun 21, 2024, 7:49:45 PM (8 days ago) Jun 21
    to Mingming1 Xu, Dwayne Robinson, Austin Sullivan, Reilly Grant, AyeAye, Bryan Bernhart, ningxin hu, Tricium, chromium...@chromium.org, Kentaro Hara, Jiewei Qian, mac-r...@chromium.org, blink-revie...@chromium.org, blink-...@chromium.org, blundell+...@chromium.org, ipc-securi...@chromium.org, jmedle...@chromium.org
    Attention needed from Dwayne Robinson, Mingming1 Xu, Reilly Grant and ningxin hu

    Rafael Cintron added 1 comment

    File services/webnn/dml/graph_impl_dml.cc
    Rafael Cintron

    As I understand it developers should respond to context loss by simply redoing what they did before

    @rei...@chromium.org, both WebGL and WebGPU allow the web developer _one_ context lost of grace. After that, no more WebGL/WebGPU for you. Not a great story but is the best we have with the information we're given by the platform about what's really happening.

    That said, WebNN is not like WebGPU/WebGL where the platform ingests arbitrary shaders with infinite loops so I expect the platform will be better able to preempt work the web developer submits.

    Of the errors Dwayne quoted, `DXGI_ERROR_UNSUPPORTED` and `DXGI_DDI_ERR_UNSUPPORTED` are a temporary implementation detail specific to NPUS when not all of the ops are supported at compile time. I'm double checking with the DML team but I expect/hope that we don't have to completely tear down and restart DML when this happens. OK with me if we handle those particular ones as a separate change.

    Open in Gerrit

    Related details

    Attention is currently required from:
    • Dwayne Robinson
    • Mingming1 Xu
    • Reilly Grant
    • ningxin hu
    Gerrit-Attention: Dwayne Robinson <dwa...@microsoft.com>
    Gerrit-Attention: Reilly Grant <rei...@chromium.org>
    Gerrit-Attention: ningxin hu <ningx...@intel.com>
    Gerrit-Attention: Mingming1 Xu <mingmi...@intel.com>
    Gerrit-Comment-Date: Fri, 21 Jun 2024 23:49:36 +0000
    Gerrit-HasComments: Yes
    Gerrit-Has-Labels: No
    Comment-In-Reply-To: Dwayne Robinson <dwa...@microsoft.com>
    Comment-In-Reply-To: Reilly Grant <rei...@chromium.org>
    Comment-In-Reply-To: Mingming1 Xu <mingmi...@intel.com>
    Comment-In-Reply-To: Rafael Cintron <rafael....@microsoft.com>
    satisfied_requirement
    unsatisfied_requirement
    open
    diffy

    Mingming1 Xu (Gerrit)

    unread,
    Jun 24, 2024, 3:13:29 AM (6 days ago) Jun 24
    to Dwayne Robinson, Austin Sullivan, Reilly Grant, AyeAye, Rafael Cintron, Bryan Bernhart, ningxin hu, Tricium, chromium...@chromium.org, Kentaro Hara, Jiewei Qian, mac-r...@chromium.org, blink-revie...@chromium.org, blink-...@chromium.org, blundell+...@chromium.org, ipc-securi...@chromium.org, jmedle...@chromium.org
    Attention needed from Bryan Bernhart, Dwayne Robinson, Rafael Cintron, Reilly Grant and ningxin hu

    Mingming1 Xu added 7 comments

    File services/webnn/dml/context_impl_dml.cc
    Line 329, Patchset 12: HandleWebNNContextLost(hr, this);
    Bryan Bernhart . unresolved

    [1] In certain situations, like command recording, it is not expected to generate a "lost" HRESULT error (hence no TODO), because the error was recoverable.

    The problem here is that HandleWebNNContextLost() really only needs to be called from `CommandRecorder::Execute()`, instead of `CommandRecorder::CloseAndExecute()`, where HandleRecordingError() only needs to LOG.

    Mingming1 Xu

    What do you mean? The `CommandRecorder::CloseAndExecute()` will also call `CommandRecorder::Execute()`. And there is no any bad effect if we call the `HandleWebNNContextLost()` here, right? Just like the `GraphImplDml::HandleDispatchFailure`, `GraphImplDml::HandleComputationFailure` and `HandleGraphCreationFailure`, I handle the failure HRESULT everywhere.

    File services/webnn/dml/utils.cc
    Line 370, Patchset 12:void HandleWebNNContextLost(HRESULT hr, ContextImplDml* context) {
    Bryan Bernhart . unresolved

    Since it's not possible to HandleWebNNContextLost() without a context now and every WebNN object should have access to said context, why must this function be in utils?

    Mingming1 Xu

    Good points, now move it to `ContextImplDml` as `ContextImplDml::HandleContextLost`

    Line 377, Patchset 4: } else if (hr == DXGI_ERROR_DEVICE_REMOVED) {
    Bryan Bernhart . resolved
    Mingming1 Xu

    Done

    Line 389, Patchset 12: CHECK(hr == E_OUTOFMEMORY || hr == DXGI_ERROR_DEVICE_REMOVED ||
    hr == DXGI_ERROR_DEVICE_REMOVED);
    Rafael Cintron . resolved

    The last two statements are the same.

    Mingming1 Xu

    Done

    Rafael Cintron . unresolved

    @mingmi...@intel.com, apologies for misleading. What you have here so far is good.

    If the HRESULT is a failure HRESULT, please have the browser additionally call `GetDeviceRemovedReason` and `LOG(ERROR)` that as well with the string prefix "[WebNN] Device Removed Reason: ". `GetDeviceRemovedReason` returns more information about the error than just the generic device removed error.

    Mingming1 Xu

    Updated, PTAL, thanks!

    File services/webnn/public/mojom/webnn_context_provider.mojom
    Line 68, Patchset 13 (Latest): pending_receiver<WebNNContextClient>? context_client_receiver;
    Mingming1 Xu . unresolved

    @ningx...@intel.com we also need reviewer for this mojom file, could you help invite? Thanks!

    File services/webnn/webnn_graph_impl_unittest.cc
    Line 108, Patchset 4: FakeWebNNContextImpl(mojo::PendingReceiver<mojom::WebNNContext> receiver,
    Bryan Bernhart . unresolved

    Looks like we have no test cases which generate a context lost (and ensure delivery).

    What's our plan there?

    Mingming1 Xu

    Yes, agree, I should add tests for that, will do in following patchset.

    Mingming1 Xu

    Added in patchset 10, PTAL.

    Bryan Bernhart

    What we added is good but I think we'll need to go further and add a CTS. To do that, there needs to be a way to force a context lost from script w/o relying on the OS/driver generating a "lost" HRESULT.

    Introducing `MLContext.destroy()` which also loses the context could achieve that. But if we treat destroy as lost, the web developer could mistakenly re-create the context (when in fact, it should remain destroyed). `MLContextLostInfo` will probably need a field to the know the source of the message to avoid that. WDYT?

    Rafael Cintron

    Here are the [WebGPU device lost tests](https://gpuweb.github.io/cts/standalone/?q=webgpu:api,operation,device,lost:*). They're implemented with a `destroy` method on the device, which I think we should also have in WebNN.

    The destroy method can come as a separate change.

    Mingming1 Xu

    But if we treat destroy as lost, the web developer could mistakenly re-create the context (when in fact, it should remain destroyed).

    Re-creating a new context after lost is as expected, the new context can work after the GPU recovers, right?

    Open in Gerrit

    Related details

    Attention is currently required from:
    • Bryan Bernhart
    • Dwayne Robinson
    • Rafael Cintron
    • Reilly Grant
    • ningxin hu
    Submit Requirements:
    • requirement satisfiedCode-Coverage
    • requirement is not satisfiedCode-Review
    • requirement is not satisfiedNo-Unresolved-Comments
    Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. DiffyGerrit
    Gerrit-MessageType: comment
    Gerrit-Project: chromium/src
    Gerrit-Branch: main
    Gerrit-Change-Id: I16775ac8cdbd508e04025c22740d13f01e591b0d
    Gerrit-Change-Number: 5602134
    Gerrit-PatchSet: 13
    Gerrit-Owner: Mingming1 Xu <mingmi...@intel.com>
    Gerrit-Reviewer: Rafael Cintron <rafael....@microsoft.com>
    Gerrit-Reviewer: Reilly Grant <rei...@chromium.org>
    Gerrit-Reviewer: ningxin hu <ningx...@intel.com>
    Gerrit-CC: Austin Sullivan <asu...@chromium.org>
    Gerrit-CC: Bryan Bernhart <bryan.b...@intel.com>
    Gerrit-CC: Dwayne Robinson <dwa...@microsoft.com>
    Gerrit-CC: Jiewei Qian <q...@chromium.org>
    Gerrit-CC: Kentaro Hara <har...@chromium.org>
    Gerrit-Attention: Dwayne Robinson <dwa...@microsoft.com>
    Gerrit-Attention: Reilly Grant <rei...@chromium.org>
    Gerrit-Attention: ningxin hu <ningx...@intel.com>
    Gerrit-Attention: Bryan Bernhart <bryan.b...@intel.com>
    Gerrit-Attention: Rafael Cintron <rafael....@microsoft.com>
    Gerrit-Comment-Date: Mon, 24 Jun 2024 07:13:17 +0000
    Gerrit-HasComments: Yes
    Gerrit-Has-Labels: No
    Comment-In-Reply-To: Mingming1 Xu <mingmi...@intel.com>
    Comment-In-Reply-To: Bryan Bernhart <bryan.b...@intel.com>
    Comment-In-Reply-To: Rafael Cintron <rafael....@microsoft.com>
    satisfied_requirement
    unsatisfied_requirement
    open
    diffy

    Bryan Bernhart (Gerrit)

    unread,
    Jun 24, 2024, 12:45:56 PM (5 days ago) Jun 24
    to Mingming1 Xu, Dwayne Robinson, Austin Sullivan, Reilly Grant, AyeAye, Rafael Cintron, ningxin hu, Tricium, chromium...@chromium.org, Kentaro Hara, Jiewei Qian, mac-r...@chromium.org, blink-revie...@chromium.org, blink-...@chromium.org, blundell+...@chromium.org, ipc-securi...@chromium.org, jmedle...@chromium.org
    Attention needed from Dwayne Robinson, Mingming1 Xu, Rafael Cintron, Reilly Grant and ningxin hu

    Bryan Bernhart added 2 comments

    File services/webnn/dml/context_impl_dml.cc
    Line 329, Patchset 12: HandleWebNNContextLost(hr, this);
    Bryan Bernhart . unresolved

    [1] In certain situations, like command recording, it is not expected to generate a "lost" HRESULT error (hence no TODO), because the error was recoverable.

    The problem here is that HandleWebNNContextLost() really only needs to be called from `CommandRecorder::Execute()`, instead of `CommandRecorder::CloseAndExecute()`, where HandleRecordingError() only needs to LOG.

    Mingming1 Xu

    What do you mean? The `CommandRecorder::CloseAndExecute()` will also call `CommandRecorder::Execute()`. And there is no any bad effect if we call the `HandleWebNNContextLost()` here, right? Just like the `GraphImplDml::HandleDispatchFailure`, `GraphImplDml::HandleComputationFailure` and `HandleGraphCreationFailure`, I handle the failure HRESULT everywhere.

    Bryan Bernhart

    And there is no any bad effect if we call the HandleWebNNContextLost() here, right?

    If we call HandleWebNNContextLost(), the entire context becomes inoperable, even when the HRESULT may not indicate a 'lost' error. Before this CL, any recording HRESULT error was fully handled by re-creating the command recorder.

    Either HandleWebNNContextLost() needs to avoid generating a lost on non-fatal HRESULT errors that are recoverable OR we must call HandleWebNNContextLost() ONLY where a lost HRESULT is expected (ex. `CommandRecorder::Execute`).

    File services/webnn/webnn_graph_impl_unittest.cc
    Line 108, Patchset 4: FakeWebNNContextImpl(mojo::PendingReceiver<mojom::WebNNContext> receiver,
    Bryan Bernhart . unresolved

    Looks like we have no test cases which generate a context lost (and ensure delivery).

    What's our plan there?

    Mingming1 Xu

    Yes, agree, I should add tests for that, will do in following patchset.

    Mingming1 Xu

    Added in patchset 10, PTAL.

    Bryan Bernhart

    What we added is good but I think we'll need to go further and add a CTS. To do that, there needs to be a way to force a context lost from script w/o relying on the OS/driver generating a "lost" HRESULT.

    Introducing `MLContext.destroy()` which also loses the context could achieve that. But if we treat destroy as lost, the web developer could mistakenly re-create the context (when in fact, it should remain destroyed). `MLContextLostInfo` will probably need a field to the know the source of the message to avoid that. WDYT?

    Rafael Cintron

    Here are the [WebGPU device lost tests](https://gpuweb.github.io/cts/standalone/?q=webgpu:api,operation,device,lost:*). They're implemented with a `destroy` method on the device, which I think we should also have in WebNN.

    The destroy method can come as a separate change.

    Mingming1 Xu

    Thanks, open https://issues.chromium.org/issues/348904836 to track.

    But if we treat destroy as lost, the web developer could mistakenly re-create the context (when in fact, it should remain destroyed).

    Re-creating a new context after lost is as expected, the new context can work after the GPU recovers, right?

    Bryan Bernhart

    Re-creating a new context after lost is as expected, the new context can work after the GPU recovers, right?

    Correct but only when the context wasn't destroyed/lost on purpose. I am fine with a TODO.

    Open in Gerrit

    Related details

    Attention is currently required from:
    • Dwayne Robinson
    • Mingming1 Xu
    Gerrit-Attention: Mingming1 Xu <mingmi...@intel.com>
    Gerrit-Attention: Rafael Cintron <rafael....@microsoft.com>
    Gerrit-Comment-Date: Mon, 24 Jun 2024 16:45:40 +0000
    satisfied_requirement
    unsatisfied_requirement
    open
    diffy

    Rafael Cintron (Gerrit)

    unread,
    Jun 24, 2024, 9:10:26 PM (5 days ago) Jun 24
    to Mingming1 Xu, Dwayne Robinson, Austin Sullivan, Reilly Grant, AyeAye, Bryan Bernhart, ningxin hu, Tricium, chromium...@chromium.org, Kentaro Hara, Jiewei Qian, mac-r...@chromium.org, blink-revie...@chromium.org, blink-...@chromium.org, blundell+...@chromium.org, ipc-securi...@chromium.org, jmedle...@chromium.org
    Attention needed from Bryan Bernhart, Dwayne Robinson, Mingming1 Xu, Reilly Grant and ningxin hu

    Rafael Cintron added 4 comments

    File services/webnn/dml/context_impl_dml.cc
    Line 100, Patchset 13 (Latest): HandleContextLost(hr);
    Rafael Cintron . unresolved

    Every occurrence of `HandleContextLost` in this CL is preceded by `LOG(ERROR) << "some error"`.

    To better streamline how we do error handling, Consider having `HandleContextLost`, itself, accept an error string that it can combine with the existing strings it already outputs.

    Line 328, Patchset 13 (Latest): LOG(ERROR) << error_message;
    Rafael Cintron . unresolved

    Nit: While you're here, please prepend the string "[WebNN] " to this error message so that we know which system it is coming from in logs.

    (Same goes for the other places where you're outputting to the error log)

    Line 329, Patchset 12: HandleWebNNContextLost(hr, this);
    Bryan Bernhart . unresolved

    [1] In certain situations, like command recording, it is not expected to generate a "lost" HRESULT error (hence no TODO), because the error was recoverable.

    The problem here is that HandleWebNNContextLost() really only needs to be called from `CommandRecorder::Execute()`, instead of `CommandRecorder::CloseAndExecute()`, where HandleRecordingError() only needs to LOG.

    Mingming1 Xu

    What do you mean? The `CommandRecorder::CloseAndExecute()` will also call `CommandRecorder::Execute()`. And there is no any bad effect if we call the `HandleWebNNContextLost()` here, right? Just like the `GraphImplDml::HandleDispatchFailure`, `GraphImplDml::HandleComputationFailure` and `HandleGraphCreationFailure`, I handle the failure HRESULT everywhere.

    Bryan Bernhart

    And there is no any bad effect if we call the HandleWebNNContextLost() here, right?

    If we call HandleWebNNContextLost(), the entire context becomes inoperable, even when the HRESULT may not indicate a 'lost' error. Before this CL, any recording HRESULT error was fully handled by re-creating the command recorder.

    Either HandleWebNNContextLost() needs to avoid generating a lost on non-fatal HRESULT errors that are recoverable OR we must call HandleWebNNContextLost() ONLY where a lost HRESULT is expected (ex. `CommandRecorder::Execute`).

    Rafael Cintron

    @bryan.b...@intel.com, can you give me an example of a non-fatal HRESULT error that is recoverable by re-creating the command recorder?

    Line 354, Patchset 13 (Latest): CHECK(hr == E_OUTOFMEMORY || hr == DXGI_ERROR_DEVICE_REMOVED ||
    hr == DXGI_ERROR_DEVICE_REMOVED);
    Rafael Cintron . unresolved

    These two || expressions are the same.

    Open in Gerrit

    Related details

    Attention is currently required from:
    • Bryan Bernhart
    • Dwayne Robinson
    • Mingming1 Xu
    • Reilly Grant
    • ningxin hu
    Gerrit-Attention: Bryan Bernhart <bryan.b...@intel.com>
    Gerrit-Comment-Date: Tue, 25 Jun 2024 01:10:11 +0000
    satisfied_requirement
    unsatisfied_requirement
    open
    diffy

    Mingming1 Xu (Gerrit)

    unread,
    Jun 25, 2024, 1:48:03 AM (5 days ago) Jun 25
    to Dwayne Robinson, Austin Sullivan, Reilly Grant, AyeAye, Rafael Cintron, Bryan Bernhart, ningxin hu, Tricium, chromium...@chromium.org, Kentaro Hara, Jiewei Qian, mac-r...@chromium.org, blink-revie...@chromium.org, blink-...@chromium.org, blundell+...@chromium.org, ipc-securi...@chromium.org, jmedle...@chromium.org
    Attention needed from Bryan Bernhart, Rafael Cintron, Reilly Grant and ningxin hu

    Mingming1 Xu added 6 comments

    File services/webnn/dml/context_impl_dml.cc
    Line 100, Patchset 13: HandleContextLost(hr);
    Rafael Cintron . resolved

    Every occurrence of `HandleContextLost` in this CL is preceded by `LOG(ERROR) << "some error"`.

    To better streamline how we do error handling, Consider having `HandleContextLost`, itself, accept an error string that it can combine with the existing strings it already outputs.

    Mingming1 Xu

    Good idea,done.

    Line 328, Patchset 13: LOG(ERROR) << error_message;
    Rafael Cintron . resolved

    Nit: While you're here, please prepend the string "[WebNN] " to this error message so that we know which system it is coming from in logs.

    (Same goes for the other places where you're outputting to the error log)

    Mingming1 Xu

    Done

    Line 354, Patchset 13: CHECK(hr == E_OUTOFMEMORY || hr == DXGI_ERROR_DEVICE_REMOVED ||
    hr == DXGI_ERROR_DEVICE_REMOVED);
    Rafael Cintron . resolved

    These two || expressions are the same.

    Mingming1 Xu

    Done

    File services/webnn/dml/utils.cc
    Line 370, Patchset 12:void HandleWebNNContextLost(HRESULT hr, ContextImplDml* context) {
    Bryan Bernhart . resolved

    Since it's not possible to HandleWebNNContextLost() without a context now and every WebNN object should have access to said context, why must this function be in utils?

    Mingming1 Xu

    Good points, now move it to `ContextImplDml` as `ContextImplDml::HandleContextLost`

    Mingming1 Xu

    Done

    File services/webnn/webnn_graph_impl_unittest.cc
    Line 108, Patchset 4: FakeWebNNContextImpl(mojo::PendingReceiver<mojom::WebNNContext> receiver,
    Bryan Bernhart . unresolved

    Looks like we have no test cases which generate a context lost (and ensure delivery).

    What's our plan there?

    Mingming1 Xu

    Yes, agree, I should add tests for that, will do in following patchset.

    Mingming1 Xu

    Added in patchset 10, PTAL.

    Bryan Bernhart

    What we added is good but I think we'll need to go further and add a CTS. To do that, there needs to be a way to force a context lost from script w/o relying on the OS/driver generating a "lost" HRESULT.

    Introducing `MLContext.destroy()` which also loses the context could achieve that. But if we treat destroy as lost, the web developer could mistakenly re-create the context (when in fact, it should remain destroyed). `MLContextLostInfo` will probably need a field to the know the source of the message to avoid that. WDYT?

    Rafael Cintron

    Here are the [WebGPU device lost tests](https://gpuweb.github.io/cts/standalone/?q=webgpu:api,operation,device,lost:*). They're implemented with a `destroy` method on the device, which I think we should also have in WebNN.

    The destroy method can come as a separate change.

    Mingming1 Xu

    Thanks, open https://issues.chromium.org/issues/348904836 to track.

    But if we treat destroy as lost, the web developer could mistakenly re-create the context (when in fact, it should remain destroyed).

    Re-creating a new context after lost is as expected, the new context can work after the GPU recovers, right?

    Bryan Bernhart

    Re-creating a new context after lost is as expected, the new context can work after the GPU recovers, right?

    Correct but only when the context wasn't destroyed/lost on purpose. I am fine with a TODO.

    Mingming1 Xu

    Thinking again, what's the purpose and usecase of introducing `MLContext.destroy()` into WebNN? Is it only for testing against forcing context lost from renderer process? @rafael....@microsoft.com @bryan.b...@intel.com @ningx...@intel.com

    File third_party/blink/renderer/modules/ml/ml_context.cc
    Line 224, Patchset 4:void MLContext::OnWebNNContextLostCaptured(const String& context_lost_info) {
    ningxin hu . unresolved

    Should this info be reported to JS code?

    Mingming1 Xu

    Now I reported these info related to adapter error in `src\services\webnn\dml\utils.cc`:

      `std::string context_lost_info;
    if (hr == E_OUTOFMEMORY) {
    context_lost_info = "out of memory.";

    } else if (hr == DXGI_ERROR_DEVICE_REMOVED) {
        context_lost_info = "device removed.";
    } else if (hr == DXGI_ERROR_DEVICE_HUNG) {
    context_lost_info = "device hung.";
    } else if (hr == DXGI_ERROR_DEVICE_RESET) {
    context_lost_info = "device reset.";
    } else {
    return;
    }`
    Mingming1 Xu

    Discussed a lot in other comments, PTAL, thanks!

    Open in Gerrit

    Related details

    Attention is currently required from:
    • Bryan Bernhart
    • Rafael Cintron
    • Reilly Grant
    • ningxin hu
    Submit Requirements:
    • requirement satisfiedCode-Coverage
    • requirement is not satisfiedCode-Review
    • requirement is not satisfiedNo-Unresolved-Comments
    Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. DiffyGerrit
    Gerrit-MessageType: comment
    Gerrit-Project: chromium/src
    Gerrit-Branch: main
    Gerrit-Change-Id: I16775ac8cdbd508e04025c22740d13f01e591b0d
    Gerrit-Change-Number: 5602134
    Gerrit-PatchSet: 14
    Gerrit-Owner: Mingming1 Xu <mingmi...@intel.com>
    Gerrit-Reviewer: Rafael Cintron <rafael....@microsoft.com>
    Gerrit-Reviewer: Reilly Grant <rei...@chromium.org>
    Gerrit-Reviewer: ningxin hu <ningx...@intel.com>
    Gerrit-CC: Austin Sullivan <asu...@chromium.org>
    Gerrit-CC: Bryan Bernhart <bryan.b...@intel.com>
    Gerrit-CC: Dwayne Robinson <dwa...@microsoft.com>
    Gerrit-CC: Jiewei Qian <q...@chromium.org>
    Gerrit-CC: Kentaro Hara <har...@chromium.org>
    Gerrit-Attention: Reilly Grant <rei...@chromium.org>
    Gerrit-Attention: ningxin hu <ningx...@intel.com>
    Gerrit-Attention: Bryan Bernhart <bryan.b...@intel.com>
    Gerrit-Attention: Rafael Cintron <rafael....@microsoft.com>
    Gerrit-Comment-Date: Tue, 25 Jun 2024 05:47:53 +0000
    Gerrit-HasComments: Yes
    Gerrit-Has-Labels: No
    Comment-In-Reply-To: ningxin hu <ningx...@intel.com>
    Comment-In-Reply-To: Mingming1 Xu <mingmi...@intel.com>
    Comment-In-Reply-To: Bryan Bernhart <bryan.b...@intel.com>
    Comment-In-Reply-To: Rafael Cintron <rafael....@microsoft.com>
    satisfied_requirement
    unsatisfied_requirement
    open
    diffy

    Rafael Cintron (Gerrit)

    unread,
    Jun 25, 2024, 11:48:06 AM (4 days ago) Jun 25
    to Mingming1 Xu, Chromium LUCI CQ, Dwayne Robinson, Austin Sullivan, Reilly Grant, AyeAye, Bryan Bernhart, ningxin hu, Tricium, chromium...@chromium.org, Kentaro Hara, Jiewei Qian, mac-r...@chromium.org, blink-revie...@chromium.org, blink-...@chromium.org, blundell+...@chromium.org, ipc-securi...@chromium.org, jmedle...@chromium.org
    Attention needed from Bryan Bernhart, Mingming1 Xu, Reilly Grant and ningxin hu

    Rafael Cintron added 1 comment

    File services/webnn/webnn_graph_impl_unittest.cc
    Line 108, Patchset 4: FakeWebNNContextImpl(mojo::PendingReceiver<mojom::WebNNContext> receiver,
    Bryan Bernhart . unresolved

    Looks like we have no test cases which generate a context lost (and ensure delivery).

    What's our plan there?

    Mingming1 Xu

    Yes, agree, I should add tests for that, will do in following patchset.

    Mingming1 Xu

    Added in patchset 10, PTAL.

    Bryan Bernhart

    What we added is good but I think we'll need to go further and add a CTS. To do that, there needs to be a way to force a context lost from script w/o relying on the OS/driver generating a "lost" HRESULT.

    Introducing `MLContext.destroy()` which also loses the context could achieve that. But if we treat destroy as lost, the web developer could mistakenly re-create the context (when in fact, it should remain destroyed). `MLContextLostInfo` will probably need a field to the know the source of the message to avoid that. WDYT?

    Rafael Cintron

    Here are the [WebGPU device lost tests](https://gpuweb.github.io/cts/standalone/?q=webgpu:api,operation,device,lost:*). They're implemented with a `destroy` method on the device, which I think we should also have in WebNN.

    The destroy method can come as a separate change.

    Mingming1 Xu

    Thanks, open https://issues.chromium.org/issues/348904836 to track.

    But if we treat destroy as lost, the web developer could mistakenly re-create the context (when in fact, it should remain destroyed).

    Re-creating a new context after lost is as expected, the new context can work after the GPU recovers, right?

    Bryan Bernhart

    Re-creating a new context after lost is as expected, the new context can work after the GPU recovers, right?

    Correct but only when the context wasn't destroyed/lost on purpose. I am fine with a TODO.

    Mingming1 Xu

    Thinking again, what's the purpose and usecase of introducing `MLContext.destroy()` into WebNN? Is it only for testing against forcing context lost from renderer process? @rafael....@microsoft.com @bryan.b...@intel.com @ningx...@intel.com

    Rafael Cintron

    Destroy serves two purposes. One is for testing. The other is for the web developer to free the memory and resources taken up by the context. In the case of hybrid GPUs, that means powering down the discrete GPU and giving hours of battery life back to the end user.

    Open in Gerrit

    Related details

    Attention is currently required from:
    • Bryan Bernhart
    • Mingming1 Xu
    • Reilly Grant
    • ningxin hu
    Submit Requirements:
    • requirement satisfiedCode-Coverage
    • requirement is not satisfiedCode-Review
    • requirement is not satisfiedNo-Unresolved-Comments
    Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. DiffyGerrit
    Gerrit-MessageType: comment
    Gerrit-Project: chromium/src
    Gerrit-Branch: main
    Gerrit-Change-Id: I16775ac8cdbd508e04025c22740d13f01e591b0d
    Gerrit-Change-Number: 5602134
    Gerrit-PatchSet: 14
    Gerrit-Owner: Mingming1 Xu <mingmi...@intel.com>
    Gerrit-Reviewer: Mingming1 Xu <mingmi...@intel.com>
    Gerrit-Reviewer: Rafael Cintron <rafael....@microsoft.com>
    Gerrit-Reviewer: Reilly Grant <rei...@chromium.org>
    Gerrit-Reviewer: ningxin hu <ningx...@intel.com>
    Gerrit-CC: Austin Sullivan <asu...@chromium.org>
    Gerrit-CC: Bryan Bernhart <bryan.b...@intel.com>
    Gerrit-CC: Dwayne Robinson <dwa...@microsoft.com>
    Gerrit-CC: Jiewei Qian <q...@chromium.org>
    Gerrit-CC: Kentaro Hara <har...@chromium.org>
    Gerrit-Attention: Reilly Grant <rei...@chromium.org>
    Gerrit-Attention: ningxin hu <ningx...@intel.com>
    Gerrit-Attention: Mingming1 Xu <mingmi...@intel.com>
    Gerrit-Attention: Bryan Bernhart <bryan.b...@intel.com>
    Gerrit-Comment-Date: Tue, 25 Jun 2024 15:47:56 +0000
    Gerrit-HasComments: Yes
    Gerrit-Has-Labels: No
    satisfied_requirement
    unsatisfied_requirement
    open
    diffy

    Bryan Bernhart (Gerrit)

    unread,
    Jun 25, 2024, 12:28:01 PM (4 days ago) Jun 25
    to Mingming1 Xu, Chromium LUCI CQ, Dwayne Robinson, Austin Sullivan, Reilly Grant, AyeAye, Rafael Cintron, ningxin hu, Tricium, chromium...@chromium.org, Kentaro Hara, Jiewei Qian, mac-r...@chromium.org, blink-revie...@chromium.org, blink-...@chromium.org, blundell+...@chromium.org, ipc-securi...@chromium.org, jmedle...@chromium.org
    Attention needed from Mingming1 Xu, Rafael Cintron, Reilly Grant and ningxin hu

    Bryan Bernhart added 1 comment

    File services/webnn/dml/context_impl_dml.cc
    Line 329, Patchset 12: HandleWebNNContextLost(hr, this);
    Bryan Bernhart . unresolved

    [1] In certain situations, like command recording, it is not expected to generate a "lost" HRESULT error (hence no TODO), because the error was recoverable.

    The problem here is that HandleWebNNContextLost() really only needs to be called from `CommandRecorder::Execute()`, instead of `CommandRecorder::CloseAndExecute()`, where HandleRecordingError() only needs to LOG.

    Mingming1 Xu

    What do you mean? The `CommandRecorder::CloseAndExecute()` will also call `CommandRecorder::Execute()`. And there is no any bad effect if we call the `HandleWebNNContextLost()` here, right? Just like the `GraphImplDml::HandleDispatchFailure`, `GraphImplDml::HandleComputationFailure` and `HandleGraphCreationFailure`, I handle the failure HRESULT everywhere.

    Bryan Bernhart

    And there is no any bad effect if we call the HandleWebNNContextLost() here, right?

    If we call HandleWebNNContextLost(), the entire context becomes inoperable, even when the HRESULT may not indicate a 'lost' error. Before this CL, any recording HRESULT error was fully handled by re-creating the command recorder.

    Either HandleWebNNContextLost() needs to avoid generating a lost on non-fatal HRESULT errors that are recoverable OR we must call HandleWebNNContextLost() ONLY where a lost HRESULT is expected (ex. `CommandRecorder::Execute`).

    Rafael Cintron

    @bryan.b...@intel.com, can you give me an example of a non-fatal HRESULT error that is recoverable by re-creating the command recorder?

    Bryan Bernhart

    @rafael....@microsoft.com any HRESULT that doesn't explicitly say the device will be put into a removal state could be recoverable. If the web developer builds N graphs and the N+1 graph build causes `E_INVALIDARG` or `DXGI_ERROR_INVALID_CALL` and so on then the entire app/workload will go down and the web developer won't know what graph caused it.

    Open in Gerrit

    Related details

    Attention is currently required from:
    • Mingming1 Xu
    • Rafael Cintron
    • Reilly Grant
    • ningxin hu
    Gerrit-Attention: Rafael Cintron <rafael....@microsoft.com>
    Gerrit-Comment-Date: Tue, 25 Jun 2024 16:27:47 +0000
    satisfied_requirement
    unsatisfied_requirement
    open
    diffy

    Rafael Cintron (Gerrit)

    unread,
    Jun 25, 2024, 8:34:48 PM (4 days ago) Jun 25
    to Mingming1 Xu, Chromium LUCI CQ, Dwayne Robinson, Austin Sullivan, Reilly Grant, AyeAye, Bryan Bernhart, ningxin hu, Tricium, chromium...@chromium.org, Kentaro Hara, Jiewei Qian, mac-r...@chromium.org, blink-revie...@chromium.org, blink-...@chromium.org, blundell+...@chromium.org, ipc-securi...@chromium.org, jmedle...@chromium.org
    Attention needed from Bryan Bernhart, Mingming1 Xu, Reilly Grant and ningxin hu

    Rafael Cintron voted and added 3 comments

    Votes added by Rafael Cintron

    Code-Review+1

    3 comments

    Patchset-level comments
    File-level comment, Patchset 14 (Latest):
    Rafael Cintron . resolved

    Code change LGTM lone "nit" I just opened.

    File services/webnn/dml/context_impl_dml.cc
    Line 329, Patchset 12: HandleWebNNContextLost(hr, this);
    Bryan Bernhart . unresolved

    [1] In certain situations, like command recording, it is not expected to generate a "lost" HRESULT error (hence no TODO), because the error was recoverable.

    The problem here is that HandleWebNNContextLost() really only needs to be called from `CommandRecorder::Execute()`, instead of `CommandRecorder::CloseAndExecute()`, where HandleRecordingError() only needs to LOG.

    Mingming1 Xu

    What do you mean? The `CommandRecorder::CloseAndExecute()` will also call `CommandRecorder::Execute()`. And there is no any bad effect if we call the `HandleWebNNContextLost()` here, right? Just like the `GraphImplDml::HandleDispatchFailure`, `GraphImplDml::HandleComputationFailure` and `HandleGraphCreationFailure`, I handle the failure HRESULT everywhere.

    Bryan Bernhart

    And there is no any bad effect if we call the HandleWebNNContextLost() here, right?

    If we call HandleWebNNContextLost(), the entire context becomes inoperable, even when the HRESULT may not indicate a 'lost' error. Before this CL, any recording HRESULT error was fully handled by re-creating the command recorder.

    Either HandleWebNNContextLost() needs to avoid generating a lost on non-fatal HRESULT errors that are recoverable OR we must call HandleWebNNContextLost() ONLY where a lost HRESULT is expected (ex. `CommandRecorder::Execute`).

    Rafael Cintron

    @bryan.b...@intel.com, can you give me an example of a non-fatal HRESULT error that is recoverable by re-creating the command recorder?

    Bryan Bernhart

    @rafael....@microsoft.com any HRESULT that doesn't explicitly say the device will be put into a removal state could be recoverable. If the web developer builds N graphs and the N+1 graph build causes `E_INVALIDARG` or `DXGI_ERROR_INVALID_CALL` and so on then the entire app/workload will go down and the web developer won't know what graph caused it.

    Rafael Cintron

    My opinion, as I've stated in other comment threads in this CL, is that `E_INVALIDARG` and `DXGI_ERROR_INVALID_CALL` are not errors the browser should try to gracefully recover from. We should `CHECK` those and fail fast the GPU process.

    File services/webnn/webnn_graph_impl_unittest.cc
    Line 146, Patchset 14 (Latest): bool IsLost() { return is_lost_; }
    Rafael Cintron . unresolved

    Nit: Please `IsLost` a const method.

    Open in Gerrit

    Related details

    Attention is currently required from:
    • Bryan Bernhart
    • Mingming1 Xu
    • Reilly Grant
    • ningxin hu
    Gerrit-Attention: Bryan Bernhart <bryan.b...@intel.com>
    Gerrit-Comment-Date: Wed, 26 Jun 2024 00:34:32 +0000
    Gerrit-HasComments: Yes
    Gerrit-Has-Labels: Yes
    satisfied_requirement
    unsatisfied_requirement
    open
    diffy

    Mingming1 Xu (Gerrit)

    unread,
    Jun 25, 2024, 11:03:36 PM (4 days ago) Jun 25
    to Rafael Cintron, Chromium LUCI CQ, Dwayne Robinson, Austin Sullivan, Reilly Grant, AyeAye, Bryan Bernhart, ningxin hu, Tricium, chromium...@chromium.org, Kentaro Hara, Jiewei Qian, mac-r...@chromium.org, blink-revie...@chromium.org, blink-...@chromium.org, blundell+...@chromium.org, ipc-securi...@chromium.org, jmedle...@chromium.org
    Attention needed from Bryan Bernhart, Reilly Grant and ningxin hu

    Mingming1 Xu added 2 comments

    File services/webnn/dml/utils.cc
    Line 391, Patchset 12:}
    Rafael Cintron . resolved

    @mingmi...@intel.com, apologies for misleading. What you have here so far is good.

    If the HRESULT is a failure HRESULT, please have the browser additionally call `GetDeviceRemovedReason` and `LOG(ERROR)` that as well with the string prefix "[WebNN] Device Removed Reason: ". `GetDeviceRemovedReason` returns more information about the error than just the generic device removed error.

    Mingming1 Xu

    Updated, PTAL, thanks!

    Mingming1 Xu

    Done

    File services/webnn/webnn_graph_impl_unittest.cc
    Line 146, Patchset 14: bool IsLost() { return is_lost_; }
    Rafael Cintron . resolved

    Nit: Please `IsLost` a const method.

    Mingming1 Xu

    Done

    Open in Gerrit

    Related details

    Attention is currently required from:
    • Bryan Bernhart
    • Reilly Grant
    • ningxin hu
    Submit Requirements:
    • requirement satisfiedCode-Coverage
    • requirement is not satisfiedCode-Review
    • requirement is not satisfiedNo-Unresolved-Comments
    Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. DiffyGerrit
    Gerrit-MessageType: comment
    Gerrit-Project: chromium/src
    Gerrit-Branch: main
    Gerrit-Change-Id: I16775ac8cdbd508e04025c22740d13f01e591b0d
    Gerrit-Change-Number: 5602134
    Gerrit-PatchSet: 16
    Gerrit-Owner: Mingming1 Xu <mingmi...@intel.com>
    Gerrit-Reviewer: Mingming1 Xu <mingmi...@intel.com>
    Gerrit-Reviewer: Rafael Cintron <rafael....@microsoft.com>
    Gerrit-Reviewer: Reilly Grant <rei...@chromium.org>
    Gerrit-Reviewer: ningxin hu <ningx...@intel.com>
    Gerrit-CC: Austin Sullivan <asu...@chromium.org>
    Gerrit-CC: Bryan Bernhart <bryan.b...@intel.com>
    Gerrit-CC: Dwayne Robinson <dwa...@microsoft.com>
    Gerrit-CC: Jiewei Qian <q...@chromium.org>
    Gerrit-CC: Kentaro Hara <har...@chromium.org>
    Gerrit-Attention: Reilly Grant <rei...@chromium.org>
    Gerrit-Attention: ningxin hu <ningx...@intel.com>
    Gerrit-Attention: Bryan Bernhart <bryan.b...@intel.com>
    Gerrit-Comment-Date: Wed, 26 Jun 2024 03:03:24 +0000
    Gerrit-HasComments: Yes
    Gerrit-Has-Labels: No
    Comment-In-Reply-To: Mingming1 Xu <mingmi...@intel.com>
    Comment-In-Reply-To: Rafael Cintron <rafael....@microsoft.com>
    satisfied_requirement
    unsatisfied_requirement
    open
    diffy

    ningxin hu (Gerrit)

    unread,
    Jun 25, 2024, 11:04:47 PM (4 days ago) Jun 25
    to Mingming1 Xu, Alex Gough, Rafael Cintron, Chromium LUCI CQ, Dwayne Robinson, Austin Sullivan, Reilly Grant, AyeAye, Bryan Bernhart, Tricium, chromium...@chromium.org, Kentaro Hara, Jiewei Qian, mac-r...@chromium.org, blink-revie...@chromium.org, blink-...@chromium.org, blundell+...@chromium.org, ipc-securi...@chromium.org, jmedle...@chromium.org
    Attention needed from Alex Gough, Bryan Bernhart and Reilly Grant

    ningxin hu added 1 comment

    File services/webnn/public/mojom/webnn_context_provider.mojom
    Line 68, Patchset 13: pending_receiver<WebNNContextClient>? context_client_receiver;
    Mingming1 Xu . resolved

    @ningx...@intel.com we also need reviewer for this mojom file, could you help invite? Thanks!

    ningxin hu

    Done. Add @aj...@chromium.org for reviewing mojo changes.

    Open in Gerrit

    Related details

    Attention is currently required from:
    • Alex Gough
    • Bryan Bernhart
    • Reilly Grant
    Submit Requirements:
    • requirement satisfiedCode-Coverage
    • requirement is not satisfiedCode-Review
    • requirement is not satisfiedNo-Unresolved-Comments
    Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. DiffyGerrit
    Gerrit-MessageType: comment
    Gerrit-Project: chromium/src
    Gerrit-Branch: main
    Gerrit-Change-Id: I16775ac8cdbd508e04025c22740d13f01e591b0d
    Gerrit-Change-Number: 5602134
    Gerrit-PatchSet: 16
    Gerrit-Owner: Mingming1 Xu <mingmi...@intel.com>
    Gerrit-Reviewer: Alex Gough <aj...@chromium.org>
    Gerrit-Reviewer: Mingming1 Xu <mingmi...@intel.com>
    Gerrit-Reviewer: Rafael Cintron <rafael....@microsoft.com>
    Gerrit-Reviewer: Reilly Grant <rei...@chromium.org>
    Gerrit-Reviewer: ningxin hu <ningx...@intel.com>
    Gerrit-CC: Austin Sullivan <asu...@chromium.org>
    Gerrit-CC: Bryan Bernhart <bryan.b...@intel.com>
    Gerrit-CC: Dwayne Robinson <dwa...@microsoft.com>
    Gerrit-CC: Jiewei Qian <q...@chromium.org>
    Gerrit-CC: Kentaro Hara <har...@chromium.org>
    Gerrit-Attention: Alex Gough <aj...@chromium.org>
    Gerrit-Attention: Reilly Grant <rei...@chromium.org>
    Gerrit-Attention: Bryan Bernhart <bryan.b...@intel.com>
    Gerrit-Comment-Date: Wed, 26 Jun 2024 03:04:35 +0000
    satisfied_requirement
    unsatisfied_requirement
    open
    diffy

    ningxin hu (Gerrit)

    unread,
    Jun 26, 2024, 2:13:13 AM (4 days ago) Jun 26
    to Mingming1 Xu, Alex Gough, Rafael Cintron, Chromium LUCI CQ, Dwayne Robinson, Austin Sullivan, Reilly Grant, AyeAye, Bryan Bernhart, Tricium, chromium...@chromium.org, Kentaro Hara, Jiewei Qian, mac-r...@chromium.org, blink-revie...@chromium.org, blink-...@chromium.org, blundell+...@chromium.org, ipc-securi...@chromium.org, jmedle...@chromium.org
    Attention needed from Alex Gough, Bryan Bernhart, Mingming1 Xu and Reilly Grant

    ningxin hu added 14 comments

    Patchset-level comments
    File-level comment, Patchset 16:
    ningxin hu . resolved

    Adapter's `HandleAdapterFailure()` of https://source.chromium.org/chromium/chromium/src/+/main:services/webnn/dml/adapter.cc;bpv=0;bpt=1 may also need to handle context lost.

    File services/webnn/dml/context_impl_dml.h
    Line 43, Patchset 16: void HandleContextLost(std::string_view error_message, HRESULT hr);
    ningxin hu . unresolved

    This method may crash GPU process. It'd be better to make it explicit: i.e. something like `HandleContextLostOrCrash()`.

    It might be worth adding a comment explaining which errors are treated as "context lost".

    File services/webnn/dml/context_impl_dml.cc
    Line 146, Patchset 16: HandleContextLost("Failed to start recording.", hr);
    ningxin hu . unresolved

    `StartRecordingIfNecessary()` may already fail and call `HandleRecordingError()` which calls `HandleContextLost()` before. It may crash at `CHECK` before running the callback and this `HandleContextLost()`.

    Line 156, Patchset 16: HandleRecordingError("Failed to close and execute the command list.", hr);
    ningxin hu . unresolved

    `HandleRecordingError()` calls `HandleContextLost()` that may crash at `CHECK` and won't run the following callback.

    Line 180, Patchset 16: HandleRecordingError("Failed to download the buffer.", hr);
    ningxin hu . unresolved

    ditto.

    Line 191, Patchset 16: if (FAILED(hr)) {
    ningxin hu . unresolved

    Why don't we call `HandleContextLost()` for this failure like others?

    Line 323, Patchset 16:void ContextImplDml::HandleContextLost(std::string_view error_message,
    ningxin hu . unresolved

    This is not an error message passed to user JS code but a log message. I feel it is confusing by passing it just for logging. Could it be moved out from this method? Other helpers, like `RETURN_IF_FAILED` in [error.h](https://source.chromium.org/chromium/chromium/src/+/main:services/webnn/dml/error.h;l=14;drc=ae983e94583c5e2d400fa7641439d2e8ca13fd93;bpv=1;bpt=1), already log the error.

    Line 341, Patchset 16: OnDestroyed(base::StrCat({"WebNN context is lost due to ", message}));
    ningxin hu . unresolved

    Do we want to reject `lost` promise for other "internal errors"? Or we just crash the GPU process by `CHECK` an error is not an "internal error"?

    Line 343, Patchset 16: LOG(ERROR) << "[WebNN] Device Removed Reason: "
    << logging::SystemErrorCodeToString(
    adapter_->d3d12_device()->GetDeviceRemovedReason());
    ningxin hu . unresolved

    Should this log only for device removal error?

    File services/webnn/dml/graph_impl_dml.cc
    Line 4793, Patchset 16: if (hr == E_OUTOFMEMORY) {
    std::move(callback).Run(base::unexpected(CreateError(
    mojom::Error::Code::kUnknownError,
    error_message + " No enough memory resources are available.")));
    ningxin hu . unresolved

    Should we remove this case? Because "out of memory" is now handled by context lost.

    Line 4801, Patchset 16: context->HandleContextLost(error_message, hr);
    ningxin hu . unresolved

    Where does the `context` come from? Did you forget passing it via parameter?

    Line 5848, Patchset 16: std::move(callback).Run(
    base::unexpected(std::move(create_operator_result.error())));
    ningxin hu . unresolved

    Do we handle context lost here? Given the following code handles graph creation failure (it internally handles context lost) if creating an identity node fails.

    Line 5872, Patchset 16: HandleGraphCreationFailure("Failed to create identity operator.",
    std::move(callback));
    ningxin hu . unresolved

    This may need to pass `context`.

    Line 5915, Patchset 16: if (hr == E_OUTOFMEMORY) {
    std::move(callback).Run(ComputeResult::NewError(CreateError(
    mojom::Error::Code::kUnknownError,
    error_message + " No enough memory resources are available.")));
    ningxin hu . unresolved

    ditto, this case might be unnecessary.

    Open in Gerrit

    Related details

    Attention is currently required from:
    • Alex Gough
    • Bryan Bernhart
    • Mingming1 Xu
    • Reilly Grant
    Gerrit-Attention: Mingming1 Xu <mingmi...@intel.com>
    Gerrit-Attention: Bryan Bernhart <bryan.b...@intel.com>
    Gerrit-Comment-Date: Wed, 26 Jun 2024 06:12:59 +0000
    Gerrit-HasComments: Yes
    Gerrit-Has-Labels: No
    satisfied_requirement
    unsatisfied_requirement
    open
    diffy

    ningxin hu (Gerrit)

    unread,
    Jun 26, 2024, 2:13:51 AM (4 days ago) Jun 26
    to Mingming1 Xu, Alex Gough, Rafael Cintron, Chromium LUCI CQ, Dwayne Robinson, Austin Sullivan, Reilly Grant, AyeAye, Bryan Bernhart, Tricium, chromium...@chromium.org, Kentaro Hara, Jiewei Qian, mac-r...@chromium.org, blink-revie...@chromium.org, blink-...@chromium.org, blundell+...@chromium.org, ipc-securi...@chromium.org, jmedle...@chromium.org
    Attention needed from Alex Gough, Bryan Bernhart, Mingming1 Xu and Reilly Grant

    ningxin hu added 1 comment

    Patchset-level comments
    File-level comment, Patchset 16:
    ningxin hu . unresolved

    Adapter's `HandleAdapterFailure()` of https://source.chromium.org/chromium/chromium/src/+/main:services/webnn/dml/adapter.cc;bpv=0;bpt=1 may also need to handle context lost.

    ningxin hu

    Open it for comments.

    Open in Gerrit

    Related details

    Attention is currently required from:
    • Alex Gough
    • Bryan Bernhart
    • Mingming1 Xu
    • Reilly Grant
    Submit Requirements:
    • requirement satisfiedCode-Coverage
    • requirement is not satisfiedCode-Review
    • requirement is not satisfiedNo-Unresolved-Comments
    Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. DiffyGerrit
    Gerrit-MessageType: comment
    Gerrit-Project: chromium/src
    Gerrit-Branch: main
    Gerrit-Change-Id: I16775ac8cdbd508e04025c22740d13f01e591b0d
    Gerrit-Change-Number: 5602134
    Gerrit-PatchSet: 18
    Gerrit-Owner: Mingming1 Xu <mingmi...@intel.com>
    Gerrit-Reviewer: Alex Gough <aj...@chromium.org>
    Gerrit-Reviewer: Mingming1 Xu <mingmi...@intel.com>
    Gerrit-Reviewer: Rafael Cintron <rafael....@microsoft.com>
    Gerrit-Reviewer: Reilly Grant <rei...@chromium.org>
    Gerrit-Reviewer: ningxin hu <ningx...@intel.com>
    Gerrit-CC: Austin Sullivan <asu...@chromium.org>
    Gerrit-CC: Bryan Bernhart <bryan.b...@intel.com>
    Gerrit-CC: Dwayne Robinson <dwa...@microsoft.com>
    Gerrit-CC: Jiewei Qian <q...@chromium.org>
    Gerrit-CC: Kentaro Hara <har...@chromium.org>
    Gerrit-Attention: Alex Gough <aj...@chromium.org>
    Gerrit-Attention: Reilly Grant <rei...@chromium.org>
    Gerrit-Attention: Mingming1 Xu <mingmi...@intel.com>
    Gerrit-Attention: Bryan Bernhart <bryan.b...@intel.com>
    Gerrit-Comment-Date: Wed, 26 Jun 2024 06:13:38 +0000
    Gerrit-HasComments: Yes
    Gerrit-Has-Labels: No
    Comment-In-Reply-To: ningxin hu <ningx...@intel.com>
    satisfied_requirement
    unsatisfied_requirement
    open
    diffy

    Mingming1 Xu (Gerrit)

    unread,
    Jun 26, 2024, 4:59:43 AM (3 days ago) Jun 26
    to Alex Gough, Rafael Cintron, Chromium LUCI CQ, Dwayne Robinson, Austin Sullivan, Reilly Grant, AyeAye, Bryan Bernhart, ningxin hu, Tricium, chromium...@chromium.org, Kentaro Hara, Jiewei Qian, mac-r...@chromium.org, blink-revie...@chromium.org, blink-...@chromium.org, blundell+...@chromium.org, ipc-securi...@chromium.org, jmedle...@chromium.org
    Attention needed from Alex Gough, Bryan Bernhart, Reilly Grant and ningxin hu

    Mingming1 Xu added 12 comments

    Patchset-level comments
    ningxin hu . unresolved

    Adapter's `HandleAdapterFailure()` of https://source.chromium.org/chromium/chromium/src/+/main:services/webnn/dml/adapter.cc;bpv=0;bpt=1 may also need to handle context lost.

    ningxin hu

    Open it for comments.

    Mingming1 Xu

    While the adapter is created before creating a context, if fails, there is no context.

    File services/webnn/dml/context_impl_dml.cc
    Line 146, Patchset 16: HandleContextLost("Failed to start recording.", hr);
    ningxin hu . unresolved

    `StartRecordingIfNecessary()` may already fail and call `HandleRecordingError()` which calls `HandleContextLost()` before. It may crash at `CHECK` before running the callback and this `HandleContextLost()`.

    Mingming1 Xu

    Do you mean we should not call `HandleRecordingError()` which calls`HandleContextLost()` in the `StartRecordingIfNecessary()`? We can call the `HandleRecordingError()` outside if `StartRecordingIfNecessary()` fails.

    Line 156, Patchset 16: HandleRecordingError("Failed to close and execute the command list.", hr);
    ningxin hu . unresolved

    `HandleRecordingError()` calls `HandleContextLost()` that may crash at `CHECK` and won't run the following callback.

    Mingming1 Xu

    Do you mean move the `HandleRecordingError()` below the `std::move(callback).Run...` to let the callback run at first?

    Line 191, Patchset 16: if (FAILED(hr)) {
    ningxin hu . resolved

    Why don't we call `HandleContextLost()` for this failure like others?

    Mingming1 Xu

    Yes, I missed it. Added now.

    Line 323, Patchset 16:void ContextImplDml::HandleContextLost(std::string_view error_message,
    ningxin hu . unresolved

    This is not an error message passed to user JS code but a log message. I feel it is confusing by passing it just for logging. Could it be moved out from this method? Other helpers, like `RETURN_IF_FAILED` in [error.h](https://source.chromium.org/chromium/chromium/src/+/main:services/webnn/dml/error.h;l=14;drc=ae983e94583c5e2d400fa7641439d2e8ca13fd93;bpv=1;bpt=1), already log the error.

    Line 341, Patchset 16: OnDestroyed(base::StrCat({"WebNN context is lost due to ", message}));
    ningxin hu . unresolved

    Do we want to reject `lost` promise for other "internal errors"? Or we just crash the GPU process by `CHECK` an error is not an "internal error"?

    Line 343, Patchset 16: LOG(ERROR) << "[WebNN] Device Removed Reason: "
    << logging::SystemErrorCodeToString(
    adapter_->d3d12_device()->GetDeviceRemovedReason());
    ningxin hu . unresolved

    Should this log only for device removal error?

    Mingming1 Xu
    Agree, maybe we should:
    ` HRESULT device_removed_reason = adapter_->d3d12_device()->GetDeviceRemovedReason();
    if (FAILED(device_removed_reason)) {

    LOG(ERROR) << "[WebNN] Device Removed Reason: "
                   << logging::SystemErrorCodeToString(device_removed_reason);
    }`
    File services/webnn/dml/graph_impl_dml.cc
    Line 4793, Patchset 16: if (hr == E_OUTOFMEMORY) {
    std::move(callback).Run(base::unexpected(CreateError(
    mojom::Error::Code::kUnknownError,
    error_message + " No enough memory resources are available.")));
    ningxin hu . resolved

    Should we remove this case? Because "out of memory" is now handled by context lost.

    Mingming1 Xu

    Done

    Line 4801, Patchset 16: context->HandleContextLost(error_message, hr);
    ningxin hu . resolved

    Where does the `context` come from? Did you forget passing it via parameter?

    Mingming1 Xu

    Done

    Line 5848, Patchset 16: std::move(callback).Run(
    base::unexpected(std::move(create_operator_result.error())));
    ningxin hu . unresolved

    Do we handle context lost here? Given the following code handles graph creation failure (it internally handles context lost) if creating an identity node fails.

    Mingming1 Xu
    `void HandleGraphCreationFailure(
    const std::string& error_message,
    WebNNContextImpl::CreateGraphImplCallback callback)` won't handle context lost, only `void HandleGraphCreationFailure(
    const std::string& error_message,
    HRESULT hr,
    WebNNContextImpl::CreateGraphImplCallback callback, ContextImplDml* context)` will handle context lost, it requires the `HRESULT`.
    Line 5872, Patchset 16: HandleGraphCreationFailure("Failed to create identity operator.",
    std::move(callback));
    ningxin hu . unresolved

    This may need to pass `context`.

    Mingming1 Xu

    There is no `HRESULT` to be handled by `HandleContextLost`.

    Line 5915, Patchset 16: if (hr == E_OUTOFMEMORY) {
    std::move(callback).Run(ComputeResult::NewError(CreateError(
    mojom::Error::Code::kUnknownError,
    error_message + " No enough memory resources are available.")));
    ningxin hu . resolved

    ditto, this case might be unnecessary.

    Mingming1 Xu

    Done

    Open in Gerrit

    Related details

    Attention is currently required from:
    • Alex Gough
    • Bryan Bernhart
    • Reilly Grant
    • ningxin hu
    Submit Requirements:
    • requirement satisfiedCode-Coverage
    • requirement is not satisfiedCode-Review
    • requirement is not satisfiedNo-Unresolved-Comments
    Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. DiffyGerrit
    Gerrit-MessageType: comment
    Gerrit-Project: chromium/src
    Gerrit-Branch: main
    Gerrit-Change-Id: I16775ac8cdbd508e04025c22740d13f01e591b0d
    Gerrit-Change-Number: 5602134
    Gerrit-PatchSet: 19
    Gerrit-Owner: Mingming1 Xu <mingmi...@intel.com>
    Gerrit-Reviewer: Alex Gough <aj...@chromium.org>
    Gerrit-Reviewer: Mingming1 Xu <mingmi...@intel.com>
    Gerrit-Reviewer: Rafael Cintron <rafael....@microsoft.com>
    Gerrit-Reviewer: Reilly Grant <rei...@chromium.org>
    Gerrit-Reviewer: ningxin hu <ningx...@intel.com>
    Gerrit-CC: Austin Sullivan <asu...@chromium.org>
    Gerrit-CC: Bryan Bernhart <bryan.b...@intel.com>
    Gerrit-CC: Dwayne Robinson <dwa...@microsoft.com>
    Gerrit-CC: Jiewei Qian <q...@chromium.org>
    Gerrit-CC: Kentaro Hara <har...@chromium.org>
    Gerrit-Attention: Alex Gough <aj...@chromium.org>
    Gerrit-Attention: ningxin hu <ningx...@intel.com>
    Gerrit-Attention: Reilly Grant <rei...@chromium.org>
    Gerrit-Attention: Bryan Bernhart <bryan.b...@intel.com>
    Gerrit-Comment-Date: Wed, 26 Jun 2024 08:59:27 +0000
    satisfied_requirement
    unsatisfied_requirement
    open
    diffy

    Alex Gough (Gerrit)

    unread,
    Jun 26, 2024, 2:01:28 PM (3 days ago) Jun 26
    to Mingming1 Xu, Alex Gough, Rafael Cintron, Chromium LUCI CQ, Dwayne Robinson, Austin Sullivan, Reilly Grant, AyeAye, Bryan Bernhart, ningxin hu, Tricium, chromium...@chromium.org, Kentaro Hara, Jiewei Qian, mac-r...@chromium.org, blink-revie...@chromium.org, blink-...@chromium.org, blundell+...@chromium.org, ipc-securi...@chromium.org, jmedle...@chromium.org
    Attention needed from Bryan Bernhart, Mingming1 Xu, Reilly Grant and ningxin hu

    Alex Gough added 2 comments

    File services/webnn/public/mojom/webnn_context_provider.mojom
    Line 68, Patchset 19 (Latest): pending_receiver<WebNNContextClient>? context_client_receiver;
    Alex Gough . unresolved

    don't really need _receiver in the name here - but up to you

    File third_party/blink/renderer/modules/ml/ml_context.h
    Line 165, Patchset 19 (Latest): // Closes the `context_remote_` and `context_client_receiver_` pipes
    // because the context has been lost.
    void OnDestroyed(const String& message) override;

    void OnDisconnected();
    Alex Gough . unresolved

    I'm not sure of the naming of these - it's not super-obvious that it's about the remote side of the context being lost at the moment. OnDisconnected should perhaps refer to the context client being disconnected (not the /this/) and OnDestroyed should somehow refer to the other side being gone.

    Open in Gerrit

    Related details

    Attention is currently required from:
    • Bryan Bernhart
    • Mingming1 Xu
    • Reilly Grant
    • ningxin hu
    Gerrit-Attention: ningxin hu <ningx...@intel.com>
    Gerrit-Attention: Reilly Grant <rei...@chromium.org>
    Gerrit-Attention: Mingming1 Xu <mingmi...@intel.com>
    Gerrit-Attention: Bryan Bernhart <bryan.b...@intel.com>
    Gerrit-Comment-Date: Wed, 26 Jun 2024 18:01:07 +0000
    Gerrit-HasComments: Yes
    Gerrit-Has-Labels: No
    satisfied_requirement
    unsatisfied_requirement
    open
    diffy

    Bryan Bernhart (Gerrit)

    unread,
    Jun 26, 2024, 2:12:05 PM (3 days ago) Jun 26
    to Mingming1 Xu, Alex Gough, Rafael Cintron, Chromium LUCI CQ, Dwayne Robinson, Austin Sullivan, Reilly Grant, AyeAye, ningxin hu, Tricium, chromium...@chromium.org, Kentaro Hara, Jiewei Qian, mac-r...@chromium.org, blink-revie...@chromium.org, blink-...@chromium.org, blundell+...@chromium.org, ipc-securi...@chromium.org, jmedle...@chromium.org
    Attention needed from Mingming1 Xu, Rafael Cintron, Reilly Grant and ningxin hu

    Bryan Bernhart added 1 comment

    File services/webnn/dml/context_impl_dml.cc
    Line 329, Patchset 12: HandleWebNNContextLost(hr, this);
    Bryan Bernhart . unresolved

    [1] In certain situations, like command recording, it is not expected to generate a "lost" HRESULT error (hence no TODO), because the error was recoverable.

    The problem here is that HandleWebNNContextLost() really only needs to be called from `CommandRecorder::Execute()`, instead of `CommandRecorder::CloseAndExecute()`, where HandleRecordingError() only needs to LOG.

    Mingming1 Xu

    What do you mean? The `CommandRecorder::CloseAndExecute()` will also call `CommandRecorder::Execute()`. And there is no any bad effect if we call the `HandleWebNNContextLost()` here, right? Just like the `GraphImplDml::HandleDispatchFailure`, `GraphImplDml::HandleComputationFailure` and `HandleGraphCreationFailure`, I handle the failure HRESULT everywhere.

    Bryan Bernhart

    And there is no any bad effect if we call the HandleWebNNContextLost() here, right?

    If we call HandleWebNNContextLost(), the entire context becomes inoperable, even when the HRESULT may not indicate a 'lost' error. Before this CL, any recording HRESULT error was fully handled by re-creating the command recorder.

    Either HandleWebNNContextLost() needs to avoid generating a lost on non-fatal HRESULT errors that are recoverable OR we must call HandleWebNNContextLost() ONLY where a lost HRESULT is expected (ex. `CommandRecorder::Execute`).

    Rafael Cintron

    @bryan.b...@intel.com, can you give me an example of a non-fatal HRESULT error that is recoverable by re-creating the command recorder?

    Bryan Bernhart

    @rafael....@microsoft.com any HRESULT that doesn't explicitly say the device will be put into a removal state could be recoverable. If the web developer builds N graphs and the N+1 graph build causes `E_INVALIDARG` or `DXGI_ERROR_INVALID_CALL` and so on then the entire app/workload will go down and the web developer won't know what graph caused it.

    Rafael Cintron

    My opinion, as I've stated in other comment threads in this CL, is that `E_INVALIDARG` and `DXGI_ERROR_INVALID_CALL` are not errors the browser should try to gracefully recover from. We should `CHECK` those and fail fast the GPU process.

    Bryan Bernhart

    @rafael....@microsoft.com D3D12 does not guarantee where a certain error comes from: CreateCommandAllocator() could generate `E_OUTOFMEMORY`. Is that not recoverable? The problem with a "no error or lost" approach is it assumes every error is now fatal. But if you still feel strongly, I will close as "by design".

    Open in Gerrit

    Related details

    Attention is currently required from:
    • Mingming1 Xu
    • Rafael Cintron
    • Reilly Grant
    • ningxin hu
    Gerrit-Attention: Rafael Cintron <rafael....@microsoft.com>
    Gerrit-Comment-Date: Wed, 26 Jun 2024 18:11:52 +0000
    Gerrit-HasComments: Yes
    Gerrit-Has-Labels: No
    Comment-In-Reply-To: Mingming1 Xu <mingmi...@intel.com>
    satisfied_requirement
    unsatisfied_requirement
    open
    diffy

    Rafael Cintron (Gerrit)

    unread,
    Jun 26, 2024, 6:23:26 PM (3 days ago) Jun 26
    to Mingming1 Xu, Alex Gough, Chromium LUCI CQ, Dwayne Robinson, Austin Sullivan, Reilly Grant, AyeAye, Bryan Bernhart, ningxin hu, Tricium, chromium...@chromium.org, Kentaro Hara, Jiewei Qian, mac-r...@chromium.org, blink-revie...@chromium.org, blink-...@chromium.org, blundell+...@chromium.org, ipc-securi...@chromium.org, jmedle...@chromium.org
    Attention needed from Bryan Bernhart, Mingming1 Xu, Reilly Grant and ningxin hu

    Rafael Cintron voted and added 4 comments

    Votes added by Rafael Cintron

    Code-Review+1

    4 comments

    File services/webnn/dml/context_impl_dml.cc
    Line 323, Patchset 16:void ContextImplDml::HandleContextLost(std::string_view error_message,
    ningxin hu . unresolved

    This is not an error message passed to user JS code but a log message. I feel it is confusing by passing it just for logging. Could it be moved out from this method? Other helpers, like `RETURN_IF_FAILED` in [error.h](https://source.chromium.org/chromium/chromium/src/+/main:services/webnn/dml/error.h;l=14;drc=ae983e94583c5e2d400fa7641439d2e8ca13fd93;bpv=1;bpt=1), already log the error.

    Mingming1 Xu

    This change is to address @rafael....@microsoft.com 's comments: https://chromium-review.googlesource.com/c/chromium/src/+/5602134/comment/f38abdf6_7c0fbbcd/

    Rafael Cintron

    I think Ningxin is getting confused by the two, very similarly named strings: one called `error_message` and the other one just called `message`.

    The first one is meant to go into logs and provide some context on where the error occurred. The second one is meant to be a simpler error we provide to web developers that avoids revealing too much information about the context of the error to attackers.

    Perhaps we can better name these to disambiguate; rename `message` to `message_for_promise` or something like that.

    Line 329, Patchset 12: HandleWebNNContextLost(hr, this);
    Bryan Bernhart . unresolved

    [1] In certain situations, like command recording, it is not expected to generate a "lost" HRESULT error (hence no TODO), because the error was recoverable.

    The problem here is that HandleWebNNContextLost() really only needs to be called from `CommandRecorder::Execute()`, instead of `CommandRecorder::CloseAndExecute()`, where HandleRecordingError() only needs to LOG.

    Mingming1 Xu

    What do you mean? The `CommandRecorder::CloseAndExecute()` will also call `CommandRecorder::Execute()`. And there is no any bad effect if we call the `HandleWebNNContextLost()` here, right? Just like the `GraphImplDml::HandleDispatchFailure`, `GraphImplDml::HandleComputationFailure` and `HandleGraphCreationFailure`, I handle the failure HRESULT everywhere.

    Bryan Bernhart

    And there is no any bad effect if we call the HandleWebNNContextLost() here, right?

    If we call HandleWebNNContextLost(), the entire context becomes inoperable, even when the HRESULT may not indicate a 'lost' error. Before this CL, any recording HRESULT error was fully handled by re-creating the command recorder.

    Either HandleWebNNContextLost() needs to avoid generating a lost on non-fatal HRESULT errors that are recoverable OR we must call HandleWebNNContextLost() ONLY where a lost HRESULT is expected (ex. `CommandRecorder::Execute`).

    Rafael Cintron

    @bryan.b...@intel.com, can you give me an example of a non-fatal HRESULT error that is recoverable by re-creating the command recorder?

    Bryan Bernhart

    @rafael....@microsoft.com any HRESULT that doesn't explicitly say the device will be put into a removal state could be recoverable. If the web developer builds N graphs and the N+1 graph build causes `E_INVALIDARG` or `DXGI_ERROR_INVALID_CALL` and so on then the entire app/workload will go down and the web developer won't know what graph caused it.

    Rafael Cintron

    My opinion, as I've stated in other comment threads in this CL, is that `E_INVALIDARG` and `DXGI_ERROR_INVALID_CALL` are not errors the browser should try to gracefully recover from. We should `CHECK` those and fail fast the GPU process.

    Bryan Bernhart

    @rafael....@microsoft.com D3D12 does not guarantee where a certain error comes from: CreateCommandAllocator() could generate `E_OUTOFMEMORY`. Is that not recoverable? The problem with a "no error or lost" approach is it assumes every error is now fatal. But if you still feel strongly, I will close as "by design".

    Rafael Cintron

    D3D12 does not guarantee where a certain error comes from: CreateCommandAllocator() could generate E_OUTOFMEMORY. Is that not recoverable?

    When Chromium runs in the sandbox and the job limit is reached, a message will be sent to the browser process notifying it. The browser process will turnaround and kill the GPU process. So, while it is technically recoverable, the GPU process is doomed. :-) FWIW, WebGPU also makes E_OUTOFMEMORY turn into WebGPU context lost.

    The problem with a "no error or lost" approach is it assumes every error is now fatal. But if you still feel strongly, I will close as "by design".

    Device removed, device reset and similar HRESULTs should cause a graceful termination of the GPU process. This CL handles them gracefully but doesn't terminate the GPU process yet. We'll need to hook that up in a followon change similar to existing code in the GPU process.

    In my opinion, HRESULTS like E_INVALIDARG should result in a CHECK. This is a bug in the DML backend that we should treat similar to CHECK failures that we're counting on the validation layer have dealt before arriving at DML code.

    Line 343, Patchset 16: LOG(ERROR) << "[WebNN] Device Removed Reason: "
    << logging::SystemErrorCodeToString(
    adapter_->d3d12_device()->GetDeviceRemovedReason());
    ningxin hu . unresolved

    Should this log only for device removal error?

    Mingming1 Xu
    Agree, maybe we should:
    ` HRESULT device_removed_reason = adapter_->d3d12_device()->GetDeviceRemovedReason();
    if (FAILED(device_removed_reason)) {
    LOG(ERROR) << "[WebNN] Device Removed Reason: "
    << logging::SystemErrorCodeToString(device_removed_reason);
    }`
    Rafael Cintron

    +1

    File services/webnn/dml/graph_impl_dml.cc
    Line 5848, Patchset 16: std::move(callback).Run(
    base::unexpected(std::move(create_operator_result.error())));
    ningxin hu . unresolved

    Do we handle context lost here? Given the following code handles graph creation failure (it internally handles context lost) if creating an identity node fails.

    Mingming1 Xu
    `void HandleGraphCreationFailure(
    const std::string& error_message,
    WebNNContextImpl::CreateGraphImplCallback callback)` won't handle context lost, only `void HandleGraphCreationFailure(
    const std::string& error_message,
    HRESULT hr,
    WebNNContextImpl::CreateGraphImplCallback callback, ContextImplDml* context)` will handle context lost, it requires the `HRESULT`.
    Rafael Cintron

    @ningx...@intel.com brings up a good point that this particular error path does not result in context lost.

    We originally had this error code path be graceful because we didn't want to crash the GPU process for unimplemented operators in the DML backend. Now that we have an implementation for every op, might be good to revisit this sooner rather than later.

    OK to solve this as a separate change.

    Open in Gerrit

    Related details

    Attention is currently required from:
    • Bryan Bernhart
    • Mingming1 Xu
    • Reilly Grant
    • ningxin hu
    Gerrit-Attention: Bryan Bernhart <bryan.b...@intel.com>
    Gerrit-Comment-Date: Wed, 26 Jun 2024 22:23:11 +0000
    Gerrit-HasComments: Yes
    Gerrit-Has-Labels: Yes
    Comment-In-Reply-To: ningxin hu <ningx...@intel.com>
    satisfied_requirement
    unsatisfied_requirement
    open
    diffy

    Bryan Bernhart (Gerrit)

    unread,
    Jun 26, 2024, 7:20:14 PM (3 days ago) Jun 26
    to Mingming1 Xu, Rafael Cintron, Alex Gough, Rafael Cintron, Chromium LUCI CQ, Dwayne Robinson, Austin Sullivan, Reilly Grant, AyeAye, ningxin hu, Tricium, chromium...@chromium.org, Kentaro Hara, Jiewei Qian, mac-r...@chromium.org, blink-revie...@chromium.org, blink-...@chromium.org, blundell+...@chromium.org, ipc-securi...@chromium.org, jmedle...@chromium.org
    Attention needed from Mingming1 Xu, Rafael Cintron, Rafael Cintron, Reilly Grant and ningxin hu

    Bryan Bernhart added 1 comment

    File services/webnn/dml/context_impl_dml.cc
    Line 329, Patchset 12: HandleWebNNContextLost(hr, this);
    Bryan Bernhart . unresolved

    [1] In certain situations, like command recording, it is not expected to generate a "lost" HRESULT error (hence no TODO), because the error was recoverable.

    The problem here is that HandleWebNNContextLost() really only needs to be called from `CommandRecorder::Execute()`, instead of `CommandRecorder::CloseAndExecute()`, where HandleRecordingError() only needs to LOG.

    Mingming1 Xu

    What do you mean? The `CommandRecorder::CloseAndExecute()` will also call `CommandRecorder::Execute()`. And there is no any bad effect if we call the `HandleWebNNContextLost()` here, right? Just like the `GraphImplDml::HandleDispatchFailure`, `GraphImplDml::HandleComputationFailure` and `HandleGraphCreationFailure`, I handle the failure HRESULT everywhere.

    Bryan Bernhart

    And there is no any bad effect if we call the HandleWebNNContextLost() here, right?

    If we call HandleWebNNContextLost(), the entire context becomes inoperable, even when the HRESULT may not indicate a 'lost' error. Before this CL, any recording HRESULT error was fully handled by re-creating the command recorder.

    Either HandleWebNNContextLost() needs to avoid generating a lost on non-fatal HRESULT errors that are recoverable OR we must call HandleWebNNContextLost() ONLY where a lost HRESULT is expected (ex. `CommandRecorder::Execute`).

    Rafael Cintron

    @bryan.b...@intel.com, can you give me an example of a non-fatal HRESULT error that is recoverable by re-creating the command recorder?

    Bryan Bernhart

    @rafael....@microsoft.com any HRESULT that doesn't explicitly say the device will be put into a removal state could be recoverable. If the web developer builds N graphs and the N+1 graph build causes `E_INVALIDARG` or `DXGI_ERROR_INVALID_CALL` and so on then the entire app/workload will go down and the web developer won't know what graph caused it.

    Rafael Cintron

    My opinion, as I've stated in other comment threads in this CL, is that `E_INVALIDARG` and `DXGI_ERROR_INVALID_CALL` are not errors the browser should try to gracefully recover from. We should `CHECK` those and fail fast the GPU process.

    Bryan Bernhart

    @rafael....@microsoft.com D3D12 does not guarantee where a certain error comes from: CreateCommandAllocator() could generate `E_OUTOFMEMORY`. Is that not recoverable? The problem with a "no error or lost" approach is it assumes every error is now fatal. But if you still feel strongly, I will close as "by design".

    Rafael Cintron

    D3D12 does not guarantee where a certain error comes from: CreateCommandAllocator() could generate E_OUTOFMEMORY. Is that not recoverable?

    When Chromium runs in the sandbox and the job limit is reached, a message will be sent to the browser process notifying it. The browser process will turnaround and kill the GPU process. So, while it is technically recoverable, the GPU process is doomed. :-) FWIW, WebGPU also makes E_OUTOFMEMORY turn into WebGPU context lost.

    The problem with a "no error or lost" approach is it assumes every error is now fatal. But if you still feel strongly, I will close as "by design".

    Device removed, device reset and similar HRESULTs should cause a graceful termination of the GPU process. This CL handles them gracefully but doesn't terminate the GPU process yet. We'll need to hook that up in a followon change similar to existing code in the GPU process.

    In my opinion, HRESULTS like E_INVALIDARG should result in a CHECK. This is a bug in the DML backend that we should treat similar to CHECK failures that we're counting on the validation layer have dealt before arriving at DML code.

    Bryan Bernhart

    Okay, @rafael....@chromium.org I see where you're coming from.

    We'll need to hook that up in a followon change similar to existing code in the GPU process.

    I agree with you we'll need to fix this at some point. But once we do, E_OUTOFMEMORY will be recoverable and this code shouldn't generate a context lost (due to the reset). But I'm fine with a TODO there if the browser process won't (currently) allow it.

    Open in Gerrit

    Related details

    Attention is currently required from:
    • Mingming1 Xu
    • Rafael Cintron
    • Rafael Cintron
    • Reilly Grant
    • ningxin hu
    Gerrit-CC: Rafael Cintron <rafael....@chromium.org>
    Gerrit-Attention: ningxin hu <ningx...@intel.com>
    Gerrit-Attention: Reilly Grant <rei...@chromium.org>
    Gerrit-Attention: Mingming1 Xu <mingmi...@intel.com>
    Gerrit-Attention: Rafael Cintron <rafael....@chromium.org>
    Gerrit-Attention: Rafael Cintron <rafael....@microsoft.com>
    Gerrit-Comment-Date: Wed, 26 Jun 2024 23:20:02 +0000
    Gerrit-HasComments: Yes
    Gerrit-Has-Labels: No
    satisfied_requirement
    unsatisfied_requirement
    open
    diffy

    Mingming1 Xu (Gerrit)

    unread,
    Jun 26, 2024, 11:01:48 PM (3 days ago) Jun 26
    to Rafael Cintron, Alex Gough, Rafael Cintron, Chromium LUCI CQ, Dwayne Robinson, Austin Sullivan, Reilly Grant, AyeAye, Bryan Bernhart, ningxin hu, Tricium, chromium...@chromium.org, Kentaro Hara, Jiewei Qian, mac-r...@chromium.org, blink-revie...@chromium.org, blink-...@chromium.org, blundell+...@chromium.org, ipc-securi...@chromium.org, jmedle...@chromium.org
    Attention needed from Alex Gough, Bryan Bernhart, Rafael Cintron, Rafael Cintron, Reilly Grant and ningxin hu

    Mingming1 Xu added 4 comments

    File services/webnn/dml/context_impl_dml.cc
    Mingming1 Xu

    We'll need to hook that up in a followon change similar to existing code in the GPU process.

    I am not very clear what's the followon change. Where is the existing code, can you paste link here to help me understand? Thanks!

    Line 343, Patchset 16: LOG(ERROR) << "[WebNN] Device Removed Reason: "
    << logging::SystemErrorCodeToString(
    adapter_->d3d12_device()->GetDeviceRemovedReason());
    ningxin hu . resolved

    Should this log only for device removal error?

    Mingming1 Xu
    Agree, maybe we should:
    ` HRESULT device_removed_reason = adapter_->d3d12_device()->GetDeviceRemovedReason();
    if (FAILED(device_removed_reason)) {
    LOG(ERROR) << "[WebNN] Device Removed Reason: "
    << logging::SystemErrorCodeToString(device_removed_reason);
    }`
    Rafael Cintron

    +1

    Mingming1 Xu

    Done

    File services/webnn/public/mojom/webnn_context_provider.mojom
    Line 68, Patchset 19 (Latest): pending_receiver<WebNNContextClient>? context_client_receiver;
    Alex Gough . unresolved

    don't really need _receiver in the name here - but up to you

    Mingming1 Xu

    I just want to align with the `context_remote`. Maybe the suffix will make codes easier to read. Agree that it's not necessary.

    File third_party/blink/renderer/modules/ml/ml_context.h
    Line 165, Patchset 19 (Latest): // Closes the `context_remote_` and `context_client_receiver_` pipes
    // because the context has been lost.
    void OnDestroyed(const String& message) override;

    void OnDisconnected();
    Alex Gough . unresolved

    I'm not sure of the naming of these - it's not super-obvious that it's about the remote side of the context being lost at the moment. OnDisconnected should perhaps refer to the context client being disconnected (not the /this/) and OnDestroyed should somehow refer to the other side being gone.

    Mingming1 Xu

    `WebNNContextClient` remote in GPU process calls the `OnDestroyed` method explicitly when some errors are captured in GPU process, it means the derived `MLContext` in the renderer process will be destroyed (lost).

    The `OnDisconnected` method is called when the GPU process is crashed which will cause `WebNNContextClient` receiver disconnected. `OnDisconnected` calls `OnDestroyed` inside to destroy the `MLContext` since the mojom pipe is disconnected.

    The name `OnDestroyed` is also to match WebGPU and better express that the context is unusable, from comments: this.https://chromium-review.googlesource.com/c/chromium/src/+/5602134/comment/60724d4a_a12552ca/. In the future, we may plan to add another corresponding interface `Destroy()` to destroy the context from the renderer process, see discussion here: https://chromium-review.googlesource.com/c/chromium/src/+/5602134/comment/d1632baa_7bdad5fb/

    Open in Gerrit

    Related details

    Attention is currently required from:
    • Alex Gough
    • Bryan Bernhart
    Gerrit-Attention: Alex Gough <aj...@chromium.org>
    Gerrit-Attention: ningxin hu <ningx...@intel.com>
    Gerrit-Attention: Reilly Grant <rei...@chromium.org>
    Gerrit-Attention: Bryan Bernhart <bryan.b...@intel.com>
    Gerrit-Attention: Rafael Cintron <rafael....@chromium.org>
    Gerrit-Attention: Rafael Cintron <rafael....@microsoft.com>
    Gerrit-Comment-Date: Thu, 27 Jun 2024 03:01:35 +0000
    Gerrit-HasComments: Yes
    Gerrit-Has-Labels: No
    Comment-In-Reply-To: Alex Gough <aj...@chromium.org>
    Comment-In-Reply-To: ningxin hu <ningx...@intel.com>
    satisfied_requirement
    unsatisfied_requirement
    open
    diffy

    ningxin hu (Gerrit)

    unread,
    Jun 27, 2024, 2:33:06 AM (3 days ago) Jun 27
    to Mingming1 Xu, Rafael Cintron, Alex Gough, Rafael Cintron, Chromium LUCI CQ, Dwayne Robinson, Austin Sullivan, Reilly Grant, AyeAye, Bryan Bernhart, Tricium, chromium...@chromium.org, Kentaro Hara, Jiewei Qian, mac-r...@chromium.org, blink-revie...@chromium.org, blink-...@chromium.org, blundell+...@chromium.org, ipc-securi...@chromium.org, jmedle...@chromium.org
    Attention needed from Alex Gough, Bryan Bernhart, Mingming1 Xu, Rafael Cintron, Rafael Cintron and Reilly Grant

    ningxin hu added 8 comments

    Patchset-level comments
    ningxin hu . unresolved

    Adapter's `HandleAdapterFailure()` of https://source.chromium.org/chromium/chromium/src/+/main:services/webnn/dml/adapter.cc;bpv=0;bpt=1 may also need to handle context lost.

    ningxin hu

    Open it for comments.

    Mingming1 Xu

    While the adapter is created before creating a context, if fails, there is no context.

    ningxin hu

    You are correct, there is no context to lose. The 'lost' error could be reported by rejecting promise returned by `createContext()`. However, my open is should we be consistent and crash GPU process for non 'lost' errors. OK to add a TODO and follow up in a separate CL.

    File services/webnn/dml/context_impl_dml.cc
    Line 146, Patchset 16: HandleContextLost("Failed to start recording.", hr);
    ningxin hu . unresolved

    `StartRecordingIfNecessary()` may already fail and call `HandleRecordingError()` which calls `HandleContextLost()` before. It may crash at `CHECK` before running the callback and this `HandleContextLost()`.

    Mingming1 Xu

    Do you mean we should not call `HandleRecordingError()` which calls`HandleContextLost()` in the `StartRecordingIfNecessary()`? We can call the `HandleRecordingError()` outside if `StartRecordingIfNecessary()` fails.

    ningxin hu

    We can call the HandleRecordingError() outside if StartRecordingIfNecessary() fails.

    This might work. My point is we should ensure the behavior is consistent when error happens. The user code should get promise rejection from `readBuffer()` method.

    Line 156, Patchset 16: HandleRecordingError("Failed to close and execute the command list.", hr);
    ningxin hu . unresolved

    `HandleRecordingError()` calls `HandleContextLost()` that may crash at `CHECK` and won't run the following callback.

    Mingming1 Xu

    Do you mean move the `HandleRecordingError()` below the `std::move(callback).Run...` to let the callback run at first?

    ningxin hu

    Yes, I suppose the user code should get promise rejection from `readBuffer()` method.

    Line 323, Patchset 16:void ContextImplDml::HandleContextLost(std::string_view error_message,
    ningxin hu . unresolved

    This is not an error message passed to user JS code but a log message. I feel it is confusing by passing it just for logging. Could it be moved out from this method? Other helpers, like `RETURN_IF_FAILED` in [error.h](https://source.chromium.org/chromium/chromium/src/+/main:services/webnn/dml/error.h;l=14;drc=ae983e94583c5e2d400fa7641439d2e8ca13fd93;bpv=1;bpt=1), already log the error.

    Mingming1 Xu

    This change is to address @rafael....@microsoft.com 's comments: https://chromium-review.googlesource.com/c/chromium/src/+/5602134/comment/f38abdf6_7c0fbbcd/

    Rafael Cintron

    I think Ningxin is getting confused by the two, very similarly named strings: one called `error_message` and the other one just called `message`.

    The first one is meant to go into logs and provide some context on where the error occurred. The second one is meant to be a simpler error we provide to web developers that avoids revealing too much information about the context of the error to attackers.

    Perhaps we can better name these to disambiguate; rename `message` to `message_for_promise` or something like that.

    ningxin hu

    Perhaps we can better name these to disambiguate; rename message to message_for_promise or something like that.

    This helps. Or we can rename `error_message` to `message_for_log`.

    Line 341, Patchset 16: OnDestroyed(base::StrCat({"WebNN context is lost due to ", message}));
    ningxin hu . unresolved

    Do we want to reject `lost` promise for other "internal errors"? Or we just crash the GPU process by `CHECK` an error is not an "internal error"?

    3- <Everything else>
    For any other error, we should treat it as Chromium passing invalid parameters to the API and do a CHECK so that we can get crash dumps and diagnose.

    We should LOG(ERROR) any failure we receive from DirectX along with the logging::SystemErrorCodeToString friendly string, even ones we CHECK for. This will become critical to root causing failures in the wild from web developers. Pooja has already converted these but we should convert others that might have been missed.

    I didn't find the decision that we should report "everything else" errors to JS code by rejecting `lost` promise. Did I miss anything?

    File services/webnn/dml/graph_impl_dml.cc
    Line 5848, Patchset 16: std::move(callback).Run(
    base::unexpected(std::move(create_operator_result.error())));
    ningxin hu . unresolved

    Do we handle context lost here? Given the following code handles graph creation failure (it internally handles context lost) if creating an identity node fails.

    Mingming1 Xu
    `void HandleGraphCreationFailure(
    const std::string& error_message,
    WebNNContextImpl::CreateGraphImplCallback callback)` won't handle context lost, only `void HandleGraphCreationFailure(
    const std::string& error_message,
    HRESULT hr,
    WebNNContextImpl::CreateGraphImplCallback callback, ContextImplDml* context)` will handle context lost, it requires the `HRESULT`.
    Rafael Cintron

    @ningx...@intel.com brings up a good point that this particular error path does not result in context lost.

    We originally had this error code path be graceful because we didn't want to crash the GPU process for unimplemented operators in the DML backend. Now that we have an implementation for every op, might be good to revisit this sooner rather than later.

    OK to solve this as a separate change.

    ningxin hu

    OK to solve this as a separate change.

    +1. Worth adding a TODO.

    Line 5872, Patchset 16: HandleGraphCreationFailure("Failed to create identity operator.",
    std::move(callback));
    ningxin hu . unresolved

    This may need to pass `context`.

    Mingming1 Xu

    There is no `HRESULT` to be handled by `HandleContextLost`.

    ningxin hu

    Same as above, we may need to crash GPU process for operator creation failures. Please add a TODO.

    File third_party/blink/renderer/modules/ml/ml_context.h
    Line 165, Patchset 19 (Latest): // Closes the `context_remote_` and `context_client_receiver_` pipes
    // because the context has been lost.
    void OnDestroyed(const String& message) override;

    void OnDisconnected();
    Alex Gough . unresolved

    I'm not sure of the naming of these - it's not super-obvious that it's about the remote side of the context being lost at the moment. OnDisconnected should perhaps refer to the context client being disconnected (not the /this/) and OnDestroyed should somehow refer to the other side being gone.

    Mingming1 Xu

    `WebNNContextClient` remote in GPU process calls the `OnDestroyed` method explicitly when some errors are captured in GPU process, it means the derived `MLContext` in the renderer process will be destroyed (lost).

    The `OnDisconnected` method is called when the GPU process is crashed which will cause `WebNNContextClient` receiver disconnected. `OnDisconnected` calls `OnDestroyed` inside to destroy the `MLContext` since the mojom pipe is disconnected.

    The name `OnDestroyed` is also to match WebGPU and better express that the context is unusable, from comments: this.https://chromium-review.googlesource.com/c/chromium/src/+/5602134/comment/60724d4a_a12552ca/. In the future, we may plan to add another corresponding interface `Destroy()` to destroy the context from the renderer process, see discussion here: https://chromium-review.googlesource.com/c/chromium/src/+/5602134/comment/d1632baa_7bdad5fb/

    ningxin hu

    OnDestroyed should somehow refer to the other side being gone

    Would `OnContextLost()` or `OnLost()` be more straightforward? Because service-side calls `HandleContextLost()` to send this message and blink-side would resolve the `lost` promise upon receiving this message. "OnDestroyed" sounds like something else.

    Open in Gerrit

    Related details

    Attention is currently required from:
    • Alex Gough
    • Bryan Bernhart
    • Mingming1 Xu
    • Rafael Cintron
    • Rafael Cintron
    • Reilly Grant
    Gerrit-Attention: Reilly Grant <rei...@chromium.org>
    Gerrit-Attention: Mingming1 Xu <mingmi...@intel.com>
    Gerrit-Attention: Bryan Bernhart <bryan.b...@intel.com>
    Gerrit-Attention: Rafael Cintron <rafael....@chromium.org>
    Gerrit-Attention: Rafael Cintron <rafael....@microsoft.com>
    Gerrit-Comment-Date: Thu, 27 Jun 2024 06:32:53 +0000
    Gerrit-HasComments: Yes
    Gerrit-Has-Labels: No
    Comment-In-Reply-To: Alex Gough <aj...@chromium.org>
    Comment-In-Reply-To: ningxin hu <ningx...@intel.com>
    Comment-In-Reply-To: Mingming1 Xu <mingmi...@intel.com>
    Comment-In-Reply-To: Rafael Cintron <rafael....@microsoft.com>
    satisfied_requirement
    unsatisfied_requirement
    open
    diffy

    ningxin hu (Gerrit)

    unread,
    Jun 27, 2024, 4:54:53 AM (2 days ago) Jun 27
    to Mingming1 Xu, Rafael Cintron, Alex Gough, Rafael Cintron, Chromium LUCI CQ, Dwayne Robinson, Austin Sullivan, Reilly Grant, AyeAye, Bryan Bernhart, Tricium, chromium...@chromium.org, Kentaro Hara, Jiewei Qian, mac-r...@chromium.org, blink-revie...@chromium.org, blink-...@chromium.org, blundell+...@chromium.org, ipc-securi...@chromium.org, jmedle...@chromium.org
    Attention needed from Alex Gough, Bryan Bernhart, Mingming1 Xu, Rafael Cintron, Rafael Cintron and Reilly Grant

    ningxin hu added 1 comment

    File services/webnn/dml/context_impl_dml.cc
    Line 341, Patchset 16: OnDestroyed(base::StrCat({"WebNN context is lost due to ", message}));
    ningxin hu . resolved

    Do we want to reject `lost` promise for other "internal errors"? Or we just crash the GPU process by `CHECK` an error is not an "internal error"?

    Mingming1 Xu

    The discussion is here: https://chromium-review.googlesource.com/c/chromium/src/+/5602134/comment/ad47c8af_d3f27f05/ and https://chromium-review.googlesource.com/c/chromium/src/+/5602134/comment/7aa27202_25387285/

    ningxin hu

    I only read @rafael....@microsoft.com's [comment](https://chromium-review.googlesource.com/c/chromium/src/+/5602134/comment/ad47c8af_d3f27f05/)

    3- <Everything else>
    For any other error, we should treat it as Chromium passing invalid parameters to the API and do a CHECK so that we can get crash dumps and diagnose.

    We should LOG(ERROR) any failure we receive from DirectX along with the logging::SystemErrorCodeToString friendly string, even ones we CHECK for. This will become critical to root causing failures in the wild from web developers. Pooja has already converted these but we should convert others that might have been missed.

    I didn't find the decision that we should report "everything else" errors to JS code by rejecting `lost` promise. Did I miss anything?

    ningxin hu

    Mingming helped me find another [comment](https://chromium-review.googlesource.com/c/chromium/src/+/5602134/comment/7aa27202_25387285/) that I missed

    Currently I handled E_OUTOFMEMORY, DXGI_ERROR_DEVICE_REMOVED, DXGI_ERROR_DEVICE_RESET three ones to call OnDestroyed, otherwise I directly return and won't call OnDestroyed.

    By the time we get here, we know we have a failure HRESULT, correct? If so, we should always call OnDestroyed so that the web developer can realize that no more WebNN will happen with the context. The difference is whether the error warrants a CHECK or graceful termination of the GPU process.

    I am fine with notifying user JS code with "internal error" via "lost" promise because GPU process is going to crash.

    Gerrit-Comment-Date: Thu, 27 Jun 2024 08:54:39 +0000
    Gerrit-HasComments: Yes
    Gerrit-Has-Labels: No
    satisfied_requirement
    unsatisfied_requirement
    open
    diffy

    Mingming1 Xu (Gerrit)

    unread,
    Jun 27, 2024, 5:08:31 AM (2 days ago) Jun 27
    to Rafael Cintron, Alex Gough, Rafael Cintron, Chromium LUCI CQ, Dwayne Robinson, Austin Sullivan, Reilly Grant, AyeAye, Bryan Bernhart, ningxin hu, Tricium, chromium...@chromium.org, Kentaro Hara, Jiewei Qian, mac-r...@chromium.org, blink-revie...@chromium.org, blink-...@chromium.org, blundell+...@chromium.org, ipc-securi...@chromium.org, jmedle...@chromium.org
    Attention needed from Alex Gough, Bryan Bernhart, Rafael Cintron, Rafael Cintron, Reilly Grant and ningxin hu

    Mingming1 Xu added 12 comments

    Patchset-level comments
    ningxin hu . unresolved

    Adapter's `HandleAdapterFailure()` of https://source.chromium.org/chromium/chromium/src/+/main:services/webnn/dml/adapter.cc;bpv=0;bpt=1 may also need to handle context lost.

    ningxin hu

    Open it for comments.

    Mingming1 Xu

    While the adapter is created before creating a context, if fails, there is no context.

    ningxin hu

    You are correct, there is no context to lose. The 'lost' error could be reported by rejecting promise returned by `createContext()`. However, my open is should we be consistent and crash GPU process for non 'lost' errors. OK to add a TODO and follow up in a separate CL.

    File services/webnn/dml/context_impl_dml.h
    Line 43, Patchset 16: void HandleContextLost(std::string_view error_message, HRESULT hr);
    ningxin hu . unresolved

    This method may crash GPU process. It'd be better to make it explicit: i.e. something like `HandleContextLostOrCrash()`.

    It might be worth adding a comment explaining which errors are treated as "context lost".

    Mingming1 Xu

    Updated, ptal.

    File services/webnn/dml/context_impl_dml.cc
    Line 146, Patchset 16: HandleContextLost("Failed to start recording.", hr);
    ningxin hu . unresolved

    `StartRecordingIfNecessary()` may already fail and call `HandleRecordingError()` which calls `HandleContextLost()` before. It may crash at `CHECK` before running the callback and this `HandleContextLost()`.

    Mingming1 Xu

    Do you mean we should not call `HandleRecordingError()` which calls`HandleContextLost()` in the `StartRecordingIfNecessary()`? We can call the `HandleRecordingError()` outside if `StartRecordingIfNecessary()` fails.

    ningxin hu

    We can call the HandleRecordingError() outside if StartRecordingIfNecessary() fails.

    This might work. My point is we should ensure the behavior is consistent when error happens. The user code should get promise rejection from `readBuffer()` method.

    Mingming1 Xu

    Updated. Also cc @bryan.b...@intel.com to check the change here.

    Line 156, Patchset 16: HandleRecordingError("Failed to close and execute the command list.", hr);
    ningxin hu . resolved

    `HandleRecordingError()` calls `HandleContextLost()` that may crash at `CHECK` and won't run the following callback.

    Mingming1 Xu

    Do you mean move the `HandleRecordingError()` below the `std::move(callback).Run...` to let the callback run at first?

    ningxin hu

    Yes, I suppose the user code should get promise rejection from `readBuffer()` method.

    Mingming1 Xu

    Done

    Line 180, Patchset 16: HandleRecordingError("Failed to download the buffer.", hr);
    ningxin hu . resolved

    ditto.

    Mingming1 Xu

    Done

    Line 323, Patchset 16:void ContextImplDml::HandleContextLost(std::string_view error_message,
    ningxin hu . resolved

    This is not an error message passed to user JS code but a log message. I feel it is confusing by passing it just for logging. Could it be moved out from this method? Other helpers, like `RETURN_IF_FAILED` in [error.h](https://source.chromium.org/chromium/chromium/src/+/main:services/webnn/dml/error.h;l=14;drc=ae983e94583c5e2d400fa7641439d2e8ca13fd93;bpv=1;bpt=1), already log the error.

    Mingming1 Xu

    This change is to address @rafael....@microsoft.com 's comments: https://chromium-review.googlesource.com/c/chromium/src/+/5602134/comment/f38abdf6_7c0fbbcd/

    Rafael Cintron

    I think Ningxin is getting confused by the two, very similarly named strings: one called `error_message` and the other one just called `message`.

    The first one is meant to go into logs and provide some context on where the error occurred. The second one is meant to be a simpler error we provide to web developers that avoids revealing too much information about the context of the error to attackers.

    Perhaps we can better name these to disambiguate; rename `message` to `message_for_promise` or something like that.

    ningxin hu

    Perhaps we can better name these to disambiguate; rename message to message_for_promise or something like that.

    This helps. Or we can rename `error_message` to `message_for_log`.

    Mingming1 Xu

    Done

    Mingming1 Xu
    File services/webnn/dml/graph_impl_dml.cc
    Line 5848, Patchset 16: std::move(callback).Run(
    base::unexpected(std::move(create_operator_result.error())));
    ningxin hu . resolved

    Do we handle context lost here? Given the following code handles graph creation failure (it internally handles context lost) if creating an identity node fails.

    Mingming1 Xu
    `void HandleGraphCreationFailure(
    const std::string& error_message,
    WebNNContextImpl::CreateGraphImplCallback callback)` won't handle context lost, only `void HandleGraphCreationFailure(
    const std::string& error_message,
    HRESULT hr,
    WebNNContextImpl::CreateGraphImplCallback callback, ContextImplDml* context)` will handle context lost, it requires the `HRESULT`.
    Rafael Cintron

    @ningx...@intel.com brings up a good point that this particular error path does not result in context lost.

    We originally had this error code path be graceful because we didn't want to crash the GPU process for unimplemented operators in the DML backend. Now that we have an implementation for every op, might be good to revisit this sooner rather than later.

    OK to solve this as a separate change.

    ningxin hu

    OK to solve this as a separate change.

    +1. Worth adding a TODO.

    Mingming1 Xu
    Line 5872, Patchset 16: HandleGraphCreationFailure("Failed to create identity operator.",
    std::move(callback));
    ningxin hu . resolved

    This may need to pass `context`.

    Mingming1 Xu

    There is no `HRESULT` to be handled by `HandleContextLost`.

    ningxin hu

    Same as above, we may need to crash GPU process for operator creation failures. Please add a TODO.

    Mingming1 Xu

    Done

    File services/webnn/webnn_graph_impl_unittest.cc
    Line 108, Patchset 4: FakeWebNNContextImpl(mojo::PendingReceiver<mojom::WebNNContext> receiver,
    Bryan Bernhart . resolved

    Looks like we have no test cases which generate a context lost (and ensure delivery).

    What's our plan there?

    Mingming1 Xu

    Yes, agree, I should add tests for that, will do in following patchset.

    Mingming1 Xu

    Added in patchset 10, PTAL.

    Bryan Bernhart

    What we added is good but I think we'll need to go further and add a CTS. To do that, there needs to be a way to force a context lost from script w/o relying on the OS/driver generating a "lost" HRESULT.

    Introducing `MLContext.destroy()` which also loses the context could achieve that. But if we treat destroy as lost, the web developer could mistakenly re-create the context (when in fact, it should remain destroyed). `MLContextLostInfo` will probably need a field to the know the source of the message to avoid that. WDYT?

    Rafael Cintron

    Here are the [WebGPU device lost tests](https://gpuweb.github.io/cts/standalone/?q=webgpu:api,operation,device,lost:*). They're implemented with a `destroy` method on the device, which I think we should also have in WebNN.

    The destroy method can come as a separate change.

    Mingming1 Xu

    Thanks, open https://issues.chromium.org/issues/348904836 to track.

    But if we treat destroy as lost, the web developer could mistakenly re-create the context (when in fact, it should remain destroyed).

    Re-creating a new context after lost is as expected, the new context can work after the GPU recovers, right?

    Bryan Bernhart

    Re-creating a new context after lost is as expected, the new context can work after the GPU recovers, right?

    Correct but only when the context wasn't destroyed/lost on purpose. I am fine with a TODO.

    Mingming1 Xu

    Thinking again, what's the purpose and usecase of introducing `MLContext.destroy()` into WebNN? Is it only for testing against forcing context lost from renderer process? @rafael....@microsoft.com @bryan.b...@intel.com @ningx...@intel.com

    Rafael Cintron

    Destroy serves two purposes. One is for testing. The other is for the web developer to free the memory and resources taken up by the context. In the case of hybrid GPUs, that means powering down the discrete GPU and giving hours of battery life back to the end user.

    Mingming1 Xu

    Thanks!

    File third_party/blink/renderer/modules/ml/ml_context.h
    Line 165, Patchset 19: // Closes the `context_remote_` and `context_client_receiver_` pipes

    // because the context has been lost.
    void OnDestroyed(const String& message) override;

    void OnDisconnected();
    Alex Gough . unresolved

    I'm not sure of the naming of these - it's not super-obvious that it's about the remote side of the context being lost at the moment. OnDisconnected should perhaps refer to the context client being disconnected (not the /this/) and OnDestroyed should somehow refer to the other side being gone.

    Mingming1 Xu

    `WebNNContextClient` remote in GPU process calls the `OnDestroyed` method explicitly when some errors are captured in GPU process, it means the derived `MLContext` in the renderer process will be destroyed (lost).

    The `OnDisconnected` method is called when the GPU process is crashed which will cause `WebNNContextClient` receiver disconnected. `OnDisconnected` calls `OnDestroyed` inside to destroy the `MLContext` since the mojom pipe is disconnected.

    The name `OnDestroyed` is also to match WebGPU and better express that the context is unusable, from comments: this.https://chromium-review.googlesource.com/c/chromium/src/+/5602134/comment/60724d4a_a12552ca/. In the future, we may plan to add another corresponding interface `Destroy()` to destroy the context from the renderer process, see discussion here: https://chromium-review.googlesource.com/c/chromium/src/+/5602134/comment/d1632baa_7bdad5fb/

    ningxin hu

    OnDestroyed should somehow refer to the other side being gone

    Would `OnContextLost()` or `OnLost()` be more straightforward? Because service-side calls `HandleContextLost()` to send this message and blink-side would resolve the `lost` promise upon receiving this message. "OnDestroyed" sounds like something else.

    Mingming1 Xu

    Rename as `OnLost` now.

    File third_party/blink/renderer/modules/ml/ml_context.cc
    Line 224, Patchset 4:void MLContext::OnWebNNContextLostCaptured(const String& context_lost_info) {
    ningxin hu . resolved

    Should this info be reported to JS code?

    Mingming1 Xu

    Now I reported these info related to adapter error in `src\services\webnn\dml\utils.cc`:

      `std::string context_lost_info;
    if (hr == E_OUTOFMEMORY) {
    context_lost_info = "out of memory.";
    } else if (hr == DXGI_ERROR_DEVICE_REMOVED) {
    context_lost_info = "device removed.";
    } else if (hr == DXGI_ERROR_DEVICE_HUNG) {
    context_lost_info = "device hung.";
    } else if (hr == DXGI_ERROR_DEVICE_RESET) {
    context_lost_info = "device reset.";
    } else {
    return;
    }`
    Mingming1 Xu

    Discussed a lot in other comments, PTAL, thanks!

    Mingming1 Xu

    Done

    Open in Gerrit

    Related details

    Attention is currently required from:
    • Alex Gough
    • Bryan Bernhart
    • Rafael Cintron
    • Rafael Cintron
    • Reilly Grant
    • ningxin hu
    Submit Requirements:
    • requirement satisfiedCode-Coverage
    • requirement is not satisfiedCode-Review
    • requirement is not satisfiedNo-Unresolved-Comments
    Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. DiffyGerrit
    Gerrit-MessageType: comment
    Gerrit-Project: chromium/src
    Gerrit-Branch: main
    Gerrit-Change-Id: I16775ac8cdbd508e04025c22740d13f01e591b0d
    Gerrit-Change-Number: 5602134
    Gerrit-PatchSet: 20
    Gerrit-Owner: Mingming1 Xu <mingmi...@intel.com>
    Gerrit-Reviewer: Alex Gough <aj...@chromium.org>
    Gerrit-Reviewer: Mingming1 Xu <mingmi...@intel.com>
    Gerrit-Reviewer: Rafael Cintron <rafael....@microsoft.com>
    Gerrit-Reviewer: Reilly Grant <rei...@chromium.org>
    Gerrit-Reviewer: ningxin hu <ningx...@intel.com>
    Gerrit-CC: Austin Sullivan <asu...@chromium.org>
    Gerrit-CC: Bryan Bernhart <bryan.b...@intel.com>
    Gerrit-CC: Dwayne Robinson <dwa...@microsoft.com>
    Gerrit-CC: Jiewei Qian <q...@chromium.org>
    Gerrit-CC: Kentaro Hara <har...@chromium.org>
    Gerrit-CC: Rafael Cintron <rafael....@chromium.org>
    Gerrit-Attention: Alex Gough <aj...@chromium.org>
    Gerrit-Attention: ningxin hu <ningx...@intel.com>
    Gerrit-Attention: Reilly Grant <rei...@chromium.org>
    Gerrit-Attention: Bryan Bernhart <bryan.b...@intel.com>
    Gerrit-Attention: Rafael Cintron <rafael....@microsoft.com>
    Gerrit-Attention: Rafael Cintron <rafael....@chromium.org>
    Gerrit-Comment-Date: Thu, 27 Jun 2024 09:08:19 +0000
    Gerrit-HasComments: Yes
    Gerrit-Has-Labels: No
    Comment-In-Reply-To: Alex Gough <aj...@chromium.org>
    Comment-In-Reply-To: ningxin hu <ningx...@intel.com>
    Comment-In-Reply-To: Mingming1 Xu <mingmi...@intel.com>
    satisfied_requirement
    unsatisfied_requirement
    open
    diffy

    Bryan Bernhart (Gerrit)

    unread,
    Jun 27, 2024, 1:34:41 PM (2 days ago) Jun 27
    to Mingming1 Xu, Rafael Cintron, Alex Gough, Rafael Cintron, Chromium LUCI CQ, Dwayne Robinson, Austin Sullivan, Reilly Grant, AyeAye, ningxin hu, Tricium, chromium...@chromium.org, Kentaro Hara, Jiewei Qian, mac-r...@chromium.org, blink-revie...@chromium.org, blink-...@chromium.org, blundell+...@chromium.org, ipc-securi...@chromium.org, jmedle...@chromium.org
    Attention needed from Alex Gough, Mingming1 Xu, Rafael Cintron, Rafael Cintron, Reilly Grant and ningxin hu

    Bryan Bernhart added 2 comments

    File services/webnn/dml/context_impl_dml.cc
    Line 146, Patchset 16: HandleContextLost("Failed to start recording.", hr);
    ningxin hu . unresolved

    `StartRecordingIfNecessary()` may already fail and call `HandleRecordingError()` which calls `HandleContextLost()` before. It may crash at `CHECK` before running the callback and this `HandleContextLost()`.

    Mingming1 Xu

    Do you mean we should not call `HandleRecordingError()` which calls`HandleContextLost()` in the `StartRecordingIfNecessary()`? We can call the `HandleRecordingError()` outside if `StartRecordingIfNecessary()` fails.

    ningxin hu

    We can call the HandleRecordingError() outside if StartRecordingIfNecessary() fails.

    This might work. My point is we should ensure the behavior is consistent when error happens. The user code should get promise rejection from `readBuffer()` method.

    Mingming1 Xu

    Updated. Also cc @bryan.b...@intel.com to check the change here.

    Bryan Bernhart
    Line 329, Patchset 12: HandleWebNNContextLost(hr, this);
    Bryan Bernhart . resolved

    I am not very clear what's the followon change.

    Perhaps one way is to assign WebNN a lower job limit, which gives us first chance to handle OOM.

    Open in Gerrit

    Related details

    Attention is currently required from:
    • Alex Gough
    • Mingming1 Xu
    Gerrit-Attention: Mingming1 Xu <mingmi...@intel.com>
    Gerrit-Attention: Rafael Cintron <rafael....@microsoft.com>
    Gerrit-Attention: Rafael Cintron <rafael....@chromium.org>
    Gerrit-Comment-Date: Thu, 27 Jun 2024 17:34:24 +0000
    Gerrit-HasComments: Yes
    Gerrit-Has-Labels: No
    satisfied_requirement
    unsatisfied_requirement
    open
    diffy

    Alex Gough (Gerrit)

    unread,
    Jun 27, 2024, 3:56:24 PM (2 days ago) Jun 27
    to Mingming1 Xu, Alex Gough, Rafael Cintron, Rafael Cintron, Chromium LUCI CQ, Dwayne Robinson, Austin Sullivan, Reilly Grant, AyeAye, Bryan Bernhart, ningxin hu, Tricium, chromium...@chromium.org, Kentaro Hara, Jiewei Qian, mac-r...@chromium.org, blink-revie...@chromium.org, blink-...@chromium.org, blundell+...@chromium.org, ipc-securi...@chromium.org, jmedle...@chromium.org
    Attention needed from Mingming1 Xu, Rafael Cintron, Rafael Cintron, Reilly Grant and ningxin hu

    Alex Gough voted and added 3 comments

    Votes added by Alex Gough

    Code-Review+1

    3 comments

    Patchset-level comments
    File-level comment, Patchset 20 (Latest):
    Alex Gough . resolved

    lgtm mojom

    File services/webnn/public/mojom/webnn_context_provider.mojom
    Line 68, Patchset 19: pending_receiver<WebNNContextClient>? context_client_receiver;
    Alex Gough . unresolved

    don't really need _receiver in the name here - but up to you

    Mingming1 Xu

    I just want to align with the `context_remote`. Maybe the suffix will make codes easier to read. Agree that it's not necessary.

    Alex Gough

    totally up to you - feel free to resolve either way.

    File third_party/blink/renderer/modules/ml/ml_context.h
    Line 165, Patchset 19: // Closes the `context_remote_` and `context_client_receiver_` pipes
    // because the context has been lost.
    void OnDestroyed(const String& message) override;

    void OnDisconnected();
    Alex Gough . resolved

    I'm not sure of the naming of these - it's not super-obvious that it's about the remote side of the context being lost at the moment. OnDisconnected should perhaps refer to the context client being disconnected (not the /this/) and OnDestroyed should somehow refer to the other side being gone.

    Mingming1 Xu

    `WebNNContextClient` remote in GPU process calls the `OnDestroyed` method explicitly when some errors are captured in GPU process, it means the derived `MLContext` in the renderer process will be destroyed (lost).

    The `OnDisconnected` method is called when the GPU process is crashed which will cause `WebNNContextClient` receiver disconnected. `OnDisconnected` calls `OnDestroyed` inside to destroy the `MLContext` since the mojom pipe is disconnected.

    The name `OnDestroyed` is also to match WebGPU and better express that the context is unusable, from comments: this.https://chromium-review.googlesource.com/c/chromium/src/+/5602134/comment/60724d4a_a12552ca/. In the future, we may plan to add another corresponding interface `Destroy()` to destroy the context from the renderer process, see discussion here: https://chromium-review.googlesource.com/c/chromium/src/+/5602134/comment/d1632baa_7bdad5fb/

    ningxin hu

    OnDestroyed should somehow refer to the other side being gone

    Would `OnContextLost()` or `OnLost()` be more straightforward? Because service-side calls `HandleContextLost()` to send this message and blink-side would resolve the `lost` promise upon receiving this message. "OnDestroyed" sounds like something else.

    Mingming1 Xu

    Rename as `OnLost` now.

    Alex Gough

    thanks.

    Open in Gerrit

    Related details

    Attention is currently required from:
    Gerrit-Attention: ningxin hu <ningx...@intel.com>
    Gerrit-Attention: Reilly Grant <rei...@chromium.org>
    Gerrit-Attention: Mingming1 Xu <mingmi...@intel.com>
    Gerrit-Attention: Rafael Cintron <rafael....@microsoft.com>
    Gerrit-Attention: Rafael Cintron <rafael....@chromium.org>
    Gerrit-Comment-Date: Thu, 27 Jun 2024 19:56:10 +0000
    Gerrit-HasComments: Yes
    Gerrit-Has-Labels: Yes
    Comment-In-Reply-To: Alex Gough <aj...@chromium.org>
    satisfied_requirement
    unsatisfied_requirement
    open
    diffy

    Rafael Cintron (Gerrit)

    unread,
    Jun 27, 2024, 4:45:49 PM (2 days ago) Jun 27
    to Mingming1 Xu, Alex Gough, Rafael Cintron, Chromium LUCI CQ, Dwayne Robinson, Austin Sullivan, Reilly Grant, AyeAye, Bryan Bernhart, ningxin hu, Tricium, chromium...@chromium.org, Kentaro Hara, Jiewei Qian, mac-r...@chromium.org, blink-revie...@chromium.org, blink-...@chromium.org, blundell+...@chromium.org, ipc-securi...@chromium.org, jmedle...@chromium.org
    Attention needed from Bryan Bernhart, Mingming1 Xu, Rafael Cintron, Reilly Grant and ningxin hu

    Rafael Cintron voted and added 2 comments

    Votes added by Rafael Cintron

    Code-Review+1

    2 comments

    Patchset-level comments
    Rafael Cintron . resolved

    Still LGTM assuming issues others have raised have been fixed.

    File services/webnn/dml/context_impl_dml.cc
    Rafael Cintron

    I agree with you we'll need to fix this at some point. But once we do, E_OUTOFMEMORY will be recoverable and this code shouldn't generate a context lost (due to the reset).

    Perhaps one way is to assign WebNN a lower job limit, which gives us first chance to handle OOM.

    @bryan.b...@intel.com. The job limit is for the entire GPU process. There is no way to have a different job limit for WebNN vs. everything else. Attempting to allocate memory with D3D will trigger the job limit (if you're reached it) and cause the browser process to terminate the GPU process. To make large buffer allocations gracefully fail, we'll need to guesstimate whether the process will OOM before making the allocation.

    I am not very clear what's the followon change. Where is the existing code, can you paste link here to help me understand? Thanks!

    @mingmi...@intel.com the code I had in mind is `GpuServiceImpl::MaybeExitOnContextLost`. If WebNN is the only thing using D3D12 in the GPU process, it can attempt to recover on its own. When several systems use it at the same time, there needs to either be coordination on restarting everything together (difficult) or coordination to self-terminate (easier).

    Open in Gerrit

    Related details

    Attention is currently required from:
    • Bryan Bernhart
    • Mingming1 Xu
    • Rafael Cintron
    • Reilly Grant
    • ningxin hu
      Submit Requirements:
      • requirement satisfiedCode-Coverage
      • requirement satisfiedCode-Review
      Gerrit-Attention: Bryan Bernhart <bryan.b...@intel.com>
      Gerrit-Attention: Rafael Cintron <rafael....@chromium.org>
      Gerrit-Comment-Date: Thu, 27 Jun 2024 20:45:36 +0000
      Gerrit-HasComments: Yes
      Gerrit-Has-Labels: Yes
      satisfied_requirement
      unsatisfied_requirement
      open
      diffy

      Bryan Bernhart (Gerrit)

      unread,
      Jun 27, 2024, 5:56:35 PM (2 days ago) Jun 27
      to Mingming1 Xu, Rafael Cintron, Alex Gough, Rafael Cintron, Chromium LUCI CQ, Dwayne Robinson, Austin Sullivan, Reilly Grant, AyeAye, ningxin hu, Tricium, chromium...@chromium.org, Kentaro Hara, Jiewei Qian, mac-r...@chromium.org, blink-revie...@chromium.org, blink-...@chromium.org, blundell+...@chromium.org, ipc-securi...@chromium.org, jmedle...@chromium.org
      Attention needed from Mingming1 Xu, Rafael Cintron, Reilly Grant and ningxin hu

      Bryan Bernhart added 1 comment

      File services/webnn/dml/context_impl_dml.cc
      Line 225, Patchset 4: // TODO(crbug.com/41492165): generate error using context.
      Bryan Bernhart . resolved

      Can we handle context lost here as well?

      Reilly Grant

      Are we assuming that any failure to create a buffer or graph means the context has been lost? That seems strange. I expect that context loss is a signal we specifically get from the adapter.

      Bryan Bernhart

      Are we assuming that any failure to create a buffer or graph means the context has been lost?

      We could, like WebGPU. If we do not then the OOM skews to some, possibly a unrelated call, which does handle context lost which is hard to debug.

      Reilly Grant

      I don't think we should be reporting a context loss for errors like this. That's a separate concept.

      Bryan Bernhart

      If this returns E_OUTOFMEMORY, the context will become inoperable and effectively lost. Also, the HR could be any other error result (but unlikely) so I can see it being quite useful to coverage to one error handler.

      Mingming1 Xu

      This discussion should be merged into above one https://chromium-review.googlesource.com/c/chromium/src/+/5602134/comment/ad47c8af_d3f27f05/: whether we should handle the context lost according to the system error code `HRESULT`.

      Mingming1 Xu

      I am thinking maybe all the methods that return `HRESULT` inside or outside should be handled by `HandleWebNNContextLost`, if the `HRESULT` is context lost error, we should call `OnDestoryed`. WDYT? @rafael....@microsoft.com

      Mingming1 Xu

      Updated, PTAL.

      Bryan Bernhart

      @mingmi...@intel.com

      It depends if the HRESULT error will be recoverable or not. In this case, E_OUTOFMEMORY is not recoverable and leaves the context inoperable so a "lost" seems appropriate. In other cases, this is not so [1].

      Bryan Bernhart

      LGTM.

      Open in Gerrit

      Related details

      Attention is currently required from:
      Gerrit-Attention: Rafael Cintron <rafael....@chromium.org>
      Gerrit-Comment-Date: Thu, 27 Jun 2024 21:56:21 +0000
      Gerrit-HasComments: Yes
      Gerrit-Has-Labels: No
      Comment-In-Reply-To: Reilly Grant <rei...@chromium.org>
      satisfied_requirement
      unsatisfied_requirement
      open
      diffy

      Bryan Bernhart (Gerrit)

      unread,
      Jun 27, 2024, 7:25:25 PM (2 days ago) Jun 27
      to Mingming1 Xu, Rafael Cintron, Alex Gough, Rafael Cintron, Chromium LUCI CQ, Dwayne Robinson, Austin Sullivan, Reilly Grant, AyeAye, ningxin hu, Tricium, chromium...@chromium.org, Kentaro Hara, Jiewei Qian, mac-r...@chromium.org, blink-revie...@chromium.org, blink-...@chromium.org, blundell+...@chromium.org, ipc-securi...@chromium.org, jmedle...@chromium.org
      Attention needed from Mingming1 Xu, Rafael Cintron, Reilly Grant and ningxin hu

      Bryan Bernhart added 1 comment

      File services/webnn/dml/context_impl_dml.cc
      Bryan Bernhart

      There are two OOM recovery situations: the case of creating a very large buffer that would otherwise OOM vs needing to free-up existing memory from OOM (ex. command recorder reset).

      In latter case, I don't think guestimates help. But if we can recover from large buffer allocations then it'll likely we can recover from other OOM failures too.

      Gerrit-Comment-Date: Thu, 27 Jun 2024 23:25:12 +0000
      Gerrit-HasComments: Yes
      Gerrit-Has-Labels: No
      Comment-In-Reply-To: Mingming1 Xu <mingmi...@intel.com>
      Comment-In-Reply-To: Bryan Bernhart <bryan.b...@intel.com>
      Comment-In-Reply-To: Rafael Cintron <rafael....@microsoft.com>
      satisfied_requirement
      unsatisfied_requirement
      open
      diffy

      ningxin hu (Gerrit)

      unread,
      Jun 27, 2024, 9:49:41 PM (2 days ago) Jun 27
      to Mingming1 Xu, Rafael Cintron, Alex Gough, Rafael Cintron, Chromium LUCI CQ, Dwayne Robinson, Austin Sullivan, Reilly Grant, AyeAye, Bryan Bernhart, Tricium, chromium...@chromium.org, Kentaro Hara, Jiewei Qian, mac-r...@chromium.org, blink-revie...@chromium.org, blink-...@chromium.org, blundell+...@chromium.org, ipc-securi...@chromium.org, jmedle...@chromium.org
      Attention needed from Bryan Bernhart, Mingming1 Xu, Rafael Cintron and Reilly Grant

      ningxin hu voted and added 4 comments

      Votes added by ningxin hu

      Code-Review+1

      4 comments

      Patchset-level comments
      File-level comment, Patchset 16:
      ningxin hu . resolved

      Adapter's `HandleAdapterFailure()` of https://source.chromium.org/chromium/chromium/src/+/main:services/webnn/dml/adapter.cc;bpv=0;bpt=1 may also need to handle context lost.

      ningxin hu

      Open it for comments.

      Mingming1 Xu

      While the adapter is created before creating a context, if fails, there is no context.

      ningxin hu

      You are correct, there is no context to lose. The 'lost' error could be reported by rejecting promise returned by `createContext()`. However, my open is should we be consistent and crash GPU process for non 'lost' errors. OK to add a TODO and follow up in a separate CL.

      Mingming1 Xu

      https://issues.chromium.org/issues/349640007 tracked.

      ningxin hu

      Acknowledged

      ningxin hu . resolved

      lgtm % open comments

      File services/webnn/dml/context_impl_dml.h
      Line 43, Patchset 16: void HandleContextLost(std::string_view error_message, HRESULT hr);
      ningxin hu . resolved

      This method may crash GPU process. It'd be better to make it explicit: i.e. something like `HandleContextLostOrCrash()`.

      It might be worth adding a comment explaining which errors are treated as "context lost".

      Mingming1 Xu

      Updated, ptal.

      ningxin hu

      Acknowledged

      File services/webnn/dml/context_impl_dml.cc
      Line 146, Patchset 16: HandleContextLost("Failed to start recording.", hr);
      ningxin hu . resolved

      `StartRecordingIfNecessary()` may already fail and call `HandleRecordingError()` which calls `HandleContextLost()` before. It may crash at `CHECK` before running the callback and this `HandleContextLost()`.

      Mingming1 Xu

      Do you mean we should not call `HandleRecordingError()` which calls`HandleContextLost()` in the `StartRecordingIfNecessary()`? We can call the `HandleRecordingError()` outside if `StartRecordingIfNecessary()` fails.

      ningxin hu

      We can call the HandleRecordingError() outside if StartRecordingIfNecessary() fails.

      This might work. My point is we should ensure the behavior is consistent when error happens. The user code should get promise rejection from `readBuffer()` method.

      Mingming1 Xu

      Updated. Also cc @bryan.b...@intel.com to check the change here.

      Bryan Bernhart

      @mingmi...@intel.com LGTM

      ningxin hu

      lgtm

      Open in Gerrit

      Related details

      Attention is currently required from:
      • Bryan Bernhart
      • Mingming1 Xu
      • Rafael Cintron
      • Reilly Grant
      Gerrit-Attention: Reilly Grant <rei...@chromium.org>
      Gerrit-Attention: Mingming1 Xu <mingmi...@intel.com>
      Gerrit-Attention: Bryan Bernhart <bryan.b...@intel.com>
      Gerrit-Attention: Rafael Cintron <rafael....@chromium.org>
      Gerrit-Comment-Date: Fri, 28 Jun 2024 01:49:27 +0000
      Gerrit-HasComments: Yes
      Gerrit-Has-Labels: Yes
      Comment-In-Reply-To: ningxin hu <ningx...@intel.com>
      satisfied_requirement
      unsatisfied_requirement
      open
      diffy

      Mingming1 Xu (Gerrit)

      unread,
      Jun 28, 2024, 2:16:55 AM (yesterday) Jun 28
      to ningxin hu, Rafael Cintron, Alex Gough, Rafael Cintron, Chromium LUCI CQ, Dwayne Robinson, Austin Sullivan, Reilly Grant, AyeAye, Bryan Bernhart, Tricium, chromium...@chromium.org, Kentaro Hara, Jiewei Qian, mac-r...@chromium.org, blink-revie...@chromium.org, blink-...@chromium.org, blundell+...@chromium.org, ipc-securi...@chromium.org, jmedle...@chromium.org
      Attention needed from Bryan Bernhart, Rafael Cintron and Reilly Grant

      Mingming1 Xu added 1 comment

      File services/webnn/public/mojom/webnn_context_provider.mojom
      Line 68, Patchset 19: pending_receiver<WebNNContextClient>? context_client_receiver;
      Alex Gough . resolved

      don't really need _receiver in the name here - but up to you

      Mingming1 Xu

      I just want to align with the `context_remote`. Maybe the suffix will make codes easier to read. Agree that it's not necessary.

      Alex Gough

      totally up to you - feel free to resolve either way.

      Mingming1 Xu

      Thanks!

      Open in Gerrit

      Related details

      Attention is currently required from:
      • Bryan Bernhart
      • Rafael Cintron
      • Reilly Grant
      Submit Requirements:
      • requirement satisfiedCode-Coverage
      • requirement satisfiedCode-Review
      Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. DiffyGerrit
      Gerrit-MessageType: comment
      Gerrit-Project: chromium/src
      Gerrit-Branch: main
      Gerrit-Change-Id: I16775ac8cdbd508e04025c22740d13f01e591b0d
      Gerrit-Change-Number: 5602134
      Gerrit-PatchSet: 21
      Gerrit-Owner: Mingming1 Xu <mingmi...@intel.com>
      Gerrit-Reviewer: Alex Gough <aj...@chromium.org>
      Gerrit-Reviewer: Mingming1 Xu <mingmi...@intel.com>
      Gerrit-Reviewer: Rafael Cintron <rafael....@microsoft.com>
      Gerrit-Reviewer: Reilly Grant <rei...@chromium.org>
      Gerrit-Reviewer: ningxin hu <ningx...@intel.com>
      Gerrit-CC: Austin Sullivan <asu...@chromium.org>
      Gerrit-CC: Bryan Bernhart <bryan.b...@intel.com>
      Gerrit-CC: Dwayne Robinson <dwa...@microsoft.com>
      Gerrit-CC: Jiewei Qian <q...@chromium.org>
      Gerrit-CC: Kentaro Hara <har...@chromium.org>
      Gerrit-CC: Rafael Cintron <rafael....@chromium.org>
      Gerrit-Attention: Reilly Grant <rei...@chromium.org>
      Gerrit-Attention: Bryan Bernhart <bryan.b...@intel.com>
      Gerrit-Attention: Rafael Cintron <rafael....@chromium.org>
      Gerrit-Comment-Date: Fri, 28 Jun 2024 06:16:42 +0000
      Gerrit-HasComments: Yes
      Gerrit-Has-Labels: No
      Comment-In-Reply-To: Alex Gough <aj...@chromium.org>
      Comment-In-Reply-To: Mingming1 Xu <mingmi...@intel.com>
      satisfied_requirement
      open
      diffy

      Reilly Grant (Gerrit)

      unread,
      Jun 28, 2024, 5:30:29 PM (23 hours ago) Jun 28
      to Mingming1 Xu, Reilly Grant, ningxin hu, Rafael Cintron, Alex Gough, Rafael Cintron, Chromium LUCI CQ, Dwayne Robinson, Austin Sullivan, AyeAye, Bryan Bernhart, Tricium, chromium...@chromium.org, Kentaro Hara, Jiewei Qian, mac-r...@chromium.org, blink-revie...@chromium.org, blink-...@chromium.org, blundell+...@chromium.org, ipc-securi...@chromium.org, jmedle...@chromium.org
      Attention needed from Alex Gough, Bryan Bernhart, Mingming1 Xu, Rafael Cintron, Rafael Cintron and ningxin hu

      Reilly Grant voted and added 4 comments

      Votes added by Reilly Grant

      Code-Review+1

      4 comments

      Patchset-level comments
      File services/webnn/webnn_graph_impl_unittest.cc
      Line 184, Patchset 27 (Latest): context_impl_->OnLost("Context is lost");
      context_impl_ = nullptr;
      Reilly Grant . unresolved

      This should let you remove the `DanglingUntriaged` annotation below. A call to `ExtractAsDangling()` is necessary for self-destruct functions like `OnLost()`.

      ```suggestion
      context_impl_.ExtractAsDangling()->OnLost("Context is lost");
      ```
      Line 420, Patchset 27 (Latest): base::RunLoop().RunUntilIdle();
      EXPECT_TRUE(context_client_impl.IsLost());
      Reilly Grant . unresolved
      ```suggestion
      base::RunUntil([&]() { return context_client_impl.IsLost(); });
      ```
      File third_party/blink/renderer/modules/ml/ml_context.cc
      Line 255, Patchset 27 (Latest): if (lost_property_->GetState() == LostProperty::kPending) {
      Reilly Grant . unresolved

      We can assert that this is true because `OnLost()` can only be called once.

      Open in Gerrit

      Related details

      Attention is currently required from:
      • Alex Gough
      • Bryan Bernhart
      • Mingming1 Xu
      • Rafael Cintron
      • Rafael Cintron
      • ningxin hu
      Submit Requirements:
      • requirement satisfiedCode-Coverage
      • requirement is not satisfiedCode-Review
      • requirement is not satisfiedNo-Unresolved-Comments
      Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. DiffyGerrit
      Gerrit-MessageType: comment
      Gerrit-Project: chromium/src
      Gerrit-Branch: main
      Gerrit-Change-Id: I16775ac8cdbd508e04025c22740d13f01e591b0d
      Gerrit-Change-Number: 5602134
      Gerrit-PatchSet: 27
      Gerrit-Owner: Mingming1 Xu <mingmi...@intel.com>
      Gerrit-Reviewer: Alex Gough <aj...@chromium.org>
      Gerrit-Reviewer: Mingming1 Xu <mingmi...@intel.com>
      Gerrit-Reviewer: Rafael Cintron <rafael....@microsoft.com>
      Gerrit-Reviewer: Reilly Grant <rei...@chromium.org>
      Gerrit-Reviewer: ningxin hu <ningx...@intel.com>
      Gerrit-CC: Austin Sullivan <asu...@chromium.org>
      Gerrit-CC: Bryan Bernhart <bryan.b...@intel.com>
      Gerrit-CC: Dwayne Robinson <dwa...@microsoft.com>
      Gerrit-CC: Jiewei Qian <q...@chromium.org>
      Gerrit-CC: Kentaro Hara <har...@chromium.org>
      Gerrit-CC: Rafael Cintron <rafael....@chromium.org>
      Gerrit-Attention: Alex Gough <aj...@chromium.org>
      Gerrit-Attention: ningxin hu <ningx...@intel.com>
      Gerrit-Attention: Mingming1 Xu <mingmi...@intel.com>
      Gerrit-Attention: Bryan Bernhart <bryan.b...@intel.com>
      Gerrit-Attention: Rafael Cintron <rafael....@microsoft.com>
      Gerrit-Attention: Rafael Cintron <rafael....@chromium.org>
      Gerrit-Comment-Date: Fri, 28 Jun 2024 21:30:10 +0000
      Gerrit-HasComments: Yes
      Gerrit-Has-Labels: Yes
      satisfied_requirement
      unsatisfied_requirement
      open
      diffy

      Rafael Cintron (Gerrit)

      unread,
      Jun 28, 2024, 8:28:40 PM (20 hours ago) Jun 28
      to Mingming1 Xu, Reilly Grant, ningxin hu, Alex Gough, Rafael Cintron, Chromium LUCI CQ, Dwayne Robinson, Austin Sullivan, AyeAye, Bryan Bernhart, Tricium, chromium...@chromium.org, Kentaro Hara, Jiewei Qian, mac-r...@chromium.org, blink-revie...@chromium.org, blink-...@chromium.org, blundell+...@chromium.org, ipc-securi...@chromium.org, jmedle...@chromium.org
      Attention needed from Alex Gough, Bryan Bernhart, Mingming1 Xu, Rafael Cintron and ningxin hu

      Rafael Cintron voted and added 1 comment

      Votes added by Rafael Cintron

      Code-Review+1

      1 comment

      Patchset-level comments
      Rafael Cintron . resolved

      LGTM sans Reilly issues

      Open in Gerrit

      Related details

      Attention is currently required from:
      • Alex Gough
      • Bryan Bernhart
      • Mingming1 Xu
      • Rafael Cintron
      • ningxin hu
      Submit Requirements:
      • requirement satisfiedCode-Coverage
      • requirement satisfiedCode-Review
      Gerrit-Attention: Rafael Cintron <rafael....@chromium.org>
      Gerrit-Comment-Date: Sat, 29 Jun 2024 00:28:28 +0000
      Gerrit-HasComments: Yes
      Gerrit-Has-Labels: Yes
      satisfied_requirement
      unsatisfied_requirement
      open
      diffy

      Mingming1 Xu (Gerrit)

      unread,
      Jun 28, 2024, 9:12:25 PM (20 hours ago) Jun 28
      to Rafael Cintron, Reilly Grant, ningxin hu, Alex Gough, Rafael Cintron, Chromium LUCI CQ, Dwayne Robinson, Austin Sullivan, AyeAye, Bryan Bernhart, Tricium, chromium...@chromium.org, Kentaro Hara, Jiewei Qian, mac-r...@chromium.org, blink-revie...@chromium.org, blink-...@chromium.org, blundell+...@chromium.org, ipc-securi...@chromium.org, jmedle...@chromium.org
      Attention needed from Alex Gough, Bryan Bernhart, Rafael Cintron, Reilly Grant and ningxin hu

      Mingming1 Xu added 3 comments

      File services/webnn/webnn_graph_impl_unittest.cc
      Line 184, Patchset 27: context_impl_->OnLost("Context is lost");
      context_impl_ = nullptr;
      Reilly Grant . resolved

      This should let you remove the `DanglingUntriaged` annotation below. A call to `ExtractAsDangling()` is necessary for self-destruct functions like `OnLost()`.

      ```suggestion
      context_impl_.ExtractAsDangling()->OnLost("Context is lost");
      ```
      Mingming1 Xu

      Done

      Line 420, Patchset 27: base::RunLoop().RunUntilIdle();
      EXPECT_TRUE(context_client_impl.IsLost());
      Reilly Grant . unresolved
      ```suggestion
      base::RunUntil([&]() { return context_client_impl.IsLost(); });
      ```
      Mingming1 Xu

      I guess you mean `base::test::RunUntil` from "base/test/run_until.h". PTAL, thanks!

      File third_party/blink/renderer/modules/ml/ml_context.cc
      Line 255, Patchset 27: if (lost_property_->GetState() == LostProperty::kPending) {
      Reilly Grant . resolved

      We can assert that this is true because `OnLost()` can only be called once.

      Mingming1 Xu

      Done

      Open in Gerrit

      Related details

      Attention is currently required from:
      • Alex Gough
      • Bryan Bernhart
      • Rafael Cintron
      • Reilly Grant
      • ningxin hu
      Submit Requirements:
      • requirement satisfiedCode-Coverage
      • requirement satisfiedCode-Review
      • requirement is not satisfiedNo-Unresolved-Comments
      Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. DiffyGerrit
      Gerrit-MessageType: comment
      Gerrit-Project: chromium/src
      Gerrit-Branch: main
      Gerrit-Change-Id: I16775ac8cdbd508e04025c22740d13f01e591b0d
      Gerrit-Change-Number: 5602134
      Gerrit-PatchSet: 28
      Gerrit-Owner: Mingming1 Xu <mingmi...@intel.com>
      Gerrit-Reviewer: Alex Gough <aj...@chromium.org>
      Gerrit-Reviewer: Mingming1 Xu <mingmi...@intel.com>
      Gerrit-Reviewer: Rafael Cintron <rafael....@microsoft.com>
      Gerrit-Reviewer: Reilly Grant <rei...@chromium.org>
      Gerrit-Reviewer: ningxin hu <ningx...@intel.com>
      Gerrit-CC: Austin Sullivan <asu...@chromium.org>
      Gerrit-CC: Bryan Bernhart <bryan.b...@intel.com>
      Gerrit-CC: Dwayne Robinson <dwa...@microsoft.com>
      Gerrit-CC: Jiewei Qian <q...@chromium.org>
      Gerrit-CC: Kentaro Hara <har...@chromium.org>
      Gerrit-CC: Rafael Cintron <rafael....@chromium.org>
      Gerrit-Attention: Alex Gough <aj...@chromium.org>
      Gerrit-Attention: Reilly Grant <rei...@chromium.org>
      Gerrit-Attention: ningxin hu <ningx...@intel.com>
      Gerrit-Attention: Bryan Bernhart <bryan.b...@intel.com>
      Gerrit-Attention: Rafael Cintron <rafael....@chromium.org>
      Gerrit-Comment-Date: Sat, 29 Jun 2024 01:12:12 +0000
      Gerrit-HasComments: Yes
      Gerrit-Has-Labels: No
      Comment-In-Reply-To: Reilly Grant <rei...@chromium.org>
      satisfied_requirement
      unsatisfied_requirement
      open
      diffy

      Mingming1 Xu (Gerrit)

      unread,
      Jun 28, 2024, 11:39:34 PM (17 hours ago) Jun 28
      to Rafael Cintron, Reilly Grant, ningxin hu, Alex Gough, Rafael Cintron, Chromium LUCI CQ, Dwayne Robinson, Austin Sullivan, AyeAye, Bryan Bernhart, Tricium, chromium...@chromium.org, Kentaro Hara, Jiewei Qian, mac-r...@chromium.org, blink-revie...@chromium.org, blink-...@chromium.org, blundell+...@chromium.org, ipc-securi...@chromium.org, jmedle...@chromium.org
      Attention needed from Alex Gough, Bryan Bernhart, Rafael Cintron, Reilly Grant and ningxin hu

      Mingming1 Xu added 1 comment

      File services/webnn/webnn_graph_impl_unittest.cc
      Line 184, Patchset 27: context_impl_->OnLost("Context is lost");
      context_impl_ = nullptr;
      Reilly Grant . unresolved

      This should let you remove the `DanglingUntriaged` annotation below. A call to `ExtractAsDangling()` is necessary for self-destruct functions like `OnLost()`.

      ```suggestion
      context_impl_.ExtractAsDangling()->OnLost("Context is lost");
      ```
      Mingming1 Xu

      Done

      Mingming1 Xu

      If I remove `DanglingUntriaged`, following tests `WebNNGraphImplTest.ValidateDispatchTest` and `WebNNGraphImplTest.ValidateInputsTest` will fail due to `Detected dangling raw_ptr` https://ci.chromium.org/ui/p/chromium/builders/try/win-rel/664189/overview, so I have to revert here to patchset 27. PTAL.

      Open in Gerrit

      Related details

      Attention is currently required from:
      • Alex Gough
      • Bryan Bernhart
      • Rafael Cintron
      • Reilly Grant
      • ningxin hu
      Submit Requirements:
      • requirement satisfiedCode-Coverage
      • requirement satisfiedCode-Review
      • requirement is not satisfiedNo-Unresolved-Comments
      Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. DiffyGerrit
      Gerrit-MessageType: comment
      Gerrit-Project: chromium/src
      Gerrit-Branch: main
      Gerrit-Change-Id: I16775ac8cdbd508e04025c22740d13f01e591b0d
      Gerrit-Change-Number: 5602134
      Gerrit-PatchSet: 29
      Gerrit-Owner: Mingming1 Xu <mingmi...@intel.com>
      Gerrit-Reviewer: Alex Gough <aj...@chromium.org>
      Gerrit-Reviewer: Mingming1 Xu <mingmi...@intel.com>
      Gerrit-Reviewer: Rafael Cintron <rafael....@microsoft.com>
      Gerrit-Reviewer: Reilly Grant <rei...@chromium.org>
      Gerrit-Reviewer: ningxin hu <ningx...@intel.com>
      Gerrit-CC: Austin Sullivan <asu...@chromium.org>
      Gerrit-CC: Bryan Bernhart <bryan.b...@intel.com>
      Gerrit-CC: Dwayne Robinson <dwa...@microsoft.com>
      Gerrit-CC: Jiewei Qian <q...@chromium.org>
      Gerrit-CC: Kentaro Hara <har...@chromium.org>
      Gerrit-CC: Rafael Cintron <rafael....@chromium.org>
      Gerrit-Attention: Alex Gough <aj...@chromium.org>
      Gerrit-Attention: Reilly Grant <rei...@chromium.org>
      Gerrit-Attention: ningxin hu <ningx...@intel.com>
      Gerrit-Attention: Bryan Bernhart <bryan.b...@intel.com>
      Gerrit-Attention: Rafael Cintron <rafael....@chromium.org>
      Gerrit-Comment-Date: Sat, 29 Jun 2024 03:39:23 +0000
      Gerrit-HasComments: Yes
      Gerrit-Has-Labels: No
      Comment-In-Reply-To: Reilly Grant <rei...@chromium.org>
      Comment-In-Reply-To: Mingming1 Xu <mingmi...@intel.com>
      satisfied_requirement
      unsatisfied_requirement
      open
      diffy
      Reply all
      Reply to author
      Forward
      0 new messages