OV NPU can't support models with MatMul whose number of
columns in the first matrix is too large, resulting in a hang issue
during model compilation.

Xu, Mingming1
When you say "OV NPU can't support ....", is this a limitation of the hardware or a bug in the EP itself? Does the "hang" really mean compilation will never complete, or just that it takes a long time? If the latter, how long is "a long time"? What is the ETA for a fix?
In general, I think we should avoid doing workarounds like this and, instead, insist that vendors fix their software. Otherwise, workarounds become permanent and linger forever.
Xu, Mingming1
> When you say "OV NPU can't support ....", is this a limitation of the hardware or a bug in the EP itself?

It's an NPU driver bug, not a hardware limitation.

> Does the "hang" really mean compilation will never complete or just that it takes a long time? If the latter, how long is "a long time"?

As far as I know, it's the latter case: compilation takes a very long time (even a day). /cc @qiuji...@intel.com
> What is the ETA for a fix?

An NPU compiler fix is WIP. The fix may land in the next NPU driver release.
Xu, Mingming1
The NPU compiler fix PR has now landed.
Done
// Workaround: Split MatMul operations with excessively large K dimension size
// into smaller chunks to prevent hang issues during model compilation on some
// NPU devices.

Xu, Mingming1
As discussed with @rafael....@microsoft.com offline, in this CL we agreed to fail the graph build for this issue. We'll explore the workaround in the EP. Once a new EP release has the fix, we'll increase the minimum required version for that EP and remove the handling from Chromium. This will be a TODO.

Updated, PTAL, thanks!
k_dimension_size > matmul_k_dimension_limit_.value()) {

Xu, Mingming1
If this issue only exists for grouped / batched MatMul, we should only reject that.

Done
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
| Code-Review | +1 |
This CL reject such MatMul operations to prevent hang issues.

nit: rejects
| Commit-Queue | +1 |
This CL reject such MatMul operations to prevent hang issues.

Xu, Mingming1
nit: rejects

Done
I found that the hang issue can cause the entire GPU process to hang.
I did a test: I opened more than 10 web pages running the same model (which has a hang issue caused by batched MatMul). Each page should compile the model, which blocks a background thread and never completes. The Chrome UI hung for a while, and eventually the GPU process crashed with the error:
`The GPU process crashed! Exit code: RESULT_CODE_HUNG.`
I'm concerned that this could be a serious issue. What's your opinion?
| Code-Review | +1 |
Thanks @mingmi...@intel.com for the analysis, I agree this is a critical issue. You may want to document the reproduction steps in the issue https://issues.chromium.org/issues/467442135. Could a single web page cause the GPU process hang by initiating 10 async model compilations without waiting for their completion?
Sure, please see steps in https://issues.chromium.org/issues/467442135
Done
| Code-Review | +1 |
Exportable changes to web-platform-tests were detected in this CL and a pull request in the upstream repo has been made: https://github.com/web-platform-tests/wpt/pull/57011.
When this CL lands, the bot will automatically merge the PR on GitHub if the required GitHub checks pass; otherwise, ecosystem-infra@ team will triage the failures and may contact you.
WPT Export docs:
https://chromium.googlesource.com/chromium/src/+/main/docs/testing/web_platform_tests.md#Automatic-export-process
OV NPU can't support models with batched MatMul whose K
dimension size is too large, resulting in a hang issue during
model compilation.
This CL rejects such MatMul operations to prevent hang issues.

```suggestion
The OpenVINO NPU EP will take a very long time to compile models
with batched MatMul operations whose K dimension size is too large.
This CL rejects such MatMul operations to prevent WebNN from
becoming unresponsive.
```
// dimension size to prevent hang issues during model compilation on some NPU
// devices.

```suggestion
// dimension size to prevent the EP from becoming unresponsive during model
// compilation on some NPU devices.
```
// TODO(crbug.com/467468912): When the hang issue is fixed, remove the
// Limitation and increase the minimum required OV EP version.

```suggestion
// TODO(crbug.com/467468912): When the OpenVINO issue is fixed, remove
// the limitation and increase the minimum required EP version.
```
if (!model_info_result.has_value()) {

Use the ASSIGN_OR_RETURN macro to simplify this.
// It's safe to keep `first_selected_device_` as `SessionOptions` also keeps a
// reference of `Environment` which owns all EP devices.

```suggestion
// It's safe to keep `first_selected_device_` as `env_` owns all EP devices.
```
OV NPU can't support models with batched MatMul whose K
dimension size is too large, resulting in a hang issue during
model compilation.
This CL rejects such MatMul operations to prevent hang issues.

```suggestion
The OpenVINO NPU EP will take a very long time to compile models
with batched MatMul operations whose K dimension size is too large.
This CL rejects such MatMul operations to prevent WebNN from
becoming unresponsive.
```
👍 Done
// dimension size to prevent hang issues during model compilation on some NPU
// devices.

```suggestion
// dimension size to prevent the EP from becoming unresponsive during model
// compilation on some NPU devices.
```
Done
// TODO(crbug.com/467468912): When the hang issue is fixed, remove the
// Limitation and increase the minimum required OV EP version.

```suggestion
// TODO(crbug.com/467468912): When the OpenVINO issue is fixed, remove
// the limitation and increase the minimum required EP version.
```
Done
Use the ASSIGN_OR_RETURN macro to simplify this.
Good catch! Thanks!
// It's safe to keep `first_selected_device_` as `SessionOptions` also keeps a
// reference of `Environment` which owns all EP devices.

```suggestion
// It's safe to keep `first_selected_device_` as `env_` owns all EP devices.
```
| Code-Review | +1 |
const char* ep_name = ort_api->EpDevice_EpName(first_selected_device);
const auto iter = kKnownEPs.find(UNSAFE_BUFFERS(base::cstring_view(ep_name)));
if (iter == kKnownEPs.end()) {
return std::nullopt;
}

By the time we get to this code, shouldn't the EP always be in the known EP list?
const std::vector<uint32_t>& input_a_shape =
GetOperand(matmul.a_operand_id).descriptor.shape();
uint32_t k_dimension_size = input_a_shape.back();
const OperandDescriptor& output_descriptor =
GetOperand(matmul.output_operand_id).descriptor;
bool is_batched_matmul = output_descriptor.Rank() > 2;

To avoid having correctly implemented EPs pay the price of all of this lookup and checking, please structure the code such that the `batched_matmul_k_dimension_limit_` has_value() and value() calls are made first.
// Limitation: Reject batched MatMul operations with excessively large K

Please also link to the OV bug and provide a rough ETA for when we can expect a fix.
// The OpenVINO NPU limits the batched MatMul K dimension size to
// 8192.

I see that you provided a detailed analysis of the situation in `GraphBuilderOrt::AddMatMulOperation`, including links to crbugs. Please add a comment here referring to that code, or vice versa, so people can easily find more details.
std::optional<uint32_t> npu_batched_matmul_k_dimension_limit;

Should `npu_batched_matmul_k_dimension_limit` go in the `EpWorkarounds` structure? I know that it is already passed around in various places, which might make accessing the value easier.
| Code-Review | +1 |
// Limitation: Reject batched MatMul operations with excessively large K

Please also link to the OV bug and provide a rough ETA for when we can expect a fix.
https://github.com/microsoft/onnxruntime/issues/26643; the fix is expected to be available in the NPU driver Feb release.
By the time we get to this code, shouldn't the EP always be in the known EP list?
It's possible. `kKnownEPs` doesn't contain default EPs like the DML EP.
To avoid having correctly implemented EPs pay the price of all of this lookup and checking, please structure the code such that the `batched_matmul_k_dimension_limit_` has_value() and value() calls are made first.
Done
Hu, Ningxin
Please also link to the OV bug and provide a rough ETA for when we can expect a fix.
https://github.com/microsoft/onnxruntime/issues/26643; the fix is expected to be available in the NPU driver Feb release.
Done
// The OpenVINO NPU limits the batched MatMul K dimension size to
// 8192.

I see that you provided a detailed analysis of the situation in `GraphBuilderOrt::AddMatMulOperation`, including links to crbugs. Please add a comment here referring to that code, or vice versa, so people can easily find more details.
Done
std::optional<uint32_t> npu_batched_matmul_k_dimension_limit;

Should `npu_batched_matmul_k_dimension_limit` go in the `EpWorkarounds` structure? I know that it is already passed around in various places, which might make accessing the value easier.
// The fix is expected to be available in NPU driver Feb release.

```suggestion
// The fix is expected to be available in NPU driver Feb '26 release.
```
std::optional<uint32_t> GetBatchedMatMulKDimensionLimit(
const OrtEpDevice* first_selected_device) {
const OrtApi* ort_api = PlatformFunctions::GetInstance()->ort_api();
const char* ep_name = ort_api->EpDevice_EpName(first_selected_device);
const auto iter = kKnownEPs.find(UNSAFE_BUFFERS(base::cstring_view(ep_name)));
if (iter == kKnownEPs.end()) {
return std::nullopt;
}
OrtHardwareDeviceType hardware_device_type = ort_api->HardwareDevice_Type(
ort_api->EpDevice_Device(first_selected_device));
if (hardware_device_type != OrtHardwareDeviceType_NPU) {
return std::nullopt;
}
return iter->second.workarounds.npu_batched_matmul_k_dimension_limit;
}

Place this method into the unnamed namespace.
std::optional<uint32_t> npu_batched_matmul_k_dimension_limit;

Nit: in a struct, data member definitions should be declared before functions.
// The maximum K dimension size of batched MatMul on certain NPU devices.

You may want to explain that it is unnecessary to compute the `|=` operation result across EP devices, because there will be only one NPU device EP.
// The fix is expected to be available in NPU driver Feb release.

```suggestion
// The fix is expected to be available in NPU driver Feb '26 release.
```
Done
Place this method into the unnamed namespace.
Done
std::optional<uint32_t> npu_batched_matmul_k_dimension_limit;

Nit: in a struct, data member definitions should be declared before functions.
Done
// The maximum K dimension size of batched MatMul on certain NPU devices.

You may want to explain that it is unnecessary to compute the `|=` operation result across EP devices, because there will be only one NPU device EP.
| Code-Review | +1 |
WebNN: Reject batched MatMul with large K dimension size on OV NPU
The OpenVINO NPU EP will take a very long time to compile models
with batched MatMul operations whose K dimension size is too large.
This CL rejects such MatMul operations to prevent WebNN from
becoming unresponsive.
The WPT PR for this CL has been merged upstream! https://github.com/web-platform-tests/wpt/pull/57011