Intent to Ship: WebAssembly Relaxed SIMD

Deepti Gandluri

Mar 10, 2023, 2:06:25 AM
to blin...@chromium.org, Lutz Vahl, Zhi An Ng, Thibaud Michaud

Contact emails

gde...@chromium.org, zh...@chromium.org, thib...@chromium.org

Explainer

https://github.com/WebAssembly/relaxed-simd/blob/main/proposals/relaxed-simd/Overview.md

Specification

https://github.com/WebAssembly/relaxed-simd/tree/main/document

Summary

Relaxed SIMD extends the existing SIMD proposal to introduce vector instructions that relax the strict determinism constraints of portable SIMD to take better advantage of the underlying hardware. The operations introduced in this proposal take advantage of widely available instruction sets to accelerate compute workloads.


Blink component

Blink>JavaScript>WebAssembly

TAG review

Not required as per: https://v8.dev/docs/feature-launch-process. This introduces an additional set of vector operations to WebAssembly, and makes no API changes.


Risks



Interoperability and Compatibility


Gecko: In development, enabled in nightly

WebKit: Neutral as per issue comment

Web developers: Strongly positive. The proposal is in Phase 3 in the WebAssembly CG; it was incubated to address some of the developer/user requests from the previous SIMD proposal.

Other signals: Proposal voted to a provisional phase 4 as per meeting notes in the February 14th CG meeting (notes: https://github.com/WebAssembly/meetings/blob/main/main/2023/CG-02-14.md). The feature has consensus in the CG, but the vote is provisional till the formal spec is up to date (Tracking issue: https://github.com/WebAssembly/relaxed-simd/issues/134, PRs also in flight).

WebView application risks

Does this intent deprecate or change behavior of existing APIs, such that it has potentially high risk for Android WebView-based applications? No



Debuggability

Supported instructions are enabled in Liftoff, and are visible to DevTools for debuggability.



Will this feature be supported on all six Blink platforms (Windows, Mac, Linux, Chrome OS, Android, and Android WebView)?

Yes

Is this feature fully tested by web-platform-tests?

Not applicable, tested by WebAssembly spec tests

Flag name

V8: --wasm-relaxed-simd

Chrome: Features::kWebAssemblyRelaxedSimd

Requires code in //chrome?

False

Tracking bug

https://bugs.chromium.org/p/v8/issues/detail?id=12284

Estimated milestones

114


Anticipated spec changes

No spec changes are anticipated, but there is some potential for compat issues based on hardware; more details are in Entropy.md and the linked issues.


Link to entry on the Chrome Platform Status

https://chromestatus.com/feature/5082417973952512

Deepti Gandluri

Mar 10, 2023, 2:39:03 AM
to blin...@chromium.org, Mike Taylor, Lutz Vahl, Zhi An Ng, Thibaud Michaud
Pasting, and responding to entropy questions from the previous thread: 

>> For most of the exposed entropy, we already expose this via the User-Agent string, or the Arch UA Client Hint. Can you say more about "Differences between hardware that has native FMA support, and hardware that does not." and "whether the Dot product extension is supported in the most optimal codegen" - any idea what the distributions would look like there?

For FMA: on x86, everything from Intel Haswell (2013) onwards and AMD Piledriver (2012) onwards supports native FMA operations. On ARM64, Neon support implies FMA is natively supported as well (Neon is the baseline on ARM64 for being able to generate any vector instructions). On Arm32, the vfma/vfms instructions are supported from Armv7 onwards. So most recent processors have native support for FMA.
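
To illustrate why hardware with and without native FMA can produce observably different results, here is a minimal Python sketch. It is not taken from the proposal or from V8: the function names are hypothetical, it uses doubles rather than f32 lanes, and it emulates the fused operation by computing a*b+c exactly with fractions.Fraction and rounding once.

```python
from fractions import Fraction


def unfused_madd(a: float, b: float, c: float) -> float:
    """Multiply and round, then add and round again (two roundings)."""
    return a * b + c


def fused_madd(a: float, b: float, c: float) -> float:
    """Emulate a fused multiply-add: exact a*b+c, rounded only once.

    Assumption for illustration: converting floats to Fraction is exact,
    and float(Fraction) rounds the exact value to the nearest double.
    """
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)


# Inputs chosen so that the exact product 1 - 2**-60 rounds to 1.0
# when it is stored as a double before the addition.
a = 1.0 + 2.0 ** -30
b = 1.0 - 2.0 ** -30
c = -1.0

print(unfused_madd(a, b, c))                 # 0.0
print(fused_madd(a, b, c) == -(2.0 ** -60))  # True
```

A page could run inputs like these through a relaxed multiply-add and tell the two cases apart from the result, which is the kind of exposed entropy discussed here.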

Regarding the dot product instruction, SDOT is natively supported on Armv8.2+. We don't currently implement the AVX2-VNNI lowering, because it is a newer extension and we are unable to test it on our bots. More details are outlined in this issue under "How does behavior differ across processors?".

>> As to compat, "code compiled for one browser works differently on a different browser" - this sounds a little bit scary! Do we have any ideas on how to minimize (I assume preventing isn't a reality) this outcome?

The proposal tries to minimize this by being very prescriptive about the optimal instruction sequences, for a consistent outcome. While we expect browser engines to use this set of instructions in their code generation, loosening the determinism means that we have no way to guarantee that.

Thanks,
Deepti

Paul Jensen

Mar 13, 2023, 11:38:12 AM
to blink-dev, Deepti Gandluri, Lutz Vahl, Zhi An Ng, Thibaud Michaud, Mike Taylor
>> For FMA, on x86, everything from Haswell (2013) onwards, and Piledriver (2012) onwards on  AMD support native FMA operations. On ARM64, Neon support implies FMA would be natively supported as well (Neon is the baseline on ARM64 for being able to generate any vector instructions). On Arm32, the vfma/vfms instructions are supported on Armv7 onwards. So most recent processors have native support for FMA.

Am I correct in interpreting this to mean that for devices made in the last decade, there wouldn't be substantial exposed entropy for FMA?

>> Regarding the dot product instruction, the SDOT instruction is natively supported on Armv8.2+ , we don't currently implement the AVX2-VNNI implementation at this time as a newer extension and the inability to test it on our bots. More details are outlined in this issue under "How does behavior differ across processors?".

The "How does behavior differ across processors?" section lists three different options for dot products.  Which option is Chrome pursuing?  Do you know how much entropy is exposed here?

Deepti Gandluri

Mar 14, 2023, 12:54:54 PM
to Paul Jensen, blink-dev, Lutz Vahl, Zhi An Ng, Thibaud Michaud
On Mon, Mar 13, 2023 at 8:38 AM Paul Jensen <paulj...@chromium.org> wrote:
>> For FMA, on x86, everything from Haswell (2013) onwards, and Piledriver (2012) onwards on  AMD support native FMA operations. On ARM64, Neon support implies FMA would be natively supported as well (Neon is the baseline on ARM64 for being able to generate any vector instructions). On Arm32, the vfma/vfms instructions are supported on Armv7 onwards. So most recent processors have native support for FMA.

> Am I correct in interpreting this to mean that for devices made in the last decade, there wouldn't be substantial exposed entropy for FMA?

Yes, that is correct. One small caveat: although the processors were released around 2013, consumer hardware does lag, so it is not strictly a decade, but I would guess close to it.

>> Regarding the dot product instruction, the SDOT instruction is natively supported on Armv8.2+ , we don't currently implement the AVX2-VNNI implementation at this time as a newer extension and the inability to test it on our bots. More details are outlined in this issue under "How does behavior differ across processors?".

> The "How does behavior differ across processors?" section lists three different options for dot products.  Which option is Chrome pursuing?  Do you know how much entropy is exposed here?

On x86/64, we use the lowerings for "x86/x86-64 processors with AVX instruction set", so we don't support the most optimal lowering at this time (though we are experimenting with it for prototyping/benchmarking). On ARM64, we are using the "ARM64 processors with Dot Product extension" option, which is supported from ARMv8.2+. This is available in all the newer Pixel phones (2018 onwards) and on the ARM64-based M1/M2 MacBooks. Older Android phones and ARM Chromebooks do not have native hardware support for dot product instructions. The reason we decided to support this on the newer hardware was a ~2-4x performance speedup (over existing Wasm+SIMD) for benchmarks that are sensitive to this.
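
As a rough illustration of where the dot product lowerings can diverge, here is a small Python sketch. It rests on an assumption drawn from my reading of the proposal overview, not from this thread: that the relaxed dot product leaves the result implementation-defined when a byte of the second operand has its high bit set, because one lowering may read that byte as signed and another as unsigned. The helper names are hypothetical, and saturation of the 16-bit sum is not modelled.

```python
def as_signed8(byte: int) -> int:
    """Reinterpret a byte in 0..255 as a signed 8-bit integer."""
    return byte - 256 if byte & 0x80 else byte


def dot_lane(a_bytes, b_bytes, b_is_signed: bool) -> int:
    """Sum of products for one output lane.

    The first operand is always read as signed. The second operand is
    read as signed or unsigned depending on the (assumed) lowering.
    """
    total = 0
    for a, b in zip(a_bytes, b_bytes):
        rhs = as_signed8(b) if b_is_signed else b
        total += as_signed8(a) * rhs
    return total


# Second operand fits in 7 bits: both interpretations agree.
print(dot_lane((2, 1), (0x7F, 1), True))    # 255
print(dot_lane((2, 1), (0x7F, 1), False))   # 255

# Second operand has the high bit set: the interpretations disagree.
print(dot_lane((2, 1), (0x80, 1), True))    # -255
print(dot_lane((2, 1), (0x80, 1), False))   # 257
```

Under this assumption, code that keeps the second operand within 7 bits would see the same result on every lowering, and only out-of-range inputs would reveal which one is in use.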

Thanks,
Deepti

Mike Taylor

Mar 15, 2023, 11:58:14 AM
to Deepti Gandluri, Paul Jensen, blink-dev, Lutz Vahl, Zhi An Ng, Thibaud Michaud

Thanks for the answers here, Deepti. Much appreciated. I'm somewhat out of my depth here, admittedly, but don't see anything objectionable.