Relaxed SIMD extends the existing SIMD proposal to introduce vector instructions that relax the strict determinism constraints of portable SIMD to take better advantage of the underlying hardware. The operations introduced in this proposal take advantage of widely available instruction sets to accelerate compute workloads.
Does this intent deprecate or change behavior of existing APIs, such that it has potentially high risk for Android WebView-based applications? No
Supported instructions are enabled in Liftoff, and are visible to DevTools for debuggability.
114
No spec changes are anticipated, but there is some potential for compat issues depending on hardware; more details are in this Entropy.md and the linked issues.
Pasting, and responding to entropy questions from the previous thread:
>> For most of the exposed entropy, we already expose this via the User-Agent string, or the Arch UA Client Hint. Can you say more about "Differences between hardware that has native FMA support, and hardware that does not." and "whether the Dot product extension is supported in the most optimal codegen" - any idea what the distributions would look like there?
For FMA, on x86, Intel processors from Haswell (2013) onwards and AMD processors from Piledriver (2012) onwards support native FMA operations. On ARM64, Neon support implies FMA is natively supported as well (Neon is the baseline on ARM64 for being able to generate any vector instructions). On Arm32, the vfma/vfms instructions are supported on Armv7 onwards. So most recent processors have native support for FMA.
Regarding the dot product instruction, the SDOT instruction is natively supported on Armv8.2+. We don't implement the AVX2-VNNI lowering at this time, because it is a newer extension and we are unable to test it on our bots. More details are outlined in this issue under "How does behavior differ across processors?".
>> As to compat, "code compiled for one browser works differently on a different browser" - this sounds a little bit scary! Do we have any ideas on how to minimize (I assume preventing isn't a reality) this outcome?
The proposal tries to minimize this by being very prescriptive about optimal instruction sequences, for a consistent outcome. While we expect browser engines to use this set of instructions in their code generation, loosening the determinism means that we don't have a way to guarantee that.
Thanks,
Deepti
On Thu, Mar 9, 2023 at 11:06 PM Deepti Gandluri <gde...@chromium.org> wrote:
Contact emails
gde...@chromium.org, zhin@chromium.org, thibaudm@chromium.org
>> For FMA, on x86, everything from Haswell (2013) onwards, and Piledriver (2012) onwards on AMD support native FMA operations. On ARM64, Neon support implies FMA would be natively supported as well (Neon is the baseline on ARM64 for being able to generate any vector instructions). On Arm32, the vfma/vfms instructions are supported on Armv7 onwards. So most recent processors have native support for FMA.
Am I correct in interpreting this to mean that for devices made in the last decade, there wouldn't be substantial exposed entropy for FMA?
>> Regarding the dot product instruction, the SDOT instruction is natively supported on Armv8.2+ , we don't currently implement the AVX2-VNNI implementation at this time as a newer extension and the inability to test it on our bots. More details are outlined in this issue under "How does behavior differ across processors?".
The "How does behavior differ across processors?" section lists three different options for dot products. Which option is Chrome pursuing? Do you know how much entropy is exposed here?
On Friday, March 10, 2023 at 2:39:03 AM UTC-5 Deepti Gandluri wrote:
On 3/14/23 12:54 PM, Deepti Gandluri wrote:
On Mon, Mar 13, 2023 at 8:38 AM Paul Jensen <paulj...@chromium.org> wrote:
>> For FMA, on x86, everything from Haswell (2013) onwards, and Piledriver (2012) onwards on AMD support native FMA operations. On ARM64, Neon support implies FMA would be natively supported as well (Neon is the baseline on ARM64 for being able to generate any vector instructions). On Arm32, the vfma/vfms instructions are supported on Armv7 onwards. So most recent processors have native support for FMA.
Am I correct in interpreting this to mean that for devices made in the last decade, there wouldn't be substantial exposed entropy for FMA?
Yes, that is correct. One small caveat: although the processors were released in 2013, consumer hardware lags behind, so it is not strictly a decade, but I would guess close to it.
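To make the FMA entropy concrete, here is a minimal sketch of why hardware with and without native FMA can produce observably different results. It is an illustration only, not Chrome's codegen: the function names `madd_unfused` and `madd_fused` are hypothetical, and the fused result is emulated with exact rational arithmetic rather than a hardware instruction.

```python
from fractions import Fraction

def madd_unfused(a, b, c):
    # Two roundings: a*b is rounded to a double, then the sum is rounded.
    return a * b + c

def madd_fused(a, b, c):
    # One rounding: a*b + c is computed exactly, then rounded once,
    # which is how a hardware fused multiply-add behaves.
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)

# Inputs chosen so that the exact product 1 - 2**-60 is not representable
# as a double and rounds to 1.0 in the unfused path.
a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
c = -1.0

unfused = madd_unfused(a, b, c)
fused = madd_fused(a, b, c)
print(unfused, fused)
```

A page that runs a relaxed multiply-add on inputs like these and inspects the result could, in principle, tell the two cases apart, which is the single bit of entropy being discussed.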
>> Regarding the dot product instruction, the SDOT instruction is natively supported on Armv8.2+ , we don't currently implement the AVX2-VNNI implementation at this time as a newer extension and the inability to test it on our bots. More details are outlined in this issue under "How does behavior differ across processors?".
The "How does behavior differ across processors?" section lists three different options for dot products. Which option is Chrome pursuing? Do you know how much entropy is exposed here?
On x86/64, we use the lowerings for "x86/x86-64 processors with AVX instruction set", so we don't support the most optimal lowering at this time (though we are experimenting with it for prototyping/benchmarking). On ARM64, we are using the "ARM64 processors with Dot Product extension" option, which is supported from ARMv8.2+. This is available in all the newer Pixel phones (2018 onwards), and on the ARM64-based M1/M2 MacBooks. Older Android phones and ARM Chromebooks do not have native hardware support for dot product instructions. The reason we decided to support this on the newer hardware was a roughly 2-4x performance speedup (over existing Wasm SIMD) for benchmarks that are sensitive to this.
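As a rough illustration of where the dot product entropy can come from, the sketch below models two simplified ways a pairwise 8-bit dot product might be lowered. This is an assumption-laden model, not Chrome's actual code generation: the helper names are hypothetical, one model treats both operands as signed, and the other treats the second operand as unsigned and saturates the 16-bit sum. The two agree whenever the second operand stays within 7 bits (0..127) and can differ only when its top bit is set.

```python
def to_signed8(x):
    # Interpret the low 8 bits of x as a signed byte.
    x &= 0xFF
    return x - 256 if x >= 128 else x

def saturate16(x):
    # Clamp to the signed 16-bit range.
    return max(-32768, min(32767, x))

def dot_pair_signed(a, b):
    # Hypothetical model 1: both byte pairs are treated as signed.
    # Overflow handling of the sum is deliberately not modelled here.
    return sum(to_signed8(x) * to_signed8(y) for x, y in zip(a, b))

def dot_pair_unsigned_sat(a, b):
    # Hypothetical model 2: second operand treated as unsigned bytes,
    # with the pairwise sum saturated to signed 16 bits.
    return saturate16(sum(to_signed8(x) * (y & 0xFF) for x, y in zip(a, b)))

# Second operand within 7 bits: both models agree.
in_range = (dot_pair_signed([-5, 6], [3, 4]),
            dot_pair_unsigned_sat([-5, 6], [3, 4]))

# Second operand with the top bit set: the models diverge.
out_of_range = (dot_pair_signed([127, 127], [0xFF, 0xFF]),
                dot_pair_unsigned_sat([127, 127], [0xFF, 0xFF]))
print(in_range, out_of_range)
```

Under this model, code that keeps the second operand within 7 bits sees identical results everywhere, so only inputs outside that range would expose which lowering is in use.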