[dev.simd] simd, cmd/compile: add WASM SIMD first steps
DO NOT SUBMIT
WIP
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
First round of review. Looks good overall. Thanks.
genericOps = append(genericOps, simdGenericOps()...)
genericOps = append(genericOps, simdGenericOpsWasm()...)This suggests that our existing simdGenericOps are still architecture dependent, in that they are generic ops, but corresponding to a specific architecture. Maybe we should rename? Maybe we should find another way to enumerate all generic ops?
{name: "AndMask8x16", argLength: 2, commutative: true},
{name: "AndMask16x8", argLength: 2, commutative: true},
{name: "AndMask32x4", argLength: 2, commutative: true},
{name: "AndMask64x2", argLength: 2, commutative: true},
{name: "AndNotMask8x16", argLength: 2},
{name: "AndNotMask16x8", argLength: 2},
{name: "AndNotMask32x4", argLength: 2},
{name: "AndNotMask64x2", argLength: 2},It looks like Mask types are just int vector types, the same as AMD64. On AMD64 we don't need Mask ops as generic ops. Instead, we just intrinsify Mask8x18.And to AndInt8x16. Can we do the same for Wasm?
//go:build goexperiment.simd && amd64We probably don't want this. See other comments. Some files are available on all architectures, like cpu.go. Other files use _amd64 suffix in the file name as the constraint.
//go:build goexperiment.simd && amd64The doc comments below says that the features are defined on all architectures, just return false if it is not X86. So I think we should not make this change.
//go:build amd64||wasmAdd a space before and after ||.
//go:build goexperiment.simd && amd64The build constraint is implied by the file name, so this shouldn't be necessary.
// i8x16.absWe should format this in the same way as amd64.
// Mask8x16 is a 128-bit SIMD vector of 16 int8s.This is for a vector type, not a mask type.
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
All but two comments addressed so far, want to remail the stack and iterate.
genericOps = append(genericOps, simdGenericOps()...)
genericOps = append(genericOps, simdGenericOpsWasm()...)This suggests that our existing simdGenericOps are still architecture dependent, in that they are generic ops, but corresponding to a specific architecture. Maybe we should rename? Maybe we should find another way to enumerate all generic ops?
I think I might want to figure this out in a separate CL, because it will involve coordinating with AMD64 while other changes are in flight.
We probably don't want this. See other comments. Some files are available on all architectures, like cpu.go. Other files use _amd64 suffix in the file name as the constraint.
Done
The doc comments below says that the features are defined on all architectures, just return false if it is not X86. So I think we should not make this change.
Done
Add a space before and after ||.
Done
The build constraint is implied by the file name, so this shouldn't be necessary.
Done
This is for a vector type, not a mask type.
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
{name: "AndMask8x16", argLength: 2, commutative: true},
{name: "AndMask16x8", argLength: 2, commutative: true},
{name: "AndMask32x4", argLength: 2, commutative: true},
{name: "AndMask64x2", argLength: 2, commutative: true},
{name: "AndNotMask8x16", argLength: 2},
{name: "AndNotMask16x8", argLength: 2},
{name: "AndNotMask32x4", argLength: 2},
{name: "AndNotMask64x2", argLength: 2},It looks like Mask types are just int vector types, the same as AMD64. On AMD64 we don't need Mask ops as generic ops. Instead, we just intrinsify Mask8x18.And to AndInt8x16. Can we do the same for Wasm?
Done
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
// i8x16.absDavid ChaseWe should format this in the same way as amd64.
Done
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
{name: "AbsInt8x16", argLength: 1}, // ARCH:amd64Is this right? Shouldn't it be ARCH:amd64,wasm ?...
case ssa.OpWasmI8x16AllTrue:
getValue128(s, v.Args[0])
s.Prog(v.Op.Asm())
case ssa.OpWasmV128AnyTrue:
getValue128(s, v.Args[0])
s.Prog(v.Op.Asm())Can merge them?
There are many other similarly mergeable places too
// Copyright 2025 The Go Authors. All rights reserved.Should these belong to CL 772102?
// Bitselect returns the elementwise bitwise selection using mask m.BitSelect?
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
Is this right? Shouldn't it be ARCH:amd64,wasm ?...
It should, if Wasm had that instruction, but it does not. Floating point only.
vunopf𝑁x𝑀 ::= abs | neg | sqrt | ceil | floor | trunc | nearest
case ssa.OpWasmI8x16AllTrue:
getValue128(s, v.Args[0])
s.Prog(v.Op.Asm())
case ssa.OpWasmV128AnyTrue:
getValue128(s, v.Args[0])
s.Prog(v.Op.Asm())Can merge them?
There are many other similarly mergeable places too
This file is all merged now.
Should these belong to CL 772102?
It could go there, so I did that. Originally the Musman fixes had not landed yet.
The AI-suggested fix is irrelevant and would probably provoke a copyright-test failure, be sure to tell the robot it made a mistake.
Should these belong to CL 772102?
Done
// Bitselect returns the elementwise bitwise selection using mask m.David ChaseBitSelect?
I fixed the declaration/documentation; we might not want this for all the types, but for now it is.
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
| Code-Review | +1 |
Looks good overall. The compiler change looks good. Haven't looked carfully on the generator. Thanks.
simdregmask: fp,Let's use v. (v and fp are equal, just for clarity.)
// "R15", // because the register allocator currently has a 64-register limit, exclude a few registers.The limit no longer applies. Maybe we don't need this now.
sourceMask := s.compatRegs(v.Type)
if !sourceMask.hasReg(r) && !onWasmStack {
// Assign a temporary register that can be copied to the desired destination;
// this at least works where it is currently a problem (x86).
// This happens processing e.g. ASAN/TSAN with SIMD *simdtype methods.
s.setOrig(c, v)
s.assignReg(s.allocReg(sourceMask, v), v, c)
c = s.curBlock.NewValue1(pos, OpCopy, v.Type, c)
}This seems unrelated to this CL? Maybe another CL?
if base.Ctxt.Retpoline {
// Note spectre=all implies retpoline which requires binary search instead of table switch.
return branchTableImm8(s, idx, intrinsicCall, genOp)
}
This may be another CL.
if base.Flag.Cfg.SpectreIndex {
// Potential Spectre vulnerability hardening?
idx = s.newValue2(ssa.OpSpectreSliceIndex, t, idx, s.uintptrConstant(255))
}For spectre=index but not all, we may need this? Either way, should be another CL.
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
Let's use v. (v and fp are equal, just for clarity.)
Done
// "R15", // because the register allocator currently has a 64-register limit, exclude a few registers.The limit no longer applies. Maybe we don't need this now.
Done
sourceMask := s.compatRegs(v.Type)
if !sourceMask.hasReg(r) && !onWasmStack {
// Assign a temporary register that can be copied to the desired destination;
// this at least works where it is currently a problem (x86).
// This happens processing e.g. ASAN/TSAN with SIMD *simdtype methods.
s.setOrig(c, v)
s.assignReg(s.allocReg(sourceMask, v), v, c)
c = s.curBlock.NewValue1(pos, OpCopy, v.Type, c)
}This seems unrelated to this CL? Maybe another CL?
I moved it into its own CL. The bug, now disappeared, occurred in the "use uint8 for shift distance" CL that has been abandoned, and I can't figure out how to make it happen again.
if base.Ctxt.Retpoline {
// Note spectre=all implies retpoline which requires binary search instead of table switch.
return branchTableImm8(s, idx, intrinsicCall, genOp)
}
This may be another CL.
This is bringing this code in line with the main repository. I stole this code to fix the SIMD-retpoline bug there, and then fixed it and tweaked it along the way. Not matching exactly means extra fun for the next merge.
if base.Flag.Cfg.SpectreIndex {
// Potential Spectre vulnerability hardening?
idx = s.newValue2(ssa.OpSpectreSliceIndex, t, idx, s.uintptrConstant(255))
}For spectre=index but not all, we may need this? Either way, should be another CL.
We did not seem to need this; I could not make it fail without it, and Keith thinks that because it does all of the values, it does not apply. It matches the main repo.
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
// If s has 16 or more elements, the function is equivalent to LoadInt8x16Slice.I noticed the new slicepart_128.go still has a few stale *Slice references in comments from before the load/store rename. Fixing them would help git auto-merge cleanly regardless of whether this CL or CL 752803 (ARM64 SIMD) lands first:
```
sed -i 's/LoadInt8x16Slice/LoadInt8x16/g; s/LoadInt16x8Slice/LoadInt16x8/g; s/LoadUint8x16Slice/LoadUint8x16/g; s/LoadUint16x8Slice/LoadUint16x8/g; s/x\.StoreSlice/x.Store/g' src/simd/archsimd/slicepart_128.go
```
Lines 56, 109, 161, 194, 228, 253.
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
// If s has 16 or more elements, the function is equivalent to LoadInt8x16Slice.I noticed the new slicepart_128.go still has a few stale *Slice references in comments from before the load/store rename. Fixing them would help git auto-merge cleanly regardless of whether this CL or CL 752803 (ARM64 SIMD) lands first:
```
sed -i 's/LoadInt8x16Slice/LoadInt8x16/g; s/LoadInt16x8Slice/LoadInt16x8/g; s/LoadUint8x16Slice/LoadUint8x16/g; s/LoadUint16x8Slice/LoadUint16x8/g; s/x\.StoreSlice/x.Store/g' src/simd/archsimd/slicepart_128.go
```Lines 56, 109, 161, 194, 228, 253.
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |