void InstructionSelector::VisitI32x4DotI8x16S(OpIndex node) {This is a test; please ignore it (and mark is as "resolved" when you want to land)
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
Hi Matthias, could you PTAL?
Hi Sam, I have a large amount of bugs filed in my name that I have been investigating in the last few days (I did the "mistake" of adding some more validation to V8 which is now causing all kinds of issues that fuzzers find.)
I didn't really have the time yet for reviewing this in detail, but I'm reasonably concerned of the risk of this.
IIUC we match a very specific pattern in the machine optimization reducer, then we try to match that pattern much later during ISel.
It is probably rather unlikely that fuzzers will be able to generate this pattern, so this reminds me of https://chromium-review.git.corp.google.com/c/v8/v8/+/7462509.
Would this be a similar case where it's probably necessary that we make sure via some custom fuzzing logic that this pattern works as expected? While times have changed and AI has gotten much better, I'm not sure how much I'd want to rely on that?
if (auto dot = Get(binop.left()).TryCast<Simd128BinopOp>()) {Don't we typically also check for `CanCover` to ensure we only apply the optimization if the intermediate result isn't used by other operations?
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
Matthias LiedtkeHi Matthias, could you PTAL?
Hi Sam, I have a large amount of bugs filed in my name that I have been investigating in the last few days (I did the "mistake" of adding some more validation to V8 which is now causing all kinds of issues that fuzzers find.)
I didn't really have the time yet for reviewing this in detail, but I'm reasonably concerned of the risk of this.
IIUC we match a very specific pattern in the machine optimization reducer, then we try to match that pattern much later during ISel.It is probably rather unlikely that fuzzers will be able to generate this pattern, so this reminds me of https://chromium-review.git.corp.google.com/c/v8/v8/+/7462509.
Would this be a similar case where it's probably necessary that we make sure via some custom fuzzing logic that this pattern works as expected? While times have changed and AI has gotten much better, I'm not sure how much I'd want to rely on that?
I have a large amount of bugs filed in my name
I know that feeling :)
Okay, I'll try to put something together.
if (auto dot = Get(binop.left()).TryCast<Simd128BinopOp>()) {Don't we typically also check for `CanCover` to ensure we only apply the optimization if the intermediate result isn't used by other operations?
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
if (auto dot = Get(binop.left()).TryCast<Simd128BinopOp>()) {Sam Parker-HaynesDon't we typically also check for `CanCover` to ensure we only apply the optimization if the intermediate result isn't used by other operations?
Argh, yes. Working in the IR has spoiled me.
Done
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
Hmmmm, is it possible that my fuzz test has discovered a bug?
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
Hmmmm, is it possible that my fuzz test has discovered a bug?
Possibly, or something is wrong in the fuzztest itself? You should be able to reproduce locally and investigate the case it produces and then you could verify if this is correct or not?
Let me know if you run into any issues or need guidance on this.
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
Matthias LiedtkeHmmmm, is it possible that my fuzz test has discovered a bug?
Possibly, or something is wrong in the fuzztest itself? You should be able to reproduce locally and investigate the case it produces and then you could verify if this is correct or not?
Let me know if you run into any issues or need guidance on this.
It seems to stem from no AVX support. The first failed on only that specific runner and now that I've added AVX as a fuzzing bool, it fails for basically all of the x64 builders.
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
Matthias LiedtkeHmmmm, is it possible that my fuzz test has discovered a bug?
Sam Parker-HaynesPossibly, or something is wrong in the fuzztest itself? You should be able to reproduce locally and investigate the case it produces and then you could verify if this is correct or not?
Let me know if you run into any issues or need guidance on this.
It seems to stem from no AVX support. The first failed on only that specific runner and now that I've added AVX as a fuzzing bool, it fails for basically all of the x64 builders.
I can confirm that the no-AVX results disagree with liftoff. The x64 turboshaft implementations of ExtAddPairwise seem suspicious as it looks as though the instructions overwrite their input, even though the input may have multiple users, which is one of the things I fuzz for.
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
Matthias LiedtkeHmmmm, is it possible that my fuzz test has discovered a bug?
Sam Parker-HaynesPossibly, or something is wrong in the fuzztest itself? You should be able to reproduce locally and investigate the case it produces and then you could verify if this is correct or not?
Let me know if you run into any issues or need guidance on this.
Sam Parker-HaynesIt seems to stem from no AVX support. The first failed on only that specific runner and now that I've added AVX as a fuzzing bool, it fails for basically all of the x64 builders.
I can confirm that the no-AVX results disagree with liftoff. The x64 turboshaft implementations of ExtAddPairwise seem suspicious as it looks as though the instructions overwrite their input, even though the input may have multiple users, which is one of the things I fuzz for.
Do you have someone that can take a look at x64? Looking at the macroassembler it looks like ExtAddPairwise supported is predicated on AVX support (https://source.chromium.org/chromium/chromium/src/+/main:v8/src/codegen/x64/macro-assembler-x64.cc;drc=a9c0f953cb7441880a0fff7a245b17a000e187df;l=2291) so I'm very confused, none of it looks right to me.
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
Matthias LiedtkeHmmmm, is it possible that my fuzz test has discovered a bug?
Sam Parker-HaynesPossibly, or something is wrong in the fuzztest itself? You should be able to reproduce locally and investigate the case it produces and then you could verify if this is correct or not?
Let me know if you run into any issues or need guidance on this.
Sam Parker-HaynesIt seems to stem from no AVX support. The first failed on only that specific runner and now that I've added AVX as a fuzzing bool, it fails for basically all of the x64 builders.
Sam Parker-HaynesI can confirm that the no-AVX results disagree with liftoff. The x64 turboshaft implementations of ExtAddPairwise seem suspicious as it looks as though the instructions overwrite their input, even though the input may have multiple users, which is one of the things I fuzz for.
Do you have someone that can take a look at x64? Looking at the macroassembler it looks like ExtAddPairwise supported is predicated on AVX support (https://source.chromium.org/chromium/chromium/src/+/main:v8/src/codegen/x64/macro-assembler-x64.cc;drc=a9c0f953cb7441880a0fff7a245b17a000e187df;l=2291) so I'm very confused, none of it looks right to me.
In general, I'm a good candidate, I can reproduce it. Let's see if AI can resolve this for me, otherwise I'll need to see if I find some time in the next few days to take a look.
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
Matthias LiedtkeHmmmm, is it possible that my fuzz test has discovered a bug?
Sam Parker-HaynesPossibly, or something is wrong in the fuzztest itself? You should be able to reproduce locally and investigate the case it produces and then you could verify if this is correct or not?
Let me know if you run into any issues or need guidance on this.
Sam Parker-HaynesIt seems to stem from no AVX support. The first failed on only that specific runner and now that I've added AVX as a fuzzing bool, it fails for basically all of the x64 builders.
Sam Parker-HaynesI can confirm that the no-AVX results disagree with liftoff. The x64 turboshaft implementations of ExtAddPairwise seem suspicious as it looks as though the instructions overwrite their input, even though the input may have multiple users, which is one of the things I fuzz for.
Matthias LiedtkeDo you have someone that can take a look at x64? Looking at the macroassembler it looks like ExtAddPairwise supported is predicated on AVX support (https://source.chromium.org/chromium/chromium/src/+/main:v8/src/codegen/x64/macro-assembler-x64.cc;drc=a9c0f953cb7441880a0fff7a245b17a000e187df;l=2291) so I'm very confused, none of it looks right to me.
In general, I'm a good candidate, I can reproduce it. Let's see if AI can resolve this for me, otherwise I'll need to see if I find some time in the next few days to take a look.
Yes, my AI friend has confirmed my isel suspicions. I will try to get the fix into this patch.
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
Matthias LiedtkeHmmmm, is it possible that my fuzz test has discovered a bug?
Sam Parker-HaynesPossibly, or something is wrong in the fuzztest itself? You should be able to reproduce locally and investigate the case it produces and then you could verify if this is correct or not?
Let me know if you run into any issues or need guidance on this.
Sam Parker-HaynesIt seems to stem from no AVX support. The first failed on only that specific runner and now that I've added AVX as a fuzzing bool, it fails for basically all of the x64 builders.
Sam Parker-HaynesI can confirm that the no-AVX results disagree with liftoff. The x64 turboshaft implementations of ExtAddPairwise seem suspicious as it looks as though the instructions overwrite their input, even though the input may have multiple users, which is one of the things I fuzz for.
Matthias LiedtkeDo you have someone that can take a look at x64? Looking at the macroassembler it looks like ExtAddPairwise supported is predicated on AVX support (https://source.chromium.org/chromium/chromium/src/+/main:v8/src/codegen/x64/macro-assembler-x64.cc;drc=a9c0f953cb7441880a0fff7a245b17a000e187df;l=2291) so I'm very confused, none of it looks right to me.
Sam Parker-HaynesIn general, I'm a good candidate, I can reproduce it. Let's see if AI can resolve this for me, otherwise I'll need to see if I find some time in the next few days to take a look.
Yes, my AI friend has confirmed my isel suspicions. I will try to get the fix into this patch.
My AI fix is handling the case where `dst == src2` in `SharedMacroAssemblerBase::I16x8ExtMulHighS` for x64.
As this has nothing to do with your change, can you please upload it as a separate change? (You can then rebase this change on top of it as a dependent change)
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
Matthias LiedtkeHmmmm, is it possible that my fuzz test has discovered a bug?
Sam Parker-HaynesPossibly, or something is wrong in the fuzztest itself? You should be able to reproduce locally and investigate the case it produces and then you could verify if this is correct or not?
Let me know if you run into any issues or need guidance on this.
Sam Parker-HaynesIt seems to stem from no AVX support. The first failed on only that specific runner and now that I've added AVX as a fuzzing bool, it fails for basically all of the x64 builders.
Sam Parker-HaynesI can confirm that the no-AVX results disagree with liftoff. The x64 turboshaft implementations of ExtAddPairwise seem suspicious as it looks as though the instructions overwrite their input, even though the input may have multiple users, which is one of the things I fuzz for.
Matthias LiedtkeDo you have someone that can take a look at x64? Looking at the macroassembler it looks like ExtAddPairwise supported is predicated on AVX support (https://source.chromium.org/chromium/chromium/src/+/main:v8/src/codegen/x64/macro-assembler-x64.cc;drc=a9c0f953cb7441880a0fff7a245b17a000e187df;l=2291) so I'm very confused, none of it looks right to me.
Sam Parker-HaynesIn general, I'm a good candidate, I can reproduce it. Let's see if AI can resolve this for me, otherwise I'll need to see if I find some time in the next few days to take a look.
Matthias LiedtkeYes, my AI friend has confirmed my isel suspicions. I will try to get the fix into this patch.
My AI fix is handling the case where `dst == src2` in `SharedMacroAssemblerBase::I16x8ExtMulHighS` for x64.
As this has nothing to do with your change, can you please upload it as a separate change? (You can then rebase this change on top of it as a dependent change)
Impressive to find two bugs, in two opcodes, from such a small test! Okay, I'll put them both in a separate patch.
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
Matthias LiedtkeHmmmm, is it possible that my fuzz test has discovered a bug?
Sam Parker-HaynesPossibly, or something is wrong in the fuzztest itself? You should be able to reproduce locally and investigate the case it produces and then you could verify if this is correct or not?
Let me know if you run into any issues or need guidance on this.
Sam Parker-HaynesIt seems to stem from no AVX support. The first failed on only that specific runner and now that I've added AVX as a fuzzing bool, it fails for basically all of the x64 builders.
Sam Parker-HaynesI can confirm that the no-AVX results disagree with liftoff. The x64 turboshaft implementations of ExtAddPairwise seem suspicious as it looks as though the instructions overwrite their input, even though the input may have multiple users, which is one of the things I fuzz for.
Matthias LiedtkeDo you have someone that can take a look at x64? Looking at the macroassembler it looks like ExtAddPairwise supported is predicated on AVX support (https://source.chromium.org/chromium/chromium/src/+/main:v8/src/codegen/x64/macro-assembler-x64.cc;drc=a9c0f953cb7441880a0fff7a245b17a000e187df;l=2291) so I'm very confused, none of it looks right to me.
Sam Parker-HaynesIn general, I'm a good candidate, I can reproduce it. Let's see if AI can resolve this for me, otherwise I'll need to see if I find some time in the next few days to take a look.
Matthias LiedtkeYes, my AI friend has confirmed my isel suspicions. I will try to get the fix into this patch.
Sam Parker-HaynesMy AI fix is handling the case where `dst == src2` in `SharedMacroAssemblerBase::I16x8ExtMulHighS` for x64.
As this has nothing to do with your change, can you please upload it as a separate change? (You can then rebase this change on top of it as a dependent change)
Impressive to find two bugs, in two opcodes, from such a small test! Okay, I'll put them both in a separate patch.
One reason why I'm not getting to anything these days is because I landed a new verification for `OpEffects` in Turboshaft (we now have randomized "resccheduling" of operations, similar to what the x64 revectorization does). I got 9 fuzzer issues filed, some were duplicates but I already fixed 3 bugs and at least one of these issues is still unresolved. 😐
So yes, any kind of validation that we didn't have before has a decent chance of flushing out issues and especially "corner case" configurations and differential fuzzing are somewhat underrepresented I think.
Thanks a lot for adding it and for looking into the x64 issues, this isn't really your core mission I assume. 😊
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
| Code-Review | +1 |
LGTM with comments. Thanks both for investing the time in adding some fuzzing coverage here, investigating the x86 issues and being so patient on my review! (Just assuming you were patient and not incredibly annoyed by it. 😊)
InstructionOperand left = g.UseRegister(op.left());
InstructionOperand right = g.UseRegister(op.right());
InstructionOperand acc = g.TempSimd128Register();What was the logic with temp registers again? Do we need to use `g.UseUniqueRegister` above to prevent sharing between input and temp registers or was that only for input and output registers?
if (mul_left && mul_right && mul_left->kind != mul_right->kind &&So, one of them needs be ExtMulHigh and one of them needs to be ExtMulLow. That makes sense.
Is it correct that these operations are commutative? I can clearly see that on the input operation that is the case, is it also the case when transforming this to the shuffle operations? Or would the shuffle indices need to be adapted if the mulLow vs. mulHigh input order gets commutated?
(I feel like ideally I'd be able to answer this myself but I'm always struggling with mentally visualizing these shuffle transformations... 😞)
auto GetExtMul = [this](OpIndex input) -> const Simd128BinopOp* {Nit:
```suggestion
auto TryCastExtMul = [this](OpIndex input) -> const Simd128BinopOp* {
```
as this is semantically the same as `input.TryCast<ExtMulLow>() ?: input.TryCast<ExtMulHigh>()`?
// Which, with 16-bit lanes, is equivalent to: 0, 4, 1, 5, 2, 6, 3, 7Thanks for the detailed comment, I would've had no idea what I'm even looking at otherwise! 😊
...SimdInstr(kExprI16x8ExtMulLowI8x16S),
...SimdInstr(kExprI32x4ExtAddPairwiseI16x8S),
kExprLocalGet, 3,
kExprLocalGet, 4,
...SimdInstr(kExprI16x8ExtMulHighI8x16S),
...SimdInstr(kExprI32x4ExtAddPairwiseI16x8S),So maybe we can also swap this once `simd_dot_inverted` and make sure it produces the same values and that should help answering my question from above? 😊
emit_extmul_pairwise(tree.left_branch);
emit_extmul_pairwise(tree.right_branch);I guess, this will also cover the commutation question I brought up above IIUC.
bool disable_avx = !allow_avx && CpuFeatures::IsSupported(AVX);
if (disable_avx) CpuFeatures::SetUnsupported(AVX);So in general gtests can be run as a suite. I think for some reason that isn't really true for our gtests and we invoke them one by one from the `run-tests.py`(?) but I think the test case should still restore the previous state. Can you make this a scoped object that resets the AVX flag to the previous state in its destructor?
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
LGTM with comments. Thanks both for investing the time in adding some fuzzing coverage here, investigating the x86 issues and being so patient on my review! (Just assuming you were patient and not incredibly annoyed by it. 😊)
Ha, thanks for the review - asking for a fuzzer was clearly wise! It also found another bug... so I will put up another CL before I can land this!
InstructionOperand left = g.UseRegister(op.left());
InstructionOperand right = g.UseRegister(op.right());
InstructionOperand acc = g.TempSimd128Register();What was the logic with temp registers again? Do we need to use `g.UseUniqueRegister` above to prevent sharing between input and temp registers or was that only for input and output registers?
I can't remember the rules! One of my many gripes about V8 codegen :) I think it was for input and output registers, at least that's the only time I remember being pulled up on it. But looking at the comments in seems it's also dependent on what the code generator is doing with the operation, such as emitting multiple instructions and the input(s) are re-used across instructions. So, I don't think it's necessary here.
if (mul_left && mul_right && mul_left->kind != mul_right->kind &&So, one of them needs be ExtMulHigh and one of them needs to be ExtMulLow. That makes sense.
Is it correct that these operations are commutative? I can clearly see that on the input operation that is the case, is it also the case when transforming this to the shuffle operations? Or would the shuffle indices need to be adapted if the mulLow vs. mulHigh input order gets commutated?
(I feel like ideally I'd be able to answer this myself but I'm always struggling with mentally visualizing these shuffle transformations... 😞)
It doesn't matter whether low/high is on the left or right, but it does matter that they continue to be on the left and right going into the Dot.
auto GetExtMul = [this](OpIndex input) -> const Simd128BinopOp* {Nit:
```suggestion
auto TryCastExtMul = [this](OpIndex input) -> const Simd128BinopOp* {
```
as this is semantically the same as `input.TryCast<ExtMulLow>() ?: input.TryCast<ExtMulHigh>()`?
Done
...SimdInstr(kExprI16x8ExtMulLowI8x16S),
...SimdInstr(kExprI32x4ExtAddPairwiseI16x8S),
kExprLocalGet, 3,
kExprLocalGet, 4,
...SimdInstr(kExprI16x8ExtMulHighI8x16S),
...SimdInstr(kExprI32x4ExtAddPairwiseI16x8S),So maybe we can also swap this once `simd_dot_inverted` and make sure it produces the same values and that should help answering my question from above? 😊
Added a new function.
emit_extmul_pairwise(tree.left_branch);
emit_extmul_pairwise(tree.right_branch);I guess, this will also cover the commutation question I brought up above IIUC.
Hopefully :)
bool disable_avx = !allow_avx && CpuFeatures::IsSupported(AVX);
if (disable_avx) CpuFeatures::SetUnsupported(AVX);So in general gtests can be run as a suite. I think for some reason that isn't really true for our gtests and we invoke them one by one from the `run-tests.py`(?) but I think the test case should still restore the previous state. Can you make this a scoped object that resets the AVX flag to the previous state in its destructor?
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
| Code-Review | +1 |
void InstructionSelector::VisitI32x4DotI8x16S(OpIndex node) {This is a test; please ignore it (and mark is as "resolved" when you want to land)
Acknowledged
InstructionOperand left = g.UseRegister(op.left());
InstructionOperand right = g.UseRegister(op.right());
InstructionOperand acc = g.TempSimd128Register();Sam Parker-HaynesWhat was the logic with temp registers again? Do we need to use `g.UseUniqueRegister` above to prevent sharing between input and temp registers or was that only for input and output registers?
I can't remember the rules! One of my many gripes about V8 codegen :) I think it was for input and output registers, at least that's the only time I remember being pulled up on it. But looking at the comments in seems it's also dependent on what the code generator is doing with the operation, such as emitting multiple instructions and the input(s) are re-used across instructions. So, I don't think it's necessary here.
Me neither. The problem is that this is so hard to verify...
https://source.chromium.org/chromium/chromium/src/+/main:v8/src/compiler/backend/x64/instruction-selector-x64.cc;l=5633;drc=18b4ac79d20301512a20271e9735cd06ff515a0b
sounds like at least at some point someone thought that we might need to use `UseUniqueRegister` so that they don't alias with temp registers?
if (mul_left && mul_right && mul_left->kind != mul_right->kind &&Sam Parker-HaynesSo, one of them needs be ExtMulHigh and one of them needs to be ExtMulLow. That makes sense.
Is it correct that these operations are commutative? I can clearly see that on the input operation that is the case, is it also the case when transforming this to the shuffle operations? Or would the shuffle indices need to be adapted if the mulLow vs. mulHigh input order gets commutated?
(I feel like ideally I'd be able to answer this myself but I'm always struggling with mentally visualizing these shuffle transformations... 😞)
It doesn't matter whether low/high is on the left or right, but it does matter that they continue to be on the left and right going into the Dot.
Thanks for clarifying
bool disable_ = false;
explicit AVXSupport(bool allow) : disable_(!allow) {
if (disable_) CpuFeatures::SetUnsupported(AVX);
}
~AVXSupport() {
if (disable_) CpuFeatures::SetSupported(AVX);
}
};```suggestion
bool previous_;
explicit AVXSupport(bool allow) : previous_(CpuFeatures::IsSupported(AVX)) {
if (!allow) CpuFeatures::SetUnsupported(AVX);
}
~AVXSupport() {
if(previous_) {
CpuFeatures::SetSupported(AVX);
}
}
};
```
Right now if we enter with AVX disabled, and you call `AVXSupport(false)`, we'd enable it afterwards?
bool disable_avx = !allow_avx && CpuFeatures::IsSupported(AVX);
if (disable_avx) CpuFeatures::SetUnsupported(AVX);Sam Parker-HaynesSo in general gtests can be run as a suite. I think for some reason that isn't really true for our gtests and we invoke them one by one from the `run-tests.py`(?) but I think the test case should still restore the previous state. Can you make this a scoped object that resets the AVX flag to the previous state in its destructor?
I think I've done what you've asked for.
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
InstructionOperand left = g.UseRegister(op.left());
InstructionOperand right = g.UseRegister(op.right());
InstructionOperand acc = g.TempSimd128Register();Sam Parker-HaynesWhat was the logic with temp registers again? Do we need to use `g.UseUniqueRegister` above to prevent sharing between input and temp registers or was that only for input and output registers?
Matthias LiedtkeI can't remember the rules! One of my many gripes about V8 codegen :) I think it was for input and output registers, at least that's the only time I remember being pulled up on it. But looking at the comments in seems it's also dependent on what the code generator is doing with the operation, such as emitting multiple instructions and the input(s) are re-used across instructions. So, I don't think it's necessary here.
Me neither. The problem is that this is so hard to verify...
https://source.chromium.org/chromium/chromium/src/+/main:v8/src/compiler/backend/x64/instruction-selector-x64.cc;l=5633;drc=18b4ac79d20301512a20271e9735cd06ff515a0b
sounds like at least at some point someone thought that we might need to use `UseUniqueRegister` so that they don't alias with temp registers?
I think that's because it's passing a bunch of unassigned registers to the code generator to use. But in this case the temp register is being defined and then will be written over. Otherwise I don't see how Temp registers could ever be used instead of Unique..? I don't suppose we have a register allocation stress tester..?
bool disable_ = false;
explicit AVXSupport(bool allow) : disable_(!allow) {
if (disable_) CpuFeatures::SetUnsupported(AVX);
}
~AVXSupport() {
if (disable_) CpuFeatures::SetSupported(AVX);
}
};```suggestion
bool previous_;
explicit AVXSupport(bool allow) : previous_(CpuFeatures::IsSupported(AVX)) {
if (!allow) CpuFeatures::SetUnsupported(AVX);
}
~AVXSupport() {
if(previous_) {
CpuFeatures::SetSupported(AVX);
}
}
};
```
Right now if we enter with AVX disabled, and you call `AVXSupport(false)`, we'd enable it afterwards?
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
| Code-Review | +1 |
InstructionOperand left = g.UseRegister(op.left());
InstructionOperand right = g.UseRegister(op.right());
InstructionOperand acc = g.TempSimd128Register();Sam Parker-HaynesWhat was the logic with temp registers again? Do we need to use `g.UseUniqueRegister` above to prevent sharing between input and temp registers or was that only for input and output registers?
Matthias LiedtkeI can't remember the rules! One of my many gripes about V8 codegen :) I think it was for input and output registers, at least that's the only time I remember being pulled up on it. But looking at the comments in seems it's also dependent on what the code generator is doing with the operation, such as emitting multiple instructions and the input(s) are re-used across instructions. So, I don't think it's necessary here.
Sam Parker-HaynesMe neither. The problem is that this is so hard to verify...
https://source.chromium.org/chromium/chromium/src/+/main:v8/src/compiler/backend/x64/instruction-selector-x64.cc;l=5633;drc=18b4ac79d20301512a20271e9735cd06ff515a0b
sounds like at least at some point someone thought that we might need to use `UseUniqueRegister` so that they don't alias with temp registers?
I think that's because it's passing a bunch of unassigned registers to the code generator to use. But in this case the temp register is being defined and then will be written over. Otherwise I don't see how Temp registers could ever be used instead of Unique..? I don't suppose we have a register allocation stress tester..?
I think the only thing that finds register allocation bugs are fuzzers and LLMs, I don't think we have a way to stress-test register allocation decisions.
@dmerc...@chromium.org: Can you say with confidence whether `g.useRegister(input)` + `g.TempRegister()` could end up aliasing or not?
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
InstructionOperand left = g.UseRegister(op.left());
InstructionOperand right = g.UseRegister(op.right());
InstructionOperand acc = g.TempSimd128Register();Sam Parker-HaynesWhat was the logic with temp registers again? Do we need to use `g.UseUniqueRegister` above to prevent sharing between input and temp registers or was that only for input and output registers?
Matthias LiedtkeI can't remember the rules! One of my many gripes about V8 codegen :) I think it was for input and output registers, at least that's the only time I remember being pulled up on it. But looking at the comments in seems it's also dependent on what the code generator is doing with the operation, such as emitting multiple instructions and the input(s) are re-used across instructions. So, I don't think it's necessary here.
Sam Parker-HaynesMe neither. The problem is that this is so hard to verify...
https://source.chromium.org/chromium/chromium/src/+/main:v8/src/compiler/backend/x64/instruction-selector-x64.cc;l=5633;drc=18b4ac79d20301512a20271e9735cd06ff515a0b
sounds like at least at some point someone thought that we might need to use `UseUniqueRegister` so that they don't alias with temp registers?
Matthias LiedtkeI think that's because it's passing a bunch of unassigned registers to the code generator to use. But in this case the temp register is being defined and then will be written over. Otherwise I don't see how Temp registers could ever be used instead of Unique..? I don't suppose we have a register allocation stress tester..?
I think the only thing that finds register allocation bugs are fuzzers and LLMs, I don't think we have a way to stress-test register allocation decisions.
@dmerc...@chromium.org: Can you say with confidence whether `g.useRegister(input)` + `g.TempRegister()` could end up aliasing or not?
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
Sorry for the delay, I missed this CL :/
InstructionOperand left = g.UseRegister(op.left());
InstructionOperand right = g.UseRegister(op.right());
InstructionOperand acc = g.TempSimd128Register();Sam Parker-HaynesWhat was the logic with temp registers again? Do we need to use `g.UseUniqueRegister` above to prevent sharing between input and temp registers or was that only for input and output registers?
Matthias LiedtkeI can't remember the rules! One of my many gripes about V8 codegen :) I think it was for input and output registers, at least that's the only time I remember being pulled up on it. But looking at the comments in seems it's also dependent on what the code generator is doing with the operation, such as emitting multiple instructions and the input(s) are re-used across instructions. So, I don't think it's necessary here.
Sam Parker-HaynesMe neither. The problem is that this is so hard to verify...
https://source.chromium.org/chromium/chromium/src/+/main:v8/src/compiler/backend/x64/instruction-selector-x64.cc;l=5633;drc=18b4ac79d20301512a20271e9735cd06ff515a0b
sounds like at least at some point someone thought that we might need to use `UseUniqueRegister` so that they don't alias with temp registers?
Matthias LiedtkeI think that's because it's passing a bunch of unassigned registers to the code generator to use. But in this case the temp register is being defined and then will be written over. Otherwise I don't see how Temp registers could ever be used instead of Unique..? I don't suppose we have a register allocation stress tester..?
Sam Parker-HaynesI think the only thing that finds register allocation bugs are fuzzers and LLMs, I don't think we have a way to stress-test register allocation decisions.
@dmerc...@chromium.org: Can you say with confidence whether `g.useRegister(input)` + `g.TempRegister()` could end up aliasing or not?
Ping @dmerc...@chromium.org
Can you say with confidence whether g.useRegister(input) + g.TempRegister() could end up aliasing or not?
I am _pretty_ sure that they can alias yes. And indeed it would mean that you need a `useUniqueRegister` for both `left` and `right` to make sure that they don't alias with `acc`.
This matches what the x64 instruction selector does.
I also asked Gemini just in case, and it agrees that they can alias, but I'm not familiar enough with the register allocator to judge whether Gemini's explanation makes sense or not 😄
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
InstructionOperand left = g.UseRegister(op.left());
InstructionOperand right = g.UseRegister(op.right());
InstructionOperand acc = g.TempSimd128Register();Sam Parker-HaynesWhat was the logic with temp registers again? Do we need to use `g.UseUniqueRegister` above to prevent sharing between input and temp registers or was that only for input and output registers?
Matthias LiedtkeI can't remember the rules! One of my many gripes about V8 codegen :) I think it was for input and output registers, at least that's the only time I remember being pulled up on it. But looking at the comments in seems it's also dependent on what the code generator is doing with the operation, such as emitting multiple instructions and the input(s) are re-used across instructions. So, I don't think it's necessary here.
Sam Parker-HaynesMe neither. The problem is that this is so hard to verify...
https://source.chromium.org/chromium/chromium/src/+/main:v8/src/compiler/backend/x64/instruction-selector-x64.cc;l=5633;drc=18b4ac79d20301512a20271e9735cd06ff515a0b
sounds like at least at some point someone thought that we might need to use `UseUniqueRegister` so that they don't alias with temp registers?
Matthias LiedtkeI think that's because it's passing a bunch of unassigned registers to the code generator to use. But in this case the temp register is being defined and then will be written over. Otherwise I don't see how Temp registers could ever be used instead of Unique..? I don't suppose we have a register allocation stress tester..?
Sam Parker-HaynesI think the only thing that finds register allocation bugs are fuzzers and LLMs, I don't think we have a way to stress-test register allocation decisions.
@dmerc...@chromium.org: Can you say with confidence whether `g.useRegister(input)` + `g.TempRegister()` could end up aliasing or not?
Darius MercadierPing @dmerc...@chromium.org
Can you say with confidence whether g.useRegister(input) + g.TempRegister() could end up aliasing or not?
I am _pretty_ sure that they can alias yes. And indeed it would mean that you need a `useUniqueRegister` for both `left` and `right` to make sure that they don't alias with `acc`.
This matches what the x64 instruction selector does.
I also asked Gemini just in case, and it agrees that they can alias, but I'm not familiar enough with the register allocator to judge whether Gemini's explanation makes sense or not 😄
Damn, thanks. So when is it safe to use `TempSimd128Register`..?
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
InstructionOperand left = g.UseRegister(op.left());
InstructionOperand right = g.UseRegister(op.right());
InstructionOperand acc = g.TempSimd128Register();Sam Parker-HaynesWhat was the logic with temp registers again? Do we need to use `g.UseUniqueRegister` above to prevent sharing between input and temp registers or was that only for input and output registers?
Matthias LiedtkeI can't remember the rules! One of my many gripes about V8 codegen :) I think it was for input and output registers, at least that's the only time I remember being pulled up on it. But looking at the comments in seems it's also dependent on what the code generator is doing with the operation, such as emitting multiple instructions and the input(s) are re-used across instructions. So, I don't think it's necessary here.
Sam Parker-HaynesMe neither. The problem is that this is so hard to verify...
https://source.chromium.org/chromium/chromium/src/+/main:v8/src/compiler/backend/x64/instruction-selector-x64.cc;l=5633;drc=18b4ac79d20301512a20271e9735cd06ff515a0b
sounds like at least at some point someone thought that we might need to use `UseUniqueRegister` so that they don't alias with temp registers?
Matthias LiedtkeI think that's because it's passing a bunch of unassigned registers to the code generator to use. But in this case the temp register is being defined and then will be written over. Otherwise I don't see how Temp registers could ever be used instead of Unique..? I don't suppose we have a register allocation stress tester..?
Sam Parker-HaynesI think the only thing that finds register allocation bugs are fuzzers and LLMs, I don't think we have a way to stress-test register allocation decisions.
@dmerc...@chromium.org: Can you say with confidence whether `g.useRegister(input)` + `g.TempRegister()` could end up aliasing or not?
Darius MercadierPing @dmerc...@chromium.org
Sam Parker-HaynesCan you say with confidence whether g.useRegister(input) + g.TempRegister() could end up aliasing or not?
I am _pretty_ sure that they can alias yes. And indeed it would mean that you need a `useUniqueRegister` for both `left` and `right` to make sure that they don't alias with `acc`.
This matches what the x64 instruction selector does.
I also asked Gemini just in case, and it agrees that they can alias, but I'm not familiar enough with the register allocator to judge whether Gemini's explanation makes sense or not 😄
Damn, thanks. So when is it safe to use `TempSimd128Register`..?
Oh, ignore... I understand.
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
OpIndex shuffle_right = __ Simd128Shuffle(V<Simd128>
OpIndex shuffle_left = __ Simd128Shuffle(V<Simd128>
auto mul_left = TryCastExtMul(unop_left->input());Not a huge fan of this `auto` and the one next line.
unop_left->kind == Simd128UnaryOp::Kind::kI32x4ExtAddPairwiseI16x8S) {Can you match this in the `TryCast` already with an Opmask?
(if yes, then you can also remove the `unop_left->kind == unop_right->kind` check above)
auto TryCastExtMul = [this](OpIndex input) -> const Simd128BinopOp* {`V<Simd128>` (I think)
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
| Commit-Queue | +1 |
OpIndex shuffle_right = __ Simd128Shuffle(Sam Parker-HaynesV<Simd128>
Done
OpIndex shuffle_left = __ Simd128Shuffle(Sam Parker-HaynesV<Simd128>
Done
auto mul_left = TryCastExtMul(unop_left->input());Not a huge fan of this `auto` and the one next line.
Done
unop_left->kind == Simd128UnaryOp::Kind::kI32x4ExtAddPairwiseI16x8S) {Can you match this in the `TryCast` already with an Opmask?
(if yes, then you can also remove the `unop_left->kind == unop_right->kind` check above)
I've folded all of this into the helper function.
auto TryCastExtMul = [this](OpIndex input) -> const Simd128BinopOp* {| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
| Code-Review | +1 |
12,I think there was a short time when there was some deps change that enabled formatthing for JS. Could you rebase, undo the reformatting of the newest patchset in this file, run `gclient sync` and try again to `git cl format` and hopefully it won't touch the file again?
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
I think there was a short time when there was some deps change that enabled formatthing for JS. Could you rebase, undo the reformatting of the newest patchset in this file, run `gclient sync` and try again to `git cl format` and hopefully it won't touch the file again?
Ah, okay. Thought it was a strange thing to suddenly introduce. I've reverted the changes, but my gclient still isn't happy about it!
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
| Code-Review | +1 |
Sam Parker-HaynesI think there was a short time when there was some deps change that enabled formatthing for JS. Could you rebase, undo the reformatting of the newest patchset in this file, run `gclient sync` and try again to `git cl format` and hopefully it won't touch the file again?
Ah, okay. Thought it was a strange thing to suddenly introduce. I've reverted the changes, but my gclient still isn't happy about it!
Hm, I think this should be resolved soon in some way, this forced formatting doesn't work well for Wasm and is really bad for readability.
The `git cl upload --bypass-hooks` is an option that could bypass this and I think the CQ presubmit check shouldn't fail for wrongly formatted test cases but I'm not sure and certainly this isn't the intended solution. 😊
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
Matthias LiedtkeHmmmm, is it possible that my fuzz test has discovered a bug?
Sam Parker-HaynesPossibly, or something is wrong in the fuzztest itself? You should be able to reproduce locally and investigate the case it produces and then you could verify if this is correct or not?
Let me know if you run into any issues or need guidance on this.
Sam Parker-HaynesIt seems to stem from no AVX support. The first failed on only that specific runner and now that I've added AVX as a fuzzing bool, it fails for basically all of the x64 builders.
Sam Parker-HaynesI can confirm that the no-AVX results disagree with liftoff. The x64 turboshaft implementations of ExtAddPairwise seem suspicious as it looks as though the instructions overwrite their input, even though the input may have multiple users, which is one of the things I fuzz for.
Matthias LiedtkeDo you have someone that can take a look at x64? Looking at the macroassembler it looks like ExtAddPairwise supported is predicated on AVX support (https://source.chromium.org/chromium/chromium/src/+/main:v8/src/codegen/x64/macro-assembler-x64.cc;drc=a9c0f953cb7441880a0fff7a245b17a000e187df;l=2291) so I'm very confused, none of it looks right to me.
Sam Parker-HaynesIn general, I'm a good candidate, I can reproduce it. Let's see if AI can resolve this for me, otherwise I'll need to see if I find some time in the next few days to take a look.
Matthias LiedtkeYes, my AI friend has confirmed my isel suspicions. I will try to get the fix into this patch.
Sam Parker-HaynesMy AI fix is handling the case where `dst == src2` in `SharedMacroAssemblerBase::I16x8ExtMulHighS` for x64.
As this has nothing to do with your change, can you please upload it as a separate change? (You can then rebase this change on top of it as a dependent change)
Matthias LiedtkeImpressive to find two bugs, in two opcodes, from such a small test! Okay, I'll put them both in a separate patch.
One reason why I'm not getting to anything these days is because I landed a new verification for `OpEffects` in Turboshaft (we now have randomized "resccheduling" of operations, similar to what the x64 revectorization does). I got 9 fuzzer issues filed, some were duplicates but I already fixed 3 bugs and at least one of these issues is still unresolved. 😐
So yes, any kind of validation that we didn't have before has a decent chance of flushing out issues and especially "corner case" configurations and differential fuzzing are somewhat underrepresented I think.Thanks a lot for adding it and for looking into the x64 issues, this isn't really your core mission I assume. 😊
Done
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |
[compiler][arm64] I32x4DotI8x16S
Introduce a new Dot operation which performs a lane-wise
multiplication of sign-extended i8x16 data and then two iterations of
pairwise addition to produce an i32x4. For Arm, we then select this to
sdot. Matching these exact semantics is non-trivial, as WebAssembly
doesn't include the pairwise add (without extension) that we'd need
for the final pairwise step. So, we shuffle the inputs to maintain
the semantics.
| Inspect html for hidden footers to help with email filtering. To unsubscribe visit settings. |