Fuzz-testing correctness: Turboshaft vs. Liftoff


Emily Boarer

Feb 2, 2026, 7:21:03 AM
to v8-dev
Hi All,
cc: Matthias Liedtke, Clemens Backes

I am working on adding AddReduce / arm64 SIMD ADDV instruction support in this patch: https://chromium-review.googlesource.com/c/v8/v8/+/7462509 .
For this patch, I would like to add a fuzztest that compares the results of Turboshaft and Liftoff executions of some simple wasm code exercising the instruction pattern in question, to show that the optimization preserves correctness.

To my knowledge, there is no existing example of a comparable solution in V8 (I will gladly be corrected on that!).

Could `SimdSeeds()` in `test/unittests/wasm/module-generation-fuzztest.cc` be an example of how to achieve something like this? I am not entirely sure what that function does; perhaps it holds a wasm blob? In general this seems a very non-obvious addition, and I would appreciate any help and suggestions on offer.

Thanks,
Emily

Clemens Backes

Feb 2, 2026, 8:20:42 AM
to v8-...@googlegroups.com
Hi Emily,

Those "SimdSeeds" are just blobs of bytes used to drive the generation of the wasm module; see the "data" argument of `GenerateRandomWasmModule`. The interpretation of those bytes often changes when this random generation is added to or modified, so it's not ideal to hard-code them anywhere (though we sometimes do it anyway when we have no better regression test).
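
To illustrate why such seed blobs are brittle, here is a tiny hedged sketch (not V8's actual `DataRange` or generator API, just an analogous toy): each input byte drives one decision in the generator, so inserting a new decision point silently re-maps the meaning of every later byte in an old seed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Simplified stand-in for a fuzzer input stream (hypothetical, not V8 code).
class DataRange {
 public:
  DataRange(const uint8_t* data, size_t size) : data_(data), size_(size) {}
  // Returns the next input byte, or 0 once the input is exhausted.
  uint8_t NextByte() { return pos_ < size_ ? data_[pos_++] : 0; }

 private:
  const uint8_t* data_;
  size_t size_;
  size_t pos_ = 0;
};

// Each byte picks the next construct to emit. Adding a new case (or a new
// NextByte() call anywhere earlier) changes what every later byte means,
// which is why hard-coded seed blobs rot as the generator evolves.
std::string GenerateOp(DataRange& range) {
  switch (range.NextByte() % 3) {
    case 0: return "i32.add";
    case 1: return "i32x4.extract_lane";
    default: return "i32.const";
  }
}
```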

I see two options:

1. You could look at simd-cross-compiler-determinism-fuzztest.cc and float32-cross-compiler-determinism-fuzztest.cc, which deterministically generate the same code for different compilers and compare the results against each other. Maybe you can explicitly generate some interesting patterns in a similar way.

2. Generally, the "random module generation" fuzzers are what we use to find differences between compilers that are not covered by simpler tests, but they do not target specific patterns. We do have some variants nowadays (GenerateWasmModuleForInitExpressions, GenerateWasmModuleForDeopt, GenerateWasmModuleForRevec), but it does not scale to add a variant for every new feature or optimization. You could just run this fuzzer overnight, maybe commenting out the parts that are not interesting to you. Since you are interested in SIMD operations, I would start with the v8_wasm_compile_simd_fuzzer (using libFuzzer; it only works in a Chromium checkout), or the centipede variant of it (`ModuleGenerationTest.TestSimd`, which also works in V8). Let me know if you need more details and I can support you off-list.

-Clemens





Matthias Liedtke

Feb 2, 2026, 8:02:05 PM
to v8-...@googlegroups.com
Sharing some of my thoughts on the original CL here, perhaps others have ideas on how to address this.

The change introduces an optimization for a pattern consisting of a number of i32.add instructions which, taken together, sum four i32 SIMD extract-lane operations whose lane indices 0, 1, 2 and 3 each occur exactly once.
We'd like to test patterns where we emit something that either fits this pattern or is very close to it (which could catch cases where the optimization wrongly applies).
Based on my assumptions about our random module generation capabilities, I think it is extremely unlikely that the fuzzer will emit any pattern that fully meets the optimization's requirements, so running our current fuzzers won't help, in my opinion. We also can't seed the fuzzer with such a pattern, because we can't go from a wasm module back to a fuzzer input. We could add something that emits exactly the pattern, but what we'd actually want to fuzz is something close to the pattern, which in most cases has small differences in it or different orders of these instructions and inputs.
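
For concreteness, a minimal sketch (plain C++, not V8 code) of the two computations this pattern match equates: the scalar i32.add chain over the four extracted lanes, and the arm64 ADDV-style horizontal sum. Both use wrapping unsigned 32-bit arithmetic, matching wasm i32.add semantics, so they must agree for every input vector.

```cpp
#include <cassert>
#include <cstdint>

// The add chain the optimization matches, in wasm shape roughly:
//   (i32.add (i32.add (i32x4.extract_lane 0 v) (i32x4.extract_lane 1 v))
//            (i32.add (i32x4.extract_lane 2 v) (i32x4.extract_lane 3 v)))
// Each lane index 0..3 appears exactly once.
uint32_t SumLanesAddChain(const int32_t v[4]) {
  return (static_cast<uint32_t>(v[0]) + static_cast<uint32_t>(v[1])) +
         (static_cast<uint32_t>(v[2]) + static_cast<uint32_t>(v[3]));
}

// arm64 ADDV semantics: a single horizontal sum across all vector lanes,
// which the optimization substitutes for the whole chain above.
uint32_t AddvAllLanes(const int32_t v[4]) {
  uint32_t sum = 0;
  for (int i = 0; i < 4; ++i) sum += static_cast<uint32_t>(v[i]);
  return sum;
}
```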

I think the easiest solution is to use a fuzzer that can emit the correct pattern and can perform meaningful mutations on that pattern.
The wasm compile-fuzzers can't perform mutations (they are generative-only), and the code fuzzers only interpret input bytes as wasm code, so correct mutations will be exceedingly rare.
This leaves us with Fuzzilli, which can perform these actions. But Fuzzilli doesn't support differential fuzzing. While there is ongoing work on differential fuzzing ("Dumpling"), it is centered around deopt points in JavaScript and would only compare different JS tiers, so it can't detect correctness bugs in Wasm. Furthermore, the V8 part of the integration only supports x64, and this is an arm64-only optimization.

This is why I'm leaning towards the following: we should have a small fuzztest that emits a bunch of i32 and S128 operations, including some extract lanes. It would also emit a bunch of i32.add instructions consuming these extract lanes (and potentially some of the other i32 inputs). The fuzztest would build this into a single function producing an i32 result, run it with Liftoff and Turbofan, and compare the results. This will only cover basic bugs, but it should sometimes produce patterns that match the optimization's requirements.
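
A hedged sketch of what the core comparison of such a fuzztest could look like (all names here are hypothetical, and the two "tiers" are scalar stand-ins; the real test would build an actual wasm function and execute it with Liftoff and Turbofan). Random lane sequences of length 1 to 6 make most inputs near-misses of the pattern, exercising both the matched path and the fallback:

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <random>
#include <vector>

// Baseline "tier": a plain i32.add chain over the extracted lanes, using
// wrapping uint32 arithmetic to mirror wasm i32.add semantics.
uint32_t RunAddChain(const std::vector<int>& lane_seq, const int32_t v[4]) {
  uint32_t sum = 0;
  for (int lane : lane_seq) sum += static_cast<uint32_t>(v[lane]);
  return sum;
}

// Optimizing "tier": special-cases the "each lane exactly once" shape with
// an ADDV-style horizontal sum, mimicking the optimization; everything else
// falls back to the plain chain.
uint32_t RunWithAddvPattern(const std::vector<int>& lane_seq,
                            const int32_t v[4]) {
  int count[4] = {0, 0, 0, 0};
  for (int lane : lane_seq) ++count[lane];
  bool each_lane_once = lane_seq.size() == 4 && count[0] == 1 &&
                        count[1] == 1 && count[2] == 1 && count[3] == 1;
  if (each_lane_once) {
    uint32_t sum = 0;  // ADDV: order-independent horizontal sum
    for (int i = 0; i < 4; ++i) sum += static_cast<uint32_t>(v[i]);
    return sum;
  }
  return RunAddChain(lane_seq, v);  // pattern not matched
}

// Differential loop: random lane sequences and lane values; both tiers must
// always agree. Returns false on the first mismatch.
bool FuzzAgree(int iterations, uint32_t seed) {
  std::mt19937 rng(seed);
  std::uniform_int_distribution<int> len_dist(1, 6);
  std::uniform_int_distribution<int> lane_dist(0, 3);
  std::uniform_int_distribution<int32_t> val_dist(
      std::numeric_limits<int32_t>::min(), std::numeric_limits<int32_t>::max());
  for (int i = 0; i < iterations; ++i) {
    std::vector<int> lane_seq(len_dist(rng));
    for (int& lane : lane_seq) lane = lane_dist(rng);
    int32_t v[4];
    for (int32_t& x : v) x = val_dist(rng);
    if (RunAddChain(lane_seq, v) != RunWithAddvPattern(lane_seq, v))
      return false;
  }
  return true;
}
```

In a real fuzztest the iteration body would be the fuzz property and the lane sequence and vector would come from fuzztest's input domains rather than a local `std::mt19937`.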

If at some point we end up with dozens of such small fuzzers for dozens of different Turboshaft optimizations, it should still scale well as fuzztest is designed for having dozens or hundreds of small fuzzers (AFAIK).

Let me know what you think.

Matthias
