| Commit-Queue | +1 |
Thanks for the first, quick synchronous review. It's ready for another pass in your own time now. (I fixed the notes I made for myself earlier.)
if (v8_flags.turboshaft_wasm_in_js_inlining) {
remove: DCE interaction etc.
const JSWasmCallParameters* js_wasm_call_parameters;
add TODO: if TSCallDescriptor becomes shared, add this to the TS CallOp, needs lots of code churn in all reducers
CopyingPhase<WasmInJSInliningReducer, WasmLoweringReducer>::Run(data,
add TODO: GC TypeOptimizationReducer
DCHECK(v8_flags.turboshaft_wasm_in_js_inlining);
Make it a `CHECK`
if constexpr (ValidationTag::validate) {
I double-checked, and this is required. See added comment for clarity.
Great work! Given the size of the CL, I hope it isn't surprising that I also have a bunch of comments... 😊
#if V8_ENABLE_WEBASSEMBLY
As mentioned above, I think this is a large difference in readability. If we can just keep the members (a few extra bytes in one reducer during a limited amount of time during compilation), that sounds like the better trade-off.
// don't set the thread-in-wasm flag now, and instead do that when _not_
if
// TODO(dlehmann,353475584): Fix this restriction when porting to Turboshaft.
Sure? Even the wasm-into-wasm inlining doesn't support this (i.e. cross-module inlining), see e.g. https://source.chromium.org/chromium/chromium/src/+/main:v8/src/compiler/turboshaft/wasm-lowering-reducer.h;l=977. This will not be a trivial change.
(You can obviously keep the `TODO` but you might want to lower your expectations on it. 😊)
// in Turboshaft (or in Maglev, depending on the shared frontend).
That's the big disadvantage. We'd need to implement JS-to-Wasm wrappers in Maglev. Ideally we'd have a design where we don't need to duplicate it. (Although even in Turbofan that wasn't fully the case.)
So doing it late in Turboshaft would probably be a nicer solution.
const JSWasmCallParameters* jswcp = nullptr;
`wasm_call_parameters`? Yes, there is an 80-char line limit but it isn't that terrible. 😄
if (call_descriptor->kind() == CallDescriptor::kCallWasmFunction) {
if (v8_flags.turboshaft_wasm_in_js_inlining) {
Nit:
```suggestion
if (call_descriptor->kind() == CallDescriptor::kCallWasmFunction
&& v8_flags.turboshaft_wasm_in_js_inlining) {
```
// Make sure that for each no-yet-body-inlined call node, there is an
not
DCHECK_NE(it, js_wasm_calls_sidetable->end());
Nit: This would produce an oob pointer, so maybe just `CHECK_NE`? I used to use `DCHECK` everywhere, but especially in the optimizing compiler I feel like we should use `CHECK`s more deliberately.
// Then, one would need to make this an argument of the Turboshaft CallOp
`Then`? In which of the two cases? 😊
Maybe "For sharing call descriptors, the JSWasmCallParameters need to be moved to the CallOp, which causes a lot of code churn (needs touching all `REDUCE(Call)`)."
Well, actually, there aren't many of them (2), so it wouldn't be all that bad. Still needs touching the operation etc. but that isn't a lot of work.
// one particular call site, this assumes (only works correctly) if
```suggestion
// one particular call site, this assumes that (only works correctly if)
```
Run<turboshaft::MachineLoweringPhase>();
I feel like we should do the inlining conceptually after the `MachineLowering`, as it replaces higher-level JS concepts with low-level instructions and the inlined wasm uses low-level instructions as well.
On the other hand, the `MachineLoweringPhase` also does a few optimizations like `MachineOptimizationReducer` that we'd want to run for the inlined wasm as well.
It could indeed be the best strategy to merge it into the `MachineLoweringPhase` in the end but for now I'd keep it separate (because of the experimental nature and the open questions).
std::copy(arguments_.begin(), arguments_.end(), locals_.begin());
`std::copy` doesn't know about `locals_.end()`, so we should probably add a `CHECK` here?
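As a generic illustration of the suggested guard (plain STL containers and hypothetical names, not the actual reducer code; `assert` stands in for the `CHECK` macro), this could look like:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Copy `arguments` into the front of a fresh locals vector, guarding the
// sizes first: std::copy never sees the destination's end() and would
// happily write out of bounds if the source range were longer.
std::vector<int> CopyArgsIntoLocals(const std::vector<int>& arguments,
                                    size_t num_locals) {
  std::vector<int> locals(num_locals, 0);
  assert(arguments.size() <= locals.size());  // the suggested CHECK
  std::copy(arguments.begin(), arguments.end(), locals.begin());
  return locals;
}
```

The guard is cheap relative to the copy, so there is little reason to leave it debug-only.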
bool func_is_shared)
Can we please just bail out in that case? Shared-everything is a super-complicated proposal, and the whole `trusted_instance_data` handling for it will be challenging. It is unclear who'll be responsible for providing the necessary testing for mixing wasm-into-js-inlining with shared-everything, so there is a big risk that this will create issues in the future.
There is also no need to support this right now, shared-everything isn't going to be shipped within the next 12 months (I guess).
namespace v8::internal::wasm {
I don't think this is a good solution. We should stay with the `compiler::turboshaft` namespace in my opinion. But I also had my concerns about putting the `turboshaft-graph-interface.cc` in `src/wasm` instead of `src/compiler`.
// Regular call, nothing to do with Wasm or inlining. Proceed untouched...
return Next::ReduceCall(callee, frame_state, arguments, descriptor,
effects);
I think it's easier to read if you put this in the beginning:
```
if (!descriptor->js_wasm_call_parameters) {
return Next::ReduceCall(callee, frame_state, arguments, descriptor,
effects);
}
```
OpIndex thread_in_wasm_flag_address =
`V<WordPtr>`?
// not toggle the thread-in-Wasm flag, since it's not needed in the
I don't think this is semantically correct. We do not want to set the flag in the inlined case (the flag is meant to be only set on wasm frames).
V<WasmTrustedInstanceData> trusted_instance_data(bool element_is_shared) {
As said above, I'd like to get rid of any sharedness here.
V<Any> DefaultValue(ValueType type) {
Nit: The data members should be last (meaning these functions should all be moved above them).
// Wasm instance. Used only in `StartFunction()`.
Nit: I'm not a fan of such comments. They tend to get outdated and then cause more confusion than not having the comment.
// TODO(dlehmann): Not 100% sure this is the correct way. Took this from
// the Wasm pipeline, but is it also correct in JavaScript?
This should be correct. However, you should have an `isolate`, so you should be able to just emit a `HeapConstant` for this instead of having to look up the value from the roots.
I couldn't see any of your test-cases testing the result value produced by the inlining (in all cases, not just the void return one), you should do that, otherwise we only know that it didn't crash...
UNREACHABLE();
Just `Bailout`? I think chances are high that we'll add a new unary op at some point in time. Whoever adds that might miss this inliner, and it would be fine if we don't support the new instruction, but it wouldn't be great if we then suddenly have stability issues in rare corner cases because of that.
If you think that "the inlining should support all unary ops", then we can add a `DCHECK` plus the bailout, so that fuzzers can find this but this shouldn't risk production stability.
return __ TruncateWord64ToWord32(arg);
I'm pretty sure that most if not all of the `I64` instructions you added are broken. When not on 64 bits, you'll need to run the `turboshaft::Int64LoweringReducer`.
As you're on an experimental flag, this doesn't have to happen on this change but that's a bit of the risk of adding dozens of instructions and testing like 5 of them.
// need to understand what the inlining ID is / where to get it.
You'll need to provide that from the JS side, see https://source.chromium.org/chromium/chromium/src/+/main:v8/src/compiler/js-inlining.cc;l=454
This might not be fully trivial given that you are doing the inlining later than the JS inlining.
// Initialize the non-parameter locals.
How many test cases do you have for this? #passiveAggressiveQuestion 😊
Maybe you'd want to add a `Bailout` here if we have non-parameter-locals and add this feature later.
// Can't use Turboshaft Wasm-in-JS inlining without the Turboshaft _JavaScript_
Nit: I think those emphasizing `_`s are a bit more confusing than helping.
if constexpr (ValidationTag::validate) {
So, what was the reason for this change again?
// We would need to extend the wrapper inlining to support this.
As discussed offline, I don't think so for multi-returns, as these require allocating a JSArray, so it isn't completely straightforward.
kExprUnreachable,
This should be easy; we need traps anyway for all the gc instructions. But we don't really need `kExprUnreachable` (it only makes sense in combination with exceptions or other control flow).
console.log(e)
It might be better to return it or not use the error at all. I've learnt in the past that printing errors creates confusion, because someone looks at the output and thinks "Oh, `RuntimeError: unreachable`, seems like the test has failed!"
Successfully inlined Wasm function #0
Nit: You probably want to print some information about the module (e.g. the address), which helps in case of multiple instantiations.
Successfully inlined Wasm function #1
Nit: The `successfully` seems redundant.
Successfully inlined Wasm function #2
Nit: You should be able to also print the debug name if present (see the TurboFan implementation, which seems to do that).
Cannot consider {ADDRESS} {{ADDRESS} <FeedbackCell[many closures]>} for inlining (no feedback vector)
Nit: But we "considered" it for inlining, otherwise we wouldn't have printed this line? 😊
Should we just treat this as "Cannot inline [...] (no feedback vector)"?
Thanks for the first, quick synchronous review. It's ready for another pass in your own time now. (I fixed the notes I made for myself earlier.)
Done
As mentioned above, I think this is a large difference in readability. If we can just keep the members (a few extra bytes in one reducer during a limited amount of time during compilation), that sounds like the better trade-off.
Ah, I forgot to make this change after our sync meeting. Done.
// don't set the thread-in-wasm flag now, and instead do that when _not_
Daniel Lehmann: if
Done
// TODO(dlehmann,353475584): Fix this restriction when porting to Turboshaft.
Sure? Even the wasm-into-wasm inlining doesn't support this (i.e. cross-module inlining), see e.g. https://source.chromium.org/chromium/chromium/src/+/main:v8/src/compiler/turboshaft/wasm-lowering-reducer.h;l=977. This will not be a trivial change.
(You can obviously keep the `TODO` but you might want to lower your expectations on it. 😊)
Ack, this is not going to happen for the TS Wasm-in-JS inlining MVP, so I toned down the comment.
In terms of importance/prioritization, I do think the restriction to inline only from a single module in Wasm-in-JS inlining is more of a weird performance cliff than not inlining across different Wasm modules. I.e., that the first Wasm module "wins" in Wasm-in-JS inlining feels "more arbitrary" or "more surprising". (This is just my gut feeling for what I think "a typical Web developer" expects, WDYT?)
// in Turboshaft (or in Maglev, depending on the shared frontend).
That's the big disadvantage. We'd need to implement JS-to-Wasm wrappers in Maglev. Ideally we'd have a design where we don't need to duplicate it. (Although even in Turbofan that wasn't fully the case.)
So doing it late in Turboshaft would probably be a nicer solution.
Good point, that's a good reason to do it late in Turboshaft in any case (i.e., here.)
`wasm_call_parameters`? Yes, there is an 80 char line limit but it isn't that terrible. 😄
fair 😄
if (call_descriptor->kind() == CallDescriptor::kCallWasmFunction) {
if (v8_flags.turboshaft_wasm_in_js_inlining) {
Nit:
```suggestion
if (call_descriptor->kind() == CallDescriptor::kCallWasmFunction
&& v8_flags.turboshaft_wasm_in_js_inlining) {
```
Done
// Make sure that for each no-yet-body-inlined call node, there is an
Daniel Lehmann: not
Done
Nit: This would produce an oob-pointer, so maybe just `CHECK_NE`? I used to use `DCHECK` everywhere, but especially in the optimizing compiler I feel like we should use `CHECK`s more deliberately.
Ack & done
// Then, one would need to make this an argument of the Turboshaft CallOp
`Then`? In which of the two cases? 😊
Maybe "For sharing call descriptors, the JSWasmCallParameters need to be moved to the CallOp, which causes a lot of code churn (needs touching all `REDUCE(Call)`)."
Well, actually, there aren't many of them (2), so it wouldn't be all that bad. Still needs touching the operation etc., but that isn't a lot of work.
Done
// one particular call site, this assumes (only works correctly) if
```suggestion
// one particular call site, this assumes that (only works correctly if)
```
Done
Run<turboshaft::MachineLoweringPhase>();
I feel like we should do the inlining conceptually after the `MachineLowering`, as it replaces higher-level JS concepts with low-level instructions and the inlined wasm uses low-level instructions as well.
On the other hand, the `MachineLoweringPhase` also does a few optimizations like `MachineOptimizationReducer` that we'd want to run for the inlined wasm as well.
It could indeed be the best strategy to merge it into the `MachineLoweringPhase` in the end but for now I'd keep it separate (because of the experimental nature and the open questions).
I have thought again about my comment `Skip this phase if there were no JS->Wasm calls` a couple lines above and replaced it instead with the plan you laid out:
1. Keep a separate phase before the `MachineLoweringPhase` until the MVP is feature-complete and cleaned-up (but before Finching or shipping it).
2. Then integrate my reducer into the `MachineLoweringPhase` probably before `DataViewLoweringReducer` (since we have `DataViewLoweringReducer` in the `WasmGraphBuilderBase` as well, that seems like the right place).
3. Not worry about the performance of always running the reducer: If we have inlineable Wasm functions, running a separate phase is more expensive, and if we don't inline it's just an additional load+conditional branch per CallOp during Turbofan compilation, which is not worth the hassle. (Except if we add a lot more reducers that apply to JS operations as well, but I don't think that will be the case.)
std::copy(arguments_.begin(), arguments_.end(), locals_.begin());
`std::copy` doesn't know about `locals_.end()`, so we should probably add a `CHECK` here?
Done
Can we please just bail out in that case? Shared-everything is a super-complicated proposal and the whole `trusted_instance_data` handling for it will be challenging. It is unclear who'll be responsible for then providing the necessary testing for mixing wasm-into-js-inlining with shared-everything, so there is a big risk that this will create issues in the future.
There is also no need to support this right now, shared-everything isn't going to be shipped within the next 12 months (I guess).
Yes, sure. Removed the field, bail out in `TryInlineWasmCall` now.
I don't think this is a good solution. We should stay with the `compiler::turboshaft` namespace in my opinion. But I also had my concerns about putting the `turboshaft-graph-interface.cc` to `src/wasm` instead of `src/compiler`.
Sure, kept both in the same `compiler::turboshaft` namespace and added a bunch of `wasm::` prefixes (and few `using wasm::...` declarations for frequently repeated things or where there's no danger of confusing it with JS things).
// Regular call, nothing to do with Wasm or inlining. Proceed untouched...
return Next::ReduceCall(callee, frame_state, arguments, descriptor,
effects);
I think it's easier to read if you put this in the beginning:
```
if (!descriptor->js_wasm_call_parameters) {
return Next::ReduceCall(callee, frame_state, arguments, descriptor,
effects);
}
```
Good idea! Done.
OpIndex thread_in_wasm_flag_address =
Daniel Lehmann: `V<WordPtr>`?
Done
// not toggle the thread-in-Wasm flag, since it's not needed in the
I don't think this is semantically correct. We do not want to set the flag in the inlined case (the flag is meant to be only set on wasm frames).
Done
V<WasmTrustedInstanceData> trusted_instance_data(bool element_is_shared) {
As said above, I'd like to get rid of any sharedness here.
Done
Nit: The data members should be last (meaning this functions should all be moved above them).
Done
Nit: I'm not a fan of such comments. They tend to get outdated and then cause more confusion than without having the comment.
Done (removed).
// TODO(dlehmann): Not 100% sure this is the correct way. Took this from
// the Wasm pipeline, but is it also correct in JavaScript?
This should be correct. However, you should have an `isolate`, so you should be able to just emit a `HeapConstant` for this instead of having to look up the value from the roots.
I couldn't see any of your test-cases testing the result value produced by the inlining (in all cases, not just the void return one), you should do that, otherwise we only know that it didn't crash...
Changed to loading `__ HeapConstant(isolate->factory()->undefined_value())`.
Regarding testing the result of the inlining: all test cases `assertEqual(unoptimizedResult, inlinedResult)`, so basically a differential oracle. I think that's good enough, but feel free to reopen if I misunderstood something.
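As a minimal generic sketch of such a differential oracle (self-contained C++ with hypothetical names, not the actual JS tests, which compare the same call before and after optimization):

```cpp
#include <cstdint>

// Reference ("unoptimized") population count: shift out one bit at a time.
int32_t PopcntReference(uint32_t x) {
  int32_t n = 0;
  for (; x != 0; x >>= 1) n += static_cast<int32_t>(x & 1);
  return n;
}

// Stand-in for the "inlined/optimized" path: same function computed with a
// different algorithm (Kernighan's trick of clearing the lowest set bit).
int32_t PopcntOptimized(uint32_t x) {
  int32_t n = 0;
  while (x != 0) {
    x &= x - 1;
    ++n;
  }
  return n;
}

// The oracle: both paths must agree on every input, so the test needs no
// hard-coded expected value per case.
bool Agree(uint32_t x) { return PopcntReference(x) == PopcntOptimized(x); }
```

The strength of the oracle then depends on the input coverage, which is why extending the test to all supported ops matters.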
Just `Bailout`? I think chances are high that we'll add a new unary op at some point in time. Whoever adds that, might miss this inliner and it would be fine if we don't support the new instruction but it wouldn't be great if we then suddenly have stability issues in rare corner cases because of that.
If you think that "the inlining should support all unary ops", then we can add a `DCHECK` plus the bailout, so that fuzzers can find this but this shouldn't risk production stability.
My reasoning for the `UNREACHABLE` was that we should trip over it when adding a new (in this case: unary) op and forgetting to handle it here (in the spirit of "make assumptions explicit", "constrain the valid execution paths maximally", and "make all handled enum cases explicit"), but I agree that this shouldn't risk stability. So instead, I am going with a three-pronged approach:
return __ TruncateWord64ToWord32(arg);
I'm pretty sure that most if not all of the `I64` instructions you added are broken. When not on 64 bits, you'll need to run the `turboshaft::Int64LoweringReducer`.
As you're on an experimental flag, this doesn't have to happen on this change but that's a bit of the risk of adding dozens of instructions and testing like 5 of them.
Oh, you are right, and constant folding may eliminate the i64 operations in many simple test cases, so it's not immediately obvious. Just adding the `Int64LoweringReducer` is not enough to support them either, since it makes some assumptions about `Parameter` and `Return` ops and assumes a (non-existent in the Wasm-in-JS inlining case) `wasm_sig`.
All in all, I have to agree that this is a bit too risky and would require a whole bunch of more changes to land right now. So I will remove i64 unary/binary ops from this CL, and only gradually extend later.
I also extended the test with all unary and binary ops that are currently supported by the inliner (which required a bit of a rewrite/testing framework, otherwise it would have been a lot of repetition).
// need to understand what the inlining ID is / where to get it.
You'll need to provide that from the JS side, see https://source.chromium.org/chromium/chromium/src/+/main:v8/src/compiler/js-inlining.cc;l=454
This might not be fully trivial given that you are doing the inlining later than the JS inlining.
Thanks for the pointer, will leave this open and address in a follow-up CL/part of the implementation.
How many test cases do you have for this? #passiveAggressiveQuestion 😊
Maybe you'd want to add a `Bailout` here if we have non-parameter-locals and add this feature later.
Haha, well, there was one test with a non-parameter local (`localTee` below), but I see your point :)
I will leave it in, since the code required for non-parameter locals is really quite minimal, but added another test with more than one local and some swaps between locals.
// Can't use Turboshaft Wasm-in-JS inlining without the Turboshaft _JavaScript_
Nit: I think those emphasizing `_`s are a bit more confusing than helping.
Done (removed).
So, what was the reason for this change again?
`DecodeError()` has `V8_ASSUME(ValidationTag::validate)` in its body to help the compiler optimize (and since it's an unchecked assumption, `V8_ASSUME` additionally `DCHECK`s that it holds). Here, we call `DecodeError` in a context where `validate` could be false. I am not entirely sure why we didn't trip over this beforehand, but this is the correct thing to do, and it comes up with bailouts in the `WasmInJsInliningInterface` now.
// We would need to extend the wrapper inlining to support this.
As discussed offline, I don't think so for multi-returns, as these require allocating a JSArray, so it isn't completely straightforward.
Correct, rephrased the comment: If we want to support it, we first need to support this in the JS-to-Wasm wrapper inlining. I would rather first port the wrapper inlining to Turboshaft and then implement it, so it's quite far out (doesn't make much sense to spend time on features in the TurboFan wrapper inlining).
This should be easy, we need traps anyways for all the gc instructions. But we don't really need `kExprUnreachable` (it only makes sense in combination with exceptions or other control flow).
Acknowledged
It might be better to return it or not use the error at all. I've learnt in the past that printing errors creates confusion because someone looks at the output and thinks "Oh, `RuntimeError: unreachable`, seems like the test has failed!"
Good point, ack and done.
Nit: You probably want to print some information about the module (e.g. the address) which helps in case of multiple instantiations.
Ack and done: Use the same format as the TurboFan implementation (see also comment below).
Nit: The `successfully` seems redundant.
Ack and done: Use the same format as the TurboFan implementation (see also comment below).
Nit: You should be able to also print the debug name if present (see the turbofan implementation that seems to do that).
Ack and done: Use the same format as the TurboFan implementation (see also comment below).
Cannot consider {ADDRESS} {{ADDRESS} <FeedbackCell[many closures]>} for inlining (no feedback vector)
Nit: But we "considered" it for inlining, otherwise we wouldn't have printed this line? 😊
Should we just treat this as "Cannot inline [...] (no feedback vector)"?
This is actually output from the existing JS-to-Wasm wrapper inlining in Turbofan. I assume it is written this way to be consistent with JavaScript inlining and the positive case where inlining works out:
```
Considering wasm function [24] anyConvertExternConvertAny of module 0x56523c722ac8 for inlining
```
I don't terribly like the format either, but it's really just bikeshedding and for consistency with the existing tracing output, I use the same format now here. The wrapper inlining is still in Turbofan, which is yet another reason to use this format.
| Code-Review | +1 |
// TODO(dlehmann,353475584): Fix this restriction when porting to Turboshaft.
Daniel Lehmann: Sure? Even the wasm-into-wasm inlining doesn't support this (i.e. cross-module inlining), see e.g. https://source.chromium.org/chromium/chromium/src/+/main:v8/src/compiler/turboshaft/wasm-lowering-reducer.h;l=977. This will not be a trivial change.
(You can obviously keep the `TODO` but you might want to lower your expectations on it. 😊)
Ack, this is not going to happen for the TS Wasm-in-JS inlining MVP, so I toned down the comment.
In terms of importance/prioritization, I do think the restriction to inline only from a single module in Wasm-in-JS inlining is more of a weird performance cliff than not inlining across different Wasm modules. I.e., that the first Wasm module "wins" in Wasm-in-JS inlining feels "more arbitrary" or "more surprising". (This is just my gut feeling for what I think "a typical Web developer" expects, WDYT?)
More arbitrary, yes.
However, the multi-module-use-cases are rare so far. It does work with multiple instantiations afaik as both instantiations would have the same `wasm::WasmModule` object, so that's already ahead of the wasm-into-wasm inlining.
I'd rather see us implement cross-module / cross-instantiation inlining for the wasm-inlining than for the Wasm-into-JS inlining as this is much more of a corner case (and much less something that developers would expect the engine to optimize for).
Run<turboshaft::MachineLoweringPhase>();
Daniel Lehmann: I feel like we should do the inlining conceptually after the `MachineLowering`, as it replaces higher-level JS concepts with low-level instructions and the inlined wasm uses low-level instructions as well.
On the other hand, the `MachineLoweringPhase` also does a few optimizations like `MachineOptimizationReducer` that we'd want to run for the inlined wasm as well.
It could indeed be the best strategy to merge it into the `MachineLoweringPhase` in the end but for now I'd keep it separate (because of the experimental nature and the open questions).
I have thought again about my comment `Skip this phase if there were no JS->Wasm calls` a couple lines above and replaced it instead with the plan you laid out:
1. Keep a separate phase before the `MachineLoweringPhase` until the MVP is feature-complete and cleaned-up (but before Finching or shipping it).
2. Then integrate my reducer into the `MachineLoweringPhase` probably before `DataViewLoweringReducer` (since we have `DataViewLoweringReducer` in the `WasmGraphBuilderBase` as well, that seems like the right place).
3. Not worry about the performance of always running the reducer: If we have inlineable Wasm functions, running a separate phase is more expensive, and if we don't inline it's just an additional load+conditional branch per CallOp during Turbofan compilation, which is not worth the hassle. (Except if we add a lot more reducers that apply to JS operations as well, but I don't think that will be the case.)
Ah, actually I remember one difference here:
For Turbofan, if we inline wasm functions, we run a bunch of extra optimizations (the `WasmGCOptimizationPhase`), and some of that stuff is actually relevant, e.g. eliminating redundant casts and the `extern.internalize(extern.externalize(...))` that would be a typical pattern if you chain two wasm calls (one externalizes the return and the other internalizes the argument).
So you might want to do the same later on. (OTOH, we can still run it in the regular phase and then conditionally run the wasm optimizations later on, they anyways need their own phase in Turboshaft.)
// We currently don't sup port operations that:
Nit: `support`
// asm.js:
Thanks. Also note that we can't inline asm.js, because it would require different handling in frames.cc, as it would need to create a "JS-looking" stack frame for a wasm function. (There is a test case somewhere, I think.)
// TODO(dlehmann): Not 100% sure this is the correct way. Took this from
// the Wasm pipeline, but is it also correct in JavaScript?
Daniel Lehmann: This should be correct. However, you should have an `isolate`, so you should be able to just emit a `HeapConstant` for this instead of having to look up the value from the roots.
I couldn't see any of your test-cases testing the result value produced by the inlining (in all cases, not just the void return one), you should do that, otherwise we only know that it didn't crash...
Changed to loading `__ HeapConstant(isolate->factory()->undefined_value())`.
Regarding testing the result of the inlining: all test cases `assertEqual(unoptimizedResult, inlinedResult)`, so basically a differential oracle. I think that's good enough, but feel free to reopen if I misunderstood something.
Fair, I missed the `assertEquals`. (I'll blame it on the CL size! 😜)
UNREACHABLE();
Daniel Lehmann: Just `Bailout`? I think chances are high that we'll add a new unary op at some point in time. Whoever adds that might miss this inliner, and it would be fine if we don't support the new instruction, but it wouldn't be great if we then suddenly have stability issues in rare corner cases because of that.
If you think that "the inlining should support all unary ops", then we can add a `DCHECK` plus the bailout, so that fuzzers can find this but this shouldn't risk production stability.
My reasoning for the `UNREACHABLE` was that we should trip over it when adding a new (in this case: unary) op and forgetting to handle it here (in the spirit of "make assumptions explicit", "constrain the valid execution paths maximally", and "make all handled enum cases explicit"), but I agree that this shouldn't risk stability. So instead, I am going with a three-pronged approach:
- ops that we support
- ops that we don't support, but not a bug: `Bailout()`
- default (ops that we "forgot to handle"): a bug, but should not affect stability: `DCHECK(false); Bailout()`
Sounds good!
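The three-pronged dispatch could be sketched roughly like this (illustrative enum and names, not the actual reducer; `assert` stands in for `DCHECK`, and `std::nullopt` for `Bailout()`):

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical stand-in for the wasm unary opcodes seen by the inliner.
enum class UnaryOp { kI32Eqz, kRefAsNonNull, kSomeFutureOp };

std::optional<int32_t> ReduceUnaryOp(UnaryOp op, int32_t arg) {
  switch (op) {
    // 1) Ops the inliner supports: emit the lowered operation.
    case UnaryOp::kI32Eqz:
      return arg == 0 ? 1 : 0;
    // 2) Ops knowingly not supported (not a bug): bail out cleanly.
    case UnaryOp::kRefAsNonNull:
      return std::nullopt;  // Bailout()
    // 3) Ops "forgotten" here: a bug that fuzzers can find via the assert
    //    in debug builds, while release builds still just bail out.
    default:
      assert(false && "unhandled unary op; support it or bail out explicitly");
      return std::nullopt;  // DCHECK(false); Bailout()
  }
}
```

This keeps the "make all handled enum cases explicit" spirit in debug builds without risking production stability on unhandled ops.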
return __ TruncateWord64ToWord32(arg);
Daniel Lehmann: I'm pretty sure that most if not all of the `I64` instructions you added are broken. When not on 64 bits, you'll need to run the `turboshaft::Int64LoweringReducer`.
As you're on an experimental flag, this doesn't have to happen on this change but that's a bit of the risk of adding dozens of instructions and testing like 5 of them.
Oh, you are right, and constant folding may eliminate the i64 operations in many simple test cases, so it's not immediately obvious. Just adding the `Int64LoweringReducer` is not enough to support them either, since it makes some assumptions about `Parameter` and `Return` ops and assumes a (non-existent in the Wasm-in-JS inlining case) `wasm_sig`.
All in all, I have to agree that this is a bit too risky and would require a whole bunch of more changes to land right now. So I will remove i64 unary/binary ops from this CL, and only gradually extend later.
I also extended the test with all unary and binary ops that are currently supported by the inliner (which required a bit of a rewrite/testing framework, otherwise it would have been a lot of repetition).
and constant folding may eliminate the i64 operations in many simple test cases so it's not immediately obvious.
Yes, but even `Word64Constant` should fail without the lowering. 😊
// Initialize the non-parameter locals.
Daniel Lehmann: How many test cases do you have for this? #passiveAggressiveQuestion 😊
Maybe you'd want to add a `Bailout` here if we have non-parameter-locals and add this feature later.
Haha, well, there was one test with a non-parameter local (`localTee` below), but I see your point :)
I will leave it in, since the code required for non-parameter locals is really quite minimal, but added another test with more than one local and some swaps between locals.
Thanks!
if constexpr (ValidationTag::validate) {
Daniel Lehmann: So, what was the reason for this change again?
`DecodeError()` has `V8_ASSUME(ValidationTag::validate)` in its body to help the compiler optimize (and since it's an unchecked assumption, `V8_ASSUME` additionally `DCHECK`s that it holds). Here, we call `DecodeError` in a context where `validate` could be false. I am not entirely sure why we didn't trip over this beforehand, but this is the correct thing to do, and it comes up with bailouts in the `WasmInJsInliningInterface` now.
Ah, I think I got it, with bailouts, we don't reach the end but we still don't want to trigger a `DecodeError` here, right?
// TODO(353475584): Is it worth lifting this restriction? Is this a
// restriction also for regular JS inlining?
I'd hope that
1) People don't use a generic wrapper mechanism with `...` (as that would also prevent inlining as everything goes through the same call).
2) compilers generate the arguments explicitly since wasm signatures are static, so it should be very easy to generate the argument list.
So, if these assumptions hold true, we shouldn't need this. (But still a good idea to keep the todo for now.)
// Also, there is no wrapper inlining for i64 on 32-bit architectures right
Oh, I never considered this. Independently, it's probably easiest to add the lowering to support the operations and ignore the fact that wrappers won't work well on 32 bits, as we don't care too much about missed optimizations on 32-bit platforms.
try {
  wasmExports.trapNoInline();
} catch (e) {
  return e.toString();
}
This would also require inlining into a try-block, which in Turbofan would've been a pain (not sure about Turboshaft in that aspect).
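A standalone sketch of the pattern under discussion (raw bytes instead of the mjsunit `WasmModuleBuilder`; the export name `trap` is made up): a minimal Wasm module whose only export executes `unreachable`, so the call always throws a `WebAssembly.RuntimeError` that the surrounding try/catch observes.

```javascript
// Minimal module: one exported function "trap" whose body is `unreachable`.
const bytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // magic + version
  0x01, 0x04, 0x01, 0x60, 0x00, 0x00,             // type section: () -> ()
  0x03, 0x02, 0x01, 0x00,                         // function section: type 0
  0x07, 0x08, 0x01, 0x04, 0x74, 0x72, 0x61, 0x70, // export "trap"
  0x00, 0x00,                                     //   kind func, index 0
  0x0a, 0x05, 0x01, 0x03, 0x00, 0x00, 0x0b,       // body: unreachable; end
]);
const { exports } = new WebAssembly.Instance(new WebAssembly.Module(bytes));

function callAndCatch() {
  try {
    exports.trap();
  } catch (e) {
    return e.toString(); // message mentions the unreachable trap
  }
}
```

Inlining `exports.trap()` at this call site would place the trapping Wasm body inside the JS try-block, which is exactly the interaction the comment above flags as painful.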
Cannot consider {ADDRESS} {{ADDRESS} <FeedbackCell[many closures]>} for inlining (no feedback vector)
Daniel Lehmann: Nit: But we "considered" it for inlining, otherwise we wouldn't have printed this line? 😊
Should we just treat this as "Cannot inline [...] (no feedback vector)"?
This is actually output from the existing JS-to-Wasm wrapper inlining in Turbofan. I assume it is written this way to be consistent with JavaScript inlining and the positive case where inlining works out:
```
Considering wasm function [24] anyConvertExternConvertAny of module 0x56523c722ac8 for inlining
- inlining
```
I don't terribly like the format either, but it's really just bikeshedding and for consistency with the existing tracing output, I use the same format now here. The wrapper inlining is still in Turbofan, which is yet another reason to use this format.
Consistency makes sense, I don't like the format too much either, but I guess, I also wanted to make it consistent with the JS inlining. 😊
| Commit-Queue | +2 |
// TODO(dlehmann,353475584): Fix this restriction when porting to Turboshaft.
Daniel Lehmann: Sure? Even the wasm-into-wasm inlining doesn't support this (i.e. cross-module inlining), see e.g. https://source.chromium.org/chromium/chromium/src/+/main:v8/src/compiler/turboshaft/wasm-lowering-reducer.h;l=977. This will not be a trivial change.
(You can obviously keep the `TODO` but you might want to lower your expectations on it. 😊)
Matthias Liedtke: Ack, this is not going to happen for the TS Wasm-in-JS inlining MVP, so I toned down the comment.
In terms of importance/prioritization, I do think the restriction to inline only from a single module in Wasm-in-JS inlining is more of a weird performance cliff than not inlining across different Wasm modules. I.e., that the first Wasm module "wins" in Wasm-in-JS inlining feels "more arbitrary" or "more surprising". (This is just my gut feeling for what I think "a typical Web developer" expects, WDYT?)
More arbitrary, yes.
However, the multi-module-use-cases are rare so far. It does work with multiple instantiations afaik as both instantiations would have the same `wasm::WasmModule` object, so that's already ahead of the wasm-into-wasm inlining.
I'd rather see us implement cross-module / cross-instantiation inlining for the wasm-inlining than for the Wasm-into-JS inlining as this is much more of a corner case (and much less something that developers would expect the engine to optimize for).
That makes sense: more arbitrary, but less used in practice, hence add features rather to Wasm-inlining. (Marking as resolved, further discussion on team priorities offline.)
Run<turboshaft::MachineLoweringPhase>();
Daniel Lehmann: I feel like we should do the inlining conceptually after the `MachineLowering`, as it replaces higher-level JS concepts with low-level instructions, and the inlined Wasm uses low-level instructions as well.
On the other hand, the `MachineLoweringPhase` also does a few optimizations like `MachineOptimizationReducer` that we'd want to run for the inlined wasm as well.
It could indeed be the best strategy to merge it into the `MachineLoweringPhase` in the end but for now I'd keep it separate (because of the experimental nature and the open questions).
Matthias Liedtke: I have thought again about my comment `Skip this phase if there were no JS->Wasm calls` a couple of lines above and replaced it instead with the plan you laid out:
1. Keep a separate phase before the `MachineLoweringPhase` until the MVP is feature-complete and cleaned-up (but before Finching or shipping it).
2. Then integrate my reducer into the `MachineLoweringPhase` probably before `DataViewLoweringReducer` (since we have `DataViewLoweringReducer` in the `WasmGraphBuilderBase` as well, that seems like the right place).
3. Not worry about the performance of always running the reducer: If we have inlineable Wasm functions, running a separate phase is more expensive, and if we don't inline it's just an additional load+conditional branch per CallOp during Turbofan compilation, which is not worth the hassle. (Except if we add a lot more reducers that apply to JS operations as well, but I don't think that will be the case.)
Ah, actually I remember one difference here:
For Turbofan, if we inline wasm functions, we run a bunch of extra optimizations (the `WasmGCOptimizationPhase`) and some of that stuff is actually relevant (e.g. eliminating redundant casts and the `extern.internalize(extern.externalize(...))` that would be a typical pattern if you chain two wasm calls (one externalizes the return and the other internalizes the argument.)So you might want to do the same later on. (OTOH, we can still run it in the regular phase and then conditionally run the wasm optimizations later on, they anyways need their own phase in Turboshaft.)
Yes, good point. I already had a `TODO` in `wasm-in-js-inlining-phase.cc` to add Wasm GC optimizations, but I hadn't accounted for them needing their own phase due to the analysis on the input graph.
IIUC the analysis code walks over all operations in all blocks (i.e., it wouldn't only add overhead for Wasm GC operations), so we cannot just add it to the beginning of a following phase unconditionally.
// We currently don't sup port operations that:
Daniel Lehmann: Nit: `support`
Done.
Thanks. Also note that we can't inline asm.js because it would require different handling in frames.cc as it would need to create a "JS-looking" stack frame for a wasm function. (There is a test case somewhere I think.)
I added an early bailout to ensure the Wasm module we inline is not coming from `asm.js`, to document this explicitly.
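For context on what that bailout excludes, here is a minimal asm.js-shaped module (illustrative, not from the CL). The `"use asm"` pragma makes engines route such code through the Wasm pipeline, yet its functions must still present JS-looking stack frames in frames.cc, which is why the inliner has to skip them.

```javascript
// Minimal asm.js-style module. The coercion annotations (`|0`) are ordinary
// JS operators, so the module also runs as plain JavaScript when asm.js
// validation does not kick in.
function AsmModule() {
  "use asm";
  function add(a, b) {
    a = a | 0;            // parameter type annotation: int32
    b = b | 0;
    return (a + b) | 0;   // result annotation: int32
  }
  return { add: add };
}
```

`AsmModule().add(2, 3)` returns 5 either way; the behavioral equivalence is exactly what makes asm.js functions look like Wasm to the compiler while still needing JS frame handling.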
return __ TruncateWord64ToWord32(arg);
Daniel Lehmann: I'm pretty sure that most if not all of the `I64` instructions you added are broken. When not on 64 bits, you'll need to run the `turboshaft::Int64LoweringReducer`.
As you're on an experimental flag, this doesn't have to happen on this change, but that's a bit of the risk of adding dozens of instructions and testing like 5 of them.
Matthias Liedtke: Oh, you are right, and constant folding may eliminate the i64 operations in many simple test cases, so it's not immediately obvious. Just adding the `Int64LoweringReducer` is also not possible to support them, since it makes some assumptions about `Parameter` and `Return` ops and assumes a (non-existent in the Wasm-in-JS inlining case) `wasm_sig`.
All in all, I have to agree that this is a bit too risky and would require a whole bunch of more changes to land right now. So I will remove i64 unary/binary ops from this CL, and only gradually extend later.
I also extended the test with all unary and binary ops that are currently supported by the inliner (which required a bit of a rewrite/testing framework, otherwise it would have been a lot of repetition).
and constant folding may eliminate the i64 operations in many simple test cases so it's not immediately obvious.
Yes, but even `Word64Constant` should fail without the lowering. 😊
Right, I also removed support for `i64.const`. (But it's not easy to test a lone `i64.const` instruction: the wrapper inlining doesn't support `i64` in the signature on 32-bit architectures, i.e., you first need to convert it to an `i32` somehow, and that may be constant folded again.)
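The testing difficulty described above can be seen with the plain JS API (raw bytes; the export names `c64`/`c32` are made up for this sketch): exposing the `i64.const` directly puts an i64 in the signature, which JS observes as a BigInt and which the wrapper inlining rejects on 32-bit targets, while wrapping it to i32 first invites constant folding of the whole sequence.

```javascript
// Two exports around the same `i64.const 42`: one returns the i64 as-is,
// the other narrows it with `i32.wrap_i64`.
const i64Bytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,       // magic + version
  0x01, 0x09, 0x02, 0x60, 0x00, 0x01, 0x7e,             // type 0: () -> i64
  0x60, 0x00, 0x01, 0x7f,                               // type 1: () -> i32
  0x03, 0x03, 0x02, 0x00, 0x01,                         // two functions
  0x07, 0x0d, 0x02, 0x03, 0x63, 0x36, 0x34, 0x00, 0x00, // export "c64"
  0x03, 0x63, 0x33, 0x32, 0x00, 0x01,                   // export "c32"
  0x0a, 0x0c, 0x02,                                     // code section
  0x04, 0x00, 0x42, 0x2a, 0x0b,                         // c64: i64.const 42
  0x05, 0x00, 0x42, 0x2a, 0xa7, 0x0b,                   // c32: + i32.wrap_i64
]);
const i64Exports =
    new WebAssembly.Instance(new WebAssembly.Module(i64Bytes)).exports;
// i64Exports.c64() yields the BigInt 42n; i64Exports.c32() yields 42.
```

The `c32` variant is the "convert it to an `i32` somehow" workaround from the comment above, and its body is precisely the kind of constant expression an optimizer may fold away before the lowering is ever exercised.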
if constexpr (ValidationTag::validate) {
Daniel Lehmann: So, what was the reason for this change again?
Matthias Liedtke: `DecodeError()` has `V8_ASSUME(ValidationTag::validate)` in its body to help the compiler optimize (and since it's an unchecked assumption, `V8_ASSUME` additionally DCHECKs that it holds). Here, we call `DecodeError` in a context where `validate` could be false. I am not entirely sure why we didn't trip over this beforehand, but this is the correct thing to do, and it comes up with the bailouts in the `WasmInJsInliningInterface` now.
Ah, I think I got it, with bailouts, we don't reach the end but we still don't want to trigger a `DecodeError` here, right?
Yes, exactly, good explanation. I updated the comment to explicitly mention bailouts.
addTestcase('createArray', makeSig([kWasmI32], [kWasmExternRef]), [42], [
This isn't needed for feature parity, it's only needed to test the `array.len` below, so your TODO is slightly misleading. 😊
Oh, good point. Fixed.
// addBinaryTestcase('RefEq', makeSig([kWasmExternRef, kWasmExternRef], [kWasmExternRef]), {}, null);
Should this be `kWasmEqRef`, or do I misunderstand the comment?
Right, I missed that I can already test this without having to support operations that create an `eqref` in Wasm, since one can pass JS numbers as `kWasmEqRef`. So fixed and added a `RefEq` testcase now.
eval(jsFunctionSrc);
const jsFunction = eval('js_' + name);
You can slightly reduce the amount of eval and magic by doing
```suggestion
const jsFunction = eval(jsFunctionSrc);
```
Unfortunately, that's not possible, since in `eval('function foo() { ... }')` the JS is parsed as a function declaration (not a named function expression) and hence has return value `undefined`. It's only a named function expression when this syntax is used in a context that doesn't allow statements (e.g., the right-hand side of an assignment). So I think I have to keep this particular eval magic :/
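A standalone demonstration of the parsing distinction described above (plain JS, not the actual test harness): at statement position, `function` parses as a declaration whose completion value is undefined, while parentheses force a named function expression that `eval` returns.

```javascript
// A function declaration's completion value is undefined; eval only
// declares `foo` in scope and returns nothing useful.
const asDeclaration = eval('function foo() { return 1; }');
// asDeclaration === undefined

// Parentheses put the same text in expression position, so eval returns
// the named function expression itself.
const asExpression = eval('(function foo() { return 1; })');
// asExpression() === 1
```

This is why `eval(jsFunctionSrc)` alone cannot replace the second lookup-by-name eval when `jsFunctionSrc` is a bare `function js_...() { ... }` declaration.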
assertEquals(%GetOptimizationStatus(jsFunction), V8OptimizationStatus.kIsFunction | V8OptimizationStatus.kOptimized | V8OptimizationStatus.kTurboFanned);
I think these asserts can produce all kinds of issues in various build modes (e.g. the one that randomly deopts functions).
You should probably try to use the helpers like `assertOptimized(jsFunction)` that hides most of that complexity.
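To sketch why the helpers are more robust (the bit values below are made up for illustration, not the real `V8OptimizationStatus` constants): testing individual status bits with `&` keeps passing when stress or build modes set additional, unrelated bits, whereas the exact-equality assert above breaks.

```javascript
// Hypothetical bit positions, for illustration only.
const kIsFunction = 1 << 0;
const kOptimized = 1 << 4;

// Helper-style check: only the bits we care about are inspected.
function isOptimizedSketch(status) {
  return (status & kIsFunction) !== 0 && (status & kOptimized) !== 0;
}
```

A status with an extra unrelated bit, e.g. `kIsFunction | kOptimized | (1 << 9)`, still satisfies `isOptimizedSketch`, while `status === (kIsFunction | kOptimized)` would not, which is the brittleness the comment above warns about.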
[turboshaft][wasm] Wasm-in-JS inlining: Part 1
Start with some boilerplate and incomplete implementation. This adds:
- Off-by-default, experimental flag.
- Attaching the information about which Wasm function is called to a
  new TurboFan sidetable (determined together with Nico to be the
  least invasive / best option).
- Convert that sidetable information and attach to `TSCallDescriptor`.
- New Turboshaft phase and reducer, which consume this info and do the
actual inlining.
- Mjsunit-style message test.
- WasmFullDecoder instantiation (in the TS reducer) and new interface
for the decoder.
- Bailout by decoding Wasm function twice.
- Don't toggle thread-in-Wasm-flag when inlining (HACKY, see comment).
- Implementation (roughly copied from regular Wasm TS pipeline) of
* locals, globals, constants,
* simple arithmetic,
* several binary and unary operations that don't trap or call
builtins.
TODOs for follow-up parts:
- Correct stack trace, metadata, debugging ...
- Implement support for more Wasm instructions in the inlining
interface (which might need smarter code sharing with the regular
Wasm pipeline)
- Implement memory accesses (with bounds checking)
- Fuzz the whole thing with Fuzzilli
// need to understand what the inlining ID is / where to get it.
You'll need to provide that from the JS side, see https://source.chromium.org/chromium/chromium/src/+/main:v8/src/compiler/js-inlining.cc;l=454
This might not be fully trivial given that you are doing the inlining later than the JS inlining.
Daniel Lehmann: Thanks for the pointer, will leave this open and address in a follow-up CL/part of the implementation.
This got fixed in https://crrev.com/c/7638289.
kGCPrefix, kExprAnyConvertExtern,
kGCPrefix, kExprExternConvertAny,
Daniel Lehmann: Note that this can be optimized away (as mentioned in another comment).
An idea using `kSig_r_i`:
```
kExprLocalGet, 0,
kExprRefI31,
kExprExternConvertAny
```
and `kSig_i_r` for the other way:
```
kExprLocalGet, 0,
kExprAnyConvertExtern,
kExprI31GetS,
```
(I didn't check if you have the ref.i31 stuff in your CL, so this could also be a nice test case for a follow-up CL.)
And yes, there could still be some optimizations here, but I don't think we optimize anything there (other than very generic optimizations).
Thanks for the test case. Since I don't have `ref.i31` support yet, I'll keep this open and add it in a later CL.
Finally fixed this test together with the support for `any.convert_extern` and `extern.convert_any` in https://crrev.com/c/7725990.