About TFTurboshaftWasmOptimize

Yahan Lu

Nov 13, 2023, 6:19:19 AM
to v8-dev

----- V8.TFTurboshaftWasmOptimize -----

MERGE B0
0: Parameter()[1]
1: StackCheck()[WebAssembly, function-entry]
2: Constant()[word64: 1]
3: WordBinop(#0, #2)[BitwiseAnd, Word64]
4: Constant()[word64: 0]
5: Equal(#3, #4)[Word64]
6: Constant()[word32: 0]
7: Branch(#5)[B2, B1, False]

BLOCK B1 <- B0
10: Load *(#0) [tagged base, TaggedPointer]
11: Load *(#10 + 12) [tagged base, Uint16, offset: 12]
12: Constant()[word32: 128]
13: Comparison(#11, #12)[UnsignedLessThan, Word32]
14: Goto()[B3]

BLOCK B2 <- B0
15: Goto()[B3]

MERGE B3 <- B1, B2
16: Phi(#13, #6)[Word32]
17: Return(#6, #16)

----- V8.TFTurboshaftWasmDeadCodeElimination -----

MERGE B0
0: Parameter()[1]
1: Constant()[word64: 1]
2: WordBinop(#0, #1)[BitwiseAnd, Word64]
3: Constant()[word64: 0]
4: Equal(#2, #3)[Word64]
5: Constant()[word32: 0]
6: Branch(#4)[B2, B1, False]

BLOCK B1 <- B0
9: TaggedBitcast(#0)[Tagged, Word64]
10: Load *(#9 - 1) [raw, TaggedPointer, offset: -1]
11: TaggedBitcast(#10)[Tagged, Word64]
12: Load *(#11 + 11) [raw, Uint16, offset: 11]
13: Constant()[word32: 128]
14: Comparison(#12, #13)[UnsignedLessThan, Word32]
15: Goto()[B3]


On Block B1
Before TFTurboshaftWasmDeadCodeElimination
10: Load *(#0) [tagged base, TaggedPointer]
After:
9: TaggedBitcast(#0)[Tagged, Word64]
10: Load *(#9 - 1) [raw, TaggedPointer, offset: -1]

Was this changed by TFTurboshaftWasmDeadCodeElimination?
I set constexpr bool trace_analysis to true, but I can't find any information about this change.

cmd:
arm64.release/d8 --test test/mjsunit/mjsunit.js test/mjsunit/wasm/turboshaft/instruction-selection.js --random-seed=-1803833885 --nohard-abort --verify-heap --testing-d8-test-runner --no-liftoff --no-wasm-lazy-compilation --experimental-wasm-stringref --turboshaft-wasm --turboshaft-wasm-instruction-selection --print-code --trace-turbo-graph --trace-turbo-filter=isString

Matthias Liedtke

Nov 13, 2023, 6:41:33 AM
to v8-...@googlegroups.com
Hi Yahan,

Yes, this happens in a reducer introduced a few days ago here.
In Turbofan, a load is a simple operation that represents *(base + index); in Turboshaft, a load expresses *(base + index * element_size + offset).
This is fairly close to loads on e.g. x86-64, but it doesn't map as well onto e.g. arm64, where a load is a simpler / more restricted instruction.
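To make that concrete, here is a rough stand-alone sketch (illustration only, not V8 code) of the two addressing forms. It also shows why the lowered load in your dump switches from a tagged base to a raw base with offset -1 (and from offset 12 to 11): V8 tags heap pointers with kHeapObjectTag == 1 in the low bit, so untagging subtracts 1 from every field offset.

// Illustration only, not V8 code; kHeapObjectTag mirrors V8's convention
// that tagged heap pointers have their low bit set.
#include <cstdint>
#include <cstring>

constexpr uintptr_t kHeapObjectTag = 1;

// Turbofan-style load: the operation itself only knows *(base + index).
uint8_t SimpleLoad(const uint8_t* base, intptr_t index) {
  return *(base + index);
}

// Turboshaft-style load: *(base + index * element_size + offset); a
// "tagged base" additionally implies subtracting kHeapObjectTag, which is
// why offset 12 becomes 11 and the implicit offset 0 becomes -1 after
// lowering to a raw load.
uint16_t ComplexLoadU16(uintptr_t tagged_base, intptr_t index,
                        intptr_t element_size, intptr_t offset) {
  uintptr_t raw = tagged_base - kHeapObjectTag;  // TaggedBitcast + "- 1"
  uint16_t value;
  std::memcpy(&value,
              reinterpret_cast<const void*>(raw + index * element_size + offset),
              sizeof(value));
  return value;
}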
There are two options to solve this:
  a) During instruction selection map the complex load to a sequence of machine instructions.
  b) Before instruction selection, lower the "complex" loads to simpler operations that are closer to the target machine.
(a) is probably the cleaner design, as it keeps the optimizations machine-independent (which already isn't true in many other cases, like count-leading-zeros and other operations that are replaced by something else very early if the target machine doesn't support them).
The advantage of (b) is that any subsequent load with the same base + index * element_size but a different offset can profit from global value numbering, so the address computation doesn't have to be repeated multiple times.
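As a toy illustration of (b) and the value-numbering point (the node names and data structures here are made up and are not the real Turboshaft graph or reducer API): the lowering emits an explicit base + index * element_size computation plus a simple load, and GVN then shares that address node between loads that differ only in their offset.

// Toy sketch only; not the actual Turboshaft graph or reducer API.
#include <cstdint>
#include <map>
#include <tuple>
#include <vector>

struct Node {
  enum Kind { kConstant, kMul, kAdd, kLoad };
  Kind kind;
  int a;        // first input id, or -1
  int b;        // second input id, or -1
  int64_t imm;  // constant value, element size, or load offset
};

class TinyGraph {
 public:
  // Emit a node, reusing an existing identical one (global value numbering).
  int Emit(const Node& n) {
    auto key = std::make_tuple(static_cast<int>(n.kind), n.a, n.b, n.imm);
    auto it = gvn_.find(key);
    if (it != gvn_.end()) return it->second;
    nodes_.push_back(n);
    int id = static_cast<int>(nodes_.size()) - 1;
    gvn_.emplace(key, id);
    return id;
  }

  // Lower *(base + index * element_size + offset) into:
  //   scaled = index * element_size
  //   addr   = base + scaled              <- shared across different offsets
  //   load   = SimpleLoad(addr, offset)   <- offset kept as a small immediate
  int LowerComplexLoad(int base, int index, int64_t element_size,
                       int64_t offset) {
    int scaled = Emit({Node::kMul, index, -1, element_size});
    int addr = Emit({Node::kAdd, base, scaled, 0});
    return Emit({Node::kLoad, addr, -1, offset});
  }

  size_t size() const { return nodes_.size(); }

 private:
  std::vector<Node> nodes_;
  std::map<std::tuple<int, int, int, int64_t>, int> gvn_;
};

int main() {
  TinyGraph g;
  int base = g.Emit({Node::kConstant, -1, -1, 0x1000});
  int index = g.Emit({Node::kConstant, -1, -1, 7});
  g.LowerComplexLoad(base, index, /*element_size=*/8, /*offset=*/4);
  g.LowerComplexLoad(base, index, /*element_size=*/8, /*offset=*/12);
  // The mul and add nodes are shared: 2 constants + 1 mul + 1 add + 2 loads.
  return g.size() == 6 ? 0 : 1;
}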

That being said, please note that --turboshaft-wasm-instruction-selection is a highly experimental flag that currently doesn't support much more than what is covered in this single mjsunit test.
It will take multiple weeks to get it to a state where it's feature-complete.
It might also be a bit unfortunate that this is done during a phase called WasmDeadCodeElimination, even though it isn't about dead code. The issue is that every additional phase has a significant impact on both compile time and binary size. While this is still experimental, it might nevertheless be a good idea to move it into its own phase.

Does this help? Do you see an issue in this transformation? (Note that prior to this, similar modifications would happen when translating the Turboshaft load to a Turbofan load in recreate-schedule.cc.)

Best regards,
Matthias


Yahan Lu

Nov 13, 2023, 7:58:44 AM
to v8-dev
Hi Matthias,
Yes, thank you! I am porting your patch to risc-v.

Matthias Liedtke

Nov 13, 2023, 8:29:53 AM
to v8-...@googlegroups.com
Awesome, great to hear that!
If risc-v loads are similar to arm64, you might just add risc-v here, which should make the port somewhat easier.
The reducer still needs some further work: I think it doesn't handle trapping on null correctly (as it removes the tagged_base bit), and it might generate suboptimal code (e.g. if an index happens to be a constant, it should just be merged into the offset; see the sketch below), so you might run into issues if risc-v is too different from arm64.
For 32-bit we don't have a precedent yet, as neither ia32 nor arm32 has been ported / started.
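For the constant-index case I mean something like this (an illustrative helper only, not the reducer's real code):

// Illustration only: if the index is a compile-time constant, fold
// index * element_size into the immediate offset instead of emitting an
// explicit scaled-index address computation.
#include <cstdint>
#include <optional>

struct LoweredAddressing {
  bool needs_scaled_index;  // false: plain base + immediate offset
  int64_t offset;           // immediate offset for the simple load
};

LoweredAddressing FoldConstantIndex(std::optional<int64_t> constant_index,
                                    int64_t element_size, int64_t offset) {
  if (constant_index.has_value()) {
    return {false, offset + *constant_index * element_size};
  }
  return {true, offset};
}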
