About TFTurboshaftWasmOptimize

Yahan Lu

Nov 13, 2023, 6:19:19 AM
to v8-dev

----- V8.TFTurboshaftWasmOptimize -----

MERGE B0
0: Parameter()[1]
1: StackCheck()[WebAssembly, function-entry]
2: Constant()[word64: 1]
3: WordBinop(#0, #2)[BitwiseAnd, Word64]
4: Constant()[word64: 0]
5: Equal(#3, #4)[Word64]
6: Constant()[word32: 0]
7: Branch(#5)[B2, B1, False]

BLOCK B1 <- B0
10: Load *(#0) [tagged base, TaggedPointer]
11: Load *(#10 + 12) [tagged base, Uint16, offset: 12]
12: Constant()[word32: 128]
13: Comparison(#11, #12)[UnsignedLessThan, Word32]
14: Goto()[B3]

BLOCK B2 <- B0
15: Goto()[B3]

MERGE B3 <- B1, B2
16: Phi(#13, #6)[Word32]
17: Return(#6, #16)

----- V8.TFTurboshaftWasmDeadCodeElimination -----

MERGE B0
0: Parameter()[1]
1: Constant()[word64: 1]
2: WordBinop(#0, #1)[BitwiseAnd, Word64]
3: Constant()[word64: 0]
4: Equal(#2, #3)[Word64]
5: Constant()[word32: 0]
6: Branch(#4)[B2, B1, False]

BLOCK B1 <- B0
9: TaggedBitcast(#0)[Tagged, Word64]
10: Load *(#9 - 1) [raw, TaggedPointer, offset: -1]
11: TaggedBitcast(#10)[Tagged, Word64]
12: Load *(#11 + 11) [raw, Uint16, offset: 11]
13: Constant()[word32: 128]
14: Comparison(#12, #13)[UnsignedLessThan, Word32]
15: Goto()[B3]


On Block B1
Before TFTurboshaftWasmDeadCodeElimination
10: Load *(#0) [tagged base, TaggedPointer]
After:
9: TaggedBitcast(#0)[Tagged, Word64]
10: Load *(#9 - 1) [raw, TaggedPointer, offset: -1]

Was this changed by TFTurboshaftWasmDeadCodeElimination?
I set constexpr bool trace_analysis to true, but I can't find any information about this change.

cmd:
arm64.release/d8 --test test/mjsunit/mjsunit.js test/mjsunit/wasm/turboshaft/instruction-selection.js --random-seed=-1803833885 --nohard-abort --verify-heap --testing-d8-test-runner --no-liftoff --no-wasm-lazy-compilation --experimental-wasm-stringref --turboshaft-wasm --turboshaft-wasm-instruction-selection --print-code --trace-turbo-graph --trace-turbo-filter=isString

Matthias Liedtke

Nov 13, 2023, 6:41:33 AM
to v8-...@googlegroups.com
Hi Yahan,

Yes, this happens in a reducer introduced a few days ago here.
In Turbofan, a load is a simple operation that represents *(base + index); in Turboshaft, a load expresses *(base + index * element_size + offset).
This is fairly close to loads on e.g. x86-64, but it doesn't map as well onto e.g. arm64, where a load is a simpler / more restricted instruction.
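To make that concrete, here is a rough stand-alone sketch (illustration only, not V8 code) of the two addressing forms. It also shows why the lowered load in your dump switches from a tagged base to a raw base with offset -1 (and from offset 12 to 11): V8 tags heap pointers with kHeapObjectTag == 1 in the low bit, so untagging subtracts 1 from every field offset.

// Illustration only, not V8 code; kHeapObjectTag mirrors V8's convention
// that tagged heap pointers have their low bit set.
#include <cstdint>
#include <cstring>

constexpr uintptr_t kHeapObjectTag = 1;

// Turbofan-style load: the operation itself only knows *(base + index).
uint8_t SimpleLoad(const uint8_t* base, intptr_t index) {
  return *(base + index);
}

// Turboshaft-style load: *(base + index * element_size + offset); a
// "tagged base" additionally implies subtracting kHeapObjectTag, which is
// why offset 12 becomes 11 and the implicit offset 0 becomes -1 after
// lowering to a raw load.
uint16_t ComplexLoadU16(uintptr_t tagged_base, intptr_t index,
                        intptr_t element_size, intptr_t offset) {
  uintptr_t raw = tagged_base - kHeapObjectTag;  // TaggedBitcast + "- 1"
  uint16_t value;
  std::memcpy(&value,
              reinterpret_cast<const void*>(raw + index * element_size + offset),
              sizeof(value));
  return value;
}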
There are two options to solve this:
  a) During instruction selection map the complex load to a sequence of machine instructions.
  b) Before instruction selection, lower the "complex" loads to simpler operations that are closer to the target machine.
(a) is probably the cleaner design, as it keeps the optimizations machine-independent (which already isn't true in many other cases, like count-leading-zeros and other operations that are replaced by something else very early if the target machine doesn't support them).
The advantage of (b) is that any subsequent load with the same base + index * element_size but a different offset can profit from global value numbering, so the address computation doesn't have to be repeated multiple times.
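As a toy illustration of (b) and the value-numbering point (the node names and data structures here are made up and are not the real Turboshaft graph or reducer API): the lowering emits an explicit base + index * element_size computation plus a simple load, and GVN then shares that address node between loads that differ only in their offset.

// Toy sketch only; not the actual Turboshaft graph or reducer API.
#include <cstdint>
#include <map>
#include <tuple>
#include <vector>

struct Node {
  enum Kind { kConstant, kMul, kAdd, kLoad };
  Kind kind;
  int a;        // first input id, or -1
  int b;        // second input id, or -1
  int64_t imm;  // constant value, element size, or load offset
};

class TinyGraph {
 public:
  // Emit a node, reusing an existing identical one (global value numbering).
  int Emit(const Node& n) {
    auto key = std::make_tuple(static_cast<int>(n.kind), n.a, n.b, n.imm);
    auto it = gvn_.find(key);
    if (it != gvn_.end()) return it->second;
    nodes_.push_back(n);
    int id = static_cast<int>(nodes_.size()) - 1;
    gvn_.emplace(key, id);
    return id;
  }

  // Lower *(base + index * element_size + offset) into:
  //   scaled = index * element_size
  //   addr   = base + scaled              <- shared across different offsets
  //   load   = SimpleLoad(addr, offset)   <- offset kept as a small immediate
  int LowerComplexLoad(int base, int index, int64_t element_size,
                       int64_t offset) {
    int scaled = Emit({Node::kMul, index, -1, element_size});
    int addr = Emit({Node::kAdd, base, scaled, 0});
    return Emit({Node::kLoad, addr, -1, offset});
  }

  size_t size() const { return nodes_.size(); }

 private:
  std::vector<Node> nodes_;
  std::map<std::tuple<int, int, int, int64_t>, int> gvn_;
};

int main() {
  TinyGraph g;
  int base = g.Emit({Node::kConstant, -1, -1, 0x1000});
  int index = g.Emit({Node::kConstant, -1, -1, 7});
  g.LowerComplexLoad(base, index, /*element_size=*/8, /*offset=*/4);
  g.LowerComplexLoad(base, index, /*element_size=*/8, /*offset=*/12);
  // The mul and add nodes are shared: 2 constants + 1 mul + 1 add + 2 loads.
  return g.size() == 6 ? 0 : 1;
}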

That being said, please note that --turboshaft-wasm-instruction-selection is a highly experimental flag that currently doesn't support much more than what is covered in this single mjsunit test.
It will take multiple weeks to get it to a state where it's feature-complete.
It might also be a bit unfortunate that this is done during a phase called WasmDeadCodeElimination, even though it isn't about dead code. The issue is that every additional phase has a significant impact on both compile time and binary size. While this is still experimental, it might nevertheless be a good idea to move it into its own phase.

Does this help? Do you see an issue in this transformation? (Note that prior to this, similar modifications would happen when translating the Turboshaft load to a Turbofan load in recreate-schedule.cc.)

Best regards,
Matthias


Yahan Lu

Nov 13, 2023, 7:58:44 AM
to v8-dev
Hi Matthias,
Yes, thank you! I am porting your patch to risc-v.

Matthias Liedtke

Nov 13, 2023, 8:29:53 AM
to v8-...@googlegroups.com
Awesome, great to hear that!
If risc-v loads are similar to arm64, you might just add risc-v here, which should make the port somewhat easier.
The reducer still needs some further work: I think it doesn't handle trapping on null correctly (as it removes the tagged_base bit), and it might generate suboptimal code (e.g. if an index happens to be a constant, it should just be merged into the offset; see the sketch below), so you might run into issues if risc-v is too different from arm64.
For 32-bit we don't have a precedent yet, as neither ia32 nor arm32 has been ported / started.
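For the constant-index case I mean something like this (an illustrative helper only, not the reducer's real code):

// Illustration only: if the index is a compile-time constant, fold
// index * element_size into the immediate offset instead of emitting an
// explicit scaled-index address computation.
#include <cstdint>
#include <optional>

struct LoweredAddressing {
  bool needs_scaled_index;  // false: plain base + immediate offset
  int64_t offset;           // immediate offset for the simple load
};

LoweredAddressing FoldConstantIndex(std::optional<int64_t> constant_index,
                                    int64_t element_size, int64_t offset) {
  if (constant_index.has_value()) {
    return {false, offset + *constant_index * element_size};
  }
  return {true, offset};
}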
