turboshaft isel failure with projections

70 views
Skip to first unread message

Sam Parker-Haynes

unread,
Nov 8, 2023, 7:54:39 AM11/8/23
to v8-dev
Hi,

I've been investigating rematerialising flag setting instructions in VisitWordCompareZero, just by removing the CanCover check, and something is going wrong with turboshaft.

I have an arithmetic overflow operation, Int32AddWithOverflow, and the overflow check is used by two branch operations. With a turbofan-only flow, the overflow operation and the projection will be duplicated for the second branch, and everything is fine. But with turboshaft, neither the overflow operation or projection are duplicated and the second branch uses the original projection[1] value. This looks fine at the IR level, but ends up broken after isel when the register allocator comes across a virtual register without a definition. This happens for both x64 and arm64, so I'm assuming turboshaft is making some assumptions, based on the current behaviour, that are non-obvious to me.

Any ideas?

Thanks!
Sam



Matthias Liedtke

unread,
Nov 8, 2023, 8:12:56 AM11/8/23
to v8-...@googlegroups.com
Hi Sam,

I don't have any insights on x64 for VisitWordCompareZero but as a quick note, the TurboshaftAdapter instruction selection implementation for VisitWordCompareZero for arm64 was just added yesterday by me (https://chromium-review.googlesource.com/c/v8/v8/+/5002003).

So, just to clarify, I assume you are talking about the Turbofan {Node* / TurbofanAdapter} based instruction selection with Turboshaft optimizations followed by a RecreateSchedule?
If you do changes for the TurbofanAdapter instruction selection on arm64 for code that already has an implementation for the TurboshaftAdapter, could you add a TODO(mliedtke) on the Turboshaft implementation with some nice description and CC me on the gerrit changes, so these don't get lost when we switch over to the TurboshaftAdapter?

Thanks and best regards,
Matthias

--
--
v8-dev mailing list
v8-...@googlegroups.com
http://groups.google.com/group/v8-dev
---
You received this message because you are subscribed to the Google Groups "v8-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to v8-dev+un...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/v8-dev/175a4ba2-9d33-45bb-84e3-91d41a722846n%40googlegroups.com.

Sam Parker-Haynes

unread,
Nov 8, 2023, 8:20:03 AM11/8/23
to v8-dev
Hi Matthias,

Ah, I hadn't seen that, so yes I've been tweaking the OG selector function for arm64. On the x64 side of things, everything is fine with 'pure' turboshaft pipeline but, as with my arm64 checkout, the issue arises when modifying the turbofan selector function and using ---no-turboshaft.

So, I guess the problem is with the RecreateSchedule?

Thanks,
Sam 

Sam Parker-Haynes

unread,
Nov 8, 2023, 8:26:12 AM11/8/23
to v8-dev
EDIT: only passes with --no-turboshaft

Matthias Liedtke

unread,
Nov 8, 2023, 9:14:08 AM11/8/23
to v8-...@googlegroups.com
Hi Sam,

I'd probably try to see if there are any suspicious differences on the comparisons that produce errors with --trace-turbo in the "V8TFTurboshaftRecreateSchedule" phase with --turboshaft vs. the "schedule" phase with --no-turboshaft.
One known difference is also that the schedule recreation from Turboshaft does not rebuild an effect and control chain for the sea of nodes / Turbofan representation. I don't think, however, that this should matter here.

Best regards,
Matthias





Sam Parker-Haynes

unread,
Nov 8, 2023, 9:15:58 AM11/8/23
to v8-dev
Also, are any instruction selection tests being added for Turboshaft? Something like the equivalent of test/unittests/compiler/arm64/instruction-selector-arm64-unittest.cc?

Sam Parker-Haynes

unread,
Nov 8, 2023, 11:24:16 AM11/8/23
to v8-dev
Hi Matthias,

My RecreateSchedule example looks like this:

B0:
  0: Int32AddWithOverflow
  1: Projection[0] (0)
  2: Projection[1] (1)
  Branch (2)
...

B4:
  Branch (2)    <-- how should this branch receive (2)

I know I said it looks okay at the IR level, but since the IR isn't documented that may not be true ;) So, what are the semantics of a Projection..?
- Is the Projection[1] of an Overflow operation a part of the control chain?
- Is it legal for all projected values to be live across basic blocks? If so, who should be responsible for ensuring values are live where they need to be?

Thanks,
Sam

Matthias Liedtke

unread,
Nov 8, 2023, 12:02:40 PM11/8/23
to v8-...@googlegroups.com
Hi Sam,

Honestly I don't feel knowledgeable enough to answer this question, so I hope some Turbofan experts can chime in.
This line defines that the Projection in Turbofan has one value input, a control input and a value output but no control output.
I don't understand why it has a control input or what it exactly means that an operation has a control input vs. let's say a Select operation that doesn't have one.
Still, I'd assume that the Branch in B4 should have a control input that at some point leads to the same control input as "2: Projection [1]" and therefore I'd assume that it's valid as the control input of the projection dominates the branch that uses it.
Looking at the schedule, as the Projection is in B0, it is guaranteed to be "executed" before B4, so the virtual register should be defined and should be "available" / usable by all instructions following itself in B0 and in all instructions in blocks that are reachable from B0 directly or indirectly.
I don't think there is anything limiting the liveness of a virtual register, so as long as a block A dominates another Block B, all virtual registers defined in A are usable in B.
So your snippet looks totally fine based on my (limited) understanding of Turbofan.

However, I'd assume it should be
  1: Projection[0] (0)
  2: Projection[1] (0)
and not
  1: Projection[0] (0)
  2: Projection[1] (1)
(i.e. both projections should have Int32AddWithOverflow as an input)

Best regards,
Matthias


dmerc...@google.com

unread,
Nov 8, 2023, 12:20:37 PM11/8/23
to v8-dev
Hi,

> With a turbofan-only flow, the overflow operation and the projection will be duplicated for the second branch, and everything is fine.

I'm not sure which phase could duplicate a Int32AddWithOverflow. I don't think that the Instruction Selector does this. It could be an artifact of scheduling (which happens before effect control linearization or before instruction selection). I'm not sure it's a conscious optimization though, or just a side-effect of the scheduler.

> Is it legal for all projected values to be live across basic blocks? If so, who should be responsible for ensuring values are live where they need to be?

Yes it is legal (and probably happens fairly often). As to who should ensure that the values are live... It sounds like here the bug is in the Instruction Selector. What probably happens is that when emitting the 1st branch, we decide that it can cover the Projection[1] and don't need to emit it, which is wrong. However, this should be prevented by this check https://source.chromium.org/chromium/chromium/src/+/main:v8/src/compiler/backend/x64/instruction-selector-x64.cc;l=4278: when user=the_first_branch and value=projection[1], CanCover should return false, because the projection has another use.

If you can provide an example that we can reproduce, I'd be happy to have a look, but without being able to reproduce, we're just speculating :/

> I don't understand why it has a control input

Projections don't really need control inputs, but I think that having control inputs in projections is somewhat useful for scheduling in Turbofan. We trying removing the control input (https://crrev.com/c/4570503), and performance regressed in some cases.

Cheers,
Darius

Sam Parker-Haynes

unread,
Nov 9, 2023, 4:50:53 AM11/9/23
to v8-dev
Hi,

Thanks both!

Looking again at the IR, I wonder if it is control related. This is the well-behaved IR, where both instances of the Int32AddWithOverflow have a control input.

--- BLOCK B1145 id5053 <- B1143, B1144 ---                                                                                    
  43: Merge(42, 13024)                                                                                                        
  53: Phi[kRepWord32](18, 13025, 43)                                                                                          
  52: Phi[kRepWord32](17, 13021, 43)                                                                                          
  51: EffectPhi(15, 13024, 43)                                                                                                
  13034: ChangeInt32ToFloat64(53)
  6670: Load[kRepTaggedPointer|kTypeAny](6114, 23562, 51, 43)
  13026: Int32AddWithOverflow(53, 53, 43)                                                                                      
  13027: Projection[1](13026, 43)                                                                                              
  13028: Branch[Machine, False](13027, 43) -> B1147, B1146

--- BLOCK B1158 id5027 <- B1157, B1153 ---
  13061: Merge(13054, 13044)
  13063: Phi[kRepFloat64](13056, 13060, 13061)
  13062: EffectPhi(13056, 78, 13061)
  6672: Load[kRepTaggedPointer|kTypeAny](6111, 23562, 13062, 13061)
  13068: Int32AddWithOverflow(53, 53, 13061)
  13069: Projection[1](13068, 13061)
  13070: Branch[Machine, False](13069, 13061) -> B1160, B1159

However, with Turboshaft, the blocks look like this:

--- BLOCK B1145 id1146 <- B1143, B1144 ---
  5473: Phi[kRepWord32](5449, 5470)
  5474: Phi[kRepWord32](5450, 5466)
  5475: Int64Constant[19]
  5476: Load[kRepTaggedPointer|kTypeAny](28, 5475)
  5477: TypedStateValues[kRepTagged|kTypeAny, sparse:^](3)
  5478: TypedStateValues[kRepWord32|kTypeInt32, kRepWord32|kTypeInt32, sparse:^^](5474, 5473)
  5479: TypedStateValues[, sparse:..]
  5480: TypedStateValues[kRepTagged|kTypeAny, kRepTagged|kTypeAny, kRepTagged|kTypeAny, kRepTagged|kTypeAny, kRepTagged|kTypeAny, sparse:^^^^^...](5478, 5479, 27, 26, 25)
  5481: TypedStateValues[, sparse:.]
  5482: FrameState[UNOPTIMIZED_FRAME, 59, PokeAt(0), 0x191e0015eab5 <SharedFunctionInfo test_div>](5477, 5480, 5481, 2, 8, 0)
  5483: Int32AddWithOverflow(5473, 5473)
  5484: Projection[0](5483)
  5485: Projection[1](5483)
  5486: Branch[Unspecified, False](5485) -> B1147, B1146

--- BLOCK B1158 id1159 <- B1153, B1157 ---
  5549: Phi[kRepFloat64](5534, 5548)
  5550: Float64RoundDown(5549)
  5551: Int64Constant[19]
  5552: Load[kRepTaggedPointer|kTypeAny](27, 5551)
  5553: TypedStateValues[kRepTagged|kTypeAny, sparse:^](3)
  5554: TypedStateValues[kRepWord32|kTypeInt32, kRepWord32|kTypeInt32, sparse:^^](5474, 5473)
  5555: TypedStateValues[, sparse:..]
  5556: TypedStateValues[kRepTagged|kTypeAny, kRepTagged|kTypeAny, sparse:^^......](5554, 5555)
  5557: TypedStateValues[, sparse:.]
  5558: FrameState[UNOPTIMIZED_FRAME, 92, PokeAt(0), 0x191e0015eab5 <SharedFunctionInfo test_div>](5553, 5556, 5557, 2, 8, 0)
  5559: Branch[Unspecified, False](5485) -> B1160, B1159


I have seen that Turboshaft has an EmitProjectionReducer (https://source.chromium.org/chromium/chromium/src/+/main:v8/src/compiler/turboshaft/assembler.h;drc=f5a7962861b208e9cf82e61c1fa9f8dc0d216a87;l=606) and seems to be closely related to ValueNumbering. Without control inputs, I can't see why ValueNumbering, or something similar, wouldn't remove the second instance.  

>  It could be an artifact of scheduling

I will have a look at the scheduler.

> However, this should be prevented by this check

Yes, it's that check that I'm trying to remove :) and this works for everything but projections. Surely we should be able to combine an operation even if it has multiple users? 

Regards,
Sam

Sam Parker-Haynes

unread,
Nov 9, 2023, 5:31:24 AM11/9/23
to v8-dev
I can confirm that by disabling value numbering, a second Int32AddWithOverflow does exist:

--- BLOCK B1158 id1159 <- B1153, B1157 ---
  5768: Phi[kRepFloat64](5753, 5767)
  5769: Float64RoundDown(5768)
  5770: Int64Constant[19]
  5771: Load[kRepTaggedPointer|kTypeAny](27, 5770)
  5772: TypedStateValues[kRepTagged|kTypeAny, sparse:^](3)
  5773: TypedStateValues[kRepWord32|kTypeInt32, kRepWord32|kTypeInt32, sparse:^^](5690, 5689)
  5774: TypedStateValues[, sparse:..]
  5775: TypedStateValues[kRepTagged|kTypeAny, kRepTagged|kTypeAny, sparse:^^......](5773, 5774)
  5776: TypedStateValues[, sparse:.]
  5777: FrameState[UNOPTIMIZED_FRAME, 92, PokeAt(0), 0x191e0015eab5 <SharedFunctionInfo test_div>](5772, 5775, 5776, 2, 8, 0)
  5778: Int32AddWithOverflow(5689, 5689)
  5779: Projection[0](5778)
  5780: Projection[1](5778)
  5781: Branch[Unspecified, False](5780) -> B1160, B1159

So could Overflow operations and/or Projections[1] be special-cased..? And why is there a difference here with a normal comparison?

Thanks,
Sam

dmerc...@google.com

unread,
Nov 9, 2023, 7:00:07 AM11/9/23
to v8-dev
Arf sorry, I read a bit too quickly your initial message and missed that you were trying to remove a CanCover in VisitWordCompareZero.

> Surely we should be able to combine an operation even if it has multiple users?

Yes, we should (but we don't). We've been thinking about relaxing some CanCovers for this purpose for a while, but haven't had time to get to it just yet. It's not surprising that it currently doesn't work, but it would be nice to make it work.

> So could Overflow operations and/or Projections[1] be special-cased..? And why is there a difference here with a normal comparison?

First, I think that very few operations besides Overflow binops have projections. Only Calls come to mind, and they are fairly different with regard to VisitWordCompareZero and CanCover (you probably never want to cover a Call). So, Overflow binops are different in the sense that they have 2 outputs and in particular an overflow output.

Then, for a more useful explanation of what it going on:
In VisitWordCompareZero, the continuation is a ForBranch when coming from a VisitBranch (which is your case). With such continuation, the overflow bit is never Defined even if it's used (see VisitBinop: it clearly has a single output regardless of the continuation, and EmitWithContinuation only defines an additional output for Set continuations), so it's not surprising that you end up with a virtual register (the one for the overflow bit) being used but not defined: you use it in the 2nd branch, but the 1st branch doesn't define it.
If you look at VisitInt32AddWithOverflow, you'll see that it uses a VisitBinop (like VisitWordCompareZero does for branches on Int32AddWithOverflow's) with a ForSet continuation, and this ForSet continuation causes EmitWithContinuation to emit the overflow bit.

In order to allow VisitWordCompareZero to cover Int32AddWithOverflow when the overflow bit has other uses (which should all be already Defined), you'll need to change something about the continuation used in VisitWordCompareZero. A ForBranchAndSet continuation sounds reasonable, but I haven't looked at the details.

Happy to help if you have other questions :)

Cheers,
Darius

Sam Parker-Haynes

unread,
Nov 9, 2023, 8:22:22 AM11/9/23
to v8-dev
Aaaaaaah, thank you! Well, now it seems obvious :)

I think, for now, I will leave the continuation alone and special-case projections to still require CanCover.

My question remains around testing. I can currently test specific isel patterns for Turbofan, is there anything like that for Turboshaft?

Thanks again,
Sam

Matthias Liedtke

unread,
Nov 9, 2023, 10:14:31 AM11/9/23
to v8-...@googlegroups.com
Hi Sam,

At least for now such a simplified setup for assembling Turboshaft instructions manually in a gtest and running the instruction selection on it doesn't exist.
It should probably be feasible to define a graph builder for such test cases that basically just provides access to the turboshaft::Assembler.
The test environment would need to provide some meaningful turboshaft::PipelineData.

On our side this hasn't been a priority yet.

Best regards,
Matthias

Reply all
Reply to author
Forward
0 new messages