[llvm-dev] [RFC] Setting the current debug loc when the insertion point changes

52 views

Skip to first unread message

Vedant Kumar via llvm-dev

unread,

Nov 6, 2017, 5:19:21 PM11/6/17

to llvm-dev

Hi everybody,

I'm investigating some correctness issues with debug line table information in optimized code. I've noticed something problematic in IRBuilder. Setting the insertion point of an IRBuilder sets the builder’s current debug loc to the one attached to the insertion point. IMO this isn't always a correct or convenient thing to do, and it shouldn't be the default behavior.

# Correctness

The IRBuilder shouldn't assume that the debug loc at its insertion point applies to any instructions it emits. It doesn't have enough information to make that connection. This assumption leads to bugs like llvm.org/PR25630. I.e it creates situations in which we silently push around incorrect DILocations without realizing it, because the debug locs in the IR look nice and contiguous.

To fix the issue, clients of IRBuilder should be required to either:

1. Explicitly set the correct debug location before inserting instructions, or

2. Opt-in to the current behavior of propagating the debug loc from the insertion point.

# Convenience

The current behavior of IRBuilder can be convenient when it's correct. But it can be *really* inconvenient when it's incorrect.

If a client sets an IRBuilder's debug loc correctly and makes a call that changes the builder's insertion point, all generated instructions will pick up a bogus debug loc from the IP. E.g this is what happens with SCEVExpander::expandCodeFor(). There's no way to fix this bug without changing IRBuilder's behavior.

# Solutions

1. Add a method called SetInsertPointAndDebugLoc() which behaves identically to the current SetInsertPoint(). Change SetInsertPoint() so that it doesn't change the current debug loc.

2. Add a boolean argument to SetInsertPoint() called UpdateDbgLoc which defaults to false. Only change the current debug loc if UpdateDbgLoc is true.

I have a mild preference for idea #1 because it's easier to read, IMO.

Whichever solution we pick, my plan is to first migrate all in-tree users of SetInsertPoint() to the explicit API, and to then gradually fix passes which rely on the current incorrect behavior.

I'd appreciate your thoughts on this!

thanks,

vedant

Adrian Prantl via llvm-dev

unread,

Nov 6, 2017, 5:33:00 PM11/6/17

to Vedant Kumar, llvm-dev

> On Nov 6, 2017, at 2:19 PM, Vedant Kumar <v...@apple.com> wrote:
>
> Hi everybody,
>
> I'm investigating some correctness issues with debug line table information in optimized code. I've noticed something problematic in IRBuilder. Setting the insertion point of an IRBuilder sets the builder’s current debug loc to the one attached to the insertion point. IMO this isn't always a correct or convenient thing to do, and it shouldn't be the default behavior.
>
> # Correctness
>
> The IRBuilder shouldn't assume that the debug loc at its insertion point applies to any instructions it emits. It doesn't have enough information to make that connection. This assumption leads to bugs like llvm.org/PR25630. I.e it creates situations in which we silently push around incorrect DILocations without realizing it, because the debug locs in the IR look nice and contiguous.

I agree with your analysis.

>
> To fix the issue, clients of IRBuilder should be required to either:
>
> 1. Explicitly set the correct debug location before inserting instructions, or
> 2. Opt-in to the current behavior of propagating the debug loc from the insertion point.
>
> # Convenience
>
> The current behavior of IRBuilder can be convenient when it's correct. But it can be *really* inconvenient when it's incorrect.
>
> If a client sets an IRBuilder's debug loc correctly and makes a call that changes the builder's insertion point, all generated instructions will pick up a bogus debug loc from the IP. E.g this is what happens with SCEVExpander::expandCodeFor(). There's no way to fix this bug without changing IRBuilder's behavior.
>
> # Solutions
>
> 1. Add a method called SetInsertPointAndDebugLoc() which behaves identically to the current SetInsertPoint(). Change SetInsertPoint() so that it doesn't change the current debug loc.
> 2. Add a boolean argument to SetInsertPoint() called UpdateDbgLoc which defaults to false. Only change the current debug loc if UpdateDbgLoc is true.
>
> I have a mild preference for idea #1 because it's easier to read, IMO.
>
> Whichever solution we pick, my plan is to first migrate all in-tree users of SetInsertPoint() to the explicit API, and to then gradually fix passes which rely on the current incorrect behavior.

Is the behavior of inheriting the debug location of the insertion point ever what we want? I guess I can see how it can be useful when expanding an existing instruction into a sequence, but I wonder if any clients actually correctly use it like that today. If there aren't many uses, we could also add a (DILocation *) argument to SetInsertPoint to force every client to think about how to properly set the location in their use-case.

I also like the idea of renaming the method to make it really obvious what happens here. Could we *gasp* even fix the capitalization?

We should also think about how to map this to the C API.

-- adrian

>
> I'd appreciate your thoughts on this!
>
> thanks,
> vedant

_______________________________________________
LLVM Developers mailing list
llvm...@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Davide Italiano via llvm-dev

unread,

Nov 6, 2017, 5:42:51 PM11/6/17

to Vedant Kumar, llvm-dev

On Mon, Nov 6, 2017 at 2:19 PM, Vedant Kumar <v...@apple.com> wrote:
> Hi everybody,
>
> I'm investigating some correctness issues with debug line table information
> in optimized code. I've noticed something problematic in IRBuilder. Setting
> the insertion point of an IRBuilder sets the builder’s current debug loc to
> the one attached to the insertion point. IMO this isn't always a correct or
> convenient thing to do, and it shouldn't be the default behavior.
>
> # Correctness
>
> The IRBuilder shouldn't assume that the debug loc at its insertion point
> applies to any instructions it emits. It doesn't have enough information to
> make that connection. This assumption leads to bugs like llvm.org/PR25630.
> I.e it creates situations in which we silently push around incorrect
> DILocations without realizing it, because the debug locs in the IR look nice
> and contiguous.
>

I remember that one, and I got to your same conclusion at the time
when analyzing the bug, so thanks for writing this down :)

> To fix the issue, clients of IRBuilder should be required to either:
>
> 1. Explicitly set the correct debug location before inserting instructions,
> or
> 2. Opt-in to the current behavior of propagating the debug loc from the
> insertion point.
>
> # Convenience
>
> The current behavior of IRBuilder can be convenient when it's correct. But
> it can be *really* inconvenient when it's incorrect.
>
> If a client sets an IRBuilder's debug loc correctly and makes a call that
> changes the builder's insertion point, all generated instructions will pick
> up a bogus debug loc from the IP. E.g this is what happens with
> SCEVExpander::expandCodeFor(). There's no way to fix this bug without
> changing IRBuilder's behavior.
>
> # Solutions
>
> 1. Add a method called SetInsertPointAndDebugLoc() which behaves identically
> to the current SetInsertPoint(). Change SetInsertPoint() so that it doesn't
> change the current debug loc.
> 2. Add a boolean argument to SetInsertPoint() called UpdateDbgLoc which
> defaults to false. Only change the current debug loc if UpdateDbgLoc is
> true.
>
> I have a mild preference for idea #1 because it's easier to read, IMO.
>
> Whichever solution we pick, my plan is to first migrate all in-tree users of
> SetInsertPoint() to the explicit API, and to then gradually fix passes which
> rely on the current incorrect behavior.
>

My preference is as well for 1. I kind of dislike adding bool
parameters to API as it becomes easier to get them wrong.
Your plan makes sense to me, regardless. I'll be happy to review the patches.

Thanks!

--
Davide

Vedant Kumar via llvm-dev

unread,

Nov 6, 2017, 5:50:24 PM11/6/17

to Adrian Prantl, llvm-dev

> On Nov 6, 2017, at 2:32 PM, Adrian Prantl <apr...@apple.com> wrote:
>
>
>> On Nov 6, 2017, at 2:19 PM, Vedant Kumar <v...@apple.com> wrote:
>>
>> Hi everybody,
>>
>> I'm investigating some correctness issues with debug line table information in optimized code. I've noticed something problematic in IRBuilder. Setting the insertion point of an IRBuilder sets the builder’s current debug loc to the one attached to the insertion point. IMO this isn't always a correct or convenient thing to do, and it shouldn't be the default behavior.
>>
>> # Correctness
>>
>> The IRBuilder shouldn't assume that the debug loc at its insertion point applies to any instructions it emits. It doesn't have enough information to make that connection. This assumption leads to bugs like llvm.org/PR25630. I.e it creates situations in which we silently push around incorrect DILocations without realizing it, because the debug locs in the IR look nice and contiguous.
>
> I agree with your analysis.
>
>>
>> To fix the issue, clients of IRBuilder should be required to either:
>>
>> 1. Explicitly set the correct debug location before inserting instructions, or
>> 2. Opt-in to the current behavior of propagating the debug loc from the insertion point.
>>
>> # Convenience
>>
>> The current behavior of IRBuilder can be convenient when it's correct. But it can be *really* inconvenient when it's incorrect.
>>
>> If a client sets an IRBuilder's debug loc correctly and makes a call that changes the builder's insertion point, all generated instructions will pick up a bogus debug loc from the IP. E.g this is what happens with SCEVExpander::expandCodeFor(). There's no way to fix this bug without changing IRBuilder's behavior.
>>
>> # Solutions
>>
>> 1. Add a method called SetInsertPointAndDebugLoc() which behaves identically to the current SetInsertPoint(). Change SetInsertPoint() so that it doesn't change the current debug loc.
>> 2. Add a boolean argument to SetInsertPoint() called UpdateDbgLoc which defaults to false. Only change the current debug loc if UpdateDbgLoc is true.
>>
>> I have a mild preference for idea #1 because it's easier to read, IMO.
>>
>> Whichever solution we pick, my plan is to first migrate all in-tree users of SetInsertPoint() to the explicit API, and to then gradually fix passes which rely on the current incorrect behavior.
>
> Is the behavior of inheriting the debug location of the insertion point ever what we want? I guess I can see how it can be useful when expanding an existing instruction into a sequence, but I wonder if any clients actually correctly use it like that today. If there aren't many uses, we could also add a (DILocation *) argument to SetInsertPoint to force every client to think about how to properly set the location in their use-case.

There's at least one correct use of the current SetInsertPoint() API in speculateSelectInstLoads() :). As you predicted, it expands an existing instruction into a sequence.

I'm not sure how many correct uses of the API we have. There are over 300 uses in just {llvm,clang}/lib. I think that prevents us from adding a mandatory argument to SetInsertPoint(): we wouldn't be able to promptly fix each use of the API.

OTOH, if we added a method like SetInsertPointAndDebugLoc(), we could incrementally eliminate as many uses of that method as possible.

> I also like the idea of renaming the method to make it really obvious what happens here. Could we *gasp* even fix the capitalization?

Sure :).

> We should also think about how to map this to the C API.

Wdyt of exposing SetInsertPointAndDebugLoc() via the C API, with a note explaining that it will eventually be deprecated?

vedant

Adrian Prantl via llvm-dev

unread,

Nov 6, 2017, 5:58:26 PM11/6/17

to Vedant Kumar, llvm-dev

> On Nov 6, 2017, at 2:50 PM, Vedant Kumar <v...@apple.com> wrote:
>
>>
>> On Nov 6, 2017, at 2:32 PM, Adrian Prantl <apr...@apple.com> wrote:
>>
>>
>>> On Nov 6, 2017, at 2:19 PM, Vedant Kumar <v...@apple.com> wrote:
>>>
>>> Hi everybody,
>>>
>>> I'm investigating some correctness issues with debug line table information in optimized code. I've noticed something problematic in IRBuilder. Setting the insertion point of an IRBuilder sets the builder’s current debug loc to the one attached to the insertion point. IMO this isn't always a correct or convenient thing to do, and it shouldn't be the default behavior.
>>>
>>> # Correctness
>>>
>>> The IRBuilder shouldn't assume that the debug loc at its insertion point applies to any instructions it emits. It doesn't have enough information to make that connection. This assumption leads to bugs like llvm.org/PR25630. I.e it creates situations in which we silently push around incorrect DILocations without realizing it, because the debug locs in the IR look nice and contiguous.
>>
>> I agree with your analysis.
>>
>>>
>>> To fix the issue, clients of IRBuilder should be required to either:
>>>
>>> 1. Explicitly set the correct debug location before inserting instructions, or
>>> 2. Opt-in to the current behavior of propagating the debug loc from the insertion point.
>>>
>>> # Convenience
>>>
>>> The current behavior of IRBuilder can be convenient when it's correct. But it can be *really* inconvenient when it's incorrect.
>>>
>>> If a client sets an IRBuilder's debug loc correctly and makes a call that changes the builder's insertion point, all generated instructions will pick up a bogus debug loc from the IP. E.g this is what happens with SCEVExpander::expandCodeFor(). There's no way to fix this bug without changing IRBuilder's behavior.
>>>
>>> # Solutions
>>>
>>> 1. Add a method called SetInsertPointAndDebugLoc() which behaves identically to the current SetInsertPoint(). Change SetInsertPoint() so that it doesn't change the current debug loc.
>>> 2. Add a boolean argument to SetInsertPoint() called UpdateDbgLoc which defaults to false. Only change the current debug loc if UpdateDbgLoc is true.
>>>
>>> I have a mild preference for idea #1 because it's easier to read, IMO.
>>>
>>> Whichever solution we pick, my plan is to first migrate all in-tree users of SetInsertPoint() to the explicit API, and to then gradually fix passes which rely on the current incorrect behavior.
>>
>> Is the behavior of inheriting the debug location of the insertion point ever what we want? I guess I can see how it can be useful when expanding an existing instruction into a sequence, but I wonder if any clients actually correctly use it like that today. If there aren't many uses, we could also add a (DILocation *) argument to SetInsertPoint to force every client to think about how to properly set the location in their use-case.
>
> There's at least one correct use of the current SetInsertPoint() API in speculateSelectInstLoads() :). As you predicted, it expands an existing instruction into a sequence.
>
> I'm not sure how many correct uses of the API we have. There are over 300 uses in just {llvm,clang}/lib. I think that prevents us from adding a mandatory argument to SetInsertPoint(): we wouldn't be able to promptly fix each use of the API.

I thought that your proposal was to add SetInsertPointAndDebugLoc() *and* convert all existing users over at the same time, and then audit them for correctness one-by-one (and potentially move them over to the new SetInsertPoint()). I don't see how this is cheaper than adding an extra parameter to SetInsertPoint(). The auditing needs to be done either way. Am I misunderstanding your proposal?

-- adrian

Vedant Kumar via llvm-dev

unread,

Nov 6, 2017, 6:32:53 PM11/6/17

to Adrian Prantl, llvm-dev

On Nov 6, 2017, at 2:58 PM, Adrian Prantl <apr...@apple.com> wrote:

On Nov 6, 2017, at 2:50 PM, Vedant Kumar <v...@apple.com> wrote:

On Nov 6, 2017, at 2:32 PM, Adrian Prantl <apr...@apple.com> wrote:

On Nov 6, 2017, at 2:19 PM, Vedant Kumar <v...@apple.com> wrote:

Hi everybody,

I'm investigating some correctness issues with debug line table information in optimized code. I've noticed something problematic in IRBuilder. Setting the insertion point of an IRBuilder sets the builder’s current debug loc to the one attached to the insertion point. IMO this isn't always a correct or convenient thing to do, and it shouldn't be the default behavior.

# Correctness

The IRBuilder shouldn't assume that the debug loc at its insertion point applies to any instructions it emits. It doesn't have enough information to make that connection. This assumption leads to bugs like llvm.org/PR25630. I.e it creates situations in which we silently push around incorrect DILocations without realizing it, because the debug locs in the IR look nice and contiguous.

I agree with your analysis.

To fix the issue, clients of IRBuilder should be required to either:

1. Explicitly set the correct debug location before inserting instructions, or
2. Opt-in to the current behavior of propagating the debug loc from the insertion point.

# Convenience

The current behavior of IRBuilder can be convenient when it's correct. But it can be *really* inconvenient when it's incorrect.

If a client sets an IRBuilder's debug loc correctly and makes a call that changes the builder's insertion point, all generated instructions will pick up a bogus debug loc from the IP. E.g this is what happens with SCEVExpander::expandCodeFor(). There's no way to fix this bug without changing IRBuilder's behavior.

# Solutions

1. Add a method called SetInsertPointAndDebugLoc() which behaves identically to the current SetInsertPoint(). Change SetInsertPoint() so that it doesn't change the current debug loc.
2. Add a boolean argument to SetInsertPoint() called UpdateDbgLoc which defaults to false. Only change the current debug loc if UpdateDbgLoc is true.

I have a mild preference for idea #1 because it's easier to read, IMO.

Whichever solution we pick, my plan is to first migrate all in-tree users of SetInsertPoint() to the explicit API, and to then gradually fix passes which rely on the current incorrect behavior.

Is the behavior of inheriting the debug location of the insertion point ever what we want? I guess I can see how it can be useful when expanding an existing instruction into a sequence, but I wonder if any clients actually correctly use it like that today. If there aren't many uses, we could also add a (DILocation *) argument to SetInsertPoint to force every client to think about how to properly set the location in their use-case.

There's at least one correct use of the current SetInsertPoint() API in speculateSelectInstLoads() :). As you predicted, it expands an existing instruction into a sequence.

I'm not sure how many correct uses of the API we have. There are over 300 uses in just {llvm,clang}/lib. I think that prevents us from adding a mandatory argument to SetInsertPoint(): we wouldn't be able to promptly fix each use of the API.

I thought that your proposal was to add SetInsertPointAndDebugLoc() *and* convert all existing users over at the same time, and then audit them for correctness one-by-one (and potentially move them over to the new SetInsertPoint()).

Yes.

I don't see how this is cheaper than adding an extra parameter to SetInsertPoint(). The auditing needs to be done either way. Am I misunderstanding your proposal?

Sorry, I actually think I misunderstood your suggestion.

Did you mean that we should add an extra parameter to SetInsertPoint(), convert all in-tree users so they call SetInsertPoint(IP, IP->getDebugLoc()), and then audit all uses of the API? If so, I think that's workable, but it presents a difficult migration problem. There's an overload of SetInsertPoint which accepts a BasicBlock::iterator, so we'd have to detect that and write this in-line:

IRB.SetInsertPoint(BB, IP, (IP != BB->end()) ? IP->getDebugLoc() : IBR.getCurrentDebugLocation())

I do like that this approach forces clients of IRBuilder to be intentional about setting debug locs, but it looks like it could get messy.

Wdyt of adopting parts of both solutions? I.e, we'd have the following methods:

// Transition API

- SetInsertPointAndDebugLoc(BasicBlock *)

- SetInsertPointAndDebugLoc(Instruction *)

- SetInsertPointAndDebugLoc(BasicBlock *, BasicBlock::iterator)

// Future API

- SetInsertPoint(BasicBlock *, const DebugLoc &)

- SetInsertPoint(Instruction *, const DebugLoc &)

- SetInsertPoint(BasicBlock *, BasicBlock::iterator, const DebugLoc &)

We'd make this work by renaming SetInsertPoint to SetInsertPointAndDebugLoc and marking it as deprecated. This would make the transition easy (run sed in a loop), we'd get the future API we want, and it becomes easy to find unaudited API calls.

vedant

Tobias Edler von Koch via llvm-dev

unread,

Nov 8, 2017, 12:29:20 PM11/8/17

to Vedant Kumar, Adrian Prantl, llvm-dev

There's also the opposite problem of the IRBuilder sometimes not being
able to find a debug location at the insertion point when it is required
to add one (e.g. when inserting a CallInst into a function that has
debug info)... another argument for making it explicit!

On 11/06/2017 05:32 PM, Vedant Kumar via llvm-dev wrote:
> Wdyt of adopting parts of both solutions? I.e, we'd have the following
> methods:
>
> // Transition API
> - SetInsertPointAndDebugLoc(BasicBlock *)
> - SetInsertPointAndDebugLoc(Instruction *)
> - SetInsertPointAndDebugLoc(BasicBlock *, BasicBlock::iterator)
>
> // Future API
> - SetInsertPoint(BasicBlock *, const DebugLoc &)
> - SetInsertPoint(Instruction *, const DebugLoc &)
> - SetInsertPoint(BasicBlock *, BasicBlock::iterator, const DebugLoc &)

+1 to this solution. We shouldn't silently change the behavior of the
existing SetInsertPoint API.

Thanks,
Tobias

--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project.

Vedant Kumar via llvm-dev

unread,

Nov 9, 2017, 1:54:40 PM11/9/17

to Tobias Edler von Koch, llvm-dev

> On Nov 8, 2017, at 9:29 AM, Tobias Edler von Koch <tob...@codeaurora.org> wrote:
>
> There's also the opposite problem of the IRBuilder sometimes not being able to find a debug location at the insertion point when it is required to add one (e.g. when inserting a CallInst into a function that has debug info)... another argument for making it explicit!

Right. It's not too onerous to type "setInsertPoint(..., DebugLoc())" if the location really is unknown, and it's a lot more explicit.

> On 11/06/2017 05:32 PM, Vedant Kumar via llvm-dev wrote:
>> Wdyt of adopting parts of both solutions? I.e, we'd have the following methods:
>>
>> // Transition API
>> - SetInsertPointAndDebugLoc(BasicBlock *)
>> - SetInsertPointAndDebugLoc(Instruction *)
>> - SetInsertPointAndDebugLoc(BasicBlock *, BasicBlock::iterator)
>>
>> // Future API
>> - SetInsertPoint(BasicBlock *, const DebugLoc &)
>> - SetInsertPoint(Instruction *, const DebugLoc &)
>> - SetInsertPoint(BasicBlock *, BasicBlock::iterator, const DebugLoc &)
>
> +1 to this solution. We shouldn't silently change the behavior of the existing SetInsertPoint API.

Good to hear. I think it would be even simpler if we skip renaming the existing users, and just added the future APIs. I'll send out a patch.