[llvm-dev] Debug Locations for Optimized Code

David Blaikie via llvm-dev

unread,

Dec 6, 2016, 11:39:09 PM12/6/16

to llvm-dev, Kostya Serebryany

Hi Kostya,

So... a bunch of folks (not all CC'd - feel free to add anyone who seems relevant, I haven't gone back over the commits to check who's made what changes) have been making improvements to optimized code debug information - much of it around making sure sample profiles are accurate.

This mostly entails removing debug locations from instructions that are moved between basic blocks.

The justification for this has usually been backed up by "it'll make users debug experience better too" because the debugger won't jump around so much, or give users misleading situations like "why am I stepped into a line of code in the 'if' block when the condition was clearly false" (because we raised a load that was done in both blocks up to before the branch and had to pick one of the two locations for it)

This may lead to reduction in quality of debug info for other uses - in particular in ASan and MSan at least.

ASan and MSan care a lot about loads and stores and the location where those came from - less so about whether the load or store was sunk into or out of a loop, etc. Dropping the locations entirely means the user has far less information about which load or store caused the invalid memory access. (The 'if' case is still tricky and would be confusing to a *San user as it was to a debug user.)

I don't know what the right, if any, solution to this is - but I thought I should bring it up in case you or anyone else wanted to puzzle it over & see if the competing needs/desires might need to be considered.

- Dave

Hal Finkel via llvm-dev

unread,

Dec 7, 2016, 6:42:09 AM12/7/16

to David Blaikie, llvm-dev

From: "David Blaikie via llvm-dev" <llvm...@lists.llvm.org>
To: "llvm-dev" <llvm...@lists.llvm.org>, "Kostya Serebryany" <k...@google.com>
Sent: Tuesday, December 6, 2016 10:38:50 PM
Subject: [llvm-dev] Debug Locations for Optimized Code

Hi Kostya,

So... a bunch of folks (not all CC'd - feel free to add anyone who seems relevant, I haven't gone back over the commits to check who's made what changes) have been making improvements to optimized code debug information - much of it around making sure sample profiles are accurate.

This mostly entails removing debug locations from instructions that are moved between basic blocks.

The justification for this has usually been backed up by "it'll make users debug experience better too" because the debugger won't jump around so much, or give users misleading situations like "why am I stepped into a line of code in the 'if' block when the condition was clearly false" (because we raised a load that was done in both blocks up to before the branch and had to pick one of the two locations for it)

This may lead to reduction in quality of debug info for other uses - in particular in ASan and MSan at least.

ASan and MSan care a lot about loads and stores and the location where those came from - less so about whether the load or store was sunk into or out of a loop, etc. Dropping the locations entirely means the user has far less information about which load or store caused the invalid memory access. (The 'if' case is still tricky and would be confusing to a *San user as it was to a debug user.)

I think that it is always potentially confusing: Having ASAN complain about a memory access from a loop that the code hasn't entered yet or has already completed is also difficult to decipher.

I don't know what the right, if any, solution to this is - but I thought I should bring it up in case you or anyone else wanted to puzzle it over & see if the competing needs/desires might need to be considered.

One thing that I recall being discussed was changing the way that we set the is_stmt flag in the DWARF line-table information. As I understand it, we currently set this flag for the first instruction in any sequence that is on the same line. This is, in part, why the debugger appears to jump around when stepping through code with speculated instructions, etc. If we did not do this for out-of-place instructions, then we might be able to keep for debugging information for tools while still providing a reasonable debugging experience.

-Hal

- Dave

_______________________________________________
LLVM Developers mailing list
llvm...@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

--

Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory

Robinson, Paul via llvm-dev

unread,

Dec 7, 2016, 10:39:31 AM12/7/16

to Hal Finkel, David Blaikie, llvm...@lists.llvm.org

>> I don't know what the right, if any, solution to this is - but I
>> thought I should bring it up in case you or anyone else wanted to
>> puzzle it over & see if the competing needs/desires might need to be
>> considered.
> One thing that I recall being discussed was changing the way that we
> set the is_stmt flag in the DWARF line-table information. As I
> understand it, we currently set this flag for the first instruction in
> any sequence that is on the same line. This is, in part, why the
> debugger appears to jump around when stepping through code with
> speculated instructions, etc. If we did not do this for out-of-place
> instructions, then we might be able to keep for debugging information
> for tools while still providing a reasonable debugging experience.

When we are looking at a situation where an instruction is merely *moved*
from one place to another, retaining the source location and having a
less naïve statement-marking tactic could help the debugging experience
without perturbing other consumers (although one still wonders whether
profiles will get messed up in cases where e.g. a loop invariant gets
hoisted out of a cold loop into a hot predecessor).

When we are looking at a situation where two instructions are *merged* or
*combined* into one, and the original two instructions had different
source locations, that's a separate problem. In that case there is no
single correct source location for the new instruction, and typically
erasing the source location will give a better debugging experience (also
a less misleading profile).

My personal opinion is that having sanitizers *rely* on debug info for
accurate source attribution is just asking for trouble. It happens to
work at –O0 but cannot be considered reliable in the face of optimization.
IMO this is a fundamental design flaw; debug info is best-effort and full
of ambiguities, as shown above. Sanitizers need a more reliable
source-of-truth, i.e. they should encode source info into their own
instrumentation.

--paulr

Reid Kleckner via llvm-dev

unread,

Dec 7, 2016, 11:23:00 AM12/7/16

to Robinson, Paul, llvm...@lists.llvm.org

On Wed, Dec 7, 2016 at 7:39 AM, Robinson, Paul via llvm-dev <llvm...@lists.llvm.org> wrote:

When we are looking at a situation where an instruction is merely *moved*
from one place to another, retaining the source location and having a
less naïve statement-marking tactic could help the debugging experience
without perturbing other consumers (although one still wonders whether
profiles will get messed up in cases where e.g. a loop invariant gets
hoisted out of a cold loop into a hot predecessor).

When we are looking at a situation where two instructions are *merged* or
*combined* into one, and the original two instructions had different
source locations, that's a separate problem. In that case there is no
single correct source location for the new instruction, and typically
erasing the source location will give a better debugging experience (also
a less misleading profile).

My personal opinion is that having sanitizers *rely* on debug info for
accurate source attribution is just asking for trouble. It happens to
work at –O0 but cannot be considered reliable in the face of optimization.
IMO this is a fundamental design flaw; debug info is best-effort and full
of ambiguities, as shown above. Sanitizers need a more reliable
source-of-truth, i.e. they should encode source info into their own
instrumentation.

I don't see how ASan and debuggers are different. It feels like both need reasonably accurate source location attribution for any instruction. ASan just happens to care more about loads and stores than interactive stepping debuggers.

It really doesn't make sense for ASan to invent another mechanism to track source location information. Any mechanism we build would be so redundant with debug info that, as an implementation detail, we would find a way to make them use the same storage when possible. With that in mind, maybe we should really find a way to mark source locations as "hoisted" or "sunk" so that we can suppress them from our line tables or do something else clever.

Robinson, Paul via llvm-dev

unread,

Dec 7, 2016, 12:02:13 PM12/7/16

to Reid Kleckner, llvm...@lists.llvm.org

I don't see how ASan and debuggers are different. It feels like both need reasonably accurate source location attribution for any instruction. ASan just happens to care more about loads and stores than interactive stepping debuggers.

Actually they are pretty different in their requirements.

ASan cares about *accurate* source location info for *specific* instructions, the ones that do something ASan cares about. The source attributions for any other instruction is irrelevant to ASan. The source attributions for these instructions *must* survive optimization.

Debuggers care about *useful* source location info for *sets* of instructions, i.e. the instructions related to some particular source statement. If that set is only 90% complete/accurate, instead of 100%, generally that doesn't adversely affect the user experience. If you step past statement A, and happen to execute one or two instructions from the next statement B before you actually stop, generally that is not important to the user. Debuggers are able to tolerate a moderate amount of slop in the source attributions, because absolute accuracy is not critical to correct operation of the debugger. This is why optimizations can get away with dropping attributions that are difficult to represent accurately.

ASan should be able to encode source info for just the instructions it cares about, e.g. pass an index or other encoded representation to the RT calls. Being actual parameters, they will survive any correct optimization, unlike today's situation where multiple calls might be merged by an optimization, damaging the correctness of ASan reports. (We've see this exact thing happen.) ASan does not need a line table mapping all instructions back to their source; it needs a parameter at each call (more or less). It does need a file table, that's the main bit of redundancy with debug info that I see happening.

--paulr

Adrian Prantl via llvm-dev

unread,

Dec 7, 2016, 12:08:13 PM12/7/16

to David Blaikie, llvm-dev

Should it turn out that the needs of instrumentation versus debugging are truly irreconcilable (and I hope they aren't) we could add an instrumentation debugger tuning option as a last resort.

-- adrian

Hal Finkel via llvm-dev

unread,

Dec 7, 2016, 1:19:13 PM12/7/16

to Paul Robinson, llvm...@lists.llvm.org

From: "Paul Robinson" <paul.r...@sony.com>
To: "Reid Kleckner" <r...@google.com>
Cc: "Hal Finkel" <hfi...@anl.gov>, "David Blaikie" <dbla...@gmail.com>, llvm...@lists.llvm.org
Sent: Wednesday, December 7, 2016 11:01:57 AM
Subject: RE: [llvm-dev] Debug Locations for Optimized Code

I don't see how ASan and debuggers are different. It feels like both need reasonably accurate source location attribution for any instruction. ASan just happens to care more about loads and stores than interactive stepping debuggers.

Actually they are pretty different in their requirements.

ASan cares about *accurate* source location info for *specific* instructions, the ones that do something ASan cares about. The source attributions for any other instruction is irrelevant to ASan. The source attributions for these instructions *must* survive optimization.

Debuggers care about *useful* source location info for *sets* of instructions, i.e. the instructions related to some particular source statement. If that set is only 90% complete/accurate, instead of 100%, generally that doesn't adversely affect the user experience. If you step past statement A, and happen to execute one or two instructions from the next statement B before you actually stop, generally that is not important to the user. Debuggers are able to tolerate a moderate amount of slop in the source attributions, because absolute accuracy is not critical to correct operation of the debugger. This is why optimizations can get away with dropping attributions that are difficult to represent accurately.

ASan should be able to encode source info for just the instructions it cares about, e.g. pass an index or other encoded representation to the RT calls. Being actual parameters, they will survive any correct optimization, unlike today's situation where multiple calls might be merged by an optimization, damaging the correctness of ASan reports. (We've see this exact thing happen.) ASan does not need a line table mapping all instructions back to their source; it needs a parameter at each call (more or less). It does need a file table, that's the main bit of redundancy with debug info that I see happening.

I suspect that you misunderstand where ASan instrumentation is added. Unlike UBSan, which is added by Clang during initial IR generation, ASan instrumentation is added late (at the EP_OptimizerLast extension point). I don't see any better way to get the location information at that point than using the existing debug info.

-Hal

Hal Finkel via llvm-dev

unread,

Dec 7, 2016, 1:20:24 PM12/7/16

to Paul Robinson, llvm...@lists.llvm.org

----- Original Message -----
> From: "Paul Robinson" <paul.r...@sony.com>
> To: "Hal Finkel" <hfi...@anl.gov>, "David Blaikie" <dbla...@gmail.com>
> Cc: llvm...@lists.llvm.org
> Sent: Wednesday, December 7, 2016 9:39:16 AM
> Subject: RE: [llvm-dev] Debug Locations for Optimized Code
>

Is there a reason why we must only have one location for every instruction? If not, why not merge them and keep them all?

-Hal

>
> My personal opinion is that having sanitizers *rely* on debug info
> for
> accurate source attribution is just asking for trouble. It happens
> to
> work at –O0 but cannot be considered reliable in the face of
> optimization.
> IMO this is a fundamental design flaw; debug info is best-effort and
> full
> of ambiguities, as shown above. Sanitizers need a more reliable
> source-of-truth, i.e. they should encode source info into their own
> instrumentation.
>
> --paulr
>
>

--

Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory

David Blaikie via llvm-dev

unread,

Dec 7, 2016, 1:22:47 PM12/7/16

to Robinson, Paul, Reid Kleckner, llvm...@lists.llvm.org

On Wed, Dec 7, 2016 at 9:02 AM Robinson, Paul <paul.r...@sony.com> wrote:

I don't see how ASan and debuggers are different. It feels like both need reasonably accurate source location attribution for any instruction. ASan just happens to care more about loads and stores than interactive stepping debuggers.

Actually they are pretty different in their requirements.

I think they're closer than they appear below.

ASan cares about *accurate* source location info for *specific* instructions, the ones that do something ASan cares about. The source attributions for any other instruction is irrelevant to ASan. The source attributions for these instructions *must* survive optimization.

Kostya can correct me if I'm wrong - but I don't believe there's a requirement that the must survive anymore than debug info locations.

I believe the sanitizers run on similar requirements about impact on optimizations - they probably don't want to adversely perturb optimizations by adding a more strict location tracking system that was undroppable (maybe I'm wrong here) like intrinsics. I think this is perhaps the critical point - if ASan has the same "don't mess with optimization" requirement as debug info, and it needs high accuracy, it can be no higher than debug info /can/ be (even if it's not that accurate now). If that's the case, then we should endeavor to make debug info (if only for the instructions ASan cares about) as accurate ASan needs, and that benefits all debug info consumers.

Now, if there's a competing need for what information (as I brought up in this thread) hopefully we can have a conversation about what those competing needs look like - how to address them (if we can reconcile the different needs, or need different tuning mode, etc).

David Blaikie via llvm-dev

unread,

Dec 7, 2016, 1:26:52 PM12/7/16

to Hal Finkel, Paul Robinson, llvm...@lists.llvm.org

Not a requirement - of course we could keep them all with some kind of ordered list and even potentially include a "this is the one we would've picked" info (eg: the first one's the one we would pick today, if we would've picked one rather than none) so we could be backwards compatible if desired.

That would be a lot of engineering work to plumb through LLVM the notion of multiple debug locations, I think.

I'm not sure how DWARF (or CodeView) and its consumers currently copes with multiple locations - it's probably technically possible to describe using the line table format (not sure if it's intentional/documented for that purpose), but existing consumers might have to be fixed not to trip over it.

It'd certainly be cute/fun/nice to have the extra fidelity (though all extra fidelity also comes at a size cost to the IR and the resulting object/executable files).

Not sure anyone's in a position to sign up for that work right now - but maybe someone is. (looks like Apple's making a bit of a push on optimized debug info quality at the moment)

- David

Philip Reames via llvm-dev

unread,

Dec 7, 2016, 1:36:18 PM12/7/16

to Robinson, Paul, Reid Kleckner, llvm...@lists.llvm.org

FYI, if we do end up deciding that asan needs a stronger guarantee than debug info can provide, we have another mechanism in tree which is available for this purpose. Operand bundles can be associated with a callsite today to provide a guaranteed side table of value locations at runtime. We use the "deopt" bundle type for exactly this purpose and it's explicitly part of the design to be stronger than debug info and accept the performance impact that implies while trying to minimize it as much as possible. We might have to extend the notion of operand bundles to other instruction types, but the fundamental mechanism is already in the IR.

Philip

Robinson, Paul via llvm-dev

unread,

Dec 7, 2016, 4:03:53 PM12/7/16

to Hal Finkel, llvm...@lists.llvm.org

I suspect that you misunderstand where ASan instrumentation is added. Unlike UBSan, which is added by Clang during initial IR generation, ASan instrumentation is added late (at the EP_OptimizerLast extension point).

You are correct, I did not know that.

I don't see any better way to get the location information at that point than using the existing debug info.

Let us distinguish between information carried around in metadata, and information emitted to the object file.

Seems like it would be completely feasible to flag a DebugLoc instance as "do not emit" and yet retain it in the metadata, rather than erasing it (overwriting it with DebugLoc()). Then the ASan instrumentation can extract the file/line info from it, while we still decline to generate it in the DWARF line table. ASan gets the info it needs, the debugger doesn't get the information that will distress the user.

Another possibility is to flag the DebugLoc for a moved/combined instruction as "not a statement" and cause this flag to suppress the DWARF "is_stmt" flag. That idea would need a little more baking but feels like it could have potential. Some debuggers might have to learn to pay attention to the flag, but that's the debugger's problem.

--paulr

Robinson, Paul via llvm-dev

unread,

Dec 7, 2016, 4:11:39 PM12/7/16

to David Blaikie, Hal Finkel, llvm...@lists.llvm.org

Is there a reason why we must only have one location for every instruction? If not, why not merge them and keep them all?

Not a requirement - of course we could keep them all with some kind of ordered list and even potentially include a "this is the one we would've picked" info (eg: the first one's the one we would pick today, if we would've picked one rather than none) so we could be backwards compatible if desired.

That would be a lot of engineering work to plumb through LLVM the notion of multiple debug locations, I think.

I'm not sure how DWARF (or CodeView) and its consumers currently copes with multiple locations - it's probably technically possible to describe using the line table format (not sure if it's intentional/documented for that purpose), but existing consumers might have to be fixed not to trip over it.

Technically the DWARF encoding of the line table does allow it, I've seen it happen, but not with the intent of describing two real source locations; it was by accident. (And was one of the things that prompted me to submit patch D27492.) I seriously doubt any DWARF consumer takes the trouble to look for it. It's really not clear how a debugger *should* respond to seeing two source locations for one instruction.

--paulr

Kostya Serebryany via llvm-dev

unread,

Dec 7, 2016, 4:44:48 PM12/7/16

to Robinson, Paul, llvm...@lists.llvm.org

my 2c.

the sanitizers rely on debug info to produce human-readable error messages,

and I agree with Reid that it's unwise to have a parallel way of encoding the source locations.

Well, we have something like this in the clang coverage already... Right?

(I never particularly liked this design decision).

But since the debug info is known to be unreliable it kind of made sense. Grrr.

And since the coverage instrumentation is applied early (in clang) we can do it.

asan/etc don't have this luxury.

The sanitizers do not actually rely hard on the correctness of debug info,

but lots of tests in compiler-rt expect the debug info to be sane.

If we break debug info in a way that affects the sanitizers two things may happen:

a) some of the existing *san tests in compiler-rt will start failing. That's usually easy to fix.

b) all tests will continue working but users will be getting less readable reports -- and we will learn about it 6 months from the time of breakage.

That's less welcome, but I am not sure if we can do something here.

--kcc

Evgenii Stepanov via llvm-dev

unread,

Dec 7, 2016, 7:51:21 PM12/7/16

to Kostya Serebryany, llvm...@lists.llvm.org

In fact, we are already using "parallel" debug info. Frame layout
descriptions encode line numbers for local variable declarations. They
don't include file names to keep object size under control, and we
can't really afford to add more duplicate debug info.

Robinson, Paul via llvm-dev

unread,

Dec 7, 2016, 8:52:12 PM12/7/16

to Evgenii Stepanov, Kostya Serebryany, llvm...@lists.llvm.org

> -----Original Message-----
> From: Evgenii Stepanov [mailto:eugeni....@gmail.com]
> Sent: Wednesday, December 07, 2016 4:51 PM
> To: Kostya Serebryany
> Cc: Robinson, Paul; llvm...@lists.llvm.org
> Subject: Re: [llvm-dev] Debug Locations for Optimized Code
>

> In fact, we are already using "parallel" debug info. Frame layout
> descriptions encode line numbers for local variable declarations. They
> don't include file names to keep object size under control, and we
> can't really afford to add more duplicate debug info.
>
> On Wed, Dec 7, 2016 at 1:44 PM, Kostya Serebryany via llvm-dev
> <llvm...@lists.llvm.org> wrote:
> > my 2c.
> >
> > the sanitizers rely on debug info to produce human-readable error
> messages,

If the sanitizers already rely on the debug info line table, which
already has a file table, why can't frame layout descriptions use
that as well?
--paulr

Kostya Serebryany via llvm-dev

unread,

Dec 8, 2016, 12:35:13 PM12/8/16

to Robinson, Paul, llvm...@lists.llvm.org

On Wed, Dec 7, 2016 at 5:49 PM, Robinson, Paul <paul.r...@sony.com> wrote:

> -----Original Message-----
> From: Evgenii Stepanov [mailto:eugeni.stepanov@gmail.com]
> Sent: Wednesday, December 07, 2016 4:51 PM
> To: Kostya Serebryany
> Cc: Robinson, Paul; llvm...@lists.llvm.org
> Subject: Re: [llvm-dev] Debug Locations for Optimized Code
>
> In fact, we are already using "parallel" debug info. Frame layout
> descriptions encode line numbers for local variable declarations. They
> don't include file names to keep object size under control, and we
> can't really afford to add more duplicate debug info.
>
> On Wed, Dec 7, 2016 at 1:44 PM, Kostya Serebryany via llvm-dev
> <llvm...@lists.llvm.org> wrote:
> > my 2c.
> >
> > the sanitizers rely on debug info to produce human-readable error
> messages,

If the sanitizers already rely on the debug info line table, which
already has a file table, why can't frame layout descriptions use
that as well?

the frame layout descriptions contain names of the local variables, which are not part of line tables.

--paulr

Robinson, Paul via llvm-dev

unread,

Dec 8, 2016, 1:37:37 PM12/8/16

to Kostya Serebryany, llvm...@lists.llvm.org

If the sanitizers already rely on the debug info line table, which
already has a file table, why can't frame layout descriptions use
that as well?

the frame layout descriptions contain names of the local variables, which are not part of line tables.

Sorry, I was not clear, but I was actually responding to this statement:

> Frame layout
> descriptions encode line numbers for local variable declarations. They
> don't include file names to keep object size under control, and we
> can't really afford to add more duplicate debug info.

This suggests that omitting file info from frame layout descriptions is a problem. But surely whoever is reading the descriptions can get file info from the line table, so there is no need to duplicate that info—in short, why is this a problem in frame layouts?

I assume you build your own frame layout descriptions because you want to work with –gmlt info instead of requiring full –g info? If you have full –g then the variable info is in the .debug_info section already.

--paulr

From: Kostya Serebryany [mailto:k...@google.com]

Sent: Thursday, December 08, 2016 9:35 AM
To: Robinson, Paul

Cc: Evgenii Stepanov; llvm...@lists.llvm.org
Subject: Re: [llvm-dev] Debug Locations for Optimized Code

On Wed, Dec 7, 2016 at 5:49 PM, Robinson, Paul <paul.r...@sony.com> wrote:

Kostya Serebryany via llvm-dev

unread,

Dec 8, 2016, 1:55:18 PM12/8/16

to Robinson, Paul, llvm...@lists.llvm.org

On Thu, Dec 8, 2016 at 10:37 AM, Robinson, Paul <paul.r...@sony.com> wrote:

If the sanitizers already rely on the debug info line table, which
already has a file table, why can't frame layout descriptions use
that as well?

the frame layout descriptions contain names of the local variables, which are not part of line tables.

Sorry, I was not clear, but I was actually responding to this statement:

> Frame layout
> descriptions encode line numbers for local variable declarations. They
> don't include file names to keep object size under control, and we
> can't really afford to add more duplicate debug info.

This suggests that omitting file info from frame layout descriptions is a problem. But surely whoever is reading the descriptions can get file info from the line table, so there is no need to duplicate that info—in short, why is this a problem in frame layouts?

I assume you build your own frame layout descriptions because you want to work with –gmlt info instead of requiring full –g info? If you have full –g then the variable info is in the .debug_info section already.

Yes. Most of the *big* asan users can't use -g due to enormous code size and thus fall back to -gmlt.

So, asan should work well with gmlt

Eric Christopher via llvm-dev

unread,

Dec 9, 2016, 9:02:08 PM12/9/16

to Adrian Prantl, David Blaikie, llvm-dev

Ultimately I agree with Adrian here. I'd also need a compelling argument that not only are these things irreconcilable, but that there's nothing we can do as far as information or even dwarf changes to ameliorate the problems.

Thanks.

-eric

Andrea Di Biagio via llvm-dev

unread,

Dec 15, 2016, 10:05:11 AM12/15/16

to Robinson, Paul, llvm...@lists.llvm.org

On Wed, Dec 7, 2016 at 3:39 PM, Robinson, Paul via llvm-dev <llvm...@lists.llvm.org> wrote:

>> I don't know what the right, if any, solution to this is - but I
>> thought I should bring it up in case you or anyone else wanted to
>> puzzle it over & see if the competing needs/desires might need to be
>> considered.
> One thing that I recall being discussed was changing the way that we
> set the is_stmt flag in the DWARF line-table information. As I
> understand it, we currently set this flag for the first instruction in
> any sequence that is on the same line. This is, in part, why the
> debugger appears to jump around when stepping through code with
> speculated instructions, etc. If we did not do this for out-of-place
> instructions, then we might be able to keep for debugging information
> for tools while still providing a reasonable debugging experience.

When we are looking at a situation where an instruction is merely *moved*
from one place to another, retaining the source location and having a
less naïve statement-marking tactic could help the debugging experience
without perturbing other consumers (although one still wonders whether
profiles will get messed up in cases where e.g. a loop invariant gets
hoisted out of a cold loop into a hot predecessor).

It would be nice to have a way to mark the debugloc of an instruction which has been hoisted or sunk. Marked locations would not be treated as "reccomended breakpoint locations". That means, we could take advantage of that bit of information to decide how to emit the 'is_stmt' flag.

For the purpose of sample pgo, is_stmt=0 would not be enough. So, it would be nice if we could encode that information within the discriminator. For example, we could emit a 'special' discriminator to notify consumers (example: autofdo) that the location should not be used for the purpose of pgo. This is obviously just an idea; not sure whether it makes sense, nor if it would negatively affect this RFC: http://lists.llvm.org/pipermail/llvm-dev/2016-October/106532.html.

When we are looking at a situation where two instructions are *merged* or
*combined* into one, and the original two instructions had different
source locations, that's a separate problem. In that case there is no
single correct source location for the new instruction, and typically
erasing the source location will give a better debugging experience (also
a less misleading profile).

For the record: revision 289661 (http://llvm.org/viewvc/llvm-project?view=revision&revision=289661) added a new API to obtain a merged/combined location from multiple source locations. That said, the new functionality added at that revision is just a stub. So, it is still unclear at the moment how we could 'use' that in future.

-Andrea

Hal Finkel via llvm-dev

unread,

Dec 15, 2016, 10:32:04 AM12/15/16

to Andrea Di Biagio, llvm...@lists.llvm.org

From: "Andrea Di Biagio" <andrea....@gmail.com>
To: "Paul Robinson" <paul.r...@sony.com>
Cc: "Hal Finkel" <hfi...@anl.gov>, "David Blaikie" <dbla...@gmail.com>, llvm...@lists.llvm.org
Sent: Thursday, December 15, 2016 9:05:00 AM
Subject: Re: [llvm-dev] Debug Locations for Optimized Code

On Wed, Dec 7, 2016 at 3:39 PM, Robinson, Paul via llvm-dev <llvm...@lists.llvm.org> wrote:
>> I don't know what the right, if any, solution to this is - but I
>> thought I should bring it up in case you or anyone else wanted to
>> puzzle it over & see if the competing needs/desires might need to be
>> considered.
> One thing that I recall being discussed was changing the way that we
> set the is_stmt flag in the DWARF line-table information. As I
> understand it, we currently set this flag for the first instruction in
> any sequence that is on the same line. This is, in part, why the
> debugger appears to jump around when stepping through code with
> speculated instructions, etc. If we did not do this for out-of-place
> instructions, then we might be able to keep for debugging information
> for tools while still providing a reasonable debugging experience.

When we are looking at a situation where an instruction is merely *moved*
from one place to another, retaining the source location and having a
less naïve statement-marking tactic could help the debugging experience
without perturbing other consumers (although one still wonders whether
profiles will get messed up in cases where e.g. a loop invariant gets
hoisted out of a cold loop into a hot predecessor).

It would be nice to have a way to mark the debugloc of an instruction which has been hoisted or sunk. Marked locations would not be treated as "reccomended breakpoint locations". That means, we could take advantage of that bit of information to decide how to emit the 'is_stmt' flag.
For the purpose of sample pgo, is_stmt=0 would not be enough. So, it would be nice if we could encode that information within the discriminator. For example, we could emit a 'special' discriminator to notify consumers (example: autofdo) that the location should not be used for the purpose of pgo. This is obviously just an idea; not sure whether it makes sense, nor if it would negatively affect this RFC: http://lists.llvm.org/pipermail/llvm-dev/2016-October/106532.html.

I think that you could use that scheme: just encode a duplication factor of zero. Dehao, is that correct?

-Hal

When we are looking at a situation where two instructions are *merged* or
*combined* into one, and the original two instructions had different
source locations, that's a separate problem. In that case there is no
single correct source location for the new instruction, and typically
erasing the source location will give a better debugging experience (also
a less misleading profile).

For the record: revision 289661 (http://llvm.org/viewvc/llvm-project?view=revision&revision=289661) added a new API to obtain a merged/combined location from multiple source locations. That said, the new functionality added at that revision is just a stub. So, it is still unclear at the moment how we could 'use' that in future.

-Andrea

Andrea Di Biagio via llvm-dev

unread,

Dec 15, 2016, 1:56:06 PM12/15/16

to Hal Finkel, llvm-dev

If that is correct, and we can really use the zero factor as a special case, then that would be great!
It would solve the pgo use case (at least for hoisted/sunk instructions) at the cost of emitting extra discriminators, and it would probably make people happier because we would not need to remove debug locations anymore.

- Andrea

Dehao Chen via llvm-dev

unread,

Dec 15, 2016, 5:36:37 PM12/15/16

to Hal Finkel, llvm-dev

That's an interesting idea! Yes, in theory this should work. But in the current implementation "under review", we cannot distinguish between "duplication factor=0" and "no duplication factor". I'll need to update the encoding to make duplication_factor=0 explicit.

Dehao

-Hal

When we are looking at a situation where two instructions are *merged* or
*combined* into one, and the original two instructions had different
source locations, that's a separate problem. In that case there is no
single correct source location for the new instruction, and typically
erasing the source location will give a better debugging experience (also
a less misleading profile).

For the record: revision 289661 (http://llvm.org/viewvc/llvm-project?view=revision&revision=289661) added a new API to obtain a merged/combined location from multiple source locations. That said, the new functionality added at that revision is just a stub. So, it is still unclear at the moment how we could 'use' that in future.

-Andrea

--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory

Reply all

Reply to author

Forward