[llvm-dev] Code coverage BoF - notes and updates

Vedant Kumar via llvm-dev

unread,

Oct 24, 2017, 4:24:44 PM10/24/17

to llvm-dev

Hello,

Our goals for the code coverage BoF (10/19) were to find areas where we can improve the coverage tooling, and to learn more about how coverage is used. I'd like to thank all of the attendees for their input and for making the BoF productive. Special thanks to Mandeep Grang, who volunteered as a mic runner at the last minute.

In this email I'll share my (rough) notes and outline some future plans. Please feel free to ask for clarifications or to add your own notes.

Here are the slides from the BoF:

https://docs.google.com/presentation/d/e/2PACX-1vS-rV02j1zhPq9Y6AtcUkbZW2c7Q5YYuQ6FPxN-aYiKwrw6c8DU3zW_RYeJlWPMZ5-S6hgz_CIcL8Gd/pub?start=false&loop=false&delayms=3000&slide=id.p

1. The header problem

Coverage instrumentation overhead is roughly quadratic in the number of translation units in a project. The problem is that coverage mappings for template instantiations and static inline functions from headers are pulled into every TU. This bloats the profile metadata sections (which can slow down profile I/O), results in large binaries, and causes long link times (or link failures).

We could solve this problem by maintaining an external coverage database and discarding duplicate coverage mappings from the DB. Another idea is to emit coverage mappings to a side file and unique them when generating coverage reports. Both ideas require changes to the build workflow.

A third option is to emit named coverage mappings with linkonce_odr linkage (for languages with an ODR). This would be a format-breaking change but it wouldn't affect the build workflow. My plan is to try and evaluate this idea in the coming week.

2. HTML report quality

There seems to be widespread interest in improving the quality of coverage reports. We need volunteers to work on this and would love your help! Here are some desired features:

* Search and filtering for coverage summaries

* Collapsing parts of a coverage summary by subdirectory

* Automatically generating a top 10 list of code regions which need better coverage

* Searching via complex queries (e.g: 'give me uncovered regions in covered lines', or 'give me uncovered regions after a call')

* Generating coverage deltas between two profiles, and identifying coverage regressions in a patch/commit

* Simplified tracking of coverage trends over time

There is some consensus that this functionality should not be built on top of the existing llvm-cov C++ codebase. It might be better to develop these features in a language more amenable to rapid prototyping and interoperation with popular web application frameworks (perhaps Python). To facilitate this, llvm-cov gained support for exporting all of its data to JSON (see CoverageExporterJson.cpp). If you are interested in working on these features, I would be happy to work with you on design issues and on code review.

3. Optimizing profile counter placement

From Eli's notes:

I remember we also spent some time discussing the counter intrinsics, and whether we could produce a different set of intrinsics in the frontend, and produce the counters later in the pipeline to avoid duplicate counters. I didn't completely follow that discussion; I haven't spent much time looking at the counter intrinsics or how they're lowered.

Just to recap: the frontend emits calls to the llvm.instrprof_increment intrinsic to implement counter updates. Each increment intrinsic is passed a function name and a counter index (there's a mapping between AST nodes and counter indices). The intrinsics are lowered in the InstrProfiling pass. During lowering, an array of uint64_t counters is created for each function, and the intrinsic calls are replaced by a load-add-store pattern.

Frontend counter updates can look highly redundant because of inlining. It's common to see single basic blocks with tens of distinct counter updates, most of which are redundant. One potential solution is to create a minimal set of profile counter updates after the inliner runs, and to map these counters back to AST nodes (https://bugs.llvm.org/show_bug.cgi?id=33500). This is the most promising approach we know of to cut down on counter updates, but I don't have a precise idea of how it would work. Here's a rough sketch of a solution:

* Have the frontend emit 'virtual' llvm.instrprof_increment intrinsics. These will eventually be discarded during lowering.

* Run an early inlining step, then run the IR PGO pass.

* In the lowering step, emit a section into the object which describes how to map the real counter updates to the virtual ones. I don't have a clear idea of how to build or encode this mapping.

* Teach llvm-profdata how to reconstruct an indexed profile which the frontend can understand (i.e map the real counters back to the virtual ones). llvm-profdata would need to inspect the mapping section in the binary to accomplish this.

4. Optimizing profile counter updates

We had a few different suggestions to speed up profile counter updates:

* Make function counter arrays linkonce_odr when possible. This is similar to the solution from the first section ("The header problem"). I'll try to evaluate this idea in the coming week.

* Enable register promotion for counter updates which occur within loops. David Li has already done the work to enable this for IR PGO.

* Investigate the # of relocations emitted for counter updates. It might be cheaper to load the address of the function counter array once and index into it, instead of indexing into the global on each update.

* Use 32-bit counters. This would cut the size of the counters section in half and speed up profile I/O.

* Use 1-bit counters. This could be useful for those who are only interested in binary coverage. IMO there are other ideas we should try before compromising on report accuracy.

* Use saturating counters. IMO this isn't likely to be a win in common cases, but could increase compile time and code size.

5. Using coverage interactively while hacking on llvm

During the BoF I mentioned that it can be really useful to see coverage reporting interactively, as you're working on a patch. Here's a hacky way to do this:

* Build your code as you normally would (say, "ninja opt")

* Change the files you're interested in

* cd to your build directory and export CCC_OVERRIDE_OPTIONS="+-fcoverage-mapping +-fprofile-instr-generate=/tmp/opt_%m.profraw"

* Rebuild ("ninja opt" again). This will enable coverage instrumentation, but only for the files you've affected with your changes.

* Run a one-liner to generate a coverage report (http://clang.llvm.org/docs/SourceBasedCodeCoverage.html#creating-coverage-reports)

I like this approach because it means I don't have to maintain a separate, coverage-enabled build tree. It's an easy way to check that your patches have decent test coverage. If I want to disable coverage reporting I just need to unset CCC_OVERRIDE_OPTIONS and recompile.

6. C APIs for libCoverage

We didn't get a chance to discuss this in detail during the BoF, but I would like to upstream some C APIs to surface functionality from libCoverage. This will make it easier for IDEs and editors to display coverage information "in-line", right next to source code. Here's what that might look like:

https://developer.apple.com/library/content/documentation/DeveloperTools/Conceptual/testing_with_xcode/chapters/07-code_coverage.html

If anyone has concerns about adding in these APIs, please let me know!

7. Making use of debug info

From Eli's notes:

It seemed like we got a lot of questions related to why we aren't using debug info. :) It might be possible to come up with some sort of hybrid which trades off runtime overhead for lower resolution, without completely throwing away regions like gcov does. But it would be a big project, and the end result would still have a lot of the same problems as actual gcov in terms of the optimizer destroying necessary info.

To add to this: I think there are a lot of unanswered questions here. It's unclear how clang would decide to use debug info instead of regions, or how the different types of coverage counters would interact. I'm not very optimistic about this.

thanks,

vedant

Dean Michael Berris via llvm-dev

unread,

Oct 24, 2017, 4:54:07 PM10/24/17

to Vedant Kumar, LLVM Developers

Thanks for the summary Vedant!

I'm sorry I missed this BoF session.

Others have mentioned the possibility of maybe using XRay for some of this information (function-level coverage, maybe having more intrinsics for marking branch/basic-block level instrumentation). Was this explored in the BoF? Is there interest in potentially exploring this particular space?

Cheers

_______________________________________________
LLVM Developers mailing list
llvm...@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

-- Dean

Vedant Kumar via llvm-dev

unread,

Oct 24, 2017, 5:06:24 PM10/24/17

to Dean Michael Berris, LLVM Developers

Hi Dean,

We didn't discuss using XRay instrumentation during the BoF but it is an interesting idea (by the way, thanks for your talk about XRay internals!). XRay provides the advantage of being able to turn profiling on and off, but I'm not sure how the resulting data could be used.

The code coverage feature is highly dependent on the frontend's profile counter placement. The mapping between counters and parts of the AST is used to gather accurate information about regions within a line. For example, the coverage tool can show you that the l.h.s of "true || false" is evaluated once, and the r.h.s isn't evaluated. This works with arbitrarily nested short-circuit operators.

It might be possible to use XRay instrumentation to gather profile data, but I think it will be challenging to precisely map that data back to the AST nodes the frontend knows about. The problem is similar to the one I've outlined in section 3 ("Optimizing profile counter placement"). The idea there is to map a minimal set of counters placed by IR PGO back to AST nodes: the one sketch of a solution I have still depends on running the frontend counter placement pass to achieve this.

What are your thoughts on this?

thanks,

vedant

Alex L via llvm-dev

unread,

Oct 24, 2017, 7:19:26 PM10/24/17

to Vedant Kumar, llvm-dev

It might be possible to avoid changing llvm-profdata by teaching compiler-rt how to propagate the counter values from a subset of emitted counters to all "virtual" counters before the counter values are written out by compiler-rt to disk.

Dean Michael Berris via llvm-dev

unread,

Oct 25, 2017, 6:47:27 AM10/25/17

to Vedant Kumar, LLVM Developers

On 25 Oct 2017, at 10:06, Vedant Kumar <v...@apple.com> wrote:

Hi Dean,

We didn't discuss using XRay instrumentation during the BoF but it is an interesting idea (by the way, thanks for your talk about XRay internals!). XRay provides the advantage of being able to turn profiling on and off, but I'm not sure how the resulting data could be used.

The talk was my pleasure -- hopefully when the video comes out, more people can get up to speed on the intricacies involved in making things work. :)

The dynamic control might not be as important as being able to patch pre-main and get execution coverage.

The code coverage feature is highly dependent on the frontend's profile counter placement. The mapping between counters and parts of the AST is used to gather accurate information about regions within a line. For example, the coverage tool can show you that the l.h.s of "true || false" is evaluated once, and the r.h.s isn't evaluated. This works with arbitrarily nested short-circuit operators.

Yes, this is cool stuff. The way we try to do this in XRay is at least to do the following:

- When we're lowering the functions into assembler, we're able to maintain a side-table (separate section) that maps where the instrumentation points are. The instrumentation points are currently only automatically placed for function entry and exits. We use heuristics to figure out which functions get instrumented, but can be flag-controlled.

- XRay also has support for inserting intrinsics at the IR level that mark "custom events" which take a pointer and a size -- the handler in the XRay runtime can then do what it deems useful to do at these custom event pointer. The downside to this intrinsic is that it behaves as a function call, and might cause some (re)ordering issues or prevent some important optimisations for code motion. Maybe coverage mode runs don't necessarily care about the optimisations that happen and this is not an issue.

It seems to me that the combination of emitting custom events along with a handler implementation at a basic block level and still be able to correlate information more precisely to the AST (source-level) might be good to do. DWARF info might not be the best thing to use to determine whether a specific sub-line path was taken, but it's certainly helpful. The best I can think of at least with XRay as something to build upon is a way of also marking where at source-level certain branches/blocks are. This might be doable in a compact manner as a CFG path, and if there's a consistent numbering of the CFG paths those could be used to track the counters/latches in some sort of table.

I don't have a very clear picture yet in my head how that table might look like or how that's different compared to the way it's being done now, so I'll have to sit on it a bit longer and maybe whiteboard/sketch it out a bit more.

It might be possible to use XRay instrumentation to gather profile data, but I think it will be challenging to precisely map that data back to the AST nodes the frontend knows about. The problem is similar to the one I've outlined in section 3 ("Optimizing profile counter placement"). The idea there is to map a minimal set of counters placed by IR PGO back to AST nodes: the one sketch of a solution I have still depends on running the frontend counter placement pass to achieve this.

What are your thoughts on this?

There's probably a representation we can use that will indicate which paths in a CFG for a function were visited, and associating which parts of the AST have been covered already. Assuming for example that we can give all paths from a function's entry to a function's exits a unique number (for that function) then we can certainly re-create that path as we're going through the execution path and collect that information for coverage/reconstruction purposes.

The advantage of using XRay in the collection path is so you can process the data either in the handler implementation(s) while the program is running and collecting the pre-computed coverage paths somehow (through files, or in memory collection) or even collecting a bit more information and processing it offline (things like collecting frequency and some PMU data while you're at it).

Currently I don't have time to explore this further, but if you or someone else might be interested in giving this a go, I'd certainly be happy to further explore at least the design of a potential alternative solution that involves using XRay for coverage/PGO data gathering.

Cheers

-- Dean

Vedant Kumar via llvm-dev

unread,

Oct 25, 2017, 5:19:55 PM10/25/17

to Dean Michael Berris, LLVM Developers

On Oct 25, 2017, at 3:47 AM, Dean Michael Berris <dean....@gmail.com> wrote:

On 25 Oct 2017, at 10:06, Vedant Kumar <v...@apple.com> wrote:

Hi Dean,

We didn't discuss using XRay instrumentation during the BoF but it is an interesting idea (by the way, thanks for your talk about XRay internals!). XRay provides the advantage of being able to turn profiling on and off, but I'm not sure how the resulting data could be used.

The talk was my pleasure -- hopefully when the video comes out, more people can get up to speed on the intricacies involved in making things work. :)

The dynamic control might not be as important as being able to patch pre-main and get execution coverage.

The code coverage feature is highly dependent on the frontend's profile counter placement. The mapping between counters and parts of the AST is used to gather accurate information about regions within a line. For example, the coverage tool can show you that the l.h.s of "true || false" is evaluated once, and the r.h.s isn't evaluated. This works with arbitrarily nested short-circuit operators.

Yes, this is cool stuff. The way we try to do this in XRay is at least to do the following:

- When we're lowering the functions into assembler, we're able to maintain a side-table (separate section) that maps where the instrumentation points are. The instrumentation points are currently only automatically placed for function entry and exits. We use heuristics to figure out which functions get instrumented, but can be flag-controlled.

- XRay also has support for inserting intrinsics at the IR level that mark "custom events" which take a pointer and a size -- the handler in the XRay runtime can then do what it deems useful to do at these custom event pointer.

Very cool!

The downside to this intrinsic is that it behaves as a function call, and might cause some (re)ordering issues or prevent some important optimisations for code motion. Maybe coverage mode runs don't necessarily care about the optimisations that happen and this is not an issue.

I'm not sure how much we benefit from the optimizer's handling of the load-add-store counter update pattern. I have seen examples where it's able to, e.g unroll a loop and perform a single counter update, or vectorize counter updates. Would it ever make sense to inline the handler at custom event points, or is this counter to XRay's design?

It seems to me that the combination of emitting custom events along with a handler implementation at a basic block level and still be able to correlate information more precisely to the AST (source-level) might be good to do. DWARF info might not be the best thing to use to determine whether a specific sub-line path was taken, but it's certainly helpful. The best I can think of at least with XRay as something to build upon is a way of also marking where at source-level certain branches/blocks are. This might be doable in a compact manner as a CFG path, and if there's a consistent numbering of the CFG paths those could be used to track the counters/latches in some sort of table.

I don't have a very clear picture yet in my head how that table might look like or how that's different compared to the way it's being done now, so I'll have to sit on it a bit longer and maybe whiteboard/sketch it out a bit more.

The profiling pass used for coverage isn't path sensitive, it just looks at basic block counts (not edge counts). I think one way to get started with XRay is to emit custom event points everywhere the llvm.instrprof_increment intrinsic is currently used (see clang's CodeGenPGO). The next step would be to massage the resulting profile into a form the coverage reader can understand.

That said, I think there are some people interested in working with path-sensitive coverage. That's not on my radar right now, but if someone does want to pick this project up, I think it would be useful to at least consider 'standardizing' on one type of instrumentation (like XRay).

It might be possible to use XRay instrumentation to gather profile data, but I think it will be challenging to precisely map that data back to the AST nodes the frontend knows about. The problem is similar to the one I've outlined in section 3 ("Optimizing profile counter placement"). The idea there is to map a minimal set of counters placed by IR PGO back to AST nodes: the one sketch of a solution I have still depends on running the frontend counter placement pass to achieve this.

What are your thoughts on this?

There's probably a representation we can use that will indicate which paths in a CFG for a function were visited, and associating which parts of the AST have been covered already. Assuming for example that we can give all paths from a function's entry to a function's exits a unique number (for that function) then we can certainly re-create that path as we're going through the execution path and collect that information for coverage/reconstruction purposes.

The advantage of using XRay in the collection path is so you can process the data either in the handler implementation(s) while the program is running and collecting the pre-computed coverage paths somehow (through files, or in memory collection) or even collecting a bit more information and processing it offline (things like collecting frequency and some PMU data while you're at it).

Reflecting on coverage while a program is running is useful enough that the profiling runtime has basic support for it. But I agree with your points here. The tooling and flexibility you can get with XRay makes it an interesting choice.

thanks,

vedant

Moshtaghi, Alireza via llvm-dev

unread,

Oct 25, 2017, 5:45:58 PM10/25/17

to Vedant Kumar, llvm-dev

Hi

I’m interested in implementing the solution for the header problem described in (1.) to emit coverage mappings to a side file and unique them when generating coverage reports. As you also mentioned this would require modifying the build workflow. Can you explain how do you suggest changing the build workflow? I tried objcopy-ing the profile data sections from the “.o” files and relinking them again them again but that caused the (just to see if it is possible) but the output raw profile data was not written into.

I’m open to trying any suggestion.

Thanks

A

Reid Kleckner via llvm-dev

unread,

Oct 25, 2017, 6:21:52 PM10/25/17

to Moshtaghi, Alireza, llvm-dev

On Wed, Oct 25, 2017 at 2:45 PM, Moshtaghi, Alireza via llvm-dev <llvm...@lists.llvm.org> wrote:

Hi

I’m interested in implementing the solution for the header problem described in (1.) to emit coverage mappings to a side file and unique them when generating coverage reports. As you also mentioned this would require modifying the build workflow. Can you explain how do you suggest changing the build workflow? I tried objcopy-ing the profile data sections from the “.o” files and relinking them again them again but that caused the (just to see if it is possible) but the output raw profile data was not written into.

I’m open to trying any suggestion.

Isn't that exactly what emitting the coverage data as linkonce_odr accomplishes?

Vedant Kumar via llvm-dev

unread,

Oct 25, 2017, 6:28:54 PM10/25/17

to Moshtaghi, Alireza, llvm-dev

Hi Alireza,

On Oct 25, 2017, at 3:21 PM, Reid Kleckner <r...@google.com> wrote:

On Wed, Oct 25, 2017 at 2:45 PM, Moshtaghi, Alireza via llvm-dev <llvm...@lists.llvm.org> wrote:

Hi
I’m interested in implementing the solution for the header problem described in (1.) to emit coverage mappings to a side file and unique them when generating coverage reports. As you also mentioned this would require modifying the build workflow. Can you explain how do you suggest changing the build workflow? I tried objcopy-ing the profile data sections from the “.o” files and relinking them again them again but that caused the (just to see if it is possible) but the output raw profile data was not written into.

I’m open to trying any suggestion.

I sent out an RFC about emitting coverage mapping data outside of object files in June:

http://lists.llvm.org/pipermail/llvm-dev/2017-June/114855.html

There was a lot of feedback about that idea, so I encourage you to take a look at that thread. The two main take-aways (for me) were that:

1) The build system should know about the side files, so it can regenerate them and clean them up as needed

2) If we're going to use side-files, we should try to reuse existing linker and tooling support for split-DWARF as much as possible

I've heard arguments about the second point that cut both ways. It would be convenient to rely on linker support for not copying debug info sections into executables. OTOH, you'd need a tool like dsymutil to copy instrumented programs to a different machine.

Isn't that exactly what emitting the coverage data as linkonce_odr accomplishes?

Yeah, I'm in favor of trying this idea out first because it requires less work on linkers and build systems.

vedant

Vedant Kumar via llvm-dev

unread,

Oct 30, 2017, 4:42:02 PM10/30/17

to llvm-dev, mmo...@chromium.org

On Oct 24, 2017, at 1:24 PM, Vedant Kumar <v...@apple.com> wrote:

Hello,

Our goals for the code coverage BoF (10/19) were to find areas where we can improve the coverage tooling, and to learn more about how coverage is used. I'd like to thank all of the attendees for their input and for making the BoF productive. Special thanks to Mandeep Grang, who volunteered as a mic runner at the last minute.

In this email I'll share my (rough) notes and outline some future plans. Please feel free to ask for clarifications or to add your own notes.

Here are the slides from the BoF:
https://docs.google.com/presentation/d/e/2PACX-1vS-rV02j1zhPq9Y6AtcUkbZW2c7Q5YYuQ6FPxN-aYiKwrw6c8DU3zW_RYeJlWPMZ5-S6hgz_CIcL8Gd/pub?start=false&loop=false&delayms=3000&slide=id.p

1. The header problem

Coverage instrumentation overhead is roughly quadratic in the number of translation units in a project. The problem is that coverage mappings for template instantiations and static inline functions from headers are pulled into every TU. This bloats the profile metadata sections (which can slow down profile I/O), results in large binaries, and causes long link times (or link failures).

We could solve this problem by maintaining an external coverage database and discarding duplicate coverage mappings from the DB. Another idea is to emit coverage mappings to a side file and unique them when generating coverage reports. Both ideas require changes to the build workflow.

A third option is to emit named coverage mappings with linkonce_odr linkage (for languages with an ODR). This would be a format-breaking change but it wouldn't affect the build workflow. My plan is to try and evaluate this idea in the coming week.

Following up on this thread:

I found that marking coverage mappings, function records, and names of functions from headers as linkonce_odr results in decent binary size savings. I tested this idea out and reported my results here: https://bugs.llvm.org/show_bug.cgi?id=34533. I think this is the solution we should go with, but am curious to know what others think.

thanks,

vedant

Bob Wilson via llvm-dev

unread,

Oct 30, 2017, 6:24:14 PM10/30/17

to Vedant Kumar, llvm-dev, mmo...@chromium.org

On Oct 30, 2017, at 1:41 PM, Vedant Kumar <v...@apple.com> wrote:

On Oct 24, 2017, at 1:24 PM, Vedant Kumar <v...@apple.com> wrote:

Hello,

Our goals for the code coverage BoF (10/19) were to find areas where we can improve the coverage tooling, and to learn more about how coverage is used. I'd like to thank all of the attendees for their input and for making the BoF productive. Special thanks to Mandeep Grang, who volunteered as a mic runner at the last minute.

In this email I'll share my (rough) notes and outline some future plans. Please feel free to ask for clarifications or to add your own notes.

Here are the slides from the BoF:
https://docs.google.com/presentation/d/e/2PACX-1vS-rV02j1zhPq9Y6AtcUkbZW2c7Q5YYuQ6FPxN-aYiKwrw6c8DU3zW_RYeJlWPMZ5-S6hgz_CIcL8Gd/pub?start=false&loop=false&delayms=3000&slide=id.p

1. The header problem

Coverage instrumentation overhead is roughly quadratic in the number of translation units in a project. The problem is that coverage mappings for template instantiations and static inline functions from headers are pulled into every TU. This bloats the profile metadata sections (which can slow down profile I/O), results in large binaries, and causes long link times (or link failures).

We could solve this problem by maintaining an external coverage database and discarding duplicate coverage mappings from the DB. Another idea is to emit coverage mappings to a side file and unique them when generating coverage reports. Both ideas require changes to the build workflow.

A third option is to emit named coverage mappings with linkonce_odr linkage (for languages with an ODR). This would be a format-breaking change but it wouldn't affect the build workflow. My plan is to try and evaluate this idea in the coming week.

Following up on this thread:

I found that marking coverage mappings, function records, and names of functions from headers as linkonce_odr results in decent binary size savings. I tested this idea out and reported my results here: https://bugs.llvm.org/show_bug.cgi?id=34533. I think this is the solution we should go with, but am curious to know what others think.

thanks,
vedant

This seems like a good step forward. An external coverage database might still be a good idea, but that these solutions are complementary.

Reply all

Reply to author

Forward