Hi Ammar, bcov sounds really interesting.
I just completed implementation for LLVM source-based branch condition coverage, which extends llvm-cov visualization and makes use of (mostly) existing counter instrumentation to track True/False execution counts for leaf-conditions comprising any source-level Boolean expression. I plan to upstream this support soon. MC/DC would also be the next logical step.
Having a binary-level coverage tool is a great complement, and perhaps you can use the extensions I made to llvm-cov for visualization.
-Alan Phipps
Hi Ammar,
Great work!
I think the following statement in the paper is not true: "gcov relies on debug
information which is less accurate in optimized builds." GCC's gcov
implementation does not rely on debugging information. This can be verified with
-fdump-debug: `gcc --coverage -fdump-debug -c a.c` dumps an empty a.c.*.debug
file.
clang supports gcov instrumentation as well. clang --coverage does leverage
debugging information, but the usage is minimal: debugging information is
only used to retrieve the start line of a function. I have studied and
contributed a bit to clang's gcov implementation recently, so I am fairly
confident about this.
> In comparison, llvm-cov features a custom mapping format embedded in
> LLVM’s intermediate representation (IR).
I don't know whether a clarification is needed here. llvm-cov is a frontend tool
which supports two formats: (1) coverage mapping (clang -fprofile-instr-generate
-fcoverage-mapping) and (2) gcov. Referring to -fcoverage-mapping might be clearer
in this context.
> Also, sancov is tightly coupled with LLVM sanitizers (e.g., ASan) which add
> varying overhead. Extending bcov with additional feedback signals, similar to
> sancov, is an interesting future work
SanitizerCoverage is a standalone instrumentation pass
(llvm/lib/Transforms/Instrumentation/SanitizerCoverage.cpp)
which is not coupled with asan. -fsanitize-coverage= can be used standalone, or
together with asan, lsan, msan, ubsan, etc.
Its overhead can be very small, especially with the recent inline-bool-flag
(https://reviews.llvm.org/D77244, `clang -fsanitize-coverage=inline-bool-flag a.c`)
or inline-8bit-counters.
You can choose among three modes: edge (default), bb, and func.
The flags are stored in the __sancov_bools section.
There is no standalone dumper, though.
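In the absence of a dumper, a minimal sketch of one might look like this. It assumes the `__sanitizer_cov_bool_flag_init(bool *start, bool *end)` hook described in the SanitizerCoverage documentation for inline-bool-flag (worth double-checking against your clang version); `count_covered` and `dump_flags` are hypothetical helper names, and the sketch only keeps the first non-empty flag range it is handed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Flag range handed to us by the instrumented code. */
static bool *flags_begin;
static bool *flags_end;

/* Hook assumed to be called at startup by code built with
   -fsanitize-coverage=inline-bool-flag; each flag in [start, end) is set
   to true once its edge (or bb/func) has executed. */
void __sanitizer_cov_bool_flag_init(bool *start, bool *end) {
  assert(start <= end);
  if (start == end || flags_begin != NULL)
    return; /* sketch: keep only the first non-empty range */
  flags_begin = start;
  flags_end = end;
}

/* Hypothetical helper: number of flags set so far. */
size_t count_covered(void) {
  size_t n = 0;
  if (flags_begin == NULL)
    return 0;
  for (const bool *p = flags_begin; p < flags_end; ++p)
    n += *p ? 1 : 0;
  return n;
}

/* Hypothetical helper: print the flags as a string of 0s and 1s. */
void dump_flags(FILE *out) {
  if (flags_begin != NULL)
    for (const bool *p = flags_begin; p < flags_end; ++p)
      fputc(*p ? '1' : '0', out);
  fputc('\n', out);
}
```

One would compile this file without -fsanitize-coverage, link it with the instrumented objects, and call dump_flags(stderr) from, e.g., an atexit handler. If the hook is called for several modules (shared libraries), this sketch ignores all but the first.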
A few random comments below.
If I understand correctly, by leveraging the superblock dominator graph (from
"Dominators, Super Blocks, and Program Coverage") and the any-node policy, bcov's
coverage result is binary: covered vs. not covered. I agree that in many cases
this is sufficient. gcov and -fcoverage-mapping, on the other hand, can provide
line execution counts, which are handy in other cases. This should probably be
mentioned somewhere, e.g. in "RQ2: Instrumentation overhead". Both
-fcoverage-mapping and gcov record more information, so they have larger
overhead (I agree that gcov isn't a format optimized for file size: it uses
uint32_t records everywhere, which is quite expensive for representing small
integers).
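A simplified sketch of why such a result is binary, assuming each probe records a single hit bit and the nodes form a dominator tree given as a parent array (`infer_coverage` and its parameters are hypothetical names, not bcov's code): a hit probe proves that the probed node and all of its dominators ran, but says nothing about how many times.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* parent[i] is the immediate dominator of node i (the root has -1).
   probed[i] says whether node i carries a probe, hit[i] is that probe's
   one-bit result. On return, covered[i] is true iff node i is known to
   have executed at least once. */
void infer_coverage(size_t n, const int parent[], const bool probed[],
                    const bool hit[], bool covered[]) {
  assert(parent && probed && hit && covered);
  for (size_t i = 0; i < n; ++i)
    covered[i] = false;
  for (size_t i = 0; i < n; ++i) {
    if (!probed[i] || !hit[i])
      continue;
    /* Every dominator of an executed node must also have executed.
       Stop early: an already-covered node has its whole chain marked. */
    for (int v = (int)i; v != -1 && !covered[v]; v = parent[v])
      covered[v] = true;
  }
}
```

In this sketch a node that is neither probed nor a dominator of a hit probe is reported as not covered, so where the probes are placed determines what can be inferred; no execution counts can be recovered from the bits.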
bcov currently rewrites program headers but not the section header table, so for
example, objdump -d a.any cannot disassemble the synthesized code.
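Because objdump -d picks what to disassemble from the section header table, code reachable only through a program header stays invisible to it. A minimal sketch of that check, assuming the synthesized code lives in an executable PT_LOAD segment that no section describes; `segment_has_code_section` is a hypothetical helper, and reading the headers from the file is omitted.

```c
#include <assert.h>
#include <elf.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: does any executable section (SHF_EXECINSTR) lie
   inside the address range of the loadable segment `ph`? If not, tools
   that disassemble by section will not show the segment's code. */
bool segment_has_code_section(const Elf64_Phdr *ph, const Elf64_Shdr *sh,
                              size_t shnum) {
  assert(ph != NULL && (sh != NULL || shnum == 0));
  if (ph->p_type != PT_LOAD)
    return false;
  Elf64_Addr lo = ph->p_vaddr;
  Elf64_Addr hi = ph->p_vaddr + ph->p_memsz;
  for (size_t i = 0; i < shnum; ++i) {
    if (!(sh[i].sh_flags & SHF_EXECINSTR) || sh[i].sh_size == 0)
      continue;
    if (sh[i].sh_addr >= lo && sh[i].sh_addr + sh[i].sh_size <= hi)
      return true;
  }
  return false;
}
```

Adding a matching section header for the new segment would be one way to make the synthesized code visible to section-based tools.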
bcov depends on capstone, which appears to be a pretty standard disassembly
tool. It is based on a ~2013 slimmed-down version of LLVM MC and
LLVM*Disassembler. The generated files (e.g. arch/X86/X86DisassemblerDecoder.c)
have received ad-hoc, sporadic amendments from contributors over the years. It is a bit
unfortunate for LLVM that MC and LLVM*Disassembler (Machine Code, including
assembly/disassembly libraries) cannot be easily adapted by downstream
users. (Well, downstream packages can use LLVM's exported CMake files via
find_package(LLVM) and link against some LLVM* libraries, but this may pull in a
large set of dependencies.)
_______________________________________________
LLVM Developers mailing list
llvm...@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
From an IRIX document, https://irix7.com/techpubs/007-2479-001.pdf
(p. 90), I believe pixie works in a way very similar to bcov:
> Run pixie to generate the equivalent program containing
> basic-block-counting code.
> % pixie myprog
> ...
> pixie takes myprog and writes an equivalent program, myprog.pixie,
> containing additional code that counts the execution of each basic
> block
IRIX is a discontinued operating system, and I can't find more information
on this tool. I hope some MIPS folks on the list can provide more
information.
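For contrast with bcov's one-bit probes, a hand-written C sketch of what basic-block counting amounts to, assuming roughly one counter per basic block (the loop-condition block is not counted separately). Per the quoted manual, pixie adds such counting code to the binary itself; here the increments are written by hand, and `abs_sum`, `bb_counts`, and the `BB_*` names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* One counter per (approximate) basic block of abs_sum(). */
enum { BB_ENTRY, BB_LOOP, BB_NEG, BB_EXIT, BB_COUNT };
uint64_t bb_counts[BB_COUNT];

int abs_sum(const int *a, int n) {
  assert(a != 0 || n == 0);
  bb_counts[BB_ENTRY]++;
  int s = 0;
  for (int i = 0; i < n; ++i) {
    bb_counts[BB_LOOP]++;   /* loop body */
    int v = a[i];
    if (v < 0) {
      bb_counts[BB_NEG]++;  /* negative-value branch */
      v = -v;
    }
    s += v;
  }
  bb_counts[BB_EXIT]++;
  return s;
}
```

Counters like these yield execution counts (as gcov and -fcoverage-mapping do), at the price of a read-modify-write per block instead of a single store of a hit flag.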
Sadly it can't. trace-pc-guard is the only feature which can currently
dump information to a .sancov file, which can then be inspected by
the 'sancov' tool.
The runtime must be provided by a sanitizer:
-fsanitize={address,memory,thread,fuzzer,undefined,cfi}. There has to be a
runtime, so saying SanitizerCoverage is coupled with sanitizers is probably fine.
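Writing .sancov files is done by the sanitizer runtime, but the trace-pc-guard callbacks themselves can also be user-defined, as the example in the SanitizerCoverage documentation shows. A minimal sketch that only records which guards fired and writes no .sancov file; `MAX_GUARDS`, `guard_hit`, and `num_guards` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_GUARDS 4096 /* hypothetical fixed capacity for this sketch */
static bool guard_hit[MAX_GUARDS + 1];
static uint32_t num_guards;

/* Called with the guard array of each instrumented module; assigns each
   guard a nonzero ID (0 means "ignore"). */
void __sanitizer_cov_trace_pc_guard_init(uint32_t *start, uint32_t *stop) {
  assert(start <= stop);
  if (start == stop || *start)
    return; /* empty or already initialized */
  for (uint32_t *g = start; g < stop; ++g)
    *g = num_guards < MAX_GUARDS ? ++num_guards : 0;
}

/* Called on each instrumented edge; records the hit, then zeroes the
   guard so later executions of this edge return immediately. */
void __sanitizer_cov_trace_pc_guard(uint32_t *guard) {
  if (!*guard)
    return;
  guard_hit[*guard] = true;
  *guard = 0;
}
```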
As for me, I can't provide any additional info; this tool is too ancient.
--
Simon Atanasyan