There is a difference in resource usage between clang -gen-reproducer (or the
environment variable FORCE_CLANG_DIAGNOSTICS_CRASH) and ld.lld --reproduce.
clang -gen-reproducer produces a source file and a .sh file for a
single translation unit, so the space consumption is low.
ld.lld --reproduce can potentially pack a large list of files, which may
take hundreds of megabytes or several gigabytes.
I am skeptical that users will want to have this behavior by default.
If this behavior is guarded by an option, it might be fine.
_______________________________________________
LLVM Developers mailing list
llvm...@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
I'll retract my words about an option. This behavior looks like it
needs a fair bit of customization and is build system dependent.
You can replace the proposed option with a shell script wrapper, which
is more convenient than implementing the restartable action in the
clang driver.
When dealing with linker problems (I doubt there are many nowadays;
when there are, they are mostly LTO problems), I usually change
compiler/linker options a bit.
If you do this, you would only specify the proposed option once all of
that has been done, and at that point it is only a very small extra step
to invoke the link again with -Wl,--reproduce.
It would probably help with this part at least (i.e. for users who
don't have this newly proposed feature enabled), if this isn't done
already, if lld's crash reporter printed the command line to run with
the extra flag ("to reproduce this, run <this command>") for
discoverability?
(Not to derail the primary discussion on this thread, which I don't
have much opinion on.)
The crash report can be easily implemented via a shell script, but is difficult
to implement reliably in the process itself. When a process crashes,
naturally not everything can work very robustly. The process has to recover
some state, start a .tar writer, collect every touched file and place
them in the .tar writer. There are many steps where things can go wrong. I am
worried about the robustness. Of course, this may be solved by a multiprocess
architecture, but I am not sure we want to pay for that complexity in the LLD
entrypoint itself.
(LLD crashing is not something I hear about a lot. For some groups it has been
very stable. The crashes more frequently come from optimizations triggered by
llvm/lib/LTO. Details on the nature of the crashes would be useful, if
Fuchsia/ChromeOS folks would like to provide them.)
On the other hand, this task seems to me to require a fair amount of
customization. First we have the tarball size problem. Then, say there is a
common crash and 100 links of a similar kind crash at the same time: do we
write 100 tarballs? In a controlled environment, for example when there is some
deduplicator or throttling, this may be feasible. The output filename may need
customization as well, and different groups may have different opinions. It
feels to me that a script is indispensable, whether or not LLD has the built-in
crash reporting feature. So the built-in C++ crash reporter code in LLD
does not convince me.
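To make the customization point concrete, here is a rough sketch of the kind of policy a wrapper script could apply before writing a tarball: skip a crash that has already been captured, and cap the number of tarballs kept. Everything here is hypothetical (the function name, REPRO_DIR, REPRO_MAX, the lld-repro-*.tar naming); none of it is an lld feature. It is also not race-free if many links crash at once.

```shell
# Hypothetical helper: decide whether to keep a reproducer for this crash.
# Prints the tarball path to use, or returns 1 if the crash should be skipped.
repro_path_for_crash() {
  log=$1                              # file holding the crashing link's stderr
  dir=${REPRO_DIR:-${TMPDIR:-/tmp}}   # assumed output directory
  max=${REPRO_MAX:-5}                 # assumed cap on retained tarballs

  # Crude crash signature: checksum of the log. Logs that embed addresses or
  # paths will differ between runs, so a real script would normalize first.
  sig=$(cksum <"$log" | cut -d' ' -f1)
  path="$dir/lld-repro-$sig.tar"

  # Deduplicate: the same crash was already captured.
  if [ -e "$path" ]; then
    return 1
  fi

  # Throttle: count existing tarballs via the glob (no match leaves it literal).
  set -- "$dir"/lld-repro-*.tar
  if [ ! -e "$1" ]; then
    set --
  fi
  if [ "$#" -ge "$max" ]; then
    return 1
  fi

  printf '%s\n' "$path"
}
```

A wrapper would call this with the saved stderr log and only rerun the link with --reproduce=<path> when a path is printed. Different groups would presumably want different signatures, limits and filenames, which is the reason to keep this in a script.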
On 2021-04-15, Manoj Gupta via llvm-dev wrote:
>LLD reproducers is something we'd like to have in Chrome OS as well, see
>bug https://bugs.chromium.org/p/chromium/issues/detail?id=1134940 (no
>activity yet).
>Our plan is to create a shell wrapper and re-exec LLD if needed with
>--reproduce. Obviously, if LLD supports creating reproducers natively,
>that'd be great!
>
>-Manoj
The main argument here is whether implementing it in the driver
simplifies things. I doubt it, and I care a lot about maintainability.
If the logic is simply calling CrashRecoveryContext::RunSafely (actually
we have already done this, just set the environment variable
LLD_IN_TEST=1) and the code can meld into the existing framework for
running the lld entrypoint more than once, this could look fine.
But you actually need to parse an option in the generic driver code
and do some non-trivial things there.
If the logic is simply `if CrashRecoveryContext::RunSafely finds a
failure, rerun with --reproduce`, I don't think implementing this in the
driver is simpler than
`ld.lld "$@" 2> log; code=$?; if grep 'crash pattern, maybe Stack' log; then ld.lld "$@" --reproduce=somewhere; fi ...`.
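Spelled out a bit more, the wrapper could look roughly like the sketch below. The function name, the LINKER, CRASH_PATTERN and REPRO_DIR variables, the default crash pattern and the tarball name are all assumptions for illustration; only --reproduce=<path> is an actual ld.lld option.

```shell
# Sketch of a linker wrapper: run the link once, and if it fails with output
# that looks like a crash, rerun it once with --reproduce.
link_with_reproducer() {
  linker=${LINKER:-ld.lld}                  # assumed knob: which linker to run
  pattern=${CRASH_PATTERN:-'Stack dump'}    # assumed crash marker in stderr
  repro_dir=${REPRO_DIR:-${TMPDIR:-/tmp}}   # assumed output directory
  log=$(mktemp) || return 1

  "$linker" "$@" 2>"$log"
  code=$?
  cat "$log" >&2                            # forward the original diagnostics

  if [ "$code" -ne 0 ] && grep -Eq "$pattern" "$log"; then
    repro="$repro_dir/lld-repro-$$.tar"
    echo "link crashed; rerunning with --reproduce=$repro" >&2
    # The rerun is expected to crash again; only the tarball matters here.
    "$linker" "$@" --reproduce="$repro" >/dev/null 2>&1
  fi

  rm -f "$log"
  return "$code"                            # preserve the first run's status
}
```

Installed as a script, the body would just be `link_with_reproducer "$@"`. The original exit status is preserved so the build system still sees the failure.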
BTW: if exitLld is called (via fatal, or a sufficient number of error()
calls), lld may enter a less robust state. It uses longjmp to recover
from a crash. AFAIK, only some JIT-style library users do this. This is
not sufficiently tested in production systems.
While maintainability is important, in some ways making the code
more re-entrant friendly could improve maintainability (making it
easier to unit test parts of the code, making it easier to reason
about the code by removing global shared state, etc.).
On Tue, Apr 20, 2021 at 1:00 PM Fangrui Song via llvm-dev
Clang used to do this (& still can, I guess) with the whole driver+cc1
separation. I wonder if clang's crash reporting in the driver could be
generalized/special cased to also work with invocations of lld from
the clang driver? I don't think this'd be good for all cases (some
builds do run the linker directly without using the compiler driver)
but would help with the "severe" crash category, I think?
1 looks good to me as well, if sufficient parties find this useful.
I am concerned with 2 due to the fragility of same-process crash reporting.
(I think the stack overflow problem can be mitigated by sigaltstack.)
For 3: on Linux, there are many system effects which can suppress crash dumps...
I am still a bit concerned about the size of the core file.
If users find this particularly useful and want to go down this route, it
is fine with me.
It seems to work pretty well for clang, I think?