Hi everyone,
I'd like to ask about the various symbolizers that are used by ASan and sanitizer_common, and then propose some changes to get better OS X support.
If I understand correctly, the general llvm-symbolizer (with its interface in `LLVMSymbolizerProcess`) is meant to be the easiest-to-use and best-supported solution, but we also have a few others:
* `LibbacktraceSymbolizer`, which is an in-process symbolizer that uses libbacktrace (but only supports ELF).
* `Addr2LinePool`, which uses the addr2line command-line tool.
* `WinSymbolizer`, which uses DbgHelp.dll on Windows.
* `InternalSymbolizer`, which can be used to link in an externally-built symbolizer.
`POSIXSymbolizer` is then a wrapper that decides which of these symbolizers is actually used. What I'm curious about is the primary use case and the quality of each of them. Are the in-process ones present just to support running inside a sandbox, where we cannot spawn an external process? Are there other reasons to prefer in-process symbolication?
None of the in-process solutions seem to support Darwin, so symbolication doesn't work there in sandboxed (fork-disabled) environments. Another issue is that llvm-symbolizer is not present on any current installation of OS X or Xcode, so in order to transfer an ASanified program to another machine, one has to ship llvm-symbolizer with the program.
While llvm-symbolizer works fine for a lot of use cases, I'd like to consider adding fallback symbolizers that would work on OS X. If the llvm-symbolizer executable is not present, we could spawn `atos` instead, which can also be run in an interactive mode and is even able to inspect a running process. We already have such a fallback on Linux, which uses the `addr2line` tool when llvm-symbolizer is not found.
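To make the proposed fallback order concrete, here is a rough sketch of the selection step only: try each tool name in order and take the first one found as an executable file on a colon-separated search path. The names `FindExecutable` and `PickExternalSymbolizer` are hypothetical and are not existing sanitizer_common functions. The sketch also uses `std::string` freely, which the real runtime may not be able to do.

```cpp
#include <string>
#include <vector>
#include <sys/stat.h>  // stat(), S_ISREG
#include <unistd.h>    // access(), X_OK

// Hypothetical helper: returns the full path of `name` in the first directory
// of the colon-separated `path_list` where it is an executable regular file,
// or an empty string if it is not found anywhere.
std::string FindExecutable(const std::string &name,
                           const std::string &path_list) {
  std::string::size_type start = 0;
  while (start <= path_list.size()) {
    std::string::size_type end = path_list.find(':', start);
    if (end == std::string::npos) end = path_list.size();
    std::string dir = path_list.substr(start, end - start);
    if (!dir.empty()) {
      std::string candidate = dir + "/" + name;
      struct stat st;
      if (stat(candidate.c_str(), &st) == 0 && S_ISREG(st.st_mode) &&
          access(candidate.c_str(), X_OK) == 0)
        return candidate;
    }
    start = end + 1;
  }
  return "";
}

// Hypothetical helper: walks `tools` in priority order and returns the path
// of the first tool that exists, or an empty string if none is available.
std::string PickExternalSymbolizer(const std::vector<std::string> &tools,
                                   const std::string &path_list) {
  for (const std::string &tool : tools) {
    std::string path = FindExecutable(tool, path_list);
    if (!path.empty()) return path;
  }
  return "";
}
```

On OS X this would be called as something like `PickExternalSymbolizer({"llvm-symbolizer", "atos"}, getenv("PATH"))`, and on Linux with `addr2line` as the second entry. An empty result would mean that no external symbolizer can be spawned at all.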
For the case where forking is disabled, we should consider having an in-process symbolizer that is supported on OS X, built on something like dladdr() or backtrace(). I understand that we cannot just use these functions straightforwardly, because there are concerns about how they allocate memory internally, etc. Can I ask what exactly would be needed to use these calls in a reliable way?
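For reference, this is roughly what a dladdr()-based lookup gives us, as a minimal sketch. `SymbolInfo` and `DescribeAddress` are hypothetical names used only for illustration. dladdr() reports the containing module and the nearest preceding symbol it knows about, but no source file or line number. The sketch does not address the memory allocation concerns mentioned above. On some older Linux systems it may also need to be linked with `-ldl`.

```cpp
#include <dlfcn.h>  // dladdr(), Dl_info

// Hypothetical result type: the pieces of information dladdr() can provide.
struct SymbolInfo {
  const char *module;    // path of the image containing the address
  const char *symbol;    // nearest preceding symbol name, may be null
  unsigned long offset;  // distance from that symbol's start, 0 if unknown
};

// Hypothetical helper: fills `out` for `addr` and returns true if the address
// belongs to a loaded image. The strings are owned by the dynamic loader.
bool DescribeAddress(const void *addr, SymbolInfo *out) {
  Dl_info info;
  if (dladdr(addr, &info) == 0) return false;
  out->module = info.dli_fname;
  out->symbol = info.dli_sname;
  out->offset = info.dli_saddr
                    ? static_cast<unsigned long>(
                          static_cast<const char *>(addr) -
                          static_cast<const char *>(info.dli_saddr))
                    : 0;
  return true;
}
```

A frame could then be printed as `module`, `symbol` and `offset`, falling back to the module plus the raw address when `symbol` is null, which happens for symbols the dynamic loader does not know about.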
This is also slightly related to ASan issue suppression (http://reviews.llvm.org/D6280), which for some suppression types requires a working symbolizer and might also benefit from having an in-process symbolizer.
Thank you for your feedback!
Kuba