Symbolizers and OS X


Kuba Brecka

Dec 3, 2014, 6:59:12 PM

to address-...@googlegroups.com

Hi everyone,

I'd like to ask about the various symbolizers that are used by ASan and sanitizer_common, and then propose some changes to get better OS X support.

If I understand correctly, the general-purpose llvm-symbolizer (and its interface in `LLVMSymbolizerProcess`) should be the easiest-to-use and best-supported solution, but we also have a few others:

* `LibbacktraceSymbolizer`, which is an in-process symbolizer that uses libbacktrace (but only supports ELF).

* `Addr2LinePool`, which uses the addr2line command-line tool.

* `WinSymbolizer`, which uses DbgHelp.dll on Windows.

* `InternalSymbolizer`, which can be used to link in an externally-built symbolizer.

And `POSIXSymbolizer` is a wrapper that decides which of these symbolizers is actually used. What I'm curious about is the primary use case and the quality of each of them. Are the in-process ones present just to support running inside a sandbox, where we cannot spawn an external process? Are there other reasons to prefer in-process symbolication?
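To make the question concrete, here is a minimal sketch of the kind of decision a wrapper like `POSIXSymbolizer` has to make. It assumes a preference order of linked-in internal symbolizer, then libbacktrace, then an external llvm-symbolizer process, then addr2line. The enum, struct, and `choose_backend()` are hypothetical names for illustration and are not the actual sanitizer_common code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical backend kinds (illustrative names only). */
typedef enum {
  BACKEND_NONE,
  BACKEND_INTERNAL,        /* symbolizer statically linked into the process */
  BACKEND_LIBBACKTRACE,    /* in-process, ELF only */
  BACKEND_LLVM_SYMBOLIZER, /* external llvm-symbolizer child process */
  BACKEND_ADDR2LINE        /* external addr2line child processes */
} backend_kind;

/* Hypothetical summary of what the current environment offers. */
typedef struct {
  bool internal_linked;
  bool libbacktrace_available;
  bool can_spawn;                   /* false in a fork-disabled sandbox */
  const char *llvm_symbolizer_path; /* NULL if not found */
  bool allow_addr2line;
} env_caps;

backend_kind choose_backend(const env_caps *env) {
  /* In-process backends keep working even when spawning is forbidden. */
  if (env->internal_linked) return BACKEND_INTERNAL;
  if (env->libbacktrace_available) return BACKEND_LIBBACKTRACE;
  /* Everything below needs to start a child process. */
  if (!env->can_spawn) return BACKEND_NONE;
  if (env->llvm_symbolizer_path != NULL) return BACKEND_LLVM_SYMBOLIZER;
  if (env->allow_addr2line) return BACKEND_ADDR2LINE;
  return BACKEND_NONE;
}
```

In this sketch, when `can_spawn` is false and neither in-process backend is available, the result is `BACKEND_NONE`. That is exactly the Darwin sandbox situation described below.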

None of the in-process solutions seem to support Darwin, which means symbolication doesn't work in sandboxed (fork-disabled) environments. Another issue is that llvm-symbolizer is not included in any current installation of OS X or Xcode, so to move an ASanified program to another machine, one has to ship llvm-symbolizer along with it.

While llvm-symbolizer works fine for a lot of use cases, I'd like to consider adding fallback symbolizers that work on OS X. If the llvm-symbolizer executable is not present, we could spawn `atos` instead, which can also run in an interactive mode and is even able to inspect a running process. On Linux we already have a similar fallback that uses the `addr2line` tool when llvm-symbolizer is not found.
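The fallback boils down to trying tool names in preference order. Below is a minimal sketch of that lookup, assuming the tools are found by searching `$PATH`. `find_in_path()` and `find_first_tool()` are hypothetical helpers and not existing sanitizer_common functions. Note that `access(X_OK)` also succeeds for directories, which a real implementation would want to rule out.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Search each directory in $PATH for an executable named `name`.
   On success, writes the full path into `out` and returns 1. */
int find_in_path(const char *name, char *out, size_t out_len) {
  const char *path = getenv("PATH");
  if (path == NULL || name == NULL || *name == '\0') return 0;
  const char *dir = path;
  for (;;) {
    const char *end = strchr(dir, ':');
    size_t dir_len = end ? (size_t)(end - dir) : strlen(dir);
    int n;
    if (dir_len == 0) /* an empty PATH entry means the current directory */
      n = snprintf(out, out_len, "./%s", name);
    else
      n = snprintf(out, out_len, "%.*s/%s", (int)dir_len, dir, name);
    if (n > 0 && (size_t)n < out_len && access(out, X_OK) == 0) return 1;
    if (end == NULL) break;
    dir = end + 1;
  }
  return 0;
}

/* Try candidates in preference order (e.g. "llvm-symbolizer", then "atos").
   Returns the index of the first one found, or -1 if none is available. */
int find_first_tool(const char *const *candidates, size_t count,
                    char *out, size_t out_len) {
  for (size_t i = 0; i < count; i++)
    if (find_in_path(candidates[i], out, out_len)) return (int)i;
  return -1;
}
```

On OS X, the candidate list might be `{"llvm-symbolizer", "atos"}`. If `atos` wins, it could then be spawned as `atos -p <pid>` with addresses written to its stdin. That usage is an assumption based on the interactive mode mentioned above and would need checking against the atos man page.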

If forking is disabled, we should consider an in-process symbolizer that is supported on OS X, based on dladdr() or backtrace()/backtrace_symbols(). I understand that we cannot use these functions straightforwardly, because there are concerns about how they allocate memory internally, etc. What exactly would be needed to use these calls reliably?
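For reference, here is a minimal sketch of dladdr()-based in-process symbolization. It assumes that module name, nearest exported symbol, and offsets are enough, since dladdr() gives no file/line information and typically cannot see non-exported (static) symbols. Stack addresses come from backtrace(). The sketch avoids backtrace_symbols(), which returns a malloc()ed array, but it does not remove the allocation concerns raised above. The glibc man page notes that backtrace() lives in libgcc, which is loaded dynamically on first use, and that dynamic loading usually triggers malloc(). dladdr() may also take loader locks. `frame_info`, `symbolize_addr()`, and `print_stack()` are hypothetical names, and older glibc needs `-ldl` at link time.

```c
#define _GNU_SOURCE /* exposes dladdr() and Dl_info on glibc; harmless on OS X */
#include <dlfcn.h>
#include <execinfo.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical result type for this sketch. */
typedef struct {
  const char *module;       /* path of the image containing the address */
  const char *symbol;       /* nearest exported symbol, or NULL */
  uintptr_t module_offset;  /* address minus the image load base */
  uintptr_t symbol_offset;  /* address minus the symbol start (0 if none) */
} frame_info;

/* Returns 1 if dladdr() could attribute `addr` to a loaded image. */
int symbolize_addr(const void *addr, frame_info *out) {
  Dl_info info;
  if (dladdr(addr, &info) == 0 || info.dli_fname == NULL) return 0;
  out->module = info.dli_fname;
  out->module_offset = (uintptr_t)addr - (uintptr_t)info.dli_fbase;
  out->symbol = info.dli_sname;
  out->symbol_offset =
      info.dli_saddr ? (uintptr_t)addr - (uintptr_t)info.dli_saddr : 0;
  return 1;
}

/* Print the current stack: backtrace() collects return addresses and
   dladdr() names them. Returns the number of frames captured. */
int print_stack(FILE *f) {
  void *frames[64];
  int n = backtrace(frames, 64);
  for (int i = 0; i < n; i++) {
    frame_info fi;
    if (symbolize_addr(frames[i], &fi))
      fprintf(f, "#%d %p in %s+0x%lx (%s+0x%lx)\n", i, frames[i],
              fi.symbol ? fi.symbol : "??", (unsigned long)fi.symbol_offset,
              fi.module, (unsigned long)fi.module_offset);
    else
      fprintf(f, "#%d %p\n", i, frames[i]);
  }
  return n;
}
```

The `(module+offset)` pair is printed alongside the name because it is what an out-of-process symbolizer would need if the in-process answer is too coarse.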

This is also slightly related to ASan issue suppressions (http://reviews.llvm.org/D6280): some suppression types require a working symbolizer and might also benefit from having an in-process symbolizer.

Thank you for your feedback!

Kuba

Evgeniy Stepanov

Dec 4, 2014, 5:04:06 AM

to address-...@googlegroups.com

An in-process symbolizer sometimes makes deployment easier, as you don't need to carry an extra library or binary. For the same reason, we prefer linking the runtime library statically where possible.

AFAIK, libbacktrace is used only in GCC's ASan.

Another internal symbolizer can be found here:

https://code.google.com/p/address-sanitizer/source/browse/#svn%2Ftrunk%2Finternal_symbolizer

The scripts may be a bit out of date, but the idea is to link llvm-symbolizer statically, internalizing all of its symbols except for a very simple interface, to avoid conflicts with user code. It would be nice to integrate it into the LLVM build system.
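One way a runtime can detect whether such an internal symbolizer was linked in is a weak reference to its small interface. The sketch below assumes a single hypothetical entry point, `hypothetical_symbolize_code()`. It is not the real interface exported by the internal_symbolizer scripts. The sketch also relies on ELF semantics, where an undefined weak symbol resolves to NULL. Mach-O linkers treat weak references differently, so a Darwin port would likely need another detection mechanism; treat that as an assumption to verify.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical entry point that a statically linked, symbol-internalized
   llvm-symbolizer could export. Declared weak so the program still links
   when no such object is present; its address is then NULL (ELF). */
extern bool hypothetical_symbolize_code(const char *module, uint64_t offset,
                                        char *buf, int buf_len)
    __attribute__((weak));

/* Returns true and fills `buf` if an internal symbolizer is linked in and
   succeeds; returns false so the caller can fall back to an external tool. */
bool try_internal_symbolizer(const char *module, uint64_t offset, char *buf,
                             int buf_len) {
  if (hypothetical_symbolize_code == NULL) return false; /* not linked in */
  return hypothetical_symbolize_code(module, offset, buf, buf_len);
}
```

Keeping the exported surface this small is what allows everything else in the statically linked symbolizer to be internalized without clashing with the user's own copies of LLVM or libc++ symbols.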

I don't mind an "atos" symbolizer, if you think it would be useful.

--
You received this message because you are subscribed to the Google Groups "address-sanitizer" group.
To unsubscribe from this group and stop receiving emails from it, send an email to address-sanitizer+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Alexander Potapenko

Dec 4, 2014, 5:21:29 AM

to address-...@googlegroups.com

Side note: IIRC, both atos and addr2line are inaccurate when it comes to -gline-tables-only debug info. Still, it's fine to have atos as a fallback.

Regarding symbolization on the test machines: do you actually need it? I suspect we aren't talking about continuous builds, for which it should be easy to deliver llvm-symbolizer and the like. If you're giving instrumented binaries to users, are you going to give them the .dSYM information as well? If not, symbolization simply won't work. But you can always process the report on the developer's machine, where you have both the debug info and the tools able to process it.

BTW, I remember us discussing the possibility of generating OS X crash reports when ASan detects an error. Are there any facilities that would let one automatically send ASan reports in that format to the developer?

--
Alexander Potapenko
Software Engineer
Google Moscow

Alexey Samsonov

Dec 4, 2014, 7:55:24 PM

to address-...@googlegroups.com

Hi,

On Wed, Dec 3, 2014 at 3:59 PM, Kuba Brecka <kuba....@gmail.com> wrote:

> Hi everyone,
>
> I'd like to ask about the various symbolizers that are used by ASan and sanitizer_common, and then propose some changes to get better OS X support.
>
> If I understand correctly, the general llvm-symbolizer (and its interface in `LLVMSymbolizerProcess`) should be the easiest to use and supported solution, but we also have a few others:
>
> * `LibbacktraceSymbolizer`, which is an in-process symbolizer that uses libbacktrace (but only supports ELF).

^^ Right, I believe GCC uses it by default by linking libbacktrace in.

> * `Addr2LinePool` uses the addr2line command line tool.

^^ This one is currently used in TSan only, although it can be enabled for other sanitizers with runtime flags.
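addr2line takes a single object file per invocation (`-e`), so a pool of long-running addr2line processes plausibly keeps one child per module and reuses it. The sketch below shows only that pooling logic. `addr2line_proc`, `addr2line_pool`, `pool_get()`, and the injected `spawn_fn` are hypothetical names, and the actual fork/exec and pipe setup is deliberately left out.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical handle for one long-running addr2line child. */
typedef struct {
  char module[256];  /* binary passed to "addr2line -e" */
  int to_child_fd;   /* would be the write end of the child's stdin */
  int from_child_fd; /* would be the read end of the child's stdout */
} addr2line_proc;

/* Spawning is injected so the pooling logic can be shown on its own.
   Returns 0 on success. */
typedef int (*spawn_fn)(const char *module, addr2line_proc *out);

#define POOL_MAX 16

typedef struct {
  addr2line_proc procs[POOL_MAX];
  size_t count;
  spawn_fn spawn;
} addr2line_pool;

/* Return the child serving `module`, spawning it on first use.
   Returns NULL if the pool is full, the path is too long, or spawning fails. */
addr2line_proc *pool_get(addr2line_pool *pool, const char *module) {
  for (size_t i = 0; i < pool->count; i++)
    if (strcmp(pool->procs[i].module, module) == 0) return &pool->procs[i];
  if (pool->count == POOL_MAX ||
      strlen(module) >= sizeof pool->procs[0].module)
    return NULL;
  addr2line_proc *p = &pool->procs[pool->count];
  if (pool->spawn(module, p) != 0) return NULL;
  strcpy(p->module, module); /* length checked above */
  pool->count++;
  return p;
}
```

Reusing one child per module avoids paying process startup and debug-info loading for every address, which matters when a report has many frames from the same binary.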

> * `WinSymbolizer` uses the DbgHelp.dll on Windows.
> * `InternalSymbolizer`, which can be used to link in an externally-built symbolizer.

^^ What Evgeniy said. We use the internal symbolizer for ASan, TSan and MSan internally and have observed few problems with it. It saves you a lot of deployment effort and lets you get nice backtraces from any binary, provided it's built with at least -gline-tables-only. Note that the internal symbolizer we use is essentially llvm-symbolizer built with some hacks and hooks.

I was planning to work on porting internal_symbolizer to the LLVM build system before the end of the year (now that CMake builds of libcxx/libcxxabi are solid). I think doing that will help on Mac as well, and could probably solve your problems. I'd be more than happy to collaborate with you on this and verify that things work as expected on Mac.
