Heads Up: New Crash Analyzer (minidump-stackwalk) for "Try" and Local Builds

126 views

Skip to first unread message

Alexis Beingessner

unread,

Nov 23, 2021, 2:45:08 PM11/23/21

to dev-platform

High Level Details

Bug 1741205 has just landed.

minidump-stackwalk, the tool that is used to produce a backtrace from a minidump when a test unexpectedly crashes (via mozcrash.py) has been completely rewritten in Rust. The core behaviour of the tool should be largely unchanged. It should produce basically the exact same backtraces, although I expect it will be faster and more reliable. If you notice any problems, please let us know by filing a bug against "Toolkit > Crash Reporting" or by leaving a comment on Bug 174120.

Although this is ideally a non-functional change for almost everyone, this is a major step in the ongoing oxidation of our crash reporting infrastructure: this is the first deployment of rust-minidump as a replacement for breakpad's minidump processor. This same implementation will (if testing goes smoothly) become the new backend for crash-stats.mozilla.org.

Moving to rust-minidump will make it easier to maintain our crash reporting infrastructure and make improvements to it. We've already found several long-standing issues that it accidentally fixes and found that it does runs faster than the old version. And of course, moving from C++ to Rust trivially improves the security of our infrastructure. Hooray!

Perhaps more immediately interesting to some of you: rust-minidump also makes it a lot easier to be a crash-analysis power user, and is easier to build and run locally (see the last section).

Sections:

What is affected (and how to tell which version is being used)
Nitty Gritty details (how builds/tooling configs have changed)
New crash analysis power user features (digging deeper into minidumps)

What is affected

This tool is used to generate this kind of line in the try frontend:

PROCESS-CRASH | Last test finished | application crashed [@ static mozilla::image::SurfaceCache::ReleaseImageOnMainThread(already_AddRefed<mozilla::image::Image>, bool)]

And this information in the logs:

INFO - mozcrash Copy/paste: Z:/task_163718277621850/fetches\minidump_stackwalk\minidump_stackwalk.exe --human C:\Users\task_163718277621850\AppData\Local\Temp\tmpra9trz2u.mozrunner\minidumps\b2b1c4f0-3e0a-46b8-b469-988591c3c015.dmp Z:\task_163718277621850\build\symbols --symbols-url=https://symbols.mozilla.org/
INFO - mozcrash Saved minidump as Z:\task_163718277621850\build\blobber_upload_dir\b2b1c4f0-3e0a-46b8-b469-988591c3c015.dmp
INFO - PROCESS-CRASH | Last test finished | application crashed [@ static mozilla::image::SurfaceCache::ReleaseImageOnMainThread(already_AddRefed<mozilla::image::Image>, bool)]
INFO - Crash dump filename: C:\Users\task_163718277621850\AppData\Local\Temp\tmpra9trz2u.mozrunner\minidumps\b2b1c4f0-3e0a-46b8-b469-988591c3c015.dmp
INFO - Operating system: Windows NT
INFO - 10.0.19041
INFO - CPU: amd64
INFO - family 6 model 85 stepping 7
INFO - 8 CPUs
INFO -
INFO - Crash reason: EXCEPTION_BREAKPOINT
INFO - Crash address: 0x7ff8ae0df019
INFO - Process uptime: 2 seconds
INFO -
INFO - Thread 3 TaskController #2 (crashed)
INFO - 0 xul.dll!static mozilla::image::SurfaceCache::ReleaseImageOnMainThread(already_AddRefed<mozilla::image::Image>, bool) [SurfaceCache.cpp:694ce55b85c51b3381eaf432020924e4f0ca4717 : 1831 + 0x40]
INFO - rax = 0x00007ff8b541d9f9 rdx = 0x0000000000000000
INFO - rcx = 0x00007ff8e277c978 rbx = 0x0000000000000001
INFO - rsi = 0x00000041a977f350 rdi = 0x00000177b0f6f740
INFO - rbp = 0x00000177aa172130 rsp = 0x00000041a977f2b0
INFO - r8 = 0x00000041a977f820 r9 = 0x00007ff8ea530000
INFO - r10 = 0x00007ff8ea582651 r11 = 0x00000041a977ec70
INFO - r12 = 0x00007ff8e26e9630 r13 = 0x00000177b3a15120
INFO - r14 = 0x00000177b0f970d8 r15 = 0x00000177b28a2a00
INFO - rip = 0x00007ff8ae0df019
INFO - Found by: given as instruction pointer in context
INFO - 1 xul.dll!mozilla::image::DecodedSurfaceProvider::FinishDecoding() [DecodedSurfaceProvider.cpp:694ce55b85c51b3381eaf432020924e4f0ca4717 : 200 + 0x37]
INFO - rbx = 0x0000000000000001 rbp = 0x00000177aa172130
INFO - rsp = 0x00000041a977f310 r12 = 0x00007ff8e26e9630
INFO - r13 = 0x00000177b3a15120 r14 = 0x00000177b0f970d8
INFO - r15 = 0x00000177b28a2a00 rip = 0x00007ff8ae0b0dbf
INFO - Found by: call frame info
INFO - 2 xul.dll!mozilla::image::DecodedSurfaceProvider::Run() [DecodedSurfaceProvider.cpp:694ce55b85c51b3381eaf432020924e4f0ca4717 : 129 + 0x7]
INFO - rbx = 0x0000000000000001 rbp = 0x00000177aa172130
INFO - rsp = 0x00000041a977f390 r12 = 0x00007ff8e26e9630
INFO - r13 = 0x00000177b3a15120 r14 = 0x00000177b0f970d8
INFO - r15 = 0x00000177b28a2a00 rip = 0x00007ff8ae0b0929
INFO - Found by: call frame info

Only the new implementation will include the `--human` flag in the "Copy/paste: Z:/task_163718277621850/fetches\minidump_stackwalk\minidump_stackwalk.exe" line, if you aren't sure which implementation is being used. However I believe this line is omitted if fix-stacks.py is involved. In that case, you can use the fact that the backtraces will end with "unimplemented streams", a self-debugging feature unique to the new tool:

INFO - Unimplemented streams encountered:
INFO - Stream 0x00000000 UnusedStream (Official) @ 0x00000000
INFO - Stream 0x00000015 SystemMemoryInfoStream (Official) @ 0x00002d98
INFO - Stream 0x00000016 ProcessVmCountersStream (Official) @ 0x00002f84

These differences similarly apply to local crashes, as far as I know (unverified).

Nitty Gritty Details

On paper, this should be a significant upgrade to try's minidump-stackwalk, in that the one that is currently being used is a weird unmaintained fork of what's used on crash-stats. The new one should be faster and more reliable, and will be easy to update by just changing the commit in its toolchain fetch:

rust-minidump:
description: rust-minidump source code (for minidump-stackwalk)
fetch:
type: git
repo: https://github.com/luser/rust-minidump/
revision: 0c90e02544797317503d1c4cff8daab0cabdea86

However this will introduce some changes to how minidump-stackwalk is built:

minidump-stackwalk no longer builds from checked-in source, so it will only need to be built if the toolchain fetch is updated (less builds and churn, yay!)
This introduces a currently orphaned win64-minidump-stackwalk build, for future use in solving Bug 1410840. That can be removed if having an orphan tool sets off some annoying warnings/errors for the sheriffs.
There was technically some wiring in `mach` to support building minidump-stackwalk locally. this no longer works, as there is no local source to use. mozboot was already downloading `minidump_stackwalk` for you, so it's unlikely this will affect anyone's workflow.
If for whatever reason you want to build your own copy of minidump-stackwalk, you can:

`cargo install minidump-stackwalk`
or checkout the rust-minidump tree and build it with `cargo build --release` (that's all our build servers do!)
NB: the rust binary is officially "minidump-stackwalk" but we rename it to "minidump_stackwalk" in CI to avoid needless churn, if you do either of these things, you will get a binary called "minidump-stackwalk"

Crash Analysis Power User Features

The new minidump-stackwalk should build and run locally on all mainstream platforms without any issue via "cargo install minidump-stackwalk". I have done my best to write user documentation and the process for analyzing a firefox minidump is streamlined:

> minidump-stackwalk --symbols-url=https://symbols.mozilla.org/ /path/to/minidump.dmp

Because it is designed to be the backend for crash-stats (aka socorro), it can do ~all the analysis you expect there (and more details are surfaced in the default JSON output than the --human one used by "try" and local builds).

To help me test this, I have also created a tool for downloading minidumps/annotations from crash-stats and comparing local minidump-stackwalk output to the values on crash-stats: socc-pair. While the comparison machinery is probably not necessarily useful to you, as a side-effect of its purpose it automates fetching all the details of a crash and running local analysis, and saves the results to files you can search through for details:

      socc-pair --api-token=f0c129d4467bf58eeca0ad8e8e5d --crash-id=b4f58e9f-49be-4ba5-a203-8ef160211027

...

Output Files:
* Minidump: C:\Users\gankra\AppData\Local\Temp\socc-pair\dumps\b4f58e9f-49be-4ba5-a203-8ef160211027.dmp
* Socorro Processed Crash: C:\Users\gankra\AppData\Local\Temp\socc-pair\dumps\b4f58e9f-49be-4ba5-a203-8ef160211027.json
* Raw JSON: C:\Users\gankra\AppData\Local\Temp\socc-pair\dumps\b4f58e9f-49be-4ba5-a203-8ef160211027.raw.json
* Local minidump-stackwalk Output: C:\Users\gankra\AppData\Local\Temp\socc-pair\dumps\b4f58e9f-49be-4ba5-a203-8ef160211027.local.json
* Local minidump-stackwalk Logs: C:\Users\gankra\AppData\Local\Temp\socc-pair\dumps\b4f58e9f-49be-4ba5-a203-8ef160211027.log.txt

NOTE: these files can contain protected user information. Although they are written to your system's default "temp", I recommend deleting the temp `socc-pair` directory regularly to ensure compliance with our protected data policies.

Notably this includes all of the logging rust-minidump did (in the `.log.txt`), including tracing for the backtracer's analysis, which can help you debug strange backtraces:

[TRACE] unwind: starting stack unwind
[TRACE] unwind: unwinding NtGetContextThread
[TRACE] unwind: trying cfi
[TRACE] unwind: found symbols for address, searching for cfi entries
[TRACE] unwind: trying STACK CFI exprs
[TRACE] unwind: .cfa: $rsp 8 + .ra: .cfa 8 - ^
[TRACE] unwind: .cfa: $rsp 8 +
[TRACE] unwind: STACK CFI parse successful
[TRACE] unwind: STACK CFI seems reasonable, evaluating
[TRACE] unwind: successfully evaluated .cfa (frame address)
[TRACE] unwind: successfully evaluated .ra (return address)
[TRACE] unwind: cfi evaluation was successful -- caller_ip: 0x000000ec00000000, caller_sp: 0x000000ec7fbfd790
[TRACE] unwind: cfi result seems valid
[TRACE] unwind: unwinding 1013612281855
[TRACE] unwind: trying cfi
[TRACE] unwind: trying frame pointer
[TRACE] unwind: trying scan
[TRACE] unwind: scan seems valid -- caller_ip: 0x7ffd172c2a24, caller_sp: 0xec7fbfd7f8
[TRACE] unwind: unwinding <unknown in ntdll.dll>
[TRACE] unwind: trying cfi
[TRACE] unwind: found symbols for address, searching for cfi entries
[TRACE] unwind: trying frame pointer
[TRACE] unwind: trying scan
[TRACE] unwind: scan seems valid -- caller_ip: 0x7ffd162b7034, caller_sp: 0xec7fbfd828
[TRACE] unwind: unwinding BaseThreadInitThunk
[TRACE] unwind: trying cfi
[TRACE] unwind: found symbols for address, searching for cfi entries
[TRACE] unwind: trying STACK CFI exprs
[TRACE] unwind: .cfa: $rsp 8 + .ra: .cfa 8 - ^
[TRACE] unwind: .cfa: $rsp 48 +
[TRACE] unwind: STACK CFI parse successful
[TRACE] unwind: STACK CFI seems reasonable, evaluating
[TRACE] unwind: successfully evaluated .cfa (frame address)
[TRACE] unwind: successfully evaluated .ra (return address)
[TRACE] unwind: cfi evaluation was successful -- caller_ip: 0x0000000000000000, caller_sp: 0x000000ec7fbfd858
[TRACE] unwind: cfi result seems valid
[TRACE] unwind: instruction pointer was nullish, assuming unwind complete
[TRACE] unwind: finished stack unwind

If you are using minidump-stackwalk directly, you can get these same logs with `--verbose=trace`. `--output-file=x/y/z` and `--log-file=x/y/z` arguments are included to make it easier to pipe these streams to files.

Reply all

Reply to author

Forward

0 new messages