minidump-stackwalk, the tool that is used to produce a backtrace from a minidump when a test unexpectedly crashes (via
mozcrash.py) has been completely rewritten in Rust.
The core behaviour of the tool should be largely unchanged.
It should produce basically the exact same backtraces, although I
expect it will be faster and more reliable. If you notice any problems,
please let us know by filing a bug against "Toolkit > Crash
Reporting" or by leaving a comment on
Bug 174120.
Although this is
ideally a
non-functional change for almost everyone, this is a major step in the
ongoing oxidation of our crash reporting infrastructure: this is the
first deployment of
rust-minidump
as a replacement for breakpad's minidump processor. This same
implementation will (if testing goes smoothly) become the new backend
for
crash-stats.mozilla.org.
Moving
to rust-minidump will make it easier to maintain our crash reporting
infrastructure and make improvements to it. We've already found several
long-standing issues that it accidentally fixes and found that it
does runs faster than the old version. And of course, moving from C++
to Rust trivially improves the security of our infrastructure. Hooray!
- What is affected (and how to tell which version is being used)
- Nitty Gritty details (how builds/tooling configs have changed)
- New crash analysis power user features (digging deeper into minidumps)
What is affected
This tool is used to generate this kind of line in the try frontend:
PROCESS-CRASH | Last test finished | application crashed [@ static mozilla::image::SurfaceCache::ReleaseImageOnMainThread(already_AddRefed<mozilla::image::Image>, bool)]
And this information in the logs:
INFO - mozcrash Copy/paste: Z:/task_163718277621850/fetches\minidump_stackwalk\minidump_stackwalk.exe --human C:\Users\task_163718277621850\AppData\Local\Temp\tmpra9trz2u.mozrunner\minidumps\b2b1c4f0-3e0a-46b8-b469-988591c3c015.dmp Z:\task_163718277621850\build\symbols --symbols-url=https://symbols.mozilla.org/INFO - mozcrash Saved minidump as Z:\task_163718277621850\build\blobber_upload_dir\b2b1c4f0-3e0a-46b8-b469-988591c3c015.dmpINFO - PROCESS-CRASH | Last test finished | application crashed [@ static mozilla::image::SurfaceCache::ReleaseImageOnMainThread(already_AddRefed<mozilla::image::Image>, bool)]INFO - Crash dump filename: C:\Users\task_163718277621850\AppData\Local\Temp\tmpra9trz2u.mozrunner\minidumps\b2b1c4f0-3e0a-46b8-b469-988591c3c015.dmpINFO - Operating system: Windows NTINFO - 10.0.19041 INFO - CPU: amd64INFO - family 6 model 85 stepping 7INFO - 8 CPUsINFO - INFO - Crash reason: EXCEPTION_BREAKPOINTINFO - Crash address: 0x7ff8ae0df019INFO - Process uptime: 2 secondsINFO - INFO - Thread 3 TaskController #2 (crashed)INFO - 0 xul.dll!static mozilla::image::SurfaceCache::ReleaseImageOnMainThread(already_AddRefed<mozilla::image::Image>, bool) [SurfaceCache.cpp:694ce55b85c51b3381eaf432020924e4f0ca4717 : 1831 + 0x40]INFO - rax = 0x00007ff8b541d9f9 rdx = 0x0000000000000000INFO - rcx = 0x00007ff8e277c978 rbx = 0x0000000000000001INFO - rsi = 0x00000041a977f350 rdi = 0x00000177b0f6f740INFO - rbp = 0x00000177aa172130 rsp = 0x00000041a977f2b0INFO - r8 = 0x00000041a977f820 r9 = 0x00007ff8ea530000INFO - r10 = 0x00007ff8ea582651 r11 = 0x00000041a977ec70INFO - r12 = 0x00007ff8e26e9630 r13 = 0x00000177b3a15120INFO - r14 = 0x00000177b0f970d8 r15 = 0x00000177b28a2a00INFO - rip = 0x00007ff8ae0df019INFO - Found by: given as instruction pointer in contextINFO - 1 xul.dll!mozilla::image::DecodedSurfaceProvider::FinishDecoding() [DecodedSurfaceProvider.cpp:694ce55b85c51b3381eaf432020924e4f0ca4717 : 200 + 0x37]INFO - rbx = 0x0000000000000001 rbp = 0x00000177aa172130INFO - rsp = 0x00000041a977f310 r12 = 0x00007ff8e26e9630INFO - r13 = 0x00000177b3a15120 r14 = 0x00000177b0f970d8INFO - r15 = 0x00000177b28a2a00 rip = 0x00007ff8ae0b0dbfINFO - Found by: call frame infoINFO - 2 xul.dll!mozilla::image::DecodedSurfaceProvider::Run() [DecodedSurfaceProvider.cpp:694ce55b85c51b3381eaf432020924e4f0ca4717 : 129 + 0x7]INFO - rbx = 0x0000000000000001 rbp = 0x00000177aa172130INFO - rsp = 0x00000041a977f390 r12 = 0x00007ff8e26e9630INFO - r13 = 0x00000177b3a15120 r14 = 0x00000177b0f970d8INFO - r15 = 0x00000177b28a2a00 rip = 0x00007ff8ae0b0929INFO - Found by: call frame info Only the new implementation will include the `--human` flag in the
"Copy/paste: Z:/task_163718277621850/fetches\minidump_stackwalk\minidump_stackwalk.exe"
line, if you aren't sure which implementation is being used. However I
believe this line is omitted if fix-stacks.py is involved. In that case,
you can use the fact that the backtraces will end with "unimplemented
streams", a self-debugging feature unique to the new tool:
INFO - Unimplemented streams encountered:
INFO - Stream 0x00000000 UnusedStream (Official) @ 0x00000000
INFO - Stream 0x00000015 SystemMemoryInfoStream (Official) @ 0x00002d98
INFO - Stream 0x00000016 ProcessVmCountersStream (Official) @ 0x00002f84
These differences similarly apply to local crashes, as far as I know (unverified).
Nitty Gritty Details On
paper, this should be a significant upgrade to try's
minidump-stackwalk, in that the one that is currently being used is a
weird unmaintained fork of what's used on crash-stats. The new one
should be faster and more reliable, and will be easy to update by just
changing the commit in its
toolchain fetch:
However this will introduce some changes to how minidump-stackwalk is built:
- minidump-stackwalk
no longer builds from checked-in source, so it will only need to be
built if the toolchain fetch is updated (less builds and churn, yay!)
- This
introduces a currently orphaned win64-minidump-stackwalk build, for
future use in solving Bug 1410840. That can be removed if having an
orphan tool sets off some annoying warnings/errors for the sheriffs.
- There
was technically some wiring in `mach` to support building
minidump-stackwalk locally. this no longer works, as there is no local
source to use. mozboot was already downloading `minidump_stackwalk` for
you, so it's unlikely this will affect anyone's workflow.
- If for whatever reason you want to build your own copy of minidump-stackwalk, you can:
- `cargo install minidump-stackwalk`
- or checkout the rust-minidump tree and build it with `cargo build --release` (that's all our build servers do!)
- NB:
the rust binary is officially "minidump-stackwalk" but we rename it to
"minidump_stackwalk" in CI to avoid needless churn, if you do either of
these things, you will get a binary called "minidump-stackwalk"
Crash Analysis Power User FeaturesThe
new minidump-stackwalk should build and run locally on all mainstream platforms without any issue via
"cargo install minidump-stackwalk". I have done my best to
write user documentation and the process for analyzing a firefox minidump is streamlined:
Because
it is designed to be the backend for crash-stats (aka socorro), it can
do ~all the analysis you expect there (and more details are surfaced in
the default JSON output than the --human one used by "try" and local
builds).
To help me test this, I have also
created a tool for downloading minidumps/annotations from crash-stats
and comparing local minidump-stackwalk output to the values on
crash-stats:
socc-pair.
While the comparison machinery is probably not necessarily useful to
you, as a side-effect of its purpose it automates fetching all the
details of a crash and running local analysis, and saves the results to
files you can search through for details:
socc-pair --api-token=f0c129d4467bf58eeca0ad8e8e5d --crash-id=b4f58e9f-49be-4ba5-a203-8ef160211027
<lots of interesting analysis>
...
Output Files:
* Minidump: C:\Users\gankra\AppData\Local\Temp\socc-pair\dumps\b4f58e9f-49be-4ba5-a203-8ef160211027.dmp
* Socorro Processed Crash: C:\Users\gankra\AppData\Local\Temp\socc-pair\dumps\b4f58e9f-49be-4ba5-a203-8ef160211027.json
* Raw JSON: C:\Users\gankra\AppData\Local\Temp\socc-pair\dumps\b4f58e9f-49be-4ba5-a203-8ef160211027.raw.json
* Local minidump-stackwalk Output: C:\Users\gankra\AppData\Local\Temp\socc-pair\dumps\b4f58e9f-49be-4ba5-a203-8ef160211027.local.json
* Local minidump-stackwalk Logs: C:\Users\gankra\AppData\Local\Temp\socc-pair\dumps\b4f58e9f-49be-4ba5-a203-8ef160211027.log.txt
NOTE:
these files can contain protected user information. Although they are
written to your system's default "temp", I recommend deleting the temp
`socc-pair` directory regularly to ensure compliance with our protected data policies.Notably this includes all of the logging rust-minidump did (in the `.log.txt`), including
tracing for the backtracer's analysis, which can help you debug strange backtraces:
[TRACE] unwind: starting stack unwind
[TRACE] unwind: unwinding NtGetContextThread
[TRACE] unwind: trying cfi
[TRACE] unwind: found symbols for address, searching for cfi entries
[TRACE] unwind: trying STACK CFI exprs
[TRACE] unwind: .cfa: $rsp 8 + .ra: .cfa 8 - ^
[TRACE] unwind: .cfa: $rsp 8 +
[TRACE] unwind: STACK CFI parse successful
[TRACE] unwind: STACK CFI seems reasonable, evaluating
[TRACE] unwind: successfully evaluated .cfa (frame address)
[TRACE] unwind: successfully evaluated .ra (return address)
[TRACE] unwind: cfi evaluation was successful -- caller_ip: 0x000000ec00000000, caller_sp: 0x000000ec7fbfd790
[TRACE] unwind: cfi result seems valid
[TRACE] unwind: unwinding 1013612281855
[TRACE] unwind: trying cfi
[TRACE] unwind: trying frame pointer
[TRACE] unwind: trying scan
[TRACE] unwind: scan seems valid -- caller_ip: 0x7ffd172c2a24, caller_sp: 0xec7fbfd7f8
[TRACE] unwind: unwinding <unknown in ntdll.dll>
[TRACE] unwind: trying cfi
[TRACE] unwind: found symbols for address, searching for cfi entries
[TRACE] unwind: trying frame pointer
[TRACE] unwind: trying scan
[TRACE] unwind: scan seems valid -- caller_ip: 0x7ffd162b7034, caller_sp: 0xec7fbfd828
[TRACE] unwind: unwinding BaseThreadInitThunk
[TRACE] unwind: trying cfi
[TRACE] unwind: found symbols for address, searching for cfi entries
[TRACE] unwind: trying STACK CFI exprs
[TRACE] unwind: .cfa: $rsp 8 + .ra: .cfa 8 - ^
[TRACE] unwind: .cfa: $rsp 48 +
[TRACE] unwind: STACK CFI parse successful
[TRACE] unwind: STACK CFI seems reasonable, evaluating
[TRACE] unwind: successfully evaluated .cfa (frame address)
[TRACE] unwind: successfully evaluated .ra (return address)
[TRACE] unwind: cfi evaluation was successful -- caller_ip: 0x0000000000000000, caller_sp: 0x000000ec7fbfd858
[TRACE] unwind: cfi result seems valid
[TRACE] unwind: instruction pointer was nullish, assuming unwind complete
[TRACE] unwind: finished stack unwind
If
you are using minidump-stackwalk directly, you can get these same logs
with `--verbose=trace`. `--output-file=x/y/z` and `--log-file=x/y/z`
arguments are included to make it easier to pipe these streams to files.