Crashpad on Linux not uploading renderer process crashes


Marshall Greenblatt

Mar 15, 2022, 5:41:11 PM
to chromium-dev, jpe...@chromium.org
Hi All,

I'm in the process of migrating my Chromium-based application (at M100) from Breakpad to Crashpad on Linux. I currently have crash report upload working for crashes originating from the main and GPU processes, but I can't seem to get a report for the renderer process. When crashing the renderer process (via chrome://crash) I get the following console messages:

[WARNING:exception_snapshot_linux.cc(349)] thread ID 1 not found in process
[ERROR:process_snapshot_linux.cc(129)] thread not found 1

Are there any special configuration requirements related to Crashpad and the renderer process that I might be missing? The docs seem to suggest that there might be, but the details appear to be out of date.

Thanks,
Marshall

Marshall Greenblatt

Mar 17, 2022, 2:16:22 PM
to Joshua Peraza, chromium-dev, Joshua Peraza
On Tue, Mar 15, 2022 at 7:36 PM Joshua Peraza <jpe...@google.com> wrote:
When a crashing thread requests a dump from crashpad_handler, it includes its own thread ID (in its own namespace) and a pointer to its stack. Renderer processes run in a PID namespace, so "thread ID 1" is likely what the thread reported using gettid(). Crashpad knows about PID namespacing, though, and should use the stack pointer to find which thread owns that stack, allowing it to convert to a thread ID in crashpad_handler's namespace. It looks like that isn't happening here. The stack address isn't being passed through properly when using a broker, so adding it as a parameter to HandleExceptionWithBroker() might fix this. It is unexpected that Chromium would be using the broker on Linux (unless you've modified it to use StartHandlerAtCrash()?), though using it should generally be okay.

Thanks for the detailed explanation! I'm currently using CrashpadClient::StartHandler via crashpad::HandlerMain and copying the Linux-specific bits (like calling GetHandlerSocket and passing kCrashpadHandlerPid) from your headless Linux commit. This HandlerMain approach matches what we're currently using with Crashpad on Windows and macOS, but maybe it's not the correct approach for Linux?

As an additional data point, crash reporting from the renderer works if I run with "--no-sandbox". I'll try to implement the PID namespacing change that you mention and see if that helps.

Thanks,
Marshall

Marshall Greenblatt

Mar 17, 2022, 2:32:53 PM
to Joshua Peraza, chromium-dev, Joshua Peraza
Looks like it takes the kDirectPtrace code path. The code is calling FindThreadWithStackAddress and |local_requesting_thread_id| is -1 both with sandbox enabled and disabled.

Marshall Greenblatt

Mar 17, 2022, 4:20:29 PM
to Joshua Peraza, chromium-dev, Joshua Peraza
On Thu, Mar 17, 2022 at 3:01 PM Joshua Peraza <jpe...@google.com> wrote:
That sounds like either Crashpad isn't finding this thread in /proc/<pid>/task, or it isn't finding the right bounds for the stack. In FindThreadWithStackAddress(), is the count of threads correct for your process? Are any of the stacks close to the stack_address being searched for?
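
The thread-count check asked about here can also be done from outside the handler. The following is a minimal sketch in Python, assuming only that each thread of a Linux process appears as a numeric entry under /proc/<pid>/task. The helper name `list_thread_ids` and its `proc_root` parameter are hypothetical and are not part of Crashpad.

```python
import os


def list_thread_ids(pid, proc_root="/proc"):
    """Return the sorted thread IDs listed under <proc_root>/<pid>/task.

    Hypothetical helper for illustration only; not part of Crashpad.
    """
    task_dir = os.path.join(proc_root, str(pid), "task")
    # Each thread shows up as an entry named after its thread ID, as seen
    # from the PID namespace of the procfs mount being read.
    return sorted(int(name) for name in os.listdir(task_dir) if name.isdigit())
```

The result of `list_thread_ids(pid)` should match what `ls /proc/<pid>/task` prints for the same process, so its length can be compared against the thread count that FindThreadWithStackAddress iterates over.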

Below is the result of adding some printf statements in FindThreadWithStackAddress. The thread count appears to be correct (checked immediately before crashing). I ran it multiple times both with and without the sandbox, and the results are consistent on each run (the closest stack region is always off by about 63 MB).

$ ps -ax | grep cefclient | grep renderer
  2772 pts/0    Sl+    0:05 /home/marshall/code/chromium_git/chromium/src/out/Release_GN_x64/cefclient --type=renderer --user-data-dir=/home/marshall/.config/cef_user_data --crashpad-handler-pid=2679 --log-file=/home/marshall/code/chromium_git/chromium/src/out/Release_GN_x64/debug.log --lang=en-US --num-raster-threads=4 --enable-main-frame-before-activation --renderer-client-id=4 --launch-time-ticks=1074246880 --shared-files=v8_context_snapshot_data:100 --field-trial-handle=0,i,15066828721958674165,5913390433226292925,131072

$ ls /proc/2772/task
2772  2773  2774  2775  2777  2778  2779  2780  2781  2782  2783  2784

# Printf output:
find stack_address 139754817895280
tid 2772 stack_region_address 139756237480800 stack_region_size 40096 (diff 1419585520)
tid 2773 stack_region_address 139754754685968 stack_region_size 6896 (diff 63209312)
tid 2774 stack_region_address 139754746236224 stack_region_size 6592 (diff 71659056)
tid 2775 stack_region_address 139754706127888 stack_region_size 6896 (diff 111767392)
tid 2777 stack_region_address 139754674461776 stack_region_size 6832 (diff 143433504)
tid 2778 stack_region_address 139754652167408 stack_region_size 6672 (diff 165727872)
tid 2779 stack_region_address 139754632109456 stack_region_size 6512 (diff 185785824)
tid 2780 stack_region_address 139754612052176 stack_region_size 5680 (diff 205843104)
tid 2781 stack_region_address 139754603602128 stack_region_size 5680 (diff 214293152)
tid 2782 stack_region_address 139754595152080 stack_region_size 5680 (diff 222743200)
tid 2783 stack_region_address 139754586702032 stack_region_size 5680 (diff 231193248)
tid 2784 stack_region_address 139754555039952 stack_region_size 5680 (diff 262855328)
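
The numbers above can be checked mechanically. The following is a minimal sketch in Python, assuming a thread matches when the address falls inside [stack_region_address, stack_region_address + stack_region_size). This is a simplified model of the lookup, not Crashpad's actual FindThreadWithStackAddress implementation, and the function names are hypothetical. The values are copied from the printf output.

```python
# Values copied from the printf output in this thread.
STACK_ADDRESS = 139754817895280

# (tid, stack_region_address, stack_region_size)
REGIONS = [
    (2772, 139756237480800, 40096),
    (2773, 139754754685968, 6896),
    (2774, 139754746236224, 6592),
    (2775, 139754706127888, 6896),
    (2777, 139754674461776, 6832),
    (2778, 139754652167408, 6672),
    (2779, 139754632109456, 6512),
    (2780, 139754612052176, 5680),
    (2781, 139754603602128, 5680),
    (2782, 139754595152080, 5680),
    (2783, 139754586702032, 5680),
    (2784, 139754555039952, 5680),
]


def find_thread_with_stack_address(stack_address, regions):
    """Simplified model: return the tid whose region contains the address, else -1."""
    for tid, start, size in regions:
        if start <= stack_address < start + size:
            return tid
    return -1


def closest_region(stack_address, regions):
    """Return (tid, distance) for the region whose start is nearest the address."""
    return min(
        ((tid, abs(start - stack_address)) for tid, start, _ in regions),
        key=lambda pair: pair[1],
    )


match = find_thread_with_stack_address(STACK_ADDRESS, REGIONS)
nearest_tid, nearest_diff = closest_region(STACK_ADDRESS, REGIONS)
print(match)
print(nearest_tid, nearest_diff)
```

With these values no region contains the address, which is consistent with the reported |local_requesting_thread_id| of -1. The nearest region start belongs to tid 2773, at a distance of 63209312 bytes.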

Marshall Greenblatt

Mar 17, 2022, 4:33:40 PM
to Joshua Peraza, chromium-dev, Joshua Peraza

I should mention that this is an is_asan=true build, in case that changes the expected behavior. Also, I suspect that there’s a different issue causing the crash submission failure with the sandboxed renderer, since the submission with sandbox disabled still works despite showing this FindThreadWithStackAddress behavior.

Joshua Peraza

Mar 18, 2022, 5:40:09 PM
to Marshall Greenblatt, chromium-dev, Joshua Peraza
When a crashing thread requests a dump from crashpad_handler, it includes its own thread ID (in its own namespace) and a pointer to its stack. Renderer processes run in a PID namespace, so "thread ID 1" is likely what the thread reported using gettid(). Crashpad knows about PID namespacing, though, and should use the stack pointer to find which thread owns that stack, allowing it to convert to a thread ID in crashpad_handler's namespace. It looks like that isn't happening here. The stack address isn't being passed through properly when using a broker, so adding it as a parameter to HandleExceptionWithBroker() might fix this. It is unexpected that Chromium would be using the broker on Linux (unless you've modified it to use StartHandlerAtCrash()?), though using it should generally be okay.

Joshua Peraza

Mar 18, 2022, 5:42:16 PM
to Marshall Greenblatt, chromium-dev, Joshua Peraza
That sounds like either Crashpad isn't finding this thread in /proc/<pid>/task, or it isn't finding the right bounds for the stack. In FindThreadWithStackAddress(), is the count of threads correct for your process? Are any of the stacks close to the stack_address being searched for?

 