How to kill a thread in another process

Adam Barth

Jun 25, 2021, 6:55:05 PM
to zircon-dev
# Background

Starnix is a Fuchsia program that lets us run unmodified Linux binaries on Fuchsia.  Currently, Starnix works by creating a child process and loading the Linux executable into that child process.  Whenever the Linux binary wants to create a thread, Starnix uses zx_thread_create to create a Zircon thread remotely in the child process.

When the Linux binary is done with the thread, the child process issues the exit() Linux syscall, which generates a BAD_SYSCALL exception that Starnix catches.  Starnix updates its internal state to reflect the death of the thread, but currently Starnix has no way to actually destroy the underlying Zircon thread in the child process.

Previously, the zx_task_kill Zircon syscall could do that work, but RFC-0007 removed support for killing threads with zx_task_kill.  We also cannot use zx_thread_exit, which is how normal Fuchsia threads destroy themselves, because that syscall can only be issued from the thread being destroyed and the child process does not know how to issue Zircon syscalls (and does not have the requisite capabilities anyway).

# Problem statement

How can Starnix kill the underlying Zircon thread in the child process?

# Proposal

We could add a ZX_EXCEPTION_STATE_FATAL disposition to Zircon exceptions.  When Starnix catches the BAD_SYSCALL exception for exit() from the child process, Starnix could set the disposition for the exception to ZX_EXCEPTION_STATE_FATAL (rather than ZX_EXCEPTION_STATE_TRY_NEXT or ZX_EXCEPTION_STATE_HANDLED).

When pumping the exception handler state machine, Zircon could recognize the ZX_EXCEPTION_STATE_FATAL disposition and kill the thread rather than continue propagating the exception.
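The kernel-side change can be modeled as a loop over the dispositions returned by successive handlers. This is a simplified sketch, not Zircon's implementation: the enums and `pump_exception` are hypothetical names, and the final fallback assumes Zircon's existing behavior of terminating the process when no handler resolves an exception.

```c
#include <assert.h>
#include <stddef.h>

typedef enum {
  MODEL_STATE_TRY_NEXT,
  MODEL_STATE_HANDLED,
  MODEL_STATE_FATAL, /* proposed disposition */
} model_exception_state;

typedef enum {
  OUTCOME_RESUME_THREAD,
  OUTCOME_KILL_THREAD,
  OUTCOME_KILL_PROCESS,
} model_outcome;

/* Walk the dispositions chosen by each handler, in the order the handlers
 * are consulted, and return what happens to the faulting thread. */
static model_outcome pump_exception(const model_exception_state *dispositions,
                                    size_t n) {
  assert(dispositions != NULL || n == 0);
  for (size_t i = 0; i < n; i++) {
    switch (dispositions[i]) {
      case MODEL_STATE_HANDLED:
        return OUTCOME_RESUME_THREAD;
      case MODEL_STATE_FATAL:
        /* New behavior: stop propagating and kill only this thread. */
        return OUTCOME_KILL_THREAD;
      case MODEL_STATE_TRY_NEXT:
        break; /* offer the exception to the next handler */
    }
  }
  /* No handler resolved the exception. */
  return OUTCOME_KILL_PROCESS;
}
```

For example, if earlier handlers pass and Starnix then returns FATAL, the outcome is OUTCOME_KILL_THREAD and no later handler sees the exception.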

# Prototype


With this prototype, Starnix is able to use ZX_EXCEPTION_STATE_FATAL to successfully pass the PipeTest.WriterSideCloses test case, which is a particular test case in a Linux binary that (incidentally) creates and destroys a number of threads.

# Alternative

We could make zx_task_kill work on threads again, potentially only when the thread is suspended or with some other restriction to avoid the pitfalls discussed in RFC-0007.

Thoughts?
Adam

Venkatesh Srinivas

Jun 29, 2021, 10:00:13 PM
to zircon-dev, Adam Barth
On Friday, June 25, 2021 at 3:55:05 PM UTC-7 Adam Barth wrote:

> # Alternative
>
> We could make zx_task_kill work on threads again, potentially only when the thread is suspended or with some other restriction to avoid the pitfalls discussed in RFC-0007.

To repeat from RFC-0007, the concerns with asynchronous thread killing are:
```
* Locks can be left acquired, including global locks like ones controlling the heap.
* Memory can be leaked. At the very least the thread stack, but often many other pieces.
* Runtime left in an inconsistent state. This is at least true for the C and Go runtime.
* Killing a thread on its way to a syscall leaves the process in an unknown state. Kernel is
  fine but the process does not have a way to know what happened and what did not happen.
* Defeats RAII wrappers and automatic cleanup. In fact, it defeats most guarantees from the high
  level languages Fuchsia uses.
```

Restoring thread killing and restricting it to only suspended threads still runs into
many of the problems above; suspending a target thread is not cooperative with the
thread's runtime, so RAII wrappers, automatic cleanup, and high-level language guarantees
could not be maintained. The only problems of fully asynchronous thread killing that this
restriction avoids are those involving unknown kernel state.
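To make the first concern concrete, here is a deterministic toy model (no real threads or Zircon calls; every name is hypothetical) of a thread that is suspended and then killed between acquiring and releasing a lock. Requiring suspension makes the kill well-defined from the kernel's point of view, but the lock is still left held because the thread's own release step never runs.

```c
#include <assert.h>
#include <stdbool.h>

/* A toy lock and a toy thread that runs a critical section step by step. */
typedef struct {
  bool held;
} toy_lock;

typedef struct {
  int pc;         /* next step: 0 = acquire, 1 = work, 2 = release, 3 = done */
  bool suspended;
  bool alive;
} toy_thread;

/* Run one step of the thread's critical section. */
static void step(toy_thread *t, toy_lock *lock) {
  assert(t->alive && !t->suspended);
  switch (t->pc) {
    case 0: lock->held = true; break;  /* acquire */
    case 1: break;                     /* work while holding the lock */
    case 2: lock->held = false; break; /* release */
    default: return;                   /* already finished */
  }
  t->pc++;
}

/* Hypothetical restricted kill: only permitted on a suspended thread. */
static bool kill_if_suspended(toy_thread *t) {
  if (!t->suspended) {
    return false;
  }
  t->alive = false; /* the thread never reaches its release step */
  return true;
}
```

Running one step, suspending the thread, and then killing it leaves `lock.held` true with no thread left to release it.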

Thanks,
-- vs;

Roland McGrath

Jun 29, 2021, 10:03:22 PM
to Adam Barth, zircon-dev
The rationales stated in RFC-0007 are all about not having a good reason to want it, rather than about it being problematic to support from an implementation perspective. So perhaps this use case is reason enough to reconsider that, while still strongly discouraging its use outside specialized circumstances like Starnix, for all the reasons given there. But if there is another natural API for this use case, rather than a resurrection of the general thread-killing operation, then that avoids any hint of reopening the can of worms that RFC-0007 was intended to close.

I'll also point out, regarding zx_thread_exit as a potential path, that it is one of the few system calls that doesn't require any capability. So not wanting capabilities such as the thread or process handle in the emulation process itself does not by itself rule out using zx_thread_exit. However, it is likely desirable that an emulation process not have the vDSO in its address space at all, which is a good reason not to try to use a Zircon system call there.

If the entire emulation scheme is already based on the thread exception path then using an exception handler's result as the means for this seems somewhat natural.

I would recommend that the new exception disposition be called something more precise, such as ZX_EXCEPTION_STATE_THREAD_EXIT. A term like "fatal" is ambiguous as to whether it applies to the thread or the process, whether it is considered "normal" or "abnormal", etc. I think it's better to avoid introducing such a term and then defining it, when we can instead refer directly to the existing Zircon API operation that corresponds to the behavior being specified.


Adam Barth

Jun 29, 2021, 10:07:52 PM
to Roland McGrath, zircon-dev
Yes, ZX_EXCEPTION_STATE_THREAD_EXIT is a better name, thanks for the suggestion!

It turns out dispatch_user_exception runs on the thread that generates the exception, which means I can do the equivalent of zx_thread_exit from there after the exception is marked ZX_EXCEPTION_STATE_THREAD_EXIT. That looks a bit cleaner than what's in the prototype CL.
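Because dispatch runs on the faulting thread, the new disposition becomes a self-exit rather than a cross-thread kill. The following is a toy C model of that control flow, not the Zircon kernel code: `dispatch_user_exception_model`, `exit_current_thread`, and the enums are hypothetical simplifications, and the handler's answer is passed in directly instead of being waited on.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { HANDLER_RESUME, HANDLER_THREAD_EXIT } handler_result;
typedef enum { THREAD_RUNNING, THREAD_IN_EXCEPTION, THREAD_DEAD } thread_state;
typedef struct {
  thread_state state;
} model_thread;

/* Stand-in for the self-exit path that zx_thread_exit uses: it only ever
 * acts on the calling thread, never on another thread. */
static void exit_current_thread(model_thread *self) {
  assert(self->state != THREAD_DEAD);
  self->state = THREAD_DEAD;
}

/* Model of exception dispatch executing on the faulting thread (`self`).
 * In the real kernel the thread would block here until a handler answers.
 * Returns true if the thread should return to user mode. */
static bool dispatch_user_exception_model(model_thread *self,
                                          handler_result answer) {
  assert(self->state == THREAD_RUNNING);
  self->state = THREAD_IN_EXCEPTION;
  if (answer == HANDLER_THREAD_EXIT) {
    exit_current_thread(self); /* self-exit: no cross-thread kill needed */
    return false;
  }
  self->state = THREAD_RUNNING;
  return true;
}
```

The design point the model captures is that the only thread ever destroyed is `self`, so the operation stays within the cooperative, self-directed exit that RFC-0007 kept.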

Adam
