Arachne: Core-Aware Thread Management


Barret Rhoden

Jun 20, 2019, 10:37:56 AM
to akaros-list
Cooperative M:N threading system, built on Linux, using cpusets.

Haven't read it fully, but I noticed we got mentioned in their related
work. Looks like an afterthought, TBH.

Arachne is a new user-level implementation of threads that
provides both low latency and high throughput for applications
with extremely short-lived threads (only a few microseconds).
Arachne is core-aware: each application determines how many
cores it needs, based on its load; it always knows exactly
which cores it has been allocated, and it controls the
placement of its threads on those cores. A central core arbiter
allocates cores between applications. Adding Arachne to
memcached improved SLO-compliant throughput by 37%, reduced
tail latency by more than 10x, and allowed memcached to coexist
with background applications with almost no performance impact.
Adding Arachne to the RAMCloud storage system increased its
write throughput by more than 2.5x. The Arachne threading
library is optimized to minimize cache misses; it can initiate
a new user thread on a different core (with load balancing) in
320 ns. Arachne is implemented entirely at user level on Linux;
no kernel modifications are needed.
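For a sense of how cross-core thread creation could be that cheap: the paper appears to describe each core as owning a fixed set of thread contexts plus one 64-bit word that packs an occupancy bitmask (56 slots) with an 8-bit count, and creation samples two random cores and takes the less loaded one ("power of two choices"). Below is a minimal C11 sketch under those assumptions. The names (core_state, claim_slot, pick_core) and the xorshift PRNG are hypothetical, not Arachne's, and it omits the actual context setup and wakeup.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Assumed layout: low 56 bits = occupied-slot mask, high 8 bits = count. */
#define SLOTS_PER_CORE 56
#define COUNT_SHIFT 56
#define MASK_BITS ((UINT64_C(1) << SLOTS_PER_CORE) - 1)

struct core_state {
    _Atomic uint64_t mask_and_count;   /* hypothetical per-core word */
};

static unsigned occupied_count(uint64_t word)
{
    return (unsigned)(word >> COUNT_SHIFT);
}

/* xorshift32 PRNG; the state must be seeded nonzero. */
static uint32_t xorshift32(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    return *state = x;
}

/* Of two candidate cores, return the one with fewer occupied slots. */
static int choose_less_loaded(struct core_state *cores, int a, int b)
{
    unsigned ca = occupied_count(atomic_load(&cores[a].mask_and_count));
    unsigned cb = occupied_count(atomic_load(&cores[b].mask_and_count));
    return cb < ca ? b : a;
}

/* Power of two choices: sample two cores at random, keep the lighter one. */
static int pick_core(struct core_state *cores, int ncores, uint32_t *rng)
{
    assert(ncores > 0);
    int a = (int)(xorshift32(rng) % (uint32_t)ncores);
    int b = (int)(xorshift32(rng) % (uint32_t)ncores);
    return choose_less_loaded(cores, a, b);
}

/* Claim the lowest free slot on a core with a CAS loop; -1 if the core is full. */
static int claim_slot(struct core_state *core)
{
    uint64_t old = atomic_load(&core->mask_and_count);
    for (;;) {
        uint64_t mask = old & MASK_BITS;
        if (mask == MASK_BITS)
            return -1;                          /* all 56 contexts in use */
        int slot = 0;
        while (mask & (UINT64_C(1) << slot))    /* find lowest clear bit */
            slot++;
        uint64_t updated = (mask | (UINT64_C(1) << slot)) |
                           ((uint64_t)(occupied_count(old) + 1) << COUNT_SHIFT);
        if (atomic_compare_exchange_weak(&core->mask_and_count, &old, updated))
            return slot;
        /* CAS failed: `old` was refreshed with the current value; retry. */
    }
}
```

Claiming a slot touches a single per-core word, so there is no shared queue or lock on the creation path.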

One interesting question is whether we could build a 2LS that uses the
brains of their scheduler. Specifically:

Arachne contains mechanisms to estimate the number of cores
needed by an application as it runs.
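The quote doesn't say how the estimate is made, so here is a hypothetical estimator in the same spirit: grow when threads pile up (average runnable threads per core above a threshold), and shrink when measured utilization would fit on one fewer core. The thresholds (1.5 and 0.6), the core_sample fields, and estimate_cores are made up for illustration and are not taken from Arachne.

```c
#include <assert.h>

/* Hypothetical tuning knobs; Arachne's real thresholds may differ. */
#define LOAD_FACTOR_UP 1.5   /* avg runnable threads per core before growing */
#define UTIL_DOWN      0.6   /* per-core utilization target after shrinking */

struct core_sample {
    double busy_fraction;    /* fraction of the interval spent running threads */
    double avg_runnable;     /* average number of runnable threads on this core */
};

/* Return how many cores to request for the next interval. */
static int estimate_cores(const struct core_sample *s, int ncores,
                          int min_cores, int max_cores)
{
    assert(ncores >= 1 && min_cores <= max_cores);
    double total_util = 0.0, total_runnable = 0.0;
    for (int i = 0; i < ncores; i++) {
        total_util += s[i].busy_fraction;
        total_runnable += s[i].avg_runnable;
    }
    int want = ncores;
    if (total_runnable / ncores > LOAD_FACTOR_UP)
        want = ncores + 1;                 /* threads are queuing up: grow */
    else if (ncores > 1 && total_util < UTIL_DOWN * (ncores - 1))
        want = ncores - 1;                 /* load fits on one fewer core: shrink */
    if (want < min_cores)
        want = min_cores;
    if (want > max_cores)
        want = max_cores;
    return want;
}
```

Using different signals for growing and shrinking gives some hysteresis, so the estimate doesn't flap between two core counts.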

Arachne allows each application to define a core policy,
which determines at runtime how many cores the application
needs and how threads are placed on the available cores.
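A 2LS that borrowed this would need some hook for the policy. One hypothetical shape is a table of callbacks: one answers "how many cores do I want?" and the other answers "which granted core does this new thread go on?" The example policy (core 0 reserved for an exclusive thread class, round-robin over the rest) and all names here are illustrative, not Arachne's API.

```c
#include <assert.h>

/* Hypothetical policy interface: callbacks the runtime would consult. */
struct core_policy {
    void *ctx;
    /* How many cores should the app ask the arbiter for right now? */
    int (*cores_wanted)(void *ctx, int runnable_threads);
    /* Which of the `ncores` granted cores gets a new thread of `thread_class`? */
    int (*place_thread)(void *ctx, int thread_class, int ncores);
};

enum { CLASS_NORMAL = 0, CLASS_EXCLUSIVE = 1 };

struct split_policy_state {
    int next_shared;        /* round-robin cursor over the shared cores */
    int threads_per_core;   /* target runnable threads per shared core */
};

/* One exclusive core plus enough shared cores (at least one) for the load. */
static int split_cores_wanted(void *ctx, int runnable)
{
    struct split_policy_state *st = ctx;
    int shared = (runnable + st->threads_per_core - 1) / st->threads_per_core;
    return 1 + (shared > 0 ? shared : 1);
}

/* Exclusive threads go on core 0; everyone else round-robins over 1..ncores-1. */
static int split_place_thread(void *ctx, int thread_class, int ncores)
{
    struct split_policy_state *st = ctx;
    assert(ncores >= 2);
    if (thread_class == CLASS_EXCLUSIVE)
        return 0;
    int core = 1 + st->next_shared % (ncores - 1);
    st->next_shared++;
    return core;
}
```

Keeping the policy behind function pointers means the 2LS core (dispatch, creation) doesn't change when the policy does.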

The Arachne runtime was designed to minimize cache misses. It
uses a novel representation of scheduling information with no
ready queues, which enables low-latency and scalable mechanisms
for thread creation, scheduling, and synchronization.
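Without ready queues, a dispatcher can instead scan the core's fixed set of contexts for one whose wakeup time has passed, and waking a thread is just a store into its context. A single-threaded sketch of that idea, assuming a per-context wakeup_time field and a BLOCKED sentinel (both hypothetical names); a real version would make wakeup_time atomic, since other cores write it.

```c
#include <assert.h>
#include <stdint.h>

#define NCTX 56
#define BLOCKED UINT64_MAX   /* hypothetical sentinel: waiting on something */

struct context {
    uint64_t wakeup_time;    /* runnable once `now` reaches this value */
};

struct core_sched {
    uint64_t occupied;       /* bit i set => context i holds a live thread */
    struct context ctx[NCTX];
    int cursor;              /* next scan starts here, for round-robin fairness */
};

/* Return a runnable context index, or -1 if none; no queue is touched. */
static int next_runnable(struct core_sched *cs, uint64_t now)
{
    for (int n = 0; n < NCTX; n++) {
        int i = (cs->cursor + n) % NCTX;
        if (!(cs->occupied & (UINT64_C(1) << i)))
            continue;                       /* empty slot */
        if (cs->ctx[i].wakeup_time <= now) {
            cs->cursor = (i + 1) % NCTX;
            return i;
        }
    }
    return -1;
}

/* Waking a thread is a single store to its context, not a queue insertion. */
static void wake(struct core_sched *cs, int i)
{
    assert(i >= 0 && i < NCTX);
    cs->ctx[i].wakeup_time = 0;
}
```

The scan is bounded by the number of contexts per core, and the waker never needs a lock shared with the dispatcher.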

As far as core preemption goes, it looks like they take a "2LS yields
when told to" approach, and they have a similar "CG/LL" core
partitioning split (i.e. Core 0 for daemons and background stuff). They
don't do user-level preemption at all; threads are expected to
"occasionally yield."
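The "yields when told to" scheme can be as simple as a flag in memory shared with the arbiter, which the runtime checks at each yield point. A minimal sketch, assuming one hypothetical core_grant record per granted core; the thread-count bookkeeping stands in for actually migrating threads off the core.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-core record shared between the arbiter and the app. */
struct core_grant {
    atomic_bool release_requested;   /* arbiter -> app: give this core back */
    atomic_bool released;            /* app -> arbiter: core is free now */
};

/* Arbiter side: request the core back; just a flag, no signal or IPI. */
static void arbiter_request_release(struct core_grant *g)
{
    atomic_store_explicit(&g->release_requested, true, memory_order_release);
}

/* App side: called at every yield point; returns true if the core was given up. */
static bool yield_point(struct core_grant *g, int *threads_on_core,
                        int *threads_elsewhere)
{
    if (!atomic_load_explicit(&g->release_requested, memory_order_acquire))
        return false;                        /* keep running here */
    *threads_elsewhere += *threads_on_core;  /* stand-in for migrating threads */
    *threads_on_core = 0;
    atomic_store_explicit(&g->release_requested, false, memory_order_relaxed);
    atomic_store_explicit(&g->released, true, memory_order_release);
    return true;
}
```

Nothing in this sketch forces a thread that never reaches a yield point to give the core back, which is the flip side of skipping user-level preemption.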

They don't do anything about page faults or blocking syscalls; the
answer is to use a lot of RAM and async I/O. Pairing "here are a lot of
user threads" with "but also use async I/O" is a little odd to me. But
then again, I've been drinking the uthread kool-aid for a while now.

