https://bugzilla.kernel.org/show_bug.cgi?id=209219
Bug ID: 209219
Summary: KSHAKER: scheduling/execution timing perturbations
Product: Memory Management
Version: 2.5
Kernel Version: ALL
Hardware: All
OS: Linux
Tree: Mainline
Status: NEW
Severity: enhancement
Priority: P1
Component: Sanitizers
Assignee: mm_san...@kernel-bugs.kernel.org
Reporter: dvy...@google.com
CC: kasa...@googlegroups.com
Regression: No
Two recent bug examples:
https://lore.kernel.org/netdev/20200908104025.4...@google.com/
https://lore.kernel.org/netdev/20200908145911.4...@google.com/
In both cases the race window is extremely narrow. In the first case, I suspect
it's not just the race window: the typical scheduling of events is also such
that the use-after-free (UAF) does not happen. Namely, here:
- ieee802154_xmit_complete(&local->hw, skb, false);
-
 dev->stats.tx_packets++;
 dev->stats.tx_bytes += skb->len;
+ ieee802154_xmit_complete(&local->hw, skb, false);
+
The dev is _usually_ not freed by the call to ieee802154_xmit_complete(). But
the bug is very straightforward (we literally free the object and then use it),
and it went unnoticed from its introduction in 2014(!).
The other one was present in the initial WireGuard implementation and likewise
went unnoticed until now.
There are surely many more examples like this: most of the bugs that happen only
a few times and don't have reproducers.
The proposal is to introduce artificial random delays into execution and/or
some atypical scheduler perturbations. There are sound approaches for
systematically enumerating all possible executions (or specific subsets of
executions), but that's probably not feasible for the kernel. Some random
(maybe somewhat intelligently random) perturbations should be good enough for
starters.
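A minimal user-space sketch of the random-delay idea, assuming a per-thread PRNG and sleep-based delays. The names shaker_maybe_delay(), shaker_probability_pct and shaker_max_delay_us are hypothetical, not existing kernel interfaces. Calling such a hook at selected points stretches otherwise narrow race windows in a fraction of executions instead of slowing down every run.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Hypothetical tunables (not real kernel parameters). */
static unsigned shaker_probability_pct = 10; /* chance of delaying, in percent */
static unsigned shaker_max_delay_us = 100;   /* upper bound for a single delay */

/* Per-thread xorshift32 state, so threads do not contend on a shared PRNG. */
static _Thread_local uint32_t shaker_rng = 2463534242u;

static uint32_t shaker_next(void)
{
	uint32_t x = shaker_rng;

	x ^= x << 13;
	x ^= x >> 17;
	x ^= x << 5;
	shaker_rng = x;
	return x;
}

/*
 * Hypothetical perturbation hook: with probability shaker_probability_pct,
 * sleep for a random 1..shaker_max_delay_us microseconds.
 * Returns the injected delay in microseconds, or 0 if no delay was injected.
 */
static unsigned shaker_maybe_delay(void)
{
	if (shaker_max_delay_us == 0 ||
	    shaker_next() % 100 >= shaker_probability_pct)
		return 0;

	unsigned us = 1 + shaker_next() % shaker_max_delay_us;
	struct timespec ts = {
		.tv_sec = us / 1000000,
		.tv_nsec = (long)(us % 1000000) * 1000,
	};

	nanosleep(&ts, NULL);
	return us;
}
```

A kernel implementation would need a kernel-side delay mechanism and randomness source instead of nanosleep() and this user-space PRNG; the sketch only shows the "delay rarely, by a random amount" policy.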
For race-free programs it's enough to introduce delays only before
synchronization actions (atomic/lock operations): delays between purely local
actions can't lead to observable behavioral differences. The kernel is not
race-free, so it does not have this nice property. But we probably still want
to start by introducing delays only before synchronization actions, since that
is still a good oracle.
We already have some instrumentation hooks in atomic ops. I'm not sure about
locks (maybe something like might_sleep() would do?).
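A user-space sketch of the "delay only before synchronization actions" policy, assuming hooks can be placed in wrappers around atomic and lock operations. The names shaker_before_sync(), shaken_fetch_add(), shaken_mutex_lock() and worker() are hypothetical. Local arithmetic in worker() is left uninstrumented; only the atomic add and the mutex acquisition call the hook.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical per-thread PRNG state for deciding when to perturb. */
static _Thread_local uint32_t rng = 88172645u;
/* Counts how many synchronization points passed through the hook. */
static atomic_ulong shaker_points;

/* Hypothetical hook: called right before every synchronization action. */
static void shaker_before_sync(void)
{
	atomic_fetch_add_explicit(&shaker_points, 1, memory_order_relaxed);
	rng ^= rng << 13;
	rng ^= rng >> 17;
	rng ^= rng << 5;
	if (rng % 8 == 0) { /* perturb roughly 1 in 8 sync points */
		struct timespec ts = { 0, (long)(rng % 50) * 1000 }; /* < 50us */
		nanosleep(&ts, NULL);
	}
}

/* Instrumented atomic: perturb, then perform the real operation. */
static long shaken_fetch_add(atomic_long *v, long d)
{
	shaker_before_sync();
	return atomic_fetch_add(v, d);
}

/* Instrumented lock acquisition: perturb, then take the mutex. */
static int shaken_mutex_lock(pthread_mutex_t *m)
{
	shaker_before_sync();
	return pthread_mutex_lock(m);
}

static atomic_long counter;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static long protected_total;

static void *worker(void *arg)
{
	long local = 0;

	for (int i = 0; i < 1000; i++) {
		local += i;                    /* local action: no hook */
		shaken_fetch_add(&counter, 1); /* sync action: hooked */
	}
	shaken_mutex_lock(&lock);              /* sync action: hooked */
	protected_total += local;
	pthread_mutex_unlock(&lock);
	return arg;
}
```

Because this example is race-free, its results do not depend on which delays fire: each thread passes 1001 hooked points and the totals stay the same. In racy code, the same hooks would shift which interleavings get exercised.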
This should be useful for any automated or manual testing and fuzzing.
The proposed name: KSHAKER.