[PATCH] x86, kasan: add KASAN checks to atomic operations


Dmitry Vyukov

Mar 6, 2017, 7:42:59 AM
to ak...@linux-foundation.org, arya...@virtuozzo.com, pet...@infradead.org, mi...@redhat.com, Dmitry Vyukov, kasa...@googlegroups.com, linu...@kvack.org, linux-...@vger.kernel.org
KASAN uses compiler instrumentation to intercept all memory accesses.
But it does not see memory accesses done in assembly code.
One notable user of assembly code is atomic operations. Frequently,
for example, an atomic reference decrement is the last access to an
object and a good candidate for a racy use-after-free.

Add manual KASAN checks to atomic operations.
Note: we need checks only before asm blocks and don't need them
in atomic functions composed of other atomic functions
(e.g. load-cmpxchg loops).

Signed-off-by: Dmitry Vyukov <dvy...@google.com>
Cc: Andrew Morton <ak...@linux-foundation.org>
Cc: Andrey Ryabinin <arya...@virtuozzo.com>
Cc: kasa...@googlegroups.com
Cc: linu...@kvack.org
Cc: linux-...@vger.kernel.org

---
Within a day it has found its first bug:

==================================================================
BUG: KASAN: use-after-free in atomic_dec_and_test
arch/x86/include/asm/atomic.h:123 [inline] at addr ffff880079c30158
BUG: KASAN: use-after-free in put_task_struct
include/linux/sched/task.h:93 [inline] at addr ffff880079c30158
BUG: KASAN: use-after-free in put_ctx+0xcf/0x110
kernel/events/core.c:1131 at addr ffff880079c30158
Write of size 4 by task syz-executor6/25698
CPU: 2 PID: 25698 Comm: syz-executor6 Not tainted 4.10.0+ #302
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:16 [inline]
dump_stack+0x2fb/0x3fd lib/dump_stack.c:52
kasan_object_err+0x1c/0x90 mm/kasan/report.c:166
print_address_description mm/kasan/report.c:208 [inline]
kasan_report_error mm/kasan/report.c:292 [inline]
kasan_report.part.2+0x1b0/0x460 mm/kasan/report.c:314
kasan_report+0x21/0x30 mm/kasan/report.c:301
check_memory_region_inline mm/kasan/kasan.c:326 [inline]
check_memory_region+0x139/0x190 mm/kasan/kasan.c:333
kasan_check_write+0x14/0x20 mm/kasan/kasan.c:344
atomic_dec_and_test arch/x86/include/asm/atomic.h:123 [inline]
put_task_struct include/linux/sched/task.h:93 [inline]
put_ctx+0xcf/0x110 kernel/events/core.c:1131
perf_event_release_kernel+0x3ad/0xc90 kernel/events/core.c:4322
perf_release+0x37/0x50 kernel/events/core.c:4338
__fput+0x332/0x800 fs/file_table.c:209
____fput+0x15/0x20 fs/file_table.c:245
task_work_run+0x197/0x260 kernel/task_work.c:116
exit_task_work include/linux/task_work.h:21 [inline]
do_exit+0xb38/0x29c0 kernel/exit.c:880
do_group_exit+0x149/0x420 kernel/exit.c:984
get_signal+0x7e0/0x1820 kernel/signal.c:2318
do_signal+0xd2/0x2190 arch/x86/kernel/signal.c:808
exit_to_usermode_loop+0x200/0x2a0 arch/x86/entry/common.c:157
syscall_return_slowpath arch/x86/entry/common.c:191 [inline]
do_syscall_64+0x6fc/0x930 arch/x86/entry/common.c:286
entry_SYSCALL64_slow_path+0x25/0x25
RIP: 0033:0x4458d9
RSP: 002b:00007f3f07187cf8 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
RAX: fffffffffffffe00 RBX: 00000000007080c8 RCX: 00000000004458d9
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00000000007080c8
RBP: 00000000007080a8 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 0000000000000000 R14: 00007f3f071889c0 R15: 00007f3f07188700
Object at ffff880079c30140, in cache task_struct size: 5376
Allocated:
PID = 25681
save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
save_stack+0x43/0xd0 mm/kasan/kasan.c:513
set_track mm/kasan/kasan.c:525 [inline]
kasan_kmalloc+0xaa/0xd0 mm/kasan/kasan.c:616
kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:555
kmem_cache_alloc_node+0x122/0x6f0 mm/slab.c:3662
alloc_task_struct_node kernel/fork.c:153 [inline]
dup_task_struct kernel/fork.c:495 [inline]
copy_process.part.38+0x19c8/0x4aa0 kernel/fork.c:1560
copy_process kernel/fork.c:1531 [inline]
_do_fork+0x200/0x1010 kernel/fork.c:1994
SYSC_clone kernel/fork.c:2104 [inline]
SyS_clone+0x37/0x50 kernel/fork.c:2098
do_syscall_64+0x2e8/0x930 arch/x86/entry/common.c:281
return_from_SYSCALL_64+0x0/0x7a
Freed:
PID = 25681
save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
save_stack+0x43/0xd0 mm/kasan/kasan.c:513
set_track mm/kasan/kasan.c:525 [inline]
kasan_slab_free+0x6f/0xb0 mm/kasan/kasan.c:589
__cache_free mm/slab.c:3514 [inline]
kmem_cache_free+0x71/0x240 mm/slab.c:3774
free_task_struct kernel/fork.c:158 [inline]
free_task+0x151/0x1d0 kernel/fork.c:370
copy_process.part.38+0x18e5/0x4aa0 kernel/fork.c:1931
copy_process kernel/fork.c:1531 [inline]
_do_fork+0x200/0x1010 kernel/fork.c:1994
SYSC_clone kernel/fork.c:2104 [inline]
SyS_clone+0x37/0x50 kernel/fork.c:2098
do_syscall_64+0x2e8/0x930 arch/x86/entry/common.c:281
return_from_SYSCALL_64+0x0/0x7a
---
arch/x86/include/asm/atomic.h | 11 +++++++++++
arch/x86/include/asm/atomic64_64.h | 10 ++++++++++
arch/x86/include/asm/cmpxchg.h | 4 ++++
3 files changed, 25 insertions(+)

diff --git a/arch/x86/include/asm/atomic.h b/arch/x86/include/asm/atomic.h
index 14635c5ea025..64f0a7fb9b2f 100644
--- a/arch/x86/include/asm/atomic.h
+++ b/arch/x86/include/asm/atomic.h
@@ -2,6 +2,7 @@
#define _ASM_X86_ATOMIC_H

#include <linux/compiler.h>
+#include <linux/kasan-checks.h>
#include <linux/types.h>
#include <asm/alternative.h>
#include <asm/cmpxchg.h>
@@ -47,6 +48,7 @@ static __always_inline void atomic_set(atomic_t *v, int i)
*/
static __always_inline void atomic_add(int i, atomic_t *v)
{
+ kasan_check_write(v, sizeof(*v));
asm volatile(LOCK_PREFIX "addl %1,%0"
: "+m" (v->counter)
: "ir" (i));
@@ -61,6 +63,7 @@ static __always_inline void atomic_add(int i, atomic_t *v)
*/
static __always_inline void atomic_sub(int i, atomic_t *v)
{
+ kasan_check_write(v, sizeof(*v));
asm volatile(LOCK_PREFIX "subl %1,%0"
: "+m" (v->counter)
: "ir" (i));
@@ -77,6 +80,7 @@ static __always_inline void atomic_sub(int i, atomic_t *v)
*/
static __always_inline bool atomic_sub_and_test(int i, atomic_t *v)
{
+ kasan_check_write(v, sizeof(*v));
GEN_BINARY_RMWcc(LOCK_PREFIX "subl", v->counter, "er", i, "%0", e);
}

@@ -88,6 +92,7 @@ static __always_inline bool atomic_sub_and_test(int i, atomic_t *v)
*/
static __always_inline void atomic_inc(atomic_t *v)
{
+ kasan_check_write(v, sizeof(*v));
asm volatile(LOCK_PREFIX "incl %0"
: "+m" (v->counter));
}
@@ -100,6 +105,7 @@ static __always_inline void atomic_inc(atomic_t *v)
*/
static __always_inline void atomic_dec(atomic_t *v)
{
+ kasan_check_write(v, sizeof(*v));
asm volatile(LOCK_PREFIX "decl %0"
: "+m" (v->counter));
}
@@ -114,6 +120,7 @@ static __always_inline void atomic_dec(atomic_t *v)
*/
static __always_inline bool atomic_dec_and_test(atomic_t *v)
{
+ kasan_check_write(v, sizeof(*v));
GEN_UNARY_RMWcc(LOCK_PREFIX "decl", v->counter, "%0", e);
}

@@ -127,6 +134,7 @@ static __always_inline bool atomic_dec_and_test(atomic_t *v)
*/
static __always_inline bool atomic_inc_and_test(atomic_t *v)
{
+ kasan_check_write(v, sizeof(*v));
GEN_UNARY_RMWcc(LOCK_PREFIX "incl", v->counter, "%0", e);
}

@@ -141,6 +149,7 @@ static __always_inline bool atomic_inc_and_test(atomic_t *v)
*/
static __always_inline bool atomic_add_negative(int i, atomic_t *v)
{
+ kasan_check_write(v, sizeof(*v));
GEN_BINARY_RMWcc(LOCK_PREFIX "addl", v->counter, "er", i, "%0", s);
}

@@ -194,6 +203,7 @@ static inline int atomic_xchg(atomic_t *v, int new)
#define ATOMIC_OP(op) \
static inline void atomic_##op(int i, atomic_t *v) \
{ \
+ kasan_check_write(v, sizeof(*v)); \
asm volatile(LOCK_PREFIX #op"l %1,%0" \
: "+m" (v->counter) \
: "ir" (i) \
@@ -258,6 +268,7 @@ static __always_inline int __atomic_add_unless(atomic_t *v, int a, int u)
*/
static __always_inline short int atomic_inc_short(short int *v)
{
+ kasan_check_write(v, sizeof(*v));
asm(LOCK_PREFIX "addw $1, %0" : "+m" (*v));
return *v;
}
diff --git a/arch/x86/include/asm/atomic64_64.h b/arch/x86/include/asm/atomic64_64.h
index 89ed2f6ae2f7..13fe8ff5a126 100644
--- a/arch/x86/include/asm/atomic64_64.h
+++ b/arch/x86/include/asm/atomic64_64.h
@@ -2,6 +2,7 @@
#define _ASM_X86_ATOMIC64_64_H

#include <linux/types.h>
+#include <linux/kasan-checks.h>
#include <asm/alternative.h>
#include <asm/cmpxchg.h>

@@ -42,6 +43,7 @@ static inline void atomic64_set(atomic64_t *v, long i)
*/
static __always_inline void atomic64_add(long i, atomic64_t *v)
{
+ kasan_check_write(v, sizeof(*v));
asm volatile(LOCK_PREFIX "addq %1,%0"
: "=m" (v->counter)
: "er" (i), "m" (v->counter));
@@ -56,6 +58,7 @@ static __always_inline void atomic64_add(long i, atomic64_t *v)
*/
static inline void atomic64_sub(long i, atomic64_t *v)
{
+ kasan_check_write(v, sizeof(*v));
asm volatile(LOCK_PREFIX "subq %1,%0"
: "=m" (v->counter)
: "er" (i), "m" (v->counter));
@@ -72,6 +75,7 @@ static inline void atomic64_sub(long i, atomic64_t *v)
*/
static inline bool atomic64_sub_and_test(long i, atomic64_t *v)
{
+ kasan_check_write(v, sizeof(*v));
GEN_BINARY_RMWcc(LOCK_PREFIX "subq", v->counter, "er", i, "%0", e);
}

@@ -83,6 +87,7 @@ static inline bool atomic64_sub_and_test(long i, atomic64_t *v)
*/
static __always_inline void atomic64_inc(atomic64_t *v)
{
+ kasan_check_write(v, sizeof(*v));
asm volatile(LOCK_PREFIX "incq %0"
: "=m" (v->counter)
: "m" (v->counter));
@@ -96,6 +101,7 @@ static __always_inline void atomic64_inc(atomic64_t *v)
*/
static __always_inline void atomic64_dec(atomic64_t *v)
{
+ kasan_check_write(v, sizeof(*v));
asm volatile(LOCK_PREFIX "decq %0"
: "=m" (v->counter)
: "m" (v->counter));
@@ -111,6 +117,7 @@ static __always_inline void atomic64_dec(atomic64_t *v)
*/
static inline bool atomic64_dec_and_test(atomic64_t *v)
{
+ kasan_check_write(v, sizeof(*v));
GEN_UNARY_RMWcc(LOCK_PREFIX "decq", v->counter, "%0", e);
}

@@ -124,6 +131,7 @@ static inline bool atomic64_dec_and_test(atomic64_t *v)
*/
static inline bool atomic64_inc_and_test(atomic64_t *v)
{
+ kasan_check_write(v, sizeof(*v));
GEN_UNARY_RMWcc(LOCK_PREFIX "incq", v->counter, "%0", e);
}

@@ -138,6 +146,7 @@ static inline bool atomic64_inc_and_test(atomic64_t *v)
*/
static inline bool atomic64_add_negative(long i, atomic64_t *v)
{
+ kasan_check_write(v, sizeof(*v));
GEN_BINARY_RMWcc(LOCK_PREFIX "addq", v->counter, "er", i, "%0", s);
}

@@ -233,6 +242,7 @@ static inline long atomic64_dec_if_positive(atomic64_t *v)
#define ATOMIC64_OP(op) \
static inline void atomic64_##op(long i, atomic64_t *v) \
{ \
+ kasan_check_write(v, sizeof(*v)); \
asm volatile(LOCK_PREFIX #op"q %1,%0" \
: "+m" (v->counter) \
: "er" (i) \
diff --git a/arch/x86/include/asm/cmpxchg.h b/arch/x86/include/asm/cmpxchg.h
index 97848cdfcb1a..a10e7fb09210 100644
--- a/arch/x86/include/asm/cmpxchg.h
+++ b/arch/x86/include/asm/cmpxchg.h
@@ -2,6 +2,7 @@
#define ASM_X86_CMPXCHG_H

#include <linux/compiler.h>
+#include <linux/kasan-checks.h>
#include <asm/cpufeatures.h>
#include <asm/alternative.h> /* Provides LOCK_PREFIX */

@@ -41,6 +42,7 @@ extern void __add_wrong_size(void)
#define __xchg_op(ptr, arg, op, lock) \
({ \
__typeof__ (*(ptr)) __ret = (arg); \
+ kasan_check_write((void *)(ptr), sizeof(*(ptr))); \
switch (sizeof(*(ptr))) { \
case __X86_CASE_B: \
asm volatile (lock #op "b %b0, %1\n" \
@@ -86,6 +88,7 @@ extern void __add_wrong_size(void)
__typeof__(*(ptr)) __ret; \
__typeof__(*(ptr)) __old = (old); \
__typeof__(*(ptr)) __new = (new); \
+ kasan_check_write((void *)(ptr), sizeof(*(ptr))); \
switch (size) { \
case __X86_CASE_B: \
{ \
@@ -171,6 +174,7 @@ extern void __add_wrong_size(void)
BUILD_BUG_ON(sizeof(*(p2)) != sizeof(long)); \
VM_BUG_ON((unsigned long)(p1) % (2 * sizeof(long))); \
VM_BUG_ON((unsigned long)((p1) + 1) != (unsigned long)(p2)); \
+ kasan_check_write((void *)(p1), 2 * sizeof(*(p1))); \
asm volatile(pfx "cmpxchg%c4b %2; sete %0" \
: "=a" (__ret), "+d" (__old2), \
"+m" (*(p1)), "+m" (*(p2)) \
--
2.12.0.rc1.440.g5b76565f74-goog

Dmitry Vyukov

Mar 6, 2017, 7:51:09 AM
to Andrew Morton, Andrey Ryabinin, Peter Zijlstra, Ingo Molnar, Dmitry Vyukov, kasan-dev, linu...@kvack.org, LKML
On Mon, Mar 6, 2017 at 1:42 PM, Dmitry Vyukov <dvy...@google.com> wrote:
> KASAN uses compiler instrumentation to intercept all memory accesses.
> But it does not see memory accesses done in assembly code.
> One notable user of assembly code is atomic operations. Frequently,
> for example, an atomic reference decrement is the last access to an
> object and a good candidate for a racy use-after-free.
>
> Add manual KASAN checks to atomic operations.
> Note: we need checks only before asm blocks and don't need them
> in atomic functions composed of other atomic functions
> (e.g. load-cmpxchg loops).

Peter also pointed me at arch/x86/include/asm/bitops.h. Will add them in v2.

Peter Zijlstra

Mar 6, 2017, 7:58:49 AM
to Dmitry Vyukov, Andrew Morton, Andrey Ryabinin, Ingo Molnar, kasan-dev, linu...@kvack.org, LKML
On Mon, Mar 06, 2017 at 01:50:47PM +0100, Dmitry Vyukov wrote:
> On Mon, Mar 6, 2017 at 1:42 PM, Dmitry Vyukov <dvy...@google.com> wrote:
> > KASAN uses compiler instrumentation to intercept all memory accesses.
> > But it does not see memory accesses done in assembly code.
> > One notable user of assembly code is atomic operations. Frequently,
> > for example, an atomic reference decrement is the last access to an
> > object and a good candidate for a racy use-after-free.
> >
> > Add manual KASAN checks to atomic operations.
> > Note: we need checks only before asm blocks and don't need them
> > in atomic functions composed of other atomic functions
> > (e.g. load-cmpxchg loops).
>
> Peter, also pointed me at arch/x86/include/asm/bitops.h. Will add them in v2.
>

> > static __always_inline void atomic_add(int i, atomic_t *v)
> > {
> > + kasan_check_write(v, sizeof(*v));
> > asm volatile(LOCK_PREFIX "addl %1,%0"
> > : "+m" (v->counter)
> > : "ir" (i));


So the problem is doing load/stores from asm bits, and GCC
(traditionally) doesn't try and interpret APP asm bits.

However, could we not write a GCC plugin that does exactly that?
Something that interprets the APP asm bits and generates these KASAN
bits that go with it?

Peter Zijlstra

Mar 6, 2017, 8:01:05 AM
to Dmitry Vyukov, Andrew Morton, Andrey Ryabinin, Ingo Molnar, kasan-dev, linu...@kvack.org, LKML
Another suspect is the per-cpu stuff, that's all asm foo as well.

Dmitry Vyukov

Mar 6, 2017, 9:24:44 AM
to Peter Zijlstra, Andrew Morton, Andrey Ryabinin, Ingo Molnar, kasan-dev, linu...@kvack.org, LKML, x...@kernel.org, Mark Rutland
+x86, Mark

Let me provide more context and design alternatives.

There are also other archs, at least arm64 for now.
There are also other tools. For KTSAN (race detector) we will
absolutely need to hook into atomic ops. For KMSAN (uses of unit
values) we also need to understand atomic ops at least to some degree.
Both of them will require different instrumentation.
For KASAN we are also more interested in cases where it's more likely
that an object is touched only by an asm, but not by normal memory
accesses (otherwise we would report the bug on the normal access,
which is fine, this makes atomic ops stand out in my opinion).

We could involve compiler (and by compiler I mean clang, because we
are not going to touch gcc, any volunteers?).
However, it's unclear if it will be simpler or not. There will
definitely be a problem with uaccess asm blocks. Currently KASAN
relies on the fact that it does not see uaccess accesses and the user
addresses are considered bad by KASAN. There can also be a problem
with offsets/sizes, it's not possible to figure out what exactly an
asm block touches, we can only assume that it directly dereferences
the passed pointer. However, for example, bitops touch the pointer
with an offset. Looking at the current x86 impl, we should be able to
handle it because the offset is computed outside of asm blocks. But
it's unclear if we hit this problem in other places.
I also see that arm64 bitops are implemented in .S files. And we won't
be able to instrument them in compiler.
There can also be other problems. Is it possible that some asm blocks
accept e.g. physical addresses? KASAN would consider them as bad.

We could also provide a parallel implementation of atomic ops based on
the new compiler builtins (__atomic_load_n and friends):
https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html
and enable it under KSAN. The nice thing about it is that it will
automatically support arm64 and KMSAN and KTSAN.
But it's more work.

Re per-cpu asm. I would say that it's less critical than atomic ops.
Static per-cpu slots are not subject to use-after-free. Dynamic slots
can be subject to use-after-free and it would be nice to catch bugs
there. However, I think we will need to add manual
poisoning/unpoisoning of dynamic slots as well.

Bottom line:
1. Involving compiler looks quite complex, hard to deploy, and it's
unclear if it will actually make things easier.
2. This patch is the simplest short-term option (I am leaning towards
adding bitops to this patch and leaving percpu out for now).
3. Providing an implementation of atomic ops based on compiler
builtins looks like a nice option for other archs and tools, but is
more work. If you consider this as a good solution, we can move
straight to this option.

Peter Zijlstra

Mar 6, 2017, 10:20:14 AM
to Dmitry Vyukov, Andrew Morton, Andrey Ryabinin, Ingo Molnar, kasan-dev, linu...@kvack.org, LKML, x...@kernel.org, Mark Rutland, Will Deacon
On Mon, Mar 06, 2017 at 03:24:23PM +0100, Dmitry Vyukov wrote:
> We could also provide a parallel implementation of atomic ops based on
> the new compiler builtins (__atomic_load_n and friends):
> https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html
> and enable it under KSAN. The nice thing about it is that it will
> automatically support arm64 and KMSAN and KTSAN.
> But it's more work.

There's a summary out there somewhere, I think Will knows, that explains
how the C/C++ memory model and the Linux Kernel Memory model differ and
how it's going to be 'interesting' to make using the C/C++ builtin crud
with the kernel 'correct'.

Peter Zijlstra

Mar 6, 2017, 10:33:29 AM
to Dmitry Vyukov, Andrew Morton, Andrey Ryabinin, Ingo Molnar, kasan-dev, linu...@kvack.org, LKML, x...@kernel.org, Mark Rutland
On Mon, Mar 06, 2017 at 03:24:23PM +0100, Dmitry Vyukov wrote:
> We could involve compiler (and by compiler I mean clang, because we
> are not going to touch gcc, any volunteers?).

FWIW, clang isn't even close to being a viable compiler for the kernel.
It lacks far too many features, _IF_ you can get it to compile in the
first place.

Mark Rutland

Mar 6, 2017, 11:04:15 AM
to Peter Zijlstra, Dmitry Vyukov, Andrew Morton, Andrey Ryabinin, Ingo Molnar, kasan-dev, linu...@kvack.org, LKML, x...@kernel.org, Will Deacon
Trivially, the C++ model doesn't feature I/O ordering [1]...

Otherwise Will pointed out a few details in [2].

Thanks,
Mark.

[1] https://lwn.net/Articles/698014/
[2] http://lwn.net/Articles/691295/

Mark Rutland

Mar 6, 2017, 11:20:30 AM
to Dmitry Vyukov, Peter Zijlstra, Andrew Morton, Andrey Ryabinin, Ingo Molnar, kasan-dev, linu...@kvack.org, LKML, x...@kernel.org, will....@arm.com
Hi,

[roping in Will, since he loves atomics]
Unfortunately, I think that manual annotation is the only way to handle
these (as we already do for the kernel part of the uaccess sequences), since
we hide things from the compiler or otherwise trick it into doing what
we want.

> +x86, Mark
>
> Let me provide more context and design alternatives.
>
> There are also other archs, at least arm64 for now.
> There are also other tools. For KTSAN (race detector) we will
> absolutely need to hook into atomic ops. For KMSAN (uses of unit
> values) we also need to understand atomic ops at least to some degree.
> Both of them will require different instrumentation.
> For KASAN we are also more interested in cases where it's more likely
> that an object is touched only by an asm, but not by normal memory
> accesses (otherwise we would report the bug on the normal access,
> which is fine, this makes atomic ops stand out in my opinion).
>
> We could involve compiler (and by compiler I mean clang, because we
> are not going to touch gcc, any volunteers?).

I don't think there's much you'll be able to do within the compiler,
assuming you mean to derive this from the asm block inputs and outputs.

Those can hide address generation (e.g. with per-cpu stuff), which the
compiler may erroneously detect as racing.

Those may also take fake inputs (e.g. the sp input to arm64's
__my_cpu_offset()) which may confuse matters.

Parsing the assembly itself will be *extremely* painful due to the way
that's set up for run-time patching.

> However, it's unclear if it will be simpler or not. There will
> definitely will be a problem with uaccess asm blocks. Currently KASAN
> relies of the fact that it does not see uaccess accesses and the user
> addresses are considered bad by KASAN. There can also be a problem
> with offsets/sizes, it's not possible to figure out what exactly an
> asm block touches, we can only assume that it directly dereferences
> the passed pointer. However, for example, bitops touch the pointer
> with offset. Looking at the current x86 impl, we should be able to
> handle it because the offset is computed outside of asm blocks. But
> it's unclear if we hit this problem in other places.

As above, I think you'd see more fun for the percpu stuff, since the
pointer passed into those is "fake", with a percpu pointer accessing
different addresses depending on the CPU it is executed on.

> I also see that arm64 bitops are implemented in .S files. And we won't
> be able to instrument them in compiler.
> There can also be other problems. Is it possible that some asm blocks
> accept e.g. physical addresses? KASAN would consider them as bad.

I'm not sure I follow what you mean here.

I can imagine physical addresses being passed into asm statements that
don't access memory (e.g. for setting up the base registers for page
tables).

> We could also provide a parallel implementation of atomic ops based on
> the new compiler builtins (__atomic_load_n and friends):
> https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html
> and enable it under KSAN. The nice thing about it is that it will
> automatically support arm64 and KMSAN and KTSAN.
> But it's more work.

These don't permit runtime patching, and there are some differences
between the C11 and Linux kernel memory models, so at least in the near
term, I don't imagine we'd be likely to use this.

> Re per-cpu asm. I would say that it's less critical than atomic ops.
> Static per-cpu slots are not subject to use-after-free. Dynamic slots
> can be subject to use-after-free and it would be nice to catch bugs
> there. However, I think we will need to add manual
> poisoning/unpoisoning of dynamic slots as well.
>
> Bottom line:
> 1. Involving compiler looks quite complex, hard to deploy, and it's
> unclear if it will actually make things easier.
> 2. This patch is the simplest short-term option (I am leaning towards
> adding bitops to this patch and leaving percpu out for now).
> 3. Providing an implementation of atomic ops based on compiler
> builtins looks like a nice option for other archs and tools, but is
> more work. If you consider this as a good solution, we can move
> straight to this option.

Having *only* seen the assembly snippet at the top of this mail, I can't
say whether this is the simplest implementation.

However, I do think that annotation of this sort is the only reasonable
way to handle this.

Thanks,
Mark.

Andrey Ryabinin

Mar 6, 2017, 11:46:55 AM
to Dmitry Vyukov, Peter Zijlstra, Andrew Morton, Ingo Molnar, kasan-dev, linu...@kvack.org, LKML, x...@kernel.org, Mark Rutland
On 03/06/2017 05:24 PM, Dmitry Vyukov wrote:

> Let me provide more context and design alternatives.
>
> There are also other archs, at least arm64 for now.
> There are also other tools. For KTSAN (race detector) we will
> absolutely need to hook into atomic ops. For KMSAN (uses of unit
> values) we also need to understand atomic ops at least to some degree.
> Both of them will require different instrumentation.
> For KASAN we are also more interested in cases where it's more likely
> that an object is touched only by an asm, but not by normal memory
> accesses (otherwise we would report the bug on the normal access,
> which is fine, this makes atomic ops stand out in my opinion).
>
> We could involve compiler (and by compiler I mean clang, because we
> are not going to touch gcc, any volunteers?).

We've tried this with gcc about 3 years ago. Here is the patch - https://gcc.gnu.org/ml/gcc-patches/2014-05/msg02447.html
The problem is that a memory block in an "m" constraint doesn't actually mean
that the inline asm will access it. It only means that the asm block *may* access that memory (or part of it).
This causes false positives. As I vaguely remember, I hit a false positive in FPU-related code.

This problem gave birth to another idea - add a new constraint to strictly mark the memory access
inside an asm block. See https://gcc.gnu.org/ml/gcc/2014-09/msg00237.html
But all ended with nothing.

Mark Rutland

Mar 6, 2017, 12:25:25 PM
to Dmitry Vyukov, Peter Zijlstra, Andrew Morton, Andrey Ryabinin, Ingo Molnar, kasan-dev, linu...@kvack.org, LKML, x...@kernel.org, Will Deacon
On Mon, Mar 06, 2017 at 05:27:44PM +0100, Dmitry Vyukov wrote:
> On Mon, Mar 6, 2017 at 5:20 PM, Mark Rutland <mark.r...@arm.com> wrote:
> > On Mon, Mar 06, 2017 at 03:24:23PM +0100, Dmitry Vyukov wrote:
> >> On Mon, Mar 6, 2017 at 2:01 PM, Peter Zijlstra <pet...@infradead.org> wrote:
> >> > On Mon, Mar 06, 2017 at 01:58:51PM +0100, Peter Zijlstra wrote:
> >> >> On Mon, Mar 06, 2017 at 01:50:47PM +0100, Dmitry Vyukov wrote:
> >> >> > On Mon, Mar 6, 2017 at 1:42 PM, Dmitry Vyukov <dvy...@google.com> wrote:
> >> >> > > KASAN uses compiler instrumentation to intercept all memory accesses.
> >> >> > > But it does not see memory accesses done in assembly code.
> >> >> > > One notable user of assembly code is atomic operations. Frequently,
> >> >> > > for example, an atomic reference decrement is the last access to an
> >> >> > > object and a good candidate for a racy use-after-free.
> >> >> > >
> >> >> > > Add manual KASAN checks to atomic operations.
> >> >> > > Note: we need checks only before asm blocks and don't need them
> >> >> > > in atomic functions composed of other atomic functions
> >> >> > > (e.g. load-cmpxchg loops).
> >> >> >
> >> >> > Peter, also pointed me at arch/x86/include/asm/bitops.h. Will add them in v2.
> >> >> >
> >> >>
> >> >> > > static __always_inline void atomic_add(int i, atomic_t *v)
> >> >> > > {
> >> >> > > + kasan_check_write(v, sizeof(*v));
> >> >> > > asm volatile(LOCK_PREFIX "addl %1,%0"
> >> >> > > : "+m" (v->counter)
> >> >> > > : "ir" (i));

> >> Bottom line:
> >> 1. Involving compiler looks quite complex, hard to deploy, and it's
> >> unclear if it will actually make things easier.
> >> 2. This patch is the simplest short-term option (I am leaning towards
> >> adding bitops to this patch and leaving percpu out for now).
> >> 3. Providing an implementation of atomic ops based on compiler
> >> builtins looks like a nice option for other archs and tools, but is
> >> more work. If you consider this as a good solution, we can move
> >> straight to this option.
> >
> > Having *only* seen the assembly snippet at the top of this mail, I can't
> > say whether this is the simplest implementation.
> >
> > However, I do think that annotation of this sort is the only reasonable
> > way to handle this.
>
> Here is the whole patch:
> https://groups.google.com/d/msg/kasan-dev/3sNHjjb4GCI/X76pwg_tAwAJ

I see.

Given we'd have to instrument each architecture's atomics in an
identical fashion, maybe we should follow the example of spinlocks, and
add an arch_ prefix to the arch-specific implementation, and place the
instrumentation in a common wrapper.

i.e. have something like:

static __always_inline void atomic_inc(atomic_t *v)
{
kasan_check_write(v, sizeof(*v));
arch_atomic_inc(v);
}

... in asm-generic somewhere.

It's more churn initially, but it should be a saving overall, and I
imagine for KMSAN or other things we may want more instrumentation
anyway...

Thanks,
Mark.

Peter Zijlstra

Mar 6, 2017, 3:34:59 PM
to Mark Rutland, Dmitry Vyukov, Andrew Morton, Andrey Ryabinin, Ingo Molnar, kasan-dev, linu...@kvack.org, LKML, x...@kernel.org, will....@arm.com
On Mon, Mar 06, 2017 at 04:20:18PM +0000, Mark Rutland wrote:
> > >> So the problem is doing load/stores from asm bits, and GCC
> > >> (traditionally) doesn't try and interpret APP asm bits.
> > >>
> > >> However, could we not write a GCC plugin that does exactly that?
> > >> Something that interprets the APP asm bits and generates these KASAN
> > >> bits that go with it?

> I don't think there's much you'll be able to do within the compiler,
> assuming you mean to derive this from the asm block inputs and outputs.

Nah, I was thinking about a full asm interpreter.

> Those can hide address-generation (e.g. with per-cpu stuff), which the
> compiler may erroneously be detected as racing.
>
> Those may also take fake inputs (e.g. the sp input to arm64's
> __my_cpu_offset()) which may confuse matters.
>
> Parsing the assembly itself will be *extremely* painful due to the way
> that's set up for run-time patching.

Argh, yah, completely forgot about all that alternative and similar
nonsense :/

Dmitry Vyukov

Mar 8, 2017, 8:42:32 AM
to Peter Zijlstra, Mark Rutland, Andrew Morton, Andrey Ryabinin, Ingo Molnar, kasan-dev, linu...@kvack.org, LKML, x...@kernel.org, Will Deacon
I think if we scope compiler atomic builtins to KASAN/KTSAN/KMSAN (and
consequently x86/arm64) initially, it becomes more realistic. For the
tools we don't care about absolute efficiency and this gets rid of
Will's points (2), (4) and (6) here https://lwn.net/Articles/691295/.
Re (3) I think rmb/wmb can be reasonably replaced with
atomic_thread_fence(acquire/release). Re (5) situation with
correctness becomes better very quickly as more people use them in
user-space. Since KASAN is not intended to be used in production (or
at least such a build is expected to crash), we can afford to shake out
any remaining correctness issues in such a build. Re (1), I don't fully
understand: what exactly is the problem with seq_cst?

I've sketched a patch that does it, and did some testing with/without
KASAN on x86_64.

In short, it adds include/linux/atomic_compiler.h which is included
from include/linux/atomic.h when CONFIG_COMPILER_ATOMIC is defined;
and <asm/atomic.h> is not included when CONFIG_COMPILER_ATOMIC is
defined.
For bitops it is similar except that only parts of asm/bitops.h are
selectively disabled when CONFIG_COMPILER_ATOMIC, because it also
defines other stuff.
asm/barriers.h is left intact for now. We don't need it for KASAN. But
for KTSAN we can do a similar thing -- selectively disable some of the
barriers in asm/barriers.h (e.g. leaving dma_rmb/wmb per arch).

Such a change would allow us to support atomic ops for multiple arches
for all of KASAN/KTSAN/KMSAN.

Thoughts?
atomic_compiler.patch

Mark Rutland

Mar 8, 2017, 10:20:57 AM
to Dmitry Vyukov, Will Deacon, Peter Zijlstra, Andrew Morton, Andrey Ryabinin, Ingo Molnar, kasan-dev, linu...@kvack.org, LKML, x...@kernel.org
Hi,

On Wed, Mar 08, 2017 at 02:42:10PM +0100, Dmitry Vyukov wrote:
> I think if we scope compiler atomic builtins to KASAN/KTSAN/KMSAN (and
> consequently x86/arm64) initially, it becomes more realistic. For the
> tools we don't care about absolute efficiency and this gets rid of
> Will's points (2), (4) and (6) here https://lwn.net/Articles/691295/.
> Re (3) I think rmb/wmb can be reasonably replaced with
> atomic_thread_fence(acquire/release). Re (5) situation with
> correctness becomes better very quickly as more people use them in
> user-space. Since KASAN is not intended to be used in production (or
> at least such build is expected to crash), we can afford to shake out
> any remaining correctness issues in such build. (1) I don't fully
> understand, what exactly is the problem with seq_cst?

I'll have to leave it to Will to have the final word on these; I'm
certainly not familiar enough with the C11 memory model to comment on
(1).

However, w.r.t. (3), I don't think we can substitute rmb() and wmb()
with atomic_thread_fence_acquire() and atomic_thread_fence_release()
respectively on arm64.

The former use barriers with full system scope, whereas the latter may
be limited to the inner shareable domain. While I'm not sure of the
precise intended semantics of wmb() and rmb(), I believe this
substitution would break some cases (e.g. communicating with a
non-coherent master).

Note that regardless, we'd have to special-case __iowmb() to use a full
system barrier.

Also, w.r.t. (5), modulo the lack of atomic instrumentation, people use
KASAN today, with compilers that are known to have bugs in their atomics
(e.g. GCC bug 69875). Thus, we cannot rely on the compiler's
implementation of atomics without introducing a functional regression.

> I've sketched a patch that does it, and did some testing with/without
> KASAN on x86_64.
>
> In short, it adds include/linux/atomic_compiler.h which is included
> from include/linux/atomic.h when CONFIG_COMPILER_ATOMIC is defined;
> and <asm/atomic.h> is not included when CONFIG_COMPILER_ATOMIC is
> defined.
> For bitops it is similar except that only parts of asm/bitops.h are
> selectively disabled when CONFIG_COMPILER_ATOMIC, because it also
> defines other stuff.
> asm/barriers.h is left intact for now. We don't need it for KASAN. But
> for KTSAN we can do similar thing -- selectively disable some of the
> barriers in asm/barriers.h (e.g. leaving dma_rmb/wmb per arch).
>
> Such change would allow us to support atomic ops for multiple arches
> for all of KASAN/KTSAN/KMSAN.
>
> Thoughts?

As in my other reply, I'd prefer that we wrapped the (arch-specific)
atomic implementations such that we can instrument them explicitly in a
core header. That means that the implementation and semantics of the
atomics don't change at all.

Note that we could initially do this just for x86 and arm64, e.g. by
having those explicitly include an <asm-generic/atomic-instrumented.h>
at the end of their <asm/atomic.h>.

For architectures which can use the compiler's atomics, we can allow
them to do so, skipping the redundant explicit instrumentation.
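The wrapping approach can be sketched in user-space C as follows (a minimal model under assumptions: `kasan_check_read` is a counting stub for the real KASAN hook, and the `arch_` primitive is modeled with a builtin rather than arch asm):

```c
#include <assert.h>

typedef struct { int counter; } atomic_t;

static int checks_done;	/* counts instrumentation hits, for the demo */

/* Stub for KASAN's hook; the real one validates shadow memory. */
static inline void kasan_check_read(const volatile void *p, unsigned int size)
{
	(void)p;
	(void)size;
	checks_done++;
}

/* Arch implementation, semantics untouched (modeled with a builtin). */
static inline int arch_atomic_read(const atomic_t *v)
{
	return __atomic_load_n(&v->counter, __ATOMIC_RELAXED);
}

/* Instrumented wrapper, as <asm-generic/atomic-instrumented.h> would
 * provide, shared by every arch that opts in. */
static inline int atomic_read(const atomic_t *v)
{
	kasan_check_read(v, sizeof(*v));
	return arch_atomic_read(v);
}
```

Because the check lives in the generic wrapper, each arch only has to rename its primitives to `arch_*`; the asm itself never changes.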

Other than being potentially slower (which we've established we don't
care too much about above), is there a problem with that approach?

Thanks,
Mark.

Dmitry Vyukov

Mar 8, 2017, 10:27:33 AM
to Mark Rutland, Will Deacon, Peter Zijlstra, Andrew Morton, Andrey Ryabinin, Ingo Molnar, kasan-dev, linu...@kvack.org, LKML, x...@kernel.org
How exactly do you want to do this incrementally?
I don't feel ready to shuffle all archs, but doing x86 in one patch
and then arm64 in another looks tractable.

Mark Rutland

Mar 8, 2017, 10:44:10 AM
to Dmitry Vyukov, Will Deacon, Peter Zijlstra, Andrew Morton, Andrey Ryabinin, Ingo Molnar, kasan-dev, linu...@kvack.org, LKML, x...@kernel.org
I guess we'd have three patches: one adding the header and any core
infrastructure, followed by separate patches migrating arm64 and x86
over.

Thanks,
Mark.

Dmitry Vyukov

Mar 8, 2017, 10:46:20 AM
to Mark Rutland, Will Deacon, Peter Zijlstra, Andrew Morton, Andrey Ryabinin, Ingo Molnar, kasan-dev, linu...@kvack.org, LKML, x...@kernel.org
But if we add e.g. atomic_read() which forwards to arch_atomic_read()
to <linux/atomic.h>, it will break all archs that don't rename their
atomic_read() to arch_atomic_read().

Mark Rutland

Mar 8, 2017, 10:49:07 AM
to Dmitry Vyukov, Will Deacon, Peter Zijlstra, Andrew Morton, Andrey Ryabinin, Ingo Molnar, kasan-dev, linu...@kvack.org, LKML, x...@kernel.org
... as above, that'd be handled by placing this in an
<asm-generic/atomic-instrumented.h> file, that we only include at the
end of the arch implementation.

So we'd only include that on arm64 and x86, without needing to change
the names elsewhere.

Thanks,
Mark.

Will Deacon

Mar 8, 2017, 12:42:53 PM
to Mark Rutland, Dmitry Vyukov, Peter Zijlstra, Andrew Morton, Andrey Ryabinin, Ingo Molnar, kasan-dev, linu...@kvack.org, LKML, x...@kernel.org
On Wed, Mar 08, 2017 at 03:20:41PM +0000, Mark Rutland wrote:
> On Wed, Mar 08, 2017 at 02:42:10PM +0100, Dmitry Vyukov wrote:
> > I think if we scope compiler atomic builtins to KASAN/KTSAN/KMSAN (and
> > consequently x86/arm64) initially, it becomes more realistic. For the
> > tools we don't care about absolute efficiency and this gets rid of
> > Will's points (2), (4) and (6) here https://lwn.net/Articles/691295/.
> > Re (3) I think rmb/wmb can be reasonably replaced with
> > atomic_thread_fence(acquire/release). Re (5) situation with
> > correctness becomes better very quickly as more people use them in
> > user-space. Since KASAN is not intended to be used in production (or
> > at least such build is expected to crash), we can afford to shake out
> > any remaining correctness issues in such build. (1) I don't fully
> > understand, what exactly is the problem with seq_cst?
>
> I'll have to leave it to Will to have the final word on these; I'm
> certainly not familiar enough with the C11 memory model to comment on
> (1).

rmb()/wmb() are not remotely similar to
atomic_thread_fence_{acquire,release}, even if you restrict ordering to
coherent CPUs (i.e. the smp_* variants). Please don't do that :)

I'm also terrified of the optimisations that the compiler is theoretically
allowed to make to C11 atomics given the assumptions of the language
virtual machine, which are not necessarily valid in the kernel environment.
We would at least need well-supported compiler options to disable these
optimisations, and also to allow data races with things like READ_ONCE.

Will

Dmitry Vyukov

Mar 14, 2017, 11:23:14 AM
to Will Deacon, Mark Rutland, Peter Zijlstra, Andrew Morton, Andrey Ryabinin, Ingo Molnar, kasan-dev, linu...@kvack.org, LKML, x...@kernel.org
Hello,

I've prototyped what Mark suggested:
- prefix arch atomics with arch_
- add <asm-generic/atomic-instrumented.h> which defined atomics and
forwards to the arch_ version

Patch attached. It boots with/without KASAN.

Does it look reasonable to you?

If so, I will split it into:
- minor kasan patch that adds volatile to kasan_check_read/write
- main patch that adds arch_ prefix and
<asm-generic/atomic-instrumented.h> header
- kasan instrumentation in <asm-generic/atomic-instrumented.h>

Any other suggestions?
atomic.patch

Peter Zijlstra

Mar 14, 2017, 11:31:58 AM
to Dmitry Vyukov, Will Deacon, Mark Rutland, Andrew Morton, Andrey Ryabinin, Ingo Molnar, kasan-dev, linu...@kvack.org, LKML, x...@kernel.org
On Tue, Mar 14, 2017 at 04:22:52PM +0100, Dmitry Vyukov wrote:
> Any other suggestions?

> - return i + xadd(&v->counter, i);
> + return i + arch_xadd(&v->counter, i);

> +#define xadd(ptr, v) \
> +({ \
> + __typeof__(ptr) ____ptr = (ptr); \
> + kasan_check_write(____ptr, sizeof(*____ptr)); \
> + arch_xadd(____ptr, (v)); \
> +})

xadd() isn't a generic thing, it only exists inside x86 as a helper to
implement atomic bits.

Peter Zijlstra

Mar 14, 2017, 11:32:42 AM
to Dmitry Vyukov, Will Deacon, Mark Rutland, Andrew Morton, Andrey Ryabinin, Ingo Molnar, kasan-dev, linu...@kvack.org, LKML, x...@kernel.org
On Tue, Mar 14, 2017 at 04:22:52PM +0100, Dmitry Vyukov wrote:
> -static __always_inline int atomic_read(const atomic_t *v)
> +static __always_inline int arch_atomic_read(const atomic_t *v)
> {
> - return READ_ONCE((v)->counter);
> + return READ_ONCE_NOCHECK((v)->counter);

Should NOCHECK come with a comment, because I've no idea why this is so.

> }

Mark Rutland

Mar 14, 2017, 11:44:46 AM
to Peter Zijlstra, Dmitry Vyukov, Will Deacon, Andrew Morton, Andrey Ryabinin, Ingo Molnar, kasan-dev, linu...@kvack.org, LKML, x...@kernel.org
I suspect the idea is that given the wrapper will have done the KASAN
check, duplicating it here is either sub-optimal, or results in
duplicate splats. READ_ONCE() has an implicit KASAN check,
READ_ONCE_NOCHECK() does not.

If this is to solve duplicate splats, it'd be worth having a
WRITE_ONCE_NOCHECK() for arch_atomic_set().

Agreed on the comment, regardless.

Thanks,
Mark.

Dmitry Vyukov

Mar 14, 2017, 3:24:19 PM
to mark.r...@arm.com, pet...@infradead.org, arya...@virtuozzo.com, mi...@redhat.com, will....@arm.com, ak...@linux-foundation.org, kasa...@googlegroups.com, linu...@kvack.org, x...@kernel.org, linux-...@vger.kernel.org, Dmitry Vyukov
KASAN uses compiler instrumentation to intercept all memory accesses.
But it does not see memory accesses done in assembly code.
One notable user of assembly code is atomic operations. Frequently,
for example, an atomic reference decrement is the last access to an
object and a good candidate for a racy use-after-free.

Atomic operations are defined in arch files, but KASAN instrumentation
is required for several archs that support KASAN. Later we will need
similar hooks for KMSAN (uninit use detector) and KTSAN (data race
detector).

This change introduces wrappers around atomic operations that can be
used to add KASAN/KMSAN/KTSAN instrumentation across several archs,
and adds KASAN checks to them.

This patch uses the wrappers only for x86 arch. Arm64 will be switched
later. And we also plan to instrument bitops in a similar way.

Within a day it has found its first bug:

BUG: KASAN: use-after-free in atomic_dec_and_test
arch/x86/include/asm/atomic.h:123 [inline] at addr ffff880079c30158
Write of size 4 by task syz-executor6/25698
CPU: 2 PID: 25698 Comm: syz-executor6 Not tainted 4.10.0+ #302
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Call Trace:
kmem_cache_alloc_node+0x122/0x6f0 mm/slab.c:3662
alloc_task_struct_node kernel/fork.c:153 [inline]
dup_task_struct kernel/fork.c:495 [inline]
copy_process.part.38+0x19c8/0x4aa0 kernel/fork.c:1560
copy_process kernel/fork.c:1531 [inline]
_do_fork+0x200/0x1010 kernel/fork.c:1994
SYSC_clone kernel/fork.c:2104 [inline]
SyS_clone+0x37/0x50 kernel/fork.c:2098
do_syscall_64+0x2e8/0x930 arch/x86/entry/common.c:281
return_from_SYSCALL_64+0x0/0x7a
Freed:
PID = 25681
__cache_free mm/slab.c:3514 [inline]
kmem_cache_free+0x71/0x240 mm/slab.c:3774
free_task_struct kernel/fork.c:158 [inline]
free_task+0x151/0x1d0 kernel/fork.c:370
copy_process.part.38+0x18e5/0x4aa0 kernel/fork.c:1931
copy_process kernel/fork.c:1531 [inline]
_do_fork+0x200/0x1010 kernel/fork.c:1994
SYSC_clone kernel/fork.c:2104 [inline]
SyS_clone+0x37/0x50 kernel/fork.c:2098
do_syscall_64+0x2e8/0x930 arch/x86/entry/common.c:281
return_from_SYSCALL_64+0x0/0x7a

Dmitry Vyukov (3):
kasan: allow kasan_check_read/write() to accept pointers to volatiles
asm-generic, x86: wrap atomic operations
asm-generic: add KASAN instrumentation to atomic operations

arch/x86/include/asm/atomic.h | 100 ++++++------
arch/x86/include/asm/atomic64_32.h | 86 ++++++-----
arch/x86/include/asm/atomic64_64.h | 90 +++++------
arch/x86/include/asm/cmpxchg.h | 12 +-
arch/x86/include/asm/cmpxchg_32.h | 8 +-
arch/x86/include/asm/cmpxchg_64.h | 4 +-
include/asm-generic/atomic-instrumented.h | 246 ++++++++++++++++++++++++++++++
include/linux/kasan-checks.h | 10 +-
mm/kasan/kasan.c | 4 +-
9 files changed, 411 insertions(+), 149 deletions(-)
create mode 100644 include/asm-generic/atomic-instrumented.h

--
2.12.0.367.g23dc2f6d3c-goog

Dmitry Vyukov

Mar 14, 2017, 3:24:21 PM
to mark.r...@arm.com, pet...@infradead.org, arya...@virtuozzo.com, mi...@redhat.com, will....@arm.com, ak...@linux-foundation.org, kasa...@googlegroups.com, linu...@kvack.org, x...@kernel.org, linux-...@vger.kernel.org, Dmitry Vyukov
Currently kasan_check_read/write() accept 'const void*', make them
accept 'const volatile void*'. This is required for instrumentation
of atomic operations and there is just no reason to not allow that.

Signed-off-by: Dmitry Vyukov <dvy...@google.com>
Cc: Mark Rutland <mark.r...@arm.com>
Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Will Deacon <will....@arm.com>,
Cc: Andrew Morton <ak...@linux-foundation.org>,
Cc: Andrey Ryabinin <arya...@virtuozzo.com>,
Cc: Ingo Molnar <mi...@redhat.com>,
Cc: x...@kernel.org
---
include/linux/kasan-checks.h | 10 ++++++----
mm/kasan/kasan.c | 4 ++--
2 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/include/linux/kasan-checks.h b/include/linux/kasan-checks.h
index b7f8aced7870..41960fecf783 100644
--- a/include/linux/kasan-checks.h
+++ b/include/linux/kasan-checks.h
@@ -2,11 +2,13 @@
#define _LINUX_KASAN_CHECKS_H

#ifdef CONFIG_KASAN
-void kasan_check_read(const void *p, unsigned int size);
-void kasan_check_write(const void *p, unsigned int size);
+void kasan_check_read(const volatile void *p, unsigned int size);
+void kasan_check_write(const volatile void *p, unsigned int size);
#else
-static inline void kasan_check_read(const void *p, unsigned int size) { }
-static inline void kasan_check_write(const void *p, unsigned int size) { }
+static inline void kasan_check_read(const volatile void *p, unsigned int size)
+{ }
+static inline void kasan_check_write(const volatile void *p, unsigned int size)
+{ }
#endif

#endif
diff --git a/mm/kasan/kasan.c b/mm/kasan/kasan.c
index 98b27195e38b..db46e66eb1d4 100644
--- a/mm/kasan/kasan.c
+++ b/mm/kasan/kasan.c
@@ -333,13 +333,13 @@ static void check_memory_region(unsigned long addr,
check_memory_region_inline(addr, size, write, ret_ip);
}

-void kasan_check_read(const void *p, unsigned int size)
+void kasan_check_read(const volatile void *p, unsigned int size)
{
check_memory_region((unsigned long)p, size, false, _RET_IP_);
}
EXPORT_SYMBOL(kasan_check_read);

-void kasan_check_write(const void *p, unsigned int size)
+void kasan_check_write(const volatile void *p, unsigned int size)
{
check_memory_region((unsigned long)p, size, true, _RET_IP_);
}
--
2.12.0.367.g23dc2f6d3c-goog

Dmitry Vyukov

Mar 14, 2017, 3:24:22 PM
to mark.r...@arm.com, pet...@infradead.org, arya...@virtuozzo.com, mi...@redhat.com, will....@arm.com, ak...@linux-foundation.org, kasa...@googlegroups.com, linu...@kvack.org, x...@kernel.org, linux-...@vger.kernel.org, Dmitry Vyukov
KASAN uses compiler instrumentation to intercept all memory accesses.
But it does not see memory accesses done in assembly code.
One notable user of assembly code is atomic operations. Frequently,
for example, an atomic reference decrement is the last access to an
object and a good candidate for a racy use-after-free.

Atomic operations are defined in arch files, but KASAN instrumentation
is required for several archs that support KASAN. Later we will need
similar hooks for KMSAN (uninit use detector) and KTSAN (data race
detector).

This change introduces wrappers around atomic operations that can be
used to add KASAN/KMSAN/KTSAN instrumentation across several archs.
This patch uses the wrappers only for x86 arch. Arm64 will be switched
later.

Signed-off-by: Dmitry Vyukov <dvy...@google.com>
Cc: Mark Rutland <mark.r...@arm.com>
Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Will Deacon <will....@arm.com>,
Cc: Andrew Morton <ak...@linux-foundation.org>,
Cc: Andrey Ryabinin <arya...@virtuozzo.com>,
Cc: Ingo Molnar <mi...@redhat.com>,
Cc: kasa...@googlegroups.com
Cc: linu...@kvack.org
Cc: linux-...@vger.kernel.org
Cc: x...@kernel.org
---
arch/x86/include/asm/atomic.h | 100 +++++++-------
arch/x86/include/asm/atomic64_32.h | 86 ++++++------
arch/x86/include/asm/atomic64_64.h | 90 ++++++-------
arch/x86/include/asm/cmpxchg.h | 12 +-
arch/x86/include/asm/cmpxchg_32.h | 8 +-
arch/x86/include/asm/cmpxchg_64.h | 4 +-
include/asm-generic/atomic-instrumented.h | 210 ++++++++++++++++++++++++++++++
7 files changed, 367 insertions(+), 143 deletions(-)

diff --git a/arch/x86/include/asm/atomic.h b/arch/x86/include/asm/atomic.h
index 14635c5ea025..95dd167eb3af 100644
--- a/arch/x86/include/asm/atomic.h
+++ b/arch/x86/include/asm/atomic.h
@@ -16,36 +16,46 @@
#define ATOMIC_INIT(i) { (i) }

/**
- * atomic_read - read atomic variable
+ * arch_atomic_read - read atomic variable
* @v: pointer of type atomic_t
*
* Atomically reads the value of @v.
*/
-static __always_inline int atomic_read(const atomic_t *v)
+static __always_inline int arch_atomic_read(const atomic_t *v)
{
- return READ_ONCE((v)->counter);
+ /*
+ * We use READ_ONCE_NOCHECK() because atomic_read() contains KASAN
+ * instrumentation. Double instrumentation is unnecessary.
+ */
+ return READ_ONCE_NOCHECK((v)->counter);
}

/**
- * atomic_set - set atomic variable
+ * arch_atomic_set - set atomic variable
* @v: pointer of type atomic_t
* @i: required value
*
* Atomically sets the value of @v to @i.
*/
-static __always_inline void atomic_set(atomic_t *v, int i)
+static __always_inline void arch_atomic_set(atomic_t *v, int i)
{
+ /*
+ * We could use WRITE_ONCE_NOCHECK() if it exists, similar to
+ * READ_ONCE_NOCHECK() in arch_atomic_read(). But there is no such
+	 * thing at the moment, and introducing it for this case is not
+	 * worth it.
+ */
WRITE_ONCE(v->counter, i);
}

/**
- * atomic_add - add integer to atomic variable
+ * arch_atomic_add - add integer to atomic variable
* @i: integer value to add
* @v: pointer of type atomic_t
*
* Atomically adds @i to @v.
*/
-static __always_inline void atomic_add(int i, atomic_t *v)
+static __always_inline void arch_atomic_add(int i, atomic_t *v)
{
asm volatile(LOCK_PREFIX "addl %1,%0"
: "+m" (v->counter)
@@ -53,13 +63,13 @@ static __always_inline void atomic_add(int i, atomic_t *v)
}

/**
- * atomic_sub - subtract integer from atomic variable
+ * arch_atomic_sub - subtract integer from atomic variable
* @i: integer value to subtract
* @v: pointer of type atomic_t
*
* Atomically subtracts @i from @v.
*/
-static __always_inline void atomic_sub(int i, atomic_t *v)
+static __always_inline void arch_atomic_sub(int i, atomic_t *v)
{
asm volatile(LOCK_PREFIX "subl %1,%0"
: "+m" (v->counter)
@@ -67,7 +77,7 @@ static __always_inline void atomic_sub(int i, atomic_t *v)
}

/**
- * atomic_sub_and_test - subtract value from variable and test result
+ * arch_atomic_sub_and_test - subtract value from variable and test result
* @i: integer value to subtract
* @v: pointer of type atomic_t
*
@@ -75,63 +85,63 @@ static __always_inline void atomic_sub(int i, atomic_t *v)
* true if the result is zero, or false for all
* other cases.
*/
-static __always_inline bool atomic_sub_and_test(int i, atomic_t *v)
+static __always_inline bool arch_atomic_sub_and_test(int i, atomic_t *v)
{
GEN_BINARY_RMWcc(LOCK_PREFIX "subl", v->counter, "er", i, "%0", e);
}

/**
- * atomic_inc - increment atomic variable
+ * arch_atomic_inc - increment atomic variable
* @v: pointer of type atomic_t
*
* Atomically increments @v by 1.
*/
-static __always_inline void atomic_inc(atomic_t *v)
+static __always_inline void arch_atomic_inc(atomic_t *v)
{
asm volatile(LOCK_PREFIX "incl %0"
: "+m" (v->counter));
}

/**
- * atomic_dec - decrement atomic variable
+ * arch_atomic_dec - decrement atomic variable
* @v: pointer of type atomic_t
*
* Atomically decrements @v by 1.
*/
-static __always_inline void atomic_dec(atomic_t *v)
+static __always_inline void arch_atomic_dec(atomic_t *v)
{
asm volatile(LOCK_PREFIX "decl %0"
: "+m" (v->counter));
}

/**
- * atomic_dec_and_test - decrement and test
+ * arch_atomic_dec_and_test - decrement and test
* @v: pointer of type atomic_t
*
* Atomically decrements @v by 1 and
* returns true if the result is 0, or false for all other
* cases.
*/
-static __always_inline bool atomic_dec_and_test(atomic_t *v)
+static __always_inline bool arch_atomic_dec_and_test(atomic_t *v)
{
GEN_UNARY_RMWcc(LOCK_PREFIX "decl", v->counter, "%0", e);
}

/**
- * atomic_inc_and_test - increment and test
+ * arch_atomic_inc_and_test - increment and test
* @v: pointer of type atomic_t
*
* Atomically increments @v by 1
* and returns true if the result is zero, or false for all
* other cases.
*/
-static __always_inline bool atomic_inc_and_test(atomic_t *v)
+static __always_inline bool arch_atomic_inc_and_test(atomic_t *v)
{
GEN_UNARY_RMWcc(LOCK_PREFIX "incl", v->counter, "%0", e);
}

/**
- * atomic_add_negative - add and test if negative
+ * arch_atomic_add_negative - add and test if negative
* @i: integer value to add
* @v: pointer of type atomic_t
*
@@ -139,60 +149,60 @@ static __always_inline bool atomic_inc_and_test(atomic_t *v)
* if the result is negative, or false when
* result is greater than or equal to zero.
*/
-static __always_inline bool atomic_add_negative(int i, atomic_t *v)
+static __always_inline bool arch_atomic_add_negative(int i, atomic_t *v)
{
GEN_BINARY_RMWcc(LOCK_PREFIX "addl", v->counter, "er", i, "%0", s);
}

/**
- * atomic_add_return - add integer and return
+ * arch_atomic_add_return - add integer and return
* @i: integer value to add
* @v: pointer of type atomic_t
*
* Atomically adds @i to @v and returns @i + @v
*/
-static __always_inline int atomic_add_return(int i, atomic_t *v)
+static __always_inline int arch_atomic_add_return(int i, atomic_t *v)
{
return i + xadd(&v->counter, i);
}

/**
- * atomic_sub_return - subtract integer and return
+ * arch_atomic_sub_return - subtract integer and return
* @v: pointer of type atomic_t
* @i: integer value to subtract
*
* Atomically subtracts @i from @v and returns @v - @i
*/
-static __always_inline int atomic_sub_return(int i, atomic_t *v)
+static __always_inline int arch_atomic_sub_return(int i, atomic_t *v)
{
- return atomic_add_return(-i, v);
+ return arch_atomic_add_return(-i, v);
}

-#define atomic_inc_return(v) (atomic_add_return(1, v))
-#define atomic_dec_return(v) (atomic_sub_return(1, v))
+#define arch_atomic_inc_return(v) (arch_atomic_add_return(1, v))
+#define arch_atomic_dec_return(v) (arch_atomic_sub_return(1, v))

-static __always_inline int atomic_fetch_add(int i, atomic_t *v)
+static __always_inline int arch_atomic_fetch_add(int i, atomic_t *v)
{
return xadd(&v->counter, i);
}

-static __always_inline int atomic_fetch_sub(int i, atomic_t *v)
+static __always_inline int arch_atomic_fetch_sub(int i, atomic_t *v)
{
return xadd(&v->counter, -i);
}

-static __always_inline int atomic_cmpxchg(atomic_t *v, int old, int new)
+static __always_inline int arch_atomic_cmpxchg(atomic_t *v, int old, int new)
{
- return cmpxchg(&v->counter, old, new);
+ return arch_cmpxchg(&v->counter, old, new);
}

-static inline int atomic_xchg(atomic_t *v, int new)
+static inline int arch_atomic_xchg(atomic_t *v, int new)
{
- return xchg(&v->counter, new);
+ return arch_xchg(&v->counter, new);
}

#define ATOMIC_OP(op) \
-static inline void atomic_##op(int i, atomic_t *v) \
+static inline void arch_atomic_##op(int i, atomic_t *v) \
{ \
asm volatile(LOCK_PREFIX #op"l %1,%0" \
: "+m" (v->counter) \
@@ -201,11 +211,11 @@ static inline void atomic_##op(int i, atomic_t *v) \
}

#define ATOMIC_FETCH_OP(op, c_op) \
-static inline int atomic_fetch_##op(int i, atomic_t *v) \
+static inline int arch_atomic_fetch_##op(int i, atomic_t *v) \
{ \
- int old, val = atomic_read(v); \
+ int old, val = arch_atomic_read(v); \
for (;;) { \
- old = atomic_cmpxchg(v, val, val c_op i); \
+ old = arch_atomic_cmpxchg(v, val, val c_op i); \
if (old == val) \
break; \
val = old; \
@@ -226,7 +236,7 @@ ATOMIC_OPS(xor, ^)
#undef ATOMIC_OP

/**
- * __atomic_add_unless - add unless the number is already a given value
+ * __arch_atomic_add_unless - add unless the number is already a given value
* @v: pointer of type atomic_t
* @a: the amount to add to v...
* @u: ...unless v is equal to u.
@@ -234,14 +244,14 @@ ATOMIC_OPS(xor, ^)
* Atomically adds @a to @v, so long as @v was not already @u.
* Returns the old value of @v.
*/
-static __always_inline int __atomic_add_unless(atomic_t *v, int a, int u)
+static __always_inline int __arch_atomic_add_unless(atomic_t *v, int a, int u)
{
int c, old;
- c = atomic_read(v);
+ c = arch_atomic_read(v);
for (;;) {
if (unlikely(c == (u)))
break;
- old = atomic_cmpxchg((v), c, c + (a));
+ old = arch_atomic_cmpxchg((v), c, c + (a));
if (likely(old == c))
break;
c = old;
@@ -250,13 +260,13 @@ static __always_inline int __atomic_add_unless(atomic_t *v, int a, int u)
}

/**
- * atomic_inc_short - increment of a short integer
+ * arch_atomic_inc_short - increment of a short integer
* @v: pointer to type int
*
* Atomically adds 1 to @v
* Returns the new value of @u
*/
-static __always_inline short int atomic_inc_short(short int *v)
+static __always_inline short int arch_atomic_inc_short(short int *v)
{
asm(LOCK_PREFIX "addw $1, %0" : "+m" (*v));
return *v;
@@ -268,4 +278,6 @@ static __always_inline short int atomic_inc_short(short int *v)
# include <asm/atomic64_64.h>
#endif

+#include <asm-generic/atomic-instrumented.h>
+
#endif /* _ASM_X86_ATOMIC_H */
diff --git a/arch/x86/include/asm/atomic64_32.h b/arch/x86/include/asm/atomic64_32.h
index 71d7705fb303..4a48b20000f3 100644
--- a/arch/x86/include/asm/atomic64_32.h
+++ b/arch/x86/include/asm/atomic64_32.h
@@ -61,7 +61,7 @@ ATOMIC64_DECL(add_unless);
#undef ATOMIC64_EXPORT

/**
- * atomic64_cmpxchg - cmpxchg atomic64 variable
+ * arch_atomic64_cmpxchg - cmpxchg atomic64 variable
* @v: pointer to type atomic64_t
* @o: expected value
* @n: new value
@@ -70,20 +70,21 @@ ATOMIC64_DECL(add_unless);
* the old value.
*/

-static inline long long atomic64_cmpxchg(atomic64_t *v, long long o, long long n)
+static inline long long arch_atomic64_cmpxchg(atomic64_t *v, long long o,
+ long long n)
{
- return cmpxchg64(&v->counter, o, n);
+ return arch_cmpxchg64(&v->counter, o, n);
}

/**
- * atomic64_xchg - xchg atomic64 variable
+ * arch_atomic64_xchg - xchg atomic64 variable
* @v: pointer to type atomic64_t
* @n: value to assign
*
* Atomically xchgs the value of @v to @n and returns
* the old value.
*/
-static inline long long atomic64_xchg(atomic64_t *v, long long n)
+static inline long long arch_atomic64_xchg(atomic64_t *v, long long n)
{
long long o;
unsigned high = (unsigned)(n >> 32);
@@ -95,13 +96,13 @@ static inline long long atomic64_xchg(atomic64_t *v, long long n)
}

/**
- * atomic64_set - set atomic64 variable
+ * arch_atomic64_set - set atomic64 variable
* @v: pointer to type atomic64_t
* @i: value to assign
*
* Atomically sets the value of @v to @n.
*/
-static inline void atomic64_set(atomic64_t *v, long long i)
+static inline void arch_atomic64_set(atomic64_t *v, long long i)
{
unsigned high = (unsigned)(i >> 32);
unsigned low = (unsigned)i;
@@ -111,12 +112,12 @@ static inline void atomic64_set(atomic64_t *v, long long i)
}

/**
- * atomic64_read - read atomic64 variable
+ * arch_atomic64_read - read atomic64 variable
* @v: pointer to type atomic64_t
*
* Atomically reads the value of @v and returns it.
*/
-static inline long long atomic64_read(const atomic64_t *v)
+static inline long long arch_atomic64_read(const atomic64_t *v)
{
long long r;
alternative_atomic64(read, "=&A" (r), "c" (v) : "memory");
@@ -124,13 +125,13 @@ static inline long long atomic64_read(const atomic64_t *v)
}

/**
- * atomic64_add_return - add and return
+ * arch_atomic64_add_return - add and return
* @i: integer value to add
* @v: pointer to type atomic64_t
*
* Atomically adds @i to @v and returns @i + *@v
*/
-static inline long long atomic64_add_return(long long i, atomic64_t *v)
+static inline long long arch_atomic64_add_return(long long i, atomic64_t *v)
{
alternative_atomic64(add_return,
ASM_OUTPUT2("+A" (i), "+c" (v)),
@@ -141,7 +142,7 @@ static inline long long atomic64_add_return(long long i, atomic64_t *v)
/*
* Other variants with different arithmetic operators:
*/
-static inline long long atomic64_sub_return(long long i, atomic64_t *v)
+static inline long long arch_atomic64_sub_return(long long i, atomic64_t *v)
{
alternative_atomic64(sub_return,
ASM_OUTPUT2("+A" (i), "+c" (v)),
@@ -149,7 +150,7 @@ static inline long long atomic64_sub_return(long long i, atomic64_t *v)
return i;
}

-static inline long long atomic64_inc_return(atomic64_t *v)
+static inline long long arch_atomic64_inc_return(atomic64_t *v)
{
long long a;
alternative_atomic64(inc_return, "=&A" (a),
@@ -157,7 +158,7 @@ static inline long long atomic64_inc_return(atomic64_t *v)
return a;
}

-static inline long long atomic64_dec_return(atomic64_t *v)
+static inline long long arch_atomic64_dec_return(atomic64_t *v)
{
long long a;
alternative_atomic64(dec_return, "=&A" (a),
@@ -166,13 +167,13 @@ static inline long long atomic64_dec_return(atomic64_t *v)
}

/**
- * atomic64_add - add integer to atomic64 variable
+ * arch_atomic64_add - add integer to atomic64 variable
* @i: integer value to add
* @v: pointer to type atomic64_t
*
* Atomically adds @i to @v.
*/
-static inline long long atomic64_add(long long i, atomic64_t *v)
+static inline long long arch_atomic64_add(long long i, atomic64_t *v)
{
__alternative_atomic64(add, add_return,
ASM_OUTPUT2("+A" (i), "+c" (v)),
@@ -181,13 +182,13 @@ static inline long long atomic64_add(long long i, atomic64_t *v)
}

/**
- * atomic64_sub - subtract the atomic64 variable
+ * arch_atomic64_sub - subtract the atomic64 variable
* @i: integer value to subtract
* @v: pointer to type atomic64_t
*
* Atomically subtracts @i from @v.
*/
-static inline long long atomic64_sub(long long i, atomic64_t *v)
+static inline long long arch_atomic64_sub(long long i, atomic64_t *v)
{
__alternative_atomic64(sub, sub_return,
ASM_OUTPUT2("+A" (i), "+c" (v)),
@@ -196,7 +197,7 @@ static inline long long atomic64_sub(long long i, atomic64_t *v)
}

/**
- * atomic64_sub_and_test - subtract value from variable and test result
+ * arch_atomic64_sub_and_test - subtract value from variable and test result
* @i: integer value to subtract
* @v: pointer to type atomic64_t
*
@@ -204,46 +205,46 @@ static inline long long atomic64_sub(long long i, atomic64_t *v)
* true if the result is zero, or false for all
* other cases.
*/
-static inline int atomic64_sub_and_test(long long i, atomic64_t *v)
+static inline int arch_atomic64_sub_and_test(long long i, atomic64_t *v)
{
- return atomic64_sub_return(i, v) == 0;
+ return arch_atomic64_sub_return(i, v) == 0;
}

/**
- * atomic64_inc - increment atomic64 variable
+ * arch_atomic64_inc - increment atomic64 variable
* @v: pointer to type atomic64_t
*
* Atomically increments @v by 1.
*/
-static inline void atomic64_inc(atomic64_t *v)
+static inline void arch_atomic64_inc(atomic64_t *v)
{
__alternative_atomic64(inc, inc_return, /* no output */,
"S" (v) : "memory", "eax", "ecx", "edx");
}

/**
- * atomic64_dec - decrement atomic64 variable
+ * arch_atomic64_dec - decrement atomic64 variable
* @v: pointer to type atomic64_t
*
* Atomically decrements @v by 1.
*/
-static inline void atomic64_dec(atomic64_t *v)
+static inline void arch_atomic64_dec(atomic64_t *v)
{
__alternative_atomic64(dec, dec_return, /* no output */,
"S" (v) : "memory", "eax", "ecx", "edx");
}

/**
- * atomic64_dec_and_test - decrement and test
+ * arch_atomic64_dec_and_test - decrement and test
* @v: pointer to type atomic64_t
*
* Atomically decrements @v by 1 and
* returns true if the result is 0, or false for all other
* cases.
*/
-static inline int atomic64_dec_and_test(atomic64_t *v)
+static inline int arch_atomic64_dec_and_test(atomic64_t *v)
{
- return atomic64_dec_return(v) == 0;
+ return arch_atomic64_dec_return(v) == 0;
}

/**
@@ -254,13 +255,13 @@ static inline int atomic64_dec_and_test(atomic64_t *v)
* and returns true if the result is zero, or false for all
* other cases.
*/
-static inline int atomic64_inc_and_test(atomic64_t *v)
+static inline int arch_atomic64_inc_and_test(atomic64_t *v)
{
- return atomic64_inc_return(v) == 0;
+ return arch_atomic64_inc_return(v) == 0;
}

/**
- * atomic64_add_negative - add and test if negative
+ * arch_atomic64_add_negative - add and test if negative
* @i: integer value to add
* @v: pointer to type atomic64_t
*
@@ -268,13 +269,13 @@ static inline int atomic64_inc_and_test(atomic64_t *v)
* if the result is negative, or false when
* result is greater than or equal to zero.
*/
-static inline int atomic64_add_negative(long long i, atomic64_t *v)
+static inline int arch_atomic64_add_negative(long long i, atomic64_t *v)
{
- return atomic64_add_return(i, v) < 0;
+ return arch_atomic64_add_return(i, v) < 0;
}

/**
- * atomic64_add_unless - add unless the number is a given value
+ * arch_atomic64_add_unless - add unless the number is a given value
* @v: pointer of type atomic64_t
* @a: the amount to add to v...
* @u: ...unless v is equal to u.
@@ -282,7 +283,8 @@ static inline int atomic64_add_negative(long long i, atomic64_t *v)
* Atomically adds @a to @v, so long as it was not @u.
* Returns non-zero if the add was done, zero otherwise.
*/
-static inline int atomic64_add_unless(atomic64_t *v, long long a, long long u)
+static inline int arch_atomic64_add_unless(atomic64_t *v, long long a,
+ long long u)
{
unsigned low = (unsigned)u;
unsigned high = (unsigned)(u >> 32);
@@ -293,7 +295,7 @@ static inline int atomic64_add_unless(atomic64_t *v, long long a, long long u)
}


-static inline int atomic64_inc_not_zero(atomic64_t *v)
+static inline int arch_atomic64_inc_not_zero(atomic64_t *v)
{
int r;
alternative_atomic64(inc_not_zero, "=&a" (r),
@@ -301,7 +303,7 @@ static inline int atomic64_inc_not_zero(atomic64_t *v)
return r;
}

-static inline long long atomic64_dec_if_positive(atomic64_t *v)
+static inline long long arch_atomic64_dec_if_positive(atomic64_t *v)
{
long long r;
alternative_atomic64(dec_if_positive, "=&A" (r),
@@ -313,25 +315,25 @@ static inline long long atomic64_dec_if_positive(atomic64_t *v)
#undef __alternative_atomic64

#define ATOMIC64_OP(op, c_op) \
-static inline void atomic64_##op(long long i, atomic64_t *v) \
+static inline void arch_atomic64_##op(long long i, atomic64_t *v) \
{ \
long long old, c = 0; \
- while ((old = atomic64_cmpxchg(v, c, c c_op i)) != c) \
+ while ((old = arch_atomic64_cmpxchg(v, c, c c_op i)) != c) \
c = old; \
}

#define ATOMIC64_FETCH_OP(op, c_op) \
-static inline long long atomic64_fetch_##op(long long i, atomic64_t *v) \
+static inline long long arch_atomic64_fetch_##op(long long i, atomic64_t *v) \
{ \
long long old, c = 0; \
- while ((old = atomic64_cmpxchg(v, c, c c_op i)) != c) \
+ while ((old = arch_atomic64_cmpxchg(v, c, c c_op i)) != c) \
c = old; \
return old; \
}

ATOMIC64_FETCH_OP(add, +)

-#define atomic64_fetch_sub(i, v) atomic64_fetch_add(-(i), (v))
+#define arch_atomic64_fetch_sub(i, v) arch_atomic64_fetch_add(-(i), (v))

#define ATOMIC64_OPS(op, c_op) \
ATOMIC64_OP(op, c_op) \
diff --git a/arch/x86/include/asm/atomic64_64.h b/arch/x86/include/asm/atomic64_64.h
index 89ed2f6ae2f7..de9555d35cb0 100644
--- a/arch/x86/include/asm/atomic64_64.h
+++ b/arch/x86/include/asm/atomic64_64.h
@@ -16,31 +16,31 @@
* Atomically reads the value of @v.
* Doesn't imply a read memory barrier.
*/
-static inline long atomic64_read(const atomic64_t *v)
+static inline long arch_atomic64_read(const atomic64_t *v)
{
- return READ_ONCE((v)->counter);
+ return READ_ONCE_NOCHECK((v)->counter);
}

/**
- * atomic64_set - set atomic64 variable
+ * arch_atomic64_set - set atomic64 variable
* @v: pointer to type atomic64_t
* @i: required value
*
* Atomically sets the value of @v to @i.
*/
-static inline void atomic64_set(atomic64_t *v, long i)
+static inline void arch_atomic64_set(atomic64_t *v, long i)
{
WRITE_ONCE(v->counter, i);
}

/**
- * atomic64_add - add integer to atomic64 variable
+ * arch_atomic64_add - add integer to atomic64 variable
* @i: integer value to add
* @v: pointer to type atomic64_t
*
* Atomically adds @i to @v.
*/
-static __always_inline void atomic64_add(long i, atomic64_t *v)
+static __always_inline void arch_atomic64_add(long i, atomic64_t *v)
{
asm volatile(LOCK_PREFIX "addq %1,%0"
: "=m" (v->counter)
@@ -48,13 +48,13 @@ static __always_inline void atomic64_add(long i, atomic64_t *v)
}

/**
- * atomic64_sub - subtract the atomic64 variable
+ * arch_atomic64_sub - subtract the atomic64 variable
* @i: integer value to subtract
* @v: pointer to type atomic64_t
*
* Atomically subtracts @i from @v.
*/
-static inline void atomic64_sub(long i, atomic64_t *v)
+static inline void arch_atomic64_sub(long i, atomic64_t *v)
{
asm volatile(LOCK_PREFIX "subq %1,%0"
: "=m" (v->counter)
@@ -62,7 +62,7 @@ static inline void atomic64_sub(long i, atomic64_t *v)
}

/**
- * atomic64_sub_and_test - subtract value from variable and test result
+ * arch_atomic64_sub_and_test - subtract value from variable and test result
* @i: integer value to subtract
* @v: pointer to type atomic64_t
*
@@ -70,18 +70,18 @@ static inline void atomic64_sub(long i, atomic64_t *v)
* true if the result is zero, or false for all
* other cases.
*/
-static inline bool atomic64_sub_and_test(long i, atomic64_t *v)
+static inline bool arch_atomic64_sub_and_test(long i, atomic64_t *v)
{
GEN_BINARY_RMWcc(LOCK_PREFIX "subq", v->counter, "er", i, "%0", e);
}

/**
- * atomic64_inc - increment atomic64 variable
+ * arch_atomic64_inc - increment atomic64 variable
* @v: pointer to type atomic64_t
*
* Atomically increments @v by 1.
*/
-static __always_inline void atomic64_inc(atomic64_t *v)
+static __always_inline void arch_atomic64_inc(atomic64_t *v)
{
asm volatile(LOCK_PREFIX "incq %0"
: "=m" (v->counter)
@@ -89,12 +89,12 @@ static __always_inline void atomic64_inc(atomic64_t *v)
}

/**
- * atomic64_dec - decrement atomic64 variable
+ * arch_atomic64_dec - decrement atomic64 variable
* @v: pointer to type atomic64_t
*
* Atomically decrements @v by 1.
*/
-static __always_inline void atomic64_dec(atomic64_t *v)
+static __always_inline void arch_atomic64_dec(atomic64_t *v)
{
asm volatile(LOCK_PREFIX "decq %0"
: "=m" (v->counter)
@@ -102,33 +102,33 @@ static __always_inline void atomic64_dec(atomic64_t *v)
}

/**
- * atomic64_dec_and_test - decrement and test
+ * arch_atomic64_dec_and_test - decrement and test
* @v: pointer to type atomic64_t
*
* Atomically decrements @v by 1 and
* returns true if the result is 0, or false for all other
* cases.
*/
-static inline bool atomic64_dec_and_test(atomic64_t *v)
+static inline bool arch_atomic64_dec_and_test(atomic64_t *v)
{
GEN_UNARY_RMWcc(LOCK_PREFIX "decq", v->counter, "%0", e);
}

/**
- * atomic64_inc_and_test - increment and test
+ * arch_atomic64_inc_and_test - increment and test
* @v: pointer to type atomic64_t
*
* Atomically increments @v by 1
* and returns true if the result is zero, or false for all
* other cases.
*/
-static inline bool atomic64_inc_and_test(atomic64_t *v)
+static inline bool arch_atomic64_inc_and_test(atomic64_t *v)
{
GEN_UNARY_RMWcc(LOCK_PREFIX "incq", v->counter, "%0", e);
}

/**
- * atomic64_add_negative - add and test if negative
+ * arch_atomic64_add_negative - add and test if negative
* @i: integer value to add
* @v: pointer to type atomic64_t
*
@@ -136,53 +136,53 @@ static inline bool atomic64_inc_and_test(atomic64_t *v)
* if the result is negative, or false when
* result is greater than or equal to zero.
*/
-static inline bool atomic64_add_negative(long i, atomic64_t *v)
+static inline bool arch_atomic64_add_negative(long i, atomic64_t *v)
{
GEN_BINARY_RMWcc(LOCK_PREFIX "addq", v->counter, "er", i, "%0", s);
}

/**
- * atomic64_add_return - add and return
+ * arch_atomic64_add_return - add and return
* @i: integer value to add
* @v: pointer to type atomic64_t
*
* Atomically adds @i to @v and returns @i + @v
*/
-static __always_inline long atomic64_add_return(long i, atomic64_t *v)
+static __always_inline long arch_atomic64_add_return(long i, atomic64_t *v)
{
return i + xadd(&v->counter, i);
}

-static inline long atomic64_sub_return(long i, atomic64_t *v)
+static inline long arch_atomic64_sub_return(long i, atomic64_t *v)
{
- return atomic64_add_return(-i, v);
+ return arch_atomic64_add_return(-i, v);
}

-static inline long atomic64_fetch_add(long i, atomic64_t *v)
+static inline long arch_atomic64_fetch_add(long i, atomic64_t *v)
{
return xadd(&v->counter, i);
}

-static inline long atomic64_fetch_sub(long i, atomic64_t *v)
+static inline long arch_atomic64_fetch_sub(long i, atomic64_t *v)
{
return xadd(&v->counter, -i);
}

-#define atomic64_inc_return(v) (atomic64_add_return(1, (v)))
-#define atomic64_dec_return(v) (atomic64_sub_return(1, (v)))
+#define arch_atomic64_inc_return(v) (arch_atomic64_add_return(1, (v)))
+#define arch_atomic64_dec_return(v) (arch_atomic64_sub_return(1, (v)))

-static inline long atomic64_cmpxchg(atomic64_t *v, long old, long new)
+static inline long arch_atomic64_cmpxchg(atomic64_t *v, long old, long new)
{
- return cmpxchg(&v->counter, old, new);
+ return arch_cmpxchg(&v->counter, old, new);
}

-static inline long atomic64_xchg(atomic64_t *v, long new)
+static inline long arch_atomic64_xchg(atomic64_t *v, long new)
{
- return xchg(&v->counter, new);
+ return arch_xchg(&v->counter, new);
}

/**
- * atomic64_add_unless - add unless the number is a given value
+ * arch_atomic64_add_unless - add unless the number is a given value
* @v: pointer of type atomic64_t
* @a: the amount to add to v...
* @u: ...unless v is equal to u.
@@ -190,14 +190,14 @@ static inline long atomic64_xchg(atomic64_t *v, long new)
* Atomically adds @a to @v, so long as it was not @u.
* Returns the old value of @v.
*/
-static inline bool atomic64_add_unless(atomic64_t *v, long a, long u)
+static inline bool arch_atomic64_add_unless(atomic64_t *v, long a, long u)
{
long c, old;
- c = atomic64_read(v);
+ c = arch_atomic64_read(v);
for (;;) {
if (unlikely(c == (u)))
break;
- old = atomic64_cmpxchg((v), c, c + (a));
+ old = arch_atomic64_cmpxchg((v), c, c + (a));
if (likely(old == c))
break;
c = old;
@@ -205,24 +205,24 @@ static inline bool atomic64_add_unless(atomic64_t *v, long a, long u)
return c != (u);
}

-#define atomic64_inc_not_zero(v) atomic64_add_unless((v), 1, 0)
+#define arch_atomic64_inc_not_zero(v) arch_atomic64_add_unless((v), 1, 0)

/*
- * atomic64_dec_if_positive - decrement by 1 if old value positive
+ * arch_atomic64_dec_if_positive - decrement by 1 if old value positive
* @v: pointer of type atomic_t
*
* The function returns the old value of *v minus 1, even if
* the atomic variable, v, was not decremented.
*/
-static inline long atomic64_dec_if_positive(atomic64_t *v)
+static inline long arch_atomic64_dec_if_positive(atomic64_t *v)
{
long c, old, dec;
- c = atomic64_read(v);
+ c = arch_atomic64_read(v);
for (;;) {
dec = c - 1;
if (unlikely(dec < 0))
break;
- old = atomic64_cmpxchg((v), c, dec);
+ old = arch_atomic64_cmpxchg((v), c, dec);
if (likely(old == c))
break;
c = old;
@@ -231,7 +231,7 @@ static inline long atomic64_dec_if_positive(atomic64_t *v)
}

#define ATOMIC64_OP(op) \
-static inline void atomic64_##op(long i, atomic64_t *v) \
+static inline void arch_atomic64_##op(long i, atomic64_t *v) \
{ \
asm volatile(LOCK_PREFIX #op"q %1,%0" \
: "+m" (v->counter) \
@@ -240,11 +240,11 @@ static inline void atomic64_##op(long i, atomic64_t *v) \
}

#define ATOMIC64_FETCH_OP(op, c_op) \
-static inline long atomic64_fetch_##op(long i, atomic64_t *v) \
+static inline long arch_atomic64_fetch_##op(long i, atomic64_t *v) \
{ \
- long old, val = atomic64_read(v); \
+ long old, val = arch_atomic64_read(v); \
for (;;) { \
- old = atomic64_cmpxchg(v, val, val c_op i); \
+ old = arch_atomic64_cmpxchg(v, val, val c_op i); \
if (old == val) \
break; \
val = old; \
diff --git a/arch/x86/include/asm/cmpxchg.h b/arch/x86/include/asm/cmpxchg.h
index 97848cdfcb1a..75f09809666f 100644
--- a/arch/x86/include/asm/cmpxchg.h
+++ b/arch/x86/include/asm/cmpxchg.h
@@ -74,7 +74,7 @@ extern void __add_wrong_size(void)
* use "asm volatile" and "memory" clobbers to prevent gcc from moving
* information around.
*/
-#define xchg(ptr, v) __xchg_op((ptr), (v), xchg, "")
+#define arch_xchg(ptr, v) __xchg_op((ptr), (v), xchg, "")

/*
* Atomic compare and exchange. Compare OLD with MEM, if identical,
@@ -144,13 +144,13 @@ extern void __add_wrong_size(void)
# include <asm/cmpxchg_64.h>
#endif

-#define cmpxchg(ptr, old, new) \
+#define arch_cmpxchg(ptr, old, new) \
__cmpxchg(ptr, old, new, sizeof(*(ptr)))

-#define sync_cmpxchg(ptr, old, new) \
+#define arch_sync_cmpxchg(ptr, old, new) \
__sync_cmpxchg(ptr, old, new, sizeof(*(ptr)))

-#define cmpxchg_local(ptr, old, new) \
+#define arch_cmpxchg_local(ptr, old, new) \
__cmpxchg_local(ptr, old, new, sizeof(*(ptr)))

/*
@@ -179,10 +179,10 @@ extern void __add_wrong_size(void)
__ret; \
})

-#define cmpxchg_double(p1, p2, o1, o2, n1, n2) \
+#define arch_cmpxchg_double(p1, p2, o1, o2, n1, n2) \
__cmpxchg_double(LOCK_PREFIX, p1, p2, o1, o2, n1, n2)

-#define cmpxchg_double_local(p1, p2, o1, o2, n1, n2) \
+#define arch_cmpxchg_double_local(p1, p2, o1, o2, n1, n2) \
__cmpxchg_double(, p1, p2, o1, o2, n1, n2)

#endif /* ASM_X86_CMPXCHG_H */
diff --git a/arch/x86/include/asm/cmpxchg_32.h b/arch/x86/include/asm/cmpxchg_32.h
index e4959d023af8..d897291d2bf9 100644
--- a/arch/x86/include/asm/cmpxchg_32.h
+++ b/arch/x86/include/asm/cmpxchg_32.h
@@ -35,10 +35,10 @@ static inline void set_64bit(volatile u64 *ptr, u64 value)
}

#ifdef CONFIG_X86_CMPXCHG64
-#define cmpxchg64(ptr, o, n) \
+#define arch_cmpxchg64(ptr, o, n) \
((__typeof__(*(ptr)))__cmpxchg64((ptr), (unsigned long long)(o), \
(unsigned long long)(n)))
-#define cmpxchg64_local(ptr, o, n) \
+#define arch_cmpxchg64_local(ptr, o, n) \
((__typeof__(*(ptr)))__cmpxchg64_local((ptr), (unsigned long long)(o), \
(unsigned long long)(n)))
#endif
@@ -75,7 +75,7 @@ static inline u64 __cmpxchg64_local(volatile u64 *ptr, u64 old, u64 new)
* to simulate the cmpxchg8b on the 80386 and 80486 CPU.
*/

-#define cmpxchg64(ptr, o, n) \
+#define arch_cmpxchg64(ptr, o, n) \
({ \
__typeof__(*(ptr)) __ret; \
__typeof__(*(ptr)) __old = (o); \
@@ -92,7 +92,7 @@ static inline u64 __cmpxchg64_local(volatile u64 *ptr, u64 old, u64 new)
__ret; })


-#define cmpxchg64_local(ptr, o, n) \
+#define arch_cmpxchg64_local(ptr, o, n) \
({ \
__typeof__(*(ptr)) __ret; \
__typeof__(*(ptr)) __old = (o); \
diff --git a/arch/x86/include/asm/cmpxchg_64.h b/arch/x86/include/asm/cmpxchg_64.h
index caa23a34c963..fafaebacca2d 100644
--- a/arch/x86/include/asm/cmpxchg_64.h
+++ b/arch/x86/include/asm/cmpxchg_64.h
@@ -6,13 +6,13 @@ static inline void set_64bit(volatile u64 *ptr, u64 val)
*ptr = val;
}

-#define cmpxchg64(ptr, o, n) \
+#define arch_cmpxchg64(ptr, o, n) \
({ \
BUILD_BUG_ON(sizeof(*(ptr)) != 8); \
cmpxchg((ptr), (o), (n)); \
})

-#define cmpxchg64_local(ptr, o, n) \
+#define arch_cmpxchg64_local(ptr, o, n) \
({ \
BUILD_BUG_ON(sizeof(*(ptr)) != 8); \
cmpxchg_local((ptr), (o), (n)); \
diff --git a/include/asm-generic/atomic-instrumented.h b/include/asm-generic/atomic-instrumented.h
new file mode 100644
index 000000000000..41e5e6ffdfc8
--- /dev/null
+++ b/include/asm-generic/atomic-instrumented.h
@@ -0,0 +1,210 @@
+#ifndef _LINUX_ATOMIC_INSTRUMENTED_H
+#define _LINUX_ATOMIC_INSTRUMENTED_H
+
+static __always_inline int atomic_read(const atomic_t *v)
+{
+ return arch_atomic_read(v);
+}
+
+static __always_inline long long atomic64_read(const atomic64_t *v)
+{
+ return arch_atomic64_read(v);
+}
+
+
+static __always_inline void atomic_set(atomic_t *v, int i)
+{
+ arch_atomic_set(v, i);
+}
+
+static __always_inline void atomic64_set(atomic64_t *v, long long i)
+{
+ arch_atomic64_set(v, i);
+}
+
+static __always_inline int atomic_xchg(atomic_t *v, int i)
+{
+ return arch_atomic_xchg(v, i);
+}
+
+static __always_inline long long atomic64_xchg(atomic64_t *v, long long i)
+{
+ return arch_atomic64_xchg(v, i);
+}
+
+static __always_inline int atomic_cmpxchg(atomic_t *v, int old, int new)
+{
+ return arch_atomic_cmpxchg(v, old, new);
+}
+
+static __always_inline long long atomic64_cmpxchg(atomic64_t *v, long long old,
+ long long new)
+{
+ return arch_atomic64_cmpxchg(v, old, new);
+}
+
+static __always_inline int __atomic_add_unless(atomic_t *v, int a, int u)
+{
+ return __arch_atomic_add_unless(v, a, u);
+}
+
+
+static __always_inline bool atomic64_add_unless(atomic64_t *v, long long a,
+ long long u)
+{
+ return arch_atomic64_add_unless(v, a, u);
+}
+
+static __always_inline short int atomic_inc_short(short int *v)
+{
+ return arch_atomic_inc_short(v);
+}
+
+#define __INSTR_VOID1(op, sz) \
+static __always_inline void atomic##sz##_##op(atomic##sz##_t *v) \
+{ \
+ arch_atomic##sz##_##op(v); \
+}
+
+#define INSTR_VOID1(op) \
+__INSTR_VOID1(op,); \
+__INSTR_VOID1(op, 64)
+
+INSTR_VOID1(inc);
+INSTR_VOID1(dec);
+
+#undef __INSTR_VOID1
+#undef INSTR_VOID1
+
+#define __INSTR_VOID2(op, sz, type) \
+static __always_inline void atomic##sz##_##op(type i, atomic##sz##_t *v)\
+{ \
+ arch_atomic##sz##_##op(i, v); \
+}
+
+#define INSTR_VOID2(op) \
+__INSTR_VOID2(op, , int); \
+__INSTR_VOID2(op, 64, long long)
+
+INSTR_VOID2(add);
+INSTR_VOID2(sub);
+INSTR_VOID2(and);
+INSTR_VOID2(or);
+INSTR_VOID2(xor);
+
+#undef __INSTR_VOID2
+#undef INSTR_VOID2
+
+#define __INSTR_RET1(op, sz, type, rtype) \
+static __always_inline rtype atomic##sz##_##op(atomic##sz##_t *v) \
+{ \
+ return arch_atomic##sz##_##op(v); \
+}
+
+#define INSTR_RET1(op) \
+__INSTR_RET1(op, , int, int); \
+__INSTR_RET1(op, 64, long long, long long)
+
+INSTR_RET1(inc_return);
+INSTR_RET1(dec_return);
+__INSTR_RET1(inc_not_zero, 64, long long, long long);
+__INSTR_RET1(dec_if_positive, 64, long long, long long);
+
+#define INSTR_RET_BOOL1(op) \
+__INSTR_RET1(op, , int, bool); \
+__INSTR_RET1(op, 64, long long, bool)
+
+INSTR_RET_BOOL1(dec_and_test);
+INSTR_RET_BOOL1(inc_and_test);
+
+#undef __INSTR_RET1
+#undef INSTR_RET1
+#undef INSTR_RET_BOOL1
+
+#define __INSTR_RET2(op, sz, type, rtype) \
+static __always_inline rtype atomic##sz##_##op(type i, atomic##sz##_t *v) \
+{ \
+ return arch_atomic##sz##_##op(i, v); \
+}
+
+#define INSTR_RET2(op) \
+__INSTR_RET2(op, , int, int); \
+__INSTR_RET2(op, 64, long long, long long)
+
+INSTR_RET2(add_return);
+INSTR_RET2(sub_return);
+INSTR_RET2(fetch_add);
+INSTR_RET2(fetch_sub);
+INSTR_RET2(fetch_and);
+INSTR_RET2(fetch_or);
+INSTR_RET2(fetch_xor);
+
+#define INSTR_RET_BOOL2(op) \
+__INSTR_RET2(op, , int, bool); \
+__INSTR_RET2(op, 64, long long, bool)
+
+INSTR_RET_BOOL2(sub_and_test);
+INSTR_RET_BOOL2(add_negative);
+
+#undef __INSTR_RET2
+#undef INSTR_RET2
+#undef INSTR_RET_BOOL2
+
+/*
+ * In the following macros we need to be careful to not clash with arch_ macros.
+ * arch_xchg() can be defined as an extended statement expression as well,
+ * If we define a __ptr variable and arch_xchg() also defines a __ptr
+ * variable, and we pass __ptr as an argument to arch_xchg(), then the
+ * expansion will use its own __ptr instead of ours, which leads to
+ * unpleasant crashes. To avoid the problem, the following macros declare
+ * variables with lots of underscores.
+ */
+
+#define xchg(ptr, v) \
+({ \
+ __typeof__(ptr) ____ptr = (ptr); \
+ arch_xchg(____ptr, (v)); \
+})
+
+#define cmpxchg(ptr, old, new) \
+({ \
+ __typeof__(ptr) ___ptr = (ptr); \
+ arch_cmpxchg(___ptr, (old), (new)); \
+})
+
+#define sync_cmpxchg(ptr, old, new) \
+({ \
+ __typeof__(ptr) ___ptr = (ptr); \
+ arch_sync_cmpxchg(___ptr, (old), (new)); \
+})
+
+#define cmpxchg_local(ptr, old, new) \
+({ \
+ __typeof__(ptr) ____ptr = (ptr); \
+ arch_cmpxchg_local(____ptr, (old), (new)); \
+})
+
+#define cmpxchg64(ptr, old, new) \
+({ \
+ __typeof__(ptr) ____ptr = (ptr); \
+ arch_cmpxchg64(____ptr, (old), (new)); \
+})
+
+#define cmpxchg64_local(ptr, old, new) \
+({ \
+ __typeof__(ptr) ____ptr = (ptr); \
+ arch_cmpxchg64_local(____ptr, (old), (new)); \
+})
+
+#define cmpxchg_double(p1, p2, o1, o2, n1, n2) \
+({ \
+ __typeof__(p1) ____p1 = (p1); \
+ arch_cmpxchg_double(____p1, (p2), (o1), (o2), (n1), (n2)); \
+})
+
+#define cmpxchg_double_local(p1, p2, o1, o2, n1, n2) \
+({ \
+ __typeof__(p1) ____p1 = (p1); \
+ arch_cmpxchg_double_local(____p1, (p2), (o1), (o2), (n1), (n2));\
+})
+
+#endif /* _LINUX_ATOMIC_INSTRUMENTED_H */
--
2.12.0.367.g23dc2f6d3c-goog

Dmitry Vyukov

Mar 14, 2017, 3:24:22 PM
to mark.r...@arm.com, pet...@infradead.org, arya...@virtuozzo.com, mi...@redhat.com, will....@arm.com, ak...@linux-foundation.org, kasa...@googlegroups.com, linu...@kvack.org, x...@kernel.org, linux-...@vger.kernel.org, Dmitry Vyukov
KASAN uses compiler instrumentation to intercept all memory accesses.
But it does not see memory accesses done in assembly code.
One notable user of assembly code is atomic operations. Frequently,
for example, an atomic reference decrement is the last access to an
object and a good candidate for a racy use-after-free.

Add manual KASAN checks to atomic operations.

Signed-off-by: Dmitry Vyukov <dvy...@google.com>
Cc: Mark Rutland <mark.r...@arm.com>
Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Will Deacon <will....@arm.com>
Cc: Andrew Morton <ak...@linux-foundation.org>
Cc: Andrey Ryabinin <arya...@virtuozzo.com>
Cc: Ingo Molnar <mi...@redhat.com>
Cc: kasa...@googlegroups.com
Cc: linu...@kvack.org
Cc: linux-...@vger.kernel.org
Cc: x...@kernel.org
---
include/asm-generic/atomic-instrumented.h | 36 +++++++++++++++++++++++++++++++
1 file changed, 36 insertions(+)

diff --git a/include/asm-generic/atomic-instrumented.h b/include/asm-generic/atomic-instrumented.h
index 41e5e6ffdfc8..951bcd083925 100644
--- a/include/asm-generic/atomic-instrumented.h
+++ b/include/asm-generic/atomic-instrumented.h
@@ -1,50 +1,72 @@
+/*
+ * This file provides wrappers with KASAN instrumentation for atomic operations.
+ * To use this functionality an arch's atomic.h file needs to define all
+ * atomic operations with arch_ prefix (e.g. arch_atomic_read()) and include
+ * this file at the end. This file provides atomic_read() that forwards to
+ * arch_atomic_read() for the actual atomic operation.
+ * Note: if an arch atomic operation is implemented by means of other atomic
+ * operations (e.g. atomic_read()/atomic_cmpxchg() loop), then it needs to use
+ * arch_ variants (i.e. arch_atomic_read()/arch_atomic_cmpxchg()) to avoid
+ * double instrumentation.
+ */
#ifndef _LINUX_ATOMIC_INSTRUMENTED_H
#define _LINUX_ATOMIC_INSTRUMENTED_H

+#include <linux/kasan-checks.h>
+
static __always_inline int atomic_read(const atomic_t *v)
{
+ kasan_check_read(v, sizeof(*v));
return arch_atomic_read(v);
}

static __always_inline long long atomic64_read(const atomic64_t *v)
{
+ kasan_check_read(v, sizeof(*v));
return arch_atomic64_read(v);
}


static __always_inline void atomic_set(atomic_t *v, int i)
{
+ kasan_check_write(v, sizeof(*v));
arch_atomic_set(v, i);
}

static __always_inline void atomic64_set(atomic64_t *v, long long i)
{
+ kasan_check_write(v, sizeof(*v));
arch_atomic64_set(v, i);
}

static __always_inline int atomic_xchg(atomic_t *v, int i)
{
+ kasan_check_write(v, sizeof(*v));
return arch_atomic_xchg(v, i);
}

static __always_inline long long atomic64_xchg(atomic64_t *v, long long i)
{
+ kasan_check_write(v, sizeof(*v));
return arch_atomic64_xchg(v, i);
}

static __always_inline int atomic_cmpxchg(atomic_t *v, int old, int new)
{
+ kasan_check_write(v, sizeof(*v));
return arch_atomic_cmpxchg(v, old, new);
}

static __always_inline long long atomic64_cmpxchg(atomic64_t *v, long long old,
long long new)
{
+ kasan_check_write(v, sizeof(*v));
return arch_atomic64_cmpxchg(v, old, new);
}

static __always_inline int __atomic_add_unless(atomic_t *v, int a, int u)
{
+ kasan_check_write(v, sizeof(*v));
return __arch_atomic_add_unless(v, a, u);
}

@@ -52,17 +74,20 @@ static __always_inline int __atomic_add_unless(atomic_t *v, int a, int u)
static __always_inline bool atomic64_add_unless(atomic64_t *v, long long a,
long long u)
{
+ kasan_check_write(v, sizeof(*v));
return arch_atomic64_add_unless(v, a, u);
}

static __always_inline short int atomic_inc_short(short int *v)
{
+ kasan_check_write(v, sizeof(*v));
return arch_atomic_inc_short(v);
}

#define __INSTR_VOID1(op, sz) \
static __always_inline void atomic##sz##_##op(atomic##sz##_t *v) \
{ \
+ kasan_check_write(v, sizeof(*v)); \
arch_atomic##sz##_##op(v); \
}

@@ -79,6 +104,7 @@ INSTR_VOID1(dec);
#define __INSTR_VOID2(op, sz, type) \
static __always_inline void atomic##sz##_##op(type i, atomic##sz##_t *v)\
{ \
+ kasan_check_write(v, sizeof(*v)); \
arch_atomic##sz##_##op(i, v); \
}

@@ -98,6 +124,7 @@ INSTR_VOID2(xor);
#define __INSTR_RET1(op, sz, type, rtype) \
static __always_inline rtype atomic##sz##_##op(atomic##sz##_t *v) \
{ \
+ kasan_check_write(v, sizeof(*v)); \
return arch_atomic##sz##_##op(v); \
}

@@ -124,6 +151,7 @@ INSTR_RET_BOOL1(inc_and_test);
#define __INSTR_RET2(op, sz, type, rtype) \
static __always_inline rtype atomic##sz##_##op(type i, atomic##sz##_t *v) \
{ \
+ kasan_check_write(v, sizeof(*v)); \
return arch_atomic##sz##_##op(i, v); \
}

@@ -162,48 +190,56 @@ INSTR_RET_BOOL2(add_negative);
#define xchg(ptr, v) \
({ \
__typeof__(ptr) ____ptr = (ptr); \
+ kasan_check_write(____ptr, sizeof(*____ptr)); \
arch_xchg(____ptr, (v)); \
})

#define cmpxchg(ptr, old, new) \
({ \
__typeof__(ptr) ___ptr = (ptr); \
+ kasan_check_write(___ptr, sizeof(*___ptr)); \
arch_cmpxchg(___ptr, (old), (new)); \
})

#define sync_cmpxchg(ptr, old, new) \
({ \
__typeof__(ptr) ___ptr = (ptr); \
+ kasan_check_write(___ptr, sizeof(*___ptr)); \
arch_sync_cmpxchg(___ptr, (old), (new)); \
})

#define cmpxchg_local(ptr, old, new) \
({ \
__typeof__(ptr) ____ptr = (ptr); \
+ kasan_check_write(____ptr, sizeof(*____ptr)); \
arch_cmpxchg_local(____ptr, (old), (new)); \
})

#define cmpxchg64(ptr, old, new) \
({ \
__typeof__(ptr) ____ptr = (ptr); \
+ kasan_check_write(____ptr, sizeof(*____ptr)); \
arch_cmpxchg64(____ptr, (old), (new)); \
})

#define cmpxchg64_local(ptr, old, new) \
({ \
__typeof__(ptr) ____ptr = (ptr); \
+ kasan_check_write(____ptr, sizeof(*____ptr)); \
arch_cmpxchg64_local(____ptr, (old), (new)); \
})

#define cmpxchg_double(p1, p2, o1, o2, n1, n2) \
({ \
__typeof__(p1) ____p1 = (p1); \
+ kasan_check_write(____p1, 2 * sizeof(*____p1)); \
arch_cmpxchg_double(____p1, (p2), (o1), (o2), (n1), (n2)); \
})

#define cmpxchg_double_local(p1, p2, o1, o2, n1, n2) \
({ \
__typeof__(p1) ____p1 = (p1); \
+ kasan_check_write(____p1, 2 * sizeof(*____p1)); \
arch_cmpxchg_double_local(____p1, (p2), (o1), (o2), (n1), (n2));\
})

--
2.12.0.367.g23dc2f6d3c-goog

Dmitry Vyukov

Mar 14, 2017, 3:25:53 PM
to Mark Rutland, Peter Zijlstra, Will Deacon, Andrew Morton, Andrey Ryabinin, Ingo Molnar, kasan-dev, linu...@kvack.org, LKML, x...@kernel.org
On Tue, Mar 14, 2017 at 4:44 PM, Mark Rutland <mark.r...@arm.com> wrote:
>> > -static __always_inline int atomic_read(const atomic_t *v)
>> > +static __always_inline int arch_atomic_read(const atomic_t *v)
>> > {
>> > - return READ_ONCE((v)->counter);
>> > + return READ_ONCE_NOCHECK((v)->counter);
>>
>> Should NOCHECK come with a comment, because I've no idea why this is so.
>
> I suspect the idea is that given the wrapper will have done the KASAN
> check, duplicating it here is either sub-optimal, or results in
> duplicate splats. READ_ONCE() has an implicit KASAN check,
> READ_ONCE_NOCHECK() does not.
>
> If this is to solve duplicate splats, it'd be worth having a
> WRITE_ONCE_NOCHECK() for arch_atomic_set().
>
> Agreed on the comment, regardless.


Reverted xchg changes.
Added comments re READ_ONCE_NOCHECK() and WRITE_ONCE().
Added file comment.
Split into 3 patches and mailed.

Thanks!

Andrey Ryabinin

Mar 20, 2017, 12:40:17 PM
to Dmitry Vyukov, mark.r...@arm.com, pet...@infradead.org, mi...@redhat.com, will....@arm.com, ak...@linux-foundation.org, kasa...@googlegroups.com, linu...@kvack.org, x...@kernel.org, linux-...@vger.kernel.org
On 03/14/2017 10:24 PM, Dmitry Vyukov wrote:

> diff --git a/arch/x86/include/asm/atomic.h b/arch/x86/include/asm/atomic.h
> index 14635c5ea025..95dd167eb3af 100644
> --- a/arch/x86/include/asm/atomic.h
> +++ b/arch/x86/include/asm/atomic.h
> @@ -16,36 +16,46 @@
> #define ATOMIC_INIT(i) { (i) }
>
> /**
> - * atomic_read - read atomic variable
> + * arch_atomic_read - read atomic variable
> * @v: pointer of type atomic_t
> *
> * Atomically reads the value of @v.
> */
> -static __always_inline int atomic_read(const atomic_t *v)
> +static __always_inline int arch_atomic_read(const atomic_t *v)
> {
> - return READ_ONCE((v)->counter);
> + /*
> + * We use READ_ONCE_NOCHECK() because atomic_read() contains KASAN
> + * instrumentation. Double instrumentation is unnecessary.
> + */
> + return READ_ONCE_NOCHECK((v)->counter);

This is a somewhat questionable "optimization". READ_ONCE_NOCHECK() is actually
a function call to __read_once_size_nocheck(), so the _NOCHECK variant adds
some bloat to the kernel.
E.g. on my .config, removing _NOCHECK() saves ~400K of .text:

size vmlinux_read
text data bss dec hex filename
40548235 16418838 23289856 80256929 4c89fa1 vmlinux_read
size vmlinux_read_nocheck
text data bss dec hex filename
40938418 16451958 23289856 80680232 4cf1528 vmlinux_read_nocheck

Mark Rutland

Mar 20, 2017, 1:17:36 PM
to Dmitry Vyukov, pet...@infradead.org, arya...@virtuozzo.com, mi...@redhat.com, will....@arm.com, ak...@linux-foundation.org, kasa...@googlegroups.com, linu...@kvack.org, x...@kernel.org, linux-...@vger.kernel.org
Hi,

On Tue, Mar 14, 2017 at 08:24:13PM +0100, Dmitry Vyukov wrote:
> /**
> - * atomic_read - read atomic variable
> + * arch_atomic_read - read atomic variable
> * @v: pointer of type atomic_t
> *
> * Atomically reads the value of @v.
> */
> -static __always_inline int atomic_read(const atomic_t *v)
> +static __always_inline int arch_atomic_read(const atomic_t *v)
> {
> - return READ_ONCE((v)->counter);
> + /*
> + * We use READ_ONCE_NOCHECK() because atomic_read() contains KASAN
> + * instrumentation. Double instrumentation is unnecessary.
> + */
> + return READ_ONCE_NOCHECK((v)->counter);
> }

Just to check, we do this to avoid duplicate reports, right?

If so, double instrumentation isn't solely "unnecessary"; it has a
functional difference, and we should explicitly describe that in the
comment.

... or are duplicate reports supressed somehow?

[...]

> +static __always_inline void arch_atomic_set(atomic_t *v, int i)
> {
> + /*
> + * We could use WRITE_ONCE_NOCHECK() if it existed, similar to
> + * READ_ONCE_NOCHECK() in arch_atomic_read(). But there is no such
> + * thing at the moment, and introducing it for this case is not
> + * worth it.
> + */
> WRITE_ONCE(v->counter, i);
> }

If we are trying to avoid duplicate reports, we should do the same here.

[...]

> +static __always_inline short int atomic_inc_short(short int *v)
> +{
> + return arch_atomic_inc_short(v);
> +}

This is x86-specific, and AFAICT, not used anywhere.

Given that it is arch-specific, I don't think it should be instrumented
here. If it isn't used, we could get rid of it entirely...

Thanks,
Mark.

Andrey Ryabinin

Mar 21, 2017, 5:23:51 AM
to Mark Rutland, Dmitry Vyukov, pet...@infradead.org, mi...@redhat.com, will....@arm.com, ak...@linux-foundation.org, kasa...@googlegroups.com, linu...@kvack.org, x...@kernel.org, linux-...@vger.kernel.org
On 03/20/2017 08:17 PM, Mark Rutland wrote:
> Hi,
>
> On Tue, Mar 14, 2017 at 08:24:13PM +0100, Dmitry Vyukov wrote:
>> /**
>> - * atomic_read - read atomic variable
>> + * arch_atomic_read - read atomic variable
>> * @v: pointer of type atomic_t
>> *
>> * Atomically reads the value of @v.
>> */
>> -static __always_inline int atomic_read(const atomic_t *v)
>> +static __always_inline int arch_atomic_read(const atomic_t *v)
>> {
>> - return READ_ONCE((v)->counter);
>> + /*
>> + * We use READ_ONCE_NOCHECK() because atomic_read() contains KASAN
>> + * instrumentation. Double instrumentation is unnecessary.
>> + */
>> + return READ_ONCE_NOCHECK((v)->counter);
>> }
>
> Just to check, we do this to avoid duplicate reports, right?
>
> If so, double instrumentation isn't solely "unnecessary"; it has a
> functional difference, and we should explicitly describe that in the
> comment.
>
> ... or are duplicate reports suppressed somehow?
>

They are not suppressed yet. But I think we should just switch kasan to single-shot mode,
i.e. report only the first error. A single bug quite often triggers multiple invalid memory
accesses, causing a storm in dmesg. Also, an out-of-bounds write might corrupt metadata, so
the next report will print bogus alloc/free stack traces.
In most cases we only need to look at the first report, so reporting anything after the first
is just counterproductive.
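The single-shot idea above can be sketched in userspace C. This is a simplified stand-in for the kernel's reporting path, not its actual implementation; the flag name and the stubbed-out `kasan_report()` body are illustrative only:

```c
#include <assert.h>
#include <stdatomic.h>

static atomic_flag already_reported = ATOMIC_FLAG_INIT;
static int reports_printed;

/* Simplified stand-in for the KASAN report path: only the first
 * caller wins the test-and-set; every later report is dropped, so
 * a single bug can't flood the log with follow-on reports. */
static void kasan_report(const char *msg)
{
        (void)msg; /* a real implementation would dump the report here */
        if (atomic_flag_test_and_set(&already_reported))
                return; /* not the first report: suppress it */
        reports_printed++;
}
```

The test-and-set also makes the "first report wins" decision race-free when several CPUs hit bad accesses at once.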



Mark Rutland

Mar 21, 2017, 6:42:04 AM3/21/17
to Andrey Ryabinin, Dmitry Vyukov, pet...@infradead.org, mi...@redhat.com, will....@arm.com, ak...@linux-foundation.org, kasa...@googlegroups.com, linu...@kvack.org, x...@kernel.org, linux-...@vger.kernel.org
FWIW, that sounds sane to me.

Given that, I agree with your comment regarding READ_ONCE{,_NOCHECK}().

If anyone really wants all the reports, we could have a boot-time option
to do that.

Thanks,
Mark.

Dmitry Vyukov

Mar 21, 2017, 2:06:36 PM3/21/17
to Mark Rutland, Andrey Ryabinin, Peter Zijlstra, Ingo Molnar, Will Deacon, Andrew Morton, kasan-dev, linu...@kvack.org, x...@kernel.org, LKML
I don't mind changing READ_ONCE_NOCHECK to READ_ONCE. But I don't have
a strong preference either way.

We could do:
#define arch_atomic_read_is_already_instrumented 1
and then skip instrumentation in asm-generic if it's defined. But I
don't think it's worth it.
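The opt-out sketched above could look roughly like this. Note the macro name is Dmitry's hypothetical suggestion, not an existing kernel interface, and the checking function is a counting stub for illustration:

```c
#include <assert.h>

static int checks_performed;

/* Stub standing in for KASAN's check; just counts invocations. */
static void kasan_check_read(const volatile void *p, unsigned int size)
{
        (void)p;
        (void)size;
        checks_performed++;
}

typedef struct { int counter; } atomic_t;

/* The arch version announces that it already checks its argument
 * (in the real patch, via instrumented READ_ONCE()), so the generic
 * wrapper can skip its own check and avoid double instrumentation. */
#define arch_atomic_read_is_already_instrumented 1

static inline int arch_atomic_read(const atomic_t *v)
{
        kasan_check_read(v, sizeof(*v));
        return v->counter;
}

static inline int atomic_read(const atomic_t *v)
{
#ifndef arch_atomic_read_is_already_instrumented
        kasan_check_read(v, sizeof(*v));
#endif
        return arch_atomic_read(v);
}
```

A read then performs exactly one check instead of two; an architecture that does not define the macro gets the generic check as before.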

There is no functional difference, it's only an optimization (now
somewhat questionable). As Andrey said, one can get a splash of
reports anyway, and it's the first one that is important. We use KASAN
with panic_on_warn=1 so we don't even see the rest.

Arnd Bergmann

Mar 21, 2017, 5:20:42 PM3/21/17
to Dmitry Vyukov, Mark Rutland, Andrey Ryabinin, Peter Zijlstra, Ingo Molnar, Will Deacon, Andrew Morton, kasan-dev, linu...@kvack.org, x...@kernel.org, LKML
I'm getting a couple of new stack-size warnings that are all the result
of the _NOCHECK.

/git/arm-soc/mm/page_alloc.c: In function 'show_free_areas':
/git/arm-soc/mm/page_alloc.c:4685:1: error: the frame size of 3368
bytes is larger than 3072 bytes [-Werror=frame-larger-than=]
}
/git/arm-soc/lib/atomic64_test.c: In function 'test_atomic':
/git/arm-soc/lib/atomic64_test.c:148:1: error: the frame size of 6528
bytes is larger than 3072 bytes [-Werror=frame-larger-than=]
}
^
/git/arm-soc/lib/atomic64_test.c: In function 'test_atomic64':
/git/arm-soc/lib/atomic64_test.c:243:1: error: the frame size of 7112
bytes is larger than 3072 bytes [-Werror=frame-larger-than=]

This is with my previous set of patches already applied, so
READ_ONCE should not cause problems. Reverting
the READ_ONCE_NOCHECK() in atomic_read() and atomic64_read()
back to READ_ONCE() makes the warnings go away.

I also get a build failure as a result of your patch, but this one is
not addressed by using READ_ONCE():

In file included from /git/arm-soc/arch/x86/include/asm/atomic.h:7:0,
from /git/arm-soc/include/linux/atomic.h:4,
from /git/arm-soc/arch/x86/include/asm/thread_info.h:53,
from /git/arm-soc/include/linux/thread_info.h:25,
from /git/arm-soc/arch/x86/include/asm/preempt.h:6,
from /git/arm-soc/include/linux/preempt.h:80,
from /git/arm-soc/include/linux/spinlock.h:50,
from /git/arm-soc/include/linux/mmzone.h:7,
from /git/arm-soc/include/linux/gfp.h:5,
from /git/arm-soc/include/linux/mm.h:9,
from /git/arm-soc/mm/slub.c:12:
/git/arm-soc/mm/slub.c: In function '__slab_free':
/git/arm-soc/arch/x86/include/asm/cmpxchg.h:174:2: error: 'asm'
operand has impossible constraints
asm volatile(pfx "cmpxchg%c4b %2; sete %0" \
^
/git/arm-soc/arch/x86/include/asm/cmpxchg.h:183:2: note: in expansion
of macro '__cmpxchg_double'
__cmpxchg_double(LOCK_PREFIX, p1, p2, o1, o2, n1, n2)
^~~~~~~~~~~~~~~~
/git/arm-soc/include/asm-generic/atomic-instrumented.h:236:2: note: in
expansion of macro 'arch_cmpxchg_double'
arch_cmpxchg_double(____p1, (p2), (o1), (o2), (n1), (n2)); \
^~~~~~~~~~~~~~~~~~~
/git/arm-soc/mm/slub.c:385:7: note: in expansion of macro 'cmpxchg_double'
if (cmpxchg_double(&page->freelist, &page->counters,
^~~~~~~~~~~~~~
/git/arm-soc/scripts/Makefile.build:308: recipe for target 'mm/slub.o' failed

http://pastebin.com/raw/qXVpi9Ev has the defconfig file I used, and I get the
error with any gcc version I tried (4.9 through 7.0.1).

Arnd

Dmitry Vyukov

Mar 22, 2017, 6:43:09 AM3/22/17
to Arnd Bergmann, Mark Rutland, Andrey Ryabinin, Peter Zijlstra, Ingo Molnar, Will Deacon, Andrew Morton, kasan-dev, linu...@kvack.org, x...@kernel.org, LKML
Initially I tested with my stock gcc 4.8.4 (Ubuntu
4.8.4-2ubuntu1~14.04.3) and, amusingly, it worked. But I can reproduce
the bug with 7.0.1.
Filed https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80148
Will think about kernel fix.
Thanks!

Arnd Bergmann

Mar 22, 2017, 7:30:29 AM3/22/17
to Dmitry Vyukov, Mark Rutland, Andrey Ryabinin, Peter Zijlstra, Ingo Molnar, Will Deacon, Andrew Morton, kasan-dev, linu...@kvack.org, x...@kernel.org, LKML
It's probably because gcc-4.8 didn't support KASAN yet, so the added
check had no effect.

Arnd

Dmitry Vyukov

Mar 22, 2017, 8:15:02 AM3/22/17
to Arnd Bergmann, Mark Rutland, Andrey Ryabinin, Peter Zijlstra, Ingo Molnar, Will Deacon, Andrew Morton, kasan-dev, linu...@kvack.org, x...@kernel.org, LKML
I've tested without KASAN with both compilers.

Arnd Bergmann

Mar 22, 2017, 8:48:38 AM3/22/17
to Dmitry Vyukov, Mark Rutland, Andrey Ryabinin, Peter Zijlstra, Ingo Molnar, Will Deacon, Andrew Morton, kasan-dev, linu...@kvack.org, x...@kernel.org, LKML
Ah, OK. I had not realized that this happened even without KASAN. I only saw this
problem in one out of hundreds of defconfig builds and assumed it was related,
since it came from a KASAN change.

Arnd

Ingo Molnar

Mar 24, 2017, 2:52:07 AM3/24/17
to Dmitry Vyukov, mark.r...@arm.com, pet...@infradead.org, arya...@virtuozzo.com, mi...@redhat.com, will....@arm.com, ak...@linux-foundation.org, kasa...@googlegroups.com, linu...@kvack.org, x...@kernel.org, linux-...@vger.kernel.org, Peter Zijlstra, Linus Torvalds, Thomas Gleixner, H. Peter Anvin
Ugh, that's disgusting really...

>
> diff --git a/arch/x86/include/asm/atomic.h b/arch/x86/include/asm/atomic.h
> index 14635c5ea025..95dd167eb3af 100644
> --- a/arch/x86/include/asm/atomic.h
> +++ b/arch/x86/include/asm/atomic.h
> @@ -16,36 +16,46 @@
> #define ATOMIC_INIT(i) { (i) }
>
> /**
> - * atomic_read - read atomic variable
> + * arch_atomic_read - read atomic variable
> * @v: pointer of type atomic_t
> *
> * Atomically reads the value of @v.
> */
> -static __always_inline int atomic_read(const atomic_t *v)
> +static __always_inline int arch_atomic_read(const atomic_t *v)
> {
> - return READ_ONCE((v)->counter);
> + /*
> + * We use READ_ONCE_NOCHECK() because atomic_read() contains KASAN
> + * instrumentation. Double instrumentation is unnecessary.
> + */
> + return READ_ONCE_NOCHECK((v)->counter);
> }

Firstly, the patch is way too large; please split off the new documentation parts
of the patch to reduce the size and to make it easier to read!

Secondly, the next patch should do the rename to arch_atomic_*() pattern - and
nothing else:

>
> /**
> - * atomic_set - set atomic variable
> + * arch_atomic_set - set atomic variable
> * @v: pointer of type atomic_t
> * @i: required value
> *
> * Atomically sets the value of @v to @i.
> */
> -static __always_inline void atomic_set(atomic_t *v, int i)
> +static __always_inline void arch_atomic_set(atomic_t *v, int i)


Third, the prototype CPP complications are just utterly disgusting: they turn
perfectly readable code into an unreadable, unmaintainable mess.

You need to find some better, cleaner solution please, or convince me that no such
solution is possible. NAK for the time being.

Thanks,

Ingo

Dmitry Vyukov

Mar 24, 2017, 3:15:07 AM3/24/17
to Ingo Molnar, Mark Rutland, Peter Zijlstra, Andrey Ryabinin, Ingo Molnar, Will Deacon, Andrew Morton, kasan-dev, linu...@kvack.org, x...@kernel.org, LKML, Peter Zijlstra, Linus Torvalds, Thomas Gleixner, H. Peter Anvin
Hello Ingo,

> Firstly, the patch is way too large, please split off new the documentation parts
> of the patch to reduce the size and to make it easier to read!
>
> Secondly, the next patch should do the rename to arch_atomic_*() pattern - and
> nothing else:

Next after what? Please provide the full list of patches as you see them.
How do we avoid build breakage if we do only the rename in a separate patch?
Well, I can just write all functions out as-is. Does that better conform to
kernel style? I've just looked at the x86 atomic.h and it uses macros
for a similar purpose (ATOMIC_OP/ATOMIC_FETCH_OP), so I thought that
must be idiomatic kernel style...

Dmitry Vyukov

Mar 24, 2017, 4:39:56 AM3/24/17
to Ingo Molnar, Mark Rutland, Peter Zijlstra, Andrey Ryabinin, Ingo Molnar, Will Deacon, Andrew Morton, kasan-dev, linu...@kvack.org, x...@kernel.org, LKML, Peter Zijlstra, Linus Torvalds, Thomas Gleixner, H. Peter Anvin
Stephen Rothwell reported that this patch conflicts with:
a9ebf306f52c ("locking/atomic: Introduce atomic_try_cmpxchg()")
e6790e4b5d5e ("locking/atomic/x86: Use atomic_try_cmpxchg()")
does it make sense to base my patch on the tree where these patches
were added and then submit to that tree?

I've also sent 2 fixes for this patch; if I resend it, I should also squash
in these fixes, right?

Ingo Molnar

Mar 24, 2017, 6:57:04 AM3/24/17
to Dmitry Vyukov, Mark Rutland, Peter Zijlstra, Andrey Ryabinin, Ingo Molnar, Will Deacon, Andrew Morton, kasan-dev, linu...@kvack.org, x...@kernel.org, LKML, Peter Zijlstra, Linus Torvalds, Thomas Gleixner, H. Peter Anvin
I think writing the prototypes out as-is, properly organized, beats any of these
macro based solutions.

> [...] I've just looked at the x86 atomic.h and it uses macros for similar
> purpose (ATOMIC_OP/ATOMIC_FETCH_OP), so I thought that must be idiomatic kernel
> style...

Mind fixing those too while at it?

And please squash any bug fixes and re-send a clean series against latest upstream
or so.

Thanks,

Ingo

Dmitry Vyukov

Mar 24, 2017, 8:46:21 AM3/24/17
to Ingo Molnar, Mark Rutland, Peter Zijlstra, Andrey Ryabinin, Ingo Molnar, Will Deacon, Andrew Morton, kasan-dev, linu...@kvack.org, x...@kernel.org, LKML, Peter Zijlstra, Linus Torvalds, Thomas Gleixner, H. Peter Anvin
On Fri, Mar 24, 2017 at 11:57 AM, Ingo Molnar <mi...@kernel.org> wrote:
>
> * Dmitry Vyukov <dvy...@google.com> wrote:
>
>> > The prototype CPP complications are just utterly disgusting: they turn
>> > perfectly readable code into an unreadable, unmaintainable mess.
>> >
>> > You need to find some better, cleaner solution please, or convince me that no
>> > such solution is possible. NAK for the time being.
>>
>> Well, I can just write all functions out as-is. Does that better conform to
>> kernel style?
>
> I think writing the prototypes out as-is, properly organized, beats any of these
> macro based solutions.

You mean write out the prototypes, but use what for definitions? Macros again?

>> [...] I've just looked at the x86 atomic.h and it uses macros for similar
>> purpose (ATOMIC_OP/ATOMIC_FETCH_OP), so I thought that must be idiomatic kernel
>> style...
>
> Mind fixing those too while at it?

I don't mind once I understand how exactly you want it to look.

Ingo Molnar

Mar 28, 2017, 3:52:36 AM3/28/17
to Dmitry Vyukov, Mark Rutland, Peter Zijlstra, Andrey Ryabinin, Ingo Molnar, Will Deacon, Andrew Morton, kasan-dev, linu...@kvack.org, x...@kernel.org, LKML, Peter Zijlstra, Linus Torvalds, Thomas Gleixner, H. Peter Anvin

* Dmitry Vyukov <dvy...@google.com> wrote:

> On Fri, Mar 24, 2017 at 11:57 AM, Ingo Molnar <mi...@kernel.org> wrote:
> >
> > * Dmitry Vyukov <dvy...@google.com> wrote:
> >
> >> > The prototype CPP complications are just utterly disgusting: they turn
> >> > perfectly readable code into an unreadable, unmaintainable mess.
> >> >
> >> > You need to find some better, cleaner solution please, or convince me that no
> >> > such solution is possible. NAK for the time being.
> >>
> >> Well, I can just write all functions out as-is. Does that better conform to
> >> kernel style?
> >
> > I think writing the prototypes out as-is, properly organized, beats any of these
> > macro based solutions.
>
> You mean write out the prototypes, but use what for definitions? Macros again?

No, regular C code.

I don't see the point of generating all this code via CPP - it's certainly not
making it more readable to me. I.e. this patch I commented on is a step backwards
for readability.

I'd prefer repetition and a higher overall line count over complex CPP constructs.

Thanks,

Ingo

Peter Zijlstra

Mar 28, 2017, 5:27:20 AM3/28/17
to Ingo Molnar, Dmitry Vyukov, Mark Rutland, Andrey Ryabinin, Ingo Molnar, Will Deacon, Andrew Morton, kasan-dev, linu...@kvack.org, x...@kernel.org, LKML, Linus Torvalds, Thomas Gleixner, H. Peter Anvin
On Tue, Mar 28, 2017 at 09:52:32AM +0200, Ingo Molnar wrote:

> No, regular C code.
>
> I don't see the point of generating all this code via CPP - it's certainly not
> making it more readable to me. I.e. this patch I commented on is a step backwards
> for readability.

Note that much of the atomic stuff we have today is all CPP already.

x86 is the exception because it's 'weird', but most other archs are
almost pure CPP -- check Alpha for example, or asm-generic/atomic.h.

Also, look at linux/atomic.h, its a giant maze of CPP.

The CPP helps us generate functions, reduces endless copy/paste (which
induces random differences -- read: bugs), and constructs variants
depending on the architecture input.

Yes, the CPP is a pain, but writing all that out explicitly is more of a
pain.



I've not yet looked too hard at these patches under consideration, and I
really wish we could get the compiler to do the right thing here, but
reducing the endless copy/paste that otherwise results is something I've
found to be very valuable.

Not to mention that adding additional atomic ops became trivial (the set is
now near complete, so that's not much of an argument anymore -- but it
was; it's what kept me sane while sanitizing the atomic ops across all our 25+
architectures).
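As a rough illustration of the kind of CPP generation being defended here — simplified and deliberately non-atomic; the real ATOMIC_OP templates expand to per-architecture inline asm or barriers:

```c
#include <assert.h>

typedef struct { int counter; } atomic_t;

/* One template stamps out a whole family of operations; adding a new
 * op is one extra ATOMIC_OP() line rather than another copy/pasted
 * function body that can silently drift out of sync. */
#define ATOMIC_OP(op, c_op)                                     \
static inline void atomic_##op(int i, atomic_t *v)             \
{                                                              \
        v->counter c_op i;                                     \
}

#define ATOMIC_FETCH_OP(op, c_op)                              \
static inline int atomic_fetch_##op(int i, atomic_t *v)       \
{                                                              \
        int old = v->counter;                                  \
        v->counter c_op i;                                     \
        return old;                                            \
}

ATOMIC_OP(add, +=)
ATOMIC_OP(sub, -=)
ATOMIC_OP(or,  |=)
ATOMIC_FETCH_OP(add, +=)
```

Four macro invocations generate four distinct functions (`atomic_add`, `atomic_sub`, `atomic_or`, `atomic_fetch_add`); writing those out by hand is exactly the copy/paste the thread is arguing about.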

Dmitry Vyukov

Mar 28, 2017, 5:46:27 AM3/28/17
to Peter Zijlstra, Ingo Molnar, Mark Rutland, Andrey Ryabinin, Ingo Molnar, Will Deacon, Andrew Morton, kasan-dev, linu...@kvack.org, x...@kernel.org, LKML, Linus Torvalds, Thomas Gleixner, H. Peter Anvin
I am almost done with Ingo's proposal, including de-macro-ifying x86
atomic ops code.
I am ready to do either of them, I think both have pros and cons and
there is no perfect solution. But please agree on something.


While we are here, one thing that I noticed is that 32-bit atomic code
uses 'long long' for 64-bit operands, while 64-bit code uses 'long'
for 64-bit operands. This sort of worked before, but
ultimately it makes it impossible to write any portable code (e.g. you
don't know what format specifier to use to print the return value of
atomic64_read, nor what local variable type to use to avoid compiler
warnings). With try_cmpxchg it becomes worse, because 'long *' is
not convertible to 'long long *', so it is not possible to write any
portable code that uses it. If you declare the 'old' variable as 'long',
32-bit code won't compile; if you declare it as 'long long', 64-bit
code won't compile. I think we need to switch to a single type for
64-bit operands/return values, e.g. 'long long'.
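A sketch of the proposed fix — one fixed type ('long long') for 64-bit operands regardless of word size — as a userspace stand-in (not the kernel's actual atomic64 implementation; the try_cmpxchg here is a plain non-atomic model of the interface):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* If atomic64_t always wraps 'long long', callers can portably use
 * 'long long' locals and the %lld format specifier on both 32-bit
 * and 64-bit builds, instead of guessing between 'long' and
 * 'long long'. */
typedef struct { long long counter; } atomic64_t;

static inline long long atomic64_read(const atomic64_t *v)
{
        return v->counter;
}

/* Non-atomic model of the try_cmpxchg interface: 'old' has a fixed
 * type, so the same caller code compiles on either word size. */
static inline int atomic64_try_cmpxchg(atomic64_t *v, long long *old,
                                       long long new)
{
        if (v->counter == *old) {
                v->counter = new;
                return 1;
        }
        *old = v->counter;
        return 0;
}
```

With the mixed long/long long scheme the thread describes, the `long long *old` parameter above would have to change type per architecture, and no single caller declaration could satisfy both.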

Ingo Molnar

Mar 28, 2017, 5:51:54 AM3/28/17
to Peter Zijlstra, Dmitry Vyukov, Mark Rutland, Andrey Ryabinin, Ingo Molnar, Will Deacon, Andrew Morton, kasan-dev, linu...@kvack.org, x...@kernel.org, LKML, Linus Torvalds, Thomas Gleixner, H. Peter Anvin

* Peter Zijlstra <pet...@infradead.org> wrote:

> On Tue, Mar 28, 2017 at 09:52:32AM +0200, Ingo Molnar wrote:
>
> > No, regular C code.
> >
> > I don't see the point of generating all this code via CPP - it's certainly not
> > making it more readable to me. I.e. this patch I commented on is a step backwards
> > for readability.
>
> Note that much of the atomic stuff we have today is all CPP already.

Yeah, but there it's implementational: we pick up arch primitives depending on
whether they are defined, such as:

#ifndef atomic_read_acquire
# define atomic_read_acquire(v) smp_load_acquire(&(v)->counter)
#endif

> x86 is the exception because its 'weird', but most other archs are
> almost pure CPP -- check Alpha for example, or asm-generic/atomic.h.

include/asm-generic/atomic.h looks pretty clean and readable overall.

> Also, look at linux/atomic.h, its a giant maze of CPP.

Nah, that's OK: much of it is essentially __weak inlines implemented via CPP -
i.e. CPP is filling in a missing compiler feature.

But the patch I replied to appears to add instrumentation wrappers via CPP, which
looks like excessive and avoidable obfuscation to me.

If it's much more readable and much more compact than the C version then maybe,
but I'd like to see the C version first and see ...

> The CPP help us generate functions, reduces endless copy/paste (which induces
> random differences -- read bugs) and construct variants depending on the
> architecture input.
>
> Yes, the CPP is a pain, but writing all that out explicitly is more of a
> pain.

So I'm not convinced that it's true in this case.

Could we see the C version and compare? I could be wrong about it all.

Thanks,

Ingo

Dmitry Vyukov

Mar 28, 2017, 5:56:53 AM3/28/17
to Ingo Molnar, Peter Zijlstra, Mark Rutland, Andrey Ryabinin, Ingo Molnar, Will Deacon, Andrew Morton, kasan-dev, linu...@kvack.org, x...@kernel.org, LKML, Linus Torvalds, Thomas Gleixner, H. Peter Anvin
Here it is (without instrumentation):
https://gist.github.com/dvyukov/e33d580f701019e0cd99429054ff1f9a

Instrumentation will add for each function:

static __always_inline void atomic64_set(atomic64_t *v, long long i)
{
+ kasan_check_write(v, sizeof(*v));
arch_atomic64_set(v, i);
}

Ingo Molnar

Mar 28, 2017, 6:15:36 AM3/28/17
to Dmitry Vyukov, Peter Zijlstra, Mark Rutland, Andrey Ryabinin, Ingo Molnar, Will Deacon, Andrew Morton, kasan-dev, linu...@kvack.org, x...@kernel.org, LKML, Linus Torvalds, Thomas Gleixner, H. Peter Anvin

* Dmitry Vyukov <dvy...@google.com> wrote:

> > So I'm not convinced that it's true in this case.
> >
> > Could we see the C version and compare? I could be wrong about it all.
>
> Here it is (without instrumentation):
> https://gist.github.com/dvyukov/e33d580f701019e0cd99429054ff1f9a

Could you please include the full patch so that it can be discussed via email and
such?

> Instrumentation will add for each function:
>
> static __always_inline void atomic64_set(atomic64_t *v, long long i)
> {
> + kasan_check_write(v, sizeof(*v));
> arch_atomic64_set(v, i);
> }

That in itself looks sensible and readable.

Thanks,

Ingo

Dmitry Vyukov

Mar 28, 2017, 12:30:01 PM3/28/17
to Ingo Molnar, Peter Zijlstra, Mark Rutland, Andrey Ryabinin, Ingo Molnar, Will Deacon, Andrew Morton, kasan-dev, linu...@kvack.org, x...@kernel.org, LKML, Linus Torvalds, Thomas Gleixner, H. Peter Anvin
On Tue, Mar 28, 2017 at 12:15 PM, Ingo Molnar <mi...@kernel.org> wrote:
>
> * Dmitry Vyukov <dvy...@google.com> wrote:
>
>> > So I'm not convinced that it's true in this case.
>> >
>> > Could we see the C version and compare? I could be wrong about it all.
>>
>> Here it is (without instrumentation):
>> https://gist.github.com/dvyukov/e33d580f701019e0cd99429054ff1f9a
>
> Could you please include the full patch so that it can be discussed via email and
> such?


Mailed the whole series.

Andrew Morton

Mar 30, 2017, 6:30:19 PM3/30/17
to Dmitry Vyukov, mark.r...@arm.com, pet...@infradead.org, arya...@virtuozzo.com, mi...@redhat.com, will....@arm.com, kasa...@googlegroups.com, linu...@kvack.org, x...@kernel.org, linux-...@vger.kernel.org
On Tue, 14 Mar 2017 20:24:11 +0100 Dmitry Vyukov <dvy...@google.com> wrote:

> KASAN uses compiler instrumentation to intercept all memory accesses.
> But it does not see memory accesses done in assembly code.
> One notable user of assembly code is atomic operations. Frequently,
> for example, an atomic reference decrement is the last access to an
> object and a good candidate for a racy use-after-free.

I'm getting a pile of build errors from this patchset (and related
patches) - probably due to messed-up merge fixing; the review
process has been a bit bumpy.

So I'll drop

kasan-allow-kasan_check_read-write-to-accept-pointers-to-volatiles.patch
asm-generic-x86-wrap-atomic-operations.patch
asm-generic-x86-wrap-atomic-operations-fix.patch
asm-generic-add-kasan-instrumentation-to-atomic-operations.patch
asm-generic-fix-compilation-failure-in-cmpxchg_double.patch
x86-remove-unused-atomic_inc_short.patch
x86-asm-generic-add-kasan-instrumentation-to-bitops.patch

for now. Please resend (against -mm or linux-next) when the dust has
settled.
