Kernel Concurrency Sanitizer (KCSAN)


Marco Elver

Sep 20, 2019, 10:19:10 AM
to kasan-dev, LKML, Dmitry Vyukov, Andrey Konovalov, Alexander Potapenko, pau...@linux.ibm.com, Paul Turner, Daniel Axtens, Anatol Pomazau, Will Deacon, Andrea Parri, st...@rowland.harvard.edu, aki...@gmail.com, npi...@gmail.com, boqun...@gmail.com, dlu...@nvidia.com, j.al...@ucl.ac.uk, luc.ma...@inria.fr
Hi all,

We would like to share a new data-race detector for the Linux kernel:
Kernel Concurrency Sanitizer (KCSAN) --
https://github.com/google/ktsan/wiki/KCSAN (Details:
https://github.com/google/ktsan/blob/kcsan/Documentation/dev-tools/kcsan.rst)

To those of you who we mentioned at LPC that we're working on a
watchpoint-based KTSAN inspired by DataCollider [1], this is it (we
renamed it to KCSAN to avoid confusion with KTSAN).
[1] http://usenix.org/legacy/events/osdi10/tech/full_papers/Erickson.pdf

In the coming weeks we're planning to:
* Set up a syzkaller instance.
* Share the dashboard so that you can see the races that are found.
* Attempt to send fixes for some races upstream (if you find that the
kcsan-with-fixes branch contains an important fix, please feel free to
point it out and we'll prioritize that).

There are a few open questions:
* The big one: most of the reported races are due to unmarked
accesses; prioritization or pruning of races to focus initial efforts
to fix races might be required. Comments on how best to proceed are
welcome. We're aware that these are issues that have recently received
attention in the context of the LKMM
(https://lwn.net/Articles/793253/).
* How/when to upstream KCSAN?

Feel free to test and send feedback.

Thanks,
-- Marco

Will Deacon

Sep 20, 2019, 11:54:28 AM
to Marco Elver, kasan-dev, LKML, Dmitry Vyukov, Andrey Konovalov, Alexander Potapenko, pau...@linux.ibm.com, Paul Turner, Daniel Axtens, Anatol Pomazau, Andrea Parri, st...@rowland.harvard.edu, aki...@gmail.com, npi...@gmail.com, boqun...@gmail.com, dlu...@nvidia.com, j.al...@ucl.ac.uk, luc.ma...@inria.fr
Hi Marco,

On Fri, Sep 20, 2019 at 04:18:57PM +0200, Marco Elver wrote:
> We would like to share a new data-race detector for the Linux kernel:
> Kernel Concurrency Sanitizer (KCSAN) --
> https://github.com/google/ktsan/wiki/KCSAN (Details:
> https://github.com/google/ktsan/blob/kcsan/Documentation/dev-tools/kcsan.rst)
>
> To those of you who we mentioned at LPC that we're working on a
> watchpoint-based KTSAN inspired by DataCollider [1], this is it (we
> renamed it to KCSAN to avoid confusion with KTSAN).
> [1] http://usenix.org/legacy/events/osdi10/tech/full_papers/Erickson.pdf

Oh, spiffy!

> In the coming weeks we're planning to:
> * Set up a syzkaller instance.
> * Share the dashboard so that you can see the races that are found.
> * Attempt to send fixes for some races upstream (if you find that the
> kcsan-with-fixes branch contains an important fix, please feel free to
> point it out and we'll prioritize that).

Curious: do you take into account things like alignment and/or access size
when looking at READ_ONCE/WRITE_ONCE? Perhaps you could initially prune
naturally aligned accesses for which __native_word() is true?
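
For illustration, a minimal sketch of what such a pruning filter could look
like (the helper and where it would be called from are hypothetical; only the
size/alignment rule mirrors __native_word() from <linux/compiler_types.h>):

        #include <linux/types.h>

        /* Hypothetical: true for accesses that are machine-word sized and
         * naturally aligned, i.e. the ones Will suggests pruning first. */
        static bool is_aligned_native_word_access(unsigned long addr, size_t size)
        {
                bool native = size == sizeof(char) || size == sizeof(short) ||
                              size == sizeof(int)  || size == sizeof(long);

                /* Naturally aligned: the address is a multiple of the size. */
                return native && (addr & (size - 1)) == 0;
        }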

> There are a few open questions:
> * The big one: most of the reported races are due to unmarked
> accesses; prioritization or pruning of races to focus initial efforts
> to fix races might be required. Comments on how best to proceed are
> welcome. We're aware that these are issues that have recently received
> attention in the context of the LKMM
> (https://lwn.net/Articles/793253/).

This one is tricky. What I think we need to avoid is an onslaught of
patches adding READ_ONCE/WRITE_ONCE without a concrete analysis of the
code being modified. My worry is that Joe Developer is eager to get their
first patch into the kernel, so runs this tool and starts spamming
maintainers with these things to the point that they start ignoring KCSAN
reports altogether because of the time they take up.

I suppose one thing we could do is to require each new READ_ONCE/WRITE_ONCE
to have a comment describing the racy access, a bit like we do for memory
barriers. Another possibility would be to use atomic_t more widely if
there is genuine concurrency involved.
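
For illustration, a hypothetical example of that commenting convention; the
struct, its fields and the rationale in the comment are invented, not taken
from real kernel code:

        #include <linux/atomic.h>
        #include <linux/compiler.h>

        struct foo {
                atomic_t refcnt;        /* genuine concurrency: use atomic_t */
                unsigned long seen;     /* best-effort statistic, see below */
        };

        static inline void foo_mark_seen(struct foo *f)
        {
                /*
                 * Data race with lockless readers of f->seen: a reader may
                 * observe a slightly stale value, which is acceptable because
                 * 'seen' is only a best-effort statistic.
                 */
                WRITE_ONCE(f->seen, READ_ONCE(f->seen) + 1);
        }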

> * How/when to upstream KCSAN?

Start by posting the patches :)

Will

Mark Rutland

Sep 20, 2019, 12:31:32 PM
to Marco Elver, kasan-dev, LKML, Dmitry Vyukov, Andrey Konovalov, Alexander Potapenko, pau...@linux.ibm.com, Paul Turner, Daniel Axtens, Anatol Pomazau, Will Deacon, Andrea Parri, st...@rowland.harvard.edu, aki...@gmail.com, npi...@gmail.com, boqun...@gmail.com, dlu...@nvidia.com, j.al...@ucl.ac.uk, luc.ma...@inria.fr
On Fri, Sep 20, 2019 at 04:18:57PM +0200, Marco Elver wrote:
> Hi all,

Hi,

> We would like to share a new data-race detector for the Linux kernel:
> Kernel Concurrency Sanitizer (KCSAN) --
> https://github.com/google/ktsan/wiki/KCSAN (Details:
> https://github.com/google/ktsan/blob/kcsan/Documentation/dev-tools/kcsan.rst)

Nice!

BTW kcsan_atomic_next() is missing a stub definition in <linux/kcsan.h>
when !CONFIG_KCSAN:

https://github.com/google/ktsan/commit/a22a093a0f0d0b582c82cdbac4f133a3f61d207c#diff-19d7c475b4b92aab8ba440415ab786ec

... and I think the kcsan_{begin,end}_atomic() stubs need to be static
inline too.

It looks like this is easy enough to enable on arm64, with the only real
special case being secondary_start_kernel() which we might want to
refactor to allow some portions to be instrumented.

I pushed the trivial patches I needed to get arm64 booting to my arm64/kcsan
branch:

git://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git arm64/kcsan

We have some interesting splats at boot time in stop_machine, which
don't seem to have been hit/fixed on x86 yet in the kcsan-with-fixes
branch, e.g.

[ 0.237939] ==================================================================
[ 0.239431] BUG: KCSAN: data-race in multi_cpu_stop+0xa8/0x198 and set_state+0x80/0xb0
[ 0.241189]
[ 0.241606] write to 0xffff00001003bd00 of 4 bytes by task 24 on cpu 3:
[ 0.243435] set_state+0x80/0xb0
[ 0.244328] multi_cpu_stop+0x16c/0x198
[ 0.245406] cpu_stopper_thread+0x170/0x298
[ 0.246565] smpboot_thread_fn+0x40c/0x560
[ 0.247696] kthread+0x1a8/0x1b0
[ 0.248586] ret_from_fork+0x10/0x18
[ 0.249589]
[ 0.250006] read to 0xffff00001003bd00 of 4 bytes by task 14 on cpu 1:
[ 0.251804] multi_cpu_stop+0xa8/0x198
[ 0.252851] cpu_stopper_thread+0x170/0x298
[ 0.254008] smpboot_thread_fn+0x40c/0x560
[ 0.255135] kthread+0x1a8/0x1b0
[ 0.256027] ret_from_fork+0x10/0x18
[ 0.257036]
[ 0.257449] Reported by Kernel Concurrency Sanitizer on:
[ 0.258918] CPU: 1 PID: 14 Comm: migration/1 Not tainted 5.3.0-00007-g67ab35a199f4-dirty #3
[ 0.261241] Hardware name: linux,dummy-virt (DT)
[ 0.262517] ==================================================================

> To those of you who we mentioned at LPC that we're working on a
> watchpoint-based KTSAN inspired by DataCollider [1], this is it (we
> renamed it to KCSAN to avoid confusion with KTSAN).
> [1] http://usenix.org/legacy/events/osdi10/tech/full_papers/Erickson.pdf
>
> In the coming weeks we're planning to:
> * Set up a syzkaller instance.
> * Share the dashboard so that you can see the races that are found.
> * Attempt to send fixes for some races upstream (if you find that the
> kcsan-with-fixes branch contains an important fix, please feel free to
> point it out and we'll prioritize that).
>
> There are a few open questions:
> * The big one: most of the reported races are due to unmarked
> accesses; prioritization or pruning of races to focus initial efforts
> to fix races might be required. Comments on how best to proceed are
> welcome. We're aware that these are issues that have recently received
> attention in the context of the LKMM
> (https://lwn.net/Articles/793253/).

I think the big risk here is drive-by "fixes" masking the warnings
rather than fixing the actual issue. It's easy for people to suppress a
warning with {READ,WRITE}_ONCE(), so they're liable to do that even when
the resulting race isn't benign.

I don't have a clue how to prevent that, though.

> * How/when to upstream KCSAN?

I would love to see this soon!

Thanks,
Mark.

Dmitry Vyukov

Sep 20, 2019, 12:47:14 PM
to Mark Rutland, Marco Elver, kasan-dev, LKML, Andrey Konovalov, Alexander Potapenko, Paul E. McKenney, Paul Turner, Daniel Axtens, Anatol Pomazau, Will Deacon, Andrea Parri, Alan Stern, LKMM Maintainers -- Akira Yokosawa, Nicholas Piggin, Boqun Feng, Daniel Lustig, Jade Alglave, Luc Maranget
I think this is mostly an orthogonal problem. E.g. for some syzbot bugs I
see fixes that also try to simply "shut up" the immediate
manifestation with whatever means, e.g. by sprinkling some spinlocks. So
(1) it's not unique to atomics, and (2) the presence of READ/WRITE_ONCE will
make the reader aware of the fact that this runs concurrently with
something else; they may then ask themselves why this runs
concurrently with something when the object is supposed to be private
to the thread, and then maybe they re-fix it properly. Whereas if it's
completely unmarked, nobody will even notice that this code accesses
the object concurrently with other code. So even if READ/WRITE_ONCE
was the wrong fix, it's still better to have it than not.

Marco Elver

Sep 20, 2019, 1:51:08 PM
to Will Deacon, kasan-dev, LKML, Dmitry Vyukov, Andrey Konovalov, Alexander Potapenko, pau...@linux.ibm.com, Paul Turner, Daniel Axtens, Anatol Pomazau, Andrea Parri, st...@rowland.harvard.edu, aki...@gmail.com, npi...@gmail.com, boqun...@gmail.com, dlu...@nvidia.com, j.al...@ucl.ac.uk, luc.ma...@inria.fr
On Fri, 20 Sep 2019 at 17:54, Will Deacon <wi...@kernel.org> wrote:
>
> Hi Marco,
>
> On Fri, Sep 20, 2019 at 04:18:57PM +0200, Marco Elver wrote:
> > We would like to share a new data-race detector for the Linux kernel:
> > Kernel Concurrency Sanitizer (KCSAN) --
> > https://github.com/google/ktsan/wiki/KCSAN (Details:
> > https://github.com/google/ktsan/blob/kcsan/Documentation/dev-tools/kcsan.rst)
> >
> > To those of you who we mentioned at LPC that we're working on a
> > watchpoint-based KTSAN inspired by DataCollider [1], this is it (we
> > renamed it to KCSAN to avoid confusion with KTSAN).
> > [1] http://usenix.org/legacy/events/osdi10/tech/full_papers/Erickson.pdf
>
> Oh, spiffy!
>
> > In the coming weeks we're planning to:
> > * Set up a syzkaller instance.
> > * Share the dashboard so that you can see the races that are found.
> > * Attempt to send fixes for some races upstream (if you find that the
> > kcsan-with-fixes branch contains an important fix, please feel free to
> > point it out and we'll prioritize that).
>
> Curious: do you take into account things like alignment and/or access size
> when looking at READ_ONCE/WRITE_ONCE? Perhaps you could initially prune
> naturally aligned accesses for which __native_word() is true?

Nothing special (other than the normal check whether accesses overlap) is done
with the size in READ_ONCE/WRITE_ONCE.

When you say prune naturally aligned && __native_word() accesses, I
assume you mean _plain_ naturally aligned && __native_word(), right? I
think this is a slippery slope, because if we start pretending that
such plain accesses should be treated as atomics, then we will also
miss e.g. races where the accesses should actually have been protected
by a mutex.
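
As a hypothetical example of what would be missed (all names invented): both
accesses below are plain, word-sized and naturally aligned, yet the forgotten
lock makes the increment lose updates:

        #include <linux/spinlock.h>

        static DEFINE_SPINLOCK(stats_lock);
        static unsigned long nr_events; /* intended to be written under stats_lock */

        void record_event(void)
        {
                /*
                 * BUG (for illustration): the spin_lock(&stats_lock) around this
                 * read-modify-write was forgotten, so concurrent callers can lose
                 * increments. Treating the access as "atomic enough" because it
                 * is aligned and word-sized would prune exactly this report.
                 */
                nr_events++;
        }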

> > There are a few open questions:
> > * The big one: most of the reported races are due to unmarked
> > accesses; prioritization or pruning of races to focus initial efforts
> > to fix races might be required. Comments on how best to proceed are
> > welcome. We're aware that these are issues that have recently received
> > attention in the context of the LKMM
> > (https://lwn.net/Articles/793253/).
>
> This one is tricky. What I think we need to avoid is an onslaught of
> patches adding READ_ONCE/WRITE_ONCE without a concrete analysis of the
> code being modified. My worry is that Joe Developer is eager to get their
> first patch into the kernel, so runs this tool and starts spamming
> maintainers with these things to the point that they start ignoring KCSAN
> reports altogether because of the time they take up.
>
> I suppose one thing we could do is to require each new READ_ONCE/WRITE_ONCE
> to have a comment describing the racy access, a bit like we do for memory
> barriers. Another possibility would be to use atomic_t more widely if
> there is genuine concurrency involved.

Our plan here is to use some of the options in Kconfig.kcsan to limit
the reported volume of races initially, at least for the syzbot instances. But
of course, this will not make the real issue go away, and eventually
we'll have to deal with all reported races somehow.

Thanks,
-- Marco

Marco Elver

Sep 20, 2019, 1:51:17 PM
to Dmitry Vyukov, Mark Rutland, kasan-dev, LKML, Andrey Konovalov, Alexander Potapenko, Paul E. McKenney, Paul Turner, Daniel Axtens, Anatol Pomazau, Will Deacon, Andrea Parri, Alan Stern, LKMM Maintainers -- Akira Yokosawa, Nicholas Piggin, Boqun Feng, Daniel Lustig, Jade Alglave, Luc Maranget
On Fri, 20 Sep 2019 at 18:47, Dmitry Vyukov <dvy...@google.com> wrote:
>
> On Fri, Sep 20, 2019 at 6:31 PM Mark Rutland <mark.r...@arm.com> wrote:
> >
> > On Fri, Sep 20, 2019 at 04:18:57PM +0200, Marco Elver wrote:
> > > We would like to share a new data-race detector for the Linux kernel:
> > > Kernel Concurrency Sanitizer (KCSAN) --
> > > https://github.com/google/ktsan/wiki/KCSAN (Details:
> > > https://github.com/google/ktsan/blob/kcsan/Documentation/dev-tools/kcsan.rst)
> >
> > Nice!
> >
> > BTW kcsan_atomic_next() is missing a stub definition in <linux/kcsan.h>
> > when !CONFIG_KCSAN:
> >
> > https://github.com/google/ktsan/commit/a22a093a0f0d0b582c82cdbac4f133a3f61d207c#diff-19d7c475b4b92aab8ba440415ab786ec
> >
> > ... and I think the kcsan_{begin,end}_atomic() stubs need to be static
> > inline too.

Thanks for catching, fixed and pushed. Feel free to rebase your arm64 branch.

> > It looks like this is easy enough to enable on arm64, with the only real
> > special case being secondary_start_kernel() which we might want to
> > refactor to allow some portions to be instrumented.
> >
> > I pushed the trivial patches I needed to get arm64 booting to my arm64/kcsan
> > branch:
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git arm64/kcsan

Cool, thanks for testing!
Thanks! The fixes in -with-fixes were for races I had only encountered with
syzkaller, where I disable KCSAN during boot. I've just added a fix
for this race and pushed it to kcsan-with-fixes.

Boqun Feng

Sep 23, 2019, 12:31:27 AM
to Will Deacon, Marco Elver, kasan-dev, LKML, Dmitry Vyukov, Andrey Konovalov, Alexander Potapenko, pau...@linux.ibm.com, Paul Turner, Daniel Axtens, Anatol Pomazau, Andrea Parri, st...@rowland.harvard.edu, aki...@gmail.com, npi...@gmail.com, dlu...@nvidia.com, j.al...@ucl.ac.uk, luc.ma...@inria.fr
Instead of commenting READ_ONCE/WRITE_ONCE()s, how about adding
annotations for data fields/variables that might be accessed without
holding a lock? Because if all accesses to a variable are protected by
proper locks, we mostly don't need to worry about data races caused by
not using READ_ONCE/WRITE_ONCE(). Bad things happen when we write to a
variable under a lock but read it outside a lock critical section for
better performance, for example, rcu_node::qsmask. So I'm thinking maybe
we can introduce a new annotation similar to __rcu, maybe call it
__lockfree ;-) as follows:

struct rcu_node {
	...
	unsigned long __lockfree qsmask;
	...
};

, and __lockfree indicates that, by design, the maintainer of this data
structure or variable believes there will be accesses outside lock
critical sections. Note that not all accesses to a __lockfree field need
to be READ_ONCE/WRITE_ONCE(): if the developer manages to build a
complex but working wake/wait state machine so that the field is never
accessed at the same time, READ_ONCE()/WRITE_ONCE() is not needed.
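
For reference, a purely hypothetical sketch of how such a marker could be
declared, loosely following the __rcu pattern in <linux/compiler_types.h>; the
attribute choice is a placeholder, and a real implementation would need sparse
and/or KCSAN support to actually act on it:

        /*
         * Hypothetical: __lockfree marks fields that are, by design, also
         * accessed outside lock critical sections. It compiles away in normal
         * builds; only checker builds would give it meaning.
         */
        #ifdef __CHECKER__
        # define __lockfree     __attribute__((address_space(5)))      /* placeholder */
        #else
        # define __lockfree
        #endif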

If we have such an annotation, I think it won't be hard to configure
KCSAN to only examine accesses to variables with this annotation. Also,
this annotation could help other checkers in the future.

If KCSAN (at least the upstream version) only checks accesses with
such an annotation, "spamming with KCSAN warnings/fixes" will be the
choice of each maintainer ;-)

Thoughts?

Regards,
Boqun

Dmitry Vyukov

Sep 23, 2019, 4:21:52 AM
to Boqun Feng, Will Deacon, Marco Elver, kasan-dev, LKML, Andrey Konovalov, Alexander Potapenko, Paul E. McKenney, Paul Turner, Daniel Axtens, Anatol Pomazau, Andrea Parri, Alan Stern, LKMM Maintainers -- Akira Yokosawa, Nicholas Piggin, Daniel Lustig, Jade Alglave, Luc Maranget
But doesn't this defeat the main goal of any race detector -- finding
concurrent accesses to complex data structures, e.g. a forgotten
spinlock around an rbtree manipulation? Since the rbtree is not meant for
concurrent accesses, it won't have a __lockfree annotation, and thus we
will ignore races on it...

Boqun Feng

Sep 23, 2019, 4:54:17 AM
to Dmitry Vyukov, Will Deacon, Marco Elver, kasan-dev, LKML, Andrey Konovalov, Alexander Potapenko, Paul E. McKenney, Paul Turner, Daniel Axtens, Anatol Pomazau, Andrea Parri, Alan Stern, LKMM Maintainers -- Akira Yokosawa, Nicholas Piggin, Daniel Lustig, Jade Alglave, Luc Maranget
Maybe, but for forgotten-lock detection we already have lockdep, and
sparse can also help a little. Having a __lockfree annotation could be
beneficial for KCSAN, letting it focus on checking the accesses whose race
conditions can only be detected by KCSAN at this time. I think this
could help KCSAN find problems more easily (and faster).

Out of curiosity, does KCSAN ever find a problem with forgotten locks
involved? I didn't see any in the -with-fixes branch (that's
understandable, given the seriousness, the fixes of this kind of
problems could already be submitted to upstream once KCSAN found it.)

Regards,
Boqun

Dmitry Vyukov

Sep 23, 2019, 4:59:17 AM
to Boqun Feng, Will Deacon, Marco Elver, kasan-dev, LKML, Andrey Konovalov, Alexander Potapenko, Paul E. McKenney, Paul Turner, Daniel Axtens, Anatol Pomazau, Andrea Parri, Alan Stern, LKMM Maintainers -- Akira Yokosawa, Nicholas Piggin, Daniel Lustig, Jade Alglave, Luc Maranget
They don't do this at all, or to the necessary degree.

> Having a __lockfree annotation could be
> beneficial for KCSAN, letting it focus on checking the accesses whose race
> conditions can only be detected by KCSAN at this time. I think this
> could help KCSAN find problems more easily (and faster).
>
> Out of curiosity, does KCSAN ever find a problem with forgotten locks
> involved? I didn't see any in the -with-fixes branch (that's
> understandable, given the seriousness, the fixes of this kind of
> problems could already be submitted to upstream once KCSAN found it.)

This one comes to mind:
https://www.spinics.net/lists/linux-mm/msg92677.html

Maybe some others here, but I don't remember which ones now:
https://github.com/google/ktsan/wiki/KTSAN-Found-Bugs

Marco Elver

Sep 23, 2019, 7:01:42 AM
to Dmitry Vyukov, Boqun Feng, Will Deacon, kasan-dev, LKML, Andrey Konovalov, Alexander Potapenko, Paul E. McKenney, Paul Turner, Daniel Axtens, Anatol Pomazau, Andrea Parri, Alan Stern, LKMM Maintainers -- Akira Yokosawa, Nicholas Piggin, Daniel Lustig, Jade Alglave, Luc Maranget
Just to confirm, the annotation is supposed to mean "this variable
should not be accessed concurrently". '__lockfree' may be confusing,
as "lock-free" has a very specific meaning ("lock-free algorithm"),
and I initially thought the annotation means the opposite. Maybe more
intuitive would be '__nonatomic'.

My view, however, is that this will not scale. 1) Our goal is to
*avoid* more annotations if possible. 2) Furthermore, any such
annotation assumes the developer already has understanding of all
concurrently accessed variables; however, this may not be the case for
the next person touching the code, resulting in an error. By
"whitelisting" variables, we would likely miss almost every serious
bug.

To enable/disable KCSAN for entire subsystems, it's already possible
to use 'KCSAN_SANITIZE :=n' in the Makefile, or 'KCSAN_SANITIZE_file.o
:= n' for individual files.

> > Out of curiosity, does KCSAN ever find a problem with forgotten locks
> > involved? I didn't see any in the -with-fixes branch (that's
> > understandable, given the seriousness, the fixes of this kind of
> > problems could already be submitted to upstream once KCSAN found it.)

The sheer volume of 'benign' data-races makes it difficult to filter
through and get to these, but it certainly detects such issues.

Thanks,
-- Marco

Boqun Feng

Sep 23, 2019, 8:32:25 AM
to Marco Elver, Dmitry Vyukov, Will Deacon, kasan-dev, LKML, Andrey Konovalov, Alexander Potapenko, Paul E. McKenney, Paul Turner, Daniel Axtens, Anatol Pomazau, Andrea Parri, Alan Stern, LKMM Maintainers -- Akira Yokosawa, Nicholas Piggin, Daniel Lustig, Jade Alglave, Luc Maranget
Well, "__lockfree" means the variable will be accessed without holding a
lock, most likely in a lock-free algorithm. But I admit I haven't
thought too much about the name, so maybe it's a bad one ;-)

> My view, however, is that this will not scale. 1) Our goal is to
> *avoid* more annotations if possible. 2) Furthermore, any such

Understood. I don't have any objection to your goal, and I think
achieving that will really help developers.

> annotation assumes the developer already has understanding of all
> concurrently accessed variables; however, this may not be the case for
> the next person touching the code, resulting in an error. By

Won't introducing annotations such as __lockfree/__nonatomic help pass
that understanding between developers? "The next person" should learn
about the design by reading the code (or documentation) rather than by adding
some random code and seeing whether KCSAN yells.

Just to be clear, my initial thought in introducing the annotation is to
have a better way to document variables that cannot be accessed
"plainly" outside mutual exclusion, so that fixes that
add READ_ONCE() or WRITE_ONCE() to accesses of those variables can be
considered useful.

> "whitelisting" variables, we would likely miss almost every serious
> bug.
>
> To enable/disable KCSAN for entire subsystems, it's already possible
> to use 'KCSAN_SANITIZE :=n' in the Makefile, or 'KCSAN_SANITIZE_file.o
> := n' for individual files.
>
> > > Out of curiosity, does KCSAN ever find a problem with forgotten locks
> > > involved? I didn't see any in the -with-fixes branch (that's
> > > understandable, given the seriousness, the fixes of this kind of
> > > problems could already be submitted to upstream once KCSAN found it.)
>
> The sheer volume of 'benign' data-races makes it difficult to filter
> through and get to these, but it certainly detects such issues.
>
> Thanks,
> -- Marco
>
> > This one comes to mind:
> > https://www.spinics.net/lists/linux-mm/msg92677.html
> >

Yeah, this one is a "forgotten lock" case in the wild.

> > Maybe some others here, but I don't remember which ones now:
> > https://github.com/google/ktsan/wiki/KTSAN-Found-Bugs

Thank you both for the information!

Regards,
Boqun

Daniel Axtens

Oct 1, 2019, 10:50:10 AM
to Marco Elver, kasan-dev, LKML, Dmitry Vyukov, Andrey Konovalov, Alexander Potapenko, pau...@linux.ibm.com, Paul Turner, Anatol Pomazau, Will Deacon, Andrea Parri, st...@rowland.harvard.edu, aki...@gmail.com, npi...@gmail.com, boqun...@gmail.com, dlu...@nvidia.com, j.al...@ucl.ac.uk, luc.ma...@inria.fr
Hi Marco,

> We would like to share a new data-race detector for the Linux kernel:
> Kernel Concurrency Sanitizer (KCSAN) --
> https://github.com/google/ktsan/wiki/KCSAN (Details:
> https://github.com/google/ktsan/blob/kcsan/Documentation/dev-tools/kcsan.rst)

This builds and begins to boot on powerpc, which is fantastic.

I'm seeing a lot of reports that locks are changed while being watched by
KCSAN, so many that they flood the console and stall the boot.

I think, if I've understood correctly, that this is because powerpc
doesn't use the queued lock implementation for its spinlock but rather
its own assembler locking code. This means the writes aren't
instrumented by the compiler, while some reads are. (see
__arch_spin_trylock in e.g. arch/powerpc/include/asm/spinlock.h)

Would the correct way to deal with this be for the powerpc code to call
out to __tsan_readN/__tsan_writeN before invoking the assembler that
reads and writes the lock?

Regards,
Daniel


[ 24.612864] ==================================================================
[ 24.614188] BUG: KCSAN: racing read in __spin_yield+0xa8/0x180
[ 24.614669]
[ 24.614799] race at unknown origin, with read to 0xc00000003fff9d00 of 4 bytes by task 449 on cpu 11:
[ 24.616024] __spin_yield+0xa8/0x180
[ 24.616377] _raw_spin_lock_irqsave+0x1a8/0x1b0
[ 24.616850] release_pages+0x3a0/0x880
[ 24.617203] free_pages_and_swap_cache+0x13c/0x220
[ 24.622548] tlb_flush_mmu+0x210/0x2f0
[ 24.622979] tlb_finish_mmu+0x12c/0x240
[ 24.623286] exit_mmap+0x138/0x2c0
[ 24.623779] mmput+0xe0/0x330
[ 24.624504] do_exit+0x65c/0x1050
[ 24.624835] do_group_exit+0xb4/0x210
[ 24.625458] __wake_up_parent+0x0/0x80
[ 24.625985] system_call+0x5c/0x70
[ 24.626415]
[ 24.626651] Reported by Kernel Concurrency Sanitizer on:
[ 24.628329] CPU: 11 PID: 449 Comm: systemd-bless-b Not tainted 5.3.0-00007-gad29ff6c190d-dirty #9
[ 24.629508] ==================================================================

[ 24.672860] ==================================================================
[ 24.675901] BUG: KCSAN: data-race in _raw_spin_lock_irqsave+0x13c/0x1b0 and _raw_spin_unlock_irqrestore+0x94/0x100
[ 24.680847]
[ 24.682743] write to 0xc0000001ffeefe00 of 4 bytes by task 455 on cpu 5:
[ 24.683402] _raw_spin_unlock_irqrestore+0x94/0x100
[ 24.684593] release_pages+0x250/0x880
[ 24.685148] free_pages_and_swap_cache+0x13c/0x220
[ 24.686068] tlb_flush_mmu+0x210/0x2f0
[ 24.690190] tlb_finish_mmu+0x12c/0x240
[ 24.691082] exit_mmap+0x138/0x2c0
[ 24.693216] mmput+0xe0/0x330
[ 24.693597] do_exit+0x65c/0x1050
[ 24.694170] do_group_exit+0xb4/0x210
[ 24.694658] __wake_up_parent+0x0/0x80
[ 24.696230] system_call+0x5c/0x70
[ 24.700414]
[ 24.712991] read to 0xc0000001ffeefe00 of 4 bytes by task 454 on cpu 20:
[ 24.714419] _raw_spin_lock_irqsave+0x13c/0x1b0
[ 24.715018] pagevec_lru_move_fn+0xfc/0x1d0
[ 24.715527] __lru_cache_add+0x124/0x1a0
[ 24.716072] lru_cache_add+0x30/0x50
[ 24.716411] add_to_page_cache_lru+0x134/0x250
[ 24.717938] mpage_readpages+0x220/0x3f0
[ 24.719737] blkdev_readpages+0x50/0x80
[ 24.721891] read_pages+0xb4/0x340
[ 24.722834] __do_page_cache_readahead+0x318/0x350
[ 24.723290] force_page_cache_readahead+0x150/0x280
[ 24.724391] page_cache_sync_readahead+0xe4/0x110
[ 24.725087] generic_file_buffered_read+0xa20/0xdf0
[ 24.727003] generic_file_read_iter+0x220/0x310
[ 24.728906]
[ 24.730044] Reported by Kernel Concurrency Sanitizer on:
[ 24.732185] CPU: 20 PID: 454 Comm: systemd-gpt-aut Not tainted 5.3.0-00007-gad29ff6c190d-dirty #9
[ 24.734317] ==================================================================


>
> Thanks,
> -- Marco

Joel Fernandes

Oct 1, 2019, 5:19:51 PM
to Marco Elver, kasan-dev, LKML, Dmitry Vyukov, Andrey Konovalov, Alexander Potapenko, pau...@linux.ibm.com, Paul Turner, Daniel Axtens, Anatol Pomazau, Will Deacon, Andrea Parri, st...@rowland.harvard.edu, aki...@gmail.com, npi...@gmail.com, boqun...@gmail.com, dlu...@nvidia.com, j.al...@ucl.ac.uk, luc.ma...@inria.fr
On Fri, Sep 20, 2019 at 04:18:57PM +0200, Marco Elver wrote:
Looks exciting. Based on our discussion at LPC, I think you mentioned that
one way of pruning is to check whether the compiler generates different code
with the _ONCE annotations than it would have otherwise. Is that still on
the table, for the purpose of pruning the reports?

Also appreciate a CC on future patches as well.

thanks,

- Joel

Marco Elver

Oct 2, 2019, 3:43:11 PM
to Daniel Axtens, kasan-dev, LKML, Dmitry Vyukov, Andrey Konovalov, Alexander Potapenko, Paul E. McKenney, Paul Turner, Anatol Pomazau, Will Deacon, Andrea Parri, Alan Stern, LKMM Maintainers -- Akira Yokosawa, Nicholas Piggin, Boqun Feng, Daniel Lustig, Jade Alglave, Luc Maranget
Hi Daniel,

On Tue, 1 Oct 2019 at 16:50, Daniel Axtens <d...@axtens.net> wrote:
>
> Hi Marco,
>
> > We would like to share a new data-race detector for the Linux kernel:
> > Kernel Concurrency Sanitizer (KCSAN) --
> > https://github.com/google/ktsan/wiki/KCSAN (Details:
> > https://github.com/google/ktsan/blob/kcsan/Documentation/dev-tools/kcsan.rst)
>
> This builds and begins to boot on powerpc, which is fantastic.
>
> I'm seeing a lot of reports that locks are changed while being watched by
> KCSAN, so many that they flood the console and stall the boot.
>
> I think, if I've understood correctly, that this is because powerpc
> doesn't use the queued lock implementation for its spinlock but rather
> its own assembler locking code. This means the writes aren't
> instrumented by the compiler, while some reads are. (see
> __arch_spin_trylock in e.g. arch/powerpc/include/asm/spinlock.h)
>
> Would the correct way to deal with this be for the powerpc code to call
> out to __tsan_readN/__tsan_writeN before invoking the assembler that
> reads and writes the lock?

This should not be the issue, because with KCSAN, not instrumenting
something does not lead to false positives. If two accesses are
involved in a race and neither of them is instrumented, KCSAN will
not report a race; if, however, one of them is instrumented (and the
uninstrumented access is a write), KCSAN will infer a race because the
data value changed ("race at unknown origin").
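
(For intuition, a rough sketch of that inference; this is not KCSAN's actual
code, and the function below is invented. The idea is simply: snapshot the
value, stall while the watchpoint is conceptually armed, and re-read it; a
changed value implies a concurrent, possibly uninstrumented, writer.)

        #include <linux/delay.h>
        #include <linux/types.h>

        static bool value_changed_while_watched(const volatile u32 *ptr,
                                                unsigned int delay_us)
        {
                u32 before = *ptr;      /* snapshot before the watch window */

                udelay(delay_us);       /* window during which a watchpoint is armed */

                /* A change means a writer raced with us: "race at unknown origin". */
                return *ptr != before;
        }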

Rather, if there is spinlock code causing data-races, then there are 2 options:
1) Actually missing READ_ONCE/WRITE_ONCE somewhere.
2) You need to disable instrumentation for an entire function with
__no_sanitize_thread or __no_kcsan_or_inline (for inline functions).
This should only be needed for arch-specific code (e.g. see the
changes we made to arch/x86).

Note: you can explicitly add instrumentation to uninstrumented
accesses with the API in <linux/kcsan-checks.h>, but this shouldn't be
the issue here.
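
As a sketch of option 2, with a made-up lock type rather than the real powerpc
one (only the __no_kcsan_or_inline attribute is taken from the KCSAN tree
mentioned above; everything else is invented for illustration):

        /* Hypothetical arch-internal helper: the attribute keeps the compiler
         * from instrumenting the function, so its plain accesses generate no
         * KCSAN checks at all. */
        struct my_arch_lock {
                volatile unsigned int slock;
        };

        static __no_kcsan_or_inline unsigned int my_arch_lock_value(struct my_arch_lock *lock)
        {
                return lock->slock;     /* intentionally uninstrumented plain read */
        }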

It would be good to symbolize the stack-traces, as otherwise it's hard
to say exactly what needs to be done.

Best,
-- Marco

Marco Elver

Oct 2, 2019, 3:52:11 PM
to Joel Fernandes, kasan-dev, LKML, Dmitry Vyukov, Andrey Konovalov, Alexander Potapenko, Paul E. McKenney, Paul Turner, Daniel Axtens, Anatol Pomazau, Will Deacon, Andrea Parri, Alan Stern, LKMM Maintainers -- Akira Yokosawa, Nicholas Piggin, Boqun Feng, Daniel Lustig, Jade Alglave, Luc Maranget
Hi Joel,
This might be interesting at first, but it's not entirely clear how
feasible it is. It's also dangerous, because the real issue would be
ignored. It may be that one compiler version on a particular
architecture generates the same code, but any change in compiler or
architecture and this would no longer be true. Let me know if you have
any more ideas.

Best,
-- Marco

> Also appreciate a CC on future patches as well.
>
> thanks,
>
> - Joel
>
>
> >
> > Feel free to test and send feedback.
> >
> > Thanks,
> > -- Marco
>

Dmitry Vyukov

Oct 3, 2019, 9:14:12 AM
to Marco Elver, Joel Fernandes, kasan-dev, LKML, Andrey Konovalov, Alexander Potapenko, Paul E. McKenney, Paul Turner, Daniel Axtens, Anatol Pomazau, Will Deacon, Andrea Parri, Alan Stern, LKMM Maintainers -- Akira Yokosawa, Nicholas Piggin, Boqun Feng, Daniel Lustig, Jade Alglave, Luc Maranget

Dmitry Vyukov

Oct 3, 2019, 12:00:52 PM
to Marco Elver, Christian Brauner, Joel Fernandes, kasan-dev, LKML, Andrey Konovalov, Alexander Potapenko, Paul E. McKenney, Paul Turner, Daniel Axtens, Anatol Pomazau, Will Deacon, Andrea Parri, Alan Stern, LKMM Maintainers -- Akira Yokosawa, Nicholas Piggin, Boqun Feng, Daniel Lustig, Jade Alglave, Luc Maranget
+Christian opts in for _all_ reports for
kernel/{fork,exit,pid,signal}.c and friends.
Just wanted it to be written down for future reference :)

Mark Rutland

Oct 3, 2019, 12:12:42 PM
to Marco Elver, Dmitry Vyukov, kasan-dev, LKML, Andrey Konovalov, Alexander Potapenko, Paul E. McKenney, Paul Turner, Daniel Axtens, Anatol Pomazau, Will Deacon, Andrea Parri, Alan Stern, LKMM Maintainers -- Akira Yokosawa, Nicholas Piggin, Boqun Feng, Daniel Lustig, Jade Alglave, Luc Maranget
On Fri, Sep 20, 2019 at 07:51:04PM +0200, Marco Elver wrote:
> On Fri, 20 Sep 2019 at 18:47, Dmitry Vyukov <dvy...@google.com> wrote:
> >
> > On Fri, Sep 20, 2019 at 6:31 PM Mark Rutland <mark.r...@arm.com> wrote:
> > >
> > > On Fri, Sep 20, 2019 at 04:18:57PM +0200, Marco Elver wrote:
> > > > We would like to share a new data-race detector for the Linux kernel:
> > > > Kernel Concurrency Sanitizer (KCSAN) --
> > > > https://github.com/google/ktsan/wiki/KCSAN (Details:
> > > > https://github.com/google/ktsan/blob/kcsan/Documentation/dev-tools/kcsan.rst)
> > >
> > > Nice!
> > >
> > > BTW kcsan_atomic_next() is missing a stub definition in <linux/kcsan.h>
> > > when !CONFIG_KCSAN:
> > >
> > > https://github.com/google/ktsan/commit/a22a093a0f0d0b582c82cdbac4f133a3f61d207c#diff-19d7c475b4b92aab8ba440415ab786ec
> > >
> > > ... and I think the kcsan_{begin,end}_atomic() stubs need to be static
> > > inline too.
>
> Thanks for catching, fixed and pushed. Feel free to rebase your arm64 branch.

Great; I've just done so!

What's the plan for posting a PATCH or RFC series?

The rest of this email is rabbit-holing on the issue KCSAN spotted;
sorry about that!

[...]
I think that's:

https://github.com/google/ktsan/commit/c1bc8ab013a66919d8347c2392f320feabb14f92

... but that doesn't look quite right to me, as it leaves us with the shape:

	do {
		if (READ_ONCE(msdata->state) != curstate) {
			curstate = msdata->state;
			switch (curstate) {
			...
			}
			ack_state(msdata);
		}
	} while (curstate != MULTI_STOP_EXIT);

I don't believe that we have a guarantee of read-after-read ordering
between the READ_ONCE(msdata->state) and the subsequent plain access of
msdata->state, as we've been caught out on that in the past, e.g.

https://lore.kernel.org/lkml/1506527369-19535-1-git...@arm.com/

... which I think means we could switch on a stale value of
msdata->state. That would mean we might handle the same state twice,
calling ack_state() more times than expected and corrupting the count.

The compiler could also replace uses of curstate with a reload of
msdata->state. If it did so for the while condition, we could skip the
expected ack_state() for MULTI_STOP_EXIT, though it looks like that
might not matter.

I think we need to make sure that we use a consistent snapshot,
something like the below. Assuming I'm not barking up the wrong tree, I
can spin this as a proper patch.

Thanks,
Mark.

---->8----
diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index b4f83f7bdf86..67a0b454b5b5 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -167,7 +167,7 @@ static void set_state(struct multi_stop_data *msdata,
 	/* Reset ack counter. */
 	atomic_set(&msdata->thread_ack, msdata->num_threads);
 	smp_wmb();
-	msdata->state = newstate;
+	WRITE_ONCE(msdata->state, newstate);
 }
 
 /* Last one to ack a state moves to the next state. */
@@ -186,7 +186,7 @@ void __weak stop_machine_yield(const struct cpumask *cpumask)
 static int multi_cpu_stop(void *data)
 {
 	struct multi_stop_data *msdata = data;
-	enum multi_stop_state curstate = MULTI_STOP_NONE;
+	enum multi_stop_state newstate, curstate = MULTI_STOP_NONE;
 	int cpu = smp_processor_id(), err = 0;
 	const struct cpumask *cpumask;
 	unsigned long flags;
@@ -210,8 +210,9 @@ static int multi_cpu_stop(void *data)
 	do {
 		/* Chill out and ensure we re-read multi_stop_state. */
 		stop_machine_yield(cpumask);
-		if (msdata->state != curstate) {
-			curstate = msdata->state;
+		newstate = READ_ONCE(msdata->state);
+		if (newstate != curstate) {
+			curstate = newstate;
 			switch (curstate) {
 			case MULTI_STOP_DISABLE_IRQ:
 				local_irq_disable();

Marco Elver

Oct 3, 2019, 3:27:21 PM
to Mark Rutland, Dmitry Vyukov, kasan-dev, LKML, Andrey Konovalov, Alexander Potapenko, Paul E. McKenney, Paul Turner, Daniel Axtens, Anatol Pomazau, Will Deacon, Andrea Parri, Alan Stern, LKMM Maintainers -- Akira Yokosawa, Nicholas Piggin, Boqun Feng, Daniel Lustig, Jade Alglave, Luc Maranget
On Thu, 3 Oct 2019 at 18:12, Mark Rutland <mark.r...@arm.com> wrote:
>
> On Fri, Sep 20, 2019 at 07:51:04PM +0200, Marco Elver wrote:
> > On Fri, 20 Sep 2019 at 18:47, Dmitry Vyukov <dvy...@google.com> wrote:
> > >
> > > On Fri, Sep 20, 2019 at 6:31 PM Mark Rutland <mark.r...@arm.com> wrote:
> > > >
> > > > On Fri, Sep 20, 2019 at 04:18:57PM +0200, Marco Elver wrote:
> > > > > We would like to share a new data-race detector for the Linux kernel:
> > > > > Kernel Concurrency Sanitizer (KCSAN) --
> > > > > https://github.com/google/ktsan/wiki/KCSAN (Details:
> > > > > https://github.com/google/ktsan/blob/kcsan/Documentation/dev-tools/kcsan.rst)
> > > >
> > > > Nice!
> > > >
> > > > BTW kcsan_atomic_next() is missing a stub definition in <linux/kcsan.h>
> > > > when !CONFIG_KCSAN:
> > > >
> > > > https://github.com/google/ktsan/commit/a22a093a0f0d0b582c82cdbac4f133a3f61d207c#diff-19d7c475b4b92aab8ba440415ab786ec
> > > >
> > > > ... and I think the kcsan_{begin,end}_atomic() stubs need to be static
> > > > inline too.
> >
> > Thanks for catching, fixed and pushed. Feel free to rebase your arm64 branch.
>
> Great; I've just done so!
>
> What's the plan for posting a PATCH or RFC series?

I'm planning to send some patches, but with the amount of data-races
being found I need to prioritize what we send first. Currently the
plan is to let syzbot find data-races, and we'll start by sending a
few critical reports that syzbot found. Syzbot should be set up fully
and start finding data-races within next few days.

> The rest of this email is rabbit-holing on the issue KCSAN spotted;
> sorry about that!

Thanks for looking into this! I think you're right, and please do feel
free to send a proper patch out.

Thanks,
-- Marco

Christian Brauner

Oct 3, 2019, 3:39:30 PM
to Dmitry Vyukov, Marco Elver, Joel Fernandes, kasan-dev, LKML, Andrey Konovalov, Alexander Potapenko, Paul E. McKenney, Paul Turner, Daniel Axtens, Anatol Pomazau, Will Deacon, Andrea Parri, Alan Stern, LKMM Maintainers -- Akira Yokosawa, Nicholas Piggin, Boqun Feng, Daniel Lustig, Jade Alglave, Luc Maranget
Yes, please! :)
Christian

Joel Fernandes

Oct 4, 2019, 12:49:01 PM
to Marco Elver, kasan-dev, LKML, Dmitry Vyukov, Andrey Konovalov, Alexander Potapenko, Paul E. McKenney, Paul Turner, Daniel Axtens, Anatol Pomazau, Will Deacon, Andrea Parri, Alan Stern, LKMM Maintainers -- Akira Yokosawa, Nicholas Piggin, Boqun Feng, Daniel Lustig, Jade Alglave, Luc Maranget
My thought was that this technique of looking at compiler-generated code can be
used for prioritization of the reports. Have you tested it, though? I think
that without testing such a technique, we cannot know how much benefit (or
lack thereof) there is.

In fact, IIRC, the compiler generating different code with _ONCE annotation
can be given as justification for patches doing such conversions.

thanks,

- Joel

Dmitry Vyukov

Oct 4, 2019, 12:53:02 PM
to Joel Fernandes, Marco Elver, kasan-dev, LKML, Andrey Konovalov, Alexander Potapenko, Paul E. McKenney, Paul Turner, Daniel Axtens, Anatol Pomazau, Will Deacon, Andrea Parri, Alan Stern, LKMM Maintainers -- Akira Yokosawa, Nicholas Piggin, Boqun Feng, Daniel Lustig, Jade Alglave, Luc Maranget
We also should not forget about "missed mutex" races (e.g. an unprotected
radix tree), which are much worse and higher priority than a missed
atomic annotation. If we look only at codegen, we may discard most of them
as unimportant.

Joel Fernandes

Oct 4, 2019, 12:57:38 PM
to Dmitry Vyukov, Marco Elver, kasan-dev, LKML, Andrey Konovalov, Alexander Potapenko, Paul E. McKenney, Paul Turner, Daniel Axtens, Anatol Pomazau, Will Deacon, Andrea Parri, Alan Stern, LKMM Maintainers -- Akira Yokosawa, Nicholas Piggin, Boqun Feng, Daniel Lustig, Jade Alglave, Luc Maranget
Sure. I was not asking to look at codegen as the only signal. But to use the
signal for whatever it is worth.

thanks,

- Joel

Dmitry Vyukov

Oct 4, 2019, 1:01:49 PM
to Joel Fernandes, Marco Elver, kasan-dev, LKML, Andrey Konovalov, Alexander Potapenko, Paul E. McKenney, Paul Turner, Daniel Axtens, Anatol Pomazau, Will Deacon, Andrea Parri, Alan Stern, LKMM Maintainers -- Akira Yokosawa, Nicholas Piggin, Boqun Feng, Daniel Lustig, Jade Alglave, Luc Maranget
But then we need other, stronger signals. We don't have any.
So if the codegen is the only one and it says "this is not important",
then we conclude "this is not important".

Joel Fernandes

Oct 4, 2019, 2:08:51 PM
to Dmitry Vyukov, Marco Elver, kasan-dev, LKML, Andrey Konovalov, Alexander Potapenko, Paul E. McKenney, Paul Turner, Daniel Axtens, Anatol Pomazau, Will Deacon, Andrea Parri, Alan Stern, LKMM Maintainers -- Akira Yokosawa, Nicholas Piggin, Boqun Feng, Daniel Lustig, Jade Alglave, Luc Maranget
I didn't mean for codegen to say "this is not important", but rather "this IS
important". And for the other ones, "this may not be important, or it may
be very important, I don't know".

Why do you say a missed atomic annotation is lower priority? A bug is a bug,
and ought to be fixed IMHO. Arguably a missing lock acquisition can be detected
more easily, thanks to lockdep and its assertions, than missing _ONCE
annotations. The latter cannot easily be detected at runtime and
can cause failures in mysterious ways.

I think you can divide the problem up.. One set of bugs that are because of
codegen changes and data races and are "important" for that reason. Another
one that is less clear whether they are important or not -- until you have a
better way of providing a signal for categorizing those.

Did I miss something?

thanks,

- Joel

Dmitry Vyukov

Oct 4, 2019, 2:28:55 PM
to Joel Fernandes, Marco Elver, kasan-dev, LKML, Andrey Konovalov, Alexander Potapenko, Paul E. McKenney, Paul Turner, Daniel Axtens, Anatol Pomazau, Will Deacon, Andrea Parri, Alan Stern, LKMM Maintainers -- Akira Yokosawa, Nicholas Piggin, Boqun Feng, Daniel Lustig, Jade Alglave, Luc Maranget
You started talking about prioritization ;)

> and ought to be fixed IMHO. Arguably a missing lock acquisition can be detected
> more easily, thanks to lockdep and its assertions, than missing _ONCE
> annotations. The latter cannot easily be detected at runtime and
> can cause failures in mysterious ways.
>
> I think you can divide the problem up.. One set of bugs that are because of
> codegen changes and data races and are "important" for that reason. Another
> one that is less clear whether they are important or not -- until you have a
> better way of providing a signal for categorizing those.
>
> Did I miss something?

We have:
1. missed annotation with changing codegen.
2. missed annotation with non-changing codegen.
3. missed mutex with changing codegen.
4. missed mutex with non-changing codegen.

One can arguably say that 2 is less important than 1. But then both 3
and 4 are not low priority under any circumstances. And we don't have
any means to distinguish 1/2 from 3/4.
In this situation I don't see how "changing codegen" vs "non-changing
codegen" gives us any useful signal.

Assuming we have some signal for lower priority, the only useful way
of using this signal that I see is throwing lower priority bugs away
automatically for now (not reporting on syzbot). Because if we do
report all bugs and humans need to look at all of them anyway, this
signal is not too useful. If I am already spending time on a report, I
can just as well quickly prioritize it much more precisely than any
automatic scheme.

If we are not reporting lower priority bugs, we cannot afford to
classify "missed mutexes" as lower priority.

Eric Dumazet

Oct 4, 2019, 8:58:39 PM
to Will Deacon, Marco Elver, kasan-dev, LKML, Dmitry Vyukov, Andrey Konovalov, Alexander Potapenko, pau...@linux.ibm.com, Paul Turner, Daniel Axtens, Anatol Pomazau, Andrea Parri, st...@rowland.harvard.edu, aki...@gmail.com, npi...@gmail.com, boqun...@gmail.com, dlu...@nvidia.com, j.al...@ucl.ac.uk, luc.ma...@inria.fr


On 9/20/19 8:54 AM, Will Deacon wrote:

>
> This one is tricky. What I think we need to avoid is an onslaught of
> patches adding READ_ONCE/WRITE_ONCE without a concrete analysis of the
> code being modified. My worry is that Joe Developer is eager to get their
> first patch into the kernel, so runs this tool and starts spamming
> maintainers with these things to the point that they start ignoring KCSAN
> reports altogether because of the time they take up.
>
> I suppose one thing we could do is to require each new READ_ONCE/WRITE_ONCE
> to have a comment describing the racy access, a bit like we do for memory
> barriers. Another possibility would be to use atomic_t more widely if
> there is genuine concurrency involved.
>

About READ_ONCE() and WRITE_ONCE(), we will probably need

ADD_ONCE(var, value) for arches that can implement the RMW in a single instruction.

WRITE_ONCE(var, var + value) does not look pretty, and increases register pressure.

I had a look at the first KCSAN reports, and I can tell that tcp_poll() being lockless
means we need to add hundreds of READ_ONCE(), WRITE_ONCE() and ADD_ONCE() calls all over the place.

-> Absolute nightmare for future backports to stable branches.

Dmitry Vyukov

Oct 5, 2019, 12:17:01 AM
to Eric Dumazet, Will Deacon, Marco Elver, kasan-dev, LKML, Andrey Konovalov, Alexander Potapenko, Paul E. McKenney, Paul Turner, Daniel Axtens, Anatol Pomazau, Andrea Parri, Alan Stern, LKMM Maintainers -- Akira Yokosawa, Nicholas Piggin, Boqun Feng, Daniel Lustig, Jade Alglave, Luc Maranget
On Sat, Oct 5, 2019 at 2:58 AM Eric Dumazet <eric.d...@gmail.com> wrote:
> > This one is tricky. What I think we need to avoid is an onslaught of
> > patches adding READ_ONCE/WRITE_ONCE without a concrete analysis of the
> > code being modified. My worry is that Joe Developer is eager to get their
> > first patch into the kernel, so runs this tool and starts spamming
> > maintainers with these things to the point that they start ignoring KCSAN
> > reports altogether because of the time they take up.
> >
> > I suppose one thing we could do is to require each new READ_ONCE/WRITE_ONCE
> > to have a comment describing the racy access, a bit like we do for memory
> > barriers. Another possibility would be to use atomic_t more widely if
> > there is genuine concurrency involved.
> >
>
> About READ_ONCE() and WRITE_ONCE(), we will probably need
>
> ADD_ONCE(var, value) for arches that can implement the RMW in a single instruction.
>
> WRITE_ONCE(var, var + value) does not look pretty, and increases register pressure.

FWIW modern compilers can handle this if we tell them what we are trying to do:

void foo(int *p, int x)
{
	x += __atomic_load_n(p, __ATOMIC_RELAXED);
	__atomic_store_n(p, x, __ATOMIC_RELAXED);
}

$ clang test.c -c -O2 && objdump -d test.o

0000000000000000 <foo>:
0: 01 37 add %esi,(%rdi)
2: c3 retq

We can have syntactic sugar on top of this of course.
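
(A hypothetical version of that sugar, reusing Eric's proposed ADD_ONCE() name;
this is not an existing kernel macro, and it is written as the same relaxed
load/store pair as foo() above so the compiler remains free to emit a single
add instruction:)

        /* Hypothetical: relaxed load + relaxed store of var + val. */
        #define ADD_ONCE(var, val)                                              \
                __atomic_store_n(&(var),                                        \
                                 __atomic_load_n(&(var), __ATOMIC_RELAXED) + (val), \
                                 __ATOMIC_RELAXED)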

Dmitry Vyukov

Oct 9, 2019, 3:46:05 AM
to Eric Dumazet, Will Deacon, Marco Elver, kasan-dev, LKML, Andrey Konovalov, Alexander Potapenko, Paul E. McKenney, Paul Turner, Daniel Axtens, Anatol Pomazau, Andrea Parri, Alan Stern, LKMM Maintainers -- Akira Yokosawa, Nicholas Piggin, Boqun Feng, Daniel Lustig, Jade Alglave, Luc Maranget
An interesting precedent came up in another KCSAN bug report. Namely,
it may be reasonable for a compiler to use different optimization
heuristics for concurrent and non-concurrent code. Consider there are
some legal code transformations, but it's unclear if they are
profitable or not. It may be the case that for non-concurrent code the
expectation is that it's a profitable transformation, but for
concurrent code it is not. So that may be another reason to
communicate to compiler what we want to do, rather than trying to
trick and play against each other. I've added the concrete example
here:
https://github.com/google/ktsan/wiki/READ_ONCE-and-WRITE_ONCE#it-may-improve-performance

Eric Dumazet

Oct 9, 2019, 12:39:45 PM
to Dmitry Vyukov, Eric Dumazet, Will Deacon, Marco Elver, kasan-dev, LKML, Andrey Konovalov, Alexander Potapenko, Paul E. McKenney, Paul Turner, Daniel Axtens, Anatol Pomazau, Andrea Parri, Alan Stern, LKMM Maintainers -- Akira Yokosawa, Nicholas Piggin, Boqun Feng, Daniel Lustig, Jade Alglave, Luc Maranget
Note that for bit fields, READ_ONCE() won't work.

Concrete example in net/xfrm/xfrm_algo.c:xfrm_probe_algs(void)
	...
	if (aalg_list[i].available != status)
		aalg_list[i].available = status;
	...
	if (ealg_list[i].available != status)
		ealg_list[i].available = status;
	...
	if (calg_list[i].available != status)
		calg_list[i].available = status;
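
(To spell out the limitation: READ_ONCE()/WRITE_ONCE() need the address of the
location, and C provides no address for a bit-field. One hypothetical
workaround, if the layout change is acceptable, is to widen the flag to an
addressable type; the struct below is invented and is not the real xfrm one:)

        #include <linux/compiler.h>
        #include <linux/types.h>

        struct algo_entry {
                u8 available;   /* was a 1-bit bit-field; now addressable and markable */
        };

        static inline void algo_entry_set_available(struct algo_entry *e, bool status)
        {
                if (READ_ONCE(e->available) != status)
                        WRITE_ONCE(e->available, status);
        }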

Daniel Axtens

Oct 10, 2019, 11:46:07 PM
to Marco Elver, kasan-dev, LKML, Dmitry Vyukov, Andrey Konovalov, Alexander Potapenko, Paul E. McKenney, Paul Turner, Anatol Pomazau, Will Deacon, Andrea Parri, Alan Stern, LKMM Maintainers -- Akira Yokosawa, Nicholas Piggin, Boqun Feng, Daniel Lustig, Jade Alglave, Luc Maranget
Thanks, that was what I needed. I can now get it to boot Ubuntu on
ppc64le. Still hitting a lot of things, but we'll poke and prod it a bit
internally and let you know how we get on!

Regards,
Daniel

Andrea Parri

Oct 14, 2019, 6:35:28 AM
to Dmitry Vyukov, Eric Dumazet, Will Deacon, Marco Elver, kasan-dev, LKML, Andrey Konovalov, Alexander Potapenko, Paul E. McKenney, Paul Turner, Daniel Axtens, Anatol Pomazau, Alan Stern, LKMM Maintainers -- Akira Yokosawa, Nicholas Piggin, Boqun Feng, Daniel Lustig, Jade Alglave, Luc Maranget
Unrelated, but maybe worth pointing out/for reference: I think that
the section discussing the LKMM,

https://github.com/google/ktsan/wiki/READ_ONCE-and-WRITE_ONCE#it-is-required-for-kernel-memory-model ,

might benefit from a revision/an update, in particular, the statement
"The Kernel Memory Consistency Model requires marking of all shared
accesses" seems now quite inaccurate to me, c.f., e.g.,

d1a84ab190137 ("tools/memory-model: Add definitions of plain and marked accesses")
0031e38adf387 ("tools/memory-model: Add data-race detection")

and

https://lkml.kernel.org/r/Pine.LNX.4.44L0.1910...@iolanthe.rowland.org .

Thanks,
Andrea

Walter

Dec 12, 2019, 4:57:33 AM
to Marco Elver, kasan-dev, LKML, Dmitry Vyukov, Andrey Konovalov, Alexander Potapenko, pau...@linux.ibm.com, Paul Turner, Daniel Axtens, Anatol Pomazau, Will Deacon, Andrea Parri, st...@rowland.harvard.edu, aki...@gmail.com, npi...@gmail.com, boqun...@gmail.com, dlu...@nvidia.com, j.al...@ucl.ac.uk, luc.ma...@inria.fr
Hi Marco,

Data-race issues have always bothered us, and we are happy to use this debug
tool to find their root causes. To understand the tool's implementation, we
have been tracing your code and have some questions; could you take some time
to answer them?
Thanks.

Question:
We assume the read() and write() below access the same variable.
Are the two scenarios below false negatives?

===
Scenario 1:

CPU 0:                                                 CPU 1:
tsan_read()                                            tsan_write()
  check_access()                                         check_access()
    watchpoint = find_watchpoint() // watchpoint=NULL      watchpoint = find_watchpoint() // watchpoint=NULL
    kcsan_setup_watchpoint()                               kcsan_setup_watchpoint()
      watchpoint = insert_watchpoint                         watchpoint = insert_watchpoint
      if (!remove_watchpoint(watchpoint))                    if (!remove_watchpoint(watchpoint))
        // no enter, no report                                 // no enter, no report

===
Scenario 2:

CPU 0:                                                 CPU 1:
tsan_read()
  check_access()
    watchpoint = find_watchpoint() // watchpoint=NULL
    kcsan_setup_watchpoint()
      watchpoint = insert_watchpoint()

tsan_read()                                            tsan_write()
  check_access()                                         check_access()
    find_watchpoint()
      if (expect_write && !is_write)
        continue
      return NULL
    kcsan_setup_watchpoint()
      watchpoint = insert_watchpoint()
      remove_watchpoint(watchpoint)
        watchpoint = INVALID_WATCHPOINT
                                                           watchpoint = find_watchpoint()
                                                           kcsan_found_watchpoint()
                                                             consumed = try_consume_watchpoint() // consumed=false, no report

Thanks.
Walter


Marco Elver

Dec 12, 2019, 3:53:42 PM
to Walter, kasan-dev, LKML, Dmitry Vyukov, Andrey Konovalov, Alexander Potapenko, Paul E. McKenney, Paul Turner, Daniel Axtens, Anatol Pomazau, Will Deacon, Andrea Parri, Alan Stern, LKMM Maintainers -- Akira Yokosawa, Nicholas Piggin, Boqun Feng, Daniel Lustig, Jade Alglave, Luc Maranget
On Thu, 12 Dec 2019 at 10:57, Walter <tru...@gmail.com> wrote:
>
> Hi Marco,
>
> Data-race issues have always bothered us, and we are happy to use this debug
> tool to find their root causes. To understand the tool's implementation, we
> have been tracing your code and have some questions; could you take some time
> to answer them?
> Thanks.
>
> Question:
> We assume the read() and write() below access the same variable.
> Are the two scenarios below false negatives?
>
> ===
> Scenario 1:
>
> CPU 0:                                                 CPU 1:
> tsan_read()                                            tsan_write()
>   check_access()                                         check_access()
>     watchpoint = find_watchpoint() // watchpoint=NULL      watchpoint = find_watchpoint() // watchpoint=NULL
>     kcsan_setup_watchpoint()                               kcsan_setup_watchpoint()
>       watchpoint = insert_watchpoint                         watchpoint = insert_watchpoint

Assumption: there is more than 1 free slot for the address; otherwise it is
impossible that both set up a watchpoint.

>       if (!remove_watchpoint(watchpoint))                    if (!remove_watchpoint(watchpoint))
>         // no enter, no report                                 // no enter, no report

Correct.

> ===
> Scenario 2:
>
> CPU 0:                                                 CPU 1:
> tsan_read()
>   check_access()
>     watchpoint = find_watchpoint() // watchpoint=NULL
>     kcsan_setup_watchpoint()
>       watchpoint = insert_watchpoint()
>
> tsan_read()                                            tsan_write()
>   check_access()                                         check_access()
>     find_watchpoint()
>       if (expect_write && !is_write)
>         continue
>       return NULL
>     kcsan_setup_watchpoint()
>       watchpoint = insert_watchpoint()
>       remove_watchpoint(watchpoint)
>         watchpoint = INVALID_WATCHPOINT
>                                                            watchpoint = find_watchpoint()
>                                                            kcsan_found_watchpoint()

This is a bit incorrect, because if atomically setting watchpoint to
INVALID_WATCHPOINT happened before concurrent find_watchpoint(),
find_watchpoint will not return anything, thus not entering
kcsan_found_watchpoint. If find_watchpoint happened before setting
watchpoint to INVALID_WATCHPOINT, the rest of the trace matches.
Either way, no reporting will happen.

> consumed = try_consume_watchpoint() // consumed=false, no report

Correct again, no reporting would happen. While running, have a look
at /sys/kernel/debug/kcsan and look at the 'report_races' counter;
that counter tells you how often this case actually occurred. In all
our testing with the default config, this case is extremely rare.

As it says on the tin, KCSAN is a *sampling watchpoint* based data
race detector so all the above are expected. If you want to tweak
KCSAN's config to be more aggressive, there are various options
available. The most important ones:

* KCSAN_UDELAY_{TASK,INTERRUPT} -- Watchpoint delay in microseconds
for tasks and interrupts respectively. [Increasing this will make
KCSAN more aggressive.]
* KCSAN_SKIP_WATCH -- Skip instructions before setting up watchpoint.
[Decreasing this will make KCSAN more aggressive.]

Note, however, that making KCSAN more aggressive also implies a
noticeable performance hit.

Also, please find the latest version here:
https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git/log/?h=kcsan
-- there have been a number of changes since the initial version from
September/October.

Thanks,
-- Marco

Walter Wu

Dec 13, 2019, 7:26:18 AM
to Marco Elver, kasan-dev, LKML, Dmitry Vyukov, Andrey Konovalov, Alexander Potapenko, Paul E. McKenney, Paul Turner, Daniel Axtens, Anatol Pomazau, Will Deacon, Andrea Parri, Alan Stern, LKMM Maintainers -- Akira Yokosawa, Nicholas Piggin, Boqun Feng, Daniel Lustig, Jade Alglave, Luc Maranget
On Fri, Dec 13, 2019 at 4:53 AM, Marco Elver <el...@google.com> wrote:
Timing should be an important factor for data races; if we change this
config, might a given data-race issue disappear?

> * KCSAN_SKIP_WATCH -- Skip instructions before setting up watchpoint.
> [Decreasing this will make KCSAN more aggressive.]
I see. 

> Note, however, that making KCSAN more aggressive also implies a
> noticeable performance hit.

Thanks for your suggestion.
In fact, we are more concerned about false positives; we haven't seen any yet.
If we do see one, we will report it to you.

> Also, please find the latest version here:
> https://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git/log/?h=kcsan
> -- there have been a number of changes since the initial version from
> September/October.

Yes, we will keep following the newer KCSAN, and once it is successfully
upstreamed we will import it into our company's kernel. I have another
question: do you know whether it will be merged into the Android common
kernel?

Walter