rcu: WARNING in rcu_seq_end


Dmitry Vyukov

Mar 4, 2017, 11:01:40 AM
to Paul McKenney, jo...@joshtriplett.org, Steven Rostedt, Mathieu Desnoyers, jiangs...@gmail.com, LKML, syzkaller
Hello,

Paul, you wanted bugs in rcu.

I've got this WARNING while running syzkaller fuzzer on
86292b33d4b79ee03e2f43ea0381ef85f077c760:

------------[ cut here ]------------
WARNING: CPU: 0 PID: 4832 at kernel/rcu/tree.c:3533
rcu_seq_end+0x110/0x140 kernel/rcu/tree.c:3533
Kernel panic - not syncing: panic_on_warn set ...
CPU: 0 PID: 4832 Comm: kworker/0:3 Not tainted 4.10.0+ #276
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Workqueue: events wait_rcu_exp_gp
Call Trace:
__dump_stack lib/dump_stack.c:15 [inline]
dump_stack+0x2ee/0x3ef lib/dump_stack.c:51
panic+0x1fb/0x412 kernel/panic.c:179
__warn+0x1c4/0x1e0 kernel/panic.c:540
warn_slowpath_null+0x2c/0x40 kernel/panic.c:583
rcu_seq_end+0x110/0x140 kernel/rcu/tree.c:3533
rcu_exp_gp_seq_end kernel/rcu/tree_exp.h:36 [inline]
rcu_exp_wait_wake+0x8a9/0x1330 kernel/rcu/tree_exp.h:517
rcu_exp_sel_wait_wake kernel/rcu/tree_exp.h:559 [inline]
wait_rcu_exp_gp+0x83/0xc0 kernel/rcu/tree_exp.h:570
process_one_work+0xc06/0x1c20 kernel/workqueue.c:2096
worker_thread+0x223/0x19c0 kernel/workqueue.c:2230
kthread+0x326/0x3f0 kernel/kthread.c:227
ret_from_fork+0x31/0x40 arch/x86/entry/entry_64.S:430
Dumping ftrace buffer:
(ftrace buffer empty)
Kernel Offset: disabled
Rebooting in 86400 seconds..


Not reproducible. But looking at the code, shouldn't it be:

static void rcu_seq_end(unsigned long *sp)
{
	smp_mb(); /* Ensure update-side operation before counter increment. */
+	WARN_ON_ONCE(!(*sp & 0x1));
	WRITE_ONCE(*sp, *sp + 1);
-	WARN_ON_ONCE(*sp & 0x1);
}

?

Otherwise wait_event in _synchronize_rcu_expedited can return as soon
as WRITE_ONCE(*sp, *sp + 1) finishes. As far as I understand, this can
consequently allow the next grace periods to start, which in turn can
make the warning fire. Am I missing something?
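
Concretely, the interleaving I am worried about is something like this
(a rough sketch of the scenario, not actual kernel code):

/*
 *   CPU 0 (workqueue handler)              CPU 1 (waiter / next requester)
 *   -------------------------              -------------------------------
 *   rcu_seq_end(sp):
 *     smp_mb();
 *     WRITE_ONCE(*sp, *sp + 1);  // even
 *                                          wait_event() sees rcu_seq_done(),
 *                                          _synchronize_rcu_expedited() returns,
 *                                          ->exp_mutex is dropped, and the next
 *                                          grace period starts:
 *                                            rcu_seq_start(sp);  // *sp odd again
 *     WARN_ON_ONCE(*sp & 0x1);   // may observe the odd value -> warning fires
 */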

I don't see any other bad consequences of this. The rest of
rcu_exp_wait_wake can proceed after _synchronize_rcu_expedited has
returned and destroyed the work on its stack and the next period has
started and ended, but that seems OK.

Paul E. McKenney

Mar 4, 2017, 8:14:17 PM
to Dmitry Vyukov, jo...@joshtriplett.org, Steven Rostedt, Mathieu Desnoyers, jiangs...@gmail.com, LKML, syzkaller
On Sat, Mar 04, 2017 at 05:01:19PM +0100, Dmitry Vyukov wrote:
> Hello,
>
> Paul, you wanted bugs in rcu.

Well, whether I want them or not, I must deal with them. ;-)
I believe that this is a good change, but I don't see how it will
help in this case. BTW, may I have your Signed-off-by?

The reason I don't believe that it will help is that the
rcu_exp_gp_seq_end() function is called from a workqueue handler that
is invoked holding ->exp_mutex, and this mutex is not released until
after the handler invokes rcu_seq_end() and then wakes up the task that
scheduled the workqueue handler. So the ordering above should not matter
(but I agree that your ordering is cleaner).

That said, it looks like I am missing some memory barriers, please
see the following patch.

But what architecture did you see this on?

Thanx, Paul

------------------------------------------------------------------------

commit 1038d97908afb04750d0775748e2165a6e92d851
Author: Paul E. McKenney <pau...@linux.vnet.ibm.com>
Date: Sat Mar 4 12:33:53 2017 -0800

rcu: Expedited wakeups need to be fully ordered

Expedited grace periods use workqueue handlers that wake up the requesters,
but there is no lock mediating this wakeup. Therefore, memory barriers
are required to ensure that the handler's memory references are seen by
all to occur before synchronize_*_expedited() returns to its caller.
Possibly detected by syzkaller.

Reported-by: Dmitry Vyukov <dvy...@google.com>
Signed-off-by: Paul E. McKenney <pau...@linux.vnet.ibm.com>

diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
index 51ca287828a2..027e123d93c7 100644
--- a/kernel/rcu/tree_exp.h
+++ b/kernel/rcu/tree_exp.h
@@ -533,6 +533,7 @@ static void rcu_exp_wait_wake(struct rcu_state *rsp, unsigned long s)
 				rnp->exp_seq_rq = s;
 			spin_unlock(&rnp->exp_lock);
 		}
+		smp_mb(); /* All above changes before wakeup. */
 		wake_up_all(&rnp->exp_wq[(rsp->expedited_sequence >> 1) & 0x3]);
 	}
 	trace_rcu_exp_grace_period(rsp->name, s, TPS("endwake"));
@@ -614,6 +615,7 @@ static void _synchronize_rcu_expedited(struct rcu_state *rsp,
 	wait_event(rnp->exp_wq[(s >> 1) & 0x3],
 		   sync_exp_work_done(rsp,
 				      &rdp->exp_workdone0, s));
+	smp_mb(); /* Workqueue actions happen before return. */
 
 	/* Let the next expedited grace period start. */
 	mutex_unlock(&rsp->exp_mutex);

Dmitry Vyukov

Mar 5, 2017, 5:51:00 AM
to Paul McKenney, jo...@joshtriplett.org, Steven Rostedt, Mathieu Desnoyers, jiangs...@gmail.com, LKML, syzkaller
This is just x86.

You seem to assume that wait_event() waits for the wakeup. It does not
work this way. It can return as soon as the condition becomes true
without ever waiting:

#define wait_event(wq, condition)				\
do {								\
	might_sleep();						\
	if (condition)						\
		break;						\
	__wait_event(wq, condition);				\
} while (0)

Mailed a signed patch:
https://groups.google.com/d/msg/syzkaller/XzUXuAzKkCw/5054wU9MEAAJ

Paul E. McKenney

Mar 5, 2017, 1:47:42 PM
to Dmitry Vyukov, jo...@joshtriplett.org, Steven Rostedt, Mathieu Desnoyers, jiangs...@gmail.com, LKML, syzkaller
Agreed, hence my patch in the previous email. I guess I knew that, but
on the day I wrote that code, my fingers didn't. Or some similar lame
excuse. ;-)
This is the patch you also sent by email, the one that moves the
WARN_ON_ONCE(); thank you!

Thanx, Paul

Dmitry Vyukov

Mar 6, 2017, 4:24:45 AM
to Paul McKenney, jo...@joshtriplett.org, Steven Rostedt, Mathieu Desnoyers, jiangs...@gmail.com, LKML, syzkaller
On Sun, Mar 5, 2017 at 7:47 PM, Paul E. McKenney wrote:
Ah, you meant to synchronize rcu_seq_end with rcu_seq_done?
I think you placed the barrier incorrectly for that. rcu_exp_wait_wake
is already too late. The write that unblocks the waiter is in rcu_seq_end,
so you need a release barrier _before_ that write.
Also, can we please start using smp_load_acquire/smp_store_release
where they are just what the doctor ordered? They are faster, more readable,
better for race detectors _and_ would prevent you from introducing
this bug, because you would need to find the exact write that
signifies completion. I.e.:

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index d80c2587bed8..aa7ba83f6a56 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3534,7 +3534,7 @@ static void rcu_seq_start(unsigned long *sp)
 static void rcu_seq_end(unsigned long *sp)
 {
 	smp_mb(); /* Ensure update-side operation before counter increment. */
-	WRITE_ONCE(*sp, *sp + 1);
+	smp_store_release(sp, *sp + 1);
 	WARN_ON_ONCE(*sp & 0x1);
 }

@@ -3554,7 +3554,7 @@ static unsigned long rcu_seq_snap(unsigned long *sp)
  */
 static bool rcu_seq_done(unsigned long *sp, unsigned long s)
 {
-	return ULONG_CMP_GE(READ_ONCE(*sp), s);
+	return ULONG_CMP_GE(smp_load_acquire(sp), s);

Paul E. McKenney

Mar 6, 2017, 5:07:49 AM
to Dmitry Vyukov, jo...@joshtriplett.org, Steven Rostedt, Mathieu Desnoyers, jiangs...@gmail.com, LKML, syzkaller
No, there is a mutex release and acquisition that do the synchronization,
but only if the wakeup has appropriate barriers. The issue is that
part of the mutex's critical section executes in a workqueue, possibly
on some other CPU.

Thanx, Paul

Dmitry Vyukov

Mar 6, 2017, 5:11:44 AM
to Paul McKenney, jo...@joshtriplett.org, Steven Rostedt, Mathieu Desnoyers, jiangs...@gmail.com, LKML, syzkaller
On Mon, Mar 6, 2017 at 11:07 AM, Paul E. McKenney wrote:
What is that mutex? And what locks/unlocks provide synchronization? I
see that one uses exp_mutex and another -- exp_wake_mutex.

Paul E. McKenney

Mar 6, 2017, 6:08:07 PM
to Dmitry Vyukov, jo...@joshtriplett.org, Steven Rostedt, Mathieu Desnoyers, jiangs...@gmail.com, LKML, syzkaller
Both of them.

->exp_mutex is acquired by the task requesting the grace period, and
the counter's first increment is done by that task under that mutex.
This task then schedules a workqueue, which drives forward the grace
period. Upon grace-period completion, the workqueue handler does the
second increment (the one that your patch addressed). The workqueue
handler then acquires ->exp_wake_mutex and wakes the task that holds
->exp_mutex (along with all other tasks waiting for this grace period),
and that task releases ->exp_mutex, which allows the next grace period to
start (and the first increment for that next grace period to be carried
out under that lock). The workqueue handler releases ->exp_wake_mutex
after finishing its wakeups.
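
In other words, the sequence is roughly the following (a simplified sketch
of the protocol described above, not verbatim kernel code):

/*
 *   Requester task                            Workqueue handler
 *   -------------------------------------     -------------------------------------
 *   mutex_lock(&rsp->exp_mutex);
 *   first counter increment (sequence odd)
 *   schedule the wait_rcu_exp_gp() handler
 *   wait_event(...);
 *                                             ... drive the grace period forward ...
 *                                             second counter increment (sequence even)
 *                                             mutex_lock(&rsp->exp_wake_mutex);
 *                                             wake_up_all(...);   // wakes the requester
 *   mutex_unlock(&rsp->exp_mutex);
 *   (the next grace period, and its first
 *    increment, may now start)
 *                                             mutex_unlock(&rsp->exp_wake_mutex);
 */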

Does that make sense?

Thanx, Paul

Dmitry Vyukov

Mar 7, 2017, 2:05:40 AM
to Paul McKenney, jo...@joshtriplett.org, Steven Rostedt, Mathieu Desnoyers, jiangs...@gmail.com, LKML, syzkaller
Then we need the following for the case when the task requesting the
grace period does not block, right?

diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index d80c2587bed8..aa7ba83f6a56 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -3534,7 +3534,7 @@ static void rcu_seq_start(unsigned long *sp)
 static void rcu_seq_end(unsigned long *sp)
 {
 	smp_mb(); /* Ensure update-side operation before counter increment. */

Boqun Feng

Mar 7, 2017, 9:29:12 AM
to Dmitry Vyukov, Paul McKenney, jo...@joshtriplett.org, Steven Rostedt, Mathieu Desnoyers, jiangs...@gmail.com, LKML, syzkaller
On Tue, Mar 07, 2017 at 08:05:19AM +0100, Dmitry Vyukov wrote:
[...]
> >>
> >> What is that mutex? And what locks/unlocks provide synchronization? I
> >> see that one uses exp_mutex and another -- exp_wake_mutex.
> >
> > Both of them.
> >
> > ->exp_mutex is acquired by the task requesting the grace period, and
> > the counter's first increment is done by that task under that mutex.
> > This task then schedules a workqueue, which drives forward the grace
> > period. Upon grace-period completion, the workqueue handler does the
> > second increment (the one that your patch addressed). The workqueue
> > handler then acquires ->exp_wake_mutex and wakes the task that holds
> > ->exp_mutex (along with all other tasks waiting for this grace period),
> > and that task releases ->exp_mutex, which allows the next grace period to
> > start (and the first increment for that next grace period to be carried
> > out under that lock). The workqueue handler releases ->exp_wake_mutex
> > after finishing its wakeups.
>
>
> Then we need the following for the case when task requesting the grace
> period does not block, right?
>

Won't be necessary I think, as the smp_mb() in rcu_seq_end() and the
smp_mb__before_atomic() in sync_exp_work_done() already provide the
required ordering, no?

Regards,
Boqun

Dmitry Vyukov

Mar 7, 2017, 9:44:03 AM
to Boqun Feng, Paul McKenney, jo...@joshtriplett.org, Steven Rostedt, Mathieu Desnoyers, jiangs...@gmail.com, LKML, syzkaller
On Tue, Mar 7, 2017 at 3:27 PM, Boqun Feng <boqun...@gmail.com> wrote:
> On Tue, Mar 07, 2017 at 08:05:19AM +0100, Dmitry Vyukov wrote:
> [...]
>> >>
>> >> What is that mutex? And what locks/unlocks provide synchronization? I
>> >> see that one uses exp_mutex and another -- exp_wake_mutex.
>> >
>> > Both of them.
>> >
>> > ->exp_mutex is acquired by the task requesting the grace period, and
>> > the counter's first increment is done by that task under that mutex.
>> > This task then schedules a workqueue, which drives forward the grace
>> > period. Upon grace-period completion, the workqueue handler does the
>> > second increment (the one that your patch addressed). The workqueue
>> > handler then acquires ->exp_wake_mutex and wakes the task that holds
>> > ->exp_mutex (along with all other tasks waiting for this grace period),
>> > and that task releases ->exp_mutex, which allows the next grace period to
>> > start (and the first increment for that next grace period to be carried
>> > out under that lock). The workqueue handler releases ->exp_wake_mutex
>> > after finishing its wakeups.
>>
>>
>> Then we need the following for the case when task requesting the grace
>> period does not block, right?
>>
>
> Won't be necessary I think, as the smp_mb() in rcu_seq_end() and the
> smp_mb__before_atomic() in sync_exp_work_done() already provide the
> required ordering, no?

smp_mb() is probably fine, but smp_mb__before_atomic() is release not
acquire. If we want to play that game, then I guess we also need
smp_mb__after_atomic() there. But it would be way easier to understand
what happens there and prove that it's correct if we use
store_release/load_acquire.

Paul E. McKenney

Mar 7, 2017, 10:16:42 AM
to Dmitry Vyukov, jo...@joshtriplett.org, Steven Rostedt, Mathieu Desnoyers, jiangs...@gmail.com, LKML, syzkaller
On Tue, Mar 07, 2017 at 08:05:19AM +0100, Dmitry Vyukov wrote:
First and foremost, thank you for reviewing this stuff!!!

> Then we need the following for the case when task requesting the grace
> period does not block, right?
>
> diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> index d80c2587bed8..aa7ba83f6a56 100644
> --- a/kernel/rcu/tree.c
> +++ b/kernel/rcu/tree.c
> @@ -3534,7 +3534,7 @@ static void rcu_seq_start(unsigned long *sp)
> static void rcu_seq_end(unsigned long *sp)
> {
> smp_mb(); /* Ensure update-side operation before counter increment. */
> - WRITE_ONCE(*sp, *sp + 1);
> + smp_store_release(sp, *sp + 1);
> WARN_ON_ONCE(*sp & 0x1);
> }

The above is not needed, as the smp_mb() covers it completely.

> @@ -3554,7 +3554,7 @@ static unsigned long rcu_seq_snap(unsigned long *sp)
> */
> static bool rcu_seq_done(unsigned long *sp, unsigned long s)
> {
> - return ULONG_CMP_GE(READ_ONCE(*sp), s);
> + return ULONG_CMP_GE(smp_load_acquire(sp), s);

And this one is covered by the smp_mb__before_atomic() inside the "if"
statement in sync_exp_work_done(), via rcu_exp_gp_seq_done(). That is,
sync_exp_work_done() invokes rcu_exp_gp_seq_done() which in turn invokes
rcu_seq_done().
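
That is, eliding the surrounding code, the path in question looks
roughly like this:

/*
 *   sync_exp_work_done(rsp, stat, s)
 *     -> rcu_exp_gp_seq_done(rsp, s)
 *          -> rcu_seq_done(&rsp->expedited_sequence, s)   // reads the sequence counter
 *     -> smp_mb__before_atomic();   // intended to order that read
 *     -> atomic_long_inc(stat);     //   against what follows
 */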

Why smp_mb() instead of release-acquire? Because the grace-period
relationship is quite strong, it cannot be implemented with
release-acquire on all platforms.

Thanx, Paul

Paul E. McKenney

Mar 7, 2017, 10:27:22 AM
to Dmitry Vyukov, Boqun Feng, jo...@joshtriplett.org, Steven Rostedt, Mathieu Desnoyers, jiangs...@gmail.com, LKML, syzkaller
Fair point, how about the following?

Thanx, Paul

------------------------------------------------------------------------

commit 6fd8074f1976596898e39f5b7ea1755652533906
Author: Paul E. McKenney <pau...@linux.vnet.ibm.com>
Date: Tue Mar 7 07:21:23 2017 -0800

rcu: Add smp_mb__after_atomic() to sync_exp_work_done()

The sync_exp_work_done() function needs to fully order the counter-check
operation against anything happening after the corresponding grace period.
This is a theoretical bug, as all current architectures either provide
full ordering for atomic operations on the one hand or implement
smp_mb__before_atomic() as smp_mb() on the other. However, a little
future-proofing is a good thing. This commit therefore adds
smp_mb__after_atomic() after the atomic_long_inc()
in sync_exp_work_done().

Reported-by: Dmitry Vyukov <dvy...@google.com>
Signed-off-by: Paul E. McKenney <pau...@linux.vnet.ibm.com>

diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
index 027e123d93c7..652071abd9b4 100644
--- a/kernel/rcu/tree_exp.h
+++ b/kernel/rcu/tree_exp.h
@@ -247,6 +247,7 @@ static bool sync_exp_work_done(struct rcu_state *rsp, atomic_long_t *stat,
 		/* Ensure test happens before caller kfree(). */
 		smp_mb__before_atomic(); /* ^^^ */
 		atomic_long_inc(stat);
+		smp_mb__after_atomic(); /* ^^^ */
 		return true;
 	}
 	return false;

Dmitry Vyukov

Mar 7, 2017, 1:38:18 PM
to Paul McKenney, Boqun Feng, jo...@joshtriplett.org, Steven Rostedt, Mathieu Desnoyers, jiangs...@gmail.com, LKML, syzkaller
I am not qualified enough to reason about these smp_mb__after_atomic()
barriers. From a practical point of view there may be enough barriers in
the resulting machine code already, but when it comes to the formal
semantics of adding smp_mb__after_atomic() after an unrelated subsequent
atomic RMW op, I gave up. You must be the best candidate for this now :)

Paul E. McKenney

Mar 7, 2017, 2:09:38 PM
to Dmitry Vyukov, Boqun Feng, jo...@joshtriplett.org, Steven Rostedt, Mathieu Desnoyers, jiangs...@gmail.com, LKML, syzkaller
Unfortunately, there are code paths from sync_exp_work_done() that
have no memory barriers. :-(

And I might be the best candidate, but this email thread has definitely
shown that I am not infallible, never mind that there was already
plenty of evidence on this particular point. So thank you again for
your testing and review efforts!

Boqun Feng

Mar 7, 2017, 6:05:22 PM
to Paul E. McKenney, Dmitry Vyukov, jo...@joshtriplett.org, Steven Rostedt, Mathieu Desnoyers, jiangs...@gmail.com, LKML, syzkaller
The point is that smp_mb__before_atomic() + atomic_long_inc() will
guarantee an smp_mb() before or right along with the atomic operation,
and that's enough because rcu_seq_done() followed by an smp_mb() will
give it acquire-like behavior.

> > smp_mb__after_atomic() there. But it would be way easier to understand

Adding smp_mb__after_atomic() would be pointless, as it's the load of
->expedited_sequence that we want to have acquire behavior, rather than
the atomic increment of @stat.
If we really care about future-proofing, I think it's safer to
change smp_mb__before_atomic() to smp_mb() rather than adding an
__after_atomic() barrier. Though I think both would be unnecessary ;-)
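
To make that concrete, here is an abbreviated, annotated sketch of the
path being discussed (paraphrased, not an exact copy of the code):

static bool sync_exp_work_done(struct rcu_state *rsp, atomic_long_t *stat,
			       unsigned long s)
{
	if (rcu_exp_gp_seq_done(rsp, s)) {	/* loads ->expedited_sequence */
		/* Ensure test happens before caller kfree(). */
		smp_mb__before_atomic();	/* smp_mb() before/along with ... */
		atomic_long_inc(stat);		/* ... the atomic op, which is what
						   gives the load above its
						   acquire-like ordering */
		return true;
	}
	return false;
}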

Regards,
Boqun

> return true;
> }
> return false;
>

Paul E. McKenney

Mar 7, 2017, 6:32:01 PM
to Boqun Feng, Dmitry Vyukov, jo...@joshtriplett.org, Steven Rostedt, Mathieu Desnoyers, jiangs...@gmail.com, LKML, syzkaller
Given current architectures, true enough, from what I can see.

However, let's take a look at atomic_ops.rst:


If a caller requires memory barrier semantics around an atomic_t
operation which does not return a value, a set of interfaces are
defined which accomplish this::

void smp_mb__before_atomic(void);
void smp_mb__after_atomic(void);

For example, smp_mb__before_atomic() can be used like so::

obj->dead = 1;
smp_mb__before_atomic();
atomic_dec(&obj->ref_count);

It makes sure that all memory operations preceding the atomic_dec()
call are strongly ordered with respect to the atomic counter
operation. In the above example, it guarantees that the assignment of
"1" to obj->dead will be globally visible to other cpus before the
atomic counter decrement.

Without the explicit smp_mb__before_atomic() call, the
implementation could legally allow the atomic counter update visible
to other cpus before the "obj->dead = 1;" assignment.

So the ordering is guaranteed against the atomic operation, not
necessarily the stuff after it. But again, the implementations I know
of do make the guarantee, hence my calling it a theoretical bug in the
commit log.

> > > smp_mb__after_atomic() there. But it would be way easier to understand
>
> Adding smp_mb__after_atomic() would be pointless as it's the load of
> ->expedited_sequence that we want to ensure having acquire behavior
> rather than the atomic increment of @stat.

Again, agreed given current code, but atomic_ops.rst doesn't guarantee
ordering past the actual atomic operation itself.

Thanx, Paul

Boqun Feng

Mar 7, 2017, 8:39:18 PM
to Paul E. McKenney, Dmitry Vyukov, jo...@joshtriplett.org, Steven Rostedt, Mathieu Desnoyers, jiangs...@gmail.com, LKML, syzkaller
Fair enough ;-) It's me who misunderstood this part of the document.

However, the names of the barriers are smp_mb__{before,after}_atomic(),
so if they, semantically, only provide ordering for the corresponding
atomic ops rather than a full barrier, I would say their names are
misleading ;-)

> > > > smp_mb__after_atomic() there. But it would be way easier to understand
> >
> > Adding smp_mb__after_atomic() would be pointless as it's the load of
> > ->expedited_sequence that we want to ensure having acquire behavior
> > rather than the atomic increment of @stat.
>
> Again, agreed given current code, but atomic_ops.rst doesn't guarantee
> ordering past the actual atomic operation itself.
>

Neither does atomic_ops.rst guarantee the ordering between a load before
the atomic op and memory accesses after the atomic op, right? I.e.
atomic_ops.rst doesn't say no for reordering like this:

	r1 = READ_ONCE(a);       ------+
	atomic_long_inc(b);            |
	smp_mb__after_atomic();        |
	WRITE_ONCE(c);                 |
	{r1 = READ_ONCE(a)}     <------+

So it's still not an acquire for READ_ONCE(a), in our case "a" is
->expedited_sequence.

To me, we can either fix the atomic_ops.rst or, as I proposed, just
change smp_mb__before_atomic() to smp_mb().
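
Concretely, that alternative would look roughly like this (a sketch of
the idea, not a tested patch):

-		smp_mb__before_atomic(); /* ^^^ */
+		smp_mb(); /* Order the counter test against later accesses. */
 		atomic_long_inc(stat);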

Thoughts?

Regards,
Boqun

Paul E. McKenney

Mar 7, 2017, 9:26:10 PM
to Boqun Feng, Dmitry Vyukov, jo...@joshtriplett.org, Steven Rostedt, Mathieu Desnoyers, jiangs...@gmail.com, LKML, syzkaller
Well, if you have both ordering before and after, then you have full
ordering.

> > > > > smp_mb__after_atomic() there. But it would be way easier to understand
> > >
> > > Adding smp_mb__after_atomic() would be pointless as it's the load of
> > > ->expedited_sequence that we want to ensure having acquire behavior
> > > rather than the atomic increment of @stat.
> >
> > Again, agreed given current code, but atomic_ops.rst doesn't guarantee
> > ordering past the actual atomic operation itself.
>
> Neither does atomic_ops.rst guarantee the ordering between a load before
> the atomic op and memory accesses after the atomic op, right? I.e.
> atomic_ops.rst doesn't say no for reordering like this:
>
> r1 = READ_ONCE(a); ---------+
> atomic_long_inc(b); |
> smp_mb__after_atomic(); |
> WRITE_ONCE(c); |
> {r1 = READ_ONCE(a)} <-------+
>
> So it's still not an acquire for READ_ONCE(a), in our case "a" is
> ->expedited_sequence.
>
> To me, we can either fix the atomic_ops.rst or, as I proposed, just
> change smp_mb__before_atomic() to smp_mb().

Or have both an smp_mb__before_atomic() and an smp_mb__after_atomic(),
as is the usual approach when you need full ordering. ;-)

Thanx, Paul

Boqun Feng

Mar 7, 2017, 9:44:24 PM
to Paul E. McKenney, Dmitry Vyukov, jo...@joshtriplett.org, Steven Rostedt, Mathieu Desnoyers, jiangs...@gmail.com, LKML, syzkaller
I mean the names of the barriers are *smp_mb*__before_atomic() and
*smp_mb*__after_atomic(), so it's natural to think they provide a
smp_mb() in some situations ;-)

> > > > > > smp_mb__after_atomic() there. But it would be way easier to understand
> > > >
> > > > Adding smp_mb__after_atomic() would be pointless as it's the load of
> > > > ->expedited_sequence that we want to ensure having acquire behavior
> > > > rather than the atomic increment of @stat.
> > >
> > > Again, agreed given current code, but atomic_ops.rst doesn't guarantee
> > > ordering past the actual atomic operation itself.
> >
> > Neither does atomic_ops.rst guarantee the ordering between a load before
> > the atomic op and memory accesses after the atomic op, right? I.e.
> > atomic_ops.rst doesn't say no for reordering like this:
> >
> > r1 = READ_ONCE(a); ---------+
> > atomic_long_inc(b); |
> > smp_mb__after_atomic(); |
> > WRITE_ONCE(c); |
> > {r1 = READ_ONCE(a)} <-------+
> >
> > So it's still not an acquire for READ_ONCE(a), in our case "a" is
> > ->expedited_sequence.
> >
> > To me, we can either fix the atomic_ops.rst or, as I proposed, just
> > change smp_mb__before_atomic() to smp_mb().
>
> Or have both an smp_mb__before_atomic() and an smp_mb__after_atomic(),
> as is the usual approach when you need full ordering. ;-)
>

Yes ;-) It's just that "adding a barrier after one operation to provide
acquire semantic for another operation" looks weird to me.

Regards,
Boqun

Paul E. McKenney

Mar 7, 2017, 10:08:13 PM
to Boqun Feng, Dmitry Vyukov, jo...@joshtriplett.org, Steven Rostedt, Mathieu Desnoyers, jiangs...@gmail.com, LKML, syzkaller
But they are memory barriers! They are -supposed- to look weird! ;-)

Thanx, Paul