After reverting 3c05d7482777f15e71bb4cb1ba78dee2800dfec6, I ran the test
for 15 hours, and the bug seems to have disappeared.
------------[ cut here ]------------
WARNING: at kernel/trace/ring_buffer.c:3420 ring_buffer_reset_cpu+0x5a/0x9c()
Hardware name: PRIMERGY
Modules linked in: ipt_MASQUERADE iptable_nat nf_nat bridge stp llc deflate zlib_deflate ctr camellia cast5 rmd160 crypto_null ccm serpent blowfish twofish_x86_64 twofish_common ecb xcbc cbc sha256_generic sha512_generic des_generic cryptd aes_x86_64 aes_generic ah6 ah4 esp6 esp4 xfrm4_mode_beet xfrm4_tunnel tunnel4 xfrm4_mode_tunnel xfrm4_mode_transport xfrm6_mode_transport xfrm6_mode_beet ipcomp ipcomp6 xfrm_ipcomp xfrm6_tunnel tunnel6 af_key autofs4 sunrpc ipv6 dm_mirror dm_region_hash dm_log dm_multipath uinput i5k_amb hwmon e1000e i5000_edac edac_core iTCO_wdt iTCO_vendor_support shpchp i2c_i801 i2c_core serio_raw pcspkr pata_acpi ata_generic mptsas mptscsih ata_piix mptbase scsi_transport_sas dm_mod [last unloaded: speedstep_lib]
Pid: 2108, comm: ftrace_current_ Tainted: G W 2.6.33-tip+ #2
Call Trace:
[<ffffffff8104ebd0>] warn_slowpath_common+0x7c/0x94
[<ffffffff8104ebfc>] warn_slowpath_null+0x14/0x16
[<ffffffff810a50f5>] ring_buffer_reset_cpu+0x5a/0x9c
[<ffffffff810aa3f0>] tracing_reset_online_cpus+0x4a/0x79
[<ffffffff810aa434>] tracer_init+0x15/0x20
[<ffffffff810aa601>] tracing_set_tracer+0x1c2/0x1e7
[<ffffffff810d9847>] ? might_fault+0x5c/0xac
[<ffffffff810aa6b2>] tracing_set_trace_write+0x8c/0xbc
[<ffffffff810a8e79>] ? trace_buffer_lock_reserve+0x21/0x56
[<ffffffff810a95ce>] ? trace_current_buffer_lock_reserve+0x16/0x18
[<ffffffff813ef135>] ? retint_swapgs+0x13/0x1b
[<ffffffff81104618>] vfs_write+0xae/0x10b
[<ffffffff81104735>] sys_write+0x4a/0x6e
[<ffffffff81009da6>] tracesys+0xd0/0xd5
---[ end trace f7002f0890d5b023 ]---
This bug has exhausted us, but we found one issue: a tiny window that is
one of the causes of this bug. This patch closes that tiny window.
(The bug itself is not fixed.)
From: Lai Jiangshan <la...@cn.fujitsu.com>
Ring buffer resizing and resetting increment ->record_disabled and then
wait until an rcu-sched grace period passes.

Accordingly, testing ->record_disabled must be done in the same
preempt-disabled critical region as the write into the ring buffer;
otherwise it leaves a window that breaks ring buffer resizing and
resetting.
Signed-off-by: Lai Jiangshan <la...@cn.fujitsu.com>
Reported-by: Li Zefan <li...@cn.fujitsu.com>
---
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 8c1b2d2..54191d6 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -2232,12 +2232,12 @@ ring_buffer_lock_reserve(struct ring_buffer *buffer, unsigned long length)
 	if (ring_buffer_flags != RB_BUFFERS_ON)
 		return NULL;
 
-	if (atomic_read(&buffer->record_disabled))
-		return NULL;
-
 	/* If we are tracing schedule, we don't want to recurse */
 	resched = ftrace_preempt_disable();
 
+	if (atomic_read(&buffer->record_disabled))
+		goto out_nocheck;
+
 	if (trace_recursive_lock())
 		goto out_nocheck;
@@ -2469,11 +2469,11 @@ int ring_buffer_write(struct ring_buffer *buffer,
 	if (ring_buffer_flags != RB_BUFFERS_ON)
 		return -EBUSY;
 
-	if (atomic_read(&buffer->record_disabled))
-		return -EBUSY;
-
 	resched = ftrace_preempt_disable();
 
+	if (atomic_read(&buffer->record_disabled))
+		goto out;
+
 	cpu = raw_smp_processor_id();
 	if (!cpumask_test_cpu(cpu, buffer->cpumask))
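
For illustration only (not part of the patch): a minimal user-space model of
the window described above. The atomics record_disabled/in_critical and the
busy-wait are simplified stand-ins for the ring buffer flag, the
preempt-disabled region, and synchronize_sched().

/*
 * Illustration only, not kernel code: a user-space model of the window.
 */
#include <stdatomic.h>
#include <stdio.h>

static atomic_int record_disabled;
static atomic_int in_critical;

/* Buggy ordering: the check happens before the critical section. */
static int writer_buggy(void)
{
	if (atomic_load(&record_disabled))
		return 0;
	/* <-- the resizer's "grace period" can complete in this window */
	atomic_fetch_add(&in_critical, 1);
	/* write into a buffer that may be resized/reset underneath us */
	atomic_fetch_sub(&in_critical, 1);
	return 1;
}

/* Fixed ordering: the check happens inside the critical section. */
static int writer_fixed(void)
{
	atomic_fetch_add(&in_critical, 1);
	if (atomic_load(&record_disabled)) {
		atomic_fetch_sub(&in_critical, 1);
		return 0;
	}
	/* safe: the resizer must wait for us to leave before touching it */
	atomic_fetch_sub(&in_critical, 1);
	return 1;
}

/* Resizer/resetter: disable, wait for current writers, then modify. */
static void resizer(void)
{
	atomic_fetch_add(&record_disabled, 1);
	while (atomic_load(&in_critical))
		;	/* stand-in for synchronize_sched() */
	/* resize or reset the buffer here */
	atomic_fetch_sub(&record_disabled, 1);
}

int main(void)
{
	printf("recorded (buggy ordering): %d\n", writer_buggy());
	printf("recorded (fixed ordering): %d\n", writer_fixed());
	resizer();
	return 0;
}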
What test are you running? Can you share it?
>
> After reverting 3c05d7482777f15e71bb4cb1ba78dee2800dfec6, I ran the test
> for 15 hours, and the bug seems to have disappeared.
So basically, you are reading both the trace and trace_pipe (or
trace_pipe_raw) at the same time?
-- Steve
> > Ring buffer resizing and resetting increment ->record_disabled and then
> > wait until an rcu-sched grace period passes.
> >
> > Accordingly, testing ->record_disabled must be done in the same
> > preempt-disabled critical region as the write into the ring buffer;
> > otherwise it leaves a window that breaks ring buffer resizing and
> > resetting.
>
> So the resizing and the resetting need a synchronize_sched() after the
> disabling of the buffers, right?
Looking at the code, the synchronize_sched() is already done in
ring_buffer_resize, and the caller (trace.c:tracing_reset() ) also
disables the ring buffer and calls synchronize_sched().
With that, what other window is still opened (after this fix)?
-- Steve
So the resizing and the resetting need a synchronize_sched() after the
disabling of the buffers, right?
-- Steve
The test runs 10+ processes, reading/writing files in debugfs/tracing/
simultaneously.
I'll send it to you in a private mail.
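
(For reference only, and not the actual test that went off-list: a rough
sketch of the described workload, i.e. 10+ processes reading/writing
debugfs/tracing/ files at once. It assumes debugfs is mounted at
/sys/kernel/debug and that it runs as root.)

#include <fcntl.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

#define TRACING	"/sys/kernel/debug/tracing/"
#define NPROC	10
#define LOOPS	1000

/* Repeatedly write val to file, or read from it when val is NULL. */
static void hammer(const char *file, const char *val)
{
	char buf[4096];
	ssize_t ret;
	int i, fd;

	for (i = 0; i < LOOPS; i++) {
		fd = open(file, val ? O_WRONLY : O_RDONLY);
		if (fd < 0)
			continue;
		if (val)
			ret = write(fd, val, strlen(val));
		else
			ret = read(fd, buf, sizeof(buf));
		(void)ret;
		close(fd);
	}
}

int main(void)
{
	int i;

	for (i = 0; i < NPROC; i++) {
		if (fork() == 0) {
			/* mix tracer switching with trace readers */
			if (i & 1)
				hammer(TRACING "current_tracer",
				       (i & 2) ? "function" : "nop");
			else
				hammer(TRACING "trace", NULL);
			_exit(0);
		}
	}
	while (wait(NULL) > 0)
		;
	return 0;
}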
>> After reverting 3c05d7482777f15e71bb4cb1ba78dee2800dfec6, I ran the test
>> for 15 hours, and the bug seems to have disappeared.
>
> So basically, you are reading both the trace and trace_pipe (or
> trace_pipe_raw) at the same time?
>
reading trace, yes. reading trace_pipe, no.
Actually, we don't see how this commit relates to the bug.
I'll run the test again to be sure about this.
This window is still open (RCU vs. idle vs. tracing):

synchronize_sched() does not protect preempt_disable()/enable() in the
idle process, but tracing (function_graph, function) introduces more
preempt_disable()/enable() sections into the idle process. That creates
windows.

I bet this bug does not come from that window.
(I added some strict checks to RCU and ran the stress test;
the bug still occurred.)
It most definitely should. If it does not, then we have major problems.
From the comment above synchronize_sched(void):
/**
* synchronize_sched - wait until an rcu-sched grace period has elapsed.
*
* Control will return to the caller some time after a full rcu-sched
* grace period has elapsed, in other words after all currently executing
* rcu-sched read-side critical sections have completed. These read-side
* critical sections are delimited by rcu_read_lock_sched() and
* rcu_read_unlock_sched(), and may be nested. Note that preempt_disable(),
* local_irq_disable(), and so on may be used in place of
* rcu_read_lock_sched().
I do not see how idle can be ignored here.
>
> I bet this bug does not come from that window.
> (I added some strict checks to RCU and ran the stress test;
> the bug still occurred.)
>
What exactly were you doing when this happened? Was this from Li's test?
-- Steve
With Li's test, I was able to trigger the ring_buffer warning about
resetting while committing. I was still able to trigger it with your
patch to move the ring buffer disable code. But I did not trigger it
with both the ring buffer disable code _and_ your RCU fix.
But I was able to trigger the segfault that Li is seeing. That is most
likely a separate issue. I'm currently writing special debug code to
find out why that is happening.
Thanks!
-- Steve
> With Li's test, I was able to trigger the ring_buffer warning about
> resetting while committing. I was still able to trigger it with your
> patch to move the ring buffer disable code. But I did not trigger it
> with both the ring buffer disable code _and_ your RCU fix.
Spoke too soon. I hit it again. But I think I know why.
ring-buffer: Move disabled check into preempt disable section
The ring buffer resizing and resetting rely on a schedule RCU action.
The buffers are disabled, a synchronize_sched() is called, and then the
resize or reset takes place.

But this only works if the disabling of the buffers is within the
preempt-disabled section; otherwise a window exists in which the buffers
can be written to while a reset or resize takes place.
Cc: sta...@kernel.org
Reported-by: Li Zefan <li...@cn.fujitsu.com>
Signed-off-by: Lai Jiangshan <la...@cn.fujitsu.com>
LKML-Reference: <4B949E43...@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <ros...@goodmis.org>
---
kernel/trace/ring_buffer.c | 12 ++++++------
1 files changed, 6 insertions(+), 6 deletions(-)