After reverting 3c05d7482777f15e71bb4cb1ba78dee2800dfec6, I ran the test
for 15 hours, and the bug seems to have disappeared.
------------[ cut here ]------------
WARNING: at kernel/trace/ring_buffer.c:3420 ring_buffer_reset_cpu+0x5a/0x9c()
Hardware name: PRIMERGY
Modules linked in: ipt_MASQUERADE iptable_nat nf_nat bridge stp llc deflate zlib_deflate ctr camellia cast5 rmd160 crypto_null ccm serpent blowfish twofish_x86_64 twofish_common ecb xcbc cbc sha256_generic sha512_generic des_generic cryptd aes_x86_64 aes_generic ah6 ah4 esp6 esp4 xfrm4_mode_beet xfrm4_tunnel tunnel4 xfrm4_mode_tunnel xfrm4_mode_transport xfrm6_mode_transport xfrm6_mode_beet ipcomp ipcomp6 xfrm_ipcomp xfrm6_tunnel tunnel6 af_key autofs4 sunrpc ipv6 dm_mirror dm_region_hash dm_log dm_multipath uinput i5k_amb hwmon e1000e i5000_edac edac_core iTCO_wdt iTCO_vendor_support shpchp i2c_i801 i2c_core serio_raw pcspkr pata_acpi ata_generic mptsas mptscsih ata_piix mptbase scsi_transport_sas dm_mod [last unloaded: speedstep_lib]
Pid: 2108, comm: ftrace_current_ Tainted: G W 2.6.33-tip+ #2
Call Trace:
[<ffffffff8104ebd0>] warn_slowpath_common+0x7c/0x94
[<ffffffff8104ebfc>] warn_slowpath_null+0x14/0x16
[<ffffffff810a50f5>] ring_buffer_reset_cpu+0x5a/0x9c
[<ffffffff810aa3f0>] tracing_reset_online_cpus+0x4a/0x79
[<ffffffff810aa434>] tracer_init+0x15/0x20
[<ffffffff810aa601>] tracing_set_tracer+0x1c2/0x1e7
[<ffffffff810d9847>] ? might_fault+0x5c/0xac
[<ffffffff810aa6b2>] tracing_set_trace_write+0x8c/0xbc
[<ffffffff810a8e79>] ? trace_buffer_lock_reserve+0x21/0x56
[<ffffffff810a95ce>] ? trace_current_buffer_lock_reserve+0x16/0x18
[<ffffffff813ef135>] ? retint_swapgs+0x13/0x1b
[<ffffffff81104618>] vfs_write+0xae/0x10b
[<ffffffff81104735>] sys_write+0x4a/0x6e
[<ffffffff81009da6>] tracesys+0xd0/0xd5
---[ end trace f7002f0890d5b023 ]---
This bug has exhausted us, but we found one issue: a tiny window that is
one of the causes of this bug. This patch closes that tiny window.
(The bug itself is not fixed.)
From: Lai Jiangshan <la...@cn.fujitsu.com>
Ring buffer resizing and resetting increment ->record_disabled and then
wait until an rcu-sched grace period passes.

Accordingly, testing ->record_disabled must be done in the same
preempt-disabled critical region as the write into the ring buffer;
otherwise it leaves a window that breaks ring buffer resizing and
resetting.
Signed-off-by: Lai Jiangshan <la...@cn.fujitsu.com>
Reported-by: Li Zefan <li...@cn.fujitsu.com>
---
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 8c1b2d2..54191d6 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -2232,12 +2232,12 @@ ring_buffer_lock_reserve(struct ring_buffer *buffer, unsigned long length)
 	if (ring_buffer_flags != RB_BUFFERS_ON)
 		return NULL;
 
-	if (atomic_read(&buffer->record_disabled))
-		return NULL;
-
 	/* If we are tracing schedule, we don't want to recurse */
 	resched = ftrace_preempt_disable();
 
+	if (atomic_read(&buffer->record_disabled))
+		goto out_nocheck;
+
 	if (trace_recursive_lock())
 		goto out_nocheck;
@@ -2469,11 +2469,11 @@ int ring_buffer_write(struct ring_buffer *buffer,
 	if (ring_buffer_flags != RB_BUFFERS_ON)
 		return -EBUSY;
 
-	if (atomic_read(&buffer->record_disabled))
-		return -EBUSY;
-
 	resched = ftrace_preempt_disable();
 
+	if (atomic_read(&buffer->record_disabled))
+		goto out;
+
 	cpu = raw_smp_processor_id();
 	if (!cpumask_test_cpu(cpu, buffer->cpumask))
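
For illustration only (not part of the patch): a minimal user-space model of
the window described above. The atomics record_disabled/in_critical and the
busy-wait are simplified stand-ins for the ring buffer flag, the
preempt-disabled region, and synchronize_sched().

/*
 * Illustration only, not kernel code: a user-space model of the window.
 */
#include <stdatomic.h>
#include <stdio.h>

static atomic_int record_disabled;
static atomic_int in_critical;

/* Buggy ordering: the check happens before the critical section. */
static int writer_buggy(void)
{
	if (atomic_load(&record_disabled))
		return 0;
	/* <-- the resizer's "grace period" can complete in this window */
	atomic_fetch_add(&in_critical, 1);
	/* write into a buffer that may be resized/reset underneath us */
	atomic_fetch_sub(&in_critical, 1);
	return 1;
}

/* Fixed ordering: the check happens inside the critical section. */
static int writer_fixed(void)
{
	atomic_fetch_add(&in_critical, 1);
	if (atomic_load(&record_disabled)) {
		atomic_fetch_sub(&in_critical, 1);
		return 0;
	}
	/* safe: the resizer must wait for us to leave before touching it */
	atomic_fetch_sub(&in_critical, 1);
	return 1;
}

/* Resizer/resetter: disable, wait for current writers, then modify. */
static void resizer(void)
{
	atomic_fetch_add(&record_disabled, 1);
	while (atomic_load(&in_critical))
		;	/* stand-in for synchronize_sched() */
	/* resize or reset the buffer here */
	atomic_fetch_sub(&record_disabled, 1);
}

int main(void)
{
	printf("recorded (buggy ordering): %d\n", writer_buggy());
	printf("recorded (fixed ordering): %d\n", writer_fixed());
	resizer();
	return 0;
}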
What test are you running? Can you share it?
>
> After reverting 3c05d7482777f15e71bb4cb1ba78dee2800dfec6, I ran the test
> for 15 hours, and the bug seems to have disappeared.
So basically, you are reading both the trace and trace_pipe (or
trace_pipe_raw) at the same time?
-- Steve
> > Ring buffer resizing and resetting increment ->record_disabled and then
> > wait until an rcu-sched grace period passes.
> >
> > Accordingly, testing ->record_disabled must be done in the same
> > preempt-disabled critical region as the write into the ring buffer;
> > otherwise it leaves a window that breaks ring buffer resizing and
> > resetting.
>
> So the resizing and the resetting need a synchronize_sched() after the
> disabling of the buffers, right?
Looking at the code, the synchronize_sched() is already done in
ring_buffer_resize, and the caller (trace.c:tracing_reset() ) also
disables the ring buffer and calls synchronize_sched().
With that, what other window is still opened (after this fix)?
-- Steve
So the resizing and the resetting need a synchronize_sched() after the
disabling of the buffers, right?
-- Steve
The test runs 10+ processes, reading/writing files in debugfs/tracing/
simultaneously.
I'll send it to you in a private mail.
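
(For reference only, and not the actual test that went off-list: a rough
sketch of the described workload, i.e. 10+ processes reading/writing
debugfs/tracing/ files at once. It assumes debugfs is mounted at
/sys/kernel/debug and that it runs as root.)

#include <fcntl.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

#define TRACING	"/sys/kernel/debug/tracing/"
#define NPROC	10
#define LOOPS	1000

/* Repeatedly write val to file, or read from it when val is NULL. */
static void hammer(const char *file, const char *val)
{
	char buf[4096];
	ssize_t ret;
	int i, fd;

	for (i = 0; i < LOOPS; i++) {
		fd = open(file, val ? O_WRONLY : O_RDONLY);
		if (fd < 0)
			continue;
		if (val)
			ret = write(fd, val, strlen(val));
		else
			ret = read(fd, buf, sizeof(buf));
		(void)ret;
		close(fd);
	}
}

int main(void)
{
	int i;

	for (i = 0; i < NPROC; i++) {
		if (fork() == 0) {
			/* mix tracer switching with trace readers */
			if (i & 1)
				hammer(TRACING "current_tracer",
				       (i & 2) ? "function" : "nop");
			else
				hammer(TRACING "trace", NULL);
			_exit(0);
		}
	}
	while (wait(NULL) > 0)
		;
	return 0;
}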
>> After reverting 3c05d7482777f15e71bb4cb1ba78dee2800dfec6, I ran the test
>> for 15 hours, and the bug seems to have disappeared.
>
> So basically, you are reading both the trace and trace_pipe (or
> trace_pipe_raw) at the same time?
>
reading trace, yes. reading trace_pipe, no.
Actually, we don't see how this commit relates to the bug.
I'll run the test again to be sure about this.
This window is still open (RCU vs. idle vs. tracing):

synchronize_sched() does not protect preempt_disable()/enable() in the
idle process, but tracing (function_graph, function) introduces more
preempt_disable()/enable() sections into the idle process. That creates
windows.

I bet this bug does not come from that window.
(I added some strict checks to RCU and ran the stress test;
the bug still occurred.)
It most definitely should. If it does not, then we have major problems.
From the comment above synchronize_sched(void):
/**
* synchronize_sched - wait until an rcu-sched grace period has elapsed.
*
* Control will return to the caller some time after a full rcu-sched
* grace period has elapsed, in other words after all currently executing
* rcu-sched read-side critical sections have completed. These read-side
* critical sections are delimited by rcu_read_lock_sched() and
* rcu_read_unlock_sched(), and may be nested. Note that preempt_disable(),
* local_irq_disable(), and so on may be used in place of
* rcu_read_lock_sched().
I do not see how idle can be ignored here.
>
> I bet this bug does not come from that window.
> (I added some strict checks to RCU and ran the stress test;
> the bug still occurred.)
>
What exactly were you doing when this happened? Was this from Li's test?
-- Steve
With Li's test, I was able to trigger the ring_buffer warning about
resetting while committing. I was still able to trigger it with your
patch to move the ring buffer disable code. But I did not trigger it
with both the ring buffer disable code _and_ your RCU fix.
But I was able to trigger the segfault that Li is seeing. That is most
likely a separate issue. I'm currently writing special debug code to
find out why that is happening.
Thanks!
-- Steve
> With Li's test, I was able to trigger the ring_buffer warning about
> resetting while committing. I was still able to trigger it with your
> patch to move the ring buffer disable code. But I did not trigger it
> with both the ring buffer disable code _and_ your RCU fix.
Spoke too soon. I hit it again. But I think I know why.
ring-buffer: Move disabled check into preempt disable section
The ring buffer resizing and resetting rely on a schedule RCU action.
The buffers are disabled, a synchronize_sched() is called, and then the
resize or reset takes place.

But this only works if the disabling of the buffers is within the
preempt-disabled section; otherwise a window exists in which the buffers
can be written to while a reset or resize takes place.
Cc: sta...@kernel.org
Reported-by: Li Zefan <li...@cn.fujitsu.com>
Signed-off-by: Lai Jiangshan <la...@cn.fujitsu.com>
LKML-Reference: <4B949E43...@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <ros...@goodmis.org>
---
kernel/trace/ring_buffer.c | 12 ++++++------
1 files changed, 6 insertions(+), 6 deletions(-)