With latest linus kernel, I see following crash. I was running some
scripts which create cgroups and launch fio jobs in those groups. In
a separate window I wrote a script to change the IO scheduler on the
device in a loop while IO was happening on the device. After few
seconds I see following. So far I tried it twice and reproduced it
both the times in first few seconds.
Thanks
Vivek
[ 94.217015] BUG: unable to handle kernel NULL pointer dereference at 000000000000001c
[ 94.218004] IP: [<ffffffff81142fae>] kmem_cache_free+0x5e/0x200
[ 94.218004] PGD 13abda067 PUD 137d52067 PMD 0
[ 94.218004] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
[ 94.218004] CPU 0
[ 94.218004] Modules linked in: [last unloaded: scsi_wait_scan]
[ 94.218004]
[ 94.218004] Pid: 0, comm: swapper/0 Not tainted 3.2.0+ #16 Hewlett-Packard HP xw6600 Workstation/0A9Ch
[ 94.218004] RIP: 0010:[<ffffffff81142fae>] [<ffffffff81142fae>] kmem_cache_free+0x5e/0x200
[ 94.218004] RSP: 0018:ffff88013fc03de0 EFLAGS: 00010006
[ 94.218004] RAX: ffffffff81e0d020 RBX: ffff880138b3c680 RCX: 00000001801c001b
[ 94.218004] RDX: 00000000003aac1d RSI: ffff880138b3c680 RDI: ffffffff81142fae
[ 94.218004] RBP: ffff88013fc03e10 R08: ffff880137830238 R09: 0000000000000001
[ 94.218004] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[ 94.218004] R13: ffffea0004e2cf00 R14: ffffffff812f6eb6 R15: 0000000000000246
[ 94.218004] FS: 0000000000000000(0000) GS:ffff88013fc00000(0000) knlGS:0000000000000000
[ 94.218004] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 94.218004] CR2: 000000000000001c CR3: 00000001395ab000 CR4: 00000000000006f0
[ 94.218004] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 94.218004] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 94.218004] Process swapper/0 (pid: 0, threadinfo ffffffff81e00000, task ffffffff81e0d020)
[ 94.218004] Stack:
[ 94.218004] 0000000000000102 ffff88013fc0db20 ffffffff81e22700 ffff880139500f00
[ 94.218004] 0000000000000001 000000000000000a ffff88013fc03e20 ffffffff812f6eb6
[ 94.218004] ffff88013fc03e90 ffffffff810c8da2 ffffffff81e01fd8 ffff880137830240
[ 94.218004] Call Trace:
[ 94.218004] <IRQ>
[ 94.218004] [<ffffffff812f6eb6>] icq_free_icq_rcu+0x16/0x20
[ 94.218004] [<ffffffff810c8da2>] __rcu_process_callbacks+0x1c2/0x420
[ 94.218004] [<ffffffff810c9038>] rcu_process_callbacks+0x38/0x250
[ 94.218004] [<ffffffff810405ee>] __do_softirq+0xce/0x3e0
[ 94.218004] [<ffffffff8108ed04>] ? clockevents_program_event+0x74/0x100
[ 94.218004] [<ffffffff81090104>] ? tick_program_event+0x24/0x30
[ 94.218004] [<ffffffff8183ed1c>] call_softirq+0x1c/0x30
[ 94.218004] [<ffffffff8100422d>] do_softirq+0x8d/0xc0
[ 94.218004] [<ffffffff81040c3e>] irq_exit+0xae/0xe0
[ 94.218004] [<ffffffff8183f4be>] smp_apic_timer_interrupt+0x6e/0x99
[ 94.218004] [<ffffffff8183e330>] apic_timer_interrupt+0x70/0x80
[ 94.218004] <EOI>
[ 94.218004] [<ffffffff8100a806>] ? mwait_idle+0xb6/0x4c0
[ 94.218004] [<ffffffff8100a7fd>] ? mwait_idle+0xad/0x4c0
[ 94.218004] [<ffffffff810011e6>] cpu_idle+0x96/0xe0
[ 94.218004] [<ffffffff818064df>] rest_init+0x133/0x144
[ 94.218004] [<ffffffff81806425>] ? rest_init+0x79/0x144
[ 94.218004] [<ffffffff81ed4b51>] start_kernel+0x35b/0x366
[ 94.218004] [<ffffffff81ed4321>] x86_64_start_reservations+0x131/0x135
[ 94.218004] [<ffffffff81ed4415>] x86_64_start_kernel+0xf0/0xf7
[ 94.218004] Code: f3 e8 97 cb ee ff 48 c1 e8 0c 48 c1 e0 06 49 01 c5 49 8b 45 00 f6 c4 80 0f 85 99 00 00 00 4c 8b 75 08 9c 41 5f fa e8 12 f8 f4 ff <49> 63 74 24 1c 48 89 df e8 e5 4b f5 ff 41 f7 c7 00 02 00 00 74
[ 94.218004] RIP [<ffffffff81142fae>] kmem_cache_free+0x5e/0x200
[ 94.218004] RSP <ffff88013fc03de0>
[ 94.218004] CR2: 000000000000001c
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Can you try this?
diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index 163263d..ee55019 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -3117,18 +3117,17 @@ cfq_should_preempt(struct cfq_data *cfqd, struct cfq_queue *new_cfqq,
*/
static void cfq_preempt_queue(struct cfq_data *cfqd, struct cfq_queue *cfqq)
{
- struct cfq_queue *old_cfqq = cfqd->active_queue;
-
cfq_log_cfqq(cfqd, cfqq, "preempt");
- cfq_slice_expired(cfqd, 1);
/*
* workload type is changed, don't save slice, otherwise preempt
* doesn't happen
*/
- if (cfqq_type(old_cfqq) != cfqq_type(cfqq))
+ if (cfqq_type(cfqd->active_queue) != cfqq_type(cfqq))
cfqq->cfqg->saved_workload_slice = 0;
+ cfq_slice_expired(cfqd, 1);
+
/*
* Put the new queue at the front of the of the current list,
* so we know that it will be selected next.
--
Jens Axboe
Nope, this does not help either. Can reproduce the issue with below
patch applied.
>
>
> diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
> index 163263d..ee55019 100644
> --- a/block/cfq-iosched.c
> +++ b/block/cfq-iosched.c
> @@ -3117,18 +3117,17 @@ cfq_should_preempt(struct cfq_data *cfqd, struct cfq_queue *new_cfqq,
> */
> static void cfq_preempt_queue(struct cfq_data *cfqd, struct cfq_queue *cfqq)
> {
> - struct cfq_queue *old_cfqq = cfqd->active_queue;
> -
> cfq_log_cfqq(cfqd, cfqq, "preempt");
> - cfq_slice_expired(cfqd, 1);
>
> /*
> * workload type is changed, don't save slice, otherwise preempt
> * doesn't happen
> */
> - if (cfqq_type(old_cfqq) != cfqq_type(cfqq))
> + if (cfqq_type(cfqd->active_queue) != cfqq_type(cfqq))
> cfqq->cfqg->saved_workload_slice = 0;
>
> + cfq_slice_expired(cfqd, 1);
> +
cfq_slice_expired() will overwrite the value of
cfqq->cfqg->saved_workload_slice. So we need to set it to zero after
cfq_slice_expired.
I was thinking of just saving the workload type of cfqq before
cfq_slice_expired() so that we don't access old cfqq after
cfq_slice_expired(). But then I noticed that we don't drop a cfqq
reference in cfq_slice_expired(). So not sure how cfq_slice_expired()
can lead to freeing up of queue. It should happen only when process
has exited and last request on the queue if finished.
Thanks
Vivek
Good point, lets just fix that up afterwards, the use-after-free needs
to go asap.
> I was thinking of just saving the workload type of cfqq before
> cfq_slice_expired() so that we don't access old cfqq after
> cfq_slice_expired(). But then I noticed that we don't drop a cfqq
> reference in cfq_slice_expired(). So not sure how cfq_slice_expired()
> can lead to freeing up of queue. It should happen only when process
> has exited and last request on the queue if finished.
It does, it drops a ref to the cic which in turn drops the active async
and sync queues.
Out for tonight, will pick this up in the morning.
--
Jens Axboe
Ok, I see it now. Thanks.
I modified your patch a bit. It does not seem to solve my problem but
might help with Chris Mason's boot issue.
Chris, can you please give it a try.
Thanks
Vivek
---
block/cfq-iosched.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
Index: linux-2.6/block/cfq-iosched.c
===================================================================
--- linux-2.6.orig/block/cfq-iosched.c 2012-01-18 02:49:33.000000000 -0500
+++ linux-2.6/block/cfq-iosched.c 2012-01-18 02:50:31.000000000 -0500
@@ -3117,7 +3117,7 @@ cfq_should_preempt(struct cfq_data *cfqd
*/
static void cfq_preempt_queue(struct cfq_data *cfqd, struct cfq_queue *cfqq)
{
- struct cfq_queue *old_cfqq = cfqd->active_queue;
+ enum wl_type_t old_cfqq_wl_type = cfqq_type(cfqd->active_queue);
cfq_log_cfqq(cfqd, cfqq, "preempt");
cfq_slice_expired(cfqd, 1);
@@ -3126,7 +3126,7 @@ static void cfq_preempt_queue(struct cfq
* workload type is changed, don't save slice, otherwise preempt
* doesn't happen
*/
- if (cfqq_type(old_cfqq) != cfqq_type(cfqq))
+ if (old_cfqq_wl_type != cfqq_type(cfqq))
cfqq->cfqg->saved_workload_slice = 0;
/*
[..]
Oops, old mail id of chris. Fixing it now.
Thanks
Vivek
Does the following patch make any difference?
diff --git a/block/elevator.c b/block/elevator.c
index 91e18f8..b5a94ed 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -105,6 +105,7 @@ int elv_rq_merge_ok(struct request *rq, struct bio *bio)
if (bio_integrity(bio) != blk_integrity_rq(rq))
return 0;
+ return 0;
if (!elv_iosched_allow_merge(rq, bio))
return 0;
Nope. Still can reproduce the issue.
Thanks
Vivek
On Wed, Jan 18, 2012 at 09:01:00AM +0800, Shaohua Li wrote:
> 2012/1/18 Vivek Goyal <vgo...@redhat.com>:
> >> Does the following patch make any difference?
> >
> > Nope. Still can reproduce the issue.
>
> I had similar issue too. Looks ioc_create_icq finds an icq (in ioc_lookup_icq)
> the icq is in rcu free (it has __rcu_icq_cache set)
Vivek is seeing the problem while switching elevators. Are you too?
Or is it during normal operation?
Thanks.
--
tejun
Hmmm... maybe quiescing isn't working as expected and kmem cache is
being destroyed with live icq's. I'll try to reproduce it.
Thanks.
--
tejun
diff --git a/block/blk-core.c b/block/blk-core.c
index e6c05a9..c6a8ef5 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -872,11 +872,11 @@ retry:
spin_unlock_irq(q->queue_lock);
/* create icq if missing */
- if (unlikely(et->icq_cache && !icq))
+ if (unlikely(et->icq_cache && !icq && (rw_flags & REQ_ELVPRIV)))
icq = ioc_create_icq(q, gfp_mask);
/* rqs are guaranteed to have icq on elv_set_request() if requested */
- if (likely(!et->icq_cache || icq))
+ if (likely(!et->icq_cache || icq || !(rw_flags & REQ_ELVPRIV)))
rq = blk_alloc_request(q, icq, rw_flags, gfp_mask);
if (unlikely(!rq)) {
switch to noop
insert icq to ioc
switch to cfq
ioc_clear_queue
in ioc_clear_queue, ioc has icq in its list, but current elevator is noop,
so ioc_exit_icq will get a NULL et->icq_cache
I hit a kmem_cache_alloc_node oops too. because the et in ioc_create_icq
might not be the et of get_request (we drop lock before calling ioc_create_icq)
The put_io_context has a workqueue to run ioc_exit_icq, I suspect
there is problem too.