blk: NULL ptr deref in blk_dequeue

Sasha Levin

unread,

Sep 22, 2012, 4:40:01 PM9/22/12

to

Hi all,

While fuzzing with trinity inside a KVM tools guest running the latest linux-next kernel, I've stumbled on the following BUG.

I've also hit a similar trace where the 'BUG_ON(ELV_ON_HASH(rq));' above that list_del_init() gets hit, so I guess it's a race
condition of some sorts.

[ 9.900299] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 9.909508] IP: [<ffffffff819ea637>] __list_del_entry+0xb7/0xe0
[ 9.910191] PGD 0
[ 9.910191] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
[ 9.910191] Dumping ftrace buffer:
[ 9.910191] (ftrace buffer empty)
[ 9.910191] CPU 2
[ 9.910191] Pid: 3996, comm: kworker/u:2 Tainted: G W 3.6.0-rc6-next-20120921-sasha-00001-geb77a39-dirty #3
[ 9.910191] RIP: 0010:[<ffffffff819ea637>] [<ffffffff819ea637>] __list_del_entry+0xb7/0xe0
[ 9.910191] RSP: 0000:ffff880034e11c88 EFLAGS: 00010007
[ 9.910191] RAX: 0000000000000000 RBX: ffff880034e3ec00 RCX: dead000000200200
[ 9.910191] RDX: 0000000000000000 RSI: ffffffff85366998 RDI: ffff880034e3ec00
[ 9.910191] RBP: ffff880034e11c88 R08: 0000000000000000 R09: ffff88001af60928
[ 9.910191] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
[ 9.910191] R13: ffffffff85366360 R14: 0000000000000000 R15: ffffffff85b4edd0
[ 9.910191] FS: 0000000000000000(0000) GS:ffff880029800000(0000) knlGS:0000000000000000
[ 9.910191] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 9.910191] CR2: 0000000000000000 CR3: 0000000004c26000 CR4: 00000000000406e0
[ 9.910191] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 9.910191] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 9.910191] Process kworker/u:2 (pid: 3996, threadinfo ffff880034e10000, task ffff88001af60000)
[ 9.910191] Stack:
[ 9.910191] ffff880034e11ca8 ffffffff819a1a45 ffff880034e3ec00 0000000000000000
[ 9.910191] ffff880034e11cc8 ffffffff819a1ae1 0000000000000000 ffff880034e3ec00
[ 9.910191] ffff880034e11ce8 ffffffff819a271e 0000000000000000 0000000000000000
[ 9.910191] Call Trace:
[ 9.910191] [<ffffffff819a1a45>] blk_dequeue_request+0x35/0xc0
[ 9.910191] [<ffffffff819a1ae1>] blk_start_request+0x11/0x40
[ 9.910191] [<ffffffff819a271e>] blk_fetch_request+0x1e/0x30
[ 9.910191] [<ffffffff81e5a89d>] redo_fd_request+0x9d/0x3f0
[ 9.910191] [<ffffffff8112a779>] process_one_work+0x3b9/0x770
[ 9.910191] [<ffffffff8112a628>] ? process_one_work+0x268/0x770
[ 9.910191] [<ffffffff81177a22>] ? get_lock_stats+0x22/0x70
[ 9.910191] [<ffffffff81e5a800>] ? start_motor+0x120/0x120
[ 9.910191] [<ffffffff8112b0fa>] worker_thread+0x2ba/0x3f0
[ 9.910191] [<ffffffff8112ae40>] ? rescuer_thread+0x2d0/0x2d0
[ 9.910191] [<ffffffff81135d83>] kthread+0xe3/0xf0
[ 9.910191] [<ffffffff81177aae>] ? put_lock_stats.isra.16+0xe/0x40
[ 9.910191] [<ffffffff81135ca0>] ? insert_kthread_work+0x90/0x90
[ 9.910191] [<ffffffff839f1e45>] kernel_thread_helper+0x5/0x10
[ 9.910191] [<ffffffff81135ca0>] ? insert_kthread_work+0x90/0x90
[ 9.910191] Code: 6a 84 be 3e 00 00 00 48 c7 c7 7b d8 6a 84 31 c0 e8 8f c2 71 ff eb 2c 0f 1f 44 00 00 48 b9 00 02 20 00 00 00
ad de 48 39 c8 74 8c <4c> 8b 00 4c 39 c7 75 a6 4c 8b 42 08 4c 39 c7 75 bc 48 89 42 08
[ 9.910191] RIP [<ffffffff819ea637>] __list_del_entry+0xb7/0xe0
[ 9.910191] RSP <ffff880034e11c88>
[ 9.910191] CR2: 0000000000000000

Thanks,
Sasha
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Sasha Levin

unread,

Oct 7, 2012, 2:30:01 PM10/7/12

to

Ping?

I'm still seeing this on linux-next.

Jan Kara

unread,

Oct 8, 2012, 1:30:02 PM10/8/12

to

On Sun 07-10-12 14:26:42, Sasha Levin wrote:
> Ping?
>
> I'm still seeing this on linux-next.

I think this is floppy related (see redo_fd_request() in the stack
trace). And there were quite some changes to the area recently. Adding
maintainer to CC.

Honza

--
Jan Kara <ja...@suse.cz>
SUSE Labs, CR

Jiri Kosina

unread,

Oct 8, 2012, 5:50:01 PM10/8/12

to

On Mon, 8 Oct 2012, Jan Kara wrote:

> > I'm still seeing this on linux-next.
> I think this is floppy related (see redo_fd_request() in the stack
> trace). And there were quite some changes to the area recently. Adding
> maintainer to CC.

Hmm ... I don't immediately see how this is happening.

Sasha, could you please do git bisect on drivers/block/floppy.c between
f6365201d and your git HEAD for starters (assuming that f6365201d works
well for you?).

Seems like the queue obtained in set_next_request() through

q = unit[fdc_queue].gendisk->queue;

is not a proper one. I am currently not sure why.

> > > [ 9.910191] [<ffffffff819a1a45>] blk_dequeue_request+0x35/0xc0
> > > [ 9.910191] [<ffffffff819a1ae1>] blk_start_request+0x11/0x40
> > > [ 9.910191] [<ffffffff819a271e>] blk_fetch_request+0x1e/0x30
> > > [ 9.910191] [<ffffffff81e5a89d>] redo_fd_request+0x9d/0x3f0
> > > [ 9.910191] [<ffffffff8112a779>] process_one_work+0x3b9/0x770
> > > [ 9.910191] [<ffffffff8112a628>] ? process_one_work+0x268/0x770
> > > [ 9.910191] [<ffffffff81177a22>] ? get_lock_stats+0x22/0x70
> > > [ 9.910191] [<ffffffff81e5a800>] ? start_motor+0x120/0x120
> > > [ 9.910191] [<ffffffff8112b0fa>] worker_thread+0x2ba/0x3f0
> > > [ 9.910191] [<ffffffff8112ae40>] ? rescuer_thread+0x2d0/0x2d0
> > > [ 9.910191] [<ffffffff81135d83>] kthread+0xe3/0xf0
> > > [ 9.910191] [<ffffffff81177aae>] ? put_lock_stats.isra.16+0xe/0x40
> > > [ 9.910191] [<ffffffff81135ca0>] ? insert_kthread_work+0x90/0x90
> > > [ 9.910191] [<ffffffff839f1e45>] kernel_thread_helper+0x5/0x10
> > > [ 9.910191] [<ffffffff81135ca0>] ? insert_kthread_work+0x90/0x90
> > > [ 9.910191] Code: 6a 84 be 3e 00 00 00 48 c7 c7 7b d8 6a 84 31 c0 e8 8f c2 71 ff eb 2c 0f 1f 44 00 00 48 b9 00 02 20 00 00 00
> > > ad de 48 39 c8 74 8c <4c> 8b 00 4c 39 c7 75 a6 4c 8b 42 08 4c 39 c7 75 bc 48 89 42 08
> > > [ 9.910191] RIP [<ffffffff819ea637>] __list_del_entry+0xb7/0xe0
> > > [ 9.910191] RSP <ffff880034e11c88>
> > > [ 9.910191] CR2: 0000000000000000
> > >
> > >
> > > Thanks,
> > > Sasha

--

Jiri Kosina
SUSE Labs

Sasha Levin

unread,

Oct 9, 2012, 9:30:02 AM10/9/12

to

On 10/08/2012 05:45 PM, Jiri Kosina wrote:
> On Mon, 8 Oct 2012, Jan Kara wrote:
>
>>> > > I'm still seeing this on linux-next.
>> > I think this is floppy related (see redo_fd_request() in the stack
>> > trace). And there were quite some changes to the area recently. Adding
>> > maintainer to CC.
> Hmm ... I don't immediately see how this is happening.
>
> Sasha, could you please do git bisect on drivers/block/floppy.c between
> f6365201d and your git HEAD for starters (assuming that f6365201d works
> well for you?).
>

A bisect on floppy.c yielded the following:

b33d002f4b6bae912463e5a66387c498aa69b6fe is the first bad commit
commit b33d002f4b6bae912463e5a66387c498aa69b6fe
Author: Ben Hutchings <b...@decadent.org.uk>
Date: Mon Aug 27 20:56:53 2012 -0300

genhd: Make put_disk() safe for disks that have not been registered

Thanks,
Sasha

Sasha Levin

unread,

Oct 9, 2012, 9:30:02 AM10/9/12

to

On 10/09/2012 09:21 AM, Sasha Levin wrote:
> On 10/08/2012 05:45 PM, Jiri Kosina wrote:
>> On Mon, 8 Oct 2012, Jan Kara wrote:
>>
>>>>>> I'm still seeing this on linux-next.
>>>> I think this is floppy related (see redo_fd_request() in the stack
>>>> trace). And there were quite some changes to the area recently. Adding
>>>> maintainer to CC.
>> Hmm ... I don't immediately see how this is happening.
>>
>> Sasha, could you please do git bisect on drivers/block/floppy.c between
>> f6365201d and your git HEAD for starters (assuming that f6365201d works
>> well for you?).
>>
>
> A bisect on floppy.c yielded the following:
>
> b33d002f4b6bae912463e5a66387c498aa69b6fe is the first bad commit
> commit b33d002f4b6bae912463e5a66387c498aa69b6fe
> Author: Ben Hutchings <b...@decadent.org.uk>
> Date: Mon Aug 27 20:56:53 2012 -0300
>
> genhd: Make put_disk() safe for disks that have not been registered

2 more things:

1. The guest vm which I'm testing on doesn't emulate anything which even looks like a floppy.
2. I'm seeing the following lines before the BUG:

[ 9.836604] floppy0: no floppy controllers found
[ 9.837246] work still pending
[ 9.837743] floppy0: floppy_shutdown: timeout handler died.

Ben Hutchings

unread,

Oct 10, 2012, 12:00:03 PM10/10/12

to

On Tue, 2012-10-09 at 09:26 -0400, Sasha Levin wrote:
> On 10/09/2012 09:21 AM, Sasha Levin wrote:
> > On 10/08/2012 05:45 PM, Jiri Kosina wrote:
> >> On Mon, 8 Oct 2012, Jan Kara wrote:
> >>
> >>>>>> I'm still seeing this on linux-next.
> >>>> I think this is floppy related (see redo_fd_request() in the stack
> >>>> trace). And there were quite some changes to the area recently. Adding
> >>>> maintainer to CC.
> >> Hmm ... I don't immediately see how this is happening.
> >>
> >> Sasha, could you please do git bisect on drivers/block/floppy.c between
> >> f6365201d and your git HEAD for starters (assuming that f6365201d works
> >> well for you?).
> >>
> >
> > A bisect on floppy.c yielded the following:
> >
> > b33d002f4b6bae912463e5a66387c498aa69b6fe is the first bad commit
> > commit b33d002f4b6bae912463e5a66387c498aa69b6fe
> > Author: Ben Hutchings <b...@decadent.org.uk>
> > Date: Mon Aug 27 20:56:53 2012 -0300
> >
> > genhd: Make put_disk() safe for disks that have not been registered
>
> 2 more things:
>
> 1. The guest vm which I'm testing on doesn't emulate anything which even looks like a floppy.
> 2. I'm seeing the following lines before the BUG:
>
> [ 9.836604] floppy0: no floppy controllers found
> [ 9.837246] work still pending
> [ 9.837743] floppy0: floppy_shutdown: timeout handler died.

I see two problems:

1. redo_fd_request() races with tear-down of the disks, but because
set_next_request() checks disk->queue before doing anything this was
usually harmless. Now that do_floppy_init() doesn't clear disk->queue,
the race condition is much easier to hit. This may fix that problem in
do_floppy_init(), though there appear to be worse bugs in tear-down
order in floppy_module_exit():

--- a/drivers/block/floppy.c
+++ b/drivers/block/floppy.c
@@ -4320,13 +4320,13 @@ out_unreg_region:
out_unreg_blkdev:
unregister_blkdev(FLOPPY_MAJOR, "fd");
out_put_disk:
+ destroy_workqueue(floppy_wq);
while (dr--) {
del_timer_sync(&motor_off_timer[dr]);
if (disks[dr]->queue)
blk_cleanup_queue(disks[dr]->queue);
put_disk(disks[dr]);
}
- destroy_workqueue(floppy_wq);
return err;
}

--- END ---

2. I made a big mistake in using the existing GENHD_FL_UP flag, as it is
cleared by del_gendisk(). Incremental patch below, but it should be
squashed into the previous patch if that branch is still rebase-able.

Ben.

---
From: Ben Hutchings <b...@decadent.org.uk>
Date: Wed, 10 Oct 2012 16:17:01 +0100
Subject: [PATCH] genhd: Make put_disk() safe again for disks that *have* been
registered

Commit b33d002 ('genhd: Make put_disk() safe for disks that have not
been registered') wrongly used the GENHD_FL_UP flag to test whether a
disk held a reference to its queue. Since this is cleared by
del_gendisk(), queues will not be properly cleaned up if a disk has
been registered and then torn down in the normal way. Introduce a
new flag for this purpose.

Signed-off-by: Ben Hutchings <b...@decadent.org.uk>
---
block/genhd.c | 7 +++++--
include/linux/genhd.h | 1 +
2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/block/genhd.c b/block/genhd.c
index 633751d..b5f482f 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -617,7 +617,10 @@ void add_disk(struct gendisk *disk)
* Take an extra ref on queue which will be put on disk_release()
* so that it sticks around as long as @disk is there.
*/
- WARN_ON_ONCE(!blk_get_queue(disk->queue));
+ if (blk_get_queue(disk->queue))
+ disk->flags |= GENHD_FL_GOT_QUEUE;
+ else
+ WARN_ON(1);

retval = sysfs_create_link(&disk_to_dev(disk)->kobj, &bdi->dev->kobj,
"bdi");
@@ -1105,7 +1108,7 @@ static void disk_release(struct device *dev)
disk_replace_part_tbl(disk, NULL);
free_part_stats(&disk->part0);
free_part_info(&disk->part0);
- if (disk->queue && disk->flags & GENHD_FL_UP)
+ if (disk->queue && disk->flags & GENHD_FL_GOT_QUEUE)
blk_put_queue(disk->queue);
kfree(disk);
}
diff --git a/include/linux/genhd.h b/include/linux/genhd.h
index 4f440b3..7c2560c 100644
--- a/include/linux/genhd.h
+++ b/include/linux/genhd.h
@@ -134,6 +134,7 @@ struct hd_struct {
#define GENHD_FL_NATIVE_CAPACITY 128
#define GENHD_FL_BLOCK_EVENTS_ON_EXCL_WRITE 256
#define GENHD_FL_NO_PART_SCAN 512
+#define GENHD_FL_GOT_QUEUE 1024

enum {
DISK_EVENT_MEDIA_CHANGE = 1 << 0, /* media changed */

--
Ben Hutchings
Who are all these weirdos? - David Bowie, about L-Space IRC channel #afp

signature.asc

Jiri Kosina

unread,

Oct 12, 2012, 11:00:02 AM10/12/12

to

Sasha,

did you manage to test this to see if it fixes the symptom you are seeing,
please?

--
Jiri Kosina
SUSE Labs

Sasha Levin

unread,

Oct 12, 2012, 2:00:01 PM10/12/12

to

Hi Ben,

I'm now seeing these instead:

[ 34.823972] general protection fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
[ 34.830888] Dumping ftrace buffer:
[ 34.830888] (ftrace buffer empty)
[ 34.830888] CPU 5
[ 34.830888] Pid: 6, comm: kworker/u:0 Tainted: G W
3.6.0-next-20121012-sasha-00002-gae01a05-dirty #650
[ 34.830888] RIP: 0010:[<ffffffff8112dfe8>] [<ffffffff8112dfe8>]
flush_workqueue_prep_cwqs+0xf8/0x260
[ 34.830888] RSP: 0000:ffff8801bf059a58 EFLAGS: 00010287
[ 34.830888] RAX: 0000000000000000 RBX: ffff100b833d8000 RCX: 0000000000000000
[ 34.830888] RDX: 0000000000000000 RSI: 0000000000000078 RDI: 0000000000000078
[ 34.830888] RBP: ffff8801bf059aa8 R08: ffffffff858bb800 R09: 0000000000000000
[ 34.830888] R10: 2222222222222222 R11: 0000000000000078 R12: ffff8809c1610600
[ 34.830888] R13: 0000000000000003 R14: 0000000000000000 R15: 0000000000000002
[ 34.830888] FS: 0000000000000000(0000) GS:ffff8809c4000000(0000)
knlGS:0000000000000000
[ 34.830888] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 34.830888] CR2: 00000000ffffffff CR3: 0000000004e25000 CR4: 00000000000006e0
[ 34.830888] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 34.830888] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 34.830888] Process kworker/u:0 (pid: 6, threadinfo
ffff8801bf058000, task ffff8801bf053000)
[ 34.830888] Stack:
[ 34.830888] ffff8801bf059a88 00ff8809c1610620 0000000000000006
ffff8809c16106d0
[ 34.830888] ffffffff8480a53f ffff8809c1610600 ffff8801bf059ae8
0000000000000003
[ 34.830888] ffff8809c16106f0 ffff8809c1610620 ffff8801bf059c08
ffffffff8112e3ba
[ 34.830888] Call Trace:
[ 34.830888] [<ffffffff8112e3ba>] flush_workqueue+0x26a/0x5c0
[ 34.991665] [<ffffffff8112e150>] ? flush_workqueue_prep_cwqs+0x260/0x260
[ 34.991665] [<ffffffff8112f9c0>] drain_workqueue+0x70/0x270
[ 34.991665] [<ffffffff819d1a25>] ? kobject_cleanup+0x145/0x190
[ 34.991665] [<ffffffff8112fbd3>] destroy_workqueue+0x13/0x200
[ 34.991665] [<ffffffff85b038dc>] do_floppy_init+0x672/0x70c
[ 34.991665] [<ffffffff85b0397f>] floppy_async_init+0x9/0xb
[ 34.991665] [<ffffffff81143f5b>] async_run_entry_fn+0xab/0x180
[ 34.991665] [<ffffffff8112ec46>] process_one_work+0x386/0x570
[ 34.991665] [<ffffffff8112eb18>] ? process_one_work+0x258/0x570
[ 34.991665] [<ffffffff81143eb0>] ? async_schedule+0x20/0x20
[ 34.991665] [<ffffffff8113062a>] worker_thread+0x20a/0x340
[ 34.991665] [<ffffffff81130420>] ? manage_workers+0x160/0x160
[ 34.991665] [<ffffffff81139c52>] kthread+0xe2/0xf0
[ 34.991665] [<ffffffff8118386a>] ? __lock_release+0x1ba/0x1d0
[ 34.991665] [<ffffffff81139b70>] ? __init_kthread_worker+0x70/0x70
[ 34.991665] [<ffffffff83a645bc>] ret_from_fork+0x7c/0x90
[ 34.991665] [<ffffffff81139b70>] ? __init_kthread_worker+0x70/0x70
[ 34.991665] Code: 5c 24 08 44 89 f0 48 03 1c c5 00 13 8b 85 eb 1b
0f 1f 00 41 81 fe 00 10 00 00 75 07 49 8b 5c 24 08 eb 08 31 db 66 0f
1f 44 00 00 <48> 8b 03 48 8b 08 48 89 cf 48 89 4d b0 e8 46 4c 93 02 45
85 ff
[ 34.991665] RIP [<ffffffff8112dfe8>] flush_workqueue_prep_cwqs+0xf8/0x260
[ 34.991665] RSP <ffff8801bf059a58>
[ 35.151058] ---[ end trace 48a38e4c9e8f037d ]---

Ben Hutchings

unread,

Oct 16, 2012, 10:20:01 PM10/16/12

to

On Fri, 2012-10-12 at 13:55 -0400, Sasha Levin wrote:
> Hi Ben,
>
> On Wed, Oct 10, 2012 at 11:52 AM, Ben Hutchings <b...@decadent.org.uk> wrote:
> > On Tue, 2012-10-09 at 09:26 -0400, Sasha Levin wrote:
> >> On 10/09/2012 09:21 AM, Sasha Levin wrote:
> >> > On 10/08/2012 05:45 PM, Jiri Kosina wrote:
> >> >> On Mon, 8 Oct 2012, Jan Kara wrote:
> >> >>
> >> >>>>>> I'm still seeing this on linux-next.
> >> >>>> I think this is floppy related (see redo_fd_request() in the stack
> >> >>>> trace). And there were quite some changes to the area recently. Adding
> >> >>>> maintainer to CC.
> >> >> Hmm ... I don't immediately see how this is happening.
> >> >>
> >> >> Sasha, could you please do git bisect on drivers/block/floppy.c between
> >> >> f6365201d and your git HEAD for starters (assuming that f6365201d works
> >> >> well for you?).
> >> >>
> >> >
> >> > A bisect on floppy.c yielded the following:
> >> >
> >> > b33d002f4b6bae912463e5a66387c498aa69b6fe is the first bad commit
> >> > commit b33d002f4b6bae912463e5a66387c498aa69b6fe
> >> > Author: Ben Hutchings <b...@decadent.org.uk>
> >> > Date: Mon Aug 27 20:56:53 2012 -0300
> >> >
> >> > genhd: Make put_disk() safe for disks that have not been registered

[...]

> > I see two problems:
> >
> > 1. redo_fd_request() races with tear-down of the disks, but because
> > set_next_request() checks disk->queue before doing anything this was
> > usually harmless. Now that do_floppy_init() doesn't clear disk->queue,
> > the race condition is much easier to hit. This may fix that problem in
> > do_floppy_init(), though there appear to be worse bugs in tear-down
> > order in floppy_module_exit():

[...]

> > 2. I made a big mistake in using the existing GENHD_FL_UP flag, as it is
> > cleared by del_gendisk(). Incremental patch below, but it should be
> > squashed into the previous patch if that branch is still rebase-able.

[...]

> I'm now seeing these instead:

[...]

Sorry, I'm not going to spend more time in the quagmire of the floppy
driver. Whoever has this commit in their tree, please revert or drop it
as appropriate.

Ben.

--
Ben Hutchings
No political challenge can be met by shopping. - George Monbiot

signature.asc

Ben Hutchings

unread,

Oct 16, 2012, 10:20:01 PM10/16/12

to

On Fri, 2012-10-12 at 13:55 -0400, Sasha Levin wrote:

> Hi Ben,
>
> On Wed, Oct 10, 2012 at 11:52 AM, Ben Hutchings <b...@decadent.org.uk> wrote:
> > On Tue, 2012-10-09 at 09:26 -0400, Sasha Levin wrote:
> >> On 10/09/2012 09:21 AM, Sasha Levin wrote:
> >> > On 10/08/2012 05:45 PM, Jiri Kosina wrote:
> >> >> On Mon, 8 Oct 2012, Jan Kara wrote:
> >> >>
> >> >>>>>> I'm still seeing this on linux-next.
> >> >>>> I think this is floppy related (see redo_fd_request() in the stack
> >> >>>> trace). And there were quite some changes to the area recently. Adding
> >> >>>> maintainer to CC.
> >> >> Hmm ... I don't immediately see how this is happening.
> >> >>
> >> >> Sasha, could you please do git bisect on drivers/block/floppy.c between
> >> >> f6365201d and your git HEAD for starters (assuming that f6365201d works
> >> >> well for you?).
> >> >>
> >> >
> >> > A bisect on floppy.c yielded the following:
> >> >
> >> > b33d002f4b6bae912463e5a66387c498aa69b6fe is the first bad commit
> >> > commit b33d002f4b6bae912463e5a66387c498aa69b6fe
> >> > Author: Ben Hutchings <b...@decadent.org.uk>
> >> > Date: Mon Aug 27 20:56:53 2012 -0300
> >> >
> >> > genhd: Make put_disk() safe for disks that have not been registered

[...]

> > I see two problems:
> >
> > 1. redo_fd_request() races with tear-down of the disks, but because
> > set_next_request() checks disk->queue before doing anything this was
> > usually harmless. Now that do_floppy_init() doesn't clear disk->queue,
> > the race condition is much easier to hit. This may fix that problem in
> > do_floppy_init(), though there appear to be worse bugs in tear-down
> > order in floppy_module_exit():

[...]

> > 2. I made a big mistake in using the existing GENHD_FL_UP flag, as it is
> > cleared by del_gendisk(). Incremental patch below, but it should be
> > squashed into the previous patch if that branch is still rebase-able.

[...]

> I'm now seeing these instead:

signature.asc

Jiri Kosina

unread,

Oct 17, 2012, 10:20:03 AM10/17/12

to

On Wed, 17 Oct 2012, Ben Hutchings wrote:

> > > 1. redo_fd_request() races with tear-down of the disks, but because
> > > set_next_request() checks disk->queue before doing anything this was
> > > usually harmless. Now that do_floppy_init() doesn't clear disk->queue,
> > > the race condition is much easier to hit. This may fix that problem in
> > > do_floppy_init(), though there appear to be worse bugs in tear-down
> > > order in floppy_module_exit():
> [...]
> > > 2. I made a big mistake in using the existing GENHD_FL_UP flag, as it is
> > > cleared by del_gendisk(). Incremental patch below, but it should be
> > > squashed into the previous patch if that branch is still rebase-able.
> [...]
> > I'm now seeing these instead:
> [...]
>
> Sorry, I'm not going to spend more time in the quagmire of the floppy
> driver. Whoever has this commit in their tree, please revert or drop it
> as appropriate.

As far as I can tell, Jens has pulled it from me, but it hasn't made it
into Linus' tree as of today.

I will do it in my tree and send a new pull request to Jens.

--
Jiri Kosina
SUSE Labs

Jens Axboe

unread,

Oct 17, 2012, 10:30:02 AM10/17/12

to

On 2012-10-17 16:11, Jiri Kosina wrote:
> On Wed, 17 Oct 2012, Ben Hutchings wrote:
>
>>>> 1. redo_fd_request() races with tear-down of the disks, but because
>>>> set_next_request() checks disk->queue before doing anything this was
>>>> usually harmless. Now that do_floppy_init() doesn't clear disk->queue,
>>>> the race condition is much easier to hit. This may fix that problem in
>>>> do_floppy_init(), though there appear to be worse bugs in tear-down
>>>> order in floppy_module_exit():
>> [...]
>>>> 2. I made a big mistake in using the existing GENHD_FL_UP flag, as it is
>>>> cleared by del_gendisk(). Incremental patch below, but it should be
>>>> squashed into the previous patch if that branch is still rebase-able.
>> [...]
>>> I'm now seeing these instead:
>> [...]
>>
>> Sorry, I'm not going to spend more time in the quagmire of the floppy
>> driver. Whoever has this commit in their tree, please revert or drop it
>> as appropriate.
>
> As far as I can tell, Jens has pulled it from me, but it hasn't made it
> into Linus' tree as of today.
>
> I will do it in my tree and send a new pull request to Jens.

I did not add the patch from Ben, as it was reported as not working. My
driver pull is late this time due to travel, but it'll go out start of
next week. So if you have pending floppy updates that are tested at that
time, then please do send them my way.

--
Jens Axboe

Jiri Kosina

unread,

Oct 26, 2012, 2:10:02 PM10/26/12

to

On Wed, 17 Oct 2012, Jens Axboe wrote:

> >>>> 1. redo_fd_request() races with tear-down of the disks, but because
> >>>> set_next_request() checks disk->queue before doing anything this was
> >>>> usually harmless. Now that do_floppy_init() doesn't clear disk->queue,
> >>>> the race condition is much easier to hit. This may fix that problem in
> >>>> do_floppy_init(), though there appear to be worse bugs in tear-down
> >>>> order in floppy_module_exit():
> >> [...]
> >>>> 2. I made a big mistake in using the existing GENHD_FL_UP flag, as it is
> >>>> cleared by del_gendisk(). Incremental patch below, but it should be
> >>>> squashed into the previous patch if that branch is still rebase-able.
> >> [...]
> >>> I'm now seeing these instead:
> >> [...]
> >>
> >> Sorry, I'm not going to spend more time in the quagmire of the floppy
> >> driver. Whoever has this commit in their tree, please revert or drop it
> >> as appropriate.
> >
> > As far as I can tell, Jens has pulled it from me, but it hasn't made it
> > into Linus' tree as of today.
> >
> > I will do it in my tree and send a new pull request to Jens.
>
> I did not add the patch from Ben, as it was reported as not working. My
> driver pull is late this time due to travel, but it'll go out start of
> next week. So if you have pending floppy updates that are tested at that
> time, then please do send them my way.

Sorry for the delay on my side as well. I am now working on reverting
Ben's original patch and doing some testing. Will be sending you a pull
request still tonight.

Thanks,

--
Jiri Kosina
SUSE Labs

blk: NULL ptr deref in blk_dequeue_request()

Sasha Levin

Sasha Levin

Jan Kara

Jiri Kosina

Sasha Levin

Sasha Levin

Ben Hutchings

Jiri Kosina

Sasha Levin

Ben Hutchings

Ben Hutchings

Jiri Kosina

Jens Axboe

Jiri Kosina