[PATCH] block: Fix lock unbalance caused by lock disconnect

Asias He

unread,

May 24, 2012, 10:11:03 PM5/24/12

to Jens Axboe, Tejun Heo, linux-...@vger.kernel.org, linux-...@vger.kernel.org, Asias He

Commit 777eb1bf15b8532c396821774bf6451e563438f5 disconnects externally
supplied queue_lock before blk_drain_queue(). This would introduce lock
unbalance because theads which have taken the external lock might unlock
the internal lock in the during the queue drain.

This patch fixes this by disconnecting the lock after the queue drain.

=====================================
[ BUG: bad unlock balance detected! ]
3.4.0+ #288 Not tainted
-------------------------------------
fio/17706 is trying to release lock (&(&q->__queue_lock)->rlock) at:
[<ffffffff81329372>] blk_queue_bio+0x2a2/0x380
but there are no more locks to release!

other info that might help us debug this:
1 lock held by fio/17706:
#0: (&(&vblk->lock)->rlock){......}, at: [<ffffffff81327f1a>]
get_request_wait+0x19a/0x250

stack backtrace:
Pid: 17706, comm: fio Not tainted 3.4.0+ #288
Call Trace:
[<ffffffff81329372>] ? blk_queue_bio+0x2a2/0x380
[<ffffffff810dea49>] print_unlock_inbalance_bug+0xf9/0x100
[<ffffffff810dfe4f>] lock_release_non_nested+0x1df/0x330
[<ffffffff811dae24>] ? dio_bio_end_aio+0x34/0xc0
[<ffffffff811d6935>] ? bio_check_pages_dirty+0x85/0xe0
[<ffffffff811daea1>] ? dio_bio_end_aio+0xb1/0xc0
[<ffffffff81329372>] ? blk_queue_bio+0x2a2/0x380
[<ffffffff81329372>] ? blk_queue_bio+0x2a2/0x380
[<ffffffff810e0079>] lock_release+0xd9/0x250
[<ffffffff81a74553>] _raw_spin_unlock_irq+0x23/0x40
[<ffffffff81329372>] blk_queue_bio+0x2a2/0x380
[<ffffffff81328faa>] generic_make_request+0xca/0x100
[<ffffffff81329056>] submit_bio+0x76/0xf0
[<ffffffff8115470c>] ? set_page_dirty_lock+0x3c/0x60
[<ffffffff811d69e1>] ? bio_set_pages_dirty+0x51/0x70
[<ffffffff811dd1a8>] do_blockdev_direct_IO+0xbf8/0xee0
[<ffffffff811d8620>] ? blkdev_get_block+0x80/0x80
[<ffffffff811dd4e5>] __blockdev_direct_IO+0x55/0x60
[<ffffffff811d8620>] ? blkdev_get_block+0x80/0x80
[<ffffffff811d92e7>] blkdev_direct_IO+0x57/0x60
[<ffffffff811d8620>] ? blkdev_get_block+0x80/0x80
[<ffffffff8114c6ae>] generic_file_aio_read+0x70e/0x760
[<ffffffff810df7c5>] ? __lock_acquire+0x215/0x5a0
[<ffffffff811e9924>] ? aio_run_iocb+0x54/0x1a0
[<ffffffff8114bfa0>] ? grab_cache_page_nowait+0xc0/0xc0
[<ffffffff811e82cc>] aio_rw_vect_retry+0x7c/0x1e0
[<ffffffff811e8250>] ? aio_fsync+0x30/0x30
[<ffffffff811e9936>] aio_run_iocb+0x66/0x1a0
[<ffffffff811ea9b0>] do_io_submit+0x6f0/0xb80
[<ffffffff8134de2e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[<ffffffff811eae50>] sys_io_submit+0x10/0x20
[<ffffffff81a7c9e9>] system_call_fastpath+0x16/0x1b

Signed-off-by: Asias He <as...@redhat.com>
---
block/blk-core.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 1f61b74..7b4f6fe 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -416,15 +416,10 @@ void blk_cleanup_queue(struct request_queue *q)
/* mark @q DEAD, no new request or merges will be allowed afterwards */
mutex_lock(&q->sysfs_lock);
queue_flag_set_unlocked(QUEUE_FLAG_DEAD, q);
-
spin_lock_irq(lock);
queue_flag_set(QUEUE_FLAG_NOMERGES, q);
queue_flag_set(QUEUE_FLAG_NOXMERGES, q);
queue_flag_set(QUEUE_FLAG_DEAD, q);
-
- if (q->queue_lock != &q->__queue_lock)
- q->queue_lock = &q->__queue_lock;
-
spin_unlock_irq(lock);
mutex_unlock(&q->sysfs_lock);

@@ -440,6 +435,11 @@ void blk_cleanup_queue(struct request_queue *q)
del_timer_sync(&q->backing_dev_info.laptop_mode_wb_timer);
blk_sync_queue(q);

+ spin_lock_irq(lock);
+ if (q->queue_lock != &q->__queue_lock)
+ q->queue_lock = &q->__queue_lock;
+ spin_unlock_irq(lock);
+
/* @q is and will stay empty, shutdown and put */
blk_put_queue(q);
}
--
1.7.10.2

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Tejun Heo

unread,

May 27, 2012, 8:08:55 PM5/27/12

to Asias He, Jens Axboe, linux-...@vger.kernel.org, linux-...@vger.kernel.org

On Fri, May 25, 2012 at 10:10:59AM +0800, Asias He wrote:
> Commit 777eb1bf15b8532c396821774bf6451e563438f5 disconnects externally
> supplied queue_lock before blk_drain_queue(). This would introduce lock
> unbalance because theads which have taken the external lock might unlock
> the internal lock in the during the queue drain.
>
> This patch fixes this by disconnecting the lock after the queue drain.

I don't think the patch description is correct. The lock switcihng is
inherently broken and the patch doesn't really fix the problem
although it *might* make the problem less likely. Trying to switch
locks while there are other accessors of the lock is simply broken, it
can never work without outer synchronization. Your patch might make
the problem somewhat less likely simply because queue draining makes a
lot of request_queue users go away.

So, can you please update the commit description? It doesn't really
*fix* anything but I think we're still better off with the change.

Thanks.

--
tejun

Asias He

unread,

May 27, 2012, 10:14:28 PM5/27/12

to Tejun Heo, Jens Axboe, linux-...@vger.kernel.org, linux-...@vger.kernel.org

On 05/28/2012 08:07 AM, Tejun Heo wrote:
> On Fri, May 25, 2012 at 10:10:59AM +0800, Asias He wrote:
>> Commit 777eb1bf15b8532c396821774bf6451e563438f5 disconnects externally
>> supplied queue_lock before blk_drain_queue(). This would introduce lock
>> unbalance because theads which have taken the external lock might unlock
>> the internal lock in the during the queue drain.
>>
>> This patch fixes this by disconnecting the lock after the queue drain.
>
> I don't think the patch description is correct. The lock switcihng is
> inherently broken and the patch doesn't really fix the problem
> although it *might* make the problem less likely. Trying to switch
> locks while there are other accessors of the lock is simply broken, it
> can never work without outer synchronization.

Since the lock switching is broken, is it a good idea to force all the
drivers to use the block layer provided lock? i.e. Change the API from
blk_init_queue(rfn, driver_lock) to blk_init_queue(rfn). Any reason not
to use the block layer provided one.

> Your patch might make
> the problem somewhat less likely simply because queue draining makes a
> lot of request_queue users go away.

Who will use the request_queue after blk_cleanup_queue()?

> So, can you please update the commit description? It doesn't really
> *fix* anything but I think we're still better off with the change.

Sure. Will send v2.

--
Asias

Asias He

unread,

May 27, 2012, 10:21:43 PM5/27/12

to Jens Axboe, Tejun Heo, linux-...@vger.kernel.org, linux-...@vger.kernel.org, Asias He

Commit 777eb1bf15b8532c396821774bf6451e563438f5 disconnects externally
supplied queue_lock before blk_drain_queue(). This would introduce lock
unbalance because theads which have taken the external lock might unlock
the internal lock in the during the queue drain.

This patch mitigate this by disconnecting the lock after the queue
draining since queue draining makes a lot of request_queue users go
away.

Changes since v1: Update commit log as Tejun suggested.

Tejun Heo

unread,

May 28, 2012, 6:21:11 AM5/28/12

to Asias He, Jens Axboe, linux-...@vger.kernel.org, linux-...@vger.kernel.org, Christoph Hellwig

Hello, Asias.

On Mon, May 28, 2012 at 10:15:18AM +0800, Asias He wrote:
> >I don't think the patch description is correct. The lock switcihng is
> >inherently broken and the patch doesn't really fix the problem
> >although it *might* make the problem less likely. Trying to switch
> >locks while there are other accessors of the lock is simply broken, it
> >can never work without outer synchronization.
>
> Since the lock switching is broken, is it a good idea to force all
> the drivers to use the block layer provided lock? i.e. Change the
> API from
> blk_init_queue(rfn, driver_lock) to blk_init_queue(rfn). Any reason
> not to use the block layer provided one.

I think hch tried to do that a while ago. Dunno what happened to the
patches. IIRC, the whole external lock thing was about sharing a
single lock across different request_queues. Not sure whether it's
actually beneficial enough or just a crazy broken optimization.

> >Your patch might make
> >the problem somewhat less likely simply because queue draining makes a
> >lot of request_queue users go away.
>
> Who will use the request_queue after blk_cleanup_queue()?

Anyone who still holds a ref might try to issue a new request on a
dead queue. ie. blkdev with filesystem mounted goes away and the FS
issues a new read request after blk_cleanup_queue() finishes drainig.

Thanks.

--
tejun

Tejun Heo

unread,

May 28, 2012, 6:22:37 AM5/28/12

to Asias He, Jens Axboe, linux-...@vger.kernel.org, linux-...@vger.kernel.org

On Mon, May 28, 2012 at 10:20:03AM +0800, Asias He wrote:
> Commit 777eb1bf15b8532c396821774bf6451e563438f5 disconnects externally
> supplied queue_lock before blk_drain_queue(). This would introduce lock
> unbalance because theads which have taken the external lock might unlock
> the internal lock in the during the queue drain.
>
> This patch mitigate this by disconnecting the lock after the queue
> draining since queue draining makes a lot of request_queue users go
> away.

Can you please point out how the code is broken and that the code is
still broken after the patch but somewhat less likely to actually
fail?

Thanks.

--
tejun

Asias He

unread,

May 28, 2012, 9:39:22 PM5/28/12

to Jens Axboe, Tejun Heo, linux-...@vger.kernel.org, linux-...@vger.kernel.org, Asias He

Commit 777eb1bf15b8532c396821774bf6451e563438f5 disconnects externally
supplied queue_lock before blk_drain_queue(). Switching the lock would

introduce lock unbalance because theads which have taken the external
lock might unlock the internal lock in the during the queue drain. This
patch mitigate this by disconnecting the lock after the queue draining
since queue draining makes a lot of request_queue users go away.

However, please note, this patch only makes the problem less likely to
happen. Anyone who still holds a ref might try to issue a new request on
a dead queue after the blk_cleanup_queue() finishes draining, the lock
unbalance might still happen in this case.

Changes since v2: Update commit log to explain how the code is still
broken even if we delay the lock switching after the drain.

Asias He

unread,

May 28, 2012, 9:39:27 PM5/28/12

to Jens Axboe, Tejun Heo, linux-...@vger.kernel.org, linux-...@vger.kernel.org, Asias He

Tejun Heo

unread,

May 28, 2012, 9:41:42 PM5/28/12

to Asias He, Jens Axboe, linux-...@vger.kernel.org, linux-...@vger.kernel.org

Acked-by: Tejun Heo <t...@kernel.org>

Thanks.

--
tejun

Asias He

unread,

May 28, 2012, 9:49:00 PM5/28/12

to Tejun Heo, Jens Axboe, Christoph Hellwig, linux-...@vger.kernel.org, linux-...@vger.kernel.org

On 05/28/2012 06:20 PM, Tejun Heo wrote:
> Hello, Asias.
>
> On Mon, May 28, 2012 at 10:15:18AM +0800, Asias He wrote:
>>> I don't think the patch description is correct. The lock switcihng is
>>> inherently broken and the patch doesn't really fix the problem
>>> although it *might* make the problem less likely. Trying to switch
>>> locks while there are other accessors of the lock is simply broken, it
>>> can never work without outer synchronization.
>>
>> Since the lock switching is broken, is it a good idea to force all
>> the drivers to use the block layer provided lock? i.e. Change the
>> API from
>> blk_init_queue(rfn, driver_lock) to blk_init_queue(rfn). Any reason
>> not to use the block layer provided one.
>
> I think hch tried to do that a while ago. Dunno what happened to the
> patches. IIRC, the whole external lock thing was about sharing a
> single lock across different request_queues. Not sure whether it's
> actually beneficial enough or just a crazy broken optimization.

Do we have any existing use case of sharing a single lock across
different request_queues? What's point of this sharing. Christoph?

If nobody has any objections I'd like to make the patches. Jens, any
comments?

>>> Your patch might make
>>> the problem somewhat less likely simply because queue draining makes a
>>> lot of request_queue users go away.
>>
>> Who will use the request_queue after blk_cleanup_queue()?
>
> Anyone who still holds a ref might try to issue a new request on a
> dead queue. ie. blkdev with filesystem mounted goes away and the FS
> issues a new read request after blk_cleanup_queue() finishes drainig.

OK. Thanks for explaining.

--
Asias

Tim Gardner

unread,

May 29, 2012, 9:46:38 AM5/29/12

to Asias He, Jens Axboe, Tejun Heo, linux-...@vger.kernel.org, linux-...@vger.kernel.org, tim.g...@canonical.com

On 05/28/2012 07:39 PM, Asias He wrote:

<snip>

> @@ -440,6 +435,11 @@ void blk_cleanup_queue(struct request_queue *q)
> del_timer_sync(&q->backing_dev_info.laptop_mode_wb_timer);
> blk_sync_queue(q);
>
> + spin_lock_irq(lock);
> + if (q->queue_lock != &q->__queue_lock)
> + q->queue_lock = &q->__queue_lock;
> + spin_unlock_irq(lock);
> +

Isn't the 'if' clause superfluous ? You could just do the assignment, e.g.,

+ spin_lock_irq(lock);

+ q->queue_lock = &q->__queue_lock;
+ spin_unlock_irq(lock);

rtg
--
Tim Gardner tim.g...@canonical.com

Asias He

unread,

May 30, 2012, 2:27:14 AM5/30/12

to Tim Gardner, Jens Axboe, Tejun Heo, linux-...@vger.kernel.org, linux-...@vger.kernel.org, tim.g...@canonical.com

On 05/29/2012 09:45 PM, Tim Gardner wrote:
> On 05/28/2012 07:39 PM, Asias He wrote:
>
> <snip>
>
>> @@ -440,6 +435,11 @@ void blk_cleanup_queue(struct request_queue *q)
>> del_timer_sync(&q->backing_dev_info.laptop_mode_wb_timer);
>> blk_sync_queue(q);
>>
>> + spin_lock_irq(lock);
>> + if (q->queue_lock !=&q->__queue_lock)

>> + q->queue_lock =&q->__queue_lock;

>> + spin_unlock_irq(lock);
>> +
>
> Isn't the 'if' clause superfluous ? You could just do the assignment, e.g.,
>
> + spin_lock_irq(lock);
> + q->queue_lock =&q->__queue_lock;
> + spin_unlock_irq(lock);

Well, this saves a if clause but adds an unnecessary assignment if the
lock is already internal lock.

--
Asias

Tejun Heo

unread,

May 30, 2012, 2:28:46 AM5/30/12

to Asias He, Tim Gardner, Jens Axboe, linux-...@vger.kernel.org, linux-...@vger.kernel.org, tim.g...@canonical.com

Hello,

On Wed, May 30, 2012 at 3:28 PM, Asias He <as...@redhat.com> wrote:
>> Isn't the 'if' clause superfluous ? You could just do the assignment,
>> e.g.,
>>
>> + spin_lock_irq(lock);
>> + q->queue_lock =&q->__queue_lock;
>> + spin_unlock_irq(lock);
>
>
> Well, this saves a if clause but adds an unnecessary assignment if the lock
> is already internal lock.

It's not hot path. Dirtying the cacheline there doesn't mean anything.
I don't really care either way but making optimization argument is
pretty silly here.

--
tejun

Asias He

unread,

May 30, 2012, 2:49:57 AM5/30/12

to Tejun Heo, Tim Gardner, Jens Axboe, linux-...@vger.kernel.org, linux-...@vger.kernel.org, tim.g...@canonical.com

On 05/30/2012 02:28 PM, Tejun Heo wrote:
> Hello,
>
> On Wed, May 30, 2012 at 3:28 PM, Asias He<as...@redhat.com> wrote:
>>> Isn't the 'if' clause superfluous ? You could just do the assignment,
>>> e.g.,
>>>
>>> + spin_lock_irq(lock);
>>> + q->queue_lock =&q->__queue_lock;
>>> + spin_unlock_irq(lock);
>>
>>
>> Well, this saves a if clause but adds an unnecessary assignment if the lock
>> is already internal lock.
>
> It's not hot path. Dirtying the cacheline there doesn't mean anything.
> I don't really care either way but making optimization argument is
> pretty silly here.

I don't care this neither ;-)

--
Asias

Michael S. Tsirkin

unread,

Jun 1, 2012, 5:28:52 AM6/1/12

to Tejun Heo, Asias He, Jens Axboe, linux-...@vger.kernel.org, linux-...@vger.kernel.org, Christoph Hellwig

On Mon, May 28, 2012 at 07:20:55PM +0900, Tejun Heo wrote:
> Hello, Asias.
>
> On Mon, May 28, 2012 at 10:15:18AM +0800, Asias He wrote:
> > >I don't think the patch description is correct. The lock switcihng is
> > >inherently broken and the patch doesn't really fix the problem
> > >although it *might* make the problem less likely. Trying to switch
> > >locks while there are other accessors of the lock is simply broken, it
> > >can never work without outer synchronization.
> >
> > Since the lock switching is broken, is it a good idea to force all
> > the drivers to use the block layer provided lock? i.e. Change the
> > API from
> > blk_init_queue(rfn, driver_lock) to blk_init_queue(rfn). Any reason
> > not to use the block layer provided one.
>
> I think hch tried to do that a while ago. Dunno what happened to the
> patches. IIRC, the whole external lock thing was about sharing a
> single lock across different request_queues. Not sure whether it's
> actually beneficial enough or just a crazy broken optimization.

Looks like almost all drivers get it wrong. And it's likely
something like a floppy driver doesn't need an optimization:
drivers/block/floppy.c: disks[dr]->queue = blk_init_queue(do_fd_request, &floppy_lock);

The obvious use of this API is wrong. So how about introducing
a correct one, deprecating the broken one so we can start
slowly converting users?

Then if someone sees a real reason for the internal lock,
he will complain.

Jens Axboe

unread,

Jun 1, 2012, 5:31:54 AM6/1/12

to Tejun Heo, Asias He, Tim Gardner, linux-...@vger.kernel.org, linux-...@vger.kernel.org, tim.g...@canonical.com

On 05/30/2012 08:28 AM, Tejun Heo wrote:
> Hello,
>
> On Wed, May 30, 2012 at 3:28 PM, Asias He <as...@redhat.com> wrote:
>>> Isn't the 'if' clause superfluous ? You could just do the assignment,
>>> e.g.,
>>>
>>> + spin_lock_irq(lock);
>>> + q->queue_lock =&q->__queue_lock;
>>> + spin_unlock_irq(lock);
>>
>>
>> Well, this saves a if clause but adds an unnecessary assignment if the lock
>> is already internal lock.
>
> It's not hot path. Dirtying the cacheline there doesn't mean anything.
> I don't really care either way but making optimization argument is
> pretty silly here.

And more importantly, dropping the if loses information as well. That's
a lot more important than any misguided optimization attempts. So I
agree, the if stays.

--
Jens Axboe

Asias He

unread,

Jun 5, 2012, 10:11:42 PM6/5/12

to Jens Axboe, Tejun Heo, Tim Gardner, linux-...@vger.kernel.org, linux-...@vger.kernel.org, tim.g...@canonical.com

Hello, Jens

On 06/01/2012 05:31 PM, Jens Axboe wrote:
> On 05/30/2012 08:28 AM, Tejun Heo wrote:
>> Hello,
>>
>> On Wed, May 30, 2012 at 3:28 PM, Asias He<as...@redhat.com> wrote:
>>>> Isn't the 'if' clause superfluous ? You could just do the assignment,
>>>> e.g.,
>>>>
>>>> + spin_lock_irq(lock);
>>>> + q->queue_lock =&q->__queue_lock;
>>>> + spin_unlock_irq(lock);
>>>
>>>
>>> Well, this saves a if clause but adds an unnecessary assignment if the lock
>>> is already internal lock.
>>
>> It's not hot path. Dirtying the cacheline there doesn't mean anything.
>> I don't really care either way but making optimization argument is
>> pretty silly here.
>
> And more importantly, dropping the if loses information as well. That's
> a lot more important than any misguided optimization attempts. So I
> agree, the if stays.

Could you pick this patch in your tree?

--
Asias