[PATCH 2/2] block: rescan partitions on invalidated devices on -ENOMEDIA too

Tejun Heo

unread,

Apr 6, 2011, 8:30:03 AM4/6/11

to

__blkdev_get() doesn't rescan partitions if disk->fops->open() fails,
which leads to ghost partition devices lingering after medimum removal
is known to both the kernel and userland. The behavior also creates a
subtle inconsistency where O_NONBLOCK open, which doesn't fail even if
there's no medium, clears the ghots partitions, which is exploited to
work around the problem from userland.

Fix it by updating __blkdev_get() to issue partition rescan after
-ENOMEDIA too.

This was reported in the following bz.

https://bugzilla.kernel.org/show_bug.cgi?id=13029

Signed-off-by: Tejun Heo <t...@kernel.org>
Reported-by: David Zeuthen <zeu...@gmail.com>
Reported-by: Martin Pitt <marti...@ubuntu.com>
Reported-by: Kay Sievers <kay.s...@vrfy.org>
Tested-by: Kay Sievers <kay.s...@vrfy.org>
Cc: Alan Cox <al...@lxorguk.ukuu.org.uk>
---
fs/block_dev.c | 27 ++++++++++++++++++---------
1 file changed, 18 insertions(+), 9 deletions(-)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index c1511c6..a926ad4 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -1102,6 +1102,7 @@ static int __blkdev_get(struct block_device *bdev, fmode_t mode, int for_part)
if (!bdev->bd_part)
goto out_clear;

+ ret = 0;
if (disk->fops->open) {
ret = disk->fops->open(bdev, mode);
if (ret == -ERESTARTSYS) {
@@ -1118,9 +1119,18 @@ static int __blkdev_get(struct block_device *bdev, fmode_t mode, int for_part)
put_disk(disk);
goto restart;
}
- if (ret)
- goto out_clear;
}
+ /*
+ * If the device is invalidated, rescan partition
+ * if open succeeded or failed with -ENOMEDIUM.
+ * The latter is necessary to prevent ghost
+ * partitions on a removed medium.
+ */
+ if (bdev->bd_invalidated && (!ret || ret == -ENOMEDIUM))
+ rescan_partitions(disk, bdev);
+ if (ret)
+ goto out_clear;
+
if (!bdev->bd_openers) {
bd_set_size(bdev,(loff_t)get_capacity(disk)<<9);
bdi = blk_get_backing_dev_info(bdev);
@@ -1128,8 +1138,6 @@ static int __blkdev_get(struct block_device *bdev, fmode_t mode, int for_part)
bdi = &default_backing_dev_info;
bdev_inode_switch_bdi(bdev->bd_inode, bdi);
}
- if (bdev->bd_invalidated)
- rescan_partitions(disk, bdev);
} else {
struct block_device *whole;
whole = bdget_disk(disk, 0);
@@ -1153,13 +1161,14 @@ static int __blkdev_get(struct block_device *bdev, fmode_t mode, int for_part)
}
} else {
if (bdev->bd_contains == bdev) {
- if (bdev->bd_disk->fops->open) {
+ ret = 0;
+ if (bdev->bd_disk->fops->open)
ret = bdev->bd_disk->fops->open(bdev, mode);
- if (ret)
- goto out_unlock_bdev;
- }
- if (bdev->bd_invalidated)
+ /* the same as first opener case, read comment there */
+ if (bdev->bd_invalidated && (!ret || ret == -ENOMEDIUM))
rescan_partitions(bdev->bd_disk, bdev);
+ if (ret)
+ goto out_unlock_bdev;
}
/* only one opener holds refs to the module and disk */
module_put(disk->fops->owner);
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Tejun Heo

unread,

Apr 6, 2011, 8:30:02 AM4/6/11

to

cdrom_open() called check_disk_change() after the rest of open path
succeeded which leads to the following bizarre behavior.

* After media change, if the device opened without O_NONBLOCK,
open_for_data() naturally fails with -ENOMEDIA and
check_disk_change() is never called. The media is known to be gone
and the open failure makes it obvious to the userland but device
invalidation never happens.

* But if the device is opened with O_NONBLOCK, all the checks are
bypassed and cdrom_open() doesn't notice that the media is not there
and check_disk_change() is called and invalidation happens.

There's nothing to be gained by avoiding calling check_disk_change()
on open failure. Common cases end up calling check_disk_change()
anyway. All we get is inconsistent behavior.

Fix it by moving check_disk_change() invocation to the top of
cdrom_open() so that it always gets called regardless of how the rest
of open proceeds.

Signed-off-by: Tejun Heo <t...@kernel.org>
Reported-by: Amit Shah <amit...@redhat.com>
Tested-by: Amit Shah <amit...@redhat.com>
---
drivers/cdrom/cdrom.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/cdrom/cdrom.c b/drivers/cdrom/cdrom.c
index e2c48a7..5ade78a 100644
--- a/drivers/cdrom/cdrom.c
+++ b/drivers/cdrom/cdrom.c
@@ -986,6 +986,9 @@ int cdrom_open(struct cdrom_device_info *cdi, struct block_device *bdev, fmode_t

cdinfo(CD_OPEN, "entering cdrom_open\n");

+ /* open is event synchronization point, check events first */
+ check_disk_change(bdev);
+
/* if this was a O_NONBLOCK open and we should honor the flags,
* do a quick open without drive/disc integrity checks. */
cdi->use_count++;
@@ -1012,9 +1015,6 @@ int cdrom_open(struct cdrom_device_info *cdi, struct block_device *bdev, fmode_t

cdinfo(CD_OPEN, "Use count for \"/dev/%s\" now %d\n",
cdi->name, cdi->use_count);
- /* Do this on open. Don't wait for mount, because they might
- not be mounting, but opening with O_NONBLOCK */
- check_disk_change(bdev);
return 0;
err_release:
if (CDROM_CAN(CDC_LOCK) && cdi->options & CDO_LOCK) {

Amit Shah

unread,

Apr 29, 2011, 3:00:02 AM4/29/11

to

On (Wed) 06 Apr 2011 [05:20:09], Tejun Heo wrote:
> cdrom_open() called check_disk_change() after the rest of open path
> succeeded which leads to the following bizarre behavior.
>
> * After media change, if the device opened without O_NONBLOCK,
> open_for_data() naturally fails with -ENOMEDIA and
> check_disk_change() is never called. The media is known to be gone
> and the open failure makes it obvious to the userland but device
> invalidation never happens.
>
> * But if the device is opened with O_NONBLOCK, all the checks are
> bypassed and cdrom_open() doesn't notice that the media is not there
> and check_disk_change() is called and invalidation happens.
>
> There's nothing to be gained by avoiding calling check_disk_change()
> on open failure. Common cases end up calling check_disk_change()
> anyway. All we get is inconsistent behavior.
>
> Fix it by moving check_disk_change() invocation to the top of
> cdrom_open() so that it always gets called regardless of how the rest
> of open proceeds.
>
> Signed-off-by: Tejun Heo <t...@kernel.org>
> Reported-by: Amit Shah <amit...@redhat.com>
> Tested-by: Amit Shah <amit...@redhat.com>

Ping?

Also, please mark this for stable-2.6.38.

Thanks,

Amit

Tejun Heo

unread,

Apr 29, 2011, 4:20:02 AM4/29/11

to

On Fri, Apr 29, 2011 at 12:27:18PM +0530, Amit Shah wrote:
> On (Wed) 06 Apr 2011 [05:20:09], Tejun Heo wrote:
> > cdrom_open() called check_disk_change() after the rest of open path
> > succeeded which leads to the following bizarre behavior.
> >
> > * After media change, if the device opened without O_NONBLOCK,
> > open_for_data() naturally fails with -ENOMEDIA and
> > check_disk_change() is never called. The media is known to be gone
> > and the open failure makes it obvious to the userland but device
> > invalidation never happens.
> >
> > * But if the device is opened with O_NONBLOCK, all the checks are
> > bypassed and cdrom_open() doesn't notice that the media is not there
> > and check_disk_change() is called and invalidation happens.
> >
> > There's nothing to be gained by avoiding calling check_disk_change()
> > on open failure. Common cases end up calling check_disk_change()
> > anyway. All we get is inconsistent behavior.
> >
> > Fix it by moving check_disk_change() invocation to the top of
> > cdrom_open() so that it always gets called regardless of how the rest
> > of open proceeds.
> >
> > Signed-off-by: Tejun Heo <t...@kernel.org>
> > Reported-by: Amit Shah <amit...@redhat.com>
> > Tested-by: Amit Shah <amit...@redhat.com>
>
> Ping?
>
> Also, please mark this for stable-2.6.38.

Jens?

--
tejun

Jens Axboe

unread,

Apr 29, 2011, 4:20:01 AM4/29/11

to

On 2011-04-29 08:57, Amit Shah wrote:
> On (Wed) 06 Apr 2011 [05:20:09], Tejun Heo wrote:
>> cdrom_open() called check_disk_change() after the rest of open path
>> succeeded which leads to the following bizarre behavior.
>>
>> * After media change, if the device opened without O_NONBLOCK,
>> open_for_data() naturally fails with -ENOMEDIA and
>> check_disk_change() is never called. The media is known to be gone
>> and the open failure makes it obvious to the userland but device
>> invalidation never happens.
>>
>> * But if the device is opened with O_NONBLOCK, all the checks are
>> bypassed and cdrom_open() doesn't notice that the media is not there
>> and check_disk_change() is called and invalidation happens.
>>
>> There's nothing to be gained by avoiding calling check_disk_change()
>> on open failure. Common cases end up calling check_disk_change()
>> anyway. All we get is inconsistent behavior.
>>
>> Fix it by moving check_disk_change() invocation to the top of
>> cdrom_open() so that it always gets called regardless of how the rest
>> of open proceeds.
>>
>> Signed-off-by: Tejun Heo <t...@kernel.org>
>> Reported-by: Amit Shah <amit...@redhat.com>
>> Tested-by: Amit Shah <amit...@redhat.com>
>
> Ping?
>
> Also, please mark this for stable-2.6.38.

Done, added for 2.6.39 and marked stable for 2.6.38.

--
Jens Axboe

Jens Axboe

unread,

Apr 29, 2011, 4:30:01 AM4/29/11

to

On 2011-04-29 10:16, Tejun Heo wrote:

> On Fri, Apr 29, 2011 at 10:15:50AM +0200, Jens Axboe wrote:
>>> Also, please mark this for stable-2.6.38.
>>
>> Done, added for 2.6.39 and marked stable for 2.6.38.
>

> Heh, mid-air collision. Good morning, Jens. :)

Indeed :-)

Tejun Heo

unread,

Apr 29, 2011, 4:30:02 AM4/29/11

to

On Fri, Apr 29, 2011 at 10:15:50AM +0200, Jens Axboe wrote:

> > Also, please mark this for stable-2.6.38.
>
> Done, added for 2.6.39 and marked stable for 2.6.38.

Heh, mid-air collision. Good morning, Jens. :)

--
tejun

Amit Shah

unread,

May 10, 2011, 2:50:02 AM5/10/11

to

Ping again. Don't see this yet in Linus's git tree.

Amit

Jens Axboe

unread,

May 10, 2011, 3:50:02 AM5/10/11

to

It was added for 2.6.40 and marked for stable.

--
Jens Axboe

Amit Shah

unread,

May 10, 2011, 4:20:02 AM5/10/11

to

OK, Thanks.

Amit

Jonathan Nieder

unread,

Sep 3, 2011, 6:20:02 PM9/3/11

to

Hi,

Various people[1] have been noticing a race (or races?) in which the
cdrom_id and scsi_id programs from udev can get stuck in the D state
if udev doesn't sleep a little while before running them. This
prevents the machine from suspending. The problem was discovered with
Debian kernels

2.6.39-1
2.6.39-2
3.0.0-1

and was not experienced with kernel

2.6.38-5

(These kernels are closely based on v2.6.39, v2.6.39.1, v3.0, and
v2.6.38.5, respectively.) Reverting commit bf2253a6f00e (cdrom:
always check_disk_change() on open, 2011-04-29) seems to avoid
trouble.

One common theme seems to be DVD drives. Details at [1].

Known problem? Any ideas for tracking it down?

Looking forward to your thoughts,
Jonathan

[1] <http://bugs.debian.org/628600>. Submitters cc-ed.

Tejun Heo

unread,

Sep 3, 2011, 10:50:01 PM9/3/11

to

Hello,

On Sat, Sep 03, 2011 at 05:14:56PM -0500, Jonathan Nieder wrote:
> Various people[1] have been noticing a race (or races?) in which the
> cdrom_id and scsi_id programs from udev can get stuck in the D state
> if udev doesn't sleep a little while before running them. This
> prevents the machine from suspending. The problem was discovered with
> Debian kernels
>
> 2.6.39-1
> 2.6.39-2
> 3.0.0-1
>
> and was not experienced with kernel
>
> 2.6.38-5
>
> (These kernels are closely based on v2.6.39, v2.6.39.1, v3.0, and
> v2.6.38.5, respectively.) Reverting commit bf2253a6f00e (cdrom:
> always check_disk_change() on open, 2011-04-29) seems to avoid
> trouble.
>
> One common theme seems to be DVD drives. Details at [1].
>
> Known problem? Any ideas for tracking it down?

Can you please build vanilla kernel with CONFIG_FRAME_POINTER turned
on, reproduce the problem and attach full dmesg output?

Thanks.

--
tejun

Jonathan Nieder

unread,

Sep 4, 2011, 11:10:02 AM9/4/11

to

Tejun Heo wrote:
> On Sat, Sep 03, 2011 at 05:14:56PM -0500, Jonathan Nieder wrote:

>> Various people[1] have been noticing a race (or races?) in which the
>> cdrom_id and scsi_id programs from udev can get stuck in the D state

[...]

> Can you please build vanilla kernel with CONFIG_FRAME_POINTER turned
> on, reproduce the problem and attach full dmesg output?

Thanks, Tejun. I haven't been able to reproduce it here, but I'd be
happy to help anyone cc-ed wanting to do that who has questions (feel
free to email me privately).

Jameson Graef Rollins

unread,

Sep 4, 2011, 10:00:02 PM9/4/11

to

On Sun, 4 Sep 2011 11:42:44 +0900, Tejun Heo <t...@kernel.org> wrote:
> Can you please build vanilla kernel with CONFIG_FRAME_POINTER turned
> on, reproduce the problem and attach full dmesg output?

Hi, Tejun. I recompiled the kernel from the Debian linux-source-3.0.0
(following the instructions here [0]) with the following config:

CONFIG_FRAME_POINTER=y

I then rebooted into this new kernel and triggered the bug. Attached is
the dmesg output from right before and right after the bug was
triggered. The bug was triggered by ejecting my laptop from it's dock,
and then trying to put it to sleep (which of course it failed to do).

Please let me know if there's anything else I can do, or any other
information I can provide. Thanks so much for the help.

jamie.

[0] http://kernel-handbook.alioth.debian.org/ch-common-tasks.html#s-common-building

dmesg.before

dmesg.after

Tejun Heo

unread,

Sep 6, 2011, 1:50:01 PM9/6/11

to

Hello,

On Sun, Sep 04, 2011 at 06:49:38PM -0700, Jameson Graef Rollins wrote:
> I then rebooted into this new kernel and triggered the bug. Attached is
> the dmesg output from right before and right after the bug was
> triggered. The bug was triggered by ejecting my laptop from it's dock,
> and then trying to put it to sleep (which of course it failed to do).

If you wait (five mins) after triggering the problem, does the kernel
print further messages? Also, can you please explain the steps needed
to trigger this problem?

> [ 221.306959] cdrom_id D 0000000000000000 0 2693 2689 0x00800004
> [ 221.306969] ffff88023180b8e8 0000000000000082 7fffffffffffffff ffff880200000000
> [ 221.306978] ffff8802314b83c0 ffff88023180bfd8 ffff88023180bfd8 0000000000012840
> [ 221.306987] ffff880232e18f60 ffff8802314b83c0 ffff88023180b938 000000018133e5cf
> [ 221.306995] Call Trace:
> [ 221.307011] [<ffffffff8133ea82>] schedule_timeout+0x2f/0xd9
> [ 221.307021] [<ffffffff8133e86e>] wait_for_common+0x9e/0x115
> [ 221.307029] [<ffffffff8133e97b>] wait_for_completion+0x18/0x1a
> [ 221.307035] [<ffffffff8105b0fe>] flush_work+0x29/0x2f
> [ 221.307042] [<ffffffff8105b48a>] flush_delayed_work+0x3a/0x3e
> [ 221.307047] [<ffffffff81199bc6>] disk_clear_events+0x8f/0xf5
> [ 221.307073] [<ffffffff81121827>] check_disk_change+0x29/0x5b
> [ 221.307081] [<ffffffffa0069e71>] cdrom_open+0x44/0x4b2 [cdrom]
> [ 221.307097] [<ffffffffa010786b>] sr_block_open+0x90/0xb7 [sr_mod]
> [ 221.307101] [<ffffffff811226dd>] __blkdev_get+0xe5/0x39b
> [ 221.307104] [<ffffffff81122b5f>] blkdev_get+0x1cc/0x2bb
> [ 221.307111] [<ffffffff81122cb3>] blkdev_open+0x65/0x6a
> [ 221.307115] [<ffffffff810f98f2>] __dentry_open+0x185/0x29f
> [ 221.307129] [<ffffffff810fa842>] nameidata_to_filp+0x5b/0x62
> [ 221.307135] [<ffffffff81105843>] do_last+0x448/0x55d
> [ 221.307138] [<ffffffff81106483>] path_openat+0xc3/0x304
> [ 221.307142] [<ffffffff811066f7>] do_filp_open+0x33/0x81
> [ 221.307150] [<ffffffff810fa8b2>] do_sys_open+0x69/0xfb
> [ 221.307153] [<ffffffff810fa95f>] sys_open+0x1b/0x1d
> [ 221.307157] [<ffffffff81344d52>] system_call_fastpath+0x16/0x1b

Hmmm... so, this is cdrom_id waiting for event check work item to
complete. The event work is executed on system_nrt_wq which is frozen
after all userland tasks are frozen so there's no reason for the above
to stall. Can you please enable sysrq, trigger the problem and while
the system is in that 20 sec freeze stall, hit sysrq-t and report the
dmesg?

> [ 242.681434] INFO: task cdrom_id:2693 blocked for more than 120 seconds.
> [ 242.681438] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [ 242.681442] cdrom_id D 0000000000000000 0 2693 2689 0x00000004
> [ 242.681448] ffff88023180b8e8 0000000000000082 7fffffffffffffff ffff880200000000
> [ 242.681455] ffff8802314b83c0 ffff88023180bfd8 ffff88023180bfd8 0000000000012840
> [ 242.681460] ffff880232e18f60 ffff8802314b83c0 ffff88023180b938 000000018133e5cf
> [ 242.681466] Call Trace:
> [ 242.681478] [<ffffffff8133ea82>] schedule_timeout+0x2f/0xd9
> [ 242.681490] [<ffffffff8133e86e>] wait_for_common+0x9e/0x115
> [ 242.681500] [<ffffffff8133e97b>] wait_for_completion+0x18/0x1a
> [ 242.681507] [<ffffffff8105b0fe>] flush_work+0x29/0x2f

Does the cdrom eventually become usable afterwards?

Matthijs Kooijman

unread,

Sep 7, 2011, 2:40:02 PM9/7/11

to

Hi Tejun,

> Also, can you please explain the steps needed to trigger this problem?

I'm not the one that supplied the backtraces, but I'm having the same
problem. I did a fairly extensive writeup on the Debian bug report,
perhaps this could (help to) answer your question:

http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=628600#23

I'll leave the other questions to Jameson for now.

Gr.

Matthijs

signature.asc

Tejun Heo

unread,

Sep 11, 2011, 12:10:02 AM9/11/11

to

Hmm... so, hibernation, disk event or udev isn't the cause. Device
removal somehow ends up with stalled request queue hanging event check
which eventually propagates as failed freeze during hibernation
attempt.

Can someone please attach dmesg output cdrom_id is hung after dock
removal and also the output of sysrq-t?

Gaudenz Steinlin

unread,

Oct 24, 2011, 4:40:02 PM10/24/11

to

Hi

I stumbled upon this old thread while debugging a hung cdrom_id task on
my Thinkpad X200.

On Sun, 11 Sep 2011 13:08:26 +0900, Tejun Heo <t...@kernel.org> wrote:
> On Wed, Sep 07, 2011 at 10:50:37AM +0200, Matthijs Kooijman wrote:
> > Hi Tejun,
> >
> > > Also, can you please explain the steps needed to trigger this problem?
> > I'm not the one that supplied the backtraces, but I'm having the same
> > problem. I did a fairly extensive writeup on the Debian bug report,
> > perhaps this could (help to) answer your question:
> >
> > http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=628600#23
> >
> > I'll leave the other questions to Jameson for now.
>
> Hmm... so, hibernation, disk event or udev isn't the cause. Device
> removal somehow ends up with stalled request queue hanging event check
> which eventually propagates as failed freeze during hibernation
> attempt.

Yes this seems to be a race condition or a deadlock while removing the
cdrom drive when undocking the notebock from the dock which contains the
cd drive. The process is already unkillable before any attempt to
suspend the machine.

Please see the requested dmesg output below. I can also provide
additional debugging information if needed.

>
> Can someone please attach dmesg output cdrom_id is hung after dock
> removal and also the output of sysrq-t?
>

Dmesg output after a faild suspend attempt:
Oct 24 21:12:05 meteor kernel: [84916.468058] Freezing of tasks failed after 20.01 seconds (1 tasks refusing to freeze, wq_busy=0):
Oct 24 21:12:05 meteor kernel: [84916.468142] cdrom_id D ffff8800a5e1a970 0 26723 26720 0x00800004
Oct 24 21:12:05 meteor kernel: [84916.468149] ffff8800a5e1a970 0000000000000086 ffff8800a5e1ad28 00ff88013bc92800
Oct 24 21:12:05 meteor kernel: [84916.468157] ffff88010ee8e300 0000000000012800 ffff88012a539fd8 ffff88012a539fd8
Oct 24 21:12:05 meteor kernel: [84916.468165] 0000000000012800 ffff8800a5e1a970 0000000000012800 0000000000012800
Oct 24 21:12:05 meteor kernel: [84916.468173] Call Trace:
Oct 24 21:12:05 meteor kernel: [84916.468179] [<ffffffff81335bb1>] ? schedule_timeout+0x2d/0xd7
Oct 24 21:12:05 meteor kernel: [84916.468186] [<ffffffff81038189>] ? test_tsk_need_resched+0xe/0x17
Oct 24 21:12:05 meteor kernel: [84916.468192] [<ffffffff8133598b>] ? wait_for_common+0x9d/0x116
Oct 24 21:12:05 meteor kernel: [84916.468197] [<ffffffff8103f0a4>] ? try_to_wake_up+0x199/0x199
Oct 24 21:12:05 meteor kernel: [84916.468203] [<ffffffff8105b266>] ? start_flush_work+0xec/0x103
Oct 24 21:12:05 meteor kernel: [84916.468208] [<ffffffff8105b57c>] ? flush_work+0x24/0x2c
Oct 24 21:12:05 meteor kernel: [84916.468214] [<ffffffff8105ae8f>] ? do_work_for_cpu+0x1b/0x1b
Oct 24 21:12:05 meteor kernel: [84916.468220] [<ffffffff81199092>] ? disk_clear_events+0x86/0xe4
Oct 24 21:12:05 meteor kernel: [84916.468230] [<ffffffff81121ef2>] ? check_disk_change+0x22/0x50
Oct 24 21:12:05 meteor kernel: [84916.468235] [<ffffffffa0a3d051>] ? cdrom_open+0x45/0x4aa [cdrom]
Oct 24 21:12:05 meteor kernel: [84916.468238] [<ffffffff811a484d>] ? kobject_get+0x12/0x17
Oct 24 21:12:05 meteor kernel: [84916.468242] [<ffffffff81197c4d>] ? get_disk+0x6d/0x8d
Oct 24 21:12:05 meteor kernel: [84916.468245] [<ffffffff8103840a>] ? should_resched+0x5/0x24
Oct 24 21:12:05 meteor kernel: [84916.468248] [<ffffffff813358d7>] ? _cond_resched+0x9/0x20
Oct 24 21:12:05 meteor kernel: [84916.468252] [<ffffffff8124b753>] ? kobj_lookup+0x13a/0x174
Oct 24 21:12:05 meteor kernel: [84916.468255] [<ffffffff811a484d>] ? kobject_get+0x12/0x17
Oct 24 21:12:05 meteor kernel: [84916.468259] [<ffffffffa0a4485d>] ? sr_block_open+0x93/0xbc [sr_mod]
Oct 24 21:12:05 meteor kernel: [84916.468263] [<ffffffff81122c95>] ? __blkdev_get+0xe3/0x380
Oct 24 21:12:05 meteor kernel: [84916.468266] [<ffffffff810cf9c5>] ? set_pte_at+0x5/0x8
Oct 24 21:12:05 meteor kernel: [84916.468269] [<ffffffff811231d9>] ? blkdev_get+0x2a7/0x2a7
Oct 24 21:12:05 meteor kernel: [84916.468272] [<ffffffff811230f9>] ? blkdev_get+0x1c7/0x2a7
Oct 24 21:12:05 meteor kernel: [84916.468276] [<ffffffff811231d9>] ? blkdev_get+0x2a7/0x2a7
Oct 24 21:12:05 meteor kernel: [84916.468279] [<ffffffff810fa612>] ? __dentry_open+0x182/0x29c
Oct 24 21:12:05 meteor kernel: [84916.468283] [<ffffffff811038c3>] ? dget+0x12/0x1e
Oct 24 21:12:05 meteor kernel: [84916.468286] [<ffffffff81105b45>] ? do_last+0x46d/0x584
Oct 24 21:12:05 meteor kernel: [84916.468289] [<ffffffff81106ece>] ? path_openat+0xc7/0x349
Oct 24 21:12:05 meteor kernel: [84916.468292] [<ffffffff810d0562>] ? tlb_flush_mmu+0x37/0x50
Oct 24 21:12:05 meteor kernel: [84916.468296] [<ffffffff8110717c>] ? do_filp_open+0x2c/0x72
Oct 24 21:12:05 meteor kernel: [84916.468299] [<ffffffff813358d7>] ? _cond_resched+0x9/0x20
Oct 24 21:12:05 meteor kernel: [84916.468303] [<ffffffff811aca91>] ? __strncpy_from_user+0x19/0x4a
Oct 24 21:12:05 meteor kernel: [84916.468306] [<ffffffff811104ec>] ? alloc_fd+0x69/0x110
Oct 24 21:12:05 meteor kernel: [84916.468309] [<ffffffff810fb41c>] ? do_sys_open+0x5f/0xe6
Oct 24 21:12:05 meteor kernel: [84916.468313] [<ffffffff8133bd12>] ? system_call_fastpath+0x16/0x1b
Oct 24 21:12:05 meteor kernel: [84916.468324]
Oct 24 21:12:05 meteor kernel: [84916.468325] Restarting tasks ... done.

Output of sysrq-w (blocked tasks):
Oct 24 21:29:12 meteor kernel: [85943.671066] SysRq : Show Blocked State
Oct 24 21:29:12 meteor kernel: [85943.671078] task PC stack pid father
Oct 24 21:29:12 meteor kernel: [85943.671212] kworker/1:0 D ffff88012ffaa080 0 25836 2 0x00000000
Oct 24 21:29:12 meteor kernel: [85943.671225] ffff88012ffaa080 0000000000000046 ffff8801074bb000 ffffffff81071f28
Oct 24 21:29:12 meteor kernel: [85943.671237] ffff88011902e1c0 0000000000012800 ffff880135eaffd8 ffff880135eaffd8
Oct 24 21:29:12 meteor kernel: [85943.671249] 0000000000012800 ffff88012ffaa080 0000000000012800 0000000000012800
Oct 24 21:29:12 meteor kernel: [85943.671261] Call Trace:
Oct 24 21:29:12 meteor kernel: [85943.671278] [<ffffffff81071f28>] ? arch_local_irq_save+0x14/0x1d
Oct 24 21:29:12 meteor kernel: [85943.671292] [<ffffffff81335c24>] ? schedule_timeout+0xa0/0xd7
Oct 24 21:29:12 meteor kernel: [85943.671303] [<ffffffff81052bcf>] ? run_timer_softirq+0x28a/0x28a
Oct 24 21:29:12 meteor kernel: [85943.671312] [<ffffffff8133598b>] ? wait_for_common+0x9d/0x116
Oct 24 21:29:12 meteor kernel: [85943.671322] [<ffffffff8103f0a4>] ? try_to_wake_up+0x199/0x199
Oct 24 21:29:12 meteor kernel: [85943.671330] [<ffffffff81336a8d>] ? _raw_spin_lock_irq+0xd/0x1a
Oct 24 21:29:12 meteor kernel: [85943.671340] [<ffffffff81195532>] ? blk_execute_rq+0xa5/0xe7
Oct 24 21:29:12 meteor kernel: [85943.671350] [<ffffffff8112098e>] ? bio_phys_segments+0xf/0x14
Oct 24 21:29:12 meteor kernel: [85943.671361] [<ffffffff811928ee>] ? blk_rq_bio_prep+0x39/0x6c
Oct 24 21:29:12 meteor kernel: [85943.671371] [<ffffffff81195177>] ? blk_rq_map_kern+0xef/0x11f
Oct 24 21:29:12 meteor kernel: [85943.671457] [<ffffffffa00959a0>] ? scsi_execute+0xfb/0x14d [scsi_mod]
Oct 24 21:29:12 meteor kernel: [85943.671491] [<ffffffffa0095a7b>] ? scsi_execute_req+0x89/0xb9 [scsi_mod]
Oct 24 21:29:12 meteor kernel: [85943.671502] [<ffffffff81197af0>] ? disk_events_set_dfl_poll_msecs+0x44/0x44
Oct 24 21:29:12 meteor kernel: [85943.671514] [<ffffffffa0a442c2>] ? sr_check_events+0x9a/0x230 [sr_mod]
Oct 24 21:29:12 meteor kernel: [85943.671525] [<ffffffff81030008>] ? compound_head+0x3/0x10
Oct 24 21:29:12 meteor kernel: [85943.671537] [<ffffffffa0a3a041>] ? cdrom_check_events+0xf/0x22 [cdrom]
Oct 24 21:29:12 meteor kernel: [85943.671546] [<ffffffff81197b2a>] ? disk_events_workfn+0x3a/0xd5
Oct 24 21:29:12 meteor kernel: [85943.671555] [<ffffffff81197af0>] ? disk_events_set_dfl_poll_msecs+0x44/0x44
Oct 24 21:29:12 meteor kernel: [85943.671566] [<ffffffff8105b943>] ? process_one_work+0x193/0x28f
Oct 24 21:29:12 meteor kernel: [85943.671575] [<ffffffff8105cacf>] ? worker_thread+0xef/0x172
Oct 24 21:29:12 meteor kernel: [85943.671584] [<ffffffff8105c9e0>] ? manage_workers.clone.17+0x15b/0x15b
Oct 24 21:29:12 meteor kernel: [85943.671593] [<ffffffff8105c9e0>] ? manage_workers.clone.17+0x15b/0x15b
Oct 24 21:29:12 meteor kernel: [85943.671603] [<ffffffff8105fc0b>] ? kthread+0x7a/0x82
Oct 24 21:29:12 meteor kernel: [85943.671613] [<ffffffff8133ce24>] ? kernel_thread_helper+0x4/0x10
Oct 24 21:29:12 meteor kernel: [85943.671622] [<ffffffff8105fb91>] ? kthread_worker_fn+0x149/0x149
Oct 24 21:29:12 meteor kernel: [85943.671630] [<ffffffff8133ce20>] ? gs_change+0x13/0x13
Oct 24 21:29:12 meteor kernel: [85943.671637] cdrom_id D ffff8800a5e1a970 0 26723 26720 0x00000004
Oct 24 21:29:12 meteor kernel: [85943.671647] ffff8800a5e1a970 0000000000000086 ffff8800a5e1ad28 00ff88013bc92800
Oct 24 21:29:12 meteor kernel: [85943.671659] ffff88010ee8e300 0000000000012800 ffff88012a539fd8 ffff88012a539fd8
Oct 24 21:29:12 meteor kernel: [85943.671671] 0000000000012800 ffff8800a5e1a970 0000000000012800 0000000000012800
Oct 24 21:29:12 meteor kernel: [85943.671682] Call Trace:
Oct 24 21:29:12 meteor kernel: [85943.671692] [<ffffffff81335bb1>] ? schedule_timeout+0x2d/0xd7
Oct 24 21:29:12 meteor kernel: [85943.671702] [<ffffffff81038189>] ? test_tsk_need_resched+0xe/0x17
Oct 24 21:29:12 meteor kernel: [85943.671712] [<ffffffff8133598b>] ? wait_for_common+0x9d/0x116
Oct 24 21:29:12 meteor kernel: [85943.671720] [<ffffffff8103f0a4>] ? try_to_wake_up+0x199/0x199
Oct 24 21:29:12 meteor kernel: [85943.671729] [<ffffffff8105b266>] ? start_flush_work+0xec/0x103
Oct 24 21:29:12 meteor kernel: [85943.671738] [<ffffffff8105b57c>] ? flush_work+0x24/0x2c
Oct 24 21:29:12 meteor kernel: [85943.671746] [<ffffffff8105ae8f>] ? do_work_for_cpu+0x1b/0x1b
Oct 24 21:29:12 meteor kernel: [85943.671755] [<ffffffff81199092>] ? disk_clear_events+0x86/0xe4
Oct 24 21:29:12 meteor kernel: [85943.671766] [<ffffffff81121ef2>] ? check_disk_change+0x22/0x50
Oct 24 21:29:12 meteor kernel: [85943.671778] [<ffffffffa0a3d051>] ? cdrom_open+0x45/0x4aa [cdrom]
Oct 24 21:29:12 meteor kernel: [85943.671787] [<ffffffff811a484d>] ? kobject_get+0x12/0x17
Oct 24 21:29:12 meteor kernel: [85943.671796] [<ffffffff81197c4d>] ? get_disk+0x6d/0x8d
Oct 24 21:29:12 meteor kernel: [85943.671805] [<ffffffff8103840a>] ? should_resched+0x5/0x24
Oct 24 21:29:12 meteor kernel: [85943.671813] [<ffffffff813358d7>] ? _cond_resched+0x9/0x20
Oct 24 21:29:12 meteor kernel: [85943.671824] [<ffffffff8124b753>] ? kobj_lookup+0x13a/0x174
Oct 24 21:29:12 meteor kernel: [85943.671833] [<ffffffff811a484d>] ? kobject_get+0x12/0x17
Oct 24 21:29:12 meteor kernel: [85943.671844] [<ffffffffa0a4485d>] ? sr_block_open+0x93/0xbc [sr_mod]
Oct 24 21:29:12 meteor kernel: [85943.671854] [<ffffffff81122c95>] ? __blkdev_get+0xe3/0x380
Oct 24 21:29:12 meteor kernel: [85943.671864] [<ffffffff810cf9c5>] ? set_pte_at+0x5/0x8
Oct 24 21:29:12 meteor kernel: [85943.671872] [<ffffffff811231d9>] ? blkdev_get+0x2a7/0x2a7
Oct 24 21:29:12 meteor kernel: [85943.671880] [<ffffffff811230f9>] ? blkdev_get+0x1c7/0x2a7
Oct 24 21:29:12 meteor kernel: [85943.671889] [<ffffffff811231d9>] ? blkdev_get+0x2a7/0x2a7
Oct 24 21:29:12 meteor kernel: [85943.671900] [<ffffffff810fa612>] ? __dentry_open+0x182/0x29c
Oct 24 21:29:12 meteor kernel: [85943.671910] [<ffffffff811038c3>] ? dget+0x12/0x1e
Oct 24 21:29:12 meteor kernel: [85943.671918] [<ffffffff81105b45>] ? do_last+0x46d/0x584
Oct 24 21:29:12 meteor kernel: [85943.671927] [<ffffffff81106ece>] ? path_openat+0xc7/0x349
Oct 24 21:29:12 meteor kernel: [85943.671936] [<ffffffff810d0562>] ? tlb_flush_mmu+0x37/0x50
Oct 24 21:29:12 meteor kernel: [85943.671945] [<ffffffff8110717c>] ? do_filp_open+0x2c/0x72
Oct 24 21:29:12 meteor kernel: [85943.671954] [<ffffffff813358d7>] ? _cond_resched+0x9/0x20
Oct 24 21:29:12 meteor kernel: [85943.671964] [<ffffffff811aca91>] ? __strncpy_from_user+0x19/0x4a
Oct 24 21:29:12 meteor kernel: [85943.671973] [<ffffffff811104ec>] ? alloc_fd+0x69/0x110
Oct 24 21:29:12 meteor kernel: [85943.671983] [<ffffffff810fb41c>] ? do_sys_open+0x5f/0xe6
Oct 24 21:29:12 meteor kernel: [85943.671993] [<ffffffff8133bd12>] ? system_call_fastpath+0x16/0x1b
Oct 24 21:29:12 meteor kernel: [85943.672023] Sched Debug Version: v0.10, 3.0.0-1-amd64 #1
Oct 24 21:29:12 meteor kernel: [85943.672029] ktime : 85943672.020455
Oct 24 21:29:12 meteor kernel: [85943.672035] sched_clk : 3956869.639252
Oct 24 21:29:12 meteor kernel: [85943.672041] cpu_clk : 85943672.020051
Oct 24 21:29:12 meteor kernel: [85943.672046] jiffies : 4316378213
Oct 24 21:29:12 meteor kernel: [85943.672051] sched_clock_stable : 0
Oct 24 21:29:12 meteor kernel: [85943.672055]
Oct 24 21:29:12 meteor kernel: [85943.672058] sysctl_sched
Oct 24 21:29:12 meteor kernel: [85943.672063] .sysctl_sched_latency : 12.000000
Oct 24 21:29:12 meteor kernel: [85943.672069] .sysctl_sched_min_granularity : 1.500000
Oct 24 21:29:12 meteor kernel: [85943.672074] .sysctl_sched_wakeup_granularity : 2.000000
Oct 24 21:29:12 meteor kernel: [85943.672079] .sysctl_sched_child_runs_first : 0
Oct 24 21:29:12 meteor kernel: [85943.672084] .sysctl_sched_features : 15471
Oct 24 21:29:12 meteor kernel: [85943.672090] .sysctl_sched_tunable_scaling : 1 (logaritmic)
Oct 24 21:29:12 meteor kernel: [85943.672097]
Oct 24 21:29:12 meteor kernel: [85943.672098] cpu#0, 2526.987 MHz
Oct 24 21:29:12 meteor kernel: [85943.672103] .nr_running : 0
Oct 24 21:29:12 meteor kernel: [85943.672108] .load : 0
Oct 24 21:29:12 meteor kernel: [85943.672112] .nr_switches : 76035225
Oct 24 21:29:12 meteor kernel: [85943.672117] .nr_load_updates : 3956102
Oct 24 21:29:12 meteor kernel: [85943.672122] .nr_uninterruptible : 3964
Oct 24 21:29:12 meteor kernel: [85943.672127] .next_balance : 4316.378209
Oct 24 21:29:12 meteor kernel: [85943.672132] .curr->pid : 0
Oct 24 21:29:12 meteor kernel: [85943.672137] .clock : 85943665.023849
Oct 24 21:29:12 meteor kernel: [85943.672142] .cpu_load[0] : 0
Oct 24 21:29:12 meteor kernel: [85943.672146] .cpu_load[1] : 0
Oct 24 21:29:12 meteor kernel: [85943.672150] .cpu_load[2] : 0
Oct 24 21:29:12 meteor kernel: [85943.672155] .cpu_load[3] : 0
Oct 24 21:29:12 meteor kernel: [85943.672159] .cpu_load[4] : 0
Oct 24 21:29:12 meteor kernel: [85943.672164]
Oct 24 21:29:12 meteor kernel: [85943.672166] cfs_rq[0]:/
Oct 24 21:29:12 meteor kernel: [85943.672170] .exec_clock : 0.000000
Oct 24 21:29:12 meteor kernel: [85943.672175] .MIN_vruntime : 0.000001
Oct 24 21:29:12 meteor kernel: [85943.672181] .min_vruntime : 6085680.660055
Oct 24 21:29:12 meteor kernel: [85943.672186] .max_vruntime : 0.000001
Oct 24 21:29:12 meteor kernel: [85943.672191] .spread : 0.000000
Oct 24 21:29:12 meteor kernel: [85943.672196] .spread0 : 0.000000
Oct 24 21:29:12 meteor kernel: [85943.672201] .nr_spread_over : 0
Oct 24 21:29:12 meteor kernel: [85943.672205] .nr_running : 0
Oct 24 21:29:12 meteor kernel: [85943.672209] .load : 0
Oct 24 21:29:12 meteor kernel: [85943.672214] .load_avg : 0.000000
Oct 24 21:29:12 meteor kernel: [85943.672219] .load_period : 0.000000
Oct 24 21:29:12 meteor kernel: [85943.672224] .load_contrib : 0
Oct 24 21:29:12 meteor kernel: [85943.672228] .load_tg : 0
Oct 24 21:29:12 meteor kernel: [85943.672233]
Oct 24 21:29:12 meteor kernel: [85943.672235] rt_rq[0]:
Oct 24 21:29:12 meteor kernel: [85943.672238] .rt_nr_running : 0
Oct 24 21:29:12 meteor kernel: [85943.672243] .rt_throttled : 0
Oct 24 21:29:12 meteor kernel: [85943.672248] .rt_time : 0.000000
Oct 24 21:29:12 meteor kernel: [85943.672253] .rt_runtime : 950.000000
Oct 24 21:29:12 meteor kernel: [85943.672259]
Oct 24 21:29:12 meteor kernel: [85943.672261] runnable tasks:
Oct 24 21:29:12 meteor kernel: [85943.672263] task PID tree-key switches prio exec-runtime sum-exec sum-sleep
Oct 24 21:29:12 meteor kernel: [85943.672267] ----------------------------------------------------------------------------------------------------------
Oct 24 21:29:12 meteor kernel: [85943.672302]
Oct 24 21:29:12 meteor kernel: [85943.672304] cpu#1, 2526.987 MHz
Oct 24 21:29:12 meteor kernel: [85943.672309] .nr_running : 0
Oct 24 21:29:12 meteor kernel: [85943.672313] .load : 0
Oct 24 21:29:12 meteor kernel: [85943.672318] .nr_switches : 72970323
Oct 24 21:29:12 meteor kernel: [85943.672322] .nr_load_updates : 2931016
Oct 24 21:29:12 meteor kernel: [85943.672327] .nr_uninterruptible : -3962
Oct 24 21:29:12 meteor kernel: [85943.672332] .next_balance : 4316.378236
Oct 24 21:29:12 meteor kernel: [85943.672337] .curr->pid : 0
Oct 24 21:29:12 meteor kernel: [85943.672342] .clock : 85943605.322844
Oct 24 21:29:12 meteor kernel: [85943.672347] .cpu_load[0] : 1024
Oct 24 21:29:12 meteor kernel: [85943.672352] .cpu_load[1] : 512
Oct 24 21:29:12 meteor kernel: [85943.672356] .cpu_load[2] : 256
Oct 24 21:29:12 meteor kernel: [85943.672360] .cpu_load[3] : 128
Oct 24 21:29:12 meteor kernel: [85943.672365] .cpu_load[4] : 64
Oct 24 21:29:12 meteor kernel: [85943.672370]
Oct 24 21:29:12 meteor kernel: [85943.672371] cfs_rq[1]:/
Oct 24 21:29:12 meteor kernel: [85943.672376] .exec_clock : 0.000000
Oct 24 21:29:12 meteor kernel: [85943.672381] .MIN_vruntime : 0.000001
Oct 24 21:29:12 meteor kernel: [85943.672386] .min_vruntime : 7848861.840230
Oct 24 21:29:12 meteor kernel: [85943.672391] .max_vruntime : 0.000001
Oct 24 21:29:12 meteor kernel: [85943.672396] .spread : 0.000000
Oct 24 21:29:12 meteor kernel: [85943.672401] .spread0 : 1763181.180175
Oct 24 21:29:12 meteor kernel: [85943.672406] .nr_spread_over : 0
Oct 24 21:29:12 meteor kernel: [85943.672411] .nr_running : 0
Oct 24 21:29:12 meteor kernel: [85943.672415] .load : 0
Oct 24 21:29:12 meteor kernel: [85943.672420] .load_avg : 0.000000
Oct 24 21:29:12 meteor kernel: [85943.672425] .load_period : 0.000000
Oct 24 21:29:12 meteor kernel: [85943.672429] .load_contrib : 0
Oct 24 21:29:12 meteor kernel: [85943.672434] .load_tg : 0
Oct 24 21:29:12 meteor kernel: [85943.672438]
Oct 24 21:29:12 meteor kernel: [85943.672440] rt_rq[1]:
Oct 24 21:29:12 meteor kernel: [85943.672444] .rt_nr_running : 0
Oct 24 21:29:12 meteor kernel: [85943.672448] .rt_throttled : 0
Oct 24 21:29:12 meteor kernel: [85943.672453] .rt_time : 0.000000
Oct 24 21:29:12 meteor kernel: [85943.672458] .rt_runtime : 950.000000
Oct 24 21:29:12 meteor kernel: [85943.672464]
Oct 24 21:29:12 meteor kernel: [85943.672466] runnable tasks:
Oct 24 21:29:12 meteor kernel: [85943.672468] task PID tree-key switches prio exec-runtime sum-exec sum-sleep
Oct 24 21:29:12 meteor kernel: [85943.672472] ----------------------------------------------------------------------------------------------------------
Oct 24 21:29:12 meteor kernel: [85943.672501]

Gaudenz

--
Ever tried. Ever failed. No matter.
Try again. Fail again. Fail better.
~ Samuel Beckett ~

Matthijs Kooijman - Brevidius

unread,

Oct 26, 2011, 3:10:02 PM10/26/11

to

Hi folks,

it seems this same problem also occurs with udisks-daemon, though the
behaviour is slightly different (it seems to hang even before
undocking, I _think_).

Anyway, here's the dmesg output when trying to suspend:

[26806.508362] Freezing of tasks failed after 20.01 seconds (1 tasks refusing to freeze, wq_busy=0):
[26806.508392] udisks-daemon D ffff88023192cfa0 0 4250 4248 0x00800004
[26806.508399] ffff88023192cfa0 0000000000000086 ffffffff8103a613 ffffffff8134384c
[26806.508405] ffff88022507f160 0000000000012f00 ffff880225a91fd8 ffff880225a91fd8
[26806.508410] 0000000000012f00 ffff88023192cfa0 0000000000012f00 0000000000012f00
[26806.508416] Call Trace:
[26806.508428] [<ffffffff8103a613>] ? need_resched+0x1a/0x23
[26806.508435] [<ffffffff8134384c>] ? __schedule+0x598/0x5af
[26806.508441] [<ffffffff81343d04>] ? schedule_timeout+0x2d/0xd7
[26806.508444] [<ffffffff8103a3a0>] ? test_tsk_need_resched+0xe/0x17
[26806.508447] [<ffffffff8103a471>] ? check_preempt_curr+0x54/0x61
[26806.508449] [<ffffffff8103a509>] ? ttwu_do_wakeup+0x51/0xcb
[26806.508455] [<ffffffff81074878>] ? arch_local_irq_save+0x14/0x1d
[26806.508457] [<ffffffff81343961>] ? wait_for_common+0x9d/0x116
[26806.508461] [<ffffffff81041288>] ? try_to_wake_up+0x199/0x199
[26806.508465] [<ffffffff8105d9dd>] ? start_flush_work+0xec/0x103
[26806.508468] [<ffffffff8105dcf3>] ? flush_work+0x24/0x2c
[26806.508470] [<ffffffff8105d606>] ? do_work_for_cpu+0x1b/0x1b
[26806.508474] [<ffffffff8119f8e1>] ? disk_clear_events+0x86/0xe5
[26806.508504] [<ffffffff81125bf0>] ? check_disk_change+0x22/0x50
[26806.508510] [<ffffffffa00ab02e>] ? sd_open+0xeb/0x197 [sd_mod]
[26806.508513] [<ffffffff8112696e>] ? __blkdev_get+0xe0/0x37d
[26806.508515] [<ffffffff81126eb2>] ? blkdev_get+0x2a7/0x2a7
[26806.508517] [<ffffffff81126dd2>] ? blkdev_get+0x1c7/0x2a7
[26806.508519] [<ffffffff81126eb2>] ? blkdev_get+0x2a7/0x2a7
[26806.508523] [<ffffffff810fce44>] ? __dentry_open+0x17f/0x296
[26806.508527] [<ffffffff811064a1>] ? dget+0x12/0x1e
[26806.508530] [<ffffffff8110953f>] ? do_last+0x479/0x57e
[26806.508533] [<ffffffff81109cb5>] ? path_openat+0xce/0x350
[26806.508536] [<ffffffff81109f80>] ? do_filp_open+0x2c/0x72
[26806.508538] [<ffffffff813438ad>] ? _cond_resched+0x9/0x20
[26806.508541] [<ffffffff811b342d>] ? __strncpy_from_user+0x19/0x4a
[26806.508545] [<ffffffff81113184>] ? alloc_fd+0x69/0x110
[26806.508547] [<ffffffff810fdbda>] ? do_sys_open+0x5f/0xe6
[26806.508552] [<ffffffff81349e52>] ? system_call_fastpath+0x16/0x1b

Not sure if this helps, but I wanted to share this in case it does.

Met vriendelijke groet,

Matthijs Kooijman

--
Brevidius Multimedia Projecten | www.brevidius.nl
Alexander Boersstraat 28 | matt...@brevidius.nl
1071 KZ Amsterdam | tel. +31634764539

signature.asc

Tejun Heo

unread,

Oct 26, 2011, 6:30:02 PM10/26/11

to

Hello,

On Wed, Oct 26, 2011 at 08:26:50PM +0200, Matthijs Kooijman - Brevidius wrote:
> Hi folks,
>
> it seems this same problem also occurs with udisks-daemon, though the
> behaviour is slightly different (it seems to hang even before
> undocking, I _think_).
>
> Anyway, here's the dmesg output when trying to suspend:

Can you please turn on CONFIG_FRAME_POINTER, trigger the problem, wait
for some minutes and then do sysrq-t and report the dmesg output?
This could be from DEAD state handling race currently being addressed
in upstream.

Tejun Heo

unread,

Oct 30, 2011, 4:40:02 PM10/30/11

to

Hello,

On Fri, Oct 28, 2011 at 03:47:43PM +0200, Matthijs Kooijman wrote:
> I'm not sure if the FRAME_POINTER option actually works, though, since I
> couldn't find any obvious differences in the format of the stack traces?

The lines w/o leading "?" are from stack frames discovered by
following frame pointer. The ones w/ "?" are from dumb stack
scanning. w/o FP, it wouldn't be possible to tell between the actual
ones from the spurious ones.

So, the following kworker is what the scsi_id is waiting for in
flush_work().

kworker/2:1 D ffff880206054830 0 10775 2 0x00000000
ffff880206067aa0 0000000000000046 ffff880206067a50 ffffffff8105476d
ffff88022ed58140 ffff880206067fd8 ffff880206067fd8 0000000000012f40
ffff880206054830 ffff88022ed58140 ffff880200000002 0000000000000286
Call Trace:
[<ffffffff8134c996>] schedule+0x55/0x57
[<ffffffff8134cc9f>] schedule_timeout+0xa2/0xd9
[<ffffffff8134c832>] wait_for_common+0x9e/0x115
[<ffffffff8134c925>] wait_for_completion_timeout+0xe/0x10
[<ffffffff8119c502>] blk_execute_rq+0xb7/0xf9
[<ffffffffa0043044>] scsi_execute+0xf5/0x14d [scsi_mod]
[<ffffffffa004311e>] scsi_execute_req+0x82/0xb4 [scsi_mod]
[<ffffffffa00bb2d9>] sr_check_events+0x92/0x21e [sr_mod]
[<ffffffffa009d04b>] cdrom_check_events+0x14/0x29 [cdrom]
[<ffffffffa00bb696>] sr_block_check_events+0x14/0x16 [sr_mod]
[<ffffffff8119ea7b>] disk_events_workfn+0x3b/0xd7
[<ffffffff8105dcc9>] process_one_work+0x165/0x27a
[<ffffffff8105ed4f>] worker_thread+0xce/0x152
[<ffffffff81061f6c>] kthread+0x7f/0x87
[<ffffffff81355034>] kernel_thread_helper+0x4/0x10

The kworker submitted request but it got lost during queue destruction
and blk_execute_rq() hangs. There have been a number of recent
changes in block devel tree to address this issue. Can you please
test the following branch?

git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git block-ref

Matthijs Kooijman

unread,

Oct 31, 2011, 5:30:02 AM10/31/11

to

Hi Tejun,

> The kworker submitted request but it got lost during queue destruction
> and blk_execute_rq() hangs. There have been a number of recent
> changes in block devel tree to address this issue. Can you please
> test the following branch?
>
> git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git block-ref

I'm running this kernel now and the problem seems to have disappeared.
I've done a dozen or so undocks and suspends, without any problems.
Also, the "udisks-daemon" process in ps now says "not polling any
devices" after undocking, where it would continue to say "polling sr0"
before.

I'll continue running this kernel for a while and inform you if the
problem resurfaces.

Any other tests I should be doing?

Thanks,

Matthijs

signature.asc

Tejun Heo

unread,

Oct 31, 2011, 11:30:02 AM10/31/11

to

Hello,

On Mon, Oct 31, 2011 at 10:28:12AM +0100, Matthijs Kooijman wrote:
> I'm running this kernel now and the problem seems to have disappeared.
> I've done a dozen or so undocks and suspends, without any problems.
> Also, the "udisks-daemon" process in ps now says "not polling any
> devices" after undocking, where it would continue to say "polling sr0"
> before.

Ah, good to hear.

> I'll continue running this kernel for a while and inform you if the
> problem resurfaces.
>
> Any other tests I should be doing?

Not at the moment, please update after testing some more.

Thank you.

Matthijs Kooijman

unread,

Dec 19, 2011, 6:20:02 AM12/19/11

to

Hi Tehun,

I've been running the tj/misc.git kernel (a3b96625209aae) for six weeks
now, the suspend problems have not occured once since then. I suspect
it's safe to say that the problems are fixed in that kernel.

Are these fixes pushed to the main kernel tree yet?

Gr.

Matthijs

signature.asc

Tejun Heo

unread,

Dec 19, 2011, 11:50:02 AM12/19/11

to

Hello,

Yeah, they're all in mainline now and 3.2 should have it fixed. 3.2
is already at rc6, so you might want to give it a try. It would be
nice to backport the changes to -stable too but it's quite invasive.
Hmmm...

Thanks.