busy inodes -> ext3 umount crash

Jiri Slaby

Apr 16, 2010, 2:10:01 PM

Hi,

With mmotm 2010-04-05-16-09 and much older kernels (I didn't have a camera
to take a picture then), I sometimes get a BUG() trace in the ext3 umount code:
http://www.fi.muni.cz/~xslaby/sklad/panics/ext3_1.png
http://www.fi.muni.cz/~xslaby/sklad/panics/ext3_2.png

I have no idea how to reproduce it :(, but it usually happens when I do
shutdown/kexec.

Those busy inodes are pretty common in current kernels, I don't know if
that's related -- I doubt it since it is for different bdevs.

Do you have any clue what to test, how to debug that?

@Honza: this is the one we talked about earlier, you wanted to see
details, but I thought it disappeared. (Just in case you are interested.)

Thanks.

Jan Kara

Apr 19, 2010, 10:20:02 AM

Hi,

On Fri 16-04-10 20:09:05, Jiri Slaby wrote:
> With mmotm 2010-04-05-16-09 and much older kernels (I didn't have a camera
> to take a picture then), I sometimes get a BUG() trace in the ext3 umount code:
> http://www.fi.muni.cz/~xslaby/sklad/panics/ext3_1.png
> http://www.fi.muni.cz/~xslaby/sklad/panics/ext3_2.png

I see several "Busy inodes on umount" messages from several filesystems
and then the complaint on dm-1 about ext3 orphan inodes on umount (which are
actually directories with i_nlink == 0). I guess these are actually caused by
the same bug - some leak in inode references. I think this is a bug
specific to mmotm since otherwise we'd be seeing many more reports of this.
Looking at the patches in mmotm,
vfs-fix-vfs_rename_dir-for-fs_rename_does_d_move-filesystems.patch caught
my eye, but frankly I don't see how we could leak a dentry because of that
change. Certainly the path taken by dput() changes, but the new one does
dentry_iput() as well.

> I have no idea how to reproduce it :(, but it usually happens when I do
> shutdown/kexec.

If it's the patch I suspect above, then moving one directory over another
one might trigger the leak which would be later spotted on umount of the
filesystem. Or maybe to trigger the leak you have to have a process which
has its CWD in the directory you are going to delete by the rename... not
sure.

> Those busy inodes are pretty common in current kernels, I don't know if
> that's related -- I doubt it since it is for different bdevs.

I think it is related - if the busy inode is a deleted one, then you get
exactly the WARN_ON you are reporting... So if you can easily reproduce
the "busy inodes" message then I'd start with debugging that one. Do you
see it also with vanilla kernels?

Honza
--
Jan Kara <ja...@suse.cz>
SUSE Labs, CR

Jiri Slaby

Apr 19, 2010, 10:40:02 AM

On 04/19/2010 04:11 PM, Jan Kara wrote:
>> I have no idea how to reproduce it :(, but it usually happens when I do
>> shutdown/kexec.
> If it's the patch I suspect above, then moving one directory over another
> one might trigger the leak which would be later spotted on umount of the
> filesystem. Or maybe to trigger the leak you have to have a process which
> has its CWD in the directory you are going to delete by the rename... not
> sure.

The trigger for busy inodes is as simple as (I=initialization done only
once):
I> # dd if=/dev/zero of=/dev/shm/ext3 bs=1024 count=1 seek=$((100*1024))
I> # mkfs.ext3 -m 0 /dev/shm/ext3
# mount -oloop /dev/shm/ext3 /mnt/c
# umount /mnt/c
# dmesg|tail
VFS: Busy inodes after unmount of loop0. Self-destruct in 5 seconds.
Have a nice day...

(The printk time varies -- this sequence really suffices.)

> So if you can easily reproduce
> the "busy inodes" message then I'd start with debugging that one. Do you
> see it also with vanilla kernels?

I don't know, now I'm going to play with that as I have the trigger ;).
Will be back soon.

thanks,
--
js

Jiri Slaby

Apr 20, 2010, 10:20:02 AM

On 04/19/2010 04:33 PM, Jiri Slaby wrote:
> The trigger for busy inodes is as simple as (I=initialization done only
> once):
> I> # dd if=/dev/zero of=/dev/shm/ext3 bs=1024 count=1 seek=$((100*1024))
> I> # mkfs.ext3 -m 0 /dev/shm/ext3
> # mount -oloop /dev/shm/ext3 /mnt/c
> # umount /mnt/c
> # dmesg|tail
> VFS: Busy inodes after unmount of loop0. Self-destruct in 5 seconds.
> Have a nice day...
>
> (The printk time varies -- this sequence really suffices.)

Well, this happens only after gnome-session is started and it's fuzzy --
sometimes it happens, sometimes not. I haven't found a 100% reliable trigger yet.

>> So if you can easily reproduce
>> the "busy inodes" message then I'd start with debugging that one. Do you
>> see it also with vanilla kernels?

Vanilla seems not to be affected. It's in next/master already though
(2603ecd9). I'll investigate it further later.

Jan Kara

Apr 20, 2010, 11:30:03 AM

On Tue 20-04-10 16:12:03, Jiri Slaby wrote:
> On 04/19/2010 04:33 PM, Jiri Slaby wrote:
> > The trigger for busy inodes is as simple as (I=initialization done only
> > once):
> > I> # dd if=/dev/zero of=/dev/shm/ext3 bs=1024 count=1 seek=$((100*1024))
> > I> # mkfs.ext3 -m 0 /dev/shm/ext3
> > # mount -oloop /dev/shm/ext3 /mnt/c
> > # umount /mnt/c
> > # dmesg|tail
> > VFS: Busy inodes after unmount of loop0. Self-destruct in 5 seconds.
> > Have a nice day...
> >
> > (The printk time varies -- this sequence really suffices.)
>
> Well, this happens only after gnome-session is started and it's fuzzy --
> sometimes it happens, sometimes not. I didn't find 100% trigger yet.

Hmph - maybe something in inotify? Dunno...

> >> So if you can easily reproduce
> >> the "busy inodes" message then I'd start with debugging that one. Do you
> >> see it also with vanilla kernels?
>
> Vanilla seems not to be affected. It's in next/master already though
> (2603ecd9). I'll investigate it further later.

Do you mean it's in today's linux-next but not in Linus' tree?

Honza
--
Jan Kara <ja...@suse.cz>
SUSE Labs, CR

Jiri Slaby

Apr 21, 2010, 11:20:02 AM

On 04/20/2010 05:28 PM, Jan Kara wrote:
> On Tue 20-04-10 16:12:03, Jiri Slaby wrote:
>> On 04/19/2010 04:33 PM, Jiri Slaby wrote:
>>> The trigger for busy inodes is as simple as (I=initialization done only
>>> once):
>>> I> # dd if=/dev/zero of=/dev/shm/ext3 bs=1024 count=1 seek=$((100*1024))
>>> I> # mkfs.ext3 -m 0 /dev/shm/ext3
>>> # mount -oloop /dev/shm/ext3 /mnt/c
>>> # umount /mnt/c
>>> # dmesg|tail
>>> VFS: Busy inodes after unmount of loop0. Self-destruct in 5 seconds.
>>> Have a nice day...
>>>
>>> (The printk time varies -- this sequence really suffices.)
>>
>> Well, this happens only after gnome-session is started and it's fuzzy --
>> sometimes it happens, sometimes not. I didn't find 100% trigger yet.
> Hmph - maybe something in inotify? Dunno...

fsnotify...

>>>> So if you can easily reproduce
>>>> the "busy inodes" message then I'd start with debugging that one. Do you
>>>> see it also with vanilla kernels?
>>
>> Vanilla seems not to be affected. It's in next/master already though
>> (2603ecd9). I'll investigate it further later.
> Do you mean it's in today's linux-next but not in Linus' tree?

Yes, exactly.

And the winner is (seemingly):
commit 69c1182c4e5d8b7da772ddad512c6f6b67ec1bb8
Author: Eric Paris <epa...@redhat.com>
Date: Thu Dec 17 21:24:27 2009 -0500

fsnotify: vfsmount marks generic functions

Much like inode-mark.c has all of the code dealing with marks on inodes
this patch adds a vfsmount-mark.c which has similar code but is intended
for marks on vfsmounts.

Signed-off-by: Eric Paris <epa...@redhat.com>

I can't verify it by reverting on the top of -mm as it doesn't revert
cleanly. Do you see any bug in there?

thanks,
--
js

Eric Paris

Apr 21, 2010, 11:30:02 AM

You cannot revert, but you can certainly reset --hard to the commits before
and after this patch. I'll take a look, but I'm not seeing a problem offhand.
This patch wasn't supposed to mess with inode refcounting at all....

-Eric

Jiri Slaby

Apr 21, 2010, 11:50:02 AM

On 04/21/2010 05:24 PM, Eric Paris wrote:
> I'll take a look, but I'm not seeing a problem right off
> hand. This patch wasn't supposed to mess with inode refcounting at
> all....

Heh, but with very high probability now, it did :). Do you want me to
inject some printouts anywhere? Can you reproduce it? KDE doesn't
trigger the bug. After I switched from KDE to gnome in qemu, it started
to occur (X :0 & (sleep 1; DISPLAY=:0 gnome-session)).

--
js

Eric Paris

Apr 21, 2010, 12:10:01 PM

On Wed, 2010-04-21 at 17:47 +0200, Jiri Slaby wrote:
> On 04/21/2010 05:24 PM, Eric Paris wrote:
> > I'll take a look, but I'm not seeing a problem right off
> > hand. This patch wasn't supposed to mess with inode refcounting at
> > all....
>
> Heh, but with very high probability now, it did :). Do you want me to
> inject some printouts anywhere? Can you reproduce it? KDE doesn't
> trigger the bug. After I switched from KDE to gnome in qemu, it started
> to occur (X :0 & (sleep 1; DISPLAY=:0 gnome-session)).

Well, I reproduced it and I'll take a look. Reliable steps seem to be:

# dd if=/dev/zero of=/dev/shm/ext3 bs=1024 count=1 seek=$((100*1024))

# mkfs.ext3 -m 0 /dev/shm/ext3
# mount -oloop /dev/shm/ext3 /mnt/c

# touch /mnt/c/file
# inotifywait -m /mnt/c/file

# umount /mnt/c
# dmesg|tail

-Eric

Eric Paris

Apr 21, 2010, 5:30:02 PM

On Wed, 2010-04-21 at 17:16 +0200, Jiri Slaby wrote:

Surprised no one else ever hit this; it's been broken for a LONG time. In
any case, I'll have this in the next time he pushes a -next.

-Eric

commit bf770d242d100882891ac60e42f2cf0096fc3f3c
Author: Eric Paris <epa...@redhat.com>
Date: Wed Apr 21 16:49:38 2010 -0400

fsnotify: add iput on inodes when no longer marked

fsnotify takes an igrab on an inode when it adds a mark. The code was
supposed to drop the reference when the mark was removed. What actually
happened, however, is shown below:

void fsnotify_destroy_inode_mark(struct fsnotify_mark *mark)
{
	...
	mark->i.inode = NULL;
	...
}

void fsnotify_destroy_mark(struct fsnotify_mark *mark)
{
	struct inode *inode = NULL;
	...
	if (mark->flags & FSNOTIFY_MARK_FLAG_INODE) {
		fsnotify_destroy_inode_mark(mark);
		inode = mark->i.inode;
	}
	...
	if (inode)
		iput(inode);
	...
}

Obviously the intent was to capture the inode before it was set to NULL in
fsnotify_destroy_inode_mark().

Signed-off-by: Eric Paris <epa...@redhat.com>

diff --git a/fs/notify/mark.c b/fs/notify/mark.c
index 1e824e6..8f3b0e7 100644
--- a/fs/notify/mark.c
+++ b/fs/notify/mark.c
@@ -133,8 +133,8 @@ void fsnotify_destroy_mark(struct fsnotify_mark *mark)
 	spin_lock(&group->mark_lock);
 
 	if (mark->flags & FSNOTIFY_MARK_FLAG_INODE) {
-		fsnotify_destroy_inode_mark(mark);
 		inode = mark->i.inode;
+		fsnotify_destroy_inode_mark(mark);
 	} else if (mark->flags & FSNOTIFY_MARK_FLAG_VFSMOUNT)
 		fsnotify_destroy_vfsmount_mark(mark);
 	else