[RFC, PATCH 0/5] fsfreeze: fix sb vs bdev freeze/thaw b0rkage

Dave Chinner

unread,

Jun 10, 2010, 3:30:02 AM6/10/10

to

The following series is for to address bugs in the emergency thawing code, as
well as mismatcheѕ with freezing at the block layer and the superblock that
break the freeze/thaw nesting order.

The first two patches fix the emergency thaw infinite loop reported by Jeff
Merkey and the deadlock on sb->s_umount that the infinite loop hid. These may
be stable kernel candidates.

The remainder of the patches address the bdev/sb mismatch and the fact that sb
level freezing does not nest correctly. For all the places that the bdev
interfaces are used, we need a superblock anyway so we may as well make
freeze/thaw work only at the sb level. As such, this series moves all the
nesting code to the sb from the bdev level and removes the
freeze_bdev/thaw_bdev interfaces completely. It also converts the emergency
thaw to work at the superblock level such that it will now thaw manually frozen
filesystems.

A *big* outstanding problem still exists - freezing takes an active reference
to the superblock, so unmounting an frozen filesystem has some nasty and
unexpected side effects. The existing code results in an unmountable block
device:

# mount /dev/vda /mnt/test
# xfs_freeze -f /mnt/test
# umount /mnt/test
# grep test /proc/mounts
# mkfs.xfs -f -l size=128m /dev/vda
mkfs.xfs: /dev/vda contains a mounted filesystem
Usage: mkfs.xfs
....
# mount /dev/vda /mnt/test
mount: /dev/vda already mounted or /mnt/test busy
#

At this point I can't get access to /dev/vda and needs a reboot to
get it and /mnt/test back.

This patch series results in the block device being mountable, but
remains frozen across unmount/mount:

# mount /dev/vda /mnt/test
# xfs_freeze -f /mnt/test
# umount /mnt/test
# grep test /proc/mounts
# mkfs.xfs -f -l size=128m /dev/vda
mkfs.xfs: /dev/vda contains a mounted filesystem
Usage: mkfs.xfs
....
# mount /dev/vda /mnt/test
# touch /mnt/test/foo &
[1] 2647
#
# xfs_freeze -u /mnt/test
[1]+ Done sudo touch /mnt/test/foo
# umount /mnt/test
# mkfs.xfs -f -l size=128m /dev/vda
meta-data=/dev/vda isize=256 agcount=4, agsize=262144 blks
= sectsz=512 attr=2
data = bsize=4096 blocks=1048576, imaxpct=25
= sunit=0 swidth=0 blks
naming =version 2 bsize=4096 ascii-ci=0
log =internal log bsize=4096 blocks=32768, version=2
= sectsz=512 sunit=0 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
#

This behaviour is only marginally better than the existing behaviour
(at least you can release the references). However, I don't really
like either option - we used to disallow umount on a frozen
filesystems to avoid this problem.

So What is really supposed to happen when we unmount a frozen
superblock? Should unmount return EBUSY? Should it be automatically
thawed so it doesn't affect block device behaviour after unmount?
Something else?

Cheers,

Dave.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Dave Chinner

unread,

Jun 10, 2010, 3:30:03 AM6/10/10

to

From: Dave Chinner <dchi...@redhat.com>

freeze/thaw_bdev are now just trivial wrappers around
freeze/thaw_super(). Convert all users of the bdev interfaces to use
the superblock interfaces instead, and remove the bdev interfaces.

Signed-off-by: Dave Chinner <dchi...@redhat.com>
---
drivers/md/dm.c | 13 +++++++----
fs/block_dev.c | 54 ----------------------------------------------------
fs/buffer.c | 2 +-
fs/super.c | 1 +
fs/xfs/xfs_fsops.c | 10 ++------
include/linux/fs.h | 19 ------------------
6 files changed, 13 insertions(+), 86 deletions(-)

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index d21e128..70d5fd6 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -2427,15 +2427,18 @@ static int lock_fs(struct mapped_device *md)

WARN_ON(md->frozen_sb);

- md->frozen_sb = freeze_bdev(md->bdev);
- if (IS_ERR(md->frozen_sb)) {
- r = PTR_ERR(md->frozen_sb);
+ md->frozen_sb = get_active_super(md->bdev);
+ if (!md->frozen_sb)
+ return -EIO;
+ r = freeze_super(md->frozen_sb);
+ if (r) {
+ deactivate_super(md->frozen_sb);
md->frozen_sb = NULL;
return r;
}
+ deactivate_super(md->frozen_sb);

set_bit(DMF_FROZEN, &md->flags);
-
return 0;
}

@@ -2444,7 +2447,7 @@ static void unlock_fs(struct mapped_device *md)
if (!test_bit(DMF_FROZEN, &md->flags))
return;

- thaw_bdev(md->bdev, md->frozen_sb);
+ thaw_super(md->frozen_sb);
md->frozen_sb = NULL;
clear_bit(DMF_FROZEN, &md->flags);
}
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 84899b3..3c3d1fe 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -213,60 +213,6 @@ int fsync_bdev(struct block_device *bdev)
}
EXPORT_SYMBOL(fsync_bdev);

-/**
- * freeze_bdev -- lock a filesystem and force it into a consistent state
- * @bdev: blockdevice to lock
- *
- * If a superblock is found on this device, we take the s_umount semaphore
- * on it to make sure nobody unmounts until the snapshot creation is done.
- * The reference counter (bd_fsfreeze_count) guarantees that only the last
- * unfreeze process can unfreeze the frozen filesystem actually when multiple
- * freeze requests arrive simultaneously. It counts up in freeze_bdev() and
- * count down in thaw_bdev(). When it becomes 0, thaw_bdev() will unfreeze
- * actually.
- */
-struct super_block *freeze_bdev(struct block_device *bdev)
-{
- struct super_block *sb;
- int error = 0;
-
- sb = get_active_super(bdev);
- if (!sb)
- goto out;
- error = freeze_super(sb);
- if (error) {
- deactivate_super(sb);
- return ERR_PTR(error);
- }
- deactivate_super(sb);
- out:
- sync_blockdev(bdev);
- return sb; /* thaw_bdev releases s->s_umount */
-}
-EXPORT_SYMBOL(freeze_bdev);
-
-/**
- * thaw_bdev -- unlock filesystem
- * @bdev: blockdevice to unlock
- * @sb: associated superblock
- *
- * Unlocks the filesystem and marks it writeable again after freeze_bdev().
- */
-int thaw_bdev(struct block_device *bdev, struct super_block *sb)
-{
- if (!sb)
- return -EINVAL;
- return thaw_super(sb);
-}
-EXPORT_SYMBOL(thaw_bdev);
-
-int thaw_bdev_emergency(struct block_device *bdev, struct super_block *sb)
-{
- if (!sb)
- return -EINVAL;
- return thaw_super_emergency(sb);
-}
-
static int blkdev_writepage(struct page *page, struct writeback_control *wbc)
{
return block_write_full_page(page, blkdev_get_block, wbc);
diff --git a/fs/buffer.c b/fs/buffer.c
index f0c55d9..b095fc1 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -564,7 +564,7 @@ repeat:
static void do_thaw_one(struct super_block *sb, void *unused)
{
char b[BDEVNAME_SIZE];
- while (sb->s_bdev && !thaw_bdev_emergency(sb->s_bdev, sb))
+ while (sb->s_bdev && !thaw_super_emergency(sb))
printk(KERN_WARNING "Emergency Thaw on %s\n",
bdevname(sb->s_bdev, b));
}
diff --git a/fs/super.c b/fs/super.c
index 5f13431..81a4034 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -976,6 +976,7 @@ int freeze_super(struct super_block *sb)
}
}
out_active:
+ sync_blockdev(sb->s_bdev);
up_write(&sb->s_umount);
out_unlock:
mutex_unlock(&sb->s_freeze_mutex);
diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
index 37a6f62..cc993a5 100644
--- a/fs/xfs/xfs_fsops.c
+++ b/fs/xfs/xfs_fsops.c
@@ -642,16 +642,12 @@ xfs_fs_goingdown(
__uint32_t inflags)
{
switch (inflags) {
- case XFS_FSOP_GOING_FLAGS_DEFAULT: {
- struct super_block *sb = freeze_bdev(mp->m_super->s_bdev);
-
- if (sb && !IS_ERR(sb)) {
+ case XFS_FSOP_GOING_FLAGS_DEFAULT:
+ if (!freeze_super(mp->m_super)) {
xfs_force_shutdown(mp, SHUTDOWN_FORCE_UMOUNT);
- thaw_bdev(sb->s_bdev, sb);
+ thaw_super(mp->m_super);
}
-
break;
- }
case XFS_FSOP_GOING_FLAGS_LOGFLUSH:
xfs_force_shutdown(mp, SHUTDOWN_FORCE_UMOUNT);
break;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index f92b077..39bf4ac 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1951,30 +1951,11 @@ extern void invalidate_bdev(struct block_device *);
extern int sync_blockdev(struct block_device *bdev);
extern struct super_block *freeze_bdev(struct block_device *);
extern void emergency_thaw_all(void);
-extern int thaw_bdev(struct block_device *bdev, struct super_block *sb);
-extern int thaw_bdev_emergency(struct block_device *bdev,
- struct super_block *sb);
extern int fsync_bdev(struct block_device *);
#else
static inline void bd_forget(struct inode *inode) {}
static inline int sync_blockdev(struct block_device *bdev) { return 0; }
static inline void invalidate_bdev(struct block_device *bdev) {}
-
-static inline struct super_block *freeze_bdev(struct block_device *sb)
-{
- return NULL;
-}
-
-static inline int thaw_bdev(struct block_device *bdev, struct super_block *sb)
-{
- return 0;
-}
-
-static inline int thaw_bdev_emergency(struct block_device *bdev,
- struct super_block *sb)
-{
- return 0;
-}
#endif
extern int sync_filesystem(struct super_block *);
extern const struct file_operations def_blk_fops;
--
1.7.1

Dave Chinner

unread,

Jun 10, 2010, 3:30:03 AM6/10/10

to

From: Dave Chinner <dchi...@redhat.com>

The thawing of a filesystem through sysrq-j loops infinitely as it
incorrectly detects a thawed filesytsem as frozen and tries to
unfreeze repeatedly. This is a regression caused by
4504230a71566785a05d3e6b53fa1ee071b864eb ("freeze_bdev: grab active
reference to frozen superblocks") in that it no longer returned
-EINVAL for superblocks that were not frozen.

Return -EINVAL when the filesystem is already unfrozen to avoid this
problem.

Signed-off-by: Dave Chinner <dchi...@redhat.com>
---

fs/block_dev.c | 13 +++++--------
1 files changed, 5 insertions(+), 8 deletions(-)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index 7346c96..366ac38 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -276,22 +276,19 @@ int thaw_bdev(struct block_device *bdev, struct super_block *sb)
if (!bdev->bd_fsfreeze_count)
goto out;

- error = 0;
- if (--bdev->bd_fsfreeze_count > 0)
+ if (!sb)
goto out;

- if (!sb)
+ error = 0;
+ if (--bdev->bd_fsfreeze_count > 0)
goto out;

error = thaw_super(sb);
- if (error) {
+ if (error)
bdev->bd_fsfreeze_count++;
- mutex_unlock(&bdev->bd_fsfreeze_mutex);
- return error;
- }
out:
mutex_unlock(&bdev->bd_fsfreeze_mutex);
- return 0;
+ return error;
}
EXPORT_SYMBOL(thaw_bdev);

--
1.7.1

Dave Chinner

unread,

Jun 10, 2010, 3:30:03 AM6/10/10

to

From: Dave Chinner <dchi...@redhat.com>

It makes no sense having the emergency thaw code in fs/buffer.c when all of
it's operations are one superblocks and the code it executes is all in
fs/super.c. Move the code there and clean it up.

Signed-off-by: Dave Chinner <dchi...@redhat.com>
---

fs/buffer.c | 31 -------------------------------
fs/super.c | 37 +++++++++++++++++++++++++++++++++----
include/linux/fs.h | 1 -
3 files changed, 33 insertions(+), 36 deletions(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index b095fc1..61bd994 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -561,37 +561,6 @@ repeat:
return err;
}

-static void do_thaw_one(struct super_block *sb, void *unused)
-{
- char b[BDEVNAME_SIZE];
- while (sb->s_bdev && !thaw_super_emergency(sb))
- printk(KERN_WARNING "Emergency Thaw on %s\n",
- bdevname(sb->s_bdev, b));
-}
-
-static void do_thaw_all(struct work_struct *work)
-{
- iterate_supers(do_thaw_one, NULL);
- kfree(work);
- printk(KERN_WARNING "Emergency Thaw complete\n");
-}
-
-/**
- * emergency_thaw_all -- forcibly thaw every frozen filesystem
- *
- * Used for emergency unfreeze of all filesystems via SysRq
- */
-void emergency_thaw_all(void)
-{
- struct work_struct *work;
-
- work = kmalloc(sizeof(*work), GFP_ATOMIC);
- if (work) {
- INIT_WORK(work, do_thaw_all);
- schedule_work(work);
- }
-}
-
/**
* sync_mapping_buffers - write out & wait upon a mapping's "associated" buffers
* @mapping: the mapping which wants those buffers written
diff --git a/fs/super.c b/fs/super.c
index 81a4034..680b8d5 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -1059,7 +1059,7 @@ out_unlock:
}

/**
- * thaw_super -- unlock filesystem
+ * __thaw_super -- unlock filesystem
* @sb: the super to thaw
*
* Unlocks the filesystem and marks it writeable again after freeze_super().
@@ -1075,11 +1075,40 @@ EXPORT_SYMBOL(thaw_super);
* @sb: the super to thaw
*
* Unlocks the filesystem and marks it writeable again after freeze_super().
- * This avoids taking the s_umount lock if it is already held.
+ * This avoids taking the s_umount lock because it is already held.
+ */
+static void thaw_super_emergency(struct super_block *sb, void *unused)
+{
+
+ if (sb->s_bdev) {
+ char b[BDEVNAME_SIZE];
+ printk(KERN_WARNING "Emergency Thaw on %s.\n",
+ bdevname(sb->s_bdev, b));
+ }
+ while (!__thaw_super(sb, 1));
+}
+
+static void do_thaw_all(struct work_struct *work)
+{
+ iterate_supers(thaw_super_emergency, NULL);
+ kfree(work);
+ printk(KERN_WARNING "Emergency Thaw complete\n");
+}
+
+/**
+ * emergency_thaw_all -- forcibly thaw every frozen filesystem
+ *
+ * Used for emergency unfreeze of all filesystems via SysRq
*/
-int thaw_super_emergency(struct super_block *sb)
+void emergency_thaw_all(void)
{
- return __thaw_super(sb, 1);
+ struct work_struct *work;
+
+ work = kmalloc(sizeof(*work), GFP_ATOMIC);
+ if (work) {
+ INIT_WORK(work, do_thaw_all);
+ schedule_work(work);
+ }
}

static struct vfsmount *fs_set_subtype(struct vfsmount *mnt, const char *fstype)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 39bf4ac..a704062 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1801,7 +1801,6 @@ extern int iterate_mounts(int (*)(struct vfsmount *, void *), void *,
extern int vfs_statfs(struct dentry *, struct kstatfs *);
extern int freeze_super(struct super_block *super);
extern int thaw_super(struct super_block *super);
-extern int thaw_super_emergency(struct super_block *super);

extern int current_umask(void);

--
1.7.1

Dave Chinner

unread,

Jun 10, 2010, 3:30:03 AM6/10/10

to

From: Dave Chinner <dchi...@redhat.com>

The emergency thaw process uses iterate_super() which holds the
sb->s_umount lock in read mode. The current thaw_super() code takes
the sb->s_umount lock in write mode, hence leading to an instant
deadlock.

Pass the emergency state into the thaw_bdev/thaw_super code to avoid
taking the s_umount lock in this case. We are running under the bdev
freeze mutex, so this is still serialised against freeze despite
only having a read lock on the sb->s_umount. Hence it should be safe
to execute in this manner, especially given that emergency thaw is a
rarely executed "get-out-of-jail" feature.

Signed-off-by: Dave Chinner <dchi...@redhat.com>
---

fs/block_dev.c | 26 ++++++++++++++++++++--
fs/buffer.c | 2 +-
fs/super.c | 58 +++++++++++++++++++++++++++++++++++++++++++---------
include/linux/fs.h | 9 ++++++++
4 files changed, 81 insertions(+), 14 deletions(-)

diff --git a/fs/block_dev.c b/fs/block_dev.c
index 366ac38..a8c8224 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -262,13 +262,14 @@ struct super_block *freeze_bdev(struct block_device *bdev)
EXPORT_SYMBOL(freeze_bdev);

/**
- * thaw_bdev -- unlock filesystem
+ * __thaw_bdev -- unlock filesystem

* @bdev: blockdevice to unlock

* @sb: associated superblock
+ * @emergency: emergency thaw
*
* Unlocks the filesystem and marks it writeable again after freeze_bdev().
*/
-int thaw_bdev(struct block_device *bdev, struct super_block *sb)
+static int __thaw_bdev(struct block_device *bdev, struct super_block *sb, int emergency)
{
int error = -EINVAL;

@@ -283,15 +284,34 @@ int thaw_bdev(struct block_device *bdev, struct super_block *sb)

if (--bdev->bd_fsfreeze_count > 0)
goto out;

- error = thaw_super(sb);
+ if (emergency)
+ error = thaw_super_emergency(sb);
+ else
+ error = thaw_super(sb);
if (error)
bdev->bd_fsfreeze_count++;
out:
mutex_unlock(&bdev->bd_fsfreeze_mutex);
return error;
}
+/**
+ * thaw_bdev -- unlock filesystem
+ * @bdev: blockdevice to unlock
+ * @sb: associated superblock
+ *
+ * Unlocks the filesystem and marks it writeable again after freeze_bdev().
+ */
+int thaw_bdev(struct block_device *bdev, struct super_block *sb)
+{
+ return __thaw_bdev(bdev, sb, 0);
+}
EXPORT_SYMBOL(thaw_bdev);

+int thaw_bdev_emergency(struct block_device *bdev, struct super_block *sb)
+{
+ return __thaw_bdev(bdev, sb, 1);
+}
+

static int blkdev_writepage(struct page *page, struct writeback_control *wbc)
{
return block_write_full_page(page, blkdev_get_block, wbc);

diff --git a/fs/buffer.c b/fs/buffer.c
index d54812b..f0c55d9 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c

@@ -564,7 +564,7 @@ repeat:

static void do_thaw_one(struct super_block *sb, void *unused)

{
char b[BDEVNAME_SIZE];
- while (sb->s_bdev && !thaw_bdev(sb->s_bdev, sb))
+ while (sb->s_bdev && !thaw_bdev_emergency(sb->s_bdev, sb))

printk(KERN_WARNING "Emergency Thaw on %s\n",

bdevname(sb->s_bdev, b));
}
diff --git a/fs/super.c b/fs/super.c
index 5c35bc7..76ed922 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -987,19 +987,24 @@ int freeze_super(struct super_block *sb)
EXPORT_SYMBOL(freeze_super);

/**
- * thaw_super -- unlock filesystem
+ * __thaw_super -- unlock filesystem
* @sb: the super to thaw

+ * @emergency: emergency thaw

*
* Unlocks the filesystem and marks it writeable again after freeze_super().

+ * If we are doing an emergency thaw, we don't need to grab the sb->s_umount
+ * lock as it is already held.
*/
-int thaw_super(struct super_block *sb)
+static int __thaw_super(struct super_block *sb, int emergency)
{
- int error;
+ int error = 0;
+
+ if (!emergency)
+ down_write(&sb->s_umount);

- down_write(&sb->s_umount);
if (sb->s_frozen == SB_UNFROZEN) {
- up_write(&sb->s_umount);
- return -EINVAL;
+ error = -EINVAL;
+ goto out_unlock;
}

if (sb->s_flags & MS_RDONLY)
@@ -1011,8 +1016,7 @@ int thaw_super(struct super_block *sb)
printk(KERN_ERR
"VFS:Filesystem thaw failed\n");
sb->s_frozen = SB_FREEZE_TRANS;
- up_write(&sb->s_umount);
- return error;
+ goto out_unlock;
}
}

@@ -1020,12 +1024,46 @@ out:
sb->s_frozen = SB_UNFROZEN;
smp_wmb();
wake_up(&sb->s_wait_unfrozen);
- deactivate_locked_super(sb);
-
+ /*
+ * When called from emergency scope, we cannot grab the s_umount lock
+ * so we cannot deactivate the superblock. This may leave unbalanced
+ * superblock references which could prevent unmount, but given this is
+ * an emergency operation....
+ */
+ if (!emergency)
+ deactivate_locked_super(sb);
return 0;
+
+out_unlock:
+ if (!emergency)
+ up_write(&sb->s_umount);
+ return error;
+}
+
+/**
+ * thaw_super -- unlock filesystem
+ * @sb: the super to thaw
+ *
+ * Unlocks the filesystem and marks it writeable again after freeze_super().
+ */
+int thaw_super(struct super_block *sb)
+{
+ return __thaw_super(sb, 0);
}
EXPORT_SYMBOL(thaw_super);

+/**
+ * thaw_super_emergency -- unlock filesystem
+ * @sb: the super to thaw
+ *
+ * Unlocks the filesystem and marks it writeable again after freeze_super().
+ * This avoids taking the s_umount lock if it is already held.
+ */
+int thaw_super_emergency(struct super_block *sb)
+{
+ return __thaw_super(sb, 1);
+}

+
static struct vfsmount *fs_set_subtype(struct vfsmount *mnt, const char *fstype)

{
int err;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 471e1ff..e246389 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1803,6 +1803,7 @@ extern int iterate_mounts(int (*)(struct vfsmount *, void *), void *,

extern int vfs_statfs(struct dentry *, struct kstatfs *);
extern int freeze_super(struct super_block *super);
extern int thaw_super(struct super_block *super);

+extern int thaw_super_emergency(struct super_block *super);

extern int current_umask(void);

@@ -1953,6 +1954,8 @@ extern int sync_blockdev(struct block_device *bdev);

extern struct super_block *freeze_bdev(struct block_device *);
extern void emergency_thaw_all(void);

extern int thaw_bdev(struct block_device *bdev, struct super_block *sb);
+extern int thaw_bdev_emergency(struct block_device *bdev,
+ struct super_block *sb);

extern int fsync_bdev(struct block_device *);
#else
static inline void bd_forget(struct inode *inode) {}

@@ -1968,6 +1971,12 @@ static inline int thaw_bdev(struct block_device *bdev, struct super_block *sb)
{
return 0;
}
+
+static inline int thaw_bdev_emergency(struct block_device *bdev,
+ struct super_block *sb)
+{
+ return 0;
+}

#endif
extern int sync_filesystem(struct super_block *);
extern const struct file_operations def_blk_fops;
--
1.7.1

--

Josef Bacik

unread,

Jun 10, 2010, 8:40:02 AM6/10/10

to

My vote is for EBUSY, that way we don't get this weird unexpected behavior
thing. Plus if dm is doing a snapshot or whatever, unfreezing the fs to let it
unmount in the middle of it doing its thing could be unfun. Thanks,

Josef

Christoph Hellwig

unread,

Jun 14, 2010, 11:20:03 AM6/14/10

to

On Thu, Jun 10, 2010 at 05:19:50PM +1000, Dave Chinner wrote:
> Return -EINVAL when the filesystem is already unfrozen to avoid this
> problem.

This includes some additional changes in addition to the description,
and at least one of them seems incorrect.

> - error = 0;
> - if (--bdev->bd_fsfreeze_count > 0)
> + if (!sb)
> goto out;
>
> - if (!sb)
> + error = 0;
> + if (--bdev->bd_fsfreeze_count > 0)
> goto out;

Here you reorder the sb check to be before the counter decrement. But
we do support calling freeze_bdev on a device without a superblock, and
you would leak bd_fsfreeze_count for that case and wrongly return
-EINVAL on unthaw for these now.

> error = thaw_super(sb);
> - if (error) {
> + if (error)
> bdev->bd_fsfreeze_count++;
> - mutex_unlock(&bdev->bd_fsfreeze_mutex);
> - return error;
> - }

Ok, useful cleanup.

> out:
> mutex_unlock(&bdev->bd_fsfreeze_mutex);
> - return 0;
> + return error;

And this is the actual fix of course, also looks good.

Christoph Hellwig

unread,

Jun 14, 2010, 11:30:01 AM6/14/10

to

On Thu, Jun 10, 2010 at 05:19:53PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchi...@redhat.com>
>
> freeze/thaw_bdev are now just trivial wrappers around
> freeze/thaw_super(). Convert all users of the bdev interfaces to use
> the superblock interfaces instead, and remove the bdev interfaces.

This requires exporting get_active_super. Buit if we want to keep suppoting
freezing of unmounted block devices we need to keep the helpers anyway.

Christoph Hellwig

unread,

Jun 14, 2010, 11:30:02 AM6/14/10

to

On Thu, Jun 10, 2010 at 05:19:54PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchi...@redhat.com>
>
> It makes no sense having the emergency thaw code in fs/buffer.c when all of
> it's operations are one superblocks and the code it executes is all in
> fs/super.c. Move the code there and clean it up.

> +static void thaw_super_emergency(struct super_block *sb, void *unused)

> +{
> +
> + if (sb->s_bdev) {
> + char b[BDEVNAME_SIZE];
> + printk(KERN_WARNING "Emergency Thaw on %s.\n",
> + bdevname(sb->s_bdev, b));
> + }
> + while (!__thaw_super(sb, 1));

Please move the ; to a separate line to make the loop better redable.

Christoph Hellwig

unread,

Jun 14, 2010, 11:30:02 AM6/14/10

to

On Thu, Jun 10, 2010 at 05:19:51PM +1000, Dave Chinner wrote:
> The emergency thaw process uses iterate_super() which holds the
> sb->s_umount lock in read mode. The current thaw_super() code takes
> the sb->s_umount lock in write mode, hence leading to an instant
> deadlock.
>
> Pass the emergency state into the thaw_bdev/thaw_super code to avoid
> taking the s_umount lock in this case. We are running under the bdev
> freeze mutex, so this is still serialised against freeze despite
> only having a read lock on the sb->s_umount. Hence it should be safe
> to execute in this manner, especially given that emergency thaw is a
> rarely executed "get-out-of-jail" feature.

This is correct as long as no one calls thaw_super directly, which
is not the case currently.

Dave Chinner

unread,

Jun 14, 2010, 7:30:02 PM6/14/10

to

On Mon, Jun 14, 2010 at 11:18:15AM -0400, Christoph Hellwig wrote:
> On Thu, Jun 10, 2010 at 05:19:50PM +1000, Dave Chinner wrote:
> > Return -EINVAL when the filesystem is already unfrozen to avoid this
> > problem.
>
>
> This includes some additional changes in addition to the description,
> and at least one of them seems incorrect.
>
> > - error = 0;
> > - if (--bdev->bd_fsfreeze_count > 0)
> > + if (!sb)
> > goto out;
> >
> > - if (!sb)
> > + error = 0;
> > + if (--bdev->bd_fsfreeze_count > 0)
> > goto out;
>
> Here you reorder the sb check to be before the counter decrement. But
> we do support calling freeze_bdev on a device without a superblock, and
> you would leak bd_fsfreeze_count for that case and wrongly return
> -EINVAL on unthaw for these now.

Ok, will fix it.

Cheers,

Dave.
--
Dave Chinner
da...@fromorbit.com

Dave Chinner

unread,

Jun 14, 2010, 7:30:02 PM6/14/10

to

On Mon, Jun 14, 2010 at 11:20:11AM -0400, Christoph Hellwig wrote:
> On Thu, Jun 10, 2010 at 05:19:51PM +1000, Dave Chinner wrote:
> > The emergency thaw process uses iterate_super() which holds the
> > sb->s_umount lock in read mode. The current thaw_super() code takes
> > the sb->s_umount lock in write mode, hence leading to an instant
> > deadlock.
> >
> > Pass the emergency state into the thaw_bdev/thaw_super code to avoid
> > taking the s_umount lock in this case. We are running under the bdev
> > freeze mutex, so this is still serialised against freeze despite
> > only having a read lock on the sb->s_umount. Hence it should be safe
> > to execute in this manner, especially given that emergency thaw is a
> > rarely executed "get-out-of-jail" feature.
>
> This is correct as long as no one calls thaw_super directly, which
> is not the case currently.

Yeah, the idea of the first two patches is that they can be applied
to a current tree and backported and prevent the infinite loop or
deadlock. The problem of thaw_bdev/thaw_super is what the rest of
the patches are supposed to address.

Cheers,

Dave.
--
Dave Chinner
da...@fromorbit.com

Dave Chinner

unread,

Jun 20, 2010, 10:00:02 PM6/20/10

to

On Mon, Jun 14, 2010 at 11:20:11AM -0400, Christoph Hellwig wrote:

> On Thu, Jun 10, 2010 at 05:19:51PM +1000, Dave Chinner wrote:
> > The emergency thaw process uses iterate_super() which holds the
> > sb->s_umount lock in read mode. The current thaw_super() code takes
> > the sb->s_umount lock in write mode, hence leading to an instant
> > deadlock.
> >
> > Pass the emergency state into the thaw_bdev/thaw_super code to avoid
> > taking the s_umount lock in this case. We are running under the bdev
> > freeze mutex, so this is still serialised against freeze despite
> > only having a read lock on the sb->s_umount. Hence it should be safe
> > to execute in this manner, especially given that emergency thaw is a
> > rarely executed "get-out-of-jail" feature.
>
> This is correct as long as no one calls thaw_super directly, which
> is not the case currently.

This patch doesn't try to deal with the bdev/super mismatches; all
it does is prevent an obvious deadlock. Calling freeze/thaw_super
directly will serialise on the s_umount lock, calling
freeze/thaw_bdev() will serialise on the bdev freeze mutex, and if
we mix the two they'll serialise on the s_umount lock. So I think
with this patch serialisation will still occur correctly but avoid
the current deadlock.

I'll change the commit message to explain this better.

Cheers,

Dave.
--
Dave Chinner
da...@fromorbit.com

Christoph Hellwig

unread,

Jun 21, 2010, 3:50:01 AM6/21/10

to

On Mon, Jun 21, 2010 at 11:57:31AM +1000, Dave Chinner wrote:
> This patch doesn't try to deal with the bdev/super mismatches; all
> it does is prevent an obvious deadlock. Calling freeze/thaw_super
> directly will serialise on the s_umount lock, calling
> freeze/thaw_bdev() will serialise on the bdev freeze mutex, and if
> we mix the two they'll serialise on the s_umount lock. So I think
> with this patch serialisation will still occur correctly but avoid
> the current deadlock.
>
> I'll change the commit message to explain this better.

I don;t think the explanation alone is enough.

Right now thaw_super itself is only serialized by exclusive shared
s_umount. thaw_bdev it also serialized by bd_fsfreeze_mutex, but
there are callers of thaw_super that do not go through thaw_bdev, so
our locking is not enough here.