fix queue freeze and limit locking order v2


Christoph Hellwig

Jan 8, 2025, 4:25:26 AM
to Jens Axboe, Damien Le Moal, Ming Lei, Nilay Shroff, linux...@vger.kernel.org, linux...@lists.infradead.org, n...@other.debian.org, linux...@vger.kernel.org, usb-s...@lists.one-eyed-alien.net
Hi all,

this is my version of Damien's "Fix queue freeze and limit locking order".
A lot looks very similar, but it was done independently based on the
previous discussion.

Changes since v1:
- more comment typo fixing
- fix loop as well
- make the poll sysfs attr show method more accurate

Changes since RFC:
- fix a bizarre virtio_blk bisection snafu
- set BLK_FEAT_POLL a little less eagerly for blk-mq
- drop the loop patch just adding a comment
- improve various commit logs and comments

Diffstat:
block/blk-core.c | 17 ++++-
block/blk-integrity.c | 4 -
block/blk-mq.c | 17 -----
block/blk-settings.c | 27 +++++++-
block/blk-sysfs.c | 134 ++++++++++++++++++++---------------------
block/blk-zoned.c | 7 --
block/blk.h | 1
drivers/block/loop.c | 52 ++++++++++-----
drivers/block/nbd.c | 17 -----
drivers/block/virtio_blk.c | 4 -
drivers/nvme/host/core.c | 9 +-
drivers/scsi/sd.c | 17 +----
drivers/scsi/sr.c | 5 -
drivers/usb/storage/scsiglue.c | 5 -
include/linux/blkdev.h | 5 -
15 files changed, 164 insertions(+), 157 deletions(-)

Christoph Hellwig

Jan 8, 2025, 4:25:30 AM
to Jens Axboe, Damien Le Moal, Ming Lei, Nilay Shroff, linux...@vger.kernel.org, linux...@lists.infradead.org, n...@other.debian.org, linux...@vger.kernel.org, usb-s...@lists.one-eyed-alien.net
queue_limits_commit_update is the function that needs to operate on a
frozen queue, not queue_limits_start_update. Update the kerneldoc
comments to reflect that.
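
As an illustration only (a hypothetical caller, not part of this patch), the
convention the updated comments describe looks roughly like this:

static int example_set_logical_block_size(struct request_queue *q,
		unsigned int lbs)
{
	struct queue_limits lim;
	int err;

	lim = queue_limits_start_update(q);	/* takes q->limits_lock only */
	lim.logical_block_size = lbs;

	blk_mq_freeze_queue(q);			/* no outstanding I/O while */
	err = queue_limits_commit_update(q, &lim); /* committing the update */
	blk_mq_unfreeze_queue(q);

	return err;
}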

Signed-off-by: Christoph Hellwig <h...@lst.de>
Reviewed-by: Damien Le Moal <dle...@kernel.org>
Reviewed-by: Nilay Shroff <ni...@linux.ibm.com>
---
block/blk-settings.c | 3 ++-
include/linux/blkdev.h | 3 +--
2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/block/blk-settings.c b/block/blk-settings.c
index 8f09e33f41f6..89d8366fd43c 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -413,7 +413,8 @@ int blk_set_default_limits(struct queue_limits *lim)
* @lim: limits to apply
*
* Apply the limits in @lim that were obtained from queue_limits_start_update()
- * and updated by the caller to @q.
+ * and updated by the caller to @q. The caller must have frozen the queue or
+ * ensure that there are no outstanding I/Os by other means.
*
* Returns 0 if successful, else a negative error code.
*/
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 5d40af2ef971..e781d4e6f92d 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -944,8 +944,7 @@ static inline unsigned int blk_boundary_sectors_left(sector_t offset,
* the caller can modify. The caller must call queue_limits_commit_update()
* to finish the update.
*
- * Context: process context. The caller must have frozen the queue or ensured
- * that there is outstanding I/O by other means.
+ * Context: process context.
*/
static inline struct queue_limits
queue_limits_start_update(struct request_queue *q)
--
2.45.2

Christoph Hellwig

Jan 8, 2025, 4:25:34 AM
to Jens Axboe, Damien Le Moal, Ming Lei, Nilay Shroff, linux...@vger.kernel.org, linux...@lists.infradead.org, n...@other.debian.org, linux...@vger.kernel.org, usb-s...@lists.one-eyed-alien.net
Add a helper that freezes the queue, updates the queue limits and
unfreezes the queue and convert all open coded versions of that to the
new helper.

Signed-off-by: Christoph Hellwig <h...@lst.de>
Reviewed-by: Damien Le Moal <dle...@kernel.org>
Reviewed-by: Nilay Shroff <ni...@linux.ibm.com>
---
block/blk-integrity.c | 4 +---
block/blk-settings.c | 24 ++++++++++++++++++++++++
block/blk-zoned.c | 7 +------
drivers/block/virtio_blk.c | 4 +---
drivers/scsi/sd.c | 17 +++++------------
drivers/scsi/sr.c | 5 +----
include/linux/blkdev.h | 2 ++
7 files changed, 35 insertions(+), 28 deletions(-)

diff --git a/block/blk-integrity.c b/block/blk-integrity.c
index b180cac61a9d..013469faa5e7 100644
--- a/block/blk-integrity.c
+++ b/block/blk-integrity.c
@@ -218,9 +218,7 @@ static ssize_t flag_store(struct device *dev, const char *page, size_t count,
else
lim.integrity.flags |= flag;

- blk_mq_freeze_queue(q);
- err = queue_limits_commit_update(q, &lim);
- blk_mq_unfreeze_queue(q);
+ err = queue_limits_commit_update_frozen(q, &lim);
if (err)
return err;
return count;
diff --git a/block/blk-settings.c b/block/blk-settings.c
index 89d8366fd43c..7c099d686dd8 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -444,6 +444,30 @@ int queue_limits_commit_update(struct request_queue *q,
}
EXPORT_SYMBOL_GPL(queue_limits_commit_update);

+/**
+ * queue_limits_commit_update_frozen - commit an atomic update of queue limits
+ * @q: queue to update
+ * @lim: limits to apply
+ *
+ * Apply the limits in @lim that were obtained from queue_limits_start_update()
+ * and updated by the caller to @q. Freezes the queue before the update and
+ * unfreezes it after.
+ *
+ * Returns 0 if successful, else a negative error code.
+ */
+int queue_limits_commit_update_frozen(struct request_queue *q,
+ struct queue_limits *lim)
+{
+ int ret;
+
+ blk_mq_freeze_queue(q);
+ ret = queue_limits_commit_update(q, lim);
+ blk_mq_unfreeze_queue(q);
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(queue_limits_commit_update_frozen);
+
/**
* queue_limits_set - apply queue limits to queue
* @q: queue to update
diff --git a/block/blk-zoned.c b/block/blk-zoned.c
index 4b0be40a8ea7..9d08a54c201e 100644
--- a/block/blk-zoned.c
+++ b/block/blk-zoned.c
@@ -1444,7 +1444,6 @@ static int disk_update_zone_resources(struct gendisk *disk,
unsigned int nr_seq_zones, nr_conv_zones;
unsigned int pool_size;
struct queue_limits lim;
- int ret;

disk->nr_zones = args->nr_zones;
disk->zone_capacity = args->zone_capacity;
@@ -1495,11 +1494,7 @@ static int disk_update_zone_resources(struct gendisk *disk,
}

commit:
- blk_mq_freeze_queue(q);
- ret = queue_limits_commit_update(q, &lim);
- blk_mq_unfreeze_queue(q);
-
- return ret;
+ return queue_limits_commit_update_frozen(q, &lim);
}

static int blk_revalidate_conv_zone(struct blk_zone *zone, unsigned int idx,
diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 71a7ffeafb32..bbaa26b523b8 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -1105,9 +1105,7 @@ cache_type_store(struct device *dev, struct device_attribute *attr,
lim.features |= BLK_FEAT_WRITE_CACHE;
else
lim.features &= ~BLK_FEAT_WRITE_CACHE;
- blk_mq_freeze_queue(disk->queue);
- i = queue_limits_commit_update(disk->queue, &lim);
- blk_mq_unfreeze_queue(disk->queue);
+ i = queue_limits_commit_update_frozen(disk->queue, &lim);
if (i)
return i;
return count;
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 8947dab132d7..af62a8ed8620 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -177,9 +177,8 @@ cache_type_store(struct device *dev, struct device_attribute *attr,

lim = queue_limits_start_update(sdkp->disk->queue);
sd_set_flush_flag(sdkp, &lim);
- blk_mq_freeze_queue(sdkp->disk->queue);
- ret = queue_limits_commit_update(sdkp->disk->queue, &lim);
- blk_mq_unfreeze_queue(sdkp->disk->queue);
+ ret = queue_limits_commit_update_frozen(sdkp->disk->queue,
+ &lim);
if (ret)
return ret;
return count;
@@ -483,9 +482,7 @@ provisioning_mode_store(struct device *dev, struct device_attribute *attr,

lim = queue_limits_start_update(sdkp->disk->queue);
sd_config_discard(sdkp, &lim, mode);
- blk_mq_freeze_queue(sdkp->disk->queue);
- err = queue_limits_commit_update(sdkp->disk->queue, &lim);
- blk_mq_unfreeze_queue(sdkp->disk->queue);
+ err = queue_limits_commit_update_frozen(sdkp->disk->queue, &lim);
if (err)
return err;
return count;
@@ -594,9 +591,7 @@ max_write_same_blocks_store(struct device *dev, struct device_attribute *attr,

lim = queue_limits_start_update(sdkp->disk->queue);
sd_config_write_same(sdkp, &lim);
- blk_mq_freeze_queue(sdkp->disk->queue);
- err = queue_limits_commit_update(sdkp->disk->queue, &lim);
- blk_mq_unfreeze_queue(sdkp->disk->queue);
+ err = queue_limits_commit_update_frozen(sdkp->disk->queue, &lim);
if (err)
return err;
return count;
@@ -3803,9 +3798,7 @@ static int sd_revalidate_disk(struct gendisk *disk)
sd_config_write_same(sdkp, &lim);
kfree(buffer);

- blk_mq_freeze_queue(sdkp->disk->queue);
- err = queue_limits_commit_update(sdkp->disk->queue, &lim);
- blk_mq_unfreeze_queue(sdkp->disk->queue);
+ err = queue_limits_commit_update_frozen(sdkp->disk->queue, &lim);
if (err)
return err;

diff --git a/drivers/scsi/sr.c b/drivers/scsi/sr.c
index 198bec87bb8e..b17796d5ee66 100644
--- a/drivers/scsi/sr.c
+++ b/drivers/scsi/sr.c
@@ -797,10 +797,7 @@ static int get_sectorsize(struct scsi_cd *cd)

lim = queue_limits_start_update(q);
lim.logical_block_size = sector_size;
- blk_mq_freeze_queue(q);
- err = queue_limits_commit_update(q, &lim);
- blk_mq_unfreeze_queue(q);
- return err;
+ return queue_limits_commit_update_frozen(q, &lim);
}

static int get_capabilities(struct scsi_cd *cd)
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index e781d4e6f92d..13d353351c37 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -952,6 +952,8 @@ queue_limits_start_update(struct request_queue *q)
mutex_lock(&q->limits_lock);
return q->limits;
}
+int queue_limits_commit_update_frozen(struct request_queue *q,
+ struct queue_limits *lim);
int queue_limits_commit_update(struct request_queue *q,
struct queue_limits *lim);
int queue_limits_set(struct request_queue *q, struct queue_limits *lim);
--
2.45.2

Christoph Hellwig

Jan 8, 2025, 4:25:37 AM
to Jens Axboe, Damien Le Moal, Ming Lei, Nilay Shroff, linux...@vger.kernel.org, linux...@lists.infradead.org, n...@other.debian.org, linux...@vger.kernel.org, usb-s...@lists.one-eyed-alien.net
When __blk_mq_update_nr_hw_queues changes the number of tag sets, it
might have to disable poll queues. Currently it does so by adjusting
the BLK_FEAT_POLL, which is a bit against the intent of features that
describe hardware / driver capabilities, but more importantly causes
nasty lock order problems with the broadly held freeze when updating the
number of hardware queues and the limits lock. Fix this by leaving
BLK_FEAT_POLL alone, and instead check for the number of poll queues in
the bio submission and poll handlers. While this adds extra work to the
fast path, the variables are in cache lines used by these operations
anyway, so it should be cheap enough.

Fixes: 8023e144f9d6 ("block: move the poll flag to queue_limits")
Signed-off-by: Christoph Hellwig <h...@lst.de>
---
block/blk-core.c | 17 ++++++++++++++---
block/blk-mq.c | 17 +----------------
block/blk-sysfs.c | 6 +++++-
block/blk.h | 1 +
4 files changed, 21 insertions(+), 20 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 666efe8fa202..4fb495d25c85 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -753,6 +753,18 @@ static blk_status_t blk_validate_atomic_write_op_size(struct request_queue *q,
return BLK_STS_OK;
}

+inline bool bdev_can_poll(struct block_device *bdev)
+{
+ struct request_queue *q = bdev_get_queue(bdev);
+
+ if (!(q->limits.features & BLK_FEAT_POLL))
+ return false;
+
+ if (queue_is_mq(q))
+ return q->tag_set->map[HCTX_TYPE_POLL].nr_queues;
+ return true;
+}
+
/**
* submit_bio_noacct - re-submit a bio to the block device layer for I/O
* @bio: The bio describing the location in memory and on the device.
@@ -805,8 +817,7 @@ void submit_bio_noacct(struct bio *bio)
}
}

- if (!(q->limits.features & BLK_FEAT_POLL) &&
- (bio->bi_opf & REQ_POLLED)) {
+ if ((bio->bi_opf & REQ_POLLED) && !bdev_can_poll(bdev)) {
bio_clear_polled(bio);
goto not_supported;
}
@@ -935,7 +946,7 @@ int bio_poll(struct bio *bio, struct io_comp_batch *iob, unsigned int flags)
return 0;

q = bdev_get_queue(bdev);
- if (cookie == BLK_QC_T_NONE || !(q->limits.features & BLK_FEAT_POLL))
+ if (cookie == BLK_QC_T_NONE || !bdev_can_poll(bdev))
return 0;

blk_flush_plug(current->plug, false);
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 2e6132f778fd..f795d81b6b38 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -4320,12 +4320,6 @@ void blk_mq_release(struct request_queue *q)
blk_mq_sysfs_deinit(q);
}

-static bool blk_mq_can_poll(struct blk_mq_tag_set *set)
-{
- return set->nr_maps > HCTX_TYPE_POLL &&
- set->map[HCTX_TYPE_POLL].nr_queues;
-}
-
struct request_queue *blk_mq_alloc_queue(struct blk_mq_tag_set *set,
struct queue_limits *lim, void *queuedata)
{
@@ -4336,7 +4330,7 @@ struct request_queue *blk_mq_alloc_queue(struct blk_mq_tag_set *set,
if (!lim)
lim = &default_lim;
lim->features |= BLK_FEAT_IO_STAT | BLK_FEAT_NOWAIT;
- if (blk_mq_can_poll(set))
+ if (set->nr_maps > HCTX_TYPE_POLL)
lim->features |= BLK_FEAT_POLL;

q = blk_alloc_queue(lim, set->numa_node);
@@ -5024,8 +5018,6 @@ static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set,
fallback:
blk_mq_update_queue_map(set);
list_for_each_entry(q, &set->tag_list, tag_set_list) {
- struct queue_limits lim;
-
blk_mq_realloc_hw_ctxs(set, q);

if (q->nr_hw_queues != set->nr_hw_queues) {
@@ -5039,13 +5031,6 @@ static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set,
set->nr_hw_queues = prev_nr_hw_queues;
goto fallback;
}
- lim = queue_limits_start_update(q);
- if (blk_mq_can_poll(set))
- lim.features |= BLK_FEAT_POLL;
- else
- lim.features &= ~BLK_FEAT_POLL;
- if (queue_limits_commit_update(q, &lim) < 0)
- pr_warn("updating the poll flag failed\n");
blk_mq_map_swqueue(q);
}

diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 767598e719ab..54488af6c001 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -245,10 +245,14 @@ static ssize_t queue_##_name##_show(struct gendisk *disk, char *page) \
!!(disk->queue->limits.features & _feature)); \
}

-QUEUE_SYSFS_FEATURE_SHOW(poll, BLK_FEAT_POLL);
QUEUE_SYSFS_FEATURE_SHOW(fua, BLK_FEAT_FUA);
QUEUE_SYSFS_FEATURE_SHOW(dax, BLK_FEAT_DAX);

+static ssize_t queue_poll_show(struct gendisk *disk, char *page)
+{
+ return sysfs_emit(page, "%u\n", bdev_can_poll(disk->part0));
+}
+
static ssize_t queue_zoned_show(struct gendisk *disk, char *page)
{
if (blk_queue_is_zoned(disk->queue))
diff --git a/block/blk.h b/block/blk.h
index 4904b86d5fec..c8fdbb22d483 100644
--- a/block/blk.h
+++ b/block/blk.h
@@ -589,6 +589,7 @@ int truncate_bdev_range(struct block_device *bdev, blk_mode_t mode,
long blkdev_ioctl(struct file *file, unsigned cmd, unsigned long arg);
int blkdev_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags);
long compat_blkdev_ioctl(struct file *file, unsigned cmd, unsigned long arg);
+bool bdev_can_poll(struct block_device *bdev);

extern const struct address_space_operations def_blk_aops;

--
2.45.2

Christoph Hellwig

Jan 8, 2025, 4:25:41 AM
to Jens Axboe, Damien Le Moal, Ming Lei, Nilay Shroff, linux...@vger.kernel.org, linux...@lists.infradead.org, n...@other.debian.org, linux...@vger.kernel.org, usb-s...@lists.one-eyed-alien.net
De-duplicate the code for updating queue limits by adding a store_limit
method that lets common code handle the actual queue limits update.

Note that this is a pure refactoring patch and does not address the
existing freeze vs limits lock order problem in the refactored code,
which will be addressed next.

Signed-off-by: Christoph Hellwig <h...@lst.de>
Reviewed-by: Damien Le Moal <dle...@kernel.org>
Reviewed-by: Nilay Shroff <ni...@linux.ibm.com>
---
block/blk-sysfs.c | 128 ++++++++++++++++++++++------------------------
1 file changed, 61 insertions(+), 67 deletions(-)

diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 54488af6c001..f36356cbde0b 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -24,6 +24,8 @@ struct queue_sysfs_entry {
struct attribute attr;
ssize_t (*show)(struct gendisk *disk, char *page);
ssize_t (*store)(struct gendisk *disk, const char *page, size_t count);
+ int (*store_limit)(struct gendisk *disk, const char *page,
+ size_t count, struct queue_limits *lim);
void (*load_module)(struct gendisk *disk, const char *page, size_t count);
};

@@ -153,13 +155,11 @@ QUEUE_SYSFS_SHOW_CONST(discard_zeroes_data, 0)
QUEUE_SYSFS_SHOW_CONST(write_same_max, 0)
QUEUE_SYSFS_SHOW_CONST(poll_delay, -1)

-static ssize_t queue_max_discard_sectors_store(struct gendisk *disk,
- const char *page, size_t count)
+static int queue_max_discard_sectors_store(struct gendisk *disk,
+ const char *page, size_t count, struct queue_limits *lim)
{
unsigned long max_discard_bytes;
- struct queue_limits lim;
ssize_t ret;
- int err;

ret = queue_var_store(&max_discard_bytes, page, count);
if (ret < 0)
@@ -171,38 +171,28 @@ static ssize_t queue_max_discard_sectors_store(struct gendisk *disk,
if ((max_discard_bytes >> SECTOR_SHIFT) > UINT_MAX)
return -EINVAL;

- lim = queue_limits_start_update(disk->queue);
- lim.max_user_discard_sectors = max_discard_bytes >> SECTOR_SHIFT;
- err = queue_limits_commit_update(disk->queue, &lim);
- if (err)
- return err;
- return ret;
+ lim->max_user_discard_sectors = max_discard_bytes >> SECTOR_SHIFT;
+ return 0;
}

-static ssize_t
-queue_max_sectors_store(struct gendisk *disk, const char *page, size_t count)
+static int
+queue_max_sectors_store(struct gendisk *disk, const char *page, size_t count,
+ struct queue_limits *lim)
{
unsigned long max_sectors_kb;
- struct queue_limits lim;
ssize_t ret;
- int err;

ret = queue_var_store(&max_sectors_kb, page, count);
if (ret < 0)
return ret;

- lim = queue_limits_start_update(disk->queue);
- lim.max_user_sectors = max_sectors_kb << 1;
- err = queue_limits_commit_update(disk->queue, &lim);
- if (err)
- return err;
- return ret;
+ lim->max_user_sectors = max_sectors_kb << 1;
+ return 0;
}

static ssize_t queue_feature_store(struct gendisk *disk, const char *page,
- size_t count, blk_features_t feature)
+ size_t count, struct queue_limits *lim, blk_features_t feature)
{
- struct queue_limits lim;
unsigned long val;
ssize_t ret;

@@ -210,15 +200,11 @@ static ssize_t queue_feature_store(struct gendisk *disk, const char *page,
if (ret < 0)
return ret;

- lim = queue_limits_start_update(disk->queue);
if (val)
- lim.features |= feature;
+ lim->features |= feature;
else
- lim.features &= ~feature;
- ret = queue_limits_commit_update(disk->queue, &lim);
- if (ret)
- return ret;
- return count;
+ lim->features &= ~feature;
+ return 0;
}

#define QUEUE_SYSFS_FEATURE(_name, _feature) \
@@ -227,10 +213,10 @@ static ssize_t queue_##_name##_show(struct gendisk *disk, char *page) \
return sysfs_emit(page, "%u\n", \
!!(disk->queue->limits.features & _feature)); \
} \
-static ssize_t queue_##_name##_store(struct gendisk *disk, \
- const char *page, size_t count) \
+static int queue_##_name##_store(struct gendisk *disk, \
+ const char *page, size_t count, struct queue_limits *lim) \
{ \
- return queue_feature_store(disk, page, count, _feature); \
+ return queue_feature_store(disk, page, count, lim, _feature); \
}

QUEUE_SYSFS_FEATURE(rotational, BLK_FEAT_ROTATIONAL)
@@ -270,10 +256,9 @@ static ssize_t queue_iostats_passthrough_show(struct gendisk *disk, char *page)
return queue_var_show(!!blk_queue_passthrough_stat(disk->queue), page);
}

-static ssize_t queue_iostats_passthrough_store(struct gendisk *disk,
- const char *page, size_t count)
+static int queue_iostats_passthrough_store(struct gendisk *disk,
+ const char *page, size_t count, struct queue_limits *lim)
{
- struct queue_limits lim;
unsigned long ios;
ssize_t ret;

@@ -281,18 +266,13 @@ static ssize_t queue_iostats_passthrough_store(struct gendisk *disk,
if (ret < 0)
return ret;

- lim = queue_limits_start_update(disk->queue);
if (ios)
- lim.flags |= BLK_FLAG_IOSTATS_PASSTHROUGH;
+ lim->flags |= BLK_FLAG_IOSTATS_PASSTHROUGH;
else
- lim.flags &= ~BLK_FLAG_IOSTATS_PASSTHROUGH;
-
- ret = queue_limits_commit_update(disk->queue, &lim);
- if (ret)
- return ret;
-
- return count;
+ lim->flags &= ~BLK_FLAG_IOSTATS_PASSTHROUGH;
+ return 0;
}
+
static ssize_t queue_nomerges_show(struct gendisk *disk, char *page)
{
return queue_var_show((blk_queue_nomerges(disk->queue) << 1) |
@@ -395,12 +375,10 @@ static ssize_t queue_wc_show(struct gendisk *disk, char *page)
return sysfs_emit(page, "write through\n");
}

-static ssize_t queue_wc_store(struct gendisk *disk, const char *page,
- size_t count)
+static int queue_wc_store(struct gendisk *disk, const char *page,
+ size_t count, struct queue_limits *lim)
{
- struct queue_limits lim;
bool disable;
- int err;

if (!strncmp(page, "write back", 10)) {
disable = false;
@@ -411,15 +389,11 @@ static ssize_t queue_wc_store(struct gendisk *disk, const char *page,
return -EINVAL;
}

- lim = queue_limits_start_update(disk->queue);
if (disable)
- lim.flags |= BLK_FLAG_WRITE_CACHE_DISABLED;
+ lim->flags |= BLK_FLAG_WRITE_CACHE_DISABLED;
else
- lim.flags &= ~BLK_FLAG_WRITE_CACHE_DISABLED;
- err = queue_limits_commit_update(disk->queue, &lim);
- if (err)
- return err;
- return count;
+ lim->flags &= ~BLK_FLAG_WRITE_CACHE_DISABLED;
+ return 0;
}

#define QUEUE_RO_ENTRY(_prefix, _name) \
@@ -435,6 +409,13 @@ static struct queue_sysfs_entry _prefix##_entry = { \
.store = _prefix##_store, \
};

+#define QUEUE_LIM_RW_ENTRY(_prefix, _name) \
+static struct queue_sysfs_entry _prefix##_entry = { \
+ .attr = { .name = _name, .mode = 0644 }, \
+ .show = _prefix##_show, \
+ .store_limit = _prefix##_store, \
+}
+
#define QUEUE_RW_LOAD_MODULE_ENTRY(_prefix, _name) \
static struct queue_sysfs_entry _prefix##_entry = { \
.attr = { .name = _name, .mode = 0644 }, \
@@ -445,7 +426,7 @@ static struct queue_sysfs_entry _prefix##_entry = { \

QUEUE_RW_ENTRY(queue_requests, "nr_requests");
QUEUE_RW_ENTRY(queue_ra, "read_ahead_kb");
-QUEUE_RW_ENTRY(queue_max_sectors, "max_sectors_kb");
+QUEUE_LIM_RW_ENTRY(queue_max_sectors, "max_sectors_kb");
QUEUE_RO_ENTRY(queue_max_hw_sectors, "max_hw_sectors_kb");
QUEUE_RO_ENTRY(queue_max_segments, "max_segments");
QUEUE_RO_ENTRY(queue_max_integrity_segments, "max_integrity_segments");
@@ -461,7 +442,7 @@ QUEUE_RO_ENTRY(queue_io_opt, "optimal_io_size");
QUEUE_RO_ENTRY(queue_max_discard_segments, "max_discard_segments");
QUEUE_RO_ENTRY(queue_discard_granularity, "discard_granularity");
QUEUE_RO_ENTRY(queue_max_hw_discard_sectors, "discard_max_hw_bytes");
-QUEUE_RW_ENTRY(queue_max_discard_sectors, "discard_max_bytes");
+QUEUE_LIM_RW_ENTRY(queue_max_discard_sectors, "discard_max_bytes");
QUEUE_RO_ENTRY(queue_discard_zeroes_data, "discard_zeroes_data");

QUEUE_RO_ENTRY(queue_atomic_write_max_sectors, "atomic_write_max_bytes");
@@ -481,11 +462,11 @@ QUEUE_RO_ENTRY(queue_max_open_zones, "max_open_zones");
QUEUE_RO_ENTRY(queue_max_active_zones, "max_active_zones");

QUEUE_RW_ENTRY(queue_nomerges, "nomerges");
-QUEUE_RW_ENTRY(queue_iostats_passthrough, "iostats_passthrough");
+QUEUE_LIM_RW_ENTRY(queue_iostats_passthrough, "iostats_passthrough");
QUEUE_RW_ENTRY(queue_rq_affinity, "rq_affinity");
QUEUE_RW_ENTRY(queue_poll, "io_poll");
QUEUE_RW_ENTRY(queue_poll_delay, "io_poll_delay");
-QUEUE_RW_ENTRY(queue_wc, "write_cache");
+QUEUE_LIM_RW_ENTRY(queue_wc, "write_cache");
QUEUE_RO_ENTRY(queue_fua, "fua");
QUEUE_RO_ENTRY(queue_dax, "dax");
QUEUE_RW_ENTRY(queue_io_timeout, "io_timeout");
@@ -498,10 +479,10 @@ static struct queue_sysfs_entry queue_hw_sector_size_entry = {
.show = queue_logical_block_size_show,
};

-QUEUE_RW_ENTRY(queue_rotational, "rotational");
-QUEUE_RW_ENTRY(queue_iostats, "iostats");
-QUEUE_RW_ENTRY(queue_add_random, "add_random");
-QUEUE_RW_ENTRY(queue_stable_writes, "stable_writes");
+QUEUE_LIM_RW_ENTRY(queue_rotational, "rotational");
+QUEUE_LIM_RW_ENTRY(queue_iostats, "iostats");
+QUEUE_LIM_RW_ENTRY(queue_add_random, "add_random");
+QUEUE_LIM_RW_ENTRY(queue_stable_writes, "stable_writes");

#ifdef CONFIG_BLK_WBT
static ssize_t queue_var_store64(s64 *var, const char *page)
@@ -699,7 +680,7 @@ queue_attr_store(struct kobject *kobj, struct attribute *attr,
struct request_queue *q = disk->queue;
ssize_t res;

- if (!entry->store)
+ if (!entry->store_limit && !entry->store)
return -EIO;

/*
@@ -710,11 +691,24 @@ queue_attr_store(struct kobject *kobj, struct attribute *attr,
if (entry->load_module)
entry->load_module(disk, page, length);

- blk_mq_freeze_queue(q);
mutex_lock(&q->sysfs_lock);
- res = entry->store(disk, page, length);
- mutex_unlock(&q->sysfs_lock);
+ blk_mq_freeze_queue(q);
+ if (entry->store_limit) {
+ struct queue_limits lim = queue_limits_start_update(q);
+
+ res = entry->store_limit(disk, page, length, &lim);
+ if (res < 0) {
+ queue_limits_cancel_update(q);
+ } else {
+ res = queue_limits_commit_update(q, &lim);
+ if (!res)
+ res = length;
+ }
+ } else {
+ res = entry->store(disk, page, length);
+ }
blk_mq_unfreeze_queue(q);
+ mutex_unlock(&q->sysfs_lock);
return res;
}

--
2.45.2

Christoph Hellwig

Jan 8, 2025, 4:25:45 AM
to Jens Axboe, Damien Le Moal, Ming Lei, Nilay Shroff, linux...@vger.kernel.org, linux...@lists.infradead.org, n...@other.debian.org, linux...@vger.kernel.org, usb-s...@lists.one-eyed-alien.net
queue_attr_store() always freezes a device queue before calling the
attribute store operation. For attributes that control queue limits, the
store operation will also lock the queue limits with a call to
queue_limits_start_update(). However, some drivers (e.g. SCSI sd) may
need to issue commands to a device to obtain limit values from the
hardware with the queue limits locked. This creates a potential ABBA
deadlock situation if a user attempts to modify a limit (thus freezing
the device queue) while the device driver starts a revalidation of the
device queue limits.
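
Schematically (hypothetical task names, simplified sketch):

  Task A: sysfs store (old order)     Task B: sd revalidation
  -------------------------------     -----------------------
  blk_mq_freeze_queue(q)              queue_limits_start_update(q)
  entry->store()                        (now holds q->limits_lock)
    queue_limits_start_update(q)      issue a SCSI command to the device
      (waits for B to release           (waits for A to unfreeze the
       q->limits_lock)                   queue before it can be issued)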

Avoid such a deadlock by not freezing the queue before calling the
->store_limit() method in struct queue_sysfs_entry and instead using the
queue_limits_commit_update_frozen helper to freeze the queue after taking
the limits lock.

(commit log adapted from a similar patch from Damien Le Moal)

Fixes: ff956a3be95b ("block: use queue_limits_commit_update in queue_discard_max_store")
Fixes: 0327ca9d53bf ("block: use queue_limits_commit_update in queue_max_sectors_store")
Signed-off-by: Christoph Hellwig <h...@lst.de>
Reviewed-by: Nilay Shroff <ni...@linux.ibm.com>
---
block/blk-sysfs.c | 18 ++++++++++--------
1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index f36356cbde0b..2de405cb5f10 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -691,22 +691,24 @@ queue_attr_store(struct kobject *kobj, struct attribute *attr,
if (entry->load_module)
entry->load_module(disk, page, length);

- mutex_lock(&q->sysfs_lock);
- blk_mq_freeze_queue(q);
if (entry->store_limit) {
struct queue_limits lim = queue_limits_start_update(q);

res = entry->store_limit(disk, page, length, &lim);
if (res < 0) {
queue_limits_cancel_update(q);
- } else {
- res = queue_limits_commit_update(q, &lim);
- if (!res)
- res = length;
+ return res;
}
- } else {
- res = entry->store(disk, page, length);
+
+ res = queue_limits_commit_update_frozen(q, &lim);
+ if (res)
+ return res;
+ return length;
}
+
+ mutex_lock(&q->sysfs_lock);
+ blk_mq_freeze_queue(q);
+ res = entry->store(disk, page, length);
blk_mq_unfreeze_queue(q);

Christoph Hellwig

Jan 8, 2025, 4:25:49 AM
to Jens Axboe, Damien Le Moal, Ming Lei, Nilay Shroff, linux...@vger.kernel.org, linux...@lists.infradead.org, n...@other.debian.org, linux...@vger.kernel.org, usb-s...@lists.one-eyed-alien.net
Match the locking order used by the core block code by only freezing
the queue after taking the limits lock.

Unlike most queue updates this does not use the
queue_limits_commit_update_frozen helper as the nvme driver wants the
queue frozen for more than just the limits update.
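
Distilled to a sketch (not the actual nvme code, which also updates other
state while the queue is frozen), the resulting order is:

	lim = queue_limits_start_update(q);	/* limits_lock first */
	/* ... fill in the new limits ... */

	blk_mq_freeze_queue(q);			/* freeze only afterwards */
	ret = queue_limits_commit_update(q, &lim);
	/* ... other updates that need the frozen queue ... */
	blk_mq_unfreeze_queue(q);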

Signed-off-by: Christoph Hellwig <h...@lst.de>
Reviewed-by: Damien Le Moal <dle...@kernel.org>
Reviewed-by: Nilay Shroff <ni...@linux.ibm.com>
---
drivers/nvme/host/core.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index c2250ddef5a2..1ccf17f6ea7f 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -2128,9 +2128,10 @@ static int nvme_update_ns_info_generic(struct nvme_ns *ns,
struct queue_limits lim;
int ret;

- blk_mq_freeze_queue(ns->disk->queue);
lim = queue_limits_start_update(ns->disk->queue);
nvme_set_ctrl_limits(ns->ctrl, &lim);
+
+ blk_mq_freeze_queue(ns->disk->queue);
ret = queue_limits_commit_update(ns->disk->queue, &lim);
set_disk_ro(ns->disk, nvme_ns_is_readonly(ns, info));
blk_mq_unfreeze_queue(ns->disk->queue);
@@ -2177,12 +2178,12 @@ static int nvme_update_ns_info_block(struct nvme_ns *ns,
goto out;
}

+ lim = queue_limits_start_update(ns->disk->queue);
+
blk_mq_freeze_queue(ns->disk->queue);
ns->head->lba_shift = id->lbaf[lbaf].ds;
ns->head->nuse = le64_to_cpu(id->nuse);
capacity = nvme_lba_to_sect(ns->head, le64_to_cpu(id->nsze));
-
- lim = queue_limits_start_update(ns->disk->queue);
nvme_set_ctrl_limits(ns->ctrl, &lim);
nvme_configure_metadata(ns->ctrl, ns->head, id, nvm, info);
nvme_set_chunk_sectors(ns, id, &lim);
@@ -2285,6 +2286,7 @@ static int nvme_update_ns_info(struct nvme_ns *ns, struct nvme_ns_info *info)
struct queue_limits *ns_lim = &ns->disk->queue->limits;
struct queue_limits lim;

+ lim = queue_limits_start_update(ns->head->disk->queue);
blk_mq_freeze_queue(ns->head->disk->queue);
/*
* queue_limits mixes values that are the hardware limitations
@@ -2301,7 +2303,6 @@ static int nvme_update_ns_info(struct nvme_ns *ns, struct nvme_ns_info *info)
* the splitting limits in to make sure we still obey possibly
* lower limitations of other controllers.
*/
- lim = queue_limits_start_update(ns->head->disk->queue);
lim.logical_block_size = ns_lim->logical_block_size;
lim.physical_block_size = ns_lim->physical_block_size;
lim.io_min = ns_lim->io_min;
--
2.45.2

Christoph Hellwig

Jan 8, 2025, 4:25:52 AM
to Jens Axboe, Damien Le Moal, Ming Lei, Nilay Shroff, linux...@vger.kernel.org, linux...@lists.infradead.org, n...@other.debian.org, linux...@vger.kernel.org, usb-s...@lists.one-eyed-alien.net
Match the locking order used by the core block code by only freezing
the queue after taking the limits lock using the
queue_limits_commit_update_frozen helper.

This also removes the need for the separate __nbd_set_size helper,
so remove it.

Signed-off-by: Christoph Hellwig <h...@lst.de>
Reviewed-by: Damien Le Moal <dle...@kernel.org>
Reviewed-by: Nilay Shroff <ni...@linux.ibm.com>
---
drivers/block/nbd.c | 17 ++---------------
1 file changed, 2 insertions(+), 15 deletions(-)

diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index 259bd57fc529..efa05c3c06bf 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -327,8 +327,7 @@ static void nbd_mark_nsock_dead(struct nbd_device *nbd, struct nbd_sock *nsock,
nsock->sent = 0;
}

-static int __nbd_set_size(struct nbd_device *nbd, loff_t bytesize,
- loff_t blksize)
+static int nbd_set_size(struct nbd_device *nbd, loff_t bytesize, loff_t blksize)
{
struct queue_limits lim;
int error;
@@ -368,7 +367,7 @@ static int __nbd_set_size(struct nbd_device *nbd, loff_t bytesize,

lim.logical_block_size = blksize;
lim.physical_block_size = blksize;
- error = queue_limits_commit_update(nbd->disk->queue, &lim);
+ error = queue_limits_commit_update_frozen(nbd->disk->queue, &lim);
if (error)
return error;

@@ -379,18 +378,6 @@ static int __nbd_set_size(struct nbd_device *nbd, loff_t bytesize,
return 0;
}

-static int nbd_set_size(struct nbd_device *nbd, loff_t bytesize,
- loff_t blksize)
-{
- int error;
-
- blk_mq_freeze_queue(nbd->disk->queue);
- error = __nbd_set_size(nbd, bytesize, blksize);
- blk_mq_unfreeze_queue(nbd->disk->queue);
-
- return error;
-}
-
static void nbd_complete_rq(struct request *req)
{
struct nbd_cmd *cmd = blk_mq_rq_to_pdu(req);
--
2.45.2

Christoph Hellwig

Jan 8, 2025, 4:25:58 AM
to Jens Axboe, Damien Le Moal, Ming Lei, Nilay Shroff, linux...@vger.kernel.org, linux...@lists.infradead.org, n...@other.debian.org, linux...@vger.kernel.org, usb-s...@lists.one-eyed-alien.net
Match the locking order used by the core block code by only freezing
the queue after taking the limits lock using the
queue_limits_commit_update_frozen helper.

Signed-off-by: Christoph Hellwig <h...@lst.de>
Reviewed-by: Damien Le Moal <dle...@kernel.org>
Reviewed-by: Nilay Shroff <ni...@linux.ibm.com>
---
drivers/usb/storage/scsiglue.c | 5 +----
1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/usb/storage/scsiglue.c b/drivers/usb/storage/scsiglue.c
index 8c8b5e6041cc..dc98ceecb724 100644
--- a/drivers/usb/storage/scsiglue.c
+++ b/drivers/usb/storage/scsiglue.c
@@ -592,12 +592,9 @@ static ssize_t max_sectors_store(struct device *dev, struct device_attribute *at
if (sscanf(buf, "%hu", &ms) <= 0)
return -EINVAL;

- blk_mq_freeze_queue(sdev->request_queue);
lim = queue_limits_start_update(sdev->request_queue);
lim.max_hw_sectors = ms;
- ret = queue_limits_commit_update(sdev->request_queue, &lim);
- blk_mq_unfreeze_queue(sdev->request_queue);
-
+ ret = queue_limits_commit_update_frozen(sdev->request_queue, &lim);
if (ret)
return ret;
return count;
--
2.45.2

Christoph Hellwig

Jan 8, 2025, 4:26:06 AM
to Jens Axboe, Damien Le Moal, Ming Lei, Nilay Shroff, linux...@vger.kernel.org, linux...@lists.infradead.org, n...@other.debian.org, linux...@vger.kernel.org, usb-s...@lists.one-eyed-alien.net
Replace loop_reconfigure_limits with a slightly less encompassing
loop_update_limits that expects the caller to acquire and commit the
queue limits to prepare for sorting out the freeze vs limits lock
ordering.

Signed-off-by: Christoph Hellwig <h...@lst.de>
---
drivers/block/loop.c | 36 ++++++++++++++++++++----------------
1 file changed, 20 insertions(+), 16 deletions(-)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 836a53eef4b4..560d6d5879d6 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -977,12 +977,12 @@ static unsigned int loop_default_blocksize(struct loop_device *lo,
return SECTOR_SIZE;
}

-static int loop_reconfigure_limits(struct loop_device *lo, unsigned int bsize)
+static void loop_update_limits(struct loop_device *lo, struct queue_limits *lim,
+ unsigned int bsize)
{
struct file *file = lo->lo_backing_file;
struct inode *inode = file->f_mapping->host;
struct block_device *backing_bdev = NULL;
- struct queue_limits lim;
u32 granularity = 0, max_discard_sectors = 0;

if (S_ISBLK(inode->i_mode))
@@ -995,22 +995,20 @@ static int loop_reconfigure_limits(struct loop_device *lo, unsigned int bsize)

loop_get_discard_config(lo, &granularity, &max_discard_sectors);

- lim = queue_limits_start_update(lo->lo_queue);
- lim.logical_block_size = bsize;
- lim.physical_block_size = bsize;
- lim.io_min = bsize;
- lim.features &= ~(BLK_FEAT_WRITE_CACHE | BLK_FEAT_ROTATIONAL);
+ lim->logical_block_size = bsize;
+ lim->physical_block_size = bsize;
+ lim->io_min = bsize;
+ lim->features &= ~(BLK_FEAT_WRITE_CACHE | BLK_FEAT_ROTATIONAL);
if (file->f_op->fsync && !(lo->lo_flags & LO_FLAGS_READ_ONLY))
- lim.features |= BLK_FEAT_WRITE_CACHE;
+ lim->features |= BLK_FEAT_WRITE_CACHE;
if (backing_bdev && !bdev_nonrot(backing_bdev))
- lim.features |= BLK_FEAT_ROTATIONAL;
- lim.max_hw_discard_sectors = max_discard_sectors;
- lim.max_write_zeroes_sectors = max_discard_sectors;
+ lim->features |= BLK_FEAT_ROTATIONAL;
+ lim->max_hw_discard_sectors = max_discard_sectors;
+ lim->max_write_zeroes_sectors = max_discard_sectors;
if (max_discard_sectors)
- lim.discard_granularity = granularity;
+ lim->discard_granularity = granularity;
else
- lim.discard_granularity = 0;
- return queue_limits_commit_update(lo->lo_queue, &lim);
+ lim->discard_granularity = 0;
}

static int loop_configure(struct loop_device *lo, blk_mode_t mode,
@@ -1019,6 +1017,7 @@ static int loop_configure(struct loop_device *lo, blk_mode_t mode,
{
struct file *file = fget(config->fd);
struct address_space *mapping;
+ struct queue_limits lim;
int error;
loff_t size;
bool partscan;
@@ -1090,7 +1089,9 @@ static int loop_configure(struct loop_device *lo, blk_mode_t mode,
lo->old_gfp_mask = mapping_gfp_mask(mapping);
mapping_set_gfp_mask(mapping, lo->old_gfp_mask & ~(__GFP_IO|__GFP_FS));

- error = loop_reconfigure_limits(lo, config->block_size);
+ lim = queue_limits_start_update(lo->lo_queue);
+ loop_update_limits(lo, &lim, config->block_size);
+ error = queue_limits_commit_update(lo->lo_queue, &lim);
if (error)
goto out_unlock;

@@ -1458,6 +1459,7 @@ static int loop_set_dio(struct loop_device *lo, unsigned long arg)

static int loop_set_block_size(struct loop_device *lo, unsigned long arg)
{
+ struct queue_limits lim;
int err = 0;

if (lo->lo_state != Lo_bound)
@@ -1470,7 +1472,9 @@ static int loop_set_block_size(struct loop_device *lo, unsigned long arg)
invalidate_bdev(lo->lo_device);

blk_mq_freeze_queue(lo->lo_queue);
- err = loop_reconfigure_limits(lo, arg);
+ lim = queue_limits_start_update(lo->lo_queue);
+ loop_update_limits(lo, &lim, arg);
+ err = queue_limits_commit_update(lo->lo_queue, &lim);
loop_update_dio(lo);
blk_mq_unfreeze_queue(lo->lo_queue);

--
2.45.2

Christoph Hellwig

Jan 8, 2025, 4:26:08 AM
to Jens Axboe, Damien Le Moal, Ming Lei, Nilay Shroff, linux...@vger.kernel.org, linux...@lists.infradead.org, n...@other.debian.org, linux...@vger.kernel.org, usb-s...@lists.one-eyed-alien.net
Match the locking order used by the core block code by only freezing
the queue after taking the limits lock using the
queue_limits_commit_update_frozen helper and document the callers that
do not freeze the queue at all.

Signed-off-by: Christoph Hellwig <h...@lst.de>
---
drivers/block/loop.c | 18 ++++++++++++++++--
1 file changed, 16 insertions(+), 2 deletions(-)

diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 560d6d5879d6..15e486baa223 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -311,6 +311,13 @@ static void loop_clear_limits(struct loop_device *lo, int mode)
lim.discard_granularity = 0;
}

+ /*
+ * XXX: this updates the queue limits without freezing the queue, which
+ * is against the locking protocol and dangerous. But we can't just
+ * freeze the queue as we're inside the ->queue_rq method here. So this
+ * should move out into a workqueue unless we get the file operations to
+ * advertise if they support specific fallocate operations.
+ */
queue_limits_commit_update(lo->lo_queue, &lim);
}

@@ -1091,6 +1098,7 @@ static int loop_configure(struct loop_device *lo, blk_mode_t mode,

lim = queue_limits_start_update(lo->lo_queue);
loop_update_limits(lo, &lim, config->block_size);
+ /* No need to freeze the queue as the device isn't bound yet. */
error = queue_limits_commit_update(lo->lo_queue, &lim);
if (error)
goto out_unlock;
@@ -1151,7 +1159,12 @@ static void __loop_clr_fd(struct loop_device *lo)
lo->lo_sizelimit = 0;
memset(lo->lo_file_name, 0, LO_NAME_SIZE);

- /* reset the block size to the default */
+ /*
+ * Reset the block size to the default.
+ *
+ * No queue freezing needed because this is called from the final
+ * ->release call only, so there can't be any outstanding I/O.
+ */
lim = queue_limits_start_update(lo->lo_queue);
lim.logical_block_size = SECTOR_SIZE;
lim.physical_block_size = SECTOR_SIZE;
@@ -1471,9 +1484,10 @@ static int loop_set_block_size(struct loop_device *lo, unsigned long arg)
sync_blockdev(lo->lo_device);
invalidate_bdev(lo->lo_device);

- blk_mq_freeze_queue(lo->lo_queue);
lim = queue_limits_start_update(lo->lo_queue);
loop_update_limits(lo, &lim, arg);
+
+ blk_mq_freeze_queue(lo->lo_queue);

Damien Le Moal

Jan 8, 2025, 5:18:36 AM
to Christoph Hellwig, Jens Axboe, Ming Lei, Nilay Shroff, linux...@vger.kernel.org, linux...@lists.infradead.org, n...@other.debian.org, linux...@vger.kernel.org, usb-s...@lists.one-eyed-alien.net
On 1/8/25 6:25 PM, Christoph Hellwig wrote:
> When __blk_mq_update_nr_hw_queues changes the number of tag sets, it
> might have to disable poll queues. Currently it does so by adjusting
> the BLK_FEAT_POLL, which is a bit against the intent of features that
> describe hardware / driver capabilities, but more importantly causes
> nasty lock order problems with the broadly held freeze when updating the
> number of hardware queues and the limits lock. Fix this by leaving
> BLK_FEAT_POLL alone, and instead check for the number of poll queues in
> the bio submission and poll handlers. While this adds extra work to the
> fast path, the variables are in cache lines used by these operations
> anyway, so it should be cheap enough.
>
> Fixes: 8023e144f9d6 ("block: move the poll flag to queue_limits")
> Signed-off-by: Christoph Hellwig <h...@lst.de>

Looks OK to me.

Reviewed-by: Damien Le Moal <dle...@kernel.org>

--
Damien Le Moal
Western Digital Research

Damien Le Moal

Jan 8, 2025, 5:19:06 AM
to Christoph Hellwig, Jens Axboe, Ming Lei, Nilay Shroff, linux...@vger.kernel.org, linux...@lists.infradead.org, n...@other.debian.org, linux...@vger.kernel.org, usb-s...@lists.one-eyed-alien.net
On 1/8/25 6:25 PM, Christoph Hellwig wrote:
> queue_attr_store() always freezes a device queue before calling the
> attribute store operation. For attributes that control queue limits, the
> store operation will also lock the queue limits with a call to
> queue_limits_start_update(). However, some drivers (e.g. SCSI sd) may
> need to issue commands to a device to obtain limit values from the
> hardware with the queue limits locked. This creates a potential ABBA
> deadlock situation if a user attempts to modify a limit (thus freezing
> the device queue) while the device driver starts a revalidation of the
> device queue limits.
>
> Avoid such deadlock by not freezing the queue before calling the
> ->store_limit() method in struct queue_sysfs_entry and instead use the
> queue_limits_commit_update_frozen helper to freeze the queue after taking
> the limits lock.
>
> (commit log adapted from a similar patch from Damien Le Moal)
>
> Fixes: ff956a3be95b ("block: use queue_limits_commit_update in queue_discard_max_store")
> Fixes: 0327ca9d53bf ("block: use queue_limits_commit_update in queue_max_sectors_store")
> Signed-off-by: Christoph Hellwig <h...@lst.de>
> Reviewed-by: Nilay Shroff <ni...@linux.ibm.com>

Ming Lei

Jan 8, 2025, 5:19:49 AM
to Christoph Hellwig, Jens Axboe, Damien Le Moal, Nilay Shroff, linux...@vger.kernel.org, linux...@lists.infradead.org, n...@other.debian.org, linux...@vger.kernel.org, usb-s...@lists.one-eyed-alien.net
On Wed, Jan 08, 2025 at 10:24:58AM +0100, Christoph Hellwig wrote:
> queue_limits_commit_update is the function that needs to operate on a
> frozen queue, not queue_limits_start_update. Update the kerneldoc
> comments to reflect that.
>
> Signed-off-by: Christoph Hellwig <h...@lst.de>
> Reviewed-by: Damien Le Moal <dle...@kernel.org>
> Reviewed-by: Nilay Shroff <ni...@linux.ibm.com>

Reviewed-by: Ming Lei <ming...@redhat.com>

Thanks,
Ming

Damien Le Moal

Jan 8, 2025, 5:20:57 AM
to Christoph Hellwig, Jens Axboe, Ming Lei, Nilay Shroff, linux...@vger.kernel.org, linux...@lists.infradead.org, n...@other.debian.org, linux...@vger.kernel.org, usb-s...@lists.one-eyed-alien.net
On 1/8/25 6:25 PM, Christoph Hellwig wrote:
> Replace loop_reconfigure_limits with a slightly less encompassing
> loop_update_limits that expects the caller to acquire and commit the
> queue limits to prepare for sorting out the freeze vs limits lock
> ordering.
>
> Signed-off-by: Christoph Hellwig <h...@lst.de>

Reviewed-by: Damien Le Moal <dle...@kernel.org>

Ming Lei

Jan 8, 2025, 5:21:11 AM
to Christoph Hellwig, Jens Axboe, Damien Le Moal, Nilay Shroff, linux...@vger.kernel.org, linux...@lists.infradead.org, n...@other.debian.org, linux...@vger.kernel.org, usb-s...@lists.one-eyed-alien.net
On Wed, Jan 08, 2025 at 10:24:59AM +0100, Christoph Hellwig wrote:
> Add a helper that freezes the queue, updates the queue limits and
> unfreezes the queue and convert all open coded versions of that to the
> new helper.
>
> Signed-off-by: Christoph Hellwig <h...@lst.de>
> Reviewed-by: Damien Le Moal <dle...@kernel.org>
> Reviewed-by: Nilay Shroff <ni...@linux.ibm.com>

Damien Le Moal

Jan 8, 2025, 5:21:15 AM
to Christoph Hellwig, Jens Axboe, Ming Lei, Nilay Shroff, linux...@vger.kernel.org, linux...@lists.infradead.org, n...@other.debian.org, linux...@vger.kernel.org, usb-s...@lists.one-eyed-alien.net
On 1/8/25 6:25 PM, Christoph Hellwig wrote:
> Match the locking order used by the core block code by only freezing
> the queue after taking the limits lock using the
> queue_limits_commit_update_frozen helper and document the callers that
> do not freeze the queue at all.
>
> Signed-off-by: Christoph Hellwig <h...@lst.de>

Reviewed-by: Damien Le Moal <dle...@kernel.org>

Ming Lei

Jan 8, 2025, 5:31:33 AM
to Christoph Hellwig, Jens Axboe, Damien Le Moal, Nilay Shroff, linux...@vger.kernel.org, linux...@lists.infradead.org, n...@other.debian.org, linux...@vger.kernel.org, usb-s...@lists.one-eyed-alien.net
On Wed, Jan 08, 2025 at 10:25:00AM +0100, Christoph Hellwig wrote:
> When __blk_mq_update_nr_hw_queues changes the number of tag sets, it
> might have to disable poll queues. Currently it does so by adjusting
> the BLK_FEAT_POLL, which is a bit against the intent of features that
> describe hardware / driver capabilities, but more importantly causes
> nasty lock order problems with the broadly held freeze when updating the
> number of hardware queues and the limits lock. Fix this by leaving
> BLK_FEAT_POLL alone, and instead check for the number of poll queues in
> the bio submission and poll handlers. While this adds extra work to the
> fast path, the variables are in cache lines used by these operations
> anyway, so it should be cheap enough.
>
> Fixes: 8023e144f9d6 ("block: move the poll flag to queue_limits")
> Signed-off-by: Christoph Hellwig <h...@lst.de>
> ---
...

> /**
> * submit_bio_noacct - re-submit a bio to the block device layer for I/O
> * @bio: The bio describing the location in memory and on the device.
> @@ -805,8 +817,7 @@ void submit_bio_noacct(struct bio *bio)
> }
> }
>
> - if (!(q->limits.features & BLK_FEAT_POLL) &&
> - (bio->bi_opf & REQ_POLLED)) {
> + if ((bio->bi_opf & REQ_POLLED) && !bdev_can_poll(bdev)) {

submit_bio_noacct() is called without grabbing .q_usage_counter,
so tagset may be freed now, then use-after-free on q->tag_set?


Thanks,
Ming

Ming Lei

Jan 8, 2025, 5:33:51 AM
to Christoph Hellwig, Jens Axboe, Damien Le Moal, Nilay Shroff, linux...@vger.kernel.org, linux...@lists.infradead.org, n...@other.debian.org, linux...@vger.kernel.org, usb-s...@lists.one-eyed-alien.net
On Wed, Jan 08, 2025 at 10:25:01AM +0100, Christoph Hellwig wrote:
> De-duplicate the code for updating queue limits by adding a store_limit
> method that allows having common code handle the actual queue limits
> update.
>
> Note that this is a pure refactoring patch and does not address the
> existing freeze vs limits lock order problem in the refactored code,
> which will be addressed next.
>
> Signed-off-by: Christoph Hellwig <h...@lst.de>
> Reviewed-by: Damien Le Moal <dle...@kernel.org>
> Reviewed-by: Nilay Shroff <ni...@linux.ibm.com>

Ming Lei

Jan 8, 2025, 5:38:33 AM
to Christoph Hellwig, Jens Axboe, Damien Le Moal, Nilay Shroff, linux...@vger.kernel.org, linux...@lists.infradead.org, n...@other.debian.org, linux...@vger.kernel.org, usb-s...@lists.one-eyed-alien.net
Looks fine, but now ->store_limit() is called without holding
->sysfs_lock, maybe it should be documented.

Reviewed-by: Ming Lei <ming...@redhat.com>


thanks,
Ming

Ming Lei

Jan 8, 2025, 5:39:34 AM
to Christoph Hellwig, Jens Axboe, Damien Le Moal, Nilay Shroff, linux...@vger.kernel.org, linux...@lists.infradead.org, n...@other.debian.org, linux...@vger.kernel.org, usb-s...@lists.one-eyed-alien.net
On Wed, Jan 08, 2025 at 10:25:03AM +0100, Christoph Hellwig wrote:
> Match the locking order used by the core block code by only freezing
> the queue after taking the limits lock.
>
> Unlike most queue updates this does not use the
> queue_limits_commit_update_frozen helper as the nvme driver wants the
> queue frozen for more than just the limits update.
>
> Signed-off-by: Christoph Hellwig <h...@lst.de>
> Reviewed-by: Damien Le Moal <dle...@kernel.org>
> Reviewed-by: Nilay Shroff <ni...@linux.ibm.com>

Ming Lei

unread,
Jan 8, 2025, 5:40:35 AMJan 8
to Christoph Hellwig, Jens Axboe, Damien Le Moal, Nilay Shroff, linux...@vger.kernel.org, linux...@lists.infradead.org, n...@other.debian.org, linux...@vger.kernel.org, usb-s...@lists.one-eyed-alien.net
On Wed, Jan 08, 2025 at 10:25:04AM +0100, Christoph Hellwig wrote:
> Match the locking order used by the core block code by only freezing
> the queue after taking the limits lock using the
> queue_limits_commit_update_frozen helper.
>
> This also removes the need for the separate __nbd_set_size helper,
> so remove it.
>
> Signed-off-by: Christoph Hellwig <h...@lst.de>
> Reviewed-by: Damien Le Moal <dle...@kernel.org>
> Reviewed-by: Nilay Shroff <ni...@linux.ibm.com>

Ming Lei

Jan 8, 2025, 5:41:22 AM
to Christoph Hellwig, Jens Axboe, Damien Le Moal, Nilay Shroff, linux...@vger.kernel.org, linux...@lists.infradead.org, n...@other.debian.org, linux...@vger.kernel.org, usb-s...@lists.one-eyed-alien.net
On Wed, Jan 08, 2025 at 10:25:05AM +0100, Christoph Hellwig wrote:
> Match the locking order used by the core block code by only freezing
> the queue after taking the limits lock using the
> queue_limits_commit_update_frozen helper.
>
> Signed-off-by: Christoph Hellwig <h...@lst.de>
> Reviewed-by: Damien Le Moal <dle...@kernel.org>
> Reviewed-by: Nilay Shroff <ni...@linux.ibm.com>

Ming Lei

Jan 8, 2025, 5:43:13 AM
to Christoph Hellwig, Jens Axboe, Damien Le Moal, Nilay Shroff, linux...@vger.kernel.org, linux...@lists.infradead.org, n...@other.debian.org, linux...@vger.kernel.org, usb-s...@lists.one-eyed-alien.net
On Wed, Jan 08, 2025 at 10:25:06AM +0100, Christoph Hellwig wrote:
> Replace loop_reconfigure_limits with a slightly less encompassing
> loop_update_limits that expects the caller to acquire and commit the
> queue limits to prepare for sorting out the freeze vs limits lock
> ordering.
>
> Signed-off-by: Christoph Hellwig <h...@lst.de>

Ming Lei

unread,
Jan 8, 2025, 5:44:35 AMJan 8
to Christoph Hellwig, Jens Axboe, Damien Le Moal, Nilay Shroff, linux...@vger.kernel.org, linux...@lists.infradead.org, n...@other.debian.org, linux...@vger.kernel.org, usb-s...@lists.one-eyed-alien.net
On Wed, Jan 08, 2025 at 10:25:07AM +0100, Christoph Hellwig wrote:
> Match the locking order used by the core block code by only freezing
> the queue after taking the limits lock using the
> queue_limits_commit_update_frozen helper and document the callers that
> do not freeze the queue at all.
>
> Signed-off-by: Christoph Hellwig <h...@lst.de>
> ---

Johannes Thumshirn

Jan 8, 2025, 5:51:24 AM
to hch, Jens Axboe, Damien Le Moal, Ming Lei, Nilay Shroff, linux...@vger.kernel.org, linux...@lists.infradead.org, n...@other.debian.org, linux...@vger.kernel.org, usb-s...@lists.one-eyed-alien.net
For the series,
Reviewed-by: Johannes Thumshirn <johannes....@wdc.com>

Nilay Shroff

Jan 8, 2025, 5:54:47 AM
to Christoph Hellwig, Jens Axboe, Damien Le Moal, Ming Lei, linux...@vger.kernel.org, linux...@lists.infradead.org, n...@other.debian.org, linux...@vger.kernel.org, usb-s...@lists.one-eyed-alien.net


On 1/8/25 2:55 PM, Christoph Hellwig wrote:
> When __blk_mq_update_nr_hw_queues changes the number of tag sets, it
> might have to disable poll queues. Currently it does so by adjusting
> the BLK_FEAT_POLL, which is a bit against the intent of features that
> describe hardware / driver capabilities, but more importantly causes
> nasty lock order problems with the broadly held freeze when updating the
> number of hardware queues and the limits lock. Fix this by leaving
> BLK_FEAT_POLL alone, and instead check for the number of poll queues in
> the bio submission and poll handlers. While this adds extra work to the
> fast path, the variables are in cache lines used by these operations
> anyway, so it should be cheap enough.
>
> Fixes: 8023e144f9d6 ("block: move the poll flag to queue_limits")
> Signed-off-by: Christoph Hellwig <h...@lst.de>

Looks good to me:

Reviewed-by: Nilay Shroff <ni...@linux.ibm.com>


Nilay Shroff

Jan 8, 2025, 5:57:01 AM
to Christoph Hellwig, Jens Axboe, Damien Le Moal, Ming Lei, linux...@vger.kernel.org, linux...@lists.infradead.org, n...@other.debian.org, linux...@vger.kernel.org, usb-s...@lists.one-eyed-alien.net


On 1/8/25 2:55 PM, Christoph Hellwig wrote:
> Replace loop_reconfigure_limits with a slightly less encompassing
> loop_update_limits that expects the caller to acquire and commit the
> queue limits to prepare for sorting out the freeze vs limits lock
> ordering.

Nilay Shroff

Jan 8, 2025, 5:58:04 AM
to Christoph Hellwig, Jens Axboe, Damien Le Moal, Ming Lei, linux...@vger.kernel.org, linux...@lists.infradead.org, n...@other.debian.org, linux...@vger.kernel.org, usb-s...@lists.one-eyed-alien.net


On 1/8/25 2:55 PM, Christoph Hellwig wrote:
> Match the locking order used by the core block code by only freezing
> the queue after taking the limits lock using the
> queue_limits_commit_update_frozen helper and document the callers that
> do not freeze the queue at all.
>
> Signed-off-by: Christoph Hellwig <h...@lst.de>

Christoph Hellwig

Jan 8, 2025, 10:27:12 AM
to Ming Lei, Christoph Hellwig, Jens Axboe, Damien Le Moal, Nilay Shroff, linux...@vger.kernel.org, linux...@lists.infradead.org, n...@other.debian.org, linux...@vger.kernel.org, usb-s...@lists.one-eyed-alien.net
On Wed, Jan 08, 2025 at 06:31:15PM +0800, Ming Lei wrote:
> > - if (!(q->limits.features & BLK_FEAT_POLL) &&
> > - (bio->bi_opf & REQ_POLLED)) {
> > + if ((bio->bi_opf & REQ_POLLED) && !bdev_can_poll(bdev)) {
>
> submit_bio_noacct() is called without grabbing .q_usage_counter,
> so tagset may be freed now, then use-after-free on q->tag_set?

Indeed. That also means the previous check wasn't reliable either.
I think we can simply move the check into
blk_mq_submit_bio/__submit_bio which means we'll do a bunch more
checks before we eventually fail, but otherwise it'll work the
same.

Christoph Hellwig

unread,
Jan 8, 2025, 10:29:14 AMJan 8
to Ming Lei, Christoph Hellwig, Jens Axboe, Damien Le Moal, Nilay Shroff, linux...@vger.kernel.org, linux...@lists.infradead.org, n...@other.debian.org, linux...@vger.kernel.org, usb-s...@lists.one-eyed-alien.net
On Wed, Jan 08, 2025 at 06:38:15PM +0800, Ming Lei wrote:
> Looks fine, but now ->store_limit() is called without holding
> ->sysfs_lock, maybe it should be documented.

Ok, I had this in my own commit log, but it got lost when I stole
Damien's much better one instead :)

I assume you're fine with just documenting it in the commit log?

Damien Le Moal

unread,
Jan 8, 2025, 7:05:54 PMJan 8
to Christoph Hellwig, Ming Lei, Jens Axboe, Nilay Shroff, linux...@vger.kernel.org, linux...@lists.infradead.org, n...@other.debian.org, linux...@vger.kernel.org, usb-s...@lists.one-eyed-alien.net
Given that the request queue is the same for all tag sets, I do not think we
need to have the queue_limits_start_update()/commit_update() within the tag set
loop in __blk_mq_update_nr_hw_queues(). So something like this should be enough
for an initial fix, no ?

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 8ac19d4ae3c0..ac71e9cee25b 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -4986,6 +4986,7 @@ static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set,
int nr_hw_queues)
{
struct request_queue *q;
+ struct queue_limits lim;
LIST_HEAD(head);
int prev_nr_hw_queues = set->nr_hw_queues;
int i;
@@ -4999,8 +5000,10 @@ static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set,
if (set->nr_maps == 1 && nr_hw_queues == set->nr_hw_queues)
return;

+ lim = queue_limits_start_update(q);
list_for_each_entry(q, &set->tag_list, tag_set_list)
blk_mq_freeze_queue(q);
+
/*
* Switch IO scheduler to 'none', cleaning up the data associated
* with the previous scheduler. We will switch back once we are done
@@ -5036,13 +5039,10 @@ static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set,
set->nr_hw_queues = prev_nr_hw_queues;
goto fallback;
}
- lim = queue_limits_start_update(q);
if (blk_mq_can_poll(set))
lim.features |= BLK_FEAT_POLL;
else
lim.features &= ~BLK_FEAT_POLL;
- if (queue_limits_commit_update(q, &lim) < 0)
- pr_warn("updating the poll flag failed\n");
blk_mq_map_swqueue(q);
}

@@ -5059,6 +5059,9 @@ static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set,
list_for_each_entry(q, &set->tag_list, tag_set_list)
blk_mq_unfreeze_queue(q);

+ if (queue_limits_commit_update(q, &lim) < 0)
+ pr_warn("updating the poll flag failed\n");
+
/* Free the excess tags when nr_hw_queues shrink. */
for (i = set->nr_hw_queues; i < prev_nr_hw_queues; i++)
__blk_mq_free_map_and_rqs(set, i);

With that, no modification of the hot path to check the poll feature should be
needed. And I also fail to see why we need to do the queue freeze for all tag
sets. Once should be enough as well...

Ming Lei

unread,
Jan 8, 2025, 9:18:42 PMJan 8
to Damien Le Moal, Christoph Hellwig, Jens Axboe, Nilay Shroff, linux...@vger.kernel.org, linux...@lists.infradead.org, n...@other.debian.org, linux...@vger.kernel.org, usb-s...@lists.one-eyed-alien.net
On Thu, Jan 09, 2025 at 09:05:49AM +0900, Damien Le Moal wrote:
> On 1/9/25 00:27, Christoph Hellwig wrote:
> > On Wed, Jan 08, 2025 at 06:31:15PM +0800, Ming Lei wrote:
> >>> - if (!(q->limits.features & BLK_FEAT_POLL) &&
> >>> - (bio->bi_opf & REQ_POLLED)) {
> >>> + if ((bio->bi_opf & REQ_POLLED) && !bdev_can_poll(bdev)) {
> >>
> >> submit_bio_noacct() is called without grabbing .q_usage_counter,
> >> so tagset may be freed now, then use-after-free on q->tag_set?
> >
> > Indeed. That also means the previous check wasn't reliable either.
> > I think we can simply move the check into
> > blk_mq_submit_bio/__submit_bio which means we'll do a bunch more
> > checks before we eventually fail, but otherwise it'll work the
> > same.
>
> Given that the request queue is the same for all tag sets, I do not think we

No, it isn't the same.

> need to have the queue_limits_start_update()/commit_update() within the tag set
> loop in __blk_mq_update_nr_hw_queues(). So something like this should be enough
> for an initial fix, no ?
>
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 8ac19d4ae3c0..ac71e9cee25b 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -4986,6 +4986,7 @@ static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set,
>                                         int nr_hw_queues)
>  {
>         struct request_queue *q;
> +       struct queue_limits lim;
>         LIST_HEAD(head);
>         int prev_nr_hw_queues = set->nr_hw_queues;
>         int i;
> @@ -4999,8 +5000,10 @@ static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set,
>         if (set->nr_maps == 1 && nr_hw_queues == set->nr_hw_queues)
>                 return;
>
> +       lim = queue_limits_start_update(q);
>         list_for_each_entry(q, &set->tag_list, tag_set_list)
>                 blk_mq_freeze_queue(q);

It could be worse: since the limits_lock is connected with lots of other
subsystems' locks (debugfs, sysfs dir, ...), it may introduce new deadlock
risk.

Thanks,
Ming

Hannes Reinecke

unread,
Jan 13, 2025, 2:19:36 AMJan 13
to Christoph Hellwig, Jens Axboe, Damien Le Moal, Ming Lei, Nilay Shroff, linux...@vger.kernel.org, linux...@lists.infradead.org, n...@other.debian.org, linux...@vger.kernel.org, usb-s...@lists.one-eyed-alien.net
On 1/8/25 10:24, Christoph Hellwig wrote:
> queue_limits_commit_update is the function that needs to operate on a
> frozen queue, not queue_limits_start_update. Update the kerneldoc
> comments to reflect that.
>
> Signed-off-by: Christoph Hellwig <h...@lst.de>
> Reviewed-by: Damien Le Moal <dle...@kernel.org>
> Reviewed-by: Nilay Shroff <ni...@linux.ibm.com>
> ---
> block/blk-settings.c | 3 ++-
> include/linux/blkdev.h | 3 +--
> 2 files changed, 3 insertions(+), 3 deletions(-)
>
Reviewed-by: Hannes Reinecke <ha...@suse.de>

Cheers,

Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
ha...@suse.de +49 911 74053 688
SUSE Software Solutions GmbH, Frankenstr. 146, 90461 Nürnberg
HRB 36809 (AG Nürnberg), GF: I. Totev, A. McDonald, W. Knoblich

Hannes Reinecke

unread,
Jan 13, 2025, 2:21:02 AMJan 13
to Christoph Hellwig, Jens Axboe, Damien Le Moal, Ming Lei, Nilay Shroff, linux...@vger.kernel.org, linux...@lists.infradead.org, n...@other.debian.org, linux...@vger.kernel.org, usb-s...@lists.one-eyed-alien.net
On 1/8/25 10:24, Christoph Hellwig wrote:
> Add a helper that freezes the queue, updates the queue limits and
> unfreezes the queue and convert all open coded versions of that to the
> new helper.
>
> Signed-off-by: Christoph Hellwig <h...@lst.de>
> Reviewed-by: Damien Le Moal <dle...@kernel.org>
> Reviewed-by: Nilay Shroff <ni...@linux.ibm.com>
> ---
> block/blk-integrity.c | 4 +---
> block/blk-settings.c | 24 ++++++++++++++++++++++++
> block/blk-zoned.c | 7 +------
> drivers/block/virtio_blk.c | 4 +---
> drivers/scsi/sd.c | 17 +++++------------
> drivers/scsi/sr.c | 5 +----
> include/linux/blkdev.h | 2 ++
> 7 files changed, 35 insertions(+), 28 deletions(-)

Hannes Reinecke

unread,
Jan 13, 2025, 2:23:50 AMJan 13
to Christoph Hellwig, Jens Axboe, Damien Le Moal, Ming Lei, Nilay Shroff, linux...@vger.kernel.org, linux...@lists.infradead.org, n...@other.debian.org, linux...@vger.kernel.org, usb-s...@lists.one-eyed-alien.net
> ---
> block/blk-core.c | 17 ++++++++++++++---
> block/blk-mq.c | 17 +----------------
> block/blk-sysfs.c | 6 +++++-
> block/blk.h | 1 +
> 4 files changed, 21 insertions(+), 20 deletions(-)

Hannes Reinecke

unread,
Jan 13, 2025, 2:24:39 AMJan 13
to Christoph Hellwig, Jens Axboe, Damien Le Moal, Ming Lei, Nilay Shroff, linux...@vger.kernel.org, linux...@lists.infradead.org, n...@other.debian.org, linux...@vger.kernel.org, usb-s...@lists.one-eyed-alien.net
On 1/8/25 10:25, Christoph Hellwig wrote:
> De-duplicate the code for updating queue limits by adding a store_limit
> method that allows having common code handle the actual queue limits
> update.
>
> Note that this is a pure refactoring patch and does not address the
> existing freeze vs limits lock order problem in the refactored code,
> which will be addressed next.
>
> Signed-off-by: Christoph Hellwig <h...@lst.de>
> Reviewed-by: Damien Le Moal <dle...@kernel.org>
> Reviewed-by: Nilay Shroff <ni...@linux.ibm.com>
> ---
> block/blk-sysfs.c | 128 ++++++++++++++++++++++------------------------
> 1 file changed, 61 insertions(+), 67 deletions(-)

Hannes Reinecke

unread,
Jan 13, 2025, 2:25:28 AMJan 13
to Christoph Hellwig, Jens Axboe, Damien Le Moal, Ming Lei, Nilay Shroff, linux...@vger.kernel.org, linux...@lists.infradead.org, n...@other.debian.org, linux...@vger.kernel.org, usb-s...@lists.one-eyed-alien.net
On 1/8/25 10:25, Christoph Hellwig wrote:
> queue_attr_store() always freezes a device queue before calling the
> attribute store operation. For attributes that control queue limits, the
> store operation will also lock the queue limits with a call to
> queue_limits_start_update(). However, some drivers (e.g. SCSI sd) may
> need to issue commands to a device to obtain limit values from the
> hardware with the queue limits locked. This creates a potential ABBA
> deadlock situation if a user attempts to modify a limit (thus freezing
> the device queue) while the device driver starts a revalidation of the
> device queue limits.
>
> Avoid such deadlock by not freezing the queue before calling the
> ->store_limit() method in struct queue_sysfs_entry and instead use the
> queue_limits_commit_update_frozen helper to freeze the queue after taking
> the limits lock.
>
> (commit log adapted from a similar patch from Damien Le Moal)
>
> Fixes: ff956a3be95b ("block: use queue_limits_commit_update in queue_discard_max_store")
> Fixes: 0327ca9d53bf ("block: use queue_limits_commit_update in queue_max_sectors_store")
> Signed-off-by: Christoph Hellwig <h...@lst.de>
> Reviewed-by: Nilay Shroff <ni...@linux.ibm.com>
> ---
> block/blk-sysfs.c | 18 ++++++++++--------
> 1 file changed, 10 insertions(+), 8 deletions(-)
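
As an illustration of the resulting ordering in the sysfs store path, a
sketch only, with variable names and error handling assumed rather than
copied from the patch:

	if (entry->store_limit) {
		struct queue_limits lim = queue_limits_start_update(q);

		res = entry->store_limit(disk, page, length, &lim);
		if (res < 0) {
			queue_limits_cancel_update(q);
			return res;
		}
		/*
		 * The queue is only frozen here, after the limits lock has
		 * been taken, matching the locking order used by the rest
		 * of the core block code.
		 */
		res = queue_limits_commit_update_frozen(q, &lim);
		return res ? res : length;
	}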