fix queue freeze and limit locking order

73 views
Skip to first unread message

Christoph Hellwig

unread,
Jan 7, 2025, 1:31:28 AMJan 7
to Jens Axboe, Damien Le Moal, Ming Lei, Nilay Shroff, linux...@vger.kernel.org, linux...@lists.infradead.org, n...@other.debian.org, linux...@vger.kernel.org, usb-s...@lists.one-eyed-alien.net
Hi all,

this is my version of Damien's "Fix queue freeze and limit locking order".
A lot looks very similar, but it was done independently based on the
previous discussion.

Changes since RFC:
- fix a bizzare virtio_blk bisection snafu
- set BLK_FEAT_POLL a little less eagerly for blk-mq
- drop the loop patch just adding a comment
- improve various commit logs and coments

Diffstat:
block/blk-core.c | 17 ++++-
block/blk-integrity.c | 4 -
block/blk-mq.c | 17 -----
block/blk-settings.c | 27 ++++++++
block/blk-sysfs.c | 128 +++++++++++++++++++----------------------
block/blk-zoned.c | 7 --
drivers/block/nbd.c | 17 -----
drivers/block/virtio_blk.c | 4 -
drivers/nvme/host/core.c | 9 +-
drivers/scsi/sd.c | 17 +----
drivers/scsi/sr.c | 5 -
drivers/usb/storage/scsiglue.c | 5 -
include/linux/blkdev.h | 5 -
13 files changed, 123 insertions(+), 139 deletions(-)

Christoph Hellwig

unread,
Jan 7, 2025, 1:31:32 AMJan 7
to Jens Axboe, Damien Le Moal, Ming Lei, Nilay Shroff, linux...@vger.kernel.org, linux...@lists.infradead.org, n...@other.debian.org, linux...@vger.kernel.org, usb-s...@lists.one-eyed-alien.net
queue_limits_commit_update is the function that needs to operate on a
frozen queue, not queue_limits_start_update. Update the kerneldoc
comments to reflect that.

Signed-off-by: Christoph Hellwig <h...@lst.de>
---
block/blk-settings.c | 3 ++-
include/linux/blkdev.h | 3 +--
2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/block/blk-settings.c b/block/blk-settings.c
index 8f09e33f41f6..b6b8c580d018 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -413,7 +413,8 @@ int blk_set_default_limits(struct queue_limits *lim)
* @lim: limits to apply
*
* Apply the limits in @lim that were obtained from queue_limits_start_update()
- * and updated by the caller to @q.
+ * and updated by the caller to @q. The caller must have frozen the queue or
+ * ensure that there are outstanding I/Os by other means.
*
* Returns 0 if successful, else a negative error code.
*/
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 5d40af2ef971..e781d4e6f92d 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -944,8 +944,7 @@ static inline unsigned int blk_boundary_sectors_left(sector_t offset,
* the caller can modify. The caller must call queue_limits_commit_update()
* to finish the update.
*
- * Context: process context. The caller must have frozen the queue or ensured
- * that there is outstanding I/O by other means.
+ * Context: process context.
*/
static inline struct queue_limits
queue_limits_start_update(struct request_queue *q)
--
2.45.2

Christoph Hellwig

unread,
Jan 7, 2025, 1:31:34 AMJan 7
to Jens Axboe, Damien Le Moal, Ming Lei, Nilay Shroff, linux...@vger.kernel.org, linux...@lists.infradead.org, n...@other.debian.org, linux...@vger.kernel.org, usb-s...@lists.one-eyed-alien.net
Add a helper that freezes the queue, updates the queue limits and
unfreezes the queue and convert all open coded versions of that to the
new helper.

Signed-off-by: Christoph Hellwig <h...@lst.de>
---
block/blk-integrity.c | 4 +---
block/blk-settings.c | 24 ++++++++++++++++++++++++
block/blk-zoned.c | 7 +------
drivers/block/virtio_blk.c | 4 +---
drivers/scsi/sd.c | 17 +++++------------
drivers/scsi/sr.c | 5 +----
include/linux/blkdev.h | 2 ++
7 files changed, 35 insertions(+), 28 deletions(-)

diff --git a/block/blk-integrity.c b/block/blk-integrity.c
index b180cac61a9d..013469faa5e7 100644
--- a/block/blk-integrity.c
+++ b/block/blk-integrity.c
@@ -218,9 +218,7 @@ static ssize_t flag_store(struct device *dev, const char *page, size_t count,
else
lim.integrity.flags |= flag;

- blk_mq_freeze_queue(q);
- err = queue_limits_commit_update(q, &lim);
- blk_mq_unfreeze_queue(q);
+ err = queue_limits_commit_update_frozen(q, &lim);
if (err)
return err;
return count;
diff --git a/block/blk-settings.c b/block/blk-settings.c
index b6b8c580d018..2afc67182efd 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -444,6 +444,30 @@ int queue_limits_commit_update(struct request_queue *q,
}
EXPORT_SYMBOL_GPL(queue_limits_commit_update);

+/**
+ * queue_limits_commit_update_frozen - commit an atomic update of queue limits
+ * @q: queue to update
+ * @lim: limits to apply
+ *
+ * Apply the limits in @lim that were obtained from queue_limits_start_update()
+ * and updated by the caller to @q. Freezes the queue before the updated and
+ * unfreezes it after.
+ *
+ * Returns 0 if successful, else a negative error code.
+ */
+int queue_limits_commit_update_frozen(struct request_queue *q,
+ struct queue_limits *lim)
+{
+ int ret;
+
+ blk_mq_freeze_queue(q);
+ ret = queue_limits_commit_update(q, lim);
+ blk_mq_unfreeze_queue(q);
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(queue_limits_commit_update_frozen);
+
/**
* queue_limits_set - apply queue limits to queue
* @q: queue to update
diff --git a/block/blk-zoned.c b/block/blk-zoned.c
index 4b0be40a8ea7..9d08a54c201e 100644
--- a/block/blk-zoned.c
+++ b/block/blk-zoned.c
@@ -1444,7 +1444,6 @@ static int disk_update_zone_resources(struct gendisk *disk,
unsigned int nr_seq_zones, nr_conv_zones;
unsigned int pool_size;
struct queue_limits lim;
- int ret;

disk->nr_zones = args->nr_zones;
disk->zone_capacity = args->zone_capacity;
@@ -1495,11 +1494,7 @@ static int disk_update_zone_resources(struct gendisk *disk,
}

commit:
- blk_mq_freeze_queue(q);
- ret = queue_limits_commit_update(q, &lim);
- blk_mq_unfreeze_queue(q);
-
- return ret;
+ return queue_limits_commit_update_frozen(q, &lim);
}

static int blk_revalidate_conv_zone(struct blk_zone *zone, unsigned int idx,
diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 71a7ffeafb32..bbaa26b523b8 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -1105,9 +1105,7 @@ cache_type_store(struct device *dev, struct device_attribute *attr,
lim.features |= BLK_FEAT_WRITE_CACHE;
else
lim.features &= ~BLK_FEAT_WRITE_CACHE;
- blk_mq_freeze_queue(disk->queue);
- i = queue_limits_commit_update(disk->queue, &lim);
- blk_mq_unfreeze_queue(disk->queue);
+ i = queue_limits_commit_update_frozen(disk->queue, &lim);
if (i)
return i;
return count;
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 8947dab132d7..af62a8ed8620 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -177,9 +177,8 @@ cache_type_store(struct device *dev, struct device_attribute *attr,

lim = queue_limits_start_update(sdkp->disk->queue);
sd_set_flush_flag(sdkp, &lim);
- blk_mq_freeze_queue(sdkp->disk->queue);
- ret = queue_limits_commit_update(sdkp->disk->queue, &lim);
- blk_mq_unfreeze_queue(sdkp->disk->queue);
+ ret = queue_limits_commit_update_frozen(sdkp->disk->queue,
+ &lim);
if (ret)
return ret;
return count;
@@ -483,9 +482,7 @@ provisioning_mode_store(struct device *dev, struct device_attribute *attr,

lim = queue_limits_start_update(sdkp->disk->queue);
sd_config_discard(sdkp, &lim, mode);
- blk_mq_freeze_queue(sdkp->disk->queue);
- err = queue_limits_commit_update(sdkp->disk->queue, &lim);
- blk_mq_unfreeze_queue(sdkp->disk->queue);
+ err = queue_limits_commit_update_frozen(sdkp->disk->queue, &lim);
if (err)
return err;
return count;
@@ -594,9 +591,7 @@ max_write_same_blocks_store(struct device *dev, struct device_attribute *attr,

lim = queue_limits_start_update(sdkp->disk->queue);
sd_config_write_same(sdkp, &lim);
- blk_mq_freeze_queue(sdkp->disk->queue);
- err = queue_limits_commit_update(sdkp->disk->queue, &lim);
- blk_mq_unfreeze_queue(sdkp->disk->queue);
+ err = queue_limits_commit_update_frozen(sdkp->disk->queue, &lim);
if (err)
return err;
return count;
@@ -3803,9 +3798,7 @@ static int sd_revalidate_disk(struct gendisk *disk)
sd_config_write_same(sdkp, &lim);
kfree(buffer);

- blk_mq_freeze_queue(sdkp->disk->queue);
- err = queue_limits_commit_update(sdkp->disk->queue, &lim);
- blk_mq_unfreeze_queue(sdkp->disk->queue);
+ err = queue_limits_commit_update_frozen(sdkp->disk->queue, &lim);
if (err)
return err;

diff --git a/drivers/scsi/sr.c b/drivers/scsi/sr.c
index 198bec87bb8e..b17796d5ee66 100644
--- a/drivers/scsi/sr.c
+++ b/drivers/scsi/sr.c
@@ -797,10 +797,7 @@ static int get_sectorsize(struct scsi_cd *cd)

lim = queue_limits_start_update(q);
lim.logical_block_size = sector_size;
- blk_mq_freeze_queue(q);
- err = queue_limits_commit_update(q, &lim);
- blk_mq_unfreeze_queue(q);
- return err;
+ return queue_limits_commit_update_frozen(q, &lim);
}

static int get_capabilities(struct scsi_cd *cd)
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index e781d4e6f92d..13d353351c37 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -952,6 +952,8 @@ queue_limits_start_update(struct request_queue *q)
mutex_lock(&q->limits_lock);
return q->limits;
}
+int queue_limits_commit_update_frozen(struct request_queue *q,
+ struct queue_limits *lim);
int queue_limits_commit_update(struct request_queue *q,
struct queue_limits *lim);
int queue_limits_set(struct request_queue *q, struct queue_limits *lim);
--
2.45.2

Christoph Hellwig

unread,
Jan 7, 2025, 1:31:38 AMJan 7
to Jens Axboe, Damien Le Moal, Ming Lei, Nilay Shroff, linux...@vger.kernel.org, linux...@lists.infradead.org, n...@other.debian.org, linux...@vger.kernel.org, usb-s...@lists.one-eyed-alien.net
When __blk_mq_update_nr_hw_queues changes the number of tag sets, it
might have to disable poll queues. Currently it does so by adjusting
the BLK_FEAT_POLL, which is a bit against the intent of features that
describe hardware / driver capabilities, but more importantly causes
nasty lock order problems with the broadly held freeze when updating the
number of hardware queues and the limits lock. Fix this by leaving
BLK_FEAT_POLL alone, and instead check for the number of poll queues in
the bio submission and poll handlers. While this adds extra work to the
fast path, the variables are in cache lines used by these operations
anyway, so it should be cheap enough.

Fixes: 8023e144f9d6 ("block: move the poll flag to queue_limits")
Signed-off-by: Christoph Hellwig <h...@lst.de>
---
block/blk-core.c | 17 ++++++++++++++---
block/blk-mq.c | 17 +----------------
2 files changed, 15 insertions(+), 19 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 666efe8fa202..bd5bec843d37 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -753,6 +753,18 @@ static blk_status_t blk_validate_atomic_write_op_size(struct request_queue *q,
return BLK_STS_OK;
}

+static bool bdev_can_poll(struct block_device *bdev)
+{
+ struct request_queue *q = bdev_get_queue(bdev);
+
+ if (!(q->limits.features & BLK_FEAT_POLL))
+ return false;
+
+ if (queue_is_mq(q))
+ return q->tag_set->map[HCTX_TYPE_POLL].nr_queues;
+ return true;
+}
+
/**
* submit_bio_noacct - re-submit a bio to the block device layer for I/O
* @bio: The bio describing the location in memory and on the device.
@@ -805,8 +817,7 @@ void submit_bio_noacct(struct bio *bio)
}
}

- if (!(q->limits.features & BLK_FEAT_POLL) &&
- (bio->bi_opf & REQ_POLLED)) {
+ if ((bio->bi_opf & REQ_POLLED) && !bdev_can_poll(bdev)) {
bio_clear_polled(bio);
goto not_supported;
}
@@ -935,7 +946,7 @@ int bio_poll(struct bio *bio, struct io_comp_batch *iob, unsigned int flags)
return 0;

q = bdev_get_queue(bdev);
- if (cookie == BLK_QC_T_NONE || !(q->limits.features & BLK_FEAT_POLL))
+ if (cookie == BLK_QC_T_NONE || !bdev_can_poll(bdev))
return 0;

blk_flush_plug(current->plug, false);
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 2e6132f778fd..f795d81b6b38 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -4320,12 +4320,6 @@ void blk_mq_release(struct request_queue *q)
blk_mq_sysfs_deinit(q);
}

-static bool blk_mq_can_poll(struct blk_mq_tag_set *set)
-{
- return set->nr_maps > HCTX_TYPE_POLL &&
- set->map[HCTX_TYPE_POLL].nr_queues;
-}
-
struct request_queue *blk_mq_alloc_queue(struct blk_mq_tag_set *set,
struct queue_limits *lim, void *queuedata)
{
@@ -4336,7 +4330,7 @@ struct request_queue *blk_mq_alloc_queue(struct blk_mq_tag_set *set,
if (!lim)
lim = &default_lim;
lim->features |= BLK_FEAT_IO_STAT | BLK_FEAT_NOWAIT;
- if (blk_mq_can_poll(set))
+ if (set->nr_maps > HCTX_TYPE_POLL)
lim->features |= BLK_FEAT_POLL;

q = blk_alloc_queue(lim, set->numa_node);
@@ -5024,8 +5018,6 @@ static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set,
fallback:
blk_mq_update_queue_map(set);
list_for_each_entry(q, &set->tag_list, tag_set_list) {
- struct queue_limits lim;
-
blk_mq_realloc_hw_ctxs(set, q);

if (q->nr_hw_queues != set->nr_hw_queues) {
@@ -5039,13 +5031,6 @@ static void __blk_mq_update_nr_hw_queues(struct blk_mq_tag_set *set,
set->nr_hw_queues = prev_nr_hw_queues;
goto fallback;
}
- lim = queue_limits_start_update(q);
- if (blk_mq_can_poll(set))
- lim.features |= BLK_FEAT_POLL;
- else
- lim.features &= ~BLK_FEAT_POLL;
- if (queue_limits_commit_update(q, &lim) < 0)
- pr_warn("updating the poll flag failed\n");
blk_mq_map_swqueue(q);
}

--
2.45.2

Christoph Hellwig

unread,
Jan 7, 2025, 1:31:39 AMJan 7
to Jens Axboe, Damien Le Moal, Ming Lei, Nilay Shroff, linux...@vger.kernel.org, linux...@lists.infradead.org, n...@other.debian.org, linux...@vger.kernel.org, usb-s...@lists.one-eyed-alien.net
De-duplicate the code for updating queue limits by adding a store_limit
method that allows having common code handle the actual queue limits
update.

Note that this is a pure refactoring patch and does not address the
existing freeze vs limits lock order problem in the refactored code,
which will be addressed next.

Signed-off-by: Christoph Hellwig <h...@lst.de>
---
block/blk-sysfs.c | 128 ++++++++++++++++++++++------------------------
1 file changed, 61 insertions(+), 67 deletions(-)

diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 767598e719ab..8d69315e986d 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -24,6 +24,8 @@ struct queue_sysfs_entry {
struct attribute attr;
ssize_t (*show)(struct gendisk *disk, char *page);
ssize_t (*store)(struct gendisk *disk, const char *page, size_t count);
+ int (*store_limit)(struct gendisk *disk, const char *page,
+ size_t count, struct queue_limits *lim);
void (*load_module)(struct gendisk *disk, const char *page, size_t count);
};

@@ -153,13 +155,11 @@ QUEUE_SYSFS_SHOW_CONST(discard_zeroes_data, 0)
QUEUE_SYSFS_SHOW_CONST(write_same_max, 0)
QUEUE_SYSFS_SHOW_CONST(poll_delay, -1)

-static ssize_t queue_max_discard_sectors_store(struct gendisk *disk,
- const char *page, size_t count)
+static int queue_max_discard_sectors_store(struct gendisk *disk,
+ const char *page, size_t count, struct queue_limits *lim)
{
unsigned long max_discard_bytes;
- struct queue_limits lim;
ssize_t ret;
- int err;

ret = queue_var_store(&max_discard_bytes, page, count);
if (ret < 0)
@@ -171,38 +171,28 @@ static ssize_t queue_max_discard_sectors_store(struct gendisk *disk,
if ((max_discard_bytes >> SECTOR_SHIFT) > UINT_MAX)
return -EINVAL;

- lim = queue_limits_start_update(disk->queue);
- lim.max_user_discard_sectors = max_discard_bytes >> SECTOR_SHIFT;
- err = queue_limits_commit_update(disk->queue, &lim);
- if (err)
- return err;
- return ret;
+ lim->max_user_discard_sectors = max_discard_bytes >> SECTOR_SHIFT;
+ return 0;
}

-static ssize_t
-queue_max_sectors_store(struct gendisk *disk, const char *page, size_t count)
+static int
+queue_max_sectors_store(struct gendisk *disk, const char *page, size_t count,
+ struct queue_limits *lim)
{
unsigned long max_sectors_kb;
- struct queue_limits lim;
ssize_t ret;
- int err;

ret = queue_var_store(&max_sectors_kb, page, count);
if (ret < 0)
return ret;

- lim = queue_limits_start_update(disk->queue);
- lim.max_user_sectors = max_sectors_kb << 1;
- err = queue_limits_commit_update(disk->queue, &lim);
- if (err)
- return err;
- return ret;
+ lim->max_user_sectors = max_sectors_kb << 1;
+ return 0;
}

static ssize_t queue_feature_store(struct gendisk *disk, const char *page,
- size_t count, blk_features_t feature)
+ size_t count, struct queue_limits *lim, blk_features_t feature)
{
- struct queue_limits lim;
unsigned long val;
ssize_t ret;

@@ -210,15 +200,11 @@ static ssize_t queue_feature_store(struct gendisk *disk, const char *page,
if (ret < 0)
return ret;

- lim = queue_limits_start_update(disk->queue);
if (val)
- lim.features |= feature;
+ lim->features |= feature;
else
- lim.features &= ~feature;
- ret = queue_limits_commit_update(disk->queue, &lim);
- if (ret)
- return ret;
- return count;
+ lim->features &= ~feature;
+ return 0;
}

#define QUEUE_SYSFS_FEATURE(_name, _feature) \
@@ -227,10 +213,10 @@ static ssize_t queue_##_name##_show(struct gendisk *disk, char *page) \
return sysfs_emit(page, "%u\n", \
!!(disk->queue->limits.features & _feature)); \
} \
-static ssize_t queue_##_name##_store(struct gendisk *disk, \
- const char *page, size_t count) \
+static int queue_##_name##_store(struct gendisk *disk, \
+ const char *page, size_t count, struct queue_limits *lim) \
{ \
- return queue_feature_store(disk, page, count, _feature); \
+ return queue_feature_store(disk, page, count, lim, _feature); \
}

QUEUE_SYSFS_FEATURE(rotational, BLK_FEAT_ROTATIONAL)
@@ -266,10 +252,9 @@ static ssize_t queue_iostats_passthrough_show(struct gendisk *disk, char *page)
return queue_var_show(!!blk_queue_passthrough_stat(disk->queue), page);
}

-static ssize_t queue_iostats_passthrough_store(struct gendisk *disk,
- const char *page, size_t count)
+static int queue_iostats_passthrough_store(struct gendisk *disk,
+ const char *page, size_t count, struct queue_limits *lim)
{
- struct queue_limits lim;
unsigned long ios;
ssize_t ret;

@@ -277,18 +262,13 @@ static ssize_t queue_iostats_passthrough_store(struct gendisk *disk,
if (ret < 0)
return ret;

- lim = queue_limits_start_update(disk->queue);
if (ios)
- lim.flags |= BLK_FLAG_IOSTATS_PASSTHROUGH;
+ lim->flags |= BLK_FLAG_IOSTATS_PASSTHROUGH;
else
- lim.flags &= ~BLK_FLAG_IOSTATS_PASSTHROUGH;
-
- ret = queue_limits_commit_update(disk->queue, &lim);
- if (ret)
- return ret;
-
- return count;
+ lim->flags &= ~BLK_FLAG_IOSTATS_PASSTHROUGH;
+ return 0;
}
+
static ssize_t queue_nomerges_show(struct gendisk *disk, char *page)
{
return queue_var_show((blk_queue_nomerges(disk->queue) << 1) |
@@ -391,12 +371,10 @@ static ssize_t queue_wc_show(struct gendisk *disk, char *page)
return sysfs_emit(page, "write through\n");
}

-static ssize_t queue_wc_store(struct gendisk *disk, const char *page,
- size_t count)
+static int queue_wc_store(struct gendisk *disk, const char *page,
+ size_t count, struct queue_limits *lim)
{
- struct queue_limits lim;
bool disable;
- int err;

if (!strncmp(page, "write back", 10)) {
disable = false;
@@ -407,15 +385,11 @@ static ssize_t queue_wc_store(struct gendisk *disk, const char *page,
return -EINVAL;
}

- lim = queue_limits_start_update(disk->queue);
if (disable)
- lim.flags |= BLK_FLAG_WRITE_CACHE_DISABLED;
+ lim->flags |= BLK_FLAG_WRITE_CACHE_DISABLED;
else
- lim.flags &= ~BLK_FLAG_WRITE_CACHE_DISABLED;
- err = queue_limits_commit_update(disk->queue, &lim);
- if (err)
- return err;
- return count;
+ lim->flags &= ~BLK_FLAG_WRITE_CACHE_DISABLED;
+ return 0;
}

#define QUEUE_RO_ENTRY(_prefix, _name) \
@@ -431,6 +405,13 @@ static struct queue_sysfs_entry _prefix##_entry = { \
.store = _prefix##_store, \
};

+#define QUEUE_LIM_RW_ENTRY(_prefix, _name) \
+static struct queue_sysfs_entry _prefix##_entry = { \
+ .attr = { .name = _name, .mode = 0644 }, \
+ .show = _prefix##_show, \
+ .store_limit = _prefix##_store, \
+}
+
#define QUEUE_RW_LOAD_MODULE_ENTRY(_prefix, _name) \
static struct queue_sysfs_entry _prefix##_entry = { \
.attr = { .name = _name, .mode = 0644 }, \
@@ -441,7 +422,7 @@ static struct queue_sysfs_entry _prefix##_entry = { \

QUEUE_RW_ENTRY(queue_requests, "nr_requests");
QUEUE_RW_ENTRY(queue_ra, "read_ahead_kb");
-QUEUE_RW_ENTRY(queue_max_sectors, "max_sectors_kb");
+QUEUE_LIM_RW_ENTRY(queue_max_sectors, "max_sectors_kb");
QUEUE_RO_ENTRY(queue_max_hw_sectors, "max_hw_sectors_kb");
QUEUE_RO_ENTRY(queue_max_segments, "max_segments");
QUEUE_RO_ENTRY(queue_max_integrity_segments, "max_integrity_segments");
@@ -457,7 +438,7 @@ QUEUE_RO_ENTRY(queue_io_opt, "optimal_io_size");
QUEUE_RO_ENTRY(queue_max_discard_segments, "max_discard_segments");
QUEUE_RO_ENTRY(queue_discard_granularity, "discard_granularity");
QUEUE_RO_ENTRY(queue_max_hw_discard_sectors, "discard_max_hw_bytes");
-QUEUE_RW_ENTRY(queue_max_discard_sectors, "discard_max_bytes");
+QUEUE_LIM_RW_ENTRY(queue_max_discard_sectors, "discard_max_bytes");
QUEUE_RO_ENTRY(queue_discard_zeroes_data, "discard_zeroes_data");

QUEUE_RO_ENTRY(queue_atomic_write_max_sectors, "atomic_write_max_bytes");
@@ -477,11 +458,11 @@ QUEUE_RO_ENTRY(queue_max_open_zones, "max_open_zones");
QUEUE_RO_ENTRY(queue_max_active_zones, "max_active_zones");

QUEUE_RW_ENTRY(queue_nomerges, "nomerges");
-QUEUE_RW_ENTRY(queue_iostats_passthrough, "iostats_passthrough");
+QUEUE_LIM_RW_ENTRY(queue_iostats_passthrough, "iostats_passthrough");
QUEUE_RW_ENTRY(queue_rq_affinity, "rq_affinity");
QUEUE_RW_ENTRY(queue_poll, "io_poll");
QUEUE_RW_ENTRY(queue_poll_delay, "io_poll_delay");
-QUEUE_RW_ENTRY(queue_wc, "write_cache");
+QUEUE_LIM_RW_ENTRY(queue_wc, "write_cache");
QUEUE_RO_ENTRY(queue_fua, "fua");
QUEUE_RO_ENTRY(queue_dax, "dax");
QUEUE_RW_ENTRY(queue_io_timeout, "io_timeout");
@@ -494,10 +475,10 @@ static struct queue_sysfs_entry queue_hw_sector_size_entry = {
.show = queue_logical_block_size_show,
};

-QUEUE_RW_ENTRY(queue_rotational, "rotational");
-QUEUE_RW_ENTRY(queue_iostats, "iostats");
-QUEUE_RW_ENTRY(queue_add_random, "add_random");
-QUEUE_RW_ENTRY(queue_stable_writes, "stable_writes");
+QUEUE_LIM_RW_ENTRY(queue_rotational, "rotational");
+QUEUE_LIM_RW_ENTRY(queue_iostats, "iostats");
+QUEUE_LIM_RW_ENTRY(queue_add_random, "add_random");
+QUEUE_LIM_RW_ENTRY(queue_stable_writes, "stable_writes");

#ifdef CONFIG_BLK_WBT
static ssize_t queue_var_store64(s64 *var, const char *page)
@@ -695,7 +676,7 @@ queue_attr_store(struct kobject *kobj, struct attribute *attr,
struct request_queue *q = disk->queue;
ssize_t res;

- if (!entry->store)
+ if (!entry->store_limit && !entry->store)
return -EIO;

/*
@@ -706,11 +687,24 @@ queue_attr_store(struct kobject *kobj, struct attribute *attr,
if (entry->load_module)
entry->load_module(disk, page, length);

- blk_mq_freeze_queue(q);
mutex_lock(&q->sysfs_lock);
- res = entry->store(disk, page, length);
- mutex_unlock(&q->sysfs_lock);
+ blk_mq_freeze_queue(q);
+ if (entry->store_limit) {
+ struct queue_limits lim = queue_limits_start_update(q);
+
+ res = entry->store_limit(disk, page, length, &lim);
+ if (res < 0) {
+ queue_limits_cancel_update(q);
+ } else {
+ res = queue_limits_commit_update(q, &lim);
+ if (!res)
+ res = length;
+ }
+ } else {
+ res = entry->store(disk, page, length);
+ }
blk_mq_unfreeze_queue(q);
+ mutex_unlock(&q->sysfs_lock);
return res;
}

--
2.45.2

Christoph Hellwig

unread,
Jan 7, 2025, 1:31:45 AMJan 7
to Jens Axboe, Damien Le Moal, Ming Lei, Nilay Shroff, linux...@vger.kernel.org, linux...@lists.infradead.org, n...@other.debian.org, linux...@vger.kernel.org, usb-s...@lists.one-eyed-alien.net
queue_attr_store() always freezes a device queue before calling the
attribute store operation. For attributes that control queue limits, the
store operation will also lock the queue limits with a call to
queue_limits_start_update(). However, some drivers (e.g. SCSI sd) may
need to issue commands to a device to obtain limit values from the
hardware with the queue limits locked. This creates a potential ABBA
deadlock situation if a user attempts to modify a limit (thus freezing
the device queue) while the device driver starts a revalidation of the
device queue limits.

Avoid such deadlock by not freezing the queue before calling the
->store_limit() method in struct queue_sysfs_entry and instead use the
queue_limits_commit_update_frozen helper to freeze the queue after taking
the limits lock.

(commit log adapted from a similar patch from Damien Le Moal)

Fixes: ff956a3be95b ("block: use queue_limits_commit_update in queue_discard_max_store")
Fixes: 0327ca9d53bf ("block: use queue_limits_commit_update in queue_max_sectors_store")
Signed-off-by: Christoph Hellwig <h...@lst.de>
---
block/blk-sysfs.c | 18 ++++++++++--------
1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 8d69315e986d..3bac27fcd635 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -687,22 +687,24 @@ queue_attr_store(struct kobject *kobj, struct attribute *attr,
if (entry->load_module)
entry->load_module(disk, page, length);

- mutex_lock(&q->sysfs_lock);
- blk_mq_freeze_queue(q);
if (entry->store_limit) {
struct queue_limits lim = queue_limits_start_update(q);

res = entry->store_limit(disk, page, length, &lim);
if (res < 0) {
queue_limits_cancel_update(q);
- } else {
- res = queue_limits_commit_update(q, &lim);
- if (!res)
- res = length;
+ return res;
}
- } else {
- res = entry->store(disk, page, length);
+
+ res = queue_limits_commit_update_frozen(q, &lim);
+ if (res)
+ return res;
+ return length;
}
+
+ mutex_lock(&q->sysfs_lock);
+ blk_mq_freeze_queue(q);
+ res = entry->store(disk, page, length);
blk_mq_unfreeze_queue(q);

Christoph Hellwig

unread,
Jan 7, 2025, 1:31:47 AMJan 7
to Jens Axboe, Damien Le Moal, Ming Lei, Nilay Shroff, linux...@vger.kernel.org, linux...@lists.infradead.org, n...@other.debian.org, linux...@vger.kernel.org, usb-s...@lists.one-eyed-alien.net
Match the locking order used by the core block code by only freezing
the queue after taking the limits lock.

Unlike most queue updates this does not use the
queue_limits_commit_update_frozen helper as the nvme driver want the
queue frozen for more than just the limits update.

Signed-off-by: Christoph Hellwig <h...@lst.de>
---
drivers/nvme/host/core.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index c2250ddef5a2..1ccf17f6ea7f 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -2128,9 +2128,10 @@ static int nvme_update_ns_info_generic(struct nvme_ns *ns,
struct queue_limits lim;
int ret;

- blk_mq_freeze_queue(ns->disk->queue);
lim = queue_limits_start_update(ns->disk->queue);
nvme_set_ctrl_limits(ns->ctrl, &lim);
+
+ blk_mq_freeze_queue(ns->disk->queue);
ret = queue_limits_commit_update(ns->disk->queue, &lim);
set_disk_ro(ns->disk, nvme_ns_is_readonly(ns, info));
blk_mq_unfreeze_queue(ns->disk->queue);
@@ -2177,12 +2178,12 @@ static int nvme_update_ns_info_block(struct nvme_ns *ns,
goto out;
}

+ lim = queue_limits_start_update(ns->disk->queue);
+
blk_mq_freeze_queue(ns->disk->queue);
ns->head->lba_shift = id->lbaf[lbaf].ds;
ns->head->nuse = le64_to_cpu(id->nuse);
capacity = nvme_lba_to_sect(ns->head, le64_to_cpu(id->nsze));
-
- lim = queue_limits_start_update(ns->disk->queue);
nvme_set_ctrl_limits(ns->ctrl, &lim);
nvme_configure_metadata(ns->ctrl, ns->head, id, nvm, info);
nvme_set_chunk_sectors(ns, id, &lim);
@@ -2285,6 +2286,7 @@ static int nvme_update_ns_info(struct nvme_ns *ns, struct nvme_ns_info *info)
struct queue_limits *ns_lim = &ns->disk->queue->limits;
struct queue_limits lim;

+ lim = queue_limits_start_update(ns->head->disk->queue);
blk_mq_freeze_queue(ns->head->disk->queue);
/*
* queue_limits mixes values that are the hardware limitations
@@ -2301,7 +2303,6 @@ static int nvme_update_ns_info(struct nvme_ns *ns, struct nvme_ns_info *info)
* the splitting limits in to make sure we still obey possibly
* lower limitations of other controllers.
*/
- lim = queue_limits_start_update(ns->head->disk->queue);
lim.logical_block_size = ns_lim->logical_block_size;
lim.physical_block_size = ns_lim->physical_block_size;
lim.io_min = ns_lim->io_min;
--
2.45.2

Christoph Hellwig

unread,
Jan 7, 2025, 1:31:50 AMJan 7
to Jens Axboe, Damien Le Moal, Ming Lei, Nilay Shroff, linux...@vger.kernel.org, linux...@lists.infradead.org, n...@other.debian.org, linux...@vger.kernel.org, usb-s...@lists.one-eyed-alien.net
Match the locking order used by the core block code by only freezing
the queue after taking the limits lock using the
queue_limits_commit_update_frozen helper.

This also allows removes the need for the separate __nbd_set_size helper,
so remove it.

Signed-off-by: Christoph Hellwig <h...@lst.de>
---
drivers/block/nbd.c | 17 ++---------------
1 file changed, 2 insertions(+), 15 deletions(-)

diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index 259bd57fc529..efa05c3c06bf 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -327,8 +327,7 @@ static void nbd_mark_nsock_dead(struct nbd_device *nbd, struct nbd_sock *nsock,
nsock->sent = 0;
}

-static int __nbd_set_size(struct nbd_device *nbd, loff_t bytesize,
- loff_t blksize)
+static int nbd_set_size(struct nbd_device *nbd, loff_t bytesize, loff_t blksize)
{
struct queue_limits lim;
int error;
@@ -368,7 +367,7 @@ static int __nbd_set_size(struct nbd_device *nbd, loff_t bytesize,

lim.logical_block_size = blksize;
lim.physical_block_size = blksize;
- error = queue_limits_commit_update(nbd->disk->queue, &lim);
+ error = queue_limits_commit_update_frozen(nbd->disk->queue, &lim);
if (error)
return error;

@@ -379,18 +378,6 @@ static int __nbd_set_size(struct nbd_device *nbd, loff_t bytesize,
return 0;
}

-static int nbd_set_size(struct nbd_device *nbd, loff_t bytesize,
- loff_t blksize)
-{
- int error;
-
- blk_mq_freeze_queue(nbd->disk->queue);
- error = __nbd_set_size(nbd, bytesize, blksize);
- blk_mq_unfreeze_queue(nbd->disk->queue);
-
- return error;
-}
-
static void nbd_complete_rq(struct request *req)
{
struct nbd_cmd *cmd = blk_mq_rq_to_pdu(req);
--
2.45.2

Christoph Hellwig

unread,
Jan 7, 2025, 1:31:54 AMJan 7
to Jens Axboe, Damien Le Moal, Ming Lei, Nilay Shroff, linux...@vger.kernel.org, linux...@lists.infradead.org, n...@other.debian.org, linux...@vger.kernel.org, usb-s...@lists.one-eyed-alien.net
Match the locking order used by the core block code by only freezing
the queue after taking the limits lock using the
queue_limits_commit_update_frozen helper.

Signed-off-by: Christoph Hellwig <h...@lst.de>
---
drivers/usb/storage/scsiglue.c | 5 +----
1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/usb/storage/scsiglue.c b/drivers/usb/storage/scsiglue.c
index 8c8b5e6041cc..dc98ceecb724 100644
--- a/drivers/usb/storage/scsiglue.c
+++ b/drivers/usb/storage/scsiglue.c
@@ -592,12 +592,9 @@ static ssize_t max_sectors_store(struct device *dev, struct device_attribute *at
if (sscanf(buf, "%hu", &ms) <= 0)
return -EINVAL;

- blk_mq_freeze_queue(sdev->request_queue);
lim = queue_limits_start_update(sdev->request_queue);
lim.max_hw_sectors = ms;
- ret = queue_limits_commit_update(sdev->request_queue, &lim);
- blk_mq_unfreeze_queue(sdev->request_queue);
-
+ ret = queue_limits_commit_update_frozen(sdev->request_queue, &lim);
if (ret)
return ret;
return count;
--
2.45.2

Nilay Shroff

unread,
Jan 7, 2025, 1:57:50 AMJan 7
to Christoph Hellwig, Jens Axboe, Damien Le Moal, Ming Lei, linux...@vger.kernel.org, linux...@lists.infradead.org, n...@other.debian.org, linux...@vger.kernel.org, usb-s...@lists.one-eyed-alien.net
As discussed in another thread with Damien, shouldn't we need to
move bdev_can_poll() to header file? We also need to use this
function while reading sysfs attribute "io-poll", no?

Nilay Shroff

unread,
Jan 7, 2025, 1:58:42 AMJan 7
to Christoph Hellwig, Jens Axboe, Damien Le Moal, Ming Lei, linux...@vger.kernel.org, linux...@lists.infradead.org, n...@other.debian.org, linux...@vger.kernel.org, usb-s...@lists.one-eyed-alien.net


On 1/7/25 12:00 PM, Christoph Hellwig wrote:
I think we should freeze queue before nvme_set_ctrl_limits().

Nilay Shroff

unread,
Jan 7, 2025, 2:02:12 AMJan 7
to Christoph Hellwig, Jens Axboe, Damien Le Moal, Ming Lei, linux...@vger.kernel.org, linux...@lists.infradead.org, n...@other.debian.org, linux...@vger.kernel.org, usb-s...@lists.one-eyed-alien.net


On 1/7/25 12:00 PM, Christoph Hellwig wrote:
> queue_limits_commit_update is the function that needs to operate on a
> frozen queue, not queue_limits_start_update. Update the kerneldoc
> comments to reflect that.
>
> Signed-off-by: Christoph Hellwig <h...@lst.de>
> ---
> block/blk-settings.c | 3 ++-
> include/linux/blkdev.h | 3 +--
> 2 files changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/block/blk-settings.c b/block/blk-settings.c
> index 8f09e33f41f6..b6b8c580d018 100644
> --- a/block/blk-settings.c
> +++ b/block/blk-settings.c
> @@ -413,7 +413,8 @@ int blk_set_default_limits(struct queue_limits *lim)
> * @lim: limits to apply
> *
> * Apply the limits in @lim that were obtained from queue_limits_start_update()
> - * and updated by the caller to @q.
> + * and updated by the caller to @q. The caller must have frozen the queue or
> + * ensure that there are outstanding I/Os by other means.

Maybe typo: "ensure that there are *NO* outstanding I/Os by other means"

Ming Lei

unread,
Jan 7, 2025, 2:26:10 AMJan 7
to Christoph Hellwig, Jens Axboe, Damien Le Moal, Nilay Shroff, linux...@vger.kernel.org, linux...@lists.infradead.org, n...@other.debian.org, linux...@vger.kernel.org, usb-s...@lists.one-eyed-alien.net
Order between freeze and ->sysfs_lock is changed, and it may cause new
lockdep warning because we may freeze queue first before acquiring
->sysfs_lock in del_gendisk().


thanks,
Ming

Nilay Shroff

unread,
Jan 7, 2025, 2:51:27 AMJan 7
to Ming Lei, Christoph Hellwig, Jens Axboe, Damien Le Moal, linux...@vger.kernel.org, linux...@lists.infradead.org, n...@other.debian.org, linux...@vger.kernel.org, usb-s...@lists.one-eyed-alien.net
On contrary, in elevator_disable() and elevator_switch() we acquire
->sysfs_lock first before freezing the queue. I think this is a mess and
we need to fix ordering. We need to decide ordering rules. IMO, the
correct order should be to acquire ->sysfs_lock before freezing queue.
Likewise with this patch now we acquire ->limits_lock before freezing the
queue.

Thanks,
--Nilay


Christoph Hellwig

unread,
Jan 7, 2025, 3:21:52 AMJan 7
to Nilay Shroff, Christoph Hellwig, Jens Axboe, Damien Le Moal, Ming Lei, linux...@vger.kernel.org, linux...@lists.infradead.org, n...@other.debian.org, linux...@vger.kernel.org, usb-s...@lists.one-eyed-alien.net
On Tue, Jan 07, 2025 at 12:27:35PM +0530, Nilay Shroff wrote:
> As discussed in another thread with Damien, shouldn't we need to
> move bdev_can_poll() to header file?

Well, if it was needed I would have done it, otherwise the code wouldn't
compile, would it?

> We also need to use this
> function while reading sysfs attribute "io-poll", no?

This now reports polling support when the driver declared it but
later resized the number of queues to have no queues left. Which I
think is a fine tradeoff if you do that.

Christoph Hellwig

unread,
Jan 7, 2025, 3:22:30 AMJan 7
to Nilay Shroff, Christoph Hellwig, Jens Axboe, Damien Le Moal, Ming Lei, linux...@vger.kernel.org, linux...@lists.infradead.org, n...@other.debian.org, linux...@vger.kernel.org, usb-s...@lists.one-eyed-alien.net
On Tue, Jan 07, 2025 at 12:28:29PM +0530, Nilay Shroff wrote:
> > - blk_mq_freeze_queue(ns->disk->queue);
> > lim = queue_limits_start_update(ns->disk->queue);
> > nvme_set_ctrl_limits(ns->ctrl, &lim);
> > +
> > + blk_mq_freeze_queue(ns->disk->queue);
> > ret = queue_limits_commit_update(ns->disk->queue, &lim);
> > set_disk_ro(ns->disk, nvme_ns_is_readonly(ns, info));
> > blk_mq_unfreeze_queue(ns->disk->queue);
>
> I think we should freeze queue before nvme_set_ctrl_limits().

Why?

Christoph Hellwig

unread,
Jan 7, 2025, 3:23:11 AMJan 7
to Ming Lei, Christoph Hellwig, Jens Axboe, Damien Le Moal, Nilay Shroff, linux...@vger.kernel.org, linux...@lists.infradead.org, n...@other.debian.org, linux...@vger.kernel.org, usb-s...@lists.one-eyed-alien.net
Hi Ming,

this is a friendly reminder to reply without quoting the entire mail.
I did not find any content after scrolling half a dozend page of
full quote and gave up.
---end quoted text---

Christoph Hellwig

unread,
Jan 7, 2025, 3:26:58 AMJan 7
to Nilay Shroff, Ming Lei, Christoph Hellwig, Jens Axboe, Damien Le Moal, linux...@vger.kernel.org, linux...@lists.infradead.org, n...@other.debian.org, linux...@vger.kernel.org, usb-s...@lists.one-eyed-alien.net
Hi Nilay,

this is a friendly reminder to reply without quoting the entire mail.
I did not find any content after scrolling half a dozen page of

Nilay Shroff

unread,
Jan 7, 2025, 3:45:18 AMJan 7
to Christoph Hellwig, Jens Axboe, Damien Le Moal, Ming Lei, linux...@vger.kernel.org, linux...@lists.infradead.org, n...@other.debian.org, linux...@vger.kernel.org, usb-s...@lists.one-eyed-alien.net
The nvme_set_ctrl_limits() sets certain queue limits which are
also used during IO processing. For instance, ->max_segment_size
is used while submitting bio.
Also, if we look at the code before your patch, nvme_set_ctrl_limits()
is called when the queue is freezed.

Thanks,
--Nilay

Christoph Hellwig

unread,
Jan 7, 2025, 3:58:32 AMJan 7
to Nilay Shroff, Christoph Hellwig, Jens Axboe, Damien Le Moal, Ming Lei, linux...@vger.kernel.org, linux...@lists.infradead.org, n...@other.debian.org, linux...@vger.kernel.org, usb-s...@lists.one-eyed-alien.net
On Tue, Jan 07, 2025 at 02:15:05PM +0530, Nilay Shroff wrote:
> The nvme_set_ctrl_limits() sets certain queue limits which are
> also used during IO processing. For instance, ->max_segment_size
> is used while submitting bio.

nvme_set_ctrl_limits only sets them in the on-stack queue_limits
structure, which is local to the calling thread.

Nilay Shroff

unread,
Jan 7, 2025, 4:23:57 AMJan 7
to Christoph Hellwig, Jens Axboe, Damien Le Moal, Ming Lei, linux...@vger.kernel.org, linux...@lists.infradead.org, n...@other.debian.org, linux...@vger.kernel.org, usb-s...@lists.one-eyed-alien.net


On 1/7/25 1:51 PM, Christoph Hellwig wrote:
> On Tue, Jan 07, 2025 at 12:27:35PM +0530, Nilay Shroff wrote:
>> As discussed in another thread with Damien, shouldn't we need to
>> move bdev_can_poll() to header file?
>
> Well, if it was needed I would have done it, otherwise the code wouldn't
> compile, would it?
>
I think, there won't be compile error because if we look at the show function
for "io_poll" attribute under sysfs, then I see it evaluates the queue limits
feature flag BLK_FEAT_POLL and returns the value.

>> We also need to use this
>> function while reading sysfs attribute "io-poll", no?
>
> This now reports polling support when the driver declared it but
> later resized the number of queues to have no queues left. Which I
> think is a fine tradeoff if you do that.
>
When I applied you patch on my system and access io_poll attribute
of one of my nvme disk, I see it returns 1, though I didn't configure
poll queue for the disk. With this patch, as we're now always setting
BLK_FEAT_POLL (under blk_mq_alloc_queue()) it return 1. So when I haven't
configured poll queue for NVMe driver, shouldn't it return 0 when I access
/sys/block/nvmeXnY/queue/io_poll ?

Thanks,
--Nilay

Nilay Shroff

unread,
Jan 7, 2025, 4:29:25 AMJan 7
to Christoph Hellwig, Jens Axboe, Damien Le Moal, Ming Lei, linux...@vger.kernel.org, linux...@lists.infradead.org, n...@other.debian.org, linux...@vger.kernel.org, usb-s...@lists.one-eyed-alien.net
Oh yeah, I didn't notice it while reading patch. Agreed, we don't
require freezing queue here. Sorry for the noise.

Thanks,
--Nilay

Damien Le Moal

unread,
Jan 7, 2025, 4:58:24 AMJan 7
to Christoph Hellwig, Jens Axboe, Ming Lei, Nilay Shroff, linux...@vger.kernel.org, linux...@lists.infradead.org, n...@other.debian.org, linux...@vger.kernel.org, usb-s...@lists.one-eyed-alien.net
On 1/7/25 15:30, Christoph Hellwig wrote:
> Match the locking order used by the core block code by only freezing
> the queue after taking the limits lock.
>
> Unlike most queue updates this does not use the
> queue_limits_commit_update_frozen helper as the nvme driver want the
> queue frozen for more than just the limits update.
>
> Signed-off-by: Christoph Hellwig <h...@lst.de>

Looks good to me.

Reviewed-by: Damien Le Moal <dle...@kernel.org>

--
Damien Le Moal
Western Digital Research

Damien Le Moal

unread,
Jan 7, 2025, 5:00:11 AMJan 7
to Christoph Hellwig, Jens Axboe, Ming Lei, Nilay Shroff, linux...@vger.kernel.org, linux...@lists.infradead.org, n...@other.debian.org, linux...@vger.kernel.org, usb-s...@lists.one-eyed-alien.net
On 1/7/25 15:30, Christoph Hellwig wrote:
> Match the locking order used by the core block code by only freezing
> the queue after taking the limits lock using the
> queue_limits_commit_update_frozen helper.
>
> This also allows removes the need for the separate __nbd_set_size helper,
> so remove it.
>
> Signed-off-by: Christoph Hellwig <h...@lst.de>

Damien Le Moal

unread,
Jan 7, 2025, 5:00:29 AMJan 7
to Christoph Hellwig, Jens Axboe, Ming Lei, Nilay Shroff, linux...@vger.kernel.org, linux...@lists.infradead.org, n...@other.debian.org, linux...@vger.kernel.org, usb-s...@lists.one-eyed-alien.net
On 1/7/25 15:30, Christoph Hellwig wrote:
> Match the locking order used by the core block code by only freezing
> the queue after taking the limits lock using the
> queue_limits_commit_update_frozen helper.
>
> Signed-off-by: Christoph Hellwig <h...@lst.de>

Damien Le Moal

unread,
Jan 7, 2025, 5:04:02 AMJan 7
to Christoph Hellwig, Jens Axboe, Ming Lei, Nilay Shroff, linux...@vger.kernel.org, linux...@lists.infradead.org, n...@other.debian.org, linux...@vger.kernel.org, usb-s...@lists.one-eyed-alien.net
On 1/7/25 15:30, Christoph Hellwig wrote:
> queue_limits_commit_update is the function that needs to operate on a
> frozen queue, not queue_limits_start_update. Update the kerneldoc
> comments to reflect that.
>
> Signed-off-by: Christoph Hellwig <h...@lst.de>

With this comment fixed as Nilay suggested, feel free to add:

Damien Le Moal

unread,
Jan 7, 2025, 5:05:14 AMJan 7
to Christoph Hellwig, Jens Axboe, Ming Lei, Nilay Shroff, linux...@vger.kernel.org, linux...@lists.infradead.org, n...@other.debian.org, linux...@vger.kernel.org, usb-s...@lists.one-eyed-alien.net
On 1/7/25 15:30, Christoph Hellwig wrote:
> Add a helper that freezes the queue, updates the queue limits and
> unfreezes the queue and convert all open coded versions of that to the
> new helper.
>
> Signed-off-by: Christoph Hellwig <h...@lst.de>

Damien Le Moal

unread,
Jan 7, 2025, 5:05:25 AMJan 7
to Christoph Hellwig, Jens Axboe, Ming Lei, Nilay Shroff, linux...@vger.kernel.org, linux...@lists.infradead.org, n...@other.debian.org, linux...@vger.kernel.org, usb-s...@lists.one-eyed-alien.net
On 1/7/25 15:30, Christoph Hellwig wrote:
> De-duplicate the code for updating queue limits by adding a store_limit
> method that allows having common code handle the actual queue limits
> update.
>
> Note that this is a pure refactoring patch and does not address the
> existing freeze vs limits lock order problem in the refactored code,
> which will be addressed next.
>
> Signed-off-by: Christoph Hellwig <h...@lst.de>

Looks good.

Ming Lei

unread,
Jan 7, 2025, 5:25:38 AMJan 7
to Nilay Shroff, Christoph Hellwig, Jens Axboe, Damien Le Moal, linux...@vger.kernel.org, linux...@lists.infradead.org, n...@other.debian.org, linux...@vger.kernel.org, usb-s...@lists.one-eyed-alien.net
On Tue, Jan 07, 2025 at 01:21:14PM +0530, Nilay Shroff wrote:
>
>
> On 1/7/25 12:55 PM, Ming Lei wrote:
> > On Tue, Jan 07, 2025 at 07:30:36AM +0100, Christoph Hellwig wrote:
> >> De-duplicate the code for updating queue limits by adding a store_limit
> >> method that allows having common code handle the actual queue limits
> >> update.
> >>
> >> Note that this is a pure refactoring patch and does not address the
> >> existing freeze vs limits lock order problem in the refactored code,
> >> which will be addressed next.
> >>
> >> Signed-off-by: Christoph Hellwig <h...@lst.de>

...

> > Order between freeze and ->sysfs_lock is changed, and it may cause new
> > lockdep warning because we may freeze queue first before acquiring
> > ->sysfs_lock in del_gendisk().
> >
> On contrary, in elevator_disable() and elevator_switch() we acquire
> ->sysfs_lock first before freezing the queue. I think this is a mess and
> we need to fix ordering. We need to decide ordering rules. IMO, the
> correct order should be to acquire ->sysfs_lock before freezing queue.
> Likewise with this patch now we acquire ->limits_lock before freezing the
> queue.

__blk_mq_update_nr_hw_queues() freezes queue before acquiring ->syfs_lock too.

So yes, it is a mess wrt. order between ->sysfs_lock and freezing queue.


Thanks,
Ming

Christoph Hellwig

unread,
Jan 7, 2025, 8:49:21 AMJan 7
to Ming Lei, Nilay Shroff, Christoph Hellwig, Jens Axboe, Damien Le Moal, linux...@vger.kernel.org, linux...@lists.infradead.org, n...@other.debian.org, linux...@vger.kernel.org, usb-s...@lists.one-eyed-alien.net
On Tue, Jan 07, 2025 at 06:25:20PM +0800, Ming Lei wrote:
> __blk_mq_update_nr_hw_queues() freezes queue before acquiring ->syfs_lock too.
>
> So yes, it is a mess wrt. order between ->sysfs_lock and freezing queue.

Let's sort out the current freeze vs limits lock issue first. Next step
is to kill sysfs_lock in it's current form.

Christoph Hellwig

unread,
Jan 7, 2025, 8:52:00 AMJan 7
to Nilay Shroff, Christoph Hellwig, Jens Axboe, Damien Le Moal, Ming Lei, linux...@vger.kernel.org, linux...@lists.infradead.org, n...@other.debian.org, linux...@vger.kernel.org, usb-s...@lists.one-eyed-alien.net
On Tue, Jan 07, 2025 at 02:53:40PM +0530, Nilay Shroff wrote:
> When I applied you patch on my system and access io_poll attribute
> of one of my nvme disk, I see it returns 1, though I didn't configure
> poll queue for the disk. With this patch, as we're now always setting
> BLK_FEAT_POLL (under blk_mq_alloc_queue()) it return 1. So when I haven't
> configured poll queue for NVMe driver, shouldn't it return 0 when I access
> /sys/block/nvmeXnY/queue/io_poll ?

While that was the case with the previous RFC series it should not be
the case with this version, as the nvme driver does not enable the
poll tag set map unless poll queues are enabled. I also double checked
that I do not see it on any of my test setups.

Nilay Shroff

unread,
Jan 7, 2025, 12:55:28 PMJan 7
to Christoph Hellwig, Jens Axboe, Damien Le Moal, Ming Lei, linux...@vger.kernel.org, linux...@lists.infradead.org, n...@other.debian.org, linux...@vger.kernel.org, usb-s...@lists.one-eyed-alien.net
Ohk I did install previous RFC series and tested it.
On another note, with latest patch series, assuming NVMe driver reports polling
support when it's loaded, accessing io_poll under sysfs reports 1. This is good.
However later resizing queue so that no poll queue is left and I reset the controller
and then access the io_poll it still reports 1. Is this expected? Other than this
everything else looks fine.

Thanks,
--Nilay

Nilay Shroff

unread,
Jan 7, 2025, 12:58:27 PMJan 7
to Christoph Hellwig, Jens Axboe, Damien Le Moal, Ming Lei, linux...@vger.kernel.org, linux...@lists.infradead.org, n...@other.debian.org, linux...@vger.kernel.org, usb-s...@lists.one-eyed-alien.net
Looks good to me.
Reviewed-by: Nilay Shroff <ni...@linux.ibm.com>



Nilay Shroff

unread,
Jan 7, 2025, 12:59:00 PMJan 7
to Christoph Hellwig, Jens Axboe, Damien Le Moal, Ming Lei, linux...@vger.kernel.org, linux...@lists.infradead.org, n...@other.debian.org, linux...@vger.kernel.org, usb-s...@lists.one-eyed-alien.net

Nilay Shroff

unread,
Jan 7, 2025, 1:00:06 PMJan 7
to Christoph Hellwig, Jens Axboe, Damien Le Moal, Ming Lei, linux...@vger.kernel.org, linux...@lists.infradead.org, n...@other.debian.org, linux...@vger.kernel.org, usb-s...@lists.one-eyed-alien.net

Nilay Shroff

unread,
Jan 7, 2025, 1:00:54 PMJan 7
to Christoph Hellwig, Jens Axboe, Damien Le Moal, Ming Lei, linux...@vger.kernel.org, linux...@lists.infradead.org, n...@other.debian.org, linux...@vger.kernel.org, usb-s...@lists.one-eyed-alien.net

Nilay Shroff

unread,
Jan 7, 2025, 1:01:27 PMJan 7
to Christoph Hellwig, Jens Axboe, Damien Le Moal, Ming Lei, linux...@vger.kernel.org, linux...@lists.infradead.org, n...@other.debian.org, linux...@vger.kernel.org, usb-s...@lists.one-eyed-alien.net

Nilay Shroff

unread,
Jan 7, 2025, 1:02:01 PMJan 7
to Christoph Hellwig, Jens Axboe, Damien Le Moal, Ming Lei, linux...@vger.kernel.org, linux...@lists.infradead.org, n...@other.debian.org, linux...@vger.kernel.org, usb-s...@lists.one-eyed-alien.net

Nilay Shroff

unread,
Jan 7, 2025, 1:05:27 PMJan 7
to Christoph Hellwig, Jens Axboe, Damien Le Moal, Ming Lei, linux...@vger.kernel.org, linux...@lists.infradead.org, n...@other.debian.org, linux...@vger.kernel.org, usb-s...@lists.one-eyed-alien.net
On 1/7/25 12:31 PM, Nilay Shroff wrote:
>
>
> On 1/7/25 12:00 PM, Christoph Hellwig wrote:
>> queue_limits_commit_update is the function that needs to operate on a
>> frozen queue, not queue_limits_start_update. Update the kerneldoc
>> comments to reflect that.
>>
>> Signed-off-by: Christoph Hellwig <h...@lst.de>
>> ---
>> block/blk-settings.c | 3 ++-
>> include/linux/blkdev.h | 3 +--
>> 2 files changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/block/blk-settings.c b/block/blk-settings.c
>> index 8f09e33f41f6..b6b8c580d018 100644
>> --- a/block/blk-settings.c
>> +++ b/block/blk-settings.c
>> @@ -413,7 +413,8 @@ int blk_set_default_limits(struct queue_limits *lim)
>> * @lim: limits to apply
>> *
>> * Apply the limits in @lim that were obtained from queue_limits_start_update()
>> - * and updated by the caller to @q.
>> + * and updated by the caller to @q. The caller must have frozen the queue or
>> + * ensure that there are outstanding I/Os by other means.
>
> Maybe typo: "ensure that there are *NO* outstanding I/Os by other means"

Other than typo, everything else looks good. Once the above typo is fixed, please feel free to add:

Reviewed-by: Nilay Shroff <ni...@linux.ibm.com>


Ming Lei

unread,
Jan 7, 2025, 9:22:12 PMJan 7
to Christoph Hellwig, Jens Axboe, Damien Le Moal, Nilay Shroff, linux...@vger.kernel.org, linux...@lists.infradead.org, n...@other.debian.org, linux...@vger.kernel.org, usb-s...@lists.one-eyed-alien.net
On Tue, Jan 07, 2025 at 07:30:32AM +0100, Christoph Hellwig wrote:
> Hi all,
>
> this is my version of Damien's "Fix queue freeze and limit locking order".
> A lot looks very similar, but it was done independently based on the
> previous discussion.
>
> Changes since RFC:
> - fix a bizzare virtio_blk bisection snafu
> - set BLK_FEAT_POLL a little less eagerly for blk-mq
> - drop the loop patch just adding a comment
> - improve various commit logs and coments
>

loop_set_block_size() needs same change, can you cover it to this series?

Thanks,
Ming

John Garry

unread,
Jan 9, 2025, 6:19:58 AMJan 9
to Christoph Hellwig, Jens Axboe, Damien Le Moal, Ming Lei, Nilay Shroff, linux...@vger.kernel.org, linux...@lists.infradead.org, n...@other.debian.org, linux...@vger.kernel.org, usb-s...@lists.one-eyed-alien.net
On 07/01/2025 06:30, Christoph Hellwig wrote:
> queue_limits_commit_update is the function that needs to operate on a
> frozen queue, not queue_limits_start_update. Update the kerneldoc
> comments to reflect that.
>
> Signed-off-by: Christoph Hellwig <h...@lst.de>
> ---
> block/blk-settings.c | 3 ++-
> include/linux/blkdev.h | 3 +--
> 2 files changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/block/blk-settings.c b/block/blk-settings.c
> index 8f09e33f41f6..b6b8c580d018 100644
> --- a/block/blk-settings.c
> +++ b/block/blk-settings.c
> @@ -413,7 +413,8 @@ int blk_set_default_limits(struct queue_limits *lim)
> * @lim: limits to apply
> *
> * Apply the limits in @lim that were obtained from queue_limits_start_update()
> - * and updated by the caller to @q.
> + * and updated by the caller to @q. The caller must have frozen the queue or
> + * ensure that there are outstanding I/Os by other means.

is that a typo - /s/outstanding/ no outstanding/ ?

> *
> * Returns 0 if successful, else a negative error code.
> */
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index 5d40af2ef971..e781d4e6f92d 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -944,8 +944,7 @@ static inline unsigned int blk_boundary_sectors_left(sector_t offset,
> * the caller can modify. The caller must call queue_limits_commit_update()
> * to finish the update.
> *
> - * Context: process context. The caller must have frozen the queue or ensured
> - * that there is outstanding I/O by other means.
> + * Context: process context.
> */
> static inline struct queue_limits
> queue_limits_start_update(struct request_queue *q)

Nilay Shroff

unread,
Jan 9, 2025, 6:24:51 AMJan 9
to John Garry, Christoph Hellwig, Jens Axboe, Damien Le Moal, Ming Lei, linux...@vger.kernel.org, linux...@lists.infradead.org, n...@other.debian.org, linux...@vger.kernel.org, usb-s...@lists.one-eyed-alien.net


On 1/9/25 4:49 PM, John Garry wrote:
> On 07/01/2025 06:30, Christoph Hellwig wrote:
>> queue_limits_commit_update is the function that needs to operate on a
>> frozen queue, not queue_limits_start_update.  Update the kerneldoc
>> comments to reflect that.
>>
>> Signed-off-by: Christoph Hellwig <h...@lst.de>
>> ---
>>   block/blk-settings.c   | 3 ++-
>>   include/linux/blkdev.h | 3 +--
>>   2 files changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/block/blk-settings.c b/block/blk-settings.c
>> index 8f09e33f41f6..b6b8c580d018 100644
>> --- a/block/blk-settings.c
>> +++ b/block/blk-settings.c
>> @@ -413,7 +413,8 @@ int blk_set_default_limits(struct queue_limits *lim)
>>    * @lim:    limits to apply
>>    *
>>    * Apply the limits in @lim that were obtained from queue_limits_start_update()
>> - * and updated by the caller to @q.
>> + * and updated by the caller to @q.  The caller must have frozen the queue or
>> + * ensure that there are outstanding I/Os by other means.
>
> is that a typo - /s/outstanding/ no outstanding/ ?
It's already fixed in v3 here: https://lore.kernel.org/all/20250109055810...@lst.de/

Thanks,
--Nilay

John Garry

unread,
Jan 9, 2025, 6:27:25 AMJan 9
to Nilay Shroff, Christoph Hellwig, Jens Axboe, Damien Le Moal, Ming Lei, linux...@vger.kernel.org, linux...@lists.infradead.org, n...@other.debian.org, linux...@vger.kernel.org, usb-s...@lists.one-eyed-alien.net
On 09/01/2025 11:24, Nilay Shroff wrote:
>> is that a typo -/s/outstanding/ no outstanding/ ?
> It's already fixed in v3 here:https://urldefense.com/v3/__https://lore.kernel.org/
> all/20250109055810...@lst.de/__;!!ACWV5N9M2RV99hQ!
> IYos3CWlynlAtnimomFwr9cZ8nYQLZJ6haZCdjSYQp5lpMaD1aJRlX9OLTuHEU-
> XlB8R5BpZJQazJwzm2XQ$

oh, yes. I thought that there was a v3, but it was hiding in another
folder...

cheers
Reply all
Reply to author
Forward
0 new messages