[PATCH 0/8] dma-mapping: migrate to physical address-based API


Leon Romanovsky

Jun 25, 2025, 9:19:18 AM
to Marek Szyprowski, Christoph Hellwig, Jonathan Corbet, Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin, Christophe Leroy, Robin Murphy, Joerg Roedel, Will Deacon, Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez, Alexander Potapenko, Marco Elver, Dmitry Vyukov, Masami Hiramatsu, Mathieu Desnoyers, Jérôme Glisse, Andrew Morton, linu...@vger.kernel.org, linux-...@vger.kernel.org, linuxp...@lists.ozlabs.org, io...@lists.linux.dev, virtual...@lists.linux.dev, kasa...@googlegroups.com, linux-tra...@vger.kernel.org, linu...@kvack.org
This series refactors the DMA mapping to use physical addresses
as the primary interface instead of page+offset parameters. This
change aligns the DMA API with the underlying hardware reality where
DMA operations work with physical addresses, not page structures.

The series consists of 8 patches that progressively convert the DMA
mapping infrastructure from page-based to physical address-based APIs:

The series maintains backward compatibility by keeping the old
page-based API as wrapper functions around the new physical
address-based implementations.
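
To illustrate, the compatibility wrappers added later in the series (patch
"dma-mapping: export new dma_*map_phys() interface") boil down to thin
forwarding shims; simplified from that patch:

	/* old page+offset entry points forward to the phys-based primitives */
	dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
			size_t offset, size_t size, enum dma_data_direction dir,
			unsigned long attrs)
	{
		return dma_map_phys(dev, page_to_phys(page) + offset, size,
				    dir, attrs);
	}

	void dma_unmap_page_attrs(struct device *dev, dma_addr_t addr,
			size_t size, enum dma_data_direction dir,
			unsigned long attrs)
	{
		dma_unmap_phys(dev, addr, size, dir, attrs);
	}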

Thanks

Leon Romanovsky (8):
dma-debug: refactor to use physical addresses for page mapping
dma-mapping: rename trace_dma_*map_page to trace_dma_*map_phys
iommu/dma: rename iommu_dma_*map_page to iommu_dma_*map_phys
dma-mapping: convert dma_direct_*map_page to be phys_addr_t based
kmsan: convert kmsan_handle_dma to use physical addresses
dma-mapping: fail early if physical address is mapped through platform
callback
dma-mapping: export new dma_*map_phys() interface
mm/hmm: migrate to physical address-based DMA mapping API

Documentation/core-api/dma-api.rst | 4 +-
arch/powerpc/kernel/dma-iommu.c | 4 +-
drivers/iommu/dma-iommu.c | 14 +++----
drivers/virtio/virtio_ring.c | 4 +-
include/linux/dma-map-ops.h | 8 ++--
include/linux/dma-mapping.h | 13 ++++++
include/linux/iommu-dma.h | 7 ++--
include/linux/kmsan.h | 12 +++---
include/trace/events/dma.h | 4 +-
kernel/dma/debug.c | 28 ++++++++-----
kernel/dma/debug.h | 16 ++++---
kernel/dma/direct.c | 6 +--
kernel/dma/direct.h | 13 +++---
kernel/dma/mapping.c | 67 +++++++++++++++++++++---------
kernel/dma/ops_helpers.c | 6 +--
mm/hmm.c | 8 ++--
mm/kmsan/hooks.c | 36 ++++++++++++----
tools/virtio/linux/kmsan.h | 2 +-
18 files changed, 159 insertions(+), 93 deletions(-)

--
2.49.0

Leon Romanovsky

Jun 25, 2025, 9:19:21 AM
to Marek Szyprowski, Leon Romanovsky, Christoph Hellwig, Jonathan Corbet, Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin, Christophe Leroy, Robin Murphy, Joerg Roedel, Will Deacon, Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez, Alexander Potapenko, Marco Elver, Dmitry Vyukov, Masami Hiramatsu, Mathieu Desnoyers, Jérôme Glisse, Andrew Morton, linu...@vger.kernel.org, linux-...@vger.kernel.org, linuxp...@lists.ozlabs.org, io...@lists.linux.dev, virtual...@lists.linux.dev, kasa...@googlegroups.com, linux-tra...@vger.kernel.org, linu...@kvack.org
From: Leon Romanovsky <leo...@nvidia.com>

As a preparation for the upcoming map_page -> map_phys API conversion,
rename trace_dma_*map_page() to trace_dma_*map_phys().

Signed-off-by: Leon Romanovsky <leo...@nvidia.com>
---
include/trace/events/dma.h | 4 ++--
kernel/dma/mapping.c | 4 ++--
2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/trace/events/dma.h b/include/trace/events/dma.h
index d8ddc27b6a7c..c77d478b6deb 100644
--- a/include/trace/events/dma.h
+++ b/include/trace/events/dma.h
@@ -71,7 +71,7 @@ DEFINE_EVENT(dma_map, name, \
size_t size, enum dma_data_direction dir, unsigned long attrs), \
TP_ARGS(dev, phys_addr, dma_addr, size, dir, attrs))

-DEFINE_MAP_EVENT(dma_map_page);
+DEFINE_MAP_EVENT(dma_map_phys);
DEFINE_MAP_EVENT(dma_map_resource);

DECLARE_EVENT_CLASS(dma_unmap,
@@ -109,7 +109,7 @@ DEFINE_EVENT(dma_unmap, name, \
enum dma_data_direction dir, unsigned long attrs), \
TP_ARGS(dev, addr, size, dir, attrs))

-DEFINE_UNMAP_EVENT(dma_unmap_page);
+DEFINE_UNMAP_EVENT(dma_unmap_phys);
DEFINE_UNMAP_EVENT(dma_unmap_resource);

DECLARE_EVENT_CLASS(dma_alloc_class,
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index 4c1dfbabb8ae..fe1f0da6dc50 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -173,7 +173,7 @@ dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
else
addr = ops->map_page(dev, page, offset, size, dir, attrs);
kmsan_handle_dma(page, offset, size, dir);
- trace_dma_map_page(dev, phys, addr, size, dir, attrs);
+ trace_dma_map_phys(dev, phys, addr, size, dir, attrs);
debug_dma_map_phys(dev, phys, size, dir, addr, attrs);

return addr;
@@ -193,7 +193,7 @@ void dma_unmap_page_attrs(struct device *dev, dma_addr_t addr, size_t size,
iommu_dma_unmap_page(dev, addr, size, dir, attrs);
else
ops->unmap_page(dev, addr, size, dir, attrs);
- trace_dma_unmap_page(dev, addr, size, dir, attrs);
+ trace_dma_unmap_phys(dev, addr, size, dir, attrs);
debug_dma_unmap_phys(dev, addr, size, dir);
}
EXPORT_SYMBOL(dma_unmap_page_attrs);
--
2.49.0

Leon Romanovsky

Jun 25, 2025, 9:19:26 AM
to Marek Szyprowski, Leon Romanovsky, Christoph Hellwig, Jonathan Corbet, Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin, Christophe Leroy, Robin Murphy, Joerg Roedel, Will Deacon, Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez, Alexander Potapenko, Marco Elver, Dmitry Vyukov, Masami Hiramatsu, Mathieu Desnoyers, Jérôme Glisse, Andrew Morton, linu...@vger.kernel.org, linux-...@vger.kernel.org, linuxp...@lists.ozlabs.org, io...@lists.linux.dev, virtual...@lists.linux.dev, kasa...@googlegroups.com, linux-tra...@vger.kernel.org, linu...@kvack.org
From: Leon Romanovsky <leo...@nvidia.com>

Convert the DMA debug infrastructure from page-based to physical address-based
mapping, in preparation for DMA mapping routines that rely on physical addresses.

The refactoring renames debug_dma_map_page() to debug_dma_map_phys() and
changes its signature to accept a phys_addr_t parameter instead of struct page
and offset. Similarly, debug_dma_unmap_page() becomes debug_dma_unmap_phys().
A new dma_debug_phy entry type is introduced to distinguish physical address
mappings from other debug entry types. All callers throughout the codebase are
updated to pass physical addresses directly.

Dropping the page-to-physical conversion in the debug layer makes the code more
efficient and consistent with the DMA mapping API's physical address focus.

Signed-off-by: Leon Romanovsky <leo...@nvidia.com>
---
Documentation/core-api/dma-api.rst | 4 ++--
kernel/dma/debug.c | 28 +++++++++++++++++-----------
kernel/dma/debug.h | 16 +++++++---------
kernel/dma/mapping.c | 15 ++++++++-------
4 files changed, 34 insertions(+), 29 deletions(-)

diff --git a/Documentation/core-api/dma-api.rst b/Documentation/core-api/dma-api.rst
index 2ad08517e626..7491ee85ab25 100644
--- a/Documentation/core-api/dma-api.rst
+++ b/Documentation/core-api/dma-api.rst
@@ -816,7 +816,7 @@ example warning message may look like this::
[<ffffffff80235177>] find_busiest_group+0x207/0x8a0
[<ffffffff8064784f>] _spin_lock_irqsave+0x1f/0x50
[<ffffffff803c7ea3>] check_unmap+0x203/0x490
- [<ffffffff803c8259>] debug_dma_unmap_page+0x49/0x50
+ [<ffffffff803c8259>] debug_dma_unmap_phys+0x49/0x50
[<ffffffff80485f26>] nv_tx_done_optimized+0xc6/0x2c0
[<ffffffff80486c13>] nv_nic_irq_optimized+0x73/0x2b0
[<ffffffff8026df84>] handle_IRQ_event+0x34/0x70
@@ -910,7 +910,7 @@ that a driver may be leaking mappings.
dma-debug interface debug_dma_mapping_error() to debug drivers that fail
to check DMA mapping errors on addresses returned by dma_map_single() and
dma_map_page() interfaces. This interface clears a flag set by
-debug_dma_map_page() to indicate that dma_mapping_error() has been called by
+debug_dma_map_phys() to indicate that dma_mapping_error() has been called by
the driver. When driver does unmap, debug_dma_unmap() checks the flag and if
this flag is still set, prints warning message that includes call trace that
leads up to the unmap. This interface can be called from dma_mapping_error()
diff --git a/kernel/dma/debug.c b/kernel/dma/debug.c
index e43c6de2bce4..517dc58329e0 100644
--- a/kernel/dma/debug.c
+++ b/kernel/dma/debug.c
@@ -39,6 +39,7 @@ enum {
dma_debug_sg,
dma_debug_coherent,
dma_debug_resource,
+ dma_debug_phy,
};

enum map_err_types {
@@ -141,6 +142,7 @@ static const char *type2name[] = {
[dma_debug_sg] = "scatter-gather",
[dma_debug_coherent] = "coherent",
[dma_debug_resource] = "resource",
+ [dma_debug_phy] = "phy",
};

static const char *dir2name[] = {
@@ -1201,9 +1203,8 @@ void debug_dma_map_single(struct device *dev, const void *addr,
}
EXPORT_SYMBOL(debug_dma_map_single);

-void debug_dma_map_page(struct device *dev, struct page *page, size_t offset,
- size_t size, int direction, dma_addr_t dma_addr,
- unsigned long attrs)
+void debug_dma_map_phys(struct device *dev, phys_addr_t phys, size_t size,
+ int direction, dma_addr_t dma_addr, unsigned long attrs)
{
struct dma_debug_entry *entry;

@@ -1218,19 +1219,24 @@ void debug_dma_map_page(struct device *dev, struct page *page, size_t offset,
return;

entry->dev = dev;
- entry->type = dma_debug_single;
- entry->paddr = page_to_phys(page) + offset;
+ entry->type = dma_debug_phy;
+ entry->paddr = phys;
entry->dev_addr = dma_addr;
entry->size = size;
entry->direction = direction;
entry->map_err_type = MAP_ERR_NOT_CHECKED;

- check_for_stack(dev, page, offset);
+ if (pfn_valid(PHYS_PFN(phys))) {
+ struct page *page = phys_to_page(phys);
+ size_t offset = offset_in_page(phys);

- if (!PageHighMem(page)) {
- void *addr = page_address(page) + offset;
+ check_for_stack(dev, page, offset);

- check_for_illegal_area(dev, addr, size);
+ if (!PageHighMem(page)) {
+ void *addr = page_address(page) + offset;
+
+ check_for_illegal_area(dev, addr, size);
+ }
}

add_dma_entry(entry, attrs);
@@ -1274,11 +1280,11 @@ void debug_dma_mapping_error(struct device *dev, dma_addr_t dma_addr)
}
EXPORT_SYMBOL(debug_dma_mapping_error);

-void debug_dma_unmap_page(struct device *dev, dma_addr_t dma_addr,
+void debug_dma_unmap_phys(struct device *dev, dma_addr_t dma_addr,
size_t size, int direction)
{
struct dma_debug_entry ref = {
- .type = dma_debug_single,
+ .type = dma_debug_phy,
.dev = dev,
.dev_addr = dma_addr,
.size = size,
diff --git a/kernel/dma/debug.h b/kernel/dma/debug.h
index f525197d3cae..76adb42bffd5 100644
--- a/kernel/dma/debug.h
+++ b/kernel/dma/debug.h
@@ -9,12 +9,11 @@
#define _KERNEL_DMA_DEBUG_H

#ifdef CONFIG_DMA_API_DEBUG
-extern void debug_dma_map_page(struct device *dev, struct page *page,
- size_t offset, size_t size,
- int direction, dma_addr_t dma_addr,
+extern void debug_dma_map_phys(struct device *dev, phys_addr_t phys,
+ size_t size, int direction, dma_addr_t dma_addr,
unsigned long attrs);

-extern void debug_dma_unmap_page(struct device *dev, dma_addr_t addr,
+extern void debug_dma_unmap_phys(struct device *dev, dma_addr_t addr,
size_t size, int direction);

extern void debug_dma_map_sg(struct device *dev, struct scatterlist *sg,
@@ -55,14 +54,13 @@ extern void debug_dma_sync_sg_for_device(struct device *dev,
struct scatterlist *sg,
int nelems, int direction);
#else /* CONFIG_DMA_API_DEBUG */
-static inline void debug_dma_map_page(struct device *dev, struct page *page,
- size_t offset, size_t size,
- int direction, dma_addr_t dma_addr,
- unsigned long attrs)
+static inline void debug_dma_map_phys(struct device *dev, phys_addr_t phys,
+ size_t size, int direction,
+ dma_addr_t dma_addr, unsigned long attrs)
{
}

-static inline void debug_dma_unmap_page(struct device *dev, dma_addr_t addr,
+static inline void debug_dma_unmap_phys(struct device *dev, dma_addr_t addr,
size_t size, int direction)
{
}
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index 107e4a4d251d..4c1dfbabb8ae 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -157,6 +157,7 @@ dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
unsigned long attrs)
{
const struct dma_map_ops *ops = get_dma_ops(dev);
+ phys_addr_t phys = page_to_phys(page) + offset;
dma_addr_t addr;

BUG_ON(!valid_dma_direction(dir));
@@ -165,16 +166,15 @@ dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
return DMA_MAPPING_ERROR;

if (dma_map_direct(dev, ops) ||
- arch_dma_map_page_direct(dev, page_to_phys(page) + offset + size))
+ arch_dma_map_page_direct(dev, phys + size))
addr = dma_direct_map_page(dev, page, offset, size, dir, attrs);
else if (use_dma_iommu(dev))
addr = iommu_dma_map_page(dev, page, offset, size, dir, attrs);
else
addr = ops->map_page(dev, page, offset, size, dir, attrs);
kmsan_handle_dma(page, offset, size, dir);
- trace_dma_map_page(dev, page_to_phys(page) + offset, addr, size, dir,
- attrs);
- debug_dma_map_page(dev, page, offset, size, dir, addr, attrs);
+ trace_dma_map_page(dev, phys, addr, size, dir, attrs);
+ debug_dma_map_phys(dev, phys, size, dir, addr, attrs);

return addr;
}
@@ -194,7 +194,7 @@ void dma_unmap_page_attrs(struct device *dev, dma_addr_t addr, size_t size,
else
ops->unmap_page(dev, addr, size, dir, attrs);
trace_dma_unmap_page(dev, addr, size, dir, attrs);
- debug_dma_unmap_page(dev, addr, size, dir);
+ debug_dma_unmap_phys(dev, addr, size, dir);
}
EXPORT_SYMBOL(dma_unmap_page_attrs);

@@ -712,7 +712,8 @@ struct page *dma_alloc_pages(struct device *dev, size_t size,
if (page) {
trace_dma_alloc_pages(dev, page_to_virt(page), *dma_handle,
size, dir, gfp, 0);
- debug_dma_map_page(dev, page, 0, size, dir, *dma_handle, 0);
+ debug_dma_map_phys(dev, page_to_phys(page), size, dir,
+ *dma_handle, 0);
} else {
trace_dma_alloc_pages(dev, NULL, 0, size, dir, gfp, 0);
}
@@ -738,7 +739,7 @@ void dma_free_pages(struct device *dev, size_t size, struct page *page,
dma_addr_t dma_handle, enum dma_data_direction dir)
{
trace_dma_free_pages(dev, page_to_virt(page), dma_handle, size, dir, 0);
- debug_dma_unmap_page(dev, dma_handle, size, dir);
+ debug_dma_unmap_phys(dev, dma_handle, size, dir);
__dma_free_pages(dev, size, page, dma_handle, dir);
}
EXPORT_SYMBOL_GPL(dma_free_pages);
--
2.49.0

Leon Romanovsky

Jun 25, 2025, 9:19:33 AM
to Marek Szyprowski, Leon Romanovsky, Christoph Hellwig, Jonathan Corbet, Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin, Christophe Leroy, Robin Murphy, Joerg Roedel, Will Deacon, Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez, Alexander Potapenko, Marco Elver, Dmitry Vyukov, Masami Hiramatsu, Mathieu Desnoyers, Jérôme Glisse, Andrew Morton, linu...@vger.kernel.org, linux-...@vger.kernel.org, linuxp...@lists.ozlabs.org, io...@lists.linux.dev, virtual...@lists.linux.dev, kasa...@googlegroups.com, linux-tra...@vger.kernel.org, linu...@kvack.org
From: Leon Romanovsky <leo...@nvidia.com>

Rename the IOMMU DMA mapping functions to better reflect their actual
calling convention. The functions iommu_dma_map_page() and
iommu_dma_unmap_page() are renamed to iommu_dma_map_phys() and
iommu_dma_unmap_phys() respectively, as they already operate on physical
addresses rather than page structures.

The calling convention changes from accepting (struct page *page,
unsigned long offset) to (phys_addr_t phys), which eliminates the need
for page-to-physical address conversion within the functions. This
renaming prepares for the broader DMA API conversion from page-based
to physical address-based mapping throughout the kernel.

All callers are updated to pass physical addresses directly, including
dma_map_page_attrs(), scatterlist mapping functions, and DMA page
allocation helpers. The change simplifies the code by removing the
page_to_phys() + offset calculation that was previously done inside
the IOMMU functions.

Signed-off-by: Leon Romanovsky <leo...@nvidia.com>
---
drivers/iommu/dma-iommu.c | 14 ++++++--------
include/linux/iommu-dma.h | 7 +++----
kernel/dma/mapping.c | 4 ++--
kernel/dma/ops_helpers.c | 6 +++---
4 files changed, 14 insertions(+), 17 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index ea2ef53bd4fe..cd4bc22efa96 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -1190,11 +1190,9 @@ static inline size_t iova_unaligned(struct iova_domain *iovad, phys_addr_t phys,
return iova_offset(iovad, phys | size);
}

-dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page,
- unsigned long offset, size_t size, enum dma_data_direction dir,
- unsigned long attrs)
+dma_addr_t iommu_dma_map_phys(struct device *dev, phys_addr_t phys, size_t size,
+ enum dma_data_direction dir, unsigned long attrs)
{
- phys_addr_t phys = page_to_phys(page) + offset;
bool coherent = dev_is_dma_coherent(dev);
int prot = dma_info_to_prot(dir, coherent, attrs);
struct iommu_domain *domain = iommu_get_dma_domain(dev);
@@ -1222,7 +1220,7 @@ dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page,
return iova;
}

-void iommu_dma_unmap_page(struct device *dev, dma_addr_t dma_handle,
+void iommu_dma_unmap_phys(struct device *dev, dma_addr_t dma_handle,
size_t size, enum dma_data_direction dir, unsigned long attrs)
{
struct iommu_domain *domain = iommu_get_dma_domain(dev);
@@ -1341,7 +1339,7 @@ static void iommu_dma_unmap_sg_swiotlb(struct device *dev, struct scatterlist *s
int i;

for_each_sg(sg, s, nents, i)
- iommu_dma_unmap_page(dev, sg_dma_address(s),
+ iommu_dma_unmap_phys(dev, sg_dma_address(s),
sg_dma_len(s), dir, attrs);
}

@@ -1354,8 +1352,8 @@ static int iommu_dma_map_sg_swiotlb(struct device *dev, struct scatterlist *sg,
sg_dma_mark_swiotlb(sg);

for_each_sg(sg, s, nents, i) {
- sg_dma_address(s) = iommu_dma_map_page(dev, sg_page(s),
- s->offset, s->length, dir, attrs);
+ sg_dma_address(s) = iommu_dma_map_phys(dev, sg_phys(s),
+ s->length, dir, attrs);
if (sg_dma_address(s) == DMA_MAPPING_ERROR)
goto out_unmap;
sg_dma_len(s) = s->length;
diff --git a/include/linux/iommu-dma.h b/include/linux/iommu-dma.h
index 508beaa44c39..485bdffed988 100644
--- a/include/linux/iommu-dma.h
+++ b/include/linux/iommu-dma.h
@@ -21,10 +21,9 @@ static inline bool use_dma_iommu(struct device *dev)
}
#endif /* CONFIG_IOMMU_DMA */

-dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page,
- unsigned long offset, size_t size, enum dma_data_direction dir,
- unsigned long attrs);
-void iommu_dma_unmap_page(struct device *dev, dma_addr_t dma_handle,
+dma_addr_t iommu_dma_map_phys(struct device *dev, phys_addr_t phys, size_t size,
+ enum dma_data_direction dir, unsigned long attrs);
+void iommu_dma_unmap_phys(struct device *dev, dma_addr_t dma_handle,
size_t size, enum dma_data_direction dir, unsigned long attrs);
int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg, int nents,
enum dma_data_direction dir, unsigned long attrs);
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index fe1f0da6dc50..58482536db9b 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -169,7 +169,7 @@ dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
arch_dma_map_page_direct(dev, phys + size))
addr = dma_direct_map_page(dev, page, offset, size, dir, attrs);
else if (use_dma_iommu(dev))
- addr = iommu_dma_map_page(dev, page, offset, size, dir, attrs);
+ addr = iommu_dma_map_phys(dev, phys, size, dir, attrs);
else
addr = ops->map_page(dev, page, offset, size, dir, attrs);
kmsan_handle_dma(page, offset, size, dir);
@@ -190,7 +190,7 @@ void dma_unmap_page_attrs(struct device *dev, dma_addr_t addr, size_t size,
arch_dma_unmap_page_direct(dev, addr + size))
dma_direct_unmap_page(dev, addr, size, dir, attrs);
else if (use_dma_iommu(dev))
- iommu_dma_unmap_page(dev, addr, size, dir, attrs);
+ iommu_dma_unmap_phys(dev, addr, size, dir, attrs);
else
ops->unmap_page(dev, addr, size, dir, attrs);
trace_dma_unmap_phys(dev, addr, size, dir, attrs);
diff --git a/kernel/dma/ops_helpers.c b/kernel/dma/ops_helpers.c
index 9afd569eadb9..6f9d604d9d40 100644
--- a/kernel/dma/ops_helpers.c
+++ b/kernel/dma/ops_helpers.c
@@ -72,8 +72,8 @@ struct page *dma_common_alloc_pages(struct device *dev, size_t size,
return NULL;

if (use_dma_iommu(dev))
- *dma_handle = iommu_dma_map_page(dev, page, 0, size, dir,
- DMA_ATTR_SKIP_CPU_SYNC);
+ *dma_handle = iommu_dma_map_phys(dev, page_to_phys(page), size,
+ dir, DMA_ATTR_SKIP_CPU_SYNC);
else
*dma_handle = ops->map_page(dev, page, 0, size, dir,
DMA_ATTR_SKIP_CPU_SYNC);
@@ -92,7 +92,7 @@ void dma_common_free_pages(struct device *dev, size_t size, struct page *page,
const struct dma_map_ops *ops = get_dma_ops(dev);

if (use_dma_iommu(dev))
- iommu_dma_unmap_page(dev, dma_handle, size, dir,
+ iommu_dma_unmap_phys(dev, dma_handle, size, dir,
DMA_ATTR_SKIP_CPU_SYNC);
else if (ops->unmap_page)
ops->unmap_page(dev, dma_handle, size, dir,
--
2.49.0

Leon Romanovsky

Jun 25, 2025, 9:19:37 AM
to Marek Szyprowski, Leon Romanovsky, Christoph Hellwig, Jonathan Corbet, Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin, Christophe Leroy, Robin Murphy, Joerg Roedel, Will Deacon, Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez, Alexander Potapenko, Marco Elver, Dmitry Vyukov, Masami Hiramatsu, Mathieu Desnoyers, Jérôme Glisse, Andrew Morton, linu...@vger.kernel.org, linux-...@vger.kernel.org, linuxp...@lists.ozlabs.org, io...@lists.linux.dev, virtual...@lists.linux.dev, kasa...@googlegroups.com, linux-tra...@vger.kernel.org, linu...@kvack.org
From: Leon Romanovsky <leo...@nvidia.com>

Convert the KMSAN DMA handling function from page-based to physical
address-based interface.

The refactoring renames kmsan_handle_dma() parameters from accepting
(struct page *page, size_t offset, size_t size) to (phys_addr_t phys,
size_t size). A pfn_valid() check is added so that KMSAN skips
addresses that are not backed by a struct page.

As part of this change, support for highmem addresses is implemented
using kmap_local_page() to handle both lowmem and highmem regions
properly. All callers throughout the codebase are updated to use the
new phys_addr_t based interface.

Signed-off-by: Leon Romanovsky <leo...@nvidia.com>
---
drivers/virtio/virtio_ring.c | 4 ++--
include/linux/kmsan.h | 12 +++++++-----
kernel/dma/mapping.c | 2 +-
mm/kmsan/hooks.c | 36 +++++++++++++++++++++++++++++-------
tools/virtio/linux/kmsan.h | 2 +-
5 files changed, 40 insertions(+), 16 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index b784aab66867..dab49385e3e8 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -378,7 +378,7 @@ static int vring_map_one_sg(const struct vring_virtqueue *vq, struct scatterlist
* is initialized by the hardware. Explicitly check/unpoison it
* depending on the direction.
*/
- kmsan_handle_dma(sg_page(sg), sg->offset, sg->length, direction);
+ kmsan_handle_dma(sg_phys(sg), sg->length, direction);
*addr = (dma_addr_t)sg_phys(sg);
return 0;
}
@@ -3149,7 +3149,7 @@ dma_addr_t virtqueue_dma_map_single_attrs(struct virtqueue *_vq, void *ptr,
struct vring_virtqueue *vq = to_vvq(_vq);

if (!vq->use_dma_api) {
- kmsan_handle_dma(virt_to_page(ptr), offset_in_page(ptr), size, dir);
+ kmsan_handle_dma(virt_to_phys(ptr), size, dir);
return (dma_addr_t)virt_to_phys(ptr);
}

diff --git a/include/linux/kmsan.h b/include/linux/kmsan.h
index 2b1432cc16d5..6f27b9824ef7 100644
--- a/include/linux/kmsan.h
+++ b/include/linux/kmsan.h
@@ -182,8 +182,7 @@ void kmsan_iounmap_page_range(unsigned long start, unsigned long end);

/**
* kmsan_handle_dma() - Handle a DMA data transfer.
- * @page: first page of the buffer.
- * @offset: offset of the buffer within the first page.
+ * @phys: physical address of the buffer.
* @size: buffer size.
* @dir: one of possible dma_data_direction values.
*
@@ -191,8 +190,11 @@ void kmsan_iounmap_page_range(unsigned long start, unsigned long end);
* * checks the buffer, if it is copied to device;
* * initializes the buffer, if it is copied from device;
* * does both, if this is a DMA_BIDIRECTIONAL transfer.
+ *
+ * The function handles page lookup internally and supports both lowmem
+ * and highmem addresses.
*/
-void kmsan_handle_dma(struct page *page, size_t offset, size_t size,
+void kmsan_handle_dma(phys_addr_t phys, size_t size,
enum dma_data_direction dir);

/**
@@ -372,8 +374,8 @@ static inline void kmsan_iounmap_page_range(unsigned long start,
{
}

-static inline void kmsan_handle_dma(struct page *page, size_t offset,
- size_t size, enum dma_data_direction dir)
+static inline void kmsan_handle_dma(phys_addr_t phys, size_t size,
+ enum dma_data_direction dir)
{
}

diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index 80481a873340..709405d46b2b 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -172,7 +172,7 @@ dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
addr = iommu_dma_map_phys(dev, phys, size, dir, attrs);
else
addr = ops->map_page(dev, page, offset, size, dir, attrs);
- kmsan_handle_dma(page, offset, size, dir);
+ kmsan_handle_dma(phys, size, dir);
trace_dma_map_phys(dev, phys, addr, size, dir, attrs);
debug_dma_map_phys(dev, phys, size, dir, addr, attrs);

diff --git a/mm/kmsan/hooks.c b/mm/kmsan/hooks.c
index 97de3d6194f0..eab7912a3bf0 100644
--- a/mm/kmsan/hooks.c
+++ b/mm/kmsan/hooks.c
@@ -336,25 +336,48 @@ static void kmsan_handle_dma_page(const void *addr, size_t size,
}

/* Helper function to handle DMA data transfers. */
-void kmsan_handle_dma(struct page *page, size_t offset, size_t size,
+void kmsan_handle_dma(phys_addr_t phys, size_t size,
enum dma_data_direction dir)
{
u64 page_offset, to_go, addr;
+ struct page *page;
+ void *kaddr;

- if (PageHighMem(page))
+ if (!pfn_valid(PHYS_PFN(phys)))
return;
- addr = (u64)page_address(page) + offset;
+
+ page = phys_to_page(phys);
+ page_offset = offset_in_page(phys);
+
/*
* The kernel may occasionally give us adjacent DMA pages not belonging
* to the same allocation. Process them separately to avoid triggering
* internal KMSAN checks.
*/
while (size > 0) {
- page_offset = offset_in_page(addr);
to_go = min(PAGE_SIZE - page_offset, (u64)size);
+
+ if (PageHighMem(page))
+ /* Handle highmem pages using kmap */
+ kaddr = kmap_local_page(page);
+ else
+ /* Lowmem pages can be accessed directly */
+ kaddr = page_address(page);
+
+ addr = (u64)kaddr + page_offset;
kmsan_handle_dma_page((void *)addr, to_go, dir);
- addr += to_go;
+
+ if (PageHighMem(page))
+ kunmap_local(kaddr);
+
+ phys += to_go;
size -= to_go;
+
+ /* Move to next page if needed */
+ if (size > 0) {
+ page = phys_to_page(phys);
+ page_offset = offset_in_page(phys);
+ }
}
}
EXPORT_SYMBOL_GPL(kmsan_handle_dma);
@@ -366,8 +389,7 @@ void kmsan_handle_dma_sg(struct scatterlist *sg, int nents,
int i;

for_each_sg(sg, item, nents, i)
- kmsan_handle_dma(sg_page(item), item->offset, item->length,
- dir);
+ kmsan_handle_dma(sg_phys(item), item->length, dir);
}

/* Functions from kmsan-checks.h follow. */
diff --git a/tools/virtio/linux/kmsan.h b/tools/virtio/linux/kmsan.h
index 272b5aa285d5..6cd2e3efd03d 100644
--- a/tools/virtio/linux/kmsan.h
+++ b/tools/virtio/linux/kmsan.h
@@ -4,7 +4,7 @@

#include <linux/gfp.h>

-inline void kmsan_handle_dma(struct page *page, size_t offset, size_t size,
+inline void kmsan_handle_dma(phys_addr_t phys, size_t size,
enum dma_data_direction dir)
{
}
--
2.49.0

Leon Romanovsky

Jun 25, 2025, 9:19:40 AM
to Marek Szyprowski, Leon Romanovsky, Christoph Hellwig, Jonathan Corbet, Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin, Christophe Leroy, Robin Murphy, Joerg Roedel, Will Deacon, Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez, Alexander Potapenko, Marco Elver, Dmitry Vyukov, Masami Hiramatsu, Mathieu Desnoyers, Jérôme Glisse, Andrew Morton, linu...@vger.kernel.org, linux-...@vger.kernel.org, linuxp...@lists.ozlabs.org, io...@lists.linux.dev, virtual...@lists.linux.dev, kasa...@googlegroups.com, linux-tra...@vger.kernel.org, linu...@kvack.org
From: Leon Romanovsky <leo...@nvidia.com>

None of the platforms that implement the .map_page() callback support
physical addresses without a real struct page behind them. Add a check
for this condition.

Signed-off-by: Leon Romanovsky <leo...@nvidia.com>
---
kernel/dma/mapping.c | 15 ++++++++++++++-
1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index 709405d46b2b..74efb6909103 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -158,6 +158,7 @@ dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
{
const struct dma_map_ops *ops = get_dma_ops(dev);
phys_addr_t phys = page_to_phys(page) + offset;
+ bool is_pfn_valid = true;
dma_addr_t addr;

BUG_ON(!valid_dma_direction(dir));
@@ -170,8 +171,20 @@ dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
addr = dma_direct_map_phys(dev, phys, size, dir, attrs);
else if (use_dma_iommu(dev))
addr = iommu_dma_map_phys(dev, phys, size, dir, attrs);
- else
+ else {
+ if (IS_ENABLED(CONFIG_DMA_API_DEBUG))
+ is_pfn_valid = pfn_valid(PHYS_PFN(phys));
+
+ if (unlikely(!is_pfn_valid))
+ return DMA_MAPPING_ERROR;
+
+ /*
+ * All platforms which implement .map_page() don't support
+ * non-struct page backed addresses.
+ */
addr = ops->map_page(dev, page, offset, size, dir, attrs);
+ }
+
kmsan_handle_dma(phys, size, dir);
trace_dma_map_phys(dev, phys, addr, size, dir, attrs);
debug_dma_map_phys(dev, phys, size, dir, addr, attrs);
--
2.49.0

Leon Romanovsky

Jun 25, 2025, 9:19:43 AM
to Marek Szyprowski, Leon Romanovsky, Christoph Hellwig, Jonathan Corbet, Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin, Christophe Leroy, Robin Murphy, Joerg Roedel, Will Deacon, Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez, Alexander Potapenko, Marco Elver, Dmitry Vyukov, Masami Hiramatsu, Mathieu Desnoyers, Jérôme Glisse, Andrew Morton, linu...@vger.kernel.org, linux-...@vger.kernel.org, linuxp...@lists.ozlabs.org, io...@lists.linux.dev, virtual...@lists.linux.dev, kasa...@googlegroups.com, linux-tra...@vger.kernel.org, linu...@kvack.org
From: Leon Romanovsky <leo...@nvidia.com>

Introduce new DMA mapping functions dma_map_phys() and dma_unmap_phys()
that operate directly on physical addresses instead of page+offset
parameters. This provides a more efficient interface for drivers that
already have physical addresses available.

The new functions are implemented as the primary mapping layer, with
the existing dma_map_page_attrs() and dma_unmap_page_attrs() functions
converted to simple wrappers around the phys-based implementations.

The old page-based API is preserved in mapping.c so that existing code is
not affected by the new dma_*map_phys() interface being exported with
EXPORT_SYMBOL_GPL instead of EXPORT_SYMBOL.

Signed-off-by: Leon Romanovsky <leo...@nvidia.com>
---
include/linux/dma-mapping.h | 13 +++++++++++++
kernel/dma/mapping.c | 25 ++++++++++++++++++++-----
2 files changed, 33 insertions(+), 5 deletions(-)

diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 55c03e5fe8cb..ba54bbeca861 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -118,6 +118,10 @@ dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
unsigned long attrs);
void dma_unmap_page_attrs(struct device *dev, dma_addr_t addr, size_t size,
enum dma_data_direction dir, unsigned long attrs);
+dma_addr_t dma_map_phys(struct device *dev, phys_addr_t phys, size_t size,
+ enum dma_data_direction dir, unsigned long attrs);
+void dma_unmap_phys(struct device *dev, dma_addr_t addr, size_t size,
+ enum dma_data_direction dir, unsigned long attrs);
unsigned int dma_map_sg_attrs(struct device *dev, struct scatterlist *sg,
int nents, enum dma_data_direction dir, unsigned long attrs);
void dma_unmap_sg_attrs(struct device *dev, struct scatterlist *sg,
@@ -172,6 +176,15 @@ static inline void dma_unmap_page_attrs(struct device *dev, dma_addr_t addr,
size_t size, enum dma_data_direction dir, unsigned long attrs)
{
}
+static inline dma_addr_t dma_map_phys(struct device *dev, phys_addr_t phys,
+ size_t size, enum dma_data_direction dir, unsigned long attrs)
+{
+ return DMA_MAPPING_ERROR;
+}
+static inline void dma_unmap_phys(struct device *dev, dma_addr_t addr,
+ size_t size, enum dma_data_direction dir, unsigned long attrs)
+{
+}
static inline unsigned int dma_map_sg_attrs(struct device *dev,
struct scatterlist *sg, int nents, enum dma_data_direction dir,
unsigned long attrs)
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index 74efb6909103..29e8594a725a 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -152,12 +152,12 @@ static inline bool dma_map_direct(struct device *dev,
return dma_go_direct(dev, *dev->dma_mask, ops);
}

-dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
- size_t offset, size_t size, enum dma_data_direction dir,
- unsigned long attrs)
+dma_addr_t dma_map_phys(struct device *dev, phys_addr_t phys, size_t size,
+ enum dma_data_direction dir, unsigned long attrs)
{
const struct dma_map_ops *ops = get_dma_ops(dev);
- phys_addr_t phys = page_to_phys(page) + offset;
+ struct page *page = phys_to_page(phys);
+ size_t offset = offset_in_page(phys);
bool is_pfn_valid = true;
dma_addr_t addr;

@@ -191,9 +191,17 @@ dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,

return addr;
}
+EXPORT_SYMBOL_GPL(dma_map_phys);
+
+dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
+ size_t offset, size_t size, enum dma_data_direction dir,
+ unsigned long attrs)
+{
+ return dma_map_phys(dev, page_to_phys(page) + offset, size, dir, attrs);
+}
EXPORT_SYMBOL(dma_map_page_attrs);

-void dma_unmap_page_attrs(struct device *dev, dma_addr_t addr, size_t size,
+void dma_unmap_phys(struct device *dev, dma_addr_t addr, size_t size,
enum dma_data_direction dir, unsigned long attrs)
{
const struct dma_map_ops *ops = get_dma_ops(dev);
@@ -209,6 +217,13 @@ void dma_unmap_page_attrs(struct device *dev, dma_addr_t addr, size_t size,
trace_dma_unmap_phys(dev, addr, size, dir, attrs);
debug_dma_unmap_phys(dev, addr, size, dir);
}
+EXPORT_SYMBOL_GPL(dma_unmap_phys);
+
+void dma_unmap_page_attrs(struct device *dev, dma_addr_t addr, size_t size,
+ enum dma_data_direction dir, unsigned long attrs)
+{
+ dma_unmap_phys(dev, addr, size, dir, attrs);
+}
EXPORT_SYMBOL(dma_unmap_page_attrs);

static int __dma_map_sg_attrs(struct device *dev, struct scatterlist *sg,
--
2.49.0

Leon Romanovsky

Jun 25, 2025, 9:19:49 AM
to Marek Szyprowski, Leon Romanovsky, Christoph Hellwig, Jonathan Corbet, Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin, Christophe Leroy, Robin Murphy, Joerg Roedel, Will Deacon, Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez, Alexander Potapenko, Marco Elver, Dmitry Vyukov, Masami Hiramatsu, Mathieu Desnoyers, Jérôme Glisse, Andrew Morton, linu...@vger.kernel.org, linux-...@vger.kernel.org, linuxp...@lists.ozlabs.org, io...@lists.linux.dev, virtual...@lists.linux.dev, kasa...@googlegroups.com, linux-tra...@vger.kernel.org, linu...@kvack.org
From: Leon Romanovsky <leo...@nvidia.com>

Convert the DMA direct mapping functions to accept physical addresses
directly instead of page+offset parameters. The functions were already
operating on physical addresses internally, so this change eliminates
the redundant page-to-physical conversion at the API boundary.

The functions dma_direct_map_page() and dma_direct_unmap_page() are
renamed to dma_direct_map_phys() and dma_direct_unmap_phys() respectively,
with their calling convention changed from (struct page *page,
unsigned long offset) to (phys_addr_t phys).

Architecture-specific functions arch_dma_map_page_direct() and
arch_dma_unmap_page_direct() are similarly renamed to
arch_dma_map_phys_direct() and arch_dma_unmap_phys_direct().

The is_pci_p2pdma_page() checks are replaced with pfn_valid() checks
using PHYS_PFN(phys). This provides more accurate validation for memory
regions that are not backed by pages, without needing a "faked" struct page.

Signed-off-by: Leon Romanovsky <leo...@nvidia.com>
---
arch/powerpc/kernel/dma-iommu.c | 4 ++--
include/linux/dma-map-ops.h | 8 ++++----
kernel/dma/direct.c | 6 +++---
kernel/dma/direct.h | 13 ++++++-------
kernel/dma/mapping.c | 8 ++++----
5 files changed, 19 insertions(+), 20 deletions(-)

diff --git a/arch/powerpc/kernel/dma-iommu.c b/arch/powerpc/kernel/dma-iommu.c
index 4d64a5db50f3..0359ab72cd3b 100644
--- a/arch/powerpc/kernel/dma-iommu.c
+++ b/arch/powerpc/kernel/dma-iommu.c
@@ -14,7 +14,7 @@
#define can_map_direct(dev, addr) \
((dev)->bus_dma_limit >= phys_to_dma((dev), (addr)))

-bool arch_dma_map_page_direct(struct device *dev, phys_addr_t addr)
+bool arch_dma_map_phys_direct(struct device *dev, phys_addr_t addr)
{
if (likely(!dev->bus_dma_limit))
return false;
@@ -24,7 +24,7 @@ bool arch_dma_map_page_direct(struct device *dev, phys_addr_t addr)

#define is_direct_handle(dev, h) ((h) >= (dev)->archdata.dma_offset)

-bool arch_dma_unmap_page_direct(struct device *dev, dma_addr_t dma_handle)
+bool arch_dma_unmap_phys_direct(struct device *dev, dma_addr_t dma_handle)
{
if (likely(!dev->bus_dma_limit))
return false;
diff --git a/include/linux/dma-map-ops.h b/include/linux/dma-map-ops.h
index f48e5fb88bd5..71f5b3025415 100644
--- a/include/linux/dma-map-ops.h
+++ b/include/linux/dma-map-ops.h
@@ -392,15 +392,15 @@ void *arch_dma_set_uncached(void *addr, size_t size);
void arch_dma_clear_uncached(void *addr, size_t size);

#ifdef CONFIG_ARCH_HAS_DMA_MAP_DIRECT
-bool arch_dma_map_page_direct(struct device *dev, phys_addr_t addr);
-bool arch_dma_unmap_page_direct(struct device *dev, dma_addr_t dma_handle);
+bool arch_dma_map_phys_direct(struct device *dev, phys_addr_t addr);
+bool arch_dma_unmap_phys_direct(struct device *dev, dma_addr_t dma_handle);
bool arch_dma_map_sg_direct(struct device *dev, struct scatterlist *sg,
int nents);
bool arch_dma_unmap_sg_direct(struct device *dev, struct scatterlist *sg,
int nents);
#else
-#define arch_dma_map_page_direct(d, a) (false)
-#define arch_dma_unmap_page_direct(d, a) (false)
+#define arch_dma_map_phys_direct(d, a) (false)
+#define arch_dma_unmap_phys_direct(d, a) (false)
#define arch_dma_map_sg_direct(d, s, n) (false)
#define arch_dma_unmap_sg_direct(d, s, n) (false)
#endif
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 24c359d9c879..fa75e3070073 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -453,7 +453,7 @@ void dma_direct_unmap_sg(struct device *dev, struct scatterlist *sgl,
if (sg_dma_is_bus_address(sg))
sg_dma_unmark_bus_address(sg);
else
- dma_direct_unmap_page(dev, sg->dma_address,
+ dma_direct_unmap_phys(dev, sg->dma_address,
sg_dma_len(sg), dir, attrs);
}
}
@@ -476,8 +476,8 @@ int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl, int nents,
*/
break;
case PCI_P2PDMA_MAP_NONE:
- sg->dma_address = dma_direct_map_page(dev, sg_page(sg),
- sg->offset, sg->length, dir, attrs);
+ sg->dma_address = dma_direct_map_phys(dev, sg_phys(sg),
+ sg->length, dir, attrs);
if (sg->dma_address == DMA_MAPPING_ERROR) {
ret = -EIO;
goto out_unmap;
diff --git a/kernel/dma/direct.h b/kernel/dma/direct.h
index d2c0b7e632fc..10c1ba73c482 100644
--- a/kernel/dma/direct.h
+++ b/kernel/dma/direct.h
@@ -80,22 +80,21 @@ static inline void dma_direct_sync_single_for_cpu(struct device *dev,
arch_dma_mark_clean(paddr, size);
}

-static inline dma_addr_t dma_direct_map_page(struct device *dev,
- struct page *page, unsigned long offset, size_t size,
- enum dma_data_direction dir, unsigned long attrs)
+static inline dma_addr_t dma_direct_map_phys(struct device *dev,
+ phys_addr_t phys, size_t size, enum dma_data_direction dir,
+ unsigned long attrs)
{
- phys_addr_t phys = page_to_phys(page) + offset;
dma_addr_t dma_addr = phys_to_dma(dev, phys);

if (is_swiotlb_force_bounce(dev)) {
- if (is_pci_p2pdma_page(page))
+ if (!pfn_valid(PHYS_PFN(phys)))
return DMA_MAPPING_ERROR;
return swiotlb_map(dev, phys, size, dir, attrs);
}

if (unlikely(!dma_capable(dev, dma_addr, size, true)) ||
dma_kmalloc_needs_bounce(dev, size, dir)) {
- if (is_pci_p2pdma_page(page))
+ if (!pfn_valid(PHYS_PFN(phys)))
return DMA_MAPPING_ERROR;
if (is_swiotlb_active(dev))
return swiotlb_map(dev, phys, size, dir, attrs);
@@ -111,7 +110,7 @@ static inline dma_addr_t dma_direct_map_page(struct device *dev,
return dma_addr;
}

-static inline void dma_direct_unmap_page(struct device *dev, dma_addr_t addr,
+static inline void dma_direct_unmap_phys(struct device *dev, dma_addr_t addr,
size_t size, enum dma_data_direction dir, unsigned long attrs)
{
phys_addr_t phys = dma_to_phys(dev, addr);
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index 58482536db9b..80481a873340 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -166,8 +166,8 @@ dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
return DMA_MAPPING_ERROR;

if (dma_map_direct(dev, ops) ||
- arch_dma_map_page_direct(dev, phys + size))
- addr = dma_direct_map_page(dev, page, offset, size, dir, attrs);
+ arch_dma_map_phys_direct(dev, phys + size))
+ addr = dma_direct_map_phys(dev, phys, size, dir, attrs);
else if (use_dma_iommu(dev))
addr = iommu_dma_map_phys(dev, phys, size, dir, attrs);
else
@@ -187,8 +187,8 @@ void dma_unmap_page_attrs(struct device *dev, dma_addr_t addr, size_t size,

BUG_ON(!valid_dma_direction(dir));
if (dma_map_direct(dev, ops) ||
- arch_dma_unmap_page_direct(dev, addr + size))
- dma_direct_unmap_page(dev, addr, size, dir, attrs);
+ arch_dma_unmap_phys_direct(dev, addr + size))
+ dma_direct_unmap_phys(dev, addr, size, dir, attrs);
else if (use_dma_iommu(dev))
iommu_dma_unmap_phys(dev, addr, size, dir, attrs);
else
--
2.49.0

Leon Romanovsky

Jun 25, 2025, 9:19:53 AM
to Marek Szyprowski, Leon Romanovsky, Christoph Hellwig, Jonathan Corbet, Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin, Christophe Leroy, Robin Murphy, Joerg Roedel, Will Deacon, Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez, Alexander Potapenko, Marco Elver, Dmitry Vyukov, Masami Hiramatsu, Mathieu Desnoyers, Jérôme Glisse, Andrew Morton, linu...@vger.kernel.org, linux-...@vger.kernel.org, linuxp...@lists.ozlabs.org, io...@lists.linux.dev, virtual...@lists.linux.dev, kasa...@googlegroups.com, linux-tra...@vger.kernel.org, linu...@kvack.org
From: Leon Romanovsky <leo...@nvidia.com>

Convert HMM DMA operations from the legacy page-based API to the new
physical address-based dma_map_phys() and dma_unmap_phys() functions.
This demonstrates the preferred approach for new code that should use
physical addresses directly rather than page+offset parameters.

The change replaces dma_map_page() and dma_unmap_page() calls with
dma_map_phys() and dma_unmap_phys() respectively, using the physical
address that was already available in the code. This eliminates the
redundant page-to-physical address conversion and aligns with the
DMA subsystem's move toward physical address-centric interfaces.

This serves as an example of how new code should be written to leverage
the more efficient physical address API, which provides cleaner interfaces
for drivers that already have access to physical addresses.

Signed-off-by: Leon Romanovsky <leo...@nvidia.com>
---
mm/hmm.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/mm/hmm.c b/mm/hmm.c
index feac86196a65..9354fae3ae06 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -779,8 +779,8 @@ dma_addr_t hmm_dma_map_pfn(struct device *dev, struct hmm_dma_map *map,
if (WARN_ON_ONCE(dma_need_unmap(dev) && !dma_addrs))
goto error;

- dma_addr = dma_map_page(dev, page, 0, map->dma_entry_size,
- DMA_BIDIRECTIONAL);
+ dma_addr = dma_map_phys(dev, paddr, map->dma_entry_size,
+ DMA_BIDIRECTIONAL, 0);
if (dma_mapping_error(dev, dma_addr))
goto error;

@@ -823,8 +823,8 @@ bool hmm_dma_unmap_pfn(struct device *dev, struct hmm_dma_map *map, size_t idx)
dma_iova_unlink(dev, state, idx * map->dma_entry_size,
map->dma_entry_size, DMA_BIDIRECTIONAL, attrs);
} else if (dma_need_unmap(dev))
- dma_unmap_page(dev, dma_addrs[idx], map->dma_entry_size,
- DMA_BIDIRECTIONAL);
+ dma_unmap_phys(dev, dma_addrs[idx], map->dma_entry_size,
+ DMA_BIDIRECTIONAL, 0);

pfns[idx] &=
~(HMM_PFN_DMA_MAPPED | HMM_PFN_P2PDMA | HMM_PFN_P2PDMA_BUS);
--
2.49.0

Alexander Potapenko

Jun 26, 2025, 1:43:45 PM
to Leon Romanovsky, Marek Szyprowski, Leon Romanovsky, Christoph Hellwig, Jonathan Corbet, Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin, Christophe Leroy, Robin Murphy, Joerg Roedel, Will Deacon, Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez, Marco Elver, Dmitry Vyukov, Masami Hiramatsu, Mathieu Desnoyers, Jérôme Glisse, Andrew Morton, linu...@vger.kernel.org, linux-...@vger.kernel.org, linuxp...@lists.ozlabs.org, io...@lists.linux.dev, virtual...@lists.linux.dev, kasa...@googlegroups.com, linux-tra...@vger.kernel.org, linu...@kvack.org
On Wed, Jun 25, 2025 at 3:19 PM Leon Romanovsky <le...@kernel.org> wrote:
>
> From: Leon Romanovsky <leo...@nvidia.com>

Hi Leon,

>
> Convert the KMSAN DMA handling function from page-based to physical
> address-based interface.
>
> The refactoring renames kmsan_handle_dma() parameters from accepting
> (struct page *page, size_t offset, size_t size) to (phys_addr_t phys,
> size_t size).

Could you please elaborate a bit why this is needed? Are you fixing
some particular issue?

> A pfn_valid() check is added so that KMSAN skips
> addresses that are not backed by a struct page.
>
> As part of this change, support for highmem addresses is implemented
> using kmap_local_page() to handle both lowmem and highmem regions
> properly. All callers throughout the codebase are updated to use the
> new phys_addr_t based interface.

KMSAN only works on 64-bit systems, do we actually have highmem on any of these?

Leon Romanovsky

Jun 26, 2025, 2:45:13 PM
to Alexander Potapenko, Marek Szyprowski, Christoph Hellwig, Jonathan Corbet, Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin, Christophe Leroy, Robin Murphy, Joerg Roedel, Will Deacon, Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez, Marco Elver, Dmitry Vyukov, Masami Hiramatsu, Mathieu Desnoyers, Jérôme Glisse, Andrew Morton, linu...@vger.kernel.org, linux-...@vger.kernel.org, linuxp...@lists.ozlabs.org, io...@lists.linux.dev, virtual...@lists.linux.dev, kasa...@googlegroups.com, linux-tra...@vger.kernel.org, linu...@kvack.org
On Thu, Jun 26, 2025 at 07:43:06PM +0200, Alexander Potapenko wrote:
> On Wed, Jun 25, 2025 at 3:19 PM Leon Romanovsky <le...@kernel.org> wrote:
> >
> > From: Leon Romanovsky <leo...@nvidia.com>
>
> Hi Leon,
>
> >
> > Convert the KMSAN DMA handling function from page-based to physical
> > address-based interface.
> >
> > The refactoring renames kmsan_handle_dma() parameters from accepting
> > (struct page *page, size_t offset, size_t size) to (phys_addr_t phys,
> > size_t size).
>
> Could you please elaborate a bit why this is needed? Are you fixing
> some particular issue?

It is sort of a fix and an improvement at the same time.
Improvement:
It allows the newly introduced dma_map_phys() routine to call
kmsan_handle_dma() directly, without needing to convert from phys_addr_t
to struct page.

Fix:
It prevents us from running KMSAN on addresses that don't have a struct page
(for example PCI_P2PDMA_MAP_THRU_HOST_BRIDGE mappings), which the original
code does.

dma_map_sg_attrs()
-> __dma_map_sg_attrs()
-> dma_direct_map_sg()
-> PCI_P2PDMA_MAP_THRU_HOST_BRIDGE and nents > 0
-> kmsan_handle_dma_sg();
-> kmsan_handle_dma(sg_page(item), ...) <---- this is a "fake" page.

We are trying to build a DMA API that doesn't require struct pages.

>
> > A pfn_valid() check is added so that KMSAN skips
> > addresses that are not backed by a struct page.
> >
> > As part of this change, support for highmem addresses is implemented
> > using kmap_local_page() to handle both lowmem and highmem regions
> > properly. All callers throughout the codebase are updated to use the
> > new phys_addr_t based interface.
>
> KMSAN only works on 64-bit systems, do we actually have highmem on any of these?

I don't know, but the original code had this check:
344 if (PageHighMem(page))
345 return;

Thanks

Marek Szyprowski

Jun 27, 2025, 9:44:16 AM
to Leon Romanovsky, Christoph Hellwig, Jonathan Corbet, Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin, Christophe Leroy, Robin Murphy, Joerg Roedel, Will Deacon, Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez, Alexander Potapenko, Marco Elver, Dmitry Vyukov, Masami Hiramatsu, Mathieu Desnoyers, Jérôme Glisse, Andrew Morton, linu...@vger.kernel.org, linux-...@vger.kernel.org, linuxp...@lists.ozlabs.org, io...@lists.linux.dev, virtual...@lists.linux.dev, kasa...@googlegroups.com, linux-tra...@vger.kernel.org, linu...@kvack.org, Jason Gunthorpe
On 25.06.2025 15:18, Leon Romanovsky wrote:
> This series refactors the DMA mapping to use physical addresses
> as the primary interface instead of page+offset parameters. This
> change aligns the DMA API with the underlying hardware reality where
> DMA operations work with physical addresses, not page structures.
>
> The series consists of 8 patches that progressively convert the DMA
> mapping infrastructure from page-based to physical address-based APIs:
>
> The series maintains backward compatibility by keeping the old
> page-based API as wrapper functions around the new physical
> address-based implementations.

Thanks for this rework! I assume that the next step is to add map_phys
callback also to the dma_map_ops and teach various dma-mapping providers
to use it to avoid more phys-to-page-to-phys conversions.

I only wonder if this newly introduced dma_map_phys()/dma_unmap_phys()
API is also suitable for the recently discussed PCI P2P DMA? While
adding a new API maybe we should take this into account? My main concern
is the lack of the source phys addr passed to the dma_unmap_phys()
function and I'm aware that this might complicate a bit code conversion
from old dma_map/unmap_page() API.

Best regards
--
Marek Szyprowski, PhD
Samsung R&D Institute Poland

Alexander Potapenko

Jun 27, 2025, 12:28:53 PM
to Leon Romanovsky, Marek Szyprowski, Christoph Hellwig, Jonathan Corbet, Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin, Christophe Leroy, Robin Murphy, Joerg Roedel, Will Deacon, Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez, Marco Elver, Dmitry Vyukov, Masami Hiramatsu, Mathieu Desnoyers, Jérôme Glisse, Andrew Morton, linu...@vger.kernel.org, linux-...@vger.kernel.org, linuxp...@lists.ozlabs.org, io...@lists.linux.dev, virtual...@lists.linux.dev, kasa...@googlegroups.com, linux-tra...@vger.kernel.org, linu...@kvack.org
Thanks for clarifying that!

> > KMSAN only works on 64-bit systems, do we actually have highmem on any of these?
>
> I don't know, but the original code had this check:
> 344 if (PageHighMem(page))
> 345 return;
>
> Thanks

Ouch, I overlooked that, sorry!

I spent a while trying to understand where this code originated from,
and found the following discussion:
https://lore.kernel.org/all/20200327170...@lst.de/

It's still unclear to me whether we actually need this check, because
with my config it doesn't produce any code.
But I think this shouldn't block your patch; I'd rather make a
follow-up fix.

Leon Romanovsky

Jun 27, 2025, 1:02:20 PM
to Marek Szyprowski, Christoph Hellwig, Jonathan Corbet, Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin, Christophe Leroy, Robin Murphy, Joerg Roedel, Will Deacon, Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez, Alexander Potapenko, Marco Elver, Dmitry Vyukov, Masami Hiramatsu, Mathieu Desnoyers, Jérôme Glisse, Andrew Morton, linu...@vger.kernel.org, linux-...@vger.kernel.org, linuxp...@lists.ozlabs.org, io...@lists.linux.dev, virtual...@lists.linux.dev, kasa...@googlegroups.com, linux-tra...@vger.kernel.org, linu...@kvack.org, Jason Gunthorpe
On Fri, Jun 27, 2025 at 03:44:10PM +0200, Marek Szyprowski wrote:
> On 25.06.2025 15:18, Leon Romanovsky wrote:
> > This series refactors the DMA mapping to use physical addresses
> > as the primary interface instead of page+offset parameters. This
> > change aligns the DMA API with the underlying hardware reality where
> > DMA operations work with physical addresses, not page structures.
> >
> > The series consists of 8 patches that progressively convert the DMA
> > mapping infrastructure from page-based to physical address-based APIs:
> >
> > The series maintains backward compatibility by keeping the old
> > page-based API as wrapper functions around the new physical
> > address-based implementations.
>
> Thanks for this rework! I assume that the next step is to add map_phys
> callback also to the dma_map_ops and teach various dma-mapping providers
> to use it to avoid more phys-to-page-to-phys conversions.

Probably Christoph will say yes, however I personally don't see any
benefit in this. Maybe I'm wrong here, but none of the existing .map_page()
implementations support p2p anyway, so they won't benefit
from such a conversion.

>
> I only wonder if this newly introduced dma_map_phys()/dma_unmap_phys()
> API is also suitable for the recently discussed PCI P2P DMA? While
> adding a new API maybe we should take this into account?

First, the immediate user (not related to p2p) is the block layer:
https://lore.kernel.org/linux-nvme/bcdcb5eb-17ed-412f...@nvidia.com/T/#m7e715697d4b2e3997622a3400243477c75cab406

+static bool blk_dma_map_direct(struct request *req, struct device *dma_dev,
+ struct blk_dma_iter *iter, struct phys_vec *vec)
+{
+ iter->addr = dma_map_page(dma_dev, phys_to_page(vec->paddr),
+ offset_in_page(vec->paddr), vec->len, rq_dma_dir(req));
+ if (dma_mapping_error(dma_dev, iter->addr)) {
+ iter->status = BLK_STS_RESOURCE;
+ return false;
+ }
+ iter->len = vec->len;
+ return true;
+}

The block layer has started storing phys addresses instead of struct pages,
so this phys_to_page() conversion in the data path will be avoided.
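
Just as an illustration (my sketch, not code from that series), with
dma_map_phys() the same helper can take the physical address directly:

	iter->addr = dma_map_phys(dma_dev, vec->paddr, vec->len,
				  rq_dma_dir(req), 0);

so the phys_to_page()/offset_in_page() pair in the data path simply goes away.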

Leon Romanovsky

Jul 6, 2025, 2:00:15 AM
to Marek Szyprowski, Christoph Hellwig, Jonathan Corbet, Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin, Christophe Leroy, Robin Murphy, Joerg Roedel, Will Deacon, Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez, Alexander Potapenko, Marco Elver, Dmitry Vyukov, Masami Hiramatsu, Mathieu Desnoyers, Jérôme Glisse, Andrew Morton, linu...@vger.kernel.org, linux-...@vger.kernel.org, linuxp...@lists.ozlabs.org, io...@lists.linux.dev, virtual...@lists.linux.dev, kasa...@googlegroups.com, linux-tra...@vger.kernel.org, linu...@kvack.org, Jason Gunthorpe
I have almost completed the main user of this dma_map_phys() interface. It is
a rewrite of this patch: [PATCH v3 3/3] vfio/pci: Allow MMIO regions to be exported through dma-buf
https://lore.kernel.org/all/20250307052248.4058...@intel.com/

The whole populate_sgt()->dma_map_resource() block looks different now and
relies on dma_map_phys(), as we are exporting memory without struct pages.
It will look something like this:

	for (i = 0; i < priv->nr_ranges; i++) {
		phys = pci_resource_start(priv->vdev->pdev,
					  dma_ranges[i].region_index);
		phys += dma_ranges[i].offset;

		if (priv->bus_addr) {
			addr = pci_p2pdma_bus_addr_map(&p2pdma_state, phys);
			fill_sg_entry(sgl, dma_ranges[i].length, addr);
			sgl = sg_next(sgl);
		} else if (dma_use_iova(&priv->state)) {
			ret = dma_iova_link(attachment->dev, &priv->state, phys,
					    priv->mapped_len,
					    dma_ranges[i].length, dir, attrs);
			if (ret)
				goto err_unmap_dma;

			priv->mapped_len += dma_ranges[i].length;
		} else {
			addr = dma_map_phys(attachment->dev, phys,
					    dma_ranges[i].length, dir, attrs);
			ret = dma_mapping_error(attachment->dev, addr);
			if (ret)
				goto unmap_dma_buf;

			fill_sg_entry(sgl, dma_ranges[i].length, addr);
			sgl = sg_next(sgl);
		}
	}

	if (dma_use_iova(&priv->state) && !priv->bus_addr) {
		ret = dma_iova_sync(attachment->dev, &priv->state, 0,
				    priv->mapped_len);
		if (ret)
			goto err_unmap_dma;

		fill_sg_entry(sgl, priv->mapped_len, priv->state.addr);
	}

>
> > My main concern is the lack of the source phys addr passed to the dma_unmap_phys()
> > function and I'm aware that this might complicate a bit code conversion
> > from old dma_map/unmap_page() API.

It is not needed for now; all the p2p logic is external to the DMA API.

Thanks

Marek Szyprowski

Jul 8, 2025, 6:27:17 AM
to Christoph Hellwig, Leon Romanovsky, Jonathan Corbet, Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin, Christophe Leroy, Robin Murphy, Joerg Roedel, Will Deacon, Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez, Alexander Potapenko, Marco Elver, Dmitry Vyukov, Masami Hiramatsu, Mathieu Desnoyers, Jérôme Glisse, Andrew Morton, linu...@vger.kernel.org, linux-...@vger.kernel.org, linuxp...@lists.ozlabs.org, io...@lists.linux.dev, virtual...@lists.linux.dev, kasa...@googlegroups.com, linux-tra...@vger.kernel.org, linu...@kvack.org, Jason Gunthorpe
On 30.06.2025 15:38, Christoph Hellwig wrote:
> On Fri, Jun 27, 2025 at 08:02:13PM +0300, Leon Romanovsky wrote:
>>> Thanks for this rework! I assume that the next step is to add map_phys
>>> callback also to the dma_map_ops and teach various dma-mapping providers
>>> to use it to avoid more phys-to-page-to-phys conversions.
>> Probably Christoph will say yes, however I personally don't see any
>> benefit in this. Maybe I wrong here, but all existing .map_page()
>> implementation platforms don't support p2p anyway. They won't benefit
>> from this such conversion.
> I think that conversion should eventually happen, and rather sooner than
> later.

Agreed.

Applied patches 1-7 to my dma-mapping-next branch. Let me know if anyone
needs a stable branch with it.

Leon, it would be great if you could also prepare an incremental patch
adding a map_phys callback to dma_map_ops, so the individual
arch-specific dma-mapping providers can then be converted (or simplified
in many cases) too.

Leon Romanovsky

unread,
Jul 8, 2025, 7:00:13 AMJul 8
to Marek Szyprowski, Christoph Hellwig, Jonathan Corbet, Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin, Christophe Leroy, Robin Murphy, Joerg Roedel, Will Deacon, Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez, Alexander Potapenko, Marco Elver, Dmitry Vyukov, Masami Hiramatsu, Mathieu Desnoyers, Jérôme Glisse, Andrew Morton, linu...@vger.kernel.org, linux-...@vger.kernel.org, linuxp...@lists.ozlabs.org, io...@lists.linux.dev, virtual...@lists.linux.dev, kasa...@googlegroups.com, linux-tra...@vger.kernel.org, linu...@kvack.org, Jason Gunthorpe
On Tue, Jul 08, 2025 at 12:27:09PM +0200, Marek Szyprowski wrote:
> On 30.06.2025 15:38, Christoph Hellwig wrote:
> > On Fri, Jun 27, 2025 at 08:02:13PM +0300, Leon Romanovsky wrote:
> >>> Thanks for this rework! I assume that the next step is to add map_phys
> >>> callback also to the dma_map_ops and teach various dma-mapping providers
> >>> to use it to avoid more phys-to-page-to-phys conversions.
> >> Probably Christoph will say yes, however I personally don't see any
> >> benefit in this. Maybe I wrong here, but all existing .map_page()
> >> implementation platforms don't support p2p anyway. They won't benefit
> >> from this such conversion.
> > I think that conversion should eventually happen, and rather sooner than
> > later.
>
> Agreed.
>
> Applied patches 1-7 to my dma-mapping-next branch. Let me know if one
> needs a stable branch with it.

Thanks a lot, I don't think that a stable branch is needed. Realistically
speaking, my VFIO DMA work won't be merged this cycle. We are in -rc5,
it is a complete rewrite of the RFC version and touches pci-p2p code (to
remove the dependency on struct page) in addition to VFIO, so it will take
time.

Regarding the last patch (hmm), it would be great if you could take it.
We didn't touch anything in hmm.c this cycle and have no plans to send a PR.
It can safely go through your tree.

>
> Leon, it would be great if You could also prepare an incremental patch
> adding map_phys callback to the dma_maps_ops, so the individual
> arch-specific dma-mapping providers can be then converted (or simplified
> in many cases) too.

Sure, will do.

Marek Szyprowski

unread,
Jul 8, 2025, 7:45:28 AMJul 8
to Leon Romanovsky, Christoph Hellwig, Jonathan Corbet, Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin, Christophe Leroy, Robin Murphy, Joerg Roedel, Will Deacon, Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez, Alexander Potapenko, Marco Elver, Dmitry Vyukov, Masami Hiramatsu, Mathieu Desnoyers, Jérôme Glisse, Andrew Morton, linu...@vger.kernel.org, linux-...@vger.kernel.org, linuxp...@lists.ozlabs.org, io...@lists.linux.dev, virtual...@lists.linux.dev, kasa...@googlegroups.com, linux-tra...@vger.kernel.org, linu...@kvack.org, Jason Gunthorpe
On 08.07.2025 13:00, Leon Romanovsky wrote:
> On Tue, Jul 08, 2025 at 12:27:09PM +0200, Marek Szyprowski wrote:
>> On 30.06.2025 15:38, Christoph Hellwig wrote:
>>> On Fri, Jun 27, 2025 at 08:02:13PM +0300, Leon Romanovsky wrote:
>>>>> Thanks for this rework! I assume that the next step is to add map_phys
>>>>> callback also to the dma_map_ops and teach various dma-mapping providers
>>>>> to use it to avoid more phys-to-page-to-phys conversions.
>>>> Probably Christoph will say yes, however I personally don't see any
>>>> benefit in this. Maybe I wrong here, but all existing .map_page()
>>>> implementation platforms don't support p2p anyway. They won't benefit
>>>> from this such conversion.
>>> I think that conversion should eventually happen, and rather sooner than
>>> later.
>> Agreed.
>>
>> Applied patches 1-7 to my dma-mapping-next branch. Let me know if one
>> needs a stable branch with it.
> Thanks a lot, I don't think that stable branch is needed. Realistically
> speaking, my VFIO DMA work won't be merged this cycle, We are in -rc5,
> it is complete rewrite from RFC version and touches pci-p2p code (to
> remove dependency on struct page) in addition to VFIO, so it will take
> time.
>
> Regarding, last patch (hmm), it will be great if you can take it.
> We didn't touch anything in hmm.c this cycle and have no plans to send PR.
> It can safely go through your tree.

Okay, then I would like to get an explicit ack from Jérôme for this.

>> Leon, it would be great if You could also prepare an incremental patch
>> adding map_phys callback to the dma_maps_ops, so the individual
>> arch-specific dma-mapping providers can be then converted (or simplified
>> in many cases) too.
> Sure, will do.

Thanks!

Leon Romanovsky

unread,
Jul 8, 2025, 8:06:57 AMJul 8
to Marek Szyprowski, Christoph Hellwig, Jonathan Corbet, Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin, Christophe Leroy, Robin Murphy, Joerg Roedel, Will Deacon, Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez, Alexander Potapenko, Marco Elver, Dmitry Vyukov, Masami Hiramatsu, Mathieu Desnoyers, Jérôme Glisse, Andrew Morton, linu...@vger.kernel.org, linux-...@vger.kernel.org, linuxp...@lists.ozlabs.org, io...@lists.linux.dev, virtual...@lists.linux.dev, kasa...@googlegroups.com, linux-tra...@vger.kernel.org, linu...@kvack.org, Jason Gunthorpe
Jerome has not been active in the HMM world for a long time already.
The HMM tree is managed by us (RDMA): https://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git/log/?h=hmm
➜ kernel git:(m/dmabuf-vfio) git log --merges mm/hmm.c
...
Pull HMM updates from Jason Gunthorpe:
...

https://web.git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=58ba80c4740212c29a1cf9b48f588e60a7612209
+hmm git git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git#hmm

We just never bothered to reflect the current situation in the MAINTAINERS file.

Thanks

Marek Szyprowski

unread,
Jul 8, 2025, 8:56:31 AMJul 8
to Leon Romanovsky, Christoph Hellwig, Jonathan Corbet, Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin, Christophe Leroy, Robin Murphy, Joerg Roedel, Will Deacon, Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez, Alexander Potapenko, Marco Elver, Dmitry Vyukov, Masami Hiramatsu, Mathieu Desnoyers, Jérôme Glisse, Andrew Morton, linu...@vger.kernel.org, linux-...@vger.kernel.org, linuxp...@lists.ozlabs.org, io...@lists.linux.dev, virtual...@lists.linux.dev, kasa...@googlegroups.com, linux-tra...@vger.kernel.org, linu...@kvack.org, Jason Gunthorpe
Maybe this is the time to update it :)

I was just a bit confused that no one commented on the HMM patch, but if
you maintain it, then this is okay.

Marek Szyprowski

unread,
Jul 8, 2025, 11:57:39 AMJul 8
to Leon Romanovsky, Christoph Hellwig, Jonathan Corbet, Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin, Christophe Leroy, Robin Murphy, Joerg Roedel, Will Deacon, Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez, Alexander Potapenko, Marco Elver, Dmitry Vyukov, Masami Hiramatsu, Mathieu Desnoyers, Jérôme Glisse, Andrew Morton, linu...@vger.kernel.org, linux-...@vger.kernel.org, linuxp...@lists.ozlabs.org, io...@lists.linux.dev, virtual...@lists.linux.dev, kasa...@googlegroups.com, linux-tra...@vger.kernel.org, linu...@kvack.org, Jason Gunthorpe
I've applied the last patch to the dma-mapping-for-next branch.

Will Deacon

unread,
Jul 15, 2025, 9:24:48 AMJul 15
to Leon Romanovsky, Marek Szyprowski, Leon Romanovsky, Christoph Hellwig, Jonathan Corbet, Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin, Christophe Leroy, Robin Murphy, Joerg Roedel, Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez, Alexander Potapenko, Marco Elver, Dmitry Vyukov, Masami Hiramatsu, Mathieu Desnoyers, Jérôme Glisse, Andrew Morton, linu...@vger.kernel.org, linux-...@vger.kernel.org, linuxp...@lists.ozlabs.org, io...@lists.linux.dev, virtual...@lists.linux.dev, kasa...@googlegroups.com, linux-tra...@vger.kernel.org, linu...@kvack.org
Hi Leon,

On Wed, Jun 25, 2025 at 04:19:05PM +0300, Leon Romanovsky wrote:
> From: Leon Romanovsky <leo...@nvidia.com>
>
> Convert HMM DMA operations from the legacy page-based API to the new
> physical address-based dma_map_phys() and dma_unmap_phys() functions.
> This demonstrates the preferred approach for new code that should use
> physical addresses directly rather than page+offset parameters.
>
> The change replaces dma_map_page() and dma_unmap_page() calls with
> dma_map_phys() and dma_unmap_phys() respectively, using the physical
> address that was already available in the code. This eliminates the
> redundant page-to-physical address conversion and aligns with the
> DMA subsystem's move toward physical address-centric interfaces.
>
> This serves as an example of how new code should be written to leverage
> the more efficient physical address API, which provides cleaner interfaces
> for drivers that already have access to physical addresses.

I'm struggling a little to see how this is cleaner or more efficient
than the old code.

From what I can tell, dma_map_page_attrs() takes a 'struct page *' and
converts it to a physical address using page_to_phys() whilst your new
dma_map_phys() interface takes a physical address and converts it to
a 'struct page *' using phys_to_page(). In both cases, hmm_dma_map_pfn()
still needs the page for other reasons. If anything, existing users of
dma_map_page_attrs() now end up with a redundant page-to-phys-to-page
conversion which hopefully the compiler folds away.

I'm assuming there's future work which builds on top of the new API
and removes the reliance on 'struct page' entirely, is that right? If
so, it would've been nicer to be clearer about that as, on its own, I'm
not really sure this patch series achieves an awful lot and the
efficiency argument looks quite weak to me.

Cheers,

Will

Leon Romanovsky

unread,
Jul 15, 2025, 9:58:39 AMJul 15
to Will Deacon, Marek Szyprowski, Christoph Hellwig, Jonathan Corbet, Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin, Christophe Leroy, Robin Murphy, Joerg Roedel, Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez, Alexander Potapenko, Marco Elver, Dmitry Vyukov, Masami Hiramatsu, Mathieu Desnoyers, Jérôme Glisse, Andrew Morton, linu...@vger.kernel.org, linux-...@vger.kernel.org, linuxp...@lists.ozlabs.org, io...@lists.linux.dev, virtual...@lists.linux.dev, kasa...@googlegroups.com, linux-tra...@vger.kernel.org, linu...@kvack.org
On Tue, Jul 15, 2025 at 02:24:38PM +0100, Will Deacon wrote:
> Hi Leon,
>
> On Wed, Jun 25, 2025 at 04:19:05PM +0300, Leon Romanovsky wrote:
> > From: Leon Romanovsky <leo...@nvidia.com>
> >
> > Convert HMM DMA operations from the legacy page-based API to the new
> > physical address-based dma_map_phys() and dma_unmap_phys() functions.
> > This demonstrates the preferred approach for new code that should use
> > physical addresses directly rather than page+offset parameters.
> >
> > The change replaces dma_map_page() and dma_unmap_page() calls with
> > dma_map_phys() and dma_unmap_phys() respectively, using the physical
> > address that was already available in the code. This eliminates the
> > redundant page-to-physical address conversion and aligns with the
> > DMA subsystem's move toward physical address-centric interfaces.
> >
> > This serves as an example of how new code should be written to leverage
> > the more efficient physical address API, which provides cleaner interfaces
> > for drivers that already have access to physical addresses.
>
> I'm struggling a little to see how this is cleaner or more efficient
> than the old code.

It is not; the main reason for the hmm conversion is to show how the API is
used. HMM is built around struct page.

>
> From what I can tell, dma_map_page_attrs() takes a 'struct page *' and
> converts it to a physical address using page_to_phys() whilst your new
> dma_map_phys() interface takes a physical address and converts it to
> a 'struct page *' using phys_to_page(). In both cases, hmm_dma_map_pfn()
> still needs the page for other reasons. If anything, existing users of
> dma_map_page_attrs() now end up with a redundant page-to-phys-to-page
> conversion which hopefully the compiler folds away.
>
> I'm assuming there's future work which builds on top of the new API
> and removes the reliance on 'struct page' entirely, is that right? If
> so, it would've been nicer to be clearer about that as, on its own, I'm
> not really sure this patch series achieves an awful lot and the
> efficiency argument looks quite weak to me.

Yes, there is ongoing work which is built on top of the dma_map_phys() API
and can't be built without DMA phys.

My WIP branch, where I'm using it, can be found here:
https://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdma.git/log/?h=dmabuf-vfio

In that branch, we save one phys_to_page conversion in the block data path:
block-dma: migrate to dma_map_phys instead of map_page

and implement a DMABUF exporter for MMIO pages:
vfio/pci: Allow MMIO regions to be exported through dma-buf
see the vfio_pci_dma_buf_map() function.

Thanks

>
> Cheers,
>
> Will
>

Robin Murphy

unread,
Jul 25, 2025, 4:05:00 PMJul 25
to Leon Romanovsky, Marek Szyprowski, Leon Romanovsky, Christoph Hellwig, Jonathan Corbet, Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin, Christophe Leroy, Joerg Roedel, Will Deacon, Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez, Alexander Potapenko, Marco Elver, Dmitry Vyukov, Masami Hiramatsu, Mathieu Desnoyers, Jérôme Glisse, Andrew Morton, linu...@vger.kernel.org, linux-...@vger.kernel.org, linuxp...@lists.ozlabs.org, io...@lists.linux.dev, virtual...@lists.linux.dev, kasa...@googlegroups.com, linux-tra...@vger.kernel.org, linu...@kvack.org
On 2025-06-25 2:19 pm, Leon Romanovsky wrote:
> From: Leon Romanovsky <leo...@nvidia.com>
>
> All platforms which implement map_page interface don't support physical
> addresses without real struct page. Add condition to check it.

As-is, the condition also needs to cover iommu-dma, because that also
still doesn't support non-page-backed addresses. You can't just do a
simple s/page/phys/ rename and hope it's OK because you happen to get
away with it for coherent, 64-bit, trusted devices.

Thanks,
Robin.

Robin Murphy

unread,
Jul 25, 2025, 4:05:57 PMJul 25
to Leon Romanovsky, Marek Szyprowski, Christoph Hellwig, Jonathan Corbet, Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin, Christophe Leroy, Joerg Roedel, Will Deacon, Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez, Alexander Potapenko, Marco Elver, Dmitry Vyukov, Masami Hiramatsu, Mathieu Desnoyers, Jérôme Glisse, Andrew Morton, linu...@vger.kernel.org, linux-...@vger.kernel.org, linuxp...@lists.ozlabs.org, io...@lists.linux.dev, virtual...@lists.linux.dev, kasa...@googlegroups.com, linux-tra...@vger.kernel.org, linu...@kvack.org
On 2025-06-25 2:18 pm, Leon Romanovsky wrote:
> This series refactors the DMA mapping to use physical addresses
> as the primary interface instead of page+offset parameters. This
> change aligns the DMA API with the underlying hardware reality where
> DMA operations work with physical addresses, not page structures.

That is obvious nonsense - the DMA *API* does not exist in "hardware
reality"; the DMA API abstracts *software* operations that must be
performed before and after the actual hardware DMA operation in order to
preserve memory coherency etc.

Streaming DMA API callers get their buffers from alloc_pages() or
kmalloc(); they do not have physical addresses, they have a page or
virtual address. The internal operations of pretty much every DMA API
implementation that isn't a no-op also require a page and/or virtual
address. It is 100% logical for the DMA API interfaces to take a page or
virtual address (and since virt_to_page() is pretty trivial, we already
consolidated the two interfaces ages ago).
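For reference, this is roughly how dma_map_single() is layered on top of
dma_map_page() today (paraphrased from include/linux/dma-mapping.h; details
may differ slightly):

static inline dma_addr_t dma_map_single_attrs(struct device *dev, void *ptr,
		size_t size, enum dma_data_direction dir, unsigned long attrs)
{
	/* DMA must never operate on areas that might be remapped. */
	if (dev_WARN_ONCE(dev, is_vmalloc_addr(ptr),
			  "rejecting DMA map of vmalloc memory\n"))
		return DMA_MAPPING_ERROR;
	debug_dma_map_single(dev, ptr, size);
	return dma_map_page_attrs(dev, virt_to_page(ptr), offset_in_page(ptr),
			size, dir, attrs);
}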

Yes, once you get right down to the low-level arch_sync_dma_*()
interfaces that passes a physical address, but that's mostly an artefact
of them being factored out of old dma_sync_single_*() implementations
that took a (physical) DMA address. Nearly all of them then use __va()
or phys_to_virt() to actually consume it. Even though it's a
phys_addr_t, the implicit guarantee that it represents page-backed
memory is absolutely vital.

Take a step back; what do you imagine that a DMA API call on a
non-page-backed physical address could actually *do*?

- Cache maintenance? No, it would be illogical for a P2P address to be
cached in a CPU cache, and anyway it would almost always crash because
it requires page-backed memory with a virtual address.

- Bounce buffering? Again no, that would be illogical, defeat the entire
point of a P2P operation, and anyway would definitely crash because it
requires page-backed memory with a virtual address.

- IOMMU mappings? Oh hey look that's exactly what dma_map_resource() has
been doing for 9 years. Not to mention your new IOMMU API if callers
want to be IOMMU-aware (although without the same guarantee of not also
doing the crashy things.)

- Debug tracking? Again, already taken care of by dma_map_resource().

- Some entirely new concept? Well, I'm eager to be enlightened if so!

But given what we do already know of from decades of experience, obvious
question: For the tiny minority of users who know full well when they're
dealing with a non-page-backed physical address, what's wrong with using
dma_map_resource?
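For the avoidance of doubt, that existing interface is trivial to use
(illustrative only; bar_phys and len stand in for whatever the caller already
knows about its non-page-backed region):

	dma_addr_t dma;

	dma = dma_map_resource(dev, bar_phys, len, DMA_BIDIRECTIONAL, 0);
	if (dma_mapping_error(dev, dma))
		return -ENOMEM;
	/* ... program the peer device with 'dma' ... */
	dma_unmap_resource(dev, dma, len, DMA_BIDIRECTIONAL, 0);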

Does it make sense to try to consolidate our p2p infrastructure so
dma_map_resource() could return bus addresses where appropriate? Yes,
almost certainly, if it makes it more convenient to use. And with only
about 20 users it's not too impractical to add some extra arguments or
even rejig the whole interface if need be. Indeed an overhaul might even
help solve the current grey area as to when it should take dma_range_map
into account or not for platform devices.

> The series consists of 8 patches that progressively convert the DMA
> mapping infrastructure from page-based to physical address-based APIs:

And as a result ends up making said DMA mapping infrastructure slightly
more complicated and slightly less efficient for all its legitimate
users, all so one or two highly specialised users can then pretend to
call it in situations where it must be a no-op anyway? Please explain
convincingly why that is not a giant waste of time.

Are we trying to remove struct page from the kernel altogether? If yes,
then for goodness' sake lead with that, but even then I'd still prefer
to see the replacements for critical related infrastructure like
pfn_valid() in place before we start trying to reshape the DMA API to fit.

Thanks,
Robin.

Leon Romanovsky

unread,
Jul 27, 2025, 2:30:34 AMJul 27
to Robin Murphy, Marek Szyprowski, Christoph Hellwig, Jonathan Corbet, Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin, Christophe Leroy, Joerg Roedel, Will Deacon, Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez, Alexander Potapenko, Marco Elver, Dmitry Vyukov, Masami Hiramatsu, Mathieu Desnoyers, Jérôme Glisse, Andrew Morton, linu...@vger.kernel.org, linux-...@vger.kernel.org, linuxp...@lists.ozlabs.org, io...@lists.linux.dev, virtual...@lists.linux.dev, kasa...@googlegroups.com, linux-tra...@vger.kernel.org, linu...@kvack.org
On Fri, Jul 25, 2025 at 09:04:50PM +0100, Robin Murphy wrote:
> On 2025-06-25 2:19 pm, Leon Romanovsky wrote:
> > From: Leon Romanovsky <leo...@nvidia.com>
> >
> > All platforms which implement map_page interface don't support physical
> > addresses without real struct page. Add condition to check it.
>
> As-is, the condition also needs to cover iommu-dma, because that also still
> doesn't support non-page-backed addresses. You can't just do a simple
> s/page/phys/ rename and hope it's OK because you happen to get away with it
> for coherent, 64-bit, trusted devices.

It needs to be a follow-up patch. Is this what you envision?

diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index e1586eb52ab34..31214fde88124 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -167,6 +167,15 @@ dma_addr_t dma_map_phys(struct device *dev, phys_addr_t phys, size_t size,
 	    arch_dma_map_phys_direct(dev, phys + size))
 		addr = dma_direct_map_phys(dev, phys, size, dir, attrs);
-	else if (use_dma_iommu(dev))
+	else if (use_dma_iommu(dev)) {
+		bool is_pfn_valid = true;
+
+		if (IS_ENABLED(CONFIG_DMA_API_DEBUG) &&
+		    !(attrs & DMA_ATTR_SKIP_CPU_SYNC))
+			is_pfn_valid = pfn_valid(PHYS_PFN(phys));
+
+		if (unlikely(!is_pfn_valid))
+			return DMA_MAPPING_ERROR;
+
 		addr = iommu_dma_map_phys(dev, phys, size, dir, attrs);
-	else {
+	} else {
 		struct page *page = phys_to_page(phys);

Thanks

Jason Gunthorpe

unread,
Jul 29, 2025, 10:04:00 AMJul 29
to Robin Murphy, Leon Romanovsky, Marek Szyprowski, Christoph Hellwig, Jonathan Corbet, Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin, Christophe Leroy, Joerg Roedel, Will Deacon, Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez, Alexander Potapenko, Marco Elver, Dmitry Vyukov, Masami Hiramatsu, Mathieu Desnoyers, Jérôme Glisse, Andrew Morton, linu...@vger.kernel.org, linux-...@vger.kernel.org, linuxp...@lists.ozlabs.org, io...@lists.linux.dev, virtual...@lists.linux.dev, kasa...@googlegroups.com, linux-tra...@vger.kernel.org, linu...@kvack.org
On Fri, Jul 25, 2025 at 09:05:46PM +0100, Robin Murphy wrote:

> But given what we do already know of from decades of experience, obvious
> question: For the tiny minority of users who know full well when they're
> dealing with a non-page-backed physical address, what's wrong with using
> dma_map_resource?

I was also pushing for this, that we would have two separate paths:

- the phys_addr is guaranteed to have a KVA (and today also a struct page)
- the phys_addr is non-cachable and no KVA may exist

This is basically already the distinction today between map resource
and map page.

The caller would have to look at what it is trying to map, do the P2P
evaluation and then call the cachable phys or resource path(s).

Leon, I think you should revive the work you had along these lines. It
would address my concerns with the dma_ops changes too. I continue to
think we should not push non-cachable, non-KVA MMIO down the map_page
ops, those should use the map_resource op.

> Does it make sense to try to consolidate our p2p infrastructure so
> dma_map_resource() could return bus addresses where appropriate?

For some users but not entirely :( The sg path for P2P relies on
storing information inside the scatterlist so unmap knows what to do.

Changing map_resource to return a similar flag and then having drivers
somehow store that flag and give it back to unmap is not a trivial
change. It would be a good API for simple drivers, and I think we
could build such a helper calling through the new flow. But places
like DMABUF that have more complex lists will not like it.

For them we've been following the approach of BIO where the
driver/subsystem will maintain a mapping list and be aware of when the
P2P information is changing. Then it has to do different map/unmap
sequences based on its own existing tracking.

I view this as all very low-level infrastructure. I'm really hoping we
can get an agreement with Christian and build a scatterlist replacement
for DMABUF that encapsulates all this away from drivers, like BIO does
for block.

But we can't start that until we have a DMA API working fully for
non-struct page P2P memory. That is being driven by this series and
the VFIO DMABUF implementation on top of it.

> Are we trying to remove struct page from the kernel altogether?

Yes, it is a very long term project being pushed along with the
folios, memdesc conversion and so forth. It is huge, with many
aspects, but we can start to reasonably work on parts of them
independently.

A mid-term dream is to be able to go from pin_user_pages() -> DMA
without drivers needing to touch struct page at all.

This is a huge project on its own, and we are progressing it slowly
"bottom up" by allowing phys_addr_t in the DMA API then we can build
more infrastructure for subsystems to be struct-page free, culminating
in some pin_user_phyr() and phys_addr_t bio_vec someday.

Certainly a big part of this series is influenced by requirements to
advance pin_user_pages() -> DMA, while the other part is about
allowing P2P to work using phys_addr_t without struct page.

Jason

Robin Murphy

unread,
Jul 30, 2025, 7:11:41 AMJul 30
to Marek Szyprowski, Christoph Hellwig, Leon Romanovsky, Jonathan Corbet, Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin, Christophe Leroy, Joerg Roedel, Will Deacon, Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez, Alexander Potapenko, Marco Elver, Dmitry Vyukov, Masami Hiramatsu, Mathieu Desnoyers, Jérôme Glisse, Andrew Morton, linu...@vger.kernel.org, linux-...@vger.kernel.org, linuxp...@lists.ozlabs.org, io...@lists.linux.dev, virtual...@lists.linux.dev, kasa...@googlegroups.com, linux-tra...@vger.kernel.org, linu...@kvack.org, Jason Gunthorpe
On 2025-07-08 11:27 am, Marek Szyprowski wrote:
> On 30.06.2025 15:38, Christoph Hellwig wrote:
>> On Fri, Jun 27, 2025 at 08:02:13PM +0300, Leon Romanovsky wrote:
>>>> Thanks for this rework! I assume that the next step is to add map_phys
>>>> callback also to the dma_map_ops and teach various dma-mapping providers
>>>> to use it to avoid more phys-to-page-to-phys conversions.
>>> Probably Christoph will say yes, however I personally don't see any
>>> benefit in this. Maybe I wrong here, but all existing .map_page()
>>> implementation platforms don't support p2p anyway. They won't benefit
>>> from this such conversion.
>> I think that conversion should eventually happen, and rather sooner than
>> later.
>
> Agreed.
>
> Applied patches 1-7 to my dma-mapping-next branch. Let me know if one
> needs a stable branch with it.

As the maintainer of iommu-dma, please drop the iommu-dma patch because
it is broken. It does not in any way remove the struct page dependency
from iommu-dma, it merely hides it so things can crash more easily in
circumstances that clearly nobody's bothered to test.

> Leon, it would be great if You could also prepare an incremental patch
> adding map_phys callback to the dma_maps_ops, so the individual
> arch-specific dma-mapping providers can be then converted (or simplified
> in many cases) too.

Marek, I'm surprised that even you aren't seeing why that would at best
be pointless churn. The fundamental design of dma_map_page() operating
on struct page is that it sits in between alloc_pages() at the caller
and kmap_atomic() deep down in the DMA API implementation (which also
subsumes any dependencies on having a kernel virtual address at the
implementation end). The natural working unit for whatever replaces
dma_map_page() will be whatever the replacement for alloc_pages()
returns, and the replacement for kmap_atomic() operates on. Until that
exists (and I simply cannot believe it would be an unadorned physical
address) there cannot be any *meaningful* progress made towards removing
the struct page dependency from the DMA API. If there is also a goal to
kill off highmem before then, then logically we should just wait for
that to land, then revert back to dma_map_single() being the first-class
interface, and dma_map_page() can turn into a trivial page_to_virt()
wrapper for the long tail of caller conversions.

Simply obfuscating the struct page dependency today by dressing it up as
a phys_addr_t with implicit baggage is not in any way helpful. It
only makes the code harder to understand and more bug-prone. Despite the
disingenuous claims, it is quite blatantly the opposite of "efficient"
for callers to do extra work to throw away useful information with
page_to_phys(), and the implementation then have to re-derive that
information with pfn_valid()/phys_to_page().

And by "bug-prone" I also include greater distractions like this
misguided idea that the same API could somehow work for non-memory
addresses too, so then everyone can move on bikeshedding VFIO while
overlooking the fundamental flaws in the whole premise. I mean, besides
all the issues I've already pointed out in that regard, not least the
glaring fact that it's literally just a worse version of *an API we
already have*, as DMA API maintainer do you *really* approve of a design
that depends on callers abusing DMA_ATTR_SKIP_CPU_SYNC, yet will still
readily blow up if they did then call a dma_sync op?

Thanks,
Robin.

Leon Romanovsky

unread,
Jul 30, 2025, 9:40:35 AMJul 30
to Robin Murphy, Marek Szyprowski, Christoph Hellwig, Jonathan Corbet, Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin, Christophe Leroy, Joerg Roedel, Will Deacon, Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez, Alexander Potapenko, Marco Elver, Dmitry Vyukov, Masami Hiramatsu, Mathieu Desnoyers, Jérôme Glisse, Andrew Morton, linu...@vger.kernel.org, linux-...@vger.kernel.org, linuxp...@lists.ozlabs.org, io...@lists.linux.dev, virtual...@lists.linux.dev, kasa...@googlegroups.com, linux-tra...@vger.kernel.org, linu...@kvack.org, Jason Gunthorpe
Robin, Marek

I would like to ask you not to drop this series and to allow me to
gradually change the code during my VFIO DMABUF adventure.

The most reasonable way to prevent DMA_ATTR_SKIP_CPU_SYNC leakage is to
introduce a new DMA attribute (let's call it DMA_ATTR_MMIO for now) and
pass it to both dma_map_phys() and dma_iova_link(). This flag will
indicate that the p2p type is PCI_P2PDMA_MAP_THRU_HOST_BRIDGE and call the
right callbacks, which will set the IOMMU_MMIO flag and skip the CPU sync.

dma_map_phys() isn't entirely wrong, it just needs a few extra tweaks.
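A sketch of the intended caller-side use, assuming the attribute is added as
described (dev, mmio_phys, len, dir and attrs are placeholders for whatever
the caller already has):

	dma_addr_t addr;

	addr = dma_map_phys(dev, mmio_phys, len, dir, attrs | DMA_ATTR_MMIO);
	if (dma_mapping_error(dev, addr))
		return -EIO;
	/* ... DMA ... */
	dma_unmap_phys(dev, addr, len, dir, attrs | DMA_ATTR_MMIO);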

Thanks

>
> Thanks,
> Robin.
>

Jason Gunthorpe

unread,
Jul 30, 2025, 10:28:21 AMJul 30
to Leon Romanovsky, Matthew Wilcox, David Hildenbrand, Robin Murphy, Marek Szyprowski, Christoph Hellwig, Jonathan Corbet, Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin, Christophe Leroy, Joerg Roedel, Will Deacon, Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez, Alexander Potapenko, Marco Elver, Dmitry Vyukov, Masami Hiramatsu, Mathieu Desnoyers, Jérôme Glisse, Andrew Morton, linu...@vger.kernel.org, linux-...@vger.kernel.org, linuxp...@lists.ozlabs.org, io...@lists.linux.dev, virtual...@lists.linux.dev, kasa...@googlegroups.com, linux-tra...@vger.kernel.org, linu...@kvack.org
On Wed, Jul 30, 2025 at 04:40:26PM +0300, Leon Romanovsky wrote:

> > The natural working unit for whatever replaces dma_map_page() will be
> > whatever the replacement for alloc_pages() returns, and the replacement for
> > kmap_atomic() operates on. Until that exists (and I simply cannot believe it
> > would be an unadorned physical address) there cannot be any
> > *meaningful*

alloc_pages becomes legacy.

There will be some new API 'memdesc alloc'. If I understand Matthew's
plan properly - here is a sketch of changing iommu-pages:

--- a/drivers/iommu/iommu-pages.c
+++ b/drivers/iommu/iommu-pages.c
@@ -36,9 +36,10 @@ static_assert(sizeof(struct ioptdesc) <= sizeof(struct page));
*/
void *iommu_alloc_pages_node_sz(int nid, gfp_t gfp, size_t size)
{
+ struct ioptdesc *desc;
unsigned long pgcnt;
- struct folio *folio;
unsigned int order;
+ void *addr;

/* This uses page_address() on the memory. */
if (WARN_ON(gfp & __GFP_HIGHMEM))
@@ -56,8 +57,8 @@ void *iommu_alloc_pages_node_sz(int nid, gfp_t gfp, size_t size)
if (nid == NUMA_NO_NODE)
nid = numa_mem_id();

- folio = __folio_alloc_node(gfp | __GFP_ZERO, order, nid);
- if (unlikely(!folio))
+ addr = memdesc_alloc_pages(&desc, gfp | __GFP_ZERO, order, nid);
+ if (unlikely(!addr))
return NULL;

/*
@@ -73,7 +74,7 @@ void *iommu_alloc_pages_node_sz(int nid, gfp_t gfp, size_t size)
mod_node_page_state(folio_pgdat(folio), NR_IOMMU_PAGES, pgcnt);
lruvec_stat_mod_folio(folio, NR_SECONDARY_PAGETABLE, pgcnt);

- return folio_address(folio);
+ return addr;
}

Here memdesc_alloc_pages() would kmalloc a 'struct ioptdesc', plus
some other change so that virt_to_ioptdesc() indirects through a new
memdesc. See here:

https://kernelnewbies.org/MatthewWilcox/Memdescs

We don't end up with some kind of catch-all struct to mean 'cachable
CPU memory' anymore because every user gets their own unique "struct
XXXdesc". So the thinking has been that the phys_addr_t is the best
option. I guess the alternative would be the memdesc as a handle, but
I'm not sure that is such a good idea.

People still express a desire to be able to do IO to cachable memory
that has a KVA through phys_to_virt but no memdesc/page allocation. I
don't know if this will happen but it doesn't seem like a good idea to
make it impossible by forcing memdesc types into low level APIs that
don't use them.

Also, the bio/scatterlist code between pin_user_pages() and DMA
mapping is consolidating physical contiguity. This runs faster if you
don't have to do page_to_phys() because everything is already
phys_addr_t.

> > progress made towards removing the struct page dependency from the DMA API.
> > If there is also a goal to kill off highmem before then, then logically we
> > should just wait for that to land, then revert back to dma_map_single()
> > being the first-class interface, and dma_map_page() can turn into a trivial
> > page_to_virt() wrapper for the long tail of caller conversions.

As I said there are many many projects related here and we can
meaningfully make progress in parts. It is not functionally harmful to
do the phys to page conversion before calling the legacy
dma_ops/SWIOTLB etc. This avoids creating patch dependencies with
highmem removal and other projects.

So long as the legacy things (highmem, dma_ops, etc) continue to work
I think it is OK to accept some obfuscation to allow the modern things
to work better. The majority flow - no highmem, no dma ops, no
swiotlb - does not require struct page. Having to do

PTE -> phys -> page -> phys -> DMA

does have a cost.

> The most reasonable way to prevent DMA_ATTR_SKIP_CPU_SYNC leakage is to
> introduce new DMA attribute (let's call it DMA_ATTR_MMIO for now) and
> pass it to both dma_map_phys() and dma_iova_link(). This flag will
> indicate that p2p type is PCI_P2PDMA_MAP_THRU_HOST_BRIDGE and call to
> right callbacks which will set IOMMU_MMIO flag and skip CPU sync,

So the idea is if the memory is non-cachable, no-KVA you'd call
dma_iova_link(phys_addr, DMA_ATTR_MMIO) and dma_map_phys(phys_addr,
DMA_ATTR_MMIO) ?

And then internally the dma_ops and dma_iommu would use the existing
map_page/map_resource variations based on the flag, thus ensuring that
MMIO is never kmap'd or cache flushed?

dma_map_resource is really then just
dma_map_phys(phys_addr, DMA_ATTR_MMIO)?
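i.e. something like this sketch (not the current code, just the equivalence
I mean, assuming DMA_ATTR_MMIO lands as proposed):

static inline dma_addr_t dma_map_resource(struct device *dev,
		phys_addr_t phys, size_t size, enum dma_data_direction dir,
		unsigned long attrs)
{
	return dma_map_phys(dev, phys, size, dir, attrs | DMA_ATTR_MMIO);
}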

I like this, I think it well addresses the concerns.

Jason

Marek Szyprowski

unread,
Jul 30, 2025, 12:32:50 PMJul 30
to Robin Murphy, Christoph Hellwig, Leon Romanovsky, Jonathan Corbet, Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin, Christophe Leroy, Joerg Roedel, Will Deacon, Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez, Alexander Potapenko, Marco Elver, Dmitry Vyukov, Masami Hiramatsu, Mathieu Desnoyers, Jérôme Glisse, Andrew Morton, linu...@vger.kernel.org, linux-...@vger.kernel.org, linuxp...@lists.ozlabs.org, io...@lists.linux.dev, virtual...@lists.linux.dev, kasa...@googlegroups.com, linux-tra...@vger.kernel.org, linu...@kvack.org, Jason Gunthorpe
Robin, your concerns are right. I missed the fact that making everything
depend on phys_addr_t would make the DMA-mapping API prone to various
abuses. I need to think a bit more on this and try to better understand
the PCI P2P case, which means that I will probably miss this merge
window. I'm sorry for not being more active in the discussion, but I
just got back from my holidays and I'm trying to catch up.

Leon Romanovsky

unread,
Jul 31, 2025, 2:01:24 AMJul 31
to Jason Gunthorpe, Matthew Wilcox, David Hildenbrand, Robin Murphy, Marek Szyprowski, Christoph Hellwig, Jonathan Corbet, Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin, Christophe Leroy, Joerg Roedel, Will Deacon, Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez, Alexander Potapenko, Marco Elver, Dmitry Vyukov, Masami Hiramatsu, Mathieu Desnoyers, Jérôme Glisse, Andrew Morton, linu...@vger.kernel.org, linux-...@vger.kernel.org, linuxp...@lists.ozlabs.org, io...@lists.linux.dev, virtual...@lists.linux.dev, kasa...@googlegroups.com, linux-tra...@vger.kernel.org, linu...@kvack.org
On Wed, Jul 30, 2025 at 11:28:18AM -0300, Jason Gunthorpe wrote:
> On Wed, Jul 30, 2025 at 04:40:26PM +0300, Leon Romanovsky wrote:

<...>

> > The most reasonable way to prevent DMA_ATTR_SKIP_CPU_SYNC leakage is to
> > introduce new DMA attribute (let's call it DMA_ATTR_MMIO for now) and
> > pass it to both dma_map_phys() and dma_iova_link(). This flag will
> > indicate that p2p type is PCI_P2PDMA_MAP_THRU_HOST_BRIDGE and call to
> > right callbacks which will set IOMMU_MMIO flag and skip CPU sync,
>
> So the idea is if the memory is non-cachable, no-KVA you'd call
> dma_iova_link(phys_addr, DMA_ATTR_MMIO) and dma_map_phys(phys_addr,
> DMA_ATTR_MMIO) ?

Yes

>
> And then internally the dma_ops and dma_iommu would use the existing
> map_page/map_resource variations based on the flag, thus ensuring that
> MMIO is never kmap'd or cache flushed?
>
> dma_map_resource is really then just
> dma_map_phys(phys_addr, DMA_ATTR_MMIO)?
>
> I like this, I think it well addresses the concerns.

Yes, I had this idea and implementation before. :(

>
> Jason
>

Matthew Wilcox

unread,
Jul 31, 2025, 1:37:42 PMJul 31
to Robin Murphy, Marek Szyprowski, Christoph Hellwig, Leon Romanovsky, Jonathan Corbet, Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin, Christophe Leroy, Joerg Roedel, Will Deacon, Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez, Alexander Potapenko, Marco Elver, Dmitry Vyukov, Masami Hiramatsu, Mathieu Desnoyers, Jérôme Glisse, Andrew Morton, linu...@vger.kernel.org, linux-...@vger.kernel.org, linuxp...@lists.ozlabs.org, io...@lists.linux.dev, virtual...@lists.linux.dev, kasa...@googlegroups.com, linux-tra...@vger.kernel.org, linu...@kvack.org, Jason Gunthorpe
Hi Robin,

I don't know the DMA mapping code well and haven't reviewed this
patch set in particular, but I wanted to comment on some of the things
you say here.

> Marek, I'm surprised that even you aren't seeing why that would at best be
> pointless churn. The fundamental design of dma_map_page() operating on
> struct page is that it sits in between alloc_pages() at the caller and
> kmap_atomic() deep down in the DMA API implementation (which also subsumes
> any dependencies on having a kernel virtual address at the implementation
> end). The natural working unit for whatever replaces dma_map_page() will be
> whatever the replacement for alloc_pages() returns, and the replacement for
> kmap_atomic() operates on. Until that exists (and I simply cannot believe it
> would be an unadorned physical address) there cannot be any *meaningful*
> progress made towards removing the struct page dependency from the DMA API.
> If there is also a goal to kill off highmem before then, then logically we
> should just wait for that to land, then revert back to dma_map_single()
> being the first-class interface, and dma_map_page() can turn into a trivial
> page_to_virt() wrapper for the long tail of caller conversions.

While I'm sure we'd all love to kill off highmem, that's not a realistic
goal for another ten years or so. There are meaningful improvements we
can make, for example pulling page tables out of highmem, but we need to
keep file data and anonymous memory in highmem, so we'll need to support
DMA to highmem for the foreseeable future.

The replacement for kmap_atomic() is already here -- it's
kmap_(atomic|local)_pfn(). If a simple wrapper like kmap_local_phys()
would make this more palatable, that would be fine by me. Might save
a bit of messing around with calculating offsets in each caller.
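Such a wrapper could be as small as this (a sketch only; kmap_local_phys()
and kunmap_local_phys() do not exist today):

static inline void *kmap_local_phys(phys_addr_t phys)
{
	/* Map the backing page and return a pointer at the right offset. */
	return kmap_local_pfn(PHYS_PFN(phys)) + offset_in_page(phys);
}

static inline void kunmap_local_phys(void *vaddr)
{
	kunmap_local(vaddr);
}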

As far as replacing alloc_pages() goes, some callers will still use
alloc_pages(). Others will use folio_alloc() or have used kmalloc().
Or maybe the caller won't have used any kind of page allocation because
they're doing I/O to something that isn't part of Linux's memory at all.
Part of the Grand Plan here is for Linux to catch up with Xen's ability
to do I/O to guests without allocating struct pages for every page of
memory in the guests.

You say that a physical address will need some adornment -- can you
elaborate on that for me? It may be that I'm missing something
important here.

Jason Gunthorpe

unread,
Aug 3, 2025, 11:59:09 AMAug 3
to Matthew Wilcox, Robin Murphy, Marek Szyprowski, Christoph Hellwig, Leon Romanovsky, Jonathan Corbet, Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin, Christophe Leroy, Joerg Roedel, Will Deacon, Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez, Alexander Potapenko, Marco Elver, Dmitry Vyukov, Masami Hiramatsu, Mathieu Desnoyers, Jérôme Glisse, Andrew Morton, linu...@vger.kernel.org, linux-...@vger.kernel.org, linuxp...@lists.ozlabs.org, io...@lists.linux.dev, virtual...@lists.linux.dev, kasa...@googlegroups.com, linux-tra...@vger.kernel.org, linu...@kvack.org
On Thu, Jul 31, 2025 at 06:37:11PM +0100, Matthew Wilcox wrote:

> The replacement for kmap_atomic() is already here -- it's
> kmap_(atomic|local)_pfn(). If a simple wrapper like kmap_local_phys()
> would make this more palatable, that would be fine by me. Might save
> a bit of messing around with calculating offsets in each caller.

I think that makes the general plan clearer. We should be removing
struct pages entirely from the insides of the DMA API layer and use
phys_addr_t, kmap_XX_phys(), phys_to_virt(), and so on.

The request from Christoph and Marek to clean up the dma_ops makes
sense in that context, we'd have to go into the ops and replace the
struct page kmaps/etc with the phys based ones.

This hides the struct page requirement to get to a KVA inside the core
mm code only and that sort of modularity is exactly the sort of thing
that could help entirely remove a struct page requirement for some
kinds of DMA someday.

Matthew, do you think it makes sense to introduce types to make this
clearer? We have two kinds of values that a phys_addr_t can store -
something compatible with kmap_XX_phys(), and something that isn't.

This was recently a long discussion in ARM KVM as well which had a
similar confusion that a phys_addr_t was actually two very different
things inside its logic.

So what about some dedicated types:
kphys_addr_t - A physical address that can be passed to
kmap_XX_phys(), phys_to_virt(), etc.

raw_phys_addr_t - A physical address that may not be cachable, may
not be DRAM, and does not work with kmap_XX_phys()/etc.

We clearly have these two different ideas floating around in code,
page tables, etc.

I read some of Robin's concern that the struct page provided a certain
amount of type safety in the DMA API, this could provide similar.
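Purely as an illustration of what I mean by dedicated types (names as above;
pte_t-style struct wrapping is just one way to get the type safety, not an
agreed design):

typedef struct { phys_addr_t val; } kphys_addr_t;	/* kmap/phys_to_virt capable */
typedef struct { phys_addr_t val; } raw_phys_addr_t;	/* may be MMIO, no KVA guarantee */

static inline kphys_addr_t to_kphys(phys_addr_t p)
{
	return (kphys_addr_t){ .val = p };
}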

Thanks,
Jason

Matthew Wilcox

unread,
Aug 3, 2025, 11:38:33 PMAug 3
to Jason Gunthorpe, Robin Murphy, Marek Szyprowski, Christoph Hellwig, Leon Romanovsky, Jonathan Corbet, Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin, Christophe Leroy, Joerg Roedel, Will Deacon, Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez, Alexander Potapenko, Marco Elver, Dmitry Vyukov, Masami Hiramatsu, Mathieu Desnoyers, Jérôme Glisse, Andrew Morton, linu...@vger.kernel.org, linux-...@vger.kernel.org, linuxp...@lists.ozlabs.org, io...@lists.linux.dev, virtual...@lists.linux.dev, kasa...@googlegroups.com, linux-tra...@vger.kernel.org, linu...@kvack.org
On Sun, Aug 03, 2025 at 12:59:06PM -0300, Jason Gunthorpe wrote:
> Matthew, do you think it makes sense to introduce types to make this
> clearer? We have two kinds of values that a phys_addr_t can store -
> something compatible with kmap_XX_phys(), and something that isn't.

I was with you up until this point. And then you said "What if we have
a raccoon that isn't a raccoon" and my brain derailed.

> This was recently a long discussion in ARM KVM as well which had a
> similar confusion that a phys_addr_t was actually two very different
> things inside its logic.

No. A phys_addr_t is a phys_addr_t. If something's abusing a
phys_addr_t to store something entirely different then THAT is what
should be using a different type. We've defined what a phys_addr_t
is. That was in Documentation/core-api/bus-virt-phys-mapping.rst
before Arnd removed it; to excerpt the relevant bit:

---

- CPU untranslated. This is the "physical" address. Physical address
0 is what the CPU sees when it drives zeroes on the memory bus.

[...]
So why do we care about the physical address at all? We do need the physical
address in some cases, it's just not very often in normal code. The physical
address is needed if you use memory mappings, for example, because the
"remap_pfn_range()" mm function wants the physical address of the memory to
be remapped as measured in units of pages, a.k.a. the pfn.

---

So if somebody is stuffing something else into phys_addr_t, *THAT* is
what needs to be fixed, not adding a new sub-type of phys_addr_t for
things which are actually phys_addr_t.

> We clearly have these two different ideas floating around in code,
> page tables, etc.

No. No, we don't. I've never heard of this asininity before.

Leon Romanovsky

unread,
Aug 4, 2025, 8:43:15 AMAug 4
to Marek Szyprowski, Jason Gunthorpe, Abdiel Janulgue, Alexander Potapenko, Alex Gaynor, Andrew Morton, Christoph Hellwig, Danilo Krummrich, io...@lists.linux.dev, Jason Wang, Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross, kasa...@googlegroups.com, Keith Busch, linux...@vger.kernel.org, linu...@vger.kernel.org, linux-...@vger.kernel.org, linu...@kvack.org, linux...@lists.infradead.org, linuxp...@lists.ozlabs.org, linux-tra...@vger.kernel.org, Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman, Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-fo...@vger.kernel.org, Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtual...@lists.linux.dev, Will Deacon, xen-...@lists.xenproject.org
Changelog:
v1:
* Added new DMA_ATTR_MMIO attribute to indicate
PCI_P2PDMA_MAP_THRU_HOST_BRIDGE path.
* Rewrote dma_map_* functions to use this new attribute
v0: https://lore.kernel.org/all/cover.175085...@kernel.org/
------------------------------------------------------------------------

This series refactors the DMA mapping to use physical addresses
as the primary interface instead of page+offset parameters. This
change aligns the DMA API with the underlying hardware reality where
DMA operations work with physical addresses, not page structures.

The series maintains export symbol backward compatibility by keeping
the old page-based API as wrapper functions around the new physical
address-based implementations.

Thanks

Leon Romanovsky (16):
dma-mapping: introduce new DMA attribute to indicate MMIO memory
iommu/dma: handle MMIO path in dma_iova_link
dma-debug: refactor to use physical addresses for page mapping
dma-mapping: rename trace_dma_*map_page to trace_dma_*map_phys
iommu/dma: rename iommu_dma_*map_page to iommu_dma_*map_phys
iommu/dma: extend iommu_dma_*map_phys API to handle MMIO memory
dma-mapping: convert dma_direct_*map_page to be phys_addr_t based
kmsan: convert kmsan_handle_dma to use physical addresses
dma-mapping: handle MMIO flow in dma_map|unmap_page
xen: swiotlb: Open code map_resource callback
dma-mapping: export new dma_*map_phys() interface
mm/hmm: migrate to physical address-based DMA mapping API
mm/hmm: properly take MMIO path
block-dma: migrate to dma_map_phys instead of map_page
block-dma: properly take MMIO path
nvme-pci: unmap MMIO pages with appropriate interface

Documentation/core-api/dma-api.rst | 4 +-
Documentation/core-api/dma-attributes.rst | 7 ++
arch/powerpc/kernel/dma-iommu.c | 4 +-
block/blk-mq-dma.c | 15 ++-
drivers/iommu/dma-iommu.c | 69 +++++++------
drivers/nvme/host/pci.c | 18 +++-
drivers/virtio/virtio_ring.c | 4 +-
drivers/xen/swiotlb-xen.c | 21 +++-
include/linux/blk-mq-dma.h | 6 +-
include/linux/blk_types.h | 2 +
include/linux/dma-direct.h | 2 -
include/linux/dma-map-ops.h | 8 +-
include/linux/dma-mapping.h | 27 +++++
include/linux/iommu-dma.h | 11 +--
include/linux/kmsan.h | 12 ++-
include/trace/events/dma.h | 9 +-
kernel/dma/debug.c | 71 ++++---------
kernel/dma/debug.h | 37 ++-----
kernel/dma/direct.c | 22 +----
kernel/dma/direct.h | 50 ++++++----
kernel/dma/mapping.c | 115 +++++++++++++---------
kernel/dma/ops_helpers.c | 6 +-
mm/hmm.c | 19 ++--
mm/kmsan/hooks.c | 36 +++++--
rust/kernel/dma.rs | 3 +
tools/virtio/linux/kmsan.h | 2 +-
26 files changed, 320 insertions(+), 260 deletions(-)

--
2.50.1

Leon Romanovsky

unread,
Aug 4, 2025, 8:43:21 AMAug 4
to Marek Szyprowski, Leon Romanovsky, Jason Gunthorpe, Abdiel Janulgue, Alexander Potapenko, Alex Gaynor, Andrew Morton, Christoph Hellwig, Danilo Krummrich, io...@lists.linux.dev, Jason Wang, Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross, kasa...@googlegroups.com, Keith Busch, linux...@vger.kernel.org, linu...@vger.kernel.org, linux-...@vger.kernel.org, linu...@kvack.org, linux...@lists.infradead.org, linuxp...@lists.ozlabs.org, linux-tra...@vger.kernel.org, Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman, Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-fo...@vger.kernel.org, Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtual...@lists.linux.dev, Will Deacon, xen-...@lists.xenproject.org
From: Leon Romanovsky <leo...@nvidia.com>

Make sure that the CPU is not synced if the MMIO path is taken.

Signed-off-by: Leon Romanovsky <leo...@nvidia.com>
---
drivers/iommu/dma-iommu.c | 21 ++++++++++++++++-----
1 file changed, 16 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index ea2ef53bd4fef..399838c17b705 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -1837,13 +1837,20 @@ static int __dma_iova_link(struct device *dev, dma_addr_t addr,
phys_addr_t phys, size_t size, enum dma_data_direction dir,
unsigned long attrs)
{
- bool coherent = dev_is_dma_coherent(dev);
+ int prot;

- if (!coherent && !(attrs & DMA_ATTR_SKIP_CPU_SYNC))
- arch_sync_dma_for_device(phys, size, dir);
+ if (attrs & DMA_ATTR_MMIO)
+ prot = dma_info_to_prot(dir, false, attrs) | IOMMU_MMIO;
+ else {
+ bool coherent = dev_is_dma_coherent(dev);
+
+ if (!coherent && !(attrs & DMA_ATTR_SKIP_CPU_SYNC))
+ arch_sync_dma_for_device(phys, size, dir);
+ prot = dma_info_to_prot(dir, coherent, attrs);
+ }

return iommu_map_nosync(iommu_get_dma_domain(dev), addr, phys, size,
- dma_info_to_prot(dir, coherent, attrs), GFP_ATOMIC);
+ prot, GFP_ATOMIC);
}

static int iommu_dma_iova_bounce_and_link(struct device *dev, dma_addr_t addr,
@@ -1949,9 +1956,13 @@ int dma_iova_link(struct device *dev, struct dma_iova_state *state,
return -EIO;

if (dev_use_swiotlb(dev, size, dir) &&
- iova_unaligned(iovad, phys, size))
+ iova_unaligned(iovad, phys, size)) {
+ if (attrs & DMA_ATTR_MMIO)
+ return -EPERM;
+
return iommu_dma_iova_link_swiotlb(dev, state, phys, offset,
size, dir, attrs);
+ }

return __dma_iova_link(dev, state->addr + offset - iova_start_pad,
phys - iova_start_pad,
--
2.50.1

Leon Romanovsky

unread,
Aug 4, 2025, 8:43:24 AMAug 4
to Marek Szyprowski, Leon Romanovsky, Jason Gunthorpe, Abdiel Janulgue, Alexander Potapenko, Alex Gaynor, Andrew Morton, Christoph Hellwig, Danilo Krummrich, io...@lists.linux.dev, Jason Wang, Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross, kasa...@googlegroups.com, Keith Busch, linux...@vger.kernel.org, linu...@vger.kernel.org, linux-...@vger.kernel.org, linu...@kvack.org, linux...@lists.infradead.org, linuxp...@lists.ozlabs.org, linux-tra...@vger.kernel.org, Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman, Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-fo...@vger.kernel.org, Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtual...@lists.linux.dev, Will Deacon, xen-...@lists.xenproject.org
From: Leon Romanovsky <leo...@nvidia.com>

Convert the DMA debug infrastructure from page-based to physical address-based
mapping as a preparation for relying on physical addresses in the DMA mapping routines.

The refactoring renames debug_dma_map_page() to debug_dma_map_phys() and
changes its signature to accept a phys_addr_t parameter instead of struct page
and offset. Similarly, debug_dma_unmap_page() becomes debug_dma_unmap_phys().
A new dma_debug_phy type is introduced to distinguish physical address mappings
from other debug entry types. All callers throughout the codebase are updated
to pass physical addresses directly, eliminating the need for page-to-physical
conversion in the debug layer.

This refactoring eliminates the need to convert between page pointers and
physical addresses in the debug layer, making the code more efficient and
consistent with the DMA mapping API's physical address focus.

Signed-off-by: Leon Romanovsky <leo...@nvidia.com>
---
Documentation/core-api/dma-api.rst | 4 ++--
kernel/dma/debug.c | 28 +++++++++++++++++-----------
kernel/dma/debug.h | 16 +++++++---------
kernel/dma/mapping.c | 15 ++++++++-------
4 files changed, 34 insertions(+), 29 deletions(-)

diff --git a/Documentation/core-api/dma-api.rst b/Documentation/core-api/dma-api.rst
index 3087bea715ed2..ca75b35416792 100644
--- a/Documentation/core-api/dma-api.rst
+++ b/Documentation/core-api/dma-api.rst
@@ -761,7 +761,7 @@ example warning message may look like this::
[<ffffffff80235177>] find_busiest_group+0x207/0x8a0
[<ffffffff8064784f>] _spin_lock_irqsave+0x1f/0x50
[<ffffffff803c7ea3>] check_unmap+0x203/0x490
- [<ffffffff803c8259>] debug_dma_unmap_page+0x49/0x50
+ [<ffffffff803c8259>] debug_dma_unmap_phys+0x49/0x50
[<ffffffff80485f26>] nv_tx_done_optimized+0xc6/0x2c0
[<ffffffff80486c13>] nv_nic_irq_optimized+0x73/0x2b0
[<ffffffff8026df84>] handle_IRQ_event+0x34/0x70
@@ -855,7 +855,7 @@ that a driver may be leaking mappings.
dma-debug interface debug_dma_mapping_error() to debug drivers that fail
to check DMA mapping errors on addresses returned by dma_map_single() and
dma_map_page() interfaces. This interface clears a flag set by
-debug_dma_map_page() to indicate that dma_mapping_error() has been called by
+debug_dma_map_phys() to indicate that dma_mapping_error() has been called by
the driver. When driver does unmap, debug_dma_unmap() checks the flag and if
this flag is still set, prints warning message that includes call trace that
leads up to the unmap. This interface can be called from dma_mapping_error()
diff --git a/kernel/dma/debug.c b/kernel/dma/debug.c
index e43c6de2bce4e..da6734e3a4ce9 100644
--- a/kernel/dma/debug.c
+++ b/kernel/dma/debug.c
@@ -39,6 +39,7 @@ enum {
dma_debug_sg,
dma_debug_coherent,
dma_debug_resource,
+ dma_debug_phy,
};

enum map_err_types {
@@ -141,6 +142,7 @@ static const char *type2name[] = {
[dma_debug_sg] = "scatter-gather",
[dma_debug_coherent] = "coherent",
[dma_debug_resource] = "resource",
+ [dma_debug_phy] = "phy",
};

static const char *dir2name[] = {
@@ -1201,9 +1203,8 @@ void debug_dma_map_single(struct device *dev, const void *addr,
}
EXPORT_SYMBOL(debug_dma_map_single);

-void debug_dma_map_page(struct device *dev, struct page *page, size_t offset,
- size_t size, int direction, dma_addr_t dma_addr,
- unsigned long attrs)
+void debug_dma_map_phys(struct device *dev, phys_addr_t phys, size_t size,
+ int direction, dma_addr_t dma_addr, unsigned long attrs)
{
struct dma_debug_entry *entry;

@@ -1218,19 +1219,24 @@ void debug_dma_map_page(struct device *dev, struct page *page, size_t offset,
return;

entry->dev = dev;
- entry->type = dma_debug_single;
- entry->paddr = page_to_phys(page) + offset;
+ entry->type = dma_debug_phy;
+ entry->paddr = phys;
entry->dev_addr = dma_addr;
entry->size = size;
entry->direction = direction;
entry->map_err_type = MAP_ERR_NOT_CHECKED;

- check_for_stack(dev, page, offset);
+ if (!(attrs & DMA_ATTR_MMIO)) {
+ struct page *page = phys_to_page(phys);
+		size_t offset = offset_in_page(phys);

- if (!PageHighMem(page)) {
- void *addr = page_address(page) + offset;
+ check_for_stack(dev, page, offset);

- check_for_illegal_area(dev, addr, size);
+ if (!PageHighMem(page)) {
+ void *addr = page_address(page) + offset;
+
+ check_for_illegal_area(dev, addr, size);
+ }
}

add_dma_entry(entry, attrs);
@@ -1274,11 +1280,11 @@ void debug_dma_mapping_error(struct device *dev, dma_addr_t dma_addr)
}
EXPORT_SYMBOL(debug_dma_mapping_error);

-void debug_dma_unmap_page(struct device *dev, dma_addr_t dma_addr,
+void debug_dma_unmap_phys(struct device *dev, dma_addr_t dma_addr,
size_t size, int direction)
{
struct dma_debug_entry ref = {
- .type = dma_debug_single,
+ .type = dma_debug_phy,
.dev = dev,
.dev_addr = dma_addr,
.size = size,
diff --git a/kernel/dma/debug.h b/kernel/dma/debug.h
index f525197d3cae6..76adb42bffd5f 100644
--- a/kernel/dma/debug.h
+++ b/kernel/dma/debug.h
@@ -9,12 +9,11 @@
#define _KERNEL_DMA_DEBUG_H

#ifdef CONFIG_DMA_API_DEBUG
-extern void debug_dma_map_page(struct device *dev, struct page *page,
- size_t offset, size_t size,
- int direction, dma_addr_t dma_addr,
+extern void debug_dma_map_phys(struct device *dev, phys_addr_t phys,
+ size_t size, int direction, dma_addr_t dma_addr,
unsigned long attrs);

-extern void debug_dma_unmap_page(struct device *dev, dma_addr_t addr,
+extern void debug_dma_unmap_phys(struct device *dev, dma_addr_t addr,
size_t size, int direction);

extern void debug_dma_map_sg(struct device *dev, struct scatterlist *sg,
@@ -55,14 +54,13 @@ extern void debug_dma_sync_sg_for_device(struct device *dev,
struct scatterlist *sg,
int nelems, int direction);
#else /* CONFIG_DMA_API_DEBUG */
-static inline void debug_dma_map_page(struct device *dev, struct page *page,
- size_t offset, size_t size,
- int direction, dma_addr_t dma_addr,
- unsigned long attrs)
+static inline void debug_dma_map_phys(struct device *dev, phys_addr_t phys,
+ size_t size, int direction,
+ dma_addr_t dma_addr, unsigned long attrs)
{
}

-static inline void debug_dma_unmap_page(struct device *dev, dma_addr_t addr,
+static inline void debug_dma_unmap_phys(struct device *dev, dma_addr_t addr,
size_t size, int direction)
{
}
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index 107e4a4d251df..4c1dfbabb8ae5 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -157,6 +157,7 @@ dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
unsigned long attrs)
{
const struct dma_map_ops *ops = get_dma_ops(dev);
+ phys_addr_t phys = page_to_phys(page) + offset;
dma_addr_t addr;

BUG_ON(!valid_dma_direction(dir));
@@ -165,16 +166,15 @@ dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
return DMA_MAPPING_ERROR;

if (dma_map_direct(dev, ops) ||
- arch_dma_map_page_direct(dev, page_to_phys(page) + offset + size))
+ arch_dma_map_page_direct(dev, phys + size))
addr = dma_direct_map_page(dev, page, offset, size, dir, attrs);
else if (use_dma_iommu(dev))
addr = iommu_dma_map_page(dev, page, offset, size, dir, attrs);
else
addr = ops->map_page(dev, page, offset, size, dir, attrs);
kmsan_handle_dma(page, offset, size, dir);
- trace_dma_map_page(dev, page_to_phys(page) + offset, addr, size, dir,
- attrs);
- debug_dma_map_page(dev, page, offset, size, dir, addr, attrs);
+ trace_dma_map_page(dev, phys, addr, size, dir, attrs);
+ debug_dma_map_phys(dev, phys, size, dir, addr, attrs);

return addr;
}
@@ -194,7 +194,7 @@ void dma_unmap_page_attrs(struct device *dev, dma_addr_t addr, size_t size,
else
ops->unmap_page(dev, addr, size, dir, attrs);
trace_dma_unmap_page(dev, addr, size, dir, attrs);
- debug_dma_unmap_page(dev, addr, size, dir);
+ debug_dma_unmap_phys(dev, addr, size, dir);
}
EXPORT_SYMBOL(dma_unmap_page_attrs);

@@ -712,7 +712,8 @@ struct page *dma_alloc_pages(struct device *dev, size_t size,
if (page) {
trace_dma_alloc_pages(dev, page_to_virt(page), *dma_handle,
size, dir, gfp, 0);
- debug_dma_map_page(dev, page, 0, size, dir, *dma_handle, 0);
+ debug_dma_map_phys(dev, page_to_phys(page), size, dir,
+ *dma_handle, 0);
} else {
trace_dma_alloc_pages(dev, NULL, 0, size, dir, gfp, 0);
}
@@ -738,7 +739,7 @@ void dma_free_pages(struct device *dev, size_t size, struct page *page,
dma_addr_t dma_handle, enum dma_data_direction dir)
{
trace_dma_free_pages(dev, page_to_virt(page), dma_handle, size, dir, 0);
- debug_dma_unmap_page(dev, dma_handle, size, dir);
+ debug_dma_unmap_phys(dev, dma_handle, size, dir);
__dma_free_pages(dev, size, page, dma_handle, dir);
}
EXPORT_SYMBOL_GPL(dma_free_pages);
--
2.50.1

Leon Romanovsky

unread,
Aug 4, 2025, 8:43:31 AM
to Marek Szyprowski, Leon Romanovsky, Jason Gunthorpe, Abdiel Janulgue, Alexander Potapenko, Alex Gaynor, Andrew Morton, Christoph Hellwig, Danilo Krummrich, io...@lists.linux.dev, Jason Wang, Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross, kasa...@googlegroups.com, Keith Busch, linux...@vger.kernel.org, linu...@vger.kernel.org, linux-...@vger.kernel.org, linu...@kvack.org, linux...@lists.infradead.org, linuxp...@lists.ozlabs.org, linux-tra...@vger.kernel.org, Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman, Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-fo...@vger.kernel.org, Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtual...@lists.linux.dev, Will Deacon, xen-...@lists.xenproject.org
From: Leon Romanovsky <leo...@nvidia.com>

Introduce the DMA_ATTR_MMIO attribute to mark DMA buffers
that reside in memory-mapped I/O (MMIO) regions, such as device BARs
exposed through the host bridge, which are accessible for peer-to-peer
(P2P) DMA.

This attribute is especially useful for exporting device memory to other
devices for DMA without CPU involvement, and avoids unnecessary or
potentially detrimental CPU cache maintenance calls.
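
A minimal usage sketch (assuming the dma_map_phys()/dma_unmap_phys()
interface exported later in this series; "bar_phys" and "len" are
placeholder driver variables):

  /* Map a device BAR for peer-to-peer DMA; no CPU cache maintenance. */
  dma_addr_t dma = dma_map_phys(dev, bar_phys, len,
                                DMA_BIDIRECTIONAL, DMA_ATTR_MMIO);

  if (dma_mapping_error(dev, dma))
          return -EIO;
  ...
  dma_unmap_phys(dev, dma, len, DMA_BIDIRECTIONAL, DMA_ATTR_MMIO);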

Signed-off-by: Leon Romanovsky <leo...@nvidia.com>
---
Documentation/core-api/dma-attributes.rst | 7 +++++++
include/linux/dma-mapping.h | 14 ++++++++++++++
include/trace/events/dma.h | 3 ++-
rust/kernel/dma.rs | 3 +++
4 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/Documentation/core-api/dma-attributes.rst b/Documentation/core-api/dma-attributes.rst
index 1887d92e8e926..91acd2684e506 100644
--- a/Documentation/core-api/dma-attributes.rst
+++ b/Documentation/core-api/dma-attributes.rst
@@ -130,3 +130,10 @@ accesses to DMA buffers in both privileged "supervisor" and unprivileged
subsystem that the buffer is fully accessible at the elevated privilege
level (and ideally inaccessible or at least read-only at the
lesser-privileged levels).
+
+DMA_ATTR_MMIO
+-------------
+
+This attribute marks the buffer as a memory-mapped I/O (MMIO) region, such as
+a device BAR exposed for peer-to-peer DMA rather than system RAM, and it
+guarantees that no CPU cache maintenance calls will be made for the mapping.
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 55c03e5fe8cb3..afc89835c7457 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -58,6 +58,20 @@
*/
#define DMA_ATTR_PRIVILEGED (1UL << 9)

+/*
+ * DMA_ATTR_MMIO - Indicates memory-mapped I/O (MMIO) region for DMA mapping
+ *
+ * This attribute is used for MMIO memory regions that are exposed through
+ * the host bridge and are accessible for peer-to-peer (P2P) DMA. Memory
+ * marked with this attribute is not system RAM and may represent device
+ * BAR windows or peer-exposed memory.
+ *
+ * Typical usage is for mapping hardware memory BARs or exporting device
+ * memory to other devices for DMA without involving main system RAM.
+ * The attribute guarantees no CPU cache maintenance calls will be made.
+ */
+#define DMA_ATTR_MMIO (1UL << 10)
+
/*
* A dma_addr_t can hold any valid DMA or bus address for the platform. It can
* be given to a device to use as a DMA source or target. It is specific to a
diff --git a/include/trace/events/dma.h b/include/trace/events/dma.h
index d8ddc27b6a7c8..ee90d6f1dcf35 100644
--- a/include/trace/events/dma.h
+++ b/include/trace/events/dma.h
@@ -31,7 +31,8 @@ TRACE_DEFINE_ENUM(DMA_NONE);
{ DMA_ATTR_FORCE_CONTIGUOUS, "FORCE_CONTIGUOUS" }, \
{ DMA_ATTR_ALLOC_SINGLE_PAGES, "ALLOC_SINGLE_PAGES" }, \
{ DMA_ATTR_NO_WARN, "NO_WARN" }, \
- { DMA_ATTR_PRIVILEGED, "PRIVILEGED" })
+ { DMA_ATTR_PRIVILEGED, "PRIVILEGED" }, \
+ { DMA_ATTR_MMIO, "MMIO" })

DECLARE_EVENT_CLASS(dma_map,
TP_PROTO(struct device *dev, phys_addr_t phys_addr, dma_addr_t dma_addr,
diff --git a/rust/kernel/dma.rs b/rust/kernel/dma.rs
index 2bc8ab51ec280..61d9eed7a786e 100644
--- a/rust/kernel/dma.rs
+++ b/rust/kernel/dma.rs
@@ -242,6 +242,9 @@ pub mod attrs {
/// Indicates that the buffer is fully accessible at an elevated privilege level (and
/// ideally inaccessible or at least read-only at lesser-privileged levels).
pub const DMA_ATTR_PRIVILEGED: Attrs = Attrs(bindings::DMA_ATTR_PRIVILEGED);
+
+ /// Indicates that the buffer is MMIO memory.
+ pub const DMA_ATTR_MMIO: Attrs = Attrs(bindings::DMA_ATTR_MMIO);
}

/// An abstraction of the `dma_alloc_coherent` API.
--
2.50.1

Leon Romanovsky

unread,
Aug 4, 2025, 8:43:36 AM
to Marek Szyprowski, Leon Romanovsky, Jason Gunthorpe, Abdiel Janulgue, Alexander Potapenko, Alex Gaynor, Andrew Morton, Christoph Hellwig, Danilo Krummrich, io...@lists.linux.dev, Jason Wang, Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross, kasa...@googlegroups.com, Keith Busch, linux...@vger.kernel.org, linu...@vger.kernel.org, linux-...@vger.kernel.org, linu...@kvack.org, linux...@lists.infradead.org, linuxp...@lists.ozlabs.org, linux-tra...@vger.kernel.org, Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman, Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-fo...@vger.kernel.org, Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtual...@lists.linux.dev, Will Deacon, xen-...@lists.xenproject.org
From: Leon Romanovsky <leo...@nvidia.com>

As a preparation for following map_page -> map_phys API conversion,
let's rename trace_dma_*map_page() to be trace_dma_*map_phys().

Signed-off-by: Leon Romanovsky <leo...@nvidia.com>
---
include/trace/events/dma.h | 4 ++--
kernel/dma/mapping.c | 4 ++--
2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/trace/events/dma.h b/include/trace/events/dma.h
index ee90d6f1dcf35..84416c7d6bfaa 100644
--- a/include/trace/events/dma.h
+++ b/include/trace/events/dma.h
@@ -72,7 +72,7 @@ DEFINE_EVENT(dma_map, name, \
size_t size, enum dma_data_direction dir, unsigned long attrs), \
TP_ARGS(dev, phys_addr, dma_addr, size, dir, attrs))

-DEFINE_MAP_EVENT(dma_map_page);
+DEFINE_MAP_EVENT(dma_map_phys);
DEFINE_MAP_EVENT(dma_map_resource);

DECLARE_EVENT_CLASS(dma_unmap,
@@ -110,7 +110,7 @@ DEFINE_EVENT(dma_unmap, name, \
enum dma_data_direction dir, unsigned long attrs), \
TP_ARGS(dev, addr, size, dir, attrs))

-DEFINE_UNMAP_EVENT(dma_unmap_page);
+DEFINE_UNMAP_EVENT(dma_unmap_phys);
DEFINE_UNMAP_EVENT(dma_unmap_resource);

DECLARE_EVENT_CLASS(dma_alloc_class,
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index 4c1dfbabb8ae5..fe1f0da6dc507 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -173,7 +173,7 @@ dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
else
addr = ops->map_page(dev, page, offset, size, dir, attrs);
kmsan_handle_dma(page, offset, size, dir);
- trace_dma_map_page(dev, phys, addr, size, dir, attrs);
+ trace_dma_map_phys(dev, phys, addr, size, dir, attrs);
debug_dma_map_phys(dev, phys, size, dir, addr, attrs);

return addr;
@@ -193,7 +193,7 @@ void dma_unmap_page_attrs(struct device *dev, dma_addr_t addr, size_t size,
iommu_dma_unmap_page(dev, addr, size, dir, attrs);
else
ops->unmap_page(dev, addr, size, dir, attrs);
- trace_dma_unmap_page(dev, addr, size, dir, attrs);
+ trace_dma_unmap_phys(dev, addr, size, dir, attrs);
debug_dma_unmap_phys(dev, addr, size, dir);
}
EXPORT_SYMBOL(dma_unmap_page_attrs);
--
2.50.1

Leon Romanovsky

unread,
Aug 4, 2025, 8:43:42 AM
to Marek Szyprowski, Leon Romanovsky, Jason Gunthorpe, Abdiel Janulgue, Alexander Potapenko, Alex Gaynor, Andrew Morton, Christoph Hellwig, Danilo Krummrich, io...@lists.linux.dev, Jason Wang, Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross, kasa...@googlegroups.com, Keith Busch, linux...@vger.kernel.org, linu...@vger.kernel.org, linux-...@vger.kernel.org, linu...@kvack.org, linux...@lists.infradead.org, linuxp...@lists.ozlabs.org, linux-tra...@vger.kernel.org, Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman, Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-fo...@vger.kernel.org, Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtual...@lists.linux.dev, Will Deacon, xen-...@lists.xenproject.org
From: Leon Romanovsky <leo...@nvidia.com>

Combine the iommu_dma_*map_phys() and iommu_dma_*map_resource()
interfaces in order to allow a single phys_addr_t based flow.

Signed-off-by: Leon Romanovsky <leo...@nvidia.com>
---
drivers/iommu/dma-iommu.c | 20 ++++++++++++++++----
1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 11c5d5f8c0981..0a19ce50938b3 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -1193,12 +1193,17 @@ static inline size_t iova_unaligned(struct iova_domain *iovad, phys_addr_t phys,
dma_addr_t iommu_dma_map_phys(struct device *dev, phys_addr_t phys, size_t size,
enum dma_data_direction dir, unsigned long attrs)
{
- bool coherent = dev_is_dma_coherent(dev);
- int prot = dma_info_to_prot(dir, coherent, attrs);
struct iommu_domain *domain = iommu_get_dma_domain(dev);
struct iommu_dma_cookie *cookie = domain->iova_cookie;
struct iova_domain *iovad = &cookie->iovad;
dma_addr_t iova, dma_mask = dma_get_mask(dev);
+ bool coherent;
+ int prot;
+
+ if (attrs & DMA_ATTR_MMIO)
+ return __iommu_dma_map(dev, phys, size,
+ dma_info_to_prot(dir, false, attrs) | IOMMU_MMIO,
+ dma_get_mask(dev));

/*
* If both the physical buffer start address and size are page aligned,
@@ -1211,6 +1216,9 @@ dma_addr_t iommu_dma_map_phys(struct device *dev, phys_addr_t phys, size_t size,
return DMA_MAPPING_ERROR;
}

+ coherent = dev_is_dma_coherent(dev);
+ prot = dma_info_to_prot(dir, coherent, attrs);
+
if (!coherent && !(attrs & DMA_ATTR_SKIP_CPU_SYNC))
arch_sync_dma_for_device(phys, size, dir);

@@ -1223,10 +1231,14 @@ dma_addr_t iommu_dma_map_phys(struct device *dev, phys_addr_t phys, size_t size,
void iommu_dma_unmap_phys(struct device *dev, dma_addr_t dma_handle,
size_t size, enum dma_data_direction dir, unsigned long attrs)
{
- struct iommu_domain *domain = iommu_get_dma_domain(dev);
phys_addr_t phys;

- phys = iommu_iova_to_phys(domain, dma_handle);
+ if (attrs & DMA_ATTR_MMIO) {
+ __iommu_dma_unmap(dev, dma_handle, size);
+ return;
+ }
+
+ phys = iommu_iova_to_phys(iommu_get_dma_domain(dev), dma_handle);
if (WARN_ON(!phys))
return;

--
2.50.1

Leon Romanovsky

unread,
Aug 4, 2025, 8:43:46 AM
to Marek Szyprowski, Leon Romanovsky, Jason Gunthorpe, Abdiel Janulgue, Alexander Potapenko, Alex Gaynor, Andrew Morton, Christoph Hellwig, Danilo Krummrich, io...@lists.linux.dev, Jason Wang, Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross, kasa...@googlegroups.com, Keith Busch, linux...@vger.kernel.org, linu...@vger.kernel.org, linux-...@vger.kernel.org, linu...@kvack.org, linux...@lists.infradead.org, linuxp...@lists.ozlabs.org, linux-tra...@vger.kernel.org, Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman, Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-fo...@vger.kernel.org, Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtual...@lists.linux.dev, Will Deacon, xen-...@lists.xenproject.org
From: Leon Romanovsky <leo...@nvidia.com>

Convert the DMA direct mapping functions to accept physical addresses
directly instead of page+offset parameters. The functions were already
operating on physical addresses internally, so this change eliminates
the redundant page-to-physical conversion at the API boundary.

The functions dma_direct_map_page() and dma_direct_unmap_page() are
renamed to dma_direct_map_phys() and dma_direct_unmap_phys() respectively,
with their calling convention changed from (struct page *page,
unsigned long offset) to (phys_addr_t phys).

Architecture-specific functions arch_dma_map_page_direct() and
arch_dma_unmap_page_direct() are similarly renamed to
arch_dma_map_phys_direct() and arch_dma_unmap_phys_direct().

The is_pci_p2pdma_page() checks are replaced with DMA_ATTR_MMIO checks
to allow integration with dma_direct_map_resource(), and
dma_direct_map_phys() is extended to support the MMIO path as well.
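
For reference, the caller in kernel/dma/mapping.c changes roughly as
follows (illustrative sketch, condensed from the diff below):

  /* before */
  addr = dma_direct_map_page(dev, page, offset, size, dir, attrs);

  /* after */
  addr = dma_direct_map_phys(dev, page_to_phys(page) + offset, size,
                             dir, attrs);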

Signed-off-by: Leon Romanovsky <leo...@nvidia.com>
---
arch/powerpc/kernel/dma-iommu.c | 4 +--
include/linux/dma-map-ops.h | 8 +++---
kernel/dma/direct.c | 6 ++--
kernel/dma/direct.h | 50 ++++++++++++++++++++-------------
kernel/dma/mapping.c | 8 +++---
5 files changed, 44 insertions(+), 32 deletions(-)

diff --git a/arch/powerpc/kernel/dma-iommu.c b/arch/powerpc/kernel/dma-iommu.c
index 4d64a5db50f38..0359ab72cd3ba 100644
--- a/arch/powerpc/kernel/dma-iommu.c
+++ b/arch/powerpc/kernel/dma-iommu.c
@@ -14,7 +14,7 @@
#define can_map_direct(dev, addr) \
((dev)->bus_dma_limit >= phys_to_dma((dev), (addr)))

-bool arch_dma_map_page_direct(struct device *dev, phys_addr_t addr)
+bool arch_dma_map_phys_direct(struct device *dev, phys_addr_t addr)
{
if (likely(!dev->bus_dma_limit))
return false;
@@ -24,7 +24,7 @@ bool arch_dma_map_page_direct(struct device *dev, phys_addr_t addr)

#define is_direct_handle(dev, h) ((h) >= (dev)->archdata.dma_offset)

-bool arch_dma_unmap_page_direct(struct device *dev, dma_addr_t dma_handle)
+bool arch_dma_unmap_phys_direct(struct device *dev, dma_addr_t dma_handle)
{
if (likely(!dev->bus_dma_limit))
return false;
diff --git a/include/linux/dma-map-ops.h b/include/linux/dma-map-ops.h
index f48e5fb88bd5d..71f5b30254159 100644
--- a/include/linux/dma-map-ops.h
+++ b/include/linux/dma-map-ops.h
@@ -392,15 +392,15 @@ void *arch_dma_set_uncached(void *addr, size_t size);
void arch_dma_clear_uncached(void *addr, size_t size);

#ifdef CONFIG_ARCH_HAS_DMA_MAP_DIRECT
-bool arch_dma_map_page_direct(struct device *dev, phys_addr_t addr);
-bool arch_dma_unmap_page_direct(struct device *dev, dma_addr_t dma_handle);
+bool arch_dma_map_phys_direct(struct device *dev, phys_addr_t addr);
+bool arch_dma_unmap_phys_direct(struct device *dev, dma_addr_t dma_handle);
bool arch_dma_map_sg_direct(struct device *dev, struct scatterlist *sg,
int nents);
bool arch_dma_unmap_sg_direct(struct device *dev, struct scatterlist *sg,
int nents);
#else
-#define arch_dma_map_page_direct(d, a) (false)
-#define arch_dma_unmap_page_direct(d, a) (false)
+#define arch_dma_map_phys_direct(d, a) (false)
+#define arch_dma_unmap_phys_direct(d, a) (false)
#define arch_dma_map_sg_direct(d, s, n) (false)
#define arch_dma_unmap_sg_direct(d, s, n) (false)
#endif
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 24c359d9c8799..fa75e30700730 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -453,7 +453,7 @@ void dma_direct_unmap_sg(struct device *dev, struct scatterlist *sgl,
if (sg_dma_is_bus_address(sg))
sg_dma_unmark_bus_address(sg);
else
- dma_direct_unmap_page(dev, sg->dma_address,
+ dma_direct_unmap_phys(dev, sg->dma_address,
sg_dma_len(sg), dir, attrs);
}
}
@@ -476,8 +476,8 @@ int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl, int nents,
*/
break;
case PCI_P2PDMA_MAP_NONE:
- sg->dma_address = dma_direct_map_page(dev, sg_page(sg),
- sg->offset, sg->length, dir, attrs);
+ sg->dma_address = dma_direct_map_phys(dev, sg_phys(sg),
+ sg->length, dir, attrs);
if (sg->dma_address == DMA_MAPPING_ERROR) {
ret = -EIO;
goto out_unmap;
diff --git a/kernel/dma/direct.h b/kernel/dma/direct.h
index d2c0b7e632fc0..2b442efc9b5a7 100644
--- a/kernel/dma/direct.h
+++ b/kernel/dma/direct.h
@@ -80,42 +80,54 @@ static inline void dma_direct_sync_single_for_cpu(struct device *dev,
arch_dma_mark_clean(paddr, size);
}

-static inline dma_addr_t dma_direct_map_page(struct device *dev,
- struct page *page, unsigned long offset, size_t size,
- enum dma_data_direction dir, unsigned long attrs)
+static inline dma_addr_t dma_direct_map_phys(struct device *dev,
+ phys_addr_t phys, size_t size, enum dma_data_direction dir,
+ unsigned long attrs)
{
- phys_addr_t phys = page_to_phys(page) + offset;
- dma_addr_t dma_addr = phys_to_dma(dev, phys);
+ bool is_mmio = attrs & DMA_ATTR_MMIO;
+ dma_addr_t dma_addr;
+ bool capable;
+
+ dma_addr = (is_mmio) ? phys : phys_to_dma(dev, phys);
+ capable = dma_capable(dev, dma_addr, size, is_mmio);
+ if (is_mmio) {
+ if (unlikely(!capable))
+ goto err_overflow;
+ return dma_addr;
+ }

- if (is_swiotlb_force_bounce(dev)) {
- if (is_pci_p2pdma_page(page))
- return DMA_MAPPING_ERROR;
+ if (is_swiotlb_force_bounce(dev))
return swiotlb_map(dev, phys, size, dir, attrs);
- }

- if (unlikely(!dma_capable(dev, dma_addr, size, true)) ||
- dma_kmalloc_needs_bounce(dev, size, dir)) {
- if (is_pci_p2pdma_page(page))
- return DMA_MAPPING_ERROR;
+ if (unlikely(!capable) || dma_kmalloc_needs_bounce(dev, size, dir)) {
if (is_swiotlb_active(dev))
return swiotlb_map(dev, phys, size, dir, attrs);

- dev_WARN_ONCE(dev, 1,
- "DMA addr %pad+%zu overflow (mask %llx, bus limit %llx).\n",
- &dma_addr, size, *dev->dma_mask, dev->bus_dma_limit);
- return DMA_MAPPING_ERROR;
+ goto err_overflow;
}

if (!dev_is_dma_coherent(dev) && !(attrs & DMA_ATTR_SKIP_CPU_SYNC))
arch_sync_dma_for_device(phys, size, dir);
return dma_addr;
+
+err_overflow:
+ dev_WARN_ONCE(
+ dev, 1,
+ "DMA addr %pad+%zu overflow (mask %llx, bus limit %llx).\n",
+ &dma_addr, size, *dev->dma_mask, dev->bus_dma_limit);
+ return DMA_MAPPING_ERROR;
}

-static inline void dma_direct_unmap_page(struct device *dev, dma_addr_t addr,
+static inline void dma_direct_unmap_phys(struct device *dev, dma_addr_t addr,
size_t size, enum dma_data_direction dir, unsigned long attrs)
{
- phys_addr_t phys = dma_to_phys(dev, addr);
+ phys_addr_t phys;
+
+ if (attrs & DMA_ATTR_MMIO)
+ /* nothing to do: uncached and no swiotlb */
+ return;

+ phys = dma_to_phys(dev, addr);
if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
dma_direct_sync_single_for_cpu(dev, addr, size, dir);

diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index 58482536db9bb..80481a873340a 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -166,8 +166,8 @@ dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
return DMA_MAPPING_ERROR;

if (dma_map_direct(dev, ops) ||
- arch_dma_map_page_direct(dev, phys + size))
- addr = dma_direct_map_page(dev, page, offset, size, dir, attrs);
+ arch_dma_map_phys_direct(dev, phys + size))
+ addr = dma_direct_map_phys(dev, phys, size, dir, attrs);
else if (use_dma_iommu(dev))
addr = iommu_dma_map_phys(dev, phys, size, dir, attrs);
else
@@ -187,8 +187,8 @@ void dma_unmap_page_attrs(struct device *dev, dma_addr_t addr, size_t size,

BUG_ON(!valid_dma_direction(dir));
if (dma_map_direct(dev, ops) ||
- arch_dma_unmap_page_direct(dev, addr + size))
- dma_direct_unmap_page(dev, addr, size, dir, attrs);
+ arch_dma_unmap_phys_direct(dev, addr + size))
+ dma_direct_unmap_phys(dev, addr, size, dir, attrs);
else if (use_dma_iommu(dev))
iommu_dma_unmap_phys(dev, addr, size, dir, attrs);
else
--
2.50.1

Leon Romanovsky

unread,
Aug 4, 2025, 8:43:54 AM
to Marek Szyprowski, Leon Romanovsky, Jason Gunthorpe, Abdiel Janulgue, Alexander Potapenko, Alex Gaynor, Andrew Morton, Christoph Hellwig, Danilo Krummrich, io...@lists.linux.dev, Jason Wang, Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross, kasa...@googlegroups.com, Keith Busch, linux...@vger.kernel.org, linu...@vger.kernel.org, linux-...@vger.kernel.org, linu...@kvack.org, linux...@lists.infradead.org, linuxp...@lists.ozlabs.org, linux-tra...@vger.kernel.org, Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman, Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-fo...@vger.kernel.org, Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtual...@lists.linux.dev, Will Deacon, xen-...@lists.xenproject.org
From: Leon Romanovsky <leo...@nvidia.com>

Rename the IOMMU DMA mapping functions to better reflect their actual
calling convention. The functions iommu_dma_map_page() and
iommu_dma_unmap_page() are renamed to iommu_dma_map_phys() and
iommu_dma_unmap_phys() respectively, as they already operate on physical
addresses rather than page structures.

The calling convention changes from accepting (struct page *page,
unsigned long offset) to (phys_addr_t phys), which eliminates the need
for page-to-physical address conversion within the functions. This
renaming prepares for the broader DMA API conversion from page-based
to physical address-based mapping throughout the kernel.

All callers are updated to pass physical addresses directly, including
dma_map_page_attrs(), scatterlist mapping functions, and DMA page
allocation helpers. The change simplifies the code by removing the
page_to_phys() + offset calculation that was previously done inside
the IOMMU functions.

Signed-off-by: Leon Romanovsky <leo...@nvidia.com>
---
drivers/iommu/dma-iommu.c | 14 ++++++--------
include/linux/iommu-dma.h | 7 +++----
kernel/dma/mapping.c | 4 ++--
kernel/dma/ops_helpers.c | 6 +++---
4 files changed, 14 insertions(+), 17 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 399838c17b705..11c5d5f8c0981 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -1190,11 +1190,9 @@ static inline size_t iova_unaligned(struct iova_domain *iovad, phys_addr_t phys,
return iova_offset(iovad, phys | size);
}

-dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page,
- unsigned long offset, size_t size, enum dma_data_direction dir,
- unsigned long attrs)
+dma_addr_t iommu_dma_map_phys(struct device *dev, phys_addr_t phys, size_t size,
+ enum dma_data_direction dir, unsigned long attrs)
{
- phys_addr_t phys = page_to_phys(page) + offset;
bool coherent = dev_is_dma_coherent(dev);
int prot = dma_info_to_prot(dir, coherent, attrs);
struct iommu_domain *domain = iommu_get_dma_domain(dev);
@@ -1222,7 +1220,7 @@ dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page,
return iova;
}

-void iommu_dma_unmap_page(struct device *dev, dma_addr_t dma_handle,
+void iommu_dma_unmap_phys(struct device *dev, dma_addr_t dma_handle,
size_t size, enum dma_data_direction dir, unsigned long attrs)
{
struct iommu_domain *domain = iommu_get_dma_domain(dev);
@@ -1341,7 +1339,7 @@ static void iommu_dma_unmap_sg_swiotlb(struct device *dev, struct scatterlist *s
int i;

for_each_sg(sg, s, nents, i)
- iommu_dma_unmap_page(dev, sg_dma_address(s),
+ iommu_dma_unmap_phys(dev, sg_dma_address(s),
sg_dma_len(s), dir, attrs);
}

@@ -1354,8 +1352,8 @@ static int iommu_dma_map_sg_swiotlb(struct device *dev, struct scatterlist *sg,
sg_dma_mark_swiotlb(sg);

for_each_sg(sg, s, nents, i) {
- sg_dma_address(s) = iommu_dma_map_page(dev, sg_page(s),
- s->offset, s->length, dir, attrs);
+ sg_dma_address(s) = iommu_dma_map_phys(dev, sg_phys(s),
+ s->length, dir, attrs);
if (sg_dma_address(s) == DMA_MAPPING_ERROR)
goto out_unmap;
sg_dma_len(s) = s->length;
diff --git a/include/linux/iommu-dma.h b/include/linux/iommu-dma.h
index 508beaa44c39e..485bdffed9888 100644
--- a/include/linux/iommu-dma.h
+++ b/include/linux/iommu-dma.h
@@ -21,10 +21,9 @@ static inline bool use_dma_iommu(struct device *dev)
}
#endif /* CONFIG_IOMMU_DMA */

-dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page,
- unsigned long offset, size_t size, enum dma_data_direction dir,
- unsigned long attrs);
-void iommu_dma_unmap_page(struct device *dev, dma_addr_t dma_handle,
+dma_addr_t iommu_dma_map_phys(struct device *dev, phys_addr_t phys, size_t size,
+ enum dma_data_direction dir, unsigned long attrs);
+void iommu_dma_unmap_phys(struct device *dev, dma_addr_t dma_handle,
size_t size, enum dma_data_direction dir, unsigned long attrs);
int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg, int nents,
enum dma_data_direction dir, unsigned long attrs);
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index fe1f0da6dc507..58482536db9bb 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -169,7 +169,7 @@ dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
arch_dma_map_page_direct(dev, phys + size))
addr = dma_direct_map_page(dev, page, offset, size, dir, attrs);
else if (use_dma_iommu(dev))
- addr = iommu_dma_map_page(dev, page, offset, size, dir, attrs);
+ addr = iommu_dma_map_phys(dev, phys, size, dir, attrs);
else
addr = ops->map_page(dev, page, offset, size, dir, attrs);
kmsan_handle_dma(page, offset, size, dir);
@@ -190,7 +190,7 @@ void dma_unmap_page_attrs(struct device *dev, dma_addr_t addr, size_t size,
arch_dma_unmap_page_direct(dev, addr + size))
dma_direct_unmap_page(dev, addr, size, dir, attrs);
else if (use_dma_iommu(dev))
- iommu_dma_unmap_page(dev, addr, size, dir, attrs);
+ iommu_dma_unmap_phys(dev, addr, size, dir, attrs);
else
ops->unmap_page(dev, addr, size, dir, attrs);
trace_dma_unmap_phys(dev, addr, size, dir, attrs);
diff --git a/kernel/dma/ops_helpers.c b/kernel/dma/ops_helpers.c
index 9afd569eadb96..6f9d604d9d406 100644
--- a/kernel/dma/ops_helpers.c
+++ b/kernel/dma/ops_helpers.c
@@ -72,8 +72,8 @@ struct page *dma_common_alloc_pages(struct device *dev, size_t size,
return NULL;

if (use_dma_iommu(dev))
- *dma_handle = iommu_dma_map_page(dev, page, 0, size, dir,
- DMA_ATTR_SKIP_CPU_SYNC);
+ *dma_handle = iommu_dma_map_phys(dev, page_to_phys(page), size,
+ dir, DMA_ATTR_SKIP_CPU_SYNC);
else
*dma_handle = ops->map_page(dev, page, 0, size, dir,
DMA_ATTR_SKIP_CPU_SYNC);
@@ -92,7 +92,7 @@ void dma_common_free_pages(struct device *dev, size_t size, struct page *page,
const struct dma_map_ops *ops = get_dma_ops(dev);

if (use_dma_iommu(dev))
- iommu_dma_unmap_page(dev, dma_handle, size, dir,
+ iommu_dma_unmap_phys(dev, dma_handle, size, dir,
DMA_ATTR_SKIP_CPU_SYNC);
else if (ops->unmap_page)
ops->unmap_page(dev, dma_handle, size, dir,
--
2.50.1

Leon Romanovsky

unread,
Aug 4, 2025, 8:43:56 AM
to Marek Szyprowski, Leon Romanovsky, Jason Gunthorpe, Abdiel Janulgue, Alexander Potapenko, Alex Gaynor, Andrew Morton, Christoph Hellwig, Danilo Krummrich, io...@lists.linux.dev, Jason Wang, Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross, kasa...@googlegroups.com, Keith Busch, linux...@vger.kernel.org, linu...@vger.kernel.org, linux-...@vger.kernel.org, linu...@kvack.org, linux...@lists.infradead.org, linuxp...@lists.ozlabs.org, linux-tra...@vger.kernel.org, Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman, Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-fo...@vger.kernel.org, Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtual...@lists.linux.dev, Will Deacon, xen-...@lists.xenproject.org
From: Leon Romanovsky <leo...@nvidia.com>

Extend the base DMA page API (dma_map_page_attrs() and
dma_unmap_page_attrs()) to handle the MMIO flow when DMA_ATTR_MMIO is set.

Signed-off-by: Leon Romanovsky <leo...@nvidia.com>
---
kernel/dma/mapping.c | 24 ++++++++++++++++++++----
1 file changed, 20 insertions(+), 4 deletions(-)

diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index 709405d46b2b4..f5f051737e556 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -158,6 +158,7 @@ dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
{
const struct dma_map_ops *ops = get_dma_ops(dev);
phys_addr_t phys = page_to_phys(page) + offset;
+ bool is_mmio = attrs & DMA_ATTR_MMIO;
dma_addr_t addr;

BUG_ON(!valid_dma_direction(dir));
@@ -166,12 +167,23 @@ dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
return DMA_MAPPING_ERROR;

if (dma_map_direct(dev, ops) ||
- arch_dma_map_phys_direct(dev, phys + size))
+ (!is_mmio && arch_dma_map_phys_direct(dev, phys + size)))
addr = dma_direct_map_phys(dev, phys, size, dir, attrs);
else if (use_dma_iommu(dev))
addr = iommu_dma_map_phys(dev, phys, size, dir, attrs);
- else
+ else if (is_mmio) {
+ if (!ops->map_resource)
+ return DMA_MAPPING_ERROR;
+
+ addr = ops->map_resource(dev, phys, size, dir, attrs);
+ } else {
+ /*
+ * All platforms which implement .map_page() don't support
+ * non-struct page backed addresses.
+ */
addr = ops->map_page(dev, page, offset, size, dir, attrs);
+ }
+
kmsan_handle_dma(phys, size, dir);
trace_dma_map_phys(dev, phys, addr, size, dir, attrs);
debug_dma_map_phys(dev, phys, size, dir, addr, attrs);
@@ -184,14 +196,18 @@ void dma_unmap_page_attrs(struct device *dev, dma_addr_t addr, size_t size,
enum dma_data_direction dir, unsigned long attrs)
{
const struct dma_map_ops *ops = get_dma_ops(dev);
+ bool is_mmio = attrs & DMA_ATTR_MMIO;

BUG_ON(!valid_dma_direction(dir));
if (dma_map_direct(dev, ops) ||
- arch_dma_unmap_phys_direct(dev, addr + size))
+ (!is_mmio && arch_dma_unmap_phys_direct(dev, addr + size)))
dma_direct_unmap_phys(dev, addr, size, dir, attrs);
else if (use_dma_iommu(dev))
iommu_dma_unmap_phys(dev, addr, size, dir, attrs);
- else
+ else if (is_mmio) {
+ if (ops->unmap_resource)
+ ops->unmap_resource(dev, addr, size, dir, attrs);
+ } else
ops->unmap_page(dev, addr, size, dir, attrs);
trace_dma_unmap_phys(dev, addr, size, dir, attrs);
debug_dma_unmap_phys(dev, addr, size, dir);
--
2.50.1

Leon Romanovsky

unread,
Aug 4, 2025, 8:44:01 AM
to Marek Szyprowski, Leon Romanovsky, Jason Gunthorpe, Abdiel Janulgue, Alexander Potapenko, Alex Gaynor, Andrew Morton, Christoph Hellwig, Danilo Krummrich, io...@lists.linux.dev, Jason Wang, Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross, kasa...@googlegroups.com, Keith Busch, linux...@vger.kernel.org, linu...@vger.kernel.org, linux-...@vger.kernel.org, linu...@kvack.org, linux...@lists.infradead.org, linuxp...@lists.ozlabs.org, linux-tra...@vger.kernel.org, Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman, Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-fo...@vger.kernel.org, Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtual...@lists.linux.dev, Will Deacon, xen-...@lists.xenproject.org
From: Leon Romanovsky <leo...@nvidia.com>

Convert the KMSAN DMA handling function from page-based to physical
address-based interface.

The refactoring changes the kmsan_handle_dma() parameters from
(struct page *page, size_t offset, size_t size) to (phys_addr_t phys,
size_t size). A pfn_valid() check is added so that KMSAN skips physical
addresses that are not backed by struct page.

As part of this change, support for highmem addresses is implemented
using kmap_local_page() to handle both lowmem and highmem regions
properly. All callers throughout the codebase are updated to use the
new phys_addr_t based interface.
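
A typical caller conversion looks like this (illustrative sketch; in
kernel/dma/mapping.c the physical address is already at hand):

  /* before */
  kmsan_handle_dma(page, offset, size, dir);

  /* after */
  kmsan_handle_dma(page_to_phys(page) + offset, size, dir);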

Signed-off-by: Leon Romanovsky <leo...@nvidia.com>
---
drivers/virtio/virtio_ring.c | 4 ++--
include/linux/kmsan.h | 12 +++++++-----
kernel/dma/mapping.c | 2 +-
mm/kmsan/hooks.c | 36 +++++++++++++++++++++++++++++-------
tools/virtio/linux/kmsan.h | 2 +-
5 files changed, 40 insertions(+), 16 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index f5062061c4084..c147145a65930 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -378,7 +378,7 @@ static int vring_map_one_sg(const struct vring_virtqueue *vq, struct scatterlist
* is initialized by the hardware. Explicitly check/unpoison it
* depending on the direction.
*/
- kmsan_handle_dma(sg_page(sg), sg->offset, sg->length, direction);
+ kmsan_handle_dma(sg_phys(sg), sg->length, direction);
*addr = (dma_addr_t)sg_phys(sg);
return 0;
}
@@ -3157,7 +3157,7 @@ dma_addr_t virtqueue_dma_map_single_attrs(struct virtqueue *_vq, void *ptr,
struct vring_virtqueue *vq = to_vvq(_vq);

if (!vq->use_dma_api) {
- kmsan_handle_dma(virt_to_page(ptr), offset_in_page(ptr), size, dir);
+ kmsan_handle_dma(virt_to_phys(ptr), size, dir);
return (dma_addr_t)virt_to_phys(ptr);
}

diff --git a/include/linux/kmsan.h b/include/linux/kmsan.h
index 2b1432cc16d59..6f27b9824ef77 100644
--- a/include/linux/kmsan.h
+++ b/include/linux/kmsan.h
@@ -182,8 +182,7 @@ void kmsan_iounmap_page_range(unsigned long start, unsigned long end);

/**
* kmsan_handle_dma() - Handle a DMA data transfer.
- * @page: first page of the buffer.
- * @offset: offset of the buffer within the first page.
+ * @phys: physical address of the buffer.
* @size: buffer size.
* @dir: one of possible dma_data_direction values.
*
@@ -191,8 +190,11 @@ void kmsan_iounmap_page_range(unsigned long start, unsigned long end);
* * checks the buffer, if it is copied to device;
* * initializes the buffer, if it is copied from device;
* * does both, if this is a DMA_BIDIRECTIONAL transfer.
+ *
+ * The function handles page lookup internally and supports both lowmem
+ * and highmem addresses.
*/
-void kmsan_handle_dma(struct page *page, size_t offset, size_t size,
+void kmsan_handle_dma(phys_addr_t phys, size_t size,
enum dma_data_direction dir);

/**
@@ -372,8 +374,8 @@ static inline void kmsan_iounmap_page_range(unsigned long start,
{
}

-static inline void kmsan_handle_dma(struct page *page, size_t offset,
- size_t size, enum dma_data_direction dir)
+static inline void kmsan_handle_dma(phys_addr_t phys, size_t size,
+ enum dma_data_direction dir)
{
}

diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index 80481a873340a..709405d46b2b4 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -172,7 +172,7 @@ dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
addr = iommu_dma_map_phys(dev, phys, size, dir, attrs);
else
addr = ops->map_page(dev, page, offset, size, dir, attrs);
- kmsan_handle_dma(page, offset, size, dir);
+ kmsan_handle_dma(phys, size, dir);
trace_dma_map_phys(dev, phys, addr, size, dir, attrs);
debug_dma_map_phys(dev, phys, size, dir, addr, attrs);

diff --git a/mm/kmsan/hooks.c b/mm/kmsan/hooks.c
index 97de3d6194f07..eab7912a3bf05 100644
--- a/mm/kmsan/hooks.c
+++ b/mm/kmsan/hooks.c
@@ -336,25 +336,48 @@ static void kmsan_handle_dma_page(const void *addr, size_t size,
}

/* Helper function to handle DMA data transfers. */
-void kmsan_handle_dma(struct page *page, size_t offset, size_t size,
+void kmsan_handle_dma(phys_addr_t phys, size_t size,
enum dma_data_direction dir)
{
u64 page_offset, to_go, addr;
+ struct page *page;
+ void *kaddr;

- if (PageHighMem(page))
+ if (!pfn_valid(PHYS_PFN(phys)))
return;
- addr = (u64)page_address(page) + offset;
+
+ page = phys_to_page(phys);
+ page_offset = offset_in_page(phys);
+
/*
* The kernel may occasionally give us adjacent DMA pages not belonging
* to the same allocation. Process them separately to avoid triggering
* internal KMSAN checks.
*/
while (size > 0) {
- page_offset = offset_in_page(addr);
to_go = min(PAGE_SIZE - page_offset, (u64)size);
+
+ if (PageHighMem(page))
+ /* Handle highmem pages using kmap */
+ kaddr = kmap_local_page(page);
+ else
+ /* Lowmem pages can be accessed directly */
+ kaddr = page_address(page);
+
+ addr = (u64)kaddr + page_offset;
kmsan_handle_dma_page((void *)addr, to_go, dir);
- addr += to_go;
+
+ if (PageHighMem(page))
+ kunmap_local(kaddr);
+
+ phys += to_go;
size -= to_go;
+
+ /* Move to next page if needed */
+ if (size > 0) {
+ page = phys_to_page(phys);
+ page_offset = offset_in_page(phys);
+ }
}
}
EXPORT_SYMBOL_GPL(kmsan_handle_dma);
@@ -366,8 +389,7 @@ void kmsan_handle_dma_sg(struct scatterlist *sg, int nents,
int i;

for_each_sg(sg, item, nents, i)
- kmsan_handle_dma(sg_page(item), item->offset, item->length,
- dir);
+ kmsan_handle_dma(sg_phys(item), item->length, dir);
}

/* Functions from kmsan-checks.h follow. */
diff --git a/tools/virtio/linux/kmsan.h b/tools/virtio/linux/kmsan.h
index 272b5aa285d5a..6cd2e3efd03dc 100644
--- a/tools/virtio/linux/kmsan.h
+++ b/tools/virtio/linux/kmsan.h
@@ -4,7 +4,7 @@

#include <linux/gfp.h>

-inline void kmsan_handle_dma(struct page *page, size_t offset, size_t size,
+inline void kmsan_handle_dma(phys_addr_t phys, size_t size,
enum dma_data_direction dir)
{
}
--
2.50.1

Leon Romanovsky

unread,
Aug 4, 2025, 8:44:07 AM
to Marek Szyprowski, Leon Romanovsky, Jason Gunthorpe, Abdiel Janulgue, Alexander Potapenko, Alex Gaynor, Andrew Morton, Christoph Hellwig, Danilo Krummrich, io...@lists.linux.dev, Jason Wang, Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross, kasa...@googlegroups.com, Keith Busch, linux...@vger.kernel.org, linu...@vger.kernel.org, linux-...@vger.kernel.org, linu...@kvack.org, linux...@lists.infradead.org, linuxp...@lists.ozlabs.org, linux-tra...@vger.kernel.org, Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman, Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-fo...@vger.kernel.org, Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtual...@lists.linux.dev, Will Deacon, xen-...@lists.xenproject.org
From: Leon Romanovsky <leo...@nvidia.com>

Introduce new DMA mapping functions dma_map_phys() and dma_unmap_phys()
that operate directly on physical addresses instead of page+offset
parameters. This provides a more efficient interface for drivers that
already have physical addresses available.

The new functions are implemented as the primary mapping layer, with
the existing dma_map_page_attrs() and dma_unmap_page_attrs() functions
converted to simple wrappers around the phys-based implementations.

The old page-based API is preserved in mapping.c so that existing code
is not affected by the new dma_*map_phys() interface being exported as
EXPORT_SYMBOL_GPL rather than EXPORT_SYMBOL.
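
A minimal usage sketch for a driver that already holds a physical
address ("phys" and "len" are placeholder variables):

  dma_addr_t dma;

  dma = dma_map_phys(dev, phys, len, DMA_TO_DEVICE, 0);
  if (dma_mapping_error(dev, dma))
          return -EIO;

  /* ... hardware performs the transfer ... */

  dma_unmap_phys(dev, dma, len, DMA_TO_DEVICE, 0);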

Signed-off-by: Leon Romanovsky <leo...@nvidia.com>
---
drivers/iommu/dma-iommu.c | 14 --------
include/linux/dma-direct.h | 2 --
include/linux/dma-mapping.h | 13 +++++++
include/linux/iommu-dma.h | 4 ---
include/trace/events/dma.h | 2 --
kernel/dma/debug.c | 43 -----------------------
kernel/dma/debug.h | 21 ------------
kernel/dma/direct.c | 16 ---------
kernel/dma/mapping.c | 68 ++++++++++++++++++++-----------------
9 files changed, 49 insertions(+), 134 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 0a19ce50938b3..69f85209be7ab 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -1556,20 +1556,6 @@ void iommu_dma_unmap_sg(struct device *dev, struct scatterlist *sg, int nents,
__iommu_dma_unmap(dev, start, end - start);
}

-dma_addr_t iommu_dma_map_resource(struct device *dev, phys_addr_t phys,
- size_t size, enum dma_data_direction dir, unsigned long attrs)
-{
- return __iommu_dma_map(dev, phys, size,
- dma_info_to_prot(dir, false, attrs) | IOMMU_MMIO,
- dma_get_mask(dev));
-}
-
-void iommu_dma_unmap_resource(struct device *dev, dma_addr_t handle,
- size_t size, enum dma_data_direction dir, unsigned long attrs)
-{
- __iommu_dma_unmap(dev, handle, size);
-}
-
static void __iommu_dma_free(struct device *dev, size_t size, void *cpu_addr)
{
size_t alloc_size = PAGE_ALIGN(size);
diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h
index f3bc0bcd70980..c249912456f96 100644
--- a/include/linux/dma-direct.h
+++ b/include/linux/dma-direct.h
@@ -149,7 +149,5 @@ void dma_direct_free_pages(struct device *dev, size_t size,
struct page *page, dma_addr_t dma_addr,
enum dma_data_direction dir);
int dma_direct_supported(struct device *dev, u64 mask);
-dma_addr_t dma_direct_map_resource(struct device *dev, phys_addr_t paddr,
- size_t size, enum dma_data_direction dir, unsigned long attrs);

#endif /* _LINUX_DMA_DIRECT_H */
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index afc89835c7457..2aa43a6bed92b 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -132,6 +132,10 @@ dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
unsigned long attrs);
void dma_unmap_page_attrs(struct device *dev, dma_addr_t addr, size_t size,
enum dma_data_direction dir, unsigned long attrs);
+dma_addr_t dma_map_phys(struct device *dev, phys_addr_t phys, size_t size,
+ enum dma_data_direction dir, unsigned long attrs);
+void dma_unmap_phys(struct device *dev, dma_addr_t addr, size_t size,
+ enum dma_data_direction dir, unsigned long attrs);
unsigned int dma_map_sg_attrs(struct device *dev, struct scatterlist *sg,
int nents, enum dma_data_direction dir, unsigned long attrs);
void dma_unmap_sg_attrs(struct device *dev, struct scatterlist *sg,
@@ -186,6 +190,15 @@ static inline void dma_unmap_page_attrs(struct device *dev, dma_addr_t addr,
size_t size, enum dma_data_direction dir, unsigned long attrs)
{
}
+static inline dma_addr_t dma_map_phys(struct device *dev, phys_addr_t phys,
+ size_t size, enum dma_data_direction dir, unsigned long attrs)
+{
+ return DMA_MAPPING_ERROR;
+}
+static inline void dma_unmap_phys(struct device *dev, dma_addr_t addr,
+ size_t size, enum dma_data_direction dir, unsigned long attrs)
+{
+}
static inline unsigned int dma_map_sg_attrs(struct device *dev,
struct scatterlist *sg, int nents, enum dma_data_direction dir,
unsigned long attrs)
diff --git a/include/linux/iommu-dma.h b/include/linux/iommu-dma.h
index 485bdffed9888..a92b3ff9b9343 100644
--- a/include/linux/iommu-dma.h
+++ b/include/linux/iommu-dma.h
@@ -42,10 +42,6 @@ size_t iommu_dma_opt_mapping_size(void);
size_t iommu_dma_max_mapping_size(struct device *dev);
void iommu_dma_free(struct device *dev, size_t size, void *cpu_addr,
dma_addr_t handle, unsigned long attrs);
-dma_addr_t iommu_dma_map_resource(struct device *dev, phys_addr_t phys,
- size_t size, enum dma_data_direction dir, unsigned long attrs);
-void iommu_dma_unmap_resource(struct device *dev, dma_addr_t handle,
- size_t size, enum dma_data_direction dir, unsigned long attrs);
struct sg_table *iommu_dma_alloc_noncontiguous(struct device *dev, size_t size,
enum dma_data_direction dir, gfp_t gfp, unsigned long attrs);
void iommu_dma_free_noncontiguous(struct device *dev, size_t size,
diff --git a/include/trace/events/dma.h b/include/trace/events/dma.h
index 84416c7d6bfaa..5da59fd8121db 100644
--- a/include/trace/events/dma.h
+++ b/include/trace/events/dma.h
@@ -73,7 +73,6 @@ DEFINE_EVENT(dma_map, name, \
TP_ARGS(dev, phys_addr, dma_addr, size, dir, attrs))

DEFINE_MAP_EVENT(dma_map_phys);
-DEFINE_MAP_EVENT(dma_map_resource);

DECLARE_EVENT_CLASS(dma_unmap,
TP_PROTO(struct device *dev, dma_addr_t addr, size_t size,
@@ -111,7 +110,6 @@ DEFINE_EVENT(dma_unmap, name, \
TP_ARGS(dev, addr, size, dir, attrs))

DEFINE_UNMAP_EVENT(dma_unmap_phys);
-DEFINE_UNMAP_EVENT(dma_unmap_resource);

DECLARE_EVENT_CLASS(dma_alloc_class,
TP_PROTO(struct device *dev, void *virt_addr, dma_addr_t dma_addr,
diff --git a/kernel/dma/debug.c b/kernel/dma/debug.c
index da6734e3a4ce9..06e31fd216e38 100644
--- a/kernel/dma/debug.c
+++ b/kernel/dma/debug.c
@@ -38,7 +38,6 @@ enum {
dma_debug_single,
dma_debug_sg,
dma_debug_coherent,
- dma_debug_resource,
dma_debug_phy,
};

@@ -141,7 +140,6 @@ static const char *type2name[] = {
[dma_debug_single] = "single",
[dma_debug_sg] = "scatter-gather",
[dma_debug_coherent] = "coherent",
- [dma_debug_resource] = "resource",
[dma_debug_phy] = "phy",
};

@@ -1448,47 +1446,6 @@ void debug_dma_free_coherent(struct device *dev, size_t size,
check_unmap(&ref);
}

-void debug_dma_map_resource(struct device *dev, phys_addr_t addr, size_t size,
- int direction, dma_addr_t dma_addr,
- unsigned long attrs)
-{
- struct dma_debug_entry *entry;
-
- if (unlikely(dma_debug_disabled()))
- return;
-
- entry = dma_entry_alloc();
- if (!entry)
- return;
-
- entry->type = dma_debug_resource;
- entry->dev = dev;
- entry->paddr = addr;
- entry->size = size;
- entry->dev_addr = dma_addr;
- entry->direction = direction;
- entry->map_err_type = MAP_ERR_NOT_CHECKED;
-
- add_dma_entry(entry, attrs);
-}
-
-void debug_dma_unmap_resource(struct device *dev, dma_addr_t dma_addr,
- size_t size, int direction)
-{
- struct dma_debug_entry ref = {
- .type = dma_debug_resource,
- .dev = dev,
- .dev_addr = dma_addr,
- .size = size,
- .direction = direction,
- };
-
- if (unlikely(dma_debug_disabled()))
- return;
-
- check_unmap(&ref);
-}
-
void debug_dma_sync_single_for_cpu(struct device *dev, dma_addr_t dma_handle,
size_t size, int direction)
{
diff --git a/kernel/dma/debug.h b/kernel/dma/debug.h
index 76adb42bffd5f..424b8f912aded 100644
--- a/kernel/dma/debug.h
+++ b/kernel/dma/debug.h
@@ -30,14 +30,6 @@ extern void debug_dma_alloc_coherent(struct device *dev, size_t size,
extern void debug_dma_free_coherent(struct device *dev, size_t size,
void *virt, dma_addr_t addr);

-extern void debug_dma_map_resource(struct device *dev, phys_addr_t addr,
- size_t size, int direction,
- dma_addr_t dma_addr,
- unsigned long attrs);
-
-extern void debug_dma_unmap_resource(struct device *dev, dma_addr_t dma_addr,
- size_t size, int direction);
-
extern void debug_dma_sync_single_for_cpu(struct device *dev,
dma_addr_t dma_handle, size_t size,
int direction);
@@ -88,19 +80,6 @@ static inline void debug_dma_free_coherent(struct device *dev, size_t size,
{
}

-static inline void debug_dma_map_resource(struct device *dev, phys_addr_t addr,
- size_t size, int direction,
- dma_addr_t dma_addr,
- unsigned long attrs)
-{
-}
-
-static inline void debug_dma_unmap_resource(struct device *dev,
- dma_addr_t dma_addr, size_t size,
- int direction)
-{
-}
-
static inline void debug_dma_sync_single_for_cpu(struct device *dev,
dma_addr_t dma_handle,
size_t size, int direction)
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index fa75e30700730..1062caac47e7b 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -502,22 +502,6 @@ int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl, int nents,
return ret;
}

-dma_addr_t dma_direct_map_resource(struct device *dev, phys_addr_t paddr,
- size_t size, enum dma_data_direction dir, unsigned long attrs)
-{
- dma_addr_t dma_addr = paddr;
-
- if (unlikely(!dma_capable(dev, dma_addr, size, false))) {
- dev_err_once(dev,
- "DMA addr %pad+%zu overflow (mask %llx, bus limit %llx).\n",
- &dma_addr, size, *dev->dma_mask, dev->bus_dma_limit);
- WARN_ON_ONCE(1);
- return DMA_MAPPING_ERROR;
- }
-
- return dma_addr;
-}
-
int dma_direct_get_sgtable(struct device *dev, struct sg_table *sgt,
void *cpu_addr, dma_addr_t dma_addr, size_t size,
unsigned long attrs)
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index f5f051737e556..b747794448130 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -152,12 +152,10 @@ static inline bool dma_map_direct(struct device *dev,
return dma_go_direct(dev, *dev->dma_mask, ops);
}

-dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
- size_t offset, size_t size, enum dma_data_direction dir,
- unsigned long attrs)
+dma_addr_t dma_map_phys(struct device *dev, phys_addr_t phys, size_t size,
+ enum dma_data_direction dir, unsigned long attrs)
{
const struct dma_map_ops *ops = get_dma_ops(dev);
- phys_addr_t phys = page_to_phys(page) + offset;
bool is_mmio = attrs & DMA_ATTR_MMIO;
dma_addr_t addr;

@@ -177,6 +175,9 @@ dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,

addr = ops->map_resource(dev, phys, size, dir, attrs);
} else {
+ struct page *page = phys_to_page(phys);
+ size_t offset = offset_in_page(phys);
+
/*
* All platforms which implement .map_page() don't support
* non-struct page backed addresses.
@@ -190,9 +191,25 @@ dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,

return addr;
}
+EXPORT_SYMBOL_GPL(dma_map_phys);
+
+dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
+ size_t offset, size_t size, enum dma_data_direction dir,
+ unsigned long attrs)
+{
+ phys_addr_t phys = page_to_phys(page) + offset;
+
+ if (unlikely(attrs & DMA_ATTR_MMIO))
+ return DMA_MAPPING_ERROR;
+
+ if (IS_ENABLED(CONFIG_DMA_API_DEBUG))
+ WARN_ON_ONCE(!pfn_valid(PHYS_PFN(phys)));
+
+ return dma_map_phys(dev, phys, size, dir, attrs);
+}
EXPORT_SYMBOL(dma_map_page_attrs);

-void dma_unmap_page_attrs(struct device *dev, dma_addr_t addr, size_t size,
+void dma_unmap_phys(struct device *dev, dma_addr_t addr, size_t size,
enum dma_data_direction dir, unsigned long attrs)
{
const struct dma_map_ops *ops = get_dma_ops(dev);
@@ -212,6 +229,16 @@ void dma_unmap_page_attrs(struct device *dev, dma_addr_t addr, size_t size,
trace_dma_unmap_phys(dev, addr, size, dir, attrs);
debug_dma_unmap_phys(dev, addr, size, dir);
}
+EXPORT_SYMBOL_GPL(dma_unmap_phys);
+
+void dma_unmap_page_attrs(struct device *dev, dma_addr_t addr, size_t size,
+ enum dma_data_direction dir, unsigned long attrs)
+{
+ if (unlikely(attrs & DMA_ATTR_MMIO))
+ return;
+
+ dma_unmap_phys(dev, addr, size, dir, attrs);
+}
EXPORT_SYMBOL(dma_unmap_page_attrs);

static int __dma_map_sg_attrs(struct device *dev, struct scatterlist *sg,
@@ -337,41 +364,18 @@ EXPORT_SYMBOL(dma_unmap_sg_attrs);
dma_addr_t dma_map_resource(struct device *dev, phys_addr_t phys_addr,
size_t size, enum dma_data_direction dir, unsigned long attrs)
{
- const struct dma_map_ops *ops = get_dma_ops(dev);
- dma_addr_t addr = DMA_MAPPING_ERROR;
-
- BUG_ON(!valid_dma_direction(dir));
-
- if (WARN_ON_ONCE(!dev->dma_mask))
+ if (IS_ENABLED(CONFIG_DMA_API_DEBUG) &&
+ WARN_ON_ONCE(pfn_valid(PHYS_PFN(phys_addr))))
return DMA_MAPPING_ERROR;

- if (dma_map_direct(dev, ops))
- addr = dma_direct_map_resource(dev, phys_addr, size, dir, attrs);
- else if (use_dma_iommu(dev))
- addr = iommu_dma_map_resource(dev, phys_addr, size, dir, attrs);
- else if (ops->map_resource)
- addr = ops->map_resource(dev, phys_addr, size, dir, attrs);
-
- trace_dma_map_resource(dev, phys_addr, addr, size, dir, attrs);
- debug_dma_map_resource(dev, phys_addr, size, dir, addr, attrs);
- return addr;
+ return dma_map_phys(dev, phys_addr, size, dir, attrs | DMA_ATTR_MMIO);
}
EXPORT_SYMBOL(dma_map_resource);

void dma_unmap_resource(struct device *dev, dma_addr_t addr, size_t size,
enum dma_data_direction dir, unsigned long attrs)
{
- const struct dma_map_ops *ops = get_dma_ops(dev);
-
- BUG_ON(!valid_dma_direction(dir));
- if (dma_map_direct(dev, ops))
- ; /* nothing to do: uncached and no swiotlb */
- else if (use_dma_iommu(dev))
- iommu_dma_unmap_resource(dev, addr, size, dir, attrs);
- else if (ops->unmap_resource)
- ops->unmap_resource(dev, addr, size, dir, attrs);
- trace_dma_unmap_resource(dev, addr, size, dir, attrs);
- debug_dma_unmap_resource(dev, addr, size, dir);
+ dma_unmap_phys(dev, addr, size, dir, attrs | DMA_ATTR_MMIO);
}
EXPORT_SYMBOL(dma_unmap_resource);

--
2.50.1

Leon Romanovsky

unread,
Aug 4, 2025, 8:44:12 AM
to Marek Szyprowski, Leon Romanovsky, Jason Gunthorpe, Abdiel Janulgue, Alexander Potapenko, Alex Gaynor, Andrew Morton, Christoph Hellwig, Danilo Krummrich, io...@lists.linux.dev, Jason Wang, Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross, kasa...@googlegroups.com, Keith Busch, linux...@vger.kernel.org, linu...@vger.kernel.org, linux-...@vger.kernel.org, linu...@kvack.org, linux...@lists.infradead.org, linuxp...@lists.ozlabs.org, linux-tra...@vger.kernel.org, Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman, Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-fo...@vger.kernel.org, Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtual...@lists.linux.dev, Will Deacon, xen-...@lists.xenproject.org
From: Leon Romanovsky <leo...@nvidia.com>

Convert HMM DMA operations from the legacy page-based API to the new
physical address-based dma_map_phys() and dma_unmap_phys() functions.
This demonstrates the preferred approach for new code that should use
physical addresses directly rather than page+offset parameters.

The change replaces dma_map_page() and dma_unmap_page() calls with
dma_map_phys() and dma_unmap_phys() respectively, using the physical
address that was already available in the code. This eliminates the
redundant page-to-physical address conversion and aligns with the
DMA subsystem's move toward physical address-centric interfaces.

This serves as an example of how new code should be written to leverage
the more efficient physical address API, which provides cleaner interfaces
for drivers that already have access to physical addresses.

Signed-off-by: Leon Romanovsky <leo...@nvidia.com>
---
mm/hmm.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/mm/hmm.c b/mm/hmm.c
index d545e24949949..015ab243f0813 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -775,8 +775,8 @@ dma_addr_t hmm_dma_map_pfn(struct device *dev, struct hmm_dma_map *map,
if (WARN_ON_ONCE(dma_need_unmap(dev) && !dma_addrs))
goto error;

- dma_addr = dma_map_page(dev, page, 0, map->dma_entry_size,
- DMA_BIDIRECTIONAL);
+ dma_addr = dma_map_phys(dev, paddr, map->dma_entry_size,
+ DMA_BIDIRECTIONAL, 0);
if (dma_mapping_error(dev, dma_addr))
goto error;

@@ -819,8 +819,8 @@ bool hmm_dma_unmap_pfn(struct device *dev, struct hmm_dma_map *map, size_t idx)
dma_iova_unlink(dev, state, idx * map->dma_entry_size,
map->dma_entry_size, DMA_BIDIRECTIONAL, attrs);
} else if (dma_need_unmap(dev))
- dma_unmap_page(dev, dma_addrs[idx], map->dma_entry_size,
- DMA_BIDIRECTIONAL);
+ dma_unmap_phys(dev, dma_addrs[idx], map->dma_entry_size,
+ DMA_BIDIRECTIONAL, 0);

pfns[idx] &=
~(HMM_PFN_DMA_MAPPED | HMM_PFN_P2PDMA | HMM_PFN_P2PDMA_BUS);
--
2.50.1

Leon Romanovsky

unread,
Aug 4, 2025, 8:44:16 AM
to Marek Szyprowski, Leon Romanovsky, Jason Gunthorpe, Abdiel Janulgue, Alexander Potapenko, Alex Gaynor, Andrew Morton, Christoph Hellwig, Danilo Krummrich, io...@lists.linux.dev, Jason Wang, Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross, kasa...@googlegroups.com, Keith Busch, linux...@vger.kernel.org, linu...@vger.kernel.org, linux-...@vger.kernel.org, linu...@kvack.org, linux...@lists.infradead.org, linuxp...@lists.ozlabs.org, linux-tra...@vger.kernel.org, Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman, Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-fo...@vger.kernel.org, Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtual...@lists.linux.dev, Will Deacon, xen-...@lists.xenproject.org
From: Leon Romanovsky <leo...@nvidia.com>

The generic dma_direct_map_resource() is going to be removed
in the next patch, so simply open-code it in the Xen driver.

Signed-off-by: Leon Romanovsky <leo...@nvidia.com>
---
drivers/xen/swiotlb-xen.c | 21 ++++++++++++++++++++-
1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index da1a7d3d377cf..dd7747a2de879 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -392,6 +392,25 @@ xen_swiotlb_sync_sg_for_device(struct device *dev, struct scatterlist *sgl,
}
}

+static dma_addr_t xen_swiotlb_direct_map_resource(struct device *dev,
+ phys_addr_t paddr,
+ size_t size,
+ enum dma_data_direction dir,
+ unsigned long attrs)
+{
+ dma_addr_t dma_addr = paddr;
+
+ if (unlikely(!dma_capable(dev, dma_addr, size, false))) {
+ dev_err_once(dev,
+ "DMA addr %pad+%zu overflow (mask %llx, bus limit %llx).\n",
+ &dma_addr, size, *dev->dma_mask, dev->bus_dma_limit);
+ WARN_ON_ONCE(1);
+ return DMA_MAPPING_ERROR;
+ }
+
+ return dma_addr;
+}
+
/*
* Return whether the given device DMA address mask can be supported
* properly. For example, if your device can only drive the low 24-bits
@@ -426,5 +445,5 @@ const struct dma_map_ops xen_swiotlb_dma_ops = {
.alloc_pages_op = dma_common_alloc_pages,
.free_pages = dma_common_free_pages,
.max_mapping_size = swiotlb_max_mapping_size,
- .map_resource = dma_direct_map_resource,
+ .map_resource = xen_swiotlb_direct_map_resource,
};
--
2.50.1

Leon Romanovsky

unread,
Aug 4, 2025, 8:44:20 AMAug 4
to Marek Szyprowski, Leon Romanovsky, Jason Gunthorpe, Abdiel Janulgue, Alexander Potapenko, Alex Gaynor, Andrew Morton, Christoph Hellwig, Danilo Krummrich, io...@lists.linux.dev, Jason Wang, Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross, kasa...@googlegroups.com, Keith Busch, linux...@vger.kernel.org, linu...@vger.kernel.org, linux-...@vger.kernel.org, linu...@kvack.org, linux...@lists.infradead.org, linuxp...@lists.ozlabs.org, linux-tra...@vger.kernel.org, Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman, Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-fo...@vger.kernel.org, Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtual...@lists.linux.dev, Will Deacon, xen-...@lists.xenproject.org
From: Leon Romanovsky <leo...@nvidia.com>

In case a peer-to-peer transaction traverses the host bridge, the
IOMMU mapping needs the IOMMU_MMIO flag, together with skipping the
CPU sync.

The latter was handled by the provided DMA_ATTR_SKIP_CPU_SYNC flag,
but the IOMMU flag was missed, due to the assumption that such memory
can be treated as regular RAM.

Reuse the newly introduced DMA attribute to properly take the MMIO path.

Signed-off-by: Leon Romanovsky <leo...@nvidia.com>
---
mm/hmm.c | 15 ++++++++-------
1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/mm/hmm.c b/mm/hmm.c
index 015ab243f0813..6556c0e074ba8 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -746,7 +746,7 @@ dma_addr_t hmm_dma_map_pfn(struct device *dev, struct hmm_dma_map *map,
case PCI_P2PDMA_MAP_NONE:
break;
case PCI_P2PDMA_MAP_THRU_HOST_BRIDGE:
- attrs |= DMA_ATTR_SKIP_CPU_SYNC;
+ attrs |= DMA_ATTR_MMIO;
pfns[idx] |= HMM_PFN_P2PDMA;
break;
case PCI_P2PDMA_MAP_BUS_ADDR:
@@ -776,7 +776,7 @@ dma_addr_t hmm_dma_map_pfn(struct device *dev, struct hmm_dma_map *map,
goto error;

dma_addr = dma_map_phys(dev, paddr, map->dma_entry_size,
- DMA_BIDIRECTIONAL, 0);
+ DMA_BIDIRECTIONAL, attrs);
if (dma_mapping_error(dev, dma_addr))
goto error;

@@ -811,16 +811,17 @@ bool hmm_dma_unmap_pfn(struct device *dev, struct hmm_dma_map *map, size_t idx)
if ((pfns[idx] & valid_dma) != valid_dma)
return false;

+ if (pfns[idx] & HMM_PFN_P2PDMA)
+ attrs |= DMA_ATTR_MMIO;
+
if (pfns[idx] & HMM_PFN_P2PDMA_BUS)
; /* no need to unmap bus address P2P mappings */
- else if (dma_use_iova(state)) {
- if (pfns[idx] & HMM_PFN_P2PDMA)
- attrs |= DMA_ATTR_SKIP_CPU_SYNC;
+ else if (dma_use_iova(state))
dma_iova_unlink(dev, state, idx * map->dma_entry_size,
map->dma_entry_size, DMA_BIDIRECTIONAL, attrs);
- } else if (dma_need_unmap(dev))
+ else if (dma_need_unmap(dev))
dma_unmap_phys(dev, dma_addrs[idx], map->dma_entry_size,
- DMA_BIDIRECTIONAL, 0);
+ DMA_BIDIRECTIONAL, attrs);

Leon Romanovsky

unread,
Aug 4, 2025, 8:44:25 AMAug 4
to Marek Szyprowski, Leon Romanovsky, Jason Gunthorpe, Abdiel Janulgue, Alexander Potapenko, Alex Gaynor, Andrew Morton, Christoph Hellwig, Danilo Krummrich, io...@lists.linux.dev, Jason Wang, Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross, kasa...@googlegroups.com, Keith Busch, linux...@vger.kernel.org, linu...@vger.kernel.org, linux-...@vger.kernel.org, linu...@kvack.org, linux...@lists.infradead.org, linuxp...@lists.ozlabs.org, linux-tra...@vger.kernel.org, Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman, Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-fo...@vger.kernel.org, Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtual...@lists.linux.dev, Will Deacon, xen-...@lists.xenproject.org
From: Leon Romanovsky <leo...@nvidia.com>

After the introduction of dma_map_phys(), there is no need to convert
a physical address to a struct page in order to map it. So let's
use the physical address directly.

Signed-off-by: Leon Romanovsky <leo...@nvidia.com>
---
block/blk-mq-dma.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/block/blk-mq-dma.c b/block/blk-mq-dma.c
index ad283017caef2..37e2142be4f7d 100644
--- a/block/blk-mq-dma.c
+++ b/block/blk-mq-dma.c
@@ -87,8 +87,8 @@ static bool blk_dma_map_bus(struct blk_dma_iter *iter, struct phys_vec *vec)
static bool blk_dma_map_direct(struct request *req, struct device *dma_dev,
struct blk_dma_iter *iter, struct phys_vec *vec)
{
- iter->addr = dma_map_page(dma_dev, phys_to_page(vec->paddr),
- offset_in_page(vec->paddr), vec->len, rq_dma_dir(req));
+ iter->addr = dma_map_phys(dma_dev, vec->paddr, vec->len,
+ rq_dma_dir(req), 0);
if (dma_mapping_error(dma_dev, iter->addr)) {
iter->status = BLK_STS_RESOURCE;
return false;
--
2.50.1

Leon Romanovsky

unread,
Aug 4, 2025, 8:44:30 AMAug 4
to Marek Szyprowski, Leon Romanovsky, Jason Gunthorpe, Abdiel Janulgue, Alexander Potapenko, Alex Gaynor, Andrew Morton, Christoph Hellwig, Danilo Krummrich, io...@lists.linux.dev, Jason Wang, Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross, kasa...@googlegroups.com, Keith Busch, linux...@vger.kernel.org, linu...@vger.kernel.org, linux-...@vger.kernel.org, linu...@kvack.org, linux...@lists.infradead.org, linuxp...@lists.ozlabs.org, linux-tra...@vger.kernel.org, Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman, Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-fo...@vger.kernel.org, Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtual...@lists.linux.dev, Will Deacon, xen-...@lists.xenproject.org
From: Leon Romanovsky <leo...@nvidia.com>

The block layer maps MMIO memory through the dma_map_phys() interface
with the help of the DMA_ATTR_MMIO attribute. That memory needs to be
unmapped with the matching unmap function and the same attribute.

Signed-off-by: Leon Romanovsky <leo...@nvidia.com>
---
drivers/nvme/host/pci.c | 18 +++++++++++++-----
1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 071efec25346f..0b624247948c5 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -682,11 +682,15 @@ static void nvme_free_prps(struct request *req)
{
struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
struct nvme_queue *nvmeq = req->mq_hctx->driver_data;
+ unsigned int attrs = 0;
unsigned int i;

+ if (req->cmd_flags & REQ_MMIO)
+ attrs = DMA_ATTR_MMIO;
+
for (i = 0; i < iod->nr_dma_vecs; i++)
- dma_unmap_page(nvmeq->dev->dev, iod->dma_vecs[i].addr,
- iod->dma_vecs[i].len, rq_dma_dir(req));
+ dma_unmap_phys(nvmeq->dev->dev, iod->dma_vecs[i].addr,
+ iod->dma_vecs[i].len, rq_dma_dir(req), attrs);
mempool_free(iod->dma_vecs, nvmeq->dev->dmavec_mempool);
}

@@ -699,15 +703,19 @@ static void nvme_free_sgls(struct request *req)
unsigned int sqe_dma_len = le32_to_cpu(iod->cmd.common.dptr.sgl.length);
struct nvme_sgl_desc *sg_list = iod->descriptors[0];
enum dma_data_direction dir = rq_dma_dir(req);
+ unsigned int attrs = 0;
+
+ if (req->cmd_flags & REQ_MMIO)
+ attrs = DMA_ATTR_MMIO;

if (iod->nr_descriptors) {
unsigned int nr_entries = sqe_dma_len / sizeof(*sg_list), i;

for (i = 0; i < nr_entries; i++)
- dma_unmap_page(dma_dev, le64_to_cpu(sg_list[i].addr),
- le32_to_cpu(sg_list[i].length), dir);
+ dma_unmap_phys(dma_dev, le64_to_cpu(sg_list[i].addr),
+ le32_to_cpu(sg_list[i].length), dir, attrs);
} else {
- dma_unmap_page(dma_dev, sqe_dma_addr, sqe_dma_len, dir);
+ dma_unmap_phys(dma_dev, sqe_dma_addr, sqe_dma_len, dir, attrs);
}
}

--
2.50.1

Leon Romanovsky

unread,
Aug 4, 2025, 8:44:34 AMAug 4
to Marek Szyprowski, Leon Romanovsky, Jason Gunthorpe, Abdiel Janulgue, Alexander Potapenko, Alex Gaynor, Andrew Morton, Christoph Hellwig, Danilo Krummrich, io...@lists.linux.dev, Jason Wang, Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross, kasa...@googlegroups.com, Keith Busch, linux...@vger.kernel.org, linu...@vger.kernel.org, linux-...@vger.kernel.org, linu...@kvack.org, linux...@lists.infradead.org, linuxp...@lists.ozlabs.org, linux-tra...@vger.kernel.org, Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman, Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-fo...@vger.kernel.org, Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtual...@lists.linux.dev, Will Deacon, xen-...@lists.xenproject.org
From: Leon Romanovsky <leo...@nvidia.com>

Make sure that the CPU is not synced and the IOMMU is configured to take
the MMIO path by providing the newly introduced DMA_ATTR_MMIO attribute.

Signed-off-by: Leon Romanovsky <leo...@nvidia.com>
---
block/blk-mq-dma.c | 13 +++++++++++--
include/linux/blk-mq-dma.h | 6 +++++-
include/linux/blk_types.h | 2 ++
3 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/block/blk-mq-dma.c b/block/blk-mq-dma.c
index 37e2142be4f7d..d415088ed9fd2 100644
--- a/block/blk-mq-dma.c
+++ b/block/blk-mq-dma.c
@@ -87,8 +87,13 @@ static bool blk_dma_map_bus(struct blk_dma_iter *iter, struct phys_vec *vec)
static bool blk_dma_map_direct(struct request *req, struct device *dma_dev,
struct blk_dma_iter *iter, struct phys_vec *vec)
{
+ unsigned int attrs = 0;
+
+ if (req->cmd_flags & REQ_MMIO)
+ attrs = DMA_ATTR_MMIO;
+
iter->addr = dma_map_phys(dma_dev, vec->paddr, vec->len,
- rq_dma_dir(req), 0);
+ rq_dma_dir(req), attrs);
if (dma_mapping_error(dma_dev, iter->addr)) {
iter->status = BLK_STS_RESOURCE;
return false;
@@ -103,14 +108,17 @@ static bool blk_rq_dma_map_iova(struct request *req, struct device *dma_dev,
{
enum dma_data_direction dir = rq_dma_dir(req);
unsigned int mapped = 0;
+ unsigned int attrs = 0;
int error;

iter->addr = state->addr;
iter->len = dma_iova_size(state);
+ if (req->cmd_flags & REQ_MMIO)
+ attrs = DMA_ATTR_MMIO;

do {
error = dma_iova_link(dma_dev, state, vec->paddr, mapped,
- vec->len, dir, 0);
+ vec->len, dir, attrs);
if (error)
break;
mapped += vec->len;
@@ -176,6 +184,7 @@ bool blk_rq_dma_map_iter_start(struct request *req, struct device *dma_dev,
* same as non-P2P transfers below and during unmap.
*/
req->cmd_flags &= ~REQ_P2PDMA;
+ req->cmd_flags |= REQ_MMIO;
break;
default:
iter->status = BLK_STS_INVAL;
diff --git a/include/linux/blk-mq-dma.h b/include/linux/blk-mq-dma.h
index c26a01aeae006..6c55f5e585116 100644
--- a/include/linux/blk-mq-dma.h
+++ b/include/linux/blk-mq-dma.h
@@ -48,12 +48,16 @@ static inline bool blk_rq_dma_map_coalesce(struct dma_iova_state *state)
static inline bool blk_rq_dma_unmap(struct request *req, struct device *dma_dev,
struct dma_iova_state *state, size_t mapped_len)
{
+ unsigned int attrs = 0;
+
if (req->cmd_flags & REQ_P2PDMA)
return true;

if (dma_use_iova(state)) {
+ if (req->cmd_flags & REQ_MMIO)
+ attrs = DMA_ATTR_MMIO;
dma_iova_destroy(dma_dev, state, mapped_len, rq_dma_dir(req),
- 0);
+ attrs);
return true;
}

diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index 09b99d52fd365..283058bcb5b14 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -387,6 +387,7 @@ enum req_flag_bits {
__REQ_FS_PRIVATE, /* for file system (submitter) use */
__REQ_ATOMIC, /* for atomic write operations */
__REQ_P2PDMA, /* contains P2P DMA pages */
+ __REQ_MMIO, /* contains MMIO memory */
/*
* Command specific flags, keep last:
*/
@@ -420,6 +421,7 @@ enum req_flag_bits {
#define REQ_FS_PRIVATE (__force blk_opf_t)(1ULL << __REQ_FS_PRIVATE)
#define REQ_ATOMIC (__force blk_opf_t)(1ULL << __REQ_ATOMIC)
#define REQ_P2PDMA (__force blk_opf_t)(1ULL << __REQ_P2PDMA)
+#define REQ_MMIO (__force blk_opf_t)(1ULL << __REQ_MMIO)

#define REQ_NOUNMAP (__force blk_opf_t)(1ULL << __REQ_NOUNMAP)

--
2.50.1

Jason Gunthorpe

unread,
Aug 5, 2025, 11:36:10 AMAug 5
to Matthew Wilcox, Robin Murphy, Marek Szyprowski, Christoph Hellwig, Leon Romanovsky, Jonathan Corbet, Madhavan Srinivasan, Michael Ellerman, Nicholas Piggin, Christophe Leroy, Joerg Roedel, Will Deacon, Michael S. Tsirkin, Jason Wang, Xuan Zhuo, Eugenio Pérez, Alexander Potapenko, Marco Elver, Dmitry Vyukov, Masami Hiramatsu, Mathieu Desnoyers, Jérôme Glisse, Andrew Morton, linu...@vger.kernel.org, linux-...@vger.kernel.org, linuxp...@lists.ozlabs.org, io...@lists.linux.dev, virtual...@lists.linux.dev, kasa...@googlegroups.com, linux-tra...@vger.kernel.org, linu...@kvack.org
On Mon, Aug 04, 2025 at 04:37:56AM +0100, Matthew Wilcox wrote:
> On Sun, Aug 03, 2025 at 12:59:06PM -0300, Jason Gunthorpe wrote:
> > Matthew, do you think it makes sense to introduce types to make this
> > clearer? We have two kinds of values that a phys_addr_t can store -
> > something compatible with kmap_XX_phys(), and something that isn't.
>
> I was with you up until this point. And then you said "What if we have
> a raccoon that isn't a raccoon" and my brain derailed.

I thought it was clear..

kmap_local_pfn(phys >> PAGE_SHIFT)
phys_to_virt(phys)

Does not work for all values of phys. It is definitely illegal for
non-cachable MMIO. Agree?

There is a subset of phys values that is cachable and has a struct page
that is usable with kmap_local_pfn()/etc

phys is always this:

> - CPU untranslated. This is the "physical" address. Physical address
> 0 is what the CPU sees when it drives zeroes on the memory bus.

But that is a pure HW perspective. It doesn't say which of our SW APIs
are allowed to use this address.

We have callchains in DMA API land that want to do a kmap at the
bottom. It would be nice to mark the whole call chain that the
phys_addr being passed around is actually required to be kmappable.

Because if you pass a non-kmappable MMIO backed phys it will explode
in some way on some platforms.
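
Purely as an illustration of what I mean by "types" (hypothetical names,
nothing like this exists in the tree today):

struct kmap_phys {
	/* cachable, struct page backed, kmap-compatible physical address */
	phys_addr_t val;
};

static inline struct kmap_phys page_to_kmap_phys(struct page *page)
{
	return (struct kmap_phys){ .val = page_to_phys(page) };
}

static inline void *kmap_local_kphys(struct kmap_phys kphys)
{
	return kmap_local_pfn(PHYS_PFN(kphys.val));
}

Call chains that require a kmappable address would take struct
kmap_phys, while the MMIO-capable paths keep a plain phys_addr_t.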

> > We clearly have these two different ideas floating around in code,
> > page tables, etc.

> No. No, we don't. I've never heard of this asininity before.

Welcome to the fun world of cachable and non-cachable memory.

Consider, today we can create struct pages of type
MEMORY_DEVICE_PCI_P2PDMA for non-cachable MMIO. I think today you
"can" use kmap to establish a cachable mapping in the vmap.

But it is *illegal* to establish a cachable CPU mapping of MMIO. Archs
are free to MCE if you do this - speculative cache line load of MMIO
can just error in HW inside the interconnect.

So, the phys_addr is always a "CPU untranslated physical address" but
the cachable/non-cachable cases, or DRAM vs MMIO, are sometimes
semantically very different things for the SW!

Jason

Jason Gunthorpe

unread,
Aug 6, 2025, 1:31:37 PMAug 6
to Leon Romanovsky, Marek Szyprowski, Leon Romanovsky, Abdiel Janulgue, Alexander Potapenko, Alex Gaynor, Andrew Morton, Christoph Hellwig, Danilo Krummrich, io...@lists.linux.dev, Jason Wang, Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross, kasa...@googlegroups.com, Keith Busch, linux...@vger.kernel.org, linu...@vger.kernel.org, linux-...@vger.kernel.org, linu...@kvack.org, linux...@lists.infradead.org, linuxp...@lists.ozlabs.org, linux-tra...@vger.kernel.org, Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman, Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-fo...@vger.kernel.org, Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtual...@lists.linux.dev, Will Deacon, xen-...@lists.xenproject.org
On Mon, Aug 04, 2025 at 03:42:35PM +0300, Leon Romanovsky wrote:
> From: Leon Romanovsky <leo...@nvidia.com>
>
> This patch introduces the DMA_ATTR_MMIO attribute to mark DMA buffers
> that reside in memory-mapped I/O (MMIO) regions, such as device BARs
> exposed through the host bridge, which are accessible for peer-to-peer
> (P2P) DMA.
>
> This attribute is especially useful for exporting device memory to other
> devices for DMA without CPU involvement, and avoids unnecessary or
> potentially detrimental CPU cache maintenance calls.

It is worth mentioning here that dma_map_resource() and DMA_ATTR_MMIO
are intended to be the same thing.

> --- a/Documentation/core-api/dma-attributes.rst
> +++ b/Documentation/core-api/dma-attributes.rst
> @@ -130,3 +130,10 @@ accesses to DMA buffers in both privileged "supervisor" and unprivileged
> subsystem that the buffer is fully accessible at the elevated privilege
> level (and ideally inaccessible or at least read-only at the
> lesser-privileged levels).
> +
> +DMA_ATTR_MMIO
> +-------------
> +
> +This attribute is especially useful for exporting device memory to other
> +devices for DMA without CPU involvement, and avoids unnecessary or
> +potentially detrimental CPU cache maintenance calls.

How about

This attribute indicates the physical address is not normal system
memory. It may not be used with kmap*()/phys_to_virt()/phys_to_page()
functions, it may not be cachable, and access using CPU load/store
instructions may not be allowed.

Usually this will be used to describe MMIO addresses, or other non
cachable register addresses. When DMA mapping this sort of address we
call the operation Peer to Peer, as one device is DMA'ing to another
device. For PCI devices the p2pdma APIs must be used to determine if
DMA_ATTR_MMIO is appropriate.

For architectures that require cache flushing for DMA coherence
DMA_ATTR_MMIO will not perform any cache flushing. The address
provided must never be mapped cachable into the CPU.
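
Maybe also a short usage sketch under that text, something like this
(bar_phys/len are hypothetical, error handling trimmed):

	dma_addr_t dma;

	dma = dma_map_phys(dev, bar_phys, len, DMA_BIDIRECTIONAL,
			   DMA_ATTR_MMIO);
	if (dma_mapping_error(dev, dma))
		return -EIO;

	/* ... hand 'dma' to the peer device ... */

	dma_unmap_phys(dev, dma, len, DMA_BIDIRECTIONAL, DMA_ATTR_MMIO);

With the note that the unmap side must pass DMA_ATTR_MMIO as well so the
resource-style unmap path is taken.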

> +/*
> + * DMA_ATTR_MMIO - Indicates memory-mapped I/O (MMIO) region for DMA mapping
> + *
> + * This attribute is used for MMIO memory regions that are exposed through
> + * the host bridge and are accessible for peer-to-peer (P2P) DMA. Memory
> + * marked with this attribute is not system RAM and may represent device
> + * BAR windows or peer-exposed memory.
> + *
> + * Typical usage is for mapping hardware memory BARs or exporting device
> + * memory to other devices for DMA without involving main system RAM.
> + * The attribute guarantees no CPU cache maintenance calls will be made.
> + */

I'd copy the Documentation/ text

Jason

Jason Gunthorpe

unread,
Aug 6, 2025, 2:10:48 PMAug 6
to Leon Romanovsky, Marek Szyprowski, Leon Romanovsky, Abdiel Janulgue, Alexander Potapenko, Alex Gaynor, Andrew Morton, Christoph Hellwig, Danilo Krummrich, io...@lists.linux.dev, Jason Wang, Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross, kasa...@googlegroups.com, Keith Busch, linux...@vger.kernel.org, linu...@vger.kernel.org, linux-...@vger.kernel.org, linu...@kvack.org, linux...@lists.infradead.org, linuxp...@lists.ozlabs.org, linux-tra...@vger.kernel.org, Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman, Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-fo...@vger.kernel.org, Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtual...@lists.linux.dev, Will Deacon, xen-...@lists.xenproject.org
On Mon, Aug 04, 2025 at 03:42:36PM +0300, Leon Romanovsky wrote:
> From: Leon Romanovsky <leo...@nvidia.com>
>
> Make sure that CPU is not synced if MMIO path is taken.

Let's elaborate..

Implement DMA_ATTR_MMIO for dma_iova_link().

This will replace the hacky use of DMA_ATTR_SKIP_CPU_SYNC to avoid
touching the possibly non-KVA MMIO memory.

Also correct the incorrect caching attribute for the IOMMU: MMIO
memory should not be cachable inside the IOMMU mapping, or it can
possibly create system problems. Set IOMMU_MMIO for DMA_ATTR_MMIO.

> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index ea2ef53bd4fef..399838c17b705 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -1837,13 +1837,20 @@ static int __dma_iova_link(struct device *dev, dma_addr_t addr,
> phys_addr_t phys, size_t size, enum dma_data_direction dir,
> unsigned long attrs)
> {
> - bool coherent = dev_is_dma_coherent(dev);
> + int prot;
>
> - if (!coherent && !(attrs & DMA_ATTR_SKIP_CPU_SYNC))
> - arch_sync_dma_for_device(phys, size, dir);
> + if (attrs & DMA_ATTR_MMIO)
> + prot = dma_info_to_prot(dir, false, attrs) | IOMMU_MMIO;

Yeah, exactly, we need the IOPTE on ARM to have the right cachability
or some systems might go wrong.


> + else {
> + bool coherent = dev_is_dma_coherent(dev);
> +
> + if (!coherent && !(attrs & DMA_ATTR_SKIP_CPU_SYNC))
> + arch_sync_dma_for_device(phys, size, dir);
> + prot = dma_info_to_prot(dir, coherent, attrs);
> + }
>
> return iommu_map_nosync(iommu_get_dma_domain(dev), addr, phys, size,
> - dma_info_to_prot(dir, coherent, attrs), GFP_ATOMIC);
> + prot, GFP_ATOMIC);
> }

Hmm, I missed this in prior series, ideally the GFP_ATOMIC should be
passed in as a gfp_t here so we can use GFP_KERNEL in callers that are
able.
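
Roughly like this (sketch only, the callers would need to grow the extra
argument):

static int __dma_iova_link(struct device *dev, dma_addr_t addr,
		phys_addr_t phys, size_t size, enum dma_data_direction dir,
		unsigned long attrs, gfp_t gfp)
{
	int prot;

	if (attrs & DMA_ATTR_MMIO) {
		prot = dma_info_to_prot(dir, false, attrs) | IOMMU_MMIO;
	} else {
		bool coherent = dev_is_dma_coherent(dev);

		if (!coherent && !(attrs & DMA_ATTR_SKIP_CPU_SYNC))
			arch_sync_dma_for_device(phys, size, dir);
		prot = dma_info_to_prot(dir, coherent, attrs);
	}

	/* callers that can sleep would pass GFP_KERNEL here */
	return iommu_map_nosync(iommu_get_dma_domain(dev), addr, phys, size,
			prot, gfp);
}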

Reviewed-by: Jason Gunthorpe <j...@nvidia.com>

Jason

Jason Gunthorpe

unread,
Aug 6, 2025, 2:26:38 PMAug 6
to Leon Romanovsky, Marek Szyprowski, Leon Romanovsky, Abdiel Janulgue, Alexander Potapenko, Alex Gaynor, Andrew Morton, Christoph Hellwig, Danilo Krummrich, io...@lists.linux.dev, Jason Wang, Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross, kasa...@googlegroups.com, Keith Busch, linux...@vger.kernel.org, linu...@vger.kernel.org, linux-...@vger.kernel.org, linu...@kvack.org, linux...@lists.infradead.org, linuxp...@lists.ozlabs.org, linux-tra...@vger.kernel.org, Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman, Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-fo...@vger.kernel.org, Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtual...@lists.linux.dev, Will Deacon, xen-...@lists.xenproject.org
On Mon, Aug 04, 2025 at 03:42:37PM +0300, Leon Romanovsky wrote:
> +void debug_dma_map_phys(struct device *dev, phys_addr_t phys, size_t size,
> + int direction, dma_addr_t dma_addr, unsigned long attrs)
> {
> struct dma_debug_entry *entry;

Should this patch also absorb debug_dma_map_resource() into
here as well, so we can have the caller of dma_map_resource() call
debug_dma_map_page with ATTR_MMIO?

If not, this looks OK

Leon Romanovsky

unread,
Aug 6, 2025, 2:38:38 PMAug 6
to Jason Gunthorpe, Marek Szyprowski, Abdiel Janulgue, Alexander Potapenko, Alex Gaynor, Andrew Morton, Christoph Hellwig, Danilo Krummrich, io...@lists.linux.dev, Jason Wang, Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross, kasa...@googlegroups.com, Keith Busch, linux...@vger.kernel.org, linu...@vger.kernel.org, linux-...@vger.kernel.org, linu...@kvack.org, linux...@lists.infradead.org, linuxp...@lists.ozlabs.org, linux-tra...@vger.kernel.org, Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman, Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-fo...@vger.kernel.org, Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtual...@lists.linux.dev, Will Deacon, xen-...@lists.xenproject.org
On Wed, Aug 06, 2025 at 03:26:30PM -0300, Jason Gunthorpe wrote:
> On Mon, Aug 04, 2025 at 03:42:37PM +0300, Leon Romanovsky wrote:
> > +void debug_dma_map_phys(struct device *dev, phys_addr_t phys, size_t size,
> > + int direction, dma_addr_t dma_addr, unsigned long attrs)
> > {
> > struct dma_debug_entry *entry;
>
> Should this patch should also absorb debug_dma_map_resource() into
> here as well and we can have the caller of dma_dma_map_resource() call
> debug_dma_map_page with ATTR_MMIO?

It is done in "[PATCH v1 11/16] dma-mapping: export new dma_*map_phys() interface".

Thanks

Jason Gunthorpe

unread,
Aug 6, 2025, 2:44:50 PMAug 6
to Leon Romanovsky, Marek Szyprowski, Leon Romanovsky, Abdiel Janulgue, Alexander Potapenko, Alex Gaynor, Andrew Morton, Christoph Hellwig, Danilo Krummrich, io...@lists.linux.dev, Jason Wang, Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross, kasa...@googlegroups.com, Keith Busch, linux...@vger.kernel.org, linu...@vger.kernel.org, linux-...@vger.kernel.org, linu...@kvack.org, linux...@lists.infradead.org, linuxp...@lists.ozlabs.org, linux-tra...@vger.kernel.org, Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman, Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-fo...@vger.kernel.org, Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtual...@lists.linux.dev, Will Deacon, xen-...@lists.xenproject.org
On Mon, Aug 04, 2025 at 03:42:39PM +0300, Leon Romanovsky wrote:
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index 399838c17b705..11c5d5f8c0981 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -1190,11 +1190,9 @@ static inline size_t iova_unaligned(struct iova_domain *iovad, phys_addr_t phys,
> return iova_offset(iovad, phys | size);
> }
>
> -dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page,
> - unsigned long offset, size_t size, enum dma_data_direction dir,
> - unsigned long attrs)
> +dma_addr_t iommu_dma_map_phys(struct device *dev, phys_addr_t phys, size_t size,
> + enum dma_data_direction dir, unsigned long attrs)
> {
> - phys_addr_t phys = page_to_phys(page) + offset;
> bool coherent = dev_is_dma_coherent(dev);
> int prot = dma_info_to_prot(dir, coherent, attrs);
> struct iommu_domain *domain = iommu_get_dma_domain(dev);

No issue with pushing the page_to_phys out to what looks like two callers..

It is worth pointing out though that today, if the page * was a
MEMORY_DEVICE_PCI_P2PDMA page, then it is illegal to call the swiotlb
functions a few lines below this:

phys = iommu_dma_map_swiotlb(dev, phys, size, dir, attrs);

ie struct page alone as a type has not been sufficient to make this
function safe for a long time now.

So I would add some explanation in the commit message how this will be
situated in the final call chains, and maybe leave behind a comment
that attrs may not have ATTR_MMIO in this function.

I think the answer is iommu_dma_map_phys() is only called for
!ATTR_MMIO addresses, and that iommu_dma_map_resource() will be called
for ATTR_MMIO?

Jason

Jason Gunthorpe

unread,
Aug 7, 2025, 8:07:21 AMAug 7
to Leon Romanovsky, Marek Szyprowski, Leon Romanovsky, Abdiel Janulgue, Alexander Potapenko, Alex Gaynor, Andrew Morton, Christoph Hellwig, Danilo Krummrich, io...@lists.linux.dev, Jason Wang, Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross, kasa...@googlegroups.com, Keith Busch, linux...@vger.kernel.org, linu...@vger.kernel.org, linux-...@vger.kernel.org, linu...@kvack.org, linux...@lists.infradead.org, linuxp...@lists.ozlabs.org, linux-tra...@vger.kernel.org, Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman, Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-fo...@vger.kernel.org, Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtual...@lists.linux.dev, Will Deacon, xen-...@lists.xenproject.org
On Mon, Aug 04, 2025 at 03:42:40PM +0300, Leon Romanovsky wrote:
> From: Leon Romanovsky <leo...@nvidia.com>
>
> Combine iommu_dma_*map_phys with iommu_dma_*map_resource interfaces in
> order to allow single phys_addr_t flow.

Some later patch deletes iommu_dma_map_resource()? Mention that plan here?

> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -1193,12 +1193,17 @@ static inline size_t iova_unaligned(struct iova_domain *iovad, phys_addr_t phys,
> dma_addr_t iommu_dma_map_phys(struct device *dev, phys_addr_t phys, size_t size,
> enum dma_data_direction dir, unsigned long attrs)
> {
> - bool coherent = dev_is_dma_coherent(dev);
> - int prot = dma_info_to_prot(dir, coherent, attrs);
> struct iommu_domain *domain = iommu_get_dma_domain(dev);
> struct iommu_dma_cookie *cookie = domain->iova_cookie;
> struct iova_domain *iovad = &cookie->iovad;
> dma_addr_t iova, dma_mask = dma_get_mask(dev);
> + bool coherent;
> + int prot;
> +
> + if (attrs & DMA_ATTR_MMIO)
> + return __iommu_dma_map(dev, phys, size,
> + dma_info_to_prot(dir, false, attrs) | IOMMU_MMIO,
> + dma_get_mask(dev));

I realize that iommu_dma_map_resource() doesn't today, but shouldn't
this be checking for swiotlb:

if (dev_use_swiotlb(dev, size, dir) &&
iova_unaligned(iovad, phys, size)) {

Except we have to fail for ATTR_MMIO?

Now that we have ATTR_MMIO, should dma_info_to_prot() just handle it
directly instead of open coding the | IOMMU_MMIO and messing with the
coherent attribute?
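
ie something along these lines (untested sketch on top of the existing
helper):

static int dma_info_to_prot(enum dma_data_direction dir, bool coherent,
		unsigned long attrs)
{
	int prot;

	if (attrs & DMA_ATTR_MMIO)
		prot = IOMMU_MMIO;	/* MMIO must never be IOMMU_CACHE */
	else
		prot = coherent ? IOMMU_CACHE : 0;

	if (attrs & DMA_ATTR_PRIVILEGED)
		prot |= IOMMU_PRIV;

	switch (dir) {
	case DMA_BIDIRECTIONAL:
		return prot | IOMMU_READ | IOMMU_WRITE;
	case DMA_TO_DEVICE:
		return prot | IOMMU_READ;
	case DMA_FROM_DEVICE:
		return prot | IOMMU_WRITE;
	default:
		return 0;
	}
}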

Jason

Jason Gunthorpe

unread,
Aug 7, 2025, 8:13:35 AMAug 7
to Leon Romanovsky, Marek Szyprowski, Leon Romanovsky, Abdiel Janulgue, Alexander Potapenko, Alex Gaynor, Andrew Morton, Christoph Hellwig, Danilo Krummrich, io...@lists.linux.dev, Jason Wang, Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross, kasa...@googlegroups.com, Keith Busch, linux...@vger.kernel.org, linu...@vger.kernel.org, linux-...@vger.kernel.org, linu...@kvack.org, linux...@lists.infradead.org, linuxp...@lists.ozlabs.org, linux-tra...@vger.kernel.org, Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman, Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-fo...@vger.kernel.org, Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtual...@lists.linux.dev, Will Deacon, xen-...@lists.xenproject.org
On Mon, Aug 04, 2025 at 03:42:41PM +0300, Leon Romanovsky wrote:
> --- a/kernel/dma/direct.h
> +++ b/kernel/dma/direct.h
> @@ -80,42 +80,54 @@ static inline void dma_direct_sync_single_for_cpu(struct device *dev,
> arch_dma_mark_clean(paddr, size);
> }
>
> -static inline dma_addr_t dma_direct_map_page(struct device *dev,
> - struct page *page, unsigned long offset, size_t size,
> - enum dma_data_direction dir, unsigned long attrs)
> +static inline dma_addr_t dma_direct_map_phys(struct device *dev,
> + phys_addr_t phys, size_t size, enum dma_data_direction dir,
> + unsigned long attrs)
> {
> - phys_addr_t phys = page_to_phys(page) + offset;
> - dma_addr_t dma_addr = phys_to_dma(dev, phys);
> + bool is_mmio = attrs & DMA_ATTR_MMIO;
> + dma_addr_t dma_addr;
> + bool capable;
> +
> + dma_addr = (is_mmio) ? phys : phys_to_dma(dev, phys);
> + capable = dma_capable(dev, dma_addr, size, is_mmio);
> + if (is_mmio) {
> + if (unlikely(!capable))
> + goto err_overflow;
> + return dma_addr;

Similar remark here, shouldn't we be checking swiotlb things for
ATTR_MMIO and failing if swiotlb is needed?

> - if (is_swiotlb_force_bounce(dev)) {
> - if (is_pci_p2pdma_page(page))
> - return DMA_MAPPING_ERROR;

This

> - if (unlikely(!dma_capable(dev, dma_addr, size, true)) ||
> - dma_kmalloc_needs_bounce(dev, size, dir)) {
> - if (is_pci_p2pdma_page(page))
> - return DMA_MAPPING_ERROR;

And this
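
ie the phys-based MMIO branch could just fail instead of trying to
bounce, roughly (sketch only):

	if (is_mmio) {
		/* sketch: MMIO can never go through swiotlb, fail instead */
		if (unlikely(!capable) || is_swiotlb_force_bounce(dev))
			return DMA_MAPPING_ERROR;
		return dma_addr;
	}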

Jason

Jason Gunthorpe

unread,
Aug 7, 2025, 8:21:21 AMAug 7
to Leon Romanovsky, Marek Szyprowski, Leon Romanovsky, Abdiel Janulgue, Alexander Potapenko, Alex Gaynor, Andrew Morton, Christoph Hellwig, Danilo Krummrich, io...@lists.linux.dev, Jason Wang, Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross, kasa...@googlegroups.com, Keith Busch, linux...@vger.kernel.org, linu...@vger.kernel.org, linux-...@vger.kernel.org, linu...@kvack.org, linux...@lists.infradead.org, linuxp...@lists.ozlabs.org, linux-tra...@vger.kernel.org, Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman, Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-fo...@vger.kernel.org, Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtual...@lists.linux.dev, Will Deacon, xen-...@lists.xenproject.org
On Mon, Aug 04, 2025 at 03:42:42PM +0300, Leon Romanovsky wrote:
> From: Leon Romanovsky <leo...@nvidia.com>
>
> Convert the KMSAN DMA handling function from page-based to physical
> address-based interface.
>
> The refactoring renames kmsan_handle_dma() parameters from accepting
> (struct page *page, size_t offset, size_t size) to (phys_addr_t phys,
> size_t size). A PFN_VALID check is added to prevent KMSAN operations
> on non-page memory, preventing from non struct page backed address,
>
> As part of this change, support for highmem addresses is implemented
> using kmap_local_page() to handle both lowmem and highmem regions
> properly. All callers throughout the codebase are updated to use the
> new phys_addr_t based interface.

Use the function Matthew pointed at kmap_local_pfn()

Maybe introduce the kmap_local_phys() he suggested too.

> /* Helper function to handle DMA data transfers. */
> -void kmsan_handle_dma(struct page *page, size_t offset, size_t size,
> +void kmsan_handle_dma(phys_addr_t phys, size_t size,
> enum dma_data_direction dir)
> {
> u64 page_offset, to_go, addr;
> + struct page *page;
> + void *kaddr;
>
> - if (PageHighMem(page))
> + if (!pfn_valid(PHYS_PFN(phys)))
> return;

Not needed, the caller must pass in a phys that is kmap
compatible. Maybe just leave a comment. FWIW today this is also not
checking for P2P or DEVICE non-kmap struct pages either, so it should
be fine without checks.

> - addr = (u64)page_address(page) + offset;
> +
> + page = phys_to_page(phys);
> + page_offset = offset_in_page(phys);
> +
> /*
> * The kernel may occasionally give us adjacent DMA pages not belonging
> * to the same allocation. Process them separately to avoid triggering
> * internal KMSAN checks.
> */
> while (size > 0) {
> - page_offset = offset_in_page(addr);
> to_go = min(PAGE_SIZE - page_offset, (u64)size);
> +
> + if (PageHighMem(page))
> + /* Handle highmem pages using kmap */
> + kaddr = kmap_local_page(page);

No need for the PageHighMem() - just always call kmap_local_pfn().

I'd also propose that any debug/sanitizer checks that the passed phys
is valid for kmap (eg pfn valid, not zone_device, etc) should be
inside the kmap code.
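
ie the loop could end up roughly like this (untested sketch, assuming the
existing kmsan_handle_dma_page() helper stays as-is):

	while (size > 0) {
		page_offset = offset_in_page(phys);
		to_go = min(PAGE_SIZE - page_offset, (u64)size);

		kaddr = kmap_local_pfn(PHYS_PFN(phys));
		kmsan_handle_dma_page(kaddr + page_offset, to_go, dir);
		kunmap_local(kaddr);

		phys += to_go;
		size -= to_go;
	}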

Jason

Jason Gunthorpe

unread,
Aug 7, 2025, 9:08:21 AMAug 7
to Leon Romanovsky, Marek Szyprowski, Leon Romanovsky, Abdiel Janulgue, Alexander Potapenko, Alex Gaynor, Andrew Morton, Christoph Hellwig, Danilo Krummrich, io...@lists.linux.dev, Jason Wang, Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross, kasa...@googlegroups.com, Keith Busch, linux...@vger.kernel.org, linu...@vger.kernel.org, linux-...@vger.kernel.org, linu...@kvack.org, linux...@lists.infradead.org, linuxp...@lists.ozlabs.org, linux-tra...@vger.kernel.org, Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman, Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-fo...@vger.kernel.org, Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtual...@lists.linux.dev, Will Deacon, xen-...@lists.xenproject.org
On Mon, Aug 04, 2025 at 03:42:43PM +0300, Leon Romanovsky wrote:
> From: Leon Romanovsky <leo...@nvidia.com>
>
> Extend base DMA page API to handle MMIO flow.

I would mention here this follows the long ago agreement that we don't
need to enable P2P in the legacy dma_ops area. Simply failing when
getting an ATTR_MMIO is OK.

> --- a/kernel/dma/mapping.c
> +++ b/kernel/dma/mapping.c
> @@ -158,6 +158,7 @@ dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
> {
> const struct dma_map_ops *ops = get_dma_ops(dev);
> phys_addr_t phys = page_to_phys(page) + offset;
> + bool is_mmio = attrs & DMA_ATTR_MMIO;
> dma_addr_t addr;
>
> BUG_ON(!valid_dma_direction(dir));
> @@ -166,12 +167,23 @@ dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
> return DMA_MAPPING_ERROR;
>
> if (dma_map_direct(dev, ops) ||
> - arch_dma_map_phys_direct(dev, phys + size))
> + (!is_mmio && arch_dma_map_phys_direct(dev, phys + size)))
> addr = dma_direct_map_phys(dev, phys, size, dir, attrs);

I don't know this area, maybe explain a bit in the commit message how
you see ATTR_MMIO interacting with arch_dma_map_phys_direct?

> else if (use_dma_iommu(dev))
> addr = iommu_dma_map_phys(dev, phys, size, dir, attrs);
> - else
> + else if (is_mmio) {
> + if (!ops->map_resource)
> + return DMA_MAPPING_ERROR;
> +
> + addr = ops->map_resource(dev, phys, size, dir, attrs);
> + } else {
> + /*
> + * All platforms which implement .map_page() don't support
> + * non-struct page backed addresses.
> + */
> addr = ops->map_page(dev, page, offset, size, dir, attrs);

Comment could be clearer, maybe just:

The dma_ops API contract for ops->map_page() requires kmappable memory, while
ops->map_resource() does not.

But this approach looks good to me, it prevents non-kmappable phys
from going down to the legacy dma_ops map_page where it cannot work.

From here you could do what Marek and Christoph asked to flush the
struct page out of the ops->map_page() and replace it with
kmap_local_phys().

Jason

Jason Gunthorpe

unread,
Aug 7, 2025, 9:14:24 AMAug 7
to Leon Romanovsky, Marek Szyprowski, Leon Romanovsky, Abdiel Janulgue, Alexander Potapenko, Alex Gaynor, Andrew Morton, Christoph Hellwig, Danilo Krummrich, io...@lists.linux.dev, Jason Wang, Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross, kasa...@googlegroups.com, Keith Busch, linux...@vger.kernel.org, linu...@vger.kernel.org, linux-...@vger.kernel.org, linu...@kvack.org, linux...@lists.infradead.org, linuxp...@lists.ozlabs.org, linux-tra...@vger.kernel.org, Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman, Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-fo...@vger.kernel.org, Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtual...@lists.linux.dev, Will Deacon, xen-...@lists.xenproject.org
Yeah, this is a lot cleaner

Jason Gunthorpe

unread,
Aug 7, 2025, 9:15:05 AMAug 7
to Leon Romanovsky, Marek Szyprowski, Leon Romanovsky, Abdiel Janulgue, Alexander Potapenko, Alex Gaynor, Andrew Morton, Christoph Hellwig, Danilo Krummrich, io...@lists.linux.dev, Jason Wang, Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross, kasa...@googlegroups.com, Keith Busch, linux...@vger.kernel.org, linu...@vger.kernel.org, linux-...@vger.kernel.org, linu...@kvack.org, linux...@lists.infradead.org, linuxp...@lists.ozlabs.org, linux-tra...@vger.kernel.org, Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman, Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-fo...@vger.kernel.org, Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtual...@lists.linux.dev, Will Deacon, xen-...@lists.xenproject.org
On Mon, Aug 04, 2025 at 03:42:46PM +0300, Leon Romanovsky wrote:
> From: Leon Romanovsky <leo...@nvidia.com>
>
> Convert HMM DMA operations from the legacy page-based API to the new
> physical address-based dma_map_phys() and dma_unmap_phys() functions.
> This demonstrates the preferred approach for new code that should use
> physical addresses directly rather than page+offset parameters.
>
> The change replaces dma_map_page() and dma_unmap_page() calls with
> dma_map_phys() and dma_unmap_phys() respectively, using the physical
> address that was already available in the code. This eliminates the
> redundant page-to-physical address conversion and aligns with the
> DMA subsystem's move toward physical address-centric interfaces.
>
> This serves as an example of how new code should be written to leverage
> the more efficient physical address API, which provides cleaner interfaces
> for drivers that already have access to physical addresses.
>
> Signed-off-by: Leon Romanovsky <leo...@nvidia.com>
> ---
> mm/hmm.c | 8 ++++----
> 1 file changed, 4 insertions(+), 4 deletions(-)

Reviewed-by: Jason Gunthorpe <j...@nvidia.com>

Maybe the next patch should be squashed into here too if it is going
to be a full example

Jason

Jason Gunthorpe

unread,
Aug 7, 2025, 9:38:54 AMAug 7
to Leon Romanovsky, Marek Szyprowski, Leon Romanovsky, Abdiel Janulgue, Alexander Potapenko, Alex Gaynor, Andrew Morton, Christoph Hellwig, Danilo Krummrich, io...@lists.linux.dev, Jason Wang, Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross, kasa...@googlegroups.com, Keith Busch, linux...@vger.kernel.org, linu...@vger.kernel.org, linux-...@vger.kernel.org, linu...@kvack.org, linux...@lists.infradead.org, linuxp...@lists.ozlabs.org, linux-tra...@vger.kernel.org, Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman, Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-fo...@vger.kernel.org, Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtual...@lists.linux.dev, Will Deacon, xen-...@lists.xenproject.org
On Mon, Aug 04, 2025 at 03:42:45PM +0300, Leon Romanovsky wrote:
> From: Leon Romanovsky <leo...@nvidia.com>
>
> Introduce new DMA mapping functions dma_map_phys() and dma_unmap_phys()
> that operate directly on physical addresses instead of page+offset
> parameters. This provides a more efficient interface for drivers that
> already have physical addresses available.
>
> The new functions are implemented as the primary mapping layer, with
> the existing dma_map_page_attrs() and dma_unmap_page_attrs() functions
> converted to simple wrappers around the phys-based implementations.

Briefly explain how the existing functions are remapped into wrappers
calling the phys functions.

> +dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
> + size_t offset, size_t size, enum dma_data_direction dir,
> + unsigned long attrs)
> +{
> + phys_addr_t phys = page_to_phys(page) + offset;
> +
> + if (unlikely(attrs & DMA_ATTR_MMIO))
> + return DMA_MAPPING_ERROR;
> +
> + if (IS_ENABLED(CONFIG_DMA_API_DEBUG))
> + WARN_ON_ONCE(!pfn_valid(PHYS_PFN(phys)));

This is not useful; if we have a struct page and did page_to_phys() then
pfn_valid() is always true.

Instead this should check for any ZONE_DEVICE page and reject that.
And handle the error:

if (WARN_ON_ONCE()) return DMA_MAPPING_ERROR;

I'd add another debug check inside dma_map_phys that if !ATTR_MMIO
then pfn_valid, and not zone_device
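
Roughly (sketch, using the existing helpers):

	if (IS_ENABLED(CONFIG_DMA_API_DEBUG) && !(attrs & DMA_ATTR_MMIO)) {
		/* sketch: non-MMIO phys must be normal, kmappable memory */
		if (WARN_ON_ONCE(!pfn_valid(PHYS_PFN(phys)) ||
				 is_zone_device_page(pfn_to_page(PHYS_PFN(phys)))))
			return DMA_MAPPING_ERROR;
	}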

> @@ -337,41 +364,18 @@ EXPORT_SYMBOL(dma_unmap_sg_attrs);
> dma_addr_t dma_map_resource(struct device *dev, phys_addr_t phys_addr,
> size_t size, enum dma_data_direction dir, unsigned long attrs)
> {

> - const struct dma_map_ops *ops = get_dma_ops(dev);
> - dma_addr_t addr = DMA_MAPPING_ERROR;
> -
> - BUG_ON(!valid_dma_direction(dir));
> -
> - if (WARN_ON_ONCE(!dev->dma_mask))
> + if (IS_ENABLED(CONFIG_DMA_API_DEBUG) &&
> + WARN_ON_ONCE(pfn_valid(PHYS_PFN(phys_addr))))
> return DMA_MAPPING_ERROR;
>
> - if (dma_map_direct(dev, ops))
> - addr = dma_direct_map_resource(dev, phys_addr, size, dir, attrs);
> - else if (use_dma_iommu(dev))
> - addr = iommu_dma_map_resource(dev, phys_addr, size, dir, attrs);
> - else if (ops->map_resource)
> - addr = ops->map_resource(dev, phys_addr, size, dir, attrs);
> -
> - trace_dma_map_resource(dev, phys_addr, addr, size, dir, attrs);
> - debug_dma_map_resource(dev, phys_addr, size, dir, addr, attrs);
> - return addr;
> + return dma_map_phys(dev, phys_addr, size, dir, attrs | DMA_ATTR_MMIO);
> }
> EXPORT_SYMBOL(dma_map_resource);

I think this makes a lot of sense at least.

Jason

Jason Gunthorpe

unread,
Aug 7, 2025, 9:45:40 AMAug 7
to Leon Romanovsky, Marek Szyprowski, Leon Romanovsky, Abdiel Janulgue, Alexander Potapenko, Alex Gaynor, Andrew Morton, Christoph Hellwig, Danilo Krummrich, io...@lists.linux.dev, Jason Wang, Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross, kasa...@googlegroups.com, Keith Busch, linux...@vger.kernel.org, linu...@vger.kernel.org, linux-...@vger.kernel.org, linu...@kvack.org, linux...@lists.infradead.org, linuxp...@lists.ozlabs.org, linux-tra...@vger.kernel.org, Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman, Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-fo...@vger.kernel.org, Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtual...@lists.linux.dev, Will Deacon, xen-...@lists.xenproject.org
On Mon, Aug 04, 2025 at 03:42:50PM +0300, Leon Romanovsky wrote:
> From: Leon Romanovsky <leo...@nvidia.com>
>
> Block layer maps MMIO memory through dma_map_phys() interface
> with help of DMA_ATTR_MMIO attribute. There is a need to unmap
> that memory with the appropriate unmap function.

Be specific; AFAICT the issue is that on dma_ops platforms the map
will call ops->map_resource for ATTR_MMIO, so we must have the unmap
call ops->unmap_resource.

Maybe these patches should be swapped then, as adding ATTR_MMIO seems
like it created this issue?

Jason

Jason Gunthorpe

unread,
Aug 7, 2025, 10:19:37 AMAug 7
to Leon Romanovsky, Marek Szyprowski, Abdiel Janulgue, Alexander Potapenko, Alex Gaynor, Andrew Morton, Christoph Hellwig, Danilo Krummrich, io...@lists.linux.dev, Jason Wang, Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross, kasa...@googlegroups.com, Keith Busch, linux...@vger.kernel.org, linu...@vger.kernel.org, linux-...@vger.kernel.org, linu...@kvack.org, linux...@lists.infradead.org, linuxp...@lists.ozlabs.org, linux-tra...@vger.kernel.org, Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman, Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-fo...@vger.kernel.org, Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtual...@lists.linux.dev, Will Deacon, xen-...@lists.xenproject.org
On Mon, Aug 04, 2025 at 03:42:34PM +0300, Leon Romanovsky wrote:
> Changelog:
> v1:
> * Added new DMA_ATTR_MMIO attribute to indicate
> PCI_P2PDMA_MAP_THRU_HOST_BRIDGE path.
> * Rewrote dma_map_* functions to use thus new attribute
> v0: https://lore.kernel.org/all/cover.175085...@kernel.org/
> ------------------------------------------------------------------------
>
> This series refactors the DMA mapping to use physical addresses
> as the primary interface instead of page+offset parameters. This
> change aligns the DMA API with the underlying hardware reality where
> DMA operations work with physical addresses, not page structures.

Let's elaborate this as Robin asked:

This series refactors the DMA mapping API to provide a phys_addr_t
based, and struct-page free, external API that can handle all the
mapping cases we want in modern systems:

- struct page based cachable DRAM
- struct page MEMORY_DEVICE_PCI_P2PDMA PCI peer to peer non-cachable MMIO
- struct page-less PCI peer to peer non-cachable MMIO
- struct page-less "resource" MMIO

Overall this gets much closer to Matthew's long term wish for
struct-pageless IO to cachable DRAM. The remaining primary work would
be in the mm side to allow kmap_local_pfn()/phys_to_virt() to work on
phys_addr_t without a struct page.

The general design is to remove struct page usage entirely from the
DMA API inner layers. For flows that need to have a KVA for the
physical address they can use kmap_local_pfn() or phys_to_virt(). This
isolates the struct page requirements to MM code only. Long term all
removals of struct page usage are supporting Matthew's memdesc
project which seeks to substantially transform how struct page works.

Instead make the DMA API internals work on phys_addr_t. Internally
there are still dedicated 'page' and 'resource' flows, except they are
now distinguished by a new DMA_ATTR_MMIO instead of by callchain. Both
flows use the same phys_addr_t.

When DMA_ATTR_MMIO is specified things work similar to the existing
'resource' flow. kmap_local_pfn(), phys_to_virt(), phys_to_page(),
pfn_valid(), etc are never called on the phys_addr_t. This requires
rejecting any configuration that would need swiotlb. CPU cache
flushing is not required, and avoided, as ATTR_MMIO also indicates the
address has no cachable mappings. This effectively removes any
DMA API side requirement to have struct page when DMA_ATTR_MMIO is
used.

In the !DMA_ATTR_MMIO mode things work similarly to the 'page' flow,
except on the common path of no cache flush, no swiotlb it never
touches a struct page. When cache flushing or swiotlb copying
kmap_local_pfn()/phys_to_virt() are used to get a KVA for CPU
usage. This was already the case on the unmap side, now the map side
is symmetric.

Callers are adjusted to set DMA_ATTR_MMIO. Existing 'resource' users
must set it. The existing struct page based MEMORY_DEVICE_PCI_P2PDMA
path must also set it. This corrects some existing bugs where iommu
mappings for P2P MMIO were improperly marked IOMMU_CACHE.

Since ATTR_MMIO is made to work with all the existing DMA map entry
points, particularly dma_iova_link(), this finally allows a way to use
the new DMA API to map PCI P2P MMIO without creating struct page. The
VFIO DMABUF series demonstrates how this works. This is intended to
replace the incorrect driver use of dma_map_resource() on PCI BAR
addresses.

This series does the core code and modern flows. A followup series
will give the same treatment to the legacy dma_ops implementation.

Jason

Jürgen Groß

unread,
Aug 7, 2025, 10:41:01 AMAug 7
to Leon Romanovsky, Marek Szyprowski, Leon Romanovsky, Jason Gunthorpe, Abdiel Janulgue, Alexander Potapenko, Alex Gaynor, Andrew Morton, Christoph Hellwig, Danilo Krummrich, io...@lists.linux.dev, Jason Wang, Jens Axboe, Joerg Roedel, Jonathan Corbet, kasa...@googlegroups.com, Keith Busch, linux...@vger.kernel.org, linu...@vger.kernel.org, linux-...@vger.kernel.org, linu...@kvack.org, linux...@lists.infradead.org, linuxp...@lists.ozlabs.org, linux-tra...@vger.kernel.org, Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman, Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-fo...@vger.kernel.org, Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtual...@lists.linux.dev, Will Deacon, xen-...@lists.xenproject.org
On 04.08.25 14:42, Leon Romanovsky wrote:
> From: Leon Romanovsky <leo...@nvidia.com>
>
> General dma_direct_map_resource() is going to be removed
> in next patch, so simply open-code it in xen driver.
>
> Signed-off-by: Leon Romanovsky <leo...@nvidia.com>

Reviewed-by: Juergen Gross <jgr...@suse.com>


Juergen

Marek Szyprowski

unread,
Aug 8, 2025, 2:51:17 PMAug 8
to Jason Gunthorpe, Leon Romanovsky, Abdiel Janulgue, Alexander Potapenko, Alex Gaynor, Andrew Morton, Christoph Hellwig, Danilo Krummrich, io...@lists.linux.dev, Jason Wang, Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross, kasa...@googlegroups.com, Keith Busch, linux...@vger.kernel.org, linu...@vger.kernel.org, linux-...@vger.kernel.org, linu...@kvack.org, linux...@lists.infradead.org, linuxp...@lists.ozlabs.org, linux-tra...@vger.kernel.org, Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman, Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-fo...@vger.kernel.org, Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtual...@lists.linux.dev, Will Deacon, xen-...@lists.xenproject.org
Thanks for the elaborate description, that's something that was missing
in the previous attempt. I read again all the previous discussion and
this explanation and there are still two things that imho need more
clarification.


First - basing the API on the phys_addr_t.

Page based API had the advantage that it was really hard to abuse it and
call for something that is not 'a normal RAM'. I initially thought that
a phys_addr_t based API would somehow simplify the arch specific
implementations, as some of them indeed rely on phys_addr_t internally,
but I missed the other things pointed out by Robin. Do we have any
alternative here?


Second - making dma_map_phys() a single API to handle all cases.

Do we really need a single function to handle all cases? To handle the
P2P case, the caller already must pass DMA_ATTR_MMIO, so it must somehow
keep that information internally. Can't it just call the existing
dma_map_resource(), so there will be a clear distinction between these 2
cases (DMA to RAM and P2P DMA)? Do we need an additional check for
DMA_ATTR_MMIO for every typical DMA user? I know that branching is
cheap, but this will probably increase the code size for most of the
typical users for no reason.


Best regards
--
Marek Szyprowski, PhD
Samsung R&D Institute Poland

Jason Gunthorpe

unread,
Aug 9, 2025, 9:35:02 AMAug 9
to Marek Szyprowski, Leon Romanovsky, Abdiel Janulgue, Alexander Potapenko, Alex Gaynor, Andrew Morton, Christoph Hellwig, Danilo Krummrich, io...@lists.linux.dev, Jason Wang, Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross, kasa...@googlegroups.com, Keith Busch, linux...@vger.kernel.org, linu...@vger.kernel.org, linux-...@vger.kernel.org, linu...@kvack.org, linux...@lists.infradead.org, linuxp...@lists.ozlabs.org, linux-tra...@vger.kernel.org, Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman, Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-fo...@vger.kernel.org, Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtual...@lists.linux.dev, Will Deacon, xen-...@lists.xenproject.org
On Fri, Aug 08, 2025 at 08:51:08PM +0200, Marek Szyprowski wrote:
> First - basing the API on the phys_addr_t.
>
> Page based API had the advantage that it was really hard to abuse it and
> call for something that is not 'a normal RAM'.

This is not true anymore. Today we have ZONE_DEVICE as a struct page
type with a whole bunch of non-dram sub-types:

enum memory_type {
/* 0 is reserved to catch uninitialized type fields */
MEMORY_DEVICE_PRIVATE = 1,
MEMORY_DEVICE_COHERENT,
MEMORY_DEVICE_FS_DAX,
MEMORY_DEVICE_GENERIC,
MEMORY_DEVICE_PCI_P2PDMA,
};

Few of which are kmappable/page_to_virtable() in a way that is useful
for the DMA API.

DMA API sort of ignores all of this and relies on the caller to not
pass in an incorrect struct page. eg we rely on things like the block
stack to do the right stuff when a MEMORY_DEVICE_PCI_P2PDMA is present
in a bio_vec.

Which is not really fundamentally different from just using
phys_addr_t in the first place.

Sure, this was a stronger argument when this stuff was originally
written, before ZONE_DEVICE was invented.

> I initially though that phys_addr_t based API will somehow simplify
> arch specific implementation, as some of them indeed rely on
> phys_addr_t internally, but I missed other things pointed by
> Robin. Do we have here any alternative?

I think it is less of a code simplification and more of a reduction in
conceptual load. When we can say directly there is no struct page type
anywhere in the DMA API layers then we only have to reason about
kmap/phys_to_virt compatibility.

This is also a weaker overall requirement than needing an actual
struct page, which allows optimizing other parts of the kernel. Like we
aren't forced to create MEMORY_DEVICE_PCI_P2PDMA struct pages just to
use the DMA API.

Again, any place in the kernel we can get rid of struct page the
smoother the road will be for the MM side struct page restructuring.

For example, one of the bigger eventual goals here is to make a bio_vec
store phys_addr_t, not struct page pointers.

DMA API is not alone here, we have been de-struct-paging the kernel
for a long time now:

netdev: https://lore.kernel.org/linux-mm/20250609043225.7...@sk.com/
slab: https://lore.kernel.org/linux-mm/20211201181510...@suse.cz/
iommmu: https://lore.kernel.org/all/0-v4-c8663abbb606+...@nvidia.com/
page tables: https://lore.kernel.org/linux-mm/20230731170332.69...@gmail.com/
zswap: https://lore.kernel.org/all/20241216150450.12...@gmail.com/

With a long term goal that struct page only exists for legacy code,
and is maybe entirely compiled out of modern server kernels.

> Second - making dma_map_phys() a single API to handle all cases.
>
> Do we really need such single function to handle all cases?

If we accept the direction to remove struct page then it makes little
sense to have a dma_map_ram(phys_addr) and dma_map_resource(phys_addr)
and force key callers (like block) to have more ifs - especially if
the conditional could become "free" inside the dma API (see below).

Plus if we keep the callchain split then adding a
"dma_link_resource"/etc are now needed as well.

> DMA_ATTR_MMIO for every typical DMA user? I know that branching is
> cheap, but this will probably increase code size for most of the typical
> users for no reason.

Well, having two call chains will increase the code size much more,
and 'resource' can't be compiled out. Arguably this unification should
reduce the .text size since many of the resource only functions go
away.

There are some branches, and I think the push toward re-using
DMA_ATTR_SKIP_CPU_SYNC was directly to try to reduce that branch
cost.

However, I think we should be looking for a design here that is "free"
on the fast no-swiotlb and non-cache-flush path. I think this can be
achieved by checking ATTR_MMIO only after seeing swiotlb is needed
(like today's P2P check). And we can probably freely fold it into
the existing sync check:

if ((attrs & (DMA_ATTR_SKIP_CPU_SYNC | DMA_ATTR_MMIO)) == 0)

I saw Leon hasn't done these micro optimizations, but it seems like it
could work out.
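
eg in the direct path the sync could end up as something like (sketch
only):

	/* sketch: SKIP_CPU_SYNC and MMIO both mean "no cache maintenance" */
	if (!dev_is_dma_coherent(dev) &&
	    !(attrs & (DMA_ATTR_SKIP_CPU_SYNC | DMA_ATTR_MMIO)))
		arch_sync_dma_for_device(phys, size, dir);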

Regards,
Jason

Demi Marie Obenour

unread,
Aug 9, 2025, 12:53:32 PMAug 9
to Jason Gunthorpe, Marek Szyprowski, Leon Romanovsky, Abdiel Janulgue, Alexander Potapenko, Alex Gaynor, Andrew Morton, Christoph Hellwig, Danilo Krummrich, io...@lists.linux.dev, Jason Wang, Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross, kasa...@googlegroups.com, Keith Busch, linux...@vger.kernel.org, linu...@vger.kernel.org, linux-...@vger.kernel.org, linu...@kvack.org, linux...@lists.infradead.org, linuxp...@lists.ozlabs.org, linux-tra...@vger.kernel.org, Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman, Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-fo...@vger.kernel.org, Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtual...@lists.linux.dev, Will Deacon, xen-...@lists.xenproject.org
Why just server kernels? I suspect client systems actually run
newer kernels than servers do.
--
Sincerely,
Demi Marie Obenour (she/her/hers)

Jason Gunthorpe

unread,
Aug 10, 2025, 1:02:10 PMAug 10
to Demi Marie Obenour, Marek Szyprowski, Leon Romanovsky, Abdiel Janulgue, Alexander Potapenko, Alex Gaynor, Andrew Morton, Christoph Hellwig, Danilo Krummrich, io...@lists.linux.dev, Jason Wang, Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross, kasa...@googlegroups.com, Keith Busch, linux...@vger.kernel.org, linu...@vger.kernel.org, linux-...@vger.kernel.org, linu...@kvack.org, linux...@lists.infradead.org, linuxp...@lists.ozlabs.org, linux-tra...@vger.kernel.org, Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman, Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-fo...@vger.kernel.org, Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtual...@lists.linux.dev, Will Deacon, xen-...@lists.xenproject.org
On Sat, Aug 09, 2025 at 12:53:09PM -0400, Demi Marie Obenour wrote:
> > With a long term goal that struct page only exists for legacy code,
> > and is maybe entirely compiled out of modern server kernels.
>
> Why just server kernels? I suspect client systems actually run
> newer kernels than servers do.

I would guess this is because of who is interested in this work.
Frankly there isn't much benefit for small-memory client systems.
Modern servers have > 1TB of memory and struct page really hurts
there.

The flip side is that the work is enormous, and I think there is a
general expectation that the smaller set of server-related drivers and
subsystems will be ready well before the wider universe of stuff a
client or Android system might use.

It is not that more can't happen; it just ultimately depends on
interest and time.

Many modern servers use quite new kernels if you ignore the enterprise
distros :\

Jason

Leon Romanovsky

unread,
Aug 13, 2025, 11:07:25 AMAug 13
to Jason Gunthorpe, Marek Szyprowski, Abdiel Janulgue, Alexander Potapenko, Alex Gaynor, Andrew Morton, Christoph Hellwig, Danilo Krummrich, io...@lists.linux.dev, Jason Wang, Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross, kasa...@googlegroups.com, Keith Busch, linux...@vger.kernel.org, linu...@vger.kernel.org, linux-...@vger.kernel.org, linu...@kvack.org, linux...@lists.infradead.org, linuxp...@lists.ozlabs.org, linux-tra...@vger.kernel.org, Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman, Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-fo...@vger.kernel.org, Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtual...@lists.linux.dev, Will Deacon, xen-...@lists.xenproject.org
On Thu, Aug 07, 2025 at 09:21:15AM -0300, Jason Gunthorpe wrote:
> On Mon, Aug 04, 2025 at 03:42:42PM +0300, Leon Romanovsky wrote:
> > From: Leon Romanovsky <leo...@nvidia.com>
> >
> > Convert the KMSAN DMA handling function from page-based to physical
> > address-based interface.
> >
> > The refactoring renames kmsan_handle_dma() parameters from accepting
> > (struct page *page, size_t offset, size_t size) to (phys_addr_t phys,
> > size_t size). A PFN_VALID check is added to prevent KMSAN operations
> > on non-page memory, preventing from non struct page backed address,
> >
> > As part of this change, support for highmem addresses is implemented
> > using kmap_local_page() to handle both lowmem and highmem regions
> > properly. All callers throughout the codebase are updated to use the
> > new phys_addr_t based interface.
>
> Use the function Matthew pointed at kmap_local_pfn()
>
> Maybe introduce the kmap_local_phys() he suggested too.

At this point it gains us nothing.

>
> > /* Helper function to handle DMA data transfers. */
> > -void kmsan_handle_dma(struct page *page, size_t offset, size_t size,
> > +void kmsan_handle_dma(phys_addr_t phys, size_t size,
> > enum dma_data_direction dir)
> > {
> > u64 page_offset, to_go, addr;
> > + struct page *page;
> > + void *kaddr;
> >
> > - if (PageHighMem(page))
> > + if (!pfn_valid(PHYS_PFN(phys)))
> > return;
>
> Not needed, the caller must pass in a phys that is kmap
> compatible. Maybe just leave a comment. FWIW today this is also not
> checking for P2P or DEVICE non-kmap struct pages either, so it should
> be fine without checks.

That is not true, as we will call kmsan_handle_dma() unconditionally
in dma_map_phys(). The reason is that kmsan_handle_dma() is guarded by
debug kconfig options, so the cost of pfn_valid() can be accommodated
in that case. It gives cleaner DMA code.

dma_addr_t dma_map_phys(struct device *dev, phys_addr_t phys, size_t size,
		enum dma_data_direction dir, unsigned long attrs)
{
<...>

	kmsan_handle_dma(phys, size, dir);
	trace_dma_map_phys(dev, phys, addr, size, dir, attrs);
	debug_dma_map_phys(dev, phys, size, dir, addr, attrs);

	return addr;
}
EXPORT_SYMBOL_GPL(dma_map_phys);

So let's keep this patch as is.

Thanks

Leon Romanovsky

unread,
Aug 13, 2025, 11:37:35 AMAug 13
to Jason Gunthorpe, Marek Szyprowski, Abdiel Janulgue, Alexander Potapenko, Alex Gaynor, Andrew Morton, Christoph Hellwig, Danilo Krummrich, io...@lists.linux.dev, Jason Wang, Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross, kasa...@googlegroups.com, Keith Busch, linux...@vger.kernel.org, linu...@vger.kernel.org, linux-...@vger.kernel.org, linu...@kvack.org, linux...@lists.infradead.org, linuxp...@lists.ozlabs.org, linux-tra...@vger.kernel.org, Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman, Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-fo...@vger.kernel.org, Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtual...@lists.linux.dev, Will Deacon, xen-...@lists.xenproject.org
The best variant would be to squash the previous patch, "block-dma:
properly take MMIO path", but I don't want to mix them as they are for
different kernel areas.

Thanks

>
> Jason
>

Leon Romanovsky

unread,
Aug 14, 2025, 6:13:52 AMAug 14
to Marek Szyprowski, Jason Gunthorpe, Abdiel Janulgue, Alexander Potapenko, Alex Gaynor, Andrew Morton, Christoph Hellwig, Danilo Krummrich, io...@lists.linux.dev, Jason Wang, Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross, kasa...@googlegroups.com, Keith Busch, linux...@vger.kernel.org, linu...@vger.kernel.org, linux-...@vger.kernel.org, linu...@kvack.org, linux...@lists.infradead.org, linuxp...@lists.ozlabs.org, linux-tra...@vger.kernel.org, Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman, Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-fo...@vger.kernel.org, Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtual...@lists.linux.dev, Will Deacon, xen-...@lists.xenproject.org
Changelog:
v2:
* Used commit messages and cover letter from Jason
* Moved setting IOMMU_MMIO flag to dma_info_to_prot function
* Micro-optimized the code
* Rebased code on v6.17-rc1
v1: https://lore.kernel.org/all/cover.175429...@kernel.org
* Added new DMA_ATTR_MMIO attribute to indicate
PCI_P2PDMA_MAP_THRU_HOST_BRIDGE path.
* Rewrote dma_map_* functions to use this new attribute
v0: https://lore.kernel.org/all/cover.175085...@kernel.org/
------------------------------------------------------------------------

This series refactors the DMA mapping to use physical addresses
as the primary interface instead of page+offset parameters. This
change aligns the DMA API with the underlying hardware reality where
DMA operations work with physical addresses, not page structures.

The series maintains export symbol backward compatibility by keeping
the old page-based API as wrapper functions around the new physical
address-based implementations. Follow-up changes will give the same
treatment to the legacy dma_ops implementation.

Thanks

Leon Romanovsky (16):
dma-mapping: introduce new DMA attribute to indicate MMIO memory
iommu/dma: implement DMA_ATTR_MMIO for dma_iova_link().
dma-debug: refactor to use physical addresses for page mapping
dma-mapping: rename trace_dma_*map_page to trace_dma_*map_phys
iommu/dma: rename iommu_dma_*map_page to iommu_dma_*map_phys
iommu/dma: extend iommu_dma_*map_phys API to handle MMIO memory
dma-mapping: convert dma_direct_*map_page to be phys_addr_t based
kmsan: convert kmsan_handle_dma to use physical addresses
dma-mapping: handle MMIO flow in dma_map|unmap_page
xen: swiotlb: Open code map_resource callback
dma-mapping: export new dma_*map_phys() interface
mm/hmm: migrate to physical address-based DMA mapping API
mm/hmm: properly take MMIO path
block-dma: migrate to dma_map_phys instead of map_page
block-dma: properly take MMIO path
nvme-pci: unmap MMIO pages with appropriate interface

Documentation/core-api/dma-api.rst | 4 +-
Documentation/core-api/dma-attributes.rst | 18 ++++
arch/powerpc/kernel/dma-iommu.c | 4 +-
block/blk-mq-dma.c | 15 ++-
drivers/iommu/dma-iommu.c | 61 ++++++------
drivers/nvme/host/pci.c | 18 +++-
drivers/virtio/virtio_ring.c | 4 +-
drivers/xen/swiotlb-xen.c | 21 +++-
include/linux/blk-mq-dma.h | 6 +-
include/linux/blk_types.h | 2 +
include/linux/dma-direct.h | 2 -
include/linux/dma-map-ops.h | 8 +-
include/linux/dma-mapping.h | 33 ++++++
include/linux/iommu-dma.h | 11 +-
include/linux/kmsan.h | 12 ++-
include/trace/events/dma.h | 9 +-
kernel/dma/debug.c | 71 ++++---------
kernel/dma/debug.h | 37 ++-----
kernel/dma/direct.c | 22 +---
kernel/dma/direct.h | 52 ++++++----
kernel/dma/mapping.c | 116 +++++++++++++---------
kernel/dma/ops_helpers.c | 6 +-
mm/hmm.c | 19 ++--
mm/kmsan/hooks.c | 36 +++++--
rust/kernel/dma.rs | 3 +
tools/virtio/linux/kmsan.h | 2 +-
26 files changed, 333 insertions(+), 259 deletions(-)

--
2.50.1

Leon Romanovsky

unread,
Aug 14, 2025, 6:13:56 AMAug 14
to Marek Szyprowski, Leon Romanovsky, Jason Gunthorpe, Abdiel Janulgue, Alexander Potapenko, Alex Gaynor, Andrew Morton, Christoph Hellwig, Danilo Krummrich, io...@lists.linux.dev, Jason Wang, Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross, kasa...@googlegroups.com, Keith Busch, linux...@vger.kernel.org, linu...@vger.kernel.org, linux-...@vger.kernel.org, linu...@kvack.org, linux...@lists.infradead.org, linuxp...@lists.ozlabs.org, linux-tra...@vger.kernel.org, Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman, Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-fo...@vger.kernel.org, Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtual...@lists.linux.dev, Will Deacon, xen-...@lists.xenproject.org
From: Leon Romanovsky <leo...@nvidia.com>

This will replace the hacky use of DMA_ATTR_SKIP_CPU_SYNC to avoid
touching possibly non-KVA MMIO memory.

Also correct the incorrect caching attribute for the IOMMU: MMIO
memory should not be cacheable inside the IOMMU mapping or it can
possibly create system problems. Set IOMMU_MMIO for DMA_ATTR_MMIO.

Reviewed-by: Jason Gunthorpe <j...@nvidia.com>
Signed-off-by: Leon Romanovsky <leo...@nvidia.com>
---
drivers/iommu/dma-iommu.c | 18 ++++++++++++++----
1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index ea2ef53bd4fe..e1185ba73e23 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -724,7 +724,12 @@ static int iommu_dma_init_domain(struct iommu_domain *domain, struct device *dev
static int dma_info_to_prot(enum dma_data_direction dir, bool coherent,
unsigned long attrs)
{
- int prot = coherent ? IOMMU_CACHE : 0;
+ int prot;
+
+ if (attrs & DMA_ATTR_MMIO)
+ prot = IOMMU_MMIO;
+ else
+ prot = coherent ? IOMMU_CACHE : 0;

if (attrs & DMA_ATTR_PRIVILEGED)
prot |= IOMMU_PRIV;
@@ -1838,12 +1843,13 @@ static int __dma_iova_link(struct device *dev, dma_addr_t addr,
unsigned long attrs)
{
bool coherent = dev_is_dma_coherent(dev);
+ int prot = dma_info_to_prot(dir, coherent, attrs);

- if (!coherent && !(attrs & DMA_ATTR_SKIP_CPU_SYNC))
+ if (!coherent && !(attrs & (DMA_ATTR_SKIP_CPU_SYNC | DMA_ATTR_MMIO)))
arch_sync_dma_for_device(phys, size, dir);

return iommu_map_nosync(iommu_get_dma_domain(dev), addr, phys, size,
- dma_info_to_prot(dir, coherent, attrs), GFP_ATOMIC);
+ prot, GFP_ATOMIC);
}

static int iommu_dma_iova_bounce_and_link(struct device *dev, dma_addr_t addr,
@@ -1949,9 +1955,13 @@ int dma_iova_link(struct device *dev, struct dma_iova_state *state,
return -EIO;

if (dev_use_swiotlb(dev, size, dir) &&
- iova_unaligned(iovad, phys, size))
+ iova_unaligned(iovad, phys, size)) {
+ if (attrs & DMA_ATTR_MMIO)
+ return -EPERM;
+
return iommu_dma_iova_link_swiotlb(dev, state, phys, offset,
size, dir, attrs);
+ }

return __dma_iova_link(dev, state->addr + offset - iova_start_pad,
phys - iova_start_pad,
--
2.50.1

Leon Romanovsky

unread,
Aug 14, 2025, 6:14:00 AMAug 14
to Marek Szyprowski, Leon Romanovsky, Jason Gunthorpe, Abdiel Janulgue, Alexander Potapenko, Alex Gaynor, Andrew Morton, Christoph Hellwig, Danilo Krummrich, io...@lists.linux.dev, Jason Wang, Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross, kasa...@googlegroups.com, Keith Busch, linux...@vger.kernel.org, linu...@vger.kernel.org, linux-...@vger.kernel.org, linu...@kvack.org, linux...@lists.infradead.org, linuxp...@lists.ozlabs.org, linux-tra...@vger.kernel.org, Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman, Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-fo...@vger.kernel.org, Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtual...@lists.linux.dev, Will Deacon, xen-...@lists.xenproject.org
From: Leon Romanovsky <leo...@nvidia.com>

This patch introduces the DMA_ATTR_MMIO attribute to mark DMA buffers
that reside in memory-mapped I/O (MMIO) regions, such as device BARs
exposed through the host bridge, which are accessible for peer-to-peer
(P2P) DMA.

This attribute is especially useful for exporting device memory to other
devices for DMA without CPU involvement, and avoids unnecessary or
potentially detrimental CPU cache maintenance calls.

DMA_ATTR_MMIO is intended to provide dma_map_resource() functionality
without the need to call a special function or to branch in the
callers.
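
As an illustration (not part of this patch), a P2P-capable caller
would be expected to use the attribute roughly as follows, assuming
the dma_map_phys() interface added later in this series; bar_phys is a
hypothetical BAR physical address obtained through the p2pdma API:

	dma_addr_t dma;

	dma = dma_map_phys(dev, bar_phys, len, DMA_TO_DEVICE,
			   DMA_ATTR_MMIO);
	if (dma_mapping_error(dev, dma))
		return -EIO;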

Signed-off-by: Leon Romanovsky <leo...@nvidia.com>
---
Documentation/core-api/dma-attributes.rst | 18 ++++++++++++++++++
include/linux/dma-mapping.h | 20 ++++++++++++++++++++
include/trace/events/dma.h | 3 ++-
rust/kernel/dma.rs | 3 +++
4 files changed, 43 insertions(+), 1 deletion(-)

diff --git a/Documentation/core-api/dma-attributes.rst b/Documentation/core-api/dma-attributes.rst
index 1887d92e8e92..58a1528a9bb9 100644
--- a/Documentation/core-api/dma-attributes.rst
+++ b/Documentation/core-api/dma-attributes.rst
@@ -130,3 +130,21 @@ accesses to DMA buffers in both privileged "supervisor" and unprivileged
subsystem that the buffer is fully accessible at the elevated privilege
level (and ideally inaccessible or at least read-only at the
lesser-privileged levels).
+
+DMA_ATTR_MMIO
+-------------
+
+This attribute indicates the physical address is not normal system
+memory. It may not be used with kmap*()/phys_to_virt()/phys_to_page()
+functions, it may not be cachable, and access using CPU load/store
+instructions may not be allowed.
+
+Usually this will be used to describe MMIO addresses, or other non
+cachable register addresses. When DMA mapping this sort of address we
+call the operation Peer to Peer as one device is DMA'ing to another
+device. For PCI devices the p2pdma APIs must be used to determine if
+DMA_ATTR_MMIO is appropriate.
+
+For architectures that require cache flushing for DMA coherence
+DMA_ATTR_MMIO will not perform any cache flushing. The address
+provided must never be mapped cachable into the CPU.
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 55c03e5fe8cb..ead5514d389e 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -58,6 +58,26 @@
*/
#define DMA_ATTR_PRIVILEGED (1UL << 9)

+/*
+ * DMA_ATTR_MMIO - Indicates memory-mapped I/O (MMIO) region for DMA mapping
+ *
+ * This attribute indicates the physical address is not normal system
+ * memory. It may not be used with kmap*()/phys_to_virt()/phys_to_page()
+ * functions, it may not be cachable, and access using CPU load/store
+ * instructions may not be allowed.
+ *
+ * Usually this will be used to describe MMIO addresses, or other non
+ * cachable register addresses. When DMA mapping this sort of address we
+ * call the operation Peer to Peer as one device is DMA'ing to another
+ * device. For PCI devices the p2pdma APIs must be used to determine if
+ * DMA_ATTR_MMIO is appropriate.
+ *
+ * For architectures that require cache flushing for DMA coherence
+ * DMA_ATTR_MMIO will not perform any cache flushing. The address
+ * provided must never be mapped cachable into the CPU.
+ */
+#define DMA_ATTR_MMIO (1UL << 10)
+
/*
* A dma_addr_t can hold any valid DMA or bus address for the platform. It can
* be given to a device to use as a DMA source or target. It is specific to a
diff --git a/include/trace/events/dma.h b/include/trace/events/dma.h
index d8ddc27b6a7c..ee90d6f1dcf3 100644
--- a/include/trace/events/dma.h
+++ b/include/trace/events/dma.h
@@ -31,7 +31,8 @@ TRACE_DEFINE_ENUM(DMA_NONE);
{ DMA_ATTR_FORCE_CONTIGUOUS, "FORCE_CONTIGUOUS" }, \
{ DMA_ATTR_ALLOC_SINGLE_PAGES, "ALLOC_SINGLE_PAGES" }, \
{ DMA_ATTR_NO_WARN, "NO_WARN" }, \
- { DMA_ATTR_PRIVILEGED, "PRIVILEGED" })
+ { DMA_ATTR_PRIVILEGED, "PRIVILEGED" }, \
+ { DMA_ATTR_MMIO, "MMIO" })

DECLARE_EVENT_CLASS(dma_map,
TP_PROTO(struct device *dev, phys_addr_t phys_addr, dma_addr_t dma_addr,
diff --git a/rust/kernel/dma.rs b/rust/kernel/dma.rs
index 2bc8ab51ec28..61d9eed7a786 100644
--- a/rust/kernel/dma.rs
+++ b/rust/kernel/dma.rs
@@ -242,6 +242,9 @@ pub mod attrs {
/// Indicates that the buffer is fully accessible at an elevated privilege level (and
/// ideally inaccessible or at least read-only at lesser-privileged levels).
pub const DMA_ATTR_PRIVILEGED: Attrs = Attrs(bindings::DMA_ATTR_PRIVILEGED);
+
+ /// Indicates that the buffer is MMIO memory.
+ pub const DMA_ATTR_MMIO: Attrs = Attrs(bindings::DMA_ATTR_MMIO);
}

/// An abstraction of the `dma_alloc_coherent` API.
--
2.50.1

Leon Romanovsky

unread,
Aug 14, 2025, 6:14:04 AMAug 14
to Marek Szyprowski, Leon Romanovsky, Jason Gunthorpe, Abdiel Janulgue, Alexander Potapenko, Alex Gaynor, Andrew Morton, Christoph Hellwig, Danilo Krummrich, io...@lists.linux.dev, Jason Wang, Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross, kasa...@googlegroups.com, Keith Busch, linux...@vger.kernel.org, linu...@vger.kernel.org, linux-...@vger.kernel.org, linu...@kvack.org, linux...@lists.infradead.org, linuxp...@lists.ozlabs.org, linux-tra...@vger.kernel.org, Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman, Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-fo...@vger.kernel.org, Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtual...@lists.linux.dev, Will Deacon, xen-...@lists.xenproject.org
From: Leon Romanovsky <leo...@nvidia.com>

As a preparation for following map_page -> map_phys API conversion,
let's rename trace_dma_*map_page() to be trace_dma_*map_phys().

Signed-off-by: Leon Romanovsky <leo...@nvidia.com>
---
include/trace/events/dma.h | 4 ++--
kernel/dma/mapping.c | 4 ++--
2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/trace/events/dma.h b/include/trace/events/dma.h
index ee90d6f1dcf3..84416c7d6bfa 100644
--- a/include/trace/events/dma.h
+++ b/include/trace/events/dma.h
@@ -72,7 +72,7 @@ DEFINE_EVENT(dma_map, name, \
size_t size, enum dma_data_direction dir, unsigned long attrs), \
TP_ARGS(dev, phys_addr, dma_addr, size, dir, attrs))

-DEFINE_MAP_EVENT(dma_map_page);
+DEFINE_MAP_EVENT(dma_map_phys);
DEFINE_MAP_EVENT(dma_map_resource);

DECLARE_EVENT_CLASS(dma_unmap,
@@ -110,7 +110,7 @@ DEFINE_EVENT(dma_unmap, name, \
enum dma_data_direction dir, unsigned long attrs), \
TP_ARGS(dev, addr, size, dir, attrs))

-DEFINE_UNMAP_EVENT(dma_unmap_page);
+DEFINE_UNMAP_EVENT(dma_unmap_phys);
DEFINE_UNMAP_EVENT(dma_unmap_resource);

DECLARE_EVENT_CLASS(dma_alloc_class,
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index 4c1dfbabb8ae..fe1f0da6dc50 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -173,7 +173,7 @@ dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
else
addr = ops->map_page(dev, page, offset, size, dir, attrs);
kmsan_handle_dma(page, offset, size, dir);
- trace_dma_map_page(dev, phys, addr, size, dir, attrs);
+ trace_dma_map_phys(dev, phys, addr, size, dir, attrs);
debug_dma_map_phys(dev, phys, size, dir, addr, attrs);

return addr;
@@ -193,7 +193,7 @@ void dma_unmap_page_attrs(struct device *dev, dma_addr_t addr, size_t size,
iommu_dma_unmap_page(dev, addr, size, dir, attrs);
else
ops->unmap_page(dev, addr, size, dir, attrs);
- trace_dma_unmap_page(dev, addr, size, dir, attrs);
+ trace_dma_unmap_phys(dev, addr, size, dir, attrs);
debug_dma_unmap_phys(dev, addr, size, dir);
}
EXPORT_SYMBOL(dma_unmap_page_attrs);
--
2.50.1

Leon Romanovsky

unread,
Aug 14, 2025, 6:14:10 AMAug 14
to Marek Szyprowski, Leon Romanovsky, Jason Gunthorpe, Abdiel Janulgue, Alexander Potapenko, Alex Gaynor, Andrew Morton, Christoph Hellwig, Danilo Krummrich, io...@lists.linux.dev, Jason Wang, Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross, kasa...@googlegroups.com, Keith Busch, linux...@vger.kernel.org, linu...@vger.kernel.org, linux-...@vger.kernel.org, linu...@kvack.org, linux...@lists.infradead.org, linuxp...@lists.ozlabs.org, linux-tra...@vger.kernel.org, Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman, Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-fo...@vger.kernel.org, Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtual...@lists.linux.dev, Will Deacon, xen-...@lists.xenproject.org
From: Leon Romanovsky <leo...@nvidia.com>

Rename the IOMMU DMA mapping functions to better reflect their actual
calling convention. The functions iommu_dma_map_page() and
iommu_dma_unmap_page() are renamed to iommu_dma_map_phys() and
iommu_dma_unmap_phys() respectively, as they already operate on physical
addresses rather than page structures.

The calling convention changes from accepting (struct page *page,
unsigned long offset) to (phys_addr_t phys), which eliminates the need
for page-to-physical address conversion within the functions. This
renaming prepares for the broader DMA API conversion from page-based
to physical address-based mapping throughout the kernel.

All callers are updated to pass physical addresses directly, including
dma_map_page_attrs(), scatterlist mapping functions, and DMA page
allocation helpers. The change simplifies the code by removing the
page_to_phys() + offset calculation that was previously done inside
the IOMMU functions.

Signed-off-by: Leon Romanovsky <leo...@nvidia.com>
---
drivers/iommu/dma-iommu.c | 14 ++++++--------
include/linux/iommu-dma.h | 7 +++----
kernel/dma/mapping.c | 4 ++--
kernel/dma/ops_helpers.c | 6 +++---
4 files changed, 14 insertions(+), 17 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index e1185ba73e23..aea119f32f96 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -1195,11 +1195,9 @@ static inline size_t iova_unaligned(struct iova_domain *iovad, phys_addr_t phys,
return iova_offset(iovad, phys | size);
}

-dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page,
- unsigned long offset, size_t size, enum dma_data_direction dir,
- unsigned long attrs)
+dma_addr_t iommu_dma_map_phys(struct device *dev, phys_addr_t phys, size_t size,
+ enum dma_data_direction dir, unsigned long attrs)
{
- phys_addr_t phys = page_to_phys(page) + offset;
bool coherent = dev_is_dma_coherent(dev);
int prot = dma_info_to_prot(dir, coherent, attrs);
struct iommu_domain *domain = iommu_get_dma_domain(dev);
@@ -1227,7 +1225,7 @@ dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page,
return iova;
}

-void iommu_dma_unmap_page(struct device *dev, dma_addr_t dma_handle,
+void iommu_dma_unmap_phys(struct device *dev, dma_addr_t dma_handle,
size_t size, enum dma_data_direction dir, unsigned long attrs)
{
struct iommu_domain *domain = iommu_get_dma_domain(dev);
@@ -1346,7 +1344,7 @@ static void iommu_dma_unmap_sg_swiotlb(struct device *dev, struct scatterlist *s
int i;

for_each_sg(sg, s, nents, i)
- iommu_dma_unmap_page(dev, sg_dma_address(s),
+ iommu_dma_unmap_phys(dev, sg_dma_address(s),
sg_dma_len(s), dir, attrs);
}

@@ -1359,8 +1357,8 @@ static int iommu_dma_map_sg_swiotlb(struct device *dev, struct scatterlist *sg,
sg_dma_mark_swiotlb(sg);

for_each_sg(sg, s, nents, i) {
- sg_dma_address(s) = iommu_dma_map_page(dev, sg_page(s),
- s->offset, s->length, dir, attrs);
+ sg_dma_address(s) = iommu_dma_map_phys(dev, sg_phys(s),
+ s->length, dir, attrs);
if (sg_dma_address(s) == DMA_MAPPING_ERROR)
goto out_unmap;
sg_dma_len(s) = s->length;
diff --git a/include/linux/iommu-dma.h b/include/linux/iommu-dma.h
index 508beaa44c39..485bdffed988 100644
--- a/include/linux/iommu-dma.h
+++ b/include/linux/iommu-dma.h
@@ -21,10 +21,9 @@ static inline bool use_dma_iommu(struct device *dev)
}
#endif /* CONFIG_IOMMU_DMA */

-dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page,
- unsigned long offset, size_t size, enum dma_data_direction dir,
- unsigned long attrs);
-void iommu_dma_unmap_page(struct device *dev, dma_addr_t dma_handle,
+dma_addr_t iommu_dma_map_phys(struct device *dev, phys_addr_t phys, size_t size,
+ enum dma_data_direction dir, unsigned long attrs);
+void iommu_dma_unmap_phys(struct device *dev, dma_addr_t dma_handle,
size_t size, enum dma_data_direction dir, unsigned long attrs);
int iommu_dma_map_sg(struct device *dev, struct scatterlist *sg, int nents,
enum dma_data_direction dir, unsigned long attrs);
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index fe1f0da6dc50..58482536db9b 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -169,7 +169,7 @@ dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
arch_dma_map_page_direct(dev, phys + size))
addr = dma_direct_map_page(dev, page, offset, size, dir, attrs);
else if (use_dma_iommu(dev))
- addr = iommu_dma_map_page(dev, page, offset, size, dir, attrs);
+ addr = iommu_dma_map_phys(dev, phys, size, dir, attrs);
else
addr = ops->map_page(dev, page, offset, size, dir, attrs);
kmsan_handle_dma(page, offset, size, dir);
@@ -190,7 +190,7 @@ void dma_unmap_page_attrs(struct device *dev, dma_addr_t addr, size_t size,
arch_dma_unmap_page_direct(dev, addr + size))
dma_direct_unmap_page(dev, addr, size, dir, attrs);
else if (use_dma_iommu(dev))
- iommu_dma_unmap_page(dev, addr, size, dir, attrs);
+ iommu_dma_unmap_phys(dev, addr, size, dir, attrs);
else
ops->unmap_page(dev, addr, size, dir, attrs);
trace_dma_unmap_phys(dev, addr, size, dir, attrs);
diff --git a/kernel/dma/ops_helpers.c b/kernel/dma/ops_helpers.c
index 9afd569eadb9..6f9d604d9d40 100644
--- a/kernel/dma/ops_helpers.c
+++ b/kernel/dma/ops_helpers.c
@@ -72,8 +72,8 @@ struct page *dma_common_alloc_pages(struct device *dev, size_t size,
return NULL;

if (use_dma_iommu(dev))
- *dma_handle = iommu_dma_map_page(dev, page, 0, size, dir,
- DMA_ATTR_SKIP_CPU_SYNC);
+ *dma_handle = iommu_dma_map_phys(dev, page_to_phys(page), size,
+ dir, DMA_ATTR_SKIP_CPU_SYNC);
else
*dma_handle = ops->map_page(dev, page, 0, size, dir,
DMA_ATTR_SKIP_CPU_SYNC);
@@ -92,7 +92,7 @@ void dma_common_free_pages(struct device *dev, size_t size, struct page *page,
const struct dma_map_ops *ops = get_dma_ops(dev);

if (use_dma_iommu(dev))
- iommu_dma_unmap_page(dev, dma_handle, size, dir,
+ iommu_dma_unmap_phys(dev, dma_handle, size, dir,
DMA_ATTR_SKIP_CPU_SYNC);
else if (ops->unmap_page)
ops->unmap_page(dev, dma_handle, size, dir,
--
2.50.1

Leon Romanovsky

unread,
Aug 14, 2025, 6:14:13 AMAug 14
to Marek Szyprowski, Leon Romanovsky, Jason Gunthorpe, Abdiel Janulgue, Alexander Potapenko, Alex Gaynor, Andrew Morton, Christoph Hellwig, Danilo Krummrich, io...@lists.linux.dev, Jason Wang, Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross, kasa...@googlegroups.com, Keith Busch, linux...@vger.kernel.org, linu...@vger.kernel.org, linux-...@vger.kernel.org, linu...@kvack.org, linux...@lists.infradead.org, linuxp...@lists.ozlabs.org, linux-tra...@vger.kernel.org, Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman, Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-fo...@vger.kernel.org, Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtual...@lists.linux.dev, Will Deacon, xen-...@lists.xenproject.org
From: Leon Romanovsky <leo...@nvidia.com>

Combine the iommu_dma_*map_phys and iommu_dma_*map_resource interfaces
in order to allow a single phys_addr_t flow.

In the following patches, iommu_dma_map_resource() will be removed in
favour of the iommu_dma_map_phys(..., attrs | DMA_ATTR_MMIO) flow.

Signed-off-by: Leon Romanovsky <leo...@nvidia.com>
---
drivers/iommu/dma-iommu.c | 15 +++++++++++----
1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index aea119f32f96..6804aaf034a1 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -1211,16 +1211,19 @@ dma_addr_t iommu_dma_map_phys(struct device *dev, phys_addr_t phys, size_t size,
*/
if (dev_use_swiotlb(dev, size, dir) &&
iova_unaligned(iovad, phys, size)) {
+ if (attrs & DMA_ATTR_MMIO)
+ return DMA_MAPPING_ERROR;
+
phys = iommu_dma_map_swiotlb(dev, phys, size, dir, attrs);
if (phys == (phys_addr_t)DMA_MAPPING_ERROR)
return DMA_MAPPING_ERROR;
}

- if (!coherent && !(attrs & DMA_ATTR_SKIP_CPU_SYNC))
+ if (!coherent && !(attrs & (DMA_ATTR_SKIP_CPU_SYNC | DMA_ATTR_MMIO)))
arch_sync_dma_for_device(phys, size, dir);

iova = __iommu_dma_map(dev, phys, size, prot, dma_mask);
- if (iova == DMA_MAPPING_ERROR)
+ if (iova == DMA_MAPPING_ERROR && !(attrs & DMA_ATTR_MMIO))
swiotlb_tbl_unmap_single(dev, phys, size, dir, attrs);
return iova;
}
@@ -1228,10 +1231,14 @@ dma_addr_t iommu_dma_map_phys(struct device *dev, phys_addr_t phys, size_t size,
void iommu_dma_unmap_phys(struct device *dev, dma_addr_t dma_handle,
size_t size, enum dma_data_direction dir, unsigned long attrs)
{
- struct iommu_domain *domain = iommu_get_dma_domain(dev);
phys_addr_t phys;

- phys = iommu_iova_to_phys(domain, dma_handle);
+ if (attrs & DMA_ATTR_MMIO) {
+ __iommu_dma_unmap(dev, dma_handle, size);
+ return;
+ }
+
+ phys = iommu_iova_to_phys(iommu_get_dma_domain(dev), dma_handle);
if (WARN_ON(!phys))
return;

--
2.50.1

Leon Romanovsky

unread,
Aug 14, 2025, 6:14:16 AMAug 14
to Marek Szyprowski, Leon Romanovsky, Jason Gunthorpe, Abdiel Janulgue, Alexander Potapenko, Alex Gaynor, Andrew Morton, Christoph Hellwig, Danilo Krummrich, io...@lists.linux.dev, Jason Wang, Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross, kasa...@googlegroups.com, Keith Busch, linux...@vger.kernel.org, linu...@vger.kernel.org, linux-...@vger.kernel.org, linu...@kvack.org, linux...@lists.infradead.org, linuxp...@lists.ozlabs.org, linux-tra...@vger.kernel.org, Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman, Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-fo...@vger.kernel.org, Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtual...@lists.linux.dev, Will Deacon, xen-...@lists.xenproject.org
From: Leon Romanovsky <leo...@nvidia.com>

Convert the DMA debug infrastructure from page-based to physical
address-based mapping as preparation for relying on physical addresses
in the DMA mapping routines.

The refactoring renames debug_dma_map_page() to debug_dma_map_phys() and
changes its signature to accept a phys_addr_t parameter instead of struct page
and offset. Similarly, debug_dma_unmap_page() becomes debug_dma_unmap_phys().
A new dma_debug_phy type is introduced to distinguish physical address mappings
from other debug entry types. All callers throughout the codebase are updated
to pass physical addresses directly, eliminating the need for page-to-physical
conversion in the debug layer.

This refactoring eliminates the need to convert between page pointers and
physical addresses in the debug layer, making the code more efficient and
consistent with the DMA mapping API's physical address focus.

Reviewed-by: Jason Gunthorpe <j...@nvidia.com>
Signed-off-by: Leon Romanovsky <leo...@nvidia.com>
---
Documentation/core-api/dma-api.rst | 4 ++--
kernel/dma/debug.c | 28 +++++++++++++++++-----------
kernel/dma/debug.h | 16 +++++++---------
kernel/dma/mapping.c | 15 ++++++++-------
4 files changed, 34 insertions(+), 29 deletions(-)

diff --git a/Documentation/core-api/dma-api.rst b/Documentation/core-api/dma-api.rst
index 3087bea715ed..ca75b3541679 100644
--- a/Documentation/core-api/dma-api.rst
+++ b/Documentation/core-api/dma-api.rst
@@ -761,7 +761,7 @@ example warning message may look like this::
[<ffffffff80235177>] find_busiest_group+0x207/0x8a0
[<ffffffff8064784f>] _spin_lock_irqsave+0x1f/0x50
[<ffffffff803c7ea3>] check_unmap+0x203/0x490
- [<ffffffff803c8259>] debug_dma_unmap_page+0x49/0x50
+ [<ffffffff803c8259>] debug_dma_unmap_phys+0x49/0x50
[<ffffffff80485f26>] nv_tx_done_optimized+0xc6/0x2c0
[<ffffffff80486c13>] nv_nic_irq_optimized+0x73/0x2b0
[<ffffffff8026df84>] handle_IRQ_event+0x34/0x70
@@ -855,7 +855,7 @@ that a driver may be leaking mappings.
dma-debug interface debug_dma_mapping_error() to debug drivers that fail
to check DMA mapping errors on addresses returned by dma_map_single() and
dma_map_page() interfaces. This interface clears a flag set by
-debug_dma_map_page() to indicate that dma_mapping_error() has been called by
+debug_dma_map_phys() to indicate that dma_mapping_error() has been called by
the driver. When driver does unmap, debug_dma_unmap() checks the flag and if
this flag is still set, prints warning message that includes call trace that
leads up to the unmap. This interface can be called from dma_mapping_error()
diff --git a/kernel/dma/debug.c b/kernel/dma/debug.c
index e43c6de2bce4..da6734e3a4ce 100644
--- a/kernel/dma/debug.c
+++ b/kernel/dma/debug.c
@@ -39,6 +39,7 @@ enum {
dma_debug_sg,
dma_debug_coherent,
dma_debug_resource,
+ dma_debug_phy,
};

enum map_err_types {
@@ -141,6 +142,7 @@ static const char *type2name[] = {
[dma_debug_sg] = "scatter-gather",
[dma_debug_coherent] = "coherent",
[dma_debug_resource] = "resource",
+ [dma_debug_phy] = "phy",
};

static const char *dir2name[] = {
@@ -1201,9 +1203,8 @@ void debug_dma_map_single(struct device *dev, const void *addr,
}
EXPORT_SYMBOL(debug_dma_map_single);

-void debug_dma_map_page(struct device *dev, struct page *page, size_t offset,
- size_t size, int direction, dma_addr_t dma_addr,
- unsigned long attrs)
+void debug_dma_map_phys(struct device *dev, phys_addr_t phys, size_t size,
+ int direction, dma_addr_t dma_addr, unsigned long attrs)
{
struct dma_debug_entry *entry;

@@ -1218,19 +1219,24 @@ void debug_dma_map_page(struct device *dev, struct page *page, size_t offset,
return;

entry->dev = dev;
- entry->type = dma_debug_single;
- entry->paddr = page_to_phys(page) + offset;
+ entry->type = dma_debug_phy;
+ entry->paddr = phys;
entry->dev_addr = dma_addr;
entry->size = size;
entry->direction = direction;
entry->map_err_type = MAP_ERR_NOT_CHECKED;

- check_for_stack(dev, page, offset);
+ if (!(attrs & DMA_ATTR_MMIO)) {
+ struct page *page = phys_to_page(phys);
+ size_t offset = offset_in_page(phys);

- if (!PageHighMem(page)) {
- void *addr = page_address(page) + offset;
+ check_for_stack(dev, page, offset);

- check_for_illegal_area(dev, addr, size);
+ if (!PageHighMem(page)) {
+ void *addr = page_address(page) + offset;
+
+ check_for_illegal_area(dev, addr, size);
+ }
}

add_dma_entry(entry, attrs);
@@ -1274,11 +1280,11 @@ void debug_dma_mapping_error(struct device *dev, dma_addr_t dma_addr)
}
EXPORT_SYMBOL(debug_dma_mapping_error);

-void debug_dma_unmap_page(struct device *dev, dma_addr_t dma_addr,
+void debug_dma_unmap_phys(struct device *dev, dma_addr_t dma_addr,
size_t size, int direction)
{
struct dma_debug_entry ref = {
- .type = dma_debug_single,
+ .type = dma_debug_phy,
.dev = dev,
.dev_addr = dma_addr,
.size = size,
diff --git a/kernel/dma/debug.h b/kernel/dma/debug.h
index f525197d3cae..76adb42bffd5 100644
--- a/kernel/dma/debug.h
+++ b/kernel/dma/debug.h
@@ -9,12 +9,11 @@
#define _KERNEL_DMA_DEBUG_H

#ifdef CONFIG_DMA_API_DEBUG
-extern void debug_dma_map_page(struct device *dev, struct page *page,
- size_t offset, size_t size,
- int direction, dma_addr_t dma_addr,
+extern void debug_dma_map_phys(struct device *dev, phys_addr_t phys,
+ size_t size, int direction, dma_addr_t dma_addr,
unsigned long attrs);

-extern void debug_dma_unmap_page(struct device *dev, dma_addr_t addr,
+extern void debug_dma_unmap_phys(struct device *dev, dma_addr_t addr,
size_t size, int direction);

extern void debug_dma_map_sg(struct device *dev, struct scatterlist *sg,
@@ -55,14 +54,13 @@ extern void debug_dma_sync_sg_for_device(struct device *dev,
struct scatterlist *sg,
int nelems, int direction);
#else /* CONFIG_DMA_API_DEBUG */
-static inline void debug_dma_map_page(struct device *dev, struct page *page,
- size_t offset, size_t size,
- int direction, dma_addr_t dma_addr,
- unsigned long attrs)
+static inline void debug_dma_map_phys(struct device *dev, phys_addr_t phys,
+ size_t size, int direction,
+ dma_addr_t dma_addr, unsigned long attrs)
{
}

-static inline void debug_dma_unmap_page(struct device *dev, dma_addr_t addr,
+static inline void debug_dma_unmap_phys(struct device *dev, dma_addr_t addr,
size_t size, int direction)
{
}
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index 107e4a4d251d..4c1dfbabb8ae 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -157,6 +157,7 @@ dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
unsigned long attrs)
{
const struct dma_map_ops *ops = get_dma_ops(dev);
+ phys_addr_t phys = page_to_phys(page) + offset;
dma_addr_t addr;

BUG_ON(!valid_dma_direction(dir));
@@ -165,16 +166,15 @@ dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
return DMA_MAPPING_ERROR;

if (dma_map_direct(dev, ops) ||
- arch_dma_map_page_direct(dev, page_to_phys(page) + offset + size))
+ arch_dma_map_page_direct(dev, phys + size))
addr = dma_direct_map_page(dev, page, offset, size, dir, attrs);
else if (use_dma_iommu(dev))
addr = iommu_dma_map_page(dev, page, offset, size, dir, attrs);
else
addr = ops->map_page(dev, page, offset, size, dir, attrs);
kmsan_handle_dma(page, offset, size, dir);
- trace_dma_map_page(dev, page_to_phys(page) + offset, addr, size, dir,
- attrs);
- debug_dma_map_page(dev, page, offset, size, dir, addr, attrs);
+ trace_dma_map_page(dev, phys, addr, size, dir, attrs);
+ debug_dma_map_phys(dev, phys, size, dir, addr, attrs);

return addr;
}
@@ -194,7 +194,7 @@ void dma_unmap_page_attrs(struct device *dev, dma_addr_t addr, size_t size,
else
ops->unmap_page(dev, addr, size, dir, attrs);
trace_dma_unmap_page(dev, addr, size, dir, attrs);
- debug_dma_unmap_page(dev, addr, size, dir);
+ debug_dma_unmap_phys(dev, addr, size, dir);
}
EXPORT_SYMBOL(dma_unmap_page_attrs);

@@ -712,7 +712,8 @@ struct page *dma_alloc_pages(struct device *dev, size_t size,
if (page) {
trace_dma_alloc_pages(dev, page_to_virt(page), *dma_handle,
size, dir, gfp, 0);
- debug_dma_map_page(dev, page, 0, size, dir, *dma_handle, 0);
+ debug_dma_map_phys(dev, page_to_phys(page), size, dir,
+ *dma_handle, 0);
} else {
trace_dma_alloc_pages(dev, NULL, 0, size, dir, gfp, 0);
}
@@ -738,7 +739,7 @@ void dma_free_pages(struct device *dev, size_t size, struct page *page,
dma_addr_t dma_handle, enum dma_data_direction dir)
{
trace_dma_free_pages(dev, page_to_virt(page), dma_handle, size, dir, 0);
- debug_dma_unmap_page(dev, dma_handle, size, dir);
+ debug_dma_unmap_phys(dev, dma_handle, size, dir);
__dma_free_pages(dev, size, page, dma_handle, dir);
}
EXPORT_SYMBOL_GPL(dma_free_pages);
--
2.50.1

Leon Romanovsky

unread,
Aug 14, 2025, 6:14:20 AMAug 14
to Marek Szyprowski, Leon Romanovsky, Jason Gunthorpe, Abdiel Janulgue, Alexander Potapenko, Alex Gaynor, Andrew Morton, Christoph Hellwig, Danilo Krummrich, io...@lists.linux.dev, Jason Wang, Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross, kasa...@googlegroups.com, Keith Busch, linux...@vger.kernel.org, linu...@vger.kernel.org, linux-...@vger.kernel.org, linu...@kvack.org, linux...@lists.infradead.org, linuxp...@lists.ozlabs.org, linux-tra...@vger.kernel.org, Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman, Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-fo...@vger.kernel.org, Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtual...@lists.linux.dev, Will Deacon, xen-...@lists.xenproject.org
From: Leon Romanovsky <leo...@nvidia.com>

Convert the KMSAN DMA handling function from page-based to physical
address-based interface.

The refactoring renames kmsan_handle_dma() parameters from accepting
(struct page *page, size_t offset, size_t size) to (phys_addr_t phys,
size_t size). A pfn_valid() check is added to prevent KMSAN operations
on non-page memory, i.e. addresses that are not backed by struct page.

As part of this change, support for highmem addresses is implemented
using kmap_local_page() to handle both lowmem and highmem regions
properly. All callers throughout the codebase are updated to use the
new phys_addr_t based interface.

Signed-off-by: Leon Romanovsky <leo...@nvidia.com>
---
drivers/virtio/virtio_ring.c | 4 ++--
include/linux/kmsan.h | 12 +++++++-----
kernel/dma/mapping.c | 2 +-
mm/kmsan/hooks.c | 36 +++++++++++++++++++++++++++++-------
tools/virtio/linux/kmsan.h | 2 +-
5 files changed, 40 insertions(+), 16 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index f5062061c408..c147145a6593 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -378,7 +378,7 @@ static int vring_map_one_sg(const struct vring_virtqueue *vq, struct scatterlist
* is initialized by the hardware. Explicitly check/unpoison it
* depending on the direction.
*/
- kmsan_handle_dma(sg_page(sg), sg->offset, sg->length, direction);
+ kmsan_handle_dma(sg_phys(sg), sg->length, direction);
*addr = (dma_addr_t)sg_phys(sg);
return 0;
}
@@ -3157,7 +3157,7 @@ dma_addr_t virtqueue_dma_map_single_attrs(struct virtqueue *_vq, void *ptr,
struct vring_virtqueue *vq = to_vvq(_vq);

if (!vq->use_dma_api) {
- kmsan_handle_dma(virt_to_page(ptr), offset_in_page(ptr), size, dir);
+ kmsan_handle_dma(virt_to_phys(ptr), size, dir);
return (dma_addr_t)virt_to_phys(ptr);
}

diff --git a/include/linux/kmsan.h b/include/linux/kmsan.h
index 2b1432cc16d5..6f27b9824ef7 100644
--- a/include/linux/kmsan.h
+++ b/include/linux/kmsan.h
@@ -182,8 +182,7 @@ void kmsan_iounmap_page_range(unsigned long start, unsigned long end);

/**
* kmsan_handle_dma() - Handle a DMA data transfer.
- * @page: first page of the buffer.
- * @offset: offset of the buffer within the first page.
+ * @phys: physical address of the buffer.
* @size: buffer size.
* @dir: one of possible dma_data_direction values.
*
@@ -191,8 +190,11 @@ void kmsan_iounmap_page_range(unsigned long start, unsigned long end);
* * checks the buffer, if it is copied to device;
* * initializes the buffer, if it is copied from device;
* * does both, if this is a DMA_BIDIRECTIONAL transfer.
+ *
+ * The function handles page lookup internally and supports both lowmem
+ * and highmem addresses.
*/
-void kmsan_handle_dma(struct page *page, size_t offset, size_t size,
+void kmsan_handle_dma(phys_addr_t phys, size_t size,
enum dma_data_direction dir);

/**
@@ -372,8 +374,8 @@ static inline void kmsan_iounmap_page_range(unsigned long start,
{
}

-static inline void kmsan_handle_dma(struct page *page, size_t offset,
- size_t size, enum dma_data_direction dir)
+static inline void kmsan_handle_dma(phys_addr_t phys, size_t size,
+ enum dma_data_direction dir)
{
}

diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index 80481a873340..709405d46b2b 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -172,7 +172,7 @@ dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
addr = iommu_dma_map_phys(dev, phys, size, dir, attrs);
else
addr = ops->map_page(dev, page, offset, size, dir, attrs);
- kmsan_handle_dma(page, offset, size, dir);
+ kmsan_handle_dma(phys, size, dir);
trace_dma_map_phys(dev, phys, addr, size, dir, attrs);
debug_dma_map_phys(dev, phys, size, dir, addr, attrs);

diff --git a/mm/kmsan/hooks.c b/mm/kmsan/hooks.c
index 97de3d6194f0..eab7912a3bf0 100644
--- a/mm/kmsan/hooks.c
+++ b/mm/kmsan/hooks.c
@@ -336,25 +336,48 @@ static void kmsan_handle_dma_page(const void *addr, size_t size,
}

/* Helper function to handle DMA data transfers. */
-void kmsan_handle_dma(struct page *page, size_t offset, size_t size,
+void kmsan_handle_dma(phys_addr_t phys, size_t size,
enum dma_data_direction dir)
{
u64 page_offset, to_go, addr;
+ struct page *page;
+ void *kaddr;

- if (PageHighMem(page))
+ if (!pfn_valid(PHYS_PFN(phys)))
return;
- addr = (u64)page_address(page) + offset;
+
+ page = phys_to_page(phys);
+ page_offset = offset_in_page(phys);
+
/*
* The kernel may occasionally give us adjacent DMA pages not belonging
* to the same allocation. Process them separately to avoid triggering
* internal KMSAN checks.
*/
while (size > 0) {
- page_offset = offset_in_page(addr);
to_go = min(PAGE_SIZE - page_offset, (u64)size);
+
+ if (PageHighMem(page))
+ /* Handle highmem pages using kmap */
+ kaddr = kmap_local_page(page);
+ else
+ /* Lowmem pages can be accessed directly */
+ kaddr = page_address(page);
+
+ addr = (u64)kaddr + page_offset;
kmsan_handle_dma_page((void *)addr, to_go, dir);
- addr += to_go;
+
+ if (PageHighMem(page))
+ kunmap_local(page);
+
+ phys += to_go;
size -= to_go;
+
+ /* Move to next page if needed */
+ if (size > 0) {
+ page = phys_to_page(phys);
+ page_offset = offset_in_page(phys);
+ }
}
}
EXPORT_SYMBOL_GPL(kmsan_handle_dma);
@@ -366,8 +389,7 @@ void kmsan_handle_dma_sg(struct scatterlist *sg, int nents,
int i;

for_each_sg(sg, item, nents, i)
- kmsan_handle_dma(sg_page(item), item->offset, item->length,
- dir);
+ kmsan_handle_dma(sg_phys(item), item->length, dir);
}

/* Functions from kmsan-checks.h follow. */
diff --git a/tools/virtio/linux/kmsan.h b/tools/virtio/linux/kmsan.h
index 272b5aa285d5..6cd2e3efd03d 100644
--- a/tools/virtio/linux/kmsan.h
+++ b/tools/virtio/linux/kmsan.h
@@ -4,7 +4,7 @@

#include <linux/gfp.h>

-inline void kmsan_handle_dma(struct page *page, size_t offset, size_t size,
+inline void kmsan_handle_dma(phys_addr_t phys, size_t size,
enum dma_data_direction dir)
{
}
--
2.50.1

Leon Romanovsky

unread,
Aug 14, 2025, 6:14:24 AMAug 14
to Marek Szyprowski, Leon Romanovsky, Jason Gunthorpe, Abdiel Janulgue, Alexander Potapenko, Alex Gaynor, Andrew Morton, Christoph Hellwig, Danilo Krummrich, io...@lists.linux.dev, Jason Wang, Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross, kasa...@googlegroups.com, Keith Busch, linux...@vger.kernel.org, linu...@vger.kernel.org, linux-...@vger.kernel.org, linu...@kvack.org, linux...@lists.infradead.org, linuxp...@lists.ozlabs.org, linux-tra...@vger.kernel.org, Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman, Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-fo...@vger.kernel.org, Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtual...@lists.linux.dev, Will Deacon, xen-...@lists.xenproject.org
From: Leon Romanovsky <leo...@nvidia.com>

Convert the DMA direct mapping functions to accept physical addresses
directly instead of page+offset parameters. The functions were already
operating on physical addresses internally, so this change eliminates
the redundant page-to-physical conversion at the API boundary.

The functions dma_direct_map_page() and dma_direct_unmap_page() are
renamed to dma_direct_map_phys() and dma_direct_unmap_phys() respectively,
with their calling convention changed from (struct page *page,
unsigned long offset) to (phys_addr_t phys).

Architecture-specific functions arch_dma_map_page_direct() and
arch_dma_unmap_page_direct() are similarly renamed to
arch_dma_map_phys_direct() and arch_dma_unmap_phys_direct().

The is_pci_p2pdma_page() checks are replaced with DMA_ATTR_MMIO checks
to allow integration with dma_direct_map_resource(), and
dma_direct_map_phys() is extended to support the MMIO path as well.

Signed-off-by: Leon Romanovsky <leo...@nvidia.com>
---
arch/powerpc/kernel/dma-iommu.c | 4 +--
include/linux/dma-map-ops.h | 8 ++---
kernel/dma/direct.c | 6 ++--
kernel/dma/direct.h | 52 +++++++++++++++++++++------------
kernel/dma/mapping.c | 8 ++---
5 files changed, 46 insertions(+), 32 deletions(-)

diff --git a/arch/powerpc/kernel/dma-iommu.c b/arch/powerpc/kernel/dma-iommu.c
index 4d64a5db50f3..0359ab72cd3b 100644
--- a/arch/powerpc/kernel/dma-iommu.c
+++ b/arch/powerpc/kernel/dma-iommu.c
@@ -14,7 +14,7 @@
#define can_map_direct(dev, addr) \
((dev)->bus_dma_limit >= phys_to_dma((dev), (addr)))

-bool arch_dma_map_page_direct(struct device *dev, phys_addr_t addr)
+bool arch_dma_map_phys_direct(struct device *dev, phys_addr_t addr)
{
if (likely(!dev->bus_dma_limit))
return false;
@@ -24,7 +24,7 @@ bool arch_dma_map_page_direct(struct device *dev, phys_addr_t addr)

#define is_direct_handle(dev, h) ((h) >= (dev)->archdata.dma_offset)

-bool arch_dma_unmap_page_direct(struct device *dev, dma_addr_t dma_handle)
+bool arch_dma_unmap_phys_direct(struct device *dev, dma_addr_t dma_handle)
{
if (likely(!dev->bus_dma_limit))
return false;
diff --git a/include/linux/dma-map-ops.h b/include/linux/dma-map-ops.h
index f48e5fb88bd5..71f5b3025415 100644
--- a/include/linux/dma-map-ops.h
+++ b/include/linux/dma-map-ops.h
@@ -392,15 +392,15 @@ void *arch_dma_set_uncached(void *addr, size_t size);
void arch_dma_clear_uncached(void *addr, size_t size);

#ifdef CONFIG_ARCH_HAS_DMA_MAP_DIRECT
-bool arch_dma_map_page_direct(struct device *dev, phys_addr_t addr);
-bool arch_dma_unmap_page_direct(struct device *dev, dma_addr_t dma_handle);
+bool arch_dma_map_phys_direct(struct device *dev, phys_addr_t addr);
+bool arch_dma_unmap_phys_direct(struct device *dev, dma_addr_t dma_handle);
bool arch_dma_map_sg_direct(struct device *dev, struct scatterlist *sg,
int nents);
bool arch_dma_unmap_sg_direct(struct device *dev, struct scatterlist *sg,
int nents);
#else
-#define arch_dma_map_page_direct(d, a) (false)
-#define arch_dma_unmap_page_direct(d, a) (false)
+#define arch_dma_map_phys_direct(d, a) (false)
+#define arch_dma_unmap_phys_direct(d, a) (false)
#define arch_dma_map_sg_direct(d, s, n) (false)
#define arch_dma_unmap_sg_direct(d, s, n) (false)
#endif
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 24c359d9c879..fa75e3070073 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -453,7 +453,7 @@ void dma_direct_unmap_sg(struct device *dev, struct scatterlist *sgl,
if (sg_dma_is_bus_address(sg))
sg_dma_unmark_bus_address(sg);
else
- dma_direct_unmap_page(dev, sg->dma_address,
+ dma_direct_unmap_phys(dev, sg->dma_address,
sg_dma_len(sg), dir, attrs);
}
}
@@ -476,8 +476,8 @@ int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl, int nents,
*/
break;
case PCI_P2PDMA_MAP_NONE:
- sg->dma_address = dma_direct_map_page(dev, sg_page(sg),
- sg->offset, sg->length, dir, attrs);
+ sg->dma_address = dma_direct_map_phys(dev, sg_phys(sg),
+ sg->length, dir, attrs);
if (sg->dma_address == DMA_MAPPING_ERROR) {
ret = -EIO;
goto out_unmap;
diff --git a/kernel/dma/direct.h b/kernel/dma/direct.h
index d2c0b7e632fc..92dbadcd3b2f 100644
--- a/kernel/dma/direct.h
+++ b/kernel/dma/direct.h
@@ -80,42 +80,56 @@ static inline void dma_direct_sync_single_for_cpu(struct device *dev,
arch_dma_mark_clean(paddr, size);
}

-static inline dma_addr_t dma_direct_map_page(struct device *dev,
- struct page *page, unsigned long offset, size_t size,
- enum dma_data_direction dir, unsigned long attrs)
+static inline dma_addr_t dma_direct_map_phys(struct device *dev,
+ phys_addr_t phys, size_t size, enum dma_data_direction dir,
+ unsigned long attrs)
{
- phys_addr_t phys = page_to_phys(page) + offset;
- dma_addr_t dma_addr = phys_to_dma(dev, phys);
+ dma_addr_t dma_addr;
+ bool capable;

if (is_swiotlb_force_bounce(dev)) {
- if (is_pci_p2pdma_page(page))
- return DMA_MAPPING_ERROR;
+ if (attrs & DMA_ATTR_MMIO)
+ goto err_overflow;
+
return swiotlb_map(dev, phys, size, dir, attrs);
}

- if (unlikely(!dma_capable(dev, dma_addr, size, true)) ||
- dma_kmalloc_needs_bounce(dev, size, dir)) {
- if (is_pci_p2pdma_page(page))
- return DMA_MAPPING_ERROR;
- if (is_swiotlb_active(dev))
+ if (attrs & DMA_ATTR_MMIO)
+ dma_addr = phys;
+ else
+ dma_addr = phys_to_dma(dev, phys);
+
+ capable = dma_capable(dev, dma_addr, size, !(attrs & DMA_ATTR_MMIO));
+ if (unlikely(!capable) || dma_kmalloc_needs_bounce(dev, size, dir)) {
+ if (is_swiotlb_active(dev) && !(attrs & DMA_ATTR_MMIO))
return swiotlb_map(dev, phys, size, dir, attrs);

- dev_WARN_ONCE(dev, 1,
- "DMA addr %pad+%zu overflow (mask %llx, bus limit %llx).\n",
- &dma_addr, size, *dev->dma_mask, dev->bus_dma_limit);
- return DMA_MAPPING_ERROR;
+ goto err_overflow;
}

- if (!dev_is_dma_coherent(dev) && !(attrs & DMA_ATTR_SKIP_CPU_SYNC))
+ if (!dev_is_dma_coherent(dev) &&
+ !(attrs & (DMA_ATTR_SKIP_CPU_SYNC | DMA_ATTR_MMIO)))
arch_sync_dma_for_device(phys, size, dir);
return dma_addr;
+
+err_overflow:
+ dev_WARN_ONCE(
+ dev, 1,
+ "DMA addr %pad+%zu overflow (mask %llx, bus limit %llx).\n",
+ &dma_addr, size, *dev->dma_mask, dev->bus_dma_limit);
+ return DMA_MAPPING_ERROR;
}

-static inline void dma_direct_unmap_page(struct device *dev, dma_addr_t addr,
+static inline void dma_direct_unmap_phys(struct device *dev, dma_addr_t addr,
size_t size, enum dma_data_direction dir, unsigned long attrs)
{
- phys_addr_t phys = dma_to_phys(dev, addr);
+ phys_addr_t phys;
+
+ if (attrs & DMA_ATTR_MMIO)
+ /* nothing to do: uncached and no swiotlb */
+ return;

+ phys = dma_to_phys(dev, addr);
if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
dma_direct_sync_single_for_cpu(dev, addr, size, dir);

diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index 58482536db9b..80481a873340 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -166,8 +166,8 @@ dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
return DMA_MAPPING_ERROR;

if (dma_map_direct(dev, ops) ||
- arch_dma_map_page_direct(dev, phys + size))
- addr = dma_direct_map_page(dev, page, offset, size, dir, attrs);
+ arch_dma_map_phys_direct(dev, phys + size))
+ addr = dma_direct_map_phys(dev, phys, size, dir, attrs);
else if (use_dma_iommu(dev))
addr = iommu_dma_map_phys(dev, phys, size, dir, attrs);
else
@@ -187,8 +187,8 @@ void dma_unmap_page_attrs(struct device *dev, dma_addr_t addr, size_t size,

BUG_ON(!valid_dma_direction(dir));
if (dma_map_direct(dev, ops) ||
- arch_dma_unmap_page_direct(dev, addr + size))
- dma_direct_unmap_page(dev, addr, size, dir, attrs);
+ arch_dma_unmap_phys_direct(dev, addr + size))
+ dma_direct_unmap_phys(dev, addr, size, dir, attrs);
else if (use_dma_iommu(dev))
iommu_dma_unmap_phys(dev, addr, size, dir, attrs);
else
--
2.50.1

Leon Romanovsky

unread,
Aug 14, 2025, 6:14:29 AMAug 14
to Marek Szyprowski, Leon Romanovsky, Jason Gunthorpe, Abdiel Janulgue, Alexander Potapenko, Alex Gaynor, Andrew Morton, Christoph Hellwig, Danilo Krummrich, io...@lists.linux.dev, Jason Wang, Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross, kasa...@googlegroups.com, Keith Busch, linux...@vger.kernel.org, linu...@vger.kernel.org, linux-...@vger.kernel.org, linu...@kvack.org, linux...@lists.infradead.org, linuxp...@lists.ozlabs.org, linux-tra...@vger.kernel.org, Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman, Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-fo...@vger.kernel.org, Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtual...@lists.linux.dev, Will Deacon, xen-...@lists.xenproject.org
From: Leon Romanovsky <leo...@nvidia.com>

The generic dma_direct_map_resource() is going to be removed in the
next patch, so simply open-code it in the Xen driver.

Reviewed-by: Juergen Gross <jgr...@suse.com>
Signed-off-by: Leon Romanovsky <leo...@nvidia.com>
---
drivers/xen/swiotlb-xen.c | 21 ++++++++++++++++++++-
1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index da1a7d3d377c..dd7747a2de87 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -392,6 +392,25 @@ xen_swiotlb_sync_sg_for_device(struct device *dev, struct scatterlist *sgl,
}
}

+static dma_addr_t xen_swiotlb_direct_map_resource(struct device *dev,
+ phys_addr_t paddr,
+ size_t size,
+ enum dma_data_direction dir,
+ unsigned long attrs)
+{
+ dma_addr_t dma_addr = paddr;
+
+ if (unlikely(!dma_capable(dev, dma_addr, size, false))) {
+ dev_err_once(dev,
+ "DMA addr %pad+%zu overflow (mask %llx, bus limit %llx).\n",
+ &dma_addr, size, *dev->dma_mask, dev->bus_dma_limit);
+ WARN_ON_ONCE(1);
+ return DMA_MAPPING_ERROR;
+ }
+
+ return dma_addr;
+}
+
/*
* Return whether the given device DMA address mask can be supported
* properly. For example, if your device can only drive the low 24-bits
@@ -426,5 +445,5 @@ const struct dma_map_ops xen_swiotlb_dma_ops = {
.alloc_pages_op = dma_common_alloc_pages,
.free_pages = dma_common_free_pages,
.max_mapping_size = swiotlb_max_mapping_size,
- .map_resource = dma_direct_map_resource,
+ .map_resource = xen_swiotlb_direct_map_resource,
};
--
2.50.1

Leon Romanovsky

unread,
Aug 14, 2025, 6:14:33 AMAug 14
to Marek Szyprowski, Leon Romanovsky, Jason Gunthorpe, Abdiel Janulgue, Alexander Potapenko, Alex Gaynor, Andrew Morton, Christoph Hellwig, Danilo Krummrich, io...@lists.linux.dev, Jason Wang, Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross, kasa...@googlegroups.com, Keith Busch, linux...@vger.kernel.org, linu...@vger.kernel.org, linux-...@vger.kernel.org, linu...@kvack.org, linux...@lists.infradead.org, linuxp...@lists.ozlabs.org, linux-tra...@vger.kernel.org, Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman, Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-fo...@vger.kernel.org, Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtual...@lists.linux.dev, Will Deacon, xen-...@lists.xenproject.org
From: Leon Romanovsky <leo...@nvidia.com>

Introduce new DMA mapping functions dma_map_phys() and dma_unmap_phys()
that operate directly on physical addresses instead of page+offset
parameters. This provides a more efficient interface for drivers that
already have physical addresses available.

The new functions are implemented as the primary mapping layer, with
the existing dma_map_page_attrs()/dma_map_resource() and
dma_unmap_page_attrs()/dma_unmap_resource() functions converted to simple
wrappers around the phys-based implementations.

In dma_map_page_attrs(), the struct page is converted to a physical
address with the page_to_phys() helper, while dma_map_resource() passes
the physical address through as-is and additionally sets the
DMA_ATTR_MMIO attribute.

The old page-based API is preserved in mapping.c so that existing code
is not affected by the switch from EXPORT_SYMBOL to the
EXPORT_SYMBOL_GPL variant for dma_*map_phys().
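
As a rough caller-side sketch (not part of this patch; "example_dma_phys",
"dev" and "paddr" are placeholders, and the buffer is assumed to be
ordinary system memory), the new pair is used like this:

#include <linux/dma-mapping.h>

/* Illustrative only: map a physical address, use it, unmap it. */
static int example_dma_phys(struct device *dev, phys_addr_t paddr, size_t len)
{
	dma_addr_t dma;

	dma = dma_map_phys(dev, paddr, len, DMA_TO_DEVICE, 0);
	if (dma_mapping_error(dev, dma))
		return -ENOMEM;

	/* ... program the device with "dma" and wait for completion ... */

	dma_unmap_phys(dev, dma, len, DMA_TO_DEVICE, 0);
	return 0;
}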

Signed-off-by: Leon Romanovsky <leo...@nvidia.com>
---
drivers/iommu/dma-iommu.c | 14 --------
include/linux/dma-direct.h | 2 --
include/linux/dma-mapping.h | 13 +++++++
include/linux/iommu-dma.h | 4 ---
include/trace/events/dma.h | 2 --
kernel/dma/debug.c | 43 -----------------------
kernel/dma/debug.h | 21 -----------
kernel/dma/direct.c | 16 ---------
kernel/dma/mapping.c | 69 ++++++++++++++++++++-----------------
9 files changed, 50 insertions(+), 134 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 6804aaf034a1..7944a3af4545 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -1556,20 +1556,6 @@ void iommu_dma_unmap_sg(struct device *dev, struct scatterlist *sg, int nents,
__iommu_dma_unmap(dev, start, end - start);
}

-dma_addr_t iommu_dma_map_resource(struct device *dev, phys_addr_t phys,
- size_t size, enum dma_data_direction dir, unsigned long attrs)
-{
- return __iommu_dma_map(dev, phys, size,
- dma_info_to_prot(dir, false, attrs) | IOMMU_MMIO,
- dma_get_mask(dev));
-}
-
-void iommu_dma_unmap_resource(struct device *dev, dma_addr_t handle,
- size_t size, enum dma_data_direction dir, unsigned long attrs)
-{
- __iommu_dma_unmap(dev, handle, size);
-}
-
static void __iommu_dma_free(struct device *dev, size_t size, void *cpu_addr)
{
size_t alloc_size = PAGE_ALIGN(size);
diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h
index f3bc0bcd7098..c249912456f9 100644
--- a/include/linux/dma-direct.h
+++ b/include/linux/dma-direct.h
@@ -149,7 +149,5 @@ void dma_direct_free_pages(struct device *dev, size_t size,
struct page *page, dma_addr_t dma_addr,
enum dma_data_direction dir);
int dma_direct_supported(struct device *dev, u64 mask);
-dma_addr_t dma_direct_map_resource(struct device *dev, phys_addr_t paddr,
- size_t size, enum dma_data_direction dir, unsigned long attrs);

#endif /* _LINUX_DMA_DIRECT_H */
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index ead5514d389e..cebfbe237595 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -138,6 +138,10 @@ dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
unsigned long attrs);
void dma_unmap_page_attrs(struct device *dev, dma_addr_t addr, size_t size,
enum dma_data_direction dir, unsigned long attrs);
+dma_addr_t dma_map_phys(struct device *dev, phys_addr_t phys, size_t size,
+ enum dma_data_direction dir, unsigned long attrs);
+void dma_unmap_phys(struct device *dev, dma_addr_t addr, size_t size,
+ enum dma_data_direction dir, unsigned long attrs);
unsigned int dma_map_sg_attrs(struct device *dev, struct scatterlist *sg,
int nents, enum dma_data_direction dir, unsigned long attrs);
void dma_unmap_sg_attrs(struct device *dev, struct scatterlist *sg,
@@ -192,6 +196,15 @@ static inline void dma_unmap_page_attrs(struct device *dev, dma_addr_t addr,
size_t size, enum dma_data_direction dir, unsigned long attrs)
{
}
+static inline dma_addr_t dma_map_phys(struct device *dev, phys_addr_t phys,
+ size_t size, enum dma_data_direction dir, unsigned long attrs)
+{
+ return DMA_MAPPING_ERROR;
+}
+static inline void dma_unmap_phys(struct device *dev, dma_addr_t addr,
+ size_t size, enum dma_data_direction dir, unsigned long attrs)
+{
+}
static inline unsigned int dma_map_sg_attrs(struct device *dev,
struct scatterlist *sg, int nents, enum dma_data_direction dir,
unsigned long attrs)
diff --git a/include/linux/iommu-dma.h b/include/linux/iommu-dma.h
index 485bdffed988..a92b3ff9b934 100644
--- a/include/linux/iommu-dma.h
+++ b/include/linux/iommu-dma.h
@@ -42,10 +42,6 @@ size_t iommu_dma_opt_mapping_size(void);
size_t iommu_dma_max_mapping_size(struct device *dev);
void iommu_dma_free(struct device *dev, size_t size, void *cpu_addr,
dma_addr_t handle, unsigned long attrs);
-dma_addr_t iommu_dma_map_resource(struct device *dev, phys_addr_t phys,
- size_t size, enum dma_data_direction dir, unsigned long attrs);
-void iommu_dma_unmap_resource(struct device *dev, dma_addr_t handle,
- size_t size, enum dma_data_direction dir, unsigned long attrs);
struct sg_table *iommu_dma_alloc_noncontiguous(struct device *dev, size_t size,
enum dma_data_direction dir, gfp_t gfp, unsigned long attrs);
void iommu_dma_free_noncontiguous(struct device *dev, size_t size,
diff --git a/include/trace/events/dma.h b/include/trace/events/dma.h
index 84416c7d6bfa..5da59fd8121d 100644
--- a/include/trace/events/dma.h
+++ b/include/trace/events/dma.h
@@ -73,7 +73,6 @@ DEFINE_EVENT(dma_map, name, \
TP_ARGS(dev, phys_addr, dma_addr, size, dir, attrs))

DEFINE_MAP_EVENT(dma_map_phys);
-DEFINE_MAP_EVENT(dma_map_resource);

DECLARE_EVENT_CLASS(dma_unmap,
TP_PROTO(struct device *dev, dma_addr_t addr, size_t size,
@@ -111,7 +110,6 @@ DEFINE_EVENT(dma_unmap, name, \
TP_ARGS(dev, addr, size, dir, attrs))

DEFINE_UNMAP_EVENT(dma_unmap_phys);
-DEFINE_UNMAP_EVENT(dma_unmap_resource);

DECLARE_EVENT_CLASS(dma_alloc_class,
TP_PROTO(struct device *dev, void *virt_addr, dma_addr_t dma_addr,
diff --git a/kernel/dma/debug.c b/kernel/dma/debug.c
index da6734e3a4ce..06e31fd216e3 100644
--- a/kernel/dma/debug.c
+++ b/kernel/dma/debug.c
@@ -38,7 +38,6 @@ enum {
dma_debug_single,
dma_debug_sg,
dma_debug_coherent,
- dma_debug_resource,
dma_debug_phy,
};

@@ -141,7 +140,6 @@ static const char *type2name[] = {
[dma_debug_single] = "single",
[dma_debug_sg] = "scatter-gather",
[dma_debug_coherent] = "coherent",
- [dma_debug_resource] = "resource",
[dma_debug_phy] = "phy",
};

@@ -1448,47 +1446,6 @@ void debug_dma_free_coherent(struct device *dev, size_t size,
check_unmap(&ref);
}

-void debug_dma_map_resource(struct device *dev, phys_addr_t addr, size_t size,
- int direction, dma_addr_t dma_addr,
- unsigned long attrs)
-{
- struct dma_debug_entry *entry;
-
- if (unlikely(dma_debug_disabled()))
- return;
-
- entry = dma_entry_alloc();
- if (!entry)
- return;
-
- entry->type = dma_debug_resource;
- entry->dev = dev;
- entry->paddr = addr;
- entry->size = size;
- entry->dev_addr = dma_addr;
- entry->direction = direction;
- entry->map_err_type = MAP_ERR_NOT_CHECKED;
-
- add_dma_entry(entry, attrs);
-}
-
-void debug_dma_unmap_resource(struct device *dev, dma_addr_t dma_addr,
- size_t size, int direction)
-{
- struct dma_debug_entry ref = {
- .type = dma_debug_resource,
- .dev = dev,
- .dev_addr = dma_addr,
- .size = size,
- .direction = direction,
- };
-
- if (unlikely(dma_debug_disabled()))
- return;
-
- check_unmap(&ref);
-}
-
void debug_dma_sync_single_for_cpu(struct device *dev, dma_addr_t dma_handle,
size_t size, int direction)
{
diff --git a/kernel/dma/debug.h b/kernel/dma/debug.h
index 76adb42bffd5..424b8f912ade 100644
--- a/kernel/dma/debug.h
+++ b/kernel/dma/debug.h
@@ -30,14 +30,6 @@ extern void debug_dma_alloc_coherent(struct device *dev, size_t size,
extern void debug_dma_free_coherent(struct device *dev, size_t size,
void *virt, dma_addr_t addr);

-extern void debug_dma_map_resource(struct device *dev, phys_addr_t addr,
- size_t size, int direction,
- dma_addr_t dma_addr,
- unsigned long attrs);
-
-extern void debug_dma_unmap_resource(struct device *dev, dma_addr_t dma_addr,
- size_t size, int direction);
-
extern void debug_dma_sync_single_for_cpu(struct device *dev,
dma_addr_t dma_handle, size_t size,
int direction);
@@ -88,19 +80,6 @@ static inline void debug_dma_free_coherent(struct device *dev, size_t size,
{
}

-static inline void debug_dma_map_resource(struct device *dev, phys_addr_t addr,
- size_t size, int direction,
- dma_addr_t dma_addr,
- unsigned long attrs)
-{
-}
-
-static inline void debug_dma_unmap_resource(struct device *dev,
- dma_addr_t dma_addr, size_t size,
- int direction)
-{
-}
-
static inline void debug_dma_sync_single_for_cpu(struct device *dev,
dma_addr_t dma_handle,
size_t size, int direction)
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index fa75e3070073..1062caac47e7 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -502,22 +502,6 @@ int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl, int nents,
return ret;
}

-dma_addr_t dma_direct_map_resource(struct device *dev, phys_addr_t paddr,
- size_t size, enum dma_data_direction dir, unsigned long attrs)
-{
- dma_addr_t dma_addr = paddr;
-
- if (unlikely(!dma_capable(dev, dma_addr, size, false))) {
- dev_err_once(dev,
- "DMA addr %pad+%zu overflow (mask %llx, bus limit %llx).\n",
- &dma_addr, size, *dev->dma_mask, dev->bus_dma_limit);
- WARN_ON_ONCE(1);
- return DMA_MAPPING_ERROR;
- }
-
- return dma_addr;
-}
-
int dma_direct_get_sgtable(struct device *dev, struct sg_table *sgt,
void *cpu_addr, dma_addr_t dma_addr, size_t size,
unsigned long attrs)
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index 8725508a6c57..7e0e21a7ba04 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -152,12 +152,10 @@ static inline bool dma_map_direct(struct device *dev,
return dma_go_direct(dev, *dev->dma_mask, ops);
}

-dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
- size_t offset, size_t size, enum dma_data_direction dir,
- unsigned long attrs)
+dma_addr_t dma_map_phys(struct device *dev, phys_addr_t phys, size_t size,
+ enum dma_data_direction dir, unsigned long attrs)
{
const struct dma_map_ops *ops = get_dma_ops(dev);
- phys_addr_t phys = page_to_phys(page) + offset;
bool is_mmio = attrs & DMA_ATTR_MMIO;
dma_addr_t addr;

@@ -177,6 +175,9 @@ dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,

addr = ops->map_resource(dev, phys, size, dir, attrs);
} else {
+ struct page *page = phys_to_page(phys);
+ size_t offset = offset_in_page(phys);
+
/*
* The dma_ops API contract for ops->map_page() requires
* kmappable memory, while ops->map_resource() does not.
@@ -190,9 +191,26 @@ dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,

return addr;
}
+EXPORT_SYMBOL_GPL(dma_map_phys);
+
+dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
+ size_t offset, size_t size, enum dma_data_direction dir,
+ unsigned long attrs)
+{
+ phys_addr_t phys = page_to_phys(page) + offset;
+
+ if (unlikely(attrs & DMA_ATTR_MMIO))
+ return DMA_MAPPING_ERROR;
+
+ if (IS_ENABLED(CONFIG_DMA_API_DEBUG) &&
+ WARN_ON_ONCE(is_zone_device_page(page)))
+ return DMA_MAPPING_ERROR;
+
+ return dma_map_phys(dev, phys, size, dir, attrs);
+}
EXPORT_SYMBOL(dma_map_page_attrs);

-void dma_unmap_page_attrs(struct device *dev, dma_addr_t addr, size_t size,
+void dma_unmap_phys(struct device *dev, dma_addr_t addr, size_t size,
enum dma_data_direction dir, unsigned long attrs)
{
const struct dma_map_ops *ops = get_dma_ops(dev);
@@ -212,6 +230,16 @@ void dma_unmap_page_attrs(struct device *dev, dma_addr_t addr, size_t size,
trace_dma_unmap_phys(dev, addr, size, dir, attrs);
debug_dma_unmap_phys(dev, addr, size, dir);
}
+EXPORT_SYMBOL_GPL(dma_unmap_phys);
+
+void dma_unmap_page_attrs(struct device *dev, dma_addr_t addr, size_t size,
+ enum dma_data_direction dir, unsigned long attrs)
+{
+ if (unlikely(attrs & DMA_ATTR_MMIO))
+ return;
+
+ dma_unmap_phys(dev, addr, size, dir, attrs);
+}
EXPORT_SYMBOL(dma_unmap_page_attrs);

static int __dma_map_sg_attrs(struct device *dev, struct scatterlist *sg,
@@ -337,41 +365,18 @@ EXPORT_SYMBOL(dma_unmap_sg_attrs);
dma_addr_t dma_map_resource(struct device *dev, phys_addr_t phys_addr,
size_t size, enum dma_data_direction dir, unsigned long attrs)
{
- const struct dma_map_ops *ops = get_dma_ops(dev);
- dma_addr_t addr = DMA_MAPPING_ERROR;
-
- BUG_ON(!valid_dma_direction(dir));
-
- if (WARN_ON_ONCE(!dev->dma_mask))
+ if (IS_ENABLED(CONFIG_DMA_API_DEBUG) &&
+ WARN_ON_ONCE(pfn_valid(PHYS_PFN(phys_addr))))
return DMA_MAPPING_ERROR;

- if (dma_map_direct(dev, ops))
- addr = dma_direct_map_resource(dev, phys_addr, size, dir, attrs);
- else if (use_dma_iommu(dev))
- addr = iommu_dma_map_resource(dev, phys_addr, size, dir, attrs);
- else if (ops->map_resource)
- addr = ops->map_resource(dev, phys_addr, size, dir, attrs);
-
- trace_dma_map_resource(dev, phys_addr, addr, size, dir, attrs);
- debug_dma_map_resource(dev, phys_addr, size, dir, addr, attrs);
- return addr;
+ return dma_map_phys(dev, phys_addr, size, dir, attrs | DMA_ATTR_MMIO);
}
EXPORT_SYMBOL(dma_map_resource);

void dma_unmap_resource(struct device *dev, dma_addr_t addr, size_t size,
enum dma_data_direction dir, unsigned long attrs)
{
- const struct dma_map_ops *ops = get_dma_ops(dev);
-
- BUG_ON(!valid_dma_direction(dir));
- if (dma_map_direct(dev, ops))
- ; /* nothing to do: uncached and no swiotlb */
- else if (use_dma_iommu(dev))
- iommu_dma_unmap_resource(dev, addr, size, dir, attrs);
- else if (ops->unmap_resource)
- ops->unmap_resource(dev, addr, size, dir, attrs);
- trace_dma_unmap_resource(dev, addr, size, dir, attrs);
- debug_dma_unmap_resource(dev, addr, size, dir);
+ dma_unmap_phys(dev, addr, size, dir, attrs | DMA_ATTR_MMIO);
}
EXPORT_SYMBOL(dma_unmap_resource);

--
2.50.1

Leon Romanovsky

unread,
Aug 14, 2025, 6:14:38 AMAug 14
to Marek Szyprowski, Leon Romanovsky, Jason Gunthorpe, Abdiel Janulgue, Alexander Potapenko, Alex Gaynor, Andrew Morton, Christoph Hellwig, Danilo Krummrich, io...@lists.linux.dev, Jason Wang, Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross, kasa...@googlegroups.com, Keith Busch, linux...@vger.kernel.org, linu...@vger.kernel.org, linux-...@vger.kernel.org, linu...@kvack.org, linux...@lists.infradead.org, linuxp...@lists.ozlabs.org, linux-tra...@vger.kernel.org, Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman, Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-fo...@vger.kernel.org, Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtual...@lists.linux.dev, Will Deacon, xen-...@lists.xenproject.org
From: Leon Romanovsky <leo...@nvidia.com>

Extend the base DMA page API to handle the MMIO flow, and follow the
existing dma_map_resource() implementation by relying on dma_map_direct()
alone to decide whether to take the DMA direct path.

Signed-off-by: Leon Romanovsky <leo...@nvidia.com>
---
kernel/dma/mapping.c | 24 ++++++++++++++++++++----
1 file changed, 20 insertions(+), 4 deletions(-)

diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index 709405d46b2b..8725508a6c57 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -158,6 +158,7 @@ dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
{
const struct dma_map_ops *ops = get_dma_ops(dev);
phys_addr_t phys = page_to_phys(page) + offset;
+ bool is_mmio = attrs & DMA_ATTR_MMIO;
dma_addr_t addr;

BUG_ON(!valid_dma_direction(dir));
@@ -166,12 +167,23 @@ dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
return DMA_MAPPING_ERROR;

if (dma_map_direct(dev, ops) ||
- arch_dma_map_phys_direct(dev, phys + size))
+ (!is_mmio && arch_dma_map_phys_direct(dev, phys + size)))
addr = dma_direct_map_phys(dev, phys, size, dir, attrs);
else if (use_dma_iommu(dev))
addr = iommu_dma_map_phys(dev, phys, size, dir, attrs);
- else
+ else if (is_mmio) {
+ if (!ops->map_resource)
+ return DMA_MAPPING_ERROR;
+
+ addr = ops->map_resource(dev, phys, size, dir, attrs);
+ } else {
+ /*
+ * The dma_ops API contract for ops->map_page() requires
+ * kmappable memory, while ops->map_resource() does not.
+ */
addr = ops->map_page(dev, page, offset, size, dir, attrs);
+ }
+
kmsan_handle_dma(phys, size, dir);
trace_dma_map_phys(dev, phys, addr, size, dir, attrs);
debug_dma_map_phys(dev, phys, size, dir, addr, attrs);
@@ -184,14 +196,18 @@ void dma_unmap_page_attrs(struct device *dev, dma_addr_t addr, size_t size,
enum dma_data_direction dir, unsigned long attrs)
{
const struct dma_map_ops *ops = get_dma_ops(dev);
+ bool is_mmio = attrs & DMA_ATTR_MMIO;

BUG_ON(!valid_dma_direction(dir));
if (dma_map_direct(dev, ops) ||
- arch_dma_unmap_phys_direct(dev, addr + size))
+ (!is_mmio && arch_dma_unmap_phys_direct(dev, addr + size)))
dma_direct_unmap_phys(dev, addr, size, dir, attrs);
else if (use_dma_iommu(dev))
iommu_dma_unmap_phys(dev, addr, size, dir, attrs);
- else
+ else if (is_mmio) {
+ if (ops->unmap_resource)
+ ops->unmap_resource(dev, addr, size, dir, attrs);
+ } else
ops->unmap_page(dev, addr, size, dir, attrs);
trace_dma_unmap_phys(dev, addr, size, dir, attrs);
debug_dma_unmap_phys(dev, addr, size, dir);
--
2.50.1

Leon Romanovsky

unread,
Aug 14, 2025, 6:14:41 AMAug 14
to Marek Szyprowski, Leon Romanovsky, Jason Gunthorpe, Abdiel Janulgue, Alexander Potapenko, Alex Gaynor, Andrew Morton, Christoph Hellwig, Danilo Krummrich, io...@lists.linux.dev, Jason Wang, Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross, kasa...@googlegroups.com, Keith Busch, linux...@vger.kernel.org, linu...@vger.kernel.org, linux-...@vger.kernel.org, linu...@kvack.org, linux...@lists.infradead.org, linuxp...@lists.ozlabs.org, linux-tra...@vger.kernel.org, Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman, Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-fo...@vger.kernel.org, Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtual...@lists.linux.dev, Will Deacon, xen-...@lists.xenproject.org
From: Leon Romanovsky <leo...@nvidia.com>

Convert HMM DMA operations from the legacy page-based API to the new
physical address-based dma_map_phys() and dma_unmap_phys() functions.
This demonstrates the preferred approach for new code that should use
physical addresses directly rather than page+offset parameters.

The change replaces dma_map_page() and dma_unmap_page() calls with
dma_map_phys() and dma_unmap_phys() respectively, using the physical
address that was already available in the code. This eliminates the
redundant page-to-physical address conversion and aligns with the
DMA subsystem's move toward physical address-centric interfaces.

This serves as an example of how new code should be written to leverage
the more efficient physical address API, which provides cleaner interfaces
for drivers that already have access to physical addresses.
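
The rule behind the conversion is simple; a minimal, illustrative sketch
(not taken from mm/hmm.c, names are placeholders):

#include <linux/dma-mapping.h>

/*
 * Illustrative only: given paddr == page_to_phys(page) + offset, the
 * phys-based call with attrs == 0 takes the same underlying path as the
 * old dma_map_page(dev, page, offset, size, dir) call it replaces.
 */
static dma_addr_t example_map(struct device *dev, phys_addr_t paddr,
			      size_t size)
{
	return dma_map_phys(dev, paddr, size, DMA_BIDIRECTIONAL, 0);
}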

Reviewed-by: Jason Gunthorpe <j...@nvidia.com>
Signed-off-by: Leon Romanovsky <leo...@nvidia.com>
---
mm/hmm.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/mm/hmm.c b/mm/hmm.c
index d545e2494994..015ab243f081 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -775,8 +775,8 @@ dma_addr_t hmm_dma_map_pfn(struct device *dev, struct hmm_dma_map *map,
if (WARN_ON_ONCE(dma_need_unmap(dev) && !dma_addrs))
goto error;

- dma_addr = dma_map_page(dev, page, 0, map->dma_entry_size,
- DMA_BIDIRECTIONAL);
+ dma_addr = dma_map_phys(dev, paddr, map->dma_entry_size,
+ DMA_BIDIRECTIONAL, 0);
if (dma_mapping_error(dev, dma_addr))
goto error;

@@ -819,8 +819,8 @@ bool hmm_dma_unmap_pfn(struct device *dev, struct hmm_dma_map *map, size_t idx)
dma_iova_unlink(dev, state, idx * map->dma_entry_size,
map->dma_entry_size, DMA_BIDIRECTIONAL, attrs);
} else if (dma_need_unmap(dev))
- dma_unmap_page(dev, dma_addrs[idx], map->dma_entry_size,
- DMA_BIDIRECTIONAL);
+ dma_unmap_phys(dev, dma_addrs[idx], map->dma_entry_size,
+ DMA_BIDIRECTIONAL, 0);

pfns[idx] &=
~(HMM_PFN_DMA_MAPPED | HMM_PFN_P2PDMA | HMM_PFN_P2PDMA_BUS);
--
2.50.1

Leon Romanovsky

unread,
Aug 14, 2025, 6:14:46 AMAug 14
to Marek Szyprowski, Leon Romanovsky, Jason Gunthorpe, Abdiel Janulgue, Alexander Potapenko, Alex Gaynor, Andrew Morton, Christoph Hellwig, Danilo Krummrich, io...@lists.linux.dev, Jason Wang, Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross, kasa...@googlegroups.com, Keith Busch, linux...@vger.kernel.org, linu...@vger.kernel.org, linux-...@vger.kernel.org, linu...@kvack.org, linux...@lists.infradead.org, linuxp...@lists.ozlabs.org, linux-tra...@vger.kernel.org, Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman, Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-fo...@vger.kernel.org, Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtual...@lists.linux.dev, Will Deacon, xen-...@lists.xenproject.org
From: Leon Romanovsky <leo...@nvidia.com>

When a peer-to-peer transaction traverses the host bridge, the IOMMU
mapping needs the IOMMU_MMIO flag and the CPU sync has to be skipped.

The latter was already handled by passing the DMA_ATTR_SKIP_CPU_SYNC
flag, but the IOMMU flag was missed, on the assumption that such memory
can be treated as regular system memory.

Reuse the newly introduced DMA attribute to properly take the MMIO path.
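
A hedged sketch of the attribute selection this patch ends up with (the
PCI_P2PDMA_MAP_* values are the existing classification used in the hunk
below; the helper, its name and the header includes are illustrative, not
part of the patch):

#include <linux/dma-mapping.h>
#include <linux/pci-p2pdma.h>

/* Illustrative only: which DMA attribute a P2P classification implies. */
static unsigned long example_p2p_attrs(enum pci_p2pdma_map_type map)
{
	switch (map) {
	case PCI_P2PDMA_MAP_THRU_HOST_BRIDGE:
		/* BAR memory routed through the host bridge: MMIO path. */
		return DMA_ATTR_MMIO;
	default:
		/* Bus-address P2P is not DMA mapped; normal memory needs 0. */
		return 0;
	}
}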

Reviewed-by: Jason Gunthorpe <j...@nvidia.com>
Signed-off-by: Leon Romanovsky <leo...@nvidia.com>
---
mm/hmm.c | 15 ++++++++-------
1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/mm/hmm.c b/mm/hmm.c
index 015ab243f081..6556c0e074ba 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -746,7 +746,7 @@ dma_addr_t hmm_dma_map_pfn(struct device *dev, struct hmm_dma_map *map,
case PCI_P2PDMA_MAP_NONE:
break;
case PCI_P2PDMA_MAP_THRU_HOST_BRIDGE:
- attrs |= DMA_ATTR_SKIP_CPU_SYNC;
+ attrs |= DMA_ATTR_MMIO;
pfns[idx] |= HMM_PFN_P2PDMA;
break;
case PCI_P2PDMA_MAP_BUS_ADDR:
@@ -776,7 +776,7 @@ dma_addr_t hmm_dma_map_pfn(struct device *dev, struct hmm_dma_map *map,
goto error;

dma_addr = dma_map_phys(dev, paddr, map->dma_entry_size,
- DMA_BIDIRECTIONAL, 0);
+ DMA_BIDIRECTIONAL, attrs);
if (dma_mapping_error(dev, dma_addr))
goto error;

@@ -811,16 +811,17 @@ bool hmm_dma_unmap_pfn(struct device *dev, struct hmm_dma_map *map, size_t idx)
if ((pfns[idx] & valid_dma) != valid_dma)
return false;

+ if (pfns[idx] & HMM_PFN_P2PDMA)
+ attrs |= DMA_ATTR_MMIO;
+
if (pfns[idx] & HMM_PFN_P2PDMA_BUS)
; /* no need to unmap bus address P2P mappings */
- else if (dma_use_iova(state)) {
- if (pfns[idx] & HMM_PFN_P2PDMA)
- attrs |= DMA_ATTR_SKIP_CPU_SYNC;
+ else if (dma_use_iova(state))
dma_iova_unlink(dev, state, idx * map->dma_entry_size,
map->dma_entry_size, DMA_BIDIRECTIONAL, attrs);
- } else if (dma_need_unmap(dev))
+ else if (dma_need_unmap(dev))
dma_unmap_phys(dev, dma_addrs[idx], map->dma_entry_size,
- DMA_BIDIRECTIONAL, 0);
+ DMA_BIDIRECTIONAL, attrs);

Leon Romanovsky

unread,
Aug 14, 2025, 6:14:51 AMAug 14
to Marek Szyprowski, Leon Romanovsky, Jason Gunthorpe, Abdiel Janulgue, Alexander Potapenko, Alex Gaynor, Andrew Morton, Christoph Hellwig, Danilo Krummrich, io...@lists.linux.dev, Jason Wang, Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross, kasa...@googlegroups.com, Keith Busch, linux...@vger.kernel.org, linu...@vger.kernel.org, linux-...@vger.kernel.org, linu...@kvack.org, linux...@lists.infradead.org, linuxp...@lists.ozlabs.org, linux-tra...@vger.kernel.org, Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman, Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-fo...@vger.kernel.org, Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtual...@lists.linux.dev, Will Deacon, xen-...@lists.xenproject.org
From: Leon Romanovsky <leo...@nvidia.com>

After the introduction of dma_map_phys(), there is no need to convert
a physical address to a struct page in order to map it. So let's use
the physical address directly.

Signed-off-by: Leon Romanovsky <leo...@nvidia.com>
---
block/blk-mq-dma.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/block/blk-mq-dma.c b/block/blk-mq-dma.c
index ad283017caef..37e2142be4f7 100644
--- a/block/blk-mq-dma.c
+++ b/block/blk-mq-dma.c
@@ -87,8 +87,8 @@ static bool blk_dma_map_bus(struct blk_dma_iter *iter, struct phys_vec *vec)
static bool blk_dma_map_direct(struct request *req, struct device *dma_dev,
struct blk_dma_iter *iter, struct phys_vec *vec)
{
- iter->addr = dma_map_page(dma_dev, phys_to_page(vec->paddr),
- offset_in_page(vec->paddr), vec->len, rq_dma_dir(req));
+ iter->addr = dma_map_phys(dma_dev, vec->paddr, vec->len,
+ rq_dma_dir(req), 0);
if (dma_mapping_error(dma_dev, iter->addr)) {
iter->status = BLK_STS_RESOURCE;
return false;
--
2.50.1

Leon Romanovsky

unread,
Aug 14, 2025, 6:14:55 AMAug 14
to Marek Szyprowski, Leon Romanovsky, Jason Gunthorpe, Abdiel Janulgue, Alexander Potapenko, Alex Gaynor, Andrew Morton, Christoph Hellwig, Danilo Krummrich, io...@lists.linux.dev, Jason Wang, Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross, kasa...@googlegroups.com, Keith Busch, linux...@vger.kernel.org, linu...@vger.kernel.org, linux-...@vger.kernel.org, linu...@kvack.org, linux...@lists.infradead.org, linuxp...@lists.ozlabs.org, linux-tra...@vger.kernel.org, Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman, Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-fo...@vger.kernel.org, Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtual...@lists.linux.dev, Will Deacon, xen-...@lists.xenproject.org
From: Leon Romanovsky <leo...@nvidia.com>

The block layer maps MMIO memory through the dma_map_phys() interface
with the help of the DMA_ATTR_MMIO attribute. That memory needs to be
unmapped with the matching attribute, something which wasn't possible
before the new REQ flag was added to the block layer in the previous
patch.

Signed-off-by: Leon Romanovsky <leo...@nvidia.com>
---
drivers/nvme/host/pci.c | 18 +++++++++++++-----
1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 2c6d9506b172..f8ecc0e0f576 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -682,11 +682,15 @@ static void nvme_free_prps(struct request *req)
{
struct nvme_iod *iod = blk_mq_rq_to_pdu(req);
struct nvme_queue *nvmeq = req->mq_hctx->driver_data;
+ unsigned int attrs = 0;
unsigned int i;

+ if (req->cmd_flags & REQ_MMIO)
+ attrs = DMA_ATTR_MMIO;
+
for (i = 0; i < iod->nr_dma_vecs; i++)
- dma_unmap_page(nvmeq->dev->dev, iod->dma_vecs[i].addr,
- iod->dma_vecs[i].len, rq_dma_dir(req));
+ dma_unmap_phys(nvmeq->dev->dev, iod->dma_vecs[i].addr,
+ iod->dma_vecs[i].len, rq_dma_dir(req), attrs);
mempool_free(iod->dma_vecs, nvmeq->dev->dmavec_mempool);
}

@@ -699,15 +703,19 @@ static void nvme_free_sgls(struct request *req)
unsigned int sqe_dma_len = le32_to_cpu(iod->cmd.common.dptr.sgl.length);
struct nvme_sgl_desc *sg_list = iod->descriptors[0];
enum dma_data_direction dir = rq_dma_dir(req);
+ unsigned int attrs = 0;
+
+ if (req->cmd_flags & REQ_MMIO)
+ attrs = DMA_ATTR_MMIO;

if (iod->nr_descriptors) {
unsigned int nr_entries = sqe_dma_len / sizeof(*sg_list), i;

for (i = 0; i < nr_entries; i++)
- dma_unmap_page(dma_dev, le64_to_cpu(sg_list[i].addr),
- le32_to_cpu(sg_list[i].length), dir);
+ dma_unmap_phys(dma_dev, le64_to_cpu(sg_list[i].addr),
+ le32_to_cpu(sg_list[i].length), dir, attrs);
} else {
- dma_unmap_page(dma_dev, sqe_dma_addr, sqe_dma_len, dir);
+ dma_unmap_phys(dma_dev, sqe_dma_addr, sqe_dma_len, dir, attrs);
}
}

--
2.50.1

Leon Romanovsky

unread,
Aug 14, 2025, 6:14:59 AMAug 14
to Marek Szyprowski, Leon Romanovsky, Jason Gunthorpe, Abdiel Janulgue, Alexander Potapenko, Alex Gaynor, Andrew Morton, Christoph Hellwig, Danilo Krummrich, io...@lists.linux.dev, Jason Wang, Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross, kasa...@googlegroups.com, Keith Busch, linux...@vger.kernel.org, linu...@vger.kernel.org, linux-...@vger.kernel.org, linu...@kvack.org, linux...@lists.infradead.org, linuxp...@lists.ozlabs.org, linux-tra...@vger.kernel.org, Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman, Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-fo...@vger.kernel.org, Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtual...@lists.linux.dev, Will Deacon, xen-...@lists.xenproject.org
From: Leon Romanovsky <leo...@nvidia.com>

Make sure that the CPU is not synced and that the IOMMU is configured to
take the MMIO path by providing the newly introduced DMA_ATTR_MMIO
attribute.
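
A minimal sketch of the flag translation this patch repeats at each
mapping site (REQ_MMIO and DMA_ATTR_MMIO are the real flags added and
used by this series; the helper name is hypothetical):

#include <linux/blk-mq.h>
#include <linux/dma-mapping.h>

/* Illustrative only: derive the DMA attributes from the request flags. */
static inline unsigned long example_req_dma_attrs(struct request *req)
{
	return (req->cmd_flags & REQ_MMIO) ? DMA_ATTR_MMIO : 0;
}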

Signed-off-by: Leon Romanovsky <leo...@nvidia.com>
---
block/blk-mq-dma.c | 13 +++++++++++--
include/linux/blk-mq-dma.h | 6 +++++-
include/linux/blk_types.h | 2 ++
3 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/block/blk-mq-dma.c b/block/blk-mq-dma.c
index 37e2142be4f7..d415088ed9fd 100644
--- a/block/blk-mq-dma.c
+++ b/block/blk-mq-dma.c
@@ -87,8 +87,13 @@ static bool blk_dma_map_bus(struct blk_dma_iter *iter, struct phys_vec *vec)
static bool blk_dma_map_direct(struct request *req, struct device *dma_dev,
struct blk_dma_iter *iter, struct phys_vec *vec)
{
+ unsigned int attrs = 0;
+
+ if (req->cmd_flags & REQ_MMIO)
+ attrs = DMA_ATTR_MMIO;
+
iter->addr = dma_map_phys(dma_dev, vec->paddr, vec->len,
- rq_dma_dir(req), 0);
+ rq_dma_dir(req), attrs);
if (dma_mapping_error(dma_dev, iter->addr)) {
iter->status = BLK_STS_RESOURCE;
return false;
@@ -103,14 +108,17 @@ static bool blk_rq_dma_map_iova(struct request *req, struct device *dma_dev,
{
enum dma_data_direction dir = rq_dma_dir(req);
unsigned int mapped = 0;
+ unsigned int attrs = 0;
int error;

iter->addr = state->addr;
iter->len = dma_iova_size(state);
+ if (req->cmd_flags & REQ_MMIO)
+ attrs = DMA_ATTR_MMIO;

do {
error = dma_iova_link(dma_dev, state, vec->paddr, mapped,
- vec->len, dir, 0);
+ vec->len, dir, attrs);
if (error)
break;
mapped += vec->len;
@@ -176,6 +184,7 @@ bool blk_rq_dma_map_iter_start(struct request *req, struct device *dma_dev,
* same as non-P2P transfers below and during unmap.
*/
req->cmd_flags &= ~REQ_P2PDMA;
+ req->cmd_flags |= REQ_MMIO;
break;
default:
iter->status = BLK_STS_INVAL;
diff --git a/include/linux/blk-mq-dma.h b/include/linux/blk-mq-dma.h
index c26a01aeae00..6c55f5e58511 100644
--- a/include/linux/blk-mq-dma.h
+++ b/include/linux/blk-mq-dma.h
@@ -48,12 +48,16 @@ static inline bool blk_rq_dma_map_coalesce(struct dma_iova_state *state)
static inline bool blk_rq_dma_unmap(struct request *req, struct device *dma_dev,
struct dma_iova_state *state, size_t mapped_len)
{
+ unsigned int attrs = 0;
+
if (req->cmd_flags & REQ_P2PDMA)
return true;

if (dma_use_iova(state)) {
+ if (req->cmd_flags & REQ_MMIO)
+ attrs = DMA_ATTR_MMIO;
dma_iova_destroy(dma_dev, state, mapped_len, rq_dma_dir(req),
- 0);
+ attrs);
return true;
}

diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index 09b99d52fd36..283058bcb5b1 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -387,6 +387,7 @@ enum req_flag_bits {
__REQ_FS_PRIVATE, /* for file system (submitter) use */
__REQ_ATOMIC, /* for atomic write operations */
__REQ_P2PDMA, /* contains P2P DMA pages */
+ __REQ_MMIO, /* contains MMIO memory */
/*
* Command specific flags, keep last:
*/
@@ -420,6 +421,7 @@ enum req_flag_bits {
#define REQ_FS_PRIVATE (__force blk_opf_t)(1ULL << __REQ_FS_PRIVATE)
#define REQ_ATOMIC (__force blk_opf_t)(1ULL << __REQ_ATOMIC)
#define REQ_P2PDMA (__force blk_opf_t)(1ULL << __REQ_P2PDMA)
+#define REQ_MMIO (__force blk_opf_t)(1ULL << __REQ_MMIO)

#define REQ_NOUNMAP (__force blk_opf_t)(1ULL << __REQ_NOUNMAP)

--
2.50.1

Jason Gunthorpe

unread,
Aug 14, 2025, 8:13:22 AMAug 14
to Leon Romanovsky, Marek Szyprowski, Abdiel Janulgue, Alexander Potapenko, Alex Gaynor, Andrew Morton, Christoph Hellwig, Danilo Krummrich, io...@lists.linux.dev, Jason Wang, Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross, kasa...@googlegroups.com, Keith Busch, linux...@vger.kernel.org, linu...@vger.kernel.org, linux-...@vger.kernel.org, linu...@kvack.org, linux...@lists.infradead.org, linuxp...@lists.ozlabs.org, linux-tra...@vger.kernel.org, Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman, Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-fo...@vger.kernel.org, Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtual...@lists.linux.dev, Will Deacon, xen-...@lists.xenproject.org
On Wed, Aug 13, 2025 at 06:07:18PM +0300, Leon Romanovsky wrote:
> > > /* Helper function to handle DMA data transfers. */
> > > -void kmsan_handle_dma(struct page *page, size_t offset, size_t size,
> > > +void kmsan_handle_dma(phys_addr_t phys, size_t size,
> > > enum dma_data_direction dir)
> > > {
> > > u64 page_offset, to_go, addr;
> > > + struct page *page;
> > > + void *kaddr;
> > >
> > > - if (PageHighMem(page))
> > > + if (!pfn_valid(PHYS_PFN(phys)))
> > > return;
> >
> > Not needed, the caller must pass in a phys that is kmap
> > compatible. Maybe just leave a comment. FWIW today this is also not
> > checking for P2P or DEVICE non-kmap struct pages either, so it should
> > be fine without checks.
>
> It is not true as we will call to kmsan_handle_dma() unconditionally in
> dma_map_phys(). The reason to it is that kmsan_handle_dma() is guarded
> with debug kconfig options and cost of pfn_valid() can be accommodated
> in that case. It gives more clean DMA code.

Then check attrs here, not pfn_valid.

> So let's keep this patch as is.

Still need to fix the remarks you clipped: do not check PageHighMem,
just call kmap_local_pfn(). All this PageHighMem stuff is new to this
patch and should not be here; it is the wrong way to use highmem.

Jason

Leon Romanovsky

unread,
Aug 14, 2025, 8:35:14 AMAug 14
to Jason Gunthorpe, Marek Szyprowski, Abdiel Janulgue, Alexander Potapenko, Alex Gaynor, Andrew Morton, Christoph Hellwig, Danilo Krummrich, io...@lists.linux.dev, Jason Wang, Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross, kasa...@googlegroups.com, Keith Busch, linux...@vger.kernel.org, linu...@vger.kernel.org, linux-...@vger.kernel.org, linu...@kvack.org, linux...@lists.infradead.org, linuxp...@lists.ozlabs.org, linux-tra...@vger.kernel.org, Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman, Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-fo...@vger.kernel.org, Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtual...@lists.linux.dev, Will Deacon, xen-...@lists.xenproject.org
On Thu, Aug 14, 2025 at 09:13:16AM -0300, Jason Gunthorpe wrote:
> On Wed, Aug 13, 2025 at 06:07:18PM +0300, Leon Romanovsky wrote:
> > > > /* Helper function to handle DMA data transfers. */
> > > > -void kmsan_handle_dma(struct page *page, size_t offset, size_t size,
> > > > +void kmsan_handle_dma(phys_addr_t phys, size_t size,
> > > > enum dma_data_direction dir)
> > > > {
> > > > u64 page_offset, to_go, addr;
> > > > + struct page *page;
> > > > + void *kaddr;
> > > >
> > > > - if (PageHighMem(page))
> > > > + if (!pfn_valid(PHYS_PFN(phys)))
> > > > return;
> > >
> > > Not needed, the caller must pass in a phys that is kmap
> > > compatible. Maybe just leave a comment. FWIW today this is also not
> > > checking for P2P or DEVICE non-kmap struct pages either, so it should
> > > be fine without checks.
> >
> > It is not true as we will call to kmsan_handle_dma() unconditionally in
> > dma_map_phys(). The reason to it is that kmsan_handle_dma() is guarded
> > with debug kconfig options and cost of pfn_valid() can be accommodated
> > in that case. It gives more clean DMA code.
>
> Then check attrs here, not pfn_valid.

attrs are not available in kmsan_handle_dma(). I can add it if you prefer.

>
> > So let's keep this patch as is.
>
> Still need to fix the remarks you clipped, do not check PageHighMem
> just call kmap_local_pfn(). All thie PageHighMem stuff is new to this
> patch and should not be here, it is the wrong way to use highmem.

Sure, thanks

>
> Jason
>

Jason Gunthorpe

unread,
Aug 14, 2025, 8:44:56 AMAug 14
to Leon Romanovsky, Marek Szyprowski, Abdiel Janulgue, Alexander Potapenko, Alex Gaynor, Andrew Morton, Christoph Hellwig, Danilo Krummrich, io...@lists.linux.dev, Jason Wang, Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross, kasa...@googlegroups.com, Keith Busch, linux...@vger.kernel.org, linu...@vger.kernel.org, linux-...@vger.kernel.org, linu...@kvack.org, linux...@lists.infradead.org, linuxp...@lists.ozlabs.org, linux-tra...@vger.kernel.org, Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman, Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-fo...@vger.kernel.org, Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtual...@lists.linux.dev, Will Deacon, xen-...@lists.xenproject.org
On Thu, Aug 14, 2025 at 03:35:06PM +0300, Leon Romanovsky wrote:
> > Then check attrs here, not pfn_valid.
>
> attrs are not available in kmsan_handle_dma(). I can add it if you prefer.

That makes more sense for the overall design. The comments I gave
before were driving at a promise to never try to touch a struct page
for ATTR_MMIO, and I think this should be comprehensive: never touch
a struct page even if pfn_valid() is true.

> > > So let's keep this patch as is.
> >
> > Still need to fix the remarks you clipped, do not check PageHighMem
> > just call kmap_local_pfn(). All thie PageHighMem stuff is new to this
> > patch and should not be here, it is the wrong way to use highmem.
>
> Sure, thanks

I am wondering if there is some reason it was written like this in the
first place. Maybe we can't even do kmap here... So perhaps, if there is
not a strong reason to change it, just continue to check PageHighMem
and fail.

if (!(attrs & ATTR_MMIO) && PageHighMem(phys_to_page(phys)))
return;

Jason

Leon Romanovsky

unread,
Aug 14, 2025, 9:31:14 AMAug 14
to Jason Gunthorpe, Marek Szyprowski, Abdiel Janulgue, Alexander Potapenko, Alex Gaynor, Andrew Morton, Christoph Hellwig, Danilo Krummrich, io...@lists.linux.dev, Jason Wang, Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross, kasa...@googlegroups.com, Keith Busch, linux...@vger.kernel.org, linu...@vger.kernel.org, linux-...@vger.kernel.org, linu...@kvack.org, linux...@lists.infradead.org, linuxp...@lists.ozlabs.org, linux-tra...@vger.kernel.org, Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman, Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-fo...@vger.kernel.org, Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtual...@lists.linux.dev, Will Deacon, xen-...@lists.xenproject.org
Is this version good enough? There is no need to call
kmap_local_pfn() if we prevent PageHighMem pages.

diff --git a/mm/kmsan/hooks.c b/mm/kmsan/hooks.c
index eab7912a3bf0..d9cf70f4159c 100644
--- a/mm/kmsan/hooks.c
+++ b/mm/kmsan/hooks.c
@@ -337,13 +337,13 @@ static void kmsan_handle_dma_page(const void *addr, size_t size,

/* Helper function to handle DMA data transfers. */
void kmsan_handle_dma(phys_addr_t phys, size_t size,
- enum dma_data_direction dir)
+ enum dma_data_direction dir, unsigned long attrs)
{
u64 page_offset, to_go, addr;
struct page *page;
void *kaddr;

- if (!pfn_valid(PHYS_PFN(phys)))
+ if ((attrs & ATTR_MMIO) || PageHighMem(phys_to_page(phys)))
return;

page = phys_to_page(phys);
@@ -357,19 +357,12 @@ void kmsan_handle_dma(phys_addr_t phys, size_t size,
while (size > 0) {
to_go = min(PAGE_SIZE - page_offset, (u64)size);

- if (PageHighMem(page))
- /* Handle highmem pages using kmap */
- kaddr = kmap_local_page(page);
- else
- /* Lowmem pages can be accessed directly */
- kaddr = page_address(page);
+ /* Lowmem pages can be accessed directly */
+ kaddr = page_address(page);

addr = (u64)kaddr + page_offset;
kmsan_handle_dma_page((void *)addr, to_go, dir);

- if (PageHighMem(page))
- kunmap_local(page);
-
phys += to_go;
size -= to_go;



>
> Jason
>

Jason Gunthorpe

unread,
Aug 14, 2025, 10:14:45 AMAug 14
to Leon Romanovsky, Marek Szyprowski, Abdiel Janulgue, Alexander Potapenko, Alex Gaynor, Andrew Morton, Christoph Hellwig, Danilo Krummrich, io...@lists.linux.dev, Jason Wang, Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross, kasa...@googlegroups.com, Keith Busch, linux...@vger.kernel.org, linu...@vger.kernel.org, linux-...@vger.kernel.org, linu...@kvack.org, linux...@lists.infradead.org, linuxp...@lists.ozlabs.org, linux-tra...@vger.kernel.org, Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman, Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-fo...@vger.kernel.org, Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtual...@lists.linux.dev, Will Deacon, xen-...@lists.xenproject.org
On Thu, Aug 14, 2025 at 04:31:06PM +0300, Leon Romanovsky wrote:
> On Thu, Aug 14, 2025 at 09:44:48AM -0300, Jason Gunthorpe wrote:
> > On Thu, Aug 14, 2025 at 03:35:06PM +0300, Leon Romanovsky wrote:
> > > > Then check attrs here, not pfn_valid.
> > >
> > > attrs are not available in kmsan_handle_dma(). I can add it if you prefer.
> >
> > That makes more sense to the overall design. The comments I gave
> > before were driving at a promise to never try to touch a struct page
> > for ATTR_MMIO and think this should be comphrensive to never touching
> > a struct page even if pfnvalid.
> >
> > > > > So let's keep this patch as is.
> > > >
> > > > Still need to fix the remarks you clipped, do not check PageHighMem
> > > > just call kmap_local_pfn(). All thie PageHighMem stuff is new to this
> > > > patch and should not be here, it is the wrong way to use highmem.
> > >
> > > Sure, thanks
> >
> > I am wondering if there is some reason it was written like this in the
> > first place. Maybe we can't even do kmap here.. So perhaps if there is
> > not a strong reason to change it just continue to check pagehighmem
> > and fail.
> >
> > if (!(attrs & ATTR_MMIO) && PageHighMem(phys_to_page(phys)))
> > return;
>
> Does this version good enough? There is no need to call to
> kmap_local_pfn() if we prevent PageHighMem pages.

Why make the rest of the changes though, isn't it just:

if (PageHighMem(page))
return;

Becomes:

if (attrs & ATTR_MMIO))
return;

page = phys_to_page(phys);
if (PageHighMem(page))
return;

Leave the rest as is?

Jason

Randy Dunlap

unread,
Aug 14, 2025, 1:37:39 PMAug 14
to Leon Romanovsky, Marek Szyprowski, Leon Romanovsky, Jason Gunthorpe, Abdiel Janulgue, Alexander Potapenko, Alex Gaynor, Andrew Morton, Christoph Hellwig, Danilo Krummrich, io...@lists.linux.dev, Jason Wang, Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross, kasa...@googlegroups.com, Keith Busch, linux...@vger.kernel.org, linu...@vger.kernel.org, linux-...@vger.kernel.org, linu...@kvack.org, linux...@lists.infradead.org, linuxp...@lists.ozlabs.org, linux-tra...@vger.kernel.org, Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman, Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-fo...@vger.kernel.org, Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtual...@lists.linux.dev, Will Deacon, xen-...@lists.xenproject.org
Hi Leon,

On 8/14/25 3:13 AM, Leon Romanovsky wrote:
> diff --git a/Documentation/core-api/dma-attributes.rst b/Documentation/core-api/dma-attributes.rst
> index 1887d92e8e92..58a1528a9bb9 100644
> --- a/Documentation/core-api/dma-attributes.rst
> +++ b/Documentation/core-api/dma-attributes.rst
> @@ -130,3 +130,21 @@ accesses to DMA buffers in both privileged "supervisor" and unprivileged
> subsystem that the buffer is fully accessible at the elevated privilege
> level (and ideally inaccessible or at least read-only at the
> lesser-privileged levels).
> +
> +DMA_ATTR_MMIO
> +-------------
> +
> +This attribute indicates the physical address is not normal system
> +memory. It may not be used with kmap*()/phys_to_virt()/phys_to_page()
> +functions, it may not be cachable, and access using CPU load/store

Usually "cacheable" (git grep -w cacheable counts 1042 hits vs.
55 hits for "cachable"). And the $internet agrees.

> +instructions may not be allowed.
> +
> +Usually this will be used to describe MMIO addresses, or other non

non-cacheable

> +cachable register addresses. When DMA mapping this sort of address we

> +call the operation Peer to Peer as a one device is DMA'ing to another
> +device. For PCI devices the p2pdma APIs must be used to determine if
> +DMA_ATTR_MMIO is appropriate.
> +
> +For architectures that require cache flushing for DMA coherence
> +DMA_ATTR_MMIO will not perform any cache flushing. The address
> +provided must never be mapped cachable into the CPU.
again.

thanks.
--
~Randy

Leon Romanovsky

unread,
Aug 14, 2025, 1:43:41 PMAug 14
to Randy Dunlap, Marek Szyprowski, Jason Gunthorpe, Abdiel Janulgue, Alexander Potapenko, Alex Gaynor, Andrew Morton, Christoph Hellwig, Danilo Krummrich, io...@lists.linux.dev, Jason Wang, Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross, kasa...@googlegroups.com, Keith Busch, linux...@vger.kernel.org, linu...@vger.kernel.org, linux-...@vger.kernel.org, linu...@kvack.org, linux...@lists.infradead.org, linuxp...@lists.ozlabs.org, linux-tra...@vger.kernel.org, Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman, Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-fo...@vger.kernel.org, Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtual...@lists.linux.dev, Will Deacon, xen-...@lists.xenproject.org
Thanks, I will fix.

>
> thanks.
> --
> ~Randy
>
>

Leon Romanovsky

unread,
Aug 14, 2025, 1:54:18 PMAug 14
to Marek Szyprowski, Jason Gunthorpe, Abdiel Janulgue, Alexander Potapenko, Alex Gaynor, Andrew Morton, Christoph Hellwig, Danilo Krummrich, io...@lists.linux.dev, Jason Wang, Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross, kasa...@googlegroups.com, Keith Busch, linux...@vger.kernel.org, linu...@vger.kernel.org, linux-...@vger.kernel.org, linu...@kvack.org, linux...@lists.infradead.org, linuxp...@lists.ozlabs.org, linux-tra...@vger.kernel.org, Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman, Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-fo...@vger.kernel.org, Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtual...@lists.linux.dev, Will Deacon, xen-...@lists.xenproject.org
Changelog:
v3:
* Fixed typo in "cacheable" word
* Simplified kmsan patch a lot to be simple argument refactoring
v2: https://lore.kernel.org/all/cover.175515...@kernel.org
drivers/iommu/dma-iommu.c | 61 +++++------
drivers/nvme/host/pci.c | 18 +++-
drivers/virtio/virtio_ring.c | 4 +-
drivers/xen/swiotlb-xen.c | 21 +++-
include/linux/blk-mq-dma.h | 6 +-
include/linux/blk_types.h | 2 +
include/linux/dma-direct.h | 2 -
include/linux/dma-map-ops.h | 8 +-
include/linux/dma-mapping.h | 33 ++++++
include/linux/iommu-dma.h | 11 +-
include/linux/kmsan.h | 9 +-
include/trace/events/dma.h | 9 +-
kernel/dma/debug.c | 71 ++++---------
kernel/dma/debug.h | 37 ++-----
kernel/dma/direct.c | 22 +---
kernel/dma/direct.h | 52 ++++++----
kernel/dma/mapping.c | 117 +++++++++++++---------
kernel/dma/ops_helpers.c | 6 +-
mm/hmm.c | 19 ++--
mm/kmsan/hooks.c | 7 +-
rust/kernel/dma.rs | 3 +
tools/virtio/linux/kmsan.h | 2 +-
26 files changed, 306 insertions(+), 255 deletions(-)

--
2.50.1

Leon Romanovsky

unread,
Aug 14, 2025, 1:54:23 PMAug 14
to Marek Szyprowski, Leon Romanovsky, Jason Gunthorpe, Abdiel Janulgue, Alexander Potapenko, Alex Gaynor, Andrew Morton, Christoph Hellwig, Danilo Krummrich, io...@lists.linux.dev, Jason Wang, Jens Axboe, Joerg Roedel, Jonathan Corbet, Juergen Gross, kasa...@googlegroups.com, Keith Busch, linux...@vger.kernel.org, linu...@vger.kernel.org, linux-...@vger.kernel.org, linu...@kvack.org, linux...@lists.infradead.org, linuxp...@lists.ozlabs.org, linux-tra...@vger.kernel.org, Madhavan Srinivasan, Masami Hiramatsu, Michael Ellerman, Michael S. Tsirkin, Miguel Ojeda, Robin Murphy, rust-fo...@vger.kernel.org, Sagi Grimberg, Stefano Stabellini, Steven Rostedt, virtual...@lists.linux.dev, Will Deacon, xen-...@lists.xenproject.org
From: Leon Romanovsky <leo...@nvidia.com>

This patch introduces the DMA_ATTR_MMIO attribute to mark DMA buffers
that reside in memory-mapped I/O (MMIO) regions, such as device BARs
exposed through the host bridge, which are accessible for peer-to-peer
(P2P) DMA.

This attribute is especially useful for exporting device memory to other
devices for DMA without CPU involvement, and avoids unnecessary or
potentially detrimental CPU cache maintenance calls.

DMA_ATTR_MMIO is intended to provide dma_map_resource() functionality
without requiring callers to invoke a special function and branch
between the two paths.
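
As a hedged usage sketch (dma_map_phys() is added later in this series;
"example_map_peer_bar" and "bar_phys" are placeholders for a helper and a
peer BAR address obtained through the p2pdma APIs):

#include <linux/dma-mapping.h>

/* Illustrative only: map peer MMIO through the generic phys interface. */
static dma_addr_t example_map_peer_bar(struct device *dev,
				       phys_addr_t bar_phys, size_t len)
{
	/* Same intent as dma_map_resource(dev, bar_phys, len, dir, 0). */
	return dma_map_phys(dev, bar_phys, len, DMA_BIDIRECTIONAL,
			    DMA_ATTR_MMIO);
}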

Signed-off-by: Leon Romanovsky <leo...@nvidia.com>
---
Documentation/core-api/dma-attributes.rst | 18 ++++++++++++++++++
include/linux/dma-mapping.h | 20 ++++++++++++++++++++
include/trace/events/dma.h | 3 ++-
rust/kernel/dma.rs | 3 +++
4 files changed, 43 insertions(+), 1 deletion(-)

diff --git a/Documentation/core-api/dma-attributes.rst b/Documentation/core-api/dma-attributes.rst
index 1887d92e8e92..0bdc2be65e57 100644
--- a/Documentation/core-api/dma-attributes.rst
+++ b/Documentation/core-api/dma-attributes.rst
@@ -130,3 +130,21 @@ accesses to DMA buffers in both privileged "supervisor" and unprivileged
subsystem that the buffer is fully accessible at the elevated privilege
level (and ideally inaccessible or at least read-only at the
lesser-privileged levels).
+
+DMA_ATTR_MMIO
+-------------
+
+This attribute indicates the physical address is not normal system
+memory. It may not be used with kmap*()/phys_to_virt()/phys_to_page()
+functions, it may not be cacheable, and access using CPU load/store
+instructions may not be allowed.
+
+Usually this will be used to describe MMIO addresses, or other non-cacheable
+register addresses. When DMA mapping this sort of address we call
+the operation Peer to Peer as one device is DMA'ing to another device.
+For PCI devices the p2pdma APIs must be used to determine if
+DMA_ATTR_MMIO is appropriate.
+
+For architectures that require cache flushing for DMA coherence
+DMA_ATTR_MMIO will not perform any cache flushing. The address
+provided must never be mapped cacheable into the CPU.
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 55c03e5fe8cb..4254fd9bdf5d 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -58,6 +58,26 @@
*/
#define DMA_ATTR_PRIVILEGED (1UL << 9)

+/*
+ * DMA_ATTR_MMIO - Indicates memory-mapped I/O (MMIO) region for DMA mapping
+ *
+ * This attribute indicates the physical address is not normal system
+ * memory. It may not be used with kmap*()/phys_to_virt()/phys_to_page()
+ * functions, it may not be cacheable, and access using CPU load/store
+ * instructions may not be allowed.
+ *
+ * Usually this will be used to describe MMIO addresses, or other non-cacheable
+ * register addresses. When DMA mapping this sort of address we call
+ * the operation Peer to Peer as one device is DMA'ing to another device.
+ * For PCI devices the p2pdma APIs must be used to determine if DMA_ATTR_MMIO
+ * is appropriate.
+ *
+ * For architectures that require cache flushing for DMA coherence
+ * DMA_ATTR_MMIO will not perform any cache flushing. The address
+ * provided must never be mapped cacheable into the CPU.
+ */
+#define DMA_ATTR_MMIO (1UL << 10)
+
/*
* A dma_addr_t can hold any valid DMA or bus address for the platform. It can
* be given to a device to use as a DMA source or target. It is specific to a
diff --git a/include/trace/events/dma.h b/include/trace/events/dma.h
index d8ddc27b6a7c..ee90d6f1dcf3 100644
--- a/include/trace/events/dma.h
+++ b/include/trace/events/dma.h