Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

[PATCH 20/20] PCI: keystone-dw: update PCI config space remap function

135 views
Skip to first unread message

Lorenzo Pieralisi

unread,
Feb 27, 2017, 10:20:07 AM2/27/17
to
PCI configuration space should be mapped with a memory region type that
generates on the CPU host bus non-posted write transations. Update the
driver to use the devm_pci_remap_cfg* interface to make sure the correct
memory mappings for PCI configuration space are used.

Signed-off-by: Lorenzo Pieralisi <lorenzo....@arm.com>
Cc: Bjorn Helgaas <bhel...@google.com>
Cc: Murali Karicheri <m-kari...@ti.com>
---
drivers/pci/dwc/pci-keystone-dw.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/pci/dwc/pci-keystone-dw.c b/drivers/pci/dwc/pci-keystone-dw.c
index 6b396f6..8bc626e 100644
--- a/drivers/pci/dwc/pci-keystone-dw.c
+++ b/drivers/pci/dwc/pci-keystone-dw.c
@@ -543,7 +543,7 @@ int __init ks_dw_pcie_host_init(struct keystone_pcie *ks_pcie,

/* Index 0 is the config reg. space address */
res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
- pci->dbi_base = devm_ioremap_resource(dev, res);
+ pci->dbi_base = devm_pci_remap_cfg_resource(dev, res);
if (IS_ERR(pci->dbi_base))
return PTR_ERR(pci->dbi_base);

--
2.10.0

Lorenzo Pieralisi

unread,
Feb 27, 2017, 10:20:07 AM2/27/17
to
According to the PCI local bus specifications (Revision 3.0, 3.2.5),
I/O Address space transactions are non-posted. On architectures where
I/O space is implemented through a chunk of memory mapped space mapped
to PCI address space (ie IA64/ARM/ARM64) the memory mapping for the
region backing I/O Address Space transactions determines the I/O
transactions attributes (before the transactions actually reaches the
PCI bus where it is handled according to the PCI specifications).

Current pci_remap_iospace() interface, that is used to map the PCI I/O
Address Space into virtual address space, use pgprot_device() as memory
attribute for the virtual address mapping, that in some architectures
(ie ARM64) provides non-cacheable but write bufferable mappings (ie
posted writes), which clash with the non-posted write behaviour for I/O
Address Space mandated by the PCI specifications.

Update the prot ioremap_page_range() parameter in pci_remap_iospace()
to pgprot_noncached to ensure that the virtual mapping backing
I/O Address Space guarantee non-posted write transactions issued
when addressing I/O Address Space through the MMIO mapping.

Signed-off-by: Lorenzo Pieralisi <lorenzo....@arm.com>
Cc: Arnd Bergmann <ar...@arndb.de>
Cc: Will Deacon <will....@arm.com>
Cc: Bjorn Helgaas <bhel...@google.com>
Cc: Russell King <li...@armlinux.org.uk>
Cc: Catalin Marinas <catalin...@arm.com>
---
drivers/pci/pci.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index bd98674..bfb3c6e 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -3375,7 +3375,7 @@ int pci_remap_iospace(const struct resource *res, phys_addr_t phys_addr)
return -EINVAL;

return ioremap_page_range(vaddr, vaddr + resource_size(res), phys_addr,
- pgprot_device(PAGE_KERNEL));
+ pgprot_noncached(PAGE_KERNEL));
#else
/* this architecture does not have memory mapped I/O space,
so this function should never be called */
--
2.10.0

Lorenzo Pieralisi

unread,
Feb 27, 2017, 10:20:07 AM2/27/17
to
PCI local bus specifications (Rev3.0, 3.2.5 "Transaction Ordering
and Posting") strictly require PCI configuration and I/O Address space
write transactions to be non-posted.

Current crop of DT/ACPI PCI host controllers drivers relies on
the ioremap interface to map ECAM and ECAM-derivative PCI config
regions and pci_remap_iospace() to create a VMA for mapping
PCI host bridge I/O Address space transactions to CPU virtual address
space.

On some platforms (ie ARM/ARM64) ioremap fails to comply with the PCI
configuration non-posted write transactions requirement, because it
provides a memory mapping that issues "bufferable" or, in PCI terms
"posted" write transactions. Likewise, the current pci_remap_iospace()
implementation maps the physical address range that the PCI translates
to I/O space cycles to virtual address space through pgprot_device()
attributes that on eg ARM64 provides a memory mapping issuing
posted writes transactions, which is not PCI specifications compliant.

This patch series[1] addresses both issues in one go:

- It updates the pci_remap_iospace() function to use a page mapping
that guarantees non-posted write transactions for I/O space addresses
- It adds a kernel API to remap PCI config space resources, so that
architecture can override it with a mapping implementation that
guarantees PCI specifications compliancy wrt non-posted write
configuration transactions
- It updates all PCI host controller implementations (and the generic
ECAM layer) to use the newly introduced mapping interface

Tested on Juno ECAM based interface (DT/ACPI).

Non-ECAM PCI host controller drivers patches need checking to make
sure that:

- I patched the correct resource region mapping for config space
- There are not any other ways to ensure posted-write completion
in the respective pci_ops that make the relevant patch unnecessary

[1] git://git.kernel.org/pub/scm/linux/kernel/git/lpieralisi/linux.git pci/config-io-mappings-fix

Cc: Pratyush Anand <pratyus...@gmail.com>
Cc: Arnd Bergmann <ar...@arndb.de>
Cc: Jonathan Corbet <cor...@lwn.net>
Cc: Will Deacon <will....@arm.com>
Cc: Jingoo Han <jingo...@gmail.com>
Cc: Bjorn Helgaas <bhel...@google.com>
Cc: Mingkai Hu <mingk...@freescale.com>
Cc: Tanmay Inamdar <tina...@apm.com>
Cc: Murali Karicheri <m-kari...@ti.com>
Cc: Russell King <li...@armlinux.org.uk>
Cc: Bharat Kumar Gogada <bharat.ku...@xilinx.com>
Cc: Ray Jui <rj...@broadcom.com>
Cc: Wenrui Li <wenr...@rock-chips.com>
Cc: Shawn Lin <shaw...@rock-chips.com>
Cc: Minghuan Lian <minghu...@freescale.com>
Cc: Catalin Marinas <catalin...@arm.com>
Cc: Jon Mason <jonm...@broadcom.com>
Cc: Gabriele Paoloni <gabriele...@huawei.com>
Cc: Thomas Petazzoni <thomas.p...@free-electrons.com>
Cc: Joao Pinto <Joao....@synopsys.com>
Cc: Thierry Reding <thierry...@gmail.com>
Cc: Michal Simek <michal...@xilinx.com>
Cc: Stanimir Varbanov <svar...@mm-sol.com>
Cc: Zhou Wang <wang...@hisilicon.com>
Cc: Roy Zang <tie-fe...@freescale.com>

Lorenzo Pieralisi (20):
PCI: remove __weak tag from pci_remap_iospace()
PCI: fix pci_remap_iospace() remap attribute
asm-generic/io.h: add PCI config space remap interface
ARM64: implement pci_remap_cfgspace() interface
ARM: implement pci_remap_cfgspace() interface
PCI: ECAM: use pci_remap_cfgspace() to map config region
PCI: implement Devres interface to map PCI config space
PCI: xilinx: update PCI config space remap function
PCI: xilinx-nwl: update PCI config space remap function
PCI: spear13xx: update PCI config space remap function
PCI: rockchip: update PCI config space remap function
PCI: qcom: update PCI config space remap function
PCI: iproc-platform: update PCI config space remap function
PCI: hisi: update PCI config space remap function
PCI: designware: update PCI config space remap function
PCI: armada8k: update PCI config space remap function
PCI: xgene: update PCI config space remap function
PCI: tegra: update PCI config space remap function
PCI: layerscape: update PCI config space remap function
PCI: keystone-dw: update PCI config space remap function

Documentation/driver-model/devres.txt | 6 ++-
arch/arm/include/asm/io.h | 10 ++++
arch/arm/mm/ioremap.c | 7 +++
arch/arm64/include/asm/io.h | 10 ++++
drivers/pci/dwc/pci-keystone-dw.c | 2 +-
drivers/pci/dwc/pci-layerscape.c | 2 +-
drivers/pci/dwc/pcie-armada8k.c | 2 +-
drivers/pci/dwc/pcie-designware-host.c | 12 +++--
drivers/pci/dwc/pcie-hisi.c | 3 +-
drivers/pci/dwc/pcie-qcom.c | 2 +-
drivers/pci/dwc/pcie-spear13xx.c | 2 +-
drivers/pci/ecam.c | 5 +-
drivers/pci/host/pci-tegra.c | 4 +-
drivers/pci/host/pci-xgene.c | 4 +-
drivers/pci/host/pcie-iproc-platform.c | 3 +-
drivers/pci/host/pcie-rockchip.c | 2 +-
drivers/pci/host/pcie-xilinx-nwl.c | 2 +-
drivers/pci/host/pcie-xilinx.c | 2 +-
drivers/pci/pci.c | 86 +++++++++++++++++++++++++++++++++-
include/asm-generic/io.h | 9 ++++
include/linux/pci.h | 5 ++
21 files changed, 154 insertions(+), 26 deletions(-)

--
2.10.0

Lorenzo Pieralisi

unread,
Feb 27, 2017, 10:20:07 AM2/27/17
to
PCI configuration space should be mapped with a memory region type that
generates on the CPU host bus non-posted write transations. Update the
driver to use the devm_pci_remap_cfg* interface to make sure the correct
memory mappings for PCI configuration space are used.

Signed-off-by: Lorenzo Pieralisi <lorenzo....@arm.com>
Cc: Bjorn Helgaas <bhel...@google.com>
Cc: Stanimir Varbanov <svar...@mm-sol.com>
---
drivers/pci/dwc/pcie-qcom.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/pci/dwc/pcie-qcom.c b/drivers/pci/dwc/pcie-qcom.c
index e36abe0..bf7bd21 100644
--- a/drivers/pci/dwc/pcie-qcom.c
+++ b/drivers/pci/dwc/pcie-qcom.c
@@ -698,7 +698,7 @@ static int qcom_pcie_probe(struct platform_device *pdev)
return PTR_ERR(pcie->parf);

res = platform_get_resource_byname(pdev, IORESOURCE_MEM, "dbi");

Lorenzo Pieralisi

unread,
Feb 27, 2017, 10:20:07 AM2/27/17
to
PCI configuration space should be mapped with a memory region type that
generates on the CPU host bus non-posted write transations. Update the
driver to use the devm_pci_remap_cfg* interface to make sure the correct
memory mappings for PCI configuration space are used.

Signed-off-by: Lorenzo Pieralisi <lorenzo....@arm.com>
Cc: Bjorn Helgaas <bhel...@google.com>
Cc: Mingkai Hu <mingk...@freescale.com>
Cc: Minghuan Lian <minghu...@freescale.com>
Cc: Roy Zang <tie-fe...@freescale.com>
---
drivers/pci/dwc/pci-layerscape.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/pci/dwc/pci-layerscape.c b/drivers/pci/dwc/pci-layerscape.c
index 175c09e..e39beaa 100644
--- a/drivers/pci/dwc/pci-layerscape.c
+++ b/drivers/pci/dwc/pci-layerscape.c
@@ -281,7 +281,7 @@ static int __init ls_pcie_probe(struct platform_device *pdev)
pci->ops = pcie->drvdata->dw_pcie_ops;

dbi_base = platform_get_resource_byname(pdev, IORESOURCE_MEM, "regs");
- pci->dbi_base = devm_ioremap_resource(dev, dbi_base);
+ pci->dbi_base = devm_pci_remap_cfg_resource(dev, dbi_base);

Lorenzo Pieralisi

unread,
Feb 27, 2017, 10:20:09 AM2/27/17
to
The PCI bus specifications (rev 3.0, 3.2.5 "Transaction Ordering
and Posting") defines rules for PCI configuration space transactions
ordering and posting, that state that configuration writes
are non-posted transactions.

This rule is reinforced by the ARM v8 architecture reference manual
(issue A.k, Early Write Acknowledgment) that explicitly recommends
that No Early Write Acknowledgment attribute should be used to map
PCI configuration (write) transactions.

Current ioremap interface on ARM64 implements mapping functions
where the Early Write Acknowledgment hint is enabled, so they
cannot be used to map PCI configuration space in a PCI specs
compliant way.

Implement an ARM64 specific pci_remap_cfgspace() interface
that allows to map PCI config region with nGnRnE attributes, providing
a remap function that complies with PCI specifications and the ARMv8
architecture reference manual recommendations.

Signed-off-by: Lorenzo Pieralisi <lorenzo....@arm.com>
Cc: Will Deacon <will....@arm.com>
Cc: Catalin Marinas <catalin...@arm.com>
---
arch/arm64/include/asm/io.h | 10 ++++++++++
1 file changed, 10 insertions(+)

diff --git a/arch/arm64/include/asm/io.h b/arch/arm64/include/asm/io.h
index 0c00c87..7a03250 100644
--- a/arch/arm64/include/asm/io.h
+++ b/arch/arm64/include/asm/io.h
@@ -173,6 +173,16 @@ extern void __iomem *ioremap_cache(phys_addr_t phys_addr, size_t size);
#define iounmap __iounmap

/*
+ * PCI configuration space mapping function.
+ *
+ * PCI specifications disallows posted write configuration transactions.
+ * Add an arch specific pci_remap_cfgspace definition that is implemented
+ * through nGnRnE device memory attribute as recommended by the ARM v8
+ * Architecture reference manual Issue A.k B2.8.2 "Device memory".
+ */
+#define pci_remap_cfgspace(addr, size) __ioremap((addr), (size), __pgprot(PROT_DEVICE_nGnRnE))
+
+/*
* io{read,write}{16,32,64}be() macros
*/
#define ioread16be(p) ({ __u16 __v = be16_to_cpu((__force __be16)__raw_readw(p)); __iormb(); __v; })
--
2.10.0

Lorenzo Pieralisi

unread,
Feb 27, 2017, 10:20:10 AM2/27/17
to
PCI configuration space should be mapped with a memory region type that
generates on the CPU host bus non-posted write transations. Update the
driver to use the devm_pci_remap_cfg* interface to make sure the correct
memory mappings for PCI configuration space are used.

Signed-off-by: Lorenzo Pieralisi <lorenzo....@arm.com>
Cc: Bjorn Helgaas <bhel...@google.com>
Cc: Wenrui Li <wenr...@rock-chips.com>
Cc: Shawn Lin <shaw...@rock-chips.com>
---
drivers/pci/host/pcie-rockchip.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/pci/host/pcie-rockchip.c b/drivers/pci/host/pcie-rockchip.c
index 26ddd35..690a7b5 100644
--- a/drivers/pci/host/pcie-rockchip.c
+++ b/drivers/pci/host/pcie-rockchip.c
@@ -822,7 +822,7 @@ static int rockchip_pcie_parse_dt(struct rockchip_pcie *rockchip)
regs = platform_get_resource_byname(pdev,
IORESOURCE_MEM,
"axi-base");
- rockchip->reg_base = devm_ioremap_resource(dev, regs);
+ rockchip->reg_base = devm_pci_remap_cfg_resource(dev, regs);
if (IS_ERR(rockchip->reg_base))
return PTR_ERR(rockchip->reg_base);

--
2.10.0

Lorenzo Pieralisi

unread,
Feb 27, 2017, 10:20:10 AM2/27/17
to
PCI configuration space should be mapped with a memory region type that
generates on the CPU host bus non-posted write transations. Update the
driver to use correct memory mapping attributes to map config space
regions to enforce configuration space non-posted writes behaviour.

Signed-off-by: Lorenzo Pieralisi <lorenzo....@arm.com>
Cc: Bjorn Helgaas <bhel...@google.com>
Cc: Thierry Reding <thierry...@gmail.com>
---
drivers/pci/host/pci-tegra.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/host/pci-tegra.c b/drivers/pci/host/pci-tegra.c
index ed8a93f..2618f87 100644
--- a/drivers/pci/host/pci-tegra.c
+++ b/drivers/pci/host/pci-tegra.c
@@ -380,7 +380,7 @@ static struct tegra_pcie_bus *tegra_pcie_bus_alloc(struct tegra_pcie *pcie,
unsigned int busnr)
{
struct device *dev = pcie->dev;
- pgprot_t prot = pgprot_device(PAGE_KERNEL);
+ pgprot_t prot = pgprot_noncached(PAGE_KERNEL);
phys_addr_t cs = pcie->cs->start;
struct tegra_pcie_bus *bus;
unsigned int i;
@@ -1962,7 +1962,7 @@ static int tegra_pcie_parse_dt(struct tegra_pcie *pcie)
rp->pcie = pcie;
rp->np = port;

- rp->base = devm_ioremap_resource(dev, &rp->regs);
+ rp->base = devm_pci_remap_cfg_resource(dev, &rp->regs);
if (IS_ERR(rp->base))
return PTR_ERR(rp->base);

--
2.10.0

Lorenzo Pieralisi

unread,
Feb 27, 2017, 10:20:10 AM2/27/17
to
The PCI specifications (Rev 3.0, 3.2.5 "Transaction Ordering and
Posting") mandate non-posted configuration transactions. As further
highlighted in the PCIe specifications (4.0 - Rev0.3, "Ordering
Considerations for the Enhanced Configuration Access Mechanism"),
through ECAM and ECAM-derivative configuration mechanism, the memory
mapped transactions from the host CPU into Configuration Requests on the
PCI express fabric may create ordering problems for software because
writes to memory address are typically posted transactions (unless the
architecture can enforce through virtual address mapping non-posted
write transactions behaviour) but writes to Configuration Space are not
posted on the PCI express fabric.

Current DT and ACPI host bridge controllers map PCI configuration space
(ECAM and ECAM-derivative) into the virtual address space through
ioremap() calls, that are non-cacheable device accesses on most
architectures, but may provide "bufferable" or "posted" write semantics
in architecture like eg ARM/ARM64 that allow ioremap'ed regions writes
to be buffered in the bus connecting the host CPU to the PCI fabric;
this behaviour, as underlined in the PCIe specifications, may trigger
transactions ordering rules and must be prevented.

Introduce a new generic and explicit API to create a memory
mapping for ECAM and ECAM-derivative config space area that
defaults to ioremap_nocache() (which should provide a sane default
behaviour) but still allowing architectures on which ioremap_nocache()
results in posted write transactions to override the function
call with an arch specific implementation that complies with
the PCI specifications for configuration transactions.

Signed-off-by: Lorenzo Pieralisi <lorenzo....@arm.com>
Cc: Arnd Bergmann <ar...@arndb.de>
Cc: Will Deacon <will....@arm.com>
Cc: Bjorn Helgaas <bhel...@google.com>
Cc: Russell King <li...@armlinux.org.uk>
Cc: Catalin Marinas <catalin...@arm.com>
---
include/asm-generic/io.h | 9 +++++++++
1 file changed, 9 insertions(+)

diff --git a/include/asm-generic/io.h b/include/asm-generic/io.h
index 7ef015e..52dda81 100644
--- a/include/asm-generic/io.h
+++ b/include/asm-generic/io.h
@@ -915,6 +915,15 @@ extern void ioport_unmap(void __iomem *p);
#endif /* CONFIG_GENERIC_IOMAP */
#endif /* CONFIG_HAS_IOPORT_MAP */

+#ifndef pci_remap_cfgspace
+#define pci_remap_cfgspace pci_remap_cfgspace
+static inline void __iomem *pci_remap_cfgspace(phys_addr_t offset,
+ size_t size)
+{
+ return ioremap_nocache(offset, size);
+}
+#endif
+
#ifndef xlate_dev_kmem_ptr
#define xlate_dev_kmem_ptr xlate_dev_kmem_ptr
static inline void *xlate_dev_kmem_ptr(void *addr)
--
2.10.0

Lorenzo Pieralisi

unread,
Feb 27, 2017, 10:20:10 AM2/27/17
to
PCI configuration space should be mapped with a memory region type that
generates on the CPU host bus non-posted write transations. Update the
driver to use the devm_pci_remap_cfg* interface to make sure the correct
memory mappings for PCI configuration space are used.

Signed-off-by: Lorenzo Pieralisi <lorenzo....@arm.com>
Cc: Bjorn Helgaas <bhel...@google.com>
Cc: Tanmay Inamdar <tina...@apm.com>
---
drivers/pci/host/pci-xgene.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/host/pci-xgene.c b/drivers/pci/host/pci-xgene.c
index 1a61087..de19898 100644
--- a/drivers/pci/host/pci-xgene.c
+++ b/drivers/pci/host/pci-xgene.c
@@ -248,7 +248,7 @@ static int xgene_pcie_ecam_init(struct pci_config_window *cfg, u32 ipversion)
dev_err(dev, "can't get CSR resource\n");
return ret;
}
- port->csr_base = devm_ioremap_resource(dev, &csr);
+ port->csr_base = devm_pci_remap_cfg_resource(dev, &csr);
if (IS_ERR(port->csr_base))
return PTR_ERR(port->csr_base);

@@ -359,7 +359,7 @@ static int xgene_pcie_map_reg(struct xgene_pcie_port *port,
struct resource *res;

res = platform_get_resource_byname(pdev, IORESOURCE_MEM, "csr");
- port->csr_base = devm_ioremap_resource(dev, res);
+ port->csr_base = devm_pci_remap_cfg_resource(dev, res);
if (IS_ERR(port->csr_base))
return PTR_ERR(port->csr_base);

--
2.10.0

Lorenzo Pieralisi

unread,
Feb 27, 2017, 10:30:05 AM2/27/17
to
pci_remap_iospace() is marked as a weak symbol even though
no architecture is currently overriding it; given that its
implementation internals have already code paths that are
arch specific (ie PCI_IOBASE and ioremap_page_range() attributes)
there is no need to leave the weak symbol in the kernel since the
same functionality can be achieved by customizing per-arch the
corresponding functionality.

Remove the __weak symbol from pci_remap_iospace().

Signed-off-by: Lorenzo Pieralisi <lorenzo....@arm.com>
Cc: Arnd Bergmann <ar...@arndb.de>
Cc: Bjorn Helgaas <bhel...@google.com>
---
drivers/pci/pci.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 7904d02..bd98674 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -3363,7 +3363,7 @@ unsigned long __weak pci_address_to_pio(phys_addr_t address)
* Only architectures that have memory mapped IO functions defined
* (and the PCI_IOBASE value defined) should call this function.
*/
-int __weak pci_remap_iospace(const struct resource *res, phys_addr_t phys_addr)
+int pci_remap_iospace(const struct resource *res, phys_addr_t phys_addr)
{
#if defined(PCI_IOBASE) && defined(CONFIG_MMU)
unsigned long vaddr = (unsigned long)PCI_IOBASE + res->start;
--
2.10.0

Lorenzo Pieralisi

unread,
Feb 27, 2017, 10:30:06 AM2/27/17
to
Current ECAM kernel implementation uses ioremap() to map the ECAM
configuration space memory region; this is not safe in that on some
architectures the ioremap interface provides mappings that allow posted
write transactions. This, as highlighted in the PCIe specifications
(4.0 - Rev0.3, "Ordering Considerations for the Enhanced Configuration
Address Mechanism"), can create ordering issues for software because
posted writes transactions on the CPU host bus are non posted in the
PCI express fabric.

Update the ioremap() interface to use pci_remap_cfgspace() whose
mapping attributes guarantee that non-posted writes transactions
are issued for memory writes within the ECAM memory mapped address
region.

Signed-off-by: Lorenzo Pieralisi <lorenzo....@arm.com>
Cc: Bjorn Helgaas <bhel...@google.com>
---
drivers/pci/ecam.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/ecam.c b/drivers/pci/ecam.c
index 2fee61b..e2068a4 100644
--- a/drivers/pci/ecam.c
+++ b/drivers/pci/ecam.c
@@ -84,12 +84,13 @@ struct pci_config_window *pci_ecam_create(struct device *dev,
if (!cfg->winp)
goto err_exit_malloc;
for (i = 0; i < bus_range; i++) {
- cfg->winp[i] = ioremap(cfgres->start + i * bsz, bsz);
+ cfg->winp[i] =
+ pci_remap_cfgspace(cfgres->start + i * bsz, bsz);
if (!cfg->winp[i])
goto err_exit_iomap;
}
} else {
- cfg->win = ioremap(cfgres->start, bus_range * bsz);
+ cfg->win = pci_remap_cfgspace(cfgres->start, bus_range * bsz);
if (!cfg->win)
goto err_exit_iomap;
}
--
2.10.0

Lorenzo Pieralisi

unread,
Feb 27, 2017, 10:30:06 AM2/27/17
to
PCI configuration space should be mapped with a memory region type that
generate on the CPU host bus non-posted write transations. Update the
driver to use the devm_pci_remap_cfg* interface to make sure the correct
memory mappings for PCI configuration space are used.

Signed-off-by: Lorenzo Pieralisi <lorenzo....@arm.com>
Cc: Pratyush Anand <pratyus...@gmail.com>
Cc: Bjorn Helgaas <bhel...@google.com>
---
drivers/pci/dwc/pcie-spear13xx.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/pci/dwc/pcie-spear13xx.c b/drivers/pci/dwc/pcie-spear13xx.c
index 348f9c5..ea676a1 100644
--- a/drivers/pci/dwc/pcie-spear13xx.c
+++ b/drivers/pci/dwc/pcie-spear13xx.c
@@ -271,7 +271,7 @@ static int spear13xx_pcie_probe(struct platform_device *pdev)
}

dbi_base = platform_get_resource_byname(pdev, IORESOURCE_MEM, "dbi");
- pci->dbi_base = devm_ioremap_resource(dev, dbi_base);
+ pci->dbi_base = devm_pci_remap_cfg_resource(dev, dbi_base);
if (IS_ERR(pci->dbi_base)) {
dev_err(dev, "couldn't remap dbi base %p\n", dbi_base);
ret = PTR_ERR(pci->dbi_base);
--
2.10.0

Lorenzo Pieralisi

unread,
Feb 27, 2017, 10:30:06 AM2/27/17
to
The PCI bus specifications (rev 3.0, 3.2.5 "Transaction Ordering
and Posting") define rules for PCI configuration space transactions
ordering and posting, that state that configuration writes have to
be non-posted transactions.

Current ioremap interface on ARM provides mapping functions that
provide "bufferable" writes transactions (ie ioremap uses MT_DEVICE
memory type) aka posted writes, so PCI host controller drivers have
no arch interface to remap PCI configuration space with memory
attributes that comply with the PCI specifications for configuration
space.

Implement an ARM specific pci_remap_cfgspace() interface that allows to
map PCI config memory regions with MT_UNCACHED memory type (ie strongly
ordered - non-posted writes), providing a remap function that complies
with PCI specifications for config space transactions.

Signed-off-by: Lorenzo Pieralisi <lorenzo....@arm.com>
Cc: Arnd Bergmann <ar...@arndb.de>
Cc: Russell King <li...@armlinux.org.uk>
---
arch/arm/include/asm/io.h | 10 ++++++++++
arch/arm/mm/ioremap.c | 7 +++++++
2 files changed, 17 insertions(+)

diff --git a/arch/arm/include/asm/io.h b/arch/arm/include/asm/io.h
index 42871fb..74d1b09 100644
--- a/arch/arm/include/asm/io.h
+++ b/arch/arm/include/asm/io.h
@@ -187,6 +187,16 @@ static inline void pci_ioremap_set_mem_type(int mem_type) {}
extern int pci_ioremap_io(unsigned int offset, phys_addr_t phys_addr);

/*
+ * PCI configuration space mapping function.
+ *
+ * PCI specifications does not allow configuration write
+ * transactions to be posted. Add an arch specific
+ * pci_remap_cfgspace definition that is implemented
+ * through strongly ordered memory mappings.
+ */
+#define pci_remap_cfgspace pci_remap_cfgspace
+void __iomem *pci_remap_cfgspace(resource_size_t res_cookie, size_t size);
+/*
* Now, pick up the machine-defined IO definitions
*/
#ifdef CONFIG_NEED_MACH_IO_H
diff --git a/arch/arm/mm/ioremap.c b/arch/arm/mm/ioremap.c
index ff0eed2..fc91205 100644
--- a/arch/arm/mm/ioremap.c
+++ b/arch/arm/mm/ioremap.c
@@ -481,6 +481,13 @@ int pci_ioremap_io(unsigned int offset, phys_addr_t phys_addr)
__pgprot(get_mem_type(pci_ioremap_mem_type)->prot_pte));
}
EXPORT_SYMBOL_GPL(pci_ioremap_io);
+
+void __iomem *pci_remap_cfgspace(resource_size_t res_cookie, size_t size)
+{
+ return arch_ioremap_caller(res_cookie, size, MT_UNCACHED,
+ __builtin_return_address(0));
+}
+EXPORT_SYMBOL_GPL(pci_remap_cfgspace);
#endif

/*
--
2.10.0

Lorenzo Pieralisi

unread,
Feb 27, 2017, 10:30:07 AM2/27/17
to
PCI configuration space should be mapped with a memory region type that
generates on the CPU host bus non-posted write transations. Update the
driver to use the devm_pci_remap_cfg* interface to make sure the correct
memory mappings for PCI configuration space are used.

Signed-off-by: Lorenzo Pieralisi <lorenzo....@arm.com>
Cc: Bjorn Helgaas <bhel...@google.com>
Cc: Bharat Kumar Gogada <bharat.ku...@xilinx.com>
Cc: Michal Simek <michal...@xilinx.com>
---
drivers/pci/host/pcie-xilinx.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/pci/host/pcie-xilinx.c b/drivers/pci/host/pcie-xilinx.c
index 7f030f5..2fe2df5 100644
--- a/drivers/pci/host/pcie-xilinx.c
+++ b/drivers/pci/host/pcie-xilinx.c
@@ -606,7 +606,7 @@ static int xilinx_pcie_parse_dt(struct xilinx_pcie_port *port)
return err;
}

- port->reg_base = devm_ioremap_resource(dev, &regs);
+ port->reg_base = devm_pci_remap_cfg_resource(dev, &regs);
if (IS_ERR(port->reg_base))
return PTR_ERR(port->reg_base);

--
2.10.0

Lorenzo Pieralisi

unread,
Feb 27, 2017, 10:30:07 AM2/27/17
to
The introduction of the pci_remap_cfgspace() interface allows
PCI host controller drivers to map PCI config space through a
dedicated kernel interface. Current PCI host controller drivers
use the devm_ioremap_* Devres interfaces to map PCI configuration
space regions so in order to update them to the new
pci_remap_cfgspace() mapping interface a new set of Devres interfaces
should be implemented so that PCI host controller drivers can make
use of them.

Introduce two new functions in the PCI kernel layer and Devres
documentation:

- devm_pci_remap_cfgspace()
- devm_pci_remap_cfg_resource()

so that PCI host controller drivers can make use of them to map
PCI configuration space regions.

Signed-off-by: Lorenzo Pieralisi <lorenzo....@arm.com>
Cc: Jonathan Corbet <cor...@lwn.net>
Cc: Bjorn Helgaas <bhel...@google.com>
---
Documentation/driver-model/devres.txt | 6 ++-
drivers/pci/pci.c | 82 +++++++++++++++++++++++++++++++++++
include/linux/pci.h | 5 +++
3 files changed, 91 insertions(+), 2 deletions(-)

diff --git a/Documentation/driver-model/devres.txt b/Documentation/driver-model/devres.txt
index bf34d5b..e72587f 100644
--- a/Documentation/driver-model/devres.txt
+++ b/Documentation/driver-model/devres.txt
@@ -342,8 +342,10 @@ PER-CPU MEM
devm_free_percpu()

PCI
- pcim_enable_device() : after success, all PCI ops become managed
- pcim_pin_device() : keep PCI device enabled after release
+ devm_pci_remap_cfgspace() : ioremap PCI configuration space
+ devm_pci_remap_cfg_resource() : ioremap PCI configuration space resource
+ pcim_enable_device() : after success, all PCI ops become managed
+ pcim_pin_device() : keep PCI device enabled after release

PHY
devm_usb_get_phy()
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index bfb3c6e..1e435c2 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -3401,6 +3401,88 @@ void pci_unmap_iospace(struct resource *res)
#endif
}

+/**
+ * devm_pci_remap_cfgspace - Managed pci_remap_cfgspace()
+ * @dev: Generic device to remap IO address for
+ * @offset: BUS offset to map
+ * @size: Size of map
+ *
+ * Managed pci_remap_cfgspace(). Map is automatically unmapped on driver
+ * detach.
+ */
+void __iomem *devm_pci_remap_cfgspace(struct device *dev,
+ resource_size_t offset,
+ resource_size_t size)
+{
+ void __iomem **ptr, *addr;
+
+ ptr = devres_alloc(devm_ioremap_release, sizeof(*ptr), GFP_KERNEL);
+ if (!ptr)
+ return NULL;
+
+ addr = pci_remap_cfgspace(offset, size);
+ if (addr) {
+ *ptr = addr;
+ devres_add(dev, ptr);
+ } else
+ devres_free(ptr);
+
+ return addr;
+}
+EXPORT_SYMBOL(devm_pci_remap_cfgspace);
+
+/**
+ * devm_pci_remap_cfg_resource - check, request region and ioremap cfg resource
+ * @dev: generic device to handle the resource for
+ * @res: configuration space resource to be handled
+ *
+ * Checks that a resource is a valid memory region, requests the memory
+ * region and ioremaps with pci_remap_cfgspace() API that ensures the
+ * proper PCI configuration space memory attributes are guaranteed.
+ *
+ * All operations are managed and will be undone on driver detach.
+ *
+ * Returns a pointer to the remapped memory or an ERR_PTR() encoded error code
+ * on failure. Usage example:
+ *
+ * res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+ * base = devm_pci_remap_cfg_resource(&pdev->dev, res);
+ * if (IS_ERR(base))
+ * return PTR_ERR(base);
+ */
+void __iomem *devm_pci_remap_cfg_resource(struct device *dev,
+ struct resource *res)
+{
+ resource_size_t size;
+ const char *name;
+ void __iomem *dest_ptr;
+
+ BUG_ON(!dev);
+
+ if (!res || resource_type(res) != IORESOURCE_MEM) {
+ dev_err(dev, "invalid resource\n");
+ return IOMEM_ERR_PTR(-EINVAL);
+ }
+
+ size = resource_size(res);
+ name = res->name ?: dev_name(dev);
+
+ if (!devm_request_mem_region(dev, res->start, size, name)) {
+ dev_err(dev, "can't request region for resource %pR\n", res);
+ return IOMEM_ERR_PTR(-EBUSY);
+ }
+
+ dest_ptr = devm_pci_remap_cfgspace(dev, res->start, size);
+ if (!dest_ptr) {
+ dev_err(dev, "ioremap failed for resource %pR\n", res);
+ devm_release_mem_region(dev, res->start, size);
+ dest_ptr = IOMEM_ERR_PTR(-ENOMEM);
+ }
+
+ return dest_ptr;
+}
+EXPORT_SYMBOL(devm_pci_remap_cfg_resource);
+
static void __pci_set_master(struct pci_dev *dev, bool enable)
{
u16 old_cmd, cmd;
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 282ed32..5a3588b 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1199,6 +1199,11 @@ unsigned long pci_address_to_pio(phys_addr_t addr);
phys_addr_t pci_pio_to_address(unsigned long pio);
int pci_remap_iospace(const struct resource *res, phys_addr_t phys_addr);
void pci_unmap_iospace(struct resource *res);
+void __iomem *devm_pci_remap_cfgspace(struct device *dev,
+ resource_size_t offset,
+ resource_size_t size);
+void __iomem *devm_pci_remap_cfg_resource(struct device *dev,
+ struct resource *res);

static inline pci_bus_addr_t pci_bus_address(struct pci_dev *pdev, int bar)
{
--
2.10.0

Lorenzo Pieralisi

unread,
Feb 27, 2017, 10:30:08 AM2/27/17
to
PCI configuration space should be mapped with a memory region type that
generates on the CPU host bus non-posted write transations. Update the
driver to use the devm_pci_remap_cfg* interface to make sure the correct
memory mappings for PCI configuration space are used.

Signed-off-by: Lorenzo Pieralisi <lorenzo....@arm.com>
Cc: Bjorn Helgaas <bhel...@google.com>
Cc: Bharat Kumar Gogada <bharat.ku...@xilinx.com>
Cc: Michal Simek <michal...@xilinx.com>
---
drivers/pci/host/pcie-xilinx-nwl.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/pci/host/pcie-xilinx-nwl.c b/drivers/pci/host/pcie-xilinx-nwl.c
index 4c3e0ab..4b16b26 100644
--- a/drivers/pci/host/pcie-xilinx-nwl.c
+++ b/drivers/pci/host/pcie-xilinx-nwl.c
@@ -761,7 +761,7 @@ static int nwl_pcie_parse_dt(struct nwl_pcie *pcie,
pcie->phys_pcie_reg_base = res->start;

res = platform_get_resource_byname(pdev, IORESOURCE_MEM, "cfg");
- pcie->ecam_base = devm_ioremap_resource(dev, res);
+ pcie->ecam_base = devm_pci_remap_cfg_resource(dev, res);
if (IS_ERR(pcie->ecam_base))
return PTR_ERR(pcie->ecam_base);
pcie->phys_ecam_base = res->start;
--
2.10.0

Lorenzo Pieralisi

unread,
Feb 28, 2017, 5:50:06 AM2/28/17
to
Arnd, all,
Provided this series is acceptable as-is, I noticed that I have
to add a #define for pci_remap_cfgspace() on all arches that do
not include asm-generic/io.h in their asm/io.h, I missed that,
please shout if you see an easier way to implement a
pci_remap_cfgspace() fall-back to ioremap_nocache() on all arches
that do not override it.

Thanks !
Lorenzo

Arnd Bergmann

unread,
Mar 1, 2017, 11:20:10 AM3/1/17
to
On Mon, Feb 27, 2017 at 4:14 PM, Lorenzo Pieralisi
<lorenzo....@arm.com> wrote:
> pci_remap_iospace() is marked as a weak symbol even though
> no architecture is currently overriding it; given that its
> implementation internals have already code paths that are
> arch specific (ie PCI_IOBASE and ioremap_page_range() attributes)
> there is no need to leave the weak symbol in the kernel since the
> same functionality can be achieved by customizing per-arch the
> corresponding functionality.
>
> Remove the __weak symbol from pci_remap_iospace().

Good idea,

Acked-by: Arnd Bergmann <ar...@arndb.de>

Arnd Bergmann

unread,
Mar 1, 2017, 11:30:06 AM3/1/17
to
On Mon, Feb 27, 2017 at 4:14 PM, Lorenzo Pieralisi
<lorenzo....@arm.com> wrote:

> This patch series[1] addresses both issues in one go:
>
> - It updates the pci_remap_iospace() function to use a page mapping
> that guarantees non-posted write transactions for I/O space addresses
> - It adds a kernel API to remap PCI config space resources, so that
> architecture can override it with a mapping implementation that
> guarantees PCI specifications compliancy wrt non-posted write
> configuration transactions
> - It updates all PCI host controller implementations (and the generic
> ECAM layer) to use the newly introduced mapping interface
>
> Tested on Juno ECAM based interface (DT/ACPI).

This looks all good to me, nice work!

Arnd

Andy Shevchenko

unread,
Mar 1, 2017, 9:10:05 PM3/1/17
to
On Mon, Feb 27, 2017 at 5:14 PM, Lorenzo Pieralisi
<lorenzo....@arm.com> wrote:
> The introduction of the pci_remap_cfgspace() interface allows
> PCI host controller drivers to map PCI config space through a
> dedicated kernel interface. Current PCI host controller drivers
> use the devm_ioremap_* Devres interfaces to map PCI configuration
> space regions so in order to update them to the new
> pci_remap_cfgspace() mapping interface a new set of Devres interfaces
> should be implemented so that PCI host controller drivers can make
> use of them.
>
> Introduce two new functions in the PCI kernel layer and Devres
> documentation:
>
> - devm_pci_remap_cfgspace()
> - devm_pci_remap_cfg_resource()
>
> so that PCI host controller drivers can make use of them to map
> PCI configuration space regions.

Wouldn't you like to be consistent with current PCI API, i.e.:
1. devm_*() functions called pcim_*() in PCI.
2. If you may notice there is no separate pcim_*map*() stuff, they are
dynamically adapting to the case.

?

--
With Best Regards,
Andy Shevchenko

Lorenzo Pieralisi

unread,
Mar 2, 2017, 7:10:05 AM3/2/17
to
Hi Andy,

On Thu, Mar 02, 2017 at 01:54:42AM +0200, Andy Shevchenko wrote:
> On Mon, Feb 27, 2017 at 5:14 PM, Lorenzo Pieralisi
> <lorenzo....@arm.com> wrote:
> > The introduction of the pci_remap_cfgspace() interface allows
> > PCI host controller drivers to map PCI config space through a
> > dedicated kernel interface. Current PCI host controller drivers
> > use the devm_ioremap_* Devres interfaces to map PCI configuration
> > space regions so in order to update them to the new
> > pci_remap_cfgspace() mapping interface a new set of Devres interfaces
> > should be implemented so that PCI host controller drivers can make
> > use of them.
> >
> > Introduce two new functions in the PCI kernel layer and Devres
> > documentation:
> >
> > - devm_pci_remap_cfgspace()
> > - devm_pci_remap_cfg_resource()
> >
> > so that PCI host controller drivers can make use of them to map
> > PCI configuration space regions.
>
> Wouldn't you like to be consistent with current PCI API, i.e.:
> 1. devm_*() functions called pcim_*() in PCI.

I thought about that and did not do it because here we are remapping
resources that are _not_ PCI bus resources (ie it is not PCI BARs we
are remapping), keeping the devm_* prefix would be more consistent
to the typical device drivers remapping functions pattern (ie a
typical PCI host controller driver would mix devm_ and pcim_ calls
which is a bit hard to parse), that was my rationale.

I am not too fussed about that either way, I am happy to update it to
pcim_* though, it is Bjorn/Arnd's decision.

> 2. If you may notice there is no separate pcim_*map*() stuff, they are
> dynamically adapting to the case.

I do not understand what you mean here I would ask you to elaborate
a bit more please so that I can do something about it.

Thanks !
Lorenzo

Andy Shevchenko

unread,
Mar 2, 2017, 8:30:07 AM3/2/17
to
On Thu, Mar 2, 2017 at 2:05 PM, Lorenzo Pieralisi
<lorenzo....@arm.com> wrote:
> On Thu, Mar 02, 2017 at 01:54:42AM +0200, Andy Shevchenko wrote:
>> On Mon, Feb 27, 2017 at 5:14 PM, Lorenzo Pieralisi
>> <lorenzo....@arm.com> wrote:

+Cc: Tejun, who is initial author of PCI managed resources implementation.

> I thought about that and did not do it because here we are remapping
> resources that are _not_ PCI bus resources (ie it is not PCI BARs we
> are remapping), keeping the devm_* prefix would be more consistent
> to the typical device drivers remapping functions pattern (ie a
> typical PCI host controller driver would mix devm_ and pcim_ calls
> which is a bit hard to parse), that was my rationale.
>
> I am not too fussed about that either way, I am happy to update it to
> pcim_* though, it is Bjorn/Arnd's decision.

I would vote for pcim_*() variant.

>> 2. If you may notice there is no separate pcim_*map*() stuff, they are
>> dynamically adapting to the case.
>
> I do not understand what you mean here I would ask you to elaborate
> a bit more please so that I can do something about it.

Oh, sorry, there are two examples currently, i.e.
pci_enable_msi()/pci_enable_msix() and pci_request_region*() which has
no "m" in the name, but are managed on release by pcim_release().
Some developers consider this as a bad idea, but so far no patch has
been sent to introduce pcim_*() variants of those.

So, regarding to your stuff, I would stick with "pcim" prefix.

Lorenzo Pieralisi

unread,
Mar 2, 2017, 1:10:05 PM3/2/17
to
Thanks a lot Arnd. There is a pending issue to complete the series,
that is related to asm-generic/io.h, which is not included by all
arches, therefore, on top of adding a default inline for
pci_remap_cfg_space() in asm-generic/io.h I will have to patch
all arches that do not include asm-generic/io.h (eg x86) to
make the series completely functional - ie I will have to add a

#define pci_remap_cfgspace ioremap_nocache

in every given asm/io.h that does not include <asm-generic/io.h>

Another option would consist in adding the default inline for
pci_remap_cfg_space() in asm-generic/pci_iomap.h which seems to
be included by all arches.

I think the first option, even if it requires more extensive
patching is more complete but please all let me know how
you want me to fix this niggle, I am not sure I grasp the background
policy behind asm-generic files entirely so I'd better ask.

Thanks !
Lorenzo

Thierry Reding

unread,
Mar 2, 2017, 3:20:05 PM3/2/17
to
On Thu, Mar 02, 2017 at 02:24:06PM -0500, Tejun Heo wrote:
> Hello,
>
> On Thu, Mar 02, 2017 at 02:50:00PM +0200, Andy Shevchenko wrote:
> > > I thought about that and did not do it because here we are remapping
> > > resources that are _not_ PCI bus resources (ie it is not PCI BARs we
> > > are remapping), keeping the devm_* prefix would be more consistent
> > > to the typical device drivers remapping functions pattern (ie a
> > > typical PCI host controller driver would mix devm_ and pcim_ calls
> > > which is a bit hard to parse), that was my rationale.
> > >
> > > I am not too fussed about that either way, I am happy to update it to
> > > pcim_* though, it is Bjorn/Arnd's decision.
> >
> > I would vote for pcim_*() variant.
>
> Me too, for brevity.

devm_* is equally brief. Also, all existing pcim_*() functions take a
struct pci_dev * as their first argument, because they operate on the
PCI devices. However in this case the devm_pci_remap_*() functions do
not operate on PCI devices. Rather they operate on the struct device
that represents the PCI host bridge. Therefore I think devm_ is more
appropriate here.

Thierry
signature.asc

Tejun Heo

unread,
Mar 2, 2017, 6:20:04 PM3/2/17
to
Hello,

On Thu, Mar 02, 2017 at 02:50:00PM +0200, Andy Shevchenko wrote:
> > I thought about that and did not do it because here we are remapping
> > resources that are _not_ PCI bus resources (ie it is not PCI BARs we
> > are remapping), keeping the devm_* prefix would be more consistent
> > to the typical device drivers remapping functions pattern (ie a
> > typical PCI host controller driver would mix devm_ and pcim_ calls
> > which is a bit hard to parse), that was my rationale.
> >
> > I am not too fussed about that either way, I am happy to update it to
> > pcim_* though, it is Bjorn/Arnd's decision.
>
> I would vote for pcim_*() variant.

Me too, for brevity.

> >> 2. If you may notice there is no separate pcim_*map*() stuff, they are
> >> dynamically adapting to the case.
> >
> > I do not understand what you mean here I would ask you to elaborate
> > a bit more please so that I can do something about it.
>
> Oh, sorry, there are two examples currently, i.e.
> pci_enable_msi()/pci_enable_msix() and pci_request_region*() which has
> no "m" in the name, but are managed on release by pcim_release().
> Some developers consider this as a bad idea, but so far no patch has
> been sent to introduce pcim_*() variants of those.
>
> So, regarding to your stuff, I would stick with "pcim" prefix.

Sounds good to me.

Thanks.

--
tejun

Bjorn Helgaas

unread,
Mar 16, 2017, 5:20:05 PM3/16/17
to
[+cc Luis]
I'm fine with this conceptually, but I think it would make more sense
if the name weren't specific to PCI or config space, e.g.,
ioremap_nopost() or something.

Bjorn Helgaas

unread,
Mar 16, 2017, 5:50:06 PM3/16/17
to
[+cc Luis]

On Mon, Feb 27, 2017 at 03:14:13PM +0000, Lorenzo Pieralisi wrote:
> According to the PCI local bus specifications (Revision 3.0, 3.2.5),
> I/O Address space transactions are non-posted. On architectures where
> I/O space is implemented through a chunk of memory mapped space mapped
> to PCI address space (ie IA64/ARM/ARM64) the memory mapping for the
> region backing I/O Address Space transactions determines the I/O
> transactions attributes (before the transactions actually reaches the
> PCI bus where it is handled according to the PCI specifications).
>
> Current pci_remap_iospace() interface, that is used to map the PCI I/O
> Address Space into virtual address space, use pgprot_device() as memory
> attribute for the virtual address mapping, that in some architectures
> (ie ARM64) provides non-cacheable but write bufferable mappings (ie
> posted writes), which clash with the non-posted write behaviour for I/O
> Address Space mandated by the PCI specifications.
>
> Update the prot ioremap_page_range() parameter in pci_remap_iospace()
> to pgprot_noncached to ensure that the virtual mapping backing
> I/O Address Space guarantee non-posted write transactions issued
> when addressing I/O Address Space through the MMIO mapping.
>
> Signed-off-by: Lorenzo Pieralisi <lorenzo....@arm.com>
> Cc: Arnd Bergmann <ar...@arndb.de>
> Cc: Will Deacon <will....@arm.com>
> Cc: Bjorn Helgaas <bhel...@google.com>
> Cc: Russell King <li...@armlinux.org.uk>
> Cc: Catalin Marinas <catalin...@arm.com>
> ---
> drivers/pci/pci.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index bd98674..bfb3c6e 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -3375,7 +3375,7 @@ int pci_remap_iospace(const struct resource *res, phys_addr_t phys_addr)
> return -EINVAL;
>
> return ioremap_page_range(vaddr, vaddr + resource_size(res), phys_addr,
> - pgprot_device(PAGE_KERNEL));
> + pgprot_noncached(PAGE_KERNEL));

pgprot_device() is equivalent to pgprot_noncached() on all arches
except ARM64, and I trust you're doing the right thing on ARM64, so
I'm fine with this from a PCI perspective.

I do find this puzzling because I naively expected pgprot_noncached()
to match up with ioremap_nocache(), and apparently it doesn't.

For example, ARM64 ioremap_nocache() uses PROT_DEVICE_nGnRE, which
doesn't match the MT_DEVICE_nGnRnE in pgprot_noncached().

The point of these patches is to use non-posted mappings. Apparently
you can do that with pgprot_noncached() here, but ioremap_nocache()
isn't enough for the config space mappings?

I suppose that's a consequence of the pgprot_noncached() vs
ioremap_nocache() mismatch, but this is all extremely confusing.

> #else
> /* this architecture does not have memory mapped I/O space,
> so this function should never be called */
> --
> 2.10.0
>
>
> _______________________________________________
> linux-arm-kernel mailing list
> linux-ar...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

Luis R. Rodriguez

unread,
Mar 16, 2017, 8:10:04 PM3/16/17
to
So... I take it this is actually fixing a series of odd issues also,
do we at least have some reports or actual issues ?
Seems reasonable to me -- but are there other buses that could use this already
as well ? Wouldn't these other buses also run into similar issues ? Can someone
also bounce me a copy of the patches that use this ?

While at it, please add some documentation too, the above commit log is huge,
and yet for the person eyeballing the code they won't have any clue why this
was added exactly. Since this is about helping with picking the right
ioremap due to certain semantics / requirements on the PCI config space,
we should clarify then what exactly are the expectations here. The clearer
you are the less in trouble we can get later.

Luis

Luis R. Rodriguez

unread,
Mar 16, 2017, 9:00:06 PM3/16/17
to
On Thu, Mar 16, 2017 at 04:48:44PM -0500, Bjorn Helgaas wrote:
> [+cc Luis]
>
> On Mon, Feb 27, 2017 at 03:14:13PM +0000, Lorenzo Pieralisi wrote:
> > According to the PCI local bus specifications (Revision 3.0, 3.2.5),
> > I/O Address space transactions are non-posted. On architectures where
> > I/O space is implemented through a chunk of memory mapped space mapped
> > to PCI address space (ie IA64/ARM/ARM64) the memory mapping for the
> > region backing I/O Address Space transactions determines the I/O
> > transactions attributes (before the transactions actually reaches the
> > PCI bus where it is handled according to the PCI specifications).
> >
> > Current pci_remap_iospace() interface, that is used to map the PCI I/O
> > Address Space into virtual address space, use pgprot_device() as memory
> > attribute for the virtual address mapping, that in some architectures
> > (ie ARM64) provides non-cacheable but write bufferable mappings (ie
> > posted writes),

<sarcasm>
Gee wiz, I am glad this is so well documented.
</sarcasm>

> > which clash with the non-posted write behaviour for I/O
> > Address Space mandated by the PCI specifications.
> >
> > Update the prot ioremap_page_range() parameter in pci_remap_iospace()
> > to pgprot_noncached to ensure that the virtual mapping backing
> > I/O Address Space guarantee non-posted write transactions issued
> > when addressing I/O Address Space through the MMIO mapping.

How did we end up with pgprot_device() then in the first place Liviu Dudau [0] ?
I ask for two reasons:

a) should we then use a Fixes tag for this patch ?
b) it does not seem clear what the semantics for pgprot_device() or even
pgprot_noncached(). Can you add some ?

8b921acfeffdb ("PCI: Add pci_remap_iospace() to map bus I/O resources")

Also this patch claims archs can override this call alone, as its __weak.
So is the right thing to do to change pci_remap_iospace() to pgprot_noncached()
or is it for archs to add their own pci_remap_iospace()? If so why ? Without
proper semantics defined for these helpers this is all fuzzy.
This is for iospace it seems, so the other patch I think was for
config space.

Luis

> I suppose that's a consequence of the pgprot_noncached() vs
> ioremap_nocache() mismatch, but this is all extremely confusing.
>
> > #else
> > /* this architecture does not have memory mapped I/O space,
> > so this function should never be called */
> > --
> > 2.10.0
> >
> >
> > _______________________________________________
> > linux-arm-kernel mailing list
> > linux-ar...@lists.infradead.org
> > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
>

--
Luis Rodriguez, SUSE LINUX GmbH
Maxfeldstrasse 5; D-90409 Nuernberg

Liviu Dudau

unread,
Mar 17, 2017, 8:40:06 AM3/17/17
to
[replying using personal email as the corporate email system is taking its sweet time
to deliver the email to my inbox]

I've asked the people with the right knowledge about the correct API to use (Hi Catalin!),
and during the review it did not throw any red flags. I guess, given Bjorn's comment,
that everyone assumed AArch64 is the same as all other architectures and pgprot_device
is synonymous to pgprot_noncached.

>
> a) should we then use a Fixes tag for this patch ?

I'm not aware of issues being reported, but Lorenzo might have more info on this.

> b) it does not seem clear what the semantics for pgprot_device() or even
> pgprot_noncached(). Can you add some ?
>
> 8b921acfeffdb ("PCI: Add pci_remap_iospace() to map bus I/O resources")
>
> Also this patch claims archs can override this call alone, as its __weak.
> So is the right thing to do to change pci_remap_iospace() to pgprot_noncached()
> or is it for archs to add their own pci_remap_iospace()? If so why ? Without
> proper semantics defined for these helpers this is all fuzzy.

That was the initial intention, to let arches / platforms overwrite the whole
pci_remap_iospace(). I guess the reality is that no one needs to overwrite it except
for the AArch64 quirk, so probably easier to remove the __weak and fix the attributes for arm64.

Best regards,
Liviu
--
_
_|_|_
('_')
(⊃ )⊃
|_|_|

Luis R. Rodriguez

unread,
Mar 17, 2017, 12:30:06 PM3/17/17
to
On Fri, Mar 17, 2017 at 10:43:39AM +0000, Liviu Dudau wrote:
> On Fri, Mar 17, 2017 at 01:33:21AM +0100, Luis R. Rodriguez wrote:
> > a) should we then use a Fixes tag for this patch ?
>
> I'm not aware of issues being reported, but Lorenzo might have more info on this.

Lorenzo ? If not what exactly made you discover this ? If it is a fix, and only
ARM64 is implicated, seems like a worthy change to consider for stable for the
sake of stable ARM64 kernels. But, that would leave the PCI config space without
a simple 1 liner fix too -- so maybe its not worth it. Distributions wanting
to support ARM64 however would like to carry these changes, so some annotations
such as Fixes should help.

> > b) it does not seem clear what the semantics for pgprot_device() or even
> > pgprot_noncached(). Can you add some ?
> >
> > 8b921acfeffdb ("PCI: Add pci_remap_iospace() to map bus I/O resources")
> >
> > Also this patch claims archs can override this call alone, as its __weak.
> > So is the right thing to do to change pci_remap_iospace() to pgprot_noncached()
> > or is it for archs to add their own pci_remap_iospace()? If so why ? Without
> > proper semantics defined for these helpers this is all fuzzy.
>
> That was the initial intention, to let arches / platforms overwrite the whole
> pci_remap_iospace(). I guess the reality is that no one needs to overwrite it except
> for the AArch64 quirk, so probably easier to remove the __weak and fix the attributes for arm64.

Sounds much more reasonable to me.

Luis

John Garry

unread,
Mar 20, 2017, 6:30:06 AM3/20/17
to
We (Huawei) originally raised the concern [1], but have not actually
experienced any related issue.

Thanks,
John

[1]
http://lists.infradead.org/pipermail/linux-arm-kernel/2017-January/477353.html
> .
>

Bjorn Helgaas

unread,
Mar 20, 2017, 12:10:06 PM3/20/17
to
On Fri, Mar 17, 2017 at 01:33:21AM +0100, Luis R. Rodriguez wrote:
> On Thu, Mar 16, 2017 at 04:48:44PM -0500, Bjorn Helgaas wrote:
> > [+cc Luis]
> >
> > On Mon, Feb 27, 2017 at 03:14:13PM +0000, Lorenzo Pieralisi wrote:
> > > According to the PCI local bus specifications (Revision 3.0, 3.2.5),
> > > I/O Address space transactions are non-posted. On architectures where
> > > I/O space is implemented through a chunk of memory mapped space mapped
> > > to PCI address space (ie IA64/ARM/ARM64) the memory mapping for the
> > > region backing I/O Address Space transactions determines the I/O
> > > transactions attributes (before the transactions actually reaches the
> > > PCI bus where it is handled according to the PCI specifications).
> > >
> > > Current pci_remap_iospace() interface, that is used to map the PCI I/O
> > > Address Space into virtual address space, use pgprot_device() as memory
> > > attribute for the virtual address mapping, that in some architectures
> > > (ie ARM64) provides non-cacheable but write bufferable mappings (ie
> > > posted writes),
>
> <sarcasm>
> Gee wiz, I am glad this is so well documented.
> </sarcasm>

I'm not sure this is actionable feedback. The two paragraphs above
seem clear and useful to me. How would you like to see them improved?

> > > ...
> > > @@ -3375,7 +3375,7 @@ int pci_remap_iospace(const struct resource *res, phys_addr_t phys_addr)
> > > return -EINVAL;
> > >
> > > return ioremap_page_range(vaddr, vaddr + resource_size(res), phys_addr,
> > > - pgprot_device(PAGE_KERNEL));
> > > + pgprot_noncached(PAGE_KERNEL));
> >
> > ...
> > I do find this puzzling because I naively expected pgprot_noncached()
> > to match up with ioremap_nocache(), and apparently it doesn't.
> >
> > For example, ARM64 ioremap_nocache() uses PROT_DEVICE_nGnRE, which
> > doesn't match the MT_DEVICE_nGnRnE in pgprot_noncached().
> >
> > The point of these patches is to use non-posted mappings. Apparently
> > you can do that with pgprot_noncached() here, but ioremap_nocache()
> > isn't enough for the config space mappings?
>
> This is for iospace it seems, so the other patch I think was for
> config space.

I understand that 02/20 is for PCI I/O port space and 03/20 is for PCI
config space. I'm confused because I thought we wanted the same
non-posted mapping for both, but looks like they're different.

Patch 02/20 uses ioremap_page_range(..., pgprot_noncached(PAGE_KERNEL))
to map PCI I/O port space:

#define pgprot_noncached(prot) \
__pgprot_modify(prot, PTE_ATTRINDX_MASK, PTE_ATTRINDX(MT_DEVICE_nGnRnE) | PTE_PXN | PTE_UXN)

Patch 03/20 uses ioremap_nocache() to map PCI config space:

#define ioremap_nocache(addr, size) __ioremap((addr), (size), __pgprot(PROT_DEVICE_nGnRE))

So the I/O port mapping uses MT_DEVICE_nGnRnE, while the config space
mapping uses PROT_DEVICE_nGnRE, which looks different.

Bjorn

Lorenzo Pieralisi

unread,
Mar 20, 2017, 12:20:04 PM3/20/17
to
On Fri, Mar 17, 2017 at 05:26:18PM +0100, Luis R. Rodriguez wrote:
> On Fri, Mar 17, 2017 at 10:43:39AM +0000, Liviu Dudau wrote:
> > On Fri, Mar 17, 2017 at 01:33:21AM +0100, Luis R. Rodriguez wrote:
> > > a) should we then use a Fixes tag for this patch ?
> >
> > I'm not aware of issues being reported, but Lorenzo might have more info on this.
>
> Lorenzo ? If not what exactly made you discover this ? If it is a fix, and only
> ARM64 is implicated, seems like a worthy change to consider for stable for the
> sake of stable ARM64 kernels. But, that would leave the PCI config space without
> a simple 1 liner fix too -- so maybe its not worth it. Distributions wanting
> to support ARM64 however would like to carry these changes, so some annotations
> such as Fixes should help.

It started with this thread:

http://lists.infradead.org/pipermail/linux-arm-kernel/2017-January/477353.html

this series is not fixing any current issue I am aware of (but I am not
keen on leaving code as-is either) hence adding a Fixes: tag is problematic.

I would leave stable kernels alone for the time being.

Lorenzo

Bjorn Helgaas

unread,
Mar 20, 2017, 12:30:09 PM3/20/17
to
> > ...

> > > +#ifndef pci_remap_cfgspace
> > > +#define pci_remap_cfgspace pci_remap_cfgspace
> > > +static inline void __iomem *pci_remap_cfgspace(phys_addr_t offset,
> > > + size_t size)
> > > +{
> > > + return ioremap_nocache(offset, size);
> > > +}
> >
> > I'm fine with this conceptually, but I think it would make more sense
> > if the name weren't specific to PCI or config space, e.g.,
> > ioremap_nopost() or something.
>
> Seems reasonable to me -- but are there other buses that could use
> this already as well ? Wouldn't these other buses also run into
> similar issues ? Can someone also bounce me a copy of the patches
> that use this ?

I forwarded a copy of the initial posting of all 20 patches to you.

> While at it, please add some documentation too, the above commit log
> is huge, and yet for the person eyeballing the code they won't have
> any clue why this was added exactly. Since this is about helping
> with picking the right ioremap due to certain semantics /
> requirements on the PCI config space, we should clarify then what
> exactly are the expectations here. The clearer you are the less in
> trouble we can get later.

I think the documentation above does contain the critical facts that:

- Accesses to PCI config space and PCI I/O port space should be
non-posted

- ARM64 currently maps them as posted, which may cause ordering
problems

It's possible the changelog could be made clearer by *removing* text,
but I'm not sure what Lorenzo could *add* to make it better.

This particular patch is generic (not ARM64-specific), so maybe all we
need to say here is that PCI requires non-posted mappings in some
cases, and we currently don't have a generic ioremap() to make them,
so we're adding a way for an arch to provide such an interface.

Bjorn

Lorenzo Pieralisi

unread,
Mar 20, 2017, 12:40:06 PM3/20/17
to
On Mon, Mar 20, 2017 at 11:06:36AM -0500, Bjorn Helgaas wrote:

[...]
On ARM64 (PATCH 4) and ARM (PATCH 5) we override pci_remap_cfgspace()
with implementations that provide non-posted writes bus attributes,
PATCH 3 is just there to provide a "safe" (well, I need input on that)
fall-back.

Thanks,
Lorenzo

Bjorn Helgaas

unread,
Mar 20, 2017, 12:40:07 PM3/20/17
to
Ah, thanks, that makes sense to me finally! After patch 4,
pci_remap_cfgspace() uses PROT_DEVICE_nGnRnE, which does look like the
MT_DEVICE_nGnRnE used in the ioremap_page_range() path.

I wonder if we can get rid of pgprot_device() altogether? Patch 2
removed the use in pci_remap_iospace(), and patch 18 removes the use
in tegra_pcie_bus_alloc().

Bjorn

Russell King - ARM Linux

unread,
Mar 20, 2017, 1:00:05 PM3/20/17
to
On Mon, Feb 27, 2017 at 03:14:16PM +0000, Lorenzo Pieralisi wrote:
> The PCI bus specifications (rev 3.0, 3.2.5 "Transaction Ordering
> and Posting") define rules for PCI configuration space transactions
> ordering and posting, that state that configuration writes have to
> be non-posted transactions.
>
> Current ioremap interface on ARM provides mapping functions that
> provide "bufferable" writes transactions (ie ioremap uses MT_DEVICE
> memory type) aka posted writes, so PCI host controller drivers have
> no arch interface to remap PCI configuration space with memory
> attributes that comply with the PCI specifications for configuration
> space.
>
> Implement an ARM specific pci_remap_cfgspace() interface that allows to
> map PCI config memory regions with MT_UNCACHED memory type (ie strongly
> ordered - non-posted writes), providing a remap function that complies
> with PCI specifications for config space transactions.

Doesn't this have the side effect that configuration writes can bypass
writes to device memory if there isn't an intervening dsb? (They
probably can do on some CPUs today anyway.)

So, in any case, this looks like an improvement:

Acked-by: Russell King <rmk+k...@armlinux.org.uk>

Thanks.

--
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

Lorenzo Pieralisi

unread,
Mar 20, 2017, 2:50:11 PM3/20/17
to
On Thu, Mar 16, 2017 at 04:12:43PM -0500, Bjorn Helgaas wrote:
I agree with you since there is not much PCI specific in the ioremap
implementation we are introducing (and it may be used more by other
subsystems too), I suspect it is all about finding a reasonable
ioremap_ naming extension (ioremap_nowb() ?).

On the other hand having an explicit pci_remap_cfgspace() interface
would simplify preventing drivers to sneak into the kernel with the
wrong ioremap API for config space (even though when all existing
ones are converted new drivers would just be based on existing ones
so it should be fine).

Suggestions welcome.

Thanks !
Lorenzo

Lorenzo Pieralisi

unread,
Mar 21, 2017, 11:30:06 AM3/21/17
to
Hi Russell,

On Mon, Mar 20, 2017 at 04:43:55PM +0000, Russell King - ARM Linux wrote:
> On Mon, Feb 27, 2017 at 03:14:16PM +0000, Lorenzo Pieralisi wrote:
> > The PCI bus specifications (rev 3.0, 3.2.5 "Transaction Ordering
> > and Posting") define rules for PCI configuration space transactions
> > ordering and posting, that state that configuration writes have to
> > be non-posted transactions.
> >
> > Current ioremap interface on ARM provides mapping functions that
> > provide "bufferable" writes transactions (ie ioremap uses MT_DEVICE
> > memory type) aka posted writes, so PCI host controller drivers have
> > no arch interface to remap PCI configuration space with memory
> > attributes that comply with the PCI specifications for configuration
> > space.
> >
> > Implement an ARM specific pci_remap_cfgspace() interface that allows to
> > map PCI config memory regions with MT_UNCACHED memory type (ie strongly
> > ordered - non-posted writes), providing a remap function that complies
> > with PCI specifications for config space transactions.
>
> Doesn't this have the side effect that configuration writes can bypass
> writes to device memory if there isn't an intervening dsb? (They
> probably can do on some CPUs today anyway.)

I assumed that in plain terms, the difference between MT_DEVICE and
MT_UNCACHED is write posting (aka bufferable) behaviour (across CPU
architecture versions) and that does not affect write ordering rules.

You and Catalin have more insights into ARM 32-bit memory types so
I definitely need your input here to be comprehensive and avoid
pitfalls, let me know if you have some specific CPUs in mind on
which this may trigger a regression.

> So, in any case, this looks like an improvement:
>
> Acked-by: Russell King <rmk+k...@armlinux.org.uk>

Thank you !

Lorenzo

Russell King - ARM Linux

unread,
Mar 21, 2017, 1:00:09 PM3/21/17
to
On Tue, Mar 21, 2017 at 03:26:36PM +0000, Lorenzo Pieralisi wrote:
> I assumed that in plain terms, the difference between MT_DEVICE and
> MT_UNCACHED is write posting (aka bufferable) behaviour (across CPU
> architecture versions) and that does not affect write ordering rules.

Having looked it up in the ARM ARM, I think we're fine. Device and
strongly ordered accesses are both ordered as one group, so can be
thought of as the same. The only difference is that Device can
complete from the CPU's perspective before it hits the peripheral,
whereas strongly ordered only completes once it's hit the peripheral.

So this looks like exactly the right thing to do here.

Lorenzo Pieralisi

unread,
Mar 22, 2017, 11:10:08 AM3/22/17
to
Hi Bjorn, Arnd,

On Thu, Mar 16, 2017 at 04:12:43PM -0500, Bjorn Helgaas wrote:
I would like to respin shortly so I have to make a decision on the
interface, either:

(1) I keep it a PCI only interface (which means I can even move it to
include/linux/pci.h and arch*/include/asm/pci.h to override it)

or

(2) We add it as a generic ioremap_* interface (I am not sure though
that's really useful other than to map PCI config space, actually
the reasoning behind the naming was to limit its usage to PCI
config space mappings)

I take it as Bjorn is keener on (2), just running a final check before
putting v2 together to make progress.

Thanks !
Lorenzo

Arnd Bergmann

unread,
Mar 22, 2017, 11:20:05 AM3/22/17
to
I'm fine with either way.

Arnd

Bjorn Helgaas

unread,
Mar 22, 2017, 12:40:10 PM3/22/17
to
The point of this seems to be to use non-posted mappings for both I/O
port space and ECAM (config) space. So I think the most readable way
would be to have generic definitions like this:

#ifndef ioremap_nopost
#define ioremap_nopost ioremap_nocache
#endif

#ifndef pgprot_nonposted
#define pgprot_nonposted(prot) pgprot_noncached(prot)
#endif

and let ARM/ARM64 implement their own versions of those. Then the
devm code might fit naturally in lib/devres.c, e.g.,

void __iomem *devm_ioremap_nopost(...) { ... }

and we probably wouldn't need the pci_remap_cfgspace() wrapper, e.g.,
we could do:

return ioremap_page_range(..., pgprot_nonposted(PAGE_KERNEL));

cfg->win = ioremap_nopost(...);

But I might be oversimplifying things.

BTW, I tried to apply the v1 series on my "master" branch (v4.11-rc1)
and patch 19 didn't apply cleanly. It's trivial and I can fix it up
by hand, so not really a problem, just FYI.

Bjorn
0 new messages