[PATCH v4 0/5] PCI: allocate 64bit mmio pref

Yinghai Lu

Dec 10, 2013, 2:00:01 AM
This series reworks 64-bit MMIO allocation; it could help Guo Chao <y...@linux.vnet.ibm.com> with MMIO allocation on powerpc.
It tries to assign 64-bit resources above 4G first.

It is based on the current pci/for-linus.

-v2: update after the patch that moves device_del() down into pci_destroy_dev();
add "Try best to allocate pref mmio 64bit above 4G".

-v3: refresh and send out after the pci_clip_resource() changes,
as Bjorn is not happy with attachments.

-v4: make pcibios_resource_to_bus() take the bus directly.

Yinghai Lu (5):
PCI: pcibus address to resource converting take bus instead of dev
PCI: Don't use 4G bus address directly in resource allocation
PCI: Try to allocate mem64 above 4G at first
PCI: Try best to allocate pref mmio 64bit above 4g
PCI: Sort pci root bus resources list

arch/alpha/kernel/pci-sysfs.c | 4 +-
arch/powerpc/kernel/pci-common.c | 4 +-
arch/powerpc/kernel/pci_of_scan.c | 4 +-
arch/sparc/kernel/pci.c | 6 +-
arch/x86/include/asm/pci.h | 1 -
drivers/pci/bus.c | 73 +++++++++++++++---
drivers/pci/host-bridge.c | 24 +++---
drivers/pci/probe.c | 18 ++---
drivers/pci/quirks.c | 2 +-
drivers/pci/rom.c | 2 +-
drivers/pci/setup-bus.c | 149 +++++++++++++++++++++++-------------
drivers/pci/setup-res.c | 16 +++-
drivers/pcmcia/i82092.c | 2 +-
drivers/pcmcia/yenta_socket.c | 6 +-
drivers/scsi/sym53c8xx_2/sym_glue.c | 5 +-
drivers/video/arkfb.c | 2 +-
drivers/video/s3fb.c | 2 +-
drivers/video/vt8623fb.c | 2 +-
include/linux/pci.h | 8 +-
19 files changed, 217 insertions(+), 113 deletions(-)

--
1.8.4


Yinghai Lu

Dec 10, 2013, 2:00:02 AM
Some x86 systems expose 64-bit MMIO above 4G in _CRS as a non-pref MMIO range:
[ 49.415281] PCI host bridge to bus 0000:00
[ 49.419921] pci_bus 0000:00: root bus resource [bus 00-1e]
[ 49.426107] pci_bus 0000:00: root bus resource [io 0x0000-0x0cf7]
[ 49.433041] pci_bus 0000:00: root bus resource [io 0x1000-0x5fff]
[ 49.440010] pci_bus 0000:00: root bus resource [mem 0x000a0000-0x000bffff]
[ 49.447768] pci_bus 0000:00: root bus resource [mem 0xfed8c000-0xfedfffff]
[ 49.455532] pci_bus 0000:00: root bus resource [mem 0x90000000-0x9fffbfff]
[ 49.463259] pci_bus 0000:00: root bus resource [mem 0x380000000000-0x381fffffffff]

When assigning unassigned 64-bit MMIO resources, pci_bus_alloc_resource()
goes through every non-pref MMIO window of the root bus.
Because the loop uses pci_bus_for_each_resource() and takes the windows in
list order, it may pick a window under 4G rather than one above 4G when the
request is small enough to fit there, even though the resource could be
placed in 64-bit MMIO above 4G.

For the root bus, we can order the list from high to low in
pci_add_resource_offset(); root bus creation keeps the same order in the
final bus resource list:
pci_acpi_scan_root
==> add_resources
==> pci_add_resource_offset: # Add to temp resources
==> pci_create_root_bus
==> pci_bus_add_resource # add to final bus resources.

After that, 64-bit pref MMIO for PCI bridges is allocated from the highest
non-pref MMIO window, which in this case is above 4G instead of under 4G.

Signed-off-by: Yinghai Lu <yin...@kernel.org>
---
drivers/pci/bus.c | 15 +++++++++++++--
1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
index 45d8de5..7798cd3 100644
--- a/drivers/pci/bus.c
+++ b/drivers/pci/bus.c
@@ -21,7 +21,8 @@
void pci_add_resource_offset(struct list_head *resources, struct resource *res,
resource_size_t offset)
{
- struct pci_host_bridge_window *window;
+ struct pci_host_bridge_window *window, *tmp;
+ struct list_head *n;

window = kzalloc(sizeof(struct pci_host_bridge_window), GFP_KERNEL);
if (!window) {
@@ -31,7 +32,17 @@ void pci_add_resource_offset(struct list_head *resources, struct resource *res,

window->res = res;
window->offset = offset;
- list_add_tail(&window->list, resources);
+
+ /* Keep list sorted according to res end */
+ n = resources;
+ list_for_each_entry(tmp, resources, list)
+ if (window->res->end > tmp->res->end) {
+ n = &tmp->list;
+ break;
+ }
+
+ /* Insert it just before n */
+ list_add_tail(&window->list, n);
}
EXPORT_SYMBOL(pci_add_resource_offset);
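
For readers who want to try the ordering rule outside the kernel, here is a minimal user-space sketch. It is not the patch itself: `struct window` and `window_insert_sorted()` are hypothetical names, and a plain singly linked list stands in for the kernel's `struct list_head`. The rule is the same as in the hunk above: a new window goes in front of the first entry whose end address is lower, so the list runs from highest to lowest end.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct pci_host_bridge_window. */
struct window {
	uint64_t start, end;	/* inclusive address range */
	struct window *next;
};

/*
 * Insert w so that the list stays ordered from highest to lowest end
 * address: w goes in front of the first entry whose end is lower.
 * Entries with an equal end keep their insertion order.
 */
static void window_insert_sorted(struct window **head, struct window *w)
{
	struct window **pos = head;

	assert(w != NULL);
	while (*pos && (*pos)->end >= w->end)
		pos = &(*pos)->next;
	w->next = *pos;
	*pos = w;
}
```

Feeding it three of the windows from the log above in their original order (0x000a0000, 0x380000000000, 0x90000000) leaves the above-4G window at the head of the list, which is what lets a first-fit walk try it first.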

Yinghai Lu

Dec 10, 2013, 2:00:02 AM
In the bus-level resource allocation path we have no dev to pass along;
only the bus is available. So make pcibios_resource_to_bus() and
pcibios_bus_to_resource() take a bus instead of a dev.

-v2: drop pcibios_bus_addr_to_resource().
-v3: drop __* change requested by Bjorn.

Signed-off-by: Yinghai Lu <yin...@kernel.org>
Cc: linux...@vger.kernel.org
Cc: linuxp...@lists.ozlabs.org
Cc: sparc...@vger.kernel.org
Cc: linux-...@lists.infradead.org
Cc: linux...@vger.kernel.org
Cc: linux...@vger.kernel.org
---
arch/alpha/kernel/pci-sysfs.c | 4 ++--
arch/powerpc/kernel/pci-common.c | 4 ++--
arch/powerpc/kernel/pci_of_scan.c | 4 ++--
arch/sparc/kernel/pci.c | 6 +++---
drivers/pci/host-bridge.c | 24 +++++++++++-------------
drivers/pci/probe.c | 18 +++++++++---------
drivers/pci/quirks.c | 2 +-
drivers/pci/rom.c | 2 +-
drivers/pci/setup-bus.c | 16 ++++++++--------
drivers/pci/setup-res.c | 2 +-
drivers/pcmcia/i82092.c | 2 +-
drivers/pcmcia/yenta_socket.c | 6 +++---
drivers/scsi/sym53c8xx_2/sym_glue.c | 5 +++--
drivers/video/arkfb.c | 2 +-
drivers/video/s3fb.c | 2 +-
drivers/video/vt8623fb.c | 2 +-
include/linux/pci.h | 4 ++--
17 files changed, 52 insertions(+), 53 deletions(-)

diff --git a/arch/alpha/kernel/pci-sysfs.c b/arch/alpha/kernel/pci-sysfs.c
index 2b183b0..99e8d47 100644
--- a/arch/alpha/kernel/pci-sysfs.c
+++ b/arch/alpha/kernel/pci-sysfs.c
@@ -83,7 +83,7 @@ static int pci_mmap_resource(struct kobject *kobj,
if (iomem_is_exclusive(res->start))
return -EINVAL;

- pcibios_resource_to_bus(pdev, &bar, res);
+ pcibios_resource_to_bus(pdev->bus, &bar, res);
vma->vm_pgoff += bar.start >> (PAGE_SHIFT - (sparse ? 5 : 0));
mmap_type = res->flags & IORESOURCE_MEM ? pci_mmap_mem : pci_mmap_io;

@@ -139,7 +139,7 @@ static int sparse_mem_mmap_fits(struct pci_dev *pdev, int num)
long dense_offset;
unsigned long sparse_size;

- pcibios_resource_to_bus(pdev, &bar, &pdev->resource[num]);
+ pcibios_resource_to_bus(pdev->bus, &bar, &pdev->resource[num]);

/* All core logic chips have 4G sparse address space, except
CIA which has 16G (see xxx_SPARSE_MEM and xxx_DENSE_MEM
diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index a1e3e40..d9476c1 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -835,7 +835,7 @@ static void pcibios_fixup_resources(struct pci_dev *dev)
* at 0 as unset as well, except if PCI_PROBE_ONLY is also set
* since in that case, we don't want to re-assign anything
*/
- pcibios_resource_to_bus(dev, &reg, res);
+ pcibios_resource_to_bus(dev->bus, &reg, res);
if (pci_has_flag(PCI_REASSIGN_ALL_RSRC) ||
(reg.start == 0 && !pci_has_flag(PCI_PROBE_ONLY))) {
/* Only print message if not re-assigning */
@@ -886,7 +886,7 @@ static int pcibios_uninitialized_bridge_resource(struct pci_bus *bus,

/* Job is a bit different between memory and IO */
if (res->flags & IORESOURCE_MEM) {
- pcibios_resource_to_bus(dev, &region, res);
+ pcibios_resource_to_bus(dev->bus, &region, res);

/* If the BAR is non-0 then it's probably been initialized */
if (region.start != 0)
diff --git a/arch/powerpc/kernel/pci_of_scan.c b/arch/powerpc/kernel/pci_of_scan.c
index ac0b034..83c26d8 100644
--- a/arch/powerpc/kernel/pci_of_scan.c
+++ b/arch/powerpc/kernel/pci_of_scan.c
@@ -111,7 +111,7 @@ static void of_pci_parse_addrs(struct device_node *node, struct pci_dev *dev)
res->name = pci_name(dev);
region.start = base;
region.end = base + size - 1;
- pcibios_bus_to_resource(dev, res, &region);
+ pcibios_bus_to_resource(dev->bus, res, &region);
}
}

@@ -280,7 +280,7 @@ void of_scan_pci_bridge(struct pci_dev *dev)
res->flags = flags;
region.start = of_read_number(&ranges[1], 2);
region.end = region.start + size - 1;
- pcibios_bus_to_resource(dev, res, &region);
+ pcibios_bus_to_resource(dev->bus, res, &region);
}
sprintf(bus->name, "PCI Bus %04x:%02x", pci_domain_nr(bus),
bus->number);
diff --git a/arch/sparc/kernel/pci.c b/arch/sparc/kernel/pci.c
index cb02145..7de8d1f 100644
--- a/arch/sparc/kernel/pci.c
+++ b/arch/sparc/kernel/pci.c
@@ -392,7 +392,7 @@ static void apb_fake_ranges(struct pci_dev *dev,
res->flags = IORESOURCE_IO;
region.start = (first << 21);
region.end = (last << 21) + ((1 << 21) - 1);
- pcibios_bus_to_resource(dev, res, &region);
+ pcibios_bus_to_resource(dev->bus, res, &region);

pci_read_config_byte(dev, APB_MEM_ADDRESS_MAP, &map);
apb_calc_first_last(map, &first, &last);
@@ -400,7 +400,7 @@ static void apb_fake_ranges(struct pci_dev *dev,
res->flags = IORESOURCE_MEM;
region.start = (first << 29);
region.end = (last << 29) + ((1 << 29) - 1);
- pcibios_bus_to_resource(dev, res, &region);
+ pcibios_bus_to_resource(dev->bus, res, &region);
}

static void pci_of_scan_bus(struct pci_pbm_info *pbm,
@@ -491,7 +491,7 @@ static void of_scan_pci_bridge(struct pci_pbm_info *pbm,
res->flags = flags;
region.start = GET_64BIT(ranges, 1);
region.end = region.start + size - 1;
- pcibios_bus_to_resource(dev, res, &region);
+ pcibios_bus_to_resource(dev->bus, res, &region);
}
after_ranges:
sprintf(bus->name, "PCI Bus %04x:%02x", pci_domain_nr(bus),
diff --git a/drivers/pci/host-bridge.c b/drivers/pci/host-bridge.c
index a68dc61..6bcd233 100644
--- a/drivers/pci/host-bridge.c
+++ b/drivers/pci/host-bridge.c
@@ -9,22 +9,19 @@

#include "pci.h"

-static struct pci_bus *find_pci_root_bus(struct pci_dev *dev)
+static struct pci_bus *find_pci_root_bus(struct pci_bus *bus)
{
- struct pci_bus *bus;
-
- bus = dev->bus;
while (bus->parent)
bus = bus->parent;

return bus;
}

-static struct pci_host_bridge *find_pci_host_bridge(struct pci_dev *dev)
+static struct pci_host_bridge *find_pci_host_bridge(struct pci_bus *bus)
{
- struct pci_bus *bus = find_pci_root_bus(dev);
+ struct pci_bus *root_bus = find_pci_root_bus(bus);

- return to_pci_host_bridge(bus->bridge);
+ return to_pci_host_bridge(root_bus->bridge);
}

void pci_set_host_bridge_release(struct pci_host_bridge *bridge,
@@ -40,10 +37,11 @@ static bool resource_contains(struct resource *res1, struct resource *res2)
return res1->start <= res2->start && res1->end >= res2->end;
}

-void pcibios_resource_to_bus(struct pci_dev *dev, struct pci_bus_region *region,
- struct resource *res)
+void pcibios_resource_to_bus(struct pci_bus *bus,
+ struct pci_bus_region *region,
+ struct resource *res)
{
- struct pci_host_bridge *bridge = find_pci_host_bridge(dev);
+ struct pci_host_bridge *bridge = find_pci_host_bridge(bus);
struct pci_host_bridge_window *window;
resource_size_t offset = 0;

@@ -68,10 +66,10 @@ static bool region_contains(struct pci_bus_region *region1,
return region1->start <= region2->start && region1->end >= region2->end;
}

-void pcibios_bus_to_resource(struct pci_dev *dev, struct resource *res,
- struct pci_bus_region *region)
+void pcibios_bus_to_resource(struct pci_bus *bus, struct resource *res,
+ struct pci_bus_region *region)
{
- struct pci_host_bridge *bridge = find_pci_host_bridge(dev);
+ struct pci_host_bridge *bridge = find_pci_host_bridge(bus);
struct pci_host_bridge_window *window;
resource_size_t offset = 0;

diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 38e403d..f049e3f 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -269,8 +269,8 @@ int __pci_read_base(struct pci_dev *dev, enum pci_bar_type type,
region.end = l + sz;
}

- pcibios_bus_to_resource(dev, res, &region);
- pcibios_resource_to_bus(dev, &inverted_region, res);
+ pcibios_bus_to_resource(dev->bus, res, &region);
+ pcibios_resource_to_bus(dev->bus, &inverted_region, res);

/*
* If "A" is a BAR value (a bus address), "bus_to_resource(A)" is
@@ -364,7 +364,7 @@ static void pci_read_bridge_io(struct pci_bus *child)
res->flags = (io_base_lo & PCI_IO_RANGE_TYPE_MASK) | IORESOURCE_IO;
region.start = base;
region.end = limit + io_granularity - 1;
- pcibios_bus_to_resource(dev, res, &region);
+ pcibios_bus_to_resource(dev->bus, res, &region);
dev_printk(KERN_DEBUG, &dev->dev, " bridge window %pR\n", res);
}
}
@@ -386,7 +386,7 @@ static void pci_read_bridge_mmio(struct pci_bus *child)
res->flags = (mem_base_lo & PCI_MEMORY_RANGE_TYPE_MASK) | IORESOURCE_MEM;
region.start = base;
region.end = limit + 0xfffff;
- pcibios_bus_to_resource(dev, res, &region);
+ pcibios_bus_to_resource(dev->bus, res, &region);
dev_printk(KERN_DEBUG, &dev->dev, " bridge window %pR\n", res);
}
}
@@ -436,7 +436,7 @@ static void pci_read_bridge_mmio_pref(struct pci_bus *child)
res->flags |= IORESOURCE_MEM_64;
region.start = base;
region.end = limit + 0xfffff;
- pcibios_bus_to_resource(dev, res, &region);
+ pcibios_bus_to_resource(dev->bus, res, &region);
dev_printk(KERN_DEBUG, &dev->dev, " bridge window %pR\n", res);
}
}
@@ -1084,24 +1084,24 @@ int pci_setup_device(struct pci_dev *dev)
region.end = 0x1F7;
res = &dev->resource[0];
res->flags = LEGACY_IO_RESOURCE;
- pcibios_bus_to_resource(dev, res, &region);
+ pcibios_bus_to_resource(dev->bus, res, &region);
region.start = 0x3F6;
region.end = 0x3F6;
res = &dev->resource[1];
res->flags = LEGACY_IO_RESOURCE;
- pcibios_bus_to_resource(dev, res, &region);
+ pcibios_bus_to_resource(dev->bus, res, &region);
}
if ((progif & 4) == 0) {
region.start = 0x170;
region.end = 0x177;
res = &dev->resource[2];
res->flags = LEGACY_IO_RESOURCE;
- pcibios_bus_to_resource(dev, res, &region);
+ pcibios_bus_to_resource(dev->bus, res, &region);
region.start = 0x376;
region.end = 0x376;
res = &dev->resource[3];
res->flags = LEGACY_IO_RESOURCE;
- pcibios_bus_to_resource(dev, res, &region);
+ pcibios_bus_to_resource(dev->bus, res, &region);
}
}
break;
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 3a02717..5cb726c 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -339,7 +339,7 @@ static void quirk_io_region(struct pci_dev *dev, int port,
/* Convert from PCI bus to resource space */
bus_region.start = region;
bus_region.end = region + size - 1;
- pcibios_bus_to_resource(dev, res, &bus_region);
+ pcibios_bus_to_resource(dev->bus, res, &bus_region);

if (!pci_claim_resource(dev, nr))
dev_info(&dev->dev, "quirk: %pR claimed by %s\n", res, name);
diff --git a/drivers/pci/rom.c b/drivers/pci/rom.c
index c5d0a08..5d59572 100644
--- a/drivers/pci/rom.c
+++ b/drivers/pci/rom.c
@@ -31,7 +31,7 @@ int pci_enable_rom(struct pci_dev *pdev)
if (!res->flags)
return -1;

- pcibios_resource_to_bus(pdev, &region, res);
+ pcibios_resource_to_bus(pdev->bus, &region, res);
pci_read_config_dword(pdev, pdev->rom_base_reg, &rom_addr);
rom_addr &= ~PCI_ROM_ADDRESS_MASK;
rom_addr |= region.start | PCI_ROM_ADDRESS_ENABLE;
diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 219a410..7933982 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -475,7 +475,7 @@ void pci_setup_cardbus(struct pci_bus *bus)
&bus->busn_res);

res = bus->resource[0];
- pcibios_resource_to_bus(bridge, &region, res);
+ pcibios_resource_to_bus(bridge->bus, &region, res);
if (res->flags & IORESOURCE_IO) {
/*
* The IO resource is allocated a range twice as large as it
@@ -489,7 +489,7 @@ void pci_setup_cardbus(struct pci_bus *bus)
}

res = bus->resource[1];
- pcibios_resource_to_bus(bridge, &region, res);
+ pcibios_resource_to_bus(bridge->bus, &region, res);
if (res->flags & IORESOURCE_IO) {
dev_info(&bridge->dev, " bridge window %pR\n", res);
pci_write_config_dword(bridge, PCI_CB_IO_BASE_1,
@@ -499,7 +499,7 @@ void pci_setup_cardbus(struct pci_bus *bus)
}

res = bus->resource[2];
- pcibios_resource_to_bus(bridge, &region, res);
+ pcibios_resource_to_bus(bridge->bus, &region, res);
if (res->flags & IORESOURCE_MEM) {
dev_info(&bridge->dev, " bridge window %pR\n", res);
pci_write_config_dword(bridge, PCI_CB_MEMORY_BASE_0,
@@ -509,7 +509,7 @@ void pci_setup_cardbus(struct pci_bus *bus)
}

res = bus->resource[3];
- pcibios_resource_to_bus(bridge, &region, res);
+ pcibios_resource_to_bus(bridge->bus, &region, res);
if (res->flags & IORESOURCE_MEM) {
dev_info(&bridge->dev, " bridge window %pR\n", res);
pci_write_config_dword(bridge, PCI_CB_MEMORY_BASE_1,
@@ -546,7 +546,7 @@ static void pci_setup_bridge_io(struct pci_bus *bus)

/* Set up the top and bottom of the PCI I/O segment for this bus. */
res = bus->resource[0];
- pcibios_resource_to_bus(bridge, &region, res);
+ pcibios_resource_to_bus(bridge->bus, &region, res);
if (res->flags & IORESOURCE_IO) {
pci_read_config_dword(bridge, PCI_IO_BASE, &l);
l &= 0xffff0000;
@@ -578,7 +578,7 @@ static void pci_setup_bridge_mmio(struct pci_bus *bus)

/* Set up the top and bottom of the PCI Memory segment for this bus. */
res = bus->resource[1];
- pcibios_resource_to_bus(bridge, &region, res);
+ pcibios_resource_to_bus(bridge->bus, &region, res);
if (res->flags & IORESOURCE_MEM) {
l = (region.start >> 16) & 0xfff0;
l |= region.end & 0xfff00000;
@@ -604,7 +604,7 @@ static void pci_setup_bridge_mmio_pref(struct pci_bus *bus)
/* Set up PREF base/limit. */
bu = lu = 0;
res = bus->resource[2];
- pcibios_resource_to_bus(bridge, &region, res);
+ pcibios_resource_to_bus(bridge->bus, &region, res);
if (res->flags & IORESOURCE_PREFETCH) {
l = (region.start >> 16) & 0xfff0;
l |= region.end & 0xfff00000;
@@ -1422,7 +1422,7 @@ static int iov_resources_unassigned(struct pci_dev *dev, void *data)
if (!r->flags)
continue;

- pcibios_resource_to_bus(dev, &region, r);
+ pcibios_resource_to_bus(dev->bus, &region, r);
if (!region.start) {
*unassigned = true;
return 1; /* return early from pci_walk_bus() */
diff --git a/drivers/pci/setup-res.c b/drivers/pci/setup-res.c
index 83c4d3b..5c060b1 100644
--- a/drivers/pci/setup-res.c
+++ b/drivers/pci/setup-res.c
@@ -52,7 +52,7 @@ void pci_update_resource(struct pci_dev *dev, int resno)
if (res->flags & IORESOURCE_PCI_FIXED)
return;

- pcibios_resource_to_bus(dev, &region, res);
+ pcibios_resource_to_bus(dev->bus, &region, res);

new = region.start | (res->flags & PCI_REGION_FLAG_MASK);
if (res->flags & IORESOURCE_IO)
diff --git a/drivers/pcmcia/i82092.c b/drivers/pcmcia/i82092.c
index 519c4d6..7d47456 100644
--- a/drivers/pcmcia/i82092.c
+++ b/drivers/pcmcia/i82092.c
@@ -608,7 +608,7 @@ static int i82092aa_set_mem_map(struct pcmcia_socket *socket, struct pccard_mem_

enter("i82092aa_set_mem_map");

- pcibios_resource_to_bus(sock_info->dev, &region, mem->res);
+ pcibios_resource_to_bus(sock_info->dev->bus, &region, mem->res);

map = mem->map;
if (map > 4) {
diff --git a/drivers/pcmcia/yenta_socket.c b/drivers/pcmcia/yenta_socket.c
index dc18a3a..8485761 100644
--- a/drivers/pcmcia/yenta_socket.c
+++ b/drivers/pcmcia/yenta_socket.c
@@ -445,7 +445,7 @@ static int yenta_set_mem_map(struct pcmcia_socket *sock, struct pccard_mem_map *
unsigned int start, stop, card_start;
unsigned short word;

- pcibios_resource_to_bus(socket->dev, &region, mem->res);
+ pcibios_resource_to_bus(socket->dev->bus, &region, mem->res);

map = mem->map;
start = region.start;
@@ -709,7 +709,7 @@ static int yenta_allocate_res(struct yenta_socket *socket, int nr, unsigned type
region.start = config_readl(socket, addr_start) & mask;
region.end = config_readl(socket, addr_end) | ~mask;
if (region.start && region.end > region.start && !override_bios) {
- pcibios_bus_to_resource(dev, res, &region);
+ pcibios_bus_to_resource(dev->bus, res, &region);
if (pci_claim_resource(dev, PCI_BRIDGE_RESOURCES + nr) == 0)
return 0;
dev_printk(KERN_INFO, &dev->dev,
@@ -1033,7 +1033,7 @@ static void yenta_config_init(struct yenta_socket *socket)
struct pci_dev *dev = socket->dev;
struct pci_bus_region region;

- pcibios_resource_to_bus(socket->dev, &region, &dev->resource[0]);
+ pcibios_resource_to_bus(socket->dev->bus, &region, &dev->resource[0]);

config_writel(socket, CB_LEGACY_MODE_BASE, 0);
config_writel(socket, PCI_BASE_ADDRESS_0, region.start);
diff --git a/drivers/scsi/sym53c8xx_2/sym_glue.c b/drivers/scsi/sym53c8xx_2/sym_glue.c
index bac55f7..6d3ee1a 100644
--- a/drivers/scsi/sym53c8xx_2/sym_glue.c
+++ b/drivers/scsi/sym53c8xx_2/sym_glue.c
@@ -1531,7 +1531,7 @@ static int sym_iomap_device(struct sym_device *device)
struct pci_bus_region bus_addr;
int i = 2;

- pcibios_resource_to_bus(pdev, &bus_addr, &pdev->resource[1]);
+ pcibios_resource_to_bus(pdev->bus, &bus_addr, &pdev->resource[1]);
device->mmio_base = bus_addr.start;

if (device->chip.features & FE_RAM) {
@@ -1541,7 +1541,8 @@ static int sym_iomap_device(struct sym_device *device)
*/
if (!pdev->resource[i].flags)
i++;
- pcibios_resource_to_bus(pdev, &bus_addr, &pdev->resource[i]);
+ pcibios_resource_to_bus(pdev->bus, &bus_addr,
+ &pdev->resource[i]);
device->ram_base = bus_addr.start;
}

diff --git a/drivers/video/arkfb.c b/drivers/video/arkfb.c
index a6b29bd..adc4ea2 100644
--- a/drivers/video/arkfb.c
+++ b/drivers/video/arkfb.c
@@ -1014,7 +1014,7 @@ static int ark_pci_probe(struct pci_dev *dev, const struct pci_device_id *id)

vga_res.flags = IORESOURCE_IO;

- pcibios_bus_to_resource(dev, &vga_res, &bus_reg);
+ pcibios_bus_to_resource(dev->bus, &vga_res, &bus_reg);

par->state.vgabase = (void __iomem *) vga_res.start;

diff --git a/drivers/video/s3fb.c b/drivers/video/s3fb.c
index 968b299..9a3f8f1 100644
--- a/drivers/video/s3fb.c
+++ b/drivers/video/s3fb.c
@@ -1180,7 +1180,7 @@ static int s3_pci_probe(struct pci_dev *dev, const struct pci_device_id *id)

vga_res.flags = IORESOURCE_IO;

- pcibios_bus_to_resource(dev, &vga_res, &bus_reg);
+ pcibios_bus_to_resource(dev->bus, &vga_res, &bus_reg);

par->state.vgabase = (void __iomem *) vga_res.start;

diff --git a/drivers/video/vt8623fb.c b/drivers/video/vt8623fb.c
index 8bc6e09..5c7cbc6 100644
--- a/drivers/video/vt8623fb.c
+++ b/drivers/video/vt8623fb.c
@@ -729,7 +729,7 @@ static int vt8623_pci_probe(struct pci_dev *dev, const struct pci_device_id *id)

vga_res.flags = IORESOURCE_IO;

- pcibios_bus_to_resource(dev, &vga_res, &bus_reg);
+ pcibios_bus_to_resource(dev->bus, &vga_res, &bus_reg);

par->state.vgabase = (void __iomem *) vga_res.start;

diff --git a/include/linux/pci.h b/include/linux/pci.h
index eb8078a..da069fa 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -737,9 +737,9 @@ void pci_fixup_cardbus(struct pci_bus *);

/* Generic PCI functions used internally */

-void pcibios_resource_to_bus(struct pci_dev *dev, struct pci_bus_region *region,
+void pcibios_resource_to_bus(struct pci_bus *bus, struct pci_bus_region *region,
struct resource *res);
-void pcibios_bus_to_resource(struct pci_dev *dev, struct resource *res,
+void pcibios_bus_to_resource(struct pci_bus *bus, struct resource *res,
struct pci_bus_region *region);
void pcibios_scan_specific_bus(int busn);
struct pci_bus *pci_find_bus(int domain, int busnr);
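
To show why a bus pointer is sufficient for the conversion, here is a small user-space sketch. All types and names (`struct range`, `struct window`, `struct bus`, `root_bus()`, `resource_to_bus()`) are hypothetical simplifications, not the kernel's. It assumes that a host bridge window's offset is the CPU address minus the bus address, which the hunks above do not spell out, and it skips the resource-type check a real implementation would need.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct range { uint64_t start, end; };	/* inclusive bounds */

/* Hypothetical host bridge window: CPU range plus CPU-to-bus offset. */
struct window { struct range cpu; uint64_t offset; };

struct bus {
	struct bus *parent;		/* NULL on the root bus */
	const struct window *windows;	/* only used on the root bus */
	size_t nr_windows;
};

/* Walk up to the root bus, as find_pci_root_bus() does in the patch. */
static const struct bus *root_bus(const struct bus *bus)
{
	assert(bus != NULL);
	while (bus->parent)
		bus = bus->parent;
	return bus;
}

/* Translate a CPU range to a bus range using the containing window. */
static struct range resource_to_bus(const struct bus *bus, struct range res)
{
	const struct bus *root = root_bus(bus);
	uint64_t offset = 0;	/* no containing window: identity mapping */
	size_t i;

	for (i = 0; i < root->nr_windows; i++) {
		const struct window *w = &root->windows[i];

		if (w->cpu.start <= res.start && w->cpu.end >= res.end) {
			offset = w->offset;
			break;
		}
	}
	return (struct range){ res.start - offset, res.end - offset };
}
```

Nothing in the lookup depends on a particular device, only on the chain of parent buses, which is the point of the signature change.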

Yinghai Lu

Dec 10, 2013, 2:00:02 AM
Currently we use PCIBIOS_MAX_MEM_32 (the 4G limit) directly in
pci_bus_alloc_resource() to make sure we don't allocate a pref resource
above 4G when 64-bit addressing is not supported.

That is not right: the 4G limit is a PCI bus address, while
allocate_resource() takes a resource address limit.

Add pci_clip_resource() and use it to apply the PCI bus address limit.

Finally, remove PCIBIOS_MAX_MEM_32.

Signed-off-by: Yinghai Lu <yin...@kernel.org>
---
arch/x86/include/asm/pci.h | 1 -
drivers/pci/bus.c | 41 ++++++++++++++++++++++++++++++++++-------
include/linux/pci.h | 4 ----
3 files changed, 34 insertions(+), 12 deletions(-)

diff --git a/arch/x86/include/asm/pci.h b/arch/x86/include/asm/pci.h
index 947b5c4..122c299 100644
--- a/arch/x86/include/asm/pci.h
+++ b/arch/x86/include/asm/pci.h
@@ -125,7 +125,6 @@ int setup_msi_irq(struct pci_dev *dev, struct msi_desc *msidesc,

/* generic pci stuff */
#include <asm-generic/pci.h>
-#define PCIBIOS_MAX_MEM_32 0xffffffff

#ifdef CONFIG_NUMA
/* Returns the node based on pci bus */
diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
index fc1b740..3ad4fd9 100644
--- a/drivers/pci/bus.c
+++ b/drivers/pci/bus.c
@@ -98,6 +98,25 @@ void pci_bus_remove_resources(struct pci_bus *bus)
}
}

+static struct pci_bus_region pci_mem_32 = {0, 0xffffffff};
+
+static void pci_clip_resource(struct resource *res, struct pci_bus *bus,
+ struct pci_bus_region *region)
+{
+ struct pci_bus_region r;
+
+ pcibios_resource_to_bus(bus, &r, res);
+ if (r.start < region->start)
+ r.start = region->start;
+ if (r.end > region->end)
+ r.end = region->end;
+
+ if (r.end < r.start)
+ res->end = res->start - 1;
+ else
+ pcibios_bus_to_resource(bus, res, &r);
+}
+
/**
* pci_bus_alloc_resource - allocate a resource from a parent bus
* @bus: PCI bus
@@ -125,15 +144,12 @@ pci_bus_alloc_resource(struct pci_bus *bus, struct resource *res,
{
int i, ret = -ENOMEM;
struct resource *r;
- resource_size_t max = -1;

type_mask |= IORESOURCE_IO | IORESOURCE_MEM;

- /* don't allocate too high if the pref mem doesn't support 64bit*/
- if (!(res->flags & IORESOURCE_MEM_64))
- max = PCIBIOS_MAX_MEM_32;
-
pci_bus_for_each_resource(bus, r, i) {
+ struct resource avail;
+
if (!r)
continue;

@@ -147,10 +163,21 @@ pci_bus_alloc_resource(struct pci_bus *bus, struct resource *res,
!(res->flags & IORESOURCE_PREFETCH))
continue;

+ /*
+ * don't allocate too high if the pref mem doesn't
+ * support 64bit.
+ */
+ avail = *r;
+ if (!(res->flags & IORESOURCE_MEM_64)) {
+ pci_clip_resource(&avail, bus, &pci_mem_32);
+ if (!resource_size(&avail))
+ continue;
+ }
+
/* Ok, try it out.. */
ret = allocate_resource(r, res, size,
- r->start ? : min,
- max, align,
+ max(avail.start, r->start ? : min),
+ avail.end, align,
alignf, alignf_data);
if (ret == 0)
break;
diff --git a/include/linux/pci.h b/include/linux/pci.h
index da069fa..99e9040 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1485,10 +1485,6 @@ static inline struct pci_dev *pci_dev_get(struct pci_dev *dev)

#include <asm/pci.h>

-#ifndef PCIBIOS_MAX_MEM_32
-#define PCIBIOS_MAX_MEM_32 (-1)
-#endif
-
/* these helpers provide future and backwards compatibility
* for accessing popular PCI BAR info */
#define pci_resource_start(dev, bar) ((dev)->resource[(bar)].start)
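
The clipping step can be tried in isolation with the sketch below. It is a simplification under stated assumptions: `struct range`, `range_size()` and `clip_range()` are hypothetical names, and the CPU-to-bus translation that pci_clip_resource() performs is left out (treated as identity). As in the patch, an empty result is encoded as end == start - 1, so the size comes out as 0.

```c
#include <assert.h>
#include <stdint.h>

struct range { uint64_t start, end; };	/* inclusive bounds */

/* Size of an inclusive range; 0 for the "empty" encoding below. */
static uint64_t range_size(struct range r)
{
	return r.end - r.start + 1;
}

/*
 * Clip r to limit. If nothing is left, keep the original start and
 * set end = start - 1 so that range_size() reports 0.
 */
static struct range clip_range(struct range r, struct range limit)
{
	struct range out = r;

	assert(limit.start <= limit.end);
	if (out.start < limit.start)
		out.start = limit.start;
	if (out.end > limit.end)
		out.end = limit.end;
	if (out.end < out.start) {
		out.start = r.start;
		out.end = r.start - 1;
	}
	return out;
}
```

With a limit of {0, 0xffffffff}, the window 0x90000000-0x9fffbfff is returned unchanged, while 0x380000000000-0x381fffffffff clips to an empty range and would be skipped for a resource that lacks IORESOURCE_MEM_64.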

Yinghai Lu

Dec 10, 2013, 2:00:02 AM
When one of the child resources does not support MEM_64, MEM_64 for the
bridge gets reset, which pulls the whole pref resource on the bridge down
under 4G.

If the bridge supports pref mem64, only allocate from that window to
children that support pref mem64.
Child resources that only support pref mem32 are allocated from non-pref
mem instead.

If the bridge only supports 32-bit pref MMIO, all children's pref MMIO
still goes under that window.
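
The rules above amount to a small decision table. The sketch below writes it out as a stand-alone function, following the if/else chain that this patch adds to pci_bridge_release_resources(). The flag values and the names `bridge_window_for()`, `RES_*` and `WIN_*` are made up for illustration and are not the kernel's IORESOURCE_* constants.

```c
#include <assert.h>

/* Hypothetical flag bits; the kernel's IORESOURCE_* values differ. */
enum {
	RES_IO       = 1 << 0,
	RES_MEM      = 1 << 1,
	RES_PREFETCH = 1 << 2,
	RES_MEM_64   = 1 << 3,
};

/* Bridge window indexes: I/O, non-pref MMIO, pref MMIO. */
enum { WIN_IO = 0, WIN_MEM = 1, WIN_PREF = 2 };

/*
 * Pick the bridge window for a child resource, given the child's
 * flags and the flags of the bridge's pref window.
 */
static int bridge_window_for(unsigned int child, unsigned int bridge_pref)
{
	assert(child & (RES_IO | RES_MEM));

	if (child & RES_IO)
		return WIN_IO;
	if (!(child & RES_PREFETCH))
		return WIN_MEM;
	/* pref64 child under a pref64 bridge window */
	if ((child & RES_MEM_64) && (bridge_pref & RES_MEM_64))
		return WIN_PREF;
	/* bridge pref window is 32-bit only: all pref children go there */
	if (!(bridge_pref & RES_MEM_64) && (bridge_pref & RES_PREFETCH))
		return WIN_PREF;
	/* pref32 child under a pref64 bridge, or no pref window at all */
	return WIN_MEM;
}
```

The case that changes with this patch is a pref32 child under a bridge whose pref window is 64-bit: it now lands in the non-pref window, so the pref window can stay above 4G.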

-v2: Add release bridge res support with bridge mem res for pref_mem children res.
-v3: refresh so that it can be applied before the for_each_dev_res patchset.

Signed-off-by: Yinghai Lu <yin...@kernel.org>
Tested-by: Guo Chao <y...@linux.vnet.ibm.com>
---
drivers/pci/setup-bus.c | 133 ++++++++++++++++++++++++++++++++----------------
drivers/pci/setup-res.c | 14 ++++-
2 files changed, 101 insertions(+), 46 deletions(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 7933982..843764e 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -711,12 +711,11 @@ static void pci_bridge_check_ranges(struct pci_bus *bus)
bus resource of a given type. Note: we intentionally skip
the bus resources which have already been assigned (that is,
have non-NULL parent resource). */
-static struct resource *find_free_bus_resource(struct pci_bus *bus, unsigned long type)
+static struct resource *find_free_bus_resource(struct pci_bus *bus,
+ unsigned long type_mask, unsigned long type)
{
int i;
struct resource *r;
- unsigned long type_mask = IORESOURCE_IO | IORESOURCE_MEM |
- IORESOURCE_PREFETCH;

pci_bus_for_each_resource(bus, r, i) {
if (r == &ioport_resource || r == &iomem_resource)
@@ -813,7 +812,8 @@ static void pbus_size_io(struct pci_bus *bus, resource_size_t min_size,
resource_size_t add_size, struct list_head *realloc_head)
{
struct pci_dev *dev;
- struct resource *b_res = find_free_bus_resource(bus, IORESOURCE_IO);
+ struct resource *b_res = find_free_bus_resource(bus, IORESOURCE_IO,
+ IORESOURCE_IO);
resource_size_t size = 0, size0 = 0, size1 = 0;
resource_size_t children_add_size = 0;
resource_size_t min_align, align;
@@ -913,15 +913,16 @@ static inline resource_size_t calculate_mem_align(resource_size_t *aligns,
* guarantees that all child resources fit in this size.
*/
static int pbus_size_mem(struct pci_bus *bus, unsigned long mask,
- unsigned long type, resource_size_t min_size,
- resource_size_t add_size,
- struct list_head *realloc_head)
+ unsigned long type, unsigned long type2,
+ resource_size_t min_size, resource_size_t add_size,
+ struct list_head *realloc_head)
{
struct pci_dev *dev;
resource_size_t min_align, align, size, size0, size1;
resource_size_t aligns[12]; /* Alignments from 1Mb to 2Gb */
int order, max_order;
- struct resource *b_res = find_free_bus_resource(bus, type);
+ struct resource *b_res = find_free_bus_resource(bus,
+ mask | IORESOURCE_PREFETCH, type);
unsigned int mem64_mask = 0;
resource_size_t children_add_size = 0;

@@ -942,7 +943,8 @@ static int pbus_size_mem(struct pci_bus *bus, unsigned long mask,
struct resource *r = &dev->resource[i];
resource_size_t r_size;

- if (r->parent || (r->flags & mask) != type)
+ if (r->parent || ((r->flags & mask) != type &&
+ (r->flags & mask) != type2))
continue;
r_size = resource_size(r);
#ifdef CONFIG_PCI_IOV
@@ -1115,8 +1117,9 @@ void __ref __pci_bus_size_bridges(struct pci_bus *bus,
struct list_head *realloc_head)
{
struct pci_dev *dev;
- unsigned long mask, prefmask;
+ unsigned long mask, prefmask, type2 = 0;
resource_size_t additional_mem_size = 0, additional_io_size = 0;
+ struct resource *b_res;

list_for_each_entry(dev, &bus->devices, bus_list) {
struct pci_bus *b = dev->subordinate;
@@ -1161,15 +1164,31 @@ void __ref __pci_bus_size_bridges(struct pci_bus *bus,
has already been allocated by arch code, try
non-prefetchable range for both types of PCI memory
resources. */
+ b_res = &bus->self->resource[PCI_BRIDGE_RESOURCES];
mask = IORESOURCE_MEM;
prefmask = IORESOURCE_MEM | IORESOURCE_PREFETCH;
- if (pbus_size_mem(bus, prefmask, prefmask,
+ if (b_res[2].flags & IORESOURCE_MEM_64) {
+ prefmask |= IORESOURCE_MEM_64;
+ if (pbus_size_mem(bus, prefmask, prefmask, prefmask,
realloc_head ? 0 : additional_mem_size,
- additional_mem_size, realloc_head))
- mask = prefmask; /* Success, size non-prefetch only. */
- else
- additional_mem_size += additional_mem_size;
- pbus_size_mem(bus, mask, IORESOURCE_MEM,
+ additional_mem_size, realloc_head)) {
+ /* Success, size non-pref64 only. */
+ mask = prefmask;
+ type2 = prefmask & ~IORESOURCE_MEM_64;
+ }
+ }
+ if (!type2) {
+ prefmask &= ~IORESOURCE_MEM_64;
+ if (pbus_size_mem(bus, prefmask, prefmask, prefmask,
+ realloc_head ? 0 : additional_mem_size,
+ additional_mem_size, realloc_head)) {
+ /* Success, size non-prefetch only. */
+ mask = prefmask;
+ } else
+ additional_mem_size += additional_mem_size;
+ type2 = IORESOURCE_MEM;
+ }
+ pbus_size_mem(bus, mask, IORESOURCE_MEM, type2,
realloc_head ? 0 : additional_mem_size,
additional_mem_size, realloc_head);
break;
@@ -1255,42 +1274,66 @@ static void __ref __pci_bridge_assign_resources(const struct pci_dev *bridge,
static void pci_bridge_release_resources(struct pci_bus *bus,
unsigned long type)
{
- int idx;
- bool changed = false;
- struct pci_dev *dev;
+ struct pci_dev *dev = bus->self;
struct resource *r;
unsigned long type_mask = IORESOURCE_IO | IORESOURCE_MEM |
- IORESOURCE_PREFETCH;
+ IORESOURCE_PREFETCH | IORESOURCE_MEM_64;
+ unsigned old_flags = 0;
+ struct resource *b_res;
+ int idx = 1;

- dev = bus->self;
- for (idx = PCI_BRIDGE_RESOURCES; idx <= PCI_BRIDGE_RESOURCE_END;
- idx++) {
- r = &dev->resource[idx];
- if ((r->flags & type_mask) != type)
- continue;
- if (!r->parent)
- continue;
- /*
- * if there are children under that, we should release them
- * all
- */
- release_child_resources(r);
- if (!release_resource(r)) {
- dev_printk(KERN_DEBUG, &dev->dev,
- "resource %d %pR released\n", idx, r);
- /* keep the old size */
- r->end = resource_size(r) - 1;
- r->start = 0;
- r->flags = 0;
- changed = true;
- }
- }
+ b_res = &dev->resource[PCI_BRIDGE_RESOURCES];
+
+ /*
+ * 1. if there is io port assign fail, will release bridge
+ * io port.
+ * 2. if there is non pref mmio assign fail, release bridge
+ * nonpref mmio.
+ * 3. if there is 64bit pref mmio assign fail, and bridge pref
+ * is 64bit, release bridge pref mmio.
+ * 4. if there is pref mmio assign fail, and bridge pref is
+ * 32bit mmio, release bridge pref mmio
+ * 5. if there is pref mmio assign fail, and bridge pref is not
+ * assigned, release bridge nonpref mmio.
+ */
+ if (type & IORESOURCE_IO)
+ idx = 0;
+ else if (!(type & IORESOURCE_PREFETCH))
+ idx = 1;
+ else if ((type & IORESOURCE_MEM_64) &&
+ (b_res[2].flags & IORESOURCE_MEM_64))
+ idx = 2;
+ else if (!(b_res[2].flags & IORESOURCE_MEM_64) &&
+ (b_res[2].flags & IORESOURCE_PREFETCH))
+ idx = 2;
+ else
+ idx = 1;
+
+ r = &b_res[idx];
+
+ if (!r->parent)
+ return;
+
+ /*
+ * if there are children under that, we should release them
+ * all
+ */
+ release_child_resources(r);
+ if (!release_resource(r)) {
+ type = old_flags = r->flags & type_mask;
+ dev_printk(KERN_DEBUG, &dev->dev, "resource %d %pR released\n",
+ PCI_BRIDGE_RESOURCES + idx, r);
+ /* keep the old size */
+ r->end = resource_size(r) - 1;
+ r->start = 0;
+ r->flags = 0;

- if (changed) {
/* avoiding touch the one without PREF */
if (type & IORESOURCE_PREFETCH)
type = IORESOURCE_PREFETCH;
__pci_setup_bridge(bus, type);
+ /* for next child res under same bridge */
+ r->flags = old_flags;
}
}

@@ -1469,7 +1512,7 @@ void pci_assign_unassigned_root_bus_resources(struct pci_bus *bus)
LIST_HEAD(fail_head);
struct pci_dev_resource *fail_res;
unsigned long type_mask = IORESOURCE_IO | IORESOURCE_MEM |
- IORESOURCE_PREFETCH;
+ IORESOURCE_PREFETCH | IORESOURCE_MEM_64;
int pci_try_num = 1;
enum enable_type enable_local;

diff --git a/drivers/pci/setup-res.c b/drivers/pci/setup-res.c
index 5c060b1..1a84f30 100644
--- a/drivers/pci/setup-res.c
+++ b/drivers/pci/setup-res.c
@@ -208,9 +208,21 @@ static int __pci_assign_resource(struct pci_bus *bus, struct pci_dev *dev,

/* First, try exact prefetching match.. */
ret = pci_bus_alloc_resource(bus, res, size, align, min,
- IORESOURCE_PREFETCH,
+ IORESOURCE_PREFETCH | IORESOURCE_MEM_64,
pcibios_align_resource, dev);

+ if (ret < 0 &&
+ (res->flags & (IORESOURCE_PREFETCH | IORESOURCE_MEM_64))) {
+ /*
+ * That failed.
+ *
+ * Try below 4g pref
+ */
+ ret = pci_bus_alloc_resource(bus, res, size, align, min,
+ IORESOURCE_PREFETCH,
+ pcibios_align_resource, dev);
+ }
+
if (ret < 0 && (res->flags & IORESOURCE_PREFETCH)) {
/*
* That failed.
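
The window-selection rules listed in the pci_bridge_release_resources() comment above can be sketched in isolation. This is a standalone illustration, not kernel code: the flag values are placeholders assumed for the example, and pick_bridge_window() is a hypothetical helper name that only mirrors the if/else chain from the patch.

```c
/* Illustrative flag bits; assumed for this sketch only. */
#define IORESOURCE_IO        0x00000100UL
#define IORESOURCE_MEM       0x00000200UL
#define IORESOURCE_PREFETCH  0x00002000UL
#define IORESOURCE_MEM_64    0x00100000UL

/*
 * Hypothetical helper mirroring the if/else chain in the patch.
 * Returns the bridge window index to release:
 *   0 = I/O window, 1 = non-prefetchable MMIO, 2 = prefetchable MMIO.
 * 'type' describes the child resource that failed to assign;
 * 'pref_flags' are the flags of the bridge's prefetchable window.
 */
static int pick_bridge_window(unsigned long type, unsigned long pref_flags)
{
	if (type & IORESOURCE_IO)
		return 0;
	if (!(type & IORESOURCE_PREFETCH))
		return 1;
	/* 64-bit pref child and the bridge pref window is 64-bit */
	if ((type & IORESOURCE_MEM_64) && (pref_flags & IORESOURCE_MEM_64))
		return 2;
	/* pref child and the bridge pref window is 32-bit */
	if (!(pref_flags & IORESOURCE_MEM_64) &&
	    (pref_flags & IORESOURCE_PREFETCH))
		return 2;
	/* otherwise the child was placed in the non-pref window */
	return 1;
}
```

Under these assumptions, a 32-bit prefetchable child below a bridge whose prefetchable window is 64-bit maps to index 1, which matches the idea that such children live in the non-prefetchable window.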

Yinghai Lu

Dec 10, 2013, 2:00:02 AM
On systems with many PCIe cards, there is not enough address space below 4G
to allocate resources for all of the PCI devices.

On 64-bit systems, try to allocate mem64 resources above 4G first, and fall
back to below 4G if no space can be found above 4G.

On x86 32-bit without X86_PAE support, bottom will be set to 0 because
resource_size_t is 32 bits wide.
On a 32-bit kernel with PAE support, resource_size_t is 64 bits wide, but
we are still safe because iomem_resource is limited to 32 bits according to
x86_phys_bits.

-v2: update bottom assigning to make it clear for non-pae support machine.
-v3: Bjorn's change:
use MAX_RESOURCE instead of -1
use start/end instead of bottom/max
for all arch instead of just x86_64
-v4: updated after PCI_MAX_RESOURCE_32 change.
-v5: restore io handling to use PCI_MAX_RESOURCE_32 as limit.
-v6: checking pcibios_resource_to_bus return for every bus res, to decide it
if we need to try high at first.
It supports all arches instead of just x86_64.
-v7: split 4G limit change out to another patch according to Bjorn.
also use pci_clip_resource instead.

Signed-off-by: Yinghai Lu <yin...@kernel.org>
---
drivers/pci/bus.c | 21 +++++++++++++++++++--
1 file changed, 19 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
index 3ad4fd9..45d8de5 100644
--- a/drivers/pci/bus.c
+++ b/drivers/pci/bus.c
@@ -99,6 +99,8 @@ void pci_bus_remove_resources(struct pci_bus *bus)
}

static struct pci_bus_region pci_mem_32 = {0, 0xffffffff};
+static struct pci_bus_region pci_mem_64 = {(resource_size_t)(1ULL<<32),
+ (resource_size_t)(-1ULL)};

static void pci_clip_resource(struct resource *res, struct pci_bus *bus,
struct pci_bus_region *region)
@@ -149,6 +151,7 @@ pci_bus_alloc_resource(struct pci_bus *bus, struct resource *res,

pci_bus_for_each_resource(bus, r, i) {
struct resource avail;
+ int try_again = 0;

if (!r)
continue;
@@ -165,15 +168,23 @@ pci_bus_alloc_resource(struct pci_bus *bus, struct resource *res,

/*
* don't allocate too high if the pref mem doesn't
- * support 64bit.
+ * support 64bit, also if this is a 64-bit mem
+ * resource, try above 4GB first
*/
avail = *r;
- if (!(res->flags & IORESOURCE_MEM_64)) {
+ if (res->flags & IORESOURCE_MEM_64) {
+ pci_clip_resource(&avail, bus, &pci_mem_64);
+ if (!resource_size(&avail))
+ avail = *r;
+ else
+ try_again = 1;
+ } else {
pci_clip_resource(&avail, bus, &pci_mem_32);
if (!resource_size(&avail))
continue;
}

+again:
/* Ok, try it out.. */
ret = allocate_resource(r, res, size,
max(avail.start, r->start ? : min),
@@ -181,6 +192,12 @@ pci_bus_alloc_resource(struct pci_bus *bus, struct resource *res,
alignf, alignf_data);
if (ret == 0)
break;
+
+ if (try_again) {
+ avail = *r;
+ try_again = 0;
+ goto again;
+ }
}
return ret;
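
The per-window retry in this version of pci_bus_alloc_resource() can be sketched as a small standalone model. It is only an illustration under simplifying assumptions: each bus window is treated as completely free, alignment is ignored, and struct range, clip() and alloc_per_window() are hypothetical names rather than kernel APIs.

```c
#include <stddef.h>
#include <stdint.h>

/* Inclusive address range; treated as empty when start > end. */
struct range { uint64_t start, end; };

static uint64_t range_size(struct range r)
{
	return r.start > r.end ? 0 : r.end - r.start + 1;
}

/* Clip r to [lo, hi]; the result may be empty. */
static struct range clip(struct range r, uint64_t lo, uint64_t hi)
{
	if (r.start < lo)
		r.start = lo;
	if (r.end > hi)
		r.end = hi;
	return r;
}

/* Returns the chosen start address, or UINT64_MAX if nothing fits. */
static uint64_t alloc_per_window(const struct range *win, size_t n,
				 uint64_t size, int is_64bit)
{
	for (size_t i = 0; i < n; i++) {
		struct range avail;
		int try_again = 0;

		if (is_64bit) {
			/* 64-bit BAR: prefer the part of the window above 4G */
			avail = clip(win[i], 1ULL << 32, UINT64_MAX);
			if (!range_size(avail))
				avail = win[i];	/* window is entirely below 4G */
			else
				try_again = 1;
		} else {
			/* 32-bit BAR: only the part below 4G is usable */
			avail = clip(win[i], 0, 0xffffffffULL);
			if (!range_size(avail))
				continue;
		}
again:
		if (range_size(avail) >= size)
			return avail.start;
		if (try_again) {
			/* retry the same window without the above-4G clip */
			avail = win[i];
			try_again = 0;
			goto again;
		}
	}
	return UINT64_MAX;
}
```

Because the retry happens inside the loop, a 64-bit request is satisfied by the first window in list order that has room, even if that window lies below 4G and a later window lies above it.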

Guo Chao

Dec 16, 2013, 3:30:02 AM
64-bit non-prefetchable BARs are missing from the calculation in this scheme,
which eventually causes assignment to fail.

[ 0.350882] pci 0002:00:00.0: BAR 14: assigned [mem 0x3d04080000000-0x3d040807fffff]
[ 0.350941] pci 0002:01:00.4: BAR 2: assigned [mem 0x3d04080000000-0x3d040807fffff 64bit]
[ 0.351009] pci 0002:01:00.0: BAR 0: can't assign mem (size 0x40000)
[ 0.351055] pci 0002:01:00.0: BAR 6: can't assign mem pref (size 0x40000)
[ 0.351101] pci 0002:01:00.1: BAR 0: can't assign mem (size 0x40000)
[ 0.351148] pci 0002:01:00.1: BAR 6: can't assign mem pref (size 0x40000)
[ 0.351195] pci 0002:01:00.2: BAR 0: can't assign mem (size 0x40000)
[ 0.351241] pci 0002:01:00.2: BAR 6: can't assign mem pref (size 0x40000)
[ 0.351286] pci 0002:01:00.3: BAR 0: can't assign mem (size 0x40000)
[ 0.351335] pci 0002:01:00.3: BAR 6: can't assign mem pref (size 0x40000)
[ 0.351382] pci 0002:01:00.4: BAR 0: can't assign mem (size 0x40000)
[ 0.351428] pci 0002:01:00.5: BAR 0: can't assign mem (size 0x40000)
[ 0.351473] pci 0002:01:00.6: BAR 0: can't assign mem (size 0x40000)
[ 0.351519] pci 0002:01:00.0: BAR 4: can't assign mem (size 0x2000)
[ 0.351604] pci 0002:01:00.1: BAR 4: can't assign mem (size 0x2000)
[ 0.351696] pci 0002:01:00.2: BAR 4: can't assign mem (size 0x2000)
[ 0.351789] pci 0002:01:00.3: BAR 4: can't assign mem (size 0x2000)
[ 0.351882] pci 0002:01:00.4: BAR 4: can't assign mem (size 0x2000)
[ 0.351974] pci 0002:01:00.5: BAR 4: can't assign mem (size 0x2000)
[ 0.352067] pci 0002:01:00.6: BAR 4: can't assign mem (size 0x2000)


Though I remember 64-bit BAR should always be prefetchable ... ...

Will you figure out a better way to cover them or just add a 'type3' parameter?

Thanks,
Guo Chao

Yinghai Lu

Dec 16, 2013, 1:20:02 PM
Not really.

If the root bus has a 64-bit non-prefetchable MMIO range, devices directly on
the root bus could have 64-bit non-prefetchable BARs.

But we don't need to size bridges for the root bus, because we cannot change
root bus resources.

For a PCI bridge, according to the spec, it supports:
1. 32-bit non-prefetchable MMIO
2. 64-bit prefetchable MMIO or 32-bit prefetchable MMIO

>
> Will you figure out a better way to cover them or just add a 'type3' parameter?

If the bridge's prefetchable MMIO window supports 64-bit, we will only use it
for children resources that are prefetchable and 64-bit capable.
Other 32-bit prefetchable MMIO from children will go under the bridge's 32-bit
non-prefetchable MMIO range.

Maybe I missed something in this path, so please post the whole boot log.

Thanks

Yinghai

Yinghai Lu

Dec 16, 2013, 4:40:02 PM
Looks like we have to add type3 for that case.

Yinghai Lu

Dec 16, 2013, 7:40:02 PM
On Mon, Dec 16, 2013 at 1:36 PM, Yinghai Lu <yin...@kernel.org> wrote:
> On Mon, Dec 16, 2013 at 10:13 AM, Yinghai Lu <yin...@kernel.org> wrote:
>>>
>>> 64-bit non-prefetchable BARs are missing from the calculation in this scheme,
>>> which eventually causes assignment to fail.

>>>
>>> Will you figure out a better way to cover them or just add a 'type3' parameter?
>
> Looks like we have to add type3 for that case.

Please check the attached delta patch.

Thanks

Yinghai
pref_mem_64_only_fix_non_pref.patch

Guo Chao

Dec 18, 2013, 5:00:03 AM
Hi:
It works, thank you.

Guo Chao

> Thanks
>
> Yinghai

> ---
> drivers/pci/setup-bus.c | 17 +++++++++++------
> drivers/pci/setup-res.c | 8 ++++++--
> 2 files changed, 17 insertions(+), 8 deletions(-)
>
> Index: linux-2.6/drivers/pci/setup-bus.c
> ===================================================================
> --- linux-2.6.orig/drivers/pci/setup-bus.c
> +++ linux-2.6/drivers/pci/setup-bus.c
> @@ -916,6 +916,7 @@ static inline resource_size_t calculate_
> */
> static int pbus_size_mem(struct pci_bus *bus, unsigned long mask,
> unsigned long type, unsigned long type2,
> + unsigned long type3,
> resource_size_t min_size, resource_size_t add_size,
> struct list_head *realloc_head)
> {
> @@ -946,7 +947,8 @@ static int pbus_size_mem(struct pci_bus
> resource_size_t r_size;
>
> if (r->parent || ((r->flags & mask) != type &&
> - (r->flags & mask) != type2))
> + (r->flags & mask) != type2 &&
> + (r->flags & mask) != type3))
> continue;
> r_size = resource_size(r);
> #ifdef CONFIG_PCI_IOV
> @@ -1119,7 +1121,7 @@ void __ref __pci_bus_size_bridges(struct
> struct list_head *realloc_head)
> {
> struct pci_dev *dev;
> - unsigned long mask, prefmask, type2 = 0;
> + unsigned long mask, prefmask, type2 = 0, type3 = 0;
> resource_size_t additional_mem_size = 0, additional_io_size = 0;
> struct resource *b_res;
>
> @@ -1171,26 +1173,29 @@ void __ref __pci_bus_size_bridges(struct
> prefmask = IORESOURCE_MEM | IORESOURCE_PREFETCH;
> if (b_res[2].flags & IORESOURCE_MEM_64) {
> prefmask |= IORESOURCE_MEM_64;
> - if (pbus_size_mem(bus, prefmask, prefmask, prefmask,
> + if (pbus_size_mem(bus, prefmask, prefmask,
> + prefmask, prefmask,
> realloc_head ? 0 : additional_mem_size,
> additional_mem_size, realloc_head)) {
> /* Success, size non-pref64 only. */
> mask = prefmask;
> type2 = prefmask & ~IORESOURCE_MEM_64;
> + type3 = prefmask & ~IORESOURCE_PREFETCH;
> }
> }
> if (!type2) {
> prefmask &= ~IORESOURCE_MEM_64;
> - if (pbus_size_mem(bus, prefmask, prefmask, prefmask,
> + if (pbus_size_mem(bus, prefmask, prefmask,
> + prefmask, prefmask,
> realloc_head ? 0 : additional_mem_size,
> additional_mem_size, realloc_head)) {
> /* Success, size non-prefetch only. */
> mask = prefmask;
> } else
> additional_mem_size += additional_mem_size;
> - type2 = IORESOURCE_MEM;
> + type2 = type3 = IORESOURCE_MEM;
> }
> - pbus_size_mem(bus, mask, IORESOURCE_MEM, type2,
> + pbus_size_mem(bus, mask, IORESOURCE_MEM, type2, type3,
> realloc_head ? 0 : additional_mem_size,
> additional_mem_size, realloc_head);
> break;
> Index: linux-2.6/drivers/pci/setup-res.c
> ===================================================================
> --- linux-2.6.orig/drivers/pci/setup-res.c
> +++ linux-2.6/drivers/pci/setup-res.c
> @@ -212,7 +212,8 @@ static int __pci_assign_resource(struct
> pcibios_align_resource, dev);
>
> if (ret < 0 &&
> - (res->flags & (IORESOURCE_PREFETCH | IORESOURCE_MEM_64))) {
> + (res->flags & (IORESOURCE_PREFETCH | IORESOURCE_MEM_64)) ==
> + (IORESOURCE_PREFETCH | IORESOURCE_MEM_64)) {
> /*
> * That failed.
> *
> @@ -223,12 +224,15 @@ static int __pci_assign_resource(struct
> pcibios_align_resource, dev);
> }
>
> - if (ret < 0 && (res->flags & IORESOURCE_PREFETCH)) {
> + if (ret < 0 &&
> + (res->flags & (IORESOURCE_PREFETCH | IORESOURCE_MEM_64))) {
> /*
> * That failed.
> *
> * But a prefetching area can handle a non-prefetching
> * window (it will just not perform as well).
> + *
> + * Also can put 64bit under 32bit range. (below 4g).
> */
> ret = pci_bus_alloc_resource(bus, res, size, align, min, 0,
> pcibios_align_resource, dev);
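
The setup-res.c part of the delta patch hinges on the difference between testing for any of two flag bits and testing for both. A minimal sketch of that difference follows; the flag values are assumed placeholders and has_any()/has_both() are hypothetical helper names used only for illustration.

```c
/* Illustrative flag bits; assumed for this sketch only. */
#define IORESOURCE_PREFETCH  0x00002000UL
#define IORESOURCE_MEM_64    0x00100000UL
#define PREF64 (IORESOURCE_PREFETCH | IORESOURCE_MEM_64)

/* True if the resource is prefetchable OR 64-bit (either bit set). */
static int has_any(unsigned long flags)
{
	return (flags & PREF64) != 0;
}

/* True only if the resource is prefetchable AND 64-bit (both bits set). */
static int has_both(unsigned long flags)
{
	return (flags & PREF64) == PREF64;
}
```

With the both-bits test, only BARs that are prefetchable and 64-bit take the "try below 4g pref" step, while the any-bit test on the final fallback lets a 64-bit non-prefetchable BAR reach the unrestricted attempt as well.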

Yinghai Lu

Dec 19, 2013, 1:10:01 AM
When one of the children's resources does not support MEM_64, MEM_64 for the
bridge gets reset, which pulls the whole pref resource of the bridge down
under 4G.

If the bridge supports pref mem64, allocate from that window only for
children resources that support pref mem64.
Children resources that only support pref mem32 are allocated
from non-pref mem instead.

If the bridge only supports 32-bit pref mmio, all children pref mmio
still go under that window.

-v2: Add release bridge res support with bridge mem res for pref_mem children res.
-v3: refresh and make it can be applied early before for_each_dev_res patchset.
-v4: fix non-pref mmio 64bit support found by Guo Chao.

Signed-off-by: Yinghai Lu <yin...@kernel.org>
Tested-by: Guo Chao <y...@linux.vnet.ibm.com>
---
drivers/pci/setup-bus.c | 138 ++++++++++++++++++++++++++++++++----------------
drivers/pci/setup-res.c | 20 ++++++-
2 files changed, 111 insertions(+), 47 deletions(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 138bdd6..b29504f 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -713,12 +713,11 @@ static void pci_bridge_check_ranges(struct pci_bus *bus)
bus resource of a given type. Note: we intentionally skip
the bus resources which have already been assigned (that is,
have non-NULL parent resource). */
-static struct resource *find_free_bus_resource(struct pci_bus *bus, unsigned long type)
+static struct resource *find_free_bus_resource(struct pci_bus *bus,
+ unsigned long type_mask, unsigned long type)
{
int i;
struct resource *r;
- unsigned long type_mask = IORESOURCE_IO | IORESOURCE_MEM |
- IORESOURCE_PREFETCH;

pci_bus_for_each_resource(bus, r, i) {
if (r == &ioport_resource || r == &iomem_resource)
@@ -815,7 +814,8 @@ static void pbus_size_io(struct pci_bus *bus, resource_size_t min_size,
resource_size_t add_size, struct list_head *realloc_head)
{
struct pci_dev *dev;
- struct resource *b_res = find_free_bus_resource(bus, IORESOURCE_IO);
+ struct resource *b_res = find_free_bus_resource(bus, IORESOURCE_IO,
+ IORESOURCE_IO);
resource_size_t size = 0, size0 = 0, size1 = 0;
resource_size_t children_add_size = 0;
resource_size_t min_align, align;
@@ -915,15 +915,17 @@ static inline resource_size_t calculate_mem_align(resource_size_t *aligns,
* guarantees that all child resources fit in this size.
*/
static int pbus_size_mem(struct pci_bus *bus, unsigned long mask,
- unsigned long type, resource_size_t min_size,
- resource_size_t add_size,
- struct list_head *realloc_head)
+ unsigned long type, unsigned long type2,
+ unsigned long type3,
+ resource_size_t min_size, resource_size_t add_size,
+ struct list_head *realloc_head)
{
struct pci_dev *dev;
resource_size_t min_align, align, size, size0, size1;
resource_size_t aligns[12]; /* Alignments from 1Mb to 2Gb */
int order, max_order;
- struct resource *b_res = find_free_bus_resource(bus, type);
+ struct resource *b_res = find_free_bus_resource(bus,
+ mask | IORESOURCE_PREFETCH, type);
unsigned int mem64_mask = 0;
resource_size_t children_add_size = 0;

@@ -944,7 +946,9 @@ static int pbus_size_mem(struct pci_bus *bus, unsigned long mask,
struct resource *r = &dev->resource[i];
resource_size_t r_size;

- if (r->parent || (r->flags & mask) != type)
+ if (r->parent || ((r->flags & mask) != type &&
+ (r->flags & mask) != type2 &&
+ (r->flags & mask) != type3))
continue;
r_size = resource_size(r);
#ifdef CONFIG_PCI_IOV
@@ -1117,8 +1121,9 @@ void __ref __pci_bus_size_bridges(struct pci_bus *bus,
struct list_head *realloc_head)
{
struct pci_dev *dev;
- unsigned long mask, prefmask;
+ unsigned long mask, prefmask, type2 = 0, type3 = 0;
resource_size_t additional_mem_size = 0, additional_io_size = 0;
+ struct resource *b_res;

list_for_each_entry(dev, &bus->devices, bus_list) {
struct pci_bus *b = dev->subordinate;
@@ -1163,15 +1168,34 @@ void __ref __pci_bus_size_bridges(struct pci_bus *bus,
has already been allocated by arch code, try
non-prefetchable range for both types of PCI memory
resources. */
+ b_res = &bus->self->resource[PCI_BRIDGE_RESOURCES];
mask = IORESOURCE_MEM;
prefmask = IORESOURCE_MEM | IORESOURCE_PREFETCH;
- if (pbus_size_mem(bus, prefmask, prefmask,
+ if (b_res[2].flags & IORESOURCE_MEM_64) {
+ prefmask |= IORESOURCE_MEM_64;
+ if (pbus_size_mem(bus, prefmask, prefmask,
+ prefmask, prefmask,
realloc_head ? 0 : additional_mem_size,
- additional_mem_size, realloc_head))
- mask = prefmask; /* Success, size non-prefetch only. */
- else
- additional_mem_size += additional_mem_size;
- pbus_size_mem(bus, mask, IORESOURCE_MEM,
+ additional_mem_size, realloc_head)) {
+ /* Success, size non-pref64 only. */
+ mask = prefmask;
+ type2 = prefmask & ~IORESOURCE_MEM_64;
+ type3 = prefmask & ~IORESOURCE_PREFETCH;
+ }
+ }
+ if (!type2) {
+ prefmask &= ~IORESOURCE_MEM_64;
+ if (pbus_size_mem(bus, prefmask, prefmask,
+ prefmask, prefmask,
+ realloc_head ? 0 : additional_mem_size,
+ additional_mem_size, realloc_head)) {
+ /* Success, size non-prefetch only. */
+ mask = prefmask;
+ } else
+ additional_mem_size += additional_mem_size;
+ type2 = type3 = IORESOURCE_MEM;
+ }
+ pbus_size_mem(bus, mask, IORESOURCE_MEM, type2, type3,
realloc_head ? 0 : additional_mem_size,
additional_mem_size, realloc_head);
break;
@@ -1257,42 +1281,66 @@ static void __ref __pci_bridge_assign_resources(const struct pci_dev *bridge,
static void pci_bridge_release_resources(struct pci_bus *bus,
unsigned long type)
{
- int idx;
- bool changed = false;
- struct pci_dev *dev;
+ struct pci_dev *dev = bus->self;
struct resource *r;
unsigned long type_mask = IORESOURCE_IO | IORESOURCE_MEM |
- IORESOURCE_PREFETCH;
+ IORESOURCE_PREFETCH | IORESOURCE_MEM_64;
+ unsigned old_flags = 0;
+ struct resource *b_res;
@@ -1471,7 +1519,7 @@ void pci_assign_unassigned_root_bus_resources(struct pci_bus *bus)
LIST_HEAD(fail_head);
struct pci_dev_resource *fail_res;
unsigned long type_mask = IORESOURCE_IO | IORESOURCE_MEM |
- IORESOURCE_PREFETCH;
+ IORESOURCE_PREFETCH | IORESOURCE_MEM_64;
int pci_try_num = 1;
enum enable_type enable_local;

diff --git a/drivers/pci/setup-res.c b/drivers/pci/setup-res.c
index 5c060b1..2c659e4 100644
--- a/drivers/pci/setup-res.c
+++ b/drivers/pci/setup-res.c
@@ -208,15 +208,31 @@ static int __pci_assign_resource(struct pci_bus *bus, struct pci_dev *dev,

/* First, try exact prefetching match.. */
ret = pci_bus_alloc_resource(bus, res, size, align, min,
- IORESOURCE_PREFETCH,
+ IORESOURCE_PREFETCH | IORESOURCE_MEM_64,
pcibios_align_resource, dev);

- if (ret < 0 && (res->flags & IORESOURCE_PREFETCH)) {
+ if (ret < 0 &&
+ (res->flags & (IORESOURCE_PREFETCH | IORESOURCE_MEM_64)) ==
+ (IORESOURCE_PREFETCH | IORESOURCE_MEM_64)) {
+ /*
+ * That failed.
+ *
+ * Try below 4g pref
+ */
+ ret = pci_bus_alloc_resource(bus, res, size, align, min,
+ IORESOURCE_PREFETCH,
+ pcibios_align_resource, dev);
+ }
+
+ if (ret < 0 &&
+ (res->flags & (IORESOURCE_PREFETCH | IORESOURCE_MEM_64))) {
/*
* That failed.
*
* But a prefetching area can handle a non-prefetching
* window (it will just not perform as well).
+ *
+ * Also can put 64bit under 32bit range. (below 4g).
*/
ret = pci_bus_alloc_resource(bus, res, size, align, min, 0,
pcibios_align_resource, dev);
--
1.8.4
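
The effect of the extra type3 argument on the non-prefetchable sizing pass can be illustrated with a small standalone predicate. This is only a sketch: the flag values are assumed placeholders and is_counted() is a hypothetical helper that mirrors the (r->flags & mask) comparison in pbus_size_mem().

```c
/* Illustrative flag bits; assumed for this sketch only. */
#define IORESOURCE_MEM       0x00000200UL
#define IORESOURCE_PREFETCH  0x00002000UL
#define IORESOURCE_MEM_64    0x00100000UL

/*
 * Hypothetical predicate: a child resource is included in the sizing pass
 * when its masked flags equal one of the accepted types.
 */
static int is_counted(unsigned long flags, unsigned long mask,
		      unsigned long type, unsigned long type2,
		      unsigned long type3)
{
	unsigned long t = flags & mask;

	return t == type || t == type2 || t == type3;
}
```

For a bridge with a 64-bit prefetchable window, the last pass uses mask = MEM | PREFETCH | MEM_64, type = MEM and type2 = MEM | PREFETCH. A 64-bit non-prefetchable BAR (MEM | MEM_64) matches neither of those, and is only picked up once type3 = MEM | MEM_64 is accepted too.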

Yinghai Lu

Dec 19, 2013, 1:10:02 AM
This series improves 64-bit MMIO allocation, which could help Guo Chao
<y...@linux.vnet.ibm.com> with MMIO allocation on powerpc.
It tries to assign 64-bit resources above 4G first.

It is based on the current pci/next and pci/resource.

-v2: update after patch that move device_del down to pci_destroy_dev.
add "Try best to allocate pref mmio 64bit above 4G"

-v3: refresh and send out after pci_clip_resource() changes,
as Bjorn is not happy with attachments.

-v4: make pcibios_resource_to_bus take bus directly.

-v5: fix non-pref mmio64 allocation problem found by Guo Chao.
refresh last three as Bjorn update first two and put them in pci/resource

Yinghai Lu (3):
PCI: Try to allocate mem64 above 4G at first
PCI: Try best to allocate pref mmio 64bit above 4g
PCI: Sort pci root bus resources list

drivers/pci/bus.c | 43 +++++++++++----
drivers/pci/setup-bus.c | 138 ++++++++++++++++++++++++++++++++----------------
drivers/pci/setup-res.c | 20 ++++++-
3 files changed, 144 insertions(+), 57 deletions(-)

Yinghai Lu

Dec 19, 2013, 1:10:02 AM
On systems with many PCIe cards, there is not enough address space below 4G
to allocate resources for all of the PCI devices.

On 64-bit systems, try to allocate mem64 resources above 4G first, and fall
back to below 4G if no space can be found above 4G.

On x86 32-bit without X86_PAE support, bottom will be set to 0 because
resource_size_t is 32 bits wide.
On a 32-bit kernel with PAE support, resource_size_t is 64 bits wide, but
we are still safe because iomem_resource is limited to 32 bits according to
x86_phys_bits.

-v2: update bottom assigning to make it clear for non-pae support machine.
-v3: Bjorn's change:
use MAX_RESOURCE instead of -1
use start/end instead of bottom/max
for all arch instead of just x86_64
-v4: updated after PCI_MAX_RESOURCE_32 change.
-v5: restore io handling to use PCI_MAX_RESOURCE_32 as limit.
-v6: checking pcibios_resource_to_bus return for every bus res, to decide it
if we need to try high at first.
It supports all arches instead of just x86_64.
-v7: split 4G limit change out to another patch according to Bjorn.
also use pci_clip_resource instead.
-v8: refresh after changes in pci/resource.

Signed-off-by: Yinghai Lu <yin...@kernel.org>
---
drivers/pci/bus.c | 28 ++++++++++++++++++++--------
1 file changed, 20 insertions(+), 8 deletions(-)

diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
index 263b90c..1fd0bf8 100644
--- a/drivers/pci/bus.c
+++ b/drivers/pci/bus.c
@@ -100,6 +100,9 @@ void pci_bus_remove_resources(struct pci_bus *bus)

/* The region that can be mapped by a 32-bit BAR. */
static struct pci_bus_region pci_32_bit = {0, 0xffffffff};
+/* The region that can be mapped by a 64-bit BAR above 4G */
+static struct pci_bus_region pci_64_bit = {(resource_size_t)(1ULL<<32),
+ (resource_size_t)(-1ULL)};

/*
* @res contains CPU addresses. Clip it so the corresponding bus addresses
@@ -150,12 +153,12 @@ pci_bus_alloc_resource(struct pci_bus *bus, struct resource *res,
{
int i, ret = -ENOMEM;
struct resource *r;
- resource_size_t max;

type_mask |= IORESOURCE_IO | IORESOURCE_MEM;

pci_bus_for_each_resource(bus, r, i) {
struct resource avail;
+ int try_again = 0;

if (!r)
continue;
@@ -174,12 +177,19 @@ pci_bus_alloc_resource(struct pci_bus *bus, struct resource *res,
* Unless this is a 64-bit BAR, we have to clip the
* available space to the part that maps to the region of
* 32-bit bus addresses.
+ * If this is a 64-bit BAR, try above 4G first.
*/
avail = *r;
if (!(res->flags & IORESOURCE_MEM_64)) {
pci_clip_resource_to_bus(bus, &avail, &pci_32_bit);
if (!resource_size(&avail))
continue;
+ } else {
+ pci_clip_resource_to_bus(bus, &avail, &pci_64_bit);
+ if (!resource_size(&avail))
+ avail = *r;
+ else
+ try_again = 1;
}

/*
@@ -188,16 +198,18 @@ pci_bus_alloc_resource(struct pci_bus *bus, struct resource *res,
* this is an already-configured bridge window, its start
* overrides "min".
*/
- if (avail.start)
- min = avail.start;
-
- max = avail.end;
-
+again:
/* Ok, try it out.. */
- ret = allocate_resource(r, res, size, min, max,
- align, alignf, alignf_data);
+ ret = allocate_resource(r, res, size, avail.start ? : min,
+ avail.end, align, alignf, alignf_data);
if (ret == 0)
break;
+
+ if (try_again) {
+ avail = *r;
+ try_again = 0;
+ goto again;
+ }
}
return ret;
}
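
The commit message's remark about a 32-bit resource_size_t can be checked with a small standalone example. It models resource_size_t with uint32_t and uint64_t, which is an assumption made for illustration; the struct and function names are hypothetical.

```c
#include <stdint.h>

struct region32 { uint32_t start, end; };	/* models a 32-bit resource_size_t */
struct region64 { uint64_t start, end; };	/* models a 64-bit resource_size_t */

/* Same initializer as pci_64_bit, with the casts done to a 32-bit type. */
static struct region32 above_4g_as_32bit(void)
{
	struct region32 r = { (uint32_t)(1ULL << 32), (uint32_t)(-1ULL) };

	return r;
}

/* Same initializer, with the casts done to a 64-bit type. */
static struct region64 above_4g_as_64bit(void)
{
	struct region64 r = { (uint64_t)(1ULL << 32), (uint64_t)(-1ULL) };

	return r;
}
```

In the 32-bit case the start truncates to 0 and the end to 0xffffffff, so the "above 4G" region degenerates to the ordinary 32-bit range.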

Yinghai Lu

Dec 19, 2013, 1:20:02 AM
Some x86 systems expose above 4G 64bit mmio in _CRS as non-pref mmio range.
[ 49.415281] PCI host bridge to bus 0000:00
[ 49.419921] pci_bus 0000:00: root bus resource [bus 00-1e]
[ 49.426107] pci_bus 0000:00: root bus resource [io 0x0000-0x0cf7]
[ 49.433041] pci_bus 0000:00: root bus resource [io 0x1000-0x5fff]
[ 49.440010] pci_bus 0000:00: root bus resource [mem 0x000a0000-0x000bffff]
[ 49.447768] pci_bus 0000:00: root bus resource [mem 0xfed8c000-0xfedfffff]
[ 49.455532] pci_bus 0000:00: root bus resource [mem 0x90000000-0x9fffbfff]
[ 49.463259] pci_bus 0000:00: root bus resource [mem 0x380000000000-0x381fffffffff]

When assigning an unassigned 64-bit MMIO resource, pci_bus_alloc_resource()
goes through every non-pref MMIO range of the root bus.
Because the loop uses pci_bus_for_each_resource(), it may pick an MMIO range
under 4G instead of one above 4G if the requested size is small enough to fit
there, even though the resource could handle 64-bit pref MMIO above 4G.

For the root bus, we can order the list from high to low in
pci_add_resource_offset(); when the root bus is created, the final bus
resource list keeps the same order.
pci_acpi_scan_root
==> add_resources
==> pci_add_resource_offset: # Add to temp resources
==> pci_create_root_bus
==> pci_bus_add_resource # add to final bus resources.

After that, 64-bit pref MMIO for PCI bridges is allocated from the highest
non-pref MMIO range, which in this case is above 4G instead of under 4G.

Signed-off-by: Yinghai Lu <yin...@kernel.org>
---
drivers/pci/bus.c | 15 +++++++++++++--
1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
index 1fd0bf8..b8a2370 100644
--- a/drivers/pci/bus.c
+++ b/drivers/pci/bus.c
@@ -21,7 +21,8 @@
void pci_add_resource_offset(struct list_head *resources, struct resource *res,
resource_size_t offset)
{
- struct pci_host_bridge_window *window;
+ struct pci_host_bridge_window *window, *tmp;
+ struct list_head *n;

window = kzalloc(sizeof(struct pci_host_bridge_window), GFP_KERNEL);
if (!window) {
@@ -31,7 +32,17 @@ void pci_add_resource_offset(struct list_head *resources, struct resource *res,

window->res = res;
window->offset = offset;
- list_add_tail(&window->list, resources);
+
+ /* Keep list sorted according to res end */
+ n = resources;
+ list_for_each_entry(tmp, resources, list)
+ if (window->res->end > tmp->res->end) {
+ n = &tmp->list;
+ break;
+ }
+
+ /* Insert it just before n */
+ list_add_tail(&window->list, n);
}
EXPORT_SYMBOL(pci_add_resource_offset);
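
The sorted insertion in this patch can be sketched outside the kernel with a plain array instead of a list_head. This is an illustration only: struct window and insert_sorted() are hypothetical names, and the caller is assumed to provide enough capacity.

```c
#include <stddef.h>
#include <stdint.h>

struct window { uint64_t start, end; };

/*
 * Insert w into list[0..*n) so that entries stay ordered from the highest
 * end address to the lowest. The new entry goes just before the first
 * existing entry with a smaller end, or at the tail if there is none.
 */
static void insert_sorted(struct window *list, size_t *n, struct window w)
{
	size_t pos = *n;

	for (size_t i = 0; i < *n; i++) {
		if (w.end > list[i].end) {
			pos = i;
			break;
		}
	}
	/* shift the tail up by one slot to make room at pos */
	for (size_t i = *n; i > pos; i--)
		list[i] = list[i - 1];
	list[pos] = w;
	(*n)++;
}
```

Feeding in the four mem windows from the log above in their reported order leaves the 0x380000000000 window first, so a loop over the list sees the above-4G range before any range under 4G.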

Yinghai Lu

Dec 19, 2013, 11:40:01 AM
On Wed, Dec 18, 2013 at 10:09 PM, Yinghai Lu <yin...@kernel.org> wrote:
> On systems with many PCIe cards, there is not enough address space below 4G
> to allocate resources for all of the PCI devices.
>
> On 64-bit systems, try to allocate mem64 resources above 4G first, and fall
> back to below 4G if no space can be found above 4G.
>
> On x86 32-bit without X86_PAE support, bottom will be set to 0 because
> resource_size_t is 32 bits wide.
> On a 32-bit kernel with PAE support, resource_size_t is 64 bits wide, but
> we are still safe because iomem_resource is limited to 32 bits according to
> x86_phys_bits.
>
> -v2: update bottom assigning to make it clear for non-pae support machine.
> -v3: Bjorn's change:
> use MAX_RESOURCE instead of -1
> use start/end instead of bottom/max
> for all arch instead of just x86_64
> -v4: updated after PCI_MAX_RESOURCE_32 change.
> -v5: restore io handling to use PCI_MAX_RESOURCE_32 as limit.
> -v6: checking pcibios_resource_to_bus return for every bus res, to decide it
> if we need to try high at first.
> It supports all arches instead of just x86_64.
> -v7: split 4G limit change out to another patch according to Bjorn.
> also use pci_clip_resource instead.
> -v8: refresh after changes in pci/resource.

It looks like there is still another problem; I will send out an updated version later.

Yinghai Lu

Dec 19, 2013, 3:50:03 PM
This series improves 64-bit MMIO allocation, which could help Guo Chao
<y...@linux.vnet.ibm.com> with MMIO allocation on powerpc.
It tries to assign 64-bit resources above 4G first.

It is based on the current pci/next and pci/resource.

-v2: update after patch that move device_del down to pci_destroy_dev.
add "Try best to allocate pref mmio 64bit above 4G"

-v3: refresh and send out after pci_clip_resource() changes,
as Bjorn is not happy with attachments.

-v4: make pcibios_resource_to_bus take bus directly.

-v5: fix non-pref mmio64 allocation problem found by Guo Chao.
refresh last three as Bjorn update first two and put them in pci/resource

-v6: try above 4G at first, then restart again for under 4G.
so we can drop patch that sort pci root bus resource list.

Yinghai Lu (2):
PCI: Try to allocate mem64 above 4G at first
PCI: Try best to allocate pref mmio 64bit above 4g

drivers/pci/bus.c | 34 ++++++++----
drivers/pci/setup-bus.c | 138 ++++++++++++++++++++++++++++++++----------------
drivers/pci/setup-res.c | 20 ++++++-
3 files changed, 135 insertions(+), 57 deletions(-)

--
1.8.4

Yinghai Lu

Dec 19, 2013, 3:50:03 PM
When one of the children's resources does not support MEM_64, MEM_64 for the
bridge gets reset, which pulls the whole pref resource of the bridge down
under 4G.

If the bridge supports pref mem64, allocate from that window only for
children resources that support pref mem64.
Children resources that only support pref mem32 are allocated
from non-pref mem instead.

If the bridge only supports 32-bit pref mmio, all children pref mmio
still go under that window.

-v2: Add release bridge res support with bridge mem res for pref_mem children res.
-v3: refresh and make it can be applied early before for_each_dev_res patchset.
-v4: fix non-pref mmio 64bit support found by Guo Chao.

Signed-off-by: Yinghai Lu <yin...@kernel.org>
Tested-by: Guo Chao <y...@linux.vnet.ibm.com>
---
drivers/pci/setup-bus.c | 138 ++++++++++++++++++++++++++++++++----------------
drivers/pci/setup-res.c | 20 ++++++-

Yinghai Lu

Dec 19, 2013, 3:50:03 PM
On systems with many PCIe cards, there is not enough address space below 4G
to allocate resources for all of the PCI devices.

On 64-bit systems, try to allocate mem64 resources above 4G first, and fall
back to below 4G if no space can be found above 4G.

-v2: update bottom assigning to make it clear for non-pae support machine.
-v3: Bjorn's change:
use MAX_RESOURCE instead of -1
use start/end instead of bottom/max
for all arch instead of just x86_64
-v4: updated after PCI_MAX_RESOURCE_32 change.
-v5: restore io handling to use PCI_MAX_RESOURCE_32 as limit.
-v6: checking pcibios_resource_to_bus return for every bus res, to decide it
if we need to try high at first.
It supports all arches instead of just x86_64.
-v7: split 4G limit change out to another patch according to Bjorn.
also use pci_clip_resource instead.
-v8: refresh after changes in pci/resource.
-v9: make the second try restart from the first res of the bus,
so we can omit the patch that sorts the resource list of the pci root bus.

Signed-off-by: Yinghai Lu <yin...@kernel.org>
---
drivers/pci/bus.c | 34 ++++++++++++++++++++++++----------
1 file changed, 24 insertions(+), 10 deletions(-)

diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
index 263b90c..d49e6cb 100644
--- a/drivers/pci/bus.c
+++ b/drivers/pci/bus.c
@@ -100,6 +100,9 @@ void pci_bus_remove_resources(struct pci_bus *bus)

/* The region that can be mapped by a 32-bit BAR. */
static struct pci_bus_region pci_32_bit = {0, 0xffffffff};
+/* The region that can be mapped by a 64-bit BAR above 4G */
+static struct pci_bus_region pci_64_bit = {(resource_size_t)(1ULL<<32),
+ (resource_size_t)(-1ULL)};

/*
* @res contains CPU addresses. Clip it so the corresponding bus addresses
@@ -150,10 +153,11 @@ pci_bus_alloc_resource(struct pci_bus *bus, struct resource *res,
{
int i, ret = -ENOMEM;
struct resource *r;
- resource_size_t max;
+ bool try_again = !!(res->flags & IORESOURCE_MEM_64);

type_mask |= IORESOURCE_IO | IORESOURCE_MEM;

+again:
pci_bus_for_each_resource(bus, r, i) {
struct resource avail;

@@ -170,13 +174,21 @@ pci_bus_alloc_resource(struct pci_bus *bus, struct resource *res,
!(res->flags & IORESOURCE_PREFETCH))
continue;

+ /* If this is a 64-bit BAR, try above 4G first. */
+ avail = *r;
+ if (try_again) {
+ /* res->flags has IORESOURCE_MEM_64 set */
+ pci_clip_resource_to_bus(bus, &avail, &pci_64_bit);
+ if (!resource_size(&avail))
+ continue;
+ }
+
/*
* Unless this is a 64-bit BAR, we have to clip the
* available space to the part that maps to the region of
* 32-bit bus addresses.
*/
- avail = *r;
- if (!(res->flags & IORESOURCE_MEM_64)) {
+ if (!try_again && !(res->flags & IORESOURCE_MEM_64)) {
pci_clip_resource_to_bus(bus, &avail, &pci_32_bit);
if (!resource_size(&avail))
continue;
@@ -188,17 +200,19 @@ pci_bus_alloc_resource(struct pci_bus *bus, struct resource *res,
* this is an already-configured bridge window, its start
* overrides "min".
*/
- if (avail.start)
- min = avail.start;
-
- max = avail.end;

/* Ok, try it out.. */
- ret = allocate_resource(r, res, size, min, max,
- align, alignf, alignf_data);
+ ret = allocate_resource(r, res, size, avail.start ? : min,
+ avail.end, align, alignf, alignf_data);
if (ret == 0)
- break;
+ return 0;
}
+
+ if (try_again) {
+ try_again = false;
+ goto again;
+ }
+
return ret;
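
The retry logic in the hunk above can be sketched in plain userspace C. This is only an illustration under simplifying assumptions, not the kernel code: `struct range`, `clip()`, `fit()` and `alloc_two_pass()` are hypothetical names, CPU and bus addresses are treated as identical, and space already claimed inside a window is ignored. The first pass clips every window to the region above 4G when the BAR is 64-bit; if nothing fits, the second pass restarts from the first window with the usual limits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct resource: an inclusive address range. */
struct range {
	uint64_t start;
	uint64_t end;
};

#define FOUR_GB (1ULL << 32)

/* Clip *r to [lo, hi]; return false when nothing is left. */
static bool clip(struct range *r, uint64_t lo, uint64_t hi)
{
	if (r->start < lo)
		r->start = lo;
	if (r->end > hi)
		r->end = hi;
	return r->start <= r->end;
}

/* Place size bytes at the lowest aligned address inside *avail. */
static bool fit(const struct range *avail, uint64_t size, uint64_t align,
		uint64_t *out)
{
	uint64_t start = (avail->start + align - 1) & ~(align - 1);

	if (start < avail->start)	/* rounding up wrapped around */
		return false;
	if (start > avail->end || avail->end - start < size - 1)
		return false;
	*out = start;
	return true;
}

/*
 * Two-pass allocation: a 64-bit BAR first looks only above 4G, then
 * retries every window from the start without that restriction.
 * A 32-bit BAR gets a single pass limited to the space below 4G.
 */
static bool alloc_two_pass(const struct range *windows, size_t n,
			   bool is_64bit, uint64_t size, uint64_t align,
			   uint64_t *out)
{
	bool try_high = is_64bit;

	/* align must be a power of two for the rounding in fit() */
	assert(size > 0 && align > 0 && (align & (align - 1)) == 0);

	for (;;) {
		for (size_t i = 0; i < n; i++) {
			struct range avail = windows[i];

			if (try_high) {
				if (!clip(&avail, FOUR_GB, UINT64_MAX))
					continue;
			} else if (!is_64bit) {
				if (!clip(&avail, 0, FOUR_GB - 1))
					continue;
			}
			if (fit(&avail, size, align, out))
				return true;
		}
		if (!try_high)
			return false;
		try_high = false;	/* second pass: restart from window 0 */
	}
}
```

With one window below 4G and one above, a 64-bit request lands in the high window even though the low window comes first in the list, which is why the restart makes sorting the root bus resources unnecessary. Note that the patch casts the bounds of pci_64_bit to resource_size_t, so if that type is only 32 bits wide the "above 4G" region truncates to {0, 0xffffffff} and both passes cover the same space.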

Bjorn Helgaas

Dec 22, 2013, 7:10:01 PM
On Thu, Dec 19, 2013 at 1:44 PM, Yinghai Lu <yin...@kernel.org> wrote:

Let me see if I can figure out what you're trying to do here. Please
correct me if I'm wrong:

> When one of children resources does not support MEM_64, MEM_64 for
> bridge get reset, so pull down whole pref resource on the bridge under 4G.

When we allocate space for a bridge's prefetchable window, we
currently look at the devices behind the bridge and put the window
below 4GB if any of those children has a 32-bit prefetchable BAR.

This maximizes the use of prefetch, at the cost of using more 32-bit
address space.

> If the bridge support pref mem 64, will only allocate that with pref mem64 to
> children that support it.
> For children resources if they only support pref mem 32, will allocate them
> from non pref mem instead.

You are changing this so that we will always try to put a bridge's
64-bit prefetchable window above 4GB, regardless of what devices are
behind the bridge. If a device behind the bridge has a 32-bit
prefetchable BAR, we will place that BAR in the bridge's 32-bit
non-prefetchable window.

This minimizes the use of the 32-bit address space, at the cost of not
being able to use prefetch as much.

> If the bridge only support 32bit pref mmio, will still have all children pref
> mmio under that.

Obviously, if a bridge has a prefetchable window that's only 32 bits,
64-bit prefetchable BARs behind the bridge will have to be in that
32-bit prefetchable window or the 32-bit non-prefetchable window. And
if the bridge has no prefetchable window at all, every memory BAR
behind the bridge will have to be in the 32-bit non-prefetchable
window.

I'll look at the actual patch later; I just want to make sure I
understand your intent first.

Bjorn
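
The placement policy described in the message above can be written down as a small decision function. This is a sketch of the intent as summarized in this thread, not code from the patch: the enum names and `place_bar()` are hypothetical, and for a bridge whose prefetchable window is only 32 bits it assumes prefetchable BARs go to that window, although the non-prefetchable window would also be allowed.

```c
#include <assert.h>
#include <stdbool.h>

/* What kind of prefetchable window the bridge implements (hypothetical). */
enum pref_window { PREF_NONE, PREF_32BIT, PREF_64BIT };

/* Which bridge window a child BAR ends up in (hypothetical). */
enum placement { IN_NONPREF_WINDOW, IN_PREF_WINDOW };

static enum placement place_bar(enum pref_window bridge, bool bar_pref,
				bool bar_64bit)
{
	assert(bridge == PREF_NONE || bridge == PREF_32BIT ||
	       bridge == PREF_64BIT);

	/* Non-prefetchable BARs, or no prefetchable window at all. */
	if (!bar_pref || bridge == PREF_NONE)
		return IN_NONPREF_WINDOW;

	/*
	 * The 64-bit prefetchable window is kept for 64-bit BARs so it
	 * can sit above 4G; 32-bit prefetchable BARs give up prefetch.
	 */
	if (bridge == PREF_64BIT && !bar_64bit)
		return IN_NONPREF_WINDOW;

	return IN_PREF_WINDOW;
}
```

The second rule is the trade-off discussed above: it saves 32-bit address space at the cost of using prefetch less.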

Yinghai Lu

Dec 22, 2013, 8:20:02 PM
On Sun, Dec 22, 2013 at 4:00 PM, Bjorn Helgaas <bhel...@google.com> wrote:
> On Thu, Dec 19, 2013 at 1:44 PM, Yinghai Lu <yin...@kernel.org> wrote:
>
> Let me see if I can figure out what you're trying to do here. Please
> correct me if I'm wrong:
>
>> When one of children resources does not support MEM_64, MEM_64 for
>> bridge get reset, so pull down whole pref resource on the bridge under 4G.
>
> When we allocate space for a bridge's prefetchable window, we
> currently look at the devices behind the bridge and put the window
> below 4GB if any of those children has a 32-bit prefetchable BAR.
>
> This maximizes the use of prefetch, at the cost of using more 32-bit
> address space.

Yes. And we have a problem on 8-socket or 32-socket systems, where
32-bit space is limited,
but we have enough 64-bit mmio above 4G for prefetchable resources.

>
>> If the bridge support pref mem 64, will only allocate that with pref mem64 to
>> children that support it.
>> For children resources if they only support pref mem 32, will allocate them
>> from non pref mem instead.
>
> You are changing this so that we will always try to put a bridge's
> 64-bit prefetchable window above 4GB, regardless of what devices are
> behind the bridge. If a device behind the bridge has a 32-bit
> prefetchable BAR, we will place that BAR in the bridge's 32-bit
> non-prefetchable window.

Yes, so we can keep IORESOURCE_MEM_64 in the flags for PREF.

>
> This minimizes the use of the 32-bit address space, at the cost of not
> being able to use prefetch as much.
>
>> If the bridge only support 32bit pref mmio, will still have all children pref
>> mmio under that.
>
> Obviously, if a bridge has a prefetchable window that's only 32 bits,
> 64-bit prefetchable BARs behind the bridge will have to be in that
> 32-bit prefetchable window or the 32-bit non-prefetchable window. And
> if the bridge has no prefetchable window at all, every memory BAR
> behind the bridge will have to be in the 32-bit non-prefetchable
> window.

Yes.

>
> I'll look at the actual patch later; I just want to make sure I
> understand your intent first.

Thanks

Yinghai

Yinghai Lu

Jan 8, 2014, 6:40:01 PM
Hi, Bjorn,

Can you check and add this one to your pci/resource branch?
With that we can close the loop for 64bit mmio resource allocation.

Guo Chao

Jan 10, 2014, 4:50:02 AM
Just FYI, a Mellanox net card failed after exactly this patch.

3.13-rc7 + Bjorn's series is OK. After this patch is applied, the Mellanox
driver complains:

|mlx4_core 0003:05:00.0: Multiple PFs not yet supported. Skipping PF.
|mlx4_core: probe of 0003:05:00.0 failed with error -22

This is caused by an MMIO read from BAR 0 (64-bit non-prefetchable) returning
a non-zero value.

Resource assignment, as far as we can see, works fine. The noticeable
effect of this patch is putting the ROM BAR under the non-prefetchable window.
I tried to revert this effect by adding MEM_64 to its ROM resource and it works
again (the system does not expose an above-4G aperture yet). Not sure what
the root cause is; it looks like a driver/firmware/hardware defect.

Thanks
Guo Chao

Yinghai Lu

Jan 10, 2014, 12:10:01 PM
Interesting. Can you post the boot logs with "debug ignore_loglevel initcall_debug",
both with and without this patch?