
[PATCH 0/2] mm/memblock: Excluded memory, free_all_bootmem


Philipp Hachtmann

Jan 13, 2014, 6:40:01 AM
These two patches fit (only) on top of linux-next!

The first patch reverts free_all_bootmem() to more generic behaviour: with
CONFIG_ARCH_DISCARD_MEMBLOCK, the memblock.memory and memblock.reserved
region arrays are freed (if they were dynamically allocated, of course).
The debugfs dependency has been removed; I think this is cleaner.
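
(For orientation, a condensed sketch of the logic this yields in
free_low_memory_core_early(); distilled from patch 1 below, not new code,
with variable declarations as in that function:)

#ifdef CONFIG_ARCH_DISCARD_MEMBLOCK
	/* Hand the dynamically allocated memblock region arrays back to
	 * the page allocator once boot is far enough along. */
	size = get_allocated_memblock_reserved_regions_info(&start);
	if (size)
		count += __free_memory_core(start, start + size);

	size = get_allocated_memblock_memory_regions_info(&start);
	if (size)
		count += __free_memory_core(start, start + size);
#endif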

While continuing the s390 migration to memblock it became desirable to have
memblock support unmapped (i.e. completely forgotten and unused) memory
areas. The usual way of just forgetting about them by truncating the
memblocks does not work for us, because we still need the information about
the real, full memory layout at a later time.


Philipp Hachtmann (2):
mm/nobootmem: free_all_bootmem again
mm/memblock: Add support for excluded memory areas

include/linux/memblock.h | 50 ++++++--
mm/memblock.c | 324 +++++++++++++++++++++++++++++++----------------
mm/nobootmem.c | 13 +-
3 files changed, 271 insertions(+), 116 deletions(-)

--
1.8.4.5


Philipp Hachtmann

Jan 13, 2014, 6:40:02 AM
get_allocated_memblock_reserved_regions_info() should do its job whenever it
is compiled in. Extend the #ifdef around
get_allocated_memblock_memory_regions_info() to cover
get_allocated_memblock_reserved_regions_info() as well, and make the
corresponding changes in nobootmem.c's free_low_memory_core_early(), where
the two functions are called.

Signed-off-by: Philipp Hachtmann <pha...@linux.vnet.ibm.com>
---
mm/memblock.c | 17 ++---------------
mm/nobootmem.c | 4 ++--
2 files changed, 4 insertions(+), 17 deletions(-)

diff --git a/mm/memblock.c b/mm/memblock.c
index 64ed243..9c0aeef 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -266,33 +266,20 @@ static void __init_memblock memblock_remove_region(struct memblock_type *type, u
}
}

+#ifdef CONFIG_ARCH_DISCARD_MEMBLOCK
+
phys_addr_t __init_memblock get_allocated_memblock_reserved_regions_info(
phys_addr_t *addr)
{
if (memblock.reserved.regions == memblock_reserved_init_regions)
return 0;

- /*
- * Don't allow nobootmem allocator to free reserved memory regions
- * array if
- * - CONFIG_DEBUG_FS is enabled;
- * - CONFIG_ARCH_DISCARD_MEMBLOCK is not enabled;
- * - reserved memory regions array have been resized during boot.
- * Otherwise debug_fs entry "sys/kernel/debug/memblock/reserved"
- * will show garbage instead of state of memory reservations.
- */
- if (IS_ENABLED(CONFIG_DEBUG_FS) &&
- !IS_ENABLED(CONFIG_ARCH_DISCARD_MEMBLOCK))
- return 0;
-
*addr = __pa(memblock.reserved.regions);

return PAGE_ALIGN(sizeof(struct memblock_region) *
memblock.reserved.max);
}

-#ifdef CONFIG_ARCH_DISCARD_MEMBLOCK
-
phys_addr_t __init_memblock get_allocated_memblock_memory_regions_info(
phys_addr_t *addr)
{
diff --git a/mm/nobootmem.c b/mm/nobootmem.c
index 17c8902..e2906a5 100644
--- a/mm/nobootmem.c
+++ b/mm/nobootmem.c
@@ -122,13 +122,13 @@ static unsigned long __init free_low_memory_core_early(void)
for_each_free_mem_range(i, NUMA_NO_NODE, &start, &end, NULL)
count += __free_memory_core(start, end);

+#ifdef CONFIG_ARCH_DISCARD_MEMBLOCK
+
/* Free memblock.reserved array if it was allocated */
size = get_allocated_memblock_reserved_regions_info(&start);
if (size)
count += __free_memory_core(start, start + size);

-#ifdef CONFIG_ARCH_DISCARD_MEMBLOCK
-
/* Free memblock.memory array if it was allocated */
size = get_allocated_memblock_memory_regions_info(&start);
if (size)

Philipp Hachtmann

Jan 13, 2014, 6:40:02 AM
Add a new memory state "nomap" to memblock. This can be used to truncate
the usable memory in the system without forgetting about what is really
installed.
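
(As an aside, a minimal usage sketch, not part of the patch; ram_size,
oldmem_base/oldmem_size and the two helpers are made up for illustration:)

	u64 i;
	phys_addr_t start, end;
	struct memblock_region *reg;

	memblock_add(0, ram_size);                 /* everything installed */
	memblock_nomap(oldmem_base, oldmem_size);  /* exclude, but remember */

	/* only the usable part is handed on */
	for_each_mapped_mem_range(i, NUMA_NO_NODE, &start, &end, NULL)
		setup_usable_range(start, end);    /* made-up helper */

	/* the full layout stays visible, e.g. for the s390 dump tools */
	for_each_memblock(memory, reg)
		note_installed_range(reg->base, reg->size); /* made-up helper */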

Signed-off-by: Philipp Hachtmann <pha...@linux.vnet.ibm.com>
---
include/linux/memblock.h | 50 ++++++--
mm/memblock.c | 307 +++++++++++++++++++++++++++++++++--------------
mm/nobootmem.c | 9 ++
3 files changed, 267 insertions(+), 99 deletions(-)

diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index 1ef6636..2333d3f 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -18,6 +18,7 @@
#include <linux/mm.h>

#define INIT_MEMBLOCK_REGIONS 128
+#define INIT_MEMBLOCK_NOMAP_REGIONS 4

/* Definition of memblock flags. */
#define MEMBLOCK_HOTPLUG 0x1 /* hotpluggable region */
@@ -43,6 +44,9 @@ struct memblock {
phys_addr_t current_limit;
struct memblock_type memory;
struct memblock_type reserved;
+#ifdef CONFIG_ARCH_MEMBLOCK_NOMAP
+ struct memblock_type nomap;
+#endif
};

extern struct memblock memblock;
@@ -68,6 +72,10 @@ int memblock_add(phys_addr_t base, phys_addr_t size);
int memblock_remove(phys_addr_t base, phys_addr_t size);
int memblock_free(phys_addr_t base, phys_addr_t size);
int memblock_reserve(phys_addr_t base, phys_addr_t size);
+#ifdef CONFIG_ARCH_MEMBLOCK_NOMAP
+int memblock_nomap(phys_addr_t base, phys_addr_t size);
+int memblock_remap(phys_addr_t base, phys_addr_t size);
+#endif
void memblock_trim_memory(phys_addr_t align);
int memblock_mark_hotplug(phys_addr_t base, phys_addr_t size);
int memblock_clear_hotplug(phys_addr_t base, phys_addr_t size);
@@ -113,8 +121,9 @@ void __next_mem_pfn_range(int *idx, int nid, unsigned long *out_start_pfn,
i >= 0; __next_mem_pfn_range(&i, nid, p_start, p_end, p_nid))
#endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */

-void __next_free_mem_range(u64 *idx, int nid, phys_addr_t *out_start,
- phys_addr_t *out_end, int *out_nid);
+void __next_mem_range(u64 *idx, int nid, struct memblock_type *type_a,
+ struct memblock_type *type_b, phys_addr_t *out_start,
+ phys_addr_t *out_end, int *out_nid);

/**
* for_each_free_mem_range - iterate through free memblock areas
@@ -129,12 +138,31 @@ void __next_free_mem_range(u64 *idx, int nid, phys_addr_t *out_start,
*/
#define for_each_free_mem_range(i, nid, p_start, p_end, p_nid) \
for (i = 0, \
- __next_free_mem_range(&i, nid, p_start, p_end, p_nid); \
+ __next_mem_range(&i, nid, &memblock.memory, \
+ &memblock.reserved, p_start, \
+ p_end, p_nid); \
+ i != (u64)ULLONG_MAX; \
+ __next_mem_range(&i, nid, &memblock.memory, \
+ &memblock.reserved, \
+ p_start, p_end, p_nid))
+
+#ifdef CONFIG_ARCH_MEMBLOCK_NOMAP
+#define for_each_mapped_mem_range(i, nid, p_start, p_end, p_nid) \
+ for (i = 0, \
+ __next_mem_range(&i, nid, &memblock.memory, \
+ &memblock.nomap, p_start, \
+ p_end, p_nid); \
i != (u64)ULLONG_MAX; \
- __next_free_mem_range(&i, nid, p_start, p_end, p_nid))
+ __next_mem_range(&i, nid, &memblock.memory, \
+ &memblock.nomap, \
+ p_start, p_end, p_nid))
+#endif

-void __next_free_mem_range_rev(u64 *idx, int nid, phys_addr_t *out_start,
- phys_addr_t *out_end, int *out_nid);
+void __next_mem_range_rev(u64 *idx, int nid,
+ struct memblock_type *type_a,
+ struct memblock_type *type_b,
+ phys_addr_t *out_start,
+ phys_addr_t *out_end, int *out_nid);

/**
* for_each_free_mem_range_reverse - rev-iterate through free memblock areas
@@ -149,9 +177,15 @@ void __next_free_mem_range_rev(u64 *idx, int nid, phys_addr_t *out_start,
*/
#define for_each_free_mem_range_reverse(i, nid, p_start, p_end, p_nid) \
for (i = (u64)ULLONG_MAX, \
- __next_free_mem_range_rev(&i, nid, p_start, p_end, p_nid); \
+ __next_mem_range_rev(&i, nid, \
+ &memblock.memory, \
+ &memblock.reserved, \
+ p_start, p_end, p_nid); \
i != (u64)ULLONG_MAX; \
- __next_free_mem_range_rev(&i, nid, p_start, p_end, p_nid))
+ __next_mem_range_rev(&i, nid, \
+ &memblock.memory, \
+ &memblock.reserved, \
+ p_start, p_end, p_nid))

static inline void memblock_set_region_flags(struct memblock_region *r,
unsigned long flags)
diff --git a/mm/memblock.c b/mm/memblock.c
index 9c0aeef..dba3252 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -28,6 +28,11 @@
static struct memblock_region memblock_memory_init_regions[INIT_MEMBLOCK_REGIONS] __initdata_memblock;
static struct memblock_region memblock_reserved_init_regions[INIT_MEMBLOCK_REGIONS] __initdata_memblock;

+#ifdef CONFIG_ARCH_MEMBLOCK_NOMAP
+static struct memblock_region
+memblock_nomap_init_regions[INIT_MEMBLOCK_NOMAP_REGIONS] __initdata_memblock;
+#endif
+
struct memblock memblock __initdata_memblock = {
.memory.regions = memblock_memory_init_regions,
.memory.cnt = 1, /* empty dummy entry */
@@ -37,6 +42,11 @@ struct memblock memblock __initdata_memblock = {
.reserved.cnt = 1, /* empty dummy entry */
.reserved.max = INIT_MEMBLOCK_REGIONS,

+#ifdef CONFIG_ARCH_MEMBLOCK_NOMAP
+ .nomap.regions = memblock_nomap_init_regions,
+ .nomap.cnt = 1, /* empty dummy entry */
+ .nomap.max = INIT_MEMBLOCK_NOMAP_REGIONS,
+#endif
.bottom_up = false,
.current_limit = MEMBLOCK_ALLOC_ANYWHERE,
};
@@ -292,7 +302,21 @@ phys_addr_t __init_memblock get_allocated_memblock_memory_regions_info(
memblock.memory.max);
}

-#endif
+#ifdef CONFIG_ARCH_MEMBLOCK_NOMAP
+phys_addr_t __init_memblock get_allocated_memblock_nomap_regions_info(
+ phys_addr_t *addr)
+{
+ if (memblock.nomap.regions == memblock_nomap_init_regions)
+ return 0;
+
+ *addr = __pa(memblock.nomap.regions);
+
+ return PAGE_ALIGN(sizeof(struct memblock_region) *
+ memblock.nomap.max);
+}
+
+#endif /* CONFIG_ARCH_MEMBLOCK_NOMAP */
+#endif /* CONFIG_ARCH_DISCARD_MEMBLOCK */

/**
* memblock_double_array - double the size of the memblock regions array
@@ -757,18 +781,76 @@ int __init_memblock memblock_clear_hotplug(phys_addr_t base, phys_addr_t size)
return 0;
}

+#ifdef CONFIG_ARCH_MEMBLOCK_NOMAP
+/*
+ * memblock_nomap() - mark a memory range as completely unusable
+ *
+ * This can be used to exclude memory regions from any further use in
+ * the running system. Ranges which are added to the nomap list are
+ * also marked as reserved, so they will neither be allocated by
+ * memblock nor freed to the page allocator.
+ *
+ * The usable (i.e. not in nomap list) memory can be iterated
+ * via for_each_mapped_mem_range().
+ *
+ * memblock_start_of_DRAM() and memblock_end_of_DRAM() still refer to the
+ * whole system memory.
+ */
+int __init_memblock memblock_nomap(phys_addr_t base, phys_addr_t size)
+{
+ int ret;
+ memblock_dbg("memblock_nomap: [%#016llx-%#016llx] %pF\n",
+ (unsigned long long)base,
+ (unsigned long long)base + size,
+ (void *)_RET_IP_);
+
+ ret = memblock_add_region(&memblock.reserved, base, size, MAX_NUMNODES);
+ if (ret)
+ return ret;
+
+ return memblock_add_region(&memblock.nomap, base, size, MAX_NUMNODES);
+}
+
+/*
+ * memblock_remap() - remove a memory range from the nomap list
+ *
+ * This is the inverse function to memblock_nomap().
+ */
+int __init_memblock memblock_remap(phys_addr_t base, phys_addr_t size)
+{
+ int ret;
+ memblock_dbg("memblock_remap: [%#016llx-%#016llx] %pF\n",
+ (unsigned long long)base,
+ (unsigned long long)base + size,
+ (void *)_RET_IP_);
+
+ ret = __memblock_remove(&memblock.reserved, base, size);
+ if (ret)
+ return ret;
+
+ return __memblock_remove(&memblock.nomap, base, size);
+}
+
+#endif
+
/**
- * __next_free_mem_range - next function for for_each_free_mem_range()
+ * __next_mem_range - generic next function for for_each_*_range()
+ *
+ * Finds the next range from type_a which is not marked as unsuitable
+ * in type_b.
+ *
* @idx: pointer to u64 loop variable
* @nid: node selector, %NUMA_NO_NODE for all nodes
+ * @type_a: pointer to memblock_type from where the range is taken
+ * @type_b: pointer to memblock_type which excludes memory from being taken
* @out_start: ptr to phys_addr_t for start address of the range, can be %NULL
* @out_end: ptr to phys_addr_t for end address of the range, can be %NULL
* @out_nid: ptr to int for nid of the range, can be %NULL
*
- * Find the first free area from *@idx which matches @nid, fill the out
+ * Find the first present area from *@idx which matches @nid, fill the out
* parameters, and update *@idx for the next iteration. The lower 32bit of
- * *@idx contains index into memory region and the upper 32bit indexes the
- * areas before each reserved region. For example, if reserved regions
+ * *@idx contains index into type_a region and the upper 32bit indexes the
+ * areas before each type_b region. For example, if type_a regions
* look like the following,
*
* 0:[0-16), 1:[32-48), 2:[128-130)
@@ -780,135 +862,178 @@ int __init_memblock memblock_clear_hotplug(phys_addr_t base, phys_addr_t size)
* As both region arrays are sorted, the function advances the two indices
* in lockstep and returns each intersection.
*/
-void __init_memblock __next_free_mem_range(u64 *idx, int nid,
- phys_addr_t *out_start,
- phys_addr_t *out_end, int *out_nid)
+void __init_memblock __next_mem_range(u64 *idx, int nid,
+ struct memblock_type *type_a,
+ struct memblock_type *type_b,
+ phys_addr_t *out_start,
+ phys_addr_t *out_end, int *out_nid)
{
- struct memblock_type *mem = &memblock.memory;
- struct memblock_type *rsv = &memblock.reserved;
- int mi = *idx & 0xffffffff;
- int ri = *idx >> 32;
+ int idx_a = *idx & 0xffffffff;
+ int idx_b = *idx >> 32;

if (WARN_ONCE(nid == MAX_NUMNODES, "Usage of MAX_NUMNODES is deprecated. Use NUMA_NO_NODE instead\n"))
nid = NUMA_NO_NODE;

- for ( ; mi < mem->cnt; mi++) {
- struct memblock_region *m = &mem->regions[mi];
+ for (; idx_a < type_a->cnt; idx_a++) {
+ struct memblock_region *m = &type_a->regions[idx_a];
phys_addr_t m_start = m->base;
phys_addr_t m_end = m->base + m->size;
+ int m_nid = memblock_get_region_node(m);

/* only memory regions are associated with nodes, check it */
if (nid != NUMA_NO_NODE && nid != memblock_get_region_node(m))
continue;

- /* scan areas before each reservation for intersection */
- for ( ; ri < rsv->cnt + 1; ri++) {
- struct memblock_region *r = &rsv->regions[ri];
- phys_addr_t r_start = ri ? r[-1].base + r[-1].size : 0;
- phys_addr_t r_end = ri < rsv->cnt ? r->base : ULLONG_MAX;
-
- /* if ri advanced past mi, break out to advance mi */
- if (r_start >= m_end)
- break;
- /* if the two regions intersect, we're done */
- if (m_start < r_end) {
- if (out_start)
- *out_start = max(m_start, r_start);
- if (out_end)
- *out_end = min(m_end, r_end);
- if (out_nid)
- *out_nid = memblock_get_region_node(m);
+ /* With type_b NULL we only iterate through type_a */
+ if (type_b == NULL) {
+ if (out_start)
+ *out_start = m_start;
+ if (out_end)
+ *out_end = m_end;
+ if (out_nid)
+ *out_nid = m_nid;
+ idx_a++;
+ *idx = (u32)idx_a;
+ return;
+ } else {
+ /* scan areas before each reservation */
+ for (; idx_b < type_b->cnt + 1; idx_b++) {
+ struct memblock_region *r;
+ phys_addr_t r_start;
+ phys_addr_t r_end;
+
+ r = &type_b->regions[idx_b];
+ r_start = idx_b ? r[-1].base + r[-1].size : 0;
+ r_end = idx_b < type_b->cnt ?
+ r->base : ULLONG_MAX;
+
/*
- * The region which ends first is advanced
- * for the next iteration.
+ * if idx_b advanced past idx_a,
+ * break out to advance idx_a
*/
- if (m_end <= r_end)
- mi++;
- else
- ri++;
- *idx = (u32)mi | (u64)ri << 32;
- return;
+ if (r_start >= m_end)
+ break;
+ /* if the two regions intersect, we're done */
+ if (m_start < r_end) {
+ if (out_start)
+ *out_start =
+ max(m_start, r_start);
+ if (out_end)
+ *out_end = min(m_end, r_end);
+ if (out_nid)
+ *out_nid = m_nid;
+
+ /*
+ * The region which ends first is
+ * advanced for the next iteration.
+ */
+ if (m_end <= r_end)
+ idx_a++;
+ else
+ idx_b++;
+ *idx = (u32)idx_a | (u64)idx_b << 32;
+ return;
+ }
}
}
}
-
/* signal end of iteration */
*idx = ULLONG_MAX;
}

/**
- * __next_free_mem_range_rev - next function for for_each_free_mem_range_reverse()
+ * __next_mem_range_rev - generic next function for for_each_*_range_rev()
+ *
+ * Finds the next range from type_a which is not marked as unsuitable
+ * in type_b.
+ *
* @idx: pointer to u64 loop variable
- * @nid: nid: node selector, %NUMA_NO_NODE for all nodes
+ * @nid: node selector, %NUMA_NO_NODE for all nodes
+ * @type_a: pointer to memblock_type from where the range is taken
+ * @type_b: pointer to memblock_type which excludes memory from being taken
* @out_start: ptr to phys_addr_t for start address of the range, can be %NULL
* @out_end: ptr to phys_addr_t for end address of the range, can be %NULL
* @out_nid: ptr to int for nid of the range, can be %NULL
*
- * Reverse of __next_free_mem_range().
- *
- * Linux kernel cannot migrate pages used by itself. Memory hotplug users won't
- * be able to hot-remove hotpluggable memory used by the kernel. So this
- * function skip hotpluggable regions if needed when allocating memory for the
- * kernel.
+ * Reverse of __next_mem_range().
*/
-void __init_memblock __next_free_mem_range_rev(u64 *idx, int nid,
- phys_addr_t *out_start,
- phys_addr_t *out_end, int *out_nid)
+void __init_memblock __next_mem_range_rev(u64 *idx, int nid,
+ struct memblock_type *type_a,
+ struct memblock_type *type_b,
+ phys_addr_t *out_start,
+ phys_addr_t *out_end, int *out_nid)
{
- struct memblock_type *mem = &memblock.memory;
- struct memblock_type *rsv = &memblock.reserved;
- int mi = *idx & 0xffffffff;
- int ri = *idx >> 32;
+ int idx_a = *idx & 0xffffffff;
+ int idx_b = *idx >> 32;

if (WARN_ONCE(nid == MAX_NUMNODES, "Usage of MAX_NUMNODES is deprecated. Use NUMA_NO_NODE instead\n"))
nid = NUMA_NO_NODE;

if (*idx == (u64)ULLONG_MAX) {
- mi = mem->cnt - 1;
- ri = rsv->cnt;
+ idx_a = type_a->cnt - 1;
+ idx_b = type_b->cnt;
}

- for ( ; mi >= 0; mi--) {
- struct memblock_region *m = &mem->regions[mi];
+ for (; idx_a >= 0; idx_a--) {
+ struct memblock_region *m = &type_a->regions[idx_a];
phys_addr_t m_start = m->base;
phys_addr_t m_end = m->base + m->size;

/* only memory regions are associated with nodes, check it */
if (nid != NUMA_NO_NODE && nid != memblock_get_region_node(m))
continue;

- /* skip hotpluggable memory regions if needed */
- if (movable_node_is_enabled() && memblock_is_hotpluggable(m))
- continue;
-
- /* scan areas before each reservation for intersection */
- for ( ; ri >= 0; ri--) {
- struct memblock_region *r = &rsv->regions[ri];
- phys_addr_t r_start = ri ? r[-1].base + r[-1].size : 0;
- phys_addr_t r_end = ri < rsv->cnt ? r->base : ULLONG_MAX;
-
- /* if ri advanced past mi, break out to advance mi */
- if (r_end <= m_start)
- break;
- /* if the two regions intersect, we're done */
- if (m_end > r_start) {
- if (out_start)
- *out_start = max(m_start, r_start);
- if (out_end)
- *out_end = min(m_end, r_end);
- if (out_nid)
- *out_nid = memblock_get_region_node(m);
-
- if (m_start >= r_start)
- mi--;
- else
- ri--;
- *idx = (u32)mi | (u64)ri << 32;
- return;
+ /* With type_b NULL we only iterate through type_a */
+ if (type_b == NULL) {
+ if (out_start)
+ *out_start = m_start;
+ if (out_end)
+ *out_end = m_end;
+ if (out_nid)
+ *out_nid = memblock_get_region_node(m);
+ idx_a--;
+ *idx = (u32)idx_a;
+ return;
+ } else {
+ /* scan areas before each reservation */
+ for (; idx_b >= 0; idx_b--) {
+ struct memblock_region *r;
+ phys_addr_t r_start;
+ phys_addr_t r_end;
+ int m_nid = memblock_get_region_node(m);
+
+ r = &type_b->regions[idx_b];
+ r_start = idx_b ? r[-1].base + r[-1].size : 0;
+ r_end = idx_b < type_b->cnt ?
+ r->base : ULLONG_MAX;
+ /*
+ * if idx_b advanced past idx_a,
+ * break out to advance idx_a
+ */
+ if (r_end <= m_start)
+ break;
+ /* if the two regions intersect, we're done */
+ if (m_end > r_start) {
+ if (out_start)
+ *out_start =
+ max(m_start, r_start);
+ if (out_end)
+ *out_end =
+ min(m_end, r_end);
+ if (out_nid)
+ *out_nid = m_nid;
+
+ if (m_start >= r_start)
+ idx_a--;
+ else
+ idx_b--;
+ *idx = (u32)idx_a | (u64)idx_b << 32;
+ return;
+ }
}
}
}
-
+ /* signal end of iteration */
*idx = ULLONG_MAX;
}

diff --git a/mm/nobootmem.c b/mm/nobootmem.c
index e2906a5..c57d5e3 100644
--- a/mm/nobootmem.c
+++ b/mm/nobootmem.c
@@ -133,6 +133,15 @@ static unsigned long __init free_low_memory_core_early(void)
size = get_allocated_memblock_memory_regions_info(&start);
if (size)
count += __free_memory_core(start, start + size);
+
+#ifdef CONFIG_ARCH_MEMBLOCK_NOMAP
+
+ /* Free memblock.nomap array if it was allocated */
+ size = get_allocated_memblock_nomap_regions_info(&start);
+ if (size)
+ count += __free_memory_core(start, start + size);
+
+#endif
#endif

return count;
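
(Editorial aside: a standalone sketch of the iterator cookie used by
__next_mem_range() above; not part of the patch:)

	/* The single u64 cookie packs two indices: the lower 32 bits index
	 * the type_a regions, the upper 32 bits index the gaps before each
	 * type_b region. */
	u64 idx = 0;                   /* for_each_free_mem_range() starts here */
	int idx_a = idx & 0xffffffff;  /* current region in type_a */
	int idx_b = idx >> 32;         /* current gap in type_b */

	/* after reporting an intersection, the indices are re-packed: */
	idx = (u32)idx_a | (u64)idx_b << 32;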

Philipp Hachtmann

Jan 13, 2014, 7:50:02 AM
These two patches fit (only) on top of linux-next!

The first patch reverts free_all_bootmem() to more generic behaviour: with
CONFIG_ARCH_DISCARD_MEMBLOCK, the memblock.memory and memblock.reserved
region arrays are freed (if they were dynamically allocated, of course).
The debugfs dependency has been removed; I think this is cleaner.

While continuing the s390 migration to memblock it became desirable to have
memblock support unmapped (i.e. completely forgotten and unused) memory
areas. The usual way of just forgetting about them by truncating the
memblocks does not work for us, because we still need the information about
the real, full memory layout at a later time.

Philipp Hachtmann (2):
mm/nobootmem: free_all_bootmem again
mm/memblock: Add support for excluded memory areas

include/linux/memblock.h | 50 +++++++--
mm/Kconfig | 3 +
mm/memblock.c | 276 ++++++++++++++++++++++++++++++++++-------------
mm/nobootmem.c | 13 ++-
4 files changed, 255 insertions(+), 87 deletions(-)

Philipp Hachtmann

Jan 13, 2014, 7:50:02 AM
get_allocated_memblock_reserved_regions_info() should do its job whenever it
is compiled in. Extend the #ifdef around
get_allocated_memblock_memory_regions_info() to cover
get_allocated_memblock_reserved_regions_info() as well, and make the
corresponding changes in nobootmem.c's free_low_memory_core_early(), where
the two functions are called.

Signed-off-by: Philipp Hachtmann <pha...@linux.vnet.ibm.com>
---
mm/memblock.c | 17 ++---------------
mm/nobootmem.c | 4 ++--
2 files changed, 4 insertions(+), 17 deletions(-)

diff --git a/mm/memblock.c b/mm/memblock.c
index 64ed243..9c0aeef 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
diff --git a/mm/nobootmem.c b/mm/nobootmem.c
index 17c8902..e2906a5 100644
--- a/mm/nobootmem.c
+++ b/mm/nobootmem.c
@@ -122,13 +122,13 @@ static unsigned long __init free_low_memory_core_early(void)
for_each_free_mem_range(i, NUMA_NO_NODE, &start, &end, NULL)
count += __free_memory_core(start, end);

+#ifdef CONFIG_ARCH_DISCARD_MEMBLOCK
+
/* Free memblock.reserved array if it was allocated */
size = get_allocated_memblock_reserved_regions_info(&start);
if (size)
count += __free_memory_core(start, start + size);

-#ifdef CONFIG_ARCH_DISCARD_MEMBLOCK
-
/* Free memblock.memory array if it was allocated */
size = get_allocated_memblock_memory_regions_info(&start);
if (size)

Philipp Hachtmann

Jan 13, 2014, 7:50:02 AM
Add a new memory state "nomap" to memblock. This can be used to truncate
the usable memory in the system without forgetting about what is really
installed.

Signed-off-by: Philipp Hachtmann <pha...@linux.vnet.ibm.com>
---
include/linux/memblock.h | 50 +++++++--
mm/Kconfig | 3 +
mm/memblock.c | 259 +++++++++++++++++++++++++++++++++++------------
mm/nobootmem.c | 9 ++
4 files changed, 251 insertions(+), 70 deletions(-)
diff --git a/mm/Kconfig b/mm/Kconfig
index 2d9f150..6907654 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -137,6 +137,9 @@ config HAVE_MEMBLOCK_NODE_MAP
config ARCH_DISCARD_MEMBLOCK
boolean

+config ARCH_MEMBLOCK_NOMAP
+ boolean
+
config NO_BOOTMEM
boolean

diff --git a/mm/memblock.c b/mm/memblock.c
index 9c0aeef..36070fa 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
* @nid: node selector, %NUMA_NO_NODE for all nodes
+ * @type_a: pointer to memblock_type from where the range is taken
+ * @type_b: pointer to memblock_type which excludes memory from being taken
* @out_start: ptr to phys_addr_t for start address of the range, can be %NULL
* @out_end: ptr to phys_addr_t for end address of the range, can be %NULL
* @out_nid: ptr to int for nid of the range, can be %NULL
*
- * Find the first free area from *@idx which matches @nid, fill the out
+ * Find the first present area from *@idx which matches @nid, fill the out
* parameters, and update *@idx for the next iteration. The lower 32bit of
- * *@idx contains index into memory region and the upper 32bit indexes the
- * areas before each reserved region. For example, if reserved regions
+ * *@idx contains index into type_a region and the upper 32bit indexes the
+ * areas before each type_b region. For example, if type_a regions
* look like the following,
*
* 0:[0-16), 1:[32-48), 2:[128-130)
@@ -780,96 +862,120 @@ int __init_memblock memblock_clear_hotplug(phys_addr_t base, phys_addr_t size)
if (nid != NUMA_NO_NODE && nid != memblock_get_region_node(m))
continue;

- /* scan areas before each reservation for intersection */
- for ( ; ri < rsv->cnt + 1; ri++) {
- struct memblock_region *r = &rsv->regions[ri];
- phys_addr_t r_start = ri ? r[-1].base + r[-1].size : 0;
- phys_addr_t r_end = ri < rsv->cnt ? r->base : ULLONG_MAX;
+ /* With type_b NULL we only iterate through type_a */
+ if (type_b == NULL) {
+ if (out_start)
+ *out_start = m_start;
+ if (out_end)
+ *out_end = m_end;
+ if (out_nid)
+ *out_nid = m_nid;
+ idx_a++;
+ *idx = (u32)idx_a;
+ return;
+ }
+
+ /* scan areas before each reservation */
+ for (; idx_b < type_b->cnt + 1; idx_b++) {
+ struct memblock_region *r;
+ phys_addr_t r_start;
+ phys_addr_t r_end;

- /* if ri advanced past mi, break out to advance mi */
+ r = &type_b->regions[idx_b];
+ r_start = idx_b ? r[-1].base + r[-1].size : 0;
+ r_end = idx_b < type_b->cnt ?
+ r->base : ULLONG_MAX;
+
+ /*
+ * if idx_b advanced past idx_a,
+ * break out to advance idx_a
+ */
if (r_start >= m_end)
break;
/* if the two regions intersect, we're done */
if (m_start < r_end) {
if (out_start)
- *out_start = max(m_start, r_start);
+ *out_start =
+ max(m_start, r_start);
if (out_end)
*out_end = min(m_end, r_end);
if (out_nid)
- *out_nid = memblock_get_region_node(m);
+ *out_nid = m_nid;
+
/*
- * The region which ends first is advanced
- * for the next iteration.
+ * The region which ends first is
+ * advanced for the next iteration.
*/
if (m_end <= r_end)
- mi++;
+ idx_a++;
else
- ri++;
- *idx = (u32)mi | (u64)ri << 32;
+ idx_b++;
+ *idx = (u32)idx_a | (u64)idx_b << 32;
return;
}
}
}
-
/* signal end of iteration */
*idx = ULLONG_MAX;
}

/**
- * __next_free_mem_range_rev - next function for for_each_free_mem_range_reverse()
+ * __next_mem_range_rev - generic next function for for_each_*_range_rev()
+ *
+ * Finds the next range from type_a which is not marked as unsuitable
+ * in type_b.
+ *
* @idx: pointer to u64 loop variable
* @nid: node selector, %NUMA_NO_NODE for all nodes
@@ -877,17 +983,34 @@ void __init_memblock __next_free_mem_range_rev(u64 *idx, int nid,
if (nid != NUMA_NO_NODE && nid != memblock_get_region_node(m))
continue;

- /* skip hotpluggable memory regions if needed */
- if (movable_node_is_enabled() && memblock_is_hotpluggable(m))
- continue;
-
- /* scan areas before each reservation for intersection */
- for ( ; ri >= 0; ri--) {
- struct memblock_region *r = &rsv->regions[ri];
- phys_addr_t r_start = ri ? r[-1].base + r[-1].size : 0;
- phys_addr_t r_end = ri < rsv->cnt ? r->base : ULLONG_MAX;
+ /* With type_b NULL we only iterate through type_a */
+ if (type_b == NULL) {
+ if (out_start)
+ *out_start = m_start;
+ if (out_end)
+ *out_end = m_end;
+ if (out_nid)
+ *out_nid = memblock_get_region_node(m);
+ idx_a--;
+ *idx = (u32)idx_a;
+ return;
+ }

- /* if ri advanced past mi, break out to advance mi */
+ /* scan areas before each reservation */
+ for (; idx_b >= 0; idx_b--) {
+ struct memblock_region *r;
+ phys_addr_t r_start;
+ phys_addr_t r_end;
+ int m_nid = memblock_get_region_node(m);
+
+ r = &type_b->regions[idx_b];
+ r_start = idx_b ? r[-1].base + r[-1].size : 0;
+ r_end = idx_b < type_b->cnt ?
+ r->base : ULLONG_MAX;
+ /*
+ * if idx_b advanced past idx_a,
+ * break out to advance idx_a
+ */
if (r_end <= m_start)
break;
/* if the two regions intersect, we're done */
@@ -897,18 +1020,17 @@ void __init_memblock __next_free_mem_range_rev(u64 *idx, int nid,
if (out_end)
*out_end = min(m_end, r_end);
if (out_nid)
- *out_nid = memblock_get_region_node(m);
-
+ *out_nid = m_nid;
if (m_start >= r_start)
- mi--;
+ idx_a--;
else
- ri--;
- *idx = (u32)mi | (u64)ri << 32;
+ idx_b--;
+ *idx = (u32)idx_a | (u64)idx_b << 32;
return;
}
}
}
-
+ /* signal end of iteration */
*idx = ULLONG_MAX;
}

@@ -1294,6 +1416,9 @@ void __init memblock_enforce_memory_limit(phys_addr_t limit)
/* truncate both memory and reserved regions */
__memblock_remove(&memblock.memory, max_addr, (phys_addr_t)ULLONG_MAX);
__memblock_remove(&memblock.reserved, max_addr, (phys_addr_t)ULLONG_MAX);
+#ifdef CONFIG_ARCH_MEMBLOCK_NOMAP
+ __memblock_remove(&memblock.nomap, max_addr, (phys_addr_t)ULLONG_MAX);
+#endif
}

static int __init_memblock memblock_search(struct memblock_type *type, phys_addr_t addr)
@@ -1438,12 +1563,22 @@ static void __init_memblock memblock_dump(struct memblock_type *type, char *name
void __init_memblock __memblock_dump_all(void)
{
pr_info("MEMBLOCK configuration:\n");
+#ifndef CONFIG_ARCH_MEMBLOCK_NOMAP
pr_info(" memory size = %#llx reserved size = %#llx\n",
(unsigned long long)memblock.memory.total_size,
(unsigned long long)memblock.reserved.total_size);
+#else
+ pr_info(" memory size = %#llx reserved size = %#llx nomap size = %#llx\n",
+ (unsigned long long)memblock.memory.total_size,
+ (unsigned long long)memblock.reserved.total_size,
+ (unsigned long long)memblock.nomap.total_size);
+#endif

memblock_dump(&memblock.memory, "memory");
memblock_dump(&memblock.reserved, "reserved");
+#ifdef CONFIG_ARCH_MEMBLOCK_NOMAP
+ memblock_dump(&memblock.nomap, "nomap");
+#endif
}

void __init memblock_allow_resize(void)
diff --git a/mm/nobootmem.c b/mm/nobootmem.c
index e2906a5..c57d5e3 100644
--- a/mm/nobootmem.c
+++ b/mm/nobootmem.c
@@ -133,6 +133,15 @@ static unsigned long __init free_low_memory_core_early(void)
size = get_allocated_memblock_memory_regions_info(&start);
if (size)
count += __free_memory_core(start, start + size);
+
+#ifdef CONFIG_ARCH_MEMBLOCK_NOMAP
+
+ /* Free memblock.nomap array if it was allocated */
+ size = get_allocated_memblock_nomap_regions_info(&start);
+ if (size)
+ count += __free_memory_core(start, start + size);
+
+#endif
#endif

return count;

Philipp Hachtmann

Jan 13, 2014, 8:10:02 AM
These two patches fit (only) on top of linux-next!

The first patch reverts free_all_bootmem() to more generic behaviour: with
CONFIG_ARCH_DISCARD_MEMBLOCK, the memblock.memory and memblock.reserved
region arrays are freed (if they were dynamically allocated, of course).
The debugfs dependency has been removed; I think this is cleaner.

While continuing the s390 migration to memblock it became desirable to have
memblock support unmapped (i.e. completely forgotten and unused) memory
areas. The usual way of just forgetting about them by truncating the
memblocks does not work for us, because we still need the information about
the real, full memory layout at a later time.

(sorry for the two broken versions before)

Philipp Hachtmann (2):
mm/nobootmem: free_all_bootmem again
mm/memblock: Add support for excluded memory areas

arch/s390/Kconfig | 1 +
include/linux/memblock.h | 50 +++++++--
mm/Kconfig | 3 +
mm/memblock.c | 278 ++++++++++++++++++++++++++++++++++-------------
mm/nobootmem.c | 13 ++-
5 files changed, 258 insertions(+), 87 deletions(-)

Philipp Hachtmann

Jan 13, 2014, 8:10:02 AM
Add a new memory state "nomap" to memblock. This can be used to truncate
the usable memory in the system without forgetting about what is really
installed.

Signed-off-by: Philipp Hachtmann <pha...@linux.vnet.ibm.com>
---
arch/s390/Kconfig | 1 +
include/linux/memblock.h | 50 +++++++--
mm/Kconfig | 3 +
mm/memblock.c | 261 ++++++++++++++++++++++++++++++++++++-----------
mm/nobootmem.c | 9 ++
5 files changed, 254 insertions(+), 70 deletions(-)

diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
index 4f858f7..9346e2c 100644
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -61,6 +61,7 @@ config PCI_QUIRKS
config S390
def_bool y
select ARCH_DISCARD_MEMBLOCK
+ select ARCH_MEMBLOCK_NOMAP
select ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE
select ARCH_HAS_DEBUG_STRICT_USER_COPY_CHECKS
select ARCH_HAVE_NMI_SAFE_CMPXCHG
index 9c0aeef..855e642 100644
@@ -757,18 +781,78 @@ int __init_memblock memblock_clear_hotplug(phys_addr_t base, phys_addr_t size)
+ size, MAX_NUMNODES, 0);
+ if (ret)
+ return ret;
+
+ return memblock_add_region(&memblock.nomap, base,
+ size, MAX_NUMNODES, 0);
@@ -780,96 +864,120 @@ int __init_memblock memblock_clear_hotplug(phys_addr_t base, phys_addr_t size)
@@ -877,17 +985,34 @@ void __init_memblock __next_free_mem_range_rev(u64 *idx, int nid,
@@ -897,18 +1022,17 @@ void __init_memblock __next_free_mem_range_rev(u64 *idx, int nid,
if (out_end)
*out_end = min(m_end, r_end);
if (out_nid)
- *out_nid = memblock_get_region_node(m);
-
+ *out_nid = m_nid;
if (m_start >= r_start)
- mi--;
+ idx_a--;
else
- ri--;
- *idx = (u32)mi | (u64)ri << 32;
+ idx_b--;
+ *idx = (u32)idx_a | (u64)idx_b << 32;
return;
}
}
}
-
+ /* signal end of iteration */
*idx = ULLONG_MAX;
}

@@ -1294,6 +1418,9 @@ void __init memblock_enforce_memory_limit(phys_addr_t limit)
/* truncate both memory and reserved regions */
__memblock_remove(&memblock.memory, max_addr, (phys_addr_t)ULLONG_MAX);
__memblock_remove(&memblock.reserved, max_addr, (phys_addr_t)ULLONG_MAX);
+#ifdef CONFIG_ARCH_MEMBLOCK_NOMAP
+ __memblock_remove(&memblock.nomap, max_addr, (phys_addr_t)ULLONG_MAX);
+#endif
}

static int __init_memblock memblock_search(struct memblock_type *type, phys_addr_t addr)
@@ -1438,12 +1565,22 @@ static void __init_memblock memblock_dump(struct memblock_type *type, char *name
void __init_memblock __memblock_dump_all(void)
{
pr_info("MEMBLOCK configuration:\n");
+#ifndef CONFIG_ARCH_MEMBLOCK_NOMAP
pr_info(" memory size = %#llx reserved size = %#llx\n",
(unsigned long long)memblock.memory.total_size,
(unsigned long long)memblock.reserved.total_size);
+#else
+ pr_info(" memory size = %#llx reserved size = %#llx nomap size = %#llx\n",
+ (unsigned long long)memblock.memory.total_size,
+ (unsigned long long)memblock.reserved.total_size,

Philipp Hachtmann

Jan 13, 2014, 8:10:02 AM
get_allocated_memblock_reserved_regions_info() should do its job whenever it
is compiled in. Extend the #ifdef around
get_allocated_memblock_memory_regions_info() to cover
get_allocated_memblock_reserved_regions_info() as well, and make the
corresponding changes in nobootmem.c's free_low_memory_core_early(), where
the two functions are called.

Signed-off-by: Philipp Hachtmann <pha...@linux.vnet.ibm.com>
---
mm/memblock.c | 17 ++---------------
mm/nobootmem.c | 4 ++--
2 files changed, 4 insertions(+), 17 deletions(-)

diff --git a/mm/memblock.c b/mm/memblock.c
index 64ed243..9c0aeef 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
diff --git a/mm/nobootmem.c b/mm/nobootmem.c
index 17c8902..e2906a5 100644
--- a/mm/nobootmem.c
+++ b/mm/nobootmem.c
@@ -122,13 +122,13 @@ static unsigned long __init free_low_memory_core_early(void)
for_each_free_mem_range(i, NUMA_NO_NODE, &start, &end, NULL)
count += __free_memory_core(start, end);

+#ifdef CONFIG_ARCH_DISCARD_MEMBLOCK
+
/* Free memblock.reserved array if it was allocated */
size = get_allocated_memblock_reserved_regions_info(&start);
if (size)
count += __free_memory_core(start, start + size);

-#ifdef CONFIG_ARCH_DISCARD_MEMBLOCK
-
/* Free memblock.memory array if it was allocated */
size = get_allocated_memblock_memory_regions_info(&start);
if (size)

Andrew Morton

Jan 13, 2014, 7:40:02 PM
On Mon, 13 Jan 2014 14:03:37 +0100 Philipp Hachtmann <pha...@linux.vnet.ibm.com> wrote:

> Add a new memory state "nomap" to memblock. This can be used to truncate
> the usable memory in the system without forgetting about what is really
> installed.
>
> ...
>
> 5 files changed, 254 insertions(+), 70 deletions(-)

Patch is big. I'll toss this in for some testing but it does look too
large and late for 3.14. How will this affect your s390 development?

Hopefully some people who are familiar with memblock will have time to
review this carefully, please.

Philipp Hachtmann

Jan 14, 2014, 4:50:02 AM
On Mon, 13 Jan 2014 16:36:20 -0800, Andrew Morton
<ak...@linux-foundation.org> wrote:

> Patch is big. I'll toss this in for some testing but it does look too
> large and late for 3.14. How will this affect your s390 development?

It is needed for the s390 bootmem -> memblock transition. The s390 dump
mechanisms cannot be switched to memblock (from using something s390
specific called memory_chunk) without the nomap list.
I'm also working on another enhancement on s390 that will rely on a
clean transition to memblock.

I wrote and tested this on top of our local development tree, and then
realised that it does not apply to linux-next. So I converted it to fit
linux-next and posted it; I have to maintain two versions now.

Grygorii Strashko

Jan 14, 2014, 7:30:02 AM
Hi Philipp,

On 01/13/2014 03:03 PM, Philipp Hachtmann wrote:
> Add a new memory state "nomap" to memblock. This can be used to truncate
> the usable memory in the system without forgetting about what is really
> installed.


Sorry, but this solution looks a bit complex (and probably wrong from a
design point of view) if you just need to fix the
memblock_start_of_DRAM()/memblock_end_of_DRAM() APIs.

Moreover, other arches use at least the APIs below:
- memblock_is_region_memory() !!!
- for_each_memblock(memory, reg) !!!
- __next_mem_pfn_range() !!!
- memblock_phys_mem_size()
- memblock_mem_size()
- memblock_start_of_DRAM()
- memblock_end_of_DRAM()
with the assumption that the "memory" regions array has been updated
when a memory block is stolen (no-mapped); as a result, this change may
have unpredictable side effects :( if these new APIs
are re-used (on ARM, for example).

You can take a look at how ARM uses arm_memblock_steal() -
the stolen memory is not accounted for any more.

It seems it would be safer to track separately the memory available
to Linux ("memory" regions) and the real physical memory. For example:
- add a memblock type "phys_memory" and update it each time
memblock_add()/memblock_remove() is called,
but don't update it when memblock_nomap()/memblock_remap() are called?

Another question: should the real physical memory configuration data be
part of memblock at all?

Also, I would prefer the names memblock_steal()/memblock_reclaim() for the new APIs :)

regards,
-grygorii
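
(Editorial aside: a minimal sketch of the phys_memory idea suggested above;
the field and exact calls are hypothetical, not code from the thread:)

	/* Keep a separate list of everything that is installed; memblock.memory
	 * then only describes what Linux may use. */
	int memblock_add(phys_addr_t base, phys_addr_t size)
	{
		/* the real, installed memory is recorded once and for all */
		memblock_add_region(&memblock.phys_memory, base, size, MAX_NUMNODES);
		/* ...as well as the memory Linux is allowed to use */
		return memblock_add_region(&memblock.memory, base, size, MAX_NUMNODES);
	}

	int memblock_nomap(phys_addr_t base, phys_addr_t size)
	{
		/* steal from the usable list; phys_memory keeps the full layout */
		return __memblock_remove(&memblock.memory, base, size);
	}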

Santosh Shilimkar

Jan 14, 2014, 9:40:02 AM
On Tuesday 14 January 2014 08:17 AM, Grygorii Strashko wrote:
> Hi Philipp,
>
> On 01/13/2014 03:03 PM, Philipp Hachtmann wrote:
>> Add a new memory state "nomap" to memblock. This can be used to truncate
>> the usable memory in the system without forgetting about what is really
>> installed.
>
>
> Sorry, but this solution looks a bit complex (and probably wrong from a
> design point of view) if you just need to fix the
> memblock_start_of_DRAM()/memblock_end_of_DRAM() APIs.
>
> Moreover, other arches use at least the APIs below:
> - memblock_is_region_memory() !!!
> - for_each_memblock(memory, reg) !!!
> - __next_mem_pfn_range() !!!
> - memblock_phys_mem_size()
> - memblock_mem_size()
> - memblock_start_of_DRAM()
> - memblock_end_of_DRAM()
> with the assumption that the "memory" regions array has been updated
> when a memory block is stolen (no-mapped); as a result, this change may
> have unpredictable side effects :( if these new APIs
> are re-used (on ARM, for example).
>
> You can take a look at how ARM uses arm_memblock_steal() -
> the stolen memory is not accounted for any more.
>
I was also wondering whether, instead of a nomap state,
memblock_add()/memblock_remove() would do the same trick. The
arm_memblock_steal() wrapper achieves similar functionality by reserving
the DRAM without mapping it into Linux. Why not just use the same idea?

Regards,
Santosh
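
(Editorial aside: the ARM pattern mentioned above, sketched from kernels of
that era; the framebuffer example is made up:)

	/* arm_memblock_steal(): carve memory out of memblock early, before
	 * paging_init(), so it is never mapped or accounted for. */
	phys_addr_t arm_memblock_steal(phys_addr_t size, phys_addr_t align);

	static phys_addr_t fb_base;    /* hypothetical user */

	static void __init reserve_framebuffer(void)
	{
		/* the returned region is removed from memblock.memory entirely */
		fb_base = arm_memblock_steal(SZ_8M, SZ_1M);
	}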

Philipp Hachtmann

Jan 14, 2014, 2:00:01 PM
Hello Grygorii,

thank you for your comments.

To clarify, we have the following requirements for memblock:

(1) Reserved areas can be declared before memory is added.
(2) The physical memory is detected once only.
(3) The free (i.e. not reserved) memory can be iterated to add
it to the buddy allocator.
(4) Memory designated to be mapped into the kernel address space can be
iterated.
(5) Kdump on s390 requires knowledge about the full system memory
layout.

The s390 kdump implementation works a bit differently from the
implementations on other architectures: the layout is not taken from the
production system and saved for the kdump kernel. Instead, the kdump
kernel needs to gather information about the whole memory itself,
regardless of locked-out areas (like mem= and OLDMEM etc.).

Without kdump's requirement it would of course be easiest just to remove
the memory from memblock.memory. But then this information would be lost
for later use by kdump.
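
(Editorial aside: a rough sketch, not from the thread, of how the five
requirements above map onto the proposed calls; addresses and helpers are
made up:)

	/* (1) reservations may be declared before the memory is added */
	memblock_reserve(oldmem_base, oldmem_size);

	/* (2) physical memory is detected exactly once */
	memblock_add(0, detected_size);

	/* (5) exclude instead of remove, so the full layout survives for kdump */
	memblock_nomap(oldmem_base, oldmem_size);

	/* (3) free memory goes to the buddy allocator,
	 * (4) mapped memory goes into the kernel address space */
	for_each_free_mem_range(i, NUMA_NO_NODE, &start, &end, NULL)
		free_to_buddy(start, end);       /* made-up helper */
	for_each_mapped_mem_range(i, NUMA_NO_NODE, &start, &end, NULL)
		map_into_kernel(start, end);     /* made-up helper */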

The patch does not change any behaviour of the current API, whether the
new option is enabled or not.

The current patch seems overly complicated. The following patch contains
only the nomap functionality, without the cleanup and refactoring. I will
post a v4 patch set which will contain this patch.

Kind regards

Philipp

From eb1ad42c19c6f7bfdf3258a83dc54461c5f02d55 Mon Sep 17 00:00:00 2001
From: Philipp Hachtmann <pha...@linux.vnet.ibm.com>
Date: Tue, 14 Jan 2014 19:49:39 +0100
Subject: [PATCH] mm/memblock: Add support for excluded memory areas

Add a new memory state "nomap" to memblock. This can be used to truncate
the usable memory in the system without forgetting about what is really
installed.

Signed-off-by: Philipp Hachtmann <pha...@linux.vnet.ibm.com>
---
include/linux/memblock.h | 25 +++++++
mm/Kconfig | 3 +
mm/memblock.c | 175 ++++++++++++++++++++++++++++++++++++++++++++++-
mm/nobootmem.c | 9 +++
4 files changed, 211 insertions(+), 1 deletion(-)

diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index 1ef6636..be1c819 100644
@@ -133,6 +141,23 @@ void __next_free_mem_range(u64 *idx, int nid, phys_addr_t *out_start,
i != (u64)ULLONG_MAX; \
__next_free_mem_range(&i, nid, p_start, p_end, p_nid))

+
+#ifdef CONFIG_ARCH_MEMBLOCK_NOMAP
+#define for_each_mapped_mem_range(i, nid, p_start, p_end, p_nid) \
+ for (i = 0, \
+ __next_mapped_mem_range(&i, nid, &memblock.memory, \
+ &memblock.nomap, p_start, \
+ p_end, p_nid); \
+ i != (u64)ULLONG_MAX; \
+ __next_mapped_mem_range(&i, nid, &memblock.memory, \
+ &memblock.nomap, \
+ p_start, p_end, p_nid))
+
+void __next_mapped_mem_range(u64 *idx, int nid, phys_addr_t *out_start,
+ phys_addr_t *out_end, int *out_nid);
+
+#endif
+
void __next_free_mem_range_rev(u64 *idx, int nid, phys_addr_t *out_start,
phys_addr_t *out_end, int *out_nid);

diff --git a/mm/Kconfig b/mm/Kconfig
index 2d9f150..6907654 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -137,6 +137,9 @@ config HAVE_MEMBLOCK_NODE_MAP
config ARCH_DISCARD_MEMBLOCK
boolean

+config ARCH_MEMBLOCK_NOMAP
+ boolean
+
config NO_BOOTMEM
boolean

diff --git a/mm/memblock.c b/mm/memblock.c
index 9c0aeef..b36f5d3 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -28,6 +28,11 @@
static struct memblock_region memblock_memory_init_regions[INIT_MEMBLOCK_REGIONS] __initdata_memblock;
static struct memblock_region memblock_reserved_init_regions[INIT_MEMBLOCK_REGIONS] __initdata_memblock;

+#ifdef CONFIG_ARCH_MEMBLOCK_NOMAP
+static struct memblock_region
+memblock_nomap_init_regions[INIT_MEMBLOCK_NOMAP_REGIONS] __initdata_memblock;
+#endif
+
struct memblock memblock __initdata_memblock = {
.memory.regions = memblock_memory_init_regions,
.memory.cnt = 1, /* empty dummy entry */
@@ -37,6 +42,11 @@ struct memblock memblock __initdata_memblock = {
.reserved.cnt = 1, /* empty dummy entry */
.reserved.max = INIT_MEMBLOCK_REGIONS,

+#ifdef CONFIG_ARCH_MEMBLOCK_NOMAP
+ .nomap.regions = memblock_nomap_init_regions,
+ .nomap.cnt = 1, /* empty dummy entry */
+ .nomap.max = INIT_MEMBLOCK_NOMAP_REGIONS,
+#endif
.bottom_up = false,
.current_limit = MEMBLOCK_ALLOC_ANYWHERE,
};
@@ -292,6 +302,20 @@ phys_addr_t __init_memblock get_allocated_memblock_memory_regions_info(
memblock.memory.max);
}

+#ifdef CONFIG_ARCH_MEMBLOCK_NOMAP
+phys_addr_t __init_memblock get_allocated_memblock_nomap_regions_info(
+ phys_addr_t *addr)
+{
+ if (memblock.nomap.regions == memblock_nomap_init_regions)
+ return 0;
+
+ *addr = __pa(memblock.nomap.regions);
+
+ return PAGE_ALIGN(sizeof(struct memblock_region) *
+ memblock.nomap.max);
+}
+
+#endif /* CONFIG_ARCH_MEMBLOCK_NOMAP */
#endif

/**
@@ -757,6 +781,60 @@ int __init_memblock memblock_clear_hotplug(phys_addr_t base, phys_addr_t size)
* __next_free_mem_range - next function for for_each_free_mem_range()
* @idx: pointer to u64 loop variable
@@ -836,6 +914,88 @@ void __init_memblock __next_free_mem_range(u64 *idx, int nid,
*idx = ULLONG_MAX;
}

+#ifdef ARCH_MEMBLOCK_NOMAP
+/**
+ * __next_mapped_mem_range - next function for for_each_free_mem_range()
+ * @idx: pointer to u64 loop variable
+ * @nid: node selector, %NUMA_NO_NODE for all nodes
+ * @out_start: ptr to phys_addr_t for start address of the range, can be %NULL
+ * @out_end: ptr to phys_addr_t for end address of the range, can be %NULL
+ * @out_nid: ptr to int for nid of the range, can be %NULL
+ *
+ * Find the first free area from *@idx which matches @nid, fill the out
+ * parameters, and update *@idx for the next iteration. The lower 32bit of
+ * *@idx contains index into memory region and the upper 32bit indexes the
+ * areas before each reserved region. For example, if reserved regions
+ * look like the following,
+ *
+ * 0:[0-16), 1:[32-48), 2:[128-130)
+ *
+ * The upper 32bit indexes the following regions.
+ *
+ * 0:[0-0), 1:[16-32), 2:[48-128), 3:[130-MAX)
+ *
+ * As both region arrays are sorted, the function advances the two indices
+ * in lockstep and returns each intersection.
+ */
+void __init_memblock __next_mapped_mem_range(u64 *idx, int nid,
+ phys_addr_t *out_start,
+ phys_addr_t *out_end, int *out_nid)
+{
+ struct memblock_type *mem = &memblock.memory;
+ struct memblock_type *rsv = &memblock.nomap;
+ int mi = *idx & 0xffffffff;
+ int ri = *idx >> 32;
+
+ if (WARN_ONCE(nid == MAX_NUMNODES, "Usage of MAX_NUMNODES is deprecated. Use NUMA_NO_NODE instead\n"))
+ nid = NUMA_NO_NODE;
+
+ for (; mi < mem->cnt; mi++) {
+ struct memblock_region *m = &mem->regions[mi];
+ phys_addr_t m_start = m->base;
+ phys_addr_t m_end = m->base + m->size;
+
+ /* only memory regions are associated with nodes, check it */
+ if (nid != NUMA_NO_NODE && nid != memblock_get_region_node(m))
+ continue;
+
+ /* scan areas before each reservation for intersection */
+ for (; ri < rsv->cnt + 1; ri++) {
+ struct memblock_region *r = &rsv->regions[ri];
+ phys_addr_t r_start = ri ? r[-1].base + r[-1].size : 0;
+ phys_addr_t r_end = ri < rsv->cnt ?
+ r->base : ULLONG_MAX;
+
+ /* if ri advanced past mi, break out to advance mi */
+ if (r_start >= m_end)
+ break;
+ /* if the two regions intersect, we're done */
+ if (m_start < r_end) {
+ if (out_start)
+ *out_start = max(m_start, r_start);
+ if (out_end)
+ *out_end = min(m_end, r_end);
+ if (out_nid)
+ *out_nid = memblock_get_region_node(m);
+ /*
+ * The region which ends first is advanced
+ * for the next iteration.
+ */
+ if (m_end <= r_end)
+ mi++;
+ else
+ ri++;
+ *idx = (u32)mi | (u64)ri << 32;
+ return;
+ }
+ }
+ }
+
+ /* signal end of iteration */
+ *idx = ULLONG_MAX;
+}
+#endif
+
/**
* __next_free_mem_range_rev - next function for for_each_free_mem_range_reverse()
* @idx: pointer to u64 loop variable
@@ -1438,12 +1598,21 @@ static void __init_memblock memblock_dump(struct memblock_type *type, char *name
void __init_memblock __memblock_dump_all(void)
{
pr_info("MEMBLOCK configuration:\n");
+#ifndef CONFIG_ARCH_MEMBLOCK_NOMAP
pr_info(" memory size = %#llx reserved size = %#llx\n",
(unsigned long long)memblock.memory.total_size,
(unsigned long long)memblock.reserved.total_size);
-
+#else
+ pr_info(" memory size = %#llx reserved size = %#llx nomap size = %#llx\n",
+ (unsigned long long)memblock.memory.total_size,
+ (unsigned long long)memblock.reserved.total_size,
+ (unsigned long long)memblock.nomap.total_size);
+#endif
memblock_dump(&memblock.memory, "memory");
memblock_dump(&memblock.reserved, "reserved");
+#ifdef CONFIG_ARCH_MEMBLOCK_NOMAP
+ memblock_dump(&memblock.nomap, "nomap");
+#endif
}

void __init memblock_allow_resize(void)
@@ -1502,6 +1671,10 @@ static int __init memblock_init_debugfs(void)
return -ENXIO;
debugfs_create_file("memory", S_IRUGO, root, &memblock.memory, &memblock_debug_fops);
debugfs_create_file("reserved", S_IRUGO, root, &memblock.reserved, &memblock_debug_fops);
+#ifdef CONFIG_ARCH_MEMBLOCK_NOMAP
+ debugfs_create_file("nomap", S_IRUGO, root,
+ &memblock.nomap, &memblock_debug_fops);
+#endif

return 0;
}
diff --git a/mm/nobootmem.c b/mm/nobootmem.c
index e2906a5..c57d5e3 100644
--- a/mm/nobootmem.c
+++ b/mm/nobootmem.c
@@ -133,6 +133,15 @@ static unsigned long __init free_low_memory_core_early(void)
size = get_allocated_memblock_memory_regions_info(&start);
if (size)
count += __free_memory_core(start, start + size);
+
+#ifdef CONFIG_ARCH_MEMBLOCK_NOMAP
+
+ /* Free memblock.nomap array if it was allocated */
+ size = get_allocated_memblock_nomap_regions_info(&start);
+ if (size)
+ count += __free_memory_core(start, start + size);
+
+#endif
#endif

return count;
--
1.8.4.5

Yinghai Lu

Jan 14, 2014, 2:30:01 PM
On Mon, Jan 13, 2014 at 3:37 AM, Philipp Hachtmann
<pha...@linux.vnet.ibm.com> wrote:
> get_allocated_memblock_reserved_regions_info() should do its job whenever
> it is compiled in. Extend the #ifdef around
> get_allocated_memblock_memory_regions_info() to cover
> get_allocated_memblock_reserved_regions_info() as well, and make the
> corresponding changes in nobootmem.c's free_low_memory_core_early(),
> where the two functions are called.
>
> Signed-off-by: Philipp Hachtmann <pha...@linux.vnet.ibm.com>

Acked-by: Yinghai Lu <yin...@kernel.org>

Andrew Morton

Jan 14, 2014, 5:10:02 PM
On Tue, 14 Jan 2014 10:42:53 +0100 Philipp Hachtmann <pha...@linux.vnet.ibm.com> wrote:

> On Mon, 13 Jan 2014 16:36:20 -0800, Andrew Morton
> <ak...@linux-foundation.org> wrote:
>
> > Patch is big. I'll toss this in for some testing but it does look too
> > large and late for 3.14. How will this affect your s390 development?
>
> It is needed for the s390 bootmem -> memblock transition. The s390 dump
> mechanisms cannot be switched to memblock (from using something s390
> specific called memory_chunk) without the nomap list.
> I'm also working on another enhancement on s390 that will rely on a
> clean transition to memblock.
>
> I wrote and tested this on top of our local development tree, and then
> realised that it does not apply to linux-next. So I converted it to fit
> linux-next and posted it; I have to maintain two versions now.

So at 3.14-rc1 everything will come good - get the review issues sorted
out, add the patch to your tree (and hence linux-next).

Philipp Hachtmann

Jan 16, 2014, 8:30:02 AM
diff --git a/mm/memblock.c b/mm/memblock.c
index 9c0aeef..b36f5d3 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
diff --git a/mm/nobootmem.c b/mm/nobootmem.c
index 12cbb04..3608940 100644
--- a/mm/nobootmem.c
+++ b/mm/nobootmem.c
@@ -137,6 +137,15 @@ static unsigned long __init free_low_memory_core_early(void)
size = get_allocated_memblock_memory_regions_info(&start);
if (size)
count += __free_memory_core(start, start + size);
+
+#ifdef CONFIG_ARCH_MEMBLOCK_NOMAP
+
+ /* Free memblock.nomap array if it was allocated */
+ size = get_allocated_memblock_nomap_regions_info(&start);
+ if (size)
+ count += __free_memory_core(start, start + size);
+
+#endif
#endif

return count;
--
1.8.4.5

Philipp Hachtmann

Jan 16, 2014, 8:30:02 AM
Here is a new version of the memblock.nomap patch.

This time without the first patch (which has already been taken by akpm).

The second patch is now split into a functional part ("Add support...")
and a cleanup/refactoring part. This has been done for clarity, as
announced before.

Philipp Hachtmann (2):
mm/memblock: Add support for excluded memory areas
mm/memblock: Cleanup and refactoring after addition of nomap

include/linux/memblock.h | 57 +++++++++---
mm/Kconfig | 3 +
mm/memblock.c | 233 +++++++++++++++++++++++++++++++++++------------
mm/nobootmem.c | 9 ++
4 files changed, 231 insertions(+), 71 deletions(-)

Philipp Hachtmann

Jan 16, 2014, 8:30:02 AM
Signed-off-by: Philipp Hachtmann <pha...@linux.vnet.ibm.com>
---
include/linux/memblock.h | 50 ++++++-----
mm/memblock.c | 214 +++++++++++++++++------------------------------
2 files changed, 107 insertions(+), 157 deletions(-)

diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index be1c819..ec2da3b 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -121,8 +121,9 @@ void __next_mem_pfn_range(int *idx, int nid, unsigned long *out_start_pfn,
i >= 0; __next_mem_pfn_range(&i, nid, p_start, p_end, p_nid))
#endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */

-void __next_free_mem_range(u64 *idx, int nid, phys_addr_t *out_start,
- phys_addr_t *out_end, int *out_nid);
+void __next_mem_range(u64 *idx, int nid, struct memblock_type *type_a,
+ struct memblock_type *type_b, phys_addr_t *out_start,
+ phys_addr_t *out_end, int *out_nid);

/**
* for_each_free_mem_range - iterate through free memblock areas
@@ -137,29 +138,31 @@ void __next_free_mem_range(u64 *idx, int nid, phys_addr_t *out_start,
*/
#define for_each_free_mem_range(i, nid, p_start, p_end, p_nid) \
for (i = 0, \
- __next_free_mem_range(&i, nid, p_start, p_end, p_nid); \
+ __next_mem_range(&i, nid, &memblock.memory, \
+ &memblock.reserved, p_start, \
+ p_end, p_nid); \
i != (u64)ULLONG_MAX; \
- __next_free_mem_range(&i, nid, p_start, p_end, p_nid))
-
+ __next_mem_range(&i, nid, &memblock.memory, \
+ &memblock.reserved, \
+ p_start, p_end, p_nid))

#ifdef CONFIG_ARCH_MEMBLOCK_NOMAP
#define for_each_mapped_mem_range(i, nid, p_start, p_end, p_nid) \
for (i = 0, \
- __next_mapped_mem_range(&i, nid, &memblock.memory, \
+ __next_mem_range(&i, nid, &memblock.memory, \
&memblock.nomap, p_start, \
p_end, p_nid); \
i != (u64)ULLONG_MAX; \
- __next_mapped_mem_range(&i, nid, &memblock.memory, \
- &memblock.nomap, \
- p_start, p_end, p_nid))
-
-void __next_mapped_mem_range(u64 *idx, int nid, phys_addr_t *out_start,
- phys_addr_t *out_end, int *out_nid);
-
+ __next_mem_range(&i, nid, &memblock.memory, \
+ &memblock.nomap, \
+ p_start, p_end, p_nid))
#endif

-void __next_free_mem_range_rev(u64 *idx, int nid, phys_addr_t *out_start,
- phys_addr_t *out_end, int *out_nid);
+void __next_mem_range_rev(u64 *idx, int nid,
+ struct memblock_type *type_a,
+ struct memblock_type *type_b,
+ phys_addr_t *out_start,
+ phys_addr_t *out_end, int *out_nid);

/**
* for_each_free_mem_range_reverse - rev-iterate through free memblock areas
@@ -174,9 +177,15 @@ void __next_free_mem_range_rev(u64 *idx, int nid, phys_addr_t *out_start,
*/
#define for_each_free_mem_range_reverse(i, nid, p_start, p_end, p_nid) \
for (i = (u64)ULLONG_MAX, \
- __next_free_mem_range_rev(&i, nid, p_start, p_end, p_nid); \
+ __next_mem_range_rev(&i, nid, \
+ &memblock.memory, \
+ &memblock.reserved, \
+ p_start, p_end, p_nid); \
i != (u64)ULLONG_MAX; \
- __next_free_mem_range_rev(&i, nid, p_start, p_end, p_nid))
+ __next_mem_range_rev(&i, nid, \
+ &memblock.memory, \
+ &memblock.reserved, \
+ p_start, p_end, p_nid))

static inline void memblock_set_region_flags(struct memblock_region *r,
unsigned long flags)
@@ -321,12 +330,11 @@ static inline unsigned long memblock_region_reserved_end_pfn(const struct memblo
return PFN_UP(reg->base + reg->size);
}

-#define for_each_memblock(memblock_type, region) \
- for (region = memblock.memblock_type.regions; \
- region < (memblock.memblock_type.regions + memblock.memblock_type.cnt); \
+#define for_each_memblock(type_name, region) \
+ for (region = memblock.type_name.regions; \
+ region < (memblock.type_name.regions + memblock.type_name.cnt); \
region++)

-
#ifdef CONFIG_ARCH_DISCARD_MEMBLOCK
#define __init_memblock __meminit
#define __initdata_memblock __meminitdata
diff --git a/mm/memblock.c b/mm/memblock.c
index b36f5d3..dd6fd6f 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -836,97 +836,23 @@ int __init_memblock memblock_remap(phys_addr_t base, phys_addr_t size)
#endif

/**
- * __next_free_mem_range - next function for for_each_free_mem_range()
- * @idx: pointer to u64 loop variable
- * @nid: node selector, %NUMA_NO_NODE for all nodes
- * @out_start: ptr to phys_addr_t for start address of the range, can be %NULL
- * @out_end: ptr to phys_addr_t for end address of the range, can be %NULL
- * @out_nid: ptr to int for nid of the range, can be %NULL
- *
- * Find the first free area from *@idx which matches @nid, fill the out
- * parameters, and update *@idx for the next iteration. The lower 32bit of
- * *@idx contains index into memory region and the upper 32bit indexes the
- * areas before each reserved region. For example, if reserved regions
- * look like the following,
- *
- * 0:[0-16), 1:[32-48), 2:[128-130)
+ * __next_mem_range - generic next function for for_each_*_range()
*
- * The upper 32bit indexes the following regions.
+ * Finds the next range from type_a which is not marked as unsuitable
+ * in type_b.
*
- * 0:[0-0), 1:[16-32), 2:[48-128), 3:[130-MAX)
- *
- * As both region arrays are sorted, the function advances the two indices
- * in lockstep and returns each intersection.
- */
-void __init_memblock __next_free_mem_range(u64 *idx, int nid,
- phys_addr_t *out_start,
- phys_addr_t *out_end, int *out_nid)
-{
- struct memblock_type *mem = &memblock.memory;
- struct memblock_type *rsv = &memblock.reserved;
- int mi = *idx & 0xffffffff;
- int ri = *idx >> 32;
-
- if (WARN_ONCE(nid == MAX_NUMNODES, "Usage of MAX_NUMNODES is deprecated. Use NUMA_NO_NODE instead\n"))
- nid = NUMA_NO_NODE;
-
- for ( ; mi < mem->cnt; mi++) {
- struct memblock_region *m = &mem->regions[mi];
- phys_addr_t m_start = m->base;
- phys_addr_t m_end = m->base + m->size;
-
- /* only memory regions are associated with nodes, check it */
- if (nid != NUMA_NO_NODE && nid != memblock_get_region_node(m))
- continue;
-
- /* scan areas before each reservation for intersection */
- for ( ; ri < rsv->cnt + 1; ri++) {
- struct memblock_region *r = &rsv->regions[ri];
- phys_addr_t r_start = ri ? r[-1].base + r[-1].size : 0;
- phys_addr_t r_end = ri < rsv->cnt ? r->base : ULLONG_MAX;
-
- /* if ri advanced past mi, break out to advance mi */
- if (r_start >= m_end)
- break;
- /* if the two regions intersect, we're done */
- if (m_start < r_end) {
- if (out_start)
- *out_start = max(m_start, r_start);
- if (out_end)
- *out_end = min(m_end, r_end);
- if (out_nid)
- *out_nid = memblock_get_region_node(m);
- /*
- * The region which ends first is advanced
- * for the next iteration.
- */
- if (m_end <= r_end)
- mi++;
- else
- ri++;
- *idx = (u32)mi | (u64)ri << 32;
- return;
- }
- }
- }
-
- /* signal end of iteration */
- *idx = ULLONG_MAX;
-}
-
-#ifdef ARCH_MEMBLOCK_NOMAP
-/**
- * __next_mapped_mem_range - next function for for_each_free_mem_range()
* @idx: pointer to u64 loop variable
* @nid: node selector, %NUMA_NO_NODE for all nodes
+ * @type_a: pointer to memblock_type from where the range is taken
+ * @type_b: pointer to memblock_type which excludes memory from being taken
* @out_start: ptr to phys_addr_t for start address of the range, can be %NULL
* @out_end: ptr to phys_addr_t for end address of the range, can be %NULL
* @out_nid: ptr to int for nid of the range, can be %NULL
*
- * Find the first free area from *@idx which matches @nid, fill the out
+ * Find the first present area from *@idx which matches @nid, fill the out
* parameters, and update *@idx for the next iteration. The lower 32bit of
- * *@idx contains index into memory region and the upper 32bit indexes the
- * areas before each reserved region. For example, if reserved regions
+ * *@idx contains index into type_a region and the upper 32bit indexes the
+ * areas before each type_b region. For example, if type_a regions
* look like the following,
*
* 0:[0-16), 1:[32-48), 2:[128-130)
@@ -938,98 +864,107 @@ void __init_memblock __next_free_mem_range(u64 *idx, int nid,
* As both region arrays are sorted, the function advances the two indices
* in lockstep and returns each intersection.
*/
-void __init_memblock __next_mapped_mem_range(u64 *idx, int nid,
- phys_addr_t *out_start,
- phys_addr_t *out_end, int *out_nid)
+void __init_memblock __next_mem_range(u64 *idx, int nid,
+ struct memblock_type *type_a,
+ struct memblock_type *type_b,
+ phys_addr_t *out_start,
+ phys_addr_t *out_end, int *out_nid)
{
- struct memblock_type *mem = &memblock.memory;
- struct memblock_type *rsv = &memblock.nomap;
- int mi = *idx & 0xffffffff;
- int ri = *idx >> 32;
+ int idx_a = *idx & 0xffffffff;
+ int idx_b = *idx >> 32;

if (WARN_ONCE(nid == MAX_NUMNODES, "Usage of MAX_NUMNODES is deprecated. Use NUMA_NO_NODE instead\n"))
nid = NUMA_NO_NODE;

- for (; mi < mem->cnt; mi++) {
- struct memblock_region *m = &mem->regions[mi];
+ for (; idx_a < type_a->cnt; idx_a++) {
+ struct memblock_region *m = &type_a->regions[idx_a];
phys_addr_t m_start = m->base;
phys_addr_t m_end = m->base + m->size;
+ int m_nid = memblock_get_region_node(m);

/* only memory regions are associated with nodes, check it */
if (nid != NUMA_NO_NODE && nid != memblock_get_region_node(m))
continue;

- /* scan areas before each reservation for intersection */
- for (; ri < rsv->cnt + 1; ri++) {
- struct memblock_region *r = &rsv->regions[ri];
- phys_addr_t r_start = ri ? r[-1].base + r[-1].size : 0;
- phys_addr_t r_end = ri < rsv->cnt ?
+ /* scan areas before each reservation */
+ for (; idx_b < type_b->cnt + 1; idx_b++) {
+ struct memblock_region *r;
+ phys_addr_t r_start;
+ phys_addr_t r_end;
+
+ r = &type_b->regions[idx_b];
+ r_start = idx_b ? r[-1].base + r[-1].size : 0;
+ r_end = idx_b < type_b->cnt ?
r->base : ULLONG_MAX;

- /* if ri advanced past mi, break out to advance mi */
+ /*
+ * if idx_b advanced past idx_a,
+ * break out to advance idx_a
+ */
if (r_start >= m_end)
break;
/* if the two regions intersect, we're done */
if (m_start < r_end) {
if (out_start)
- *out_start = max(m_start, r_start);
+ *out_start =
+ max(m_start, r_start);
if (out_end)
*out_end = min(m_end, r_end);
if (out_nid)
- *out_nid = memblock_get_region_node(m);
+ *out_nid = m_nid;
+
/*
- * The region which ends first is advanced
- * for the next iteration.
+ * The region which ends first is
+ * advanced for the next iteration.
*/
if (m_end <= r_end)
- mi++;
+ idx_a++;
else
- ri++;
- *idx = (u32)mi | (u64)ri << 32;
+ idx_b++;
+ *idx = (u32)idx_a | (u64)idx_b << 32;
return;
}
}
}
-
/* signal end of iteration */
*idx = ULLONG_MAX;
}
-#endif

/**
- * __next_free_mem_range_rev - next function for for_each_free_mem_range_reverse()
+ * __next_mem_range_rev - generic next function for for_each_*_range_rev()
+ *
+ * Finds the next range from type_a which is not marked as unsuitable
+ * in type_b.
+ *
* @idx: pointer to u64 loop variable
* @nid: nid: node selector, %NUMA_NO_NODE for all nodes
+ * @type_a: pointer to memblock_type from where the range is taken
+ * @type_b: pointer to memblock_type which excludes memory from being taken
* @out_start: ptr to phys_addr_t for start address of the range, can be %NULL
* @out_end: ptr to phys_addr_t for end address of the range, can be %NULL
* @out_nid: ptr to int for nid of the range, can be %NULL
*
- * Reverse of __next_free_mem_range().
- *
- * Linux kernel cannot migrate pages used by itself. Memory hotplug users won't
- * be able to hot-remove hotpluggable memory used by the kernel. So this
- * function skip hotpluggable regions if needed when allocating memory for the
- * kernel.
+ * Reverse of __next_mem_range().
*/
-void __init_memblock __next_free_mem_range_rev(u64 *idx, int nid,
- phys_addr_t *out_start,
- phys_addr_t *out_end, int *out_nid)
+void __init_memblock __next_mem_range_rev(u64 *idx, int nid,
+ struct memblock_type *type_a,
+ struct memblock_type *type_b,
+ phys_addr_t *out_start,
+ phys_addr_t *out_end, int *out_nid)
{
- struct memblock_type *mem = &memblock.memory;
- struct memblock_type *rsv = &memblock.reserved;
- int mi = *idx & 0xffffffff;
- int ri = *idx >> 32;
+ int idx_a = *idx & 0xffffffff;
+ int idx_b = *idx >> 32;

if (WARN_ONCE(nid == MAX_NUMNODES, "Usage of MAX_NUMNODES is deprecated. Use NUMA_NO_NODE instead\n"))
nid = NUMA_NO_NODE;

if (*idx == (u64)ULLONG_MAX) {
- mi = mem->cnt - 1;
- ri = rsv->cnt;
+ idx_a = type_a->cnt - 1;
+ idx_b = type_b->cnt;
}

- for ( ; mi >= 0; mi--) {
- struct memblock_region *m = &mem->regions[mi];
+ for (; idx_a >= 0; idx_a--) {
+ struct memblock_region *m = &type_a->regions[idx_a];
phys_addr_t m_start = m->base;
phys_addr_t m_end = m->base + m->size;

@@ -1041,13 +976,21 @@ void __init_memblock __next_free_mem_range_rev(u64 *idx, int nid,
if (movable_node_is_enabled() && memblock_is_hotpluggable(m))
continue;

- /* scan areas before each reservation for intersection */
- for ( ; ri >= 0; ri--) {
- struct memblock_region *r = &rsv->regions[ri];
- phys_addr_t r_start = ri ? r[-1].base + r[-1].size : 0;
- phys_addr_t r_end = ri < rsv->cnt ? r->base : ULLONG_MAX;
+ /* scan areas before each reservation */
+ for (; idx_b >= 0; idx_b--) {
+ struct memblock_region *r;
+ phys_addr_t r_start;
+ phys_addr_t r_end;
+ int m_nid = memblock_get_region_node(m);

- /* if ri advanced past mi, break out to advance mi */
+ r = &type_b->regions[idx_b];
+ r_start = idx_b ? r[-1].base + r[-1].size : 0;
+ r_end = idx_b < type_b->cnt ?
+ r->base : ULLONG_MAX;
+ /*
+ * if idx_b advanced past idx_a,
+ * break out to advance idx_a
+ */
if (r_end <= m_start)
break;
/* if the two regions intersect, we're done */
@@ -1057,18 +1000,17 @@ void __init_memblock __next_free_mem_range_rev(u64 *idx, int nid,
if (out_end)
*out_end = min(m_end, r_end);
if (out_nid)
- *out_nid = memblock_get_region_node(m);
-
+ *out_nid = m_nid;
if (m_start >= r_start)
- mi--;
+ idx_a--;
else
- ri--;
- *idx = (u32)mi | (u64)ri << 32;
+ idx_b--;
+ *idx = (u32)idx_a | (u64)idx_b << 32;
return;
}
}
}
-
+ /* signal end of iteration */
*idx = ULLONG_MAX;
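
To see the walk in isolation, here is a minimal, self-contained userspace
sketch of the lockstep intersection scan that __next_mem_range() performs.
The region data and helper names are made up for illustration; this is not
kernel code. Two sorted, non-overlapping arrays are advanced in step, and
each intersection of a type_a range with a gap between type_b ranges is
reported:

#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

struct region { uint64_t base, size; };

/* "present" ranges (type_a) and "excluded" ranges (type_b), both sorted */
static const struct region mem[] = { { 0, 16 }, { 32, 16 }, { 128, 2 } };
static const struct region rsv[] = { { 4, 4 }, { 40, 100 } };
#define CNT(a) (sizeof(a) / sizeof((a)[0]))

static bool next_free(size_t *ia, size_t *ib, uint64_t *start, uint64_t *end)
{
	for (; *ia < CNT(mem); (*ia)++) {
		uint64_t m_start = mem[*ia].base;
		uint64_t m_end = m_start + mem[*ia].size;

		/* scan the gaps before, between and after the rsv ranges */
		for (; *ib < CNT(rsv) + 1; (*ib)++) {
			uint64_t r_start = *ib ?
				rsv[*ib - 1].base + rsv[*ib - 1].size : 0;
			uint64_t r_end = *ib < CNT(rsv) ?
				rsv[*ib].base : UINT64_MAX;

			if (r_start >= m_end)	/* gap lies past this range, */
				break;		/* so advance *ia */
			if (m_start < r_end) {	/* intersection found */
				*start = m_start > r_start ? m_start : r_start;
				*end = m_end < r_end ? m_end : r_end;
				/* advance whichever region ends first */
				if (m_end <= r_end)
					(*ia)++;
				else
					(*ib)++;
				return true;
			}
		}
	}
	return false;			/* end of iteration */
}

int main(void)
{
	size_t ia = 0, ib = 0;
	uint64_t s, e;

	while (next_free(&ia, &ib, &s, &e))
		printf("free: [%llu-%llu)\n",
		       (unsigned long long)s, (unsigned long long)e);
	return 0;
}

With the data above this prints [0-4), [8-16) and [32-40): exactly the
parts of mem[] not covered by rsv[].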

Philipp Hachtmann

Jan 16, 2014, 11:00:02 AM
Patch fits Andrew's linux-next.

Strashko, Grygorii

Jan 17, 2014, 1:10:02 PM
Hi Philipp,

On 01/14/2014 08:52 PM, Philipp Hachtmann wrote:
> Hello Grygorii,
>
> thank you for your comments.
>
> To clarify we have the following requirements for memblock:
>
> (1) Reserved areas can be declared before memory is added.
> (2) The physical memory is detected once only.
> (3) The free (i.e. not reserved) memory can be iterated to add
> it to the buddy allocator.
> (4) Memory designated to be mapped into the kernel address space can be
> iterated.
> (5) Kdump on s390 requires knowledge about the full system memory
> layout.
>
> The s390 kdump implementation works a bit differently from the
> implementation on other architectures: the layout is not taken from the
> production system and saved for the kdump kernel. Instead the kdump
> kernel needs to gather information about the whole memory without
> regard to locked-out areas (like mem= and OLDMEM etc.).
>
> Without kdump's requirement it would of course be suitable and easy
> just to remove memory from memblock.memory. But then this information
> is lost for later use by kdump.
>
> The patch does not change any behaviour of the current API - whether it
> is enabled or not.

Sorry, for the delayed reply.

My main concern here was that you are introducing a new *generic* API
which in fact is not generic, because it can't be re-used without a huge
rework of existing code (at least given the wide usage of
for_each_memblock(memory, ...): with ARCH_MEMBLOCK_NOMAP=y the meaning of
"memory" ranges will change from "mapped memory" to "real phys memory").

I have therefore proposed to keep things as they are and introduce
phys_memory ranges instead, to store the real physical memory configuration.

>
> The current patch seems to be overly complicated.
> The following patch contains only the nomap functionality without any
> cleanup and refactoring. I will post a V4 patch set which will contain
> this patch.

Regards,
-grygorii
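
Sketched in code, that alternative would keep memblock.memory meaning what
it means today and record the detected layout separately. All names below
are hypothetical; no such patch was posted:

/* hypothetical: keep "memory" as-is, add a type for the real layout */
struct memblock {
	phys_addr_t current_limit;
	struct memblock_type memory;	/* memory usable by the kernel */
	struct memblock_type reserved;
#ifdef CONFIG_HAVE_MEMBLOCK_PHYS_MAP
	struct memblock_type phys_memory; /* full detected configuration */
#endif
};

/* detection code would then register every range twice */
static int __init add_detected_range(phys_addr_t base, phys_addr_t size)
{
	memblock_add_phys(base, size);	 /* hypothetical: real layout */
	return memblock_add(base, size); /* usable memory, may shrink later */
}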

Philipp Hachtmann

Jan 20, 2014, 3:50:02 AM
On Fri, 17 Jan 2014 18:08:13 +0000
"Strashko, Grygorii" <grygorii...@ti.com> wrote:

Hello Grygorii,

> > The current patch seems to be overly complicated.
> > The following patch contains only the nomap functionality without
> > any cleanup and refactoring. I will post a V4 patch set which will
> > contain this patch.

please see the V4 patch set I've sent to the list. There you will
clearly see that nothing is changed. No API is broken by the patch.
The patch only adds functionality.
Everything that worked before keeps working as before without any
changes needed in any arch's code.

Kind regards

Philipp

Philipp Hachtmann

Jan 20, 2014, 6:40:02 AM
Add a new memory state "nomap" to memblock. This can be used to truncate
the usable memory in the system without forgetting about what is really
installed.

Signed-off-by: Philipp Hachtmann <pha...@linux.vnet.ibm.com>
---
include/linux/memblock.h | 25 +++++++
mm/Kconfig | 3 +
mm/memblock.c | 175 ++++++++++++++++++++++++++++++++++++++++++++++-
mm/nobootmem.c | 8 ++-
4 files changed, 209 insertions(+), 2 deletions(-)

diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index 1ef6636..be1c819 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -133,6 +141,23 @@ void __next_free_mem_range(u64 *idx, int nid, phys_addr_t *out_start,
i != (u64)ULLONG_MAX; \
__next_free_mem_range(&i, nid, p_start, p_end, p_nid))

+
+#ifdef CONFIG_ARCH_MEMBLOCK_NOMAP
+#define for_each_mapped_mem_range(i, nid, p_start, p_end, p_nid) \
+ for (i = 0, \
+ __next_mapped_mem_range(&i, nid, &memblock.memory, \
+ &memblock.nomap, p_start, \
+ p_end, p_nid); \
+ i != (u64)ULLONG_MAX; \
+ __next_mapped_mem_range(&i, nid, &memblock.memory, \
+ &memblock.nomap, \
+ p_start, p_end, p_nid))
+
+void __next_mapped_mem_range(u64 *idx, int nid, phys_addr_t *out_start,
+ phys_addr_t *out_end, int *out_nid);
+
+#endif
+
void __next_free_mem_range_rev(u64 *idx, int nid, phys_addr_t *out_start,
phys_addr_t *out_end, int *out_nid);

diff --git a/mm/Kconfig b/mm/Kconfig
index 2d9f150..6907654 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -137,6 +137,9 @@ config HAVE_MEMBLOCK_NODE_MAP
config ARCH_DISCARD_MEMBLOCK
boolean

+config ARCH_MEMBLOCK_NOMAP
+ boolean
+
config NO_BOOTMEM
boolean

diff --git a/mm/memblock.c b/mm/memblock.c
index 9c0aeef..b36f5d3 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
* __next_free_mem_range - next function for for_each_free_mem_range()
* @idx: pointer to u64 loop variable
@@ -836,6 +914,88 @@ void __init_memblock __next_free_mem_range(u64 *idx, int nid,
*idx = ULLONG_MAX;
}

+#ifdef ARCH_MEMBLOCK_NOMAP
+/**
+ * __next_mapped_mem_range - next function for for_each_free_mem_range()
+ * @idx: pointer to u64 loop variable
+ * @nid: node selector, %NUMA_NO_NODE for all nodes
+ * @out_start: ptr to phys_addr_t for start address of the range, can be %NULL
+ * @out_end: ptr to phys_addr_t for end address of the range, can be %NULL
+ * @out_nid: ptr to int for nid of the range, can be %NULL
+ *
+ * Find the first free area from *@idx which matches @nid, fill the out
+ * parameters, and update *@idx for the next iteration. The lower 32bit of
+ * *@idx contains index into memory region and the upper 32bit indexes the
+ * areas before each reserved region. For example, if reserved regions
+ * look like the following,
+ *
+ * 0:[0-16), 1:[32-48), 2:[128-130)
+ *
+ * The upper 32bit indexes the following regions.
+ *
+ * 0:[0-0), 1:[16-32), 2:[48-128), 3:[130-MAX)
+ *
+ * As both region arrays are sorted, the function advances the two indices
+ * in lockstep and returns each intersection.
+ */
+void __init_memblock __next_mapped_mem_range(u64 *idx, int nid,
+ phys_addr_t *out_start,
+ phys_addr_t *out_end, int *out_nid)
+{
+ struct memblock_type *mem = &memblock.memory;
+ struct memblock_type *rsv = &memblock.nomap;
+ int mi = *idx & 0xffffffff;
+ int ri = *idx >> 32;
+
+ if (WARN_ONCE(nid == MAX_NUMNODES, "Usage of MAX_NUMNODES is deprecated. Use NUMA_NO_NODE instead\n"))
+ nid = NUMA_NO_NODE;
+
+ for (; mi < mem->cnt; mi++) {
+ struct memblock_region *m = &mem->regions[mi];
+ phys_addr_t m_start = m->base;
+ phys_addr_t m_end = m->base + m->size;
+
+ /* signal end of iteration */
+ *idx = ULLONG_MAX;
+}
+#endif
+
/**
* __next_free_mem_range_rev - next function for for_each_free_mem_range_reverse()
* @idx: pointer to u64 loop variable
diff --git a/mm/nobootmem.c b/mm/nobootmem.c
index 0215c77..61966b6 100644
--- a/mm/nobootmem.c
+++ b/mm/nobootmem.c
@@ -138,9 +138,15 @@ static unsigned long __init free_low_memory_core_early(void)
size = get_allocated_memblock_memory_regions_info(&start);
if (size)
count += __free_memory_core(start, start + size);
+
+#ifdef CONFIG_ARCH_MEMBLOCK_NOMAP
+ /* Free memblock.nomap array if it was allocated */
+ size = get_allocated_memblock_memory_regions_info(&start);
+ if (size)
+ count += __free_memory_core(start, start + size);
+#endif
}
#endif
-
return count;
}

--
1.8.4.5
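
As an illustration of the intended use: apart from memblock_add(),
memblock_nomap() and for_each_mapped_mem_range(), which this patch
provides, every name below is invented. Arch setup code could hide the
old kernel's memory from the mapping while keeping the full layout
around for kdump:

static void __init setup_memory(void)
{
	phys_addr_t start, end;
	u64 i;

	/* detect and register all installed memory exactly once */
	memblock_add(0, detected_memory_size());

	/* hide the old kernel's memory without forgetting it exists */
	memblock_nomap(oldmem_base, oldmem_size);

	/*
	 * map only what is usable; memblock.memory still describes
	 * the full installed layout for later use by kdump
	 */
	for_each_mapped_mem_range(i, NUMA_NO_NODE, &start, &end, NULL)
		create_kernel_mapping(start, end);
}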

Philipp Hachtmann

Jan 20, 2014, 6:40:02 AM
This fixes an unused variable warning in nobootmem.c

Signed-off-by: Philipp Hachtmann <pha...@linux.vnet.ibm.com>
---
mm/nobootmem.c | 28 +++++++++++++++++-----------
1 file changed, 17 insertions(+), 11 deletions(-)

diff --git a/mm/nobootmem.c b/mm/nobootmem.c
index e2906a5..0215c77 100644
--- a/mm/nobootmem.c
+++ b/mm/nobootmem.c
@@ -116,23 +116,29 @@ static unsigned long __init __free_memory_core(phys_addr_t start,
static unsigned long __init free_low_memory_core_early(void)
{
unsigned long count = 0;
- phys_addr_t start, end, size;
+ phys_addr_t start, end;
u64 i;

+#ifdef CONFIG_ARCH_DISCARD_MEMBLOCK
+ phys_addr_t size;
+#endif
+
for_each_free_mem_range(i, NUMA_NO_NODE, &start, &end, NULL)
count += __free_memory_core(start, end);

#ifdef CONFIG_ARCH_DISCARD_MEMBLOCK
-
- /* Free memblock.reserved array if it was allocated */
- size = get_allocated_memblock_reserved_regions_info(&start);
- if (size)
- count += __free_memory_core(start, start + size);
-
- /* Free memblock.memory array if it was allocated */
- size = get_allocated_memblock_memory_regions_info(&start);
- if (size)
- count += __free_memory_core(start, start + size);
+ {
+ phys_addr_t size;
+ /* Free memblock.reserved array if it was allocated */
+ size = get_allocated_memblock_reserved_regions_info(&start);
+ if (size)
+ count += __free_memory_core(start, start + size);
+
+ /* Free memblock.memory array if it was allocated */
+ size = get_allocated_memblock_memory_regions_info(&start);
+ if (size)
+ count += __free_memory_core(start, start + size);
+ }
#endif

Philipp Hachtmann

Jan 20, 2014, 6:40:02 AM
This all fits linux-next.

The first patch is a fix (not a replacement) to a patch that has already
been put into linux-next. The original patch generates a warning for an
unused variable when CONFIG_ARCH_DISCARD_MEMBLOCK is not set.
It's now done the way Andrew suggested.

The second patch adds support for excluded memory region handling in
memblock. This is needed by the current s390 development in conjunction
with kdump.
The patch is straightforward and adds some redundancy. This has been
done to clarify that this patch does not intend to change any of
the current memblock API's behaviour.

The third patch does some cleanup and refactoring of memblock. It removes
the redundancies introduced by the previous patch. It also is not intended
to change or break any behaviour or API of memblock.


Philipp Hachtmann (3):
mm/nobootmem: Fix unused variable
mm/memblock: Add support for excluded memory areas
mm/memblock: Cleanup and refactoring after addition of nomap

include/linux/memblock.h | 57 +++++++++---
mm/Kconfig | 3 +
mm/memblock.c | 233 +++++++++++++++++++++++++++++++++++------------
mm/nobootmem.c | 30 ++++--
4 files changed, 243 insertions(+), 80 deletions(-)

Philipp Hachtmann

Jan 20, 2014, 6:40:02 AM
Signed-off-by: Philipp Hachtmann <pha...@linux.vnet.ibm.com>
---
include/linux/memblock.h | 50 ++++++-----
mm/memblock.c | 214 +++++++++++++++++------------------------------
2 files changed, 107 insertions(+), 157 deletions(-)

diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index be1c819..ec2da3b 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -121,8 +121,9 @@ void __next_mem_pfn_range(int *idx, int nid, unsigned long *out_start_pfn,
i >= 0; __next_mem_pfn_range(&i, nid, p_start, p_end, p_nid))
#endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */

-void __next_free_mem_range(u64 *idx, int nid, phys_addr_t *out_start,
- phys_addr_t *out_end, int *out_nid);
+void __next_mem_range(u64 *idx, int nid, struct memblock_type *type_a,
+ struct memblock_type *type_b, phys_addr_t *out_start,
+ phys_addr_t *out_end, int *out_nid);

/**
* for_each_free_mem_range - iterate through free memblock areas
@@ -137,29 +138,31 @@ void __next_free_mem_range(u64 *idx, int nid, phys_addr_t *out_start,
*/
#define for_each_free_mem_range(i, nid, p_start, p_end, p_nid) \
for (i = 0, \
- __next_free_mem_range(&i, nid, p_start, p_end, p_nid); \
+ __next_mem_range(&i, nid, &memblock.memory, \
+ &memblock.reserved, p_start, \
+ p_end, p_nid); \
i != (u64)ULLONG_MAX; \
- __next_free_mem_range(&i, nid, p_start, p_end, p_nid))
-
+ __next_mem_range(&i, nid, &memblock.memory, \
+ &memblock.reserved, \
+ p_start, p_end, p_nid))

#ifdef CONFIG_ARCH_MEMBLOCK_NOMAP
#define for_each_mapped_mem_range(i, nid, p_start, p_end, p_nid) \
for (i = 0, \
- __next_mapped_mem_range(&i, nid, &memblock.memory, \
+ __next_mem_range(&i, nid, &memblock.memory, \
&memblock.nomap, p_start, \
p_end, p_nid); \
i != (u64)ULLONG_MAX; \
- __next_mapped_mem_range(&i, nid, &memblock.memory, \
- &memblock.nomap, \
- p_start, p_end, p_nid))
-
-void __next_mapped_mem_range(u64 *idx, int nid, phys_addr_t *out_start,
- phys_addr_t *out_end, int *out_nid);
-
+ __next_mem_range(&i, nid, &memblock.memory, \
+ &memblock.nomap, \
+ p_start, p_end, p_nid))
#endif

-void __next_free_mem_range_rev(u64 *idx, int nid, phys_addr_t *out_start,
- phys_addr_t *out_end, int *out_nid);
+void __next_mem_range_rev(u64 *idx, int nid,
+ struct memblock_type *type_a,
+ struct memblock_type *type_b,
+ phys_addr_t *out_start,
diff --git a/mm/memblock.c b/mm/memblock.c
index b36f5d3..dd6fd6f 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
- * __next_mapped_mem_range - next function for for_each_free_mem_range()
* @idx: pointer to u64 loop variable
* @nid: node selector, %NUMA_NO_NODE for all nodes
+ * @type_a: pointer to memblock_type from where the range is taken
+ * @type_b: pointer to memblock_type which excludes memory from being taken
* @out_start: ptr to phys_addr_t for start address of the range, can be %NULL
* @out_end: ptr to phys_addr_t for end address of the range, can be %NULL
* @out_nid: ptr to int for nid of the range, can be %NULL
*
- * Find the first free area from *@idx which matches @nid, fill the out
+ * Find the first present area from *@idx which matches @nid, fill the out
* parameters, and update *@idx for the next iteration. The lower 32bit of
- * *@idx contains index into memory region and the upper 32bit indexes the
- * areas before each reserved region. For example, if reserved regions
+ * *@idx contains index into type_a region and the upper 32bit indexes the
+ * areas before each type_b region. For example, if type_a regions
* look like the following,
*
* 0:[0-16), 1:[32-48), 2:[128-130)
@@ -938,98 +864,107 @@ void __init_memblock __next_free_mem_range(u64 *idx, int nid,
* As both region arrays are sorted, the function advances the two indices
* in lockstep and returns each intersection.
*/
-void __init_memblock __next_mapped_mem_range(u64 *idx, int nid,
- phys_addr_t *out_start,
- phys_addr_t *out_end, int *out_nid)
+void __init_memblock __next_mem_range(u64 *idx, int nid,
+ struct memblock_type *type_a,
+ struct memblock_type *type_b,
+ phys_addr_t *out_start,
+ phys_addr_t *out_end, int *out_nid)
{
- struct memblock_type *mem = &memblock.memory;
- struct memblock_type *rsv = &memblock.nomap;
- int mi = *idx & 0xffffffff;
- int ri = *idx >> 32;
+ int idx_a = *idx & 0xffffffff;
+ int idx_b = *idx >> 32;

if (WARN_ONCE(nid == MAX_NUMNODES, "Usage of MAX_NUMNODES is deprecated. Use NUMA_NO_NODE instead\n"))
nid = NUMA_NO_NODE;

- for (; mi < mem->cnt; mi++) {
- struct memblock_region *m = &mem->regions[mi];
+ for (; idx_a < type_a->cnt; idx_a++) {
+ struct memblock_region *m = &type_a->regions[idx_a];
phys_addr_t m_start = m->base;
phys_addr_t m_end = m->base + m->size;
+ int m_nid = memblock_get_region_node(m);

/* only memory regions are associated with nodes, check it */
if (nid != NUMA_NO_NODE && nid != memblock_get_region_node(m))
continue;

- /* scan areas before each reservation for intersection */
- for (; ri < rsv->cnt + 1; ri++) {
- struct memblock_region *r = &rsv->regions[ri];
- phys_addr_t r_start = ri ? r[-1].base + r[-1].size : 0;
- phys_addr_t r_end = ri < rsv->cnt ?
+ /* scan areas before each reservation */
+ for (; idx_b < type_b->cnt + 1; idx_b++) {
+ struct memblock_region *r;
+ phys_addr_t r_start;
+ phys_addr_t r_end;
+
+ r = &type_b->regions[idx_b];
+ r_start = idx_b ? r[-1].base + r[-1].size : 0;
+ r_end = idx_b < type_b->cnt ?
r->base : ULLONG_MAX;

- /* if ri advanced past mi, break out to advance mi */
+ /*
+ *if idx_b advanced past idx_a,
+ * break out to advance idx_a
+ */
if (r_start >= m_end)
break;
/* if the two regions intersect, we're done */
if (m_start < r_end) {
if (out_start)
- *out_start = max(m_start, r_start);
+ *out_start =
+ max(m_start, r_start);
if (out_end)
*out_end = min(m_end, r_end);
if (out_nid)
- *out_nid = memblock_get_region_node(m);
+ *out_nid = m_nid;
+
/*
- * The region which ends first is advanced
- * for the next iteration.
+ * The region which ends first is
+ * advanced for the next iteration.
*/
if (m_end <= r_end)
- mi++;
+ idx_a++;
else
- ri++;
- *idx = (u32)mi | (u64)ri << 32;
+ idx_b++;
+ *idx = (u32)idx_a | (u64)idx_b << 32;
return;
}
}
}
-
/* signal end of iteration */
*idx = ULLONG_MAX;
}
-#endif

/**
- * __next_free_mem_range_rev - next function for for_each_free_mem_range_reverse()
+ * __next_mem_range_rev - generic next function for for_each_*_range_rev()
+ *
+ * Finds the next range from type_a which is not marked as unsuitable
+ * in type_b.
+ *
* @idx: pointer to u64 loop variable
* @nid: nid: node selector, %NUMA_NO_NODE for all nodes
+ * @type_a: pointer to memblock_type from where the range is taken
+ * @type_b: pointer to memblock_type which excludes memory from being taken
* @out_start: ptr to phys_addr_t for start address of the range, can be %NULL
* @out_end: ptr to phys_addr_t for end address of the range, can be %NULL
* @out_nid: ptr to int for nid of the range, can be %NULL
*
- * Reverse of __next_free_mem_range().
- *
- * Linux kernel cannot migrate pages used by itself. Memory hotplug users won't
- * be able to hot-remove hotpluggable memory used by the kernel. So this
- * function skip hotpluggable regions if needed when allocating memory for the
- * kernel.
+ * Reverse of __next_mem_range().
*/
-void __init_memblock __next_free_mem_range_rev(u64 *idx, int nid,
- phys_addr_t *out_start,
- phys_addr_t *out_end, int *out_nid)
+void __init_memblock __next_mem_range_rev(u64 *idx, int nid,
+ struct memblock_type *type_a,
+ struct memblock_type *type_b,
+ phys_addr_t *out_start,
+ phys_addr_t *out_end, int *out_nid)
{
- struct memblock_type *mem = &memblock.memory;
- struct memblock_type *rsv = &memblock.reserved;
- int mi = *idx & 0xffffffff;
- int ri = *idx >> 32;
+ int idx_a = *idx & 0xffffffff;
+ int idx_b = *idx >> 32;

if (WARN_ONCE(nid == MAX_NUMNODES, "Usage of MAX_NUMNODES is deprecated. Use NUMA_NO_NODE instead\n"))
nid = NUMA_NO_NODE;

if (*idx == (u64)ULLONG_MAX) {
- mi = mem->cnt - 1;
- ri = rsv->cnt;
+ idx_a = type_a->cnt - 1;
+ idx_b = type_b->cnt;
}

- for ( ; mi >= 0; mi--) {
- struct memblock_region *m = &mem->regions[mi];
+ for (; idx_a >= 0; idx_a--) {
+ struct memblock_region *m = &type_a->regions[idx_a];
phys_addr_t m_start = m->base;
phys_addr_t m_end = m->base + m->size;

@@ -1041,13 +976,21 @@ void __init_memblock __next_free_mem_range_rev(u64 *idx, int nid,
if (movable_node_is_enabled() && memblock_is_hotpluggable(m))
continue;

- /* scan areas before each reservation for intersection */
- for ( ; ri >= 0; ri--) {
- struct memblock_region *r = &rsv->regions[ri];
- phys_addr_t r_start = ri ? r[-1].base + r[-1].size : 0;
- phys_addr_t r_end = ri < rsv->cnt ? r->base : ULLONG_MAX;
+ /* scan areas before each reservation */
+ for (; idx_b >= 0; idx_b--) {
+ struct memblock_region *r;
+ phys_addr_t r_start;
+ phys_addr_t r_end;
+ int m_nid = memblock_get_region_node(m);

- /* if ri advanced past mi, break out to advance mi */
+ r = &type_b->regions[idx_b];
+ r_start = idx_b ? r[-1].base + r[-1].size : 0;
+ r_end = idx_b < type_b->cnt ?
+ r->base : ULLONG_MAX;
+ /*
+ * if idx_b advanced past idx_a,
+ * break out to advance idx_a
+ */
if (r_end <= m_start)
break;
/* if the two regions intersect, we're done */
@@ -1057,18 +1000,17 @@ void __init_memblock __next_free_mem_range_rev(u64 *idx, int nid,
if (out_end)
*out_end = min(m_end, r_end);
if (out_nid)
- *out_nid = memblock_get_region_node(m);
-
+ *out_nid = m_nid;
if (m_start >= r_start)
- mi--;
+ idx_a--;
else
- ri--;
- *idx = (u32)mi | (u64)ri << 32;
+ idx_b--;
+ *idx = (u32)idx_a | (u64)idx_b << 32;
return;
}
}
}
-
+ /* signal end of iteration */
*idx = ULLONG_MAX;
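
One detail worth spelling out: both iterators keep their two array
positions in the single u64 cursor, the lower 32 bits indexing type_a and
the upper 32 bits indexing the gaps of type_b, which is what the
(u32)idx_a | (u64)idx_b << 32 lines implement. As a sketch, with
hypothetical helper names:

#include <stdint.h>

static inline uint64_t pack_idx(uint32_t idx_a, uint32_t idx_b)
{
	return (uint64_t)idx_a | ((uint64_t)idx_b << 32);
}

static inline void unpack_idx(uint64_t idx, uint32_t *idx_a, uint32_t *idx_b)
{
	*idx_a = idx & 0xffffffff;
	*idx_b = idx >> 32;
}

/*
 * ULLONG_MAX is reserved as the end-of-iteration sentinel; the reverse
 * iterator also uses it as the "not started yet" value, which is why
 * __next_mem_range_rev() checks for it before its first step.
 */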

Robin Holt

Jan 20, 2014, 11:30:02 AM
On Mon, Jan 20, 2014 at 5:32 AM, Philipp Hachtmann
<pha...@linux.vnet.ibm.com> wrote:
> This fixes an unused variable warning in nobootmem.c
>
> Signed-off-by: Philipp Hachtmann <pha...@linux.vnet.ibm.com>
> ---
> mm/nobootmem.c | 28 +++++++++++++++++-----------
> 1 file changed, 17 insertions(+), 11 deletions(-)
>
> diff --git a/mm/nobootmem.c b/mm/nobootmem.c
> index e2906a5..0215c77 100644
> --- a/mm/nobootmem.c
> +++ b/mm/nobootmem.c
> @@ -116,23 +116,29 @@ static unsigned long __init __free_memory_core(phys_addr_t start,
> static unsigned long __init free_low_memory_core_early(void)
> {
> unsigned long count = 0;
> - phys_addr_t start, end, size;
> + phys_addr_t start, end;
> u64 i;
>
> +#ifdef CONFIG_ARCH_DISCARD_MEMBLOCK
> + phys_addr_t size;
> +#endif
> +

Is this needed? It looks like you declare size again inside the next
#ifdef chunk.

David Rientjes

Jan 21, 2014, 1:20:01 AM
I think you may have misunderstood Andrew's suggestion: "size" here is
overloading the "size" you have already declared for this configuration.

Not sure why you don't just do a one line patch:

- phys_addr_t size;
+ phys_addr_t size __maybe_unused;

to fix it.

> + /* Free memblock.reserved array if it was allocated */
> + size = get_allocated_memblock_reserved_regions_info(&start);
> + if (size)
> + count += __free_memory_core(start, start + size);
> +
> + /* Free memblock.memory array if it was allocated */
> + size = get_allocated_memblock_memory_regions_info(&start);
> + if (size)
> + count += __free_memory_core(start, start + size);
> + }
> #endif
>
> return count;
--

Philipp Hachtmann

Jan 21, 2014, 2:00:02 AM


On Mon, 20 Jan 2014 22:16:33 -0800 (PST)
David Rientjes <rien...@google.com> wrote:

> Not sure why you don't just do a one line patch:
>
> - phys_addr_t size;
> + phys_addr_t size __maybe_unused;
> to fix it.

Just because I did not know that __maybe_unused thing.

Discussion of this fix seems to be obsolete because Andrew already took
the patch in the form he suggested: one #ifdef in the function with a
basic block declaring size once inside.

Regards

Philipp

David Rientjes

Jan 21, 2014, 5:00:01 AM
On Tue, 21 Jan 2014, Philipp Hachtmann wrote:

> > Not sure why you don't just do a one line patch:
> >
> > - phys_addr_t size;
> > + phys_addr_t size __maybe_unused;
> > to fix it.
>
> Just because I did not know that __maybe_unused thing.
>

- phys_addr_t size;
+ phys_addr_t size = 0;

would have done the same thing.

The compiler-generated code isn't going to change with either of these, so
we're only talking about how the source code is structured. If you and
Andrew believe that adding block scope to something so trivial is
worthwhile, then that's your taste. Looks ugly to me.
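
For comparison, a sketch of the function in the shape that was merged,
with the two one-line alternatives from this subthread noted alongside
(illustrative, not the merged code verbatim):

static unsigned long __init free_low_memory_core_early(void)
{
	unsigned long count = 0;
	phys_addr_t start, end;
	u64 i;

	for_each_free_mem_range(i, NUMA_NO_NODE, &start, &end, NULL)
		count += __free_memory_core(start, end);

#ifdef CONFIG_ARCH_DISCARD_MEMBLOCK
	{
		/* block scope keeps "size" inside the #ifdef */
		phys_addr_t size;

		size = get_allocated_memblock_reserved_regions_info(&start);
		if (size)
			count += __free_memory_core(start, start + size);

		size = get_allocated_memblock_memory_regions_info(&start);
		if (size)
			count += __free_memory_core(start, start + size);
	}
#endif
	/*
	 * The alternatives: declare once at the top as either
	 *	phys_addr_t size __maybe_unused;
	 * or
	 *	phys_addr_t size = 0;
	 * both of which silence the warning without the extra block.
	 */
	return count;
}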

Philipp Hachtmann

Jan 22, 2014, 6:20:02 AM
Hi again,

I'd like to remind that the s390 development relies on this patch
(and the next one, for cleanliness, of course) being added. It would be
very good to see it being added to the -mm tree resp. linux-next.

Kind regards

Philipp

Robin Holt

Jan 22, 2014, 10:30:02 AM
The reason I have not responded is that I do not see the utility of this
patch, and I did not feel like I had been engaged enough in the design of
whatever is going to be using this to know if this is the right direction
to go. As for the code, it all looks like what I would have done, assuming
I really needed this.

I don't like the _nomap, because that indicates too many different things
to too many people. That said, without knowing what this is going to be used
for, the only "better" term I could come up with is _reserved, which is more
problematic. Just as I was getting ready to send this email, I got the
flickering thought that memblock_set_unusable()/memblock_set_usable()
might be a better pair of functions.
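
Spelled out as prototypes (hypothetical; no such patch was posted), that
suggestion would be:

int memblock_set_unusable(phys_addr_t base, phys_addr_t size);
int memblock_set_usable(phys_addr_t base, phys_addr_t size);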

Sorry for coming across as difficult; I just don't feel comfortable with
my understanding of the context for this patch (and I am too lazy to
dig into it further). I have looked at the prior discussions, and I also
don't feel you have addressed the other concerns expressed in
those threads. I, of course, reserve the right to be wrong. I nearly
always am.

Thanks and sorry,
Robin Holt


On Mon, Jan 20, 2014 at 5:32 AM, Philipp Hachtmann
<pha...@linux.vnet.ibm.com> wrote:
> Add a new memory state "nomap" to memblock. This can be used to truncate
> the usable memory in the system without forgetting about what is really
> installed.
>
> Signed-off-by: Philipp Hachtmann <pha...@linux.vnet.ibm.com>
> ---
> include/linux/memblock.h | 25 +++++++
> mm/Kconfig | 3 +
> mm/memblock.c | 175 ++++++++++++++++++++++++++++++++++++++++++++++-
> mm/nobootmem.c | 8 ++-
> 4 files changed, 209 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/memblock.h b/include/linux/memblock.h
> index 1ef6636..be1c819 100644
> --- a/include/linux/memblock.h
> +++ b/include/linux/memblock.h
> @@ -18,6 +18,7 @@
> #include <linux/mm.h>
>
> #define INIT_MEMBLOCK_REGIONS 128
> +#define INIT_MEMBLOCK_NOMAP_REGIONS 4

That 4 seems rather arbitrary. Care to comment on how 4 was determined?

I think SGI has a special purpose driver that might benefit from _nomap
regions. I will drag Cliff Whickman in to comment on that if he feels
like it.

> /* Definition of memblock flags. */
> #define MEMBLOCK_HOTPLUG 0x1 /* hotpluggable region */
> @@ -43,6 +44,9 @@ struct memblock {
> phys_addr_t current_limit;
> struct memblock_type memory;
> struct memblock_type reserved;
> +#ifdef CONFIG_ARCH_MEMBLOCK_NOMAP
> + struct memblock_type nomap;
> +#endif
> };
>
> extern struct memblock memblock;
> @@ -68,6 +72,10 @@ int memblock_add(phys_addr_t base, phys_addr_t size);
> int memblock_remove(phys_addr_t base, phys_addr_t size);
> int memblock_free(phys_addr_t base, phys_addr_t size);
> int memblock_reserve(phys_addr_t base, phys_addr_t size);
> +#ifdef CONFIG_ARCH_MEMBLOCK_NOMAP
> +int memblock_nomap(phys_addr_t base, phys_addr_t size);
> +int memblock_remap(phys_addr_t base, phys_addr_t size);

Here is why I dislike _nomap. The function to reverse the effect becomes
even more misleading.

> +#endif
> void memblock_trim_memory(phys_addr_t align);
> int memblock_mark_hotplug(phys_addr_t base, phys_addr_t size);
> int memblock_clear_hotplug(phys_addr_t base, phys_addr_t size);
> @@ -133,6 +141,23 @@ void __next_free_mem_range(u64 *idx, int nid, phys_addr_t *out_start,
> i != (u64)ULLONG_MAX; \
> __next_free_mem_range(&i, nid, p_start, p_end, p_nid))
>
> +
> +#ifdef CONFIG_ARCH_MEMBLOCK_NOMAP
> +#define for_each_mapped_mem_range(i, nid, p_start, p_end, p_nid) \

Again with the name. To me, the _mapped_ implies there is a virtual to
physical translation or something like that which changes one address
into another, yet both resolve to the same physical memory.

Personally, I never like using _RET_IP, but that might just be me. Since it is
already used in equivalent functions, I would not try to argue against using
it here.

> +
> + ret = memblock_add_region(&memblock.reserved, base,
> + size, MAX_NUMNODES, 0);
> + if (ret)
> + return ret;
> +
> + return memblock_add_region(&memblock.nomap, base,
> + size, MAX_NUMNODES, 0);
> +}
> +
> +/*
> + * memblock_remap() - remove a memory range from the nomap list
> + *
> + * This is the inverse function to memblock_nomap().

Shouldn't this really be the "reverse" function?

Andrew Morton

Jan 22, 2014, 3:50:02 PM
On Wed, 22 Jan 2014 12:18:21 +0100 Philipp Hachtmann <pha...@linux.vnet.ibm.com> wrote:

> Hi again,
>
> I'd like to remind that the s390 development relies on this patch
> (and the next one, for cleanliness, of course) being added. It would be
> very good to see it being added to the -mm tree resp. linux-next.
>

Once the patch has passed review (hopefully by yinghai, who reviews
very well) I'd ask you to include it in the s390 tree which actually
uses it.

Patch 2/3 would benefit from a more complete changelog. Why does s390
need CONFIG_ARCH_MEMBLOCK_NOMAP? How is it used and how does it work?
Do we expect other architectures to use it? If so, how? etcetera.

btw, you have a "#ifdef ARCH_MEMBLOCK_NOMAP" in there which should be
CONFIG_ARCH_MEMBLOCK_NOMAP. I don't see how the code could have
compiled as-is - __next_mapped_mem_range() will be omitted?
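
Concretely, the fix Andrew is pointing at is a one-line change: Kconfig
symbols reach the preprocessor with a CONFIG_ prefix, so the bare
ARCH_MEMBLOCK_NOMAP is never defined and the guarded code silently drops
out:

-#ifdef ARCH_MEMBLOCK_NOMAP
+#ifdef CONFIG_ARCH_MEMBLOCK_NOMAP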