
Re: mmotm boot panic bootmem-avoid-dma32-zone-by-default.patch


Yinghai Lu

Mar 5, 2010, 4:10:02 AM
On 03/04/2010 07:21 PM, Johannes Weiner wrote:
> Hello Greg,
>
> On Thu, Mar 04, 2010 at 01:21:41PM -0800, Greg Thelen wrote:
>> On several systems I am seeing a boot panic if I use mmotm
>> (stamp-2010-03-02-18-38). If I remove
>> bootmem-avoid-dma32-zone-by-default.patch then no panic is seen. I
>> find that:
>> * 2.6.33 boots fine.
>> * 2.6.33 + mmotm w/o bootmem-avoid-dma32-zone-by-default.patch: boots fine.
>> * 2.6.33 + mmotm (including
>> bootmem-avoid-dma32-zone-by-default.patch): panics.
>> Note: I had to enable earlyprintk to see the panic. Without
>> earlyprintk no console output was seen. The system appeared to hang
>> after the loader.
>
> where sparse_index_init(), in the SPARSEMEM_EXTREME case, will allocate
> the mem_section descriptor with bootmem. If this failed, the box would
> panic immediately at that earlier point, but NO_BOOTMEM does not seem to
> get that right.
>
> Greg, could you retry _with_ my bootmem patch applied, but with setting
> CONFIG_NO_BOOTMEM=n up front?
>
> I think NO_BOOTMEM has several problems. Yinghai, can you verify them?
...
>
> 1. It does not seem to handle goal appropriately: bootmem would try
> without the goal if it does not make sense. And in this case, the
> goal is 4G (above DMA32) and the amount of memory is 256M.
>
> And if I did not miss something, this is the difference with my patch:
> without it, the default goal is 16M, which is no problem as it is well
> within your available memory. But the change of the default goal moved
> it outside the available memory, which the bootmem replacement cannot handle.
>
> 2. The early reservation stuff seems to return NULL but callsites assume
> that the bootmem interface never does that. Okay, the result is the same,
> we crash. But it still moves error reporting to a possibly much later
> point where somebody actually dereferences the returned pointer.

Under CONFIG_NO_BOOTMEM, alloc_bootmem_node honors the goal strictly: if a
caller passes in a large goal, it will not fall back to an allocation below
that goal.

Returning NULL gives the caller more choice and more control.

In any case we should honor the goal; callers that can tolerate failure
should use the _nopanic variants instead.
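
For illustration, the difference in a minimal sketch (find_free_range() is a
hypothetical stand-in for the allocator core, not a real kernel function):

	/* returns memory at or above 'start', or NULL */
	void *find_free_range(u64 start, u64 size, u64 align);

	/* classic bootmem: retry without the goal before giving up */
	void *bootmem_style_alloc(u64 size, u64 align, u64 goal)
	{
		void *ptr = find_free_range(goal, size, align);

		if (!ptr && goal)
			ptr = find_free_range(0, size, align);
		return ptr;
	}

	/* NO_BOOTMEM today: the goal is a hard floor, failure returns NULL */
	void *early_res_style_alloc(u64 size, u64 align, u64 goal)
	{
		return find_free_range(goal, size, align);
	}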

According to the context in
http://patchwork.kernel.org/patch/73893/

Jiri,
please check whether the current Linus tree still has the problem of mem_map
using that much low memory.

On my 1024g system the first node has 128G of RAM, and [2g, 4g) is an MMIO range.
With NO_BOOTMEM:

[ 0.000000] a - 11
[ 0.000000] 19 40 - 80 95
[ 0.000000] 702 740 - 1000 1000
[ 0.000000] 331f 3340 - 3400 3400
[ 0.000000] 35dd - 3600
[ 0.000000] 37dd - 3800
[ 0.000000] 39dd - 3a00
[ 0.000000] 3bdd - 3c00
[ 0.000000] 3ddd - 3e00
[ 0.000000] 3fdd - 4000
[ 0.000000] 41dd - 4200
[ 0.000000] 43dd - 4400
[ 0.000000] 45dd - 4600
[ 0.000000] 47dd - 4800
[ 0.000000] 49dd - 4a00
[ 0.000000] 4bdd - 4c00
[ 0.000000] 4ddd - 4e00
[ 0.000000] 4fdd - 5000
[ 0.000000] 51dd - 5200
[ 0.000000] 93dd 9400 - 7d500 7d53b
[ 0.000000] 7f730 - 7f750
[ 0.000000] 100012 100040 - 100200 100200
[ 0.000000] 170200 170200 - 2080000 2080000
[ 0.000000] 2080065 2080080 - 2080200 2080200

so the PFN range 9400 - 7d500 is free.

Without NO_BOOTMEM:
[ 0.000000] nid=0 start=0x0000000000 end=0x0002080000 aligned=1
[ 0.000000] free [0x000000000a - 0x0000000095]
[ 0.000000] free [0x0000000702 - 0x0000001000]
[ 0.000000] free [0x00000032c4 - 0x0000003400]
[ 0.000000] free [0x00000035de - 0x0000003600]
[ 0.000000] free [0x00000037dd - 0x0000003800]
[ 0.000000] free [0x00000039dd - 0x0000003a00]
[ 0.000000] free [0x0000003bdd - 0x0000003c00]
[ 0.000000] free [0x0000003ddd - 0x0000003e00]
[ 0.000000] free [0x0000003fdd - 0x0000004000]
[ 0.000000] free [0x00000041dd - 0x0000004200]
[ 0.000000] free [0x00000043dd - 0x0000004400]
[ 0.000000] free [0x00000045dd - 0x0000004600]
[ 0.000000] free [0x00000047dd - 0x0000004800]
[ 0.000000] free [0x00000049dd - 0x0000004a00]
[ 0.000000] free [0x0000004bdd - 0x0000004c00]
[ 0.000000] free [0x0000004ddd - 0x0000004e00]
[ 0.000000] free [0x0000004fdd - 0x0000005000]
[ 0.000000] free [0x00000051dd - 0x0000005200]
[ 0.000000] free [0x00000053dd - 0x000007d53b]
[ 0.000000] free [0x000007f730 - 0x000007f750]
[ 0.000000] free [0x000010041f - 0x0000100a00]
[ 0.000000] free [0x0000170a00 - 0x0000180a00]
[ 0.000000] free [0x0000180a03 - 0x0002080000]
so the PFN range 53dd - 7d53b is free.

looks like we don't need to change the default goal in alloc_bootmem_node.

YH
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Jiri Slaby

Mar 5, 2010, 5:30:02 AM
On 03/05/2010 10:04 AM, Yinghai Lu wrote:
> According to the context in
> http://patchwork.kernel.org/patch/73893/
>
> Jiri,
> please check whether the current Linus tree still has the problem of mem_map using that much low memory.

Hi!

Sorry, I don't have direct access to the machine. I might try to ask the
owners to do so.

> On my 1024g system the first node has 128G of RAM, and [2g, 4g) is an MMIO range.

So where does your mem_map get allocated (I suppose you're running the flat model)?

Note that the failure we were seeing occurred with different amounts of memory
on different machines, obviously because of different e820 reservations
and driver requirements at boot time. The amount of memory required to trigger
the error oscillated around 128G, sometimes being 130G.

It triggered when mem_map fit exactly into 0-2G (with 2-4G reserved)
and no more space was left there. If RAM was more than 130G, mem_map
implicitly ended up above the 4G boundary, so there was enough space in the
first 4G of memory for other users with specific bootmem limitations.
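
(A back-of-the-envelope check of why the threshold sits near 128G, assuming 4K
pages and a 64-byte struct page, both typical for x86_64 kernels of this era:

	128G of RAM / 4K per page  = 32M pages
	32M pages * 64 bytes/page  = 2G of mem_map

so the mem_map of a ~128G node exactly fills the free [0, 2G) range once
[2G, 4G) is taken by MMIO, leaving nothing below 4G for anyone else.)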

Could you explain more the dmesg output?

thanks,
--
js

Johannes Weiner

Mar 5, 2010, 8:10:02 AM

Yes, that's the problem.

> Returning NULL gives the caller more choice and more control.

Most callers do not need it, as there is no real way to handle allocation
failures at this point in the boot process.

For everything else, there is the _nopanic API.
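
For illustration, the two calling conventions in a minimal sketch (PAGE_SIZE is
an arbitrary example size here):

	void *buf;

	/* panicking variant: failure halts boot, no NULL check needed */
	buf = alloc_bootmem(PAGE_SIZE);

	/* _nopanic variant: the caller opts in to handling failure */
	buf = alloc_bootmem_nopanic(PAGE_SIZE);
	if (!buf)
		printk(KERN_WARNING "optional early buffer unavailable\n");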

Greg Thelen

Mar 5, 2010, 2:20:03 PM
On Fri, Mar 5, 2010 at 10:41 AM, Yinghai Lu <yin...@kernel.org> wrote:
> On 03/04/2010 09:17 PM, Greg Thelen wrote:

>> On Thu, Mar 4, 2010 at 7:21 PM, Johannes Weiner <han...@cmpxchg.org> wrote:
>>> On Thu, Mar 04, 2010 at 01:21:41PM -0800, Greg Thelen wrote:
>>>> On several systems I am seeing a boot panic if I use mmotm
>>>> (stamp-2010-03-02-18-38). If I remove
>>>> bootmem-avoid-dma32-zone-by-default.patch then no panic is seen. I
>>>> find that:
>>>> * 2.6.33 boots fine.
>>>> * 2.6.33 + mmotm w/o bootmem-avoid-dma32-zone-by-default.patch: boots fine.
>>>> * 2.6.33 + mmotm (including
>>>> bootmem-avoid-dma32-zone-by-default.patch): panics.
> ...
>>
>> Note: mmotm has been recently updated to stamp-2010-03-04-18-05. I
>> re-tested with 'make defconfig' to confirm the panic with this later
>> mmotm.
>
> please check
>
> [PATCH] early_res: double check with updated goal in alloc_memory_core_early
>
> Johannes Weiner pointed out that the new early_res replacement for
> alloc_bootmem_node changes the behavior regarding goal: the original
> bootmem version would keep trying regardless of the goal.
>
> This breaks his patch moving the default goal from MAX_DMA to MAX_DMA32,
> and it also broke uncommon machines with <=16M of memory.
> (really? can our x86 kernel still run on a 16M system?)
>
> So try again with an updated goal.
>
> Reported-by: Greg Thelen <gth...@google.com>
> Signed-off-by: Yinghai Lu <yin...@kernel.org>
>
> ---
>  mm/bootmem.c |   28 +++++++++++++++++++++++++---
>  1 file changed, 25 insertions(+), 3 deletions(-)
>
> Index: linux-2.6/mm/bootmem.c
> ===================================================================
> --- linux-2.6.orig/mm/bootmem.c
> +++ linux-2.6/mm/bootmem.c
> @@ -170,6 +170,28 @@ void __init free_bootmem_late(unsigned l
>  }
>
>  #ifdef CONFIG_NO_BOOTMEM
> +static void * __init ___alloc_memory_core_early(pg_data_t *pgdat, u64 size,
> +						u64 align, u64 goal, u64 limit)
> +{
> +	void *ptr;
> +	unsigned long end_pfn;
> +
> +	ptr = __alloc_memory_core_early(pgdat->node_id, size, align,
> +					 goal, limit);
> +	if (ptr)
> +		return ptr;
> +
> +	/* the goal is beyond this node's memory: retry from the node start */
> +	end_pfn = pgdat->node_start_pfn + pgdat->node_spanned_pages;
> +	if ((end_pfn << PAGE_SHIFT) < (goal + size)) {
> +		goal = pgdat->node_start_pfn << PAGE_SHIFT;
> +		ptr = __alloc_memory_core_early(pgdat->node_id, size, align,
> +						 goal, limit);
> +	}
> +
> +	return ptr;
> +}
> +
>  static void __init __free_pages_memory(unsigned long start, unsigned long end)
>  {
>  	int i;
> @@ -836,7 +858,7 @@ void * __init __alloc_bootmem_node(pg_da
>  		return kzalloc_node(size, GFP_NOWAIT, pgdat->node_id);
>
>  #ifdef CONFIG_NO_BOOTMEM
> -	return __alloc_memory_core_early(pgdat->node_id, size, align,
> +	return ___alloc_memory_core_early(pgdat, size, align,
> 					 goal, -1ULL);
>  #else
>  	return ___alloc_bootmem_node(pgdat->bdata, size, align, goal, 0);
> @@ -920,7 +942,7 @@ void * __init __alloc_bootmem_node_nopan
>  		return kzalloc_node(size, GFP_NOWAIT, pgdat->node_id);
>
>  #ifdef CONFIG_NO_BOOTMEM
> -	ptr = __alloc_memory_core_early(pgdat->node_id, size, align,
> +	ptr = ___alloc_memory_core_early(pgdat, size, align,
> 					 goal, -1ULL);
>  #else
>  	ptr = alloc_arch_preferred_bootmem(pgdat->bdata, size, align, goal, 0);
> @@ -980,7 +1002,7 @@ void * __init __alloc_bootmem_low_node(p
>  		return kzalloc_node(size, GFP_NOWAIT, pgdat->node_id);
>
>  #ifdef CONFIG_NO_BOOTMEM
> -	return __alloc_memory_core_early(pgdat->node_id, size, align,
> +	return ___alloc_memory_core_early(pgdat, size, align,
> 				goal, ARCH_LOW_ADDRESS_LIMIT);
>  #else
>  	return ___alloc_bootmem_node(pgdat->bdata, size, align,
>

On my 256MB VM, which hit the problem that started this thread, the
"double check with updated goal in alloc_memory_core_early" patch
(above) boots without a panic.

My initial impression is that this fixes the reported problem. Note:
I have not tested to see if any other issues are introduced.

--
Greg

Yinghai Lu

Mar 5, 2010, 1:50:02 PM

please check

Yinghai Lu

Mar 5, 2010, 3:30:02 PM

It will list the free PFN ranges that will be used for slab...

Attached is a debug patch that prints them out when CONFIG_NO_BOOTMEM is not set.

YH

print_free_bootmem.patch

Yinghai Lu

Mar 5, 2010, 3:50:01 PM
if you don't want to drop
| bootmem: avoid DMA32 zone by default

(according to the printout, today's mainline tree actually does NOT need that patch ...)

please apply this one too.

[PATCH] x86/bootmem: introduce bootmem_default_goal

don't punish 64-bit systems with less than 4G of RAM:
they should use __pa(MAX_DMA_ADDRESS) on the first pass instead of as a fallback...

Signed-off-by: Yinghai Lu <yin...@kernel.org>

---
 arch/x86/kernel/setup.c |   13 +++++++++++++
 include/linux/bootmem.h |    3 ++-
 mm/bootmem.c            |    4 ++++
 3 files changed, 19 insertions(+), 1 deletion(-)

Index: linux-2.6/arch/x86/kernel/setup.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/setup.c
+++ linux-2.6/arch/x86/kernel/setup.c
@@ -686,6 +686,18 @@ static void __init trim_bios_range(void)
 	sanitize_e820_map(e820.map, ARRAY_SIZE(e820.map), &e820.nr_map);
 }
 
+#ifdef MAX_DMA32_PFN
+static void __init set_bootmem_default_goal(void)
+{
+	if (max_pfn_mapped < MAX_DMA32_PFN)
+		bootmem_default_goal = __pa(MAX_DMA_ADDRESS);
+}
+#else
+static void __init set_bootmem_default_goal(void)
+{
+}
+#endif
+
 /*
  * Determine if we were loaded by an EFI loader. If so, then we have also been
  * passed the efi memmap, systab, etc., so we should use these data structures
@@ -931,6 +943,7 @@ void __init setup_arch(char **cmdline_p)
 		max_low_pfn = max_pfn;
 	}
 #endif
+	set_bootmem_default_goal();
 
 	/*
 	 * NOTE: On x86-32, only from this point on, fixmaps are ready for use.
Index: linux-2.6/include/linux/bootmem.h
===================================================================
--- linux-2.6.orig/include/linux/bootmem.h
+++ linux-2.6/include/linux/bootmem.h
@@ -104,7 +104,8 @@ extern void *__alloc_bootmem_low_node(pg
 				      unsigned long goal);
 
 #ifdef MAX_DMA32_PFN
-#define BOOTMEM_DEFAULT_GOAL	(MAX_DMA32_PFN << PAGE_SHIFT)
+extern unsigned long bootmem_default_goal;
+#define BOOTMEM_DEFAULT_GOAL	bootmem_default_goal
 #else
 #define BOOTMEM_DEFAULT_GOAL	__pa(MAX_DMA_ADDRESS)
 #endif


Index: linux-2.6/mm/bootmem.c
===================================================================
--- linux-2.6.orig/mm/bootmem.c
+++ linux-2.6/mm/bootmem.c

@@ -25,6 +25,10 @@ unsigned long max_low_pfn;
 unsigned long min_low_pfn;
 unsigned long max_pfn;
 
+#ifdef MAX_DMA32_PFN
+unsigned long bootmem_default_goal = (MAX_DMA32_PFN << PAGE_SHIFT);
+#endif
+
 #ifdef CONFIG_CRASH_DUMP
 /*
  * If we have booted due to a crash, max_pfn will be a very low value. We need

Johannes Weiner

Mar 5, 2010, 7:00:01 PM
Hello Yinghai,

Thanks for the patch, it seems to be correct.

However, I have a more generic question about it, regarding the future of the
early_res allocator.

Did you plan on keeping the bootmem API around for longer? My impression was
that emulating it is a temporary measure until all users are gone and bootmem
can finally be dropped.

But then this would require some sort of handling of 'user does not need DMA[32]
memory, so avoid it' and 'user can only use DMA[32] memory' in the early_res
allocator as well.

I ask this specifically because you move this fix into the bootmem compatibility
code while there is not yet a way to tell early_res the same thing, so switching
a user that _needs_ to specify this requirement from bootmem to early_res is not
yet possible, is it?

I think it would make sense to move the parameter check before doing the
allocation. Then you save the second call.
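
Roughly like this, as an untested sketch based on the wrapper from the patch
earlier in the thread:

	static void * __init ___alloc_memory_core_early(pg_data_t *pgdat, u64 size,
							u64 align, u64 goal, u64 limit)
	{
		unsigned long end_pfn = pgdat->node_start_pfn +
					pgdat->node_spanned_pages;

		/* lower an unreachable goal up front instead of failing first */
		if ((end_pfn << PAGE_SHIFT) < (goal + size))
			goal = pgdat->node_start_pfn << PAGE_SHIFT;

		return __alloc_memory_core_early(pgdat->node_id, size, align,
						 goal, limit);
	}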

And a second nitpick: naming the inner function __foo and the outer one ___foo seems
confusing to me. Could you maybe rename the wrapper? bootmem_compat_alloc_early() or
something like that?

Thanks,
Hannes

Yinghai Lu

Mar 5, 2010, 9:00:02 PM

That depends on each arch maintainer.

Users can compare the two on x86 to check...

The next step will be to generalize fw_mem_map and combine it with lmb.

>
> But then this would require some sort of handling of 'user does not need DMA[32]
> memory, so avoid it' and 'user can only use DMA[32] memory' in the early_res
> allocator as well.
>
> I ask this specifically because you move this fix into the bootmem compatibility
> code while there is not yet a way to tell early_res the same thing, so switching
> a user that _needs_ to specify this requirement from bootmem to early_res is not
> yet possible, is it?

Just let the caller set the goal.

I am trying to avoid the second call;
please check the other patch, "introduce bootmem_default_goal: don't punish 64-bit systems without 4G of RAM".

>
> And a second nitpick: naming the inner function __foo and the outer one ___foo seems
> confusing to me. Could you maybe rename the wrapper? bootmem_compat_alloc_early() or
> something like that?

ok.

Thanks

Yinghai

Johannes Weiner

Mar 5, 2010, 9:30:02 PM

Humm, now that is a bit disappointing. Because it means we will never get rid
of bootmem as long as it works for the other architectures. And your changeset
just added ~900 lines of code, some of it being a rather ugly compatibility
layer in bootmem that I hoped could go away again sooner than later.

I do not know what the upsides for x86 are from no longer using bootmem but it
would suck from a code maintenance point of view to get stuck half way through
this transition and have now TWO implementations of the bootmem interface we
would like to get rid of.

> The next step will be to generalize fw_mem_map and combine it with lmb.
>
> >
> > But then this would require some sort of handling of 'user does not need DMA[32]
> > memory, so avoid it' and 'user can only use DMA[32] memory' in the early_res
> > allocator as well.
> >
> > I ask this specifically because you move this fix into the bootmem compatibility
> > code while there is not yet a way to tell early_res the same thing, so switching
> > a user that _needs_ to specify this requirement from bootmem to early_res is not
> > yet possible, is it?
>
> Just let the caller set the goal.

That means every caller must know where the DMA zone ends and whether it
is non-empty, and must open-code the fallback to the DMA zone when the
non-DMA zone is exhausted?
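
Spelled out, that open-coding would look something like this at every call
site (illustrative only; nid, size and align are whatever the caller has at
hand):

	/* try above the DMA32 boundary first... */
	ptr = __alloc_memory_core_early(nid, size, align,
					MAX_DMA32_PFN << PAGE_SHIFT, -1ULL);
	/* ...then fall back below it by hand when that range is exhausted */
	if (!ptr)
		ptr = __alloc_memory_core_early(nid, size, align, 0, -1ULL);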

Yinghai Lu

Mar 5, 2010, 9:40:02 PM

Some data; others can do more comparisons on x86 systems...

I didn't plan to post this data until you brought it up...

for my 1T system

nobootmem:
text data bss dec hex filename
19185736 4148404 12170736 35504876 21dc2ec vmlinux.nobootmem
Memory: 1058662820k/1075838976k available (11388k kernel code, 2106480k absent, 15069676k reserved, 8589k data, 2744k init)
[ 220.947157] calling ip_auto_config+0x0/0x24d @ 1


bootmem:
text data bss dec hex filename
19188441 4153956 12170736 35513133 21de32d vmlinux.bootmem
Memory: 1058662796k/1075838976k available (11388k kernel code, 2106480k absent, 15069700k reserved, 8589k data, 2752k init)
[ 236.765364] calling ip_auto_config+0x0/0x24d @ 1

YH

Yinghai Lu

Mar 6, 2010, 12:50:01 AM
On 03/05/2010 12:38 PM, Yinghai Lu wrote:
> if you don't want to drop
> | bootmem: avoid DMA32 zone by default
>
> (according to the printout, today's mainline tree actually does NOT need that patch ...)
>
> please apply this one too.
>
> [PATCH] x86/bootmem: introduce bootmem_default_goal
>
> don't punish 64-bit systems with less than 4G of RAM:
> they should use __pa(MAX_DMA_ADDRESS) on the first pass instead of as a fallback...

Andrew,

please drop Johannes' patch: bootmem: avoid DMA32 zone by default

so you don't need to apply two fix patches from me:

[PATCH] early_res: double check with updated goal in alloc_memory_core_early
[PATCH] x86/bootmem: introduce bootmem_default_goal

Moving all bootmem allocations above 4g makes system performance worse...

Thanks

Yinghai Lu

Andrew Morton

Mar 6, 2010, 7:30:02 PM
On Fri, 05 Mar 2010 21:44:38 -0800 Yinghai Lu <yin...@kernel.org> wrote:

> On 03/05/2010 12:38 PM, Yinghai Lu wrote:
> > if you don't want to drop
> > | bootmem: avoid DMA32 zone by default
> >
> > (according to the printout, today's mainline tree actually does NOT need that patch ...)
> >
> > please apply this one too.
> >
> > [PATCH] x86/bootmem: introduce bootmem_default_goal
> >
> > don't punish 64-bit systems with less than 4G of RAM:
> > they should use __pa(MAX_DMA_ADDRESS) on the first pass instead of as a fallback...
>
> andrew,
>
> please drop Johannes' patch: bootmem: avoid DMA32 zone by default

I'd rather not. That patch is said to fix a runtime problem which is
present in 2.6.33 and hence we planned on backporting it into 2.6.33.x.

I don't have a clue what your patches do. Can you tell us?

Earlier, Johannes wrote

: Humm, now that is a bit disappointing. Because it means we will never
: get rid of bootmem as long as it works for the other architectures.
: And your changeset just added ~900 lines of code, some of it being a
: rather ugly compatibility layer in bootmem that I hoped could go away
: again sooner than later.
:
: I do not know what the upsides for x86 are from no longer using bootmem
: but it would suck from a code maintenance point of view to get stuck
: half way through this transition and have now TWO implementations of
: the bootmem interface we would like to get rid of.

Which is a pretty good-sounding argument. Perhaps we should be
dropping your patches.

What patches _are_ these x86 bootmem changes, anyway? Please identify
them so people can take a look and see what they do.

Yinghai Lu

Mar 6, 2010, 7:50:01 PM
On 03/06/2010 04:22 PM, Andrew Morton wrote:
> On Fri, 05 Mar 2010 21:44:38 -0800 Yinghai Lu <yin...@kernel.org> wrote:
>
>> On 03/05/2010 12:38 PM, Yinghai Lu wrote:
>>> if you don't want to drop
>>> | bootmem: avoid DMA32 zone by default
>>>
>>> (according to the printout, today's mainline tree actually does NOT need that patch ...)
>>>
>>> please apply this one too.
>>>
>>> [PATCH] x86/bootmem: introduce bootmem_default_goal
>>>
>>> don't punish 64-bit systems with less than 4G of RAM:
>>> they should use __pa(MAX_DMA_ADDRESS) on the first pass instead of as a fallback...
>>
>> andrew,
>>
>> please drop Johannes' patch: bootmem: avoid DMA32 zone by default
>
> I'd rather not. That patch is said to fix a runtime problem which is
> present in 2.6.33 and hence we planned on backporting it into 2.6.33.x.

That patch takes my box's boot time from 215s to 265s.

There should be a better way to fix the problem:
just put the mem_map (the big chunk) high,
instead of putting everything above 4g.

something like
static void * __init_refok __earlyonly_bootmem_alloc(int node,
				unsigned long size,
				unsigned long align,
				unsigned long goal)
{
	return __alloc_bootmem_node_high(NODE_DATA(node), size, align, goal);
}

void * __init __alloc_bootmem_node_high(pg_data_t *pgdat, unsigned long size,
				unsigned long align, unsigned long goal)
{
#ifdef MAX_DMA32_PFN
	unsigned long end_pfn;

	if (WARN_ON_ONCE(slab_is_available()))
		return kzalloc_node(size, GFP_NOWAIT, pgdat->node_id);

	/* update goal according to ...MAX_DMA32_PFN */
	end_pfn = pgdat->node_start_pfn + pgdat->node_spanned_pages;

	if (end_pfn > MAX_DMA32_PFN + (128 >> (20 - PAGE_SHIFT)) &&
	    (goal >> PAGE_SHIFT) < MAX_DMA32_PFN) {
		void *ptr;
		unsigned long new_goal;

		new_goal = MAX_DMA32_PFN << PAGE_SHIFT;
#ifdef CONFIG_NO_BOOTMEM
		ptr = __alloc_memory_core_early(pgdat->node_id, size, align,
						new_goal, -1ULL);
#else
		ptr = alloc_bootmem_core(pgdat->bdata, size, align,
					 new_goal, 0);
#endif
		if (ptr)
			return ptr;
	}
#endif

	return __alloc_bootmem_node(pgdat, size, align, goal);
}


>
> I don't have a clue what your patches do. Can you tell us?

They make x86 stop using bootmem and use early_res instead.

You are on the To list...

please check
http://lkml.org/lkml/2010/2/10/39


>
> Earlier, Johannes wrote
>
> : Humm, now that is a bit disappointing. Because it means we will never
> : get rid of bootmem as long as it works for the other architectures.
> : And your changeset just added ~900 lines of code, some of it being a
> : rather ugly compatibility layer in bootmem that I hoped could go away
> : again sooner than later.
> :
> : I do not know what the upsides for x86 are from no longer using bootmem
> : but it would suck from a code maintenance point of view to get stuck
> : half way through this transition and have now TWO implementations of
> : the bootmem interface we would like to get rid of.
>
> Which is a pretty good-sounding argument. Perhaps we should be
> dropping your patches.
>
> What patches _are_ these x86 bootmem changes, anyway? Please identify
> them so people can take a look and see what they do.

http://lkml.org/lkml/2010/2/10/39

and you, Linus, Ingo, hpa, and tglx are on the To list.

Yinghai

Yinghai Lu

Mar 6, 2010, 8:00:02 PM

Jiri, can you send out your bootlog and .config?

Paul Mackerras

Mar 6, 2010, 8:10:01 PM
On Sat, Mar 06, 2010 at 04:22:34PM -0800, Andrew Morton wrote:
> Earlier, Johannes wrote
>
> : Humm, now that is a bit disappointing. Because it means we will never
> : get rid of bootmem as long as it works for the other architectures.
> : And your changeset just added ~900 lines of code, some of it being a
> : rather ugly compatibility layer in bootmem that I hoped could go away
> : again sooner than later.

Whoa! Who's proposing to get rid of bootmem, and why?

Paul.

Yinghai Lu

Mar 6, 2010, 8:20:03 PM
On 03/05/2010 02:26 AM, Jiri Slaby wrote:
> On 03/05/2010 10:04 AM, Yinghai Lu wrote:
>> According to the context in
>> http://patchwork.kernel.org/patch/73893/
>>
>> Jiri,
>> please check whether the current Linus tree still has the problem of mem_map using that much low memory.
>
> Hi!
>
> Sorry, I don't have direct access to the machine. I might try to ask the
> owners to do so.
>
>> On my 1024g system the first node has 128G of RAM, and [2g, 4g) is an MMIO range.
>
> So where does your mem_map get allocated (I suppose you're running the flat model)?

what kernel version? 2.6.27?

x86 64-bit now only supports SPARSEMEM.

Yinghai

Stephen Rothwell

Mar 6, 2010, 8:50:02 PM
Hi Paul,

On Sun, 7 Mar 2010 12:03:27 +1100 Paul Mackerras <pau...@samba.org> wrote:
>
> On Sat, Mar 06, 2010 at 04:22:34PM -0800, Andrew Morton wrote:
> > Earlier, Johannes wrote
> >
> > : Humm, now that is a bit disappointing. Because it means we will never
> > : get rid of bootmem as long as it works for the other architectures.
> > : And your changeset just added ~900 lines of code, some of it being a
> > : rather ugly compatibility layer in bootmem that I hoped could go away
> > : again sooner than later.
>
> Whoa! Who's proposing to get rid of bootmem, and why?

I assume that is the point of the "early_res" work already in Linus' tree
starting from commit 27811d8cabe56e0c3622251b049086f49face4ff ("x86: Move
range related operation to one file").

--
Cheers,
Stephen Rothwell s...@canb.auug.org.au
http://www.canb.auug.org.au/~sfr/

Yinghai Lu

Mar 6, 2010, 9:20:01 PM

We need to put the mem_map high when virtual memmap (vmemmap) is not used.

Before this patch, the free mem PFN ranges on the first node were:
[ 0.000000] 19 - 1f
[ 0.000000] 28 40 - 80 95
[ 0.000000] 702 740 - 1000 1000
[ 0.000000] 347c - 347e
[ 0.000000] 34e7 3500 - 3b80 3b8b
[ 0.000000] 73b8b 73bc0 - 73c00 73c00
[ 0.000000] 73ddd - 73e00
[ 0.000000] 73fdd - 74000
[ 0.000000] 741dd - 74200
[ 0.000000] 743dd - 74400
[ 0.000000] 745dd - 74600
[ 0.000000] 747dd - 74800
[ 0.000000] 749dd - 74a00
[ 0.000000] 74bdd - 74c00
[ 0.000000] 74ddd - 74e00
[ 0.000000] 74fdd - 75000
[ 0.000000] 751dd - 75200
[ 0.000000] 753dd - 75400
[ 0.000000] 755dd - 75600
[ 0.000000] 757dd - 75800
[ 0.000000] 759dd - 75a00
[ 0.000000] 79bdd 79c00 - 7d540 7d550
[ 0.000000] 7f745 - 7f750
[ 0.000000] 10000b 100040 - 2080000 2080000
so the only major free block under 4g is 79c00 - 7d540...

After this patch, we will get:
[ 0.000000] 19 - 1f
[ 0.000000] 28 40 - 80 95
[ 0.000000] 702 740 - 1000 1000
[ 0.000000] 347c - 347e
[ 0.000000] 34e7 3500 - 3600 3600
[ 0.000000] 37dd - 3800
[ 0.000000] 39dd - 3a00
[ 0.000000] 3bdd - 3c00
[ 0.000000] 3ddd - 3e00
[ 0.000000] 3fdd - 4000
[ 0.000000] 41dd - 4200
[ 0.000000] 43dd - 4400
[ 0.000000] 45dd - 4600
[ 0.000000] 47dd - 4800
[ 0.000000] 49dd - 4a00
[ 0.000000] 4bdd - 4c00
[ 0.000000] 4ddd - 4e00
[ 0.000000] 4fdd - 5000
[ 0.000000] 51dd - 5200
[ 0.000000] 53dd - 5400
[ 0.000000] 95dd 9600 - 7d540 7d550
[ 0.000000] 7f745 - 7f750
[ 0.000000] 17000b 170040 - 2080000 2080000
so we will have 9600 - 7d540 as the major free block...

The sparse-vmemmap path already uses __alloc_bootmem_node_high().

Signed-off-by: Yinghai Lu <yin...@kernel.org>

---
 mm/sparse.c |    9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

Index: linux-2.6/mm/sparse.c
===================================================================
--- linux-2.6.orig/mm/sparse.c
+++ linux-2.6/mm/sparse.c
@@ -381,13 +381,15 @@ static void __init sparse_early_usemaps_
 struct page __init *sparse_mem_map_populate(unsigned long pnum, int nid)
 {
 	struct page *map;
+	unsigned long size;
 
 	map = alloc_remap(nid, sizeof(struct page) * PAGES_PER_SECTION);
 	if (map)
 		return map;
 
-	map = alloc_bootmem_pages_node(NODE_DATA(nid),
-	       PAGE_ALIGN(sizeof(struct page) * PAGES_PER_SECTION));
+	size = PAGE_ALIGN(sizeof(struct page) * PAGES_PER_SECTION);
+	map = __alloc_bootmem_node_high(NODE_DATA(nid), size,
+					 PAGE_SIZE, __pa(MAX_DMA_ADDRESS));
 	return map;
 }
 void __init sparse_mem_maps_populate_node(struct page **map_map,
@@ -411,7 +413,8 @@ void __init sparse_mem_maps_populate_nod
 	}
 
 	size = PAGE_ALIGN(size);
-	map = alloc_bootmem_pages_node(NODE_DATA(nodeid), size * map_count);
+	map = __alloc_bootmem_node_high(NODE_DATA(nodeid), size * map_count,
+					 PAGE_SIZE, __pa(MAX_DMA_ADDRESS));
 	if (map) {
 		for (pnum = pnum_begin; pnum < pnum_end; pnum++) {
 			if (!present_section_nr(pnum))

Russell King

Mar 7, 2010, 4:20:02 AM
On Sun, Mar 07, 2010 at 12:03:27PM +1100, Paul Mackerras wrote:
> On Sat, Mar 06, 2010 at 04:22:34PM -0800, Andrew Morton wrote:
> > Earlier, Johannes wrote
> >
> > : Humm, now that is a bit disappointing. Because it means we will never
> > : get rid of bootmem as long as it works for the other architectures.
> > : And your changeset just added ~900 lines of code, some of it being a
> > : rather ugly compatibility layer in bootmem that I hoped could go away
> > : again sooner than later.
>
> Whoa! Who's proposing to get rid of bootmem, and why?

It would be nice if this stuff was copied to linux-arch since it
impacts all architectures.

--
Russell King
Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/
maintainer of:

Jiri Slaby

Mar 11, 2010, 6:00:02 AM
On 03/07/2010 02:17 AM, Yinghai Lu wrote:
> On 03/05/2010 02:26 AM, Jiri Slaby wrote:
>> On 03/05/2010 10:04 AM, Yinghai Lu wrote:
>>> According to the context in
>>> http://patchwork.kernel.org/patch/73893/
>>>
>>> Jiri,
>>> please check whether the current Linus tree still has the problem of mem_map using that much low memory.
>>
>> Hi!
>>
>> Sorry, I don't have direct access to the machine. I might try to ask the
>> owners to do so.
>>
>>> On my 1024g system the first node has 128G of RAM, and [2g, 4g) is an MMIO range.
>>
>> So where does your mem_map get allocated (I suppose you're running the flat model)?
>
> what kernel version? 2.6.27?

Hi, yes, it is 2.6.27.

> x86 64-bit now only supports SPARSEMEM.


--
js

Yinghai Lu

Mar 11, 2010, 3:20:03 PM
On 03/11/2010 02:54 AM, Jiri Slaby wrote:
> On 03/07/2010 02:17 AM, Yinghai Lu wrote:
>> On 03/05/2010 02:26 AM, Jiri Slaby wrote:
>>> On 03/05/2010 10:04 AM, Yinghai Lu wrote:
>>>> According to the context in
>>>> http://patchwork.kernel.org/patch/73893/
>>>>
>>>> Jiri,
>>>> please check whether the current Linus tree still has the problem of
>>>> mem_map using that much low memory.
>>>
>>> Hi!
>>>
>>> Sorry, I don't have direct access to the machine. I might try to ask the
>>> owners to do so.
>>>
>>>> On my 1024g system the first node has 128G of RAM, and [2g, 4g) is an MMIO range.
>>>
>>> So where does your mem_map get allocated (I suppose you're running the
>>> flat model)?
>>
>> what kernel version? 2.6.27?
>
> Hi, yes, it is 2.6.27.

SLES 11?

Yinghai

Yinghai Lu

Mar 11, 2010, 4:50:04 PM
On 03/11/2010 01:40 PM, Jiri Slaby wrote:

> On 03/11/2010 09:12 PM, Yinghai Lu wrote:
>> On 03/11/2010 02:54 AM, Jiri Slaby wrote:
>>> Hi, yes, it is 2.6.27.
>>
>> SLES 11?
>
> Sorry, I wrote that in haste. It is SLES 10 after all, which means it is
> 2.6.16, not 2.6.27. Hence no sparsemem whatsoever. With SLES 11 it should
> be OK; we are using flatmem only for i386.
>
> Anyway, it should be no issue now, as flatmem currently (as of 2.6.25)
> depends on i386.
>
> On the other hand, I still considered the patch applicable to
> contemporary kernels, since weird BIOS e820 maps and huge (and sparse)
> bootmem allocations/reservations (memory cgroups, initrd) might make
> code that requires a lot of memory below 4g (swiotlb) fail.
>
> Anyway, in the current kernel, the particular issue I was referring to
> *is not reproducible*.

The point is: we should only put the mem_map high; that is the big chunk...
Other users should be OK, so leave them alone.

YH

Jiri Slaby

Mar 11, 2010, 4:50:03 PM
On 03/11/2010 09:12 PM, Yinghai Lu wrote:
> On 03/11/2010 02:54 AM, Jiri Slaby wrote:
>> Hi, yes, it is 2.6.27.
>
> SLES 11?

Sorry, I wrote that in haste. It is SLES 10 after all, which means it is
2.6.16, not 2.6.27. Hence no sparsemem whatsoever. With SLES 11 it should
be OK; we are using flatmem only for i386.

Anyway, it should be no issue now, as flatmem currently (as of 2.6.25)
depends on i386.

On the other hand, I still considered the patch applicable to
contemporary kernels, since weird BIOS e820 maps and huge (and sparse)
bootmem allocations/reservations (memory cgroups, initrd) might make
code that requires a lot of memory below 4g (swiotlb) fail.

Anyway, in the current kernel, the particular issue I was referring to
*is not reproducible*.

thanks,
--
js
