Add memory hotremove config option to x86_64
Memory hotremove functionality can currently be configured into
the ia64, powerpc, and s390 kernels. This patch makes it possible
to configure the memory hotremove functionality into the x86_64
kernel as well.
Signed-off-by: Gary Hade <gary...@us.ibm.com>
---
arch/x86/Kconfig | 3 +++
arch/x86/mm/init_64.c | 18 ++++++++++++++++++
2 files changed, 21 insertions(+)
Index: linux-2.6.27-rc5/arch/x86/Kconfig
===================================================================
--- linux-2.6.27-rc5.orig/arch/x86/Kconfig 2008-09-03 13:33:59.000000000 -0700
+++ linux-2.6.27-rc5/arch/x86/Kconfig 2008-09-03 13:34:55.000000000 -0700
@@ -1384,6 +1384,9 @@
def_bool y
depends on X86_64 || (X86_32 && HIGHMEM)
+config ARCH_ENABLE_MEMORY_HOTREMOVE
+ def_bool y
+
config HAVE_ARCH_EARLY_PFN_TO_NID
def_bool X86_64
depends on NUMA
Index: linux-2.6.27-rc5/arch/x86/mm/init_64.c
===================================================================
--- linux-2.6.27-rc5.orig/arch/x86/mm/init_64.c 2008-09-03 13:34:08.000000000 -0700
+++ linux-2.6.27-rc5/arch/x86/mm/init_64.c 2008-09-03 13:34:55.000000000 -0700
@@ -740,6 +740,24 @@
EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid);
#endif
+#ifdef CONFIG_MEMORY_HOTREMOVE
+int remove_memory(u64 start, u64 size)
+{
+	unsigned long start_pfn, end_pfn;
+	unsigned long timeout = 120 * HZ;
+	int ret;
+	start_pfn = start >> PAGE_SHIFT;
+	end_pfn = start_pfn + (size >> PAGE_SHIFT);
+	ret = offline_pages(start_pfn, end_pfn, timeout);
+	if (ret)
+		goto out;
+	/* Arch-specific calls go here */
+out:
+	return ret;
+}
+EXPORT_SYMBOL_GPL(remove_memory);
+#endif /* CONFIG_MEMORY_HOTREMOVE */
+
#endif /* CONFIG_MEMORY_HOTPLUG */
/*
> Add memory hotremove config option to x86_64
>
> Memory hotremove functionality can currently be configured into the
> ia64, powerpc, and s390 kernels. This patch makes it possible to
> configure the memory hotremove functionality into the x86_64 kernel as
> well.
hm, why is it for 64-bit only?
> +++ linux-2.6.27-rc5/arch/x86/Kconfig 2008-09-03 13:34:55.000000000 -0700
> @@ -1384,6 +1384,9 @@
> def_bool y
> depends on X86_64 || (X86_32 && HIGHMEM)
>
> +config ARCH_ENABLE_MEMORY_HOTREMOVE
> + def_bool y
so this will break the build on 32-bit, if CONFIG_MEMORY_HOTREMOVE=y?
mm/memory_hotplug.c assumes that remove_memory() is provided by the
architecture.
> +#ifdef CONFIG_MEMORY_HOTREMOVE
> +int remove_memory(u64 start, u64 size)
> +{
> + unsigned long start_pfn, end_pfn;
> + unsigned long timeout = 120 * HZ;
> + int ret;
> + start_pfn = start >> PAGE_SHIFT;
> + end_pfn = start_pfn + (size >> PAGE_SHIFT);
> + ret = offline_pages(start_pfn, end_pfn, timeout);
> + if (ret)
> + goto out;
> + /* Arch-specific calls go here */
> +out:
> + return ret;
> +}
> +EXPORT_SYMBOL_GPL(remove_memory);
> +#endif /* CONFIG_MEMORY_HOTREMOVE */
hm, nothing appears to be arch-specific about this trivial wrapper
around offline_pages().
Shouldn't this be moved to the CONFIG_MEMORY_HOTREMOVE portion of
mm/memory_hotplug.c instead, as a weak function? That way architectures
only have to enable ARCH_ENABLE_MEMORY_HOTREMOVE - and architectures
with different/special needs can override it.
Ingo
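
[Editorial sketch] The weak-function idea suggested above can be illustrated
in user space. This is only a sketch under stated assumptions, not the kernel
implementation: it assumes GCC or Clang (for __attribute__((weak))), a 4 KiB
page size, and the names generic_remove_memory(), offline_pages_stub(), and
SKETCH_PAGE_SHIFT are hypothetical stand-ins invented for illustration.

```c
/* Assumed 4 KiB pages for illustration; the kernel uses PAGE_SHIFT. */
#define SKETCH_PAGE_SHIFT 12

/* Hypothetical stand-in for offline_pages(): succeeds for any non-empty range. */
static int offline_pages_stub(unsigned long start_pfn, unsigned long end_pfn)
{
	return end_pfn > start_pfn ? 0 : -1;
}

/*
 * Weak default. If another object file supplies a strong (non-weak)
 * definition of generic_remove_memory(), the linker uses that one instead.
 */
__attribute__((weak))
int generic_remove_memory(unsigned long long start, unsigned long long size)
{
	unsigned long start_pfn = (unsigned long)(start >> SKETCH_PAGE_SHIFT);
	unsigned long end_pfn = start_pfn + (unsigned long)(size >> SKETCH_PAGE_SHIFT);

	return offline_pages_stub(start_pfn, end_pfn);
}
```

An architecture with special needs would define its own non-weak function of
the same name in a separate file; every other architecture would get the
default for free.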
You forgot to describe how you tested it. Does it actually work?
And why do you want to do it? What's the use case?
The general understanding was that it doesn't work very well on a real
machine at least because it cannot be controlled how that memory maps
to real pluggable hardware (and you cannot completely empty a node at runtime)
and a Hypervisor would likely use different interfaces anyways.
-Andi
Yes. All the archs (ppc64, ia64, s390, x86_64) have the exact same
function. No architecture has needed special handling so far (initial
versions of ppc64 needed extra handling, but I moved the code
to a different place).
We can make this generic and kill all arch-specific ones.
Initially, we didn't know if any arch needs special handling -
so ended up having private functions for each arch.
I think it's time to merge them all.
>
> Shouldn't this be moved to the CONFIG_MEMORY_HOTREMOVE portion of
> mm/memory_hotplug.c instead, as a weak function? That way architectures
> only have to enable ARCH_ENABLE_MEMORY_HOTREMOVE - and architectures
> with different/special needs can override it.
Yes. We should do that. I will send out a patch.
Thanks,
Badari
ok - if all architectures have the same function then please make it a
regular function not a weak one, and remove all the duplications.
Ingo
I will let Gary answer these :)
> The general understanding was that it doesn't work very well on a real
> machine at least because it cannot be controlled how that memory maps
> to real pluggable hardware (and you cannot completely empty a node at runtime)
> and a Hypervisor would likely use different interfaces anyways.
At this time we are interested in node removal (on x86_64).
It doesn't really work well at this time - due to some of the structures
(pgdat etc.) being striped across all nodes. There is no easy way to
relocate them. Yasunori Goto is working on patches to address some of
these issues.
But we are considering adding support to restrict/skip bootmem
allocations on selected nodes. That way, we should be able to do
node remove.
(BTW, on ppc64 this works fine - since we are interested mostly in
removing *some* sections of memory to give it back to hypervisor -
not entire node removal).
Thanks,
Badari
That's a quite euphemistic way to put it.
> due to some of the structures
That means you can never put any slab data on specific nodes.
And all the kernel subsystems on that node will not ever get local
memory. How are you going to solve that? And if you disallow
kernel allocations in such large memory areas you get many of the highmem
issues that plagued 32bit back in the 64bit kernel.
There are lots of other issues. It's quite questionable if this
whole exercise makes sense at all.
> (BTW, on ppc64 this works fine - since we are interested mostly in
> removing *some* sections of memory to give it back to hypervisor -
> not entire node removal).
Ok for hypervisors you can do it reasonably easy on x86 too, but it's likely
that some hypercall interface is better than going through
sysfs.
-Andi
So far, I have tested it on a 2-node IBM x460, 2-node IBM x3950, and
a 4-node IBM x3950 M2 and have been able to successfully offline and
re-online all memory sections marked as removable multiple times with
no apparent problems.
By directing the change to -mm our hope is that others will try it
on their systems and help us shake out any issues that they may find.
> And why do you want to do it? What's the use case?
A baby step towards eventual total node removal.
>
> The general understanding was that it doesn't work very well on a real
> machine at least because it cannot be controlled how that memory maps
> to real pluggable hardware (and you cannot completely empty a node at runtime)
> and a Hypervisor would likely use different interfaces anyways.
The inability to offline all non-primary node memory sections
certainly needs to be addressed. The pgdat removal work that
Yasunori Goto has started will hopefully continue and help resolve
this issue. We have only just started thinking about issues related
to resources other than CPUs and memory that will need to be released
in preparation for node removal (e.g. memory and i/o resources
assigned to PCI devices on a node targeted for removal). Much of
this is new territory for us so any suggestions that you and others
can offer will be much appreciated.
Thanks for asking.
Gary
--
Gary Hade
System x Enablement
IBM Linux Technology Center
503-578-4503 IBM T/L: 775-4503
gary...@us.ibm.com
http://www.ibm.com/linux/ltc
You make it sound like it's just some minor technical hurdle
that needs to be addressed. But from all analysis of these issues
I've seen so far it's extremely hard and all possible solutions
have serious issues. So before doing some baby steps there
should be at least some general idea how this thing is supposed
to work in the end.
> We have only just started thinking about issues related
> to resources other than CPUs and memory that will need to be released
> in preparation for node removal (e.g. memory and i/o resources
> assigned to PCI devices on a node targeted for removal).
That's the easy stuff. The hard parts are all the kernel objects
that you cannot move.
-Andi
Sorry, that was not my intent.
> But from all analysis of these issues
> I've seen so far it's extremely hard and all possible solutions
> have serious issues. So before doing some baby steps there
> should be at least some general idea how this thing is supposed
> to work in the end.
I am not sure if I understand why you appear to be opposed to
enabling the hotremove function before all the issues related
to an eventual goal of being able to free all memory on a node
are addressed. Even in the absence of solutions for these issues
it seems like there could still be other possible benefits such
as the ability to selectively expand and shrink available memory
for testing or debugging purposes. I believe it would also be
helpful to those working on or testing possible solutions for
the removal issues.
Gary
You are absolutely correct. There is no easy solution - one has
to lose performance in order to support node removal, along with
some old x86 issues :(
We were contemplating the idea of limiting node removal to a
select few nodes as a compromise - but it didn't sound right :(
>
> There are lots of other issues. It's quite questionable if this
> whole exercise makes sense at all.
Same issues exist with ia64 and x86_64 won't be any worse off.
Gary was trying to enable the functionality so that we can at least
test out offlining memory sections more easily (test page migration,
isolation code and hash out issues).
Another possible idea being considered (still a lot of unknowns)
is to make use of the offline memory section feature for power management
(*cough*).
Anyway, as you can see this patch doesn't add any code - just
enables config option for x86_64. (if you are worried about
code bloat).
> > (BTW, on ppc64 this works fine - since we are interested mostly in
> > removing *some* sections of memory to give it back to hypervisor -
> > not entire node removal).
>
> Ok for hypervisors you can do it reasonably easy on x86 too, but it's likely
> that some hypercall interface is better than going through
> sysfs.
sysfs interface already exists to offline sections of memory. (same
interface as online).
The proposed patch provides an easy way to find out which sections of
memory belong to which node. (could be useful on its own).
Thanks,
Badari
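
[Editorial sketch] The sysfs interface mentioned above exposes each memory
block under /sys/devices/system/memory/memoryN, and writing "offline" or
"online" to its state file requests the transition. A minimal user-space
sketch follows. The helper names memory_block_state_path() and
set_memory_block_state() are hypothetical, the block number is whatever the
running system exposes, and a real tool would need root and should check the
result more carefully than this does.

```c
#include <stdio.h>
#include <string.h>

/* Build the sysfs path for one memory block; returns 0 on success, -1 if buf is too small. */
int memory_block_state_path(char *buf, size_t len, unsigned int block)
{
	int n = snprintf(buf, len, "/sys/devices/system/memory/memory%u/state", block);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write "online" or "offline" to the block's state file; returns 0 on success, -1 on any error. */
int set_memory_block_state(unsigned int block, const char *state)
{
	char path[128];
	FILE *f;
	int ret = 0;

	if (memory_block_state_path(path, sizeof(path), block))
		return -1;
	f = fopen(path, "w");
	if (!f)
		return -1;
	if (fputs(state, f) == EOF)
		ret = -1;
	/* The kernel reports a refused transition when the buffered write is flushed. */
	if (fclose(f) == EOF)
		ret = -1;
	return ret;
}
```

For example, set_memory_block_state(32, "offline") followed by
set_memory_block_state(32, "online") would cycle block 32, assuming that
block exists and is removable on the machine at hand.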
I'm quite sceptical that it can be ever made to work in a useful
way for real hardware (as opposed to a hypervisor paravirtual setup
for which this interface is not the right way -- it should be done
in some specific driver instead)
And if it cannot be made to work then it will be a false promise
to the user. They will see it and think it will work, but it will
not.
This means I don't see a real use case for this feature.
-Andi
I don't think such a driver is almighty.
IIRC, the balloon driver can be a cause of fragmentation on a 24/7 system.
In addition, I have heard that memory hotplug would be useful for reducing
the power consumption of DIMMs.
I have to admit that memory hotplug has many issues, but I would like to
solve them step by step.
Thanks.
--
Yasunori Goto
Sure, the balloon driver can likely be improved too, it's just
that I don't think a balloon driver should call into the function
the original patch in the series hooked up.
>
> In addition, I have heard that memory hotplug would be useful for reducing
> the power consumption of DIMMs.
It's unclear that memory hotplug is the right model for DIMM power management.
The problem is that DIMMs are interleaved, so you again have to completely
free a quite large area. It's not much easier than node hotplug.
> I have to admit that memory hotplug has many issues, but I would like to
Let's call it "node" or "hardware" memory hot unplug, so that
no one confuses it with the easier VM based hot unplug or the really
easy hotadd.
> solve them step by step.
The question is if they are even solvable in a useful way.
I'm not sure it's that useful to start and then find out
that it doesn't work anyways.
-Andi
> I don't think such a driver is almighty. IIRC, the balloon driver can be
> a cause of fragmentation on a 24/7 system.
>
> In addition, I have heard that memory hotplug would be useful for
> reducing the power consumption of DIMMs.
>
> I have to admit that memory hotplug has many issues, but I would like
> to solve them step by step.
What would be nice is to insert the information both during bootup and
in /proc/meminfo and 'free' output that hot-removable memory segments
are not generic free memory, it's currently a limited resource that
might or might not be sufficient to serve a given workload.
Perhaps even exclude it from 'total' memory reported by meminfo - to be
on the safe side of user expectations. In terms of user-space memory it
is already generic swappable memory but in terms of kernel-space
allocations it is not.
As i said it earlier in the thread, i certainly have no objections from
the x86 maintenance side - nothing is worse than a generic kernel
feature only available on certain less frequently used platforms. Memory
hotplug has been available for some time in the MM and it's not really
causing any maintenance trouble at the moment and it is not enabled by
default either.
Having said that, i have my doubts about its generic utility (the power
saving aspects are likely not realizable - nobody really wants DIMMs to
just sit there unused and the cost of dynamic migration is just
horrendous) - but as long as it's opt-in there's no reason to limit the
availability of an in-kernel feature artificially.
Removing those limitations of kernel-space allocations should indeed be
done in baby steps - and whether it's worth turning such memory into
completely generic kernel memory is an open question.
But the fact that a piece of memory is not fully generic is no reason
not to allow users to create special, capability-limited RAM resources
like they can already do via hugetlbfs or ramfs, as long as the
capability limitations are advertised clearly.
Yes, memory hotplug has limitations we all understand, but still it's an
arguably useful feature in some circumstances. If we never give a
feature a chance to evolve on the main Linux platform that 90%+ of our
users use, it won't ever be truly useful.
Please send the new patches against -git or -tip and we can put them
into a separate standalone feature topic and can test it on various x86
boxes and send them towards linux-next if Andrew agrees with that
process too.
Btw., it would be nice if memory hotplug had a self-test that could be
activated from the .config and would run autonomously (a bit like
rcu-torture): it would mark say 10% of all RAM as hot-pluggable during
bootup and would periodically hot-plug and hot-unplug that memory, every
10 seconds or 30 seconds or so, transparently. That would also test the
x86 architecture's pagetable init code, the page migration code, etc.
(Disabled by default and dependent on DEBUG_KERNEL && EXPERIMENTAL.)
Ingo
Thanks,
-Kame
Most of the problems Goto wrote about mainly concern the placement of memmap,
pgdat, and zones. One example is that "when SPARSEMEM_VMEMMAP is enabled,
memmap is not removed even when memory is removed."
>As i said it earlier in the thread, i certainly have no objections from
>the x86 maintenance side - nothing is worse than a generic kernel
>feature only available on certain less frequently used platforms. Memory
>hotplug has been available for some time in the MM and it's not really
>causing any maintenance trouble at the moment and it is not enabled by
>default either.
>
>Having said that, i have my doubts about its generic utility (the power
>saving aspects are likely not realizable - nobody really wants DIMMs to
>just sit there unused and the cost of dynamic migration is just
>horrendous) - but as long as it's opt-in there's no reason to limit the
>availability of an in-kernel feature artificially.
Nobody? Maybe it is just a trade-off on the user side.
Even without DIMM hotplug or a DIMM power-save mode, is making a DIMM idle
of no use? I think memory consumes a lot of power when it is used.
Memory hotplug and ZONE_MOVABLE can make some memory idle.
(I'm sorry if my thinking is wrong.)
>
>Removing those limitations of kernel-space allocations should indeed be
>done in baby steps - and whether it's worth turning such memory into
>completely generic kernel memory is an open question.
>
I think generic kernel space memory hotplug will never be available.
>But the fact that a piece of memory is not fully generic is no reason
>not to allow users to create special, capability-limited RAM resources
>like they can already do via hugetlbfs or ramfs, as long as the
>capability limitations are advertised clearly.
>
Hmm, adding a feature like
- offline some memory at boot.
- online-memory-as-hugetlb mode
is useful for generic pc users ?
Regards,
-Kame
> > Removing those limitations of kernel-space allocations should indeed
> > be done in baby steps - and whether it's worth turning such memory
> > into completely generic kernel memory is an open question.
>
> I think generic kernel space memory hotplug will never be available.
yeah, most likely. (It's possible technically even on a native kernel -
just very expensive to various aspects of the kernel.)
> > But the fact that a piece of memory is not fully generic is no
> > reason not to allow users to create special, capability-limited RAM
> > resources like they can already do via hugetlbfs or ramfs, as long
> > as the capability limitations are advertised clearly.
>
> Hmm, adding a feature like
> - offline some memory at boot.
> - online-memory-as-hugetlb mode
>
> is useful for generic pc users ?
yeah - it's actually the way how hugetlb should be done. Plus expand
gbpages to hugetlbfs and hotplug memory on Barcelona CPUs and you can do
user-space apps that can run for a long time without any TLB misses.
_That_ might make sense to explore in practice. (i'm not holding my
breath though, TLB misses are _fast_ on the best x86 CPUs.)
But we won't be able to make such experiments without having the
capability on x86. So i'd like to break the catch-22 by accepting all
this into arch/x86, it certainly is simple and makes some sense, it's
just that i'm not that convinced about it personally at the moment.
So feel free to turn it all into a killer feature (make hugetlb backed
memory transparent to user-space, etc. etc.) that high-performance
computing users strive for and all that will change. Please send the
reshaped patches so we can move past the 'what if' discussion phase ;-)
Ingo
You use non-linear mappings for the kernel, so that kernel data is
not tied to a specific physical address. AFAIK, that is the only way
to really do it completely (like the fragmentation problem).
Of course, I don't think that would be a good idea to do that in the
foreseeable future.
Even with that there are lots of issues, like keeping track of
DMAs or handling executing kernel code.
>
> Of course, I don't think that would be a good idea to do that in the
> foreseeable future.
Agreed.
-Andi
Right, but the "high level" software solution is to have nonlinear
kernel mappings. Executing kernel code should not be so hard because
it could be handled just like executing user code (ie. the CPU that
is executing will subsequently fault and be blocked until the
relocation is complete).
DMAs aren't trivial at all, but I guess there could be say, a method
to submit and revoke areas of memory for DMA, and the submit would
block if the memory is currently being relocated underneath it (then
it would be able to find the new address).
Anyway, whatever the case, yeah I'm not trying to say it is trivial
at all. Even without thinking about DMA it would be costly.
> > Of course, I don't think that would be a good idea to do that in the
> > foreseeable future.
>
> Agreed.
Same as the "anti-frag" patches. We must not proceed with this kind of
thing on the justification that "in future we'll be able to unplug any
bit of memory". Because it is not just a matter of logical steps to
reach that point, but basically a fundamental rethink of how the kernel
memory mapping should work.
Other realistic justifications are OK, but if someone wants to unplug
everything, then please put effort into *first* making the kernel
mapping nonlinear, and then we can look at the complexity and
performance costs of that fundamental step.
First, blocking arbitrary code is hard. There are some code parts
which are not allowed to block arbitrarily. Machine check or NMI
handlers come to mind, but there are likely more.
Then that would be essentially a hypervisor or micro kernel approach.
e.g. Xen does that already kind of, but even there it would
be quite hard to do fully in a general way. And for hardware hotplug
only the fully general way is actually useful unfortunately.
-Andi
Sorry, by "block", I really mean spin I guess. I mean that the CPU will
be forced to stop executing due to the page fault during this sequence:
for prot RO:
    alloc new page
    memcpy(new, old)
    ptep_clear_flush(ptep) <--- from here
    set_pte(ptep, newpte) <--- until here
for prot RW, the window also would include the memcpy, however if that
adds too much latency for execute/reads, then it can be mapped RO first,
then memcpy, then flushed and switched.
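
[Editorial sketch] The read-only relocation sequence above can be modelled in
user space to make the ordering concrete. This is a toy model under loud
assumptions, not kernel code: struct sketch_pte and sketch_relocate_ro() are
hypothetical, a plain pointer stands in for a page table entry, and the two
pointer stores merely mark where ptep_clear_flush() and set_pte() would sit.
There is no TLB, no concurrency, and no fault handling here.

```c
#include <stdlib.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096

/* Toy "page table entry": NULL models a not-present mapping. */
struct sketch_pte {
	unsigned char *page;
};

/* Relocate a read-only page: copy first, then switch the mapping. */
int sketch_relocate_ro(struct sketch_pte *pte)
{
	unsigned char *old = pte->page;
	unsigned char *copy = malloc(SKETCH_PAGE_SIZE);   /* alloc new page */

	if (!copy)
		return -1;
	memcpy(copy, old, SKETCH_PAGE_SIZE);              /* memcpy(new, old) */

	/* Window starts: a real CPU touching the page would fault and wait. */
	pte->page = NULL;                                 /* ptep_clear_flush() */
	pte->page = copy;                                 /* set_pte() */
	/* Window ends. */

	free(old);                                        /* old frame can now go away */
	return 0;
}
```

Because the page is read-only, the copy can be made before the mapping is
torn down, so the window in which an access would stall covers only the two
stores. For a writable page the copy would have to fall inside the window,
or the page would first be remapped read-only, as the text above describes.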
> Then that would be essentially a hypervisor or micro kernel approach.
What would be? Blocking in interrupts? Or non-linear kernel mapping in
general? Nonlinear kernel mapping I don't think anyone disputes is the
only way to defragment (for unplug or large allocations) arbitrary
physical memory with any sort of guarantee. In the future if TLB costs
grow very much larger, I think this might be worth considering.
But until that becomes inevitable, I really don't want to hack the VM
with crap like transparent variable order mappings etc. but rather
"encourage" CPU manufacturers to have big fast TLBs :)
> e.g. Xen does that already kind of, but even there it would
> be quite hard to do fully in a general way. And for hardware hotplug
> only the fully general way is actually useful unfortunately.
Yeah I don't really get the hardware hotplug thing. For reliability or
anything it should all be done in hardware (eg. warm/hot spare memory
module). For power I guess there is some argument, but I would prefer
to wait the trends out longer before committing to something big: non
volatile ram replacement for dram for example might be achieved in
future.
But if anybody disagrees, they are sure free to implement non-linear
kernel mappings and physical defragmentation and shut me up with
real numbers!
It's hard for NMIs at least. They cannot execute faults.
In the end you would need to define a core kernel which
cannot be remapped and the rest which can and you end up
with even more micro kernel like mess.
> ptep_clear_flush(ptep) <--- from here
> set_pte(ptep, newpte) <--- until here
>
> for prot RW, the window also would include the memcpy, however if that
> adds too much latency for execute/reads, then it can be mapped RO first,
> then memcpy, then flushed and switched.
>
>
> > Then that would be essentially a hypervisor or micro kernel approach.
>
> What would be? Blocking in interrupts? Or non-linear kernel mapping in
Well in general someone remapping all the memory beyond you.
That's essentially a hypervisor in my book.
-Andi
Well, just for executing code (and reading RO data), then it shouldn't
matter at all actually if the CPU starts executing from the new page
or the old page, so long as there is a way to quiesce NMIs before freeing
the old page.
So the NMI can run, and read data, but it may have a problem with stores.
At least, some kind of redesign of NMI handlers might be required so that
they can make a note of the pending operation and try to do something
sane in that case. Or, there could be a small region of memory; a page or
two, which does not get migrated and NMIs can write to it. I don't think
you need to go so far as saying the entire kernel image must be non
movable just for NMIs.
> In the end you would need to define a core kernel which
> cannot be remapped and the rest which can and you end up
> with even more micro kernel like mess.
Are there any important NMIs that really can't fit with this?
> > ptep_clear_flush(ptep) <--- from here
> > set_pte(ptep, newpte) <--- until here
> >
> > for prot RW, the window also would include the memcpy, however if that
> > adds too much latency for execute/reads, then it can be mapped RO first,
> > then memcpy, then flushed and switched.
> >
> > > Then that would be essentially a hypervisor or micro kernel approach.
> >
> > What would be? Blocking in interrupts? Or non-linear kernel mapping in
>
> Well in general someone remapping all the memory beyond you.
> That's essentially a hypervisor in my book.
I don't see it. It is one of the things a hypervisor may do.
But anyway, call it what you will.
Signed-off-by: Badari Pulavarty <pba...@us.ibm.com>
arch/ia64/mm/init.c | 17 -----------------
arch/powerpc/mm/mem.c | 17 -----------------
arch/s390/mm/init.c | 11 -----------
mm/memory_hotplug.c | 10 ++++++++++
4 files changed, 10 insertions(+), 45 deletions(-)
Index: linux-2.6.27-rc5/arch/ia64/mm/init.c
===================================================================
--- linux-2.6.27-rc5.orig/arch/ia64/mm/init.c 2008-08-28 15:52:02.000000000 -0700
+++ linux-2.6.27-rc5/arch/ia64/mm/init.c 2008-09-08 12:38:59.000000000 -0700
@@ -701,23 +701,6 @@ int arch_add_memory(int nid, u64 start,
return ret;
}
-#ifdef CONFIG_MEMORY_HOTREMOVE
-int remove_memory(u64 start, u64 size)
-{
- unsigned long start_pfn, end_pfn;
- unsigned long timeout = 120 * HZ;
- int ret;
- start_pfn = start >> PAGE_SHIFT;
- end_pfn = start_pfn + (size >> PAGE_SHIFT);
- ret = offline_pages(start_pfn, end_pfn, timeout);
- if (ret)
- goto out;
- /* we can free mem_map at this point */
-out:
- return ret;
-}
-EXPORT_SYMBOL_GPL(remove_memory);
-#endif /* CONFIG_MEMORY_HOTREMOVE */
#endif
/*
Index: linux-2.6.27-rc5/arch/powerpc/mm/mem.c
===================================================================
--- linux-2.6.27-rc5.orig/arch/powerpc/mm/mem.c 2008-08-28 15:52:02.000000000 -0700
+++ linux-2.6.27-rc5/arch/powerpc/mm/mem.c 2008-09-08 12:39:19.000000000 -0700
@@ -135,23 +135,6 @@ int arch_add_memory(int nid, u64 start,
return __add_pages(zone, start_pfn, nr_pages);
}
-
-#ifdef CONFIG_MEMORY_HOTREMOVE
-int remove_memory(u64 start, u64 size)
-{
- unsigned long start_pfn, end_pfn;
- int ret;
-
- start_pfn = start >> PAGE_SHIFT;
- end_pfn = start_pfn + (size >> PAGE_SHIFT);
- ret = offline_pages(start_pfn, end_pfn, 120 * HZ);
- if (ret)
- goto out;
- /* Arch-specific calls go here - next patch */
-out:
- return ret;
-}
-#endif /* CONFIG_MEMORY_HOTREMOVE */
#endif /* CONFIG_MEMORY_HOTPLUG */
/*
Index: linux-2.6.27-rc5/arch/s390/mm/init.c
===================================================================
--- linux-2.6.27-rc5.orig/arch/s390/mm/init.c 2008-08-28 15:52:02.000000000 -0700
+++ linux-2.6.27-rc5/arch/s390/mm/init.c 2008-09-08 12:40:41.000000000 -0700
@@ -189,14 +189,3 @@ int arch_add_memory(int nid, u64 start,
return rc;
}
#endif /* CONFIG_MEMORY_HOTPLUG */
-
-#ifdef CONFIG_MEMORY_HOTREMOVE
-int remove_memory(u64 start, u64 size)
-{
- unsigned long start_pfn, end_pfn;
-
- start_pfn = PFN_DOWN(start);
- end_pfn = start_pfn + PFN_DOWN(size);
- return offline_pages(start_pfn, end_pfn, 120 * HZ);
-}
-#endif /* CONFIG_MEMORY_HOTREMOVE */
Index: linux-2.6.27-rc5/mm/memory_hotplug.c
===================================================================
--- linux-2.6.27-rc5.orig/mm/memory_hotplug.c 2008-08-28 15:52:02.000000000 -0700
+++ linux-2.6.27-rc5/mm/memory_hotplug.c 2008-09-08 12:41:37.000000000 -0700
@@ -26,6 +26,7 @@
#include <linux/delay.h>
#include <linux/migrate.h>
#include <linux/page-isolation.h>
+#include <linux/pfn.h>
#include <asm/tlbflush.h>
@@ -849,6 +850,15 @@ failed_removal:
return ret;
}
+
+int remove_memory(u64 start, u64 size)
+{
+	unsigned long start_pfn, end_pfn;
+
+	start_pfn = PFN_DOWN(start);
+	end_pfn = start_pfn + PFN_DOWN(size);
+	return offline_pages(start_pfn, end_pfn, 120 * HZ);
+}
#else
int remove_memory(u64 start, u64 size)
{
Thanks,
Badari
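
[Editorial sketch] The generic remove_memory() in the patch above turns a byte
range into a page frame number (PFN) range by shifting both the start address
and the size down by the page shift. The user-space sketch below shows only
that arithmetic. It assumes 4 KiB pages, and SKETCH_PFN_DOWN, struct
sketch_pfn_range, and sketch_range_to_pfns() are hypothetical names that
mirror, but are not, the kernel's PFN_DOWN() helper.

```c
/* Assumed 4 KiB pages; the kernel derives this from PAGE_SHIFT. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PFN_DOWN(x) ((unsigned long)((x) >> SKETCH_PAGE_SHIFT))

struct sketch_pfn_range {
	unsigned long start_pfn;   /* first page frame in the range */
	unsigned long end_pfn;     /* one past the last page frame */
};

/* Convert (start, size) in bytes to a half-open PFN range, rounding both down. */
struct sketch_pfn_range sketch_range_to_pfns(unsigned long long start,
					     unsigned long long size)
{
	struct sketch_pfn_range r;

	r.start_pfn = SKETCH_PFN_DOWN(start);
	r.end_pfn = r.start_pfn + SKETCH_PFN_DOWN(size);
	return r;
}
```

Under the 4 KiB assumption, a 128 MiB region starting at 4 GiB (start
0x100000000, size 0x8000000) maps to PFNs 0x100000 up to, but not including,
0x108000. Because both values are rounded down, a size smaller than one page
yields an empty range, so callers are expected to pass page-aligned values.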
Add memory hotremove config option to x86
Memory hotremove functionality can currently be configured into
the ia64, powerpc, and s390 kernels. This patch makes it possible
to configure the memory hotremove functionality into the x86
kernel as well.
Signed-off-by: Badari Pulavarty <pba...@us.ibm.com>
Signed-off-by: Gary Hade <gary...@us.ibm.com>
---
arch/x86/Kconfig | 4 ++++
1 file changed, 4 insertions(+)
Index: linux-2.6.27-rc5/arch/x86/Kconfig
===================================================================
--- linux-2.6.27-rc5.orig/arch/x86/Kconfig 2008-09-08 12:36:06.000000000 -0700
+++ linux-2.6.27-rc5/arch/x86/Kconfig 2008-09-08 12:45:30.000000000 -0700
@@ -1384,6 +1384,10 @@ config ARCH_ENABLE_MEMORY_HOTPLUG
def_bool y
depends on X86_64 || (X86_32 && HIGHMEM)
+config ARCH_ENABLE_MEMORY_HOTREMOVE
+ def_bool y
+ depends on MEMORY_HOTPLUG
+
config HAVE_ARCH_EARLY_PFN_TO_NID
def_bool X86_64
depends on NUMA
> There is nothing architecture specific about remove_memory().
> remove_memory() function is common for all architectures which
> support hotplug memory remove. Instead of duplicating it in every
> architecture, collapse them into arch neutral function.
>
> Signed-off-by: Badari Pulavarty <pba...@us.ibm.com>
>
> arch/ia64/mm/init.c | 17 -----------------
> arch/powerpc/mm/mem.c | 17 -----------------
> arch/s390/mm/init.c | 11 -----------
> mm/memory_hotplug.c | 10 ++++++++++
> 4 files changed, 10 insertions(+), 45 deletions(-)
I spent some time trying to build-test this on ia64 and gave up. How
the heck do you turn on memory hotplug on ia64?
> On Mon, 08 Sep 2008 14:52:34 -0700
> Badari Pulavarty <pba...@us.ibm.com> wrote:
>
> > There is nothing architecture specific about remove_memory().
> > remove_memory() function is common for all architectures which
> > support hotplug memory remove. Instead of duplicating it in every
> > architecture, collapse them into arch neutral function.
> >
> > Signed-off-by: Badari Pulavarty <pba...@us.ibm.com>
> >
> > arch/ia64/mm/init.c | 17 -----------------
> > arch/powerpc/mm/mem.c | 17 -----------------
> > arch/s390/mm/init.c | 11 -----------
> > mm/memory_hotplug.c | 10 ++++++++++
> > 4 files changed, 10 insertions(+), 45 deletions(-)
>
> I spent some time trying to build-test this on ia64 and gave up. How
> the heck do you turn on memory hotplug on ia64?
After using ia64 defconfig, all I had to do was enable Sparse Memory model
instead of Discontiguous.
---
~Randy
Linux Plumbers Conference, 17-19 September 2008, Portland, Oregon USA
http://linuxplumbersconf.org/
EXPORT_SYMBOL_GPL(remove_memory) is removed.
It is required by drivers/acpi/acpi_memhotplug.ko.
--
Yasunori Goto
Thanks for catching it. I forgot that it was being used
by acpi. Since we didn't export it for ppc and s390,
I assumed it's safe to remove the export. Sorry !!
Thanks,
Badari