Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

[PATCH 0/4] x86, kdump: Fix crashkernel high with old kexec-tools

5 views
Skip to first unread message

Yinghai Lu

unread,
Apr 2, 2013, 1:20:19 PM4/2/13
to Thomas Gleixner, Ingo Molnar, H. Peter Anvin, WANG Chao, Vivek Goyal, Eric W. Biederman, linux-...@vger.kernel.org
Vivek found some problems with old kexec-tools.

We keep the old crashkernel=X to old behavoir, so it will not break
old tools.
Add crashkernel=X,high to support new kexec-tools that supports loading high.
when high is used, memblock will search from top to low.
if the allocated one is above 4G, kernel will try to auto allocate
72M under 4G for swiotlb.
user could crashkernel=Y,low to change 72M to other value.

Thanks

Yinghai
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Yinghai Lu

unread,
Apr 2, 2013, 1:20:43 PM4/2/13
to Thomas Gleixner, Ingo Molnar, H. Peter Anvin, WANG Chao, Vivek Goyal, Eric W. Biederman, linux-...@vger.kernel.org, Yinghai Lu
We can extend kexec-tools to support multiple "Crash kernel" in /proc/iomem
instead.

So we can use "Crash kernel" instead of "Crash kernel low" in /proc/iomem.

Suggested-by: Vivek Goyal <vgo...@redhat.com>
Signed-off-by: Yinghai Lu <yin...@kernel.org>

---
kernel/kexec.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux-2.6/kernel/kexec.c
===================================================================
--- linux-2.6.orig/kernel/kexec.c
+++ linux-2.6/kernel/kexec.c
@@ -55,7 +55,7 @@ struct resource crashk_res = {
.flags = IORESOURCE_BUSY | IORESOURCE_MEM
};
struct resource crashk_low_res = {
- .name = "Crash kernel low",
+ .name = "Crash kernel",
.start = 0,
.end = 0,
.flags = IORESOURCE_BUSY | IORESOURCE_MEM

Yinghai Lu

unread,
Apr 2, 2013, 1:20:48 PM4/2/13
to Thomas Gleixner, Ingo Molnar, H. Peter Anvin, WANG Chao, Vivek Goyal, Eric W. Biederman, linux-...@vger.kernel.org, Yinghai Lu
Per hpa, use crashkernel=XM;high crashkernel=YM;low instead of
crashkernel_hign=XM crashkernel_low=YM. As that could be extensible.

-v2: according to Vivek, change delimiter to ;

Signed-off-by: Yinghai Lu <yin...@kernel.org>

---
Documentation/kernel-parameters.txt | 8 ++++----
arch/x86/kernel/setup.c | 6 +++---
kernel/kexec.c | 23 +++++++++++++++++------
3 files changed, 24 insertions(+), 13 deletions(-)

Index: linux-2.6/Documentation/kernel-parameters.txt
===================================================================
--- linux-2.6.orig/Documentation/kernel-parameters.txt
+++ linux-2.6/Documentation/kernel-parameters.txt
@@ -603,13 +603,13 @@ bytes respectively. Such letter suffixes
a memory unit (amount[KMG]). See also
Documentation/kdump/kdump.txt for an example.

- crashkernel_high=size[KMG]
+ crashkernel=size[KMG];high
[KNL, x86_64] range could be above 4G. Allow kernel
to allocate physical memory region from top, so could
be above 4G if system have more than 4G ram installed.
- crashkernel_low=size[KMG]
- [KNL, x86_64] range under 4G. When crashkernel_high= is
- passed, kernel could allocate physical memory region
+ crashkernel=size[KMG];low
+ [KNL, x86_64] range under 4G. When crashkernel=X;high
+ is passed, kernel could allocate physical memory region
above 4G, that cause second kernel crash on system
that need swiotlb later. Kernel would try to allocate
some region below 4G automatically. This one let
Index: linux-2.6/arch/x86/kernel/setup.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/setup.c
+++ linux-2.6/arch/x86/kernel/setup.c
@@ -528,7 +528,7 @@ static void __init reserve_crashkernel_l
int ret;

total_low_mem = memblock_mem_size(1UL<<(32-PAGE_SHIFT));
- /* crashkernel_low=YM */
+ /* crashkernel=YM;low */
ret = parse_crashkernel_low(boot_command_line, total_low_mem,
&low_size, &base);
if (ret != 0) {
@@ -541,7 +541,7 @@ static void __init reserve_crashkernel_l
low_size = swiotlb_size_or_default() + (8UL<<20);
auto_set = true;
} else {
- /* passed with crashkernel_low=0 ? */
+ /* passed with crashkernel=0;low ? */
if (!low_size)
return;
}
@@ -577,7 +577,7 @@ static void __init reserve_crashkernel(v

total_mem = memblock_phys_mem_size();

- /* crashkernel_high=XM */
+ /* crashkernel=XM;high */
ret = parse_crashkernel_high(boot_command_line, total_mem,
&crash_size, &crash_base);
if (ret != 0 || crash_size <= 0) {
Index: linux-2.6/kernel/kexec.c
===================================================================
--- linux-2.6.orig/kernel/kexec.c
+++ linux-2.6/kernel/kexec.c
@@ -1360,7 +1360,7 @@ static int __init parse_crashkernel_simp

if (*cur == '@')
*crash_base = memparse(cur+1, &cur);
- else if (*cur != ' ' && *cur != '\0') {
+ else if (*cur != ' ' && *cur != ';' && *cur != '\0') {
pr_warning("crashkernel: unrecognized char\n");
return -EINVAL;
}
@@ -1376,7 +1376,8 @@ static int __init __parse_crashkernel(ch
unsigned long long system_ram,
unsigned long long *crash_size,
unsigned long long *crash_base,
- const char *name)
+ const char *name,
+ const char *suffix)
{
char *p = cmdline, *ck_cmdline = NULL;
char *first_colon, *first_space;
@@ -1388,7 +1389,17 @@ static int __init __parse_crashkernel(ch
/* find crashkernel and use the last one if there are more */
p = strstr(p, name);
while (p) {
- ck_cmdline = p;
+ if (!suffix)
+ ck_cmdline = p;
+ else {
+ char *end_p = strchr(p, ' ');
+
+ if (!end_p)
+ end_p = p + strlen(p);
+ end_p -= strlen(suffix);
+ if (!strncmp(end_p, suffix, strlen(suffix)))
+ ck_cmdline = p;
+ }
p = strstr(p+1, name);
}

@@ -1419,7 +1430,7 @@ int __init parse_crashkernel(char *cmdli
unsigned long long *crash_base)
{
return __parse_crashkernel(cmdline, system_ram, crash_size, crash_base,
- "crashkernel=");
+ "crashkernel=", NULL);
}

int __init parse_crashkernel_high(char *cmdline,
@@ -1428,7 +1439,7 @@ int __init parse_crashkernel_high(char *
unsigned long long *crash_base)
{
return __parse_crashkernel(cmdline, system_ram, crash_size, crash_base,
- "crashkernel_high=");
+ "crashkernel=", ";high");
}

int __init parse_crashkernel_low(char *cmdline,
@@ -1437,7 +1448,7 @@ int __init parse_crashkernel_low(char *c
unsigned long long *crash_base)
{
return __parse_crashkernel(cmdline, system_ram, crash_size, crash_base,
- "crashkernel_low=");
+ "crashkernel=", ";low");
}

static void update_vmcoreinfo_note(void)

Yinghai Lu

unread,
Apr 2, 2013, 1:21:18 PM4/2/13
to Thomas Gleixner, Ingo Molnar, H. Peter Anvin, WANG Chao, Vivek Goyal, Eric W. Biederman, linux-...@vger.kernel.org, Yinghai Lu
Vivek found old kexec-tools does not work new kernel anymore.

So change back crashkernel= back to old behavoir, and add crashkernel_high=
to let user decide if buffer could be above 4G, and also new kexec-tools will
be needed.

Signed-off-by: Yinghai Lu <yin...@kernel.org>

---
Documentation/kernel-parameters.txt | 8 ++++++--
arch/x86/kernel/setup.c | 26 ++++++++++++++++++++------
include/linux/kexec.h | 2 ++
kernel/kexec.c | 9 +++++++++
4 files changed, 37 insertions(+), 8 deletions(-)

Index: linux-2.6/Documentation/kernel-parameters.txt
===================================================================
--- linux-2.6.orig/Documentation/kernel-parameters.txt
+++ linux-2.6/Documentation/kernel-parameters.txt
@@ -603,9 +603,13 @@ bytes respectively. Such letter suffixes
a memory unit (amount[KMG]). See also
Documentation/kdump/kdump.txt for an example.

+ crashkernel_high=size[KMG]
+ [KNL, x86_64] range could be above 4G. Allow kernel
+ to allocate physical memory region from top, so could
+ be above 4G if system have more than 4G ram installed.
crashkernel_low=size[KMG]
- [KNL, x86_64] range under 4G. When crashkernel= is
- passed, kernel allocate physical memory region
+ [KNL, x86_64] range under 4G. When crashkernel_high= is
+ passed, kernel could allocate physical memory region
above 4G, that cause second kernel crash on system
that need swiotlb later. Kernel would try to allocate
some region below 4G automatically. This one let
Index: linux-2.6/arch/x86/kernel/setup.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/setup.c
+++ linux-2.6/arch/x86/kernel/setup.c
@@ -507,11 +507,14 @@ static void __init memblock_x86_reserve_
/*
* Keep the crash kernel below this limit. On 32 bits earlier kernels
* would limit the kernel to the low 512 MiB due to mapping restrictions.
+ * On 64bit, old kexec-tools need to under 896MiB.
*/
#ifdef CONFIG_X86_32
-# define CRASH_KERNEL_ADDR_MAX (512 << 20)
+# define CRASH_KERNEL_ADDR_LOW_MAX (512 << 20)
+# define CRASH_KERNEL_ADDR_HIGH_MAX (512 << 20)
#else
-# define CRASH_KERNEL_ADDR_MAX MAXMEM
+# define CRASH_KERNEL_ADDR_LOW_MAX (896UL<<20)
+# define CRASH_KERNEL_ADDR_HIGH_MAX MAXMEM
#endif

static void __init reserve_crashkernel_low(void)
@@ -525,6 +528,7 @@ static void __init reserve_crashkernel_l
int ret;

total_low_mem = memblock_mem_size(1UL<<(32-PAGE_SHIFT));
+ /* crashkernel_low=YM */
ret = parse_crashkernel_low(boot_command_line, total_low_mem,
&low_size, &base);
if (ret != 0) {
@@ -568,14 +572,22 @@ static void __init reserve_crashkernel(v
const unsigned long long alignment = 16<<20; /* 16M */
unsigned long long total_mem;
unsigned long long crash_size, crash_base;
+ bool high = true;
int ret;

total_mem = memblock_phys_mem_size();

- ret = parse_crashkernel(boot_command_line, total_mem,
+ /* crashkernel_high=XM */
+ ret = parse_crashkernel_high(boot_command_line, total_mem,
&crash_size, &crash_base);
- if (ret != 0 || crash_size <= 0)
- return;
+ if (ret != 0 || crash_size <= 0) {
+ /* crashkernel=XM */
+ ret = parse_crashkernel(boot_command_line, total_mem,
+ &crash_size, &crash_base);
+ if (ret != 0 || crash_size <= 0)
+ return;
+ high = false;
+ }

/* 0 means: find the address automatically */
if (crash_base <= 0) {
@@ -583,7 +595,9 @@ static void __init reserve_crashkernel(v
* kexec want bzImage is below CRASH_KERNEL_ADDR_MAX
*/
crash_base = memblock_find_in_range(alignment,
- CRASH_KERNEL_ADDR_MAX, crash_size, alignment);
+ high ? CRASH_KERNEL_ADDR_HIGH_MAX :
+ CRASH_KERNEL_ADDR_LOW_MAX,
+ crash_size, alignment);

if (!crash_base) {
pr_info("crashkernel reservation failed - No suitable area found.\n");
Index: linux-2.6/include/linux/kexec.h
===================================================================
--- linux-2.6.orig/include/linux/kexec.h
+++ linux-2.6/include/linux/kexec.h
@@ -200,6 +200,8 @@ extern size_t vmcoreinfo_max_size;

int __init parse_crashkernel(char *cmdline, unsigned long long system_ram,
unsigned long long *crash_size, unsigned long long *crash_base);
+int parse_crashkernel_high(char *cmdline, unsigned long long system_ram,
+ unsigned long long *crash_size, unsigned long long *crash_base);
int parse_crashkernel_low(char *cmdline, unsigned long long system_ram,
unsigned long long *crash_size, unsigned long long *crash_base);
int crash_shrink_memory(unsigned long new_size);
Index: linux-2.6/kernel/kexec.c
===================================================================
--- linux-2.6.orig/kernel/kexec.c
+++ linux-2.6/kernel/kexec.c
@@ -1422,6 +1422,15 @@ int __init parse_crashkernel(char *cmdli
"crashkernel=");
}

+int __init parse_crashkernel_high(char *cmdline,
+ unsigned long long system_ram,
+ unsigned long long *crash_size,
+ unsigned long long *crash_base)
+{
+ return __parse_crashkernel(cmdline, system_ram, crash_size, crash_base,
+ "crashkernel_high=");
+}
+
int __init parse_crashkernel_low(char *cmdline,
unsigned long long system_ram,
unsigned long long *crash_size,

Yinghai Lu

unread,
Apr 2, 2013, 1:22:24 PM4/2/13
to Thomas Gleixner, Ingo Molnar, H. Peter Anvin, WANG Chao, Vivek Goyal, Eric W. Biederman, linux-...@vger.kernel.org, Yinghai Lu
Chao said that kdump does does work well on his system on 3.8
without extra parameter, even iommu does not work with kdump.
And now have to append crashkernel_low=Y in first kernel to make
kdump work.

We have now modified crashkernel=X to allocate memory beyong 4G (if
available) and do not allocate low range for crashkernel if the user
does not specify that with crashkernel_low=Y. This causes regression
if iommu is not enabled. Without iommu, swiotlb needs to be setup in
first 4G and there is no low memory available to second kernel.

Set crashkernel_low automatically if the user does not specify that.

For system that does support IOMMU with kdump properly, user could
specify crashkernel_low=0 to save that 72M low ram.

-v3: add swiotlb_size() according to Konrad.
-v4: add comments what 8M is for according to hpa.
also update more crashkernel_low= in kernel-parameters.txt
-v5: update changelog according to Vivek.

Reported-by: WANG Chao <chao...@redhat.com>
Tested-by: WANG Chao <chao...@redhat.com>
Signed-off-by: Yinghai Lu <yin...@kernel.org>

---
Documentation/kernel-parameters.txt | 15 ++++++++++++---
arch/x86/kernel/setup.c | 20 +++++++++++++++++---
include/linux/swiotlb.h | 1 +
lib/swiotlb.c | 19 +++++++++++++++----
4 files changed, 45 insertions(+), 10 deletions(-)

Index: linux-2.6/arch/x86/kernel/setup.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/setup.c
+++ linux-2.6/arch/x86/kernel/setup.c
@@ -521,19 +521,33 @@ static void __init reserve_crashkernel_l
unsigned long long low_base = 0, low_size = 0;
unsigned long total_low_mem;
unsigned long long base;
+ bool auto_set = false;
int ret;

total_low_mem = memblock_mem_size(1UL<<(32-PAGE_SHIFT));
ret = parse_crashkernel_low(boot_command_line, total_low_mem,
&low_size, &base);
- if (ret != 0 || low_size <= 0)
- return;
+ if (ret != 0) {
+ /*
+ * two parts from lib/swiotlb.c:
+ * swiotlb size: user specified with swiotlb= or default.
+ * swiotlb overflow buffer: now is hardcoded to 32k,
+ * round to 8M to cover more others.
+ */
+ low_size = swiotlb_size_or_default() + (8UL<<20);
+ auto_set = true;
+ } else {
+ /* passed with crashkernel_low=0 ? */
+ if (!low_size)
+ return;
+ }

low_base = memblock_find_in_range(low_size, (1ULL<<32),
low_size, alignment);

if (!low_base) {
- pr_info("crashkernel low reservation failed - No suitable area found.\n");
+ if (!auto_set)
+ pr_info("crashkernel low reservation failed - No suitable area found.\n");

return;
}
Index: linux-2.6/include/linux/swiotlb.h
===================================================================
--- linux-2.6.orig/include/linux/swiotlb.h
+++ linux-2.6/include/linux/swiotlb.h
@@ -25,6 +25,7 @@ extern int swiotlb_force;
extern void swiotlb_init(int verbose);
int swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose);
extern unsigned long swiotlb_nr_tbl(void);
+unsigned long swiotlb_size_or_default(void);
extern int swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs);

/*
Index: linux-2.6/lib/swiotlb.c
===================================================================
--- linux-2.6.orig/lib/swiotlb.c
+++ linux-2.6/lib/swiotlb.c
@@ -105,9 +105,9 @@ setup_io_tlb_npages(char *str)
if (!strcmp(str, "force"))
swiotlb_force = 1;

- return 1;
+ return 0;
}
-__setup("swiotlb=", setup_io_tlb_npages);
+early_param("swiotlb", setup_io_tlb_npages);
/* make io_tlb_overflow tunable too? */

unsigned long swiotlb_nr_tbl(void)
@@ -115,6 +115,18 @@ unsigned long swiotlb_nr_tbl(void)
return io_tlb_nslabs;
}
EXPORT_SYMBOL_GPL(swiotlb_nr_tbl);
+
+/* default to 64MB */
+#define IO_TLB_DEFAULT_SIZE (64UL<<20)
+unsigned long swiotlb_size_or_default(void)
+{
+ unsigned long size;
+
+ size = io_tlb_nslabs << IO_TLB_SHIFT;
+
+ return size ? size : (IO_TLB_DEFAULT_SIZE);
+}
+
/* Note that this doesn't work with highmem page */
static dma_addr_t swiotlb_virt_to_bus(struct device *hwdev,
volatile void *address)
@@ -188,8 +200,7 @@ int __init swiotlb_init_with_tbl(char *t
void __init
swiotlb_init(int verbose)
{
- /* default to 64MB */
- size_t default_size = 64UL<<20;
+ size_t default_size = IO_TLB_DEFAULT_SIZE;
unsigned char *vstart;
unsigned long bytes;

Index: linux-2.6/Documentation/kernel-parameters.txt
===================================================================
--- linux-2.6.orig/Documentation/kernel-parameters.txt
+++ linux-2.6/Documentation/kernel-parameters.txt
@@ -596,9 +596,6 @@ bytes respectively. Such letter suffixes
is selected automatically. Check
Documentation/kdump/kdump.txt for further details.

- crashkernel_low=size[KMG]
- [KNL, x86] parts under 4G.
-
crashkernel=range1:size1[,range2:size2,...][@offset]
[KNL] Same as above, but depends on the memory
in the running system. The syntax of range is
@@ -606,6 +603,18 @@ bytes respectively. Such letter suffixes
a memory unit (amount[KMG]). See also
Documentation/kdump/kdump.txt for an example.

+ crashkernel_low=size[KMG]
+ [KNL, x86_64] range under 4G. When crashkernel= is
+ passed, kernel allocate physical memory region
+ above 4G, that cause second kernel crash on system
+ that need swiotlb later. Kernel would try to allocate
+ some region below 4G automatically. This one let
+ user to specify own low range under 4G for second
+ kernel instead.
+ 0: to disable low allocation on systems that do not
+ need swiotlb, that will save 72M low ram in first
+ kernel.
+
cs89x0_dma= [HW,NET]
Format: <dma>

Vivek Goyal

unread,
Apr 2, 2013, 2:06:21 PM4/2/13
to Yinghai Lu, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, WANG Chao, Eric W. Biederman, linux-...@vger.kernel.org
On Tue, Apr 02, 2013 at 10:19:42AM -0700, Yinghai Lu wrote:

[..]
> Index: linux-2.6/Documentation/kernel-parameters.txt
> ===================================================================
> --- linux-2.6.orig/Documentation/kernel-parameters.txt
> +++ linux-2.6/Documentation/kernel-parameters.txt
> @@ -603,9 +603,13 @@ bytes respectively. Such letter suffixes
> a memory unit (amount[KMG]). See also
> Documentation/kdump/kdump.txt for an example.
>
> + crashkernel_high=size[KMG]
> + [KNL, x86_64] range could be above 4G. Allow kernel
> + to allocate physical memory region from top, so could
> + be above 4G if system have more than 4G ram installed.
> crashkernel_low=size[KMG]
> - [KNL, x86_64] range under 4G. When crashkernel= is
> - passed, kernel allocate physical memory region
> + [KNL, x86_64] range under 4G. When crashkernel_high= is
> + passed, kernel could allocate physical memory region
> above 4G, that cause second kernel crash on system
> that need swiotlb later. Kernel would try to allocate
> some region below 4G automatically. This one let

Hi Yinghai,

I think there are still some issues with crashkernel= semantics.

What if I specify both crashkernel_high= as well as crashkernel_low=.
Looks like crashkernel_low will be parsed only if crashkernel_high
reserved memory above 4G.

So if one gives following command line.

crashkernel=256M;high crashkernel=100M;low

Final outcome will vary across systems. If system has all RAM below 4G
we will see only one 256M chunk reserved otherwise we will see one 256M
and one 100M chunk reserved. And a user might think that I asked you to
reserve two chunks. One 256M and otherr 100M.

Also interesting is, what if user specifies both crashkernel=X and
crashkernel=Y;high. Looks like we will ignore crashkernel=X and honor
only crashkernel=Y;high.

So the problem here is, do we want to parse multiple crashkernel=
command line and support reserving multiple ranges? Till 3.8 kernel
we did not do that. If we want to do that, then parsing crashkernel=
logic needs to be more generic.

- I would say that to keep things simple, we can stick to semantics
of 3.8 kernel and say only first crashkernel= option is parsed and
acted upon. Rest are ignored. Trying to support multiple ranges will
require much more work.

- If we say that we will only parse first crashkernel= option, then
crashkernel=X;high and crashkernel0;low can not co-exist. I would say
use a new option to disable automatically reserved low memory. Say,
crashkernel_no_auto_low; That way it can co-exist with other
crashkernel= options without any conflict.

In fact this will also work with crashkernel=X, if we decide to extend
crashkernel=X to look for memory below 4G and look beyond 4G.

- Support crashkernel=X;high as a new crashkernel= option.

Thanks
Vivek

Yinghai Lu

unread,
Apr 2, 2013, 2:42:16 PM4/2/13
to Vivek Goyal, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, WANG Chao, Eric W. Biederman, Linux Kernel Mailing List
Yes, that is intentional.

If you like, I could remove that checking, just add the low.

>
> Also interesting is, what if user specifies both crashkernel=X and
> crashkernel=Y;high. Looks like we will ignore crashkernel=X and honor
> only crashkernel=Y;high.

Yes, that is intentional.

>
> So the problem here is, do we want to parse multiple crashkernel=
> command line and support reserving multiple ranges? Till 3.8 kernel
> we did not do that. If we want to do that, then parsing crashkernel=
> logic needs to be more generic.
>
> - I would say that to keep things simple, we can stick to semantics
> of 3.8 kernel and say only first crashkernel= option is parsed and
> acted upon. Rest are ignored. Trying to support multiple ranges will
> require much more work.

we could do that, but that is not necessary.

>
> - If we say that we will only parse first crashkernel= option, then
> crashkernel=X;high and crashkernel0;low can not co-exist. I would say
> use a new option to disable automatically reserved low memory. Say,
> crashkernel_no_auto_low; That way it can co-exist with other
> crashkernel= options without any conflict.

I don't see any reason to make them co-exist.

aka:
old kexec-tools stay with "crashkernel=X"
new kexec-tools stay with
1. like old kexec tools
2. or "crashkernel=X,high" or "crashkernel=X,high crashkernel=Y,low",
Y could be 100M or 0 etc.

>
> In fact this will also work with crashkernel=X, if we decide to extend
> crashkernel=X to look for memory below 4G and look beyond 4G.
>
> - Support crashkernel=X;high as a new crashkernel= option.

Actually we still support only one region that is could be high or low,
and that extra low is just for workaround
buggy system that does not support iommu with kdump.

Thanks

Yinghai

Yinghai Lu

unread,
Apr 2, 2013, 2:49:24 PM4/2/13
to Vivek Goyal, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, WANG Chao, Eric W. Biederman, Linux Kernel Mailing List
On Tue, Apr 2, 2013 at 11:42 AM, Yinghai Lu <yin...@kernel.org> wrote:
> aka:
> old kexec-tools stay with "crashkernel=X"
> new kexec-tools stay with
> 1. like old kexec tools
> 2. or "crashkernel=X,high" or "crashkernel=X,high crashkernel=Y,low",
> Y could be 100M or 0 etc.

I keep the old logic like:
if there are several "crashkernel=X,high", only last one is honored.
if there are several "crashkernel=Y,low", only last one is honored.

Vivek Goyal

unread,
Apr 2, 2013, 3:09:44 PM4/2/13
to Yinghai Lu, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, WANG Chao, Eric W. Biederman, Linux Kernel Mailing List
Why it is intentional. It seems be to aberration from user's point of
view.

>
> If you like, I could remove that checking, just add the low.
>
> >
> > Also interesting is, what if user specifies both crashkernel=X and
> > crashkernel=Y;high. Looks like we will ignore crashkernel=X and honor
> > only crashkernel=Y;high.
>
> Yes, that is intentional.

Again, it is not clear that why are we prefering crashkernel=Y;high
over crashkernel=X. There needs to be clearly defined behavior.

>
> >
> > So the problem here is, do we want to parse multiple crashkernel=
> > command line and support reserving multiple ranges? Till 3.8 kernel
> > we did not do that. If we want to do that, then parsing crashkernel=
> > logic needs to be more generic.
> >
> > - I would say that to keep things simple, we can stick to semantics
> > of 3.8 kernel and say only first crashkernel= option is parsed and
> > acted upon. Rest are ignored. Trying to support multiple ranges will
> > require much more work.
>
> we could do that, but that is not necessary.
>
> >
> > - If we say that we will only parse first crashkernel= option, then
> > crashkernel=X;high and crashkernel0;low can not co-exist. I would say
> > use a new option to disable automatically reserved low memory. Say,
> > crashkernel_no_auto_low; That way it can co-exist with other
> > crashkernel= options without any conflict.
>
> I don't see any reason to make them co-exist.

We still need to define a clear behavior. What happens if user specifies
multiple crashkernel= options.

>
> aka:
> old kexec-tools stay with "crashkernel=X"
> new kexec-tools stay with
> 1. like old kexec tools
> 2. or "crashkernel=X,high" or "crashkernel=X,high crashkernel=Y,low",
> Y could be 100M or 0 etc.

You are thinking that user will specify only the options you are looking
for. But a user is free to specify all the possible inputs and we need
to define very clearly what happens in those cases.

>
> >
> > In fact this will also work with crashkernel=X, if we decide to extend
> > crashkernel=X to look for memory below 4G and look beyond 4G.
> >
> > - Support crashkernel=X;high as a new crashkernel= option.
>
> Actually we still support only one region that is could be high or low,
> and that extra low is just for workaround
> buggy system that does not support iommu with kdump.

Well, crashkernel=X;high crashkernel=Y;low will reserve two ranges (one
high and one low). So in some cases we are supporting 2 and in some
cases we are supporting 1 range.

So I still think that let us stick to old behavior of supporting one
crashkernel= option. Last crashkernel= option on command line will be
acted upon.

Thanks
Vivek

Vivek Goyal

unread,
Apr 2, 2013, 3:11:46 PM4/2/13
to Yinghai Lu, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, WANG Chao, Eric W. Biederman, Linux Kernel Mailing List
On Tue, Apr 02, 2013 at 11:49:13AM -0700, Yinghai Lu wrote:
> On Tue, Apr 2, 2013 at 11:42 AM, Yinghai Lu <yin...@kernel.org> wrote:
> > aka:
> > old kexec-tools stay with "crashkernel=X"
> > new kexec-tools stay with
> > 1. like old kexec tools
> > 2. or "crashkernel=X,high" or "crashkernel=X,high crashkernel=Y,low",
> > Y could be 100M or 0 etc.
>
> I keep the old logic like:
> if there are several "crashkernel=X,high", only last one is honored.
> if there are several "crashkernel=Y,low", only last one is honored.

Yes but if different types of crashkernel= options are mixes then
behavior is not defined.

crashkernel=X,high crashkernel=X
crashkernel=X,high crashkernel=Y;low
crashkernel=Y;low crashkernel=X

And possibilities go on. So I think it makes life simpler if we always
parse last crashkernel= option and act upon that. And use
crashkernel_no_auto_low to opt out of auto reserved low memory area.

Thanks
Vivek

Yinghai Lu

unread,
Apr 2, 2013, 4:00:58 PM4/2/13
to Vivek Goyal, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, WANG Chao, Eric W. Biederman, Linux Kernel Mailing List
On Tue, Apr 2, 2013 at 12:11 PM, Vivek Goyal <vgo...@redhat.com> wrote:
> On Tue, Apr 02, 2013 at 11:49:13AM -0700, Yinghai Lu wrote:
>> On Tue, Apr 2, 2013 at 11:42 AM, Yinghai Lu <yin...@kernel.org> wrote:
>> > aka:
>> > old kexec-tools stay with "crashkernel=X"
>> > new kexec-tools stay with
>> > 1. like old kexec tools
>> > 2. or "crashkernel=X,high" or "crashkernel=X,high crashkernel=Y,low",
>> > Y could be 100M or 0 etc.
>>
>> I keep the old logic like:
>> if there are several "crashkernel=X,high", only last one is honored.
>> if there are several "crashkernel=Y,low", only last one is honored.
>
> Yes but if different types of crashkernel= options are mixes then
> behavior is not defined.

dmesg or /proc/iomem will show them what is finally reserved.

>
> crashkernel=X,high crashkernel=X

==> crashkernel=X is tossed away.

> crashkernel=X,high crashkernel=Y;low

normal case. if user want more or disable low range.

> crashkernel=Y;low crashkernel=X

crashkernel=X will be used.

>
> And possibilities go on. So I think it makes life simpler if we always
> parse last crashkernel= option and act upon that. And use
> crashkernel_no_auto_low to opt out of auto reserved low memory area.

No, that is not just disable. User could want more like 128M instead of 72M.

Thanks

Yinghai

Yinghai Lu

unread,
Apr 2, 2013, 4:05:05 PM4/2/13
to Vivek Goyal, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, WANG Chao, Eric W. Biederman, Linux Kernel Mailing List
On Tue, Apr 2, 2013 at 12:09 PM, Vivek Goyal <vgo...@redhat.com> wrote:
>>
>> Actually we still support only one region that is could be high or low,
>> and that extra low is just for workaround
>> buggy system that does not support iommu with kdump.
>
> Well, crashkernel=X;high crashkernel=Y;low will reserve two ranges (one
> high and one low). So in some cases we are supporting 2 and in some
> cases we are supporting 1 range.

when you have more 4G ram or not.

and when we have two ranges, low range actually is not for second kernel to
be loaded. and it is only for swiotlb in case.

>
> So I still think that let us stick to old behavior of supporting one
> crashkernel= option. Last crashkernel= option on command line will be
> acted upon.

Yes, only one crashkernel=;high and one crashkernel=;low is honored.

Vivek Goyal

unread,
Apr 2, 2013, 4:12:09 PM4/2/13
to Yinghai Lu, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, WANG Chao, Eric W. Biederman, Linux Kernel Mailing List
On Tue, Apr 02, 2013 at 01:00:46PM -0700, Yinghai Lu wrote:
> On Tue, Apr 2, 2013 at 12:11 PM, Vivek Goyal <vgo...@redhat.com> wrote:
> > On Tue, Apr 02, 2013 at 11:49:13AM -0700, Yinghai Lu wrote:
> >> On Tue, Apr 2, 2013 at 11:42 AM, Yinghai Lu <yin...@kernel.org> wrote:
> >> > aka:
> >> > old kexec-tools stay with "crashkernel=X"
> >> > new kexec-tools stay with
> >> > 1. like old kexec tools
> >> > 2. or "crashkernel=X,high" or "crashkernel=X,high crashkernel=Y,low",
> >> > Y could be 100M or 0 etc.
> >>
> >> I keep the old logic like:
> >> if there are several "crashkernel=X,high", only last one is honored.
> >> if there are several "crashkernel=Y,low", only last one is honored.
> >
> > Yes but if different types of crashkernel= options are mixes then
> > behavior is not defined.
>
> dmesg or /proc/iomem will show them what is finally reserved.
>
> >
> > crashkernel=X,high crashkernel=X
>
> ==> crashkernel=X is tossed away.

You are just describing what your code does. There is no theme or
justification behind this behavior. There is no uniformity. A user can
question that so far you used to honor last crashkernel= parameter and
suddenly in 3.9 that's no more the case. Out of blue crashkernel=X,high is
overriding crashkernel=X and it is not obivious why.

>
> > crashkernel=X,high crashkernel=Y;low
>
> normal case. if user want more or disable low range.

Again, behavior is not clear to user. Please stop describing your code
behavior. Discuss more in terms of presenting a consistent interface to
user.

>
> > crashkernel=Y;low crashkernel=X
>
> crashkernel=X will be used.

Why? When crashkernel=X is specified with crashkernel=Y;high, high takes
over but when crashkernel=X is specified with crashkernel=Y;low,
crashkernel=X takes over. These is highly unintutive.

>
> >
> > And possibilities go on. So I think it makes life simpler if we always
> > parse last crashkernel= option and act upon that. And use
> > crashkernel_no_auto_low to opt out of auto reserved low memory area.
>
> No, that is not just disable. User could want more like 128M instead of 72M.

If user wants 128M in low memory, they will just specify
crashkernel=128M;low

If they want to control multiple ranges of memory, then that's the feature
we currently don't support. Currently we support only reserving one range
of memory.

If you want to support multiple ranges of memory,then do it properly.
Parse all crashkernel= options, prepare a list of memory to be reserved
and unreserved, resolve all the conflicts between various options and
then reserve the memory. But that does not seem to be a requirement at
this point of time.

Thanks
Vivek

Vivek Goyal

unread,
Apr 2, 2013, 4:25:30 PM4/2/13
to Yinghai Lu, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, WANG Chao, Eric W. Biederman, Linux Kernel Mailing List
On Tue, Apr 02, 2013 at 04:11:48PM -0400, Vivek Goyal wrote:

[..]
> > No, that is not just disable. User could want more like 128M instead of 72M.

So apart from swiotlb, are there other scenarios where we need to reserve
low memory (With main memory reserved high).

Yinghai Lu

unread,
Apr 2, 2013, 4:36:11 PM4/2/13
to Vivek Goyal, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, WANG Chao, Eric W. Biederman, Linux Kernel Mailing List
On Tue, Apr 2, 2013 at 1:11 PM, Vivek Goyal <vgo...@redhat.com> wrote:
> On Tue, Apr 02, 2013 at 01:00:46PM -0700, Yinghai Lu wrote:
>> On Tue, Apr 2, 2013 at 12:11 PM, Vivek Goyal <vgo...@redhat.com> wrote:
>> > On Tue, Apr 02, 2013 at 11:49:13AM -0700, Yinghai Lu wrote:
>> >> On Tue, Apr 2, 2013 at 11:42 AM, Yinghai Lu <yin...@kernel.org> wrote:
>> >> > aka:
>> >> > old kexec-tools stay with "crashkernel=X"
>> >> > new kexec-tools stay with
>> >> > 1. like old kexec tools
>> >> > 2. or "crashkernel=X,high" or "crashkernel=X,high crashkernel=Y,low",
>> >> > Y could be 100M or 0 etc.
>> >>
>> >> I keep the old logic like:
>> >> if there are several "crashkernel=X,high", only last one is honored.
>> >> if there are several "crashkernel=Y,low", only last one is honored.
>> >
>> > Yes but if different types of crashkernel= options are mixes then
>> > behavior is not defined.
>>
>> dmesg or /proc/iomem will show them what is finally reserved.
>>
>> >
>> > crashkernel=X,high crashkernel=X
>>
>> ==> crashkernel=X is tossed away.
>
> You are just describing what your code does. There is no theme or
> justification behind this behavior. There is no uniformity. A user can
> question that so far you used to honor last crashkernel= parameter and
> suddenly in 3.9 that's no more the case. Out of blue crashkernel=X,high is
> overriding crashkernel=X and it is not obivious why.

Let me repeat again:
we keep crashkernel=X old behavior with old kexec-tools.
crashkernel=X;high is for new kexec-tools that support loading high.

If the user want to use ,high but still with old kexec-tools, that is
not going to work.

Can we just keep it separated?

>
>>
>> > crashkernel=X,high crashkernel=Y;low
>>
>> normal case. if user want more or disable low range.
>
> Again, behavior is not clear to user. Please stop describing your code
> behavior. Discuss more in terms of presenting a consistent interface to
> user.
>
>>
>> > crashkernel=Y;low crashkernel=X
>>
>> crashkernel=X will be used.
>
> Why? When crashkernel=X is specified with crashkernel=Y;high, high takes
> over but when crashkernel=X is specified with crashkernel=Y;low,
> crashkernel=X takes over. These is highly unintutive.
>
>>
>> >
>> > And possibilities go on. So I think it makes life simpler if we always
>> > parse last crashkernel= option and act upon that. And use
>> > crashkernel_no_auto_low to opt out of auto reserved low memory area.
>>
>> No, that is not just disable. User could want more like 128M instead of 72M.
>
> If user wants 128M in low memory, they will just specify
> crashkernel=128M;low

in the kernel-parameter.txt, already says ;low is need to used with ;high.

>
> If they want to control multiple ranges of memory, then that's the feature
> we currently don't support. Currently we support only reserving one range
> of memory.
>
> If you want to support multiple ranges of memory,then do it properly.
> Parse all crashkernel= options, prepare a list of memory to be reserved
> and unreserved, resolve all the conflicts between various options and
> then reserve the memory. But that does not seem to be a requirement at
> this point of time.

No we does not support multiple ranges, as it will need more changes
in kexec-tools.

Can we stop here with those four patches?

Later, we can extend it if it is really needed.

Thanks

Yinghai

Vivek Goyal

unread,
Apr 3, 2013, 9:19:02 AM4/3/13
to Yinghai Lu, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, WANG Chao, Eric W. Biederman, Linux Kernel Mailing List
On Tue, Apr 02, 2013 at 01:36:02PM -0700, Yinghai Lu wrote:

[..]
> > You are just describing what your code does. There is no theme or
> > justification behind this behavior. There is no uniformity. A user can
> > question that so far you used to honor last crashkernel= parameter and
> > suddenly in 3.9 that's no more the case. Out of blue crashkernel=X,high is
> > overriding crashkernel=X and it is not obivious why.
>
> Let me repeat again:
> we keep crashkernel=X old behavior with old kexec-tools.
> crashkernel=X;high is for new kexec-tools that support loading high.
>
> If the user want to use ,high but still with old kexec-tools, that is
> not going to work.
>
> Can we just keep it separated?

Kernel does not know about old kexec-tools or new kexec-tools. Neither
kernel can enforce what command line options are passed by user. So
kernel needs to define a clean interface here which is easily understood
and is extensible also in future.

[..]
> >
> > If user wants 128M in low memory, they will just specify
> > crashkernel=128M;low
>
> in the kernel-parameter.txt, already says ;low is need to used with ;high.

But why are we tying ;low to ;high. One should be easily extend
crashkernel=X to be able to reserve memory above 4G if specified amount
is not available below 4G. In that case also one might want to reserve
some low memory?

For that matter crashkernel=range1:size,range2:size syntax should be
extendible too to reserve memory above 4G if desired size of memory
is not available in low memory.

Now in those cases too, one would like to have 72M of low memory
reserved. So ;low shoud not be tied to ;high necessarily.

In fact current code does not care whetehr ;high was specified or not.
If memory is reserved above 4G, ;low code will kick in.

>
> >
> > If they want to control multiple ranges of memory, then that's the feature
> > we currently don't support. Currently we support only reserving one range
> > of memory.
> >
> > If you want to support multiple ranges of memory,then do it properly.
> > Parse all crashkernel= options, prepare a list of memory to be reserved
> > and unreserved, resolve all the conflicts between various options and
> > then reserve the memory. But that does not seem to be a requirement at
> > this point of time.
>
> No we does not support multiple ranges, as it will need more changes
> in kexec-tools.
>
> Can we stop here with those four patches?
>
> Later, we can extend it if it is really needed.

crashkernel= options are already confusing. I think we with this patchset
we will just make them even more confusing and future extensions
difficult.

We really need to stick to the notion of only one crashkernel= option
is accepted and that is last one on command line. And if need be,
we need to work on multi range reservation feature where we process
and reserve ranges as specified by all crashkernel= parameters on
command line.

Creating new combinations where some crashkernel= are preferred over
others and some crashkernel= options work with only selected crashkernel=
options, is asking for trouble, especially keeping in mind future
extensions.

I prefer following for 3.9.

- process only right most crashkernel= option.
- implement crashkernel_no_auto_low option to opt out of auto reserved
low memory
- implement crashkernel=X;high to support high memory reservations.

And now old kexec-tools user can use crashkernel=X while users needing
high memory reservation can use crashkernel=X;high.

If you really want to support user defined crashkernel=X;low along with
crashkernel=Y;high, that is really a multi range reservation feature and
need to be implemented properly instead of coming up with short cuts.

Thanks
Vivek

Yinghai Lu

unread,
Apr 3, 2013, 1:12:53 PM4/3/13
to Vivek Goyal, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, WANG Chao, Eric W. Biederman, Linux Kernel Mailing List
On Wed, Apr 3, 2013 at 6:18 AM, Vivek Goyal <vgo...@redhat.com> wrote:
> On Tue, Apr 02, 2013 at 01:36:02PM -0700, Yinghai Lu wrote:
>
> [..]
>> > You are just describing what your code does. There is no theme or
>> > justification behind this behavior. There is no uniformity. A user can
>> > question that so far you used to honor last crashkernel= parameter and
>> > suddenly in 3.9 that's no more the case. Out of blue crashkernel=X,high is
>> > overriding crashkernel=X and it is not obivious why.
>>
>> Let me repeat again:
>> we keep crashkernel=X old behavior with old kexec-tools.
>> crashkernel=X;high is for new kexec-tools that support loading high.
>>
>> If the user want to use ,high but still with old kexec-tools, that is
>> not going to work.
>>
>> Can we just keep it separated?
>
> Kernel does not know about old kexec-tools or new kexec-tools. Neither
> kernel can enforce what command line options are passed by user. So
> kernel needs to define a clean interface here which is easily understood
> and is extensible also in future.

Looks you are chasing wrong direction.

Those four patches fixes the regression that Wang and you reported,
User don't need to change their kexec-tools and boot command lines
kdump still works.

We will never can stop user doing crazy thing with their system.

>
> [..]
>> >
>> > If user wants 128M in low memory, they will just specify
>> > crashkernel=128M;low
>>
>> in the kernel-parameter.txt, already says ;low is need to used with ;high.
>
> But why are we tying ;low to ;high. One should be easily extend
> crashkernel=X to be able to reserve memory above 4G if specified amount
> is not available below 4G. In that case also one might want to reserve
> some low memory?

I want to keep crashkernel=X to the old behavior.

If you want to have crashkernel=X to allocate high above 4G, old kexec-tools
will not work with new kernel.

>
> For that matter crashkernel=range1:size,range2:size syntax should be
> extendible too to reserve memory above 4G if desired size of memory
> is not available in low memory.
>
> Now in those cases too, one would like to have 72M of low memory
> reserved. So ;low shoud not be tied to ;high necessarily.
>
> In fact current code does not care whetehr ;high was specified or not.
> If memory is reserved above 4G, ;low code will kick in.

No, that is not right.

only when ;high is specified, kernel will try to allocate high above 4G.


>
>>
>> >
>> > If they want to control multiple ranges of memory, then that's the feature
>> > we currently don't support. Currently we support only reserving one range
>> > of memory.
>> >
>> > If you want to support multiple ranges of memory,then do it properly.
>> > Parse all crashkernel= options, prepare a list of memory to be reserved
>> > and unreserved, resolve all the conflicts between various options and
>> > then reserve the memory. But that does not seem to be a requirement at
>> > this point of time.
>>
>> No we does not support multiple ranges, as it will need more changes
>> in kexec-tools.
>>
>> Can we stop here with those four patches?
>>
>> Later, we can extend it if it is really needed.
>
> crashkernel= options are already confusing. I think we with this patchset
> we will just make them even more confusing and future extensions
> difficult.

So keep crashkernel= without high and low to old behavior.

>
> We really need to stick to the notion of only one crashkernel= option
> is accepted and that is last one on command line. And if need be,
> we need to work on multi range reservation feature where we process
> and reserve ranges as specified by all crashkernel= parameters on
> command line.

That is kept.

and only last high is honored

>
> Creating new combinations where some crashkernel= are preferred over
> others and some crashkernel= options work with only selected crashkernel=
> options, is asking for trouble, especially keeping in mind future
> extensions.

I don't think so.

old conf that works before still use crashkernel= with high and low.
old conf that does not work, could switch to crashkernel=;high/low
with new kexec-tools

>
> I prefer following for 3.9.
>
> - process only right most crashkernel= option.

what is "right most" ?
only last crashkernel=X is honored?
I restored that already with those four patches.

> - implement crashkernel_no_auto_low option to opt out of auto reserved
> low memory

No, that is ugly.

> - implement crashkernel=X;high to support high memory reservations.
>
> And now old kexec-tools user can use crashkernel=X while users needing
> high memory reservation can use crashkernel=X;high.

The four patches did not do that?

>
> If you really want to support user defined crashkernel=X;low along with
> crashkernel=Y;high, that is really a multi range reservation feature and
> need to be implemented properly instead of coming up with short cuts.

No it is not.

It's *you* want me to change "Crash kernel low" to "Crash kernel".

Do we need to drop second patch? So will still keep
"Crash kernel low" in /proc/iomem?

Thanks

Yinghai

Yinghai Lu

unread,
Apr 3, 2013, 1:32:31 PM4/3/13
to Vivek Goyal, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, WANG Chao, Eric W. Biederman, Linux Kernel Mailing List
On Wed, Apr 3, 2013 at 10:12 AM, Yinghai Lu <yin...@kernel.org> wrote:
> On Wed, Apr 3, 2013 at 6:18 AM, Vivek Goyal <vgo...@redhat.com> wrote:
>
>> - implement crashkernel_no_auto_low option to opt out of auto reserved
>> low memory
>
> No, that is ugly.
..
>
> It's *you* want me to change "Crash kernel low" to "Crash kernel".
>
> Do we need to drop second patch? So will still keep
> "Crash kernel low" in /proc/iomem?

also we can drop the last patch and keep "crashkernel_high=" and
"crashkernel_low="

as you even like to introduce "crashkernel_no_auto_low".

Vivek Goyal

unread,
Apr 3, 2013, 1:37:13 PM4/3/13
to Yinghai Lu, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, WANG Chao, Eric W. Biederman, Linux Kernel Mailing List
On Wed, Apr 03, 2013 at 10:12:46AM -0700, Yinghai Lu wrote:

[..]
> >> Can we just keep it separated?
> >
> > Kernel does not know about old kexec-tools or new kexec-tools. Neither
> > kernel can enforce what command line options are passed by user. So
> > kernel needs to define a clean interface here which is easily understood
> > and is extensible also in future.
>
> Looks you are chasing wrong direction.

I don't think so. Once we are defining bunch of crashkernel= parameters
we need to make sure it is well understood how do they work together.

>
> Those four patches fixes the regression that Wang and you reported,
> User don't need to change their kexec-tools and boot command lines
> kdump still works.

I am not objecting to that. I am objecting to introducing unexpected
behavior in crashkernel= command line space and ignoring how does
it co-exist with existing syntax.

>
> We will never can stop user doing crazy thing with their system.

Yes, but we need to make sure if bunch of crashkernel= are passed on
kernel command line then behavior is well defined and extensible for
future usage.

[..]
> > But why are we tying ;low to ;high. One should be easily extend
> > crashkernel=X to be able to reserve memory above 4G if specified amount
> > is not available below 4G. In that case also one might want to reserve
> > some low memory?
>
> I want to keep crashkernel=X to the old behavior.
>
> If you want to have crashkernel=X to allocate high above 4G, old kexec-tools
> will not work with new kernel.

It does not work anyway. Because current crashkernel=X will fail to
reserve memory if sufficient memory is not available below 896MB. So no
surprises there.

It will help new kexec-tools continue to work with crashkernel=X even
in high memory ranges.

> >
> > For that matter crashkernel=range1:size,range2:size syntax should be
> > extendible too to reserve memory above 4G if desired size of memory
> > is not available in low memory.
> >
> > Now in those cases too, one would like to have 72M of low memory
> > reserved. So ;low shoud not be tied to ;high necessarily.
> >
> > In fact current code does not care whetehr ;high was specified or not.
> > If memory is reserved above 4G, ;low code will kick in.
>
> No, that is not right.
>
> only when ;high is specified, kernel will try to allocate high above 4G.

As of today. But one can always extend crashkernel=X to allocate from
high addresses if memory is not available in low addresses without
breaking old tools.

Current ;low logic does not care whether high reservation was done using
crashkernel=X or crashkernel=X;high. And it should not. Tying ;low with
;high is the problem. Each crashkernel= directive should be able to
specify its own range to reserve with constraints. There is no dependency
on other crashkernel= options passed. And that keeps the behavior well
defined.

If we start introducing dependency between various crashkernel= options
our behavior matrix will explode and it will be very difficult to explain
how does it work.

[..]
> > We really need to stick to the notion of only one crashkernel= option
> > is accepted and that is last one on command line. And if need be,
> > we need to work on multi range reservation feature where we process
> > and reserve ranges as specified by all crashkernel= parameters on
> > command line.
>
> That is kept.
>
> and only last high is honored

What about rest of the crashkernel=. I am not sure why are you not
seeing that you are stepping onto to already defined crashkernel=
command line option and breaking its semantics.

If you were defining crashkernel_foo, I couldn't care less.

Given the fact you are using crashkernel=, you need to take already
defined parameters in to consideration and stick to those semantics.

- Either support single range reservation and always use rightmost
crashkernel= option.

- Or support multiple ranges and process all the crashkernel= options
as specified on command line.

Please don't define more modes here.

>
> >
> > Creating new combinations where some crashkernel= are preferred over
> > others and some crashkernel= options work with only selected crashkernel=
> > options, is asking for trouble, especially keeping in mind future
> > extensions.
>
> I don't think so.
>
> old conf that works before still use crashkernel= with high and low.
> old conf that does not work, could switch to crashkernel=;high/low
> with new kexec-tools

Please come out of this old conf and new conf mode. That is specific
usage of interface you are providing. But interface semantics should
still be well defined. And currently they are not.

>
> >
> > I prefer following for 3.9.
> >
> > - process only right most crashkernel= option.
>
> what is "right most" ?
> only last crashkernel=X is honored?
> I restored that already with those four patches.

only last crashkernel= is honored. It could be either crashkernel=X
or crashkernel=X;high or crashkernel=X;<option1>;<option2>;....

Point being crashkernel= space is taken and there is scope for defining
more options to it as our needs grow.

>
> > - implement crashkernel_no_auto_low option to opt out of auto reserved
> > low memory
>
> No, that is ugly.

It might be but it is better than special casing ;high or ;low.

>
> > - implement crashkernel=X;high to support high memory reservations.
> >
> > And now old kexec-tools user can use crashkernel=X while users needing
> > high memory reservation can use crashkernel=X;high.
>
> The four patches did not do that?

I am objecting to.

- Not parsing right most crashkernel=
- Tying ;high and ;low together.

>
> >
> > If you really want to support user defined crashkernel=X;low along with
> > crashkernel=Y;high, that is really a multi range reservation feature and
> > need to be implemented properly instead of coming up with short cuts.
>
> No it is not.
>
> It's *you* want me to change "Crash kernel low" to "Crash kernel".
>
> Do we need to drop second patch? So will still keep
> "Crash kernel low" in /proc/iomem?

No. I want to keep everything named "Crash kernel". There is no point
in naming low memory as "Crash kernel low".

Just that right now we don't support reserving multiple range feature,
execpt the exception of a low memory reserved automatically. And we
are providing a parameter to disable reservation of low memory.

If we don't support multiple ranges, don't try to special case it by
;high and ;low combination. Then one needs to parse all crashkernel=
options passed on command line (including crashkernel=X) and reserve
all the ranges as specified by various crashkernel= options.

You can't say, well ;high is specified so I am going to process only
crashkernel=X;high and crashkernel=Y;low and ignore rest of the
crashkernel= options. That does not make sense to me.

Thanks
Vivek

Vivek Goyal

unread,
Apr 3, 2013, 1:47:58 PM4/3/13
to Yinghai Lu, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, WANG Chao, Eric W. Biederman, Linux Kernel Mailing List
On Wed, Apr 03, 2013 at 10:32:23AM -0700, Yinghai Lu wrote:
> On Wed, Apr 3, 2013 at 10:12 AM, Yinghai Lu <yin...@kernel.org> wrote:
> > On Wed, Apr 3, 2013 at 6:18 AM, Vivek Goyal <vgo...@redhat.com> wrote:
> >
> >> - implement crashkernel_no_auto_low option to opt out of auto reserved
> >> low memory
> >
> > No, that is ugly.
> ...
> >
> > It's *you* want me to change "Crash kernel low" to "Crash kernel".
> >
> > Do we need to drop second patch? So will still keep
> > "Crash kernel low" in /proc/iomem?
>
> also we can drop the last patch and keep "crashkernel_high=" and
> "crashkernel_low="

as hpa mentioned, we should express memory reservation and dependency
of it in crashkernel= options. So introducing crashkernel_high or
crashkernel_low, just because you we don't want to support multiple
ranges is a kludge.

>
> as you even like to introduce "crashkernel_no_auto_low".

This is a kludge too for ease of use. At least it does not spoil
crashkernel= space and also works with existing crashkernel=X
parameters.

You know what, I think multiple ranges has another problem. And that is
all of the kexec/kdump code is written thinking there is one contiguous
reserved range.

/* Verify we have a valid entry point */
if ((entry < crashk_res.start) || (entry > crashk_res.end)) {
result = -EADDRNOTAVAIL;
goto out;
}


Also look at crash_shrink_memory().

So what I am saying that all our code is written assuming there is one
single reserved range. Now if we need to reserve two ranges, then let
us make it generic to suppoprt multiple ranges instead of hardcoding
things and assume there can be 2 ranges. That will be a more generic
solution.

So how about this.

- In 3.9, just implement crashkernel=X;high. Don't auto reserve any low
memory. Support reservation of single range only. It could be either
high or low.

- Those who are using iommu, they can use crashkernel=X;high. Old code
can continue to use crashkernel=X and get memory reserved in low
memory areas.

- In 3.10 add a feature to support multiple crash reserved ranges.

Thanks
Vivek

Yinghai Lu

unread,
Apr 3, 2013, 4:39:03 PM4/3/13
to Vivek Goyal, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, WANG Chao, Eric W. Biederman, Linux Kernel Mailing List
On Wed, Apr 3, 2013 at 10:47 AM, Vivek Goyal <vgo...@redhat.com> wrote:
> So what I am saying that all our code is written assuming there is one
> single reserved range. Now if we need to reserve two ranges, then let
> us make it generic to suppoprt multiple ranges instead of hardcoding
> things and assume there can be 2 ranges. That will be a more generic
> solution.

I don't think we have case that we need to support more than two ranges.

We only need to have one big range above 4G, and put second kernel and
initrd in it.
and another low one is only for switotlb or others that will be used by
second kernel.

>
> So how about this.
>
> - In 3.9, just implement crashkernel=X;high. Don't auto reserve any low
> memory. Support reservation of single range only. It could be either
> high or low.
>
> - Those who are using iommu, they can use crashkernel=X;high. Old code
> can continue to use crashkernel=X and get memory reserved in low
> memory areas.

That will not handle the case: big system that crashkernel=X;high
and kdump does not work with iommu.

>
> - In 3.10 add a feature to support multiple crash reserved ranges.

Again, we only need one high and one low range.
We don't need to support more than two ranges for crash kernel.

Thanks

Yinghai

Vivek Goyal

unread,
Apr 3, 2013, 5:24:27 PM4/3/13
to Yinghai Lu, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, WANG Chao, Eric W. Biederman, Linux Kernel Mailing List
On Wed, Apr 03, 2013 at 01:38:56PM -0700, Yinghai Lu wrote:
> On Wed, Apr 3, 2013 at 10:47 AM, Vivek Goyal <vgo...@redhat.com> wrote:
> > So what I am saying that all our code is written assuming there is one
> > single reserved range. Now if we need to reserve two ranges, then let
> > us make it generic to suppoprt multiple ranges instead of hardcoding
> > things and assume there can be 2 ranges. That will be a more generic
> > solution.
>
> I don't think we have case that we need to support more than two ranges.
>

never say never.

One of the biggest problems is that we are trying to reserve all the
memory at system bootup time using kernel command line.

Why not reserve memory using some kernel interface after system has booted.
That will make life much simpler.

Reason being that currently we depend on one big single chunk being
available in low memory area. And after boot there is no guarantee
we will have that memory.

But what if we just reserve enough memory for kernel and initramfs
during boot and rest of the memory is reserved from user space. If
system has more memory, there are high chances that after boot we will
get memory immediately.

What if we don't get a single range of memory, but multiple ranges,
we should still be able to make use of all those ranges. And I think
that is one possible area where multiple ranges could be useful.

So yes, we don't have an immediate use case, but one can always pop
up in future as we try to extend the functionality.


> We only need to have one big range above 4G, and put second kernel and
> initrd in it.
> and another low one is only for switotlb or others that will be used by
> second kernel.
>
> >
> > So how about this.
> >
> > - In 3.9, just implement crashkernel=X;high. Don't auto reserve any low
> > memory. Support reservation of single range only. It could be either
> > high or low.
> >
> > - Those who are using iommu, they can use crashkernel=X;high. Old code
> > can continue to use crashkernel=X and get memory reserved in low
> > memory areas.
>
> That will not handle the case: big system that crashkernel=X;high
> and kdump does not work with iommu.

If we start supporting multiple ranges properly in 3.10, then it will.
And we have never supported it so it is not a regression. Delaying a
feature by a realease should be worth it in an attempt to do it
properly.

>
> >
> > - In 3.10 add a feature to support multiple crash reserved ranges.
>
> Again, we only need one high and one low range.
> We don't need to support more than two ranges for crash kernel.

For your current use case, you need one high and one low currently. But
there can always be scenarios where multiple crash ranges are required,
like reserving memory from user space.

Anyway, if you are adamant on hardcoding this stuff, please don't
use crashkernel= space. Continue to use crashkernel_high and/or
crashkernel_low options. That way it allows us to extend crashkernel=
in the form of crashkernel=X;<range-options>;<option2>;...
and support multiple range feature properly down the line and not be
bogged down with this new strange semantics.

Also docuemnt very clearly that specifying crashkernel_high ignores
any of the crashkernel= options.

Also what happens to reserve memory release interface. IIUC, there
is no way to release crashkernel_low memory from user space and
only memory reserved by crashkernel_high will be released?

Seriously, why are you rushing with this high/low hardcoding. Why
not wait for one more release and support multiple ranges properly
both in kernel and kexec-tools.

Thanks
Vivek

Yinghai Lu

unread,
Apr 3, 2013, 8:39:12 PM4/3/13
to Thomas Gleixner, Ingo Molnar, H. Peter Anvin, WANG Chao, Vivek Goyal, Eric W. Biederman, linux-...@vger.kernel.org
Vivek found some problems with old kexec-tools.

We keep the old crashkernel=X to old behavoir, so it will not break
old kexec-tools.
Add crashkernel=X,high to support new kexec-tools that supports loading high.
when high is used, memblock will search from top to low.
if the allocated one is above 4G, kernel will try to auto allocate
72M under 4G for swiotlb.
user could crashkernel=Y,low to change 72M to other value.

-v2: reorder the patch sequences
crashkernel=X,high, crashkernel=Y,low only handle simple form.
crashkernel=X will override crashkernel=X;high crashkernel=Y;low

Thanks

Yinghai

Yinghai Lu

unread,
Apr 3, 2013, 8:39:16 PM4/3/13
to Thomas Gleixner, Ingo Molnar, H. Peter Anvin, WANG Chao, Vivek Goyal, Eric W. Biederman, linux-...@vger.kernel.org, Yinghai Lu
Index: linux-2.6/Documentation/kernel-parameters.txt
===================================================================
--- linux-2.6.orig/Documentation/kernel-parameters.txt
+++ linux-2.6/Documentation/kernel-parameters.txt
@@ -596,9 +596,6 @@ bytes respectively. Such letter suffixes
is selected automatically. Check
Documentation/kdump/kdump.txt for further details.

- crashkernel_low=size[KMG]
- [KNL, x86] parts under 4G.
-
crashkernel=range1:size1[,range2:size2,...][@offset]
[KNL] Same as above, but depends on the memory
in the running system. The syntax of range is
@@ -606,6 +603,18 @@ bytes respectively. Such letter suffixes
a memory unit (amount[KMG]). See also
Documentation/kdump/kdump.txt for an example.

+ crashkernel_low=size[KMG]
+ [KNL, x86_64] range under 4G. When crashkernel= is
+ passed, kernel allocate physical memory region
+ above 4G, that cause second kernel crash on system
+ that need swiotlb later. Kernel would try to allocate
+ some region below 4G automatically. This one let
+ user to specify own low range under 4G for second
+ kernel instead.
+ 0: to disable low allocation on systems that do not
+ need swiotlb, that will save 72M low ram in first
+ kernel.
+
cs89x0_dma= [HW,NET]
Format: <dma>

Yinghai Lu

unread,
Apr 3, 2013, 8:39:22 PM4/3/13
to Thomas Gleixner, Ingo Molnar, H. Peter Anvin, WANG Chao, Vivek Goyal, Eric W. Biederman, linux-...@vger.kernel.org, Yinghai Lu
Vivek found old kexec-tools does not work new kernel anymore.

So change back crashkernel= back to old behavoir, and add crashkernel_high=
to let user decide if buffer could be above 4G, and also new kexec-tools will
be needed.

v2: let crashkernel=X override crashkernel_high=
update description about _high will be ignored by crashkernel=X

Signed-off-by: Yinghai Lu <yin...@kernel.org>

---
Documentation/kernel-parameters.txt | 11 +++++++++--
arch/x86/kernel/setup.c | 24 +++++++++++++++++++-----
include/linux/kexec.h | 2 ++
kernel/kexec.c | 9 +++++++++
4 files changed, 39 insertions(+), 7 deletions(-)

Index: linux-2.6/Documentation/kernel-parameters.txt
===================================================================
--- linux-2.6.orig/Documentation/kernel-parameters.txt
+++ linux-2.6/Documentation/kernel-parameters.txt
@@ -603,9 +603,14 @@ bytes respectively. Such letter suffixes
a memory unit (amount[KMG]). See also
Documentation/kdump/kdump.txt for an example.

+ crashkernel_high=size[KMG]
+ [KNL, x86_64] range could be above 4G. Allow kernel
+ to allocate physical memory region from top, so could
+ be above 4G if system have more than 4G ram installed.
+ It will be ignored if crashkernel=X is specified.
crashkernel_low=size[KMG]
- [KNL, x86_64] range under 4G. When crashkernel= is
- passed, kernel allocate physical memory region
+ [KNL, x86_64] range under 4G. When crashkernel_high= is
+ passed, kernel could allocate physical memory region
above 4G, that cause second kernel crash on system
that need swiotlb later. Kernel would try to allocate
some region below 4G automatically. This one let
@@ -614,6 +619,8 @@ bytes respectively. Such letter suffixes
0: to disable low allocation on systems that do not
need swiotlb, that will save 72M low ram in first
kernel.
+ It will be ignored when crashkernel_high=X is not used
+ or return from that is below 4G.

cs89x0_dma= [HW,NET]
Format: <dma>
Index: linux-2.6/arch/x86/kernel/setup.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/setup.c
+++ linux-2.6/arch/x86/kernel/setup.c
@@ -505,11 +505,14 @@ static void __init memblock_x86_reserve_
/*
* Keep the crash kernel below this limit. On 32 bits earlier kernels
* would limit the kernel to the low 512 MiB due to mapping restrictions.
+ * On 64bit, old kexec-tools need to under 896MiB.
*/
#ifdef CONFIG_X86_32
-# define CRASH_KERNEL_ADDR_MAX (512 << 20)
+# define CRASH_KERNEL_ADDR_LOW_MAX (512 << 20)
+# define CRASH_KERNEL_ADDR_HIGH_MAX (512 << 20)
#else
-# define CRASH_KERNEL_ADDR_MAX MAXMEM
+# define CRASH_KERNEL_ADDR_LOW_MAX (896UL<<20)
+# define CRASH_KERNEL_ADDR_HIGH_MAX MAXMEM
#endif

static void __init reserve_crashkernel_low(void)
@@ -523,6 +526,7 @@ static void __init reserve_crashkernel_l
int ret;

total_low_mem = memblock_mem_size(1UL<<(32-PAGE_SHIFT));
+ /* crashkernel_low=YM */
ret = parse_crashkernel_low(boot_command_line, total_low_mem,
&low_size, &base);
if (ret != 0) {
@@ -566,14 +570,22 @@ static void __init reserve_crashkernel(v
const unsigned long long alignment = 16<<20; /* 16M */
unsigned long long total_mem;
unsigned long long crash_size, crash_base;
+ bool high = false;
int ret;

total_mem = memblock_phys_mem_size();

+ /* crashkernel=XM */
ret = parse_crashkernel(boot_command_line, total_mem,
&crash_size, &crash_base);
- if (ret != 0 || crash_size <= 0)
- return;
+ if (ret != 0 || crash_size <= 0) {
+ /* crashkernel_high=XM */
+ ret = parse_crashkernel_high(boot_command_line, total_mem,
+ &crash_size, &crash_base);
+ if (ret != 0 || crash_size <= 0)
+ return;
+ high = true;
+ }

/* 0 means: find the address automatically */
if (crash_base <= 0) {
@@ -581,7 +593,9 @@ static void __init reserve_crashkernel(v

Yinghai Lu

unread,
Apr 3, 2013, 8:39:34 PM4/3/13
to Thomas Gleixner, Ingo Molnar, H. Peter Anvin, WANG Chao, Vivek Goyal, Eric W. Biederman, linux-...@vger.kernel.org, Yinghai Lu
Per hpa, use crashkernel=XM;high crashkernel=YM;low instead of
crashkernel_hign=XM crashkernel_low=YM. As that could be extensible.

-v2: according to Vivek, change delimiter to ;
-v3: let hign and low only handle simple form and it conforms to
description in kernel-parameters.txt
still keep crashkernel=X override any crashkernel=X,high
crashkernel=Y,low

Signed-off-by: Yinghai Lu <yin...@kernel.org>

---
Documentation/kernel-parameters.txt | 10 +--
arch/x86/kernel/setup.c | 6 +-
kernel/kexec.c | 94 +++++++++++++++++++++++++++---------
3 files changed, 80 insertions(+), 30 deletions(-)

Index: linux-2.6/Documentation/kernel-parameters.txt
===================================================================
--- linux-2.6.orig/Documentation/kernel-parameters.txt
+++ linux-2.6/Documentation/kernel-parameters.txt
@@ -603,14 +603,14 @@ bytes respectively. Such letter suffixes
a memory unit (amount[KMG]). See also
Documentation/kdump/kdump.txt for an example.

- crashkernel_high=size[KMG]
+ crashkernel=size[KMG];high
[KNL, x86_64] range could be above 4G. Allow kernel
to allocate physical memory region from top, so could
be above 4G if system have more than 4G ram installed.
It will be ignored if crashkernel=X is specified.
- crashkernel_low=size[KMG]
- [KNL, x86_64] range under 4G. When crashkernel_high= is
- passed, kernel could allocate physical memory region
+ crashkernel=size[KMG];low
+ [KNL, x86_64] range under 4G. When crashkernel=X;high
+ is passed, kernel could allocate physical memory region
above 4G, that cause second kernel crash on system
that need swiotlb later. Kernel would try to allocate
some region below 4G automatically. This one let
@@ -619,7 +619,7 @@ bytes respectively. Such letter suffixes
0: to disable low allocation on systems that do not
need swiotlb, that will save 72M low ram in first
kernel.
- It will be ignored when crashkernel_high=X is not used
+ It will be ignored when crashkernel=X;high is not used
or return from that is below 4G.

cs89x0_dma= [HW,NET]
Index: linux-2.6/arch/x86/kernel/setup.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/setup.c
+++ linux-2.6/arch/x86/kernel/setup.c
@@ -526,7 +526,7 @@ static void __init reserve_crashkernel_l
int ret;

total_low_mem = memblock_mem_size(1UL<<(32-PAGE_SHIFT));
- /* crashkernel_low=YM */
+ /* crashkernel=YM;low */
ret = parse_crashkernel_low(boot_command_line, total_low_mem,
&low_size, &base);
if (ret != 0) {
@@ -539,7 +539,7 @@ static void __init reserve_crashkernel_l
low_size = swiotlb_size_or_default() + (8UL<<20);
auto_set = true;
} else {
- /* passed with crashkernel_low=0 ? */
+ /* passed with crashkernel=0;low ? */
if (!low_size)
return;
}
@@ -579,7 +579,7 @@ static void __init reserve_crashkernel(v
ret = parse_crashkernel(boot_command_line, total_mem,
&crash_size, &crash_base);
if (ret != 0 || crash_size <= 0) {
- /* crashkernel_high=XM */
+ /* crashkernel=XM;high */
ret = parse_crashkernel_high(boot_command_line, total_mem,
&crash_size, &crash_base);
if (ret != 0 || crash_size <= 0)
Index: linux-2.6/kernel/kexec.c
===================================================================
--- linux-2.6.orig/kernel/kexec.c
+++ linux-2.6/kernel/kexec.c
@@ -1360,7 +1360,7 @@ static int __init parse_crashkernel_simp

if (*cur == '@')
*crash_base = memparse(cur+1, &cur);
- else if (*cur != ' ' && *cur != '\0') {
+ else if (*cur != ' ' && *cur != ';' && *cur != '\0') {
pr_warning("crashkernel: unrecognized char\n");
return -EINVAL;
}
@@ -1368,58 +1368,108 @@ static int __init parse_crashkernel_simp
return 0;
}

-/*
- * That function is the entry point for command line parsing and should be
- * called from the arch-specific code.
- */
+#define SUFFIX_HIGH 0
+#define SUFFIX_LOW 1
+#define SUFFIX_NULL 2
+static __initdata char *suffix_tbl[] = {
+ [SUFFIX_HIGH] = ";high",
+ [SUFFIX_LOW] = ";low",
+ [SUFFIX_NULL] = NULL,
+};
+
+static __init char *get_last_crashkernel(char *cmdline,
+ const char *name,
+ const char *suffix)
+{
+ char *p = cmdline, *ck_cmdline = NULL;
+
+ /* find crashkernel and use the last one if there are more */
+ p = strstr(p, name);
+ while (p) {
+ char *end_p = strchr(p, ' ');
+ char *q;
+
+ if (!end_p)
+ end_p = p + strlen(p);
+
+ if (!suffix) {
+ int i;
+
+ /* skip the one with any known suffix */
+ for (i = 0; suffix_tbl[i]; i++) {
+ q = end_p - strlen(suffix_tbl[i]);
+ if (!strncmp(q, suffix_tbl[i],
+ strlen(suffix_tbl[i])))
+ goto next;
+ }
+ ck_cmdline = p;
+ } else {
+ q = end_p - strlen(suffix);
+ if (!strncmp(q, suffix, strlen(suffix)))
+ ck_cmdline = p;
+ }
+next:
+ p = strstr(p+1, name);
+ }
+
+ if (!ck_cmdline)
+ return NULL;
+
+ ck_cmdline += strlen(name);
+
+ return ck_cmdline;
+}
+
static int __init __parse_crashkernel(char *cmdline,
unsigned long long system_ram,
unsigned long long *crash_size,
unsigned long long *crash_base,
- const char *name)
+ const char *name,
+ const char *suffix,
+ bool simple_only)
{
- char *p = cmdline, *ck_cmdline = NULL;
char *first_colon, *first_space;
+ char *ck_cmdline;

BUG_ON(!crash_size || !crash_base);
*crash_size = 0;
*crash_base = 0;

- /* find crashkernel and use the last one if there are more */
- p = strstr(p, name);
- while (p) {
- ck_cmdline = p;
- p = strstr(p+1, name);
- }
+ ck_cmdline = get_last_crashkernel(cmdline, name, suffix);

if (!ck_cmdline)
return -EINVAL;

- ck_cmdline += strlen(name);
-
/*
* if the commandline contains a ':', then that's the extended
* syntax -- if not, it must be the classic syntax
*/
first_colon = strchr(ck_cmdline, ':');
first_space = strchr(ck_cmdline, ' ');
- if (first_colon && (!first_space || first_colon < first_space))
- return parse_crashkernel_mem(ck_cmdline, system_ram,
- crash_size, crash_base);
- else
+ if (first_colon && (!first_space || first_colon < first_space)) {
+ if (simple_only)
+ return -EINVAL;
+ else
+ return parse_crashkernel_mem(ck_cmdline, system_ram,
+ crash_size, crash_base);
+ } else
return parse_crashkernel_simple(ck_cmdline, crash_size,
crash_base);

return 0;
}

+/*
+ * That function is the entry point for command line parsing and should be
+ * called from the arch-specific code.
+ */
int __init parse_crashkernel(char *cmdline,
unsigned long long system_ram,
unsigned long long *crash_size,
unsigned long long *crash_base)
{
return __parse_crashkernel(cmdline, system_ram, crash_size, crash_base,
- "crashkernel=");
+ "crashkernel=", NULL, false);
}

int __init parse_crashkernel_high(char *cmdline,
@@ -1428,7 +1478,7 @@ int __init parse_crashkernel_high(char *
unsigned long long *crash_base)
{
return __parse_crashkernel(cmdline, system_ram, crash_size, crash_base,
- "crashkernel_high=");
+ "crashkernel=", suffix_tbl[SUFFIX_HIGH], true);
}

int __init parse_crashkernel_low(char *cmdline,
@@ -1437,7 +1487,7 @@ int __init parse_crashkernel_low(char *c
unsigned long long *crash_base)
{
return __parse_crashkernel(cmdline, system_ram, crash_size, crash_base,
- "crashkernel_low=");
+ "crashkernel=", suffix_tbl[SUFFIX_LOW], true);
}

static void update_vmcoreinfo_note(void)

Yinghai Lu

unread,
Apr 3, 2013, 8:39:53 PM4/3/13
to Thomas Gleixner, Ingo Molnar, H. Peter Anvin, WANG Chao, Vivek Goyal, Eric W. Biederman, linux-...@vger.kernel.org, Yinghai Lu
We can extend kexec-tools to support multiple "Crash kernel" in /proc/iomem
instead.

So we can use "Crash kernel" instead of "Crash kernel low" in /proc/iomem.

Suggested-by: Vivek Goyal <vgo...@redhat.com>
Signed-off-by: Yinghai Lu <yin...@kernel.org>

---
kernel/kexec.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux-2.6/kernel/kexec.c
===================================================================
--- linux-2.6.orig/kernel/kexec.c
+++ linux-2.6/kernel/kexec.c
@@ -55,7 +55,7 @@ struct resource crashk_res = {
.flags = IORESOURCE_BUSY | IORESOURCE_MEM
};
struct resource crashk_low_res = {
- .name = "Crash kernel low",
+ .name = "Crash kernel",
.start = 0,
.end = 0,
.flags = IORESOURCE_BUSY | IORESOURCE_MEM

Yinghai Lu

unread,
Apr 3, 2013, 8:56:08 PM4/3/13
to Vivek Goyal, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, WANG Chao, Eric W. Biederman, Linux Kernel Mailing List
On Wed, Apr 3, 2013 at 2:00 PM, Vivek Goyal <vgo...@redhat.com> wrote:
> On Wed, Apr 03, 2013 at 01:38:56PM -0700, Yinghai Lu wrote:
>> On Wed, Apr 3, 2013 at 10:47 AM, Vivek Goyal <vgo...@redhat.com> wrote:
>> > So what I am saying that all our code is written assuming there is one
>> > single reserved range. Now if we need to reserve two ranges, then let
>> > us make it generic to suppoprt multiple ranges instead of hardcoding
>> > things and assume there can be 2 ranges. That will be a more generic
>> > solution.
>>
>> I don't think we have case that we need to support more than two ranges.
>>
>
> never say never.

One big rang under 4G is working well till second kernel need more than 512M
on bigger system with more memory.

Now we make the 64bit kernel/initrd etc could stay above 4G, we can have
enough range above 4G.

The only problem is during that second kernel running it need some ram
under 4G when iommu can not be used.

So have one big range above 4G, and a small range below 4g like 72M
is rational in a long run.

>
> One of the biggest problems is that we are trying to reserve all the
> memory at system bootup time using kernel command line.
>
> Why not reserve memory using some kernel interface after system has booted.
> That will make life much simpler.
>
> Reason being that currently we depend on one big single chunk being
> available in low memory area. And after boot there is no guarantee
> we will have that memory.
>
> But what if we just reserve enough memory for kernel and initramfs
> during boot and rest of the memory is reserved from user space. If
> system has more memory, there are high chances that after boot we will
> get memory immediately.

Allocate 72M continuous below 4G after system is booted up?

>
> What if we don't get a single range of memory, but multiple ranges,
> we should still be able to make use of all those ranges. And I think
> that is one possible area where multiple ranges could be useful.

swiotlb does not need continuous area?

>
> So yes, we don't have an immediate use case, but one can always pop
> up in future as we try to extend the functionality.

I don't think we need to extend it anymore with high/low two range support.
will change back to crashkernel= override crashkernel_high.

>
> Also what happens to reserve memory release interface. IIUC, there
> is no way to release crashkernel_low memory from user space and
> only memory reserved by crashkernel_high will be released?

can not decode.


>
> Seriously, why are you rushing with this high/low hardcoding. Why
> not wait for one more release and support multiple ranges properly
> both in kernel and kexec-tools.

As more then 2 range is not needed.
These four patches are complete solution.

Thanks

Yinghai

HATAYAMA Daisuke

unread,
Apr 4, 2013, 2:55:45 AM4/4/13
to Yinghai Lu, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, WANG Chao, Vivek Goyal, Eric W. Biederman, linux-...@vger.kernel.org
(2013/04/04 9:38), Yinghai Lu wrote:

> Index: linux-2.6/kernel/kexec.c
> ===================================================================
> --- linux-2.6.orig/kernel/kexec.c
> +++ linux-2.6/kernel/kexec.c
> @@ -1360,7 +1360,7 @@ static int __init parse_crashkernel_simp
>
> if (*cur == '@')
> *crash_base = memparse(cur+1, &cur);
> - else if (*cur != ' ' && *cur != '\0') {
> + else if (*cur != ' ' && *cur != ';' && *cur != '\0') {
> pr_warning("crashkernel: unrecognized char\n");
> return -EINVAL;
> }

As I said below, ";high" or ";low" check should be here. It would be
enough to replace the condition *cur != ';' by strncmp(cur, ";high", 5)
|| strncmp(cur, ";low", 4).

> @@ -1368,58 +1368,108 @@ static int __init parse_crashkernel_simp
> return 0;
> }
>
> -/*
> - * That function is the entry point for command line parsing and should be
> - * called from the arch-specific code.
> - */
> +#define SUFFIX_HIGH 0
> +#define SUFFIX_LOW 1
> +#define SUFFIX_NULL 2
> +static __initdata char *suffix_tbl[] = {
> + [SUFFIX_HIGH] = ";high",
> + [SUFFIX_LOW] = ";low",
> + [SUFFIX_NULL] = NULL,
> +};
> +
> +static __init char *get_last_crashkernel(char *cmdline,
> + const char *name,
> + const char *suffix)
> +{
> + char *p = cmdline, *ck_cmdline = NULL;
> +
> + /* find crashkernel and use the last one if there are more */

Why did you choose the last one? Is there any reason you didn't choose
the first one?

Also, it's better to describe this bahaviour in
Documentations/kernel-parameter.txt.

> + p = strstr(p, name);
> + while (p) {
> + char *end_p = strchr(p, ' ');
> + char *q;
> +
> + if (!end_p)
> + end_p = p + strlen(p);
> +
> + if (!suffix) {
> + int i;
> +
> + /* skip the one with any known suffix */
> + for (i = 0; suffix_tbl[i]; i++) {
> + q = end_p - strlen(suffix_tbl[i]);
> + if (!strncmp(q, suffix_tbl[i],
> + strlen(suffix_tbl[i])))
> + goto next;
> + }
> + ck_cmdline = p;
> + } else {
> + q = end_p - strlen(suffix);
> + if (!strncmp(q, suffix, strlen(suffix)))
> + ck_cmdline = p;
> + }

It looks to me that this function does more than its name suggests. It
seems to me enough to get the last occurence of "crashkernel=<some
value>" and to leave the "<some value>" unknown for now.

The current code of yours checks if each "crashkernel=<some value>"
detected by strstr() ends with each of ";high" or ";low", but doesn't
check the formeter letters at all; e.g, "crashkernel=foobar;high" is passed.

Also, this function can be called in different contexts: from a variant
of parse_crashkernel_*(). Is it better to move this function in
reserve_crashkernel() and then pass the obtained "crashkernel=<some
value>" to a variant of parse_crashkernel_*() functions?

Thanks.
HATAYAMA, Daisuke

HATAYAMA Daisuke

unread,
Apr 4, 2013, 4:12:05 AM4/4/13
to Yinghai Lu, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, WANG Chao, Vivek Goyal, Eric W. Biederman, linux-...@vger.kernel.org
(2013/04/04 9:38), Yinghai Lu wrote:

The variable default_size seems no longer necessary. IO_TLB_DEFAULT_SIZE
should be used directly.

>
> Index: linux-2.6/Documentation/kernel-parameters.txt
> ===================================================================
> --- linux-2.6.orig/Documentation/kernel-parameters.txt
> +++ linux-2.6/Documentation/kernel-parameters.txt
> @@ -596,9 +596,6 @@ bytes respectively. Such letter suffixes
> is selected automatically. Check
> Documentation/kdump/kdump.txt for further details.
>
> - crashkernel_low=size[KMG]
> - [KNL, x86] parts under 4G.
> -
> crashkernel=range1:size1[,range2:size2,...][@offset]
> [KNL] Same as above, but depends on the memory
> in the running system. The syntax of range is
> @@ -606,6 +603,18 @@ bytes respectively. Such letter suffixes
> a memory unit (amount[KMG]). See also
> Documentation/kdump/kdump.txt for an example.
>
> + crashkernel_low=size[KMG]
> + [KNL, x86_64] range under 4G. When crashkernel= is
> + passed, kernel allocate physical memory region
> + above 4G, that cause second kernel crash on system
> + that need swiotlb later. Kernel would try to allocate
> + some region below 4G automatically. This one let
> + user to specify own low range under 4G for second
> + kernel instead.

In fact, swiotlb caused the 2nd kernel to fail to boot this time, but in
general, other components that require low memory can cause the same
situation. I think it better to avoid the description that only swiotlb
can cause the 2nd kernel to fail to boot.

So, how about this?

.. that cause second kernel crash on system that require some amount of
low memory, e.g. swiotlb that requires at least 72MB low memory at
default. Kernel would ...

> + 0: to disable low allocation on systems that do not
> + need swiotlb, that will save 72M low ram in first
> + kernel.
> +

Similarly, I think it better not to say swiotlb thing here. Just:

0: to disable automatic low memory allocation.

Thanks.
HATAYAMA, Daisuke

Vivek Goyal

unread,
Apr 4, 2013, 9:42:26 AM4/4/13
to Yinghai Lu, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, WANG Chao, Eric W. Biederman, Linux Kernel Mailing List
On Wed, Apr 03, 2013 at 05:56:00PM -0700, Yinghai Lu wrote:
> On Wed, Apr 3, 2013 at 2:00 PM, Vivek Goyal <vgo...@redhat.com> wrote:
> > On Wed, Apr 03, 2013 at 01:38:56PM -0700, Yinghai Lu wrote:
> >> On Wed, Apr 3, 2013 at 10:47 AM, Vivek Goyal <vgo...@redhat.com> wrote:
> >> > So what I am saying that all our code is written assuming there is one
> >> > single reserved range. Now if we need to reserve two ranges, then let
> >> > us make it generic to suppoprt multiple ranges instead of hardcoding
> >> > things and assume there can be 2 ranges. That will be a more generic
> >> > solution.
> >>
> >> I don't think we have case that we need to support more than two ranges.
> >>
> >
> > never say never.
>
> One big rang under 4G is working well till second kernel need more than 512M
> on bigger system with more memory.
>
> Now we make the 64bit kernel/initrd etc could stay above 4G, we can have
> enough range above 4G.
>
> The only problem is during that second kernel running it need some ram
> under 4G when iommu can not be used.
>
> So have one big range above 4G, and a small range below 4g like 72M
> is rational in a long run.

That's one use case. You have not addressed the issue of reserving memory
after boot and not finding one contiguous range and finding smaller
contiguous ranges.

>
> >
> > One of the biggest problems is that we are trying to reserve all the
> > memory at system bootup time using kernel command line.
> >
> > Why not reserve memory using some kernel interface after system has booted.
> > That will make life much simpler.
> >
> > Reason being that currently we depend on one big single chunk being
> > available in low memory area. And after boot there is no guarantee
> > we will have that memory.
> >
> > But what if we just reserve enough memory for kernel and initramfs
> > during boot and rest of the memory is reserved from user space. If
> > system has more memory, there are high chances that after boot we will
> > get memory immediately.
>
> Allocate 72M continuous below 4G after system is booted up?
>
> >
> > What if we don't get a single range of memory, but multiple ranges,
> > we should still be able to make use of all those ranges. And I think
> > that is one possible area where multiple ranges could be useful.
>
> swiotlb does not need continuous area?

One chunk could be below 4G during boot. Say 128M in low memory area
which is sufficient to load kernel, initrd and swiotlb. Rest of the
memory reservation could come after boot on need basis and that does
not have to be contiguous.
[..]
> >
> > Also what happens to reserve memory release interface. IIUC, there
> > is no way to release crashkernel_low memory from user space and
> > only memory reserved by crashkernel_high will be released?
>
> can not decode.

How do we release low memory from user space. Look at
crash_shrink_memory().

Vivek Goyal

unread,
Apr 4, 2013, 9:51:27 AM4/4/13
to Yinghai Lu, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, WANG Chao, Eric W. Biederman, Linux Kernel Mailing List
On Wed, Apr 03, 2013 at 05:56:00PM -0700, Yinghai Lu wrote:

[..]
>
> One big rang under 4G is working well till second kernel need more than 512M
> on bigger system with more memory.

Currently one range is allocated below 896MB (and not 4G). So if we extend
crashkernel=X to first try below 896MB and then reserve below 4G, we are
just fine. And in fact we should be able to extend to even look beyond
4G if nothing suitable is available below 4G.

Thanks
Vivek

Vivek Goyal

unread,
Apr 4, 2013, 10:11:40 AM4/4/13
to Yinghai Lu, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, WANG Chao, Eric W. Biederman, linux-...@vger.kernel.org
On Wed, Apr 03, 2013 at 05:38:23PM -0700, Yinghai Lu wrote:

[..]
> + if (ret != 0) {
> + /*
> + * two parts from lib/swiotlb.c:
> + * swiotlb size: user specified with swiotlb= or default.
> + * swiotlb overflow buffer: now is hardcoded to 32k,
> + * round to 8M to cover more others.
> + */
> + low_size = swiotlb_size_or_default() + (8UL<<20);
> + auto_set = true;

What is the correlation between swiotlb size of first kernel and second
kernel. They might be completely different kernel and with different
default size for swiotlb buffers.

[..]
>
> + crashkernel_low=size[KMG]
> + [KNL, x86_64] range under 4G. When crashkernel= is
> + passed, kernel allocate physical memory region
> + above 4G,

This is not right. "When crashkernel=X;high is passed kernel first tries
to allocate memory above 4G". crashkernel=X does not try to allocate
memory above 4G.

Vivek

Vivek Goyal

unread,
Apr 4, 2013, 10:16:22 AM4/4/13
to Yinghai Lu, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, WANG Chao, Eric W. Biederman, linux-...@vger.kernel.org
Also specify "otherwise memory will be allocated below 4G, if available".

> + It will be ignored if crashkernel=X is specified.
> crashkernel_low=size[KMG]
> - [KNL, x86_64] range under 4G. When crashkernel= is
> - passed, kernel allocate physical memory region
> + [KNL, x86_64] range under 4G. When crashkernel_high= is
> + passed, kernel could allocate physical memory region
> above 4G, that cause second kernel crash on system
> that need swiotlb later. Kernel would try to allocate
> some region below 4G automatically. This one let
> @@ -614,6 +619,8 @@ bytes respectively. Such letter suffixes
> 0: to disable low allocation on systems that do not
> need swiotlb, that will save 72M low ram in first
> kernel.
> + It will be ignored when crashkernel_high=X is not used
> + or return from that is below 4G.

Replace "return from that" with "memory reserved".

Vivek

Yinghai Lu

unread,
Apr 4, 2013, 12:28:38 PM4/4/13
to Vivek Goyal, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, WANG Chao, Eric W. Biederman, Linux Kernel Mailing List
On Thu, Apr 4, 2013 at 7:11 AM, Vivek Goyal <vgo...@redhat.com> wrote:
> On Wed, Apr 03, 2013 at 05:38:23PM -0700, Yinghai Lu wrote:
>
> [..]
>> + if (ret != 0) {
>> + /*
>> + * two parts from lib/swiotlb.c:
>> + * swiotlb size: user specified with swiotlb= or default.
>> + * swiotlb overflow buffer: now is hardcoded to 32k,
>> + * round to 8M to cover more others.
>> + */
>> + low_size = swiotlb_size_or_default() + (8UL<<20);
>> + auto_set = true;
>
> What is the correlation between swiotlb size of first kernel and second
> kernel. They might be completely different kernel and with different
> default size for swiotlb buffers.
>

You are right.
That is most good guess assume they are the same.


> [..]
>>
>> + crashkernel_low=size[KMG]
>> + [KNL, x86_64] range under 4G. When crashkernel= is
>> + passed, kernel allocate physical memory region
>> + above 4G,
>
> This is not right. "When crashkernel=X;high is passed kernel first tries
> to allocate memory above 4G". crashkernel=X does not try to allocate
> memory above 4G.

This is first patch, the text will be updated in second and fourth patch.

Thanks

Yinghai

Yinghai Lu

unread,
Apr 4, 2013, 12:45:23 PM4/4/13
to HATAYAMA Daisuke, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, WANG Chao, Vivek Goyal, Eric W. Biederman, Linux Kernel Mailing List
Yes, but that does not hurt.
> ... that cause second kernel crash on system that require some amount of
> low memory, e.g. swiotlb that requires at least 72MB low memory at
> default. Kernel would ...

ok, will change that.

>
>> + 0: to disable low allocation on systems that do not
>> + need swiotlb, that will save 72M low ram in first
>> + kernel.
>> +
>
> Similarly, I think it better not to say swiotlb thing here. Just:
>
> 0: to disable automatic low memory allocation.

ok,

it will be:

---
crashkernel_low=size[KMG]
[KNL, x86_64] range under 4G. When crashkernel= is
passed, kernel allocate physical memory region
above 4G, that cause second kernel crash on system
that require some amount of low memory, e.g. swiotlb
requires at least 64M+32K low memory. Kernel would
try to allocate 72M below 4G automatically.
This one let user to specify own low range under 4G
for second kernel instead.
0: to disable low allocation.
---

Thanks

Yinghai

Yinghai Lu

unread,
Apr 4, 2013, 12:56:20 PM4/4/13
to Vivek Goyal, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, WANG Chao, Eric W. Biederman, Linux Kernel Mailing List
On Thu, Apr 4, 2013 at 7:16 AM, Vivek Goyal <vgo...@redhat.com> wrote:
>> + crashkernel_high=size[KMG]
>> + [KNL, x86_64] range could be above 4G. Allow kernel
>> + to allocate physical memory region from top, so could
>> + be above 4G if system have more than 4G ram installed.
>
> Also specify "otherwise memory will be allocated below 4G, if available".
>
..
>> @@ -614,6 +619,8 @@ bytes respectively. Such letter suffixes
>> 0: to disable low allocation on systems that do not
>> need swiotlb, that will save 72M low ram in first
>> kernel.
>> + It will be ignored when crashkernel_high=X is not used
>> + or return from that is below 4G.
>
> Replace "return from that" with "memory reserved".

updated.

Thanks

Yinghai

Yinghai Lu

unread,
Apr 4, 2013, 1:33:29 PM4/4/13
to HATAYAMA Daisuke, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, WANG Chao, Vivek Goyal, Eric W. Biederman, Linux Kernel Mailing List
On Wed, Apr 3, 2013 at 11:55 PM, HATAYAMA Daisuke
<d.hat...@jp.fujitsu.com> wrote:
> (2013/04/04 9:38), Yinghai Lu wrote:
>
>> Index: linux-2.6/kernel/kexec.c
>> ===================================================================
>> --- linux-2.6.orig/kernel/kexec.c
>> +++ linux-2.6/kernel/kexec.c
>> @@ -1360,7 +1360,7 @@ static int __init parse_crashkernel_simp
>>
>> if (*cur == '@')
>> *crash_base = memparse(cur+1, &cur);
>> - else if (*cur != ' ' && *cur != '\0') {
>> + else if (*cur != ' ' && *cur != ';' && *cur != '\0') {
>> pr_warning("crashkernel: unrecognized char\n");
>> return -EINVAL;
>> }
>
> As I said below, ";high" or ";low" check should be here. It would be
> enough to replace the condition *cur != ';' by strncmp(cur, ";high", 5)
> || strncmp(cur, ";low", 4).

Good catch, will use that strict checking.

>
>> @@ -1368,58 +1368,108 @@ static int __init parse_crashkernel_simp
>> return 0;
>> }
>>
>> -/*
>> - * That function is the entry point for command line parsing and should be
>> - * called from the arch-specific code.
>> - */
>> +#define SUFFIX_HIGH 0
>> +#define SUFFIX_LOW 1
>> +#define SUFFIX_NULL 2
>> +static __initdata char *suffix_tbl[] = {
>> + [SUFFIX_HIGH] = ";high",
>> + [SUFFIX_LOW] = ";low",
>> + [SUFFIX_NULL] = NULL,
>> +};
>> +
>> +static __init char *get_last_crashkernel(char *cmdline,
>> + const char *name,
>> + const char *suffix)
>> +{
>> + char *p = cmdline, *ck_cmdline = NULL;
>> +
>> + /* find crashkernel and use the last one if there are more */
>
> Why did you choose the last one? Is there any reason you didn't choose
> the first one?

I split the function out, and keep the old comments.

Also that is default behavior of early_param and __setup() handling,
last one will be take effective.
ok will move
ck_cmdline += strlen(name);

down.

>
> The current code of yours checks if each "crashkernel=<some value>"
> detected by strstr() ends with each of ";high" or ";low", but doesn't
> check the formeter letters at all; e.g, "crashkernel=foobar;high" is passed.

that should be ok, later parse...simple will report error.

>
> Also, this function can be called in different contexts: from a variant
> of parse_crashkernel_*(). Is it better to move this function in
> reserve_crashkernel() and then pass the obtained "crashkernel=<some
> value>" to a variant of parse_crashkernel_*() functions?

Do you mean calling get_last_crashkernel directly from reserve_crashkernel
directly?

I think we could keep parse_crashkernel_*() call it directly, so could have less
change in arch/x86/kernel/setup.c.

Thanks

Yinghai

Yinghai Lu

unread,
Apr 4, 2013, 6:17:33 PM4/4/13
to Thomas Gleixner, Ingo Molnar, H. Peter Anvin, WANG Chao, Vivek Goyal, Eric W. Biederman, linux-...@vger.kernel.org
Vivek found some problems with old kexec-tools.

We keep the old crashkernel=X to old behavoir, so it will not break
old kexec-tools.
Add crashkernel=X,high to support new kexec-tools that supports loading high.
when high is used, memblock will search from top to low.
if the allocated one is above 4G, kernel will try to auto allocate
72M under 4G for swiotlb.
user could crashkernel=Y,low to change 72M to other value.

-v2: reorder the patch sequences
crashkernel=X,high, crashkernel=Y,low only handle simple form.
crashkernel=X will override crashkernel=X;high crashkernel=Y;low
-v3: update description in kernel-parameters.txt
update get_last_crashkernel and _simple checking about suffix.

Yinghai Lu

unread,
Apr 4, 2013, 6:17:37 PM4/4/13
to Thomas Gleixner, Ingo Molnar, H. Peter Anvin, WANG Chao, Vivek Goyal, Eric W. Biederman, linux-...@vger.kernel.org, Yinghai Lu
Vivek found old kexec-tools does not work new kernel anymore.

So change back crashkernel= back to old behavoir, and add crashkernel_high=
to let user decide if buffer could be above 4G, and also new kexec-tools will
be needed.

-v2: let crashkernel=X override crashkernel_high=
update description about _high will be ignored by crashkernel=X
-v3: update description about kernel-parameters.txt according to Vivek.

Signed-off-by: Yinghai Lu <yin...@kernel.org>

---
Documentation/kernel-parameters.txt | 13 +++++++++++--
arch/x86/kernel/setup.c | 24 +++++++++++++++++++-----
include/linux/kexec.h | 2 ++
kernel/kexec.c | 9 +++++++++
4 files changed, 41 insertions(+), 7 deletions(-)

Index: linux-2.6/Documentation/kernel-parameters.txt
===================================================================
--- linux-2.6.orig/Documentation/kernel-parameters.txt
+++ linux-2.6/Documentation/kernel-parameters.txt
@@ -603,9 +603,16 @@ bytes respectively. Such letter suffixes
a memory unit (amount[KMG]). See also
Documentation/kdump/kdump.txt for an example.

+ crashkernel_high=size[KMG]
+ [KNL, x86_64] range could be above 4G. Allow kernel
+ to allocate physical memory region from top, so could
+ be above 4G if system have more than 4G ram installed.
+ Otherwise memory region will be allocated below 4G, if
+ available.
+ It will be ignored if crashkernel=X is specified.
crashkernel_low=size[KMG]
- [KNL, x86_64] range under 4G. When crashkernel= is
- passed, kernel allocate physical memory region
+ [KNL, x86_64] range under 4G. When crashkernel_high= is
+ passed, kernel could allocate physical memory region
above 4G, that cause second kernel crash on system
that require some amount of low memory, e.g. swiotlb
requires at least 64M+32K low memory. Kernel would
@@ -613,6 +620,8 @@ bytes respectively. Such letter suffixes
This one let user to specify own low range under 4G
for second kernel instead.
0: to disable low allocation.
+ It will be ignored when crashkernel_high=X is not used
+ or memory reserved is below 4G.

cs89x0_dma= [HW,NET]
Format: <dma>
Index: linux-2.6/arch/x86/kernel/setup.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/setup.c
+++ linux-2.6/arch/x86/kernel/setup.c
@@ -505,11 +505,14 @@ static void __init memblock_x86_reserve_
/*
* Keep the crash kernel below this limit. On 32 bits earlier kernels
* would limit the kernel to the low 512 MiB due to mapping restrictions.
+ * On 64bit, old kexec-tools need to under 896MiB.
*/
#ifdef CONFIG_X86_32
-# define CRASH_KERNEL_ADDR_MAX (512 << 20)
+# define CRASH_KERNEL_ADDR_LOW_MAX (512 << 20)
+# define CRASH_KERNEL_ADDR_HIGH_MAX (512 << 20)
#else
-# define CRASH_KERNEL_ADDR_MAX MAXMEM
+# define CRASH_KERNEL_ADDR_LOW_MAX (896UL<<20)
+# define CRASH_KERNEL_ADDR_HIGH_MAX MAXMEM
#endif

static void __init reserve_crashkernel_low(void)
@@ -523,6 +526,7 @@ static void __init reserve_crashkernel_l
int ret;

total_low_mem = memblock_mem_size(1UL<<(32-PAGE_SHIFT));
+ /* crashkernel_low=YM */
ret = parse_crashkernel_low(boot_command_line, total_low_mem,
&low_size, &base);
if (ret != 0) {
@@ -566,14 +570,22 @@ static void __init reserve_crashkernel(v
const unsigned long long alignment = 16<<20; /* 16M */
unsigned long long total_mem;
unsigned long long crash_size, crash_base;
+ bool high = false;
int ret;

total_mem = memblock_phys_mem_size();

+ /* crashkernel=XM */
ret = parse_crashkernel(boot_command_line, total_mem,
&crash_size, &crash_base);
int __init parse_crashkernel(char *cmdline, unsigned long long system_ram,
unsigned long long *crash_size, unsigned long long *crash_base);
+int parse_crashkernel_high(char *cmdline, unsigned long long system_ram,
+ unsigned long long *crash_size, unsigned long long *crash_base);
int parse_crashkernel_low(char *cmdline, unsigned long long system_ram,
unsigned long long *crash_size, unsigned long long *crash_base);
int crash_shrink_memory(unsigned long new_size);
Index: linux-2.6/kernel/kexec.c
===================================================================
--- linux-2.6.orig/kernel/kexec.c
+++ linux-2.6/kernel/kexec.c
@@ -1422,6 +1422,15 @@ int __init parse_crashkernel(char *cmdli
"crashkernel=");
}

+int __init parse_crashkernel_high(char *cmdline,
+ unsigned long long system_ram,
+ unsigned long long *crash_size,
+ unsigned long long *crash_base)
+{
+ return __parse_crashkernel(cmdline, system_ram, crash_size, crash_base,
+ "crashkernel_high=");
+}
+
int __init parse_crashkernel_low(char *cmdline,
unsigned long long system_ram,
unsigned long long *crash_size,

Yinghai Lu

unread,
Apr 4, 2013, 6:17:45 PM4/4/13
to Thomas Gleixner, Ingo Molnar, H. Peter Anvin, WANG Chao, Vivek Goyal, Eric W. Biederman, linux-...@vger.kernel.org, Yinghai Lu
We can extend kexec-tools to support multiple "Crash kernel" in /proc/iomem
instead.

So we can use "Crash kernel" instead of "Crash kernel low" in /proc/iomem.

Suggested-by: Vivek Goyal <vgo...@redhat.com>
Signed-off-by: Yinghai Lu <yin...@kernel.org>

---
kernel/kexec.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux-2.6/kernel/kexec.c
===================================================================
--- linux-2.6.orig/kernel/kexec.c
+++ linux-2.6/kernel/kexec.c
@@ -55,7 +55,7 @@ struct resource crashk_res = {
.flags = IORESOURCE_BUSY | IORESOURCE_MEM
};
struct resource crashk_low_res = {
- .name = "Crash kernel low",
+ .name = "Crash kernel",
.start = 0,
.end = 0,
.flags = IORESOURCE_BUSY | IORESOURCE_MEM

Yinghai Lu

unread,
Apr 4, 2013, 6:18:14 PM4/4/13
to Thomas Gleixner, Ingo Molnar, H. Peter Anvin, WANG Chao, Vivek Goyal, Eric W. Biederman, linux-...@vger.kernel.org, Yinghai Lu
Per hpa, use crashkernel=XM;high crashkernel=YM;low instead of
crashkernel_hign=XM crashkernel_low=YM. As that could be extensible.

-v2: according to Vivek, change delimiter to ;
-v3: let hign and low only handle simple form and it conforms to
description in kernel-parameters.txt
still keep crashkernel=X override any crashkernel=X,high
crashkernel=Y,low
-v4: update get_last_crashkernel returning and add more strict
checking in parse_crashkernel_simple() found by HATAYAMA.

Cc:HATAYAMA Daisuke <d.hat...@jp.fujitsu.com>
Signed-off-by: Yinghai Lu <yin...@kernel.org>

---
Documentation/kernel-parameters.txt | 10 +--
arch/x86/kernel/setup.c | 6 +-
kernel/kexec.c | 103 ++++++++++++++++++++++++++++--------
3 files changed, 89 insertions(+), 30 deletions(-)

Index: linux-2.6/Documentation/kernel-parameters.txt
===================================================================
--- linux-2.6.orig/Documentation/kernel-parameters.txt
+++ linux-2.6/Documentation/kernel-parameters.txt
@@ -603,16 +603,16 @@ bytes respectively. Such letter suffixes
a memory unit (amount[KMG]). See also
Documentation/kdump/kdump.txt for an example.

- crashkernel_high=size[KMG]
+ crashkernel=size[KMG];high
[KNL, x86_64] range could be above 4G. Allow kernel
to allocate physical memory region from top, so could
be above 4G if system have more than 4G ram installed.
Otherwise memory region will be allocated below 4G, if
available.
It will be ignored if crashkernel=X is specified.
- crashkernel_low=size[KMG]
- [KNL, x86_64] range under 4G. When crashkernel_high= is
- passed, kernel could allocate physical memory region
+ crashkernel=size[KMG];low
+ [KNL, x86_64] range under 4G. When crashkernel=X;high
+ is passed, kernel could allocate physical memory region
above 4G, that cause second kernel crash on system
that require some amount of low memory, e.g. swiotlb
requires at least 64M+32K low memory. Kernel would
@@ -620,7 +620,7 @@ bytes respectively. Such letter suffixes
This one let user to specify own low range under 4G
for second kernel instead.
0: to disable low allocation.
- It will be ignored when crashkernel_high=X is not used
+ It will be ignored when crashkernel=X;high is not used
or memory reserved is below 4G.

cs89x0_dma= [HW,NET]
Index: linux-2.6/arch/x86/kernel/setup.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/setup.c
+++ linux-2.6/arch/x86/kernel/setup.c
@@ -526,7 +526,7 @@ static void __init reserve_crashkernel_l
int ret;

total_low_mem = memblock_mem_size(1UL<<(32-PAGE_SHIFT));
- /* crashkernel_low=YM */
+ /* crashkernel=YM;low */
ret = parse_crashkernel_low(boot_command_line, total_low_mem,
&low_size, &base);
if (ret != 0) {
@@ -539,7 +539,7 @@ static void __init reserve_crashkernel_l
low_size = swiotlb_size_or_default() + (8UL<<20);
auto_set = true;
} else {
- /* passed with crashkernel_low=0 ? */
+ /* passed with crashkernel=0;low ? */
if (!low_size)
return;
}
@@ -579,7 +579,7 @@ static void __init reserve_crashkernel(v
ret = parse_crashkernel(boot_command_line, total_mem,
&crash_size, &crash_base);
if (ret != 0 || crash_size <= 0) {
- /* crashkernel_high=XM */
+ /* crashkernel=XM;high */
ret = parse_crashkernel_high(boot_command_line, total_mem,
&crash_size, &crash_base);
if (ret != 0 || crash_size <= 0)
Index: linux-2.6/kernel/kexec.c
===================================================================
--- linux-2.6.orig/kernel/kexec.c
+++ linux-2.6/kernel/kexec.c
@@ -1339,6 +1339,15 @@ static int __init parse_crashkernel_mem(
return 0;
}

+#define SUFFIX_HIGH 0
+#define SUFFIX_LOW 1
+#define SUFFIX_NULL 2
+static __initdata char *suffix_tbl[] = {
+ [SUFFIX_HIGH] = ";high",
+ [SUFFIX_LOW] = ";low",
+ [SUFFIX_NULL] = NULL,
+};
+
/*
* That function parses "simple" (old) crashkernel command lines like
*
@@ -1360,37 +1369,80 @@ static int __init parse_crashkernel_simp

if (*cur == '@')
*crash_base = memparse(cur+1, &cur);
- else if (*cur != ' ' && *cur != '\0') {
- pr_warning("crashkernel: unrecognized char\n");
- return -EINVAL;
+ else {
+ int i;
+
+ /* check with known suffix */
+ for (i = 0; suffix_tbl[i]; i++)
+ if (!strncmp(cur, suffix_tbl[i], strlen(suffix_tbl[i])))
+ return 0;
+
+ if (*cur != ' ' && *cur != '\0') {
+ pr_warning("crashkernel: unrecognized char\n");
+ return -EINVAL;
+ }
}

return 0;
}

-/*
- * That function is the entry point for command line parsing and should be
- * called from the arch-specific code.
- */
+static __init char *get_last_crashkernel(char *cmdline,
+ const char *name,
+ const char *suffix)
+{
+ char *p = cmdline, *ck_cmdline = NULL;
+
+ /* find crashkernel and use the last one if there are more */
+ p = strstr(p, name);
+ while (p) {
+ char *end_p = strchr(p, ' ');
+ char *q;
+
+ if (!end_p)
+ end_p = p + strlen(p);
+
+ if (!suffix) {
+ int i;
+
+ /* skip the one with any known suffix */
+ for (i = 0; suffix_tbl[i]; i++) {
+ q = end_p - strlen(suffix_tbl[i]);
+ if (!strncmp(q, suffix_tbl[i],
+ strlen(suffix_tbl[i])))
+ goto next;
+ }
+ ck_cmdline = p;
+ } else {
+ q = end_p - strlen(suffix);
+ if (!strncmp(q, suffix, strlen(suffix)))
+ ck_cmdline = p;
+ }
+next:
+ p = strstr(p+1, name);
+ }
+
+ if (!ck_cmdline)
+ return NULL;
+
+ return ck_cmdline;
+}
+
static int __init __parse_crashkernel(char *cmdline,
unsigned long long system_ram,
unsigned long long *crash_size,
unsigned long long *crash_base,
- const char *name)
+ const char *name,
+ const char *suffix,
+ bool simple_only)
{
- char *p = cmdline, *ck_cmdline = NULL;
char *first_colon, *first_space;
+ char *ck_cmdline;

BUG_ON(!crash_size || !crash_base);
*crash_size = 0;
*crash_base = 0;

- /* find crashkernel and use the last one if there are more */
- p = strstr(p, name);
- while (p) {
- ck_cmdline = p;
- p = strstr(p+1, name);
- }
+ ck_cmdline = get_last_crashkernel(cmdline, name, suffix);

if (!ck_cmdline)
return -EINVAL;
@@ -1403,23 +1455,30 @@ static int __init __parse_crashkernel(ch
*/
first_colon = strchr(ck_cmdline, ':');
first_space = strchr(ck_cmdline, ' ');
- if (first_colon && (!first_space || first_colon < first_space))
- return parse_crashkernel_mem(ck_cmdline, system_ram,
- crash_size, crash_base);
- else
+ if (first_colon && (!first_space || first_colon < first_space)) {
+ if (simple_only)
+ return -EINVAL;
+ else
+ return parse_crashkernel_mem(ck_cmdline, system_ram,
+ crash_size, crash_base);
+ } else
return parse_crashkernel_simple(ck_cmdline, crash_size,
crash_base);

return 0;
}

+/*
+ * That function is the entry point for command line parsing and should be
+ * called from the arch-specific code.
+ */
int __init parse_crashkernel(char *cmdline,
unsigned long long system_ram,
unsigned long long *crash_size,
unsigned long long *crash_base)
{
return __parse_crashkernel(cmdline, system_ram, crash_size, crash_base,
- "crashkernel=");
+ "crashkernel=", NULL, false);
}

int __init parse_crashkernel_high(char *cmdline,
@@ -1428,7 +1487,7 @@ int __init parse_crashkernel_high(char *
unsigned long long *crash_base)
{
return __parse_crashkernel(cmdline, system_ram, crash_size, crash_base,
- "crashkernel_high=");
+ "crashkernel=", suffix_tbl[SUFFIX_HIGH], true);
}

int __init parse_crashkernel_low(char *cmdline,
@@ -1437,7 +1496,7 @@ int __init parse_crashkernel_low(char *c
unsigned long long *crash_base)
{
return __parse_crashkernel(cmdline, system_ram, crash_size, crash_base,
- "crashkernel_low=");
+ "crashkernel=", suffix_tbl[SUFFIX_LOW], true);
}

static void update_vmcoreinfo_note(void)

Yinghai Lu

unread,
Apr 4, 2013, 6:19:04 PM4/4/13
to Thomas Gleixner, Ingo Molnar, H. Peter Anvin, WANG Chao, Vivek Goyal, Eric W. Biederman, linux-...@vger.kernel.org, Yinghai Lu
Chao said that kdump does does work well on his system on 3.8
without extra parameter, even iommu does not work with kdump.
And now have to append crashkernel_low=Y in first kernel to make
kdump work.

We have now modified crashkernel=X to allocate memory beyong 4G (if
available) and do not allocate low range for crashkernel if the user
does not specify that with crashkernel_low=Y. This causes regression
if iommu is not enabled. Without iommu, swiotlb needs to be setup in
first 4G and there is no low memory available to second kernel.

Set crashkernel_low automatically if the user does not specify that.

For system that does support IOMMU with kdump properly, user could
specify crashkernel_low=0 to save that 72M low ram.

-v3: add swiotlb_size() according to Konrad.
-v4: add comments what 8M is for according to hpa.
also update more crashkernel_low= in kernel-parameters.txt
-v5: update changelog according to Vivek.
-v6: Change description about swiotlb referring according to HATAYAMA.

Reported-by: WANG Chao <chao...@redhat.com>
Tested-by: WANG Chao <chao...@redhat.com>
Signed-off-by: Yinghai Lu <yin...@kernel.org>

---
Documentation/kernel-parameters.txt | 14 +++++++++++---
arch/x86/kernel/setup.c | 20 +++++++++++++++++---
include/linux/swiotlb.h | 1 +
lib/swiotlb.c | 19 +++++++++++++++----
4 files changed, 44 insertions(+), 10 deletions(-)

Index: linux-2.6/arch/x86/kernel/setup.c
===================================================================
--- linux-2.6.orig/arch/x86/kernel/setup.c
+++ linux-2.6/arch/x86/kernel/setup.c
@@ -519,19 +519,33 @@ static void __init reserve_crashkernel_l
unsigned long long low_base = 0, low_size = 0;
unsigned long total_low_mem;
unsigned long long base;
+ bool auto_set = false;
int ret;

total_low_mem = memblock_mem_size(1UL<<(32-PAGE_SHIFT));
ret = parse_crashkernel_low(boot_command_line, total_low_mem,
&low_size, &base);
- if (ret != 0 || low_size <= 0)
- return;
+ if (ret != 0) {
+ /*
+ * two parts from lib/swiotlb.c:
+ * swiotlb size: user specified with swiotlb= or default.
+ * swiotlb overflow buffer: now is hardcoded to 32k,
+ * round to 8M to cover more others.
+ */
Index: linux-2.6/Documentation/kernel-parameters.txt
===================================================================
--- linux-2.6.orig/Documentation/kernel-parameters.txt
+++ linux-2.6/Documentation/kernel-parameters.txt
@@ -596,9 +596,6 @@ bytes respectively. Such letter suffixes
is selected automatically. Check
Documentation/kdump/kdump.txt for further details.

- crashkernel_low=size[KMG]
- [KNL, x86] parts under 4G.
-
crashkernel=range1:size1[,range2:size2,...][@offset]
[KNL] Same as above, but depends on the memory
in the running system. The syntax of range is
@@ -606,6 +603,17 @@ bytes respectively. Such letter suffixes
a memory unit (amount[KMG]). See also
Documentation/kdump/kdump.txt for an example.

+ crashkernel_low=size[KMG]
+ [KNL, x86_64] range under 4G. When crashkernel= is
+ passed, kernel allocate physical memory region
+ above 4G, that cause second kernel crash on system
+ that require some amount of low memory, e.g. swiotlb
+ requires at least 64M+32K low memory. Kernel would
+ try to allocate 72M below 4G automatically.
+ This one let user to specify own low range under 4G
+ for second kernel instead.
+ 0: to disable low allocation.
+
cs89x0_dma= [HW,NET]
Format: <dma>

Dave Young

unread,
Apr 8, 2013, 3:24:27 AM4/8/13
to Yinghai Lu, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, WANG Chao, Vivek Goyal, Eric W. Biederman, linux-...@vger.kernel.org
On 04/05/2013 06:16 AM, Yinghai Lu wrote:
> Chao said that kdump does does work well on his system on 3.8
> without extra parameter, even iommu does not work with kdump.
> And now have to append crashkernel_low=Y in first kernel to make
> kdump work.
>
> We have now modified crashkernel=X to allocate memory beyong 4G (if
> available) and do not allocate low range for crashkernel if the user
> does not specify that with crashkernel_low=Y. This causes regression
> if iommu is not enabled. Without iommu, swiotlb needs to be setup in
> first 4G and there is no low memory available to second kernel.

Is it possible to reuse the 1st kernel swiotlb region in 2nd capture
kernel if it's available?

>
> Set crashkernel_low automatically if the user does not specify that.
>
> For system that does support IOMMU with kdump properly, user could
> specify crashkernel_low=0 to save that 72M low ram.

How about make swiotlb size tunable in 1st kernel as well such as adding
a swiotlb_size= to cmdline, if it's set in 1st kernel crashkernel
reserving code can take it automaticlly.

This will benefit to user who use low-mem machines.
--
Thanks
Dave

Yinghai Lu

unread,
Apr 8, 2013, 2:37:16 PM4/8/13
to Dave Young, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, WANG Chao, Vivek Goyal, Eric W. Biederman, Linux Kernel Mailing List
On Mon, Apr 8, 2013 at 12:09 AM, Dave Young <dyo...@redhat.com> wrote:
>> We have now modified crashkernel=X to allocate memory beyong 4G (if
>> available) and do not allocate low range for crashkernel if the user
>> does not specify that with crashkernel_low=Y. This causes regression
>> if iommu is not enabled. Without iommu, swiotlb needs to be setup in
>> first 4G and there is no low memory available to second kernel.
>
> Is it possible to reuse the 1st kernel swiotlb region in 2nd capture
> kernel if it's available?

If the first kernel is using intel iommu, and swiotlb is freed after intel
iommus is enabled in first kernel.

>
>>
>> Set crashkernel_low automatically if the user does not specify that.
>>
>> For system that does support IOMMU with kdump properly, user could
>> specify crashkernel_low=0 to save that 72M low ram.
>
> How about make swiotlb size tunable in 1st kernel as well such as adding
> a swiotlb_size= to cmdline, if it's set in 1st kernel crashkernel
> reserving code can take it automaticlly.
>
can not understand this.

Thanks

Yinghai

Dave Young

unread,
Apr 8, 2013, 11:26:00 PM4/8/13
to Yinghai Lu, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, WANG Chao, Vivek Goyal, Eric W. Biederman, Linux Kernel Mailing List
On 04/09/2013 02:37 AM, Yinghai Lu wrote:
> On Mon, Apr 8, 2013 at 12:09 AM, Dave Young <dyo...@redhat.com> wrote:
>>> We have now modified crashkernel=X to allocate memory beyong 4G (if
>>> available) and do not allocate low range for crashkernel if the user
>>> does not specify that with crashkernel_low=Y. This causes regression
>>> if iommu is not enabled. Without iommu, swiotlb needs to be setup in
>>> first 4G and there is no low memory available to second kernel.
>>
>> Is it possible to reuse the 1st kernel swiotlb region in 2nd capture
>> kernel if it's available?
>
> If the first kernel is using intel iommu, and swiotlb is freed after intel
> iommus is enabled in first kernel.

Ok, also it's hard to handle such as 1st kernel iommu=off, 2nd kernel
iommu=on etc.

I have another question, under x86_64 consider 1st kernel memory < 4G,
is the swiotlb memory still necessary?

>
>>
>>>
>>> Set crashkernel_low automatically if the user does not specify that.
>>>
>>> For system that does support IOMMU with kdump properly, user could
>>> specify crashkernel_low=0 to save that 72M low ram.
>>
>> How about make swiotlb size tunable in 1st kernel as well such as adding
>> a swiotlb_size= to cmdline, if it's set in 1st kernel crashkernel
>> reserving code can take it automaticlly.
>>
> can not understand this.

This maybe out of topic. I means swiotlb size is hardcoded, I'm thinking
how about make it configurable via kconfig or boot cmdline.

>
> Thanks
>
> Yinghai
>


--
Thanks
Dave

Yinghai Lu

unread,
Apr 8, 2013, 11:37:20 PM4/8/13
to Dave Young, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, WANG Chao, Vivek Goyal, Eric W. Biederman, Linux Kernel Mailing List
On Mon, Apr 8, 2013 at 8:25 PM, Dave Young <dyo...@redhat.com> wrote:

> I have another question, under x86_64 consider 1st kernel memory < 4G,
> is the swiotlb memory still necessary?

No. unless use swiotlb=force.

Thanks

Yinghai

Vivek Goyal

unread,
Apr 9, 2013, 9:46:07 AM4/9/13
to Yinghai Lu, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, WANG Chao, Eric W. Biederman, linux-...@vger.kernel.org
On Thu, Apr 04, 2013 at 03:17:01PM -0700, Yinghai Lu wrote:

[..]
> @@ -1360,37 +1369,80 @@ static int __init parse_crashkernel_simp
>
> if (*cur == '@')
> *crash_base = memparse(cur+1, &cur);
> - else if (*cur != ' ' && *cur != '\0') {
> - pr_warning("crashkernel: unrecognized char\n");
> - return -EINVAL;
> + else {
> + int i;
> +
> + /* check with known suffix */
> + for (i = 0; suffix_tbl[i]; i++)
> + if (!strncmp(cur, suffix_tbl[i], strlen(suffix_tbl[i])))
> + return 0;
> +

So crashkernel=X@Y;high is a valid syntax? Looks like we will reserve
X amount of RAM at base Y and ignore "high" or "low".

[..]
Why don't we structure it little differently. Now we seem to have 3
categories of crashkernel= parameters.

- crashkernel_simple (crashkernel=X or crashkernel=X@Y)
- crashkernel_mem (crashkernel=range:size,.....)
- crashkerenl_high_low_suffix (crashkernel=X;high or crashkernel=Y;low)

if (suffix) {
parse_crashkernel_high_low_suffix()
} else {
if (first_colon.....)
parse_crashkernel_mem()
else
parse_crashkernel_simple();
}

And now you should not require "simple_only" function parameter and you
can also do strict syntax checking for each type of crashkernel=
parameter.

Thanks
Vivek

H. Peter Anvin

unread,
Apr 9, 2013, 11:01:32 AM4/9/13
to Vivek Goyal, Yinghai Lu, Thomas Gleixner, Ingo Molnar, WANG Chao, Eric W. Biederman, linux-...@vger.kernel.org
Please, no semicolons. We already have established syntax for suboptions (option=suboption,suboption,...) and suboptions with parameters (option=suboption:value,...)
Sent from my mobile phone. Please excuse brevity and lack of formatting.

Vivek Goyal

unread,
Apr 9, 2013, 12:47:39 PM4/9/13
to H. Peter Anvin, Yinghai Lu, Thomas Gleixner, Ingo Molnar, WANG Chao, Eric W. Biederman, linux-...@vger.kernel.org
On Tue, Apr 09, 2013 at 08:00:57AM -0700, H. Peter Anvin wrote:
> Please, no semicolons. We already have established syntax for suboptions (option=suboption,suboption,...) and suboptions with parameters (option=suboption:value,...)

Ok, to understand it better, so crashkernel= will look as follows?

crashkernel=suboption[,suboption[,....]][@offset]

A suboption can be.

- A memory value (128[KMG])
- A range with value (range:size)
- Or a property influencing memory allocation behavior (e.g high or low)

If yes, sounds good.

Thanks
Vivek

H. Peter Anvin

unread,
Apr 9, 2013, 12:49:58 PM4/9/13
to Vivek Goyal, Yinghai Lu, Thomas Gleixner, Ingo Molnar, WANG Chao, Eric W. Biederman, linux-...@vger.kernel.org
On 04/09/2013 09:47 AM, Vivek Goyal wrote:
> On Tue, Apr 09, 2013 at 08:00:57AM -0700, H. Peter Anvin wrote:
>> Please, no semicolons. We already have established syntax for suboptions (option=suboption,suboption,...) and suboptions with parameters (option=suboption:value,...)
>
> Ok, to understand it better, so crashkernel= will look as follows?
>
> crashkernel=suboption[,suboption[,....]][@offset]
>
> A suboption can be.
>
> - A memory value (128[KMG])
> - A range with value (range:size)
> - Or a property influencing memory allocation behavior (e.g high or low)
>
> If yes, sounds good.
>

Sorry, I don't quite grok @offset and range:size here.

What exactly does those bits do?

-hpa


--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.

Vivek Goyal

unread,
Apr 9, 2013, 1:01:00 PM4/9/13
to H. Peter Anvin, Yinghai Lu, Thomas Gleixner, Ingo Molnar, WANG Chao, Eric W. Biederman, linux-...@vger.kernel.org
On Tue, Apr 09, 2013 at 09:49:28AM -0700, H. Peter Anvin wrote:
> On 04/09/2013 09:47 AM, Vivek Goyal wrote:
> > On Tue, Apr 09, 2013 at 08:00:57AM -0700, H. Peter Anvin wrote:
> >> Please, no semicolons. We already have established syntax for suboptions (option=suboption,suboption,...) and suboptions with parameters (option=suboption:value,...)
> >
> > Ok, to understand it better, so crashkernel= will look as follows?
> >
> > crashkernel=suboption[,suboption[,....]][@offset]
> >
> > A suboption can be.
> >
> > - A memory value (128[KMG])
> > - A range with value (range:size)
> > - Or a property influencing memory allocation behavior (e.g high or low)
> >
> > If yes, sounds good.
> >
>
> Sorry, I don't quite grok @offset and range:size here.
>
> What exactly does those bits do?

We have following crashkernel= syntax defined in kernel-parameters.txt.

crashkernel=range1:size1[,range2:size2,...][@offset]
[KNL] Same as above, but depends on the memory
in the running system. The syntax of range is
start-[end] where start and end are both
a memory unit (amount[KMG]). See also
Documentation/kdump/kdump.txt for an example.


Because memory required for filtering depended on existing RAM in the
box, somebody came with this syntax. "range" specifies the range of
memory present in the system and "value" represents how much RAM
to reserve if system RAM falls in the range.

For example (From kdump.txt)

crashkernel=512M-2G:64M,2G-:128M

This would mean:

1) if the RAM is smaller than 512M, then don't reserve anything
(this is the "rescue" case)
2) if the RAM size is between 512M and 2G (exclusive), then reserve 64M
3) if the RAM size is larger than 2G, then reserve 128M


Thanks
Vivek

Vivek Goyal

unread,
Apr 9, 2013, 1:12:22 PM4/9/13
to H. Peter Anvin, Yinghai Lu, Thomas Gleixner, Ingo Molnar, WANG Chao, Eric W. Biederman, linux-...@vger.kernel.org
On Tue, Apr 09, 2013 at 09:49:28AM -0700, H. Peter Anvin wrote:
> On 04/09/2013 09:47 AM, Vivek Goyal wrote:
> > On Tue, Apr 09, 2013 at 08:00:57AM -0700, H. Peter Anvin wrote:
> >> Please, no semicolons. We already have established syntax for suboptions (option=suboption,suboption,...) and suboptions with parameters (option=suboption:value,...)
> >
> > Ok, to understand it better, so crashkernel= will look as follows?
> >
> > crashkernel=suboption[,suboption[,....]][@offset]
> >
> > A suboption can be.
> >
> > - A memory value (128[KMG])
> > - A range with value (range:size)
> > - Or a property influencing memory allocation behavior (e.g high or low)
> >
> > If yes, sounds good.
> >
>
> Sorry, I don't quite grok @offset and range:size here.

And @offset means that reserve memory at a particular offset, if
available.

Thanks
Vivek

H. Peter Anvin

unread,
Apr 9, 2013, 1:20:38 PM4/9/13
to Vivek Goyal, Yinghai Lu, Thomas Gleixner, Ingo Molnar, WANG Chao, Eric W. Biederman, linux-...@vger.kernel.org
On 04/09/2013 10:12 AM, Vivek Goyal wrote:
> On Tue, Apr 09, 2013 at 09:49:28AM -0700, H. Peter Anvin wrote:
>> On 04/09/2013 09:47 AM, Vivek Goyal wrote:
>>> On Tue, Apr 09, 2013 at 08:00:57AM -0700, H. Peter Anvin wrote:
>>>> Please, no semicolons. We already have established syntax for suboptions (option=suboption,suboption,...) and suboptions with parameters (option=suboption:value,...)
>>>
>>> Ok, to understand it better, so crashkernel= will look as follows?
>>>
>>> crashkernel=suboption[,suboption[,....]][@offset]
>>>
>>> A suboption can be.
>>>
>>> - A memory value (128[KMG])
>>> - A range with value (range:size)
>>> - Or a property influencing memory allocation behavior (e.g high or low)
>>>
>>> If yes, sounds good.
>>>
>>
>> Sorry, I don't quite grok @offset and range:size here.
>
> And @offset means that reserve memory at a particular offset, if
> available.
>

OK, the @offset should probably be treated as a suboption, with an
implicit/optional comma.

Otherwise, makes sense to me.

-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.

Yinghai Lu

unread,
Apr 9, 2013, 4:06:02 PM4/9/13
to Vivek Goyal, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, WANG Chao, Eric W. Biederman, Linux Kernel Mailing List
On Tue, Apr 9, 2013 at 6:45 AM, Vivek Goyal <vgo...@redhat.com> wrote:
> On Thu, Apr 04, 2013 at 03:17:01PM -0700, Yinghai Lu wrote:
>
> [..]
>> @@ -1360,37 +1369,80 @@ static int __init parse_crashkernel_simp
>>
>> if (*cur == '@')
>> *crash_base = memparse(cur+1, &cur);
>> - else if (*cur != ' ' && *cur != '\0') {
>> - pr_warning("crashkernel: unrecognized char\n");
>> - return -EINVAL;
>> + else {
>> + int i;
>> +
>> + /* check with known suffix */
>> + for (i = 0; suffix_tbl[i]; i++)
>> + if (!strncmp(cur, suffix_tbl[i], strlen(suffix_tbl[i])))
>> + return 0;
>> +
>
> So crashkernel=X@Y;high is a valid syntax? Looks like we will reserve
> X amount of RAM at base Y and ignore "high" or "low".

yes, we should reject them.

>
> [..]
..
>
> Why don't we structure it little differently. Now we seem to have 3
> categories of crashkernel= parameters.
>
> - crashkernel_simple (crashkernel=X or crashkernel=X@Y)
> - crashkernel_mem (crashkernel=range:size,.....)
> - crashkerenl_high_low_suffix (crashkernel=X;high or crashkernel=Y;low)
>
> if (suffix) {
> parse_crashkernel_high_low_suffix()
> } else {
> if (first_colon.....)
> parse_crashkernel_mem()
> else
> parse_crashkernel_simple();
> }
>
> And now you should not require "simple_only" function parameter and you
> can also do strict syntax checking for each type of crashkernel=
> parameter.

yes, that will the code more readable.

Just send -v4 of this patch that will not reuse parse_crashkernel_simple.

Thanks

Yinghai

H. Peter Anvin

unread,
Apr 9, 2013, 4:25:18 PM4/9/13
to Yinghai Lu, Vivek Goyal, Thomas Gleixner, Ingo Molnar, WANG Chao, Eric W. Biederman, Linux Kernel Mailing List
On 04/09/2013 01:05 PM, Yinghai Lu wrote:
>>
>> So crashkernel=X@Y;high is a valid syntax? Looks like we will reserve
>> X amount of RAM at base Y and ignore "high" or "low".
>
> yes, we should reject them.
>

What if there isn't X amount of RAM available at base Y?

-hpa


--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.

Vivek Goyal

unread,
Apr 9, 2013, 4:30:05 PM4/9/13
to H. Peter Anvin, Yinghai Lu, Thomas Gleixner, Ingo Molnar, WANG Chao, Eric W. Biederman, Linux Kernel Mailing List
On Tue, Apr 09, 2013 at 01:24:46PM -0700, H. Peter Anvin wrote:
> On 04/09/2013 01:05 PM, Yinghai Lu wrote:
> >>
> >> So crashkernel=X@Y;high is a valid syntax? Looks like we will reserve
> >> X amount of RAM at base Y and ignore "high" or "low".
> >
> > yes, we should reject them.
> >
>
> What if there isn't X amount of RAM available at base Y?

We don't reserve anything.

In this context crashkernel=X@Y,high is invalid syntax and should probably
be ignored by parser. The very fact user specified the offset, high or
low option does not carry any meaning.

crashkernel=X,high is valid though.

Thanks
Vivek

H. Peter Anvin

unread,
Apr 9, 2013, 4:33:55 PM4/9/13
to Vivek Goyal, Yinghai Lu, Thomas Gleixner, Ingo Molnar, WANG Chao, Eric W. Biederman, Linux Kernel Mailing List
On 04/09/2013 01:29 PM, Vivek Goyal wrote:
> On Tue, Apr 09, 2013 at 01:24:46PM -0700, H. Peter Anvin wrote:
>> On 04/09/2013 01:05 PM, Yinghai Lu wrote:
>>>>
>>>> So crashkernel=X@Y;high is a valid syntax? Looks like we will reserve
>>>> X amount of RAM at base Y and ignore "high" or "low".
>>>
>>> yes, we should reject them.
>>>
>>
>> What if there isn't X amount of RAM available at base Y?
>
> We don't reserve anything.
>
> In this context crashkernel=X@Y,high is invalid syntax and should probably
> be ignored by parser. The very fact user specified the offset, high or
> low option does not carry any meaning.
>
> crashkernel=X,high is valid though.
>

Yes, if the offset being invalid is an error, that is the right
behavior, I would think.

-hpa


--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.

0 new messages