[v2 4/5] arm64: add kdump support

AKASHI Takahiro

unread,

Apr 24, 2015, 4:00:06 AM4/24/15

to

Please read the following commits for arm64-specific constraints:
arm64: kdump: reserve memory for crash dump kernel
arm64: kdump: do not go into EL2 before starting a crash dump kernel

Signed-off-by: AKASHI Takahiro <takahir...@linaro.org>
---
Documentation/kdump/kdump.txt | 31 ++++++++++++++++++++++++++++++-
arch/arm64/Kconfig | 12 ++++++++++++
2 files changed, 42 insertions(+), 1 deletion(-)

diff --git a/Documentation/kdump/kdump.txt b/Documentation/kdump/kdump.txt
index bc4bd5a..f9cf6f5 100644
--- a/Documentation/kdump/kdump.txt
+++ b/Documentation/kdump/kdump.txt
@@ -18,7 +18,7 @@ memory image to a dump file on the local disk, or across the network to
a remote system.

Kdump and kexec are currently supported on the x86, x86_64, ppc64, ia64,
-s390x and arm architectures.
+s390x, arm and arm64 architectures.

When the system kernel boots, it reserves a small section of memory for
the dump-capture kernel. This ensures that ongoing Direct Memory Access
@@ -249,6 +249,29 @@ Dump-capture kernel config options (Arch Dependent, arm)

AUTO_ZRELADDR=y

+Dump-capture kernel config options (Arch Dependent, arm64)
+----------------------------------------------------------
+
+1) Disable symmetric multi-processing support under "Processor type and
+ features":
+
+ CONFIG_SMP=n
+
+ (If CONFIG_SMP=y, then specify maxcpus=1 on the kernel command line
+ when loading the dump-capture kernel, see section "Load the Dump-capture
+ Kernel".)
+
+2) Under uefi, the maximum memory size on the dump-capture kernel must be
+ limited by specifying:
+
+ mem=X[MG]
+
+ where X should be less than or equal to the size in "crashkernel="
+ boot parameter. Kexec-tools will automatically add this.
+
+3) Currently, kvm will not be initialized on the dump-capture kernel even
+ if it is configured.
+
Extended crashkernel syntax
===========================

@@ -312,6 +335,7 @@ Boot into System Kernel
any space below the alignment point may be overwritten by the dump-capture kernel,
which means it is possible that the vmcore is not that precise as expected.

+ On arm64, use "crashkernel=Y@X".

Load the Dump-capture Kernel
============================
@@ -334,6 +358,8 @@ For s390x:
- Use image or bzImage
For arm:
- Use zImage
+For arm64:
+ - Use vmlinux

If you are using a uncompressed vmlinux image then use following command
to load dump-capture kernel.
@@ -377,6 +403,9 @@ For s390x:
For arm:
"1 maxcpus=1 reset_devices"

+For arm64:
+ "1 maxcpus=1 mem=X[MG] reset_devices"
+
Notes on loading the dump-capture kernel:

* By default, the ELF headers are stored in ELF64 format to support
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 5716edf..8e2f545 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -562,6 +562,18 @@ config KEXEC
but it is independent of the system firmware. And like a reboot
you can start any kernel with it, not just Linux.

+config CRASH_DUMP
+ bool "Build kdump crash kernel"
+ help
+ Generate crash dump after being started by kexec. This should
+ be normally only set in special crash dump kernels which are
+ loaded in the main kernel with kexec-tools into a specially
+ reserved region and then later executed after a crash by
+ kdump/kexec. The crash dump kernel must be compiled to a
+ memory address not used by the main kernel.
+
+ For more details see Documentation/kdump/kdump.txt
+
config XEN_DOM0
def_bool y
depends on XEN
--
1.7.9.5

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

AKASHI Takahiro

unread,

Apr 24, 2015, 4:00:06 AM4/24/15

to

Signed-off-by: AKASHI Takahiro <takahir...@linaro.org>
---

arch/arm64/configs/defconfig | 1 +
1 file changed, 1 insertion(+)

diff --git a/arch/arm64/configs/defconfig b/arch/arm64/configs/defconfig
index 583cacb..7b3fa3a 100644
--- a/arch/arm64/configs/defconfig
+++ b/arch/arm64/configs/defconfig
@@ -49,6 +49,7 @@ CONFIG_KSM=y
CONFIG_TRANSPARENT_HUGEPAGE=y
CONFIG_CMA=y
CONFIG_KEXEC=y
+CONFIG_CRASH_DUMP=y
CONFIG_CMDLINE="console=ttyAMA0"
# CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS is not set
CONFIG_COMPAT=y

AKASHI Takahiro

unread,

Apr 24, 2015, 4:00:06 AM4/24/15

to

kdump calls machine_crash_shutdown() to shut down non-boot cpus and
save per-cpu general-purpose registers before restarting the crash dump
kernel. See kernel_kexec().
ipi_cpu_stop() is used and a bit modified to support this behavior.

Signed-off-by: AKASHI Takahiro <takahir...@linaro.org>
---

arch/arm64/include/asm/kexec.h | 34 ++++++++++++++++++++++-
arch/arm64/kernel/machine_kexec.c | 55 ++++++++++++++++++++++++++++++++++++-
arch/arm64/kernel/smp.c | 12 ++++++--
3 files changed, 97 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/include/asm/kexec.h b/arch/arm64/include/asm/kexec.h
index 3530ff5..eaf3fcb 100644
--- a/arch/arm64/include/asm/kexec.h
+++ b/arch/arm64/include/asm/kexec.h
@@ -30,6 +30,8 @@

#if !defined(__ASSEMBLY__)

+extern bool in_crash_kexec;
+
/**
* crash_setup_regs() - save registers for the panic kernel
*
@@ -40,7 +42,37 @@
static inline void crash_setup_regs(struct pt_regs *newregs,
struct pt_regs *oldregs)
{
- /* Empty routine needed to avoid build errors. */
+ if (oldregs) {
+ memcpy(newregs, oldregs, sizeof(*newregs));
+ } else {
+ __asm__ __volatile__ (
+ "stp x0, x1, [%3]\n\t"
+ "stp x2, x3, [%3, 0x10]\n\t"
+ "stp x4, x5, [%3, 0x20]\n\t"
+ "stp x6, x7, [%3, 0x30]\n\t"
+ "stp x8, x9, [%3, 0x40]\n\t"
+ "stp x10, x11, [%3, 0x50]\n\t"
+ "stp x12, x13, [%3, 0x60]\n\t"
+ "stp x14, x15, [%3, 0x70]\n\t"
+ "stp x16, x17, [%3, 0x80]\n\t"
+ "stp x18, x19, [%3, 0x90]\n\t"
+ "stp x20, x21, [%3, 0xa0]\n\t"
+ "stp x22, x23, [%3, 0xb0]\n\t"
+ "stp x24, x25, [%3, 0xc0]\n\t"
+ "stp x26, x27, [%3, 0xd0]\n\t"
+ "stp x28, x29, [%3, 0xe0]\n\t"
+ "str x30, [%3, 0xf0]\n\t"
+ "mov %0, sp\n\t"
+ "adr %1, 1f\n\t"
+ "mrs %2, spsr_el1\n\t"
+ "1:"
+ : "=r" (newregs->sp),
+ "=r" (newregs->pc),
+ "=r" (newregs->pstate)
+ : "r" (&newregs->regs)
+ : "memory"
+ );
+ }
}

#endif /* !defined(__ASSEMBLY__) */
diff --git a/arch/arm64/kernel/machine_kexec.c b/arch/arm64/kernel/machine_kexec.c
index fd6b742..6f7887b 100644
--- a/arch/arm64/kernel/machine_kexec.c
+++ b/arch/arm64/kernel/machine_kexec.c
@@ -9,6 +9,8 @@
* published by the Free Software Foundation.
*/

+#include <linux/irq.h>
+#include <linux/kernel.h>
#include <linux/kexec.h>
#include <linux/of_fdt.h>
#include <linux/slab.h>
@@ -24,6 +26,8 @@ extern unsigned long arm64_kexec_dtb_addr;
extern unsigned long arm64_kexec_kimage_head;
extern unsigned long arm64_kexec_kimage_start;

+bool in_crash_kexec = false;
+
/**
* kexec_is_dtb - Helper routine to check the device tree header signature.
*/
@@ -178,7 +182,56 @@ void machine_kexec(struct kimage *image)
soft_restart(reboot_code_buffer_phys);
}

+static void machine_kexec_mask_interrupts(void)
+{
+ unsigned int i;
+ struct irq_desc *desc;
+
+ for_each_irq_desc(i, desc) {
+ struct irq_chip *chip;
+
+ chip = irq_desc_get_chip(desc);
+ if (!chip)
+ continue;
+
+ if (chip->irq_eoi && irqd_irq_inprogress(&desc->irq_data))
+ chip->irq_eoi(&desc->irq_data);
+
+ if (chip->irq_mask)
+ chip->irq_mask(&desc->irq_data);
+
+ if (chip->irq_disable && !irqd_irq_disabled(&desc->irq_data))
+ chip->irq_disable(&desc->irq_data);
+ }
+}
+
+/**
+ * machine_crash_shutdown - shutdown non-boot cpus and save registers
+ */
void machine_crash_shutdown(struct pt_regs *regs)
{
- /* Empty routine needed to avoid build errors. */
+ struct pt_regs dummy_regs;
+ int cpu;
+
+ local_irq_disable();
+
+ in_crash_kexec = true;
+
+ /*
+ * clear and initialize the per-cpu info. This is necessary
+ * because, otherwise, slots for offline cpus would not be
+ * filled up. See smp_send_stop().
+ */
+ memset(&dummy_regs, 0, sizeof(dummy_regs));
+ for_each_possible_cpu(cpu)
+ crash_save_cpu(&dummy_regs, cpu);
+
+ /* shutdown non-boot cpus */
+ smp_send_stop();
+
+ /* for this cpu */
+ crash_save_cpu(regs, smp_processor_id());
+ machine_kexec_mask_interrupts();
+
+ pr_info("Loading crashdump kernel...\n");
}
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index ffe8e1b..cf7f361 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -36,6 +36,7 @@
#include <linux/completion.h>
#include <linux/of.h>
#include <linux/irq_work.h>
+#include <linux/kexec.h>

#include <asm/alternative.h>
#include <asm/atomic.h>
@@ -542,7 +543,7 @@ static DEFINE_RAW_SPINLOCK(stop_lock);
/*
* ipi_cpu_stop - handle IPI from smp_send_stop()
*/
-static void ipi_cpu_stop(unsigned int cpu)
+static void ipi_cpu_stop(unsigned int cpu, struct pt_regs *regs)
{
if (system_state == SYSTEM_BOOTING ||
system_state == SYSTEM_RUNNING) {
@@ -556,6 +557,13 @@ static void ipi_cpu_stop(unsigned int cpu)

local_irq_disable();

+#ifdef CONFIG_KEXEC
+ if (in_crash_kexec) {
+ crash_save_cpu(regs, cpu);
+ flush_cache_all();
+ }
+#endif /* CONFIG_KEXEC */
+
while (1)
cpu_relax();
}
@@ -586,7 +594,7 @@ void handle_IPI(int ipinr, struct pt_regs *regs)

case IPI_CPU_STOP:
irq_enter();
- ipi_cpu_stop(cpu);
+ ipi_cpu_stop(cpu, regs);
irq_exit();
break;

AKASHI Takahiro

unread,

Apr 24, 2015, 4:00:06 AM4/24/15

to

On system kernel, the memory region used by crash dump kernel must be
specified by "crashkernel=X@Y" boot parameter. reserve_crashkernel()
will allocate the region in "System RAM" and reserve it for later use.

On crash dump kernel, memory region information in system kernel is
described in a specific region specified by "elfcorehdr=X@Y" boot parameter.
reserve_elfcorehdr() will set aside the region to avoid data destruction
by the kernel.

Crash dump kernel will access memory regions in system kernel via
copy_oldmem_page(), which reads a page by ioremap'ing it assuming that
such pages are not part of main memory of crash dump kernel.
This is true under non-UEFI environment because kexec-tools modifies
a device tree adding "usablemem" attributes to memory sections.
Under UEFI, however, this is not true because UEFI remove memory sections
in a device tree and export all the memory regions, even though they belong
to system kernel.

So we should add "mem=X[MG]" boot parameter to limit the memory size and
avoid hitting the following assertion in ioremap():
if (WARN_ON(pfn_valid(__phys_to_pfn(phys_addr))))
return NULL;

Signed-off-by: AKASHI Takahiro <takahir...@linaro.org>
---

arch/arm64/kernel/Makefile | 1 +
arch/arm64/kernel/crash_dump.c | 71 +++++++++++++++++++++++++++++++++
arch/arm64/kernel/setup.c | 8 +++-
arch/arm64/mm/init.c | 84 ++++++++++++++++++++++++++++++++++++++++
4 files changed, 163 insertions(+), 1 deletion(-)
create mode 100644 arch/arm64/kernel/crash_dump.c

diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
index ac3c2e2..6fcc602 100644
--- a/arch/arm64/kernel/Makefile
+++ b/arch/arm64/kernel/Makefile
@@ -36,6 +36,7 @@ arm64-obj-$(CONFIG_EFI) += efi.o efi-stub.o efi-entry.o
arm64-obj-$(CONFIG_PCI) += pci.o
arm64-obj-$(CONFIG_ARMV8_DEPRECATED) += armv8_deprecated.o
arm64-obj-$(CONFIG_KEXEC) += machine_kexec.o relocate_kernel.o
+arm64-obj-$(CONFIG_CRASH_DUMP) += crash_dump.o

obj-y += $(arm64-obj-y) vdso/
obj-m += $(arm64-obj-m)
diff --git a/arch/arm64/kernel/crash_dump.c b/arch/arm64/kernel/crash_dump.c
new file mode 100644
index 0000000..3d86c0a
--- /dev/null
+++ b/arch/arm64/kernel/crash_dump.c
@@ -0,0 +1,71 @@
+/*
+ * Routines for doing kexec-based kdump
+ *
+ * Copyright (C) 2014 Linaro Limited
+ * Author: AKASHI Takahiro <takahir...@linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/crash_dump.h>
+#include <linux/errno.h>
+#include <linux/io.h>
+#include <linux/memblock.h>
+#include <linux/uaccess.h>
+#include <asm/memory.h>
+
+/**
+ * copy_oldmem_page() - copy one page from old kernel memory
+ * @pfn: page frame number to be copied
+ * @buf: buffer where the copied page is placed
+ * @csize: number of bytes to copy
+ * @offset: offset in bytes into the page
+ * @userbuf: if set, @buf is in a user address space
+ *
+ * This function copies one page from old kernel memory into buffer pointed by
+ * @buf. If @buf is in userspace, set @userbuf to %1. Returns number of bytes
+ * copied or negative error in case of failure.
+ */
+ssize_t copy_oldmem_page(unsigned long pfn, char *buf,
+ size_t csize, unsigned long offset,
+ int userbuf)
+{
+ void *vaddr;
+
+ if (!csize)
+ return 0;
+
+ vaddr = ioremap_cache(pfn << PAGE_SHIFT, PAGE_SIZE);
+ if (!vaddr)
+ return -ENOMEM;
+
+ if (userbuf) {
+ if (copy_to_user(buf, vaddr + offset, csize)) {
+ iounmap(vaddr);
+ return -EFAULT;
+ }
+ } else {
+ memcpy(buf, vaddr + offset, csize);
+ }
+
+ iounmap(vaddr);
+
+ return csize;
+}
+
+/**
+ * elfcorehdr_read - read from ELF core header
+ * @buf: buffer where the data is placed
+ * @csize: number of bytes to read
+ * @ppos: address in the memory
+ *
+ * This function reads @count bytes from elf core header which exists
+ * on crash dump kernel's memory.
+ */
+ssize_t elfcorehdr_read(char *buf, size_t count, u64 *ppos)
+{
+ memcpy(buf, phys_to_virt((phys_addr_t)*ppos), count);
+ return count;
+}
diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c
index 51ef972..7932bd0 100644
--- a/arch/arm64/kernel/setup.c
+++ b/arch/arm64/kernel/setup.c
@@ -31,7 +31,6 @@
#include <linux/screen_info.h>
#include <linux/init.h>
#include <linux/kexec.h>
-#include <linux/crash_dump.h>
#include <linux/root_dev.h>
#include <linux/clk-provider.h>
#include <linux/cpu.h>
@@ -364,6 +363,12 @@ static void __init request_standard_resources(void)
kernel_data.end <= res->end)
request_resource(res, &kernel_data);
}
+
+#ifdef CONFIG_KEXEC
+ /* User space tools will detect the region with /proc/iomem */
+ if (crashk_res.end)
+ insert_resource(&iomem_resource, &crashk_res);
+#endif
}

u64 __cpu_logical_map[NR_CPUS] = { [0 ... NR_CPUS-1] = INVALID_HWID };
@@ -393,6 +398,7 @@ void __init setup_arch(char **cmdline_p)
local_async_enable();

efi_init();
+
arm64_memblock_init();

paging_init();
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index ae85da6..ea70d41 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -34,6 +34,8 @@
#include <linux/dma-contiguous.h>
#include <linux/efi.h>
#include <linux/swiotlb.h>
+#include <linux/kexec.h>
+#include <linux/crash_dump.h>

#include <asm/fixmap.h>
#include <asm/memory.h>
@@ -66,6 +68,81 @@ static int __init early_initrd(char *p)
early_param("initrd", early_initrd);
#endif

+#ifdef CONFIG_KEXEC
+/*
+ * reserve_crashkernel() - reserves memory for crash kernel
+ *
+ * This function reserves memory area given in "crashkernel=" kernel command
+ * line parameter. The memory reserved is used by a dump capture kernel when
+ * primary kernel is crashing.
+ */
+static void __init reserve_crashkernel(phys_addr_t limit)
+{
+ unsigned long long crash_size = 0, crash_base = 0;
+ int ret;
+
+ ret = parse_crashkernel(boot_command_line, limit,
+ &crash_size, &crash_base);
+ if (ret)
+ return;
+
+ if (crash_base == 0) {
+ crash_base = memblock_alloc(crash_size, 1 << 20);
+ if (crash_base == 0) {
+ pr_warn("crashkernel allocation failed (size:%llx)\n",
+ crash_size);
+ return;
+ }
+ } else {
+ /* User specifies base address explicitly. Sanity check */
+ if (!memblock_is_region_memory(crash_base, crash_size) ||
+ memblock_is_region_reserved(crash_base, crash_size)) {
+ pr_warn("crashkernel= has wrong address or size\n");
+ return;
+ }
+
+ if (memblock_reserve(crash_base, crash_size)) {
+ pr_warn("crashkernel reservation failed - out of memory\n");
+ return;
+ }
+ }
+
+ pr_info("Reserving %lldMB of memory at %lldMB for crashkernel\n",
+ crash_size >> 20, crash_base >> 20);
+
+ crashk_res.start = crash_base;
+ crashk_res.end = crash_base + crash_size - 1;

+}
+#endif /* CONFIG_KEXEC */
+

+#ifdef CONFIG_CRASH_DUMP
+/*
+ * reserve_elfcorehdr() - reserves memory for elf core header
+ *
+ * This function reserves memory area given in "elfcorehdr=" kernel command
+ * line parameter. The memory reserved is used by a dump capture kernel to
+ * identify the memory used by primary kernel.
+ */
+static void __init reserve_elfcorehdr(void)
+{
+ if (!elfcorehdr_size)
+ return;
+
+ if (memblock_is_region_reserved(elfcorehdr_addr, elfcorehdr_size)) {
+ pr_warn("elfcorehdr reservation failed - memory is in use (0x%llx)\n",
+ elfcorehdr_addr);
+ return;
+ }
+
+ if (memblock_reserve(elfcorehdr_addr, elfcorehdr_size)) {
+ pr_warn("elfcorehdr reservation failed - out of memory\n");
+ return;
+ }
+
+ pr_info("Reserving %lldKB of memory at %lldMB for elfcorehdr\n",
+ elfcorehdr_size >> 10, elfcorehdr_addr >> 20);
+}
+#endif /* CONFIG_CRASH_DUMP */
/*
* Return the maximum physical address for ZONE_DMA (DMA_BIT_MASK(32)). It
* currently assumes that for memory starting above 4G, 32-bit devices will
@@ -170,6 +247,13 @@ void __init arm64_memblock_init(void)
memblock_reserve(__virt_to_phys(initrd_start), initrd_end - initrd_start);
#endif

+#ifdef CONFIG_KEXEC
+ reserve_crashkernel(memory_limit);
+#endif
+#ifdef CONFIG_CRASH_DUMP
+ reserve_elfcorehdr();
+#endif
+
early_init_fdt_scan_reserved_mem();

/* 4GB maximum for 32-bit only capable devices */

AKASHI Takahiro

unread,

Apr 24, 2015, 4:00:06 AM4/24/15

to

Unlike normal kexec case, we don't have a chance to reset EL2 context in
a generic way because bad exceptions may directly invoke crash_kexec().
(See die().)
Kvm is not useful on crash dump kernel anyway, and so we let it
un-initialized across rebooting.

Signed-off-by: AKASHI Takahiro <takahir...@linaro.org>
---

arch/arm64/kernel/process.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index d894d3e..9859f5c 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -48,6 +48,7 @@
#include <asm/compat.h>
#include <asm/cacheflush.h>
#include <asm/fpsimd.h>
+#include <asm/kexec.h>
#include <asm/mmu_context.h>
#include <asm/processor.h>
#include <asm/stacktrace.h>
@@ -64,7 +65,11 @@ void soft_restart(unsigned long addr)
setup_mm_for_reboot();

cpu_soft_restart(virt_to_phys(cpu_reset),
- is_hyp_mode_available(), addr);
+#ifdef CONFIG_KEXEC
+ !in_crash_kexec &&
+#endif
+ is_hyp_mode_available(),
+ addr);

/* Should never get here */
BUG();

AKASHI Takahiro

unread,

Apr 24, 2015, 4:00:06 AM4/24/15

to

This patch set enables kdump (crash dump kernel) support on arm64 on top of
Geoff's kexec patchset.

In this version, there are some arm64-specific usage/constraints:
1) "mem=" boot parameter must be specified on crash dump kernel
if the system starts on uefi.
2) Kvm will not be enabled on crash dump kernel even if configured
See commit messages and Documentation/kdump/kdump.txt for details.

The only concern I have is whether or not we can use the exact same kernel
as both system kernel and crash dump kernel. The current arm64 kernel is
not relocatable in the exact sense but I see no problems in using the same
binary when testing kdump.

I tested the code with
- ATF v1.1 + EDK2(UEFI) v3.0-rc0
- kernel v4.0 + Geoff' kexec v9
on
- Base fast model, and
- MediaTek MT8173-EVB
using my own kexec-tools [1], currently v0.12.

You may want to start a kernel with the following boot parameter:
crashkernel=64M (or so, on model)
and try
$ kexec -p --load <vmlinux> --append ...
$ echo c > /proc/sysrq-trigger

To examine vmcore (/proc/vmcore), you should use
- gdb v7.7 or later
- crash + a small patch (to recognize v4.0 kernel)

Changes from v1:
* rebased to Geoff's v9
* tested this patchset on real hardware and fixed bugs:
- added cache flush operation in ipi_cpu_stop() when shutting down
the system. Otherwise, data saved in vmcore's note sections by
crash_save_cpu() might not be flushed to dumped memory and crash command
fail to fetch correct data.
I will address Mark's commit[2] after Geoff takes care of it on kexec.
- modified to use ioremap_cache() instead of ioremap() when reading
crash memory. Otherwise, accessing /proc/vmcore on crash dump kernel
might cause an alignment fault.
* allows reserve_crashkernel() to handle "crashkernel=xyz[MG]" correctly,
thanks to Pratyush Anand. And it now also enforces memory limit.
* moved reserve_crashkernel() and reserve_elfcorehdr() to
arm64_memblock_init() to clarify that they should be called before
dma_contignuous_reserve().

[1] https://git.linaro.org/people/takahiro.akashi/kexec-tools.git
[2] http://lists.infradead.org/pipermail/linux-arm-kernel/2015-April/338171.html

AKASHI Takahiro (5):

arm64: kdump: reserve memory for crash dump kernel

arm64: kdump: implement machine_crash_shutdown()

arm64: kdump: do not go into EL2 before starting a crash dump kernel

arm64: add kdump support
arm64: enable kdump in the arm64 defconfig

Documentation/kdump/kdump.txt | 31 +++++++++++++-
arch/arm64/Kconfig | 12 ++++++
arch/arm64/configs/defconfig | 1 +
arch/arm64/include/asm/kexec.h | 34 ++++++++++++++-

arch/arm64/kernel/Makefile | 1 +
arch/arm64/kernel/crash_dump.c | 71 +++++++++++++++++++++++++++++++

arch/arm64/kernel/machine_kexec.c | 55 +++++++++++++++++++++++-
arch/arm64/kernel/process.c | 7 +++-
arch/arm64/kernel/setup.c | 8 +++-
arch/arm64/kernel/smp.c | 12 +++++-
arch/arm64/mm/init.c | 84 +++++++++++++++++++++++++++++++++++++
11 files changed, 309 insertions(+), 7 deletions(-)
create mode 100644 arch/arm64/kernel/crash_dump.c

Mark Rutland

unread,

Apr 24, 2015, 6:00:05 AM4/24/15

to

Hi,

On Fri, Apr 24, 2015 at 08:53:03AM +0100, AKASHI Takahiro wrote:
> This patch set enables kdump (crash dump kernel) support on arm64 on top of
> Geoff's kexec patchset.
>
> In this version, there are some arm64-specific usage/constraints:
> 1) "mem=" boot parameter must be specified on crash dump kernel
> if the system starts on uefi.

This sounds very painful. Why is this the case, and how do x86 and/or
ia64 get around that?

> 2) Kvm will not be enabled on crash dump kernel even if configured
> See commit messages and Documentation/kdump/kdump.txt for details.
>
> The only concern I have is whether or not we can use the exact same kernel
> as both system kernel and crash dump kernel. The current arm64 kernel is
> not relocatable in the exact sense but I see no problems in using the same
> binary when testing kdump.

Ard has been working on decoupling the kernel text/data, FDT, and linear
memory mappings, which would allow the kernel to be loaded anywhere and
still be able to access all of memory [3]. I assume that's what you mean
by "relocatable"?

> I tested the code with
> - ATF v1.1 + EDK2(UEFI) v3.0-rc0
> - kernel v4.0 + Geoff' kexec v9
> on
> - Base fast model, and
> - MediaTek MT8173-EVB
> using my own kexec-tools [1], currently v0.12.
>
> You may want to start a kernel with the following boot parameter:
> crashkernel=64M (or so, on model)
> and try
> $ kexec -p --load <vmlinux> --append ...
> $ echo c > /proc/sysrq-trigger
>
> To examine vmcore (/proc/vmcore), you should use
> - gdb v7.7 or later
> - crash + a small patch (to recognize v4.0 kernel)
>
> Changes from v1:
> * rebased to Geoff's v9
> * tested this patchset on real hardware and fixed bugs:
> - added cache flush operation in ipi_cpu_stop() when shutting down
> the system. Otherwise, data saved in vmcore's note sections by
> crash_save_cpu() might not be flushed to dumped memory and crash command
> fail to fetch correct data.

We'll need to give this a go on something with a system cache too (e.g.
Seattle or X-Gene). Even if that's only UP it would give me much greater
confidence in the cache maintenance.

> I will address Mark's commit[2] after Geoff takes care of it on kexec.
> - modified to use ioremap_cache() instead of ioremap() when reading
> crash memory. Otherwise, accessing /proc/vmcore on crash dump kernel
> might cause an alignment fault.
> * allows reserve_crashkernel() to handle "crashkernel=xyz[MG]" correctly,
> thanks to Pratyush Anand. And it now also enforces memory limit.

I worry that there could be potentially bad interaction between this and
Ard's patches, depending on how the memory area to use is chosen. It is
probably fine, but we should make sure that it is.

> * moved reserve_crashkernel() and reserve_elfcorehdr() to
> arm64_memblock_init() to clarify that they should be called before
> dma_contignuous_reserve().
>
> [1] https://git.linaro.org/people/takahiro.akashi/kexec-tools.git
> [2] http://lists.infradead.org/pipermail/linux-arm-kernel/2015-April/338171.html

[3] http://lists.infradead.org/pipermail/linux-arm-kernel/2015-April/337596.html

Thanks,
Mark.

Mark Rutland

unread,

Apr 24, 2015, 6:20:05 AM4/24/15

to

On Fri, Apr 24, 2015 at 08:53:04AM +0100, AKASHI Takahiro wrote:
> On system kernel, the memory region used by crash dump kernel must be
> specified by "crashkernel=X@Y" boot parameter. reserve_crashkernel()
> will allocate the region in "System RAM" and reserve it for later use.
>
> On crash dump kernel, memory region information in system kernel is
> described in a specific region specified by "elfcorehdr=X@Y" boot parameter.
> reserve_elfcorehdr() will set aside the region to avoid data destruction
> by the kernel.
>
> Crash dump kernel will access memory regions in system kernel via
> copy_oldmem_page(), which reads a page by ioremap'ing it assuming that
> such pages are not part of main memory of crash dump kernel.
> This is true under non-UEFI environment because kexec-tools modifies
> a device tree adding "usablemem" attributes to memory sections.

I'm not sure what you mean by "usablemem" here.

Do you just mean that the memory nodes are altered such that they only
cover memory usable by the crash kernel?

Why not _always_ require a command line argument for the crash kernel
that restricts its memory usage to a particular range? That way it
doesn't matter whether we're using UEFI or not.

> Under UEFI, however, this is not true because UEFI remove memory sections
> in a device tree and export all the memory regions, even though they belong
> to system kernel.
>
> So we should add "mem=X[MG]" boot parameter to limit the memory size and
> avoid hitting the following assertion in ioremap():
> if (WARN_ON(pfn_valid(__phys_to_pfn(phys_addr))))
> return NULL;

That looks suspicious. What is being ioremapped at that point?

[...]

> @@ -393,6 +398,7 @@ void __init setup_arch(char **cmdline_p)
> local_async_enable();
>
> efi_init();
> +
> arm64_memblock_init();
>
> paging_init();

Nit: unrelated whitespace change.

> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index ae85da6..ea70d41 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -34,6 +34,8 @@
> #include <linux/dma-contiguous.h>
> #include <linux/efi.h>
> #include <linux/swiotlb.h>
> +#include <linux/kexec.h>
> +#include <linux/crash_dump.h>

Nit: please keep these ordered.

[...]

> + if (memblock_reserve(crash_base, crash_size)) {
> + pr_warn("crashkernel reservation failed - out of memory\n");
> + return;
> + }

If we can remove this memory rather than reserving it, we can limit the
first kernel's ability to accidentally clobber the crash kernel, at the
expense of having to explicitly map/unmap around loading it.

Mark.

Mark Rutland

unread,

Apr 24, 2015, 6:40:05 AM4/24/15

to

Why the tabs?

Please use #16 * N as the offset for consistency with entry.S, with 0
for the first N.

[...]

> +static void machine_kexec_mask_interrupts(void)
> +{
> + unsigned int i;
> + struct irq_desc *desc;
> +
> + for_each_irq_desc(i, desc) {
> + struct irq_chip *chip;
> +
> + chip = irq_desc_get_chip(desc);
> + if (!chip)
> + continue;
> +
> + if (chip->irq_eoi && irqd_irq_inprogress(&desc->irq_data))
> + chip->irq_eoi(&desc->irq_data);
> +
> + if (chip->irq_mask)
> + chip->irq_mask(&desc->irq_data);
> +
> + if (chip->irq_disable && !irqd_irq_disabled(&desc->irq_data))
> + chip->irq_disable(&desc->irq_data);
> + }
> +}

I'm surprised that this isn't left to the irqchip driver init code in
the crash kernel. For all we know this state could be corrupt anyway.

Is there any reason we can't get the GIC driver to nuke all of this at
probe time?

[...]

> @@ -542,7 +543,7 @@ static DEFINE_RAW_SPINLOCK(stop_lock);
> /*
> * ipi_cpu_stop - handle IPI from smp_send_stop()
> */
> -static void ipi_cpu_stop(unsigned int cpu)
> +static void ipi_cpu_stop(unsigned int cpu, struct pt_regs *regs)
> {
> if (system_state == SYSTEM_BOOTING ||
> system_state == SYSTEM_RUNNING) {
> @@ -556,6 +557,13 @@ static void ipi_cpu_stop(unsigned int cpu)
>
> local_irq_disable();
>
> +#ifdef CONFIG_KEXEC
> + if (in_crash_kexec) {
> + crash_save_cpu(regs, cpu);
> + flush_cache_all();

Any cache maintenance will need to be by VA; flush_cache_all doesn't do
what the name implies, though may appear to work by chance.

Is kdump implemented for ARM? I don't see equivalent for in the arch/arm
ipi_cpu_stop.

Mark.

Marc Zyngier

unread,

Apr 24, 2015, 6:50:06 AM4/24/15

to

Indeed, parsing the irqdesc list is a recipe for disaster. Who knows
which locks have been taken or simply corrupted, pointers nuked...

> Is there any reason we can't get the GIC driver to nuke all of this at
> probe time?

This feels like the better option. I can cook a patch or two for that.

Thanks,

M.
--
Jazz is not dead. It just smells funny...

Baoquan He

unread,

Apr 28, 2015, 5:30:05 AM4/28/15

to

> +#ifdef CONFIG_CRASH_DUMP
> +/*
> + * reserve_elfcorehdr() - reserves memory for elf core header
> + *
> + * This function reserves memory area given in "elfcorehdr=" kernel command
> + * line parameter. The memory reserved is used by a dump capture kernel to
> + * identify the memory used by primary kernel.
> + */

Hi AKASHI,

May I know why elfcorehdr need be reserved separately but not locate a
memory region in crashkernel reserved region like all other ARCHs? Is
there any special reason?

Thanks
Baoquan

Dave Young

unread,

May 8, 2015, 8:30:06 AM5/8/15

to

Hi,

I noticed you are passing mem=X in kexec-tools, Pratyush found a problem
that vmcore is corrupted because kdump kernel is using the crash notes memory.

How does kdump kernel know the system ram range especially the mem_start?
Only with the size from mem=? parammeter?

In X86 because it use E820 so kexec-tools can pass the memory range to kdump
kernel even for UEFI booting. But in arm64 may need find other way to
communicate with 2nd kernel like memmap=exactmap in X86..

> _______________________________________________
> kexec mailing list
> ke...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/kexec

AKASHI Takahiro

unread,

May 11, 2015, 2:20:05 AM5/11/15

to

Hi

Sorry for late response. I was on vacation.

On 04/24/2015 06:53 PM, Mark Rutland wrote:
> Hi,
>
> On Fri, Apr 24, 2015 at 08:53:03AM +0100, AKASHI Takahiro wrote:
>> This patch set enables kdump (crash dump kernel) support on arm64 on top of
>> Geoff's kexec patchset.
>>
>> In this version, there are some arm64-specific usage/constraints:
>> 1) "mem=" boot parameter must be specified on crash dump kernel
>> if the system starts on uefi.
>
> This sounds very painful. Why is this the case, and how do x86 and/or
> ia64 get around that?

As Dave (Young) said, x86 uses "memmap=XX" kernel commandline parameters
to specify usable memory for crash dump kernel.
On my arm64 implementation, "linux,usable-memory" property is added
to device tree blob by kexec-tools for this purpose.
This is because, when I first implemented kdump on arm64, ppc is the only
architecture that supports kdump AND utilizes device trees.
Since kexec-tools as well as the kernel already has this framework,
I believed that device-tree approach was smarter than a commandline
parameter.

However, uefi-based kernel ignores all the memory-related properties
in a device tree and so this "mem=" workaround was added.

>> 2) Kvm will not be enabled on crash dump kernel even if configured
>> See commit messages and Documentation/kdump/kdump.txt for details.
>>
>> The only concern I have is whether or not we can use the exact same kernel
>> as both system kernel and crash dump kernel. The current arm64 kernel is
>> not relocatable in the exact sense but I see no problems in using the same
>> binary when testing kdump.
>
> Ard has been working on decoupling the kernel text/data, FDT, and linear
> memory mappings, which would allow the kernel to be loaded anywhere and
> still be able to access all of memory [3]. I assume that's what you mean
> by "relocatable"?

I'm still trying to understand Ard's patchset, but I think yes.

Is there any genric interface to do so?

>> I will address Mark's commit[2] after Geoff takes care of it on kexec.
>> - modified to use ioremap_cache() instead of ioremap() when reading
>> crash memory. Otherwise, accessing /proc/vmcore on crash dump kernel
>> might cause an alignment fault.
>> * allows reserve_crashkernel() to handle "crashkernel=xyz[MG]" correctly,
>> thanks to Pratyush Anand. And it now also enforces memory limit.
>
> I worry that there could be potentially bad interaction between this and
> Ard's patches, depending on how the memory area to use is chosen. It is
> probably fine, but we should make sure that it is.

I'm not sure what the point is, but I will try to check it.

Thanks,
-Takahiro AKASHI

AKASHI Takahiro

unread,

May 11, 2015, 2:50:06 AM5/11/15

to

On 04/24/2015 07:11 PM, Mark Rutland wrote:
> On Fri, Apr 24, 2015 at 08:53:04AM +0100, AKASHI Takahiro wrote:
>> On system kernel, the memory region used by crash dump kernel must be
>> specified by "crashkernel=X@Y" boot parameter. reserve_crashkernel()
>> will allocate the region in "System RAM" and reserve it for later use.
>>
>> On crash dump kernel, memory region information in system kernel is
>> described in a specific region specified by "elfcorehdr=X@Y" boot parameter.
>> reserve_elfcorehdr() will set aside the region to avoid data destruction
>> by the kernel.
>>
>> Crash dump kernel will access memory regions in system kernel via
>> copy_oldmem_page(), which reads a page by ioremap'ing it assuming that
>> such pages are not part of main memory of crash dump kernel.
>> This is true under non-UEFI environment because kexec-tools modifies
>> a device tree adding "usablemem" attributes to memory sections.
>
> I'm not sure what you mean by "usablemem" here.

I think I explained it in my previous reply.

> Do you just mean that the memory nodes are altered such that they only
> cover memory usable by the crash kernel?
>
> Why not _always_ require a command line argument for the crash kernel
> that restricts its memory usage to a particular range? That way it
> doesn't matter whether we're using UEFI or not.

This is one option, but why does uefi ignore all the memory properties?

>> Under UEFI, however, this is not true because UEFI remove memory sections
>> in a device tree and export all the memory regions, even though they belong
>> to system kernel.
>>
>> So we should add "mem=X[MG]" boot parameter to limit the memory size and
>> avoid hitting the following assertion in ioremap():
>> if (WARN_ON(pfn_valid(__phys_to_pfn(phys_addr))))
>> return NULL;
>
> That looks suspicious. What is being ioremapped at that point?

As explained so far, all the memory regions are exposed to crash dump kernel,
and it recognizes any pages which should belong to the old kernel also as
part of crash kernel's memory. So pfn_valid() returns true.

> [...]
>
>> @@ -393,6 +398,7 @@ void __init setup_arch(char **cmdline_p)
>> local_async_enable();
>>
>> efi_init();
>> +
>> arm64_memblock_init();
>>
>> paging_init();
>
> Nit: unrelated whitespace change.

Ok. Will fix it.

>> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
>> index ae85da6..ea70d41 100644
>> --- a/arch/arm64/mm/init.c
>> +++ b/arch/arm64/mm/init.c
>> @@ -34,6 +34,8 @@
>> #include <linux/dma-contiguous.h>
>> #include <linux/efi.h>
>> #include <linux/swiotlb.h>
>> +#include <linux/kexec.h>
>> +#include <linux/crash_dump.h>
>
> Nit: please keep these ordered.

Yeah, but others "linux/*.h" in this file are already in a random order.

> [...]
>
>> + if (memblock_reserve(crash_base, crash_size)) {
>> + pr_warn("crashkernel reservation failed - out of memory\n");
>> + return;
>> + }
>
> If we can remove this memory rather than reserving it, we can limit the
> first kernel's ability to accidentally clobber the crash kernel, at the
> expense of having to explicitly map/unmap around loading it.

Do you mean that we should remove mmu mapping of crash kernel memory?
Might be a good idea, but it requires modifying kernel/kexec.c.

-Takahiro AKASHI

AKASHI Takahiro

unread,

May 11, 2015, 3:20:05 AM5/11/15

to

Don't know. Will remove them.

> Please use #16 * N as the offset for consistency with entry.S, with 0
> for the first N.

OK.

I don't get the point. You mean that the code be put in probe() of GIC driver?
To be honest, this function was copied from arm's implementation.

> [...]
>
>> @@ -542,7 +543,7 @@ static DEFINE_RAW_SPINLOCK(stop_lock);
>> /*
>> * ipi_cpu_stop - handle IPI from smp_send_stop()
>> */
>> -static void ipi_cpu_stop(unsigned int cpu)
>> +static void ipi_cpu_stop(unsigned int cpu, struct pt_regs *regs)
>> {
>> if (system_state == SYSTEM_BOOTING ||
>> system_state == SYSTEM_RUNNING) {
>> @@ -556,6 +557,13 @@ static void ipi_cpu_stop(unsigned int cpu)
>>
>> local_irq_disable();
>>
>> +#ifdef CONFIG_KEXEC
>> + if (in_crash_kexec) {
>> + crash_save_cpu(regs, cpu);
>> + flush_cache_all();
>
> Any cache maintenance will need to be by VA; flush_cache_all doesn't do
> what the name implies, though may appear to work by chance.
>
> Is kdump implemented for ARM? I don't see equivalent for in the arch/arm
> ipi_cpu_stop.

Arm has a dedicated function in arch/arm/kernel/machine_kexec.c:
machine_crash_shutdown()
=> machine_crash_nonpanic_core()

Thanks,
-Takahiro AKASHI

AKASHI Takahiro

unread,

May 11, 2015, 3:40:06 AM5/11/15

to

Hi Baoquan,

On 04/28/2015 06:19 PM, Baoquan He wrote:
>> +#ifdef CONFIG_CRASH_DUMP
>> +/*
>> + * reserve_elfcorehdr() - reserves memory for elf core header
>> + *
>> + * This function reserves memory area given in "elfcorehdr=" kernel command
>> + * line parameter. The memory reserved is used by a dump capture kernel to
>> + * identify the memory used by primary kernel.
>> + */
>
> Hi AKASHI,
>
> May I know why elfcorehdr need be reserved separately but not locate a
> memory region in crashkernel reserved region like all other ARCHs? Is
> there any special reason?

I don't get your point, but arm as well as arm64 locates elfcorehdr
in a crash kernel's memory region.
See kexec/arch/arm{,64}/crashdump-arm{,64}.c in kexec-tools.

And this region is reserved at boot time *on crash kernel* because we don't want
to corrupt it accidentally.
(After Mark's comment, we might better remove the mmu mapping for this region, too.)

Make sense?

-Takahiro AKASHI

Dave Young

unread,

May 11, 2015, 3:50:06 AM5/11/15

to

Discussed with Pratyush in irc, he mentioned that start address is get from
kernel->entry which was passed by kexec-tools.

So sorry for the noise, it will works fine with the UEFI mem= limitation
fix from Mark Salter

Baoquan He

unread,

May 11, 2015, 4:00:06 AM5/11/15

to

On 05/11/15 at 04:38pm, AKASHI Takahiro wrote:
> Hi Baoquan,
>
> On 04/28/2015 06:19 PM, Baoquan He wrote:
> >>+#ifdef CONFIG_CRASH_DUMP
> >>+/*
> >>+ * reserve_elfcorehdr() - reserves memory for elf core header
> >>+ *
> >>+ * This function reserves memory area given in "elfcorehdr=" kernel command
> >>+ * line parameter. The memory reserved is used by a dump capture kernel to
> >>+ * identify the memory used by primary kernel.
> >>+ */
> >
> >Hi AKASHI,
> >
> >May I know why elfcorehdr need be reserved separately but not locate a
> >memory region in crashkernel reserved region like all other ARCHs? Is
> >there any special reason?
>
> I don't get your point, but arm as well as arm64 locates elfcorehdr
> in a crash kernel's memory region.
> See kexec/arch/arm{,64}/crashdump-arm{,64}.c in kexec-tools.
>
> And this region is reserved at boot time *on crash kernel* because we don't want
> to corrupt it accidentally.
> (After Mark's comment, we might better remove the mmu mapping for this region, too.)

Sorry, I don't make myself clear.

In this patch you reserve a separate memory region in 1st kernel to
store elfcorehdr. I am wondering why you don't call add_buffer in
kexec-tools directly. Like this you can get a region from reserved
crashkernel region. Then you don't need reserve_elfcorehdr() to reserve
memory for elfcorehdr specifically. Like other ARCHs do only one memory
region is reserved in 1st kernel, that's crashkernel region.

Thanks
Baoquan

AKASHI Takahiro

unread,

May 11, 2015, 4:00:07 AM5/11/15

to

Dave,

No, not at all.
As you know, in another thread (with Mark), we're discussing this.
If we conclude that we should go for "memmap=" approach, I will update my patch.

Thanks,
-Takahiro AKASHI

AKASHI Takahiro

unread,

May 11, 2015, 4:20:06 AM5/11/15

to

I think that you misunderstand somewhat.
* Kexec-tools only locates/identifies a small region for elfcore header within crash kernel's
memory region while 1st kernel is running.
* the data in elfcore header is filled up by kexec_load system call on 1st kernel.
* 1st kernel doesn't reserve any region for elfcore header because the kernel
commandline parameters don't contains "elfcorehdr=" parameter, then elfcorehdr_size=0.
* Crash dump kernel does reserve the region, as I said, because we don't want to
corrupt the info in elfcore header accidentally while crash kernel is running.

Clear?

-Takahiro AKASHI

Dave Young

unread,

May 11, 2015, 4:40:04 AM5/11/15

to

Ok, good to know. I did not notice the thread, will take a look.

There's another problem for memmap= approach though. In x86 we have problems when
we need pass reserved memory ranges as well because of some pci mmconfig issue.
In that case kernel command line buffer is too small and there's too many memory
ranges for reserved ranges. Thus later we switched to pass e820 in x86 boot params
directly.

If there's same potential problems we may consider to use memmap= only for system
ram, and for any other reserved types we can still get from same dtb in 1st kernel.

Thanks
Dave

Baoquan He

unread,

May 11, 2015, 5:50:05 AM5/11/15

to

On 05/11/15 at 05:17pm, AKASHI Takahiro wrote:
> On 05/11/2015 04:54 PM, Baoquan He wrote:
> >In this patch you reserve a separate memory region in 1st kernel to
> >store elfcorehdr. I am wondering why you don't call add_buffer in
> >kexec-tools directly. Like this you can get a region from reserved
> >crashkernel region. Then you don't need reserve_elfcorehdr() to reserve
> >memory for elfcorehdr specifically. Like other ARCHs do only one memory
> >region is reserved in 1st kernel, that's crashkernel region.
>
> I think that you misunderstand somewhat.
> * Kexec-tools only locates/identifies a small region for elfcore header within crash kernel's
> memory region while 1st kernel is running.
> * the data in elfcore header is filled up by kexec_load system call on 1st kernel.
> * 1st kernel doesn't reserve any region for elfcore header because the kernel
> commandline parameters don't contains "elfcorehdr=" parameter, then elfcorehdr_size=0.
> * Crash dump kernel does reserve the region, as I said, because we don't want to
> corrupt the info in elfcore header accidentally while crash kernel is running.
>
> Clear?

OK, got it now.

Then I am wondering why "elfcorehdr=" can't be contained in kernel
cmdline as other ARCH does. Maybe I need go over all related threads
then know why it is. Thanks for explanation.

Thanks
Baoquan

Dave Young

unread,

May 12, 2015, 1:50:06 AM5/12/15

to

On 05/11/15 at 03:16pm, AKASHI Takahiro wrote:
> Hi
>
> Sorry for late response. I was on vacation.
>
> On 04/24/2015 06:53 PM, Mark Rutland wrote:
> >Hi,
> >
> >On Fri, Apr 24, 2015 at 08:53:03AM +0100, AKASHI Takahiro wrote:
> >>This patch set enables kdump (crash dump kernel) support on arm64 on top of
> >>Geoff's kexec patchset.
> >>
> >>In this version, there are some arm64-specific usage/constraints:
> >>1) "mem=" boot parameter must be specified on crash dump kernel
> >> if the system starts on uefi.
> >
> >This sounds very painful. Why is this the case, and how do x86 and/or
> >ia64 get around that?
>
> As Dave (Young) said, x86 uses "memmap=XX" kernel commandline parameters
> to specify usable memory for crash dump kernel.

Originally x86 use memmap=exactmap memmap=XX to specify each section of
memories for 2nd kernel. But later because a lot of reserved type ranges
need to be passed ie. for pci mmconfig, and kernel cmdline buffer is
limited so kexec-tools later switch to passing these in x86 boot params as
E820 memory ranges directly.

> On my arm64 implementation, "linux,usable-memory" property is added
> to device tree blob by kexec-tools for this purpose.
> This is because, when I first implemented kdump on arm64, ppc is the only
> architecture that supports kdump AND utilizes device trees.
> Since kexec-tools as well as the kernel already has this framework,
> I believed that device-tree approach was smarter than a commandline
> parameter.
>
> However, uefi-based kernel ignores all the memory-related properties
> in a device tree and so this "mem=" workaround was added.

Kdump kernel reuses the memmap info getting from firmware during 1st kernel
boot, I do not think the memmap info can be cooked for crash kernel usable
memory. But it might be a better way to use a special fdt node for crash
kernel memory even for UEFI..

Another way is introducing a similar memmap=, but maybe consider only
system_ram type ranges. For other memory areas still use UEFI memmap.

Thanks
Dave

AKASHI Takahiro

unread,

May 12, 2015, 3:40:05 AM5/12/15

to

On 05/11/2015 06:41 PM, Baoquan He wrote:
> On 05/11/15 at 05:17pm, AKASHI Takahiro wrote:
>> On 05/11/2015 04:54 PM, Baoquan He wrote:
>>> In this patch you reserve a separate memory region in 1st kernel to
>>> store elfcorehdr. I am wondering why you don't call add_buffer in
>>> kexec-tools directly. Like this you can get a region from reserved
>>> crashkernel region. Then you don't need reserve_elfcorehdr() to reserve
>>> memory for elfcorehdr specifically. Like other ARCHs do only one memory
>>> region is reserved in 1st kernel, that's crashkernel region.
>>
>> I think that you misunderstand somewhat.
>> * Kexec-tools only locates/identifies a small region for elfcore header within crash kernel's
>> memory region while 1st kernel is running.
>> * the data in elfcore header is filled up by kexec_load system call on 1st kernel.
>> * 1st kernel doesn't reserve any region for elfcore header because the kernel
>> commandline parameters don't contains "elfcorehdr=" parameter, then elfcorehdr_size=0.
>> * Crash dump kernel does reserve the region, as I said, because we don't want to
>> corrupt the info in elfcore header accidentally while crash kernel is running.
>>
>> Clear?
>
> OK, got it now.
>
> Then I am wondering why "elfcorehdr=" can't be contained in kernel
> cmdline as other ARCH does. Maybe I need go over all related threads
> then know why it is. Thanks for explanation.

Kexec-tools on 1st kernel appends "elfcorehdr=" to the kernel command line,
actually chosen/bootargs in a device tree, that is passed to *crash dump kernel*.
So when crash dump kernel boots up, it can recognizes that area (and reserves it
for later use of managing /proc/vmcore.)

Thanks,
-Takahiro AKASHI

AKASHI Takahiro

unread,

May 18, 2015, 4:10:05 AM5/18/15

to

Mark,

> Kdump kernel reuses the memmap info getting from firmware during 1st kernel
> boot, I do not think the memmap info can be cooked for crash kernel usable
> memory. But it might be a better way to use a special fdt node for crash
> kernel memory even for UEFI..

Do you still prefer "memmap=" approach?

Just FYI,

x86 arm arm64 powerpc{,64}

(a) cmdline(crashkernel=) cmdline cmdline cmdline
(b) iomem iomem iomem dev tree
(c) cmdline(memmap=) cmdline(mem=) dev tree dev tree
& cmdline(mem=)

(a) how we specify/reserve crash dump kernel's memory on 1st kernel
(b) how we inform userspace tool of crash dump kernel's memory on 1st kernel
- iomem: add "Crash kernel" entry in /proc/iomem
- dev tree: add /chosen/linux,crashkernel-base (and crashkernel-size)
(c) how we specify usable memory on crash dump kernel
- dev tree: add /chosen/memory/linux,usable-memory

As you see, arm64, and arm as well, does things in a *hybrid* way of x86 and ppc.
I think we should go for a uniform way with device tree as ppc does.
(Even for (a), we can use a device tree.)

> Another way is introducing a similar memmap=, but maybe consider only
> system_ram type ranges. For other memory areas still use UEFI memmap.

Yeah, I don't know whether we need additional formats of memmap=, like
memmap=XX$YY or memmap=XX#YY, for ACPI stuffs.

Thanks,
-Takahiro AKASHI

AKASHI Takahiro

unread,

May 22, 2015, 2:00:05 AM5/22/15

to

Mark,

Could you please elaborate a bit more details so that I can address the issue?

Thanks,
-Takahiro AKASHI

AKASHI Takahiro

unread,

Aug 6, 2015, 3:20:06 AM8/6/15

to

Marc, Mark

Sorry for not revisiting your comment below for a while.

Is it just enough to remove kexec_mask_interrupts() and add gic_eoi_irq()
at the beginning of gic_cpu_init() in irq-gic.c and irq-gic-v3.c?

> This feels like the better option. I can cook a patch or two for that.

If you do, that will be much better :)

BTW, in arm-gic-v3.h, GICD_CTRL_ARE_NS is defined as
(1U << 4)
but should it be 5?
(I'm referring to the page 8-415 in IHI0069A.)

Thanks,
-Takahiro AKASHI

> Thanks,
>
> M.

Marc Zyngier

unread,

Aug 6, 2015, 12:00:05 PM8/6/15

to

Hi,

On 06/08/15 08:09, AKASHI Takahiro wrote:
> Marc, Mark
>
> Sorry for not revisiting your comment below for a while.

Wow. It took me a few minutes to page the context back in.

No, doing an EOI is definitely the wrong thing to do. If you do it in
the wrong order, you just screw up the GIC state machine. Plus, you have
no idea what to write there...

The only real solution is to zero the "active" registers.

>> This feels like the better option. I can cook a patch or two for that.
>
> If you do, that will be much better :)

OK, I'll prepare something that we can merge at the same time kexec
comes back from the dead (if it ever does - I'm not holding my breath).

>
> BTW, in arm-gic-v3.h, GICD_CTRL_ARE_NS is defined as
> (1U << 4)
> but should it be 5?
> (I'm referring to the page 8-415 in IHI0069A.)

No, look at the definition ARE_NS has when the access is non-secure or
on a system supporting a single security state. The definition you're
referring to is for a secure access (firmware).

Thanks,

M.
--
Jazz is not dead. It just smells funny...

AKASHI Takahiro

unread,

Aug 7, 2015, 12:30:05 AM8/7/15

to

Marc,

On 08/07/2015 12:51 AM, Marc Zyngier wrote:
> Hi,
>
> On 06/08/15 08:09, AKASHI Takahiro wrote:
>> Marc, Mark
>>
>> Sorry for not revisiting your comment below for a while.
>
> Wow. It took me a few minutes to page the context back in.

Please don't purge the page from your cache for a while :)

Thank you.
Please note that the same function, machine_kexec_mask_interrupts(),
is already there on arm(/kernel/machine_kexec.c).

Well, kexec/kdump stuff is not dead.
Hopefully I and Geoff will submit a full series of patchset in a few weeks
although the main logic will be the same.

>>
>> BTW, in arm-gic-v3.h, GICD_CTRL_ARE_NS is defined as
>> (1U << 4)
>> but should it be 5?
>> (I'm referring to the page 8-415 in IHI0069A.)
>
> No, look at the definition ARE_NS has when the access is non-secure or
> on a system supporting a single security state. The definition you're
> referring to is for a secure access (firmware).

Aha, I should remember that arm has "multiple personalities."

-Takahiro AKASHI

> Thanks,
>
> M.