[C/R ARM][PATCH 0/3] Linux Checkpoint-Restart

Christoffer Dall

Mar 21, 2010, 9:10:02 PM

Two preparatory patches for an ARM port of the checkpoint-restart code
follow, and then a third patch implementing the architecture-specific
parts of c/r.

The preparatory patches consist of a systrace implementation for ARM,
based on a previous patch from Roland McGrath, and an eclone implementation
for ARM. The systrace implementation is partial and provides only the
functionality needed for c/r.

There is a separate patch for the user space code, which adds support for
cross-compilation, header extraction for ARM, and an eclone implementation
for ARM.

The kernel patches presented here are based on the ckpt-v20 patch set.

Signed-off-by: Christoffer Dall <christo...@christofferdall.dk>
Acked-by: Oren Laadan <or...@cs.columbia.edu>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Christoffer Dall

Mar 21, 2010, 9:10:02 PM

Implements the architecture-specific requirements for checkpoint/restart on
ARM. The changes touch c/r-related code almost exclusively. Most of the work
is done in arch/arm/kernel/checkpoint.c, which implements checkpointing of
the CPU state and the necessary fields of the thread_info struct.

The ISA version (given by __LINUX_ARM_ARCH__) is checkpointed and verified
against the machine architecture on restart. If they differ, an error is
raised and restart aborted. It should be possible to restart on newer
architectures, but further investigation is warranted.

Regarding ThumbEE, the thumbee_state field on the thread_info is stored
in checkpoints when CONFIG_ARM_THUMBEE is set, and 0 is stored otherwise.
If a nonzero value is checkpointed and CONFIG_ARM_THUMBEE is not
set on the restore system, the restore is aborted. Feedback on this
implementation is very welcome.
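
The ThumbEE rule above can be modelled in plain C outside the kernel. This
is only a sketch under stated assumptions: demo_restore_thumbee() and its
have_thumbee parameter are hypothetical stand-ins for the
CONFIG_ARM_THUMBEE compile-time switch and are not part of the patch.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical userspace model of the ThumbEE restore rule.
 * have_thumbee stands in for CONFIG_ARM_THUMBEE on the restoring kernel.
 * On success with ThumbEE support, *ti_state receives the checkpointed
 * value. A nonzero checkpointed value is rejected when the restoring
 * kernel has no ThumbEE support, and *ti_state is left untouched.
 */
static int demo_restore_thumbee(uint32_t ckpt_state, int have_thumbee,
				uint32_t *ti_state)
{
	if (have_thumbee) {
		*ti_state = ckpt_state;	/* zero means "ThumbEE unused" */
		return 0;
	}
	if (ckpt_state != 0)
		return -EINVAL;		/* state cannot be represented */
	return 0;
}
```

Note the direction of the assignment: on restore, the value flows from the
checkpoint image into the thread state, never the other way round.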

We checkpoint whether the system is running with CONFIG_MMU or not and
require the same configuration for the system on which we restore the
process. It might be possible to allow something more fine-grained,
if it's worth the energy. Input on this item is also very welcome,
specifically from someone who knows the exact meaning of the end_brk
field.

Added support for syscall sys_checkpoint and sys_restart for ARM:
__NR_checkpoint 367
__NR_restart 368

Cc: r...@arm.linux.org.uk

Signed-off-by: Christoffer Dall <christo...@christofferdall.dk>
Acked-by: Oren Laadan <or...@cs.columbia.edu>

---
arch/arm/Kconfig | 4 +
arch/arm/include/asm/checkpoint_hdr.h | 71 +++++++++
arch/arm/include/asm/ptrace.h | 1 +
arch/arm/include/asm/unistd.h | 2 +
arch/arm/kernel/Makefile | 1 +
arch/arm/kernel/calls.S | 2 +
arch/arm/kernel/checkpoint.c | 276 +++++++++++++++++++++++++++++++++
arch/arm/kernel/signal.c | 5 +
arch/arm/kernel/sys_arm.c | 13 ++
include/linux/checkpoint_hdr.h | 2 +
10 files changed, 377 insertions(+), 0 deletions(-)
create mode 100644 arch/arm/include/asm/checkpoint_hdr.h
create mode 100644 arch/arm/kernel/checkpoint.c

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 184a6bd..fe83129 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -94,6 +94,10 @@ config HAVE_LATENCYTOP_SUPPORT
depends on !SMP
default y

+config CHECKPOINT_SUPPORT
+ bool
+ default y
+
config LOCKDEP_SUPPORT
bool
default y
diff --git a/arch/arm/include/asm/checkpoint_hdr.h b/arch/arm/include/asm/checkpoint_hdr.h
new file mode 100644
index 0000000..c08a4ae
--- /dev/null
+++ b/arch/arm/include/asm/checkpoint_hdr.h
@@ -0,0 +1,71 @@
+#ifndef __ASM_ARM_CKPT_HDR_H
+#define __ASM_ARM_CKPT_HDR_H
+/*
+ * Checkpoint/restart - architecture specific headers ARM
+ *
+ * Copyright (C) 2008-2010 Oren Laadan
+ * Copyright 2010 Christoffer Dall
+ *
+ * This file is subject to the terms and conditions of the GNU General Public
+ * License. See the file COPYING in the main directory of the Linux
+ * distribution for more details.
+ */
+
+#ifndef _CHECKPOINT_CKPT_HDR_H_
+#error asm/checkpoint_hdr.h included directly
+#endif
+
+#include <linux/types.h>
+
+/* ARM structure seen from kernel/userspace */
+#ifdef __KERNEL__
+#include <asm/processor.h>
+#endif
+
+#define CKPT_ARCH_ID CKPT_ARCH_ARM
+
+/* arch dependent constants */
+#define CKPT_ARCH_NSIG 64
+#define CKPT_TTY_NCC 8
+
+#ifdef __KERNEL__
+
+#include <asm/signal.h>
+#if CKPT_ARCH_NSIG != _NSIG
+#error CKPT_ARCH_NSIG size is wrong per asm/signal.h and asm/checkpoint_hdr.h
+#endif
+
+#include <linux/tty.h>
+#if CKPT_TTY_NCC != NCC
+#error CKPT_TTY_NCC size is wrong per asm-generic/termios.h
+#endif
+
+#endif /* __KERNEL__ */
+
+
+struct ckpt_hdr_header_arch {
+ struct ckpt_hdr h;
+ __u32 linux_arm_arch;
+ __u8 mmu; /* Checkpointed on mmu system */
+ __u8 oabi_compat; /* Checkpointed on old ABI compat. system */
+} __attribute__((aligned(8)));
+
+struct ckpt_hdr_thread {
+ struct ckpt_hdr h;
+ __u32 syscall;
+ __u32 tp_value;
+ __u32 thumbee_state;
+} __attribute__((aligned(8)));
+
+
+struct ckpt_hdr_cpu {
+ struct ckpt_hdr h;
+ __u32 uregs[18];
+} __attribute__((aligned(8)));
+
+struct ckpt_hdr_mm_context {
+ struct ckpt_hdr h;
+ __u32 end_brk;
+} __attribute__((aligned(8)));
+
+#endif /* __ASM_ARM_CKPT_HDR_H */
diff --git a/arch/arm/include/asm/ptrace.h b/arch/arm/include/asm/ptrace.h
index eec6e89..624e5d1 100644
--- a/arch/arm/include/asm/ptrace.h
+++ b/arch/arm/include/asm/ptrace.h
@@ -57,6 +57,7 @@
#define PSR_C_BIT 0x20000000
#define PSR_Z_BIT 0x40000000
#define PSR_N_BIT 0x80000000
+#define PSR_GE_BITS 0x000f0000

/*
* Groups of PSR bits
diff --git a/arch/arm/include/asm/unistd.h b/arch/arm/include/asm/unistd.h
index f295a6c..7ec526e 100644
--- a/arch/arm/include/asm/unistd.h
+++ b/arch/arm/include/asm/unistd.h
@@ -393,6 +393,8 @@
#define __NR_perf_event_open (__NR_SYSCALL_BASE+364)
#define __NR_recvmmsg (__NR_SYSCALL_BASE+365)
#define __NR_eclone (__NR_SYSCALL_BASE+366)
+#define __NR_checkpoint (__NR_SYSCALL_BASE+367)
+#define __NR_restart (__NR_SYSCALL_BASE+368)

/*
* The following SWIs are ARM private.
diff --git a/arch/arm/kernel/Makefile b/arch/arm/kernel/Makefile
index dd00f74..1669065 100644
--- a/arch/arm/kernel/Makefile
+++ b/arch/arm/kernel/Makefile
@@ -38,6 +38,7 @@ obj-$(CONFIG_ARM_THUMBEE) += thumbee.o
obj-$(CONFIG_KGDB) += kgdb.o
obj-$(CONFIG_ARM_UNWIND) += unwind.o
obj-$(CONFIG_HAVE_TCM) += tcm.o
+obj-$(CONFIG_CHECKPOINT) += checkpoint.o

obj-$(CONFIG_CRUNCH) += crunch.o crunch-bits.o
AFLAGS_crunch-bits.o := -Wa,-mcpu=ep9312
diff --git a/arch/arm/kernel/calls.S b/arch/arm/kernel/calls.S
index 5ef0b03..aefb432 100644
--- a/arch/arm/kernel/calls.S
+++ b/arch/arm/kernel/calls.S
@@ -376,6 +376,8 @@
CALL(sys_perf_event_open)
/* 365 */ CALL(sys_recvmmsg)
CALL(sys_eclone_wrapper)
+ CALL(sys_checkpoint)
+ CALL(sys_restart)
#ifndef syscalls_counted
.equ syscalls_padding, ((NR_syscalls + 3) & ~3) - NR_syscalls
#define syscalls_counted
diff --git a/arch/arm/kernel/checkpoint.c b/arch/arm/kernel/checkpoint.c
new file mode 100644
index 0000000..1c9bb34
--- /dev/null
+++ b/arch/arm/kernel/checkpoint.c
@@ -0,0 +1,276 @@
+/*
+ * Checkpoint/restart - architecture specific support for ARM
+ *
+ * Copyright (C) 2008-2010 Oren Laadan
+ * Copyright (C) 2010 Christoffer Dall
+ *
+ * This file is subject to the terms and conditions of the GNU General Public
+ * License. See the file COPYING in the main directory of the Linux
+ * distribution for more details.
+ */
+#include <linux/checkpoint.h>
+#include <linux/checkpoint_hdr.h>
+
+#include <asm/processor.h>
+
+
+#ifdef CONFIG_MMU
+ const u8 ckpt_mmu = 1;
+#else
+ const u8 ckpt_mmu = 0;
+#endif
+
+#ifdef CONFIG_OABI_COMPAT
+ const u8 ckpt_oabi_compat = 1;
+#else
+ const u8 ckpt_oabi_compat = 0;
+#endif
+
+
+/**************************************************************************
+ * Checkpoint
+ */
+
+/* dump the thread_struct of a given task */
+int checkpoint_thread(struct ckpt_ctx *ctx, struct task_struct *t)
+{
+ int ret;
+ struct ckpt_hdr_thread *h;
+ struct thread_info *ti = task_thread_info(t);
+
+ h = ckpt_hdr_get_type(ctx, sizeof(*h), CKPT_HDR_THREAD);
+ if (!h)
+ return -ENOMEM;
+
+ /*
+ * Store the syscall information about the checkpointed process
+ * as we need to know if the process was doing a syscall (and which)
+ * during restart.
+ */
+ h->syscall = ti->syscall;
+
+ /*
+ * Store remaining thread-specific info.
+ */
+ h->tp_value = ti->tp_value;
+#ifdef CONFIG_ARM_THUMBEE
+ h->thumbee_state = ti->thumbee_state;
+#else
+ /*
+ * If restoring on a system with ThumbEE support,
+ * zero will set ThumbEE state to unused.
+ */
+ h->thumbee_state = 0;
+#endif
+
+ ret = ckpt_write_obj(ctx, &h->h);
+ ckpt_hdr_put(ctx, h);
+ return ret;
+}
+
+static void save_cpu_regs(struct ckpt_hdr_cpu *h, struct task_struct *t)
+{
+ struct pt_regs *regs = task_pt_regs(t);
+
+ memcpy(&h->uregs, regs, sizeof(h->uregs));
+
+ /*
+ * for checkpoint in process context (from within a container),
+ * the actual syscall is taking place at this very moment; so
+ * we (optimistically) substitute the future return value (0) of
+ * this syscall into r0, so that upon restart it will
+ * succeed (or it will endlessly retry checkpoint...)
+ */
+ if (t == current)
+ h->ARM_r0 = 0;
+}
+
+/* dump the cpu state and registers of a given task */
+int checkpoint_cpu(struct ckpt_ctx *ctx, struct task_struct *t)
+{
+ struct ckpt_hdr_cpu *h;
+ int ret;
+
+ h = ckpt_hdr_get_type(ctx, sizeof(*h), CKPT_HDR_CPU);
+ if (!h)
+ return -ENOMEM;
+
+ save_cpu_regs(h, t);
+
+ ret = ckpt_write_obj(ctx, &h->h);
+ ckpt_hdr_put(ctx, h);
+ return ret;
+}
+
+int checkpoint_write_header_arch(struct ckpt_ctx *ctx)
+{
+ struct ckpt_hdr_header_arch *arch_hdr;
+ int ret;
+
+ arch_hdr = ckpt_hdr_get_type(ctx, sizeof(*arch_hdr),
+ CKPT_HDR_HEADER_ARCH);
+ if (!arch_hdr)
+ return -ENOMEM;
+
+ arch_hdr->linux_arm_arch = __LINUX_ARM_ARCH__;
+ arch_hdr->mmu = ckpt_mmu;
+ arch_hdr->oabi_compat = ckpt_oabi_compat;
+
+ ret = ckpt_write_obj(ctx, &arch_hdr->h);
+ ckpt_hdr_put(ctx, arch_hdr);
+
+ return ret;
+}
+
+/* dump the mm->context state */
+int checkpoint_mm_context(struct ckpt_ctx *ctx, struct mm_struct *mm)
+{
+ struct ckpt_hdr_mm_context *h;
+ int ret = 0;
+
+ h = ckpt_hdr_get_type(ctx, sizeof(*h), CKPT_HDR_MM_CONTEXT);
+ if (!h)
+ return -ENOMEM;
+
+#ifdef CONFIG_MMU
+ /*
+ * We do not checkpoint kvm_seq as we do not know of any generally
+ * exported functionality which would associate an ioremapped VMA
+ * with a task. A driver might use this functionality, but should
+ * implement its own checkpoint functionality to deal with this.
+ */
+#else
+ h->end_brk = mm->context.end_brk;
+#endif
+
+ ret = ckpt_write_obj(ctx, &h->h);
+ ckpt_hdr_put(ctx, h);
+ return ret;
+}
+
+/**************************************************************************
+ * Restart
+ */
+
+/* read the thread_struct into the current task */
+int restore_thread(struct ckpt_ctx *ctx)
+{
+ struct ckpt_hdr_thread *h;
+ int ret = 0;
+ struct thread_info *ti = task_thread_info(current);
+
+ h = ckpt_read_obj_type(ctx, sizeof(*h), CKPT_HDR_THREAD);
+ if (IS_ERR(h))
+ return PTR_ERR(h);
+
+ ti->syscall = h->syscall;
+ ti->tp_value = h->tp_value;
+
+#ifdef CONFIG_ARM_THUMBEE
+ /*
+ * If the checkpoint system did not support ThumbEE, this field
+ * will be zero, equivalent to unused ThumbEE state.
+ */
+ ti->thumbee_state = h->thumbee_state;
+#else
+ if (h->thumbee_state != 0) {
+ ret = -EINVAL;
+ ckpt_err(ctx, ret, "Checkpoint had ThumbEE state but "
+ "ARM_THUMBEE not configured.");
+ }
+#endif
+
+ ckpt_hdr_put(ctx, h);
+ return ret;
+}
+
+static int load_cpu_regs(struct ckpt_hdr_cpu *h, struct task_struct *t)
+{
+ int i;
+ struct pt_regs *regs = task_pt_regs(t);
+
+ memcpy(regs, &h->uregs, sizeof(struct pt_regs));
+
+ for (i = 0; i < 16; i++)
+ regs->uregs[i] = h->uregs[i];
+
+ /*
+ * Restore only user-writable bits on the CPSR
+ */
+ regs->ARM_cpsr = regs->ARM_cpsr |
+ (h->ARM_cpsr & (PSR_N_BIT | PSR_Z_BIT |
+ PSR_C_BIT | PSR_V_BIT |
+ PSR_Q_BIT | PSR_E_BIT |
+ PSR_GE_BITS));
+ regs->ARM_ORIG_r0 = h->ARM_ORIG_r0;
+
+ return 0;
+}
+
+/* read the cpu state and registers for the current task */
+int restore_cpu(struct ckpt_ctx *ctx)
+{
+ struct ckpt_hdr_cpu *h;
+ struct task_struct *t = current;
+ int ret;
+
+ h = ckpt_read_obj_type(ctx, sizeof(*h), CKPT_HDR_CPU);
+ if (IS_ERR(h))
+ return PTR_ERR(h);
+
+ ret = load_cpu_regs(h, t);
+ ckpt_hdr_put(ctx, h);
+ return ret;
+}
+
+int restore_read_header_arch(struct ckpt_ctx *ctx)
+{
+ struct ckpt_hdr_header_arch *arch_hdr;
+ int ret = -EINVAL;
+
+ arch_hdr = ckpt_read_obj_type(ctx, sizeof(*arch_hdr),
+ CKPT_HDR_HEADER_ARCH);
+ if (IS_ERR(arch_hdr))
+ return PTR_ERR(arch_hdr);
+
+ if (arch_hdr->linux_arm_arch != __LINUX_ARM_ARCH__) {
+ ckpt_err(ctx, ret, "incompatible ARM architecture versions");
+ goto out;
+ }
+
+ /* TODO: Maybe compatibility can be more fine-grained */
+ if (arch_hdr->mmu != ckpt_mmu) {
+ ckpt_err(ctx, ret, "checkpoint %s MMU, restore %s MMU",
+ arch_hdr->mmu ? "with" : "without",
+ ckpt_mmu ? "with" : "without");
+ goto out;
+ }
+
+ ret = 0;
+
+ if (arch_hdr->oabi_compat && !ckpt_oabi_compat) {
+ ckpt_msg(ctx, "warning: process may have used old ABI. "
+ "CONFIG_OABI_COMPAT not set.");
+ }
+
+out:
+ ckpt_hdr_put(ctx, arch_hdr);
+ return ret;
+}
+
+int restore_mm_context(struct ckpt_ctx *ctx, struct mm_struct *mm)
+{
+ struct ckpt_hdr_mm_context *h;
+ int ret = 0;
+
+ h = ckpt_read_obj_type(ctx, sizeof(*h), CKPT_HDR_MM_CONTEXT);
+ if (IS_ERR(h))
+ return PTR_ERR(h);
+
+#if !CONFIG_MMU
+ mm->context.end_brk = h->end_brk;
+#endif
+
+ ckpt_hdr_put(ctx, h);
+ return ret;
+}
diff --git a/arch/arm/kernel/signal.c b/arch/arm/kernel/signal.c
index f695239..b42c39a 100644
--- a/arch/arm/kernel/signal.c
+++ b/arch/arm/kernel/signal.c
@@ -688,6 +688,11 @@ static void do_signal(struct pt_regs *regs)
single_step_set(current);
}

+int task_has_saved_sigmask(struct task_struct *task)
+{
+ return !!(task_thread_info(task)->flags & _TIF_RESTORE_SIGMASK);
+}
+
asmlinkage void
do_notify_resume(struct pt_regs *regs, unsigned int thread_flags)
{
diff --git a/arch/arm/kernel/sys_arm.c b/arch/arm/kernel/sys_arm.c
index fd8199d..eb178ad 100644
--- a/arch/arm/kernel/sys_arm.c
+++ b/arch/arm/kernel/sys_arm.c
@@ -27,6 +27,7 @@
#include <linux/file.h>
#include <linux/ipc.h>
#include <linux/uaccess.h>
+#include <linux/checkpoint.h>

struct mmap_arg_struct {
unsigned long addr;
@@ -295,3 +296,15 @@ asmlinkage long sys_arm_fadvise64_64(int fd, int advice,
{
return sys_fadvise64_64(fd, offset, len, advice);
}
+
+asmlinkage long sys_checkpoint(unsigned long pid, unsigned long fd,
+ unsigned long flags, unsigned long logfd)
+{
+ return do_sys_checkpoint(pid, fd, flags, logfd);
+}
+
+asmlinkage long sys_restart(unsigned long pid, unsigned long fd,
+ unsigned long flags, unsigned long logfd)
+{
+ return do_sys_restart(pid, fd, flags, logfd);
+}
diff --git a/include/linux/checkpoint_hdr.h b/include/linux/checkpoint_hdr.h
index 41412d1..8309a3b 100644
--- a/include/linux/checkpoint_hdr.h
+++ b/include/linux/checkpoint_hdr.h
@@ -202,6 +202,8 @@ enum {
#define CKPT_ARCH_PPC32 CKPT_ARCH_PPC32
CKPT_ARCH_PPC64,
#define CKPT_ARCH_PPC64 CKPT_ARCH_PPC64
+ CKPT_ARCH_ARM,
+#define CKPT_ARCH_ARM CKPT_ARCH_ARM
};

/* shared objrects (objref) */
--
1.5.6.5

Christoffer Dall

Mar 21, 2010, 9:10:02 PM

This small commit introduces a global state of system calls for ARM
making it possible for a debugger or checkpointing to gain information
about another process' state with respect to system calls.

The patch is based on this proposal from Roland McGrath:
https://patchwork.kernel.org/patch/32101/

Cc: Roland McGrath <rol...@redhat.com>

Signed-off-by: Christoffer Dall <christo...@christofferdall.dk>
Acked-by: Oren Laadan <or...@cs.columbia.edu>

---
arch/arm/include/asm/syscall.h | 31 +++++++++++++++++++++++++++++++
arch/arm/kernel/asm-offsets.c | 1 +
arch/arm/kernel/entry-common.S | 8 +++++++-
arch/arm/kernel/ptrace.c | 2 --
arch/arm/kernel/signal.c | 14 +++++++-------
5 files changed, 46 insertions(+), 10 deletions(-)
create mode 100644 arch/arm/include/asm/syscall.h

diff --git a/arch/arm/include/asm/syscall.h b/arch/arm/include/asm/syscall.h
new file mode 100644
index 0000000..3b3248f
--- /dev/null
+++ b/arch/arm/include/asm/syscall.h
@@ -0,0 +1,31 @@
+/*
+ * syscall.h - Linux syscall interfaces for ARM
+ *
+ * Copyright (c) 2010 Christoffer Dall
+ *
+ * This file is released under the GPLv2.
+ * See the file COPYING for more details.
+ */
+
+#ifndef _ASM_ARM_SYSCALLS_H
+#define _ASM_ARM_SYSCALLS_H
+
+static inline int syscall_get_nr(struct task_struct *task,
+ struct pt_regs *regs)
+{
+ return (int)(task_thread_info(task)->syscall);
+}
+
+static inline long syscall_get_return_value(struct task_struct *task,
+ struct pt_regs *regs)
+{
+ return regs->ARM_r0;
+}
+
+static inline long syscall_get_error(struct task_struct *task,
+ struct pt_regs *regs)
+{
+ return regs->ARM_r0;
+}
+
+#endif /* _ASM_ARM_SYSCALLS_H */
diff --git a/arch/arm/kernel/asm-offsets.c b/arch/arm/kernel/asm-offsets.c
index 4a88125..726a0ad 100644
--- a/arch/arm/kernel/asm-offsets.c
+++ b/arch/arm/kernel/asm-offsets.c
@@ -48,6 +48,7 @@ int main(void)
DEFINE(TI_CPU, offsetof(struct thread_info, cpu));
DEFINE(TI_CPU_DOMAIN, offsetof(struct thread_info, cpu_domain));
DEFINE(TI_CPU_SAVE, offsetof(struct thread_info, cpu_context));
+ DEFINE(TI_SYSCALL, offsetof(struct thread_info, syscall));
DEFINE(TI_USED_CP, offsetof(struct thread_info, used_cp));
DEFINE(TI_TP_VALUE, offsetof(struct thread_info, tp_value));
DEFINE(TI_FPSTATE, offsetof(struct thread_info, fpstate));
diff --git a/arch/arm/kernel/entry-common.S b/arch/arm/kernel/entry-common.S
index 2c1db77..f694f4d 100644
--- a/arch/arm/kernel/entry-common.S
+++ b/arch/arm/kernel/entry-common.S
@@ -30,6 +30,9 @@ ret_fast_syscall:
tst r1, #_TIF_WORK_MASK
bne fast_work_pending

+ mov r2, #-1
+ str r2, [tsk, #TI_SYSCALL]
+
/* perform architecture specific actions before user return */
arch_ret_to_user r1, lr

@@ -47,7 +50,6 @@ work_pending:
tst r1, #_TIF_SIGPENDING|_TIF_NOTIFY_RESUME
beq no_work_pending
mov r0, sp @ 'regs'
- mov r2, why @ 'syscall'
bl do_notify_resume
b ret_slow_syscall @ Check work again

@@ -62,6 +64,9 @@ ret_slow_syscall:
ldr r1, [tsk, #TI_FLAGS]
tst r1, #_TIF_WORK_MASK
bne work_pending
+
+ mov r2, #-1
+ str r2, [tsk, #TI_SYSCALL]
no_work_pending:
/* perform architecture specific actions before user return */
arch_ret_to_user r1, lr
@@ -274,6 +279,7 @@ ENTRY(vector_swi)
eor scno, scno, #__NR_SYSCALL_BASE @ check OS number
#endif

+ str scno, [tsk, #TI_SYSCALL] @ store syscall nr. globally
stmdb sp!, {r4, r5} @ push fifth and sixth args
tst ip, #_TIF_SYSCALL_TRACE @ are we tracing syscalls?
bne __sys_trace
diff --git a/arch/arm/kernel/ptrace.c b/arch/arm/kernel/ptrace.c
index a2ea385..44ab437 100644
--- a/arch/arm/kernel/ptrace.c
+++ b/arch/arm/kernel/ptrace.c
@@ -863,8 +863,6 @@ asmlinkage int syscall_trace(int why, struct pt_regs *regs, int scno)
ip = regs->ARM_ip;
regs->ARM_ip = why;

- current_thread_info()->syscall = scno;
-
/* the 0x80 provides a way for the tracing parent to distinguish
between a syscall stop and SIGTRAP delivery */
ptrace_notify(SIGTRAP | ((current->ptrace & PT_TRACESYSGOOD)
diff --git a/arch/arm/kernel/signal.c b/arch/arm/kernel/signal.c
index e7714f3..f695239 100644
--- a/arch/arm/kernel/signal.c
+++ b/arch/arm/kernel/signal.c
@@ -527,7 +527,7 @@ static inline void setup_syscall_restart(struct pt_regs *regs)
static int
handle_signal(unsigned long sig, struct k_sigaction *ka,
siginfo_t *info, sigset_t *oldset,
- struct pt_regs * regs, int syscall)
+ struct pt_regs *regs)
{
struct thread_info *thread = current_thread_info();
struct task_struct *tsk = current;
@@ -537,7 +537,7 @@ handle_signal(unsigned long sig, struct k_sigaction *ka,
/*
* If we were from a system call, check for system call restarting...
*/
- if (syscall) {
+ if (thread->syscall != -1) {
switch (regs->ARM_r0) {
case -ERESTART_RESTARTBLOCK:
case -ERESTARTNOHAND:
@@ -601,7 +601,7 @@ handle_signal(unsigned long sig, struct k_sigaction *ka,
* the kernel can handle, and then we build all the user-level signal handling
* stack-frames in one go after that.
*/
-static void do_signal(struct pt_regs *regs, int syscall)
+static void do_signal(struct pt_regs *regs)
{
struct k_sigaction ka;
siginfo_t info;
@@ -629,7 +629,7 @@ static void do_signal(struct pt_regs *regs, int syscall)
oldset = &current->saved_sigmask;
else
oldset = &current->blocked;
- if (handle_signal(signr, &ka, &info, oldset, regs, syscall) == 0) {
+ if (handle_signal(signr, &ka, &info, oldset, regs) == 0) {
/*
* A signal was successfully delivered; the saved
* sigmask will have been stored in the signal frame,
@@ -647,7 +647,7 @@ static void do_signal(struct pt_regs *regs, int syscall)
/*
* No signal to deliver to the process - restart the syscall.
*/
- if (syscall) {
+ if (current_thread_info()->syscall != -1) {
if (regs->ARM_r0 == -ERESTART_RESTARTBLOCK) {
if (thumb_mode(regs)) {
regs->ARM_r7 = __NR_restart_syscall - __NR_SYSCALL_BASE;
@@ -689,10 +689,10 @@ static void do_signal(struct pt_regs *regs, int syscall)
}

asmlinkage void
-do_notify_resume(struct pt_regs *regs, unsigned int thread_flags, int syscall)
+do_notify_resume(struct pt_regs *regs, unsigned int thread_flags)
{
if (thread_flags & _TIF_SIGPENDING)
- do_signal(regs, syscall);
+ do_signal(regs);

if (thread_flags & _TIF_NOTIFY_RESUME) {
clear_thread_flag(TIF_NOTIFY_RESUME);
--
1.5.6.5

Serge E. Hallyn

Mar 23, 2010, 12:10:02 PM

In terms of the cr api I don't see any problems. Two nits below,
but in any case

Acked-by: Serge Hallyn <se...@us.ibm.com>

thanks, this is really cool, especially how minimal it is :)
-serge

...

Will load_cpu_regs() ever be changed to return anything but 0? If
not, both functions can be simplified.

...

> +int restore_mm_context(struct ckpt_ctx *ctx, struct mm_struct *mm)
> +{
> + struct ckpt_hdr_mm_context *h;
> + int ret = 0;
> +
> + h = ckpt_read_obj_type(ctx, sizeof(*h), CKPT_HDR_MM_CONTEXT);
> + if (IS_ERR(h))
> + return PTR_ERR(h);
> +
> +#if !CONFIG_MMU
> + mm->context.end_brk = h->end_brk;
> +#endif
> +
> + ckpt_hdr_put(ctx, h);
> + return ret;

Again ret doesn't seem needed here.

-serge

Russell King - ARM Linux

Mar 23, 2010, 8:00:02 PM

On Sun, Mar 21, 2010 at 09:06:05PM -0400, Christoffer Dall wrote:
> The ISA version (given by __LINUX_ARM_ARCH__) is checkpointed and verified
> against the machine architecture on restart.

I think you misunderstand what __LINUX_ARM_ARCH__ signifies. It is the
build architecture for the kernel, and it indicates the lowest
architecture version that the kernel will run on.

That doesn't indicate what ISA version the system is running on, or even
if the ABI is compatible (we have two ABIs - OABI and EABI).

There's also the matter of FP implementation - whether it is VFP or FPA,
and whether iwMMXt is available or not. (iwMMXt precludes the use of
FPA.)
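
One way to read this point is that a restart check would have to compare
what the hardware actually offers rather than a single build-time version
number. The following is a minimal sketch of such a subset test, not
something the patch implements. The DEMO_CAP_* names and their bit values
are invented for illustration and do not correspond to any kernel
definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative capability bits; names and values are made up. */
#define DEMO_CAP_VFP	(1u << 0)
#define DEMO_CAP_FPA	(1u << 1)
#define DEMO_CAP_IWMMXT	(1u << 2)

/*
 * A checkpoint is restorable only if the restoring host offers every
 * capability that was recorded at checkpoint time. Extra capabilities
 * on the host are harmless.
 */
static bool demo_caps_compatible(uint32_t ckpt_caps, uint32_t host_caps)
{
	return (ckpt_caps & ~host_caps) == 0;
}
```

Under this rule a VFP checkpoint restores on a host with VFP and iwMMXt,
while an FPA checkpoint is refused on a VFP-only host.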

> Regarding ThumbEE, the thumbee_state field on the thread_info is stored
> in checkpoints when CONFIG_ARM_THUMBEE and 0 is stored otherwise. If
> a value different than 0 is checkpointed and CONFIG_ARM_THUMBEE is not
> set on the restore system, the restore is aborted. Feedback on this
> implementation is very welcome.

I don't recognise this configuration symbol; it doesn't exist in mainline.

> We checkpoint whether the system is running with CONFIG_MMU or not and
> require the same configuration for the system on which we restore the
> process. It might be possible to allow something more fine-grained,
> if it's worth the energy. Input on this item is also very welcome,
> specifically from someone who knows the exact meaning of the end_brk
> field.

Processes which run on MMU and non-MMU CPUs are unlikely to be
interchangeable - the run time environments are quite different. I
think this is a sane check.

> +/* dump the thread_struct of a given task */
> +int checkpoint_thread(struct ckpt_ctx *ctx, struct task_struct *t)
> +{
> + int ret;
> + struct ckpt_hdr_thread *h;
> + struct thread_info *ti = task_thread_info(t);
> +
> + h = ckpt_hdr_get_type(ctx, sizeof(*h), CKPT_HDR_THREAD);
> + if (!h)
> + return -ENOMEM;
> +
> + /*
> + * Store the syscall information about the checkpointed process
> + * as we need to know if the process was doing a syscall (and which)
> + * during restart.
> + */
> + h->syscall = ti->syscall;
> +
> + /*
> + * Store remaining thread-specific info.
> + */
> + h->tp_value = ti->tp_value;

How do you safely obtain consistent information from a thread? Do you
temporarily stop it?

Russell King - ARM Linux

Mar 23, 2010, 8:00:03 PM

On Sun, Mar 21, 2010 at 09:06:03PM -0400, Christoffer Dall wrote:
> This small commit introduces a global state of system calls for ARM
> making it possible for a debugger or checkpointing to gain information
> about another process' state with respect to system calls.

I don't particularly like the idea that we always store the syscall
number to memory for every system call, whether the stored version is
used or not.

Since ARM caches are generally not write allocate, this means mostly
write-only variables can have a higher than expected expense.

Is there not some thread flag which can be checked to see if we need to
store the syscall number?
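
The suggestion above, gating the store on a thread flag, can be sketched in
C as follows. This is a userspace model only: the real entry path is
assembly, and DEMO_TIF_RECORD_SYSCALL and struct demo_thread_info are
hypothetical names, not existing kernel symbols. The quoted entry code
already tests _TIF_SYSCALL_TRACE at that point, so the flags word is
available there.

```c
#include <stdint.h>

/* Hypothetical flag: set only for tasks whose syscall state is wanted. */
#define DEMO_TIF_RECORD_SYSCALL	(1u << 0)

struct demo_thread_info {
	uint32_t flags;
	int syscall;	/* -1 when no syscall number is recorded */
};

/*
 * Model of the syscall entry path: the store to memory happens only
 * when the flag is set, so ordinary tasks pay for a flag test but not
 * for a write to a mostly write-only variable.
 */
static void demo_syscall_entry(struct demo_thread_info *ti, int scno)
{
	if (ti->flags & DEMO_TIF_RECORD_SYSCALL)
		ti->syscall = scno;
}
```

The trade-off is that the flag must be set before the task enters the
syscall of interest, which is the difficulty discussed in the replies.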

Matt Helsley

Mar 23, 2010, 10:00:03 PM

On Tue, Mar 23, 2010 at 09:18:43PM +0000, Russell King - ARM Linux wrote:

<snip> (sorry -- I'm not familiar with ARM so I can't respond to those)

> > +/* dump the thread_struct of a given task */
> > +int checkpoint_thread(struct ckpt_ctx *ctx, struct task_struct *t)
> > +{
> > + int ret;
> > + struct ckpt_hdr_thread *h;
> > + struct thread_info *ti = task_thread_info(t);
> > +
> > + h = ckpt_hdr_get_type(ctx, sizeof(*h), CKPT_HDR_THREAD);
> > + if (!h)
> > + return -ENOMEM;
> > +
> > + /*
> > + * Store the syscall information about the checkpointed process
> > + * as we need to know if the process was doing a syscall (and which)
> > + * during restart.
> > + */
> > + h->syscall = ti->syscall;
> > +
> > + /*
> > + * Store remaining thread-specific info.
> > + */
> > + h->tp_value = ti->tp_value;
>
> How do you safely obtain consistent information from a thread? Do you
> temporarily stop it?

It must be frozen with the cgroup freezer (which reuses the suspend freezer).
sys_checkpoint moves the cgroup freezer into the CHECKPOINTING state which
prevents tasks in that group from being thawed until just before checkpoint
returns.

Cheers,
-Matt Helsley

Matt Helsley

Mar 23, 2010, 10:10:01 PM

On Tue, Mar 23, 2010 at 08:53:42PM +0000, Russell King - ARM Linux wrote:
> On Sun, Mar 21, 2010 at 09:06:03PM -0400, Christoffer Dall wrote:
> > This small commit introduces a global state of system calls for ARM
> > making it possible for a debugger or checkpointing to gain information
> > about another process' state with respect to system calls.
>
> I don't particularly like the idea that we always store the syscall
> number to memory for every system call, whether the stored version is
> used or not.
>
> Since ARM caches are generally not write allocate, this means mostly
> write-only variables can have a higher than expected expense.
>
> Is there not some thread flag which can be checked to see if we need to
> store the syscall number?

Perhaps before we freeze the task we can save the syscall number on ARM.
The patches suggest that the signal delivery path -- which the freezer
utilizes -- has the syscall number already.

Should work since the threads must be frozen first anyway.

Cheers,
-Matt Helsley

Oren Laadan

Mar 24, 2010, 1:00:02 AM

On Tue, 23 Mar 2010, Matt Helsley wrote:

> On Tue, Mar 23, 2010 at 08:53:42PM +0000, Russell King - ARM Linux wrote:
> > On Sun, Mar 21, 2010 at 09:06:03PM -0400, Christoffer Dall wrote:
> > > This small commit introduces a global state of system calls for ARM
> > > making it possible for a debugger or checkpointing to gain information
> > > about another process' state with respect to system calls.
> >
> > I don't particularly like the idea that we always store the syscall
> > number to memory for every system call, whether the stored version is
> > used or not.
> >
> > Since ARM caches are generally not write allocate, this means mostly
> > write-only variables can have a higher than expected expense.
> >
> > Is there not some thread flag which can be checked to see if we need to
> > store the syscall number?
>
> Perhaps before we freeze the task we can save the syscall number on ARM.
> The patches suggest that the signal delivery path -- which the freezer
> utilizes -- has the syscall number already.
>
> Should work since the threads must be frozen first anyway.

I like the idea.

However, would it also work for those cases when the freezing does not
occur from the signal delivery path - e.g. for vfork and ptraced tasks ?

Oren.

Matt Helsley

Mar 24, 2010, 10:50:01 AM

On Wed, Mar 24, 2010 at 12:57:46AM -0400, Oren Laadan wrote:
> On Tue, 23 Mar 2010, Matt Helsley wrote:
>
> > On Tue, Mar 23, 2010 at 08:53:42PM +0000, Russell King - ARM Linux wrote:
> > > On Sun, Mar 21, 2010 at 09:06:03PM -0400, Christoffer Dall wrote:
> > > > This small commit introduces a global state of system calls for ARM
> > > > making it possible for a debugger or checkpointing to gain information
> > > > about another process' state with respect to system calls.
> > >
> > > I don't particularly like the idea that we always store the syscall
> > > number to memory for every system call, whether the stored version is
> > > used or not.
> > >
> > > Since ARM caches are generally not write allocate, this means mostly
> > > write-only variables can have a higher than expected expense.
> > >
> > > Is there not some thread flag which can be checked to see if we need to
> > > store the syscall number?
> >
> > Perhaps before we freeze the task we can save the syscall number on ARM.
> > The patches suggest that the signal delivery path -- which the freezer
> > utilizes -- has the syscall number already.
> >
> > Should work since the threads must be frozen first anyway.
>
> I like the idea.
>
> However, would it also work for those cases when the freezing does not
> occur from the signal delivery path - e.g. for vfork and ptraced tasks ?

We could just as easily set it before the vfork uninterruptible completion.
I don't know about ptracing, though.

Cheers,
-Matt Helsley

Oren Laadan

Mar 24, 2010, 12:00:04 PM

Matt Helsley wrote:
> On Wed, Mar 24, 2010 at 12:57:46AM -0400, Oren Laadan wrote:
>> On Tue, 23 Mar 2010, Matt Helsley wrote:
>>
>>> On Tue, Mar 23, 2010 at 08:53:42PM +0000, Russell King - ARM Linux wrote:
>>>> On Sun, Mar 21, 2010 at 09:06:03PM -0400, Christoffer Dall wrote:
>>>>> This small commit introduces a global state of system calls for ARM
>>>>> making it possible for a debugger or checkpointing to gain information
>>>>> about another process' state with respect to system calls.
>>>> I don't particularly like the idea that we always store the syscall
>>>> number to memory for every system call, whether the stored version is
>>>> used or not.
>>>>
>>>> Since ARM caches are generally not write allocate, this means mostly
>>>> write-only variables can have a higher than expected expense.
>>>>
>>>> Is there not some thread flag which can be checked to see if we need to
>>>> store the syscall number?
>>> Perhaps before we freeze the task we can save the syscall number on ARM.
>>> The patches suggest that the signal delivery path -- which the freezer
>>> utilizes -- has the syscall number already.

Actually, the signal path doesn't have the syscall number, it has
a binary "in syscall" value.

>>>
>>> Should work since the threads must be frozen first anyway.
>> I like the idea.
>>
>> However, would it also work for those cases when the freezing does not
>> occur from the signal delivery path - e.g. for vfork and ptraced tasks ?
>
> We could just as easily set it before the vfork uninterruptible completion.
> ptracing I don't know about though.
>

vfork() uses freezer_do_not_count() to tell the freezer that it's
effectively frozen. It's also used by drivers/char/apm-emulation.c

Looking at calls to ptrace_notify(), ptrace_stop() and ptrace_event(),
there are several places where a ptraced task can stop with TASK_TRACED
(which is good enough for the freezer), outside the signal handling
path.

This means that recording the syscall number for all these cases is
going to be tedious and intrusive.

I prefer to somehow figure out the syscall from the task's state or
pt_regs, or by (re)using the same assembly code that already does that.

Oren.

Christoffer Dall

Mar 24, 2010, 3:40:02 PM

On Wed, Mar 24, 2010 at 4:53 PM, Oren Laadan <or...@cs.columbia.edu> wrote:
>
>
> Matt Helsley wrote:
>>
>> On Wed, Mar 24, 2010 at 12:57:46AM -0400, Oren Laadan wrote:
>>>
>>> On Tue, 23 Mar 2010, Matt Helsley wrote:
>>>
>>>> On Tue, Mar 23, 2010 at 08:53:42PM +0000, Russell King - ARM Linux
>>>> wrote:
>>>>>
>>>>> On Sun, Mar 21, 2010 at 09:06:03PM -0400, Christoffer Dall wrote:
>>>>>>
>>>>>> This small commit introduces a global state of system calls for ARM
>>>>>> making it possible for a debugger or checkpointing to gain information
>>>>>> about another process' state with respect to system calls.
>>>>>
>>>>> I don't particularly like the idea that we always store the syscall
>>>>> number to memory for every system call, whether the stored version is
>>>>> used or not.
>>>>>
>>>>> Since ARM caches are generally not write allocate, this means mostly
>>>>> write-only variables can have a higher than expected expense.
>>>>>
>>>>> Is there not some thread flag which can be checked to see if we need to
>>>>> store the syscall number?
>>>>
>>>> Perhaps before we freeze the task we can save the syscall number on ARM.
>>>> The patches suggest that the signal delivery path -- which the freezer
>>>> utilizes -- has the syscall number already.
>
> Actually, the signal path doesn't have the syscall number, it has
> a binary "in syscall" value.
>

Well, this could be changed to pass the syscall number through
registers along to try_to_freeze without any noticeable performance
hit.

>>>>
>>>> Should work since the threads must be frozen first anyway.
>>>
>>> I like the idea.
>>>
>>> However, would it also work for those cases when the freezing does not
>>> occur from the signal delivery path - e.g. for vfork and ptraced tasks ?
>>
>> We could just as easily set it before the vfork uninterruptible
>> completion.
>> ptracing I don't know about though.
>>
>
> vfork() uses freezer_do_not_count() to tell the freezer that it's
> effectively frozen. It's also used by drivers/char/apm-emulation.c
>
> Looking at calls to ptrace_notify(), ptrace_stop() and ptrace_event(),
> there are several places where a ptraced task can stop with TASK_TRACED
> (which is good enough for the freezer), outside the signal handling
> path.
>
> This means that recording the syscall number for all these cases is
> going to be tedious and intrusive.
>
> I prefer to somehow figure out the syscall from the task's state or
> pt_regs, or by (re)using the same assembly code that already does that.

Re-using the assembly code or factoring it out so that it can be used
from multiple places doesn't seem very pleasing to me, as the assembly
code is in the critical path and written specifically for the context
of a process entering the kernel. Please correct me if I'm wrong.

I imagine simply a function in C, more or less re-implementing the
logic that's already in entry-common.S, might do the trick. I wouldn't
worry much about the performance in this case as it will not be used
often. The following _untested_ snippet illustrates my idea:

---
arch/arm/include/asm/syscall.h | 93 +++++++++++++++++++++++++++++++++++++++-
1 files changed, 92 insertions(+), 1 deletions(-)

diff --git a/arch/arm/include/asm/syscall.h b/arch/arm/include/asm/syscall.h
index 3b3248f..a7f2615 100644
--- a/arch/arm/include/asm/syscall.h
+++ b/arch/arm/include/asm/syscall.h
@@ -10,10 +10,101 @@
 #ifndef _ASM_ARM_SYSCALLS_H
 #define _ASM_ARM_SYSCALLS_H

+static inline int get_swi_instruction(struct task_struct *task,
+				      struct pt_regs *regs,
+				      unsigned long *instr)
+{
+	struct page *page = NULL;
+	unsigned long instr_addr;
+	unsigned long *ptr;
+	int ret;
+
+	instr_addr = regs->ARM_pc - 4;
+
+	down_read(&task->mm->mmap_sem);
+	ret = get_user_pages(task, task->mm, instr_addr,
+			     1, 0, 0, &page, NULL);
+	up_read(&task->mm->mmap_sem);
+
+	if (ret < 0)
+		return ret;
+
+	ptr = (unsigned long *)kmap_atomic(page, KM_USER1);
+	memcpy(instr,
+	       ptr + (instr_addr >> PAGE_SHIFT),
+	       sizeof(unsigned long));
+	kunmap_atomic(ptr, KM_USER1);
+
+	page_cache_release(page);
+
+	return 0;
+}
+
+static inline int __syscall_get_nr(struct task_struct *task,
+				   struct pt_regs *regs)
+{
+	int ret;
+	int scno;
+	unsigned long instr;
+	bool config_oabi = false;
+	bool config_aeabi = false;
+	bool config_arm_thumb = false;
+	bool config_cpu_endian_be8 = false;
+
+#ifdef CONFIG_OABI_COMPAT
+	config_oabi = true;
+#endif
+#ifdef CONFIG_AEABI
+	config_aeabi = true;
+#endif
+#ifdef CONFIG_ARM_THUMB
+	config_arm_thumb = true;
+#endif
+#ifdef CONFIG_CPU_ENDIAN_BE8
+	config_cpu_endian_be8 = true;
+#endif
+#ifdef CONFIG_CPU_ARM710
+	return -1;
+#endif
+
+	if (config_aeabi && !config_oabi) {
+		/* Pure EABI */
+		return regs->ARM_r7;
+	} else if (config_oabi) {
+		if (config_arm_thumb && (regs->ARM_cpsr & PSR_T_BIT))
+			return -1;
+
+		ret = get_swi_instruction(task, regs, &instr);
+		if (ret < 0)
+			return -1;
+
+		if (config_cpu_endian_be8)
+			asm ("rev %[out], %[in]": [out] "=r" (instr):
+				: [in] "r" (instr));
+
+		if ((instr & 0x00ffffff) == 0)
+			return regs->ARM_r7; /* EABI call */
+		else
+			return (instr & 0x00ffffff) | __NR_OABI_SYSCALL_BASE;
+	} else {
+		/* Legacy ABI only */
+		if (config_arm_thumb && (regs->ARM_cpsr & PSR_T_BIT)) {
+			/* Thumb mode ABI */
+			scno = regs->ARM_r7 + __NR_SYSCALL_BASE;
+		} else {
+			ret = get_swi_instruction(task, regs, &instr);
+			if (ret < 0)
+				return -1;
+			scno = instr;
+		}
+		return scno & 0x00ffffff;
+	}
+}
+
 static inline int syscall_get_nr(struct task_struct *task,
 				 struct pt_regs *regs)
 {
-	return (int)(task_thread_info(task)->syscall);
+	return __syscall_get_nr(task, regs);
 }

 static inline long syscall_get_return_value(struct task_struct *task,

--
1.5.6.5

Christoffer Dall

Mar 24, 2010, 3:50:02 PM

thanks

you're right. I will put load_cpu_regs() inline in restore_cpu.

> ...
>
>> +int restore_mm_context(struct ckpt_ctx *ctx, struct mm_struct *mm)
>> +{
>> + struct ckpt_hdr_mm_context *h;
>> + int ret = 0;
>> +
>> + h = ckpt_read_obj_type(ctx, sizeof(*h), CKPT_HDR_MM_CONTEXT);
>> + if (IS_ERR(h))
>> + return PTR_ERR(h);
>> +
>> +#if !CONFIG_MMU
>> + mm->context.end_brk = h->end_brk;
>> +#endif
>> +
>> + ckpt_hdr_put(ctx, h);
>> + return ret;
>
> Again ret doesn't seem needed here.

indeed it doesn't.

-Christoffer

Christoffer Dall

Mar 24, 2010, 4:50:02 PM

On Tue, Mar 23, 2010 at 10:18 PM, Russell King - ARM Linux
<li...@arm.linux.org.uk> wrote:
> On Sun, Mar 21, 2010 at 09:06:05PM -0400, Christoffer Dall wrote:
>> The ISA version (given by __LINUX_ARM_ARCH__) is checkpointed and verified
>> against the machine architecture on restart.
>
> I think you misunderstand what __LINUX_ARM_ARCH__ signifies. It is the
> build architecture for the kernel, and it indicates the lowest
> architecture version that the kernel will run on.

Yes, clearly I didn't understand this fully. So is it in fact possible
to compile the kernel with __LINUX_ARM_ARCH__=6 and have
CONFIG_CPU_32v7? Or is it a matter of running a v6 kernel with
CONFIG_CPU_32v6 on a newer architecture?

What I would like is to make sure that the restarted process will in
fact be able to run. What is the best way to ensure this with regard
to the architecture version?

>
> That doesn't indicate what ISA version the system is running on, or even
> if the ABI is compatible (we have two ABIs - OABI and EABI).

That's why I checkpointed CONFIG_OABI_COMPAT, but I realize that it's
not sufficient.

How about checkpointing CONFIG_AEABI and CONFIG_OABI_COMPAT and making
sure that we either restore to the same setting of the two or restore
to CONFIG_OABI_COMPAT=y?

>
> There's also the matter of FP implementation - whether it is VFP or FPA,
> and whether iwMMXt is available or not. (iwMMXt precludes the use of
> FPA.)

I had a feeling this would be an issue, but I never dove into the
workings of FP on ARM. Can you give me some concrete pointers as to
what to checkpoint, and what to restore or check on restart, so that a
process using FP can be restarted?

>
>> Regarding ThumbEE, the thumbee_state field on the thread_info is stored
>> in checkpoints when CONFIG_ARM_THUMBEE and 0 is stored otherwise. If
>> a value different than 0 is checkpointed and CONFIG_ARM_THUMBEE is not
>> set on the restore system, the restore is aborted. Feedback on this
>> implementation is very welcome.
>
> I don't recognise this configuration symbol; it doesn't exist in mainline.
>

I encountered it when looking at struct thread_info in
arch/arm/include/asm/thread_info.h, and had not seen it before. After
looking into it a little more, it's included in 2.6.33 and defined in
arch/arm/mm/Kconfig.

>> We checkpoint whether the system is running with CONFIG_MMU or not and
>> require the same configuration for the system on which we restore the
>> process. It might be possible to allow something more fine-grained,
>> if it's worth the energy. Input on this item is also very welcome,
>> specifically from someone who knows the exact meaning of the end_brk
>> field.
>
> Processes which run on MMU and non-MMU CPUs are unlikely to be
> interchangeable - the run time environments are quite different. I
> think this is a sane check.
>

thanks.

Matt Helsley

Mar 24, 2010, 9:20:01 PM

On Wed, Mar 24, 2010 at 08:36:39PM +0100, Christoffer Dall wrote:
> On Wed, Mar 24, 2010 at 4:53 PM, Oren Laadan <or...@cs.columbia.edu> wrote:
> >
> >
> > Matt Helsley wrote:
> >>
> >> On Wed, Mar 24, 2010 at 12:57:46AM -0400, Oren Laadan wrote:
> >>>
> >>> On Tue, 23 Mar 2010, Matt Helsley wrote:
> >>>
> >>>> On Tue, Mar 23, 2010 at 08:53:42PM +0000, Russell King - ARM Linux
> >>>> wrote:
> >>>>>
> >>>>> On Sun, Mar 21, 2010 at 09:06:03PM -0400, Christoffer Dall wrote:
> >>>>>>
> >>>>>> This small commit introduces a global state of system calls for ARM
> >>>>>> making it possible for a debugger or checkpointing to gain information
> >>>>>> about another process' state with respect to system calls.
> >>>>>
> >>>>> I don't particularly like the idea that we always store the syscall
> >>>>> number to memory for every system call, whether the stored version is
> >>>>> used or not.
> >>>>>
> >>>>> Since ARM caches are generally not write allocate, this means mostly
> >>>>> write-only variables can have a higher than expected expense.
> >>>>>
> >>>>> Is there not some thread flag which can be checked to see if we need to
> >>>>> store the syscall number?
> >>>>
> >>>> Perhaps before we freeze the task we can save the syscall number on ARM.
> >>>> The patches suggest that the signal delivery path -- which the freezer
> >>>> utilizes -- has the syscall number already.
> >
> > Actually, the signal path doesn't have the syscall number, it has
> > a binary "in syscall" value.
> >

Argh. I read too much into the name :(.

>
> Well, this could be changed to pass the syscall number through
> registers along to try_to_freeze without any noticeable performance
> hit.

Yes, that's possible. I was thinking we could still use your thread info
field but only store to it when we know it will be useful for c/r rather
than for each syscall. Personally, I'd rather avoid passing the extra
parameter into try_to_freeze(). Your idea below seems better to me.

> + memcpy(instr,
> + ptr + (instr_addr >> PAGE_SHIFT),

^shouldn't this be:
instr_addr & PAGE_MASK

> + sizeof(unsigned long));
> + kunmap_atomic(ptr, KM_USER1);
> +
> + page_cache_release(page);
> +
> + return 0;
> +}

(again, not familiar with ARM so my understanding is:

I guess swi is "syscall word immediate".

The syscall nr is embedded in the instruction as an immediate
value and you're getting a copy of that instruction using the value of
the pc register just after the syscall instruction was executed.)
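For reference, SWI is the ARM "software interrupt" instruction (called SVC in later architecture manuals), and its low 24 bits carry an immediate. The decoding step discussed here can be sketched in plain user-space C. This is only an illustration under stated assumptions: the `SKETCH_*` constants and `sketch_swi_to_syscall_nr()` are hypothetical names, the OABI base value of 0x900000 is assumed to match `__NR_OABI_SYSCALL_BASE`, and stripping the base with XOR is one possible convention rather than what the patch above does.

```c
#include <stdint.h>

/* Assumption: the SWI immediate occupies the low 24 bits of the instruction. */
#define SKETCH_SWI_IMM_MASK      0x00ffffffu
/* Assumption: OABI syscalls are encoded as base + number in the immediate. */
#define SKETCH_OABI_SYSCALL_BASE 0x00900000u

/*
 * Derive a syscall number from the SWI instruction word and the saved r7.
 * A zero immediate is treated as an EABI call, where the number lives in r7.
 * Otherwise the immediate is treated as an OABI call and the base is removed.
 */
static long sketch_swi_to_syscall_nr(uint32_t instr, uint32_t r7)
{
	uint32_t imm = instr & SKETCH_SWI_IMM_MASK;

	if (imm == 0)
		return (long)r7;			/* EABI: number in r7 */

	return (long)(imm ^ SKETCH_OABI_SYSCALL_BASE);	/* OABI: strip the base */
}
```

With these assumptions, an OABI `swi 0x900004` word (0xef900004) and an EABI `swi 0` word (0xef000000) with r7 set to 4 both decode to syscall number 4.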

Perhaps I am missing or forgetting something. Why isn't this as simple
as calling get_user() or even copy_from_user() using instr_addr?

Cheers,
-Matt Helsley

Matt Helsley

Mar 24, 2010, 9:20:01 PM

On Wed, Mar 24, 2010 at 06:11:32PM -0700, Matt Helsley wrote:
> On Wed, Mar 24, 2010 at 08:36:39PM +0100, Christoffer Dall wrote:

<snip>

Oops, made my own mistake. I think the address of the kmap'd instruction
would be:

ptr + (instr_addr & ~PAGE_MASK)
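To make the offset arithmetic in this exchange concrete, here is a minimal user-space sketch, not kernel code. It assumes 4 KiB pages, and the `SKETCH_*` macros and `sketch_*` functions are hypothetical names made up for illustration. The kernel's PAGE_MASK is `~(PAGE_SIZE - 1)`, so the byte offset inside a page is `addr & ~PAGE_MASK`. The sketch also uses a byte pointer for the mapped page, since adding a byte offset to an `unsigned long *` would be scaled by `sizeof(unsigned long)`.

```c
#include <stdint.h>
#include <string.h>

/* Assumption: 4 KiB pages. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  (1UL << SKETCH_PAGE_SHIFT)
/* Same sense as the kernel's PAGE_MASK: keeps the page-number bits. */
#define SKETCH_PAGE_MASK  (~(SKETCH_PAGE_SIZE - 1))

/* Byte offset of addr within its page. */
static unsigned long sketch_page_offset(unsigned long addr)
{
	return addr & ~SKETCH_PAGE_MASK;
}

/*
 * Copy one 32-bit instruction out of a mapped page. `page` stands in for
 * the pointer a mapping call such as kmap_atomic() would return. It is a
 * byte pointer so that the offset is applied in bytes.
 */
static uint32_t sketch_read_instr(const unsigned char *page,
				  unsigned long instr_addr)
{
	uint32_t instr;

	memcpy(&instr, page + sketch_page_offset(instr_addr), sizeof(instr));
	return instr;
}
```

For an instruction at address 0x8abc, the offset into the mapped page is 0xabc, whereas shifting by PAGE_SHIFT would yield the page number (8) instead.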

Oren Laadan

Mar 24, 2010, 9:40:02 PM

In c/r, we only need it at restart when a task calls it on itself.

However, the syscall_get_nr() interface itself can be called by
any task on another task.

(In fact, I think that for the most part, saving the syscall number
at checkpoint time may be better than figuring out at restart time).

Oren.

Christoffer Dall

Mar 25, 2010, 6:30:02 AM

Yes. Thanks for pointing it out.

Christoffer Dall

Mar 25, 2010, 6:40:01 AM

So, as Oren is saying, the point was to make the syscall_get_nr(..)
work according to the interface specified in
include/asm-generic/syscall.h.

Considering it's unknown how we will deal with checkpoint/restart
across CONFIG_ARM_THUMB, CONFIG_OABI_COMPAT etc., I also think it's a
better idea to save the syscall number at checkpoint time and, for the
restore, to add architecture-specific hooks that return the syscall
number instead of calling syscall_get_nr(...) directly. That way we
should always be able to get the syscall and restart correctly,
independently of whatever tricks we use to checkpoint/restart across
configuration settings - if any.

Best,
Christoffer

Jamie Lokier

Mar 25, 2010, 10:50:01 PM

Christoffer Dall wrote:
> > That doesn't indicate what ISA version the system is running on, or even
> > if the ABI is compatible (we have two ABIs - OABI and EABI).
>
> That's why I checkpointed CONFIG_OABI_COMPAT, but I realize that it's
> not sufficient.
>
> How about checkpointing CONFIG_AEABI and CONFIG_OABI_COMPAT and making
> sure that we either restore to the same setting of the two or restore
> to CONFIG_OABI_COMPAT=y?

With CONFIG_OABI_COMPAT enabled, each process can be in either
personality: OABI or EABI. Checkpointing will need to remember which
one.

With CONFIG_OABI_COMPAT disabled, it'll be fixed at one or the other,
but there's no reason why a process should not be moved between
kernels with different values of CONFIG_OABI_COMPAT, so long as the
OABI or EABI personality is supported by the destination kernel.

In other words, CONFIG_OABI_COMPAT shouldn't be in the checkpoint
state at all - only the per-process personalities should be.
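The rule described above can be sketched as a small compatibility check. This is only an illustration: the `sketch_*` types and function are hypothetical and do not correspond to any existing checkpoint/restart code. The assumption is that the checkpoint image records each task's ABI personality and that the destination kernel knows which ABIs it can run.

```c
#include <stdbool.h>

/* Hypothetical per-task ABI personality recorded in the checkpoint image. */
enum sketch_abi {
	SKETCH_ABI_OABI,
	SKETCH_ABI_EABI,
};

/* Hypothetical description of what the destination kernel can run. */
struct sketch_kernel_caps {
	bool supports_oabi;
	bool supports_eabi;
};

/*
 * A task may be restarted if the destination kernel supports the task's
 * own personality, regardless of how the source kernel was configured.
 */
static bool sketch_can_restart(enum sketch_abi task_abi,
			       const struct sketch_kernel_caps *dst)
{
	switch (task_abi) {
	case SKETCH_ABI_OABI:
		return dst->supports_oabi;
	case SKETCH_ABI_EABI:
		return dst->supports_eabi;
	}
	return false;
}
```

Under this sketch, an EABI task checkpointed on a kernel with CONFIG_OABI_COMPAT enabled could still be restarted on a pure EABI kernel, while an OABI task could not.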

> >> We checkpoint whether the system is running with CONFIG_MMU or not and
> >> require the same configuration for the system on which we restore the
> >> process. It might be possible to allow something more fine-grained,
> >> if it's worth the energy. Input on this item is also very welcome,
> >> specifically from someone who knows the exact meaning of the end_brk
> >> field.
> >
> > Processes which run on MMU and non-MMU CPUs are unlikely to be
> > interchangeable - the run time environments are quite different. I
> > think this is a sane check.
> >
> thanks.

It's possible in principle to run many non-MMU binaries on MMU
kernels, but I've never heard of anyone doing it.

-- Jamie

Paul Mundt

Mar 25, 2010, 11:10:02 PM

On Fri, Mar 26, 2010 at 02:47:59AM +0000, Jamie Lokier wrote:
> Christoffer Dall wrote:
> > >> We checkpoint whether the system is running with CONFIG_MMU or not and
> > >> require the same configuration for the system on which we restore the
> > >> process. It might be possible to allow something more fine-grained,
> > >> if it's worth the energy. Input on this item is also very welcome,
> > >> specifically from someone who knows the exact meaning of the end_brk
> > >> field.
> > >
> > > Processes which run on MMU and non-MMU CPUs are unlikely to be
> > > interchangeable - the run time environments are quite different. I
> > > think this is a sane check.
> > >
> > thanks.
>
> It's possible in principle to run many non-MMU binaries on MMU
> kernels, but I've never heard of anyone doing it.
>

FDPIC supports running the same binaries with or without MMU depending on
your ABI, it's not really that uncommon, even if it's mostly just used
for prototyping.

Jamie Lokier

Mar 26, 2010, 12:00:01 AM

Paul Mundt wrote:
> FDPIC supports running the same binaries with or without MMU depending on
> your ABI, it's not really that uncommon, even if it's mostly just used
> for prototyping.

Thanks - I didn't know anyone actually did it :-)

But I can see the value for product rollouts of some binaries onto a
mixture of hardware, too.

bFLT (flat) binaries should be runnable with an MMU too.

-- Jamie

Christoffer Dall

Mar 28, 2010, 7:00:02 PM

On Fri, Mar 26, 2010 at 5:02 AM, Paul Mundt <let...@linux-sh.org> wrote:
> On Fri, Mar 26, 2010 at 02:47:59AM +0000, Jamie Lokier wrote:
>>
>> It's possible in principle to run many non-MMU binaries on MMU
>> kernels, but I've never heard of anyone doing it.
>>
> FDPIC supports running the same binaries with or without MMU depending on
> your ABI, it's not really that uncommon, even if it's mostly just used
> for prototyping.
>

I would imagine that a restart will most likely fail anyway when
restoring an MMU process on a non-MMU kernel. However, as you suggest,
the other way around should be possible. Thanks for clearing that up.

Specifically, do you know the meaning of the end_brk field on the
mm_context_t struct, and whether I need to checkpoint and restore it
for non-MMU systems (and potentially do something more clever during
restart on an MMU kernel)?
