[PATCH 0/4] coredump: core dump masking support v2


Kawai, Hidehiro

Jan 26, 2007, 9:10:06 AM
Hi,

This patch series is version 2 of the core dump masking feature,
which enables you to specify the memory segment types you don't
want to dump into a core file.

In this version, the setting that selects which memory segment types
are dumped is stored as a bit field placed next to the `dumpable'
bit field in mm_struct. Concurrent writes to these two bit fields can
race, so I use a global spin lock to protect them from write-write
races.
For security reasons, I have added a sysctl parameter to
enable/disable this feature.

This patch series can be applied against 2.6.20-rc4-mm1.
The supported core file formats are ELF and ELF-FDPIC. ELF has been
tested, but ELF-FDPIC has not been built or tested because I don't
have a test environment for it.


Description:
You can specify memory segment types you don't want to dump via
/proc/<pid>/core_flags file, which is provided per process.
This file represents a set of flags, but currently, only bit 0 is
available. If bit 0 is set, the kernel core dump routine doesn't
dump anonymous shared memory segments, which includes IPC shared
memory and some of mmap(2)'ed memory.

The system administrator can enable/disable these flags one by one via
the /proc/sys/kernel/core_flags_enable file. The default value is 1,
which means that bit 0 in core_flags is effective.
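
The interaction between the per-process flags and the sysctl mask can be modelled in a few lines of userspace C. This is only a sketch of the decision for shared segments as the patches describe it: `struct segment` and both function names are hypothetical stand-ins, and the real maydump() applies further checks to private mappings that are not modelled here.

```c
#include <stdbool.h>

/* Bit 0 of core_flags, named as in the patch (include/linux/binfmts.h). */
#define CORE_OMIT_ANON_SHARED 0x1u

/* Hypothetical, simplified stand-in for the fields checked on a shared vma. */
struct segment {
    bool io_or_reserved;  /* models VM_IO | VM_RESERVED */
    unsigned int nlink;   /* link count of the backing inode */
};

/* A per-process flag only takes effect if the sysctl enables the same bit. */
static unsigned int effective_core_flags(unsigned int core_flags,
                                         unsigned int enable)
{
    return core_flags & enable;
}

/* Decision for a shared (VM_SHARED) segment; returns true if it is dumped. */
static bool may_dump_shared(const struct segment *seg, unsigned int flags)
{
    if (seg->io_or_reserved)
        return false;  /* I/O and special mappings are never dumped */
    if (seg->nlink != 0)
        return false;  /* backed by a still-linked file: not anonymous */
    return (flags & CORE_OMIT_ANON_SHARED) == 0;
}
```

With core_flags_enable set to 0, effective_core_flags() returns 0 whatever the process requested, so anonymous shared segments are dumped as before.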


Background:
Some software programs share a huge amount of memory among hundreds
of processes. If one of these processes fails, a monitoring process
may signal all of them to generate core files and restart the
service. However, this can develop into a system-wide failure, such
as a prolonged system slowdown or a disk space shortage, because the
total size of the core files is enormous.

To avoid this situation, we can limit the core file size with
setrlimit(2) or ulimit(1). But this method can lose important data
such as the stack, because core dumping is terminated halfway.
So I suggest keeping shared memory segments from being dumped for
particular processes. Because the shared memory attached to these
processes is common to all of them, we don't need to dump it every time.
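
For comparison, the size-limit approach mentioned above looks roughly like this from inside a process. It is a minimal sketch: limit_core_size() is a hypothetical helper name, while getrlimit(2), setrlimit(2) and RLIMIT_CORE are the standard interfaces. The kernel stops writing the core file once the limit is reached, which is how data near the end of the dump can be lost.

```c
#include <sys/resource.h>

/* Lower the soft core-file size limit to max_bytes; returns 0 on success. */
static int limit_core_size(rlim_t max_bytes)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;

    /* The soft limit must not exceed the hard limit. */
    if (rl.rlim_max != RLIM_INFINITY && max_bytes > rl.rlim_max)
        max_bytes = rl.rlim_max;

    rl.rlim_cur = max_bytes;
    return setrlimit(RLIMIT_CORE, &rl);
}
```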


Usage:
If you don't want to dump any of the shared memory segments attached
to pid 1234, set bit 0 of the process's core_flags to 1:

$ echo 1 > /proc/1234/core_flags

Additionally, you can check its hexadecimal value by reading the file:

$ cat /proc/1234/core_flags
00000001

When a new process is created, the process inherits the core_flags
setting from its parent. It is useful to set the core_flags before
the program runs. For example:

$ echo 1 > /proc/self/core_flags
$ ./some_program
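
A launcher written in C could do the same thing before calling exec. The following is a sketch that assumes a kernel with this patch series applied; write_core_flags() is a hypothetical helper, and the path is passed in so that the function can be tried against an ordinary file.

```c
#include <stdio.h>

/* Write a flags value to a core_flags-style file; returns 0 on success. */
static int write_core_flags(const char *path, unsigned int flags)
{
    FILE *f = fopen(path, "w");
    int ok;

    if (!f)
        return -1;

    /* Decimal with a trailing newline, like `echo 1 > file`. */
    ok = fprintf(f, "%u\n", flags) > 0;
    if (fclose(f) != 0)
        ok = 0;

    return ok ? 0 : -1;
}
```

A launcher would call write_core_flags("/proc/self/core_flags", 1) and then exec the real program, relying on the inheritance described above.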


ChangeLog:
v2:
- rename `coremask' to `core_flags'
- change `core_flags' member in mm_struct to a bit field
next to `dumpable'
- introduce a global spin lock to protect adjacent two bit fields
(core_flags and dumpable) from race condition
- fix a bug that the generated core file can be corrupted when
core dumping and updating core_flags occur concurrently
- add kernel.core_flags_enable sysctl parameter to enable/disable
flags in /proc/<pid>/core_flags
- support ELF-FDPIC binary format, but not tested

v1:
http://lkml.org/lkml/2006/12/13/17

--
Hidehiro Kawai
Hitachi, Ltd., Systems Development Laboratory


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Kawai, Hidehiro

Jan 26, 2007, 9:20:09 AM
This patch adds an interface to specify which memory segment types
should or should not be dumped.

The /proc/<pid>/core_flags file is provided as the interface.
You can read or change the setting (which memory segment types are
dumped) for a particular process by reading from or writing to
the file.
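
The text format of the file can be tried out in userspace. In the sketch below, format_core_flags() and parse_core_flags() are hypothetical names; they mirror the "%08x\n" format string and the base-0 number parsing used by the handlers in this patch, with strtoul(3) standing in for the kernel's simple_strtoul().

```c
#include <stdio.h>
#include <stdlib.h>

/* Format flags the way the read handler does: 8 hex digits and a newline. */
static int format_core_flags(char *buf, size_t size, unsigned int flags)
{
    return snprintf(buf, size, "%08x\n", flags);
}

/* Parse a written value with base 0 (decimal, 0x hex, or leading-0 octal).
 * Returns 0 on success, -1 if no number was found. */
static int parse_core_flags(const char *buf, unsigned int *flags)
{
    char *end;
    unsigned long value = strtoul(buf, &end, 0);

    if (end == buf)
        return -1;

    *flags = (unsigned int)value;
    return 0;
}
```

One consequence of base-0 parsing is that a zero-padded value read from the file, such as 00000010, would be interpreted as octal if written back unchanged. This makes no difference while bit 0 is the only flag.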

The setting is inherited by a child process when the child is
created.

The setting is stored in the core_flags member of mm_struct, which
shares a byte with the dumpable member because the two are adjacent
bit fields. To avoid a write-write race between the two, we use a
global spin lock.

The smp_wmb() calls after updating dumpable are removed because
set_dumpable() now takes and releases a spin lock, which has the
effect of a memory barrier.
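
The reason adjacent bit fields need a lock is that the compiler typically updates one of them with a read-modify-write of the whole containing byte, so two unsynchronized writers can overwrite each other's field. The following userspace sketch illustrates the locking scheme under stated assumptions: struct dump_bits is a hypothetical stand-in for the two mm_struct members, and a C11 atomic_flag stands in for the kernel spin lock.

```c
#include <stdatomic.h>

/* Two adjacent bit fields, declared as in the patched mm_struct. */
struct dump_bits {
    unsigned char dumpable:2;
    unsigned char core_flags:1;
};

/* Userspace stand-in for the patch's global dump_bits_lock. */
static atomic_flag dump_bits_lock = ATOMIC_FLAG_INIT;

static void lock_dump_bits(void)
{
    while (atomic_flag_test_and_set_explicit(&dump_bits_lock,
                                             memory_order_acquire))
        ;  /* spin until the flag is released */
}

static void unlock_dump_bits(void)
{
    atomic_flag_clear_explicit(&dump_bits_lock, memory_order_release);
}

/* Each setter rewrites the shared byte, so both take the same lock. */
static void set_dumpable(struct dump_bits *bits, unsigned int val)
{
    lock_dump_bits();
    bits->dumpable = val & 0x3u;
    unlock_dump_bits();
}

static void set_core_flags(struct dump_bits *bits, unsigned int val)
{
    lock_dump_bits();
    bits->core_flags = val & 0x1u;
    unlock_dump_bits();
}
```

A single global lock keeps the sketch, like the patch, simple; the trade-off is that updates to unrelated processes' bits also serialize on it.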

Signed-off-by: Hidehiro Kawai <hidehiro...@hitachi.com>
---
fs/exec.c | 11 +++-
fs/proc/base.c | 93 ++++++++++++++++++++++++++++++++++++++++
include/linux/sched.h | 30 ++++++++++++
kernel/fork.c | 2
kernel/sys.c | 62 +++++++++-----------------
security/commoncap.c | 2
security/dummy.c | 2
7 files changed, 155 insertions(+), 47 deletions(-)

Index: linux-2.6.20-rc4-mm1/fs/proc/base.c
===================================================================
--- linux-2.6.20-rc4-mm1.orig/fs/proc/base.c
+++ linux-2.6.20-rc4-mm1/fs/proc/base.c
@@ -73,6 +73,7 @@
#include <linux/poll.h>
#include <linux/nsproxy.h>
#include <linux/oom.h>
+#include <linux/elf.h>
#include "internal.h"

/* NOTE:
@@ -912,6 +913,95 @@ static struct file_operations proc_fault
};
#endif

+#if defined(USE_ELF_CORE_DUMP) && defined(CONFIG_ELF_CORE)
+static ssize_t proc_core_flags_read(struct file *file, char __user *buf,
+ size_t count, loff_t *ppos)
+{
+ struct task_struct *task = get_proc_task(file->f_dentry->d_inode);
+ struct mm_struct *mm;
+ char buffer[PROC_NUMBUF];
+ size_t len;
+ unsigned int flags;
+ loff_t __ppos = *ppos;
+ int ret;
+
+ ret = -ESRCH;
+ if (!task)
+ goto out_no_task;
+
+ ret = 0;
+ mm = get_task_mm(task);
+ if (!mm)
+ goto out_no_mm;
+ flags = mm->core_flags;
+
+ len = snprintf(buffer, sizeof(buffer), "%08x\n", flags);
+ if (__ppos >= len)
+ goto out;
+ if (count > len - __ppos)
+ count = len - __ppos;
+
+ ret = -EFAULT;
+ if (copy_to_user(buf, buffer + __ppos, count))
+ goto out;
+
+ ret = count;
+ *ppos = __ppos + count;
+
+ out:
+ mmput(mm);
+ out_no_mm:
+ put_task_struct(task);
+ out_no_task:
+ return ret;
+}
+
+static ssize_t proc_core_flags_write(struct file *file, const char __user *buf,
+ size_t count, loff_t *ppos)
+{
+ struct task_struct *task;
+ struct mm_struct *mm;
+ char buffer[PROC_NUMBUF], *end;
+ unsigned int flags;
+ int ret;
+
+ ret = -EFAULT;
+ memset(buffer, 0, sizeof(buffer));
+ if (count > sizeof(buffer) - 1)
+ count = sizeof(buffer) - 1;
+ if (copy_from_user(buffer, buf, count))
+ goto out_no_task;
+
+ ret = -EINVAL;
+ flags = (unsigned int)simple_strtoul(buffer, &end, 0);
+ if (*end == '\n')
+ end++;
+ if (end - buffer == 0)
+ goto out_no_task;
+
+ ret = -ESRCH;
+ task = get_proc_task(file->f_dentry->d_inode);
+ if (!task)
+ goto out_no_task;
+
+ ret = end - buffer;
+ mm = get_task_mm(task);
+ if (mm) {
+ set_core_flags(mm, flags);
+ mmput(mm);
+ }
+
+ put_task_struct(task);
+ out_no_task:
+ return ret;
+}
+
+static struct file_operations proc_core_flags_operations = {
+ .read = proc_core_flags_read,
+ .write = proc_core_flags_write,
+};
+#endif
+
static void *proc_pid_follow_link(struct dentry *dentry, struct nameidata *nd)
{
struct inode *inode = dentry->d_inode;
@@ -1876,6 +1966,9 @@ static struct pid_entry tgid_base_stuff[
#ifdef CONFIG_FAULT_INJECTION
REG("make-it-fail", S_IRUGO|S_IWUSR, fault_inject),
#endif
+#if defined(USE_ELF_CORE_DUMP) && defined(CONFIG_ELF_CORE)
+ REG("core_flags", S_IRUGO|S_IWUSR, core_flags),
+#endif
#ifdef CONFIG_TASK_IO_ACCOUNTING
INF("io", S_IRUGO, pid_io_accounting),
#endif
Index: linux-2.6.20-rc4-mm1/include/linux/sched.h
===================================================================
--- linux-2.6.20-rc4-mm1.orig/include/linux/sched.h
+++ linux-2.6.20-rc4-mm1/include/linux/sched.h
@@ -372,6 +372,9 @@ struct mm_struct {

unsigned char dumpable:2;

+ /* Control the core dump routines. */
+ unsigned char core_flags:1;
+
/* coredumping support */
int core_waiters;
struct completion *core_startup_done, core_done;
@@ -1710,6 +1713,33 @@ extern int sched_create_sysfs_power_savi

extern void normalize_rt_tasks(void);

+#include <linux/elf.h>
+/*
+ * These macros are used to protect dumpable and core_flags bit fields in
+ * mm_struct from write race between the two.
+ */
+extern spinlock_t dump_bits_lock;
+#define __set_dump_bits(dest, val) \
+ do { \
+ spin_lock(&dump_bits_lock); \
+ (dest) = (val); \
+ spin_unlock(&dump_bits_lock); \
+ } while (0)
+
+#if defined(USE_ELF_CORE_DUMP) && defined(CONFIG_ELF_CORE)
+# define set_dumpable(mm, val) \
+ __set_dump_bits((mm)->dumpable, val)
+# define set_core_flags(mm, val) \
+ __set_dump_bits((mm)->core_flags, val)
+#else
+# define set_dumpable(mm, val) \
+ do { \
+ (mm)->dumpable = (val); \
+ smp_wmb(); \
+ } while (0)
+# define set_core_flags(mm, val)
+#endif
+
#endif /* __KERNEL__ */

#endif
Index: linux-2.6.20-rc4-mm1/fs/exec.c
===================================================================
--- linux-2.6.20-rc4-mm1.orig/fs/exec.c
+++ linux-2.6.20-rc4-mm1/fs/exec.c
@@ -62,6 +62,9 @@ int core_uses_pid;
char core_pattern[128] = "core";
int suid_dumpable = 0;

+/* Protect dumpable and core_flags in each mm_struct from race condition. */
+DEFINE_SPINLOCK(dump_bits_lock);
+
EXPORT_SYMBOL(suid_dumpable);
/* The maximal length of core_pattern is also specified in sysctl.c */

@@ -853,9 +856,9 @@ int flush_old_exec(struct linux_binprm *
current->sas_ss_sp = current->sas_ss_size = 0;

if (current->euid == current->uid && current->egid == current->gid)
- current->mm->dumpable = 1;
+ set_dumpable(current->mm, 1);
else
- current->mm->dumpable = suid_dumpable;
+ set_dumpable(current->mm, suid_dumpable);

name = bprm->filename;

@@ -883,7 +886,7 @@ int flush_old_exec(struct linux_binprm *
file_permission(bprm->file, MAY_READ) ||
(bprm->interp_flags & BINPRM_FLAGS_ENFORCE_NONDUMP)) {
suid_keys(current);
- current->mm->dumpable = suid_dumpable;
+ set_dumpable(current->mm, suid_dumpable);
}

/* An exec changes our domain. We are no longer part of the thread
@@ -1482,7 +1485,7 @@ int do_coredump(long signr, int exit_cod
flag = O_EXCL; /* Stop rewrite attacks */
current->fsuid = 0; /* Dump root private */
}
- mm->dumpable = 0;
+ set_dumpable(mm, 0);

retval = coredump_wait(exit_code);
if (retval < 0)
Index: linux-2.6.20-rc4-mm1/kernel/fork.c
===================================================================
--- linux-2.6.20-rc4-mm1.orig/kernel/fork.c
+++ linux-2.6.20-rc4-mm1/kernel/fork.c
@@ -332,6 +332,8 @@ static struct mm_struct * mm_init(struct
atomic_set(&mm->mm_count, 1);
init_rwsem(&mm->mmap_sem);
INIT_LIST_HEAD(&mm->mmlist);
+ /* don't need to use set_core_flags() */
+ mm->core_flags = (current->mm) ? current->mm->core_flags : 0;
mm->core_waiters = 0;
mm->nr_ptes = 0;
set_mm_counter(mm, file_rss, 0);
Index: linux-2.6.20-rc4-mm1/kernel/sys.c
===================================================================
--- linux-2.6.20-rc4-mm1.orig/kernel/sys.c
+++ linux-2.6.20-rc4-mm1/kernel/sys.c
@@ -1017,10 +1017,8 @@ asmlinkage long sys_setregid(gid_t rgid,
else
return -EPERM;
}
- if (new_egid != old_egid) {
- current->mm->dumpable = suid_dumpable;
- smp_wmb();
- }
+ if (new_egid != old_egid)
+ set_dumpable(current->mm, suid_dumpable);
if (rgid != (gid_t) -1 ||
(egid != (gid_t) -1 && egid != old_rgid))
current->sgid = new_egid;
@@ -1047,16 +1045,12 @@ asmlinkage long sys_setgid(gid_t gid)
return retval;

if (capable(CAP_SETGID)) {
- if (old_egid != gid) {
- current->mm->dumpable = suid_dumpable;
- smp_wmb();
- }
+ if (old_egid != gid)
+ set_dumpable(current->mm, suid_dumpable);
current->gid = current->egid = current->sgid = current->fsgid = gid;
} else if ((gid == current->gid) || (gid == current->sgid)) {
- if (old_egid != gid) {
- current->mm->dumpable = suid_dumpable;
- smp_wmb();
- }
+ if (old_egid != gid)
+ set_dumpable(current->mm, suid_dumpable);
current->egid = current->fsgid = gid;
}
else
@@ -1084,10 +1078,8 @@ static int set_user(uid_t new_ruid, int

switch_uid(new_user);

- if (dumpclear) {
- current->mm->dumpable = suid_dumpable;
- smp_wmb();
- }
+ if (dumpclear)
+ set_dumpable(current->mm, suid_dumpable);
current->uid = new_ruid;
return 0;
}
@@ -1140,10 +1132,8 @@ asmlinkage long sys_setreuid(uid_t ruid,
if (new_ruid != old_ruid && set_user(new_ruid, new_euid != old_euid) < 0)
return -EAGAIN;

- if (new_euid != old_euid) {
- current->mm->dumpable = suid_dumpable;
- smp_wmb();
- }
+ if (new_euid != old_euid)
+ set_dumpable(current->mm, suid_dumpable);
current->fsuid = current->euid = new_euid;
if (ruid != (uid_t) -1 ||
(euid != (uid_t) -1 && euid != old_ruid))
@@ -1190,10 +1180,8 @@ asmlinkage long sys_setuid(uid_t uid)
} else if ((uid != current->uid) && (uid != new_suid))
return -EPERM;

- if (old_euid != uid) {
- current->mm->dumpable = suid_dumpable;
- smp_wmb();
- }
+ if (old_euid != uid)
+ set_dumpable(current->mm, suid_dumpable);
current->fsuid = current->euid = uid;
current->suid = new_suid;

@@ -1235,10 +1223,8 @@ asmlinkage long sys_setresuid(uid_t ruid
return -EAGAIN;
}
if (euid != (uid_t) -1) {
- if (euid != current->euid) {
- current->mm->dumpable = suid_dumpable;
- smp_wmb();
- }
+ if (euid != current->euid)
+ set_dumpable(current->mm, suid_dumpable);
current->euid = euid;
}
current->fsuid = current->euid;
@@ -1285,10 +1271,8 @@ asmlinkage long sys_setresgid(gid_t rgid
return -EPERM;
}
if (egid != (gid_t) -1) {
- if (egid != current->egid) {
- current->mm->dumpable = suid_dumpable;
- smp_wmb();
- }
+ if (egid != current->egid)
+ set_dumpable(current->mm, suid_dumpable);
current->egid = egid;
}
current->fsgid = current->egid;
@@ -1331,10 +1315,8 @@ asmlinkage long sys_setfsuid(uid_t uid)
if (uid == current->uid || uid == current->euid ||
uid == current->suid || uid == current->fsuid ||
capable(CAP_SETUID)) {
- if (uid != old_fsuid) {
- current->mm->dumpable = suid_dumpable;
- smp_wmb();
- }
+ if (uid != old_fsuid)
+ set_dumpable(current->mm, suid_dumpable);
current->fsuid = uid;
}

@@ -1360,10 +1342,8 @@ asmlinkage long sys_setfsgid(gid_t gid)
if (gid == current->gid || gid == current->egid ||
gid == current->sgid || gid == current->fsgid ||
capable(CAP_SETGID)) {
- if (gid != old_fsgid) {
- current->mm->dumpable = suid_dumpable;
- smp_wmb();
- }
+ if (gid != old_fsgid)
+ set_dumpable(current->mm, suid_dumpable);
current->fsgid = gid;
key_fsgid_changed(current);
proc_id_connector(current, PROC_EVENT_GID);
@@ -2158,7 +2138,7 @@ asmlinkage long sys_prctl(int option, un
error = -EINVAL;
break;
}
- current->mm->dumpable = arg2;
+ set_dumpable(current->mm, arg2);
break;

case PR_SET_UNALIGN:
Index: linux-2.6.20-rc4-mm1/security/commoncap.c
===================================================================
--- linux-2.6.20-rc4-mm1.orig/security/commoncap.c
+++ linux-2.6.20-rc4-mm1/security/commoncap.c
@@ -244,7 +244,7 @@ void cap_bprm_apply_creds (struct linux_

if (bprm->e_uid != current->uid || bprm->e_gid != current->gid ||
!cap_issubset (new_permitted, current->cap_permitted)) {
- current->mm->dumpable = suid_dumpable;
+ set_dumpable(current->mm, suid_dumpable);

if (unsafe & ~LSM_UNSAFE_PTRACE_CAP) {
if (!capable(CAP_SETUID)) {
Index: linux-2.6.20-rc4-mm1/security/dummy.c
===================================================================
--- linux-2.6.20-rc4-mm1.orig/security/dummy.c
+++ linux-2.6.20-rc4-mm1/security/dummy.c
@@ -130,7 +130,7 @@ static void dummy_bprm_free_security (st
static void dummy_bprm_apply_creds (struct linux_binprm *bprm, int unsafe)
{
if (bprm->e_uid != current->uid || bprm->e_gid != current->gid) {
- current->mm->dumpable = suid_dumpable;
+ set_dumpable(current->mm, suid_dumpable);

if ((unsafe & ~LSM_UNSAFE_PTRACE_CAP) && !capable(CAP_SETUID)) {
bprm->e_uid = current->uid;

Kawai, Hidehiro

Jan 26, 2007, 9:20:07 AM
This patch adds the kernel.core_flags_enable sysctl parameter, which
allows the root user to disable the /proc/<pid>/core_flags feature globally.

Signed-off-by: Hidehiro Kawai <hidehiro...@hitachi.com>
---

fs/binfmt_elf.c | 3 ++-
fs/binfmt_elf_fdpic.c | 3 ++-
fs/exec.c | 1 +
include/linux/binfmts.h | 1 +
include/linux/sysctl.h | 1 +
kernel/sysctl.c | 11 +++++++++++
6 files changed, 18 insertions(+), 2 deletions(-)

Index: linux-2.6.20-rc4-mm1/fs/exec.c
===================================================================
--- linux-2.6.20-rc4-mm1.orig/fs/exec.c
+++ linux-2.6.20-rc4-mm1/fs/exec.c
@@ -61,6 +61,7 @@
int core_uses_pid;
char core_pattern[128] = "core";
int suid_dumpable = 0;
+unsigned int sysctl_core_flags_enable = 0x1;

/* Protect dumpable and core_flags in each mm_struct from race condition. */
DEFINE_SPINLOCK(dump_bits_lock);
Index: linux-2.6.20-rc4-mm1/include/linux/sysctl.h
===================================================================
--- linux-2.6.20-rc4-mm1.orig/include/linux/sysctl.h
+++ linux-2.6.20-rc4-mm1/include/linux/sysctl.h
@@ -160,6 +160,7 @@ enum
KERN_MAX_LOCK_DEPTH=74,
KERN_NMI_WATCHDOG=75, /* int: enable/disable nmi watchdog */
KERN_PANIC_ON_NMI=76, /* int: whether we will panic on an unrecovered */
+ KERN_CORE_FLAGS_ENABLE=77, /* int: enabled flags in core_flags */
};


Index: linux-2.6.20-rc4-mm1/kernel/sysctl.c
===================================================================
--- linux-2.6.20-rc4-mm1.orig/kernel/sysctl.c
+++ linux-2.6.20-rc4-mm1/kernel/sysctl.c
@@ -69,6 +69,7 @@ extern int max_threads;
extern int core_uses_pid;
extern int suid_dumpable;
extern char core_pattern[];
+extern unsigned int sysctl_core_flags_enable;
extern int pid_max;
extern int min_free_kbytes;
extern int printk_ratelimit_jiffies;
@@ -354,6 +355,16 @@ static ctl_table kern_table[] = {
.proc_handler = &proc_dostring,
.strategy = &sysctl_string,
},
+#if defined(USE_ELF_CORE_DUMP) && defined(CONFIG_ELF_CORE)
+ {
+ .ctl_name = KERN_CORE_FLAGS_ENABLE,
+ .procname = "core_flags_enable",
+ .data = &sysctl_core_flags_enable,
+ .maxlen = sizeof(unsigned int),
+ .mode = 0644,
+ .proc_handler = &proc_dointvec,
+ },
+#endif
#ifdef CONFIG_PROC_SYSCTL
{
.ctl_name = KERN_TAINTED,
Index: linux-2.6.20-rc4-mm1/include/linux/binfmts.h
===================================================================
--- linux-2.6.20-rc4-mm1.orig/include/linux/binfmts.h
+++ linux-2.6.20-rc4-mm1/include/linux/binfmts.h
@@ -81,6 +81,7 @@ extern int suid_dumpable;

/* Core dump control flags */
#define CORE_OMIT_ANON_SHARED 0x1 /* don't dump anonymous shared memory */
+extern unsigned int sysctl_core_flags_enable;

extern int setup_arg_pages(struct linux_binprm * bprm,
unsigned long stack_top,
Index: linux-2.6.20-rc4-mm1/fs/binfmt_elf.c
===================================================================
--- linux-2.6.20-rc4-mm1.orig/fs/binfmt_elf.c
+++ linux-2.6.20-rc4-mm1/fs/binfmt_elf.c
@@ -1597,7 +1597,8 @@ static int elf_core_dump(long signr, str
}

dataoff = offset = roundup(offset, ELF_EXEC_PAGESIZE);
- __set_dump_bits(core_flags, current->mm->core_flags);
+ __set_dump_bits(core_flags,
+ current->mm->core_flags & sysctl_core_flags_enable);

/* Write program headers for segments dump */
for (vma = current->mm->mmap; vma != NULL; vma = vma->vm_next) {
Index: linux-2.6.20-rc4-mm1/fs/binfmt_elf_fdpic.c
===================================================================
--- linux-2.6.20-rc4-mm1.orig/fs/binfmt_elf_fdpic.c
+++ linux-2.6.20-rc4-mm1/fs/binfmt_elf_fdpic.c
@@ -1703,7 +1703,8 @@ static int elf_fdpic_core_dump(long sign
/* Page-align dumped data */
dataoff = offset = roundup(offset, ELF_EXEC_PAGESIZE);

- __set_dump_bits(core_flags, current->mm->core_flags);
+ __set_dump_bits(core_flags,
+ current->mm->core_flags & sysctl_core_flags_enable);

/* write program headers for segments dump */
for (

Kawai, Hidehiro

Jan 26, 2007, 9:20:12 AM
This patch adds the documentation for the following parameters:
/proc/<pid>/core_flags
/proc/sys/kernel/core_flags_enable

Signed-off-by: Hidehiro Kawai <hidehiro...@hitachi.com>
---

Documentation/filesystems/proc.txt | 42 +++++++++++++++++++++++++++
Documentation/sysctl/kernel.txt | 11 +++++++
2 files changed, 53 insertions(+)

Index: linux-2.6.20-rc4-mm1/Documentation/filesystems/proc.txt
===================================================================
--- linux-2.6.20-rc4-mm1.orig/Documentation/filesystems/proc.txt
+++ linux-2.6.20-rc4-mm1/Documentation/filesystems/proc.txt
@@ -41,6 +41,7 @@ Table of Contents
2.11 /proc/sys/fs/mqueue - POSIX message queues filesystem
2.12 /proc/<pid>/oom_adj - Adjust the oom-killer score
2.13 /proc/<pid>/oom_score - Display current oom-killer score
+ 2.14 /proc/<pid>/core_flags - Core dump control flags

------------------------------------------------------------------------------
Preface
@@ -1981,6 +1982,47 @@ This file can be used to check the curre
any given <pid>. Use it together with /proc/<pid>/oom_adj to tune which
process should be killed in an out-of-memory situation.

+2.14 /proc/<pid>/core_flags - Core dump control flags
+---------------------------------------------------------------------
+When a process is dumped, all anonymous memory is written to a core file as
+long as the size of the core file isn't limited. But sometimes we don't want
+to dump some memory segments, for example, huge shared memory.
+
+The /proc/<pid>/core_flags file enables you to omit some anonymous memory from
+a core file when it is generated. The content of the proc file is a bitmask of
+memory segment types you don't want to dump. When the <pid> process is dumped,
+the core dump routine decides whether a given memory segment should be dumped
+into a core file based on the type of the memory segment and the bitmask.
+
+Currently, the only valid bit is bit 0. If bit 0 is set, anonymous `shared'
+memory segments are not dumped. Anonymous shared memory comes in three types:
+
+ - IPC shared memory
+ - the memory segments created by mmap(2) with MAP_ANONYMOUS and MAP_SHARED
+ flags
+ - the memory segments created by mmap(2) with MAP_SHARED flag, and the
+ mapped file has already been unlinked
+
+Because the current core dump routine doesn't distinguish these segments, you
+can only choose between dumping all anonymous shared memory segments or none.
+
+If you don't want to dump all shared memory segments attached to pid 1234, set
+the bit 0 of the process's core_flags to 1:
+
+ $ echo 1 > /proc/1234/core_flags
+
+Additionally, you can check its hexadecimal value by reading the file:
+
+ $ cat /proc/1234/core_flags
+ 00000001
+
+When a new process is created, the process inherits the core_flags setting
+from its parent. It is useful to set the core_flags before the program runs.
+For example:
+
+ $ echo 1 > /proc/self/core_flags
+ $ ./some_program
+
------------------------------------------------------------------------------
Summary
------------------------------------------------------------------------------
Index: linux-2.6.20-rc4-mm1/Documentation/sysctl/kernel.txt
===================================================================
--- linux-2.6.20-rc4-mm1.orig/Documentation/sysctl/kernel.txt
+++ linux-2.6.20-rc4-mm1/Documentation/sysctl/kernel.txt
@@ -20,6 +20,7 @@ show up in /proc/sys/kernel:
- acct
- core_pattern
- core_uses_pid
+- core_flags_enable
- ctrl-alt-del
- dentry-state
- domainname
@@ -122,6 +123,16 @@ the filename.

==============================================================

+core_flags_enable:
+
+This file enables/disables each flag in /proc/<pid>/core_flags
+(please see Documentation/filesystems/proc.txt). If a bit in
+core_flags_enable is set, the corresponding flag in
+/proc/<pid>/core_flags is effective, otherwise the flag is
+discarded.
+
+==============================================================
+
ctrl-alt-del:

When the value in this file is 0, ctrl-alt-del is trapped and

Kawai, Hidehiro

Jan 26, 2007, 9:20:12 AM
This patch makes it possible to omit anonymous shared memory from a
core file when it is generated.

If you don't want to dump the shared memory segments of the <pid>
process, set bit 0 of /proc/<pid>/core_flags to 1.

$ echo 1 > /proc/<pid>/core_flags


The debug messages from maydump() in fs/binfmt_elf_fdpic.c are updated
so that we can tell which kinds of memory segments are dumped and
which are not.

Signed-off-by: Hidehiro Kawai <hidehiro...@hitachi.com>
---

fs/binfmt_elf.c | 20 ++++++++++++++------
fs/binfmt_elf_fdpic.c | 37 +++++++++++++++++++++++++------------
include/linux/binfmts.h | 3 +++
3 files changed, 42 insertions(+), 18 deletions(-)

Index: linux-2.6.20-rc4-mm1/fs/binfmt_elf.c
===================================================================
--- linux-2.6.20-rc4-mm1.orig/fs/binfmt_elf.c
+++ linux-2.6.20-rc4-mm1/fs/binfmt_elf.c

@@ -1176,15 +1176,21 @@ static int dump_seek(struct file *file,
*
* I think we should skip something. But I am not sure how. H.J.
*/
-static int maydump(struct vm_area_struct *vma)
+static int maydump(struct vm_area_struct *vma, unsigned int core_flags)
{
/* Do not dump I/O mapped devices or special mappings */
if (vma->vm_flags & (VM_IO | VM_RESERVED))
return 0;

- /* Dump shared memory only if mapped from an anonymous file. */
- if (vma->vm_flags & VM_SHARED)
- return vma->vm_file->f_path.dentry->d_inode->i_nlink == 0;
+ /*
+ * Dump shared memory only if mapped from an anonymous file and not
+ * masked by /proc/<pid>/core_flags.
+ */
+ if (vma->vm_flags & VM_SHARED) {
+ if (vma->vm_file->f_path.dentry->d_inode->i_nlink)
+ return 0;
+ return (core_flags & CORE_OMIT_ANON_SHARED) == 0;
+ }

/* If it hasn't been written to, don't write it out */
if (!vma->anon_vma)
@@ -1456,6 +1462,7 @@ static int elf_core_dump(long signr, str
#endif
int thread_status_size = 0;
elf_addr_t *auxv;
+ unsigned int core_flags;

/*
* We no longer stop all VM operations.
@@ -1590,6 +1597,7 @@ static int elf_core_dump(long signr, str
}

dataoff = offset = roundup(offset, ELF_EXEC_PAGESIZE);
+ __set_dump_bits(core_flags, current->mm->core_flags);

/* Write program headers for segments dump */
for (vma = current->mm->mmap; vma != NULL; vma = vma->vm_next) {
@@ -1602,7 +1610,7 @@ static int elf_core_dump(long signr, str
phdr.p_offset = offset;
phdr.p_vaddr = vma->vm_start;
phdr.p_paddr = 0;
- phdr.p_filesz = maydump(vma) ? sz : 0;
+ phdr.p_filesz = maydump(vma, core_flags) ? sz : 0;
phdr.p_memsz = sz;
offset += phdr.p_filesz;
phdr.p_flags = vma->vm_flags & VM_READ ? PF_R : 0;
@@ -1644,7 +1652,7 @@ static int elf_core_dump(long signr, str
for (vma = current->mm->mmap; vma != NULL; vma = vma->vm_next) {
unsigned long addr;

- if (!maydump(vma))
+ if (!maydump(vma, core_flags))
continue;

for (addr = vma->vm_start;
Index: linux-2.6.20-rc4-mm1/fs/binfmt_elf_fdpic.c
===================================================================
--- linux-2.6.20-rc4-mm1.orig/fs/binfmt_elf_fdpic.c
+++ linux-2.6.20-rc4-mm1/fs/binfmt_elf_fdpic.c

@@ -1167,7 +1167,7 @@ static int dump_seek(struct file *file,
*
* I think we should skip something. But I am not sure how. H.J.
*/
-static int maydump(struct vm_area_struct *vma)
+static int maydump(struct vm_area_struct *vma, unsigned int core_flags)
{
/* Do not dump I/O mapped devices or special mappings */
if (vma->vm_flags & (VM_IO | VM_RESERVED)) {
@@ -1183,15 +1183,22 @@ static int maydump(struct vm_area_struct
return 0;
}

- /* Dump shared memory only if mapped from an anonymous file. */
+ /*
+ * Dump shared memory only if mapped from an anonymous file and not
+ * masked by /proc/<pid>/core_flags.
+ */
if (vma->vm_flags & VM_SHARED) {
- if (vma->vm_file->f_path.dentry->d_inode->i_nlink == 0) {
+ if (vma->vm_file->f_path.dentry->d_inode->i_nlink) {
kdcore("%08lx: %08lx: no (share)", vma->vm_start, vma->vm_flags);
+ return 0;
+ }
+ if (core_flags & CORE_OMIT_ANON_SHARED) {
+ kdcore("%08lx: %08lx: no (anon-share)", vma->vm_start, vma->vm_flags);
+ return 0;
+ } else {
+ kdcore("%08lx: %08lx: yes (anon-share)", vma->vm_start, vma->vm_flags);
return 1;
}
-
- kdcore("%08lx: %08lx: no (share)", vma->vm_start, vma->vm_flags);
- return 0;
}

#ifdef CONFIG_MMU
@@ -1443,14 +1450,15 @@ static int elf_dump_thread_status(long s
*/
#ifdef CONFIG_MMU
static int elf_fdpic_dump_segments(struct file *file, struct mm_struct *mm,
- size_t *size, unsigned long *limit)
+ size_t *size, unsigned long *limit,
+ unsigned int core_flags)
{
struct vm_area_struct *vma;

for (vma = current->mm->mmap; vma; vma = vma->vm_next) {
unsigned long addr;

- if (!maydump(vma))
+ if (!maydump(vma, core_flags))
continue;

for (addr = vma->vm_start;
@@ -1498,14 +1506,15 @@ end_coredump:
*/
#ifndef CONFIG_MMU
static int elf_fdpic_dump_segments(struct file *file, struct mm_struct *mm,
- size_t *size, unsigned long *limit)
+ size_t *size, unsigned long *limit,
+ unsigned int core_flags)
{
struct vm_list_struct *vml;

for (vml = current->mm->context.vmlist; vml; vml = vml->next) {
struct vm_area_struct *vma = vml->vma;

- if (!maydump(vma))
+ if (!maydump(vma, core_flags))
continue;

if ((*size += PAGE_SIZE) > *limit)
@@ -1556,6 +1565,7 @@ static int elf_fdpic_core_dump(long sign
struct vm_list_struct *vml;
#endif
elf_addr_t *auxv;
+ unsigned int core_flags;

/*
* We no longer stop all VM operations.
@@ -1693,6 +1703,8 @@ static int elf_fdpic_core_dump(long sign
/* Page-align dumped data */
dataoff = offset = roundup(offset, ELF_EXEC_PAGESIZE);

+ __set_dump_bits(core_flags, current->mm->core_flags);
+
/* write program headers for segments dump */
for (
#ifdef CONFIG_MMU
@@ -1714,7 +1726,7 @@ static int elf_fdpic_core_dump(long sign
phdr.p_offset = offset;
phdr.p_vaddr = vma->vm_start;
phdr.p_paddr = 0;
- phdr.p_filesz = maydump(vma) ? sz : 0;
+ phdr.p_filesz = maydump(vma, core_flags) ? sz : 0;
phdr.p_memsz = sz;
offset += phdr.p_filesz;
phdr.p_flags = vma->vm_flags & VM_READ ? PF_R : 0;
@@ -1748,7 +1760,8 @@ static int elf_fdpic_core_dump(long sign

DUMP_SEEK(dataoff);

- if (elf_fdpic_dump_segments(file, current->mm, &size, &limit) < 0)
+ if (elf_fdpic_dump_segments(file, current->mm, &size, &limit,
+ core_flags) < 0)
goto end_coredump;

#ifdef ELF_CORE_WRITE_EXTRA_DATA


Index: linux-2.6.20-rc4-mm1/include/linux/binfmts.h
===================================================================
--- linux-2.6.20-rc4-mm1.orig/include/linux/binfmts.h
+++ linux-2.6.20-rc4-mm1/include/linux/binfmts.h

@@ -79,6 +79,9 @@ extern int suid_dumpable;
#define EXSTACK_DISABLE_X 1 /* Disable executable stacks */
#define EXSTACK_ENABLE_X 2 /* Enable executable stacks */

+/* Core dump control flags */
+#define CORE_OMIT_ANON_SHARED 0x1 /* don't dump anonymous shared memory */
+
extern int setup_arg_pages(struct linux_binprm * bprm,
unsigned long stack_top,
int executable_stack);

Robin Holt

Jan 26, 2007, 10:40:10 AM
On Fri, Jan 26, 2007 at 11:05:07PM +0900, Kawai, Hidehiro wrote:
> You can specify memory segment types you don't want to dump via
> /proc/<pid>/core_flags file, which is provided per process.
> This file represents a set of flags, but currently, only bit 0 is
> available. If bit 0 is set, the kernel core dump routine doesn't
> dump anonymous shared memory segments, which includes IPC shared
> memory and some of mmap(2)'ed memory.

Can you make this a little more transparent? Having a magic bitmask does
not seem like the best way to do stuff. Could you maybe make a core_flags
directory with a separate file for each flag. It could still map to a
single field in the mm, but be broken out for the proc filesystem.

I can certainly see the value of this for our customers. We have some
customers that run jobs in the 1-2TB range. Most of those customers
have always had coredumps disabled and just rely upon being able to
rerun the application and have MPI drop them into a debugger.

Thanks,
Robin

Pavel Machek

Jan 26, 2007, 12:10:11 PM
On Fri 2007-01-26 23:14:53, Kawai, Hidehiro wrote:
> This patch adds kernel.core_flags_enable sysctl parameter, which allows
> root user to disable the /proc/<pid>/core_flags feature globally.

What is it good for?
Pavel

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

Kawai, Hidehiro

Jan 30, 2007, 2:40:11 AM
Hi Robin,

Robin Holt wrote:
> Can you make this a little more transparent? Having a magic bitmask does
> not seem like the best way to do stuff. Could you maybe make a core_flags
> directory with a separate file for each flag? It could still map to a
> single field in the mm, but be broken out for the proc filesystem.

That seems like a good enhancement idea, thanks. :-)
But currently there is only one flag, so we had better keep this simple
implementation until someone asks for a new one.

Thanks,


--
Hidehiro Kawai
Hitachi, Ltd., Systems Development Laboratory


Robin Holt

Jan 30, 2007, 7:50:10 AM
On Tue, Jan 30, 2007 at 04:36:34PM +0900, Kawai, Hidehiro wrote:
> Hi Robin,
>
> Robin Holt wrote:
> > Can you make this a little more transparent? Having a magic bitmask does
> > not seem like the best way to do stuff. Could you maybe make a core_flags
> > directory with a separate file for each flag? It could still map to a
> > single field in the mm, but be broken out for the proc filesystem.
>
> That seems like a good enhancement idea, thanks. :-)
> But currently there is only one flag, so we had better keep this simple
> implementation until someone asks for a new one.

If that is the case, can we rename the file from core_flags to something
more descriptive, like dump_core_skip_anonymous_mappings? The name
is a wild suggestion, but the renaming does seem fairly important to me.
Remember, once you get this in, changing the name will be fairly difficult,
as admin tools and documentation will adopt the name. These are usually
cases where it is better to do it right the first time.

Thanks,
Robin

Kawai, Hidehiro

Jan 31, 2007, 7:50:13 AM
Hi,

Robin Holt wrote:
>>>Can you make this a little more transparent? Having a magic bitmask does
>>>not seem like the best way to do stuff. Could you maybe make a core_flags
>>>directory with a separate file for each flag? It could still map to a
>>>single field in the mm, but be broken out for the proc filesystem.
>>
>>That seems like a good enhancement idea, thanks. :-)
>>But currently there is only one flag, so we had better keep this simple
>>implementation until someone asks for a new one.
>
> If that is the case, can we rename the file from core_flags to something
> more descriptive, like dump_core_skip_anonymous_mappings? The name
> is a wild suggestion, but the renaming does seem fairly important to me.
> Remember, once you get this in, changing the name will be fairly difficult,
> as admin tools and documentation will adopt the name. These are usually
> cases where it is better to do it right the first time.

Okay, I'll adopt your idea in the next version.
I'm going to provide the proc entries as follows:

(1) /proc/<pid>/core_flags/flags
(2) /proc/<pid>/core_flags/omit_anon_shared

(1) is the same as the current core_flags file. It is for expert users.
(2) corresponds to one bit in (1).
If (2) is set to 1, anonymous shared memory of the process is never
dumped.
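
The relationship between the two entries, a raw flags value and a per-flag boolean backed by the same field, can be sketched in user-space C as follows. The table layout and the core_flag_bit(), core_flag_write() and core_flag_read() helpers are hypothetical names chosen for illustration; only the omit_anon_shared name and the bit 0 assignment come from this thread.

```c
#include <stddef.h>
#include <string.h>

#define CORE_OMIT_ANON_SHARED 0x1 /* bit 0, as defined in the patch */

/* Hypothetical mapping from a per-flag file name to its bit. */
struct core_flag_entry {
    const char *name;
    unsigned int bit;
};

static const struct core_flag_entry core_flag_table[] = {
    { "omit_anon_shared", CORE_OMIT_ANON_SHARED },
};

/* Look up the bit for a per-flag name; returns 0 if the name is unknown. */
unsigned int core_flag_bit(const char *name)
{
    size_t n = sizeof(core_flag_table) / sizeof(core_flag_table[0]);

    for (size_t i = 0; i < n; i++)
        if (strcmp(core_flag_table[i].name, name) == 0)
            return core_flag_table[i].bit;
    return 0;
}

/* Set or clear one named flag in the shared field; -1 if unknown. */
int core_flag_write(unsigned int *flags, const char *name, int value)
{
    unsigned int bit = core_flag_bit(name);

    if (bit == 0)
        return -1;
    if (value)
        *flags |= bit;
    else
        *flags &= ~bit;
    return 0;
}

/* Read one named flag as 0 or 1; -1 if the name is unknown. */
int core_flag_read(unsigned int flags, const char *name)
{
    unsigned int bit = core_flag_bit(name);

    if (bit == 0)
        return -1;
    return (flags & bit) != 0;
}
```

Because both views operate on one unsigned int, writing 1 through the named entry and reading the raw value back always agree, and a new flag would only need one more table row.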

Thanks,
--
Hidehiro Kawai
Hitachi, Ltd., Systems Development Laboratory

Pavel Machek

Feb 3, 2007, 8:30:09 AM
Hi!

> >>>Can you make this a little more transparent? Having a magic bitmask does
> >>>not seem like the best way to do stuff. Could you maybe make a core_flags
> >>>directory with a separate file for each flag? It could still map to a
> >>>single field in the mm, but be broken out for the proc filesystem.
> >>
> >>That seems like a good enhancement idea, thanks. :-)
> >>But currently there is only one flag, so we had better keep this simple
> >>implementation until someone asks for a new one.
> >
> > If that is the case, can we rename the file from core_flags to something
> > more descriptive, like dump_core_skip_anonymous_mappings? The name
> > is a wild suggestion, but the renaming does seem fairly important to me.
> > Remember, once you get this in, changing the name will be fairly difficult,
> > as admin tools and documentation will adopt the name. These are usually
> > cases where it is better to do it right the first time.
>
> Okay, I'll adopt your idea in the next version.
> I'm going to provide the proc entries as follows:
>
> (1) /proc/<pid>/core_flags/flags
> (2) /proc/<pid>/core_flags/omit_anon_shared
>
> (1) is the same as the current core_flags file. It is for expert users.
> (2) corresponds to one bit in (1).
> If (2) is set to 1, anonymous shared memory of the process is never
> dumped.

Now, that's what I call an ugly interface.

Can we simply add a ulimit with a boolean value that says whether to
dump anon_shared or not? It will be simpler and faster, because you
won't need locking.

Kawai, Hidehiro

Feb 14, 2007, 8:30:06 AM
Hi,

I'm sorry that I couldn't reply to you sooner.

Pavel Machek wrote:
>>Okay, I'll adopt your idea in the next version.
>>I'm going to provide the proc entries as follows:
>>
>> (1) /proc/<pid>/core_flags/flags
>> (2) /proc/<pid>/core_flags/omit_anon_shared
>>
>>(1) is the same as the current core_flags file. It is for expert users.
>>(2) corresponds to one bit in (1).
>>If (2) is set to 1, anonymous shared memory of the process is never
>>dumped.
>
> Now, that's what I call an ugly interface.

I thought about it for a while and decided not to use the interfaces
above. Instead, I'll create just one proc entry that gives access to
the single core dump flag:
/proc/<pid>/coredump_omit_anonymous_shared

I think it is simple enough.
Do you think it is still ugly?
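
As a rough sketch of how a monitoring tool might use such a single-entry interface from user space, the C helpers below build the entry's path and write a boolean to it. Only the file name coredump_omit_anonymous_shared comes from this message; the helper names, the proc_root parameter (added so the code can be pointed at a scratch directory) and the "0"/"1" text format are assumptions, since the proposal does not specify them.

```c
#include <stdio.h>
#include <sys/types.h>

/*
 * Build "<proc_root>/<pid>/coredump_omit_anonymous_shared" into buf.
 * Returns 0 on success, -1 if the buffer is too small.
 */
int omit_anon_shared_path(char *buf, size_t len,
                          const char *proc_root, pid_t pid)
{
    int n = snprintf(buf, len, "%s/%ld/coredump_omit_anonymous_shared",
                     proc_root, (long)pid);

    if (n < 0 || (size_t)n >= len)
        return -1;
    return 0;
}

/*
 * Write "1" or "0" to the entry (assumed text format).
 * Returns 0 on success, -1 on any error.
 */
int set_omit_anon_shared(const char *proc_root, pid_t pid, int enable)
{
    char path[256];
    FILE *f;
    int ok;

    if (omit_anon_shared_path(path, sizeof(path), proc_root, pid) != 0)
        return -1;
    f = fopen(path, "w");
    if (f == NULL)
        return -1;
    ok = fputs(enable ? "1\n" : "0\n", f) >= 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

A caller would pass "/proc" as proc_root and the target PID; because the entry is per process, the setting could be changed while the program is running, which is the flexibility argued for in this message.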


> Can we simply add a ulimit with a boolean value that says whether to
> dump anon_shared or not? It will be simpler and faster, because you
> won't need locking.

Yes, using ulimit will be simpler and faster, but less flexible.
With ulimit, the core dump flags can be set only before the program starts.
If the user modifies the program, the flags can be changed at any time,
but the user can't always modify the program.

I need the ability to change the core dump flags at any time.


Thanks,
--
Hidehiro Kawai
Hitachi, Ltd., Systems Development Laboratory


Pavel Machek

Feb 14, 2007, 8:40:09 AM
Hi!

> > Can we simply add a ulimit with a boolean value that says whether to
> > dump anon_shared or not? It will be simpler and faster, because you
> > won't need locking.
>
> Yes, using ulimit will be simpler and faster, but less flexible.

It is preferred in this case.

> With ulimit, the core dump flags can be set only before the program starts.
> If the user modifies the program, the flags can be changed at any time,
> but the user can't always modify the program.

You can do ulimit, then exec from a wrapper.
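
A minimal sketch of such a wrapper in C is shown below. The boolean "dump anon_shared" limit discussed here does not exist, so the existing RLIMIT_CORE limit is used as a stand-in to show the pattern; the function names limit_core_size() and run_with_core_limit() are made up for this example.

```c
#include <sys/resource.h>
#include <unistd.h>

/*
 * Set the soft core-size limit of the calling process, clamped to the
 * hard limit. Returns 0 on success, -1 on error.
 */
int limit_core_size(rlim_t soft_limit)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;
    if (rl.rlim_max != RLIM_INFINITY && soft_limit > rl.rlim_max)
        soft_limit = rl.rlim_max;
    rl.rlim_cur = soft_limit;
    return setrlimit(RLIMIT_CORE, &rl);
}

/*
 * Apply the limit, then replace this process with the target program.
 * Resource limits are preserved across exec, so the unmodified program
 * runs with the new setting. Returns only on failure.
 */
int run_with_core_limit(rlim_t soft_limit, char *const argv[])
{
    if (limit_core_size(soft_limit) != 0)
        return -1;
    execvp(argv[0], argv);
    return -1; /* execvp returns only if it failed */
}
```

The wrapper needs no change to the target program, but the limit is fixed at launch time; changing it later would still have to be done from inside the process.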

> I need the ability to change the core dump flags at any time.

Then maybe the right solution is to extend _ulimit_ to change anything
at any time. But that's definitely a separate patch.
Pavel
