Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

[PATCH -v3 0/2] printk.devkmsg: Ratelimit it by default

871 views
Skip to first unread message

Borislav Petkov

unread,
Jul 4, 2016, 10:25:14 AM7/4/16
to LKML, Andrew Morton, Franck Bui, Greg Kroah-Hartman, Ingo Molnar, Linus Torvalds, Peter Zijlstra, Steven Rostedt, Uwe Kleine-König
From: Borislav Petkov <b...@suse.de>

Hi all,

here's v3 integrating Ingo's comments. The thing is called
printk.devkmsg= or printk_devkmsg now, depending on cmdline option or
sysctl.


Changelog:
----------

v2:

here's v2 with the requested sysctl option kernel.printk_kmsg and
locking of the setting when printk.kmsg= is supplied on the command
line.

Patch 1 is unchanged.

Patch 2 has grown the sysctl addition.

v1:

Rostedt is busy so I took Linus' old patch and Steven's last v2 and
split and extended them with the comments people had on the last thread:

https://lkml.kernel.org/r/20160425145...@gandalf.local.home

I hope, at least.

So it is ratelimiting by default, with "on" and "off" cmdline options. I
called the option somewhat a bit shorter too: "printk.kmsg"

The current use cases of this and of which I'm aware are:

* debug the kernel and thus shut up all interfering input from
userspace, i.e. boot with "printk.kmsg=off"

* debug userspace (and by that I mean systemd) by booting with
"printk.kmsg=on" so that the ratelimiting is disabled and the kernel log
gets all the spew.

Thoughts?

Thanks.

Borislav Petkov (2):
ratelimit: Extend to print suppressed messages on release
printk: Add kernel parameter to control writes to /dev/kmsg

Documentation/kernel-parameters.txt | 6 +++
Documentation/sysctl/kernel.txt | 14 +++++++
include/linux/printk.h | 7 ++++
include/linux/ratelimit.h | 36 +++++++++++++---
kernel/printk/printk.c | 83 +++++++++++++++++++++++++++++++++----
kernel/sysctl.c | 9 ++++
lib/ratelimit.c | 6 ++-
7 files changed, 146 insertions(+), 15 deletions(-)

--
2.7.3

Borislav Petkov

unread,
Jul 4, 2016, 10:25:16 AM7/4/16
to LKML, Andrew Morton, Franck Bui, Greg Kroah-Hartman, Ingo Molnar, Linus Torvalds, Peter Zijlstra, Steven Rostedt, Uwe Kleine-König
From: Borislav Petkov <b...@suse.de>

Extend the ratelimiting facility to print the amount of suppressed lines
when it is being released.

Separated from a previous patch by Linus.

Also, make the ON_RELEASE image not use "callbacks" as it is misleading.

Signed-off-by: Borislav Petkov <b...@suse.de>
Acked-by: Linus Torvalds <torv...@linux-foundation.org>
Cc: Andrew Morton <ak...@linux-foundation.org>
Cc: Franck Bui <fb...@suse.com>
Cc: Greg Kroah-Hartman <gre...@linuxfoundation.org>
Cc: Ingo Molnar <mi...@kernel.org>
Cc: Linus Torvalds <torv...@linux-foundation.org>
Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Steven Rostedt <ros...@goodmis.org>
Cc: Uwe Kleine-König <u.klein...@pengutronix.de>
---
include/linux/ratelimit.h | 36 +++++++++++++++++++++++++++++++-----
lib/ratelimit.c | 6 ++++--
2 files changed, 35 insertions(+), 7 deletions(-)

diff --git a/include/linux/ratelimit.h b/include/linux/ratelimit.h
index 18102529254e..1d8a17ee8395 100644
--- a/include/linux/ratelimit.h
+++ b/include/linux/ratelimit.h
@@ -2,11 +2,15 @@
#define _LINUX_RATELIMIT_H

#include <linux/param.h>
+#include <linux/sched.h>
#include <linux/spinlock.h>

#define DEFAULT_RATELIMIT_INTERVAL (5 * HZ)
#define DEFAULT_RATELIMIT_BURST 10

+/* issue num suppressed message on exit */
+#define RATELIMIT_MSG_ON_RELEASE BIT(0)
+
struct ratelimit_state {
raw_spinlock_t lock; /* protect the state */

@@ -15,6 +19,7 @@ struct ratelimit_state {
int printed;
int missed;
unsigned long begin;
+ unsigned long flags;
};

#define RATELIMIT_STATE_INIT(name, interval_init, burst_init) { \
@@ -34,12 +39,33 @@ struct ratelimit_state {
static inline void ratelimit_state_init(struct ratelimit_state *rs,
int interval, int burst)
{
+ memset(rs, 0, sizeof(*rs));
+
raw_spin_lock_init(&rs->lock);
- rs->interval = interval;
- rs->burst = burst;
- rs->printed = 0;
- rs->missed = 0;
- rs->begin = 0;
+ rs->interval = interval;
+ rs->burst = burst;
+}
+
+static inline void ratelimit_default_init(struct ratelimit_state *rs)
+{
+ return ratelimit_state_init(rs, DEFAULT_RATELIMIT_INTERVAL,
+ DEFAULT_RATELIMIT_BURST);
+}
+
+static inline void ratelimit_state_exit(struct ratelimit_state *rs)
+{
+ if (!(rs->flags & RATELIMIT_MSG_ON_RELEASE))
+ return;
+
+ if (rs->missed)
+ printk(KERN_WARNING "%s: %d output lines suppressed due to ratelimiting\n",
+ current->comm, rs->missed);
+}
+
+static inline void
+ratelimit_set_flags(struct ratelimit_state *rs, unsigned long flags)
+{
+ rs->flags = flags;
}

extern struct ratelimit_state printk_ratelimit_state;
diff --git a/lib/ratelimit.c b/lib/ratelimit.c
index 2c5de86460c5..b753f0cfb00b 100644
--- a/lib/ratelimit.c
+++ b/lib/ratelimit.c
@@ -46,12 +46,14 @@ int ___ratelimit(struct ratelimit_state *rs, const char *func)
rs->begin = jiffies;

if (time_is_before_jiffies(rs->begin + rs->interval)) {
- if (rs->missed)
+ if (rs->missed && !(rs->flags & RATELIMIT_MSG_ON_RELEASE))
printk(KERN_WARNING "%s: %d callbacks suppressed\n",
func, rs->missed);
rs->begin = jiffies;
rs->printed = 0;
- rs->missed = 0;
+
+ if (!(rs->flags & RATELIMIT_MSG_ON_RELEASE))
+ rs->missed = 0;
}
if (rs->burst && rs->burst > rs->printed) {
rs->printed++;
--
2.7.3

Borislav Petkov

unread,
Jul 4, 2016, 10:25:36 AM7/4/16
to LKML, Andrew Morton, Franck Bui, Greg Kroah-Hartman, Ingo Molnar, Linus Torvalds, Peter Zijlstra, Steven Rostedt, Uwe Kleine-König
From: Borislav Petkov <b...@suse.de>

Add a "printk.devkmsg" kernel command line parameter which controls how
userspace writes into /dev/kmsg. It has three options:

* ratelimit - ratelimit logging from userspace.
* on - unlimited logging from userspace
* off - logging from userspace gets ignored

The default setting is to ratelimit the messages written to it.

It additionally does not limit logging to /dev/kmsg while the system is
booting if we haven't disabled it on the command line.

This patch is based on previous patches from Linus and Steven.

In addition, we can control the logging from a lower priority
sysctl interface - kernel.printk_devkmsg={0,1,2} - with numeric values
corresponding to the options above.

That interface will succeed only if printk.devkmsg *hasn't* been supplied
on the command line. If it has, then printk.devkmsg is a one-time setting
which remains for the duration of the system lifetime.

Signed-off-by: Borislav Petkov <b...@suse.de>
Cc: Andrew Morton <ak...@linux-foundation.org>
Cc: Franck Bui <fb...@suse.com>
Cc: Greg Kroah-Hartman <gre...@linuxfoundation.org>
Cc: Ingo Molnar <mi...@kernel.org>
Cc: Linus Torvalds <torv...@linux-foundation.org>
Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Steven Rostedt <ros...@goodmis.org>
Cc: Uwe Kleine-König <u.klein...@pengutronix.de>
---
Documentation/kernel-parameters.txt | 6 +++
Documentation/sysctl/kernel.txt | 14 +++++++
include/linux/printk.h | 7 ++++
kernel/printk/printk.c | 83 +++++++++++++++++++++++++++++++++----
kernel/sysctl.c | 9 ++++
5 files changed, 111 insertions(+), 8 deletions(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 82b42c958d1c..0b1fea56dd49 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -3150,6 +3150,12 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
Format: <bool> (1/Y/y=enable, 0/N/n=disable)
default: disabled

+ printk.devkmsg={on,off}
+ Control writing to /dev/kmsg.
+ on - unlimited logging to /dev/kmsg from userspace
+ off - logging to /dev/kmsg disabled
+ Default: ratelimited logging.
+
printk.time= Show timing data prefixed to each printk message line
Format: <bool> (1/Y/y=enable, 0/N/n=disable)

diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt
index a3683ce2a2f3..dec84d90061c 100644
--- a/Documentation/sysctl/kernel.txt
+++ b/Documentation/sysctl/kernel.txt
@@ -752,6 +752,20 @@ send before ratelimiting kicks in.

==============================================================

+printk_devkmsg:
+
+Control the logging to /dev/kmsg from userspace:
+
+0: default, ratelimited
+1: unlimited logging to /dev/kmsg from userspace
+2: logging to /dev/kmsg disabled
+
+The kernel command line parameter printk.devkmsg= overrides this and is
+a one-time setting until next reboot: once set, it cannot be changed by
+this sysctl interface anymore.
+
+==============================================================
+
randomize_va_space:

This option can be used to select the type of process address
diff --git a/include/linux/printk.h b/include/linux/printk.h
index f4da695fd615..e6bb50751504 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -171,6 +171,13 @@ extern bool printk_timed_ratelimit(unsigned long *caller_jiffies,
extern int printk_delay_msec;
extern int dmesg_restrict;
extern int kptr_restrict;
+extern unsigned int devkmsg_log;
+
+struct ctl_table;
+
+extern int
+devkmsg_sysctl_set_loglvl(struct ctl_table *table, int write, void __user *buf,
+ size_t *lenp, loff_t *ppos);

extern void wake_up_klogd(void);

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 60cdf6386763..9bc1d037340c 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -86,6 +86,55 @@ static struct lockdep_map console_lock_dep_map = {
};
#endif

+enum devkmsg_log_bits {
+ __DEVKMSG_LOG_BIT_ON = 0,
+ __DEVKMSG_LOG_BIT_OFF,
+ __DEVKMSG_LOG_BIT_LOCK,
+};
+
+enum devkmsg_log_masks {
+ DEVKMSG_LOG_MASK_ON = BIT(__DEVKMSG_LOG_BIT_ON),
+ DEVKMSG_LOG_MASK_OFF = BIT(__DEVKMSG_LOG_BIT_OFF),
+ DEVKMSG_LOG_MASK_LOCK = BIT(__DEVKMSG_LOG_BIT_LOCK),
+};
+
+/* Keep both the 'on' and 'off' bits clear, i.e. ratelimit by default: */
+#define DEVKMSG_LOG_MASK_DEFAULT 0
+
+unsigned int __read_mostly devkmsg_log = DEVKMSG_LOG_MASK_DEFAULT;
+static int __init control_devkmsg(char *str)
+{
+ if (!str)
+ return -EINVAL;
+
+ if (!strncmp(str, "on", 2))
+ devkmsg_log = DEVKMSG_LOG_MASK_ON;
+ else if (!strncmp(str, "off", 3))
+ devkmsg_log = DEVKMSG_LOG_MASK_OFF;
+ else if (!strncmp(str, "ratelimit", 9))
+ devkmsg_log = DEVKMSG_LOG_MASK_DEFAULT;
+ else
+ return -EINVAL;
+
+ /* Sysctl cannot change it anymore. */
+ devkmsg_log |= DEVKMSG_LOG_MASK_LOCK;
+
+ return 0;
+}
+__setup("printk.devkmsg=", control_devkmsg);
+
+
+int devkmsg_sysctl_set_loglvl(struct ctl_table *table, int write,
+ void __user *buffer, size_t *lenp, loff_t *ppos)
+{
+ if (devkmsg_log & DEVKMSG_LOG_MASK_LOCK) {
+ if (write)
+ return -EINVAL;
+ }
+
+ return proc_dointvec_minmax(table, write, buffer, lenp, ppos);
+}
+
/*
* Number of registered extended console drivers.
*
@@ -614,6 +663,7 @@ struct devkmsg_user {
u64 seq;
u32 idx;
enum log_flags prev;
+ struct ratelimit_state rs;
struct mutex lock;
char buf[CONSOLE_EXT_LOG_MAX];
};
@@ -623,11 +673,24 @@ static ssize_t devkmsg_write(struct kiocb *iocb, struct iov_iter *from)
char *buf, *line;
int level = default_message_loglevel;
int facility = 1; /* LOG_USER */
+ struct file *file = iocb->ki_filp;
+ struct devkmsg_user *user = file->private_data;
size_t len = iov_iter_count(from);
ssize_t ret = len;

- if (len > LOG_LINE_MAX)
+ if (!user || len > LOG_LINE_MAX)
return -EINVAL;
+
+ /* Ignore when user logging is disabled. */
+ if (devkmsg_log & DEVKMSG_LOG_MASK_OFF)
+ return len;
+
+ /* Ratelimit when not explicitly enabled or when we're not booting. */
+ if ((system_state != SYSTEM_BOOTING) && !(devkmsg_log & DEVKMSG_LOG_MASK_ON)) {
+ if (!___ratelimit(&user->rs, current->comm))
+ return ret;
+ }
+
buf = kmalloc(len+1, GFP_KERNEL);
if (buf == NULL)
return -ENOMEM;
@@ -801,18 +864,20 @@ static int devkmsg_open(struct inode *inode, struct file *file)
int err;

/* write-only does not need any file context */
- if ((file->f_flags & O_ACCMODE) == O_WRONLY)
- return 0;
-
- err = check_syslog_permissions(SYSLOG_ACTION_READ_ALL,
- SYSLOG_FROM_READER);
- if (err)
- return err;
+ if ((file->f_flags & O_ACCMODE) != O_WRONLY) {
+ err = check_syslog_permissions(SYSLOG_ACTION_READ_ALL,
+ SYSLOG_FROM_READER);
+ if (err)
+ return err;
+ }

user = kmalloc(sizeof(struct devkmsg_user), GFP_KERNEL);
if (!user)
return -ENOMEM;

+ ratelimit_default_init(&user->rs);
+ ratelimit_set_flags(&user->rs, RATELIMIT_MSG_ON_RELEASE);
+
mutex_init(&user->lock);

raw_spin_lock_irq(&logbuf_lock);
@@ -831,6 +896,8 @@ static int devkmsg_release(struct inode *inode, struct file *file)
if (!user)
return 0;

+ ratelimit_state_exit(&user->rs);
+
mutex_destroy(&user->lock);
kfree(user);
return 0;
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 87b2fc38398b..013d5fe0636a 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -814,6 +814,15 @@ static struct ctl_table kern_table[] = {
.extra2 = &ten_thousand,
},
{
+ .procname = "printk_devkmsg",
+ .data = &devkmsg_log,
+ .maxlen = sizeof(unsigned int),
+ .mode = 0644,
+ .proc_handler = devkmsg_sysctl_set_loglvl,
+ .extra1 = &zero,
+ .extra2 = &two,
+ },
+ {
.procname = "dmesg_restrict",
.data = &dmesg_restrict,
.maxlen = sizeof(int),
--
2.7.3

Steven Rostedt

unread,
Jul 5, 2016, 2:27:20 PM7/5/16
to Borislav Petkov, LKML, Andrew Morton, Franck Bui, Greg Kroah-Hartman, Ingo Molnar, Linus Torvalds, Peter Zijlstra, Uwe Kleine-König
Is the comm important? Maybe add the function that called it?

"%pS", _THIS_IP_

Perhaps add __always_inline, as _THIS_IP_ will point into the function
that calls this?

-- Steve

Borislav Petkov

unread,
Jul 5, 2016, 2:45:36 PM7/5/16
to Steven Rostedt, LKML, Andrew Morton, Franck Bui, Greg Kroah-Hartman, Ingo Molnar, Linus Torvalds, Peter Zijlstra, Uwe Kleine-König
On Tue, Jul 05, 2016 at 02:26:48PM -0400, Steven Rostedt wrote:
> > + if (rs->missed)
> > + printk(KERN_WARNING "%s: %d output lines suppressed due to ratelimiting\n",
> > + current->comm, rs->missed);
>
> Is the comm important?

Yes, we wanna dump the task name which called devkmsg_release().

> Maybe add the function that called it?
>
> "%pS", _THIS_IP_
>
> Perhaps add __always_inline, as _THIS_IP_ will point into the function
> that calls this?

That would inadvertently be devkmsg_release() in this use case. But
we want to dump the task name which opened and spat so much crap into
/dev/kmsg so as to cause the ratelimiting to hit.

--
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.

Steven Rostedt

unread,
Jul 5, 2016, 2:57:50 PM7/5/16
to Borislav Petkov, LKML, Andrew Morton, Franck Bui, Greg Kroah-Hartman, Ingo Molnar, Linus Torvalds, Peter Zijlstra, Uwe Kleine-König
On Tue, 5 Jul 2016 20:45:17 +0200
Borislav Petkov <b...@alien8.de> wrote:

> On Tue, Jul 05, 2016 at 02:26:48PM -0400, Steven Rostedt wrote:
> > > + if (rs->missed)
> > > + printk(KERN_WARNING "%s: %d output lines suppressed due to ratelimiting\n",
> > > + current->comm, rs->missed);
> >
> > Is the comm important?
>
> Yes, we wanna dump the task name which called devkmsg_release().
>
> > Maybe add the function that called it?
> >
> > "%pS", _THIS_IP_
> >
> > Perhaps add __always_inline, as _THIS_IP_ will point into the function
> > that calls this?
>
> That would inadvertently be devkmsg_release() in this use case. But
> we want to dump the task name which opened and spat so much crap into
> /dev/kmsg so as to cause the ratelimiting to hit.
>

Perhaps we should show both, unless you don't think this will ever be
used by anything other than devkmsg?

-- Steve

Borislav Petkov

unread,
Jul 5, 2016, 3:43:03 PM7/5/16
to Steven Rostedt, LKML, Andrew Morton, Franck Bui, Greg Kroah-Hartman, Ingo Molnar, Linus Torvalds, Peter Zijlstra, Uwe Kleine-König
On Tue, Jul 05, 2016 at 02:57:32PM -0400, Steven Rostedt wrote:
> Perhaps we should show both, unless you don't think this will ever be
> used by anything other than devkmsg?

I'd say let's do it only when we go down that road and start using it
for something else.

Because, for example, the ratelimiting thing is, in fact, generic but it
is used primarily to ratelimit printks. Even though it could be used for
something else, theoretically...

Steven Rostedt

unread,
Jul 5, 2016, 3:50:19 PM7/5/16
to Borislav Petkov, LKML, Andrew Morton, Franck Bui, Greg Kroah-Hartman, Ingo Molnar, Linus Torvalds, Peter Zijlstra, Uwe Kleine-König
On Tue, 5 Jul 2016 21:42:45 +0200
Borislav Petkov <b...@alien8.de> wrote:

> On Tue, Jul 05, 2016 at 02:57:32PM -0400, Steven Rostedt wrote:
> > Perhaps we should show both, unless you don't think this will ever be
> > used by anything other than devkmsg?
>
> I'd say let's do it only when we go down that road and start using it
> for something else.
>
> Because, for example, the ratelimiting thing is, in fact, generic but it
> is used primarily to ratelimit printks. Even though it could be used for
> something else, theoretically...
>

But you know... Build it and they will come.

OK, I'm fine then.

-- Steve

Joe Perches

unread,
Jul 5, 2016, 4:09:06 PM7/5/16
to Steven Rostedt, Borislav Petkov, Paolo Bonzini, Radim Krčmář, Christian Borntraeger, Cornelia Huck, LKML, Andrew Morton, Franck Bui, Greg Kroah-Hartman, Ingo Molnar, Linus Torvalds, Peter Zijlstra, Uwe Kleine-König
arch/s390/kvm/kvm-s390.c:       ratelimit_state_init(&kvm->arch.sthyi_limit, 5 * HZ, 500);

As far as I know, _ratelimit is used for non-printk purposes in
arch/s390/kvm/sthyi.c

int handle_sthyi(struct kvm_vcpu *vcpu)
{
int reg1, reg2, r = 0;
u64 code, addr, cc = 0;
struct sthyi_sctns *sctns = NULL;

/*
* STHYI requires extensive locking in the higher hypervisors
* and is very computational/memory expensive. Therefore we
* ratelimit the executions per VM.
*/
if (!__ratelimit(&vcpu->kvm->arch.sthyi_limit)) {
kvm_s390_retry_instr(vcpu);
return 0;
}



Christian Borntraeger

unread,
Jul 5, 2016, 4:54:13 PM7/5/16
to Joe Perches, Steven Rostedt, Borislav Petkov, Paolo Bonzini, Radim Krčmář, Cornelia Huck, LKML, Andrew Morton, Franck Bui, Greg Kroah-Hartman, Ingo Molnar, Linus Torvalds, Peter Zijlstra, Uwe Kleine-König, Janosch Frank
Yes, this is new in next. As far as I can see, the new message would only
appear if we would call ratelimit_state_exit. Correct? We do not call this -
I assume this is ok?

We really only want to reuse the rate limit base code (to avoid writing the same
code twice) and being in lib indicated that this can indeed be used outside
printk.
Now: your patch 1 would allow me to get rid of the messages completely
by setting the flag and by not calling ratelimit_state_exit. Which is probably
what we should do in our code.


Christian


Paolo Bonzini

unread,
Jul 5, 2016, 5:15:46 PM7/5/16
to Christian Borntraeger, Joe Perches, Steven Rostedt, Borislav Petkov, Radim Krčmář, Cornelia Huck, LKML, Andrew Morton, Franck Bui, Greg Kroah-Hartman, Ingo Molnar, Linus Torvalds, Peter Zijlstra, Uwe Kleine-König, Janosch Frank


On 05/07/2016 22:53, Christian Borntraeger wrote:
> Yes, this is new in next. As far as I can see, the new message would only
> appear if we would call ratelimit_state_exit. Correct? We do not call this -
> I assume this is ok?
>
> We really only want to reuse the rate limit base code (to avoid writing the same
> code twice) and being in lib indicated that this can indeed be used outside
> printk.
> Now: your patch 1 would allow me to get rid of the messages completely
> by setting the flag and by not calling ratelimit_state_exit. Which is probably
> what we should do in our code.

Can we delay fixing this after the code is merged in Linus's tree?

Paolo

Christian Borntraeger

unread,
Jul 5, 2016, 5:23:30 PM7/5/16
to Paolo Bonzini, Joe Perches, Steven Rostedt, Borislav Petkov, Radim Krčmář, Cornelia Huck, LKML, Andrew Morton, Franck Bui, Greg Kroah-Hartman, Ingo Molnar, Linus Torvalds, Peter Zijlstra, Uwe Kleine-König, Janosch Frank
Absolutely. We already have 2 smaller conflicts in next and I certainly do not want to add
another one.

The current ratelimit print does not hurt - it is just not necessary for us.
so my statement was just a "statement of direction" to write some IBM speak ;-)




Borislav Petkov

unread,
Jul 5, 2016, 5:32:07 PM7/5/16
to Christian Borntraeger, Joe Perches, Steven Rostedt, Paolo Bonzini, Radim Krčmář, Cornelia Huck, LKML, Andrew Morton, Franck Bui, Greg Kroah-Hartman, Ingo Molnar, Linus Torvalds, Peter Zijlstra, Uwe Kleine-König, Janosch Frank
On Tue, Jul 05, 2016 at 10:53:55PM +0200, Christian Borntraeger wrote:
> Yes, this is new in next. As far as I can see, the new message would only
> appear if we would call ratelimit_state_exit. Correct? We do not call this -
> I assume this is ok?

Right, the idea for the /dev/kmsg use case was to issue the suppressed
count only when we release the ratelimit state.

> We really only want to reuse the rate limit base code (to avoid writing the same
> code twice) and being in lib indicated that this can indeed be used outside
> printk.
> Now: your patch 1 would allow me to get rid of the messages completely
> by setting the flag and by not calling ratelimit_state_exit. Which is probably
> what we should do in our code.

Yeah, that should work for your usecase.

Steven Rostedt

unread,
Jul 5, 2016, 5:48:02 PM7/5/16
to Borislav Petkov, LKML, Andrew Morton, Franck Bui, Greg Kroah-Hartman, Ingo Molnar, Linus Torvalds, Peter Zijlstra, Uwe Kleine-König
On Mon, 4 Jul 2016 16:24:52 +0200
Borislav Petkov <b...@alien8.de> wrote:

> @@ -614,6 +663,7 @@ struct devkmsg_user {
> u64 seq;
> u32 idx;
> enum log_flags prev;
> + struct ratelimit_state rs;
> struct mutex lock;
> char buf[CONSOLE_EXT_LOG_MAX];
> };
> @@ -623,11 +673,24 @@ static ssize_t devkmsg_write(struct kiocb *iocb, struct iov_iter *from)
> char *buf, *line;
> int level = default_message_loglevel;
> int facility = 1; /* LOG_USER */
> + struct file *file = iocb->ki_filp;
> + struct devkmsg_user *user = file->private_data;
> size_t len = iov_iter_count(from);
> ssize_t ret = len;
>
> - if (len > LOG_LINE_MAX)
> + if (!user || len > LOG_LINE_MAX)
> return -EINVAL;
> +
> + /* Ignore when user logging is disabled. */
> + if (devkmsg_log & DEVKMSG_LOG_MASK_OFF)
> + return len;

I wonder if we should return some sort of error message here? ENODEV?
Hmm, there's no error message when it is disabled? I'm not sure that is
what we want. I specifically had the return be an error on open if it
was disabled, because (surprisingly) systemd does the right thing and
uses another utility for syslogging.

If you silently fail here, then we lose all logging because systemd
thinks this is working when it is not. That's not what I want.

-- Steve

Borislav Petkov

unread,
Jul 5, 2016, 6:02:42 PM7/5/16
to Steven Rostedt, LKML, Andrew Morton, Franck Bui, Greg Kroah-Hartman, Ingo Molnar, Linus Torvalds, Peter Zijlstra, Uwe Kleine-König
On Tue, Jul 05, 2016 at 05:47:49PM -0400, Steven Rostedt wrote:
> I wonder if we should return some sort of error message here? ENODEV?

What for?

We're silently ignoring it. If we start returning an error here, we
might break doofus if it checks the write retval.

And it's not like we care - we're ignoring all writes whatsoever.

> If you silently fail here, then we lose all logging because systemd
> thinks this is working when it is not. That's not what I want.

Hmm, ok, you're making sense to me.

Do you want an error message too or only an -ENODEV returned or
somesuch?

Steven Rostedt

unread,
Jul 5, 2016, 6:09:07 PM7/5/16
to Borislav Petkov, LKML, Andrew Morton, Franck Bui, Greg Kroah-Hartman, Ingo Molnar, Linus Torvalds, Peter Zijlstra, Uwe Kleine-König
On Wed, 6 Jul 2016 00:02:21 +0200
Borislav Petkov <b...@alien8.de> wrote:

> > If you silently fail here, then we lose all logging because systemd
> > thinks this is working when it is not. That's not what I want.
>
> Hmm, ok, you're making sense to me.
>
> Do you want an error message too or only an -ENODEV returned or
> somesuch?
>

My patch returned -EPERM and it works on my box. That is, systemd finds
something else to log to.

-- Steve

Borislav Petkov

unread,
Jul 6, 2016, 9:28:24 AM7/6/16
to LKML, Andrew Morton, Franck Bui, Greg Kroah-Hartman, Ingo Molnar, Linus Torvalds, Peter Zijlstra, Steven Rostedt, Uwe Kleine-König
From: Borislav Petkov <b...@suse.de>

Extend the ratelimiting facility to print the amount of suppressed lines
when it is being released.

Separated from a previous patch by Linus.

Also, make the ON_RELEASE image not use "callbacks" as it is misleading.

Signed-off-by: Borislav Petkov <b...@suse.de>
Cc: Andrew Morton <ak...@linux-foundation.org>
Cc: Franck Bui <fb...@suse.com>
Cc: Greg Kroah-Hartman <gre...@linuxfoundation.org>
Cc: Ingo Molnar <mi...@kernel.org>
Cc: Linus Torvalds <torv...@linux-foundation.org>
Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Steven Rostedt <ros...@goodmis.org>
Cc: Uwe Kleine-König <u.klein...@pengutronix.de>
---

v3.1: Simplify testing of !RATELIMIT_MSG_ON_RELEASE in ___ratelimit()

include/linux/ratelimit.h | 36 +++++++++++++++++++++++++++++++-----
lib/ratelimit.c | 15 +++++++++------
2 files changed, 40 insertions(+), 11 deletions(-)
index 2c5de86460c5..734cba5c8842 100644
--- a/lib/ratelimit.c
+++ b/lib/ratelimit.c
@@ -46,12 +46,15 @@ int ___ratelimit(struct ratelimit_state *rs, const char *func)
rs->begin = jiffies;

if (time_is_before_jiffies(rs->begin + rs->interval)) {
- if (rs->missed)
- printk(KERN_WARNING "%s: %d callbacks suppressed\n",
- func, rs->missed);
- rs->begin = jiffies;
- rs->printed = 0;
- rs->missed = 0;
+ if (rs->missed) {
+ if (!(rs->flags & RATELIMIT_MSG_ON_RELEASE)) {
+ pr_warn("%s: %d callbacks suppressed\n", func, rs->missed);
+ rs->missed = 0;
+ }
+
+ rs->begin = jiffies;
+ rs->printed = 0;
+ }
}
if (rs->burst && rs->burst > rs->printed) {
rs->printed++;
--
2.7.3

Borislav Petkov

unread,
Jul 6, 2016, 9:29:46 AM7/6/16
to LKML, Andrew Morton, Franck Bui, Greg Kroah-Hartman, Ingo Molnar, Linus Torvalds, Peter Zijlstra, Steven Rostedt, Uwe Kleine-König
v3.1: Return error (-EPERM) on open when devkmsg logging disabled, as
Steve requested.

Documentation/kernel-parameters.txt | 6 +++
Documentation/sysctl/kernel.txt | 14 ++++++
include/linux/printk.h | 7 +++
kernel/printk/printk.c | 86 +++++++++++++++++++++++++++++++++----
kernel/sysctl.c | 9 ++++
5 files changed, 114 insertions(+), 8 deletions(-)
index 60cdf6386763..1e71369780f0 100644
@@ -800,19 +863,24 @@ static int devkmsg_open(struct inode *inode, struct file *file)
struct devkmsg_user *user;
int err;

- /* write-only does not need any file context */
- if ((file->f_flags & O_ACCMODE) == O_WRONLY)
- return 0;
+ if (devkmsg_log & DEVKMSG_LOG_MASK_OFF)
+ return -EPERM;

- err = check_syslog_permissions(SYSLOG_ACTION_READ_ALL,
- SYSLOG_FROM_READER);
- if (err)
- return err;
+ /* write-only does not need any file context */
+ if ((file->f_flags & O_ACCMODE) != O_WRONLY) {
+ err = check_syslog_permissions(SYSLOG_ACTION_READ_ALL,
+ SYSLOG_FROM_READER);
+ if (err)
+ return err;
+ }

user = kmalloc(sizeof(struct devkmsg_user), GFP_KERNEL);
if (!user)
return -ENOMEM;

+ ratelimit_default_init(&user->rs);
+ ratelimit_set_flags(&user->rs, RATELIMIT_MSG_ON_RELEASE);
+
mutex_init(&user->lock);

raw_spin_lock_irq(&logbuf_lock);
@@ -831,6 +899,8 @@ static int devkmsg_release(struct inode *inode, struct file *file)

Steven Rostedt

unread,
Jul 6, 2016, 9:40:44 AM7/6/16
to Borislav Petkov, LKML, Andrew Morton, Franck Bui, Greg Kroah-Hartman, Ingo Molnar, Linus Torvalds, Peter Zijlstra, Uwe Kleine-König
Hmm, the above changes the previous logic. That seems wrong.

Before:

if (rs->missed)
[...]
rs->begin = jiffies;
rs->printed = 0;

after:

if (rs->missed) {
rs->begin = jiffies;
rs->printed = 0;
}

I think you wanted:

if (rs->missed) {
if (!(rs->flags & RATELIMIT_MSG_ON_RELEASE)) {
[..]
rs->missed = 0;
}
}
rs->begin = jiffies;
rs->printed = 0;

-- Steve

Joe Perches

unread,
Jul 6, 2016, 10:50:53 AM7/6/16
to Borislav Petkov, LKML, Andrew Morton, Franck Bui, Greg Kroah-Hartman, Ingo Molnar, Linus Torvalds, Peter Zijlstra, Steven Rostedt, Uwe Kleine-König
On Wed, 2016-07-06 at 15:28 +0200, Borislav Petkov wrote:
> Extend the ratelimiting facility to print the amount of suppressed lines
> when it is being released.
[]
> diff --git a/include/linux/ratelimit.h b/include/linux/ratelimit.h
[]
> +static inline void ratelimit_state_exit(struct ratelimit_state *rs)
> +{
> + if (!(rs->flags & RATELIMIT_MSG_ON_RELEASE))
> + return;
> +
> + if (rs->missed)
> + printk(KERN_WARNING "%s: %d output lines suppressed due to ratelimiting\n",
> +        current->comm, rs->missed);

Please use pr_warn as it will use whatever
pr_fmt prefix is specified by the subsystem.

Maybe:
if (rs->missed)
pr_warn("ratelimit: %s: %d output lines suppressed\n",
current->comm, rs->missed);

to be more similar to the callback suppressed message in
lib/ratelimit.c

Borislav Petkov

unread,
Jul 6, 2016, 10:59:50 AM7/6/16
to Steven Rostedt, LKML, Andrew Morton, Franck Bui, Greg Kroah-Hartman, Ingo Molnar, Linus Torvalds, Peter Zijlstra, Uwe Kleine-König
On Wed, Jul 06, 2016 at 09:40:34AM -0400, Steven Rostedt wrote:
> if (rs->missed) {
> rs->begin = jiffies;
> rs->printed = 0;
> }

This is what I get when I'm trying to juggle 10 things at the same time
:-\

Ok, here's v3.2. Thanks dude!

---
From: Borislav Petkov <b...@suse.de>
Date: Tue, 14 Jun 2016 11:51:04 +0200
Subject: [PATCH -v3.2 1/2] ratelimit: Extend to print suppressed messages on release
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Extend the ratelimiting facility to print the amount of suppressed lines
when it is being released.

Separated from a previous patch by Linus.

Also, make the ON_RELEASE image not use "callbacks" as it is misleading.

Signed-off-by: Borislav Petkov <b...@suse.de>
Cc: Andrew Morton <ak...@linux-foundation.org>
Cc: Franck Bui <fb...@suse.com>
Cc: Greg Kroah-Hartman <gre...@linuxfoundation.org>
Cc: Ingo Molnar <mi...@kernel.org>
Cc: Linus Torvalds <torv...@linux-foundation.org>
Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Steven Rostedt <ros...@goodmis.org>
Cc: Uwe Kleine-König <u.klein...@pengutronix.de>
---

v3.2: Simplify testing of !RATELIMIT_MSG_ON_RELEASE in ___ratelimit() correctly!

include/linux/ratelimit.h | 36 +++++++++++++++++++++++++++++++-----
lib/ratelimit.c | 10 ++++++----
2 files changed, 37 insertions(+), 9 deletions(-)
- rs->printed = 0;
- rs->missed = 0;
- rs->begin = 0;
+ rs->interval = interval;
+ rs->burst = burst;
+}
+
+static inline void ratelimit_default_init(struct ratelimit_state *rs)
+{
+ return ratelimit_state_init(rs, DEFAULT_RATELIMIT_INTERVAL,
+ DEFAULT_RATELIMIT_BURST);
+}
+
+static inline void ratelimit_state_exit(struct ratelimit_state *rs)
+{
+ if (!(rs->flags & RATELIMIT_MSG_ON_RELEASE))
+ return;
+
+ if (rs->missed)
+ printk(KERN_WARNING "%s: %d output lines suppressed due to ratelimiting\n",
+ current->comm, rs->missed);
+}
+
+static inline void
+ratelimit_set_flags(struct ratelimit_state *rs, unsigned long flags)
+{
+ rs->flags = flags;
}

extern struct ratelimit_state printk_ratelimit_state;
diff --git a/lib/ratelimit.c b/lib/ratelimit.c
index 2c5de86460c5..08f8043cac61 100644
--- a/lib/ratelimit.c
+++ b/lib/ratelimit.c
@@ -46,12 +46,14 @@ int ___ratelimit(struct ratelimit_state *rs, const char *func)
rs->begin = jiffies;

if (time_is_before_jiffies(rs->begin + rs->interval)) {
- if (rs->missed)
- printk(KERN_WARNING "%s: %d callbacks suppressed\n",
- func, rs->missed);
+ if (rs->missed) {
+ if (!(rs->flags & RATELIMIT_MSG_ON_RELEASE)) {
+ pr_warn("%s: %d callbacks suppressed\n", func, rs->missed);
+ rs->missed = 0;
+ }
+ }
rs->begin = jiffies;
rs->printed = 0;
- rs->missed = 0;
}
if (rs->burst && rs->burst > rs->printed) {
rs->printed++;

Steven Rostedt

unread,
Jul 6, 2016, 1:52:22 PM7/6/16
to Borislav Petkov, LKML, Andrew Morton, Franck Bui, Greg Kroah-Hartman, Ingo Molnar, Linus Torvalds, Peter Zijlstra, Uwe Kleine-König
Hmm, this does nothing to stop user space from doing the following:

while :; do echo '5,0,0,-;hello' > /dev/kmsg; done

But at least it's a start.

-- Steve

Borislav Petkov

unread,
Jul 6, 2016, 2:33:25 PM7/6/16
to Steven Rostedt, LKML, Andrew Morton, Franck Bui, Greg Kroah-Hartman, Ingo Molnar, Linus Torvalds, Peter Zijlstra, Uwe Kleine-König
On Wed, Jul 06, 2016 at 01:52:11PM -0400, Steven Rostedt wrote:
> Hmm, this does nothing to stop user space from doing the following:
>
> while :; do echo '5,0,0,-;hello' > /dev/kmsg; done

I wouldn't worry too much if /dev/kmsg has root:root perms elsewhere
too, like it is the case in my guest here:

crw-r--r-- 1 root root 1, 11 Jul 6 22:28 /dev/kmsg

root can shoot herself in the foot in countless ways anyway.

Steven Rostedt

unread,
Jul 6, 2016, 9:18:31 PM7/6/16
to Borislav Petkov, LKML, Andrew Morton, Franck Bui, Greg Kroah-Hartman, Ingo Molnar, Linus Torvalds, Peter Zijlstra, Uwe Kleine-König
On Wed, 6 Jul 2016 16:59:29 +0200
Borislav Petkov <b...@alien8.de> wrote:

> +static inline void ratelimit_state_exit(struct ratelimit_state *rs)
> +{
> + if (!(rs->flags & RATELIMIT_MSG_ON_RELEASE))
> + return;
> +
> + if (rs->missed)
> + printk(KERN_WARNING "%s: %d output lines suppressed due to ratelimiting\n",
> + current->comm, rs->missed);

Hmm, should this clear the missed flag? Especially since it isn't
cleared below.

-- Steve

Borislav Petkov

unread,
Jul 7, 2016, 1:37:24 AM7/7/16
to Steven Rostedt, LKML, Andrew Morton, Franck Bui, Greg Kroah-Hartman, Ingo Molnar, Linus Torvalds, Peter Zijlstra, Uwe Kleine-König
On Wed, Jul 06, 2016 at 09:17:52PM -0400, Steven Rostedt wrote:
> Hmm, should this clear the missed flag? Especially since it isn't
> cleared below.

The expectation is that after you call exit on something, you don't need
it anymore. But I know exactly why you're asking for this so I'll do the
change. :-)

Borislav Petkov

unread,
Jul 8, 2016, 5:19:08 AM7/8/16
to Andrew Morton, LKML, Franck Bui, Greg Kroah-Hartman, Ingo Molnar, Linus Torvalds, Peter Zijlstra, Steven Rostedt, Uwe Kleine-König
From: Borislav Petkov <b...@suse.de>

Hi all,

sorry for spamming so quickly again and not waiting for a week before
resubmitting but I believe the stuff is ready for 4.8.

So here's v4 with all the minor review comments addressed.


Changelog:
----------

v3:

here's v3 integrating Ingo's comments. The thing is called
printk.devkmsg= or printk_devkmsg now, depending on cmdline option or
sysctl.


v2:

here's v2 with the requested sysctl option kernel.printk_kmsg and
locking of the setting when printk.kmsg= is supplied on the command
line.

Patch 1 is unchanged.

Patch 2 has grown the sysctl addition.

v1:

Rostedt is busy so I took Linus' old patch and Steven's last v2 and
split and extended them with the comments people had on the last thread:

https://lkml.kernel.org/r/20160425145...@gandalf.local.home

I hope, at least.

So it is ratelimiting by default, with "on" and "off" cmdline options. I
called the option somewhat a bit shorter too: "printk.kmsg"

The current use cases of this and of which I'm aware are:

* debug the kernel and thus shut up all interfering input from
userspace, i.e. boot with "printk.kmsg=off"

* debug userspace (and by that I mean systemd) by booting with
"printk.kmsg=on" so that the ratelimiting is disabled and the kernel log
gets all the spew.

Thoughts?

Please queue,
thanks.

Borislav Petkov (2):
ratelimit: Extend to print suppressed messages on release
printk: Add kernel parameter to control writes to /dev/kmsg

Documentation/kernel-parameters.txt | 6 +++
Documentation/sysctl/kernel.txt | 14 ++++++
include/linux/printk.h | 7 +++
include/linux/ratelimit.h | 38 +++++++++++++---
kernel/printk/printk.c | 86 +++++++++++++++++++++++++++++++++----
kernel/sysctl.c | 9 ++++
lib/ratelimit.c | 10 +++--
7 files changed, 153 insertions(+), 17 deletions(-)

--
2.7.3

Borislav Petkov

unread,
Jul 8, 2016, 5:19:25 AM7/8/16
to Andrew Morton, LKML, Franck Bui, Greg Kroah-Hartman, Ingo Molnar, Linus Torvalds, Peter Zijlstra, Steven Rostedt, Uwe Kleine-König
From: Borislav Petkov <b...@suse.de>

Extend the ratelimiting facility to print the amount of suppressed lines
when it is being released.

Separated from a previous patch by Linus.

Also, make the ON_RELEASE image not use "callbacks" as it is misleading.

Signed-off-by: Borislav Petkov <b...@suse.de>
Cc: Andrew Morton <ak...@linux-foundation.org>
Cc: Franck Bui <fb...@suse.com>
Cc: Greg Kroah-Hartman <gre...@linuxfoundation.org>
Cc: Ingo Molnar <mi...@kernel.org>
Cc: Linus Torvalds <torv...@linux-foundation.org>
Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Steven Rostedt <ros...@goodmis.org>
Cc: Uwe Kleine-König <u.klein...@pengutronix.de>
---
include/linux/ratelimit.h | 38 +++++++++++++++++++++++++++++++++-----
lib/ratelimit.c | 10 ++++++----
2 files changed, 39 insertions(+), 9 deletions(-)

diff --git a/include/linux/ratelimit.h b/include/linux/ratelimit.h
index 18102529254e..57c9e0622a38 100644
--- a/include/linux/ratelimit.h
+++ b/include/linux/ratelimit.h
@@ -2,11 +2,15 @@
#define _LINUX_RATELIMIT_H

#include <linux/param.h>
+#include <linux/sched.h>
#include <linux/spinlock.h>

#define DEFAULT_RATELIMIT_INTERVAL (5 * HZ)
#define DEFAULT_RATELIMIT_BURST 10

+/* issue num suppressed message on exit */
+#define RATELIMIT_MSG_ON_RELEASE BIT(0)
+
struct ratelimit_state {
raw_spinlock_t lock; /* protect the state */

@@ -15,6 +19,7 @@ struct ratelimit_state {
int printed;
int missed;
unsigned long begin;
+ unsigned long flags;
};

#define RATELIMIT_STATE_INIT(name, interval_init, burst_init) { \
@@ -34,12 +39,35 @@ struct ratelimit_state {
static inline void ratelimit_state_init(struct ratelimit_state *rs,
int interval, int burst)
{
+ memset(rs, 0, sizeof(*rs));
+
raw_spin_lock_init(&rs->lock);
- rs->interval = interval;
- rs->burst = burst;
- rs->printed = 0;
- rs->missed = 0;
- rs->begin = 0;
+ rs->interval = interval;
+ rs->burst = burst;
+}
+
+static inline void ratelimit_default_init(struct ratelimit_state *rs)
+{
+ return ratelimit_state_init(rs, DEFAULT_RATELIMIT_INTERVAL,
+ DEFAULT_RATELIMIT_BURST);
+}
+
+static inline void ratelimit_state_exit(struct ratelimit_state *rs)
+{
+ if (!(rs->flags & RATELIMIT_MSG_ON_RELEASE))
+ return;
+
+ if (rs->missed) {
+ pr_warn("%s: %d output lines suppressed due to ratelimiting\n",
+ current->comm, rs->missed);
--
2.7.3

Borislav Petkov

unread,
Jul 8, 2016, 5:19:31 AM7/8/16
to Andrew Morton, LKML, Franck Bui, Greg Kroah-Hartman, Ingo Molnar, Linus Torvalds, Peter Zijlstra, Steven Rostedt, Uwe Kleine-König
From: Borislav Petkov <b...@suse.de>

Add a "printk.devkmsg" kernel command line parameter which controls how
userspace writes into /dev/kmsg. It has three options:

* ratelimit - ratelimit logging from userspace.
* on - unlimited logging from userspace
* off - logging from userspace gets ignored

The default setting is to ratelimit the messages written to it.

It additionally does not limit logging to /dev/kmsg while the system is
booting if we haven't disabled it on the command line.

This patch is based on previous patches from Linus and Steven.

In addition, we can control the logging from a lower priority
sysctl interface - kernel.printk_devkmsg={0,1,2} - with numeric values
corresponding to the options above.

That interface will succeed only if printk.devkmsg *hasn't* been supplied
on the command line. If it has, then printk.devkmsg is a one-time setting
which remains for the duration of the system lifetime.

Signed-off-by: Borislav Petkov <b...@suse.de>
Cc: Andrew Morton <ak...@linux-foundation.org>
Cc: Franck Bui <fb...@suse.com>
Cc: Greg Kroah-Hartman <gre...@linuxfoundation.org>
Cc: Ingo Molnar <mi...@kernel.org>
Cc: Linus Torvalds <torv...@linux-foundation.org>
Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Steven Rostedt <ros...@goodmis.org>
Cc: Uwe Kleine-König <u.klein...@pengutronix.de>
---
Documentation/kernel-parameters.txt | 6 +++
Documentation/sysctl/kernel.txt | 14 ++++++
include/linux/printk.h | 7 +++
kernel/printk/printk.c | 86 +++++++++++++++++++++++++++++++++----
kernel/sysctl.c | 9 ++++
+ if (devkmsg_log & DEVKMSG_LOG_MASK_OFF)
+ return len;
+
+ /* Ratelimit when not explicitly enabled or when we're not booting. */
+ if ((system_state != SYSTEM_BOOTING) && !(devkmsg_log & DEVKMSG_LOG_MASK_ON)) {
+ if (!___ratelimit(&user->rs, current->comm))
+ return ret;
+ }
+
buf = kmalloc(len+1, GFP_KERNEL);
if (buf == NULL)
return -ENOMEM;

Andrew Morton

unread,
Jul 14, 2016, 4:23:40 PM7/14/16
to Borislav Petkov, LKML, Franck Bui, Greg Kroah-Hartman, Ingo Molnar, Linus Torvalds, Peter Zijlstra, Steven Rostedt, Uwe Kleine-König
On Fri, 8 Jul 2016 11:18:50 +0200 Borislav Petkov <b...@alien8.de> wrote:

> From: Borislav Petkov <b...@suse.de>
>
> Add a "printk.devkmsg" kernel command line parameter which controls how
> userspace writes into /dev/kmsg. It has three options:
>
> * ratelimit - ratelimit logging from userspace.
> * on - unlimited logging from userspace
> * off - logging from userspace gets ignored
>
> The default setting is to ratelimit the messages written to it.

Which changes current kernel behaviour. Can we please get some
discussion into this changelog which justifies that decision? What's
the reasoning and how do we know it won't break existing stuff?

> It additionally does not limit logging to /dev/kmsg while the system is
> booting if we haven't disabled it on the command line.
>
> This patch is based on previous patches from Linus and Steven.
>
> In addition, we can control the logging from a lower priority
> sysctl interface - kernel.printk_devkmsg={0,1,2} - with numeric values
> corresponding to the options above.
>
> That interface will succeed only if printk.devkmsg *hasn't* been supplied
> on the command line. If it has, then printk.devkmsg is a one-time setting
> which remains for the duration of the system lifetime.

Please also include a description of the reasoning behind this design
decision.

> Signed-off-by: Borislav Petkov <b...@suse.de>
> Cc: Andrew Morton <ak...@linux-foundation.org>
> Cc: Franck Bui <fb...@suse.com>
> Cc: Greg Kroah-Hartman <gre...@linuxfoundation.org>
> Cc: Ingo Molnar <mi...@kernel.org>
> Cc: Linus Torvalds <torv...@linux-foundation.org>
> Cc: Peter Zijlstra <pet...@infradead.org>
> Cc: Steven Rostedt <ros...@goodmis.org>
> Cc: Uwe Kleine-K__nig <u.klein...@pengutronix.de>

Did nobody ack any of this?

> Documentation/kernel-parameters.txt | 6 +++
> Documentation/sysctl/kernel.txt | 14 ++++++
> include/linux/printk.h | 7 +++
> kernel/printk/printk.c | 86 +++++++++++++++++++++++++++++++++----
> kernel/sysctl.c | 9 ++++
> 5 files changed, 114 insertions(+), 8 deletions(-)
>
> diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
> index 82b42c958d1c..0b1fea56dd49 100644
> --- a/Documentation/kernel-parameters.txt
> +++ b/Documentation/kernel-parameters.txt
> @@ -3150,6 +3150,12 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
> Format: <bool> (1/Y/y=enable, 0/N/n=disable)
> default: disabled
>
> + printk.devkmsg={on,off}
> + Control writing to /dev/kmsg.
> + on - unlimited logging to /dev/kmsg from userspace
> + off - logging to /dev/kmsg disabled
> + Default: ratelimited logging.

printk.devkmsg=ratelimit is undocumented?

> printk.time= Show timing data prefixed to each printk message line
> Format: <bool> (1/Y/y=enable, 0/N/n=disable)
>
> diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt
> index a3683ce2a2f3..dec84d90061c 100644
> --- a/Documentation/sysctl/kernel.txt
> +++ b/Documentation/sysctl/kernel.txt
> @@ -752,6 +752,20 @@ send before ratelimiting kicks in.
>
> ==============================================================
>
> +printk_devkmsg:
> +
> +Control the logging to /dev/kmsg from userspace:
> +
> +0: default, ratelimited
> +1: unlimited logging to /dev/kmsg from userspace
> +2: logging to /dev/kmsg disabled

hm. If we're going to make this a pain old 0/1/2 then perhaps the boot
parameter should also simply accept 0/1/2.

> +The kernel command line parameter printk.devkmsg= overrides this and is
> +a one-time setting until next reboot: once set, it cannot be changed by
> +this sysctl interface anymore.

Why?

> +
> +==============================================================
> +
> randomize_va_space:
>
> This option can be used to select the type of process address
> diff --git a/include/linux/printk.h b/include/linux/printk.h
> index f4da695fd615..e6bb50751504 100644
> --- a/include/linux/printk.h
> +++ b/include/linux/printk.h
> @@ -171,6 +171,13 @@ extern bool printk_timed_ratelimit(unsigned long *caller_jiffies,
> extern int printk_delay_msec;
> extern int dmesg_restrict;
> extern int kptr_restrict;
> +extern unsigned int devkmsg_log;
> +
> +struct ctl_table;

Please put these forward declarations near top-of-file. Otherwise we
can later get duplicates.

> +extern int
> +devkmsg_sysctl_set_loglvl(struct ctl_table *table, int write, void __user *buf,
> + size_t *lenp, loff_t *ppos);
>
> extern void wake_up_klogd(void);
>
> ...
>
> +unsigned int __read_mostly devkmsg_log = DEVKMSG_LOG_MASK_DEFAULT;
> +static int __init control_devkmsg(char *str)
> +{
> + if (!str)
> + return -EINVAL;
> +
> + if (!strncmp(str, "on", 2))
> + devkmsg_log = DEVKMSG_LOG_MASK_ON;
> + else if (!strncmp(str, "off", 3))
> + devkmsg_log = DEVKMSG_LOG_MASK_OFF;
> + else if (!strncmp(str, "ratelimit", 9))
> + devkmsg_log = DEVKMSG_LOG_MASK_DEFAULT;
> + else
> + return -EINVAL;
> +
> + /* Sysctl cannot change it anymore. */

Please enhance the comment to describe why this is being done. The
reasoning behind it.

> + devkmsg_log |= DEVKMSG_LOG_MASK_LOCK;
> +
> + return 0;
> +}
> +__setup("printk.devkmsg=", control_devkmsg);
> +
> +
> +int devkmsg_sysctl_set_loglvl(struct ctl_table *table, int write,
> + void __user *buffer, size_t *lenp, loff_t *ppos)
> +{
> + if (devkmsg_log & DEVKMSG_LOG_MASK_LOCK) {
> + if (write)
> + return -EINVAL;
> + }
> +
> + return proc_dointvec_minmax(table, write, buffer, lenp, ppos);
> +}
> +
>
> ...
>

Andrew Morton

unread,
Jul 14, 2016, 4:29:45 PM7/14/16
to Borislav Petkov, LKML, Franck Bui, Greg Kroah-Hartman, Ingo Molnar, Linus Torvalds, Peter Zijlstra, Steven Rostedt, Uwe Kleine-König
On Fri, 8 Jul 2016 11:18:49 +0200 Borislav Petkov <b...@alien8.de> wrote:

> From: Borislav Petkov <b...@suse.de>
>
> Extend the ratelimiting facility to print the amount of suppressed lines
> when it is being released.

Why? What's driving this? What are the benefits to our users? Are
there any downsides or back-compatibility issues?

I see from the code that this is not actually enabled by default. The
client code must use ratelimit_set_flags() to select this behaviour,
and the second patch uses this. Please include all such info in the
changelog.

> Separated from a previous patch by Linus.
>
> Also, make the ON_RELEASE image not use "callbacks" as it is misleading.

"image"?

I don't understand this sentence. What's misleading about what? Much
more detail please.

>
> ...
>
> --- a/lib/ratelimit.c
> +++ b/lib/ratelimit.c
> @@ -46,12 +46,14 @@ int ___ratelimit(struct ratelimit_state *rs, const char *func)
> rs->begin = jiffies;
>
> if (time_is_before_jiffies(rs->begin + rs->interval)) {
> - if (rs->missed)
> - printk(KERN_WARNING "%s: %d callbacks suppressed\n",
> - func, rs->missed);
> + if (rs->missed) {
> + if (!(rs->flags & RATELIMIT_MSG_ON_RELEASE)) {
> + pr_warn("%s: %d callbacks suppressed\n", func, rs->missed);
> + rs->missed = 0;
> + }
> + }

hm, what's the difference between an output line being suppressed and a
callback being suppressed? I think I've forgotten how this code works ;)

Steven Rostedt

unread,
Jul 14, 2016, 4:40:13 PM7/14/16
to Andrew Morton, Borislav Petkov, LKML, Franck Bui, Greg Kroah-Hartman, Ingo Molnar, Linus Torvalds, Peter Zijlstra, Uwe Kleine-König
On Thu, 14 Jul 2016 13:23:23 -0700
Andrew Morton <ak...@linux-foundation.org> wrote:

> On Fri, 8 Jul 2016 11:18:50 +0200 Borislav Petkov <b...@alien8.de> wrote:
>
> > From: Borislav Petkov <b...@suse.de>
> >
> > Add a "printk.devkmsg" kernel command line parameter which controls how
> > userspace writes into /dev/kmsg. It has three options:
> >
> > * ratelimit - ratelimit logging from userspace.
> > * on - unlimited logging from userspace
> > * off - logging from userspace gets ignored
> >
> > The default setting is to ratelimit the messages written to it.
>
> Which changes current kernel behaviour. Can we please get some
> discussion into this changelog which justifies that decision? What's
> the reasoning and how do we know it won't break existing stuff?

Well, Linus was the one that suggested the ratelimit default. Usually
when userspace starts spamming dev/kmsg, it breaks the kernel. Well,
it's the same as when anything in the kernel spams the console. Why let
userspace do it (and it does it a lot) where data gets lost anyway, and
worse yet, so do all kernel messages. Which is the complaint about
letting userspace spam the kernel printk buffer in the first place.


>
> > It additionally does not limit logging to /dev/kmsg while the system is
> > booting if we haven't disabled it on the command line.
> >
> > This patch is based on previous patches from Linus and Steven.
> >
> > In addition, we can control the logging from a lower priority
> > sysctl interface - kernel.printk_devkmsg={0,1,2} - with numeric values
> > corresponding to the options above.
> >
> > That interface will succeed only if printk.devkmsg *hasn't* been supplied
> > on the command line. If it has, then printk.devkmsg is a one-time setting
> > which remains for the duration of the system lifetime.
>
> Please also include a description of the reasoning behind this design
> decision.
>
> > Signed-off-by: Borislav Petkov <b...@suse.de>
> > Cc: Andrew Morton <ak...@linux-foundation.org>
> > Cc: Franck Bui <fb...@suse.com>
> > Cc: Greg Kroah-Hartman <gre...@linuxfoundation.org>
> > Cc: Ingo Molnar <mi...@kernel.org>
> > Cc: Linus Torvalds <torv...@linux-foundation.org>
> > Cc: Peter Zijlstra <pet...@infradead.org>
> > Cc: Steven Rostedt <ros...@goodmis.org>
> > Cc: Uwe Kleine-K__nig <u.klein...@pengutronix.de>
>
> Did nobody ack any of this?

Heh, I've been reviewing each of his previous releases. This one got
sorta lost in my inbox since I'm working hard on my other
responsibilities done before I'm off on vacation.

>
> > Documentation/kernel-parameters.txt | 6 +++
> > Documentation/sysctl/kernel.txt | 14 ++++++
> > include/linux/printk.h | 7 +++
> > kernel/printk/printk.c | 86 +++++++++++++++++++++++++++++++++----
> > kernel/sysctl.c | 9 ++++
> > 5 files changed, 114 insertions(+), 8 deletions(-)
> >
> > diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
> > index 82b42c958d1c..0b1fea56dd49 100644
> > --- a/Documentation/kernel-parameters.txt
> > +++ b/Documentation/kernel-parameters.txt
> > @@ -3150,6 +3150,12 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
> > Format: <bool> (1/Y/y=enable, 0/N/n=disable)
> > default: disabled
> >
> > + printk.devkmsg={on,off}
> > + Control writing to /dev/kmsg.
> > + on - unlimited logging to /dev/kmsg from userspace
> > + off - logging to /dev/kmsg disabled
> > + Default: ratelimited logging.
>
> printk.devkmsg=ratelimit is undocumented?

That needs to be fixed.

>
> > printk.time= Show timing data prefixed to each printk message line
> > Format: <bool> (1/Y/y=enable, 0/N/n=disable)
> >
> > diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt
> > index a3683ce2a2f3..dec84d90061c 100644
> > --- a/Documentation/sysctl/kernel.txt
> > +++ b/Documentation/sysctl/kernel.txt
> > @@ -752,6 +752,20 @@ send before ratelimiting kicks in.
> >
> > ==============================================================
> >
> > +printk_devkmsg:
> > +
> > +Control the logging to /dev/kmsg from userspace:
> > +
> > +0: default, ratelimited
> > +1: unlimited logging to /dev/kmsg from userspace
> > +2: logging to /dev/kmsg disabled
>
> hm. If we're going to make this a pain old 0/1/2 then perhaps the boot
> parameter should also simply accept 0/1/2.

Good ponit.

>
> > +The kernel command line parameter printk.devkmsg= overrides this and is
> > +a one-time setting until next reboot: once set, it cannot be changed by
> > +this sysctl interface anymore.
>
> Why?

Should be added that we don't trust userspace. If someone wants to
disable userspace from touching /dev/kmsg, then if it is on the kernel
command line, then it should be permanent. I'm sick of userspace
using the kernel printk buffers for their own messaging. It makes it
harder to debug kernel issues. Thus, if /dev/kmsg is disabled on boot
up, I don't want any userspace application to have the ability to
're-enable' it again.
Yep, need to add: "The kernel command line setting of this parameter is
to force the setting to be permanent throughout the runtime of the
system and can not be changed at a later time. Used by admins that
don't trust their systems to behave properly."

;-)

-- Steve

Borislav Petkov

unread,
Jul 15, 2016, 12:01:04 AM7/15/16
to Andrew Morton, LKML, Franck Bui, Greg Kroah-Hartman, Ingo Molnar, Linus Torvalds, Peter Zijlstra, Steven Rostedt, Uwe Kleine-König
Hi Andrew,

thanks for taking a look.

On Thu, Jul 14, 2016 at 01:29:36PM -0700, Andrew Morton wrote:
> Why? What's driving this? What are the benefits to our users? Are
> there any downsides or back-compatibility issues?
>
> I see from the code that this is not actually enabled by default. The
> client code must use ratelimit_set_flags() to select this behaviour,
> and the second patch uses this. Please include all such info in the
> changelog.

How about:

"This use case is aimed at short-termed, burst-like users of the
ratelimiting facility for which we want to output the suppressed lines
stats only once, after it has been disposed of. For an example, see
usage in /dev/kmsg."

?

> > Separated from a previous patch by Linus.
> >
> > Also, make the ON_RELEASE image not use "callbacks" as it is misleading.
>
> "image"?

Bah, it should say

"Also, change the printk line we issue on release to not use "callbacks"
as it is misleading. We're not suppressing callbacks but printk calls."

> > @@ -46,12 +46,14 @@ int ___ratelimit(struct ratelimit_state *rs, const char *func)
> > rs->begin = jiffies;
> >
> > if (time_is_before_jiffies(rs->begin + rs->interval)) {
> > - if (rs->missed)
> > - printk(KERN_WARNING "%s: %d callbacks suppressed\n",
> > - func, rs->missed);
> > + if (rs->missed) {
> > + if (!(rs->flags & RATELIMIT_MSG_ON_RELEASE)) {
> > + pr_warn("%s: %d callbacks suppressed\n", func, rs->missed);
> > + rs->missed = 0;
> > + }
> > + }
>
> hm, what's the difference between an output line being suppressed and a
> callback being suppressed? I think I've forgotten how this code works ;)

Right, ___ratelimit() gets as @func arg the name of the current calling
function:

#define __ratelimit(state) ___ratelimit(state, __func__)

I'm strongly assuming this is the "callback" ___ratelimit() is talking
about :-)

In our case, we don't have callbacks but /dev/kmsg users and I thought
the most generic way of referring to them would be by not doing so at
all but simply talking about output lines being suppressed.

--
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.
--

Borislav Petkov

unread,
Jul 15, 2016, 12:29:48 AM7/15/16
to Andrew Morton, LKML, Franck Bui, Greg Kroah-Hartman, Ingo Molnar, Linus Torvalds, Peter Zijlstra, Steven Rostedt, Uwe Kleine-König
On Thu, Jul 14, 2016 at 01:23:23PM -0700, Andrew Morton wrote:
> Which changes current kernel behaviour. Can we please get some
> discussion into this changelog which justifies that decision? What's
> the reasoning and how do we know it won't break existing stuff?

What Rostedt said and in addition, since we do offer this /dev/kmsg
facility to userspace and since userspace likes to (ab-)use it from time
to time, we're adding a controlling cmdline param which should make
everyone happy as it is a tristate.

> Please also include a description of the reasoning behind this design
> decision.

See above. It is really that simple - we want to control how it is being
logged to that interface, thus a tristate. For example, the "on" usecase
I know of is people wanting to debug systemd and they want to see *all*
output go to dmesg.

The "off" use case is for kernel people not wanting to have any
userspace lines in the logs when debugging the kernel.

The "ratelimit" being the default...

> > Signed-off-by: Borislav Petkov <b...@suse.de>
> > Cc: Andrew Morton <ak...@linux-foundation.org>
> > Cc: Franck Bui <fb...@suse.com>
> > Cc: Greg Kroah-Hartman <gre...@linuxfoundation.org>
> > Cc: Ingo Molnar <mi...@kernel.org>
> > Cc: Linus Torvalds <torv...@linux-foundation.org>
> > Cc: Peter Zijlstra <pet...@infradead.org>
> > Cc: Steven Rostedt <ros...@goodmis.org>
> > Cc: Uwe Kleine-K__nig <u.klein...@pengutronix.de>
>
> Did nobody ack any of this?

I'm sure they all want this but are tired of the bikeshedding :-)

> > + printk.devkmsg={on,off}
> > + Control writing to /dev/kmsg.
> > + on - unlimited logging to /dev/kmsg from userspace
> > + off - logging to /dev/kmsg disabled
> > + Default: ratelimited logging.
>
> printk.devkmsg=ratelimit is undocumented?

Fixed.

> > +printk_devkmsg:
> > +
> > +Control the logging to /dev/kmsg from userspace:
> > +
> > +0: default, ratelimited
> > +1: unlimited logging to /dev/kmsg from userspace
> > +2: logging to /dev/kmsg disabled
>
> hm. If we're going to make this a pain old 0/1/2 then perhaps the boot
> parameter should also simply accept 0/1/2.

So my reasoning to have the string option names for the cmdline
parameter is so that you don't have to always go lookup what was what,
was 1 off or on, etc.

So actually, I should make the sysctl accept "on", "off" and "ratelimit"
instead too.

Agreed?

> > +The kernel command line parameter printk.devkmsg= overrides this and is
> > +a one-time setting until next reboot: once set, it cannot be changed by
> > +this sysctl interface anymore.
>
> Why?

What Rostedt said.

> Please put these forward declarations near top-of-file. Otherwise we
> can later get duplicates.

Done.

> Please enhance the comment to describe why this is being done. The
> reasoning behind it.

I changed Rostedt's text a bit:

/*
* Sysctl cannot change it anymore. The kernel command line setting of
* this parameter is to force the setting to be permanent throughout the
* runtime of the system. This is a precation measure against userspace
* trying to be a smarta** and attempting to change it up on us.
*/

Dave Young

unread,
Jul 15, 2016, 2:21:30 AM7/15/16
to Borislav Petkov, Andrew Morton, LKML, Franck Bui, Greg Kroah-Hartman, Ingo Molnar, Linus Torvalds, Peter Zijlstra, Steven Rostedt, Uwe Kleine-König
On 07/08/16 at 11:18am, Borislav Petkov wrote:
> From: Borislav Petkov <b...@suse.de>
>
> Add a "printk.devkmsg" kernel command line parameter which controls how
> userspace writes into /dev/kmsg. It has three options:
>
> * ratelimit - ratelimit logging from userspace.
> * on - unlimited logging from userspace
> * off - logging from userspace gets ignored
>
> The default setting is to ratelimit the messages written to it.

Sorry for jumping in late. But I just see this today and I really like the
idea to add a switch to turn off the kmsg writing from userspace because
I suffer from it also.

I may missed the background, what is the reason for "ratelimit"? It sounds
a little odd. I think use default "on" should be safe to keep same behavior
as before.

Thanks
Dave

Borislav Petkov

unread,
Jul 15, 2016, 8:45:28 AM7/15/16
to Dave Young, Andrew Morton, LKML, Franck Bui, Greg Kroah-Hartman, Ingo Molnar, Linus Torvalds, Peter Zijlstra, Steven Rostedt, Uwe Kleine-König
On Fri, Jul 15, 2016 at 02:21:09PM +0800, Dave Young wrote:
> Sorry for jumping in late. But I just see this today and I really like the
> idea to add a switch to turn off the kmsg writing from userspace because
> I suffer from it also.

Thanks, I'll send v5 soon.

> I may missed the background, what is the reason for "ratelimit"? It sounds
> a little odd. I think use default "on" should be safe to keep same behavior
> as before.

Yeah, so we want to ratelimit by default so that excessive logging
doesn't disturb system operation. People who want to see all that
blubber can do so with "on" so they should be taken care of too.

Borislav Petkov

unread,
Jul 16, 2016, 2:18:00 AM7/16/16
to Andrew Morton, LKML, Borislav Petkov, Dave Young, Franck Bui, Greg Kroah-Hartman, Ingo Molnar, Linus Torvalds, Peter Zijlstra, Steven Rostedt, Uwe Kleine-König
From: Borislav Petkov <b...@suse.de>

Hi guys,

here's v5 with Andrew's review comments addressed (I hope all).

I've unified the setting names for both sysctl(2) and the kernel cmdline
to "on", "off" and "ratelimit" so that there's no confusion anymore as
to which sysctl number corresponds to what string. This should make
dealing with that option straight-forward.

Also, requested textual expansion regarding the reasoning for the
current design has been added in comments and commit messages.

Thanks.


Changelog:
----------

v4:

sorry for spamming so quickly again and not waiting for a week before
resubmitting but I believe the stuff is ready for 4.8.

So here's v4 with all the minor review comments addressed.

v3:

here's v3 integrating Ingo's comments. The thing is called
printk.devkmsg= or printk_devkmsg now, depending on cmdline option or
sysctl.


v2:

here's v2 with the requested sysctl option kernel.printk_kmsg and
locking of the setting when printk.kmsg= is supplied on the command
line.

Patch 1 is unchanged.

Patch 2 has grown the sysctl addition.

v1:

Rostedt is busy so I took Linus' old patch and Steven's last v2 and
split and extended them with the comments people had on the last thread:

https://lkml.kernel.org/r/20160425145...@gandalf.local.home

I hope, at least.

So it is ratelimiting by default, with "on" and "off" cmdline options. I
called the option somewhat a bit shorter too: "printk.kmsg"

The current use cases of this and of which I'm aware are:

* debug the kernel and thus shut up all interfering input from
userspace, i.e. boot with "printk.kmsg=off"

* debug userspace (and by that I mean systemd) by booting with
"printk.kmsg=on" so that the ratelimiting is disabled and the kernel log
gets all the spew.

Thoughts?

Please queue,
thanks.


Borislav Petkov (2):
ratelimit: Extend to print suppressed messages on release
printk: Add kernel parameter to control writes to /dev/kmsg

Documentation/kernel-parameters.txt | 7 ++
Documentation/sysctl/kernel.txt | 14 ++++
include/linux/printk.h | 9 +++
include/linux/ratelimit.h | 38 ++++++++--
kernel/printk/printk.c | 143 ++++++++++++++++++++++++++++++++++--
kernel/sysctl.c | 7 ++
lib/ratelimit.c | 10 ++-
7 files changed, 211 insertions(+), 17 deletions(-)

--
2.8.4

Borislav Petkov

unread,
Jul 16, 2016, 2:18:01 AM7/16/16
to Andrew Morton, LKML, Borislav Petkov, Dave Young, Franck Bui, Greg Kroah-Hartman, Ingo Molnar, Linus Torvalds, Peter Zijlstra, Steven Rostedt, Uwe Kleine-König
From: Borislav Petkov <b...@suse.de>

Extend the ratelimiting facility to print the amount of suppressed lines
when it is being released.

This use case is aimed at short-termed, burst-like users for which we
want to output the suppressed lines stats only once, after it has been
disposed of. For an example, see /dev/kmsg usage in a follow-on patch.

Also, change the printk() line we issue on release to not use "callbacks"
as it is misleading: we're not suppressing callbacks but printk() calls.

This has been separated from a previous patch by Linus.

Signed-off-by: Borislav Petkov <b...@suse.de>
Cc: Andrew Morton <ak...@linux-foundation.org>
Cc: Dave Young <dyo...@redhat.com>
Cc: Franck Bui <fb...@suse.com>
Cc: Greg Kroah-Hartman <gre...@linuxfoundation.org>
Cc: Ingo Molnar <mi...@kernel.org>
Cc: Linus Torvalds <torv...@linux-foundation.org>
Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Steven Rostedt <ros...@goodmis.org>
- rs->printed = 0;
- rs->missed = 0;
- rs->begin = 0;
+ rs->interval = interval;
+ rs->burst = burst;
+}
+
+static inline void ratelimit_default_init(struct ratelimit_state *rs)
+{
+ return ratelimit_state_init(rs, DEFAULT_RATELIMIT_INTERVAL,
+ DEFAULT_RATELIMIT_BURST);
+}
+
+static inline void ratelimit_state_exit(struct ratelimit_state *rs)
+{
+ if (!(rs->flags & RATELIMIT_MSG_ON_RELEASE))
+ return;
+
+ if (rs->missed) {
+ pr_warn("%s: %d output lines suppressed due to ratelimiting\n",
+ current->comm, rs->missed);
+ rs->missed = 0;
+ }
+}
+
+static inline void
+ratelimit_set_flags(struct ratelimit_state *rs, unsigned long flags)
+{
+ rs->flags = flags;
}

extern struct ratelimit_state printk_ratelimit_state;
diff --git a/lib/ratelimit.c b/lib/ratelimit.c
index 2c5de86460c5..08f8043cac61 100644
--- a/lib/ratelimit.c
+++ b/lib/ratelimit.c
@@ -46,12 +46,14 @@ int ___ratelimit(struct ratelimit_state *rs, const char *func)
rs->begin = jiffies;

if (time_is_before_jiffies(rs->begin + rs->interval)) {
- if (rs->missed)
- printk(KERN_WARNING "%s: %d callbacks suppressed\n",
- func, rs->missed);
+ if (rs->missed) {
+ if (!(rs->flags & RATELIMIT_MSG_ON_RELEASE)) {
+ pr_warn("%s: %d callbacks suppressed\n", func, rs->missed);
+ rs->missed = 0;
+ }
+ }
rs->begin = jiffies;
rs->printed = 0;
- rs->missed = 0;
}
if (rs->burst && rs->burst > rs->printed) {
rs->printed++;
--
2.8.4

Borislav Petkov

unread,
Jul 16, 2016, 2:18:18 AM7/16/16
to Andrew Morton, LKML, Borislav Petkov, Dave Young, Franck Bui, Greg Kroah-Hartman, Ingo Molnar, Linus Torvalds, Peter Zijlstra, Steven Rostedt, Uwe Kleine-König
From: Borislav Petkov <b...@suse.de>

Add a "printk.devkmsg" kernel command line parameter which controls how
userspace writes into /dev/kmsg. It has three options:

* ratelimit - ratelimit logging from userspace.
* on - unlimited logging from userspace
* off - logging from userspace gets ignored

The default setting is to ratelimit the messages written to it.

This changes the kernel default setting of "on" to "ratelimit" and we
do that because we want to keep userspace spamming /dev/kmsg to sane
levels. This is especially moot when a small kernel log buffer wraps
around and messages get lost. So the ratelimiting setting should be a
sane setting where kernel messages should have a bit higher chance of
survival from all the spamming.

It additionally does not limit logging to /dev/kmsg while the system is
booting if we haven't disabled it on the command line.

Furthermore, we can control the logging from a lower priority sysctl
interface - kernel.printk_devkmsg.

That interface will succeed only if printk.devkmsg *hasn't* been
supplied on the command line. If it has, then printk.devkmsg is a
one-time setting which remains for the duration of the system lifetime.
This "locking" of the setting is to prevent userspace from changing the
logging on us through sysctl(2).

This patch is based on previous patches from Linus and Steven.

Signed-off-by: Borislav Petkov <b...@suse.de>
Cc: Andrew Morton <ak...@linux-foundation.org>
Cc: Dave Young <dyo...@redhat.com>
Cc: Franck Bui <fb...@suse.com>
Cc: Greg Kroah-Hartman <gre...@linuxfoundation.org>
Cc: Ingo Molnar <mi...@kernel.org>
Cc: Linus Torvalds <torv...@linux-foundation.org>
Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Steven Rostedt <ros...@goodmis.org>
Cc: Uwe Kleine-König <u.klein...@pengutronix.de>
---
Documentation/kernel-parameters.txt | 7 ++
Documentation/sysctl/kernel.txt | 14 ++++
include/linux/printk.h | 9 +++
kernel/printk/printk.c | 143 ++++++++++++++++++++++++++++++++++--
kernel/sysctl.c | 7 ++
5 files changed, 172 insertions(+), 8 deletions(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 82b42c958d1c..a8e9f0bd7801 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -3150,6 +3150,13 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
Format: <bool> (1/Y/y=enable, 0/N/n=disable)
default: disabled

+ printk.devkmsg={on,off,ratelimit}
+ Control writing to /dev/kmsg.
+ on - unlimited logging to /dev/kmsg from userspace
+ off - logging to /dev/kmsg disabled
+ ratelimit - ratelimit the logging
+ Default: ratelimit
+
printk.time= Show timing data prefixed to each printk message line
Format: <bool> (1/Y/y=enable, 0/N/n=disable)

diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt
index a3683ce2a2f3..378421d423fa 100644
--- a/Documentation/sysctl/kernel.txt
+++ b/Documentation/sysctl/kernel.txt
@@ -752,6 +752,20 @@ send before ratelimiting kicks in.

==============================================================

+printk_devkmsg:
+
+Control the logging to /dev/kmsg from userspace:
+
+ratelimit: default, ratelimited
+on: unlimited logging to /dev/kmsg from userspace
+off: logging to /dev/kmsg disabled
+
+The kernel command line parameter printk.devkmsg= overrides this and is
+a one-time setting until next reboot: once set, it cannot be changed by
+this sysctl interface anymore.
+
+==============================================================
+
randomize_va_space:

This option can be used to select the type of process address
diff --git a/include/linux/printk.h b/include/linux/printk.h
index f4da695fd615..9c21535e5714 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -61,6 +61,11 @@ static inline void console_verbose(void)
console_loglevel = CONSOLE_LOGLEVEL_MOTORMOUTH;
}

+/* strlen("ratelimit") + 1 */
+#define DEVKMSG_STR_MAX_SIZE 10
+extern char devkmsg_log_str[];
+struct ctl_table;
+
struct va_format {
const char *fmt;
va_list *va;
@@ -172,6 +177,10 @@ extern int printk_delay_msec;
extern int dmesg_restrict;
extern int kptr_restrict;

+extern int
+devkmsg_sysctl_set_loglvl(struct ctl_table *table, int write, void __user *buf,
+ size_t *lenp, loff_t *ppos);
+
extern void wake_up_klogd(void);

char *log_buf_addr_get(void);
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 60cdf6386763..ea5944aa71d9 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -86,6 +86,112 @@ static struct lockdep_map console_lock_dep_map = {
};
#endif

+enum devkmsg_log_bits {
+ __DEVKMSG_LOG_BIT_ON = 0,
+ __DEVKMSG_LOG_BIT_OFF,
+ __DEVKMSG_LOG_BIT_LOCK,
+};
+
+enum devkmsg_log_masks {
+ DEVKMSG_LOG_MASK_ON = BIT(__DEVKMSG_LOG_BIT_ON),
+ DEVKMSG_LOG_MASK_OFF = BIT(__DEVKMSG_LOG_BIT_OFF),
+ DEVKMSG_LOG_MASK_LOCK = BIT(__DEVKMSG_LOG_BIT_LOCK),
+};
+
+/* Keep both the 'on' and 'off' bits clear, i.e. ratelimit by default: */
+#define DEVKMSG_LOG_MASK_DEFAULT 0
+
+static unsigned int __read_mostly devkmsg_log = DEVKMSG_LOG_MASK_DEFAULT;
+
+static int __control_devkmsg(char *str)
+{
+ if (!str)
+ return -EINVAL;
+
+ if (!strncmp(str, "on", 2)) {
+ devkmsg_log = DEVKMSG_LOG_MASK_ON;
+ return 2;
+ } else if (!strncmp(str, "off", 3)) {
+ devkmsg_log = DEVKMSG_LOG_MASK_OFF;
+ return 3;
+ } else if (!strncmp(str, "ratelimit", 9)) {
+ devkmsg_log = DEVKMSG_LOG_MASK_DEFAULT;
+ return 9;
+ }
+ else
+ return -EINVAL;
+}
+
+static int __init control_devkmsg(char *str)
+{
+ if (__control_devkmsg(str) < 0)
+ return 1;
+
+ /*
+ * Set sysctl string accordingly:
+ */
+ if (devkmsg_log == DEVKMSG_LOG_MASK_ON) {
+ memset(devkmsg_log_str, 0, DEVKMSG_STR_MAX_SIZE);
+ strncpy(devkmsg_log_str, "on", 2);
+ } else if (devkmsg_log == DEVKMSG_LOG_MASK_OFF) {
+ memset(devkmsg_log_str, 0, DEVKMSG_STR_MAX_SIZE);
+ strncpy(devkmsg_log_str, "off", 3);
+ }
+ /* else "ratelimit" which is set by default. */
+
+ /*
+ * Sysctl cannot change it anymore. The kernel command line setting of
+ * this parameter is to force the setting to be permanent throughout the
+ * runtime of the system. This is a precation measure against userspace
+ * trying to be a smarta** and attempting to change it up on us.
+ */
+ devkmsg_log |= DEVKMSG_LOG_MASK_LOCK;
+
+ return 0;
+}
+__setup("printk.devkmsg=", control_devkmsg);
+
+char devkmsg_log_str[DEVKMSG_STR_MAX_SIZE] = "ratelimit";
+
+int devkmsg_sysctl_set_loglvl(struct ctl_table *table, int write,
+ void __user *buffer, size_t *lenp, loff_t *ppos)
+{
+ char old_str[DEVKMSG_STR_MAX_SIZE];
+ unsigned int old;
+ int err;
+
+ if (write) {
+ if (devkmsg_log & DEVKMSG_LOG_MASK_LOCK)
+ return -EINVAL;
+
+ old = devkmsg_log;
+ strncpy(old_str, devkmsg_log_str, DEVKMSG_STR_MAX_SIZE);
+ }
+
+ err = proc_dostring(table, write, buffer, lenp, ppos);
+ if (err)
+ return err;
+
+ if (write) {
+ err = __control_devkmsg(devkmsg_log_str);
+
+ /*
+ * Do not accept an unknown string OR a known string with
+ * trailing crap...
+ */
+ if (err < 0 || (err + 1 != *lenp)) {
+
+ /* ... and restore old setting. */
+ devkmsg_log = old;
+ strncpy(devkmsg_log_str, old_str, DEVKMSG_STR_MAX_SIZE);
+
+ return -EINVAL;
+ }
+ }
+
+ return 0;
+}
+
/*
* Number of registered extended console drivers.
*
@@ -614,6 +720,7 @@ struct devkmsg_user {
u64 seq;
u32 idx;
enum log_flags prev;
+ struct ratelimit_state rs;
struct mutex lock;
char buf[CONSOLE_EXT_LOG_MAX];
};
@@ -623,11 +730,24 @@ static ssize_t devkmsg_write(struct kiocb *iocb, struct iov_iter *from)
char *buf, *line;
int level = default_message_loglevel;
int facility = 1; /* LOG_USER */
+ struct file *file = iocb->ki_filp;
+ struct devkmsg_user *user = file->private_data;
size_t len = iov_iter_count(from);
ssize_t ret = len;

- if (len > LOG_LINE_MAX)
+ if (!user || len > LOG_LINE_MAX)
return -EINVAL;
+
+ /* Ignore when user logging is disabled. */
+ if (devkmsg_log & DEVKMSG_LOG_MASK_OFF)
+ return len;
+
+ /* Ratelimit when not explicitly enabled or when we're not booting. */
+ if ((system_state != SYSTEM_BOOTING) && !(devkmsg_log & DEVKMSG_LOG_MASK_ON)) {
+ if (!___ratelimit(&user->rs, current->comm))
+ return ret;
+ }
+
buf = kmalloc(len+1, GFP_KERNEL);
if (buf == NULL)
return -ENOMEM;
@@ -800,19 +920,24 @@ static int devkmsg_open(struct inode *inode, struct file *file)
@@ -831,6 +956,8 @@ static int devkmsg_release(struct inode *inode, struct file *file)
if (!user)
return 0;

+ ratelimit_state_exit(&user->rs);
+
mutex_destroy(&user->lock);
kfree(user);
return 0;
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 87b2fc38398b..0d9fe53682bb 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -814,6 +814,13 @@ static struct ctl_table kern_table[] = {
.extra2 = &ten_thousand,
},
{
+ .procname = "printk_devkmsg",
+ .data = devkmsg_log_str,
+ .maxlen = DEVKMSG_STR_MAX_SIZE,
+ .mode = 0644,
+ .proc_handler = devkmsg_sysctl_set_loglvl,
+ },
+ {
.procname = "dmesg_restrict",
.data = &dmesg_restrict,
.maxlen = sizeof(int),
--
2.8.4

Dave Young

unread,
Jul 16, 2016, 6:44:45 AM7/16/16
to Borislav Petkov, Andrew Morton, LKML, Franck Bui, Greg Kroah-Hartman, Ingo Molnar, Linus Torvalds, Peter Zijlstra, Steven Rostedt, Uwe Kleine-König
On 07/15/16 at 02:45pm, Borislav Petkov wrote:
> On Fri, Jul 15, 2016 at 02:21:09PM +0800, Dave Young wrote:
> > Sorry for jumping in late. But I just see this today and I really like the
> > idea to add a switch to turn off the kmsg writing from userspace because
> > I suffer from it also.
>
> Thanks, I'll send v5 soon.
>
> > I may missed the background, what is the reason for "ratelimit"? It sounds
> > a little odd. I think use default "on" should be safe to keep same behavior
> > as before.
>
> Yeah, so we want to ratelimit by default so that excessive logging
> doesn't disturb system operation. People who want to see all that
> blubber can do so with "on" so they should be taken care of too.

Ratelimit the writing to kmsg sounds not a good way, the ratelimit use cases
are for "callbacks" being mentioned in other thread. Basiclly specific
printk from same line of a function will be suppressed, that means the
supressed messages are somehow same error/message/warnings. But different
writings to /dev/msg could be from different userspace software so that the
messages being supressed could be different and the *ratelimit* is not
fine-grained as *ratelimt* supposed to be.

Mis-ratelimit cause critical userspace messages being lost, that is worse
than use off as default. Suppose we turn off devkmsg by default distributions
can still turn on it with sysctl and for us who do not want the flooding we can
use printk.devkmsg=off in kernel cmdline to override it.

Of course if we turn off it by default we can print a warning to alert user.

BTW, for userspace messages maybe they should not go to same log buffer, maybe
a separate log buffer for /dev/msg will be better.

Thanks
Dave

Borislav Petkov

unread,
Jul 17, 2016, 1:40:56 AM7/17/16
to Dave Young, Andrew Morton, LKML, Franck Bui, Greg Kroah-Hartman, Ingo Molnar, Linus Torvalds, Peter Zijlstra, Steven Rostedt, Uwe Kleine-König
On Sat, Jul 16, 2016 at 06:44:25PM +0800, Dave Young wrote:

..

> Mis-ratelimit cause critical userspace messages being lost, that is worse

The current setting is quite generous so that all critical messages
should land in dmesg. Besides, we don't ratelimit during boot. The idea
is that userspace should switch to a different logging facility once the
system is up... which userspace does reportedly.

IOW, /dev/kmsg should handle a relatively big logging amount without
ratelimiting.

> than use off as default. Suppose we turn off devkmsg by default distributions
> can still turn on it with sysctl and for us who do not want the flooding we can
> use printk.devkmsg=off in kernel cmdline to override it.

That part I cannot parse.

> Of course if we turn off it by default we can print a warning to alert user.
>
> BTW, for userspace messages maybe they should not go to same log buffer, maybe
> a separate log buffer for /dev/msg will be better.

See above.

Dave Young

unread,
Jul 17, 2016, 10:18:25 PM7/17/16
to Borislav Petkov, Andrew Morton, LKML, Franck Bui, Greg Kroah-Hartman, Ingo Molnar, Linus Torvalds, Peter Zijlstra, Steven Rostedt, Uwe Kleine-König
On 07/17/16 at 07:40am, Borislav Petkov wrote:
> On Sat, Jul 16, 2016 at 06:44:25PM +0800, Dave Young wrote:
>
> ...
>
> > Mis-ratelimit cause critical userspace messages being lost, that is worse
>
> The current setting is quite generous so that all critical messages
> should land in dmesg. Besides, we don't ratelimit during boot. The idea
> is that userspace should switch to a different logging facility once the
> system is up... which userspace does reportedly.

I would say avoiding ratelimit during boot make no much sense. Userspace can not
write to /dev/kmsg when system_state == SYSTEM_BOOTING because init process
has not run yet.

>
> IOW, /dev/kmsg should handle a relatively big logging amount without
> ratelimiting.
>
> > than use off as default. Suppose we turn off devkmsg by default distributions
> > can still turn on it with sysctl and for us who do not want the flooding we can
> > use printk.devkmsg=off in kernel cmdline to override it.
>
> That part I cannot parse.

I do not understand, care to elaborate a bit?

Let me explain my comments, I means to set printk.devkmsg=off by default,
userspace can set it to on by sysctl. User can provide kernel cmdline
printk.devkmsg=off if he/she want.

Or set printk.devkmsg=on by default to avoid break userspace, it is also fine.

>
> > Of course if we turn off it by default we can print a warning to alert user.
> >
> > BTW, for userspace messages maybe they should not go to same log buffer, maybe
> > a separate log buffer for /dev/msg will be better.
>
> See above.
>

Thanks
Dave

Borislav Petkov

unread,
Jul 18, 2016, 12:44:18 AM7/18/16
to Dave Young, Andrew Morton, LKML, Franck Bui, Greg Kroah-Hartman, Ingo Molnar, Linus Torvalds, Peter Zijlstra, Steven Rostedt, Uwe Kleine-König
On Mon, Jul 18, 2016 at 10:18:09AM +0800, Dave Young wrote:
> I would say avoiding ratelimit during boot make no much sense. Userspace can not
> write to /dev/kmsg when system_state == SYSTEM_BOOTING because init process
> has not run yet.

You're right - kernel_init() sets SYSTEM_RUNNING before running the init
process. I probably should kill all that logic in the second patch.

> I means to set printk.devkmsg=off by default, userspace can set it to
> on by sysctl.

That can't happen: DEVKMSG_LOG_MASK_LOCK.

Dave Young

unread,
Jul 18, 2016, 1:20:48 AM7/18/16
to Borislav Petkov, Andrew Morton, LKML, Franck Bui, Greg Kroah-Hartman, Ingo Molnar, Linus Torvalds, Peter Zijlstra, Steven Rostedt, Uwe Kleine-König
On 07/18/16 at 06:44am, Borislav Petkov wrote:
> On Mon, Jul 18, 2016 at 10:18:09AM +0800, Dave Young wrote:
> > I would say avoiding ratelimit during boot make no much sense. Userspace can not
> > write to /dev/kmsg when system_state == SYSTEM_BOOTING because init process
> > has not run yet.
>
> You're right - kernel_init() sets SYSTEM_RUNNING before running the init
> process. I probably should kill all that logic in the second patch.
>
> > I means to set printk.devkmsg=off by default, userspace can set it to
> > on by sysctl.
>
> That can't happen: DEVKMSG_LOG_MASK_LOCK.

Sorry, seems I do not get your point, suppose using the bis defined in your
patch, shouldn't below work?

#define DEVKMSG_LOG_MASK_DEFAULT 2

Thanks
Dave

Borislav Petkov

unread,
Jul 18, 2016, 3:21:14 AM7/18/16
to Dave Young, Andrew Morton, LKML, Franck Bui, Greg Kroah-Hartman, Ingo Molnar, Linus Torvalds, Peter Zijlstra, Steven Rostedt, Uwe Kleine-König
On Mon, Jul 18, 2016 at 01:20:32PM +0800, Dave Young wrote:
> Sorry, seems I do not get your point, suppose using the bis defined in your
> patch, shouldn't below work?

When you supply "printk.devkmsg=" on the kernel command line, then
devkmsg logging setting is final for the current boot because we lock it
with DEVKMSG_LOG_MASK_LOCK, see control_devkmsg() here:

https://lkml.kernel.org/r/201607160617...@alien8.de

Dave Young

unread,
Jul 18, 2016, 3:39:14 AM7/18/16
to Borislav Petkov, Andrew Morton, LKML, Franck Bui, Greg Kroah-Hartman, Ingo Molnar, Linus Torvalds, Peter Zijlstra, Steven Rostedt, Uwe Kleine-König
On 07/18/16 at 09:21am, Borislav Petkov wrote:
> On Mon, Jul 18, 2016 at 01:20:32PM +0800, Dave Young wrote:
> > Sorry, seems I do not get your point, suppose using the bis defined in your
> > patch, shouldn't below work?
>
> When you supply "printk.devkmsg=" on the kernel command line, then
> devkmsg logging setting is final for the current boot because we lock it
> with DEVKMSG_LOG_MASK_LOCK, see control_devkmsg() here:

Boris, I saw that point and I agreed about that design, what I do not like is
the ratelimit part. What in my mind is like below (suppose default=off)

printk.devkmsg= on kernel cmdline
-> locked, one can not use sysctl to change it
-> follow the kernel cmdline setting
no printk.devkmsg= on kernel cmdline
-> unlocked, default is off, writing to /dev/kmsg is forbidded
-> usespace set sysctl printk.devkmsg=on
-> writing to /dev/kmsg is allowed.

Thanks
Dave

Borislav Petkov

unread,
Jul 18, 2016, 4:08:47 AM7/18/16
to Dave Young, Andrew Morton, LKML, Franck Bui, Greg Kroah-Hartman, Ingo Molnar, Linus Torvalds, Peter Zijlstra, Steven Rostedt, Uwe Kleine-König
On Mon, Jul 18, 2016 at 03:38:45PM +0800, Dave Young wrote:
> no printk.devkmsg= on kernel cmdline
> -> unlocked, default is off, writing to /dev/kmsg is forbidded
> -> usespace set sysctl printk.devkmsg=on
> -> writing to /dev/kmsg is allowed.

And that helps how exactly?

I don't understand why you're harping on the ratelimiting. It is a sane
default for the *default* case. Userspace should switch to its own
logging scheme when it has started anyway.

If you still want to see *all* messages, you boot with "=on" or set it
through sysctl.

So what is the problem?!

Dave Young

unread,
Jul 18, 2016, 4:17:27 AM7/18/16
to Borislav Petkov, Andrew Morton, LKML, Franck Bui, Greg Kroah-Hartman, Ingo Molnar, Linus Torvalds, Peter Zijlstra, Steven Rostedt, Uwe Kleine-König
On 07/18/16 at 10:08am, Borislav Petkov wrote:
> On Mon, Jul 18, 2016 at 03:38:45PM +0800, Dave Young wrote:
> > no printk.devkmsg= on kernel cmdline
> > -> unlocked, default is off, writing to /dev/kmsg is forbidded
> > -> usespace set sysctl printk.devkmsg=on
> > -> writing to /dev/kmsg is allowed.
>
> And that helps how exactly?
>
> I don't understand why you're harping on the ratelimiting. It is a sane
> default for the *default* case. Userspace should switch to its own
> logging scheme when it has started anyway.

Because IMHO it is wrong, they can not be ratelimited because the writing could
be from different userspace programs. Simply ratelimiting different sources of
writing is pointless to me. One can only see messages they would like to see
by luck, it is worse than off.

>
> If you still want to see *all* messages, you boot with "=on" or set it
> through sysctl.
>
> So what is the problem?!

As above mentioned..

Thanks
Dave

Borislav Petkov

unread,
Jul 18, 2016, 5:06:35 AM7/18/16
to Dave Young, Andrew Morton, LKML, Franck Bui, Greg Kroah-Hartman, Ingo Molnar, Linus Torvalds, Peter Zijlstra, Steven Rostedt, Uwe Kleine-König
On Mon, Jul 18, 2016 at 04:17:12PM +0800, Dave Young wrote:
> Because IMHO it is wrong, they can not be ratelimited because the writing could
> be from different userspace programs.

It is ratelimited by interface openers.

> Simply ratelimiting different sources of writing is pointless to me.

So what are you arguing for? What is the *actual* *real-life* *use* case
you think will be handicapped?

> One can only see messages they would like to see by luck, it is worse
> than off.

THAT'S WHY YOU BOOT WITH "printk.devkmsg=on" TO SEE THEM ALL!

The /dev/kmsg thing was added for the more or less, wrong, historic
reasons and userspace started abusing it and interfering with kernel
operation. That's why we're adding this tristate option.

In the default case we're ratelimiting writes to it because they should
not interfere with kernel operation.

[ Frankly, those writes are pretty much useless to the normal user so
we can just as well ignore them but WTH. ]

So give me a concrete problem you see with the ralimiting and not some
notion of a feeling you might have of it being pointless, ok?

Dave Young

unread,
Jul 18, 2016, 8:35:29 PM7/18/16
to Borislav Petkov, Andrew Morton, LKML, Franck Bui, Greg Kroah-Hartman, Ingo Molnar, Linus Torvalds, Peter Zijlstra, Steven Rostedt, Uwe Kleine-König
On 07/18/16 at 11:06am, Borislav Petkov wrote:
> On Mon, Jul 18, 2016 at 04:17:12PM +0800, Dave Young wrote:
> > Because IMHO it is wrong, they can not be ratelimited because the writing could
> > be from different userspace programs.
>
> It is ratelimited by interface openers.

You are thinking from kernel point of view, but it will be different from
a user point of view.

>
> > Simply ratelimiting different sources of writing is pointless to me.
>
> So what are you arguing for? What is the *actual* *real-life* *use* case
> you think will be handicapped?
>
> > One can only see messages they would like to see by luck, it is worse
> > than off.
>
> THAT'S WHY YOU BOOT WITH "printk.devkmsg=on" TO SEE THEM ALL!
>
> The /dev/kmsg thing was added for the more or less, wrong, historic
> reasons and userspace started abusing it and interfering with kernel
> operation. That's why we're adding this tristate option.
>
> In the default case we're ratelimiting writes to it because they should
> not interfere with kernel operation.
>
> [ Frankly, those writes are pretty much useless to the normal user so
> we can just as well ignore them but WTH. ]
>
> So give me a concrete problem you see with the ralimiting and not some
> notion of a feeling you might have of it being pointless, ok?

Ok, for example below (A and B can be any program, systemd or other logging
utility, dracut scripts...)

Program A:
for (i = 0; i < 100; i++)
write an error message A1

for (i =0; i < 100; i++)
write an error message A2

Program B:
for (i = 0; i < 100; i++)
write an error message B1

for (i =0; i < 100; i++)
write an error message B2

In above case, ratelimit specific message like A1 is reasonable, but simply
ratelimit the writing to /dev/kmsg will cause the final messages appear are
uncertain. This is not ratelimit should do.

We provide a default printk.devkmsg, it should either works or not, the
middle point is just wrong because it helps nothing, so what is the benefit
of the ratelimit compares with "off"? If there's no benefits why we will
make the code complicate?

Thanks
Dave

Borislav Petkov

unread,
Jul 19, 2016, 2:50:01 AM7/19/16
to Dave Young, Andrew Morton, LKML, Franck Bui, Greg Kroah-Hartman, Ingo Molnar, Linus Torvalds, Peter Zijlstra, Steven Rostedt, Uwe Kleine-König
On Tue, Jul 19, 2016 at 08:35:07AM +0800, Dave Young wrote:
> Ok, for example below (A and B can be any program, systemd or other logging
> utility, dracut scripts...)

Are you even reading what I'm writing to you?

I asked for real-life use case and I told you the ratelimiting is done
by openers of /dev/kmsg.

> Program A:
> for (i = 0; i < 100; i++)
> write an error message A1

Because this is a real use case, right? Write A1 100 times in a loop?!
You gotta be f*cking kidding me!

Now I'm going to ignore you know until you start reading what I'm
writing to you.

Dave Young

unread,
Jul 19, 2016, 3:02:32 AM7/19/16
to Borislav Petkov, Andrew Morton, LKML, Franck Bui, Greg Kroah-Hartman, Ingo Molnar, Linus Torvalds, Peter Zijlstra, Steven Rostedt, Uwe Kleine-König
On 07/19/16 at 08:49am, Borislav Petkov wrote:
> On Tue, Jul 19, 2016 at 08:35:07AM +0800, Dave Young wrote:
> > Ok, for example below (A and B can be any program, systemd or other logging
> > utility, dracut scripts...)
>
> Are you even reading what I'm writing to you?
>
> I asked for real-life use case and I told you the ratelimiting is done
> by openers of /dev/kmsg.
>
> > Program A:
> > for (i = 0; i < 100; i++)
> > write an error message A1
>
> Because this is a real use case, right? Write A1 100 times in a loop?!
> You gotta be f*cking kidding me!

It is just emulating a use case, in a real life use case of course it shouldn't
be a for loop. Suppose checking a device existance every some period, it
is possible for same error messages being printed again and again, may for
same device, may for different device, but it is from same source code
line. I suppose you can understand, but you did not.

Please stop being angry, I was trying to understand you. I hope you can
also think from different point of view.

>
> Now I'm going to ignore you know until you start reading what I'm
> writing to you.

Same to me, I will stop reply because I have tried my best but I'm
beginning have same feeling as you that is you just do not want to read
what I wrote, sigh.

Thanks
Dave

Steven Rostedt

unread,
Jul 25, 2016, 11:18:28 AM7/25/16
to Dave Young, Borislav Petkov, Andrew Morton, LKML, Franck Bui, Greg Kroah-Hartman, Ingo Molnar, Linus Torvalds, Peter Zijlstra, Uwe Kleine-König
On Mon, 18 Jul 2016 16:17:12 +0800
Dave Young <dyo...@redhat.com> wrote:

> On 07/18/16 at 10:08am, Borislav Petkov wrote:
> > On Mon, Jul 18, 2016 at 03:38:45PM +0800, Dave Young wrote:
> > > no printk.devkmsg= on kernel cmdline
> > > -> unlocked, default is off, writing to /dev/kmsg is forbidded
> > > -> usespace set sysctl printk.devkmsg=on
> > > -> writing to /dev/kmsg is allowed.
> >
> > And that helps how exactly?
> >
> > I don't understand why you're harping on the ratelimiting. It is a sane
> > default for the *default* case. Userspace should switch to its own
> > logging scheme when it has started anyway.
>
> Because IMHO it is wrong, they can not be ratelimited because the writing could
> be from different userspace programs. Simply ratelimiting different sources of
> writing is pointless to me. One can only see messages they would like to see
> by luck, it is worse than off.
>

[ sorry for coming in late, I've been on vacation ]

Note, this is a kernel resource that, IMHO should never had been
accessed by userspace in the first place. Perhaps we should create a
separate buffer for userspace instead. But until then, we need this
usage.

If ratelimit is triggered, then something is blasting the limited
resource of the kernel, and information will be lost. Most of the time,
it's the first lines that are needed to determine what happened, so if
something starts writing a lot into the buffer, you want to save what
was first written. That's the most critical part. Rate limiting will
help save the part you want to see most.

And it could have been the kernel that caused the issue. Rate limiting
userspace to preserve the kernel output is a good thing.

-- Steve
0 new messages