Bug 44211 - /proc/kmsg does not (always) block for 1-byte reads

Jukka Ollila

unread,

Jul 6, 2012, 1:50:02 PM7/6/12

to

Hello,

A few days ago I filed a kernel regression report concerning a change
in /proc/kmsg behaviour with short reads:

https://bugzilla.kernel.org/show_bug.cgi?id=44211

The comments suggest that this is probably intentional, but that it
would be best make sure that the current semantics wrt short reads are
as intended.

The problem appears on a Debian (unstable) system that drains
/proc/kmsg into a separate fifo read by klogd(8):

/bin/dd bs=1 if=/proc/kmsg of=/var/run/klogd/kmsg

With the recent kernel logging changes this /bin/dd exits immediately,
as 1-byte reads are shorter than any log message could possibly be and
read() returns 0. No dd feeding the fifo results in no logging and a
rather unhappy klogd on the reading end of /var/run/klogd/kmsg.

I suppose a safe solution is to only do reads that are big enough for
any single kernel message, but this is still a change that affects
user space being shipped, so some might find it surprising.

I don't know what other distros do. Is it just Debian being the odd one out?

-J
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Greg KH

unread,

Jul 6, 2012, 2:00:02 PM7/6/12

to

On Fri, Jul 06, 2012 at 08:45:44PM +0300, Jukka Ollila wrote:
> Hello,
>
> A few days ago I filed a kernel regression report concerning a change
> in /proc/kmsg behaviour with short reads:
>
> https://bugzilla.kernel.org/show_bug.cgi?id=44211
>
> The comments suggest that this is probably intentional, but that it
> would be best make sure that the current semantics wrt short reads are
> as intended.
>
> The problem appears on a Debian (unstable) system that drains
> /proc/kmsg into a separate fifo read by klogd(8):
>
> /bin/dd bs=1 if=/proc/kmsg of=/var/run/klogd/kmsg
>
> With the recent kernel logging changes this /bin/dd exits immediately,
> as 1-byte reads are shorter than any log message could possibly be and
> read() returns 0. No dd feeding the fifo results in no logging and a
> rather unhappy klogd on the reading end of /var/run/klogd/kmsg.
>
> I suppose a safe solution is to only do reads that are big enough for
> any single kernel message, but this is still a change that affects
> user space being shipped, so some might find it surprising.
>
> I don't know what other distros do. Is it just Debian being the odd one out?

I think we just fixed this, what kernel version are you seeing this
problem on?

Kay did your other patches that I just accepted resolve this?

thanks,

greg k-h

Jukka Ollila

unread,

Jul 6, 2012, 2:50:02 PM7/6/12

to

On Fri, Jul 6, 2012 at 8:55 PM, Greg KH <gr...@kroah.com> wrote:
> On Fri, Jul 06, 2012 at 08:45:44PM +0300, Jukka Ollila wrote:
>> A few days ago I filed a kernel regression report concerning a change
>> in /proc/kmsg behaviour with short reads:
>>
>> https://bugzilla.kernel.org/show_bug.cgi?id=44211

> I think we just fixed this, what kernel version are you seeing this
> problem on?

It's still there. Pulled a fresh tree, this is
1b7fa4c27111757789b21bb78543317dad4cfd08 from
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

So if this is actually not intentional and getting fixed instead, the
changes haven't made their way upstream yet.

Thanks,

-J

Alan Cox

unread,

Jul 6, 2012, 3:40:01 PM7/6/12

to

On Fri, 6 Jul 2012 20:45:44 +0300
Jukka Ollila <jiik...@gmail.com> wrote:

> Hello,
>
> A few days ago I filed a kernel regression report concerning a change
> in /proc/kmsg behaviour with short reads:
>
> https://bugzilla.kernel.org/show_bug.cgi?id=44211
>
> The comments suggest that this is probably intentional, but that it
> would be best make sure that the current semantics wrt short reads are
> as intended.
>
> The problem appears on a Debian (unstable) system that drains
> /proc/kmsg into a separate fifo read by klogd(8):
>
> /bin/dd bs=1 if=/proc/kmsg of=/var/run/klogd/kmsg
>
> With the recent kernel logging changes this /bin/dd exits immediately,
> as 1-byte reads are shorter than any log message could possibly be and
> read() returns 0. No dd feeding the fifo results in no logging and a
> rather unhappy klogd on the reading end of /var/run/klogd/kmsg.
>
> I suppose a safe solution is to only do reads that are big enough for
> any single kernel message, but this is still a change that affects
> user space being shipped, so some might find it surprising.
>
> I don't know what other distros do. Is it just Debian being the odd one out?

If this is observed on an actual standard distro userspace and breaks it
then its a regression and it needs fixing or reverting.

Cc'ing Linus

Linus Torvalds

unread,

Jul 6, 2012, 4:40:02 PM7/6/12

to

Kay, this needs to be fixed.

Suggested fix: just use the 'seq_printf()' interfaces, which do the
proper buffering, and allow any size reads of various packetized data.

Of course, I'd also suggest that whoever was the genius who thought it
was a good idea to read things ONE F*CKING BYTE AT A TIME with system
calls for each byte should be retroactively aborted. Who the f*ck does
idiotic things like that? How did they noty die as babies, considering
that they were likely too stupid to find a tit to suck on?

Linus

Linus Torvalds

unread,

Jul 6, 2012, 6:10:01 PM7/6/12

to

On Fri, Jul 6, 2012 at 3:05 PM, Jukka Ollila <jiik...@gmail.com> wrote:
>
> Now this got me wondering if Debian _unstable_ actually qualifies as a
> standard distro userspace.

Oh, if the kernel breaks some standard user space, that counts. Tons
of people run Debian unstable (and from my limited interactions with
it, for damn good reasons: -stable tends to run so old versions of
everything that you have to sometimes deal with cuneiform writing when
using it)

Jukka Ollila

unread,

Jul 6, 2012, 6:10:02 PM7/6/12

to

On Fri, Jul 6, 2012 at 10:38 PM, Alan Cox <al...@lxorguk.ukuu.org.uk> wrote:
> On Fri, 6 Jul 2012 20:45:44 +0300
> Jukka Ollila <jiik...@gmail.com> wrote:
>> A few days ago I filed a kernel regression report concerning a change
>> in /proc/kmsg behaviour with short reads:
>>
>> https://bugzilla.kernel.org/show_bug.cgi?id=44211

>> I don't know what other distros do. Is it just Debian being the odd one out?
>
> If this is observed on an actual standard distro userspace and breaks it
> then its a regression and it needs fixing or reverting.

Now this got me wondering if Debian _unstable_ actually qualifies as a
standard distro userspace.

And I did a little digging. According to the Debian package tracking
system[1] it would seem that the _stable_ distro carries a version
that doesn't do the dd shuffling at all and probably runs its klogd as
root, reading /proc/kmsg directly. That may or may not work with
3.5-rc kernels, depending on how big its reads are. I'm CCing the
listed maintainer just in case.

The unstable version does the problematic dd bs=1 trick. Also the
Ubuntu diff in the PTS has the dd. But I have no idea how Ubuntu does
it's release management. Not to mention other derivatives.

So it might not be too late to fix this in userspace, in case the
kernel wants to move on. Some real distro people might want to chime
in.

Thanks,

-J

[1] http://packages.debian.org/search?keywords=sysklogd

Kay Sievers

unread,

Jul 7, 2012, 5:30:02 PM7/7/12

to

On Fri, Jul 6, 2012 at 10:30 PM, Linus Torvalds
<torv...@linux-foundation.org> wrote:
> Kay, this needs to be fixed.
>
> Suggested fix: just use the 'seq_printf()' interfaces, which do the
> proper buffering, and allow any size reads of various packetized data.

I'll have a look.

> Of course, I'd also suggest that whoever was the genius who thought it
> was a good idea to read things ONE F*CKING BYTE AT A TIME with system
> calls for each byte should be retroactively aborted. Who the f*ck does
> idiotic things like that? How did they noty die as babies, considering
> that they were likely too stupid to find a tit to suck on?

Maybe the bs=1 in the dd call stands for bullshit. :)

Kay

Kay Sievers

unread,

Jul 7, 2012, 8:40:01 PM7/7/12

to

On Sat, Jul 7, 2012 at 12:05 AM, Jukka Ollila <jiik...@gmail.com> wrote:
> And I did a little digging. According to the Debian package tracking
> system[1] it would seem that the _stable_ distro carries a version
> that doesn't do the dd shuffling at all and probably runs its klogd as
> root, reading /proc/kmsg directly. That may or may not work with
> 3.5-rc kernels, depending on how big its reads are. I'm CCing the
> listed maintainer just in case.
>
> The unstable version does the problematic dd bs=1 trick. Also the
> Ubuntu diff in the PTS has the dd. But I have no idea how Ubuntu does
> it's release management. Not to mention other derivatives.

Just a note, unrelated to fixing the issue:

/proc/kmsg can be opened many times, but it never behaved too well
when this was done.

The processes might wake up at the same time, but only one of them
gets the data. If someone does a "cat /proc/kmsg' in parallel, the
read() in dd might return 0 and dd will exit. And this is not a recent
change, it was always the case.

Using dd here is a pretty silly idea for many reasons.

Kay

Kay Sievers

unread,

Jul 7, 2012, 10:20:01 PM7/7/12

to

On Sat, 2012-07-07 at 23:19 +0200, Kay Sievers wrote:
> On Fri, Jul 6, 2012 at 10:30 PM, Linus Torvalds
> <torv...@linux-foundation.org> wrote:
> > Kay, this needs to be fixed.
> >
> > Suggested fix: just use the 'seq_printf()' interfaces, which do the
> > proper buffering, and allow any size reads of various packetized data.
>
> I'll have a look.

Hmm, we need to block in the read() when we have no data, and we need to
support concurrent readers where only one of them sees the data, and we
need O_NONBLOCK support. Maybe I miss something but the seq_file stuff
seems to get complicated, as it takes a mutex internally which gets in
the way of the O_NONBLOCK stuff.

Here is what seems to work for me. If the buffer is to small to fit the
first record, we deliver a partial record, and start from that offset
again with the next read().

I'll need to do more testing tomorrow.

Thanks,
Kay

---
kernel/printk.c | 28 ++++++++++++++++++++--------
1 file changed, 20 insertions(+), 8 deletions(-)

--- a/kernel/printk.c
+++ b/kernel/printk.c
@@ -217,6 +217,7 @@ static DEFINE_RAW_SPINLOCK(logbuf_lock);
/* the next printk record to read by syslog(READ) or /proc/kmsg */
static u64 syslog_seq;
static u32 syslog_idx;
+static size_t syslog_partial;

/* index and sequence number of the first record stored in the buffer */
static u64 log_first_seq;
@@ -890,22 +891,33 @@ static int syslog_print(char __user *buf

while (size > 0) {
size_t n;
+ size_t skip;

raw_spin_lock_irq(&logbuf_lock);
if (syslog_seq < log_first_seq) {
/* messages are gone, move to first one */
syslog_seq = log_first_seq;
syslog_idx = log_first_idx;
+ syslog_partial = 0;
}
if (syslog_seq == log_next_seq) {
raw_spin_unlock_irq(&logbuf_lock);
break;
}
+
+ skip = syslog_partial;
msg = log_from_idx(syslog_idx);
n = msg_print_text(msg, true, text, LOG_LINE_MAX);
- if (n <= size) {
+ if (n - syslog_partial <= size) {
+ /* message fits into buffer, move forward */
syslog_idx = log_next(syslog_idx);
syslog_seq++;
+ n -= syslog_partial;
+ syslog_partial = 0;
+ } else if (!len){
+ /* partial read(), remember position */
+ n = size;
+ syslog_partial += n;
} else
n = 0;
raw_spin_unlock_irq(&logbuf_lock);
@@ -913,17 +925,15 @@ static int syslog_print(char __user *buf
if (!n)
break;

- len += n;
- size -= n;
- buf += n;
- n = copy_to_user(buf - n, text, n);
-
- if (n) {
- len -= n;
+ if (copy_to_user(buf, text + skip, n)) {
if (!len)
len = -EFAULT;
break;
}
+
+ len += n;
+ size -= n;
+ buf += n;
}

kfree(text);
@@ -1107,6 +1117,7 @@ int do_syslog(int type, char __user *buf
/* messages are gone, move to first one */
syslog_seq = log_first_seq;
syslog_idx = log_first_idx;
+ syslog_partial = 0;
}
if (from_file) {
/*
@@ -1129,6 +1140,7 @@ int do_syslog(int type, char __user *buf
idx = log_next(idx);
seq++;
}
+ error -= syslog_partial;
}
raw_spin_unlock_irq(&logbuf_lock);
break;

Kay Sievers

unread,

Jul 8, 2012, 8:40:02 AM7/8/12

to

On Fri, Jul 6, 2012 at 7:55 PM, Greg KH <gr...@kroah.com> wrote:
> On Fri, Jul 06, 2012 at 08:45:44PM +0300, Jukka Ollila wrote:
>> Hello,
>>
>> A few days ago I filed a kernel regression report concerning a change
>> in /proc/kmsg behaviour with short reads:
>>
>> https://bugzilla.kernel.org/show_bug.cgi?id=44211
>>
>> The comments suggest that this is probably intentional, but that it
>> would be best make sure that the current semantics wrt short reads are
>> as intended.
>>
>> The problem appears on a Debian (unstable) system that drains
>> /proc/kmsg into a separate fifo read by klogd(8):
>>
>> /bin/dd bs=1 if=/proc/kmsg of=/var/run/klogd/kmsg
>>
>> With the recent kernel logging changes this /bin/dd exits immediately,
>> as 1-byte reads are shorter than any log message could possibly be and
>> read() returns 0. No dd feeding the fifo results in no logging and a
>> rather unhappy klogd on the reading end of /var/run/klogd/kmsg.
>>
>> I suppose a safe solution is to only do reads that are big enough for
>> any single kernel message, but this is still a change that affects
>> user space being shipped, so some might find it surprising.
>>
>> I don't know what other distros do. Is it just Debian being the odd one out?
>
> I think we just fixed this, what kernel version are you seeing this
> problem on?
>
> Kay did your other patches that I just accepted resolve this?

No, it's not fixed. We avoided the delivery of partial messages to
userspace, which obviously does not got too well with 1-byte reads. :)

I posted a patch in the other thread:
https://lkml.org/lkml/2012/7/7/118

Side note:
The patch will not fix the underlying problem, but just make it behave
more like it was and allow partial message reads. This is a years old
problem, the net is full of bugreports of stuff going wrong with
running dd bs=1 on /proc/kmsg. It is a really stupid idea, and can not
work for many other reasons too. The interface can not safely be used
that way, it does not have the usual semantics, it always returned 0
for read() whenever it needed to.

Thanks,
Kay

Alan Cox

unread,

Jul 8, 2012, 9:00:01 AM7/8/12

to

> The patch will not fix the underlying problem, but just make it behave
> more like it was and allow partial message reads. This is a years old
> problem, the net is full of bugreports of stuff going wrong with
> running dd bs=1 on /proc/kmsg. It is a really stupid idea, and can not
> work for many other reasons too. The interface can not safely be used
> that way, it does not have the usual semantics, it always returned 0
> for read() whenever it needed to.

If you are breaking the semantics perhaps that should also get fixed ?

Alan

Kay Sievers

unread,

Jul 8, 2012, 9:10:02 AM7/8/12

to

On Sun, Jul 8, 2012 at 2:59 PM, Alan Cox <al...@lxorguk.ukuu.org.uk> wrote:
>> The patch will not fix the underlying problem, but just make it behave
>> more like it was and allow partial message reads. This is a years old
>> problem, the net is full of bugreports of stuff going wrong with
>> running dd bs=1 on /proc/kmsg. It is a really stupid idea, and can not
>> work for many other reasons too. The interface can not safely be used
>> that way, it does not have the usual semantics, it always returned 0
>> for read() whenever it needed to.
>
> If you are breaking the semantics perhaps that should also get fixed ?

I hopefully just restored the old semantics now.

Fixing it properly would be a bigger code change, and it can't use the
far-too-simple tunneling through the syslog() syscall to feed
/proc/kmsg.

If the seq_file interface could be used, that would probably be the
best option, but I have no good idea how to make blocking reads, and
concurrent non-blocks work with the seq_file stuff.

This is how read() in /proc/kmsg works and it is not protected by a
lock or anything, and there is a not too small window between the
check and the action. Things just go wrong if there is more than a
single reader, but that was the case since forever.

static ssize_t kmsg_read(struct file *file, char __user *buf,
size_t count, loff_t *ppos)
{
if ((file->f_flags & O_NONBLOCK) &&
!do_syslog(SYSLOG_ACTION_SIZE_UNREAD, NULL, 0, SYSLOG_FROM_FILE))
return -EAGAIN;
return do_syslog(SYSLOG_ACTION_READ, buf, count, SYSLOG_FROM_FILE);
}

Thanks,
Kay

Jan Engelhardt

unread,

Aug 10, 2012, 9:00:02 AM8/10/12

to

On Saturday 2012-07-07 23:19, Kay Sievers wrote:
>On Fri, Jul 6, 2012 at 10:30 PM, Linus Torvalds
><torv...@linux-foundation.org> wrote:
>> Kay, this needs to be fixed.
>>
>> Suggested fix: just use the 'seq_printf()' interfaces, which do the
>> proper buffering, and allow any size reads of various packetized data.
>
>I'll have a look.
>
>> Of course, I'd also suggest that whoever was the genius who thought it
>> was a good idea to read things ONE F*CKING BYTE AT A TIME with system
>> calls for each byte should be retroactively aborted. Who the f*ck does
>> idiotic things like that? How did they noty die as babies, considering
>> that they were likely too stupid to find a tit to suck on?
>
>Maybe the bs=1 in the dd call stands for bullshit. :)

It seems people need to be taught to use ddrescue, stringently.
Having to calculate appropriate values for bs= and count= when
you just want to transfer bytes is already a crime, not to mention
the problem when you have a prime number of bytes to transfer.