[PATCH] kernel: escape non-ASCII and control characters in printk()

Vasiliy Kulikov

unread,

Jun 22, 2011, 6:00:02 AM6/22/11

to

This patch escapes all characters outside of allowed '\n' plus 0x20-0x7E
charset passed to printk().

There are numerous printk() instances with user supplied input as "%s"
data, and unprivileged user may craft log messages with substrings
containing control characters via these printk()s. Control characters
might fool root viewing the logs via tty.

Printing non-ASCII characters is not portable since not everyone sees
the same characters after 0xFF. If any driver use it to print some
binary data, it should be fixed. Not fixed code will print hex codes
of the binary data.

On testing Samsung Q310 laptop there are no users of chars outside of the
restricted charset.

Signed-off-by: Vasiliy Kulikov <seg...@openwall.com>
---

This patch does nothing with crafted "%s" data with '\n' inside. It
allows unprivileged user to craft arbitrary log messages via breaking
log lines boundaries. It is a bit tricky to fix it compatible way.
Limiting "%s" to one line in vscnprintf() would break legitimate users
of the multiline feature. Intoducing new "%S" format for single lines
makes little sense as there are tons of printk() calls that should be
already restricted to one line.

Proposals about '\n' inside of '%s" are welcome.

kernel/printk.c | 16 +++++++++++++++-
1 files changed, 15 insertions(+), 1 deletions(-)

diff --git a/kernel/printk.c b/kernel/printk.c
index 3518539..1f23988 100644
--- a/kernel/printk.c
+++ b/kernel/printk.c
@@ -671,6 +671,20 @@ static void emit_log_char(char c)
logged_chars++;
}

+static void emit_log_char_escaped(char c)
+{
+ char buffer[8];
+ int i, len;
+
+ if ((c >= ' ' && c < 127) || c == '\n')
+ emit_log_char(c);
+ else {
+ len = sprintf(buffer, "#%02x", c);
+ for (i = 0; i < len; i++)
+ emit_log_char(buffer[i]);
+ }
+}
+
/*
* Zap console related locks when oopsing. Only zap at most once
* every 10 seconds, to leave time for slow consoles to print a
@@ -938,7 +952,7 @@ asmlinkage int vprintk(const char *fmt, va_list args)
break;
}

- emit_log_char(*p);
+ emit_log_char_escaped(*p);
if (*p == '\n')
new_text_line = 1;
}
--
1.7.0.4
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Greg KH

unread,

Jun 22, 2011, 11:40:03 AM6/22/11

to

On Wed, Jun 22, 2011 at 01:53:41PM +0400, Vasiliy Kulikov wrote:
> This patch escapes all characters outside of allowed '\n' plus 0x20-0x7E
> charset passed to printk().
>
> There are numerous printk() instances with user supplied input as "%s"
> data, and unprivileged user may craft log messages with substrings
> containing control characters via these printk()s. Control characters
> might fool root viewing the logs via tty.

There are "numerous" places this could happen? Shouldn't this be
handled by the viewers of the log file and not the kernel itself?

And what could these control characters cause to be "fooled"?

greg k-h

Vasiliy Kulikov

unread,

Jun 22, 2011, 12:20:01 PM6/22/11

to

Hi Greg,

On Wed, Jun 22, 2011 at 08:37 -0700, Greg KH wrote:
> On Wed, Jun 22, 2011 at 01:53:41PM +0400, Vasiliy Kulikov wrote:
> > This patch escapes all characters outside of allowed '\n' plus 0x20-0x7E
> > charset passed to printk().
> >
> > There are numerous printk() instances with user supplied input as "%s"
> > data, and unprivileged user may craft log messages with substrings
> > containing control characters via these printk()s. Control characters
> > might fool root viewing the logs via tty.
>
> There are "numerous" places this could happen?

From where I'm working now: warn_setuid_and_fcaps_mixed(),
get_file_caps(). task->comm is not filtered and sometimes it is used in
log entries related to the process. I don't know a global policy of
filtering such user supplied strings and didn't spot such filtering
(however, maybe I've missed it somewhere). It's MUCH simplier to filter
it in one place rather than hunt after the callers all over the kernel.

> Shouldn't this be
> handled by the viewers of the log file and not the kernel itself?

What viewers? Do you mean syslog implementation? IMO it's not its
duty, it logs what it was fed. It should be entiely app's care to
maintain consisten logs, syslog might not know precise log structure.
What should syslog do if app's log structure changes?

> And what could these control characters cause to be "fooled"?

E.g. line up:

ALERT!
^[1AUseless line.

TTY will interpret it as a single line "Useless line", ALERT will be
fully losen.

Thanks,

--
Vasiliy Kulikov
http://www.openwall.com - bringing security into open computing environments

Joe Perches

unread,

Jun 22, 2011, 12:40:01 PM6/22/11

to

On Wed, 2011-06-22 at 13:53 +0400, Vasiliy Kulikov wrote:
> This patch escapes all characters outside of allowed '\n' plus 0x20-0x7E
> charset passed to printk().

[]

> diff --git a/kernel/printk.c b/kernel/printk.c
[]

> +static void emit_log_char_escaped(char c)
> +{
> + char buffer[8];
> + int i, len;
> +
> + if ((c >= ' ' && c < 127) || c == '\n')

if (isprint(c))

Why not add this to emit_log_char?

Vasiliy Kulikov

unread,

Jun 22, 2011, 1:00:03 PM6/22/11

to

On Wed, Jun 22, 2011 at 09:38 -0700, Joe Perches wrote:
> On Wed, 2011-06-22 at 13:53 +0400, Vasiliy Kulikov wrote:
> > This patch escapes all characters outside of allowed '\n' plus 0x20-0x7E
> > charset passed to printk().
>
> []
>
> > diff --git a/kernel/printk.c b/kernel/printk.c
> []
> > +static void emit_log_char_escaped(char c)
> > +{
> > + char buffer[8];
> > + int i, len;
> > +
> > + if ((c >= ' ' && c < 127) || c == '\n')
>
> if (isprint(c))

#define isprint(c) ((__ismask(c)&(_P|_U|_L|_D|_SP)) != 0)

It slightly differs from what I've written. It (1) lacks '\n', (2)
passes non-ASCII symbols. How would non-ASCII symbols look like if
terminal doesn't support it? (I don't know, merely asking).

But if >159 (the number got from _ctype[]) is not an issue then
(isprint(c) || (c == '\n')) looks really better.

> Why not add this to emit_log_char?

No real need, but someone may want to explicitly bypass the check, it
should use emit_log_char() instead of emit_log_char_escaped().

Thanks,

--
Vasiliy Kulikov
http://www.openwall.com - bringing security into open computing environments

Joe Perches

unread,

Jun 22, 2011, 1:20:02 PM6/22/11

to

On Wed, 2011-06-22 at 20:53 +0400, Vasiliy Kulikov wrote:
> On Wed, Jun 22, 2011 at 09:38 -0700, Joe Perches wrote:
> > > + if ((c >= ' ' && c < 127) || c == '\n')
> > if (isprint(c))
> #define isprint(c) ((__ismask(c)&(_P|_U|_L|_D|_SP)) != 0)
> It slightly differs from what I've written. It (1) lacks '\n',

You still need tab, so:

if (isprint(c) || isspace(c))

> (2) passes non-ASCII symbols.
> How would non-ASCII symbols look like if
> terminal doesn't support it? (I don't know, merely asking).

I believe most would work fine.

Are there any lp01's or la36's still connected
to a serial console?

I don't know what would happen to a 7 bit
ascii only device.

Vasiliy Kulikov

unread,

Jun 22, 2011, 1:50:01 PM6/22/11

to

On Wed, Jun 22, 2011 at 10:14 -0700, Joe Perches wrote:
> On Wed, 2011-06-22 at 20:53 +0400, Vasiliy Kulikov wrote:
> > On Wed, Jun 22, 2011 at 09:38 -0700, Joe Perches wrote:
> > > > + if ((c >= ' ' && c < 127) || c == '\n')
> > > if (isprint(c))
> > #define isprint(c) ((__ismask(c)&(_P|_U|_L|_D|_SP)) != 0)
> > It slightly differs from what I've written. It (1) lacks '\n',
>
> You still need tab,

Correct.

> so:
>
> if (isprint(c) || isspace(c))

No, it also allows vertical tabs. Looking into __ctype only ' ', '\n' and
'\t' should be allowed among all _S, so

if (isprint(c) || (c == '\n') || (c == '\t'))

Thanks,

--
Vasiliy Kulikov
http://www.openwall.com - bringing security into open computing environments

Alan Cox

unread,

Jun 22, 2011, 2:20:01 PM6/22/11

to

On Wed, 22 Jun 2011 09:38:03 -0700
Joe Perches <j...@perches.com> wrote:

> On Wed, 2011-06-22 at 13:53 +0400, Vasiliy Kulikov wrote:
> > This patch escapes all characters outside of allowed '\n' plus 0x20-0x7E
> > charset passed to printk().

I think this is fundamentally wrong.

It makes sense for some interfaces but not others and arbitarily doing it
makes a nasty mess of anything like file name printing in non English
languages.

The right way to do this IMHO is for the console device itself to have a
filter function, the default would be the 0x20-0x7E but for example with
any console which has an accompanying tty device the right behaviour
depends upon the port UTF8 flag (IUTF8).

If that is set you shouldn't be filtering out unicode, just control codes.

Minor other nit is that you might want to allow BEL through and you
certainly want to allow tab through.

The core code should not be hardcoding policy assumptions about symbol
sets and ASCII, for an awful lot of consoles today that assumption is
just plain wrong, for others it makes sense

So with tty maintainer hat on - NAK to the current approach but a good
idea to do it properly.

Vasiliy Kulikov

unread,

Jun 22, 2011, 3:10:03 PM6/22/11

to

Hi Alan,

On Wed, Jun 22, 2011 at 19:10 +0100, Alan Cox wrote:
> If that is set you shouldn't be filtering out unicode, just control codes.

OK.

> Minor other nit is that you might want to allow BEL through and you
> certainly want to allow tab through.

In what situation do you think BEL makes sense in kernel log? I cannot
image the situation. Alarms should use KERN_EMERG/KERN_ALERT log level.

> The core code should not be hardcoding policy assumptions about symbol
> sets and ASCII, for an awful lot of consoles today that assumption is
> just plain wrong, for others it makes sense

No problem, I don't insist.

> So with tty maintainer hat on - NAK to the current approach but a good
> idea to do it properly.

The final check should be:

if (iscntrl(c) && (c != '\n') && (c != '\t'))

Any comments against this variant?

Thanks,

--
Vasiliy Kulikov
http://www.openwall.com - bringing security into open computing environments

Matthew Garrett

unread,

Jun 23, 2011, 9:40:02 AM6/23/11

to

On Wed, Jun 22, 2011 at 08:37:42AM -0700, Greg KH wrote:
> On Wed, Jun 22, 2011 at 01:53:41PM +0400, Vasiliy Kulikov wrote:
> > This patch escapes all characters outside of allowed '\n' plus 0x20-0x7E
> > charset passed to printk().
> >
> > There are numerous printk() instances with user supplied input as "%s"
> > data, and unprivileged user may craft log messages with substrings
> > containing control characters via these printk()s. Control characters
> > might fool root viewing the logs via tty.
>
> There are "numerous" places this could happen?

USB product identifiers?

--
Matthew Garrett | mj...@srcf.ucam.org

Vasiliy Kulikov

unread,

Jun 23, 2011, 11:30:02 AM6/23/11

to

This patch escapes control characters fed to printk() except '\n' and '\t'.

There are numerous printk() instances with user supplied input as "%s"
data, and unprivileged user may craft log messages with substrings
containing control characters via these printk()s. Control characters

might fool root viewing the logs via tty, e.g. using ^[1A to suppress
the previous log line.

On the testing Samsung Q310 laptop there are no users of chars outside
of the restricted charset.

v2 - Allow chars with code >127. Allow tabs.

Reported-by: Solar Designer <so...@openwall.com>

Signed-off-by: Vasiliy Kulikov <seg...@openwall.com>
---

kernel/printk.c | 17 ++++++++++++++++-
1 files changed, 16 insertions(+), 1 deletions(-)

---
diff --git a/kernel/printk.c b/kernel/printk.c
index 3518539..727ff7d 100644
--- a/kernel/printk.c
+++ b/kernel/printk.c
@@ -41,6 +41,7 @@
#include <linux/cpu.h>
#include <linux/notifier.h>
#include <linux/rculist.h>
+#include <linux/ctype.h>

#include <asm/uaccess.h>

@@ -671,6 +672,20 @@ static void emit_log_char(char c)
logged_chars++;
}

+static void emit_log_char_escaped(char c)
+{
+ char buffer[8];
+ int i, len;
+

+ if (!iscntrl(c) || (c == '\n') || (c == '\t'))
+ emit_log_char(c);
+ else {
+ len = sprintf(buffer, "#x%02x", c);

+ for (i = 0; i < len; i++)
+ emit_log_char(buffer[i]);
+ }
+}
+
/*
* Zap console related locks when oopsing. Only zap at most once
* every 10 seconds, to leave time for slow consoles to print a

@@ -938,7 +953,7 @@ asmlinkage int vprintk(const char *fmt, va_list args)

break;
}

- emit_log_char(*p);
+ emit_log_char_escaped(*p);
if (*p == '\n')
new_text_line = 1;
}

---

Geert Uytterhoeven

unread,

Jun 23, 2011, 2:20:02 PM6/23/11

to

On Wed, Jun 22, 2011 at 21:07, Vasiliy Kulikov <seg...@openwall.com> wrote:
> On Wed, Jun 22, 2011 at 19:10 +0100, Alan Cox wrote:
>> If that is set you shouldn't be filtering out unicode, just control codes.
>
> OK.
>
>> Minor other nit is that you might want to allow BEL through and you
>> certainly want to allow tab through.
>
> In what situation do you think BEL makes sense in kernel log? I cannot
> image the situation. Alarms should use KERN_EMERG/KERN_ALERT log level.

Does BEL work? Last time I tried fancy things (e.g. color output
depending on the
message level), it didn't work. I only got strange characters.

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds

Greg KH

unread,

Jun 23, 2011, 5:50:02 PM6/23/11

to

On Thu, Jun 23, 2011 at 02:36:05PM +0100, Matthew Garrett wrote:
> On Wed, Jun 22, 2011 at 08:37:42AM -0700, Greg KH wrote:
> > On Wed, Jun 22, 2011 at 01:53:41PM +0400, Vasiliy Kulikov wrote:
> > > This patch escapes all characters outside of allowed '\n' plus 0x20-0x7E
> > > charset passed to printk().
> > >
> > > There are numerous printk() instances with user supplied input as "%s"
> > > data, and unprivileged user may craft log messages with substrings
> > > containing control characters via these printk()s. Control characters
> > > might fool root viewing the logs via tty.
> >
> > There are "numerous" places this could happen?
>
> USB product identifiers?

That's one, sure, but the ability to overwrite something else that you
don't want someone to see based on plugging in a USB device is pretty
slim. If I can plug any type of USB device I want into the system, odds
are I just owned it anyway...

greg k-h

Willy Tarreau

unread,

Jun 25, 2011, 5:00:02 PM6/25/11

to

Hi Vasiliy,

On Wed, Jun 22, 2011 at 11:07:39PM +0400, Vasiliy Kulikov wrote:
> The final check should be:
>
> if (iscntrl(c) && (c != '\n') && (c != '\t'))
>
> Any comments against this variant?

In fact, I'm not sure we're adding that much protection with such a check
because as long as the '\n' is allowed, it's easy to fake logs. For instance :

$ cd /tmp
$ echo "main() { *(int*)0=0; }" | gcc -xc -o fail -
$ ln -s fail $'Oops: 000\nklogd'
$ ./Oops*
$ dmesg|tail -2
Oops: 000
klogd[1927]: segfault at 0 ip 0000000008048337 sp 00000000ffb54ba4 error 6 in fail[8048000+1000]
$

In an ideal world, only \n should be escaped since it's the only delimitor,
and klogd would get the raw logs with lines correctly sequenced. Other
characters should probably be escaped before going to log files if those
files are supposed to be readable on a terminal.

But I recall it was not possible to escape \n when we worked on the subject
several years ago on 2.4, because some drivers used to send multi-line logs
in a single printk().

The fundamental issue we're facing is that neither inputs nor outputs have
been clearly typed in the past. I tend to consider that a log file is readable
by "tail -f" and a such should not contain dangerous chars, however I also
tend to prefer sending raw logs over the network when they are archived by
different means. In the end it makes sense for the kernel and klogd to
exchange raw logs and syslogd should encode them when pushing them to a
file.

Best regards,
Willy

Ingo Molnar

unread,

Jun 26, 2011, 6:40:02 AM6/26/11

to

Nit: please use balanced curly braces.

Also, i think it would be better to make this opt-out, i.e. exclude
the handful of control characters that are harmful (such as backline
and console escape), instead of trying to include the known-useful
ones.

The whole non-ASCII-languages issue would not have happened if such
an approach was taken.

It's also the better approach for the kernel: we handle known harmful
things and are permissive otherwise.

Thanks,

Ingo

Vasiliy Kulikov

unread,

Jun 26, 2011, 1:00:01 PM6/26/11

to

Hi Ingo,

On Sun, Jun 26, 2011 at 12:39 +0200, Ingo Molnar wrote:
> > + if (!iscntrl(c) || (c == '\n') || (c == '\t'))
> > + emit_log_char(c);
> > + else {
> > + len = sprintf(buffer, "#x%02x", c);
> > + for (i = 0; i < len; i++)
> > + emit_log_char(buffer[i]);
> > + }
>
> Nit: please use balanced curly braces.

OK.

> Also, i think it would be better to make this opt-out, i.e. exclude
> the handful of control characters that are harmful (such as backline
> and console escape), instead of trying to include the known-useful
> ones.

Do you see any issue with the check above?

> The whole non-ASCII-languages issue would not have happened if such
> an approach was taken.
>
> It's also the better approach for the kernel: we handle known harmful
> things and are permissive otherwise.

I hope it is not a universal tip for the whole kernel development.
Black lists are almost always suck.

Could you instantly answer without reading the previous discussion what
control characters are harmful, what are sometimes harmful (on some
ttys), and what are always safe and why (or even answer why it is
harmful at all)? I'm not a tty guy and I have to read console_codes(4)
or similar docs to answer this question, the majority of kernel devs
might have to read the docs too.

Writing the black list implies the full knowledge of _all_ possible
malformed input values, which is somewhat hard to achieve (or even
impossible). Some developers might not be interested in learning such
details, but still interested in how this API can be used.

Quite the contrary, the allowed values set makes sense to the developer
and more stricktly defines the API in question. Discussing the API
goals and reaching the consensus about its usage is much more
productive. It might catch some wrong and dangerous API misuses. If the
allowed set becomes too strict one day, no problem - just explicitly
relax the check. If you lose some value in the black list (e.g. it
becomes known that some control char sequence can be used to fake the
logs), the miss significance would be higher.

And from the cynical point of view the white list is simply smaller and
cleaner than the black list.

Thanks,

--
Vasiliy Kulikov
http://www.openwall.com - bringing security into open computing environments

Ingo Molnar

unread,

Jun 26, 2011, 2:30:02 PM6/26/11

to

* Vasiliy Kulikov <seg...@openwall.com> wrote:

> > Also, i think it would be better to make this opt-out, i.e.
> > exclude the handful of control characters that are harmful (such
> > as backline and console escape), instead of trying to include the
> > known-useful ones.
>
> Do you see any issue with the check above?

There were clear problems with the first version you posted and
that's enough proof to request the exclusion of known-dangerous
characters instead of including known-useful characters.

> > The whole non-ASCII-languages issue would not have happened if
> > such an approach was taken.
> >
> > It's also the better approach for the kernel: we handle known
> > harmful things and are permissive otherwise.
>
> I hope it is not a universal tip for the whole kernel development.
> Black lists are almost always suck.
>
> Could you instantly answer without reading the previous discussion
> what control characters are harmful, what are sometimes harmful (on
> some ttys), and what are always safe and why (or even answer why it
> is harmful at all)? I'm not a tty guy and I have to read
> console_codes(4) or similar docs to answer this question, the
> majority of kernel devs might have to read the docs too.
>
> Writing the black list implies the full knowledge of _all_ possible
> malformed input values, which is somewhat hard to achieve (or even
> impossible). Some developers might not be interested in learning
> such details, but still interested in how this API can be used.

The point is to protect against known harm and not to turn it into
"lets disable things broadly, just in case something unusual is
harmful" kind of voodoo programming.

And yes, i very much expect someone *restricting* Linux utility to
have *full* knowledge of the topic involved.

> Quite the contrary, the allowed values set makes sense to the
> developer and more stricktly defines the API in question.
> Discussing the API goals and reaching the consensus about its usage
> is much more productive. It might catch some wrong and dangerous
> API misuses. If the allowed set becomes too strict one day, no
> problem - just explicitly relax the check. If you lose some value
> in the black list (e.g. it becomes known that some control char
> sequence can be used to fake the logs), the miss significance would
> be higher.
>
> And from the cynical point of view the white list is simply smaller
> and cleaner than the black list.

A black list is well-defined: it disables the display of certain
characters because they are *known to be dangerous*.

A white list on the other hand does it the wrong way around: it tries
to put the 'burden of proof' on the useful, good guys - and that's
counter-productive really.

I don't mind protecting against known threats, with emphasis on
'known'.

Thanks,

Ingo

Vasiliy Kulikov

unread,

Jun 26, 2011, 3:10:01 PM6/26/11

to

On Sun, Jun 26, 2011 at 20:26 +0200, Ingo Molnar wrote:
> > > Also, i think it would be better to make this opt-out, i.e.
> > > exclude the handful of control characters that are harmful (such
> > > as backline and console escape), instead of trying to include the
> > > known-useful ones.
> >
> > Do you see any issue with the check above?
>
> There were clear problems with the first version you posted and
> that's enough proof to request the exclusion of known-dangerous
> characters instead of including known-useful characters.

It doesn't proof anything. If I/someone else did a mistake with
blacklisting would you say it is enough proof to request the inclusion
of well-known allowed characters?

> A black list is well-defined: it disables the display of certain
> characters because they are *known to be dangerous*.

What do you do with dangerous characters that are *not yet known* to be
dangerous?

> A white list on the other hand does it the wrong way around: it tries
> to put the 'burden of proof' on the useful, good guys - and that's
> counter-productive really.

Really? I think strict API definition is productive, unlike using it
in cases where it looks like working, but creating tricky and obscure
bugs.

Yes, drawing multicolor logs is funny, but ...egrrr... printk() is not
written for these things.

Thanks,

--
Vasiliy Kulikov
http://www.openwall.com - bringing security into open computing environments

Ingo Molnar

unread,

Jun 26, 2011, 4:00:02 PM6/26/11

to

* Vasiliy Kulikov <seg...@openwall.com> wrote:

> On Sun, Jun 26, 2011 at 20:26 +0200, Ingo Molnar wrote:
>
> > > > Also, i think it would be better to make this opt-out, i.e.
> > > > exclude the handful of control characters that are harmful
> > > > (such as backline and console escape), instead of trying to
> > > > include the known-useful ones.
> > >
> > > Do you see any issue with the check above?
> >
> > There were clear problems with the first version you posted and
> > that's enough proof to request the exclusion of known-dangerous
> > characters instead of including known-useful characters.
>
> It doesn't proof anything. If I/someone else did a mistake with
> blacklisting would you say it is enough proof to request the
> inclusion of well-known allowed characters?

No, because the problems such a mistake causes are not equivalent: it
would have been far more harmful to not print out the *very real*
product names written in some non-US language than to accidentally
include some control character you did not think of.

> > A black list is well-defined: it disables the display of certain
> > characters because they are *known to be dangerous*.
>
> What do you do with dangerous characters that are *not yet known*
> to be dangerous?

I'm ready to act on facts only. Also, i really prefer the policy of
acting on known dangers instead of being afraid of the unknown.

The whole 'trust but verify' thing.

> > A white list on the other hand does it the wrong way around: it
> > tries to put the 'burden of proof' on the useful, good guys - and
> > that's counter-productive really.
>
> Really? I think strict API definition is productive, unlike using
> it in cases where it looks like working, but creating tricky and
> obscure bugs.

You werent really creating a well-defined API here, were you?

> Yes, drawing multicolor logs is funny, but ...egrrr... printk() is
> not written for these things.

maybe, but i still think that such a change works better, has fewer
unintended side effects and is better documented if it excludes known
dangers instead of trying to include known useful bits imperfectly.

Thanks,

Ingo

Vasiliy Kulikov

unread,

Jun 26, 2011, 4:40:01 PM6/26/11

to

On Sun, Jun 26, 2011 at 21:46 +0200, Ingo Molnar wrote:
> > > > > Also, i think it would be better to make this opt-out, i.e.
> > > > > exclude the handful of control characters that are harmful
> > > > > (such as backline and console escape), instead of trying to
> > > > > include the known-useful ones.
> > > >
> > > > Do you see any issue with the check above?
> > >
> > > There were clear problems with the first version you posted and
> > > that's enough proof to request the exclusion of known-dangerous
> > > characters instead of including known-useful characters.
> >
> > It doesn't proof anything. If I/someone else did a mistake with
> > blacklisting would you say it is enough proof to request the
> > inclusion of well-known allowed characters?
>
> No, because the problems such a mistake causes are not equivalent: it
> would have been far more harmful to not print out the *very real*
> product names written in some non-US language than to accidentally
> include some control character you did not think of.

???

Not "not print", but print in "crypted" form. The information is
still not lost, you can obviously restore it to the original form, with
some effort, but possible. Compare it with the harm of log spoofing -
it is not "restorable".

> > > A black list is well-defined: it disables the display of certain
> > > characters because they are *known to be dangerous*.
> >
> > What do you do with dangerous characters that are *not yet known*
> > to be dangerous?
>
> I'm ready to act on facts only.

The *fact* is you/anybody/everybody might not know all bad things. If
you just don't care because it is yet unknown then you will be
vulnerable as soon as it disclosured.

> Also, i really prefer the policy of
> acting on known dangers instead of being afraid of the unknown.

Do you know the principle "Attacks always get better, never worse"? If
you are protected against only of known attack, you will be vulnerable
to *every* danger not known to you.

Maybe you don't know, but it is really possible to be protected against
some *yet unknown* attack techniques. (The assessment of what attacks it
protects against is undefined too, though.) And upstream Linux is *already*
protected against some *yet unknown* bugs, not the whole bug classes,
but at least small kinds of it.

> > > A white list on the other hand does it the wrong way around: it
> > > tries to put the 'burden of proof' on the useful, good guys - and
> > > that's counter-productive really.
> >
> > Really? I think strict API definition is productive, unlike using
> > it in cases where it looks like working, but creating tricky and
> > obscure bugs.
>
> You werent really creating a well-defined API here, were you?

No, I was - only ascii chars and \n are allowed. In v2 all ascii chars,
the upper charset and 2 control chars are allowed. Rather clear, IMO.

--
Vasiliy Kulikov
http://www.openwall.com - bringing security into open computing environments

Ingo Molnar

unread,

Jun 26, 2011, 6:10:02 PM6/26/11

to

* Vasiliy Kulikov <seg...@openwall.com> wrote:

> On Sun, Jun 26, 2011 at 21:46 +0200, Ingo Molnar wrote:
> > > > > > Also, i think it would be better to make this opt-out, i.e.
> > > > > > exclude the handful of control characters that are harmful
> > > > > > (such as backline and console escape), instead of trying to
> > > > > > include the known-useful ones.
> > > > >
> > > > > Do you see any issue with the check above?
> > > >
> > > > There were clear problems with the first version you posted and
> > > > that's enough proof to request the exclusion of known-dangerous
> > > > characters instead of including known-useful characters.
> > >
> > > It doesn't proof anything. If I/someone else did a mistake with
> > > blacklisting would you say it is enough proof to request the
> > > inclusion of well-known allowed characters?
> >
> > No, because the problems such a mistake causes are not equivalent: it
> > would have been far more harmful to not print out the *very real*
> > product names written in some non-US language than to accidentally
> > include some control character you did not think of.
>
> ???
>
> Not "not print", but print in "crypted" form. The information is
> still not lost, you can obviously restore it to the original form,
> with some effort, but possible. Compare it with the harm of log
> spoofing - it is not "restorable".

The harm of 'potential' log spoofing affecting exactly zero known
users right now, versus the harm of obfuscating the output for a
known space of USB devices that print in non-US characters, at
minimum.

> > > > A black list is well-defined: it disables the display of
> > > > certain characters because they are *known to be dangerous*.
> > >
> > > What do you do with dangerous characters that are *not yet known*
> > > to be dangerous?
> >
> > I'm ready to act on facts only.
>
> The *fact* is you/anybody/everybody might not know all bad things.
> If you just don't care because it is yet unknown then you will be
> vulnerable as soon as it disclosured.

Erm, do you claim that it's not possible to know which characters are
dangerous and which ones not?

> > Also, i really prefer the policy of acting on known dangers
> > instead of being afraid of the unknown.
>
> Do you know the principle "Attacks always get better, never worse"?
> If you are protected against only of known attack, you will be
> vulnerable to *every* danger not known to you.
>
> Maybe you don't know, but it is really possible to be protected
> against some *yet unknown* attack techniques. (The assessment of
> what attacks it protects against is undefined too, though.) And
> upstream Linux is *already* protected against some *yet unknown*
> bugs, not the whole bug classes, but at least small kinds of it.

This claim is silly - do you claim some 'unknown bug' in the ASCII
printout space?

Cannot you be bothered to enumerate the known 'bad' control and
escape characters?

You *clearly* did not consider the full utility spectrum in the first
version of the patch so i think it's necessary due diligence on our
part to ask you to be more thoughtful with this ...

> > > > A white list on the other hand does it the wrong way around:
> > > > it tries to put the 'burden of proof' on the useful, good
> > > > guys - and that's counter-productive really.
> > >
> > > Really? I think strict API definition is productive, unlike
> > > using it in cases where it looks like working, but creating
> > > tricky and obscure bugs.
> >
> > You werent really creating a well-defined API here, were you?
>
> No, I was - only ascii chars and \n are allowed. In v2 all ascii
> chars, the upper charset and 2 control chars are allowed. Rather
> clear, IMO.

Please enumerate the space you excluded and the reason for exclusion.

Terminals are not some unknown and unknowable space.

Thanks,

Ingo

Vasiliy Kulikov

unread,

Jun 27, 2011, 4:40:02 AM6/27/11

to

On Mon, Jun 27, 2011 at 00:01 +0200, Ingo Molnar wrote:
> * Vasiliy Kulikov <seg...@openwall.com> wrote:
> > On Sun, Jun 26, 2011 at 21:46 +0200, Ingo Molnar wrote:
> > > No, because the problems such a mistake causes are not equivalent: it
> > > would have been far more harmful to not print out the *very real*
> > > product names written in some non-US language than to accidentally
> > > include some control character you did not think of.
> >
> > ???
> >
> > Not "not print", but print in "crypted" form. The information is
> > still not lost, you can obviously restore it to the original form,
> > with some effort, but possible. Compare it with the harm of log
> > spoofing - it is not "restorable".
>
> The harm of 'potential' log spoofing affecting exactly zero known
> users right now,

???

A potential thing affects all users that *can be* affected by actual
log spoofing. This is what the word "potential" means.

Analogy: if some privilege escalation bug is found in some very core
code then all users iteracting with an untrusted security domains
(local users, network, etc.) being able to exploit it would be affected.
It is silly to say that nobody is affected because you just don't know
any such cases of this bug exploitation in the past.

> > > > > A black list is well-defined: it disables the display of
> > > > > certain characters because they are *known to be dangerous*.
> > > >
> > > > What do you do with dangerous characters that are *not yet known*
> > > > to be dangerous?
> > >
> > > I'm ready to act on facts only.
> >
> > The *fact* is you/anybody/everybody might not know all bad things.
> > If you just don't care because it is yet unknown then you will be
> > vulnerable as soon as it disclosured.
>
> Erm, do you claim that it's not possible to know which characters are
> dangerous and which ones not?

A sample:

Is BEL dangerous? I can imagine 200 BELs/scrolls down would hang a
slow tty by a considerable amount of time without ^C ability to kill the
log viewer. On the rest ttys it is harmless. Even Alan wanted to add
it to the white list, not just exclude from the black list.

If someone skilled in the TTY codes provides us the full analysis of tty
code table with *a proof* what codes are harmfull (in the sense of
logging) and what are not, the question would be closed. I'm not
skilled it in, I'd not try to do it (maybe the manpage is incomplete?).
I've reported just the *known for sure* farmfull sequence (the first
report of cntrl chars harm belogs to Solar Designer and dates to 2006:
http://lists.openwall.net/linux-kernel/2006/08/22/29).

> > > Also, i really prefer the policy of acting on known dangers
> > > instead of being afraid of the unknown.

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

> > Do you know the principle "Attacks always get better, never worse"?
> > If you are protected against only of known attack, you will be
> > vulnerable to *every* danger not known to you.
> >
> > Maybe you don't know, but it is really possible to be protected
> > against some *yet unknown* attack techniques. (The assessment of
> > what attacks it protects against is undefined too, though.) And
> > upstream Linux is *already* protected against some *yet unknown*
> > bugs, not the whole bug classes, but at least small kinds of it.
>
> This claim is silly - do you claim some 'unknown bug' in the ASCII
> printout space?

You were talking about dangers in general, I'm answering about it in
general too.

> Cannot you be bothered to enumerate the known 'bad' control and
> escape characters?

Note it is not bad characters, but character sequences. You cannot just
grep for bad substrings because of KERN_CONT (because the black list means
everything not denied is permitted, and breaking ctrl sequence is then
permitted, right?), so you have to define a state machine for this
filtering (if you want to filter just only bad charsequences). It looks
too complicated for me for this little thing - just to write to a log
with user supplied data. But you say it is simplier and safer, maybe
you will do it?

> You *clearly* did not consider the full utility spectrum in the first
> version of the patch

Sure, that's why maintainers exist and they should be competent to
ack/nak the solution that does/doesn't fit the actual needs.

BTW, it has happened because of missing clear conversions about what
characters printk() can be fed. In the comment the encoding is not
mentioned, only "See also: printf(3)", which might be related to format
strings only.

Also '\n' problem with multiline messages wouldn't arrise if printk()
was clearly declared as a function logging exactly one line.

> > > > > A white list on the other hand does it the wrong way around:
> > > > > it tries to put the 'burden of proof' on the useful, good
> > > > > guys - and that's counter-productive really.
> > > >
> > > > Really? I think strict API definition is productive, unlike
> > > > using it in cases where it looks like working, but creating
> > > > tricky and obscure bugs.
> > >
> > > You werent really creating a well-defined API here, were you?
> >
> > No, I was - only ascii chars and \n are allowed. In v2 all ascii
> > chars, the upper charset and 2 control chars are allowed. Rather
> > clear, IMO.
>
> Please enumerate the space you excluded

All control characters except space chars (tabs and \n) are not needed
for the log purposes, they define the cursor position, output colors,
and tty modes. All this is not needed for log purposes, it is a cynical
log, not a cristmas tree.

> and the reason for exclusion.

Note that *you* want to define a black list with a clear denied
semantics, I want to do the reverse - to define the allowed semantics,
which additionally implies better API semantics understanding. I don't
need to enumerate all evils, I'm evading it by defining what one can do
for sure.

I don't understand why you so resist to define API more clear to avoid
other possible problems in the future.

Thanks,

--
Vasiliy Kulikov
http://www.openwall.com - bringing security into open computing environments

Vasiliy Kulikov

unread,

Jun 27, 2011, 5:40:02 AM6/27/11

to

On Mon, Jun 27, 2011 at 12:36 +0400, Vasiliy Kulikov wrote:
> > and the reason for exclusion.
>
> Note that *you* want to define a black list with a clear denied
> semantics,

Sorry, s/semantics/charsets with a proof of the harm/

Alan Cox

unread,

Jun 27, 2011, 6:00:03 AM6/27/11

to

Go back and read what I said originally, and think about it.

You have to look at the IUTF bit of the relevant console you are logging
to.

As far as I can see we need:

No IUTF: block various control codes + CSI
IUTF: block various control codes only

But arbitarily blocking anything non ASCII is going to make a mess in
anything non American. Even English needs UTF-8.

Alan

Vasiliy Kulikov

unread,

Jun 27, 2011, 2:40:02 PM6/27/11

to

On Mon, Jun 27, 2011 at 10:40 +0100, Alan Cox wrote:
>
> Go back and read what I said originally, and think about it.
>
> You have to look at the IUTF bit of the relevant console you are logging
> to.

The knowledge of what specific console is relevant is unknown at the
moment of printk(). Do you approve only filtering on klogd side?

> But arbitarily blocking anything non ASCII is going to make a mess in
> anything non American. Even English needs UTF-8.

Sure, I don't propose it anymore (v2 goes without it).

Thanks,

--
Vasiliy Kulikov
http://www.openwall.com - bringing security into open computing environments

Linus Torvalds

unread,

Jun 28, 2011, 3:40:02 PM6/28/11

to

On Mon, Jun 27, 2011 at 11:38 AM, Vasiliy Kulikov <seg...@openwall.com> wrote:
>
> Sure, I don't propose it anymore (v2 goes without it).

What point would you like to filter things at?

I really think that user space should do its own filtering - nobody
does a plain 'cat' on dmesg. Or if they do, they really have
themselves to blame.

And afaik, we don't do any escape sequence handling at the console
level either, so you cannot mess up the console with control
characters.

And the most dangerous character seems to be one that you don't
filter: the one we really do react to is '\n', and you could possibly
make confusing log messages by embedding a newline in your string and
then trying to make the rest look like something bad (say, an oops).

So I'm not entirely convinced about this filtering at all.

Linus

Ingo Molnar

unread,

Jul 1, 2011, 8:10:01 AM7/1/11

to

* Linus Torvalds <torv...@linux-foundation.org> wrote:

> On Mon, Jun 27, 2011 at 11:38 AM, Vasiliy Kulikov <seg...@openwall.com> wrote:
> >
> > Sure, I don't propose it anymore (v2 goes without it).
>
> What point would you like to filter things at?
>
> I really think that user space should do its own filtering - nobody
> does a plain 'cat' on dmesg. Or if they do, they really have
> themselves to blame.
>
> And afaik, we don't do any escape sequence handling at the console
> level either, so you cannot mess up the console with control
> characters.
>
> And the most dangerous character seems to be one that you don't
> filter: the one we really do react to is '\n', and you could possibly
> make confusing log messages by embedding a newline in your string and
> then trying to make the rest look like something bad (say, an oops).
>
> So I'm not entirely convinced about this filtering at all.

Yeah. It would be nice to see a demonstration of at least one 'bad
thing' that is possible via the current code, before we protect
against it.

The claim the patch makes is rather specific:

| There are numerous printk() instances with user supplied input as
| "%s" data, and unprivileged user may craft log messages with
| substrings containing control characters via these printk()s.
| Control characters might fool root viewing the logs via tty, e.g.
| using ^[1A to suppress the previous log line.

So it ought to be demonstrable.

Thanks,

Ingo

Ingo Molnar

unread,

Jul 1, 2011, 8:20:01 AM7/1/11

to

* Vasiliy Kulikov <seg...@openwall.com> wrote:

> On Mon, Jun 27, 2011 at 00:01 +0200, Ingo Molnar wrote:
> > * Vasiliy Kulikov <seg...@openwall.com> wrote:
> > > On Sun, Jun 26, 2011 at 21:46 +0200, Ingo Molnar wrote:
> > > > No, because the problems such a mistake causes are not equivalent: it
> > > > would have been far more harmful to not print out the *very real*
> > > > product names written in some non-US language than to accidentally
> > > > include some control character you did not think of.
> > >
> > > ???
> > >
> > > Not "not print", but print in "crypted" form. The information
> > > is still not lost, you can obviously restore it to the original
> > > form, with some effort, but possible. Compare it with the harm
> > > of log spoofing - it is not "restorable".
> >
> > The harm of 'potential' log spoofing affecting exactly zero known
> > users right now,
>
> ???
>
> A potential thing affects all users that *can be* affected by
> actual log spoofing. This is what the word "potential" means.

Yes, but there's a world of a difference between alleged harm and
actual demonstrated harm.

That is a not so fine distinction that is often missed in security
circles! :-)

So what i asked for before and what i ask for here is to protect
against real, specific harm. If we just 'protect' against things that
look dangerous it's easy to over-protect and cause colletaral damage.
(like the UTF-8 details the v1 patch missed)

> Analogy: if some privilege escalation bug is found in some very
> core code then all users iteracting with an untrusted security
> domains (local users, network, etc.) being able to exploit it would
> be affected. It is silly to say that nobody is affected because you
> just don't know any such cases of this bug exploitation in the
> past.

That analogy does not hold. If a security hole is obvious at first
sight then we'll indeed fix it without waiting for someone to be
exploited.

But here the actual 'harm' is a lot less clear and what i'm trying to
steer you towards is to be more fact-based and less belief-based. The
only 'harm' that got demonstrated so far was collateral damage caused
by the v1 patch ...

Thanks,

Ingo

Vasiliy Kulikov

unread,

Jul 1, 2011, 9:00:02 AM7/1/11

to

On Fri, Jul 01, 2011 at 14:00 +0200, Ingo Molnar wrote:
>
> * Linus Torvalds <torv...@linux-foundation.org> wrote:
>
> > On Mon, Jun 27, 2011 at 11:38 AM, Vasiliy Kulikov <seg...@openwall.com> wrote:
> > >
> > > Sure, I don't propose it anymore (v2 goes without it).
> >
> > What point would you like to filter things at?
> >
> > I really think that user space should do its own filtering - nobody
> > does a plain 'cat' on dmesg. Or if they do, they really have
> > themselves to blame.

Some people do "dmesg | tail" or "grep -i err /var/log/messages". Or
even "tail -f /var/log/messages".

http://www.google.com/search?q=grep+var+log+messages

> > And afaik, we don't do any escape sequence handling at the console
> > level either, so you cannot mess up the console with control
> > characters.
> >
> > And the most dangerous character seems to be one that you don't
> > filter: the one we really do react to is '\n', and you could possibly
> > make confusing log messages by embedding a newline in your string and
> > then trying to make the rest look like something bad (say, an oops).
> >
> > So I'm not entirely convinced about this filtering at all.
>
> Yeah. It would be nice to see a demonstration of at least one 'bad
> thing' that is possible via the current code, before we protect
> against it.
>
> The claim the patch makes is rather specific:
>
> | There are numerous printk() instances with user supplied input as
> | "%s" data, and unprivileged user may craft log messages with
> | substrings containing control characters via these printk()s.
> | Control characters might fool root viewing the logs via tty, e.g.
> | using ^[1A to suppress the previous log line.
>
> So it ought to be demonstrable.

Read this mail again:

http://www.openwall.com/lists/kernel-hardening/2011/06/22/6

Using the instance of the bug in core dump code that Willy used here for
\n thing:

http://www.openwall.com/lists/kernel-hardening/2011/06/25/3

We get:

$ echo "main() { sprintf((char*)3, (char*)1); }" | gcc -xc -o ../1234 -
$ ln -s ../1234 $'\x1b[1Ainit\nfail'

Now generate an arbitrary log message:

$ ../1234
Ошибка сегментирования (core dumped)
$ dmesg | tail -n1
[16291.982492] 1234[21579]: segfault at 0 ip 00007f527186028a sp 00007fff08251dd8 error 4 in libc-2.11.1.so[7f52717d8000+17a000]

Change this log entry:

$ './^[[1Ainit
fail'
Ошибка сегментирования (core dumped)

$ dmesg | tail
...
[16291.982492] init[21579]: segfault at 0 ip 00007f527186028a sp 00007fff08251dd8 error 4 in libc-2.11.1.so[7f52717d8000+17a000]
[16379.001357] fail[21586]: segfault at 0 ip 00007f332831028a sp 00007fffc52de568 error 4 in libc-2.11.1.so[7f3328288000+17a000]
$
(see 1234 => init change)

As you see here, the previous log entry was showed not as it was
initally logged. Core dump message was chosen fully arbitrary, just to
show that some specific part of it can be "edited".

If this step-by-step instruction is not a proof, then I don't know what
the word "proof" means...

--
Vasiliy Kulikov
http://www.openwall.com - bringing security into open computing environments

Alan Cox

unread,

Jul 1, 2011, 10:20:01 AM7/1/11

to

> $ dmesg | tail
> ...
> [16291.982492] init[21579]: segfault at 0 ip 00007f527186028a sp 00007fff08251dd8 error 4 in libc-2.11.1.so[7f52717d8000+17a000]
> [16379.001357] fail[21586]: segfault at 0 ip 00007f332831028a sp 00007fffc52de568 error 4 in libc-2.11.1.so[7f3328288000+17a000]
> $
> (see 1234 => init change)
>
> As you see here, the previous log entry was showed not as it was
> initally logged. Core dump message was chosen fully arbitrary, just to
> show that some specific part of it can be "edited".

Your syslog daemon appears to need improvement and that would be the
right fix for dmesg etc.

The other case is the actual printk direct to console case, particularly
on a serial port. That one actually has more potential for actual
annoyance.

Vasiliy Kulikov

unread,

Jul 1, 2011, 10:40:02 AM7/1/11

to

On Tue, Jun 28, 2011 at 12:30 -0700, Linus Torvalds wrote:
> And the most dangerous character seems to be one that you don't
> filter: the one we really do react to is '\n', and you could possibly
> make confusing log messages by embedding a newline in your string and
> then trying to make the rest look like something bad (say, an oops).

Btw, I've already outlined this problem in patch v1 comment, but
received no single comment on the suggested 2 possible ways:

http://www.openwall.com/lists/kernel-hardening/2011/06/22/2

"This patch does nothing with crafted "%s" data with '\n' inside. It
allows unprivileged user to craft arbitrary log messages via breaking
log lines boundaries. It is a bit tricky to fix it compatible way.
Limiting "%s" to one line in vscnprintf() would break legitimate users
of the multiline feature. Intoducing new "%S" format for single lines
makes little sense as there are tons of printk() calls that should be
already restricted to one line.

Proposals about '\n' inside of '%s" are welcome."

--
Vasiliy Kulikov
http://www.openwall.com - bringing security into open computing environments

Alan Cox

unread,

Jul 1, 2011, 10:50:01 AM7/1/11

to

> of the multiline feature. Intoducing new "%S" format for single lines
> makes little sense as there are tons of printk() calls that should be
> already restricted to one line.

You don't need a new format string surely. Your expectation for printk is

"multiple new lines are cool providing they are in the format string"

So that bit isn't hard to deal with,

You make vprintk take an extra arg (trusted/untrusted args)
You make printk pass 'untrusted'
You make %s quote the arguments for control codes if untrusted is set but
you don't mangle format string controls.

End of problem ?

At which point your attacker has more work to do but given a long string
yawns and stars using the right number of spaces for the likely 80 col
screen :)

Vasiliy Kulikov

unread,

Jul 2, 2011, 4:20:02 AM7/2/11

to

On Fri, Jul 01, 2011 at 15:49 +0100, Alan Cox wrote:
> > of the multiline feature. Intoducing new "%S" format for single lines
> > makes little sense as there are tons of printk() calls that should be
> > already restricted to one line.
>
> You don't need a new format string surely. Your expectation for printk is
>
> "multiple new lines are cool providing they are in the format string"
>
> So that bit isn't hard to deal with,
>
> You make vprintk take an extra arg (trusted/untrusted args)

Not vprintk, but vscnprintf(), vsnprintf() and string() because
vprintk() is used in tens of places besides of printk(). Or better
implement _vscnprintf(.., bool untrusted) and

vscnprintf(...) { return _vscnprintf(..., false); }

to leave current users of it as is.

But yes, I got the idea.

> You make printk pass 'untrusted'
> You make %s quote the arguments for control codes

What to do with CSI? It is a valid byte inside of a UTF-8 string.
Parsing a supplied string assuming it is UTF-8 string and filtering CSI
iff it is not a part of UTF-8 symbol is something a bit ugly IMO.

Greg - do you know any devices supplying multibyte strings, but not in
UTF-8 encoding? If yes, then CSI filtering is a bad idea :\

> At which point your attacker has more work to do but given a long string
> yawns and stars using the right number of spaces for the likely 80 col
> screen :)

Yeah, but introducing some artificial limit for string length is IMO
more harmfull: there is no universal limit for all situations, somewhere
the resulting string is already 70 chars and even 20 bytes would
overflow the col; in rare cases a string of 50 bytes might be still
acceptable.

Thanks,

--
Vasiliy Kulikov
http://www.openwall.com - bringing security into open computing environments

Greg KH

unread,

Jul 2, 2011, 11:20:01 AM7/2/11

to

On Sat, Jul 02, 2011 at 12:10:22PM +0400, Vasiliy Kulikov wrote:
> Greg - do you know any devices supplying multibyte strings, but not in
> UTF-8 encoding? If yes, then CSI filtering is a bad idea :\

USB is "supposed" to have UTF-16 strings, but there's devices out there
that have crazy stuff in them. And of course, anyone can put anything
in the device string if they know how to write a bit of firmware, so we
have to watch out for that. Hopefully we handle that properly in the
usb_string() function in the kernel but review of it is always nice to
have.

Other than that, I don't know of any multi-byte strings in devices at
the moment. I haven't seen the Thunderbolt specs though, so who knows
what is coming next.

Hope this helps,

greg k-h

Solar Designer

unread,

Jul 2, 2011, 12:50:02 PM7/2/11

to

On Fri, Jul 01, 2011 at 03:20:51PM +0100, Alan Cox wrote:
> > $ dmesg | tail
> > ...
> > [16291.982492] init[21579]: segfault at 0 ip 00007f527186028a sp 00007fff08251dd8 error 4 in libc-2.11.1.so[7f52717d8000+17a000]
> > [16379.001357] fail[21586]: segfault at 0 ip 00007f332831028a sp 00007fffc52de568 error 4 in libc-2.11.1.so[7f3328288000+17a000]
> > $
> > (see 1234 => init change)
> >
> > As you see here, the previous log entry was showed not as it was
> > initally logged. Core dump message was chosen fully arbitrary, just to
> > show that some specific part of it can be "edited".
>
> Your syslog daemon appears to need improvement

syslogd definitely should (and does) perform escaping of control chars
for logs written through it.

> and that would be the right fix for dmesg etc.

How? We can't realistically expect everyone to stop invoking dmesg
directly. Moreover, this is sometimes useful, although someone who is
aware of the risk and who cares about it would do "dmesg | less". The
problem is that most people are not aware, and we can't change that.

> The other case is the actual printk direct to console case, particularly
> on a serial port. That one actually has more potential for actual
> annoyance.

Right.

Alexander

Alan Cox

unread,

Jul 2, 2011, 3:40:02 PM7/2/11

to

> > Your syslog daemon appears to need improvement
>
> syslogd definitely should (and does) perform escaping of control chars
> for logs written through it.
>
> > and that would be the right fix for dmesg etc.
>
> How? We can't realistically expect everyone to stop invoking dmesg
> directly. Moreover, this is sometimes useful, although someone who is

No but you could make "dmesg" do filtering.

Alan

Linus Torvalds

unread,

Jul 2, 2011, 4:40:02 PM7/2/11

to

On Sat, Jul 2, 2011 at 12:33 PM, Alan Cox <al...@lxorguk.ukuu.org.uk> wrote:
>
> No but you could make "dmesg" do filtering.

So I really hate doing the filtering at the level that the original
patch did, but I would _not_ mind doing the filtering at "vsnprintf()"
time.

Even without some magic "safe" flag, in fact.

So I would not even be opposed to "%s" just doing filtering (including
considering "\n" to be a control character) by default. So "\n" would
be ok in the format string (where it is pretty common), but not for
%s.

For kernel uses that really want raw strings, we could have a %p
sequence, the way we handle other special formats.

However, that would be a patch that would need a lot more testing,
since we'd have to find the users that really do want control
characters.

Linus

Vasiliy Kulikov

unread,

Jul 3, 2011, 6:10:01 AM7/3/11

to

On Fri, Jul 01, 2011 at 15:49 +0100, Alan Cox wrote:

> Your expectation for printk is
>
> "multiple new lines are cool providing they are in the format string"

Sigh... After implementing controls filtering including \n inside of
%s, I got numerous false positives. Most of them are startup messages
with driver/hardware name/version, in drivers/.

I'm attaching the patch with both vscnprintf() changes and drivers fixes
just to show the idea. Before applying the vscnprintf patch all/most of
such drivers should be fixed. If someone think it's better to apply the
patch to some tree to learn the list of such false positives - tell me,
I'll resend the vscnprintf part of the patch.

Also, the problem with CSI code is still here. It cannot be filtered
because it is a valid byte inside of UTF-8 character. For a console
with IUTF bit set all dangerous characters are filtered, but for the
rest they are not. As at the stage of vscnprintf the presence of IUTF
bit is not clear, users of non-IUTF console should still use "| less" or
similar. Obviously, \n is filtered for all consoles.

Thanks,

---
diff --git a/arch/blackfin/kernel/early_printk.c b/arch/blackfin/kernel/early_printk.c
index 84ed837..5a9ba3e 100644
--- a/arch/blackfin/kernel/early_printk.c
+++ b/arch/blackfin/kernel/early_printk.c
@@ -173,6 +173,7 @@ asmlinkage void __init init_early_exception_vectors(void)
*/
mark_shadow_error();
early_shadow_puts(linux_banner);
+ early_shadow_puts("\n");
early_shadow_stamp();

if (CPUID != bfin_cpuid()) {
diff --git a/arch/blackfin/kernel/trace.c b/arch/blackfin/kernel/trace.c
index 050db44..16678b0 100644
--- a/arch/blackfin/kernel/trace.c
+++ b/arch/blackfin/kernel/trace.c
@@ -872,7 +872,7 @@ void show_regs(struct pt_regs *fp)
#endif
);

- pr_notice("%s", linux_banner);
+ pr_notice("%s\n", linux_banner);

pr_notice("\nSEQUENCER STATUS:\t\t%s\n", print_tainted());
pr_notice(" SEQSTAT: %08lx IPEND: %04lx IMASK: %04lx SYSCFG: %04lx\n",
diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c
index c016033..c84e3b3 100644
--- a/arch/powerpc/kernel/prom_init.c
+++ b/arch/powerpc/kernel/prom_init.c
@@ -2544,7 +2544,7 @@ unsigned long __init prom_init(unsigned long r3, unsigned long r4,
*/
prom_init_stdout();

- prom_printf("Preparing to boot %s", RELOC(linux_banner));
+ prom_printf("Preparing to boot %s\n", RELOC(linux_banner));

/*
* Get default machine type. At this point, we do not differentiate
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 33a0c11..258659c 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -644,7 +644,9 @@ static void __cpuinit announce_cpu(int cpu, int apicid)
current_node = node;
pr_info("Booting Node %3d, Processors ", node);
}
- pr_cont(" #%d%s", cpu, cpu == (nr_cpu_ids - 1) ? " Ok.\n" : "");
+ pr_cont(" #%d%s", cpu, cpu == (nr_cpu_ids - 1) ? " Ok." : "");
+ if (cpu == (nr_cpu_ids - 1))
+ pr_cont("\n");
return;
} else
pr_info("Booting Node %d Processor %d APIC 0x%x\n",
diff --git a/drivers/gpu/drm/nouveau/nouveau_pm.c b/drivers/gpu/drm/nouveau/nouveau_pm.c
index da8d994..0997d2c 100644
--- a/drivers/gpu/drm/nouveau/nouveau_pm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_pm.c
@@ -178,7 +178,7 @@ nouveau_pm_perflvl_info(struct nouveau_pm_level *perflvl, char *ptr, int len)
if (perflvl->timing)
snprintf(t, sizeof(t), " timing %d", perflvl->timing->id);

- snprintf(ptr, len, "memory %dMHz%s%s%s%s%s\n", perflvl->memory / 1000,
+ snprintf(ptr, len, "memory %dMHz%s%s%s%s%s", perflvl->memory / 1000,
c, s, v, f, t);
}

@@ -195,6 +195,7 @@ nouveau_pm_get_perflvl_info(struct device *d,
len -= strlen(buf);

nouveau_pm_perflvl_info(perflvl, ptr, len);
+ strcat(ptr, "\n");
return strlen(buf);
}

@@ -218,8 +219,10 @@ nouveau_pm_get_perflvl(struct device *d, struct device_attribute *a, char *buf)
len -= strlen(buf);

ret = nouveau_pm_perflvl_get(dev, &cur);
- if (ret == 0)
+ if (ret == 0) {
nouveau_pm_perflvl_info(&cur, ptr, len);
+ strcat(ptr, "\n");
+ }
return strlen(buf);
}

@@ -488,7 +491,7 @@ nouveau_pm_init(struct drm_device *dev)
NV_INFO(dev, "%d available performance level(s)\n", pm->nr_perflvl);
for (i = 0; i < pm->nr_perflvl; i++) {
nouveau_pm_perflvl_info(&pm->perflvl[i], info, sizeof(info));
- NV_INFO(dev, "%d: %s", pm->perflvl[i].id, info);
+ NV_INFO(dev, "%d: %s\n", pm->perflvl[i].id, info);
}

/* determine current ("boot") performance level */
@@ -498,7 +501,7 @@ nouveau_pm_init(struct drm_device *dev)
pm->cur = &pm->boot;

nouveau_pm_perflvl_info(&pm->boot, info, sizeof(info));
- NV_INFO(dev, "c: %s", info);
+ NV_INFO(dev, "c: %s\n", info);
}

/* switch performance levels now if requested */
diff --git a/drivers/scsi/bnx2i/bnx2i_init.c b/drivers/scsi/bnx2i/bnx2i_init.c
index 6adbdc3..f1354a6 100644
--- a/drivers/scsi/bnx2i/bnx2i_init.c
+++ b/drivers/scsi/bnx2i/bnx2i_init.c
@@ -23,7 +23,7 @@ static u32 adapter_count;

static char version[] __devinitdata =
"Broadcom NetXtreme II iSCSI Driver " DRV_MODULE_NAME \
- " v" DRV_MODULE_VERSION " (" DRV_MODULE_RELDATE ")\n";
+ " v" DRV_MODULE_VERSION " (" DRV_MODULE_RELDATE ")";

MODULE_AUTHOR("Anil Veerabhadrappa <ani...@broadcom.com> and "
@@ -372,7 +372,7 @@ static int __init bnx2i_mod_init(void)
{
int err;

- printk(KERN_INFO "%s", version);
+ printk(KERN_INFO "%s\n", version);

if (sq_size && !is_power_of_2(sq_size))
sq_size = roundup_pow_of_two(sq_size);
diff --git a/drivers/scsi/cxgbi/cxgb3i/cxgb3i.c b/drivers/scsi/cxgbi/cxgb3i/cxgb3i.c
index fc2cdb6..b2a659b 100644
--- a/drivers/scsi/cxgbi/cxgb3i/cxgb3i.c
+++ b/drivers/scsi/cxgbi/cxgb3i/cxgb3i.c
@@ -38,7 +38,7 @@ static unsigned int dbg_level;

static char version[] =
DRV_MODULE_DESC " " DRV_MODULE_NAME
- " v" DRV_MODULE_VERSION " (" DRV_MODULE_RELDATE ")\n";
+ " v" DRV_MODULE_VERSION " (" DRV_MODULE_RELDATE ")";

MODULE_AUTHOR("Chelsio Communications, Inc.");
MODULE_DESCRIPTION(DRV_MODULE_DESC);
@@ -1402,7 +1402,7 @@ static int __init cxgb3i_init_module(void)
{
int rc;

- printk(KERN_INFO "%s", version);
+ printk(KERN_INFO "%s\b", version);

rc = cxgbi_iscsi_init(&cxgb3i_iscsi_transport, &cxgb3i_stt);
if (rc < 0)
diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index fb0e732..db729ad 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -300,6 +300,8 @@ extern int scnprintf(char * buf, size_t size, const char * fmt, ...)
__attribute__ ((format (printf, 3, 4)));
extern int vscnprintf(char *buf, size_t size, const char *fmt, va_list args)
__attribute__ ((format (printf, 3, 0)));
+extern int _vscnprintf(char *buf, size_t size, const char *fmt, va_list args, bool unsafe_args)
+ __attribute__ ((format (printf, 3, 0)));
extern char *kasprintf(gfp_t gfp, const char *fmt, ...)
__attribute__ ((format (printf, 2, 3)));
extern char *kvasprintf(gfp_t gfp, const char *fmt, va_list args);
diff --git a/init/main.c b/init/main.c
index cafba67..b28e364 100644
--- a/init/main.c
+++ b/init/main.c
@@ -484,7 +484,7 @@ asmlinkage void __init start_kernel(void)
tick_init();
boot_cpu_init();
page_address_init();
- printk(KERN_NOTICE "%s", linux_banner);
+ printk(KERN_NOTICE "%s\b", linux_banner);
setup_arch(&command_line);
mm_init_owner(&init_mm, &init_task);
mm_init_cpumask(&init_mm);
diff --git a/init/version.c b/init/version.c
index 86fe0cc..256cfff 100644
--- a/init/version.c
+++ b/init/version.c
@@ -40,7 +40,7 @@ EXPORT_SYMBOL_GPL(init_uts_ns);
/* FIXED STRINGS! Don't touch! */
const char linux_banner[] =
"Linux version " UTS_RELEASE " (" LINUX_COMPILE_BY "@"
- LINUX_COMPILE_HOST ") (" LINUX_COMPILER ") " UTS_VERSION "\n";
+ LINUX_COMPILE_HOST ") (" LINUX_COMPILER ") " UTS_VERSION;

const char linux_proc_banner[] =
"%s version %s"
diff --git a/kernel/printk.c b/kernel/printk.c
index 3518539..67a7fb6 100644
--- a/kernel/printk.c
+++ b/kernel/printk.c
@@ -869,8 +869,8 @@ asmlinkage int vprintk(const char *fmt, va_list args)
printed_len = strlen(recursion_bug_msg);
}
/* Emit the output into the temporary buffer */
- printed_len += vscnprintf(printk_buf + printed_len,
- sizeof(printk_buf) - printed_len, fmt, args);
+ printed_len += _vscnprintf(printk_buf + printed_len,
+ sizeof(printk_buf) - printed_len, fmt, args, true);

p = printk_buf;

diff --git a/lib/vsprintf.c b/lib/vsprintf.c
index c112056..86e4332 100644
--- a/lib/vsprintf.c
+++ b/lib/vsprintf.c
@@ -395,8 +395,7 @@ char *number(char *buf, char *end, unsigned long long num,
return buf;
}

-static noinline_for_stack
-char *string(char *buf, char *end, const char *s, struct printf_spec spec)
+char *_string(char *buf, char *end, const char *s, struct printf_spec spec, bool unsafe)
{
int len, i;

@@ -413,8 +412,12 @@ char *string(char *buf, char *end, const char *s, struct printf_spec spec)
}
}
for (i = 0; i < len; ++i) {
- if (buf < end)
- *buf = *s;
+ if (buf < end) {
+ if (unsafe && iscntrl(*s))
+ *buf = '?';
+ else
+ *buf = *s;
+ }
++buf; ++s;
}
while (len < spec.field_width--) {
@@ -427,6 +430,12 @@ char *string(char *buf, char *end, const char *s, struct printf_spec spec)
}

static noinline_for_stack
+char *string(char *buf, char *end, const char *s, struct printf_spec spec)
+{
+ return _string(buf, end, s, spec, false);
+}
+
+static noinline_for_stack
char *symbol_string(char *buf, char *end, void *ptr,
struct printf_spec spec, char ext)
{
@@ -1163,7 +1172,7 @@ qualifier:
*
* If you're not already dealing with a va_list consider using snprintf().
*/
-int vsnprintf(char *buf, size_t size, const char *fmt, va_list args)
+int _vsnprintf(char *buf, size_t size, const char *fmt, va_list args, bool unsafe_args)
{
unsigned long long num;
char *str, *end;
@@ -1233,7 +1242,7 @@ int vsnprintf(char *buf, size_t size, const char *fmt, va_list args)
}

case FORMAT_TYPE_STR:
- str = string(str, end, va_arg(args, char *), spec);
+ str = _string(str, end, va_arg(args, char *), spec, unsafe_args);
break;

case FORMAT_TYPE_PTR:
@@ -1322,14 +1331,21 @@ int vsnprintf(char *buf, size_t size, const char *fmt, va_list args)
return str-buf;

}
+
+int vsnprintf(char *buf, size_t size, const char *fmt, va_list args)
+{
+ return _vsnprintf(buf, size, fmt, args, false);
+}
EXPORT_SYMBOL(vsnprintf);

/**
- * vscnprintf - Format a string and place it in a buffer
+ * _vscnprintf - Format a string and place it in a buffer
* @buf: The buffer to place the result into
* @size: The size of the buffer, including the trailing null space
* @fmt: The format string to use
* @args: Arguments for the format string
+ * @unsafe_args: Whether some of args are user supplied. If true, control
+ * characters like \n and cursor movements are escaped.
*
* The return value is the number of characters which have been written into
* the @buf not including the trailing '\0'. If @size is == 0 the function
@@ -1339,11 +1355,11 @@ EXPORT_SYMBOL(vsnprintf);
*
* See the vsnprintf() documentation for format string extensions over C99.
*/
-int vscnprintf(char *buf, size_t size, const char *fmt, va_list args)
+int _vscnprintf(char *buf, size_t size, const char *fmt, va_list args, bool unsafe_args)
{
int i;

- i = vsnprintf(buf, size, fmt, args);
+ i = _vsnprintf(buf, size, fmt, args, unsafe_args);

if (likely(i < size))
return i;
@@ -1351,6 +1367,12 @@ int vscnprintf(char *buf, size_t size, const char *fmt, va_list args)
return size - 1;
return 0;
}
+EXPORT_SYMBOL(_vscnprintf);
+
+int vscnprintf(char *buf, size_t size, const char *fmt, va_list args)
+{
+ return _vscnprintf(buf, size, fmt, args, false);
+}
EXPORT_SYMBOL(vscnprintf);

/**
---

Vasiliy Kulikov

unread,

Jul 3, 2011, 7:50:01 AM7/3/11

to

On Sun, Jul 03, 2011 at 14:01 +0400, Vasiliy Kulikov wrote:
> diff --git a/drivers/scsi/cxgbi/cxgb3i/cxgb3i.c b/drivers/scsi/cxgbi/cxgb3i/cxgb3i.c
> index fc2cdb6..b2a659b 100644
> --- a/drivers/scsi/cxgbi/cxgb3i/cxgb3i.c
> +++ b/drivers/scsi/cxgbi/cxgb3i/cxgb3i.c
> @@ -38,7 +38,7 @@ static unsigned int dbg_level;
>
> static char version[] =
> DRV_MODULE_DESC " " DRV_MODULE_NAME
> - " v" DRV_MODULE_VERSION " (" DRV_MODULE_RELDATE ")\n";
> + " v" DRV_MODULE_VERSION " (" DRV_MODULE_RELDATE ")";
>
> MODULE_AUTHOR("Chelsio Communications, Inc.");
> MODULE_DESCRIPTION(DRV_MODULE_DESC);
> @@ -1402,7 +1402,7 @@ static int __init cxgb3i_init_module(void)
> {
> int rc;
>
> - printk(KERN_INFO "%s", version);
> + printk(KERN_INFO "%s\b", version);

Surely this and init/main.c should have \n, sorry.

--
Vasiliy Kulikov
http://www.openwall.com - bringing security into open computing environments

Alan Cox

unread,

Jul 3, 2011, 8:30:01 AM7/3/11

to

> Also, the problem with CSI code is still here. It cannot be filtered
> because it is a valid byte inside of UTF-8 character. For a console
> with IUTF bit set all dangerous characters are filtered, but for the
> rest they are not. As at the stage of vscnprintf the presence of IUTF
> bit is not clear, users of non-IUTF console should still use "| less" or
> similar. Obviously, \n is filtered for all consoles.

This is why the filter pattern has to come from the console. Your patch
also for example doesn't fix anything on a 7bit serial console.. only the
hardware driver really knows what the filter rules are.

Looks a very good starting point though, and the embedded \n's mostly
want fixing up anyway.

Linus Torvalds

unread,

Jul 3, 2011, 1:50:02 PM7/3/11

to

On Sun, Jul 3, 2011 at 3:01 AM, Vasiliy Kulikov <seg...@openwall.com> wrote:
>
> Sigh... After implementing controls filtering including \n inside of
> %s, I got numerous false positives. Most of them are startup messages
> with driver/hardware name/version, in drivers/.

Yeah, I was afraid of that. It's going to be a much bigger patch, and
likely somewhat annoying.

That said, if we really want to do this, I think doing it with %s
filtering is the only way, and it would make the default case where
people really don't think about possible user-supplied strings be
safe.

So saying "%s is for pure 7-bit ASCII with no control codes" is
annoying, but would really fix it.

That said, I think it should be unconditional. None of this "safe vs
unsafe" flags, and none of this "printk format strings are different
from other vsprintf format strings". If special characters are a
potential security problem for printk(), then they are a potential
security problem for other things (eg /proc filenames or content etc).

Linus

Alan Cox

unread,

Jul 3, 2011, 5:10:02 PM7/3/11

to

> That said, if we really want to do this, I think doing it with %s
> filtering is the only way, and it would make the default case where
> people really don't think about possible user-supplied strings be
> safe.
>
> So saying "%s is for pure 7-bit ASCII with no control codes" is
> annoying, but would really fix it.

It's fairly easy for the console in question to provide a filter but 7bit
ascii wouldn't be a bad default for a printing console, definitely a
wrong default for the log file though.

> That said, I think it should be unconditional. None of this "safe vs
> unsafe" flags, and none of this "printk format strings are different
> from other vsprintf format strings". If special characters are a
> potential security problem for printk(), then they are a potential
> security problem for other things (eg /proc filenames or content etc).

In which case we can't do it because we need \n in proc content so that's
a complete and utter non starter. In addition we have things with
filenames in it and filenames are unicode so we'd break apps that look
filenames up via /proc for things like monitoring.

It depends how much you care as well - any idiot can figure out how to
simply use spaces and/or tabs to build multiple lines of fake looking
output like say a spoofed Oops.

As Lars can no doubt remind people spoofing oopses can be fun ;)

Alan

Linus Torvalds

unread,

Jul 3, 2011, 5:40:02 PM7/3/11

to

On Sun, Jul 3, 2011 at 2:10 PM, Alan Cox <al...@lxorguk.ukuu.org.uk> wrote:
>
> In which case we can't do it because we need \n in proc content so that's
> a complete and utter non starter.

You didn't read the original email I sent, did you?

For people who _want_ the unsafe versions with \n and other control
characters, they'd use a "raw string" format. Something like our
extended "%p" formats. Maybe '%p[]' - we could allow extensions later
that describe which characters to escape inside the brackets (for
example, we currently have ad-hoc escaping of things like pathnames:
with '/' not being legal in a path component)

The point being that that way it would be (a) safe by default (b)
require _thought_ when you actually wanted to print control characters
and (c) be easily greppable too.

Linus

Vasiliy Kulikov

unread,

Jul 5, 2011, 1:50:04 PM7/5/11

to

On Sun, Jul 03, 2011 at 10:42 -0700, Linus Torvalds wrote:
> So saying "%s is for pure 7-bit ASCII with no control codes" is
> annoying, but would really fix it.

Hmm.. It breaks usb UTF-8 strings for sure. I see some printks in
debugging code. There might be other users of UTF-8 strings fed to
printk().

--
Vasiliy Kulikov
http://www.openwall.com - bringing security into open computing environments

Pavel Machek

unread,

Jul 11, 2011, 2:40:02 AM7/11/11

to

On Wed 2011-06-22 08:37:42, Greg KH wrote:
> On Wed, Jun 22, 2011 at 01:53:41PM +0400, Vasiliy Kulikov wrote:
> > This patch escapes all characters outside of allowed '\n' plus 0x20-0x7E
> > charset passed to printk().

> >
> > There are numerous printk() instances with user supplied input as "%s"
> > data, and unprivileged user may craft log messages with substrings
> > containing control characters via these printk()s. Control characters

> > might fool root viewing the logs via tty.
>
> There are "numerous" places this could happen? Shouldn't this be
> handled by the viewers of the log file and not the kernel itself?

well, currently cat /proc/kmsg is bad idea on multiuser system... right?

> And what could these control characters cause to be "fooled"?

terminals are powerful (or shall we say insecure?) these days. Including replies.

cat /proc/kmsg can result in something like "9q" be added to the next
bash command. It would not surprise me if nastier stuff was possible
with some xterm variant.

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html