[PATCH] lib/string.c: implement stpcpy

19 views
Skip to first unread message

Nick Desaulniers

unread,
Aug 14, 2020, 8:24:29 PM8/14/20
to Andrew Morton, Dávid Bolvanský, Eli Friedman, Nick Desaulniers, sta...@vger.kernel.org, Sami Tolvanen, Joe Perches, Tony Luck, Yury Norov, Daniel Axtens, Arvind Sankar, Dan Williams, Joel Fernandes (Google), Andy Shevchenko, Kees Cook, Alexandru Ardelean, linux-...@vger.kernel.org, clang-bu...@googlegroups.com
LLVM implemented a recent "libcall optimization" that lowers calls to
`sprintf(dest, "%s", str)` where the return value is used to
`stpcpy(dest, str) - dest`. This generally avoids the machinery involved
in parsing format strings.

`stpcpy` is just like `strcpy` except:
1. it returns the pointer to the new tail of `dest`. This allows you to
chain multiple calls to `stpcpy` in one statement.
2. it requires the parameters not to overlap. Calling `sprintf` with
overlapping arguments was clarified in ISO C99 and POSIX.1-2001 to be
undefined behavior.

`stpcpy` was first standardized in POSIX.1-2008.

Implement this so that we don't observe linkage failures due to missing
symbol definitions for `stpcpy`.

Similar to last year's fire drill with:
commit 5f074f3e192f ("lib/string.c: implement a basic bcmp")

This optimization was introduced into clang-12.

Cc: sta...@vger.kernel.org
Link: https://bugs.llvm.org/show_bug.cgi?id=47162
Link: https://github.com/ClangBuiltLinux/linux/issues/1126
Link: https://man7.org/linux/man-pages/man3/stpcpy.3.html
Link: https://pubs.opengroup.org/onlinepubs/9699919799/functions/stpcpy.html
Link: https://reviews.llvm.org/D85963
Reported-by: Sami Tolvanen <samito...@google.com>
Signed-off-by: Nick Desaulniers <ndesau...@google.com>
---
include/linux/string.h | 3 +++
lib/string.c | 23 +++++++++++++++++++++++
2 files changed, 26 insertions(+)

diff --git a/include/linux/string.h b/include/linux/string.h
index b1f3894a0a3e..e570b9b10f50 100644
--- a/include/linux/string.h
+++ b/include/linux/string.h
@@ -31,6 +31,9 @@ size_t strlcpy(char *, const char *, size_t);
#ifndef __HAVE_ARCH_STRSCPY
ssize_t strscpy(char *, const char *, size_t);
#endif
+#ifndef __HAVE_ARCH_STPCPY
+extern char *stpcpy(char *__restrict, const char *__restrict__);
+#endif

/* Wraps calls to strscpy()/memset(), no arch specific code required */
ssize_t strscpy_pad(char *dest, const char *src, size_t count);
diff --git a/lib/string.c b/lib/string.c
index 6012c385fb31..81bc4d62c256 100644
--- a/lib/string.c
+++ b/lib/string.c
@@ -272,6 +272,29 @@ ssize_t strscpy_pad(char *dest, const char *src, size_t count)
}
EXPORT_SYMBOL(strscpy_pad);

+#ifndef __HAVE_ARCH_STPCPY
+/**
+ * stpcpy - copy a string from src to dest returning a pointer to the new end
+ * of dest, including src's NULL terminator. May overrun dest.
+ * @dest: pointer to end of string being copied into. Must be large enough
+ * to receive copy.
+ * @src: pointer to the beginning of string being copied from. Must not overlap
+ * dest.
+ *
+ * stpcpy differs from strcpy in two key ways:
+ * 1. inputs must not overlap.
+ * 2. return value is the new NULL terminated character. (for strcpy, the
+ * return value is a pointer to src.
+ */
+#undef stpcpy
+char *stpcpy(char *__restrict__ dest, const char *__restrict__ src)
+{
+ while ((*dest++ = *src++) != '\0')
+ /* nothing */;
+ return dest;
+}
+#endif
+
#ifndef __HAVE_ARCH_STRCAT
/**
* strcat - Append one %NUL-terminated string to another
--
2.28.0.220.ged08abb693-goog

Joe Perches

unread,
Aug 14, 2020, 8:52:05 PM8/14/20
to Nick Desaulniers, Andrew Morton, Dávid Bolvanský, Eli Friedman, sta...@vger.kernel.org, Sami Tolvanen, Tony Luck, Yury Norov, Daniel Axtens, Arvind Sankar, Dan Williams, Joel Fernandes (Google), Andy Shevchenko, Kees Cook, Alexandru Ardelean, linux-...@vger.kernel.org, clang-bu...@googlegroups.com
On Fri, 2020-08-14 at 17:24 -0700, Nick Desaulniers wrote:
> LLVM implemented a recent "libcall optimization" that lowers calls to
> `sprintf(dest, "%s", str)` where the return value is used to
> `stpcpy(dest, str) - dest`. This generally avoids the machinery involved
> in parsing format strings.
[]
> diff --git a/include/linux/string.h b/include/linux/string.h
[]
> @@ -31,6 +31,9 @@ size_t strlcpy(char *, const char *, size_t);
> #ifndef __HAVE_ARCH_STRSCPY
> ssize_t strscpy(char *, const char *, size_t);
> #endif
> +#ifndef __HAVE_ARCH_STPCPY
> +extern char *stpcpy(char *__restrict, const char *__restrict__);

Why use two different forms for __restrict and __restrict__?
Any real reason to use __restrict__ at all?

It's used nowhere else in the kernel.

$ git grep -w -P '__restrict_{0,2}'
scripts/genksyms/keywords.c: // According to rth, c99 defines "_Bool", __restrict", __restrict__", "restrict". KAO
scripts/genksyms/keywords.c: { "__restrict__", RESTRICT_KEYW },


Sami Tolvanen

unread,
Aug 14, 2020, 8:53:40 PM8/14/20
to Nick Desaulniers, Andrew Morton, Dávid Bolvanský, Eli Friedman, sta...@vger.kernel.org, Joe Perches, Tony Luck, Yury Norov, Daniel Axtens, Arvind Sankar, Dan Williams, Joel Fernandes (Google), Andy Shevchenko, Kees Cook, Alexandru Ardelean, LKML, clang-built-linux
Hi Nick,
Thank you for the patch! I can confirm that this fixes the build for
me with ToT Clang.

Tested-by: Sami Tolvanen <samito...@google.com>

Sami

Arvind Sankar

unread,
Aug 14, 2020, 9:33:13 PM8/14/20
to Nick Desaulniers, Andrew Morton, Dávid Bolvanský, Eli Friedman, sta...@vger.kernel.org, Sami Tolvanen, Joe Perches, Tony Luck, Yury Norov, Daniel Axtens, Arvind Sankar, Dan Williams, Joel Fernandes (Google), Andy Shevchenko, Kees Cook, Alexandru Ardelean, linux-...@vger.kernel.org, clang-bu...@googlegroups.com
On Fri, Aug 14, 2020 at 05:24:15PM -0700, Nick Desaulniers wrote:
> +#ifndef __HAVE_ARCH_STPCPY
> +/**
> + * stpcpy - copy a string from src to dest returning a pointer to the new end
> + * of dest, including src's NULL terminator. May overrun dest.
> + * @dest: pointer to end of string being copied into. Must be large enough
> + * to receive copy.
> + * @src: pointer to the beginning of string being copied from. Must not overlap
> + * dest.
> + *
> + * stpcpy differs from strcpy in two key ways:
> + * 1. inputs must not overlap.
> + * 2. return value is the new NULL terminated character. (for strcpy, the
> + * return value is a pointer to src.
> + */
> +#undef stpcpy
> +char *stpcpy(char *__restrict__ dest, const char *__restrict__ src)
> +{
> + while ((*dest++ = *src++) != '\0')
> + /* nothing */;
> + return dest;
> +}
> +#endif
> +

Won't this return a pointer that's one _past_ the terminating NUL? I
think you need the increments to be inside the loop body, rather than as
part of the condition.

Nit: NUL is more correct than NULL to refer to the string terminator.

Arvind Sankar

unread,
Aug 14, 2020, 9:40:09 PM8/14/20
to Arvind Sankar, Nick Desaulniers, Andrew Morton, Dávid Bolvanský, Eli Friedman, sta...@vger.kernel.org, Sami Tolvanen, Joe Perches, Tony Luck, Yury Norov, Daniel Axtens, Dan Williams, Joel Fernandes (Google), Andy Shevchenko, Kees Cook, Alexandru Ardelean, linux-...@vger.kernel.org, clang-bu...@googlegroups.com
Also, strcpy (at least the one in the C standard) is undefined if the
strings overlap, so that's not really a difference.

Nick Desaulniers

unread,
Aug 14, 2020, 10:00:36 PM8/14/20
to Joe Perches, Andrew Morton, Dávid Bolvanský, Eli Friedman, # 3.4.x, Sami Tolvanen, Tony Luck, Yury Norov, Daniel Axtens, Arvind Sankar, Dan Williams, Joel Fernandes (Google), Andy Shevchenko, Kees Cook, Alexandru Ardelean, LKML, clang-built-linux
On Fri, Aug 14, 2020 at 5:52 PM Joe Perches <j...@perches.com> wrote:
>
> On Fri, 2020-08-14 at 17:24 -0700, Nick Desaulniers wrote:
> > LLVM implemented a recent "libcall optimization" that lowers calls to
> > `sprintf(dest, "%s", str)` where the return value is used to
> > `stpcpy(dest, str) - dest`. This generally avoids the machinery involved
> > in parsing format strings.
> []
> > diff --git a/include/linux/string.h b/include/linux/string.h
> []
> > @@ -31,6 +31,9 @@ size_t strlcpy(char *, const char *, size_t);
> > #ifndef __HAVE_ARCH_STRSCPY
> > ssize_t strscpy(char *, const char *, size_t);
> > #endif
> > +#ifndef __HAVE_ARCH_STPCPY
> > +extern char *stpcpy(char *__restrict, const char *__restrict__);
>
> Why use two different forms for __restrict and __restrict__?
> Any real reason to use __restrict__ at all?

Bah, sorry, I recently enabled some setting in my ~/.vimrc to help me
find my cursor better:
" highlight cursor
set cursorline
set cursorcolumn

Turns out this makes it pretty difficult to see underscores, or the
lack thereof. Will fix up.

>
> It's used nowhere else in the kernel.
>
> $ git grep -w -P '__restrict_{0,2}'
> scripts/genksyms/keywords.c: // According to rth, c99 defines "_Bool", __restrict", __restrict__", "restrict". KAO
> scripts/genksyms/keywords.c: { "__restrict__", RESTRICT_KEYW },
>
>


--
Thanks,
~Nick Desaulniers

Nick Desaulniers

unread,
Aug 14, 2020, 10:01:12 PM8/14/20
to Arvind Sankar, Andrew Morton, Dávid Bolvanský, Eli Friedman, # 3.4.x, Sami Tolvanen, Joe Perches, Tony Luck, Yury Norov, Daniel Axtens, Dan Williams, Joel Fernandes (Google), Andy Shevchenko, Kees Cook, Alexandru Ardelean, LKML, clang-built-linux
Yep, looks like I had a bug in my test program that masked this.
Thanks for triple checking.

>
> Nit: NUL is more correct than NULL to refer to the string terminator.

TIL.

--
Thanks,
~Nick Desaulniers

Nick Desaulniers

unread,
Aug 14, 2020, 10:09:54 PM8/14/20
to Andrew Morton, Dávid Bolvanský, Eli Friedman, Nick Desaulniers, sta...@vger.kernel.org, Arvind Sankar, Joe Perches, Sami Tolvanen, Vishal Verma, Dan Williams, Andy Shevchenko, Joel Fernandes (Google), Daniel Axtens, Kees Cook, Ingo Molnar, Yury Norov, Alexandru Ardelean, linux-...@vger.kernel.org, clang-bu...@googlegroups.com
LLVM implemented a recent "libcall optimization" that lowers calls to
`sprintf(dest, "%s", str)` where the return value is used to
`stpcpy(dest, str) - dest`. This generally avoids the machinery involved
in parsing format strings. Calling `sprintf` with overlapping arguments
was clarified in ISO C99 and POSIX.1-2001 to be undefined behavior.

`stpcpy` is just like `strcpy` except it returns the pointer to the new
tail of `dest`. This allows you to chain multiple calls to `stpcpy` in
one statement.

`stpcpy` was first standardized in POSIX.1-2008.

Implement this so that we don't observe linkage failures due to missing
symbol definitions for `stpcpy`.

Similar to last year's fire drill with:
commit 5f074f3e192f ("lib/string.c: implement a basic bcmp")

This optimization was introduced into clang-12.

Cc: sta...@vger.kernel.org
Link: https://bugs.llvm.org/show_bug.cgi?id=47162
Link: https://github.com/ClangBuiltLinux/linux/issues/1126
Link: https://man7.org/linux/man-pages/man3/stpcpy.3.html
Link: https://pubs.opengroup.org/onlinepubs/9699919799/functions/stpcpy.html
Link: https://reviews.llvm.org/D85963
Suggested-by: Arvind Sankar <nive...@alum.mit.edu>
Suggested-by: Joe Perches <j...@perches.com>
Reported-by: Sami Tolvanen <samito...@google.com>
Tested-by: Sami Tolvanen <samito...@google.com>
Signed-off-by: Nick Desaulniers <ndesau...@google.com>
---
Changes V2:
* Added Sami's Tested by; though the patch changed implementation, the
missing symbol at link time was the problem Sami was observing.
* Fix __restrict -> __restrict__ typo as per Joe.
* Drop note about restrict from commit message as per Arvind.
* Fix NULL -> NUL as per Arvind; NUL is ASCII '\0'. TIL
* Fix off by one error as per Arvind; I had another off by one error in
my test program that was masking this.

include/linux/string.h | 3 +++
lib/string.c | 23 +++++++++++++++++++++++
2 files changed, 26 insertions(+)

diff --git a/include/linux/string.h b/include/linux/string.h
index b1f3894a0a3e..7686dbca8582 100644
--- a/include/linux/string.h
+++ b/include/linux/string.h
@@ -31,6 +31,9 @@ size_t strlcpy(char *, const char *, size_t);
#ifndef __HAVE_ARCH_STRSCPY
ssize_t strscpy(char *, const char *, size_t);
#endif
+#ifndef __HAVE_ARCH_STPCPY
+extern char *stpcpy(char *__restrict__, const char *__restrict__);
+#endif

/* Wraps calls to strscpy()/memset(), no arch specific code required */
ssize_t strscpy_pad(char *dest, const char *src, size_t count);
diff --git a/lib/string.c b/lib/string.c
index 6012c385fb31..68ddbffbbd58 100644
--- a/lib/string.c
+++ b/lib/string.c
@@ -272,6 +272,29 @@ ssize_t strscpy_pad(char *dest, const char *src, size_t count)
}
EXPORT_SYMBOL(strscpy_pad);

+#ifndef __HAVE_ARCH_STPCPY
+/**
+ * stpcpy - copy a string from src to dest returning a pointer to the new end
+ * of dest, including src's NUL terminator. May overrun dest.
+ * @dest: pointer to end of string being copied into. Must be large enough
+ * to receive copy.
+ * @src: pointer to the beginning of string being copied from. Must not overlap
+ * dest.
+ *
+ * stpcpy differs from strcpy in two key ways:
+ * 1. inputs must not overlap.
+ * 2. return value is the new NULL terminated character. (for strcpy, the
+ * return value is a pointer to src.
+ */
+#undef stpcpy
+char *stpcpy(char *__restrict__ dest, const char *__restrict__ src)
+{
+ while ((*dest++ = *src++) != '\0')
+ /* nothing */;
+ return --dest;

Arvind Sankar

unread,
Aug 14, 2020, 10:58:20 PM8/14/20
to Nick Desaulniers, Andrew Morton, Dávid Bolvanský, Eli Friedman, sta...@vger.kernel.org, Arvind Sankar, Joe Perches, Sami Tolvanen, Vishal Verma, Dan Williams, Andy Shevchenko, Joel Fernandes (Google), Daniel Axtens, Kees Cook, Ingo Molnar, Yury Norov, Alexandru Ardelean, linux-...@vger.kernel.org, clang-bu...@googlegroups.com
Looks like you missed my second email: strcpy also does not allow inputs
to overlap. Couple typos below.

> + * 2. return value is the new NULL terminated character. (for strcpy, the
^^ NUL terminator.
> + * return value is a pointer to src.
^^ dest.)
> + */
> +#undef stpcpy
> +char *stpcpy(char *__restrict__ dest, const char *__restrict__ src)
> +{
> + while ((*dest++ = *src++) != '\0')
> + /* nothing */;
> + return --dest;
> +}
> +#endif
> +
> #ifndef __HAVE_ARCH_STRCAT
> /**
> * strcat - Append one %NUL-terminated string to another
> --
> 2.28.0.220.ged08abb693-goog
>

The kernel-doc comments in string.c currently have a mix of %NUL and
NUL, but the former seems to be more common. %NUL-terminator appears to
be the preferred wording.

Joe Perches

unread,
Aug 14, 2020, 11:42:40 PM8/14/20
to Nick Desaulniers, Andrew Morton, Dávid Bolvanský, Eli Friedman, sta...@vger.kernel.org, Arvind Sankar, Sami Tolvanen, Vishal Verma, Dan Williams, Andy Shevchenko, Joel Fernandes (Google), Daniel Axtens, Kees Cook, Ingo Molnar, Yury Norov, Alexandru Ardelean, linux-...@vger.kernel.org, clang-bu...@googlegroups.com
On Fri, 2020-08-14 at 19:09 -0700, Nick Desaulniers wrote:
> LLVM implemented a recent "libcall optimization" that lowers calls to
> `sprintf(dest, "%s", str)` where the return value is used to
> `stpcpy(dest, str) - dest`. This generally avoids the machinery involved
> in parsing format strings. Calling `sprintf` with overlapping arguments
> was clarified in ISO C99 and POSIX.1-2001 to be undefined behavior.
>
> `stpcpy` is just like `strcpy` except it returns the pointer to the new
> tail of `dest`. This allows you to chain multiple calls to `stpcpy` in
> one statement.
[]
> diff --git a/include/linux/string.h b/include/linux/string.h
[]
> @@ -31,6 +31,9 @@ size_t strlcpy(char *, const char *, size_t);
> #ifndef __HAVE_ARCH_STRSCPY
> ssize_t strscpy(char *, const char *, size_t);
> #endif
> +#ifndef __HAVE_ARCH_STPCPY
> +extern char *stpcpy(char *__restrict__, const char *__restrict__);

If __restrict__ is used, perhaps it should follow the
kernel style used by attributes like __iomem and __user

extern char *stpcpy(char __restrict *dest, const char __restrict *src);

(though I would lose the extern too)

char *stpcpy(char __restrict *dest, const char __restrict *src);


Kees Cook

unread,
Aug 15, 2020, 12:34:18 PM8/15/20
to Nick Desaulniers, Andrew Morton, Dávid Bolvanský, Eli Friedman, sta...@vger.kernel.org, Arvind Sankar, Joe Perches, Sami Tolvanen, Vishal Verma, Dan Williams, Andy Shevchenko, Joel Fernandes (Google), Daniel Axtens, Ingo Molnar, Yury Norov, Alexandru Ardelean, linux-...@vger.kernel.org, clang-bu...@googlegroups.com
On Fri, Aug 14, 2020 at 07:09:44PM -0700, Nick Desaulniers wrote:
> LLVM implemented a recent "libcall optimization" that lowers calls to
> `sprintf(dest, "%s", str)` where the return value is used to
> `stpcpy(dest, str) - dest`. This generally avoids the machinery involved
> in parsing format strings. Calling `sprintf` with overlapping arguments
> was clarified in ISO C99 and POSIX.1-2001 to be undefined behavior.
>
> `stpcpy` is just like `strcpy` except it returns the pointer to the new
> tail of `dest`. This allows you to chain multiple calls to `stpcpy` in
> one statement.

O_O What?

No; this is a _terrible_ API: there is no bounds checking, there are no
buffer sizes. Anything using the example sprintf() pattern is _already_
wrong and must be removed from the kernel. (Yes, I realize that the
kernel is *filled* with this bad assumption that "I'll never write more
than PAGE_SIZE bytes to this buffer", but that's both theoretically
wrong ("640k is enough for anybody") and has been known to be wrong in
practice too (e.g. when suddenly your writing routine is reachable by
splice(2) and you may not have a PAGE_SIZE buffer).

But we cannot _add_ another dangerous string API. We're already in a
terrible mess trying to remove strcpy[1], strlcpy[2], and strncpy[3]. This
needs to be addressed up by removing the unbounded sprintf() uses. (And
to do so without introducing bugs related to using snprintf() when
scnprintf() is expected[4].)

-Kees

[1] https://github.com/KSPP/linux/issues/88
[2] https://github.com/KSPP/linux/issues/89
[3] https://github.com/KSPP/linux/issues/90
[4] https://lore.kernel.org/lkml/20200810092100.GA42813@2f5448a72a42/

--
Kees Cook

Dávid Bolvanský

unread,
Aug 15, 2020, 1:38:55 PM8/15/20
to Kees Cook, Nick Desaulniers, Andrew Morton, Eli Friedman, sta...@vger.kernel.org, Arvind Sankar, Joe Perches, Sami Tolvanen, Vishal Verma, Dan Williams, Andy Shevchenko, Joel Fernandes (Google), Daniel Axtens, Ingo Molnar, Yury Norov, Alexandru Ardelean, linux-...@vger.kernel.org, clang-bu...@googlegroups.com
Yeah, sprintf calls should be replaced with something safer.

> Dňa 15. 8. 2020 o 18:34 užívateľ Kees Cook <kees...@chromium.org> napísal:

Nick Desaulniers

unread,
Aug 15, 2020, 4:48:04 PM8/15/20
to Kees Cook, Andrew Morton, Dávid Bolvanský, Eli Friedman, # 3.4.x, Arvind Sankar, Joe Perches, Sami Tolvanen, Vishal Verma, Dan Williams, Andy Shevchenko, Joel Fernandes (Google), Daniel Axtens, Ingo Molnar, Yury Norov, Alexandru Ardelean, LKML, clang-built-linux, Rasmus Villemoes
On Sat, Aug 15, 2020 at 9:34 AM Kees Cook <kees...@chromium.org> wrote:
>
> On Fri, Aug 14, 2020 at 07:09:44PM -0700, Nick Desaulniers wrote:
> > LLVM implemented a recent "libcall optimization" that lowers calls to
> > `sprintf(dest, "%s", str)` where the return value is used to
> > `stpcpy(dest, str) - dest`. This generally avoids the machinery involved
> > in parsing format strings. Calling `sprintf` with overlapping arguments
> > was clarified in ISO C99 and POSIX.1-2001 to be undefined behavior.
> >
> > `stpcpy` is just like `strcpy` except it returns the pointer to the new
> > tail of `dest`. This allows you to chain multiple calls to `stpcpy` in
> > one statement.
>
> O_O What?
>
> No; this is a _terrible_ API: there is no bounds checking, there are no
> buffer sizes. Anything using the example sprintf() pattern is _already_
> wrong and must be removed from the kernel. (Yes, I realize that the
> kernel is *filled* with this bad assumption that "I'll never write more
> than PAGE_SIZE bytes to this buffer", but that's both theoretically
> wrong ("640k is enough for anybody") and has been known to be wrong in
> practice too (e.g. when suddenly your writing routine is reachable by
> splice(2) and you may not have a PAGE_SIZE buffer).
>
> But we cannot _add_ another dangerous string API. We're already in a
> terrible mess trying to remove strcpy[1], strlcpy[2], and strncpy[3]. This
> needs to be addressed up by removing the unbounded sprintf() uses. (And
> to do so without introducing bugs related to using snprintf() when
> scnprintf() is expected[4].)

Well, everything (-next, mainline, stable) is broken right now (with
ToT Clang) without providing this symbol. I'm not going to go clean
the entire kernel's use of sprintf to get our CI back to being green.

Thoughts on not exposing the declaration in the header, and changing
the comment to be to the effect of:
"Exists only for optimizing compilers to replace calls to sprintf
with; generally unsafe as bounds checks aren't performed, do not use."
It's still a worthwhile optimization to have, even if the use that
generated it didn't do any bounds checking.
--
Thanks,
~Nick Desaulniers

Joe Perches

unread,
Aug 15, 2020, 5:24:04 PM8/15/20
to Nick Desaulniers, Kees Cook, Andrew Morton, Dávid Bolvanský, Eli Friedman, # 3.4.x, Arvind Sankar, Sami Tolvanen, Vishal Verma, Dan Williams, Andy Shevchenko, Joel Fernandes (Google), Daniel Axtens, Ingo Molnar, Yury Norov, Alexandru Ardelean, LKML, clang-built-linux, Rasmus Villemoes
Maybe this should get place in compiler-clang.h so it isn't
generic and public.

Something like:

---
include/linux/compiler-clang.h | 27 +++++++++++++++++++++++++++
1 file changed, 27 insertions(+)

diff --git a/include/linux/compiler-clang.h b/include/linux/compiler-clang.h
index cee0c728d39a..6279f1904e39 100644
--- a/include/linux/compiler-clang.h
+++ b/include/linux/compiler-clang.h
@@ -61,3 +61,30 @@
#if __has_feature(shadow_call_stack)
# define __noscs __attribute__((__no_sanitize__("shadow-call-stack")))
#endif
+
+#ifndef __HAVE_ARCH_STPCPY
+/**
+ * stpcpy - copy a string from src to dest returning a pointer to the new end
+ * of dest, including src's NULL terminator. May overrun dest.
+ * @dest: pointer to buffer being copied into.
+ * Must be large enough to receive copy.
+ * @src: pointer to the beginning of string being copied from.
+ * Must not overlap dest.
+ *
+ * This function exists _only_ to support clang's possible conversion of
+ * sprintf calls to stpcpy.
+ *
+ * stpcpy differs from strcpy in two key ways:
+ * 1. inputs must not overlap.
+ * 2. return value is dest's NUL termination character after copy.
+ * (for strcpy, the return value is a pointer to src)
+ */
+
+static inline char *stpcpy(char __restrict *dest, const char __restrict *src)

Dávid Bolvanský

unread,
Aug 15, 2020, 5:27:24 PM8/15/20
to Joe Perches, Nick Desaulniers, Kees Cook, Andrew Morton, Eli Friedman, # 3.4.x, Arvind Sankar, Sami Tolvanen, Vishal Verma, Dan Williams, Andy Shevchenko, Joel Fernandes (Google), Daniel Axtens, Ingo Molnar, Yury Norov, Alexandru Ardelean, LKML, clang-built-linux, Rasmus Villemoes
-fno-builtin-stpcpy can be used to disable stpcpy but Nick at llvm bugzilla wrote that these flags are broken with LTO.


> Dňa 15. 8. 2020 o 23:24 užívateľ Joe Perches <j...@perches.com> napísal:

Nick Desaulniers

unread,
Aug 15, 2020, 5:28:58 PM8/15/20
to Joe Perches, Kees Cook, Andrew Morton, Dávid Bolvanský, Eli Friedman, # 3.4.x, Arvind Sankar, Sami Tolvanen, Vishal Verma, Dan Williams, Andy Shevchenko, Joel Fernandes (Google), Daniel Axtens, Ingo Molnar, Yury Norov, Alexandru Ardelean, LKML, clang-built-linux, Rasmus Villemoes
https://bugs.llvm.org/show_bug.cgi?id=47162#c7 and
https://bugs.llvm.org/show_bug.cgi?id=47144
Seem to imply that Clang is not the only compiler that can lower a
sequence of libcalls to stpcpy. Do we want to wait until we have a
fire drill w/ GCC to move such an implementation from
include/linux/compiler-clang.h back in to lib/string.c?
--
Thanks,
~Nick Desaulniers

Joe Perches

unread,
Aug 15, 2020, 5:31:41 PM8/15/20
to Nick Desaulniers, Kees Cook, Andrew Morton, Dávid Bolvanský, Eli Friedman, # 3.4.x, Arvind Sankar, Sami Tolvanen, Vishal Verma, Dan Williams, Andy Shevchenko, Joel Fernandes (Google), Daniel Axtens, Ingo Molnar, Yury Norov, Alexandru Ardelean, LKML, clang-built-linux, Rasmus Villemoes
My guess is yes, wait until gcc, if ever, needs it.


David Laight

unread,
Aug 15, 2020, 6:17:48 PM8/15/20
to Nick Desaulniers, Andrew Morton, Dávid Bolvanský, Eli Friedman, sta...@vger.kernel.org, Sami Tolvanen, Joe Perches, Tony Luck, Yury Norov, Daniel Axtens, Arvind Sankar, Dan Williams, Joel Fernandes (Google), Andy Shevchenko, Kees Cook, Alexandru Ardelean, linux-...@vger.kernel.org, clang-bu...@googlegroups.com
From: Nick Desaulniers
> Sent: 15 August 2020 01:24
>
> LLVM implemented a recent "libcall optimization" that lowers calls to
> `sprintf(dest, "%s", str)` where the return value is used to
> `stpcpy(dest, str) - dest`. This generally avoids the machinery involved
> in parsing format strings.
>
> `stpcpy` is just like `strcpy` except:
> 1. it returns the pointer to the new tail of `dest`. This allows you to
> chain multiple calls to `stpcpy` in one statement.
> 2. it requires the parameters not to overlap. Calling `sprintf` with
> overlapping arguments was clarified in ISO C99 and POSIX.1-2001 to be
> undefined behavior.
>
> `stpcpy` was first standardized in POSIX.1-2008.
>
> Implement this so that we don't observe linkage failures due to missing
> symbol definitions for `stpcpy`.
>
..
...
> diff --git a/include/linux/string.h b/include/linux/string.h
> index b1f3894a0a3e..e570b9b10f50 100644
> --- a/include/linux/string.h
> +++ b/include/linux/string.h
> @@ -31,6 +31,9 @@ size_t strlcpy(char *, const char *, size_t);
> #ifndef __HAVE_ARCH_STRSCPY
> ssize_t strscpy(char *, const char *, size_t);
> #endif
> +#ifndef __HAVE_ARCH_STPCPY
> +extern char *stpcpy(char *__restrict, const char *__restrict__);
> +#endif
>
> /* Wraps calls to strscpy()/memset(), no arch specific code required */
> ssize_t strscpy_pad(char *dest, const char *src, size_t count);
> diff --git a/lib/string.c b/lib/string.c
> index 6012c385fb31..81bc4d62c256 100644
> --- a/lib/string.c
> +++ b/lib/string.c
> @@ -272,6 +272,29 @@ ssize_t strscpy_pad(char *dest, const char *src, size_t count)
> }
> EXPORT_SYMBOL(strscpy_pad);
>
> +#ifndef __HAVE_ARCH_STPCPY
...
> +#undef stpcpy
> +char *stpcpy(char *__restrict__ dest, const char *__restrict__ src)
> +{
> + while ((*dest++ = *src++) != '\0')
> + /* nothing */;
> + return dest;
> +}
> +#endif
> +

Hmmmm....
Maybe the compiler should just inline the above?

OTOH there are faster copies on 64bit systems
(for moderate length strings).

It would also be nicer if the compiler actually used/required
a symbol in the 'reserved for the implementation' namespace.
Then the linker should be able to do a fixup to a differently
name symbol - if that is required.

But compiler writers enjoy making embedded coders life hell.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

Nick Desaulniers

unread,
Aug 15, 2020, 6:17:51 PM8/15/20
to Joe Perches, Kees Cook, Andrew Morton, Dávid Bolvanský, Eli Friedman, # 3.4.x, Arvind Sankar, Sami Tolvanen, Vishal Verma, Dan Williams, Andy Shevchenko, Joel Fernandes (Google), Daniel Axtens, Ingo Molnar, Yury Norov, Alexandru Ardelean, LKML, clang-built-linux, Rasmus Villemoes
The suggestion to use static inline doesn't even make sense. The
compiler is lowering calls to other library routines; `stpcpy` isn't
being explicitly called. Even if it was, not sure we want it being
inlined. No symbol definition will be emitted; problem not solved.
And I refuse to add any more code using `extern inline`. Putting the
definition in lib/string.c is the most straightforward and avoids
revisiting this issue in the future for other toolchains. I'll limit
access by removing the declaration, and adding a comment to avoid its
use. But if you're going to use a gnu target triple without using
-ffreestanding because you *want* libcall optimizations, then you have
to provide symbols for all possible library routines!
--
Thanks,
~Nick Desaulniers

Fangrui Song

unread,
Aug 15, 2020, 8:19:23 PM8/15/20
to Nick Desaulniers, Joe Perches, Kees Cook, Andrew Morton, Dávid Bolvanský, Eli Friedman, # 3.4.x, Arvind Sankar, Sami Tolvanen, Vishal Verma, Dan Williams, Andy Shevchenko, Joel Fernandes (Google), Daniel Axtens, Ingo Molnar, Yury Norov, Alexandru Ardelean, LKML, clang-built-linux, Rasmus Villemoes
Adding a definition without a declaration for stpcpy looks good.
Clang LTO will work.

(If the kernel does not want to provide these routines,
is http://git.kernel.org/linus/6edfba1b33c701108717f4e036320fc39abe1912
probably wrong? (why remove -ffreestanding from the main Makefile) )

Sedat Dilek

unread,
Aug 16, 2020, 1:22:48 AM8/16/20
to Fangrui Song, Nick Desaulniers, Joe Perches, Kees Cook, Andrew Morton, Dávid Bolvanský, Eli Friedman, # 3.4.x, Arvind Sankar, Sami Tolvanen, Vishal Verma, Dan Williams, Andy Shevchenko, Joel Fernandes (Google), Daniel Axtens, Ingo Molnar, Yury Norov, Alexandru Ardelean, LKML, clang-built-linux, Rasmus Villemoes
We had some many issues in arch/x86 where *FLAGS were removed or used
differently and had to re-add them :-(.

So if -ffreestanding is used in arch/x86 and was! used in top-level
Makefile - it makes sense to re-add it back?
( I cannot speak for archs other than x86. )

- Sedat -

Arvind Sankar

unread,
Aug 16, 2020, 11:02:21 AM8/16/20
to Sedat Dilek, Fangrui Song, Nick Desaulniers, Joe Perches, Kees Cook, Andrew Morton, Dávid Bolvanský, Eli Friedman, # 3.4.x, Arvind Sankar, Sami Tolvanen, Vishal Verma, Dan Williams, Andy Shevchenko, Joel Fernandes (Google), Daniel Axtens, Ingo Molnar, Yury Norov, Alexandru Ardelean, LKML, clang-built-linux, Rasmus Villemoes
-ffreestanding disables _all_ builtins and libcall optimizations, which
is probably not desirable. If we added it back, we'd need to also go
back to #define various string functions to the __builtin versions.

Though I don't understand the original issue, with -ffreestanding,
sprintf shouldn't have been turned into strcpy in the first place.

32-bit still has -ffreestanding -- I wonder if that's actually necessary
any more?

Why does -fno-builtin-stpcpy not work with clang LTO? Isn't that a
compiler bug?

Thanks.

Sami Tolvanen

unread,
Aug 17, 2020, 1:14:56 PM8/17/20
to Arvind Sankar, Sedat Dilek, Fangrui Song, Nick Desaulniers, Joe Perches, Kees Cook, Andrew Morton, Dávid Bolvanský, Eli Friedman, # 3.4.x, Vishal Verma, Dan Williams, Andy Shevchenko, Joel Fernandes (Google), Daniel Axtens, Ingo Molnar, Yury Norov, Alexandru Ardelean, LKML, clang-built-linux, Rasmus Villemoes
I just confirmed that adding -fno-builtin-stpcpy to KBUILD_CFLAGS does
work with LTO as well.

Sami

Nick Desaulniers

unread,
Aug 17, 2020, 2:37:01 PM8/17/20
to Sami Tolvanen, Arvind Sankar, Kees Cook, Sedat Dilek, Fangrui Song, Joe Perches, Andrew Morton, Dávid Bolvanský, Eli Friedman, # 3.4.x, Vishal Verma, Dan Williams, Andy Shevchenko, Joel Fernandes (Google), Daniel Axtens, Ingo Molnar, Yury Norov, Alexandru Ardelean, LKML, clang-built-linux, Rasmus Villemoes, Nathan Chancellor
On Mon, Aug 17, 2020 at 10:14 AM Sami Tolvanen <samito...@google.com> wrote:
>
> On Sun, Aug 16, 2020 at 8:02 AM Arvind Sankar <nive...@alum.mit.edu> wrote:
> >
> > On Sun, Aug 16, 2020 at 07:22:35AM +0200, Sedat Dilek wrote:
> > > On Sun, Aug 16, 2020 at 2:19 AM 'Fangrui Song' via Clang Built Linux
> > > <clang-bu...@googlegroups.com> wrote:
> > > >
> > > > Adding a definition without a declaration for stpcpy looks good.
> > > > Clang LTO will work.
> > > >
> > > > (If the kernel does not want to provide these routines,
> > > > is http://git.kernel.org/linus/6edfba1b33c701108717f4e036320fc39abe1912
> > > > probably wrong? (why remove -ffreestanding from the main Makefile) )
> > > >
> > >
> > > We had some many issues in arch/x86 where *FLAGS were removed or used
> > > differently and had to re-add them :-(.
> > >
> > > So if -ffreestanding is used in arch/x86 and was! used in top-level
> > > Makefile - it makes sense to re-add it back?
> > > ( I cannot speak for archs other than x86. )
> > >
> > > - Sedat -
> >
> > -ffreestanding disables _all_ builtins and libcall optimizations, which
> > is probably not desirable. If we added it back, we'd need to also go

I agree.

> > back to #define various string functions to the __builtin versions.
> >
> > Though I don't understand the original issue, with -ffreestanding,
> > sprintf shouldn't have been turned into strcpy in the first place.

Huh? The original issue for this thread is because `-ffreestanding`
*isn't* being used for most targets (oh boy, actually mixed usage by
ARCH. Looks like MIPS, m68k, superH, xtensa, and 32b x86 use it?); and
I'm not suggesting it be used.

> > 32-bit still has -ffreestanding -- I wonder if that's actually necessary
> > any more?

Fair question. Someone will have to go chase git history, since
0a6ef376d4ba covers it up. If anyone has any tricks to do so quickly;
I'd love to know. I generally checkout the commit prior, then use vim
fugitive to get git blame.

> > Why does -fno-builtin-stpcpy not work with clang LTO? Isn't that a
> > compiler bug?

Yes; Sami found a recent patch that looks to me like it may have
recently solved that bug.
https://reviews.llvm.org/D71193 which landed Dec 12 2019. The bug
report was based on
https://github.com/ClangBuiltLinux/linux/issues/416#issuecomment-472231304
(Issue reported March 8 2019). And I do recall being able to
reproduce the bug when I sent
commit 5f074f3e192f ("lib/string.c: implement a basic bcmp")

Now that that is fixed as reported by Sami below, I don't mind sending
a revert for 5f074f3e192f that adds -fno-builtin-bcmp, because the
current implementation of bcmp isn't useful.

That said, this libcall optimization/transformation (sprintf->stpcpy)
does look useful to me. Kees, do you have thoughts on me providing
the implementation without exposing it in a header vs using
-fno-builtin-stpcpy? (I would need to add the missing EXPORT_SYMBOL,
as pointed out by 0day bot and on the github thread). I don't care
either way; I'd just like your input before sending a V+1. Maybe
better to just not implement it and never implement it?

>
> I just confirmed that adding -fno-builtin-stpcpy to KBUILD_CFLAGS does
> work with LTO as well.
>
> Sami

--
Thanks,
~Nick Desaulniers

Kees Cook

unread,
Aug 17, 2020, 3:15:25 PM8/17/20
to Nick Desaulniers, Sami Tolvanen, Arvind Sankar, Sedat Dilek, Fangrui Song, Joe Perches, Andrew Morton, Dávid Bolvanský, Eli Friedman, # 3.4.x, Vishal Verma, Dan Williams, Andy Shevchenko, Joel Fernandes (Google), Daniel Axtens, Ingo Molnar, Yury Norov, Alexandru Ardelean, LKML, clang-built-linux, Rasmus Villemoes, Nathan Chancellor
On Mon, Aug 17, 2020 at 11:36:49AM -0700, Nick Desaulniers wrote:
> That said, this libcall optimization/transformation (sprintf->stpcpy)
> does look useful to me. Kees, do you have thoughts on me providing
> the implementation without exposing it in a header vs using
> -fno-builtin-stpcpy? (I would need to add the missing EXPORT_SYMBOL,
> as pointed out by 0day bot and on the github thread). I don't care
> either way; I'd just like your input before sending a V+1. Maybe
> better to just not implement it and never implement it?

I think I would ultimately prefer -fno-builtin-stpcpy, but for now,
sure, an implementation without a header (and a biiig comment above it
detailing why and a reference to the bug) would be fine by me.

--
Kees Cook

Kees Cook

unread,
Aug 17, 2020, 3:16:41 PM8/17/20
to Sami Tolvanen, Arvind Sankar, Sedat Dilek, Fangrui Song, Nick Desaulniers, Joe Perches, Andrew Morton, Dávid Bolvanský, Eli Friedman, # 3.4.x, Vishal Verma, Dan Williams, Andy Shevchenko, Joel Fernandes (Google), Daniel Axtens, Ingo Molnar, Yury Norov, Alexandru Ardelean, LKML, clang-built-linux, Rasmus Villemoes
On Mon, Aug 17, 2020 at 10:14:43AM -0700, Sami Tolvanen wrote:
> I just confirmed that adding -fno-builtin-stpcpy to KBUILD_CFLAGS does
> work with LTO as well.

Oh, I read this out of order; sorry! Yes, if -fno-builtin-stpcpy works,
let's use that instead. Doesn't that solve it too?

--
Kees Cook

Arvind Sankar

unread,
Aug 17, 2020, 4:13:55 PM8/17/20
to Nick Desaulniers, Sami Tolvanen, Arvind Sankar, Kees Cook, Sedat Dilek, Fangrui Song, Joe Perches, Andrew Morton, Dávid Bolvanský, Eli Friedman, # 3.4.x, Vishal Verma, Dan Williams, Andy Shevchenko, Joel Fernandes (Google), Daniel Axtens, Ingo Molnar, Yury Norov, Alexandru Ardelean, LKML, clang-built-linux, Rasmus Villemoes, Nathan Chancellor
On Mon, Aug 17, 2020 at 11:36:49AM -0700, Nick Desaulniers wrote:
> > > Though I don't understand the original issue, with -ffreestanding,
> > > sprintf shouldn't have been turned into strcpy in the first place.
>
> Huh? The original issue for this thread is because `-ffreestanding`
> *isn't* being used for most targets (oh boy, actually mixed usage by
> ARCH. Looks like MIPS, m68k, superH, xtensa, and 32b x86 use it?); and
> I'm not suggesting it be used.
>

Sorry, I meant the issue mentioned in the commit that removed
-ffreestanding, not the stpcpy one you're solving now. It says that
sprintf got converted into strcpy, which caused failures because back
then, strcpy was #define'd to __builtin_strcpy, and the default
implementation was actually of a function called __builtin_strcpy o_O,
not strcpy.

Anyway, that's water under the bridge now.

6edfba1b33c7 ("x86_64: Don't define string functions to builtin")
gcc should handle this anyways, and it causes problems when
sprintf is turned into strcpy by gcc behind our backs and
the C fallback version of strcpy is actually defining __builtin_strcpy

Nick Desaulniers

unread,
Aug 17, 2020, 5:45:39 PM8/17/20
to Arvind Sankar, Sami Tolvanen, Kees Cook, Sedat Dilek, Fangrui Song, Joe Perches, Andrew Morton, Dávid Bolvanský, Eli Friedman, # 3.4.x, Vishal Verma, Dan Williams, Andy Shevchenko, Joel Fernandes (Google), Daniel Axtens, Ingo Molnar, Yury Norov, Alexandru Ardelean, LKML, clang-built-linux, Rasmus Villemoes, Nathan Chancellor
For fun, I tried removing `-ffreestanding` from arch/x86/Makefile;
both gcc and clang can compile+boot the i386 defconfig just fine. Why
don't I send a patch removing it with your suggested by in a series of
fixes for stpcpy and bcmp?

--
Thanks,
~Nick Desaulniers

David Laight

unread,
Aug 18, 2020, 4:24:24 AM8/18/20
to Nick Desaulniers, Sami Tolvanen, Arvind Sankar, Kees Cook, Sedat Dilek, Fangrui Song, Joe Perches, Andrew Morton, Dávid Bolvanský, Eli Friedman, # 3.4.x, Vishal Verma, Dan Williams, Andy Shevchenko, Joel Fernandes (Google), Daniel Axtens, Ingo Molnar, Yury Norov, Alexandru Ardelean, LKML, clang-built-linux, Rasmus Villemoes, Nathan Chancellor
From: Nick Desaulniers
> Sent: 17 August 2020 19:37
..
> That said, this libcall optimization/transformation (sprintf->stpcpy)
> does look useful to me.

I'd rather get a cow if I ask for a cow...

Maybe checkpatch (etc) could report places where snprintf()
could be replaced by a simpler function.

Actually the compilers need an option to leave snprintf()
calls alone.
The next 'optimisation' could easily be to parse the format
string at compile time and call separate functions for
each parameter.
While that may be faster, it will bloat code paths that
aren't really important.

Joe Perches

unread,
Aug 18, 2020, 4:32:18 AM8/18/20
to David Laight, Nick Desaulniers, Sami Tolvanen, Arvind Sankar, Kees Cook, Sedat Dilek, Fangrui Song, Andrew Morton, Dávid Bolvanský, Eli Friedman, # 3.4.x, Vishal Verma, Dan Williams, Andy Shevchenko, Joel Fernandes (Google), Daniel Axtens, Ingo Molnar, Yury Norov, Alexandru Ardelean, LKML, clang-built-linux, Rasmus Villemoes, Nathan Chancellor
On Tue, 2020-08-18 at 08:24 +0000, David Laight wrote:
> From: Nick Desaulniers
> > Sent: 17 August 2020 19:37
> ..
> > That said, this libcall optimization/transformation (sprintf->stpcpy)
> > does look useful to me.
>
> I'd rather get a cow if I ask for a cow...
>
> Maybe checkpatch (etc) could report places where snprintf()
> could be replaced by a simpler function.

You mean sprintf no?

Reminds me of seq_printf vs seq_puts...

https://lkml.org/lkml/2013/3/16/79


David Laight

unread,
Aug 18, 2020, 5:07:40 AM8/18/20
to Joe Perches, Nick Desaulniers, Sami Tolvanen, Arvind Sankar, Kees Cook, Sedat Dilek, Fangrui Song, Andrew Morton, Dávid Bolvanský, Eli Friedman, # 3.4.x, Vishal Verma, Dan Williams, Andy Shevchenko, Joel Fernandes (Google), Daniel Axtens, Ingo Molnar, Yury Norov, Alexandru Ardelean, LKML, clang-built-linux, Rasmus Villemoes, Nathan Chancellor
From: Joe Perches
> Sent: 18 August 2020 09:32
>
> On Tue, 2020-08-18 at 08:24 +0000, David Laight wrote:
> > From: Nick Desaulniers
> > > Sent: 17 August 2020 19:37
> > ..
> > > That said, this libcall optimization/transformation (sprintf->stpcpy)
> > > does look useful to me.
> >
> > I'd rather get a cow if I ask for a cow...
> >
> > Maybe checkpatch (etc) could report places where snprintf()
> > could be replaced by a simpler function.
>
> You mean sprintf no?

No one should be using sprintf() so I typed snprintf() :-)

> Reminds me of seq_printf vs seq_puts...
>
> https://lkml.org/lkml/2013/3/16/79

At least that was a choice in the header file, not some
'random' conversion the compiler decided to do.

I've got annoyed in the past by it converting:
printf("%s", string) to puts(string)
now maybe the build should have had -ffree-standing
but I'd only got an implementation of printf().

Nick Desaulniers

unread,
Aug 25, 2020, 9:58:57 AM8/25/20
to Masahiro Yamada, clang-bu...@googlegroups.com, Nick Desaulniers, sta...@vger.kernel.org, Andy Lavr, Arvind Sankar, Joe Perches, Rasmus Villemoes, Sami Tolvanen, Andrew Morton, Kees Cook, Andy Shevchenko, Alexandru Ardelean, Yury Norov, linux-...@vger.kernel.org
LLVM implemented a recent "libcall optimization" that lowers calls to
`sprintf(dest, "%s", str)` where the return value is used to
`stpcpy(dest, str) - dest`. This generally avoids the machinery involved
in parsing format strings. `stpcpy` is just like `strcpy` except it
returns the pointer to the new tail of `dest`. This optimization was
introduced into clang-12.

Implement this so that we don't observe linkage failures due to missing
symbol definitions for `stpcpy`.

Similar to last year's fire drill with:
commit 5f074f3e192f ("lib/string.c: implement a basic bcmp")

The kernel is somewhere between a "freestanding" environment (no full libc)
and "hosted" environment (many symbols from libc exist with the same
type, function signature, and semantics).

As H. Peter Anvin notes, there's not really a great way to inform the
compiler that you're targeting a freestanding environment but would like
to opt-in to some libcall optimizations (see pr/47280 below), rather than
opt-out.

Arvind notes, -fno-builtin-* behaves slightly differently between GCC
and Clang, and Clang is missing many __builtin_* definitions, which I
consider a bug in Clang and am working on fixing.

Masahiro summarizes the subtle distinction between compilers justly:
To prevent transformation from foo() into bar(), there are two ways in
Clang to do that; -fno-builtin-foo, and -fno-builtin-bar. There is
only one in GCC; -fno-buitin-foo.

(Any difference in that behavior in Clang is likely a bug from a missing
__builtin_* definition.)

Masahiro also notes:
We want to disable optimization from foo() to bar(),
but we may still benefit from the optimization from
foo() into something else. If GCC implements the same transform, we
would run into a problem because it is not -fno-builtin-bar, but
-fno-builtin-foo that disables that optimization.

In this regard, -fno-builtin-foo would be more future-proof than
-fno-built-bar, but -fno-builtin-foo is still potentially overkill. We
may want to prevent calls from foo() being optimized into calls to
bar(), but we still may want other optimization on calls to foo().

It seems that compilers today don't quite provide the fine grain control
over which libcall optimizations pseudo-freestanding environments would
prefer.

Finally, Kees notes that this interface is unsafe, so we should not
encourage its use. As such, I've removed the declaration from any
header, but it still needs to be exported to avoid linkage errors in
modules.
Link: https://bugs.llvm.org/show_bug.cgi?id=47280
Suggested-by: Andy Lavr <andy...@gmail.com>
Suggested-by: Arvind Sankar <nive...@alum.mit.edu>
Suggested-by: Joe Perches <j...@perches.com>
Suggested-by: Masahiro Yamada <masa...@kernel.org>
Suggested-by: Rasmus Villemoes <li...@rasmusvillemoes.dk>
Reported-by: Sami Tolvanen <samito...@google.com>
Signed-off-by: Nick Desaulniers <ndesau...@google.com>
---
Changes V3:
* Drop Sami's Tested by tag; newer patch.
* Add EXPORT_SYMBOL as per Andy.
* Rewrite commit message, rewrote part of what Masahiro said to be
generic in terms of foo() and bar().
* Prefer %NUL-terminated to NULL terminated. NUL is the ASCII character
'\0', as per Arvind and Rasmus.

Changes V2:
* Added Sami's Tested by; though the patch changed implementation, the
missing symbol at link time was the problem Sami was observing.
* Fix __restrict -> __restrict__ typo as per Joe.
* Drop note about restrict from commit message as per Arvind.
* Fix NULL -> NUL as per Arvind; NUL is ASCII '\0'. TIL
* Fix off by one error as per Arvind; I had another off by one error in
my test program that was masking this.

lib/string.c | 24 ++++++++++++++++++++++++
1 file changed, 24 insertions(+)

diff --git a/lib/string.c b/lib/string.c
index 6012c385fb31..6bd0cf0fb009 100644
--- a/lib/string.c
+++ b/lib/string.c
@@ -272,6 +272,30 @@ ssize_t strscpy_pad(char *dest, const char *src, size_t count)
}
EXPORT_SYMBOL(strscpy_pad);

+/**
+ * stpcpy - copy a string from src to dest returning a pointer to the new end
+ * of dest, including src's %NUL-terminator. May overrun dest.
+ * @dest: pointer to end of string being copied into. Must be large enough
+ * to receive copy.
+ * @src: pointer to the beginning of string being copied from. Must not overlap
+ * dest.
+ *
+ * stpcpy differs from strcpy in a key way: the return value is the new
+ * %NUL-terminated character. (for strcpy, the return value is a pointer to
+ * src. This interface is considered unsafe as it doesn't perform bounds
+ * checking of the inputs. As such it's not recommended for usage. Instead,
+ * its definition is provided in case the compiler lowers other libcalls to
+ * stpcpy.
+ */
+char *stpcpy(char *__restrict__ dest, const char *__restrict__ src);
+char *stpcpy(char *__restrict__ dest, const char *__restrict__ src)
+{
+ while ((*dest++ = *src++) != '\0')
+ /* nothing */;
+ return --dest;
+}
+EXPORT_SYMBOL(stpcpy);
+
#ifndef __HAVE_ARCH_STRCAT
/**
* strcat - Append one %NUL-terminated string to another
--
2.28.0.297.g1956fa8f8d-goog

Nick Desaulniers

unread,
Aug 25, 2020, 10:00:18 AM8/25/20
to Masahiro Yamada, clang-bu...@googlegroups.com, Nick Desaulniers, sta...@vger.kernel.org, Andy Lavr, Arvind Sankar, Joe Perches, Rasmus Villemoes, Sami Tolvanen, Andrew Morton, Andy Shevchenko, Kees Cook, Yury Norov, Alexandru Ardelean, linux-...@vger.kernel.org

Nathan Chancellor

unread,
Aug 25, 2020, 2:51:25 PM8/25/20
to Nick Desaulniers, h@ubuntu-n2-xlarge-x86, Masahiro Yamada, clang-bu...@googlegroups.com, sta...@vger.kernel.org, Andy Lavr, Arvind Sankar, Joe Perches, Rasmus Villemoes, Sami Tolvanen, Andrew Morton, Kees Cook, Andy Shevchenko, Alexandru Ardelean, Yury Norov, linux-...@vger.kernel.org
Tested-by: Nathan Chancellor <natecha...@gmail.com>

Kees Cook

unread,
Aug 26, 2020, 11:22:12 AM8/26/20
to Nick Desaulniers, Masahiro Yamada, clang-bu...@googlegroups.com, sta...@vger.kernel.org, Andy Lavr, Arvind Sankar, Joe Perches, Rasmus Villemoes, Sami Tolvanen, Andrew Morton, Andy Shevchenko, Yury Norov, Alexandru Ardelean, linux-...@vger.kernel.org
Acked-by: Kees Cook <kees...@chromium.org>

--
Kees Cook

Sedat Dilek

unread,
Aug 26, 2020, 11:41:28 AM8/26/20
to Nick Desaulniers, Masahiro Yamada, Clang-Built-Linux ML, sta...@vger.kernel.org, Andy Lavr, Arvind Sankar, Joe Perches, Rasmus Villemoes, Sami Tolvanen, Andrew Morton, Kees Cook, Andy Shevchenko, Alexandru Ardelean, Yury Norov, linux-...@vger.kernel.org
Tested-by: Sedat Dilek <sedat...@gmail.com> # included in Sami's
clang-cfi Git

- Sedat -
> --
> You received this message because you are subscribed to the Google Groups "Clang Built Linux" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to clang-built-li...@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/clang-built-linux/20200825135838.2938771-1-ndesaulniers%40google.com.

Masahiro Yamada

unread,
Aug 26, 2020, 11:59:28 AM8/26/20
to Nick Desaulniers, clang-built-linux, stable, Andy Lavr, Arvind Sankar, Joe Perches, Rasmus Villemoes, Sami Tolvanen, Andrew Morton, Kees Cook, Andy Shevchenko, Alexandru Ardelean, Yury Norov, Linux Kernel Mailing List
return a pointer to src?


"man 3 strcpy" says:

The strcpy() and strncpy() functions return
a pointer to the destination string *dest*.








> This interface is considered unsafe as it doesn't perform bounds
> + * checking of the inputs. As such it's not recommended for usage. Instead,
> + * its definition is provided in case the compiler lowers other libcalls to
> + * stpcpy.
> + */
> +char *stpcpy(char *__restrict__ dest, const char *__restrict__ src);
> +char *stpcpy(char *__restrict__ dest, const char *__restrict__ src)
> +{
> + while ((*dest++ = *src++) != '\0')
> + /* nothing */;
> + return --dest;
> +}
> +EXPORT_SYMBOL(stpcpy);
> +
> #ifndef __HAVE_ARCH_STRCAT
> /**
> * strcat - Append one %NUL-terminated string to another
> --
> 2.28.0.297.g1956fa8f8d-goog
>


--
Best Regards
Masahiro Yamada

Masahiro Yamada

unread,
Aug 26, 2020, 12:49:58 PM8/26/20
to Nick Desaulniers, clang-built-linux, stable, Andy Lavr, Arvind Sankar, Joe Perches, Rasmus Villemoes, Sami Tolvanen, Andrew Morton, Kees Cook, Andy Shevchenko, Alexandru Ardelean, Yury Norov, Linux Kernel Mailing List
On Tue, Aug 25, 2020 at 10:58 PM Nick Desaulniers
<ndesau...@google.com> wrote:
>
I do not have time to keep track of the discussion fully,
but could you give me a little more context why
the usage of stpcpy() is not recommended ?

The implementation of strcpy() is almost the same.
It is unclear to me what makes stpcpy() unsafe..



> + */
> +char *stpcpy(char *__restrict__ dest, const char *__restrict__ src);
> +char *stpcpy(char *__restrict__ dest, const char *__restrict__ src)
> +{
> + while ((*dest++ = *src++) != '\0')
> + /* nothing */;
> + return --dest;
> +}
> +EXPORT_SYMBOL(stpcpy);
> +
> #ifndef __HAVE_ARCH_STRCAT
> /**
> * strcat - Append one %NUL-terminated string to another
> --
> 2.28.0.297.g1956fa8f8d-goog
>


Joe Perches

unread,
Aug 26, 2020, 12:57:33 PM8/26/20
to Masahiro Yamada, Nick Desaulniers, clang-built-linux, stable, Andy Lavr, Arvind Sankar, Rasmus Villemoes, Sami Tolvanen, Andrew Morton, Kees Cook, Andy Shevchenko, Alexandru Ardelean, Yury Norov, Linux Kernel Mailing List
On Thu, 2020-08-27 at 01:49 +0900, Masahiro Yamada wrote:
> I do not have time to keep track of the discussion fully,
> but could you give me a little more context why
> the usage of stpcpy() is not recommended ?
>
> The implementation of strcpy() is almost the same.
> It is unclear to me what makes stpcpy() unsafe..

It's the same thing that makes strcpy unsafe:

Unchecked buffer lengths with no guarantee src is terminated.



Nick Desaulniers

unread,
Aug 26, 2020, 12:58:13 PM8/26/20
to Joe Perches, Masahiro Yamada, clang-built-linux, stable, Andy Lavr, Arvind Sankar, Rasmus Villemoes, Sami Tolvanen, Andrew Morton, Kees Cook, Andy Shevchenko, Alexandru Ardelean, Yury Norov, Linux Kernel Mailing List
On Wed, Aug 26, 2020 at 9:57 AM Joe Perches <j...@perches.com> wrote:
>
> On Thu, 2020-08-27 at 01:49 +0900, Masahiro Yamada wrote:
> > I do not have time to keep track of the discussion fully,
> > but could you give me a little more context why
> > the usage of stpcpy() is not recommended ?
> >
> > The implementation of strcpy() is almost the same.
> > It is unclear to me what makes stpcpy() unsafe..

https://lore.kernel.org/lkml/202008150921.B70721A359@keescook/

>
> It's the same thing that makes strcpy unsafe:
>
> Unchecked buffer lengths with no guarantee src is terminated.

--
Thanks,
~Nick Desaulniers

Masahiro Yamada

unread,
Aug 26, 2020, 7:00:45 PM8/26/20
to Nick Desaulniers, Joe Perches, clang-built-linux, stable, Andy Lavr, Arvind Sankar, Rasmus Villemoes, Sami Tolvanen, Andrew Morton, Kees Cook, Andy Shevchenko, Alexandru Ardelean, Yury Norov, Linux Kernel Mailing List
OK, then stpcpy(), strcpy() and sprintf()
have the same level of unsafety.


strcpy() is used everywhere.

I am not convinced why only stpcpy() should be hidden.

Kees Cook

unread,
Aug 26, 2020, 7:38:26 PM8/26/20
to Masahiro Yamada, Nick Desaulniers, Joe Perches, clang-built-linux, stable, Andy Lavr, Arvind Sankar, Rasmus Villemoes, Sami Tolvanen, Andrew Morton, Andy Shevchenko, Alexandru Ardelean, Yury Norov, Linux Kernel Mailing List
On Thu, Aug 27, 2020 at 07:59:45AM +0900, Masahiro Yamada wrote:
> On Thu, Aug 27, 2020 at 1:58 AM Nick Desaulniers
> <ndesau...@google.com> wrote:
> >
> > On Wed, Aug 26, 2020 at 9:57 AM Joe Perches <j...@perches.com> wrote:
> > >
> > > On Thu, 2020-08-27 at 01:49 +0900, Masahiro Yamada wrote:
> > > > I do not have time to keep track of the discussion fully,
> > > > but could you give me a little more context why
> > > > the usage of stpcpy() is not recommended ?
> > > >
> > > > The implementation of strcpy() is almost the same.
> > > > It is unclear to me what makes stpcpy() unsafe..
> >
> > https://lore.kernel.org/lkml/202008150921.B70721A359@keescook/
> >
> > >
> > > It's the same thing that makes strcpy unsafe:
> > >
> > > Unchecked buffer lengths with no guarantee src is terminated.
> >
>
>
> OK, then stpcpy(), strcpy() and sprintf()
> have the same level of unsafety.

Yes. And even snprintf() is dangerous because its return value is how
much it WOULD have written, which when (commonly) used as an offset for
further pointer writes, causes OOB writes too. :(
https://github.com/KSPP/linux/issues/105

> strcpy() is used everywhere.

Yes. It's very frustrating, but it's not an excuse to continue
using it nor introducing more bad APIs.

$ git grep '\bstrcpy\b' | wc -l
2212
$ git grep '\bstrncpy\b' | wc -l
751
$ git grep '\bstrlcpy\b' | wc -l
1712

$ git grep '\bstrscpy\b' | wc -l
1066

https://www.kernel.org/doc/html/latest/process/deprecated.html#strcpy
https://github.com/KSPP/linux/issues/88

https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings
https://github.com/KSPP/linux/issues/89

https://www.kernel.org/doc/html/latest/process/deprecated.html#strlcpy
https://github.com/KSPP/linux/issues/90

We have no way right now to block the addition of deprecated API usage,
which makes ever catching up on this replacement very challenging. The
only way we caught up with VLA removal was because of -Wvla on sfr's
-next builds.

I guess we could set up a robot to just watch -next commits and yell
about new instances, but patches come and go -- I worry it'd be noisy...

> I am not convinced why only stpcpy() should be hidden.

Because nothing uses it right now. It's only the compiler suddenly now
trying to use it directly...

--
Kees Cook

Joe Perches

unread,
Aug 26, 2020, 7:57:46 PM8/26/20
to Kees Cook, Masahiro Yamada, Nick Desaulniers, clang-built-linux, stable, Andy Lavr, Arvind Sankar, Rasmus Villemoes, Sami Tolvanen, Andrew Morton, Andy Shevchenko, Alexandru Ardelean, Yury Norov, Linux Kernel Mailing List
On Wed, 2020-08-26 at 16:38 -0700, Kees Cook wrote:
> On Thu, Aug 27, 2020 at 07:59:45AM +0900, Masahiro Yamada wrote:
[]
These could be added to checkpatch's deprecated_api test.
---
scripts/checkpatch.pl | 3 +++
1 file changed, 3 insertions(+)

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index 149518d2a6a7..f9ccb2a63a95 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -605,6 +605,9 @@ foreach my $entry (@mode_permission_funcs) {
$mode_perms_search = "(?:${mode_perms_search})";

our %deprecated_apis = (
+ "strcpy" => "strscpy",
+ "strncpy" => "strscpy",
+ "strlcpy" => "strscpy",
"synchronize_rcu_bh" => "synchronize_rcu",
"synchronize_rcu_bh_expedited" => "synchronize_rcu_expedited",
"call_rcu_bh" => "call_rcu",


Kees Cook

unread,
Aug 26, 2020, 10:33:48 PM8/26/20
to Joe Perches, Masahiro Yamada, Nick Desaulniers, clang-built-linux, stable, Andy Lavr, Arvind Sankar, Rasmus Villemoes, Sami Tolvanen, Andrew Morton, Andy Shevchenko, Alexandru Ardelean, Yury Norov, Linux Kernel Mailing List
Good idea, yeah. We, unfortunately, need to leave strncpy() off this
list for now because it's not *strictly* deprecated (see the notes in
bug report[1]), but the others can be.

[1] https://github.com/KSPP/linux/issues/89

--
Kees Cook

Joe Perches

unread,
Aug 26, 2020, 10:42:22 PM8/26/20
to Kees Cook, Masahiro Yamada, Nick Desaulniers, clang-built-linux, stable, Andy Lavr, Arvind Sankar, Rasmus Villemoes, Sami Tolvanen, Andrew Morton, Andy Shevchenko, Alexandru Ardelean, Yury Norov, Linux Kernel Mailing List
OK, but it is in Documentation/process/deprecated.rst

strncpy() on NUL-terminated strings
-----------------------------------
Use of strncpy() does not guarantee that the destination buffer
will be NUL terminated. This can lead to various linear read overflows
and other misbehavior due to the missing termination. It also NUL-pads the
destination buffer if the source contents are shorter than the destination
buffer size, which may be a needless performance penalty for callers using
only NUL-terminated strings. The safe replacement is strscpy().
(Users of strscpy() still needing NUL-padding should instead
use strscpy_pad().)

If a caller is using non-NUL-terminated strings, strncpy() can
still be used, but destinations should be marked with the `__nonstring
<https://gcc.gnu.org/onlinedocs/gcc/Common-Variable-Attributes.html>`_
attribute to avoid future compiler warnings.


Andy Shevchenko

unread,
Aug 27, 2020, 4:59:41 AM8/27/20
to Kees Cook, Masahiro Yamada, Nick Desaulniers, Joe Perches, clang-built-linux, stable, Andy Lavr, Arvind Sankar, Rasmus Villemoes, Sami Tolvanen, Andrew Morton, Andy Shevchenko, Alexandru Ardelean, Yury Norov, Linux Kernel Mailing List
On Thu, Aug 27, 2020 at 2:40 AM Kees Cook <kees...@chromium.org> wrote:
> On Thu, Aug 27, 2020 at 07:59:45AM +0900, Masahiro Yamada wrote:
> > On Thu, Aug 27, 2020 at 1:58 AM Nick Desaulniers
> > <ndesau...@google.com> wrote:
> > > On Wed, Aug 26, 2020 at 9:57 AM Joe Perches <j...@perches.com> wrote:
> > > > On Thu, 2020-08-27 at 01:49 +0900, Masahiro Yamada wrote:
> > > > > I do not have time to keep track of the discussion fully,
> > > > > but could you give me a little more context why
> > > > > the usage of stpcpy() is not recommended ?
> > > > >
> > > > > The implementation of strcpy() is almost the same.
> > > > > It is unclear to me what makes stpcpy() unsafe..
> > >
> > > https://lore.kernel.org/lkml/202008150921.B70721A359@keescook/
> > >
> > > >
> > > > It's the same thing that makes strcpy unsafe:
> > > >
> > > > Unchecked buffer lengths with no guarantee src is terminated.
> > >
> >
> >
> > OK, then stpcpy(), strcpy() and sprintf()
> > have the same level of unsafety.
>
> Yes. And even snprintf() is dangerous because its return value is how
> much it WOULD have written, which when (commonly) used as an offset for
> further pointer writes, causes OOB writes too. :(
> https://github.com/KSPP/linux/issues/105
>
> > strcpy() is used everywhere.
>
> Yes. It's very frustrating, but it's not an excuse to continue
> using it nor introducing more bad APIs.

strcpy() is not a bad API for the cases when you know what you are
doing. A problem that most of the developers do not know what they are
doing.
No need to split everything to bad and good by its name or semantics,
each API has its own pros and cons and programmers must use their
brains.
--
With Best Regards,
Andy Shevchenko

Kees Cook

unread,
Aug 27, 2020, 2:26:07 PM8/27/20
to Joe Perches, Masahiro Yamada, Nick Desaulniers, clang-built-linux, stable, Andy Lavr, Arvind Sankar, Rasmus Villemoes, Sami Tolvanen, Andrew Morton, Andy Shevchenko, Alexandru Ardelean, Yury Norov, Linux Kernel Mailing List
"... on NUL-terminated strings". It's "valid" to use it on known-size
(either external or by definition) NUL-padded buffers (e.g. NLA_STRING).

--
Kees Cook

Kees Cook

unread,
Aug 27, 2020, 2:30:51 PM8/27/20
to Andy Shevchenko, Masahiro Yamada, Nick Desaulniers, Joe Perches, clang-built-linux, stable, Andy Lavr, Arvind Sankar, Rasmus Villemoes, Sami Tolvanen, Andrew Morton, Andy Shevchenko, Alexandru Ardelean, Yury Norov, Linux Kernel Mailing List
On Thu, Aug 27, 2020 at 11:59:24AM +0300, Andy Shevchenko wrote:
> strcpy() is not a bad API for the cases when you know what you are
> doing. A problem that most of the developers do not know what they are
> doing.
> No need to split everything to bad and good by its name or semantics,
> each API has its own pros and cons and programmers must use their
> brains.

I equate "unsafe" or "fragile" with "bad". There's no reason to use our
brains for remembering what's safe or not when we can just remove unsafe
things from the available APIs, and/or lean on the compiler to help
(e.g. CONFIG_FORTIFY_SOURCE).

Most of the uses of strcpy() in the kernel are just copying between two
known-at-compile-time NUL-terminated character arrays. We had wanted to
introduce stracpy() for this, but Linus objected to yet more string
functions. So for now, I'm aimed at removing strlcpy() completely first,
then look at strcpy() -> strscpy() for cases where target size is NOT
compile-time known, and then to convert the kernel's strcpy() into
_requiring_ that source/dest lengths are known at compile time.

And then tackle strncpy(), which is a mess.

--
Kees Cook

Joe Perches

unread,
Aug 27, 2020, 3:37:08 PM8/27/20
to Kees Cook, Andy Shevchenko, Masahiro Yamada, Nick Desaulniers, clang-built-linux, stable, Andy Lavr, Arvind Sankar, Rasmus Villemoes, Sami Tolvanen, Andrew Morton, Andy Shevchenko, Alexandru Ardelean, Yury Norov, Linux Kernel Mailing List
On Thu, 2020-08-27 at 11:30 -0700, Kees Cook wrote:

> Most of the uses of strcpy() in the kernel are just copying between two
> known-at-compile-time NUL-terminated character arrays. We had wanted to
> introduce stracpy() for this, but Linus objected to yet more string
> functions.

https://lore.kernel.org/kernel-hardening/24bb53c57767c1c2...@sk2.org/T/

I still think stracpy is a good idea.

Maybe when the strcpy/strlcpy uses are removed
it'll be more acceptable.

And here's a cocci script to convert most of them.
https://lore.kernel.org/kernel-hardening/b9bb5550b264d4b29b2b20f...@perches.com/


Kees Cook

unread,
Aug 27, 2020, 3:41:48 PM8/27/20
to Joe Perches, Andy Shevchenko, Masahiro Yamada, Nick Desaulniers, clang-built-linux, stable, Andy Lavr, Arvind Sankar, Rasmus Villemoes, Sami Tolvanen, Andrew Morton, Andy Shevchenko, Alexandru Ardelean, Yury Norov, Linux Kernel Mailing List
Yeah, thanks again for that. Most of this is very mechanical. (strncpy is not, unfortunately)

--
Kees Cook

Andy Shevchenko

unread,
Aug 27, 2020, 4:05:59 PM8/27/20
to Kees Cook, Masahiro Yamada, Nick Desaulniers, Joe Perches, clang-built-linux, stable, Andy Lavr, Arvind Sankar, Rasmus Villemoes, Sami Tolvanen, Andrew Morton, Andy Shevchenko, Alexandru Ardelean, Yury Norov, Linux Kernel Mailing List
In general it's better to have a robust API, but what may go wrong
with the interface where we have no length of the buffer passed, but
we all know that it's PAGE_SIZE?
So, what's wrong with doing something like
strcpy(buf, "Yes, we know we won't overflow here\n");
?

Kees Cook

unread,
Aug 27, 2020, 6:26:14 PM8/27/20
to Andy Shevchenko, Masahiro Yamada, Nick Desaulniers, Joe Perches, clang-built-linux, stable, Andy Lavr, Arvind Sankar, Rasmus Villemoes, Sami Tolvanen, Andrew Morton, Andy Shevchenko, Alexandru Ardelean, Yury Norov, Linux Kernel Mailing List
On Thu, Aug 27, 2020 at 11:05:42PM +0300, Andy Shevchenko wrote:
> In general it's better to have a robust API, but what may go wrong
> with the interface where we have no length of the buffer passed, but
> we all know that it's PAGE_SIZE?
> So, what's wrong with doing something like
> strcpy(buf, "Yes, we know we won't overflow here\n");

(There's a whole thread[1] about this right now, actually.)

The problem isn't the uses where it's safe (obviously), it's about the
uses where it is NOT safe. (Or _looks_ safe but isn't.) In order to
eliminate bug classes, we need remove the APIs that are foot-guns. Even
if one developer never gets it wrong, others might.

[1] https://lore.kernel.org/lkml/c256eba42a564c01...@AcuMS.aculab.com/T/#mac95487d7ae427de03251b49b75dd4de40c2462d

--
Kees Cook

Andy Shevchenko

unread,
Aug 28, 2020, 4:17:58 AM8/28/20
to Kees Cook, Masahiro Yamada, Nick Desaulniers, Joe Perches, clang-built-linux, stable, Andy Lavr, Arvind Sankar, Rasmus Villemoes, Sami Tolvanen, Andrew Morton, Andy Shevchenko, Alexandru Ardelean, Yury Norov, Linux Kernel Mailing List
Seems to me that this is a fixation on an abstract problem that never
exists (of course, if a developer has brains to think).

Nick Desaulniers

unread,
Aug 31, 2020, 7:21:30 PM8/31/20
to Andy Shevchenko, Kees Cook, Masahiro Yamada, Joe Perches, clang-built-linux, stable, Andy Lavr, Arvind Sankar, Rasmus Villemoes, Sami Tolvanen, Andrew Morton, Andy Shevchenko, Alexandru Ardelean, Yury Norov, Linux Kernel Mailing List
On Thu, Aug 27, 2020 at 1:59 AM Andy Shevchenko
<andy.sh...@gmail.com> wrote:
>
> strcpy() is not a bad API for the cases when you know what you are
> doing. A problem that most of the developers do not know what they are
> doing.
> No need to split everything to bad and good by its name or semantics,
> each API has its own pros and cons and programmers must use their
> brains.

On Fri, Aug 28, 2020 at 1:17 AM Andy Shevchenko
<andy.sh...@gmail.com> wrote:
>
> Seems to me that this is a fixation on an abstract problem that never
> exists (of course, if a developer has brains to think).

Of course, no "True Scotsman" would accidentally misuse C string.h API!
https://yourlogicalfallacyis.com/no-true-scotsman

(I will note the irony of my off by one in my v1 implementation of
stpcpy. I've also missed strncpy zeroing the rest of a destination
buffer before. I might not be a "True Scotsman.")

On Thu, Aug 27, 2020 at 11:30 AM Kees Cook <kees...@chromium.org> wrote:
>
> I equate "unsafe" or "fragile" with "bad". There's no reason to use our
> brains for remembering what's safe or not when we can just remove unsafe
> things from the available APIs, and/or lean on the compiler to help
> (e.g. CONFIG_FORTIFY_SOURCE).

Having seatbelts is great (ie. fortify source), but is no substitute
for driving carefully (having proper APIs that help me not shoot my
foot off). I think it's nice to have *both*, but if I drove solely
relying on my seatbelts, we might all be in trouble. Not disagreeing
with you, Kees.
--
Thanks,
~Nick Desaulniers

David Laight

unread,
Sep 1, 2020, 4:51:17 AM9/1/20
to Nick Desaulniers, Andy Shevchenko, Kees Cook, Masahiro Yamada, Joe Perches, clang-built-linux, stable, Andy Lavr, Arvind Sankar, Rasmus Villemoes, Sami Tolvanen, Andrew Morton, Andy Shevchenko, Alexandru Ardelean, Yury Norov, Linux Kernel Mailing List
> Of course, no "True Scotsman" would accidentally misuse C string.h API!
> https://yourlogicalfallacyis.com/no-true-scotsman

Google will find plenty of:
str[strlen(str)] = 0;

Kees Cook

unread,
Sep 6, 2020, 5:57:30 AM9/6/20
to Masahiro Yamada, Nick Desaulniers, clang-built-linux, stable, Andy Lavr, Arvind Sankar, Joe Perches, Rasmus Villemoes, Sami Tolvanen, Andrew Morton, Andy Shevchenko, Alexandru Ardelean, Yury Norov, Linux Kernel Mailing List
On Thu, Aug 27, 2020 at 12:58:35AM +0900, Masahiro Yamada wrote:
> On Tue, Aug 25, 2020 at 10:58 PM Nick Desaulniers
> <ndesau...@google.com> wrote:
> > [...]
> > +/**
> > + * stpcpy - copy a string from src to dest returning a pointer to the new end
> > + * of dest, including src's %NUL-terminator. May overrun dest.
> > + * @dest: pointer to end of string being copied into. Must be large enough
> > + * to receive copy.
> > + * @src: pointer to the beginning of string being copied from. Must not overlap
> > + * dest.
> > + *
> > + * stpcpy differs from strcpy in a key way: the return value is the new
> > + * %NUL-terminated character. (for strcpy, the return value is a pointer to
> > + * src.
>
>
> return a pointer to src?
>
> "man 3 strcpy" says:
>
> The strcpy() and strncpy() functions return
> a pointer to the destination string *dest*.

Agreed; that's a typo.

--
Kees Cook

Nick Desaulniers

unread,
Sep 14, 2020, 12:10:12 PM9/14/20
to Andrew Morton, Kees Cook, Arvind Sankar, Masahiro Yamada, linux-...@vger.kernel.org, clang-bu...@googlegroups.com, Nick Desaulniers, Sami Tolvanen, Andy Lavr, Joe Perches, Rasmus Villemoes, Nathan Chancellor, sta...@vger.kernel.org
Reported-by: Sami Tolvanen <samito...@google.com>
Suggested-by: Andy Lavr <andy...@gmail.com>
Suggested-by: Arvind Sankar <nive...@alum.mit.edu>
Suggested-by: Joe Perches <j...@perches.com>
Suggested-by: Kees Cook <kees...@chromium.org>
Suggested-by: Masahiro Yamada <masa...@kernel.org>
Suggested-by: Rasmus Villemoes <li...@rasmusvillemoes.dk>
Signed-off-by: Nick Desaulniers <ndesau...@google.com>
Tested-by: Nathan Chancellor <natecha...@gmail.com>
Reviewed-by: Kees Cook <kees...@chromium.org>
---
Changes V4:
* Roll up Kees' comment fixup from
https://lore.kernel.org/lkml/202009060302.4574D8D0E0@keescook/#t.
* Keep Nathan's tested by tag.
* Add Kees' reviewed by tag from
https://lore.kernel.org/lkml/202009031446.3865FE82B@keescook/.

Changes V3:
* Drop Sami's Tested by tag; newer patch.
* Add EXPORT_SYMBOL as per Andy.
* Rewrite commit message, rewrote part of what Masahiro said to be
generic in terms of foo() and bar().
* Prefer %NUL-terminated to NULL terminated. NUL is the ASCII character
'\0', as per Arvind and Rasmus.

Changes V2:
* Added Sami's Tested by; though the patch changed implementation, the
missing symbol at link time was the problem Sami was observing.
* Fix __restrict -> __restrict__ typo as per Joe.
* Drop note about restrict from commit message as per Arvind.
* Fix NULL -> NUL as per Arvind; NUL is ASCII '\0'. TIL
* Fix off by one error as per Arvind; I had another off by one error in
my test program that was masking this.
lib/string.c | 24 ++++++++++++++++++++++++
1 file changed, 24 insertions(+)

diff --git a/lib/string.c b/lib/string.c
index 6012c385fb31..b6b8847218b5 100644
--- a/lib/string.c
+++ b/lib/string.c
@@ -272,6 +272,30 @@ ssize_t strscpy_pad(char *dest, const char *src, size_t count)
}
EXPORT_SYMBOL(strscpy_pad);

+/**
+ * stpcpy - copy a string from src to dest returning a pointer to the new end
+ * of dest, including src's %NUL-terminator. May overrun dest.
+ * @dest: pointer to end of string being copied into. Must be large enough
+ * to receive copy.
+ * @src: pointer to the beginning of string being copied from. Must not overlap
+ * dest.
+ *
+ * stpcpy differs from strcpy in a key way: the return value is a pointer
+ * to the new %NUL-terminating character in @dest. (For strcpy, the return
+ * value is a pointer to the start of @dest. This interface is considered
+ * unsafe as it doesn't perform bounds checking of the inputs. As such it's
+ * not recommended for usage. Instead, its definition is provided in case
+ * the compiler lowers other libcalls to stpcpy.
+ */
+char *stpcpy(char *__restrict__ dest, const char *__restrict__ src);
+char *stpcpy(char *__restrict__ dest, const char *__restrict__ src)
+{
+ while ((*dest++ = *src++) != '\0')
+ /* nothing */;
+ return --dest;
+}
+EXPORT_SYMBOL(stpcpy);
+
#ifndef __HAVE_ARCH_STRCAT
/**
* strcat - Append one %NUL-terminated string to another
--
2.28.0.618.gf4bc123cb7-goog

Nick Desaulniers

unread,
Sep 14, 2020, 12:14:15 PM9/14/20
to Andrew Morton, Kees Cook, Arvind Sankar, Masahiro Yamada, LKML, clang-built-linux, Sami Tolvanen, Andy Lavr, Joe Perches, Rasmus Villemoes, Nathan Chancellor, # 3.4.x
And, I forgot the stupid close parens...I will resend a v5...

> + * unsafe as it doesn't perform bounds checking of the inputs. As such it's
> + * not recommended for usage. Instead, its definition is provided in case
> + * the compiler lowers other libcalls to stpcpy.
> + */
> +char *stpcpy(char *__restrict__ dest, const char *__restrict__ src);
> +char *stpcpy(char *__restrict__ dest, const char *__restrict__ src)
> +{
> + while ((*dest++ = *src++) != '\0')
> + /* nothing */;
> + return --dest;
> +}
> +EXPORT_SYMBOL(stpcpy);
> +
> #ifndef __HAVE_ARCH_STRCAT
> /**
> * strcat - Append one %NUL-terminated string to another
> --
> 2.28.0.618.gf4bc123cb7-goog
>


--
Thanks,
~Nick Desaulniers

Nick Desaulniers

unread,
Sep 14, 2020, 12:16:51 PM9/14/20
to Andrew Morton, Kees Cook, Arvind Sankar, Masahiro Yamada, linux-...@vger.kernel.org, clang-bu...@googlegroups.com, Nick Desaulniers, Sami Tolvanen, Andy Lavr, Joe Perches, Rasmus Villemoes, Nathan Chancellor, sta...@vger.kernel.org
Changes V5:
* fix missing parens in comment.

Changes V4:
* Roll up Kees' comment fixup from
https://lore.kernel.org/lkml/202009060302.4574D8D0E0@keescook/#t.
* Keep Nathan's tested by tag.
* Add Kees' reviewed by tag from
https://lore.kernel.org/lkml/202009031446.3865FE82B@keescook/.

Changes V3:
* Drop Sami's Tested by tag; newer patch.
* Add EXPORT_SYMBOL as per Andy.
* Rewrite commit message, rewrote part of what Masahiro said to be
generic in terms of foo() and bar().
* Prefer %NUL-terminated to NULL terminated. NUL is the ASCII character
'\0', as per Arvind and Rasmus.

Changes V2:
* Added Sami's Tested by; though the patch changed implementation, the
missing symbol at link time was the problem Sami was observing.
* Fix __restrict -> __restrict__ typo as per Joe.
* Drop note about restrict from commit message as per Arvind.
* Fix NULL -> NUL as per Arvind; NUL is ASCII '\0'. TIL
* Fix off by one error as per Arvind; I had another off by one error in
my test program that was masking this.

lib/string.c | 24 ++++++++++++++++++++++++
1 file changed, 24 insertions(+)

diff --git a/lib/string.c b/lib/string.c
index 6012c385fb31..4288e0158d47 100644
--- a/lib/string.c
+++ b/lib/string.c
@@ -272,6 +272,30 @@ ssize_t strscpy_pad(char *dest, const char *src, size_t count)
}
EXPORT_SYMBOL(strscpy_pad);

+/**
+ * stpcpy - copy a string from src to dest returning a pointer to the new end
+ * of dest, including src's %NUL-terminator. May overrun dest.
+ * @dest: pointer to end of string being copied into. Must be large enough
+ * to receive copy.
+ * @src: pointer to the beginning of string being copied from. Must not overlap
+ * dest.
+ *
+ * stpcpy differs from strcpy in a key way: the return value is a pointer
+ * to the new %NUL-terminating character in @dest. (For strcpy, the return
+ * value is a pointer to the start of @dest). This interface is considered

Nathan Chancellor

unread,
Sep 15, 2020, 12:22:37 AM9/15/20
to Nick Desaulniers, Andrew Morton, Kees Cook, Arvind Sankar, Masahiro Yamada, linux-...@vger.kernel.org, clang-bu...@googlegroups.com, Sami Tolvanen, Andy Lavr, Joe Perches, Rasmus Villemoes, sta...@vger.kernel.org
Reviewed-by: Nathan Chancellor <natecha...@gmail.com>

It would be nice to get this into mainline sooner rather than later so
that it can start filtering into the stable trees. ToT LLVM builds have
been broken for a month now.
For the record, I don't see Kees' review tag on this.

Joe Perches

unread,
Sep 15, 2020, 12:28:17 AM9/15/20
to Nathan Chancellor, Nick Desaulniers, Andrew Morton, Kees Cook, Arvind Sankar, Masahiro Yamada, linux-...@vger.kernel.org, clang-bu...@googlegroups.com, Sami Tolvanen, Andy Lavr, Rasmus Villemoes, sta...@vger.kernel.org
On Mon, 2020-09-14 at 21:22 -0700, Nathan Chancellor wrote:
> It would be nice to get this into mainline sooner rather than later so
> that it can start filtering into the stable trees. ToT LLVM builds have
> been broken for a month now.

People that build stable trees with new compilers
unsupported at the time the of the base version
release are just asking for trouble.


Nick Desaulniers

unread,
Sep 15, 2020, 2:14:07 PM9/15/20
to Andrew Morton, Kees Cook, Arvind Sankar, Masahiro Yamada, LKML, clang-built-linux, Sami Tolvanen, Andy Lavr, Joe Perches, Rasmus Villemoes, # 3.4.x, Nathan Chancellor
On Mon, Sep 14, 2020 at 9:22 PM Nathan Chancellor
<natecha...@gmail.com> wrote:
>
> It would be nice to get this into mainline sooner rather than later so
> that it can start filtering into the stable trees. ToT LLVM builds have
> been broken for a month now.

Hi Andrew, I appreciate your help getting this submitted to mainline.

Noting that other folks are hitting this frequently:
https://lore.kernel.org/lkml/CAKwvOdkh=bZE6uY8zk_QePq5B3fY1...@mail.gmail.com/

CrOS folks are already shipping v3 downstream since this is blocking
the release of their toolchain.

(Also, I appreciate folks' thoughts on the comments in the patch, but
please stop delaying this patch from hitting mainline. You can
rewrite the commit message to whatever you want, delete it for all I
care, please for god's sake please unbreak the build first).
--
Thanks,
~Nick Desaulniers

Nick Desaulniers

unread,
Sep 15, 2020, 7:05:22 PM9/15/20
to Joe Perches, Nathan Chancellor, Andrew Morton, Kees Cook, Arvind Sankar, Masahiro Yamada, LKML, clang-built-linux, Sami Tolvanen, Andy Lavr, Rasmus Villemoes, # 3.4.x, Greg KH
It is asymmetry that we have a minimum supported version of a
toolchain, but no maximum supported version of a toolchain for a given
branch. I think that's a good thing; imagine if you were stuck on an
old compiler for a stable branch. No thanks. I guess we just like to
live dangerously? :P

Also, GKH has voiced support for newer toolchains for older kernel
releases before. Related to this issue, in fact.
https://lore.kernel.org/lkml/2020081807...@kroah.com/
--
Thanks,
~Nick Desaulniers
Reply all
Reply to author
Forward
0 new messages