Re: [PATCH v7 1/6] net: introduce helper sendpage_ok() in include/linux/net.h


Christoph Hellwig

Aug 18, 2020, 12:24:09 PM
to Coly Li, linux...@vger.kernel.org, linux...@lists.infradead.org, net...@vger.kernel.org, open-...@googlegroups.com, linux...@vger.kernel.org, ceph-...@vger.kernel.org, linux-...@vger.kernel.org, Chaitanya Kulkarni, Christoph Hellwig, Hannes Reinecke, Jan Kara, Jens Axboe, Mikhail Skorzhinskii, Philipp Reisner, Sagi Grimberg, Vlastimil Babka, sta...@vger.kernel.org
I think we should go for something simple like this instead:

---
From 4867e158ee86ebd801b4c267e8f8a4a762a71343 Mon Sep 17 00:00:00 2001
From: Christoph Hellwig <h...@lst.de>
Date: Tue, 18 Aug 2020 18:19:23 +0200
Subject: net: bypass ->sendpage for slab pages

Sending Slab or tail pages into ->sendpage will cause a really strange
delayed oops. Prevent it right in the networking code instead of
requiring drivers to work around the fact.

Signed-off-by: Christoph Hellwig <h...@lst.de>
---
net/socket.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/net/socket.c b/net/socket.c
index dbbe8ea7d395da..fbc82eb96d18ce 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -3638,7 +3638,12 @@ EXPORT_SYMBOL(kernel_getpeername);
int kernel_sendpage(struct socket *sock, struct page *page, int offset,
size_t size, int flags)
{
- if (sock->ops->sendpage)
+ /*
+ * sendpage manipulates the refcount of the passed in page, which
+ * does not work for Slab pages, or for tails of non-__GFP_COMP
+ * high order pages.
+ */
+ if (sock->ops->sendpage && !PageSlab(page) && page_count(page) > 0)
return sock->ops->sendpage(sock, page, offset, size, flags);

return sock_no_sendpage(sock, page, offset, size, flags);
--
2.28.0
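
For readers who want to see the proposed condition in isolation, here is a minimal userspace sketch of the decision the patch above adds to kernel_sendpage(). It is not kernel code: `struct model_page`, `enum send_path`, and `model_kernel_sendpage()` are hypothetical stand-ins invented for illustration, with `slab` standing in for PageSlab(page) and `refcount` for page_count(page).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page; not a kernel type. */
struct model_page {
	bool slab;     /* models PageSlab(page) */
	int refcount;  /* models page_count(page) */
};

/* Which path a send would take in this model. */
enum send_path {
	PATH_SENDPAGE,    /* models sock->ops->sendpage() */
	PATH_NO_SENDPAGE  /* models sock_no_sendpage() */
};

/* Mirrors the condition the patch adds to kernel_sendpage(). */
static enum send_path model_kernel_sendpage(bool has_sendpage_op,
					    const struct model_page *page)
{
	assert(page != NULL);
	if (has_sendpage_op && !page->slab && page->refcount > 0)
		return PATH_SENDPAGE;
	return PATH_NO_SENDPAGE;
}
```

In this model a slab page, or a tail page with a refcount of 0, silently takes the copying path, which is the behaviour the patch description asks for.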

Christoph Hellwig

Aug 18, 2020, 3:49:34 PM
to Coly Li, Christoph Hellwig, linux...@vger.kernel.org, linux...@lists.infradead.org, net...@vger.kernel.org, open-...@googlegroups.com, linux...@vger.kernel.org, ceph-...@vger.kernel.org, linux-...@vger.kernel.org, Chaitanya Kulkarni, Hannes Reinecke, Jan Kara, Jens Axboe, Mikhail Skorzhinskii, Philipp Reisner, Sagi Grimberg, Vlastimil Babka, sta...@vger.kernel.org
On Wed, Aug 19, 2020 at 12:33:37AM +0800, Coly Li wrote:
> On 2020/8/19 00:24, Christoph Hellwig wrote:
> > I think we should go for something simple like this instead:
>
> This idea is fine with me. Should a warning message be thrown here? IMHO
> the driver still sends an improper page in; fixing it silently is too
> kind to the buggy driver(s).

I don't think a warning is a good idea. An API that does the right
thing underneath and doesn't require boilerplate code in most callers
is the right API.

Coly Li

Aug 18, 2020, 6:34:06 PM
to linux...@vger.kernel.org, linux...@lists.infradead.org, net...@vger.kernel.org, open-...@googlegroups.com, linux...@vger.kernel.org, ceph-...@vger.kernel.org, linux-...@vger.kernel.org, Coly Li, Chaitanya Kulkarni, Christoph Hellwig, Hannes Reinecke, Jan Kara, Jens Axboe, Mikhail Skorzhinskii, Philipp Reisner, Sagi Grimberg, Vlastimil Babka, sta...@vger.kernel.org
The original problem came from the nvme-over-tcp code, which mistakenly
uses kernel_sendpage() to send pages allocated by __get_free_pages()
without the __GFP_COMP flag. The tail pages of such allocations have no
refcount (page_count is 0), so sending them with kernel_sendpage() may
trigger a kernel panic from a corrupted kernel heap, because the network
stack incorrectly frees them as page_count 0 pages.

This patch introduces a helper, sendpage_ok(), which returns true if the
page being checked
- is not a slab page: PageSlab(page) is false.
- has a page refcount: page_count(page) is not zero.

All drivers that want to send a page to the remote end with
kernel_sendpage() may use this helper to check whether the page is OK.
If the helper does not return true, the driver should use another,
non-sendpage method (e.g. sock_no_sendpage()) to handle the page.

Signed-off-by: Coly Li <col...@suse.de>
Cc: Chaitanya Kulkarni <chaitanya...@wdc.com>
Cc: Christoph Hellwig <h...@lst.de>
Cc: Hannes Reinecke <ha...@suse.de>
Cc: Jan Kara <ja...@suse.com>
Cc: Jens Axboe <ax...@kernel.dk>
Cc: Mikhail Skorzhinskii <mskorz...@solarflare.com>
Cc: Philipp Reisner <philipp...@linbit.com>
Cc: Sagi Grimberg <sa...@grimberg.me>
Cc: Vlastimil Babka <vba...@suse.com>
Cc: sta...@vger.kernel.org
---
include/linux/net.h | 16 ++++++++++++++++
1 file changed, 16 insertions(+)

diff --git a/include/linux/net.h b/include/linux/net.h
index d48ff1180879..05db8690f67e 100644
--- a/include/linux/net.h
+++ b/include/linux/net.h
@@ -21,6 +21,7 @@
#include <linux/rcupdate.h>
#include <linux/once.h>
#include <linux/fs.h>
+#include <linux/mm.h>
#include <linux/sockptr.h>

#include <uapi/linux/net.h>
@@ -286,6 +287,21 @@ do { \
#define net_get_random_once_wait(buf, nbytes) \
get_random_once_wait((buf), (nbytes))

+/*
+ * E.g. XFS meta- & log-data is in slab pages, or bcache meta
+ * data pages, or other high order pages allocated by
+ * __get_free_pages() without __GFP_COMP, which have a page_count
+ * of 0 and/or have PageSlab() set. We cannot use send_page for
+ * those, as that does get_page(); put_page(); and would cause
+ * either a VM_BUG directly, or __page_cache_release a page that
+ * would actually still be referenced by someone, leading to some
+ * obscure delayed Oops somewhere else.
+ */
+static inline bool sendpage_ok(struct page *page)
+{
+ return !PageSlab(page) && page_count(page) >= 1;
+}
+
int kernel_sendmsg(struct socket *sock, struct msghdr *msg, struct kvec *vec,
size_t num, size_t len);
int kernel_sendmsg_locked(struct sock *sk, struct msghdr *msg,
--
2.26.2
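
The caller-side pattern described in the commit message (check the page, then pick the zero-copy or the copying path) can be sketched in userspace as follows. This is only an illustration under stated assumptions: `struct demo_page`, `demo_sendpage_ok()`, `struct demo_stats`, and `demo_driver_send()` are hypothetical names, and the sketch only counts which path each page would take instead of calling kernel_sendpage() or sock_no_sendpage().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page; not a kernel type. */
struct demo_page {
	bool slab;     /* models PageSlab(page) */
	int refcount;  /* models page_count(page) */
};

/* Same predicate as sendpage_ok() in the patch, on the stand-in type. */
static bool demo_sendpage_ok(const struct demo_page *page)
{
	return !page->slab && page->refcount >= 1;
}

/* Counts how many pages would take each path. */
struct demo_stats {
	size_t zero_copy;  /* driver would call kernel_sendpage() */
	size_t copied;     /* driver would call sock_no_sendpage() */
};

/* Models a driver walking its pages and choosing a path per page. */
static struct demo_stats demo_driver_send(const struct demo_page *pages,
					  size_t n)
{
	struct demo_stats st = { 0, 0 };

	assert(pages != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (demo_sendpage_ok(&pages[i]))
			st.zero_copy++;
		else
			st.copied++;
	}
	return st;
}
```

With this approach the check lives in every caller, which is the boilerplate that the alternative of bypassing ->sendpage inside kernel_sendpage() tries to avoid.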

Coly Li

Aug 18, 2020, 6:34:06 PM
to Christoph Hellwig, linux...@vger.kernel.org, linux...@lists.infradead.org, net...@vger.kernel.org, open-...@googlegroups.com, linux...@vger.kernel.org, ceph-...@vger.kernel.org, linux-...@vger.kernel.org, Chaitanya Kulkarni, Hannes Reinecke, Jan Kara, Jens Axboe, Mikhail Skorzhinskii, Philipp Reisner, Sagi Grimberg, Vlastimil Babka, sta...@vger.kernel.org
On 2020/8/19 00:24, Christoph Hellwig wrote:
> I think we should go for something simple like this instead:

This idea is fine with me. Should a warning message be thrown here? IMHO
the driver still sends an improper page in; fixing it silently is too
kind to the buggy driver(s).

And maybe the fixes in the nvme-tcp driver and do_tcp_sendpages() are
still necessary. I am not a network expert; this is just my opinion for
reference.

Coly Li

Coly Li

Aug 18, 2020, 6:34:06 PM
to linux...@vger.kernel.org, linux...@lists.infradead.org, net...@vger.kernel.org, open-...@googlegroups.com, linux...@vger.kernel.org, ceph-...@vger.kernel.org, linux-...@vger.kernel.org, Coly Li, Chaitanya Kulkarni, Christoph Hellwig, Hannes Reinecke, Jan Kara, Jens Axboe, Mikhail Skorzhinskii, Philipp Reisner, Sagi Grimberg, Vlastimil Babka, sta...@vger.kernel.org
index d48ff1180879..a807fad31958 100644
--- a/include/linux/net.h
+++ b/include/linux/net.h
@@ -21,6 +21,7 @@
#include <linux/rcupdate.h>
#include <linux/once.h>
#include <linux/fs.h>
+#include <linux/mm.h>
#include <linux/sockptr.h>

#include <uapi/linux/net.h>
@@ -286,6 +287,21 @@ do { \
#define net_get_random_once_wait(buf, nbytes) \
get_random_once_wait((buf), (nbytes))

+/*
+ * E.g. XFS meta- & log-data is in slab pages, or bcache meta
+ * data pages, or other high order pages allocated by
+ * __get_free_pages() without __GFP_COMP, which have a page_count
+ * of 0 and/or have PageSlab() set. We cannot use send_page for
+ * those, as that does get_page(); put_page(); and would cause
+ * either a VM_BUG directly, or __page_cache_release a page that
+ * would actually still be referenced by someone, leading to some
+ * obscure delayed Oops somewhere else.
+ */
+static inline bool sendpage_ok(struct page *page)
+{
+ return (!PageSlab(page) && page_count(page) >= 1);

Coly Li

Aug 19, 2020, 12:22:19 AM
to Christoph Hellwig, linux...@vger.kernel.org, linux...@lists.infradead.org, net...@vger.kernel.org, open-...@googlegroups.com, linux...@vger.kernel.org, ceph-...@vger.kernel.org, linux-...@vger.kernel.org, Chaitanya Kulkarni, Hannes Reinecke, Jan Kara, Jens Axboe, Mikhail Skorzhinskii, Philipp Reisner, Sagi Grimberg, Vlastimil Babka, sta...@vger.kernel.org
Then I have no further comments.

Thanks.

Coly Li

Ulrich Windl

Aug 19, 2020, 1:57:45 AM
to open-iscsi
>>> Coly Li <col...@suse.de> wrote on 18.08.2020 at 14:47 in message
<20200818124736...@suse.de>:
Actually I think this comment is somewhat misplaced:
It should describe what the function does (check for specific properties of a page), not where the function might be used. The callers may change over time, while the function will still do the same thing.

> +static inline bool sendpage_ok(struct page *page)
> +{
> + return (!PageSlab(page) && page_count(page) >= 1);
> +}
> +
> int kernel_sendmsg(struct socket *sock, struct msghdr *msg, struct kvec
> *vec,
> size_t num, size_t len);
> int kernel_sendmsg_locked(struct sock *sk, struct msghdr *msg,
> --
> 2.26.2

Christoph Hellwig

Sep 23, 2020, 4:43:10 AM
to Coly Li, Christoph Hellwig, linux...@vger.kernel.org, linux...@lists.infradead.org, net...@vger.kernel.org, open-...@googlegroups.com, linux...@vger.kernel.org, ceph-...@vger.kernel.org, linux-...@vger.kernel.org, Chaitanya Kulkarni, Hannes Reinecke, Jan Kara, Jens Axboe, Mikhail Skorzhinskii, Philipp Reisner, Sagi Grimberg, Vlastimil Babka, sta...@vger.kernel.org
So given the feedback from Dave I suspect we should actually resurrect
this series, sorry for the noise. And in this case I think we do need
the warning in kernel_sendpage.

Coly Li

Sep 23, 2020, 4:45:21 AM
to Christoph Hellwig, linux...@vger.kernel.org, linux...@lists.infradead.org, net...@vger.kernel.org, open-...@googlegroups.com, linux...@vger.kernel.org, ceph-...@vger.kernel.org, linux-...@vger.kernel.org, Chaitanya Kulkarni, Hannes Reinecke, Jan Kara, Jens Axboe, Mikhail Skorzhinskii, Philipp Reisner, Sagi Grimberg, Vlastimil Babka, sta...@vger.kernel.org
Copied. I will post a v8 series that adds a warning message in
kernel_sendpage() if an unacceptable page is sent in.

Coly Li
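
As a rough illustration of the "warn once on an unacceptable page" idea discussed above, here is a userspace sketch. It does not reproduce the v8 patch, which is not part of this thread: `struct sketch_page`, `sketch_warn_once()`, `sketch_check_page()`, and the message text are all hypothetical, and the sketch uses a single global flag, whereas the kernel's WARN_ONCE() keeps its flag per call site.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for struct page; not a kernel type. */
struct sketch_page {
	bool slab;     /* models PageSlab(page) */
	int refcount;  /* models page_count(page) */
};

static int sketch_warnings; /* number of warnings actually printed */

/*
 * Prints msg the first time cond is true and returns cond.
 * Assumption: one global flag, unlike the per-call-site WARN_ONCE().
 */
static bool sketch_warn_once(bool cond, const char *msg)
{
	static bool warned;

	if (cond && !warned) {
		warned = true;
		sketch_warnings++;
		fprintf(stderr, "WARNING: %s\n", msg);
	}
	return cond;
}

/* Returns true if the page is unsuitable for sendpage, warning once. */
static bool sketch_check_page(const struct sketch_page *page)
{
	assert(page != NULL);
	bool improper = page->slab || page->refcount < 1;

	return sketch_warn_once(improper,
				"improper page passed to sendpage (illustrative text)");
}
```

What the caller does after the warning (still calling ->sendpage, or falling back to a copying path) is not settled in this thread, so the sketch deliberately leaves that decision out.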