[PATCH net v3] mptcp: fix stale skb->sk reference on subflow close

0 views
Skip to first unread message

Kalpan Jani

unread,
May 19, 2026, 5:09:49 AM (yesterday) May 19
to kalpan...@gmail.com, Kalpan Jani, Paolo Abeni, syzkaller
The backlog list is updated by mptcp_data_ready() under
mptcp_data_lock(), but the cleanup of references to a closing subflow
in mptcp_close_ssk() was done after release_sock(ssk) without holding
that lock.

Once release_sock(ssk) returns, the RX path can acquire the ssk lock,
call mptcp_data_ready(), and enqueue a new skb under mptcp_data_lock()
before the close path reaches its list_for_each traversal. That skb is
missed by cleanup, leaving skb->sk pointing to the freed ssk.

Fix this by moving the backlog cleanup into __mptcp_close_ssk(), after
subflow->closing is set to 1 and while the ssk lock is still held,
serialized under mptcp_data_lock(). The cleanup runs only on the push
path (MPTCP_CF_PUSH), where backlog references accumulate; on other
teardown paths the caller already handles cleanup.

With both locks held simultaneously, any concurrent mptcp_data_ready()
either completes its enqueue before the purge runs and gets caught, or
observes closing=1 while the ssk lock is still held and bails without
enqueuing. After mptcp_data_unlock(), no new skbs can be enqueued.
The cleanup is exhaustive.

Remove the unprotected traversal from mptcp_close_ssk() entirely.

Tested with the MPTCP kernel selftests on the patched kernel:
- tools/testing/selftests/net/mptcp/mptcp_join.sh: all tests pass
- tools/testing/selftests/net/mptcp/mptcp_connect.sh: 68/68 pass

Suggested-by: Paolo Abeni <pab...@redhat.com>
Reported-by: syzkaller <syzk...@googlegroups.com>
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/621
Signed-off-by: Kalpan Jani <kalpa...@mpiricsoftware.com>
---
v3: follow Paolo's suggestion exactly — use mptcp_cleanup_ssk_backlog() helper under MPTCP_CF_PUSH condition only; remove inline loop duplication and second call from !dispose_it path.
v2: moved cleanup into __mptcp_close_ssk() but duplicated logic and added redundant second call; incorrect approach.
v1: incorrect race analysis around subflow->closing flag.

net/mptcp/protocol.c | 33 +++++++++++++++++++--------------
1 file changed, 19 insertions(+), 14 deletions(-)

diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 718e910ff..149f816fe 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -2527,6 +2527,22 @@ static void __mptcp_subflow_disconnect(struct sock *ssk,
}
}

+static void mptcp_cleanup_ssk_backlog(struct sock *sk, struct sock *ssk)
+{
+ struct mptcp_sock *msk = mptcp_sk(sk);
+ struct sk_buff *skb;
+
+ mptcp_data_lock(sk);
+ list_for_each_entry(skb, &msk->backlog_list, list) {
+ if (skb->sk != ssk)
+ continue;
+
+ atomic_sub(skb->truesize, &skb->sk->sk_rmem_alloc);
+ skb->sk = NULL;
+ }
+ mptcp_data_unlock(sk);
+}
+
/* subflow sockets can be either outgoing (connect) or incoming
* (accept).
*
@@ -2550,6 +2566,9 @@ static void __mptcp_close_ssk(struct sock *sk, struct sock *ssk,
lock_sock_nested(ssk, SINGLE_DEPTH_NESTING);
subflow->closing = 1;

+ if (flags & MPTCP_CF_PUSH)
+ mptcp_cleanup_ssk_backlog(sk, ssk);
+
/* Borrow the fwd allocated page left-over; fwd memory for the subflow
* could be negative at this point, but will be reach zero soon - when
* the data allocated using such fragment will be freed.
@@ -2641,9 +2660,6 @@ static void __mptcp_close_ssk(struct sock *sk, struct sock *ssk,
void mptcp_close_ssk(struct sock *sk, struct sock *ssk,
struct mptcp_subflow_context *subflow)
{
- struct mptcp_sock *msk = mptcp_sk(sk);
- struct sk_buff *skb;
-
/* The first subflow can already be closed or disconnected */
if (subflow->close_event_done || READ_ONCE(subflow->local_id) < 0)
return;
@@ -2653,17 +2669,6 @@ void mptcp_close_ssk(struct sock *sk, struct sock *ssk,
if (sk->sk_state == TCP_ESTABLISHED)
mptcp_event(MPTCP_EVENT_SUB_CLOSED, mptcp_sk(sk), ssk, GFP_KERNEL);

- /* Remove any reference from the backlog to this ssk; backlog skbs consume
- * space in the msk receive queue, no need to touch sk->sk_rmem_alloc
- */
- list_for_each_entry(skb, &msk->backlog_list, list) {
- if (skb->sk != ssk)
- continue;
-
- atomic_sub(skb->truesize, &skb->sk->sk_rmem_alloc);
- skb->sk = NULL;
- }
-
/* subflow aborted before reaching the fully_established status
* attempt the creation of the next subflow
*/
--
2.43.0

Kalpan Jani

unread,
May 19, 2026, 5:14:20 AM (yesterday) May 19
to Pabeni, syzkaller, kalpanjani009
Sorry, that patch email was a local test sent unintentionally. Please ignore.

Kalpan Jani
Reply all
Reply to author
Forward
0 new messages