As noted by Herbert, possible solutions include a timeout followed by a
copy/change of ownership of the skb, or always copying/changing
ownership if we're going into a hostile device.
This patch implements the second approach.
Note: one issue still remaining is that since skbs
keep reference to tun socket and tun socket has a
reference to tun device, we won't flush backlog,
instead simply waiting for all skbs to get transmitted.
At least this is not user-triggerable, and
this was not reported in practice, my assumption is
other devices besides tap complete an skb
within finite time after it has been queued.
A possible solution for the second issue
would not to have socket reference the device,
instead, implement dev->destructor for tun, and
wait for all skbs to complete there, but this
needs some thought, probably too risky for 2.6.34.
Signed-off-by: Michael S. Tsirkin <m...@redhat.com>
Tested-by: Yan Vugenfirer <yvug...@redhat.com>
---
Please review the below, and consider for 2.6.34,
and stable trees.
drivers/net/tun.c | 4 ++++
1 files changed, 4 insertions(+), 0 deletions(-)
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 96c39bd..4326520 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -387,6 +387,10 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
}
}
+ /* Orphan the skb - required as we might hang on to it
+ * for indefinite time. */
+ skb_orphan(skb);
+
/* Enqueue packet */
skb_queue_tail(&tun->socket.sk->sk_receive_queue, skb);
dev->trans_start = jiffies;
--
1.7.0.2.280.gc6f05
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Acked-by: Herbert Xu <her...@gondor.apana.org.au>
Thanks,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <her...@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
And before that, tap1 may not be able to send further packets to anyone
else on the bridge as its TX resources were blocked by tap2 - that's
what we saw in the field.
Nice solution, thanks!
Jan
--
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
After the patch, tap1 is able to flood tap2, and tap3/tap4 not able to
send one single frame. Is it OK ?
Back to the problem : tap1 cannot be closed.
Why ? because of refcounts ?
When a socket with inflight tx packets is closed, we dont block the
close, we only delay the socket freeing once all packets were delivered
and freed.
I think if that's a real issue, you have to apply traffic shaping to the
untrusted nodes. The existing flow-control scheme was fragile anyway as
you had to translate packet lengths on TX side to packet counts on RX.
Jan
--
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
Yes :) This was always possible. Number of senders needed to flood
a receiver might vary depending on send/recv queue size
that you set. External sources can also fill your RX queue
if you let them. In the end, we need to rely on the scheduler for fairness,
or apply packet shaping.
> Back to the problem : tap1 cannot be closed.
>
> Why ? because of refcounts ?
Yes.
> When a socket with inflight tx packets is closed, we dont block the
> close, we only delay the socket freeing once all packets were delivered
> and freed.
>
Which is wrong, since this is under userspace control, so you get
unkillable processes.
--
MST
> > When a socket with inflight tx packets is closed, we dont block the
> > close, we only delay the socket freeing once all packets were delivered
> > and freed.
> >
>
> Which is wrong, since this is under userspace control, so you get
> unkillable processes.
>
We do not get unkillable processes, at least with sockets I was thinking
about (TCP/UDP ones).
Maybe tun sockets can behave the same ?
Herbert Acked your patch, so I guess its OK, but I think it can be
dangerous.
Anyway my feeling is that we try to add various mechanisms to keep a
hostile user flooding another one.
For example, UDP got memory accounting quite recently, and we added
socket backlog limits very recently. It was considered not needed few
years ago.
Looks like that's what my patch does: ip_rcv seems to call
skb_orphan too.
Well, I was speaking of tx side, you speak of receiving side.
An external flood (coming from another domain) is another problem.
A sender might flood the 'network' inside our domain. How can we
reasonably limit the sender ?
Maybe the answer is 'We can not', but it should be stated somewhere, so
that someone can address this point later.
Point is, both ip_rcv and my patch call skb_orphan.
> An external flood (coming from another domain) is another problem.
>
> A sender might flood the 'network' inside our domain. How can we
> reasonably limit the sender ?
>
> Maybe the answer is 'We can not', but it should be stated somewhere, so
> that someone can address this point later.
>
And whatever's done should ideally work for tap to IP
and IP to IP sockets as well, not just tap to tap.
--
MST
The tun socket accounting was never designed to stop it from
flooding another tun interface. It's there to stop it from
transmitting above a destination interface TX bandwidth and
cause unnecessary packet drops. It also limits the total amount
of kernel memory that can be pinned down by a single tun interface.
In this case, all we're doing is shifting the accounting from the
"hardware" queue to the qdisc queue.
So your ability to flood a tun interface is essentially unchanged.
BTW we do the same thing in a number of hardware drivers, as well
as virtio-net.
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <her...@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
> On Tue, Apr 13, 2010 at 08:31:03PM +0200, Eric Dumazet wrote:
>>
>> Herbert Acked your patch, so I guess its OK, but I think it can be
>> dangerous.
>
> The tun socket accounting was never designed to stop it from
> flooding another tun interface. It's there to stop it from
> transmitting above a destination interface TX bandwidth and
> cause unnecessary packet drops. It also limits the total amount
> of kernel memory that can be pinned down by a single tun interface.
>
> In this case, all we're doing is shifting the accounting from the
> "hardware" queue to the qdisc queue.
>
> So your ability to flood a tun interface is essentially unchanged.
>
> BTW we do the same thing in a number of hardware drivers, as well
> as virtio-net.
Right. Although this reminds me about the whole SKB
orphaning on xmit issue that keeps coming back to haunt
us.
If there weren't odd references to the SKB's socket in
the packet scheduler et al. we could just orphan these
things right upon entry to the qdisc and not have to
add hacks like this to every driver.
In fact... maybe we can just do it in dev_hard_queue_xmit()
since we are out of the qdisc at that point.... but I guess
there might be weird drivers that want the SKB socket in
their ->xmit routine... Ho hum.
In any event that's net-next-2.6 exploratory material, and I've
applied this patch to net-2.6, thanks!
This is commit 0110d6f22f392f976e84ab49da1b42f85b64a3c5 in net-2.6
Please cherry-pick this fix in stable kernels (2.6.32 and 2.6.33).
Thanks!
--
MST
Before I forget again:
Tested-by: Jan Kiszka <jan.k...@siemens.com>
(namely the field test of our customer)
Jan
--
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
Cool, thanks!
David Miller queues up the patches for the network subsystem for the
stable trees, and then forwards them to me when he feels they are ready.
So I'll defer to him on this one and wait for it to come from him.
thanks,
greg k-h
On Tue, Apr 13, 2010 at 05:59:44PM +0300, Michael S. Tsirkin wrote:
> To unsubscribe from this list: send the line "unsubscribe netdev" in