It depends upon our interpretation of how you intended the
SPLICE_F_NONBLOCK flag to work when you added it way back
when.
Could you take a quick look and make sure our interpretation matches
your intent? This behavior has been bugging people for a while and I
want to close this out, one way or another.
Thanks!
[PATCH] net: splice() from tcp to pipe should take into account O_NONBLOCK
tcp_splice_read() doesnt take into account socket's O_NONBLOCK flag
Before this patch :
splice(socket,0,pipe,0,128*1024,SPLICE_F_MOVE);
causes a random endless block (if pipe is full) and
splice(socket,0,pipe,0,128*1024,SPLICE_F_MOVE | SPLICE_F_NONBLOCK);
will return 0 immediately if the TCP buffer is empty.
User application has no way to instruct splice() that socket should be in blocking mode
but pipe in nonblock more.
Many projects cannot use splice(tcp -> pipe) because of this flaw.
http://git.samba.org/?p=samba.git;a=history;f=source3/lib/recvfile.c;h=ea0159642137390a0f7e57a123684e6e63e47581;hb=HEAD
http://lkml.indiana.edu/hypermail/linux/kernel/0807.2/0687.html
Linus introduced SPLICE_F_NONBLOCK in commit 29e350944fdc2dfca102500790d8ad6d6ff4f69d
(splice: add SPLICE_F_NONBLOCK flag )
It doesn't make the splice itself necessarily nonblocking (because the
actual file descriptors that are spliced from/to may block unless they
have the O_NONBLOCK flag set), but it makes the splice pipe operations
nonblocking.
Linus intention was clear : let SPLICE_F_NONBLOCK control the splice pipe mode only
This patch instruct tcp_splice_read() to use the underlying file O_NONBLOCK
flag, as other socket operations do.
Users will then call :
splice(socket,0,pipe,0,128*1024,SPLICE_F_MOVE | SPLICE_F_NONBLOCK );
to block on data coming from socket (if file is in blocking mode),
and not block on pipe output (to avoid deadlock)
First version of this patch was submitted by Octavian Purdila
Reported-by: Volker Lendecke <v...@samba.org>
Reported-by: Jason Gunthorpe <jgunt...@obsidianresearch.com>
Signed-off-by: Eric Dumazet <eric.d...@gmail.com>
Signed-off-by: Octavian Purdila <opur...@ixiacom.com>
---
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 21387eb..8cdfab6 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -580,7 +580,7 @@ ssize_t tcp_splice_read(struct socket *sock, loff_t *ppos,
lock_sock(sk);
- timeo = sock_rcvtimeo(sk, flags & SPLICE_F_NONBLOCK);
+ timeo = sock_rcvtimeo(sk, sock->file->f_flags & O_NONBLOCK);
while (tss.len) {
ret = __tcp_splice_read(sk, &tss);
if (ret < 0)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
On Thu, 1 Oct 2009, David Miller wrote:
>
> It depends upon our interpretation of how you intended the
> SPLICE_F_NONBLOCK flag to work when you added it way back
> when.
>
> Linus introduced SPLICE_F_NONBLOCK in commit 29e350944fdc2dfca102500790d8ad6d6ff4f69d
> (splice: add SPLICE_F_NONBLOCK flag )
>
> It doesn't make the splice itself necessarily nonblocking (because the
> actual file descriptors that are spliced from/to may block unless they
> have the O_NONBLOCK flag set), but it makes the splice pipe operations
> nonblocking.
>
> Linus intention was clear : let SPLICE_F_NONBLOCK control the splice pipe mode only
Ack. The original intent was for the flag to affect the buffering, not the
end points.
> splice(socket,0,pipe,0,128*1024,SPLICE_F_MOVE | SPLICE_F_NONBLOCK );
>
> to block on data coming from socket (if file is in blocking mode),
> and not block on pipe output (to avoid deadlock)
Yes. Sounds correct. Although the more I think about it, the more I
suspect that the whole NONBLOCK thing should probably have been two bits,
and simply been about "nonblocking input" vs "nonblocking output" (so that
you could control both sides on a call-by-call basis).
But I think that your patch is fundamentally correct with the semantics
as-is.
Linus
On Thu, 1 Oct 2009, Linus Torvalds wrote:
>
> Ack. The original intent was for the flag to affect the buffering, not the
> end points.
Side note, in case it wasn't clear: I added Jens to the cc, because in the
end my "original intent" probably doesn't matter all that much. I may have
set up the basic ideas and so on, but I never wrote any apps that use it,
and Jens has done most of the actual fixes since.
So give "my original intent" the weight (or rather, lack there-of) that it
deserves.
> On Thu, 1 Oct 2009, David Miller wrote:
>>
>> It depends upon our interpretation of how you intended the
>> SPLICE_F_NONBLOCK flag to work when you added it way back
>> when.
>>
>> Linus introduced SPLICE_F_NONBLOCK in commit 29e350944fdc2dfca102500790d8ad6d6ff4f69d
>> (splice: add SPLICE_F_NONBLOCK flag )
>>
>> It doesn't make the splice itself necessarily nonblocking (because the
>> actual file descriptors that are spliced from/to may block unless they
>> have the O_NONBLOCK flag set), but it makes the splice pipe operations
>> nonblocking.
>>
>> Linus intention was clear : let SPLICE_F_NONBLOCK control the splice pipe mode only
>
> Ack. The original intent was for the flag to affect the buffering, not the
> end points.
Great, thanks for reviewing.
> Although the more I think about it, the more I suspect that the
> whole NONBLOCK thing should probably have been two bits, and simply
> been about "nonblocking input" vs "nonblocking output" (so that you
> could control both sides on a call-by-call basis).
I think we could still extend things in this way if we wanted to.
So if you specify the explicit input and/or output nonblock flag,
it takes precedence over the SPLICE_F_NONBLOCK thing.
Anyways, just an idea.
Yes I agree, thank god for having a 'flags' parameter for the syscalls
:-). I'll make a note to add and test bidirectional nonblock hints.
The net patch looks fine and correct to me, feel free to add my acked-by
if you want.
--
Jens Axboe
> The net patch looks fine and correct to me, feel free to add my acked-by
> if you want.
Thanks Jens.