supporting multiple outstanding overlapped IO on udp_socket_win.cc


Guo-wei Shieh

Dec 11, 2014, 7:39:21 PM
to net-dev
Hi,

On Windows, multiple outstanding overlapped I/O operations are allowed, but it doesn't seem like udp_socket_win.cc allows them.

From MSDN:

The lpOverlapped parameter must be valid for the duration of the overlapped operation. If multiple I/O operations are simultaneously outstanding, each must reference a separate WSAOVERLAPPED structure.

In udp_socket_win.cc:

1. there is only a single OVERLAPPED structure, which comes from the core_
2. we map both WSAEWOULDBLOCK and WSA_IO_PENDING to net::ERR_IO_PENDING

Is there any reason that we don't support multiple outstanding overlapped I/Os?

Thanks,
Guowei


Ryan Sleevi

Dec 11, 2014, 8:49:41 PM
to Guo-wei Shieh, net-dev

We intentionally don't expose the low-level details; that's a non-goal. We instead limit ourselves to APIs we can support uniformly cross-platform, and we implement the supported platforms via non-blocking IO patterns.

Are you asking about read/write pairing or about multiple reads?

--
You received this message because you are subscribed to the Google Groups "net-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to net-dev+u...@chromium.org.
To post to this group, send email to net...@chromium.org.
To view this discussion on the web visit https://groups.google.com/a/chromium.org/d/msgid/net-dev/CABM%3Dn%2B2XgniEuMQhoKYy0ga1VpOX%2BxmCw3sBJC%3DPLfotU9QWWQ%40mail.gmail.com.

Guo-wei Shieh

Dec 12, 2014, 1:44:57 AM
to Ryan Sleevi, net-dev, Justin Uberti
TL;DR:
1. On Windows, WSAEWOULDBLOCK means the same as EWOULDBLOCK on Linux: we are out of buffer, try again later. WSA_IO_PENDING, however, just means an overlapped IO has started because the HW can't send fast enough.
2. Translating WSA_IO_PENDING to ERR_IO_PENDING prevents us from using the full send buffer. We're basically limited by how fast the HW can send a packet synchronously.
3. Worse, translating both WSA_IO_PENDING and WSAEWOULDBLOCK to net::ERR_IO_PENDING could cause upper-layer bugs like data loss or even deadlock.
4. Handling WSA_IO_PENDING correctly doesn't really expose any OS-specific details. In fact, it makes the handling of WSAEWOULDBLOCK the same as on Linux.
5. This unnecessarily lowers the overall send speed, since we're not overlapping multiple outstanding sends.

Details:

My focus now is on multiple writes (sendto), since in WebRTC we need to send video packets ASAP.

The current net stack has the contract that when net::ERR_IO_PENDING is returned, the client doesn't need to keep the buffer, and a callback will come later when the send completes.

I'm not sure how handling WSA_IO_PENDING correctly would conflict with the goal you mentioned or expose any low-level details, though.

On Windows, there are two error codes: WSAEWOULDBLOCK and WSA_IO_PENDING. WSAEWOULDBLOCK, like EWOULDBLOCK on Linux, means we don't have any more system buffer to copy into. WSA_IO_PENDING, however, just means that on an OVERLAPPED socket your request is being processed and you'll receive a signal later. It has nothing to do with buffer space.

It seems to me that we incorrectly map WSA_IO_PENDING to the EWOULDBLOCK concept on Windows. (In fact, if you look at the code: in udp_socket_libevent.cc, when we receive EWOULDBLOCK we keep the buffer and resend later. In udp_socket_win.cc, however, we keep the buffer but never reuse it, since WSA_IO_PENDING has nothing to do with buffer size.)

In other words, no matter how big we set SNDBUF on Windows, we are not using the full buffer size. We are only limited by the speed of send (i.e., whether we can send synchronously or asynchronously).

There are also two potential bugs here (although they are masked now). What if Windows returns WSAEWOULDBLOCK? In that case, we happily translate it to net::ERR_IO_PENDING and return, but the data is just lost (since we're not going to resend it). Worse, since it's WSAEWOULDBLOCK, there will be no callback. This could cause a deadlock on the client. We haven't hit this only because we never run out of buffer, as we never try to use as much as is available.

From the MSDN Winsock error code reference:

WSA_IO_PENDING (997): Overlapped operations will complete later.
The application has initiated an overlapped operation that cannot be completed immediately. A completion indication will be given later when the operation has been completed. Note that this error is returned by the operating system, so the error number may change in future releases of Windows.

WSAEWOULDBLOCK (10035): Resource temporarily unavailable.
This error is returned from operations on nonblocking sockets that cannot be completed immediately, for example recv when no data is queued to be read from the socket. It is a nonfatal error, and the operation should be retried later. It is normal for WSAEWOULDBLOCK to be reported as the result from calling connect on a nonblocking SOCK_STREAM socket, since some time must elapse for the connection to be established.


Ryan Sleevi

Dec 12, 2014, 4:46:50 PM
to Guo-wei Shieh, Ryan Sleevi, net-dev, Justin Uberti, j...@chromium.org, Ryan Hamilton, Raman Tenneti
On Thu, Dec 11, 2014 at 10:44 PM, Guo-wei Shieh <guo...@chromium.org> wrote:
TL;DR;
1. On windows, WSAEWOULDBLOCK means the same as EWOULDBLOCK on linux; we are out of buffer, you should try again. However, WSA_IO_PENDING just means an overlapped io has started as the HW can't send fast enough.
2. Translating WSA_IO_PENDING to ERR_IO_PENDING prevent us from using the full send buffer. We're basically limited by how fast the HW could send a packet synchronously.
3. Worse, translating both WSA_IO_PENDING and WSAEWOULDBLOCK to net::ERR_IO_PENDING could cause upper layer bugs like data lost or even deadlock.

I'm not sure I follow your conclusion that it would necessarily lead to deadlock. That seems more an issue about how you use the //net code than the //net code itself.
 
4. Handling of WSA_IO_PENDING doesn't really expose any OS specific details. In fact, it makes the handling of WSAEWOULDBLOCK the same as what it does on linux.

I'm not sure why this is said. It still creates an inconsistency where WSA_IO_PENDING may still result in a failure that is not surfaced, whereas WSAEWOULDBLOCK and ERR_IO_PENDING don't.
 
5. This unnecessarily lowers the overall send speed as we're not overlapping multiple outstanding sends.

So I guess I wasn't clearer in my response why I think this is problematic.

WSA_IO_PENDING is a promise by the OS to attempt to send; as you note, it can come up when for example the HW can't send a packet synchronously (IF you've opted in to the overlapped IO handling). However, as you noted, in the case of overlapped IO, the error result is delivered asynchronously once the IO completion has completed and the OVERLAPPED returned via the IOCP.

The contract of //net gets violated when you permit multiple sends to be enqueued, but no longer have a reliable guarantee about what was sent and what wasn't. We try to avoid this at the TCP layer, where it's far easier to handle, and we similarly try to avoid it at the UDP layer.

For example, consider if the WSA_IO_PENDING is treated as net::OK. There's no way to signal back that something bad happened. Indeed, if a HW buffer is both full AND slow, we might end up in a situation where the OVERLAPPED continually returns WSA_IO_PENDING and then fails, but we never have a way to signal that to the higher layers so that backpressure can be applied. We can argue that it's a driver bug for letting the overlapped IO be enqueued in the first place, but driver bugs (and crap middleware) are the state of the world.

That's why I said that it's an inconsistent experience. There's no question that IOCPs offer an exceptionally high performance networking layer, especially with the ability to enqueue multiple sends(), but the //net API doesn't expose such a capability. It would require inverting the //net layer such that the UDP socket in //net allowed multiple sends(), and the ability to track the multiple associated CompletionCallbacks, and I'm not entirely sure that's a good model. Perhaps there's an alternative solution you see, but I think it's more to the point that we should have a single, consistent API contract in //net.

You privately mentioned seeing latency as high as 700ms for sending a single 1K UDP packet. That sounds crazy clown town, so adding a few other //net Chromies to see what they can share.

Guo-wei Shieh

Dec 12, 2014, 6:41:09 PM
to Ryan Sleevi, net-dev, Justin Uberti, j...@chromium.org, Ryan Hamilton, Raman Tenneti
I get what you said. However, there is still one problem. On POSIX systems, a nonblocking UDP socket send just copies the buffer to the kernel until it reaches the limit, at which point EWOULDBLOCK is returned. On Windows, however, it seems that as long as we're not sending faster than the HW can drain, ERR_IO_PENDING is returned. At that point, my guess is that we haven't used the amount of send buffer we have.

In other words, why do we even do overlapped IO on Windows? Why don't we just go back to nonblocking, so that, at least, we could get more packets on the way simultaneously?

POSIX also doesn't tell you whether each send succeeds in nonblocking mode. Why do we need to know that on Windows?

Let me also explain a bit about the 700ms. It's on a Windows laptop (a previous Lenovo model issued by the company). 700ms to 1s is how long, on WiFi, it takes a 1K-byte packet to go from being "sendto"-ed to the point the callback returns. (The measurement is in content/browser/p2p/socket_udp_host.cc.) It was measured with a chrome://tracing session over a good couple of seconds. We also put the same measurement across all WebRTC packets. Windows seems to take a lot longer than Mac.

Thanks,
Guowei



Ryan Hamilton

Dec 12, 2014, 6:51:53 PM
to Guo-wei Shieh, Ryan Sleevi, net-dev, Justin Uberti, j...@chromium.org, Raman Tenneti
On Fri, Dec 12, 2014 at 3:41 PM, Guo-wei Shieh <guo...@chromium.org> wrote:
I got what you said. However, there is still one problem. On POSIX system, nonblocking udp socket send just copies the buffer to kernel until it reaches the limit and EWOULDBLOCK will return. On windows, however, it seems as long as we're not sending faster than HW could drain, ERR_IO_PENDING will return. At that point, my guess is that we haven't reached the amount of sendbuffer we have.

On Windows, sending a UDP packet that is smaller than 256 bytes (I think) goes through a "fast path" which writes synchronously. Any packet larger than that will always go through the "slow path" and ERR_IO_PENDING will be returned. We bumped into this issue in QUIC.

Ryan Sleevi

Dec 12, 2014, 6:52:31 PM
to Guo-wei Shieh, Ryan Sleevi, net-dev, Justin Uberti, j...@chromium.org, Ryan Hamilton, Raman Tenneti, Patrick Meenan
On Fri, Dec 12, 2014 at 3:41 PM, Guo-wei Shieh <guo...@chromium.org> wrote:
I got what you said. However, there is still one problem. On POSIX system, nonblocking udp socket send just copies the buffer to kernel until it reaches the limit and EWOULDBLOCK will return. On windows, however, it seems as long as we're not sending faster than HW could drain, ERR_IO_PENDING will return. At that point, my guess is that we haven't reached the amount of sendbuffer we have.

In other words, why do we even do overlapped IO on windows? Why don't we just go back to nonblocking one, at least, we could get more packets on the way simultaneously? 

I seem to recall this was explored for the TCP socket side, after bugs like https://code.google.com/p/chromium/issues/detail?id=30144 and https://code.google.com/p/chromium/issues/detail?id=86515#c5

Indeed, a quick check of TCPSocketWin shows that while we send with overlapped (WSASend), we use recv() and WSAEventSelect, so it's not unprecedented. +Pat to comment on his experiences implementing this.
 

Posix also doesn't tell you whether each send succeeds or not in the nonblocking mode. Why do we need to know that for windows platform? 

We know whether or not it was kernel-enqueued; but that's a fair point regarding the behaviour of Windows, in that it has artificial delays when enqueueing in the OVERLAPPED case that don't necessarily apply otherwise.

Patrick Meenan

Dec 15, 2014, 9:12:43 AM
to Ryan Sleevi, Guo-wei Shieh, net-dev, Justin Uberti, j...@chromium.org, Ryan Hamilton, Raman Tenneti
In the case of the TCP recv() implementation it was a really easy and clean switch to go from Overlapped to non-blocking and it allowed us to remove a hack where Chrome was trying to pace the recv buffer sizes to match what it expected the TCP stack would deliver during slow start.  The non-blocking mode also matches how the Linux and Mac stacks work so it ends up being a lot cleaner in general.  I didn't tackle send() or UDP because I had a very specific issue I was fixing but it looks like it makes sense to transition over all the way.

The //net contract should be able to remain the same; it's just a cleaner implementation of the contract, where the underlying APIs actually match the behavior that the //net interfaces expose. The sockets should allow sends to complete until the underlying kernel buffer is full and return WSAEWOULDBLOCK when it fills, which is pretty much what the consumers of //net would want. The fact that overlapped I/O has tighter time constraints and will return a pending write when there is still space in the kernel buffer, just because it couldn't get the data there immediately, causes a mismatch with the //net API. It would be interesting to see if it also manifests in the case of large TCP uploads.

That said, it's a pretty big change to a crazy-sensitive part of the stack with lots of potential for LSP and A/V issues. A good test case where the improvements can be tested in dev will help make sure the assumptions are right, but having some UMAs and Finch control so we can measure the actual impact is going to be critical if we make the change.

Ryan Hamilton

Dec 15, 2014, 10:07:02 AM
to Patrick Meenan, Ryan Sleevi, Guo-wei Shieh, net-dev, Justin Uberti, j...@chromium.org, Raman Tenneti
On Mon, Dec 15, 2014 at 6:12 AM, Patrick Meenan <pme...@google.com> wrote:
In the case of the TCP recv() implementation it was a really easy and clean switch to go from Overlapped to non-blocking and it allowed us to remove a hack where Chrome was trying to pace the recv buffer sizes to match what it expected the TCP stack would deliver during slow start.  The non-blocking mode also matches how the Linux and Mac stacks work so it ends up being a lot cleaner in general.  I didn't tackle send() or UDP because I had a very specific issue I was fixing but it looks like it makes sense to transition over all the way.

The //net contract should be able to remain the same, it's just a cleaner implementation of the contract where the underlying API's actually match the behavior that the //net interfaces expose.  The sockets should allow the sends to complete until the underlying kernel buffer is full and return WSAEWOULDBLOCK when it fills which is pretty much what the consumers of //net would want.  The fact that overlapped I/O has tighter time constraints and will return a pending write when there is still space in the kernel buffer just because it couldn't get it there immediately causes a mismatch with the //net API.  It would be interesting to see if it also manifests in the case of large TCP uploads.

That said, it's a pretty big change to a crazy-sensitive part of the stack with lots of potential for LSP and A/V issues.  A good test case where the improvements can be tested in dev will help make sure the assumptions are right but having some UMAs and finch control so we can measure the actual impact are going to be critical if we make the change.

This is key for us (the QUIC team). As long as we can Finch this on/off until we're sure of the performance implications, we'll be happy.

Cheers,

Ryan

Guo-wei Shieh

Dec 15, 2014, 1:08:01 PM
to Ryan Hamilton, Patrick Meenan, Ryan Sleevi, net-dev, Justin Uberti, j...@chromium.org, Raman Tenneti
It seems to me that there is interest in this issue from multiple teams. I have created crbug 442392 to track it. Could you triage it and let us know the next step?




Matt Menke

Dec 15, 2014, 1:12:01 PM
to Guo-wei Shieh, Ryan Hamilton, Patrick Meenan, Ryan Sleevi, net-dev, Justin Uberti, j...@chromium.org, Raman Tenneti
Am I missing something? I don't see any interest in multiple reads/writes at once from anyone other than you, just discussion of switching from overlapped to non-blocking IO.

Patrick Meenan

Dec 15, 2014, 1:27:40 PM
to Matt Menke, Guo-wei Shieh, Ryan Hamilton, Ryan Sleevi, net-dev, Justin Uberti, j...@chromium.org, Raman Tenneti
I think it's just the wording of the bug. As best I can tell, there is no ask for multiple pending writes at the //net layer, just to switch to non-blocking IO and let the data pass through to the underlying send buffers. It would help to re-word the bug to indicate that overlapped IO is artificially limiting the outbound data, instead of calling it "multiple outgoing send".

Guo-wei Shieh

Dec 15, 2014, 1:45:49 PM
to Patrick Meenan, Matt Menke, Ryan Hamilton, Ryan Sleevi, net-dev, Justin Uberti, j...@chromium.org, Raman Tenneti
Yes, thanks for clarifying, Patrick. I don't care how it's done; all I care about is that we're limiting the outbound data speed unnecessarily, and that should be addressed.

Changed the title to "Overlapped IO on windows artificially limiting the outbound data on UDP (and possibly TCP too)"

hc...@google.com

Dec 29, 2014, 7:16:11 PM
to net...@chromium.org, pme...@google.com, mme...@chromium.org, r...@chromium.org, rsl...@chromium.org, jub...@chromium.org, j...@chromium.org, rten...@chromium.org, guo...@chromium.org
On Monday, December 15, 2014 at 10:45:49 AM UTC-8, Guo-wei Shieh wrote:
Voicing the interest from the Cast Streaming team as well.

We have a media streaming feature in Chrome that runs on the UDP socket implementation. Our data showed a much higher application queuing delay (in our application buffer) for outgoing packets on Windows than on OS X. We're interested in seeing / helping to get this issue resolved.