Windows overlapped IO benefits


Matt Menke

Nov 30, 2017, 12:12:36 PM
to net-dev, network-service-dev
When uploading files, we use overlapped I/O for reading from the file, in the FileStream / FileStreamContext classes.  I don't suppose anyone knows if there are, experimentally, any significant performance benefits from using overlapped I/O?

Context:  I'm working on adding file upload support to the network service.  Since the network service will be sandboxed, we need to pass in file handles.  Currently, one can safely reuse the content::ResourceRequest structure (say, to retry, or when a request redirects and you want to significantly modify the redirected request before using it again).  Conveniently, there is an API to duplicate file handles on all platforms!  So...problem solved, right?  Except on Windows, we use CreateIoCompletionPort to listen to file handles in the network process.  This can, apparently, only be used once per file handle.  Even if we call DuplicateHandle on a handle for which we've never called this method, calling CreateIoCompletionPort on two different handles duplicated from the original handle fails.

So we either need to impose a restriction that a content::ResourceRequest with a file handle can never be duplicated, switch to sync IO on Windows (we use standard sync IO on other platforms here anyway), or look for an alternative solution.

We could switch to sync IO and add an extra layer of buffering, but that introduces more copies unless we rework the UploadDataStream API, so it may not be a good idea.

Matt Menke

Nov 30, 2017, 12:18:50 PM
to net-dev, network-service-dev
One other possibility is to pass in a mojo callback / pipe to retrieve a file handle.  Then we can open the file only as needed, and if we need to re-use the ResourceRequest, we can just re-create the file handle.

That also has the benefit of requiring we keep around fewer file handles, but on the downside, it adds an extra process hop before we can start uploading anything.
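
A minimal sketch of the callback idea, written in portable C++ rather than mojo. The request carries a way to obtain the file instead of an open handle, so a retried or redirected request simply opens a fresh one. `UploadRequest`, `FileOpener`, `MakeOpener`, and `ReadAll` are hypothetical names for illustration only; in the real design the opener would be a call across a mojo pipe, which is where the extra process hop comes from.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <functional>
#include <memory>
#include <string>

// Hypothetical stand-in for a file handle: a FILE* that closes itself.
struct FileCloser {
  void operator()(std::FILE* f) const {
    if (f) std::fclose(f);
  }
};
using ScopedFile = std::unique_ptr<std::FILE, FileCloser>;

// Hypothetical callback type. Here it opens a local path; in the proposal
// it would ask a more privileged process for a handle.
using FileOpener = std::function<ScopedFile()>;

// Hypothetical request: stores how to get the file, not the file itself,
// so copying or reusing the request never shares an open handle.
struct UploadRequest {
  FileOpener open_file;
};

// Builds an opener for a given path.
FileOpener MakeOpener(std::string path) {
  return [path] { return ScopedFile(std::fopen(path.c_str(), "rb")); };
}

// Opens a fresh handle for this attempt and reads the whole file.
std::string ReadAll(const UploadRequest& request) {
  ScopedFile file = request.open_file();
  assert(file && "opener failed");
  std::string data;
  char buf[4096];
  std::size_t n;
  while ((n = std::fread(buf, 1, sizeof(buf), file.get())) > 0)
    data.append(buf, n);
  return data;
}
```

Each call to `ReadAll` gets its own handle, so a copied request can be read again without touching a handle that was already used.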

Ken Rockot

Nov 30, 2017, 12:29:18 PM
to Matt Menke, net-dev, network-service-dev, Robert Liao
Sorry I don't have any concrete data, but it's my experience anecdotally and my understanding in general that overlapped I/O is quite a bit more efficient than its alternatives on Windows. That is relatively old information, so it's possible something has changed and it's not as significant now. It's also possible that efficiency isn't critical for something like file upload.

+robliao for Windows expertise/advice.

On Thu, Nov 30, 2017 at 9:12 AM, 'Matt Menke' via network-service-dev <network-s...@chromium.org> wrote:
> Except on Windows, we use CreateIoCompletionPort to listen to file handles in the network process.  This can, apparently, only be used once per file handle.  Even if we call DuplicateHandle on a handle for which we've never called this method, calling CreateIoCompletionPort on two different handles duplicated from the original handle fails.

Yeah, you get one IOCP per object regardless of how many handles the object has referencing it; and once an object is associated with an IOCP, that IOCP exists until said object is destroyed (i.e. all its handles closed). I assume this was done to strike a balance between simplicity and efficiency in the API, but it sure is unfortunate.
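
The behavior described here can be pictured with a toy model in portable C++. It calls no Win32 API; `FileObject`, `Handle`, and `AssociateWithPort` are hypothetical names that only mirror the rule as stated in this message: the association lives on the underlying object, not on any one handle.

```cpp
#include <cassert>
#include <memory>

// Toy model of the kernel object that handles refer to.
struct FileObject {
  const void* completion_port = nullptr;  // set at most once, never cleared
};

// Toy handle: duplicating it yields another reference to the same object.
struct Handle {
  std::shared_ptr<FileObject> object;
  Handle Duplicate() const { return Handle{object}; }
};

// Models the association step. It succeeds only if the underlying object
// has no port yet, regardless of which handle is used to reach it.
bool AssociateWithPort(const Handle& handle, const void* port) {
  assert(handle.object && "handle must refer to an object");
  if (handle.object->completion_port != nullptr) return false;
  handle.object->completion_port = port;
  return true;
}
```

In this model, associating one duplicate succeeds and every later attempt through any other duplicate fails, which matches the failure Matt observed.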



--
You received this message because you are subscribed to the Google Groups "network-service-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to network-service-dev+unsub...@chromium.org.
To post to this group, send email to network-service-dev@chromium.org.
To view this discussion on the web visit https://groups.google.com/a/chromium.org/d/msgid/network-service-dev/CAEK7mvqgF9Ep40QVD7PHQHH%2B%2BPxq1ooRm%2BSizZHfe_6R9Fq_uA%40mail.gmail.com.

Matt Menke

Nov 30, 2017, 12:46:28 PM
to Ken Rockot, net-dev, network-service-dev, Robert Liao
I guess we could use overlapped IO without using completion ports by using win::ObjectWatcher.

I may do some local testing and then a field trial where we don't use overlapped IO, and see how things go. The POSIX code should work on Windows without modification, though I'll need to make sure consumers that pass us base::Files directly still work.


Robert Liao

Nov 30, 2017, 12:48:22 PM
to Matt Menke, Ken Rockot, net-dev, network-service-dev
My understanding here is that Overlapped I/O is more or less synonymous with what's commonly called Asynchronous I/O (even MS acknowledges as much).

The real question here is whether you want to block the thread for your I/O request. If you want non-blocking behavior, you use Overlapped I/O. Otherwise you use synchronous I/O.

Matt Menke

Nov 30, 2017, 12:51:15 PM
to Robert Liao, Ken Rockot, net-dev, network-service-dev
We do blocking disk reads off-thread on other platforms using the standard base::File API.  I don't see why we can't just use the same code on all platforms.

This is just for file uploads, so it's not the most common codepath, though obviously we don't want upload performance to be poor.
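
A rough sketch of what "blocking reads off-thread" means, using only the C++ standard library. This is not the base::File code path; `BlockingRead` and `ReadOffThread` are hypothetical names, and Chromium would post a task to a worker sequence and reply with a callback rather than hand back a `std::future`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <future>
#include <string>
#include <utility>
#include <vector>

// Reads up to `max_bytes` from `path` starting at `offset` using ordinary
// blocking calls. Returns the bytes read (empty on error or at EOF).
std::vector<char> BlockingRead(const std::string& path, long offset,
                               std::size_t max_bytes) {
  std::vector<char> buffer(max_bytes);
  std::FILE* file = std::fopen(path.c_str(), "rb");
  if (!file) return {};
  std::size_t n = 0;
  if (std::fseek(file, offset, SEEK_SET) == 0)
    n = std::fread(buffer.data(), 1, buffer.size(), file);
  std::fclose(file);
  buffer.resize(n);
  return buffer;
}

// Runs the blocking read on another thread so the calling thread stays
// free; the result is collected later through the returned future.
std::future<std::vector<char>> ReadOffThread(std::string path, long offset,
                                             std::size_t max_bytes) {
  return std::async(std::launch::async, BlockingRead, std::move(path), offset,
                    max_bytes);
}
```

Nothing here depends on the platform, which is the appeal of this option: the same code runs everywhere, at the cost of tying up a worker thread for the duration of each read.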

Ryan Sleevi

Nov 30, 2017, 12:54:07 PM
to Robert Liao, Matt Menke, Ken Rockot, net-dev, network-service-dev
Well, we have Async IO, non-blocking I/O, and blocking I/O.

In the Async case, we go through to IOCPs
In the non-blocking case, we use OVERLAPPED with an hEvent
In blocking, well, blocking APIs are used

The main benefit of the async vs non-blocking is that it scales far more efficiently, especially for servers, since you don't need to have one-hEvent-per-interesting-event, but can instead use the IOCP and associated context to dispatch effectively (and to have multiple threads efficiently reading from an IOCP). We gain some benefits from how the kernel and drivers interact with the IOCP, as well as the scheduler.

But that's primarily an issue when you have a large number of handles open, enough that the hEvent scaling starts to matter. This matters for C10K, but I think less so for some of these file socket upload cases.

Are there any other main advantages of the IOCP treatment over simply OVERLAPPED+hEvent 'quasi'-nonblocking (since it's not truly async or non-blocking for some events in older versions of Windows)?


Robert Liao

Nov 30, 2017, 12:58:07 PM
to rsl...@chromium.org, Matt Menke, Ken Rockot, net-dev, network-service-dev
> Well, we have Async IO, non-blocking I/O, and blocking I/O.

I guess it might be clearer for the discussion if we clarify the APIs in question.
The Windows File APIs have overlapped/non-overlapped. WinSock2 has the notion of a blocking/non-blocking socket. Those parameters are orthogonal (and I'm less familiar with WinSock2).

Which Windows APIs are used underneath the hood?

Matt Menke

Nov 30, 2017, 1:06:06 PM
to Robert Liao, Ryan Sleevi, Ken Rockot, net-dev, network-service-dev
WinSock2 has both WSAAsyncSelect (for "non-blocking" IO) and WSARecv, which can take an overlapped argument (for "overlapped" IO).  Files have no equivalent of WSAAsyncSelect, I believe.

Matt Menke

Nov 30, 2017, 1:08:19 PM
to Robert Liao, Ryan Sleevi, Ken Rockot, net-dev, network-service-dev
Erm, WSAAsyncSelect and WSAEventSelect for non-blocking, rather.  WSAAsyncSelect sounds semi-deprecated.

Ryan Sleevi

Nov 30, 2017, 1:13:03 PM
to Robert Liao, Ryan Sleevi, Matt Menke, Ken Rockot, net-dev, network-service-dev
I meant we can have

OVERLAPPED.hEvent == NULL, IOCP == fully async (dispatched to IOCP)
OVERLAPPED.hEvent != NULL, no associated IOCP = what I was calling 'nonblocking' (due to the need to wait for the event)
LPOVERLAPPED == NULL, fully sync
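
The three cases above can be restated as a small classifier. This is a portable model for the sake of the discussion, not Win32 code: `IoSetup`, `IoMode`, and `Classify` are hypothetical names, and the booleans stand in for the state of the real `OVERLAPPED` argument and handle.

```cpp
#include <cassert>

// The three configurations listed above, plus a bucket for combinations
// that the list does not cover.
enum class IoMode { kFullyAsync, kEventNonBlocking, kFullySync, kUnclassified };

struct IoSetup {
  bool has_overlapped;        // an OVERLAPPED pointer is passed to the call
  bool has_event;             // OVERLAPPED.hEvent != NULL
  bool associated_with_iocp;  // handle was registered with a completion port
};

IoMode Classify(const IoSetup& s) {
  if (!s.has_overlapped) return IoMode::kFullySync;
  if (!s.has_event && s.associated_with_iocp) return IoMode::kFullyAsync;
  if (s.has_event && !s.associated_with_iocp) return IoMode::kEventNonBlocking;
  return IoMode::kUnclassified;
}
```

Under this model a handle that is registered with a completion port and read with `hEvent == NULL` classifies as `kFullyAsync`.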

You're right that Windows also has the BSD-approach of non-blocking for sockets (in which a 'try again later / I'm givin ya all I got' is returned)

The current implementation of FileStreamContextWin uses base::MessageLoopForIO::IOContext with a ::ReadFile, but the underlying file handle has pump->RegisterIOHandler() called, meaning it's associated with the IOCP (hEvent == NULL)

If I understand (perhaps incorrectly) Matt's proposal, the question would be: what are the implications of not associating the file handle with the IOCP, and instead using either non-blocking (OVERLAPPED.hEvent != NULL, waiting for event) or blocking-on-another-thread?

Matt Menke

Nov 30, 2017, 1:36:33 PM
to Ryan Sleevi, Robert Liao, Ken Rockot, net-dev, network-service-dev
Correct.  I'd tend to prefer blocking-on-another-thread, if it works:  That lets us use the same code everywhere, and there's less chance of mixing and matching files opened for sync/async I/O and running into issues.  For instance, MessageLoopForIO::RegisterIOHandler DCHECKs when the IOCP association fails (due to being passed a file opened for sync I/O, or due to being passed a file previously passed to IOCP), which strikes me as rather unsafe behavior, since there's no easy way to probe for either state before calling it.

Charles 'Buck' Krasic

Nov 30, 2017, 2:58:44 PM
to Matt Menke, Ryan Sleevi, Robert Liao, Ken Rockot, net-dev, network-service-dev
Sorry if this is only tangentially related.  I've been looking at upload performance recently (specifically very large files on very fast links), but my measurements and profiles are on Android, and a bit of Linux desktop.  There are many questions still unanswered from my profiles; one of them is why the cost of reading data from files seems counter-intuitively high.

My main reason for mentioning this is to call attention to the possibility that what we do for reading files on non-Windows platforms might be quite far from optimal.
