
CloseFile() latency on Windows


Gregory Szorc

Sep 28, 2015, 5:41:32 PM
to dev-platform
While profiling a Python process on Windows using Sysinternals Process Monitor (procmon.exe), I noticed that closing file handles using CloseFile() takes 1+ ms. Contrast this with other I/O-related system calls like WriteFile(), which tend to take ~1us. (I /think/ it only takes this long if the file has been written to.) This is on Windows 10 running natively (no VM) when writing to an SSD. Files are opened with fopen() in "a+" mode, if it matters. I can also repro in "a" mode.

When writing thousands of files in rapid succession, this 1+ms pause
(assuming synchronous I/O) piles up. Assuming a 1ms pause, writing 100,000
files spends 100s in CloseFile()! The process profile also shows the bulk
of the time in CloseFile(), so this is a real hot spot.

Both Mercurial and the Firefox build system need to write thousands of
files in rapid succession. Both are using Python. Both should see
significant performance improvements if we find a more efficient way to do
bulk file writing (which is currently over 2x slower than on Linux and OS X).

Short of going full overlapped I/O (which will be a PITA with Python), the
best I can think of is to perform CloseFile() on a background thread or
pool of background threads.
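A minimal Python 3 sketch of that idea (the `BackgroundCloser` helper is hypothetical - nothing we actually ship - it just hands opened files to a worker thread so the writing thread never blocks in close()):

```python
import queue
import threading

class BackgroundCloser:
    """Hand opened files to a worker thread so the writing thread
    never blocks in the (slow on Windows) close call."""

    _SENTINEL = object()

    def __init__(self):
        self._queue = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self):
        while True:
            f = self._queue.get()
            if f is self._SENTINEL:
                return
            f.close()

    def close(self, f):
        # Returns immediately; the actual close happens on the worker.
        self._queue.put(f)

    def join(self):
        # Flush all pending closes, then stop the worker thread.
        self._queue.put(self._SENTINEL)
        self._worker.join()
```

The writer loop becomes open, write, closer.close(f), with a single closer.join() at the end; a pool of closers would be the same thing with N threads draining one queue.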

I was curious if anyone has dug into optimizing the writing of thousands of
files on Windows and can share any insights so we can make the Firefox
build system and Mercurial significantly faster on Windows.

gps

Kyle Huey

Sep 28, 2015, 5:44:31 PM
to Gregory Szorc, dev-platform
We recently dealt with something similar in Gecko. See
https://bugzilla.mozilla.org/show_bug.cgi?id=1152046

- Kyle
> _______________________________________________
> dev-platform mailing list
> dev-pl...@lists.mozilla.org
> https://lists.mozilla.org/listinfo/dev-platform

Ehsan Akhgari

Sep 28, 2015, 9:45:52 PM
to Gregory Szorc, dev-platform
On 2015-09-28 5:41 PM, Gregory Szorc wrote:
> When writing thousands of files in rapid succession, this 1+ms pause
> (assuming synchronous I/O) piles up. Assuming a 1ms pause, writing 100,000
> files spends 100s in CloseFile()! The process profile also shows the bulk
> of the time in CloseFile(), so this is a real hot spot.

There is no CloseFile() on Windows. Did you mean CloseHandle()?

The reason I'm asking is that CloseHandle() can close various types of
kernel objects, so if it is showing up in profiles, it's worth verifying
that the handle passed to it actually came from CreateFile(Ex).

Closing handles on a background thread doesn't help with performance if
you're invoking sub-processes that need to close a handle and wait for
the operation to finish. It would help if you provided more details on
the constraints you're dealing with, e.g., where do these handles come
from? Are they being created by one long-running process or by several
short-lived ones? Another idea to experiment with is leaking the
handles and letting the kernel close them for you when your process is
terminated. I _think_ (but I'm not sure) that leaked handles won't delay
the process handle from becoming signaled, so if you're spawning a process
that needs to close the file and wait for that to finish, that may be
faster.

On the topic of performance on Windows (but not directly related to your
question), beware of the ~60ms CreateProcess overhead
<https://llvm.org/bugs/show_bug.cgi?id=20253#c4>. Depending on the
context, one could kill both of these perf issues with one stone by
doing as much of the file manipulation in one process as you can, and
closing the handles on a background thread.

Gregory Szorc

Sep 29, 2015, 12:52:55 AM
to Ehsan Akhgari, dev-platform, Gregory Szorc
On Mon, Sep 28, 2015 at 6:45 PM, Ehsan Akhgari <ehsan....@gmail.com>
wrote:

> On 2015-09-28 5:41 PM, Gregory Szorc wrote:
>
>> When writing thousands of files in rapid succession, this 1+ms pause
>> (assuming synchronous I/O) piles up. Assuming a 1ms pause, writing 100,000
>> files spends 100s in CloseFile()! The process profile also shows the bulk
>> of the time in CloseFile(), so this is a real hot spot.
>>
>
> There is no CloseFile() on Windows. Did you mean CloseHandle()?
>

While this is probably something I should know, I confess to blindly
copying results from Sysinternals' procmon utility, which reports file
closes as the "CloseFile()" "operation." I reckon it is being intelligent
and converting CloseHandle() into something more useful for reporting
purposes. In my defense, procmon does report "operations" that I know are
actual Windows functions. Kinda weird that it's inconsistent. Who knows.

The reason I'm asking is that CloseHandle() can close various types of
> kernel objects, and if that is showing up in profiles, it's worth to verify
> that the handle passed to it is actually coming from CreateFile(Ex).
>

Procmon is reporting lots of CreateFile() calls. And I'm 100% certain the
underlying C code is calling CreateFile().


> Closing handles on a background thread doesn't help with performance if
> you're invoking sub-processes that need to close a handle and wait for the
> operation to finish. It would help if you provided more details on the
> constraints you're dealing with, e.g., where do these handles come from?
> Are they being created by one long running process or by several short
> lived ones? etc. Another idea to experiment with is leaking the handles
> and letting the kernel close them for you when your process is terminated.
> I _think_ (but I'm not sure) that won't count towards the handle of the
> process to become signaled so if you're spawning a process that needs to
> close the file and wait for that to finish, that may be faster.
>

I'm dealing with a single-threaded, long-running process that performs
synchronous I/O, one open file at a time: CreateFile, CloseHandle,
CreateFile, CloseHandle, ... I'm pretty sure leaking handles is out of the
question, as we need to write to thousands or even tens of thousands of
files, and this would exhaust open file limits.


>
> On the topic of performance on Windows (but not directly related to your
> question), beware of the ~60ms CreateProcess overhead <
> https://llvm.org/bugs/show_bug.cgi?id=20253#c4>. Depending on the
> context, one could kill both of these perf issues with one stone by doing
> as many of the file manipulation in one process as you can, and closing the
> handles on a background thread.
>

I'm well aware that new processes on Windows are much more expensive than
on other operating systems. You could say that parts of the Firefox build
system go out of their way to avoid spawning new processes because of this
property :)

Ehsan Akhgari

Sep 29, 2015, 9:05:11 AM
to Gregory Szorc, dev-platform
On 2015-09-29 12:52 AM, Gregory Szorc wrote:
> On Mon, Sep 28, 2015 at 6:45 PM, Ehsan Akhgari <ehsan....@gmail.com
> <mailto:ehsan....@gmail.com>> wrote:
>
> On 2015-09-28 5:41 PM, Gregory Szorc wrote:
>
> When writing thousands of files in rapid succession, this 1+ms pause
> (assuming synchronous I/O) piles up. Assuming a 1ms pause,
> writing 100,000
> files spends 100s in CloseFile()! The process profile also shows
> the bulk
> of the time in CloseFile(), so this is a real hot spot.
>
>
> There is no CloseFile() on Windows. Did you mean CloseHandle()?
>
>
> While this is probably something I should know, I confess to blindly
> copying results from Sysinternals' procmon utility, which reports file
> closes as the "CloseFile()" "operation." I reckon it is being
> intelligent and converting CloseHandle() to something more useful for
> reporting purposes. In my defense, procmon does report "operations" that
> I know are actual Windows functions. Kinda weird it is inconsistent. Who
> knows.

Fair! Honestly I haven't used procmon in years, I don't even remember
it having any profiling tools when I last saw it. :-) But it probably
tracks which handles are being passed to CloseHandle().

> The reason I'm asking is that CloseHandle() can close various types
> of kernel objects, and if that is showing up in profiles, it's worth
> to verify that the handle passed to it is actually coming from
> CreateFile(Ex).
>
>
> Procmon is reporting lots of CreateFile() calls. And I'm 100% certain
> the underlying C code is calling CreateFile().

Good. I'm assuming you mean CreateFile() directly, not wrappers such as
_open or fopen.

> Closing handles on a background thread doesn't help with performance
> if you're invoking sub-processes that need to close a handle and
> wait for the operation to finish. It would help if you provided
> more details on the constraints you're dealing with, e.g., where do
> these handles come from? Are they being created by one long running
> process or by several short lived ones? etc. Another idea to
> experiment with is leaking the handles and letting the kernel close
> them for you when your process is terminated. I _think_ (but I'm
> not sure) that won't count towards the handle of the process to
> become signaled so if you're spawning a process that needs to close
> the file and wait for that to finish, that may be faster.
>
>
> I'm dealing with a single threaded single long-running process that
> performs synchronous I/O, 1 open file at a time. CreateFile,
> CloseHandle, CreateFile, CloseHandle, ... I'm pretty sure leaking
> handles is out of the question, as we need to write to thousands or even
> tens of thousands of files and this will exhaust open files limits.

You'd be surprised. :-)

Windows doesn't really have a notion of open file limits similar to
Unix. File handles opened using _open can go up to a maximum of 2048.
fopen has a cap of 512 which can be raised up to 2048 using
_setmaxstdio(). *But* these are just CRT limits, and if you use Win32
directly, you can open up to 2^24 handles all at once
<https://technet.microsoft.com/en-us/library/bb896645.aspx>. Since we
will never need to open that many file handles, you may very well be
able to use this approach.
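If the CRT cap ever does bite from Python, it can be raised at runtime via ctypes. A hedged sketch (the helper name is mine, and which CRT DLL your interpreter actually links against varies by build, so treat this as illustrative):

```python
import ctypes
import sys

def raise_crt_fd_limit(new_limit=2048):
    """Try to raise the Microsoft CRT's open-stream cap via
    _setmaxstdio(). Returns the new limit on success, or None
    when not on Windows or when the CRT rejects the value."""
    if sys.platform != "win32":
        return None
    crt = ctypes.cdll.msvcrt
    crt._setmaxstdio.argtypes = [ctypes.c_int]
    crt._setmaxstdio.restype = ctypes.c_int
    result = crt._setmaxstdio(new_limit)
    return result if result != -1 else None
```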

Cheers,
Ehsan

Ehsan Akhgari

Sep 29, 2015, 9:44:28 AM
to Gregory Szorc, dev-platform
On 2015-09-29 9:05 AM, Ehsan Akhgari wrote:
> You'd be surprised. :-)
>
> Windows doesn't really have a notion of open file limits similar to
> Unix. File handles opened using _open can go up to a maximum of 2048.
> fopen has a cap of 512 which can be raised up to 2048 using
> _setmaxstdio(). *But* these are just CRT limits, and if you use Win32
> directly, you can open up to 2^24 handles all at once
> <https://technet.microsoft.com/en-us/library/bb896645.aspx>. Since we
> will never need to open that many file handles, you may very well be
> able to use this approach.

Sorry, the link was meant to be:
<http://blogs.technet.com/b/markrussinovich/archive/2009/09/29/3283844.aspx>

Neil

Sep 29, 2015, 11:57:23 AM
Gregory Szorc wrote:

>Files are opened with _fopen() in "a+" mode if it matters. I can also repro in "a" mode.
>
>
...

>Short of going full overlapped I/O
>
Overlapped I/O isn't supported for operations that change the valid data
length of the file.
http://blogs.msdn.com/b/oldnewthing/archive/2011/09/23/10215586.aspx

--
Warning: May contain traces of nuts.

Gregory Szorc

Sep 29, 2015, 1:47:19 PM
to Ehsan Akhgari, dev-platform, Gregory Szorc
On Tue, Sep 29, 2015 at 6:05 AM, Ehsan Akhgari <ehsan....@gmail.com>
wrote:

> On 2015-09-29 12:52 AM, Gregory Szorc wrote:
>
>> On Mon, Sep 28, 2015 at 6:45 PM, Ehsan Akhgari <ehsan....@gmail.com
>> <mailto:ehsan....@gmail.com>> wrote:
>>
>> On 2015-09-28 5:41 PM, Gregory Szorc wrote:
>>
>> When writing thousands of files in rapid succession, this 1+ms
>> pause
>> (assuming synchronous I/O) piles up. Assuming a 1ms pause,
>> writing 100,000
>> files spends 100s in CloseFile()! The process profile also shows
>> the bulk
>> of the time in CloseFile(), so this is a real hot spot.
>>
>>
>> There is no CloseFile() on Windows. Did you mean CloseHandle()?
>>
>>
>> While this is probably something I should know, I confess to blindly
>> copying results from Sysinternals' procmon utility, which reports file
>> closes as the "CloseFile()" "operation." I reckon it is being
>> intelligent and converting CloseHandle() to something more useful for
>> reporting purposes. In my defense, procmon does report "operations" that
>> I know are actual Windows functions. Kinda weird it is inconsistent. Who
>> knows.
>>
>
> Fair! Honestly I haven't used procmon in years, I don't even remember it
> having any profiling tools when I last saw it. :-) But it probably tracks
> which handles are being passed to CloseHandle().
>

It has some very limited profiling tools built in. I had to dump the output
and write a script to perform the analysis I needed :) It does in fact
track various arguments so you can get filename-level activity for all I/O
operations.


>
> The reason I'm asking is that CloseHandle() can close various types
>> of kernel objects, and if that is showing up in profiles, it's worth
>> to verify that the handle passed to it is actually coming from
>> CreateFile(Ex).
>>
>>
>> Procmon is reporting lots of CreateFile() calls. And I'm 100% certain
>> the underlying C code is calling CreateFile().
>>
>
> Good. I'm assuming you mean CreateFile() directly, not wrappers such as
> _open or fopen.
>

We're calling CreateFile() or CreateFileA() directly. However...


>
> Closing handles on a background thread doesn't help with performance
>> if you're invoking sub-processes that need to close a handle and
>> wait for the operation to finish. It would help if you provided
>> more details on the constraints you're dealing with, e.g., where do
>> these handles come from? Are they being created by one long running
>> process or by several short lived ones? etc. Another idea to
>> experiment with is leaking the handles and letting the kernel close
>> them for you when your process is terminated. I _think_ (but I'm
>> not sure) that won't count towards the handle of the process to
>> become signaled so if you're spawning a process that needs to close
>> the file and wait for that to finish, that may be faster.
>>
>>
>> I'm dealing with a single threaded single long-running process that
>> performs synchronous I/O, 1 open file at a time. CreateFile,
>> CloseHandle, CreateFile, CloseHandle, ... I'm pretty sure leaking
>> handles is out of the question, as we need to write to thousands or even
>> tens of thousands of files and this will exhaust open files limits.
>>
>
> You'd be surprised. :-)
>
> Windows doesn't really have a notion of open file limits similar to Unix.
> File handles opened using _open can go up to a maximum of 2048. fopen has a
> cap of 512 which can be raised up to 2048 using _setmaxstdio(). *But*
> these are just CRT limits, and if you use Win32 directly, you can open up
> to 2^24 handles all at once <
> https://technet.microsoft.com/en-us/library/bb896645.aspx>. Since we
> will never need to open that many file handles, you may very well be able
> to use this approach.
>

I experimented with a background thread just for processing file closes.
This drastically increases performance! However, the queue periodically
backs up, and I was seeing "too many open files" errors - despite using
CreateFile()! We do make a call to _open_osfhandle() after CreateFile().
I'm guessing the file limit is on file descriptors (not handles) and
_open_osfhandle() triggers the default 512 ceiling? This call is necessary
because Python file objects speak in terms of file descriptors. Not calling
_open_osfhandle() would mean re-implementing Python's file object, which
I'm going to say is too much work for the interim.
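To illustrate the dance being described, here's a hedged sketch of the CreateFile -> _open_osfhandle -> file-object path (the function name and the non-Windows fallback are mine; the Win32 constants are the standard documented values):

```python
import os
import sys

def open_for_append(path):
    """Open path for appending through a raw Win32 handle plus
    _open_osfhandle(), as described above. On non-Windows
    platforms this just falls back to plain open()."""
    if sys.platform != "win32":
        return open(path, "ab")

    import ctypes
    import msvcrt

    GENERIC_WRITE = 0x40000000
    OPEN_ALWAYS = 4            # create if missing, else open existing
    FILE_ATTRIBUTE_NORMAL = 0x80

    kernel32 = ctypes.windll.kernel32
    kernel32.CreateFileW.restype = ctypes.c_void_p
    handle = kernel32.CreateFileW(
        ctypes.c_wchar_p(path), GENERIC_WRITE, 0, None,
        OPEN_ALWAYS, FILE_ATTRIBUTE_NORMAL, None)
    if handle in (None, ctypes.c_void_p(-1).value):  # INVALID_HANDLE_VALUE
        raise OSError("CreateFileW failed for %r" % path)

    # This is the call that consumes a CRT file descriptor slot,
    # and hence hits the 512 ceiling.
    fd = msvcrt.open_osfhandle(handle, os.O_APPEND)
    return os.fdopen(fd, "ab")
```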

Buried in that last paragraph is that a background thread closing files
resulted in significant performance wins - ~5:00 wall on an operation that
was previously ~16:00 wall! And I'm pretty sure it would go faster with
multiple closing threads. Still not as fast as Linux, but much better than
the 3x slowdown from before.

Ehsan Akhgari

Sep 29, 2015, 2:09:33 PM
to Gregory Szorc, dev-platform
On 2015-09-29 1:47 PM, Gregory Szorc wrote:
> You'd be surprised. :-)
>
> Windows doesn't really have a notion of open file limits similar to
> Unix. File handles opened using _open can go up to a maximum of
> 2048. fopen has a cap of 512 which can be raised up to 2048 using
> _setmaxstdio(). *But* these are just CRT limits, and if you use
> Win32 directly, you can open up to 2^24 handles all at once
> <https://technet.microsoft.com/en-us/library/bb896645.aspx>. Since
> we will never need to open that many file handles, you may very well
> be able to use this approach.
>
>
> I experimented with a background thread for just processing file closes.
> This drastically increases performance! However, the queue periodically
> accumulates and I was seeing errors for too many open files - despite
> using CreateFile()! We do make a call to _open_osfhandle() after
> CreateFile(). I'm guessing the file limit is on file descriptors (not
> handles) and _open_osfhandle() triggers the 512 default ceiling?

Yeah, that would go through the CRT, so you would be subject to the CRT
limit of 512 (which you can bump up to 2048, as I said before).

> This
> call is necessary because Python file objects speak in terms of file
> descriptors. Not calling _open_osfhandle() would mean re-implementing
> Python's file object, which I'm going to say is too much work for the
> interim.

Ugh. In that case, closing file handles on a background thread is the
best you can do, I think.

> Buried in that last paragraph is that a background threading closing
> files resulted in significant performance wins - ~5:00 wall on an
> operation that was previously ~16:00 wall! And I'm pretty sure it would
> go faster if multiple closing threads were used. Still not as fast as
> Linux. But much better than the 3x increase from before.

Cool!