Best way to read/write files with AsyncIO


Ludovic Gasc

Aug 25, 2014, 6:06:44 PM
to python...@googlegroups.com
Hi,

I'm looking for the best way to read/write files with AsyncIO.
Ideally, I want to read and write asynchronously, as with network I/O.


Is it OK, or do you have a better suggestion?

Regards.

Guido van Rossum

Aug 25, 2014, 6:55:57 PM
to Ludovic Gasc, python-tulip
On most OSes, select() and other polling APIs always report disk files to be "ready", so you basically can't use asyncio with them. On Windows it will fail; on *n*x it will appear to work but actually you are doing the whole thing synchronously. The only way to overlap disk I/O with asyncio events would be to do the disk I/O on a separate thread.
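
The "always ready" behaviour can be observed directly. Below is a minimal sketch, assuming a POSIX system (on Windows, select() accepts only sockets). The helper name file_is_reported_ready is made up for illustration and is not part of asyncio.

```python
import os
import select
import tempfile


def file_is_reported_ready(path):
    """Return True if select() reports the regular file at `path` as
    readable without waiting (hypothetical helper, POSIX only)."""
    with open(path, "rb") as f:
        # A timeout of 0 makes select() poll and return immediately.
        readable, _, _ = select.select([f], [], [], 0)
        return bool(readable)


if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        print(file_is_reported_ready(path))
    finally:
        os.remove(path)
```

Because the file is reported ready regardless of whether the data is cached or still on disk, an event loop built on select() cannot tell when a read would actually block.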

Someone is working on sendfile support (https://code.google.com/p/tulip/issues/detail?id=144, http://bugs.python.org/issue17552) which would help overlap disk I/O and socket I/O for the specific case of serving a file from disk directly to a socket, but even that would be pretty limited.
--Guido van Rossum (python.org/~guido)

Glyph

Aug 26, 2014, 10:11:51 PM
to Guido van Rossum, Ludovic Gasc, python-tulip
If anyone is curious about the abysmal state of asynchronous file I/O in popular operating systems, this question I asked on Stack Overflow a while back has got some really excellent answers on it: <https://stackoverflow.com/questions/87892/what-is-the-status-of-posix-asynchronous-i-o-aio>.

-glyph

Luciano Ramalho

Jan 13, 2015, 9:54:49 AM
to python...@googlegroups.com, gu...@python.org, gml...@gmail.com
Reviving the thread... if I understand correctly, there is no portable way to do disk I/O asynchronously (and the gist [1] provided by the OP is bogus: the read_data function will block the event loop). Is my understanding correct?

[1] https://gist.github.com/kunev/f83146d407c81a2d64a6

Second question: Node.js does have a complete async API for doing filesystem I/O, even an async stat function! But after reading Glyph's Q&A on Stack Overflow I conclude those filesystem I/O functions in Node must rely on threads underneath, since we can't rely on OS APIs for async filesystem I/O. Is that it?

Thanks!

Best,

Luciano

Andrew Svetlov

Jan 13, 2015, 10:29:26 AM
to Luciano Ramalho, python...@googlegroups.com, Guido van Rossum, Ludovic Gasc
1. Yes, the file functions from your gist do block the event loop.
2. Yes, a portable nonblocking file API should be built on threads. Or,
as an option, you can write *actually blocking* code with a *nonblocking
interface* if you like coroutines so much.
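
To illustrate the second option, here is a minimal sketch of blocking code behind a coroutine interface. It uses the async/await syntax added in Python 3.5, which postdates this thread. The name read_file_blocking is hypothetical.

```python
import asyncio
import os
import tempfile


async def read_file_blocking(path):
    """Hypothetical helper: awaitable interface, blocking body.

    The open()/read() calls run on the event loop thread, so no other
    task can make progress until they return.
    """
    with open(path, "rb") as f:
        return f.read()


async def main():
    # Create a small temporary file so the example is self-contained.
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(b"hello")
        return await read_file_blocking(path)
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Callers can await this like any other coroutine, which keeps the calling code uniform, but the loop is still stalled for the duration of the disk access.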
--
Thanks,
Andrew Svetlov

Saúl Ibarra Corretgé

Jan 13, 2015, 10:32:04 AM
to python...@googlegroups.com
On 01/13/2015 03:54 PM, Luciano Ramalho wrote:
> Reviving the thread... if I understand correctly, there is no portable
> way to do disk I/O asynchronously (and the gist [1] provided by the OP
> is bogus: the read_data function will block the event loop). Is my
> understanding correct?
>
> [1] https://gist.github.com/kunev/f83146d407c81a2d64a6
>

Yes, that will block the event loop.

> Second question: Node.js does have a complete async API for doing
> filesystem I/O, even an async stat function! But after reading Glyph's
> Q&A on Stack Overflow I conclude those filesystem I/O functions in Node
> must rely on threads underneath, since we can't rely on OS APIs for
> async filesystem I/O. Is that it?
>

libuv (Node's platform layer) core dev here. That's right: all filesystem
operations are run in a thread pool, much like asyncio runs getaddrinfo
in a ThreadPoolExecutor.

Here is an interesting read:
http://blog.libtorrent.org/2012/10/asynchronous-disk-io/


Cheers,

--
Saúl Ibarra Corretgé
bettercallsaghul.com



Victor Stinner

Jan 13, 2015, 11:07:48 AM
to Saúl Ibarra Corretgé, python-tulip
2015-01-13 16:24 GMT+01:00 Saúl Ibarra Corretgé <sag...@gmail.com>:
> libuv (Node's platform layer) core dev here. That's right: all filesystem
> operations are run in a thread pool, much like asyncio runs getaddrinfo
> in a ThreadPoolExecutor.
>
> Here is an interesting read:
> http://blog.libtorrent.org/2012/10/asynchronous-disk-io/

Thanks for the link. I added this information to the ThirdParty wiki page:
https://code.google.com/p/tulip/wiki/ThirdParty#Filesystem

I don't understand the status on Windows when the ProactorEventLoop is
used: are filesystem operations blocking or not?

Latest discussion on the Linux kernel: http://lwn.net/Articles/612483/

Victor

Guido van Rossum

Jan 13, 2015, 11:34:44 AM
to Victor Stinner, Saúl Ibarra Corretgé, python-tulip
Interesting blog. The gist seems to be to use a thread pool for disk operations instead of AIO. This translates fairly directly to Tulip/Trollius, using the run_in_executor() operation. It's pretty clear to me that Windows IOCP does support async operations on disk files, so it would behoove us to design a standard API for async disk operations that can be implemented either using IOCP or using a thread pool.
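
The thread-pool approach can be sketched with run_in_executor() as follows. This is only an illustration, not a proposed standard API: read_file, write_file and the underscore-prefixed helpers are hypothetical names, and the code uses the async/await syntax and asyncio.run() from later Python versions (3.7+) rather than the yield from style of the time.

```python
import asyncio
import os
import tempfile


def _read_bytes(path):
    # Ordinary blocking read; runs in a worker thread.
    with open(path, "rb") as f:
        return f.read()


def _write_bytes(path, data):
    # Ordinary blocking write; returns the number of bytes written.
    with open(path, "wb") as f:
        return f.write(data)


async def read_file(path):
    """Hypothetical helper: read a whole file without blocking the loop."""
    loop = asyncio.get_running_loop()
    # Passing None selects the loop's default ThreadPoolExecutor.
    return await loop.run_in_executor(None, _read_bytes, path)


async def write_file(path, data):
    """Hypothetical helper: write bytes without blocking the loop."""
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, _write_bytes, path, data)


async def demo():
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        written = await write_file(path, b"tulip")
        data = await read_file(path)
        return written, data
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

While a worker thread is inside open(), read() or write(), the event loop stays free to service sockets and timers. An IOCP-based backend could in principle sit behind the same coroutine signatures.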

Saúl Ibarra Corretgé

Jan 13, 2015, 11:40:08 AM
to gu...@python.org, Victor Stinner, python-tulip
On 01/13/2015 05:34 PM, Guido van Rossum wrote:
> Interesting blog. The gist seems to be to use a thread pool for disk
> operations instead of AIO. This translates fairly directly to
> Tulip/Trollius, using the run_in_executor() operation. It's pretty clear
> to me that Windows IOCP does support async operations on disk files, so
> it would behoove us to design a standard API for async disk operations
> that can be implemented either using IOCP or using a thread pool.
>

I can ask our resident Windows expert, but IIRC there was some problem
on Windows too. Something along the lines of async operations falling
back to being sync in certain conditions...