Multipart data Decoder and asynchronous issues

Станислав Соколко

Oct 28, 2016, 6:19:01 AM
to python-pulsar
Hello!

I'm writing some code for large file uploads and I discovered the ability to pass a stream parameter when fetching data and files from a request (e.g. using request.data_and_files(data=False, files=True, stream=callme)).
I wonder why, in this async method https://github.com/s-sokolko/pulsar/blob/2036963c4031a9bb76450c72dbc26f5859cb74b8/pulsar/apps/wsgi/formdata.py#L166, feed_data and done are called synchronously.
I think this might freeze the application if the receiving stream is reading the data slowly. So I have a diff (https://github.com/s-sokolko/pulsar/commit/2036963c4031a9bb76450c72dbc26f5859cb74b8) to fix this, but I'm afraid it might influence some other parts of the framework (the feed_data and done methods are modified and made async in my version).

So what do you think of this diff and this problem? Maybe there are other parts of the project I should also modify together with my diff?


lsbardel

Oct 29, 2016, 4:54:49 PM
to python-pulsar
Hi,


On Friday, October 28, 2016 at 11:19:01 AM UTC+1, Станислав Соколко wrote:
Hello!

I'm writing some code for large file uploads and I discovered the ability to pass a stream parameter when fetching data and files from a request (e.g. using request.data_and_files(data=False, files=True, stream=callme)).

Yes, it has been around for a little while but is not documented yet :-(
 
I wonder why, in this async method https://github.com/s-sokolko/pulsar/blob/2036963c4031a9bb76450c72dbc26f5859cb74b8/pulsar/apps/wsgi/formdata.py#L166, feed_data and done are called synchronously.

Because neither method is asynchronous. feed_data passes data that has already been received to the parser, so it is not an IO operation. The done method simply checks whether parsing is finished.
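
To illustrate the point, here is a toy sketch of a parser with the same two method names. It is not pulsar's actual decoder: the ToyParser class and its terminator marker are made up for the example. It only shows that feeding bytes which are already in memory, and checking for completion, involve nothing to await.

```python
class ToyParser:
    """Hypothetical stand-in for an incremental parser (not pulsar's code)."""

    def __init__(self, terminator=b"--end--"):
        # The terminator is an invented marker, used only for this sketch.
        self.terminator = terminator
        self.buffer = bytearray()

    def feed_data(self, data):
        # Pure in-memory work: the bytes were already read off the socket.
        self.buffer.extend(data)

    def done(self):
        # Only inspects parser state, so there is nothing to wait for.
        return self.buffer.endswith(self.terminator)


parser = ToyParser()
parser.feed_data(b"some body ")
first_check = parser.done()   # terminator not seen yet
parser.feed_data(b"--end--")
second_check = parser.done()  # terminator now at the end of the buffer
```

Making methods like these coroutines would not move any waiting off the event loop, because they never wait in the first place.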
 
I think this might freeze the application if the receiving stream is reading the data slowly. So I have a diff (https://github.com/s-sokolko/pulsar/commit/2036963c4031a9bb76450c72dbc26f5859cb74b8) to fix this, but I'm afraid it might influence some other parts of the framework (the feed_data and done methods are modified and made async in my version).

No, it won't freeze the application. Neither method performs IO. However, it is important that the stream function you pass to request.data_and_files is a good citizen, i.e. it doesn't block. If you are doing IO operations on the stream of bytes the stream function receives, make sure to use tasks (coroutines running on the event loop) or some other async helpers.
If this is not clear, please share your use case and I'll try to assist.
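
As a rough sketch of what "good citizen" could look like with plain asyncio: the callback below only enqueues each chunk and returns immediately, while a separate task writes the chunks to disk in the default thread pool executor. The DiskWriter class, its close method, and the assumption that the stream function is called with one bytes chunk at a time are all hypothetical; check the actual arguments pulsar passes to stream before adapting this.

```python
import asyncio
import os
import tempfile


class DiskWriter:
    """Hypothetical helper: a synchronous callback plus a background writer task."""

    def __init__(self, path, loop):
        self.path = path
        self.loop = loop
        self.queue = asyncio.Queue()
        # The writer runs as a task on the event loop.
        self.task = loop.create_task(self._drain())

    def __call__(self, chunk):
        # Assumed callback shape: one bytes chunk per call.
        # Only enqueues, so the caller is never blocked on disk IO.
        self.queue.put_nowait(chunk)

    async def _drain(self):
        with open(self.path, "wb") as fh:
            while True:
                chunk = await self.queue.get()
                if chunk is None:  # sentinel pushed by close()
                    break
                # The blocking write happens in a worker thread.
                await self.loop.run_in_executor(None, fh.write, chunk)

    async def close(self):
        # Signal the end of the upload and wait for pending writes.
        self.queue.put_nowait(None)
        await self.task


async def main():
    loop = asyncio.get_running_loop()
    path = os.path.join(tempfile.mkdtemp(), "upload.bin")
    writer = DiskWriter(path, loop)
    for chunk in (b"abc", b"def", b"ghi"):
        writer(chunk)  # stands in for the framework calling stream
    await writer.close()
    with open(path, "rb") as fh:
        return fh.read()


result = asyncio.run(main())
```

Note that the queue here is unbounded, so this keeps the event loop responsive but does not apply backpressure: if the network is faster than the disk, chunks accumulate in memory. For very large uploads a bounded queue, or pausing reads from the transport, would be needed on top of this.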


Станислав Соколко

Oct 30, 2016, 2:46:15 PM
to python-pulsar
I'm implementing large file uploads, so the full contents of a file cannot be held in memory (e.g. I want to upload a 30 GB file to a server that has 2 GB of RAM). I made this diff because I want request.data_and_files not just to call stream, but to wait for the stream method to write the received data to disk. While it waits for the write operation to complete, I don't want the whole event loop to block. Maybe you can give me some advice on how this can be done?