About big file upload and download

Cong Wang

Aug 16, 2016, 8:43:57 AM
to Tornado Web Server
Hi,
I am new to the Tornado framework.

I am working on a task where the client needs to upload a directory (or the files inside it) from its home directory to the server, and also download a directory (or the files inside it) from the server. All the data should be transferred in binary.

I have looked into some of the APIs Tornado provides for file transfer, and I found that Tornado puts all the data in memory before transferring it, which is a problem for what I am trying to do.

Can someone help me out here? Which API should I use for this? And if anyone has experience doing this, could you share some code with me?

Thanks a loooooot!


Best regards

Hao Weibo

Aug 16, 2016, 8:55:45 AM
to python-...@googlegroups.com
Can you speak Chinese?

--
You received this message because you are subscribed to the Google Groups "Tornado Web Server" group.
To unsubscribe from this group and stop receiving emails from it, send an email to python-tornado+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Cong Wang

Aug 17, 2016, 2:10:19 AM
to python-...@googlegroups.com
Hey, can anyone help with this question?

Kevin LaTona

Aug 17, 2016, 5:19:52 AM
to python-tornado@googlegroups.com Server

If all you’re doing is writing simple uploader/downloader calls and nothing else, sure, Tornado can do this either in its async mode (it’s going to block the main thread, but who cares if that’s all it’s doing) or by using the ThreadPoolExecutor to pass the work off to a thread pool.

There are numerous other thread-pool-based Python web servers and frameworks, like CherryPy, Flask, Django, Pyramid, WebPy and the many others kicking about, that could make more sense than Tornado in this situation.

Just so you know, this list is pretty focused and tends to lend a hand with existing code questions to get a person unstuck, rather than writing code for them.

GitHub is filled with example Tornado repos and lots of Gists to look at for ideas.

Maybe someone else will add to this conversation, but the above are my suggestions based on what you have stated so far.

-Kevin

Ben Darnell

Aug 18, 2016, 11:40:48 PM
to python-...@googlegroups.com
On Wed, Aug 17, 2016 at 5:19 AM Kevin LaTona <li...@studiosola.com> wrote:

> If all you’re doing is writing simple uploader/downloader calls and nothing else, sure, Tornado can do this either in its async mode (it’s going to block the main thread, but who cares if that’s all it’s doing) or by using the ThreadPoolExecutor to pass the work off to a thread pool.

Tornado is very well suited to this problem because of the `@stream_request_body` decorator (and `yield self.flush()` for downloads). It's true that when writing to local disk you can't do any better than a thread pool, but the network transfer matters too and that's where Tornado's async model helps. 


-Ben
 

Kevin LaTona

Aug 19, 2016, 1:28:24 AM
to python-tornado@googlegroups.com Server

Hmm, so does the Ethernet packet size of 1500 to 9000 bytes come into play here, giving other I/O connections a chance to break in during a call like this on the async thread?

I’ve always handed things like a big file upload off to a thread pool just to play it safe, and let the pool scheduler deal with any possible blocking issues with the upload.

Maybe I missed it, but I didn’t see where in any of this code there were yield points that let other calls break in and keep a 10 GB file upload from hogging the whole server.

-Kevin

Ben Darnell

Aug 19, 2016, 9:25:35 AM
to python-...@googlegroups.com
On Fri, Aug 19, 2016 at 1:28 AM Kevin LaTona <li...@studiosola.com> wrote:

> Hmm, so does the Ethernet packet size of 1500 to 9000 bytes come into play here, giving other I/O connections a chance to break in during a call like this on the async thread?

It has more to do with the socket receive buffer size (SO_RCVBUF) than with the packet size.
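As a rough illustration of this point, assuming a Linux or macOS host: the receive buffer is a per-socket kernel setting that can be read with `getsockopt`. A single `recv()` can return whatever has accumulated in that buffer (up to the size requested), not just one Ethernet frame, so the buffer size, rather than the packet size, bounds how much data is handled before control goes back to the event loop. `receive_buffer_size` is a hypothetical helper name.

```python
import socket


def receive_buffer_size(sock):
    """Return the socket's kernel receive buffer size (SO_RCVBUF) in bytes."""
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)


with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    print("default SO_RCVBUF:", receive_buffer_size(s))
    # Request a 256 KB buffer. The kernel may adjust the value: Linux, for
    # example, caps it at net.core.rmem_max and then doubles it to leave
    # room for bookkeeping overhead.
    s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 256 * 1024)
    print("after setsockopt:", receive_buffer_size(s))
```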
 

> I’ve always handed things like a big file upload off to a thread pool just to play it safe, and let the pool scheduler deal with any possible blocking issues with the upload.

> Maybe I missed it, but I didn’t see where in any of this code there were yield points that let other calls break in and keep a 10 GB file upload from hogging the whole server.

If your data_received method is a coroutine, Tornado will throttle the upload to match the speed at which you are able to process the data. In practice, this is typically enough to keep one upload from monopolizing the server:

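    async def data_received(self, chunk):
        # threadpool: a concurrent.futures.ThreadPoolExecutor; IOLoop: tornado.ioloop.IOLoop.
        # Tornado waits for this coroutine (the disk write) before reading the next chunk.
        await IOLoop.current().run_in_executor(threadpool, write_chunk_to_disk, chunk)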

You could throw in an `await gen.sleep(delay)` if you want to artificially limit the upload speed, which might allow for fairer sharing of resources.
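Putting the two suggestions together, a sketch assuming Tornado 5+: the blocking write runs on a `ThreadPoolExecutor` through `IOLoop.run_in_executor`, and `gen.sleep` pauses the coroutine whenever the upload gets ahead of a target rate. `RATE_LIMIT`, `throttle_delay`, `write_chunk_to_disk` and the `upload.bin` destination are hypothetical. Because Tornado waits for each `data_received` call to finish before reading more, the sleep slows the client down through TCP flow control instead of buffering data in memory.

```python
import time
from concurrent.futures import ThreadPoolExecutor

from tornado import gen, web
from tornado.ioloop import IOLoop

RATE_LIMIT = 5 * 1024 * 1024  # hypothetical cap: 5 MB/s per upload
threadpool = ThreadPoolExecutor(max_workers=4)


def write_chunk_to_disk(f, chunk):
    # Hypothetical blocking helper; runs on a worker thread.
    f.write(chunk)


def throttle_delay(received, elapsed, rate):
    """Seconds to wait so that received / elapsed does not exceed rate."""
    return max(0.0, received / rate - elapsed)


@web.stream_request_body
class ThrottledUploadHandler(web.RequestHandler):
    def prepare(self):
        self.file = open("upload.bin", "wb")  # hypothetical destination
        self.received = 0
        self.started = time.monotonic()

    async def data_received(self, chunk):
        # Tornado reads the next chunk only after this coroutine returns, and
        # each write is awaited, so chunks reach the file in order.
        await IOLoop.current().run_in_executor(
            threadpool, write_chunk_to_disk, self.file, chunk)
        self.received += len(chunk)
        delay = throttle_delay(
            self.received, time.monotonic() - self.started, RATE_LIMIT)
        if delay > 0:
            await gen.sleep(delay)  # other connections run meanwhile

    def put(self):
        self.file.close()
        self.write({"received": self.received})
```

It would be mounted like any other handler, e.g. `(r"/upload", ThrottledUploadHandler)` in the `Application` route list.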

-Ben
 

Kevin LaTona

Aug 19, 2016, 9:40:20 AM
to python-tornado@googlegroups.com Server

One problem I’ve never fully resolved to a comfortable level with Python async calls is how to get my code to step back out of the IOLoop, give some of my other calls time to do things as well, and then step back into the IOLoop.

This gen.sleep idea looks like a very simple, elegant way to throttle things at times like that, and I’ve just flat out overlooked it in cases like this.
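For the "step out and step back in" problem, a zero-length sleep is the usual idiom. A small sketch, assuming Tornado 5+, where the IOLoop runs on asyncio, so `await asyncio.sleep(0)` in a native coroutine gives up control for one loop iteration much like `await gen.sleep(0)` (or `yield gen.moment` in older decorated coroutines). `worker` and `run` are hypothetical names; the two printed lists show the interleaving without and with the yield.

```python
import asyncio


async def worker(name, steps, log, cooperative):
    for i in range(steps):
        log.append(f"{name}{i}")  # stand-in for a slice of real work
        if cooperative:
            # Hand control back to the event loop so other ready tasks get a
            # turn; this task resumes on a later loop iteration.
            await asyncio.sleep(0)


async def run(cooperative):
    log = []
    await asyncio.gather(worker("a", 3, log, cooperative),
                         worker("b", 3, log, cooperative))
    return log


print(asyncio.run(run(False)))  # ['a0', 'a1', 'a2', 'b0', 'b1', 'b2']
print(asyncio.run(run(True)))   # ['a0', 'b0', 'a1', 'b1', 'a2', 'b2']
```

Without the yield, task `a` runs to completion before `b` starts; with it, the two alternate, which is the same effect a `gen.sleep` inside `data_received` has on concurrent uploads.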

Thanks for the suggestion… much appreciated.

-Kevin