FTP'ing without locking

Greg Taylor

Dec 17, 2008, 4:15:43 PM
to Django users
This is somewhat of a core Python question with a Django twist. I'm
running mod_wsgi and am trying to figure out how to FTP a file from my
Django app to a remote host without locking the thread up. I've tried
something like:

from subprocess import Popen
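# fires off a child process; the view returns, but nothing here waits on or monitors the child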
print Popen(["python", command_str, "53363", "1"]).pid

I'm sure there's a much better way to do what I'm trying to do. I
thought about threading it off, but wouldn't the WSGI process have to
stick around for the thread to return?

Russell Keith-Magee

Dec 17, 2008, 5:48:35 PM
to django...@googlegroups.com

You can't really do any long-lived processing in a view. Threading
won't really solve the problem, for the exact reason you have
identified.

The solution here is to use a separate worker process. Your view
creates 'jobs' on a queue describing what needs to be FTP'd and to
where. A completely separate process then polls the queue and performs
the actual transfer. The queue processor is completely decoupled
from the view and doesn't prevent the view from returning quickly. One
simple approach for implementing the queue is to use cron to perform
regular polling; there are many other possible solutions, especially
if you're sensitive to the latency that cron would introduce.
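
A minimal sketch of the queue-and-cron pattern, assuming a hypothetical
FtpJob model and a worker script run from cron; every name, path and
host here is illustrative:

# models.py -- a hypothetical job table the view writes to
from django.db import models

class FtpJob(models.Model):
    PENDING, DONE, FAILED = 'P', 'D', 'F'
    local_path = models.CharField(max_length=255)
    remote_host = models.CharField(max_length=255)
    remote_path = models.CharField(max_length=255)
    status = models.CharField(max_length=1, default=PENDING)

# views.py -- enqueue the work and return immediately
from django.http import HttpResponse
from myapp.models import FtpJob

def start_upload(request):
    job = FtpJob.objects.create(local_path='/tmp/report.csv',
                                remote_host='ftp.example.com',
                                remote_path='/incoming/report.csv')
    return HttpResponse('queued job %d' % job.id)

# worker.py -- run from cron (with DJANGO_SETTINGS_MODULE set),
# e.g. "* * * * * python worker.py"
import ftplib
from myapp.models import FtpJob

def process_pending():
    for job in FtpJob.objects.filter(status=FtpJob.PENDING):
        try:
            ftp = ftplib.FTP(job.remote_host)
            ftp.login()  # anonymous; pass a user/password as needed
            f = open(job.local_path, 'rb')
            ftp.storbinary('STOR %s' % job.remote_path, f)
            f.close()
            ftp.quit()
            job.status = FtpJob.DONE
        except Exception:
            job.status = FtpJob.FAILED
        job.save()

if __name__ == '__main__':
    process_pending()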

If you want to get extra fancy and provide transfer feedback, you can
use a second, AJAX-style view to monitor the state of the queue. This
lets the user know what progress has been made on their transfer
without tying up the web server.
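
The monitoring view can then be tiny (again, FtpJob and the URL wiring
are the hypothetical pieces from the sketch above):

# views.py -- polled by the browser via XMLHttpRequest
from django.http import HttpResponse
from django.utils import simplejson  # bundled with Django at the time
from myapp.models import FtpJob

def job_status(request, job_id):
    job = FtpJob.objects.get(pk=job_id)
    return HttpResponse(simplejson.dumps({'status': job.status}),
                        mimetype='application/json')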

Yours,
Russ Magee %-)

Greg Taylor

Dec 17, 2008, 5:55:55 PM
to Django users
Yeah, I was afraid this would be the case. The interval polling script
was something I really wanted to avoid.

I can't believe this isn't possible, though. I assume this is a Django
limitation of some sort?

Russell Keith-Magee

Dec 17, 2008, 6:44:24 PM
to django...@googlegroups.com
On Thu, Dec 18, 2008 at 7:55 AM, Greg Taylor <squish...@gmail.com> wrote:
>
> Yeah, I was afraid this would be the case. The interval polling script
> was something I really wanted to avoid.
>
> I can't believe this isn't possible, though. I assume this is a Django
> limitation of some sort?

Ok, then. How would you do it in PHP? Rails? Any other web framework?

The limitation here isn't Django per se. It is a fundamental design
constraint of the web itself. HTTP essentially requires that all
requests can be satisfied very quickly. This pretty much eliminates
the possibility of having long-lived processing as part of request
handling.

Strictly speaking, this isn't even a limitation of web applications.
Regardless of the programming paradigm, you shouldn't arbitrarily
start a long-lived processing task. In order to give good UI feedback,
you should always start a long-lived task in the background, and use
the foreground processing to monitor the status of the background
task. Web frameworks impose certain unique restrictions on the way
this pattern is realized, but the base requirements are fundamentally
the same as in any other programming paradigm.

Yours,
Russ Magee %-)

Greg Taylor

Dec 17, 2008, 7:10:26 PM
to Django users
I understand what you're saying, but I could've sworn I've seen Perl
CGI scripts that forked something off from the web process that didn't
hang the client up. Maybe I'm completely imagining that though (which
is a distinct possibility).

Jeff Anderson

Dec 17, 2008, 7:27:57 PM
to django...@googlegroups.com
Russell Keith-Magee wrote:
> On Thu, Dec 18, 2008 at 6:15 AM, Greg Taylor <squish...@gmail.com> wrote:
>
>> This is somewhat of a core Python question with a Django twist. I'm
>> running mod_wsgi and am trying to figure out how to FTP a file from my
>> Django app to a remote host without locking the thread up. I've tried
>> something like:
>>
>> from subprocess import Popen
>> print Popen(["python", command_str, "53363", "1"]).pid
>>
>> I'm sure there's a much better way to do what I'm trying to. I thought
>> about threading it off, but wouldn't the wsgi process have to stick
>> around for the thread to return?
>>
>
> You can't really do any long lived processing in a view. Threading
> won't really solve the problem, for the exact reason you have
> identified.
>
> The solution here is to use a separate worker process.
I was playing around with doing something similar, but instead of an
independent worker process, I did use the Python thread library. I just
spawned off a thread to do its thing from the view, and then returned
the HttpResponse object. I never put this into production; I was
mostly just playing around with it, and I only tested it in the Django
development server. It worked quite well. I had a separate view that
answered AJAX requests about the status of the process. Since the
Django dev server didn't hang, I don't imagine having a persistent
Python thread hanging around processing for a minute or two would make
the web server hang in a mod_python or mod_wsgi environment, but I
didn't try it. Why would this be the case?
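
Roughly, what I did looks like this (a sketch from memory; the names,
host and paths are illustrative, and the module-level STATUS dict only
survives as long as that particular server process does):

# views.py -- spawn a daemon thread and return at once
import threading
import ftplib
from django.http import HttpResponse

STATUS = {}  # in-memory only; lost if the process is recycled

def _do_upload(job_id, host, local_path, remote_path):
    STATUS[job_id] = 'running'
    try:
        ftp = ftplib.FTP(host)
        ftp.login()
        f = open(local_path, 'rb')
        ftp.storbinary('STOR %s' % remote_path, f)
        f.close()
        ftp.quit()
        STATUS[job_id] = 'done'
    except Exception:
        STATUS[job_id] = 'failed'

def start_upload(request):
    t = threading.Thread(target=_do_upload,
                         args=('job-1', 'ftp.example.com',
                               '/tmp/report.csv', '/incoming/report.csv'))
    t.setDaemon(True)  # don't block interpreter shutdown
    t.start()
    return HttpResponse('started')

def status(request):
    return HttpResponse(STATUS.get('job-1', 'unknown'))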


From the "I've done it without a hitch" department,

Jeff Anderson

Malcolm Tredinnick

Dec 17, 2008, 7:28:54 PM
to django...@googlegroups.com

On Wed, 2008-12-17 at 16:10 -0800, Greg Taylor wrote:
> I understand what you're saying, but I could've sworn I've seen Perl
> CGI scripts that forked something off from the web process that didn't
> hang the client up. Maybe I'm completely imagining that though (which
> is a distinct possibility).

And you can do that in Python, too. It's not a particularly robust way
of operating, though, since you either have to do the full "daemonize"
dance, or else the long-running process blocks the parent process from
being killed off (and something like Apache regularly recycles child
processes). Error handling also becomes problematic, since there's
nothing to really report the error *to*.

Fork, exec, spawn ... they all exist in Python. Use them if you wish,
but understand the drawbacks.
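
For reference, the "daemonize" dance is the classic UNIX double fork. A
bare-bones sketch, with all of the caveats above still applying (the
function and the do_the_ftp_transfer() call are illustrative):

import os

def daemonize():
    pid = os.fork()
    if pid > 0:
        os.waitpid(pid, 0)  # reap the short-lived first child
        return False        # parent (the web process) carries on
    os.setsid()             # new session: detach from the parent's terminal
    if os.fork() > 0:
        os._exit(0)         # first child exits straight away
    # release the web server's stdio descriptors
    devnull = os.open(os.devnull, os.O_RDWR)
    for fd in (0, 1, 2):
        os.dup2(devnull, fd)
    return True             # grandchild: fully detached

# in the view:
if daemonize():
    try:
        do_the_ftp_transfer()  # hypothetical long-running work
    finally:
        os._exit(0)            # the child must never return into Django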

The approach Russell originally pointed out, of inserting things into a
queue that are picked up by another periodic process, is similar to the
forking approach, except that it separates the process lifecycles and
makes things generally a bit easier to manage, both for system
administration and for debugging. If a process hangs, you can stop it
and restart it. If you need to temporarily halt the process, say,
because the remote system you're talking to is down, you can restart it
later and it will just go back to processing queue entries (which won't
have been lost in the interim). In general, asynchronous processing of
long-running items tends to scale a lot better than the inline forking
approach, so it's a pattern worth implementing.

Regards,
Malcolm
