You can't really do any long lived processing in a view. Threading
won't really solve the problem, for the exact reason you have
identified.
The solution here is to use a separate worker process. Your view
creates 'jobs' on a queue describing what needs to be FTP'd and from
where. A completely separate process then polls the queue and performs
the actual downloading. The queue processor is completely decoupled
from the view and doesn't prevent the view from returning quickly. One
simple approach for implementing the queue is to use cron to perform
regular polling; there are many other possible solutions, especially
if you're sensitive to the latency that cron would introduce.
If you want to get extra fancy and you want to provide download
feedback, you can use a second, AJAX-style view to monitor the state
of the queue. This lets the user know what progress has been made on
their download without locking the webserver.
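A minimal sketch of such a status view -- assuming the queue lives in
a table with `id` and `state` columns; both names are illustrative --
would just read the job's state and return it as JSON for the AJAX
poller:

```python
# Hypothetical status endpoint: reads the job's state from the queue
# table and never touches the download itself.
import json
import sqlite3

def job_status(conn, job_id):
    row = conn.execute(
        "SELECT state FROM jobs WHERE id = ?", (job_id,)
    ).fetchone()
    state = row[0] if row else "unknown"
    return json.dumps({"id": job_id, "state": state})
```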
Yours,
Russ Magee %-)
Ok, then. How would you do it in PHP? Rails? Any other web framework?
The limitation here isn't Django per se. It is a fundamental design
constraint of the web itself. HTTP essentially requires that all
requests can be satisfied very quickly. This pretty much eliminates
the possibility of having long-lived processing as part of request
handling.
Strictly speaking, this isn't even a limitation of web applications.
Regardless of the programming paradigm, you shouldn't arbitrarily
start a long lived processing task. In order to give good UI feedback,
you should always start a long lived task in the background, and use
the foreground processing to monitor the status of the background
task. Web frameworks impose certain unique restrictions on the way
this pattern is realized, but the base requirements are fundamentally
the same as any other programming paradigm.
Yours,
Russ Magee %-)
From the "I've done it without a hitch" department,
Jeff Anderson
And you can do that in Python, too. It's not a particularly robust way
of operating, though, since you either have to do the full "daemonize"
dance, or else the long-running process blocks the parent process from
being killed off (and something like Apache regularly recycles child
processes). Error handling also becomes problematic, since there's
nothing to really report the error *to*.
Fork, exec, spawn ... they all exist in Python. Use them if you wish,
but understand the drawbacks.
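For completeness, the spawn approach looks roughly like this -- a
sketch only, and it inherits exactly the drawbacks described above: the
child is detached so nothing collects its errors or exit status.

```python
# "Spawn and forget": the request handler launches a separate
# interpreter for the long-running job.
import subprocess
import sys

def spawn_worker(script_path, *args):
    # start_new_session detaches the child from the parent's process
    # group, so the webserver recycling the parent doesn't kill the
    # transfer -- but nothing is left to report errors *to*.
    return subprocess.Popen(
        [sys.executable, script_path, *args],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        start_new_session=True,
    )
```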
The approach Russell originally pointed out, of inserting things into a
queue that are picked up by another periodic process, is similar to the
fork-ing approach, except it separates the process lifecycles and makes
things generally a bit easier to manage, both at a system-administration
level and when debugging. If a process hangs, you can stop it and restart it.
If you need to temporarily halt the process, say, because the remote
system you're talking to is down, you can restart it again later and it
will just go back to processing queue entries (which won't have been
lost in the interim). In general, asynchronous processing of
long-running items tends to scale a lot better than the inline forking
approach. So it's a pattern worth implementing.
Regards,
Malcolm