Processing a file...


Amirouche B.

Feb 12, 2009, 6:55:44 PM
to Django users
Hello,

I'd like to process a file before saving it. Let's say it's an
audio file that I want converted to Ogg before it lands in the
database forever.

I tried to run ffmpeg2theora in the save method of the model, but this
doesn't work very well: the test server hangs up after the processing is
done.

## FILE: models.py ##

class Song(models.Model):
    [...]

    def save(self):
        super(Song, self).save()
        # Split the stored path into directory and file name.
        directory = os.path.dirname(self.file.path)
        filename = os.path.basename(self.file.path)
        os.chdir(directory)
        res = os.popen("/usr/bin/ffmpeg2theora " + filename)

Any help?

Jacob Rigby

Feb 13, 2009, 4:42:52 AM
to Django users
The test server is single-threaded (assuming you're using
django-admin.py runserver), so it will appear to hang until the os.popen
call completes. If it's really crashing, then something horrible must
be going wrong in the child process.

In a production setting, where you're using a multiprocess or
multithreaded server like lighttpd+fastcgi or apache+mod_python/mod_wsgi,
you could almost get this code to work as is, but it's not really
a best practice. Chances are the server would decide your script is
taking too long and give up, presenting the user with an error page after
a long wait.

The simplest option is to save the unprocessed file to disk, then have
a background job process the file and stick it in the database.
Basically just don't do everything in the web process.
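
The background-job idea can be sketched as a cron-friendly scan of the
upload directory for audio files that don't yet have an .ogg counterpart
(the directory layout and extension list here are assumptions, not
anything from the original code):

```python
import os

def pending_files(directory, extensions=(".mp3", ".wav")):
    """Return paths of audio files in `directory` with no .ogg sibling yet."""
    pending = []
    for name in sorted(os.listdir(directory)):
        base, ext = os.path.splitext(name)
        # A file needs converting if it has an audio extension and no
        # matching .ogg output exists alongside it.
        if ext.lower() in extensions and not os.path.exists(
                os.path.join(directory, base + ".ogg")):
            pending.append(os.path.join(directory, name))
    return pending
```

A cron job could then loop over `pending_files(...)` and run ffmpeg2theora
on each entry, entirely outside the web process.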

You could use cron to schedule the background process, or you could get
fancy and implement a daemon that listens for work requests via a queue
(perhaps using the multiprocessing module).
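
The queue-fed daemon could look something like this; a minimal sketch in
which `convert` is injectable so the loop itself doesn't depend on
ffmpeg2theora being installed (all names are illustrative):

```python
import subprocess
from multiprocessing import Process, Queue

def convert_with_ffmpeg2theora(path):
    # Shell out to the converter; blocks until the encode finishes.
    subprocess.call(["/usr/bin/ffmpeg2theora", path])

def worker(queue, convert=convert_with_ffmpeg2theora):
    """Pull file paths off the queue and convert them, one at a time."""
    while True:
        path = queue.get()
        if path is None:  # sentinel value: shut the worker down
            break
        convert(path)

# In the web process you would start the worker once, then just queue work:
#   q = Queue()
#   Process(target=worker, args=(q,), daemon=True).start()
#   ...and in Song.save():  q.put(self.file.path)
```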

There might be a way to use filesystem notification events to
automatically run a script when a file is added to a directory (MacOS
definitely has something like this via launchd).

The catch with all this multiprocess jazz is that you might still need
a way to tell your user what you've done. Since HTTP works on a pull
model, you can't exactly push them a message once processing is
done (you could use email, tweets, RSS, or tricky Comet stuff). A simple
workaround is to either meta-refresh the page or use AJAX to poll for
processing completion. You might also just neglect to tell the user
about the encoding step and only let them know when the file has finished
uploading.
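
The AJAX-polling workaround only needs a tiny endpoint in the web
process. A sketch, where `get_status` stands in for however you look the
status up (a database column, a marker file, etc. — the names are made
up for illustration):

```python
import json

def status_payload(song_id, get_status):
    """Build the JSON body a polling page would fetch for one song."""
    status = get_status(song_id)  # e.g. "pending", "processing", "done"
    return json.dumps({
        "id": song_id,
        "status": status,
        "done": status == "done",
    })
```

The page's JavaScript would poll this endpoint every few seconds and stop
once `done` comes back true.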

The easiest way to communicate between the background process and the
web process is to use a plain old file in a known location, or to use
the database (best option IMHO) or a shared queue (why bother?).
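
The database option amounts to a status column that the worker writes and
the web process reads. A sketch using raw SQL (the table and column names
are assumptions for illustration):

```python
import sqlite3

def mark_done(conn, song_id):
    # Called by the background worker once the conversion finishes.
    conn.execute("UPDATE songs SET status = 'done' WHERE id = ?", (song_id,))
    conn.commit()

def is_done(conn, song_id):
    # Called by the web process when rendering the page or answering a poll.
    row = conn.execute(
        "SELECT status FROM songs WHERE id = ?", (song_id,)).fetchone()
    return row is not None and row[0] == "done"
```

In Django you'd more likely put a `status` field on the Song model and use
the ORM, but the principle is the same: the row is the message channel.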


-Jacob

Graham Dumpleton

Feb 13, 2009, 6:44:21 AM
to Django users


On Feb 13, 8:42 pm, Jacob Rigby <rigbyja...@gmail.com> wrote:
> In a production setting, where you're using a multiprocess or
> multithreaded server like lighttpd+fastcgi or apache+mod_python/mod_wsgi
> you'd almost get this code to work as is, but it's not really
> a best practice.  Chances are the server would decide your script is
> taking too long and give up, presenting the user an error page after a
> long wait.

Only fastcgi would generally time out the request and do that.

In mod_wsgi daemon mode there is an inactivity timeout you can
optionally set, but it would only trigger if the hung request were the
only request against that daemon process. This is because it looks at
the activity of the process as a whole, not a single request. Thus, if
the process is multithreaded and other requests are still being handled
by it, the inactivity timeout wouldn't trigger.

The inactivity timeout in mod_wsgi daemon mode serves two purposes.
The first is that if the process has not received any requests for a
while, it will restart, thereby dumping the loaded instance and
returning the process to its base size. For infrequently used sites this
is a memory-saving measure. The second is that if for some reason all
threads lock up and aren't consuming input or producing output, it acts
as a failsafe to unstick the process without manual intervention.
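
That timeout is set as an option on the WSGIDaemonProcess directive. A
configuration sketch (the process group name and numbers are placeholders):

```apache
# httpd.conf fragment -- restart the daemon after 300s of total inactivity
WSGIDaemonProcess mysite processes=2 threads=15 inactivity-timeout=300
WSGIProcessGroup mysite
```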

I would concur that pushing this to a separate process as a
background task would be a better idea. That way you aren't consuming
a web server thread, which in the case of something like Apache is a
limited resource.

Graham

Amirouche B.

Feb 13, 2009, 9:17:52 AM
to Django users
Thank you for your answers and the insight into how the servers work.

Using a background process is kind of a burden for deployment,
but let's do it the way that works :D