proposal: abstract file upload/download handling

91 views
Skip to first unread message

Waldemar Kornewald

unread,
Jun 22, 2010, 2:55:30 AM6/22/10
to django-developers
Hi,
first I should say that I'm not 100% sure if this feature should be in
Django core, so I'm kind-of asking for you opinion.

Problem:
Currently, Django already has an API for file uploads, but it's not
useful for the new cloud-based file services which support the more
efficient asynchronous uploads (i.e., the file gets sent directly to
the file server instead of being piped through Django). Also, Django
doesn't provide an API for handling downloads in an abstract way
(e.g., via X-Sendfile or a cloud-based file service). This means that
you can't write reusable Django apps that handle file uploads and
downloads using your actual hosting solution. Also, you can't use
Django's admin interface for file uploads/downloads.

Proposal:
I've already implemented a really simple backend-based API which could
solve this problem. It's called django-filetransfers:
http://www.allbuttonspressed.com/projects/django-filetransfers

It basically has:

* a pre-upload handler which generates an upload URL for a single file
and additional POST data if that's needed (e.g., for async S3)

* a template tag which generates <input type="hidden" ... /> fields
for the additional POST data

* a download URL generator for public files (e.g., publicly accessible
files hosted on S3 or via MEDIA_URL or ...)

* a download view for private files (e.g., X-Sendfile, App Engine
Blobstore, private S3, ...)
The download view also serves as a fallback in case the backend
doesn't support public URLs.

With this solution you can continue to use ModelForm and FileField in
your project.

Normally, I'd just keep this API in a separate app (outside of Django
core), but without this API the admin interface won't work properly.
What do you think? Should Django consider including this (or a
similar) API in core, so there is a standard way to handle any kind of
upload/download solution? Or should the admin interface try to use
django-filetransfers if it's available (probably not; just thinking
aloud)?

Bye,
Waldemar Kornewald

Gert Van Gool

unread,
Jun 22, 2010, 3:00:34 AM6/22/10
to django-d...@googlegroups.com
What's wrong with the current StorageBackends? 
Apart from the fact that everything goes through Django first, which imho isn't a bad default solution (easier for parsing and the like...).

--
You received this message because you are subscribed to the Google Groups "Django developers" group.
To post to this group, send email to django-d...@googlegroups.com.
To unsubscribe from this group, send email to django-develop...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/django-developers?hl=en.


Waldemar Kornewald

unread,
Jun 22, 2010, 3:10:32 AM6/22/10
to django-d...@googlegroups.com
On Tue, Jun 22, 2010 at 9:00 AM, Gert Van Gool <gertv...@gmail.com> wrote:
> What's wrong with the current StorageBackends?
> Apart from the fact that everything goes through Django first, which imho
> isn't a bad default solution (easier for parsing and the like...).

Just to clarify: django-filetransfers is an addition to the storage
backends. IOW, you use both in combination.

The storage backends API has these problems:
1) It doesn't work with all storage solutions (e.g., App Engine Blobstore)
2) Pushing large files through Django actually is really bad and
depending on your configuration it can block a whole Django instance
for the duration of the upload (small files might be fine, but still
this eats resources unnecessarily). Even if you use multi-threading
with Django it'll block a thread for a long time, so if you have lots
of uploads going on your whole web server can become unresponsive.

Bye,
Waldemar Kornewald

Russell Keith-Magee

unread,
Jun 22, 2010, 8:40:43 AM6/22/10
to django-d...@googlegroups.com

FYI -- X-Sendfile was addressed in the WSGI-improvements SoC project
last year; merging the results of that SoC project into trunk is one
of the things on my todo list for 1.3.

> With this solution you can continue to use ModelForm and FileField in
> your project.
>
> Normally, I'd just keep this API in a separate app (outside of Django
> core), but without this API the admin interface won't work properly.
> What do you think? Should Django consider including this (or a
> similar) API in core, so there is a standard way to handle any kind of
> upload/download solution? Or should the admin interface try to use
> django-filetransfers if it's available (probably not; just thinking
> aloud)?

We're not going to modify Admin to be dependent on an external app, so
that approach is a non-starter.

However, if there's a hole in the storage backend that is preventing,
then that sounds like something that needs to be addressed. So -- the
short answer is that yes, this is something I'd be interested in
entertaining as a core contribution, with the caveat that I first need
to have a better understanding of exactly what it is you're proposing.

My initial impression of django-filetransfers is that you've
constructed a lot of very complex infrastructure for what is
ultimately a couple of very simple changes --
* header injections in some support views to allow for better serving
of files by Django itself (X-Sendfile, etc),
* some backend-sensitive decisions about the URL that is used to
serve content, and
* some custom handling of upload URLs

I wasn't able to find the source for your appengine and s3 backends,
so I don't have a good feel for exactly what is required on the upload
side.

It also strikes me that a lot of this is being configured at the
global level -- i.e., you have to nominate that your public upload
backend will be S3, rather than nominating that a specific file field
will be backed by S3.

I suppose my other query would be exactly what you are proposing for
inclusion into trunk. The API you have documented for
django-filestransfers strikes me as something that has far too many
moving parts; there are six different settings, a bunch of different
utility functions that need to be called at the right times, and a
couple of views that may need to be deployed.

I want to specify "This file field uses S3". The rest of the details
should be handled for me as long as I use the right accessors (such as
'url') for the properties of that field. I might accept the need to
deploy an extra named view to support downloads, but in the case of
admin, that view should be autodeployed based on the fact that I've
nominated a storage backend that needs a download support view.

I appreciate that some of the design decisions you have made in
django-filetransfers have been made because you're maintaining this as
an external project. What I want to know is what changes to core you
are proposing, and how that affects the code that you have already
published.

Yours,
Russ Magee %-)

Waldemar Kornewald

unread,
Jun 22, 2010, 2:58:06 PM6/22/10
to django-d...@googlegroups.com
On Tue, Jun 22, 2010 at 2:40 PM, Russell Keith-Magee
<rus...@keith-magee.com> wrote:
> On Tue, Jun 22, 2010 at 2:55 PM, Waldemar Kornewald
> <wkorn...@gmail.com> wrote:
> My initial impression of django-filetransfers is that you've
> constructed a lot of very complex infrastructure for what is
> ultimately a couple of very simple changes --
>  * header injections in some support views to allow for better serving
> of files by Django itself (X-Sendfile, etc),
>  * some backend-sensitive decisions about the URL that is used to
> serve content, and
>  * some custom handling of upload URLs

Yes, that's basically what it does. More specifically:
1) completely overriding the response for file serving (it can be
header injection or a redirect)
2) backend-sensitive decisions about only publicly accessible URLs
(and if that's not available: fallback to (1))
3) custom handling of upload URLs and upload POST data

The primary purpose of this API is to allow for writing reusable
Django apps which aren't hard-coded on a specific hosting
configuration.

At a high level, my proposal is to add an API which handles those
three tasks and fits the primary purpose. This doesn't have to be
accomplished like in django-filetransfers. I'd just like to make sure
that we both agree on the same high-level functionality and the
primary purpose, so we can have a focused discussion.

> I wasn't able to find the source for your appengine and s3 backends,
> so I don't have a good feel for exactly what is required on the upload
> side.

The App Engine backend is here:
http://bitbucket.org/wkornewald/djangoappengine/src/tip/storage.py
(the django-filetransfers part is just the first two functions:
prepare_upload() and serve_file())

For one of our projects we've also built an async S3 upload mechanism,
so we know what's needed to support it, but we haven't implemented a
backend for it, yet.

> It also strikes me that a lot of this is being configured at the
> global level -- i.e., you have to nominate that your public upload
> backend will be S3, rather than nominating that a specific file field
> will be backed by S3.

I'm repeating myself here, but anyway: The primary purpose of this API
is to allow for writing reusable Django apps that automatically work
with your project's file handling solution(s) without hard-coding
anything. This requires that you specify the upload/download backends
separately from FileField (otherwise it's hard-coded and not reusable)
and instead provide a mechanism for detecting which backend should be
chosen for the current request. Of course, this mechanism could take
the FileField into account.

One possible solution is that the user writes a delegation backend
which inspects the given arguments (e.g., request, model, FileField)
and then delegates to the actual backend for that particular request.

Currently, the django-filetransfers API doesn't get the FileField, but
this could be added. BTW, should the prepare_upload() function
possibly become a member of FileField and should serve_file() and
public_download_url() become members of File (additionally File could
provide a way to get the originating FileField, so delegation backends
can use that information, too)? Maybe public_download_url() could
replace File.url, but that might break the existing behavior where the
storage backend implements File.url, so maybe it's better to just add
a new method to File.

FYI, we already provide a "delegate" backend in our code (see
filetransfers.backends.delegate) and the documentation contains an
example of how to configure it to use App Engine for private files and
some other mechanism (e.g. S3) for publicly accessible files.

> I suppose my other query would be exactly what you are proposing for
> inclusion into trunk. The API you have documented for
> django-filestransfers strikes me as something that has far too many
> moving parts; there are six different settings, a bunch of different
> utility functions that need to be called at the right times, and a
> couple of views that may need to be deployed.

There are only three different settings. The other settings were used
to configure the selected backend (this should probably be changed to
be a dict, similar to the DATABASES setting). Every backend is
specified independently (I guess that's what you mean with too many
moving parts) because you really need a lot of flexibility to handle
all the different hosting configurations. The download mechanism is
not necessarily related to the upload mechanism (download can be
X-Sendfile or MEDIA_URL or anything). The public download URL is not
necessarily related to the private serve_file() method (public via S3
or ..., private via X-Sendfile or App Engine or ...).

> I appreciate that some of the design decisions you have made in
> django-filetransfers have been made because you're maintaining this as
> an external project. What I want to know is what changes to core you
> are proposing, and how that affects the code that you have already
> published.

Don't worry about the current code. We haven't yet officially
announced that project and I'll hold it back until it's clear whether
this will be part of Django core or not. We can completely reinvent
the API from scratch if necessary.

Bye,
Waldemar Kornewald

Russell Keith-Magee

unread,
Jun 22, 2010, 8:58:33 PM6/22/10
to django-d...@googlegroups.com
On Wed, Jun 23, 2010 at 2:58 AM, Waldemar Kornewald
<wkorn...@gmail.com> wrote:
> On Tue, Jun 22, 2010 at 2:40 PM, Russell Keith-Magee
> <rus...@keith-magee.com> wrote:
>> On Tue, Jun 22, 2010 at 2:55 PM, Waldemar Kornewald
>> <wkorn...@gmail.com> wrote:
>> It also strikes me that a lot of this is being configured at the
>> global level -- i.e., you have to nominate that your public upload
>> backend will be S3, rather than nominating that a specific file field
>> will be backed by S3.
>
> I'm repeating myself here, but anyway: The primary purpose of this API
> is to allow for writing reusable Django apps that automatically work
> with your project's file handling solution(s) without hard-coding
> anything. This requires that you specify the upload/download backends
> separately from FileField (otherwise it's hard-coded and not reusable)
> and instead provide a mechanism for detecting which backend should be
> chosen for the current request. Of course, this mechanism could take
> the FileField into account.

You may feel that you're repeating yourself, but this point certainly
wasn't clear to me from your previous two posts.

Making it possible to configure the file handling strategy of a
reusable app is certainly a reasonable feature request. However:

1) Again, the file storage strategy isn't something that is constant
across a deployment. I may want to use S3 for one type of file, but
use local storage for another.

2) Handling this flexibility at the level of the request is
completely the wrong approach. File storage doesn't change per request
-- it's defined per model. Every usage of the Profile model has an
'avatar' ImageField; it doesn't matter how or where you access that
field, it needs to be accessed and displayed the same way. This is a
per-model setting, not a per-view or per-request setting.

There is a larger issue here of how we should treat the problem of
configuring the internals of reusable applications. It's analogous to
the reasons why we dropped the 'using' argument from Meta declarations
on models -- it isn't a decision that a reusable-app-maker can make;
it needs to be configured as a deployment decision.

The solution for 'using' was to introduce the idea of a database
Router; perhaps an analogous approach is required here.

> Currently, the django-filetransfers API doesn't get the FileField, but
> this could be added. BTW, should the prepare_upload() function
> possibly become a member of FileField and should serve_file() and
> public_download_url() become members of File (additionally File could
> provide a way to get the originating FileField, so delegation backends
> can use that information, too)? Maybe public_download_url() could
> replace File.url, but that might break the existing behavior where the
> storage backend implements File.url, so maybe it's better to just add
> a new method to File.

Yes - it should be something that is integrated into the FileField at
some level. Breaking backwards-compatibility is not an option.

In which, case, you need to make a specific proposal. I'll admit that
this is a problem that needs to be addressed, and I'm interested in
seeing approaches that addressing this problem, I can't say it's a
particularly high priority for me. I'm happy to give you feedback on a
specific proposal, but I have many much higher priority items on my
plate for 1.3.

Yours,
Russ Magee %-)

Waldemar Kornewald

unread,
Jun 23, 2010, 12:24:16 PM6/23/10
to django-d...@googlegroups.com
On Wed, Jun 23, 2010 at 2:58 AM, Russell Keith-Magee

That's ok, but in addition to the model and field I'd still pass the
request object to the backend, so you can check if the user hasn't
used up his quote or maybe other things.

> There is a larger issue here of how we should treat the problem of
> configuring the internals of reusable applications. It's analogous to
> the reasons why we dropped the 'using' argument from Meta declarations
> on models -- it isn't a decision that a reusable-app-maker can make;
> it needs to be configured as a deployment decision.
>
> The solution for 'using' was to introduce the idea of a database
> Router; perhaps an analogous approach is required here.

The "delegate" backend is basically a router. With the
django-filetransfers API you can just write another backend which does
the routing.

>> Don't worry about the current code. We haven't yet officially
>> announced that project and I'll hold it back until it's clear whether
>> this will be part of Django core or not. We can completely reinvent
>> the API from scratch if necessary.
>
> In which, case, you need to make a specific proposal. I'll admit that
> this is a problem that needs to be addressed, and I'm interested in
> seeing approaches that addressing this problem, I can't say it's a
> particularly high priority for me. I'm happy to give you feedback on a
> specific proposal, but I have many much higher priority items on my
> plate for 1.3.

OK, so before I send a patch, here is how I'd like to do it:

From the end-user perspective
-----------------------------------------------

FileField gets a new method prepare_upload() which takes the following
arguments:
* request
* upload_url: the target URL of the upload view
* private: should this be only privately accessibly or also publicly?
(default: False; whether this actually works depends on the chosen
backend's capabilities and your hosting setup)
The function returns a tuple with a new target submission URL and a
dict with additional POST data.

forms.Form also gets a prepare_upload() function which searches for a
FileField. If multiple FileFields exist and the backend doesn't
support multi-file uploads an exception is raised. The function
doesn't return anything, but instead stores the results as
self.upload_url and self.some_file_field.post_data. When the form is
rendered the FileField automatically generates <input type="hidden" />
fields for self.post_data. Q: or should prepare_upload() better be
defined on forms.FileField?

File (from FileField) gets a public_download_url() method which takes
no arguments. It returns the file's permanent public URL or None if no
such URL exists.

File (from FileField) gets a serve() method which takes the following arguments:
* request
* save_as: if False (default) lets the browser decide whether to
display or download the file; if True forces browser to download as
"file.name"; if it's a string it forces a download with the given
string as the file name
* content_type: if None (default) automatically detects content type
based on file name; otherwise it overrides the default content type
This returns an HttpResponse.

Example:

Upload view:
form = FileForm()
form.prepare_upload(request, '/upload')

Upload template:
<form action="{{ form.upload_url }}" ...>
{% csrf_token %}
<table>{{ form }}</table>
</form>

Download template for private-only downloads (entity is a model instance):
<a href="{% url download_view pk=entity.pk %}">Download</a>

Download template for public downloads (entity is a model instance
with a FileField called "file"):
{% url download_view pk=entity.pk as fallback_url %}
<a href="{% firstof entity.file.public_download_url fallback_url
%}">Download</a>

Download view for public and private downloads (public only as a
fall-back if there is no public_download_url):
entity.file.serve(request)

The prepare_upload(), public_download_url(), and serve() functions
each have their own backend type. Each of the three backend types is
configured separately in settings.py:
PREPARE_UPLOAD_BACKEND = {
'backend': 'some.backend',
'setting': ...,
'setting2': ...,
}
PUBLIC_DOWNLOAD_URL_BACKEND = {
'backend': 'some.backend',
...
}
SERVE_FILE_BACKEND = {
'backend': 'some.backend',
...
}

Backend API
--------------------------------------------------------
I won't, yet talk too much about the backend API itself because we
should first talk about the end-user API. Each backend type gets all
of the parameters that the user passes to the respective public
function (prepare_upload(), etc.). Additionally, each function gets
the model and FileField. Also, the File methods (public_download_url()
and serve()) get the file instance:
def prepare_upload(model, field, request, upload_url, private, **kwargs):
...

def public_download_url(model, field, file, **kwargs):
...

def serve_file(model, field, file, request, save_as, content_type, **kwargs):
...

There is no special Router API. Instead, you can write your own
backend which routes/delegates to some other backend.

Bye,
Waldemar Kornewald

Robert Coup

unread,
Jun 23, 2010, 5:06:07 PM6/23/10
to django-d...@googlegroups.com
On Thu, Jun 24, 2010 at 4:24 AM, Waldemar Kornewald
<wkorn...@gmail.com> wrote:
> FileField gets a new method prepare_upload() which takes the following
> arguments:
> * request
> * upload_url: the target URL of the upload view
> * private: should this be only privately accessibly or also publicly?
> (default: False; whether this actually works depends on the chosen
> backend's capabilities and your hosting setup)

There are many shades of grey between public & private, I'm not sure a
boolean will cut it here for the long term.

- anyone on the net
- registered users
- members of a group
- the uploader & admins...

Can/should we tie this into object-level permissions?

There are also different URLs as well, which could all be valid for
the same object:
- public, nice urls: http://example.com/alex/photos/mycar.jpg
- obscured urls: http://example.com/asdfgh123456/mycar.jpg
- URLs valid for some time (I'm thinking S3/etc):
http://example.com/alex/photos/mycar.jpg?auth=123456

The backend should be generating #3 and #1, maybe #2 is handled by the views?

Rob :)

Waldemar Kornewald

unread,
Jun 24, 2010, 2:40:55 AM6/24/10
to django-d...@googlegroups.com
On Wed, Jun 23, 2010 at 11:06 PM, Robert Coup
<rober...@koordinates.com> wrote:
> On Thu, Jun 24, 2010 at 4:24 AM, Waldemar Kornewald
> <wkorn...@gmail.com> wrote:
>> FileField gets a new method prepare_upload() which takes the following
>> arguments:
>> * request
>> * upload_url: the target URL of the upload view
>> * private: should this be only privately accessibly or also publicly?
>> (default: False; whether this actually works depends on the chosen
>> backend's capabilities and your hosting setup)
>
> There are many shades of grey between public & private, I'm not sure a
> boolean will cut it here for the long term.
>
> - anyone on the net
> - registered users
> - members of a group
> - the uploader & admins...
>
> Can/should we tie this into object-level permissions?

The boolean is sufficient because those permission checks should be
done in the download view (or a router backend):

if request.user.is_authenticated:
return file.serve()
else:
# user has no permissions

This can only work if the upload handler knows whether the file should
be publicly accessible or private. If you want to be able to do
permission checks you say private=True. Only if you know that a
certain file will always be public you should use private=False which
is nothing more than an optimization, so the download URL (via
file.public_download_url()) doesn't point to Django, but directly to
the file server.

> There are also different URLs as well, which could all be valid for
> the same object:
> - public, nice urls: http://example.com/alex/photos/mycar.jpg

That's what a normal public_download_url() backend would return.

What's the purpose of obscuring the URL?
* If you want shorter URLs and your file server understands that URL
format it can be handled by file.public_download_url(), possibly with
a custom backend.
* If you want short URLs and your file server doesn't understand that
obscured format you can write a custom file.serve() backend or a
custom view which just redirects from the obscured URL to the real
URL.

> - URLs valid for some time (I'm thinking S3/etc):
> http://example.com/alex/photos/mycar.jpg?auth=123456

This should not be file.public_download_url() because this URL will
expire (thus it will not be public forever). Such URLs should be
generated by file.serve() (returning an HttpResponseRedirect) in the
download view.

Bye,
Waldemar Kornewald

Luke Plant

unread,
Jun 24, 2010, 7:12:44 AM6/24/10
to django-d...@googlegroups.com
On Thu, 2010-06-24 at 08:40 +0200, Waldemar Kornewald wrote:

> The boolean is sufficient because those permission checks should be
> done in the download view (or a router backend):
>
> if request.user.is_authenticated:
> return file.serve()
> else:
> # user has no permissions

This seems like an arbitrary restriction - why do anonymous users have
no permissions? We already have a mechanism for making this decision -
user.has_perm(), and that works for anonymous users as well as
authenticated users.

Luke

--
Sometimes I wonder if men and women really suit each other. Perhaps
they should live next door and just visit now and then. (Katherine
Hepburn)

Luke Plant || http://lukeplant.me.uk/

Waldemar Kornewald

unread,
Jun 24, 2010, 7:41:32 AM6/24/10
to django-d...@googlegroups.com
On Thu, Jun 24, 2010 at 1:12 PM, Luke Plant <L.Pla...@cantab.net> wrote:
> On Thu, 2010-06-24 at 08:40 +0200, Waldemar Kornewald wrote:
>
>> The boolean is sufficient because those permission checks should be
>> done in the download view (or a router backend):
>>
>> if request.user.is_authenticated:
>>     return file.serve()
>> else:
>>     # user has no permissions
>
> This seems like an arbitrary restriction - why do anonymous users have
> no permissions?  We already have a mechanism for making this decision -
> user.has_perm(), and that works for anonymous users as well as
> authenticated users.

That was just a code sample. :) I could as well have written:

if request.user.has_perm(...):
return file.serve()
else:
# no permissions

The point is, you can make any kind of permission check in the
download view and only call file.serve() if the permissions are OK.

Bye,
Waldemar Kornewald

Waldemar Kornewald

unread,
Jun 26, 2010, 12:52:37 PM6/26/10
to django-d...@googlegroups.com
Hi again,
so, does the proposal look fine for now, so I can actually make a
patch or can you already tell me now that there is a problem which
needs to be solved, first?

Bye,
Waldemar Kornewald

Russell Keith-Magee

unread,
Jun 28, 2010, 8:08:09 AM6/28/10
to django-d...@googlegroups.com
Apologies for the late reply - I was at a conference all weekend, so
I'm still catching up on mail.

On Thu, Jun 24, 2010 at 12:24 AM, Waldemar Kornewald

You're missing my point. You aren't guaranteed to have a request when
you're using the field. Consider the case of a standalone data
processing script.

>> There is a larger issue here of how we should treat the problem of
>> configuring the internals of reusable applications. It's analogous to
>> the reasons why we dropped the 'using' argument from Meta declarations
>> on models -- it isn't a decision that a reusable-app-maker can make;
>> it needs to be configured as a deployment decision.
>>
>> The solution for 'using' was to introduce the idea of a database
>> Router; perhaps an analogous approach is required here.
>
> The "delegate" backend is basically a router. With the
> django-filetransfers API you can just write another backend which does
> the routing.

They're not quite the same. Sure, the Delegate interface defines a way
to configure behaviour, but it doesn't provide an interface to decide
which behavior is appropriate at any given time.

Again, this comes back to the concept that different reusable apps may
have different file storage requirements, and you need to be able to
define the strategy for those requirements at a project level. Sure,
the simple case of "put everything on S3" needs to be simple, but you
also need to be able to say "put all the Profile.avatar images on S3,
but all the Document.upload files on a AppEngine filestore".

You also need to take into account that a file field may not
necessarily be backed by a model at all. It could just be an upload.
Form.FileField doesn't require that it is backed by a model.FileField.

>>> Don't worry about the current code. We haven't yet officially
>>> announced that project and I'll hold it back until it's clear whether
>>> this will be part of Django core or not. We can completely reinvent
>>> the API from scratch if necessary.
>>
>> In which, case, you need to make a specific proposal. I'll admit that
>> this is a problem that needs to be addressed, and I'm interested in
>> seeing approaches that addressing this problem, I can't say it's a
>> particularly high priority for me. I'm happy to give you feedback on a
>> specific proposal, but I have many much higher priority items on my
>> plate for 1.3.
>
> OK, so before I send a patch, here is how I'd like to do it:
>
> From the end-user perspective
> -----------------------------------------------
>
> FileField gets a new method prepare_upload() which takes the following
> arguments:
> * request

Again - this isn't always available.

> * upload_url: the target URL of the upload view
> * private: should this be only privately accessibly or also publicly?
> (default: False; whether this actually works depends on the chosen
> backend's capabilities and your hosting setup)

As Robert and Luke both point out, public/private is neither a
necessary nor sufficient basis on which to make this decision.

> The function returns a tuple with a new target submission URL and a
> dict with additional POST data.
>
> forms.Form also gets a prepare_upload() function which searches for a
> FileField. If multiple FileFields exist and the backend doesn't
> support multi-file uploads an exception is raised.

Ok - no problems with this bit. I have no problems with the idea that
if you've configured your file backend to use a particular strategy on
your form, and your form won't support that strategy, an exception
will need to be raised.

> The function
> doesn't return anything, but instead stores the results as
> self.upload_url and self.some_file_field.post_data. When the form is
> rendered the FileField automatically generates <input type="hidden" />
> fields for self.post_data. Q: or should prepare_upload() better be
> defined on forms.FileField?

This sounds like a better place for it.

> File (from FileField) gets a public_download_url() method which takes
> no arguments. It returns the file's permanent public URL or None if no
> such URL exists.

Sorry - are you saying this is on the FileField, or on the File
object? If it's on the FileField, how do you get the download URL when
there isn't a model backing the form? And if it's on the File object,
what's wrong with the url() method that is already there?

> File (from FileField) gets a serve() method which takes the following arguments:
> * request
> * save_as: if False (default) lets the browser decide whether to
> display or download the file; if True forces browser to download as
> "file.name"; if it's a string it forces a download with the given
> string as the file name
> * content_type: if None (default) automatically detects content type
> based on file name; otherwise it overrides the default content type
> This returns an HttpResponse.
>
> Example:
>
> Upload view:
> form = FileForm()
> form.prepare_upload(request, '/upload')

I'm not wile that the prepare_upload step is required. It feels like
something that should be handled internally, based on the
configuration of handler for the field.

> Upload template:
> <form action="{{ form.upload_url }}" ...>
>    {% csrf_token %}
>    <table>{{ form }}</table>
> </form>

I'm not sure I follow what happends with the non-form data here. From
what I can make out, upload_url won't be able to handle any non-file
data. I'm also not sure what happens for the default case (i.e., the
case the defines the current file upload behaviour), where multiple
files can be uploaded to the normal form handling API.

> Download template for private-only downloads (entity is a model instance):
> <a href="{% url download_view pk=entity.pk %}">Download</a>

What determines the view name "download_view"?

Again, there will be a need for *some* sort of routing API, because
this isn't a decision that can be configured on a project-wide basis
with a single setting (or a pair of settings on an arbitrary
public/private distinction).

Yours,
Russ Magee %-)

Waldemar Kornewald

unread,
Jul 17, 2010, 1:59:22 PM7/17/10
to django-d...@googlegroups.com
Hi Russell,
so, after our chat on IRC I've finally found the time to implement a
real proposal including unit tests. I've attached the patch to this
ticket:
http://code.djangoproject.com/ticket/13960

Now there is just one backend type with a single setting:

FILE_BACKENDS = (
'path.to.Backend',
'path.to.Backend2',
)

--------

The end-user will normally use the API via ModelForm like this when uploading:
form = MyModelForm()
form.prepare_upload(request)

By default, prepare_upload() will prepare an upload to the current URL
(and thus current view). Optionally, you can specify a different url
via form.prepare_upload(request, url='/other/url').

The prepare_upload function is necessary because we need to support
services which first send the file and form data to a different URL
(e.g., to S3 or '/_ah/upload/...' on GAE). Once the upload is
finished, the service or backend is responsible for sending a request
to the actual target URL (some Django view).

So, in the template you have to use the generated upload url:
<form action="{{ form.upload_url }}" ...>
<table>{{ form }}</table>

FYI, behind the scenes prepare_upload() might also add hidden input
fields to your form, depending on the service you're using.

------

Downloads are split into two functions which both exist on the file
object returned by FileField.

With file.serve(request) which returns an HttpResponse you can handle
the file download from a Django view. This also allows app developers
to check permissions in the view before calling file.serve(request).

Additionally, there's file.download_url() which deprecates file.url
(because that uses the storage backend which is the wrong place for
this feature). This function will always return an URL. If none of
your backends provides an URL we use "MEDIA_URL + file.name" as a
fallback.

In some cases file.download_url() will point to a Django view which
calls file.serve(request) (and which can do some permission checks if
necessary). Authors of reusable Django apps are responsible for
providing a default backend which generates the URL to their app's
built-in download view. End-users can overrides this.

In your templates you'd always use code like this:
<a href="{{ entity.file.download_url }}">Download</a>

--------

Backends

Every backend derives from BaseFileBackend. Every backend instance has
these members:
* self.model: the model that own the FileFields to upload to or download from
* self.fields: the list of FileField instances to upload to or
download from (in case of a download there is just one entry)
For convenience, there is also self.field which makes sure that there
is just one field in self.fields and returns that field.

Every backend can override the functions mentioned above
(prepare_upload, file.serve, file.download_url). You can take a look
at the source for more details. FYI, the prepare_upload() function is
pretty much the same as before, just without the private=True/False
parameter.

Additionally, backends can define get_storage_backend() which returns
a storage backend for the given model/field combination. As a fallback
DEFAULT_STORAGE_BACKEND is used.

The API is also similar to DB routers. If any of those functions
returns None the next backend is tried (as defined in
settings.FILE_BACKENDS).

Please provide some feedback. Does this solve all issues you had with the API?

Bye,
Waldemar Kornewald

On Mon, Jun 28, 2010 at 2:08 PM, Russell Keith-Magee

Russell Keith-Magee

unread,
Jul 21, 2010, 10:30:17 PM7/21/10
to django-d...@googlegroups.com
On Sun, Jul 18, 2010 at 1:59 AM, Waldemar Kornewald
<wkorn...@gmail.com> wrote:
> Hi Russell,
> so, after our chat on IRC I've finally found the time to implement a
> real proposal including unit tests. I've attached the patch to this
> ticket:
> http://code.djangoproject.com/ticket/13960
>
> Now there is just one backend type with a single setting:
>
> FILE_BACKENDS = (
>    'path.to.Backend',
>    'path.to.Backend2',
> )

Bikeshedding, but FILE_BACKENDS is a little generic for my taste; it's
not specifying anything about *files* per-se, it's about the upload
and download of files.

> --------
>
> The end-user will normally use the API via ModelForm like this when uploading:
> form = MyModelForm()
> form.prepare_upload(request)

I accept the need for this, but this seems like a bit of a wart. This
method wouldn't be required at all if the Form took a request
argument. This isn't an unusual requirement, either -- perhaps we
should introduce a RequestForm/RequestModelForm that formalizes the
availability of the request object during form processing.

This would also have an analog with Context/RequestContext. In most
practical situations, you need to use RequestContext, but the raw
context is there if you need it.

> By default, prepare_upload() will prepare an upload to the current URL
> (and thus current view). Optionally, you can specify a different url
> via form.prepare_upload(request, url='/other/url').

In which case, this should be an argument or modifier to the RequestForm. e.g.,

form = MyRequestModelForm(request)
form.upload_location = '/other/url'

> The prepare_upload function is necessary because we need to support
> services which first send the file and form data to a different URL
> (e.g., to S3 or '/_ah/upload/...' on GAE). Once the upload is
> finished, the service or backend is responsible for sending a request
> to the actual target URL (some Django view).
>
> So, in the template you have to use the generated upload url:
> <form action="{{ form.upload_url }}" ...>
> <table>{{ form }}</table>

Sure. With the caveat that if we're doing a RequestForm, then this
attribute should *always* be available.

> FYI, behind the scenes prepare_upload() might also add hidden input
> fields to your form, depending on the service you're using.

I'm not wild about the idea that a function post-modifies the fields
on a form. This strikes me as something that should be predictable at
time of form construction.

> ------
>
> Downloads are split into two functions which both exist on the file
> object returned by FileField.
>
> With file.serve(request) which returns an HttpResponse you can handle
> the file download from a Django view. This also allows app developers
> to check permissions in the view before calling file.serve(request).
>
> Additionally, there's file.download_url() which deprecates file.url
> (because that uses the storage backend which is the wrong place for
> this feature). This function will always return an URL. If none of
> your backends provides an URL we use "MEDIA_URL + file.name" as a
> fallback.
>
> In some cases file.download_url() will point to a Django view which
> calls file.serve(request) (and which can do some permission checks if
> necessary). Authors of reusable Django apps are responsible for
> providing a default backend which generates the URL to their app's
> built-in download view. End-users can overrides this.
>
> In your templates you'd always use code like this:
> <a href="{{ entity.file.download_url }}">Download</a>

No problem with the general principles here; however, there is some
overlap with the WSGI-improvements GSoC work from last year, which
included much more complete handling of streaming file downloads.

> --------
>
> Backends
>
> Every backend derives from BaseFileBackend. Every backend instance has
> these members:
> * self.model: the model that own the FileFields to upload to or download from

Except that there's no guarantee that a Form will be backed by a
model. You can upload a file without a model being involved.

> * self.fields: the list of FileField instances to upload to or
> download from (in case of a download there is just one entry)
> For convenience, there is also self.field which makes sure that there
> is just one field in self.fields and returns that field.
>
> Every backend can override the functions mentioned above
> (prepare_upload, file.serve, file.download_url). You can take a look
> at the source for more details. FYI, the prepare_upload() function is
> pretty much the same as before, just without the private=True/False
> parameter.
>
> Additionally, backends can define get_storage_backend() which returns
> a storage backend for the given model/field combination. As a fallback
> DEFAULT_STORAGE_BACKEND is used.
>
> The API is also similar to DB routers. If any of those functions
> returns None the next backend is tried (as defined in
> settings.FILE_BACKENDS).

The backend API itself is starting to look reasonably good -- there
are a lot less moving parts, which is good. However, when I alluded to
Routers in our previous discussions, I didn't mean you should be quite
so literal.

There are three pieces to this puzzle.
a) The mechanics for how to do a file transfer to S3, etc
b) Configuring a specific instance of an S3 transfer backend to point
at a particular path
c) Configuring which backend should be used for which form.

At the moment, it appears that your convolving all three roles into
one backend class. To my mind, (b) should be an instantiation of (a),
and (c) is a separate issue for determining which (b) instance is
required at any given time. That is, the routing behavior and the file
upload logic shouldn't be part of the same class as the backend
itself. (By analogy, the database routers determine which database
will be used, but contain no logic to actually write to a database).

What we need to set up is a way of clearly identifying a particular
form (which is to say, a file upload point) so that as part of system
configuration, we can associate a particular uploading behavior. In
the parlance of Aspect-oriented programming, we're specifying a join
point.

In the case of database routers, the model/instance is what we're
actually operating upon, so that is the object that is appropriate for
making a routing decision. A big part of the router selection comes
from the hint -- extra information associated with the activity that
requires a router decision. In most cases, this is the 'other' object
that is involved in the process.

It's not clear to me that the file backends provide a similarly
unambiguous mechanism for making a decision, or *any* analogous
mechanism for 'hinting'. You've attempted to do this with a
combination of model and field, but I'm not convinced that this is
sufficient -- and even if it is sufficient, it's more than a little
bit fragile. In the case of a file upload, the model is both optional,
and not the primary focus of the activity you're configuring.

It may be as simple as providing a simple tag (a-la named URLs) that
can be used as a namespace to describe the file uploads that are
necessary.

Once such a name is available, you may also find that the cascading
behavior isn't actually required. Cascading is necessary in the case
of database routers because you may need to consider several factors
before deciding which database needs to be used; however, in the case
of file uploads, the uploading behavior isn't really dynamic -- form X
will always be backed by upload mechanism Y. All you really need is a
way of performing that mapping.

> Please provide some feedback. Does this solve all issues you had with the API?

It's not perfect, but It's certainly looking a lot better.

Yours,
Russ Magee %-)

Waldemar Kornewald

unread,
Jul 22, 2010, 6:01:45 AM7/22/10
to django-d...@googlegroups.com
On Thu, Jul 22, 2010 at 4:30 AM, Russell Keith-Magee
<rus...@keith-magee.com> wrote:
> On Sun, Jul 18, 2010 at 1:59 AM, Waldemar Kornewald
> <wkorn...@gmail.com> wrote:
>> Hi Russell,
>> so, after our chat on IRC I've finally found the time to implement a
>> real proposal including unit tests. I've attached the patch to this
>> ticket:
>> http://code.djangoproject.com/ticket/13960
>>
>> Now there is just one backend type with a single setting:
>>
>> FILE_BACKENDS = (
>>    'path.to.Backend',
>>    'path.to.Backend2',
>> )
>
> Bikeshedding, but FILE_BACKENDS is a little generic for my taste; it's
> not specifying anything about *files* per-se, it's about the upload
> and download of files.

I'm open for naming suggestions. I could rename it to FILE_TRANSFER_BACKENDS.

>> --------
>>
>> The end-user will normally use the API via ModelForm like this when uploading:
>> form = MyModelForm()
>> form.prepare_upload(request)
>
> I accept the need for this, but this seems like a bit of a wart. This
> method wouldn't be required at all if the Form took a request
> argument. This isn't an unusual requirement, either -- perhaps we
> should introduce a RequestForm/RequestModelForm that formalizes the
> availability of the request object during form processing.

Maybe. Then RequestForm could be used for hinting (instead of
Model/FileField). The constructor would take a target_url parameter
and take care of adding hidden input fields if they're required for
the upload. However, I also added this dependency because I wanted to
provide some useful hint for download_url() and serve().

>> Backends
>>
>> Every backend derives from BaseFileBackend. Every backend instance has
>> these members:
>> * self.model: the model that own the FileFields to upload to or download from
>
> Except that there's no guarantee that a Form will be backed by a
> model. You can upload a file without a model being involved.

I was thinking in terms of file uploads for storage on the server. In
that case the file file upload mechanism might only work with one
specific file storage backend (which is defined on FileField). We
could try to make it independent of the model, but then the question
is: Should we make file downloads independent of the Model/Field, too?
How do we handle file downloads without using a model/field as the
hint?

Tag names could have name clashes and not all developers will
appreciate that they suddenly have to give every form and file
download a unique tag. I'm not sure if tag names are the best
solution, but I don't have any better alternative, either. Is this
really the way it should be used?

Maybe we can at least auto-generate tag names?

# FileField needs a tag to select the storage backend
class MyModel(models.Model):
file = models.FileField(tag='myapp')

# Should the default tag be generated from model and field?
class MyModel(models.Model):
file = models.FileField() # tag defaults to
'myapp.models.MyModel.file' or something like that

# Forms need a tag argument
# (again, maybe the default tag could be auto-generated, e.g.
'myapp.forms.MyRequestForm'?)
MyRequestForm(request, tag='myapp')

# Files get the default tag from their FileField
# or you have to manually set
# file.tag = '...'
# (this would always be necessary for request.FILES)
file.download_url()
file.serve(request)

> Once such a name is available, you may also find that the cascading
> behavior isn't actually required. Cascading is necessary in the case
> of database routers because you may need to consider several factors
> before deciding which database needs to be used; however, in the case
> of file uploads, the uploading behavior isn't really dynamic -- form X
> will always be backed by upload mechanism Y. All you really need is a
> way of performing that mapping.

The request (and request.user) is a dynamic component in the upload
process, so the cascading could still be needed. I'm not sure if we
should remove the request from the backend API. At least, I could
imagine that file downloads need the request to run some permission
checks against the user.

> There are three pieces to this puzzle.
>  a) The mechanics for how to do a file transfer to S3, etc
>  b) Configuring a specific instance of an S3 transfer backend to point
> at a particular path
>  c) Configuring which backend should be used for which form.
>
> At the moment, it appears that your convolving all three roles into
> one backend class. To my mind, (b) should be an instantiation of (a),
> and (c) is a separate issue for determining which (b) instance is
> required at any given time. That is, the routing behavior and the file
> upload logic shouldn't be part of the same class as the backend
> itself. (By analogy, the database routers determine which database
> will be used, but contain no logic to actually write to a database).

I agree that these things should be separated, so you'd have one S3
backend which takes care of file transfers, but does no routing and a
separate backend would handle only the routing, but not file
transfers. I just don't think we need to implement different APIs for
these concepts because the APIs would be basically identical, anyway
(both getting the same parameters and providing the same functions).

Bye,
Waldemar Kornewald

Waldemar Kornewald

unread,
Aug 14, 2010, 7:45:19 AM8/14/10
to django-d...@googlegroups.com
On Thu, Jul 22, 2010 at 4:30 AM, Russell Keith-Magee
<rus...@keith-magee.com> wrote:
> I accept the need for this, but this seems like a bit of a wart. This
> method wouldn't be required at all if the Form took a request
> argument. This isn't an unusual requirement, either -- perhaps we
> should introduce a RequestForm/RequestModelForm that formalizes the
> availability of the request object during form processing.

OK, I've now added a Request(Model)Form. The tag is based on
<cls.__module__>.<cls.__name__>, so you don't have to manually specify
it. FileField takes an optional parameter to override this. Forms
currently only allow to set the tag by subclassing. Not sure if this
is sufficient.
http://code.djangoproject.com/attachment/ticket/13960/filetransfers.3.diff

I could need some feedback. Does this go in the right direction? Is
anything important missing?

class UploadForm(RequestModelForm):
class Meta:
model = bla

form = UploadForm(request)
render_to_response(... {... 'form': form} ...)

<form action="{{ form.upload_url }}">
{{ form }}
</form>

Apart from that you can call file.serve() and file.download_url() as
discussed in the previous posts. You can see another example in the
unit tests of my patch. It also contains a delegation backend which
allows to easily specify which backend should be used for which
app/form/model/field via the settings:

DEFAULT_FILE_TRANSFER_BACKENDS = {
'myapp.*': 'some.backend',
'myapp.models.MyModel.s3_file': 'S3Backend',
'otherapp.models.SomeModel.*': 'OtherBackend',
}

If you need more control you can use FILE_TRANSFER_BACKENDS which
works more like the routers API.

Bye,
Waldemar Kornewald

--
Django on App Engine, MongoDB, ...? Browser-side Python? It's open-source:
http://www.allbuttonspressed.com/blog/django

Waldemar Kornewald

unread,
Aug 15, 2010, 1:13:26 PM8/15/10
to django-d...@googlegroups.com
On Sat, Aug 14, 2010 at 1:45 PM, Waldemar Kornewald
<wkorn...@gmail.com> wrote:
> On Thu, Jul 22, 2010 at 4:30 AM, Russell Keith-Magee
> <rus...@keith-magee.com> wrote:
>> I accept the need for this, but this seems like a bit of a wart. This
>> method wouldn't be required at all if the Form took a request
>> argument. This isn't an unusual requirement, either -- perhaps we
>> should introduce a RequestForm/RequestModelForm that formalizes the
>> availability of the request object during form processing.
>
> OK, I've now added a Request(Model)Form. The tag is based on
> <cls.__module__>.<cls.__name__>, so you don't have to manually specify
> it. FileField takes an optional parameter to override this. Forms
> currently only allow to set the tag by subclassing. Not sure if this
> is sufficient.
> http://code.djangoproject.com/attachment/ticket/13960/filetransfers.3.diff
>

Now I've finished admin UI support for the proposed API:
http://code.djangoproject.com/attachment/ticket/13960/filetransfers.4.diff

If you want to see a more practical example, the following
demonstrates an async upload process (like with S3 or App Engine):
http://bitbucket.org/wkornewald/upload-sample-trunk-filetransfers

The same async upload process is used in the admin UI. If you submit a
file from within the admin UI the file first gets sent to a view at
/upload and then the request gets forwarded to the actual admin view.
Also, the backend changes the file download URLs (even in the admin
UI) to point to the download view at /download.

BTW, if you're too busy/lazy to apply the patch you can just clone my
Django development branch:
http://bitbucket.org/wkornewald/django-trunk-filetransfers

Waldemar Kornewald

unread,
Aug 18, 2010, 7:50:11 PM8/18/10
to django-d...@googlegroups.com
No comments means it's still not good enough and I'll never get it
into an acceptable shape? :)

Bye,
Waldemar

On Sun, Aug 15, 2010 at 7:13 PM, Waldemar Kornewald

Jannis Leidel

unread,
Aug 19, 2010, 3:38:15 AM8/19/10
to django-d...@googlegroups.com
Am 19.08.2010 um 01:50 schrieb Waldemar Kornewald:

> No comments means it's still not good enough and I'll never get it
> into an acceptable shape? :)

No, in that case we would tell you :) It's much more likely that nobody had time yet (in the last 3 days) to look at your patch thoroughly.

Jannis

Waldemar Kornewald

unread,
Sep 2, 2010, 4:25:50 AM9/2/10
to django-d...@googlegroups.com
How often should I ping, so my patch won't be forgotten? :)

Russell Keith-Magee

unread,
Sep 4, 2010, 12:45:55 PM9/4/10
to django-d...@googlegroups.com
On Thu, Sep 2, 2010 at 4:25 PM, Waldemar Kornewald <wkorn...@gmail.com> wrote:
> How often should I ping, so my patch won't be forgotten? :)

Look - I don't want to appear rude or disparaging, but it will happen
when it happens.

Your patch is on a long list of things needing review -- including
hundreds currently sitting on tickets in Trac.

The question you need to ask is "why hasn't my patch received priority
over anyone else's?"

I can't speak for others in the core team, but my priority list works
roughly like this:

1. Problems that I've caused myself (either because I added a feature
with a bug, or a bug fix was wrong)
2. Problems I'm having myself and features I have an immediate use for
3. Patches that come from people I have a history of working with successfully
4. Patches that I think I can dispense with easily
5. Patches that appear to have a lot of community interest or support
6. Everything else.

The tickets falling into group 1-2 are generally a fairly short list.
Category 3 is a fairly manageable list too. That leaves categories
4-6. Category 4 has plenty of candidates in RFC state in Trac.
Category 5 is usually fairly well populated too.

At the moment, this proposal falls into category 6 for me. I've put a
bunch of time into the idea in the early phases (both in mailing lists
and on IRC) because I thought the idea had some merit. Also, I was on
the road on business when you originally raised the idea, so I had
some time to kill in a hotel room -- you caught me at an opportune
moment.

After that initial design work, you came back with a patch, which I
reviewed; again, I had the time, and I figured a review was warranted
on the chance that the design notes I'd given you had resulted in a
trunk-ready patch. That patch was better that the original, but wasn't
a small step away from being trunk-ready -- there's was still a bunch
of development left to do.

On top of that, there has been a lack of substantial feedback from
anyone else in the community. I don't just mean core developers,
either -- I mean *anyone* other than myself.

What I saw in the first patch hasn't given me the confidence that this
is a feature that is a quick review away from trunk -- it's more
likely to be something that requires a lot more rounds of design
feedback. And even if I did think it was very close to trunk-ready, I
don't like committing big features to trunk without evidence that
someone else (and preferably several someone elses) has sanity checked
the idea -- doubly so when we're talking about major conceptual jumps
like RequestForm.

In short, nothing about this history of this patch has given me cause
to prioritize another review of this feature over any of the other
potential uses of my Django-related volunteer time.

This doesn't mean that what you've proposed wouldn't be a valuable
addition -- it just means that for me, helping you finesse this idea
into a trunk-ready patch is a lower priority than a bunch of other
stuff on my todo list. For Django 1.3, I have at least 2 big features
that fall into category 2 that I haven't had time to look at yet.
There's also a huge backlog of RFC tickets relating to existing that
deserve attention given the feature-based focus of the 1.1 and 1.2
releases.

So, at this point, I'm not making any promises of a rapid review; it
will happen when it happens. The ticket is in Trac, and I've got it
flagged on my todo list of stuff to look at. If you can do anything to
address the points I've raised in this email, you'll improve your
chances of getting reviewed sooner.

Yours,
Russ Magee %-)

Jeremy Dunck

unread,
Sep 4, 2010, 3:00:46 PM9/4/10
to django-d...@googlegroups.com, django-d...@googlegroups.com

On Sep 4, 2010, at 11:45 AM, Russell Keith-Magee <rus...@keith-magee.com> wrote:

> On Thu, Sep 2, 2010 at 4:25 PM, Waldemar Kornewald <wkorn...@gmail.com> wrote:
>> How often should I ping, so my patch won't be forgotten? :)
>
> Look - I don't want to appear rude or disparaging, but it will happen
> when it happens.
>
> Your patch is on a long list of things needing review -- including
> hundreds currently sitting on tickets in Trac.
>
> The question you need to ask is "why hasn't my patch received priority
> over anyone else's?"
>
> I can't speak for others in the core team, but my priority list works
> roughly like this:
>
> 1. Problems that I've caused myself (either because I added a feature
> with a bug, or a bug fix was wrong)
> 2. Problems I'm having myself and features I have an immediate use for
> 3. Patches that come from people I have a history of working with successfully
> 4. Patches that I think I can dispense with easily
> 5. Patches that appear to have a lot of community interest or support
> 6. Everything else.
>

This explanation of your priorities is useful to me. Thanks.

Reply all
Reply to author
Forward
0 new messages