get_FIELD_filename() in pluggable FileField backends

5 views
Skip to first unread message

Marty Alchin

unread,
Sep 4, 2007, 2:09:28 PM9/4/07
to django-d...@googlegroups.com
I finally have some code to support multiple backends, but I'd like to
ask a question and write some documentation before I create a ticket
for it. I apologize in advance for such a long email, but there's a
bit of backstory necessary to ask the question.

I was wondering about the intended use of get_FIELD_filename() and the
impacts this patch would have on it. Currently, get_FIELD_filename()
returns the absolute filesystem location of the file, which seems (to
me) intended as a way to open() the file for specialty access, for
instance by a view.

However, this usefulness is only valid for the default
FileSystemBackend. For an S3Backend (which I'm also building as a test
of this system), this convenience would be lost, as simply returning a
path wouldn't be enough to easily retrieve the file for read (or
write) access. Even if a view author did want to write the extra code,
it would no longer be portable to different environments, so file
reading/editing wouldn't be a realistic option for distributable apps.

So what I've done, and what I'd like comments on, is provide a special
object in place of the existing string attribute, to enable portable
access to the file on the backend. For existing templates, this will
look the same as the existing field, as its __str__ simply returns the
filename as it would have appeared before. But in views, this object
has a few methods to replace the various get_FIELD_* methods currently
employed. Some methods that are implemented include get_filename(),
get_absolute_url() and get_filesize().

There is also an open(mode) method, which returns a file (or
file-like) object ready for reading or writing, depending on the mode
supplied. For the FIleSystemBackend, it returns a true file object for
obvious reasons. For S3Backend, it returns an instance of a supplied
RemoteFile class, which extends StringIO to provide portable
read/write access to files, regardless of the backend in use. It
handles writing by setting a dirty flag during its write() method, and
pushing the file to the backend on close(), only if the dirty flag is
set. So, for reading operations only, the file's contents are
retrieved once, then destroyed with garbage collection when no longer
in use, without touching the backend again.

That was a long, roundabout way to get back to my real question, so
here it is. Would this new object's open(mode) method suffice as a
replacement for get_FIELD_filename()? The existing method is still
available, so it won't break existing code, but it will only be of use
for the FileSystemBackend. Anyone using a different backend will have
to change any specialty code anyway. It should be as simple as
replacing this first line to the second:

fp = open(obj.get_data_filename(), 'rb')
fp = obj.data.open()

But that assumes that opening the file is the only reasonable use of
get_FIELD_filename. Is there some other use for it that I'm not
considering that might be hindered if people try to use a different
backend?

-Gul

Marty Alchin

unread,
Sep 4, 2007, 4:20:13 PM9/4/07
to django-d...@googlegroups.com
On 9/4/07, Marty Alchin <gulo...@gamemusic.org> wrote:
> Some methods that are implemented include get_filename(),
> get_absolute_url() and get_filesize().

Before this sparks any confusion, allow me to correct myself.
get_filename() is not an available method on the special File object.
I had it implemented as a pass-through to the existing
get_FIELD_filename() functionality, but I've removed it for now,
pending some feedback on this thread.

I just don't want anybody getting confused about what the difference
is between the two, or why I'd provide one and not the other. So, just
ignore get_filename() for now.

-Gul

Marty Alchin

unread,
Sep 7, 2007, 4:00:59 PM9/7/07
to django-d...@googlegroups.com
I don't want to be annoying, but I thought I'd ping this question,
because I'd like to get some feedback in time to have pluggable
FileField backends ready for consideration before (or during) the
sprint next week.

-Gul

Joseph Kocherhans

unread,
Sep 7, 2007, 4:10:20 PM9/7/07
to django-d...@googlegroups.com
On 9/7/07, Marty Alchin <gulo...@gamemusic.org> wrote:
>
> I don't want to be annoying, but I thought I'd ping this question,
> because I'd like to get some feedback in time to have pluggable
> FileField backends ready for consideration before (or during) the
> sprint next week.

Could you post your current patch to Trac? I'd find it a lot easier to
comment if I could look at some code. Using an object to represent the
file rather than a bunch of special model methods definitely seems
like the right direction here.

Joseph

Marty Alchin

unread,
Sep 7, 2007, 4:17:59 PM9/7/07
to django-d...@googlegroups.com
On 9/7/07, Joseph Kocherhans <jkoch...@gmail.com> wrote:
> Could you post your current patch to Trac? I'd find it a lot easier to
> comment if I could look at some code. Using an object to represent the
> file rather than a bunch of special model methods definitely seems
> like the right direction here.

I'll be posting the patch in a couple hours, once I get to a PC with a
proper diff tool. And I'm not so much concerned with whether or not I
should be setting up a new object, as much as the fact the exact
functionality provided by get_FIELD_filename() would be going away in
my current implementation. Hopefully it will make some more sense once
I get the patch checked in.

-Gul

Joseph Kocherhans

unread,
Sep 7, 2007, 4:26:51 PM9/7/07
to django-d...@googlegroups.com
On 9/7/07, Marty Alchin <gulo...@gamemusic.org> wrote:
>
> I'll be posting the patch in a couple hours, once I get to a PC with a
> proper diff tool. And I'm not so much concerned with whether or not I
> should be setting up a new object, as much as the fact the exact
> functionality provided by get_FIELD_filename() would be going away in
> my current implementation. Hopefully it will make some more sense once
> I get the patch checked in.

Awesome. Would it make sense in your implementation to have a
get_filename method on the new file object that mimics
get_FIELD_filename functionality, but returns None for things that
don't exist on the filesystem?

Joseph

Marty Alchin

unread,
Sep 7, 2007, 4:32:37 PM9/7/07
to django-d...@googlegroups.com
I could do that, yes, I just worry that people might use it without
the proper checks, then have problems when things break. Especially
when (I expect) the most common use of get_FIELD_filename(), opening a
file for reading/writing, has a much better, and backend-agnostic
method available. I'd much rather encourage people to use that, unless
there's a compelling reason to have filename access, other than that.

I'm heading home now, so I'll have the patch up in an hour or two.

-Gul

On 9/7/07, Joseph Kocherhans <jkoch...@gmail.com> wrote:
>

Jacob Kaplan-Moss

unread,
Sep 7, 2007, 5:13:39 PM9/7/07
to django-d...@googlegroups.com
On 9/7/07, Marty Alchin <gulo...@gamemusic.org> wrote:
> I could do that, yes, I just worry that people might use it without
> the proper checks, then have problems when things break. Especially
> when (I expect) the most common use of get_FIELD_filename(), opening a
> file for reading/writing, has a much better, and backend-agnostic
> method available. I'd much rather encourage people to use that, unless
> there's a compelling reason to have filename access, other than that.

I think, though, for the sake of backwards-compat we shouldn't remove
get_FIELD_filename() -- storing files on the filesystem is going to be
by far the common case; let's not make those doing the common thing
suffer.

I'd say the right approach would be:

* Change the docs to indicate that FileBackend.open(mode) is the right
way to read/write files.

* Make Model.get_FIELD_filename() simply call FileBackend.get_filename().

* Make backends for which filenames don't make sense raise an
exception when used with get_FIELD_filename()

Oh, and I don't know if this is on the list, but whatever you do
*please* don't break Model.get_FIELD_url(); I can't even count the
number of places that would bite me.

Jacob

Marty Alchin

unread,
Sep 7, 2007, 10:12:00 PM9/7/07
to django-d...@googlegroups.com
On 9/7/07, Jacob Kaplan-Moss <jacob.ka...@gmail.com> wrote:
> I think, though, for the sake of backwards-compat we shouldn't remove
> get_FIELD_filename() -- storing files on the filesystem is going to be
> by far the common case; let's not make those doing the common thing
> suffer.

Well, get_FIELD_filename() is still present, it just issues a DeprecationWarning

>
> I'd say the right approach would be:
>
> * Change the docs to indicate that FileBackend.open(mode) is the right
> way to read/write files.
>
> * Make Model.get_FIELD_filename() simply call FileBackend.get_filename().

Well, FileSystemBackend._get_absolute_path(), to be accurate. :)

> * Make backends for which filenames don't make sense raise an
> exception when used with get_FIELD_filename()

They'll raise an AttributeError, since _get_absolute_path() isn't
implemented. I can have it raise something more specific if necessary
though.

> Oh, and I don't know if this is on the list, but whatever you do
> *please* don't break Model.get_FIELD_url(); I can't even count the
> number of places that would bite me.

Well, none of the get_FIELD_* functions are gone in this patch. They
all redirect to the new implementation, issuing a DeprecationWarning
along the way.

The patch isn't quite ready yet after all. Once I got the code home
and started to plug it into a proper trunk checkout, it's taking a bit
more effort than I expected to get it to work properly. I'd rather
make sure it does in fact work right before I check it in.

-Gul

Marty Alchin

unread,
Sep 7, 2007, 11:14:19 PM9/7/07
to django-d...@googlegroups.com
On 9/7/07, Marty Alchin <gulo...@gamemusic.org> wrote:
> The patch isn't quite ready yet after all. Once I got the code home
> and started to plug it into a proper trunk checkout, it's taking a bit
> more effort than I expected to get it to work properly. I'd rather
> make sure it does in fact work right before I check it in.

Well, that was easier to fix than I thought. I've put in ticket #5361
with the (rather lengthy) patch. This is designed to be completely
backward-compatible, so existing code should work unmodified.
FileField now takes a new argument, "backend" which is an instance of
django.db.models.fields.backends.FileSystemBackend, or one of its
forthcoming cousings.

Each backend can take whatever arguments it needs, it just needs to
provide a set of methods that do the backend-specific work. Take a
look at FileSystemBackend for details on how those work, until I get
proper documentation written. These methods should be sufficient for
most uses, and to make sure it would suffice, I've started writing up
a few other backends (S3, MogileFS, WebDav, and even SVN!).

I don't have environments to test most of those, so I'm building them
just out of library documentation. Basically, I'm doing just enough to
make sure that the existing hooks would suffice to enable each of the
various backends.

I hope the source makes enough sense to give you guys a decent idea of
how I'm approaching this. I'm not happy with the organization of it,
but I'm not sure how best to organize everything. It seems like it'd
be better to break the file stuff into a new module, now that it's
considerably larger. I'm not sure though.

Please ask any questions and give any feedback, I'm anxious to hear
it. I'll have documentation ready shortly.

-Gul

Marty Alchin

unread,
Sep 8, 2007, 3:38:19 PM9/8/07
to django-d...@googlegroups.com

Doing some more thinking on the subject, it might make sense to move
the backends outside the django.db.models area, since they're really
not specific to that. Maybe something like django.core.filesystems?
Since, in theory, they could possibly be used for other things, like
maybe serving static files from MogileFS or something. I don't know,
I'm still not sold on one way to oranize everything.

-Gul

Jacob Kaplan-Moss

unread,
Sep 8, 2007, 8:02:20 PM9/8/07
to django-d...@googlegroups.com
On 9/8/07, Marty Alchin <gulo...@gamemusic.org> wrote:
> Doing some more thinking on the subject, it might make sense to move
> the backends outside the django.db.models area, since they're really
> not specific to that. Maybe something like django.core.filesystems?
> Since, in theory, they could possibly be used for other things, like
> maybe serving static files from MogileFS or something. I don't know,
> I'm still not sold on one way to oranize everything.

Agreed - my initial reaction to your code was "what are these doing in
django.db?"

I'd say something like django.core.filesystems would be a good idea
(though I'd call it django.core.filestorage and paint the bikeshed
yellow.)

Jacob

Marty Alchin

unread,
Sep 8, 2007, 8:19:08 PM9/8/07
to django-d...@googlegroups.com
On 9/8/07, Jacob Kaplan-Moss <jacob.ka...@gmail.com> wrote:
> I'd say something like django.core.filesystems would be a good idea
> (though I'd call it django.core.filestorage and paint the bikeshed
> yellow.)

I have pretty much zero preference on what it's called or where it
goes, so yellow sounds fine to me. I still think that __init__.py is
getting a bit unwieldy with the new File, ImageFile and FileProxy
classes, which aren't actually fields, but I couldn't think of a
better way to deal with that particular one.

I'm tempted to wonder if it might be time to break that file up a bit.
Perhaps split it up into a few different modules?

* text.py for CharField, EmailField, TextField, XMLField, etc.
* numbers.py for IntegerField, DecimalField, PhoneNumberField, etc.
* files.py for FileField, ImageField, and potentially AudioField if I
ever get working on it.

Obviously that wouldn't cover everything, so miscellaneous things like
BooleanField could still be in __init__.py along with the base Field
class.

I don't know, I'm probably crazy here, and I'm definitely changing the
subject, so feel free to ignore my comments on the issue.

-Gul

Marty Alchin

unread,
Sep 8, 2007, 11:35:10 PM9/8/07
to django-d...@googlegroups.com
I submitted a new patch, which moves the backend code into
django.core.filestorage and tweaks a few minor things. Most
importantly, this new patch includes documentation in the form of a
new document, files.txt, and some minor updates to model-api.txt and
db-api.txt.

The documentation probably isn't 100% understandable, since I'm so
entrenched in it, I'm probably making some assumptions that I
shouldn't be. I appreciate feedback on any part of the patch, but
especially the documentaiton. Hopefully I documented it well enough to
at least help you guys ask important questions.

-Gul

Reply all
Reply to author
Forward
0 new messages