FileField and ImageField

369 views
Skip to first unread message

Cristiano Coelho

unread,
Mar 16, 2016, 6:16:37 PM3/16/16
to Django developers (Contributions to Django itself)
I am recently trying to make an aws S3 storage (I know there are a few libraries in there but I needed some customization). The storage works fine so far!

However, there are some implementation details on FileField and ImageField, in particular the function
generate_filename
https://github.com/django/django/blob/master/django/db/models/fields/files.py#L310

that is called before saving a model (and its file/image field). That code assumes the file uses standar file paths based on your current operative system. This creates quite some complications on S3 for example where all the paths to simulate folders uses a "/", so if you are in Windows and your file name is something like "hello/my/file.txt" it will overwrite your file name to be "hello\\my\\file.txt" which is something I don't want as I have explicitly created an upload_to callable to generate a very specific file name (and that it shouldn't be changed).
Looking deeper it seems like the whole FileField implementation relies on a directory and file structure, which might not be always right.

How easy would it be to add customization to this? Is your only option defining your own FileFilend and ImageField classes that override this method?

Cristiano Coelho

unread,
Mar 16, 2016, 9:16:00 PM3/16/16
to Django developers (Contributions to Django itself)
To add a bit more about this, it seems that FileField is really meant to be working with an OS file system, making it harder to use a custom Storage that sends data to somewhere like AWS S3 where basically everything is a file (there are no real folders, just key prefixes)

These 3 functions inside FileField are the culprits:

def get_directory_name(self):
       
return os.path.normpath(force_text(datetime.datetime.now().strftime(force_str(self.upload_to))))

   
def get_filename(self, filename):
       
return os.path.normpath(self.storage.get_valid_name(os.path.basename(filename)))

   
def generate_filename(self, instance, filename):
       
# If upload_to is a callable, make sure that the path it returns is
       
# passed through get_valid_name() of the underlying storage.
       
if callable(self.upload_to):
            directory_name
, filename = os.path.split(self.upload_to(instance, filename))
           
filename = self.storage.get_valid_name(filename)
           
return os.path.normpath(os.path.join(directory_name, filename))

       
return os.path.join(self.get_directory_name(), self.get_filename(filename))


They basically destroy any file name you give to it even with upload_to. This is not an issue on a storage that uses the underlying file system, but it might be quite an issue on different systems, in particular if file names are using slashes as prefixes.

So what I did was to override it a bit:

class S3FileField(FileField):
 
   
def generate_filename(self, instance, filename):
       
# If upload_to is a callable, make sure that the path it returns is
       
# passed through get_valid_name() of the underlying storage.
       
if callable(self.upload_to):
           
filename = self.upload_to(instance, filename)            
           
filename = self.storage.get_valid_name(filename)
           
return filename

       
return self.storage.get_valid_name(filename)

And all S3 issues gone! I wonder if this is the best way to do it. It would be great to have an additional keyword argument or something on the File (and image) fields to let the above functions know that they should not perform any OS operation on paths but seems like it would cause a lot of trouble.

Josh Smeaton

unread,
Mar 16, 2016, 10:41:58 PM3/16/16
to Django developers (Contributions to Django itself)
It seems like FileField should delegate some of these methods to an underlying Storage backend, no? I don't know what the implications to back-compat would be, but the idea seems like a sensible one to start with. The storage backend API may need to grow some additional methods to verify/validate paths and filenames or it might already have the correct methods needed for FileField to work. Fields should do all of their path/storage IO via their storage object though.

Cristiano Coelho

unread,
Mar 16, 2016, 11:30:43 PM3/16/16
to Django developers (Contributions to Django itself)
I agree with you, generate_filename should just call the field upload_to and then delegate the whole name generation to the storage.

There's another thing about file storage that is troubling me: https://github.com/django/django/blob/master/django/core/files/storage.py#L57

The docs state you are not suposed to override the save method and just implement _save, however, doing a complete random rename at the send pretty much forces you to always override save instead. That name replacement is probably to save the file name in a way you can easily create the URL, but again, that should be delegated to the storage.url method shouldn't it?

There's one more thing I noticed (gonna open a ticket about this one) is that the proxy class for FileField (FieldFile, what a name!) states in the docs that in order to call its save method you need a django File object rather than a simple file-like object, I can understand that's because the save method uses the .size property (https://github.com/django/django/blob/master/django/db/models/fields/files.py#L92) to save the file size into a _size variable. It doesn't seem anywhere in the code the _size is being used as size is a property that gets the file size either from the storage class or the actual file. So removing that line, could also allow us to use normal files in the save method.

Josh Smeaton

unread,
Mar 17, 2016, 12:11:34 AM3/17/16
to Django developers (Contributions to Django itself)
It seems like you have a pretty good grasp on the problems and possible solutions. I'm not familiar enough with storage backends and FileFields to provide much guidance so hopefully someone else can comment. Regarding the linked line in the save method which forces forward slashes, it seems that the FileSystemStorage should be doing that, rather than the base Storage class. It looks like Storage tries to do too much, so pushing some behaviour back down to FileSystemStorage seems correct. Again though, we have to be wary of backwards compatibility. You could offer opt-in to more correct behaviour while deprecating the status quo. Something like a `normalize_slashes = True` that can be flipped to False for brand new code.

Cristiano Coelho

unread,
Mar 20, 2016, 10:02:36 PM3/20/16
to Django developers (Contributions to Django itself)
I have done some work here https://github.com/cristianocca/django/commit/4be936ba3031f95d34101a919648f4e8f7d7856a
And opened a ticket here https://code.djangoproject.com/ticket/26382
If someone's willing to take a look.
Reply all
Reply to author
Forward
0 new messages