FileField uploads accumulating in FILE_UPLOAD_TEMP_DIR

1,516 views
Skip to first unread message

Scott White

unread,
Mar 4, 2013, 5:00:02 PM3/4/13
to django-res...@googlegroups.com
Hi again,

I've noticed an intermittent issue using DRF for uploading files which exceed Django's FILE_UPLOAD_MAX_MEMORY_SIZE (defaults to 2.5MB). Here's a link to Django's help topic on managing files:


My server setup / version info:

- Ubuntu Server 12.04 LTS 
- Apache 2.2.22
- mod_wsgi 3.3
- Python 2.7.3
- Django 1.4
- Django REST Framework 2.2.1

When uploading a files >2.5MB via a regular Django view or the Django Admin, the file uploads and the temp copy in /tmp is removed. When I upload using django-rest-framework via POST, the file uploads but the temporary copy remains in /tmp. And possibly related...I have a REST client script to upload multiple files and I notice that roughly 1 in 10 uploads return a bad HTTP response. After repeating the same group of files numerous times, it is not always the same file, seems random. When I get the bad response, the model instance is saved properly, however I'm also attempting to save some related records to another table, and these are not getting created. I'm overriding post in a ListCreateAPIView to create these related records. These related records are created based on some of the file's content, thus require opening the file after it is saved.

My first thought as to what is going on, is that some process still has the temp file open and it cannot be deleted, and perhaps causes some issue opening the file for the related record stuff. I've experimented with disabling the post override (thus not creating the related records) and I seem to avoid the bad responses, but the temp files are still collecting in /tmp. The temp files are also still open in some apache2 process verified using lsof.

I've perused the Django code related to saving the temp files. The files get named and saved in the TemporaryUploadedFile class in django/core/files/uploadedfile.py. The temp files should get moved to the model's 'upload_to' location from /tmp (Unix) in the _save method of the FileSystemStorage class in django/core/files/storage.py.

I also looked through the DRF code to see if there were any cases where the uploaded file is opened or read and couldn't find anything. I'm stumped at this point. Any ideas?

Thanks,
Scott

Scott White

unread,
Mar 4, 2013, 5:03:03 PM3/4/13
to django-res...@googlegroups.com
Oops, I meant to include the link to the File Uploads document here:

Scott White

unread,
Mar 4, 2013, 6:41:50 PM3/4/13
to django-res...@googlegroups.com
And, here's the stack trace from pdb at the line after the temp file was created:

  /srv/django-projects/ReFlow/reflow/wsgi.py(64)__call__()
-> return self.__object(*args, **kwargs)
  /usr/local/lib/python2.7/dist-packages/django/core/handlers/wsgi.py(241)__call__()
-> response = self.get_response(request)
  /usr/local/lib/python2.7/dist-packages/django/core/handlers/base.py(111)get_response()
-> response = callback(request, *callback_args, **callback_kwargs)
  /usr/local/lib/python2.7/dist-packages/django/views/generic/base.py(48)view()
-> return self.dispatch(request, *args, **kwargs)
  /usr/local/lib/python2.7/dist-packages/django/utils/decorators.py(25)_wrapper()
-> return bound_func(*args, **kwargs)
  /usr/local/lib/python2.7/dist-packages/django/views/decorators/csrf.py(77)wrapped_view()
-> return view_func(*args, **kwargs)
  /usr/local/lib/python2.7/dist-packages/django/utils/decorators.py(21)bound_func()
-> return func(self, *args2, **kwargs2)
  /srv/django-projects/ReFlow/repository/api_views.py(55)dispatch()
-> return super(LoginRequiredMixin, self).dispatch(request, *args, **kwargs)
  /usr/local/lib/python2.7/dist-packages/django/views/decorators/csrf.py(77)wrapped_view()
-> return view_func(*args, **kwargs)
  /usr/local/lib/python2.7/dist-packages/rest_framework/views.py(384)dispatch()
-> if request.method.lower() in self.http_method_names:
  /usr/local/lib/python2.7/dist-packages/rest_framework/request.py(114)method()
-> self._load_method_and_content_type()
  /usr/local/lib/python2.7/dist-packages/rest_framework/request.py(234)_load_method_and_content_type()
-> self._perform_form_overloading()
  /usr/local/lib/python2.7/dist-packages/rest_framework/request.py(275)_perform_form_overloading()
-> self._data = self._request.POST
  /usr/local/lib/python2.7/dist-packages/django/core/handlers/wsgi.py(180)_get_post()
-> self._load_post_and_files()
  /usr/local/lib/python2.7/dist-packages/django/http/__init__.py(367)_load_post_and_files()
-> self._post, self._files = self.parse_file_upload(self.META, data)
  /usr/local/lib/python2.7/dist-packages/django/http/__init__.py(327)parse_file_upload()
-> return parser.parse()
  /usr/local/lib/python2.7/dist-packages/django/http/multipartparser.py(190)parse()
-> charset)
  /usr/local/lib/python2.7/dist-packages/django/core/files/uploadhandler.py(135)new_file()
-> self.file = TemporaryUploadedFile(self.file_name, self.content_type, 0, self.charset)
> /usr/local/lib/python2.7/dist-packages/django/core/files/uploadedfile.py(67)__init__()
-> super(TemporaryUploadedFile, self).__init__(file, name, content_type, size, charset) 

Tom Christie

unread,
Mar 5, 2013, 8:32:28 AM3/5/13
to django-res...@googlegroups.com
Hi Scott,

  REST framework's file upload machinery is essentially the same as standard Django, inasmuch as it reuses the standard MultiPartParser, so I'm not entirely sure where to start with this.  It's feasible that there's some difference in how file fields are dealt with when ModelSerializer.save() is called versus how they're dealt with when ModelForm.save() is called, so there might be something there. 

Also worth noting this message to django-users, titled 'TemporaryFileUploadHandler leaving tempfiles behind on disk'.  No idea if it's relevant or not, but worth taking a look.

In order to be able to help further, would need the problem to be narrowed down a bit..

* Can you replicate the problem in a test environment, without the full setup of apache etc?
* Can you demonstrate a REST framework view and corresponding Django view that ought to do the same thing, that exhibit different behavior?

I'd recommend you drop down to using view you're writing yourself rather than using the generic views when trying to debug the issues you're seeing.

For example I'd start with trying to test against both a simple API view and a simple standard Django view, like so...

@api_view(['POST'])
def upload_api_view(request):
    print request.FILES
    print show_me_all_the_tempfiles()
    serializer = SimpleModelSerializer(request.DATA, request.FILES)
    serializer.save()
    print serializer.is_valid()
    print show_me_all_the_tempfiles()

def upload_standard_view(request):
    print request.FILES
    print show_me_all_the_tempfiles()
    form = SimpleModelForm(request.POST, request.FILES)
    print form.is_valid()
    form.save()
    print show_me_all_the_tempfiles()

Let us know where you get to. 

Cheers,

  Tom

Scott White

unread,
Mar 5, 2013, 6:59:23 PM3/5/13
to django-res...@googlegroups.com
Thanks Tom,

At first I didn't think the problem existed in the Django development server, however it seems the Django FILE_UPLOAD_TEMP_DIR setting is not set to anything by default on Macs (which I'm using for development). As a result, I never saw any issues with temporary files when debugging locally because all the uploads were in memory. After setting this explicitly in my settings.py to "/tmp" I can replicate the issue without the Ubuntu/Apache setup. This is good news as it's much easier to debug this way rather than using pdb in Apache's debug mode.

As you suggested, I looked at the differences in the Django ModelForm behaviour vs the REST framework's serializer. I found my model's clean method gets called twice from the REST framework and only once using a Django regular view via a ModelForm. I'm not sure this is related to the sticky temp file issue, but in the 2nd go round of clean, sometimes the file in the FileField is not readable anymore...dunno why yet. I observed this during one session where the SHA-1 hash I'm saving in clean was a hash of an empty string and not the file.

It looks like in rest_framework/serializers.py, the 2 calls to the clean method are both happening in the ModelSerializer's from_native() method:

1) Line 606: ModelSerializer.from_native()  -->  parent BaseSerializer.from_native()  -->  self.restore_object()  -->  ModelSerializer.restore_object() --> Model.full_clean()

2) Line 608: ModelSerializer.from_native()  -->  ModelSerializer.full_clean()  -->  Model.full_clean()

In the method comment for ModelSerializer.full_clean() there's this:

        Note that we don't perform this inside the `.restore_object()` method,
        so that subclasses can override `.restore_object()`, and still get
        the full_clean validation checking.

But, it seems .restore_object is calling instance.full_clean in line 595. I commented out the try block in restore_object and ran through a couple tests. Looks like everything works ok, and clean is only called once. The temp file still remained in /tmp however :(

Also, there's something in the Django code I do not understand. The only place I've found where the temporary file is moved to it's final location is within django/core/files/storage.py. This is in FileSystemStorage._save():

                # This file has a file path that we can move.
                if hasattr(content, 'temporary_file_path'):
                    file_move_safe(content.temporary_file_path(), full_path)
                    content.close()

When I set a breakdance point on the file_move_safe method call, it never gets triggered even when Django removes the temp file successfully using the ModelForm regular view. Eeks! If I set the break point on the if statement, code execution stops. Then I can see the 'temporary_file_path' attribute is not present, explaining why the method wasn't called, but not why the file was removed. Double-eeks! Even stranger is that just by setting the break point on the if statement, the temp file gets stuck in /tmp using the ModelForm based method...have I found a true Heisenbug? I digress, this part is more a question for django-users, and I plan on posting there so I have a better understanding of what is happening.

As to the double-clean method call, can you verify whether this was supposed to happen? Also, I will put together a small Django project that shows the behaviour if you still think it is useful.

Many many thanks,
Scott

Scott White

unread,
Mar 5, 2013, 7:31:05 PM3/5/13
to django-res...@googlegroups.com
Tom, sorry for the noise. Just noticed you removed the try block in a recent revision:


The double clean issue is moot at this point. I'll check the repo before posting next time. Still interested in any feedback/comments though.

Scott White

unread,
Mar 7, 2013, 4:34:35 PM3/7/13
to django-res...@googlegroups.com
Ok, reporting back on what I've learned so far. 

First, there were 2 issues going on. The new update to 2.2.2 (now 2.2.3) fixed the double clean call which made testing for me much easier. It also seems to reduce the likelihood that a temp file gets stuck.

I can now see that the difference in behaviour has to do with the Django regular view doing a redirect on POST. Similarly, the temp file may get joggled loose if you hit refresh on a project page after submitting a file via a REST client. Sometimes this doesn't work though. If you upload lots of files in succession and monitor /tmp you can see that a collection of files is allowed to exist and eventually most of them will get released.

So, the take home is that I don't think this is an issue with the REST framework, it just made the issue more obvious because of the lack of re-directs when posting files via a client.

I put together a simple Django project on Github to illustrate the issue:


I've included an extra setting CRAZY_HACK which is False by default. This setting controls whether a post_save signal is triggered to delete the temp file, as well as attempting to remove it if clean() fails (in the example code, this is if the file is a duplicate). It's a hack because I'm having to save an extra attribute to the UploadedFile model instance (self.temp_file_path). This is done in an overridden save() method prior to calling the parent save, since the FieldFile's file is no longer a TemporaryUploadedFile after the parent save. And, if I try to close the temp file before the parent save it screws up django/core/files/base.py with chunks() throwing "ValueError: I/O operation on closed file".

What is particularly worrisome is that an uploaded file will get into /tmp even if the model instance isn't saved due to some validation issue, etc.  Of course, if it's not in memory the file has to be uploaded somewhere but one would hope the temp file would get removed quickly by Django if the related request failed validation. All the more reason to have /tmp on it's own partition.

Also, a tip that the Unix watch command is a good tool for monitoring, you can give it a command and an interval to execute the command. To monitor /tmp every 0.3 second:

watch -n -0.3 ls /tmp

Well, even if it isn't an issue in the REST framework, hopefully the code is useful for someone as an example of uploading files. Let me know if anyone finds an issue or a better way to do something. I've also included an example REST client script using httplib for uploading files.

Regards,
Scott

Tom Christie

unread,
Mar 8, 2013, 5:31:36 AM3/8/13
to django-res...@googlegroups.com
Thanks Scott, that's incredibly comprehensive!

I'm not really sure how much of this is really Django's concern and how much is down to making sure your system has a sensibly setup temporary storage, but at any rate it's reassuring to confirm that it's not a difference of behavior between REST framework and regular Django.

Cheers,

  Tom

--
You received this message because you are subscribed to the Google Groups "django-rest-framework" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-rest-fram...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
 
 

Reply all
Reply to author
Forward
0 new messages