**Expected**: File should present in open state (`seekable()` should be
`True`)
Below is a reproducible example and also contains version details:
{{{
Python 2.7.12 (default, Dec 4 2017, 14:50:18)
In [18]: import django
In [19]: django.VERSION
Out[19]: (1, 11, 11, u'final', 0)
In [20]: from django.http.request import QueryDict
In [21]: from django.core.files.uploadedfile import TemporaryUploadedFile
In [22]: d = QueryDict(mutable=True)
In [23]: f = TemporaryUploadedFile('test.jpg', 'image/jpeg', 100, 'utf-8')
In [24]: f.seekable()
Out[24]: True
In [25]: d.appendlist('image', f)
In [26]: d['image'].seekable()
Out[25]: True
In [27]: c = d.copy()
In [28]: c['image'].seekable()
Out[28]: False
In [30]:
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/29510>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
* component: Uncategorized => File uploads/storage
* stage: Unreviewed => Accepted
Comment:
I can reproduce with Python 2, even if I'm unsure that the behavior is
incorrect:
{{{
>>> from django.core.files.uploadedfile import TemporaryUploadedFile
>>> from copy import deepcopy
>>> f = TemporaryUploadedFile('test.jpg', 'image/jpeg', 100, 'utf-8')
f.seekable()
True
a = deepcopy(f)
>>> a.seekable()
False
}}}
However, the `deepcopy()` crashes on Python 3 with:
`TypeError: cannot serialize '_io.BufferedRandom' object`
Accepting for further investigation.
--
Ticket URL: <https://code.djangoproject.com/ticket/29510#comment:1>
Old description:
New description:
When uploaded file size is greater than `FILE_UPLOAD_MAX_MEMORY_SIZE`,
Django uses `TemporaryUploadedFile` to represent the file object. However,
when executing `.copy()` on a `QueryDict` containing such a file, the
returned object has the file but it is in closed state (`seekable()` is
`False`).
**Expected**: File should be present in open state (`seekable()` should be
`True`)
Below is a reproducible example and also contains version details:
{{{
Python 2.7.12 (default, Dec 4 2017, 14:50:18)
In [18]: import django
In [19]: django.VERSION
Out[19]: (1, 11, 11, u'final', 0)
In [20]: from django.http.request import QueryDict
In [21]: from django.core.files.uploadedfile import TemporaryUploadedFile
In [22]: d = QueryDict(mutable=True)
In [23]: f = TemporaryUploadedFile('test.jpg', 'image/jpeg', 100, 'utf-8')
In [24]: f.seekable()
Out[24]: True
In [25]: d.appendlist('image', f)
In [26]: d['image'].seekable()
Out[25]: True
In [27]: c = d.copy()
In [28]: c['image'].seekable()
Out[28]: False
In [30]:
}}}
--
--
Ticket URL: <https://code.djangoproject.com/ticket/29510#comment:2>
* cc: Jeff (added)
* owner: nobody => Jeff
* status: new => assigned
--
Ticket URL: <https://code.djangoproject.com/ticket/29510#comment:3>
Comment (by Jeff):
This is not the reported issue, however:
In python 3, the `copy.deepcopy` function is not able to copy
`TemporaryUploadedFiles` because python's `tempfile.TemporaryFile` is not
serializable. I'm not certain how serious the issue is, or if we want to
fix it, but I believe this makes QueryDicts unable to be copied if it
contains a `TemporaryUploadedFile` obj.
--
Ticket URL: <https://code.djangoproject.com/ticket/29510#comment:4>
* owner: Jeff => (none)
* status: assigned => new
--
Ticket URL: <https://code.djangoproject.com/ticket/29510#comment:5>
* cc: Herbert Fortes (added)
--
Ticket URL: <https://code.djangoproject.com/ticket/29510#comment:6>
* owner: (none) => Paulo
* status: new => assigned
--
Ticket URL: <https://code.djangoproject.com/ticket/29510#comment:7>
Comment (by Paulo):
The underlying issue here is a change in Python attempting to pickle the
file.
On python 2.7, I can reproduce the reported issue:
{{{
In [19]: file = tempfile.NamedTemporaryFile(suffix='.upload')
In [20]: copy.deepcopy(file)
Out[20]: <closed file '<uninitialized file>', mode '<uninitialized file>'
at 0x7f45f2d0b420>
}}}
On Python 3.6 I can't reproduce the original issue, instead I get Tim's
issue:
{{{
In [17]: file = tempfile.NamedTemporaryFile(suffix='.upload')
In [18]: copy.deepcopy(file)
---------------------------------------------------------------------------
TypeError Traceback (most recent call
last)
<ipython-input-18-36f4ecfb14c1> in <module>
----> 1 copy.deepcopy(file)
/usr/local/lib/python3.6/copy.py in deepcopy(x, memo, _nil)
167 reductor = getattr(x, "__reduce_ex__", None)
168 if reductor:
--> 169 rv = reductor(4)
170 else:
171 reductor = getattr(x, "__reduce__", None)
/usr/local/lib/python3.6/tempfile.py in func_wrapper(*args, **kwargs)
483 @_functools.wraps(func)
484 def func_wrapper(*args, **kwargs):
--> 485 return func(*args, **kwargs)
486 # Avoid closing the file as long as the wrapper is
alive,
487 # see issue #18879.
TypeError: cannot serialize '_io.BufferedRandom' object
}}}
Personally I think the 2.7 issue is not as bad as the 3.x since it at
least allows for the copy to proceed (while silently closing the file).
My suggestion would be to preemptively check that the objects being copied
are serializable or to catch the TypeError and raise a different error
indicating the user that the query dict contains a non-seralizable value
(like file, etc..)
--
Ticket URL: <https://code.djangoproject.com/ticket/29510#comment:8>
Comment (by Claude Paroz):
Did you explore possible handling of such file objects in
`QueryDict.__deepcopy__`?
--
Ticket URL: <https://code.djangoproject.com/ticket/29510#comment:9>
Comment (by Paulo):
Replying to [comment:9 Claude Paroz]:
> Did you explore possible handling of such file objects in
`QueryDict.__deepcopy__`?
To keep the file on copy?
If so then yes, we can check value for UploadedFile and create a new
instance that points to the same underlying file (although I guess that's
more shallow than deep copy).
To have fully deep copy we'd have to create new tmp if is on disk or
somehow copy the file stream.
That said, I think "soft" copy should be enough and is more expected
behavior for the file itself.
My concern with this approach was if we have any other non file values
that are not serializable, would this open the door to having to handle
those too?
Guess not since there's only so many types that are handled by QueryDict.
--
Ticket URL: <https://code.djangoproject.com/ticket/29510#comment:10>
* owner: Paulo => Dan Madere
--
Ticket URL: <https://code.djangoproject.com/ticket/29510#comment:11>
Comment (by Dan Madere):
Replying to [comment:10 Paulo]:
> Replying to [comment:9 Claude Paroz]:
> > Did you explore possible handling of such file objects in
`QueryDict.__deepcopy__`?
>
> To keep the file on copy?
> If so then yes, we can check value for UploadedFile and create a new
instance that points to the same underlying file (although I guess that's
more shallow than deep copy).
> To have fully deep copy we'd have to create new tmp if is on disk or
somehow copy the file stream.
After trying it, I don't think either of these suggestions is workable,
considering the nature of `TemporaryUploadedFile`. It has logic where the
file is deleted as soon as it's closed. When copying/streaming the old
file's content to the new one, we have a problem where the new file
deletes itself immediately.
I stepped back, and pondered why are people copying a `QueryDict` anyway,
and what do they expect to happen to a `TemporaryUploadedFile` inside?
[[https://docs.djangoproject.com/en/dev/ref/request-response/#querydict-
objects]]:
{{{
class QueryDict¶
In an HttpRequest object, the GET and POST attributes are instances of
django.http.QueryDict, a dictionary-like class customized to deal with
multiple values for the same key. This is necessary because some HTML form
elements, notably <select multiple>, pass multiple values for the same
key.
The QueryDicts at request.POST and request.GET will be immutable when
accessed in a normal request/response cycle. To get a mutable version you
need to use QueryDict.copy().
}}}
That leads me to think that people are copying a `QueryDict` because it's
a nice starting point to have all the query params organized, but they
want to modify it, and pass to the template. Doing so would result in a
`This QueryDict instance is immutable` exception, and the advice above, so
they try to copy the `QueryDict` before modifying it.
I propose we omit `TemporaryUploadedFile` values on deep copy of a
`QueryDict`, and will open a PR for feedback.
--
Ticket URL: <https://code.djangoproject.com/ticket/29510#comment:12>
* has_patch: 0 => 1
Comment:
[https://github.com/django/django/pull/16215 PR]
--
Ticket URL: <https://code.djangoproject.com/ticket/29510#comment:13>
Comment (by Carlton Gibson):
> I stepped back, and pondered why are people copying a QueryDict
anyway...
This seems like the right thought to me. 🤔
I have to admit I'm struggling to see the use-case — Folks would have to
be making a copy of `request.FILES` for this to come up no? Why would you
want a mutable copy of that? (Maybe more ☕️ is needed :)
Copying `request.GET` in order to edit a query string makes sense.
Uploaded files though... 🤔
(Sorry if I'm being dense)
--
Ticket URL: <https://code.djangoproject.com/ticket/29510#comment:14>
Comment (by Dan Madere):
Replying to [comment:14 Carlton Gibson]:
> > I stepped back, and pondered why are people copying a QueryDict
anyway...
>
> This seems like the right thought to me. 🤔
>
> I have to admit I'm struggling to see the use-case — Folks would have to
be making a copy of `request.FILES` for this to come up no? Why would you
want a mutable copy of that? (Maybe more ☕️ is needed :)
Thanks Carlton!
Not necessarily, it could also be making a copy of `request.POST`, which
surprisingly includes the files too. My theory is that most people believe
that copying it only includes the query params, because that's the useful
part to append more data to, and pass to the template.
Regardless of the use case though, it seems impossible to me to copy a
`TemporaryUploadedFile`. I don't see any reason people would expect it to
actually copy the file.
--
Ticket URL: <https://code.djangoproject.com/ticket/29510#comment:15>
Comment (by Carlton Gibson):
Hey Dan — just to let you know, I'm looking at the request object now,
with this half in mind. Will follow-up once I've done that. Thanks
--
Ticket URL: <https://code.djangoproject.com/ticket/29510#comment:16>
Comment (by anton-kazlouski):
This is exactly what happened in my case. I have a code that modifies some
attributes of `request.data` and then passes it to a serializer. At some
point we started to notice `This QueryDict instance is immutable`
exceptions in our logs. Thus, decided to use built-in method which you are
referenced to. Once, replaced direct modifications `request.data` with
`request.data.copy()`, `TypeError: cannot pickle '_io.BufferedRandom'
object on...` started to araise.
Replying to [comment:12 Dan Madere]:
>
> After trying it, I don't think either of these suggestions is workable,
considering the nature of `TemporaryUploadedFile`. It has logic where the
file is deleted as soon as it's closed. When copying/streaming the old
file's content to the new one, we have a problem where the new file
deletes itself immediately.
>
> I stepped back, and pondered why are people copying a `QueryDict`
anyway, and what do they expect to happen to a `TemporaryUploadedFile`
inside? [[https://docs.djangoproject.com/en/dev/ref/request-response
/#querydict-objects]]:
>
> {{{
> class QueryDict¶
>
> In an HttpRequest object, the GET and POST attributes are instances of
django.http.QueryDict, a dictionary-like class customized to deal with
multiple values for the same key. This is necessary because some HTML form
elements, notably <select multiple>, pass multiple values for the same
key.
>
> The QueryDicts at request.POST and request.GET will be immutable when
accessed in a normal request/response cycle. To get a mutable version you
need to use QueryDict.copy().
> }}}
>
> That leads me to think that people are copying a `QueryDict` because
it's a nice starting point to have all the query params organized, but
they want to modify it, and pass to the template. Doing so would result in
a `This QueryDict instance is immutable` exception, and the advice above,
so they try to copy the `QueryDict` before modifying it.
>
> I propose we omit `TemporaryUploadedFile` values on deep copy of a
`QueryDict`, and will open a PR for feedback.
--
Ticket URL: <https://code.djangoproject.com/ticket/29510#comment:17>
Comment (by Carlton Gibson):
Hey Dan. Sorry for the pause. Getting there now.
Can I ask you to clarify this:
> Not necessarily, it could also be making a copy of request.POST, which
surprisingly includes the files too.
I'm looking at `% ./runtests.py file_uploads -k test_simple_upload
--parallel=1` and `% ./runtests.py file_uploads -k test_large_upload
--parallel=1` and inspecting the request for the views:
{{{
(django) carlton@Carltons-MacBook-Pro tests % ./runtests.py file_uploads
-k test_simple_upload --parallel=1
Testing against Django installed in
'/Users/carlton/Projects/Django/django/django'
Found 1 test(s).
Creating test database for alias 'default'...
System check identified no issues (0 silenced).
<WSGIRequest: POST '/upload/'>
POST:
<QueryDict: {'name': ['Ringo']}>
FILES:
<MultiValueDict: {'file_field': [<InMemoryUploadedFile: tests.py
(text/x-python)>]}>
>
/Users/carlton/Projects/Django/django/tests/file_uploads/views.py(31)file_upload_view()
-> form_data = request.POST.copy()
(Pdb) interact
*interactive*
>>> request.POST.copy()
<QueryDict: {'name': ['Ringo']}>
>>> request.FILES.copy()
<MultiValueDict: {'file_field': [<InMemoryUploadedFile: tests.py
(text/x-python)>]}>
>>> request.FILES.copy()["file_field"]
<InMemoryUploadedFile: tests.py (text/x-python)>
>>> request.FILES.copy()["file_field"].seekable()
True
}}}
And the same here:
{{{
(django) carlton@Carltons-MacBook-Pro tests % ./runtests.py file_uploads
-k test_large_upload --parallel=1
Testing against Django installed in
'/Users/carlton/Projects/Django/django/django'
Found 1 test(s).
Creating test database for alias 'default'...
System check identified no issues (0 silenced).
<WSGIRequest: POST '/verify/'>
POST:
<QueryDict: {'name': ['Ringo'], 'name_hash':
['a6d6d397a391d876d5aae28cdca9f25071906022'], 'file_field1_hash':
['b84b7b775430232c2ee4e2216db1a591d97b2db0'], 'file_field2_hash':
['fbcf384f1c96e3d24fb25b749b00241a5cc4690f']}>
FILES:
<MultiValueDict: {'file_field1': [<TemporaryUploadedFile:
tmpx56phiic.file1 (application/octet-stream)>], 'file_field2':
[<TemporaryUploadedFile: tmp7ipmyal8.file2 (application/octet-stream)>]}>
>
/Users/carlton/Projects/Django/django/tests/file_uploads/views.py(59)file_upload_view_verify()
-> form_data = request.POST.copy()
(Pdb) interact
*interactive*
>>> request.POST.copy()
<QueryDict: {'name': ['Ringo'], 'name_hash':
['a6d6d397a391d876d5aae28cdca9f25071906022'], 'file_field1_hash':
['b84b7b775430232c2ee4e2216db1a591d97b2db0'], 'file_field2_hash':
['fbcf384f1c96e3d24fb25b749b00241a5cc4690f']}>
>>> request.FILES.copy()
<MultiValueDict: {'file_field1': [<TemporaryUploadedFile:
tmpx56phiic.file1 (application/octet-stream)>], 'file_field2':
[<TemporaryUploadedFile: tmp7ipmyal8.file2 (application/octet-stream)>]}>
>>> request.FILES.copy()["file_field1"]
<TemporaryUploadedFile: tmpx56phiic.file1 (application/octet-stream)>
>>> request.FILES.copy()["file_field1"].seekable()
True
}}}
So, it's neither the case that `request.POST` includes the files data nor
that copying `request.FILES` leads to the error. 🤔
I do get the error creating the `TemporaryUploadedFile` by-hand, so I need
to see what's different about the upload flow.
If you were able to provide a (minimal but) full reproduce, it might save
a cycle.
Thanks!
--
Ticket URL: <https://code.djangoproject.com/ticket/29510#comment:18>
Comment (by Carlton Gibson):
Presumably you're using [https://www.django-rest-framework.org/api-
guide/requests/#data DRF's request.data] which "includes all parsed
content, including file and non-file inputs."
`request.POST` is still available there that gives you the QueryDict
without the file data.
(Still looking at how I'm able to copy the `TemporaryUploadedFile` in the
test case.)
--
Ticket URL: <https://code.djangoproject.com/ticket/29510#comment:19>
* status: assigned => closed
* resolution: => invalid
Comment:
So (ruling out the test client) I've experimented with a real-view,
uploading a test file with some POST data as well:
{{{
from django.urls import path
from django.http import HttpResponse
import pprint
# With settings to ensure TemporaryUploadedFile:
# FILE_UPLOAD_MAX_MEMORY_SIZE=10
def file_upload_view(request):
pprint.pprint(request)
print("\nPOST:")
pprint.pprint(request.POST)
print("\nFILES:")
pprint.pprint(request.FILES)
f = request.FILES.copy()
assert f["file1"].seekable()
return HttpResponse("Got it!")
urlpatterns = [
path("upload", file_upload_view),
]
}}}
This passes without issue:
{{{
<WSGIRequest: POST '/upload'>
POST:
<QueryDict: {'key1': ['value1'], 'key2': ['value2']}>
FILES:
{'file1': <TemporaryUploadedFile: Large Test File.txt (text/plain)>}
}}}
Note the...
{{{
f = request.FILES.copy()
assert f["file1"].seekable()
}}}
I've tested back to Python 3.9 and Django 3.2.
The reason this works is that `FILES` is a `MultiValueDict` not a
`QueryDict`:
{{{
>>> request.FILES.copy()
<MultiValueDict: {'file_field1': [<TemporaryUploadedFile:
tmpx56phiic.file1 (application/octet-stream)>], 'file_field2':
[<TemporaryUploadedFile: tmp7ipmyal8.file2 (application/octet-stream)>]}>
}}}
...and so `copy()` uses `copy.copy()` rather than `copy.deepcopy()`.
If you try and `deepcopy()` the `TemporaryUploadedFile` you indeed get
the error, but that's not what Django does.
--
Ticket URL: <https://code.djangoproject.com/ticket/29510#comment:20>