--
Ticket URL: <http://code.djangoproject.com/ticket/16022>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
* needs_better_patch: => 0
* needs_tests: => 0
* needs_docs: => 0
Comment:
Please disregard the patch, it breaks file uploads. I'm trying to improve
it at the moment (and suggestions are much appreciated).
--
Ticket URL: <http://code.djangoproject.com/ticket/16022#comment:1>
Comment (by Gustavo):
I've tried replacing weakref.ref() with weakref.proxy(), which does work
but then memory keeps growing as usual. I think a proper way to fix it is
to break the cyclic reference, but that'd be backwards incompatible.
I cannot afford spending more time on this, so I'll leave it to someone
else. I'm happy to test any eventual fix though. In the meantime, we'll be
using mod_wsgi's ''maximum-requests'' to mitigate the consequences of this
problem.
A hint to whoever will have a look at this: Try setting ''self.instance''
to ''None'' in ''!FieldFile.!__init!__'' and you won't notice any memory
growth when you query many records. This breaks creation, modification and
deletion of files, but helps demonstrate that this is the problematic
reference.
--
Ticket URL: <https://code.djangoproject.com/ticket/16022#comment:2>
* keywords: => memory leak
* has_patch: 1 => 0
--
Ticket URL: <https://code.djangoproject.com/ticket/16022#comment:3>
--
Ticket URL: <https://code.djangoproject.com/ticket/16022#comment:4>
Comment (by aaugustin):
Python's gc is supposed to be able to collect reference cycles these days,
unless there are `__del__` methods. I didn't see any that could by linked
with this bug.
--
Ticket URL: <https://code.djangoproject.com/ticket/16022#comment:5>
* status: new => closed
* resolution: => worksforme
Comment:
OK, I tried to reproduce this seriously. I created a project called
`madupload` and an app called `receiver` inside.
{{{
madupload
├── __init__.py
├── manage.py
├── receiver
│ ├── __init__.py
│ ├── models.py
│ ├── views.py
├── settings.py
└── urls.py
}}}
`receiver/models.py`:
{{{
from django import forms
from django.db import models
class UploadModel(models.Model):
upload = models.FileField()
class UploadForm(forms.ModelForm):
class Meta:
model = UploadModel
}}}
`receiver/views.py`:
{{{
from django.http import HttpResponse
from django.views.decorators.csrf import csrf_exempt
from .models import UploadForm
@csrf_exempt
def upload(request):
form = UploadForm(request.POST, request.FILES)
try:
model = form.save(commit=False)
return HttpResponse("Saved %i bytes.\n" % model.upload.size,
status=201, content_type='text/plain')
except ValueError:
return HttpResponse(repr(form.errors) + "\n",
status=400, content_type='text/plain')
}}}
`urls.py`:
{{{
from django.conf.urls.defaults import patterns, url
urlpatterns = patterns('',
url(r'^$', 'receiver.views.upload'),
)
}}}
All other files are unchanged from Django's template.
Then I launched `runserver` and started uploading:
{{{
myk@mYk madupload % dd if=/dev/random of=1mb.bin bs=1024 count=1024
1024+0 records in
1024+0 records out
1048576 bytes transferred in 0.146655 secs (7149958 bytes/sec)
myk@mYk madupload % while true; do curl -F "upload=@1mb.bin"
http://localhost:8000; done
Saved 1048576 bytes.
Saved 1048576 bytes.
Saved 1048576 bytes.
...
}}}
I've let it run more than one hundred of uploads, and the memory footprint
of the Python process handling `runserver` is flat.
At this point, I have proven that the cyclic reference in !FileField is
not a problem in itself.
I still don't know what's causing your bug. You may have a global variable
referencing your !FileField objects somewhere.
--
Ticket URL: <https://code.djangoproject.com/ticket/16022#comment:6>
* status: closed => reopened
* resolution: worksforme =>
Comment:
Hello, aaugustin.
Thank you very much for looking into this!
I forgot to add a useful piece of information here, which I mentioned on
[http://groups.google.com/group/django-
users/browse_thread/thread/804b4fd7a3d126fb Google Groups]: Memory
increases considerably when you query a model that has a !FileField. I'm
sorry about that.
To reproduce this, you could create, say, 2000 records and iterate over
them like this inside a Django view:
{{{
for record in MyModel.objects.all():
echo record.upload
}}}
As you refresh the page, you'll notice that memory increases.
Cheers.
--
Ticket URL: <https://code.djangoproject.com/ticket/16022#comment:7>
* status: reopened => closed
* resolution: => needsinfo
Comment:
I read your email to django-users which had more info and I updated my
test code above as follows.
`receiver/models.py`:
{{{
from django import forms
from django.db import models
class UploadModel(models.Model):
upload = models.FileField(upload_to='receiver')
class UploadForm(forms.ModelForm):
class Meta:
model = UploadModel
}}}
`receiver/views.py`:
{{{
from django.http import HttpResponse
from django.views.decorators.csrf import csrf_exempt
from .models import UploadForm, UploadModel
@csrf_exempt
def upload(request, commit=False):
form = UploadForm(request.POST, request.FILES)
try:
model = form.save(commit=commit)
action = "Saved" if commit else "Received"
return HttpResponse("%s %i bytes.\n" % (action,
model.upload.size),
status=201, content_type='text/plain')
except ValueError:
return HttpResponse(repr(form.errors) + "\n",
status=400, content_type='text/plain')
def names(request):
upload_names = [model.upload.name for model in
UploadModel.objects.all()]
return HttpResponse("%i objects \n" % len(upload_names),
status=200, content_type='text/plain')
}}}
`urls.py`:
{{{
from django.conf.urls.defaults import patterns, url
urlpatterns = patterns('',
url(r'^fake/$', 'receiver.views.upload', {'commit': False}),
url(r'^save/$', 'receiver.views.upload', {'commit': True}),
url(r'^read/$', 'receiver.views.names'),
)
}}}
I added a database, etc. to the settings and ran `syncdb`.
I uploaded 100 1kb files:
{{{
myk@mYk madupload % dd if=/dev/random of=1kb.bin bs=1024 count=1
1+0 records in
1+0 records out
1024 bytes transferred in 0.000232 secs (4414149 bytes/sec)
myk@mYk madupload % for i in {1..100}; do curl -F "upload=@1kb.bin"
http://localhost:8000/save/; done
Saved 1024 bytes.
Saved 1024 bytes.
Saved 1024 bytes.
...
}}}
And then I read them over and over:
{{{
myk@mYk madupload % while true; do curl http://localhost:8000/read/; done
100 objects
100 objects
100 objects
...
}}}
I've been heating the planet like this for a few minutes and the memory
curve climbed a little bit at the beginning, then stabilized.
Initially, I was using Python 2.6; I switched to Python 2.5 but the result
was the same.
So, unfortunately, we still don't have a proof that the bug is in Django
itself.
At this point, I am afraid you didn't provide sufficient information for
me to reproduce the bug. You could:
- try to play with `gc.garbage` to see what happens in your app, see
http://docs.python.org/library/gc.html
- modify my example until it exhibits the memory leak
- come up with your own reproducible example, and provide complete
instructions of how to run it
Thanks!
--
Ticket URL: <https://code.djangoproject.com/ticket/16022#comment:8>
Comment (by aaugustin):
PS : closing the ticket as "needsinfo" is part of the standard triaging
procedure when we can't reproduce a problem. It doesn't mean the problem
doesn't exist.
--
Ticket URL: <https://code.djangoproject.com/ticket/16022#comment:9>
* status: closed => new
* ui_ux: => 0
* resolution: needsinfo =>
Comment:
I just got hit with this. I created a [https://github.com/massover
/possible-leak repo] with more info and tests that can reproduce the
issue. The code in production was a logging statement that was accessing
the file field. I only wanted to log the file name, not the file itself.
By removing the log of the file field entirely, our memory utilization
dropped by 1/4. The leaking model / file field is pretty crucial and is
used in a majority our endpoints.
{{{
# before
extra = {
"file_field": obj.file_field
"id": obj.id
}
logger.info("hello world!", extra=extra)
# after with 1/4 less memory usage and no other changes
extra = {
"id": obj.id
}
logger.info("hello world!", extra=extra)
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/16022#comment:10>
* type: Bug => Cleanup/optimization
* stage: Unreviewed => Accepted
Comment:
Hi Joshua.
Thanks for the reproduce project — very nice.
I'm going to accept this for review as a potential optimisation, since
memory use is lower in your example using the `weakref` there, and, as you
say, it doesn't break any tests.
I **don't** think it's an actual memory leak per se — rather a question of
the (somewhat high) level at which Python's garbage collector chooses to
kick in.
You have a loop in your test-cases:
{{{
for i in range(100):
# print(process.memory_full_info().rss / 1024 ** 2)
new = Leak.objects.get(id=leak.id)
extra = {"f": new.f}
}}}
If you uncomment the `print()` we see the memory usage rising, in the
first unpatched case:
{{{
...
653.0
663.015625
673.03125
683.046875
693.0625
703.09375
713.109375
723.125
733.140625
743.15625
753.171875
763.1875
773.203125
783.21875
793.234375
...
}}}
If you then though `import gc` and force the garbage collector to run,
you'll see that the memory use remains steady:
{{{
...
152.453125
152.453125
152.453125
152.453125
152.453125
152.453125
152.453125
152.453125
152.453125
152.453125
152.453125
152.453125
152.453125
152.453125
152.453125
152.453125
152.453125
...
}}}
As such it's not a **leak** — it's perfectly collectable — which is inline
with Aymeric's previous observation:
> ... the memory curve climbed a little bit at the beginning, then
stabilized.
The GC does eventually kick-in — but **yes** no reason why we can't see
about bringing that down by removing the reference loop.
Do you fancy making a PR with your suggested change, so we can get some
eyes on it?
Thanks.
--
Ticket URL: <https://code.djangoproject.com/ticket/16022#comment:11>
* owner: nobody => Joshua Massover
* status: new => assigned
* has_patch: 0 => 1
* version: 1.1 => 4.0
Comment:
[https://github.com/django/django/pull/15592 PR] up! Added much more
information there.
--
Ticket URL: <https://code.djangoproject.com/ticket/16022#comment:12>
Comment (by Carlton Gibson):
Super, thanks — great work! Let's see what folks say.
Slight aside (not meant to distract from the proposed change): As part of
this, I'm wondering if there's some basic advice on using `gc` that we
might add to the docs. 🤔 For example a `gc.collect()` in a file upload
view might go a long way...
--
Ticket URL: <https://code.djangoproject.com/ticket/16022#comment:13>
* needs_better_patch: 0 => 1
* needs_tests: 0 => 1
* needs_docs: 0 => 1
Comment:
From [https://github.com/django/django/pull/15592 the PR] it looks like we
can push forwards with adding a `weak` option to `FileField` that would
have `FieldFile` use a `weakref`, but that we can't just bring that in as
there's a behaviour change with `save()` and `delete()`. (Maybe the case
for the stronger change can be made via discussion on the
DevelopersMailingList.)
--
Ticket URL: <https://code.djangoproject.com/ticket/16022#comment:14>