[Django] #16022: Cyclic reference in FieldFile

7 views
Skip to first unread message

Django

unread,
May 13, 2011, 11:14:14 AM5/13/11
to django-...@googlegroups.com
#16022: Cyclic reference in FieldFile
---------------------+----------------------------------------------
Reporter: Gustavo | Owner: nobody
Type: Bug | Status: new
Milestone: | Component: Database layer (models, ORM)
Version: 1.1 | Severity: Normal
Keywords: | Triage Stage: Unreviewed
Has patch: 1 | Easy pickings: 0
---------------------+----------------------------------------------
django.db.models.fields.files:FieldFile creates a cyclic reference to the
model instance it's attached to via
django.db.models.fields.files:FileDescriptor.

The effects can be considerable. In our Web site, for example, it causes
the process running Django to increase the memory used by over 1MB after
every request.

The patch is generated against Django trunk, even though it's only been
tested under Django 1.1.4 (but the cyclic reference doesn't seem to have
been fixed).

--
Ticket URL: <http://code.djangoproject.com/ticket/16022>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

unread,
May 13, 2011, 12:16:02 PM5/13/11
to django-...@googlegroups.com
#16022: Cyclic reference in FieldFile
-------------------------------------+-------------------------------------
Reporter: Gustavo | Owner: nobody
Type: Bug | Status: new
Milestone: | Component: Database layer
Version: 1.1 | (models, ORM)
Resolution: | Severity: Normal
Triage Stage: | Keywords:
Unreviewed | Has patch: 1
Needs documentation: 0 | Needs tests: 0
Patch needs improvement: 0 | Easy pickings: 0
-------------------------------------+-------------------------------------
Changes (by Gustavo):

* needs_better_patch: => 0
* needs_tests: => 0
* needs_docs: => 0


Comment:

Please disregard the patch, it breaks file uploads. I'm trying to improve
it at the moment (and suggestions are much appreciated).

--
Ticket URL: <http://code.djangoproject.com/ticket/16022#comment:1>

Django

unread,
May 19, 2011, 7:37:14 AM5/19/11
to django-...@googlegroups.com
#16022: Cyclic reference in FieldFile
-------------------------------------+-------------------------------------
Reporter: Gustavo | Owner: nobody
Type: Bug | Status: new
Milestone: | Component: Database layer
Version: 1.1 | (models, ORM)
Resolution: | Severity: Normal
Triage Stage: | Keywords:
Unreviewed | Has patch: 1
Needs documentation: 0 | Needs tests: 0
Patch needs improvement: 0 | Easy pickings: 0
-------------------------------------+-------------------------------------

Comment (by Gustavo):

I've tried replacing weakref.ref() with weakref.proxy(), which does work
but then memory keeps growing as usual. I think a proper way to fix it is
to break the cyclic reference, but that'd be backwards incompatible.

I cannot afford spending more time on this, so I'll leave it to someone
else. I'm happy to test any eventual fix though. In the meantime, we'll be
using mod_wsgi's ''maximum-requests'' to mitigate the consequences of this
problem.

A hint to whoever will have a look at this: Try setting ''self.instance''
to ''None'' in ''!FieldFile.!__init!__'' and you won't notice any memory
growth when you query many records. This breaks creation, modification and
deletion of files, but helps demonstrate that this is the problematic
reference.

--
Ticket URL: <https://code.djangoproject.com/ticket/16022#comment:2>

Django

unread,
May 19, 2011, 7:41:48 AM5/19/11
to django-...@googlegroups.com
#16022: Cyclic reference in FieldFile
-------------------------------------+-------------------------------------
Reporter: Gustavo | Owner: nobody
Type: Bug | Status: new
Milestone: | Component: Database layer
Version: 1.1 | (models, ORM)
Resolution: | Severity: Normal
Triage Stage: | Keywords: memory leak
Unreviewed | Has patch: 0
Needs documentation: 0 | Needs tests: 0
Patch needs improvement: 0 | Easy pickings: 0
-------------------------------------+-------------------------------------
Changes (by Gustavo):

* keywords: => memory leak
* has_patch: 1 => 0


--
Ticket URL: <https://code.djangoproject.com/ticket/16022#comment:3>

Django

unread,
May 19, 2011, 7:43:29 AM5/19/11
to django-...@googlegroups.com
#16022: Cyclic reference in FieldFile causes memory usage to grow considerably
-------------------------------------+-------------------------------------
Reporter: Gustavo | Owner: nobody
Type: Bug | Status: new
Milestone: | Component: Database layer
Version: 1.1 | (models, ORM)
Resolution: | Severity: Normal
Triage Stage: | Keywords: memory leak
Unreviewed | Has patch: 0
Needs documentation: 0 | Needs tests: 0
Patch needs improvement: 0 | Easy pickings: 0
-------------------------------------+-------------------------------------

--
Ticket URL: <https://code.djangoproject.com/ticket/16022#comment:4>

Django

unread,
May 27, 2011, 12:22:08 PM5/27/11
to django-...@googlegroups.com
#16022: Cyclic reference in FieldFile causes memory usage to grow considerably
-------------------------------------+-------------------------------------
Reporter: Gustavo | Owner: nobody
Type: Bug | Status: new
Milestone: | Component: Database layer
Version: 1.1 | (models, ORM)
Resolution: | Severity: Normal
Triage Stage: | Keywords: memory leak
Unreviewed | Has patch: 0
Needs documentation: 0 | Needs tests: 0
Patch needs improvement: 0 | Easy pickings: 0
-------------------------------------+-------------------------------------

Comment (by aaugustin):

Python's gc is supposed to be able to collect reference cycles these days,
unless there are `__del__` methods. I didn't see any that could by linked
with this bug.

--
Ticket URL: <https://code.djangoproject.com/ticket/16022#comment:5>

Django

unread,
Jun 2, 2011, 6:23:48 AM6/2/11
to django-...@googlegroups.com
#16022: Cyclic reference in FieldFile causes memory usage to grow considerably
-------------------------------------+-------------------------------------
Reporter: Gustavo | Owner: nobody
Type: Bug | Status: closed
Milestone: | Component: Database layer
Version: 1.1 | (models, ORM)
Resolution: | Severity: Normal
worksforme | Keywords: memory leak
Triage Stage: | Has patch: 0
Unreviewed | Needs tests: 0
Needs documentation: 0 | Easy pickings: 0
Patch needs improvement: 0 |
-------------------------------------+-------------------------------------
Changes (by aaugustin):

* status: new => closed
* resolution: => worksforme


Comment:

OK, I tried to reproduce this seriously. I created a project called
`madupload` and an app called `receiver` inside.
{{{

madupload
├── __init__.py
├── manage.py
├── receiver
│   ├── __init__.py
│   ├── models.py
│   ├── views.py
├── settings.py
└── urls.py
}}}

`receiver/models.py`:
{{{
from django import forms
from django.db import models

class UploadModel(models.Model):
upload = models.FileField()

class UploadForm(forms.ModelForm):
class Meta:
model = UploadModel
}}}

`receiver/views.py`:
{{{
from django.http import HttpResponse
from django.views.decorators.csrf import csrf_exempt

from .models import UploadForm

@csrf_exempt
def upload(request):
form = UploadForm(request.POST, request.FILES)
try:
model = form.save(commit=False)
return HttpResponse("Saved %i bytes.\n" % model.upload.size,
status=201, content_type='text/plain')
except ValueError:
return HttpResponse(repr(form.errors) + "\n",
status=400, content_type='text/plain')
}}}

`urls.py`:
{{{
from django.conf.urls.defaults import patterns, url

urlpatterns = patterns('',
url(r'^$', 'receiver.views.upload'),
)
}}}

All other files are unchanged from Django's template.

Then I launched `runserver` and started uploading:

{{{
myk@mYk madupload % dd if=/dev/random of=1mb.bin bs=1024 count=1024
1024+0 records in
1024+0 records out
1048576 bytes transferred in 0.146655 secs (7149958 bytes/sec)
myk@mYk madupload % while true; do curl -F "upload=@1mb.bin"
http://localhost:8000; done
Saved 1048576 bytes.
Saved 1048576 bytes.
Saved 1048576 bytes.
...
}}}

I've let it run more than one hundred of uploads, and the memory footprint
of the Python process handling `runserver` is flat.

At this point, I have proven that the cyclic reference in !FileField is
not a problem in itself.

I still don't know what's causing your bug. You may have a global variable
referencing your !FileField objects somewhere.

--
Ticket URL: <https://code.djangoproject.com/ticket/16022#comment:6>

Django

unread,
Jun 2, 2011, 6:33:49 AM6/2/11
to django-...@googlegroups.com
#16022: Cyclic reference in FieldFile causes memory usage to grow considerably
-------------------------------------+-------------------------------------
Reporter: Gustavo | Owner: nobody
Type: Bug | Status: reopened
Milestone: | Component: Database layer
Version: 1.1 | (models, ORM)
Resolution: | Severity: Normal
Triage Stage: | Keywords: memory leak
Unreviewed | Has patch: 0
Needs documentation: 0 | Needs tests: 0
Patch needs improvement: 0 | Easy pickings: 0
-------------------------------------+-------------------------------------
Changes (by Gustavo):

* status: closed => reopened
* resolution: worksforme =>


Comment:

Hello, aaugustin.

Thank you very much for looking into this!

I forgot to add a useful piece of information here, which I mentioned on
[http://groups.google.com/group/django-
users/browse_thread/thread/804b4fd7a3d126fb Google Groups]: Memory
increases considerably when you query a model that has a !FileField. I'm
sorry about that.

To reproduce this, you could create, say, 2000 records and iterate over
them like this inside a Django view:

{{{
for record in MyModel.objects.all():
echo record.upload
}}}

As you refresh the page, you'll notice that memory increases.

Cheers.

--
Ticket URL: <https://code.djangoproject.com/ticket/16022#comment:7>

Django

unread,
Jun 2, 2011, 7:35:43 AM6/2/11
to django-...@googlegroups.com
#16022: Cyclic reference in FieldFile causes memory usage to grow considerably
-------------------------------------+-------------------------------------
Reporter: Gustavo | Owner: nobody
Type: Bug | Status: closed
Milestone: | Component: Database layer
Version: 1.1 | (models, ORM)
Resolution: needsinfo | Severity: Normal
Triage Stage: | Keywords: memory leak
Unreviewed | Has patch: 0
Needs documentation: 0 | Needs tests: 0
Patch needs improvement: 0 | Easy pickings: 0
-------------------------------------+-------------------------------------
Changes (by aaugustin):

* status: reopened => closed
* resolution: => needsinfo


Comment:

I read your email to django-users which had more info and I updated my
test code above as follows.


`receiver/models.py`:

{{{
from django import forms
from django.db import models

class UploadModel(models.Model):

upload = models.FileField(upload_to='receiver')


class UploadForm(forms.ModelForm):
class Meta:
model = UploadModel
}}}

`receiver/views.py`:
{{{
from django.http import HttpResponse
from django.views.decorators.csrf import csrf_exempt

from .models import UploadForm, UploadModel

@csrf_exempt
def upload(request, commit=False):

form = UploadForm(request.POST, request.FILES)
try:

model = form.save(commit=commit)
action = "Saved" if commit else "Received"
return HttpResponse("%s %i bytes.\n" % (action,
model.upload.size),

status=201, content_type='text/plain')
except ValueError:
return HttpResponse(repr(form.errors) + "\n",
status=400, content_type='text/plain')

def names(request):
upload_names = [model.upload.name for model in
UploadModel.objects.all()]
return HttpResponse("%i objects \n" % len(upload_names),
status=200, content_type='text/plain')

}}}

`urls.py`:
{{{
from django.conf.urls.defaults import patterns, url

urlpatterns = patterns('',

url(r'^fake/$', 'receiver.views.upload', {'commit': False}),
url(r'^save/$', 'receiver.views.upload', {'commit': True}),
url(r'^read/$', 'receiver.views.names'),
)
}}}

I added a database, etc. to the settings and ran `syncdb`.

I uploaded 100 1kb files:

{{{
myk@mYk madupload % dd if=/dev/random of=1kb.bin bs=1024 count=1
1+0 records in
1+0 records out
1024 bytes transferred in 0.000232 secs (4414149 bytes/sec)
myk@mYk madupload % for i in {1..100}; do curl -F "upload=@1kb.bin"
http://localhost:8000/save/; done
Saved 1024 bytes.
Saved 1024 bytes.
Saved 1024 bytes.
...
}}}

And then I read them over and over:

{{{
myk@mYk madupload % while true; do curl http://localhost:8000/read/; done
100 objects
100 objects
100 objects
...
}}}

I've been heating the planet like this for a few minutes and the memory
curve climbed a little bit at the beginning, then stabilized.

Initially, I was using Python 2.6; I switched to Python 2.5 but the result
was the same.

So, unfortunately, we still don't have a proof that the bug is in Django
itself.

At this point, I am afraid you didn't provide sufficient information for
me to reproduce the bug. You could:

- try to play with `gc.garbage` to see what happens in your app, see
http://docs.python.org/library/gc.html
- modify my example until it exhibits the memory leak
- come up with your own reproducible example, and provide complete
instructions of how to run it

Thanks!

--
Ticket URL: <https://code.djangoproject.com/ticket/16022#comment:8>

Django

unread,
Jun 2, 2011, 7:37:03 AM6/2/11
to django-...@googlegroups.com
#16022: Cyclic reference in FieldFile causes memory usage to grow considerably
-------------------------------------+-------------------------------------
Reporter: Gustavo | Owner: nobody
Type: Bug | Status: closed
Milestone: | Component: Database layer
Version: 1.1 | (models, ORM)
Resolution: needsinfo | Severity: Normal
Triage Stage: | Keywords: memory leak
Unreviewed | Has patch: 0
Needs documentation: 0 | Needs tests: 0
Patch needs improvement: 0 | Easy pickings: 0
-------------------------------------+-------------------------------------

Comment (by aaugustin):

PS : closing the ticket as "needsinfo" is part of the standard triaging
procedure when we can't reproduce a problem. It doesn't mean the problem
doesn't exist.

--
Ticket URL: <https://code.djangoproject.com/ticket/16022#comment:9>

Django

unread,
Apr 12, 2022, 7:01:45 PM4/12/22
to django-...@googlegroups.com
#16022: Cyclic reference in FieldFile causes memory usage to grow considerably
-------------------------------------+-------------------------------------
Reporter: Gustavo Narea | Owner: nobody
Type: Bug | Status: new

Component: Database layer | Version: 1.1
(models, ORM) |
Severity: Normal | Resolution:
Keywords: memory leak | Triage Stage:
| Unreviewed

Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Joshua Massover):

* status: closed => new
* ui_ux: => 0
* resolution: needsinfo =>


Comment:

I just got hit with this. I created a [https://github.com/massover
/possible-leak repo] with more info and tests that can reproduce the
issue. The code in production was a logging statement that was accessing
the file field. I only wanted to log the file name, not the file itself.
By removing the log of the file field entirely, our memory utilization
dropped by 1/4. The leaking model / file field is pretty crucial and is
used in a majority our endpoints.

{{{
# before
extra = {
"file_field": obj.file_field
"id": obj.id
}
logger.info("hello world!", extra=extra)
# after with 1/4 less memory usage and no other changes
extra = {
"id": obj.id
}
logger.info("hello world!", extra=extra)
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/16022#comment:10>

Django

unread,
Apr 13, 2022, 3:57:05 AM4/13/22
to django-...@googlegroups.com
#16022: Cyclic reference in FieldFile causes memory usage to grow considerably
-------------------------------------+-------------------------------------
Reporter: Gustavo Narea | Owner: nobody
Type: | Status: new
Cleanup/optimization |

Component: Database layer | Version: 1.1
(models, ORM) |
Severity: Normal | Resolution:
Keywords: memory leak | Triage Stage: Accepted

Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Carlton Gibson):

* type: Bug => Cleanup/optimization
* stage: Unreviewed => Accepted


Comment:

Hi Joshua.

Thanks for the reproduce project — very nice.

I'm going to accept this for review as a potential optimisation, since
memory use is lower in your example using the `weakref` there, and, as you
say, it doesn't break any tests.

I **don't** think it's an actual memory leak per se — rather a question of
the (somewhat high) level at which Python's garbage collector chooses to
kick in.

You have a loop in your test-cases:


{{{
for i in range(100):
# print(process.memory_full_info().rss / 1024 ** 2)
new = Leak.objects.get(id=leak.id)
extra = {"f": new.f}
}}}

If you uncomment the `print()` we see the memory usage rising, in the
first unpatched case:


{{{
...
653.0
663.015625
673.03125
683.046875
693.0625
703.09375
713.109375
723.125
733.140625
743.15625
753.171875
763.1875
773.203125
783.21875
793.234375
...
}}}

If you then though `import gc` and force the garbage collector to run,
you'll see that the memory use remains steady:


{{{
...
152.453125
152.453125
152.453125
152.453125
152.453125
152.453125
152.453125
152.453125
152.453125
152.453125
152.453125
152.453125
152.453125
152.453125
152.453125
152.453125
152.453125
...
}}}

As such it's not a **leak** — it's perfectly collectable — which is inline
with Aymeric's previous observation:

> ... the memory curve climbed a little bit at the beginning, then
stabilized.

The GC does eventually kick-in — but **yes** no reason why we can't see
about bringing that down by removing the reference loop.

Do you fancy making a PR with your suggested change, so we can get some
eyes on it?
Thanks.

--
Ticket URL: <https://code.djangoproject.com/ticket/16022#comment:11>

Django

unread,
Apr 13, 2022, 11:44:52 AM4/13/22
to django-...@googlegroups.com
#16022: Cyclic reference in FieldFile causes memory usage to grow considerably
-------------------------------------+-------------------------------------
Reporter: Gustavo Narea | Owner: Joshua
Type: | Massover
Cleanup/optimization | Status: assigned
Component: Database layer | Version: 4.0

(models, ORM) |
Severity: Normal | Resolution:
Keywords: memory leak | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Joshua Massover):

* owner: nobody => Joshua Massover
* status: new => assigned
* has_patch: 0 => 1
* version: 1.1 => 4.0


Comment:

[https://github.com/django/django/pull/15592 PR] up! Added much more
information there.

--
Ticket URL: <https://code.djangoproject.com/ticket/16022#comment:12>

Django

unread,
Apr 13, 2022, 11:59:48 AM4/13/22
to django-...@googlegroups.com
#16022: Cyclic reference in FieldFile causes memory usage to grow considerably
-------------------------------------+-------------------------------------
Reporter: Gustavo Narea | Owner: Joshua
Type: | Massover
Cleanup/optimization | Status: assigned
Component: Database layer | Version: 4.0
(models, ORM) |
Severity: Normal | Resolution:
Keywords: memory leak | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Carlton Gibson):

Super, thanks — great work! Let's see what folks say.

Slight aside (not meant to distract from the proposed change): As part of
this, I'm wondering if there's some basic advice on using `gc` that we
might add to the docs. 🤔 For example a `gc.collect()` in a file upload
view might go a long way...

--
Ticket URL: <https://code.djangoproject.com/ticket/16022#comment:13>

Django

unread,
Jun 22, 2022, 9:08:31 AM6/22/22
to django-...@googlegroups.com
#16022: Cyclic reference in FieldFile causes memory usage to grow considerably
-------------------------------------+-------------------------------------
Reporter: Gustavo Narea | Owner: Joshua
Type: | Massover
Cleanup/optimization | Status: assigned
Component: Database layer | Version: 4.0
(models, ORM) |
Severity: Normal | Resolution:
Keywords: memory leak | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 1
Needs tests: 1 | Patch needs improvement: 1

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Carlton Gibson):

* needs_better_patch: 0 => 1
* needs_tests: 0 => 1
* needs_docs: 0 => 1


Comment:

From [https://github.com/django/django/pull/15592 the PR] it looks like we
can push forwards with adding a `weak` option to `FileField` that would
have `FieldFile` use a `weakref`, but that we can't just bring that in as
there's a behaviour change with `save()` and `delete()`. (Maybe the case
for the stronger change can be made via discussion on the
DevelopersMailingList.)

--
Ticket URL: <https://code.djangoproject.com/ticket/16022#comment:14>

Django

unread,
Nov 6, 2024, 3:51:44 PM11/6/24
to django-...@googlegroups.com
#16022: Cyclic reference in FieldFile causes memory usage to grow considerably
-------------------------------------+-------------------------------------
Reporter: Gustavo Narea | Owner: Joshua
Type: | Massover
Cleanup/optimization | Status: assigned
Component: Database layer | Version: 4.0
(models, ORM) |
Severity: Normal | Resolution:
Keywords: memory leak | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 1
Needs tests: 1 | Patch needs improvement: 1
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Ülgen Sarıkavak):

* cc: Ülgen Sarıkavak (added)

--
Ticket URL: <https://code.djangoproject.com/ticket/16022#comment:15>
Reply all
Reply to author
Forward
0 new messages