[Django] #34356: Memory leak when generating PDFs


Django

Feb 20, 2023, 10:01:41 AM
to django-...@googlegroups.com
#34356: Memory leak when generating PDFs
-------------------------------------+-------------------------------------
Reporter: Robin (Robert) Thomas | Owner: nobody
Type: Bug | Status: new
Component: Core (Other) | Version: 4.1
Severity: Normal | Keywords: memory memory-leak pdf weasyprint
Triage Stage: Unreviewed | Has patch: 0
Needs documentation: 0 | Needs tests: 0
Patch needs improvement: 0 | Easy pickings: 0
UI/UX: 0 |
-------------------------------------+-------------------------------------
== Context

Our app generates a one-page PDF report for users. It contains a few small
SVG and PNG icons, and 4 big textual tables. The PDF is generated once,
after which it is put in a storage bucket for subsequent retrieval.

== Problem

The app is Django 4.1.6, Weasyprint 57.2, running on Heroku (heroku-22).
We're not having any issues retrieving previously-generated PDFs, but each
time it generates a new PDF (file size 38 KB) the app's memory RSS increases
by 20-40 MB, as reported by Heroku. This memory usage doesn't go down
until the server is restarted.

Unfortunately Heroku doesn't automatically restart the server until both
memory RSS and swap exceed the 512 MB limit, so once RSS is used up we
start getting a lot of pings about OOM errors and have to restart it
manually.

== What we've tried

Even after removing all images, fonts, and CSS (file size 32 KB), each
generation still increases the memory RSS by about 17 MB.

If we remove everything from the report template, leaving just
`<!DOCTYPE html><html lang="en"><head><title>Test</title><body></body></html>`
(file size 863 bytes), each generation increases the memory RSS by about
1.3 MB.

== Reproduce

I deployed a little test app to show this in action, with a link to the
source code: https://weasyprint-mem.herokuapp.com/

You can see that every time a PDF is generated it increases the memory
usage, although not always consistently. I would expect the data for each
PDF to be garbage-collected once it has rendered.

== Related

I opened a bug ticket about this with Weasyprint
(https://github.com/Kozea/WeasyPrint/issues/1496). They say that because
they cannot reproduce this when running just Weasyprint by itself from the
command-line, the memory leak must be elsewhere in the ecosystem.

--
Ticket URL: <https://code.djangoproject.com/ticket/34356>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

Feb 20, 2023, 10:01:50 AM
to django-...@googlegroups.com
Changes (by Robin (Robert) Thomas):

* Attachment "219976184-2e826b19-eb1d-40a8-926a-b9751468f0eb.jpg" added.

Screenshot of memory usage

Django

Feb 20, 2023, 10:03:12 AM
to django-...@googlegroups.com
Description changed by Robin (Robert) Thomas:

Old description:

New description:

== Context

Our app generates a one-page PDF report for users. It contains a few small
SVG and PNG icons, and 4 big textual tables. The PDF is generated once,
after which it is put in a storage bucket for subsequent retrieval.

== Problem

The app is Django 4.1.6, Weasyprint 57.2, running on Heroku (heroku-22).
We're not having any issues retrieving previously-generated PDFs, but each
time it generates a new PDF (file size 38 KB) the app's memory RSS increases
by 20-40 MB, as reported by Heroku. This memory usage doesn't go down
until the server is restarted.

Unfortunately Heroku doesn't automatically restart the server until both
memory RSS and swap exceed the 512 MB limit, so once RSS is used up we
start getting a lot of pings about OOM errors and have to restart it
manually.

== What we've tried

Even after removing all images, fonts, and CSS (file size 32 KB), each
generation still increases the memory RSS by about 17 MB.

If we remove everything from the report template, leaving just
`<!DOCTYPE html><html lang="en"><head><title>Test</title><body></body></html>`
(file size 863 bytes), each generation increases the memory RSS by about
1.3 MB.

== Reproduce

I deployed a little test app to show this in action, with a link to the
source code: https://weasyprint-mem.herokuapp.com/

You can see from the attached image that every time a PDF is generated it
increases the memory usage, although not always consistently. I would
expect the data for each PDF to be garbage-collected once it has rendered:

[[Image(https://code.djangoproject.com/raw-attachment/ticket/34356/219976184-2e826b19-eb1d-40a8-926a-b9751468f0eb.jpg)]]

== Related

I opened a bug ticket about this with Weasyprint
(https://github.com/Kozea/WeasyPrint/issues/1496). They say that because
they cannot reproduce this when running just Weasyprint by itself from the
command-line, the memory leak must be elsewhere in the ecosystem.

--
Ticket URL: <https://code.djangoproject.com/ticket/34356#comment:1>

Django

Feb 20, 2023, 10:03:29 AM
to django-...@googlegroups.com

Old description:

> You can see from the attached image that every time a PDF is generated it
> increases the memory usage, although not always consistently. I would
> expect the data for each PDF to be garbage-collected once it has
> rendered:
>
> [[Image(https://code.djangoproject.com/raw-attachment/ticket/34356/219976184-2e826b19-eb1d-40a8-926a-b9751468f0eb.jpg)]]
>
> == Related
>
> I opened a bug ticket about this with Weasyprint
> (https://github.com/Kozea/WeasyPrint/issues/1496). They say that because
> they cannot reproduce this when running just Weasyprint by itself from
> the command-line, the memory leak must be elsewhere in the ecosystem.

New description:

== Context

== Problem

== Related

== Reproduce

[[Image(https://code.djangoproject.com/raw-attachment/ticket/34356/219976184-2e826b19-eb1d-40a8-926a-b9751468f0eb.jpg)]]

--
Ticket URL: <https://code.djangoproject.com/ticket/34356#comment:2>

Django

Feb 20, 2023, 1:45:12 PM
to django-...@googlegroups.com
#34356: Memory leak when generating PDFs
-------------------------------------+-------------------------------------
Reporter: Robin (Robert) Thomas | Owner: nobody
Type: Bug | Status: closed
Component: Core (Other) | Version: 4.1
Severity: Normal | Resolution: needsinfo
Keywords: memory memory-leak pdf weasyprint | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Mariusz Felisiak):

* status: new => closed
* resolution: => needsinfo


Comment:

Hi, I don't think you've explained the issue in enough detail to confirm a
bug in Django. Please reopen the ticket if you can debug your issue and
provide details about why and where Django is at fault.

This may be a duplicate of #16022, see
[https://github.com/django/django/pull/15592 PR] for a possible solution.
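
One way to gather the "why and where" details requested above is to compare `tracemalloc` snapshots taken before and after several repeated runs. The following is only a sketch: `top_growth` and `job` are hypothetical names that do not appear in the ticket, and `job` stands for whatever callable renders one PDF. Note that `tracemalloc` only sees allocations made through Python's allocators, so memory held by native libraries would not show up here.

```python
import tracemalloc


def top_growth(job, repeats=5, limit=5):
    """Call job() repeatedly and report where traced memory grew the most.

    Returns up to `limit` tracemalloc.StatisticDiff entries, grouped by
    source line and sorted by the absolute size difference, largest first.
    """
    tracemalloc.start()
    try:
        # Warm-up call, so one-time caches are not reported as growth.
        job()
        baseline = tracemalloc.take_snapshot()
        for _ in range(repeats):
            job()
        current = tracemalloc.take_snapshot()
    finally:
        tracemalloc.stop()
    return current.compare_to(baseline, "lineno")[:limit]
```

Each returned entry exposes `size_diff`, `count_diff`, and `traceback`, so printing the entries shows which lines kept allocating across runs. If nothing grows here while RSS still climbs, the growth is probably outside Python-level allocations.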

--
Ticket URL: <https://code.djangoproject.com/ticket/34356#comment:3>

Django

Feb 20, 2023, 1:51:14 PM
to django-...@googlegroups.com

Comment (by Robin (Robert) Thomas):

@mariusz The ticket you referenced is for DB file fields, and the given
source code and example do not use models or a database at all.

I'll try this in Flask to see if there's a similar result. If not, then
the issue must be with Django in which case I'll open a new ticket.

--
Ticket URL: <https://code.djangoproject.com/ticket/34356#comment:4>

Django

Feb 20, 2023, 2:15:23 PM
to django-...@googlegroups.com

Comment (by Mariusz Felisiak):

> If not, then the issue must be with Django in which case I'll open a new
> ticket.

Please don't reopen the ticket without providing extra details, i.e.
why and where Django is at fault.

--
Ticket URL: <https://code.djangoproject.com/ticket/34356#comment:5>

Django

Feb 21, 2023, 3:50:36 AM
to django-...@googlegroups.com

Comment (by Carlton Gibson):

Often this is Python's garbage collection not kicking in as soon as you
want. The first thing I'd do is add a `gc.collect()` after processing the
PDF, to see if you can bring it down by hand. (If the memory is collected,
it's not a leak per se. Unless there's something specific, gc behaviour is
a Python issue, rather than anything Django can do.)
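
A minimal sketch of that check, assuming the PDF rendering can be wrapped in a callable: `run_and_collect` and `make_cycles` are hypothetical names, and `make_cycles` is only a stand-in for the real rendering call. The numbers come from `tracemalloc`, so they cover Python-level allocations only and will not match the RSS figure Heroku reports.

```python
import gc
import tracemalloc


def run_and_collect(job):
    """Run job(), then force a full garbage collection.

    Returns (bytes_before_gc, bytes_after_gc, unreachable_objects), where
    the byte counts are the memory currently traced by tracemalloc.
    """
    started_here = not tracemalloc.is_tracing()
    if started_here:
        tracemalloc.start()
    try:
        job()
        before, _peak = tracemalloc.get_traced_memory()
        unreachable = gc.collect()  # full collection of all generations
        after, _peak = tracemalloc.get_traced_memory()
    finally:
        if started_here:
            tracemalloc.stop()
    return before, after, unreachable


def make_cycles():
    # Stand-in for the PDF rendering call: builds reference cycles that
    # reference counting alone cannot free.
    for _ in range(1000):
        node = {"payload": bytearray(1024)}
        node["self"] = node
```

If the "after" figure drops well below the "before" figure, the memory was collectable garbage rather than a true leak. If it barely moves while RSS keeps climbing, the retained memory is either still referenced somewhere or lives outside Python's allocators.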

--
Ticket URL: <https://code.djangoproject.com/ticket/34356#comment:6>

Django

Feb 21, 2023, 10:58:20 AM
to django-...@googlegroups.com

Comment (by Robin (Robert) Thomas):

Getting the same behavior in Django:

https://weasyprint-mem.herokuapp.com/

...and in Flask:

https://weasyprint-mem-flask.herokuapp.com/

...so I'll leave this closed. Thanks! :)

--
Ticket URL: <https://code.djangoproject.com/ticket/34356#comment:7>

Django

Feb 21, 2023, 11:21:30 AM
to django-...@googlegroups.com
#34356: Memory leak when generating PDFs
-------------------------------------+-------------------------------------
Reporter: Robin (Robert) Thomas | Owner: nobody
Type: Bug | Status: closed
Component: Core (Other) | Version: 4.1
Severity: Normal | Resolution: invalid
Keywords: memory memory-leak pdf weasyprint | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Mariusz Felisiak):

* resolution: needsinfo => invalid


--
Ticket URL: <https://code.djangoproject.com/ticket/34356#comment:8>
