#36700: ASGIHandler creates reference cycles that require a gc pass to free
-------------------------------------+-------------------------------------
Reporter: Patryk Zawadzki            | Type: Bug
Status: new                          | Component: HTTP handling
Version: 5.2                         | Severity: Normal
Keywords: memory asgihandler gc      | Triage Stage: Unreviewed
Has patch: 0                         | Needs documentation: 0
Needs tests: 0                       | Patch needs improvement: 0
Easy pickings: 0                     | UI/UX: 0
-------------------------------------+-------------------------------------
Disclaimer: it's impossible for pure Python code to truly leak memory (in
the sense that valgrind would detect). However, it's quite easy to create
structures that occupy memory for a long time because only the deepest
(generation 2) garbage collection pass can collect them, and that pass
runs very rarely. In addition, the more such structures accumulate, the
more expensive that pass becomes: it effectively stops the entire
interpreter while it does its job, and it can take seconds. On top of
that, it's entirely possible for a container to run out of memory before
the garbage collection happens. We (Saleor Commerce) see containers being
terminated by the kernel OOM killer due to high memory pressure where most
of that memory is held by garbage.
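
To put numbers on the pauses described above, collector runs can be timed
with the standard `gc.callbacks` hook. This is only a minimal sketch: the
names `pauses` and `_track` are made up for illustration and are not part
of Django or of our setup.

```python
import gc
import time

pauses = []    # (generation, seconds) for every finished collection
_started = {}  # start timestamp of the collection currently running


def _track(phase, info):
    # The interpreter calls this with phase "start" or "stop"; info is a
    # dict that includes the "generation" being collected.
    if phase == "start":
        _started["t"] = time.perf_counter()
    elif phase == "stop" and "t" in _started:
        elapsed = time.perf_counter() - _started.pop("t")
        pauses.append((info["generation"], elapsed))


gc.callbacks.append(_track)
```

Entries whose generation is 2 correspond to the rare, expensive passes.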
One such case is found in the `ASGIHandler`. When handling a request,
`ASGIHandler.handle` spawns two async tasks: one for the actual app code
(`process_request`) and one for the disconnection handler
(`ASGIHandler.listen_for_disconnect`). The latter raises
`RequestAborted` every time it receives the `http.disconnect` ASGI
message.
In our setup (`uvicorn`), the `http.disconnect` message is received for
every request, even after successfully processing the view code and
delivering the response. That's not critical for this issue; it just
makes it easy to reproduce on our end.
Here's where the problem is:
1. When `RequestAborted` is raised, its stack trace includes the call to
`ASGIHandler.handle`, which is where `ASGIHandler.listen_for_disconnect`
was called.
2. In turn, the `ASGIHandler.handle` stack frame includes references to
all local variables.
3. Among those variables is `tasks` which holds the references to both
async tasks.
4. Now, one of those tasks is the task created from
`ASGIHandler.listen_for_disconnect`.
5. The task future is already resolved and now holds a reference back to
the `RequestAborted` exception from step 1. And thus the cycle completes,
creating a reference cycle that reference counting alone cannot free.
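
The same shape can be reproduced on CPython without asyncio. In the sketch
below, `FinishedTask`, `handle` and `demo` are hypothetical stand-ins and
not Django code; re-raising the stored exception plays the role of
retrieving the result of the resolved task.

```python
import gc
import weakref


class FinishedTask:
    """Hypothetical stand-in for a resolved task that stores an exception."""

    def __init__(self, exception):
        self.exception = exception


def handle():
    tasks = [FinishedTask(RuntimeError("request aborted"))]
    probe = weakref.ref(tasks[0])
    try:
        # Raising here appends this frame to the exception's traceback.
        raise tasks[0].exception
    except RuntimeError:
        pass
    # Cycle: exception -> traceback -> this frame -> tasks -> task -> exception
    return probe


def demo():
    gc.collect()
    gc.disable()  # keep the automatic collector out of the measurement
    try:
        probe = handle()
        alive_before = probe() is not None  # refcounting did not free it
        gc.collect()
        alive_after = probe() is not None   # the cycle collector did
        return alive_before, alive_after
    finally:
        gc.enable()
```

The weak reference stays alive after `handle()` returns and is only
cleared once `gc.collect()` runs.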
All of those objects hold references to other objects and stack frames
that reference counting cannot free either, so a sizeable set of objects
is held hostage until the next generation 2 collection (the equivalent of
`gc.collect(2)`), which can be minutes away, depending on how much code
your app executes.
Making `ASGIHandler.handle` explicitly call `tasks.clear()` or just `del
tasks` after the tasks are no longer needed breaks the cycle by removing
the link between the exception stack frame locals and the future
referencing the exception.
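
A self-contained approximation of the handler, on CPython with the default
C-accelerated asyncio, suggests the effect of `del tasks`. Everything here
is a simplified stand-in rather than the actual Django source: the
`RequestAborted` class is redefined locally, and the `break_cycle` flag
and `run_demo` helper exist only for this sketch.

```python
import asyncio
import gc
import weakref


class RequestAborted(Exception):
    """Local stand-in for Django's RequestAborted."""


async def listen_for_disconnect():
    # Simulates receiving the http.disconnect message.
    raise RequestAborted()


async def process_request():
    return "response"


async def handle(break_cycle):
    tasks = [
        asyncio.create_task(listen_for_disconnect()),
        asyncio.create_task(process_request()),
    ]
    probe = weakref.ref(tasks[0])
    await asyncio.wait(tasks)
    for task in tasks:
        try:
            task.result()  # re-raises RequestAborted inside this frame
        except RequestAborted:
            pass
    if break_cycle:
        del tasks  # the proposed fix: the frame no longer references the task
    return probe


async def main():
    without_fix = await handle(break_cycle=False)
    with_fix = await handle(break_cycle=True)
    return without_fix() is not None, with_fix() is not None


def run_demo():
    gc.collect()
    gc.disable()  # keep the automatic collector from hiding the effect
    try:
        return asyncio.run(main())
    finally:
        gc.enable()
```

Without the fix the disconnect task outlives `handle()`; with `del tasks`
it is freed by reference counting as soon as the list goes away.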
PS: I've classified this as a bug as high memory use can lead to OOM kills
and crashes but feel free to reclassify as "cleanup/optimization" if
that's more fitting.
--
Ticket URL: <https://code.djangoproject.com/ticket/36700>