[Django] #34870: Memory leak when using psycopg-c with django hstore

44 views
Skip to first unread message

Django

unread,
Sep 24, 2023, 4:11:31 PM9/24/23
to django-...@googlegroups.com
#34870: Memory leak when using psycopg-c with django hstore
-------------------------------------+-------------------------------------
Reporter: David | Owner: nobody
Burke |
Type: Bug | Status: new
Component: Database | Version: 4.2
layer (models, ORM) |
Severity: Normal | Keywords:
Triage Stage: | Has patch: 0
Unreviewed |
Needs documentation: 0 | Needs tests: 0
Patch needs improvement: 0 | Easy pickings: 0
UI/UX: 0 |
-------------------------------------+-------------------------------------
See https://gitlab.com/bufke/psycopg-memory-leak

When running Celery, Django ORM, psycopg-c, and hstore - a memory leak
occurs.

Here's the same bug in my real project with a graphic of the memory usage
https://gitlab.com/glitchtip/glitchtip-backend/-/issues/265

I hope this is the right place to report, let me know if otherwise. It
doesn't happen without the Django ORM.

The easiest workaround is probably to use the pure python psycopg package.
I'm unsure if it can be reproduced or not without Celery but Celery seems
the easiest way to run the code 1000 times so that the leak is obvious.
Another workaround is to use a service/conf option to restart the process
periodically.

In my observations of a production server and running the memory_leak
project locally - the memory usage goes up forever and never halts. The
memory is never released. DEBUG can be True or False and it still occurs.

--
Ticket URL: <https://code.djangoproject.com/ticket/34870>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

unread,
Sep 24, 2023, 7:27:48 PM9/24/23
to django-...@googlegroups.com
#34870: Memory leak when using psycopg-c with django hstore
-------------------------------------+-------------------------------------
Reporter: David Burke | Owner: nobody
Type: Bug | Status: new
Component: Database layer | Version: 4.2
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Tim Graham):

If the pure Python psycopg fixes the issue, does it follow that psycopg-c
is at fault? Can you explain why Django is at fault? If not, I'm not sure
this ticket is actionable.

--
Ticket URL: <https://code.djangoproject.com/ticket/34870#comment:1>

Django

unread,
Sep 24, 2023, 8:44:00 PM9/24/23
to django-...@googlegroups.com
#34870: Memory leak when using psycopg-c with django hstore
-------------------------------------+-------------------------------------
Reporter: David Burke | Owner: nobody
Type: Bug | Status: new
Component: Database layer | Version: 4.2
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by David Burke):

I'm unsure which project is at fault. The reverse could be said as - not
using Django's "django.contrib.postgres" fixes the issue, does it follow
that django is at fault. Is there something that makes you suspect psycopg
is at fault or a better place to submit an issue? In the linked sample
repo, just commenting out that one line in INSTALLED_APPS fixes it. But
then the HStore feature is no longer available from Django.

If I can strip out more code away for a more minimalist reproduction repo,
I will follow up here. It would also be more ideal to remove Celery to
rule that out.

--
Ticket URL: <https://code.djangoproject.com/ticket/34870#comment:2>

Django

unread,
Sep 24, 2023, 9:22:55 PM9/24/23
to django-...@googlegroups.com
#34870: Memory leak when using psycopg-c with django hstore
-------------------------------------+-------------------------------------
Reporter: David Burke | Owner: nobody
Type: Bug | Status: new
Component: Database layer | Version: 4.2
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by David Burke):

Removing the line `hstore.register_hstore(ti, connection.connection)` in
django/contrib/postgres/signals.py also "fixes" the leak. It appears that
it's being registered for each celery task.
So we have:

- Why does celery result in calling this django signal so many times?
- Why does a psycopg connection string parsing function leak memory when
called many times.
- Why does Django register it so many times.

Personally I could use pure python psycopg and move on with life. But I
have to think it's a common set up to use Postgres, Celery, and Django
together. psycopg-c is what that project recommends for production usage.
I presume HStore is just incidental and that any Django contrib Postgres
feature would be affected.

--
Ticket URL: <https://code.djangoproject.com/ticket/34870#comment:3>

Django

unread,
Sep 25, 2023, 8:13:41 PM9/25/23
to django-...@googlegroups.com
#34870: Memory leak when using psycopg-c with django hstore
-------------------------------------+-------------------------------------
Reporter: David Burke | Owner: nobody
Type: Bug | Status: new
Component: Database layer | Version: 4.2
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by David Burke):

https://github.com/psycopg/psycopg/issues/647 has been reported and looks
to have a resolution. I'm unsure if it makes sense for Celery to call
these Django contrib.postgres signal functions for each task, but at least
there is no longer a memory leak once this patch is applied.

--
Ticket URL: <https://code.djangoproject.com/ticket/34870#comment:4>

Django

unread,
Sep 25, 2023, 11:56:14 PM9/25/23
to django-...@googlegroups.com
#34870: Memory leak when using psycopg-c with django hstore
-------------------------------------+-------------------------------------
Reporter: David Burke | Owner: nobody
Type: Bug | Status: closed

Component: Database layer | Version: 4.2
(models, ORM) |
Severity: Normal | Resolution: invalid

Keywords: | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Mariusz Felisiak):

* status: new => closed
* resolution: => invalid


Comment:

David, thanks for the investigation. It seems to be fixed in the `psycopg`
(Thanks Daniele!) so I'm closing as "invalid".

--
Ticket URL: <https://code.djangoproject.com/ticket/34870#comment:5>

Django

unread,
Sep 26, 2023, 6:46:48 AM9/26/23
to django-...@googlegroups.com
#34870: Memory leak when using psycopg-c with django hstore
-------------------------------------+-------------------------------------
Reporter: David Burke | Owner: nobody
Type: Bug | Status: closed
Component: Database layer | Version: 4.2
(models, ORM) |
Severity: Normal | Resolution: invalid
Keywords: | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Florian Apolloner):

* cc: Florian Apolloner (added)


Comment:

Aside from the leak I wouldn't be surprised if we have a bug in Django or
celery here as well. Mind printing the traceback around registering the
hstore so we see where is calling us and why?

--
Ticket URL: <https://code.djangoproject.com/ticket/34870#comment:6>

Django

unread,
Sep 26, 2023, 10:00:53 AM9/26/23
to django-...@googlegroups.com
#34870: Memory leak when using psycopg-c with django hstore
-------------------------------------+-------------------------------------
Reporter: David Burke | Owner: nobody
Type: Bug | Status: closed
Component: Database layer | Version: 4.2
(models, ORM) |
Severity: Normal | Resolution: invalid
Keywords: | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by David Burke):

Right. It seems odd to run the register function more than once.
Stacktrace as requested.

{{{
File "/env/lib/python3.10/site-packages/celery/app/trace.py", line 477,
in trace_task
R = retval = fun(*args, **kwargs)
File "/env/lib/python3.10/site-packages/celery/app/trace.py", line 760,
in __protected_call__
return self.run(*args, **kwargs)
File "/memory_leak/tasks.py", line 9, in count_it
User.objects.count()
File "/env/lib/python3.10/site-packages/django/db/models/manager.py",
line 87, in manager_method
return getattr(self.get_queryset(), name)(*args, **kwargs)
File "/env/lib/python3.10/site-packages/django/db/models/query.py", line
608, in count
return self.query.get_count(using=self.db)
File "/env/lib/python3.10/site-packages/django/db/models/sql/query.py",
line 568, in get_count
return obj.get_aggregation(using, {"__count": Count("*")})["__count"]
File "/env/lib/python3.10/site-packages/django/db/models/sql/query.py",
line 554, in get_aggregation
result = compiler.execute_sql(SINGLE)
File "/env/lib/python3.10/site-
packages/django/db/models/sql/compiler.py", line 1560, in execute_sql
cursor = self.connection.cursor()
File "/env/lib/python3.10/site-packages/django/utils/asyncio.py", line
26, in inner
return func(*args, **kwargs)
File "/env/lib/python3.10/site-
packages/django/db/backends/base/base.py", line 330, in cursor
return self._cursor()
File "/env/lib/python3.10/site-
packages/django/db/backends/base/base.py", line 306, in _cursor
self.ensure_connection()
File "/env/lib/python3.10/site-packages/django/utils/asyncio.py", line
26, in inner
return func(*args, **kwargs)
File "/env/lib/python3.10/site-
packages/django/db/backends/base/base.py", line 289, in ensure_connection
self.connect()
File "/env/lib/python3.10/site-packages/django/utils/asyncio.py", line
26, in inner
return func(*args, **kwargs)
File "/env/lib/python3.10/site-
packages/django/db/backends/base/base.py", line 273, in connect
connection_created.send(sender=self.__class__, connection=self)
File "/env/lib/python3.10/site-packages/django/dispatch/dispatcher.py",
line 176, in send
return [
File "/env/lib/python3.10/site-packages/django/dispatch/dispatcher.py",
line 177, in <listcomp>
(receiver, receiver(signal=self, sender=sender, **named))
File "/env/lib/python3.10/site-
packages/django/contrib/postgres/signals.py", line 43, in
register_type_handlers
raise Exception()
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/34870#comment:7>

Django

unread,
Sep 26, 2023, 10:08:57 AM9/26/23
to django-...@googlegroups.com
#34870: Memory leak when using psycopg-c with django hstore
-------------------------------------+-------------------------------------
Reporter: David Burke | Owner: nobody
Type: Bug | Status: closed
Component: Database layer | Version: 4.2
(models, ORM) |
Severity: Normal | Resolution: invalid
Keywords: | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Florian Apolloner):

Ok, raising an exception could cause problems, but as far as that
traceback is concerned I am reading the following out of it:

* Django registers the hstore type once per connection. This makes sense
since connections to multiple databases might have different hstore oids
(In older versions of Django we might have registered types globally for
psycopg once, but I think with psycopg3 this got improved).
* Django only queries the database once per alias since `get_hstore_oids`
is cached.

So from the looks of it it seems as if celery opens a new connection per
task which is kinda (?) wasteful. But it is definitely not a bug in
Django.

--
Ticket URL: <https://code.djangoproject.com/ticket/34870#comment:8>

Reply all
Reply to author
Forward
0 new messages