[Django] #37033: Psycopg2 has different semantics than psycopg3 and should not be deprecated

20 views
Skip to first unread message

Django

unread,
Apr 13, 2026, 10:05:54 AMApr 13
to django-...@googlegroups.com
#37033: Psycopg2 has different semantics than psycopg3 and should not be deprecated
--------------------------+-----------------------------------------
Reporter: sastraxi | Type: Uncategorized
Status: new | Component: Uncategorized
Version: 6.0 | Severity: Normal
Keywords: | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
--------------------------+-----------------------------------------
The context: we have been trialling psycopg3 in our (async-free!) Django
5.2 codebase. I noted that
https://docs.djangoproject.com/en/6.0/releases/4.2/#psycopg-3-support says
"Support for psycopg2 is likely to be deprecated and removed at some point
in the future." -- this ticket has my observations that make me believe
this is not a good idea.

We have many Celery (5.6.2) tasks that run on idempotent queue workers
with Django fully loaded. When we re-deploy, we kill the worker containers
and re-queue the tasks that were in-flight / ready to process on that
worker. It does this by raising exceptions from signal handlers:
https://github.com/celery/billiard/blob/main/billiard/common.py#L106
ultimately raises a `SystemExit` exception from inside the signal handler
when it receives the SIGQUIT signal.

As outlined in [Safe Asynchronous Exceptions for
Python](https://www.cs.williams.edu/~freund/papers/python.pdf), raising
exceptions in this way can arbitrarily interrupt execute at any bytecode
boundary.

=== Example case

Here's a simplified task handler so you can visualize the problem:

{{{#!python
@celery_app.task(MY_IDEMPOTENT_QUEUE)
def my_task():
with transaction.atomic():
[...]
my_model.save()
transaction.on_commit(lambda: my_other_task.delay())
}}}

Sometimes, when a signal is delivered while commit() is running, the
following happens:

1. PostgreSQL has applied the COMMIT (we can later verify DB state).
2. psycopg3 is interrupted during its Python‑level wait loop; an exception
bubbles out of commit() (or _set_autocommit) rather than a normal return.
3. Django sees an exception in `atomic.__exit__` and therefore does not
run `on_commit` callbacks, even though the transaction actually committed.

=== Root cause

* psycopg2's COMMIT path uses `PQexec("COMMIT")` inside a
Py_BEGIN_ALLOW_THREADS region, so the interpreter will not run the Python
signal handler until that call returns; this effectively makes COMMIT
atomic w.r.t. Python async exceptions.
* psycopg3's design explicitly avoids blocking libpq calls, using a Python
generator + wait functions to drive the COMMIT protocol step-by-step. That
means Python bytecode is executing throughout, so signal handlers can
raise exceptions mid‑protocol.

I do not believe this will be possible to solve with psycopg3's approach,
so developers using Django's on_commit inside of celery worker handlers
need to be aware of this behaviour.
--
Ticket URL: <https://code.djangoproject.com/ticket/37033>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

unread,
Apr 13, 2026, 12:50:11 PMApr 13
to django-...@googlegroups.com
#37033: Psycopg2 has different semantics than psycopg3 and should not be deprecated
--------------------------------+--------------------------------------
Reporter: Cameron Gorrie | Owner: (none)
Type: Uncategorized | Status: new
Component: Uncategorized | Version: 6.0
Severity: Normal | Resolution:
Keywords: | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
--------------------------------+--------------------------------------
Comment (by Simon Charette):

I think the real problem at play is that your worker deployment workflow
is not truly idempotent; I don't think it's fair to assume that
`SIGKILL`'ing a process without gracefully handling it on the receiving
end will do the right thing by default.

The way Celery deals with `SoftTimeLimitExceeded` and worker termination
signals by surfacing exception and allowing the flow of the program to be
interrupted at any time is naive and error prone (it also causes cursor
corruption on the `mysqlclient` MySQL library). The correct way of
gracefully handling non-fatal error signal is usually to set a flag on the
execution context that the application can check at a safe checkpoint
(`self.terminate = True`).

> When we re-deploy, we kill the worker containers and re-queue the tasks
that were in-flight / ready to process on that worker.

Any reason not to
[https://docs.celeryq.dev/en/stable/userguide/workers.html#warm-shutdown
perform a warm shutdown] first to limit the cases when this can happen

I'm afraid there's is little Django can do here, if `psycopg2` development
halts and it is no longer supported upstream we will have to pull the plug
on it. I'm not sure this is the right place to advocate for a problem at
the intersection of Celery and psycopg to be solved, have you raised this
problem with any of these third party library maintainers? Surely Celery
is aware that this signalling approach is problematic (e.g. what if a task
is the middle of performing a critical HTTP request when the signal
surfaces?) and has documented patterns on how to deal with this problem.
--
Ticket URL: <https://code.djangoproject.com/ticket/37033#comment:1>

Django

unread,
Apr 13, 2026, 8:54:41 PMApr 13
to django-...@googlegroups.com
#37033: Psycopg2 has different semantics than psycopg3 and should not be deprecated
--------------------------------+--------------------------------------
Reporter: Cameron Gorrie | Owner: (none)
Type: Uncategorized | Status: new
Component: Uncategorized | Version: 6.0
Severity: Normal | Resolution:
Keywords: | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
--------------------------------+--------------------------------------
Comment (by Cameron Gorrie):

Thanks for taking a look, Simon. Agreed that we're asking a lot of the
libraries / frameworks here w.r.t. idempotency.

Warm shutdown seems like a good plan: check e.g. `self.terminate = True`
periodically. In the context we're supporting this system, it would
require us to change an uncomfortable amount of code, but it is doable. I
thought about doing it in a thread, but I think `SystemExit` would have
the exact same problem. I have a feeling we'd have options if we had async
handlers, but we're not on 6.0 yet and even then it's a big change... can
you think of anything I've missed?

I've opened a ticket with psycopg and am about to with celery. I don't
think this issue is resolved by writing code -- but to collaborate in
ensuring the message is out to help those who--perhaps naively--believe
this combination will work. I opened it here too because the root cause is
two-fold:

1. celery soft / cold shutdown (i.e. exception propagation)
2. I/O "commits" that require transactional properties across Python
bytecode

Django ORM's `on_commit` semantics hit the latter: when we can't be sure
if our COMMIT was successful or not, we cannot run code that depends on
the result. It results in situations that seem impossible until you look
at the problem from the right lens.

In my case I inherited parts of this system and am still in some ways
finding my footing; opening a ticket with this problem in itself makes it
more discoverable, but I think you're right -- it's Celery's soft shutdown
that intrinsically breaks some of the lower-level stuff in the python
stack.
--
Ticket URL: <https://code.djangoproject.com/ticket/37033#comment:2>

Django

unread,
Apr 14, 2026, 4:19:51 AMApr 14
to django-...@googlegroups.com
#37033: Psycopg2 has different semantics than psycopg3 and should not be deprecated
--------------------------------+--------------------------------------
Reporter: Cameron Gorrie | Owner: (none)
Type: Uncategorized | Status: closed
Component: Uncategorized | Version: 6.0
Severity: Normal | Resolution: invalid
Keywords: | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
--------------------------------+--------------------------------------
Changes (by Sarah Boyce):

* resolution: => invalid
* status: new => closed

Comment:

Thank you both for the discussion. As the current conclusion is that
Django is not at fault here, closing the ticket
--
Ticket URL: <https://code.djangoproject.com/ticket/37033#comment:3>
Reply all
Reply to author
Forward
0 new messages