{{{
...
File ".../django/db/backends/utils.py", line 84, in _execute
return self.cursor.execute(sql, params)
django.db.utils.OperationalError: server closed the connection
unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
}}}
After that, Django will close the connection in question. During the
handling of the next request, that Django process will create a new DB
connection and it will succeed. So with persistent connections (assuming
all processes/workers for a Django app/project already have a DB
connection open), every Django process will fail to handle one DB
accessing request after its connection is closed by the DB server.
This isn't news. Quoting from #24810:
> When the database closes the connection, for instance because the
database server was restarted, an exception will be raised the next time
user code attempts to use the database. Django doesn't attempt to do
anything about it.
>
> This is expected to result in at most one 500 error for each thread that
serves requests, since database connections are thread-local. This doesn't
sound horrific. After all the database was restarted while the website was
serving requests.
We could try to prevent these errors by
[https://github.com/django/django/blob/80482e924953ec0e2484a9cb0f44bd5eeea93856/django/db/backends/base/base.py#L480-L490
health checking the connection] before every re-use attempt (currently the
health check is only done after an error has already been detected for the
connection).
This [https://github.com/django/django/compare/master...suligap:db-conn-
health-checks Work In Progress patch] (no tests and docs yet) introduces a
`CONN_HEALTH_CHECK_DELAY` DB setting to Django:
* If it's set to `None` (default), the behavior is the same as before.
* Otherwise, it expects an integer with a number of seconds after which
the connection should be health checked before it's reused (`0` for
"before every reuse").
When checks are enabled, it works as follows (assuming already existing
"persistent" DB connection):
* Before each connection reuse, perform a connection test (`SELECT 1`
query in case of PostgreSQL).
* This only happens if last check was performed more than
`CONN_HEALTH_CHECK_DELAY` seconds ago.
* This only happens once per request (or async task -
[https://github.com/celery/celery/blob/3d387704d4a18db5aea758e9a26c1ee4d71659df/celery/fixups/django.py#L163-L192
Hi Celery])) regardless of how many transactions the "request" will
perform.
* This only happens for "requests" that do use the database.
* If the check fails, close the connection and create a new one.
I feel like I'm probably missing something important here...
Anyway, on the other hand, what gives me hints that this might not be the
worst idea in the world are examples from other places:
- `django-db-geventpool` (used with success in some
gunicorn/gevent/PostgreSQL setups) [https://github.com/jneight/django-db-
geventpool/blob/a1a691d9cf031a7fa0c377f87a69e87bb54e7ea6/django_db_geventpool/backends/postgresql_psycopg2/psycopg2_pool.py#L44
does] `SELECT 1` every time it leases out an existing connection.
- PgBouncer does the check query too (configurable by the
[https://github.com/pgbouncer/pgbouncer/blob/cb9a1511a48f7f5fb3ddf258055c698c8004f00f/doc/config.md#server_check_delay
server_check_delay] setting - default 30 seconds).
- HikariCP (a popular Java JDBC connection pool) also does
[https://github.com/brettwooldridge/HikariCP/blob/e8e9cdd00ab886b26340c05706485ddb092f5116/src/main/java/com/zaxxer/hikari/pool/HikariPool.java#L185
connection health checks] before leasing out connections.
- Rails' Active Record
[https://github.com/rails/rails/blob/b1fc1319dfbc450451bb0b410113dd6b7ab1d412/activerecord/lib/active_record/connection_adapters/abstract/connection_pool.rb#L859-L869
seems] like
[https://github.com/rails/rails/blob/dc2d19b8bf693886bec520c1787166eb105c8567/activerecord/lib/active_record/connection_adapters/abstract_adapter.rb#L512-L517
it also] does
[https://github.com/rails/rails/blob/dc2d19b8bf693886bec520c1787166eb105c8567/activerecord/lib/active_record/connection_adapters/mysql2_adapter.rb#L93-L95
something like that].
Granted, the above are _separate_ components (doing connection pooling and
some other things in case of Active Record) and very likely a proper fix
for #24810 might fix the issue this PR fixes. On the other hand
[https://github.com/django/django/compare/master...suligap:db-conn-health-
checks this approach here] is addressing a different issue and might be
simpler and more feasible (at least as the request/response cycle code
part goes)? Happy to see this through if something like this makes sense
for Django.
---
My main motivation for researching this was trying to create a solution
for Celery. With default `prefork` concurrency, celery tasks running
Django ORM code would work very nicely with non-zero `CONN_MAX_AGE`. But
the problem is I don't want to end up loosing messages/tasks because of
these "db server closes connections" events (however rare they might be -
with lots connections and threads/processes the number of errors gets
bigger - Lowering the `CONN_MAX_AGE` is a trade-off which doesn't
guarantee good enough safety). And Celery tasks by default are
[http://docs.celeryproject.org/en/v4.3.0/reference/celery.app.task.html#celery.app.task.Task.acks_late
ACK-ed] before they're executed (so if they fail because of that, they're
lost). But it seems that Django itself could also benefit from this. And
so here we are.
--
Ticket URL: <https://code.djangoproject.com/ticket/30398>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
* stage: Unreviewed => Accepted
Comment:
OK, yes, this seems reasonable to at least look at. (Assuming there's no
massive performance hit or such...)
Aside: once of the main benefits of a move to async is that it allows easy
creation of background tasks, for which this kind of health check might be
suitable. (Not celery's model, but related...)
--
Ticket URL: <https://code.djangoproject.com/ticket/30398#comment:1>
* has_patch: 1 => 0
--
Ticket URL: <https://code.djangoproject.com/ticket/30398#comment:2>
* owner: nobody => Przemysław Suliga
* status: new => assigned
--
Ticket URL: <https://code.djangoproject.com/ticket/30398#comment:3>
* has_patch: 0 => 1
Comment:
[https://github.com/django/django/pull/11311 Proof of Concept PR]
--
Ticket URL: <https://code.djangoproject.com/ticket/30398#comment:4>
* needs_docs: 0 => 1
* needs_tests: 0 => 1
--
Ticket URL: <https://code.djangoproject.com/ticket/30398#comment:5>
Comment (by Ran Benita):
Taking a risk of detracting the discussion a bit, but given Djagno's async
future, wouldn't this be the time to revisit the decision to use
persistent connections over a connection pool? For most async frameworks
(we use gevent, but will be the same with ASGI), persistent connections
don't work since async tasks are not reused like threads/processes,
instead a task is spawned per request.
--
Ticket URL: <https://code.djangoproject.com/ticket/30398#comment:6>
--
Ticket URL: <https://code.djangoproject.com/ticket/30398#comment:7>
* needs_docs: 1 => 0
* needs_tests: 1 => 0
Comment:
[https://github.com/django/django/pull/14899 PR]
--
Ticket URL: <https://code.djangoproject.com/ticket/30398#comment:8>
* needs_better_patch: 0 => 1
Comment:
See
[https://github.com/django/django/pull/14899#pullrequestreview-788437304
Florian's review].
--
Ticket URL: <https://code.djangoproject.com/ticket/30398#comment:9>
* needs_better_patch: 1 => 0
--
Ticket URL: <https://code.djangoproject.com/ticket/30398#comment:10>
* stage: Accepted => Ready for checkin
--
Ticket URL: <https://code.djangoproject.com/ticket/30398#comment:11>
* status: assigned => closed
* resolution: => fixed
Comment:
In [changeset:"4ce59f602ed28320caf3035212cb4d1c5430da2b" 4ce59f60]:
{{{
#!CommitTicketReference repository=""
revision="4ce59f602ed28320caf3035212cb4d1c5430da2b"
Fixed #30398 -- Added CONN_HEALTH_CHECKS database setting.
The CONN_HEALTH_CHECKS setting can be used to enable database
connection health checks for Django's persistent DB connections.
Thanks Florian Apolloner for reviews.
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/30398#comment:12>
Comment (by Claude Paroz):
Does that also fix #24810?
--
Ticket URL: <https://code.djangoproject.com/ticket/30398#comment:13>