[Django] #30398: Check database connection's health before its reused

354 views
Skip to first unread message

Django

unread,
Apr 24, 2019, 8:21:12 AM4/24/19
to django-...@googlegroups.com
#30398: Check database connection's health before its reused
-------------------------------------+-------------------------------------
Reporter: | Owner: nobody
Przemysław Suliga |
Type: New | Status: new
feature |
Component: Database | Version: master
layer (models, ORM) |
Severity: Normal | Keywords:
Triage Stage: | Has patch: 1
Unreviewed |
Needs documentation: 0 | Needs tests: 0
Patch needs improvement: 0 | Easy pickings: 0
UI/UX: 0 |
-------------------------------------+-------------------------------------
With [https://docs.djangoproject.com/en/2.2/ref/databases/#persistent-
connections persistent database connections], when the DB server closes
the connection (for example during its restart), the next request that
tries to use that persistent DB connection in Django will fail with an
error (even after the DB server is ready to process queries) like this one
(example for PostgreSQL):

{{{
...
File ".../django/db/backends/utils.py", line 84, in _execute
return self.cursor.execute(sql, params)
django.db.utils.OperationalError: server closed the connection
unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
}}}

After that, Django will close the connection in question. During the
handling of the next request, that Django process will create a new DB
connection and it will succeed. So with persistent connections (assuming
all processes/workers for a Django app/project already have a DB
connection open), every Django process will fail to handle one DB
accessing request after its connection is closed by the DB server.

This isn't news. Quoting from #24810:

> When the database closes the connection, for instance because the
database server was restarted, an exception will be raised the next time
user code attempts to use the database. Django doesn't attempt to do
anything about it.
>
> This is expected to result in at most one 500 error for each thread that
serves requests, since database connections are thread-local. This doesn't
sound horrific. After all the database was restarted while the website was
serving requests.

We could try to prevent these errors by
[https://github.com/django/django/blob/80482e924953ec0e2484a9cb0f44bd5eeea93856/django/db/backends/base/base.py#L480-L490
health checking the connection] before every re-use attempt (currently the
health check is only done after an error has already been detected for the
connection).

This [https://github.com/django/django/compare/master...suligap:db-conn-
health-checks Work In Progress patch] (no tests and docs yet) introduces a
`CONN_HEALTH_CHECK_DELAY` DB setting to Django:

* If it's set to `None` (default), the behavior is the same as before.
* Otherwise, it expects an integer with a number of seconds after which
the connection should be health checked before it's reused (`0` for
"before every reuse").

When checks are enabled, it works as follows (assuming already existing
"persistent" DB connection):

* Before each connection reuse, perform a connection test (`SELECT 1`
query in case of PostgreSQL).
* This only happens if last check was performed more than
`CONN_HEALTH_CHECK_DELAY` seconds ago.
* This only happens once per request (or async task -
[https://github.com/celery/celery/blob/3d387704d4a18db5aea758e9a26c1ee4d71659df/celery/fixups/django.py#L163-L192
Hi Celery])) regardless of how many transactions the "request" will
perform.
* This only happens for "requests" that do use the database.
* If the check fails, close the connection and create a new one.

I feel like I'm probably missing something important here...
Anyway, on the other hand, what gives me hints that this might not be the
worst idea in the world are examples from other places:

- `django-db-geventpool` (used with success in some
gunicorn/gevent/PostgreSQL setups) [https://github.com/jneight/django-db-
geventpool/blob/a1a691d9cf031a7fa0c377f87a69e87bb54e7ea6/django_db_geventpool/backends/postgresql_psycopg2/psycopg2_pool.py#L44
does] `SELECT 1` every time it leases out an existing connection.
- PgBouncer does the check query too (configurable by the
[https://github.com/pgbouncer/pgbouncer/blob/cb9a1511a48f7f5fb3ddf258055c698c8004f00f/doc/config.md#server_check_delay
server_check_delay] setting - default 30 seconds).
- HikariCP (a popular Java JDBC connection pool) also does
[https://github.com/brettwooldridge/HikariCP/blob/e8e9cdd00ab886b26340c05706485ddb092f5116/src/main/java/com/zaxxer/hikari/pool/HikariPool.java#L185
connection health checks] before leasing out connections.
- Rails' Active Record
[https://github.com/rails/rails/blob/b1fc1319dfbc450451bb0b410113dd6b7ab1d412/activerecord/lib/active_record/connection_adapters/abstract/connection_pool.rb#L859-L869
seems] like
[https://github.com/rails/rails/blob/dc2d19b8bf693886bec520c1787166eb105c8567/activerecord/lib/active_record/connection_adapters/abstract_adapter.rb#L512-L517
it also] does
[https://github.com/rails/rails/blob/dc2d19b8bf693886bec520c1787166eb105c8567/activerecord/lib/active_record/connection_adapters/mysql2_adapter.rb#L93-L95
something like that].

Granted, the above are _separate_ components (doing connection pooling and
some other things in case of Active Record) and very likely a proper fix
for #24810 might fix the issue this PR fixes. On the other hand
[https://github.com/django/django/compare/master...suligap:db-conn-health-
checks this approach here] is addressing a different issue and might be
simpler and more feasible (at least as the request/response cycle code
part goes)? Happy to see this through if something like this makes sense
for Django.

---

My main motivation for researching this was trying to create a solution
for Celery. With default `prefork` concurrency, celery tasks running
Django ORM code would work very nicely with non-zero `CONN_MAX_AGE`. But
the problem is I don't want to end up loosing messages/tasks because of
these "db server closes connections" events (however rare they might be -
with lots connections and threads/processes the number of errors gets
bigger - Lowering the `CONN_MAX_AGE` is a trade-off which doesn't
guarantee good enough safety). And Celery tasks by default are
[http://docs.celeryproject.org/en/v4.3.0/reference/celery.app.task.html#celery.app.task.Task.acks_late
ACK-ed] before they're executed (so if they fail because of that, they're
lost). But it seems that Django itself could also benefit from this. And
so here we are.

--
Ticket URL: <https://code.djangoproject.com/ticket/30398>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

unread,
Apr 30, 2019, 3:12:50 PM4/30/19
to django-...@googlegroups.com
#30398: Check database connection's health before its reused
-------------------------------------+-------------------------------------
Reporter: Przemysław Suliga | Owner: nobody
Type: New feature | Status: new
Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Carlton Gibson):

* stage: Unreviewed => Accepted


Comment:

OK, yes, this seems reasonable to at least look at. (Assuming there's no
massive performance hit or such...)

Aside: once of the main benefits of a move to async is that it allows easy
creation of background tasks, for which this kind of health check might be
suitable. (Not celery's model, but related...)

--
Ticket URL: <https://code.djangoproject.com/ticket/30398#comment:1>

Django

unread,
May 1, 2019, 2:37:46 AM5/1/19
to django-...@googlegroups.com
#30398: Check database connection's health before its reused
-------------------------------------+-------------------------------------
Reporter: Przemysław Suliga | Owner: nobody
Type: New feature | Status: new
Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by felixxm):

* has_patch: 1 => 0


--
Ticket URL: <https://code.djangoproject.com/ticket/30398#comment:2>

Django

unread,
May 1, 2019, 3:37:57 AM5/1/19
to django-...@googlegroups.com
#30398: Check database connection's health before its reused
-------------------------------------+-------------------------------------
Reporter: Przemysław Suliga | Owner:
| Przemysław Suliga
Type: New feature | Status: assigned

Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 0 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Przemysław Suliga):

* owner: nobody => Przemysław Suliga
* status: new => assigned


--
Ticket URL: <https://code.djangoproject.com/ticket/30398#comment:3>

Django

unread,
May 1, 2019, 4:04:53 PM5/1/19
to django-...@googlegroups.com
#30398: Check database connection's health before its reused
-------------------------------------+-------------------------------------
Reporter: Przemysław Suliga | Owner:
| Przemysław Suliga
Type: New feature | Status: assigned
Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Przemysław Suliga):

* has_patch: 0 => 1


Comment:

[https://github.com/django/django/pull/11311 Proof of Concept PR]

--
Ticket URL: <https://code.djangoproject.com/ticket/30398#comment:4>

Django

unread,
May 3, 2019, 6:41:53 AM5/3/19
to django-...@googlegroups.com
#30398: Check database connection's health before its reused
-------------------------------------+-------------------------------------
Reporter: Przemysław Suliga | Owner:
| Przemysław Suliga
Type: New feature | Status: assigned
Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 1
Needs tests: 1 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by felixxm):

* needs_docs: 0 => 1
* needs_tests: 0 => 1


--
Ticket URL: <https://code.djangoproject.com/ticket/30398#comment:5>

Django

unread,
May 3, 2019, 9:41:17 AM5/3/19
to django-...@googlegroups.com
#30398: Check database connection's health before its reused
-------------------------------------+-------------------------------------
Reporter: Przemysław Suliga | Owner:
| Przemysław Suliga
Type: New feature | Status: assigned
Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 1
Needs tests: 1 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Ran Benita):

Taking a risk of detracting the discussion a bit, but given Djagno's async
future, wouldn't this be the time to revisit the decision to use
persistent connections over a connection pool? For most async frameworks
(we use gevent, but will be the same with ASGI), persistent connections
don't work since async tasks are not reused like threads/processes,
instead a task is spawned per request.

--
Ticket URL: <https://code.djangoproject.com/ticket/30398#comment:6>

Django

unread,
Sep 26, 2021, 3:45:58 AM9/26/21
to django-...@googlegroups.com
#30398: Check database connection's health before it is reused

-------------------------------------+-------------------------------------
Reporter: Przemysław Suliga | Owner:
| Przemysław Suliga
Type: New feature | Status: assigned
Component: Database layer | Version: dev

(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 1
Needs tests: 1 | Patch needs improvement: 0

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

--
Ticket URL: <https://code.djangoproject.com/ticket/30398#comment:7>

Django

unread,
Oct 3, 2021, 9:53:54 AM10/3/21
to django-...@googlegroups.com
#30398: Check database connection's health before it is reused
-------------------------------------+-------------------------------------
Reporter: Przemysław Suliga | Owner:
| Przemysław Suliga
Type: New feature | Status: assigned
Component: Database layer | Version: dev
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Jacob Walls):

* needs_docs: 1 => 0
* needs_tests: 1 => 0


Comment:

[https://github.com/django/django/pull/14899 PR]

--
Ticket URL: <https://code.djangoproject.com/ticket/30398#comment:8>

Django

unread,
Oct 26, 2021, 12:17:18 AM10/26/21
to django-...@googlegroups.com
#30398: Check database connection's health before it is reused
-------------------------------------+-------------------------------------
Reporter: Przemysław Suliga | Owner:
| Przemysław Suliga
Type: New feature | Status: assigned
Component: Database layer | Version: dev
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 1

Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Mariusz Felisiak):

* needs_better_patch: 0 => 1


Comment:

See
[https://github.com/django/django/pull/14899#pullrequestreview-788437304
Florian's review].

--
Ticket URL: <https://code.djangoproject.com/ticket/30398#comment:9>

Django

unread,
Nov 29, 2021, 2:38:34 AM11/29/21
to django-...@googlegroups.com
#30398: Check database connection's health before it is reused
-------------------------------------+-------------------------------------
Reporter: Przemysław Suliga | Owner:
| Przemysław Suliga
Type: New feature | Status: assigned
Component: Database layer | Version: dev
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Mariusz Felisiak):

* needs_better_patch: 1 => 0


--
Ticket URL: <https://code.djangoproject.com/ticket/30398#comment:10>

Django

unread,
Nov 30, 2021, 3:33:30 PM11/30/21
to django-...@googlegroups.com
#30398: Check database connection's health before it is reused
-------------------------------------+-------------------------------------
Reporter: Przemysław Suliga | Owner:
| Przemysław Suliga
Type: New feature | Status: assigned
Component: Database layer | Version: dev
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Ready for
| checkin
Has patch: 1 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Mariusz Felisiak):

* stage: Accepted => Ready for checkin


--
Ticket URL: <https://code.djangoproject.com/ticket/30398#comment:11>

Django

unread,
Dec 1, 2021, 2:24:00 AM12/1/21
to django-...@googlegroups.com
#30398: Check database connection's health before it is reused
-------------------------------------+-------------------------------------
Reporter: Przemysław Suliga | Owner:
| Przemysław Suliga
Type: New feature | Status: closed

Component: Database layer | Version: dev
(models, ORM) |
Severity: Normal | Resolution: fixed

Keywords: | Triage Stage: Ready for
| checkin
Has patch: 1 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Mariusz Felisiak <felisiak.mariusz@…>):

* status: assigned => closed
* resolution: => fixed


Comment:

In [changeset:"4ce59f602ed28320caf3035212cb4d1c5430da2b" 4ce59f60]:
{{{
#!CommitTicketReference repository=""
revision="4ce59f602ed28320caf3035212cb4d1c5430da2b"
Fixed #30398 -- Added CONN_HEALTH_CHECKS database setting.

The CONN_HEALTH_CHECKS setting can be used to enable database
connection health checks for Django's persistent DB connections.

Thanks Florian Apolloner for reviews.
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/30398#comment:12>

Django

unread,
Dec 1, 2021, 5:00:29 AM12/1/21
to django-...@googlegroups.com
#30398: Check database connection's health before it is reused
-------------------------------------+-------------------------------------
Reporter: Przemysław Suliga | Owner:
| Przemysław Suliga
Type: New feature | Status: closed
Component: Database layer | Version: dev
(models, ORM) |
Severity: Normal | Resolution: fixed
Keywords: | Triage Stage: Ready for
| checkin
Has patch: 1 | Needs documentation: 0

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Claude Paroz):

Does that also fix #24810?

--
Ticket URL: <https://code.djangoproject.com/ticket/30398#comment:13>

Reply all
Reply to author
Forward
0 new messages