#30646 - close_if_unusable_or_obsolete fails to close unusable connections


Daniel Neuhaeuser

Jul 18, 2019, 4:22:32 PM
to Django developers (Contributions to Django itself)
Hello,

I'd like to start a discussion about a ticket I created, https://code.djangoproject.com/ticket/30646, which was closed as "wontfix".

Essentially, the ticket is about how to handle database disconnects. These can happen for a variety of reasons, such as network problems or a failover from a master to a replica. It took us quite some time to figure out how to handle this reasonably, and we now do it through a combination of:

* Checking whether the connection is usable (which is what the ticket and the associated PR do)
* Exponential backoff and retry
* A middleware that detects database errors that are likely transient and turns them into a 503 response with a Retry-After header, in situations where retrying would introduce unreasonably high latency
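To make the second bullet concrete, here is a minimal sketch of exponential backoff with jitter in plain Python. The helper name, the default retry counts, and the choice of which exceptions count as transient are illustrative assumptions, not code from our actual system:

```python
import random
import time


def retry_with_backoff(func, *, retries=5, base_delay=0.1, max_delay=2.0,
                       transient=(OSError,)):
    """Call func(), retrying transient failures with exponential backoff.

    Hypothetical helper: which exceptions are 'transient' depends on the
    database driver in use (e.g. psycopg2.OperationalError).
    """
    for attempt in range(retries):
        try:
            return func()
        except transient:
            if attempt == retries - 1:
                raise  # out of retries: propagate the last error
            # Cap the exponential delay and add full jitter so that many
            # workers retrying at once do not stampede the database.
            delay = min(max_delay, base_delay * 2 ** attempt)
            time.sleep(random.uniform(0, delay))
```

The jitter matters in practice: after a failover, every worker notices the dead connection at roughly the same moment, and synchronized retries can overload the freshly promoted replica.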

Given that Django aims to be a full-stack framework for "perfectionists with deadlines", I think this is complexity that should move into the framework. At the very least it should be seriously considered.

SQLAlchemy, while not solving the problem fully, covers it in detail in its documentation: https://docs.sqlalchemy.org/en/13/core/pooling.html#dealing-with-disconnects. The "pre ping" approach described there corresponds to what I've proposed in the ticket and the associated PR.
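For reference, SQLAlchemy's pre-ping is a single pool option; this is a configuration fragment only, with a placeholder connection URL:

```python
from sqlalchemy import create_engine

# pool_pre_ping issues a lightweight "ping" when a connection is checked
# out of the pool and transparently replaces connections that have gone
# stale, at the cost of one extra round trip per checkout.
engine = create_engine(
    "postgresql://user:password@localhost/mydb",  # placeholder URL
    pool_pre_ping=True,
)
```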


Kind regards,
Daniel

Aymeric Augustin

Jul 18, 2019, 5:53:15 PM
to django-d...@googlegroups.com
Hello Daniel,

In the PR you propose to ping all database connections at the beginning and at the end of every HTTP request. I'm positive that this will have a significant performance impact. It's quite common to have secondary database connections that are rarely used, not scaled to be pinged twice per HTTP request, and sometimes slow to connect to.

I understand that you solved this problem in the context of Zalando. It's unclear to me that your solution easily generalizes to every Django-based system.

Of course, a serious, concrete proposal will be carefully considered. What we're missing at this point is a proposal that we can discuss!

As you know, Django currently handles disconnects optimistically. It accepts returning one 500 error every time a worker loses its database connection. You can reduce this to one 500 error only when a worker loses its connection while processing a request (rather than while waiting for the next one) by disabling persistent connections with CONN_MAX_AGE = 0, but the performance hit is likely not worth the reliability improvement. On the ticket you say that you're seeing "a number of requests failing with 500s on every failover". Is that one request per thread, or several requests in the same thread? The latter would be a bug.
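For readers following along, disabling persistent connections is a settings change; the values below are illustrative, and 0 is in fact Django's default:

```python
# settings.py -- CONN_MAX_AGE = 0 opens and closes a database connection
# per request, so a dropped connection only affects requests that were
# already in flight when it dropped.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "mydb",          # placeholder values
        "CONN_MAX_AGE": 0,       # no persistent connections (the default)
    }
}
```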

You'd like to handle disconnects pessimistically. Since middleware runs outside per-view transaction handling (when ATOMIC_REQUESTS is enabled), I think you can do everything you need in a middleware, which is likely what you have done (unless you had to patch Django?). A third-party package would be a good way to demonstrate the feature and to document its pros and cons.

Best regards,

-- 
Aymeric.



