== Analysis
It seems the related A objects get deleted in a single query with an
unbounded `IN (...)` list of B objects.
In fact this pattern already led to an issue with the sqlite backend
([#link0]: sqlite supports only 999 parameters per query).
This was fixed in Django 1.8 by adding batched queries [#link1], [#link2]
with a size specified per backend (see the hook sketch after this list):
- sqlite: 999 [#link3]
- oracle: 64k [#link4]
- all others (base): not limited [#link5]
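For context, the hook that backends override has roughly the following
shape in the base backend (a paraphrased sketch of [#link5], not a
verbatim copy of the Django source):
{{{#!python
class BaseDatabaseOperations:
    def bulk_batch_size(self, fields, objs):
        """
        Number of objects that can be processed in a single query; the
        base backend does not limit it, while the sqlite and oracle
        backends override this method to return a capped value.
        """
        return len(objs)
}}}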
== Workaround
As a temporary workaround we monkey-patched the `connection` instance's
own `bulk_batch_size`, limiting it to 64k:
{{{#!python
import types


def monkey_patch_connection_bulk_batch_size(connection):
    def limited_bulk_batch_size(self, fields, objs):
        """
        PostgreSQL can crash with too many parameters in a query,
        e.g. 'DELETE FROM x WHERE x.y IN (...large list...)'
        => limit to 64k
        """
        return 2**16

    connection.ops.bulk_batch_size = types.MethodType(
        limited_bulk_batch_size, connection.ops
    )
}}}
It worked great in our case: we used it in a migration.
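For illustration, the patch could be applied in a data migration roughly
like this (the app label, model, and filter are hypothetical, and the
helper above is assumed to be importable):
{{{#!python
from django.db import migrations


def purge_old_b_objects(apps, schema_editor):
    # Cap the batch size before triggering the cascading delete
    # (hypothetical model and filter).
    monkey_patch_connection_bulk_batch_size(schema_editor.connection)
    B = apps.get_model('myapp', 'B')
    B.objects.filter(obsolete=True).delete()


class Migration(migrations.Migration):
    dependencies = [('myapp', '0001_initial')]
    operations = [migrations.RunPython(purge_old_b_objects)]
}}}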
Workaround limitations:
- no idea where to monkey-patch for global usage
- no idea how to choose the bulk size value
- it doesn't handle the more complex computation using the `fields` and
`objs` parameters
(Related remark: the above is for deleting the related objects; for the
main objects' deletion Django already uses batched deletion, but with a
much smaller batch size: `GET_ITERATOR_CHUNK_SIZE = 100`. With some
latency to the database that is a really small value: at 0.2 ms per
query, deleting 1M objects in batches of 100 means 10,000 queries, i.e.
2 s of round trips alone.)
== Context
- PostgreSQL 9.6 (GCP Cloud SQL)
- Django 2.0.13 (but the related code does not seem to have changed in
current master)
---
link 0 [=#link0] https://code.djangoproject.com/ticket/16426
link 1 [=#link1] https://github.com/django/django/blob/f6075fb333bae29ee213b050e91eaadef75496dd/django/db/models/deletion.py#L224-L225
link 2 [=#link2] https://github.com/django/django/blob/f6075fb333bae29ee213b050e91eaadef75496dd/django/db/models/deletion.py#L167
link 3 [=#link3] https://github.com/django/django/blob/f6075fb333bae29ee213b050e91eaadef75496dd/django/db/backends/sqlite3/operations.py#L27-L40
link 4 [=#link4] https://github.com/django/django/blob/f6075fb333bae29ee213b050e91eaadef75496dd/django/db/backends/oracle/operations.py#L592-L596
link 5 [=#link5] https://github.com/django/django/blob/f6075fb333bae29ee213b050e91eaadef75496dd/django/db/backends/base/operations.py#L65-L71
--
Ticket URL: <https://code.djangoproject.com/ticket/30533>
* version: 2.0 => master
Comment:
Thanks for the report. Do you know of any source that describes the max
number of parameters for PostgreSQL? (e.g. the 64k limit is specified in
the Oracle docs).
--
Ticket URL: <https://code.djangoproject.com/ticket/30533#comment:1>
Comment (by Thomas Riccardi):
Replying to [comment:1 felixxm]:
> Thanks for the report. Do you know of any source that describes the max
> number of parameters for PostgreSQL? (e.g. the 64k limit is specified
> in the Oracle docs).
I think there are no hardcoded limits in PostgreSQL; what probably
happened is a large memory allocation that resulted in an OOM kill of the
connection process on the database side: with more RAM the query could
probably have worked.
The query pattern in this scenario is (sketched below):
- get all B objects to delete: a SELECT
- delete all related A objects: the unbounded `DELETE ... IN` that crashes
- delete all B objects: `DELETE ... IN` in batches of 100
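Schematically, with hypothetical models (not the exact SQL Django emits):
{{{#!python
from django.db import models


class B(models.Model):
    obsolete = models.BooleanField(default=False)


class A(models.Model):
    b = models.ForeignKey(B, on_delete=models.CASCADE)


# A cascading delete:
B.objects.filter(obsolete=True).delete()
# issues roughly:
#   SELECT ... FROM app_b WHERE obsolete    -- collect the B objects
#   DELETE FROM app_a WHERE b_id IN (...)   -- one unbounded list: crashes
#   DELETE FROM app_b WHERE id IN (...)     -- batches of 100
}}}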
It would make sense to use the same size for the two `DELETE ... IN`
queries, but 100 is really small for DELETEs (the result size is
constant, not proportional to the number of deleted objects). Maybe add
another constant for DELETE: `GET_ITERATOR_CHUNK_SIZE` should be reserved
for reads (where the result size is proportional to the number of
objects), and the DELETE constant could use a much higher value. This
could be part of the fix for this ticket?
--
Ticket URL: <https://code.djangoproject.com/ticket/30533#comment:2>
* status: new => closed
* resolution: => wontfix
Comment:
If we don't have any restrictions in PostgreSQL I wouldn't classify this
as a bug in Django. In SQLite and Oracle we limit the number of query
parameters, or even the number of parameters in an `IN (...)` list,
because we have to. These are database limitations, which is not the case
on PostgreSQL. You have at least two ways to handle this:
- `delete()` in chunks (1-2M) using the ORM (see the sketch below), or
- [https://www.postgresql.org/docs/11/sql-truncate.html TRUNCATE ...
CASCADE] the table manually.
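For illustration, a minimal sketch of the first option (the helper name
and default chunk size are assumptions; tune the chunk size to the
database's available memory):
{{{#!python
def delete_in_chunks(queryset, chunk_size=1000000):
    """
    Delete a queryset in bounded chunks so that no single query
    (including cascades to related objects) gets an unbounded
    parameter list.
    """
    while True:
        # Re-evaluate each iteration: previously deleted rows are gone.
        pks = list(queryset.values_list('pk', flat=True)[:chunk_size])
        if not pks:
            break
        queryset.model.objects.filter(pk__in=pks).delete()
}}}
e.g. `delete_in_chunks(B.objects.filter(obsolete=True))`.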
--
Ticket URL: <https://code.djangoproject.com/ticket/30533#comment:3>