`Comment.objects.order_by('post_id',
'created_at').distinct('post_id').delete()`
Before proceeding I tested it with:
`Comment.objects.order_by('post_id',
'created_at').distinct('post_id').count()`
Made sure the result actually contains what I needed and proceeded with
the `delete()`. The result was rather surprising. I was notified that the
whole table was wiped clean. I then checked the actual SQL that was
executed and it was a simple `DELETE FROM comments;`.
As an ORM user, I would say it is the worst outcome possible and I would
at least expect an error in such a case or ideally a SQL of what I was
trying to achieve. At the same time, `count` and `delete` produces an
inconsistent result which is even more mistaking.
Potential solutions:
* raise an error with a decent explanation
* produce a desired SQL according to the query
--
Ticket URL: <https://code.djangoproject.com/ticket/32433>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
Old description:
> I was looking for a way to delete the first Comment of each Post (a
> sample domain). Since I know that every new Post starts with a system
> generated comment I decided to go with:
>
> `Comment.objects.order_by('post_id',
> 'created_at').distinct('post_id').delete()`
>
> Before proceeding I tested it with:
>
> `Comment.objects.order_by('post_id',
> 'created_at').distinct('post_id').count()`
>
> Made sure the result actually contains what I needed and proceeded with
> the `delete()`. The result was rather surprising. I was notified that the
> whole table was wiped clean. I then checked the actual SQL that was
> executed and it was a simple `DELETE FROM comments;`.
>
> As an ORM user, I would say it is the worst outcome possible and I would
> at least expect an error in such a case or ideally a SQL of what I was
> trying to achieve. At the same time, `count` and `delete` produces an
> inconsistent result which is even more mistaking.
>
> Potential solutions:
> * raise an error with a decent explanation
> * produce a desired SQL according to the query
New description:
I was looking for a way to delete the first Comment of each Post (a sample
domain). Since I know that every new Post starts with a system generated
comment I decided to go with:
`Comment.objects.order_by('post_id',
'created_at').distinct('post_id').delete()`
Before proceeding I tested it with:
`Comment.objects.order_by('post_id',
'created_at').distinct('post_id').count()`
Made sure the result actually contains what I needed and proceeded with
the `delete()`. The result was rather surprising. I was notified that the
whole table was wiped clean. I then checked the actual SQL that was
executed and it was a simple `DELETE FROM comments;`.
As an ORM user, I would say it is the worst outcome possible and I would
at least expect an error in such a case or ideally a SQL of what I was
trying to achieve. At the same time, `count` and `delete` produces an
inconsistent result which is even more mistaking.
Potential solutions:
* raise an error with a decent explanation
* produce a desired SQL according to the query
Since I have never submitted a change to Django, I have a very limited
knowledge of the ORM and its intricacies. I could give it a try and issue
a patch for this with some guidance.
--
--
Ticket URL: <https://code.djangoproject.com/ticket/32433#comment:1>
* keywords: orm, delete, distinct, postgresql => orm, delete, distinct
* type: Uncategorized => Bug
* stage: Unreviewed => Accepted
Comment:
I think we should start by failing loudly when `distinct().delete()` is
used as adding support for it would require a subquery pushdown (`DELETE
FROM table WHERE field IN (SELECT DISTINCT field FROM table`) which would
require a non negligible amount of work and additional complexity for the
rare cases where this is useful and a manual `__in` subquery lookup can be
used to achieve the same results e.g.
{{{#!python
Comment.objects.filter(
post_id__in=Comment.objects.order_by('post_id',
'created_at').distinct('post_id').values('post_id')
).delete()
}}}
**egism** if you want to work on a patch you should raise a `TypeError` in
`QuerySet.delete` if `self.query.distinct or self.query.distinct_fields`.
A regression test can then be added to `delete_regress/tests`
([https://github.com/django/django/blob/5fcfe5361e5b8c9738b1ee4c1e9a6f293a7dda40/tests/delete_regress/tests.py
source]).
--
Ticket URL: <https://code.djangoproject.com/ticket/32433#comment:2>
Comment (by egism):
Hi Simon,
Sounds very simple, I will give it a go. Will be a good opportunity to
make my first contribution. Thanks for directions.
--
Ticket URL: <https://code.djangoproject.com/ticket/32433#comment:3>
* owner: nobody => egism
* status: new => assigned
--
Ticket URL: <https://code.djangoproject.com/ticket/32433#comment:4>
Comment (by egism):
[https://github.com/django/django/pull/13992 PR]
--
Ticket URL: <https://code.djangoproject.com/ticket/32433#comment:5>
* has_patch: 0 => 1
--
Ticket URL: <https://code.djangoproject.com/ticket/32433#comment:6>
* stage: Accepted => Ready for checkin
--
Ticket URL: <https://code.djangoproject.com/ticket/32433#comment:7>
* status: assigned => closed
* resolution: => fixed
Comment:
In [changeset:"6307c3f1a123f5975c73b231e8ac4f115fd72c0d" 6307c3f1]:
{{{
#!CommitTicketReference repository=""
revision="6307c3f1a123f5975c73b231e8ac4f115fd72c0d"
Fixed #32433 -- Added error message on QuerySet.delete() following
distinct().
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/32433#comment:8>
Comment (by Mariusz Felisiak <felisiak.mariusz@…>):
In [changeset:"aa1aed923b391b56d514be499c4ddf4b5110b377" aa1aed92]:
{{{
#!CommitTicketReference repository=""
revision="aa1aed923b391b56d514be499c4ddf4b5110b377"
[3.2.x] Fixed #32433 -- Added error message on QuerySet.delete() following
distinct().
Backport of 6307c3f1a123f5975c73b231e8ac4f115fd72c0d from master
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/32433#comment:9>
Comment (by Mariusz Felisiak <felisiak.mariusz@…>):
In [changeset:"28e2077148f7602d29165e90965974698819cbba" 28e20771]:
{{{
#!CommitTicketReference repository=""
revision="28e2077148f7602d29165e90965974698819cbba"
Refs #32433 -- Reallowed calling QuerySet.delete() after distinct().
While values(*field_excluding_pk).distinct() and
distinct(*field_excluding_pk) can reduce the number of resulting rows
in a way that makes subsequent delete() calls ambiguous standalone
.distinct() calls cannot.
Since delete() already disallows chain usages with values() the only
case that needs to be handled, as originally reported, is when
DISTINCT ON is used via distinct(*fields).
Refs #32682 which had to resort to subqueries to prevent duplicates in
the admin and caused significant performance regressions on MySQL
(refs #34639).
This partly reverts 6307c3f1a123f5975c73b231e8ac4f115fd72c0d.
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/32433#comment:10>