Should a queryset avoid using a pk of a deleted object?

95 views
Skip to first unread message

Carles Pina i Estany

unread,
Nov 4, 2020, 5:51:36 AM11/4/20
to django-d...@googlegroups.com

Hi all,

I have a question about Django ORM. I'd like to confirm that the current
behaviour is expected (in a way it is!) or find/open an issue if
needeed.

In this example I've defined two models (see them at the end of the
email if needed): one for a Book with a ManyToMany to authors, and a
second model of Author.

And code like this:
```
author = Author.objects.create(name='James')
book = Book.objects.create(title='test')

book.authors.add(author)
book.save()

authors = book.authors.all()

book.delete()

print(authors) # should a user expect all the authors?
```

In the `print(authors)` line the book is already deleted and it returns
no authors. It executes an SQL query where at that point book.pk does
not exist. I think that technically it could return wrong existing data
if the book.pk had been reused, AFAIK none of the default database
backends would do this but I think that it could happen if doing a field
with primary_key=True.

If the `authors = book.authors.all()` line was after the
`book.delete()`: Django raises a ValueError with the error message
'<Book: test> needs to have a value for field "id" before this
many-to-many relationship can be used.'

I understand why this happens. In some code I needed something along
these lines and I'm doing something like "authors =
list(book.authors.all())" to keep the list of authors, delete the book
and then do some actions in the authors.

I think that this can be confusing. Also, Django prevents the error in
one case (get query set after the deletion) but not the other case
(execute the query after deletion).

Fix idea:
If QuerySet was holding a book instance it could enforce that book.pk is
set at the time of running the query (or returning the result if the
result was cached?)

I had a quick look at QuerySet and the book instance is in
self._hints['instance']. In `QuerySet._fetch_all` I could do something
along the lines of:
`
if 'instance' in self._hints and self._hints['instance'].pk is None:
raise ValueError('Object deleted')
`

I guess that in
create_forward_many_to_many_manager.ManyRelatedManager.__init__ the book
instance could be saved in some variable in the QuerySet object (I guess
that self._hints['instance'] is not meant to be used like my example) to
check this when appropiate. Has this ever been considered? Or is the
behaviour as it is now good enough?

Thank you very much,

PS: Models used in the example:
```
class Author(models.Model):
name = models.TextField(max_length=64)

def __str__(self):
return self.name


class Book(models.Model):
title = models.TextField(max_length=128)
authors = models.ManyToManyField(Author)

def __str__(self):
return self.title
```

--
Carles Pina i Estany
https://carles.pina.cat

Jason Johns

unread,
Nov 7, 2020, 10:49:33 AM11/7/20
to Django developers (Contributions to Django itself)
Does https://code.djangoproject.com/ticket/22553 cover this case?  It was closed as a documentation update with this commit in the docs: https://github.com/django/django/pull/2974/files


From your example, re-doing the book.authors.all() call will refresh the queryset's cache.

Carles Pina i Estany

unread,
Nov 8, 2020, 6:30:31 AM11/8/20
to django-d...@googlegroups.com

Hi Jason,

On Nov/07/2020, Jason Johns wrote:
> Does https://code.djangoproject.com/ticket/22553 cover this case? It was
> closed as a documentation update with this commit in the docs:
> https://github.com/django/django/pull/2974/files

Thanks very much for the links, it's a related case one could say but
not my case.

It says "If the data in the database might have changed, you can get
updated results for the same query by calling ``all()`` on a previously
evaluated ``QuerySet``."

It doesn't cover this case because the queryset has never been evaluated
yet in my case and the first time that the queryset is evaluated is
using a book.id that doesn't exist anymore. The original code:

```
author = Author.objects.create(name='James')
book = Book.objects.create(title='test')

book.authors.add(author)
book.save()

authors = book.authors.all() # No queryset evaluation

book.delete() # book is deleted from the database

print(authors) # First time that the queryset is evaluated
# and it uses a book.id that doesn't exist anymore[1]
```

The query launched:
SELECT `modeltest_author`.`id`, `modeltest_author`.`name` FROM `modeltest_author` INNER JOIN `modeltest_book_authors` ON (`modeltest_author`.`id` = `modeltest_book_authors`.`author_id`) WHERE `modeltest_book_authors`.`book_id` = 1 LIMIT 21

If using a field as a primary key, deleting the book, creating a new one
with the same primary key: it's possible to expect (depends on the user
experience) the first set of authors but get the new noe (with the same
primary key).

Thanks again for the pointers,

Carles
> --
> You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to django-develop...@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/28155993-2ea0-4406-b9d0-44da422b3ad5n%40googlegroups.com.
Reply all
Reply to author
Forward
0 new messages