Database initially looks like this:
{{{
parent1 -- child1
}}}
User runs `Parent.objects.all().prefetch_related('child_set')`.
Django fetches `parent1`.
Another transaction commits, resulting in this structure:
{{{
parent1 +- child1
`- child2
parent2
}}}
Django fetches all children pointing to `parent1`, which returns `child1`
and `child2`.
The result of the query is:
{{{
parent1 +- child1
`- child2
}}}
But at no point in the history did the database look like this.
This bug report is based on code inspection.
I verified that `prefetch_related` doesn't start a transaction.
However I didn't write code exhibiting the race condition because that's a
bit complicated.
--
Ticket URL: <https://code.djangoproject.com/ticket/27403>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
* stage: Unreviewed => Accepted
--
Ticket URL: <https://code.djangoproject.com/ticket/27403#comment:1>
Comment (by Rivo Laks):
Correct me if I'm wrong, but I think just wrapping the
`prefetch_related()` in a transaction wouldn't fix this.
By default, at least in Postgres, queries made within a transaction can
still see data from other _committed_ transactions, so the same problem
would occur.
In Postgres, you could select a higher isolation level, I think Repeatable
Read would prevent the case you've described. But this has other
consequences and should be done on per-app basis. See
https://www.postgresql.org/docs/current/static/transaction-iso.html for
more info.
If I'm right, maybe docs should note that there might be edge cases
regards transactional consistency?
--
Ticket URL: <https://code.djangoproject.com/ticket/27403#comment:2>
* cc: Shai Berger (added)
Comment:
Rivo, I think you're wrong, but not in the direction you suspected: In
fact, REPEATABLE READ would still not fix this, and as far as I
understand, on any database that isn't PG, even SERIALIZABLE won't --
unless we first select (with no need to fetch) the join of the tables. But
that would undo much of the performance benefits of `prefetch_related`...
Some databse engines -- notably SQL Server -- can return several result
sets (=querysets) from a single command (typically a procedure call).
Something like that could, at least in principle, help us here.
--
Ticket URL: <https://code.djangoproject.com/ticket/27403#comment:3>
* type: Bug => Cleanup/optimization
* component: Database layer (models, ORM) => Documentation
Comment:
I agree with both conclusions. Repurposing the ticket for a mention in the
documentation.
--
Ticket URL: <https://code.djangoproject.com/ticket/27403#comment:4>
* owner: nobody => Tim Bell
* status: new => assigned
Comment:
I'm looking at this ticket during the DjangoCon sprints.
I've reproduced a version of this race condition using the models defined
in the `prefetch_related()` docs
(https://docs.djangoproject.com/en/4.2/ref/models/querysets/#prefetch-
related). By adding a `sleep()` at the start of
`prefetch_related_objects()`, I was able to run another database query in
between the two queries required to evaluate
`Pizza.objects.prefetch_related('toppings')`.
When deleting `Pizza.objects.get(name="Hawaiian")` during the sleep, we
get this:
{{{
>>> Pizza.objects.prefetch_related('toppings')
Sleeping...
<QuerySet [<Pizza: Hawaiian ()>, <Pizza: Seafood (prawns, smoked
salmon)>]>
}}}
Just like in the example from this ticket's description, at no point in
the history did the database look like this.
I'll now try to write a succinct description of this issue (only one or
two sentences) to add to the docs.
--
Ticket URL: <https://code.djangoproject.com/ticket/27403#comment:5>
* has_patch: 0 => 1
--
Ticket URL: <https://code.djangoproject.com/ticket/27403#comment:6>
* stage: Accepted => Ready for checkin
--
Ticket URL: <https://code.djangoproject.com/ticket/27403#comment:7>
* status: assigned => closed
* resolution: => fixed
Comment:
In [changeset:"ee104251c403fbac83b8475163ff2ac01c567d25" ee10425]:
{{{
#!CommitTicketReference repository=""
revision="ee104251c403fbac83b8475163ff2ac01c567d25"
Fixed #27403 -- Doc'd that QuerySet.prefetch_related() doesn't guarantee
transactional consistency.
Added a note about the potential race condition in prefetch_related()
that could produce an inconsistent result, one that does not correspond
to any point in the database history.
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/27403#comment:8>
Comment (by Mariusz Felisiak <felisiak.mariusz@…>):
In [changeset:"8b18e0bb3ba8bb1f51e15487a6d5402853e637ae" 8b18e0bb]:
{{{
#!CommitTicketReference repository=""
revision="8b18e0bb3ba8bb1f51e15487a6d5402853e637ae"
[5.0.x] Fixed #27403 -- Doc'd that QuerySet.prefetch_related() doesn't
guarantee transactional consistency.
Added a note about the potential race condition in prefetch_related()
that could produce an inconsistent result, one that does not correspond
to any point in the database history.
Backport of ee104251c403fbac83b8475163ff2ac01c567d25 from main
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/27403#comment:9>