[Django] #35434: prefetch_related_objects fails to cache UUID FKs when the string representation of a UUID is used

21 views
Skip to first unread message

Django

unread,
May 5, 2024, 5:47:52 AMMay 5
to django-...@googlegroups.com
#35434: prefetch_related_objects fails to cache UUID FKs when the string
representation of a UUID is used
-------------------------------------+-------------------------------------
Reporter: selcuk | Owner: nobody
Type: Bug | Status: new
Component: Database | Version: dev
layer (models, ORM) |
Severity: Normal | Keywords:
Triage Stage: | Has patch: 0
Unreviewed |
Needs documentation: 0 | Needs tests: 0
Patch needs improvement: 0 | Easy pickings: 0
UI/UX: 0 |
-------------------------------------+-------------------------------------
Consider the following models:

{{{
class Pet(models.Model):
id = models.UUIDField(primary_key=True, default=uuid.uuid4,
editable=False)
name = models.CharField(max_length=20)


class Toy(models.Model):
pet = models.ForeignKey(Pet, models.CASCADE, related_name="toys")
name = models.CharField(max_length=20)
}}}
The following works and caches the deferred {{{pet}}} instance as
expected:
{{{
pet = Pet.objects.create(name="Fifi")
toy_bone = Toy(pet_id=str(pet.id), name="Bone")
print(toy_bone.pet)
}}}
It fails if {{{prefetch_related_objects}}} is used:
{{{
pet = Pet.objects.create(name="Fifi")
toy_bone = Toy(pet_id=str(pet.id), name="Bone")
prefetch_related_objects([toy_bone], "pet")
print(toy_bone.pet)
}}}
{{{
raise self.RelatedObjectDoesNotExist(
^^^^^^^^^^^^^^^^^
prefetch_related.models.Toy.pet.RelatedObjectDoesNotExist: Toy has no pet.
}}}
The behaviour is inconsistent as str to UUID conversion is automatically
performed in the first example.

One may argue that the use of {{{prefetch_related_objects}}} is redundant
in this case. However, it is useful in async code where deferred lookups
are not automatic.
--
Ticket URL: <https://code.djangoproject.com/ticket/35434>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

unread,
May 5, 2024, 5:51:43 AMMay 5
to django-...@googlegroups.com
#35434: prefetch_related_objects fails to cache UUID FKs when the string
representation of a UUID is used
-------------------------------------+-------------------------------------
Reporter: selcuk | Owner: nobody
Type: Bug | Status: new
Component: Database layer | Version: dev
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Description changed by selcuk:

Old description:
New description:
Unit test to reproduce the issue:
https://github.com/django/django/pull/18132

--
--
Ticket URL: <https://code.djangoproject.com/ticket/35434#comment:1>

Django

unread,
May 5, 2024, 7:45:44 AMMay 5
to django-...@googlegroups.com
#35434: prefetch_related_objects fails to cache UUID FKs when the string
representation of a UUID is used
-------------------------------------+-------------------------------------
Reporter: Selcuk Ayguney | Owner: nobody
Type: Bug | Status: new
Component: Database layer | Version: dev
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Description changed by Selcuk Ayguney:
Unit test to reproduce the issue and the suggested fix:
https://github.com/django/django/pull/18132

--
--
Ticket URL: <https://code.djangoproject.com/ticket/35434#comment:2>

Django

unread,
May 5, 2024, 7:47:25 AMMay 5
to django-...@googlegroups.com
#35434: prefetch_related_objects fails to cache UUID FKs when the string
representation of a UUID is used
-------------------------------------+-------------------------------------
Reporter: Selcuk Ayguney | Owner: nobody
Type: Bug | Status: new
Component: Database layer | Version: dev
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage:
| Unreviewed
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Selcuk Ayguney):

* has_patch: 0 => 1

--
Ticket URL: <https://code.djangoproject.com/ticket/35434#comment:3>

Django

unread,
May 5, 2024, 11:43:55 AMMay 5
to django-...@googlegroups.com
#35434: prefetch_related_objects fails to cache UUID FKs when the string
representation of a UUID is used
-------------------------------------+-------------------------------------
Reporter: Selcuk Ayguney | Owner: Selcuk
| Ayguney
Type: Bug | Status: assigned
Component: Database layer | Version: dev
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Jacob Walls):

* owner: nobody => Selcuk Ayguney
* stage: Unreviewed => Accepted
* status: new => assigned

Comment:

I ran into a symptom of this problem recently and didn't get far enough to
diagnose the root cause.

I was prefetching at two levels, so my
[https://github.com/archesproject/arches/blob/3940c46627fb3dedbbe58cb64f892b517d86dc99/arches/app/utils/index_database.py#L220-L224
hack] was to just iterate the objects and reselect the object at the
intermediate level (reintroducing N+1 queries at that level), but at least
allowing me to avoid N+1 queries at the second level.

Thanks for the report.
--
Ticket URL: <https://code.djangoproject.com/ticket/35434#comment:4>

Django

unread,
May 5, 2024, 12:36:55 PMMay 5
to django-...@googlegroups.com
#35434: prefetch_related_objects fails to cache UUID FKs when the string
representation of a UUID is used
-------------------------------------+-------------------------------------
Reporter: Selcuk Ayguney | Owner: Selcuk
| Ayguney
Type: Bug | Status: assigned
Component: Database layer | Version: dev
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Simon Charette):

* cc: Simon Charette (added)

Comment:

Have we performed any form of benchmarks to assess the impact of calling
`to_python` for each sides of the prefetching ''bridge''?
`Field.to_python` can perform non-trivial validation and instance checks
that quickly add up when dealing with large collection of objects.

Given `prefetch_related_object` is normally called from a
`prefetch_related` queryset where both sides are already guaranteed to be
of the right type and serializers/forms are meant to be turn
representations of objects to the expected field types at input boundaries
it seems like this change could do more harm than good both from a
performance and areas of concerns violation perspective.

My question to you would be why you are assigning the unexpected type to
field in the first place (the `pet_id=str(pet.id)`)? Plenty of things
subtly break if you assign the string representation of a `DateField`,
`DecimalField`, `FloatField`, etc to a model instance field instead of the
proper type so I'm not sure how this particular case is special. If you
have to assign such properties I believe you should call `.full_clean()` /
`clean_fields()` on the model instance and be ready to deal with any
associated `ValidationError`.

I'm more of the opinion that we should wont-fix this issue than proceed
here.
--
Ticket URL: <https://code.djangoproject.com/ticket/35434#comment:5>

Django

unread,
May 5, 2024, 12:49:52 PMMay 5
to django-...@googlegroups.com
#35434: prefetch_related_objects fails to cache UUID FKs when the string
representation of a UUID is used
-------------------------------------+-------------------------------------
Reporter: Selcuk Ayguney | Owner: Selcuk
| Ayguney
Type: Bug | Status: assigned
Component: Database layer | Version: dev
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Comment (by Selcuk Ayguney):

True, I already implemented a similar workaround in the codebase I was
working on when I spotted this issue, and honestly I'm not sure if this is
a wontfix or a legitimate bug. I haven't performed any benchmark either.
My rationale was that the usual Django deferred lookup works when a
{{{str}}} is used, but it only fails when {{{prefetch_related_objects}}}
is used, so this looked like the more consistent behaviour. I don't think
other types such as {{{DecimalField}}} etc are commonly used as primary
keys.

In my real world example the model instance is instantiated from a JSON
object in a generic way, and foreign keys are set using the {{{attname}}}s
where it is not possible to know the actual data type.
--
Ticket URL: <https://code.djangoproject.com/ticket/35434#comment:6>

Django

unread,
May 5, 2024, 12:55:16 PMMay 5
to django-...@googlegroups.com
#35434: prefetch_related_objects fails to cache UUID FKs when the string
representation of a UUID is used
-------------------------------------+-------------------------------------
Reporter: Selcuk Ayguney | Owner: Selcuk
| Ayguney
Type: Bug | Status: assigned
Component: Database layer | Version: dev
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Comment (by Simon Charette):

> My rationale was that the usual Django deferred lookup works when a str
is used, but it only fails when prefetch_related_objects is used, so this
looked like the more consistent behaviour.

I understand where that could be perceived as inconsistent but I would
argue that it's undefined behaviour in both cases. It only happens to work
in the lazy loading case because it incurs a query that defers the lookup
to the backend.

> I don't think other types such as DecimalField etc are commonly used as
primary keys.

Fair point, it remains unnecessary while arguably faster validation for
`AutoField, `UUIDField`, and others though.

> In my real world example the model instance is instantiated from a JSON
object in a generic way, and foreign keys are set using the attnames where
it is not possible to know the actual data type.

If you are turning a ''raw'' JSON object to a model instance I think that
you should be expected to call `clean_fields` at the very least.
--
Ticket URL: <https://code.djangoproject.com/ticket/35434#comment:7>

Django

unread,
May 5, 2024, 1:06:47 PMMay 5
to django-...@googlegroups.com
#35434: prefetch_related_objects fails to cache UUID FKs when the string
representation of a UUID is used
-------------------------------------+-------------------------------------
Reporter: Selcuk Ayguney | Owner: Selcuk
| Ayguney
Type: Bug | Status: assigned
Component: Database layer | Version: dev
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Comment (by Selcuk Ayguney):

> It only happens to work in the lazy loading case because it incurs a
query that defers the lookup to the backend.

Agreed. Granted, it is still because of the database backend, but strings
can be used interchangeably with UUIDs in almost everywhere, for example
{{{Toy.objects.filter(pk="00000000-...").exists()}}}. This makes the
behaviour in this ticket even more peculiar.

> That's what serialization layers are usually for; turn data into their
proper model equivalent.

Sure thing. As I mentioned I have already changed my code to validate the
instance, but this behaviour took me by surprise and it looked like it
violates the least astonishment principle.
--
Ticket URL: <https://code.djangoproject.com/ticket/35434#comment:8>

Django

unread,
May 5, 2024, 1:46:39 PMMay 5
to django-...@googlegroups.com
#35434: prefetch_related_objects fails to cache UUID FKs when the string
representation of a UUID is used
-------------------------------------+-------------------------------------
Reporter: Selcuk Ayguney | Owner: Selcuk
| Ayguney
Type: Bug | Status: assigned
Component: Database layer | Version: dev
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Comment (by Jacob Walls):

> I understand where that could be perceived as inconsistent but I would
argue that it's undefined behaviour in both cases. It only happens to work
in the lazy loading case because it incurs a query that defers the lookup
to the backend.

Interesting. If `pet_id=str(pet.id)` is undefined, then I'm looking at a
huge yak shave to audit my projects for UUIDs represented as strings,
which I'm pretty sure we're doing in every view, test, model, migration...
--
Ticket URL: <https://code.djangoproject.com/ticket/35434#comment:9>

Django

unread,
May 5, 2024, 1:55:49 PMMay 5
to django-...@googlegroups.com
#35434: prefetch_related_objects fails to cache UUID FKs when the string
representation of a UUID is used
-------------------------------------+-------------------------------------
Reporter: Selcuk Ayguney | Owner: Selcuk
| Ayguney
Type: Bug | Status: assigned
Component: Database layer | Version: dev
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage: Accepted
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Comment (by Simon Charette):

There's a difference between lookup against the database (e.g.
`filter(pet_id=str(pet.id))`) which performs implicit conversions and
direct model attribute assignments. I would argue that if you explicitly
assign the string representation of objects meant to be stored in a field
to a model instance you are effectively doing things wrong.
--
Ticket URL: <https://code.djangoproject.com/ticket/35434#comment:10>

Django

unread,
May 5, 2024, 7:02:35 PMMay 5
to django-...@googlegroups.com
#35434: prefetch_related_objects fails to cache UUID FKs when the string
representation of a UUID is used
-------------------------------------+-------------------------------------
Reporter: Selcuk Ayguney | Owner: (none)
Type: | Status: new
Cleanup/optimization |
Component: Documentation | Version: dev
Severity: Normal | Resolution:
Keywords: | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Jacob Walls):

* component: Database layer (models, ORM) => Documentation
* has_patch: 1 => 0
* owner: Selcuk Ayguney => (none)
* stage: Accepted => Unreviewed
* status: assigned => new
* type: Bug => Cleanup/optimization

Comment:

Thanks, that helps me understand the severity of the anti-pattern.

Now I'm aware that despite the implicit type-conversion you get on the
database-side, python-side operations like `==` or `obj in queryset`
become unsafe when a stringified UUID is assigned to a model attribute.

Or even `queryset.contains()`, but only if called on an evaluated
queryset.


{{{
In [1]: from models import GraphModel

In [2]: original = GraphModel.objects.first()

In [3]: unsafe = GraphModel(str(original.pk))

In [4]: all_graphs = GraphModel.objects.all()

In [5]: all_graphs.contains(unsafe)
Out[5]: True

In [6]: len(all_graphs)
Out[6]: 1

In [7]: all_graphs.contains(unsafe)
Out[7]: False

}}}

I think a version of that REPL would be useful in the docs somewhere.
Calling your query tested by testing as far as line 5 but not line 7 is
something I can see happening.

Selcuk, is it okay if I reframe your ticket as a documentation request and
ask for another opinion?
--
Ticket URL: <https://code.djangoproject.com/ticket/35434#comment:11>

Django

unread,
May 5, 2024, 10:47:54 PMMay 5
to django-...@googlegroups.com
#35434: prefetch_related_objects fails to cache UUID FKs when the string
representation of a UUID is used
-------------------------------------+-------------------------------------
Reporter: Selcuk Ayguney | Owner: (none)
Type: | Status: new
Cleanup/optimization |
Component: Documentation | Version: dev
Severity: Normal | Resolution:
Keywords: | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Comment (by Selcuk Ayguney):

Replying to [comment:11 Jacob Walls]:
> Selcuk, is it okay if I reframe your ticket as a documentation request
and ask for another opinion?
It's perfectly fine. Simon has a good point, and I agree that it is easy
to handle this on the application side.
--
Ticket URL: <https://code.djangoproject.com/ticket/35434#comment:12>

Django

unread,
May 7, 2024, 4:03:20 AMMay 7
to django-...@googlegroups.com
#35434: prefetch_related_objects fails to cache UUID FKs when the string
representation of a UUID is used
-------------------------------------+-------------------------------------
Reporter: Selcuk Ayguney | Owner: (none)
Type: | Status: closed
Cleanup/optimization |
Component: Documentation | Version: dev
Severity: Normal | Resolution: needsinfo
Keywords: | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Sarah Boyce):

* resolution: => needsinfo
* status: new => closed

Comment:

Hi all, currently closing as "needsinfo". It's not clear to me where in
the docs you're hoping to add something or what would have been useful in
the docs for this case. Can you share an idea of what you have in mind?
--
Ticket URL: <https://code.djangoproject.com/ticket/35434#comment:13>

Django

unread,
May 10, 2024, 11:55:38 AMMay 10
to django-...@googlegroups.com
#35434: prefetch_related_objects fails to cache UUID FKs when the string
representation of a UUID is used
-------------------------------------+-------------------------------------
Reporter: Selcuk Ayguney | Owner: (none)
Type: | Status: new
Cleanup/optimization |
Component: Documentation | Version: dev
Severity: Normal | Resolution:
Keywords: | Triage Stage:
| Unreviewed
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Jacob Walls):

* has_patch: 0 => 1
* resolution: needsinfo =>
* status: closed => new

Comment:

[https://github.com/django/django/pull/18156 Suggested doc edits]
--
Ticket URL: <https://code.djangoproject.com/ticket/35434#comment:14>

Django

unread,
May 10, 2024, 3:25:29 PMMay 10
to django-...@googlegroups.com
#35434: prefetch_related_objects fails to cache UUID FKs when the string
representation of a UUID is used
-------------------------------------+-------------------------------------
Reporter: Selcuk Ayguney | Owner: (none)
Type: | Status: closed
Cleanup/optimization |
Component: Documentation | Version: dev
Severity: Normal | Resolution: wontfix
Keywords: | Triage Stage:
| Unreviewed
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by Natalia Bidart):

* resolution: => wontfix
* status: new => closed

Comment:

I have reviewed the PR and as mentioned there I'm not comfortable with the
adding, I think it's misleading and could be interpreted it as a use case
that Django supports when it does not. In my view, the example line doing
`DailyEntry(day="2020-01-01", headline="Happy New Year!")` goes in direct
opposition of [comment:10 what Simon said here]:

> There's a difference between lookup against the database (e.g.
`filter(pet_id=str(pet.id))`) which performs implicit conversions and
direct model attribute assignments. I would argue that if you explicitly
assign the string representation of objects meant to be stored in a field
to a model instance you are effectively doing something wrong.
--
Ticket URL: <https://code.djangoproject.com/ticket/35434#comment:15>
Reply all
Reply to author
Forward
0 new messages