[Django] #34207: Incorrect SQL query when adding a ManyToMany related object with a "through" table prevents adding a new relationship if the new relationship is identical except for a different value for "through

Django

unread,

Dec 10, 2022, 8:59:00 AM12/10/22

to django-...@googlegroups.com

#34207: Incorrect SQL query when adding a ManyToMany related object with a
"through" table prevents adding a new relationship if the new relationship
is identical except for a different value for "through_defaults"
-------------------------------------+-------------------------------------
Reporter: | Owner: nobody
Credentive |
Type: Bug | Status: new
Component: Database | Version: 4.1
layer (models, ORM) | Keywords: ManyToManyField
Severity: Normal | through
Triage Stage: | Has patch: 0
Unreviewed |
Needs documentation: 0 | Needs tests: 0
Patch needs improvement: 0 | Easy pickings: 0
UI/UX: 0 |
-------------------------------------+-------------------------------------
I cannot add a related element to an object that is related via a
ManyToMany relationship with a through field. I believe I have traced the
error to a bug in the SQL query generated.

Some Code extracts to set the stage:

models.py:

{{{
class Version(models.Model):
uuid = models.UUIDField(primary_key=True, unique=True, editable=False,
default=uuid.uuid4, )
<...>

class Section(models.Model):
uuid = models.UUIDField(primary_key=True, unique=True, editable=False,
default=uuid.uuid4, )
<...>

class Statement(models.Model):
uuid = models.UUIDField(primary_key=True, unique=True, editable=False,
default=uuid.uuid4, )
<...>
section = models.ManyToManyField(
Section, through="StatementLocation", symmetrical=False,
related_name="statements_in_section"
)

class StatementLocation(models.Model):
section = models.ForeignKey(Section, on_delete=models.CASCADE)
statement = models.ForeignKey(Statement, on_delete=models.CASCADE)
sl_version = models.ForeignKey(Version, on_delete=models.CASCADE)

class Meta:
constraints = [
models.UniqueConstraint(
fields=["section", "statement", "sl_version"],
name="unique_statement_location_per_version"
),
models.UniqueConstraint(fields=["statement", "sl_version"],
name="statement_appears_once_per_version"),
]
}}}

The idea behind the model is that we are representing a document that has
statements in sections. Statements can be moved around in different
versions, so the relationship between a statement and a section, called
"statement location", must be tied to a version.

I have a function under the Version model called "copy". The "copy"
function produces a copy of the version and recreates the existing
relationships.

models.Version (extract):

{{{
def copy(self, name: str, public: bool) -> "Version":
# Make a copy of a published version and return a reference to the
copy
# Create the new version object in the Database
new_version = Version(
name=name,
state="DRAFT",
public=public,
obsoletes=self,
authority=self.authority,
)
new_version.save()

return new_version
<...>

for statement in self.version_data.statements:
<...>
sect = statement.section.filter(version=self).get()
statement.section.add(sect, through_defaults={"sl_version":
new_version})
<...>
}}}

When executing the function, I noticed that the new "StatementLocation"
objects were not being created. For example:

{{{
>>> Version.objects.all()
<QuerySet [<Version: Document - 2.2>]>
>>> version = Version.objects.get()
>>> StatementLocation.objects.count()
89
>>> StatementLocation.objects.filter(sl_version=version).count()
89
>>> new_verson = version.copy(name="2.3", public=True)
>>> new_verson
<Version: Document - 2.3>
>>> StatementLocation.objects.count()
89
>>> StatementLocation.objects.filter(sl_version=new_verson).count()
0
>>>
}}}

If I try a simplified version, with SQL query logging enabled, I get the
following:

{{{
# Create New Version
>>> new_version = Version(name="2.3", state="DRAFT", public=True,
obsoletes=version, authority=version.authority)

# Save It
>>> new_version.save()
(0.002) SELECT "policypublisher_version"."uuid",
"policypublisher_version"."name", "policypublisher_version"."state",
"policypublisher_version"."public",
"policypublisher_version"."published_date",
"policypublisher_version"."effective_date",
"policypublisher_version"."obsoletes_id",
"policypublisher_version"."authority_id" FROM "policypublisher_version"
WHERE "policypublisher_version"."uuid" =
'bcbb218ebdc5435b9e5caf879b55cd24' LIMIT 21;
args=('bcbb218ebdc5435b9e5caf879b55cd24',); alias=default
(0.014) INSERT INTO "policypublisher_version" ("uuid", "name", "state",
"public", "published_date", "effective_date", "obsoletes_id",
"authority_id") VALUES ('bcbb218ebdc5435b9e5caf879b55cd24', '2.3',
'DRAFT', 1, NULL, NULL, 'be8225c85fab4a0ba8ea879a3d992abe',
'833c1b9b30964f3a90b045e2a1254c1b');
args=('bcbb218ebdc5435b9e5caf879b55cd24', '2.3', 'DRAFT', True, None,
None, 'be8225c85fab4a0ba8ea879a3d992abe',
'833c1b9b30964f3a90b045e2a1254c1b'); alias=default

# Get a random statement
>>> statement=Statement.objects.first()
(0.003) SELECT "policypublisher_statement"."uuid",
"policypublisher_statement"."text", "policypublisher_statement"."type",
"policypublisher_statement"."public",
"policypublisher_statement"."obsoletes_id" FROM
"policypublisher_statement" ORDER BY "policypublisher_statement"."uuid"
ASC LIMIT 1; args=(); alias=default

# Make sure we got one
>>> statement
<Statement: Rules for interpreting the pivFASC-N name>

# Add the new version to the statement's version list
>>> statement.version.add(new_version)
(0.000) BEGIN; args=None; alias=default
(0.006) INSERT OR IGNORE INTO "policypublisher_statement_version"
("statement_id", "version_id") VALUES ('0509afa269604028818dc1f53b8f93d2',
'bcbb218ebdc5435b9e5caf879b55cd24');
args=('0509afa269604028818dc1f53b8f93d2',
'bcbb218ebdc5435b9e5caf879b55cd24'); alias=default

# get the section for the statement under the old version
>>> sect=statement.section.filter(version=version).get()
(0.001) SELECT "policypublisher_section"."uuid",
"policypublisher_section"."name", "policypublisher_section"."public" FROM
"policypublisher_section" INNER JOIN "policypublisher_statementlocation"
ON ("policypublisher_section"."uuid" =
"policypublisher_statementlocation"."section_id") INNER JOIN
"policypublisher_section_version" ON ("policypublisher_section"."uuid" =
"policypublisher_section_version"."section_id") WHERE
("policypublisher_statementlocation"."statement_id" =
'0509afa269604028818dc1f53b8f93d2' AND
"policypublisher_section_version"."version_id" =
'be8225c85fab4a0ba8ea879a3d992abe') LIMIT 21;
args=('0509afa269604028818dc1f53b8f93d2',
'be8225c85fab4a0ba8ea879a3d992abe'); alias=default

# Make Sure we got one
>>> sect
<Section: Rules for Interpreting Various Name Forms>

# Add a new section via the through field # NOTE NO INSERT!
>>> statement.section.add(sect,
through_defaults={"sl_version":new_version})
(0.000) BEGIN; args=None; alias=default
(0.002) SELECT "policypublisher_statementlocation"."section_id" FROM
"policypublisher_statementlocation" WHERE
("policypublisher_statementlocation"."section_id" IN
('1cc10c2e5bba4d238d5fabbfdfd72d83') AND
"policypublisher_statementlocation"."statement_id" =
'0509afa269604028818dc1f53b8f93d2');
args=('1cc10c2e5bba4d238d5fabbfdfd72d83',
'0509afa269604028818dc1f53b8f93d2'); alias=default

# No new object was created
>>> StatementLocation.objects.filter(sl_version=new_version).count()
(0.003) SELECT COUNT(*) AS "__count" FROM
"policypublisher_statementlocation" WHERE
"policypublisher_statementlocation"."sl_version_id" =
'bcbb218ebdc5435b9e5caf879b55cd24';
args=('bcbb218ebdc5435b9e5caf879b55cd24',); alias=default
0
}}}

On the second to last statement, I believe Django is checking to see
whether there is an existing M2M relationship, but it is not adding the
version_id to the query, so it thinks the relationship already exists!

Note that I can manually create the relationship, so this is not a
constraints problem:

{{{
>>> new_sl = StatementLocation(section=sect, statement=statement,
sl_version=new_version)
>>> new_sl.save()
(0.027) INSERT INTO "policypublisher_statementlocation" ("section_id",
"statement_id", "sl_version_id") VALUES
('1cc10c2e5bba4d238d5fabbfdfd72d83', '0509afa269604028818dc1f53b8f93d2',
'bcbb218ebdc5435b9e5caf879b55cd24');
args=['1cc10c2e5bba4d238d5fabbfdfd72d83',
'0509afa269604028818dc1f53b8f93d2', 'bcbb218ebdc5435b9e5caf879b55cd24'];
alias=default
>>> StatementLocation.objects.filter(sl_version=new_version).count()
(0.001) SELECT COUNT(*) AS "__count" FROM
"policypublisher_statementlocation" WHERE
"policypublisher_statementlocation"."sl_version_id" =
'bcbb218ebdc5435b9e5caf879b55cd24';
args=('bcbb218ebdc5435b9e5caf879b55cd24',); alias=default
1
>>>
}}}

The FIX:

When a ManyToMany relationship is created with a "through" table, and
through_defaults are provided, that should be part of the search query
that django uses to determine if the relationship already exists.

e.g.
SELECT "policypublisher_statementlocation"."section_id" FROM
"policypublisher_statementlocation" WHERE
("policypublisher_statementlocation"."section_id" IN
('1cc10c2e5bba4d238d5fabbfdfd72d83') AND
"policypublisher_statementlocation"."statement_id" =
'0509afa269604028818dc1f53b8f93d2') AND
"policypublisher_statementlocation"."sl_version_id" =
'bcbb218ebdc5435b9e5caf879b55cd24';

--
Ticket URL: <https://code.djangoproject.com/ticket/34207>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

unread,

Dec 10, 2022, 9:29:46 AM12/10/22

to django-...@googlegroups.com

#34207: Incorrect SQL query when adding a ManyToMany related object with a
"through" table prevents adding a new relationship if the new relationship
is identical except for a different value for "through_defaults"
-------------------------------------+-------------------------------------

Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Changes (by Simon Charette):

* status: new => closed
* resolution: => duplicate

Comment:

`through_defaults` are values that are used to create a missing
relationship between two entities that related to each other through a
many-to-many relationship.

If the relationship already exists between the origin and the destination,
like it's the case when you do `statement.section.add(sect,
through_defaults={"sl_version":new_version})`, then they are ignored as
expected as `ManyToManyField` only supports a single related value between
two entities.

This is reflected in other APIs such as `statement.section.remove` which
doesn't specify through values.

In other words, while `ManyToManyField` supports relationship annotations
these annotations cannot uniquely define the relationship between the
related objects; `ManyToManyField` doesn't support multi-dimensional
relationships.

Duplicate of #33209

--
Ticket URL: <https://code.djangoproject.com/ticket/34207#comment:1>

Django

unread,

Dec 10, 2022, 9:34:09 AM12/10/22

to django-...@googlegroups.com

#34207: Incorrect SQL query when adding a ManyToMany related object with a
"through" table prevents adding a new relationship if the new relationship
is identical except for a different value for "through_defaults"
-------------------------------------+-------------------------------------
Reporter: Credentive | Owner: nobody
Type: Bug | Status: closed
Component: Database layer | Version: 4.1
(models, ORM) |
Severity: Normal | Resolution: duplicate
Keywords: ManyToManyField | Triage Stage:
through | Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Credentive):

Your example documentation strongly suggests otherwise (while cautioning
about "remove"), see here:
https://docs.djangoproject.com/en/4.1/topics/db/models/#extra-fields-on-
many-to-many-relationships

{{{
>>> ringo = Person.objects.create(name="Ringo Starr")
>>> paul = Person.objects.create(name="Paul McCartney")
>>> beatles = Group.objects.create(name="The Beatles")
>>> m1 = Membership(person=ringo, group=beatles,
... date_joined=date(1962, 8, 16),
... invite_reason="Needed a new drummer.")
>>> m1.save()
>>> beatles.members.all()
<QuerySet [<Person: Ringo Starr>]>
>>> ringo.group_set.all()
<QuerySet [<Group: The Beatles>]>
>>> m2 = Membership.objects.create(person=paul, group=beatles,
... date_joined=date(1960, 8, 1),
... invite_reason="Wanted to form a band.")
>>> beatles.members.all()
<QuerySet [<Person: Ringo Starr>, <Person: Paul McCartney>]>
>>> Membership.objects.create(person=ringo, group=beatles,
... date_joined=date(1968, 9, 4),
... invite_reason="You've been gone for a month and we miss you.")
>>> beatles.members.all()
<QuerySet [<Person: Ringo Starr>, <Person: Paul McCartney>, <Person: Ringo
Starr>]>
>>> # This deletes both of the intermediate model instances for Ringo
Starr
>>> beatles.members.remove(ringo)
>>> beatles.members.all()
<QuerySet [<Person: Paul McCartney>]>
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/34207#comment:2>

Django

unread,

Dec 10, 2022, 1:16:41 PM12/10/22

to django-...@googlegroups.com

#34207: Incorrect SQL query when adding a ManyToMany related object with a
"through" table prevents adding a new relationship if the new relationship
is identical except for a different value for "through_defaults"
-------------------------------------+-------------------------------------
Reporter: Credentive | Owner: nobody
Type: Bug | Status: closed
Component: Database layer | Version: 4.1
(models, ORM) |
Severity: Normal | Resolution: duplicate
Keywords: ManyToManyField | Triage Stage:
through | Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Simon Charette):

> Your example documentation strongly suggests otherwise

Could you elaborate on that? From what I can read none of the examples you
linked suggest that `add` supports this feature?

There is effectively a section that points at the fact that
`remove(related)` will remove all entries matching the `(from, to)` tuple
but defined by the many-to-many

In the example you've just provided you don't use `members.add` to add the
second relationship to Ringo but rely on explicit `Membership` creation
which is supported.

To make it clear, intermediary models that don't define a strict unique
relationship on `(from, to)` are supported. However
`add(through_defaults)` is focused on enforcing that at least one entry of
the `(from, to)` tuple exists and doesn't make any assumption with regards
to the unique constraints defined on the intermediary model.

If you want to enforce the unique existence of a tuple that is a superset
of the the `(from, to)` relationship you should use `get_or_create`
instead of expecting `add(through_defaults)` to figure out which of your
unique constraints should be enforced based on your provided
`through_defaults`.

--
Ticket URL: <https://code.djangoproject.com/ticket/34207#comment:3>

Django

unread,

Dec 12, 2022, 6:00:32 AM12/12/22

to django-...@googlegroups.com

#34207: Incorrect SQL query when adding a ManyToMany related object with a
"through" table prevents adding a new relationship if the new relationship
is identical except for a different value for "through_defaults"
-------------------------------------+-------------------------------------
Reporter: Credentive | Owner: nobody
Type: Bug | Status: closed
Component: Database layer | Version: 4.1
(models, ORM) |
Severity: Normal | Resolution: duplicate
Keywords: ManyToManyField | Triage Stage:
through | Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------

Comment (by Credentive):

I've re-read the section, and while it's not stated that it will work,
it's also not clear to me that it won't.

{{{
Now that you have set up your ManyToManyField to use your intermediary
model (Membership, in this case), you’re ready to start creating some
many-to-many relationships. You do this by creating instances of the
intermediate model:

>>> ringo = Person.objects.create(name="Ringo Starr")
>>> paul = Person.objects.create(name="Paul McCartney")
>>> beatles = Group.objects.create(name="The Beatles")
>>> m1 = Membership(person=ringo, group=beatles,
... date_joined=date(1962, 8, 16),
... invite_reason="Needed a new drummer.")
>>> m1.save()
>>> beatles.members.all()
<QuerySet [<Person: Ringo Starr>]>
>>> ringo.group_set.all()
<QuerySet [<Group: The Beatles>]>
>>> m2 = Membership.objects.create(person=paul, group=beatles,
... date_joined=date(1960, 8, 1),
... invite_reason="Wanted to form a band.")
>>> beatles.members.all()
<QuerySet [<Person: Ringo Starr>, <Person: Paul McCartney>]>

You can also use add(), create(), or set() to create relationships, as
long as you specify through_defaults for any required fields:

>>> beatles.members.add(john, through_defaults={'date_joined': date(1960,
8, 1)})
>>> beatles.members.create(name="George Harrison",
through_defaults={'date_joined': date(1960, 8, 1)})
>>> beatles.members.set([john, paul, ringo, george],
through_defaults={'date_joined': date(1960, 8, 1)})
You may prefer to create instances of the intermediate model directly.

If the custom through table defined by the intermediate model does not
enforce uniqueness on the (model1, model2) pair, allowing multiple values,
the remove() call will remove all intermediate model instances:

>>> Membership.objects.create(person=ringo, group=beatles,
... date_joined=date(1968, 9, 4),
... invite_reason="You've been gone for a month and we miss you.")
>>> beatles.members.all()
<QuerySet [<Person: Ringo Starr>, <Person: Paul McCartney>, <Person: Ringo
Starr>]>
>>> # This deletes both of the intermediate model instances for Ringo
Starr
>>> beatles.members.remove(ringo)
>>> beatles.members.all()
<QuerySet [<Person: Paul McCartney>]>
}}}

Looking at that text again, it is not clear that add() will not work for
multiple references to the same object, it's just suggested that adding
relationship objects directly is a matter of preference. It may be good to
clarify in the documentation that add does not support adding multiple
references to the same object. Would have saved me a few hours :)

In fact, all of the calls through the intermediate model kind of break
down, because you can't filter on attributes of the intermediate model in
the calls. I will be using the intermediate model for this, but actually
removing the manytomany relationship, as it doesn't really work. For
anyone who wants to create a self-referential relationship where the
values of the intermediate objects are intended to be used to filter the
relationship sets, the ManyToMany field won't support this, while the
intermediate objects themselves work fine. The only thing I could think of
that might make the work the way I want would be to write a custom
ManyRelatedManager, but I haven't seen any documentation on how to do
this, and it might not be worth the time just to get the ability to use
the manytomany methods.

If, in the example in the documentation, you wanted to query for the
members of the beatles as of a particular date, you would run into the
same problems I have. My whole use case is built around this, so I will
just use the intermediate table, and ignore all(), get(), add() etc.. The
use of the date in the example kind of suggests this could be possible
when in fact it isn't.

--
Ticket URL: <https://code.djangoproject.com/ticket/34207#comment:4>

Reply all

Reply to author

Forward

[Django] #34207: Incorrect SQL query when adding a ManyToMany related object with a "through" table prevents adding a new relationship if the new relationship is identical except for a different value for "through_defaults"

Django

Django

Django

Django

Django