Feature idea: bulk_associate: Add ManyToMany relationships in bulk

David Foster

Sep 26, 2019, 1:45:45 PM
to Django developers (Contributions to Django itself)
Given the following example model:

class M1(models.Model):
    m2_set = models.ManyToManyField('M2')

It is already possible to associate one M1 with many M2s with a single DB query:

m1.m2_set.add(*m2s)

However it's more difficult to associate many M1s with many M2s, particularly if you want to skip associations that already exist.

# NOTE: Does NOT skip associations that already exist!
m1_and_m2_id_tuples = [(m1_id, m2_id), ...]
M1_M2 = M1.m2_set.through
M1_M2.objects.bulk_create([
    M1_M2(m1_id=m1_id, m2_id=m2_id)
    for (m1_id, m2_id) in m1_and_m2_id_tuples
])
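
To skip associations that already exist, the requested pairs have to be filtered against the pairs already in the through table before calling bulk_create. Here is a minimal sketch of that filtering step only, in plain Python. The helper name pairs_to_create is hypothetical, and the existing argument is assumed to come from a query such as M1_M2.objects.filter(m1_id__in=..., m2_id__in=...).values_list('m1_id', 'm2_id').

```python
from typing import Any, Iterable, List, Set, Tuple

Pair = Tuple[Any, Any]


def pairs_to_create(requested: Iterable[Pair], existing: Iterable[Pair]) -> List[Pair]:
    """Return the requested (m1_id, m2_id) pairs that are not in `existing`.

    Hypothetical helper, not part of Django. Duplicates inside `requested`
    are dropped as well, and the original order is preserved.
    """
    seen: Set[Pair] = set(existing)
    result: List[Pair] = []
    for pair in requested:
        if pair not in seen:
            seen.add(pair)
            result.append(pair)
    return result


# (1, 11) already exists and (1, 10) is requested twice.
requested = [(1, 10), (1, 11), (2, 10), (1, 10)]
existing = [(1, 11), (3, 12)]
new_pairs = pairs_to_create(requested, existing)
print(new_pairs)  # [(1, 10), (2, 10)]
```

The resulting list could then be passed to the bulk_create call shown above in place of m1_and_m2_id_tuples.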

What if we could do something like the following instead:

bulk_associate(M1.m2_set, [(m1, m2), ...])
# --- OR ---
bulk_associate_ids(M1.m2_set, [(m1_id, m2_id), ...])

I propose to write and add a bulk_associate() method to Django. I also propose to add a paired bulk_disassociate() method.

1. Does this sound like a good idea in general?


In more detail, I propose adding the following APIs, importable from django.db:

M1 = TypeVar('M1', bound=Model)  # M1 extends Model
M2 = TypeVar('M2', bound=Model)  # M2 extends Model

def bulk_associate(
        M1_m2_set: ManyToManyDescriptor,
        m1_m2_tuples: 'List[Tuple[M1, M2]]',
        *, assert_no_collisions: bool=True) -> None:
    """
    Creates many (M1, M2) associations with O(1) database queries.

    If any requested associations already exist, then they will be left alone.

    If you assert that none of the requested associations already exist,
    you can pass assert_no_collisions=True to save 1 database query.
    """
    pass

def bulk_associate_ids(
        M1_m2_set: ManyToManyDescriptor,
        m1_m2_id_tuples: 'List[Tuple[Any, Any]]',
        *, assert_no_collisions: bool=True) -> None:
    pass

If assert_no_collisions is False then (1 filter) query and (1 bulk_create) query will be performed.
If assert_no_collisions is True then only (1 bulk_create) will be performed.
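
The query counts above can be illustrated with a rough sketch. This is not the proposed Django implementation: it uses the standard-library sqlite3 module as a stand-in for the ORM, and the m1_m2 table, its columns, and the returned query count are assumptions made only for illustration.

```python
import sqlite3
from typing import Any, Iterable, Tuple

Pair = Tuple[Any, Any]


def bulk_associate_ids(conn: sqlite3.Connection, pairs: Iterable[Pair],
                       *, assert_no_collisions: bool = False) -> int:
    """Insert (m1_id, m2_id) rows into the hypothetical m1_m2 table.

    Returns the number of queries issued, to make the cost visible.
    """
    pairs = list(dict.fromkeys(pairs))  # drop duplicate requests, keep order
    queries = 0
    if pairs and not assert_no_collisions:
        # 1 filter query: fetch a superset of the possibly colliding rows.
        m1_ids = sorted({p[0] for p in pairs})
        m2_ids = sorted({p[1] for p in pairs})
        sql = (
            "SELECT m1_id, m2_id FROM m1_m2 "
            f"WHERE m1_id IN ({','.join('?' * len(m1_ids))}) "
            f"AND m2_id IN ({','.join('?' * len(m2_ids))})"
        )
        existing = set(conn.execute(sql, m1_ids + m2_ids).fetchall())
        queries += 1
        pairs = [p for p in pairs if p not in existing]
    if pairs:
        # 1 bulk insert: a single multi-row INSERT statement.
        placeholders = ",".join(["(?, ?)"] * len(pairs))
        flat = [value for pair in pairs for value in pair]
        conn.execute(f"INSERT INTO m1_m2 (m1_id, m2_id) VALUES {placeholders}", flat)
        queries += 1
    return queries


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE m1_m2 (m1_id INTEGER, m2_id INTEGER, UNIQUE (m1_id, m2_id))")
conn.execute("INSERT INTO m1_m2 (m1_id, m2_id) VALUES (1, 10)")

# (1, 10) already exists, so only the other two pairs are inserted.
queries_checked = bulk_associate_ids(conn, [(1, 10), (1, 11), (2, 10)])
# The caller promises there is no collision, so the filter query is skipped.
queries_unchecked = bulk_associate_ids(conn, [(3, 12)], assert_no_collisions=True)
rows = conn.execute("SELECT m1_id, m2_id FROM m1_m2 ORDER BY m1_id, m2_id").fetchall()
```

In this sketch a wrong assert_no_collisions=True promise fails on the UNIQUE constraint rather than silently duplicating rows. Very large inputs would need batching to stay within the database's parameter limit.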

def bulk_disassociate(
        M1_m2_set: ManyToManyDescriptor,
        m1_m2_tuples: 'List[Tuple[M1, M2]]') -> None:
    """
    Deletes many (M1, M2) associations with O(1) database queries.
    """
    pass

def bulk_disassociate_ids(
        M1_m2_set: ManyToManyDescriptor,
        m1_m2_id_tuples: 'List[Tuple[Any, Any]]') -> None:
    pass

The database connection corresponding to the M1_M2 through-table will be used.
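
For the disassociate side, one way to stay at a single query is to OR together one condition per pair. The following is only a sketch under assumptions: it uses sqlite3 from the standard library instead of the ORM, and the m1_m2 table and the returned row count are hypothetical. In Django the same filter could presumably be built by OR-ing Q objects on the through model.

```python
import sqlite3
from typing import Any, Iterable, Tuple

Pair = Tuple[Any, Any]


def bulk_disassociate_ids(conn: sqlite3.Connection, pairs: Iterable[Pair]) -> int:
    """Delete the given (m1_id, m2_id) rows from the hypothetical m1_m2 table.

    Issues a single DELETE and returns the number of rows removed.
    Pairs that are not associated are ignored.
    """
    pairs = list(dict.fromkeys(pairs))  # drop duplicate requests, keep order
    if not pairs:
        return 0
    # One "(m1_id = ? AND m2_id = ?)" condition per pair, joined with OR.
    where = " OR ".join(["(m1_id = ? AND m2_id = ?)"] * len(pairs))
    flat = [value for pair in pairs for value in pair]
    cursor = conn.execute(f"DELETE FROM m1_m2 WHERE {where}", flat)
    return cursor.rowcount


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE m1_m2 (m1_id INTEGER, m2_id INTEGER, UNIQUE (m1_id, m2_id))")
conn.execute("INSERT INTO m1_m2 (m1_id, m2_id) VALUES (1, 10), (1, 11), (2, 10)")

# (9, 99) is not associated, so only two rows are removed.
deleted = bulk_disassociate_ids(conn, [(1, 10), (2, 10), (9, 99)])
remaining = conn.execute("SELECT m1_id, m2_id FROM m1_m2").fetchall()
```

A long OR chain can hit database limits, so a real implementation would likely delete in batches.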

2. Any comments on the specific API or capabilities?


If this sounds good I'd be happy to open an item on Trac and submit a PR.

David Foster

Sep 26, 2019, 1:51:56 PM
to Django developers (Contributions to Django itself)
Erratum: The proposed default value for assert_no_collisions is False rather than True, for safety.

David Foster

Sep 29, 2019, 9:13:50 PM
to Django developers (Contributions to Django itself)
Here is another API variation I might suggest:

M1.m2_set.add_pairs(*[(m1, m2), ...], assert_no_collisions=False)
# --- OR ---
M1.m2_set.add_pair_ids(*[(m1_id, m2_id), ...], assert_no_collisions=False)

This has the advantages of being more similar to the existing add() API and not requiring a special function import.

For bulk_disassociate() the analogous API would be:

M1.m2_set.remove_pairs(*[(m1, m2), ...])
# --- OR ---
M1.m2_set.remove_pair_ids(*[(m1_id, m2_id), ...])

- David

Tom Forbes

Oct 1, 2019, 6:02:38 AM
to django-d...@googlegroups.com
Hey David,
I like this idea. While I don’t think the use case is common, there have been a few times where I’ve needed this and got around it by creating/modifying the through model in bulk. Having a method that does this would be good IMO.

Unless anyone has strong objections, can you make a ticket?

Tom

David Foster

Oct 2, 2019, 2:06:15 AM
to Django developers (Contributions to Django itself)

David Foster

Oct 13, 2019, 6:37:07 PM
to Django developers (Contributions to Django itself)
I've created a PR which is waiting for review, if someone has time.

According to Trac, the next step is:

For anyone except the patch author to review the patch using the patch review checklist and either mark the ticket as "Ready for checkin" if everything looks good, or leave comments for improvement and mark the ticket as "Patch needs improvement".

Thanks for any help.

- David

David Foster

Oct 13, 2019, 6:39:12 PM
to Django developers (Contributions to Django itself)
Here's the link to the PR for review: https://github.com/django/django/pull/11899

(Apologies for the double-post)

- David

David Foster

Oct 26, 2019, 6:42:15 PM
to Django developers (Contributions to Django itself)
Requesting reviewers for the latest iteration of the PR to bulk-associate many-to-many relationships.

The new PR to review, which is only a documentation change showing how to bulk-associate many-to-many relationships, is here: https://github.com/django/django/pull/11948 👈

In case it's useful, the previous PR which actually introduced two new methods "add_relations" and "remove_relations" is here: https://github.com/django/django/pull/11899

It was previously argued that the implementation of "add_relations" and "remove_relations" was simple enough that only a documentation change might be needed. But after seeing the relatively complex boilerplate that the proposed documentation suggests, I'm still leaning toward putting in dedicated "add_relations" and "remove_relations" methods. Comments here? Put them on the umbrella Trac ticket: https://code.djangoproject.com/ticket/30828

Cheers,
David Foster | Seattle, WA, USA