Adding a bulk_save method to models

Tom Forbes

Jan 19, 2018, 12:49:48 PM
to django-d...@googlegroups.com

Hello all,

I’d love some feedback on an idea I’ve been mulling over lately, namely adding a bulk_save method to Django.

A somewhat common pattern in some applications is to loop over a list of models, set an attribute and call save on each one. Unfortunately this issues at least one database query per object, which can be a significant slowdown. You can work around this by using ‘.update()’ in some cases, but not all.

It seems it would be possible to use a CASE statement in SQL to handle bulk-updating many rows with differing values. For example:

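from django.db.models import Case, Value, When

# A single UPDATE sets a different value for each matched row.
SomeModel.objects.filter(id__in=[1, 2]).update(
    some_field=Case(
        When(id=1, then=Value('Field value for ID=1')),
        When(id=2, then=Value('Field value for ID=2')),
    )
)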
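For readers less familiar with Case/When, the expression above compiles to a single UPDATE containing a CASE expression, roughly `UPDATE ... SET some_field = CASE WHEN id = 1 THEN ... WHEN id = 2 THEN ... END WHERE id IN (1, 2)` (the exact SQL Django emits varies by backend). The sketch below runs that SQL shape directly against an in-memory SQLite database using only the standard library. The table name `somemodel` is an assumption standing in for SomeModel, and the `ELSE some_field` branch is an extra safeguard added here; the WHERE clause already limits the update to ids 1 and 2.

```python
import sqlite3

# In-memory database with a hypothetical table standing in for SomeModel.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE somemodel (id INTEGER PRIMARY KEY, some_field TEXT)")
conn.executemany(
    "INSERT INTO somemodel (id, some_field) VALUES (?, ?)",
    [(1, "old"), (2, "old"), (3, "old")],
)

# One UPDATE statement sets a different value per row, keyed on the primary key.
cur = conn.execute(
    """
    UPDATE somemodel
    SET some_field = CASE
        WHEN id = ? THEN ?
        WHEN id = ? THEN ?
        ELSE some_field
    END
    WHERE id IN (?, ?)
    """,
    (1, "Field value for ID=1", 2, "Field value for ID=2", 1, 2),
)
conn.commit()

print(cur.rowcount)  # 2
print(conn.execute("SELECT id, some_field FROM somemodel ORDER BY id").fetchall())
# [(1, 'Field value for ID=1'), (2, 'Field value for ID=2'), (3, 'old')]
```

The loop-and-save pattern would need one UPDATE per row here; the CASE form does the same work in one round trip, which is where the speedup comes from.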

I’ve made a ticket for this here: https://code.djangoproject.com/ticket/29037

I managed to get a 70x performance increase using this technique on a fairly large table, and it seems it could be applicable to many projects just like bulk_create.

The downside to this is that it can produce very large SQL statements when updating many rows (I had MySQL complain about a 10MB statement once), but this can be overcome with batching and other optimisations (e.g. rows that share the same value can use a single WHEN id IN (x, y, z) clause rather than three individual WHEN clauses).
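The batching and grouping optimisations can be sketched in plain Python. `build_case_updates` is a hypothetical helper (not part of Django or the PR) that splits the primary keys into batches of at most `batch_size` and, within each batch, merges rows that share a value into one `WHEN id IN (...)` clause. It emits `?` placeholders (the DB-API "qmark" style used by SQLite) and assumes table and column names are trusted identifiers, since only the values are parameterised; values must be hashable to be grouped.

```python
from collections import defaultdict


def build_case_updates(table, pk_col, column, values_by_pk, batch_size=500):
    """Build batched, parameterised CASE updates for a single column.

    Hypothetical helper: `values_by_pk` maps primary key -> new value.
    Returns a list of (sql, params) tuples, one per batch.
    """
    pks = sorted(values_by_pk)
    statements = []
    for start in range(0, len(pks), batch_size):
        batch = pks[start:start + batch_size]

        # Group primary keys that receive the same value, so each distinct
        # value needs one WHEN clause instead of one per row.
        groups = defaultdict(list)
        for pk in batch:
            groups[values_by_pk[pk]].append(pk)

        whens, params = [], []
        for value, group_pks in groups.items():
            placeholders = ", ".join("?" * len(group_pks))
            whens.append(f"WHEN {pk_col} IN ({placeholders}) THEN ?")
            params.extend(group_pks)
            params.append(value)

        # Restrict the UPDATE to this batch's rows; unmatched rows keep their value.
        where = ", ".join("?" * len(batch))
        params.extend(batch)
        sql = (
            f"UPDATE {table} SET {column} = CASE {' '.join(whens)} "
            f"ELSE {column} END WHERE {pk_col} IN ({where})"
        )
        statements.append((sql, params))
    return statements
```

For example, `{1: "a", 2: "b", 3: "a"}` yields one statement with two WHEN clauses, because ids 1 and 3 share a value. Lowering `batch_size` trades more round trips for shorter statements, which is the lever for staying under limits like the MySQL statement size mentioned above.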

I’m imagining an API very similar to bulk_create, but before spending any time on a patch I thought I would ask if anyone has any feedback on this suggestion. Would this be a good addition to Django?



Neal Todd

Jan 22, 2018, 10:10:45 AM
to Django developers (Contributions to Django itself)
Hi Tom,

A built-in bulk save that's more flexible than update would certainly be nice. Just in case you haven't come across it though, there is a package called django-bulk-update:

https://github.com/aykut/django-bulk-update

I've found it very useful on a number of occasions where update isn't quite enough but the loop-edit-save pattern is too slow to be convenient.

Probably some useful things in there when considering the API and approach.

Cheers, Neal 

Tom Forbes

Jan 22, 2018, 2:41:11 PM
to django-d...@googlegroups.com

Hey Neal,

Thank you very much for pointing that out. I actually found out about this package while researching the ticket; I wish I had known about it a couple of years ago, as it would have saved me a fair bit of CPU and brain time!

I think that module is a good starting point and proves that it’s possible; however, I think the implementation can be improved upon if we bring it into core. I worked on a small PR to add this and the implementation was refreshingly simple. It still needs docs, a couple more tests, and a fix for a strange error with SQLite on Windows, but overall it seems like a lot of gain for a small amount of code.

Tom

--
You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-develop...@googlegroups.com.
To post to this group, send email to django-d...@googlegroups.com.
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/5988d579-7843-4c42-a6f9-1e389c58ece6%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Neal Todd

Jan 23, 2018, 7:38:18 AM
to Django developers (Contributions to Django itself)
Hi Tom,

That's great, should be a helpful addition to core. Will follow the ticket and PR.

Neal

(Apologies - I hadn't spotted that you'd already referenced django-bulk-update in your ticket when I left my drive-by comment!)

Tim Graham

Sep 14, 2018, 10:56:38 AM
to Django developers (Contributions to Django itself)

I wanted to ask about naming of the new method. Currently the proposed name is "QuerySet.bulk_save()" but I think it's a bit confusing since it uses QuerySet.update(), not Model.save(). It works similarly to QuerySet.bulk_update() from https://github.com/aykut/django-bulk-update but the arguments are a bit different.


Josh's comment on the PR: "Since this only works for instances with a pk, do you think that bulk_update would be a better name? The regular save() method can either create or update depending on pk status, which may confuse users here."

And Tom's reply: "I considered this, but queryset.update() is the best 'bulk update' method. I didn't want to confuse the two; this is more about saving multiple model fields with multiple differing values, hence bulk_save. Open to changing it though."
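Josh's point about the primary key can be made concrete: a CASE-based update addresses rows by primary key, so it can only ever update existing rows, never insert new ones as save() may. The sketch below uses a hypothetical `Row` dataclass as a stand-in for a model instance and a hypothetical `collect_field_values` helper that gathers the per-field `{pk: value}` mappings a CASE update needs, rejecting unsaved objects up front. Neither name comes from Django or the PR, and the error message is illustrative.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, List, Optional


@dataclass
class Row:
    # Hypothetical stand-in for a model instance; pk is None until saved.
    pk: Optional[int]
    name: str = ""
    status: str = ""


def collect_field_values(objs: Iterable[Any], fields: List[str]) -> Dict[str, Dict[int, Any]]:
    """Map each field name to {pk: new value} for a CASE-based bulk update.

    Rows are matched by primary key, so objects without one have no row to
    target and are rejected instead of being silently inserted or skipped.
    """
    objs = list(objs)  # allow generators, since we iterate more than once
    if any(obj.pk is None for obj in objs):
        raise ValueError("All objects must have a primary key set.")
    return {field: {obj.pk: getattr(obj, field) for obj in objs} for field in fields}


rows = [Row(pk=1, status="active"), Row(pk=2, status="archived")]
print(collect_field_values(rows, ["status"]))
# {'status': {1: 'active', 2: 'archived'}}
```

This is also the root of the naming concern: because nothing can be created, the operation behaves like update() rather than save().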


Raphael Michel

Sep 14, 2018, 11:32:08 AM
to Tim Graham, django-d...@googlegroups.com
Hi,

I'd be very careful about calling it bulk_save(), since calling
it something with save() very strongly suggests that it sends pre_save
or post_save signals.

Best
Raphael


On Fri, 14 Sep 2018 07:56:38 -0700 (PDT),
Tim Graham <timog...@gmail.com> wrote:

Tom Forbes

Sep 15, 2018, 10:01:16 AM
to django-d...@googlegroups.com

My original reasoning was that QuerySet.update() already bulk updates rows, so the bulk prefix seems a bit redundant there (how do you bulk something that already does something in bulk?). Model.save(), however, operates on a single object, so the bulk prefix seems more appropriate and easier to understand.

I agree bulk_save() maybe is not the best name as people might expect signals to be sent, but are there any suggestions other than bulk_update()? Maybe something more accurate, like bulk_update_fields()? Or bulk_save_fields()?

Adam Johnson

Sep 15, 2018, 6:15:39 PM
to django-d...@googlegroups.com
Bikeshed time.

I'm also against bulk_save for the same reason that it implies save().

bulk_update sounds okay to me. update() is indeed already a 'bulk' operation, but it could be argued that this performs a 'bulk' number of update operations.

bulk_update_fields also sounds good, the longer method name is probably balanced by the lower frequency of use.





--
Adam

charettes

Sep 15, 2018, 8:34:11 PM
to Django developers (Contributions to Django itself)
I also dislike bulk_save() for the same reasons.

I feel like bulk_update makes the most sense given it has a signature similar to bulk_create, where an iterable of model instances must be passed, and we're really just performing an update.

To me, bulk_update is to update what bulk_create is to create; bulk_update_fields feels too verbose and breaks the bulk_create/create symmetry for update.

Cheers,
Simon

Tobias Kunze

Sep 16, 2018, 5:19:42 AM
to django-d...@googlegroups.com
On 18-09-15 23:15:10, Adam Johnson wrote:
>> I agree bulk_save() maybe is not the best name as people might expect
>> signals to be sent, but are there any suggestions other than bulk_update()?
>> Maybe something more accurate, like bulk_update_fields()? Or
>> bulk_save_fields()?
>
>bulk_update_fields also sounds good, the longer method name is probably
>balanced by the lower frequency of use.

bulk_update_fields() sounds fine to me, as it makes clearer what
happens. With bulk_update() alone, I'd expect the exact analogous
action to update() to occur, since we're already used to that pattern
from create() vs bulk_create().

update_fields() alone may also work. Upside: it's shorter. Downside:
it's not immediately clear that it takes an iterable and not an
instance. I'd be happy with both options.

Best regards,
Tobias

Tom Forbes

Sep 16, 2018, 11:24:09 AM
to django-d...@googlegroups.com

Thank you all for the feedback, I’ve changed the method to be bulk_update() as this seems to be the most liked option. Naming things is hard, and while bulk_update() isn’t perfect I think it’s a bit better than bulk_update_fields() or just update_fields().
