Hello all,
I’d love some feedback on an idea I’ve been mulling over lately, namely adding a bulk_save method to Django.
A fairly common pattern in some applications is to loop over a list of model instances, set an attribute on each and call save() on them. Unfortunately this issues at least one database query per instance, which can be a significant slowdown. You can work around this with .update() when every row gets the same value, but not when each row needs a different one.
It should be possible to use an SQL CASE expression to bulk-update many rows with differing values. For example:
from django.db.models import Case, CharField, Value, When

SomeModel.objects.filter(id__in=[1, 2]).update(
    some_field=Case(
        When(id=1, then=Value('Field value for ID=1')),
        When(id=2, then=Value('Field value for ID=2')),
        output_field=CharField(),  # assumed: match some_field's field type
    )
)
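To show what this amounts to at the SQL level, here is a minimal sketch using Python's built-in sqlite3 module. The table somemodel, the column some_field and the helper update_with_case are hypothetical stand-ins for SomeModel.some_field, and the SQL is only roughly what the ORM would generate; the exact statement varies by backend.

```python
import sqlite3


def update_with_case(conn, new_values):
    """Set some_field on several rows with a single UPDATE statement.

    new_values maps primary key -> new value (hypothetical helper).
    """
    if not new_values:
        return
    pks = list(new_values)
    # One "WHEN id = ? THEN ?" branch per row, mirroring When(id=..., then=...).
    whens = " ".join("WHEN id = ? THEN ?" for _ in pks)
    placeholders = ", ".join("?" for _ in pks)
    sql = (
        f"UPDATE somemodel SET some_field = CASE {whens} END "
        f"WHERE id IN ({placeholders})"
    )
    params = []
    for pk in pks:
        params += [pk, new_values[pk]]  # condition value, then result value
    params += pks  # bound to the WHERE id IN (...) list
    conn.execute(sql, params)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE somemodel (id INTEGER PRIMARY KEY, some_field TEXT)")
conn.executemany(
    "INSERT INTO somemodel VALUES (?, ?)", [(1, "old"), (2, "old"), (3, "old")]
)
update_with_case(conn, {1: "Field value for ID=1", 2: "Field value for ID=2"})
# Rows 1 and 2 are updated by one statement; row 3 is left untouched.
rows = conn.execute("SELECT id, some_field FROM somemodel ORDER BY id").fetchall()
```

The WHERE clause matters: without it, every row not matched by a WHEN branch would be set to NULL.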
I’ve made a ticket for this here: https://code.djangoproject.com/ticket/29037
Using this technique I got a 70x speed-up on a fairly large table, and it seems like it could be as widely applicable to projects as bulk_create is.
The downside is that it can produce very large SQL statements when updating many rows (MySQL once complained to me about a 10MB statement), but this can be overcome with batching and other optimisations (e.g. rows that share the same value can use a single WHEN id IN (x, y, z) rather than three individual WHEN clauses).
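Both optimisations can be sketched in plain Python as a SQL generator. The function build_batched_case_updates and its parameters are hypothetical. Values must be hashable so rows can be grouped by value, and the table and column names are interpolated directly, so they must be trusted identifiers. The default batch_size is kept small because each statement binds up to three parameters per row, and older SQLite builds cap a statement at 999 parameters.

```python
import sqlite3
from collections import defaultdict


def build_batched_case_updates(table, column, new_values, batch_size=250):
    """Return (sql, params) pairs that set `column` for every pk in new_values."""
    pks = list(new_values)
    statements = []
    for start in range(0, len(pks), batch_size):
        batch = pks[start:start + batch_size]
        # Group this batch's primary keys by the value they should receive,
        # so rows sharing a value need only one WHEN id IN (...) branch.
        by_value = defaultdict(list)
        for pk in batch:
            by_value[new_values[pk]].append(pk)
        whens, params = [], []
        for value, group in by_value.items():
            marks = ", ".join("?" for _ in group)
            whens.append(f"WHEN id IN ({marks}) THEN ?")
            params += group + [value]
        where = ", ".join("?" for _ in batch)
        sql = (
            f"UPDATE {table} SET {column} = CASE {' '.join(whens)} END "
            f"WHERE id IN ({where})"
        )
        statements.append((sql, params + batch))
    return statements


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE somemodel (id INTEGER PRIMARY KEY, some_field TEXT)")
conn.executemany(
    "INSERT INTO somemodel VALUES (?, ?)", [(i, "old") for i in range(1, 8)]
)
new_values = {1: "a", 2: "b", 3: "a", 4: "a", 5: "b", 6: "c"}
# batch_size=4 splits the six rows into two statements.
statements = build_batched_case_updates(
    "somemodel", "some_field", new_values, batch_size=4
)
for sql, params in statements:
    conn.execute(sql, params)
rows = conn.execute("SELECT id, some_field FROM somemodel ORDER BY id").fetchall()
```

Grouping only pays off when many rows share a value; in the worst case (all values distinct) each statement is the same size as the ungrouped version.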
I’m imagining an API very similar to bulk_create, but before spending any time on a patch I thought I would ask whether anyone has feedback on this suggestion. Would this be a good addition to Django?
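A bulk_create-like signature could look roughly like the hypothetical bulk_update_sketch() below. It takes already-saved objects plus the names of the fields to write, and issues one UPDATE per batch with one CASE expression per field. Row, the post table and the function itself are illustrative names, not Django API. It only works for instances that already have a primary key, which is a limitation of the CASE approach in general.

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional


@dataclass
class Row:
    """Hypothetical stand-in for a saved model instance."""
    id: Optional[int]
    title: str
    status: str


def bulk_update_sketch(conn, table, objs, fields, batch_size=100):
    """Write `fields` of already-saved objs back with one UPDATE per batch.

    Returns the number of UPDATE statements issued.
    """
    objs = list(objs)
    if any(obj.id is None for obj in objs):
        # Only rows that already exist can be targeted by WHEN id = ?.
        raise ValueError("all objects must have a primary key")
    queries = 0
    for start in range(0, len(objs), batch_size):
        batch = objs[start:start + batch_size]
        assignments, params = [], []
        for field in fields:
            # One CASE expression per field, one WHEN branch per object.
            whens = " ".join("WHEN id = ? THEN ?" for _ in batch)
            assignments.append(f"{field} = CASE {whens} END")
            for obj in batch:
                params += [obj.id, getattr(obj, field)]
        where = ", ".join("?" for _ in batch)
        params += [obj.id for obj in batch]
        conn.execute(
            f"UPDATE {table} SET {', '.join(assignments)} WHERE id IN ({where})",
            params,
        )
        queries += 1
    return queries


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE post (id INTEGER PRIMARY KEY, title TEXT, status TEXT)")
objs = [Row(i, f"t{i}", "draft") for i in range(1, 6)]
conn.executemany(
    "INSERT INTO post VALUES (?, ?, ?)", [(o.id, o.title, o.status) for o in objs]
)
for obj in objs:  # the "loop, set an attribute" pattern, minus the save() calls
    obj.title = obj.title.upper()
    obj.status = "published"
queries = bulk_update_sketch(conn, "post", objs, ["title", "status"], batch_size=2)
rows = conn.execute("SELECT id, title, status FROM post ORDER BY id").fetchall()
```

Each field gets its own CASE expression, so the number of bound parameters grows with len(fields) × batch_size. That is one reason to keep a batch_size argument, as bulk_create does.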
Hey Neal,
Thank you very much for pointing that out. I actually found this package while researching the ticket; I wish I had known about it a couple of years ago, as it would have saved me a fair bit of CPU and brain time!
I think that module is a good starting point and proves that it’s possible, but I think the implementation can be improved upon if we bring it into core. I worked on a small PR to add this and the implementation was refreshingly simple. It still needs docs, a couple more tests and a fix for a strange error with SQLite on Windows, but overall it seems like a lot of gain for a small amount of code.
Tom
--
You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-develop...@googlegroups.com.
To post to this group, send email to django-d...@googlegroups.com.
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/5988d579-7843-4c42-a6f9-1e389c58ece6%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
I wanted to ask about the naming of the new method. The currently proposed name is "QuerySet.bulk_save()", but I think it's a bit confusing since it uses QuerySet.update(), not Model.save(). It works similarly to QuerySet.bulk_update() from https://github.com/aykut/django-bulk-update, but the arguments are a bit different.
Josh's comment on the PR: "Since this only works for instances with a pk, do you think that bulk_update would be a better name? The regular save() method can either create or update depending on pk status, which may confuse users here."
And Tom's reply: "I considered this, but queryset.update() is the best 'bulk update' method. I didn't want to confuse the two; this is more about saving multiple model fields with multiple differing values, hence bulk_save. Open to changing it though."
My original reasoning was that QuerySet.update() already bulk updates rows, so the bulk prefix seems a bit redundant there (how do you bulk something that already works in bulk?). .save(), however, operates on a single object, so the bulk prefix seems more appropriate and easier to understand.
I agree bulk_save() is maybe not the best name, as people might expect signals to be sent, but are there any suggestions other than bulk_update()? Maybe something more accurate, like bulk_update_fields()? Or bulk_save_fields()?
Thank you all for the feedback. I’ve renamed the method to bulk_update(), as this seems to be the most liked option. Naming things is hard, and while bulk_update() isn’t perfect, I think it’s a bit better than bulk_update_fields() or just update_fields().