bulk_update() support for unique fields instead of only primary key

1,059 views
Skip to first unread message

Ebram Shehata

unread,
Jun 6, 2022, 8:24:14 PM6/6/22
to Django developers (Contributions to Django itself)
I've already created a ticket that ended up with 'WONTFIX' here.
Please read it first.

The scenario that led me trying to add that feature:
Well, I'v a model that has a primary key and unique field...
Every  X period of time I've a cron that calls an external API to load updated data and store it in my Django project db.
Thing is I want to execute the update query without having to have the pk with me.. I already have the unique field in hand (got it from the API) and it should be enough to identify what object to be updated..

example code that will do so..

# call external API and load data as list[dict]
objs = list()
for record in data:
    # Just create instance of MyModel class with values needed..
    objs.append(MyModel(my_unique_field=data['national_id'], name="New Name"))
# I expect to do:
MyModel.objects.bulk_update(objs, fields=["name"], unique_field="my_unique_field")


The current implementation requires every object to have its primary key set, but, that's not always the case. Also, if the primary key is my-django-project-generated, there's no way AFAIK to get those PKs without having to query them from DB, which will be a very expensive query to run if the data is large.

You can find the POC PR here: https://github.com/django/django/pull/15764

I really thing that'd be a very useful update to add.

I also have a question about this function: We can't use it to update primary keys.. I'm wondering why ? I want to also add that support to it.

Jörg Breitbart

unread,
Jun 7, 2022, 2:03:13 AM6/7/22
to django-d...@googlegroups.com
Imho the pk field being treated special by bulk_update (as the only
record filter predicate, non updatable) comes from one simple idea -
stable object identity during the updates. For an ORM proper object
identification is rather important to not mix things up, shifting
identities might create havoc further down the road or in internal code,
if it is not prepared to deal with identity shifts. Whether stable
identities are really needed for bulk_update or could be lifted by
making pk updatable I cannot tell for sure.

About record identification by another unique field - imho that should
be safe as long as the unique field can sufficiently identify rows in
question on a table. Implementation might get complicated in multi table
inheritance, where the unqiue field may not reside on the target table
itself (would need a similar tables+rows lookup expansion as done for pk).
Overall this sounds like a valuable API addition to bulk_update, whether
it ultimately could lift the non updatable state of pk itself, remains
uncertain (prolly needs tons of tests and a serious code review of some
internals).

Cheers,
jerch

NB: Btw fetching proper pks upfront from some other unique field is
typically very cheap compared to bulk_update runtime itself, given you
have indexed those columns.


Am 07.06.22 um 02:15 schrieb Ebram Shehata:
> I've already created a ticket that ended up with 'WONTFIX' here
> <https://code.djangoproject.com/ticket/33770>.
> Please read it first.
>
> The scenario that led me trying to add that feature:
> Well, I'v a model that has a primary key and unique field...
> Every  X period of time I've a cron that calls an external API to load
> updated data and store it in my Django project db.
> Thing is I want to execute the update query without having to have the
> pk with me.. I already have the unique field in hand (got it from the
> API) and it should be enough to identify what object to be updated..
>
> example code that will do so..
>
> /# call external API and load data as list[dict]/
> /objs = list()/
> /for record in data:/
> /    # Just create instance of MyModel class with values needed../
> /    objs.append(MyModel(my_unique_field=data['national_id'], name="New
> Name"))/
> /# I expect to do:
> MyModel.objects.bulk_update(objs, fields=["name"],
> unique_field="my_unique_field")/
>
> The current implementation requires every object to have its primary key
> set, but, that's not always the case. Also, if the primary key is
> my-django-project-generated, there's no way AFAIK to get those PKs
> without having to query them from DB, which will be a very expensive
> query to run if the data is large.
>
> You can find the POC PR here: https://github.com/django/django/pull/15764
>
> I really thing that'd be a very useful update to add.
>
> I also have a question about this function: We can't use it to update
> primary keys.. I'm wondering why ? I want to also add that support to it.
>
> --
> You received this message because you are subscribed to the Google
> Groups "Django developers (Contributions to Django itself)" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to django-develop...@googlegroups.com
> <mailto:django-develop...@googlegroups.com>.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/django-developers/f532b4fd-aba6-44dd-b373-e6e934ff8b67n%40googlegroups.com
> <https://groups.google.com/d/msgid/django-developers/f532b4fd-aba6-44dd-b373-e6e934ff8b67n%40googlegroups.com?utm_medium=email&utm_source=footer>.

Ebram Shehata

unread,
Jun 7, 2022, 5:05:52 PM6/7/22
to Django developers (Contributions to Django itself)

Thank you for your reply.

> Implementation might get complicated in multi table
> inheritance, where the unqiue field may not reside on the target table
> itself (would need a similar tables+rows lookup expansion as done for pk).

How can it be complicated ? I thought it's just something like getattr(MyModel, field_name) to get value and
model._meta.get_field(unique_field) to get the field instance..
Where can I find explanation for what you mean? and where can I find the PK code you're referring to?


> Overall this sounds like a valuable API addition to bulk_update

Thank you. I was asked to re-open the ticket if there's an agreement to add the update..
Should I re-open it or we will need more than one person to agree on the update ?
(I'm sorry, this is my first contribution)

Jörg Breitbart

unread,
Jun 8, 2022, 8:17:32 AM6/8/22
to django-d...@googlegroups.com

> How can it be complicated ? I thought it's just something like
> getattr(MyModel, field_name) to get value and
> model._meta.get_field(unique_field) to get the field instance..
> Where can I find explanation for what you mean? and where can I find the
> PK code you're referring to?

Field updates for multi-table models are internally split into multiple
tables updates, if the fields reside on several tables from the
inheritance chain. The identity of records is matched against pk values
stored in OneToOne xxx_ptr fields on the subtables. You can find the
construction logic for this in UpdateQuery.get_related_updates
(https://github.com/django/django/blob/49b470b9187b6be60e573fed08c8f4a87f133750/django/db/models/sql/subqueries.py#L124).

With using a different field for record identification without having
the pk (thus from degenerated model instances) this wont work anymore
(note it directly refers to pk__in selection). Most likely you will have
to do an explicit db roundtrip resolving "some_unique_field -> pk"
upfront to keep it working for multi-table spanning updates. Thats where
the impl might get complicated/messy.


> Thank you. I was asked to re-open the ticket if there's an agreement to
> add the update..
> Should I re-open it or we will need more than one person to agree on the
> update ?
> (I'm sorry, this is my first contribution)

You def. should get for more feedback, a single opinion does not mean
much here.

Cheers, jerch

Ebram Shehata

unread,
Jun 19, 2022, 9:14:56 PM6/19/22
to Django developers (Contributions to Django itself)
Ah, I see. Thank you for explaining the thing and linking the code.
It looks more interesting now! I'm interested to start working on it.
I'll wait for approval from other people, too.

Reply all
Reply to author
Forward
0 new messages