bulk_create for multi table inheritance support

Podrigal, Aron

unread,
Mar 29, 2016, 4:47:18 AM3/29/16
to Django developers (Contributions to Django itself)
Hi,

I worked on supporting bulk_create for multi-table inheritance here [1] and would like to get some feedback.

bulk_create can be used in one of two ways:

1) The parent records already exist in the database and you only want to bulk-insert into the child table.
2) The parent records don't exist yet and need to be created as well.

In the first case, all database backends can support it. It simply requires the user to set the `parent_ptr` attribute on all child instances and then call bulk_create.
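To make the first case concrete, here is a minimal sketch at the SQL level (plain sqlite3; the `place`/`restaurant` tables are hypothetical stand-ins for a parent/child MTI pair, not anything from Django's test suite): the parent rows already exist, so the child rows can go in as one batched insert with their `place_ptr_id` values set explicitly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Multi-table inheritance layout: the child table's primary key is a
# foreign key pointing at the parent table's primary key.
cur.execute("CREATE TABLE place (id INTEGER PRIMARY KEY, name TEXT)")
cur.execute(
    "CREATE TABLE restaurant ("
    " place_ptr_id INTEGER PRIMARY KEY REFERENCES place(id),"
    " serves_pizza INTEGER)"
)

# Case 1: the parent rows already exist...
cur.executemany("INSERT INTO place (id, name) VALUES (?, ?)",
                [(1, "A"), (2, "B"), (3, "C")])

# ...so the child rows can be inserted in a single batch, each one
# carrying its parent pointer explicitly.
cur.executemany(
    "INSERT INTO restaurant (place_ptr_id, serves_pizza) VALUES (?, ?)",
    [(1, 1), (2, 0), (3, 1)])
conn.commit()

print(cur.execute("SELECT COUNT(*) FROM restaurant").fetchone()[0])  # 3
```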
In the second case it gets tricky, and it cannot be supported on all databases for every model, because it requires a way to get back the IDs of the inserted parent records. PostgreSQL is the most flexible here: it supports bulk insert for any table and any type of field, because its RETURNING clause lets us always retrieve the IDs of the inserted rows. For SQLite we can only support bulk_create if the model does not have a parent with an AutoField.
For MySQL I think we can rely on `LAST_INSERT_ID()`, which returns the first ID of the inserted batch, so we can then generate the list of IDs as `range(last_insert_id, last_insert_id + rows_inserted)`.
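The ID reconstruction I have in mind is just arithmetic over the value MySQL hands back (a sketch; `last_insert_id` and `rows_inserted` are assumed to come from the backend after the batched INSERT, and contiguity is exactly the property that needs confirming):

```python
def ids_for_batch(last_insert_id, rows_inserted):
    """Reconstruct the auto-increment IDs of a batched INSERT,
    assuming MySQL allocated them contiguously starting at
    LAST_INSERT_ID(), which reports the first ID of the batch."""
    return list(range(last_insert_id, last_insert_id + rows_inserted))

# e.g. a 4-row insert whose first auto-increment value was 42:
print(ids_for_batch(42, 4))  # [42, 43, 44, 45]
```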

Can anyone confirm whether we can rely on MySQL's LAST_INSERT_ID() to be consistent and without gaps for all records inserted at once?

Thanks.



--
Aron Podrigal

Anssi Kääriäinen

unread,
Mar 29, 2016, 6:19:14 AM3/29/16
to django-d...@googlegroups.com
For MySQL InnoDB tables the behavior is described here:
http://dev.mysql.com/doc/refman/5.1/en/innodb-auto-increment-handling.html
(see the "Gaps in auto-increment values for bulk inserts" item). So you
are safe as long as the lock mode is correct, and assuming the lock
mode is at its default is OK for Django.

For SQLite there can't be concurrent inserts (tables are locked
exclusively when writing), so you can just check the max id from the
table after insert and calculate the values from that.
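That SQLite strategy can be sketched end to end (assuming the default integer-primary-key rowid behavior, no concurrent writers, and a hypothetical `parent` table):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE parent (id INTEGER PRIMARY KEY, name TEXT)")

rows = [("a",), ("b",), ("c",)]
cur.executemany("INSERT INTO parent (name) VALUES (?)", rows)

# No concurrent inserts are possible while the write lock is held, so
# max(id) is the last ID of our batch and the batch IDs are contiguous.
max_id = cur.execute("SELECT MAX(id) FROM parent").fetchone()[0]
batch_ids = list(range(max_id - len(rows) + 1, max_id + 1))
print(batch_ids)  # [1, 2, 3]
```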

There are still all the 3rd party databases (mssql for example), so
even if all core backends can support the multi-table bulk create,
you'll need to take in account that this isn't necessarily the case
for all possible backends.

Personally I think we should just rename the method to fast_create()
and allow it to fall back to single-row-at-a-time behavior. This way you
could safely call it for any objects on any database and know the
results will be OK. I don't see much point in throwing errors when
batch insert isn't possible. If batch insert isn't possible, the user
needs to insert the values one row at a time in any case.
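The fallback shape I have in mind might look something like this (a pure-Python sketch; all four names are hypothetical stand-ins, not Django API, and the toy callables stand in for real backend operations):

```python
def fast_create(objs, backend_supports_batch, batch_insert, single_insert):
    """Insert objs in one batch when the backend can return the
    generated IDs; otherwise silently fall back to one row at a time.
    Either way the caller gets the same result back."""
    if backend_supports_batch:
        return batch_insert(objs)                 # one round trip
    return [single_insert(obj) for obj in objs]   # n round trips

# Toy usage: a backend without batch support still succeeds.
created = fast_create(
    ["a", "b", "c"],
    backend_supports_batch=False,
    batch_insert=lambda objs: list(objs),
    single_insert=lambda obj: obj,
)
print(created)  # ['a', 'b', 'c']
```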

- Anssi

Shai Berger

unread,
Mar 30, 2016, 7:07:21 PM3/30/16
to django-d...@googlegroups.com
On Tuesday 29 March 2016 13:19:01 Anssi Kääriäinen wrote:
>
> Personally I think we should just rename the method to fast_create()
> and allow it to fall back to single-row-at-a-time behavior. This way you
> could safely call it for any objects on any database and know the
> results will be OK. I don't see much point in throwing errors when
> batch insert isn't possible. If batch insert isn't possible, the user
> needs to insert the values one row at a time in any case.
>

I beg to differ. fast_create() as you described will be suitable for some
users, but if creating 2000 records is going to require 2000 database
roundtrips, I'd appreciate a way to be told about it already when I'm testing
with 2. I could have other options to create the records (perhaps not using
the ORM at all), or I could decide early that the backend is unsuitable for my
needs.

For some optimizations, it is OK to just drop them if you can't do them. But
with bulk_create we could be talking about differences of orders of magnitude.
It seems to me that with that premise, the decision that it's ok to drop the
optimization should be explicit, and not Django's default.

My 2 cents,
Shai.

Alex Orange

unread,
Jul 31, 2016, 12:41:27 PM7/31/16
to Django developers (Contributions to Django itself)
I'm interested in this as well. Two points to make. First, why not start with PostgreSQL, which, if I understand correctly, already supports getting IDs back in bulk_create? I think "how to get IDs back on MySQL" is a separate issue. Second, I strongly agree with Shai: the fallback should either not happen at all or need to be requested explicitly (a keyword argument to accept it that defaults to False).