Many To Many Insertion

102 views
Skip to first unread message

danem

unread,
Mar 4, 2020, 10:15:51 PM3/4/20
to peewee-orm
Hi, SQL and Peewee noob here:

I'm trying to create a database with a lot of many to many relations. I've found simply doing Model.create is way too slow. ~400k insertions takes ~500 seconds on my machine. I see there is the bulk insertion API which is much more performant. The problem with this however is that doing `insert_many` doesn't return the foreign keys I need to construct the other instances. The solution I've come up with is generating my own unique IDs inserting the many to many pairing on a separate table, and using an insert_from with a join to populate my real table. This is undesirable because it requires me to generate my own keys, as well as use a temporary table. 

Like I said, I'm an SQL noob so I'm sure I'm missing something obvious.

I've attached a toy example for reference.

Thanks

I've
peeweetest.py

Charles Leifer

unread,
Mar 4, 2020, 10:59:19 PM3/4/20
to peewe...@googlegroups.com
What database are you using? Postgres, for instance, supports returning the IDs of newly-created rows, which could make things easier if you happen to be using it.

Have you tried wrapping your inserts in a transaction? You might see a pretty huge speedup just by doing that, since you'll avoid a commit for every row and have just one commit at the end.

--
You received this message because you are subscribed to the Google Groups "peewee-orm" group.
To unsubscribe from this group and stop receiving emails from it, send an email to peewee-orm+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/peewee-orm/36aa8026-e02f-406b-8f2d-af216a650e92%40googlegroups.com.

danem

unread,
Mar 5, 2020, 2:32:34 AM3/5/20
to peewee-orm
I'm using Sqlite. 

And are you referring to database.atomic? Yes, I've tried that and it is similarly slow. I've attached a file with two different tests for comparison. "test1" completes in 15 seconds. "test2" which doesn't use bulk insertion takes 147 seconds.

Thanks so much!


On Wednesday, March 4, 2020 at 7:59:19 PM UTC-8, Charles wrote:
What database are you using? Postgres, for instance, supports returning the IDs of newly-created rows, which could make things easier if you happen to be using it.

Have you tried wrapping your inserts in a transaction? You might see a pretty huge speedup just by doing that, since you'll avoid a commit for every row and have just one commit at the end.

On Wed, Mar 4, 2020 at 9:15 PM danem <danem...@gmail.com> wrote:
Hi, SQL and Peewee noob here:

I'm trying to create a database with a lot of many to many relations. I've found simply doing Model.create is way too slow. ~400k insertions takes ~500 seconds on my machine. I see there is the bulk insertion API which is much more performant. The problem with this however is that doing `insert_many` doesn't return the foreign keys I need to construct the other instances. The solution I've come up with is generating my own unique IDs inserting the many to many pairing on a separate table, and using an insert_from with a join to populate my real table. This is undesirable because it requires me to generate my own keys, as well as use a temporary table. 

Like I said, I'm an SQL noob so I'm sure I'm missing something obvious.

I've attached a toy example for reference.

Thanks

I've

--
You received this message because you are subscribed to the Google Groups "peewee-orm" group.
To unsubscribe from this group and stop receiving emails from it, send an email to peewe...@googlegroups.com.
peeweetest.py

danem

unread,
Mar 6, 2020, 1:40:47 PM3/6/20
to peewee-orm
As a point of reference, here is the equivalent behavior using sqlalchemy. This example completes in ~50 seconds on my machine. While not as quick as the optimized Peewee version, it doesn't require me to generate unique ids for each item i'm inserting, or introduce another table. Does Peewee support an interface like this?

Thanks
sqlalchemytest.py

Charles Leifer

unread,
Mar 6, 2020, 6:34:53 PM3/6/20
to peewe...@googlegroups.com
SQLAlchemy uses a fundamentally different design. It's a data-mapper so it keeps everything in memory and manages your relations that way. Peewee is activerecord so it doesn't do all the magic when you commit. It's a much simpler layer on top of sql in other words.

Pick your poison, either one is doing sql at the end of the day.

To unsubscribe from this group and stop receiving emails from it, send an email to peewee-orm+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/peewee-orm/844b73fb-6e6c-40b4-9b30-eb7f9e989e4d%40googlegroups.com.

danem

unread,
Mar 7, 2020, 4:15:36 PM3/7/20
to peewee-orm
Ok, no problem. I just wanted to be clear that there was no obvious improvements I could make on my approach.

Thanks again
Reply all
Reply to author
Forward
0 new messages