bulk insert from generator instead of list, possible?

Andy Crain

Jun 16, 2016, 12:32:08 PM
to sqlalchemy
Hi,

I'm attempting to do a bulk insert from a large .csv file. I've read through the various options at http://docs.sqlalchemy.org/en/latest/faq/performance.html#i-m-inserting-400-000-rows-with-the-orm-and-it-s-really-slow, and I would like to perform a bulk insert using Core, along the lines of:

engine.execute(MyTable.__table__.insert(), [...])

Instead of a list, though, I'd like to use a generator, as I'm very memory constrained. When I try something like this:

engine.execute(MyTable.__table__.insert(), (d for d in my_dict_generator()) )

...I get "AttributeError: 'list' object has no attribute 'keys'".

Is it not possible to use a generator in lieu of a list here?

Thanks,
Andy 


Mike Bayer

Jun 16, 2016, 1:03:04 PM
to sqlal...@googlegroups.com
Not directly. You should break your generator into batches:

import itertools

with engine.begin() as conn:
    while True:
        # materialize the next batch of rows from the generator
        chunk = list(itertools.islice(generator, 10000))
        if not chunk:
            break
        conn.execute(table.insert(), chunk)
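
Applied to the CSV case in the original question, that pattern might look like the sketch below; the database URL, table schema, file name, and 10,000-row batch size are illustrative assumptions, not details from the thread.

import csv
import itertools

from sqlalchemy import Column, Integer, MetaData, String, Table, create_engine

# illustrative engine and table definition; substitute your own
engine = create_engine("sqlite:///example.db")
metadata = MetaData()
my_table = Table(
    "my_table", metadata,
    Column("id", Integer, primary_key=True),
    Column("name", String(50)),
)
metadata.create_all(engine)

with open("mytable.csv") as f:
    reader = csv.DictReader(f)  # lazily yields one dict per CSV row
    with engine.begin() as conn:
        while True:
            # hold at most 10,000 rows in memory at a time
            chunk = list(itertools.islice(reader, 10000))
            if not chunk:
                break
            conn.execute(my_table.insert(), chunk)

Because islice consumes the DictReader incrementally, memory use stays bounded by the batch size rather than the file size.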


Andy Crain

Jun 16, 2016, 1:16:15 PM
to sqlalchemy
Mike,
Thanks very much. I thought that might be the case, but I was hoping to avoid chunking entirely in favor of pure streaming. I'll go with that approach.
FYI, I think I was thrown off by stream_results; I thought it might make it possible to use a generator to stream a huge insert, but I misunderstood the option, which is about having the driver stream large selects rather than buffering entire result sets.
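For reference, what stream_results seems to be for is something like the following sketch: reading a large select without buffering it client-side. The my_table name is illustrative, and the option's effect depends on the driver (e.g. server-side cursors on psycopg2).

with engine.connect() as conn:
    # ask the driver to stream rows instead of fetching the whole result
    result = conn.execution_options(stream_results=True).execute(
        my_table.select()
    )
    for row in result:
        ...  # process one row at a time; the full result set is never buffered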
Thanks again.
Andy 