I'm not sure I can comment on the overall approach, but there are a
couple of things that might help you.
1. If you use Query.get rather than Query.filter, you won't actually
query the database when the object already exists in the session. You'll
probably need to clear the session every now and then (I don't think
flush() or commit() clear it, but I could be wrong).
2. You may want to distinguish Session.flush() from Session.commit() -
you could flush every N new objects, and only commit once at the very
end.
3. If you know you are the only person writing to the database, consider
setting expire_on_commit=False on your session. Otherwise I think
accessing orm_obj.id after Session.commit() will trigger another
(possibly unnecessary) query to the database.
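To make the three points above concrete, here is a rough sketch rather than a definitive recipe. It assumes SQLAlchemy 1.4 or newer (where Session.get is the current spelling of Query.get) and an in-memory SQLite database. The Item model, the FLUSH_EVERY value and the SELECT-counting listener are made up purely for illustration. Note that the session only holds weak references, so the sketch keeps its own list of the objects it wants to find again.

```python
from sqlalchemy import Column, Integer, String, create_engine, event
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()


class Item(Base):  # hypothetical model, not from the original thread
    __tablename__ = "item"
    id = Column(Integer, primary_key=True)
    name = Column(String)


engine = create_engine("sqlite://")  # in-memory database for the demo
Base.metadata.create_all(engine)

select_count = 0


@event.listens_for(engine, "before_cursor_execute")
def count_selects(conn, cursor, statement, parameters, context, executemany):
    # Count SELECT statements so we can see when the database is really hit.
    global select_count
    if statement.lstrip().upper().startswith("SELECT"):
        select_count += 1


# Point 3: do not expire loaded attributes when the transaction commits.
Session = sessionmaker(bind=engine, expire_on_commit=False)
session = Session()

FLUSH_EVERY = 100  # assumed batch size
items = []  # strong references; the identity map itself is weak-referencing

for i in range(1, 251):
    item = Item(id=i, name="item %d" % i)
    session.add(item)
    items.append(item)
    if i % FLUSH_EVERY == 0:
        session.flush()  # point 2: send INSERTs, but stay in the transaction

session.flush()  # flush the final partial batch

# Point 1: a primary-key lookup is answered from the session, no SELECT.
found = session.get(Item, 42)
selects_after_get = select_count

session.commit()  # point 2: a single commit at the very end

# Point 3: with expire_on_commit=False this does not reload the row.
found_id = found.id
selects_after_commit = select_count

# For contrast, a filter-based query always goes to the database.
same = session.query(Item).filter(Item.id == 42).one()
selects_after_filter = select_count
```

In this sketch the counter stays at zero until the final filter-based query, which is the difference between a lookup by primary key and Query.filter.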
Hope that helps,
Simon
1. There's just one commit() at the end. I'd like the whole operation in one transaction.
2. There are flush() calls every 100-1000 or so. 10 is very low.
3. I will frequently disable autoflush if there are many flushes occurring due to queries for related data as the bulk operation proceeds.
4. I don't use try/except to find duplicates - this invalidates the transaction (SQLAlchemy does this but many DBs force it anyway). I use a SELECT to get things ahead of time, preferably loading the entire database worth of keys into a set, or loading the keys that I know we're dealing with, so that individual per-key SELECTs are not needed. Or if the set of data I'm working with is the whole thing at once, I store the keys in a set as I get them, then I know which ones I've got as I go along.
5. If I really need to do try/except, I use savepoints, i.e. begin_nested().
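A minimal sketch of points 1 to 4 might look like the following. It assumes SQLAlchemy 1.4 or newer and an in-memory SQLite database. The Word model, the incoming list and the tiny FLUSH_EVERY value are invented for the demo; a real bulk load would flush every 100-1000 rows as described above. Savepoints (point 5) are left out to keep it short.

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()


class Word(Base):  # hypothetical model with a unique natural key
    __tablename__ = "word"
    id = Column(Integer, primary_key=True)
    text = Column(String, unique=True, nullable=False)


engine = create_engine("sqlite://")  # in-memory database for the demo
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

# Rows that are already in the database before the bulk load starts.
session.add_all([Word(text="alpha"), Word(text="beta")])
session.commit()

incoming = ["alpha", "gamma", "beta", "delta", "gamma", "epsilon"]
FLUSH_EVERY = 2  # deliberately tiny here; assumed value for illustration

# One SELECT up front loads every existing key into a set, so no
# per-key SELECT (and no try/except on IntegrityError) is needed later.
seen = {text for (text,) in session.query(Word.text)}

inserted = 0
with session.no_autoflush:  # avoid implicit flushes triggered by queries
    for text in incoming:
        if text in seen:
            continue  # duplicate of an existing or already-queued row
        session.add(Word(text=text))
        seen.add(text)  # remember keys as we go along
        inserted += 1
        if inserted % FLUSH_EVERY == 0:
            session.flush()  # explicit flush every N new objects

session.commit()  # one commit, so the whole load is one transaction
total = session.query(Word).count()
```

The set does double duty: it holds the keys loaded ahead of time and it also catches duplicates inside the incoming data itself, such as the second "gamma".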
Or the same idea, using the multiprocessing module instead of threading if the GIL is still getting in the way. Or using Celery. Maybe a deferred approach like that of Twisted. There are lots of ways to offload slow IO operations while work continues.
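As one illustration of offloading slow IO, here is a small sketch using the standard library's concurrent.futures with threads. The slow_fetch function and its sleep are stand-ins for whatever slow operation is involved, not anything from the original code. If the GIL turns out to be the bottleneck, ProcessPoolExecutor offers the same interface on top of multiprocessing.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_fetch(key):
    # Stand-in for a slow IO call (network request, file read, ...).
    time.sleep(0.05)
    return key, key.upper()


keys = ["a", "b", "c", "d"]
processed = []

with ThreadPoolExecutor(max_workers=4) as pool:
    # Submit all the slow calls; they run in worker threads.
    futures = [pool.submit(slow_fetch, key) for key in keys]

    # The main thread is free to do other work here while the IO runs.

    # Collect results in submission order as they become available.
    for future in futures:
        processed.append(future.result())
```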