Hi all,
Just sharing some performance insights into the bulk operation method bulk_insert_mappings.
I was recently debugging a SQLAlchemy-powered web app that was crashing with out-of-memory errors on a small Kubernetes node. It turned out to be "caused" by an over-optimistic invocation of bulk_insert_mappings: I was reading a CSV file with ~500,000 entries into a list of dictionaries and passing the whole list to bulk_insert_mappings in a single call. The SQLAlchemy work was using about 750 MB of RAM, which was enough to OOM the small node the web app was running on.
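
Roughly what I was doing, as a sketch (MyRecord, engine and the file name are just placeholders standing in for my actual app code):

import csv
from sqlalchemy.orm import Session

# MyRecord is a mapped class and engine an Engine created elsewhere.
with open("data.csv", newline="") as f:
    rows = list(csv.DictReader(f))  # ~500,000 dicts held in memory at once

with Session(engine) as session:
    # One call over the entire list -- this is where memory use spiked.
    session.bulk_insert_mappings(MyRecord, rows)
    session.commit()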
A simple workaround is to split the list of 500,000 entries into chunks of 1,000 entries each and call bulk_insert_mappings on each chunk, as in the sketch below. When I do this, the extra memory usage is barely noticeable. It also seems that the chunked approach is actually faster; I might run a benchmark to quantify that.
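
The chunked version looks roughly like this (same placeholder names as above, and a chunk size of 1,000 is simply what I used, not a tuned value):

# Yield successive fixed-size slices of the full list of row dicts.
def chunks(seq, size=1000):
    for i in range(0, len(seq), size):
        yield seq[i:i + size]

with Session(engine) as session:
    for chunk in chunks(rows, 1000):
        # Each call only materialises the state for 1,000 rows at a time.
        session.bulk_insert_mappings(MyRecord, chunk)
    session.commit()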
Thought it was interesting. I wonder whether it would be worth adding a note to the docs for bulk_insert_mappings? Given that the method is motivated by performance, it seems relevant.
James