Do pyodbc and pypyodbc differ in terms of performance for database (read: SQL Server) INSERTs?

Muqadder Iqbal

Jun 3, 2015, 1:27:38 PM
to pyo...@googlegroups.com
I'm testing a data extraction/load solution I've built in Python. The source is a set of APIs and the target is a set of SQL Server (2012) tables. In my Python code, I loop over the result set I receive from the API call, accumulate a batch of 10,000 rows into a list of tuples (representing the rows I want to insert), and then call executemany on the pypyodbc cursor, passing the list as input to a simple INSERT INTO ... VALUES ... statement. So essentially, you'd expect to see about 10,000 rows inserted into the SQL table before a COMMIT happens (I have an explicit cursor.commit right after the INSERT).

However, when I query the table while the data load is happening, I see only ~100 new rows each time I rerun my SELECT * on the table (roughly every couple of seconds). Loading about 500,000 rows into the table took about an hour. Note that I am not doing any kind of data transformation at all; it's just a straight INSERT INTO.
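
For reference, here is a minimal sketch of the loop I'm describing; the connection string, the fetch_from_api generator, and the staging_table columns are placeholders rather than my actual code:

    import pypyodbc

    def fetch_from_api():
        # Placeholder for the real API pagination loop.
        for i in range(500000):
            yield {"a": i, "b": "value-%d" % i}

    conn = pypyodbc.connect(
        "Driver={SQL Server Native Client 11.0};"
        "Server=myserver;Database=mydb;Trusted_Connection=yes;"
    )
    cursor = conn.cursor()

    BATCH_SIZE = 10000
    batch = []  # list of tuples, one per row

    for record in fetch_from_api():
        batch.append((record["a"], record["b"]))
        if len(batch) == BATCH_SIZE:
            cursor.executemany(
                "INSERT INTO staging_table (col_a, col_b) VALUES (?, ?)",
                batch,
            )
            cursor.commit()  # explicit commit right after each batch
            batch = []

    if batch:
        # Flush the final partial batch.
        cursor.executemany(
            "INSERT INTO staging_table (col_a, col_b) VALUES (?, ?)",
            batch,
        )
        cursor.commit()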

This feels terribly slow, considering the process isn't limited by any hardware resource on the machine it runs on. I have seen posts about pyodbc being slow for INSERTs (I haven't tried it myself yet), but nothing about the pypyodbc package.

Does anyone have hands-on experience to share with INSERTs using pyodbc vs. pypyodbc? Are they expected to perform exactly the same? Is there any merit in choosing one over the other?
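
In case it helps frame answers, this is the kind of head-to-head test I have in mind; both packages expose the same DB-API interface, so only the import changes (the connection string and the perf_test table are assumptions, not an existing benchmark):

    import time

    import pyodbc
    import pypyodbc

    CONN_STR = (
        "Driver={SQL Server Native Client 11.0};"
        "Server=myserver;Database=mydb;Trusted_Connection=yes;"
    )

    def time_insert(driver, rows):
        # Insert one batch with the given driver module and return
        # the elapsed wall-clock time in seconds.
        conn = driver.connect(CONN_STR)
        cursor = conn.cursor()
        start = time.time()
        cursor.executemany(
            "INSERT INTO perf_test (col_a, col_b) VALUES (?, ?)",
            rows,
        )
        conn.commit()
        conn.close()
        return time.time() - start

    rows = [(i, "x" * 20) for i in range(10000)]
    print("pyodbc:   %.2fs" % time_insert(pyodbc, rows))
    print("pypyodbc: %.2fs" % time_insert(pypyodbc, rows))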