I'm testing a data extraction/load solution I've built in Python. The source is a set of APIs and the target is a set of SQL Server (2012) tables. In my Python code, I loop over the result set I receive from the API call, accumulate a batch of 10,000 rows as a list of tuples (one tuple per row to insert), and then call pypyodbc's cursor.executemany, passing the list along with a simple INSERT INTO ... VALUES ... statement. So you'd expect about 10,000 rows to be inserted into the SQL table before a COMMIT happens (I have an explicit cursor.commit right after the INSERT).

However, when I query the table while the load is running, I see only ~100 new rows each time I rerun my SELECT * on the table (roughly every couple of seconds). Loading about 500,000 rows took about an hour. Note that I am not doing any data transformation at all; it's just a straight INSERT INTO.
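For context, here's a stripped-down sketch of the loop. The connection string, table/column names, and the fetch_api_results helper are placeholders (the real code pages through several APIs), but the batching and commit pattern is the same:

```python
import pypyodbc

def fetch_api_results():
    """Placeholder for the real API pagination loop; yields dict-like records."""
    for i in range(25000):
        yield {"id": i, "name": f"name_{i}", "value": i * 1.5}

# Placeholder connection string -- the real target is SQL Server 2012.
conn = pypyodbc.connect(
    "Driver={SQL Server Native Client 11.0};"
    "Server=MYSERVER;Database=MYDB;Trusted_Connection=yes;"
)
cursor = conn.cursor()

INSERT_SQL = "INSERT INTO dbo.target_table (id, name, value) VALUES (?, ?, ?)"
BATCH_SIZE = 10000

batch = []
for rec in fetch_api_results():
    batch.append((rec["id"], rec["name"], rec["value"]))
    if len(batch) == BATCH_SIZE:
        cursor.executemany(INSERT_SQL, batch)  # one call per 10,000-row batch
        cursor.commit()                        # explicit commit right after the INSERT
        batch = []

# flush the final partial batch
if batch:
    cursor.executemany(INSERT_SQL, batch)
    cursor.commit()

cursor.close()
conn.close()
```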
This feels terribly slow, considering the load is nowhere near exhausting any of the hardware resources on the machine running the process. I have seen posts about pyodbc being slow for INSERTs (I haven't tried pyodbc myself yet), but nothing about the pypyodbc package.
Does anyone have hands-on experience with INSERT performance using pyodbc vs. pypyodbc? Are they expected to perform the same? Is there any merit in choosing one over the other?