Looking for recommendations on the best way to get large volumes of data into a DB2 database using Python with ibm_db, and on working with SQL/DDL files.
Today we use DB2CMD scripts on our application servers to load multi-GB data files into DB2 databases. This is a remote load scenario. The driver for looking at Python is that we need to add functionality to the scripts, and with the restrictions of Windows CMD scripting it makes sense to look at other options. That said, because the current solution works, and works well, there is no desire to make architectural changes to the data flow. In particular, we are stuck with the data files being sent to the application server.
In the current environment the scripts use the standard DB2 LOAD command, something along the lines of:
DB2 LOAD CLIENT FROM <filename> OF DEL RESTARTCOUNT 1 MESSAGES <messages file> REPLACE INTO <target table> NONRECOVERABLE
or
DB2 LOAD CLIENT FROM <filename> OF DEL RESTARTCOUNT 1
MESSAGES <messages file> REPLACE INTO <target table> COPY YES
This affords:
* Fast data loads. Yes, it is a little slower than if the data files were on the DB server, but performance has been very good (9K on low-power, untuned systems).
* Ability to restart a load
* On failure the load can be backed out completely
* Good status reporting
- Message log file with row by row data
- Status summary:
Number of rows read =
Number of rows skipped =
Number of rows loaded =
Number of rows rejected =
Number of rows deleted =
Number of rows committed =
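For context, one direction I've considered is simply wrapping the existing CLP command from Python rather than replacing it. This is only a rough sketch, not something we've built: the table/file names are placeholders, and the db2cmd invocation assumes a Windows DB2 client is installed and the session is already connected.

```python
import subprocess

def build_load_command(data_file, messages_file, target_table):
    """Build the same LOAD CLIENT command string the CMD scripts use today."""
    return (
        f"LOAD CLIENT FROM {data_file} OF DEL RESTARTCOUNT 1 "
        f"MESSAGES {messages_file} REPLACE INTO {target_table} NONRECOVERABLE"
    )

def run_load(data_file, messages_file, target_table):
    """Invoke the DB2 command line processor from Python (sketch only).
    Assumes db2cmd is on PATH; -c/-w/-i run a single command and wait."""
    cmd = build_load_command(data_file, messages_file, target_table)
    result = subprocess.run(
        ["db2cmd", "-c", "-w", "-i", "db2", cmd],
        capture_output=True, text=True,
    )
    return result.returncode, result.stdout
```

That keeps the LOAD semantics (restart, back-out, messages file) but still leaves me driving everything through the CLP, which is what I'd like to get away from.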
I understand that DB2 LOAD is a Command Line Processor command, so it isn't available through ibm_db. And with the data files residing on the application server, CALL ADMIN_CMD isn't an option either, since ADMIN_CMD runs on the database server and can only read server-side files. So what is the best option to:
* Maintain current configuration of data files on the application server
* Performance that is equal to or better than a DB2 LOAD
* Can back out on failure
* Provide the same or equivalent level of logging (rows read, skipped, rejected, committed), or at least some way of reconciling which records didn't make it in.
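On the reconciliation point: whatever the load mechanism ends up being, I'd want to recover the counts from the LOAD status summary programmatically. A small sketch of parsing that "Number of rows ... =" block (from captured CLP output or the messages file), assuming the standard English wording:

```python
import re

# Matches lines like "Number of rows rejected = 10"
SUMMARY_RE = re.compile(r"Number of rows (\w+)\s*=\s*(\d+)")

def parse_load_summary(text):
    """Return e.g. {'read': 1000, 'loaded': 990, ...} from LOAD output."""
    return {name: int(count) for name, count in SUMMARY_RE.findall(text)}
```

With that, a script can assert read == loaded + rejected (+ skipped) and flag any run that needs manual reconciliation.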
Because I haven't put enough in this post, I'll add one more spoon-feed item. In our scripts we do a lot through DDL/SQL files, where the script executes something along the lines of "db2 -tf <whateverfile.sql>". I'm not seeing how to do this with Python.
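The closest thing I've come up with myself is splitting the file on the statement terminator (matching -t's default of ";") and executing each statement through ibm_db. A sketch, with the caveat that it assumes simple DDL/SQL files (no ";" inside string literals, no block comments) and that the ibm_db connection details are placeholders:

```python
def split_sql_file(path, terminator=";"):
    """Yield individual statements from a DDL/SQL file, skipping
    blank lines and '--' comment lines."""
    statements, buf = [], []
    with open(path) as f:
        for line in f:
            stripped = line.strip()
            if not stripped or stripped.startswith("--"):
                continue
            buf.append(stripped)
            if stripped.endswith(terminator):
                statements.append(" ".join(buf)[:-len(terminator)].strip())
                buf = []
    if buf:  # trailing statement without a terminator
        statements.append(" ".join(buf))
    return statements

def run_sql_file(conn_str, path):
    """Rough 'db2 -tf' equivalent; assumes ibm_db is installed and
    conn_str is a valid DB2 connection string."""
    import ibm_db
    conn = ibm_db.connect(conn_str, "", "")
    try:
        for stmt in split_sql_file(path):
            ibm_db.exec_immediate(conn, stmt)
    finally:
        ibm_db.close(conn)
```

It works for basic files, but I don't know if there's a more robust or standard way to do this, which is partly why I'm asking.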
Appreciate any assistance. Fine with being pointed to resources but would ask that such resources have applicable examples.
Thanks!