The major database servers and clients do have similar extensions to ANSI SQL and the Python DBAPI here, but they're fairly different from one another in how they work, in their input/output formats, and in their capabilities. They also are not necessarily supported by the underlying database drivers. Psycopg2 has "copy_*" methods on the cursor (
https://www.psycopg.org/docs/usage.html#copy); there are several MySQL drivers, and of the two I looked at, neither has explicit support for this. I'm not sure how this would really work in a driver, though - the feature is basically designed to be implemented in the C clients, reading local files.
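As a rough sketch of the psycopg2 route, `copy_expert` lets you stream a local CSV file through the Postgres COPY protocol. The table and column names below are hypothetical; `cursor` is any psycopg2 cursor:

```python
# Sketch: streaming a local CSV into Postgres via psycopg2's copy_expert.
# Table/column names are hypothetical placeholders.

def copy_csv(cursor, path, table, columns):
    """Stream the CSV file at `path` into `table` using COPY ... FROM STDIN."""
    cols = ", ".join(columns)
    sql = f"COPY {table} ({cols}) FROM STDIN WITH (FORMAT csv, HEADER true)"
    with open(path) as f:
        cursor.copy_expert(sql, f)
```

Because COPY runs over STDIN here, the file only needs to be readable by the Python process, not by the database server itself.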
The "bulk loading" functionality in SQLAlchemy largely amounts to low-level DBAPI insert operations, and SQLAlchemy's performance there is about as close as you can get to using the driver itself - while still having the syntactic sugar and not needing to write against a specific database. I don't believe there is any code to invoke a database client to read a source file itself; I am often mistaken, though, as Mike has put a ton of work into SQLAlchemy over the years.
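To make the "low-level DBAPI operations" point concrete, here is what the bulk path ultimately reduces to: one parameterized `executemany()` call. I'm using the stdlib `sqlite3` module as a stand-in for whichever DBAPI driver you actually use:

```python
# What the bulk-insert path boils down to at the DBAPI level: a single
# executemany() with parameterized rows.  sqlite3 stands in here for the
# real driver (psycopg2, mysqlclient, etc.).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")

rows = [(1, "alice"), (2, "bob"), (3, "carol")]
conn.executemany("INSERT INTO users (id, name) VALUES (?, ?)", rows)
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
```

SQLAlchemy adds statement compilation and unit-of-work bookkeeping on top of this, but the wire-level work is the same batched insert.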
From experience with large imports, I can tell you the following:
* The best performance for large updates involves:
1) take the database offline to clients
2) drop all the foreign key and check constraints (saving a copy of their definitions)
3) import your data from a file with the native client that ships with your database *
4) add the foreign key and check constraints back in
* If you feel the need to script this step, you can use `subprocess` to invoke the database's native client on the machine.
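A minimal sketch of that `subprocess` wrapper, assuming the Postgres `psql` client and its `\copy` command - swap in `mysql`, `sqlcmd`, etc. for other servers, and adjust the flags to match:

```python
# Sketch: scripting the native client with subprocess.  Assumes the
# Postgres `psql` client is on PATH; other databases need other commands.
import subprocess

def build_copy_command(database, table, path):
    """Build a psql invocation that bulk-loads `path` into `table` via \\copy."""
    return [
        "psql", database,
        "-c", f"\\copy {table} FROM '{path}' WITH (FORMAT csv, HEADER true)",
    ]

def run_import(database, table, path):
    # check=True raises CalledProcessError if the client exits non-zero
    subprocess.run(build_copy_command(database, table, path), check=True)
```

Keeping the command construction in its own function makes it easy to log or dry-run the exact invocation before letting it loose on production.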
* The second best performance is:
use SQLAlchemy's bulk strategies;
segment your source data file into ranges;
run multiple processes in parallel, each processing a subset of the source data file
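The segmentation step above can be sketched as follows. Each `(start, stop)` pair would then be handed to a worker (e.g. via `multiprocessing.Pool`) that opens its own connection and runs SQLAlchemy's bulk inserts over just its slice; only the range math is shown here:

```python
# Sketch: splitting a source file of `total_lines` lines into half-open
# (start, stop) ranges, one per worker process.  The worker itself would
# read only its range and bulk-insert those rows.

def segment_ranges(total_lines, workers):
    """Split `total_lines` into `workers` contiguous (start, stop) ranges."""
    size, extra = divmod(total_lines, workers)
    ranges, start = [], 0
    for i in range(workers):
        # spread the remainder across the first `extra` workers
        stop = start + size + (1 if i < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges
```

Half-open ranges make it easy to verify coverage: each range's `stop` is the next range's `start`, so no line is processed twice or skipped.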
* The third best performance is:
write a script that uses the SQLAlchemy ORM with the 'windowed query' option to iterate over a range of lines in your source file;
have the script use a third-party tool like Redis to mark that it has "checked out" a section of the source data, and whether it has completed that range;
run multiple processes in parallel, each processing a subset of the source data file
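The "checked out" bookkeeping can be done with Redis's SET-with-NX semantics: the first process to SET a range key wins it, and everyone else moves on. A sketch, assuming `client` is a redis-py client (anything with the same `set`/`get` API works) and the key names are hypothetical:

```python
# Sketch of checkout bookkeeping using Redis SET ... NX.  `client` is
# assumed to be a redis-py client; the key naming scheme is made up.

def claim_range(client, start, stop):
    """Atomically claim a line range; returns True only for the winner."""
    key = f"import:range:{start}-{stop}"
    # nx=True means the SET only succeeds if the key does not exist yet,
    # so exactly one process can claim each range
    return bool(client.set(key, "in-progress", nx=True))

def mark_done(client, start, stop):
    """Record that the claimed range has been fully imported."""
    client.set(f"import:range:{start}-{stop}", "done")
```

Because incomplete ranges stay marked "in-progress", a supervisor can also spot crashed workers and re-queue their ranges after a timeout.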
In the second and third approaches, you can periodically run a query or script to calculate the right number of processes for your hardware and current load - it could be 7, or 20, or another number.