hi -
when you say "mysql directly", what are you referring to: the mysql command-line client, or the Python driver? Also, which Python driver are you using?
A baseline for how long the operation should take can be established by using the Python driver that you have with the *exact* SQL query you're starting with, then fully fetching all rows in Python. It sounds a little off that the mysqlclient driver can fetch ten million rows in 0.016 seconds. At the end of this message is a test script that fetches 5 million rows of just two columns each using the raw mysqlclient driver; mysqlclient is a native driver and is about as fast as you can get. Running it on my laptop with *no network* against a local MySQL server, it takes 7 seconds to fetch that many rows. Ten million rows is a lot, and it's not going to be very fast in any case.
That said, the ORM is *much* slower when it fetches full Python objects than when it fetches rows, because there is a lot of Python overhead in building up the objects. If you have 10M rows, just having Python set aside the memory to hold 10M heavyweight objects at once will take many seconds, and once the process churns into swap space the time will grow dramatically. It's unlikely you need the full-blown business object functionality on 10M rows at once, so I would fetch columns instead of objects. There's discussion about this in the FAQ at
https://docs.sqlalchemy.org/en/13/faq/performance.html#result-fetching-slowness-orm . Additionally, the ORM can yield out ORM objects in batches using yield_per(), keeping in mind the database driver has probably buffered the raw rows in any case.
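As a rough sketch of both ideas, assuming a hypothetical mapped class Data with integer columns x and y (and using an in-memory SQLite database just to keep it self-contained; you'd use your MySQL URL instead):

```python
from sqlalchemy import Column, Integer, create_engine
from sqlalchemy.orm import Session

try:
    # SQLAlchemy 1.4+
    from sqlalchemy.orm import declarative_base
except ImportError:
    # SQLAlchemy 1.3
    from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()


# hypothetical mapped class standing in for your real model
class Data(Base):
    __tablename__ = "data"
    id = Column(Integer, primary_key=True)
    x = Column(Integer)
    y = Column(Integer)


# in-memory SQLite so the sketch runs anywhere
engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

session = Session(engine)
session.add_all([Data(x=i, y=i * 2) for i in range(1000)])
session.commit()

# fetch lightweight named tuples of columns, not full Data objects
rows = session.query(Data.x, Data.y).order_by(Data.id).all()

# or, if full objects are really needed, stream them in chunks
count = 0
for obj in session.query(Data).yield_per(1000):
    count += 1
```

the query(Data.x, Data.y) form skips object construction and identity-map bookkeeping entirely, which is where most of the ORM overhead lives.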
What I would suggest is to take a look at the example fetching suite at
https://docs.sqlalchemy.org/en/13/_modules/examples/performance/large_resultsets.html . It illustrates all the different ways you can fetch rows with SQLAlchemy and compares the speed of each. You can run this suite straight from the distribution, and it will show the relative differences in speed between the different kinds of fetches. The ORM in particular has a lot of work to do both generating Python objects and populating them; if you fetch rows of columns instead of objects, that will save most of the time.
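The fastest entries in that suite are the Core-level fetches, which never build ORM objects at all. A minimal sketch of that style (again using an in-memory SQLite database as a stand-in for your MySQL URL; table and column names here are illustrative):

```python
from sqlalchemy import create_engine, text

# in-memory SQLite keeps this runnable; substitute your MySQL URL
engine = create_engine("sqlite://")

with engine.connect() as conn:
    conn.execute(text("create table data (x integer, y integer)"))

    # executemany-style insert of 1000 sample rows
    conn.execute(
        text("insert into data (x, y) values (:x, :y)"),
        [{"x": i, "y": i * 2} for i in range(1000)],
    )

    # Core fetch: plain row tuples, no ORM object construction
    result = conn.execute(text("select x, y from data")).fetchall()
```

each element of result is a lightweight row tuple, which is why this approach lands near the raw-DBAPI numbers in the benchmark output below.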
To get a feel for this suite, here's a run from my own laptop:
$ python -m examples.performance large_resultsets --dburl mysql://scott:tiger@localhost/test
Running setup once...
Tests to run: test_orm_full_objects_list, test_orm_full_objects_chunks, test_orm_bundles, test_orm_columns, test_core_fetchall, test_core_fetchmany_w_streaming, test_core_fetchmany, test_dbapi_fetchall_plus_append_objects, test_dbapi_fetchall_no_object
test_orm_full_objects_list : Load fully tracked ORM objects into one big list(). (500000 iterations); total time 6.135940 sec
test_orm_full_objects_chunks : Load fully tracked ORM objects a chunk at a time using yield_per(). (500000 iterations); total time 3.340366 sec
test_orm_bundles : Load lightweight "bundle" objects using the ORM. (500000 iterations); total time 0.949388 sec
test_orm_columns : Load individual columns into named tuples using the ORM. (500000 iterations); total time 0.560157 sec
test_core_fetchall : Load Core result rows using fetchall. (500000 iterations); total time 0.466407 sec
test_core_fetchmany_w_streaming : Load Core result rows using fetchmany/streaming. (500000 iterations); total time 0.339930 sec
test_core_fetchmany : Load Core result rows using Core / fetchmany. (500000 iterations); total time 0.470984 sec
test_dbapi_fetchall_plus_append_objects : Load rows using DBAPI fetchall(), generate an object for each row. (500000 iterations); total time 0.476398 sec
test_dbapi_fetchall_no_object : Load rows using DBAPI fetchall(), don't make any objects. (500000 iterations); total time 0.330984 sec
import random
import time

from sqlalchemy import create_engine

e = create_engine("mysql://scott:tiger@localhost/test")

with e.connect() as conn:
    conn.execute("drop table if exists data")
    conn.execute("create table data (x integer, y integer)")

    # load 5M two-column rows
    conn.execute(
        "insert into data (x, y) values (%s, %s)",
        [
            (random.randint(1, 10), random.randint(1, 10))
            for i in range(5000000)
        ],
    )

    # drop down to the raw mysqlclient connection, bypassing
    # SQLAlchemy entirely for the timed fetch
    raw_mysql_connection = conn.connection.connection
    cursor = raw_mysql_connection.cursor()

    now = time.perf_counter()
    cursor.execute("select x, y from data")
    cursor.fetchall()
    cursor.close()
    total = time.perf_counter() - now

    print("total time %s" % total)