I've been using SQLAlchemy for some queries against an MS SQL Server
via pymssql at work, and I've found the performance of the fetchall call
to be slower than our old python-sybase-based solution.
A quick peek into the code shows that fetchall is calling fetchone
repeatedly. Changing it to use fetchall from the cursor yields a
massive speedup in my queries. The same goes for the object mapper,
though not as dramatic.
I've attached two diffs which use fetchall instead of looping over
fetchone. In one of my calls it slashes the query time down from about
4 seconds to half a second for about 14,000 rows.
Is there any particular reason why fetchall is implemented this way?
I'm guessing it's the most compatible way of doing this.
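To make the difference concrete, here's a minimal sketch of the two fetch
strategies against a plain DBAPI cursor. sqlite3 stands in for pymssql here
(the cursor interface is the same under PEP 249); the helper names are my own,
not SQLAlchemy's.

```python
import sqlite3

# sqlite3 as a stand-in DBAPI; any PEP 249 driver exposes the same cursor API.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (n INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(1000)])

def rows_via_fetchone(cursor):
    # The old approach: one driver call per row.
    rows = []
    row = cursor.fetchone()
    while row is not None:
        rows.append(row)
        row = cursor.fetchone()
    return rows

def rows_via_fetchall(cursor):
    # The patched approach: let the driver hand everything back at once.
    return cursor.fetchall()

slow = rows_via_fetchone(conn.execute("SELECT n FROM t ORDER BY n"))
fast = rows_via_fetchall(conn.execute("SELECT n FROM t ORDER BY n"))
assert slow == fast  # identical rows, far fewer per-row driver calls
```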
cheers,
mick
My original concern with things like fetchall() regarded DBAPIs that
might have "active rows", meaning particular values in an individual
row might maintain some state with the database cursor, such as a
BLOB. I could only find one DBAPI that could potentially have this
phenomenon (I think it was an alternate Postgres API), and we don't
support it anyway. But then again, you don't really have to call
fetchall() if you don't want to.
Anyway, the patch is committed in revision 1974.
> > <base_fetchall.diff>
> <orm_fetchall.diff>
I suspect the performance improvements are database-connector-specific;
FreeTDS-based connection libraries (pymssql and sybase) seem to suffer
badly when using fetchone versus fetchall.
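For what it's worth, the shape of the claim can be sanity-checked with a
hypothetical micro-benchmark. sqlite3 stands in for pymssql here since a
FreeTDS setup isn't assumed available, so the absolute gap will be much
smaller than over a real network driver:

```python
import sqlite3
import time

# Build a table roughly the size mentioned in the thread (~14,000 rows).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (n INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(14000)])

def timed(fetch):
    cur = conn.execute("SELECT n FROM t")
    start = time.perf_counter()
    rows = fetch(cur)
    return len(rows), time.perf_counter() - start

def drain_fetchone(cur):
    # Row-at-a-time draining, as the pre-patch result loop did.
    rows = []
    row = cur.fetchone()
    while row is not None:
        rows.append(row)
        row = cur.fetchone()
    return rows

n_one, t_one = timed(drain_fetchone)
n_all, t_all = timed(lambda cur: cur.fetchall())
assert n_one == n_all == 14000
print(f"fetchone loop: {t_one:.4f}s  fetchall: {t_all:.4f}s")
```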
> My original concern with things like fetchall() regarded DBAPIs that
> might have "active rows", meaning particular values in an individual
> row might maintain some state with the database cursor, such as a
> BLOB. I could only find one DBAPI that could potentially have this
> phenomenon (I think it was an alternate Postgres API), and we don't
> support it anyway. But then again, you don't really have to call
> fetchall() if you don't want to.
>
> Anyway, the patch is committed in revision 1974.
Cool, thanks.
mick