Hi,
I have a question regarding the implementation of python database API.
I have read the API documentation, but it wasn't very helpful and I
was hoping for some more information.
My question is regarding the Cursor object in pyodbc.
I know that when I execute "fetchmany" I return the remaining rows in
the result set into memory (can be dangerous when the result set is
large).
but what happens when I execute a cursor, where is the result set is
kept, and If I iterate over the cursor returning one row at a time
what are the performance issues I should consider.
The memory is maintained by the driver, so you would have to reference your driver's documentation. Most seem to read a set of rows in a tightly compressed format and buffer them in memory or on disk. As you read rows, pyodbc formats them as Python objects and the driver discards them. When it's buffer runs low, it requests more.
From a memory point-of-view, your best optimizations are always reducing the number of rows and columns you read. To reduce RAM, you can read either 1 row at a time, or you can use fetchmany to read them in batches. The performance difference will simply be the number of times the fetch function is called and some of its preliminary checks before it starts formatting results. This is usually negligible, but timing is the only way to know for sure. If one row at a time is noticeably slower, simply increase your fetchmany() batch size.
Note that the fetchall() function is the one that reads all remaining rows, not fetchmany(), so make sure you are reading them all.
I am fairly new to python and relational databases and struggling with
memory errors.
Accessing an MSSQL database using python 2.7.
To reduce RAM i am
1. pulling only two columns in a query
2. limiting the query to 100 rows
3. deleting the cursor query
4. closing the cursor
Somehow the cursor (or something else?) is retaining data, because it
successfully executes the first 100-row query and fails on the second
100-row query.
> The memory is maintained by the driver, so you would have to reference your
> driver's documentation. Most seem to read a set of rows in a tightly
> compressed format and buffer them in memory or on disk. As you read rows,
> pyodbc formats them as Python objects and the driver discards them. When
> it's buffer runs low, it requests more.
> From a memory point-of-view, your best optimizations are always reducing
> the number of rows and columns you read. To reduce RAM, you can read
> either 1 row at a time, or you can use fetchmany to read them in batches.
> The performance difference will simply be the number of times the fetch
> function is called and some of its preliminary checks before it starts
> formatting results. This is usually negligible, but timing is the only way
> to know for sure. If one row at a time is noticeably slower, simply
> increase your fetchmany() batch size.
> Note that the fetchall() function is the one that reads all remaining
> rows, not fetchmany(), so make sure you are reading them all.
hello.
i am fairly new to python and relational databases, struggling with
memory errors using cursor.execute to query MSSQL database. Using
python 2.7
To save ram i am:
1. querying only two columns from the table at a time
2. limiting query to 100 rows
3. deleting cursor query
4. closing cursor
the cursor (or something else?) must still be retaining data though,
because it successfully executes the first 100-row query and fails on
the second 100-row query.
Your code is kind of confusing because it can't be what you're
actually running, e.g.,, it has syntax errors and you're calling
execute twice in the loop. Can you post the actual code?
You can iterate over the cursor instead of using fetchall, or use
fetchmany. I wouldn't think closing the cursor each iteration saves
much memory.
> hello.
> i am fairly new to python and relational databases, struggling with
> memory errors using cursor.execute to query MSSQL database. Using
> python 2.7
> To save ram i am:
> 1. querying only two columns from the table at a time
> 2. limiting query to 100 rows
> 3. deleting cursor query
> 4. closing cursor
> the cursor (or something else?) must still be retaining data though,
> because it successfully executes the first 100-row query and fails on
> the second 100-row query.
Apologies. Below is the actual code, as well as an additional attempt
with fetchone instead of fetchall. I did read the wiki on cursors,
have tried fetchmany, fetchone, closing and deleting cursors, even a
generator for iterating over cursor results (http://
code.activestate.com/recipes/137270-use-generators-for-fetching-large-
db-record-sets/).
All efforts yeild the same memory problem, leading me to believe that
somehow i am not clearing the cursor as i loop.
Not sure if you specified before, but you're using the latest versions
of pyodbc and your drivers? If you just simplify it down to a cleaner
example does it still give you the error? The following code does the
same thing as your second example:
import pyodbc
url =
'DSN=NTSERVER17;UID=username;PWD=password;PORT=1433;DATABASE=dw_spegcs'
pyodbcconn = pyodbc.connect(url)
cursor = pyodbcconn.cursor()
thisquery = 'SELECT invoiceid, datename(q,tenderdate) FROM invoices'
cursor.execute(thisquery)
print thisquery
while (1):
thisdata = cursor.fetchone()
if thisdata is None: break
print thisdata
If that code gives you an error, then I'll defer to mkleehammer...
On Oct 27, 2:54 pm, dmoney <dtankers...@schipul.com> wrote:
> Apologies. Below is the actual code, as well as an additional attempt
> with fetchone instead of fetchall. I did read the wiki on cursors,
> have tried fetchmany, fetchone, closing and deleting cursors, even a
> generator for iterating over cursor results (http://
> code.activestate.com/recipes/137270-use-generators-for-fetching-large-
> db-record-sets/).
> All efforts yeild the same memory problem, leading me to believe that
> somehow i am not clearing the cursor as i loop.
I know that there are 68,866 invoices, so am confused why it is able
to execute 10 and then have memory error, if the cursor is not
supposed to retain data through the iterations?
-d
> The memory is maintained by the driver, so you would have to reference your
> driver's documentation. Most seem to read a set of rows in a tightly
> compressed format and buffer them in memory or on disk. As you read rows,
> pyodbc formats them as Python objects and the driver discards them. When
> it's buffer runs low, it requests more.
> From a memory point-of-view, your best optimizations are always reducing
> the number of rows and columns you read. To reduce RAM, you can read
> either 1 row at a time, or you can use fetchmany to read them in batches.
> The performance difference will simply be the number of times the fetch
> function is called and some of its preliminary checks before it starts
> formatting results. This is usually negligible, but timing is the only way
> to know for sure. If one row at a time is noticeably slower, simply
> increase your fetchmany() batch size.
> Note that the fetchall() function is the one that reads all remaining
> rows, not fetchmany(), so make sure you are reading them all.
Pretty ugly code, but I was able to solve the problem by:
1. chunking data during SELECT query
2. doing date transformation in python instead of SQL query
(e.g., extracting month, year, from python objects that are the output
of the cursor query)
Neither
closing the cursor cursor.close()
deleting the query deleted=cursor.execute(thisquery)
limiting cursor fetches cursor.fetchone()
made any difference to memory use for me. Solution script below.