pyodbc Cursor object


Tzur Turkenitz

Oct 24, 2011, 4:11:11 AM
to pyodbc
Hi,
I have a question about pyodbc's implementation of the Python Database API. I have read the API documentation, but it wasn't very helpful, and I was hoping for some more information.

My question concerns the Cursor object in pyodbc. I know that when I execute "fetchmany" the remaining rows of the result set are returned into memory (which can be dangerous when the result set is large). But what happens when I execute a cursor: where is the result set kept, and if I iterate over the cursor, returning one row at a time, what performance issues should I consider?

Thanks!

Michael Kleehammer

Oct 25, 2011, 1:05:47 PM
to pyo...@googlegroups.com
The memory is maintained by the driver, so you would have to check your driver's documentation. Most seem to read a set of rows in a tightly compressed format and buffer them in memory or on disk. As you read rows, pyodbc formats them as Python objects and the driver discards them. When its buffer runs low, it requests more.

From a memory point of view, your best optimizations are always reducing the number of rows and columns you read. To reduce RAM, you can read either one row at a time, or you can use fetchmany to read rows in batches. The performance difference is simply the number of times the fetch function is called, plus some preliminary checks it performs before it starts formatting results. This is usually negligible, but timing is the only way to know for sure. If one row at a time is noticeably slower, simply increase your fetchmany() batch size.

Note that fetchall() is the function that reads all remaining rows, not fetchmany(), so make sure that is actually what you intend.
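
The batched-fetch pattern described above can be sketched in a runnable form. This uses Python 3 and sqlite3, which shares the DB-API 2.0 cursor interface with pyodbc, so it runs without an ODBC driver; the table name and row count are illustrative, not from this thread:

```python
# Sketch of fetchmany() batching: only one batch of rows is held in
# Python at a time, instead of the whole result set.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE invoices (invoiceid INTEGER)")
cur.executemany("INSERT INTO invoices VALUES (?)",
                [(i,) for i in range(1000)])

cur.execute("SELECT invoiceid FROM invoices")
total = 0
while True:
    batch = cur.fetchmany(100)   # 100 rows per fetch call
    if not batch:                # empty list means the result set is exhausted
        break
    total += len(batch)          # process the batch here
print(total)
```

Increasing the argument to fetchmany() trades a little memory for fewer fetch calls, which is the tuning knob mentioned above.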

Dharol Tankersley

Oct 27, 2011, 11:43:18 AM
to pyodbc
I am fairly new to Python and relational databases and am struggling with memory errors. I am accessing an MSSQL database using Python 2.7.

To reduce RAM I am:
1. pulling only two columns in a query
2. limiting the query to 100 rows
3. deleting the cursor query
4. closing the cursor

Somehow the cursor (or something else?) is retaining data, because it successfully executes the first 100-row query and fails on the second 100-row query.


SCRIPT:
invfields = ['invoiceid','datename(q,tenderdate)']

for field in range(1,len(invfields)):
    thisdata = ();
    cursor.close(); cursor = pyodbcconn.cursor()
    thisquery = 'SELECT ' + invfields[0] + ', ' + invfields[field] + ' FROM invoices WHERE invoiceid < 100'
    print thisquery
    cursor.execute(thisquery);
    thisdata = cursor.fetchall()
    deleted = cursor.execute(thisquery); print thisdata

Dharol Tankersley

Oct 27, 2011, 11:56:00 AM
to pyodbc
Hello.
I am fairly new to Python and relational databases, and I am struggling with memory errors when using cursor.execute to query an MSSQL database from Python 2.7.

To save RAM I am:
1. querying only two columns from the table at a time
2. limiting the query to 100 rows
3. deleting the cursor query
4. closing the cursor

The cursor (or something else?) must still be retaining data, though, because it successfully executes the first 100-row query and fails on the second 100-row query.

#SCRIPT:
chunksize = 100; itemEND = 100; rowST = 1; rowMAX = 1000
invfields = ['invoiceid','datename(q,tenderdate)']

for field in range(1,len(invfields)):
    while (itemEND <= itemMAX):
        thisdata = ();
        cursor.close();
        cursor = pyodbcconn.cursor()

        thisquery = 'SELECT ' + invfields[0] + ', ' + invfields[field] + ' FROM invoices WHERE invoiceid >= ' + str(itemST) + ' AND invoiceid < ' + str(itemEND)
        print thisquery
        cursor.execute(thisquery);
        thisdata = cursor.fetchall()
        deleted = cursor.execute(thisquery);
        print itemST, itemEND

        itemST = itemST + chunksize; itemEND = itemEND + chunksize;

#OUTPUT & ERROR:
SELECT invoiceid, datename(q,tenderdate) FROM invoices WHERE invoiceid >= 1 and invoiceid < 100
1 100
SELECT invoiceid, datename(q,tenderdate) FROM invoices WHERE invoiceid >= 101 and invoiceid < 200
101 200

Traceback (most recent call last):
  File "/Users/dharol/eclipse/workspace/mongodb/invoicequery.py", line 48, in <module>
    thisdata = cursor.fetchall()
MemoryError

Any help, guidance toward alternatives to cursors, or more effective ways of using cursors with MSSQL would be greatly appreciated.
-d

ashaner

Oct 27, 2011, 1:54:52 PM
to pyodbc
Your code is kind of confusing because it can't be what you're actually running; e.g., it has syntax errors and you're calling execute twice in the loop. Can you post the actual code?

You can iterate over the cursor instead of using fetchall, or use
fetchmany. I wouldn't think closing the cursor each iteration saves
much memory.

Did you read the Wiki on cursors?
http://code.google.com/p/pyodbc/wiki/Cursor
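
Iterating over the cursor directly looks like this; a minimal sketch in Python 3 using sqlite3, which exposes the same DB-API iteration protocol as pyodbc (the table name is illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE invoices (invoiceid INTEGER)")
cur.executemany("INSERT INTO invoices VALUES (?)", [(i,) for i in range(5)])

cur.execute("SELECT invoiceid FROM invoices ORDER BY invoiceid")
ids = []
for row in cur:          # the cursor itself is iterable: one row at a time,
    ids.append(row[0])   # with no fetchall() list accumulated in memory
print(ids)
```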


dmoney

Oct 27, 2011, 2:54:19 PM
to pyodbc
Apologies. Below is the actual code, as well as an additional attempt with fetchone instead of fetchall. I did read the wiki on cursors, and I have tried fetchmany, fetchone, closing and deleting cursors, and even a generator for iterating over cursor results (http://code.activestate.com/recipes/137270-use-generators-for-fetching-large-db-record-sets/).

All efforts yield the same memory problem, leading me to believe that somehow I am not clearing the cursor as I loop.

Thank you again for your help. -d
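
The generator pattern from that recipe looks roughly like this (sketched in Python 3 against sqlite3, which shares the DB-API fetchmany() interface with pyodbc; names are illustrative):

```python
import sqlite3

def result_iter(cursor, arraysize=2):
    """Yield rows one at a time while fetching them in fetchmany() batches."""
    while True:
        rows = cursor.fetchmany(arraysize)
        if not rows:
            return
        for row in rows:
            yield row

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE invoices (invoiceid INTEGER)")
cur.executemany("INSERT INTO invoices VALUES (?)", [(i,) for i in range(7)])

cur.execute("SELECT invoiceid FROM invoices ORDER BY invoiceid")
ids = [row[0] for row in result_iter(cur)]
print(ids)
```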


#I. ATTEMPT WITH CLOSE/DELETE CURSOR
import pyodbc
from pymongo.collection import Collection
from pymongo import Connection; mongoconn = Connection('localhost')
mongodb = mongoconn.t4xquery;
url = 'DSN=NTSERVER17;UID=username;PWD=password;PORT=1433;DATABASE=dw_cmh'
pyodbcconn = pyodbc.connect(url); cursor = pyodbcconn.cursor()
thiscollection = Collection(mongodb, 'invoices')

chunksize = 100; rowST = 1; rowEND = 100; rowMAX = 1000
invfields = ['invoiceid','datename(q,tenderdate)']

for field in range(1,len(invfields)):
    while (rowEND <= rowMAX):
        thisdata = ();
        cursor.close();
        cursor = pyodbcconn.cursor()
        thisquery = 'SELECT ' + invfields[0] + ', ' + invfields[field] + ' FROM invoices WHERE invoiceid >= ' + str(rowST) + ' AND invoiceid < ' + str(rowEND)
        print thisquery

        cursor.execute(thisquery);
        thisdata = cursor.fetchall()
        deleted = cursor.execute(thisquery);
        print rowST, rowEND

        rowST = rowST + chunksize; rowEND = rowEND + chunksize;

#II. OUTPUT FROM CLOSE/DELETE CURSOR ATTEMPT:
SELECT invoiceid, datename(q,tenderdate) FROM invoices WHERE invoiceid >= 1 AND invoiceid < 100
1 100
SELECT invoiceid, datename(q,tenderdate) FROM invoices WHERE invoiceid >= 101 AND invoiceid < 200
Traceback (most recent call last):
  File "/Users/dharol/eclipse/workspace/mongodb/shortexecute.py", line 26, in <module>
    thisdata = cursor.fetchall()
MemoryError

#III. ATTEMPT WITH FETCHONE:
import pyodbc
from pymongo.collection import Collection
from pymongo import Connection; mongoconn = Connection('localhost')
mongodb = mongoconn.t4xquery;
url = 'DSN=NTSERVER17;UID=username;PWD=password;PORT=1433;DATABASE=dw_spegcs'
pyodbcconn = pyodbc.connect(url); cursor = pyodbcconn.cursor()
thiscollection = Collection(mongodb, 'invoices')

invfields = ['invoiceid','datename(q,tenderdate)']

for field in range(1,len(invfields)):
    thisdata = ();
    cursor.close();
    cursor = pyodbcconn.cursor()
    thisquery = 'SELECT ' + invfields[0] + ', ' + invfields[field] + ' FROM invoices'
    cursor.execute(thisquery);
    print thisquery

    while (1):
        thisdata = cursor.fetchone()
        if thisdata == None: break
        print thisdata

#IV. OUTPUT FROM FETCHONE ATTEMPT:
SELECT invoiceid, datename(q,tenderdate) FROM invoices
(68797, u'3')
(68798, u'3')
(68799, u'3')
Traceback (most recent call last):
  File "/Users/dharol/eclipse/workspace/mongodb/shortexecute.py", line 53, in <module>
    thisdata = cursor.fetchone()
MemoryError
(68800, u'3')
(68801, u'3')
(68802, u'3')
(68803, u'3')
(68804, u'3')
(68805, u'3')
(68806, u'3')
(68807, u'3')

ashaner

Oct 27, 2011, 4:09:37 PM
to pyodbc
Not sure if you specified before, but are you using the latest versions of pyodbc and your drivers? If you simplify it down to a cleaner example, does it still give you the error? The following code does the same thing as your second example:

import pyodbc
url = 'DSN=NTSERVER17;UID=username;PWD=password;PORT=1433;DATABASE=dw_spegcs'
pyodbcconn = pyodbc.connect(url)
cursor = pyodbcconn.cursor()
thisquery = 'SELECT invoiceid, datename(q,tenderdate) FROM invoices'
cursor.execute(thisquery)
print thisquery
while (1):
    thisdata = cursor.fetchone()
    if thisdata is None: break
    print thisdata

If that code gives you an error, then I'll defer to mkleehammer...

dmoney

Oct 27, 2011, 4:32:00 PM
to pyodbc
Thank you for your response.

1. I am using pyodbc 2.1.13 (from git+git://github.com/mkleehammer/pyodbc.git) and FreeTDS 0.91.

2. Running your simplified code, I still get the memory error:

--------------------
SELECT invoiceid, datename(q,tenderdate) FROM invoices
MemoryError
(68797, u'3')
(68798, u'3')
(68799, u'3')
(68800, u'3')
(68801, u'3')
(68802, u'3')
(68803, u'3')
(68804, u'3')
(68805, u'3')
(68806, u'3')
(68807, u'3')
Traceback (most recent call last):
  File "/Users/dharol/eclipse/workspace/mongodb/shortexecute.py", line 66, in <module>
    thisdata = cursor.fetchone()
MemoryError
--------------------

I know that there are 68,866 invoices, so I am confused about why it can fetch around ten rows and then hit a memory error, if the cursor is not supposed to retain data across iterations.
-d

Tzur Turkenitz

Oct 31, 2011, 7:47:42 AM
to pyodbc
Thank you very much; this was very helpful.

dmoney

Oct 31, 2011, 1:51:20 PM
to pyodbc
The code is pretty ugly, but I was able to solve the problem by:
1. chunking the data in the SELECT query
2. doing the date transformation in Python instead of in the SQL query (e.g., extracting month and year from the Python objects that the cursor query returns)

Neither closing the cursor (cursor.close()), deleting the query (deleted = cursor.execute(thisquery)), nor limiting cursor fetches (cursor.fetchone()) made any difference to memory use for me. Solution script below.

Thanks again for all of your assistance!
-d


SOLUTION SCRIPT:
chunksize = 1000
import pyodbc

url = 'DSN=myserver;UID=myusername;PWD=mypassword;PORT=1433;DATABASE=mydb'
pyodbcconn = pyodbc.connect(url);
cursor = pyodbcconn.cursor()

#GET COUNT OF INVOICES (query rows)
cursor.execute('SELECT MAX(invoiceid) FROM invoices');
counttup = cursor.fetchall(); rowMAX = counttup[0][0];

#LOOP THROUGH DATA CHUNKS
rowST = 1; rowEND = chunksize;

while (rowEND < rowMAX+chunksize):
    thisquery = 'SELECT tenderdate FROM invoices WHERE invoiceid >= ' + str(rowST) + ' AND invoiceid < ' + str(rowEND)
    cursor.execute(thisquery); thisdata = cursor.fetchall();

    for tup in thisdata:
        print tup[0]
        thisday, thismonth, thisyear = tup[0].day, tup[0].month, tup[0].year
        print thisday
        print thismonth
        print thisyear

    print 'Finished: updated {0} of {1} invoices'.format(str(rowEND), str(rowMAX))
    rowST = rowST + chunksize; rowEND = rowEND + chunksize;
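
The same chunk-by-key pattern, with the date parts extracted in Python rather than via datename() in SQL, can be sketched in a self-contained, runnable form. This version uses Python 3 and sqlite3 (which shares the DB-API interface with pyodbc); all table names, column names, and sizes are illustrative:

```python
import sqlite3
from datetime import date

# Populate a throwaway table with ISO-format date strings.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE invoices (invoiceid INTEGER, tenderdate TEXT)")
cur.executemany("INSERT INTO invoices VALUES (?, ?)",
                [(i, "2011-10-%02d" % (i % 28 + 1)) for i in range(1, 26)])

chunksize = 10
cur.execute("SELECT MAX(invoiceid) FROM invoices")
row_max = cur.fetchall()[0][0]

# Fetch contiguous id ranges so only one chunk is in memory at a time.
months = []
row_st = 1
while row_st <= row_max:
    row_end = row_st + chunksize
    cur.execute("SELECT tenderdate FROM invoices "
                "WHERE invoiceid >= ? AND invoiceid < ?", (row_st, row_end))
    for (tender,) in cur.fetchall():
        d = date.fromisoformat(tender)   # date parts pulled out in Python,
        months.append(d.month)           # not in the SQL query
    row_st = row_end
print(len(months))
```

Parameterized queries (`?` placeholders) are used here in place of the string concatenation above; both drivers support them, and they avoid quoting problems as a side benefit.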