oracle speed problem and arraysize hack

310 views
Skip to first unread message

gniquil

unread,
May 26, 2008, 4:35:33 AM5/26/08
to sqlalchemy
Hi All,

There seems to be some shortcomings for fetching large number of rows
due to the default arraysize being set to 1 (too many roundtrips to
oracle). I couldn't find any easy ways to fix this problem except by
adding the following line to

pool.py

def cursor(...)
try:
c = self.connection.cursor(*args, **kwargs)
c.arraysize = 500
....

anybody has a better suggestion to this problem. Fetching 1000+ lines
would take no time with this hack, but without it would take forever.

Frank

Michael Bayer

unread,
May 26, 2008, 11:06:33 AM5/26/08
to sqlal...@googlegroups.com


Reading the docs for arraysize, its not clear if this only applies to
fetchone() and a non-arg fetchmany(), or to all fetches (I'm hoping
you have better detail on this.). If the former, why not call
fetchmany() or fetchall() ? The ORM uses fetchall()/fetchmany() when
receiving rows. If the latter, that seems pretty strange that
cx_oracle would still make 100 individual calls for a fetchmany(100)
when just one would do.

Waldemar Osuch

unread,
May 26, 2008, 6:20:01 PM5/26/08
to sqlalchemy
>
> Reading the docs for arraysize, its not clear if this only applies to
> fetchone() and a non-arg fetchmany(), or to all fetches (I'm hoping
> you have better detail on this.). If the former, why not call
> fetchmany() or fetchall() ? The ORM uses fetchall()/fetchmany() when
> receiving rows. If the latter, that seems pretty strange that
> cx_oracle would still make 100 individual calls for a fetchmany(100)
> when just one would do.

Help is on the way
http://permalink.gmane.org/gmane.comp.python.db.cx-oracle/1325

gniquil

unread,
May 26, 2008, 6:34:15 PM5/26/08
to sqlalchemy
I am glad to see the fix is on the way. This is the only thing that's
holding me back using SQLAlchemy. By the way, can we also ensure that
we will be able to change this parameter more easily? Thanks

Michael Bayer

unread,
May 26, 2008, 7:09:38 PM5/26/08
to sqlal...@googlegroups.com

this is the same as the document which I read on the cx_oracle website.

What happens if I say:

cursor.fetchall()

?

does it make one "round trip" per row ? or do a full fetch in one
"round trip" ? similar question for fetchmany(n).

Michael Bayer

unread,
May 26, 2008, 7:10:35 PM5/26/08
to sqlal...@googlegroups.com

On May 26, 2008, at 6:34 PM, gniquil wrote:

>
> I am glad to see the fix is on the way. This is the only thing that's
> holding me back using SQLAlchemy. By the way, can we also ensure that
> we will be able to change this parameter more easily? Thanks

I dont think changing this parameter is the solution. Just call
fetchall() or fetchmany(). I think calling fetchone() should fetch
only one row, IMHO.

gniquil

unread,
May 27, 2008, 12:54:31 AM5/27/08
to sqlalchemy
I haven't looked into cx_Oracle too deeply but one thing I found is
this. In order for arraysize to be effective, you have to set it
before you execute. Once it's after execution, it's already too late.
I guess the root of the problem lies in cx_Oracle, i.e. it needs to
support dynamically changing the arraysize.

By the way, if you peek into the source code for just a sec, you will
see that internally it's called rowlimit, which I hope is the case
(which also may entail that fetchone does only fetch one row). But
then judging from the information from the following article, I am not
quite sure it's so simple. cx_Oracle might do something smart to take
advantage of packing more rows into any single network packets.

http://www.toadworld.com/Knowledge/DatabaseKnowledge/GuyHarrisonsResolvingOracleContention/ResolvingOracleContention/May2008OracleNetworkContention/tabid/374/Default.aspx

finally i completely agree with you that either sqlalchemy or
cx_oracle needs to deal with this problem implicitly, rather than
forcing people to fudge this parameter (but still leave us the option
to change it)

Michael Bayer

unread,
May 27, 2008, 10:45:21 AM5/27/08
to sqlal...@googlegroups.com
Hi -

sorry to hassle you on this but it would be of great help if you could
create a small test case using cx_oracle only, which inserts around
10,000 rows into a table and then uses various methods, like
fetchmany() with and without the manual "arraysize" setting, to get
them back. You can time each function using Timeit or alternatively
just using time.time(). If cx_oracle is in fact optimizing
fetchmany() for *only* the case that no argument is passed, which
would be amazingly weird, we can add an engine-level setting for this
easily enough but I need confirmation that this is the case.

Waldemar Osuch

unread,
May 27, 2008, 3:21:22 PM5/27/08
to sqlalchemy
The way I understand it does one "round trip" per row.

I have brought this thread to the attention of Anthony Tuininga
(the author of cx_Oracle). His clarification follows.

Quote:

Up to this point the default arraysize is 1 meaning that a single row
is internally fetched at a time. This has nothing to do with
fetchone(), fetchmany() or fetchall(). Regardless of which of those
methods is used, internally cx_Oracle fetches one row at a time. If
you change the arraysize to 50, internally cx_Oracle will fetch 50
rows at a time. Again, this is regardless of whether you use
fetchone(), fetchmany() or fetchall(). Some of the confusion may lie
in the fact that the default value for rows to fetch in fetchmany() is
the arraysize -- but that is all it is, a default value!

In my own code I have created a subclass of cx_Oracle.Connection that
does the following:

class MyConnection(cx_Oracle.Connection):

def cursor(self):
cursor = cx_Oracle.Cursor(self)
cursor.arraysize = 50
return cursor

What this does is automatically set the arraysize to 50 every time a
cursor is created. This can be done to transparently set the arraysize
and should allow you to proceed with whatever code needs to assume an
arraysize of that value. Otherwise you can feel free to change it
yourself after creating the cursor.

And as has already been noted, in the next release of cx_Oracle, the
default arraysize will be 50 in order to resolve this problem
"permanently". :-)

Hope this helps.

Anthony

Michael Bayer

unread,
May 27, 2008, 4:21:02 PM5/27/08
to sqlal...@googlegroups.com

On May 27, 2008, at 3:21 PM, Waldemar Osuch wrote:

> Quote:
>
> Up to this point the default arraysize is 1 meaning that a single row
> is internally fetched at a time. This has nothing to do with
> fetchone(), fetchmany() or fetchall(). Regardless of which of those
> methods is used, internally cx_Oracle fetches one row at a time. If
> you change the arraysize to 50, internally cx_Oracle will fetch 50
> rows at a time. Again, this is regardless of whether you use
> fetchone(), fetchmany() or fetchall(). Some of the confusion may lie
> in the fact that the default value for rows to fetch in fetchmany() is
> the arraysize -- but that is all it is, a default value!

see, that's a really unfortunate decision on his part to reuse
"arraysize" in such an arbitrary way like that, while *not* using it
at the point at which it is entirely reasonable, that is when you have
already stated you want to fetchmany(n) or fetchall(). This is
totally a bug in cx_oracle.

>
> class MyConnection(cx_Oracle.Connection):
>
> def cursor(self):
> cursor = cx_Oracle.Cursor(self)
> cursor.arraysize = 50
> return cursor
>
> What this does is automatically set the arraysize to 50 every time a
> cursor is created. This can be done to transparently set the arraysize
> and should allow you to proceed with whatever code needs to assume an
> arraysize of that value. Otherwise you can feel free to change it
> yourself after creating the cursor.
>
> And as has already been noted, in the next release of cx_Oracle, the
> default arraysize will be 50 in order to resolve this problem
> "permanently". :-)

that workaround works, and also implementing "default_arraysize"
within OracleDialect as follows is acceptable. If someone can test
this and post a trac ticket I can commit this to 0.4/0.5:

Index: lib/sqlalchemy/databases/oracle.py
===================================================================
--- lib/sqlalchemy/databases/oracle.py (revision 4819)
+++ lib/sqlalchemy/databases/oracle.py (working copy)
@@ -213,6 +213,12 @@
self.out_parameters[name] =
self.cursor.var(dbtype)
self.parameters[0][name] =
self.out_parameters[name]

+ def create_cursor(self):
+ cursor = self._connection.connection.cursor()
+ if self.dialect.default_arraysize:
+ cursor.arraysize = self.dialect.default_arraysize
+ return cursor
+
def get_result_proxy(self):
if hasattr(self, 'out_parameters'):
if self.compiled_parameters is not None and
len(self.compiled_parameters) == 1:
@@ -242,8 +248,9 @@
supports_pk_autoincrement = False
default_paramstyle = 'named'

- def __init__(self, use_ansi=True, auto_setinputsizes=True,
auto_convert_lobs=True, threaded=True, allow_twophase=True, **kwargs):
+ def __init__(self, use_ansi=True, auto_setinputsizes=True,
auto_convert_lobs=True, threaded=True, allow_twophase=True,
default_arraysize=None, **kwargs):
default.DefaultDialect.__init__(self, **kwargs)
+ self.default_arraysize = default_arraysize
self.use_ansi = use_ansi
self.threaded = threaded
self.allow_twophase = allow_twophase

gniquil

unread,
May 27, 2008, 5:34:43 PM5/27/08
to sqlalchemy
Hey, i did a test included below with this output:

$ python cxotest.py
Setting arraysize to 1 before execution yields 25.8921508181
Setting arraysize to 500 before execution yields 0.26524348765
Setting arraysize to 1 after execution yields 25.8829982582
Setting arraysize to 500 after execution yields 25.8650820146


/* sql code for the table */
begin
for i in 1..10000
Loop
insert into testtable
(id, name)
values
(i, 'abcde');
end loop;
end;

#python code#################################

from cx_Oracle import connect, Cursor

conn = connect('username', 'password', 'testaccount')
curs = Cursor(conn)

def fetchdata_preset(sql, arraysize=1, n=1000):
curs.arraysize = arraysize
curs.execute(sql)
curs.fetchmany(n)

def fetchdata_postset(sql, arraysize=1, n=1000):
curs.arraysize = 1 #make sure this is set back to default
curs.execute(sql)
curs.arraysize = arraysize
curs.fetchmany(n)


if __name__ == '__main__':
from timeit import Timer

sql = "\'select * from testtable\'"

t1a = Timer("fetchdata_preset(%s,1,1000)" % sql,
"from __main__ import fetchdata_preset")
t1b = Timer("fetchdata_preset(%s,500,1000)" % sql,
"from __main__ import fetchdata_preset")
t2a = Timer("fetchdata_postset(%s,1,1000)" % sql,
"from __main__ import fetchdata_postset")
t2b = Timer("fetchdata_postset(%s,500,1000)" % sql,
"from __main__ import fetchdata_postset")

statement = "Setting arraysize to %s %s execution yields %s"

print statement % (str(1), "before", str(t1a.timeit(1)))
print statement % (str(500), "before", str(t1b.timeit(1)))
print statement % (str(1), "after", str(t2a.timeit(1)))
print statement % (str(500), "after", str(t2b.timeit(1)))

gniquil

unread,
May 27, 2008, 5:37:40 PM5/27/08
to sqlalchemy
Or we can just recompile the source and modify the line:

line 275 in Cursor.c

self->arraySize = 1; to 50

:)

Michael Bayer

unread,
May 27, 2008, 6:34:56 PM5/27/08
to sqlal...@googlegroups.com
So, I think i'll probably implement the patch I posted earlier and
actually default arraysize to 50, although I have a feeling it might
mess around with some of the BLOB-oriented functionality (since BLOBs
must be fetched at the point of receiving the row, else the cursor
moves on)..if thats the case I'll try to keep it as an option.

Michael Bayer

unread,
May 30, 2008, 5:06:07 PM5/30/08
to sqlal...@googlegroups.com
the "arraysize=50" default is set up in r4827 on the 0.4 branch and
r4828 on the trunk. I was surprised to find the rules for "BLOB no
longer accesssible" are different than what I had assumed they were,
so binary results and all work just fine here.


On May 27, 2008, at 5:34 PM, gniquil wrote:

Reply all
Reply to author
Forward
0 new messages