performance problem on a simple select

1,664 views
Skip to first unread message

SRage

unread,
Feb 13, 2009, 4:57:19 PM2/13/09
to pyodbc
I'm not sure where the problem lies, but I'm doing a simple query of
this form:

select * from requests where requestdatetime > ?

on a table in a SQL Server 2005 database with about 1.8 million
records and an expected result set size of ~500-1000 records. When I
run the query directly in MS Query Analyzer on a remote machine it
takes ~6 seconds. When I run it using the tsql command line client
(from freetds) on my linux machine it also takes ~6 seconds. When I
run it on that same linux machine with pyodbc it takes ~1-2 minutes. I
read in the issues that you're working on some performance problems -
would that account for this?

Linux client is 64bit Gentoo
pyodbc versions tested were 2.0.58 and 2.1.4
freetds version is 0.64, using tds version 8 to connect to SQL2005
host
unixODBC version is 3.3.8b.

I also tested with pymssql v0.8 and it was slower than pyodbc. Maybe
it's the DB API in python. Any suggestions?

Thanks
Alec

mkleehammer

unread,
Feb 14, 2009, 10:10:15 AM2/14/09
to pyodbc
I assume that you are measuring the complete time from select to the
last fetch? There shouldn't be anything that slows the actual query
down, so first measure from before execute to after the first fetch.
I would expect that to be around 5-6 seconds.

The part that will be slow using Python is converting the results to
Python objects. tsql and QA only have to print the results on the
screen so they can ask SQL Server for the results in text. The best
thing you can do is cut the number of result columns down to just
those you need (unless you're building a generic utility of course).

To test this, try selecting just the primary key and time that.

Some data types like datetime and decimal (unfortunately) are going to
take more time than others to convert. If you have a lot of them, you
could cast them to strings. (Unfortunately ODBC drivers, including
SQL Server's, are very buggy when dealing with decimals so I have to
select those as text and convert them myself instead of using the
binary structure.)

That's all I can think of right now, but I'll do some testing.

The performance improvements I was talking about won't affect this --
they were to speed up the actual query if you were using non-NULL
parameters.

Also try testing with pyodbc on Windows if you can to see if there is
a 64-bit or OS difference.

SRage

unread,
Feb 14, 2009, 12:43:32 PM2/14/09
to pyodbc
Thanks for the reply. I broke down the timings and it was still the
actual query that was taking so long. However, you're suggestion
about python objects pointed me in the right direction. It must also
apply to parameterized statements? I was executing the query like
this:

sql = "select requestdatetime, clientid from requests where
requestdatetime > ?"
cursor.execute(sql, request_time)

But when I remove the parameters it is just as fast as tsql or QA:

select requestdatetime, clientid from requests where
requestdatetime > '2009-02-14 09:31:34.106'
cursor.execute(sql)

The difference in this case was from 22 seconds to 0.723295 seconds.

Ramiro Morales

unread,
Feb 14, 2009, 12:56:58 PM2/14/09
to pyo...@googlegroups.com

Ideally, your testing should also include performing the query using a
tsql-equivalent provided by your ODBC manager (isql in the case of
unixODBC) to be competely sure about where the problem might lie.

I think what you're seeing could be be related to this:

http://code.google.com/p/django-pyodbc/issues/detail?id=16

I've opened it as a issue in the django-pyodbc issue tracker
to have, but I think there isn't much we (django-pyodbc) can do about it.

HTH

--
Ramiro Morales

mkleehammer

unread,
Feb 15, 2009, 9:00:03 AM2/15/09
to pyodbc
> Thanks for the reply. I broke down the timings and it was still the
> actual query that was taking so long. However,  you're suggestion
> about python objects pointed me in the right direction. It must also
> apply to parameterized statements? I was executing the query like
> this:
>
>    sql = "select requestdatetime, clientid from requests where
> requestdatetime > ?"
>    cursor.execute(sql, request_time)
>
> But when I remove the parameters it is just as fast as tsql or QA:
>
>    select requestdatetime, clientid from requests where
> requestdatetime > '2009-02-14 09:31:34.106'
>    cursor.execute(sql)
>
> The difference in this case was from 22 seconds to 0.723295 seconds.

I'm having trouble making sense of that. I can assure you that pyodbc
doesn't normally take 20 seconds to convert a single datetime
parameter! The SQL Server unittests (in the tests directory) run
very, very fast and use a lot of parameters. That doesn't mean
there's not a bug that you found, but I don't think so. Not in both
versions because they handle parameters completely differently. It
would have to be something about the driver.

When you tested on Windows, did you use the Microsoft SS driver
('DRIVER={SQL Server})?

Also, when you passed the parameter, did you pass it as a datetime
object or a string?

SRage

unread,
Feb 16, 2009, 2:39:36 PM2/16/09
to pyodbc
My initial parameter was passed as a datetime object. I tried using a
string instead and it behaves very much like the datetime object in
terms of performance. I ran a trace on the Sql Server and noticed some
interesting differences, but I need to educate myself a little more on
how to interpret them. FYI, what I observed was the SQL with hardcoded
date had readings of 16 CPU, 190 Reads, 4 Duration, whereas the SQL
using the bind parameter had readings of 1765 CPU, 21347 Reads, and
1758 Duration. I think you are probably correct that the issue isn't
pyodbc, but either the underlying driver or perhaps SQL Server itself
(and likely something I haven't set up correctly).

Note: regarding the issue linked in Ramiro Morales' post, I did not
see the SET FMTONLY ON; ... ; SET FMTONLY OFF anywhere in the query.

I have not tested on Windows yet - I am going to try that as soon as I
get python installed on a Windows machine.

SRage

unread,
Feb 16, 2009, 3:10:40 PM2/16/09
to pyodbc
On Feb 16, 2:39 pm, SRage <asha...@chumpland.org> wrote:
> I have not tested on Windows yet - I am going to try that as soon as I
> get python installed on a Windows machine.

I should have waited before the last post - installing python and
pyodbc on Windows and running the tests took about 10 minutes. Using
the DRIVER={SQL Server} as you suggested, the query is very fast and
essentially the same whether I bind the date parameter (as datetime or
string) or hardcode the date into the SQL. I think I have to
concentrate on either unixODBC or freetds.
Reply all
Reply to author
Forward
0 new messages