pyodbc + sqlserver : performance issue with multiple threads

Alexandre Fayolle

unread,

Jan 5, 2010, 12:47:55 PM1/5/10

to pyodbc

Hello, happy new year everyone.

I'm experiencing a weird performance issue with a multi threaded python server
connecting to SQL Server. I'm still in the development phase, and using a kvm
virtual machine to run my code, so maybe this is the cause and similar things
won't happen once deployed. However this is critical enough for me to need to
clarify things.

My setup : win2003 (running on a kvm with 2 CPU and 2GB ram), SQL Server 2005
(developer edition), connecting using pyodbc (latest head from GIT) and Native
Client 10.

I have boiled down things to the attached script : after editing the
get_connection function to match you database, you can run it passing a number
of threads to use to issue queries on the database. The first run will create
a table called TestMThread and insert about 500k rows. Subsequent runs will
just use the data. The test consists in lauching N threads which will execute
select * from TestMThread and use fetchmany to retrieve the data (I've tested
with fetchone and fetchall and the results are similar)

What I observe is that if I run the test with 1 thread, the execute statement
is almost instantaneous, and the fetching takes about 4s. If I run with 2
threads, the execute is almost instantaneous and the fetching takes about 30s
for all threads. With 3 threads, things are not getting better on the fetching
side (~100s for all threads).

At this point any help is welcome.

I'd be grateful if someone could try and reproduce this to determine if the
issue comes from pyodbc, the Native Client 10.0 driver, SQLServer or KVM. I
get correct performance with Postgresql/psycopg2 on Linux.

Is there something similar to Linux's strace for Windows that I could use to
diagnose what is happening in my process?

Thanks in advance,

--
Alexandre Fayolle LOGILAB, Paris (France)
Formations Python, Zope, Plone, Debian: http://www.logilab.fr/formations
Développement logiciel sur mesure: http://www.logilab.fr/services
Informatique scientifique: http://www.logilab.fr/science

test_pyodbc_mthread.py

mkleehammer

unread,

Jan 14, 2010, 10:56:47 AM1/14/10

to pyodbc

We know that Python performs pretty poorly w/ multiple threads in
general. Are you sure it isn't just that normal degradation? Would
it be possible to simulate the load w/o pyodbc? Perhaps reading 500K
rows from a text file (already mapped into memory) or something?

Threads are unlocked in the C code only around the actual ODBC calls.
There are a lot of program-visible data structures that are accessed
by the rest of the C code, so most of the time is like running regular
Python code : the GIL is held.

I'll try looking at the specific calls you are making and see if there
are large chunks of C code that can be unlocked. Unfortunately, I
don't think there is much in the fetching since I have to use a lot of
PyString_FromString calls. You can't release the GIL and call those.

Alexandre Fayolle

unread,

Jan 14, 2010, 2:00:17 PM1/14/10

to pyo...@googlegroups.com, mkleehammer

Hi,

Thanks for looking into this.

I've been able to try the code with an Access database, and surprisingly the
performance are significantly better :

1 thread : fetchall in 2s
2 threads : fetchall in 3s
4 threads : fetchall in 4-10s

(I unfortunately cannot deploy using Access as the DB backend :-/ )

So this looks like the problem lies in the MSSql ODBC driver :-( and the
Python GIL is not guilty.

Any suggestion welcome, regarding possible configuration tweaks or ways I could
solve this. Thanks a lot.

Alexandre Fayolle

unread,

Feb 25, 2010, 2:39:48 AM2/25/10

to pyo...@googlegroups.com

After scratching my head for weeks on this issue (litterally) and peeling off
my cornea looking in the wrong places and using the wrong keywords in searches
(which all lead me back to my original post to this list sooner or later), I
finally solved this.

The good news is, this has nothing to do with pyodbc. Further testing proved
it has nothing to do with threads or the GIL or the Native Client ODBC driver.
I'm posting here though, because some may find this tip useful (and if they
share some thinking patterns with me, may end up on this thread when trying to
google themselve out of the problem).

The short story is :

ALTER DATABASE my_db SET READ_COMMITTED_SNAPSHOT ON;

The longer story is that with the default setting (read commited snapshot off),
SQLServer uses row locking to achieve READ COMMITTED isolation level, which
means that read operations do not block write operations. Problem solved.

For the longer story, refer to:

* Isolation Levels in the Database Engine
http://msdn.microsoft.com/en-us/library/ms189122.aspx
* ALTER DATABASE SET options:
http://msdn.microsoft.com/en-us/library/bb522682.aspx
* Using Row Versioning-based Isolation Levels
http://msdn.microsoft.com/en-us/library/ms179599.aspx

Side note : the read committed implementation in Postgresql is equivalent to
the read committed snapshot implementation of MS SQLServer.