Fwd: django Views bug /PERFORMACE ISSUE

92 views
Skip to first unread message

Jassin MEKNASSI

unread,
Mar 10, 2010, 12:40:14 PM3/10/10
to ibm...@googlegroups.com


Begin forwarded message:

From: Jassin MEKNASSI <jassin....@gmail.com>
Date: March 8, 2010 11:13:56 PM EST
To: Rahul Priyadarshi2 <rahul.pr...@in.ibm.com>
Cc: Ambrish Bhargava1 <abha...@in.ibm.com>, Mario Ds Briggs <mario....@in.ibm.com>, Abhigyan Agrawal1 <abhigyan...@in.ibm.com>
Subject: Re: django Views bug /PERFORMACE ISSUE

Hi,

I upgraded from 9,5 to 9.7 but there is still no improvement.

thnks for any suggestion

Regards,

Jassin

On Mar 4, 2010, at 2:17 PM, Jassin MEKNASSI wrote:

hi Rahul,

I copied the file to Library/Python/2.6/site-packages/ibm_db-1.0-py2.6-macosx-10.6-universal.egg/ 
delete all the pyc files
Unfortunately this still takes forever2037.1454041 seconds

For all the version of  ibm_db_dbi.py i get:

/Library/Python/2.6/site-packages/ibm_db-1.0-py2.6-macosx-10.6-universal.egg/ibm_db_dbi.py:25: DeprecationWarning: the sets module is deprecated
  from sets import ImmutableSet

On Mar 4, 2010, at 1:25 AM, Rahul Priyadarshi2 wrote:


Hi Jassin:
  pleased find the attached ibm_db_dbi.py  wrapper  below.  Replace existing  ibm_db_dbi.py with this one  (at location '../site-packages/ibm_db.py2.x-xxx.egg/').


 Try  same testing to this wrapper and let me know how much time now it takes.


Thanks & Regards,
Rahul Priyadarshi

System Software Engineer

Open Source - Common AD
India Software Labs - IBM India Pvt Ltd.
Tel(Off) +91 80-402-55240
Tel(Mob) +91 9916422069

Office Addr • D2. First Floor. • Manyata Embassy Business Park • Outer Ring Road, Nagavara • Bangalore: 560 045 •




Jassin MEKNASSI <jmek...@me.com>

02/03/2010 20:54

To
Rahul Priyadarshi2/India/IBM@IBMIN
cc
Subject
Re: django Views bug /PERFORMACE ISSUE





takes forever 1863.21640682 seconds :(

thnks


On Mar 2, 2010, at 1:04 AM, Rahul Priyadarshi2 wrote:


You are not using ibm_db_dbi wrapper in correct way.
Please try  it as below and let me know how much time it is taking.


$ import ibm_db_dbi

$ import time

$

$ t1 = time.time()

$ conn = ibm_db_dbi.connect('node', 'user', 'pwd')

$ cur = conn.cursor()

$
sql="select sheet_id,name from nativefiles.sheetnames"
$ cur.execute(sql)

$ row = cur.fetchone()

$ wile row:

$        print row[1]

$        row = cur.fetchone()

$ elapsed = time.time() - t1

$ print elapsed



Thanks & Regards,
Rahul Priyadarshi

System Software Engineer

Open Source - Common AD
India Software Labs - IBM India Pvt Ltd.
Tel(Off) +91 80-402-55240
Tel(Mob) +91 9916422069

Office Addr • D2. First Floor. • Manyata Embassy Business Park • Outer Ring Road, Nagavara • Bangalore: 560 045 •




Jassin MEKNASSI <jassin....@googlemail.com>

01/03/2010 22:47


To
Rahul Priyadarshi2/India/IBM@IBMIN
cc
Jassin MEKNASSI <jassin....@googlemail.com>
Subject
Re: django Views bug /PERFORMACE ISSUE







Using ibm_db_dbi is as fast as using ibm_db
since i have never user dbi before i will paste the code i used to compare all the different modules (ibm_db,ibm_db_dbi,sqlalchemy and django):


TABLE has 15951 rows : DDL is
CREATE TABLE NATIVEFILES.SHEETNAMES (
SHEET_ID INTEGER NOT NULL,
NAME VARCHAR(50) NOT NULL,
PRIMARY KEY (SHEET_ID)
);


IBM_DB_DBI


import  ibm_db_dbi

import
time

t1 = time.time()
ibm_db_conn = ibm_db_dbi.connect(
'node', 'user', 'pwd')


sql=
"select sheet_id,name from nativefiles.sheetnames"
stmt = ibm_db_dbi.ibm_db.exec_immediate(ibm_db_conn, sql)
result = ibm_db_dbi.ibm_db.fetch_both(stmt)

while
( result ):
   result[
'SHEET_ID']
   result[
'NAME']
   result = ibm_db_dbi.ibm_db.fetch_both(stmt)
elapsed = time.time() - t1

print
elapsed

ibm_db_dbi  :3.89495897293 seconds



IBM_DB



import
ibm_db
import
time
ibm_db_conn = ibm_db.connect(
'node', 'user', 'pwd')

t1 = time.time()

sql=
"select sheet_id,name from nativefiles.sheetnames"
stmt = ibm_db.exec_immediate(ibm_db_conn, sql)
result = ibm_db.fetch_both(stmt)

while
( result ):
   result[
'SHEET_ID']
   result[
'NAME']
   result = ibm_db.fetch_both(stmt)
   
   
elapsed = time.time() - t1

print
elapsed

ibm_db:  2.36251711845 seconds



SQL_ALCHEMY:


import
ibm_db_sa.ibm_db_sa
from
sqlalchemy import *
from
sqlalchemy.orm import scoped_session,sessionmaker
from
sqlalchemy import Table
from
sqlalchemy import select
from
sqlalchemy import MetaData
from
sqlalchemy.orm import mapper

metadata = MetaData()


class
Sheet(object):
   
pass

engine=create_engine(
'connstring')
Session = scoped_session(sessionmaker(bind=engine))

sheetnames_table=Table(
'SHEETNAMES', metadata,schema='NATIVEFILES',autoload=True,autoload_with=engine)

mapper(Sheet,sheetnames_table)

for
s in Session.query(Sheet).all():
   s.sheet_id

   
s.name

takes forever



DJANGO:


models .py



class
SheetNames(models.Model):
   sheetid = models.IntegerField(primary_key=
True,db_column = 'SHEET_ID')
   name=models.CharField(max_length=
50,db_column='NAME')
   
class Meta:
       db_table =
'"NATIVEFILES"."SHEETNAMES"
       managed=
False

views.py


s=SheetNames.objects.all()

   
for a in s:
   
   a.name

takes forever



Thnks for yr help



   

On Feb 28, 2010, at 4:37 AM, Rahul Priyadarshi2 wrote:


Hi ,

Please try the same query  with ibm_db_dbi. Through this we would be able  identify  the main problem. Because this problem may aeries due to  django or our ibm_db_dbi wrapper.


Try this and let me know how much time it is taking
 with same query.

Thanks & Regards,
Rahul Priyadarshi

System Software Engineer

Open Source - Common AD
India Software Labs - IBM India Pvt Ltd.
Tel(Off) +91 80-402-55240
Tel(Mob) +91 9916422069


Office Addr • D2. First Floor. • Manyata Embassy Business Park • Outer Ring Road, Nagavara • Bangalore: 560 045 •




Jassin MEKNASSI <jassin....@googlemail.com>

25/02/2010 21:21


To
Rahul Priyadarshi2/India/IBM@IBMIN
cc
Subject
Re: django Views bug









Thanks it works.

I noticed that sqlalchemy and django drivers are really slow compared to the ibm_db driver which seems to use the underlying client.
I m using db2 9.5 express-c.
populating values from django on a simple table (PK, char(10)) with 10k rows takes more than 10 min . with ibm_db this is almost instantaneous.
Using native sql and using execute and fetchone with django does not improve performance.

Any idea? this would be very helpful since i m now stuck with only ibm_db. Move to 9.7?

Regards


On Feb 23, 2010, at 6:15 AM, Rahul Priyadarshi2 wrote:



Hi Jassin,

You got this error not because of any django adaptor bug. This is coming due to your model definition.


In your Sheetcount model you are not specifying any attribute as primary key. So that Django automatically adds an “ID” primary key attribute. And in you db2 view there is no field of name “ID”.  


If you specify any one of attribute as primary key then you will not get this error.


Please let me know this will help you or not.


By next time can you please post your issues in our forum (
http://groups.google.com/group/ibm_db). Because your issues will may help our other users also. If they will get same problem.


Thanks & Regards,
Rahul Priyadarshi

System Software Engineer

Open Source - Common AD
India Software Labs - IBM India Pvt Ltd.
Tel(Off) +91 80-402-55240
Tel(Mob) +91 9916422069

Office Addr • D2. First Floor. • Manyata Embassy Business Park • Outer Ring Road, Nagavara • Bangalore: 560 045 •




Jassin MEKNASSI <jassin....@googlemail.com>

23/02/2010 03:26


To
Rahul Priyadarshi2/India/IBM@IBMIN
cc
Subject
django Views bug











hi im using Django-1.1.1 and and ibm_db_django-0.1.4-py2.5.egg and get
SQL0206N when i use a db2 view
from django.db import models


class Sheetcount(models.Model):
name = models.CharField(max_length=50,db_column='SHEETNAME')
total =  models.IntegerField(db_column='TOTAL')
samplefile = models.CharField(max_length=50,db_column='SAMPLEFILE')
class Meta:
    db_table = '"TEST"."TESTTABLE"'
    managed=False



[IBM][CLI Driver][DB2/LINUXX8664] SQL0206N  "TEST.TESTTABLE.ID" is not valid in the context where it is used.  SQLSTATE=42703 SQLCODE=-206


thnks





<ibm_db_dbi.py>




Mario Briggs

unread,
Mar 11, 2010, 12:49:26 AM3/11/10
to ibm_db
Adding the response from the email thread

>>
Hi Jassin,

We have a grasp of the issues (they are not related to server
version), which we suspected and are right. I am sorry that the patch
we sent was wrong (didnt cover what we wanted to cover.)

Here is the gist of the problem.
- the dbi driver is looking to implement the dbi spec which requires
that the # of rows affected by the query be immeditaely available
after cursor.execute()
To support this requirment, the dbi driver switches the cursor
type to static as well as request the server to prefetch the rowcount.
This is certainly a overhead for the dbi that can account for more
than half the extra time the dbi takes.
- there are couple of other small things that we have identified for
improving in the dbi wrapper

Rahul has a use-case local with the performance measurements for these
issues. He can fill you in with the details and plans on this and
send you a temp dbi wrapper to verify that these are the same issues
you are hitting.

The good point is we have identified the issues. We have to spend a
little time thinking how we want to remove these wrapper bottlenecks
as well as support someone who depends on this spec behaviour.

P.S. We will move this email chain to the google-groups forum and
continue the discussion there.

thanks
Mario
<<

On Mar 10, 10:40 pm, Jassin MEKNASSI <jassin.mekna...@googlemail.com>
wrote:
> Begin forwarded message:

Dave Hughes

unread,
Mar 11, 2010, 6:29:01 AM3/11/10
to ibm...@googlegroups.com
Hi Mario,

On Wed, 2010-03-10 at 21:49 -0800, Mario Briggs wrote:
> Adding the response from the email thread
>
> >>
> Hi Jassin,
>
> We have a grasp of the issues (they are not related to server
> version), which we suspected and are right. I am sorry that the patch
> we sent was wrong (didnt cover what we wanted to cover.)
>
> Here is the gist of the problem.
> - the dbi driver is looking to implement the dbi spec which requires
> that the # of rows affected by the query be immeditaely available
> after cursor.execute()
> To support this requirment, the dbi driver switches the cursor
> type to static as well as request the server to prefetch the rowcount.
> This is certainly a overhead for the dbi that can account for more
> than half the extra time the dbi takes.

I suggest you interpret the spec a bit differently. To quote from
PEP-249:

"""
.rowcount

This read-only attribute specifies the number of rows that the
last .execute*() produced (for DQL statements like 'select') or affected
(for DML statements like 'update' or 'insert').

The attribute is -1 in case no .execute*() has been
performed on the cursor or the rowcount of the last operation
is cannot be determined by the interface. [7]
"""

The way I've seen most database drivers handle this issue is with the
following train of thought:

* Static cursors are too slow for most practical purposes. Forward only
cursors are by far the most efficient, so let's use them.

* However, forward only cursors can't determine (in advance) the number
of rows in a result set before it's been completely enumerated (which is
one of the reasons they're so fast), so just return -1 in such cases
with the justification that it "cannot be determined by the interface".

* In the case of DML statements (INSERT, UPDATE, etc.) you can still
return the rowcount because it's easily determined either by retrieving
the sqlerrd(3) value from the SQLCA after execution, or by calling
SQLRowCount().

In fact, SQLRowCount() is particularly convenient. From the docs: "when
you use SQLRowCount() on a non-scrollable SELECT-only cursor, the
function sets the contents of RowCountPtr to -1 because the number of
rows is not available until all of the data has been fetched", which is
exactly the behaviour we're after.

If you want to get fancy you could update the rowcount property after
the result set is completely enumerated (e.g. after fetchall()) by
having the cursor object count the number of rows as they're retrieved
(but obviously not "publish" this result until the retrieval is
complete). This would be supported under footnote [7] from PEP-249:

"""
[7] The rowcount attribute may be coded in a way that updates its value
dynamically. This can be useful for databases that return usable
rowcount values only after the first call to a .fetch*() method.
"""

Side note: row counts are subtle things and all too often people rely on
them without thinking. In some cases they can even be somewhat
subjective. For example, if I delete 2 rows from a table which is the
parent of a foreign key with a CASCADE rule, which then causes 5 more
rows to be deleted, should rowcount return 2 or 7? The latter figure can
be obtained from sqlerrd(5), but most systems I've come across consider
the first value to be the answer.

Anyway this is one of the reasons that I don't think drivers should try
too hard to provide row counts (and especially not at the huge cost to
performance that static cursors bring). If people actually need a row
count, they can go and explicitly query for them (which will also allow
them to determine exactly what they mean by a rowcount without any of
the ambiguity alluded to above).

Cheers,

Dave.

Mario Briggs

unread,
Mar 11, 2010, 8:05:23 AM3/11/10
to ibm_db
Dave,
you have exactly said what we were 99% going to do :-) and drivers
(across languages too) have been always doing this

regards
Mario

Jassin MEKNASSI

unread,
Mar 12, 2010, 2:01:27 PM3/12/10
to ibm...@googlegroups.com
Thnks looking forward to it

> --
> You received this message because you are subscribed to the Google Groups "ibm_db" group.
> To post to this group, send email to ibm...@googlegroups.com.
> To unsubscribe from this group, send email to ibm_db+un...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/ibm_db?hl=en.
>

Jassin MEKNASSI

unread,
Mar 25, 2010, 12:56:06 PM3/25/10
to ibm...@googlegroups.com
Hi,

I executed the ibm_db_dbi test code directly from the server. This takes 3seconds.
Unfortunately it still takes forever when used from a remote client.
Stats are:

remote ibm_db : 2.9sec
remote ibm_db_dbi: forever ( >10min)
server ibm_db : 1.9sec
server ibm_db_dbi:3.3sec

remote client used ibm_db 1.0 on Macosx 10.6 (ibm_db-1.0-py2.6-macosx-10.6-universal.egg)
server is used ibm_db 1.0 on a debian box (ibm_db-1.0-py2.5-linux-x86_64.egg)

Rahul

unread,
Mar 29, 2010, 3:34:45 PM3/29/10
to ibm_db
Hi Jassin,
I am little bit confused here.
What about your previous findings, In that you have used database
locally or remotely.

Thanks,
Rahul Priyadarshi

On Mar 25, 9:56 pm, Jassin MEKNASSI <jassin.mekna...@googlemail.com>
wrote:

Rahul Priyadarshi2

unread,
Apr 8, 2010, 2:23:26 AM4/8/10
to ibm...@googlegroups.com

I have made some changes in ibm_db_dbi wrapper to increase performance. My  initial testing gives very positive results.

Please pick the modified patch from attachment and replace the existing ibm_db_dbi.py with this one(at location “../site-packages/ibm_db.py2.x-xxx.egg/”).
Try this patch and let me know how it works.  



Thanks & Regards,
Rahul Priyadarshi

System Software Engineer

Open Source - Common AD
India Software Labs - IBM India Pvt Ltd.
Tel(Off) +91 80-402-55240
Tel(Mob) +91 9916422069

Office Addr • D2. First Floor. • Manyata Embassy Business Park • Outer Ring Road, Nagavara • Bangalore: 560 045 •




From: Rahul Priyadarshi2/India/IBM@IBMIN
To: ibm_db <ibm...@googlegroups.com>
Date: 30/03/2010 01:01
Subject: [ibm_db:414] Re: Fwd: django Views bug /PERFORMACE ISSUE
Sent by: ibm...@googlegroups.com


ibm_db_dbi.py

Jassin MEKNASSI

unread,
Apr 8, 2010, 9:07:07 AM4/8/10
to ibm...@googlegroups.com
Unfortunately i do get when using sqlalchemy using the following test code:




/Library/Python/2.6/site-packages/SQLAlchemy-0.4.0-py2.6.egg/sqlalchemy/util.py:7: DeprecationWarning: the md5 module is deprecated; use hashlib instead
  import md5, sys, warnings, sets
/Library/Python/2.6/site-packages/SQLAlchemy-0.4.0-py2.6.egg/sqlalchemy/util.py:7: DeprecationWarning: the sets module is deprecated
  import md5, sys, warnings, sets
Traceback (most recent call last):
  File "/Volumes/DATA/Development/DEV/db2_drivertest/src/sql_alchemy.py", line 20, in <module>
    sheetnames_table = Table('SHEETNAMES', metadata, schema = 'MYTESTSCHEMA', autoload = True, autoload_with = engine)
  File "/Library/Python/2.6/site-packages/SQLAlchemy-0.4.0-py2.6.egg/sqlalchemy/schema.py", line 112, in __call__
    autoload_with.reflecttable(table, include_columns=include_columns)
  File "/Library/Python/2.6/site-packages/SQLAlchemy-0.4.0-py2.6.egg/sqlalchemy/engine/base.py", line 1178, in reflecttable
    self.dialect.reflecttable(conn, table, include_columns)
  File "/Library/Python/2.6/site-packages/ibm_db_sa-0.1.6-py2.6.egg/ibm_db_sa/ibm_db_sa.py", line 474, in reflecttable
    schema_name = self.get_default_schema_name(connection)
  File "/Library/Python/2.6/site-packages/ibm_db_sa-0.1.6-py2.6.egg/ibm_db_sa/ibm_db_sa.py", line 401, in get_default_schema_name
    schema_name = connection.connection.connection.get_current_schema()
  File "/Library/Python/2.6/site-packages/ibm_db-1.0-py2.6-macosx-10.6-universal.egg/ibm_db_dbi.py", line 631, in get_current_schema
    return self.current_schema
AttributeError: 'Connection' object has no attribute 'current_schema'


metadata = MetaData()

class Sheet(object):
    pass

db_name = 'DBNAME'
schema_name = 'MYTESTSCHEMA'
engine=create_engine('ibm_db_sa://user:pwd@xxx.com:portnr/%s;CurrentSchema=%s'%(db_name, schema_name)) 
Session = scoped_session(sessionmaker(bind = engine))

sheetnames_table = Table('SHEETNAMES', metadata, schema = 'MYTESTSCHEMA', autoload = True, autoload_with = engine)



mapper(Sheet, sheetnames_table)


for s in Session.query(Sheet).all():
    print s.sheet_id
    s.name


<ibm_db_dbi.py>

Jassin MEKNASSI

unread,
Apr 8, 2010, 8:52:36 AM4/8/10
to ibm...@googlegroups.com
Great work. 
I go 2.94796013832 from a remoteclient
Thank you very much
On Apr 8, 2010, at 2:23 AM, Rahul Priyadarshi2 wrote:

<ibm_db_dbi.py>

Rahul Priyadarshi2

unread,
Apr 9, 2010, 7:19:36 AM4/9/10
to ibm...@googlegroups.com

Hi Jassin,
 You got this error because i removed one call to set default current schema. This call is happening at the time to creating connection. And CLI uses its local variable to return current schema, so without setting it we got empty string as current schema.

In attached dbi patch  i have added that call. Replace your existing ibm_db_dbi.py to this(at location“../site-packages/ibm_db.py2.x-xxx.egg/”).
Try this patch and let me know how it works.



Thanks,
Rahul Priyadarshi
ibm_db_dbi.py

Rahul

unread,
Apr 30, 2010, 1:57:54 AM4/30/10
to ibm_db
Hi Jassin,
New released version of ibm_db(1.0.2) also contains performance issue
fix.
So, try this and let me know how it goes.

Thanks,
Rahul Priyadarshi


On Apr 9, 4:19 pm, Rahul Priyadarshi2 <rahul.priyadar...@in.ibm.com>
wrote:
>  ibm_db_dbi.py
> 71KViewDownload
Reply all
Reply to author
Forward
0 new messages