Unexplainable SQL queries / Performance issues

112 views
Skip to first unread message

Martin84

unread,
Nov 21, 2012, 3:25:28 AM11/21/12
to sqlal...@googlegroups.com
Hi,

I use SQLalchemy 0.7.9 with SQLite and have a performance issue.

I have a simple database with joined table inheritance and some relationships. 

In short: I have a base class Human, and 3 subclasses (Men, Woman and Child), and I have a class House.

The house class have a many to many relationship to the human class (a house have residents, and every human have houses).

And there are two more relationship, the Woman and the Men class have a many to many relationship to the Child class (a men/woman can have many children).


Here you can see my model , query code: http://pastebin.com/mcum0c7Q


T
he issue is: if I load a house from the database with house = session.query(House).first() and then access the residents of this house with "house.residents",
and iterate over the residents and access the children of every resident then sqlalchemy emits a new sqlquery on every access:
for resident in house.residentes:
print resident.myChildren # emits new sql-query

This is very bad for my performance, what I need is a solution, that load with a single "session.query(House)-Command" all residents AND all children of the residents at once!
Is this possible?

For the many to many relationship between the Women/Men and the Child class is use lazy='subquery', but sqlalchemy ignore this! Why?

I hope someone could help me.




Diana Clarke

unread,
Nov 21, 2012, 9:50:14 AM11/21/12
to sqlal...@googlegroups.com
Morning Martin:

I could be wrong, but I think what you're looking for is lazy='joined'
rather than lazy='subquery'.

When I change the following, I see one query per showDatabase() call
rather than two.

class Men(Human):

....

myChildren = relationship('Child', secondary=link_table, lazy='joined')

class Woman(Human):

....

myChildren = relationship('Child', secondary=link_table, lazy='joined')

Here's how I think of it, with examples from:

http://docs.sqlalchemy.org/en/rel_0_8/orm/loading.html?highlight=subqueryload#using-loader-strategies-lazy-loading-eager-loading

* Case 1 (1 + N queries):
# set children to load lazily
session.query(Parent).options(lazyload('children')).all()

* Case 2 (1 query):
# set children to load eagerly with a join
session.query(Parent).options(joinedload('children')).all()

* Case 3 (2 queries):
# set children to load eagerly with a second statement
session.query(Parent).options(subqueryload('children')).all()

The subqueryload() and lazy='subquery' options emit an *additional*
SQL statement for each collection requested, but at least it's not N
queries (one for each child).

Thanks for including isolated code to easily reproduce the question.

Cheers,

--diana

Martin84

unread,
Nov 22, 2012, 6:09:53 AM11/22/12
to sqlal...@googlegroups.com
Hi Diana,

thank you for your help, but unfortunately my problem still exists.
In my case the lazy attribute for the myChildren relationship has absolutely no effect!
At this point one more information is important: I build and populate the database with one session, and then open a new session and call showDatabase.
If you populate the database with one session, and then call showDatabase with the same (now not empty) session,
then the instances are already in the session and sqlalchemy don't fire new queries for access to the myChildren attribute of human.
So, it is important to call showDatabase with an empty session to reproduce my issue.

One more information could be important:
If I load a woman instance and access myChildren with:

woman = session.query(Woman).one()
print woman.myChildren

then sqlalchemy emits an extra sql query on the access to myChildren and ignore my lazy='subquery' or lazy='joined' parameter for the mychildren relationship configuration.
But if I modify the query like this:

woman = session.query(Woman).options(subqueryload('myChildren')).one()
print woman.myChildren

then sqlalchemy load the woman and all children at once! Exactly what I need.
But this makes completely no sense to me, i thought that subqueryload() just overload the load strategie for a relationship.
So myChildren = relationship('Child', secondary=link_table, lazy='subquery') and subqueryload('myChildren') should be equivalent.
Why is there a difference?






Diana Clarke

unread,
Nov 22, 2012, 9:49:29 AM11/22/12
to sqlal...@googlegroups.com
Hmm.... maybe I'm missing something. Perhaps someone else can jump in,
and show me what I'm missing?

When I take your code, and execute the following 5 cases:

CASE A: no 'subquery' on relationship, 'subqueryload' on query
CASE B: 'subquery' on relationship, no 'subqueryload' on query
CASE C: no 'joined' on relationship, 'joinedload' on query
CASE D: 'joined' on relationship, no 'joinedload' on query
CASE E: no 'joined' or 'subquery' on relationship, no 'joinedload'
or 'subqueryload' on query

1) The queries in cases A & B are equal (exactly 2 queries each)
2) The queries in cases C & D are equal (exactly 1 query each)
3) Case E is completely lazy, there are exactly 1 + N queries

Here are the notes I kept, as I tested those cases:

http://pastebin.com/hx0Kj4An

Martin: perhaps create a new pastebin that shows exactly what you're
doing (including table/engine/session creation, data population, etc),
so that I have a better chance of seeing what you're seeing?

http://pastebin.com/mcum0c7Q

--diana
> --
> You received this message because you are subscribed to the Google Groups
> "sqlalchemy" group.
> To view this discussion on the web visit
> https://groups.google.com/d/msg/sqlalchemy/-/GponYdm2PLsJ.
>
> To post to this group, send email to sqlal...@googlegroups.com.
> To unsubscribe from this group, send email to
> sqlalchemy+...@googlegroups.com.
> For more options, visit this group at
> http://groups.google.com/group/sqlalchemy?hl=en.

Michael Bayer

unread,
Nov 22, 2012, 10:00:35 AM11/22/12
to sqlal...@googlegroups.com
what is awkward about this mapping is that the "myChildren" relationship is established twice, specific to the Men and Woman subclasses. SQLAlchemy doesn't have any with_polymorphic instruction here when it loads Human objects, so when you say Human->(some attribute that is only on Men/Woman) it has no choice but to load the additional columns local to Men and Woman, then loads the myChildren relationship. The instruction to load "Human" objects with an eagerload to "myChildren" is a meaningless instruction - "Human" *does not have* a "myChildren" relationship. When you load Human objects, the fact that some may be Men and some may be Woman is not known at query time - it is only known after the rows are loaded, and it's too late to make decisions about eager loads on attributes specific to those subclasses.

As far we query(Woman).one(), if the Woman object is already present in the Session then its attributes are not refreshed and the subqueryload for "myChildren" won't emit.

If I load "Woman" with a clean session, the SQL is as expected:

session.close()
woman = session.query(Woman).one()
print woman.myChildren



SELECT human.type AS human_type, human.id AS human_id, woman.id_human AS woman_id_human
FROM human JOIN woman ON human.id = woman.id_human
2012-11-22 09:58:46,474 INFO sqlalchemy.engine.base.Engine ()
2012-11-22 09:58:46,475 INFO sqlalchemy.engine.base.Engine SELECT anon_1.human_type AS anon_1_human_type, anon_1.human_id AS anon_1_human_id, anon_1.child_id_human AS anon_1_child_id_human, anon_1.child_age AS anon_1_child_age, anon_2.woman_id_human AS anon_2_woman_id_human
FROM (SELECT woman.id_human AS woman_id_human
FROM human JOIN woman ON human.id = woman.id_human) AS anon_2 JOIN woman_child AS woman_child_1 ON anon_2.woman_id_human = woman_child_1.woman_id JOIN (SELECT human.type AS human_type, human.id AS human_id, child.id_human AS child_id_human, child.age AS child_age
FROM human JOIN child ON human.id = child.id_human) AS anon_1 ON anon_1.child_id_human = woman_child_1.child_id ORDER BY anon_2.woman_id_human

Martin84

unread,
Nov 22, 2012, 11:16:00 AM11/22/12
to sqlal...@googlegroups.com
Hi Diana & Michael Bayer,

you both have exactly the same results/sql queries, but unfortunately my results are different. This is really strange,
here is my complete python script with my output: http://pastebin.com/CyCs90cW

In my case the lazy='subquery' parameter have no effect and the sql queries differs from yours.

Michael Bayer

unread,
Nov 23, 2012, 12:42:59 AM11/23/12
to sqlal...@googlegroups.com
On Nov 22, 2012, at 11:16 AM, Martin84 wrote:

Hi Diana & Michael Bayer,

you both have exactly the same results/sql queries, but unfortunately my results are different. This is really strange,
here is my complete python script with my output: http://pastebin.com/CyCs90cW

As a side note, that script is slightly different.  A "lazy='joined'" is added to Human.houses.  So after your session.close(), then your query(Woman), you get "human JOIN woman" which is the query(Woman), then it also has "LEFT OUTER JOIN human_house_table LEFT OUTER JOIN house", which is your Human.houses lazy="joined".  I'd usually not use "lazy" at all in a joined inheritance mapping, as there are already too many JOINs happening implicitly for there to be extra joins like this coming in. 

So then, why you don't see the "subquery".   In 0.7, you need join_depth=1 here, because otherwise, the ORM sees Woman->Child as Human->Human, which is a cycle.  join_depth=1 means, "it's OK to join one time in this cycle".

In 0.8 this was improved so that Woman->Child isn't seen as a cycle, the subclass identity is taken into account - so join_depth can be removed.  See http://www.sqlalchemy.org/trac/ticket/2481 for a description of this issue.

Overall, 0.8 has a lot of improvements to joined inheritance in conjunction with eagerloading and with_polymorphic.  

Martin84

unread,
Nov 23, 2012, 3:38:44 AM11/23/12
to sqlal...@googlegroups.com
Hi Diana & Michael Bayer,

thanks for your help!
So, you both use sqlalchemy 0.8 and I use 0.7.9 and that explains our different SQL queries.
Now, with the join_depth=1 parameter the unexplainable SQL queries disappear and there is no more difference between lazy='subquery' and subqueryload().
But unfortunately now there is an other and far more problematic issue, the output of showDatabase is incorrect.

I modify my showDatabase() function like this:

def showDatabase(session):
    house = session.query(House).one()
    for resident in house.residents:
        print resident.myChildren

and now only one resident have a children (the men), and the one from the woman disappear!
How is this possible?

Here is my complete script with output: http://pastebin.com/HRqSWxQ0

Michael Bayer

unread,
Nov 23, 2012, 11:05:04 AM11/23/12
to sqlal...@googlegroups.com
On Nov 23, 2012, at 3:38 AM, Martin84 wrote:

Hi Diana & Michael Bayer,

thanks for your help!
So, you both use sqlalchemy 0.8 and I use 0.7.9 and that explains our different SQL queries.
Now, with the join_depth=1 parameter the unexplainable SQL queries disappear and there is no more difference between lazy='subquery' and subqueryload().
But unfortunately now there is an other and far more problematic issue, the output of showDatabase is incorrect.

I modify my showDatabase() function like this:

def showDatabase(session):
    house = session.query(House).one()
    for resident in house.residents:
        print resident.myChildren

and now only one resident have a children (the men), and the one from the woman disappear!
How is this possible?

Here is my complete script with output: http://pastebin.com/HRqSWxQ0


Child refers to its parent via id_human, is a many-to-one - a Child can only have *one* human.  your script assigns newChild twice, once to newMen, once to newWoman.   assigning it to newWoman essentially removes it from newMen.




Am Freitag, 23. November 2012 06:43:33 UTC+1 schrieb Michael Bayer:

On Nov 22, 2012, at 11:16 AM, Martin84 wrote:

Hi Diana & Michael Bayer,

you both have exactly the same results/sql queries, but unfortunately my results are different. This is really strange,
here is my complete python script with my output: http://pastebin.com/CyCs90cW

As a side note, that script is slightly different.  A "lazy='joined'" is added to Human.houses.  So after your session.close(), then your query(Woman), you get "human JOIN woman" which is the query(Woman), then it also has "LEFT OUTER JOIN human_house_table LEFT OUTER JOIN house", which is your Human.houses lazy="joined".  I'd usually not use "lazy" at all in a joined inheritance mapping, as there are already too many JOINs happening implicitly for there to be extra joins like this coming in. 

So then, why you don't see the "subquery".   In 0.7, you need join_depth=1 here, because otherwise, the ORM sees Woman->Child as Human->Human, which is a cycle.  join_depth=1 means, "it's OK to join one time in this cycle".

In 0.8 this was improved so that Woman->Child isn't seen as a cycle, the subclass identity is taken into account - so join_depth can be removed.  See http://www.sqlalchemy.org/trac/ticket/2481 for a description of this issue.

Overall, 0.8 has a lot of improvements to joined inheritance in conjunction with eagerloading and with_polymorphic.  


--
You received this message because you are subscribed to the Google Groups "sqlalchemy" group.
To view this discussion on the web visit https://groups.google.com/d/msg/sqlalchemy/-/OUUgp8S57XAJ.

Michael Bayer

unread,
Nov 23, 2012, 11:09:07 AM11/23/12
to sqlal...@googlegroups.com

On Nov 23, 2012, at 11:05 AM, Michael Bayer wrote:

>
> Child refers to its parent via id_human, is a many-to-one - a Child can only have *one* human. your script assigns newChild twice, once to newMen, once to newWoman. assigning it to newWoman essentially removes it from newMen.

cancel this message. will look again.





Michael Bayer

unread,
Nov 23, 2012, 11:36:29 AM11/23/12
to sqlal...@googlegroups.com

On Nov 23, 2012, at 3:38 AM, Martin84 wrote:

> Hi Diana & Michael Bayer,
>
> thanks for your help!
> So, you both use sqlalchemy 0.8 and I use 0.7.9 and that explains our different SQL queries.
> Now, with the join_depth=1 parameter the unexplainable SQL queries disappear and there is no more difference between lazy='subquery' and subqueryload().
> But unfortunately now there is an other and far more problematic issue, the output of showDatabase is incorrect.
>
> I modify my showDatabase() function like this:
>
> def showDatabase(session):
> house = session.query(House).one()
> for resident in house.residents:
> print resident.myChildren
>
> and now only one resident have a children (the men), and the one from the woman disappear!
> How is this possible?

sorry, this is a actually an eagerloading bug, which has probably come up before, but is now posted as http://www.sqlalchemy.org/trac/ticket/2614. eager loading in conjunction with with_polymorphic has always been a bleeding edge feature and in this case the internal attribute targeting used by the ORM is seeing a conflict between ASubclassA.attr and ASubclassB.attr. It will require a major rethink of some aspects of this internal naming in order for this issue to be resolved. 0.8 has already had many weeks of effort put into improving the with_polymorphic system, so we are in better shape to attack such issues.

you will get the correct results if you don't try to subqueryload both same-named attributes at the same time. A workaround for now would be to name the two relationships differently, the use a @property to proxy them both:

class A(..):
myChildrenA = relationship(...)

@property
def myChildren(self):
return myChildrenA

class B(..):
myChildrenB = relationship(...)

@property
def myChildren(self):
return myChildrenB

or alternatively, another suggestion is to use a database schema that does not rely so heavily on long chains of joins in order to produce basic results.

Martin84

unread,
Nov 23, 2012, 1:49:27 PM11/23/12
to sqlal...@googlegroups.com
Hi Diana & Michael Bayer,

thanks a lot for your help and time. I will use the workaround or change the database schema.
Good to know, that sqlalchemy has such a helpful community!

Bye

Martin84

unread,
Nov 26, 2012, 5:45:18 AM11/26/12
to sqlal...@googlegroups.com
Hi Michael Bayer,

maybe I found a further sqlalchemy bug. If you add to the human<->houses relationship a lazy='subquery' parameter, then sqlalchemy throws an keyerror.
Check this script: http://pastebin.com/2ihWMZBA





Am Freitag, 23. November 2012 17:37:00 UTC+1 schrieb Michael Bayer:

Michael Bayer

unread,
Nov 26, 2012, 9:12:25 AM11/26/12
to sqlal...@googlegroups.com
thanks, stick to 0.7 for now, http://www.sqlalchemy.org/trac/ticket/2617.




--
You received this message because you are subscribed to the Google Groups "sqlalchemy" group.
To view this discussion on the web visit https://groups.google.com/d/msg/sqlalchemy/-/ca3VNEWRfoMJ.

Michael Bayer

unread,
Dec 1, 2012, 8:54:19 PM12/1/12
to sqlal...@googlegroups.com
both issues, same-named attributes on dual subclasses and structural subqueryloads in conjunction with with_polymorphic,  are now repaired in the latest tip for 0.8 and both test scripts are functional in 0.8 now.    Feel free to test it out and send me more feedback, thanks !



--
You received this message because you are subscribed to the Google Groups "sqlalchemy" group.
To view this discussion on the web visit https://groups.google.com/d/msg/sqlalchemy/-/moyAu8IEYOUJ.
Reply all
Reply to author
Forward
0 new messages