SA0.7.9 - poor query performance (full scan instead of index scan because of joined table inheritance)

25 views
Skip to first unread message

Ladislav Lenart

unread,
Jun 13, 2013, 9:18:11 AM6/13/13
to sqlal...@googlegroups.com
Hello.

I have a query that does a full scan of an inherited table with more than
million rows even though I need only 100 of them (on postgres 9.1). This is a
real bummer! Please help me rewrite the SA query or instruct postgres to not do
this stupidity.


I have the following setup (only the interesting relations):

Contact
contact_tags -> ContactTag (collection)
phones -> Phone (collection)
emails -> Email (collection)

ContactTag
tag -> Tag (cannot be NULL)

PersonalContact (Contact subclass)
partner -> Partner (can be NULL)
client -> PersonalClient (can be NULL)

CorporateContact (Contact subclass)
client -> CorporateClient (can be NULL)

Client

PersonalClient (Client subclass)
data -> PersonalData (cannot be NULL)

CorporateClient (Client subclass)
data -> CorporateData (cannot be NULL)


I have the following query that loads data of one window:

# window (input argument) is a list of id values.
q = session.query(PersonalContact).filter(PersonalContact.id.in_(window)
q = q.options(
subqueryload_all(PersonalContact.contact_tags, ContactTag.tag),
subqueryload(PersonalContact.phones),
subqueryload(PersonalContact.emails),
joinedload_all(
PersonalContact.partner,
Partner.personal_data,
PersonalData.address,
),
joinedload_all(
PersonalContact.client,
PersonalClient.data,
PersonalData.address
),
)


(Note that I have similar query for CorporateContact. Infact, I generate them
both in the same method.)

It produces SQL like this (the problematic part is emphasized):

SELECT *
FROM
contact
JOIN personal_contact ON contact.id = personal_contact.id
-- ***************
LEFT OUTER JOIN (
SELECT *
FROM
client
JOIN personal_client ON client.id = personal_client.id
) AS anon_1 ON personal_contact.client_id = anon_1.personal_client_id
-- ***************
LEFT OUTER JOIN partner AS partner_1
ON personal_contact.partner_id = partner_1.id
LEFT OUTER JOIN personal_data AS personal_data_1
ON partner_1.personal_data_id = personal_data_1.id
LEFT OUTER JOIN address AS address_1
ON personal_data_1.address_id = address_1.id
LEFT OUTER JOIN personal_data AS personal_data_2
ON anon_1.personal_client_data_id = personal_data_2.id
LEFT OUTER JOIN address AS address_2
ON personal_data_2.address_id = address_2.id
WHERE personal_contact.id IN (...)


The inner select directly corresponds to joinedload of PersonalContact.client, a
PersonalClient instance (and a Client subclass).

The postgres does a full scan of tables Client and PersonalClient even though I
will need at most 100 rows from each one.

However, if I rewrite the problematic part by hand like this:

LEFT OUTER JOIN client
ON personal_contact.client_id = client.id
LEFT OUTER JOIN personal_client
ON client.id = personal_client.id

it works like a charm.

Unfortunately I don't know how to write such a query in SA. I am really stuck so
any help is much appreciated.


Thank you,

Ladislav Lenart


Michael Bayer

unread,
Jun 13, 2013, 9:51:16 AM6/13/13
to sqlal...@googlegroups.com
Please try out 0.9 from the git master which fixes the issue of the nested SELECT on the right side of a join.

Sent from my iPhone
> --
> You received this message because you are subscribed to the Google Groups "sqlalchemy" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+...@googlegroups.com.
> To post to this group, send email to sqlal...@googlegroups.com.
> Visit this group at http://groups.google.com/group/sqlalchemy?hl=en.
> For more options, visit https://groups.google.com/groups/opt_out.
>
>

Ladislav Lenart

unread,
Jun 13, 2013, 11:03:55 AM6/13/13
to sqlal...@googlegroups.com, Michael Bayer
Unfortunately migrating to SA 0.9 is not an option for me at the moment due to
severe time constraints.

Could you please help me write SA query for 0.7.9 that uses index scan and also
loads all the necessary relations? It must be possible with a proper use of
from_statement(), contains_eager() and/or other SA features. It is just that toy
examples in the documentation don't help me much with this complex beast.

Here is a simplified version of my problem:

Foo
bar (can be NULL)

Bar

BarA (subclass of Bar)
data -> Data (cannot be NULL)

Data
value (string)

I need to rewrite this query:

q = session.query(Foo).filter(Foo.id.in_(...))
q = q.options(
joinedload_all(Foo.bar, BarA.data)
)

in such a way that it does NOT perform any full scan and also populates
Foo.bar.data.value of each returned Foo. The new query can return whatever it
pleases as long as it returns Foo instances with properly populated relations.

Please help me write it (or tell me that it is not possible in SA 0.7.9, though
I highly doubt that).


Thank you,

Ladislav Lenart

Ladislav Lenart

unread,
Jun 13, 2013, 11:08:55 AM6/13/13
to sqlal...@googlegroups.com, Michael Bayer
Hello.

Just a minor correction:

The new query can return whatever it pleases as long as it *also*
returns Foo instances with properly populated relations.

The rest of the e-mail is the same.


Please help me,

Ladislav Lenart

Michael Bayer

unread,
Jun 13, 2013, 12:44:23 PM6/13/13
to sqlal...@googlegroups.com

On Jun 13, 2013, at 11:03 AM, Ladislav Lenart <lena...@volny.cz> wrote:

> Unfortunately migrating to SA 0.9 is not an option for me at the moment due to
> severe time constraints.

I was pretty sure you'd say that, though I'm really looking to verify that my fixes are going to hold up under real world usage. The issues you're having are real issues, and they've been fixed.

>
> Could you please help me write SA query for 0.7.9 that uses index scan and also
> loads all the necessary relations? It must be possible with a proper use of
> from_statement(), contains_eager() and/or other SA features. It is just that toy
> examples in the documentation don't help me much with this complex beast.

you use the SQL expression language in conjunction with .join()/outerjoin(), pass to query.select_from(), then use contains_eager():

j = Foo.__table__.outerjoin(Bar.__table__.outerjoin(BarA.__table__).outerjoin(Data.__table__))

q = s.query(Foo).\
select_from(j).\
filter(Foo.id.in_([1, 2, 3])).\
options(
contains_eager(Foo.bar),
contains_eager(Foo.bar.of_type(BarA), BarA.data)
)



Ladislav Lenart

unread,
Jun 14, 2013, 5:18:54 AM6/14/13
to sqlal...@googlegroups.com, Michael Bayer
Hello.


On 13.6.2013 18:44, Michael Bayer wrote:
>
> On Jun 13, 2013, at 11:03 AM, Ladislav Lenart <lena...@volny.cz> wrote:
>
>> Unfortunately migrating to SA 0.9 is not an option for me at the moment due to
>> severe time constraints.
>
> I was pretty sure you'd say that, though I'm really looking to verify that my fixes are going to hold up under real world usage. The issues you're having are real issues, and they've been fixed.

I will make a 1:1 self-containted test case for this issue during the next week,
I promise!


>> Could you please help me write SA query for 0.7.9 that uses index scan and also
>> loads all the necessary relations? It must be possible with a proper use of
>> from_statement(), contains_eager() and/or other SA features. It is just that toy
>> examples in the documentation don't help me much with this complex beast.
>
> you use the SQL expression language in conjunction with .join()/outerjoin(), pass to query.select_from(), then use contains_eager():
>
> j = Foo.__table__.outerjoin(Bar.__table__.outerjoin(BarA.__table__).outerjoin(Data.__table__))
>
> q = s.query(Foo).\
> select_from(j).\
> filter(Foo.id.in_([1, 2, 3])).\
> options(
> contains_eager(Foo.bar),
> contains_eager(Foo.bar.of_type(BarA), BarA.data)
> )

Thank you very much! I am going to try it :-)


Ladislav Lenart

Ladislav Lenart

unread,
Jun 17, 2013, 11:47:57 AM6/17/13
to sqlal...@googlegroups.com, Michael Bayer
Hello.

Here is the fully self-contained regression of the issue, including the
workaround for SA 0.7.9. Thank you again, because I wouldn't figure it out
without your help (the select_from part). I haven't tried it on SA 0.9.

If you have any questions, please ask.

HTH,

Ladislav Lenart
sandbox_issue_inheritance_fullscan.py

Ladislav Lenart

unread,
Jun 17, 2013, 5:54:14 AM6/17/13
to sqlal...@googlegroups.com
Hello.

I ended up with the following query:

@classmethod
def _find_contacts_fetch_window(cls, contact_cls, win):
"""Special data-fetching query for contacts and all their related info
including tags, partner, client,...

NOTE: We build the FROM part entirely by hand, because SA generates bad
SQL for postgres. It does a FULL SCAN of client and personal_client /
corporate_client even though it reads at most window_size rows from
them. All this because SA inheritance creates a subselect which leads
to the full scan.
"""
# win .. list of ids
# contact_cls .. PersonalContact / CorporateContact
client_cls = contact_cls.client_cls() # PersonalClient / CorporateClient
data_cls = client_cls.data_cls() # PersonalData / CorporateData

# We need TABLEs to build the FROM part by hand.
# We reference PersonalData/CorporateData and Address twice, hence we
# need to alias them.
# We also need their aliased ORM classes for contains_eager() to work.
contact_table = Contact.__table__
contact_subtable = contact_cls.__table__
client_table = Client.__table__
personal_client_table = client_cls.__table__
partner_table = Partner.__table__
partner_data_table = PersonalData.__table__.alias(name='partner_data')
partner_address_table = Address.__table__.alias(name='partner_address')
client_data_table = data_cls.__table__.alias(name='client_data')
client_address_table = Address.__table__.alias(name='client_address')
partner_data = aliased(PersonalData, partner_data_table)
partner_address = aliased(Address, partner_address_table)
client_data = aliased(data_cls, client_data_table)
client_address = aliased(Address, client_address_table)
select_from = contact_table.join(
contact_subtable,
contact_table.c.id == contact_subtable.c.id
).outerjoin(
client_table,
contact_subtable.c.client_id == client_table.c.id
).outerjoin(
personal_client_table,
client_table.c.id == personal_client_table.c.id
).outerjoin(
client_data_table,
personal_client_table.c.data_id == client_data_table.c.id
).outerjoin(
client_address_table,
client_data_table.c.address_id == client_address_table.c.id
)
options = [
subqueryload_all(contact_cls.contact_tags, ContactTag.tag),
subqueryload(contact_cls.phones),
subqueryload(contact_cls.emails),
contains_eager(contact_cls.client),
contains_eager(contact_cls.client, client_cls.data, alias=client_data),
contains_eager(contact_cls.client, client_cls.data,
client_data.address, alias=client_address),
]
if contact_cls is PersonalContact:
select_from = select_from.outerjoin(
partner_table,
contact_subtable.c.partner_id == partner_table.c.id
).outerjoin(
partner_data_table,
partner_table.c.personal_data_id == partner_data_table.c.id
).outerjoin(
partner_address_table,
partner_data_table.c.address_id == partner_address_table.c.id
)
options.extend([
contains_eager(contact_cls.partner),
contains_eager(contact_cls.partner, Partner.personal_data,
alias=partner_data),
contains_eager(contact_cls.partner, Partner.personal_data,
partner_data.address, alias=partner_address),
])
q = session.query(contact_cls).select_from(select_from)
q = q.filter(contact_cls.id.in_(win))
q = q.options(*options)
return q


It works great though it took me quite a while to get right, especially the
aliasing between tables and classes. Also, I wouldn't figure it out without your
help (I mean the part with select_from).

Thank you again,

Ladislav Lenart


On 13.6.2013 18:44, Michael Bayer wrote:
>

Ladislav Lenart

unread,
Jun 17, 2013, 5:35:27 AM6/17/13
to sqlal...@googlegroups.com, Michael Bayer
Reply all
Reply to author
Forward
0 new messages