Optimizing joins generated by mixin-based relationships on polymorphic subclasses

118 views
Skip to first unread message

Gerald Thibault

unread,
Nov 23, 2012, 12:57:44 AM11/23/12
to sqlal...@googlegroups.com
I have a base class and 3 subclasses which inherit from it polymorphically. I also have a mixin, which attempts to add a relation to the 'Extra' class, which has a foreign key to the base class.

Here is the runnable test.

from sqlalchemy import *
from sqlalchemy.ext.declarative import declared_attr, declarative_base
from sqlalchemy.orm import Session, relationship

Base = declarative_base()

class Test(Base):
    __tablename__ = 'test'

    id = Column(Integer, primary_key=True)
    type = Column(String(12))
    name = Column(String(50))

    __mapper_args__ = {
        'polymorphic_identity': 'base',
        'polymorphic_on': type,
        'with_polymorphic': '*',
        }

class Extra(Base):
    __tablename__ = 'extra'
    
    id = Column(Integer, ForeignKey(Test.id), primary_key=True)
    extra_data = Column(Text, default='goats')

class ExtraMixin(object):

    @declared_attr
    def extra(cls):
        return relationship(Extra, uselist=False, lazy='joined',
            primaryjoin=Extra.id==Test.id,
            foreign_keys=Extra.id,
            )

class TestOne(Test):
    __tablename__ = 'test1'
    __mapper_args__ = { 'polymorphic_identity': 'one' }

    id = Column(Integer, ForeignKey(Test.id), primary_key=True)
    value1 = Column(String(16))

class TestTwo(Test, ExtraMixin):
    __tablename__ = 'test2'
    __mapper_args__ = { 'polymorphic_identity': 'two' }

    id = Column(Integer, ForeignKey(Test.id), primary_key=True)
    value2 = Column(String(16))

class TestThree(Test, ExtraMixin):
    __tablename__ = 'test3'
    __mapper_args__ = { 'polymorphic_identity': 'three' }

    id = Column(Integer, ForeignKey(Test.id), primary_key=True)
    value3 = Column(String(16))


if __name__ == '__main__':

    import logging
    logging.basicConfig()
    logging.getLogger('sqlalchemy.orm').setLevel(logging.INFO)

    e = create_engine('sqlite:////tmp/foo.db', echo=True)
    Base.metadata = MetaData(e)
    Base.metadata.drop_all()
    Base.metadata.create_all()
    session = Session(e)

    session.query(Test).all()
    session.query(TestOne).all()
    session.query(TestTwo).all()

Issuing the first query (note the with_polymorphic on the base) yields this:

SELECT test.id AS test_id, test.type AS test_type, test.name AS test_name, test1.id AS test1_id, test1.value1 AS test1_value1, test2.id AS test2_id, test2.value2 AS test2_value2, test3.id AS test3_id, test3.value3 AS test3_value3, extra_1.id AS extra_1_id, extra_1.extra_data AS extra_1_extra_data, extra_2.id AS extra_2_id, extra_2.extra_data AS extra_2_extra_data 
FROM test LEFT OUTER JOIN test1 ON test.id = test1.id LEFT OUTER JOIN test2 ON test.id = test2.id LEFT OUTER JOIN test3 ON test.id = test3.id LEFT OUTER JOIN extra AS extra_1 ON extra_1.id = test.id LEFT OUTER JOIN extra AS extra_2 ON extra_2.id = test.id

The extra table is being joined twice, because two subclasses inherit the relationship to the Extra class.

Is there a way to prevent the relationship from being aliased? This is a one-to-one relation, so ideally a polymorphic query would issue left outer joins once for every involved table, all joined from Test.id, and it wouldn't alias any of them, so the resulting data would be usable for the active defined relationships (would I use contains_eager for something like this?). What can I do to make the mixed in table behave like the mapped polymorphic table, where the join is very clean and not aliased? There is a definite ideal join condition I am looking to achieve here, and I don't know how to get to it.

Would I need to forego using eagerjoined relations due to the aliasing of the joined tables? If so, would I need to iteratively join new tables to mapper.mapped_table? I've tried that already, and ran into problems with Extra.id not being recognized as a key that can fold into the other equivalent keys, and it throws an error while trying to autogen the columns from the mapped_table. Is there a way to specify beforehand that certain columns should be folded into existing ones?


Michael Bayer

unread,
Nov 23, 2012, 10:53:55 AM11/23/12
to sqlal...@googlegroups.com
On Nov 23, 2012, at 12:57 AM, Gerald Thibault wrote:

I have a base class and 3 subclasses which inherit from it polymorphically. I also have a mixin, which attempts to add a relation to the 'Extra' class, which has a foreign key to the base class.

Here is the runnable test.


Issuing the first query (note the with_polymorphic on the base) yields this:

SELECT test.id AS test_id, test.type AS test_type, test.name AS test_name, test1.id AS test1_id, test1.value1 AS test1_value1, test2.id AS test2_id, test2.value2 AS test2_value2, test3.id AS test3_id, test3.value3 AS test3_value3, extra_1.id AS extra_1_id, extra_1.extra_data AS extra_1_extra_data, extra_2.id AS extra_2_id, extra_2.extra_data AS extra_2_extra_data 
FROM test LEFT OUTER JOIN test1 ON test.id = test1.id LEFT OUTER JOIN test2 ON test.id = test2.id LEFT OUTER JOIN test3 ON test.id = test3.id LEFT OUTER JOIN extra AS extra_1 ON extra_1.id = test.id LEFT OUTER JOIN extra AS extra_2 ON extra_2.id = test.id

The extra table is being joined twice, because two subclasses inherit the relationship to the Extra class.

the first observation to make here is that this is an overuse of joined table inheritance.   We can try to optimize individual cases but at the end of the day, establishing the mapping such that all classes are dependent on a base is going to lead to lots of joins, each of which adds performance latency.  with_polymorphic='*' as a default behavior particularly.   So we're starting with an approach that's guaranteed to lead to non-performant queries.



Is there a way to prevent the relationship from being aliased?

the relationship has to be aliased since this is how joined loading works, unless you construct the joins manually and combine it with contains_eager() .    See http://docs.sqlalchemy.org/en/rel_0_8/orm/loading.html#the-zen-of-eager-loading and http://docs.sqlalchemy.org/en/rel_0_8/orm/loading.html#routing-explicit-joins-statements-into-eagerly-loaded-collections. Aliasing is not a problem.  It's the multiple joins you don't want.


This is a one-to-one relation, so ideally a polymorphic query would issue left outer joins once for every involved table, all joined from Test.id, and it wouldn't alias any of them,

there is no chance of SQLAlchemy ever knowing how to do that automatically, so you'd need to stick the relationship on the base object here or construct the joins manually.


Would I need to forego using eagerjoined relations due to the aliasing of the joined tables? If so, would I need to iteratively join new tables to mapper.mapped_table? I've tried that already, and ran into problems with Extra.id not being recognized as a key that can fold into the other equivalent keys, and it throws an error while trying to autogen the columns from the mapped_table. Is there a way to specify beforehand that certain columns should be folded into existing ones?

I'm not parsing this completely (autogen the columns from the mapped_table...), but overall my impression is that there's a battle here between purity (having Extra applied to only those individual Test subclasses that need it, applied exactly the same way to each one, making sure there isn't an unused "extra" attribute on those Test subclasses that don't need it) and practicality (just put .extra on the base, everything works, some of your Test subclasses have an "extra" attribute that is unused).   The foreign key relationship here is between Extra and Test, so IMHO it belongs on the base.   If that is failing, then send that test case along.


Michael Bayer

unread,
Nov 24, 2012, 4:07:02 PM11/24/12
to sqlal...@googlegroups.com

I'd hate to make a bad story worse here, but in fact this is also the same eager loading pattern used in some other emails I've been answering here by Martin84...it's very unusual how these never-seen-before use cases suddenly pop up in pairs..but in any case, the eager loading here is actually broken in all versions, and it's not even possible to have two same-named attributes eager loading simultaneously on two subclasses of a with_polymorphic().  I've spent all day examining a fix, and it is turning out to be not at all trivial.  So for now the new release of 0.8 will flat out raise an exception if this condition is detected.   The eager loads otherwise return the incorrect results in many/most cases.


On Nov 23, 2012, at 12:57 AM, Gerald Thibault wrote:

--
You received this message because you are subscribed to the Google Groups "sqlalchemy" group.
To view this discussion on the web visit https://groups.google.com/d/msg/sqlalchemy/-/Mg8uL-3q0m8J.
To post to this group, send email to sqlal...@googlegroups.com.
To unsubscribe from this group, send email to sqlalchemy+...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/sqlalchemy?hl=en.

Gerald Thibault

unread,
Nov 24, 2012, 5:12:52 PM11/24/12
to sqlal...@googlegroups.com
Thanks for taking the time to look into it. It's kind of an odd case, so I wasn't particularly expecting it to work out.
Reply all
Reply to author
Forward
0 new messages