Filters on polymorphic unions

238 views
Skip to first unread message

Benjamin Gonzalez

unread,
Oct 9, 2012, 9:54:53 AM10/9/12
to sqlal...@googlegroups.com

 Hi,

 I've been using SQLAlchemy for the last couple of weeks. I'll definitely call myself a beginner so I apologize if I'm missing something obvious with this question :)

 I'm working with a database where there is data split across tables with the exact same structure, so there are cases where I need to query more than one table and filter by the same criteria. Looking around in the documentation I read about Concrete Table Inheritance and thought that could help me in this case. Well, it almost did :) I might be missing something obvious, but if we take the example provided: http://docs.sqlalchemy.org/en/rel_0_7/orm/inheritance.html#concrete-table-inheritance

 And perform a query such as:

 session.query(Employee).filter(Employee.name=='test')

 It will apply the name filter after the union is done:

SELECT pjoin.employee_id AS pjoin_employee_id, pjoin.name AS pjoin_name, pjoin.type AS pjoin_type, pjoin.manager_data AS pjoin_manager_data, pjoin.engineer_info AS pjoin_engineer_info 
FROM (SELECT employees.employee_id AS employee_id, employees.name AS name, CAST(NULL AS VARCHAR(50)) AS manager_data, CAST(NULL AS VARCHAR(50)) AS engineer_info, 'employee' AS type 
FROM employees UNION ALL SELECT managers.employee_id AS employee_id, managers.name AS name, managers.manager_data AS manager_data, CAST(NULL AS VARCHAR(50)) AS engineer_info, 'manager' AS type 
FROM managers UNION ALL SELECT engineers.employee_id AS employee_id, engineers.name AS name, CAST(NULL AS VARCHAR(50)) AS manager_data, engineers.engineer_info AS engineer_info, 'engineer' AS type 
FROM engineers) AS pjoin 
WHERE pjoin.name = :name_1

 Is there a way to ask SQLAlchemy to apply the filter to each table in the inner queries? They all have a name column, and in the case of a large dataset it's helpful because it avoids loading all the rows for each table first and then applying the WHERE clause.

 Thanks in advance,
 Benjamin

Michael Bayer

unread,
Oct 9, 2012, 10:35:30 AM10/9/12
to sqlal...@googlegroups.com
On Oct 9, 2012, at 9:54 AM, Benjamin Gonzalez wrote:


 Hi,

 I've been using SQLAlchemy for the last couple of weeks. I'll definitely call myself a beginner so I apologize if I'm missing something obvious with this question :)

 I'm working with a database where there is data split across tables with the exact same structure, so there are cases where I need to query more than one table and filter by the same criteria.

this sounds like you're describing horizontal partitioning.   such as if a particular row of data could equally well be in one table or the other, and its only some kind of sharding scheme that determines which table.


Looking around in the documentation I read about Concrete Table Inheritance and thought that could help me in this case.

yeah concrete inheritance is not really intended for this case, though I can see what you're going for.    


 And perform a query such as:

 session.query(Employee).filter(Employee.name=='test')

 It will apply the name filter after the union is done:

SELECT pjoin.employee_id AS pjoin_employee_id, pjoin.name AS pjoin_name, pjoin.type AS pjoin_type, pjoin.manager_data AS pjoin_manager_data, pjoin.engineer_info AS pjoin_engineer_info 
FROM (SELECT employees.employee_id AS employee_id, employees.name AS name, CAST(NULL AS VARCHAR(50)) AS manager_data, CAST(NULL AS VARCHAR(50)) AS engineer_info, 'employee' AS type 
FROM employees UNION ALL SELECT managers.employee_id AS employee_id, managers.name AS name, managers.manager_data AS manager_data, CAST(NULL AS VARCHAR(50)) AS engineer_info, 'manager' AS type 
FROM managers UNION ALL SELECT engineers.employee_id AS employee_id, engineers.name AS name, CAST(NULL AS VARCHAR(50)) AS manager_data, engineers.engineer_info AS engineer_info, 'engineer' AS type 
FROM engineers) AS pjoin 
WHERE pjoin.name = :name_1

 Is there a way to ask SQLAlchemy to apply the filter to each table in the inner queries? They all have a name column, and in the case of a large dataset it's helpful because it avoids loading all the rows for each table first and then applying the WHERE clause.

oh OK so you're taking advantage of the UNION generated by polymorphic_union to try filtering across all those tables.    this is a funny case because we do have an extension that serves as an example for "horizontal partitioning", which is in sqlalchemy.ext.horizontal_shard, but that example is architected around the idea that you're talking to multiple database backends, each of which has just one table in it.   So it doesn't do the UNION thing which of course is preferable if it is possible.   

One example that might be worth peeking at is how this mapping scheme can be set up without necessarily using Concrete inheritance, though really it's probably easier that you are just using the concrete system, that example is at: http://www.sqlalchemy.org/trac/wiki/UsageRecipes/EntityName .   It has a "classical" version and a "declarative" version, though the declarative is a bit awkward.    Stick with the concrete if that's easier.

As far as the query, to come up with an ad-hoc, filtered UNION would have to be invented.    I'd start by working it out manually using Query, then perhaps trying to genericize it, either that you have a simple function that comes up with a Query with unions, or if you really need to do this fluently, as a Query subclass.   Here's sort of the idea of that:

def query_all_shards(session, criterion):
    q = None
    for subclass in MyConcreteBase.__subclasses__():
        if q is None:
            q = session.query(subclass).filter(criterion)
        else:
            union_q = session.query(subclass).filter(criterion)
            q = q.union_all(union_q)
    return q

I haven't tested this, but something like that.   Then if you wanted to bake this into Query, you could build a Query subclass that does this, giving it a method like "union_on_shards()" that would trigger a "union" operation similar to the above.    There's a Query subclass example in the sqlalchemy/ext/horizontal_shard.py module, and also in some of the wiki examples such as http://www.sqlalchemy.org/trac/wiki/UsageRecipes/PreFilteredQuery .

Overall I'd stay away from polymorphic_union as a mapper-level default since it generates huge beasts of queries, which you often won't want.





Benjamin Gonzalez

unread,
Oct 9, 2012, 2:17:37 PM10/9/12
to sqlal...@googlegroups.com

Thanks for the valuable suggestions. You are actually correct: the scheme that they are using is sharding across tables, that's why the structure is the same.

And the way I was setting things up is very similar to the example you mentioned (http://www.sqlalchemy.org/trac/wiki/UsageRecipes/EntityName) so that I could query multiple table names using the same class. I put together a small example that using that and a simplified version of Manager and Engineer. Constructing the union the way you suggested works (the criterion goes in the subquery's WHERE clause instead of outside of the union). 

SELECT anon_1.manager_employee_id AS anon_1_manager_employee_id, anon_1.manager_name AS anon_1_manager_name 
FROM (SELECT manager.employee_id AS manager_employee_id, manager.name AS manager_name 
FROM manager 
WHERE manager.name = :name_1 UNION ALL SELECT engineer.employee_id AS engineer_employee_id, engineer.name AS engineer_name 
FROM engineer 
WHERE engineer.name = :name_2) AS anon_1

But now the problem seems to be that SQLAlchemy will only return rows matching the first class. 

Here's the example code. In this case there is 1 row in the manager & engineer tables that contains the name "test". After constructing the union, I was hoping the query would return two results but it will only return one. However if I run count() on the query, it correctly states that there are two rows. Am I missing something else?

class Employee(object):
    def __init__(self, name):
        self.name = name
    def __repr__(self):
        return self.__class__.__name__ + " " + self.name

managers_table = Table('manager', metadata,
    Column('employee_id', Integer, primary_key=True),
    Column('name', String(50)),
)

engineers_table = Table('engineer', metadata,
    Column('employee_id', Integer, primary_key=True),
    Column('name', String(50)),
)

def map_class_to_some_table(cls, table, entity_name, **kw):
     newcls = type(entity_name, (cls, ), {})
     mapper(newcls, table, **kw)
     return newcls

Manager = map_class_to_some_table(Employee, managers_table, 'Manager')
Engineer = map_class_to_some_table(Employee, engineers_table, 'Engineer')

q1 = session.query(Manager).filter(Manager.name=='test')
q2 = session.query(Engineer).filter(Engineer.name=='test')

u = q1.union_all(q2)
print u.count() #count = 2
rows = u.all()
print len(rows) #len = 1
print rows[0] # only contains the manager result

Michael Bayer

unread,
Oct 9, 2012, 5:42:48 PM10/9/12
to sqlal...@googlegroups.com
On Oct 9, 2012, at 2:17 PM, Benjamin Gonzalez wrote:


Thanks for the valuable suggestions. You are actually correct: the scheme that they are using is sharding across tables, that's why the structure is the same.

And the way I was setting things up is very similar to the example you mentioned (http://www.sqlalchemy.org/trac/wiki/UsageRecipes/EntityName) so that I could query multiple table names using the same class. I put together a small example that using that and a simplified version of Manager and Engineer. Constructing the union the way you suggested works (the criterion goes in the subquery's WHERE clause instead of outside of the union). 

SELECT anon_1.manager_employee_id AS anon_1_manager_employee_id, anon_1.manager_name AS anon_1_manager_name 
FROM (SELECT manager.employee_id AS manager_employee_id, manager.name AS manager_name 
FROM manager 
WHERE manager.name = :name_1 UNION ALL SELECT engineer.employee_id AS engineer_employee_id, engineer.name AS engineer_name 
FROM engineer 
WHERE engineer.name = :name_2) AS anon_1

But now the problem seems to be that SQLAlchemy will only return rows matching the first class. 

Here's the example code. In this case there is 1 row in the manager & engineer tables that contains the name "test". After constructing the union, I was hoping the query would return two results but it will only return one. However if I run count() on the query, it correctly states that there are two rows. Am I missing something else?

Unfortunately what you're trying to do is nearly impossible.    I spent about an hour trying to come up with any kind of workaround; the only workaround, which barely works, is attached.     Concrete inheritance is required since you're looking to load polymorphically, and some direct manipulation of column expressions is needed so that the "polymorphic" lookup works.  Maybe it will be useful, though I don't know if this approach can handle more complexity than what we have in the current "hello world" version you see here.

I'd need to add new ORM features and apis to make this possible (the irony when people say we have too many features).    I don't necessarily have a plan for how this feature would be supported, though.  It's kind of turning the usual ORM loading interaction inside-out.

test.py

Michael Bayer

unread,
Oct 9, 2012, 6:32:35 PM10/9/12
to sqlal...@googlegroups.com
OK, heh, this will work in 0.7 or 0.8, just the argument name was wrong:

e_ = literal_column("'engineer'").label('pjoin_type')
m = literal_column("'manager'").label('pjoin_type')

u = s.query(Engineer, e_).filter(Engineer.name.in_(['e2', 'e3']))
u2 = s.query(Manager, m).filter(Manager.name.in_(['m2', 'm3']))

u = union_all(u, u2)

q = s.query(Employee).with_polymorphic('*', u, u.c.pjoin_type).from_statement(u)
print q.all()


getting closer !


Michael Bayer

unread,
Oct 9, 2012, 6:26:40 PM10/9/12
to sqlal...@googlegroups.com
On Oct 9, 2012, at 5:42 PM, Michael Bayer wrote:

this variation also works in 0.8 without using any unofficial APIs, still poking around with it:

e_ = literal_column("'engineer'").label('pjoin_type')
m = literal_column("'manager'").label('pjoin_type')

u = s.query(Engineer, e_).filter(Engineer.name.in_(['e2', 'e3']))
u2 = s.query(Manager, m).filter(Manager.name.in_(['m2', 'm3']))

u = union_all(u, u2)
q = s.query(Employee).with_polymorphic('*', u, polymorphic_on=u.c.pjoin_type).from_statement(u)
print q.all()


Benjamin Gonzalez

unread,
Oct 10, 2012, 9:05:42 PM10/10/12
to sqlal...@googlegroups.com
I tried the example you showed in the last post and it worked. Thank you so much! 

I know this is a very uncommon use case, so I really appreciate the help.

Thanks again,
Ben
Reply all
Reply to author
Forward
0 new messages