how to tell which tables already joined in a query statement (or best way to dynamically build a query?)

967 views
Skip to first unread message

Brian Cherinka

unread,
Aug 7, 2015, 10:35:36 AM8/7/15
to sqlalchemy

Hi, 

I'm trying to build an SQLalchemy ORM query dynamically based on user selected options.  Some of these options come from the same table, but the user could select either one or both criteria to filter on.  Since I don't know which options the user will select ahead of time, I have to join to the same table multiple times.  However this throws an error 

ProgrammingError: (psycopg2.ProgrammingError) table name "TableB" specified more than once

when I try to submit the query.  How can I find out which tables have already been joined in a query?  Or what's the best way to handle building a query based on multiple criteria?   I'm using SQLalchemy 1.0.0. 

Here is my pseudo-code.  

Option 1.  Option 2.  Option 3.   (any or all options can be selected, and they all come from the same joined table)

// base table
query = session.query(TableA)

// add on new criteria
if option 1: query = query.join(TableB).filter(TableB.option1 > X )
if option 2: query = query.join(TableB).filter(TableB.option2 > X )
if option 3: query = query.join(TableB).filter(TableB.option3 > X )

However, when attempting query.all(), this throws the above error, if I have selected any two options.   What I think it should be is something like this...

//base
query = session.query(TableA)
//join
query = query.join(TableB)
// add on new criteria
if option 1: query = query.filter(TableB.option1 > X )
if option 2: query = query.filter(TableB.option2 > X )
if option 3: query = query.filter(TableB.option3 > X )

but I don't want to join to TableB if I don't have to.  I have many different tables where this kind of situation applies, and it seems inefficient to join to all other tables just in case I may need to filter on something.  

Any thoughts, help or suggestions?
Thanks, Brian





Mike Bayer

unread,
Aug 7, 2015, 11:34:00 AM8/7/15
to sqlal...@googlegroups.com
This is a thing for which there is a plan to make this really clear and doable.   But right now that is only a plan.   It's not necessarily straightforward, in the totally open-ended case, to determine every table that will be in the FROM clause, given that if a query has filter(X.foo == 'bar'), now "X" is in the FROM list, which you wouldn't know until you generate the core Select statement.    The problem of determining exactly what outer JOINs and such are present in an easy and performant way is a large problem to be solved and you can see the proposed, eventual API for this at https://bitbucket.org/zzzeek/sqlalchemy/issues/3225/query-heuristic-inspection-system#comment-12988632.

So given that this is a big new feature that's currently targeted at least a year away, for now we need to keep things simple.

If I really had simple imperative code all in one place like that, I'd probably just refer to the flag twice:

if option1 or option2 or option3:
    query = query.join(TableB)   

if option1: # etc
if option2: # etc.

But I can hear you cringing, so you'd have to just keep track of what you're joining.    So still keeping it boringly simple, use a set.   To make this maybe less tedious, we'll build a closure function so that we can be q a little more succinct:

def joiner():
    already_joined = set()
    def join(q, ent):
        if ent not in already_joined:
            already_joined.add(ent)
            q = q.join(ent)
        return q
    return join

def my_filter_thing(option1, option2, option3):
    q = session.query(A)
    join = joiner()

    if option1:
        q = join(q, B).filter(B.option1 == foo)
    if option2:
        q = join(q, B).filter(B.option2 == bar)

Then I hear, OK but I'm passing the Query to other functions and I don't want to ship that extra thing along with it everywhere.  So we can stick the already_joined on the Query directly.  But the thing to track here is that this is an immutable object so that the previous version of the Query doesn't have the newer state on it, in case you are forking off a Query object into multiple versions of itself:

def join(q, ent):
    if not hasattr(q, '_already_joined'):
        already_joined = q._already_joined = frozenset()
    else:
        already_joined = q._already_joined
    if ent not in already_joined:
        q = q.join(ent)
        q._already_joined = q._already_joined.union([ent])
    return q


def my_filter_thing(option1, option2, option3):
    q = session.query(A)

    if option1:
        q = join(q, B).filter(B.option1 == foo)
    if option2:
        q = join(q, B).filter(B.option2 == bar)


Another option is to try to anticipate what we'll be doing in #3225; that is, look in query._from_obj.  This is where we're poking around in things not 100% stable API over the long term and the #3225 API would be very preferable, but it could be:

from sqlalchemy.sql import util

def join(q, ent):
    if ent not in set(util.surface_selectables(q._from_obj[0])):
        q = q.join(ent)
    return q

surface_selectables() is a helper that basically looks at Join objects and pulls out the "left" and "right" of each recursively, so it can find tables.  query._from_obj is a list of elements in the FROM list as explicitly built, e.g. when you call query.select_from() or query.join().   It is usually of length one unless you've done more thing with select_from().  It does not pick up on implicit FROM objects that come in from the WHERE clause and such.


So short TL;DR;, it's a known use case, and there's an ambitious plan to make it really nice at some point, for now you kind of have to rough it on your own a bit.






Any thoughts, help or suggestions?
Thanks, Brian





--
You received this message because you are subscribed to the Google Groups "sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+...@googlegroups.com.
To post to this group, send email to sqlal...@googlegroups.com.
Visit this group at http://groups.google.com/group/sqlalchemy.
For more options, visit https://groups.google.com/d/optout.

Brian Cherinka

unread,
Sep 10, 2015, 4:33:52 PM9/10/15
to sqlalchemy
Hi Michael, 

Thanks for your response.  It helped a lot.  I ended up going with the quick and dirty query.from_obj[0] method you described.  That was faster to implement and served my purposes exactly. 

Cheers, Brian

Reply all
Reply to author
Forward
0 new messages