Detect whether a query already has a join on a table?

Kevin Q

unread,

Nov 21, 2012, 9:26:15 PM11/21/12

to sqlal...@googlegroups.com

I want to avoid double joining on the same table. I know query._from_obj is where the query stores the join elements. However, it's not there if the join is from query.options(joinedload('some_relation')). For example, I have the following table relations:

User:

* userid

* name

Thing

* thingid

* name

* userid

Thing.user = relation(User, User.userid==Thing.userid)

If I call:

query = session.query(Thing).options(joinedload('user')).filter(User.name=='blah').all()

This will generate the following query:

SELECT thing.thingid, thing.name, thing.userid, user1.userid, user1.name

FROM thing INNER JOIN user AS user1

INNER JOIN user

WHERE user.name == 'blah'

Notice the double join there.

Now, I wouldn't do that if I'm writing the query in a single function, but if the code is modular, the child object loading and filtering is done in separate functions, with the query being passed around. Is there a way for me to detect whether a query already has a join on a certain table, whether the join is from query.join() or query.options(joinedload(x))?

Any suggestion is welcome and appreciated.

Michael Bayer

unread,

Nov 22, 2012, 10:16:53 AM11/22/12

to sqlal...@googlegroups.com

On Nov 21, 2012, at 9:26 PM, Kevin Q wrote:

I want to avoid double joining on the same table. I know query._from_obj is where the query stores the join elements. However, it's not there if the join is from query.options(joinedload('some_relation')). For example, I have the following table relations:

User:
* userid
* name

Thing
* thingid
* name
* userid

Thing.user = relation(User, User.userid==Thing.userid)

If I call:

query = session.query(Thing).options(joinedload('user')).filter(User.name=='blah').all()

This will generate the following query:
SELECT thing.thingid, thing.name, thing.userid, user1.userid, user1.name
FROM thing INNER JOIN user AS user1
INNER JOIN user
WHERE user.name == 'blah'

Notice the double join there.

that example doesn't actually make sense. Your query only specifies joinedload('user'), which would only create "thing INNER JOIN user AS user1". I don't see anything about that query which would also create "INNER JOIN user", are you omitting an extra join() call ?

Now, I wouldn't do that if I'm writing the query in a single function, but if the code is modular, the child object loading and filtering is done in separate functions, with the query being passed around. Is there a way for me to detect whether a query already has a join on a certain table, whether the join is from query.join() or query.options(joinedload(x))?

I tend not to use this approach in an extended way, not only because there aren't very public APIs for this, but also it's not necessarily simple to determine "A joins to B" since "A" can join to "B" in any number of ways, with different criterion, A or B could be wrapped in a subquery for some reason, etc. The structure of the query can be more complicated than one which you'd want to be introspecting in order to get at various decisions for subsequent transformations. It's best if all the decisions for how the Query can be built can be made in one place, exporting not the Query object itself for remote transformations but rather some other "intent-collecting" structure which can be interpreted in one step at the end.

You can get the joins if you look into query._from_obj, but to get at eagerloads it's fairly tedious as you'd need to dig through query._attributes.

Basically I try to keep the number of non-coordinated functions which each participate in modifying the structure of a particular Query to a minimum, and if you're needing to dig in and find who's been applying eagerloads and options, I'd want to pull back on how many unrelated actors are all affecting the same structure.

If you truly need lots of non-coordinated functions to establish possibly-conflicting intentions upon a Query, another approach is to build a stateful structure around Query that represents the Query-specific activities which your non-coordinating functions all take. Basically your own coarse grained "Query" interface that then knows how to intelligently build the Query object based on the state given to it. Again, the approach here is one of bringing together all those actors that would modify a Query into one system so that they can coordinate closely, but that system is also one which provides a successful API to non-coordinated functions.

Kevin Qiu

unread,

Nov 22, 2012, 12:20:04 PM11/22/12

to sqlal...@googlegroups.com

In my example, the extra join was brought in by the relationship `Thing.user`. Because the query was on Thing, and the filtering criteria is on `User`, which, if I'm not mistaken, causes sqlalchemy to consult the mapper and bring in the relationship between `Thing` and `User`.

Anyway, I think your suggestion is super helpful. The component I'm working on does suffer from this non-coordinated query-modifying process. I need to tackle this problem on a different level.

Thanks for taking the time and writing this meticulous reply.

--
You received this message because you are subscribed to the Google Groups "sqlalchemy" group.
To post to this group, send email to sqlal...@googlegroups.com.
To unsubscribe from this group, send email to sqlalchemy+...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/sqlalchemy?hl=en.

Reply all

Reply to author

Forward