Left join on one-to-many relationship with filters

5,031 views
Skip to first unread message

Fotinakis

unread,
Mar 19, 2008, 8:31:23 PM3/19/08
to sqlalchemy
Hello all,

I have a question about SQLAlchemy when dealing with joining on one-to-
many relationships--I've looked through the tutorials and documentation
many times, but I just can't figure this out! Thanks for any help...

OK, so I'm going to make up a scenario close to what I'm doing, using
Users and Addresses.

A User can have multiple Addresses, and (important) an Address can be
one of many types (as in, a work address, a home address, or a mailing
address).

I need to basically accomplish this following SQL (get all users, and
include the work address if it exists):
(all pseudo code, just meant to get across the idea, no guarantees on
syntactical correctness)

SELECT *
FROM users
LEFT OUTER JOIN
( SELECT * FROM addresses WHERE type = 1 )
AS addresses ON users.id = addresses.uid

I can do this:

query =
session.query(User).add_entity(Address).outerjoin(addresses).filter(Address.type=='home')

But, that filters on the entire query, not just on the joined sub-
query, generating something like this:

SELECT *
FROM users
LEFT OUTER JOIN addresses ON users.id = addresses.uid
WHERE addresses.type = 1
ORDER BY hosts.mac

Because it's a one-to-many relationship, this query only returns the
users that have a home addresses ... and _excludes_ users totally who
have an address, but one that is not of type 'home'. I need it to
return all users regardless (hence the LEFT JOIN) and just join
addresses of type 1.

I hope that makes sense, any ideas?

Michael Bayer

unread,
Mar 20, 2008, 10:23:57 AM3/20/08
to sqlal...@googlegroups.com

you'd probably want to put the criterion in the ON clause:


session
.query
(User).add_entity(Address).select_from(users.outerjoin(addresses,
and_(Address.type=='home', Address.user_id=User.id)))

alternatively you can shove the actual subquery in there in a few
ways, one of them is like this:

sel = addresses.select().where(Address.type=='home')
session.query(User).add_entity(Address).outerjoin(('addresses',
sel))

or otherwise spell out the join to the subquery using select_from()
again.


Fotinakis

unread,
Mar 20, 2008, 4:47:41 PM3/20/08
to sqlalchemy
Ahhh ... that makes complete sense. Thanks very much Michael--glad to
have this figured out.

Manoj Mokashi

unread,
May 22, 2019, 1:02:06 AM5/22/19
to sqlalchemy
Hi Michael,

It would be nice to have something like options(filterrelated(relation, filter))
That way, since the join is already specified in the relation, we don't have to add it again.
Or is that difficult due to lazyloading ?

Btw, when we use contains_eager() with add_entity(), it creates two entities in the output instead of just the main one.
Is that expected ?

regards,
manoj

Mike Bayer

unread,
May 22, 2019, 2:35:26 AM5/22/19
to sqlal...@googlegroups.com
On Wed, May 22, 2019 at 1:02 AM Manoj Mokashi <manojm...@gmail.com> wrote:
>
> Hi Michael,
>
> It would be nice to have something like options(filterrelated(relation, filter))
> That way, since the join is already specified in the relation, we don't have to add it again.
> Or is that difficult due to lazyloading ?

hi there -

you're responding to a message that is 11 years old. Can you start
from scratch and specify what you are looking for?

"filterelated" as you describe would appear to correspond to:


query(Thing).outerjoin(Thing.relationship).filter(Thing.something ==
'bar').options(contains_eager(Thing.relationship))

that is, you are asking for three separate things, so there's three
separate methods to call. I don't see it as straightforward from an
API perspective to add various shortcut functions that perform these
tasks in a different way since this spreads confusion.

If you'd like you make your own "filterrelated" function, do this:

def filterrelated(relation, filter):
def transform(q):
return q.outerjoin(relation).filter(filter).options(contains_eager(relation))
return transform


then you use it like this:

my_query = my_query.with_transformation(filterrelated(relation, filter))

that way you can have the funciton you want and SQLAlchemy's official
API doesn't have to have a confusing array of functions that all do
similar yet different things.
> --
> SQLAlchemy -
> The Python SQL Toolkit and Object Relational Mapper
>
> http://www.sqlalchemy.org/
>
> To post example code, please provide an MCVE: Minimal, Complete, and Verifiable Example. See http://stackoverflow.com/help/mcve for a full description.
> ---
> You received this message because you are subscribed to the Google Groups "sqlalchemy" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+...@googlegroups.com.
> To post to this group, send email to sqlal...@googlegroups.com.
> Visit this group at https://groups.google.com/group/sqlalchemy.
> To view this discussion on the web visit https://groups.google.com/d/msgid/sqlalchemy/f10335a2-9b95-4f1c-881b-03202d697121%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

Manoj Mokashi

unread,
May 23, 2019, 1:11:51 AM5/23/19
to sqlal...@googlegroups.com
Hi Michael,

Thanks for the quick reply as usual.
You have understood the use case correctly.
Was aiming for DRY( don't repeat yourself) for the relation : once i declare the relation, i should not have to join to it again, to filter it.
The transform looks useful.

Another thing : Is it possible to specify which relation columns to fetch when i declare the relation ?

regards,
manoj

You received this message because you are subscribed to a topic in the Google Groups "sqlalchemy" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/sqlalchemy/2r-CzHOnGx4/unsubscribe.
To unsubscribe from this group and all its topics, send an email to sqlalchemy+...@googlegroups.com.

To post to this group, send email to sqlal...@googlegroups.com.
Visit this group at https://groups.google.com/group/sqlalchemy.

Manoj Mokashi

unread,
May 23, 2019, 8:04:01 AM5/23/19
to sqlal...@googlegroups.com
>>Was aiming for DRY( don't repeat yourself) for the relation : once i declare the relation, i should not have to join to it again, to filter it.

To be clearer,
The relationship could be loaded lazy or selectin, not necessarily a join.
It should load as defined, but with the filter applied.

regards,
manoj

Mike Bayer

unread,
May 23, 2019, 2:16:11 PM5/23/19
to sqlal...@googlegroups.com


On Thu, May 23, 2019, at 8:04 AM, Manoj Mokashi wrote:
>>Was aiming for DRY( don't repeat yourself) for the relation : once i declare the relation, i should not have to join to it again, to filter it.

DRY is about unnecessary repetition that makes it difficult to change parts of the code later, because certain elements are repeated in many places.    That's not the case here as there are two distinct API operations being performed on the object.



To be clearer,
The relationship could be loaded lazy or selectin, not necessarily a join.
It should load as defined, but with the filter applied.

filters don't apply to lazy or selectin loads or any other kind of load other than contains_eager() which is against the Query as it stands, so I'm not sure what you mean.


Manoj Mokashi

unread,
May 24, 2019, 6:17:13 AM5/24/19
to sqlal...@googlegroups.com
About DRY :
If i define a join condition in a relationship, but still have to define it again in the queries, it seems to go against DRY.
If the condition changes, i would have to change it everywhere. I suppose we could have method like joinAddress(query) to do it in a single place.

Consider getting only Mumbai city addresses of all users.
qry_users = session.query(User).outerjoin(User.addresses, Address.city=='Mumbai').options(contains_eager(User.addresses)).all()
Currently, the above is not allowed, i.e. specify additional condition for a relation in join. 
So we have to user Address entity and add the join condition again.
qry_users = session.query(User).outerjoin(Address, (Address.user_id == User.id) & (Address.city=='Mumbai')).options(contains_eager(User.addresses)).all()

If in this case, it were allowed to add the condition to the relation, it would be nice.

But it still forces me to use a join, whereas my relation could be set to load differently.
I know query api is pretty complex and flexible, and i am new to sqlalchemy, so i probably don't understand many things well.

I was thinking that if we had a options(filterrelated(User.addresses, Address.city=='Mumbai')), it would fetch Users,
and for each User, it would set this filter condition in User.addresses.
Later, when the addresses relation is to be loaded by whatever way, it would see this filter and apply it.
My usage then becomes :
qry_users = session.query(User).options(filterrelated(User.addresses, Address.city=='Mumbai')).all()



Simon King

unread,
May 24, 2019, 6:56:00 AM5/24/19
to sqlal...@googlegroups.com
(Below is just my opinion, feel free to ignore it)

I think this example goes against some of the philosophy of
SQLAlchemy, in that "User.addresses" is not really intended to be
filtered. Since a given user can only exist once in a session, if you
load that user with a filtered "addresses" property, any subsequent
code that uses the same session to load that user from the database
will also get that filtered view, whether they wanted it or not.

Also, I'm not sure what happens if the user has *already* been loaded
into the session; I don't know if your query will overwrite the
"user.addresses" property, or if it will leave it untouched.

Instead, I think you should probably use something like:

def get_city_addresses(session, city):
q = session.query(User, Address).outerjoin(User.addresses,
Address.city==city).order_by(User.id)
for user, addresses in itertools.groupby(q, key=lambda (u, a): u):
yield user, list(addresses)

Simon
> To view this discussion on the web visit https://groups.google.com/d/msgid/sqlalchemy/CALbcBG%2B5XcftoxL5zOGUZFf2Vntso_oMX58rj6HXjgY4LQcQKw%40mail.gmail.com.

Manoj Mokashi

unread,
May 24, 2019, 7:18:47 AM5/24/19
to sqlal...@googlegroups.com
HI Simon,
thanks or your input.
Yes, that was one question i forgot to add : how to prevent caching of addresses in this case.
From my experiment, one can use session.commit/rollback/expire_all or expire(entity) specifically.
That might mean caching is related to the transactions and not session ?
Have you run your query ? I thought a condition was not allowed for relations, i.e should be outerjoin Address rather than User.addresses.


Simon King

unread,
May 24, 2019, 8:10:12 AM5/24/19
to sqlal...@googlegroups.com
I haven't run my query, no. If that form doesn't work, try this instead:

session.query(User,
Address).outerjoin(User.addresses).filter(Address.city==city)

"Caching" is a property of the session, but it can be affected by the
transaction. By default, Session.expire_on_commit is True, so all
cached relationships are expired when you commit.

Simon
> To view this discussion on the web visit https://groups.google.com/d/msgid/sqlalchemy/CALbcBG%2BZa%2BednRBT_R2Jn3SqoVPGJGgf1%3DwD7v%2BczTL3u%2BZBzw%40mail.gmail.com.

Mike Bayer

unread,
May 24, 2019, 9:03:44 AM5/24/19
to sqlal...@googlegroups.com


On Fri, May 24, 2019, at 6:17 AM, Manoj Mokashi wrote:
About DRY :
If i define a join condition in a relationship, but still have to define it again in the queries, it seems to go against DRY.
If the condition changes, i would have to change it everywhere. I suppose we could have method like joinAddress(query) to do it in a single place.


Consider getting only Mumbai city addresses of all users.
qry_users = session.query(User).outerjoin(User.addresses, Address.city=='Mumbai').options(contains_eager(User.addresses)).all()
Currently, the above is not allowed, i.e. specify additional condition for a relation in join. 
So we have to user Address entity and add the join condition again.
qry_users = session.query(User).outerjoin(Address, (Address.user_id == User.id) & (Address.city=='Mumbai')).options(contains_eager(User.addresses)).all()
If in this case, it were allowed to add the condition to the relation, it would be nice.

so as far as contains_eager() + join() and explicitly being able to add conditions to a relatonship, a lot of people complain about this.    there was an email thread about it regarding many-to-ones, and then also I proposed a feature for a one-to-many (should work with many-to-one also) version here: https://github.com/sqlalchemy/sqlalchemy/issues/4472#issuecomment-458995069   note that you *can* add conditions to a one-to-many relationship as detailed in that issue via "Class.relationship & (<some criteria>)", see the examples.

I have it targeted at 1.4 but that is not necessarily where it will happen because I have a lot of other things going on for 1.4 and that feature is not a simple one.



But it still forces me to use a join, whereas my relation could be set to load differently.
I know query api is pretty complex and flexible, and i am new to sqlalchemy, so i probably don't understand many things well.

I was thinking that if we had a options(filterrelated(User.addresses, Address.city=='Mumbai')), it would fetch Users,
and for each User, it would set this filter condition in User.addresses.

there's still a join there.   in the above proposal you'd use joinedload() and it's just one call.

Jonathan Vanasco

unread,
May 24, 2019, 12:48:23 PM5/24/19
to sqlalchemy
I agree with Simon, and I think I'm very much -1 on the usage of contains_eager above (and I think that ticket you linked to, Mike).  I find that sort of stuff causes a lot of bugs in the long run.

I am looking at it from this perspective, which is the same as Simon's but some stronger language...

The relationship `User.addresses` is specified as the logical relationship of a User to *all* Addresses.
If the addresses are filtered to only "home" or a particular city, that collection does not represent **all** Addresses and should not be mapped to the `.addresses` relationship.

By allowing `contains_eager` on that filtered view, it is very possible that another section of code will be utilizing the object and not knowing that .addresses is filtered.  The filtered view is Addresses, but not User.addresses.  I've probably wanted to do this myself in the past, but experience has really taught me that this should not be allowed.  

I'd be +1 on a feature that prevents `contains_eager` to be invoked on a relationship if it is filtered/joined differently -  though that would be immensely hard to detect on complex joins and likely not worth the effort.

Mike Bayer

unread,
May 24, 2019, 1:29:32 PM5/24/19
to sqlal...@googlegroups.com


On Fri, May 24, 2019, at 12:48 PM, Jonathan Vanasco wrote:
I agree with Simon, and I think I'm very much -1 on the usage of contains_eager above (and I think that ticket you linked to, Mike).  I find that sort of stuff causes a lot of bugs in the long run.

I am looking at it from this perspective, which is the same as Simon's but some stronger language...

The relationship `User.addresses` is specified as the logical relationship of a User to *all* Addresses.
If the addresses are filtered to only "home" or a particular city, that collection does not represent **all** Addresses and should not be mapped to the `.addresses` relationship.

So, this is absolutely how this was intended.  However, for years, and years, and years, people constantly ask for their object-bound relationships to be filtered, hence contains_eager() was created to make this possible.  I don't think I have *ever* used contains_eager() to produce a filtered relationship, for the reasons you mention; the collection is defined as *all* records and if you start messing with that, you don't know what you have, because the ORM uses an identity map.

But boy, do people want it to be filtered.   I don't know what they're doing, and perhaps if everyone that uses filtered relationships eventually realizes it's a bad idea, if there were some way to describe that.  I mean, see if you can convince the person who posted issue https://github.com/sqlalchemy/sqlalchemy/issues/4472 they don't need to do this.  


By allowing `contains_eager` on that filtered view, it is very possible that another section of code will be utilizing the object and not knowing that .addresses is filtered. 

yup



The filtered view is Addresses, but not User.addresses.  I've probably wanted to do this myself in the past, but experience has really taught me that this should not be allowed.  


I'd be +1 on a feature that prevents `contains_eager` to be invoked on a relationship if it is filtered/joined differently -  though that would be immensely hard to detect on complex joins and likely not worth the effort.

this is kind of the documented purpose of contains_eager(), to filter the join condition, so if you are using it for a collection, it's the exception rather than the norm that you'd be getting back primary objects with their complete collections associated.




--
SQLAlchemy -
The Python SQL Toolkit and Object Relational Mapper
 
 
To post example code, please provide an MCVE: Minimal, Complete, and Verifiable Example. See http://stackoverflow.com/help/mcve for a full description.
---
You received this message because you are subscribed to the Google Groups "sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+...@googlegroups.com.
To post to this group, send email to sqlal...@googlegroups.com.
Reply all
Reply to author
Forward
0 new messages