Relationship with complicated join conditions

788 views
Skip to first unread message

Simon King

unread,
Jan 22, 2014, 11:54:19 AM1/22/14
to sqlal...@googlegroups.com
Hi all,

I've been having a little trouble configuring a relationship between 2
mapped classes where the join condition pulls in another 2 tables. I
eventually got it working based on Mike's "method one" from
http://stackoverflow.com/questions/17580649/sqlalchemy-relationships-across-multiple-tables,
by adding foreign() and remote() around a couple of columns involved
in the join.

However, I don't really understand what I'm doing, and probably as a
result of that, the relationship doesn't work in conjunction with
joinedload (one of the tables is missing from the FROM clause).

I can work up a proper example script, but before that I'd really like
to understand the effect that foreign() and remote() have when
constructing the relationship. The comment in the SO post says:

# B.id is "remote", well at the moment this is kind of
# where the ORM wants it, it sort of means "this is where the stuff
# starts that's not directly part of the A side"

In my situation, I've got tables A, B, C and D, with foreign keys:

A->B
B->D
C->A
C->D

and then extra constraints between B and C, and A and D, and I'm
trying to build a relationship from A to D. In all that, I've no real
idea where to put foreign() or remote().

Thanks for any advice,

Simon

Jonathan Vanasco

unread,
Jan 22, 2014, 5:32:00 PM1/22/14
to sqlal...@googlegroups.com

Simon King

unread,
Jan 22, 2014, 6:43:31 PM1/22/14
to sqlal...@googlegroups.com
> On 22 Jan 2014, at 22:32, Jonathan Vanasco <jona...@findmeon.com> wrote:
>
> Would an association proxy help ?
>
> http://docs.sqlalchemy.org/en/rel_0_9/orm/extensions/associationproxy.html
>

Thanks for the suggestion. In fact I'm already using association
proxies in a couple of places, but I'm not sure it'll help in this
case, because what I'm really after is SA's ability to eager load the
relationship. Despite the intermediate tables, this is actually a 1-1
relationship.

For anyone who is interested, the tables I'm working with represent
users (table A) who belong to companies (table B). Companies have a
foreign key to a set of terms and conditions that their members must
accept (table D, different companies may have different terms). The
terms table has a version number column to keep track of when the
terms are updated.

Table C keeps track of the acceptances, with foreign keys to the user
and terms tables, and also the version of the terms at the time they
were accepted.

The relationship I'm trying to model is the "current valid
acceptance", ie the row in table C that points to the user, to the
same set of terms as the company that the user belongs to, and has the
same version number that the terms has. The join conditions look
something like:

User.companyid == Company.id
Company.termsid == Terms.id
Acceptance.userid == User.id
Acceptance.termsid == Company.termsid
Acceptance.termsversion == Terms.version

And in fact one more condition as well: DATEDIFF(NOW(),
Acceptance.datetime) <= 365

If this relationship finds a matching row, the user has accepted the
terms in the last year. But if no row matches, the user must
re-accept.

I'll work up the test script tomorrow to clarify.

Thanks,

Simon

Michael Bayer

unread,
Jan 22, 2014, 6:45:10 PM1/22/14
to sqlal...@googlegroups.com

On Jan 22, 2014, at 11:54 AM, Simon King <si...@simonking.org.uk> wrote:

> Hi all,
>
> I've been having a little trouble configuring a relationship between 2
> mapped classes where the join condition pulls in another 2 tables. I
> eventually got it working based on Mike's "method one" from
> http://stackoverflow.com/questions/17580649/sqlalchemy-relationships-across-multiple-tables,
> by adding foreign() and remote() around a couple of columns involved
> in the join.
>
> However, I don't really understand what I'm doing, and probably as a
> result of that, the relationship doesn't work in conjunction with
> joinedload (one of the tables is missing from the FROM clause).

I’m actually not even sure the example I put in that stack overflow answer would work with joinedload(). joinedload() is designed to think of joining between two classes as a join between two tables, or if secondary is present, between three tables, and it is hardwired to do this by rendering a LEFT OUTER JOIN, which requires a specific table (or sub selectable) on each side.

When the primaryjoin mentions columns from a third (or fourth, etc.) table, lots of cases work when SQLAlchemy is rendering an “implicit join”, e.g. no JOIN keyword. The lazyload is one such case; lazyload doesn’t even render the parent table, it only renders the target table in the SELECT. if its WHERE clause happens to refer to some other table, that table gets pulled into the FROM clause, as the select() construct pulls everything from the WHERE clause into the FROM.

Within a join, that doesn’t work. If you do a select() like this:

select([a]).select_from(a.join(b, a.c.id == b.c.id)).where(b.c.c_id == c.c.id)

you’ll get the almost always not wanted SQL:

SELECT a.* FROM c, a JOIN b ON a.id = b.id WHERE b.c_id = c.id

where above we have “c, a JOIN b”, that’s the common failure case here. The “c” isn’t part of the join.

So really, “method one” is not a great answer here. Some version of either “method two” or “method three”, that is a non-primary mapper() to the destination tables, will be more resilient to JOIN use cases.

This whole subject area is a big missing hole in the docs. The use cases are complicated and we really don’t want to encourage long-chained primaryjoins in any case, unless they’re really needed.


> I can work up a proper example script, but before that I'd really like
> to understand the effect that foreign() and remote() have when
> constructing the relationship. The comment in the SO post says:
>
> # B.id is "remote", well at the moment this is kind of
> # where the ORM wants it, it sort of means "this is where the stuff
> # starts that's not directly part of the A side"
>
> In my situation, I've got tables A, B, C and D, with foreign keys:
>
> A->B
> B->D
> C->A
> C->D
>
> and then extra constraints between B and C, and A and D, and I'm
> trying to build a relationship from A to D. In all that, I've no real
> idea where to put foreign() or remote().

so really foreign() and remote() isn’t going to help with the join/joinedload use case, as there’s no system whereby joinedload() knows to render "a JOIN b JOIN c JOIN d” and such. foreign() and remote() are just trying to tell the relationship which columns are part of which side, and in which direction data should be copied during a flush (e.g. when we flush user->addresses, we are always copying user.id -> address.user_id, that is, PK -> FK - the foreign() annotation is a way of controlling which column in “user.id == address.user_id” is the “FK”).

Going forward, the a->secondary->b technique, as well as the a-><non primary mapper of b, c, d, e…> technique, I am hoping will become more feasible, now that the ORM can nest JOINs together without needing to use a SELECT subquery.

However, trying out your use case the way I’d like, doesn’t work yet, which is that I’d like to set up “secondary” just as a join() object. In 0.9.1, you still get a SELECT subquery for join() and joinedload(), though you’ll get the right results; but lazy loads are failing. I can fix both issues but that would be in 0.9.2.

In 0.9.1, the following setup seems to work so far, but it requires a SELECT subquery for the “secondary” table to work fully, which is the join of B and C:

from sqlalchemy import *
from sqlalchemy.orm import *
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class A(Base):
__tablename__ = 'a'

id = Column(Integer, primary_key=True)
b_id = Column(ForeignKey('b.id'))


class B(Base):
__tablename__ = 'b'

id = Column(Integer, primary_key=True)
d_id = Column(ForeignKey('d.id'))

class C(Base):
__tablename__ = 'c'

id = Column(Integer, primary_key=True)
a_id = Column(ForeignKey('a.id'))
d_id = Column(ForeignKey('d.id'))

class D(Base):
__tablename__ = 'd'

id = Column(Integer, primary_key=True)

a = A.__table__
b = B.__table__
d = D.__table__
c = C.__table__

j = join(b, d, b.c.d_id == d.c.id).join(c, c.c.d_id == d.c.id).alias()
A.d = relationship("D",
secondary=j,
primaryjoin=and_(a.c.b_id == j.c.b_id, a.c.id == j.c.c_a_id),
secondaryjoin=d.c.id == j.c.b_d_id)


e = create_engine("postgresql://scott:tiger@localhost/test", echo=True)
Base.metadata.drop_all(e)
Base.metadata.create_all(e)

sess = Session(e)

sess.query(A).options(joinedload(A.d)).all()
sess.query(A).join(A.d).all()



signature.asc

Simon King

unread,
Jan 22, 2014, 7:17:23 PM1/22/14
to sqlal...@googlegroups.com
> On 22 Jan 2014, at 23:45, Michael Bayer <mik...@zzzcomputing.com> wrote:
>
>
>> On Jan 22, 2014, at 11:54 AM, Simon King <si...@simonking.org.uk> wrote:
>>
>> Hi all,
>>
>> I've been having a little trouble configuring a relationship between 2
>> mapped classes where the join condition pulls in another 2 tables. I
>> eventually got it working based on Mike's "method one" from
>> http://stackoverflow.com/questions/17580649/sqlalchemy-relationships-across-multiple-tables,
>> by adding foreign() and remote() around a couple of columns involved
>> in the join.
>>
>> However, I don't really understand what I'm doing, and probably as a
>> result of that, the relationship doesn't work in conjunction with
>> joinedload (one of the tables is missing from the FROM clause).
>
> I’m actually not even sure the example I put in that stack overflow answer would work with joinedload(). joinedload() is designed to think of joining between two classes as a join between two tables, or if secondary is present, between three tables, and it is hardwired to do this by rendering a LEFT OUTER JOIN, which requires a specific table (or sub selectable) on each side.
>
> When the primaryjoin mentions columns from a third (or fourth, etc.) table, lots of cases work when SQLAlchemy is rendering an “implicit join”, e.g. no JOIN keyword. The lazyload is one such case; lazyload doesn’t even render the parent table, it only renders the target table in the SELECT. if its WHERE clause happens to refer to some other table, that table gets pulled into the FROM clause, as the select() construct pulls everything from the WHERE clause into the FROM.
>
> Within a join, that doesn’t work. If you do a select() like this:
>
> select([a]).select_from(a.join(b, a.c.id == b.c.id)).where(b.c.c_id == c.c.id)
>
> you’ll get the almost always not wanted SQL:
>
> SELECT a.* FROM c, a JOIN b ON a.id = b.id WHERE b.c_id = c.id
>
> where above we have “c, a JOIN b”, that’s the common failure case here. The “c” isn’t part of the join.
>

Ah, OK, that certainly explains some of the behaviour I was seeing.


> So really, “method one” is not a great answer here. Some version of either “method two” or “method three”, that is a non-primary mapper() to the destination tables, will be more resilient to JOIN use cases.
>

I read the bit in the docs about non-primary mappers but was scared
off by the "almost never needed" warnings. Actually, for the purposes
I'm using it for, a completely separate class mapped to the select
would probably be fine.


> This whole subject area is a big missing hole in the docs. The use cases are complicated and we really don’t want to encourage long-chained primaryjoins in any case, unless they’re really needed.
>
>
>> I can work up a proper example script, but before that I'd really like
>> to understand the effect that foreign() and remote() have when
>> constructing the relationship. The comment in the SO post says:
>>
>> # B.id is "remote", well at the moment this is kind of
>> # where the ORM wants it, it sort of means "this is where the stuff
>> # starts that's not directly part of the A side"
>>
>> In my situation, I've got tables A, B, C and D, with foreign keys:
>>
>> A->B
>> B->D
>> C->A
>> C->D
>>
>> and then extra constraints between B and C, and A and D, and I'm
>> trying to build a relationship from A to D. In all that, I've no real
>> idea where to put foreign() or remote().
>
> so really foreign() and remote() isn’t going to help with the join/joinedload use case, as there’s no system whereby joinedload() knows to render "a JOIN b JOIN c JOIN d” and such. foreign() and remote() are just trying to tell the relationship which columns are part of which side, and in which direction data should be copied during a flush (e.g. when we flush user->addresses, we are always copying user.id -> address.user_id, that is, PK -> FK - the foreign() annotation is a way of controlling which column in “user.id == address.user_id” is the “FK”).
>

Ok, thanks for the explanation.
Mike, thanks as always for the very comprehensive answer. I'm
currently on 0.8 but this seems like a good incentive to upgrade. I'll
try it out tomorrow and will let you know how I get on.

Thanks again,

Simon

Michael Bayer

unread,
Jan 22, 2014, 8:45:21 PM1/22/14
to sqlal...@googlegroups.com

On Jan 22, 2014, at 7:17 PM, Simon King <si...@simonking.org.uk> wrote:

>
> I read the bit in the docs about non-primary mappers but was scared
> off by the "almost never needed" warnings. Actually, for the purposes
> I'm using it for, a completely separate class mapped to the select
> would probably be fine.

So I’ve added new doc sections:

http://docs.sqlalchemy.org/en/rel_0_9/orm/relationships.html#composite-secondary-join

http://docs.sqlalchemy.org/en/rel_0_9/orm/relationships.html#relationship-to-non-primary-mapper


the first one is the one I gave you, and various aspects of that example need things I just committed, though the more version I gave in this email *should* work on 0.8, more or less. The second is showing the classic “secondary mapper” use case and I’ve tried to adjust some of the other language that refers to it. That case is also applicable to 0.8 more or less. I think I’ve identified the difference between “long primaryjoin”, “elaborate secondary”, and “non primary mapper” like this:

1. long primaryjoin works until you have tables mentioned in the clause which are joined to each other, e.g. C and D in these examples. a JOIN between all tables won’t render correctly.

2. elaborate secondary works until you have to refer to conditions directly between A and B, not just within the “secondary” selectable to A and B separately.

3. non primary mapper allows you to set up any series of columns as a target and build a relationship to it, but the current restrictions are either that you have to join to a subquery which is inefficient, or if you join to a join(), you have to disambiguate the names of columns; in both cases you get a bunch of extra columns stuck on your mapping also.


>
>> Going forward, the a->secondary->b technique, as well as the a-><non primary mapper of b, c, d, e...> technique, I am hoping will become more feasible, now that the ORM can nest JOINs together without needing to use a SELECT subquery.
> --
> You received this message because you are subscribed to the Google Groups "sqlalchemy" group.
> To unsubscribe from this group and stop receiving emails from it, send an email tosqlalchemy...@googlegroups.com.
> To post to this group, send email to sqlal...@googlegroups.com.
> Visit this group at http://groups.google.com/group/sqlalchemy.
> For more options, visit https://groups.google.com/groups/opt_out.

signature.asc

Simon King

unread,
Jan 23, 2014, 2:07:20 PM1/23/14
to sqlal...@googlegroups.com
That's fantastic, thanks so much. I feel bad that my silly use case
has caused so much work for you and grown the docs even more (perhaps
you need a separate "Tricks for People who Should Know Better"
section)

If I understand what you've written correctly, the non-primary-mapper
version is the only one that will meet my needs. It seems to be
working well, both lazy and eager loading, and the extra properties
appearing on the class aren't an issue (I'm marking a lot of them as
deferred so as not to load too much from the database).

Thanks again,

Simon

Michael Bayer

unread,
Jan 23, 2014, 2:46:13 PM1/23/14
to sqlal...@googlegroups.com

On Jan 23, 2014, at 2:07 PM, Simon King <si...@simonking.org.uk> wrote:

>
> That's fantastic, thanks so much. I feel bad that my silly use case
> has caused so much work for you and grown the docs even more (perhaps
> you need a separate "Tricks for People who Should Know Better"
> section)

oh but this use case comes up all the time, it’s now that SQLA can really write a decent SELECT for this kind of thing that I felt it was time to make it an official use case on the site.

>
> If I understand what you've written correctly, the non-primary-mapper
> version is the only one that will meet my needs. It seems to be
> working well, both lazy and eager loading, and the extra properties
> appearing on the class aren't an issue (I'm marking a lot of them as
> deferred so as not to load too much from the database).

Right, I think the differentiating factor when NP is needed is that you’re joining A->B, there’s any number of C, D in the middle, but also A and B have some direct foreign keys as well. is that the case for you?

the non_primary use case could still benefit from some cleanup; the column naming issue as well as the fact that you get those extra columns in your results.
signature.asc

Simon King

unread,
Jan 24, 2014, 8:40:03 AM1/24/14
to sqlal...@googlegroups.com
For anyone interested, here is roughly what I ended up with (full test
script attached):

import sqlalchemy as sa
import sqlalchemy.orm as saorm
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class Terms(Base):
__tablename__ = 'terms'
id = sa.Column(sa.Integer(), primary_key=True)
version = sa.Column(sa.Integer(), default=1)
terms = sa.Column(sa.UnicodeText())

class Company(Base):
__tablename__ = 'company'
id = sa.Column(sa.Integer(), primary_key=True)
name = sa.Column(sa.Unicode(60))
termsid = sa.Column(sa.Integer, sa.ForeignKey(Terms.id),
nullable=False)
terms = saorm.relationship(Terms)

class User(Base):
__tablename__ = 'user'
id = sa.Column(sa.Integer(), primary_key=True)
name = sa.Column(sa.Unicode(60))
companyid = sa.Column(sa.Integer, sa.ForeignKey(Company.id),
nullable=False)
company = saorm.relationship(Company, backref='members')

class Acceptance(Base):
__tablename__ = 'acceptance'
userid = sa.Column(sa.Integer(), sa.ForeignKey(User.id),
primary_key=True)
termsid = sa.Column(sa.Integer(), sa.ForeignKey(Terms.id),
primary_key=True)
termsversion = sa.Column(sa.Integer(), nullable=False)

user = saorm.relationship(User)
terms = saorm.relationship(Terms)

j = (saorm.join(Company, Terms)
.join(Acceptance,
sa.and_(Acceptance.termsid == Terms.id,
Acceptance.termsversion == Terms.version)
)
)
currentacceptance = saorm.mapper(
Acceptance,
j,
properties={
'companyid': j.c.company_id,
'termsid': [j.c.terms_id,
j.c.company_termsid,
j.c.acceptance_termsid],
},
non_primary=True)

User.currentacceptance = saorm.relationship(
currentacceptance,
primaryjoin=sa.and_(
User.id == saorm.foreign(j.c.acceptance_userid),
User.companyid == j.c.company_id,
),
viewonly=True,
uselist=False)


Thanks again to Mike for all your help,

Simon
sajointest.py
Reply all
Reply to author
Forward
0 new messages