Ambiguous joins, relationships, and aliases

111 views
Skip to first unread message

Dane K Barney

unread,
Jun 17, 2021, 1:04:14 PM6/17/21
to sqlalchemy
I have scoured the documentation looking for the preferred way of handling ambiguous joins, and it seems to be through use of aliased tables, but I still wonder if there's a better way. Intuitively, it seems like relationships hold enough information to make aliased tables unnecessary. But the API doesn't behave in that way (AFAIK), and I'm curious why.

Suppose I have the following the model:

Base = declarative_base()

class Parent(Base):
    __tablename__ = "parent"
    id = Column(Integer, primary_key=True)

class Child(Base):
    __tablename__ = "child"
    id = Column(Integer, primary_key=True)
    mother_id = Column(Integer, ForeignKey("parent.id"))
    father_id = Column(Integer, ForeignKey("parent.id"))

    mother = relationship("Parent", foreign_keys=[mother_id])
    father = relationship("Parent", foreign_keys=[father_id])

The Child table can join to the Parent table through 2 possible relationships: mother and father. When querying for Child entities, in order to reference columns through either of these relationships, it would appear that aliased() is required.

Something like this:

mother_alias = aliased(Parent)
father_alias = aliased(Parent)
statement = select(Child).join(mother_alias, Child.mother).join(father_alias, Child.father).where(mother_alias.id == "Martha", father_alias.id == "Bob")

What I find strange about this is that the alias and relationship it's associated with are not inherently coupled together. I associate them with one another when I pass them together into the join method(), but this association is only implicit and not enforced. It's up to me to keep track of which alias belongs with which relationship.

Given that I can join to a table using a relationship property, why is it not possible to also select columns and filter by columns using relationship properties? It seems like this would eliminate the need for aliased() in the above example entirely. This feels much more intuitive to me:

statement = select(Child).join(Child.mother).join(Child.father).where(Child.mother.id == "Martha", Child.father.id == "Bob")

In other words, the relationship can act as a proxy to its target table, and you can access all columns on that table through the relationship. The concept of an "aliased table" would be internal to the relationship, and the developer would not have to manage these aliases themselves.

Is this possible? Is there an obvious reason I'm missing for why the API cannot function in this way?

Thanks for any help or insight you can provide!
-Dane

Mike Bayer

unread,
Jun 17, 2021, 6:09:21 PM6/17/21
to noreply-spamdigest via sqlalchemy


On Thu, Jun 17, 2021, at 1:04 PM, Dane K Barney wrote:
I have scoured the documentation looking for the preferred way of handling ambiguous joins, and it seems to be through use of aliased tables, but I still wonder if there's a better way. Intuitively, it seems like relationships hold enough information to make aliased tables unnecessary.

technically, they do.  





But the API doesn't behave in that way (AFAIK), and I'm curious why.

We have had an API that does what you seek here for many years and quite frankly it's being removed in SQLAlchemy 2.0.   It is one of the most complex parts of the Query object and has for many years made adding new behaviors, fixing issues , and refactoring code extremely painful.    On top of that, it's a feature that we've observed virtually nobody uses, is aware of, or understands.



Suppose I have the following the model:

Base = declarative_base()

class Parent(Base):
    __tablename__ = "parent"
    id = Column(Integer, primary_key=True)

class Child(Base):
    __tablename__ = "child"
    id = Column(Integer, primary_key=True)
    mother_id = Column(Integer, ForeignKey("parent.id"))
    father_id = Column(Integer, ForeignKey("parent.id"))

    mother = relationship("Parent", foreign_keys=[mother_id])
    father = relationship("Parent", foreign_keys=[father_id])

The Child table can join to the Parent table through 2 possible relationships: mother and father. When querying for Child entities, in order to reference columns through either of these relationships, it would appear that aliased() is required.

Something like this:

mother_alias = aliased(Parent)
father_alias = aliased(Parent)
statement = select(Child).join(mother_alias, Child.mother).join(father_alias, Child.father).where(mother_alias.id == "Martha", father_alias.id == "Bob")

What I find strange about this is that the alias and relationship it's associated with are not inherently coupled together. I associate them with one another when I pass them together into the join method(), but this association is only implicit and not enforced. It's up to me to keep track of which alias belongs with which relationship.

Given that I can join to a table using a relationship property, why is it not possible to also select columns and filter by columns using relationship properties? It seems like this would eliminate the need for aliased() in the above example entirely. This feels much more intuitive to me:

statement = select(Child).join(Child.mother).join(Child.father).where(Child.mother.id == "Martha", Child.father.id == "Bob")

In other words, the relationship can act as a proxy to its target table, and you can access all columns on that table through the relationship. The concept of an "aliased table" would be internal to the relationship, and the developer would not have to manage these aliases themselves

Yes using legacy Query you would say:

q = session.query(Child).join(Child.mother, aliased=True).filter(Parent.id == "x").join(Child.father, aliased=True).filter(Parent.id == "y").

The order of join() / filter() is important as the Query has to keep track of the last join(Child.x,  aliased) as the current meaning of  "Parent".

So what's different here is that there is no "magic namespace" introduced on the relationship(), e.g. "Child.mother.id".   The "Child.mother" attribute already has its own set of query methods so is not appropriate as a namespace of columns; also, people would intuitively want to do things like "Child.mother.related_foo.related_bar.x.y.z" which would in theory represent some kind of join but there's no simple way to manage that when any attribute can span out like that (try out the Django ORM for more of that kind of approach).    Additionally, hardcoding these namespaces to "Child.mother" / "Child.father" means the API cannot be used for the more common approach of querying an adjacency list model, which looks like this:

class Node(Base):
   __tablename__ = 'node'

   id = Column(Integer, primary_key=True)
   parent_id = Column(ForeignKey("node.id"))
   data = Column(String)
   parent = relationship("Node", remote_side=id)

Above, to query Node many levels deep means we need "Node" itself to be interpreted differently at each step, so "Node.related" would remain ambiguous:

q = session.query(Node).filter(Node.data=='child').join(Node.parent, aliased=True).filter(Node.data == "parent").join(Node.parent, aliased=True).filter(Node.data == "grandparent")












.

Is this possible? Is there an obvious reason I'm missing for why the API cannot function in this way?

the reason is non-obvious as it took me ten years at least to decide this API wasn't worth the complexity.   it's still there in 1.4 but I am super excited to get rid of it.    Explicit is better than implicit, your code will be easier to read with the explicit aliased objects (more like real SQL)  and people were rightfully not very interested in digging into more implicitness.




Thanks for any help or insight you can provide!
-Dane


--
SQLAlchemy -
The Python SQL Toolkit and Object Relational Mapper
 
 
To post example code, please provide an MCVE: Minimal, Complete, and Verifiable Example. See http://stackoverflow.com/help/mcve for a full description.
---
You received this message because you are subscribed to the Google Groups "sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+...@googlegroups.com.

Reply all
Reply to author
Forward
0 new messages