Using sqlalchemy to figure out multi table relationships

2,280 views
Skip to first unread message

Abdeali Kothari

unread,
Jul 11, 2019, 7:41:31 AM7/11/19
to sqlalchemy
I am trying to use SQLAlchemy to do some smart joins for me without me having to explicitly figure out the joins during queries.
(i.e. by figuring out the relationships on its own to figure out how the tables are related to each other)

I have an example where i have BookSeries -> Book -> Boot2AuthorTable -> Author
to link a series to the authors who wrote the series.

If I do something like:
>>> print(Query(BookSeries).join(Author))
It throws an error:
InvalidRequestError: Don't know how to join to <class '__main__.Author'>;
                     please use an ON clause to more clearly establish the left side of this join

Doing an explicit join one-by-one
>>> print(Query(BookSeries).join(Book).join(Book2Author).join(Author))
SELECT ...
FROM bookseries
  JOIN book ON bookseries.series_id = book.series_id
  JOIN auth2book ON book.book_id = auth2book.book_id
  JOIN author ON author.author_id = auth2book.author_id

Seems to do what I expected it to do.

I'm trying to figure out if there any way for me to not have to give it all the tables in between and it auto-magically figured it out for me ?
Note: I understand that not all examples are as simple as this one. And there are nuances about when to do join/leftjoin/etc. and also about multiple possible paths existing between the tables.
    Assuming those are not an issue for now.

Also, the reason I do not want to mention the intermediate tables myself, is because the schema of all the tables are not managed by me - as it is read from an external database.

Either sqlalchemy itself, extensions, or third party libraries, or any pointers on logic to how I can solve something like this would be appreciated !



import sqlalchemy as sa
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm.query import Query
from sqlalchemy.orm import relationship

Base = declarative_base()

class BookSeries(Base):
    __tablename__ = "bookseries"
    pk_id = sa.Column(sa.String, primary_key=True)
    series_id = sa.Column(sa.String)
    series_name = sa.Column(sa.String)
    books = relationship('Book', back_populates='book_series')



class Book(Base):
    __tablename__ = "book"
    pk_id = sa.Column(sa.String, primary_key=True)
    book_id = sa.Column(sa.String)
    series_id = sa.Column(sa.String, sa.ForeignKey('bookseries.series_id'))
    book_name = sa.Column(sa.String)
    book_series = relationship('BookSeries', back_populates='books')
    book_authors = relationship('Book2Author', back_populates='book')


class Book2Author(Base):
    __tablename__ = "auth2book"
    pk_id = sa.Column(sa.String, primary_key=True)
    author_id = sa.Column(sa.String, sa.ForeignKey('author.author_id'))
    book_id = sa.Column(sa.String, sa.ForeignKey('book.book_id'))
    author = relationship('Author')
    book = relationship('Book', back_populates='book_authors')


class Author(Base):
    __tablename__ = "author"
    pk_id = sa.Column(sa.String, primary_key=True)
    author_id = sa.Column(sa.String)
    author_name = sa.Column(sa.String)

Mike Bayer

unread,
Jul 11, 2019, 10:16:00 AM7/11/19
to noreply-spamdigest via sqlalchemy


On Thu, Jul 11, 2019, at 7:41 AM, Abdeali Kothari wrote:
I am trying to use SQLAlchemy to do some smart joins for me without me having to explicitly figure out the joins during queries.
(i.e. by figuring out the relationships on its own to figure out how the tables are related to each other)

I have an example where i have BookSeries -> Book -> Boot2AuthorTable -> Author
to link a series to the authors who wrote the series.

If I do something like:
>>> print(Query(BookSeries).join(Author))
It throws an error:
InvalidRequestError: Don't know how to join to <class '__main__.Author'>;
                     please use an ON clause to more clearly establish the left side of this join

Doing an explicit join one-by-one
>>> print(Query(BookSeries).join(Book).join(Book2Author).join(Author))
SELECT ...
FROM bookseries
  JOIN book ON bookseries.series_id = book.series_id
  JOIN auth2book ON book.book_id = auth2book.book_id
  JOIN author ON author.author_id = auth2book.author_id

Seems to do what I expected it to do.

I'm trying to figure out if there any way for me to not have to give it all the tables in between and it auto-magically figured it out for me ?
Note: I understand that not all examples are as simple as this one. And there are nuances about when to do join/leftjoin/etc. and also about multiple possible paths existing between the tables.
    Assuming those are not an issue for now.

Also, the reason I do not want to mention the intermediate tables myself, is because the schema of all the tables are not managed by me - as it is read from an external database.

Either sqlalchemy itself, extensions, or third party libraries, or any pointers on logic to how I can solve something like this would be appreciated !


SQLAlchemy only knows how to do such joins if there is a relationship() construct present.

Since it seems you are saying you don't want to write the relationship() yourself, the automap extension will attempt to guess these for you:


however automap may lack the necessary API granularity to implement *only* relationships and not figure everything else out too, but I'd suggest playing with it and maybe reading the source a bit to get some ideas for how you might adapt this to your exact needs.



--
SQLAlchemy -
The Python SQL Toolkit and Object Relational Mapper
 
 
To post example code, please provide an MCVE: Minimal, Complete, and Verifiable Example. See http://stackoverflow.com/help/mcve for a full description.
---
You received this message because you are subscribed to the Google Groups "sqlalchemy" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+...@googlegroups.com.
To post to this group, send email to sqlal...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Mike Bayer

unread,
Jul 11, 2019, 10:19:26 AM7/11/19
to noreply-spamdigest via sqlalchemy

On Thu, Jul 11, 2019, at 7:41 AM, Abdeali Kothari wrote:
I am trying to use SQLAlchemy to do some smart joins for me without me having to explicitly figure out the joins during queries.
(i.e. by figuring out the relationships on its own to figure out how the tables are related to each other)

I have an example where i have BookSeries -> Book -> Boot2AuthorTable -> Author
to link a series to the authors who wrote the series.

If I do something like:
>>> print(Query(BookSeries).join(Author))
It throws an error:
InvalidRequestError: Don't know how to join to <class '__main__.Author'>;
                     please use an ON clause to more clearly establish the left side of this join

Doing an explicit join one-by-one
>>> print(Query(BookSeries).join(Book).join(Book2Author).join(Author))

to be more specific, instead of using automap directly, you could look at how it traverses through all the tables in a MetaData collection to find linkages, and you could write your own function:

def join(some_query, source, dest):
  # ...

which finds a path between source and dest.  There can of course be multiple such paths but it's a pretty standard comp sci problem if this is what you re looking to do :)     Tables are nodes, edges are ForeignKey objects which you can collect by iterating through Table.foreign_keys.  
 



Abdeali Kothari

unread,
Jul 11, 2019, 10:35:13 AM7/11/19
to sqlal...@googlegroups.com
I was hoping to not do graph traversals myself, because for the more complex cases it gets complicated, and was wondering if there was someone who had figured out the heuristics for some of the more complex cases ^_^
I will take a look at automap, I did see it - but did not realize that it also handles relationships.
Thanks for the input!


Reply all
Reply to author
Forward
0 new messages