SQL Alchemy Closure Table Relationship Definition

849 views
Skip to first unread message

jonstjohn

unread,
Jan 9, 2012, 6:51:40 PM1/9/12
to sqlalchemy
I recently started working with SQL Alchemy for a project that
involves climbing areas and routes. Areas are hierarchical in that a
single area may contain multiple areas, which in turn may contain
other areas. A route is directly associated with a single area, but is
also associated with that area's parent, etc.

To implement this I chose to use a closure table ala Bill Karwin
(http://karwin.blogspot.com/2010/03/rendering-trees-with-closure-
tables.html). In the closure table implementation, a second table is
created to store the ancestor/descendent information. A self-
referencing row is created when a node is added, as well as a row for
each ancestor in the tree.


The table structure is as follows (simplified):

-- area --
area_id
name

-- area_relationship --
ancestor
descendent

-- route --
route_id
area_id
name


Sample data:

-- area --
1, New River Gorge
2, Kaymoor
3, South Nuttall
4, Meadow River Gorge

-- area_relationship (ancestor, descendent) --
1, 1 (self-referencing)
2, 2 (self-referencing)
1, 2 (Kaymoor is w/i New River Gorge)
3, 3 (self-referencing)
1, 3 (South Nutall is w/i New River Gorge)
4, 4 (self-referencing)

-- route (route_id, area_id, name)
1, 2, Leave it to Jesus
2, 2, Green Piece
3, 4, Fancy Pants


To query for all areas for a given route (up the tree), I can execute:

SELECT area.area_id, area.name
FROM route
INNER JOIN area_relationship ON route.area_id =
area_relationship.descendent
INNER JOIN area ON area.area_id = area_relationship.ancestor
WHERE route.route_id = 1
Similarly, I can query for all routes in a particular area (including
descendent areas) with:

SELECT route.route_id, route.name
FROM area
INNER JOIN area_relationship ON area.area_id =
area_relationship.ancestor
INNER JOIN route ON route.area_id = area_relationship.descendent
WHERE area.area_id = 1


In SQL Alchemy I've created a relationship and two tables to handle
these relationships:

area_relationship_table = Table('area_relationship', Base.metadata,
Column('ancestor', Integer, ForeignKey('area.area_id')),
Column('descendent', Integer, ForeignKey('area.area_id'))
)


DbArea class -

class DbArea(Base):

__tablename__ = 'area'

area_id = Column(Integer, primary_key = True)
name = Column(VARCHAR(50))
created = Column(DATETIME)

area_relationship_table.c.ancestor])

descendents = relationship('DbArea', backref = 'ancestors',
secondary = area_relationship_table,
primaryjoin = area_id == area_relationship_table.c.ancestor,
secondaryjoin = area_id ==
area_relationship_table.c.descendent)


DbRoute class -

class DbRoute(Base):

__tablename__ = 'route'

route_id = Column(Integer, primary_key = True)
area_id = Column(Integer, ForeignKey('area.area_id'))
name = Column(VARCHAR(50))
created = Column(DATETIME)

area = relationship("DbArea")

areas = relationship('DbArea', backref = 'routes',
secondary = area_relationship_table,
primaryjoin = area_id ==
area_relationship_table.c.ancestor,
secondaryjoin = area_id ==
area_relationship_table.c.descendent,
foreign_keys=[area_relationship_table.c.ancestor,
area_relationship_table.c.descendent])


Currently, I am able to determine the areas from the individual route,
using the areas relationship in DbRoute. However, when I try to use
the backref 'routes' in DbArea, I get the following error:

sqlalchemy.exc.StatementError: No column route.area_id is configured
on mapper Mapper|DbArea|area... (original cause: UnmappedColumnError:
No column route.area_id is configured on mapper Mapper|DbArea|area...)
'SELECT route.route_id AS route_route_id, route.area_id AS
route_area_id, route.name AS route_name, route.created AS
route_created \nFROM route, area_relationship \nWHERE %s =
area_relationship.descendent AND route.area_id =
area_relationship.ancestor' [immutabledict({})]

I'm guessing that I likely need to add something to DbArea to
establish the relationship, but after experimenting with some
different options was unable to determine the solution.

Michael Bayer

unread,
Jan 9, 2012, 7:28:46 PM1/9/12
to sqlal...@googlegroups.com

On Jan 9, 2012, at 6:51 PM, jonstjohn wrote:

> To implement this I chose to use a closure table ala Bill Karwin
> (http://karwin.blogspot.com/2010/03/rendering-trees-with-closure-
> tables.html). In the closure table implementation, a second table is
> created to store the ancestor/descendent information. A self-
> referencing row is created when a node is added, as well as a row for
> each ancestor in the tree.

This is a schema design I've not seen before. What's unusual here is that you'd like the "route" table to have something to do with the "area_relationship" table, however "area_relationship" has no foreign key to "route", nor vice versa. This doesn't follow typical normalization rules, in that it's very easy to have rows in DbRoute that don't exist in area_relationship_table. SQLAlchemy's ORM can't really navigate around a design like that and it's probably not really what you want here.

I checked the blog post you refer to and it does not have this pattern. It has just a "node" table, which here corresponds to DbArea, and "closure", which here would be "area_relationship_table" and in SQLA we call it a self-referential many-to-many. There's no "extra" table, so we need to figure out what you want there.

If it's the case that information such as "Leave it to Jesus", "Green Peace" are associated with a particular association between two areas, you would first need to promote area_relationship to be a fully mapped class, forming what we refer to as an "association object" (http://www.sqlalchemy.org/docs/orm/relationships.html#association-object) - a many-to-many table that contains additional information about the association. Based on how you've named things here I think you'd then want to foreign key that table to DbRoute, so that DbRoute contains a collection of AreaRelationship associations. You could also do something simpler which is just to add a string column to the AreaRelationship association table directly.

Introducing the association object usually leads to the usage of the association proxy to "hide" the middle object in most cases. As this is likely to be a lot to take in, and there's also some complexity in getting the "routes" for an area, the attached script illustrates a mapping that seems to correspond to the sample data you have.


test.py

Michael Bayer

unread,
Jan 9, 2012, 7:33:04 PM1/9/12
to sqlal...@googlegroups.com
typo in the DbArea.ancestors attribute:

ancestors = association_proxy("ancestor_rels", "ancestor")


more demos:

print s.query(DbArea).filter_by(name="Kaymoor").one().ancestors
print s.query(DbArea).filter_by(name="Kaymoor").one().descendents


jonstjohn

unread,
Jan 9, 2012, 11:58:07 PM1/9/12
to sqlalchemy
Michael -

I greatly appreciate the time and consideration you put into your
thorough reply. It has really helped me better understand how SQL
Alchemy handles associations. In particular, it is now apparent to me
that there is no clear association between area and route, which makes
it not possible to map through. Although I think I completely
understand your explanation and sample code, it might be helpful for
me to clarify the problem with a simple example in a common problem
domain.

The route and area relationship is similar to the hypothetical problem
of a business location. Suppose you have a set of business locations,
each in a specific city. The business must be associated with one and
only one city. The city is located in ever widening areas, e.g., the
county, region, state, country, planet, etc. Suppose you want to find
all businesses within a given county, or a state. I'm not sure I
agree that storing the city id in the business violates normalization,
since the business can have only one city. And I'm not sure that
storing the business id in every geographical designation (i.e.,
country, region, state, etc) is a better design. On the contrary, I
think that you would want to store the city id with the business, and
the relationship between geographical entities separately.

As I mentioned in the original post, I can construct an SQL query that
gets at this relationship (sorry if it got buried):

SELECT route.route_id, route.name
FROM area
INNER JOIN area_relationship ON area.area_id =
area_relationship.ancestor
INNER JOIN route ON route.area_id = area_relationship.descendent
WHERE area.area_id = 1

However, I still can't figure out how to create these mappings in SQL
Alchemy.

As a work-around, I queried for the descendents first, then filtered
with an 'in' for all descendents. This works fine for me since I only
have several levels of nesting, but would probably get inefficient if
I had deeper nestings (and thus a potentially very large 'in'
condition).

Work around:

def get_routes(area_id):
# do imports, init session, etc
area_ids = []
area = session.query(DbArea).filter(DbArea.area_id ==
area_id).one()
for descendent in area.descendents:
area_ids.append(descendent.area_id)
return
session.query(DbRoute).filter(DbRoute.area_id.in_(area_ids))


Let me know if you see a way to accomplish this w/o doing the
intermediate query for descendents and the 'in' condition.

Thanks again!
Jon

Michael Bayer

unread,
Jan 10, 2012, 1:22:07 AM1/10/12
to sqlal...@googlegroups.com

On Jan 9, 2012, at 11:58 PM, jonstjohn wrote:

> The route and area relationship is similar to the hypothetical problem
> of a business location. Suppose you have a set of business locations,
> each in a specific city. The business must be associated with one and
> only one city. The city is located in ever widening areas, e.g., the
> county, region, state, country, planet, etc. Suppose you want to find
> all businesses within a given county, or a state. I'm not sure I
> agree that storing the city id in the business violates normalization,
> since the business can have only one city.

> And I'm not sure that
> storing the business id in every geographical designation (i.e.,
> country, region, state, etc) is a better design. On the contrary, I
> think that you would want to store the city id with the business, and
> the relationship between geographical entities separately.

I was going to suggest relating the DbRoute directly to DbArea, then I noticed that you've actually done this with DbRoute.area_id and DbRoute.area, hadn't noticed that before.

DbRoute.areas asks relationship() to do something impossible - you're asking it to load DbArea objects but then the relationship is forced to not look at any columns that are actually in the DbArea table. You have it linking back to DbRoute.area_id on both sides. Hence it tries to link DbRoute.area_id to DbArea and fails.

>
> As I mentioned in the original post, I can construct an SQL query that
> gets at this relationship (sorry if it got buried):
>
> SELECT route.route_id, route.name
> FROM area
> INNER JOIN area_relationship ON area.area_id =
> area_relationship.ancestor
> INNER JOIN route ON route.area_id = area_relationship.descendent
> WHERE area.area_id = 1

This query suggests linking route on one side and area on the other, which is more traditional, so you'd just need to link to DbArea.area_id:

areas = relationship('DbArea', backref = 'routes',
secondary = area_relationship_table,
primaryjoin = area_id == area_relationship_table.c.ancestor,

secondaryjoin = DbArea.area_id == area_relationship_table.c.descendent,
innerjoin=True)

This should produce the equivalent idea, an implicit join when lazily loaded and INNER JOIN if joinedload() is used.


jonstjohn

unread,
Jan 10, 2012, 3:09:58 PM1/10/12
to sqlalchemy
Thanks for persisting with me - your solution was almost perfect, and
really pointing me in the right direction. Below is what eventually
worked:

areas = relationship('DbArea', backref = backref('routes',
order_by = 'DbRoute.name'),
secondary = area_relationship_table,
primaryjoin = area_id == area_relationship_table.c.descendent,
secondaryjoin = DbArea.area_id ==
area_relationship_table.c.ancestor,
innerjoin = True, order_by = DbArea.name,
foreign_keys = [area_relationship_table.c.ancestor,
area_relationship_table.c.descendent])

I swapped your descendent and ancestor, and added the foreign_keys due
a warning that I was receiving 'SAWarning: No ForeignKey objects were
present in secondary table 'area_relationship' ...'. I also added
some sorting for convenience.

This back and forth really helped me out in my work, but also to
better understand SA and particularly SA relationships. A million
thanks!

Jon
Reply all
Reply to author
Forward
0 new messages