contains_eager across multiple relationships using aliases

1,428 views
Skip to first unread message

Justin Thiessen

unread,
Jan 20, 2012, 6:59:01 PM1/20/12
to sqlal...@googlegroups.com
Hi,

I'm trying to use the 'contains_eager' option but experiencing some difficulty figuring out how to make it generate what I need.

Short background:

The code is part of an automated search-generation algorithm that takes kw=val constructs (such as user.name='bob', user.job.title='engineer', user.job.desk.desk_make='steelcase') and joins related models as necessary to produce the final query.

(1) because the query is generated automatically, the code typically joins on aliased models to avoid potential conflicts.
(2) there is no predetermined limit to the number of relationships the query can span, and there are a number of cases where spanning 2 relationships is very useful.
(3) it is necessary to be able to filter on any or all of the referenced models

I believe that (2) and (3) pretty much force me to explicitly handle the joins myself and then use 'contains_eager' to eagerly load the related model records, as described in:

http://www.sqlalchemy.org/docs/orm/loading.html#contains-eager

Here is a made-up example:

Given models such as:

class User:
    user_id = Integer
    job = Relationship(Job)
    (...)

class Job:
    job_id = Integer
    desk=Relationship(Cubicle)

class Cubicle:
    cubicle_id = Integer
    desk_make = Text
    cubicle_location = Integer

If I want to identify Users whose job provides them with a desk made by steelcase,

My code looks something like:

q=Session.Query(User)

job_alias=Aliased(Job)
cubicle_alias=Aliased(Cubicle)

q=q.join(job_alias, from_jointpoint=True)
q=q.join(cube_alias, from_jointpoint=True)

# now I wish to force eager loading.  According to the sqlalchemy docs,
# I need to provide the complete path from User to Cubicle,
# which would lead me to use something like:

q=q.options(contains_eager(User.job, Job.desk))

# but also according to the docs, if I am using an alias,
# I need to supply that to the contains_eager option
# so that the aliased table can be properly identified.

# the example in the docs looks like:

q=q.options(contains_eager(User.job, alias=job_alias))

# if one of these approaches worked, then, of course, I could filter like so:

q.filter(cubicle_alias.desk_make == 'steelcase')

----

Unfortunately I do not see how to provide the necessary full path from the root of the query to the tip of the join and at the same time provide a value for alias for each node of the join.

I have taken a cursory look at the contains_eager function and followed the trail of inherited methods/models back a couple of steps, but I do not immediately see any way to induce it to handle both a list of joins and a list of corresponding aliases.

For what it is worth, I've written an earlier version of the search function which works by recursively joining as necessary across related models.  It is functional, but because there is no eager loading, it can be slow on large collections, given the number of separate queries it generates.

If I can't get contains_eager to work with joins across multiple aliased models, I can write a more sophisticated algorithm which will only create aliases as necessary.  Then it would have to 'chunk' the eager loading, breaking it at joins to aliased models.  This would reduce the performance somewhat but should solve the problem.

It does seem though, that there is no reason why contains_eager should not handle the case I am describing, so I feel like I'm just missing something obvious.

Any help or direction would be most appreciated.

Thanks,

Justin Thiessen

Michael Bayer

unread,
Jan 21, 2012, 12:17:53 AM1/21/12
to sqlal...@googlegroups.com
On Jan 20, 2012, at 6:59 PM, Justin Thiessen wrote:

Hi,

I'm trying to use the 'contains_eager' option but experiencing some difficulty figuring out how to make it generate what I need.

Short background:

The code is part of an automated search-generation algorithm that takes kw=val constructs (such as user.name='bob', user.job.title='engineer', user.job.desk.desk_make='steelcase') and joins related models as necessary to produce the final query.

(1) because the query is generated automatically, the code typically joins on aliased models to avoid potential conflicts.
(2) there is no predetermined limit to the number of relationships the query can span, and there are a number of cases where spanning 2 relationships is very useful.
(3) it is necessary to be able to filter on any or all of the referenced models

I believe that (2) and (3) pretty much force me to explicitly handle the joins myself and then use 'contains_eager' to eagerly load the related model records, as described in:

http://www.sqlalchemy.org/docs/orm/loading.html#contains-eager

class User:
    user_id = Integer
    job = Relationship(Job)
    (...)

class Job:
    job_id = Integer
    desk=Relationship(Cubicle)

class Cubicle:
    cubicle_id = Integer
    desk_make = Text
    cubicle_location = Integer


q=Session.Query(User)

job_alias=Aliased(Job)
cubicle_alias=Aliased(Cubicle)

q=q.join(job_alias, from_jointpoint=True)
q=q.join(cube_alias, from_jointpoint=True)

# now I wish to force eager loading.  According to the sqlalchemy docs,
# I need to provide the complete path from User to Cubicle,
# which would lead me to use something like:

q=q.options(contains_eager(User.job, Job.desk))

# but also according to the docs, if I am using an alias,
# I need to supply that to the contains_eager option
# so that the aliased table can be properly identified.

# the example in the docs looks like:

q=q.options(contains_eager(User.job, alias=job_alias))

# if one of these approaches worked, then, of course, I could filter like so:

q.filter(cubicle_alias.desk_make == 'steelcase')

----

Unfortunately I do not see how to provide the necessary full path from the root of the query to the tip of the join and at the same time provide a value for alias for each node of the join.

I have taken a cursory look at the contains_eager function and followed the trail of inherited methods/models back a couple of steps, but I do not immediately see any way to induce it to handle both a list of joins and a list of corresponding aliases.

contains_eager() only cares about the columns it's going to be picking out of the result set.    Here it seems you'd like the job_alias columns to populate into User.job, and the cubicle_alias columns into the Job.desk relationship.  So two contains eagers, each illustrating the path plus target alias:

contains_eager(User.job, alias=job_alias)
contains_eager(User.job, Job.desk, alias=desk_alias)

both are needed since the User->job->desk load can't take place unless Job objects are being loaded in the results.

If that specific combination doesn't work, then I need a full test case which if it follows the pattern correctly, would mean it's a bug.    I don't use contains_eager across more than one hop generally, though we should have test coverage for multiple paths....





Justin Thiessen

unread,
Jan 28, 2012, 9:21:31 PM1/28/12
to sqlal...@googlegroups.com


On Friday, January 20, 2012 9:17:53 PM UTC-8, Michael Bayer wrote:
On Jan 20, 2012, at 6:59 PM, Justin Thiessen wrote:

<snip description of confusion regarding contains_eager and multi-level joins>
 
contains_eager(User.job, alias=job_alias)
contains_eager(User.job, Job.desk, alias=desk_alias)

both are needed since the User->job->desk load can't take place unless Job objects are being loaded in the results.
 
Thanks!  This is exactly what I was trying to figure out.

I had misinterpreted:

http://docs.sqlalchemy.org/en/latest/orm/loading.html#contains-eager

to indicate that you could supply a single contains_eager option to join multiple tables at once.

My code now works as expected, and I get the benefit of a single SQL query for joins on large datasets, which is not inconsiderable.

Cheers,

Justin Thiessen

Reply all
Reply to author
Forward
0 new messages