How to use load_only or with_entities with relationships

5,298 views
Skip to first unread message

Gmoney

unread,
Jan 8, 2019, 4:16:18 PM1/8/19
to sqlalchemy
Unable to limit the columns returned from a query using .load_only or .with_entities.  I feel like I'm missing a key concept here, and could use some guidance.

One column of interest is a relationship - which I'm guessing is my problem.
class Task(Base):
    __tablename__
= 'TASK'
    id
= Column('ID', VARCHAR2(250), primary_key=True)
    subject
= Column('SUBJECT', VARCHAR2(250))
    comments
=relationship('Comment', back_populates='task', lazy='joined')

class Comment(Base):
    __tablename__
= 'COMMENT'
    id
= Column('ID', Integer, primary_key=True)
    msg_id
= Column('MSG_ID', VARCHAR2(250), ForeignKey(Task.id))
    comment_given_by
= Column('COMMENT_GIVEN_BY', VARCHAR2(250))
    task
= relationship('Task', back_populates='comments', lazy='joined')

Running the query without any attempt to limit columns works fine (ie. the Classes/models seem OK)
query = session.query(Task)

But now I tried .with_entities.
If you ONLY request actual Columns, and not relationships, it lists the columns and tables in the select, but does not add the ON clause (cartesian join)
If I request the relationship, it generates invalid SQL:
query = session.query(Task).with_entities(Task.id, Task.comments)
   
# SELECT
   
# "TASK"."ID",
    # "TASK"."ID" = "COMMENT"."MSG_ID" AS comments
    # FROM
   
# "TASK",
   
# "COMMENT"


So trying with .load_only:
No luck many ways, all failing... every random thing I try results in a different exception - I feel like I'm getting farther and just guessing now
 
query = session.query(Task, Comment).options(load_only(Task.id, Comment.comment_given_by))
---->sqlalchemy.exc.ArgumentError: Wildcard loader can only be used with exactly one entity.  Use Load(ent) to specify specific entities.

query = session.query(Task).options(load_only('id', 'comments'))
----> Exception: can't locate strategy for <class 'sqlalchemy.orm.relationships.RelationshipProperty'> (('deferred', False), ('instrument', True))

query = session.query(Task).options(load_only(Task.id, Comment.comment_given_by))
----> sqlalchemy.exc.ArgumentError: Can't find property 'comment_given_by' on any entity specified in this Query.  Note the full path from root (Mapper|Task|TASK) to target entity must be specified.

query = session.query(Task, Comment).options(load_only(Task.id, Comment.comment_given_by))
----> sqlalchemy.exc.ArgumentError: Wildcard loader can only be used with exactly one entity.  Use Load(ent) to specify specific entities.

query = session.query(Task).options(load_only(Task.id, Task.comments))
----> Exception: can't locate strategy for <class 'sqlalchemy.orm.relationships.RelationshipProperty'> (('deferred', False), ('instrument', True))

query = session.query(Task).options(load_only(Task.comments), subqueryload("comments").load_only('comment_given_by'))
----> No exception but only generates:  "SELECT "TASK"."ID" FROM "TASK"

Mike Bayer

unread,
Jan 9, 2019, 12:22:18 AM1/9/19
to sqlal...@googlegroups.com
The documentation for what you are trying to do is at:

https://docs.sqlalchemy.org/en/latest/orm/loading_columns.html#deferred-loading-with-multiple-entities

When there are multiple entities, the Query never jumps between two
entities without you specifying how you get there. in this case,
the way you navigate between them is via Task.comments, so that
navigation must be stated:

session.query(Task).options(load_only(Task.id),
defaultload(Task.comments).load_only(Comment.comment_given_by))

You in fact did pretty much the same query at the end of your attempts
and I think you probably relied upon print(query) to determine if it
was "working" or not, because your query with subqueryload() works,
you just would not be able to see this by print(query) alone because
subqueryload() emits a second query only when the objects are loaded.


>
> query = session.query(Task, Comment).options(load_only(Task.id, Comment.comment_given_by))
> ---->sqlalchemy.exc.ArgumentError: Wildcard loader can only be used with exactly one entity. Use Load(ent) to specify specific entities.

load_only means, defer('*') + undefer(Task.id) +
undefer(Comment.comment_given_by). But the wildcard doesn't know
how to do Task, Comment together, you would need to say
Load(Task).load_only(Task.id),
Load(Comment).load_only(Comment.comment_given_by), but again this is
not what you want because you are asking for a cartesian product in
the above query. Error message is not great here because it comes
from the defer('*') that you can't see happening here, so that should
be improved.



> query = session.query(Task).options(load_only('id', 'comments'))
> ----> Exception: can't locate strategy for <class 'sqlalchemy.orm.relationships.RelationshipProperty'> (('deferred', False), ('instrument', True))

load_only() does not apply to relationships like "comments', only to
columns. again not a great error message here, that could be
improved.


>
> query = session.query(Task).options(load_only(Task.id, Comment.comment_given_by))
> ----> sqlalchemy.exc.ArgumentError: Can't find property 'comment_given_by' on any entity specified in this Query. Note the full path from root (Mapper|Task|TASK) to target entity must be specified.

Task has nothing to do with Comment unless you state this in terms of
Task.comments, which is what "full path" means


>
> query = session.query(Task, Comment).options(load_only(Task.id, Comment.comment_given_by))
> ----> sqlalchemy.exc.ArgumentError: Wildcard loader can only be used with exactly one entity. Use Load(ent) to specify specific entities.

you already tried this one above


>
> query = session.query(Task).options(load_only(Task.id, Task.comments))
> ----> Exception: can't locate strategy for <class 'sqlalchemy.orm.relationships.RelationshipProperty'> (('deferred', False), ('instrument', True))

this is the same thing as the other one you did, string attribute
names or Object.attrname, same thing

>
> query = session.query(Task).options(load_only(Task.comments), subqueryload("comments").load_only('comment_given_by'))
> ----> No exception but only generates: "SELECT "TASK"."ID" FROM "TASK"

I'm not sure you tested this correctly, this one actually works - the
subqueryload("comments") cancels out the load_only(Task.comments) that
would normally raise an error. Subqueryload is emitted as a second
query, so if you just said print(query), you would not have seen it.

Here is what it emits if I run on a model like this:

s.query(A).options(load_only(A.bs), subqueryload(A.bs).load_only(B.data)).all()

output:

SELECT a.id AS a_id
FROM a

SELECT b.id AS b_id, b.data AS b_data, anon_1.a_id AS anon_1_a_id
FROM (SELECT a.id AS a_id
FROM a) AS anon_1 JOIN b ON anon_1.a_id = b.a_id ORDER BY anon_1.a_id



>
> --
> SQLAlchemy -
> The Python SQL Toolkit and Object Relational Mapper
>
> http://www.sqlalchemy.org/
>
> To post example code, please provide an MCVE: Minimal, Complete, and Verifiable Example. See http://stackoverflow.com/help/mcve for a full description.
> ---
> You received this message because you are subscribed to the Google Groups "sqlalchemy" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+...@googlegroups.com.
> To post to this group, send email to sqlal...@googlegroups.com.
> Visit this group at https://groups.google.com/group/sqlalchemy.
> For more options, visit https://groups.google.com/d/optout.

Mike Bayer

unread,
Jan 9, 2019, 12:38:13 AM1/9/19
to sqlal...@googlegroups.com
every case is now listed out for improved error reporting in:

https://github.com/sqlalchemy/sqlalchemy/issues/4433

Gmoney

unread,
Jan 10, 2019, 8:34:20 AM1/10/19
to sqlalchemy
Thanks for the very clear reply, that helps understand the underpinnings a bit more.  Hope you didn't take the list of errors as a complaint - just a list of what I tried... but the error message enhancements look like  a good addition that will help when I forget all this and repeat it a year from now.

As you said, I was relying on the rendered SQL to debug, so that part makes sense now too, and It is indeed working with the defaultload.

The part that still confuses me a bit is that if do the default and select all fields (including relationships) and don't do any specific deferrals, the ORM knows what to do for joins etc - a simple Task.query() works.  Is there something about the deferral/downselecting that breaks that ability and requires the more involved syntax specifying the secondary joins/loads for the relationships?  I was hoping to make this part very simple for the people using the Model where they could simply say - Task.query().load_only(comments)  (or with_entities, or....  something of similar complexity) without needing to understand the underpinnings of the model.

Thanks again for the help

Mike Bayer

unread,
Jan 10, 2019, 10:18:59 AM1/10/19
to sqlal...@googlegroups.com
On Thu, Jan 10, 2019 at 8:34 AM Gmoney <grg...@gmail.com> wrote:
>
> Thanks for the very clear reply, that helps understand the underpinnings a bit more. Hope you didn't take the list of errors as a complaint - just a list of what I tried... but the error message enhancements look like a good addition that will help when I forget all this and repeat it a year from now.

the load_only() stuff is recent and I absolutely felt your
frustration, I very often have very similar sequences where I'm trying
to get the ORM to do something intuitive and every single thing I try
doesn't do it - this is because after 13 years of development I don't
have a memory of every use case either, so finding these patches of
looks intuitive and raises an inscrutable error message are bugs that
need to be fixed.


>
> As you said, I was relying on the rendered SQL to debug, so that part makes sense now too, and It is indeed working with the defaultload.
>
> The part that still confuses me a bit is that if do the default and select all fields (including relationships) and don't do any specific deferrals, the ORM knows what to do for joins etc - a simple Task.query() works.

this may be because you have lazy="joined" on your relationship, it
will automatically fetch Task.comments in the same query.

> Is there something about the deferral/downselecting that breaks that ability and requires the more involved syntax specifying the secondary joins/loads for the relationships?

when you give Query instructions about columns, it always needs to
know where in the "path" those columns are present. you could have
multiple relationships from Task that join to Comment:

class Task:
new_comments = relationship(Comment, primaryjoin=task.id ==
comment.new_task_id)
old_comments = relationship(Comment, primaryjoin=task.id ==
comment.old_task_id)

So above, if you wanted to only load the columns on Task.new_comments
and not Task.old_comments, that's why this explicitness is necessary.
I could above say:

query(Task).options(
defaultload(Task.new_comments).load_only(Comment.authorname),
defaultload(Task.old_comments).load_only(Comment.date))

the two paths are:

Task / new_comments / Comment / authorname
Task / old_comments / Comment / date

The same entities can be present in multiple places in an entity query.



> I was hoping to make this part very simple for the people using the Model where they could simply say - Task.query().load_only(comments) (or with_entities, or.... something of similar complexity) without needing to understand the underpinnings of the model.

I would consider that load_only() is itself a SQL performance
optimization so it's hard to expect users to know how to apply
performance optimizations to a query while simultaneously not having
knowledge of the structure of the tables they want to optimize.
Reply all
Reply to author
Forward
0 new messages