IN() Relationship Eager Loading Strategy

89 views
Skip to first unread message

Paul Brown

unread,
Mar 15, 2017, 1:12:02 AM3/15/17
to sqlalchemy

tl;dr: I'm looking for an elegant way to eager load related rows using separate IN() queries, because MySQL is "Using temporary; Using filesort" when I try subquery eager loading.


My first attempt used joined eager loading, but that was causing performance issues on my queries with nested relationships. This is because joined eager loading uses a LEFT OUTER JOIN which loads the full set of columns represented by the parents on each row of the results. The "What Kind of Loading to Use?" part of the docs explains the issue pretty well.


Next, I tried switching to subquery eager loading to fix the problem. But, MySQL 5.6 says "Using temporary; Using filesort" when I run an explain on the query. It did seem to make the query run ~20% faster, but I'm worried about the load that would put on my database. Example: https://gist.github.com/pawl/bde2737c4d217b468eba1107a03fbcb5


Next, I was hoping I could get SQLAlchemy to use its own internal storage in the session to find related rows if I already queried for them. However, this only seems to work one way by default. It will still run queries (if you’re not doing eager loading) if you access relations from the “one” side of a one to many relationship. Example: https://gist.github.com/pawl/abc0e536219144e569c728c8590b0d39


At this point, I'm really wishing there was an eager loading strategy that queries for the related rows using IN(). So, I read the "Creating Custom Load Rules" section of the docs and came up with an example that added the related rows with "set_committed_value" after I queried for them separately: https://gist.github.com/pawl/df5ba8923d9929dd1f4fc4e683eced40


Is there a more elegant way to do this? And, has adding an IN() eager loading strategy to SQLAlchemy been considered?

mike bayer

unread,
Mar 15, 2017, 9:36:07 AM3/15/17
to sqlal...@googlegroups.com


On 03/15/2017 01:12 AM, Paul Brown wrote:
> tl;dr: I'm looking for an elegant way to eager load related rows using
> separate IN() queries, because MySQL is "Using temporary; Using
> filesort" when I try subquery eager loading.
>
>
> My first attempt used joined eager loading, but that was causing
> performance issues on my queries with nested relationships. This is
> because joined eager loading uses a LEFT OUTER JOIN which loads the full
> set of columns represented by the parents on each row of the results.
> The "What Kind of Loading to Use?"
> <http://docs.sqlalchemy.org/en/latest/orm/loading_relationships.html#what-kind-of-loading-to-use>part
> of the docs explains the issue pretty well.
>
>
> Next, I tried switching to subquery eager loading to fix the problem.
> But, MySQL 5.6 says "Using temporary; Using filesort" when I run an
> explain on the query. It did seem to make the query run ~20% faster, but
> I'm worried about the load that would put on my database. Example:
> https://gist.github.com/pawl/bde2737c4d217b468eba1107a03fbcb5
>
>
> Next, I was hoping I could get SQLAlchemy to use its own internal
> storage in the session to find related rows if I already queried for
> them. However, this only seems to work one way by default. It will still
> run queries (if you’re not doing eager loading) if you access relations
> from the “one” side of a one to many relationship. Example:
> https://gist.github.com/pawl/abc0e536219144e569c728c8590b0d39
>
>
> At this point, I'm really wishing there was an eager loading strategy
> that queries for the related rows using IN(). So, I read the "Creating
> Custom Load Rules"
> <http://docs.sqlalchemy.org/en/latest/orm/loading_relationships.html#creating-custom-load-rules>section
> of the docs and came up with an example that added the related rows with
> "set_committed_value" after I queried for them separately:
> https://gist.github.com/pawl/df5ba8923d9929dd1f4fc4e683eced40


So I'd like to someday add support for the IN() thing, I had a feeling
that sqlalchemy-utils supplied this but I'm not seeing it in their docs.

There's a recipe for "roll your own eager loading" at
https://bitbucket.org/zzzeek/sqlalchemy/wiki/UsageRecipes/DisjointEagerLoading
which is written for subquery eager loading before it was added. The
IN version is similar, you just need to grab the related primary keys as
you get the primary rows.

However I would note that others are not having this problem with "using
temporary; using filesort" with MySQL 5.6. There should be ways to
set up your schema so that an efficient plan is used. I'd want to look
into fixing it on that end first if possible.


>
>
> Is there a more elegant way to do this? And, has adding an IN() eager
> loading strategy to SQLAlchemy been considered?
>
> --
> SQLAlchemy -
> The Python SQL Toolkit and Object Relational Mapper
>
> http://www.sqlalchemy.org/
>
> To post example code, please provide an MCVE: Minimal, Complete, and
> Verifiable Example. See http://stackoverflow.com/help/mcve for a full
> description.
> ---
> You received this message because you are subscribed to the Google
> Groups "sqlalchemy" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to sqlalchemy+...@googlegroups.com
> <mailto:sqlalchemy+...@googlegroups.com>.
> To post to this group, send email to sqlal...@googlegroups.com
> <mailto:sqlal...@googlegroups.com>.
> Visit this group at https://groups.google.com/group/sqlalchemy.
> For more options, visit https://groups.google.com/d/optout.
Reply all
Reply to author
Forward
0 new messages