Using the same joinedload object in multiple options causes performance issue with baked queries

26 views
Skip to first unread message

Tom Flannaghan

unread,
Jun 6, 2018, 6:53:58 AM6/6/18
to sqlalchemy
Hi all,

We have just upgraded to sqlalchemy 1.2.7 (from 1.1.14), and had a performance issue with a query that uses a lot of joinedloads that was caused by the automatic baking of all relationship queries that was introduced in 1.2.

Say we have a set of tables with relationships Book.pages, Page.font and Font.layout. We have a query of this form:

pages = joinedload(Book.pages)
option1 = pages.joinedload(Page.font)
option2 = pages.joinedload(Page.layout)

query = session().query(Book).options(option1, option2)

The important point here is that the pages object defined on line 1 is reused in both option1 and option2. Now suppose we fetch another relationship that wasn't joined-loaded on the returned instances. This will case another query as it is not loaded already, and this query will be baked due to the change in 1.2 to bake all relationship loads.

We found that the construction of the cache key for baking this query becomes very slow as the number of options of this form increases, and is in fact quadratic in the number of such options (we have ~25 such options in our problematic query). This is due to each option containing all of the joinedloads inside its _to_bind attribute, and _UnboundLoad._generate_cache_key has to process everything in the _to_bind list. E.g. in this example:

print([[str(i) for i in load.path] for load in option1._to_bind])
print([[str(i) for i in load.path] for load in option2._to_bind])

[['Book.pages'], ['Book.pages', 'Page.font'], ['Book.pages', 'Page.layout']]

[['Book.pages'], ['Book.pages', 'Page.font'], ['Book.pages', 'Page.layout']]


Therefore, when generating the key for each option we are processing the joinedloads from all of the options, leading to the quadratic performance degradation.

We fixed it by avoiding reusing the joinedload for Book.pages by doing this:

option1 = joinedload(Book.pages).joinedload(Page.font)
option2 = joinedload(Book.pages).joinedload(Page.layout)

The resulting query is unchanged, but the cache key function is now just linear in the number of joinedloads as each option has only its relationships in its _to_bind attribute. In our case, this completely solved the performance issue.

I'm not sure whether this behaviour is a bug or whether joinedloads aren't intended to be reused. If the latter, it would be great if they raised a warning if reused like this (and a mention of this issue in the docs).

Thanks,
Tom

Mike Bayer

unread,
Jun 6, 2018, 9:25:43 AM6/6/18
to sqlal...@googlegroups.com
I'd classify this as a bug which should be fixed. I would need to
find time to dig into the structure of the options objects but it
seems as though if the options themselves copied out to a new option,
or the cache key could be smarter about what it's doing. I don't
understand exactly what you've observed yet but it sounds like in the
re-use case it is spending a lot of time with redundant data that
isn't needed. this is up at
https://bitbucket.org/zzzeek/sqlalchemy/issues/4270/cache-key-calc-for-loader-options-w-reuse


>
> Thanks,
> Tom
>
> --
> SQLAlchemy -
> The Python SQL Toolkit and Object Relational Mapper
>
> http://www.sqlalchemy.org/
>
> To post example code, please provide an MCVE: Minimal, Complete, and
> Verifiable Example. See http://stackoverflow.com/help/mcve for a full
> description.
> ---
> You received this message because you are subscribed to the Google Groups
> "sqlalchemy" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to sqlalchemy+...@googlegroups.com.
> To post to this group, send email to sqlal...@googlegroups.com.
> Visit this group at https://groups.google.com/group/sqlalchemy.
> For more options, visit https://groups.google.com/d/optout.

Tom Flannaghan

unread,
Jun 6, 2018, 10:32:25 AM6/6/18
to sqlalchemy
Thanks Mike. I've attached a script that shows the difference in case that helps.
joinedload.py

Mike Bayer

unread,
Jun 6, 2018, 4:52:48 PM6/6/18
to sqlal...@googlegroups.com
There's a patch at
https://gerrit.sqlalchemy.org/#/c/zzzeek/sqlalchemy/+/771/ which if
you can review against your specific mappings would be helpful to
confirm this fixed the issue.

On Wed, Jun 6, 2018 at 10:32 AM, Tom Flannaghan <tomfla...@gmail.com> wrote:
> Thanks Mike. I've attached a script that shows the difference in case that
> helps.
>

Tom Flannaghan

unread,
Jun 7, 2018, 7:09:15 AM6/7/18
to sqlal...@googlegroups.com
I've tested your patch against the problematic queries we had, and it completely solves the problem. Thanks!


You received this message because you are subscribed to a topic in the Google Groups "sqlalchemy" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/sqlalchemy/9hBwMGPneJM/unsubscribe.
To unsubscribe from this group and all its topics, send an email to sqlalchemy+...@googlegroups.com.
Reply all
Reply to author
Forward
0 new messages