similar to OrderingList but order in code instead of via sql?

78 views
Skip to first unread message

Paul Winkler

unread,
Oct 26, 2016, 5:50:09 PM10/26/16
to sqlalchemy
Hi all,

I've been happily using OrderingList as a collection_class for some time.
Recently discovered a slow query due to multiple JOINs where the
order_by on the relationship ends up causing a table scan
because the field in question is not indexed.

One solution of course would be to add an index, but I'm looking
into other options because alters of the (large) mysql table in
question are getting quite slow, and the number of related objects is always small
(< 20).


I had the thought that if I could figure out the right sqlalchemy hooks
 to leverage, I could have a collection class that behaves like OrderingList,
 but where the list representation would be created by ordering in memory in
python code when loaded from the db.  I've been trying to wrap my head around
how to make a CollectionClass do this and don't think I grok things yet.
One thought would be to simply keep using OrderingList,
remove the order_by on the relationship, and use reorder_on_append.

So I currently have this:

    thingies = relationship(
        "Thingy",
        order_by="Thingy.sort_order",
        collection_class=ordering_list("sort_order"),
    )


But could maybe change it to:


    thingies = relationship(
        "Thingy",
        collection_class=ordering_list(
            "sort_order", reorder_on_append=True)
    )

I *think* that would do what I want, and the related items would be put
in place by reordering the list as each is appended during load,
but a) that seems like a rather inefficient sort algorithm (I'd rather load the full list
and then implicitly call reorder() exactly once, but I don't see a hook for doing something
like that on object load) and b) `reorder_on_append` comes with big warnings about

dangers of concurrent modification.

I could of course wrap the relationship in a property that does the sorting,
but then would have to update a bunch of existing query code that assumes
this attribute is a relationship that can be joined on.

I feel like I'm barking up the wrong tree. Any pointers on where I should be looking?

- Paul

mike bayer

unread,
Oct 26, 2016, 9:19:23 PM10/26/16
to sqlal...@googlegroups.com


On 10/26/2016 05:50 PM, Paul Winkler wrote:
> So I currently have this:
>
> thingies = relationship(
> "Thingy",
> order_by="Thingy.sort_order",
> collection_class=ordering_list("sort_order"),
> )
>
>
> But could maybe change it to:
>
>
> thingies = relationship(
> "Thingy",
> collection_class=ordering_list(
> "sort_order", reorder_on_append=True)
> )
>
> I *think* that would do what I want, and the related items would be put
> in place by reordering the list as each is appended during load,
> but a) that seems like a rather inefficient sort algorithm (I'd rather
> load the full list
> and then implicitly call reorder() exactly once, but I don't see a hook
> for doing something
> like that on object load) and b) `reorder_on_append` comes with big
> warnings about
> dangers of concurrent modification.


ordering list has lots of weird edges to it that I've never had the
means to figure out the solution towards. In this case, sorting by
one-append-at-a-time isn't that efficient, but otherwise I don't think
you'd have the concurrent modification issue here, I'm pretty sure
ordering list is meant to work with the ORM's normal append to the
collection as the source. I think the edges have more to do with when
the sort key is the primary key and people are trying to change the sort
keys and stuff like that.

But the idea to hit the list after population and sort it just once, and
not really worrying about ordering list being involved at all,
is a good one. Looking around though, we have a lot of ways to get a
hold of an object event for when we first build it from a row, but,
there's no hook that corresponds to, "we've eagerly loaded all the
collections and we're about to yield the object", nor is there a "we've
just lazy loaded this collection and stuffed it all into the list"
event. These are events we probably should have, and it might be good
as a "collection_loaded" event that works equally across
lazy/eager/whatever loads, and is oriented towards "I want to mutate the
collection in a certain way", which I think right now we have via the
attributes.set_committed_value method (you'd just set a new list into it
with the ordering you want).

right now, without that "collection_loaded" event, to do this approach
you'd be stuck subclassing Query to intercept results as they are
emitted via __iter__. You'd only have to override the __iter__ method
though with a filter and it can be programmatically determined what
should be done with the objects. It would be either: 1. detect all the
objects in the result that are of type Foo and have a collection ".bar"
that you know was eagerly loaded and needs to be sorted, or 2. detect
that we just lazy loaded the .bar collection on a Foo, sort the result
(the event hook would be applied to "Foo.bar" in the general sense and
fire off for any "here's a newly populated Foo.bar for you to mutate in
place").

We could even write an interim Query subclass that overrides __iter__
using this technique and then produces the "collection loaded" event
itself. You could in theory code to that, we could write tests for it,
then when SQLAlchemy adds the actual "collection_loaded" event hook
you're all set.

Of course this all sounds like I'm looking for free contributions to my
software, and you'd be right :). TLDR, sort of doable-ish right now
with ugliness, should be doable nicely with a new API feature, and a
path can probably be drawn between those points as well so that the
feature can live first as an external hack and later as a supported API.


>
> I could of course wrap the relationship in a property that does the sorting,
> but then would have to update a bunch of existing query code that assumes
> this attribute is a relationship that can be joined on.

right....well ....right there's hooks that could automate that too (in
fact the "attribute_instrument" event can probably used to automate
this) but then, you're still having to guess if you sorted the
collection already or not, as well as making a custom descriptor here is
pretty awkward, not that straightforward. the
intercept-the-collection-load event seems a lot nicer.

>
> I feel like I'm barking up the wrong tree. Any pointers on where I
> should be looking?
>
> - Paul
>
> --
> SQLAlchemy -
> The Python SQL Toolkit and Object Relational Mapper
>
> http://www.sqlalchemy.org/
>
> To post example code, please provide an MCVE: Minimal, Complete, and
> Verifiable Example. See http://stackoverflow.com/help/mcve for a full
> description.
> ---
> You received this message because you are subscribed to the Google
> Groups "sqlalchemy" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to sqlalchemy+...@googlegroups.com
> <mailto:sqlalchemy+...@googlegroups.com>.
> To post to this group, send email to sqlal...@googlegroups.com
> <mailto:sqlal...@googlegroups.com>.
> Visit this group at https://groups.google.com/group/sqlalchemy.
> For more options, visit https://groups.google.com/d/optout.

Paul Winkler

unread,
Oct 27, 2016, 12:42:52 PM10/27/16
to sqlalchemy
Wow, thanks for the very detailed reply Mike!


This is rather anticlimactic followup I'm afraid :)
But we mulled this over a bit and just slapped an index on the column in question instead.

- Paul

mike bayer

unread,
Oct 27, 2016, 4:13:05 PM10/27/16
to sqlal...@googlegroups.com
works for me! at least I have the plan for the next time this comes up.
> > an email to sqlalchemy+...@googlegroups.com <javascript:>
> > <mailto:sqlalchemy+...@googlegroups.com <javascript:>>.
> > To post to this group, send email to sqlal...@googlegroups.com
> <javascript:>
> > <mailto:sqlal...@googlegroups.com <javascript:>>.
> <https://groups.google.com/group/sqlalchemy>.
> > For more options, visit https://groups.google.com/d/optout
> <https://groups.google.com/d/optout>.
>
> --
> SQLAlchemy -
> The Python SQL Toolkit and Object Relational Mapper
>
> http://www.sqlalchemy.org/
>
> To post example code, please provide an MCVE: Minimal, Complete, and
> Verifiable Example. See http://stackoverflow.com/help/mcve for a full
> description.
> ---
> You received this message because you are subscribed to the Google
> Groups "sqlalchemy" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to sqlalchemy+...@googlegroups.com
> <mailto:sqlalchemy+...@googlegroups.com>.
Reply all
Reply to author
Forward
0 new messages