Eager loading with SELECT instead of JOIN

Emil Ivanov

unread,

Oct 15, 2009, 12:21:45 PM10/15/09

to sqlalchemy

As far as I understand eager loading it works by adding a join clause
to the select. That has the drawback of increased number or rows
returned. Imagine having 100 objects and each with 200 related
objects, most of which overlap so the total number of related objects
is about 300. That will generate 20 000 rows, where in reality it
would be much better to fetch the 100 objects with a single query and
then fetch the 300 related objects with another query.

It that possible with the current version of sqlalchemy.

Hibernate has something like this in the form of Batch fetching:
http://docs.jboss.org/hibernate/stable/core/reference/en/html_single/#performance-fetching-batch

Regards,
Emil Vladev

Michael Bayer

unread,

Oct 15, 2009, 1:19:22 PM10/15/09

to sqlal...@googlegroups.com

its not built in but you can run the second query manually, using the
first query as a subquery to formulate the WHERE criterion for the second.
I do this often, in fact.

there is a helper method sqlalchemy.orm.attributes.set_committed_value()
which you can use to populate the related scalar or collections of each
parent object, without any history events firing off, so that the related
object(s) will have the state of objects that were loaded from the
database.

the general idea is:

q= sess.query(Parent).filter(...)
parents = q.all()

children =
sess.query(Child).filter(Child.id.in_(sess.query(Parent.id).filter(...))

parent_dict = dict((p.id, []) for p in parents)

for child in children:
parent_dict[child.parent_id].append(child)

for p in parents:
set_committed_value(p, "children", parent_dict[p.id])

there are several reasons this is not built in. one is that the subquery
step is very difficult to formulate for relations that deal with complex
join conditions and composite primary keys - by leaving this to the user
SQLA avoids wading into what is probably kind of a bottomless hole. The
other is that the ORM doesn't include an "immediate second query" in its
architecture, nor does it include a "deferred load among many objects from
a previous query in response to a single attribute access" - which is
essentially the "lazy" version of this and is often what people prefer.

there are two features though which would make the above operation simpler
- a query.select_entities(*cols_or_entities) method, which is essentially
what query.values() does without the execution (I always preferred it that
way, in fact), and perhaps a "knit_collections_together" type of method
that would do the second step.

Michael Bayer

unread,

Oct 15, 2009, 2:12:49 PM10/15/09

to sqlal...@googlegroups.com

Michael Bayer wrote:
>
> Emil Ivanov wrote:
>>
>> As far as I understand eager loading it works by adding a join clause
>> to the select. That has the drawback of increased number or rows
>> returned. Imagine having 100 objects and each with 200 related
>> objects, most of which overlap so the total number of related objects
>> is about 300. That will generate 20 000 rows, where in reality it
>> would be much better to fetch the 100 objects with a single query and
>> then fetch the 300 related objects with another query.

Also something to clarify here. If you have 100 parents, each with 200
children, and there is "overlap", that implies there is a many-to-many
relation between parent/child, and that there are essentially 20K rows in
the association table. But you still need to query all 20K of those rows
in order to determine the correct associations to the parents. So no rows
are saved.

Simlarly, if the relation is one-to-many or many-to-one, there would be no
"overlap" and you still ultimately have to fetch everything.

The approach only saves on columns being fetched, not rows.

Michael Bayer

unread,

Oct 15, 2009, 2:17:39 PM10/15/09

to sqlal...@googlegroups.com

Emil Ivanov wrote:
>
> As far as I understand eager loading it works by adding a join clause

and...the rationale for the Hibernate feature is to optimize *lazy*
loading, not eager loading - its a query that you only want if needed.
Also hibernate does a poor job with join-based eager loading - they make
no adjustment for things like LIMIT/OFFSET, for example (which is
disastrous IMHO).

The performance gains in the "eager" version are only that of fetching
columns. In Hibernate's case, some JDBC implementations don't pull
columns over the wire unless requested on the result. Sadly this is not
the case for most Python DBAPIs.

Reply all

Reply to author

Forward