Many tables in eager loading


Arun Kumar PG

Sep 4, 2007, 2:56:49 AM9/4/07
to sqlal...@googlegroups.com
Guys,

Suppose we have 10 or so tables that are related to each other and are all required during, say, report generation, and I specify eager loading for every attribute mapped to those tables. As the records in the tables grow, the intermediate result produced by the join (a LEFT OUTER JOIN under eager loading) will be massive before the WHERE clause is applied. Should we worry about this, or is it fine as long as the tables are joined on primary/foreign keys and the query plan looks decent?

I am doing this for 7-8 tables, and in a couple of them the data is growing continuously by a factor of 2xN every month. I am worried that eager loading may become a problem, in the sense that the joins involved might bring the DB server to its knees some day. FYI: the eager loading is specified at the Query level so that I can optimize only where I really need it.

But currently it is faster than executing individual queries. In my case, if I go with individual queries via lazy loading it takes forever, and in many cases the request times out in the web browser. So eager loading is better, but I am worried about speed. Are there any good guidelines on how we should use eager loading: best practices, limitations, etc.?
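A query-level eager load of the kind described might look like this (a minimal sketch with hypothetical Invoice/InvoiceLine tables; the option is spelled `joinedload` in modern SQLAlchemy, while the 0.4-era API called it `eagerload`):

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base, joinedload, relationship

Base = declarative_base()

class Invoice(Base):
    __tablename__ = "invoice"
    id = Column(Integer, primary_key=True)
    number = Column(String)
    lines = relationship("InvoiceLine")

class InvoiceLine(Base):
    __tablename__ = "invoice_line"
    id = Column(Integer, primary_key=True)
    invoice_id = Column(Integer, ForeignKey("invoice.id"))
    amount = Column(Integer)

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add(Invoice(number="A-1",
                        lines=[InvoiceLine(amount=10), InvoiceLine(amount=20)]))
    session.commit()

with Session(engine) as session:
    # The LEFT OUTER JOIN happens in one round trip; without the option,
    # touching inv.lines would emit a second (lazy) SELECT.
    inv = (
        session.query(Invoice)
        .options(joinedload(Invoice.lines))
        .filter(Invoice.number == "A-1")
        .one()
    )
    print(sum(line.amount for line in inv.lines))  # 30
```

Because the option is attached per query rather than in the mapper, the same relationship stays lazy everywhere else, which matches the "optimize only where I need it" approach above.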

--
Cheers,

- A

svilen

Sep 4, 2007, 5:04:06 AM9/4/07
to sqlal...@googlegroups.com
we have similar data and reports, so here is what we have invented so
far.
Beware: it's all pure theory.

-1 (horizontal) eager loading of ONLY the needed row attributes,
also hierarchically (a.b.c.d)
-2 (vertical) simultaneous loading of columns - e.g. the lazy
attributes - wholly, or in portions/slices (depending on UI
visibility or other slice size)
-3 skipping the creation of objects - using only the data - if the
time spent on creation gets critical. For example, for a simple
report of a person's name, alias, and age, the creation of 100,000
Person objects can be omitted. To be able to drill down, the
person.db_id would need to be stored too.
-4 caching some aggregations/calculations in special
columns/tables, so they are not recomputed every time
-5 translating the whole report - calculations, aggregations, grouping
etc. - into SQL and using the result as is (with the same note about
db_ids)

and combination of above, in whatever subset.
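Point 3 can be sketched with plain DB-API access (a hypothetical `person` table; all names are made up): select just the columns the report needs, plus the primary key for drill-down, and never build mapped objects.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, alias TEXT, age INTEGER)"
)
conn.executemany(
    "INSERT INTO person (name, alias, age) VALUES (?, ?, ?)",
    [("Ann", "an", 30), ("Bob", "bo", 41)],
)

# Report rows as bare tuples: (db_id, alias, age). No Person objects are
# constructed; db_id is kept so a drill-down can load the full object later.
report = conn.execute("SELECT id, alias, age FROM person ORDER BY id").fetchall()
print(report)  # [(1, 'an', 30), (2, 'bo', 41)]
```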

I've done something about aggregation (point 4) with Paul Colomiets; see the latest
development at
http://www.mr-pc.kiev.ua/en/projects/SQLAlchemyAggregator/
But there's more to it, as I want it completely transparent and
switchable (on/off).

I'd be most interested in the last one (point 5), but as I see the trend, very
very few people look a level higher than plain SQL expressions (and
_want_ all the SQL dependencies that follow from that), let alone
translations of meta-info...

Further down the road we'll probably have invented more of these, unless
someone does it first, which would be very welcome...

ciao
svilen

Arun Kumar PG

Sep 4, 2007, 10:12:26 AM9/4/07
to sqlal...@googlegroups.com
Good work svilen!

A couple of questions on what you suggested:

>> skipping creation of objects - only using the data, if time of creation gets critical.
In my query, where eager loading is done across 8 tables, if I manually run the join query generated by the SA ORM I get a result set of over 70,000 records, because it contains duplicates. I guess the SA ORM traverses the returned result and creates the right number of objects, discarding the duplicate or unwanted rows.
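That uniquing step can be pictured roughly like this (a simplified sketch of the idea, not SA's actual implementation): rows from the join are collapsed by primary key, and each child row hangs off the already-seen parent.

```python
# Joined rows: (parent_id, parent_name, child_value). The parent columns
# repeat once per child row, which is why the raw result looks inflated.
rows = [
    (1, "A", "x"),
    (1, "A", "y"),
    (2, "B", "z"),
]

parents = {}  # identity map keyed by primary key
for pid, pname, child in rows:
    rec = parents.setdefault(pid, {"id": pid, "name": pname, "children": []})
    rec["children"].append(child)

print(len(parents))  # 2 distinct parents from 3 joined rows
```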

How much time does SA generally spend if, say, 1,000,000 rows are returned and this result set forms a whole hierarchy, i.e. A (the main entity), A.b, A.c [] (a list), A.c.d [] (again a list), etc.? Is that a considerable amount of time?

What is the best optimization technique (recommended for the SA ORM) when it comes to aggregating data from multiple tables with lots of rows in them (and continuously growing)?

I am thinking hard about optimization, as it is really critical for me right now.
--
Cheers,

- A

Michael Bayer

Sep 4, 2007, 10:14:29 AM9/4/07
to sqlal...@googlegroups.com

you might want to look into creating views for each thing that you
need to select, thus moving the complexity of fine-tuning the joins
over to the database side.
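The idea in plain SQL (hypothetical table and column names): define the join once as a view on the database side, and the application selects from the view as if it were a single table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total INTEGER);
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);

-- The join is written and tuned once, on the database side.
CREATE VIEW order_report AS
    SELECT o.id AS order_id, c.name AS customer, o.total
    FROM orders o LEFT OUTER JOIN customers c ON c.id = o.customer_id;
""")
conn.execute("INSERT INTO customers VALUES (1, 'Ann')")
conn.execute("INSERT INTO orders VALUES (10, 1, 99)")

print(conn.execute("SELECT * FROM order_report").fetchall())  # [(10, 'Ann', 99)]
```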

Arun Kumar PG

Sep 4, 2007, 10:15:50 AM9/4/07
to sqlal...@googlegroups.com
I thought so earlier, but unfortunately I am on a lower version of MySQL :(
--
Cheers,

- A

svilen

Sep 4, 2007, 11:15:24 AM9/4/07
to sqlal...@googlegroups.com
On Tuesday 04 September 2007 17:12:26 Arun Kumar PG wrote:
> Good work svilen!
>
> A couple of questions on what you suggested:
> >> skipping creation of objects - only using the data, if time of
> >> creation gets critical.
> In my query, where eager loading is done across 8 tables, if I
> manually run the join query generated by the SA ORM I get a
> result set of over 70,000 records, because it contains
> duplicates. I guess the SA ORM traverses the returned result and
> creates the right number of objects, discarding the duplicate or
> unwanted rows.
Probably, but unless you discard 50% of the rows, it doesn't make much
difference.

> How much time does SA generally spend if, say, 1,000,000 rows are
> returned and this result set forms a whole hierarchy, i.e. A (the
> main entity), A.b, A.c [] (a list), A.c.d [] (again a list), etc.?
> Is that a considerable amount of time?

Well, you just sum how many constructors have to be executed per row and
multiply by a million... IMO anything in Python multiplied by a million is
considerable time. Maybe in 10 years it won't be. That's why I want
to keep my model separate from the denormalization scheme.
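That per-row cost can be made concrete (a rough illustration, not a benchmark; the numbers vary by machine, and the `Person` class is made up):

```python
import timeit

class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age

rows = [("p%d" % i, i % 90) for i in range(100_000)]

# One constructor call per row; a million-row report multiplies
# whatever this difference is by ten.
as_objects = timeit.timeit(lambda: [Person(n, a) for n, a in rows], number=1)
as_tuples = timeit.timeit(lambda: list(rows), number=1)

print(f"objects: {as_objects:.3f}s  tuples: {as_tuples:.3f}s")
```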

> What is the best optimization technique (recommended for the SA
> ORM) when it comes to aggregating data from multiple tables with
> lots of rows in them (and continuously growing)?

SA 0.4 can handle query polymorphism with multiple per-subtype queries, but
I have no idea how that works, or how much better it is than one huge
union/left-outer-join plus the type-switching over it; plus you'll miss
the overall ordering.
Mike, can this mechanism somehow be used for my "vertical" loading,
i.e. loading some "column" from a related table that is not in the
polymorphic hierarchy?

Michael Bayer

Sep 4, 2007, 11:56:22 AM9/4/07
to sqlal...@googlegroups.com

On Sep 4, 2007, at 10:15 AM, Arun Kumar PG wrote:

> I thought so earlier, but unfortunately I am on a lower version of
> MySQL :(

upgrade.

Arun Kumar PG

Sep 5, 2007, 1:18:37 AM9/5/07
to sqlal...@googlegroups.com
A bit tough, as there are many apps on that server, so it won't be easy as of now.

On 9/4/07, Michael Bayer <mik...@zzzcomputing.com> wrote:



--
Cheers,

- A