
Query performance: DivideAndConquer vs. BruteForce


Tushar Mahapatra

Jan 4, 2002, 10:05:17 AM
I have a question about what would be the best way to query an SQL
database (I am using Oracle) where many parent and child tables are
involved. I seek opinions on which of the following two methods would
require the least execution time:

Method 1 - BruteForce method:
Execute a single query where all the parent and child tables are
joined and all the data is returned at once. For example:

select *
from ParentTable P, Child1Table C1, Child2Table C2, ...
where P.Child1Key = C1.Key and P.Child2Key = C2.Key and ...


Method 2 - DivideAndConquer method:
Execute more than one query. The first query gets data from the parent
table:

select *
from ParentTable P
where ...

Next, in the front-end code, iterate through the parent table data
received and execute queries into the children tables:

select *
from Child1Table C1
where C1.Key = varPChild1Key and ...

I am inclined to believe that the DivideAndConquer method would be
faster, but I invite your comments, especially with respect to Oracle
9i.

The scenarios I have described are actually much simplified
representations of my situation.

D@b.a

Jan 4, 2002, 1:51:03 PM
It depends on how you want to get the data in the front end. One relation? Then,
almost certainly, competing with the optimizer (DivideAndConquer) is a losing game
for you. Nested structures like this:

class Parent {
Child[] children;
};

where you want to put the result into an array of Parent? Then you need to split
your query into many, but don't fall into the trap of recursion: you'd better
get the rows from the parent table first, and then all the detail records from
the child table _in a single shot_. One more time: no explicit nested loops as
you suggest, or your system will never scale.
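
Roughly, using the table names from the original post (the "..." standing for
whatever filters you need), "in a single shot" means something like:

select *
from ParentTable P
where ...;                           -- query 1: the parent rows

select C1.*
from Child1Table C1
where C1.Key in ( select P.Child1Key
                  from ParentTable P
                  where ... )        -- same filter as query 1
order by C1.Key;                     -- so the client can merge children onto parents

That is one query for the parents plus one per child table, no matter how many
parent rows come back.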

In article <1ad0000.02010...@posting.google.com>, Tushar Mahapatra
says...

Alan

Jan 4, 2002, 2:02:27 PM
_Generally_ (YMMV) when > 3 tables are joined in series (T1 -> T2 ->
T3 -> ... -> Tn), I have found that divide and conquer is a lot faster, joining 2
tables at a time and creating a third table as a result set to join the next
table to, etc. (see the sketch below). If the joins are in "parallel" (T1 -> T2
and T1 -> T3 and T2 -> T4, etc.) then it depends, and it is often necessary to
experiment. I would try brute force first in this case.
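
Roughly, the pairwise (series) approach looks like this (Oracle-style CREATE
TABLE AS SELECT; the table and column names are made up for illustration):

create table t1_t2 as
select T1.key1, T2.key2, T2.col_a        -- carry only the columns needed later
from T1, T2
where T1.key1 = T2.key1;

create table t1_t2_t3 as
select J.key1, J.key2, T3.col_b
from t1_t2 J, T3
where J.key2 = T3.key2;

-- ...and so on, joining each remaining table to the previous result set.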

"Tushar Mahapatra" <tushar_m...@yahoo.com> wrote in message
news:1ad0000.02010...@posting.google.com...

Andrew Wilson

Jan 5, 2002, 10:57:44 AM
As a rule of thumb - "The more you perform on the database server, (and the
less on the client) the faster will be the overall response time"!

Performing a loop on the client, issuing multiple queries, will probably
be many times slower.

In the full query case I would imagine that the server would perform a table
scan on the parent table, and then perform full sort merges for each of the
child tables (depending on table sizes, statistics on columns, table
structuring, cleverness of optimiser, complexity of query, etc, etc). All
data then returned across the network to the client.

In the "divide-and-conquer" case a table scan of the parent table is
performed and the data returned across the network (or locally).

Now we get into the trouble area, because for each row in the parent table
we now perform a select from each of the child tables. As they are primary
key selects, and I assume the table is structured on primary key, then (if
btree/ISAM) a tree walk is performed on the child table. The total number of
pages read will be greater (index pages need to be re-read each time [maybe
buffered]), the amount of server CPU will be greater (each separate query
will have to be re-analysed), network traffic will certainly be greater.
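
To put made-up numbers on it: with 10,000 parent rows and three child tables,
the divide-and-conquer loop issues 30,001 separate statements; at, say, three
index pages walked per lookup that is on the order of 90,000 page visits plus
30,000 network round trips, where the single join lets the server do the whole
thing in a handful of scans and merges.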

At a total guess I would imagine that in the first case the selection is of
a logN type, whilst in the second case it is more likely N to some power.

Stick to the server ......


Andrew

"Tushar Mahapatra" <tushar_m...@yahoo.com> wrote in message
news:1ad0000.02010...@posting.google.com...

David Williams

Jan 6, 2002, 10:02:11 PM

"Andrew Wilson" <and...@blueoffice.com> wrote in message
news:a177qa$2b1m$1...@news.cybercity.dk...

> As a rule of thumb - "The more you perform on the database server, (and the
> less on the client) the faster will be the overall response time"!
>
True.

> Performing a loop on the client, issuing multiple queries, will probably
> be many times slower.
>

True.

> Stick to the server ......
>

However under Informix I find that >3 table queries are slower.

Use a temp (unlogged) table and select results into it, index it and
then join to it.
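
To sketch it (Informix-style syntax; the table and column names here are only
illustrative):

select o.order_num, o.customer_num
from orders o
where o.ship_weight > 100
into temp t_orders with no log;

create index t_orders_ix on t_orders (customer_num);

select t.order_num, c.company
from t_orders t, customer c
where t.customer_num = c.customer_num;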

I usually find many-table queries come in several forms:

1. Lots of small reference data tables and one large table

Join the large table to a few of the small tables (the main ones,
which restrict the rows from the large table at low cost).
Put this into a small temp table. Join the temp table to the rest.

2. Several large tables which produce a small number of rows and
many small tables.

Join the large tables to get a small temp table then join the
temp table to the rest.

3. Lots of complex sub selects and one/few main tables.

Partially do the subselects into one or more temp tables which
simplify the joins to the main tables. Then use a set of several
temp tables to gradually join the data together.

Here you can use the knowledge you have of the data to work out
the best way to do the query. Hints:

- Each new temp table which results should have fewer
rows than the last. Fewer rows to join is better. Get the most
restrictive filters in earlier.
- Perhaps only get primary keys, and let the final query get the
long strings from the table.

E.g. instead of

select key1, key2, string1 from table1, ...
where <a> and <b> and ...

(where string1 is a long string column), it may be better to initially do

select key1 from table1
where <a> ...
into temp t1;
...
select key1, key2, string1 from t1, ...
where <b> and ...

if <b> does not restrict the number of rows from <t1>
very much...


This is where benchmarking comes in..!

Jim Kennedy

Jan 8, 2002, 1:16:56 AM
I haven't had this problem under Oracle. (joins greater than 3 tables)
Jim
"David Williams" <d...@smooth1.fsnet.co.uk> wrote in message
news:a1b33k$idf$1...@news8.svr.pol.co.uk...

Nuno Souto

Jan 8, 2002, 7:10:35 AM
Jim Kennedy doodled thusly:

>I haven't had this problem under Oracle. (joins greater than 3 tables)

me neither. this morning did an 11 table join without the slightest
performance problem...


Cheers
Nuno Souto
nso...@optushome.com.au.nospam

David Williams

Jan 8, 2002, 5:10:37 PM

"Nuno Souto" <nso...@optushome.com.au.nospam> wrote in message
news:3c3ae18b...@news-vip.optusnet.com.au...

> Jim Kennedy doodled thusly:
>
> >I haven't had this problem under Oracle. (joins greater than 3 tables)
>
> me neither. this morning did an 11 table join without the slightest
> performance problem...
>
>

With complex subqueries, where exists and where not in's
with outer joins?



Thomas Kyte

Jan 8, 2002, 7:25:03 PM
In article <a1fqp2$hj4$1...@news8.svr.pol.co.uk>, "David Williams" says...

>
>
>"Nuno Souto" <nso...@optushome.com.au.nospam> wrote in message
>news:3c3ae18b...@news-vip.optusnet.com.au...
>> Jim Kennedy doodled thusly:
>>
>> >I haven't had this problem under Oracle. (joins greater than 3 tables)
>>
>> me neither. this morning did an 11 table join without the slightest
>> performance problem...
>>
>>
>
> With complex subqueries, where exists and where not in's
> with outer joins?

sure, it is not a function of the number of tables -- it's a function of how much
data is going to have to be sifted through. Those constructs could cause you to
sift through tons of data (and if you attempted to do it procedurally, on your
own, in the client -- you would lose).


I'm just putting this up to show off, but this is a view that is executed
internally at Oracle many thousands of times a day. It's part of a palm sync I
worked on. Some of the tables referenced in here have 16 million + rows. There
are exists, unions, subqueries out the whazoo, calls to custom PLSQL functions
for timezones and other things, queries on queries on queries and probably more
than 16 tables (never counted them).

It runs in about a second at most, typically subsecond (a function of the number
of events a person has ever scheduled in their tenure). I had the choice of
either filtering the data on the client (a palm sync conduit) or letting the
database do my work for me. I always choose the database - and trust me, it's
way faster (not saying the first query you write will be faster, you might have
to tune it but it'll be faster)

create or replace force view wwv_date_book_sync_schedulesV2
as
select user USERNAME,
rownum+200000 SEQNO,
nvl(event_reminder,0) ALARM_ADVANCE_TIME,
0 ALARM_ADVANCE_UNIT,
event_title DESCRIPTION,
date_time_begin_local+length_minutes/1440 END_DATE,
decode( nvl(event_reminder,0), 0, 'N', 'Y' ) ISALARMED,
isrepeating ISREPEATING,
decode( nvl(length_minutes,0), 0, 'Y', 'N' ) ISUNTIMED,
event_notes NOTE,
decode( event_recur_type, 'NO_REPEAT', 0, 'DAILY', 1,
'WEEKDAYS', 2, 'WEEKLY', 2, 'BIWEEKLY', 2, 'MONTHLY_DATE', 4,
'MONTHLY_DAY', 3, 'YEARLY', 5, null ) REPEAT_TYPE,
event_recur_end_date_local REPEAT_END_DATE,
decode( event_recur_type, 'NO_REPEAT', 0,
nvl(decode(event_recur_type,'BIWEEKLY',2,event_recur_frequency),1))
REPEAT_FREQUENCY,
decode(event_recur_type,'NO_REPEAT', 0,
date_time_begin_local- next_day(date_time_begin_local-7, 'SUN'))
REPEAT_ON_DAY,
decode( event_recur_type, 'NO_REPEAT', 0, event_recur_month_week-1)
REPEAT_ON_WEEK,
0 REPEAT_START_WEEK,
date_time_begin_local START_DATE,
nvl( palm_id, 0 ) PALM_ID,
'N' IS_ARCHIVED,
decode( deleted_as_of, to_date('01010001','ddmmyyyy'), 'N', 'Y' )
IS_DELETED,
decode( palm_id, NULL, 'Y', 'N' ) IS_NEW,
decode( palm_id, NULL, 'N', 'Y' ) IS_MODIFIED,
event_private IS_PRIVATE,
id DB_ID,
'N' FROM_PALM,
created_on CREATED_ON,
deleted_as_of DELETED_AS_OF,
api_updated_on API_UPDATED_ON,
table_from ,
event_owner
from
( select decode( length_minutes, 0, trunc(a.date_time_begin),
decode( a.date_time_begin, null, to_date(null),
get_adjusted_time( a.date_time_begin,
to_number(substr( userenv('client_info'), 1,
instr(userenv('client_info'),',')-1)) ) ) )
AS date_time_begin_local,
decode( length_minutes, 0, trunc( a.event_recur_end_date ),
decode( a.event_recur_end_date, null, to_date(null),
get_adjusted_time( a.event_recur_end_date,
to_number(substr( userenv('client_info'), 1,
instr(userenv('client_info'),',')-1)) ) ) )
AS event_recur_end_date_local,
id, event_reminder, event_title, nvl(length_minutes,0) length_minutes,
event_recur_type, event_recur_frequency,
event_recur_month_week, event_private, isrepeating, event_notes,palm_id,
created_on, deleted_as_of, api_updated_on, table_from, event_owner
from ( select a.id, event_reminder, event_title,
nvl(length_minutes,0) length_minutes,
event_recur_type, event_recur_frequency,
event_recur_month_week, event_private, event_recur_end_date,
date_time_begin , 'Y' isrepeating,
wwv_date_book_sync_utils_pkg.get_rules_notes( a.rowid )
event_notes, palm_id,
created_on, deleted_as_of, api_updated_on, 'RULES'
table_from, event_owner
from wwv_cal_schedules_rules$ a,
wwv_date_book_sync_id_mapping b,
( select distinct event_recur_rule_id
from wwv_cal_schedules$
where event_owner in ( select * from
wwv_date_book_sync_eventowners )
and event_recur_rule_id <> 0
and deleted_as_of = to_date( '01010001', 'ddmmyyyy' )
) c
where a.id = c.event_recur_rule_id
and a.deleted_as_of = to_date( '01010001', 'ddmmyyyy' )
and b.username(+) = USER
and b.id(+) = a.id
and b.isrule(+) = 'Y'
and ( (b.id is NULL )
or (b.id is not NULL and greatest(
a.deleted_as_of,nvl(a.api_updated_on,to_date('01010001','ddmmyyyy')) ) >
to_date(substr(userenv('client_info'),instr(userenv('client_info'),',')+1),'dd/mm/yyyy hh24:mi:ss'))
)
union all
select a.id, event_reminder, event_title,
nvl(length_minutes,0) length_minutes,
'NO_REPEAT' event_recur_type, 1 event_recur_frequency,
to_number(null) event_recur_month_week, event_private,
to_date(null) event_recur_end_date,
date_time_begin , 'N' isrepeating,
wwv_date_book_sync_utils_pkg.get_schedules_notes( a.rowid )
event_notes, palm_id,
created_on, deleted_as_of, api_updated_on,
'SCHEDULES' table_from, event_owner
from wwv_cal_schedules$ a,
wwv_date_book_sync_id_mapping b
where a.event_owner in ( select * from
wwv_date_book_sync_eventowners )
and a.deleted_as_of = to_date( '01010001', 'ddmmyyyy' )
and ( event_recur_rule_id = 0
or
(event_recur_rule_id <> 0 and change_exception = 'Y')
)
and b.username(+) = USER
and b.id(+) = a.id
and b.isrule(+) = 'N'
and ( (b.id is NULL )
or (b.id is not NULL and greatest(
a.deleted_as_of,nvl(a.api_updated_on,to_date('01010001','ddmmyyyy')) ) >
to_date(substr(userenv('client_info'),instr(userenv('client_info'),',')+1),'dd/mm/yyyy hh24:mi:ss'))
)
union all
select a.id, null event_reminder, null event_title, 0
length_minutes,
decode( isrule, 'Y', 'WEEKLY', 'NO_REPEAT' )
event_recur_type, 1 event_recur_frequency,
to_number(null) event_recur_month_week, 'N' event_private,
to_date(null) event_recur_end_date,
to_date(null) date_time_begin , isrule isrepeating,
null event_notes, palm_id,
to_date(null) created_on, sysdate deleted_as_of,
to_date(null) api_updated_on, 'ID_MAPPING' table_from,
'unknown'
from wwv_date_book_sync_id_mapping a
where username = USER
and ( ( isRule = 'Y'
and
not exists ( select null
from wwv_cal_schedules_rules$
where id = a.id
and event_owner in
( select * from
wwv_date_book_sync_eventowners )
and deleted_as_of = to_date( '01010001', 'ddmmyyyy' )
)
)
or ( isRule = 'N'
and
not exists ( select null
from wwv_cal_schedules$
where id = a.id
and event_owner in ( select * from wwv_date_book_sync_eventowners )
and ( deleted_as_of = to_date( '01010001', 'ddmmyyyy' )
or
( event_recur_rule_id <> 0 and change_exception = 'N' )
)
)
)
or ( isRule = 'N'
and
exists ( select null
from wwv_cal_schedules$
where id = a.id
and event_owner in ( select * from wwv_date_book_sync_eventowners )
and event_recur_rule_id <> 0 and change_exception = 'N'
)
)
)
) a
)

--
Thomas Kyte (tk...@us.oracle.com) http://asktom.oracle.com/
Expert one on one Oracle, programming techniques and solutions for Oracle.
http://www.amazon.com/exec/obidos/ASIN/1861004826/
Opinions are mine and do not necessarily reflect those of Oracle Corp

Roger Stapley

Jan 8, 2002, 9:00:44 PM
Now Tom - that's just plain nasty!

Nuno Souto

Jan 9, 2002, 6:52:32 AM
David Williams doodled thusly:

>>
>
> With complex subqueries, where exists and where not in's
> with outer joins?
>

very much so. one dynamic view, three unions, three subqueries on
each union. around 11 tables per union. ah, yes: one subquery on the
column list as well.
not a major problem at all in Oracle, provided all PK/FK indexing is
in place and there are reasonably fresh stats for the optimizer to
use.
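
e.g. (oracle syntax, names invented for illustration):

create index emp_dept_fk_i on emp (dept_id);   -- index the FK side of each join
analyze table emp compute statistics;
analyze table dept compute statistics;         -- or use dbms_stats.gather_table_stats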

of course if the result set was in the millions of rows, things would
slow down. but that has nothing to do with query efficiency and all
to do with result set data volumes. all this query is doing is
pulling out about 100 rows out of a few hundred thousand on each
table.


Cheers
Nuno Souto
nso...@optushome.com.au.nospam

Daniel Guntermann

Jan 12, 2002, 4:49:24 AM
Of course the best alternative depends on the context of the problem, but
generally speaking, especially in case 3 above, I'm not sure that I agree
that creating a bunch of temp tables is necessarily the most "efficient" way
of processing or executing a general query involving complex sub-selects and
one/few main tables. It helps to "divide and conquer" the problem-solving
process when constructing complex queries, but I am inclined to believe that
this does not mean an automatic physical manifestation of that logic in the
form of separate distinct queries and temporary structures.

I'm not familiar with the Informix optimizer, but these days, optimizers
have gotten to the point where they are pretty decent in finding more
efficient algorithms, usually based on finding equivalent relational
algebra/calculus expressions, to generate the final relation representation,
given that they have an entire query to work with at once.

Another issue is whether the temp tables are actually written to disk, or
whether they are maintained purely in memory. Disk I/O still takes longer
than holding and processing everything in memory. I imagine that some
vendors attempt to use temp tables in a transient sort of way, while others,
as well as developers, actually initialize structures on disk as a
persistent structure.

I personally try as much as possible to refrain from intermediate table
solutions unless it is absolutely clear that the optimizer and indexes are
doing an unacceptable job of producing results. If intermediate tables are
required to materialize complex processing results to improve or facilitate
view and reporting results based on batch or large sets of data, and
especially in the case of deriving new data, I can understand the use of
them. But in the case of temp tables, there is often some overhead involved
in managing the creation, population, and destruction of such tables; and it
goes against the mantra of one fact, in one place, etc. because all one is
really doing is filtering the same data into new structures as a kind of
'snapshot', often co-existing within the same local database.

Thus, replication issues and concurrency issues might come into play. For
example, one issue that might be considered is the possibility that in the
case of analyzing and solving a complex problem from a large set of data
using temp tables, tuples within the base data set could change mid way
through the extraction process between different temp tables. Ideally, an
optimizer would make the right choices in determining the granularity,
level, and length of time for maintaining locks on all tuples/rows being
manipulated/scanned for the duration of the entire query - by this I mean
both the main query and the nested sub-queries and various correlations.
However, I think a danger might exist in that, in using a "divide and
conquer" process with multiple intermediate temp db structures, data could
be updated or added in the base tables between individual queries that could
lead to some incongruity and inconsistency.

Obviously, there might be some cases where it would be wise to explicitly
control the locks on data when using temporary structures, especially since
you take this ability away from the optimizer when you break it up into
chunks. On the other hand, if the database is not updatable, then the
points I raised are not really an issue except in the cases where data might
be concurrently added as a query is processing; which in that case, the
points I raised are just as pertinent.

I do agree on the hints.

Dan

Daniel A. Morgan

Jan 12, 2002, 7:01:38 AM
People that want to build a bunch of temp tables generally come from a SQL
Server / Sybase background and haven't taken any time to learn Oracle's
architecture. They are trying to write what they did in SQL Server and think
because it was what you did in one product it is what you do in all products.
They also tend to write absolutely horrible code that brings Oracle to its
knees.

In more than 12 years of coding in Oracle I have never found a case where
"creating a bunch of temp tables" was essential or even advisable.

Daniel Morgan

David Cressey

Jan 16, 2002, 9:52:19 AM
Tushar,

It depends. It always depends.

My comments are not about version 9i, but apply pretty much across
the board, and should extend to 9i reasonably well.

For a well designed and well behaved database, and a well formed
query, the brute force method nearly always yields performance that
is good enough. Given the extra programming effort needed to
implement "divide and conquer", and the extra maintenance effort that
this approach may create for the future, the brute force method is
probably the one to try first, and use divide and conquer in those
cases where performance is really a problem.

Let me change the phrasing. In this case, I prefer, "free the
optimizer" to "brute force". The optimizer is a lot more subtle than
a mere brute force solution. The cost based optimizer will come up
with an execution plan that reflects available indexes, various join
algorithms, table cardinalities (size expressed in rows), and other
considerations.

If, for example, there is one huge table, with selection criteria
that will narrow the search down to a few dozen rows, and the other
tables can all be searched using an index for table lookup, the
response will be blazingly fast, just using the optimizer.

There are several things that can upset this rosy picture.

The database design might be illogical or clumsy. The database may be
overpopulated, compared to what was originally planned for. The query
may be poorly phrased, or in some cases, logically incorrect. The
database may be using the RBO, and the phrasing of the query may lead
to an unfortunate execution plan. The query may be correct, but
outside the scope of what was thought of when the database was
designed. I'm thinking particularly of index design.

My plan usually, is to make the query as sensible and well formed as
possible, and to try it out. Often, this yields satisfactory
performance, and I move on to the next issue. In cases where the
performance is a problem, I look at the execution plan. If it's a
terrible plan, like for instance one that involves a cartesian join
between large tables, I look for several things.

One thing I look for is logical errors in the query. Then, I make sure
the CBO is in use. Then, I make sure the tables have been analyzed so
that the size is accurately cataloged. Then, I take a look at the
indexes.
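
For example (Oracle; this assumes PLAN_TABLE exists, e.g. created by
utlxplan.sql, and uses the table names from the original question purely as an
illustration):

explain plan for
select *
from ParentTable P, Child1Table C1
where P.Child1Key = C1.Key;

-- assumes plan_table currently holds only this one plan
select lpad(' ', 2*level) || operation || ' ' || options || ' ' || object_name query_plan
from plan_table
start with id = 0
connect by prior id = parent_id;

analyze table ParentTable compute statistics;
analyze table Child1Table compute statistics;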

19 out of 20 queries I write never get beyond this point.

If I haven't found the problem by then, I'll use hints, or rephrase
the query. As a last resort, I'll program it in "divide and conquer"
fashion.

I realize this is all kind of vague and hand wavy, but I hope it
helps.

Keith Boulton

Jan 16, 2002, 7:24:05 PM
Someone has already said that if using a divide and conquer approach, you do
a single table scan during the session so that you can expand lookup codes.
This gets more efficient the more likely people are to stay connected -
however:

> is good enough. Given the extra programming effort needed to
> implement "divide and conquer", and the extra maintenance effort that

Minimal, given adequate programmers (what is that pink thing that just flew
by the window?)

> a mere brute force solution. The cost based optimizer will come up
> with an execution plan that reflects available indexes, various join
> algorithms, table cardinalities (size expressed in rows), and other
> considerations.

And then randomly select one.

> If, for example, there is one huge table, with selection criteria
> that will narrow the search down to a few dozen rows, and the other
> tables can all be
> searched using a index for table lookup, the response will be
> blazingly fast, just using the optimizer.

Not true of queries using bind variables.

> The database design might be illogical or clumsy. The database may be
> overpopulated, compared to what was originally planned for.

The point of the CBO is that it should allow for this (it doesn't)

> The query may be poorly phrased

ie the cbo is crap and can't understand it

> possible, and to try it out. Often, this yields satisfactory
> performance, and I move on to the next issue. In cases where the

The rule of all rules: fast enough is fast enough

