Discrepency between SQL/MP Explain plan, actual sequence of IOs

Brian Ray

unread,

Nov 20, 2007, 1:34:46 PM11/20/07

to

Working on a Pathway server that does the ole' "grab a screen's worth of
data for display in a list." Basic first page, next page, previous page kind
of stuff. It opens a cursor consisting of a two-table inner join, let's call
them TABLE1 and TABLE2. There's a lot of predicate filtering on several of
TABLE1's columns, for which we built an index, and a tiny bit of filtering
on one of TABLE2's columns (beyond the implicit filtering of the inner
join). So on inhouse SQL compilation we get an Explain much like the
following:

- Operation 1) Scan TABLE1 via the aforementioned purpose-built index.

- Operation 2) Scan TABLE2 via one of its indexes containing the one column
used in the predicate filter and then

- Operation 2.1) Hash join the result to the result of Operation 1.

And hooray, no sorts required. In this dev pathway our datasets are pretty
small and the join method usually changes when we get the software into a
bigger production database. But I'm happy because it's concentrating on
TABLE1's index right out of the gate; this is the most selective step.

So then it's just a few dozen fetches until we hit end of cursor or fill up
the reply message with result rows. Cursor is BROWSE ACCESS.

Now Measure was showing some unusually high RECORDS USED and RECORDS
ACCESSED counts for this cursor, but low LOGICAL READS on the corresponding
tables, so I popped a some SQLSADISPLAY statements after the cursor open and
the fetch. The behavior reported by SQLSADISPLAY was:

- Cursor open: A block's worth (?) of *TABLE2's* rows accessed, and a few
dozen less rows used--all the rows that would satisfy the SELECT. None of
TABLE1's rows accessed or used.

- Each fetch: 1 row accessed and used from *TABLE1*, none for TABLE2.

Which seems like the exact opposite of the strategy that the Explain plan
laid out. Between SQL compilation and runtime, can the query processor flip
strategies on the fly? And why, if the cursor is BROWSE ACCESS, does it
scoop up all the rows from one of the tables? This cursor does not trigger a
sort. Nor is it locking any rows. Since the entire result set of one of the
scans resided in a single block (I'm guessing), did the thing say "Aha, no
point in going back to the disk/disk cache for TABLE2's rows for these
fetches, I'll flip the whole access plan on its head."?

And as mentioned, I'm confused why all the SQLSTMT's records accessed, or at
least records used, are not reflected in the corresponding FILEs' logical
reads--the SQLSTMT counters are much higher than the FILE counters. Do the
records accessed/used on a cursor open somehow "not count"? Do only the rows
accessed/used during the fetch count?

Thanks,

Brian Ray

wbreidbach

unread,

Nov 21, 2007, 3:42:45 AM11/21/07

to

Hi Brian,

perhaps you should use Measure to check this. Just do an "ADD SQLSTMT
<your object>" and you will see, which statement produces how much I/
O.
Regards
Wolfgang

wbreidbach

unread,

Nov 21, 2007, 3:48:46 AM11/21/07

to

On 21 Nov., 09:42, wbreidbach <wolfgang.breidb...@bv-

> Wolfgang- Zitierten Text ausblenden -
>
> - Zitierten Text anzeigen -

Hi again,

sorry, hit the "send button" too early, you already did what I
recommended. But you should have a look at the object itself by doing
a FUP INFO..., DETAIL. If you do not have a "VALID SQL OBJECT", the
program would be SQL-compiled every time it is used and this might
explain the different explain plans. Another thing are the statistics
on the tables. Are the actual? And did you use UPDATE STATISTICS or
UPDATE ALL STATISTICS? If you use the probabilistic option, queries
might run better, but creating the statistics takes much longer.
Regards
Wolfgang

Keith

unread,

Nov 21, 2007, 8:20:40 AM11/21/07

to

This is pure speculation, but one thing that occurs to me is that the
results from SQLSADISPLAY might look odd if the query was able to do
the scan of table1 using only columns from the index. In such a case,
I don't know what it would report in the SQLSA.

I'd kind of expect it to give the index name as the table name and
tell you what it accessed from the index, but maybe it doesn't do
that.

I believe the Explain makes it clear whether the query plan is to do
an index-only scan or use the index then access its base table. Which
does it show for your query?

Again, this is just speculation on my part -- I don't have any
experience with what is reported in SQLSA when the query involves an
index-only scan.

Brian Ray

unread,

Nov 21, 2007, 2:33:08 PM11/21/07

to

"wbreidbach" <wolfgang....@bv-zahlungssysteme.de> wrote in message
news:f9237cc0-b7e0-4c39...@f3g2000hsg.googlegroups.com...

> On 21 Nov., 09:42, wbreidbach <wolfgang.breidb...@bv-
> zahlungssysteme.de> wrote:

[chopping ...]

> sorry, hit the "send button" too early, you already did what I
> recommended. But you should have a look at the object itself by doing
> a FUP INFO..., DETAIL. If you do not have a "VALID SQL OBJECT", the
> program would be SQL-compiled every time it is used and this might
> explain the different explain plans. Another thing are the statistics
> on the tables. Are the actual? And did you use UPDATE STATISTICS or
> UPDATE ALL STATISTICS? If you use the probabilistic option, queries
> might run better, but creating the statistics takes much longer.
> Regards
> Wolfgang

I should definitely get statistics updated on these tables before looking at
the stats again. They may not be fresh.

It's a valid SQL object.

Thanks,

Brian Ray

unread,

Nov 21, 2007, 2:42:16 PM11/21/07

to

"Keith" <kd...@acm.org> wrote in message
news:2b40a60e-1477-45f1...@w34g2000hsg.googlegroups.com...

Aha, that could be it, as the TABLE1 scan is by indexed columns only--no
base table predicate. The cursor does select un-indexed TABLE1 columns, but
only filters by indexed columns. On cursor open it could just be dealing
with TABLE1's index, which contains everything it needs for the WHERE clause
and for the inner join. Then on the actual fetch maybe it's going back and
picking up the other selected columns.

Thanks,

Brian Ray

HKREGC...@spammotel.com

unread,

Nov 22, 2007, 11:55:22 AM11/22/07

to

Tables joined by a hash don't have to be accessed in any particular
order. In fact, in SQL/MX, they can be accessed simultaneously
(inherent parallelism). However, the explain plan must show them in
some order due to how it formats its output.

No sort? The query must not have an ORDER BY. Isn't there a
requirement for ordering the result? If the results from a hashjoin
come back in the correct order, it's purely by accident. This
(usually) only happens with small datasets.

Also, it's likely that the entire result set has to be built prior to
anything being returned by the first fetch. This might prove too slow
for a production database.

The steps for a hash join (might make the Measure stats more
meaningful):

1. Read all the data (that can possibly satisfy the query) from one of
the tables and hash the join key.
2. Read all the data (that can possibly satisfy the query) from the
other table and hash the join key.
3. Join the data using the hashed values.
4. Return the result.

You might want to try a larger dataset with an ORDER BY clause (with
STATS updated of course). It may produce a plan that uses a nested
join loop in the required order (much more efficient for cursors).
CONTROL QUERY INTERACTIVE ACCESS ON may coax the optimizer into
choosing this option.

Hope this helps,
Joe Bishop

Brian Ray

unread,

Nov 27, 2007, 2:27:04 PM11/27/07

to

<HKREGC...@spammotel.com> wrote in message
news:9473d4bd-f4c0-4956...@s36g2000prg.googlegroups.com...

> On Nov 20, 11:34 am, "Brian Ray" <be_...@sbcglobal.net> wrote:

[orig post snipped]

>
> Tables joined by a hash don't have to be accessed in any particular
> order. In fact, in SQL/MX, they can be accessed simultaneously
> (inherent parallelism). However, the explain plan must show them in
> some order due to how it formats its output.

Good to know.

> No sort? The query must not have an ORDER BY. Isn't there a
> requirement for ordering the result? If the results from a hashjoin
> come back in the correct order, it's purely by accident. This
> (usually) only happens with small datasets.

It has an ORDER BY that jibes with sort order of the index used to access
the first table.

> Also, it's likely that the entire result set has to be built prior to
> anything being returned by the first fetch. This might prove too slow
> for a production database.

Indeed, if this were to happen in production it would short-circuit the
intent of this enhancement, which was to drive down IO per screenful of data
to some reasonable volume. Killing the join and implementing it "manually,"
via a cursor on the first table and a separate SELECT on the second table,
dropped IOs/$RECEIVE down to what I was predicting in the first place.

> The steps for a hash join (might make the Measure stats more
> meaningful):
>
> 1. Read all the data (that can possibly satisfy the query) from one of
> the tables and hash the join key.
> 2. Read all the data (that can possibly satisfy the query) from the
> other table and hash the join key.
> 3. Join the data using the hashed values.
> 4. Return the result.
>
> You might want to try a larger dataset with an ORDER BY clause (with
> STATS updated of course). It may produce a plan that uses a nested
> join loop in the required order (much more efficient for cursors).
> CONTROL QUERY INTERACTIVE ACCESS ON may coax the optimizer into
> choosing this option.

I was hoping for this behavior in my tiny little inhouse pathway but for
lack of time to experiment further, I'm resigned to getting rid of the inner
join. Next time 'round I'll find a better development database ...

> Joe Bishop

Heh, the Tandem world is too small a place for there to be two Joe Bishops.
I was just in the Bay Area and stopped by the old FDS offices to say high to
Eric, Jacob, and the handful of guys still there from My Old Days (1993 -
1997). Greetings once again.

Brian

HKREGC...@spammotel.com

unread,

Nov 28, 2007, 12:29:25 AM11/28/07

to

On Nov 27, 12:27 pm, "Brian Ray" <be_...@sbcglobal.net> wrote:
> <HKREGCJAB...@spammotel.com> wrote in message

>
> news:9473d4bd-f4c0-4956...@s36g2000prg.googlegroups.com...
>
> > On Nov 20, 11:34 am, "Brian Ray" <be_...@sbcglobal.net> wrote:
> [orig post snipped]
>

> [more snipping]

>
> > You might want to try a larger dataset with an ORDER BY clause (with
> > STATS updated of course). It may produce a plan that uses a nested
> > join loop in the required order (much more efficient for cursors).
> > CONTROL QUERY INTERACTIVE ACCESS ON may coax the optimizer into
> > choosing this option.
>
> I was hoping for this behavior in my tiny little inhouse pathway but for
> lack of time to experiment further, I'm resigned to getting rid of the inner
> join. Next time 'round I'll find a better development database ...
>
> > Joe Bishop
>
> Heh, the Tandem world is too small a place for there to be two Joe Bishops.
> I was just in the Bay Area and stopped by the old FDS offices to say high to
> Eric, Jacob, and the handful of guys still there from My Old Days (1993 -
> 1997). Greetings once again.
>
> Brian

Hey Brian,
I thought I recognized that name! I owe the FDS guys a visit too.

I did some more research, and I'm afraid I have to take back those
hash join steps. They only apply during a parallel operation using a
"hybrid" hash join.

The revised version:

1. Read all the required data from the table with the smaller dataset
and hash each distinct join key.
2. Read from the larger dataset and, for each row, compare the hash of
its join key with the hash value from the first table.
3. If it matches, return the result.

If the 2nd step uses an index (or primary key), the rows are returned
in the order of that key (what some call an "ordered" hash join).

Therefore, to avoid a sort, the index could only begin to be accessed
after the other table had been completely read.

I still think you'd get better performance by forcing a nested join
instead of splitting it into multiple statements.

Good to see that many of the old FDS crowd are still in the Tandem
world,
Joe