Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

LEFT JOIN vs. OUTER -- Performance Problem

763 views
Skip to first unread message

rspi...@googlemail.com

unread,
Mar 7, 2012, 11:17:28 AM3/7/12
to
Dear Informixers,

why is a query containing outer joins written as LEFT JOIN so much slower than when written using the Informix-specific OUTER syntax, as soon as the query includes tables from another database in another Informix instance on the same server?

Here is a test case. This query returns instantly:

SELECT
s.persnr,
s.vollname,
vt.v_end AS ver_ende,
fa.von AS farzt_datum,
dr.von AS doktr_datum
FROM stamm s, OUTER vertrag vt,
OUTER verteiler@opserver_tcp:pers_zusatz fa,
OUTER verteiler@opserver_tcp:pers_zusatz dr
WHERE s.persnr > 0
AND vt.persnr = s.persnr
AND vt.v_end = (SELECT max(v_end) FROM vertrag x WHERE x.persnr = s.persnr)
AND fa.persnr = s.persnr
AND fa.attribut = "FARZ"
AND dr.persnr = s.persnr
AND dr.attribut = "PROM"
ORDER BY s.persnr DESC

while this query takes over 1 minute to return identical results:

SELECT
s.persnr,
s.vollname,
vt.v_end AS ver_ende,
fa.von AS farzt_datum,
dr.von AS doktr_datum
FROM stamm s
LEFT JOIN vertrag vt ON (vt.persnr = s.persnr AND vt.v_end = (SELECT max(v_end) FROM vertrag x WHERE x.persnr = s.persnr) )
LEFT JOIN verteiler@opserver_tcp:pers_zusatz fa ON (fa.persnr = s.persnr AND fa.attribut = "FARZ")
LEFT JOIN verteiler@opserver_tcp:pers_zusatz dr ON (dr.persnr = s.persnr AND dr.attribut = "PROM")
WHERE s.persnr > 0
ORDER BY s.persnr DESC

The tables involved are rather small:
stamm: 985 rows
vertrag: 5214 rows
verteiler@opserver_tcp:pers_zusatz: 14963 rows

In the test environment where the table "pers_zusatz" was in the same database as the other two tables, there were no performance problems.

Database engine is IDS 11.50UC7GE running on SuSE Linux Enterprise Server 10 SP1 32bit.

I didn't write the original "LEFT JOIN" query, the application developer asked me for help as the database admin. Re-writing the query using the OUTER syntax solved the problem for now, but I'm still puzzled. Any ideas?

Regards, Richard

Marco Greco

unread,
Mar 7, 2012, 12:22:40 PM3/7/12
to inform...@iiug.org
> _______________________________________________
> Informix-list mailing list
> Inform...@iiug.org
> http://www.iiug.org/mailman/listinfo/informix-list
>

Your proble is the

WHERE s.persnr> 0

filter.
In ANSI joins, this is evaluated as a post join filter, which means that all
the joins get materialized first and then the filter gets applied.
Moove it to the filters in the first join and skip the where clause entirely -
that should fix it.

--
Ciao,
Marco
______________________________________________________________________________
Marco Greco /UK /IBM Standard disclaimers apply!

Structured Query Scripting Language http://www.4glworks.com/sqsl.htm
4glworks http://www.4glworks.com
Informix on Linux http://www.4glworks.com/ifmxlinux.htm

Richard Spitz

unread,
Mar 8, 2012, 4:33:53 AM3/8/12
to inform...@iiug.org
Am Mittwoch, 7. März 2012 18:22:40 UTC+1 schrieb Marco Greco:
>
> Your proble is the
>
> WHERE s.persnr> 0
>
> filter.
> In ANSI joins, this is evaluated as a post join filter, which means that all
> the joins get materialized first and then the filter gets applied.
> Moove it to the filters in the first join and skip the where clause entirely

I read about the post join filters in ANSI LEFT JOINs when I googled for a solution, but I just can't figure out where to apply the "WHERE s.persnr > 0" filter so I really only get results where this condition is met. Could you tell me how to write the above query with the filter in the first join?

When I leave out this condition altogether, the run time for the query is cut into half from 90 to 45 seconds, but this is still way too much.

Regards, Richard

Marco Greco

unread,
Mar 8, 2012, 7:14:16 AM3/8/12
to inform...@iiug.org
> _______________________________________________
> Informix-list mailing list
> Inform...@iiug.org
> http://www.iiug.org/mailman/listinfo/informix-list
>

This should do it

SELECT
s.persnr,
s.vollname,
vt.v_end AS ver_ende,
fa.von AS farzt_datum,
dr.von AS doktr_datum
FROM stamm s
LEFT JOIN vertrag vt ON (s.persnr > 0 AND vt.persnr = s.persnr AND vt.v_end
= (SELECT max(v_end) FROM vertrag x WHERE x.persnr = s.persnr) )
LEFT JOIN verteiler@opserver_tcp:pers_zusatz fa ON (fa.persnr = s.persnr
AND fa.attribut = "FARZ")
LEFT JOIN verteiler@opserver_tcp:pers_zusatz dr ON (dr.persnr = s.persnr
AND dr.attribut = "PROM")
ORDER BY s.persnr DESC

Also, do not forget to compare ANSI and non ANSI query plans. It wouldn't be
the first time they are optimized differently.

Richard Spitz

unread,
Mar 8, 2012, 11:05:18 AM3/8/12
to inform...@iiug.org
Am Donnerstag, 8. März 2012 13:14:16 UTC+1 schrieb Marco Greco:
> This should do it
>
> SELECT
> s.persnr,
> s.vollname,
> vt.v_end AS ver_ende,
> fa.von AS farzt_datum,
> dr.von AS doktr_datum
> FROM stamm s
> LEFT JOIN vertrag vt ON (s.persnr > 0 AND vt.persnr = s.persnr AND vt.v_end
> = (SELECT max(v_end) FROM vertrag x WHERE x.persnr = s.persnr) )
> LEFT JOIN verteiler@opserver_tcp:pers_zusatz fa ON (fa.persnr = s.persnr
> AND fa.attribut = "FARZ")
> LEFT JOIN verteiler@opserver_tcp:pers_zusatz dr ON (dr.persnr = s.persnr
> AND dr.attribut = "PROM")
> ORDER BY s.persnr DESC

Thanks for your suggestion, but I had tried the same before and it didn't solve the problem. Not only does this query return records with persnr < 0, but it is just as slow as when omitting the post join filter "persnr > 0" altogether.

> Also, do not forget to compare ANSI and non ANSI query plans. It wouldn't be
> the first time they are optimized differently.

Posting the output of "set explain on" would be a bit much in this newsgroup, but I'd be happy to send it by mail to anybody willing to look at it.

Regards, Richard

Marco Greco

unread,
Mar 8, 2012, 12:45:48 PM3/8/12
to informix-list
this definitely applies the filter on stamm before any join, although chances
are that the subquery on stamm might be materialized:

SELECT
s.persnr,
s.vollname,
vt.v_end AS ver_ende,
fa.von AS farzt_datum,
dr.von AS doktr_datum
FROM (select * from stamm where stamm.persnr>0) s
LEFT JOIN vertrag vt ON (vt.persnr = s.persnr AND vt.v_end = (SELECT
max(v_end) FROM vertrag x WHERE x.persnr = s.persnr) )
LEFT JOIN verteiler@opserver_tcp:pers_zusatz fa ON (fa.persnr = s.persnr
AND fa.attribut = "FARZ")
LEFT JOIN verteiler@opserver_tcp:pers_zusatz dr ON (dr.persnr = s.persnr
AND dr.attribut = "PROM")
ORDER BY s.persnr DESC


Richard Spitz

unread,
Mar 12, 2012, 12:15:28 PM3/12/12
to informix-list
Am Donnerstag, 8. März 2012 18:45:48 UTC+1 schrieb Marco Greco:
> this definitely applies the filter on stamm before any join, although chances
> are that the subquery on stamm might be materialized:
>
> SELECT
> s.persnr,
> s.vollname,
> vt.v_end AS ver_ende,
> fa.von AS farzt_datum,
> dr.von AS doktr_datum
> FROM (select * from stamm where stamm.persnr>0) s
> LEFT JOIN vertrag vt ON (vt.persnr = s.persnr AND vt.v_end = (SELECT max(v_end) FROM vertrag x WHERE x.persnr = s.persnr) )
> LEFT JOIN verteiler@opserver_tcp:pers_zusatz fa ON (fa.persnr = s.persnr
> AND fa.attribut = "FARZ")
> LEFT JOIN verteiler@opserver_tcp:pers_zusatz dr ON (dr.persnr = s.persnr
> AND dr.attribut = "PROM")
> ORDER BY s.persnr DESC

Thanks again, this time the query delivers the correct results but performs just as badly as my original version.

The main question remains: Why does LEFT JOIN performance suck as soon as the query involves tables from another server?

Regards, Richard

Everett Mills

unread,
Mar 12, 2012, 1:13:59 PM3/12/12
to inform...@iiug.org
Richard-
My best guess is that it's applying your fa.attribut = "FARZ" and dr.attribut = "PROM" filters after the rows are brought to the querying server. Try this instead. I think it should prevent the excess traffic:

SELECT
s.persnr,
s.vollname,
vt.v_end AS ver_ende,
fa.von AS farzt_datum,
dr.von AS doktr_datum
FROM (select * from stamm where stamm.persnr>0) s
LEFT JOIN vertrag vt ON (vt.persnr = s.persnr AND vt.v_end =
(SELECT max(v_end) FROM vertrag x WHERE x.persnr = s.persnr) )
LEFT JOIN
(
SELECT von, persnr FROM verteiler@opserver_tcp:pers_zusatz fa
WHERE attribute = "FARZ"
) fa
ON (fa.persnr = s.persnr)
LEFT JOIN
(
SELECT von, persnr FROM verteiler@opserver_tcp:pers_zusatz
WHERE attribute = "PROM"
) dr
ON (dr.persnr= s.persnr)
ORDER BY s.persnr DESC

--EEM

If you need fa and dr to be indexed, load them into temp tables, index them and join the temp tables instead.

--EEM

Richard Spitz

unread,
Mar 14, 2012, 12:34:52 PM3/14/12
to inform...@iiug.org
Hi EEM,

you da man!

You are right on the spot with your assumption. Your query runs just in under a second!

I'm presently rewriting the original query (which is more complicated and, among other things, contains a couple more of these left joins) to see if the performance is still comparable to the OUTER syntax, but now I know I'm on the right track.

Thanks a bunch,

Richard
0 new messages