Question about HASH JOIN

45 views
Skip to first unread message

Tomasz Dubiel

unread,
Nov 23, 2022, 9:41:54 AM11/23/22
to firebird-support
Hello.
there could be an advantage on FB 3.0 with using hash join when joining small and big tables.
It really gives sometimes way better perfomance (even 13 times faster), but why is this a mysterious feature which we have to explicitly use? Why can't an optimizer check this?
Best regards.

Dmitry Yemanov

unread,
Nov 23, 2022, 9:58:59 AM11/23/22
to firebird...@googlegroups.com

Tomasz Dubiel

unread,
Nov 23, 2022, 10:11:48 AM11/23/22
to firebird-support
OK, so it works like that in Firebird 5.0.

Steve Naidamast

unread,
Nov 23, 2022, 11:20:29 AM11/23/22
to firebird-support
There is a Firebird 5.0 now?  Haven't seen it announced on the website...

Dimitry Sibiryakov

unread,
Nov 23, 2022, 11:33:02 AM11/23/22
to firebird...@googlegroups.com
Steve Naidamast wrote 23.11.2022 17:20:
> There is a Firebird 5.0 now?  Haven't seen it announced on the website...

It is under development. You are encouraged to try to use snapshot and report
issues.

--
WBR, SD.

Tomasz Dubiel

unread,
Nov 24, 2022, 2:48:06 AM11/24/22
to firebird-support
How will react Firebird 5.0 when spotting a.id_column +0 = b.id_column +0 in SQL? Will it check what is better or will it always choose hash join?

środa, 23 listopada 2022 o 15:58:59 UTC+1 Dmitry Yemanov napisał(a):

Dmitry Yemanov

unread,
Nov 24, 2022, 2:53:38 AM11/24/22
to firebird...@googlegroups.com
24.11.2022 10:48, Tomasz Dubiel wrote:

> How will react Firebird 5.0 when spotting a.id_column +0 = b.id_column
> +0 in SQL? Will it check what is better or will it always choose hash join?

Always hash join. Such a "hinting" with +0 is commonly used
intentionally to change the plan, so the optimizer will not be breaking
existing tricks.


Dmitry

Tomasz Dubiel

unread,
Nov 24, 2022, 3:02:50 AM11/24/22
to firebird-support
Well, I could think about such a situation where now , with Firebird 3 and current data in tables, hash join will be better, but with something new in Firebird 5 and different data hash join would be worse.
That's what is written on the page - this can be better when joining big and small tables, but that's not a rule (and I can also confirm this - thats very random on our databases).
Because, if in Firebird 5.0 an optimizer always chooses what is better, there is no use in forcing hash join, which we have to do now in Firebird 3.0. Thats why I would vote for ignoring +0 for both columns of a JOIN for Firebird 5.

Dimitry Sibiryakov

unread,
Nov 24, 2022, 5:34:13 AM11/24/22
to firebird...@googlegroups.com
Tomasz Dubiel wrote 24.11.2022 9:02:
> Because, if in Firebird 5.0 an optimizer always chooses what is better, there is
> no use in forcing hash join, which we have to do now in Firebird 3.0. Thats why
> I would vote for ignoring +0 for both columns of a JOIN for Firebird 5.

At first it must be proven to always choose what is better. Or explicit hints
must be introduced.

--
WBR, SD.

Dmitry Yemanov

unread,
Nov 24, 2022, 5:37:42 AM11/24/22
to firebird...@googlegroups.com
24.11.2022 11:02, Tomasz Dubiel wrote:

> Because, if in Firebird 5.0 an optimizer always chooses what is better,
> there is no use in forcing hash join, which we have to do now in
> Firebird 3.0. Thats why I would vote for ignoring +0 for both columns of
> a JOIN for Firebird 5.

If someone already used +0 tricks, they knows for sure (based on
experience) that hash join is better. Ignoring that would introduce a
risk of wrong optimizer decision (which is still possible regardless of
how clever the optimizer is).


Dmitry

Tomasz Dubiel

unread,
Nov 24, 2022, 9:39:30 AM11/24/22
to firebird-support
I try to check how to use it in our ERP system, but that is problematic to use. We have various clients with the same database structure and completely different level of data in it.
Sometimes SQL with hash join runs 13 times faster on one database, when on the other database - the same SQL runs a little bit more slowly. We havent noticed so far the situation where SQL with hash join is much slower, but it comes to testing on as many databases as possible.

Tomasz Dubiel

unread,
Nov 24, 2022, 9:42:50 AM11/24/22
to firebird-support
*" the same SQL" with hash join comparing to without hash join

Karol Bieniaszewski

unread,
Nov 24, 2022, 11:32:37 AM11/24/22
to firebird...@googlegroups.com

13 times faster is not possible, until current join is „wrong”.

Can you show some example?

 

Regards,

Karol Bieniaszewski

--
You received this message because you are subscribed to the Google Groups "firebird-support" group.
To unsubscribe from this group and stop receiving emails from it, send an email to firebird-suppo...@googlegroups.com.
To view this discussion on the web, visit https://groups.google.com/d/msgid/firebird-support/b6a9caa8-7b39-4026-a102-6dfdf761f4c0n%40googlegroups.com.

 

Tomasz Dubiel

unread,
Nov 25, 2022, 8:47:49 AM11/25/22
to firebird-support
Maybe it's an exaggeration, but I have now example of 7 times better perfomance and I need a little bit more of explanations to fully understand when and if only on FB 3.0.
Why is it adviced to use HASH JOIN on FB 3.0? I just checked that using this trick on FB 2.5 makes almost the same profit. I don't see HASH JOIN in the plan, but the difference in indexed reads is very similar when seeing HASH JOIN on FB 3.0.
Normal plan on FB 2.5 vs adding +0 to both columns of a join - second one resulting in 7 times better performance.
PLAN (R INDEX (FK_REZ_POZZR))
PLAN (R INDEX (FK_REZ_POZZR))
PLAN (R INDEX (FK_REZ_POZZR))
PLAN (R INDEX (FK_REZ_POZZR))
PLAN JOIN (JOIN (MERGE (SORT (JOIN (N INDEX (MK_NAGL_KLU_STATUS), NA INDEX (PK_NAGLAKT), KN INDEX (RDB$PRIMARY91), K INDEX (RDB$PRIMARY91), NZR INDEX (FK_NAGLZR_NAGL), NZW INDEX (RDB$FOREIGN1005), PZR INDEX (FK_POZZR_NAGLZR, FK_POZZR_KARTOTEKA), PZW INDEX (RDB$PRIMARY979))), SORT (U NATURAL)), AKW INDEX (RDB$PRIMARY18)), ND INDEX (PK_NAGLDOST))
PLAN JOIN (JOIN (JOIN (JOIN (N INDEX (MK_NAGL_KLU_STATUS), U INDEX (RDB$PRIMARY122), NA INDEX (PK_NAGLAKT), NZR INDEX (FK_NAGLZR_NAGL), NZW INDEX (RDB$FOREIGN1005), PZR INDEX (FK_POZZR_NAGLZR, FK_POZZR_KARTOTEKA), PZW INDEX (RDB$PRIMARY979)), J INDEX (RDB$PRIMARY116)), AKW INDEX (RDB$PRIMARY18)), ND INDEX (PK_NAGLDOST))

Adapted Plan
--------------------------------------------------------------------------------
PLAN (R INDEX (FK_REZ_POZZR))
PLAN (R INDEX (FK_REZ_POZZR))
PLAN (R INDEX (FK_REZ_POZZR))
PLAN (R INDEX (FK_REZ_POZZR))
PLAN JOIN (JOIN (MERGE (SORT (JOIN (N INDEX (MK_NAGL_KLU_STATUS), NA INDEX (PK_NAGLAKT), KN INDEX (PK_KONTRAH), K INDEX (PK_KONTRAH), NZR INDEX (FK_NAGLZR_NAGL), NZW INDEX (FK_NAGLZAMWSP_NAGL), PZR INDEX (FK_POZZR_NAGLZR, FK_POZZR_KARTOTEKA), PZW INDEX (PK_POZZAMWSP))), SORT (U NATURAL)), AKW INDEX (PK_AKWIZYTOR)), ND INDEX (PK_NAGLDOST))
PLAN JOIN (JOIN (JOIN (JOIN (N INDEX (MK_NAGL_KLU_STATUS), U INDEX (PK_UZYTKOWNIK), NA INDEX (PK_NAGLAKT), NZR INDEX (FK_NAGLZR_NAGL), NZW INDEX (FK_NAGLZAMWSP_NAGL), PZR INDEX (FK_POZZR_NAGLZR, FK_POZZR_KARTOTEKA), PZW INDEX (PK_POZZAMWSP)), J INDEX (PK_JEDNORG)), AKW INDEX (PK_AKWIZYTOR)), ND INDEX (PK_NAGLDOST))

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

PLAN (R INDEX (FK_REZ_POZZR))
PLAN (R INDEX (FK_REZ_POZZR))
PLAN (R INDEX (FK_REZ_POZZR))
PLAN (R INDEX (FK_REZ_POZZR))
PLAN JOIN (JOIN (MERGE (SORT (JOIN (PZR INDEX (FK_POZZR_KARTOTEKA), NZR INDEX (PK_NAGLZR), PZW INDEX (RDB$PRIMARY979))), SORT (MERGE (SORT (JOIN (N INDEX (MK_NAGL_KLU_STATUS), NA INDEX (PK_NAGLAKT), KN INDEX (RDB$PRIMARY91), K INDEX (RDB$PRIMARY91), NZW INDEX (RDB$FOREIGN1005))), SORT (U NATURAL)))), AKW INDEX (RDB$PRIMARY18)), ND INDEX (PK_NAGLDOST))
PLAN JOIN (JOIN (JOIN (MERGE (SORT (MERGE (SORT (JOIN (PZR INDEX (FK_POZZR_KARTOTEKA), NZR INDEX (PK_NAGLZR), PZW INDEX (RDB$PRIMARY979))), SORT (JOIN (N INDEX (MK_NAGL_KLU_STATUS), NA INDEX (PK_NAGLAKT), NZW INDEX (RDB$FOREIGN1005))))), SORT (U NATURAL)), J INDEX (RDB$PRIMARY116)), AKW INDEX (RDB$PRIMARY18)), ND INDEX (PK_NAGLDOST))

Adapted Plan
--------------------------------------------------------------------------------
PLAN (R INDEX (FK_REZ_POZZR))
PLAN (R INDEX (FK_REZ_POZZR))
PLAN (R INDEX (FK_REZ_POZZR))
PLAN (R INDEX (FK_REZ_POZZR))
PLAN JOIN (JOIN (MERGE (SORT (JOIN (PZR INDEX (FK_POZZR_KARTOTEKA), NZR INDEX (PK_NAGLZR), PZW INDEX (PK_POZZAMWSP))), SORT (MERGE (SORT (JOIN (N INDEX (MK_NAGL_KLU_STATUS), NA INDEX (PK_NAGLAKT), KN INDEX (PK_KONTRAH), K INDEX (PK_KONTRAH), NZW INDEX (FK_NAGLZAMWSP_NAGL))), SORT (U NATURAL)))), AKW INDEX (PK_AKWIZYTOR)), ND INDEX (PK_NAGLDOST))
PLAN JOIN (JOIN (JOIN (MERGE (SORT (MERGE (SORT (JOIN (PZR INDEX (FK_POZZR_KARTOTEKA), NZR INDEX (PK_NAGLZR), PZW INDEX (PK_POZZAMWSP))), SORT (JOIN (N INDEX (MK_NAGL_KLU_STATUS), NA INDEX (PK_NAGLAKT), NZW INDEX (FK_NAGLZAMWSP_NAGL))))), SORT (U NATURAL)), J INDEX (PK_JEDNORG)), AKW INDEX (PK_AKWIZYTOR)), ND INDEX (PK_NAGLDOST))

Tomasz Dubiel

unread,
Nov 25, 2022, 8:50:06 AM11/25/22
to firebird-support
Modified SELECT: with added +0 to both columns of a join

select *
from (select PZR.ID_KARTOTEKA, PZR.ID_MAGAZYN, 'Z' as RODZAJ, N.DATADOK, N.NRDOKZEW, PZR.ILOSC as ILOSC,
             PZR.ILOSCREZ as ILPOTW, PZR.ILOSCZREALIZ as ILZREALIZZAM, PZR.ILOSCDOREALIZ as ILPOZOSTALOZAM,
             cast(KN.NAZWASKR as varchar(255)) as NAZWAKTO, PZR.TERMIN as TERMINDOSTPOZ, PZW.DATAWYSYLKI,
             N.ID_UZYTKOWNIK, PZR.ID_POZ, U.NAZWISKOIMIE as OPERATOR, NZW.DATAOBOWZAM, ND.TERMINDOST as TERMINDOSTDOK,
             AKW.NAZWISKOIMIE as AKWIZYTOR, AKW.NRAKW,
             (select sum(R.ILZREALIZ)
              from REZ R
              where R.ID_POZZR = PZR.ID_POZZR) as REZZREALIZ,
             (select sum(R.ILDOREALIZ)
              from REZ R
              where R.ID_POZZR = PZR.ID_POZZR) as REZPOZOSTALO, cast(K.NAZWASKR as varchar(255)) as NAZWAKTN
      from NAGLAKT NA
      inner join NAGL N on (N.ID_NAGL = NA.ID_NAGL)
      inner join NAGLZAMWSP NZW on (NZW.ID_NAGL = N.ID_NAGL)
      inner join NAGLZR NZR on (NZR.ID_NAGL + 0 = N.ID_NAGL + 0)
      inner join POZZR PZR on (PZR.ID_NAGLZR = NZR.ID_NAGLZR)
      inner join POZZAMWSP PZW on (PZW.ID_POZ = PZR.ID_POZ)
      inner join UZYTKOWNIK U on (N.ID_UZYTKOWNIK + 0 = U.ID_UZYTKOWNIK + 0)
      inner join KONTRAH KN on (KN.ID_KONTRAH = N.ID_KONTRAHNAB)
      inner join KONTRAH K on (K.ID_KONTRAH = N.ID_KONTRAH)
      left outer join AKWIZYTOR AKW on (AKW.ID_AKWIZYTOR = N.ID_AKWIZYTOR)
      left outer join NAGLDOST ND on (ND.ID_NAGL = N.ID_NAGL)
      where (N.ID_GRUPADOK = 80) and
            (N.KLU_STATUS = 0) and
            (PZR.ILOSC > PZR.ILOSCZREALIZ) and
            (PZR.ID_KARTOTEKA = 11176)
      union all
      select PZR.ID_KARTOTEKA, PZR.ID_MAGAZYN, 'W' as RODZAJ, N.DATADOK, N.NRDOKZEW, PZR.ILOSC as ILOSC,
             PZR.ILOSCREZ as ILPOTW, PZR.ILOSCZREALIZ as ILZREALIZZAM, PZR.ILOSCDOREALIZ as ILPOZOSTALOZAM,
             cast(J.NAZWA as varchar(255)) as NAZWAKTO, PZR.TERMIN as TERMINDOSTPOZ, PZW.DATAWYSYLKI, N.ID_UZYTKOWNIK,
             PZR.ID_POZ, U.NAZWISKOIMIE as OPERATOR, NZW.DATAOBOWZAM, ND.TERMINDOST as TERMINDOSTDOK,
             AKW.NAZWISKOIMIE as AKWIZYTOR, AKW.NRAKW,
             (select sum(R.ILZREALIZ)
              from REZ R
              where R.ID_POZZR = PZR.ID_POZZR) as REZZREALIZ,
             (select sum(R.ILDOREALIZ)
              from REZ R
              where R.ID_POZZR = PZR.ID_POZZR) as REZPOZOSTALO, cast(J.NAZWA as varchar(255)) as NAZWAKTN
      from NAGLAKT NA
      inner join NAGL N on (N.ID_NAGL = NA.ID_NAGL)
      inner join NAGLZAMWSP NZW on (NZW.ID_NAGL = N.ID_NAGL)
      inner join NAGLZR NZR on (NZR.ID_NAGL + 0 = N.ID_NAGL + 0)
      inner join POZZR PZR on (PZR.ID_NAGLZR = NZR.ID_NAGLZR)
      inner join POZZAMWSP PZW on (PZW.ID_POZ = PZR.ID_POZ)
      inner join UZYTKOWNIK U on (N.ID_UZYTKOWNIK + 0 = U.ID_UZYTKOWNIK + 0)
      left outer join JEDNORG J on (N.ID_JEDNORG = J.ID_JEDNORG)
      left outer join AKWIZYTOR AKW on (AKW.ID_AKWIZYTOR = N.ID_AKWIZYTOR)
      left outer join NAGLDOST ND on (ND.ID_NAGL = N.ID_NAGL)
      where (N.ID_GRUPADOK = 82) and
            (N.KLU_STATUS = 0) and
            (PZR.ILOSC > PZR.ILOSCZREALIZ) and
            (PZR.ID_KARTOTEKA = 11176)
      order by 3, 4, 5) 

Dmitry Yemanov

unread,
Nov 25, 2022, 9:08:50 AM11/25/22
to firebird...@googlegroups.com
25.11.2022 16:47, Tomasz Dubiel wrote:

> Maybe it's an exaggeration, but I have now example of 7 times better
> perfomance and I need a little bit more of explanations to fully
> understand when and if only on FB 3.0.
> Why is it adviced to use HASH JOIN on FB 3.0? I just checked that using
> this trick on FB 2.5 makes almost the same profit. I don't see HASH JOIN
> in the plan, but the difference in indexed reads is very similar when
> seeing HASH JOIN on FB 3.0.

FB 2.5 does not support hash joins. In your case a merge join will be
used instead. But its cost is higher (due to need to sort the streams)
than for the hash join.


Dmitry

Reply all
Reply to author
Forward
0 new messages