Request for advice

38 views
Skip to first unread message

Silvio

unread,
Apr 26, 2023, 5:03:28 AM4/26/23
to H2 Database
We have some heavy queries that involve selecting records from a base cached table A (~100K records) that satisfy a quite a number of conditions expressed as

A.PK [NOT] IN (...)

on a secondary cached table B (~10M) records. Although the subqueries use indexed columns the overall query is very slow.

We are thinking of using a temporary memory table C that holds primary keys of table A, evaluating the subqueries on B seperately inserting or removing keys into table C as needed and finally having a single subquery

A.PK NOT IN (SELECT FK_A FROM C)

Has anoyone ever tried such an approach in H2? Is there any reason to expect an improvement in performance in comparison to the single large query we have now?

Andreas Reichel

unread,
Apr 26, 2023, 5:09:58 AM4/26/23
to h2-da...@googlegroups.com
Greetings.

As far as I understand it, Evgenji just committed a change regarding index use in `IN()` clause.
Maybe try the very lastest GIT Master first.

Also I wonder, why you would not use a JOIN instead of a IN() when you have a list of FK_A from C:

select *
from a
inner join c
on a.pk = c.fk_a

Why use (uncorrelated?) sub-queries?

Best regards
Andreas
--
You received this message because you are subscribed to the Google Groups "H2 Database" group.
To unsubscribe from this group and stop receiving emails from it, send an email to h2-database...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/h2-database/6e7570af-a74c-4e83-9560-a85cfad1e8d4n%40googlegroups.com.

Noel Grandin

unread,
Apr 26, 2023, 5:12:19 AM4/26/23
to h2-da...@googlegroups.com, Silvio


On 4/26/2023 11:03 AM, Silvio wrote:
> We have some heavy queries that involve selecting records from a base cached table A (~100K records) that satisfy a
> quite a number of conditions expressed as
>
> A.PK [NOT] IN (...)
>

You could also express those as

A.PK NOT IN ( SubNotCondition1 UNION SubNotCondition2 ... )
AND A.PK IN ( SubCondition1 UNION SubCondition2 ... )

which should result in less scanning of the second table.

>
> We are thinking of using a temporary memory table C that holds primary keys of table A, evaluating the subqueries on B
> seperately inserting or removing keys into table C as needed and finally having a single subquery
>

That might help, and I have done similar things (on other databases).

You probably want a session-local temporary table, and you probably want to do

CREATE IF NOT EXISTS TEMP1
TRUNCATE TEMP1

for each query, rather than creating and dropping it, because CREATE/DROP is quite expensive in H2.

Silvio

unread,
Apr 26, 2023, 5:34:29 AM4/26/23
to H2 Database
Unfortunately we are talking about a production situation and using a h2-version that is not officially released (for whatever that means) is something we are not allowed to do. I would be very eager to check what happens with any new upcoming release.

A join would naturally work but I would also expect the subquery to result in a similar evaluation path. If there is any reason to expect otherwise we would happily use the join.

We initially did a multi-join instead of the subqueries we have now. That was beyond slow. We never saw any of those queries come to completion.

Silvio

unread,
Apr 26, 2023, 5:41:33 AM4/26/23
to H2 Database
That is actually a very good idea. We never considered using a UNION here. Would H2 be able to use a different index for multiple queries inside a single UNION? I was under the impression H2 always uses one index for a query but that may no longer be the case. If that is true this could improve things a lot.

The suggestion about the temp table creation is also valuable. We have to account for multiple users running the same type of queries at the same time so we would need a separate table for each of them or complicate things by adding additional information to the temp table to distinguish the records for separate queries.
Reply all
Reply to author
Forward
0 new messages