H2 database: slow query although index is used

926 views
Skip to first unread message

BR

unread,
Jan 23, 2015, 3:42:40 PM1/23/15
to h2-da...@googlegroups.com
Pasting here from Stackoverflow since it seems it has been overseen there.

I am using the H2 1.3.176.

1) table definition:

CREATE TABLE TEST(ID BIGINT PRIMARY KEY, ACCOUNT BIGINT, TXID BIGINT); 

2) inserting values into the table:

INSERT INTO TEST SELECT X, RAND()*100, X FROM SYSTEM_RANGE(1, 1000000);

3) creating an index to use for my query:

CREATE Unique INDEX IDX_TEST_ACCOUNT_TXID ON `test` (account, txId DESC);
 

4) doing the following query:

explain analyze
select txid from test where account=22 AND txid<9999999 order by txid desc limit 25
 

I get the following execution plan:

SELECT
    TXID
FROM PUBLIC.TEST
    /* PUBLIC.IDX_TEST_ACCOUNT_TXID: ACCOUNT = 22
        AND TXID < 9999999
     */
    /* scanCount: 9867 */
WHERE (ACCOUNT = 22)
    AND (TXID < 9999999)
ORDER BY 1 DESC
LIMIT 25
/*
TEST.IDX_TEST_ACCOUNT_TXID read: 103
*/
 

Question:

why does H2 need to scan through the entire index?

I was expecting the scan count to be 25 since the txid in the index should be in descending order already so once H2 is in the account=22 branch of the index it should be able to just read the next 25 entries. This will lead to slow queries if there are millions of entries in the table. Even if H2 has to search for the first matching entry within the index I would expect this to be an O(log(N)) algorithm and not a scan. If I do the same thing without the column account (means that the table just contains id and txid) then a descending index on txid will indeed result in a scan count of 25 (using the query "select txid from test where txid<9999999 order by txid desc"). Why is the additional column ruining the execution plan? Maybe I don't understand how the index works. Is there a better way to define an index for my query?


BR

unread,
Jan 24, 2015, 9:44:44 AM1/24/15
to h2-da...@googlegroups.com
I stepped through the h2 source code and found out what is going wrong:

During preparing the execution of the query, h2 tries to determine whether it can use the index for sorting and limiting the result set. Since the first index column (account) is not in the order by clause, h2 thinks it cannot use the index. This results in h2 scanning through the entire index fetching all rows and then sorting and limiting the result set. This is surprising since the condition on account is an "equal" condition so h2 should realize that it can indeed use the index for sorting and limiting the result set.
The solution is to provide the account column in the order by clause. Thus the query should be:

select txid from test where account=22 AND txid<9999999 order by accountid, txid desc limit 25

and i get the expected execution plan


SELECT
    TXID
FROM PUBLIC.TEST
    /* PUBLIC.IDX_TEST_ACCOUNT_TXID: ACCOUNT = 22
        AND TXID < 9999999
     */
    /* scanCount: 25 */

WHERE (ACCOUNT = 22)
    AND (TXID < 9999999)
ORDER BY =ACCOUNT, 1 DESC
LIMIT 25
/* index sorted */

which has a scan count of only 25 :)

Fred&Dani&Pandora&Aquiles

unread,
Jan 27, 2015, 10:03:03 AM1/27/15
to h2-da...@googlegroups.com
Hi,

Even with the found solution, I was wondering if the optimization process could be smart enough in this special case, where you have a query with only one table and an equality filter condition in the indexed column account, which is absent of the ORDER BY clause. 

I'm not totally sure, but sometimes the absence of a column in the ORDER BY clause do not alter the expected final result, and a index sorted scan still could be used, as related previously. Besides, I saw that the Postgres scans only 25 tuples.

So, what you think about this statement: a query with only one table and indexed columns filtered only by equality conditions (of type <COLUMN> = <CONSTANT>), which are absent of ORDER BY clause, can be optimized to use such columns  implicitly in a sorted index scan.

Finally, considering the previous statement as true, I wrote the attached patch to solve the problem. As I'm not an H2 core developer currently, I don't know if the code follows your minimum quality requirements, I don't even know if the solution is worth, but I decided sharing the patch, maybe it can be used as a start point for a future optimization.



--
You received this message because you are subscribed to the Google Groups "H2 Database" group.
To unsubscribe from this group and stop receiving emails from it, send an email to h2-database...@googlegroups.com.
To post to this group, send email to h2-da...@googlegroups.com.
Visit this group at http://groups.google.com/group/h2-database.
For more options, visit https://groups.google.com/d/optout.

sort_optimization.patch

BR

unread,
Jan 29, 2015, 1:17:33 PM1/29/15
to h2-da...@googlegroups.com
Hi,

I am not a core developer too so i can't alter any H2 code. But maybe Thomas Müller can take a look?

Fred&Dani&Pandora&Aquiles

unread,
Jan 30, 2015, 8:57:31 AM1/30/15
to h2-da...@googlegroups.com
Hi,

Debugging the Postgres code I saw that my idea is not so wrong. They use a similar idea to test sorted path keys. The function pathkey_is_redundant (line 130 of this file) is used by the optimization process to evaluate path keys, and if you read the comments, especially the case 1, you'll see that it is similar to my idea. Besides, I've checked also in Postgres that this optimization step works only in disjunctive queries with restriction clauses of type COL <OP> CONST (), as I thought also.

Anyway, I won't insist in this optimization patch, because I understand this only was my first contribution attempt for the H2 and I still do not know how works the process of submission and acceptance of patches. I'll stay tuned ;-)

Regards,

Fred

2015-01-29 16:17 GMT-02:00 BR <nempr...@gmx.de>:
Hi,

I am not a core developer too so i can't alter any H2 code. But maybe Thomas Müller can take a look?

--

Noel Grandin

unread,
Jan 30, 2015, 8:58:42 AM1/30/15
to h2-da...@googlegroups.com
HI

Just to let you know that I am looking at your patch, it's just taking me a while because I don't have much time right now.

-- Noel

Noel Grandin

unread,
Feb 4, 2015, 3:59:26 AM2/4/15
to h2-da...@googlegroups.com
Hi

Sorry this took so long, your patch has been committed.

Regards, Noel
Reply all
Reply to author
Forward
0 new messages