Slow query for compound IN condition?

63 views
Skip to first unread message

Ken Johanson

unread,
Feb 23, 2010, 3:10:59 AM2/23/10
to h2-da...@googlegroups.com
Hi,

With this table:
TABLE_CATALOGTABLE_SCHEMA TABLE_NAMECOLUMN_NAME ORDINAL_POSITIONCOLUMN_DEFAULT IS_NULLABLEDATA_TYPE CHARACTER_MAXIMUM_LENGTHCHARACTER_OCTET_LENGTH NUMERIC_PRECISIONNUMERIC_PRECISION_RADIX NUMERIC_SCALECHARACTER_SET_NAME COLLATION_NAMETYPE_NAME NULLABLEIS_COMPUTED SELECTIVITYCHECK_CONSTRAINT SEQUENCE_NAMEREMARKS SOURCE_DATA_TYPE
'TEST''PUBLIC' 'OM' 'B'1'0''NO' 4101010 100'Unicode''OFF''INTEGER' 0false50'' null''null
'TEST''PUBLIC' 'OM' 'PK'2null'YES' -51919 19100'Unicode''OFF' 'BIGINT'1false 50''null''null

and about 500,00 rows, the following queries on a 1Ghz CPU have run-times:

50 seconds:
SELECT pk FROM om WHERE pk IN (<BIGINT-LIST>)
AND b IN (1,2)

5 seconds:
SELECT pk FROM om WHERE pk IN (<BIGINT-LIST>)
AND b IN (1)
(or any value in place of 1)

< 0.1 seconds:
SELECT pk FROM om WHERE pk IN (<BIGINT-LIST>)

... where<BIGINT-LIST> is a constant/fixed list of ~40 longs, like:
1254619816788000000,1176099280681000000,(.... 3rd-40th)

`explain plan` for the first query returns:
'SELECT PK FROM PUBLIC.OM /* PUBLIC.OM_DATA */ WHERE (PK IN(<BINGINT-LIST>) AND (B IN(1, 2))'

Both columns ('PK' and 'B') are indexed...

Is this a known optimization todo/issue?

Thank you,
Ken

kensystem

unread,
Feb 23, 2010, 12:06:00 PM2/23/10
to H2 Database
Let me apologize about the formatting, I thought that G Groups would
have retained the HTML-table formatting I coped pasted from the info
schema (it certainly allowed in in the textarea anyway)
CREATE table OM (pk BIGINT primary key auto_increment, b INTEGER)
CREATE INDEX b ON om(b)

Also I typod '500,00' but meant the table has '500,000' rows.

Also to be clearer, the 'pk' column is a primary key and so only about
40 items in the set matched the query; even so adding the second AND-
IN seems to slow it down exponentially as that list grows; its as if
the entire dataset and not subset is searched for each value in each
list.

Jan Kotek

unread,
Feb 23, 2010, 2:41:44 PM2/23/10
to h2-da...@googlegroups.com
Hi,
I had simular problem. H2 does not optimize AND conditions very well.

I would recommend using subqueries.

Regards,
j

> --
> You received this message because you are subscribed to the Google Groups "H2 Database" group.
> To post to this group, send email to h2-da...@googlegroups.com.
> To unsubscribe from this group, send email to h2-database...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/h2-database?hl=en.
>
>

kensystem

unread,
Feb 23, 2010, 11:40:58 PM2/23/10
to H2 Database
Jan, thank you for the suggestion. I gave that some consideration but
1) the data for the two AND conditions is input values and I can't use
subquery or outer-joins to mimic it and 2) the app has so many AND
condition with list (IN/HAVING) that it seems overwhelming to change
them all; I may wait for a fix.

Thomas I would appreciate the opportunity to help any way I can to
improve this (except I'm afraid monetarily); is it an issue you are
aware of and if so do you have a plan or test build I could try, or
other way (test cases, code todo etc) I could help?

Thanks,
Ken

On Feb 23, 12:41 pm, Jan Kotek <openco...@gmail.com> wrote:
> Hi,
> I had simular problem. H2 does not optimize AND conditions very well.
>
> I would recommend using subqueries.
>
> Regards,
> j
>

kensystem

unread,
Feb 25, 2010, 12:35:32 AM2/25/10
to H2 Database
Here's a java.hprof.txt link for this. Again it was run on a 1Ghz i386/
Linux box, and even with this shortened query takes 17 seconds:
http://kensystem.com/h2-time.zip

SELECT objId,b FROM om WHERE
b IN (1,2) AND objId IN (1254619816788000000,1176099280681000000)

H2 was started with:
java -Xrunhprof:cpu=time,depth=8 -Xmx256M -DRETURN_LOB_OBJECTS=false -
jar h2-1.2.128.jar org.h2.tools.Server -webAllowOthers -tcpAllowOthers
-tcpPort 9101&
(note this profiling mode took enough time that I needed to send a
term signal to java, before the query completed)

Thomas Mueller

unread,
Feb 25, 2010, 1:12:34 AM2/25/10
to h2-da...@googlegroups.com
Hi,

I found the problem, it will be fixed in the next release. The index
is currently not used if there were multiple IN(..) conditions. A
workaround is to use OR:

SELECT pk FROM om WHERE pk IN (<BIGINT-LIST>) AND (b = 1 OR b = 2)

The query plan is not very clear. I will change that as well, so that
the query plan will contain ".tableScan" if a table scan is used.

Regards,
Thomas

kensystem

unread,
Feb 25, 2010, 2:26:41 AM2/25/10
to H2 Database
Thank you, sir!!! YOu are right it does help, but in a way that
supirises me (I tried OR before but got the second result below):

This query now runs super fast, <15ms (sweet!!!):


SELECT objId,b FROM om WHERE

(b=1 OR b=2)
AND objId IN (1254619816788000000,1176099280681000000)

But this query still takes >10 seconds:


SELECT objId,b FROM om WHERE

(b=1 OR b=2)
AND (objId=1254619816788000000 OR objId=1176099280681000000)

Will your solution account for this difference also?

Again many thanks!!!

Ken

On Feb 24, 11:12 pm, Thomas Mueller <thomas.tom.muel...@gmail.com>
wrote:

kensystem

unread,
Feb 26, 2010, 3:06:52 AM2/26/10
to H2 Database
Thomas, you may already be aware but I thought I 'd mention.. my being
anxious to try the fix (but not knowing how often you commit changes
to CVS) I grabbed the latest nightly build http://www.h2database.com/automated/h2-latest.jarbut
it doesn't seem to ever return for any query. I started identically
except different .jar name. Would this be a bug?

Thomas Mueller

unread,
Feb 27, 2010, 6:07:55 AM2/27/10
to h2-da...@googlegroups.com
Hi,

Could you try again with version 1.2.130?

Regards,
Thomas

kensystem

unread,
Feb 27, 2010, 11:21:30 AM2/27/10
to H2 Database
Yes, sir!!! All examples on this thread now run lightning fast!!!
Thank you many times for your fine work!

kensystem

unread,
Mar 3, 2010, 1:51:10 AM3/3/10
to H2 Database
Thomas, I'm afraid I've found another query where IN does not use the
index, but OR does (using build 1.2.130):

<10ms:
SELECT * FROM om WHERE
(b=26 OR b=27)
AND numval = 468203

But ~2000ms:
SELECT * FROM om WHERE
b IN (26,27)
AND numval = 468203

where 'numval' is a indexed double type (which I recently added)...

If I remove either of the AND conditions, the speed query again
executes very fast...

Let me know if you need a profiler output..

Thomas Mueller

unread,
Mar 4, 2010, 3:45:59 PM3/4/10
to h2-da...@googlegroups.com
Hi,

The optimizer doesn't choose the best index if there are multiple.
This is a limitation that is quite hard to fix currently. I will add a
feature request. My test case is:

drop table test;
create table test(b double, numval int);
insert into test
select mod(x, 10), x from system_range(1, 1000);

explain SELECT * FROM test
WHERE b IN (26,27) AND numval = 468203;
-- tableScan

create index idx_b on test(b);
explain SELECT * FROM test
WHERE b IN (26,27) AND numval = 468203;
-- IDX_B: B IN(26, 27)

create index idx_numbal on test(numval);
explain SELECT * FROM test
WHERE b IN (26,27) AND numval = 468203;
-- IDX_B: B IN(26, 27)

Regards,
Thomas

kensystem

unread,
Sep 22, 2011, 1:18:05 AM9/22/11
to h2-da...@googlegroups.com
Hi Thomas,

I decided to test again using h2-1.3.159, and based on speed of the query, it seems to be using scans, not indexes. Both IN and OR take ~60 seconds on my 1Ghz machine.

explain SELECT * FROM om WHERE
OMATTRIBID IN (26,27)
AND objvalue='468203'
result:
SELECT OM.OMATTRIBID, OM.OBJID, OM.OWNERID, OM.OBJVALUE, OM.LISTORDER, OM.OMID FROM PUBLIC.OM /* PUBLIC.OM_ATTRVAL_ID: OMATTRIBID IN(26, 27) AND OBJVALUE = '468203' */ WHERE (OMATTRIBID IN(26, 27)) AND (OBJVALUE = '468203')

explain SELECT * FROM om WHERE
(OMATTRIBID=26 OR OMATTRIBID=27)
AND objvalue='468203'
result:
SELECT OM.OMATTRIBID, OM.OBJID, OM.OWNERID, OM.OBJVALUE, OM.LISTORDER, OM.OMID FROM PUBLIC.OM /* PUBLIC.OM_ATTRVAL_ID: OMATTRIBID IN(26, 27) AND OBJVALUE = '468203' */ WHERE (OMATTRIBID IN(26, 27)) AND (OBJVALUE = '468203')

Some notes:

-The table contains 2M rows.

-"OBJVALUE" in this case is a varchar column.

-java -Xmx256M -Dh2.queryCacheSize=0 -Dh2.returnLobObjects=false -Dh2.objectCacheSize=50000 -Dh2.pageSize=8192 -Dh2.maxMemoryRows=100000 -cp $H2_HOME org.h2.tools.Server -tcpAllowOthers -tcpPort 9101 -baseDir $H2_DATA

-During the query all CPU is user, not wa (swap).

-DDL:

CREATE CACHED TABLE PUBLIC.OM( OMATTRIBID INTEGER DEFAULT 0 NOT NULL SELECTIVITY 1, OBJID BIGINT SELECTIVITY 5, OWNERID INTEGER SELECTIVITY 1, OBJVALUE VARCHAR(1000000) SELECTIVITY 12, LISTORDER INTEGER SELECTIVITY 1, OMID BIGINT DEFAULT (NEXT VALUE FOR PUBLIC.SYSTEM_SEQUENCE_3F2DECE5_3E35_486B_AFD8_CBDF71B9EDA5) NOT NULL NULL_TO_DEFAULT SEQUENCE PUBLIC.SYSTEM_SEQUENCE_3F2DECE5_3E35_486B_AFD8_CBDF71B9EDA5 SELECTIVITY 100 );

ALTER TABLE PUBLIC.OM ADD CONSTRAINT PUBLIC.CONSTRAINT_9D PRIMARY KEY(OMID);

CREATE HASH INDEX PUBLIC.OM_OBJID_OMATTRIBID ON PUBLIC.OM(OBJID, OMATTRIBID);

CREATE INDEX PUBLIC.OM_ATTRVAL_ID ON PUBLIC.OM(OMATTRIBID, OBJVALUE);

CREATE INDEX PUBLIC.OM_OBJVALUE_IDX ON PUBLIC.OM(OBJVALUE);

CREATE HASH INDEX PUBLIC.OM_OBJID_IDX ON PUBLIC.OM(OBJID);

Thomas Mueller

unread,
Sep 22, 2011, 1:48:30 PM9/22/11
to h2-da...@googlegroups.com
Hi,

based on speed of the query, it seems to be using scans, not indexes.

If it's a table scan, then the explain plan would say so. 

SELECT OM.OMATTRIBID, OM.OBJID, OM.OWNERID, OM.OBJVALUE, OM.LISTORDER, OM.OMID FROM PUBLIC.OM /* PUBLIC.OM_ATTRVAL_ID: OMATTRIBID IN(26, 27) AND OBJVALUE = '468203' */ WHERE (OMATTRIBID IN(26, 27)) AND (OBJVALUE = '468203')

That means it's using the index OM_ATTRVAL_ID (on OMATTRIBID, OBJVALUE). What other index should the database use?

OMATTRIBID INTEGER SELECTIVITY 1: bad selectivity
OBJVALUE VARCHAR(1000000) SELECTIVITY 12: medium selectivity, but possibly a lot of data.

But possibly the data is sorted in the wrong way on disk. Possibly SHUTDOWN DEFRAG will solve this problem. Could you try this?

-During the query all CPU is user, not wa (swap).

You could use the profiling tool (org.h2.util.Profiler) to get the details.

Regards,
Thomas

kensystem

unread,
Sep 22, 2011, 11:00:00 PM9/22/11
to h2-da...@googlegroups.com
Thomas, I tried SHUTDOWN DEFRAG and still get similar results. Here is the selectivity now, and I just ran ANALYZE:

SELECT     COLUMN_NAME,DATA_TYPE,SELECTIVITY FROM information_schema.columns WHERE LOWER(TABLE_SCHEMA) = 'public' AND LOWER(TABLE_NAME) LIKE 'om'

COLUMN_NAMEDATA_TYPESELECTIVITY
1'OMATTRIBID'41
2'OBJID'-55
3'OWNERID'41
4'OBJVALUE'1212
5'LISTORDER'41
6'OMID'-5100

These are what I expect them to be. To answer your question, I would expect it to choose the OM_ATTRVAL_ID (on OMATTRIBID, OBJVALUE) index, or would you expect it would be faster (because of the OR/IN component) to create a new index on only OMATTRIBID? I tried that and explain gave the same output.

If I just run:
SELECT * FROM om WHERE objvalue='468203'
it takes a few ms to run.

And:
SELECT * FROM om WHERE OMATTRIBID=26 AND objvalue='468203'
takes just a few ms also.

But if I just run:

SELECT * FROM om WHERE OMATTRIBID IN (26,27)
it takes about 30-60 seconds

So it seems like OR/IN has some issue..

kensystem

unread,
Sep 23, 2011, 2:58:16 AM9/23/11
to h2-da...@googlegroups.com
Thomas, I'm sorry, this statement only ran slow due to disk (caching a large resultset):

SELECT * FROM om WHERE OMATTRIBID IN (26,27)
Using COUNT it runs fast (<1 second)
SELECT COUNT(*) FROM om WHERE OMATTRIBID IN (26,27)

Even so the compound criteria still runs slow (with COUNT(), and it only should return 1 row anyway :-)

SELECT * FROM om WHERE OMATTRIBID=26 AND objvalue='468203' (few milliseconds)
SELECT * FROM om WHERE OMATTRIBID IN (26,27) AND objvalue='468203' (60 seconds)


kensystem

unread,
Sep 24, 2011, 7:16:02 AM9/24/11
to h2-da...@googlegroups.com
I'd forgotten that the info about scans is only done when EXPLAIN ANALYZE is used. Here is its output:

explain analyze SELECT COUNT(*) FROM om WHERE objvalue='468203' AND OMATTRIBID IN (26,27)

SELECT COUNT(*) FROM PUBLIC.OM /* PUBLIC.OM_ATTRVAL_ID: OMATTRIBID IN(26, 27) AND OBJVALUE = '468203' */ /* scanCount: 2181921 */ WHERE (OMATTRIBID IN(26, 27)) AND (OBJVALUE = '468203') /* total: 34990 OM.OM_ATTRVAL_ID read: 34990 (100%) */

the same output was generated for:
explain analyze SELECT COUNT(*) FROM om WHERE objvalue='468203' AND (OMATTRIBID=26 OR OMATTRIBID=27)

So seem like my indexes (listed in earlier post) are not being used at all.

kensystem

unread,
Oct 4, 2011, 10:34:28 PM10/4/11
to h2-da...@googlegroups.com
On Saturday, September 24, 2011 5:16:02 AM UTC-6, kensystem wrote:

explain analyze SELECT COUNT(*) FROM om WHERE objvalue='468203' AND (OMATTRIBID=26 OR OMATTRIBID=27)

SELECT COUNT(*) FROM PUBLIC.OM /* PUBLIC.OM_ATTRVAL_ID: OMATTRIBID IN(26, 27) AND OBJVALUE = '468203' */ /* scanCount: 2181921 */ WHERE (OMATTRIBID IN(26, 27)) AND (OBJVALUE = '468203') /* total: 34990 OM.OM_ATTRVAL_ID read: 34990 (100%) */

 Hi Thomas, I hope my description was clear enough that you can verify this is a problem(?) Is there a patch I can test?

Thanks in advance,
ken

Reply all
Reply to author
Forward
0 new messages