Re: [h2] PreparedStatement with a lot of indexed Parameters hangs up and takes CPU - but there is no error message

134 views
Skip to first unread message

Noel Grandin

unread,
Oct 7, 2013, 9:32:19 AM10/7/13
to h2-da...@googlegroups.com, Markus Fengler

I don't do Maven (because, for example, like now when it can't connect
to some server somewhere and so I can't get the project working).

But if you create me a standalone project will all of the necessary
dependencies included, I can take a look at this problem.


On 2013-10-06 22:27, Markus Fengler wrote:
>
> You can download the whole Example as maven project at
> http://jmlp.volans.uberspace.de/ftp/issueExample.zip

Markus Fengler

unread,
Oct 7, 2013, 3:00:55 PM10/7/13
to h2-da...@googlegroups.com, Markus Fengler
Hi Noel,

no problem. I've created a new example. You can download it here: http://jmlp.volans.uberspace.de/ftp/H2Issue.zip

Thomas Mueller

unread,
Oct 7, 2013, 3:35:16 PM10/7/13
to H2 Google Group
Hi,

I guess the query is slow for some reason, possibly because there is no index. Did you already check the documentation at http://h2database.com/html/performance.html#explain_plan ?

Regards,
Thomas


On Sun, Oct 6, 2013 at 10:27 PM, Markus Fengler <nar...@googlemail.com> wrote:
Hi,

I want to get a ResultSet and the query contains a lot of Parameters.
If I use an inline Query then it works but if I use a PreparedStatement with indexed Parameters there seems to be a problem.
Here is an example for this strange behaviour.


You can download the whole Example as maven project at http://jmlp.volans.uberspace.de/ftp/issueExample.zip

Note: For generating the sql I use jooq library. I also posted the issue at their repository but it seems that it is h2 specific. (https://github.com/jOOQ/jOOQ/issues/2768)

  @Test
    public void testBugExample() throws SQLException {
        /* Preparation of Data */
        DSLContext dsl = DSL.using(this.con, SQLDialect.H2);
        List<Long> activeFolders = dsl.select(FILES.ID).from(FILES).where(FILES.TYP.eq("D"), FILES.AVAILABLE.isTrue()).fetch(FILES.ID);
        List<Long> activeFiles = dsl.select(FILES.ID).from(FILES).where(FILES.TYP.eq("F"), FILES.AVAILABLE.isTrue()).fetch(FILES.ID);

        /* the Query with the Bug */
        SelectConditionStep<Record1<Long>> qry = dsl.select(FILES.ID).from(FILES).join(FOLDERMAP).on(FILES.ID.eq(FOLDERMAP.ID))
                .where(FILES.ID.notIn(activeFiles), FOLDERMAP.PARENT.in(activeFolders));

        String qryIndexed = qry.getSQL(ParamType.INDEXED);
        String qryInline = qry.getSQL(ParamType.INLINED);

        /* InlineParameter work */
        log.info("Query Inline: " + qryInline);
        {
            Statement stm = con.createStatement();
            ResultSet rs = stm.executeQuery(qryInline);
            while (rs.next())
                rs.getLong(1);
            rs.close();
            stm.close();
        }

         /* InlineParameter work */
        log.info("Query Indexed: " + qryIndexed);
        {
            PreparedStatement stm = con.prepareStatement(qryIndexed);

            /* set Parameter */
            int pos = 1;
            for (Long e : activeFiles)
                stm.setLong(pos++, e);
            for (Long e : activeFolders)
                stm.setLong(pos++, e);

            ResultSet rs = stm.executeQuery();
            while (rs.next())
                rs.getLong(1);
            rs.close();
            stm.close();
        }

        /* does not finish to this line */
        log.info("finish");
    }

--
You received this message because you are subscribed to the Google Groups "H2 Database" group.
To unsubscribe from this group and stop receiving emails from it, send an email to h2-database...@googlegroups.com.
To post to this group, send email to h2-da...@googlegroups.com.
Visit this group at http://groups.google.com/group/h2-database.
For more options, visit https://groups.google.com/groups/opt_out.

Markus Fengler

unread,
Oct 7, 2013, 3:38:44 PM10/7/13
to h2-da...@googlegroups.com
Hi Thomas,

I don't think it's slow because of a missing index. If I use the inline statement its fast. Did you tried the example?

Lukas Eder

unread,
Oct 8, 2013, 2:31:11 AM10/8/13
to h2-da...@googlegroups.com
I'll tune in here to provide you with some background info from the jOOQ side. When Markus used jOOQ to inline bind values (e.g. WHERE foldermap.parent IN (1, 2, 3, 4)), the query ran in an acceptable time on my machine - around 1/2s. In the latter query, jOOQ renders bind values as question marks for later binding. What may be interesting to note is the fact that for historic reasons, jOOQ generates CAST(? AS BIGINT) for bind values. I'm not sure if the CAST() is the source of confusion here. But when creating thread dumps, I noticed that a significant amount of time is spent in Value.convertTo():
"main" prio=6 tid=0x0000000001d2e000 nid=0x1798 runnable [0x0000000001d0d000]
   java.lang.Thread.State: RUNNABLE
        at org.h2.value.Value.convertTo(Value.java:507)
        at org.h2.expression.Function.getSimpleValue(Function.java:781)
        at org.h2.expression.Function.getValueWithArgs(Function.java:992)
        at org.h2.expression.Function.getValue(Function.java:466)
        at org.h2.expression.ConditionIn.getValue(ConditionIn.java:52)
        at org.h2.expression.ConditionNot.getValue(ConditionNot.java:33)
        at org.h2.expression.ConditionAndOr.getValue(ConditionAndOr.java:94)
        at org.h2.expression.ConditionAndOr.getValue(ConditionAndOr.java:94)
        at org.h2.expression.Expression.getBooleanValue(Expression.java:180)
        at org.h2.command.dml.Select.queryFlat(Select.java:520)
So, without having evaluated this further, this looks as though a lot of work is performed for the complete IN and NOT IN predicate for each record.

Hope this helps
Lukas

Markus Fengler

unread,
Oct 8, 2013, 4:25:35 AM10/8/13
to h2-da...@googlegroups.com
Hi Lukas,

you are right. Now I did four tests.

First with inline (e.g. where foldermap.parent in (1,2,3,4,...)) - the query takes less than 2 seconds to finish.

Second with indexed parameters (e.g. where foldermap.parent in (?,?,?,?,...)) - the query takes arount 37 seconds to finish.

Third with indexed and cast parameters (e.g. where foldermap.parent in (cast(? as bigint),cast(? as bigint),cast(? as bigint),cast(? as bigint),...)) - the query takes about 160 seconds to finish.

And last but not least a fourth with inline and cast (e.g. where foldermap.parent in (cast(1 as bigint), cast(2 as bigint),...) - the query takes less than one second.

For me it looks like that there is a performance issue at the h2 in handling indexed parameters.

Noel Grandin

unread,
Oct 8, 2013, 5:25:30 AM10/8/13
to h2-da...@googlegroups.com, Markus Fengler
Hi

Thanks for the updated test case.

I've looked into this, and the situation is thusly:

We have special case optimised code for handling "IN (1,2,3)" queries.

Unfortunately, for "IN (?,?,?)" queries the optimised code does not get
used because we at the point in time we run the optimiser, the optimiser
sees the prepared-statement-parameters and bails out.

Running the optimiser on a PreparedStatement kind of defeats the point
of a PreparedStatement, which is to minimise per-execution setup cost.

It might be possible to teach the code in
org.h2.expression.ConditionIn#optimize to handle this case, but I'm
loathe to add that much complexity for corner cases like this.
I'll think about it some more.

Regards, Noel.

Lukas Eder

unread,
Oct 8, 2013, 5:42:19 AM10/8/13
to h2-da...@googlegroups.com, Markus Fengler
Hello,

2013/10/8 Noel Grandin <noelg...@gmail.com>

Hi

Thanks for the updated test case.

I've looked into this, and the situation is thusly:

We have special case optimised code for handling "IN (1,2,3)" queries.

Unfortunately, for "IN (?,?,?)" queries the optimised code does not get used because we at the point in time we run the optimiser, the optimiser sees the prepared-statement-parameters and bails out.

Running the optimiser on a PreparedStatement kind of defeats the point of a PreparedStatement, which is to minimise per-execution setup cost.

That's certainly true. I'm not sure if H2 does this, but other databases keep cursors and execution plans around for re-execution, in case equivalent statements are encountered.

However, given the length of the IN predicate, I think that the above rule might no longer hold true. The SQL string is different for 3895 parameters than it is for 3896 parameters. This is a common problem when tuning IN predicates. When they are long, it is almost always better to inline the bind values (not only in H2). Of course, it would be even better to create an actual semi-join using a subselect.
 
It might be possible to teach the code in org.h2.expression.ConditionIn#optimize to handle this case, but I'm loathe to add that much complexity for corner cases like this.
I'll think about it some more.

I couldn't help but think about this a bit myself as well. ;-)

In SQL, some expressions are "deterministic" (in the Oracle sense of the word), and some expressions are constant. Both expression attributes propagate through expressions. If a function F is deterministic, then "F(A, B)" is deterministic only if both A and B are also deterministic expressions. This can be further expanded to constants. If F is deterministic, then "F(A, B)" is constant only if A and B are constant expressions. Bind values are constant expressions for a single query execution. Thus, it would be possible to pre-process a SQL statement, substituting all constant expressions by their constant value. E.g. CAST(? AS BIGINT) could be substituted by the actual converted bind value. Alternatively, such expressions could be pre-calculated once and cached for a single query execution, similarly to what Oracle does with scalar subquery caching (http://blog.jooq.org/2011/09/02/oracle-scalar-subquery-caching/)

I would imagine that this is quite valuable in many non-corner cases as well, as H2 has a lot of built-in deterministic functions, such as: LOWER(), UPPER(), etc. 

Markus Fengler

unread,
Oct 8, 2013, 5:48:04 AM10/8/13
to h2-da...@googlegroups.com, Markus Fengler
Hi,

I'm really impressed by the your support. Thank you for this.

@Noel: If it's to hard to implement then don't think to long about it. As you said it's a special case (although I like it to use the jOOQ Function `select * from something where x in (<Collection of Ids>)', but it's not to hard to make a workaround (Lukas suggested http://www.jooq.org/doc/3.1/manual/sql-building/bind-values/inlined-parameters/).
But please add this special case (indexed parameters in IN statements) as additional information to your h2-documentation.

Noel Grandin

unread,
Oct 8, 2013, 5:48:02 AM10/8/13
to h2-da...@googlegroups.com, Lukas Eder, Markus Fengler

On 2013-10-08 11:42, Lukas Eder wrote:
>
> In SQL, some expressions are "deterministic" (in the Oracle sense of
> the word), and some expressions are constant. Both expression
> attributes propagate through expressions. If a function F is
> deterministic, then "F(A, B)" is deterministic only if both A and B
> are also deterministic expressions. This can be further expanded to
> constants. If F is deterministic, then "F(A, B)" is constant only if A
> and B are constant expressions. Bind values are constant expressions
> for a single query execution. Thus, it would be possible to
> pre-process a SQL statement, substituting all constant expressions by
> their constant value. E.g. CAST(? AS BIGINT) could be substituted by
> the actual converted bind value. Alternatively, such expressions could
> be pre-calculated once and cached for a single query execution,
> similarly to what Oracle does with scalar subquery caching
> (http://blog.jooq.org/2011/09/02/oracle-scalar-subquery-caching/)
>


We do function determinism and constant propagation in the Statement
case, but not so much in the PreparedStatement case.
If you feel like looking at improving the situation, feel free to have a
poke around, I'm happy to answer questions about how the code is structured.

Thomas Mueller

unread,
Oct 8, 2013, 1:24:12 PM10/8/13
to h2-da...@googlegroups.com
Hi,

Would it be possible to disable generating CAST? If the bind variable is set using PreparedStatement.setLong, I don't see why CAST would be needed.

Of course support for superfluous CAST (or any deterministic function on a parameter) could be implemented in H2, but so far it wasn't needed, and I'm hesitant to implement features that are not really needed. Also, it might break a known workaround to force not using a certain index (using the expression "-ID=-?"). I know that's a hack :-)


> I don't think it's slow because of a missing index.

OK, so the index is not be missing, but it is probably not used because of the CAST function. That would explain the difference.

In general, I would probably use a profiler to check what's going on. See also http://h2database.com/html/performance.html#built_in_profiler - also, it would help to know the query plans, see http://h2database.com/html/performance.html#explain_plan


Regards,
Thomas

Lukas Eder

unread,
Oct 8, 2013, 1:39:45 PM10/8/13
to h2-da...@googlegroups.com


2013/10/8 Thomas Mueller <thomas.to...@gmail.com>

Hi,

Would it be possible to disable generating CAST? If the bind variable is set using PreparedStatement.setLong, I don't see why CAST would be needed.

This is why:

After I had reported the above, I had written this blog post here, explaining why jOOQ casts bind values if databases don't infer them in some cases:

Remember? :-)
But I'll review those casts. I'm sure that many of them aren't needed, and jOOQ could omit blindly adding casts to all bind variables, e.g. when placed in the IN predicate.
 
Of course support for superfluous CAST (or any deterministic function on a parameter) could be implemented in H2, but so far it wasn't needed, and I'm hesitant to implement features that are not really needed. Also, it might break a known workaround to force not using a certain index (using the expression "-ID=-?"). I know that's a hack :-)

True, I'm aware of similar hacks with Oracle: ID + 0 = ?. Although, that's not exactly what I was proposing. While -ID prevents index usage, -? is still a constant. So "ID = -(-?)" should still work the same way as "ID = ?", in principle.
 
> I don't think it's slow because of a missing index.

OK, so the index is not be missing, but it is probably not used because of the CAST function. That would explain the difference.

In general, I would probably use a profiler to check what's going on. See also http://h2database.com/html/performance.html#built_in_profiler - also, it would help to know the query plans, see http://h2database.com/html/performance.html#explain_plan


Regards,
Thomas

--
You received this message because you are subscribed to a topic in the Google Groups "H2 Database" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/h2-database/1B4PuijeaYU/unsubscribe.
To unsubscribe from this group and all its topics, send an email to h2-database...@googlegroups.com.

Thomas Mueller

unread,
Oct 8, 2013, 2:23:15 PM10/8/13
to H2 Google Group
Hi,

I see. Maybe it would make sense to add a special case for CAST(? AS <type>) in the parser, so that it's still a parameter, but with a specific type. I will try.

Regards,
Thomas



--
You received this message because you are subscribed to the Google Groups "H2 Database" group.
To unsubscribe from this group and stop receiving emails from it, send an email to h2-database...@googlegroups.com.

Thomas Mueller

unread,
Oct 9, 2013, 1:43:26 AM10/9/13
to h2-da...@googlegroups.com
Hi,

According to a simple test case, H2 is already using an index when using an unnecessary CAST. So I'm not sure yet what the problem is for your case. Could you run the query with "explain" and post the query plan? Example:

Connection conn = DriverManager.getConnection("jdbc:h2:mem:");
Statement stat = conn.createStatement();
stat.execute("create table test(id identity)"); 
PreparedStatement prep;
ResultSet rs; 
prep = conn.prepareStatement( 
    "explain select * from test where id in (?, ?)");
prep.setInt(1, 1);
prep.setInt(2, 1);
rs = prep.executeQuery();
rs.next();
System.out.println(rs.getString(1)); 
prep = conn.prepareStatement( 
    "explain select * from test where id in " +
    "(cast(? as bigint), cast(? as bigint))");
prep.setInt(1, 1);
prep.setInt(2, 1);
rs = prep.executeQuery();
rs.next();
System.out.println(rs.getString(1));
conn.close();

The output is:

SELECT
    TEST.ID
FROM PUBLIC.TEST
    /* PUBLIC.PRIMARY_KEY_2: ID IN(?1, ?2) */
WHERE ID IN(?1, ?2)
SELECT    
    TEST.ID
FROM PUBLIC.TEST
    /* PUBLIC.PRIMARY_KEY_2: ID IN(CAST(?1 AS BIGINT), SIN(CAST(?2 AS BIGINT))) */
WHERE ID IN(CAST(?1 AS BIGINT), SIN(CAST(?2 AS BIGINT)))

So in both cases the primary key is used. When changing "id in" to "-id in", then a table scan is used (no index). An index is also used when using a deterministic function or a combination (SIN(?), SIN(CAST(? AS INT)), and so on).

Regards,
Thomas
You received this message because you are subscribed to the Google Groups "H2 Database" group.
To unsubscribe from this group and stop receiving emails from it, send an email to h2-database...@googlegroups.com.
Reply all
Reply to author
Forward
0 new messages