subquery using WHERE clause of outer query?

Chad McHenry

unread,

Mar 10, 2012, 2:34:15 PM3/10/12

to h2-da...@googlegroups.com

The WHERE clause of a query seems to be polluting subqueries. The WHERE of the outer select is being pushed into the subquery (query optimizer?), but because the subquery depend on all its selected rows rows for correct @variable calculations, moving the outer WHERE inside changes the results. Is this intended behavior, or a bug?

I'm using H2 1.3.164 (2012-02-03).

The intent is to calculate the time delta between consecutive events (0 and 1), which are interspersed within other events (e.g. 2).

The subquery works on it's own: delta==1 for all rows where var=1 except the last (at time=12) where delta==2

The intent of the outer select is to select only rows where var=1 and sum the delta column. The WHERE moving into the subquery causes it to only select var=1 rows, changing the values of delta. "explain" shows the "event = 1" being pushed into the subquery, corroborated by the results showing @start being set to previous event=(1)

drop table if exists foo;

create temporary table foo (time int(10) not null auto_increment, event int(10), primary key (time));

insert into foo (event) values (0),(1),(0),(1),(2),(0),(1),(2),(2),(0),(2),(1);

set @start=null;

-- calculate deltas between successive event=(0) and event=(1) events

select time, event, @start as prev, time - @start as delta, set(@start,time) as now

from foo where event = 0 or event = 1

order by time;

-- corrupts deltas: here they are calculated between successive event=(1) events only

select * from (

select time, event, @start as prev, time - @start as delta, set(@start,time) as now

from foo where event = 0 or event = 1

order by time

) x where event = 1;

drop table foo;

I can get the desired results in the above situation using CASE WHEN..., in the subselect and omitting the outer WHERE, but I thought it worth asking about the behavior.

Best,

...Chad

Thomas Mueller

unread,

Mar 12, 2012, 4:41:29 PM3/12/12

to h2-da...@googlegroups.com

Hi,

The WHERE clause of a query seems to be polluting subqueries.

I wouldn't call it 'pollute'. It's an optimization (certain statements are faster because of it).

The WHERE of the outer select is being pushed into the subquery (query optimizer?), but because the subquery depend on all its selected rows rows for correct @variable calculations, moving the outer WHERE inside changes the results. Is this intended behavior, or a bug?

It is intended behavior. Depending on the optimization, not all rows are processed.

Maybe it is easier to create a Java function that returns a result set, and use it like a table?

Regards,

Thomas

Chad McHenry

unread,

Mar 13, 2012, 7:19:41 PM3/13/12

to h2-da...@googlegroups.com

It is intended behavior. Depending on the optimization, not all rows are processed.

Maybe it is easier to create a Java function that returns a result set, and use it like a table?

Shouldn't sub-query results be independent of outer-queries? I understand the optimization and why it is useful - in some cases. However, when SET(@variable, value) is in the selectExpression, the order and number of rows SET(...) is applied to, are critical for correct results. An optimization which filters those sub-queries before their calculations are made, all but guarantees unintended results.

If the sub-query did not use @variable calculations, the optimization would be innocuous, but with the calculations, it changes the results unexpectedly.

As comparison, MySQL returns expected results, calculating values on all rows selected within the subquery, then filtering those with the outer query.

What if the planner looked for the use of SET(...) in sub-queries before moving a WHERE clause down, and skip the optimization if a SET is used?

Best,

Chad

Thomas Mueller

unread,

Mar 19, 2012, 3:23:49 PM3/19/12

to h2-da...@googlegroups.com

Hi,

Shouldn't sub-query results be independent of outer-queries? I understand the optimization and why it is useful - in some cases. However, when SET(@variable, value) is in the selectExpression, the order and number of rows SET(...) is applied to, are critical for correct results.

It depends on what you consider a 'correct' result. I agree it might be confusing if not all rows are read, but disabling optimizations depending on whether such features are used also seems to be wrong.

What if the planner looked for the use of SET(...) in sub-queries before moving a WHERE clause down, and skip the optimization if a SET is used?

It sounds like it would still not match what MySQL is doing...

I don't really know the best solution. All possible solutions seem to be very tricky to get right, and I'm not sure if it's really worth the effort and the added complexity in the code. For now, I will document that when used in a subquery, not all rows might be read depending on the query plan.

Regards,

Thomas

Stefan Zeiger

unread,

Mar 20, 2012, 5:28:20 AM3/20/12

to h2-da...@googlegroups.com

On 2012-03-19 20:23, Thomas Mueller wrote:

Hi,

Shouldn't sub-query results be independent of outer-queries? I understand the optimization and why it is useful - in some cases. However, when SET(@variable, value) is in the selectExpression, the order and number of rows SET(...) is applied to, are critical for correct results.

It depends on what you consider a 'correct' result. I agree it might be confusing if not all rows are read, but disabling optimizations depending on whether such features are used also seems to be wrong.

What about my example in https://groups.google.com/forum/?fromgroups#!topic/h2-database/haMFbcuewog? It's basically the same problem but with plain declarative SQL.

Cheers,
Stefan

--
Stefan Zeiger
Typesafe - The software stack for applications that scale
Twitter: @StefanZeiger

Noel Grandin

unread,

Mar 20, 2012, 9:03:59 AM3/20/12

to h2-da...@googlegroups.com, Stefan Zeiger

That looks like a bug.
Could you produce a reduced test-case?

--
You received this message because you are subscribed to the Google Groups "H2 Database" group.
To post to this group, send email to h2-da...@googlegroups.com.
To unsubscribe from this group, send email to h2-database...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/h2-database?hl=en.

Stefan Zeiger

unread,

Mar 20, 2012, 10:11:36 AM3/20/12

to h2-da...@googlegroups.com

On 2012-03-20 14:03, Noel Grandin wrote:

That looks like a bug.
Could you produce a reduced test-case?

It's reproducible with pretty much the simplest possible code (as long as you're filtering on a second column and not on the COF_NAME directly):

DROP TABLE COFFEES if exists;
CREATE TABLE COFFEES (COF_NAME VARCHAR,PRICE INT);

INSERT INTO COFFEES (COF_NAME,PRICE) VALUES ('Colombian', 5);
INSERT INTO COFFEES (COF_NAME,PRICE) VALUES ('French_Roast', 5);
INSERT INTO COFFEES (COF_NAME,PRICE) VALUES ('Colombian_Decaf', 20);

select * from (
select COF_NAME, PRICE from COFFEES order by COF_NAME LIMIT 2
);
-- finds Colombian and Colombian_Decaf

select * from (
select COF_NAME, PRICE from COFFEES order by COF_NAME LIMIT 2
) where PRICE < 10;
-- should find only Colombian but also finds French_Roast

Thomas Mueller

unread,

Mar 30, 2012, 6:13:29 AM3/30/12

to h2-da...@googlegroups.com

Hi,

This should be fixed in the trunk now, see issue 387: WHERE condition
getting pushed into sub-query with LIMIT.

Regards,
Thomas

Reply all

Reply to author

Forward