Thanks for the feedback and tips.
Not sure how I overlooked the obviousness of the "EXPLAIN" keyword :)
As expected, I mostly have issues with a couple of poorly designed
queries which are causing tablescans.
If we decide to officially support H2 as a new database, I'll have to
either rework them in such a way that H2 can use the indexes, or dig
into H2's code a bit, and try to figure out why it isn't using the
indexes.
Thanks,
Dan
How or when does H2 decide to update it's own statistics?
Do you need to manually update them after you create a new table,
indexes, and populate the table?
How frequently would one need rerun Analyze if the table is growing?
>
> Low Selectivity: index may not be used
> Solution: fake a better selectivity, if you're sure it will improve
> performance (test this)
> Use an ALTER TABLE ALTER COLUMN SELECTIVITY statement.
I've never understood the lengths that databases sometimes go to to
avoid using the indexes in situations where (it thinks) the table is
small. If someone bothered to create the index, they probably
intended for the database to use the index where possible. I've had
this issue with PostgreSQL in the past too... I used to have to change
a setting that basically told PosgreSQL if the index exists, use it.
Otherwise, it had a tendency to make bad decisions when it thought a
table had 10 rows in it, but in reality, it now had 50,000 (but the
stats were out of date).
Or worse, if you created a prepared statement when a table had 10 rows
in it, then populated the table with thousands of rows, then used the
prepared statement, PostgreSQL would use the query plan that was
created at the time the prepared statement was made - which would
conclude that a table scan was always the thing to do, since the table
was small.
>
> Multi-column indices: can only be used in order specified.
> If first column not used in where, index won't be.
If a table has columns A, B, C,
And the primary key is A, B - I assume it creates an index on A, B.
If I then query on just A - will it be able to use this index? Or
will it only be used for a query on A, B (in that order)?
>
> Also: did improving queries bring H2 up to the same performance level
> as PostgreSQL?
I haven't been able to rework the queries yet... its an issue with a
legacy schema design that doesn't handle the question being asked in a
very straightforward way. Probably won't put to much more work into
it unless I get a go-ahead from management saying that we want to
officially support H2.
My query looks something like this:
select *
from a, b
where a.zz=b.zz
and
((a.xx='string' and a.yy<>'string' )
or
(a.yy='string' and a.xx<>'string' and b.qq='1'))
There are indexes for:
b.zz
a.zz
a.xx, a,yy
a.yy
Postgres starts this query by doing index scans for each of half of the OR.
H2 starts with a table scan on a.
But this isn't a big deal to me at the moment. I'd like to replace
the query anyway, and it's likely going to be a while before I get the
go-ahead to integrate support for H2. Until then, I don't have a lot
of cycles to test alternatives. I was just doing a cursory test of
various embedded databases to see what is possible with our current
code.
Thanks,
Dan
>> I've never understood the lengths that databases sometimes go to to
>> avoid using the indexes in situations where (it thinks) the table is
>> small.
Me neither. H2 uses an index, even if the table has no rows. I'm not
saying this is smart, but that's how it works. One reason is that the
query plan is re-used for prepared statements in most cases (there are
some exceptions, for example if you use LIKE).
> from a, b where a.zz=b.zz and ((a.xx='string' and a.yy<>'string' ) or (a.yy='string' and a.xx<>'string' and b.qq='1'))
H2 doesn't optimize this yet. If you want, you could rewrite the query
as a UNION:
select * from a, b where a.zz=b.zz and a.xx='string' and a.yy<>'string'
union
select * from a, b where a.zz=b.zz and a.yy='string' and
a.xx<>'string' and b.qq='1'
Regards,
Thomas