SQL Select optimisation

Kevin Thiele

unread,

Mar 18, 2012, 11:20:55 AM3/18/12

to H2 Database

I have an H2 database with c. 750,000 records (of plant specimens).
Two of the columns are:

Name VARCHAR(255)
Genus VARCHAR(30)

It happens in plant names that the first part of Name is always the
Genus, e.g.

Genus = 'Eucalyptus'
Name = 'Eucalyptus robusta'

I need to return all distinct names belonging to a particular genus
(e.g. Eucalyptus). I can use two alternative SELECT statements to do
this (both return the same result):

SELECT DISTINCT Name FROM Specimens WHERE Genus = 'Eucalyptus' ORDER
BY NAME Asc;
SELECT DISTINCT Name FROM Specimens WHERE Name LIKE 'Eucalyptus %'
ORDER BY NAME Asc;

The odd thing is that the second is 7 times faster than the first
(1.734 seconds cf. 0.234 seconds). Both columns are indexed.

Can anyone explain why? The only thing I can think that might be
relevant is that the second statement references only one field (Name)
while the second references two (Name and Genus) - but in both cases
the WHERE clause references only one field.

Steve McLeod

unread,

Mar 18, 2012, 6:33:08 PM3/18/12

to h2-da...@googlegroups.com

Try using EXPLAIN ANALYZE in front of your queries in the H2 web console. This will tell whether an index is being used or a full table scan is being performed. It will also tell you which index. With that information you can tune your indexes.

EXPLAIN ANALYZE SELECT DISTINCT Name FROM Specimens WHERE Genus = 'Eucalyptus' ORDER
BY NAME Asc;
EXPLAIN ANALYZE SELECT DISTINCT Name FROM Specimens WHERE Name LIKE 'Eucalyptus %'
ORDER BY NAME Asc;

Peter Yuill

unread,

Mar 18, 2012, 8:14:39 PM3/18/12

to h2-da...@googlegroups.com

Hi Kevin,

As Steve suggested you need to run an EXPLAIN PLAN, but I think you will
find that the optimizer is able to satisfy the second query from the
'Name' index alone, with no need to fetch rows from the table or do any
sorting. The first query by comparison will probably require an index
scan, row fetch and sort. You would probably get much closer results if
you added a second column to the DISTINCT, thus forcing row fetch and sort.

Regards,
Peter

Thomas Mueller

unread,

Mar 28, 2012, 1:13:42 PM3/28/12

to h2-da...@googlegroups.com

Hi,

I guess the second query is much faster because it only needs to read from the index (as only the indexed column is referenced). The first query needs to read the row itself (H2 always reads the whole row), that's why it is slower.

Regards,

Thomas

--
You received this message because you are subscribed to the Google Groups "H2 Database" group.
To post to this group, send email to h2-da...@googlegroups.com.
To unsubscribe from this group, send email to h2-database...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/h2-database?hl=en.

Reply all

Reply to author

Forward