Can SELECTIVITY be automatic using indexes?

54 views
Skip to first unread message

kensystem

unread,
Sep 23, 2011, 1:38:12 AM9/23/11
to h2-da...@googlegroups.com
First I realize my question may be naive, please accept my apologies in advance.

Some of the performance problems I've encountered seem to related to SELECTIVITY being wrong (just needing update) for a given column. But when a column already has an index on it, is that index not useful as a live snapshot of how unique the value are in it column(s)? At least where Hashmaps are concerned, the size() or the Map relative to row-count, seems like a good way to find priority of the column/index (without requiring periodic auto/manual ANALYZE, and which only checks first N rows).

I realize b-tree indexes may be different... hence my question :-)

Thanks,
Ken

kensystem

unread,
Oct 4, 2011, 10:40:19 PM10/4/11
to h2-da...@googlegroups.com
Hello Thomas, I just thought I'd ping this thread again on the chance this may help how selectivity is calculated. Please let me know if I'm pretty far off base.
Thanks as always,
ken

Thomas Mueller

unread,
Oct 11, 2011, 2:08:30 AM10/11/11
to h2-da...@googlegroups.com
Hi,

For in-memory non-unique hash indexes, the map size could be used, but for b-tree indexes I don't know of a good solution currently, sorry.

Regards,
Thomas

Noel Grandin

unread,
Oct 11, 2011, 7:21:35 AM10/11/11
to h2-da...@googlegroups.com, kensystem
Selectivity should update automatically every 1000 rows, so it should never be wrong enough to cause problems.

-- Noel Grandin

> --
> You received this message because you are subscribed to the Google Groups "H2 Database" group.
> To view this discussion on the web visit https://groups.google.com/d/msg/h2-database/-/Yusy5cYjPoUJ.
> To post to this group, send email to h2-da...@googlegroups.com.
> To unsubscribe from this group, send email to h2-database...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/h2-database?hl=en.

kensystem

unread,
Oct 14, 2011, 1:12:56 AM10/14/11
to h2-da...@googlegroups.com, kensystem
Hi Noel, thank you for your reply! Well, I was actually thinking (asking) if calling size() on the index would be better, since:

a) the selectivity doesnt need to repeatedly-evaluated (separate thread, or blocking as case may be) and consume CPU when (if) size can just be called instead.

b) The first N rows that the selectivity-checks could (I imagine) be less-than an ideal sampling, for example when importing millions of already-sorted geodata (where the first N rows might have thousands of same values)

Looking at the size() impl in java.util.TreeMap (which I suspect is not the same impl as H2's), it's just a return value for the int size field.

Hope this is clearer,
ken

Noel Grandin

unread,
Oct 14, 2011, 4:17:35 AM10/14/11
to h2-da...@googlegroups.com, kensystem
Hi

Ah, yes, I see what you are saying.
I'm afraid on-disk b-tree indexes don't work that way.
They have an entry for every row in the table, so index.size() == table.size().

-- Noel Grandin

> --
> You received this message because you are subscribed to the Google Groups "H2 Database" group.

> To view this discussion on the web visit https://groups.google.com/d/msg/h2-database/-/qSNV3FEY-MMJ.

kensystem

unread,
Nov 5, 2011, 3:18:57 AM11/5/11
to h2-da...@googlegroups.com
So if we already have hash indexes, does H2's planner currently skip using the selectivity system (and periodic scanning N rows)?

It seems like this could produce better results, given the scenario where the first N rows for a column might have duplicates and appear to have lower selectivity, when the entire column's actual uniqueness might be much higher than others, in the context of millions of rows.

Thomas Mueller

unread,
Nov 8, 2011, 1:08:27 PM11/8/11
to h2-da...@googlegroups.com
Hi,

So if we already have hash indexes, does H2's planner currently skip using the selectivity system (and periodic scanning N rows)?

Sorry, I don't understand what you mean with "skip using the selectivity system".

If you mean whether "offset" of "limit ... offset ..." is optimized in some way: no, currently it is not.

Regards,
Thomas

Reply all
Reply to author
Forward
0 new messages