Developing and Index Advisor for MySQL

Eduardo Weiland

unread,

Nov 5, 2015, 5:04:22 AM11/5/15

to

Hello there,

I am new here in the list. I am from Brazil and I am finishing my bachelor's degree in Computer Science.

I have to do a final work for the course, and (I don't know why) I came out with the idea of developing an index advisor for MySQL.

Just to be clear, by index advisor I mean some tool that suggests what indices can be created in the database, based on statistics and query log. In other words, indexes that do not exist in the database but can be created by the administrator in order to improve query execution performance.

I have never worked with MySQL internals before. I know a lot of C and C++ programming, so I just need some help to understand MySQL structure.

My main inspirations for doing this work are [1] and [2] (actually, they are the same work, but two different articles). One important thing I have found in [1]:

Before making the decisions, the optimizer allows the developer to override the information about physical design by using several function ̳hooks‘. The hooks can be replaced at runtime with functions that insert new stastistics information into the list of physical design features. This makes the optimizer believe that the newly inserted data regarding the what-if indexes and what-if tables are present in the database. Then, the optimizer selects the execution plans using the statistics from the what-if features.

I guess that this is what I need. Some way to virtually create indexes and then calling the MySQL optimizer or some sort of EXPLAIN method to understand "what MySQL optimizer thinks about this index? Would this index be used if it existed?".

I am probably wrong about all I have said here, but I want to hear what you think about this.

I am very grateful for all your help.

Att.

Eduardo Weiland
http://eduardoweiland.info

[1] PARINDA: an interactive physical designer for PostgreSQL http://dl.acm.org/citation.cfm?id=1739131
[2] An automated, yet interactive and portable DB designer http://dl.acm.org/citation.cfm?id=1807167.1807314

--
MySQL Internals Mailing List
For list archives: http://lists.mysql.com/internals
To unsubscribe: http://lists.mysql.com/internals

shawn l.green

unread,

Nov 12, 2015, 4:01:30 PM11/12/15

to

Hello Eduardo,

The drawback to your approach is that you won't have the cardinality
estimates for your index trials until you actually test those tuple
combinations for uniqueness. It is the cardinality that the optimizer
bases its estimates on.

Other approaches (most of them are manual) have been to examine the
query patterns (available in digest form from the PERFORMANCE SCHEMA) or
from the actual query statistics in the Slow query log (which can be
converted to patterns by tools like mysqldumpslow) to look for patterns
that are both
a) frequently used
b) examining many more rows than they need to return

Quite often, they can be improved by adding indexes to certain tables.
In many cases, though, it was because the query used a dependent
subquery which simply had to search a lot of row combinations.

So, it is easy to suggest indexes that would solve certain queries
quickly but over-indexing a table (creating more indexes than you need)
only hurts INSERT, UPDATE, and DELETE performance if you are trying to
improve a query you only execute rarely.

--
Shawn Green
MySQL Senior Principal Technical Support Engineer
Oracle USA, Inc. - Integrated Cloud Applications & Platform Services
Office: Blountville, TN

Become certified in MySQL! Visit https://www.mysql.com/certification/
for details.

Eduardo Weiland

unread,

Nov 12, 2015, 7:23:13 PM11/12/15

to

Quoting Physical Database Design [1]:

"index statistics can be sufficiently estimated for the purposes of
physical design advisors from relatively straightforward calculations
based on the table and columns statistics of the underlying table that
the virtual index will be based on."

There are already tools like this for other RDBMS: Microsoft SQL Server
Database Tuning Advisor, Oracle Access Advisor and IBM DB2 Design
Advisor. I just want to create a tool that does the same thing for MySQL.

Furthermore I intend to analyze all the SQL queries, including SELECTs,
INSERTs, UPDATEs and DELETEs. Thus I want to find some index
configuration that is good enough for the entire workload. I do know
that workload can vary over time and the optimizations become outdated,
but this is an academic research and I am assuming a constant workload.
Adapt to dynamic workload changes is future work.

I plan to use some statistics such as how many times each query is
executedto make queries that run most often to perform better. So, if
there are more INSERTs and UPDATEs than SELECTs in a given table, I
intend to recommend less indexes for this table, and even suggest to
drop some existing indexes if needed.

[1] LIGHTSTONE S., TEOREY T., NADEAU T. Physical Database Design: The
Database Professional’s Guide to Exploiting Indexes, Views, Storage, and
More. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 2007.
ISBN 9780080552316.

shawn l.green

unread,

Nov 12, 2015, 8:32:03 PM11/12/15

to

Hello Eduardo,

You said (in your top-posted response)

> "index statistics can be sufficiently estimated for the purposes of
> physical design advisors from relatively straightforward calculations
> based on the table and columns statistics of the underlying table that
> the virtual index will be based on."

That is exactly the task I said you would need to do when I said "until
you actually test those tuple combinations for uniqueness". MySQL does
not keep or generate statistics for every column of a table. It only
generates statistics for each index based on the columns used in that
index.

I perfectly agree with modeling a constant distribution of query types.
There is no need for you to attempt to make your system dynamic.

Here are some references to help you get started:

* The sys Schema (an intrinsic part of MySQL version 5.7 but can be
added to 5.6 with ease):
https://github.com/MarkLeith/mysql-sys
(search for any blogs by Mark Leith, the author, for usage ideas)

* Current index usage (useful for finding indexes that should be dropped)

http://dev.mysql.com/doc/refman/5.7/en/table-waits-summary-tables.html#table-io-waits-summary-by-index-usage-table

* Useful pages from the manual
http://dev.mysql.com/doc/refman/5.7/en/multiple-column-indexes.html
http://dev.mysql.com/doc/refman/5.7/en/index-statistics.html
http://dev.mysql.com/doc/refman/5.7/en/index-btree-hash.html
http://dev.mysql.com/doc/refman/5.7/en/innodb-fulltext-index.html
http://dev.mysql.com/doc/refman/5.7/en/index-hints.html
http://dev.mysql.com/doc/refman/5.7/en/index-extensions.html
http://dev.mysql.com/doc/refman/5.7/en/glossary.html#glos_covering_index
http://dev.mysql.com/doc/refman/5.7/en/innodb-statistics-estimation.html
http://dev.mysql.com/doc/refman/5.7/en/innodb-persistent-stats.html
http://dev.mysql.com/doc/refman/5.7/en/slow-query-log.html

One alternative to changing the tables is to change the query to more
efficiently use the indexes already on the table. But that would
probably be your next project :)

http://dev.mysql.com/doc/refman/5.7/en/query-rewrite-plugins.html

Yours,

--
Shawn Green
MySQL Senior Principal Technical Support Engineer
Oracle USA, Inc. - Integrated Cloud Applications & Platform Services
Office: Blountville, TN

Become certified in MySQL! Visit https://www.mysql.com/certification/
for details.