merge index的奇怪问题

1 view
Skip to first unread message

jason he

unread,
Jul 28, 2010, 10:46:42 AM7/28/10
to imy...@googlegroups.com
mysql> explain select SENDERID from test1  force index(RECEIVERID,DELETED) where test1.RECEIVERID=16136  or test1.DELETED=0 \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: test1
         type: index_merge
possible_keys: RECEIVERID,DELETED
          key: RECEIVERID,DELETED
      key_len: 4,1
          ref: NULL
         rows: 13296
        Extra: Using union(RECEIVERID,DELETED); Using where
1 row in set (0.00 sec)

mysql> explain select SENDERID from test1  where test1.RECEIVERID=16136  or test1.DELETED=0 \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: test1
         type: ALL
possible_keys: RECEIVERID,DELETED
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 26590
        Extra: Using where
1 row in set (0.00 sec)

mysql 的版本如下,存储引擎为innodb, 采用共享存储。为什么默认用不上union index呢?

mysql> select version();
+------------+
| version()  |
+------------+
| 5.1.42-log |
+------------+
1 row in set (0.00 sec)

--
blog: http://hi.baidu.com/hexie007

Neo

unread,
Jul 28, 2010, 12:20:26 PM7/28/10
to imy...@googlegroups.com

我认为是优化器认为你查询的数据太多了,所以才会抛弃使用索引

这个是我做的例子:

people里面数据为525312条
mysql> select count(*) from people;
+----------+
| count(*) |
+----------+
|   525312 |
+----------+
1 row in set (0.88 sec)

people上的索引
mysql> show indexes from people\G
*************************** 1. row ***************************
       Table: people
  Non_unique: 0
    Key_name: PRIMARY
Seq_in_index: 1
 Column_name: user_id
   Collation: A
 Cardinality: 533964
    Sub_part: NULL
      Packed: NULL
        Null:
  Index_type: BTREE
     Comment:
*************************** 2. row ***************************
       Table: people
  Non_unique: 1
    Key_name: username
Seq_in_index: 1
 Column_name: username
   Collation: A
 Cardinality: 669
    Sub_part: NULL
      Packed: NULL
        Null:
  Index_type: BTREE
     Comment:
*************************** 3. row ***************************
       Table: people
  Non_unique: 1
    Key_name: city
Seq_in_index: 1
 Column_name: city
   Collation: A
 Cardinality: 464
    Sub_part: NULL
      Packed: NULL
        Null:
  Index_type: BTREE
     Comment:
3 rows in set (0.14 sec)


mysql> explain select * from people where username='user10' or city='haidain9'\G

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: people
         type: index_merge
possible_keys: username,city
          key: username,city
      key_len: 52,102
          ref: NULL
         rows: 7228
        Extra: Using union(username,city); Using where

1 row in set (0.00 sec)


我将一个city=‘haidain9’ 这个结果集的数据增加了10W
mysql> explain select * from people where username='user10' or city='haidain9';
+----+-------------+--------+------+---------------+------+---------+------+--------+-------------+
| id | select_type | table  | type | possible_keys | key  | key_len | ref  | rows   | Extra       |
+----+-------------+--------+------+---------------+------+---------+------+--------+-------------+
|  1 | SIMPLE      | people | ALL  | username,city | NULL | NULL    | NULL | 531887 | Using where |
+----+-------------+--------+------+---------------+------+---------+------+--------+-------------+
1 row in set (0.06 sec)

所以,你的问题是,你扫描的数据太多了,导致优化器直接将索引放弃了

我测试了下,好像扫描的数据是全数据量1/4以上就会把索引抛弃(这个希望达人给证实下)




--
您收到此邮件是因为您订阅了 Google 网上论坛的“[MySQL中文用户群]”论坛。
要向此网上论坛发帖,请发送电子邮件至 imy...@googlegroups.com
要取消订阅此网上论坛,请发送电子邮件至 imysql+un...@googlegroups.com
若有更多问题,请通过 http://groups.google.com/group/imysql?hl=zh-CN 访问此网上论坛。

flycondor

unread,
Jul 28, 2010, 9:24:41 PM7/28/10
to imy...@googlegroups.com
google一下"mysql index selectivity"
能找到不少相关资料

简单说就是,因为磁盘是顺序访问速度远高于随机访问速度的特点,优化器如果认为通过索引查询到的数据超过全数据集一定比例以上,就认为顺序扫描比根据索引结果随机访问要效率高

这个比例我记得在high performance
mysql第一版中说是15%,但是作者强调,这个比例可能随着版本变更会变。毕竟底层存储设备的性能特点会变,mysql的优化策略肯定也会跟着变。

具体到这个例子里,对deleted这类只有有限的几个枚举值的字段建索引,是没有意义的
因为selectivity很差,优化器默认是不用的
就算你force index,实际执行效率未必比全表扫描好

建议你把deleted字段的值,把表拆分成多个
当然,前提是,根据查询频率分布,多数查询是限定在deleted等于某个确定值的

以上带有推测成分,仅供参考

杨海朝

unread,
Jul 28, 2010, 11:27:45 PM7/28/10
to imy...@googlegroups.com
楼主说的很对。mysql的查询分析器是基于cost的,在设计index时,可以通过考察这个要做index列的selectivity是否15%以内,从某种程度上减少这个问题出现,以及设计出优化的index.
 
 
 
 


 
2010/7/29 flycondor <flyc...@douban.com>

向翔

unread,
Jul 29, 2010, 12:36:47 AM7/29/10
to imy...@googlegroups.com
这个在sql server里面是有智能支持的,sqlserver为每个可能filter的字段建立统计直方图,通过了解数据分布,就能够判断是index look up 还是scan
谢谢!



                           向翔

flycondor

unread,
Jul 29, 2010, 1:17:50 AM7/29/10
to imy...@googlegroups.com
mysql里也有类似统计信息,作为优化器判断index selectivity的根据

jason he

unread,
Jul 29, 2010, 5:37:56 AM7/29/10
to imy...@googlegroups.com
谢谢各位大虾的热情回复。做实验验证了,确实是这个问题。在我的实验当中,大概在13%以内的样子。超过了可能会使用不到索引了。

mysql> select (select count(*) from test where DELETED=1)/(select count(*) from test);
+-------------------------------------------------------------------------+
| (select count(*) from test where DELETED=1)/(select count(*) from test) |
+-------------------------------------------------------------------------+
|                                                                                                    0.1378 |
+-------------------------------------------------------------------------+

mysql> explain  select count(*)  from test where test=23705 or test.DELETED=1 \G

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: test

         type: ALL
possible_keys: RECEIVERID,DELETED
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 24044

        Extra: Using where
1 row in set (0.00 sec)

mysql> update test set DELETED=0 where DELETED=1 limit 200;                                                                       
Query OK, 200 rows affected, 1 warning (0.01 sec)
Rows matched: 200  Changed: 200  Warnings: 0

mysql> select (select count(*) from test where DELETED=1)/(select count(*) from test);                                     
+------------------------------------------------------------------------+
| (select count(*) from test where DELETED=1)/(select count(*) from test |
+------------------------------------------------------------------------+
|                                                                                                   0.1262 |
+------------------------------------------------------------------------+
1 row in set (0.02 sec)

mysql> explain  select count(*) from test where test.RECEIVERID=23705 or test.DELETED=1 \G

*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: test

         type: index_merge
possible_keys: RECEIVERID,DELETED
          key: RECEIVERID,DELETED
      key_len: 4,1
          ref: NULL
         rows: 4677

        Extra: Using union(RECEIVERID,DELETED); Using where
1 row in set (0.00 sec)

再次谢谢各位。
blog: http://hi.baidu.com/hexie007
03D.gif
000.gif
Reply all
Reply to author
Forward
0 new messages