Slow SQL query

87 views
Skip to first unread message

Collin Anderson

unread,
Oct 24, 2014, 12:30:55 PM10/24/14
to django...@googlegroups.com
Hi All,

I've been trying to figure out the proper way to index my data or construct my query so I can query orders in a reasonable amount of time. Is there a better way to index/query my data?

orders = Order.objects.exclude(status=Order.CART).filter(Q(user=user) | Q(account_number=account_number))
page
= Paginator(orders, per_page=10).page(request.GET.get('page') or 1)

# Query_time: 49.414771  Lock_time: 0.000148 Rows_sent: 1  Rows_examined: 412843
SELECT COUNT
(*) FROM `order_order` WHERE (NOT (`order_order`.`status` = 4) AND (`order_order`.`user_id` = 12345 OR `order_order`.`account_number` = '123456'));

class Order(models.Model):
    SUBMITTED
, CART, SHIPPED, PARTIAL_SHIPPED, PROCESSED = 1, 4, 5, 6, 7
    STATUS_CHOICES
= (
       
(CART, 'Cart'),
       
(SUBMITTED, 'Submitted on Website'),
       
(PROCESSED, 'Processing'),
       
(PARTIAL_SHIPPED, 'Partial Shipped'),
       
(SHIPPED, 'Shipped'),
   
)
    status
= models.IntegerField('Order Status', choices=STATUS_CHOICES, default=CART, db_index=True)
    user
= models.ForeignKey('account.UserProfile', null=True, blank=True)
    account_number
= models.CharField(max_length=20, blank=True, db_index=True)

I also didn't use the indexes when I tried: WHERE `order_order`.`status` IN (1, 5, 6, 7)

If I take out the exclude(status=Order.CART) it properly uses the user and account indexes, but then it would have an improper count.

Thanks,
Collin

Larry Martell

unread,
Oct 24, 2014, 12:50:52 PM10/24/14
to django...@googlegroups.com
You'll probably get more replies on the mysql list. But you could try
running EXPLAIN on the query. Also, what if you change it to:

SELECT COUNT(*)
FROM `order_order`
WHERE `order_order`.`status` != 4
AND (`order_order`.`user_id` = 12345 OR `order_order`.`account_number`
= 123456');

Collin Anderson

unread,
Oct 27, 2014, 8:20:57 AM10/27/14
to django...@googlegroups.com
mysql> explain SELECT COUNT(*)
   
-> FROM `order_order`
   
-> WHERE `order_order`.`status` != 4
   
-> AND (`order_order`.`user_id` = 12345 OR `order_order`.`account_number`
   
-> = 123456);
+----+-------------+-------------+------+-------------------------------------------------------------------------------------+------+---------+------+--------+-------------+
| id | select_type | table       | type | possible_keys                                                                       | key  | key_len | ref  | rows   | Extra       |
+----+-------------+-------------+------+-------------------------------------------------------------------------------------+------+---------+------+--------+-------------+
|  1 | SIMPLE      | order_order | ALL  | order_order_fbfc09f1,order_order_979d4f1e,order_order_48fb58bb,order_order_309e75c7 | NULL | NULL    | NULL | 338275 | Using where |
+----+-------------+-------------+------+-------------------------------------------------------------------------------------+------+---------+------+--------+-------------+
1 row in set (0.16 sec)

I can't get it to use the indexes when I have the "status" clause.

Anyway, you might be right about using the mysql list.

Thanks,
Collin


Erik Cederstrand

unread,
Oct 27, 2014, 9:05:51 AM10/27/14
to Django Users
Note that your indexes are on individual columns, not combined. So MySQL has to choose one of the indexes; it can't combine them. It's probably choosing the state index, then deciding that the cardinality is very low (or stats show that state=4 is true for more than n pct of the rows) and then does a full table scan.

You might benefit from adding combined indexes with status and account_number|user.

class Meta:
index_together = [
['status', 'account_number'],
['status', 'user'],
]


Erik

Collin Anderson

unread,
Oct 27, 2014, 9:55:45 AM10/27/14
to django...@googlegroups.com
Hi Erik,

No luck...

mysql> show indexes in order_order;
+-------------+------------+----------------------+--------------+----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table       | Non_unique | Key_name             | Seq_in_index | Column_name    | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------------+------------+----------------------+--------------+----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| order_order |          0 | PRIMARY              |            1 | id             | A         |      311492 |     NULL | NULL   |      | BTREE      |         |               |
| order_order |          1 | order_order_8df3c379 |            1 | order_number   | A         |      311492 |     NULL | NULL   |      | BTREE      |         |               |
| order_order |          1 | order_order_fbfc09f1 |            1 | user_id        | A         |           8 |     NULL | NULL   | YES  | BTREE      |         |               |
| order_order |          1 | order_order_979d4f1e |            1 | account_number | A         |      311492 |     NULL | NULL   |      | BTREE      |         |               |
| order_order |          1 | order_order_48fb58bb |            1 | status         | A         |          10 |     NULL | NULL   |      | BTREE      |         |               |
| order_order |          1 | order_order_482713d4 |            1 | user_id        | A         |          12 |     NULL | NULL   | YES  | BTREE      |         |               |
| order_order |          1 | order_order_482713d4 |            2 | account_number | A         |      155746 |     NULL | NULL   |      | BTREE      |         |               |
| order_order |          1 | order_order_fb3214ea |            1 | status         | A         |         215 |     NULL | NULL   |      | BTREE      |         |               |
| order_order |          1 | order_order_fb3214ea |            2 | account_number | A         |         215 |     NULL | NULL   |      | BTREE      |         |               |
| order_order |          1 | order_order_55eb7e10 |            1 | status         | A         |         215 |     NULL | NULL   |      | BTREE      |         |               |
| order_order |          1 | order_order_55eb7e10 |            2 | user_id        | A         |         215 |     NULL | NULL   | YES  | BTREE      |         |               |
+-------------+------------+----------------------+--------------+----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+

mysql
> explain SELECT COUNT(*) FROM `order_order` WHERE `order_order`.`status` != 4  AND (`order_order`.`user_id` = 12345 OR `order_order`.`account_number` = 123456);
+----+-------------+-------------+------+-------------------------------------------------------------------------------------------------------------------------------+------+---------+------+--------+-------------+

| id | select_type | table       | type | possible_keys                                                                                                                 | key  | key_len | ref  | rows   | Extra       |
+----+-------------+-------------+------+-------------------------------------------------------------------------------------------------------------------------------+------+---------+------+--------+-------------+
|  1 | SIMPLE      | order_order | ALL  | order_order_fbfc09f1,order_order_979d4f1e,order_order_48fb58bb,order_order_482713d4,order_order_fb3214ea,order_order_55eb7e10 | NULL | NULL    | NULL | 311492 | Using where |
+----+-------------+-------------+------+-------------------------------------------------------------------------------------------------------------------------------+------+---------+------+--------+-------------+

Thanks,
Collin

Erik Cederstrand

unread,
Oct 27, 2014, 10:40:14 AM10/27/14
to Django Users
> Den 27/10/2014 kl. 14.55 skrev Collin Anderson <cmawe...@gmail.com>:
>
> Hi Erik,
>
> No luck...
>
> mysql> show indexes in order_order;
> +-------------+------------+----------------------+--------------+----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
> | Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
> +-------------+------------+----------------------+--------------+----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
> | order_order | 0 | PRIMARY | 1 | id | A | 311492 | NULL | NULL | | BTREE | | |
> | order_order | 1 | order_order_8df3c379 | 1 | order_number | A | 311492 | NULL | NULL | | BTREE | | |
> | order_order | 1 | order_order_fbfc09f1 | 1 | user_id | A | 8 | NULL | NULL | YES | BTREE | | |
> | order_order | 1 | order_order_979d4f1e | 1 | account_number | A | 311492 | NULL | NULL | | BTREE | | |
> | order_order | 1 | order_order_48fb58bb | 1 | status | A | 10 | NULL | NULL | | BTREE | | |
> | order_order | 1 | order_order_482713d4 | 1 | user_id | A | 12 | NULL | NULL | YES | BTREE | | |
> | order_order | 1 | order_order_482713d4 | 2 | account_number | A | 155746 | NULL | NULL | | BTREE | | |
> | order_order | 1 | order_order_fb3214ea | 1 | status | A | 215 | NULL | NULL | | BTREE | | |
> | order_order | 1 | order_order_fb3214ea | 2 | account_number | A | 215 | NULL | NULL | | BTREE | | |
> | order_order | 1 | order_order_55eb7e10 | 1 | status | A | 215 | NULL | NULL | | BTREE | | |
> | order_order | 1 | order_order_55eb7e10 | 2 | user_id | A | 215 | NULL | NULL | YES | BTREE | | |
> +-------------+------------+----------------------+--------------+----------------+-----------+-------------+----------+--------+------+------------+---------+---------------+

Ok, you're not giving much of a chance to the query planner either. The cardinality of status is 10, so status!=4 potentially means "give me 90% of the rows". The cardinality of user_id is a mere 12, which potentially means "give me 8% of the rows". Your query could easily return 30.000 rows, according to the above (since you're OR'ing user_id and account_number).

All MySQL knows is that it might be returning ca. 8% of the rows. That's about the threshold where MySQL gives up and simply does a full table scan.

Should the query really return ~30.000 rows in practice? If not, maybe you need to re-design the way you query your orders. Your status!=4 is no help at all, and user_id=12345 is killing the query.

That said, a query time of 49 seconds is absurd for just 300.000 rows even for a full table scan, and you should look into why.

Erik

Collin Anderson

unread,
Oct 29, 2014, 1:15:41 PM10/29/14
to django...@googlegroups.com
Hi Erik,
 
Ok, you're not giving much of a chance to the query planner either. The cardinality of status is 10, so status!=4 potentially means "give me 90% of the rows". The cardinality of user_id is a mere 12, which potentially means "give me 8% of the rows". Your query could easily return 30.000 rows, according to the above (since you're OR'ing user_id and account_number). 
Right, in practice it should be less than 1000 returned rows.

I've changed the code to run the query without the status != 4 clause, and manually filtering that out using python, because, yes, 97% of the rows are status != 4. Or maybe a subquery would be better here to say: first get me all the orders for these people, _then_ filter out status=4.

As far as user_id goes, 98% of the rows have user_id=None. Shouldn't the index be helpful there?

That said, a query time of 49 seconds is absurd for just 300.000 rows even for a full table scan, and you should look into why.
This is probably why :) 
Name        | Engine | Version | Row_format | Rows   | Avg_row_length | Data_length | Max_data_length | Index_length | Data_free | Auto_increment
order_order
| InnoDB |      10 | Compact    | 672252 |           1765 |  1186988032 |               0 |     85639168 | 822083584 |         716087

Thanks for all the help,
Collin

Erik Cederstrand

unread,
Oct 29, 2014, 5:43:08 PM10/29/14
to Django Users
> Den 29/10/2014 kl. 18.15 skrev Collin Anderson <cmawe...@gmail.com>:
>
> Right, in practice it should be less than 1000 returned rows.
>
> I've changed the code to run the query without the status != 4 clause, and manually filtering that out using python, because, yes, 97% of the rows are status != 4. Or maybe a subquery would be better here to say: first get me all the orders for these people, _then_ filter out status=4.
>
> As far as user_id goes, 98% of the rows have user_id=None. Shouldn't the index be helpful there?

I'm not entirely sure how clever MySQL is with statistics on the indexes, so I can't be sure. But if user_id is almost always None, then this raises the question if your data model is appropriate. Your previous query examined 300.000 rows but returned only 1 row, which tells me either your query or data model is holding back information on the nature of your data.

Maybe Users doesn't belong in the Orders table? You could move the user to a different table which stores the Order.id <-> user_id relation, which would give a fast lookup on user_id and thus easy access to the Order.id index.

Erik

Collin Anderson

unread,
Oct 31, 2014, 6:31:47 PM10/31/14
to django...@googlegroups.com
Hi Erik,
 
Maybe Users doesn't belong in the Orders table? You could move the user to a different table which stores the Order.id <-> user_id relation, which would give a fast lookup on user_id and thus easy access to the Order.id index.
Interesting idea! I'll think about it.

Thanks,
Collin
 
Reply all
Reply to author
Forward
0 new messages