[Django] #32711: MySQL performance regression in forms.models.apply_limit_choices_to_to_formfield

8 views
Skip to first unread message

Django

unread,
May 3, 2021, 1:52:57 PM5/3/21
to django-...@googlegroups.com
#32711: MySQL performance regression in
forms.models.apply_limit_choices_to_to_formfield
----------------------------------------------+------------------------
Reporter: Thomas Achtemichuk | Owner: nobody
Type: Uncategorized | Status: new
Component: Forms | Version: 3.2
Severity: Normal | Keywords:
Triage Stage: Unreviewed | Has patch: 0
Needs documentation: 0 | Needs tests: 0
Patch needs improvement: 0 | Easy pickings: 0
UI/UX: 0 |
----------------------------------------------+------------------------
[https://github.com/django/django/commit/556fa4bbba5ba86bc1646a86fb11ab55405d4aa4
This commit] related to #11707 in Django 3.2 introduced changes to
apply_limit_choices_to_to_formfield. The query generated, for example,
with this field on a model:

{{{
editor = models.ForeignKey(
settings.AUTH_USER_MODEL,
null=True,
blank=True,
limit_choices_to={
'is_staff': True,
'groups__name': 'Editor',
},
on_delete=models.SET_NULL)
}}}

results in the following query which takes ~25s to return on a
`db.r5.xlarge` RDS instance running MySQL 5.7:

{{{
SELECT
`auth_user`.`id`,
`auth_user`.`password`,
`auth_user`.`last_login`,
`auth_user`.`is_superuser`,
`auth_user`.`username`,
`auth_user`.`first_name`,
`auth_user`.`last_name`,
`auth_user`.`email`,
`auth_user`.`is_staff`,
`auth_user`.`is_active`,
`auth_user`.`date_joined`
FROM
`auth_user`
WHERE EXISTS(
SELECT
(1) AS `a`
FROM
`auth_user` U0
INNER JOIN
`auth_user_groups` U1 ON (U0.`id` = U1.`user_id`)
INNER JOIN
`auth_group` U2 ON (U1.`group_id` = U2.`id`)
WHERE (
U2.`name` = 'Editor'
AND U0.`is_staff`
AND U0.`id` = `auth_user`.`id`
)
LIMIT 1
)
ORDER BY `auth_user`.`date_joined` DESC;

...
21 rows in set (26.18 sec)
}}}

Counts for the involved tables:

{{{
mysql> SELECT COUNT(*) FROM auth_user;
+----------+
| COUNT(*) |
+----------+
| 3360439 |
+----------+

mysql> SELECT COUNT(*) FROM auth_group;
+----------+
| COUNT(*) |
+----------+
| 14 |
+----------+

mysql> SELECT COUNT(*) FROM auth_user_groups;
+----------+
| COUNT(*) |
+----------+
| 460 |
+----------+
}}}

And the `EXPLAIN`:

{{{
*************************** 1. row ***************************
id: 1
select_type: PRIMARY
table: auth_user
partitions: NULL
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 3201287
filtered: 100.00
Extra: Using where; Using filesort
*************************** 2. row ***************************
id: 2
select_type: DEPENDENT SUBQUERY
table: U2
partitions: NULL
type: const
possible_keys: PRIMARY,name
key: name
key_len: 452
ref: const
rows: 1
filtered: 100.00
Extra: Using index
*************************** 3. row ***************************
id: 2
select_type: DEPENDENT SUBQUERY
table: U1
partitions: NULL
type: eq_ref
possible_keys: user_id,auth_user_groups_403f60f,auth_user_groups_425ae3c4
key: user_id
key_len: 8
ref: bso.auth_user.id,const
rows: 1
filtered: 100.00
Extra: Using index
*************************** 4. row ***************************
id: 2
select_type: DEPENDENT SUBQUERY
table: U0
partitions: NULL
type: eq_ref
possible_keys: PRIMARY
key: PRIMARY
key_len: 4
ref: bso.auth_user.id
rows: 1
filtered: 90.00
Extra: Using where
}}}

--
Ticket URL: <https://code.djangoproject.com/ticket/32711>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

Django

unread,
May 12, 2021, 5:43:09 AM5/12/21
to django-...@googlegroups.com
#32711: MySQL performance regression in
forms.models.apply_limit_choices_to_to_formfield
------------------------------------+--------------------------------------

Reporter: Thomas Achtemichuk | Owner: nobody
Type: Uncategorized | Status: closed
Component: Forms | Version: 3.2
Severity: Normal | Resolution: worksforme

Keywords: | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
------------------------------------+--------------------------------------
Changes (by Mariusz Felisiak):

* cc: Adam Johnson (added)
* status: new => closed
* resolution: => worksforme


Comment:

Thanks for the report, however I cannot reproduce this issue on MySQL
8.0.23. All indexes are used correctly:
{{{
>>> User.objects.filter(Exists(User.objects.filter(Q(pk=OuterRef('pk')) &
Q(is_staff=True) & Q(groups__name='group_480')))).explain(analyze=True)

-> Nested loop inner join (cost=4.20 rows=4) (actual time=0.110..0.138
rows=2 loops=1)
-> Nested loop inner join (cost=2.80 rows=4) (actual
time=0.081..0.109 rows=4 loops=1)
-> Index lookup on U1 using
auth_user_groups_group_id_97559544_fk_auth_group_id (group_id='481')
(cost=1.40 rows=4) (actual time=0.058..0.065 rows=4 loops=1)
-> Single-row index lookup on auth_user using PRIMARY
(id=U1.user_id) (cost=0.28 rows=1) (actual time=0.009..0.010 rows=1
loops=4)
-> Filter: (0 <> U0.is_staff) (cost=0.27 rows=1) (actual
time=0.006..0.006 rows=0 loops=4)
-> Single-row index lookup on U0 using PRIMARY (id=U1.user_id)
(cost=0.27 rows=1) (actual time=0.004..0.005 rows=1 loops=4)
}}}

It looks that you don't have an index on `auth_user.id`, but I don't know
why.

--
Ticket URL: <https://code.djangoproject.com/ticket/32711#comment:1>

Django

unread,
May 12, 2021, 10:33:53 AM5/12/21
to django-...@googlegroups.com
#32711: MySQL performance regression in
forms.models.apply_limit_choices_to_to_formfield
------------------------------------+--------------------------------------

Reporter: Thomas Achtemichuk | Owner: nobody
Type: Uncategorized | Status: closed
Component: Forms | Version: 3.2
Severity: Normal | Resolution: worksforme

Keywords: | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
------------------------------------+--------------------------------------

Comment (by Simon Charette):

> It looks that you don't have an index on auth_user.id, but I don't know
why.

I doubt that's the issue, MySQL is just notably really picky with regards
to subqueries optimizations

https://dev.mysql.com/doc/refman/5.7/en/subquery-optimization-with-
exists.html

It seems we are properly following the docs though with our
`outer_expr=inner_expr` usage via `U0.id = auth_user.id` unless MySQL
doesn't like swapping us to `inner_expr=outer_expr` format?

Thomas, are you using the latest version of MySQL 5.7 and does swapping
`AND U0.id = auth_user.id` to `AND auth_user.id = U0.id` in the subquery
make any difference?

--
Ticket URL: <https://code.djangoproject.com/ticket/32711#comment:2>

Django

unread,
May 12, 2021, 11:25:22 AM5/12/21
to django-...@googlegroups.com
#32711: MySQL performance regression in
forms.models.apply_limit_choices_to_to_formfield
------------------------------------+--------------------------------------

Reporter: Thomas Achtemichuk | Owner: nobody
Type: Uncategorized | Status: closed
Component: Forms | Version: 3.2
Severity: Normal | Resolution: worksforme

Keywords: | Triage Stage: Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
------------------------------------+--------------------------------------

Comment (by Thomas Achtemichuk):

Thanks for following up Mariusz & Simon.

I agree that this smacks of typically poor MySQL subquery optimization,
which is one of the reasons we're planning on migrating to PostgreSQL in
the near future. In the meantime, I've monkeypatched in the
`apply_limit_choices_to_to_formfield` from 3.1 which is doing the trick
for us. And as 5.7 is quickly approaching EOL, probably not worth any more
time/attention to the issue we're seeing.

Simon, we're on MySQL 5.7.25, so a few minor versions back from the
current GA release of 5.7.34 (or 5.7.33, available on RDS). Swapping the
order of the condition doesn't seem to have any effect.

--
Ticket URL: <https://code.djangoproject.com/ticket/32711#comment:3>

Reply all
Reply to author
Forward
0 new messages