Anyone fancy a MySQL query optimisation?

Kevin Bowman

unread,

Mar 5, 2012, 7:03:59 PM3/5/12

to joindin-d...@googlegroups.com

Hi list,

I noticed that the joind.in homepage is running a little slowly, so I had a quick look into it - it looks like there's a particularly expensive MySQL query running when the homepage is requested:

select t.talk_title, t.ID, count(tc.ID) as ccount, get_talk_rating(t.ID) as tavg, e.ID eid, e.event_name from talks t JOIN talk_comments tc ON tc.talk_id=t.ID AND tc.private = 0 JOIN events e ON e.ID=t.event_id where t.active=1 and (tc.user_id != 0 and tc.rating != 0) group by t.ID order by ccount desc limit 7;

This is taking nearly 1 second to run, during which time MySQL is fully using a CPU. I think it used to be really fast because we've got the query cache enabled, but recently it started using the get_talk_rating() user-defined function which automatically misses the query cache. Here's the explain plan on the current live DB:

mysql> explain select t.talk_title, t.ID, count(tc.ID) as ccount, get_talk_rating(t.ID) as tavg, e.ID eid, e.event_name from talks t JOIN talk_comments tc ON tc.talk_id=t.ID AND tc.private = 0 JOIN events e ON e.ID=t.event_id where t.active=1 and (tc.user_id != 0 and tc.rating != 0) group by t.ID order by ccount desc limit 7;

+----+-------------+-------+--------+-------------------+---------+---------+--------------------+-------+----------------------------------------------+

+----+-------------+-------+--------+-------------------+---------+---------+--------------------+-------+----------------------------------------------+

| 1 | SIMPLE | e | eq_ref | PRIMARY | PRIMARY | 4 | joindin.t.event_id | 1 | |

+----+-------------+-------+--------+-------------------+---------+---------+--------------------+-------+----------------------------------------------+

3 rows in set (0.00 sec)

I'm pretty sure that it should be possible to run the get_talk_rating logic and have a fast query - does anyone fancy having a look? I can provide an anonymised version of the data if desired, and open a ticket on Jira for it, but I thought some discussion here would be good.

Thanks,

Kevin

Joshua Thijssen

unread,

Mar 6, 2012, 3:11:19 AM3/6/12

to joindin-d...@googlegroups.com

Out of the top of my head: the get_talk_rating is a deterministic
function, so it should be able to get cached. In any case, it isn't
really the problem: the reason you have the temporary / filesort is due
to the fact you are ordering on ccount, which is a count(*). It cannot
use any indexation in this case obviously. It will generate ALL 15508
entries (including the get_talk_rating() calls), and limit to 7.

There are 2 options: remove the ccount, which won't help you because you
still don't know the top talks and the get_talk_ratings() gets called
anyway,

OR

do a subquery. Fetch the 7 ID's from the query (ONLY the talk ID's).
That way, the get_talk_ratings is only called 7 times instead of 15K times.

Not enough time to actually create it right now, but I think you get the
general idea?

gr,
josh

lornajane

unread,

Apr 29, 2012, 12:24:45 PM4/29/12

to joindin-developers

I made a change today that fetches the rating for each talk that we
display on the homepage, rather than all [insert ridiculously large
number here] talks in the db. This is what happened to the response
times for the front page when I deployed the change.

http://flic.kr/p/bCGgDA

Sorted - thanks to Kevin for spotting this, nagging me, and setting up
my smokeping tool for me that you see in the screenshot :)

Lorna

Reply all

Reply to author

Forward