Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

LEFT JOIN... WHERE column IS NULL, very slow

2,376 views
Skip to first unread message

Jason C

unread,
Sep 14, 2013, 5:51:52 AM9/14/13
to
Here's the query I'm using:

SELECT classifieds.id, title, classifieds_img.image
FROM classifieds
JOIN classifieds_which
ON classifieds_which.id = classifieds.id
LEFT JOIN classifieds_img
ON classifieds_img.id = classifieds.id
WHERE category= 'Cars'
AND subcat= 'Ford'
AND classifieds_which.which='varA'
AND (
classifieds_img.sorter = 0
OR classifieds_img.sorter IS NULL
)
ORDER BY id DESC

This query runs in 0.64s, which is very high!

The table "classifieds" has about 50,000 rows, and it uses an index with 787 rows:
ref_or_null
Using where; Using temporary; Using filesort

The table "classifieds_which" has about 260,000 rows, and it uses an index with 1 row:
eq_ref
Using where; Using index

These seem OK to me. But the table "classifieds_img" isn't using an index, and I can't figure out how to change that. The table has about 25,000 rows, and EXPLAIN shows:

ALL
25314
Range checked for each record (index map: 0x1D)

The table has 3 columns:

id
sorter
image

The ID would correspond to the ID field in classifieds, of course. There can be up to 10 rows for each ID, and "sorter" is an integer for each of those (0-9). If sorter doesn't exist, then it means there is no matching ID. The third column, image, is simply the name of the image.

I have ID and sorter together set to UNIQUE, then I have ID and image set to UNIQUE. For testing, I've set up indices for:

ID
ID + sorter + image (all 3 columns)
sorter + image

but none of them are being used (even the one with all 3 columns). My WHERE relies on the ID and sorter, so I thought the first UNIQUE would work, but it's not.

If I remove the "OR classifieds_img.sorter IS NULL", then it does use the index with all 3 columns, and finishes in 0.086s (a major improvement over 0.64s).

Any suggestions on how to get this query to speed up? If it's not an issue of an index, then what should I change?

Luuk

unread,
Sep 14, 2013, 6:01:04 AM9/14/13
to
Can you show a 'show create table classifieds_img' ?

i think (based on a quick look at your query) you need an index on the
fields:
id, which and sorter in that table...

Lennart Jonsson

unread,
Sep 14, 2013, 7:46:04 AM9/14/13
to
On 09/14/2013 11:51 AM, Jason C wrote:
[...]
> but none of them are being used (even the one with all 3 columns). My WHERE relies on the ID and sorter, so I thought the first UNIQUE would work, but it's not.
>
> If I remove the "OR classifieds_img.sorter IS NULL", then it does use the index with all 3 columns, and finishes in 0.086s (a major improvement over 0.64s).
>
> Any suggestions on how to get this query to speed up? If it's not an issue of an index, then what should I change?
>

Is index used for each of the branches?:

a)

SELECT classifieds.id, title, classifieds_img.image
FROM classifieds
JOIN classifieds_which
ON classifieds_which.id = classifieds.id
LEFT JOIN classifieds_img -- can be replaced with inner join
ON classifieds_img.id = classifieds.id
WHERE category= 'Cars'
AND subcat= 'Ford'
AND classifieds_which.which='varA'
AND classifieds_img.sorter = 0
ORDER BY id DESC

b)

SELECT classifieds.id, title, classifieds_img.image
FROM classifieds
JOIN classifieds_which
ON classifieds_which.id = classifieds.id
LEFT JOIN classifieds_img
ON classifieds_img.id = classifieds.id
WHERE category= 'Cars'
AND subcat= 'Ford'
AND classifieds_which.which='varA'
AND classifieds_img.sorter IS NULL
ORDER BY id DESC


You can try:

SELECT classifieds.id, title, classifieds_img.image
FROM classifieds
JOIN classifieds_which
ON classifieds_which.id = classifieds.id
LEFT JOIN classifieds_img
ON classifieds_img.id = classifieds.id
WHERE category= 'Cars'
AND subcat= 'Ford'
AND classifieds_which.which='varA'
AND coalesce(classifieds_img.sorter,0) = 0
ORDER BY id DESC

and see if that helps


/Lennart

Norman Peelman

unread,
Sep 14, 2013, 9:30:51 AM9/14/13
to
On 09/14/2013 05:51 AM, Jason C wrote:
> Here's the query I'm using:
>
> SELECT classifieds.id, title, classifieds_img.image
> FROM classifieds
> JOIN classifieds_which
> ON classifieds_which.id = classifieds.id
> LEFT JOIN classifieds_img
> ON classifieds_img.id = classifieds.id
> WHERE category= 'Cars'
> AND subcat= 'Ford'
> AND classifieds_which.which='varA'
> AND (
> classifieds_img.sorter = 0
> OR classifieds_img.sorter IS NULL
> )
> ORDER BY id DESC
>

You should use full dot notation for each column, I'm assuming
'title', 'category' and 'subcat' are all coming from 'classifieds'. Are
you sure this is the query, your ORDER BY clause is using a non-unique
column name - use full dot notation.

What is 'classifieds_which'?

> This query runs in 0.64s, which is very high!
>
> The table "classifieds" has about 50,000 rows, and it uses an index with 787 rows:
> ref_or_null
> Using where; Using temporary; Using filesort
>
> The table "classifieds_which" has about 260,000 rows, and it uses an index with 1 row:
> eq_ref
> Using where; Using index
>
> These seem OK to me. But the table "classifieds_img" isn't using an index, and I can't figure out how to change that. The table has about 25,000 rows, and EXPLAIN shows:
>
> ALL
> 25314
> Range checked for each record (index map: 0x1D)
>
> The table has 3 columns:
>
> id
> sorter
> image
>
> The ID would correspond to the ID field in classifieds, of course. There can be up to 10 rows for each ID, and "sorter" is an integer for each of those (0-9). If sorter doesn't exist, then it means there is no matching ID. The third column, image, is simply the name of the image.
>

Hmm, you say .sorter can be 0-9 allowing for 10 rows (10 images).
.sorter = 0 allows a row to exist but you treat it same as IS NULL???
Your 'AND ( .sorter=0 OR .sorter IS NULL ) clause says "match 1 OR 0
rows". Don't you want just IS NULL?

> I have ID and sorter together set to UNIQUE, then I have ID and image set to UNIQUE. For testing, I've set up indices for:
>
> ID
> ID + sorter + image (all 3 columns)
> sorter + image
>
> but none of them are being used (even the one with all 3 columns). My WHERE relies on the ID and sorter, so I thought the first UNIQUE would work, but it's not.
>
> If I remove the "OR classifieds_img.sorter IS NULL", then it does use the index with all 3 columns, and finishes in 0.086s (a major improvement over 0..64s).
>
> Any suggestions on how to get this query to speed up? If it's not an issue of an index, then what should I change?
>


--
Norman
Registered Linux user #461062
-Have you been to www.mysql.com yet?-

Jason C

unread,
Sep 14, 2013, 5:24:05 PM9/14/13
to
On Saturday, September 14, 2013 7:46:04 AM UTC-4, Lennart Jonsson wrote:
> Is index used for each of the branches?:

I'm not sure what you mean here, Lennart? But no, it's not using an index for classifieds_img, unless I remove the "OR... IS NULL" from the query.


> LEFT JOIN classifieds_img -- can be replaced with inner join
> AND classifieds_which.which='varA'
> AND classifieds_img.sorter IS NULL

Unfortunately, using INNER JOIN doesn't work for this query, because I need to return rows when there's not a corresponding ID in classifieds_img (which is why I'm checking for 0 OR IS NULL; IS NULL referring to there not being a matching row).


> AND coalesce(classifieds_img.sorter,0) = 0

I'd never seen "coalesce" before, so thanks for that! But unfortunately, it doesn't replace the IS NULL part of the query, so it had no impact on the speed.

Jason C

unread,
Sep 14, 2013, 5:26:21 PM9/14/13
to
On Saturday, September 14, 2013 9:30:51 AM UTC-4, Norman Peelman wrote:

> > ORDER BY id DESC
>
> You should use full dot notation for each column, I'm assuming
> 'title', 'category' and 'subcat' are all coming from 'classifieds'.

That is correct; everything that I didn't use a table name with is coming from 'classifieds'.


> Are
> you sure this is the query, your ORDER BY clause is using a non-unique
> column name - use full dot notation.

Yes, it really is the query, verbatim (except that I don't literally search for "varA", of course). It's funny that you mention the ORDER BY, though, because I was looking at that this morning after posting, and wondering why it didn't throw an error.

I'm guessing that it's unique in the results, and that's all that it's looking at?

Either way, changing it to ORDER BY classifieds.id didn't impact the query speed.


> What is 'classifieds_which'?

It's hard to explain the purpose of the table (it's fairly unique for the site), but it has 3 columns:

id
which
expiration

"ID" corresponds to 'classifieds', and "which" is a varchar(20). "expiration" matches the "expiration" in classifieds, but is only used when cleaning up old data so it's not really relevant for this query.

The index it uses is a UNIQUE for "id" and "which".


> Hmm, you say .sorter can be 0-9 allowing for 10 rows (10 images).
> .sorter = 0 allows a row to exist but you treat it same as IS NULL???
> Your 'AND ( .sorter=0 OR .sorter IS NULL ) clause says "match 1 OR 0
> rows". Don't you want just IS NULL?

No, that's not the intent (and if that's what it's doing, then my query is wrong).

This part of the query:

AND (
classifieds_img.sorter = 0
OR classifieds_img.sorter IS NULL
)

is supposed to return a result if there's a corresponding ".sorter = 0", OR if there isn't a matching row at all (in which case .sorter would be NULL).

It's the "OR... IS NULL" that's slowing everything down, for sure. If I remove it, then the query is fast, but doesn't return all of the rows that I need.

For clarification, this is obviously searching through classified ads, which may have up to 10 pictures... but might not have any at all. I'm trying to fetch all of the current ads, and if they have a picture then, for this query, I only want to see the first one (in which case, sorter = 0).

In the end, my results might look like:

12345
Title 1
12345_0.jpg # the image where sorter=0

12346
Title 2
NULL # sorter IS NULL

12347
Title 3
12347_0.jpg

and so on.

Is there a more appropriate way to write this part of the query, rather than using .sorter IS NULL?

Norman Peelman

unread,
Sep 14, 2013, 11:13:24 PM9/14/13
to
On 09/14/2013 05:26 PM, Jason C wrote:
> On Saturday, September 14, 2013 9:30:51 AM UTC-4, Norman Peelman wrote:
>
>>> ORDER BY id DESC
>>
>> You should use full dot notation for each column, I'm assuming
>> 'title', 'category' and 'subcat' are all coming from 'classifieds'.
>
> That is correct; everything that I didn't use a table name with is coming from 'classifieds'.
>
>
>> Are
>> you sure this is the query, your ORDER BY clause is using a non-unique
>> column name - use full dot notation.
>
> Yes, it really is the query, verbatim (except that I don't literally search for "varA", of course). It's funny that you mention the ORDER BY, though, because I was looking at that this morning after posting, and wondering why it didn't throw an error.
>
> I'm guessing that it's unique in the results, and that's all that it's looking at?
>
> Either way, changing it to ORDER BY classifieds.id didn't impact the query speed.
>

Didn't think it would but it's better for *you* to be sure what
you're ordering by...
I thought maybe rewriting the query would use different indexes:

SELECT classifieds.id, classifieds.title, classifieds_img.image
FROM classifieds
LEFT JOIN classifieds_img ON classifieds.id = classifieds_img.id
LEFT JOIN classifieds_which ON classifieds.id = classifieds_which.id
WHERE classifieds.category = 'Cars'
AND classifieds.subcat = 'Ford'
AND classifieds_which.which = 'varA'
AND (classifieds_img.sorter = 0
OR classifieds_img.sorter IS NULL)
ORDER BY classifieds.id DESC

I only created 5 rows of test data.

Lennart Jonsson

unread,
Sep 15, 2013, 3:30:42 AM9/15/13
to
On 09/14/2013 11:24 PM, Jason C wrote:
> On Saturday, September 14, 2013 7:46:04 AM UTC-4, Lennart Jonsson wrote:
>> Is index used for each of the branches?:
>
> I'm not sure what you mean here, Lennart? But no, it's not using an index for classifieds_img, unless I remove the "OR... IS NULL" from the query.
>

What I meant was whether it used the index for the two queries a) and b)
individually. The union of these are equivalent with your query.

>
>> LEFT JOIN classifieds_img -- can be replaced with inner join
>> AND classifieds_which.which='varA'
>> AND classifieds_img.sorter IS NULL
>

Not sure how you managed to quote this from my post, I wrote:

LEFT JOIN classifieds_img -- can be replaced with inner join
ON classifieds_img.id = classifieds.id
WHERE category= 'Cars'
AND subcat= 'Ford'
AND classifieds_which.which='varA'
AND classifieds_img.sorter = 0

[...]

>
>> AND coalesce(classifieds_img.sorter,0) = 0
>
> I'd never seen "coalesce" before, so thanks for that! But unfortunately, it doesn't replace the IS NULL part of the query, so it had no impact on the speed.
>

Not sure what you mean, the expression:

coalesce(classifieds_img.sorter,0) = 0

is equivalent with:

classifieds_img.sorter = 0 OR classifieds_img.sorter IS NULL

so it does replace the IS NULL part of the query. If you mean that it
does not help, sure, that could be the case.


/Lennart


Jason C

unread,
Sep 15, 2013, 4:58:46 AM9/15/13
to
On Sunday, September 15, 2013 3:30:42 AM UTC-4, Lennart Jonsson wrote:
> What I meant was whether it used the index for the two queries a) and b)
> individually. The union of these are equivalent with your query.

Ahh, I see now, sorry about that.

If I run the query without "OR classifieds_img.sorter IS NULL", then it does use the index with all 3 columns in it. For that, the query speed is 0.0952s.

If I run the query without "classifieds_img.sorter = 0", then it does not use an index. The last query like this took 0.5897s.

So, it look like "classifieds_img.sorter IS NULL" is definitely the bottleneck.


> Not sure how you managed to quote this from my post, I wrote:
>
> LEFT JOIN classifieds_img -- can be replaced with inner join
> ON classifieds_img.id = classifieds.id
> WHERE category= 'Cars'
> AND subcat= 'Ford'
> AND classifieds_which.which='varA'
> AND classifieds_img.sorter = 0

Sorry, I just completely misunderstood your post before.

But in reference to your note, I don't think that I can use INNER JOIN, since I need to return rows with or without a corresponding ID.


> Not sure what you mean, the expression:
>
> coalesce(classifieds_img.sorter,0) = 0
>
> is equivalent with:
>
> classifieds_img.sorter = 0 OR classifieds_img.sorter IS NULL
>
> so it does replace the IS NULL part of the query. If you mean that it
> does not help, sure, that could be the case.

I must have screwed up when I tried it before, but you're right, it does work to replace both. My mistake (again).

However, the query time was essentially the same (0.5537s), and it still didn't use an index.

Thanks again for the suggestions, though! And again, I'm sorry for misunderstanding before.

Jason C

unread,
Sep 15, 2013, 5:16:25 AM9/15/13
to
On Saturday, September 14, 2013 11:13:24 PM UTC-4, Norman Peelman wrote:
> Didn't think it would but it's better for *you* to be sure what
> you're ordering by...

I gotcha, and thanks for the suggestion. In the end, these queries are ran through PHP, so it's always been my habit to use the least amount of characters possible (I figure that I might save a few milliseconds in opening the page). It's probably not the smartest habit, though.


> I thought maybe rewriting the query would use different indexes:
>
> SELECT classifieds.id, classifieds.title, classifieds_img.image
> FROM classifieds
> LEFT JOIN classifieds_img ON classifieds.id = classifieds_img.id
> LEFT JOIN classifieds_which ON classifieds.id = classifieds_which.id
> WHERE classifieds.category = 'Cars'
> AND classifieds.subcat = 'Ford'
> AND classifieds_which.which = 'varA'
> AND (classifieds_img.sorter = 0
> OR classifieds_img.sorter IS NULL)
> ORDER BY classifieds.id DESC
>
> I only created 5 rows of test data.

Good thought, but that was actually a little bit slower! 0.7042s on the first run.

I'm thinking that it might just be smarter to add a column to the 'classifieds' table for "image", and copying the data for each sorter=0 to that table; duplicating data, but eliminating the need for the second JOIN. That would be a bit harder to maintain (especially when users modify their data and change the pictures), but it might be the best solution.

This is my first time coming across something that would be better using duplicated data, though, and I'd rather NOT do that, but it might be the solution for the fastest query.

J.O. Aho

unread,
Sep 15, 2013, 6:19:39 AM9/15/13
to
You could try a UNION

SELECT classifieds.id, classifieds.title, classifieds_img.image
FROM classifieds
LEFT JOIN classifieds_img ON classifieds.id = classifieds_img.id
LEFT JOIN classifieds_which ON classifieds.id = classifieds_which.id
WHERE classifieds.category = 'Cars'
AND classifieds.subcat = 'Ford'
AND classifieds_which.which = 'varA'
AND classifieds_img.sorter = 0
UNION
SELECT classifieds.id, classifieds.title, classifieds_img.image
FROM classifieds
LEFT JOIN classifieds_img ON classifieds.id = classifieds_img.id
LEFT JOIN classifieds_which ON classifieds.id = classifieds_which.id
WHERE classifieds.category = 'Cars'
AND classifieds.subcat = 'Ford'
AND classifieds_which.which = 'varA'
AND classifieds_img.sorter IS NULL

OR is quite slow and tend to not utilize index, so making the queries
together can be a lot faster.

At work we had a query with OR and IN(), the IN() solved with having a
number of "q != 1 AND q != 2 ..." and the OR we solved with UNION, now
the query which used to take up to 6 minutes runs on 40 seconds (bigger
tables with nearly 20 million rows in three of the joined tables, and we
have an overhead with 10 second which is populating a temp table from a
huge table only those rows we want to join in from it).

I think it's worth for you to try.

--

//Aho

Lennart Jonsson

unread,
Sep 15, 2013, 7:52:51 AM9/15/13
to
On 09/15/2013 10:58 AM, Jason C wrote:
[...]
>
> Ahh, I see now, sorry about that.
>
> If I run the query without "OR classifieds_img.sorter IS NULL", then it does use the index with all 3 columns in it. For that, the query speed is 0.0952s.
>
> If I run the query without "classifieds_img.sorter = 0", then it does not use an index. The last query like this took 0.5897s.
>
> So, it look like "classifieds_img.sorter IS NULL" is definitely the bottleneck.
>

You could try:

SELECT classifieds.id, title, classifieds_img.image
FROM classifieds
JOIN classifieds_which
ON classifieds_which.id = classifieds.id
WHERE category= 'Cars'
AND subcat= 'Ford'
AND classifieds_which.which='varA'
AND not exists (
select 1 from classifieds_img
where classifieds_img.id = classifieds.id
)
ORDER BY id DESC


if that improves the situation you might want to try:


SELECT classifieds.id, title, classifieds_img.image
FROM classifieds
JOIN classifieds_which
ON classifieds_which.id = classifieds.id
JOIN classifieds_img
ON classifieds_img.id = classifieds.id
WHERE category= 'Cars'
AND subcat= 'Ford'
AND classifieds_which.which='varA'
AND classifieds_img.sorter = 0
union
SELECT classifieds.id, title, classifieds_img.image
FROM classifieds
JOIN classifieds_which
ON classifieds_which.id = classifieds.id
WHERE category= 'Cars'
AND subcat= 'Ford'
AND classifieds_which.which='varA'
AND classifieds_img.sorter IS NULL
AND not exists (
select 1 from classifieds_img
where classifieds_img.id = classifieds.id

Norman Peelman

unread,
Sep 15, 2013, 8:40:10 AM9/15/13
to
Well, I can get 'classifieds_img to see an index:

possible keys = PRIMARY,img_index
key = PRIMARY

by merging the 'classifieds_which' table into 'classifieds', but it's
still not using it. I can hint the index by:

SELECT classifieds.id, classifieds.title, classifieds_img.image
FROM classifieds
LEFT JOIN classifieds_img
USE INDEX (img_index)
ON classifieds_img.id = classifieds.id
WHERE classifieds.category= 'Cars'
AND classifieds.subcat= 'Ford'
AND classifieds.which='varA'
AND (classifieds_img.sorter = 0
OR classifieds_img.sorter IS NULL)
ORDER BY id DESC

possible keys = img_index
key = img_index


I even tried making 'classifieds_img.id' and 'classifieds_img.sorter'
a compound PRIMARY KEY (making .sorter = 0 for no img, although it would
still require a row):

SELECT classifieds.id, classifieds.title, classifieds_img.image
FROM classifieds
LEFT JOIN classifieds_img
ON classifieds_img.id = classifieds.id
WHERE classifieds.category= 'Cars'
AND classifieds.subcat= 'Ford'
AND classifieds.which='varA'
AND classifieds_img.sorter > 0
ORDER BY id DESC

possible keys = PRIMARY
key = PRIMARY

...but I still can't notice much of a time difference. It could be my
small dataset though, dunno.

Norman Peelman

unread,
Sep 15, 2013, 8:53:14 AM9/15/13
to
On 09/15/2013 05:16 AM, Jason C wrote:
You may do it that way, but it doesn't have to be a duplication of
data. Since 'classifieds.image' will be the *default* image (0), any
rows of 'classifieds_img.sorter' (1-9) will be additional. I still don't
think it's the right way though... I think a question needs to be
asked... How often are you running this type of query? If not often then
maybe a 1/2 second is worth not adding the complexity/duplication.

Jason C

unread,
Sep 15, 2013, 6:06:32 PM9/15/13
to
On Sunday, September 15, 2013 7:52:51 AM UTC-4, Lennart Jonsson wrote:
> You could try:
>
> SELECT classifieds.id, title, classifieds_img.image
> FROM classifieds
> JOIN classifieds_which
> ON classifieds_which.id = classifieds.id
> WHERE category= 'Cars'
> AND subcat= 'Ford'
> AND classifieds_which.which='varA'
> AND not exists (
> select 1 from classifieds_img
> where classifieds_img.id = classifieds.id
> )
> ORDER BY id DESC

Running this query, without a JOIN for classifieds_img, just gave me an error that classifieds_img.image is unknown.

When I added a LEFT JOIN classifieds_img, it works, but ran in 0.9969s. It does use an index on all 4 sections, though, so I don't know why it's that much slower?
In order to get this to work without errors, I had to change the JOIN classifieds_img in the first section to a LEFT JOIN, then had to add the LEFT JOIN to the UNION section:

SELECT classifieds.id, title, classifieds_img.image
FROM classifieds
JOIN classifieds_which
ON classifieds_which.id = classifieds.id
LEFT JOIN classifieds_img
ON classifieds_img.id = classifieds.id
WHERE category= 'Cars'
AND subcat= 'Ford'
AND classifieds_which.which='varA'
AND classifieds_img.sorter = 0
UNION
SELECT classifieds.id, title, classifieds_img.image
FROM classifieds
JOIN classifieds_which
ON classifieds_which.id = classifieds.id
LEFT JOIN classifieds_img
ON classifieds_img.id = classifieds.id
WHERE category= 'Cars'
AND subcat= 'Ford'
AND classifieds_which.which='varA'
AND classifieds_img.sorter IS NULL
AND NOT EXISTS (
SELECT 1 FROM classifieds_img
WHERE classifieds_img.id = classifieds.id
)
ORDER BY id DESC

This did give the right results, but the query time on this one was 0.8694s, so it's still slightly slower than the original :-(

Using EXPLAIN, it shows that I use an index everywhere except for the LEFT JOIN after the UNION (the one that I had to add). Without it, though, I get the error:

Unknown column 'classifieds_img.image' in 'field list'

Jason C

unread,
Sep 15, 2013, 6:13:35 PM9/15/13
to
On Sunday, September 15, 2013 6:19:39 AM UTC-4, J.O. Aho wrote:
> You could try a UNION
>
> SELECT classifieds.id, classifieds.title, classifieds_img.image
> FROM classifieds
> LEFT JOIN classifieds_img ON classifieds.id = classifieds_img.id
> LEFT JOIN classifieds_which ON classifieds.id = classifieds_which.id
> WHERE classifieds.category = 'Cars'
> AND classifieds.subcat = 'Ford'
> AND classifieds_which.which = 'varA'
> AND classifieds_img.sorter = 0
> UNION
> SELECT classifieds.id, classifieds.title, classifieds_img.image
> FROM classifieds
> LEFT JOIN classifieds_img ON classifieds.id = classifieds_img.id
> LEFT JOIN classifieds_which ON classifieds.id = classifieds_which.id
> WHERE classifieds.category = 'Cars'
> AND classifieds.subcat = 'Ford'
> AND classifieds_which.which = 'varA'
> AND classifieds_img.sorter IS NULL
>
> OR is quite slow and tend to not utilize index, so making the queries
> together can be a lot faster.

I'm really not familiar with the UNION concept (I'd seen it before, but never really looked in to it), so I think you're on the right track. This query does give me the right results, but still runs in 0.5413s, though.

Which, for me, is very slow! These tables are growing very fast (especially classifieds_img, which can grow at a rate of 50,000 rows a day).

Using EXPLAIN, I show that it using an index on classifieds_img on the first query, but not on the second one. So, it still looks like the IS NULL is the bottleneck?

Jason C

unread,
Sep 15, 2013, 6:29:13 PM9/15/13
to
Hold on a sec! I had a little bit of improvement. Aho's post made me realize a minor error in this one, and then a little research helped me think of something else:

SELECT classifieds.id, title, classifieds_img.image
FROM classifieds
JOIN classifieds_which
ON classifieds_which.id = classifieds.id
LEFT JOIN classifieds_img
ON classifieds_img.id = classifieds.id
WHERE category= 'Cars'
AND subcat= 'Ford'
AND classifieds_which.which='varA'
AND classifieds_img.sorter = 0
UNION

## I'm not expecting to return an image here, so
## replace classifieds.img.image with NULL
SELECT classifieds.id, title, NULL
FROM classifieds
JOIN classifieds_which
ON classifieds_which.id = classifieds.id

## if I'm not returning an image, I don't need this
# LEFT JOIN classifieds_img
# ON classifieds_img.id = classifieds.id
WHERE category= 'Cars'
AND subcat= 'Ford'
AND classifieds_which.which='varA'

## I don't think this line is necessary if we're using
## the AND NOT EXISTS, too
# AND classifieds_img.sorter IS NULL
AND NOT EXISTS (
SELECT 1 FROM classifieds_img
WHERE classifieds_img.id = classifieds.id

## Can also add this to make it a hair faster
LIMIT 1
)
ORDER BY id DESC

By making those changes, I do get the right results, and it looks like an index is used on all sections except for UNION RESULT.

But, the query speed is even slower, at 0.9342s! So that blows, but since it's now using an index, I think we're on the right track. Maybe?

Norman Peelman

unread,
Sep 15, 2013, 6:53:58 PM9/15/13
to
Have you tried flushing you cache? With this query:

SELECT classifieds.id, classifieds.title, classifieds_img.image
FROM classifieds
LEFT JOIN classifieds_img ON classifieds.id = classifieds_img.id
LEFT JOIN classifieds_which ON classifieds.id = classifieds_which.id
WHERE classifieds.category = 'Cars'
AND classifieds.subcat = 'Ford'
AND classifieds_which.which = 'varA'
AND (classifieds_img.sorter = 0
OR classifieds_img.sorter IS NULL)
ORDER BY classifieds.id DESC

... I now am getting times in the sub .013 - .020 range. I had shut down
MySQL Navigator and came back to it. In the mysql client in a text
console (linux) it consistently says 0.00 sec.

Jason C

unread,
Sep 15, 2013, 6:57:43 PM9/15/13
to
On Sunday, September 15, 2013 8:53:14 AM UTC-4, Norman Peelman wrote:

> You may do it that way, but it doesn't have to be a duplication of
> data. Since 'classifieds.image' will be the *default* image (0), any
> rows of 'classifieds_img.sorter' (1-9) will be additional.

Yeah, I realized that, too. But it's going to be a pain to write the scripts for it (especially when the user modifies their images; every modification will have to update two tables).

And, modifying the main 'classifieds' table will require a change in numerous scripts, so that's also a bit unappealing.


> I still don't
> think it's the right way though... I think a question needs to be
> asked... How often are you running this type of query? If not often then
> maybe a 1/2 second is worth not adding the complexity/duplication.

Good question, but it's a little harder to answer accurately. This database is used on several websites that includes classifieds, and on the main site, this query would be ran on an average of about 48,000 pages per day (based on traffic for the last 30 days). It would take awhile to figure out how much traffic comes from the other (lower traffic) sites, but I'd guess it to be in the range of 15,000 /day.

But, of course, there's a difference between peak time and off-peak time, and I'm not sure how many of those 48,000-63,000 queries would be during peak time. For the sake of math, let's say that 40,000 queries are ran during a 12 hour period, then that comes down to just under one query per second.

So, I don't think that the 1/2 second difference is negligible; especially as the site continues to grow. And, in the future I would like to be able to implement the classifieds in AJAX scripts, etc, so I expect more queries, even if the traffic stays the same.

Jason C

unread,
Sep 15, 2013, 7:07:20 PM9/15/13
to
On Sunday, September 15, 2013 6:53:58 PM UTC-4, Norman Peelman wrote:

> Have you tried flushing you cache? With this query:
>
>
>
> SELECT classifieds.id, classifieds.title, classifieds_img.image
>
> FROM classifieds
>
> LEFT JOIN classifieds_img ON classifieds.id = classifieds_img.id
>
> LEFT JOIN classifieds_which ON classifieds.id = classifieds_which.id
>
> WHERE classifieds.category = 'Cars'
>
> AND classifieds.subcat = 'Ford'
>
> AND classifieds_which.which = 'varA'
>
> AND (classifieds_img.sorter = 0
>
> OR classifieds_img.sorter IS NULL)
>
> ORDER BY classifieds.id DESC
>
>
>
> ... I now am getting times in the sub .013 - .020 range. I had shut down
>
> MySQL Navigator and came back to it. In the mysql client in a text
>
> console (linux) it consistently says 0.00 sec.

I haven't flushed, per se, but I've been changing the value of "varA" regularly, which (I think?) keeps it from loading from cache. I'm also using phpMyAdmin for testing.

I just ran the same query, using "20130915" as "varA", and it returned in 0.5452s, and no index on the classifieds_img.

I'm guessing that your database used an index? If so, how did you set up the index?

Jason C

unread,
Sep 15, 2013, 8:04:26 PM9/15/13
to
Sorry to post several in a row like this, but I know you guys are really trying to help, and I didn't want you to waste any time if I'm stumbling over something.

Using the above query, I broke it down in to 3 sections, and timed each of them. So, I have this before the UNION:

SELECT classifieds.id, title, classifieds_img.image
FROM classifieds
JOIN classifieds_which
ON classifieds_which.id = classifieds.id
LEFT JOIN classifieds_img
ON classifieds_img.id = classifieds.id
WHERE category= 'Cars'
AND subcat= 'Ford'
AND classifieds_which.which='varA'
AND classifieds_img.sorter = 0
ORDER BY classifieds.id DESC

This returns in 0.0946s.

Then this after the UNION, but without the subquery:

SELECT classifieds.id, title, NULL
FROM classifieds
JOIN classifieds_which
ON classifieds_which.id = classifieds.id
WHERE category= 'Cars'
AND subcat= 'Ford'
AND classifieds_which.which='varA'
ORDER BY classifieds.id DESC

This returns in 0.0085s.

Then, the subquery by itself:

SELECT 1 FROM classifieds_img
WHERE classifieds_img.id = 123456
LIMIT 1

This returns in 0.0080s.

Now, the section after the UNION, but with the subquery:

SELECT classifieds.id, title, NULL
FROM classifieds
JOIN classifieds_which
ON classifieds_which.id = classifieds.id
WHERE category= 'Cars'
AND subcat= 'Ford'
AND classifieds_which.which='varA'
AND NOT EXISTS (
SELECT 1 FROM classifieds_img
WHERE classifieds_img.id = classifieds.id
LIMIT 1
)
ORDER BY classifieds.id DESC

This, for whatever reason, returns in 0.7333s. You would think that if one runs in 0.0085, and the other in 0.0080, then combined they would run in 0.0165! The only difference, I guess, is the NOT EXISTS?

Using EXPLAIN, this last query shows a key on classifieds_img, but no ref, so I guess that means it's not using an index.

Norman Peelman

unread,
Sep 15, 2013, 9:59:28 PM9/15/13
to
'classifieds'
type = eq_ref
possible keys = PRIMARY
key = PRIMARY
ref = classifieds_which.id
using = where

'classifieds_which'
type = index
possible keys = PRIMARY,which_index
key = which_index
ref = NULL
using = where; Using index; Using temporary; Using filesort
which_index = INDEX(id,which)

'classifieds_img'
type = ref
possible keys = PRIMARY
key = PRIMARY
ref = classifieds.id
using = where

Jason C

unread,
Sep 15, 2013, 10:28:45 PM9/15/13
to
On Sunday, September 15, 2013 9:59:28 PM UTC-4, Norman Peelman wrote:

> 'classifieds'
>
> type = eq_ref
>
> possible keys = PRIMARY
>
> key = PRIMARY
>
> ref = classifieds_which.id
>
> using = where
>
>
>
> 'classifieds_which'
>
> type = index
>
> possible keys = PRIMARY,which_index
>
> key = which_index
>
> ref = NULL
>
> using = where; Using index; Using temporary; Using filesort
>
> which_index = INDEX(id,which)
>
>
>
> 'classifieds_img'
>
> type = ref
>
> possible keys = PRIMARY
>
> key = PRIMARY
>
> ref = classifieds.id
>
> using = where

Shoot, man, I'm lost then. I'm using the exact same query, and set up a PRIMARY on id and sorter, and it lists it as a possible key but that's all.

EXPLAIN
SELECT classifieds.id, title, classifieds_img.image
FROM classifieds
LEFT JOIN classifieds_img
ON classifieds_img.id = classifieds.id
LEFT JOIN classifieds_which
ON classifieds_which.id = classifieds.id
WHERE category= 'Cars'
AND subcat= 'Ford'
AND classifieds_which.which='varA'
AND (
classifieds_img.sorter = 0
OR classifieds_img.sorter IS NULL
)
ORDER BY classifieds.id DESC

The EXPLAIN says that 'classifieds' and 'classifieds_which' are using an index, but not 'classifieds_img':

table: classifieds_img
type: ALL
possible: PRIMARY,Unique,ID,View
key: NULL
key_len: NULL
ref: NULL
rows: 25668
Extra: Range checked for each record (index map: 0x37)

If I use the query you posted before (using classifieds_img.sorter > 0 instead of checking for IS NULL), though, then it does run fast and uses an index. That requires a total revamp of the system, though, so it's a possibility, but not my first choice.

Norman Peelman

unread,
Sep 15, 2013, 11:32:14 PM9/15/13
to
CREATE TABLE `test2`.`classifieds` (
`id` smallint(5) unsigned NOT NULL AUTO_INCREMENT,
`title` char(255) DEFAULT NULL,
`category` char(255) DEFAULT NULL,
`subcat` char(255) DEFAULT NULL,
`which` varchar(20) DEFAULT NULL,
`expiry` date DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=6 DEFAULT CHARSET=latin1

CREATE TABLE `test2`.`classifieds_img` (
`id` smallint(5) unsigned NOT NULL,
`image` char(255) DEFAULT NULL,
`sorter` tinyint(3) unsigned NOT NULL DEFAULT '0',
PRIMARY KEY (`id`,`sorter`) USING BTREE
) ENGINE=MyISAM DEFAULT CHARSET=latin1

CREATE TABLE `test2`.`classifieds_which` (
`id` smallint(5) unsigned NOT NULL,
`which` varchar(20) DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `which_index` (`id`,`which`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1

Ignore the .which and .expiry in 'classifieds', not using them for
this query.

"id","title" ,"image"
5 ,"Title5" ,"img1"
4 ,"Title4" ,"img1"
3 ,"Title3" ,""
2 ,"Title2" ,"img1"
1 ,"Title1" ,"img1"


note = row 3 was deleted to provide the NULL for the JOIN to find.

I'm wondering if you are seeing the times skewed by phpMyAdmin
itself, by whatever overhead it may generate? Here on my local dev
system, MySQL Query Browser is telling me the times are avg 0:00.0200
and gets worse (rises and fluctuates .08 - .1?? + high spikes to .8??)
the more I submit the request. MySQL on the command line on the other
hand consistently returns the result set immediately and reports 0.00
seconds.

Jason C

unread,
Sep 16, 2013, 12:13:33 AM9/16/13
to
Here's the SHOW CREATE TABLE for my 'classifieds_img':

CREATE TABLE `classifieds_img` (
`id` varchar(15) NOT NULL,
`sorter` int(1) NOT NULL DEFAULT '0',
`image` varchar(20) NOT NULL,
PRIMARY KEY (`id`,`sorter`),
UNIQUE KEY `Unique` (`id`,`image`),
KEY `sorter` (`sorter`,`image`),
KEY `ID` (`id`),
KEY `View` (`id`,`sorter`,`image`),
KEY `image` (`image`),
KEY `sorter_2` (`sorter`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1

All of those indices were ones created for the testing here. There are minor differences (I'm using `id` varchar(15), where you're using smallint(5), and you have `image` first where I have it last), but essentially the same.

And the SHOW CREATE TABLE for my 'classifieds':

CREATE TABLE `classifieds` (
`id` mediumint(9) NOT NULL AUTO_INCREMENT,
`username` varchar(50) NOT NULL DEFAULT '',
`category` varchar(50) NOT NULL,
`subcat` varchar(50) DEFAULT NULL,
`contact` varchar(50) DEFAULT NULL,
`email` varchar(100) DEFAULT NULL,
`tele` varchar(50) DEFAULT NULL,
`vehicle_type` varchar(30) DEFAULT NULL,
`seller_type` varchar(20) DEFAULT NULL,
`make` varchar(100) DEFAULT NULL,
`model` varchar(100) DEFAULT NULL,
`year` varchar(4) DEFAULT NULL,
`mileage` varchar(10) DEFAULT NULL,
`price` varchar(30) DEFAULT NULL,
`postdate` varchar(14) NOT NULL,
`expiration` varchar(14) NOT NULL DEFAULT '0',
`title` varchar(50) DEFAULT NULL,
`description` text NOT NULL,
`status` varchar(10) NOT NULL,
`startdate` varchar(14) DEFAULT NULL
PRIMARY KEY (`id`),
KEY `username` (`username`,`status`),
KEY `category` (`category`,`subcat`,`expiration`),
KEY `list` (`category`,`subcat`)
) ENGINE=MyISAM AUTO_INCREMENT=609840 DEFAULT CHARSET=latin1

For testing, I removed classifieds_which completely from the query, and it's still just as slow so I didn't bother showing that one.

There are currently 41,669 rows in 'classifieds', and 25,682 rows in 'classifieds_img' (where all but 5 of them have sorter = 0; I haven't implemented the ability to upload multiple pictures yet, so all ads with an image will just have 1).

Running the query from Putty still took 0.57s, so I don't think that PMA is the problem. If it's running fast for you, then can it be something to do with the column types that I'm using??

J.O. Aho

unread,
Sep 16, 2013, 12:47:52 AM9/16/13
to
As there are no classifieds_img, there is no index value either, so you
will always have a longer time for the second sql. I should have thought
about that earlier.

If you use a flag telling there is no image or add an row which
symbolises a missing image in classifieds_img, then the query would be
faster.

--
//Aho

J.O. Aho

unread,
Sep 16, 2013, 1:48:03 AM9/16/13
to
On 16/09/13 00:13, Jason C wrote:
As I haven't had the time to mock your database and anyway I wouldn't
have the same amount of data, so not sure if this would help you, but
remember from a site with around twice your traffic (as you explained it
in this thread), we used memcashed, which I think can be good for you
to, but of course there are times when you do the initial SQL query and
we had some issues with a complicated join, we used HAVING which did
improve the query.

SELECT classifieds.id, classifieds.title, classifieds_img.image
FROM classifieds
LEFT JOIN classifieds_img ON classifieds.id = classifieds_img.id
HAVING classifieds_img.sorter IS NULL
LEFT JOIN classifieds_which ON classifieds.id = classifieds_which.id
WHERE classifieds.category = 'Cars'
AND classifieds.subcat = 'Ford'
AND classifieds_which.which = 'varA'

In this specific case I don't think it will have any difference, but
could be worth trying out.

In this specific case, I think a graph database had been more optimal.


In case you haven't used memcached before, the idea is to store the
result set from a query in the memcached and your "query" functionality
will always first look in memcached for the result set, if missing, then
you connect to the database and make the full query (don't forget to
store the result set in the memcached).
It's important to destroy the cache each time you do update/insert/delete.
I know, this is not a five minutes fix for your sites, it will take some
time to build, but could be an option if you feel like your queries
would be too heavy for the database and you don't get the funds for a
new database server with loads of CPU/RAM/SSD.

--

//Aho

Lennart Jonsson

unread,
Sep 16, 2013, 2:58:37 AM9/16/13
to
On 09/16/2013 12:06 AM, Jason C wrote:
> On Sunday, September 15, 2013 7:52:51 AM UTC-4, Lennart Jonsson wrote:
>> You could try:
>>
>> SELECT classifieds.id, title, classifieds_img.image
>> FROM classifieds
>> JOIN classifieds_which
>> ON classifieds_which.id = classifieds.id
>> WHERE category= 'Cars'
>> AND subcat= 'Ford'
>> AND classifieds_which.which='varA'
>> AND not exists (
>> select 1 from classifieds_img
>> where classifieds_img.id = classifieds.id
>> )
>> ORDER BY id DESC
>
> Running this query, without a JOIN for classifieds_img, just gave me an error that classifieds_img.image is unknown.
>


Sorry about that, the query should read:

SELECT classifieds.id, title, null
-- you might have to cast null to
-- whatever type classifieds_img.image is
FROM classifieds
JOIN classifieds_which
ON classifieds_which.id = classifieds.id
WHERE category= 'Cars'
AND subcat= 'Ford'
AND classifieds_which.which='varA'
AND not exists (
select 1 from classifieds_img
where classifieds_img.id = classifieds.id
)
ORDER BY id DESC

[...]




Norman Peelman

unread,
Sep 16, 2013, 6:21:40 AM9/16/13
to
HAVING is like another WHERE clause anyway, I tried it here earlier -
no difference.

> In this specific case, I think a graph database had been more optimal.
>
>
> In case you haven't used memcached before, the idea is to store the
> result set from a query in the memcached and your "query" functionality
> will always first look in memcached for the result set, if missing, then
> you connect to the database and make the full query (don't forget to
> store the result set in the memcached).
> It's important to destroy the cache each time you do update/insert/delete.
> I know, this is not a five minutes fix for your sites, it will take some
> time to build, but could be an option if you feel like your queries
> would be too heavy for the database and you don't get the funds for a
> new database server with loads of CPU/RAM/SSD.
>

If his database is/will be growing as fast as he says, I doubt that
result set would ever be satisfied by a cache anyway. Tweaking MYSQLs
caches would do the same thing.

Norman Peelman

unread,
Sep 16, 2013, 6:30:31 AM9/16/13
to
Not quit the same I'm thinking as 'classifieds.id' and
'classifieds_img.id' are different types... there has to be some type
conversion taking place when they are being compared. Why VARCHAR(15)
for 'classifieds_img'?

Luuk

unread,
Sep 16, 2013, 2:14:44 PM9/16/13
to
On 16-09-2013 02:04, Jason C wrote:
> SELECT classifieds.id, title, NULL
> FROM classifieds
> JOIN classifieds_which
> ON classifieds_which.id = classifieds.id
> WHERE category= 'Cars'
> AND subcat= 'Ford'
> AND classifieds_which.which='varA'
> AND NOT EXISTS (
> SELECT 1 FROM classifieds_img
> WHERE classifieds_img.id = classifieds.id
> LIMIT 1
> )
> ORDER BY classifieds.id DESC

This query should give the same results as:

SELECT classifieds.id, title, NULL
FROM classifieds
JOIN classifieds_which
ON classifieds_which.id = classifieds.id
LEFT JOIN classifieds_img
ON classifieds_img.id = classifieds.id
WHERE category= 'Cars'
AND subcat= 'Ford'
AND classifieds_which.which='varA'
AND classifieds_img.id IS NULL
ORDER BY classifieds.id DESC

In the first query, you are 'forcing' MySQL to execute the subquery for
every record in the main query.

In the second query MySQL should have a chance to make a smart
alternative choice. (if one exists)


Jason C

unread,
Sep 16, 2013, 5:03:45 PM9/16/13
to
On Monday, September 16, 2013 6:30:31 AM UTC-4, Norman Peelman wrote:

> Not quit the same I'm thinking as 'classifieds.id' and
>
> 'classifieds_img.id' are different types... there has to be some type
>
> conversion taking place when they are being compared. Why VARCHAR(15)
>
> for 'classifieds_img'?

Holy crap, man, I think you got it!!! I changed it from varchar(20) to mediumint(9) to match 'classifieds' and 'classifieds_which', then on the first run the query speed was 0.0093s!

I have chalk it up to just plain laziness on my part when I created 'classifieds_img', not realizing the significance. I didn't think to double check how it was created in the original 'classifieds', so I just plugged in a number that I knew would fit.

Wow, I definitely know better now! I've been plugging at this for 2 weeks, so this is a lesson I won't forget!

Thanks for all of the help, all of you. I really appreciate the time you guys spent helping me find what turned out to be a simple mistake, but I learned a lot of different techniques in the meanwhile.

Norman Peelman

unread,
Sep 16, 2013, 10:35:40 PM9/16/13
to
Good, I hope that works for you as your database grows. Let us know! ;)
0 new messages