Avoiding a full table scan

laredotornado

unread,

Jan 19, 2011, 12:28:27 PM1/19/11

to

Hi,

I'm using MySQL 5. I'm writing a simple app that finds the nearest
store within a certain radius. So I'm using the query

SELECT ((ACOS(SIN($lat * PI() / 180) * SIN(lat * PI() / 180) +
COS($lat * PI() / 180) * COS(lat * PI() / 180) * COS(($lon – lon) *
PI() / 180)) * 180 / PI()) * 60 * 1.1515) AS `distance` FROM `stores`
HAVING `distance`<=’10′ ORDER BY `distance` ASC

where $lat and $lon are the values I'm substituting based on user
input. Although the columns lat and lon are indexed, I'm certain a
full table scan is being performed because of the complexity of the
query. Does anyone know a way to make the query more efficient?

Thanks, - Dave

Erick T. Barkhuis

unread,

Jan 19, 2011, 12:57:29 PM1/19/11

to

laredotornado:

How about selecting the stores (id, lat, lon) within a _square around
your center point, first? That would be something like

SELECT shopid, lat, lon
FROM stores
WHERE lat BETWEEN $cpointlat-10 AND $cpointlat+10
AND lon BETWEEN $cpointlon-10 AND $cpointlon+10

Now, you have a limited number of shops, found quickly, because this
query will use your indices. If really necessary (wouldn't a square be
sufficient already?) you can wrap this result with your circle
calculation to obtain the shops that are within the square AND within
the circle.

If this were really a shopfinder query, I wouldn't bother, since no-one
would be flying in a straight line to that shop, anyway, so the
'circle' wouldn't be too relevant.

--
Erick

"There are three kinds of people: those who can count, and those who
can't."

Peter H. Coffin

unread,

Jan 19, 2011, 1:24:29 PM1/19/11

to

On Wed, 19 Jan 2011 09:28:27 -0800 (PST), laredotornado wrote:
> Hi,
>
> I'm using MySQL 5. I'm writing a simple app that finds the nearest
> store within a certain radius. So I'm using the query
>
> SELECT ((ACOS(SIN($lat * PI() / 180) * SIN(lat * PI() / 180) +

> COS($lat * PI() / 180) * COS(lat * PI() / 180) * COS(($lon ??? lon) *

> PI() / 180)) * 180 / PI()) * 60 * 1.1515) AS `distance` FROM `stores`

> HAVING `distance`<=???10??? ORDER BY `distance` ASC

>
> where $lat and $lon are the values I'm substituting based on user
> input. Although the columns lat and lon are indexed, I'm certain a
> full table scan is being performed because of the complexity of the
> query. Does anyone know a way to make the query more efficient?

Sure. Scrap the radius search for a simpler means of determining
"close", then if it really matters exactly how close the close ones are,
perform the trig on only the ones that are close enough to matter.

If you're looking for something at, for example, 45N 60W with a radius
of 50 miles, do your math OUTSIDE the query and get $miles / 1.2 / (360
* 60) so your range is 45N-0.00193 to 45N+0.00193 for lat, and the same
trick for lon.

SELECT * from stores
where lat > 44.99807
and lat < 45.00193
and lon > 59.99807
and lon < 60.00193
order by (abs(45-lat) + abs(60-lon)) desc

That'll be batshit fast if you've got indexes on lat and lon, and you
can do precision math on the results as you read off the result rows
until you decide to quit. It won't be *precise*, down to a few yards,
but it's probably good enough that the wrong won't make a bigger
difference than whether the traffic lights are with the customer or
against...

(The optimizer is probably smart enough to still be fast if you do math
in the comparisons to make the constant too, but I haven't tried it.

where lat > (45 - 0.00193)
and lat < (45 + 0.00193)

and those value can be substituted for variables *at the query
composition time* by your application language, to make things a little
simpler to read.)

--
45. I will make sure I have a clear understanding of who is responsible
for what in my organization. For example, if my general screws up I
will not draw my weapon, point it at him, say "And here is the
price for failure," then suddenly turn and kill a random underling.

Peter H. Coffin

unread,

Jan 19, 2011, 1:43:49 PM1/19/11

to

On Wed, 19 Jan 2011 12:24:29 -0600, Peter H. Coffin wrote:
> On Wed, 19 Jan 2011 09:28:27 -0800 (PST), laredotornado wrote:
>> Hi,
>>
>> I'm using MySQL 5. I'm writing a simple app that finds the nearest
>> store within a certain radius. So I'm using the query
>>
>> SELECT ((ACOS(SIN($lat * PI() / 180) * SIN(lat * PI() / 180) +
>> COS($lat * PI() / 180) * COS(lat * PI() / 180) * COS(($lon ??? lon) *
>> PI() / 180)) * 180 / PI()) * 60 * 1.1515) AS `distance` FROM `stores`
>> HAVING `distance`<=???10??? ORDER BY `distance` ASC
>>
>> where $lat and $lon are the values I'm substituting based on user
>> input. Although the columns lat and lon are indexed, I'm certain a
>> full table scan is being performed because of the complexity of the
>> query. Does anyone know a way to make the query more efficient?
>
> Sure. Scrap the radius search for a simpler means of determining
> "close", then if it really matters exactly how close the close ones are,
> perform the trig on only the ones that are close enough to matter.
>
> If you're looking for something at, for example, 45N 60W with a radius

Eh, call it 60E then, to make the math work how I wrote it. Maybe you DO
have a store at the bottom of a lake between Kazakhstan and
Uzbekistan...

> of 50 miles, do your math OUTSIDE the query and get $miles / 1.2 / (360
> * 60) so your range is 45N-0.00193 to 45N+0.00193 for lat, and the same
> trick for lon.

--
69. All midwives will be banned from the realm. All babies will be
delivered at state-approved hospitals. Orphans will be placed in
foster-homes, not abandoned in the woods to be raised by creatures
of the wild. --Peter Anspach's Evil Overlord list

Erick T. Barkhuis

unread,

Jan 19, 2011, 3:25:15 PM1/19/11

to

Erick T. Barkhuis:

>How about selecting the stores (id, lat, lon) within a _square around
>your center point, first? That would be something like
>
>SELECT shopid, lat, lon
>FROM stores
>WHERE lat BETWEEN $cpointlat-10 AND $cpointlat+10
> AND lon BETWEEN $cpointlon-10 AND $cpointlon+10
>
>Now, you have a limited number of shops, found quickly, because this
>query will use your indices. If really necessary (wouldn't a square be
>sufficient already?) you can wrap this result with your circle
>calculation to obtain the shops that are within the square AND within
>the circle.

Thinking about this, you might even be able to speed things up further:
make use of the fact, that any shop within the "small square" certainly
is a shop you want selected.
The "small square" is the square _within_ the circle around the centre
point. Its corner points are positioned on the circle, at:

top: $cpointlat+(10/SQRT(2)) bottom: $cpointlat-(10/SQRT(2))
right: $cpointlon+(10/SQRT(2)) left: $cpointlon-(10/SQRT(2))

Now, the strategy is:

1. select every point in the area between the outer square and the
small square
2. from that result, select the points that are at less than 10 from
the center point (so: within the circle)
3. combine the result (UNION) with the points within the small square.

This should be very fast, and quite accurate, at least for flat
surfaces and relatively small distances from the center point.

--
Erick

onedbguru

unread,

Jan 19, 2011, 9:18:06 PM1/19/11

to

Dave, to answer your theory as to the Full Table Scan, the answer is
yes you are... Why? because you have not given it anything in a where
clause to work with. Your query will go through EVERY row, calculate
the "distance" and keep only those "HAVING" the value of <=10. Look
at the other queries in this thread to see how they used the WHERE
clause.

You might also want to look at the SPATIAL extenions in MySQL
Here is just one link with examples to get you started:
http://dev.mysql.com/tech-resources/articles/4.1/gis-with-mysql.html

The Natural Philosopher

unread,

Jan 20, 2011, 5:24:49 AM1/20/11

to

fascnating stuff. Tagged for possible future use..