this is my first instinct:
use cells that are keyword-based, so for instance you have a different
square kilometer for pizza, bike, ..., whatever; and of course you
create the key only if there is at least one shop, and delete it when
you remove the last shop in the square.
I said 1 km, but it's up to you; you may even start with 1 km cells
and, if there are more than N shops inside one, set a sentinel key and
subdivide further. That's the correct way to do it algorithmically,
but by instinct I would go for fixed-size squares.
Then if the user searches "pizza" you simply fetch and refine, doing
the math client side.
If the user typed "pizza foo bar" you fetch all three and perform an
intersection client side.
And so forth.
My feeling is that's going to work well.
p.s. in your squares just put IDs of shops so that fetch /
intersection will be very fast, even if this will force you to fetch
shop objects later to render them.
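[A minimal sketch of this scheme, with plain Python sets standing in for
Redis Sets (SADD / SREM / SINTER). The key format and the 0.009-degree
cell size (roughly 1 km of latitude) are illustrative choices, not
anything prescribed in the thread.]

```python
CELL_DEG = 0.009  # ~1 km of latitude per cell; an illustrative choice

cells = {}  # cell key -> set of shop IDs (stands in for a Redis Set)

def cell_key(keyword, lat, lon):
    # hypothetical key scheme: one set per (keyword, grid cell)
    return "geo:%s:%d:%d" % (keyword, int(lat // CELL_DEG), int(lon // CELL_DEG))

def add_shop(shop_id, keyword, lat, lon):
    # the key is created only when the first shop appears in the cell
    cells.setdefault(cell_key(keyword, lat, lon), set()).add(shop_id)

def remove_shop(shop_id, keyword, lat, lon):
    key = cell_key(keyword, lat, lon)
    ids = cells.get(key)
    if ids:
        ids.discard(shop_id)
        if not ids:  # delete the key along with the last shop
            del cells[key]

def search(keywords, lat, lon):
    # "pizza foo bar" -> fetch each keyword's cell, intersect client side
    found = [cells.get(cell_key(k, lat, lon), set()) for k in keywords]
    return set.intersection(*found) if found else set()
```

A real version would also fetch the neighboring cells around the user's
position before refining by distance, since the search radius rarely
lines up with cell borders.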
Salvatore
> --
> You received this message because you are subscribed to the Google Groups
> "Redis DB" group.
> To post to this group, send email to redi...@googlegroups.com.
> To unsubscribe from this group, send email to
> redis-db+u...@googlegroups.com.
> For more options, visit this group at
> http://groups.google.com/group/redis-db?hl=en.
--
Salvatore 'antirez' Sanfilippo
open source developer - VMware
http://invece.org
"We are what we repeatedly do. Excellence, therefore, is not an act,
but a habit." -- Aristotele
> I thought of that approach, my instinct was that it's good for popular tags
> like Pizza, and wasteful for sparse tags
That's true. However, if you can afford not to store the lat/long of
single shops, but just their IDs, in your "squares", then with
conveniently set "intset" parameters you can store that info inside
Sets in a very efficient way.
This would require fetching the lat/long later for the on-client
additional filtering stage, to avoid returning something that is 1.3
km away.
Cheers,
Salvatore
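[For reference, the intset encoding mentioned above is governed by a
redis.conf limit; a sketch of what "conveniently set" might look like.
The 512 here is just the stock default, not a value from the thread.]

```
# redis.conf: a Set holding up to this many integer-only members stays
# in the compact intset encoding instead of a hash table
set-max-intset-entries 512
```

You can check which encoding a key actually uses with OBJECT ENCODING
keyname, which reports "intset" while the set still qualifies.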
On Wed, Apr 11, 2012 at 2:09 PM, Dvir Volk <dv...@everything.me> wrote:
If you performed the intersections, then directly filtered the
resulting zsets, you would end up with the same result, without paying
for the costly union up front. If you then find that filtering in
Redis is expensive, you could union the intersection results, and
return that directly to the client (with geohash scores), who could
then perform the distance filtering.
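[To illustrate the point about the two orders giving the same members,
here is a toy model, not code from the thread: plain dicts stand in for
the per-cell zsets (member -> geohash score), and the data is invented.]

```python
# Each keyword has one dict per grid cell, mapping member id -> geohash
# score. A shop lives in exactly one cell, so the cell lists for the
# two keywords are aligned by index.
pizza_cells = [{"a": 1.0, "b": 2.0}, {"c": 3.0}]
bar_cells   = [{"a": 1.5, "d": 4.0}, {"c": 3.5}]

# Union-first (the costly order): merge each keyword's cells into one
# big set, then intersect the two big sets.
union_first = set().union(*pizza_cells) & set().union(*bar_cells)

# Intersect-first: intersect cell by cell, then union the small results.
intersect_first = set()
for p, b in zip(pizza_cells, bar_cells):
    intersect_first |= p.keys() & b.keys()

assert union_first == intersect_first  # same members either way
```

The intersect-first order never copies members that the intersection
would discard anyway, which is the gain being argued for here.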
For filtering based on distance, because you are using such small grid
regions (1x1km or 4x4km), you can assume a flat world, and can scale
the east/west dimension for simplicity. This would reduce your
distance calculations from N standard geo distance calculations to 1
geo distance calculation (for the scaling factor) followed by an
addition and 3 multiplications (not necessarily in that order) for
each of your N points.
If you've tried these things before and they didn't work out, please
let me know :)
Regards,
- Josiah
I know that it is convenient to perform the union of the cells, then
execute the intersection... But as far as I can tell, because the
intersection will necessarily filter out locations, the only gain to
the early union is that you don't need to execute multiple
intersection calls. But that isn't really a gain, as the union is
essentially just copying data around that will be discarded before the
final result.
Say that you have a geo_dist() function that takes lat/lon pairs and
returns the distance in km.
center = (34.7228, -118.17)  # the center of your search
km_per_degree = geo_dist((0, 0), (0, 1))
scale = geo_dist(center, (center[0], center[1] + 1)) / km_per_degree
scale2 = km_per_degree * km_per_degree
cutoff_squared = max_distance * max_distance  # max_distance in km
close_enough = []
for id, pt in matches:  # matches holds (id, (lat, lon)) pairs
    dla = pt[0] - center[0]
    dlo = scale * (pt[1] - center[1])
    dist_squared = (dla * dla + dlo * dlo) * scale2
    if dist_squared < cutoff_squared:
        close_enough.append((id, pt))
Regards,
I'd been hoping for a factor of 2x, but better than 10x is awesome :D
> As for the other part - nice trick ,but this doesn't seem like much of a
> problem currently. The bigger problem would be that I have to actually load
> the lat,lon pairs of each matching object before filtering, which takes
> longer than the actual calculations (by then we are usually left with a
> handful of nodes anyway). This can be optimized by making the geo cells
> sorted sets, with the exact geohash as the score. Then I won't need to load
> anything to filter, just transform the scores to lat,lons.
It's always about the lowest-hanging fruit. Once you get rid of the
lat/lon fetch by using the geohash score and translate your data back,
you will still need to compute the distances. If you can do your
distance calculations in 1/10th the time it takes for the other parts,
then I'd say the low-hanging fruit is gone. But this change should
give you close to a 4x improvement in lat/lon distance calculation
performance, with generally better than .05% difference in distance
calculated.
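[The geohash-as-score idea can be sketched like this; it is an
illustrative encoder, not code from the thread. Interleaving 26 bits of
longitude and 26 bits of latitude gives a 52-bit integer, which a
double-precision zset score holds exactly (doubles are exact up to
2**53), and the score alone decodes back to an approximate position.]

```python
def encode(lat, lon, bits=26):
    # Interleave one longitude bit then one latitude bit, MSB first,
    # by bisecting each coordinate's range (as in a geohash).
    lat_range, lon_range = [-90.0, 90.0], [-180.0, 180.0]
    score = 0
    for _ in range(bits):
        for value, rng in ((lon, lon_range), (lat, lat_range)):
            mid = (rng[0] + rng[1]) / 2
            score <<= 1
            if value >= mid:
                score |= 1
                rng[0] = mid
            else:
                rng[1] = mid
    return score  # a 52-bit integer: exact as a float64 zset score

def decode(score, bits=26):
    # Walk the bits MSB first; odd shifts are longitude bits because
    # encode() pushed the longitude bit of each pair first.
    lat_range, lon_range = [-90.0, 90.0], [-180.0, 180.0]
    for shift in range(2 * bits - 1, -1, -1):
        rng = lon_range if shift % 2 == 1 else lat_range
        mid = (rng[0] + rng[1]) / 2
        if (score >> shift) & 1:
            rng[0] = mid
        else:
            rng[1] = mid
    return ((lat_range[0] + lat_range[1]) / 2,
            (lon_range[0] + lon_range[1]) / 2)
```

At 26 bits per coordinate the decoded position is within roughly a
metre of the original, which is plenty for a distance filter.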
Regards,
- Josiah
Thanks, that was the initial approach I took, but the number of nodes outside the box was so big that it was the slowest of the many approaches I've tried. The thing is that the points between min_geocode and max_geocode include not only points outside the radius, but a _lot_ of points outside the box, so this approach was not fast enough.
Hi Dvir, for the people that will read this interesting thread in the
coming months, could you please summarize what you ended up using? Thanks!