[GEO] Testing Time/Space Optimization for GEORADIUS/GEORADIUSBYMEMBER

37 views
Skip to first unread message

Jonah H. Harris

unread,
Mar 14, 2018, 3:18:56 AM3/14/18
to Redis DB

Hey, everyone!


TL;DR

If you use Redis' geospatial search, I could really use your help testing this PR (https://github.com/antirez/redis/pull/4741). If not, there's no need to read this :)


Overview

A few years ago, I wrote a recommender system on top of Redis. At the time, Redis didn't support geospatial searches and, as such, I had to to take the resulting recommendations and pass them over to Elasticsearch for a secondary pass. Over the last couple weeks, I've been doing quite a bit of work optimizing that engine and having it use GEORADIUS instead of Elastic. In doing so, however, I ran into some performance issues performing k-nearest-neighbors search using the COUNT clause. Upon investigation, I noticed the current implementation appends all matching points to the geoPoint array and performs top-k result pruning after-the-fact via a full sort and a [0, count)-based for-loop; this incurs the overhead of multiple unnecessary memory (re)allocations and sorting. The simplest optimization I tried was a three-line code change to replace the full qsort with a call to the included partial qsort (pqsort) to sort only the top-k members of the array providing for O(n + k log k) instead of O(n log n) time improvements where n is the number of points within the radius and k is the number specified by COUNT. While this provides a substantial 5-15% improvement, the most optimal solution is to use a min/max-heap, also reducing memory requirements, which I've implemented and submitted the PR for.


This PR, the first of several geospatial improvements, provides optimizations--in terms of both time and space--for all GEORADIUS/GEORADIUSBYMEMBER searches which utilize the COUNT clause. It accomplishes this by allocating the geoPoint array once with the requested COUNT size and treats it as a (min/max)-heap storing only the k nearest (or farthest) points using the same sort comparators as the final sort.


From a benchmark standpoint, average and median execution time reductions of between 10% and 30% are common depending on the density of points of interest (POI) as well as the radius searched. No performance degradation has been observed.


Where n is the number of points in the desired radius and k is the number of items requested, with this optimization, the overall change for GEORADIUS COUNT searches in terms of memory goes from O(n) to O(k) and in time from O(n log n) to O(k + (n - k) log k + k log k).


Feedback is welcomed.



Reply all
Reply to author
Forward
0 new messages