Index operation similar to "BETWEEN" in SQL

654 views
Skip to first unread message

Marcos Riso

unread,
Sep 20, 2011, 5:46:36 PM9/20/11
to Redis DB
Hello,

I have a millions of lists of ranges of GeoIPs like this (i.e.):

[0] { 'IPStart' => '004001136000', 'IPEnd' => '004001143255', 'City'
=> 'Woburn' }
[1] ...
[2] ...

and so on ...

I want to perform a search like this: Having in hand one IP ( i.e.
004001136001 ) in wich list this value is BETWEEN the keys IPStart and
IPEnd ... and return to me the City Name.

Perhaps this logic is wrong, there is another way of thought for redis
(maybe i'm thinking SQL) ... could please someone throw some light on
this issue?

Thanks a lot in advance.

Marcos

Dvir Volk

unread,
Sep 20, 2011, 6:59:46 PM9/20/11
to redi...@googlegroups.com
Hi Marcos,
Redis is ideal for this task - you can use a sorted set for this, look at ZRANGEBYSCORE

but you don't even have to write the abstraction code!
I have an open python library that does exactly that, and resolves not only IPs, but also lat,lon pairs into cities and ZIP codes in the US, and can be extended to include more layers.

If you don't want python, you can of course just copy my logic.
check this out:




--
You received this message because you are subscribed to the Google Groups "Redis DB" group.
To post to this group, send email to redi...@googlegroups.com.
To unsubscribe from this group, send email to redis-db+u...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/redis-db?hl=en.




--
Dvir Volk
System Architect, DoAT, http://doat.com

Salvatore Sanfilippo

unread,
Sep 21, 2011, 7:05:43 AM9/21/11
to redi...@googlegroups.com
Thanks Dvir, nice library!

in order to make things more clear I want to show Marcos what is the
obvious technique to use with sorted sets in order to get things in a
given range.

For instance I've two ranges:

A_start 10, A_end 20
B_start 30, B_end 40

I can add those ranges to single sorted set:

redis 127.0.0.1:6379> zadd ranges 10 A_start
(integer) 1
redis 127.0.0.1:6379> zadd ranges 20 A_end
(integer) 1
redis 127.0.0.1:6379> zadd ranges 30 B_start
(integer) 1
redis 127.0.0.1:6379> zadd ranges 40 B_end
(integer) 1


I've the value of 15, and want to check in what range it is:

redis 127.0.0.1:6379> zrangebyscore ranges (15 +inf LIMIT 0 1
1) "A_end"

As you can see I queried for elements that have a score between 15 and
+inf (infinite). I prefixed 15 with "(" in order to tell Redis I want
elements > 15, and not >= 15.

This way if I get as a result an "*_end" value I know I have a match.
If instead I get no elements or a *_start element I know I've no
match.

For instance 25 is not in the right interval, so if I query Redis I obtain:

redis 127.0.0.1:6379> zrangebyscore ranges (25 +inf LIMIT 0 1
1) "B_start"

I obtained a _start, so I'm not inside a range.

I hope this makes it clear how to do this kind of range queries with Redis.

About performances, you'll be surprised about the speed Redis can
serve this queries, in the order of tens of thousands per seconds in
an entry level virtual machine, even if you have millions of ranges.

Cheers,
Salvatore

--
Salvatore 'antirez' Sanfilippo
open source developer - VMware

http://invece.org
"We are what we repeatedly do. Excellence, therefore, is not an act,
but a habit." -- Aristotele

Dvir Volk

unread,
Sep 21, 2011, 9:43:30 AM9/21/11
to redi...@googlegroups.com
Thanks Salvatore.
Not sure which way is more memory efficient, but I did a slightly different trick:
I only indexed the end of the ranges, and the value of each range's score is the start and end of it
something like:
ZADD 20 "ip:10-20"

then after you run your ZRANGEBYSCORE, you parse the value into 10,20, and see if the ip you looked for is inside it or not.

even though my values are really big, since stringified ip2longs are, well, long... it might be more efficient than saving entries per range.

BTW, I used a similar trick to resolve lat,lon pairs - I converted them into geohash - a 64 bit (or a bit less) number, that allows me to use a single index to find adjacent coordinates.
it's a really cool algorithm, you can read more about it here:

Marcos Riso

unread,
Sep 21, 2011, 10:36:26 AM9/21/11
to Redis DB
Thank you for this.

I still didnt get clear the "*_end" part. I have millions of "*_ends"
in a single sorted set?



regards


Marcos

Dvir Volk

unread,
Sep 21, 2011, 10:38:10 AM9/21/11
to redi...@googlegroups.com
yes, that's correct



Marcos

--
You received this message because you are subscribed to the Google Groups "Redis DB" group.
To post to this group, send email to redi...@googlegroups.com.
To unsubscribe from this group, send email to redis-db+u...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/redis-db?hl=en.

Marcos Riso

unread,
Sep 21, 2011, 10:46:35 AM9/21/11
to Redis DB
BUT - how can I identify each one in a "query" ?

thx Dvir!

Marcos Riso

unread,
Sep 21, 2011, 1:16:20 PM9/21/11
to Redis DB
HEY I DID IT ! IT WORKS!

Thank you guys!

Marcos Riso

unread,
Sep 21, 2011, 6:05:34 PM9/21/11
to Redis DB
I Found a bug on this: What if I have a Start = End, and the number is
the same as both? In this case, it returns null...

regards

Marcos
> > System Architect, DoAT,http://doat.com

Salvatore Sanfilippo

unread,
Sep 22, 2011, 4:21:40 AM9/22/11
to redi...@googlegroups.com
On Wed, Sep 21, 2011 at 3:43 PM, Dvir Volk <dv...@doat.com> wrote:
> Thanks Salvatore.
> Not sure which way is more memory efficient, but I did a slightly different
> trick:
> I only indexed the end of the ranges, and the value of each range's score is
> the start and end of it
> something like:
> ZADD 20 "ip:10-20"

Hello Dvir, your trick is absolutely more memory efficient, thanks!

> BTW, I used a similar trick to resolve lat,lon pairs - I converted them into
> geohash - a 64 bit (or a bit less) number, that allows me to use a single
> index to find adjacent coordinates.
> it's a really cool algorithm, you can read more about it here:
> http://en.wikipedia.org/wiki/Geohash

Cool, I hope to add those two patterns in the Redis book patterns section.

Salvatore Sanfilippo

unread,
Sep 22, 2011, 4:30:28 AM9/22/11
to redi...@googlegroups.com
On Thu, Sep 22, 2011 at 12:05 AM, Marcos Riso <marco...@gmail.com> wrote:
> I Found a bug on this: What if I have a Start = End, and the number is
> the same as both? In this case, it returns null...

This problem is easily solvable in the pattern I showed you, but if
you use the variant proposed by Dvir I think it is even simpler. In
the example of three ranges: 10-20, 30-40, 22-22:

We populate the ranges with:

redis 127.0.0.1:6379> zadd ranges 20 10-20
(integer) 1
redis 127.0.0.1:6379> zadd ranges 40 30-40
(integer) 1
redis 127.0.0.1:6379> zadd ranges 22 22-22
(integer) 1

Let's check if 5 is inside some range:

redis 127.0.0.1:6379> zrangebyscore ranges 5 +inf LIMIT 0 1
1) "10-20"

5 is not included in the range Redis returned, so no match.

If we use 15 instead we get again "10-20" but this time we can verify
that our number is included.

Now let's try with 22:

redis 127.0.0.1:6379> zrangebyscore ranges 22 +inf LIMIT 0 1
1) "22-22"

That's obviously included.

As Dvir pointed out, this technique also uses less memory.

Cheers,
Salvatore

Marcos Riso

unread,
Sep 22, 2011, 10:47:08 AM9/22/11
to Redis DB
Hi Salvatore,

Again, it worked out perfectly. Many thanks also to Dvir, his solution
was brilliant.

Thanks a lot guys,

Marcos Riso
Web Developer
Akna Software

Marcos Riso

unread,
Sep 22, 2011, 4:45:12 PM9/22/11
to Redis DB
I dont know if I did it right, but I created a range like this:

while($row = mysql_fetch_array($res))
{

$redis->zadd('ranges', $row['bdips_ipend'],
$row['bdips_ipstart'].'-'.$row['bdips_ipend']);


$redis->hset($row['bdips_ipend'], 'bdips_ipstart',
$row['bdips_ipstart']);
$redis->hset($row['bdips_ipend'], 'bdips_ipend',
$row['bdips_ipend']);
$redis->hset($row['bdips_ipend'], 'bdips_pais',
$row['bdips_pais']);
$redis->hset($row['bdips_ipend'], 'bdips_estado',
$row['bdips_estado']);
$redis->hset($row['bdips_ipend'], 'bdips_cidade',
$row['bdips_cidade']);
$redis->hset($row['bdips_ipend'], 'bdips_cep',
$row['bdips_cep']);
$redis->hset($row['bdips_ipend'], 'bdips_latitude',
$row['bdips_latitude']);
$redis->hset($row['bdips_ipend'], 'bdips_longitude',
$row['bdips_longitude']);
$redis->hset($row['bdips_ipend'], 'bdips_dmacode',
$row['bdips_dmacode']);
$redis->hset($row['bdips_ipend'], 'bdips_areacode',
$row['bdips_areacode']);
$redis->hset($row['bdips_ipend'], 'bdips_cidadecode',
$row['bdips_cidadecode']);


$c++;
}

So I'm creating a big range, then millions of hashes with the data I
want ... First I search it on the ranges, then I get the IPEND who is
the hash id

then i get all the data

Is this the best way to do it? I worked fine, and very fast, but I'm
thinking in terms of improving the performance and reducing memory
use.

Regards,

--
Marcos Riso

Dvir Volk

unread,
Sep 22, 2011, 4:47:16 PM9/22/11
to redi...@googlegroups.com
yup, this is the right approach, although you could speed up the indexing with HMSET and pipelining.


--
You received this message because you are subscribed to the Google Groups "Redis DB" group.
To post to this group, send email to redi...@googlegroups.com.
To unsubscribe from this group, send email to redis-db+u...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/redis-db?hl=en.

Reply all
Reply to author
Forward
0 new messages