search through the values of a set

124 views
Skip to first unread message

Salimane Adjao Moustapha

unread,
Jan 4, 2012, 9:37:53 AM1/4/12
to Redis DB
Hi
Basically I have a set containing values i would like to search
through. The set already exists. when a user enter a query like "123"
and the set contains ['4333', '1233','1133','1236'], i should be able
to return ['1233','1236']. I should also be able to provide the number
of returned results.
Thanks

Josiah Carlson

unread,
Jan 4, 2012, 12:17:25 PM1/4/12
to redi...@googlegroups.com
Sets don't have that ability. You can make it work with a zset, and
here's how...

First, every item in your zset should have a score of 0. I'm going to
make a temporary copy of the zset so that we don't have any race
conditions, but with work, you can address that. Below is the code to
solve your problem using a zset...

def data_range(conn, data, prefix):
if ord(prefix[-1]) == 255:
# we don't want to deal with having to carry
end = prefix + 20*'\xff'
else:
end = prefix[:-1] + chr(ord(prefix[-1]) + 1)
conn.zinterstore('temporary', data)
r1 = conn.zadd('temporary', 0, prefix)
r2 = conn.zadd('temporary', 0, end)
b = conn.zrank('temporary', prefix)
e = conn.zrank('temporary', end)
# if you just want the count...
# count = e + 1 - b - r1 - r2
results = conn.zrange('temporary', b + r1, e - r2)
conn.delete('temporary')
return len(results), results

Redis scripting can offer a similar solution without copying and
without worrying about race conditions. You can also do a similar
thing just using the scores to store the prefixes of the values, which
would allow you to just use zrangebyscore without needing to
add/remove items or copy the zset, but it is limited to 53 bits
naively, and almost 64 bits if you know what you are doing.

Regards,
- Josiah

> --
> You received this message because you are subscribed to the Google Groups "Redis DB" group.
> To post to this group, send email to redi...@googlegroups.com.
> To unsubscribe from this group, send email to redis-db+u...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/redis-db?hl=en.
>

Salimane Adjao Moustapha

unread,
Jan 4, 2012, 9:57:24 PM1/4/12
to redi...@googlegroups.com
Hi,
thanks for the reply. since in this particular context, I'm dealing with integer values in the set. I think I'll go with using a zset and setting the score of each value as the value itself. then as you said using zrangebyscore. the max value of the value is 2147483647 making it 31 bits. 
out of curiosity, any suggestion on how to go about it with redis scripting
Thanks

Josiah Carlson

unread,
Jan 4, 2012, 11:37:57 PM1/4/12
to redi...@googlegroups.com
With Redis scripting you would use what I offered earlier (without
making a copy of the zset), only you would write it in Lua, send it
off to Redis, and run it there. Redis 2.6 is the first version with
official scripting support.

In terms of using the score and searching that way... your original
post suggested that you wanted to do prefix lookups. If you still want
to do prefix lookups and are having problems figuring out how to do
it, reply to this thread and I'll point you in the right direction.

Regards,
- Josiah

> --
> You received this message because you are subscribed to the Google Groups
> "Redis DB" group.

> To view this discussion on the web visit
> https://groups.google.com/d/msg/redis-db/-/dGfY0AuStRgJ.

Salimane Adjao Moustapha

unread,
Jan 5, 2012, 12:16:33 AM1/5/12
to redi...@googlegroups.com
Hi,
yep i should have prefix look up. any suggestion would be appreciated.
Thanks

Josiah Carlson

unread,
Jan 5, 2012, 2:11:20 AM1/5/12
to redi...@googlegroups.com
1. Take your id as a sequence of integers...   726156 -> [7, 2, 6, 1,
5, 6]2. Append -1 values to the right until you have 10 digits
(technically enough for ids of up to 33 bits)...   [7, 2, 6, 1, 5, 6,
-1, -1, -1, -1]3. Add 1 to all of your digits...   [8, 3, 7, 2, 6, 7,
0, 0, 0, 0]4. Consider each of these digits as a 4-bit portion of a
40-bit score...   score = 0x8372670000

To find the ids with a given prefix, you need to calculate a start/end
score for performing ZRANGEBYSCORE.1. Calculate the "start" as the
score method above.2. Calculate "end" by replacing all trailing '0'
digits with hex 'f' (technically 'b' is sufficient, and the only '0'
digits should be at the right side of the score).
Looking at an example, if we wanted to find ids with prefixes of 726,
we would generate a "start" score of 0x8370000000. We would also
generate an "end" score of 0x837fffffff. Our example id from before
has a prefix of 726, but also has a score of 0x8372670000, which falls
inside the range that we generated.
In Python:
def score(id):    # step 1    id = map(int, str(id))    # step 2
id.extend((10 - len(id)) * [-1])    # step 3 and 4    return
int(''.join(i+1 for i in id), 16)
def end_score(score):    # get the hex digits back    id =
hex(score)[2:].rstrip('L')    # replace the trailing 0 with 'f'
return int(id.replace('0', 'f'), 16)
Regards,
- Josiah

> --
> You received this message because you are subscribed to the Google Groups
> "Redis DB" group.
> To view this discussion on the web visit

> https://groups.google.com/d/msg/redis-db/-/GfzgHJ3MLR8J.

Josiah Carlson

unread,
Jan 5, 2012, 2:14:30 AM1/5/12
to redi...@googlegroups.com
Ouch... my formatting got eaten... Let me fix that for you...

Salimane Adjao Moustapha

unread,
Jan 5, 2012, 3:13:59 AM1/5/12
to redi...@googlegroups.com
hi
no problem for the formatting...thanks very much for this, i was doing another way but this was much more cleaner.
Thanks again

Salimane Adjao Moustapha

unread,
Feb 16, 2012, 6:05:41 AM2/16/12
to redi...@googlegroups.com
Let's say  I want to add search on any part of the numbers like currently i could be able to find "1237888", "1239000" with a query like "123". let's say i wanted to find "9000123" and "8812388" too. how one could go about the scores calculations
Thanks

Josiah Carlson

unread,
Feb 16, 2012, 11:45:59 AM2/16/12
to redi...@googlegroups.com
Sadly, those aren't available with this method without some work and
more space. Basically, for however long your digits are, you need to
rotate them around, and store information about how much you rotated
them. For the earlier example, we'll add a new step 4.

1. Take your id as a sequence of integers...
726156 -> [7, 2, 6, 1, 5, 6]

2. Append -1 values to the right until you have 10 digits (technically
enough for ids of up to 33 bits)...
[7, 2, 6, 1, 5, 6, -1, -1, -1, -1]

3. Add 1 to all of your digits...
[8, 3, 7, 2, 6, 7, 0, 0, 0, 0]

4. Left-shift your digits incrementally until you find a '0' digit at
the front, and remember how many times you rotated it for each:
0, [8, 3, 7, 2, 6, 7, 0, 0, 0, 0]
1, [3, 7, 2, 6, 7, 0, 0, 0, 0, 8]
2, [7, 2, 6, 7, 0, 0, 0, 0, 8, 3]
3, [2, 6, 7, 0, 0, 0, 0, 8, 3, 7]
4, [6, 7, 0, 0, 0, 0, 8, 3, 7, 2]
5, [7, 0, 0, 0, 0, 8, 3, 7, 2, 6]
6, [0, 0, 0, 0, 8, 3, 7, 2, 6, 7] <- stop and ignore this one

5. Again consider those as 4 bit portions of a 40 bit score, but
create your members with information about rotations a little
differently...

726156_0 -> 0x8372670000
726156_1 -> 0x3726700008
726156_2 -> 0x7267000083
726156_3 -> 0x2670000837
726156_4 -> 0x6700008372
726156_5 -> 0x7000083726

To search for a '615' anywhere in the string, we generate the same
start/stop endpoints:
0x72600000 and 0x726fffff. Performing the range scan, we get 726156_2
-> 0x7267000083, which is the 2nd rotation of id 726156.

Unfortunately, you lose 1 hex digit in the process, because you *need*
at least 1 trailing 0 in the zero-rotation version to only get exactly
what you want when you perform your scan. On the other hand, if you
also do a quick 'is this string in that string' check for everything
returned by the zrangebyscore call, you can get that last digit back
and can use all 40 bits for information.

This does potentially increase the amount of space used by a factor of
10x, but that all depends on your data.

Regards,
- Josiah

> --
> You received this message because you are subscribed to the Google Groups
> "Redis DB" group.
> To view this discussion on the web visit

> https://groups.google.com/d/msg/redis-db/-/8sq0taqf0iMJ.

Josiah Carlson

unread,
Feb 16, 2012, 11:48:18 AM2/16/12
to redi...@googlegroups.com
Also, technically, when you rotate your values, you don't need to
carry the data over to the right side, you could leave it as trailing
0's...

0, [8, 3, 7, 2, 6, 7, 0, 0, 0, 0]

1, [3, 7, 2, 6, 7, 0, 0, 0, 0, 0]
2, [7, 2, 6, 7, 0, 0, 0, 0, 0, 0]
3, [2, 6, 7, 0, 0, 0, 0, 0, 0, 0]
4, [6, 7, 0, 0, 0, 0, 0, 0, 0, 0]
5, [7, 0, 0, 0, 0, 0, 0, 0, 0, 0]
6, [0, 0, 0, 0, 0, 0, 0, 0, 0, 0] <- stop and ignore this one

And you stop when you run out of positive digits on the left. Then you
always get what you want with your zrangebyscore query.

Regards,
- Josiah

Reply all
Reply to author
Forward
0 new messages