Hi
Basically I have a set containing values i would like to search
through. The set already exists. when a user enter a query like "123"
and the set contains ['4333', '1233','1133','1236'], i should be able
to return ['1233','1236']. I should also be able to provide the number
of returned results.
Thanks
Sets don't have that ability. You can make it work with a zset, and here's how...
First, every item in your zset should have a score of 0. I'm going to make a temporary copy of the zset so that we don't have any race conditions, but with work, you can address that. Below is the code to solve your problem using a zset...
def data_range(conn, data, prefix): if ord(prefix[-1]) == 255: # we don't want to deal with having to carry end = prefix + 20*'\xff' else: end = prefix[:-1] + chr(ord(prefix[-1]) + 1) conn.zinterstore('temporary', data) r1 = conn.zadd('temporary', 0, prefix) r2 = conn.zadd('temporary', 0, end) b = conn.zrank('temporary', prefix) e = conn.zrank('temporary', end) # if you just want the count... # count = e + 1 - b - r1 - r2 results = conn.zrange('temporary', b + r1, e - r2) conn.delete('temporary') return len(results), results
Redis scripting can offer a similar solution without copying and without worrying about race conditions. You can also do a similar thing just using the scores to store the prefixes of the values, which would allow you to just use zrangebyscore without needing to add/remove items or copy the zset, but it is limited to 53 bits naively, and almost 64 bits if you know what you are doing.
Regards, - Josiah
On Wed, Jan 4, 2012 at 6:37 AM, Salimane Adjao Moustapha
<m...@salimane.com> wrote: > Hi > Basically I have a set containing values i would like to search > through. The set already exists. when a user enter a query like "123" > and the set contains ['4333', '1233','1133','1236'], i should be able > to return ['1233','1236']. I should also be able to provide the number > of returned results. > Thanks
> -- > You received this message because you are subscribed to the Google Groups "Redis DB" group. > To post to this group, send email to redis-db@googlegroups.com. > To unsubscribe from this group, send email to redis-db+unsubscribe@googlegroups.com. > For more options, visit this group at http://groups.google.com/group/redis-db?hl=en.
Hi, thanks for the reply. since in this particular context, I'm dealing with integer values in the set. I think I'll go with using a zset and setting the score of each value as the value itself. then as you said using zrangebyscore. the max value of the value is 2147483647 making it 31 bits. out of curiosity, any suggestion on how to go about it with redis scripting Thanks
With Redis scripting you would use what I offered earlier (without making a copy of the zset), only you would write it in Lua, send it off to Redis, and run it there. Redis 2.6 is the first version with official scripting support.
In terms of using the score and searching that way... your original post suggested that you wanted to do prefix lookups. If you still want to do prefix lookups and are having problems figuring out how to do it, reply to this thread and I'll point you in the right direction.
Regards, - Josiah
On Wed, Jan 4, 2012 at 6:57 PM, Salimane Adjao Moustapha
<m...@salimane.com> wrote: > Hi, > thanks for the reply. since in this particular context, I'm dealing with > integer values in the set. I think I'll go with using a zset and setting the > score of each value as the value itself. then as you said > using zrangebyscore. the max value of the value is 2147483647 making it 31 > bits. > out of curiosity, any suggestion on how to go about it with redis scripting > Thanks
> To post to this group, send email to redis-db@googlegroups.com. > To unsubscribe from this group, send email to > redis-db+unsubscribe@googlegroups.com. > For more options, visit this group at > http://groups.google.com/group/redis-db?hl=en.
1. Take your id as a sequence of integers... 726156 -> [7, 2, 6, 1, 5, 6]2. Append -1 values to the right until you have 10 digits (technically enough for ids of up to 33 bits)... [7, 2, 6, 1, 5, 6, -1, -1, -1, -1]3. Add 1 to all of your digits... [8, 3, 7, 2, 6, 7, 0, 0, 0, 0]4. Consider each of these digits as a 4-bit portion of a 40-bit score... score = 0x8372670000
To find the ids with a given prefix, you need to calculate a start/end score for performing ZRANGEBYSCORE.1. Calculate the "start" as the score method above.2. Calculate "end" by replacing all trailing '0' digits with hex 'f' (technically 'b' is sufficient, and the only '0' digits should be at the right side of the score). Looking at an example, if we wanted to find ids with prefixes of 726, we would generate a "start" score of 0x8370000000. We would also generate an "end" score of 0x837fffffff. Our example id from before has a prefix of 726, but also has a score of 0x8372670000, which falls inside the range that we generated. In Python: def score(id): # step 1 id = map(int, str(id)) # step 2 id.extend((10 - len(id)) * [-1]) # step 3 and 4 return int(''.join(i+1 for i in id), 16) def end_score(score): # get the hex digits back id = hex(score)[2:].rstrip('L') # replace the trailing 0 with 'f' return int(id.replace('0', 'f'), 16) Regards, - Josiah
On Wed, Jan 4, 2012 at 9:16 PM, Salimane Adjao Moustapha
> To post to this group, send email to redis-db@googlegroups.com. > To unsubscribe from this group, send email to > redis-db+unsubscribe@googlegroups.com. > For more options, visit this group at > http://groups.google.com/group/redis-db?hl=en.
Ouch... my formatting got eaten... Let me fix that for you...
1. Take your id as a sequence of integers... 726156 -> [7, 2, 6, 1, 5, 6]
2. Append -1 values to the right until you have 10 digits (technically enough for ids of up to 33 bits)... [7, 2, 6, 1, 5, 6, -1, -1, -1, -1]
3. Add 1 to all of your digits... [8, 3, 7, 2, 6, 7, 0, 0, 0, 0]
4. Consider each of these digits as a 4-bit portion of a 40-bit score... score = 0x8372670000
To find the ids with a given prefix, you need to calculate a start/end score for performing ZRANGEBYSCORE.
1. Calculate the "start" as the score method above. 2. Calculate "end" by replacing all trailing '0' digits with hex 'f' (technically 'b' is sufficient, and the only '0' digits should be at the right side of the score).
Looking at an example, if we wanted to find ids with prefixes of 726, we would generate a "start" score of 0x8370000000. We would also generate an "end" score of 0x837fffffff. Our example id from before has a prefix of 726, but also has a score of 0x8372670000, which falls inside the range that we generated.
In Python: def score(id): # step 1 id = map(int, str(id)) # step 2 id.extend((10 - len(id)) * [-1]) # step 3 and 4 return int(''.join(i+1 for i in id), 16)
def end_score(score): # get the hex digits back id = hex(score)[2:].rstrip('L') # replace the trailing 0 with 'f' return int(id.replace('0', 'f'), 16)
<josiah.carl...@gmail.com> wrote: > 1. Take your id as a sequence of integers... 726156 -> [7, 2, 6, 1, > 5, 6]2. Append -1 values to the right until you have 10 digits > (technically enough for ids of up to 33 bits)... [7, 2, 6, 1, 5, 6, > -1, -1, -1, -1]3. Add 1 to all of your digits... [8, 3, 7, 2, 6, 7, > 0, 0, 0, 0]4. Consider each of these digits as a 4-bit portion of a > 40-bit score... score = 0x8372670000
> To find the ids with a given prefix, you need to calculate a start/end > score for performing ZRANGEBYSCORE.1. Calculate the "start" as the > score method above.2. Calculate "end" by replacing all trailing '0' > digits with hex 'f' (technically 'b' is sufficient, and the only '0' > digits should be at the right side of the score). > Looking at an example, if we wanted to find ids with prefixes of 726, > we would generate a "start" score of 0x8370000000. We would also > generate an "end" score of 0x837fffffff. Our example id from before > has a prefix of 726, but also has a score of 0x8372670000, which falls > inside the range that we generated. > In Python: > def score(id): # step 1 id = map(int, str(id)) # step 2 > id.extend((10 - len(id)) * [-1]) # step 3 and 4 return > int(''.join(i+1 for i in id), 16) > def end_score(score): # get the hex digits back id = > hex(score)[2:].rstrip('L') # replace the trailing 0 with 'f' > return int(id.replace('0', 'f'), 16) > Regards, > - Josiah
> On Wed, Jan 4, 2012 at 9:16 PM, Salimane Adjao Moustapha > <m...@salimane.com> wrote: >> Hi, >> yep i should have prefix look up. any suggestion would be appreciated. >> Thanks
>> To post to this group, send email to redis-db@googlegroups.com. >> To unsubscribe from this group, send email to >> redis-db+unsubscribe@googlegroups.com. >> For more options, visit this group at >> http://groups.google.com/group/redis-db?hl=en.
Let's say I want to add search on any part of the numbers like currently i could be able to find "1237888", "1239000" with a query like "123". let's say i wanted to find "9000123" and "8812388" too. how one could go about the scores calculations Thanks
Sadly, those aren't available with this method without some work and more space. Basically, for however long your digits are, you need to rotate them around, and store information about how much you rotated them. For the earlier example, we'll add a new step 4.
1. Take your id as a sequence of integers... 726156 -> [7, 2, 6, 1, 5, 6]
2. Append -1 values to the right until you have 10 digits (technically enough for ids of up to 33 bits)... [7, 2, 6, 1, 5, 6, -1, -1, -1, -1]
3. Add 1 to all of your digits... [8, 3, 7, 2, 6, 7, 0, 0, 0, 0]
4. Left-shift your digits incrementally until you find a '0' digit at the front, and remember how many times you rotated it for each: 0, [8, 3, 7, 2, 6, 7, 0, 0, 0, 0] 1, [3, 7, 2, 6, 7, 0, 0, 0, 0, 8] 2, [7, 2, 6, 7, 0, 0, 0, 0, 8, 3] 3, [2, 6, 7, 0, 0, 0, 0, 8, 3, 7] 4, [6, 7, 0, 0, 0, 0, 8, 3, 7, 2] 5, [7, 0, 0, 0, 0, 8, 3, 7, 2, 6] 6, [0, 0, 0, 0, 8, 3, 7, 2, 6, 7] <- stop and ignore this one
5. Again consider those as 4 bit portions of a 40 bit score, but create your members with information about rotations a little differently...
To search for a '615' anywhere in the string, we generate the same start/stop endpoints: 0x72600000 and 0x726fffff. Performing the range scan, we get 726156_2 -> 0x7267000083, which is the 2nd rotation of id 726156.
Unfortunately, you lose 1 hex digit in the process, because you *need* at least 1 trailing 0 in the zero-rotation version to only get exactly what you want when you perform your scan. On the other hand, if you also do a quick 'is this string in that string' check for everything returned by the zrangebyscore call, you can get that last digit back and can use all 40 bits for information.
This does potentially increase the amount of space used by a factor of 10x, but that all depends on your data.
Regards, - Josiah
On Thu, Feb 16, 2012 at 3:05 AM, Salimane Adjao Moustapha
<m...@salimane.com> wrote: > Let's say I want to add search on any part of the numbers like currently i > could be able to find "1237888", "1239000" with a query like "123". let's > say i wanted to find "9000123" and "8812388" too. how one could go about the > scores calculations > Thanks
> To post to this group, send email to redis-db@googlegroups.com. > To unsubscribe from this group, send email to > redis-db+unsubscribe@googlegroups.com. > For more options, visit this group at > http://groups.google.com/group/redis-db?hl=en.
<josiah.carl...@gmail.com> wrote: > Sadly, those aren't available with this method without some work and > more space. Basically, for however long your digits are, you need to > rotate them around, and store information about how much you rotated > them. For the earlier example, we'll add a new step 4.
> 1. Take your id as a sequence of integers... > 726156 -> [7, 2, 6, 1, 5, 6]
> 2. Append -1 values to the right until you have 10 digits (technically > enough for ids of up to 33 bits)... > [7, 2, 6, 1, 5, 6, -1, -1, -1, -1]
> 3. Add 1 to all of your digits... > [8, 3, 7, 2, 6, 7, 0, 0, 0, 0]
> 4. Left-shift your digits incrementally until you find a '0' digit at > the front, and remember how many times you rotated it for each: > 0, [8, 3, 7, 2, 6, 7, 0, 0, 0, 0] > 1, [3, 7, 2, 6, 7, 0, 0, 0, 0, 8] > 2, [7, 2, 6, 7, 0, 0, 0, 0, 8, 3] > 3, [2, 6, 7, 0, 0, 0, 0, 8, 3, 7] > 4, [6, 7, 0, 0, 0, 0, 8, 3, 7, 2] > 5, [7, 0, 0, 0, 0, 8, 3, 7, 2, 6] > 6, [0, 0, 0, 0, 8, 3, 7, 2, 6, 7] <- stop and ignore this one
> 5. Again consider those as 4 bit portions of a 40 bit score, but > create your members with information about rotations a little > differently...
> To search for a '615' anywhere in the string, we generate the same > start/stop endpoints: > 0x72600000 and 0x726fffff. Performing the range scan, we get 726156_2 > -> 0x7267000083, which is the 2nd rotation of id 726156.
> Unfortunately, you lose 1 hex digit in the process, because you *need* > at least 1 trailing 0 in the zero-rotation version to only get exactly > what you want when you perform your scan. On the other hand, if you > also do a quick 'is this string in that string' check for everything > returned by the zrangebyscore call, you can get that last digit back > and can use all 40 bits for information.
> This does potentially increase the amount of space used by a factor of > 10x, but that all depends on your data.
> Regards, > - Josiah
> On Thu, Feb 16, 2012 at 3:05 AM, Salimane Adjao Moustapha > <m...@salimane.com> wrote: >> Let's say I want to add search on any part of the numbers like currently i >> could be able to find "1237888", "1239000" with a query like "123". let's >> say i wanted to find "9000123" and "8812388" too. how one could go about the >> scores calculations >> Thanks
>> To post to this group, send email to redis-db@googlegroups.com. >> To unsubscribe from this group, send email to >> redis-db+unsubscribe@googlegroups.com. >> For more options, visit this group at >> http://groups.google.com/group/redis-db?hl=en.