Account Options

  1. Sign in
The old Google Groups will be going away soon, but your browser is incompatible with the new version.
Google Groups Home
« Groups Home
search through the values of a set
There are currently too many topics in this group that display first. To make this topic appear first, remove this option from another topic.
There was an error processing your request. Please try again.
flag
  11 messages - Collapse all  -  Translate all to Translated (View all originals)
The group you are posting to is a Usenet group. Messages posted to this group will make your email address visible to anyone on the Internet.
Your reply message has not been sent.
Your post was successful
 
From:
To:
Cc:
Followup To:
Add Cc | Add Followup-to | Edit Subject
Subject:
Validation:
For verification purposes please type the characters you see in the picture below or the numbers you hear by clicking the accessibility icon. Listen and type the numbers you hear
 
Salimane Adjao Moustapha  
View profile  
 More options Jan 4 2012, 9:37 am
From: Salimane Adjao Moustapha <m...@salimane.com>
Date: Wed, 4 Jan 2012 06:37:53 -0800 (PST)
Local: Wed, Jan 4 2012 9:37 am
Subject: search through the values of a set
Hi
Basically I have a set containing values i would like to search
through. The set already exists. when a user enter a query like "123"
and the set contains ['4333', '1233','1133','1236'], i should be able
to return ['1233','1236']. I should also be able to provide the number
of returned results.
Thanks

 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Josiah Carlson  
View profile  
 More options Jan 4 2012, 12:17 pm
From: Josiah Carlson <josiah.carl...@gmail.com>
Date: Wed, 4 Jan 2012 09:17:25 -0800
Local: Wed, Jan 4 2012 12:17 pm
Subject: Re: search through the values of a set
Sets don't have that ability. You can make it work with a zset, and
here's how...

First, every item in your zset should have a score of 0. I'm going to
make a temporary copy of the zset so that we don't have any race
conditions, but with work, you can address that. Below is the code to
solve your problem using a zset...

def data_range(conn, data, prefix):
    if ord(prefix[-1]) == 255:
        # we don't want to deal with having to carry
        end = prefix + 20*'\xff'
    else:
        end = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    conn.zinterstore('temporary', data)
    r1 = conn.zadd('temporary', 0, prefix)
    r2 = conn.zadd('temporary', 0, end)
    b = conn.zrank('temporary', prefix)
    e = conn.zrank('temporary', end)
    # if you just want the count...
    # count = e + 1 - b - r1 - r2
    results = conn.zrange('temporary', b + r1, e - r2)
    conn.delete('temporary')
    return len(results), results

Redis scripting can offer a similar solution without copying and
without worrying about race conditions. You can also do a similar
thing just using the scores to store the prefixes of the values, which
would allow you to just use zrangebyscore without needing to
add/remove items or copy the zset, but it is limited to 53 bits
naively, and almost 64 bits if you know what you are doing.

Regards,
 - Josiah

On Wed, Jan 4, 2012 at 6:37 AM, Salimane Adjao Moustapha


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Salimane Adjao Moustapha  
View profile  
 More options Jan 4 2012, 9:57 pm
From: Salimane Adjao Moustapha <m...@salimane.com>
Date: Wed, 4 Jan 2012 18:57:24 -0800 (PST)
Local: Wed, Jan 4 2012 9:57 pm
Subject: Re: search through the values of a set

Hi,
thanks for the reply. since in this particular context, I'm dealing with
integer values in the set. I think I'll go with using a zset and setting
the score of each value as the value itself. then as you said using zrangebyscore.
the max value of the value is 2147483647 making it 31 bits.
out of curiosity, any suggestion on how to go about it with redis scripting
Thanks


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Josiah Carlson  
View profile  
 More options Jan 4 2012, 11:37 pm
From: Josiah Carlson <josiah.carl...@gmail.com>
Date: Wed, 4 Jan 2012 20:37:57 -0800
Local: Wed, Jan 4 2012 11:37 pm
Subject: Re: search through the values of a set
With Redis scripting you would use what I offered earlier (without
making a copy of the zset), only you would write it in Lua, send it
off to Redis, and run it there. Redis 2.6 is the first version with
official scripting support.

In terms of using the score and searching that way... your original
post suggested that you wanted to do prefix lookups. If you still want
to do prefix lookups and are having problems figuring out how to do
it, reply to this thread and I'll point you in the right direction.

Regards,
 - Josiah

On Wed, Jan 4, 2012 at 6:57 PM, Salimane Adjao Moustapha


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Salimane Adjao Moustapha  
View profile  
 More options Jan 5 2012, 12:16 am
From: Salimane Adjao Moustapha <m...@salimane.com>
Date: Wed, 4 Jan 2012 21:16:33 -0800 (PST)
Local: Thurs, Jan 5 2012 12:16 am
Subject: Re: search through the values of a set

Hi,
yep i should have prefix look up. any suggestion would be appreciated.
Thanks


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Josiah Carlson  
View profile  
 More options Jan 5 2012, 2:11 am
From: Josiah Carlson <josiah.carl...@gmail.com>
Date: Wed, 4 Jan 2012 23:11:20 -0800
Local: Thurs, Jan 5 2012 2:11 am
Subject: Re: search through the values of a set
1. Take your id as a sequence of integers...   726156 -> [7, 2, 6, 1,
5, 6]2. Append -1 values to the right until you have 10 digits
(technically enough for ids of up to 33 bits)...   [7, 2, 6, 1, 5, 6,
-1, -1, -1, -1]3. Add 1 to all of your digits...   [8, 3, 7, 2, 6, 7,
0, 0, 0, 0]4. Consider each of these digits as a 4-bit portion of a
40-bit score...   score = 0x8372670000

To find the ids with a given prefix, you need to calculate a start/end
score for performing ZRANGEBYSCORE.1. Calculate the "start" as the
score method above.2. Calculate "end" by replacing all trailing '0'
digits with hex 'f' (technically 'b' is sufficient, and the only '0'
digits should be at the right side of the score).
Looking at an example, if we wanted to find ids with prefixes of 726,
we would generate a "start" score of 0x8370000000. We would also
generate an "end" score of 0x837fffffff. Our example id from before
has a prefix of 726, but also has a score of 0x8372670000, which falls
inside the range that we generated.
In Python:
def score(id):    # step 1    id = map(int, str(id))    # step 2
id.extend((10 - len(id)) * [-1])    # step 3 and 4    return
int(''.join(i+1 for i in id), 16)
def end_score(score):    # get the hex digits back    id =
hex(score)[2:].rstrip('L')    # replace the trailing 0 with 'f'
return int(id.replace('0', 'f'), 16)
Regards,
 - Josiah

On Wed, Jan 4, 2012 at 9:16 PM, Salimane Adjao Moustapha


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Josiah Carlson  
View profile  
 More options Jan 5 2012, 2:14 am
From: Josiah Carlson <josiah.carl...@gmail.com>
Date: Wed, 4 Jan 2012 23:14:30 -0800
Local: Thurs, Jan 5 2012 2:14 am
Subject: Re: search through the values of a set
Ouch... my formatting got eaten... Let me fix that for you...

1. Take your id as a sequence of integers...
726156 -> [7, 2, 6, 1, 5, 6]

2. Append -1 values to the right until you have 10 digits (technically
enough for ids of up to 33 bits)...
[7, 2, 6, 1, 5, 6, -1, -1, -1, -1]

3. Add 1 to all of your digits...
[8, 3, 7, 2, 6, 7, 0, 0, 0, 0]

4. Consider each of these digits as a 4-bit portion of a 40-bit score...
score = 0x8372670000

To find the ids with a given prefix, you need to calculate a start/end
score for performing ZRANGEBYSCORE.

1. Calculate the "start" as the score method above.
2. Calculate "end" by replacing all trailing '0' digits with hex 'f'
(technically 'b' is sufficient, and the only '0' digits should be at
the right side of the score).

Looking at an example, if we wanted to find ids with prefixes of 726,
we would generate a "start" score of 0x8370000000. We would also
generate an "end" score of 0x837fffffff. Our example id from before
has a prefix of 726, but also has a score of 0x8372670000, which falls
inside the range that we generated.

In Python:
def score(id):
    # step 1
    id = map(int, str(id))
    # step 2
    id.extend((10 - len(id)) * [-1])
    # step 3 and 4
    return int(''.join(i+1 for i in id), 16)

def end_score(score):
    # get the hex digits back
    id = hex(score)[2:].rstrip('L')
    # replace the trailing 0 with 'f'
    return int(id.replace('0', 'f'), 16)

On Wed, Jan 4, 2012 at 11:11 PM, Josiah Carlson


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Salimane Adjao Moustapha  
View profile  
 More options Jan 5 2012, 3:13 am
From: Salimane Adjao Moustapha <m...@salimane.com>
Date: Thu, 5 Jan 2012 00:13:59 -0800 (PST)
Local: Thurs, Jan 5 2012 3:13 am
Subject: Re: search through the values of a set

hi
no problem for the formatting...thanks very much for this, i was doing
another way but this was much more cleaner.
Thanks again


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Salimane Adjao Moustapha  
View profile  
 More options Feb 16 2012, 6:05 am
From: Salimane Adjao Moustapha <m...@salimane.com>
Date: Thu, 16 Feb 2012 03:05:41 -0800 (PST)
Local: Thurs, Feb 16 2012 6:05 am
Subject: Re: search through the values of a set

Let's say  I want to add search on any part of the numbers like currently i
could be able to find "1237888", "1239000" with a query like "123". let's
say i wanted to find "9000123" and "8812388" too. how one could go about
the scores calculations
Thanks


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Josiah Carlson  
View profile  
 More options Feb 16 2012, 11:45 am
From: Josiah Carlson <josiah.carl...@gmail.com>
Date: Thu, 16 Feb 2012 08:45:59 -0800
Local: Thurs, Feb 16 2012 11:45 am
Subject: Re: search through the values of a set
Sadly, those aren't available with this method without some work and
more space. Basically, for however long your digits are, you need to
rotate them around, and store information about how much you rotated
them. For the earlier example, we'll add a new step 4.

1. Take your id as a sequence of integers...
726156 -> [7, 2, 6, 1, 5, 6]

2. Append -1 values to the right until you have 10 digits (technically
enough for ids of up to 33 bits)...
[7, 2, 6, 1, 5, 6, -1, -1, -1, -1]

3. Add 1 to all of your digits...
[8, 3, 7, 2, 6, 7, 0, 0, 0, 0]

4. Left-shift your digits incrementally until you find a '0' digit at
the front, and remember how many times you rotated it for each:
0, [8, 3, 7, 2, 6, 7, 0, 0, 0, 0]
1, [3, 7, 2, 6, 7, 0, 0, 0, 0, 8]
2, [7, 2, 6, 7, 0, 0, 0, 0, 8, 3]
3, [2, 6, 7, 0, 0, 0, 0, 8, 3, 7]
4, [6, 7, 0, 0, 0, 0, 8, 3, 7, 2]
5, [7, 0, 0, 0, 0, 8, 3, 7, 2, 6]
6, [0, 0, 0, 0, 8, 3, 7, 2, 6, 7] <- stop and ignore this one

5. Again consider those as 4 bit portions of a 40 bit score, but
create your members with information about rotations a little
differently...

726156_0 -> 0x8372670000
726156_1 -> 0x3726700008
726156_2 -> 0x7267000083
726156_3 -> 0x2670000837
726156_4 -> 0x6700008372
726156_5 -> 0x7000083726

To search for a '615' anywhere in the string, we generate the same
start/stop endpoints:
0x72600000 and 0x726fffff. Performing the range scan, we get 726156_2
-> 0x7267000083, which is the 2nd rotation of id 726156.

Unfortunately, you lose 1 hex digit in the process, because you *need*
at least 1 trailing 0 in the zero-rotation version to only get exactly
what you want when you perform your scan. On the other hand, if you
also do a quick 'is this string in that string' check for everything
returned by the zrangebyscore call, you can get that last digit back
and can use all 40 bits for information.

This does potentially increase the amount of space used by a factor of
10x, but that all depends on your data.

Regards,
 - Josiah

On Thu, Feb 16, 2012 at 3:05 AM, Salimane Adjao Moustapha


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Josiah Carlson  
View profile  
 More options Feb 16 2012, 11:48 am
From: Josiah Carlson <josiah.carl...@gmail.com>
Date: Thu, 16 Feb 2012 08:48:18 -0800
Local: Thurs, Feb 16 2012 11:48 am
Subject: Re: search through the values of a set
Also, technically, when you rotate your values, you don't need to
carry the data over to the right side, you could leave it as trailing
0's...

0, [8, 3, 7, 2, 6, 7, 0, 0, 0, 0]
1, [3, 7, 2, 6, 7, 0, 0, 0, 0, 0]
2, [7, 2, 6, 7, 0, 0, 0, 0, 0, 0]
3, [2, 6, 7, 0, 0, 0, 0, 0, 0, 0]
4, [6, 7, 0, 0, 0, 0, 0, 0, 0, 0]
5, [7, 0, 0, 0, 0, 0, 0, 0, 0, 0]
6, [0, 0, 0, 0, 0, 0, 0, 0, 0, 0] <- stop and ignore this one

And you stop when you run out of positive digits on the left. Then you
always get what you want with your zrangebyscore query.

Regards,
 - Josiah

On Thu, Feb 16, 2012 at 8:45 AM, Josiah Carlson


 
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
End of messages
« Back to Discussions « Newer topic     Older topic »