--
You received this message because you are subscribed to the Google Groups "Redis DB" group.
To unsubscribe from this group and stop receiving emails from it, send an email to redis-db+u...@googlegroups.com.
To post to this group, send email to redi...@googlegroups.com.
Visit this group at http://groups.google.com/group/redis-db.
For more options, visit https://groups.google.com/groups/opt_out.
On Monday, January 13, 2014 5:10:38 PM UTC+1, Josiah Carlson wrote:
Hi Josiah,
Thank you for your interest; yes, we’re using Python. I’m implementing a grid showing the data and status of a list of elements (probably ranging from 50K to 100K elements). I need to support filtering, pagination, and ordering.
So my idea is:
· Have a hash with the data for each element. Use the element’s id as the member in the other sets and zsets.
· For filtering, depending on column type:
o For integer fields (up to 15 digits): a zset with the score packed at 4 bits per digit. This is similar to the “autocomplete” scheme, because I need to support “starts with” / “ends with” filtering.
Why not just use the integer itself? Do you need any integers greater than 2**52?
Because I need “starts/ends with” filtering, I need lexicographical order. That’s why I treat integers like strings with a reduced character set. I don’t know if there is a better approach for this.
If so, I might suggest this bit of code to help: https://gist.github.com/josiahcarlson/8459874
Thank you! I was not aware of this conversion; I’m sure it’s going to be useful in other cases.
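For concreteness, Alan’s 4-bits-per-digit packing could be sketched like this (a hypothetical helper, not code from the thread; it assumes at most 13 digits so the packed value stays within the 2**52 integer range a double zset score can represent exactly):

```python
def digit_prefix_score(digits, max_digits=13):
    """Pack a digit string into a zset score, 4 bits per digit,
    left-aligned so strings sharing a prefix share their high bits
    and therefore sort lexicographically. Digit d is stored as d + 1
    so that an absent position (0) sorts before any real digit."""
    digits = digits[:max_digits]
    score = 0
    for ch in digits:
        score = (score << 4) | (int(ch) + 1)
    score <<= 4 * (max_digits - len(digits))  # pad short strings
    return score
```

With scores built this way, a “starts with” filter becomes a ZRANGEBYSCORE between the score of the prefix and the score of the prefix extended with the maximum digit in every remaining position.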
o For string fields: a zset with the score computed over a reduced set of valid characters, also for “starts with” / “ends with” filtering.
o For enumerated value fields (for instance “status”: {ok, error, unknown}): one set per state.
o After filtering each column, some ZUNIONSTORE/ZINTERSTORE magic to obtain a temporary zset with the result.
· For ordering and paging:
o A sort operation on the resulting zset using offset and count.
· Reuse that temporary zset until the user changes filters or the session times out.
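To make the combine-and-page step concrete, here is the plan modeled in plain Python (dicts stand in for zsets; a plain set such as the “status” index would be a zset where every member scores 1; the names and shapes are illustrative, not rom’s API):

```python
def zinterstore(*zsets):
    """Model of ZINTERSTORE with the default SUM aggregation: keep
    only members present in every input, summing their scores."""
    common = set(zsets[0])
    for z in zsets[1:]:
        common &= set(z)
    return {m: sum(z[m] for z in zsets) for m in common}

def zrange_page(zset, offset, count):
    """Model of paging with ZRANGE: ascending by (score, member),
    then a slice of `count` members starting at `offset`."""
    ordered = sorted(zset, key=lambda m: (zset[m], m))
    return ordered[offset:offset + count]
```

In Redis itself the intersection would be stored under a temporary key with an EXPIRE matching the session timeout, so repeated page requests reuse it.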
I’ve been testing your rom and it looks quite good: cleaner and (from my point of view) better than other similar Python libraries. It could be extended to support “autocomplete” filtering and different score functions depending on data type (integers, strings, Unicode strings); I think that would be nice functionality to have.
Autocomplete has been on my list of things to do for several months now. I've held off because I don't like ZSET score-based autocompletes due to the prefix limit (10 upper/lower + decimal, or 12 lowercase, or 6 full unicode characters), and I don't like requiring Lua scripting to use the library. But I suppose with Redis 2.6 having been out for 15 months now, and Redis 2.8 even 2 months old, I should get over it and just add the code from the book (modified to participate in a query chain, of course).
Good news!
But I also need to support some wildcard search (something like al?n*), and that’s why I was trying to use ZSCAN in Lua. In the end I made a test using a Lua script to perform the ZSCAN loop and call ZADD for the results.
I'm going to guess you mean that you used ZRANGE and not ZSCAN.
No, I meant one Lua script for the ZSCAN loop, returning data to the client, which then sends it to a second Lua script issuing the ZADD commands. I’ve just realized that looping over ZSCAN inside a Lua script defeats the whole purpose of its iterative/cursor approach, since I’m going to block Redis during that loop just like KEYS does.
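The client-driven alternative — advancing the cursor from the client so each ZSCAN call stays short and Redis is never blocked — can be modeled like this (plain Python, with fnmatch standing in for the MATCH option and a list standing in for the zset; a sketch of the idea, not production code):

```python
from fnmatch import fnmatch

def client_side_scan(members, pattern, batch=500):
    """Model of a client-driven ZSCAN loop: consume the keyspace in
    small batches (one 'cursor step' per batch) and keep only the
    members matching the ?/* wildcard pattern."""
    hits = []
    for cursor in range(0, len(members), batch):
        for m in members[cursor:cursor + batch]:
            if fnmatch(m, pattern):
                hits.append(m)
    return hits
```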
The performance over 100K elements varies from 150 ms to 2.5 s depending on the selectivity of the filter (1* or *1* both return 90K elements, while 123* returns 72 and 12?3* even fewer). So, if the filter is “starts/ends with” I’ll use zset filtering, and if it’s “*1*” or uses the ? wildcard I plan to use ZSCAN.
That's not a bad plan. Though I will admit, those wildcard queries would be horribly nasty without using Lua's built-in regular-expression pattern matching with a little pattern pre-processing.
I don’t know exactly what you mean by your last statement about regular expressions and pattern pre-processing, but it brings me another idea: to search for ‘al?n*’ I could use a Lua script filtering with ZRANGE or ZRANGEBYSCORE on ‘al’ and then parse the results with Lua patterns looking for ‘?n’, returning the data. The worst case is patterns like ‘?lan*’ or ‘*lan*’, which amount to a “full table scan” in SQL terms.
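The “pattern pre-processing” presumably means translating the grid’s ?/* wildcards into a Lua pattern before passing it to the script. A hypothetical Python helper (not code from the thread):

```python
LUA_MAGIC = "^$()%.[]*+-?"  # characters Lua patterns treat specially

def glob_to_lua_pattern(glob):
    """Translate ?/* wildcards into an anchored Lua pattern:
    ? -> . (any one character), * -> .* (any run), and Lua magic
    characters escaped with %."""
    out = ["^"]
    for ch in glob:
        if ch == "?":
            out.append(".")
        elif ch == "*":
            out.append(".*")
        elif ch in LUA_MAGIC:
            out.append("%" + ch)
        else:
            out.append(ch)
    out.append("$")
    return "".join(out)
```

The Lua script would then call string.match(member, pattern) on each candidate member.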
Your opinion about the filtering (and the whole scheme) is appreciated.
I think you've got a pretty good plan. If I can find some time in the next week or so, I'll squeeze autocomplete on prefix/suffix and even the pattern matching into rom. I know that wasn't your intent, but those are useful tools to have available.
It’ll be really useful not only to use it directly but also to learn from your code. Thank you for your help.
Hello Josiah,
I have been testing your rom and I am delighted with it; it is just what I was looking for.
In the timing tests I obtained the following results:
Populate: 100000 records -- time: 77.495 s
Filters:
result (1*)    --> records: 9294  -- time: 0.065 s
result (*1)    --> records: 9092  -- time: 0.067 s
result (*1*)   --> records: 60306 -- time: 0.538 s
result (1?)    --> records: 8163  -- time: 0.060 s
result (?1*)   --> records: 16411 -- time: 0.228 s
result (*1?2*) --> records: 10508 -- time: 0.202 s
result (123*)  --> records: 23    -- time: 0.002 s
result (*123)  --> records: 24    -- time: 0.003 s
The results are awesome; they are even better than the ones I got with the scan commands. I have also tested the result cache and found it very useful. However, I think rom could be even better if the sort could be performed on numbers in addition to the sorted strings implemented right now. I mean, if I have to sort numbers, I would like to get a numerically sorted result, but I cannot perform the operations (startswith, endswith, ...) on integer types.
Besides, I have also missed the possibility of filtering with “OR” logic, because right now I can only combine filters with “AND” logic.
I am considering using rom and I think it would be very convenient to add these improvements. I hope these comments are useful for your future development. Thank you very much, Josiah; you have helped me a lot.
Regards,
Alan Perd.
By the way, I am going to buy your book. :)
Hello Josiah,
I’m sorry, maybe I didn’t explain myself properly. I meant that the startswith and endswith implementations satisfy the requirements of my development, and that’s great.
For now I’m going to follow your advice and create another numeric field to do the ordering once the filter results are obtained.
On the other hand, what I meant by “OR” logic is this: I can chain different types of queries, for example:
RomTestPSP.query.endswith(col='123').startswith(col='5').cached_result(30)
But let’s suppose that what I really want is the elements whose column starts with 123 or ends with 5. Is it possible to do this kind of filter with rom?
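The OR combination described here corresponds to ZUNIONSTORE, just as the AND case corresponds to ZINTERSTORE. Modeled in plain Python (dicts standing in for zsets; illustrative only, not rom’s API):

```python
def zunionstore(*zsets):
    """Model of ZUNIONSTORE with the default SUM aggregation: keep
    members present in any input, summing scores where they repeat."""
    result = {}
    for z in zsets:
        for member, score in z.items():
            result[member] = result.get(member, 0) + score
    return result
```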
Thank you very much in advance.
I hope you have a nice vacation.
Kind regards.
Alan.
The book is coming... :)
On Thursday, January 9, 2014 5:37:15 PM UTC+1, Alan Perd wrote: