Hi Redisers,
I have a question regarding the SCAN function: What is the purpose of its random nature? Is it because the keys in Redis are stored as a set, so that it is not possible to create a deterministic SCAN order?
I use Redis in a research project as intermediate storage. The key space is quite big (> 2 million keys), and what I want is for N processes (on different machines) to each retrieve a certain part of the keys for further processing. I exploited Redis's sequential request handling and created the following Lua script, which performs partial scans and stores the state (cursor) on the Redis server:
local db = ARGV[1] or 11
local count = ARGV[2] or 100
local s = 0
redis.call('SELECT', '1')
if redis.call('EXISTS', 'scan_cursor') == 1 then
  s = redis.call('GET', 'scan_cursor')
end
redis.call('SET', 'scan_cursor', s + count)
redis.call('SELECT', db)
if s + count > redis.call('DBSIZE') then
  return nil
else
  local result = redis.call('SCAN', s, 'COUNT', count)
  return redis.call('MGET', unpack(result[2]))
end
Any process can continue the scan, which is fine for my application.
However, as the documentation of the SCAN function says, a SCAN can return items twice. This renders the SCAN function unusable for me. My current solution takes all keys and puts them in a separate list to achieve the same effect. The drawback is that I need to copy the whole key space, which takes up most of the space in the database.
Is there any other way to accomplish what I want? If nobody alters the state in the DB during the scan operation, then it would be great to have a more deterministic response.
I would be happy to hear some thoughts on that.
Cheers,
Sebastian
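For readers following along, here is a minimal client-side sketch of the usual way to iterate with SCAN: pass back exactly the cursor the server returned (it is an opaque token, not an offset that can be advanced by `count`), stop when the returned cursor is 0, and filter duplicates on the client. The names `scan_all_unique` and `fake_scan` are made up for illustration, and `fake_scan` is only a stand-in for a server so that the sketch runs on its own; it uses offsets internally purely to stay short.

```python
from typing import Callable, Iterator, List, Tuple

# Hypothetical signature: scan(cursor, count) -> (next_cursor, keys),
# mirroring the two-element reply of the SCAN command.
ScanFn = Callable[[int, int], Tuple[int, List[str]]]


def scan_all_unique(scan: ScanFn, count: int = 100) -> Iterator[str]:
    """Yield every key once by following the cursor that SCAN returns."""
    seen = set()
    cursor = 0
    while True:
        # Hand back the cursor exactly as received from the previous call.
        cursor, keys = scan(cursor, count)
        for key in keys:
            if key not in seen:  # SCAN may report a key more than once
                seen.add(key)
                yield key
        if cursor == 0:  # a returned cursor of 0 ends the iteration
            break


def fake_scan(cursor: int, count: int) -> Tuple[int, List[str]]:
    """Stand-in for a server: 10 keys, with deliberate duplicates."""
    keys = ["key%d" % i for i in range(10)]
    batch = keys[cursor:cursor + count]
    if cursor > 0:
        batch = [keys[cursor - 1]] + batch  # repeat the previous key
    nxt = cursor + count
    return (0 if nxt >= len(keys) else nxt), batch


unique_keys = list(scan_all_unique(fake_scan, count=4))
```

With the redis-py client, the `scan` argument could be something like `lambda c, n: r.scan(cursor=c, count=n)`. Note that the `seen` set lives in one process and grows with the key space, so this sketch does not by itself split the keys across N machines.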
--
You received this message because you are subscribed to the Google Groups "Redis DB" group.
To unsubscribe from this group and stop receiving emails from it, send an email to redis-db+u...@googlegroups.com.
To post to this group, send email to redi...@googlegroups.com.
Visit this group at http://groups.google.com/group/redis-db.
For more options, visit https://groups.google.com/groups/opt_out.
Sample output looks like so:
[value82, value97, value83, value25, value43, value45, value84, value75, value11, value59, value85, value15, value50, value32, value40, value96, value94, value44, value86, value23]
[value44, value86, value23, value3, value37, value58, value92, value26, value8, value38, value89, value60, value6, value29, value28, value81, value21, value1, value69, value52]
[value85, value15, value50, value32, value40, value96, value94, value44, value86, value23, value3, value37, value58, value92, value26, value8, value38, value89, value60, value6]
[value89, value60, value6, value29, value28, value81, value21, value1, value69, value52, value17, value55, value70, value93, value13, value16, value47, value4, value65, value64]
[value25, value43, value45, value84, value75, value11, value59, value85, value15, value50, value32, value40, value96, value94, value44, value86, value23, value3, value37, value58]
The order seems deterministic, but the starting point does not. I'm also pretty sure that SRANDMEMBER is being used during the SCAN, because set calls no longer work afterwards.
I did not say that it is called by my script, but some non-deterministic operation is used by SCAN, which is why a write operation fails afterwards: the scripting assumptions are violated (which I know). I thought I had read somewhere that SCAN internally calls SRANDMEMBER. If it doesn't, then never mind.
All I'm saying is that this non-deterministic operation might be the reason for the non-deterministic output that I'm seeing. Is that correct?
What is this non-deterministic operation?
Does it have to do with the rehashing that you mentioned?
Why would Redis rehash if no new entries are being created?
Can you please explain the internals using the simple example that I gave above? The program only inserts 100 keys, so there should not be too much for Redis to do, and the internal state should just stay the same.
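As background on the internals asked about here: as far as I know, SCAN walks the buckets of the key hash table with a "reverse binary" cursor, where the increment is applied to the bit-reversed value. The sketch below is a simplified model under stated assumptions (a single table of 2**bits buckets, no rehash in progress); the function names are made up for illustration and are not taken from the Redis source.

```python
def reverse_bits(value: int, bits: int) -> int:
    """Reverse the lowest `bits` bits of `value`."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def next_cursor(cursor: int, bits: int) -> int:
    """Advance a cursor over a table of 2**bits buckets.

    The increment is applied to the bit-reversed cursor, so the carry
    moves from the highest bit towards the lowest.
    """
    mask = (1 << bits) - 1
    reversed_cursor = reverse_bits(cursor & mask, bits)
    return reverse_bits((reversed_cursor + 1) & mask, bits)


def cursor_sequence(bits: int) -> list:
    """List the cursors visited in one full pass, starting from 0."""
    order = [0]
    cursor = next_cursor(0, bits)
    while cursor != 0:
        order.append(cursor)
        cursor = next_cursor(cursor, bits)
    return order


print(cursor_sequence(3))  # [0, 4, 2, 6, 1, 5, 3, 7]
```

In this model the buckets are visited in a fixed but non-sequential order, so a cursor computed as `s + count` does not mean "the next `count` keys". That could explain batches that overlap, as in the sample output above.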
I'm really thankful for your comments but I just can't figure out a use case for a non-deterministic SCAN function.
I do understand that a single SCAN over some keys can be useful but then why would I want this cursor concept?
I could just start another SCAN without any cursor and hope to get different values this time.
I think what I do not understand hides behind the following of your lines:
"SCAN is a function of several things, none of which are necessarily consistent on a fully up-to-date slave, a master replaying an AOF, or even two masters identically configured that were started up with two identical RDBs"
I don't have any distributed redis setup. I barely have a single redis server running. Therefore, it ought to be possible in that case to provide some more guarantees.
However, I do understand that it might be too much to explain every detail here. Maybe I find some time to implement what I envision into redis and share it with you.
On Mon, Jan 13, 2014 at 12:18 AM, Sebastian Ertel <sebasti...@gmail.com> wrote:
I think what I do not understand hides behind the following of your lines:
"SCAN is a function of several things, none of which are necessarily consistent on a fully up-to-date slave, a master replaying an AOF, or even two masters identically configured that were started up with two identical RDBs"
I don't have any distributed redis setup. I barely have a single redis server running. Therefore, it ought to be possible in that case to provide some more guarantees.
Actually no. If you were to execute BGSAVE on your server and copy the resulting RDB to another server, then if you were able to execute your script on both servers at the same time, you would have different results. This is part of what the current semantics are intending to prevent.
However, I do understand that it might be too much to explain every detail here. Maybe I find some time to implement what I envision into redis and share it with you.
If you are talking about making writing after a SCAN call allowed, I can guarantee that Salvatore is going to reject your proposal for exactly the reason I mentioned earlier in this email.
Why do you want to call SCAN? What do you hope to do with the results?