SCAN command: is there an upper bound on duplicates for a particular key?

571 views
Skip to first unread message

Motti

unread,
Jan 29, 2014, 12:33:22 PM1/29/14
to redi...@googlegroups.com
The SCAN (and ZSCAN etc.) command can return duplicate keys.

From my reading of the code, comments (https://github.com/antirez/redis/blame/unstable/src/dict.c#L663) and pull request (https://github.com/antirez/redis/pull/579) it appears that key duplication happens when the hash table shrinks mid-SCAN. AFAICS it looks like keys are only duplicated in the subsequent SCAN call, but not at any point.

For a particular key, will it only be duplicated in the very next SCAN call, or could it be duplicated in any future SCAN call?

Also will only one key be duplicated, or might a few keys be duplicated?

e.g. is this the only possible pattern of SCAN returns

Say there are 26 keys "A", "B", "C", "D", "E", ... "W", "X",  "Y", "Z"

> SCAN 0
1) "4"
2) 1) "F"
   2) "B"
   3) "H"
> SCAN 4 
1) "14"
2) 1) "M"
   2) "A"
   3) "Q"

and then we pause. In the meanwhile let's say keys M-Z are deleted so hashtable halves (or worse).
We then continue the scan, e.g "SCAN 14" - I thought the worse that can happen is that let's say "M" & "A" get returned again, but after that they will never be returned again (except in the very next call if the hashtable shrinks again immediately!).

In other words, in order to prevent duplicates being returned, I could first check them against the previous SCAN result and exclude duplicates, and that would be enough. Is this correct?


Pieter Noordhuis

unread,
Jan 29, 2014, 8:09:51 PM1/29/14
to redi...@googlegroups.com
Hi,

This is an interesting question.

To analyze it, let's observe what happens to the cursor when
iterating. Assume a hash table of size 2^N. On every call the next
cursor is determined by incrementing the bit-wise reverse (while
observing the size of the hash table, being 2^N). For a hash table of
size 8, the cursor sequence is [0, 4, 2, 6, 1, 5, 3, 7], or in binary
[000, 100, 010, 110, 001, 101, 011, 111]. As you can see, for a hash
table with a stable size, the most significant bit gets flipped on
every call. This means that the cursor switches between the range [0,
2^(N-1)-1] and [2^(N-1), 2^N-1], or: the first half and the second
half. When the hash table is shrunk by half to size 2^(N-1), the most
significant bit of the hashes of the elements is removed and
effectively the second half of the original range is overlaid on top
of the first half of the original range. If the cursor was pointing to
the first half before this happened, we cannot observe duplicates
since it will not be revisiting slots it has already visited. If the
cursor was pointing to the second half before this happened, we may
observe duplicates since the slot it is pointing to will be merged
with with the slot in the first half of the table *that the cursor
visited in the previous call*. Think about the hash table of size 8:
shrinking it to 4 means that the following slots will be merged:
(0,4), (2,6), (1,5), and (3,7). If the cursor points to slot 2, the
algorithm will just continue where it left off regardless of the
resize operation. If the cursor points to slot 6, the algorithm *must*
revisit slot 2 to ensure all elements are emitted, even while this can
cause elements that originally were in slot 2 to be re-emitted. So
far, it looks like duplicate elements can only be emitted in a
subsequent call to SCAN. However...

This works for the case where you shrink a hash table from size 2^N to
2^(N-1). However, it breaks down when you shrink the hash table from
2^N to 2^(N-2) or further. Instead of worrying about the first half
and second half of the hash table, we need to worry about 4 quarters
of the hash table. If your cursor points to quarter #0, the algorithm
will continue where it left off and the user will not see any
duplicates. If the cursor points to quarter #2, the algorithm may
return elements already seen from quarter #0. If the cursor points to
quarter #1, the algorithm may return elements already seen from
quarters #0 and #2. If the cursor points to quarter #3, the algorithm
may return elements already seen from quarters #0, #2, and #1. It is
relatively easy to see how you can extrapolate to shrinking a hash
table from 2^N to 2^(N-M). The larger the factor by which the hash
table is shrunk, the more results from calls to SCAN you need to
retain to filter duplicates on the client side. In fact, it seems that
you would need to retain 2^M-1 results.

Since any time can pass between invocations to SCAN, M can be greater
than 1. To get to a state where you don't need to filter for
duplicates, you would need to make sure all active cursors point to
the first half before shrinking it down further. This requires keeping
state on the server and can cause independent cursors to be blocked on
each other. While certainly possible, I don't think this is a good
idea given the lightweight nature of the current implementation.

To answer your other question: hash collisions can cause multiple
elements to be present in the same hash slot, so multiple elements can
be duplicated.

I hope this clarifies a few things.

Cheers,
Pieter
> --
> You received this message because you are subscribed to the Google Groups
> "Redis DB" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to redis-db+u...@googlegroups.com.
> To post to this group, send email to redi...@googlegroups.com.
> Visit this group at http://groups.google.com/group/redis-db.
> For more options, visit https://groups.google.com/groups/opt_out.

Motti

unread,
Jan 30, 2014, 6:48:14 PM1/30/14
to redi...@googlegroups.com
Thanks Pieter, that's very clear.

Just to state the obvious but keys that are removed mid-scan will never appear again though. It's only keys that still exist that may get duplicated.

Ambar Sarkar

unread,
Sep 13, 2021, 3:24:13 PM9/13/21
to Redis DB
"For a particular key, will it only be duplicated in the very next SCAN call, or could it be duplicated in any future SCAN call?:

Was this question ever answered? I am not sure I saw any confirmation of this. 

Any updates will be appreciated.
Thanks!
-Ambar

Reply all
Reply to author
Forward
0 new messages