In Python, nothing needs to be shipped across the network.
Also, Python sets are intersected internally with C.
- Josiah
> --
> You received this message because you are subscribed to the Google Groups "Redis DB" group.
> To post to this group, send email to redi...@googlegroups.com.
> To unsubscribe from this group, send email to redis-db+u...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/redis-db?hl=en.
>
>
I would say I would have more or less expected such numbers given the
overhead that using Redis comes with.
Cheers,
Pieter
Redis' set intersection is pretty reasonable in terms of operations:
1. order the sets by size
2. for each element in the smallest set, verify that it is in all other sets
Python's set intersection works the same way (on 2 sets), but I think
it wins because of the various tuning parameters developed over the
years (because the hashes on which they are based are such a core
structure for the language): open addressed hash for less code during
intersections, the probe sequence for sequential numbers in a
sufficiently-sized hash results in 0 collisions, the hash itself is
smaller (due to no value pointers) for less cache pressure, ...
Overall, for single-processor systems, the worst-case performance for
hash-based intersections is O(nk) where n is the size of the smallest
set, and k is the number of sets. For single-processor systems,
tree-based intersections won't actually help you, because instead of
O(1) lookups, you either get O(log(n)) lookups (naively), or best-case
scenario with finger-trees is you get the same O(nk) overall
performance as set-based intersections.
The parallel algorithm you linked to for treaps is very interesting,
but Redis is single-threaded, not an EREW PRAM machine. Incidentally,
the use of a treap in their algorithm is somewhat inconsequential;
with O(p*k*log(n)) work in advance on sorted arrays (B+Trees can be
used equivalently if you want trees), you can get O(nk/p) performance
for EREW PRAM.
Treaps are a great structure to learn about; I've got a Python
implementation that doesn't actually use extra memory for the
priority. Sadly, I've never had a need to use it.
- Josiah