Performance of set intersection


mb

unread,
Aug 19, 2010, 10:09:26 PM8/19/10
to Redis DB
Shouldn't sinter in Redis be faster than set intersection in an
interpreted language like Python?

I have three sets:
set1: 50000 members
set2: 50000 members
set3: 50000 members

The intersection of all three sets has 1,528 members.

Using the redis-py client, my machine can sinter these three sets
77 times per second:
L = redis.sinter(['set1', 'set2', 'set3'])

But Python's native set methods can do the same thing 361 times per
second:
inter12 = s1.intersection(s2)
inter123 = inter12.intersection(s3)

I'd expect Redis would be much more efficient than Python, but my
tests show Python is more than 4x faster. Any ideas why this would be?
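For reference, a self-contained sketch of the Python side of this benchmark (the set sizes and overlap are built to match the numbers above; the Redis side would be `r.sinter(['set1', 'set2', 'set3'])` against a local server):

```python
import timeit

# Build three 50,000-member sets sharing exactly 1,500 common members,
# mirroring the benchmark described above.
common = set(range(1500))                       # members in all three sets
s1 = common | set(range(10000, 58500))
s2 = common | set(range(100000, 148500))
s3 = common | set(range(200000, 248500))

def via_python():
    # Two-step intersection with native Python sets.
    return s1.intersection(s2).intersection(s3)

result = via_python()
print(len(s1), len(s2), len(s3), len(result))   # 50000 50000 50000 1500

n = 100
elapsed = timeit.timeit(via_python, number=n)
print("python: %.0f intersections/sec" % (n / elapsed))
```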

Bret A. Barker

unread,
Aug 19, 2010, 11:22:27 PM8/19/10
to redi...@googlegroups.com
My guess would be network overhead/latency. There's a big difference between in-process calls and talking to a server over TCP.
-bret

Josiah Carlson

unread,
Aug 19, 2010, 11:26:48 PM8/19/10
to redi...@googlegroups.com
You need to send the query to the redis server (low bandwidth, but
some network latency), and the server needs to send a response back to
you over the network (moderate bandwidth, more network latency).

In Python, nothing needs to be shipped across the network.

Also, Python sets are intersected internally with C.

- Josiah


mb

unread,
Aug 20, 2010, 12:15:57 AM8/20/10
to Redis DB
@Bret, I'm not sure this explains the difference.

The set intersection is done entirely within Redis. The server is on
localhost, and the only latency would be for the sinter call itself.

If Redis can 'set' and 'get' 50,000 times per second, then the
communication overhead must be under 0.02ms per call. At 77 sinter
calls per second, that 0.02ms per call adds up to only about 1.5ms of
the whole second, so communication can't explain the gap.

@Josiah, good point about Python using C to do the
set.intersection(). I wonder if Python's algorithm is more efficient
than the Redis one. But since Python can only intersect 2 sets at a
time, I'd think Python would still be slower to do it in 2 steps when
Redis can intersect multiple sets in one step.

Are there good benchmarks on Redis set operations? Maybe I'm doing
something wrong.
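(As an aside, the two-step limitation can be avoided on newer Pythons: from 2.6 on, `set.intersection` accepts multiple arguments, much like SINTER takes multiple keys. A small sketch with toy sets standing in for the 50,000-member ones:)

```python
# In Python 2.6+, set.intersection accepts any number of iterables,
# so a three-way intersection needs only one call.
s1 = set([1, 2, 3, 4, 5])
s2 = set([2, 3, 4, 5, 6])
s3 = set([3, 4, 5, 6, 7])

one_step = s1.intersection(s2, s3)              # single call, like SINTER
two_step = s1.intersection(s2).intersection(s3)
assert one_step == two_step == set([3, 4, 5])
```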

Pieter Noordhuis

unread,
Aug 20, 2010, 4:02:16 AM8/20/10
to redi...@googlegroups.com
It doesn't seem to me that you are doing anything wrong. The already
mentioned factors like latency and bandwidth have an obvious impact on
the speed, but so does the process of formatting the reply. You
mention the resulting set has about 1,500 elements. This is a fairly
large reply which needs to be formatted before it is sent over the
network, which comes at a relatively high cost. The implementation of
the set intersection in Redis is as efficient as it can be: sort the
sets to be intersected from small to large, test for every element in
the smallest set whether it is contained in all the other sets, and,
if so, add that element to the reply.
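That strategy can be sketched in a few lines of Python (an illustrative reimplementation of the idea, not Redis's actual C code):

```python
def sinter(*sets):
    """Sketch of the SINTER strategy: sort the input sets by size,
    then test each member of the smallest set against all the others."""
    ordered = sorted(sets, key=len)              # smallest set first
    smallest, rest = ordered[0], ordered[1:]
    reply = []
    for member in smallest:
        if all(member in s for s in rest):       # O(1) hash lookups
            reply.append(member)
    return reply

print(sorted(sinter({1, 2, 3, 4}, {2, 3, 4, 5}, {3, 4, 5, 6})))  # [3, 4]
```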

I would say I more or less expected numbers like these, given the
overhead that comes with using Redis.

Cheers,
Pieter

Alex Dong

unread,
Aug 21, 2010, 1:25:47 AM8/21/10
to Redis DB
Sets are a complicated topic. Many algorithms and data structures have
been invented to provide better performance. Looking at Redis' code,
it doesn't implement any of them: under the hood, Redis uses a hash
table to implement the set. That is a good choice for fast
insert/delete, but not ideal for set operations, nor for space/memory
efficiency. I haven't looked at Python's set implementation, but if
you're after fast set operations, you may be better off implementing
your own or enhancing Redis by adding treap support. [1] As for
parallel intersections, there are further optimizations one can do to
make sure the smallest sets are intersected first and to take
advantage of multiple CPUs, etc. [2]

I know I haven't really answered your question, which would require
some more time digging into the Python 2.4+ source code. I have
similar needs for my own work, so maybe I should take a look at Redis'
source code and see if I can contribute some code back.

Good luck,
Alex

[1]: http://en.wikipedia.org/wiki/Treap
[2]: http://www.cs.cmu.edu/~scandal/treaps.html

Josiah Carlson

unread,
Aug 21, 2010, 3:26:38 PM8/21/10
to redi...@googlegroups.com
Python uses hashes. Its set was originally built on the same hash
table used for Python dictionaries, but it has since been optimized to
remove the 'value' pointer (the algorithm is otherwise the same).

Redis' set intersection is pretty reasonable in terms of operations:
1. order the sets by size
2. for each element in the smallest set, verify that it is in all other sets

Python's set intersection works the same way (on 2 sets), but I think
it wins because of the various tuning parameters developed over the
years (the hashes they are based on are such a core structure for the
language): an open-addressed hash means less code during
intersections, the probe sequence for sequential integers in a
sufficiently-sized hash results in zero collisions, and the hash
entries themselves are smaller (no value pointers), meaning less cache
pressure, ...

Overall, for single-processor systems, the worst-case performance for
hash-based intersections is O(nk) where n is the size of the smallest
set, and k is the number of sets. For single-processor systems,
tree-based intersections won't actually help you, because instead of
O(1) lookups, you either get O(log(n)) lookups (naively), or best-case
scenario with finger-trees is you get the same O(nk) overall
performance as set-based intersections.

The parallel algorithm you linked to for treaps is very interesting,
but Redis is single-threaded, not an EREW PRAM machine. Incidentally,
the use of a treap in their algorithm is somewhat inconsequential;
with O(p*k*log(n)) work in advance on sorted arrays (B+Trees can be
used equivalently if you want trees), you can get O(nk/p) performance
for EREW PRAM.

Treaps are a great structure to learn about; I've got a Python
implementation that doesn't actually use extra memory for the
priority. Sadly, I've never had a need to use it.
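(Josiah's no-extra-memory trick isn't spelled out here, but one common way to avoid storing a per-node priority is to derive it from a hash of the key. A minimal treap sketch assuming that approach; this is an illustration, not his actual implementation:)

```python
import hashlib

class Node:
    __slots__ = ("key", "left", "right")
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def priority(key):
    # Derive the heap priority deterministically from the key itself,
    # so no per-node priority field is needed (an assumed variant of
    # the trick mentioned above).
    return int(hashlib.md5(repr(key).encode()).hexdigest(), 16)

def rotate_right(n):
    l = n.left
    n.left, l.right = l.right, n
    return l

def rotate_left(n):
    r = n.right
    n.right, r.left = r.left, n
    return r

def insert(root, key):
    """Treap insert: BST order on key, max-heap order on priority(key)."""
    if root is None:
        return Node(key)
    if key == root.key:
        return root
    if key < root.key:
        root.left = insert(root.left, key)
        if priority(root.left.key) > priority(root.key):
            root = rotate_right(root)
    else:
        root.right = insert(root.right, key)
        if priority(root.right.key) > priority(root.key):
            root = rotate_left(root)
    return root

def contains(root, key):
    while root is not None:
        if key == root.key:
            return True
        root = root.left if key < root.key else root.right
    return False

root = None
for k in [5, 1, 9, 3, 7]:
    root = insert(root, k)
assert contains(root, 7) and not contains(root, 4)
```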

- Josiah
