Using 'redis-benchmark -q' on my $200 Core 2 Duo box:
PING (inline): 109890.11 requests per second
PING: 111111.11 requests per second
MSET (10 keys): 67114.09 requests per second
SET: 113636.37 requests per second
GET: 113636.37 requests per second
INCR: 112359.55 requests per second
LPUSH: 114942.53 requests per second
LPOP: 113636.37 requests per second
SADD: 111111.11 requests per second
SPOP: 112359.55 requests per second
LPUSH (again, in order to bench LRANGE): 113636.37 requests per second
LRANGE (first 100 elements): 66225.17 requests per second
LRANGE (first 300 elements): 32362.46 requests per second
LRANGE (first 450 elements): 23696.68 requests per second
LRANGE (first 600 elements): 18903.59 requests per second
Writing a Python script (which uses hiredis via the standard Python
redis library) to do ZINCRBY/ZRANGE and SET/LPUSH/LRANGE (for
comparison):
ZINCRBY: 71232.23 per second
ZRANGE (first 100): 23803.95 per second
ZRANGE (first 300): 9206.23 per second
ZRANGE (first 450): 5957.89 per second
ZRANGE (first 600): 4268.74 per second
SET: 92908.21 per second
LPUSH: 91704.67 per second
LRANGE (first 100): 23810.03 per second
LRANGE (first 300): 9200.13 per second
LRANGE (first 450): 6082.21 per second
LRANGE (first 600): 4317.51 per second
There seems to be a very slight performance penalty to using ZRANGE
over LRANGE, and a 25% penalty to using ZINCRBY over SET. The Python
client is significantly slower for the [Z|L]RANGE commands, I suspect
because of reply parsing.
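To make the parsing cost concrete, here is a minimal sketch of parsing a RESP multi-bulk reply (the wire format LRANGE/ZRANGE responses use) in pure Python. It handles only flat multi-bulk replies, no error or nested types, just to show the per-element work a pure-Python client has to do that hiredis does in C:

```python
def parse_multibulk(buf):
    """Parse b'*N\r\n' followed by N bulk strings into a list of bytes.

    Simplified sketch: assumes a complete, well-formed flat multi-bulk
    reply (no nil bulks, no nested arrays, no error replies).
    """
    assert buf[:1] == b'*'
    head, _, rest = buf.partition(b'\r\n')
    count = int(head[1:])                # number of bulk strings
    items = []
    for _ in range(count):
        assert rest[:1] == b'$'
        head, _, rest = rest.partition(b'\r\n')
        n = int(head[1:])                # length of this bulk string
        items.append(rest[:n])
        rest = rest[n + 2:]              # skip the data and its trailing \r\n
    return items

reply = b'*3\r\n$1\r\na\r\n$1\r\nb\r\n$5\r\nhello\r\n'
print(parse_multibulk(reply))            # [b'a', b'b', b'hello']
```

Every element costs a search, an int conversion, and two slices in Python, which adds up quickly for a 600-element LRANGE.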
As for comparing it to MySQL's ORDER BY using a secondary
index: if the index is already created, MySQL can get away with
perhaps a half-dozen random disk IOs or fewer for almost any volume
of data. That is roughly 60 ms on the upper end, though as quick as
under 1 ms if the right regions of the disk are cached (with similar
times for SSDs under memory pressure). Assuming that your Redis zset
has been incrementally updated, Redis should also be able to respond
in under 1 ms for any reasonable ZRANGE query. If you can fit your
data in memory for Redis, and you can force MySQL to keep its index in
memory, Redis vs. MySQL will probably be similar in performance. But
what you gain with Redis is a new way of thinking about your data and
a new tool to solve your problems.
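The "incrementally updated" point is the key one: a zset keeps itself sorted on every write, so a range read never pays for a sort. A toy pure-Python stand-in (the names zadd/zrange are just illustrative; real Redis uses a skip list plus a hash table):

```python
import bisect

# Toy stand-in for an incrementally maintained zset: keep (score, member)
# pairs sorted as they arrive, so a range query never has to re-sort.
ranking = []                             # sorted list of (score, member)

def zadd(score, member):
    # O(log n) to find the slot (the list insert itself shifts elements;
    # a skip list avoids that shifting cost).
    bisect.insort(ranking, (score, member))

def zrange(start, stop):
    # Range read is a plain slice of already-sorted data.
    return [m for _, m in ranking[start:stop + 1]]

for score, member in [(3, 'c'), (1, 'a'), (2, 'b')]:
    zadd(score, member)
print(zrange(0, 1))                      # ['a', 'b']
```

With MySQL the secondary index plays the same role; the difference is only where the sorted structure lives and what it costs to reach it.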
What problem are you really trying to solve?
> Also in the documentation it states "A complex object in memory like a
> list or a set is something 10 times bigger than the same object
> serialized on disk." Does that also mean a bunch of zsets in Redis
> takes up 10 times more memory than an equivalent table (using
> secondary index for sorting) in MySQL?
The Redis documentation is primarily referring to the representation
in the dump file, which is compressed. You can think of it
(conceptually) like a gzipped MySQL binlog. Essentially useless by
itself, but you can feed it into MySQL and have a database that is
consistent up to the point the binlog was created. If you were to
create an in-memory table in MySQL, you would be able to compare
memory usage directly between the two.
Regards,
- Josiah
--
You received this message because you are subscribed to the Google Groups "Redis DB" group.
To post to this group, send email to redi...@googlegroups.com.
To unsubscribe from this group, send email to redis-db+u...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/redis-db?hl=en.
A little-known feature of redis-benchmark, recently introduced by
Pieter Noordhuis, is that you can benchmark any command you want (it
is only available in the Redis unstable branch, but you can use
redis-benchmark from unstable to benchmark Redis 2.4.x):
$ ./redis-benchmark -q -n 100000 zadd sortedset 10 a
zadd sortedset 10 a: 152439.02 requests per second
To compare it with another command:
$ ./redis-benchmark -q -n 100000 set foo bar
set foo bar: 153374.23 requests per second
However, you could object that we are setting the same element again
and again. True... but there is a hidden feature of redis-benchmark
that lets you randomize arguments:
$ ./redis-benchmark -q -r 100000 -n 100000 zadd sortedset 10 ele:rand:000000000000
zadd sortedset 10 ele:rand:000000000000: 104166.67 requests per second
$ redis-cli zcard sortedset
(integer) 63202
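That ZCARD is close to what you'd expect: drawing n times uniformly from a space of r keys yields about r*(1 - (1 - 1/r)^n) distinct keys, which for r = n = 100000 is roughly r*(1 - 1/e):

```python
# Expected number of distinct keys after n uniform draws from r keys.
r = n = 100_000
expected = r * (1 - (1 - 1 / r) ** n)
print(round(expected))                   # 63212 -- close to the observed 63202
```

So roughly a third of the ZADDs hit an existing member, and the rest insert a new one.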
We got lower numbers with this more real-world test, but only because
the sorted set started small and was resized on the way to its final
size, and the hash table gets resized in the process as well. Indeed,
if we run the test again, updating an already existing sorted set
turns out to be almost as fast as plain SET:
$ ./redis-benchmark -q -r 100000 -n 100000 zadd sortedset 10 ele:rand:000000000000
zadd sortedset 10 ele:rand:000000000000: 138888.89 requests per second
However, you may wonder: these are just around 100k elements; what
happens with millions of elements?
$ ./redis-benchmark -q -r 10000000 -n 10000000 zadd sortedset 10 ele:rand:000000000000
zadd sortedset 10 ele:rand:000000000000: 69898.79 requests per second
$ redis-cli zcard sortedset
(integer) 6345866
So creating a 6-million-element sorted set from scratch ran at 70k
insertions per second, but again this includes the overhead of
creating everything. Now let's check the case where our workload is
just updating the scores:
zadd sortedset 10 ele:rand:000000000000: 132694.77
Now let's check incrementing the score of random elements:
zincrby sortedset 10 ele:rand:000000000000: 39511.39
Incrementing random elements is slower, at about 40k ops per second
(not sure why; profiling will probably reveal some trivial
optimization to perform).
What about ZRANK?
zrank sortedset ele:rand:000000000000: 70774.48
And finally, let's fetch a range from the middle of this 6-million-entry sorted set.
$ redis-cli zrange sortedset 3000000 3000010
1) "ele:rand:000005696312"
2) "ele:rand:000005696314"
3) "ele:rand:000005696315"
4) "ele:rand:000005696317"
5) "ele:rand:000005696318"
6) "ele:rand:000005696320"
7) "ele:rand:000005696321"
8) "ele:rand:000005696324"
9) "ele:rand:000005696326"
10) "ele:rand:000005696327"
11) "ele:rand:000005696328"
$ ./redis-benchmark -q -r 10000000 -n 10000000 zrange sortedset 3000000 3000010
zrange sortedset 3000000 3000010: 132718.59
132k/s to fetch the same 10 elements again and again from the middle.
Long story short, sorted sets are fast and stay fast even when they
contain millions of elements.
Cheers,
Salvatore
--
Salvatore 'antirez' Sanfilippo
open source developer - VMware
http://invece.org
"We are what we repeatedly do. Excellence, therefore, is not an act,
but a habit." -- Aristotle
sure, in this case the high IOPS (I/O operations per second) provided
by SSDs are great. we have been running redis diskstore in production
since last summer. our box has 24 GB of RAM plus 200 GB of SSD storage
(4 disks in RAID 10), and this setup works fine:
----------------
$ iostat
Linux 2.6.32-29-server (panama) 01/18/12 _x86_64_ (12 CPU)
avg-cpu: %user %nice %system %iowait %steal %idle
2.86 0.00 5.43 0.37 0.00 91.34
Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 582.52 3239.27 3022.51 2416639950 2254927608
----------------
also helps when you want to concurrently back up the diskstore
files to a second machine via rsync :)
cheers
tim
Thanks for sharing, but I feel users could get confused: diskstore,
the technology Tim is using, is not part of any stable Redis release,
and there are currently no plans to include it.
So I guess the original poster plans to use OS swap. An SSD may help
in this context, but Redis should be optimized to handle swapping
better (optionally using threads to serve queries, with key-level
locking). I have a new SSD that I'll use to run these tests in the
near future; once I have some data, I'll surely share it with you.
As for normal persistence, an SSD helps, but with limited impact:
because Redis always writes data in append-only mode, normal rotating
disks already work very well.
Cheers,
Salvatore
I use non-transactional pipelines, 1000 commands deep, with
pre-generated sequential keys/members. Not only do the keys I generate
hash sequentially (so there are effectively zero collisions in hash
buckets), but the benchmark (IIRC) doesn't use pipelining; it uses 50
concurrent connections to get its throughput. It's possible that
pipelines utilize Redis better than concurrent connections in this
scenario (especially because the responses are uncomplicated and easy
to parse).
Regards,
- Josiah
ZINCRBY: 70304.31 per second
ZRANGE (first 10): 66485.50 per second
ZRANGE (first 100): 24411.03 per second
ZRANGE (first 300): 9207.51 per second
ZRANGE (first 450): 5794.75 per second
ZRANGE (first 600): 4212.64 per second
SET: 92515.85 per second
LPUSH: 90825.36 per second
LRANGE (first 10): 78351.34 per second
LRANGE (first 100): 24739.61 per second
LRANGE (first 300): 9373.92 per second
LRANGE (first 450): 6053.34 per second
LRANGE (first 600): 4193.56 per second
> Also there's about a 3x performance difference between LRANGE using
> redis-benchmark and LRANGE using your python script. Is that because
> redis-benchmark is written in C while your script is in python?
Yes. Parsing speed makes a huge difference.
>> What problem are you really trying to solve?
>
> 2 things:
>
> 1) Q&A system - all answers to a specific question can be represented
> as a sorted set, sorted by the score of each answer. User voting will
> be used to determine the scores.
This is exactly the kind of system that Redis does very well at.
Simple up/down voting works best for this, but if you need something
more complicated, you can store the up/down votes in an associated
hash, then update the score in the sorted set later. I've done the
complicated scoring thing in MySQL before, and it's not too bad.
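The "votes in a hash, derived score in a sorted set" pattern can be modeled with plain dicts as a stand-in (the names vote/top_answers and the score formula are illustrative, not anything from Redis itself):

```python
# Plain-dict stand-in for the pattern: per-answer vote tallies live in a
# "hash" (HINCRBY in Redis), and a derived score per answer lives in a
# "sorted set" (ZADD in Redis) that ranks answers for one question.
votes = {}    # answer_id -> {'up': n, 'down': n}
scores = {}   # answer_id -> score

def vote(answer_id, up=True):
    tally = votes.setdefault(answer_id, {'up': 0, 'down': 0})
    tally['up' if up else 'down'] += 1
    # Recompute the score from the tallies; a fancier formula (e.g. a
    # Wilson score) slots in here without changing the structure.
    scores[answer_id] = tally['up'] - tally['down']

def top_answers(n):
    # In Redis this read is ZREVRANGE; here we just sort the score dict.
    return sorted(scores, key=scores.get, reverse=True)[:n]

vote('a1'); vote('a1'); vote('a2'); vote('a2', up=False)
print(top_answers(2))                    # ['a1', 'a2']
```

The point of keeping raw tallies separate from the ranking score is that you can change the scoring formula later and rebuild the sorted set from the hashes.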
> 2) Tracking - every time a user visits a page, I need to keep track of
> which category that page belongs to. So each user will have a sorted
> set of (category, hits) where hits is the number of times that user
> has visited a page belongs to "category".
You may want to decay those scores over time, so you can see how
their behavior changes. "ZINTERSTORE key 1 key WEIGHTS 0.5" will
halve the scores.
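That ZINTERSTORE trick intersects the sorted set with itself and stores the result back, with every score multiplied by the weight. A pure-Python model of the effect (the dict stands in for the zset):

```python
# Model of "ZINTERSTORE key 1 key WEIGHTS 0.5": rewrite the set with
# every score scaled, so old activity fades a little on each run.
hits = {'news': 40.0, 'sports': 12.0}    # category -> hit score

def decay(zset, weight=0.5):
    return {member: score * weight for member, score in zset.items()}

hits = decay(hits)
print(hits)                              # {'news': 20.0, 'sports': 6.0}
```

Run it from a periodic job and recent visits dominate the ranking while old ones decay geometrically.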
> All your results are for Redis running on a single core, right?
Redis is single-threaded. I don't do anything special to it, but the
Linux scheduler will keep it on a single core.
> I'm trying to decide between MySQL and Redis. Redis looks really good
> so far, but I'm a bit concerned about memory requirements:
> 1) Assuming both are completely in-memory, how does Redis sorted sets
> compared to a equivalent MySQL table in size? Is there any
> significant difference?
MySQL (like most databases) packs data fairly efficiently for on-disk
storage. Its in-memory representation of the same data is essentially
identical, cached at the page/large-chunk level. Redis will use more
memory than either the on-disk or the cached in-memory form in MySQL.
> 2) In the case of MySQL, even if the table grows larger than RAM, I
> could still use an SSD to maintain respectable performance. Is the
> same true for Redis? Or is Redis pretty much unusable once the data
> set grows larger than RAM, even if there's an SSD?
In the case of MySQL, if you structure your updates properly, it will
perform very respectably even on a spinning disk. SSDs will help both
of them, but I've not picked up an SSD for Redis testing (I've got a
Samsung SSD on my personal workstation, and it is amazing).
Regards,
- Josiah
Cheers,
Pieter
> zincrby sortedset 10 ele:rand:000000000000: 39511.39
A few emails ago we were wondering why zincrby was slower than zadd,
so today I used Instruments to profile the two commands.
The difference was in the (bad) benchmarking: with ZADD I was always
adding values with the same score they had previously, and Redis has
an internal optimization to avoid touching the element at all if the
score is already equal to the new one.
Once you start randomizing the score in the ZADD test, the
performance is the same as ZINCRBY.
Further inspection of the profiling data shows that the additional
time when the score changes is mostly spent in zslDelete(), more
specifically on the line "while (x->level[i].forward &&", so these are
clearly cache misses.
Walking the sorted set is not needed when the score does not change,
since we obtain the score via the hash table lookup.
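The short-circuit Salvatore describes can be sketched with a dict plus a sorted list standing in for the hash table and skip list (toy names, and a plain list in place of the real skip list):

```python
import bisect

# Toy zset: a hash (member -> score) plus a sorted (score, member) list.
# ZADD consults the hash first; when the score is unchanged it never
# touches the sorted side at all -- the optimization that made the
# earlier zadd benchmark look faster than zincrby.
by_member = {}                           # hash table side: O(1) lookup
by_score = []                            # sorted side, kept ordered

def zadd(member, score):
    old = by_member.get(member)
    if old == score:
        return False                     # short-circuit: no sorted-side walk
    if old is not None:
        by_score.remove((old, member))   # stands in for zslDelete()
    bisect.insort(by_score, (score, member))
    by_member[member] = score
    return True

zadd('a', 10)
print(zadd('a', 10))                     # False -- same score, nothing to do
print(zadd('a', 11))                     # True  -- score changed, reinsert
```

Only the score-changed path pays for walking and rewriting the ordered structure, which is where the zslDelete() cache misses show up in the profile.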
This is also an indirect hint that the hash table is not just O(1) in
the average case, versus the skip list's O(log N), but also much
better in terms of constant factors. It suggests that we did not make
a mistake in keeping our incremental hash table implementation instead
of switching to a tree-like structure for the things that externally
expose a dictionary-like interface.
Cheers,
Salvatore