What you describe doesn't seem terribly IO intensive to me, but you
also haven't given the number of sadd calls you are performing per
second, the number of smembers calls per second, the average size of
your sets, the memory use on each of your shards, etc.
For me, the first thing I ask is "are my operations limited by Redis,
or by something else?" If under a pure write scenario you are able to
push 50k-150k operations per second to a single node (and 250k-750k
per second to your cluster of 5 nodes), then the problem is not with
your code; Redis is probably at its current limit. If
you are sustaining that rate, then there may be small incremental
improvements (depending on what the hardware could sustain). On the
other hand, if you are only able to perform 10k operations per second,
then there are a lot of efficiencies to be gained.
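To answer the "limited by Redis, or by something else" question, it helps to put a number on it. Here is a minimal sketch in Python that times a tight loop of calls. `measure_ops_per_second` is a hypothetical helper name, not part of any client library, and `operation` is whatever single call you want to time.

```python
import time


def measure_ops_per_second(operation, count=100_000):
    """Call operation(i) `count` times and return the observed calls/second."""
    start = time.perf_counter()
    for i in range(count):
        operation(i)
    elapsed = time.perf_counter() - start
    # Guard against a zero reading on very coarse clocks.
    return count / elapsed if elapsed > 0 else float("inf")
```

With redis-py, `operation` could be something like `lambda i: r.sadd("bench:set", i)`, where `r` is a client connected to one shard (the key name is made up for illustration). Comparing the result against the 50k-150k per node range above tells you which case you are in.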
> Now for some info on my redis setup and usage.
> I only use set operations -- sadd, smembers for reads & writes. Redis is
> setup over a small cluster and I use ShardedJedis for communicating with all
> redis instances. I use pipelining wherever possible (i.e. wherever I know
> that couple of keys go to the same shard). On a 5-node cluster, there are
> about 400k keys on each node.
> I have done a bit of testing. Writes take a lot of time compared to reads
> and when I use pipelining for writes, it is almost always 4 times faster
> than without pipelining (this is in the case of 2 nodes, with 1 node
> reading & writing to the other node).
What kind of write throughput are you getting with your pipelined
writes? 10k/second? 20k/second? 50k/second? 100k/second?
> After that experiment, I thought of having a global pipeline across all
> writes in the application -- maintain a queue for each shard and put each
> write operation as an entry in the queue. Now, instead of calling sadd, I
> would call my own method (say psadd) which would put this call in
> appropriate queue. After the queue has reasonable number of entries (how
> many is reasonable?), I would do a pipeline for all the entries.
> I found that reads are pretty quick, but would it help to do something
> similar for reads as well?
What does "pretty quick" mean? Faster than your smaller pipelines that
you had been creating and executing (probably one at a time, waiting
for up to 5 round trips)?
Generally when I am doing automatic pipelining and I don't care about
the results (I just want the commands to be sent), I will typically
batch them up in chunks of 1000-5000. Even with an older non-hiredis
accelerated version of redis-py, I can usually sustain 50k
writes/second without much difficulty with that setup.
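A rough sketch of that kind of automatic batching, written against a redis-py style client (anything with `pipeline()`, `sadd()` and `execute()`). The class name, `psadd` (borrowed from the question), and the `pick_shard` function are all assumptions for illustration; how keys map to shards depends on your own sharding scheme.

```python
from collections import defaultdict


class BufferedSetWriter:
    """Queue SADD calls per shard and send each full queue as one pipeline."""

    def __init__(self, shards, pick_shard, batch_size=1000):
        self.shards = shards          # list of redis-py style clients, one per shard
        self.pick_shard = pick_shard  # hypothetical: maps a key to an index in shards
        self.batch_size = batch_size
        self.pending = defaultdict(list)

    def psadd(self, key, member):
        idx = self.pick_shard(key)
        queue = self.pending[idx]
        queue.append((key, member))
        if len(queue) >= self.batch_size:
            self.flush(idx)

    def flush(self, idx):
        queue = self.pending.pop(idx, [])
        if not queue:
            return
        # transaction=False: plain pipelining, no MULTI/EXEC wrapper.
        pipe = self.shards[idx].pipeline(transaction=False)
        for key, member in queue:
            pipe.sadd(key, member)
        pipe.execute()  # results are ignored; we only care that they were sent

    def flush_all(self):
        for idx in list(self.pending):
            self.flush(idx)
```

Remember to call `flush_all()` on shutdown (or on a timer), otherwise the last partial batch for each shard never gets sent.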
In terms of reads, you may be able to do the same and get some
performance, but if your sets are even moderately sized (say 100 or
1000 elements or larger), you may not realize as much of a gain as you
would expect, though it really all depends on the network behavior
between your two machines. Obviously, you should test.
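For the read side, the same idea might look like the sketch below: batch the `smembers` calls for one shard into pipelines and pair the replies back up with their keys. `pipelined_smembers` and `chunked` are hypothetical helper names, and `client` is assumed to be a redis-py style client for a single shard.

```python
def chunked(items, size):
    """Yield consecutive slices of `items`, each at most `size` long."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def pipelined_smembers(client, keys, batch_size=1000):
    """Fetch many sets from one shard, one round trip per batch of keys."""
    results = {}
    for batch in chunked(list(keys), batch_size):
        pipe = client.pipeline(transaction=False)
        for key in batch:
            pipe.smembers(key)
        # execute() returns replies in the same order the commands were queued.
        for key, members in zip(batch, pipe.execute()):
            results[key] = members
    return results
```

With large sets the reply size dominates, so a smaller `batch_size` may be the better trade; that is something to measure rather than assume.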
> I am hoping that this would speed up the application quite a bit. Are there
> any other commonly used techniques for speeding up IO operations?
Pipelining, using clients with a good protocol parser/driver (if Jedis
uses hiredis, this is probably the case), making sure to locally cache
stuff when reasonable, ... that's it off the top of my head for
general "this is how you make it better".
Regards,
- Josiah
There may be a secondary redis connection that takes a path to a unix
domain socket (that's what some other libraries do). If one is not
available, then it would seem that Jedis hasn't implemented unix
domain socket support yet.
> 2) Can I use sockets even in a cluster setting?
You mean unix domain sockets? No. You need to use standard TCP/IP
sockets for remote connections; unix domain sockets are local machine
only.
> 3) Are there any optimum number of parallel connections? With a single
> connection, the performance is pretty low compared to 50 connections?
The benchmark application doesn't use pipelining, it uses multiple
connections. The optimal number of connections depends on your
operations, your processor, your network behavior, etc. You would
need to benchmark to find your sweet spot.
> 4) Would use of tcmalloc & compiling redis with 32bit option increase the
> performance?
No.
If you have done as much as you can to pipeline your calls to Redis,
and you are still getting poor performance, then it sounds like Jedis
just isn't all that fast. Could you paste a bit of your code so that
those with experience using Jedis might be able to point out any
potential slowdowns?
Regards,
- Josiah
In terms of caching, I was mostly referring to a local cache *inside*
your application. That would save any/all round-trips for entries if
you are okay with potentially stale data.
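A minimal sketch of such an in-process cache, assuming a fixed time-to-live is an acceptable bound on staleness. `LocalSetCache` is a hypothetical name, and `fetch` stands for whatever function actually reads from Redis (for example a wrapper around `smembers`).

```python
import time


class LocalSetCache:
    """Keep recently read values in process memory for up to ttl_seconds."""

    def __init__(self, fetch, ttl_seconds=5.0, clock=time.monotonic):
        self.fetch = fetch      # hypothetical: key -> value, does the real Redis read
        self.ttl = ttl_seconds
        self.clock = clock      # injectable so expiry can be tested without sleeping
        self.entries = {}       # key -> (expires_at, value)

    def get(self, key):
        now = self.clock()
        hit = self.entries.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]       # fresh enough: no round trip at all
        value = self.fetch(key)
        self.entries[key] = (now + self.ttl, value)
        return value

    def invalidate(self, key):
        """Drop a key, e.g. right after this process writes to it."""
        self.entries.pop(key, None)
```

This sketch never evicts entries other than by overwrite or `invalidate`, so with a large keyspace you would want a size limit as well.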
- Josiah