Firstly, I disagree with the assertion (stated in words and mathematically) that pipelining cuts RTT in half: it does much, much better than that. It means you only pay *one* RTT in total - most easily considered as "the last one", since most socket libraries buffer at the NIC and deal with that anyway - allowing you to keep writing.
If I had to guess, the difference you are seeing is *packet fragmentation*. Obviously I don't have your test to repro, but it sounds consistent with the approach you are taking. If your "single command" approach is also "single packet", then yes - that will be less effective on networks with high MTUs. What you could investigate there as a quick test is simply: batching; rather than sending one command each iteration, send 10 in a batch (this can still be pipelined). Obviously this would depend on your client library supporting pipelined batching!
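Purely as an illustration of that kind of batching, here's a minimal sketch using redis-py (an assumption - your client library and its API will differ, and the batch size of 10 is arbitrary); the point is simply that each pipeline.execute() pushes several commands into one network write:

```python
import redis

# Minimal sketch of batched pipelining, assuming redis-py and a local server;
# the workload and batch size here are made up and worth tuning for your test.
r = redis.Redis(host="localhost", port=6379)

BATCH_SIZE = 10
keys = [f"key:{i}" for i in range(1000)]  # hypothetical workload

for start in range(0, len(keys), BATCH_SIZE):
    pipe = r.pipeline(transaction=False)  # plain pipelining, no MULTI/EXEC
    for key in keys[start:start + BATCH_SIZE]:
        pipe.set(key, "value")
    # One round trip (and typically far fewer packets) per batch of 10 commands,
    # rather than one per command.
    results = pipe.execute()
```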
Marc
Just to clarify - by "packet fragmentation" here, I simply mean: sending lots and lots of tiny packets, rather than a smaller number of larger packets. One packet per command is pretty inefficient in terms of bandwidth, but will get a faster initial response than if you wait for more work - obviously, the server can't process what you haven't sent it yet! Clearly that isn't important when pipelining, because you *don't care about* (in terms of latency) the response to the first message; you care more about the response to the *last*.
For comparison, what SE.Redis does (one of the .NET clients) is: commands are always sent to a dedicated writer thread. There is always some tiny delay while the writer thread wakes up, which means there might *already* be multiple things in the pipe waiting to be sent. It then additionally writes to a buffered stream, which automatically sends to the socket whenever it hits the chosen size. Whenever the writer runs out of things to write, it does a very short spin-wait; if there still isn't anything to do, it sends the short packet anyway - but the idea is that *when under heavy load* it biases towards packets with multiple messages in them (possibly even nicely full packets, where "full" simply represents our arbitrary send size).
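To show the shape of that idea, here is a rough Python sketch of a dedicated writer thread with buffered flushing (SE.Redis itself is C# and considerably more involved; the class name, buffer size and spin-wait duration below are all invented assumptions):

```python
import queue
import socket
import threading
import time

# Rough sketch of the "dedicated writer thread + buffered flush" pattern:
# callers enqueue payloads, the writer coalesces whatever is already waiting,
# and only flushes a short packet once the queue goes quiet.

FLUSH_SIZE = 8 * 1024      # flush once the buffer reaches this many bytes (arbitrary)
SPIN_WAIT_SECS = 0.0002    # brief pause before giving up and sending a short packet


class PipelinedWriter:
    def __init__(self, sock: socket.socket):
        self._sock = sock
        self._outbox: "queue.Queue[bytes]" = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def send(self, payload: bytes) -> None:
        # Callers never touch the socket directly; they just enqueue work.
        self._outbox.put(payload)

    def _run(self) -> None:
        buffer = bytearray()
        while True:
            buffer += self._outbox.get()  # block until there is something to send
            # Drain anything else that arrived while we were waking up, so heavy
            # load naturally produces packets with multiple messages in them.
            while len(buffer) < FLUSH_SIZE:
                try:
                    buffer += self._outbox.get_nowait()
                except queue.Empty:
                    time.sleep(SPIN_WAIT_SECS)  # the "very short spin-wait"
                    if self._outbox.empty():
                        break  # still nothing new: accept the short packet
            self._sock.sendall(buffer)
            buffer.clear()
```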
Marc
The time taken for a command is the sum of:
- client outbound processing
- latency (client-server)
- bandwidth (client-server)
- server-side processing
- latency (server-client)
- bandwidth (server-client)
- client inbound processing
The only thing that pipelining changes is: latency. Rather than paying latency per-request, you *only* pay the latency cost once - because you *don't keep waiting*. All other costs remain the same.
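To put some back-of-envelope numbers on that (every figure below is invented purely for illustration):

```python
# Hypothetical numbers to show why latency dominates: 10,000 commands,
# 1 ms round-trip latency, 0.02 ms of per-command processing/bandwidth cost.
n_commands = 10_000
rtt_ms = 1.0
per_command_other_ms = 0.02  # client + server processing, bandwidth, etc.

unpipelined_ms = n_commands * (rtt_ms + per_command_other_ms)  # pay latency every time
pipelined_ms = rtt_ms + n_commands * per_command_other_ms      # pay latency roughly once

print(unpipelined_ms)  # 10200.0 ms
print(pipelined_ms)    #   201.0 ms
```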
As for your "why is it more expensive" - pipelining does involve additional work, such as keeping a queue of outstanding messages, and often async IO. However, compared to the *huge* cost of latency, that is a good trade. Latency is usually the single biggest cost.
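For what it's worth, that bookkeeping is small. A sketch of what "a queue of outstanding messages" amounts to (hypothetical names, Python rather than .NET, and relying on the fact that Redis answers requests in order):

```python
import asyncio
from collections import deque

# Illustrative sketch of the pending-reply bookkeeping a pipelined client needs:
# because replies arrive in request order, each incoming reply completes the
# oldest unanswered request. Names here are assumptions, not any real client.


class OutstandingQueue:
    def __init__(self) -> None:
        self._pending: "deque[asyncio.Future]" = deque()

    def on_request_sent(self) -> asyncio.Future:
        # The caller awaits this future; it completes when the matching reply arrives.
        fut = asyncio.get_running_loop().create_future()
        self._pending.append(fut)
        return fut

    def on_reply_received(self, reply: object) -> None:
        # FIFO matching: the first unanswered request gets the next reply.
        self._pending.popleft().set_result(reply)
```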
I haven't looked at how your rig is structured, but done properly, pipelining should be very, very efficient. If you saturate the network with tiny packets, that benefit will be diminished somewhat.
Does that make sense?
Marc