redis cluster and bulk loading?

Dan Young

Jun 4, 2014, 9:27:03 PM

to redi...@googlegroups.com

I've been using the Redis protocol to load data into Redis via redis-cli, and it works great. I'm checking out Redis Cluster, and it appears that the same way of doing mass insertion via the protocol isn't working with redis-cli -c --pipe-timeout 0 --pipe. Is there support, or planned support, for this?

Regards,

Dan

Matt Stancliff

Jun 4, 2014, 10:30:36 PM

to redi...@googlegroups.com

On Jun 4, 2014, at 9:27 PM, Dan Young <dano...@gmail.com> wrote:

> I've been using the Redis protocol to load data into Redis via the redis-cli, and it works great. I'm checking out Redis Cluster and it appears that the same way of doing mass inserting via the protocol isn't working with redis-cli -c --pipe-timeout 0 --pipe. Is there support, or planned support for this ?

Using that method, you can only import keys that map to the one server you’re targeting. If you pipe an entire unfiltered dataset, any key whose slot belongs to a different node can’t be set there: the target replies with a MOVED error for it instead of storing it.

The -c mode of redis-cli won’t help you here because the -c redirect is done per-request, but --pipe is bulk-sending your data to the server. Regular bulk loading --pipe operation is incompatible with Cluster unless you pre-select only keys owned by the Cluster instance you’re importing against.

But, there is a (partial) solution! Check out the cluster import function added in beta4: redis-trib import --from host:port cluster_host:cluster_port

The cluster import mechanism isn’t exactly the same as your bulk import process because you already need your data in live Redis instances. You could always take a cheap approach of bulk loading into a local Redis instance, importing the instance to the cluster (without it joining the cluster), emptying the instance, then repeating until your entire data set is populated in the cluster.

If you need bulk import of data files directly into a cluster, you’d need to write a tiny utility program to send each key through a Cluster-aware client, or you could partition your bulk source data by which server should receive it. To still import using --pipe on cluster instances, you can:

1. Pre-determine the mapped Cluster slot for each of your import keys. Each key maps to one of 16384 slots, and those slots are (typically evenly) distributed among your Cluster master instances; see the key_to_slot function in redis-trib.rb.
2. Look up the matching master instance for each slot (command: CLUSTER NODES).
3. Bundle your keys together by which master instance owns their slots.
4. Run the pipe import against each master, with a stream containing only keys that map to slots owned by that master.

You’d probably want to pre-sort your import keys by target server before formatting your source data to Redis protocol for the pipe stream.
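As a rough illustration of that partitioning, here is a minimal Python sketch, not the redis-trib code itself. It assumes the Cluster slot rule is CRC16 (XMODEM variant) of the key modulo 16384, hashing only the part inside a non-empty {hash tag} when one is present. The helper names key_slot, resp_command and bucket_by_master, and the slot_owner mapping (slot number to "host:port"), are made up for the example.

```python
import binascii
from collections import defaultdict

SLOT_COUNT = 16384  # number of hash slots in Redis Cluster


def key_slot(key: bytes) -> int:
    """Return the Cluster hash slot for a key (CRC16/XMODEM mod 16384)."""
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        # Only a non-empty {tag} narrows what gets hashed.
        if end != -1 and end != start + 1:
            key = key[start + 1:end]
    # binascii.crc_hqx with initial value 0 is the XMODEM CRC16.
    return binascii.crc_hqx(key, 0) % SLOT_COUNT


def resp_command(*args: bytes) -> bytes:
    """Encode one command in the Redis protocol, as consumed by --pipe."""
    out = [b"*%d\r\n" % len(args)]
    for arg in args:
        out.append(b"$%d\r\n%s\r\n" % (len(arg), arg))
    return b"".join(out)


def bucket_by_master(pairs, slot_owner):
    """Group SET commands by the master that owns each key's slot.

    pairs: iterable of (key, value) byte strings.
    slot_owner: hypothetical mapping from slot number to "host:port".
    Returns {"host:port": protocol bytes for that master}.
    """
    buckets = defaultdict(list)
    for key, value in pairs:
        owner = slot_owner[key_slot(key)]
        buckets[owner].append(resp_command(b"SET", key, value))
    return {owner: b"".join(cmds) for owner, cmds in buckets.items()}
```

Each value in the returned dictionary could be written to its own file and piped to the matching master with redis-cli --pipe. The sketch only emits SET commands; other data types would need their own commands in the stream.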

For that cluster bulk load process, we assume your cluster is stable between when your bucketed-by-server pipe files are generated and when you go to load the files. Any additional instances joining or leaving your cluster will alter the slot distribution and invalidate the instance ownership in your bulk partitions. Any cluster master->replica failovers won’t change your bucketed slot mappings, but a failover would change which server you should be importing against.
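Since the slot ownership has to be read from the cluster right before loading, the lookup step could be sketched like this in Python. It assumes the CLUSTER NODES text has already been fetched (for example with redis-cli cluster nodes) and that each line follows the layout: id, address, flags, master id, ping-sent, pong-received, config epoch, link state, then slot ranges. The function name slot_owners is made up for the example.

```python
def slot_owners(cluster_nodes_text: str) -> dict:
    """Map each assigned slot to the address of the master serving it."""
    owners = {}
    for line in cluster_nodes_text.splitlines():
        fields = line.split()
        if len(fields) < 9:
            continue  # no slots listed: a replica or an empty master
        flags = fields[2].split(",")
        if "master" not in flags or "fail" in flags:
            continue  # skip replicas and masters marked as failed
        # Newer Redis versions append "@cluster-bus-port" to the address.
        address = fields[1].split("@")[0]
        for spec in fields[8:]:
            if spec.startswith("["):
                continue  # slot mid-migration, not a stable assignment
            first, _, last = spec.partition("-")
            for slot in range(int(first), int(last or first) + 1):
                owners[slot] = address
    return owners
```

Regenerating this mapping just before each load, and comparing it with the one used to build the pipe files, is one way to notice that the slot distribution changed or that a failover moved a bucket to a different server.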

Hopefully that added more clarity than confusion. If anything is unclear, please ask more questions. :)

-Matt

Dan Young

Jun 5, 2014, 5:08:59 PM

to redi...@googlegroups.com

Thanks Matt, let me digest this and get back to you. Thanks for giving me some direction.