Multi-region TiKV sharding/scheduling

James Hartig

Apr 17, 2018, 11:19:23 PM

to TiDB user group

I'm looking into TiKV as a key-value store for cross-regional data. I was wondering if it's possible to control the scheduler so that pre-determined regions of data prefer to have their leader in a particular region. I see that I can specify region (which I'll refer to as DC from here on to prevent confusion with data regions), zone, rack, etc. at the TiKV level and the PD level, but I don't see anything in the docs about customizing the scheduler.

Our use case is storing sessions in a multi-DC TiKV cluster. We don't need anything besides key-value, so I don't think we'll be using TiDB, if that matters. Our session keys currently have the DC as a prefix, for example us-central1-<randomhash>, so ideally we could use that prefix to pre-determine the regions and then tell PD to prefer leaders in a particular DC, so that the servers in us-central1 are preferred as leaders of the ranges that hold us-central1* keys. If there's a better way to do this that doesn't involve prefixes, anything is an option at this point; we're still in the discovery phase. We currently have separate clusters and we'd like to move to a single global cluster so that a DC going down doesn't wipe out all of the sessions in that DC, but we'd like to avoid (as much as possible) the performance hit of a session's leader being on the other side of the world.

Thanks!

tl

Apr 18, 2018, 5:08:05 AM

to TiDB user group

Hi James

If you just want to put all leaders in one IDC, you can set a label restriction.

But if you want some leaders in one IDC and other leaders in another IDC, I guess the Namespace feature can help you, but you need to add your own Namespace logic to PD.

You can bind Namespaces to different IDCs, e.g., set N1 to IDC1 and N2 to IDC2, then define a rule for scheduling regions, e.g., if the region's start_key begins with us-central1, the region leader must be in N1. The simplest approach is to match on the key prefix; otherwise, you may need a more complex rule.
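To make the rule above concrete, here is a minimal sketch in Python. It is not PD code (PD is written in Go), and every name in it (NAMESPACE_TO_IDC, PREFIX_TO_NAMESPACE, preferred_leader_idc, the europe-west1- prefix) is made up for illustration. It only shows the idea of matching a region's start_key against a prefix and looking up the IDC bound to the matching namespace.

```python
# Hypothetical sketch: none of these names come from PD.
from typing import Optional

# Assumed binding of namespaces to IDCs (N1 -> IDC1, N2 -> IDC2).
NAMESPACE_TO_IDC = {"N1": "IDC1", "N2": "IDC2"}

# Assumed prefix rules: key prefix -> namespace.
PREFIX_TO_NAMESPACE = {
    b"us-central1-": "N1",
    b"europe-west1-": "N2",
}


def namespace_for_region(start_key: bytes) -> Optional[str]:
    """Return the namespace whose prefix matches the region's start_key, if any."""
    for prefix, namespace in PREFIX_TO_NAMESPACE.items():
        if start_key.startswith(prefix):
            return namespace
    return None


def preferred_leader_idc(start_key: bytes) -> Optional[str]:
    """Return the IDC that should hold the leader, or None if no rule matches."""
    namespace = namespace_for_region(start_key)
    if namespace is None:
        return None
    return NAMESPACE_TO_IDC.get(namespace)
```

This sketch looks only at start_key, as in the rule described above. A region is a key range, so a real rule would also have to decide what to do with a region whose range spans more than one prefix.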

For now you would have to add your own logic and rebuild PD, which is not something we would recommend in the long run. Maybe we can extract the namespace feature and make it pluggable, or maybe we can embed Lua to let users define their own logic.

On Wednesday, April 18, 2018 at 11:19:23 AM UTC+8, James Hartig wrote:

tl

Apr 18, 2018, 6:14:02 AM

to TiDB user group

@menglong

Can you explain in detail how to develop a Namespace?

@James

You can browse the PD code and we can discuss this in the PD repo later.

On Wednesday, April 18, 2018 at 5:08:05 PM UTC+8, tl wrote:

James Hartig

Apr 18, 2018, 11:08:36 PM

to TiDB user group

Thanks for the information! Hopefully I can get some pointers, but in the meantime I'll take a look at the PD code and follow up with more questions. Googling hasn't really gotten me anywhere so far, so I appreciate any help I can get!

Menglong Huang

Apr 18, 2018, 11:48:29 PM

to TiDB user group

As @tl mentioned, PD does not currently support the advanced replica placement strategy you described. However, it does have some related features that may help:

namespace

A namespace contains a subset of the cluster's regions and stores. We can specify different configuration and scheduling policies for each namespace. PD uses a NamespaceClassifier to classify regions into namespaces. Currently we implement the table namespace classifier, which classifies regions according to which TiDB SQL table the region's data belongs to. I guess you may want another classifier that is based on key prefixes.

label-property

The label-property is used to set scheduling policy based on labels. Currently we support the reject-leader property, which prevents some stores from holding any region leader. What you need may be the opposite property, one that forces region leaders to be placed on stores with a specific label.
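The difference between the two properties can be sketched as a filter over candidate stores. This is an illustration in Python, not PD's implementation: the Store type and both function names are invented here, and the "require" variant is the hypothetical opposite property, which PD does not provide according to the description above.

```python
# Hypothetical sketch: Store and both functions are illustrative, not PD APIs.
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Store:
    id: int
    labels: Dict[str, str] = field(default_factory=dict)


def candidates_with_reject_leader(stores: List[Store], key: str, value: str) -> List[Store]:
    """reject-leader semantics: stores labeled key=value may not hold leaders."""
    return [s for s in stores if s.labels.get(key) != value]


def candidates_with_required_label(stores: List[Store], key: str, value: str) -> List[Store]:
    """Assumed opposite property: only stores labeled key=value may hold leaders."""
    return [s for s in stores if s.labels.get(key) == value]


stores = [
    Store(1, {"dc": "us-central1"}),
    Store(2, {"dc": "us-east1"}),
    Store(3, {"dc": "europe-west1"}),
]

# Leaders may go anywhere except europe-west1.
allowed = candidates_with_reject_leader(stores, "dc", "europe-west1")

# Leaders may go only to us-central1.
pinned = candidates_with_required_label(stores, "dc", "us-central1")
```

With three or more DCs, pinning leaders to one DC through reject-leader alone means rejecting every other DC, and that setting applies to all regions at once. That is why a per-namespace "require" style property is closer to the use case in this thread.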

More flexible scheduling strategies are one of our long-term goals. However, due to a lack of manpower, we probably will not focus on them in the near future. Contributions from outside the development team are always welcome. If you want more information or are willing to contribute, please get in touch with us.

On Wednesday, April 18, 2018 at 11:19:23 AM UTC+8, James Hartig wrote:
