Using Multiple keyspaces vs Single Keyspace


Maaz Khan

Mar 28, 2017, 3:59:17 PM
to KairosDB
Hi,

For our use case, we are debating whether to use a separate keyspace per customer (we expect 20 to 40 customers) or a single keyspace. With multiple keyspaces, each customer would have its own read and write Kairos instances, which costs more to operate. We could also group several customers into one keyspace, since not all of them will have the same load.

Questions:
Is it good practice to have separate keyspaces for multiple Kairos instances?
AFAIK, writes are not significantly affected by whether we use multiple keyspaces or a single one. Is that correct?
For reads:
   - Single keyspace: To segregate data, we will need to add the Team and SubTeam IDs as tags on metrics. How will this affect read/query requests?
   - Multiple keyspaces: Since the data is already segregated, I assume reads will be much faster, but won't having multiple keyspaces (meaning more column families) have its own implications for the system?

Thanks
Maaz


Brian Hawkins

Mar 30, 2017, 11:01:56 AM
to KairosDB
The downside to multiple keyspaces is that Cassandra has to load each one separately, which takes more resources on the C* cluster. How much, I don't know.

This has been brought up before. I added issue https://github.com/kairosdb/kairosdb/issues/373 to track the ideas and possible solutions. The simplest is to force some kind of metric prefix for each customer.
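
A minimal sketch of the metric-prefix idea, not something KairosDB provides out of the box. The separator and the helper names (`prefix_metric`, `metrics_for_tenant`) are assumptions made for illustration: the write path would add the prefix, and the read path would filter on it and strip it before showing names to a customer.

```python
SEPARATOR = "."  # assumed separator; tenant IDs must not contain it


def prefix_metric(tenant_id: str, metric: str) -> str:
    """Return the name a metric would be stored under, e.g. 'acme.cpu.load'."""
    if not tenant_id or SEPARATOR in tenant_id:
        raise ValueError(f"invalid tenant id: {tenant_id!r}")
    return f"{tenant_id}{SEPARATOR}{metric}"


def metrics_for_tenant(tenant_id: str, stored_names: list) -> list:
    """Keep only this tenant's metrics and strip the prefix again."""
    prefix = tenant_id + SEPARATOR
    return [name[len(prefix):] for name in stored_names if name.startswith(prefix)]
```

Because the tenant ID is validated against the separator, one tenant's prefix can never be a prefix of another tenant's stored names.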

Questions for you: Are you expecting customers to send data directly to Kairos? If not, how do you identify each customer? Will the customer have direct access to Kairos for queries?

Brian

Maaz Khan

Mar 30, 2017, 2:29:27 PM
to KairosDB
Are you expecting customers to send data directly to Kairos?
- Customers use Logstash to send data to our platform. We handle internally how and when the data is written to Kairos. FYI, we currently use InfluxDB and are planning the switch to Kairos.

If not, how do you identify each customer?
- We have a tenantId for each customer. For now, we are thinking of writing a reverse proxy for Kairos that will serve the Grafana dashboards. To achieve this, we plan to do two things:
  • Our Grafana dashboard will point to this reverse proxy with the tenantId in its URL, something like www.kairoproxy.com/tenantID. We are thinking of modifying the metric-list response to return only the metrics that belong to this particular tenant. This way we control which metrics are shown on a particular dashboard (we have a separate Grafana instance for each customer).
  • Use tenantId as a tag on each metric to identify which metrics belong to which tenant. A quick follow-up question on this: how do I get the list of metrics that have a specific tag value? From the REST API docs, it seems that querying by tag requires a metric name to be passed, but we want to retrieve all the metric names that have a particular tag. We want to avoid adding prefixes.
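
For the tag-based option, the proxy could inject the tenant tag into every metric of a data point query before forwarding it. The sketch below assumes the KairosDB query body layout (a `metrics` list whose entries carry a `tags` map from tag name to a list of values); the tag name `tenantId` and the function name are hypothetical. It restricts data point queries only and does not answer the metric-name listing question above.

```python
import copy

TENANT_TAG = "tenantId"  # hypothetical tag name chosen for this sketch


def scope_query_to_tenant(query: dict, tenant_id: str) -> dict:
    """Return a copy of the query with every metric pinned to one tenant."""
    scoped = copy.deepcopy(query)  # leave the caller's request untouched
    for metric in scoped.get("metrics", []):
        tags = metric.setdefault("tags", {})
        # Overwrite whatever the client sent so it cannot pick another tenant.
        tags[TENANT_TAG] = [tenant_id]
    return scoped
```

Overwriting rather than merging the tenant tag is deliberate: a dashboard that sends its own `tenantId` filter should not be able to widen the query.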

Will the customer have direct access to Kairos for queries?
- No. Customers will only use the Grafana dashboard.

Maaz

Maaz Khan

Apr 3, 2017, 7:13:37 PM
to KairosDB
Brian,

Apart from the downside of loading each keyspace separately, is there any real advantage to using multiple keyspaces?

Maaz


Brian Hawkins

Apr 4, 2017, 3:54:07 PM
to KairosDB
Tags are probably the worst way to handle this. The only downside to multiple keyspaces is the resource requirements. Multiple keyspaces also require more Kairos nodes, since a Kairos node cannot double up and serve more than one keyspace.

Here is something to think about to save resources. Say you have 30 customers and a 9-node C* cluster can handle them all. You could break the C* cluster up into three 3-node clusters and put 10 customers on each. This way each C* node only loads 10 keyspaces instead of 30, and the performance should effectively be the same.
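
The arithmetic above can be sketched as a simple round-robin assignment. The customer names and the function name are made up for illustration, and the sketch ignores per-customer load, so a real layout would probably weight the heavier tenants.

```python
def split_keyspaces(customers: list, clusters: int) -> dict:
    """Assign customer keyspaces to clusters round-robin."""
    assignment = {i: [] for i in range(clusters)}
    for index, customer in enumerate(customers):
        assignment[index % clusters].append(customer)
    return assignment


customers = [f"customer_{n:02d}" for n in range(30)]  # placeholder names
layout = split_keyspaces(customers, clusters=3)
for cluster, keyspaces in layout.items():
    print(f"cluster {cluster}: {len(keyspaces)} keyspaces")
```

With 30 customers and 3 clusters, each cluster ends up holding 10 keyspaces, matching the example in the post.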

Brian

Maaz Khan

Apr 4, 2017, 4:31:31 PM
to KairosDB
Brian, I guess my main question still remains, or maybe I am missing something. Is there any advantage to using multiple keyspaces at all when it comes to reading and writing metrics? From what I can tell, Phase-1 obtains the info it needs by looking up with the metric name as the key, which I think won't be affected if other tenants' metrics are present in the same keyspace.

Maaz

Brian Hawkins

Apr 4, 2017, 6:25:42 PM
to KairosDB
I think I just realized something about your use case. Because you are generating the data, each client has the same metrics, right?

If both clients have a metric foo and each has a different set of tags, this causes a couple of problems when they share a keyspace. The tag cardinality of foo will be higher, which could cause a slowdown. And when using the UI/Grafana and listing tag options to add to a query, each client will see both clients' sets of tags. Another problem arises if client A has metric foo and client B has metric bar: when you list metrics to query from the UI, both clients will see foo and bar (this assumes none of the Kairos internals have been changed).

Now, if all your metrics and tags are the same for every client, listing metrics or tags doesn't matter, as each client will have the same set. The problem with using a tag to distinguish one customer from another is that the drop-down for adding a tag to a query will show the customer tag and every one of its values, unless you have read-only dashboards for each client that they cannot modify. In that case you will want to force exclude_tags=true on each query, or else the client list will be sent to each client's browser (they could see it in the traffic if they open developer tools).
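
Forcing the flag could be done in the same reverse proxy that fronts Kairos. This is only a sketch: it assumes `exclude_tags` is set on each entry of the query's `metrics` list, which should be checked against the KairosDB query documentation for the version in use, and the function name is hypothetical.

```python
import copy


def force_exclude_tags(query: dict) -> dict:
    """Return a copy of the query with exclude_tags switched on everywhere."""
    forced = copy.deepcopy(query)  # do not mutate the incoming request
    for metric in forced.get("metrics", []):
        # Assumed placement: one flag per metric entry in the query body.
        metric["exclude_tags"] = True
    return forced
```

Setting the flag unconditionally means a client cannot turn tag output back on by sending `exclude_tags: false` itself.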

Does that answer your question?