What is the max limit on dimension and cardinality count?


sit...@gmail.com

Oct 15, 2015, 5:09:18 AM
to Druid User

We have started using Druid for our project; the version we currently use is 0.7.1.1. To decide whether to support a high-cardinality dimension, among other reasons, I have the following questions:
1. What is the maximum number of dimensions we can have? Has any benchmarking been done, and is there an advisable count?
2. What is the maximum cardinality a dimension can have? Or, what problems should we expect if we go beyond a certain limit (and, of course, what is that limit)?

Can you please answer these questions?

//Sithik

Nishant Bangarwa

Oct 15, 2015, 9:04:29 AM
to druid...@googlegroups.com
Hi Sithik, 
See inline.

On Thu, Oct 15, 2015 at 2:39 PM, <sit...@gmail.com> wrote:

We have started using Druid for our project; the version we currently use is 0.7.1.1. To decide whether to support a high-cardinality dimension, among other reasons, I have the following questions:
1. What is the maximum number of dimensions we can have? Has any benchmarking been done, and is there an advisable count?
There is no maximum limit on the number of dimensions you can have. We have seen people successfully running Druid in production with anywhere from a few to around a hundred dimensions.
2. What is the maximum cardinality a dimension can have? Or, what problems should we expect if we go beyond a certain limit (and, of course, what is that limit)?
Again, there are no hard limits here. Generally, dimensions with very high cardinality are associated with poor rollup and large segment sizes, which lead to increased query times. If all you care about for the high-cardinality dimension is its approximate cardinality, you can use the HyperUnique aggregator to store HyperLogLog sketches instead of the raw dimension values, which will give you much faster queries and reduced storage as well.
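To make the rollup point concrete, here is a rough Python sketch of why one high-cardinality dimension defeats rollup. It is a simplified model, not Druid's actual indexing code; the hourly granularity, the field names (country, user_id, count) and the data are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def rollup(rows, dimensions):
    """Simplified model of ingestion-time rollup: rows with the same hour and the
    same values for every listed dimension collapse into one row whose 'count'
    metric is summed."""
    groups = defaultdict(int)
    for row in rows:
        hour = row["timestamp"].replace(minute=0, second=0, microsecond=0)
        key = (hour,) + tuple(row[d] for d in dimensions)
        groups[key] += row["count"]
    return dict(groups)


# 1,000 hypothetical events in a single hour: 3 countries, every user_id distinct.
events = [
    {
        "timestamp": datetime(2015, 10, 15, 9, i % 60),
        "country": ["US", "IN", "DE"][i % 3],
        "user_id": f"user-{i}",
        "count": 1,
    }
    for i in range(1000)
]

low = rollup(events, ["country"])               # low-cardinality dimension only
high = rollup(events, ["country", "user_id"])   # add a high-cardinality dimension
print(len(events), len(low), len(high))         # 1000 3 1000
```

With only country, 1,000 events roll up into 3 rows; adding a user_id that is unique per event leaves all 1,000 rows, which is the "poor rollup and large segment sizes" effect described above.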
 


Nishant Bangarwa

Oct 15, 2015, 9:08:44 AM
to druid...@googlegroups.com
Also, there have been many improvements and bug fixes in Druid since 0.7.1.1.
I would recommend switching to 0.8.1.


sit...@gmail.com

Oct 15, 2015, 9:49:15 AM
to Druid User
Thanks Nishant for the quick reply.

1. From another thread I already learned that people have explored up to 100 dimensions, but we have a requirement to support 150+ dimensions, which is why I wanted to check the cap on the number of dimensions.
2. If I understand correctly, with the HyperUnique aggregator / HyperLogLog sketches we can only get the cardinality of that dimension, nothing else. Suppose I want to get the metric count by filtering on both this high-cardinality dimension and a low-cardinality dimension; I don't think that is possible if the high-cardinality dimension is stored only as HyperLogLog sketches. Moreover, we might need to bear the cost of HyperUnique aggregation during ingestion; if so, are any stats available on this?
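The trade-off Sithik describes matches how HyperLogLog-style sketches work in general. The minimal Python sketch below is a generic textbook HLL, not Druid's HyperUnique code; the SHA-1 hashing, the 4,096 registers (p=12), and the class name are assumptions chosen for illustration. Each rolled-up row keeps only small register values, which can be merged to estimate distinct counts across rows but cannot be filtered on or read back as individual IDs.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal HyperLogLog sketch (illustrative; not Druid's implementation)."""

    def __init__(self, p=12):
        self.p = p                  # number of index bits
        self.m = 1 << p             # number of registers (4096 for p=12)
        self.registers = [0] * self.m

    def add(self, value):
        # 64-bit hash from the first 8 bytes of SHA-1 (an arbitrary choice here).
        h = int.from_bytes(hashlib.sha1(str(value).encode()).digest()[:8], "big")
        idx = h >> (64 - self.p)                      # top p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)         # remaining 64 - p bits
        rank = (64 - self.p) - rest.bit_length() + 1  # position of leftmost 1-bit
        self.registers[idx] = max(self.registers[idx], rank)

    def merge(self, other):
        # Register-wise max gives the sketch of the union of both inputs.
        out = HyperLogLog(self.p)
        out.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return out

    def estimate(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)
        raw = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            return self.m * math.log(self.m / zeros)  # small-range (linear counting) correction
        return raw


# Two hypothetical rolled-up rows (e.g. one per country), each holding a sketch of user IDs.
us = HyperLogLog()
india = HyperLogLog()
for i in range(6000):
    us.add(f"user-{i}")
for i in range(4000, 10000):
    india.add(f"user-{i}")

both = us.merge(india)
print(round(both.estimate()))  # an approximate count of the 10,000 distinct users
```

Merging is what lets a sketch column survive rollup, but the registers hold no user IDs, so a query cannot filter on a specific user_id value; filtering on that dimension requires keeping it as a regular dimension.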

Sure, we will migrate to 0.8.1.

Thanks,
Sithik




Fangjin Yang

Oct 18, 2015, 12:56:49 PM
to Druid User
Inline.


On Thursday, October 15, 2015 at 6:49:15 AM UTC-7, sit...@gmail.com wrote:
Thanks Nishant for the quick reply.

1. From another thread I already learned that people have explored up to 100 dimensions, but we have a requirement to support 150+ dimensions, which is why I wanted to check the cap on the number of dimensions.

Folks have been successful with thousands of dimensions. 150+ is fine.
 
2. If I understand correctly, with the HyperUnique aggregator / HyperLogLog sketches we can only get the cardinality of that dimension, nothing else. Suppose I want to get the metric count by filtering on both this high-cardinality dimension and a low-cardinality dimension; I don't think that is possible if the high-cardinality dimension is stored only as HyperLogLog sketches. Moreover, we might need to bear the cost of HyperUnique aggregation during ingestion; if so, are any stats available on this?

sit...@gmail.com

Oct 19, 2015, 2:12:13 AM
to Druid User
Thanks Fangjin Yang. Let me go through the video.