Hi Everyone,
Please bear with me if I'm not phrasing this question well.
========================
We have about 5 dimensions in our data. Two of them, let's say dimension A and dimension B, have very high cardinality:
dimension A - ~10M values
dimension B - ~100M values
Datasource - D
Our query pattern:
1) Aggregated counts on data filtered by a single value of dimension A, over a given interval.
Example:
{
  "queryType": "timeseries",
  "dataSource": "D",
  "intervals": "2018-01-10T08Z/2018-01-11T08Z",
  "granularity": "all",
  "context": {
    "timeout": 6000000
  },
  "filter": {
    "type": "selector",
    "dimension": "A",
    "value": "10271023423"
  },
  "aggregations": [
    {
      "name": "__VALUE__",
      "type": "doubleSum",
      "fieldName": "count"
    }
  ]
}
2) Aggregated counts on data filtered by a single value of dimension A, grouped by dimension B.
Example:
{
  "queryType": "topN",
  "dataSource": "D",
  "intervals": "2018-01-10T08Z/2018-01-11T08Z",
  "granularity": "all",
  "context": {
    "timeout": 6000000
  },
  "filter": {
    "type": "selector",
    "dimension": "A",
    "value": "10271023423"
  },
  "dimension": {
    "type": "default",
    "dimension": "B",
    "outputName": "B"
  },
  "aggregations": [
    {
      "name": "count",
      "type": "doubleSum",
      "fieldName": "count"
    }
  ],
  "metric": "count",
  "threshold": 1000000
}
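In case it helps clarify the pattern: we build these queries programmatically, roughly like the sketch below. This is a minimal illustration, not our actual code — the helper name and default values are placeholders, and only the query shape matches what's shown above.

```python
import json

def topn_by_b(dimension_a_value, interval, threshold=1000000):
    """Build a topN query for pattern 2: filter on one value of
    dimension A, group by dimension B, ranked by summed count."""
    return {
        "queryType": "topN",
        "dataSource": "D",
        "intervals": interval,
        "granularity": "all",
        "context": {"timeout": 6000000},
        "filter": {"type": "selector", "dimension": "A",
                   "value": dimension_a_value},
        "dimension": {"type": "default", "dimension": "B",
                      "outputName": "B"},
        "aggregations": [{"name": "count", "type": "doubleSum",
                          "fieldName": "count"}],
        "metric": "count",
        "threshold": threshold,
    }

# Example: the same query as above, serialized for the broker.
query = topn_by_b("10271023423", "2018-01-10T08Z/2018-01-11T08Z")
print(json.dumps(query, indent=2))
```

So for every request we substitute a different dimension-A value into the selector filter; everything else stays fixed.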
=======================================================
Do you foresee any problems with this query pattern at such high cardinality?
I've heard that Druid has issues once cardinality approaches ~100M, hence the question.
Any input is highly appreciated.
Thanks