Computes the cardinality of a set of Druid dimensions, using HyperLogLog to estimate the cardinality. Please note that this aggregator will be much slower than indexing a column with the hyperUnique aggregator.
Uses HyperLogLog to compute the estimated cardinality of a dimension that has been aggregated as a "hyperUnique" metric at indexing time.
{ "type" : "hyperUnique", "name" : <output_name>, "fieldName" : <metric_name> }
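To make the difference between the two descriptions above concrete, here is a toy HyperLogLog in Python. It is only a sketch of the general technique and not Druid's implementation: the class `ToyHLL`, the helper `sketch_of`, the register count, and the SHA-1 hash are all assumptions made for illustration. The point it tries to show is that sketches built once per segment (roughly what a "hyperUnique" metric stores at indexing time) can be merged cheaply at query time, while a "cardinality" aggregation has to hash the raw dimension values on every query.

```python
import hashlib
import math


class ToyHLL:
    """Toy HyperLogLog sketch; illustrative only, not Druid's implementation."""

    def __init__(self, p=11):
        self.p = p                      # number of bits used to pick a register
        self.m = 1 << p                 # number of registers
        self.registers = [0] * self.m

    def add(self, value):
        # Derive a 64-bit hash from the first 8 bytes of SHA-1 (an arbitrary choice).
        digest = hashlib.sha1(value.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        idx = h >> (64 - self.p)                    # top p bits select the register
        rest = h & ((1 << (64 - self.p)) - 1)       # remaining 64 - p bits
        # Rank = position of the leftmost 1-bit within the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def merge(self, other):
        # Merging two sketches is a register-wise max; no raw values are needed.
        merged = ToyHLL(self.p)
        merged.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return merged

    def estimate(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)       # standard constant for m >= 128
        raw = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            return self.m * math.log(self.m / zeros)  # small-range correction
        return raw


def sketch_of(values, p=11):
    """Build a sketch from raw values (what happens once, at indexing time)."""
    hll = ToyHLL(p)
    for v in values:
        hll.add(v)
    return hll


# Two hourly segments with overlapping values; 5000 distinct values in total.
hour1 = sketch_of(f"user{i}" for i in range(0, 3000))
hour2 = sketch_of(f"user{i}" for i in range(2000, 5000))

# At query time only the small sketches are combined.
merged = hour1.merge(hour2)
print(round(merged.estimate()))
```

The merged sketch has exactly the same registers as a sketch built from all raw values in one pass, which is why pre-aggregating at indexing time loses nothing relative to hashing at query time.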
{
  "type": "index_hadoop",
  "spec": {
    "dataSchema": {
      "dataSource": "special_report-V1",
      "parser": {
        "type": "string",
        "parseSpec": {
          "format": "csv",
          "columns" : ["dim1","dim2","dim3","dim4","dim5"],
          "timestampSpec": {
            "column": "msgDate",
            "format": "auto"
          },
          "dimensionsSpec": {
            "dimensions": ["dim1","dim2","dim3","dim4","dim5"],
            "dimensionExclusions": [],
            "spatialDimensions": []
          }
        }
      },
      "metricsSpec" : [{"type": "count", "name": "count"}, { "type" : "hyperUnique", "name" : "dim2_count", "fieldName" : "dim2" }],
      "granularitySpec" : {
        "type" : "uniform",
        "segmentGranularity" : "HOUR",
        "queryGranularity" : "HOUR",
        "intervals": ["2016-05-19T10:00:00.000Z/2016-05-19T12:00:00.000Z"]
      }
    },
    "ioConfig": {
      "type": "hadoop",
      "inputSpec": {
        "type": "granularity",
        "dataGranularity": "HOUR",
        "inputPath": "/tmp/special-reports",
        "filePattern": ".*.csv"
      }
    },
    "tuningConfig": {
      "type": "hadoop",
      "partitionsSpec": {
        "targetPartitionSize": 0
      }
    }
  }
}

Druid Query JSON:
{"queryType": "groupBy","dataSource": "special_report-V1","granularity": "day","dimensions": ["dim1"],"aggregations": [{ "type": "cardinality", "name": "dim2Count1", "fieldNames": ["dim2"], "byRow":false },{ "type": "cardinality", "name": "dim2Count2", "fieldNames": ["dim2_count"], "byRow":false },{"type": "count","name": "count"}],"postAggregations": [],"intervals": [ "2016-05-19T10:00:00.000Z/2016-05-19T12:00:00.000Z" ]}
--
You received this message because you are subscribed to the Google Groups "Druid User" group.
To unsubscribe from this group and stop receiving emails from it, send an email to druid-user+...@googlegroups.com.
To post to this group, send email to druid...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/druid-user/1790c283-9d5c-4411-a60e-3cd2e5924506%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
"metricsSpec" : [{"type": "count", "name": "count"}, { "type" : "hyperUnique", "name" : "dim2_count", "fieldName" : "dim2" }]
Druid Query JSON:
{ "queryType": "groupBy", "dataSource": "special_report-V1", "granularity": "day", "dimensions": ["dim1"], "aggregations": [ { "type": "hyperUnique", "name": "dim2_HyperUniqueCount", "fieldNames": ["dim2_count"], "byRow":false },
"metricsSpec" : [{"type": "count", "name": "count"}, { "type" : "hyperUnique", "name" : "dim2_count", "fieldName" : "dim2" }]
Druid Query JSON:
{ "queryType": "groupBy", "dataSource": "special_report-V1", "granularity": "day", "dimensions": ["dim1"], "aggregations": [ { "type": "hyperUnique", "name": "dim2_HyperUniqueCount", "fieldName": "dim2_count" }, {"type": "count","name": "count"} ], "postAggregations": [ ], "intervals": [ "2016-05-19T10:00:00.000Z/2016-05-19T12:00:00.000Z" ]}
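For anyone comparing the three query variants in this thread, the following Python sketch encodes the rule of thumb they suggest: "cardinality" takes a `fieldNames` list of dimensions, while "hyperUnique" takes a single `fieldName` that points at a metric ingested as hyperUnique. The helper `check_aggregator` is hypothetical and not part of Druid or any client library; it only checks the aggregator formats quoted above against the dimensions and metrics of the ingestion spec posted here.

```python
def check_aggregator(agg, dimensions, hyper_unique_metrics):
    """Return a list of problems in one aggregator spec (hypothetical helper)."""
    problems = []
    kind = agg.get("type")
    if kind == "hyperUnique":
        # These keys appear in the cardinality format, not the hyperUnique one.
        for key in ("fieldNames", "byRow"):
            if key in agg:
                problems.append(f"'{key}' belongs to the cardinality aggregator, not hyperUnique")
        if agg.get("fieldName") not in hyper_unique_metrics:
            problems.append("'fieldName' should name a metric ingested as hyperUnique")
    elif kind == "cardinality":
        # cardinality is defined over dimensions, so every field must be one.
        for field in agg.get("fieldNames", []):
            if field not in dimensions:
                problems.append(f"'{field}' is not a dimension")
    return problems


# Names taken from the ingestion spec in this thread.
dimensions = ["dim1", "dim2", "dim3", "dim4", "dim5"]
hyper_unique_metrics = ["dim2_count"]

first_attempt = {"type": "cardinality", "name": "dim2Count2",
                 "fieldNames": ["dim2_count"], "byRow": False}
second_attempt = {"type": "hyperUnique", "name": "dim2_HyperUniqueCount",
                  "fieldNames": ["dim2_count"], "byRow": False}
final_form = {"type": "hyperUnique", "name": "dim2_HyperUniqueCount",
              "fieldName": "dim2_count"}

for label, agg in [("first", first_attempt), ("second", second_attempt), ("final", final_form)]:
    print(label, check_aggregator(agg, dimensions, hyper_unique_metrics))
```

Only the last form passes the check. Note that in this particular spec `dim2` is both a dimension and the source of the `dim2_count` metric, so a "cardinality" aggregation over `dim2` is also accepted by the check; per the description quoted at the top of the thread, it would just be the slower of the two options.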