Hi all,
I'm trying to use the cardinality aggregator to count distinct values of one dimension within groups formed by another dimension. The cardinality aggregator returns a non-integer value. Is there a way to get exact counts, even at the cost of some performance?
For example, this query:
{
  "queryType": "groupBy",
  "dataSource": "observations",
  "granularity": "fifteen_minute",
  "aggregations": [
    { "type": "cardinality", "name": "ip_count", "fieldNames": [ "remote_ip" ] },
    { "type": "cardinality", "name": "fqdn_count", "fieldNames": [ "remote_fqdn" ] },
    { "type": "count", "name": "count" }
  ],
  "intervals": "2015-08-03T17:00:00Z/2015-08-03T18:00:00Z",
  "dimensions": [ { "type": "default", "dimension": "remote_ip", "outputName": "remote_ip" } ]
}
Returns this response:
[
  {
    "version": "v1",
    "timestamp": "2015-08-03T17:45:00.000Z",
    "event": {
      "ip_count": 1.0002442201269182,
      "count": 3,
      "remote_ip": null,
      "fqdn_count": 2.000977198748901
    }
  },
  {
    "version": "v1",
    "timestamp": "2015-08-03T17:45:00.000Z",
    "event": {
      "ip_count": 1.0002442201269182,
      "count": 7,
      "remote_ip": "10.10.194.190",
      "fqdn_count": 1.0002442201269182
    }
  }
]
As far as I can tell, "count" is the number of underlying data source events in each group, which isn't what I want. The cardinality values are close to the distinct counts I expect, but I'm hesitant to round or truncate them and treat the result as exact.
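To make concrete what I mean by exact counts, here is a rough Python sketch of the result I'm after. It is not a Druid query, just plain client-side logic. The rows, the example FQDNs, and the helper name exact_distinct_counts are all made up for illustration:

```python
from collections import defaultdict

def exact_distinct_counts(rows, group_key, count_key):
    """Return {group value: exact number of distinct count_key values}."""
    seen = defaultdict(set)
    for row in rows:
        # A set per group keeps every distinct value, so the count is exact.
        seen[row.get(group_key)].add(row.get(count_key))
    return {group: len(values) for group, values in seen.items()}

# Hypothetical rows, not real data from my data source.
rows = [
    {"remote_ip": "10.10.194.190", "remote_fqdn": "a.example.com"},
    {"remote_ip": "10.10.194.190", "remote_fqdn": "a.example.com"},
    {"remote_ip": None, "remote_fqdn": "b.example.com"},
    {"remote_ip": None, "remote_fqdn": "c.example.com"},
    {"remote_ip": None, "remote_fqdn": "b.example.com"},
]

fqdn_counts = exact_distinct_counts(rows, "remote_ip", "remote_fqdn")
print(fqdn_counts)  # {'10.10.194.190': 1, None: 2}
```

In other words, I'd like integer values like these per group instead of something like 2.000977198748901.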
Thanks,
Mike