For some queries, Druid returns duplicate rows for the same dimension combination, each with a different resCount. This happens because in one row the dimension's value is an empty string, while in the other it is null.
As a result, our aggregations are split across two rows. I have noticed that after some time has passed the rows get merged: if the same query (with the same end time) is run again later, the results show up correctly on a single row.
For instance, for this query:
{
"dataSource": "customer_message_tracker",
"dimensions": [
{
"type": "default",
"dimension": "CHECKPOINT",
"outputName": "checkpoint"
},
{
"type": "default",
"dimension": "TRACKER_MESSAGE",
"outputName": "tracker_message"
}
],
"queryType": "groupBy",
"orderBy": {
"type": "default",
"columns": [
{
"dimension": "resCount",
"direction": "DESCENDING"
}
],
"limit": 20
},
"intervals": {
"intervals": [
"2013-10-08T21:00:00/2013-10-08T22:14:00"
The response was:
[
{
"event": {
"checkpoint": "ARCHIVED",
"resCount": 1563374,
"tracker_message": "SENT"
},
"timestamp": "2013-10-08T21:00:00.000Z"
},
{
"event": {
"checkpoint": "CONSUMED",
"resCount": 1375112
},
"timestamp": "2013-10-08T21:00:00.000Z"
},
{
"event": {
"checkpoint": "PROCESSED",
"resCount": 1349975
},
"timestamp": "2013-10-08T21:00:00.000Z"
},
{
"event": {
"checkpoint": "CONSUMED",
"resCount": 398168,
"tracker_message": ""
},
"timestamp": "2013-10-08T21:00:00.000Z"
},
{
"event": {
"checkpoint": "PROCESSED",
"resCount": 391711,
"tracker_message": ""
},
"timestamp": "2013-10-08T21:00:00.000Z"
},
{
"event": {
"checkpoint": "ARCHIVED",
"resCount": 178268,
"tracker_message": "HOLDOUT"
},
"timestamp": "2013-10-08T21:00:00.000Z"
},
{
"event": {
"checkpoint": "ARCHIVED",
"resCount": 31112,
"tracker_message": "FAILED"
},
"timestamp": "2013-10-08T21:00:00.000Z"
},
{
"event": {
"checkpoint": "RETRIED",
"resCount": 58,
"tracker_message": "SUBSCRIBER_CLIENT_EXCEPTION||"
},
"timestamp": "2013-10-08T21:00:00.000Z"
},
{
"event": {
"checkpoint": "ARCHIVED",
"resCount": 10,
"tracker_message": "FILTER"
},
"timestamp": "2013-10-08T21:00:00.000Z"
}
]
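Until the cause is clear, a client-side workaround could be to merge the split rows after the fact. Below is a minimal Python sketch of that idea, assuming a missing or null dimension value and an empty string should be treated as the same group. The helper name `merge_null_and_empty` is made up for illustration and is not part of Druid.

```python
from collections import defaultdict


def merge_null_and_empty(rows, dimensions, metric="resCount"):
    """Sum `metric` over rows whose dimension values differ only by null vs ""."""
    totals = defaultdict(int)
    for row in rows:
        event = row["event"]
        # Assumption: a missing key, None, and "" all mean "no value".
        key = (row["timestamp"],) + tuple(event.get(d) or "" for d in dimensions)
        totals[key] += event[metric]

    merged = []
    for key, total in totals.items():
        event = dict(zip(dimensions, key[1:]))
        event[metric] = total
        merged.append({"timestamp": key[0], "event": event})

    # Re-apply the descending sort on the metric, as in the query's orderBy.
    merged.sort(key=lambda r: r["event"][metric], reverse=True)
    return merged
```

Applied to the response above with `dimensions=["checkpoint", "tracker_message"]`, the two CONSUMED rows (1375112 and 398168) would collapse into one row with resCount 1773280, and the two PROCESSED rows (1349975 and 391711) into one with 1741686. This only hides the symptom on the client; it does not explain why the rows are split in the first place.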
We are using 0.4.32.2. If this is a known issue that has been fixed in 0.5.x, please let me know and I will upgrade.
Thank you
Best, Jae