HyperUnique is giving different results on multiple executions of the same query


priyam gupta

Dec 21, 2017, 7:28:54 AM
to Druid User
While using the hyperUnique aggregation, I am getting different values on multiple executions of the same query.

Query:

{
  "queryType": "groupBy",
  "dataSource": "actives_data_source",
  "granularity": {
    "type": "period",
    "period": "P1D",
    "origin": "2017-12-01T10:00:00.000Z"
  },
  "dimensions": ["state", "location_class"],
  "intervals": "2017-12-11T10:00:00.000Z/2017-12-15T10:00:00.000Z",
  "aggregations": [
    {"type": "hyperUnique", "name": "actives", "fieldName": "user_id"}
  ]
}




ResultSet:

Execution Attempt 1:

    {
        "version": "v1",
        "timestamp": "2017-12-11T10:00:00.000Z",
        "event": {
            "location_class": "1L",
            "state": "Bihar",
            "actives": 19396.80653321458
        }
    }


Execution Attempt 2:

    {
        "version": "v1",
        "timestamp": "2017-12-11T10:00:00.000Z",
        "event": {
            "location_class": "1L",
            "state": "Bihar",
            "actives": 19389.034881761647
        }
    }

Execution Attempt 3:

    {
        "version": "v1",
        "timestamp": "2017-12-11T10:00:00.000Z",
        "event": {
            "location_class": "1L",
            "state": "Bihar",
            "actives": 19392.92087779155
        }
    }


On every execution I get one of these 3 values, at random.

Observed the same behaviour on the following Druid versions:

0.8.3
0.11.0


Has anyone observed this behaviour, and what was the root cause?


Kyle Boyle

Dec 21, 2017, 9:56:32 AM
to Druid User
HyperUnique uses HyperLogLog, which is an estimation of cardinality. Perhaps your result value changes based on the order in which results from historicals are merged together. Since this is an estimation, there is a margin of error, which may be what you are seeing here.
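For a rough sense of scale: HyperLogLog's relative standard error is approximately 1.04 / sqrt(m), where m is the number of registers. Assuming Druid's hyperUnique collector uses 2048 registers (an assumption; check the HyperLogLogCollector in your version), that works out to roughly 2.3% standard error, whereas the run-to-run spread shown above (19389.03 vs 19396.81) is only about 0.04%, i.e. well within the estimator's expected noise.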

Kyle

priyam gupta

Dec 21, 2017, 12:21:04 PM
to Druid User
Thanks Kyle. Is there any way that we can enforce some order so that we get consistent results?

Kyle Boyle

Dec 21, 2017, 1:41:37 PM
to Druid User
You could try enabling broker query caching for groupBy queries, though that might be dangerous. I also have no idea whether this would actually improve the consistency of your results.
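A minimal broker runtime.properties sketch for that, assuming the property names below match your Druid version (groupBy is normally listed as uncacheable on the broker, so it would have to be removed from that list for groupBy results to be cached at all; verify against the configuration docs for 0.11.0):

# enable result caching on the broker (local heap cache as an example)
druid.broker.cache.useCache=true
druid.broker.cache.populateCache=true
druid.cache.type=local
druid.cache.sizeInBytes=268435456
# groupBy and select are uncacheable by default; keep only select excluded
druid.broker.cache.unCacheable=["select"]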

Kyle

Nishant Bangarwa

Dec 21, 2017, 5:27:49 PM
to druid...@googlegroups.com
You can try setting groupBy merging to happen in a single thread; the merge order might then be the same across consecutive executions. The relevant property is "druid.query.groupBy.singleThreaded" in runtime.properties, or "groupByIsSingleThreaded" when specified in the query context.

Please note that it can affect performance negatively.  
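For example, the original query with the context flag set, as a sketch (the property names are the ones mentioned above; verify them against the docs for your version):

{
  "queryType": "groupBy",
  "dataSource": "actives_data_source",
  "granularity": {
    "type": "period",
    "period": "P1D",
    "origin": "2017-12-01T10:00:00.000Z"
  },
  "dimensions": ["state", "location_class"],
  "intervals": "2017-12-11T10:00:00.000Z/2017-12-15T10:00:00.000Z",
  "aggregations": [
    {"type": "hyperUnique", "name": "actives", "fieldName": "user_id"}
  ],
  "context": {
    "groupByIsSingleThreaded": true
  }
}

Or cluster-wide, in runtime.properties:

druid.query.groupBy.singleThreaded=true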



priyam gupta

Dec 22, 2017, 11:51:03 AM
to Druid User
Thanks Nishant. Doing it with a single thread worked.

Nishant Bangarwa

Dec 22, 2017, 2:08:59 PM
to druid...@googlegroups.com
Glad it worked. Just keep an eye on the performance.

Ramesh Shanmugam

Jun 14, 2019, 12:20:52 PM
to Druid User

We are also seeing the same behavior: hyperUnique results differ across multiple executions of the same query.

Unfortunately 'groupByIsSingleThreaded' does not work for us. We tried the v1 and v2 groupBy strategies, but neither helps.

We understand hyperUnique is an approximate estimation, but wonder: is it deterministic?

We also noticed that querying a single segment always gives deterministic output. It looks like this behavior occurs only when HLL sketches from multiple segments are unioned.


