hyperUnique gives different results on repeated executions of the same query


priyam gupta

21 Dec 2017, 07:28:54
to Druid User
While using the hyperUnique aggregator, I get different values on repeated executions of the same query.

Query:

{
  "queryType": "groupBy",
  "dataSource": "actives_data_source",
  "granularity": {
    "type": "period",
    "period": "P1D",
    "origin": "2017-12-01T10:00:00.000Z"
  },
  "dimensions": ["state", "location_class"],
  "intervals": "2017-12-11T10:00:00.000Z/2017-12-15T10:00:00.000Z",
  "aggregations": [
    {"type": "hyperUnique", "name": "actives", "fieldName": "user_id"}
  ]
}




ResultSet:

Execution Attempt 1:

    {
        "version": "v1",
        "timestamp": "2017-12-11T10:00:00.000Z",
        "event": {
            "location_class": "1L",
            "state": "Bihar",
            "actives": 19396.80653321458
        }
    }


Execution Attempt 2:

    {
        "version": "v1",
        "timestamp": "2017-12-11T10:00:00.000Z",
        "event": {
            "location_class": "1L",
            "state": "Bihar",
            "actives": 19389.034881761647
        }
    }

Execution Attempt 3:

    {
        "version": "v1",
        "timestamp": "2017-12-11T10:00:00.000Z",
        "event": {
            "location_class": "1L",
            "state": "Bihar",
            "actives": 19392.92087779155
        }
    }


On every execution I get one of the three "actives" values shown above. Which one I get appears to be random.

I observed the same behaviour on the following Druid versions:

0.8.3
0.11.0


Has anyone else observed this behaviour, and if so, what was the root cause?


Kyle Boyle

21 Dec 2017, 09:56:32
to Druid User
HyperUnique uses HyperLogLog, which estimates cardinality rather than counting it exactly. Perhaps your result changes depending on the order in which results from the historicals are merged together. Since this is an estimate, there is a margin of error, which may be what you are seeing here.

Kyle

priyam gupta

21 Dec 2017, 12:21:04
to Druid User
Thanks Kyle. Is there any way to enforce a merge order so that we get consistent results?

Kyle Boyle

21 Dec 2017, 13:41:37
to Druid User
You could try enabling broker query caching for groupBy queries. That might be dangerous, though, and I have no idea whether it would actually affect the consistency of your results.

Kyle

Nishant Bangarwa

21 Dec 2017, 17:27:49
to druid...@googlegroups.com
You can try making the groupBy merge run single-threaded; the merge order might then be the same across consecutive executions.
The relevant property is "druid.query.groupBy.singleThreaded" in runtime.properties, or "groupByIsSingleThreaded" when specified in the query context.

Please note that it can affect performance negatively.  
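For anyone who wants to try this per query rather than cluster-wide, here is a minimal sketch in Python that adds the context flag to the query from the first post and prints the JSON payload. Only "groupByIsSingleThreaded" comes from this thread; the variable names are just for illustration, and the payload would still need to be POSTed to your broker's query endpoint.

```python
import json

# The groupBy query from the original post, expressed as a Python dict.
query = {
    "queryType": "groupBy",
    "dataSource": "actives_data_source",
    "granularity": {
        "type": "period",
        "period": "P1D",
        "origin": "2017-12-01T10:00:00.000Z",
    },
    "dimensions": ["state", "location_class"],
    "intervals": "2017-12-11T10:00:00.000Z/2017-12-15T10:00:00.000Z",
    "aggregations": [
        {"type": "hyperUnique", "name": "actives", "fieldName": "user_id"}
    ],
    # Per-query override mentioned above; it may slow the query down.
    "context": {"groupByIsSingleThreaded": True},
}

# Serialize to the JSON body that would be sent to the broker.
payload = json.dumps(query, indent=2)
print(payload)
```

Setting the flag in the query context affects only this query, which makes it easier to compare timings with and without it before changing runtime.properties.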



priyam gupta

22 Dec 2017, 11:51:03
to Druid User
Thanks Nishant. Running it single-threaded worked.

Nishant Bangarwa

22 Dec 2017, 14:08:59
to druid...@googlegroups.com
Glad it worked. Just keep an eye on the performance.

Ramesh Shanmugam

14 Jun 2019, 12:20:52
to Druid User

We are also seeing the same behaviour: hyperUnique results differ across multiple executions of the same query.

Unfortunately, 'groupByIsSingleThreaded' does not work for us. We tried both the v1 and v2 groupBy strategies, but neither helps.

We understand that hyperUnique is an `approximate estimation`, but we wonder whether it is deterministic.

We also noticed that querying a single segment always gives deterministic output. It looks like this behaviour occurs only when the HLL sketches of multiple segments are unioned.
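As a point of comparison for the determinism question, here is a small Python sketch of a textbook HyperLogLog, where the union of two sketches is a register-wise maximum. This is not Druid's HyperLogLogCollector: the register count, the SHA-1 hash, and all names below are assumptions chosen for illustration, and the small-range correction is omitted. It only shows that in the plain algorithm the merged registers do not depend on merge order, so any order dependence would have to come from implementation details that this sketch does not model.

```python
import hashlib
import random

P = 11          # assumption: 2**11 = 2048 registers
M = 1 << P

def hll_sketch(values):
    """Build M registers from an iterable of strings."""
    registers = [0] * M
    for v in values:
        # Take a 64-bit hash from the first 8 bytes of SHA-1.
        h = int.from_bytes(hashlib.sha1(v.encode()).digest()[:8], "big")
        idx = h >> (64 - P)                       # top P bits select the register
        rest = h & ((1 << (64 - P)) - 1)          # remaining 64 - P bits
        rank = (64 - P) - rest.bit_length() + 1   # position of the leftmost 1-bit
        registers[idx] = max(registers[idx], rank)
    return registers

def merge(a, b):
    """Union of two sketches: register-wise max (commutative and associative)."""
    return [max(x, y) for x, y in zip(a, b)]

def merge_all(sketches):
    merged = [0] * M
    for s in sketches:
        merged = merge(merged, s)
    return merged

def estimate(registers):
    """Raw HyperLogLog estimate, without small-range correction."""
    alpha = 0.7213 / (1 + 1.079 / M)
    return alpha * M * M / sum(2.0 ** -r for r in registers)

# Four overlapping "segments" covering user-0 .. user-21999 (22000 distinct ids).
segments = [
    hll_sketch(f"user-{i}" for i in range(s * 5000, s * 5000 + 7000))
    for s in range(4)
]
true_distinct = 22000

baseline = merge_all(segments)
rng = random.Random(0)
all_orders_equal = True
for _ in range(10):
    shuffled = segments[:]
    rng.shuffle(shuffled)
    if merge_all(shuffled) != baseline:
        all_orders_equal = False

print("same registers for every merge order:", all_orders_equal)
print("estimate:", estimate(baseline), "true:", true_distinct)
```

Because max is commutative and associative, every merge order yields identical registers here, and the estimate is computed from those registers in a fixed order.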

