COUNT_DISTINCT computes the same result value for different columns


victor.sheng

Jun 18, 2015, 12:23:11 PM
to cubert...@googlegroups.com

Hi Maneesh Varshney,

I have recently been evaluating Cubert.


I translated a SQL query into a Cubert script that uses the CUBE operator with grouping sets.

Why do COUNT_DISTINCT(mid) and COUNT_DISTINCT(session_id) return the same value for every row after the cube is computed over the grouping sets? For example:


count_distinct(mid)    count_distinct(session_id)
500                    500
200                    200



Could anyone tell me whether my script is wrong?



Here is part of my code:



JOB1

Map { data = LOAD xxx USING TEXT }
     BLOCKGEN data BY SIZE 1000000 PARTITIONED ON mid, session_id;
     STORE data INTO "/cubert/temp" USING RUBIX("overwrite":"true");
END
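JOB1 blocks the data with PARTITIONED ON mid, session_id. As a general property of distinct counting (not a statement about Cubert's internals), per-partition distinct counts of a column add up to the global distinct count only when every value of that column lands in a single partition. The Python sketch below illustrates this with made-up rows; the CRC32-based partitioner and the partition count are hypothetical stand-ins for whatever Cubert actually uses.

```python
from collections import defaultdict
from zlib import crc32

# Hypothetical sample rows: (mid, session_id). Values are illustrative only.
rows = [
    ("u1", "s1"), ("u1", "s2"), ("u1", "s3"),
    ("u2", "s4"), ("u2", "s5"),
    ("u3", "s6"),
]

NUM_PARTITIONS = 3  # hypothetical partition count


def partition_of(key):
    """Deterministic hash partitioner (assumed stand-in, not Cubert's)."""
    return crc32(repr(key).encode("utf-8")) % NUM_PARTITIONS


def summed_distinct_mids(rows, key_fn):
    """Partition rows by key_fn, count distinct mids per partition, then sum."""
    partitions = defaultdict(set)
    for mid, session_id in rows:
        partitions[partition_of(key_fn(mid, session_id))].add(mid)
    return sum(len(mids) for mids in partitions.values())


global_distinct = len({mid for mid, _ in rows})
# Partitioning on mid alone: each mid lives in exactly one partition.
by_mid = summed_distinct_mids(rows, lambda mid, _sid: mid)
# Partitioning on (mid, session_id): one mid may be spread over partitions.
by_mid_and_session = summed_distinct_mids(rows, lambda mid, sid: (mid, sid))

print(global_distinct, by_mid, by_mid_and_session)
```

Partitioning on mid alone always reproduces the global count; partitioning on (mid, session_id) can only match or exceed it, depending on where each user's sessions happen to land.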


JOB2

Map {
  data = LOAD "" USING RUBIX
}

cube data by
    columns...
INNER dim, session_id
AGGREGATES SUM(pv) as pv,
           COUNT_DISTINCT(mid) as uv,
           COUNT_DISTINCT(session_id) as visits,
           SUM(bounce) as bounce
grouping sets
    (log_date,app_name,app_platform),
    (log_date,app_name,app_platform,is_new) ......
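A brute-force reference for the same aggregates can help check what the cube output should contain. This minimal Python sketch assumes in-memory rows using the column names from the script (the values are invented) and computes SUM(pv), COUNT_DISTINCT(mid), COUNT_DISTINCT(session_id) and SUM(bounce) independently for the two grouping sets shown; it is not how Cubert evaluates a cube.

```python
from collections import defaultdict

# Hypothetical sample rows: column names follow the script, values are invented.
ROWS = [
    {"log_date": "2015-06-18", "app_name": "a", "app_platform": "ios", "is_new": 0,
     "mid": "u1", "session_id": "s1", "pv": 3, "bounce": 0},
    {"log_date": "2015-06-18", "app_name": "a", "app_platform": "ios", "is_new": 0,
     "mid": "u1", "session_id": "s2", "pv": 1, "bounce": 1},
    {"log_date": "2015-06-18", "app_name": "a", "app_platform": "ios", "is_new": 1,
     "mid": "u2", "session_id": "s3", "pv": 2, "bounce": 0},
]

# The two grouping sets shown in the script (the others are elided there).
GROUPING_SETS = [
    ("log_date", "app_name", "app_platform"),
    ("log_date", "app_name", "app_platform", "is_new"),
]


def new_accumulator():
    """Running state for one group: sums plus sets for the distinct counts."""
    return {"pv": 0, "mids": set(), "sessions": set(), "bounce": 0}


def cube_reference(rows, grouping_sets):
    """Compute the four aggregates separately for every group of every grouping set."""
    results = {}
    for gs in grouping_sets:
        groups = defaultdict(new_accumulator)
        for row in rows:
            acc = groups[tuple(row[col] for col in gs)]
            acc["pv"] += row["pv"]
            acc["mids"].add(row["mid"])             # distinct users -> uv
            acc["sessions"].add(row["session_id"])  # distinct sessions -> visits
            acc["bounce"] += row["bounce"]
        for key, acc in groups.items():
            results[(gs, key)] = {
                "pv": acc["pv"],
                "uv": len(acc["mids"]),
                "visits": len(acc["sessions"]),
                "bounce": acc["bounce"],
            }
    return results


if __name__ == "__main__":
    for (gs, key), aggs in cube_reference(ROWS, GROUPING_SETS).items():
        print(dict(zip(gs, key)), aggs)
```

With these rows the first grouping set yields uv = 2 and visits = 3, so the two distinct counts are not expected to coincide in general; running the same comparison on a small sample of the real data shows whether the identical columns come from the data or from the script.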





