Hi, Maneesh Varshney:
I have recently been experimenting with Cubert.
I translated a SQL query into a Cubert script using the CUBE operator with GROUPING SETS.
Why do COUNT_DISTINCT(mid) and COUNT_DISTINCT(session_id) return the same value after the cube is computed over the grouping sets?
count_distinct(mid)    count_distinct(session_id)
500                    500
200                    200
Can anyone tell me whether my script is wrong?
Here is part of my code:
JOB1
Map{ data = LOAD xxx USING TEXT }
BLOCKGEN data BY SIZE 1000000 PARTITIONED ON mid, session_id;
STORE data INTO "/cubert/temp" USING RUBIX("overwrite":"true");
END
JOB2
Map {
data = LOAD "" USING RUBIX
}
cube data by
columns...
INNER dim, session_id
AGGREGATES SUM(pv) as pv,
COUNT_DISTINCT(mid) as uv,
COUNT_DISTINCT(session_id) as visits,
SUM(bounce) as bounce
grouping sets
(log_date,app_name,app_platform),
(log_date,app_name,app_platform,is_new) ......
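
To make clear what I expect, here is a minimal sketch in plain Python (not Cubert) of how I understand distinct counts per grouping set. The rows and values are made up for illustration, and the helper name count_distinct is my own, not a Cubert API. With real data, I would expect the two counts to differ whenever a mid has more than one session_id.

```python
from collections import defaultdict

# Made-up sample rows (hypothetical values, not my real data).
rows = [
    {"log_date": "d1", "app_name": "a", "app_platform": "ios", "is_new": 1, "mid": "m1", "session_id": "s1"},
    {"log_date": "d1", "app_name": "a", "app_platform": "ios", "is_new": 1, "mid": "m1", "session_id": "s2"},
    {"log_date": "d1", "app_name": "a", "app_platform": "ios", "is_new": 0, "mid": "m2", "session_id": "s3"},
    {"log_date": "d1", "app_name": "a", "app_platform": "ios", "is_new": 0, "mid": "m2", "session_id": "s4"},
]

# The two grouping sets shown in my script.
grouping_sets = [
    ("log_date", "app_name", "app_platform"),
    ("log_date", "app_name", "app_platform", "is_new"),
]

def count_distinct(rows, grouping_set, column):
    """Return {group key: number of distinct values of `column`} for one grouping set."""
    seen = defaultdict(set)
    for row in rows:
        key = tuple(row[c] for c in grouping_set)
        seen[key].add(row[column])
    return {key: len(values) for key, values in seen.items()}

for gs in grouping_sets:
    uv = count_distinct(rows, gs, "mid")
    visits = count_distinct(rows, gs, "session_id")
    for key in uv:
        print(gs, key, "uv =", uv[key], "visits =", visits[key])
```

On this sample, the first grouping set gives uv = 2 and visits = 4, so the two columns are not equal. That is the kind of difference I expected to see in the Cubert output.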