COUNT_DISTINCT returns the same value for two different columns

Jun 18, 2015, 12:23:11 PM

to cubert...@googlegroups.com

Hi Maneesh Varshney,

I have recently been looking into Cubert.

I translated a SQL query into a Cubert script using the CUBE operator with GROUPING SETS.

Why do COUNT_DISTINCT(mid) and COUNT_DISTINCT(session_id) return the same value in every row after the cube is computed over the grouping sets?

count_distinct(mid)    count_distinct(session_id)
500                    500
200                    200

Can anyone tell me whether my script is wrong?

Here is part of my code:

JOB1
    Map { data = LOAD xxx USING TEXT }
    BLOCKGEN data BY SIZE 1000000 PARTITIONED ON mid, session_id;
    STORE data INTO "/cubert/temp" USING RUBIX("overwrite":"true");
END

JOB2
    Map {
        data = LOAD "" USING RUBIX
    }
    cube data by
        columns...
        INNER dim, session_id
        AGGREGATES SUM(pv) as pv,
            COUNT_DISTINCT(mid) as uv,
            COUNT_DISTINCT(session_id) as visits,
            SUM(bounce) as bounce
        grouping sets
            (log_date,app_name,app_platform),
            (log_date,app_name,app_platform,is_new) ......
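
To make the expectation concrete, here is a minimal Python sketch of the semantics I would expect from the cube. It is not Cubert code. The sample rows and the `cube` helper are hypothetical and invented for illustration; only the column names and the two grouping sets come from the script above.

```python
from collections import defaultdict

# Hypothetical sample rows (values invented): member m1 has two sessions,
# member m2 has one, so uv and visits should differ.
rows = [
    {"log_date": "2015-06-18", "app_name": "app", "app_platform": "ios",
     "is_new": 1, "mid": "m1", "session_id": "s1", "pv": 3, "bounce": 0},
    {"log_date": "2015-06-18", "app_name": "app", "app_platform": "ios",
     "is_new": 1, "mid": "m1", "session_id": "s1", "pv": 2, "bounce": 0},
    {"log_date": "2015-06-18", "app_name": "app", "app_platform": "ios",
     "is_new": 1, "mid": "m1", "session_id": "s2", "pv": 1, "bounce": 1},
    {"log_date": "2015-06-18", "app_name": "app", "app_platform": "ios",
     "is_new": 0, "mid": "m2", "session_id": "s3", "pv": 1, "bounce": 1},
]

grouping_sets = [
    ("log_date", "app_name", "app_platform"),
    ("log_date", "app_name", "app_platform", "is_new"),
]


def cube(rows, grouping_sets):
    """Aggregate rows once per grouping set (hypothetical reference helper)."""
    result = {}
    for dims in grouping_sets:
        groups = defaultdict(
            lambda: {"pv": 0, "bounce": 0, "mids": set(), "sessions": set()}
        )
        for row in rows:
            key = tuple(row[d] for d in dims)
            g = groups[key]
            g["pv"] += row["pv"]
            g["bounce"] += row["bounce"]
            # Distinct counts are tracked per column, independently.
            g["mids"].add(row["mid"])
            g["sessions"].add(row["session_id"])
        for key, g in groups.items():
            result[(dims, key)] = {
                "pv": g["pv"],
                "uv": len(g["mids"]),
                "visits": len(g["sessions"]),
                "bounce": g["bounce"],
            }
    return result


result = cube(rows, grouping_sets)
for (dims, key), agg in result.items():
    print(dims, key, agg)
```

With this made-up data, the first grouping set gives uv = 2 and visits = 3, which is the kind of difference between the two columns I expected to see in the Cubert output.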
