how to do count distinct


ede...@gmail.com

May 4, 2016, 11:52:46 PM
to PigPen Support
Hi,

I'm confused about how to do a count distinct in PigPen. This is the Pig script I want to translate:

log = LOAD '20160412.tgz' AS (uid:chararray, path:chararray);
log_cnt = FOREACH (GROUP log BY path) {
    ids = DISTINCT log.uid;
    GENERATE group, COUNT(ids);
};

I read the PigPen wiki, but I'm still confused about how to convert this.

Thanks

zhihong zhang

May 5, 2016, 11:36:02 PM
to PigPen Support
https://github.com/Netflix/PigPen/issues/108
Found how to do this. Thanks.

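(require '[pigpen.core :as pig]
         '[pigpen.fold :as fold]
         'clojure.set)

;; union merges partial sets, conj adds a value, count runs on the final set
(defn distinct-count
  []
  (fold/fold-fn clojure.set/union conj count))

;; records is a placeholder for the input data
(->> (pig/return records)
     (pig/fold (distinct-count))
     (pig/dump))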
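
For anyone reading later, here is a rough sketch in plain Clojure (no PigPen required) of how the three functions passed to fold-fn fit together, and of what the per-path count in the original Pig script would produce. The sample records and the names simulate-distinct-count and distinct-uids-per-path are made up for illustration; this only imitates the partial aggregation locally and is not how PigPen itself runs the fold.

```clojure
(require 'clojure.set)

;; Hypothetical sample data standing in for the (uid, path) log rows.
(def sample-log
  [{:uid "u1" :path "/a"}
   {:uid "u2" :path "/a"}
   {:uid "u1" :path "/a"}
   {:uid "u3" :path "/b"}
   {:uid "u3" :path "/b"}])

;; The same three functions used in distinct-count above.
(def combinef clojure.set/union) ; merges partial sets; called with no args it returns #{}
(def reducef conj)               ; adds one value to a partial set
(def post count)                 ; turns the final set into a number

(defn simulate-distinct-count
  "Splits values into two chunks, reduces each chunk separately,
  combines the partial sets, then applies the post step."
  [values]
  (let [[left right] (split-at (quot (count values) 2) values)
        partial-left (reduce reducef (combinef) left)
        partial-right (reduce reducef (combinef) right)]
    (post (combinef partial-left partial-right))))

(defn distinct-uids-per-path
  "Groups rows by :path and counts the distinct :uid values in each group,
  mirroring GROUP log BY path / DISTINCT log.uid / COUNT(ids)."
  [rows]
  (into {}
        (for [[path group] (group-by :path rows)]
          [path (simulate-distinct-count (map :uid group))])))

(println (simulate-distinct-count (map :uid sample-log)))
(println (distinct-uids-per-path sample-log))
```

The first call counts distinct uids over the whole input, which is what the snippet above does. The second shows the grouped result the original question asked for, computed locally.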