I'm confused about how to do a count distinct in PigPen. This is the Pig script I want to translate:
log = LOAD '20160412.tgz' AS (uid:chararray,path:chararray);
log_cnt = FOREACH (GROUP log BY path) {
    ids = DISTINCT log.uid;
    GENERATE group, COUNT(ids);
};
I read the PigPen wiki, but I'm still confused about how to convert it.
Thanks
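(require '[pigpen.core :as pig] '[pigpen.fold :as fold] 'clojure.set)

;; Combine partial sets with union, add each value with conj, then count the set.
(defn distinct-count []
  (fold/fold-fn clojure.set/union conj count))

;; `records` is the collection of values to count, e.g. the uids.
(->> (pig/return records)
     (pig/fold (distinct-count))
     (pig/dump))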