DataFu has been accepted into the Apache Incubator. The new project page can be found at

Please direct new questions to the dev mailing list listed at

This group will be retained to keep a record of previous questions.

Issue running hourglass on CDH4 Aaron Josephs 7/22/14
Sample impressions file. Sai SaiGraph 6/23/14
Combining job that runs more frequently than daily Jason Bodnar 2/25/14
Mapping output of Hourglass jobs to hive tables Abhishek Gayakwad 2/12/14
Hourglass Input paths Abhishek Gayakwad 2/5/14
why datafu hourglass has hard dependency on avro format ? <eom> Abhishek Gayakwad 2/3/14
Negative variance... Adrian Landman 1/15/14
Compute percentile Rizwana Rizia 11/13/13
Doing official release with Pig 0.12.0 support? Jarek Jarcec Cecho 11/6/13
Sessionize() giving Unexpected internal error. Expected input bag to contain a TUPLE, but instead found chararray Faraz Rasheed 7/3/13
Datafu Branches on Github Sajid Raza 6/23/13
Sessionize spills records to disk Mike Sukmanowsky 1/10/13
SetUnion does not handle large inputs gracefully Josh Rosenberg 1/7/13
Trying to use MarkovPairs but keep getting errors Johan Gustavsson 11/14/12
Can someone give me an example of ApplyQuantile use? Ryan Michael 9/17/12
run Datafu unit test in JUnit framework Johnny Zhang 8/29/12
How does the Median/Quantiles work Amit 4/11/12
BagSplit Question James Newhaven 4/11/12
got a error when run "ant test" Johnny Zhang 3/2/12
pagerank input data Joseph Wang 2/10/12