Hello Team,
I am trying to run a GroupBy followed by a Count aggregation, but I am getting a NullPointerException. It is straightforward code, similar to the Impatient Part 2 tutorial. Am I missing anything?
My code:
Scheme schIn2 = new TextDelimited(new Fields("id_2", "oddeven", "name"), ",");  // input: id,oddeven,name
Scheme schOut1 = new TextDelimited(new Fields("id_2", "name"), true, ",");      // declared but not used below
Tap srctap = new Hfs(schIn2, "/user/hive/warehouse/pokernew/poker_1.csv");
Tap sinkTap = new Hfs(new TextDelimited(true, ","), "/user/hashoutput/", SinkMode.REPLACE);
Pipe lhs = new Pipe("lhs");
lhs = new GroupBy(lhs, new Fields("oddeven"));                          // group on the oddeven column
lhs = new Every(lhs, new Fields("oddeven"), new Count(), Fields.ALL);   // count rows per group
FlowDef flowDef = FlowDef.flowDef().addSource(lhs, srctap).addTailSink(lhs, sinkTap);
Properties properties = new Properties();
AppProps.setApplicationJarClass(properties, cascadClient.class);
Hadoop2MR1FlowConnector flowConnector = new Hadoop2MR1FlowConnector(properties);
Flow flow = flowConnector.connect(flowDef);
flow.writeDOT("dot/Segment.dot");
flow.complete();
Stack trace:
Exception in thread "main" cascading.flow.FlowException: step failed: (1/1) /user/hashoutput, with job id: job_1440408860150_0008, please see cluster logs for failure messages
at cascading.flow.planner.FlowStepJob.blockOnJob(FlowStepJob.java:261)
at cascading.flow.planner.FlowStepJob.start(FlowStepJob.java:162)
at cascading.flow.planner.FlowStepJob.call(FlowStepJob.java:124)
at cascading.flow.planner.FlowStepJob.call(FlowStepJob.java:43)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Cluster logs:
2015-08-24 16:22:46,320 INFO [main] cascading.tap.hadoop.io.MultiInputSplit: current split input path: hdfs://localhost:9000/user/hive/warehouse/pokernew/poker_1.csv
2015-08-24 16:22:46,322 INFO [main] org.apache.hadoop.mapred.MapTask: Processing split: cascading.tap.hadoop.io.MultiInputSplit@5c8504fd
2015-08-24 16:22:46,348 INFO [main] org.apache.hadoop.mapred.MapTask: numReduceTasks: 10
2015-08-24 16:22:46,352 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.lang.NullPointerException
at org.apache.hadoop.mapred.MapTask.createSortingCollector(MapTask.java:414)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:442)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Here is the input data:
1999977,1,B1999977
1999978,0,B1999978
1999979,1,B1999979
1999980,0,B1999980
1999981,1,B1999981
1999982,0,B1999982
1999983,1,B1999983
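For reference, this is the result I expect the GroupBy/Count pipeline to produce for the seven rows above, sketched in plain Java (the class and method names here are just for illustration, not part of the Cascading job):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class OddEvenCount {

    // Count rows per "oddeven" value from id,oddeven,name CSV lines.
    static Map<String, Integer> countByOddEven(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            String key = line.split(",")[1];  // second column: oddeven
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList(
            "1999977,1,B1999977", "1999978,0,B1999978", "1999979,1,B1999979",
            "1999980,0,B1999980", "1999981,1,B1999981", "1999982,0,B1999982",
            "1999983,1,B1999983");
        System.out.println(countByOddEven(rows));  // prints {0=3, 1=4}
    }
}
```

So the sink should end up with one row per oddeven value plus its count.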
Thanks,
Varun