at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.moveTaskOutputs(FileOutputCommitter.java:199)
at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.moveTaskOutputs(FileOutputCommitter.java:214)
at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitTask(FileOutputCommitter.java:167)
at com.datasalt.pangool.tuplemr.mapred.lib.output.ProxyOutputFormat$ProxyOutputCommitter.commitTask(ProxyOutputFormat.java:179)
at org.apache.hadoop.mapred.Task.commit(Task.java:1014)
at org.apache.hadoop.mapred.Task.done(Task.java:884)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:453)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
I am running pangool-core-0.70 on top of CDH 4.5.0. A cursory look at ProxyOutputCommitter.commitTask revealed that FileOutputCommitter.commitTask is invoked twice; I don't believe it is
idempotent, hence the error message. Also, if I bypass ProxyOutputFormat and set the output format directly, e.g., job.setOutputFormatClass(TextOutputFormat.class), everything works as expected.
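The suspected failure mode (a commit that moves task output being run twice) can be sketched without Hadoop. The following is only an illustration under stated assumptions: it uses plain java.nio, not FileOutputCommitter, and the class and method names (DoubleCommitSketch, commit, commitIfPending) are hypothetical. It does not reproduce the actual Pangool or Hadoop code paths.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class DoubleCommitSketch {

    // Hypothetical stand-in for a task commit: promote the task's temporary
    // output directory to its final location by renaming it.
    static void commit(Path taskTmp, Path finalDir) throws IOException {
        Files.move(taskTmp, finalDir);
    }

    // Guarded variant: does nothing if the temporary output is already gone.
    static boolean commitIfPending(Path taskTmp, Path finalDir) throws IOException {
        if (!Files.exists(taskTmp)) {
            return false; // nothing left to commit
        }
        commit(taskTmp, finalDir);
        return true;
    }

    // Creates a temporary task directory containing a single output file.
    private static Path newTaskDir(Path root) throws IOException {
        Path taskTmp = Files.createDirectory(root.resolve("_temporary"));
        Files.createFile(taskTmp.resolve("part-r-00000"));
        return taskTmp;
    }

    // Returns true if committing the same task output a second time fails.
    static boolean secondCommitFails() {
        try {
            Path root = Files.createTempDirectory("commit-sketch");
            Path taskTmp = newTaskDir(root);
            Path finalDir = root.resolve("out");
            commit(taskTmp, finalDir); // first commit succeeds
            try {
                commit(taskTmp, finalDir); // source directory no longer exists
                return false;
            } catch (IOException e) {
                return true;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Returns true if the guarded commit treats a second call as a no-op.
    static boolean guardedSecondCommitIsNoOp() {
        try {
            Path root = Files.createTempDirectory("commit-sketch");
            Path taskTmp = newTaskDir(root);
            Path finalDir = root.resolve("out");
            boolean first = commitIfPending(taskTmp, finalDir);
            boolean second = commitIfPending(taskTmp, finalDir);
            return first && !second;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("second unguarded commit fails: " + secondCommitFails());
        System.out.println("second guarded commit is a no-op: " + guardedSecondCommitIsNoOp());
    }
}
```

If the wrapper really does call the underlying committer twice, a check of this kind (or simply delegating once) would be one way to avoid the second failure; whether that matches the actual cause here is an assumption.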
Any suggestions?
Thanks,
stan
<dependency>
  <groupId>com.datasalt.pangool</groupId>
  <artifactId>pangool-core</artifactId>
  <version>0.71-SNAPSHOT</version>
</dependency>

You will need to add the snapshot repository:

<repository>
  <id>pangool-snapshots</id>
  <name>Pangool Snapshot Repository</name>
  <url>http://clinker.datasalt.com/nexus/content/groups/public-snapshots</url>
  <releases>
    <enabled>false</enabled>
  </releases>
  <snapshots>
    <enabled>true</enabled>
  </snapshots>
</repository>