Partition tap problem on tez


Ankit Singhal

Sep 28, 2015, 7:28:55 AM
to cascading-user

While using PartitionTap with Tez, I am getting the following error, which did not occur when running on MR1.


Commit failed for output: outputName:4699ED4DDC614F69B82AF6383DC5DE00 of vertex/vertexGroup:F126FEB9619C4C68B53B2FF8F31868BE isVertexGroupOutput:false, java.io.IOException: Failed to delete file:/tmp/hadoop-user/part-v010-o000-00000
    at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:341)
    at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:362)
    at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitJob(FileOutputCommitter.java:310)
    at org.apache.hadoop.mapred.FileOutputCommitter.commitJob(FileOutputCommitter.java:136)
    at org.apache.hadoop.mapred.OutputCommitter.commitJob(OutputCommitter.java:259)
    at org.apache.tez.mapreduce.committer.MROutputCommitter.commitOutput(MROutputCommitter.java:99)
    at org.apache.tez.dag.app.dag.impl.DAGImpl$1.run(DAGImpl.java:960)
    at org.apache.tez.dag.app.dag.impl.DAGImpl$1.run(DAGImpl.java:957)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
    at org.apache.tez.dag.app.dag.impl.DAGImpl.commitOutput(DAGImpl.java:957)
    at org.apache.tez.dag.app.dag.impl.DAGImpl.access$2000(DAGImpl.java:144)
    at org.apache.tez.dag.app.dag.impl.DAGImpl$3.call(DAGImpl.java:1029)
    at org.apache.tez.dag.app.dag.impl.DAGImpl$3.call(DAGImpl.java:1024)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)



Ankit Singhal

Sep 28, 2015, 7:41:33 AM
to cascading-user
It seems that all output committers run simultaneously and each one deletes the shared _temporary directory, so another committer then finds that the file it still needs to move is missing:

Commit failed for output: outputName:FFDB8720294942C38D66D2AB92E0028C of vertex/vertexGroup:AF64E20CBE7D4CF0A78902EA6462DCE0 isVertexGroupOutput:false, java.io.FileNotFoundException: File file:/tmp/hadoop-user1/_temporary/0/task_144344012218510_0001_r_000000/part-v010-o000-00000 does not exist
    at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:534)
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:747)
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:524)
    at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:337)
    at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:289)
    at org.apache.hadoop.fs.RawLocalFileSystem.rename(RawLocalFileSystem.java:344)
    at org.apache.hadoop.fs.ChecksumFileSystem.rename(ChecksumFileSystem.java:509)
    at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:345)
    at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.mergePaths(FileOutputCommitter.java:362)
    at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.commitJob(FileOutputCommitter.java:310)
    at org.apache.hadoop.mapred.FileOutputCommitter.commitJob(FileOutputCommitter.java:136)
    at org.apache.hadoop.mapred.OutputCommitter.commitJob(OutputCommitter.java:259)
    at org.apache.tez.mapreduce.committer.MROutputCommitter.commitOutput(MROutputCommitter.java:99)
    at org.apache.tez.dag.app.dag.impl.DAGImpl$1.run(DAGImpl.java:960)
    at org.apache.tez.dag.app.dag.impl.DAGImpl$1.run(DAGImpl.java:957)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
    at org.apache.tez.dag.app.dag.impl.DAGImpl.commitOutput(DAGImpl.java:957)
    at org.apache.tez.dag.app.dag.impl.DAGImpl.access$2000(DAGImpl.java:144)
    at org.apache.tez.dag.app.dag.impl.DAGImpl$3.call(DAGImpl.java:1029)
    at org.apache.tez.dag.app.dag.impl.DAGImpl$3.call(DAGImpl.java:1024)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)


manny....@gmail.com

May 1, 2017, 1:24:48 PM
to cascading-user
Did anyone figure out this issue? 

Pierre-Antoine MARC

May 12, 2017, 11:13:49 AM
to cascading-user
Yup, we did. We had to subclass PartitionTap and override sinkConfInit to force the output path before delegating to the parent implementation:

    @Override
    public void sinkConfInit( FlowProcess<? extends Configuration> process, Configuration conf ) {
        Path qualifiedPath = new Path( getFullIdentifier( conf ) + "_temp" );
        HadoopUtil.setOutputPath( conf, qualifiedPath );
        super.sinkConfInit( process, conf );
    }



Hope it helps.
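
For anyone wondering why forcing a separate output path might help, here is a rough, self-contained sketch of the failure mode described earlier in the thread. It uses only java.nio and does not touch Hadoop, Tez or Cascading: ToyCommitter, commitBoth and the file names are hypothetical stand-ins, and the two commits run one after the other (not concurrently) so the result is deterministic. It assumes a committer stages files under <output>/_temporary, moves them into <output> on commit, and then deletes _temporary.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class SharedTemporaryDirDemo {

    // Hypothetical, heavily simplified stand-in for a job-level output committer.
    static final class ToyCommitter {
        private final Path output;
        private final String fileName;

        ToyCommitter(Path output, String fileName) {
            this.output = output;
            this.fileName = fileName;
        }

        // Write one staged file under <output>/_temporary.
        void stage() throws IOException {
            Path tmp = output.resolve("_temporary");
            Files.createDirectories(tmp);
            Files.write(tmp.resolve(fileName), fileName.getBytes(StandardCharsets.UTF_8));
        }

        // Move the staged file into <output>, then remove the whole _temporary directory.
        void commit() throws IOException {
            Path tmp = output.resolve("_temporary");
            Files.move(tmp.resolve(fileName), output.resolve(fileName));
            deleteRecursively(tmp);
        }
    }

    static void deleteRecursively(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return;
        }
        List<Path> paths;
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse order puts children before their parent directories.
            paths = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : paths) {
            Files.delete(p);
        }
    }

    // Stage both outputs, then commit both. Returns true only if both commits succeed.
    static boolean commitBoth(Path outA, Path outB) throws IOException {
        ToyCommitter a = new ToyCommitter(outA, "part-a");
        ToyCommitter b = new ToyCommitter(outB, "part-b");
        a.stage();
        b.stage();
        try {
            a.commit(); // with a shared output path this also deletes b's staged file
            b.commit();
            return true;
        } catch (IOException e) {
            System.out.println("commit failed: " + e);
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("committer-demo");
        Path shared = root.resolve("shared");
        System.out.println("shared output path succeeded: " + commitBoth(shared, shared));
        System.out.println("distinct output paths succeeded: "
                + commitBoth(root.resolve("a_temp"), root.resolve("b_temp")));
    }
}
```

In this toy model the shared-path run fails on the second commit because the first commit has already removed _temporary, while the run with distinct paths succeeds. Whether the real committers under Tez behave exactly this way is an assumption based on the stack traces above, not something this sketch proves.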