Cascading 3.0 WIP with Apache Tez platform support


Chris K Wensel

Sep 17, 2014, 2:28:13 PM
to cascadi...@googlegroups.com, patter...@googlegroups.com, lingua...@googlegroups.com, us...@tez.apache.org

Hey all

Happy to let everyone know we just pushed out a new 3.0 WIP that supports Apache Tez as a runtime platform.

The official announcement is here:


Note that the wip-50 download doesn't include hadoop2-tez.jar; it will show up in wip-51. In the meantime, conjars has the binaries.


And the JavaDoc can be found here:
  

The current working wip branch:


Notes and known issues can be tracked here:


Note that all platform tests pass, and the resulting binaries have been pushed to conjars.org.

That said, we expect there to be issues, so please email the cascading-user list with any questions or bugs.


We have also updated the sample apps to use Tez; other projects will be updated over time.

Chris K Wensel

Feb 27, 2015, 1:44:01 PM
to cascadi...@googlegroups.com, patter...@googlegroups.com, us...@tez.apache.org, lingua...@googlegroups.com
Hey all

Just a heads up we are up to wip-72 now.

This moves Tez to 0.6.0, and changes the default Hadoop version to 2.6. Tez intends to remain compatible with Hadoop 2.4. So from a runtime perspective, you shouldn’t see a difference.

This also provides deep counter support down to the task/slice level. This in turn requires a running YARN history server; without one, things degrade gracefully.

I’ve also updated the README to include help on using Tez on Amazon EMR.


As a convenience we are hosting Tez binaries in our S3 buckets. 


Lastly, we have only a few more items to build out for Tez, listed below, before we are feature complete for 3.0.


If you haven’t done so, please test your apps and languages on 3.0 so we can address any issues, especially API changes, before 3.0 final.

We are also publishing 3.0 Fluid artifacts.



ckw
Chris K Wensel




Luis Casillas

Feb 27, 2015, 5:32:20 PM
to cascadi...@googlegroups.com, patter...@googlegroups.com, us...@tez.apache.org, lingua...@googlegroups.com
On Friday, February 27, 2015 at 10:44:02 AM UTC-8, Chris K Wensel wrote:
Hey all

Just a heads up we are up to wip-72 now.

This moves Tez to 0.6.0, and changes the default Hadoop version to 2.6. Tez intends to remain compatible with Hadoop 2.4. So from a runtime perspective, you shouldn’t see a difference.

Thanks!  I took another shot at running our flow on Cascading/Tez in EMR.  I've gotten further than last time, but I'm still having problems.

First: it would be enormously valuable to figure out how to submit a Cascading/Tez application as a step to an EMR cluster, instead of running it by logging in to the master and issuing commands.  I had a shot at this and the challenge was that EMR bootstrap actions run before HDFS is up.  I think the solution here is to have both a bootstrap action (to set up environment variables, configuration files and such) and an EMR step that performs the actual copy of the jars, but I haven't had time to try this yet.

Second: the README you linked still says tez-0.5.3; I think you meant to update that to 0.6.0.

Third: I had to hardcode the -DH arguments in the README to my app, because I couldn't figure out in 5 minutes the correct way to specify options to the hadoop jar command so that my Cascading app's option processor doesn't see them.  Not difficult to do, I'm sure, but the command template shown in the README for kicking off an app didn't actually work.
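
One way to keep such options away from an application's own option processor is to peel them off the argument array before the application parses anything. The following is a minimal sketch in plain Java, assuming the arguments in question are generic `-Dkey=value` style properties; `GenericOptionSplitter` and the `example.key` names are hypothetical and are not part of Cascading, Tez, or the README. Hadoop itself ships `GenericOptionsParser` and `ToolRunner` for this job, and whether the README's command template expects the application to use them is not stated in this thread.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not part of Cascading, Tez, or Hadoop): separates
// "-Dkey=value" style properties from the arguments meant for the app.
final class GenericOptionSplitter {
    // Properties collected from -D options, in the order they appeared.
    final Map<String, String> properties = new LinkedHashMap<>();
    // Everything else, untouched, for the application's own option parser.
    final List<String> appArgs = new ArrayList<>();

    static GenericOptionSplitter split(String[] args) {
        GenericOptionSplitter result = new GenericOptionSplitter();
        for (int i = 0; i < args.length; i++) {
            String arg = args[i];
            String pair = null;
            if (arg.equals("-D") && i + 1 < args.length) {
                pair = args[++i];            // "-D key=value" (two tokens)
            } else if (arg.startsWith("-D") && arg.length() > 2) {
                pair = arg.substring(2);     // "-Dkey=value" (one token)
            }
            if (pair == null) {
                result.appArgs.add(arg);     // not a property: pass through
                continue;
            }
            // Split on the first '=' only, so values may contain '='.
            int eq = pair.indexOf('=');
            String key = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            result.properties.put(key, value);
        }
        return result;
    }

    public static void main(String[] args) {
        GenericOptionSplitter parsed = split(args);
        System.out.println("properties: " + parsed.properties);
        System.out.println("app args:   " + parsed.appArgs);
    }
}
```

An application could call `split(args)` first, copy `properties` into its job configuration, and hand only `appArgs` to its existing option processor. The sketch does not handle other generic options such as `-libjars`, which is one reason to prefer Hadoop's own parser when it is available.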

Fourth: my application (which sources from and sinks to S3, and uses HfsProps.setUseCombinedInput()) fails with two ClassNotFoundExceptions, one for com.amazon.ws.emr.hadoop.fs.EmrFileSystem and the other for cascading.tap.hadoop.Hfs$CombinedInputFormat.
 
Vertex failed, vertexName=DF233A0DD77540CCB87B54D63CD5CE02, vertexId=vertex_1425073885856_0004_1_02, diagnostics=[Vertex init failed : java.lang.RuntimeException: java.lang.ClassNotFoundException: Class com.amazon.ws.emr.hadoop.fs.EmrFileSystem not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1895)
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2379)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2392)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:89)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2431)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2413)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:368)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.<init>(FileOutputCommitter.java:91)
at org.apache.hadoop.mapred.FileOutputCommitter.getWrapped(FileOutputCommitter.java:65)
at org.apache.hadoop.mapred.FileOutputCommitter.setupJob(FileOutputCommitter.java:131)
at org.apache.hadoop.mapred.OutputCommitter.setupJob(OutputCommitter.java:213)
at org.apache.tez.mapreduce.committer.MROutputCommitter.setupOutput(MROutputCommitter.java:91)
at org.apache.tez.dag.app.dag.impl.VertexImpl$2.run(VertexImpl.java:1965)
at org.apache.tez.dag.app.dag.impl.VertexImpl$2.run(VertexImpl.java:1945)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556)
at org.apache.tez.dag.app.dag.impl.VertexImpl.initializeCommitters(VertexImpl.java:1945)
at org.apache.tez.dag.app.dag.impl.VertexImpl.initializeVertex(VertexImpl.java:1975)
at org.apache.tez.dag.app.dag.impl.VertexImpl.access$4300(VertexImpl.java:183)
at org.apache.tez.dag.app.dag.impl.VertexImpl$InitTransition.handleInitEvent(VertexImpl.java:3033)
at org.apache.tez.dag.app.dag.impl.VertexImpl$InitTransition.transition(VertexImpl.java:2942)
at org.apache.tez.dag.app.dag.impl.VertexImpl$InitTransition.transition(VertexImpl.java:2923)
at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
at org.apache.tez.state.StateMachineTez.doTransition(StateMachineTez.java:57)
at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:1587)
at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:182)
at org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:1768)
at org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:1754)
at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:106)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: Class com.amazon.ws.emr.hadoop.fs.EmrFileSystem not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1801)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1893)
... 35 more
, Vertex vertex_1425073885856_0004_1_02 [DF233A0DD77540CCB87B54D63CD5CE02] killed/failed due to:null]
Vertex failed, vertexName=C80F022F9BC448099A57447133356E96, vertexId=vertex_1425073885856_0004_1_01, diagnostics=[Vertex vertex_1425073885856_0004_1_01 [C80F022F9BC448099A57447133356E96] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: 37D24F5D40D44FAB924984B63029AEBC initializer failed, vertex=vertex_1425073885856_0004_1_01 [C80F022F9BC448099A57447133356E96], org.apache.tez.dag.api.TezUncheckedException: java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class cascading.tap.hadoop.Hfs$CombinedInputFormat not found
at org.apache.tez.mapreduce.hadoop.MRInputHelpers.generateOldSplits(MRInputHelpers.java:426)
at org.apache.tez.mapreduce.hadoop.MRInputHelpers.generateInputSplitsToMem(MRInputHelpers.java:295)
at org.apache.tez.mapreduce.common.MRInputAMSplitGenerator.initialize(MRInputAMSplitGenerator.java:122)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:245)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:239)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:239)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:226)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class cascading.tap.hadoop.Hfs$CombinedInputFormat not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1927)
at org.apache.hadoop.mapred.JobConf.getInputFormat(JobConf.java:686)
at org.apache.tez.mapreduce.hadoop.MRInputHelpers.generateOldSplits(MRInputHelpers.java:424)
... 13 more
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class cascading.tap.hadoop.Hfs$CombinedInputFormat not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1895)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1919)
... 15 more
Caused by: java.lang.ClassNotFoundException: Class cascading.tap.hadoop.Hfs$CombinedInputFormat not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1801)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1893)
... 16 more
]
Vertex failed, vertexName=C8082EF9E44B4C8BA8021E4CF7C07ADB, vertexId=vertex_1425073885856_0004_1_03, diagnostics=[Vertex vertex_1425073885856_0004_1_03 [C8082EF9E44B4C8BA8021E4CF7C07ADB] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: 57AB02C1620643C3974B2BC9C06A4C69 initializer failed, vertex=vertex_1425073885856_0004_1_03 [C8082EF9E44B4C8BA8021E4CF7C07ADB], org.apache.tez.dag.api.TezUncheckedException: java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class cascading.tap.hadoop.Hfs$CombinedInputFormat not found
at org.apache.tez.mapreduce.hadoop.MRInputHelpers.generateOldSplits(MRInputHelpers.java:426)
at org.apache.tez.mapreduce.hadoop.MRInputHelpers.generateInputSplitsToMem(MRInputHelpers.java:295)
at org.apache.tez.mapreduce.common.MRInputAMSplitGenerator.initialize(MRInputAMSplitGenerator.java:122)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:245)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:239)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:239)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:226)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class cascading.tap.hadoop.Hfs$CombinedInputFormat not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1927)
at org.apache.hadoop.mapred.JobConf.getInputFormat(JobConf.java:686)
at org.apache.tez.mapreduce.hadoop.MRInputHelpers.generateOldSplits(MRInputHelpers.java:424)
... 13 more
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class cascading.tap.hadoop.Hfs$CombinedInputFormat not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1895)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1919)
... 15 more
Caused by: java.lang.ClassNotFoundException: Class cascading.tap.hadoop.Hfs$CombinedInputFormat not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1801)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1893)
... 16 more
]
Vertex failed, vertexName=DC9B873F6A454D1FB032F8252E33C9C7, vertexId=vertex_1425073885856_0004_1_00, diagnostics=[Vertex vertex_1425073885856_0004_1_00 [DC9B873F6A454D1FB032F8252E33C9C7] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: 67A10F19A80B41B8A68522303348387A initializer failed, vertex=vertex_1425073885856_0004_1_00 [DC9B873F6A454D1FB032F8252E33C9C7], org.apache.tez.dag.api.TezUncheckedException: java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class cascading.tap.hadoop.Hfs$CombinedInputFormat not found
at org.apache.tez.mapreduce.hadoop.MRInputHelpers.generateOldSplits(MRInputHelpers.java:426)
at org.apache.tez.mapreduce.hadoop.MRInputHelpers.generateInputSplitsToMem(MRInputHelpers.java:295)
at org.apache.tez.mapreduce.common.MRInputAMSplitGenerator.initialize(MRInputAMSplitGenerator.java:122)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:245)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:239)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:239)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:226)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class cascading.tap.hadoop.Hfs$CombinedInputFormat not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1927)
at org.apache.hadoop.mapred.JobConf.getInputFormat(JobConf.java:686)
at org.apache.tez.mapreduce.hadoop.MRInputHelpers.generateOldSplits(MRInputHelpers.java:424)
... 13 more
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class cascading.tap.hadoop.Hfs$CombinedInputFormat not found
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1895)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1919)
... 15 more
Caused by: java.lang.ClassNotFoundException: Class cascading.tap.hadoop.Hfs$CombinedInputFormat not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1801)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1893)
... 16 more
]
Vertex killed, vertexName=A968F45732AC42EF8E595623D056A4B9, vertexId=vertex_1425073885856_0004_1_04, diagnostics=[Vertex received Kill in NEW state., Vertex vertex_1425073885856_0004_1_04 [A968F45732AC42EF8E595623D056A4B9] killed/failed due to:null]
DAG failed due to vertex failure. failedVertices:4 killedVertices:1
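
All of the failures above reduce to two class names that could not be loaded. A quick way to narrow down where they are missing is to probe a given classpath for them directly. The following is a minimal sketch, assuming it is launched with the same classpath as the process being investigated; `ClasspathProbe` is a hypothetical helper, not part of Cascading or Tez. Note that the traces above come from the Tez `DAGAppMaster`, so a probe run on the client only says something about the client's classpath.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic helper (not part of Cascading or Tez): reports
// which of the given class names cannot be loaded by a class loader.
final class ClasspathProbe {
    static List<String> unloadable(ClassLoader loader, String... classNames) {
        List<String> failed = new ArrayList<>();
        for (String name : classNames) {
            try {
                // initialize=false: only locate and load the class,
                // without running its static initializers.
                Class.forName(name, false, loader);
            } catch (ClassNotFoundException | LinkageError e) {
                failed.add(name);
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        // Defaults are the two class names taken from the stack traces.
        String[] names = args.length > 0 ? args : new String[] {
            "com.amazon.ws.emr.hadoop.fs.EmrFileSystem",
            "cascading.tap.hadoop.Hfs$CombinedInputFormat"
        };
        ClassLoader loader = ClasspathProbe.class.getClassLoader();
        List<String> failed = unloadable(loader, names);
        for (String name : names) {
            System.out.println((failed.contains(name) ? "MISSING " : "found   ") + name);
        }
    }
}
```

If both names load on the client but still fail inside the cluster, that points at how jars are shipped to the Tez containers rather than at the application jar itself.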

