EMRFS error when running with Tez on EMR


Mike DeLaurentis

Mar 23, 2015, 10:22:26 AM
to cascadi...@googlegroups.com
Hi,

I'm trying to get a Cascading application working with Tez on EMR. I'm using Cascading 3.0.0-WIP-80 and EMR AMI 3.4.0. I followed the instructions here (https://github.com/cwensel/cascading/tree/wip-3.0/cascading-hadoop2-tez), specifically the section on running on Amazon EMR (https://github.com/cwensel/cascading/tree/wip-3.0/cascading-hadoop2-tez#running-on-amazon-emr). I'm now seeing a ClassNotFoundException for EmrFileSystem in the Tez AM logs (stack trace below). We read the task's input from an s3:// URL, and I believe EMR now uses EMRFS to handle those URLs. I'm confused as to why the EMRFS jar isn't on the classpath. My application doesn't depend on EMRFS directly; I assumed the jar would be provided by the EMR framework. Has anyone else seen an error like this?

Thanks,

Mike

Vertex failed, vertexName=D1288C0A64984AF3B0C190B05535367F, vertexId=vertex_1426801498977_0234_1_00, diagnostics=[Vertex
   vertex_1426801498977_0234_1_00 [D1288C0A64984AF3B0C190B05535367F] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input:
   158575BE56934CB1B3D270B607CD6F6B initializer failed, vertex=vertex_1426801498977_0234_1_00 [D1288C0A64984AF3B0C190B05535367F],
   java.lang.RuntimeException: java.lang.ClassNotFoundException: Class com.amazon.ws.emr.hadoop.fs.EmrFileSystem not found
   at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1895)
   at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2379)
   at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2392)
   at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:89)
   at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2431)
   at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2413)
   at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:368)
   at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
   at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:256)
   at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:228)
   at org.apache.hadoop.mapred.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:45)
   at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:304)
   at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat.getSplits(TezGroupedSplitsInputFormat.java:75)
   at org.apache.tez.mapreduce.hadoop.MRInputHelpers.generateOldSplits(MRInputHelpers.java:441)
   at org.apache.tez.mapreduce.hadoop.MRInputHelpers.generateInputSplitsToMem(MRInputHelpers.java:295)
   at org.apache.tez.mapreduce.common.MRInputAMSplitGenerator.initialize(MRInputAMSplitGenerator.java:122)
   at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:245)
   at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:239)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Subject.java:415)
   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556)
   at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:239)
   at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:226)
   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:745)
   Caused by: java.lang.ClassNotFoundException: Class com.amazon.ws.emr.hadoop.fs.EmrFileSystem not found
   at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1801)
   at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1893)
   ... 26 more
   ]
   DAG failed due to vertex failure. failedVertices:1 killedVertices:0
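
For readers puzzling over the trace: it fails inside `FileSystem.getFileSystemClass` and then `Configuration.getClass`, which is where Hadoop maps a URL scheme to an implementation class by name, using a configuration key of the form `fs.<scheme>.impl`. The trace implies that on this cluster the `s3` scheme is mapped to `com.amazon.ws.emr.hadoop.fs.EmrFileSystem`, so the lookup fails in any JVM (here the Tez AM) whose classpath lacks the EMRFS jar. The sketch below imitates that lookup with a plain `Map` instead of Hadoop's `Configuration`. It is a simplification, not Hadoop's actual code, and the bucket path is hypothetical.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

/**
 * Simplified sketch (not Hadoop's code) of resolving a URL scheme such as "s3"
 * to a FileSystem implementation class via an "fs.<scheme>.impl" key.
 * If the named class is not on the classpath, Class.forName throws
 * ClassNotFoundException, the same failure reported in the Tez AM log.
 */
public class FsSchemeLookup {

    // Returns the implementation class name configured for the URI's scheme, or null.
    static String implClassName(Map<String, String> conf, URI uri) {
        return conf.get("fs." + uri.getScheme() + ".impl");
    }

    // Loads the configured implementation class from the current classpath.
    static Class<?> resolve(Map<String, String> conf, URI uri) throws ClassNotFoundException {
        String name = implClassName(conf, uri);
        if (name == null) {
            throw new IllegalStateException("No FileSystem configured for scheme: " + uri.getScheme());
        }
        return Class.forName(name);
    }

    public static void main(String[] args) {
        // Assumption: mirrors the mapping implied by the stack trace for s3:// URLs.
        Map<String, String> conf = new HashMap<>();
        conf.put("fs.s3.impl", "com.amazon.ws.emr.hadoop.fs.EmrFileSystem");

        URI input = URI.create("s3://example-bucket/input/part-00000"); // hypothetical path
        try {
            resolve(conf, input);
            System.out.println("resolved");
        } catch (ClassNotFoundException e) {
            // Runs whenever the EMRFS jar is absent from this JVM's classpath.
            System.out.println("missing: " + e.getMessage());
        }
    }
}
```

Run outside EMR, this takes the `missing:` branch, which is the local equivalent of what the Tez AM hit.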

Chris K Wensel

Mar 23, 2015, 1:19:16 PM
to cascadi...@googlegroups.com
I haven’t seen this. Can you open a bug with Amazon?

ckw

--
You received this message because you are subscribed to the Google Groups "cascading-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cascading-use...@googlegroups.com.
To post to this group, send email to cascadi...@googlegroups.com.
Visit this group at http://groups.google.com/group/cascading-user.
To view this discussion on the web visit https://groups.google.com/d/msgid/cascading-user/CAK4oS5%3D3A%2BOvdPc3_-mjPG1yx4_nDVv%3D59_hyhNjkb8DAgQarg%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


Mike DeLaurentis

Mar 24, 2015, 1:52:43 PM
to cascadi...@googlegroups.com
Sure, I'll do that. Before submitting a support issue with AWS, I figured I'd try their recommended (though unofficial) bootstrap scripts for Tez: s3://support.elasticmapreduce/bootstrap-actions/ami/3.2.x/install-tez.beta and s3://support.elasticmapreduce/bootstrap-actions/ami/3.2.x/setup-tez.beta. I noticed that they install either Tez 0.5.1 or Tez 0.4.1; it doesn't look like they've added support for 0.6.0. Do you know of any reason why Cascading wouldn't run with Tez 0.5.1, or any other reason why those bootstrap scripts wouldn't work for Cascading?

Thanks again,

Mike

Andre Kelpe

Mar 24, 2015, 2:03:05 PM
to cascadi...@googlegroups.com
0.5.3 should work as well. We rely on 0.6.0 for all the new timeline
server features, but we discover the capabilities of the environment
at runtime.
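
Discovering capabilities at runtime can be illustrated with a generic reflection probe: check whether a class from the newer API is loadable, and enable the feature only if it is. This is a hypothetical sketch of the general technique, not how Cascading actually detects Tez 0.6.0; the marker class name `org.example.tez.TimelineMarker` is made up.

```java
/**
 * Hypothetical sketch of runtime capability discovery: probe the classpath for a
 * class that only exists in newer releases and enable a feature only if found.
 * This is NOT Cascading's actual implementation.
 */
public class CapabilityProbe {

    // True if the named class can be located without initializing it.
    static boolean hasClass(String className) {
        try {
            Class.forName(className, false, CapabilityProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical marker class standing in for an API present only in a newer Tez.
        String marker = "org.example.tez.TimelineMarker";
        if (hasClass(marker)) {
            System.out.println("timeline features enabled");
        } else {
            System.out.println("timeline features disabled; running without them");
        }
    }
}
```

The design choice is that an older runtime degrades gracefully instead of failing at startup, which matches the claim that 0.5.x should still work without the 0.6.0-only features.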

- Andre


--
André Kelpe
an...@concurrentinc.com
http://concurrentinc.com

Chris K Wensel

Mar 24, 2015, 6:53:30 PM
to cascadi...@googlegroups.com
Can you try disabling combined input format support? It’s off by default, so it would only be enabled if you are calling HfsProps#setUseCombinedInput.

From a quick look at a prior email thread, I think this error went away when that was disabled.

FWIW, this is enabled in Tez by default through a different means. I’m working on a way to gracefully handle the case where this property is enabled in Tez.
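
A minimal sketch of explicitly turning combined input off through plain job properties, without depending on Cascading classes. The key `cascading.hadoop.hfs.combine.files` is an assumption about what `HfsProps#setUseCombinedInput` writes; check it against the `HfsProps` constants in your Cascading version, or call `HfsProps` directly instead.

```java
import java.util.Properties;

/**
 * Sketch: disable combined input via plain properties.
 * Assumption: the key below is the one HfsProps#setUseCombinedInput writes;
 * verify it against your Cascading version.
 */
public class CombinedInputConfig {

    // Assumed property key (not confirmed in this thread).
    static final String COMBINE_INPUT_FILES = "cascading.hadoop.hfs.combine.files";

    // Returns a copy of the given properties with combined input explicitly disabled.
    static Properties withCombinedInputDisabled(Properties base) {
        Properties props = new Properties();
        props.putAll(base);
        props.setProperty(COMBINE_INPUT_FILES, "false");
        return props;
    }

    // Reads the flag back; an absent key counts as disabled ("off by default").
    static boolean isCombinedInputEnabled(Properties props) {
        return Boolean.parseBoolean(props.getProperty(COMBINE_INPUT_FILES, "false"));
    }

    public static void main(String[] args) {
        Properties base = new Properties();
        base.setProperty(COMBINE_INPUT_FILES, "true"); // as if it had been enabled somewhere
        Properties props = withCombinedInputDisabled(base);
        System.out.println(isCombinedInputEnabled(props)); // false
        // These properties would then be handed to the flow connector when building the flow.
    }
}
```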

ckw




Mike DeLaurentis

Mar 25, 2015, 3:12:11 PM
to cascadi...@googlegroups.com
Chris,

Thanks for the tip. I tried disabling combined input with HfsProps.setUseCombinedInput(false), and that didn't fix it. I just opened a case with AWS. I'll let you know what they say.

Thanks,

Mike

Chris K Wensel

Mar 25, 2015, 3:14:32 PM
to cascadi...@googlegroups.com
I’ve pushed a commit to wip-3.0 that will ignore the combined input format config setting on Tez (and issue a warning).

It should show up as wip-91 tonight.

Please let me know if disabling the setting removes the EMRFS error. If not, I can ask the EMR team for guidance.
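
The behavior described for wip-91 (ignore the combined-input setting on Tez and warn) might look roughly like the guard below. This is a hypothetical sketch, not the actual commit; the property key and the `runningOnTez` flag are assumptions made for illustration.

```java
import java.util.Properties;
import java.util.logging.Logger;

/**
 * Hypothetical sketch: on Tez, a combined-input request is ignored with a warning,
 * since Tez already combines input splits by other means. Not the actual Cascading
 * commit; the property key is an assumption.
 */
public class TezCombinedInputGuard {

    private static final Logger LOG = Logger.getLogger(TezCombinedInputGuard.class.getName());

    static final String COMBINE_INPUT_FILES = "cascading.hadoop.hfs.combine.files"; // assumed key

    // Effective setting: always false on Tez, otherwise whatever was configured (default false).
    static boolean effectiveCombinedInput(Properties props, boolean runningOnTez) {
        boolean requested = Boolean.parseBoolean(props.getProperty(COMBINE_INPUT_FILES, "false"));
        if (runningOnTez && requested) {
            LOG.warning(COMBINE_INPUT_FILES + " is ignored on Tez");
            return false;
        }
        return requested;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty(COMBINE_INPUT_FILES, "true");
        System.out.println(effectiveCombinedInput(props, true));  // false, plus a logged warning
        System.out.println(effectiveCombinedInput(props, false)); // true
    }
}
```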

ckw



Mike DeLaurentis

Mar 26, 2015, 12:15:00 PM
to cascadi...@googlegroups.com
Chris,

I tried running it with wip-91, and that did not fix the EmrFileSystem ClassNotFoundException. I also tried explicitly disabling combined inputs with HfsProps, and that did not fix it either. I put in an AWS support ticket yesterday but haven't heard back yet. I might try downloading my input data to the HDFS filesystem, rather than reading the input from S3, and see if that fixes it. That wouldn't be an ideal long-term solution, but I'd like to see if we'll get any other errors once we get past this one.

Thanks,

Mike

Chris K Wensel

Mar 26, 2015, 12:20:55 PM
to cascadi...@googlegroups.com
I gave the EMR team a pointer to this thread (and the other related one); let's see if they follow up here or on the forum.

ckw



Rohit Garg

Dec 8, 2015, 5:13:03 PM
to cascading-user
Hey, I just installed Tez on EMR using the bootstrap script provided by Amazon. How did you get Tez 0.5.1? I got Tez 0.4.1.

Chris K Wensel

Dec 8, 2015, 5:30:30 PM
to cascadi...@googlegroups.com
We followed these instructions:


They may be out of date relative to the 4.x AMI versions, though. A pull request would be appreciated if there are differences.

ckw


RG

Dec 8, 2015, 5:40:52 PM
to cascading-user
Thanks so much. Just to make sure: is there no bootstrap script provided by Amazon that I can use directly for Tez versions after 0.4.1, like the ones below?
Bootstrap scripts for Tez: s3://support.elasticmapreduce/bootstrap-actions/ami/3.2.x/install-tez.beta and s3://support.elasticmapreduce/bootstrap-actions/ami/3.2.x/setup-tez.beta

Chris K Wensel

Dec 8, 2015, 5:48:58 PM
to cascadi...@googlegroups.com
I have no idea. I’ve never used them.

