Exception in thread "main" java.lang.RuntimeException: hadoop job failed
at com.linkedin.camus.etl.kafka.CamusJob.run(CamusJob.java:363)
at com.linkedin.camus.etl.kafka.CamusJob.run(CamusJob.java:646)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at com.linkedin.camus.etl.kafka.CamusJob.main(CamusJob.java:610)
On the Hadoop side I see a reference to Camus. In hadoop-hadoop-tasktracker-pppdc9prd310.log, I see:
2014-07-10 21:50:04,280 FATAL org.apache.hadoop.mapred.TaskTracker: Task: attempt_201407101039_0017_m_000003_0 - Killed : java.lang.ClassNotFoundException: com.linkedin.camus.etl.IEtlKey
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:247)
at org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:1713)
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1678)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1772)
at org.apache.hadoop.mapreduce.task.JobContextImpl.getOutputFormatClass(JobContextImpl.java:227)
at org.apache.hadoop.mapred.Task.initialize(Task.java:521)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:313)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
Appreciate any help.
- Shekar
http://blog.cloudera.com/blog/2011/01/how-to-include-third-party-libraries-in-your-map-reduce-job/
But when I try running:
hadoop jar /usr/lib/hadoop-0.20-mapreduce/lib/camus-api-0.1.0-SNAPSHOT.jar com.linkedin.camus.etl.IEtlKey -libjars /usr/lib/hadoop-0.20-mapreduce/lib/camus-api-0.1.0-SNAPSHOT.jar
Exception in thread "main" java.lang.NoSuchMethodException: com.linkedin.camus.etl.IEtlKey.main([Ljava.lang.String;)
at java.lang.Class.getMethod(Class.java:1605)
at org.apache.hadoop.util.RunJar.main(RunJar.java:202)
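The NoSuchMethodException above is expected: `hadoop jar` (via RunJar) reflectively looks up a public static main(String[]) on whatever class you name on the command line, and IEtlKey is a key class with no main method. Below is a minimal, self-contained sketch of that lookup; KeyLike and DriverLike are hypothetical stand-ins for IEtlKey and CamusJob, not Camus code.

```java
import java.lang.reflect.Method;

// Sketch of what `hadoop jar` / RunJar does: reflectively find a
// public static main(String[]) on the named class. A key class like
// com.linkedin.camus.etl.IEtlKey has no main, so the lookup throws
// NoSuchMethodException -- the error shown above. A driver class like
// com.linkedin.camus.etl.kafka.CamusJob does have one.
public class RunJarSketch {
    // Stand-in for a key class such as IEtlKey (no main method).
    static class KeyLike {}

    // Stand-in for a driver class such as CamusJob (has a main method).
    static class DriverLike {
        public static void main(String[] args) {}
    }

    static boolean hasMain(Class<?> c) {
        try {
            Method m = c.getMethod("main", String[].class);
            return m != null;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("KeyLike has main: " + hasMain(KeyLike.class));
        System.out.println("DriverLike has main: " + hasMain(DriverLike.class));
    }
}
```

This is why the class to run must be the job driver (CamusJob), as Felix points out below, not a key or value class.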
- Shekar
Also, the main class you want to execute is CamusJob, not IEtlKey. Something like this:
java -cp $your_module_based_off_of_camus_example/target/camus-whatever_version_you_have-SNAPSHOT-shaded.jar:/etc/hadoop/conf:/usr/lib/hadoop-hdfs/hadoop-hdfs.jar:log4j.properties com.linkedin.camus.etl.kafka.CamusJob -P $properties_file
--
Felix GV
Data Infrastructure Engineer
Distributed Data Systems
LinkedIn
f...@linkedin.com
linkedin.com/in/felixgv
________________________________________
From: camu...@googlegroups.com [camu...@googlegroups.com] on behalf of cti...@gmail.com [cti...@gmail.com]
Sent: Saturday, July 12, 2014 3:41 PM
To: camu...@googlegroups.com
Cc: cti...@gmail.com
Subject: Re: Camus to CDH4 -java.lang.RuntimeException: hadoop job failed
This is what I have:
java -Xss256m -cp camus-api/target/*:camus-etl-kafka/target/camus-etl-kafka-0.1.0-SNAPSHOT.jar:lib/httpclient-4.3.4.jar:lib/kafka-0.7.2.jar:lib/zkclient-0.1.jar:lib/avro-tools-1.7.3.jar::camus-example/target/camus-example-0.1.0-SNAPSHOT.jar:camus-example/target/camus-example-0.1.0-SNAPSHOT-shaded.jar:camus-example/target/camus-example-0.1.0-SNAPSHOT-tests.jar:/home/ctippur/.m2/repository/org/apache/hadoop/hadoop-client/2.0.0-mr1-cdh4.7.0/hadoop-client-2.0.0-mr1-cdh4.7.0.jar:/contrib/capacity-scheduler/* com.linkedin.camus.etl.kafka.CamusJob -P camus.1.txt
I still get:
Exception in thread "main" java.lang.RuntimeException: hadoop job failed
at com.linkedin.camus.etl.kafka.CamusJob.run(CamusJob.java:363)
at com.linkedin.camus.etl.kafka.CamusJob.run(CamusJob.java:646)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at com.linkedin.camus.etl.kafka.CamusJob.main(CamusJob.java:610)
with the same log message on the jobtracker.
- Shekar
Along the same lines, I followed some instructions from Gaurav: https://groups.google.com/forum/#!topic/camus_etl/NSBuFELJB8A
I have created the folder and copied the jars to it:
hdfs.default.classpath.dir=/hadoop/libs
I still get an exception:
2014-07-14 15:52:41,147 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201407141543_0003_m_000002_0: Error: java.lang.ClassNotFoundException: com.linkedin.camus.etl.IEtlKey
1. yes y | sudo yum install hadoop-hdfs -y
2. java -cp camus-example/target/camus-example-0.1.0-SNAPSHOT-shaded.jar:/etc/hadoop/conf:/usr/lib/hadoop-hdfs/hadoop-hdfs.jar com.linkedin.camus.etl.kafka.CamusJob -P camus.1.txt
Exception in thread "main" java.lang.RuntimeException: hadoop job failed
at com.linkedin.camus.etl.kafka.CamusJob.run(CamusJob.java:363)
at com.linkedin.camus.etl.kafka.CamusJob.run(CamusJob.java:646)
But in the Hadoop logs, I see a different error:
2014-07-15 12:43:51,182 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201407141543_0005_m_000003_0: java.lang.IllegalStateException: java.lang.NoSuchMethodException: com.linkedin.camus.etl.kafka.common.StringOpentsdbRecordWriterProvider.<init>(org.apache.hadoop.mapreduce.TaskAttemptContext)
at com.linkedin.camus.etl.kafka.mapred.EtlMultiOutputCommitter.<init>(EtlMultiOutputCommitter.java:62)
at com.linkedin.camus.etl.kafka.mapred.EtlMultiOutputFormat.getOutputCommitter(EtlMultiOutputFormat.java:73)
at org.apache.hadoop.mapred.Task.initialize(Task.java:523)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:313)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: java.lang.NoSuchMethodException: com.linkedin.camus.etl.kafka.common.StringOpentsdbRecordWriterProvider.<init>(org.apache.hadoop.mapreduce.TaskAttemptContext)
Just an FYI: StringOpentsdbRecordWriterProvider is a modified version of StringRecordWriterProvider. How do I include it?
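The NoSuchMethodException in the log above names a specific constructor: StringOpentsdbRecordWriterProvider.&lt;init&gt;(TaskAttemptContext). That means the committer instantiates the provider reflectively with a one-argument TaskAttemptContext constructor, so the custom provider class must declare one. Below is a self-contained sketch of that reflection pattern; ContextLike, NoArgProvider, and FixedProvider are hypothetical stand-ins, not Camus or Hadoop classes.

```java
// Sketch of why the error above occurs: the caller instantiates the
// provider via getConstructor(TaskAttemptContext.class), so a provider
// that only declares a no-arg constructor fails the lookup, while one
// that declares the expected one-argument constructor succeeds.
public class ProviderCtorSketch {
    // Stand-in for org.apache.hadoop.mapreduce.TaskAttemptContext.
    static class ContextLike {}

    // Like a provider with only a no-arg constructor: lookup fails.
    static class NoArgProvider {
        public NoArgProvider() {}
    }

    // Like a provider declaring the expected constructor: lookup succeeds.
    static class FixedProvider {
        public FixedProvider(ContextLike context) {}
    }

    static boolean hasContextCtor(Class<?> c) {
        try {
            c.getConstructor(ContextLike.class);
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("NoArgProvider: " + hasContextCtor(NoArgProvider.class));
        System.out.println("FixedProvider: " + hasContextCtor(FixedProvider.class));
    }
}
```

Under this reading, adding a public constructor taking a TaskAttemptContext to StringOpentsdbRecordWriterProvider (mirroring whatever StringRecordWriterProvider declares) would satisfy the reflective lookup.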
I have added it to the Hadoop library folder and referenced it in the hdfs-core.xml file as well:
sudo -u hdfs hadoop fs -copyFromLocal /tmp/camus-etl-kafka-0.1.0-SNAPSHOT.jar /hadoop/libs
- Shekar
You can definitely run Camus from outside the Hadoop cluster. In fact, that is probably preferable. In all cases, Camus will spawn a MR job that runs on your cluster anyway.
Now that you have the relevant *-site.xml files in /etc/hadoop/conf (and I assume they contain the proper settings to reach your remote Hadoop cluster), can you include that directory in your classpath, like I said in my previous email?
java -Xss256m -cp camus-example/target/camus-example-0.1.0-SNAPSHOT-shaded.jar:/etc/hadoop/conf com.linkedin.camus.etl.kafka.CamusJob -P camus.1.txt
If that doesn't work, can you also try adding the hadoop-hdfs jar? Something like this:
java -Xss256m -cp camus-example/target/camus-example-0.1.0-SNAPSHOT-shaded.jar:/etc/hadoop/conf:/usr/lib/hadoop-hdfs/hadoop-hdfs.jar com.linkedin.camus.etl.kafka.CamusJob -P camus.1.txt
Basically, you should be able to do operations on HDFS from your box. If using CDH, I recommend using their provided package manager repos to install the hadoop-client package. You should be able to do simple stuff like hdfs dfs -ls /
That is how I used to run Camus.
--
Felix GV
Data Infrastructure Engineer
Distributed Data Systems
Felix,
Thanks for the reply. I am running Camus from a node that does not run Hadoop. I ran into some performance issues, so I am trying to decouple Camus from Hadoop. Having said that, I have a couple of questions:
1. Since I am using CDH4, what is the minimum set of jars needed for this to work? I see that you pointed to the folder /etc/hadoop/conf. As I don't have anything in /etc/hadoop/conf, I copied the relevant files from the standalone Hadoop node:
core-site.xml
hdfs-site.xml
mapred-site.xml
When I run Camus:
java -Xss256m -cp camus-example/target/camus-example-0.1.0-SNAPSHOT-shaded.jar com.linkedin.camus.etl.kafka.CamusJob -P camus.1.txt
I get:
Exception in thread "main" java.io.IOException: No FileSystem for scheme: file
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2298)
....
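The "No FileSystem for scheme: file" error typically means the org.apache.hadoop.fs.FileSystem service registrations in META-INF/services were clobbered when the shaded jar was assembled, so neither the local nor the HDFS implementation can be resolved. One commonly used workaround (a sketch, not something from this thread) is to pin the implementations explicitly in core-site.xml:

```xml
<!-- Assumed workaround: register the FileSystem implementations
     explicitly so they resolve even if the shaded jar's
     META-INF/services entries were merged incorrectly. -->
<property>
  <name>fs.file.impl</name>
  <value>org.apache.hadoop.fs.LocalFileSystem</value>
</property>
<property>
  <name>fs.hdfs.impl</name>
  <value>org.apache.hadoop.hdfs.DistributedFileSystem</value>
</property>
```

Alternatively, if the jar is built with maven-shade, the ServicesResourceTransformer merges the service files instead of overwriting them, which avoids the problem at build time.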
2. Is this method (running Camus on a different node) recommended?
- Shekar
Regarding etl.record.writer.provider.class: I have specified a class that I modified. In the camus.properties file, I have
etl.record.writer.provider.class=com.linkedin.camus.etl.kafka.common.StringOpentsdbRecordWriterProvider
I see that this class is not being referenced at all.
I have documented this at https://github.com/linkedin/camus/issues/88 as well.
- Shekar
I have placed it on Hadoop under a folder called /user/libs.
Another point to note: in the current form, the default class is not being read either.
etl.record.writer.provider.class=com.linkedin.camus.etl.kafka.common.StringRecordWriterProvider
I have been stuck on this issue for a week now. I'd appreciate any help in this regard.
- Shekar
2014-07-23 23:43:55,369 INFO org.apache.hadoop.ipc.Server: IPC Server handler 9 on 8020, call org.apache.hadoop.hdfs.protocol.ClientProtocol.addBlock from 10.132.63.29:46059: error: org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on /user/ctippur/base/2014-07-24-06-43-39/_temporary/_attempt_201407232232_0006_m_000000_0/data.DUMMY_LOG.0.1.1406181600000-m-00000: File does not exist. [Lease. Holder: DFSClient_NONMAPREDUCE_-429092731_1, pendingcreates: 1]
org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on /user/ctippur/base/2014-07-24-06-43-39/_temporary/_attempt_201407232232_0006_m_000000_0/data.DUMMY_LOG.0.1.1406181600000-m-00000: File does not exist. [Lease. Holder: DFSClient_NONMAPREDUCE_-429092731_1, pendingcreates: 1]
This indicates some sort of race condition: temp files are getting created under /user/ctippur/base and then deleted prematurely.
I have opened an issue with CDH as well: http://community.cloudera.com/t5/Batch-Processing-and-Workflow/LeaseExpiredException-and-file-not-found/m-p/15910/highlight/false#M496
Through other posts, I have seen that this could be an issue with the MR job.
- Shekar
I added a print statement in the try block, but it is not getting called.
- Shekar
On Wed, Jul 23, 2014 at 8:23 PM, Shekar Tippur <cti...@gmail.com> wrote:
Ken,
That is what I see as well, but for some reason the logic is not getting to the try block.
- Shekar
On Wed, Jul 23, 2014 at 7:16 PM, Ken Goodhope <kengo...@gmail.com> wrote:
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): No lease on /user/ctippur/base/2014-07-25-08-23-12/_temporary/_attempt_local584728229_0001_m_000000_0/data.DUMMY_LOG.0.1.1406275200000-m-00000: File does not exist. [Lease. Holder: DFSClient_NONMAPREDUCE_-939244676_1, pendingcreates: 1]
I have the same jar (camus-example-0.1.0-SNAPSHOT-shaded.jar) copied to the HDFS lib folder and referenced in hdfs-site.xml:
<property>
<name>hdfs.default.classpath.dir</name>
<value>/hadoop/libs</value>
</property>
Is there anything else I need to do to get this setup working (Camus running on a different node than the standalone Hadoop node)?
- Shekar