KiteSDK CLI throwing 'java.lang.OutOfMemoryError: GC overhead limit exceeded' when loading JSON data

Sree Pratheep

Jun 1, 2015, 2:56:42 AM
to cdk...@cloudera.org
We are trying to import JSON data with around 200,000 entries from a file into a Hive dataset using the following command, but we are getting an OutOfMemoryError:
./kite-dataset json-import abc.txt abc

It works when we try to load around 100,000 entries. We couldn't find how to increase the Java heap size. Can someone tell us how to increase the heap size when running the kite-dataset command?

We get the following OutOfMemoryError:
bash-4.1# ./kite-dataset json-import abc.txt abc     
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/2.2.0.0-2041/hadoop/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.2.0.0-2041/hive/lib/hive-jdbc-0.14.0.2.2.0.0-2041-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.2.0.0-2041/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
at com.fasterxml.jackson.databind.node.TextNode.valueOf(TextNode.java:43)
at com.fasterxml.jackson.databind.node.JsonNodeFactory.textNode(JsonNodeFactory.java:273)
at com.fasterxml.jackson.databind.deser.std.BaseNodeDeserializer.deserializeObject(JsonNodeDeserializer.java:210)
at com.fasterxml.jackson.databind.deser.std.JsonNodeDeserializer.deserialize(JsonNodeDeserializer.java:59)
at com.fasterxml.jackson.databind.deser.std.JsonNodeDeserializer.deserialize(JsonNodeDeserializer.java:15)
at com.fasterxml.jackson.databind.MappingIterator.nextValue(MappingIterator.java:189)
at com.fasterxml.jackson.databind.MappingIterator.next(MappingIterator.java:120)
at org.kitesdk.shaded.com.google.common.collect.Iterators$8.next(Iterators.java:811)
at org.kitesdk.data.spi.filesystem.JSONFileReader.next(JSONFileReader.java:121)
at org.kitesdk.shaded.com.google.common.collect.Iterators$7.computeNext(Iterators.java:648)
at org.kitesdk.shaded.com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
at org.kitesdk.shaded.com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
at org.kitesdk.data.spi.filesystem.MultiFileDatasetReader.hasNext(MultiFileDatasetReader.java:125)
at com.google.common.collect.Lists.newArrayList(Lists.java:138)
at com.google.common.collect.ImmutableList.copyOf(ImmutableList.java:256)
at com.google.common.collect.ImmutableList.copyOf(ImmutableList.java:217)
at org.apache.crunch.impl.mem.collect.MemCollection.<init>(MemCollection.java:76)
at org.apache.crunch.impl.mem.MemPipeline.read(MemPipeline.java:151)
at org.kitesdk.tools.TransformTask.run(TransformTask.java:135)
at org.kitesdk.cli.commands.JSONImportCommand.run(JSONImportCommand.java:144)
at org.kitesdk.cli.Main.run(Main.java:178)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.kitesdk.cli.Main.main(Main.java:256)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)

Thanks,
Sree Pratheep

Joey Echeverria

Jun 1, 2015, 9:36:09 AM
to Sree Pratheep, cdk...@cloudera.org
Hi Sree,

You can set JVM flags by setting the flags environment variable before
running the CLI. For example:

export flags="-Xmx2048m"
kite-dataset ...

- or -

flags="-Xmx2048m" kite-dataset ...

The environment variables you can use to configure the CLI are documented here:

http://kitesdk.org/docs/1.0.0/cli-reference.html#general

-Joey

--
Joey Echeverria
Senior Infrastructure Engineer

Ryan Blue

Jun 1, 2015, 11:56:56 AM
to Sree Pratheep, cdk...@cloudera.org
Joey's fix is a good one if you have the memory for it, but another
work-around is to put the file you're importing in HDFS. Then we will
use an MR job that doesn't have the memory problem.
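
For example, something like this (a sketch; the /tmp path is just a
placeholder, use whatever HDFS location suits your cluster):

hdfs dfs -put abc.txt /tmp/abc.txt
./kite-dataset json-import hdfs:/tmp/abc.txt abc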

The cause of this problem is that we were using Crunch's MemPipeline for
local files, which will only run one stage at a time and will keep
everything in memory. So it will do the conversion, keeping all records
in memory, and then write them to disk. This is CDK-898 [1].

We're fixing this in 1.1.0 and using the LocalJobRunner rather than a
MemPipeline. That will run copy or import tasks from local data as they
would run on a cluster, which uses much less memory.

rb

[1]: https://issues.cloudera.org/browse/CDK-898
--
Ryan Blue
Software Engineer
Cloudera, Inc.

ஸ்ரீ பிரதீப்

Jun 2, 2015, 1:50:14 AM
to cdk...@cloudera.org
Thanks, Joey, for the reply. We tried to set the flags environment variable, but it is not working. We got the following error:

bash-4.1# export flags="-Xmx2048m"                    
bash-4.1# ./kite-dataset json-import abc.txt abc
Exception in thread "main" java.lang.ClassNotFoundException: -Xmx2048m
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:270)
at org.apache.hadoop.util.RunJar.run(RunJar.java:214)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
bash-4.1#

We also tried setting HADOOP_CLIENT_OPTS='-Xmx4096m'. That still throws the same out-of-memory error; the Java process still runs with a 1024m heap.

-Sree Pratheep

Ryan Blue

Jun 2, 2015, 12:10:29 PM
to ஸ்ரீ பிரதீப், cdk...@cloudera.org
Hi Sree,

Looks like there's something wrong with the "flags" variable that we
need to fix. Sorry about that.

Did you try running with the file in HDFS instead of on local disk? I
think that is another way to fix this.

rb

Rafi Syed

Jun 4, 2015, 1:58:13 AM
to cdk...@cloudera.org
Hi Ryan,
I'm also facing the same issue. I've tried using data in HDFS, but I'm getting the following error. Please help.


bash-4.1# ./kite-dataset json-import hdfs:/tmp hungry
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/2.2.0.0-2041/hadoop/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.2.0.0-2041/hive/lib/hive-jdbc-0.14.0.2.2.0.0-2041-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.2.0.0-2041/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
IO error: Cannot add jar path to distributed cache: /usr/hdp/2.2.0.0-2041/hive/lib

Ryan Blue

Jun 4, 2015, 12:38:45 PM
to Rafi Syed, cdk...@cloudera.org
Rafi,

Can you run that command with the verbose flag, -v (just after
kite-dataset), to get the full error message? It looks like it might
be a permissions problem.
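
For example:

./kite-dataset -v json-import hdfs:/tmp hungry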

rb

Rafi Syed

Jun 5, 2015, 2:10:11 AM
to cdk...@cloudera.org
Hi Ryan,
Please find the logs below:

bash-4.1# ./kite-dataset -v json-import hdfs:/tmp hungry    
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/2.2.0.0-2041/hadoop/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.2.0.0-2041/hive/lib/hive-jdbc-0.14.0.2.2.0.0-2041-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.2.0.0-2041/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
IO error
org.kitesdk.data.DatasetIOException: Cannot add jar path to distributed cache: /usr/hdp/2.2.0.0-2041/hive/lib
at org.kitesdk.tools.TaskUtil$ConfigBuilder.addJarPathForClass(TaskUtil.java:129)
at org.kitesdk.tools.TransformTask.run(TransformTask.java:108)
at org.kitesdk.cli.commands.JSONImportCommand.run(JSONImportCommand.java:144)
at org.kitesdk.cli.Main.run(Main.java:178)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.kitesdk.cli.Main.main(Main.java:256)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.io.IOException: Jar file: /usr/hdp/2.2.0.0-2041/hive/lib/ojdbc6.jar does not exist.
at org.apache.crunch.util.DistCache.addJarToDistributedCache(DistCache.java:115)
at org.apache.crunch.util.DistCache.addJarDirToDistributedCache(DistCache.java:208)
at org.apache.crunch.util.DistCache.addJarDirToDistributedCache(DistCache.java:229)
at org.kitesdk.tools.TaskUtil$ConfigBuilder.addJarPathForClass(TaskUtil.java:127)
... 11 more

Erin Dogan

Jun 5, 2015, 12:10:54 PM
to cdk...@cloudera.org
Sree,

I would verify that the ojdbc jar is actually in that location. I ran into this same issue and the jar was not there. I fixed this by downloading the jar from Oracle and putting it in the expected location. This, however, didn't resolve my issues, as I then ran into the CopyTask job failing:

 job failure(s) occurred:
org.kitesdk.tools.CopyTask: Kite(dataset:hdfs://sandbox.hortonworks.com:8020/tmp/d08d... ID=1 (1/1)(1): Job failed!


logs:
2015-06-05 05:44:04,865 INFO  jobhistory.JobSummary (HistoryFileManager.java:moveToDone(372)) - jobId=job_1433477092849_0001,submitTime=1433482990017,launchTime=1433483001006,firstMapTaskLaunchTime=1433483003858,firstReduceTaskLaunchTime=0,finishTime=1433483027484,resourcesPerMap=250,resourcesPerReduce=250,numMaps=1,numReduces=1,user=root,queue=default,status=FAILED,mapSlotSeconds=17,reduceSlotSeconds=0,jobName=org.kitesdk.tools.CopyTask: Kite(dataset:hdfs://sandbox.hortonworks.com:8020/tmp/21f5... ID\=1 (1/1)

Doesn't really tell me why it failed.

Ryan Blue

Jun 5, 2015, 12:43:36 PM
to Rafi Syed, cdk...@cloudera.org
Rafi,

It looks like /usr/hdp/2.2.0.0-2041/hive/lib/ojdbc6.jar is probably a
broken symlink. How else would a file you can list not exist, right?

I'd look into that file more. Kite adds Hive to the distributed cache by
adding everything in the Hive lib directory. If it finds a broken
symlink, then it makes sense that it would fail. I think it should work
without ojdbc6.jar so you might be able to simply remove the symlink.
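
For example, a quick check (a sketch, using the path from your stack
trace):

ls -l /usr/hdp/2.2.0.0-2041/hive/lib/ojdbc6.jar
# if the link target doesn't exist, the symlink is broken; removing it
# should be safe as long as nothing else needs the Oracle JDBC driver
rm /usr/hdp/2.2.0.0-2041/hive/lib/ojdbc6.jar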

The problem with that approach is that a broken symlink indicates some
other issue that you should also look into. Maybe you need another
package installed that provides it, or maybe the Hive package you're
using has a bug. I'd contact your Hadoop vendor to find out, and please
let us know on this list what you find so others can get past this problem.

Thanks!

rb

Rafi Syed

Jun 16, 2015, 6:58:07 AM
to cdk...@cloudera.org, rafis...@gmail.com
FYI, I still couldn't load the data into Hive using json-import, even from HDFS. I removed the symbolic link, and now I am getting the same error that Erin is getting. Following is the full output:

bash-4.1# ./kite-dataset -v json-import hdfs://integration.mycorp.kom:8020/tmp/hungry.txt hungry
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/2.2.0.0-2041/hadoop/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.2.0.0-2041/hive/lib/hive-jdbc-0.14.0.2.2.0.0-2041-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.2.0.0-2041/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
1 job failure(s) occurred:
org.kitesdk.tools.CopyTask: Kite(dataset:hdfs://integration.mycorp.kom:8020/tmp/defau... ID=1 (1/1)(1): Job failed!


I'm getting the following errors in the MapReduce job logs:
2015-06-16 06:46:53,947 INFO [Socket Reader #1 for port 43038] SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for job_1434446597569_0003 (auth:SIMPLE)
2015-06-16 06:46:53,958 INFO [IPC Server handler 2 on 43038] org.apache.hadoop.mapred.TaskAttemptListenerImpl: JVM with ID : jvm_1434446597569_0003_m_000004 asked for a task
2015-06-16 06:46:53,958 INFO [IPC Server handler 2 on 43038] org.apache.hadoop.mapred.TaskAttemptListenerImpl: JVM with ID: jvm_1434446597569_0003_m_000004 given task: attempt_1434446597569_0003_m_000000_2
2015-06-16 06:46:55,680 FATAL [IPC Server handler 0 on 43038] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: attempt_1434446597569_0003_m_000000_2 - exited : com.fasterxml.jackson.core.JsonFactory.requiresPropertyOrdering()Z
2015-06-16 06:46:55,680 INFO [IPC Server handler 0 on 43038] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Diagnostics report from attempt_1434446597569_0003_m_000000_2: Error: com.fasterxml.jackson.core.JsonFactory.requiresPropertyOrdering()Z
2015-06-16 06:46:55,681 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1434446597569_0003_m_000000_2: Error: com.fasterxml.jackson.core.JsonFactory.requiresPropertyOrdering()Z
2015-06-16 06:46:55,681 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1434446597569_0003_m_000000_2 TaskAttempt Transitioned from RUNNING to FAIL_CONTAINER_CLEANUP

Thanks,
Rafi

Liam Mooney

Jun 16, 2015, 7:57:52 AM
to cdk...@cloudera.org
Hi Syed,

Can you try:
  • export HADOOP_OPTS=-Xmx2g
It seems to work for me; I have HDP 2.2 and KiteSDK 1.0.0 installed locally.
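
For example, with the same import command as before (a sketch; adjust
the heap size to whatever your data needs):

export HADOOP_OPTS=-Xmx2g
./kite-dataset json-import abc.txt abc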

Thanks,
Liam

Sree Pratheep

Jun 16, 2015, 9:11:22 AM
to cdk...@cloudera.org, sreepr...@gmail.com
Hi Ryan,

Will this be part of the 1.1.0 release? FYI, I ran the binary built locally on my machine from the latest code from https://github.com/kite-sdk/kite and got the following exception:
bash-4.1# ./kite-dataset -v json-import /usr/local/src/hungry.txt hungry
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/2.2.0.0-2041/hadoop/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.2.0.0-2041/hive/lib/hive-jdbc-0.14.0.2.2.0.0-2041-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.2.0.0-2041/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
1 job failure(s) occurred:
org.kitesdk.tools.CopyTask: Kite(dataset:file:/tmp/default/.temp/7470a17f-2006-42f7-a... ID=1 (1/1)(1): java.io.FileNotFoundException: File file:/hdp/apps/2.2.0.0-2041/mapreduce/mapreduce.tar.gz does not exist
        at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:534)
        at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:747)
        at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:524)
        at org.apache.hadoop.fs.DelegateToFileSystem.getFileStatus(DelegateToFileSystem.java:111)
        at org.apache.hadoop.fs.AbstractFileSystem.resolvePath(AbstractFileSystem.java:460)
        at org.apache.hadoop.fs.FilterFs.resolvePath(FilterFs.java:157)
        at org.apache.hadoop.fs.FileContext$24.next(FileContext.java:2137)
        at org.apache.hadoop.fs.FileContext$24.next(FileContext.java:2133)
        at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
        at org.apache.hadoop.fs.FileContext.resolve(FileContext.java:2133)
        at org.apache.hadoop.fs.FileContext.resolvePath(FileContext.java:595)
        at org.apache.hadoop.mapreduce.JobSubmitter.addMRFrameworkToDistributedCache(JobSubmitter.java:753)
        at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:435)
        at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1296)
        at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1293)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
        at org.apache.hadoop.mapreduce.Job.submit(Job.java:1293)
        at org.apache.crunch.hadoop.mapreduce.lib.jobcontrol.CrunchControlledJob.submit(CrunchControlledJob.java:329)
        at org.apache.crunch.hadoop.mapreduce.lib.jobcontrol.CrunchJobControl.startReadyJobs(CrunchJobControl.java:204)
        at org.apache.crunch.hadoop.mapreduce.lib.jobcontrol.CrunchJobControl.pollJobStatusAndStartNewOnes(CrunchJobControl.java:238)
        at org.apache.crunch.impl.mr.exec.MRExecutor.monitorLoop(MRExecutor.java:112)
        at org.apache.crunch.impl.mr.exec.MRExecutor.access$000(MRExecutor.java:55)
        at org.apache.crunch.impl.mr.exec.MRExecutor$1.run(MRExecutor.java:83)
        at java.lang.Thread.run(Thread.java:745)
I am running this against Hadoop in a sequenceiq/ambari Docker image. Let me know if you need any more information.

Thanks,
Sree

Ryan Blue

Jun 16, 2015, 12:22:59 PM
to Sree Pratheep, cdk...@cloudera.org
Sree,

Does file:/hdp/apps/2.2.0.0-2041/mapreduce/mapreduce.tar.gz exist?
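
You can check with:

ls -l /hdp/apps/2.2.0.0-2041/mapreduce/mapreduce.tar.gz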

I'm not sure what's happening with your setup, but I think you might
have a problem with your install like Rafi. I don't think these files
should be missing.

And thanks to Liam for chiming in with help!

rb

Satyam Singh Chandel

Oct 12, 2015, 9:37:42 AM
to CDK Development, sreepr...@gmail.com
Hi,

This thread helped me a lot in fixing issues while importing JSON data into HDFS using the Kite dataset CLI.

Now I am facing an error when executing the below command:

bash-4.1# ./kite-dataset json-import /vagrant/kite/sample.json dataset:hdfs://integcorp.kom:8020/user/falcon/dataset/hgrw


SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/2.2.0.0-2041/hadoop/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.2.0.0-2041/hive/lib/hive-jdbc-0.14.0.2.2.0.0-2041-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.2.0.0-2041/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
1 job failure(s) occurred:
org.kitesdk.tools.CopyTask: Kite(dataset:file:/tmp/dataset/.temp/1d5a3984-d762-4b16-a... ID=1 (1/1)(1): java.io.FileNotFoundException: File does not exist: hdfs://integcorp.kom:8020/tmp/crunch-2009144800/p1/REDUCE
    at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1122)
    at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1114)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1114)
    at org.apache.hadoop.fs.FileSystem.resolvePath(FileSystem.java:750)
    at org.apache.hadoop.mapreduce.v2.util.MRApps.parseDistributedCacheArtifacts(MRApps.java:568)
    at org.apache.hadoop.mapreduce.v2.util.MRApps.setupDistributedCache(MRApps.java:460)
    at org.apache.hadoop.mapred.LocalDistributedCacheManager.setup(LocalDistributedCacheManager.java:93)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.<init>(LocalJobRunner.java:163)
    at org.apache.hadoop.mapred.LocalJobRunner.submitJob(LocalJobRunner.java:731)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:536)

    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1296)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1293)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1293)
    at org.apache.crunch.hadoop.mapreduce.lib.jobcontrol.CrunchControlledJob.submit(CrunchControlledJob.java:329)
    at org.apache.crunch.hadoop.mapreduce.lib.jobcontrol.CrunchJobControl.startReadyJobs(CrunchJobControl.java:204)
    at org.apache.crunch.hadoop.mapreduce.lib.jobcontrol.CrunchJobControl.pollJobStatusAndStartNewOnes(CrunchJobControl.java:238)
    at org.apache.crunch.impl.mr.exec.MRExecutor.monitorLoop(MRExecutor.java:112)
    at org.apache.crunch.impl.mr.exec.MRExecutor.access$000(MRExecutor.java:55)
    at org.apache.crunch.impl.mr.exec.MRExecutor$1.run(MRExecutor.java:83)
    at java.lang.Thread.run(Thread.java:745)

Kindly help me out.

Regards,

Ryan Blue

Oct 12, 2015, 12:27:12 PM
to Satyam Singh Chandel, CDK Development, sreepr...@gmail.com
Hi Satyam,

I'm not sure what's going on there. It looks like a problem with the
LocalJobRunner's setup. Could you try loading the source file into HDFS
and re-running the command?
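
For example (a sketch, reusing the paths from your command):

hdfs dfs -put /vagrant/kite/sample.json /tmp/sample.json
./kite-dataset json-import hdfs:/tmp/sample.json dataset:hdfs://integcorp.kom:8020/user/falcon/dataset/hgrw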

rb