We are trying to import JSON data with around 200,000 entries from a file into a Hive dataset using the following command, but we are getting an OutOfMemoryError.
./kite-dataset json-import abc.txt abc
It works when we try to load around 100,000 entries. We couldn't find how to increase the Java heap size. Can someone tell us how to increase the heap size when running the kite-dataset command?
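Since the stack trace below shows the command being launched through org.apache.hadoop.util.RunJar, we suspect the script might pick up the standard Hadoop client JVM options, but we couldn't confirm that kite-dataset honors this. A sketch of what we are guessing at (the -Xmx4g value is just an example):

```shell
# Assumption: kite-dataset delegates to the hadoop launcher, which reads
# HADOOP_CLIENT_OPTS for client-side JVM flags such as the max heap size.
export HADOOP_CLIENT_OPTS="-Xmx4g"
./kite-dataset json-import abc.txt abc
```

Is this the right mechanism, or does kite-dataset need its own flag or environment variable?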
We get the following OutOfMemoryError:
bash-4.1# ./kite-dataset json-import abc.txt abc
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/2.2.0.0-2041/hadoop/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.2.0.0-2041/hive/lib/hive-jdbc-0.14.0.2.2.0.0-2041-standalone.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.2.0.0-2041/zookeeper/lib/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
at com.fasterxml.jackson.databind.node.TextNode.valueOf(TextNode.java:43)
at com.fasterxml.jackson.databind.node.JsonNodeFactory.textNode(JsonNodeFactory.java:273)
at com.fasterxml.jackson.databind.deser.std.BaseNodeDeserializer.deserializeObject(JsonNodeDeserializer.java:210)
at com.fasterxml.jackson.databind.deser.std.JsonNodeDeserializer.deserialize(JsonNodeDeserializer.java:59)
at com.fasterxml.jackson.databind.deser.std.JsonNodeDeserializer.deserialize(JsonNodeDeserializer.java:15)
at com.fasterxml.jackson.databind.MappingIterator.nextValue(MappingIterator.java:189)
at com.fasterxml.jackson.databind.MappingIterator.next(MappingIterator.java:120)
at org.kitesdk.shaded.com.google.common.collect.Iterators$8.next(Iterators.java:811)
at org.kitesdk.data.spi.filesystem.JSONFileReader.next(JSONFileReader.java:121)
at org.kitesdk.shaded.com.google.common.collect.Iterators$7.computeNext(Iterators.java:648)
at org.kitesdk.shaded.com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
at org.kitesdk.shaded.com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
at org.kitesdk.data.spi.filesystem.MultiFileDatasetReader.hasNext(MultiFileDatasetReader.java:125)
at com.google.common.collect.Lists.newArrayList(Lists.java:138)
at com.google.common.collect.ImmutableList.copyOf(ImmutableList.java:256)
at com.google.common.collect.ImmutableList.copyOf(ImmutableList.java:217)
at org.apache.crunch.impl.mem.collect.MemCollection.<init>(MemCollection.java:76)
at org.apache.crunch.impl.mem.MemPipeline.read(MemPipeline.java:151)
at org.kitesdk.tools.TransformTask.run(TransformTask.java:135)
at org.kitesdk.cli.commands.JSONImportCommand.run(JSONImportCommand.java:144)
at org.kitesdk.cli.Main.run(Main.java:178)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.kitesdk.cli.Main.main(Main.java:256)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Thanks,
Sree Pratheep