Gobblin standalone encounters Java heap space OOM, need help

Di ma

Jul 10, 2017, 6:15:34 AM
to gobblin-users
Hi guys, I am new to Gobblin. I am using Gobblin 0.10.0 to move JSON logs from Kafka to HDFS. The logs contain 12+ namespaces, and I need to split them by namespace and write each namespace to a different directory per day. For example, a log message that belongs to namespace A and whose timestamp falls on 2017-07-01 should be written to the HDFS path /data/2017_07_01_data/A_day_2017_07_01/. One day of logs is about 70 GB, and 95% of the messages belong to the same namespace.
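To make the path layout described above concrete, here is a minimal sketch in plain Java. It is not the SinkPartitioner used in this job and it does not use any Gobblin API; the class name PartitionPathSketch, the method partitionPath, and the Asia/Shanghai time zone (guessed from the CST timestamps in the logs) are all assumptions made for illustration.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: builds the per-namespace, per-day HDFS directory
// from a namespace name and a record timestamp in epoch milliseconds.
public class PartitionPathSketch {
    // Assumption: day boundaries follow China Standard Time.
    private static final ZoneId ZONE = ZoneId.of("Asia/Shanghai");
    private static final DateTimeFormatter DAY =
        DateTimeFormatter.ofPattern("yyyy_MM_dd").withZone(ZONE);

    public static String partitionPath(String namespace, long epochMillis) {
        // Format the timestamp as a day string such as 2017_07_01.
        String day = DAY.format(Instant.ofEpochMilli(epochMillis));
        return "/data/" + day + "_data/" + namespace + "_day_" + day + "/";
    }

    public static void main(String[] args) {
        // 1498881600000 ms is 2017-07-01 12:00:00 in Asia/Shanghai.
        System.out.println(partitionPath("A", 1498881600000L));
        // prints /data/2017_07_01_data/A_day_2017_07_01/
    }
}
```

With 12+ namespaces and records that may span more than one day, the number of distinct paths produced this way is also the number of partitions the writer has to handle in one run.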

My custom WriterPartitioner and job file are as follows:

I run the job in Gobblin standalone mode once every two hours, with extract.limit.timeLimit set to 70 minutes. For the first 4-6 hours it works well, but the syslog shows the process memory usage slowly increasing. Several hours later it reports a Java heap space OOM and exits. The error log is as follows:

2017-05-25 17:56:37 CST INFO  [LocalJobRunner Map Task Executor #0] gobblin.runtime.GobblinMultiTaskAttempt  137 - 126 out of 128 tasks of job job_ReadingjoySinkData_1495704642150 are running in container attempt_local1151327680_0001_m_000000_0
2017-05-25 17:56:39 CST WARN  [ForkExecutor-0] gobblin.writer.RetryWriter$1  95 - Caught exception. This may be retried.
java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:3332)
at java.lang.StringCoding.safeTrim(StringCoding.java:89)
at java.lang.StringCoding.access$100(StringCoding.java:50)
at java.lang.StringCoding$StringDecoder.decode(StringCoding.java:154)
at java.lang.StringCoding.decode(StringCoding.java:193)
at java.lang.StringCoding.decode(StringCoding.java:254)
at java.lang.String.<init>(String.java:546)
at java.lang.String.<init>(String.java:566)
at com.readingjoy.SinkPartitioner.partitionForRecord(SinkPartitioner.java:36)
at com.readingjoy.SinkPartitioner.partitionForRecord(SinkPartitioner.java:19)
at gobblin.writer.PartitionedDataWriter.getDataWriterForRecord(PartitionedDataWriter.java:136)
at gobblin.writer.PartitionedDataWriter.write(PartitionedDataWriter.java:126)
at gobblin.writer.RetryWriter$2.call(RetryWriter.java:116)
at gobblin.writer.RetryWriter$2.call(RetryWriter.java:113)
at com.github.rholder.retry.AttemptTimeLimiters$NoAttemptTimeLimit.call(AttemptTimeLimiters.java:78)
at com.github.rholder.retry.Retryer.call(Retryer.java:160)
at com.github.rholder.retry.Retryer$RetryerCallable.call(Retryer.java:318)
at gobblin.writer.RetryWriter.callWithRetry(RetryWriter.java:140)
at gobblin.writer.RetryWriter.write(RetryWriter.java:121)
at gobblin.runtime.fork.Fork.processRecord(Fork.java:426)
at gobblin.runtime.fork.AsynchronousFork.processRecord(AsynchronousFork.java:98)
at gobblin.runtime.fork.AsynchronousFork.processRecords(AsynchronousFork.java:81)
at gobblin.runtime.fork.Fork.run(Fork.java:180)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
2017-05-25 17:56:39 CST INFO  [communication thread] org.apache.hadoop.mapred.LocalJobRunner$Job  591 - map > map
2017-05-25 17:56:45 CST WARN  [ForkExecutor-0] gobblin.writer.RetryWriter$1  95 - Caught exception. This may be retried.

The Java environment configuration is the default: java -Xmx2g -Xms1g

When Gobblin runs the first job, the process takes about 3.5 GB of memory, with a peak of about 5 GB. After the job finishes and while it waits for the next run, it still takes about 3 GB.

I have struggled with this problem for about 2 months and searched a lot, but did not find any clue to solve it. Can someone give me some advice or a hint?

Forgive my poor English; I just write by language intuition and try to explain the problem clearly.

Issac Buenrostro

Jul 12, 2017, 7:46:15 PM
to Di ma, gobblin-users
Hi,

Any chance you can get a heap dump of the job and check what the leak suspects are? We can go from there to try to identify if there is a memory leak.

Best,
Issac
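For readers following along, one way to capture the heap dump asked for above: the standard HotSpot options -XX:+HeapDumpOnOutOfMemoryError and -XX:HeapDumpPath can be added next to -Xmx2g so a dump is written when the OOM occurs, or jmap -dump can be run against the live process. A dump can also be triggered from inside the JVM. The following is only a sketch of that last approach, assuming a HotSpot JVM; the class name HeapDumpSketch and the default file name gobblin-heap.hprof are made up for illustration, and nothing here is part of Gobblin.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;

// Hypothetical helper: writes a heap dump of the JVM it runs in.
public class HeapDumpSketch {
    // outputFile should end in .hprof and must not exist yet,
    // otherwise dumpHeap fails.
    // liveObjectsOnly = true dumps only objects that are still reachable.
    public static void dumpHeap(String outputFile, boolean liveObjectsOnly)
            throws IOException {
        HotSpotDiagnosticMXBean bean =
            ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        bean.dumpHeap(outputFile, liveObjectsOnly);
    }

    public static void main(String[] args) throws IOException {
        String path = args.length > 0 ? args[0] : "gobblin-heap.hprof";
        dumpHeap(path, true);
        System.out.println("Heap dump written to " + path);
    }
}
```

The resulting .hprof file can then be opened in a heap analyzer such as Eclipse MAT to look at the leak suspects.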

--
You received this message because you are subscribed to the Google Groups "gobblin-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to gobblin-users+unsubscribe@googlegroups.com.
To post to this group, send email to gobbli...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/gobblin-users/d5495516-280f-4de3-a153-46615f3fe176%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Di ma

Jul 13, 2017, 11:01:35 PM
to gobblin-users, mad...@gmail.com
Thanks for the advice, I will try to analyze the heap dump file. I still have one question: is running Gobblin standalone in a production environment reliable?
 
On Thursday, July 13, 2017 at 7:46:15 AM UTC+8, Issac Buenrostro wrote: