OutOfMemoryError: "GC overhead limit exceeded" at the end of Hadoop indexing job

Asbjørn Clemmensen

12 Aug 2014, 02:38:24
to druid-de...@googlegroups.com
While trying to index a fairly large file (~750 MB gzipped, around 8.3 million lines) with a Hadoop indexer task, I get an OutOfMemoryError. I'm running 0.6.137.
The task runs nearly to completion - almost three hours in this case - and then fails with the message below.

This particular run had a ton of ZooKeeper issues, as far as I can tell from the logs, but the failure has happened repeatedly, including on runs without any ZK issues in the log output.

If I use a smaller subset of the file - 50,000 or 500,000 lines - it finishes processing as expected. I haven't experimented with other sizes yet.

2014-08-12 00:32:43,630 WARN [Thread-448] org.apache.hadoop.mapred.LocalJobRunner - job_local1920330805_0003
java.lang.Exception: java.lang.OutOfMemoryError: GC overhead limit exceeded
	at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
	at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529)
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
	at java.util.regex.Pattern.matcher(Pattern.java:1088)
	at java.lang.String.replace(String.java:2180)
	at io.druid.data.input.MapBasedRow.getFloatMetric(MapBasedRow.java:106)
	at io.druid.segment.incremental.SpatialDimensionRowFormatter$5.getFloatMetric(SpatialDimensionRowFormatter.java:156)
	at io.druid.segment.incremental.SpatialDimensionRowFormatter$5.getFloatMetric(SpatialDimensionRowFormatter.java:156)
	at io.druid.segment.incremental.IncrementalIndex$1$2.get(IncrementalIndex.java:237)
	at io.druid.query.aggregation.LongSumAggregator.aggregate(LongSumAggregator.java:60)
	at io.druid.segment.incremental.IncrementalIndex.add(IncrementalIndex.java:367)
	at io.druid.segment.incremental.IncrementalIndex.add(IncrementalIndex.java:141)
	at io.druid.indexer.IndexGeneratorJob$IndexGeneratorReducer.reduce(IndexGeneratorJob.java:303)
	at io.druid.indexer.IndexGeneratorJob$IndexGeneratorReducer.reduce(IndexGeneratorJob.java:253)
	at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171)
	at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627)
	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
	at org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:744)
2014-08-12 00:32:43,758 INFO [task-runner-0] org.apache.hadoop.mapreduce.Job - Job job_local1920330805_0003 failed with state FAILED due to: NA
2014-08-12 00:32:43,826 INFO [task-runner-0] org.apache.hadoop.mapreduce.Job - Counters: 38
	File System Counters
		FILE: Number of bytes read=1857818703704
		FILE: Number of bytes written=1942485935992
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		S3N: Number of bytes read=0
		S3N: Number of bytes written=8451918159
		S3N: Number of read operations=0
		S3N: Number of large read operations=0
		S3N: Number of write operations=0
	Map-Reduce Framework
		Map input records=8294987
		Map output records=8294987
		Map output bytes=8116137478
		Map output materialized bytes=8149317612
		Input split bytes=132
		Combine input records=0
		Combine output records=0
		Reduce input groups=31
		Reduce shuffle bytes=8149317612
		Reduce input records=8204124
		Reduce output records=0
		Spilled Records=24045003
		Shuffled Maps =31
		Failed Shuffles=0
		Merged Map outputs=31
		GC time elapsed (ms)=4772813
		CPU time spent (ms)=0
		Physical memory (bytes) snapshot=0
		Virtual memory (bytes) snapshot=0
		Total committed heap usage (bytes)=58565591040
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters 
		Bytes Read=763481881
	File Output Format Counters 
		Bytes Written=248


This is the indexing task I use for the job:

{
   "task":"index_hadoop_ds_2014-08-11T21:48:12.607Z",
   "payload":{
      "id":"index_hadoop_ds_2014-08-11T21:48:12.607Z",
      "spec":{
         "dataSchema":{
            "dataSource":"ds",
            "parser":{
               "type":"string",
               "parseSpec":{
                  "format":"json",
                  "timestampSpec":{
                     "column":"timestamp",
                     "format":"iso"
                  },
                  "dimensionsSpec":{
                     "dimensions":[
                        "lid",
                        "timestamp",
                        "sessionorder",
                        "responsetime",
                        "internal",
                        "onclick",
                        "url",
                        "domain",
                        "referrer",
                        "referrer_domain",
                        "sessioncookie",
                        "permcookie",
                        "useragent",
                        "browser",
                        "browserversion",
                        "os_name",
                        "os_family",
                        "device_type",
                        "resolution",
                        "ipaddress",
                        "network",
                        "country",
                        "region",
                        "city",
                        "organization",
                        "search_term",
                        "lastpage",
                        "lastlink",
                        "ext_search"
                     ],
                     "dimensionExclusions":[
                        "timestamp",
                        "unique_visitors",
                        "exits",
                        "responsetime_max",
                        "sessions",
                        "responsetime_min",
                        "count",
                        "search_hits",
                        "entries",
                        "bounces"
                     ],
                     "spatialDimensions":[

                     ]
                  }
               }
            },
            "metricsSpec":[
               {
                  "type":"count",
                  "name":"count"
               },
               {
                  "type":"longSum",
                  "name":"entries",
                  "fieldName":"entry"
               },
               {
                  "type":"longSum",
                  "name":"exits",
                  "fieldName":"exit"
               },
               {
                  "type":"longSum",
                  "name":"bounces",
                  "fieldName":"bounce"
               },
               {
                  "type":"longSum",
                  "name":"search_hits",
                  "fieldName":"search_hits"
               },
               {
                  "type":"min",
                  "name":"responsetime_min",
                  "fieldName":"responsetime"
               },
               {
                  "type":"max",
                  "name":"responsetime_max",
                  "fieldName":"responsetime"
               },
               {
                  "type":"cardinality",
                  "name":"sessions",
                  "fieldNames":[
                     "sessioncookie"
                  ]
               },
               {
                  "type":"cardinality",
                  "name":"unique_visitors",
                  "fieldNames":[
                     "permcookie"
                  ]
               }
            ],
            "granularitySpec":{
               "type":"uniform",
               "segmentGranularity":"DAY",
               "queryGranularity":{
                  "type":"duration",
                  "duration":60000,
                  "origin":"1970-01-01T00:00:00.000Z"
               },
               "intervals":[
                  "2014-03-01T00:00:00.000Z/2014-03-31T23:59:59.000Z"
               ]
            }
         },
         "ioConfig":{
            "type":"hadoop",
            "inputSpec":{
               "type":"static",
               "paths":"/path/to/data.json.gz"
            },
            "metadataUpdateSpec":null,
            "segmentOutputPath":null
         },
         "tuningConfig":{
            "type":"hadoop",
            "workingPath":null,
            "version":"2014-08-11T21:48:12.607Z",
            "partitionsSpec":{
               "type":"dimension",
               "partitionDimension":null,
               "targetPartitionSize":5000000,
               "maxPartitionSize":7500000,
               "assumeGrouped":false,
               "numShards":-1
            },
            "shardSpecs":{

            },
            "rowFlushBoundary":500000,
            "leaveIntermediate":false,
            "cleanupOnFailure":true,
            "overwriteFiles":false,
            "ignoreInvalidRows":false,
            "jobProperties":{

            },
            "combineText":false
         }
      },
      "hadoopDependencyCoordinates":null,
      "groupId":"index_hadoop_ds_2014-08-11T21:48:12.607Z",
      "dataSource":"ds",
      "resource":{
         "availabilityGroup":"index_hadoop_ds_2014-08-11T21:48:12.607Z",
         "requiredCapacity":1
      }
   }
}

Nishant Bangarwa

12 Aug 2014, 03:11:52
to druid-de...@googlegroups.com
Hi, 
What is your reducer -Xmx set to? Hadoop tracks both the JVM heap and any off-heap pages as physical memory usage, and the Druid indexer uses off-heap memory for aggregation, so you need to adjust the memory limits to give the reducers enough headroom. We usually run with 2.5 GB reducer heaps and a 6 GB physical memory limit.




Asbjørn Clemmensen

12 Aug 2014, 04:53:50
to druid-de...@googlegroups.com
I'm not sure where to configure that exactly. I'm using the Indexer in local mode with the settings below (adjusted to match the out-of-the-box configuration).

Prior to this I had runner.javaOpts set to -Xmx2g, but that gave me the same result.

druid.host=localhost
druid.port=8087
druid.service=overlord

druid.zk.service.host=localhost

druid.extensions.coordinates=["io.druid.extensions:druid-kafka-seven:0.6.137","io.druid.extensions:druid-s3-extensions:0.6.138"]

druid.db.connector.connectURI=jdbc:mysql://localhost:3306/druid
druid.db.connector.user=druid
druid.db.connector.password=diurd

druid.selectors.indexing.serviceName=overlord
druid.indexer.storage.type=db
druid.indexer.queue.startDelay=PT0M
druid.indexer.runner.javaOpts="-server -Xmx256m"
druid.indexer.runner.startPort=8088
druid.indexer.fork.property.druid.processing.numThreads=1
druid.indexer.fork.property.druid.computation.buffer.size=100000000

#druid.storage.type=local
#druid.storage.storageDirectory=/storage/druid/localStorage

druid.storage.type=s3
druid.storage.bucket=<key>:<secret>@druiddata
druid.s3.accessKey=<key>
druid.s3.secretKey=<secret>

com.metamx.aws.accessKey=<key>
com.metamx.aws.secretKey=<secret>

hadoop.fs.s3n.awsAccessKeyId=<key>
hadoop.fs.s3n.awsSecretAccessKey=<secret>

Nishant Bangarwa

12 Aug 2014, 05:02:44
to druid-de...@googlegroups.com
Hi, 
The mapper and reducer memory settings can be configured in mapred-site.xml.

For example, you can add these properties to mapred-site.xml to tune the memory limits:

  <property>
    <name>mapreduce.map.memory.mb</name>
    <value>2048</value>
  </property>
  <property>
    <name>mapreduce.map.java.opts</name>
    <value>-server -Xmx1536m -Duser.timezone=UTC -Dfile.encoding=UTF-8 -XX:+PrintGCDetails -XX:+PrintGCTimeStamps</value>
  </property>
  <property>
    <name>mapreduce.reduce.memory.mb</name>
    <value>6144</value>
  </property>
  <property>
    <name>mapreduce.reduce.java.opts</name>
    <value>-server -Xmx2560m -Duser.timezone=UTC -Dfile.encoding=UTF-8 -XX:+PrintGCDetails -XX:+PrintGCTimeStamps</value>
  </property>
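
If you would rather keep this per task, the same Hadoop properties can presumably also be passed through the (currently empty) jobProperties block in the task's tuningConfig, assuming the indexer forwards those entries to the job configuration. A rough sketch, with the values above used purely for illustration:

  "tuningConfig": {
    "type": "hadoop",
    "jobProperties": {
      "mapreduce.reduce.memory.mb": "6144",
      "mapreduce.reduce.java.opts": "-server -Xmx2560m -Duser.timezone=UTC -Dfile.encoding=UTF-8"
    }
  }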

Gian Merlino

12 Aug 2014, 11:47:03
to druid-de...@googlegroups.com
In addition to Nishant's suggestion of looking into the reducer -Xmx, you might also want to consider lowering your rowFlushBoundary. This controls the number of rows kept in memory on the reducer at any one time. The default is 500,000, but if you have a lot of columns you can benefit from setting it to half that or even lower.
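
For example, in the task's tuningConfig (the specific value below is only an illustration; tune it to your data):

  "tuningConfig": {
    "type": "hadoop",
    "rowFlushBoundary": 250000
  }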