GC overhead limit exceeded

tiny657

Aug 31, 2020, 9:09:51 PM

to Druid User

Hi,

I am getting an out-of-memory error even though my data volume is small relative to the server's capacity.

Error Message: Terminating due to java.lang.OutOfMemoryError: GC overhead limit exceeded

I installed Druid on a single i3.4xlarge machine and launched it with bin/start-medium.

-----

Medium: 16 CPU, 128GB RAM (~i3.4xlarge)

Launch command: bin/start-medium
Configuration directory: conf/druid/single-server/medium

https://druid.apache.org/docs/latest/operations/single-server.html

Here is the default JVM config for medium that I used.

- broker: -Xms8g -Xmx8g -XX:MaxDirectMemorySize=5g

- coordinator-overlord: -Xms9g -Xmx9g

- historical: -Xms8g -Xmx8g -XX:MaxDirectMemorySize=13g

- middleManager: -Xms256m -Xmx256m

- router: -Xms512m -Xmx512m -XX:MaxDirectMemorySize=128m

I also tried increasing the memory in the JVM config, but got the same out-of-memory error.

- broker: -Xms32g -Xmx32g -XX:MaxDirectMemorySize=20g

- coordinator-overlord: -Xms36g -Xmx36g

- historical: -Xms32g -Xmx32g -XX:MaxDirectMemorySize=52g

- middleManager: -Xms10g -Xmx10g

- router: -Xms2g -Xmx2g -XX:MaxDirectMemorySize=512m
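As a rough sanity check on the two sets of JVM settings listed above, the maximum heap (-Xmx) and direct-memory (-XX:MaxDirectMemorySize) limits can be added up and compared with the machine's RAM. This is only a sketch under stated assumptions: the names `budget_mb`, `total_gb`, `DEFAULT` and `INCREASED` are made up for illustration, the router heap is read as 512m and the coordinator-overlord heap as 36g, the 128 GB figure is the one quoted from the single-server docs, and the sum ignores the separate task JVMs launched by the middleManager as well as any other off-heap usage.

```python
import re

MACHINE_RAM_GB = 128  # figure quoted for the "medium" profile

# JVM options as listed in the post (router read as 512m, coordinator-overlord as 36g).
DEFAULT = {
    "broker": "-Xms8g -Xmx8g -XX:MaxDirectMemorySize=5g",
    "coordinator-overlord": "-Xms9g -Xmx9g",
    "historical": "-Xms8g -Xmx8g -XX:MaxDirectMemorySize=13g",
    "middleManager": "-Xms256m -Xmx256m",
    "router": "-Xms512m -Xmx512m -XX:MaxDirectMemorySize=128m",
}

INCREASED = {
    "broker": "-Xms32g -Xmx32g -XX:MaxDirectMemorySize=20g",
    "coordinator-overlord": "-Xms36g -Xmx36g",
    "historical": "-Xms32g -Xmx32g -XX:MaxDirectMemorySize=52g",
    "middleManager": "-Xms10g -Xmx10g",
    "router": "-Xms2g -Xmx2g -XX:MaxDirectMemorySize=512m",
}

# Matches only the two upper limits; -Xms is deliberately not counted.
SIZE_FLAG = re.compile(r"(?:-Xmx|-XX:MaxDirectMemorySize=)(\d+)([mMgG])")


def budget_mb(opts: str) -> int:
    """Sum max heap and max direct memory for one JVM option string, in MB."""
    total = 0
    for number, unit in SIZE_FLAG.findall(opts):
        total += int(number) * (1024 if unit.lower() == "g" else 1)
    return total


def total_gb(config: dict) -> float:
    """Sum the per-service budgets of a whole configuration, in GB."""
    return sum(budget_mb(opts) for opts in config.values()) / 1024


if __name__ == "__main__":
    for name, config in (("default", DEFAULT), ("increased", INCREASED)):
        print(f"{name}: {total_gb(config):.3f} GB requested, {MACHINE_RAM_GB} GB on the machine")
```

With these inputs the default settings add up to 43.875 GB and the increased settings to 184.5 GB, which is more than the 128 GB listed for this instance size.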

Here is the data I used.

I tried to ingest 1.5 GB (25M rows).

Ingestion spec:

{
  "type": "index_parallel",
  "spec": {
    "dataSchema": {
      "dataSource": "table_name",
      "dimensionsSpec": {
        "dimensions": ["test_option", "abtest_metric_id", "dimension_name", "dimension_value", "event_value", "user_id"]
      },
      "timestampSpec": {
        "column": "dt",
        "format": "yyyyMMdd",
        "missingValue": "20200822"
      },
      "metricsSpec": [ { "type": "count", "name": "count" } ],
      "granularitySpec": {
        "segmentGranularity": "day",
        "queryGranularity": "none"
      }
    },
    "ioConfig": {
      "type": "index_parallel",
      "inputSource": {
        "type": "s3",
        "prefixes": ["s3://buckets/"]
      },
      "inputFormat": {
        "type": "parquet"
      }
    },
    "tuningConfig": {
      "type": "index_parallel",
      "maxNumConcurrentSubTasks": 20
    }
  }
}

Full Error Log:

2020-09-01T00:51:14,974 INFO [main] org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=localhost sessionTimeout=30000 watcher=org.apache.curator.ConnectionState@55d99dc3

2020-09-01T00:51:15,035 INFO [main] org.apache.curator.framework.imps.CuratorFrameworkImpl - Default schema

2020-09-01T00:51:15,045 INFO [main-SendThread(localhost:2181)] org.apache.zookeeper.ClientCnxn - Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)

2020-09-01T00:51:15,054 INFO [main-SendThread(localhost:2181)] org.apache.zookeeper.ClientCnxn - Socket connection established to localhost/127.0.0.1:2181, initiating session

2020-09-01T00:51:15,077 INFO [main-SendThread(localhost:2181)] org.apache.zookeeper.ClientCnxn - Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x1000f97e8a80014, negotiated timeout = 30000

2020-09-01T00:51:15,083 INFO [main-EventThread] org.apache.curator.framework.state.ConnectionStateManager - State change: CONNECTED

2020-09-01T00:51:15,206 INFO [NodeRoleWatcher[COORDINATOR]] org.apache.druid.curator.discovery.CuratorDruidNodeDiscoveryProvider$NodeRoleWatcher - Node[http://localhost:8081] of role[coordinator] detected.

2020-09-01T00:51:15,206 INFO [NodeRoleWatcher[OVERLORD]] org.apache.druid.curator.discovery.CuratorDruidNodeDiscoveryProvider$NodeRoleWatcher - Node[http://localhost:8081] of role[overlord] detected.

2020-09-01T00:51:15,206 INFO [NodeRoleWatcher[COORDINATOR]] org.apache.druid.curator.discovery.CuratorDruidNodeDiscoveryProvider$NodeRoleWatcher - Node watcher of role[coordinator] is now initialized.

2020-09-01T00:51:15,206 INFO [NodeRoleWatcher[OVERLORD]] org.apache.druid.curator.discovery.CuratorDruidNodeDiscoveryProvider$NodeRoleWatcher - Node watcher of role[overlord] is now initialized.

2020-09-01T00:51:15,378 INFO [main] org.apache.druid.indexing.worker.executor.ExecutorLifecycle - Running with task: {

"type" : "single_phase_sub_task",

"id" : "single_phase_sub_task_maxRetry3_biinapgd_2020-09-01T00:49:33.063Z",

"groupId" : "index_parallel_maxRetry3_pgibohji_2020-09-01T00:45:45.039Z",

"resource" : {

"availabilityGroup" : "single_phase_sub_task_maxRetry3_biinapgd_2020-09-01T00:49:33.063Z",

"requiredCapacity" : 1

},

"supervisorTaskId" : "index_parallel_maxRetry3_pgibohji_2020-09-01T00:45:45.039Z",

"numAttempts" : 2,

"spec" : {

"dataSchema" : {

"dataSource" : "maxRetry3",

"timestampSpec" : {

"column" : "dt",

"format" : "yyyyMMdd",

"missingValue" : "20200822-01-01T00:00:00.000Z"

},

"dimensionsSpec" : {

"dimensions" : [ {

"type" : "string",

"name" : "test_option",