Limits for number of input events for batch ingestion

Arjun Singri

Jan 15, 2017, 4:37:36 PM
to Druid User
Are you aware of any limits on the number of input events per day for Druid batch ingestion? I am using task-based ingestion. I went through this doc and couldn't find anything: http://druid.io/docs/latest/ingestion/tasks.html

Its parent doc does describe a related setting, but it doesn't limit the number of input events: http://druid.io/docs/latest/ingestion/batch-ingestion.html

maxRowsInMemory (Integer): The number of rows to aggregate before persisting. Note that this is the number of post-aggregation rows, which may not be equal to the number of input events due to roll-up. This is used to manage the required JVM heap size. Required: no (default == 75000)
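
For reference, this is roughly where that setting sits in an index task spec. This is a minimal sketch based on the docs above, with dataSchema and ioConfig elided; note that in some 0.9.x versions the IndexTask calls the equivalent knob rowFlushBoundary rather than maxRowsInMemory:

{
  "type": "index",
  "spec": {
    "dataSchema": { "...": "dataSource, parser, metricsSpec, granularitySpec elided" },
    "ioConfig": { "...": "firehose pointing at the input files, elided" },
    "tuningConfig": {
      "type": "index",
      "maxRowsInMemory": 75000
    }
  }
}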
I am trying to ingest 1,302,523 events in one day. I also have theta sketches enabled for three dimensions, with the default value for size. I am getting the following exception, which after some googling I understand means the GC is spending too much time without reclaiming much memory:

2017-01-15T21:16:38,982 ERROR [task-runner-0-priority-0] io.druid.indexing.overlord.ThreadPoolTaskRunner - Uncaught Throwable while running task[IndexTask{id=index_event_10_2017-01-15T20:26:28.015Z, type=index, dataSource=event_10}]
java.lang.OutOfMemoryError: GC overhead limit exceeded
	at java.util.Arrays.copyOfRange(Arrays.java:3664) ~[?:1.8.0_74]
	at java.lang.String.<init>(String.java:207) ~[?:1.8.0_74]
	at com.fasterxml.jackson.core.util.TextBuffer.contentsAsString(TextBuffer.java:330) ~[jackson-core-2.4.6.jar:2.4.6]
	at com.fasterxml.jackson.core.json.ReaderBasedJsonParser.getText(ReaderBasedJsonParser.java:235) ~[jackson-core-2.4.6.jar:2.4.6]
	at com.fasterxml.jackson.databind.deser.std.UntypedObjectDeserializer$Vanilla.deserialize(UntypedObjectDeserializer.java:441) ~[jackson-databind-2.4.6.jar:2.4.6]
	at com.fasterxml.jackson.databind.deser.std.MapDeserializer._readAndBindStringMap(MapDeserializer.java:475) ~[jackson-databind-2.4.6.jar:2.4.6]
	at com.fasterxml.jackson.databind.deser.std.MapDeserializer.deserialize(MapDeserializer.java:335) ~[jackson-databind-2.4.6.jar:2.4.6]
	at com.fasterxml.jackson.databind.deser.std.MapDeserializer.deserialize(MapDeserializer.java:26) ~[jackson-databind-2.4.6.jar:2.4.6]
	at com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:3066) ~[jackson-databind-2.4.6.jar:2.4.6]
	at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:2168) ~[jackson-databind-2.4.6.jar:2.4.6]
	at com.metamx.common.parsers.JSONPathParser.parse(JSONPathParser.java:99) ~[java-util-0.27.9.jar:?]
	at io.druid.data.input.impl.StringInputRowParser.parseString(StringInputRowParser.java:126) ~[druid-api-0.9.1.1.jar:0.9.1.1]
	at io.druid.data.input.impl.StringInputRowParser.parse(StringInputRowParser.java:131) ~[druid-api-0.9.1.1.jar:0.9.1.1]
	at io.druid.data.input.impl.FileIteratingFirehose.nextRow(FileIteratingFirehose.java:72) ~[druid-api-0.9.1.1.jar:0.9.1.1]
	at io.druid.indexing.common.task.IndexTask.generateSegment(IndexTask.java:390) ~[druid-indexing-service-0.9.1.1.jar:0.9.1.1]
	at io.druid.indexing.common.task.IndexTask.run(IndexTask.java:221) ~[druid-indexing-service-0.9.1.1.jar:0.9.1.1]
	at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:436) [druid-indexing-service-0.9.1.1.jar:0.9.1.1]
	at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:408) [druid-indexing-service-0.9.1.1.jar:0.9.1.1]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_74]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_74]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_74]
	at java.lang.Thread.run(Thread.java:745) [?:1.8.0_74]
2017-01-15T21:16:38,992 ERROR [main] io.druid.cli.CliPeon - Error when starting up.  Failing.
java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: GC overhead limit exceeded
	at com.google.common.base.Throwables.propagate(Throwables.java:160) ~[guava-16.0.1.jar:?]
	at io.druid.indexing.worker.executor.ExecutorLifecycle.join(ExecutorLifecycle.java:211) ~[druid-indexing-service-0.9.1.1.jar:0.9.1.1]
	at io.druid.cli.CliPeon.run(CliPeon.java:287) [druid-services-0.9.1.1.jar:0.9.1.1]
	at io.druid.cli.Main.main(Main.java:105) [druid-services-0.9.1.1.jar:0.9.1.1]
Caused by: java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: GC overhead limit exceeded
	at com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:299) ~[guava-16.0.1.jar:?]
	at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:286) ~[guava-16.0.1.jar:?]
	at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116) ~[guava-16.0.1.jar:?]
	at io.druid.indexing.worker.executor.ExecutorLifecycle.join(ExecutorLifecycle.java:208) ~[druid-indexing-service-0.9.1.1.jar:0.9.1.1]
	... 2 more
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
	at java.util.Arrays.copyOfRange(Arrays.java:3664) ~[?:1.8.0_74]
	at java.lang.String.<init>(String.java:207) ~[?:1.8.0_74]
	at com.fasterxml.jackson.core.util.TextBuffer.contentsAsString(TextBuffer.java:330) ~[jackson-core-2.4.6.jar:2.4.6]
	at com.fasterxml.jackson.core.json.ReaderBasedJsonParser.getText(ReaderBasedJsonParser.java:235) ~[jackson-core-2.4.6.jar:2.4.6]
	at com.fasterxml.jackson.databind.deser.std.UntypedObjectDeserializer$Vanilla.deserialize(UntypedObjectDeserializer.java:441) ~[jackson-databind-2.4.6.jar:2.4.6]
	at com.fasterxml.jackson.databind.deser.std.MapDeserializer._readAndBindStringMap(MapDeserializer.java:475) ~[jackson-databind-2.4.6.jar:2.4.6]
	at com.fasterxml.jackson.databind.deser.std.MapDeserializer.deserialize(MapDeserializer.java:335) ~[jackson-databind-2.4.6.jar:2.4.6]
	at com.fasterxml.jackson.databind.deser.std.MapDeserializer.deserialize(MapDeserializer.java:26) ~[jackson-databind-2.4.6.jar:2.4.6]
	at com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:3066) ~[jackson-databind-2.4.6.jar:2.4.6]
	at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:2168) ~[jackson-databind-2.4.6.jar:2.4.6]
	at com.metamx.common.parsers.JSONPathParser.parse(JSONPathParser.java:99) ~[java-util-0.27.9.jar:?]
	at io.druid.data.input.impl.StringInputRowParser.parseString(StringInputRowParser.java:126) ~[druid-api-0.9.1.1.jar:0.9.1.1]
	at io.druid.data.input.impl.StringInputRowParser.parse(StringInputRowParser.java:131) ~[druid-api-0.9.1.1.jar:0.9.1.1]
	at io.druid.data.input.impl.FileIteratingFirehose.nextRow(FileIteratingFirehose.java:72) ~[druid-api-0.9.1.1.jar:0.9.1.1]
	at io.druid.indexing.common.task.IndexTask.generateSegment(IndexTask.java:390) ~[druid-indexing-service-0.9.1.1.jar:0.9.1.1]
	at io.druid.indexing.common.task.IndexTask.run(IndexTask.java:221) ~[druid-indexing-service-0.9.1.1.jar:0.9.1.1]
	at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:436) ~[druid-indexing-service-0.9.1.1.jar:0.9.1.1]
	at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:408) ~[druid-indexing-service-0.9.1.1.jar:0.9.1.1]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_74]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[?:1.8.0_74]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[?:1.8.0_74]
	at java.lang.Thread.run(Thread.java:745) ~[?:1.8.0_74]

Gian Merlino

Jan 16, 2017, 6:04:55 AM
to druid...@googlegroups.com
Theta sketches have quite a large footprint relative to other columns. If you have three of them, you probably need to either kick up your heap size or lower maxRowsInMemory relative to the default. It's OK for maxRowsInMemory to be less than the number of rows you're indexing; Druid will spill the excess to disk.
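
For example (illustrative value, not a recommendation), something like this in the task's tuningConfig would cut the in-heap row count to a third of the default:

  "tuningConfig": {
    "type": "index",
    "maxRowsInMemory": 25000
  }

Raising the heap instead would mean bumping -Xmx in druid.indexer.runner.javaOpts in the middleManager's runtime.properties, since that setting controls the peon JVMs that run these tasks.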

Gian
