"ioConfig" : {
"type" : "index",
"firehose" : {
"type" : "static-s3",
"prefixes": ["s3://<bucket>/2018/05/07/*.gz"]
},
"appendToExisting" : false
},
"tuningConfig" : {
"type" : "index",
"targetPartitionSize" : 5000000,
"maxRowsInMemory" : 25000,
"forceExtendableShardSpecs" : true
}
"firehose" : {
"type" : "static-s3",
"prefixes": ["s3://<bucket>/2018/05/07/"]
},
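The two firehose snippets differ only in the prefix. Amazon S3 matches a list prefix as a plain leading substring of the object key, with no glob expansion, so a prefix ending in `*.gz` only matches keys that literally contain `*.gz`. The minimal sketch below simulates that matching on an in-memory key list. The bucket name and keys are hypothetical, and no AWS calls are made.

```python
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> tuple[str, str]:
    """Split 's3://bucket/key/prefix' into (bucket, key_prefix)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3":
        raise ValueError(f"not an s3:// URI: {uri}")
    # netloc is the bucket; the path without its leading '/' is the key prefix.
    return parsed.netloc, parsed.path.lstrip("/")


def list_keys(keys: list[str], prefix: str) -> list[str]:
    """Simulate S3 prefix listing: a plain string-prefix test, no globbing."""
    return [k for k in keys if k.startswith(prefix)]


# Hypothetical object keys spanning two days.
keys = [
    "2018/05/07/part-0000.gz",
    "2018/05/07/part-0001.gz",
    "2018/05/08/part-0000.gz",
]

_, glob_prefix = split_s3_uri("s3://example-bucket/2018/05/07/*.gz")
_, dir_prefix = split_s3_uri("s3://example-bucket/2018/05/07/")

print(list_keys(keys, glob_prefix))  # '*' is compared literally, so nothing matches
print(list_keys(keys, dir_prefix))   # both 2018/05/07 objects match
```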
--
You received this message because you are subscribed to the Google Groups "Druid User" group.
To unsubscribe from this group and stop receiving emails from it, send an email to druid-user+...@googlegroups.com.
To post to this group, send email to druid...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/druid-user/863a314e-0b3b-401c-b6e0-2bc7ef3001d7%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Hi Jihoon,
Thanks for your response. I tried setting `maxRowsInMemory` to 500K, 1 million rows, etc., with `targetPartitionSize` at 1 million rows, and every attempt failed with `OOM: heap space`. A day's files contain roughly 2 million rows.
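For a rough sense of what these settings imply for about 2 million rows per day: `targetPartitionSize` sets approximately how many rows go into each segment, and `maxRowsInMemory` caps how many rows are buffered on heap before an intermediate persist, so raising it generally keeps more rows on heap at once rather than fewer. The back-of-the-envelope sketch below assumes the shard count is `ceil(rows / targetPartitionSize)` and that rollup does not reduce the row count; both are simplifications, not Druid's exact logic.

```python
import math


def estimate_shards(num_rows: int, target_partition_size: int) -> int:
    """Approximate hash-partition count: ceil(rows / targetPartitionSize)."""
    return math.ceil(num_rows / target_partition_size)


def max_intermediate_persists(num_rows: int, max_rows_in_memory: int) -> int:
    """Upper bound on persists, assuming rollup combines no rows."""
    return math.ceil(num_rows / max_rows_in_memory)


DAILY_ROWS = 2_000_000  # "roughly 2 million rows" per day, as stated in the thread

shards = {tps: estimate_shards(DAILY_ROWS, tps) for tps in (5_000_000, 1_000_000)}
persists = {m: max_intermediate_persists(DAILY_ROWS, m) for m in (25_000, 500_000, 1_000_000)}

for tps, n in shards.items():
    print(f"targetPartitionSize={tps:,}: about {n} shard(s)")
for m, n in persists.items():
    print(f"maxRowsInMemory={m:,}: up to {n} intermediate persists")
```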
The files are JSON documents, and each has at least 30-40 fields (in Druid terms).
Of these, only 4 dimensions were indexed. I use a flattenSpec because our JSON documents are nested objects; the 4 dimensions are extracted with JSONPath expressions.
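To illustrate what path-based flattening does, the sketch below pulls four nested values out of one document into a flat row. It supports only simple dotted `$.a.b` paths, not full JSONPath, and the event and field names are hypothetical. Note that in this sketch the whole document is loaded before any path is evaluated, so the fields that are not extracted still occupy memory while the row is built.

```python
from typing import Any


def get_path(doc: dict[str, Any], path: str) -> Any:
    """Resolve a simple '$.a.b.c' path; return None if any key is missing."""
    if not path.startswith("$."):
        raise ValueError(f"unsupported path: {path}")
    node: Any = doc
    for key in path[2:].split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def flatten(doc: dict[str, Any], fields: dict[str, str]) -> dict[str, Any]:
    """Build one flat row: output column name -> value found at its path."""
    return {name: get_path(doc, path) for name, path in fields.items()}


# Hypothetical nested event and four hypothetical dimension paths.
event = {
    "ts": "2018-05-07T00:00:00Z",
    "user": {"id": "u1", "geo": {"country": "DE"}},
    "device": {"os": "android"},
    "payload": {"type": "click", "extra": {"a": 1}},
}
dims = {
    "userId": "$.user.id",
    "country": "$.user.geo.country",
    "os": "$.device.os",
    "eventType": "$.payload.type",
}

print(flatten(event, dims))
```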
Here is the stack trace:
2018-05-11T10:28:32,500 ERROR [task-runner-0-priority-0] io.druid.indexing.common.task.IndexTask - Encountered exception in DETERMINE_PARTITIONS.
com.amazonaws.SdkClientException: Failed to sanitize XML document destined for handler class com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$ListObjectsV2Handler
at com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser.sanitizeXmlDocument(XmlResponsesSaxParser.java:214) ~[aws-java-sdk-bundle-1.11.199.jar:?]
at com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser.parseListObjectsV2Response(XmlResponsesSaxParser.java:315) ~[aws-java-sdk-bundle-1.11.199.jar:?]
at com.amazonaws.services.s3.model.transform.Unmarshallers$ListObjectsV2Unmarshaller.unmarshall(Unmarshallers.java:88) ~[aws-java-sdk-bundle-1.11.199.jar:?]
at com.amazonaws.services.s3.model.transform.Unmarshallers$ListObjectsV2Unmarshaller.unmarshall(Unmarshallers.java:77) ~[aws-java-sdk-bundle-1.11.199.jar:?]
at com.amazonaws.services.s3.internal.S3XmlResponseHandler.handle(S3XmlResponseHandler.java:62) ~[aws-java-sdk-bundle-1.11.199.jar:?]
at com.amazonaws.services.s3.internal.S3XmlResponseHandler.handle(S3XmlResponseHandler.java:31) ~[aws-java-sdk-bundle-1.11.199.jar:?]
at com.amazonaws.http.response.AwsResponseHandlerAdapter.handle(AwsResponseHandlerAdapter.java:70) ~[aws-java-sdk-bundle-1.11.199.jar:?]
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleResponse(AmazonHttpClient.java:1553) ~[aws-java-sdk-bundle-1.11.199.jar:?]
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1271) ~[aws-java-sdk-bundle-1.11.199.jar:?]
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1055) ~[aws-java-sdk-bundle-1.11.199.jar:?]
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:743) ~[aws-java-sdk-bundle-1.11.199.jar:?]
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:717) ~[aws-java-sdk-bundle-1.11.199.jar:?]
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:699) ~[aws-java-sdk-bundle-1.11.199.jar:?]
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:667) ~[aws-java-sdk-bundle-1.11.199.jar:?]
at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:649) ~[aws-java-sdk-bundle-1.11.199.jar:?]
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:513) ~[aws-java-sdk-bundle-1.11.199.jar:?]
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4229) ~[aws-java-sdk-bundle-1.11.199.jar:?]
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4176) ~[aws-java-sdk-bundle-1.11.199.jar:?]
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4170) ~[aws-java-sdk-bundle-1.11.199.jar:?]
at com.amazonaws.services.s3.AmazonS3Client.listObjectsV2(AmazonS3Client.java:865) ~[aws-java-sdk-bundle-1.11.199.jar:?]
at io.druid.storage.s3.S3Utils$2.fetchNextBatch(S3Utils.java:124) ~[?:?]
at io.druid.storage.s3.S3Utils$2.next(S3Utils.java:147) ~[?:?]
at io.druid.storage.s3.S3Utils$2.next(S3Utils.java:114) ~[?:?]
at com.google.common.collect.Iterators.addAll(Iterators.java:357) ~[guava-16.0.1.jar:?]
at com.google.common.collect.Lists.newArrayList(Lists.java:147) ~[guava-16.0.1.jar:?]
at io.druid.firehose.s3.StaticS3FirehoseFactory.initObjects(StaticS3FirehoseFactory.java:138) ~[?:?]
at io.druid.data.input.impl.prefetch.PrefetchableTextFilesFirehoseFactory.connect(PrefetchableTextFilesFirehoseFactory.java:167) ~[druid-api-0.13.0-SNAPSHOT.jar:0.13.0-SNAPSHOT]
at io.druid.data.input.impl.prefetch.PrefetchableTextFilesFirehoseFactory.connect(PrefetchableTextFilesFirehoseFactory.java:89) ~[druid-api-0.13.0-SNAPSHOT.jar:0.13.0-SNAPSHOT]
at io.druid.indexing.common.task.IndexTask.collectIntervalsAndShardSpecs(IndexTask.java:716) ~[druid-indexing-service-0.13.0-SNAPSHOT.jar:0.13.0-SNAPSHOT]
at io.druid.indexing.common.task.IndexTask.createShardSpecsFromInput(IndexTask.java:645) ~[druid-indexing-service-0.13.0-SNAPSHOT.jar:0.13.0-SNAPSHOT]
at io.druid.indexing.common.task.IndexTask.determineShardSpecs(IndexTask.java:583) ~[druid-indexing-service-0.13.0-SNAPSHOT.jar:0.13.0-SNAPSHOT]
at io.druid.indexing.common.task.IndexTask.run(IndexTask.java:417) [druid-indexing-service-0.13.0-SNAPSHOT.jar:0.13.0-SNAPSHOT]
at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:456) [druid-indexing-service-0.13.0-SNAPSHOT.jar:0.13.0-SNAPSHOT]
at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:428) [druid-indexing-service-0.13.0-SNAPSHOT.jar:0.13.0-SNAPSHOT]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_161]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_161]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_161]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_161]
Caused by: java.lang.OutOfMemoryError: Java heap space
There are also other OOMs.
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "main-SendThread(ip-10-35-19-87.ec2.internal:2181)"
2018-05-11T10:28:32,499 DEBUG [JettyScheduler] org.eclipse.jetty.server.session - Scavenging sessions at 1526034512491
Exception in thread "HttpClient-Netty-Boss-0" java.lang.OutOfMemoryError: Java heap space
at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:932)
at java.util.concurrent.ThreadPoolExecutor.processWorkerExit(ThreadPoolExecutor.java:1025)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1167)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Note also that the task never transitions to FAILED. I presume the peon died, yet the overlord console shows the task as still running!
Best Regards
Varaga
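The first trace shows where the OOM surfaced: the DETERMINE_PARTITIONS phase, while `StaticS3FirehoseFactory.initObjects` drained a paginating iterator (`S3Utils$2.next` calling `fetchNextBatch`, which calls `listObjectsV2`) into `Lists.newArrayList`, so every listed object summary is held in one list before any file is read. A heap-space `OutOfMemoryError` is thrown wherever an allocation happens to fail, which is not necessarily the code that filled the heap. The sketch below only simulates that paginate-then-materialize pattern; the page function and integer continuation token are hypothetical stand-ins, not the AWS SDK.

```python
from typing import Iterator, Optional


def fake_list_page(keys: list[str], token: Optional[int], page_size: int) -> tuple[list[str], Optional[int]]:
    """Stand-in for one ListObjectsV2 call: return (page, next_token)."""
    start = token or 0
    page = keys[start:start + page_size]
    next_token = start + page_size if start + page_size < len(keys) else None
    return page, next_token


def iter_objects(keys: list[str], page_size: int = 1000) -> Iterator[str]:
    """Lazily yield keys one page at a time, like a paginating iterator."""
    token: Optional[int] = None
    while True:
        page, token = fake_list_page(keys, token, page_size)
        yield from page
        if token is None:
            return


# Hypothetical listing of 2,500 objects.
keys = [f"2018/05/07/part-{i:05d}.gz" for i in range(2500)]

# Materializing the whole listing, as Lists.newArrayList(iterator) does in the trace.
all_keys = list(iter_objects(keys))

# Streaming alternative: the consumer never builds the full list itself.
count = sum(1 for _ in iter_objects(keys))

print(len(all_keys), count)
```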
druid.service=druid/middlemanager
druid.host=<Aurora Assigned Host:Port>
druid.port=<Aurora Assigned>
# HTTP server threads
druid.server.http.numThreads=40
# Processing threads and buffers
druid.processing.buffer.sizeBytes=36870912
druid.processing.numMergeBuffers=2
druid.processing.numThreads=2
# Resources for peons
druid.indexer.runner.javaOpts=-server -Xmx2g -XX:+UseG1GC -XX:MaxGCPauseMillis=100 -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Duser.timezone=UTC -Dfile.encoding=UTF-8 -Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager -Dlog4j.configurationFile=/opt/druid/conf/druid/middleManager/log4j2.xml
druid.indexer.task.baseDir=/mnt/<Masked>/task
druid.indexer.logs.directory=/mnt/<Masked>/task/logs
#druid.indexer.task.restoreTasksOnRestart=true
druid.indexer.runner.startPort=40000
# Peon properties
druid.indexer.fork.property.druid.monitoring.monitors=["com.metamx.metrics.JvmMonitor"]
druid.indexer.fork.property.druid.processing.buffer.sizeBytes=136870912
druid.indexer.fork.property.druid.processing.numMergeBuffers=2
druid.indexer.fork.property.druid.processing.numThreads=2
# Number of tasks per middleManager
druid.worker.capacity=3
druid.worker.ip=localhost
druid.worker.version=0
-server
-Xmx64m
-Xms64m
-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps
-Duser.timezone=UTC
-Dfile.encoding=UTF-8
-Djava.io.tmpdir=/tmp
-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager
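The `-Xmx64m` above applies only to the MiddleManager process; each forked peon gets its own heap from `druid.indexer.runner.javaOpts` (`-Xmx2g`). The sketch below tallies what these settings request on one host. It treats the commonly documented sizing rule, direct memory of at least `buffer.sizeBytes * (numThreads + numMergeBuffers + 1)` per process, as an assumption for the peons, and it ignores metaspace, thread stacks, and anything else running on the host.

```python
GIB = 1024 ** 3

# Values copied from the runtime.properties and jvm.config above.
worker_capacity = 3               # druid.worker.capacity
peon_heap_bytes = 2 * GIB         # -Xmx2g in druid.indexer.runner.javaOpts
peon_buffer_bytes = 136_870_912   # druid.indexer.fork.property.druid.processing.buffer.sizeBytes
peon_num_threads = 2              # druid.indexer.fork.property.druid.processing.numThreads
peon_merge_buffers = 2            # druid.indexer.fork.property.druid.processing.numMergeBuffers
mm_heap_bytes = 64 * 1024 ** 2    # -Xmx64m for the MiddleManager itself


def direct_memory_per_peon(buffer_bytes: int, num_threads: int, merge_buffers: int) -> int:
    """Assumed sizing rule: buffer.sizeBytes * (numThreads + numMergeBuffers + 1)."""
    return buffer_bytes * (num_threads + merge_buffers + 1)


direct = direct_memory_per_peon(peon_buffer_bytes, peon_num_threads, peon_merge_buffers)
total_peon_heap = worker_capacity * peon_heap_bytes
total = mm_heap_bytes + worker_capacity * (peon_heap_bytes + direct)

print(f"direct memory per peon: {direct:,} bytes (~{direct / GIB:.2f} GiB)")
print(f"peon heap across all slots: {total_peon_heap / GIB:.0f} GiB")
print(f"heap + direct across all JVMs: ~{total / GIB:.2f} GiB")
```

Each indexing task is still limited to its own 2 GiB heap, whatever the host total.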