Indexing Service Extremely Slow


Weiwen.Zhang

Jul 15, 2015, 5:16:13 AM
to druid...@googlegroups.com

Hi, I found that my indexing service takes a long time to complete tasks. For example, it takes more than 5 hours to complete a task whose segment covers the interval "2015-07-14T12:00:00.000Z/2015-07-14T13:00:00.000Z" (1 hour of data, about 35,000,000 raw records; this is 1 of the 5 shards).

So if ingestion continues, many "running" tasks will lock all of the peon workers, and then no more tasks can be assigned to any peon.

I don't know why it is so slow, or how to improve this.
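For scale, here is the rough ingest rate these numbers imply (a back-of-envelope sketch, reading the description above as ~35 million rows in that one shard's hourly segment):

```python
# Rough ingest rate implied by the task above: one shard's 1-hour
# segment holds ~35,000,000 raw records (1 of 5 shards).
records_per_shard = 35_000_000
seconds = 3600
shards = 5

per_shard_rate = records_per_shard / seconds  # rows/sec one realtime task must absorb
total_rate = per_shard_rate * shards          # rows/sec overall, if all shards are similar

print(f"per shard: {per_shard_rate:,.0f} rows/sec")
print(f"all {shards} shards: {total_rate:,.0f} rows/sec")
```

That is nearly 10,000 rows/sec per task, sustained for the whole hour, which is useful context for the tuning discussion below.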


I have only 1 middleManager node for testing. The server has a 12-core "Intel E5-2620 v3 @ 2.40GHz" CPU, 192GB RAM, and HDD storage.

My JDK version is "1.8.0_40".


My config files:


_common/common.runtime.properties:


druid.extensions.searchCurrentClassloader=true
druid.extensions.coordinates=["io.druid.extensions:druid-kafka-eight","io.druid.extensions:mysql-metadata-storage","io.druid.extensions:druid-hdfs-storage"]
druid.zk.service.host=192.168.105.240:2181
druid.metadata.storage.type=mysql
druid.metadata.storage.connector.connectURI=jdbc\:mysql\://192.168.105.240\:3306/druid
druid.metadata.storage.connector.user=druid
druid.metadata.storage.connector.password=durid
druid.storage.type=hdfs
druid.storage.storageDirectory=hdfs\://192.168.105.240\:9000/druid
druid.cache.type=local
druid.cache.sizeInBytes=100000000
druid.selectors.indexing.serviceName=overlord
druid.monitoring.monitors=["com.metamx.metrics.JvmMonitor"]
druid.monitoring.emissionPeriod=PT10S
druid.emitter=http
druid.emitter.http.recipientBaseUrl=http\://192.168.105.240:19999/metrics/
druid.request.logging.feed=druidrequests
druid.emitter.logging.loggerClass=HttpPostEmitter
druid.request.logging.type=file
druid.request.logging.dir=/data1/logs/druid



middleManager/runtime.properties:


druid.host=192.168.105.17:3029
druid.port=3029
druid.service=middlemanager
druid.indexer.queue.startDelay=PT1M
druid.indexer.task.baseDir=/data1/druid
druid.indexer.task.baseTaskDir=/data1/druid/persistent/tasks
druid.indexer.runner.javaOpts=-server -Xmx8g -XX\:\+UseG1GC -XX\:MaxGCPauseMillis=100 -XX\:\+PrintGCDetails -XX\:\+PrintGCTimeStamps
druid.indexer.fork.property.druid.monitoring.monitors=["com.metamx.metrics.JvmMonitor"]
druid.indexer.fork.property.druid.processing.buffer.sizeBytes=536870912
druid.indexer.fork.property.druid.processing.numThreads=10
druid.indexer.fork.property.druid.segmentCache.locations=[{"path": "/data1/druid/persistent/segmentcache", "maxSize": 1024000000000}]
druid.indexer.fork.property.druid.server.http.numThreads=50
druid.worker.capacity=15
druid.indexer.runner.startPort=41000
druid.worker.ip=192.168.105.17
druid.worker.version=0
druid.indexer.task.chathandler.type=announce
druid.indexer.logs.type=file
druid.indexer.logs.directory=/data1/logs/druid/indexer
druid.indexer.fork.property.log4j.configurationFile=file:///opt/druid/config/middleManager/log4j2.xml



config/coordinator/runtime.properties:


druid.host=192.168.105.17:3031
druid.port=3031
druid.service=coordinator
druid.coordinator.startDelay=PT5M
druid.emitter=http
druid.emitter.http.recipientBaseUrl=http\://192.168.105.240:19999/metrics/



config/historical/runtime.properties:


druid.host=192.168.105.17:3033
druid.port=3033
druid.service=historical
druid.processing.buffer.sizeBytes=1073741824
druid.processing.numThreads=4
druid.segmentCache.locations=[{"path"\: "/data1/druid/indexCache", "maxSize"\: 1024000000000}]
druid.server.maxSize=1024000000000
druid.server.http.numThreads=50



overlord/runtime.properties:


druid.host=192.168.105.17:3030
druid.port=3030
druid.service=overlord
druid.indexer.queue.startDelay=PT1M
druid.indexer.runner.taskAssignmentTimeout=PT1M
druid.indexer.runner.type=remote
druid.indexer.storage.type=metadata
druid.indexer.logs.type=file
druid.indexer.logs.directory=/data1/logs/druid/indexer



broker/runtime.properties:


druid.host=192.168.105.17:3032
druid.port=3032
druid.service=broker
druid.broker.cache.useCache=true
druid.broker.cache.populateCache=true
druid.broker.http.numConnections=20
druid.broker.http.readTimeout=PT5M
druid.processing.buffer.sizeBytes=512000000
druid.processing.numThreads=2
druid.server.http.numThreads=50



My start commands:


middleManager node: 

java -Xmx64m -Xms64m -server -XX:+UseConcMarkSweepGC -Duser.timezone=UTC -Dfile.encoding=UTF-8 \
-Ddruid.emitter.logging.logLevel=info -Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager \
-Djava.io.tmpdir=/data1/druid -Ddruid.segmentCache.locations=[{"path":"/data1/druid/indexCache","maxSize":"500000000000"}] \
-Ddruid.server.maxSize=500000000000 -classpath /opt/druid/config/common:/opt/druid/config/middleManager:/opt/druid/lib/*:\
/opt/hadoop/share/hadoop/hdfs/*:/opt/hadoop/share/hadoop/common/*:/opt/hadoop/share/hadoop/common/lib/* \
io.druid.cli.Main server middleManager



overlord node: 

java -Xmx4g -Xms4g -server -XX:+UseConcMarkSweepGC -Duser.timezone=UTC -Dfile.encoding=UTF-8 \
-Ddruid.emitter.logging.logLevel=info -Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager \
-Djava.io.tmpdir=/data1/druid -Ddruid.segmentCache.locations=[{"path":"/data1/druid/indexCache","maxSize":"500000000000"}] \
-Ddruid.server.maxSize=500000000000 -classpath /opt/druid/config/common:/opt/druid/config/overlord:/opt/druid/lib/*:\
/opt/hadoop/share/hadoop/hdfs/*:/opt/hadoop/share/hadoop/common/*:/opt/hadoop/share/hadoop/common/lib/* \
io.druid.cli.Main server overlord



historical node: 

java -Xmx8g -Xms2g -server -XX:+UseConcMarkSweepGC -Duser.timezone=UTC -Dfile.encoding=UTF-8 \
-Ddruid.emitter.logging.logLevel=info -Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager \
-Djava.io.tmpdir=/data1/druid -Ddruid.segmentCache.locations=[{"path":"/data1/druid/indexCache","maxSize":"500000000000"}] \
-Ddruid.server.maxSize=500000000000 -classpath /opt/druid/config/common:/opt/druid/config/historical:/opt/druid/lib/*:\
/opt/hadoop/share/hadoop/hdfs/*:/opt/hadoop/share/hadoop/common/*:/opt/hadoop/share/hadoop/common/lib/* \
io.druid.cli.Main server historical


coordinator node:

java -Xmx8g -Xms2g -server -XX:+UseConcMarkSweepGC -Duser.timezone=UTC -Dfile.encoding=UTF-8 \
-Ddruid.emitter.logging.logLevel=info -Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager \
-Djava.io.tmpdir=/data1/druid -Ddruid.segmentCache.locations=[{"path":"/data1/druid/indexCache","maxSize":"500000000000"}] \
-Ddruid.server.maxSize=500000000000 -classpath /opt/druid/config/_common:/opt/druid/config/coordinator:/opt/druid/lib/*:\
/opt/hadoop/share/hadoop/hdfs/*:/opt/hadoop/share/hadoop/common/*:/opt/hadoop/share/hadoop/common/lib/* \
io.druid.cli.Main server coordinator



My task payload looks like this:


{
  "task": "index_realtime_datasource2_2015-07-14T20:00:00.000+08:00_0_0",
  "payload": {
    "id": "index_realtime_datasource2_2015-07-14T20:00:00.000+08:00_0_0",
    "resource": {
      "availabilityGroup": "datasource2-12-0000",
      "requiredCapacity": 1
    },
    "spec": {
      "dataSchema": {
        "dataSource": "datasource2",
        "parser": {
          "type": "map",
          "parseSpec": {
            "format": "json",
            "timestampSpec": {
              "column": "timestamp",
              "format": "auto",
              "missingValue": null
            },
            "dimensionsSpec": {
              "dimensions": [
                "brand", "country", "cpa", "cpc", "cpm", "currency",
                "deal_type", "from_jpa", "from_rmb", "from_twd", "height",
                "id", "impression_type", "media_type", "payment_rule_id",
                "payment_type", "placement_group_id", "placement_id",
                "revenue_type", "timestamp", "to_jpa", "to_rmb", "to_twd"
              ],
              "dimensionExclusions": ["count"],
              "spatialDimensions": []
            }
          }
        },
        "metricsSpec": [
          { "type": "count", "name": "row" },
          { "type": "longSum", "name": "count", "fieldName": "count" },
          { "type": "doubleSum", "name": "sum_cpm", "fieldName": "cpm" },
          { "type": "doubleSum", "name": "sum_cpc", "fieldName": "cpc" },
          { "type": "doubleSum", "name": "sum_cpa", "fieldName": "cpa" },
          {
            "type": "javascript",
            "name": "rmb_cpm",
            "fieldNames": ["cpm", "to_rmb"],
            "fnAggregate": "function(current, a, b) { return current + a * b; }",
            "fnReset": "function() { return 0; }",
            "fnCombine": "function(partialA, partialB) { return partialA + partialB; }"
          }
        ],
        "granularitySpec": {
          "type": "uniform",
          "segmentGranularity": "HOUR",
          "queryGranularity": {
            "type": "duration",
            "duration": 1000,
            "origin": "1970-01-01T00:00:00.000Z"
          },
          "intervals": null
        }
      },
      "ioConfig": {
        "type": "realtime",
        "firehose": {
          "type": "clipped",
          "delegate": {
            "type": "timed",
            "delegate": {
              "type": "receiver",
              "serviceName": "druid:firehose:datasource2-12-0000-0000",
              "bufferSize": 100000
            },
            "shutoffTime": "2015-07-14T21:06:00.000+08:00"
          },
          "interval": "2015-07-14T12:00:00.000Z/2015-07-14T13:00:00.000Z"
        }
      },
      "tuningConfig": {
        "type": "realtime",
        "maxRowsInMemory": 10000000,
        "intermediatePersistPeriod": "PT2M",
        "windowPeriod": "PT1M",
        "basePersistDirectory": "/tmp/1436876000335-0",
        "versioningPolicy": { "type": "intervalStart" },
        "rejectionPolicy": { "type": "none" },
        "maxPendingPersists": 0,
        "shardSpec": { "type": "linear", "partitionNum": 0 },
        "indexSpec": {
          "bitmap": { "type": "concise" },
          "dimensionCompression": null,
          "metricCompression": null
        },
        "persistInHeap": false,
        "ingestOffheap": false,
        "bufferSize": 134217728
      }
    },
    "groupId": "index_realtime_datasource2",
    "dataSource": "datasource2"
  }
}




The failed task logs show exceptions:


2015-07-14T13:55:26,104 INFO [main] io.druid.initialization.Initialization - Loading extension[io.druid.extensions:druid-kafka-eight] for class[io.druid.initialization.DruidModule]

2015-07-14T13:55:26,109 INFO [main] io.druid.initialization.Initialization - Adding extension module[class io.druid.firehose.kafka.KafkaEightDruidModule] for class[io.druid.initialization.DruidModule]

2015-07-14T13:55:26,110 INFO [main] io.druid.initialization.Initialization - Loading extension[io.druid.extensions:mysql-metadata-storage] for class[io.druid.initialization.DruidModule]

2015-07-14T13:55:26,114 INFO [main] io.druid.initialization.Initialization - Adding extension module[class io.druid.metadata.storage.mysql.MySQLMetadataStorageModule] for class[io.druid.initialization.DruidModule]

2015-07-14T13:55:26,115 INFO [main] io.druid.initialization.Initialization - Loading extension[io.druid.extensions:druid-hdfs-storage] for class[io.druid.initialization.DruidModule]

2015-07-14T13:55:26,119 INFO [main] io.druid.initialization.Initialization - Adding extension module[class io.druid.storage.hdfs.HdfsStorageDruidModule] for class[io.druid.initialization.DruidModule]

2015-07-14T13:55:28,129 INFO [main] io.druid.guice.JsonConfigurator - Loaded class[class io.druid.server.initialization.HttpEmitterConfig] from props[druid.emitter.http.] as [io.druid.server.initialization.HttpEmitterConfig@2e53b094]

Exception in thread "main" com.google.inject.CreationException: Guice creation errors:

Error injecting method, java.lang.OutOfMemoryError: unable to create new native thread
    at io.druid.server.initialization.EmitterModule$EmitterProvider.inject(EmitterModule.java:101)
    at io.druid.server.initialization.EmitterModule.configure(EmitterModule.java:72)

1 error
    at com.google.inject.internal.Errors.throwCreationExceptionIfErrorsExist(Errors.java:448)
    at com.google.inject.internal.InternalInjectorCreator.injectDynamically(InternalInjectorCreator.java:176)
    at com.google.inject.internal.InternalInjectorCreator.build(InternalInjectorCreator.java:110)
    at com.google.inject.Guice.createInjector(Guice.java:96)
    at com.google.inject.Guice.createInjector(Guice.java:73)
    at com.google.inject.Guice.createInjector(Guice.java:62)
    at io.druid.initialization.Initialization.makeInjectorWithModules(Initialization.java:369)
    at io.druid.cli.GuiceRunnable.makeInjector(GuiceRunnable.java:55)
    at io.druid.cli.CliPeon.run(CliPeon.java:207)
    at io.druid.cli.Main.main(Main.java:88)
Caused by: java.lang.OutOfMemoryError: unable to create new native thread
    at java.lang.Thread.start0(Native Method)
    at java.lang.Thread.start(Thread.java:714)
    at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:950)
    at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1368)
    at org.jboss.netty.util.internal.DeadLockProofWorker.start(DeadLockProofWorker.java:38)
    at org.jboss.netty.channel.socket.nio.AbstractNioSelector.openSelector(AbstractNioSelector.java:368)
    at org.jboss.netty.channel.socket.nio.AbstractNioSelector.<init>(AbstractNioSelector.java:100)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.<init>(AbstractNioWorker.java:52)
    at org.jboss.netty.channel.socket.nio.NioWorker.<init>(NioWorker.java:45)
    at org.jboss.netty.channel.socket.nio.NioWorkerPool.createWorker(NioWorkerPool.java:45)
    at org.jboss.netty.channel.socket.nio.NioWorkerPool.createWorker(NioWorkerPool.java:28)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorkerPool.newWorker(AbstractNioWorkerPool.java:143)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorkerPool.init(AbstractNioWorkerPool.java:81)
    at org.jboss.netty.channel.socket.nio.NioWorkerPool.<init>(NioWorkerPool.java:39)
    at com.metamx.http.client.HttpClientInit.createBootstrap(HttpClientInit.java:134)
    at com.metamx.http.client.HttpClientInit.createClient(HttpClientInit.java:89)
    at io.druid.server.initialization.HttpEmitterModule.getEmitter(HttpEmitterModule.java:79)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at com.google.inject.internal.ProviderMethod.get(ProviderMethod.java:105)
    at com.google.inject.internal.ProviderInternalFactory.provision(ProviderInternalFactory.java:86)
    at com.google.inject.internal.InternalFactoryToInitializableAdapter.provision(InternalFactoryToInitializableAdapter.java:55)
    at com.google.inject.internal.ProviderInternalFactory.circularGet(ProviderInternalFactory.java:66)
    at com.google.inject.internal.InternalFactoryToInitializableAdapter.get(InternalFactoryToInitializableAdapter.java:47)
    at com.google.inject.internal.ProviderToInternalFactoryAdapter$1.get(LifecycleScope.java:49)
    at com.google.inject.internal.InternalFactoryToProviderAdapter.get(InternalFactoryToProviderAdapter.java:41)
    at com.google.inject.internal.InjectorImpl$1.call(InjectorImpl.java:1005)
    at com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1058)
    at io.druid.server.initialization.EmitterModule$EmitterProvider.findEmitter(EmitterModule.java:125)
    at io.druid.server.initialization.EmitterModule$EmitterProvider$$c521ec5e.invoke()
    at com.google.inject.internal.cglib.reflect.$1.invoke(SingleMethodInjector.java:57)
    at com.google.inject.internal.SingleMethodInjector.inject(SingleMethodInjector.java:91)
    at com.google.inject.internal.MembersInjectorImpl.injectMembers(MembersInjectorImpl.java:132)
    at com.google.inject.internal.MembersInjectorImpl$1.call(MembersInjectorImpl.java:80)
    at com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1051)
    at com.google.inject.internal.MembersInjectorImpl.injectAndNotify(MembersInjectorImpl.java:80)
    at com.google.inject.internal.Initializer$InjectableReference.get(Initializer.java:174)
    at com.google.inject.internal.Initializer.injectAll(Initializer.java:108)
    at com.google.inject.internal.InternalInjectorCreator.injectDynamically(InternalInjectorCreator.java:174)
    ... 8 more

Heap
 garbage-first heap   total 2097152K, used 82228K [0x00000005c0000000, 0x00000005c0202000, 0x00000007c0000000)
  region size 2048K, 41 young (83968K), 6 survivors (12288K)
 Metaspace       used 26540K, capacity 26950K, committed 27008K, reserved 1073152K
  class space    used 3298K, capacity 3414K, committed 3456K, reserved 1048576K
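"unable to create new native thread" usually means the OS refused to start another thread, most often because the per-user process/thread limit (`ulimit -u`) is exhausted rather than the Java heap. A quick Linux-specific diagnostic sketch (this assumes `/proc` and `RLIMIT_NPROC` semantics; adapt for other OSes):

```python
import os
import resource

# Per-user limit on processes/threads -- what `ulimit -u` reports.
soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
print(f"max user processes/threads: soft={soft} hard={hard}")

def thread_count(pid: int) -> int:
    """Read the 'Threads:' field from /proc/<pid>/status."""
    try:
        with open(f"/proc/{pid}/status") as f:
            for line in f:
                if line.startswith("Threads:"):
                    return int(line.split()[1])
    except OSError:
        pass  # process exited between listing and reading
    return 0

# Total live threads on the host; compare this against the soft limit.
total = sum(thread_count(int(p)) for p in os.listdir("/proc") if p.isdigit())
print(f"threads currently running on this host: {total}")
```

With `druid.worker.capacity=15` and each peon configured for 10 processing threads plus 50 HTTP server threads (plus GC, Netty, and ZooKeeper threads per JVM), a low default limit (often 1024) is easy to exhaust; raising `nproc` in `/etc/security/limits.conf` for the Druid user is a common fix.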


I also find many 'Cannot shut down yet!' messages in the peon logs:


/data1/logs/druid % grep 'Cannot shut down yet!' peon_indexing.log

2015-07-14T13:06:00,075 INFO [task-runner-0] io.druid.segment.realtime.plumber.RealtimePlumber - Cannot shut down yet! Sinks remaining: datasource2_2015-07-14T12:00:00.000Z_2015-07-14T13:00:00.000Z_2015-07-14T12:14:34.966Z_2

2015-07-14T13:06:00,282 INFO [task-runner-0] io.druid.segment.realtime.plumber.RealtimePlumber - Cannot shut down yet! Sinks remaining: datasource2_2015-07-14T12:00:00.000Z_2015-07-14T13:00:00.000Z_2015-07-14T12:14:34.966Z

2015-07-14T13:06:00,283 INFO [task-runner-0] io.druid.segment.realtime.plumber.RealtimePlumber - Cannot shut down yet! Sinks remaining: datasource2_2015-07-14T12:00:00.000Z_2015-07-14T13:00:00.000Z_2015-07-14T12:14:34.966Z_3

2015-07-14T13:06:00,290 INFO [task-runner-0] io.druid.segment.realtime.plumber.RealtimePlumber - Cannot shut down yet! Sinks remaining: datasource2_2015-07-14T12:00:00.000Z_2015-07-14T13:00:00.000Z_2015-07-14T12:14:34.966Z_1

2015-07-14T13:06:00,311 INFO [task-runner-0] io.druid.segment.realtime.plumber.RealtimePlumber - Cannot shut down yet! Sinks remaining: datasource2_2015-07-14T12:00:00.000Z_2015-07-14T13:00:00.000Z_2015-07-14T12:14:34.966Z_4

2015-07-14T14:06:00,036 INFO [task-runner-0] io.druid.segment.realtime.plumber.RealtimePlumber - Cannot shut down yet! Sinks remaining: datasource2_2015-07-14T13:00:00.000Z_2015-07-14T14:00:00.000Z_2015-07-14T13:00:08.657Z_4

2015-07-14T14:06:00,468 INFO [task-runner-0] io.druid.segment.realtime.plumber.RealtimePlumber - Cannot shut down yet! Sinks remaining: datasource2_2015-07-14T13:00:00.000Z_2015-07-14T14:00:00.000Z_2015-07-14T13:00:08.657Z_2

2015-07-14T14:06:00,475 INFO [task-runner-0] io.druid.segment.realtime.plumber.RealtimePlumber - Cannot shut down yet! Sinks remaining: datasource2_2015-07-14T13:00:00.000Z_2015-07-14T14:00:00.000Z_2015-07-14T13:00:08.657Z_1

2015-07-14T14:06:00,495 INFO [task-runner-0] io.druid.segment.realtime.plumber.RealtimePlumber - Cannot shut down yet! Sinks remaining: datasource2_2015-07-14T13:00:00.000Z_2015-07-14T14:00:00.000Z_2015-07-14T13:00:08.657Z

2015-07-14T14:06:00,500 INFO [task-runner-0] io.druid.segment.realtime.plumber.RealtimePlumber - Cannot shut down yet! Sinks remaining: datasource2_2015-07-14T13:00:00.000Z_2015-07-14T14:00:00.000Z_2015-07-14T13:00:08.657Z_3

2015-07-14T19:06:00,141 INFO [task-runner-0] io.druid.segment.realtime.plumber.RealtimePlumber - Cannot shut down yet! Sinks remaining: datasource2_2015-07-14T18:00:00.000Z_2015-07-14T19:00:00.000Z_2015-07-14T18:00:08.748Z_2

2015-07-14T19:06:00,147 INFO [task-runner-0] io.druid.segment.realtime.plumber.RealtimePlumber - Cannot shut down yet! Sinks remaining: datasource2_2015-07-14T18:00:00.000Z_2015-07-14T19:00:00.000Z_2015-07-14T18:00:08.748Z_4

2015-07-14T19:06:00,148 INFO [task-runner-0] io.druid.segment.realtime.plumber.RealtimePlumber - Cannot shut down yet! Sinks remaining: datasource2_2015-07-14T18:00:00.000Z_2015-07-14T19:00:00.000Z_2015-07-14T18:00:08.748Z_1

2015-07-14T19:06:00,150 INFO [task-runner-0] io.druid.segment.realtime.plumber.RealtimePlumber - Cannot shut down yet! Sinks remaining: datasource2_2015-07-14T18:00:00.000Z_2015-07-14T19:00:00.000Z_2015-07-14T18:00:08.748Z_3

2015-07-14T19:06:00,150 INFO [task-runner-0] io.druid.segment.realtime.plumber.RealtimePlumber - Cannot shut down yet! Sinks remaining: datasource2_2015-07-14T18:00:00.000Z_2015-07-14T19:00:00.000Z_2015-07-14T18:00:08.748Z

2015-07-14T21:06:00,343 INFO [task-runner-0] io.druid.segment.realtime.plumber.RealtimePlumber - Cannot shut down yet! Sinks remaining: datasource2_2015-07-14T20:00:00.000Z_2015-07-14T21:00:00.000Z_2015-07-14T20:00:08.808Z_3

2015-07-14T23:06:00,169 INFO [task-runner-0] io.druid.segment.realtime.plumber.RealtimePlumber - Cannot shut down yet! Sinks remaining: datasource2_2015-07-14T22:00:00.000Z_2015-07-14T23:00:00.000Z_2015-07-14T22:00:09.014Z_1

2015-07-15T00:06:00,097 INFO [task-runner-0] io.druid.segment.realtime.plumber.RealtimePlumber - Cannot shut down yet! Sinks remaining: datasource2_2015-07-14T23:00:00.000Z_2015-07-15T00:00:00.000Z_2015-07-14T23:00:08.898Z_4

2015-07-15T01:06:00,314 INFO [task-runner-0] io.druid.segment.realtime.plumber.RealtimePlumber - Cannot shut down yet! Sinks remaining: datasource2_2015-07-15T00:00:00.000Z_2015-07-15T01:00:00.000Z_2015-07-15T00:00:08.880Z_3

2015-07-15T01:06:00,322 INFO [task-runner-0] io.druid.segment.realtime.plumber.RealtimePlumber - Cannot shut down yet! Sinks remaining: datasource2_2015-07-15T00:00:00.000Z_2015-07-15T01:00:00.000Z_2015-07-15T00:00:08.880Z_1

2015-07-15T01:06:00,324 INFO [task-runner-0] io.druid.segment.realtime.plumber.RealtimePlumber - Cannot shut down yet! Sinks remaining: datasource2_2015-07-15T00:00:00.000Z_2015-07-15T01:00:00.000Z_2015-07-15T00:00:08.880Z_4

2015-07-15T01:06:00,339 INFO [task-runner-0] io.druid.segment.realtime.plumber.RealtimePlumber - Cannot shut down yet! Sinks remaining: datasource2_2015-07-15T00:00:00.000Z_2015-07-15T01:00:00.000Z_2015-07-15T00:00:08.880Z

2015-07-15T01:06:00,341 INFO [task-runner-0] io.druid.segment.realtime.plumber.RealtimePlumber - Cannot shut down yet! Sinks remaining: datasource2_2015-07-15T00:00:00.000Z_2015-07-15T01:00:00.000Z_2015-07-15T00:00:08.880Z_2

2015-07-15T04:06:00,126 INFO [task-runner-0] io.druid.segment.realtime.plumber.RealtimePlumber - Cannot shut down yet! Sinks remaining: datasource2_2015-07-15T03:00:00.000Z_2015-07-15T04:00:00.000Z_2015-07-15T03:00:09.029Z_3



Other log files are attached to this post.

Any help would be really appreciated.


Regards,

Weiwen


Screenshot - 07152015 - 04:21:53 PM.png
druid.log.tar.bz2
coordinator_20150714201311.log.tar.bz2
historical_20150714201311.log.tar.bz2
middleManager_20150714201311.log.tar.bz2
overlord_20150714201311.log.tar.bz2
peon_indexing.log.tar.bz2

Weiwen.Zhang

Jul 15, 2015, 5:50:34 AM
to druid...@googlegroups.com
Update: my Druid version is 0.7.3.

Gian Merlino

Jul 15, 2015, 12:05:15 PM
to druid...@googlegroups.com
Hi Weiwen, it looks like your peon_indexing.log.tar.bz2 actually mixes together the logs from a bunch of different tasks. Is it possible for you to split that out for one of the slow tasks? The "log (all)" link next to a task on the overlord console should get you that.

A couple of things that stand out just from what you have:

- The one segment that your tasks managed to insert was really large (task=index_realtime_datasource2_2015-07-14T20:00:00.000+08:00_3_0 published one with size= 2881622658). Generally it's best to have segments a third to half that size. So, you may need either more parallel tasks or a shorter segmentGranularity (you could try FIFTEEN_MINUTE with tranquility 0.5.0).

- It looks like some of your tasks are OOMEing on startup by running out of threads (java.lang.OutOfMemoryError: unable to create new native thread). This *seems* unrelated to the realtime task slowness, but it's hard to tell because the logs are jumbled up.
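The segment-sizing advice above works out to a small calculation (the 2,881,622,658-byte figure is the published segment size quoted above; "a third to half that size" gives the target band):

```python
import math

observed = 2_881_622_658      # bytes: the published segment size quoted above
target_lo = observed / 3      # "a third ..."
target_hi = observed / 2      # "... to half that size"

# How many ways the same hour of data would need to be split
# so each resulting segment lands inside the target band:
parts_min = math.ceil(observed / target_hi)
parts_max = math.ceil(observed / target_lo)
print(f"target segment size: {target_lo/1e6:,.0f}-{target_hi/1e6:,.0f} MB")
print(f"i.e. {parts_min}-{parts_max} partitions per current shard")
```

So either 2-3x more parallel tasks per hour, or a shorter segmentGranularity; the FIFTEEN_MINUTE suggestion divides each hour's data by 4, which lands in roughly the same neighborhood.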