restoreOnRestart doesn't work


Noppanit Charassinvichai

Jul 6, 2016, 4:09:45 PM
to Druid User
We're using Druid 0.9.0 with restoreTasksOnRestart turned on, but after we restart the middle manager the tasks show as FAILED on the coordinator and are not restored.

We use systemctl to manage the service.

druid.host=<IPADDRESS>
druid.port=8080
druid.service=druid/middlemanager

# Task Logging
druid.indexer.logs.type=file

# MiddleManager Service
druid.indexer.runner.allowedPrefixes=["com.metamx","druid","io.druid","user.timezone","file.encoding"]
druid.indexer.runner.compressZnodes=true
druid.indexer.runner.javaCommand=java
druid.indexer.runner.javaOpts=-server -Xmx4G -XX:+UseG1GC -XX:MaxGCPauseMillis=100 -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
druid.indexer.runner.maxZnodeBytes=524288
druid.indexer.runner.startPort=8100
druid.worker.ip=localhost
druid.worker.version=0

# Peon Configs
druid.indexer.fork.property.druid.monitoring.monitors=["com.metamx.metrics.JvmMonitor"]
druid.indexer.fork.property.druid.segmentCache.locations=[{"path": "/mnt/persistent/zk_druid", "maxSize": 0}]
druid.indexer.fork.property.druid.processing.numThreads=7
druid.indexer.fork.property.druid.server.http.numThreads=50
druid.indexer.fork.property.druid.storage.archiveBaseKey=ci-druid-archive
druid.indexer.fork.property.druid.storage.archiveBucket=cn-dev
druid.indexer.fork.property.druid.storage.baseKey=ci/druid
druid.indexer.fork.property.druid.storage.bucket=cn-dev
druid.indexer.fork.property.druid.storage.type=s3
druid.indexer.fork.property.druid.indexer.task.restoreTasksOnRestart=true
druid.peon.mode=remote
druid.indexer.task.baseDir=/tmp
druid.indexer.task.baseTaskDir=/tmp/persistent/tasks
druid.indexer.task.hadoopWorkingPath=/tmp/druid-indexing
druid.indexer.task.defaultRowFlushBoundary=50000
druid.indexer.task.defaultHadoopCoordinates=["org.apache.hadoop:hadoop-client:2.3.0"]
druid.indexer.task.chathandler.type=announce

# Remote Peon Configs
druid.peon.taskActionClient.retry.minWait=PT1M
druid.peon.taskActionClient.retry.maxWait=PT10M
druid.peon.taskActionClient.retry.maxRetryCount=10

Noppanit Charassinvichai

Jul 6, 2016, 4:13:14 PM
to Druid User
The task is a realtime task. It comes from Tranquility.

David Lim

Jul 6, 2016, 6:26:22 PM
to Druid User
Try setting druid.indexer.task.restoreTasksOnRestart=true instead of druid.indexer.fork.property.druid.indexer.task.restoreTasksOnRestart=true.

Noppanit Charassinvichai

Jul 7, 2016, 10:57:44 AM
to Druid User
However, the log suggests that the task did shut down gracefully:

2016-07-07T14:40:30,502 INFO [sparrow-firehose-web-incremental-persist] io.druid.segment.ReferenceCountingSegment - Closing sparrow-firehose-web_2016-07-07T14:00:00.000Z_2016-07-07T15:00:00.000Z_2016-07-07T14:00:00.367Z
2016-07-07T14:40:30,502 INFO [sparrow-firehose-web-incremental-persist] io.druid.segment.ReferenceCountingSegment - Closing sparrow-firehose-web_2016-07-07T14:00:00.000Z_2016-07-07T15:00:00.000Z_2016-07-07T14:00:00.367Z, numReferences: 0
2016-07-07T14:40:30,502 INFO [task-runner-0-priority-0] io.druid.indexing.common.task.RealtimeIndexTask - Gracefully stopping.
2016-07-07T14:40:30,502 INFO [task-runner-0-priority-0] io.druid.indexing.common.task.RealtimeIndexTask - Job done!
2016-07-07T14:40:30,503 INFO [Thread-54] io.druid.indexing.overlord.ThreadPoolTaskRunner - Graceful shutdown of task[index_realtime_sparrow-firehose-web_2016-07-07T14:00:00.000Z_0_1] finished in 818ms with status[SUCCESS].
2016-07-07T14:40:30,506 INFO [Thread-54] LoggingEmitter - Event [{"feed":"metrics","timestamp":"2016-07-07T14:40:30.504Z","service":"druid/middlemanager","host":"10.91.39.204:8100","metric":"task/interrupt/count","value":1,"dataSource":"sparrow-firehose-web","error":"false","graceful":"true","task":"index_realtime_sparrow-firehose-web_2016-07-07T14:00:00.000Z_0_1"}]
2016-07-07T14:40:30,506 INFO [Thread-54] LoggingEmitter - Event [{"feed":"metrics","timestamp":"2016-07-07T14:40:30.506Z","service":"druid/middlemanager","host":"10.91.39.204:8100","metric":"task/interrupt/elapsed","value":819,"dataSource":"sparrow-firehose-web","error":"false","graceful":"true","task":"index_realtime_sparrow-firehose-web_2016-07-07T14:00:00.000Z_0_1"}]
2016-07-07T14:40:30,506 INFO [Thread-54] com.metamx.common.lifecycle.Lifecycle$AnnotationBasedHandler - Invoking stop method[public void io.druid.curator.discovery.ServerDiscoverySelector.stop() throws java.io.IOException] on object[io.druid.curator.discovery.ServerDiscoverySelector@4c577186].
2016-07-07T14:40:30,508 INFO [task-runner-0-priority-0] io.druid.indexing.worker.executor.ExecutorLifecycle - Task completed with status: {
  "id" : "index_realtime_sparrow-firehose-web_2016-07-07T14:00:00.000Z_0_1",
  "status" : "SUCCESS",
  "duration" : 3024414
}


However, when I look at the coordinator it shows that the task FAILED.

Noppanit Charassinvichai

Jul 7, 2016, 10:58:20 AM
to Druid User
Should I ignore what the coordinator says?

Gian Merlino

Jul 7, 2016, 11:03:50 AM
to druid...@googlegroups.com
Hey Noppanit,

Did you set druid.indexer.task.restoreTasksOnRestart=true? Without that, the task executor will stop gracefully, but the middleManager watching it won't be expecting that and will mark it failed anyway.

Gian

Noppanit Charassinvichai

Jul 7, 2016, 11:19:53 AM
to Druid User
Hi Gian,

Yes, I set that on the middle manager:

druid.indexer.fork.property.druid.indexer.task.restoreTasksOnRestart=true

Gian Merlino

Jul 7, 2016, 11:23:55 AM
to druid...@googlegroups.com
druid.indexer.task.restoreTasksOnRestart, *not* druid.indexer.fork.property.druid.indexer.task.restoreTasksOnRestart.
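
For reference, a minimal sketch of the difference in the middleManager's runtime.properties. It assumes that the druid.indexer.fork.property. prefix only forwards a setting to the forked peon JVMs and is not read by the middleManager itself, which would explain why the task stops gracefully but is still marked FAILED:

```properties
# Read by the middleManager itself: it expects the graceful stop
# on shutdown and attempts to restore the tasks on restart.
druid.indexer.task.restoreTasksOnRestart=true

# Forwarded to the peons only (assumption, see above). On its own this
# does not enable restore, so it is shown commented out:
# druid.indexer.fork.property.druid.indexer.task.restoreTasksOnRestart=true
```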

Gian

Noppanit Charassinvichai

Jul 7, 2016, 2:42:43 PM
to Druid User
Thanks, both Gian and David. I was a bit confused by the documentation. It works now after setting druid.indexer.task.restoreTasksOnRestart.