Indexing task not shutting down on time


rohit kochar

Aug 11, 2015, 2:40:27 AM
to Druid User
In my Druid setup there are 2 hosts running middle managers, each with a capacity of 4, and the overlord runs on one of these boxes.
On one of these middle managers I am observing that the indexing tasks never get shut down, even 10 hours past their shutdown time.

From the task logs I can observe that the task started the "merge and persist" routine at 14:31 (where 14:30 was the task shutdown time):

2015-08-10T14:31:04,947 INFO [mergedTest1-2015-08-10T13:30:00.000Z-persist-n-merge] io.druid.indexing.common.actions.RemoteTaskActionClient - Performing action for task[index_realtime_mergedTest1_2015-08-10T13:30:00.000Z_0_0]: SegmentInsertAction.....

After that, the rest of the log lines look like these:
2015-08-10T14:32:00,004 INFO [mergedTest1-overseer-0] io.druid.segment.realtime.plumber.RealtimePlumber - Starting merge and push.
2015-08-10T14:32:00,005 INFO [mergedTest1-overseer-0] io.druid.segment.realtime.plumber.RealtimePlumber - Found [1] sinks. minTimestamp [1970-01-01T00:00:00.000Z]
2015-08-10T14:32:00,005 WARN [mergedTest1-overseer-0] io.druid.segment.realtime.plumber.RealtimePlumber - [2015-08-10T13:30:00.000Z] < [1970-01-01T00:00:00.000Z] Skipping persist and merge.
2015-08-10T14:32:00,005 INFO [mergedTest1-overseer-0] io.druid.segment.realtime.plumber.RealtimePlumber - Found [0] sinks to persist and merge
2015-08-10T14:33:00,005 INFO [mergedTest1-overseer-0] io.druid.segment.realtime.plumber.RealtimePlumber - Starting merge and push.
2015-08-10T14:33:00,005 INFO [mergedTest1-overseer-0] io.druid.segment.realtime.plumber.RealtimePlumber - Found [1] sinks. minTimestamp [1970-01-01T00:00:00.000Z] 
2015-08-10T14:33:00,005 WARN [mergedTest1-overseer-0] io.druid.segment.realtime.plumber.RealtimePlumber - [2015-08-10T13:30:00.000Z] < [1970-01-01T00:00:00.000Z] Skipping persist and merge. 

With an occasional log line in between mentioning the disappearance of the server:
2015-08-10T14:46:30,204 INFO [ServerInventoryView-0] io.druid.client.BatchServerInventoryView - Server Disappeared[DruidServerMetadata{name='dp2013.XXX.XXX.XXX.XXX:8100', host='dp2013.XXX.XXX.XXX.XXX:8100', maxSize=0, tier='_default_tier', type='realtime', priority='0'}]
2015-08-10T14:46:37,112 INFO [ServerInventoryView-0] io.druid.curator.inventory.CuratorInventoryManager - Created new InventoryCacheListener for /druid/segments/dp2013.XXX.XXX.XXX.XXX:8100
2015-08-10T14:46:37,112 INFO [ServerInventoryView-0] io.druid.client.BatchServerInventoryView - New Server[DruidServerMetadata{name='dp2013.XXX.XXX.XXX.XXX:8100', host='dp2013.XXX.XXX.XXX.XXX:8100', maxSize=0, tier='_default_tier', type='realtime', priority='0'}]

What could be wrong here??

PS: I am using Druid 0.7.3.

Thanks
Rohit
Nishant Bangarwa

Aug 11, 2015, 2:57:46 AM
to druid...@googlegroups.com
Hi Rohit, 

After creating a segment and adding it to the metadata storage, the realtime task waits for the segment to be handed off to a historical node before it shuts down.
Do you have the coordinator and historical nodes running, and do the historical nodes have free capacity to load segments?
Are there any exceptions in the coordinator/historical logs related to segment loading?
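
For reference, the "free capacity" of a historical node is governed by its size settings; a minimal sketch of the relevant historical runtime.properties (the values below are placeholders, not taken from this thread):

    druid.server.maxSize=10000000000
    druid.segmentCache.locations=[{"path": "/tmp/druid/indexCache", "maxSize": 10000000000}]

If maxSize is too small or already used up, the coordinator cannot assign the new segment to the historical and the realtime task keeps waiting for handoff.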

rohit kochar

Aug 11, 2015, 4:14:01 AM
to Druid User
Hello Nishant,
I couldn't find any relevant error/exception in the coordinator logs, but I could find these exceptions in the logs of the historical node:

2015-08-10T14:34:25,152 ERROR [ZkCoordinator-0] io.druid.server.coordination.ZkCoordinator - Failed to load segment for dataSource: {class=io.druid.server.coordination.ZkCoordinator, exceptionType=class io.druid.segment.loading.SegmentLoadingException, exceptionMessage=Exception loading segment[mergedTest1_2015-08-10T13:30:00.000Z_2015-08-10T13:31:00.000Z_2015-08-10T13:30:14.148Z], segment=DataSegment{size=1414033, shardSpec=LinearShardSpec{partitionNum=0}, metrics=XXX, version='2015-08-10T13:30:14.148Z', loadSpec={type=local, path=/tmp/druid/localStorage/mergedTest1/2015-08-10T13:30:00.000Z_2015-08-10T13:31:00.000Z/2015-08-10T13:30:14.148Z/0/index.zip}, interval=2015-08-10T13:30:00.000Z/2015-08-10T13:31:00.000Z, dataSource='mergedTest1', binaryVersion='9'}}
io.druid.segment.loading.SegmentLoadingException: Exception loading segment[mergedTest1_2015-08-10T13:30:00.000Z_2015-08-10T13:31:00.000Z_2015-08-10T13:30:14.148Z]
        at io.druid.server.coordination.ZkCoordinator.loadSegment(ZkCoordinator.java:138) ~[druid-server-0.7.3.jar:0.7.3]
        at io.druid.server.coordination.ZkCoordinator.addSegment(ZkCoordinator.java:163) [druid-server-0.7.3.jar:0.7.3]
        at io.druid.server.coordination.SegmentChangeRequestLoad.go(SegmentChangeRequestLoad.java:42) [druid-server-0.7.3.jar:0.7.3]
        at io.druid.server.coordination.BaseZkCoordinator$1.childEvent(BaseZkCoordinator.java:125) [druid-server-0.7.3.jar:0.7.3]
        at org.apache.curator.framework.recipes.cache.PathChildrenCache$5.apply(PathChildrenCache.java:516) [curator-recipes-2.7.0.jar:?]
        at org.apache.curator.framework.recipes.cache.PathChildrenCache$5.apply(PathChildrenCache.java:510) [curator-recipes-2.7.0.jar:?]
        at org.apache.curator.framework.listen.ListenerContainer$1.run(ListenerContainer.java:92) [curator-framework-2.7.0.jar:?]
        at com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297) [guava-16.0.1.jar:?]
        at org.apache.curator.framework.listen.ListenerContainer.forEach(ListenerContainer.java:83) [curator-framework-2.7.0.jar:?]
        at org.apache.curator.framework.recipes.cache.PathChildrenCache.callListeners(PathChildrenCache.java:507) [curator-recipes-2.7.0.jar:?]
        at org.apache.curator.framework.recipes.cache.EventOperation.invoke(EventOperation.java:35) [curator-recipes-2.7.0.jar:?]
        at org.apache.curator.framework.recipes.cache.PathChildrenCache$9.run(PathChildrenCache.java:759) [curator-recipes-2.7.0.jar:?]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_40]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_40]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_40]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_40]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_40]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_40]
        at java.lang.Thread.run(Thread.java:745) [?:1.8.0_40]
Caused by: java.lang.IllegalArgumentException: Instantiation of [simple type, class io.druid.segment.loading.LocalLoadSpec] value failed: [/tmp/druid/localStorage/mergedTest1/2015-08-10T13:30:00.000Z_2015-08-10T13:31:00.000Z/2015-08-10T13:30:14.148Z/0/index.zip] does not exist
        at com.fasterxml.jackson.databind.ObjectMapper._convert(ObjectMapper.java:2774) ~[jackson-databind-2.4.4.jar:2.4.4]
        at com.fasterxml.jackson.databind.ObjectMapper.convertValue(ObjectMapper.java:2700) ~[jackson-databind-2.4.4.jar:2.4.4]
        at io.druid.segment.loading.SegmentLoaderLocalCacheManager.getSegmentFiles(SegmentLoaderLocalCacheManager.java:140) ~[druid-server-0.7.3.jar:0.7.3]
        at io.druid.segment.loading.SegmentLoaderLocalCacheManager.getSegment(SegmentLoaderLocalCacheManager.java:93) ~[druid-server-0.7.3.jar:0.7.3]
        at io.druid.server.coordination.ServerManager.loadSegment(ServerManager.java:150) ~[druid-server-0.7.3.jar:0.7.3]
        at io.druid.server.coordination.ZkCoordinator.loadSegment(ZkCoordinator.java:134) ~[druid-server-0.7.3.jar:0.7.3]
        ... 18 more
Caused by: com.fasterxml.jackson.databind.JsonMappingException: Instantiation of [simple type, class io.druid.segment.loading.LocalLoadSpec] value failed: [/tmp/druid/localStorage/mergedTest1/2015-08-10T13:30:00.000Z_2015-08-10T13:31:00.000Z/2015-08-10T13:30:14.148Z/0/index.zip] does not exist
        at com.fasterxml.jackson.databind.deser.std.StdValueInstantiator.wrapException(StdValueInstantiator.java:405) ~[jackson-databind-2.4.4.jar:2.4.4]
        at com.fasterxml.jackson.databind.deser.std.StdValueInstantiator.createFromObjectWith(StdValueInstantiator.java:234) ~[jackson-databind-2.4.4.jar:2.4.4]
        at com.fasterxml.jackson.databind.deser.impl.PropertyBasedCreator.build(PropertyBasedCreator.java:167) ~[jackson-databind-2.4.4.jar:2.4.4]
        at com.fasterxml.jackson.databind.deser.BeanDeserializer._deserializeUsingPropertyBased(BeanDeserializer.java:398) ~[jackson-databind-2.4.4.jar:2.4.4]
        at com.fasterxml.jackson.databind.deser.BeanDeserializerBase.deserializeFromObjectUsingNonDefault(BeanDeserializerBase.java:1064) ~[jackson-databind-2.4.4.jar:2.4.4]
        at com.fasterxml.jackson.databind.deser.BeanDeserializer.deserializeFromObject(BeanDeserializer.java:264) ~[jackson-databind-2.4.4.jar:2.4.4]
Caused by: java.lang.IllegalArgumentException: [/tmp/druid/localStorage/mergedTest1/2015-08-10T13:30:00.000Z_2015-08-10T13:31:00.000Z/2015-08-10T13:30:14.148Z/0/index.zip] does not exist
        at com.google.api.client.repackaged.com.google.common.base.Preconditions.checkArgument(Preconditions.java:119) ~[google-http-client-1.15.0-rc.jar:?]
        at com.google.api.client.util.Preconditions.checkArgument(Preconditions.java:69) ~[google-http-client-1.15.0-rc.jar:?]
        at io.druid.segment.loading.LocalLoadSpec.<init>(LocalLoadSpec.java:49) ~[druid-server-0.7.3.jar:0.7.3]
        at sun.reflect.GeneratedConstructorAccessor41.newInstance(Unknown Source) ~[?:?]
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:1.8.0_40]
        at java.lang.reflect.Constructor.newInstance(Constructor.java:422) ~[?:1.8.0_40]
        at com.fasterxml.jackson.databind.introspect.AnnotatedConstructor.call(AnnotatedConstructor.java:125) ~[jackson-databind-2.4.4.jar:0.7.3]
        at com.fasterxml.jackson.databind.deser.std.StdValueInstantiator.createFromObjectWith(StdValueInstantiator.java:230) ~[jackson-databind-2.4.4.jar:2.4.4]
        at com.fasterxml.jackson.databind.deser.impl.PropertyBasedCreator.build(PropertyBasedCreator.java:167) ~[jackson-databind-2.4.4.jar:2.4.4]

I am not sure whether the indexing task's erratic behaviour is causing these exceptions on the historical node, or vice versa.

Thanks
Rohit

Nishant Bangarwa

Aug 11, 2015, 5:35:07 AM
to druid...@googlegroups.com
Ah, it seems you are using local disk as deep storage while the middle manager and historical nodes are on different machines, which is causing this.
FWIW, when running on more than one host, the deep storage needs to be shared across Druid nodes.
You can find more info on supported deep storage options and how to configure them here -
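
For illustration, a shared deep storage such as HDFS is configured in the runtime.properties of every node roughly along these lines (the extension coordinate and the HDFS URI below are placeholders for whatever matches your environment):

    druid.extensions.coordinates=["io.druid.extensions:druid-hdfs-storage:0.7.3"]
    druid.storage.type=hdfs
    druid.storage.storageDirectory=hdfs://namenode.example.com:9000/druid/segments

A network mount (e.g. NFS) shared by all nodes with druid.storage.type=local also works, as long as every node sees the same path.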



rohit kochar

Aug 11, 2015, 6:02:05 AM
to Druid User
Nishant,
Yes, I am using local disk as deep storage.
So my current setup is like this:

Node 1 : Coordinator, Historical Node, Middle Manager, Broker
Node 2 : Middle Manager

The issue I reported was seen on the tasks running on Node 1, which is co-located with the historical node and the deep storage, whereas the tasks running on Node 2 seem to run fine. Hence I am trying to understand how things would have worked for Node 2 when it doesn't have access to the deep storage.
I was under the assumption that peons communicate with the historical nodes via RPC and hence need not have shared storage with them.

Thanks for your help :)

Nishant Bangarwa

Aug 11, 2015, 7:59:33 AM
to druid...@googlegroups.com
Hi Rohit, 
See inline.

On Tue, Aug 11, 2015 at 3:32 PM, rohit kochar <mnit....@gmail.com> wrote:
> Nishant,
> Yes, I am using local disk as deep storage.
> So my current setup is like this:
>
> Node 1 : Coordinator, Historical Node, Middle Manager, Broker
> Node 2 : Middle Manager
>
> The issue I reported was seen on the tasks running on Node 1, which is co-located with the historical node and the deep storage, whereas the tasks running on Node 2 seem to run fine. Hence I am trying to understand how things would have worked for Node 2 when it doesn't have access to the deep storage.

It should have been vice versa: the middle manager on Node 1 should work fine, but the one located on Node 2 should have issues during handoff.

> I was under the assumption that peons communicate with the historical nodes via RPC and hence need not have shared storage with them.

That's not the case. Druid needs a shared deep storage, which acts as the permanent store of the data; nodes load and unload data from deep storage.
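
Concretely, the failing load spec in the historical log (type=local, path=/tmp/druid/localStorage/...) suggests the deep storage config is something like the sketch below, which only works when every node reads and writes the same filesystem:

    druid.storage.type=local
    druid.storage.storageDirectory=/tmp/druid/localStorage

With that setting, each peon pushes its index.zip to /tmp/druid/localStorage on its own machine, and the historical on Node 1 can only load segment files that exist on Node 1's local disk; any segment whose files ended up elsewhere will never hand off, and its task will keep waiting.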
