Segments created by hadoop indexer not being served by historical


ravi teja

Dec 11, 2014, 2:49:33 AM12/11/14
to druid-de...@googlegroups.com
Hi,

I have a realtime pipeline set up for Druid using Tranquility.
I am evaluating a lambda architecture on Druid.

Hence I am creating an update pipeline where the existing messages can be updated via batch ingestion.

For this I am using the Hadoop indexer for the batch pipeline.

The Hadoop index task succeeds and the segments are created; I can see them in the prod_segments table in MySQL with the used flag set to 1.

However, I am not able to query the new data, nor are those segments being served by the historical nodes, as checked in the coordinator console.

I don't see any errors in the coordinator or historical nodes.

Can you help me in this regard?


Thanks in advance,
Ravi Teja

Gian Merlino

Dec 11, 2014, 12:02:13 PM12/11/14
to druid-de...@googlegroups.com
Do you see your coordinator periodically polling the database and finding the right number of segments? The log message should say "Polled and found %,d segments in the database", and the logged count should be the number of rows in prod_segments that have used set to 1.

Also, can you share some of the rows from prod_segments that are not being loaded, and some of the rows that are being loaded?

ravi teja

Dec 11, 2014, 11:54:19 PM12/11/14
to druid-de...@googlegroups.com
Hi Gian,

Firstly, thanks for the reply.

These are the segment counts seen at each level:

mysql: 485
coordinator log: 485
coordinator console: 481

So I see that there are 4 segments not being served by the historicals, and I can't see any errors in the coordinator logs either.


Mysql:

mysql> select count(*) from prod_segments where used=1;
+----------+
| count(*) |
+----------+
|      485 |
+----------+
1 row in set (0.00 sec)



Coordinator log:

2014-12-12 04:46:39,841 INFO [DatabaseSegmentManager-Exec--0] io.druid.db.DatabaseSegmentManager - Polled and found 485 segments in the database


Coordinator console:

962 entries; with a replication factor of 2, that is 481 distinct segments.
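As a quick sanity check, the console count lines up with the database count except for the four missing segments (a throwaway sketch using the numbers above):

```python
# Each served segment appears once per replica in the coordinator console.
console_entries = 962
replication_factor = 2
served_segments = console_entries // replication_factor  # 481 distinct segments

db_segments = 485  # prod_segments rows with used = 1
missing = db_segments - served_segments
print(served_segments, missing)  # 481 distinct segments served, 4 unaccounted for
```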


Thanks,
Ravi

--
You received this message because you are subscribed to a topic in the Google Groups "Druid Development" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/druid-development/RymYllsIo8k/unsubscribe.
To unsubscribe from this group and all its topics, send an email to druid-developm...@googlegroups.com.
To post to this group, send email to druid-de...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/druid-development/b9b98d54-0fa8-4319-859e-48bd8421c25c%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

ravi teja

Dec 12, 2014, 12:11:15 AM12/12/14
to druid-de...@googlegroups.com
I see that the segment load fails at the historical with this error, even though the index.zip is present at that location:

2014-12-11 07:19:54,188 ERROR [ZkCoordinator-0] io.druid.server.coordination.ZkCoordinator - Failed to load segment for dataSource: {class=io.druid.server.coordination.ZkCoordinator, exceptionType=class io.druid.segment.loading.SegmentLoadingException, exceptionMessage=Exception loading segment[batch_test3_2014-12-10T09:13:00.000Z_2014-12-10T09:14:00.000Z_2014-12-11T07:17:55.478Z], segment=DataSegment{size=643, shardSpec=NoneShardSpec, metrics=[count], dimensions=[source_client, offerid, parentid, schemaversion, icmpid, eventid], version='2014-12-11T07:17:55.478Z', loadSpec={type=hdfs, path=/grid/lambda/druid/batch_test3/batch_test3/20141210T091300.000Z_20141210T091400.000Z/2014-12-11T07_17_55.478Z/0/index.zip}, interval=2014-12-10T09:13:00.000Z/2014-12-10T09:14:00.000Z, dataSource='batch_test3', binaryVersion='9'}}

io.druid.segment.loading.SegmentLoadingException: Exception loading segment[batch_test3_2014-12-10T09:13:00.000Z_2014-12-10T09:14:00.000Z_2014-12-11T07:17:55.478Z]
        at io.druid.server.coordination.ZkCoordinator.addSegment(ZkCoordinator.java:136)
        at io.druid.server.coordination.SegmentChangeRequestLoad.go(SegmentChangeRequestLoad.java:44)
        at io.druid.server.coordination.BaseZkCoordinator$1.childEvent(BaseZkCoordinator.java:113)
        at org.apache.curator.framework.recipes.cache.PathChildrenCache$5.apply(PathChildrenCache.java:509)
        at org.apache.curator.framework.recipes.cache.PathChildrenCache$5.apply(PathChildrenCache.java:503)
        at org.apache.curator.framework.listen.ListenerContainer$1.run(ListenerContainer.java:92)
        at com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297)
        at org.apache.curator.framework.listen.ListenerContainer.forEach(ListenerContainer.java:83)
        at org.apache.curator.framework.recipes.cache.PathChildrenCache.callListeners(PathChildrenCache.java:500)
        at org.apache.curator.framework.recipes.cache.EventOperation.invoke(EventOperation.java:35)




The description.json looks like this:


{
  "dataSource": "batch_test3",
  "interval": "2014-12-10T09:13:00.000Z/2014-12-10T09:14:00.000Z",
  "version": "2014-12-11T07:17:55.478Z",
  "loadSpec": {
    "type": "hdfs",
    "path": "/grid/lambda/druid/batch_test3/batch_test3/20141210T091300.000Z_20141210T091400.000Z/2014-12-11T07_17_55.478Z/0/index.zip"
  },
  "dimensions": "parentid",
  "metrics": "count",
  "shardSpec": {"type": "none"},
  "binaryVersion": 9,
  "size": 643,
  "identifier": "batch_test3_2014-12-10T09:13:00.000Z_2014-12-10T09:14:00.000Z_2014-12-11T07:17:55.478Z"
}



Thanks,
Ravi

Nishant Bangarwa

Dec 12, 2014, 3:14:23 AM12/12/14
to druid-de...@googlegroups.com
Hi Ravi, 
Is there any other stack trace for the underlying root cause?
What is the size of the index.zip file?


ravi teja

Dec 12, 2014, 3:36:06 AM12/12/14
to druid-de...@googlegroups.com
Hi Nishant,

There is no other stack trace. The size of the index is 729 bytes.
I have only sent one message for the test.

Permission   Owner  Group   Size   Replication  Block size  Name
-rw-r--r--   druid  hadoop  729 B  3            128 MB      index.zip



Thanks,
Ravi


Nishant Bangarwa

Dec 12, 2014, 9:56:56 AM12/12/14
to druid-de...@googlegroups.com
Hi Ravi, 

Can you also share the complete logs for more info on this?



ravi teja

Dec 15, 2014, 1:44:47 AM12/15/14
to druid-de...@googlegroups.com
Hi Nishant,

I have seen other topics where similar problems are faced by other users:

The batch indexer is removing the absolute HDFS path from the segment entry in MySQL.

How should we fix this? Should I also patch the indexer job code? I am using Druid version 0.6.154.
Is there a workaround for this issue?
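Presumably the affected rows can be spotted by checking whether the loadSpec path in the segment payload carries a filesystem scheme: a scheme-less path such as /grid/... is resolved against whatever default filesystem the historical's Hadoop configuration supplies. A small sketch of that check (the JSON layout mirrors the descriptor shown earlier; the example paths and namenode host are made up):

```python
import json
from urllib.parse import urlparse

def has_absolute_hdfs_path(payload: str) -> bool:
    """True if an hdfs loadSpec path carries an explicit hdfs:// scheme."""
    load_spec = json.loads(payload).get("loadSpec", {})
    if load_spec.get("type") != "hdfs":
        return True  # only HDFS loadSpecs are affected by this issue
    return urlparse(load_spec.get("path", "")).scheme == "hdfs"

# Scheme-less path, as written by the 0.6.154 indexer: suspect.
broken = '{"loadSpec": {"type": "hdfs", "path": "/grid/lambda/druid/batch_test3/index.zip"}}'
# Absolute URI, as written after the fix: fine.
fixed = '{"loadSpec": {"type": "hdfs", "path": "hdfs://namenode.example.com:8020/grid/lambda/druid/batch_test3/index.zip"}}'

print(has_absolute_hdfs_path(broken))  # False
print(has_absolute_hdfs_path(fixed))   # True
```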

The HDFS  stack trace:


2014-12-11 07:17:54,109 ERROR [ZkCoordinator-0] io.druid.server.coordination.ZkCoordinator - Failed to load segment for dataSource: {class=io.druid.server.coordination.ZkCoordinator, exceptionType=class io.druid.segment.loading.SegmentLoadingException, exceptionMessage=Exception loading segment[batch_test3_2014-12-10T09:13:00.000Z_2014-12-10T09:14:00.000Z_2014-12-11T07:17:00.380Z], segment=DataSegment{size=643, shardSpec=NoneShardSpec, metrics=[count], dimensions=[source_client, offerid, parentid, schemaversion, icmpid, eventid], version='2014-12-11T07:17:00.380Z', loadSpec={type=hdfs, path=/grid/lambda/druid/batch_test3/batch_test3/20141210T091300.000Z_20141210T091400.000Z/2014-12-11T07_17_00.380Z/0/index.zip}, interval=2014-12-10T09:13:00.000Z/2014-12-10T09:14:00.000Z, dataSource='batch_test3', binaryVersion='9'}}

io.druid.segment.loading.SegmentLoadingException: Exception loading segment[batch_test3_2014-12-10T09:13:00.000Z_2014-12-10T09:14:00.000Z_2014-12-11T07:17:00.380Z]
        at io.druid.server.coordination.ZkCoordinator.addSegment(ZkCoordinator.java:136)
        at io.druid.server.coordination.SegmentChangeRequestLoad.go(SegmentChangeRequestLoad.java:44)
        at io.druid.server.coordination.BaseZkCoordinator$1.childEvent(BaseZkCoordinator.java:113)
        at org.apache.curator.framework.recipes.cache.PathChildrenCache$5.apply(PathChildrenCache.java:509)
        at org.apache.curator.framework.recipes.cache.PathChildrenCache$5.apply(PathChildrenCache.java:503)
        at org.apache.curator.framework.listen.ListenerContainer$1.run(ListenerContainer.java:92)
        at com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297)
        at org.apache.curator.framework.listen.ListenerContainer.forEach(ListenerContainer.java:83)
        at org.apache.curator.framework.recipes.cache.PathChildrenCache.callListeners(PathChildrenCache.java:500)
        at org.apache.curator.framework.recipes.cache.EventOperation.invoke(EventOperation.java:35)
        at org.apache.curator.framework.recipes.cache.PathChildrenCache$10.run(PathChildrenCache.java:762)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: io.druid.segment.loading.SegmentLoadingException: Path[/grid/lambda/druid/batch_test3/batch_test3/20141210T091300.000Z_20141210T091400.000Z/2014-12-11T07_17_00.380Z/0/index.zip] doesn't exist.
        at io.druid.storage.hdfs.HdfsDataSegmentPuller.checkPathAndGetFilesystem(HdfsDataSegmentPuller.java:94)
        at io.druid.storage.hdfs.HdfsDataSegmentPuller.getSegmentFiles(HdfsDataSegmentPuller.java:52)
        at io.druid.segment.loading.OmniSegmentLoader.getSegmentFiles(OmniSegmentLoader.java:125)
        at io.druid.segment.loading.OmniSegmentLoader.getSegment(OmniSegmentLoader.java:93)
        at io.druid.server.coordination.ServerManager.loadSegment(ServerManager.java:145)
        at io.druid.server.coordination.ZkCoordinator.addSegment(ZkCoordinator.java:132)
        ... 17 more




Thanks,
Ravi

Nishant Bangarwa

Dec 15, 2014, 10:52:31 AM12/15/14
to druid-de...@googlegroups.com
Hi Ravi, 

It looks like the path where the segments are stored is correct, since other segments are being loaded correctly.
I wonder if there were any classpath issues with the indexing when you generated these specific segments that might have caused this?
Does this happen again if you reindex with the correct indexer properties?



Gian Merlino

Dec 15, 2014, 11:12:46 AM12/15/14
to druid-de...@googlegroups.com
Ravi, can you try updating to the stable version of Druid (0.6.160)? In that version we accepted a patch from Flowyi that makes the written paths absolute.

Gian Merlino

Dec 15, 2014, 11:13:43 AM12/15/14
to druid-de...@googlegroups.com
You'll need to reindex those problematic segments after upgrading.

ravi teja

Dec 16, 2014, 12:11:54 AM12/16/14
to druid-de...@googlegroups.com
Hi Guys,

Thanks for the replies.

@Nishant, the Hadoop configs are getting picked up properly on the realtime node where the Hadoop indexer is running, as it is able to put the segments into HDFS using the same configs.
I have rechecked the classpath; it is picking up the config.


@Gian,
I will try to upgrade the cluster to 0.6.160. I hope it is backward compatible with 0.6.154, as I have a lot of test data indexed with 0.6.154.


Thanks,
Ravi


Fangjin Yang

Dec 16, 2014, 2:01:18 PM12/16/14
to druid-de...@googlegroups.com
Hi Ravi, 0.6.160 is compatible with your version.

ravi teja

Dec 22, 2014, 4:01:51 AM12/22/14
to druid-de...@googlegroups.com
Hi Fangjin, Nishant,


I have upgraded the historicals, overlord, and realtime to 0.6.160.
Now I see that the segment path is being written with the absolute HDFS URL, and the historical is able to find it.

But the historical is still unable to load the segment because of another exception:



2014-12-22 06:55:15,215 ERROR [ZkCoordinator-0] io.druid.server.coordination.ZkCoordinator - Failed to load segment for dataSource: {class=io.druid.server.coordination.ZkCoordinator, exceptionType=class io.druid.segment.loading.SegmentLoadingException, exceptionMessage=Exception loading segment[batch_test_2014-12-09T06:39:00.000Z_2014-12-09T06:40:00.000Z_2014-12-22T06:47:01.999Z], segment=DataSegment{size=643, shardSpec=NoneShardSpec, metrics=[count], dimensions=[source_client, offerid, parentid, schemaversion, icmpid, eventid], version='2014-12-22T06:47:01.999Z', loadSpec={type=hdfs, path=hdfs://b1nn1.ch.flipkart.com:8020/grid/lambda/druid/batch_test/batch_test/20141209T063900.000Z_20141209T064000.000Z/2014-12-22T06_47_01.999Z/0/index.zip}, interval=2014-12-09T06:39:00.000Z/2014-12-09T06:40:00.000Z, dataSource='batch_test', binaryVersion='9'}}

io.druid.segment.loading.SegmentLoadingException: Exception loading segment[batch_test_2014-12-09T06:39:00.000Z_2014-12-09T06:40:00.000Z_2014-12-22T06:47:01.999Z]
        at io.druid.server.coordination.ZkCoordinator.loadSegment(ZkCoordinator.java:140)
        at io.druid.server.coordination.ZkCoordinator.addSegment(ZkCoordinator.java:165)
        at io.druid.server.coordination.SegmentChangeRequestLoad.go(SegmentChangeRequestLoad.java:44)
        at io.druid.server.coordination.BaseZkCoordinator$1.childEvent(BaseZkCoordinator.java:127)
        at org.apache.curator.framework.recipes.cache.PathChildrenCache$5.apply(PathChildrenCache.java:509)
        at org.apache.curator.framework.recipes.cache.PathChildrenCache$5.apply(PathChildrenCache.java:503)
        at org.apache.curator.framework.listen.ListenerContainer$1.run(ListenerContainer.java:92)
        at com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297)
        at org.apache.curator.framework.listen.ListenerContainer.forEach(ListenerContainer.java:83)
        at org.apache.curator.framework.recipes.cache.PathChildrenCache.callListeners(PathChildrenCache.java:500)
        at org.apache.curator.framework.recipes.cache.EventOperation.invoke(EventOperation.java:35)
        at org.apache.curator.framework.recipes.cache.PathChildrenCache$10.run(PathChildrenCache.java:762)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: io.druid.segment.loading.SegmentLoadingException: Some IOException
        at io.druid.storage.hdfs.HdfsDataSegmentPuller.getSegmentFiles(HdfsDataSegmentPuller.java:61)
        at io.druid.segment.loading.OmniSegmentLoader.getSegmentFiles(OmniSegmentLoader.java:125)
        at io.druid.segment.loading.OmniSegmentLoader.getSegment(OmniSegmentLoader.java:93)
        at io.druid.server.coordination.ServerManager.loadSegment(ServerManager.java:145)
        at io.druid.server.coordination.ZkCoordinator.loadSegment(ZkCoordinator.java:136)
        ... 18 more
Caused by: java.io.FileNotFoundException: /grid/2/druid/historical/persistent/zk_druid/batch_test/2014-12-09T06:39:00.000Z_2014-12-09T06:40:00.000Z/2014-12-22T06:47:01.999Z/0/00000.smoosh (No such file or directory)
        at java.io.FileOutputStream.open(Native Method)
        at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
        at java.io.FileOutputStream.<init>(FileOutputStream.java:171)
        at io.druid.utils.CompressionUtils.unzip(CompressionUtils.java:94)
        at io.druid.storage.hdfs.HdfsDataSegmentPuller.getSegmentFiles(HdfsDataSegmentPuller.java:57)
        ... 22 more




Thanks,
Ravi

Nishant Bangarwa

Dec 22, 2014, 9:09:15 AM12/22/14
to druid-de...@googlegroups.com
Hi Ravi, 

Have you reindexed the broken segments?

Once you have reindexed the problematic segments, you will need to manually mark the old problematic segments as unused for the coordinator to be able to pick up the new segments.
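Marking the old segments unused is just a metadata update. A minimal sketch of the idea, using an in-memory SQLite table as a stand-in for the MySQL prod_segments table (the segment identifiers below are hypothetical):

```python
import sqlite3

# In-memory stand-in for the MySQL metadata store (table name from the thread).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prod_segments (id TEXT PRIMARY KEY, used INTEGER)")
conn.executemany(
    "INSERT INTO prod_segments VALUES (?, ?)",
    [("batch_test3_old_broken_version", 1),      # hypothetical broken segment
     ("batch_test3_new_reindexed_version", 1)],  # hypothetical reindexed segment
)

# Mark the old, broken segment unused; the coordinator will then drop it
# and serve only the reindexed segment.
conn.execute(
    "UPDATE prod_segments SET used = 0 WHERE id = ?",
    ("batch_test3_old_broken_version",),
)
conn.commit()

used = [row[0] for row in conn.execute(
    "SELECT id FROM prod_segments WHERE used = 1")]
print(used)  # ['batch_test3_new_reindexed_version']
```

Against the real metadata store, the equivalent UPDATE would be run in the mysql client with the actual segment identifier.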



Fangjin Yang

unread,
Dec 22, 2014, 11:32:13 AM12/22/14
to druid-de...@googlegroups.com
We should place a higher priority on fixing #540. It places a lot of work on the user when something goes wrong.