Unable to see the new data that has been indexed


Giri Tata

Apr 6, 2016, 10:03:52 AM
to Druid User
All,

This is the first time I am trying out Druid. It looks very promising, but I am running into some teething issues. I hope you can help.

I have ingested some files using batch mode (non-Hadoop). The ingestion was successful, as seen in the overlord console, and I also checked the ingestion log.
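For reference, a non-Hadoop batch load like this is submitted to the overlord as an `index` task. The spec below is a hypothetical minimal sketch for the `pos_big` datasource from this thread; the column names, base directory, and granularities are assumptions for illustration, not the actual spec that was used:

```json
{
  "type": "index",
  "spec": {
    "dataSchema": {
      "dataSource": "pos_big",
      "parser": {
        "type": "string",
        "parseSpec": {
          "format": "json",
          "timestampSpec": { "column": "timestamp", "format": "auto" },
          "dimensionsSpec": { "dimensions": ["a", "b", "c"] }
        }
      },
      "metricsSpec": [
        { "type": "doubleSum", "name": "m1", "fieldName": "m1" }
      ],
      "granularitySpec": {
        "type": "uniform",
        "segmentGranularity": "DAY",
        "queryGranularity": "NONE",
        "intervals": ["2016-02-18/2016-02-19"]
      }
    },
    "ioConfig": {
      "type": "index",
      "firehose": { "type": "local", "baseDir": "/data/pos", "filter": "*.json" }
    }
  }
}
```

Such a spec is posted to the overlord (default port 8090) at `/druid/indexer/v1/task`.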

SegmentInsertAction{segments=[DataSegment{size=120158798, shardSpec=NoneShardSpec, metrics=[m1,m2.m3,m4,m5], dimensions=[a,b,c,d,e,f,g,h,i,j], version='2016-04-06T05:33:36.716Z', loadSpec={type=local, path=/druid/storage/pos_big/2016-02-18T00:00:00.000Z_2016-02-19T00:00:00.000Z/2016-04-06T05:33:36.716Z/0/index.zip}, interval=2016-02-18T00:00:00.000Z/2016-02-19T00:00:00.000Z, dataSource='pos_big', binaryVersion='9'}]}


......

2016-04-06T06:03:03,997 INFO [task-runner-0] io.druid.indexing.worker.executor.ExecutorLifecycle - Task completed with status: {
  "id" : "index_pos_big_2016-04-06T05:33:36.716Z",
  "status" : "SUCCESS",
  "duration" : 1761899
}

I looked at the storage directories and they do have the files split by date. However, when I query the data, it does not show the newly loaded data. I restarted Druid, but no luck. Is there any metadata update I need to do or check?
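A quick way to see which dates are actually queryable is a `timeBoundary` query, which returns the minimum and maximum timestamps Druid can serve for a datasource. A sketch, assuming a broker on the default port 8082; the `curl` call is commented out since it needs a running cluster:

```shell
# Write a minimal timeBoundary query for the datasource from this thread.
cat <<'EOF' > query.json
{
  "queryType": "timeBoundary",
  "dataSource": "pos_big"
}
EOF

# Post it to the broker (uncomment against a live cluster):
# curl -X POST -H 'Content-Type: application/json' \
#      -d @query.json "http://localhost:8082/druid/v2/?pretty"

cat query.json
```

If the returned `maxTime` stops short of the intervals you just ingested, the segments have not been handed off to historicals yet.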


Thanks for your help

Giri


Giri Tata

Apr 6, 2016, 10:17:33 AM
to Druid User
I just found that /druid/indexCache/ doesn't have the files for the dates that are not showing up. Is there a way to recreate or rebuild the indexCache for the newly ingested data?

Nishant Bangarwa

Apr 7, 2016, 1:26:33 AM
to Druid User
Hi Giri,
Once the data is indexed, the Druid coordinator loads it onto a historical node, which then serves the queries.
Do you have your coordinator and historical running?
Can you see the segments loaded in the coordinator console?
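Besides the console, the coordinator exposes a load-status API that reports, per datasource, what percentage of used segments the historicals have loaded; anything under 100.0 means segments are still pending (or failing to load). A sketch, assuming the coordinator on its default port 8081; the `curl` call is commented out since it needs a live cluster:

```shell
# Build the coordinator load-status URL (default coordinator port assumed).
COORDINATOR="http://localhost:8081"
ENDPOINT="$COORDINATOR/druid/coordinator/v1/loadstatus"

# Against a running cluster this returns e.g. {"pos_big": 100.0}:
# curl -s "$ENDPOINT"

echo "$ENDPOINT"
```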


Giri Tata

Apr 7, 2016, 11:33:38 AM
to Druid User
Nishant, 

Thanks for your reply. I did look at localhost:8090/console.html and it shows a status of SUCCESS for the load. I was loading each week separately as an index job and running the jobs in parallel (non-overlapping time windows). I can see in the logs that segments are being added. However, when I query the data, I don't get all the results back for the dates I have loaded; I only see 4 days or so instead of the full history of 365+ days. I can see that the data has been indexed because

/druid/storage/pos_big has directories for each day and is of the size I expect.

However 

/druid/indexCache/pos_big has directories for only 4 days instead of the full year.

The historical query results only show the dates present in /druid/indexCache/pos_big, which is only 4 days.

Nishant Bangarwa

Apr 7, 2016, 11:42:53 AM
to Druid User
Hi Giri,
A SUCCESS status for the index task means the data was indexed successfully and the segment was pushed to deep storage. It does not guarantee that the segment has been handed off to a historical node; loading segments onto the historical nodes is done by the coordinator.
A few things to check:
1) Check the coordinator console (http://localhost:8081/console.html), which shows how much data is loaded on each historical node.
2) Make sure the historical node has enough capacity to load the segments (the capacity of a historical node is defined via druid.server.maxSize).
3) Check the coordinator/historical logs for any errors related to segment loading.
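For point 2, the setting lives in the historical node's runtime.properties. An illustrative fragment (the sizes are example values, not recommendations); druid.server.maxSize must be at least the total size of the segments assigned to the node, and the segment cache location's maxSize should be consistent with it:

```properties
# Illustrative historical runtime.properties fragment (values are examples).
# Total segment bytes this historical may be assigned by the coordinator:
druid.server.maxSize=300000000000
# Local segment cache; its maxSize should match druid.server.maxSize:
druid.segmentCache.locations=[{"path":"/druid/indexCache","maxSize":300000000000}]
```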



Giri Tata

Apr 7, 2016, 2:11:38 PM
to Druid User
Nishant, 

Thanks a million for the help. You are right: druid.server.maxSize was too low, so not all the segments were loaded. After increasing druid.server.maxSize and the segment max size and restarting the cluster, I was able to see the entire date range in the queries.

Cheers
Giri