Hello,
I'm using the Hadoop batch indexing mechanism to index data into Druid. The index task takes about 10 minutes to index 5 GB of data, and the task status says "SUCCESS". I can even see the segments in HDFS deep storage.
However, the data source in the Overlord page has been in a "<99% available" state ever since the indexing task completed.
A quick check of the MySQL metadata database shows the following:
"druid_dataSource" table is empty
"druid_pendingSegments" is empty
"druid_segments" table has the segment info for this data source, and its entries correspond to the directories found on HDFS for this data source.
There are no exceptions or errors in the indexing task logs or in the overlord / co-ordinator logs.
Queries against the data source do not return any results either.
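For anyone who wants to reproduce the availability check outside the console, here is a minimal sketch that reads the co-ordinator's load status endpoint (`/druid/coordinator/v1/loadstatus`) and lists the data sources that are not fully loaded. The host and port (`localhost:8081`) and the helper names `fetch_load_status` and `not_fully_loaded` are assumptions for illustration, not taken from our setup.

```python
import json
from urllib.request import urlopen

# Assumption: co-ordinator address; adjust to the actual host and port.
COORDINATOR = "http://localhost:8081"


def fetch_load_status(base_url=COORDINATOR):
    """Fetch {datasource: percent_of_segments_loaded} from the co-ordinator."""
    with urlopen(base_url + "/druid/coordinator/v1/loadstatus") as resp:
        return json.load(resp)


def not_fully_loaded(load_status):
    """Keep only the data sources whose loaded percentage is below 100."""
    return {ds: pct for ds, pct in load_status.items() if pct < 100.0}


# Usage (requires a reachable co-ordinator):
#   print(not_fully_loaded(fetch_load_status()))
```

A data source that stays in this list long after the task finishes would suggest the historicals are not picking up its segments, rather than a problem with the indexing itself.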
Here's our Druid setup:
Druid version: 0.9.1
1 overlord
1 co-ordinator
3 historical and middle manager nodes
1 broker
Deep storage: HDFS
Metadata store: MySQL
I found similar behavior (data source "<99% available") for a data source that was indexed using the Kafka indexing service as well.
Any pointers would really help.
Thanks,
Sumanth