Moat Daily Data Issues


Amitabh Sural

Aug 11, 2016, 4:20:23 PM
to sparklinedata
I am seeing some odd behavior with the daily MOAT data.
I am trying to create a new index for dt=20160804. While the indexing task is still in progress, I see the following:

1 row selected (2.549 seconds)
0: jdbc:hive2://spl08.dev.dw.sc.gwallet.com:1> select count(*) from moat_daily_fixed where dt=20160804 ; 
+-----------+--+
|    _c0    |
+-----------+--+
| 62556733  |
+-----------+--+
1 row selected (1.574 seconds)
0: jdbc:hive2://spl08.dev.dw.sc.gwallet.com:1> select count(*) from moat_daily where dt=20160804 ; 
+-----------+--+
|    _c0    |
+-----------+--+
| 62556733  |
+-----------+--+
1 row selected (25.595 seconds)
0: jdbc:hive2://spl08.dev.dw.sc.gwallet.com:1> select count(*) from moat_daily_fixed where dt=20160804 ; 
+-----------+--+
|    _c0    |
+-----------+--+
| 62556733  |
+-----------+--+
1 row selected (1.636 seconds)

How is it possible that the row count for 20160804 in the final spl table (moat_daily) is the same as in the fixed table while the indexing task is still running?

I suspect it's the same issue that we saw a couple of days ago with the MOAT hourly table, where Harish/Sri had to change some default rules.

One more thing: I see this through SQL:
1 row selected (1.636 seconds)
0: jdbc:hive2://spl08.dev.dw.sc.gwallet.com:1> select ts_utc, count(*) from moat_daily group by ts_utc;
+----------------------+-----------+--+
|        ts_utc        |    _c1    |
+----------------------+-----------+--+
| 2016-07-31 17:00:00  | 57004218  |
+----------------------+-----------+--+
1 row selected (0.095 seconds)


But the console at http://spl08.dev.dw.sc.gwallet.com:8081/#/datasources/moat_daily shows a segment for 2016-08-01, and I see a red mark against
the data source name. Does that mean it's disabled?

Whereas I should see segments for both of these partitions:
1 row selected (0.095 seconds)
0: jdbc:hive2://spl08.dev.dw.sc.gwallet.com:1> show partitions moat_daily_fixed ;
+--------------+--+
|    result    |
+--------------+--+
| dt=20160801  |
| dt=20160804  |
+--------------+--+
2 rows selected (0.478 seconds)

Is my understanding correct? The indexing task for 20160804 succeeded:

index_hadoop_moat_daily_2016-08-11T19:19:16.229Z SUCCESS

I guess the segment from 2016-07-31 is still around because of a failed cleanup during the drop (I will look into that), but that still does not explain the other behaviors.

harish

Aug 11, 2016, 9:13:53 PM
to sparklinedata
1. Are you reindexing the partition? If so, you are probably still being served the current segment version for that dt. The new segment version becomes available only after the indexing job finishes.

2. Yes, check whether the default Load Rule is still there on your Druid cluster. A missing default Load Rule is probably why not all the segments are showing up in the Druid console.
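
To make both points concrete, here is a rough Python sketch, not a definitive diagnostic. `served_versions` is a simplified model of point 1: it assumes exact interval matches and ignores partially overlapping segments, which a real cluster also has to handle. `has_load_rule` and `fetch_rules` relate to point 2 and use the coordinator's `/druid/coordinator/v1/rules/{dataSource}` endpoint. The function names are made up for illustration, and the coordinator address is assumed to be the host and port that serve the console mentioned in this thread.

```python
import json
from urllib.request import urlopen

# Assumption: the coordinator is the host:port serving the console in this thread.
COORDINATOR = "http://spl08.dev.dw.sc.gwallet.com:8081"


def served_versions(segments):
    """Simplified model of point 1: per interval, the highest version wins.

    `segments` is a list of dicts with "interval" and "version" keys, limited
    to segments that are already published and loaded. A reindexing task that
    is still running has published nothing yet, so it contributes nothing here
    and queries keep hitting the older version.
    """
    latest = {}
    for seg in segments:
        current = latest.get(seg["interval"])
        # Segment versions are ISO-8601 timestamps, so string comparison orders them.
        if current is None or seg["version"] > current:
            latest[seg["interval"]] = seg["version"]
    return latest


def has_load_rule(rules):
    """Point 2: True if any rule in the list is a load rule (e.g. loadForever)."""
    return any(rule.get("type", "").startswith("load") for rule in rules)


def fetch_rules(datasource):
    """Fetch the rules set directly on `datasource` from the coordinator."""
    url = f"{COORDINATOR}/druid/coordinator/v1/rules/{datasource}"
    with urlopen(url) as resp:
        return json.load(resp)
```

For example, `has_load_rule(fetch_rules("_default"))` checks the cluster-wide default rules, and `has_load_rule(fetch_rules("moat_daily"))` checks rules set on the data source itself. If both return False, no rule tells the historicals to load those segments.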

Amitabh Sural

Aug 11, 2016, 9:39:41 PM
to sparklinedata
1. I'm not sure about that; I will keep it in mind next time.
2. I am not clear on what to do here. I see similar rules for moat_hourly (which you fixed) and moat_daily, so it is not clear to me what is going on or what I need to change.