Hi, Gian:
I had configured the Overlord, but not the Middle Manager, for the segment locations directory, and after running nearly 24 hours (with hour granularity and a 10-minute window), I never saw any segments pushed to deep storage.
I see in the documentation on the Overlord that if you run with "druid.indexer.runner.type=local" (the default), the Overlord launches tasks directly, without using the Middle Manager at all. This seems to confirm what I saw on my first attempt: when I posted my request to the Overlord, a realtime node was launched, and the Middle Manager didn't appear to be used. But if that's the case, why didn't my segments get pushed to deep storage, since the Overlord has all the required properties? Are there other required properties besides druid.storage.type, druid.storage.storageDirectory, druid.segmentCache.locations, and druid.segmentCache.infoDir?
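For reference, here's a minimal sketch of how those properties looked in my Overlord runtime.properties -- the paths and sizes below are placeholders, not my actual values:

```properties
# Deep storage: "local" keeps pushed segments on the local filesystem
druid.storage.type=local
druid.storage.storageDirectory=/druid/deepstorage

# Segment cache locations (a JSON array) and the segment info directory
druid.segmentCache.locations=[{"path":"/druid/segmentCache","maxSize":10000000000}]
druid.segmentCache.infoDir=/druid/segmentCache/info

# The default runner: the Overlord launches tasks itself, no Middle Manager
druid.indexer.runner.type=local
```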
This morning, when it became obvious that I wasn't going to get segments pushed out, I brought down everything and restarted without a Middle Manager at all. Overlord runtime properties include both the MySQL and segment-location properties. I posted my realtime indexer request to the Overlord and saw that it stood up a realtime node, and I was immediately able to query the data. After the first segment should have been pushed out, however, nothing happened. But I did see the following in the Overlord log:
2014-02-21 18:11:51,512 INFO [Coordinator-Exec--0] com.metamx.emitter.core.LoggingEmitter - Event [{"feed":"metrics","timestamp":"2014-02-21T18:11:51.512Z","service":"coordinator","host":"localhost:8082","metric":"coordinator/dropQueue/count","value":0,"user1":"localhost:8081"}]
2014-02-21 18:12:51,161 WARN [DatabaseSegmentManager-Exec--0] io.druid.db.DatabaseSegmentManager - No segments found in the database!
2014-02-21 18:12:51,367 INFO [DatabaseRuleManager-Exec--0] io.druid.db.DatabaseRuleManager - Polled and found rules for 1 datasource(s)
2014-02-21 18:12:51,513 INFO [Coordinator-Exec--0] io.druid.server.coordinator.ReplicationThrottler - [_default_tier]: Replicant create queue is empty.
2014-02-21 18:12:51,513 INFO [Coordinator-Exec--0] io.druid.server.coordinator.ReplicationThrottler - [_default_tier]: Replicant terminate queue is empty.
2014-02-21 18:12:51,513 INFO [Coordinator-Exec--0] io.druid.server.coordinator.helper.DruidCoordinatorBalancer - [_default_tier]: One or fewer servers found. Cannot balance.
2014-02-21 18:12:51,514 INFO [Coordinator-Exec--0] com.metamx.emitter.core.LoggingEmitter - Event [{"feed":"metrics","timestamp":"2014-02-21T18:12:51.513Z","service":"coordinator","host":"localhost:8082","metric":"coordinator/overShadowed/count","value":0}]
at about the time I expected the first segment to be pushed. Does this shed any light? I wouldn't expect there to be any segments in the database yet, since the first segment was only just due to be pushed. But I'm wondering whether that "One or fewer servers found" statement is really a problem, or even related to my issue.
I've noticed other people reporting issues where either they never see historical segments, or the segments appear 6 hours late, and I don't recall seeing those issues resolved (at least, not in the text of the thread). Are there any other places I should be looking, log-wise? I should mention, in case it isn't obvious, that the segments table in MySQL is empty.
Hi Fangjin:
Everything's running now. My segment granularity is 1 hour, window period 10 minutes, intermediate persist period 10 minutes, and rollup 1 minute.
Everything started working when I removed the following:
"rejectionPolicy": {
  "type": "messageTime"
}
I was looking at the code, and I suppose I'd need to run it under a debugger to see exactly what's happening, but it appears that this rejection policy causes everything to be rejected when it comes time for the "merge and push" task. In the online docs, the "rejectionPolicy" entry links to the Realtime page, but there's no discussion of rejectionPolicy there; it would be helpful to have a short discussion of it somewhere. For now I'm just going to leave it out, but the example realtime indexing service configuration in the docs includes the policy above, so anyone using it without understanding it is likely to run into the same issues.
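For anyone hitting the same thing, here's a hedged sketch of the spec fragment with my granularity settings, using the "serverTime" rejection policy instead (the other documented types were "messageTime" and "none"; exact field placement within the spec may differ between Druid versions):

```json
{
  "segmentGranularity": "HOUR",
  "windowPeriod": "PT10M",
  "intermediatePersistPeriod": "PT10M",
  "rejectionPolicy": {
    "type": "serverTime"
  }
}
```

With "serverTime", events are accepted or rejected relative to the server clock and the window period, rather than relative to the timestamps of the messages themselves, which is what tripped me up with "messageTime".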
On the other point -- that really is a bug in Druid, regarding the setting of the druid.indexer.taskDir property. If you set it to any value other than "/tmp", part of the codebase writes to the new directory and part still writes to "/tmp", with the result that the incremental persists fail with "directory not found". I would attempt a fix, but right now I might break more than I fix -- just as with the batch indexer task, someone would need to run through every place where files are written to "/tmp", either directly or indirectly via the JDK API, and make sure the property is used instead. I'll revisit this later, since right now I have to provision really large /tmp partitions to be able to run some of these jobs.
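To illustrate the kind of fix I mean: every scratch-file write would need to resolve against the configured task directory instead of a hard-coded "/tmp" (or the JVM's java.io.tmpdir default). The helper below is a hypothetical sketch, not Druid's actual API:

```java
import java.io.File;

public class TaskDirExample {
    // Hypothetical helper: resolve a scratch file under the configured
    // task directory, rather than letting some code paths default to /tmp.
    static File scratchFile(String configuredTaskDir, String name) {
        File base = new File(configuredTaskDir);
        base.mkdirs(); // create the directory tree if it doesn't exist yet
        return new File(base, name);
    }

    public static void main(String[] args) {
        // If every write path went through a resolver like this, setting
        // druid.indexer.taskDir to a non-/tmp location would be honored everywhere.
        String taskDir = System.getProperty("java.io.tmpdir") + File.separator + "druid-task";
        File f = scratchFile(taskDir, "persist-0.tmp");
        System.out.println(f.getParentFile().exists()); // prints "true"
    }
}
```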
Thanks again -- Wayne
P.S. I agree with Gian, or whoever it was, who said that standing up a realtime node from the indexing service has an issue with log-file size: I stood up a cluster on Friday afternoon, and this morning I woke up to a 158-GB-and-counting realtime log file!
I have a similar problem (https://groups.google.com/forum/#!topic/druid-user/DJp-myKKBCM), and I don't know how to solve it. I'd appreciate any advice.