Hi!
I deployed a Druid cluster and have been ingesting data into the realtime node through Kafka, and I found that some data is lost compared to MySQL, which shares the same data source.
After checking the realtime node log, I found the warnings and errors below every hour. Are they the reason for the data loss? Any idea how to fix them?
2014-11-24 10:00:00,794 WARN [dsp_client-overseer-2] io.druid.segment.realtime.plumber.RealtimePlumber - [2014-11-24T01:00:00.000Z] < [2014-11-24T00:00:00.000Z] Skipping persist and merge.
2014-11-24 10:00:00,802 ERROR [dsp_client-2014-11-18T08:00:00.000Z-persist-n-merge] io.druid.segment.realtime.plumber.RealtimePlumber - Failed to persist merged index[dsp_client]: {class=io.druid.segment.realtime.plumber.RealtimePlumber, exceptionType=class com.metamx.common.IAE, exceptionMessage=Bad number of metrics[48], expected [45], interval=2014-11-18T08:00:00.000Z/2014-11-18T09:00:00.000Z}
com.metamx.common.IAE: Bad number of metrics[48], expected [45]
    at io.druid.segment.IndexMerger.merge(IndexMerger.java:269)
    at io.druid.segment.IndexMerger.mergeQueryableIndex(IndexMerger.java:169)
    at io.druid.segment.IndexMerger.mergeQueryableIndex(IndexMerger.java:162)
    at io.druid.segment.realtime.plumber.RealtimePlumber$4.doRun(RealtimePlumber.java:348)
    at io.druid.common.guava.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:42)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
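If I read the stack trace right, the merge step refuses to combine intermediate persists whose metric counts differ from what the schema expects (48 vs. 45 here, which matches a schema change I made earlier). A minimal sketch of that kind of precondition, just to illustrate — the class and method names here are hypothetical, not Druid's actual code:

```java
import java.util.Arrays;
import java.util.List;

public class MergeCheck {
    // Hypothetical stand-in for the precondition in IndexMerger.merge:
    // every index being merged must expose the expected number of metrics.
    static void checkMetricCounts(List<String[]> metricsPerIndex, int expected) {
        for (String[] metrics : metricsPerIndex) {
            if (metrics.length != expected) {
                throw new IllegalArgumentException(String.format(
                        "Bad number of metrics[%d], expected [%d]",
                        metrics.length, expected));
            }
        }
    }

    public static void main(String[] args) {
        // An old on-disk persist written with 48 metrics vs. a schema with 45
        List<String[]> indexes = Arrays.asList(new String[48], new String[45]);
        try {
            checkMetricCounts(indexes, 45);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // Bad number of metrics[48], expected [45]
        }
    }
}
```

So it looks like the hourly merge hits leftover persists under basePersistDirectory that were written with the old 48-metric schema.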
My schemas.json:
{
  "schema": {
    "dataSource": "dsp_client",
    "aggregators": [
      {
        "type": "count",
        "name": "row_count"
      },
      {"type": "longSum", "name": "ips", "fieldName": "ips"},
      // ... 43 more aggs here
    ],
    "indexGranularity": "hour",
    "shardSpec": {
      "type": "linear",
      "partitionNum": 2
    }
  },
  "config": {
    "maxRowsInMemory": 500000,
    "intermediatePersistPeriod": "PT10m"
  },
  "firehose": {
    "type": "kafka-0.8",
    "consumerProps": {
      "zookeeper.connect": "192.168.3.16:2181,192.168.3.18:2181",
      "zookeeper.connection.timeout.ms": "15000",
      "zookeeper.session.timeout.ms": "40000",
      "zookeeper.sync.time.ms": "5000",
      "group.id": "druid-real-time-node-client",
      "fetch.message.max.bytes": "1048586",
      "auto.offset.reset": "largest",
      "auto.commit.enable": "true"
    },
    "feed": "dsp_client_topic",
    "parser": {
      "timestampSpec": {
        "column": "timestamp"
      },
      "data": {
        "format": "json",
        "dimensions": [
          "campaign_id",
          // ... 22 more dims here
        ]
      }
    }
  },
  "plumber": {
    "type": "realtime",
    "windowPeriod": "PT60m",
    "segmentGranularity": "hour",
    "basePersistDirectory": "/data/druid/realtime/basePersist",
    "rejectionPolicyFactory": {
      "type": "messageTime"
    }
  }
}