Questions about "auto.offset.reset" in Kafka-indexing-service


zhangxin...@gmail.com

Jul 16, 2016, 7:53:14 AM
to Druid User
Hi all,
    I'm using Druid 0.9.1 with the Kafka indexing service. That's really awesome!
    My Imply cluster has one node running ZooKeeper + broker + Pivot + overlord + coordinator, a second node running overlord + coordinator for HA, and eight nodes running middle manager + historical.
    My Kafka cluster has nine servers, and its data retention is set to 2 hours.
    In the supervisor spec for the Kafka indexing service, I added the option "auto.offset.reset": "latest" under "consumerProperties". When I restart the supervisor for my topic, I can see "auto.offset.reset=latest" in overlord.log, but the option is still set to "none" in the worker spec, so the workers throw an OffsetOutOfRangeException when they read an expired offset from a partition. Many Kafka indexing tasks fail, and the number of segments produced for each hour drops a lot.
    How can I fix this? Can anyone help?

"ioConfig": {
    "topic": "tsl",
    "consumerProperties": {
      "auto.offset.reset" : "latest"
    }
  }
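
For reference, a fuller sketch of how that ioConfig might sit in the supervisor spec (the broker addresses, counts, and durations below are illustrative placeholders, not values from this cluster; and as described above, in 0.9.1 the tasks still run with auto.offset.reset=none regardless of what is set here):

"ioConfig": {
    "topic": "tsl",
    "consumerProperties": {
      "bootstrap.servers": "kafka1:9092,kafka2:9092",
      "auto.offset.reset": "latest"
    },
    "taskCount": 1,
    "replicas": 1,
    "taskDuration": "PT1H"
  }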

Gian Merlino

Jul 16, 2016, 1:04:08 PM
to druid...@googlegroups.com
Because of your 2 hour data retention, I guess you're hitting a case where the Druid Kafka indexing tasks are trying to read offsets that have already been deleted. This causes problems with the exactly-once transaction handling scheme, which requires that all offsets be read in order, without skipping any. The Github issue https://github.com/druid-io/druid/issues/3195 is about making this better – basically you would have an option to reset the Kafka indexing to latest (this would involve resetting the ingestion metadata Druid stores for the datasource).

In the meantime, maybe it's possible to make this happen less often by either extending your Kafka retention, or by setting your Druid taskDuration lower than the default of 1 hour.
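
For example, taskDuration is a sibling of consumerProperties in the same ioConfig; a minimal sketch (the PT30M value is purely illustrative, pick something comfortably inside your retention window):

"ioConfig": {
    "topic": "tsl",
    "taskDuration": "PT30M",
    "consumerProperties": {
      "bootstrap.servers": "kafka1:9092,kafka2:9092"
    }
  }

On the Kafka side, the corresponding knobs would be the broker's log.retention.hours setting or a per-topic retention.ms override.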

Gian


zhangxin...@gmail.com

Jul 16, 2016, 9:56:10 PM
to Druid User
Thanks, Gian Merlino 


zhangxin...@gmail.com

Jul 18, 2016, 9:10:52 AM
to Druid User
Hi Gian,
    I have set my Druid taskDuration to "PT20M", and it really works! I'll still keep watching the cluster's status for a few days. Thanks a lot!
    I still don't understand, though: why does setting a lower Druid taskDuration make the loss of segments happen less often?

Thanks!


Gian Merlino

Jul 18, 2016, 1:04:40 PM
to druid...@googlegroups.com
Using a shorter taskDuration makes Druid commit Kafka offsets more often, which is probably going to be more stable if you have really short retention. (You don't want the most recently committed offsets to fall out of the retention window, since then you can't fail over or retry tasks.)
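
As a rough back-of-the-envelope illustration (assuming offsets are committed about once per taskDuration): with the default taskDuration of PT1H, the last committed offsets can lag the log head by up to an hour, leaving only about an hour of slack in a 2-hour retention window before a retried or failed-over task would be asked to read offsets Kafka has already deleted. With PT20M, the committed position stays within roughly 20 minutes of the head.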

Gian
