[S3 Connector] TimeBasedPartitioner, TimestampExtractor: Exactly once delivery and rotation


Major OPPO

Jul 11, 2017, 4:55:37 PM
to Confluent Platform
Hi all,

We are currently upgrading our Kafka Connect deployment from 3.2.0 to 3.2.2.


We want to take advantage of `exactly once delivery` to S3, but we have some questions about how it works.
Our events are partitioned with the TimeBasedPartitioner, into which we plugged a custom TimestampExtractor that reads the timestamp from each event. So in theory we can achieve exactly once delivery.
Do I need additional configuration to enable 'exactly once delivery' or should it be automatic since the partitioner is deterministic?


We also have some concerns about processing late events.
We need to put our events in S3 using this time pattern: 'YYYY/MM/dd/HH'.
I'm wondering what will happen if an event from hour N is delayed by our producer and is written to Kafka after several events from hour N+1. Will it be discarded, or will Kafka Connect commit a file containing only this event in the correct folder?
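
To make the late-event scenario concrete, here is a minimal Python sketch of how an event timestamp maps to an hourly 'YYYY/MM/dd/HH' folder. It is not the connector's code: `encode_partition`, `group_by_partition` and `ms` are hypothetical helper names, UTC is an assumed timezone, and Python's `strftime` stands in for the date pattern above. It only shows that the folder is a function of the event's own timestamp and not of arrival order. What the connector actually does with the lone late record is still the open question.

```python
from collections import defaultdict
from datetime import datetime, timezone


def encode_partition(ts_ms: int) -> str:
    """Map an epoch-millisecond event timestamp to an hourly folder (UTC assumed)."""
    dt = datetime.fromtimestamp(ts_ms / 1000, tz=timezone.utc)
    return dt.strftime("%Y/%m/%d/%H")


def group_by_partition(timestamps_ms):
    """Group event timestamps by folder, keeping arrival order inside each folder."""
    groups = defaultdict(list)
    for ts in timestamps_ms:
        groups[encode_partition(ts)].append(ts)
    return dict(groups)


def ms(*fields) -> int:
    """Hypothetical helper: UTC calendar fields to epoch milliseconds."""
    return int(datetime(*fields, tzinfo=timezone.utc).timestamp() * 1000)


# Arrival order: two events of hour 17, then one late event of hour 16.
arrivals = [ms(2017, 7, 11, 17, 1), ms(2017, 7, 11, 17, 2), ms(2017, 7, 11, 16, 59)]
for folder, events in group_by_partition(arrivals).items():
    print(folder, len(events))
```

In this toy model the late event ends up alone under 2017/07/11/16, and replaying the same records always produces the same folders, which is the sense in which the partitioning is deterministic.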

Also, is it safe to use `rotate.interval.ms` to commit files before `flush.size` is reached when there isn't enough data during that time period?
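
To spell out what I mean by rotating on size or on time, here is a toy Python model. It is an assumption about the behaviour, not the connector's implementation: `split_into_files` is a hypothetical name, and the rule (close the file once it holds `flush_size` records, or when a newly arrived record's timestamp is at least `rotate_interval_ms` after the first buffered one) is my simplified reading, which may differ between connector versions.

```python
from typing import List, Tuple


def split_into_files(
    timestamps_ms: List[int], flush_size: int, rotate_interval_ms: int
) -> Tuple[List[List[int]], List[int]]:
    """Return (committed_files, still_open_buffer) under the toy rotation rule."""
    files: List[List[int]] = []
    current: List[int] = []
    for ts in timestamps_ms:
        # Time-based rotation: only checked when a new record arrives.
        if current and ts - current[0] >= rotate_interval_ms:
            files.append(current)
            current = []
        current.append(ts)
        # Size-based rotation: close the file once it is full.
        if len(current) == flush_size:
            files.append(current)
            current = []
    return files, current


# Sparse data: flush_size is never reached, so only the time rule closes a file.
files, pending = split_into_files([0, 1_000, 61_000], flush_size=100, rotate_interval_ms=60_000)
print(files, pending)
```

In this model the first two records are committed together once the third arrives, and the third stays buffered until another record shows up. Whether the real connector waits for a new record in the same way is part of what I am asking.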


Thanks for any help :)

dhawan.g...@datavisor.com

Jul 12, 2017, 12:06:40 AM
to Confluent Platform
You would need something like this.

schema.generator.class=io.confluent.connect.storage.hive.schema.TimeBasedSchemaGenerator
partitioner.class=io.confluent.connect.storage.partitioner.TimeBasedPartitioner
path.format=YYYY/MM/dd/HH

