I'm seeing behavior that I don't expect with the Kafka Connect S3 connector. I have a topic with 64 partitions, and I'm using the attached worker and connector configurations to write data from these partitions into S3. The intent is to roll the files hourly, ideally ending up with 64 files per hour. As you can probably tell from the naming convention, I'm looking to use Amazon Athena to query this data. My first challenge was the interaction between the number of partitions and the default s3.part.size; after setting that to the minimum (5 MB), I haven't had any further out-of-memory issues.

However, I am still seeing two problem behaviors. The first is that occasionally the workers get into a state where they are constantly rebalancing and the offsets don't appear to be committed. I see messages like "WARN Commit of WorkerSinkTask{id=pixall-parsed-non-prod-2} offsets timed out (org.apache.kafka.connect.runtime.WorkerSinkTask:172)", which suggests I may be hitting https://issues.apache.org/jira/browse/KAFKA-4942. This causes significant problems: when the rebalance finishes, consumption restarts from an offset much further back in time, and large amounts of duplicate data from previous hours get written into the bucket under the current wall-clock hour. I commented in worker-config.properties on the changes I made to try to address this (increased heartbeat.interval.ms, session.timeout.ms, and offset.flush.timeout.ms, and decreased offset.flush.interval.ms), but the problem is still happening intermittently.
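For context, the settings in play look roughly like the sketch below. These are illustrative placeholder values only, not the exact contents of the attached configs:

# worker-config.properties (excerpt; placeholder values, not my actual settings)
offset.flush.interval.ms=10000
offset.flush.timeout.ms=60000
session.timeout.ms=30000
heartbeat.interval.ms=10000

# connector config (excerpt; placeholder values, not my actual settings)
connector.class=io.confluent.connect.s3.S3SinkConnector
format.class=io.confluent.connect.s3.format.avro.AvroFormat
s3.part.size=5242880
partitioner.class=io.confluent.connect.storage.partitioner.TimeBasedPartitioner
partition.duration.ms=3600000
path.format='year'=YYYY/'month'=MM/'day'=dd/'hour'=HH
locale=en-US
timezone=UTC
flush.size=100000

The path.format above is what produces the year=/month=/day=/hour= prefixes visible in the log lines further down, which is the layout I want Athena to treat as partitions.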
In addition, some of the workers will occasionally create hundreds (or even thousands) of files in the S3 bucket, each containing a single Avro record. Below are a couple of sample log lines showing what this looks like when it occurs:
[2017-07-27 19:02:24,018] INFO Opening record writer for: my_bucket/year=2017/month=07/day=27/hour=19/my_topic+53+0007467933.avro (io.confluent.connect.s3.format.avro.AvroRecordWriterProvider:66)
[2017-07-27 19:02:24,470] INFO Files committed to S3. Target commit offset for my_topic-53 is 7467933 (io.confluent.connect.s3.TopicPartitionWriter:407)
[2017-07-27 19:02:24,471] INFO Opening record writer for: my_bucket/year=2017/month=07/day=27/hour=19/my_topic+53+0007467934.avro (io.confluent.connect.s3.format.avro.AvroRecordWriterProvider:66)
... (hundreds more, each with the offset incremented by one)
I've definitely run into a wall, and would appreciate any help or support this group can provide!
Regards,
Will