We are seeing strange behavior in our Connect cluster when we run multiple connectors with about 4 GB of data each: one connector ends up with "Failed to commit offsets" errors, and this brings the Connect framework down.
The scenario: if I run only one file connector with 4 GB of data, it processes the file completely without any exceptions. But as soon as we start a second connector with another 4 GB file, the first connector stops producing records and one of the two connectors hits the exception below.
This happens whenever multiple connectors are dealing with large amounts of data.
Connect logs after starting the first connector:
set:21018483}, Current: {epoch:2, offset:1608878} for Partition: s3-topic1-0 (kafka.server.epoch.LeaderEpochFileCache)
[2018-07-24 04:47:05,356] WARN Received a PartitionLeaderEpoch assignment for an epoch < latestEpoch. This implies messages have arrived out of order. New: {epoch:0, offset:21018506}, Current: {epoch:2, offset:1608878} for Partition: s3-topic1-0 (kafka.server.epoch.LeaderEpochFileCache)
[2018-07-24 04:47:05,356] WARN Received a PartitionLeaderEpoch assignment for an epoch < latestEpoch. This implies messages have arrived out of order. New: {epoch:0, offset:21018529}, Current: {epoch:2, offset:1608878} for Partition: s3-topic1-0 (kafka.server.epoch.LeaderEpochFileCache)
[2018-07-24 04:47:05,356] WARN Received a PartitionLeaderEpoch assignment for an epoch < latestEpoch. This implies messages have arrived out of order. New: {epoch:0, offset:21018552}, Current: {epoch:2, offset:1608878} for Partition: s3-topic1-0 (kafka.server.epoch.LeaderEpochFileCache)
[2018-07-24 04:47:05,357] WARN Received a PartitionLeaderEpoch assignment for an epoch < latestEpoch. This implies messages have arrived out of order. New: {epoch:0, offset:21018575}, Current: {epoch:2, offset:1608878} for Partition: s3-topic1-0 (kafka.server.epoch.LeaderEpochFileCache)
[2018-07-24 04:47:05,357] WARN Received a PartitionLeaderEpoch assignment for an epoch < latestEpoch. This implies messages have arrived out of order. New: {epoch:0, offset:21018598}, Current: {epoch:2, offset:1608878} for Partition: s3-topic1-0 (kafka.server.epoch.LeaderEpochFileCache)
[2018-07-24 04:47:05,358] WARN Received a PartitionLeaderEpoch assignment for an epoch < latestEpoch. This implies messages have arrived out of order. New: {epoch:0, offset:21018621}, Current: {epoch:2,
Connect logs after posting the second connector:
sumerConfig:287)
[2018-07-24 04:39:34,525] WARN The configuration 'value.converter.schemas.enable' was supplied but isn't a known config. (org.apache.kafka.clients.consumer.ConsumerConfig:287)
[2018-07-24 04:39:34,526] WARN The configuration 'internal.value.converter' was supplied but isn't a known config. (org.apache.kafka.clients.consumer.ConsumerConfig:287)
[2018-07-24 04:39:34,526] WARN The configuration 'offset.storage.replication.factor' was supplied but isn't a known config. (org.apache.kafka.clients.consumer.ConsumerConfig:287)
[2018-07-24 04:39:34,584] WARN Catching up to assignment's config offset. (org.apache.kafka.connect.runtime.distributed.DistributedHerder:767)
[2018-07-24 04:46:03,141] ERROR WorkerSourceTask{id=cloud-source-test-july24-1014-0} Failed to flush, timed out while waiting for producer to flush outstanding 15603 messages (org.apache.kafka.connect.runtime.WorkerSourceTask:350)
[2018-07-24 04:46:03,162] ERROR WorkerSourceTask{id=cloud-source-test-july24-1014-0} Failed to commit offsets (org.apache.kafka.connect.runtime.SourceTaskOffsetCommitter:111)
[2018-07-24 04:46:23,848] ERROR WorkerSourceTask{id=cloud-source-test-july24-1014-0} Failed to flush, timed out while waiting for producer to flush outstanding 24072 messages (org.apache.kafka.connect.runtime.WorkerSourceTask:350)
[2018-07-24 04:46:23,852] ERROR WorkerSourceTask{id=cloud-source-test-july24-1014-0} Failed to commit offsets (org.apache.kafka.connect.runtime.SourceTaskOffsetCommitter:111)
[2018-07-24 04:46:59,156] ERROR WorkerSourceTask{id=cloud-source-test-july24-1014-0} Failed to flush, timed out while waiting for producer to flush outstanding 15974 messages (org.apache.kafka.connect.runtime.WorkerSourceTask:350)
[2018-07-24 04:46:59,161] ERROR WorkerSourceTask{id=cloud-source-test-july24-1014-0} Failed to commit offsets (org.apache.kafka.connect.runtime.SourceTaskOffsetCommitter:111)
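
To see how large the unflushed backlog is at each timeout, the counts can be pulled out of the worker log with a small script. This is only a rough sketch: the function name flush_timeouts and the sample list are made up for illustration, and the pattern assumes the log line format shown above.

```python
import re

# Matches the WorkerSourceTask flush-timeout errors in the format shown above.
FLUSH_TIMEOUT = re.compile(
    r"^\[(?P<ts>[^\]]+)\] ERROR WorkerSourceTask\{id=(?P<task>[^}]+)\} "
    r"Failed to flush, timed out while waiting for producer to flush "
    r"outstanding (?P<count>\d+) messages"
)


def flush_timeouts(lines):
    """Return (timestamp, task id, outstanding message count) per flush timeout."""
    found = []
    for line in lines:
        m = FLUSH_TIMEOUT.match(line.strip())
        if m:
            found.append((m.group("ts"), m.group("task"), int(m.group("count"))))
    return found


# Sample lines copied from the log above; in practice, pass an open log file.
sample = [
    "[2018-07-24 04:46:03,141] ERROR WorkerSourceTask{id=cloud-source-test-july24-1014-0} Failed to flush, timed out while waiting for producer to flush outstanding 15603 messages (org.apache.kafka.connect.runtime.WorkerSourceTask:350)",
    "[2018-07-24 04:46:03,162] ERROR WorkerSourceTask{id=cloud-source-test-july24-1014-0} Failed to commit offsets (org.apache.kafka.connect.runtime.SourceTaskOffsetCommitter:111)",
    "[2018-07-24 04:46:23,848] ERROR WorkerSourceTask{id=cloud-source-test-july24-1014-0} Failed to flush, timed out while waiting for producer to flush outstanding 24072 messages (org.apache.kafka.connect.runtime.WorkerSourceTask:350)",
    "[2018-07-24 04:46:59,156] ERROR WorkerSourceTask{id=cloud-source-test-july24-1014-0} Failed to flush, timed out while waiting for producer to flush outstanding 15974 messages (org.apache.kafka.connect.runtime.WorkerSourceTask:350)",
]

for ts, task, count in flush_timeouts(sample):
    print(f"{ts}  {task}  outstanding={count}")
```

On the excerpt above this picks up three timeouts, all for the same task, and skips the "Failed to commit offsets" lines, which carry no message count.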