Kafka Connect going down due to large file load


Sreejith S

Sep 12, 2018, 8:43:48 AM
to Confluent Platform
Hi All,

We are seeing odd behavior in our Connect cluster: when we run multiple connectors, each loading about 4 GB of data, one connector ends up with "Failed to commit offsets" errors, and this eventually brings the Connect framework down.

The scenario is: if I run only one file connector with 4 GB of data, it processes the file completely without any exceptions. But the moment we spawn a new connector with another 4 GB file, the first connector stops producing records and one of the two connectors runs into the exception below.

This happens whenever multiple connectors are dealing with large files.

Connect logs after starting the first connector (truncated):
set:21018483}, Current: {epoch:2, offset:1608878} for Partition: s3-topic1-0 (kafka.server.epoch.LeaderEpochFileCache)
[2018-07-24 04:47:05,356] WARN Received a PartitionLeaderEpoch assignment for an epoch < latestEpoch. This implies messages have arrived out of order. New: {epoch:0, offset:21018506}, Current: {epoch:2, offset:1608878} for Partition: s3-topic1-0 (kafka.server.epoch.LeaderEpochFileCache)
[2018-07-24 04:47:05,356] WARN Received a PartitionLeaderEpoch assignment for an epoch < latestEpoch. This implies messages have arrived out of order. New: {epoch:0, offset:21018529}, Current: {epoch:2, offset:1608878} for Partition: s3-topic1-0 (kafka.server.epoch.LeaderEpochFileCache)
[2018-07-24 04:47:05,356] WARN Received a PartitionLeaderEpoch assignment for an epoch < latestEpoch. This implies messages have arrived out of order. New: {epoch:0, offset:21018552}, Current: {epoch:2, offset:1608878} for Partition: s3-topic1-0 (kafka.server.epoch.LeaderEpochFileCache)
[2018-07-24 04:47:05,357] WARN Received a PartitionLeaderEpoch assignment for an epoch < latestEpoch. This implies messages have arrived out of order. New: {epoch:0, offset:21018575}, Current: {epoch:2, offset:1608878} for Partition: s3-topic1-0 (kafka.server.epoch.LeaderEpochFileCache)
[2018-07-24 04:47:05,357] WARN Received a PartitionLeaderEpoch assignment for an epoch < latestEpoch. This implies messages have arrived out of order. New: {epoch:0, offset:21018598}, Current: {epoch:2, offset:1608878} for Partition: s3-topic1-0 (kafka.server.epoch.LeaderEpochFileCache)
[2018-07-24 04:47:05,358] WARN Received a PartitionLeaderEpoch assignment for an epoch < latestEpoch. This implies messages have arrived out of order. New: {epoch:0, offset:21018621}, Current: {epoch:2, 


Connect logs after posting the second connector:

sumerConfig:287)
[2018-07-24 04:39:34,525] WARN The configuration 'value.converter.schemas.enable' was supplied but isn't a known config. (org.apache.kafka.clients.consumer.ConsumerConfig:287)
[2018-07-24 04:39:34,526] WARN The configuration 'internal.value.converter' was supplied but isn't a known config. (org.apache.kafka.clients.consumer.ConsumerConfig:287)
[2018-07-24 04:39:34,526] WARN The configuration 'offset.storage.replication.factor' was supplied but isn't a known config. (org.apache.kafka.clients.consumer.ConsumerConfig:287)
[2018-07-24 04:39:34,584] WARN Catching up to assignment's config offset. (org.apache.kafka.connect.runtime.distributed.DistributedHerder:767)
[2018-07-24 04:46:03,141] ERROR WorkerSourceTask{id=cloud-source-test-july24-1014-0} Failed to flush, timed out while waiting for producer to flush outstanding 15603 messages (org.apache.kafka.connect.runtime.WorkerSourceTask:350)
[2018-07-24 04:46:03,162] ERROR WorkerSourceTask{id=cloud-source-test-july24-1014-0} Failed to commit offsets (org.apache.kafka.connect.runtime.SourceTaskOffsetCommitter:111)
[2018-07-24 04:46:23,848] ERROR WorkerSourceTask{id=cloud-source-test-july24-1014-0} Failed to flush, timed out while waiting for producer to flush outstanding 24072 messages (org.apache.kafka.connect.runtime.WorkerSourceTask:350)
[2018-07-24 04:46:23,852] ERROR WorkerSourceTask{id=cloud-source-test-july24-1014-0} Failed to commit offsets (org.apache.kafka.connect.runtime.SourceTaskOffsetCommitter:111)
[2018-07-24 04:46:59,156] ERROR WorkerSourceTask{id=cloud-source-test-july24-1014-0} Failed to flush, timed out while waiting for producer to flush outstanding 15974 messages (org.apache.kafka.connect.runtime.WorkerSourceTask:350)
[2018-07-24 04:46:59,161] ERROR WorkerSourceTask{id=cloud-source-test-july24-1014-0} Failed to commit offsets (org.apache.kafka.connect.runtime.SourceTaskOffsetCommitter:111)


We have tried increasing offset.flush.timeout.ms, decreasing offset.flush.interval.ms, reducing buffer.memory, and lowering batch.size, but the issue still persists. I assume it has nothing to do with the connector logic. What could the problem be here? Is there any resolution for this?
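For reference, the flush-related tuning attempts above were worker-level settings along these lines (the values below are illustrative assumptions, not the exact ones we used):

```properties
# connect-distributed.properties (illustrative values only)
# Give the producer longer to flush outstanding records before an
# offset commit attempt is declared timed out
offset.flush.timeout.ms=60000
# Commit offsets more often so fewer records are outstanding per flush
offset.flush.interval.ms=10000
```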

Thank You 
Srijith

Amit Sahu

Sep 12, 2018, 12:04:03 PM
to confluent...@googlegroups.com
Can you share the worker properties file?

Regards, 
Amit 

--
You received this message because you are subscribed to the Google Groups "Confluent Platform" group.
To unsubscribe from this group and stop receiving emails from it, send an email to confluent-platform+unsub...@googlegroups.com.
To post to this group, send email to confluent-platform@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/confluent-platform/511c8897-a958-4ee3-9e5a-3c659f021ae2%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Sreejith S

Sep 12, 2018, 12:22:59 PM
to confluent...@googlegroups.com
Hi Amit,

We haven't made any major changes to the broker or the Connect node properties; most are defaults.

You can see it here.


Could this actually be due to any of these properties? Or is there an issue with the producer within the Connect framework?

Thank You

Amit Sahu

Sep 12, 2018, 2:31:28 PM
to confluent...@googlegroups.com
Hi,
Yes, you haven't provided the producer configs in the worker properties file.

All producer configs in the worker properties must carry the producer. prefix, e.g. producer.buffer.memory; I've given an example at the end.

You can find all the properties for initializing a production-ready producer on the official Confluent site.

Use values according to your requirements.

producer.batch.size=
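For instance, a sketch of how such overrides might look in the worker properties file (the specific values below are my assumptions and should be tuned for your workload):

```properties
# connect-distributed.properties -- producer overrides (illustrative values)
# Worker-level producer settings use the "producer." prefix
producer.buffer.memory=67108864
producer.batch.size=65536
producer.linger.ms=50
producer.request.timeout.ms=60000
```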


Regards, 
Amit 


Sreejith S

Sep 12, 2018, 10:59:14 PM
to confluent...@googlegroups.com
Thanks Amit, 

I had overridden the producer configs, and since that did not help, I reverted the change. So the shared config does not include those overrides.

But why do the default producer configs fail in this case?



Thanks
Srijith

Amit Sahu

Sep 13, 2018, 2:54:53 AM
to confluent...@googlegroups.com
Hi Srijith,
The default producer configuration won't work well here; it is tuned for fairly light loads. That's why we need to experiment a bit and decide what's best for our workload. :-)
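For context, the stock Kafka producer defaults are quite conservative for bulk file loads (the values below are the standard defaults from the producer configuration reference; which ones matter most for this case is my assumption):

```properties
# Stock Kafka producer defaults relevant to bulk file loads
buffer.memory=33554432      # 32 MB total buffer for unsent records
batch.size=16384            # 16 KB per-partition batch
linger.ms=0                 # send batches as soon as possible
max.block.ms=60000          # how long send() blocks when the buffer is full
request.timeout.ms=30000    # how long to wait for a broker response
```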

Regards,
Amit


Sreejith S

Sep 13, 2018, 3:09:39 AM
to confluent...@googlegroups.com
Thank you, Amit.

Let me try some changes on the producer and will update.
