Record CRC validation failure, please help


Cheng Chen

Dec 17, 2016, 7:04:16 AM
to Confluent Platform
Hey Confluent folks,
We are running Confluent Platform 3.1.1 and Kafka 0.10.1.0-cp2 in our Docker Swarm cluster, and we keep seeing this error when trying to consume messages:
[2016-12-13 02:08:13,846] ERROR Task dp threw an uncaught and unrecoverable exception (org.apache.kafka.connect.runtime.WorkerTask)
org.apache.kafka.common.KafkaException: Record for partition dp at offset 234 is invalid, cause: Record is corrupt (stored crc = 1133837813, computed crc = 2330297257)
	at org.apache.kafka.clients.consumer.internals.Fetcher.parseRecord(Fetcher.java:743)
	at org.apache.kafka.clients.consumer.internals.Fetcher.parseFetchedData(Fetcher.java:682)
	at org.apache.kafka.clients.consumer.internals.Fetcher.fetchedRecords(Fetcher.java:425)
	at org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1045)
	at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:979)
	at org.apache.kafka.connect.runtime.WorkerSinkTask.pollConsumer(WorkerSinkTask.java:317)
	at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:235)
	at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:172)
	at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:143)
	at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:140)
	at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:175)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
At first we thought it might be related to https://github.com/confluentinc/kafka/commit/d2acd676c3eb0c11d0042bc3b9ae314165c68443,
but that change to the CRC update function only adds a simple check. Can anyone kindly shed some light on this?
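In case it helps with reproducing, a minimal standalone consumer along these lines should hit the same validation path as the Connect worker (the topic, offset, and partition are taken from the error above, with partition 0 assumed; the bootstrap address is a placeholder):

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.KafkaException;
import org.apache.kafka.common.TopicPartition;

public class CrcRepro {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092"); // placeholder, point at your cluster
        props.put("group.id", "crc-debug");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        props.put("check.crcs", "true");          // the default; same validation the Connect consumer runs
        props.put("fetch.max.bytes", "52428800"); // 50 MB, matching the setting we were running with

        KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props);
        try {
            TopicPartition tp = new TopicPartition("dp", 0); // partition 0 assumed
            consumer.assign(Collections.singletonList(tp));
            consumer.seek(tp, 234L); // the offset reported as corrupt
            ConsumerRecords<byte[], byte[]> records = consumer.poll(10000);
            for (ConsumerRecord<byte[], byte[]> r : records) {
                System.out.println("offset " + r.offset() + " read OK");
            }
        } catch (KafkaException e) {
            // a corrupt record makes poll() throw, just like in the stack trace above
            System.out.println("validation failure reproduced: " + e.getMessage());
        } finally {
            consumer.close();
        }
    }
}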
Thanks in advance.
cheng

Cheng Chen

Dec 19, 2016, 4:03:27 AM
to Confluent Platform
Additional info: this happens quite consistently on Docker Swarm with multiple nodes.

Ewen Cheslack-Postava

Dec 20, 2016, 1:51:14 AM
to Confluent Platform
Cheng,

I don't have any immediate suggestions, but a CRC check failure is pretty unusual. It indicates data got corrupted somehow. Do you have an easily reproducible setup for this issue?
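One thing that might help narrow it down: check whether the corruption exists in the on-disk log on the broker, e.g. by running DumpLogSegments against the segment containing that offset (the path below is just a guess at your log dir and segment file):

bin/kafka-run-class.sh kafka.tools.DumpLogSegments \
  --files /var/lib/kafka/data/dp-0/00000000000000000000.log \
  --deep-iteration

With --deep-iteration it validates each message and prints an isvalid flag. If the data on disk checks out, the corruption is being introduced somewhere between the broker and the consumer.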

-Ewen


Cheng Chen

Dec 21, 2016, 2:44:14 AM
to Confluent Platform
Thanks for your reply, Ewen. It turns out that once we changed consumer.fetch.max.bytes from 50 MB down to 5 KB, the problem went away. Any idea why?
Also, this happens in distributed mode (3 nodes).
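For reference, the change was in the Connect worker config (connect-distributed.properties); the exact byte values here are approximate:

# before (the 0.10.1.0 default, ~50 MB):
# consumer.fetch.max.bytes=52428800
# after (~5 KB):
consumer.fetch.max.bytes=5120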