Problem: __consumer_offsets partition skew and possibly ignored retention in Kafka 10.0.1

Eugene Dvorkin

Dec 26, 2016, 11:38:48 AM

to Confluent Platform

Hi,

I noticed that one of the partitions of __consumer_offsets has grown very large:

0    __consumer_offsets-48
77G  __consumer_offsets-49
0    __consumer_offsets-5
0    __consumer_offsets-8
0    __consumer_offsets-9

That is 77 GB for quite a small load. This is a cluster of 5 servers, and the replication factor on this topic is 3.

Our retention policy is set to 100 MB, but apparently old data is not being deleted.

log.retention.bytes=104857600

kafka_log_cleanup_interval_mins: 1

Why is this happening? Is it a bug in Kafka? How can I clean it up?

Thanks

Dustin Cote

Dec 26, 2016, 7:47:14 PM

to confluent...@googlegroups.com

It's probably worth posting your whole broker configuration. I've seen this happen a lot to folks with log.cleaner.enable set to false. That can come about for a variety of reasons, but the property needs to be true for the internal offsets topic to be cleaned, because it's a compacted topic.
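
To illustrate why the cleaner matters for a compacted topic: instead of trimming old segments, compaction keeps only the most recent record for each key. Below is a minimal Python sketch of that idea, not Kafka's actual implementation. The `compact` function and the sample group and topic names are hypothetical, and the handling of tombstones (records with no value) is simplified.

```python
def compact(log):
    """Keep only the latest record for each key, in offset order.

    `log` is a list of (offset, key, value) tuples sorted by offset.
    A value of None stands for a tombstone, which removes the key.
    """
    latest = {}
    for offset, key, value in log:
        # A later record for the same key replaces the earlier one.
        latest[key] = (offset, key, value)
    survivors = [rec for rec in latest.values() if rec[2] is not None]
    return sorted(survivors, key=lambda rec: rec[0])


# Hypothetical offset commits: key = (group, topic, partition), value = committed offset.
log = [
    (0, ("group-a", "events", 0), 100),
    (1, ("group-a", "events", 0), 250),
    (2, ("group-b", "events", 0), 40),
    (3, ("group-a", "events", 0), 900),
    (4, ("group-b", "events", 0), None),
]

print(compact(log))
```

Only the last commit for group-a survives here. If the cleaner never runs, nothing performs this step, and every commit ever written to the partition stays on disk.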

--
You received this message because you are subscribed to the Google Groups "Confluent Platform" group.
To unsubscribe from this group and stop receiving emails from it, send an email to confluent-platform+unsub...@googlegroups.com.
To post to this group, send email to confluent-platform@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/confluent-platform/baaaef2b-3769-44af-939c-056a9fd4131c%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Eugene Dvorkin

Dec 27, 2016, 11:39:11 AM

to Confluent Platform

Hi Dustin,

Here is my broker configuration.

############################# Server Basics #############################

# The id of the broker. This must be set to a unique integer for each broker.

broker.id=1

# The port the socket server listens on

port=9092

#for rolling updates only

#log.message.format.version=0.9.0.0

#inter.broker.protocol.version=0.9.0.0

default.replication.factor=2

#Maximum message size a broker can replicate. Must be larger than message.max.bytes,

#or a broker can accept messages it cannot replicate, potentially resulting in data loss.

replica.fetch.max.bytes=40000000

############################# Socket Server Settings #############################

# Hostname the broker will bind to. If not set, the server will bind to all interfaces

host.name=10.204.25.29

delete.topic.enable=false

# Hostname the broker will advertise to producers and consumers. If not set, it uses the

# value for "host.name" if configured. Otherwise, it will use the value returned from

# java.net.InetAddress.getCanonicalHostName().

#advertised.host.name=<hostname routable by clients>

# The port to publish to ZooKeeper for clients to use. If this is not set,

# it will publish the same port that the broker binds to.

#advertised.port=<port accessible by clients>

# The number of threads handling network requests

num.network.threads=2

# The number of threads doing disk I/O

num.io.threads=2

# The send buffer (SO_SNDBUF) used by the socket server

socket.send.buffer.bytes=1048576

# The receive buffer (SO_RCVBUF) used by the socket server

socket.receive.buffer.bytes=1048576

# The maximum size of a request that the socket server will accept (protection against OOM)

socket.request.max.bytes=104857600

############################# Log Basics #############################

# A comma separated list of directories under which to store log files

log.dirs=/data/kafka

auto.create.topics.enable=true

# The number of logical partitions per topic per server. More partitions allow greater parallelism

# for consumption, but also mean more files.

num.partitions=3

############################# Log Flush Policy #############################

# Messages are immediately written to the filesystem but by default we only fsync() to sync

# the OS cache lazily. The following configurations control the flush of data to disk.

# There are a few important trade-offs here:

# 1. Durability: Unflushed data may be lost if you are not using replication.

# 2. Latency: Very large flush intervals may lead to latency spikes when the flush does occur as there will be a lot of data to flush.

# 3. Throughput: The flush is generally the most expensive operation, and a small flush interval may lead to excessive seeks.

# The settings below allow one to configure the flush policy to flush data after a period of time or

# every N messages (or both). This can be done globally and overridden on a per-topic basis.

# The number of messages to accept before forcing a flush of data to disk

log.flush.interval.messages=10000

# The maximum amount of time a message can sit in a log before we force a flush

log.flush.interval.ms=1000

############################# Log Retention Policy #############################

# The following configurations control the disposal of log segments. The policy can

# be set to delete segments after a period of time, or after a given size has accumulated.

# A segment will be deleted whenever *either* of these criteria are met. Deletion always happens

# from the end of the log.

# The minimum age of a log file to be eligible for deletion

log.retention.hours=24

# A size-based retention policy for logs. Segments are pruned from the log as long as the remaining

# segments don't drop below log.retention.bytes.

log.retention.bytes=104857600

# The maximum size of a log segment file. When this size is reached a new log segment will be created.

log.segment.bytes=104857600

#The maximum size of a message that the server can receive.

message.max.bytes=20000000

# The interval at which log segments are checked to see if they can be deleted according

# to the retention policies

log.retention.check.interval.ms=60000

# By default the log cleaner is disabled and the log retention policy will default to just delete segments after their retention expires.

# If log.cleaner.enable=true is set the cleaner will be enabled and individual logs can then be marked for log compaction.

log.cleaner.enable=false

############################# Zookeeper #############################

# Zookeeper connection string (see zookeeper docs for details).

# This is a comma separated host:port pairs, each corresponding to a zk

# server. e.g. "127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002".

# You can also append an optional chroot string to the urls to specify the

# root directory for all kafka znodes.

zookeeper.connect=10.204.25.93:2181,10.204.25.30:2181,10.204.25.200:2181,10.204.25.203:2181,10.204.25.12:2181

# Timeout in ms for connecting to zookeeper

zookeeper.connection.timeout.ms=1000000

***************************

Do you see anything wrong there? I am limiting the size of the logs by both time and size. Is that a problem?

Thanks

Dustin Cote

Dec 27, 2016, 11:51:44 AM

to confluent...@googlegroups.com

Yes, here's the problem:

log.cleaner.enable=false

Change this to:

log.cleaner.enable=true

There was an old wiki page with a sample configuration, carried over from a previous version, in which the log cleaner was disabled, and some people picked it up. That wiki page has since been deleted because the log cleaner should be on. The default is now true in your version (unlike what the comment in your file states), so please go ahead and make the change above. You should start seeing results after you restart the broker(s).
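
If you want to check the other brokers for the same setting, a small script along these lines could help. This is only a sketch in Python: the helper names `parse_properties` and `cleaner_enabled` are hypothetical, and the fallback of true when the key is absent is an assumption based on the default mentioned above.

```python
def parse_properties(text):
    """Parse Java-style key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def cleaner_enabled(props, default=True):
    """Return the effective log.cleaner.enable value.

    The default of True is an assumption taken from the reply above
    ("the default is now true in your version").
    """
    raw = props.get("log.cleaner.enable")
    if raw is None:
        return default
    return raw.lower() == "true"


# Excerpt from the configuration posted earlier in this thread.
sample = """
# size-based retention
log.retention.bytes=104857600
log.cleaner.enable=false
"""

props = parse_properties(sample)
print(cleaner_enabled(props))  # prints False
```

In practice you would read the text from each broker's own properties file rather than from an inline string.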

--

Dustin Cote

Customer Operations Engineer | Confluent
