Confluent Kafka 3.2.2 - rebalancing not happening


karan alang

Aug 1, 2017, 7:47:58 PM
to Confluent Platform
Hi All - 
I'm trying to rebalance a Kafka topic (see http://docs.confluent.io/current/kafka/rebalancer/rebalancer.html), but the rebalancing is not working.


Here is what i'm doing ->
- i've 4 Kafka brokers & i've made changes to the server.properties file to enable Confluent Metrics Reporter.
(attached are the server.properties of the 4 brokers)

-> Created a topic specifying Replica assignment

./bin/kafka-topics --create --topic topic-a1 --replica-assignment 0:1,0:1,0:1,0:1 --zookeeper nwk2-bdp-kafka-04.gdcs-qa.apple.com:3181

-> describe topic 

./bin/kafka-topics --describe --topic topic-a1 --zookeeper nwk2-bdp-kafka-04.gdcs-qa.apple.com:3181
Topic:topic-a1 PartitionCount:4 ReplicationFactor:2 Configs:
Topic: topic-a1 Partition: 0 Leader: 1 Replicas: 1,0 Isr: 0,1
Topic: topic-a1 Partition: 1 Leader: 1 Replicas: 1,0 Isr: 0,1
Topic: topic-a1 Partition: 2 Leader: 0 Replicas: 0,1 Isr: 0,1
Topic: topic-a1 Partition: 3 Leader: 0 Replicas: 0,1 Isr: 0,1
 
 
-> Produce data into topics, using the following command

./bin/kafka-producer-perf-test --topic topic-a1 --num-records 200000 --record-size 1000 --throughput 10000000 --producer-props bootstrap.servers=nwk2-bdp-kafka-04.gdcs-qa.apple.com:9092,nwk2-bdp-kafka-04.gdcs-qa.apple.com:9082,nwk2-bdp-kafka-04.gdcs-qa.apple.com:9072,nwk2-bdp-kafka-04.gdcs-qa.apple.com:9062 


-> Force Creation of offsets topic, by creating a Consumer (NOT SURE WHAT THIS IS FOR ???) :

./bin/kafka-consumer-perf-test --topic topic-a1 --broker-list nwk2-bdp-kafka-04.gdcs-qa.apple.com:9092,nwk2-bdp-kafka-04.gdcs-qa.apple.com:9082,nwk2-bdp-kafka-04.gdcs-qa.apple.com:9072,nwk2-bdp-kafka-04.gdcs-qa.apple.com:9062 --messages 10


-> run the following command to rebalance 

The plan that is presented does not really do any rebalancing ->

./bin/confluent-rebalancer execute --zookeeper nwk2-bdp-kafka-04.gdcs-qa.apple.com:3181 --metrics-bootstrap-server nwk2-bdp-kafka-04.gdcs-qa.apple.com:9092,nwk2-bdp-kafka-04.gdcs-qa.apple.com:9082,nwk2-bdp-kafka-04.gdcs-qa.apple.com:9072,nwk2-bdp-kafka-04.gdcs-qa.apple.com:9062 --throttle 10000000 --verbose
Computing the rebalance plan (this may take a while) ...
You are about to move 0 replica(s) for 0 partitions to 0 broker(s) with total size 0 MB.
The preferred leader for 2 partition(s) will be changed.
In total, the assignment for 2 partitions will be changed.
The minimum free volume space is set to 20.0%.
The following brokers will have less than 40% of free volume space during the rebalance:
Broker     Current Size (MB)  Size During Rebalance (MB)   Free % During Rebalance      Size After Rebalance (MB)    Free % After Rebalance      
0          4,021.1            4,021.1                      14.2                         4,021.1                      14.2                        
1          1,240.8            1,240.8                      14.2                         1,240.8                      14.2                        
2          620.4              620.4                        14.2                         620.4                        14.2                        
3          0                  0                            14.2                         0                            14.2                        
Min/max stats for brokers (before -> after):
Type  Leader Count                 Replica Count                Size (MB)                          
Min   0 (id: 3) -> 0 (id: 3)       0 (id: 3) -> 0 (id: 3)       0 (id: 3) -> 0 (id: 3)             
Max   125 (id: 0) -> 123 (id: 0)   127 (id: 0) -> 127 (id: 0)   4,021.1 (id: 0) -> 4,021.1 (id: 0) 
No racks are defined.
Broker stats (before -> after):
Broker     Leader Count    Replica Count   Size (MB)            Free Space (%)      
0          125 -> 123      127 -> 127      4,021.1 -> 4,021.1   14.2 -> 14.2        
1          3 -> 5          12 -> 12        1,240.8 -> 1,240.8   14.2 -> 14.2        
2          2 -> 2          3 -> 3          620.4 -> 620.4       14.2 -> 14.2        
3          0 -> 0          0 -> 0          0 -> 0               14.2 -> 14.2        
Would you like to continue? (y/n): y
The rebalance has been started, run `status` to check progress.
Warning: You must run the `status` or `finish` command periodically, until the rebalance completes, to ensure the throttle is removed. You can also alter the throttle by re-running the execute command passing a new value.



-> I describe the topic, after rebalancing

./bin/kafka-topics --describe --topic topic-a1 --zookeeper nwk2-bdp-kafka-04.gdcs-qa.apple.com:3181 
Topic:topic-a1 PartitionCount:4 ReplicationFactor:2 Configs:
Topic: topic-a1 Partition: 0 Leader: 1 Replicas: 1,0 Isr: 0,1
Topic: topic-a1 Partition: 1 Leader: 1 Replicas: 1,0 Isr: 0,1
Topic: topic-a1 Partition: 2 Leader: 0 Replicas: 0,1 Isr: 0,1
Topic: topic-a1 Partition: 3 Leader: 0 Replicas: 0,1 Isr: 0,1



My expectation was that the topic will have all 4 brokers as leaders for the 4 partitions, but that does not seem to be happening.
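
For reference, a minimal sketch of how the leader spread could be tallied from the describe output above. The function name leaders_per_broker is made up for illustration and is not part of Kafka or Confluent Platform; it only assumes the `Leader: <id>` layout shown in this thread.

```shell
# Hypothetical helper: reads `kafka-topics --describe` output on stdin and
# prints one "<broker id> <leader count>" line per broker, sorted by broker id.
leaders_per_broker() {
    awk '{
        # Look for the "Leader:" label and count the broker id that follows it.
        for (i = 1; i < NF; i++)
            if ($i == "Leader:") count[$(i + 1)]++
    }
    END { for (b in count) print b, count[b] }' | sort -n
}

# Example use (not run here):
#   ./bin/kafka-topics --describe --topic topic-a1 --zookeeper <zk-host>:<port> | leaders_per_broker
```

For the output above this prints `0 2` and `1 2`. Every partition of topic-a1 lists only brokers 0 and 1 under Replicas, so only those two brokers can lead its partitions unless replicas are moved.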

Any ideas what the issue is ?
kafka-server.properties.txt
kafka-server1.properties.txt
kafka-server2.properties.txt
kafka-server3.properties.txt

karan alang

Aug 2, 2017, 2:11:40 AM
to confluent...@googlegroups.com
Hello, here is the update ..

when i ran script - kafka-preferred-replica-election, it did the re-election as required.


./bin/kafka-preferred-replica-election --zookeeper localhost:3181

so does that mean that i need to run the  script -> ./bin/confluent-rebalancer to rebalance the data, 
but for the leader election, the script to be run is ->  ./bin/kafka-preferred-replica-election

The documentation mentions this  (link - http://docs.confluent.io/current/kafka/rebalancer/rebalancer.html)

The confluent-rebalancer tool balances data so that the number of leaders and disk usage are even across brokers and racks on a per topic and cluster level while minimising data movement.   

seems there is a disconnect here, pls let me know if anyone has inputs. 


--
You received this message because you are subscribed to a topic in the Google Groups "Confluent Platform" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/confluent-platform/xeyNQ5cP6fU/unsubscribe.
To unsubscribe from this group and all its topics, send an email to confluent-platform+unsub...@googlegroups.com.
To post to this group, send email to confluent-platform@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/confluent-platform/7d93fe32-aa85-4ada-b7b7-f3dd44f8de81%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Ismael Juma

Aug 2, 2017, 5:23:22 AM
to Confluent Platform
Hi Karan,

Answers inline.


On Wednesday, 2 August 2017 07:11:40 UTC+1, karan alang wrote:
when i ran script - kafka-preferred-replica-election, it did the re-election as required.

./bin/kafka-preferred-replica-election --zookeeper localhost:3181

so does that mean that i need to run the  script -> ./bin/confluent-rebalancer to rebalance the data, 
but for the leader election, the script to be run is ->  ./bin/kafka-preferred-replica-election

This is explained here:


"If auto.leader.rebalance.enable is disabled on your brokers, run the preferred leader election tool after the rebalance completes. This will ensure that the actual leaders are balanced (not just the preferred leaders)."

The documentation mentions this  (link - http://docs.confluent.io/current/kafka/rebalancer/rebalancer.html)

The confluent-rebalancer tool balances data so that the number of leaders and disk usage are even across brokers and racks on a per topic and cluster level while minimising data movement.   

seems there is a disconnect here, pls let me know if anyone has inputs. 

The rebalancer updates the replica assignment so that the leaders are balanced, but the preferred leader election is done periodically (every 5 minutes by default) if `auto.leader.rebalance.enable` is true. If `auto.leader.rebalance.enable` is false, then `./bin/kafka-preferred-replica-election ` has to be invoked for preferred leader election to take place.
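
To make that concrete, here is a rough sketch of the two checks involved: whether a broker's server.properties turns automatic election off, and which partitions are currently led by a broker other than the preferred leader (the first entry in Replicas). The function names are hypothetical helpers, not Kafka tools, and the sketch assumes the broker default of `auto.leader.rebalance.enable=true` applies when the key is absent or commented out.

```shell
# Hypothetical helper: prints "manual" if the given server.properties
# explicitly sets auto.leader.rebalance.enable=false, otherwise "automatic"
# (assumption: the default of true applies when the key is not set).
leader_election_mode() {
    props_file="$1"
    # Keep only the value of the last uncommented assignment, if any.
    value=$(sed -n 's/^[[:space:]]*auto\.leader\.rebalance\.enable[[:space:]]*=[[:space:]]*//p' "$props_file" \
        | tail -n 1 | tr -d '[:space:]')
    if [ "$value" = "false" ]; then
        echo "manual"
    else
        echo "automatic"
    fi
}

# Hypothetical helper: reads `kafka-topics --describe` output on stdin and
# prints the partitions whose current leader is not the first listed replica.
non_preferred_partitions() {
    awk '{
        leader = ""; replicas = ""; partition = ""
        for (i = 1; i < NF; i++) {
            if ($i == "Partition:") partition = $(i + 1)
            if ($i == "Leader:")    leader    = $(i + 1)
            if ($i == "Replicas:")  replicas  = $(i + 1)
        }
        # Skip the topic summary line, which has no leader or replica list.
        if (leader == "" || replicas == "") next
        split(replicas, r, ",")
        if (r[1] != leader) print partition
    }'
}
```

If the first check prints `manual` and the second still lists partitions after the rebalance completes, that is the case where `./bin/kafka-preferred-replica-election` has to be run.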

On Tue, Aug 1, 2017 at 4:47 PM, karan alang <karan...@gmail.com> wrote:
-> Force Creation of offsets topic, by creating a Consumer (NOT SURE WHAT THIS IS FOR ???) :

The offsets topic is an internal topic where Kafka stores consumer offsets and group management data. It's created on demand and adds a number of partitions to the cluster. We consume some data to trigger the creation of this topic at a predictable time for the purpose of the quickstart. In a real environment, it would have been created already.
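
A small sketch for checking whether the offsets topic already exists, assuming its standard name `__consumer_offsets` and one topic name per line as printed by `kafka-topics --list`. The function name is made up for illustration.

```shell
# Hypothetical helper: reads a topic listing on stdin (one topic per line)
# and succeeds only if the internal offsets topic is present.
offsets_topic_exists() {
    grep -Eq '^__consumer_offsets([[:space:]]|$)'
}

# Example use (not run here):
#   ./bin/kafka-topics --list --zookeeper <zk-host>:<port> | offsets_topic_exists \
#       && echo "offsets topic already created"
```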

-> run the following command to rebalance 

The plan that is presented does not really do any rebalancing ->

That's because there's not enough free disk space. There's only 14.2% free (as can be seen below), but the minimum threshold is 20% by default. You can change this by passing --min-free-volume-space-percentage=5 (for example).

./bin/confluent-rebalancer execute --zookeeper nwk2-bdp-kafka-04.gdcs-qa.apple.com:3181 --metrics-bootstrap-server nwk2-bdp-kafka-04.gdcs-qa.apple.com:9092,nwk2-bdp-kafka-04.gdcs-qa.apple.com:9082,nwk2-bdp-kafka-04.gdcs-qa.apple.com:9072,nwk2-bdp-kafka-04.gdcs-qa.apple.com:9062 --throttle 10000000 --verbose
Computing the rebalance plan (this may take a while) ...
You are about to move 0 replica(s) for 0 partitions to 0 broker(s) with total size 0 MB.
The preferred leader for 2 partition(s) will be changed.
In total, the assignment for 2 partitions will be changed.
The minimum free volume space is set to 20.0%.
The following brokers will have less than 40% of free volume space during the rebalance:
Broker     Current Size (MB)  Size During Rebalance (MB)   Free % During Rebalance      Size After Rebalance (MB)    Free % After Rebalance      
0          4,021.1            4,021.1                      14.2                         4,021.1                      14.2                        
1          1,240.8            1,240.8                      14.2                         1,240.8                      14.2                        
2          620.4              620.4                        14.2                         620.4                        14.2                        
3          0                  0                            14.2                         0                            14.2
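
As a rough illustration of that check, the sketch below reads the broker table printed by the rebalancer and lists the brokers whose "Free % During Rebalance" column is under a given threshold. It is a hypothetical helper based only on the table layout above, not something the rebalancer ships.

```shell
# Hypothetical helper: reads the rebalancer's broker size table on stdin and
# prints the ids of brokers whose "Free % During Rebalance" (4th column) is
# below the threshold passed as the first argument.
brokers_below_free_space() {
    threshold="$1"
    # Data rows start with a numeric broker id and have six columns;
    # the header row is skipped because its first field is not a number.
    awk -v t="$threshold" \
        '$1 ~ /^[0-9]+$/ && NF >= 6 && ($4 + 0) < (t + 0) { print $1 }'
}

# Example use (not run here): save the plan output to a file, then
#   brokers_below_free_space 20 < plan-output.txt
```

With the table above and a threshold of 20 it lists all four brokers; with a threshold of 5 it lists none, which matches the suggestion to pass --min-free-volume-space-percentage=5.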

Hope that helps,
Ismael 

karan alang

Aug 2, 2017, 3:12:52 PM
to confluent...@googlegroups.com
Thanks Ismael ... i was able to get this to work based on your comments !


karan alang

Aug 2, 2017, 4:00:54 PM
to Confluent Platform

Hello - However, i still need some clarification wrt Confluent Metric Reporter & Auto Rebalancing
(links referred 
 
As i understand, the Confluent metrics are stored in topic -> _confluent-metrics

I have 4 brokers, all on the same machine with different port numbers.
I've enabled metrics collection on the 4 brokers, by modifying the server.properties files for all 4 brokers

The question is - what should the value of the property  'confluent.metrics.reporter.bootstrap.servers'
be set to ?


should it be -> 

1) confluent.metrics.reporter.bootstrap.servers=<host1>:<port1>,<host1>:<port2>,<host1>:<port3>,<host1>:<port4>

(i.e. the parameter points to all the brokers)

OR 

2) confluent.metrics.reporter.bootstrap.servers=<host1>:<port1> 

(i.e. parameter points to corresponding broker1:port1 )

Currently i have option 2 setup & when i describe the topic _confluent-metrics, i see the following ->

./bin/kafka-topics --describe --topic _confluent-metrics --zookeeper nwk2-bdp-kafka-04.gdcs-qa.apple.com:3181
Topic:_confluent-metrics PartitionCount:12 ReplicationFactor:1 Configs:retention.ms=259200000,segment.ms=14400000,min.insync.replicas=1,retention.bytes=-1
Topic: _confluent-metrics Partition: 0 Leader: 0 Replicas: 0 Isr: 0
Topic: _confluent-metrics Partition: 1 Leader: 2 Replicas: 2 Isr: 2
Topic: _confluent-metrics Partition: 2 Leader: 0 Replicas: 0 Isr: 0
Topic: _confluent-metrics Partition: 3 Leader: 3 Replicas: 3 Isr: 3
Topic: _confluent-metrics Partition: 4 Leader: 0 Replicas: 0 Isr: 0
Topic: _confluent-metrics Partition: 5 Leader: 1 Replicas: 1 Isr: 1
Topic: _confluent-metrics Partition: 6 Leader: 3 Replicas: 3 Isr: 3
Topic: _confluent-metrics Partition: 7 Leader: 2 Replicas: 2 Isr: 2
Topic: _confluent-metrics Partition: 8 Leader: 1 Replicas: 1 Isr: 1
Topic: _confluent-metrics Partition: 9 Leader: 3 Replicas: 3 Isr: 3
Topic: _confluent-metrics Partition: 10 Leader: 1 Replicas: 1 Isr: 1
Topic: _confluent-metrics Partition: 11 Leader: 2 Replicas: 2 Isr: 2
Clearly, this is not desirable, since there is only one broker in the Replica & ISR for each partition ?


The reason why i ask the above ->

when i try to decommission the broker 1, using the command ->

./bin/confluent-rebalancer execute --zookeeper nwk2-bdp-kafka-04.gdcs-qa.apple.com:2181 --metrics-bootstrap-server nwk2-bdp-kafka-04.gdcs-qa.apple.com:9092,nwk2-bdp-kafka-04.gdcs-qa.apple.com:9092,nwk2-bdp-kafka-04.gdcs-qa.apple.com:9082,nwk2-bdp-kafka-04.gdcs-qa.apple.com:9072,nwk2-bdp-kafka-04.gdcs-qa.apple.com:9062 --throttle 100000 --remove-broker-ids 1

(i.e. --metrics-bootstrap-server points to <broker1>:<port1>,<broker1>:<port2>,<broker1>:<port3>,<broker1>:<port4>)

it gives the following error ->

./bin/confluent-rebalancer execute --zookeeper nwk2-bdp-kafka-04.gdcs-qa.apple.com:2181 --metrics-bootstrap-server nwk2-bdp-kafka-04.gdcs-qa.apple.com:9092,nwk2-bdp-kafka-04.gdcs-qa.apple.com:9092,nwk2-bdp-kafka-04.gdcs-qa.apple.com:9082,nwk2-bdp-kafka-04.gdcs-qa.apple.com:9072,nwk2-bdp-kafka-04.gdcs-qa.apple.com:9062 --throttle 100000 --remove-broker-ids 1

Computing the rebalance plan (this may take a while) ...
Unexpected exception: Did not receive a cluster id to filter on. Please ensure that you are using Kafka >= 0.10.1, and that you have some brokers running in the cluster.
java.lang.IllegalArgumentException: Did not receive a cluster id to filter on. Please ensure that you are using Kafka >= 0.10.1, and that you have some brokers running in the cluster.
at io.confluent.kafka.databalancing.metric.MetricsCollector.collectMetrics(MetricsCollector.java:63)
at io.confluent.kafka.databalancing.DefaultRebalancer.metrics(DefaultRebalancer.java:150)
at io.confluent.kafka.databalancing.DefaultRebalancer.proposeRebalance(DefaultRebalancer.java:67)
at io.confluent.kafka.databalancing.ConfluentRebalancerCommand$Execute.doRun(ConfluentRebalancerCommand.java:242)
at io.confluent.kafka.databalancing.ConfluentRebalancerCommand$BaseRebalanceCommand.run(ConfluentRebalancerCommand.java:120)
at io.confluent.kafka.databalancing.ConfluentRebalancerCommand.run(ConfluentRebalancerCommand.java:61)
at io.confluent.kafka.databalancing.ConfluentRebalancerCommand.main(ConfluentRebalancerCommand.java:36)
However, when i provide only one Broker:Port, i don't see the error shown above.
Pls let me know what needs to be done for this ?

thanks

karan alang

Aug 2, 2017, 6:05:28 PM
to confluent...@googlegroups.com
Hello - here is update on this 


The error below was because i was giving incorrect zookeeper port :( .. so this is not really an issue
 
Unexpected exception: Did not receive a cluster id to filter on. Please ensure that you are using Kafka >= 0.10.1, and that you have some brokers running in the cluster.
java.lang.IllegalArgumentException: Did not receive a cluster id to filter on. Please ensure that you are using Kafka >= 0.10.1, and that you have some brokers running in the cluster.
at io.confluent.kafka.databalancing.metric.MetricsCollector.collectMetrics(MetricsCollector.java:63)


However, the other questions remain ..

i.e. 

The question is - what should the value of the property  'confluent.metrics.reporter.bootstrap.servers'
be set to ?

should it be -> 
1) confluent.metrics.reporter.bootstrap.servers=<host1>:<port1>,<host1>:<port2>,<host1>:<port3>,<host1>:<port4>
(i.e. the parameter points to all the brokers)
OR 
2) confluent.metrics.reporter.bootstrap.servers=<host1>:<port1> 
(i.e. parameter points to corresponding broker1:port1 )
 

and - why does the topic _confluent-metrics have a replication factor of only one ?
btw, i checked the server.properties of the brokers and realized that the property confluent.metrics.reporter.topic.replicas=1 was uncommented in one of the broker properties files.
I assume this property dictates the replication factor of the topic - _confluent-metrics ? 



Ismael Juma

Aug 2, 2017, 6:28:52 PM
to Confluent Platform
Hi Karan,

Answers inline.

On Wednesday, 2 August 2017 23:05:28 UTC+1, karan alang wrote:
Hello - here is update on this 


The error below was because i was giving incorrect zookeeper port :( .. so this is not really an issue
 
Unexpected exception: Did not receive a cluster id to filter on. Please ensure that you are using Kafka >= 0.10.1, and that you have some brokers running in the cluster.
java.lang.IllegalArgumentException: Did not receive a cluster id to filter on. Please ensure that you are using Kafka >= 0.10.1, and that you have some brokers running in the cluster.
at io.confluent.kafka.databalancing.metric.MetricsCollector.collectMetrics(MetricsCollector.java:63)

Glad that you solved that issue.

However, the other questions remain ..

i.e. 

The question is - what should the value of the property  'confluent.metrics.reporter.bootstrap.servers'
be set to ?

should it be -> 
1) confluent.metrics.reporter.bootstrap.servers=<host1>:<port1>,<host1>:<port2>,<host1>:<port3>,<host1>:<port4>
(i.e. the parameter points to all the brokers)
OR 
2) confluent.metrics.reporter.bootstrap.servers=<host1>:<port1> 
(i.e. parameter points to corresponding broker1:port1)

A minimum of one broker is required. However, if that broker is down, then the reporter will fail. If multiple brokers are provided, then the reporter will work as long as one of the brokers is up.
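
For a setup like yours (one host, several ports), a minimal sketch of building the multi-broker value could look like this. The function name is hypothetical; the host and ports in the example are the ones used earlier in this thread.

```shell
# Hypothetical helper: joins one host with several ports into a
# comma-separated host:port list suitable for a bootstrap servers setting.
bootstrap_servers() {
    host="$1"; shift
    list=""
    for port in "$@"; do
        # Prefix a comma only when the list already has an entry.
        list="${list:+$list,}$host:$port"
    done
    printf '%s\n' "$list"
}

# Example use (not run here):
#   echo "confluent.metrics.reporter.bootstrap.servers=$(bootstrap_servers \
#       nwk2-bdp-kafka-04.gdcs-qa.apple.com 9092 9082 9072 9062)"
```

Listing all four brokers this way means the reporter keeps working as long as any one of them is reachable.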

and - why does the topic _confluent-metrics have a replication factor of only one ?
btw, i checked the server.properties of the brokers and realized that the property confluent.metrics.reporter.topic.replicas=1 was uncommented in one of the broker properties files.
I assume this property dictates the replication factor of the topic - _confluent-metrics ? 

That's correct. By default, that property is commented out, but it's there for the case where someone is trying out the metrics reporter with a single broker.
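
To find which broker has the property uncommented, something like the following could be run over the four properties files. The helper name is hypothetical; the property name is the one quoted above.

```shell
# Hypothetical helper: for each properties file given as an argument, prints
# the uncommented value of confluent.metrics.reporter.topic.replicas, or
# "(not set)" when the key is absent or commented out.
metrics_topic_replicas() {
    for f in "$@"; do
        value=$(sed -n 's/^[[:space:]]*confluent\.metrics\.reporter\.topic\.replicas[[:space:]]*=[[:space:]]*//p' "$f" \
            | tail -n 1 | tr -d '[:space:]')
        [ -n "$value" ] || value="(not set)"
        printf '%s: %s\n' "$f" "$value"
    done
}

# Example use (not run here), with the attachment names from this thread:
#   metrics_topic_replicas kafka-server.properties.txt kafka-server1.properties.txt \
#       kafka-server2.properties.txt kafka-server3.properties.txt
```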

Hope this helps, let us know if you have additional questions.

Ismael

karan alang

Aug 3, 2017, 2:36:13 AM
to confluent...@googlegroups.com
Thanks, Ismael !
