CPU usage differs between brokers

Elad Eldor

Apr 12, 2021, 4:49:52 PM

to Confluent Platform

Hi everyone,

We have a Kafka cluster on AWS EC2 machines with 12 brokers, each with:
- 12 cores
- 93 GB RAM
- a single ephemeral 6.2 TB disk

We started with 10 brokers, which reached a total of 80% CPU utilization (50% us, 20% sy, 10% si) and a load average of ~20. Most of the CPU usage comes from the Kafka broker process.

So we added 2 more brokers.

The 2 new brokers receive the same amount of data as the 10 old brokers, but their CPU utilization is much lower (us + sy + si totals only 25%). Here too, most of the CPU usage comes from the Kafka broker process.

The only difference between the old and new brokers is the number of partitions each one hosts:

- each old broker has 250 leader partitions (500 partitions in total)
- each new broker has 125 leader partitions (250 partitions in total)

The totals are twice the leader counts because the replication factor is 2.

The old and new brokers receive the same amount of data because the topics assigned to the new brokers carry almost all of the messages arriving at the cluster. The partitions that aren't assigned to the new brokers carry very little traffic.

dstat on both the new and old brokers shows ~140 MB of network recv and ~200 MB of network send.

However, it also shows that the new brokers have 100k interrupts and 70k context switches, while the old brokers have 200k interrupts and 140k context switches.

Has anyone encountered this kind of issue, where new brokers receive almost the same volume of events as the old ones but their CPU usage differs significantly? Do you think it's related to the difference in the number of partitions assigned to them?
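As a rough sanity check, the figures above can be normalised per hosted partition. The sketch below is Python. The dictionary keys and helper names are made up for illustration, and it assumes the dstat counters and CPU percentages quoted above are comparable snapshots taken under similar load.

```python
# Figures copied from this post. "old" = one of the original 10 brokers,
# "new" = one of the 2 brokers added later. The key names are illustrative only.
brokers = {
    "old": {"partitions": 500, "leaders": 250, "cpu_pct": 80,
            "interrupts": 200_000, "ctx_switches": 140_000},
    "new": {"partitions": 250, "leaders": 125, "cpu_pct": 25,
            "interrupts": 100_000, "ctx_switches": 70_000},
}

METRICS = ("cpu_pct", "interrupts", "ctx_switches")


def per_partition(stats):
    """Divide each metric by the number of partitions hosted on the broker."""
    n = stats["partitions"]
    return {metric: stats[metric] / n for metric in METRICS}


def old_to_new_ratio(metric):
    """How many times larger the metric is on an old broker than on a new one."""
    return brokers["old"][metric] / brokers["new"][metric]


if __name__ == "__main__":
    for name, stats in brokers.items():
        print(name, per_partition(stats))
    for metric in ("partitions",) + METRICS:
        print(metric, "old/new ratio:", old_to_new_ratio(metric))
```

With these numbers, interrupts and context switches per partition come out the same on both groups, so they scale with partition count (2x), while total CPU differs by more than 2x. That would be consistent with partition count playing a role, but it doesn't prove that it is the only factor.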

Thanks in advance,

Elad
