Pod memory keeps increasing and the pod restarts with error OOMKilled

Rakesh K R

Mar 10, 2022, 1:26:57 AM3/10/22
to golang-nuts
Hi,
I have a microservice application deployed on a Kubernetes cluster (with a 1 GB pod memory limit). The app receives a continuous stream of messages (500 messages per second) from a producer app over a Kafka interface; the messages are encoded in protobuf format.

Basic application flow (a rough sketch of the loop follows the list):
1. Get messages one by one from Kafka
2. Unmarshal the proto message
3. Apply business logic
4. Write the message to the Redis cache (as []byte)
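
Roughly, the loop looks like the sketch below. This is a simplification, not the real code: pb.Event, applyBusinessLogic and the Redis key are placeholders, and the packages (confluent-kafka-go, google.golang.org/protobuf, go-redis v7) are just the ones visible in the pprof output further down.

package main

import (
	"log"

	"github.com/go-redis/redis/v7"
	"google.golang.org/protobuf/proto"
	"gopkg.in/confluentinc/confluent-kafka-go.v1/kafka"

	pb "example.com/myapp/proto" // hypothetical generated protobuf package
)

func consumeLoop(consumer *kafka.Consumer, rdb *redis.Client) {
	for {
		// 1. Get the next message from Kafka (blocks until one arrives).
		msg, err := consumer.ReadMessage(-1)
		if err != nil {
			log.Printf("kafka read: %v", err)
			continue
		}

		// 2. Unmarshal the protobuf payload (pb.Event stands in for the real type).
		var ev pb.Event
		if err := proto.Unmarshal(msg.Value, &ev); err != nil {
			log.Printf("proto unmarshal: %v", err)
			continue
		}

		// 3. Apply business logic; returns the []byte to cache (placeholder helper).
		payload := applyBusinessLogic(&ev)

		// 4. Write the result to the Redis cache as []byte.
		if err := rdb.Set("some-key", payload, 0).Err(); err != nil {
			log.Printf("redis set: %v", err)
		}
	}
}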

When the pod starts, memory usage is around 50 MB, and it keeps increasing as traffic flows into the application. It is never released back to the OS, so the pod eventually restarts with error code OOMKilled.
I have integrated Grafana to watch memory usage (RSS, heap, stack).
During this traffic flow the in-use heap is 80 MB and the idle heap is 80 MB, whereas the process resident memory is at 800-1000 MB. Stopping the traffic completely for hours did not help; RSS stayed around 1000 MB.
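
For reference, the heap numbers come from the Go runtime; a minimal sketch of logging the runtime's view side by side (runtime.ReadMemStats is the standard API; the interval and log wording are arbitrary):

package main

import (
	"log"
	"runtime"
	"time"
)

// Periodically log the Go runtime's view of memory. Whatever the OS reports
// as RSS beyond ms.Sys is memory the Go runtime does not track at all
// (for example cgo/C allocations).
func logMemStats() {
	for range time.Tick(30 * time.Second) {
		var ms runtime.MemStats
		runtime.ReadMemStats(&ms)
		log.Printf("heap-inuse=%dMB heap-idle=%dMB heap-released=%dMB sys=%dMB",
			ms.HeapInuse>>20, ms.HeapIdle>>20, ms.HeapReleased>>20, ms.Sys>>20)
	}
}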
I tried to analyze this with pprof and it reports only 80 MB in the in-use section, so I am wondering where the remaining 800-1000 MB of the pod's memory went. The application also allocates slices/maps/strings while performing its business logic (see the alloc_space pprof output below).

I tried a couple of experiments:
1. Calling debug.FreeOSMemory() in the app, but that did not help (see the sketch after this list)
2. Invoking my app with GODEBUG=madvdontneed=1 my_app_executable; it did not help
3. Leaving the application idle for 5-6 hours without any traffic to see whether memory comes down; it did not
4. pprof shows only 80 MB of heap in use
5. Upgrading Go from 1.13 to 1.16, since there were some runtime improvements; it did not help
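
For reference, experiments 1 and 2 were roughly as below (a sketch; the one-minute interval is arbitrary):

// Experiment 2 is only the environment variable, no code change:
//
//	GODEBUG=madvdontneed=1 ./my_app_executable
//
// Experiment 1: periodically force a GC and return unused memory to the OS.
package main

import (
	"runtime/debug"
	"time"
)

func freeOSMemoryPeriodically() {
	for range time.Tick(time.Minute) {
		debug.FreeOSMemory()
	}
}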

pprof output for alloc_space:

(pprof) top20
Showing nodes accounting for 481.98GB, 91.57% of 526.37GB total
Dropped 566 nodes (cum <= 2.63GB)
Showing top 20 nodes out of 114
      flat  flat%   sum%        cum   cum%
   78.89GB 14.99% 14.99%    78.89GB 14.99%  github.com/go-redis/redis/v7/internal/proto.(*Reader).readStringReply
   67.01GB 12.73% 27.72%   285.33GB 54.21%  airgroup/internal/wrapper/agrediswrapper.GetAllConfigurationForGroups
   58.75GB 11.16% 38.88%    58.75GB 11.16%  google.golang.org/protobuf/internal/impl.(*MessageInfo).MessageOf
   52.26GB  9.93% 48.81%    52.26GB  9.93%  reflect.unsafe_NewArray
   45.78GB  8.70% 57.50%    46.38GB  8.81%  encoding/json.(*decodeState).literalStore
   36.98GB  7.02% 64.53%    36.98GB  7.02%  reflect.New
   28.20GB  5.36% 69.89%    28.20GB  5.36%  gopkg.in/confluentinc/confluent-kafka-go.v1/kafka._Cfunc_GoBytes
   25.60GB  4.86% 74.75%    63.62GB 12.09%  google.golang.org/protobuf/proto.MarshalOptions.marshal
   12.79GB  2.43% 77.18%   165.56GB 31.45%  encoding/json.(*decodeState).object
   12.73GB  2.42% 79.60%    12.73GB  2.42%  reflect.mapassign
   11.05GB  2.10% 81.70%    63.31GB 12.03%  reflect.MakeSlice
   10.06GB  1.91% 83.61%    12.36GB  2.35%  filterServersForDestinationDevicesAndSendToDistributionChan
    6.92GB  1.32% 84.92%   309.45GB 58.79%  groupAndSendToConfigPolicyChannel
    6.79GB  1.29% 86.21%    48.85GB  9.28%  publishInternalMsgToDistributionService
    6.79GB  1.29% 87.50%   174.81GB 33.21%  encoding/json.Unmarshal
    6.14GB  1.17% 88.67%     6.14GB  1.17%  google.golang.org/protobuf/internal/impl.consumeBytes
    4.64GB  0.88% 89.55%    14.39GB  2.73%  GetAllDevDataFromGlobalDevDataDb
    4.11GB  0.78% 90.33%    18.47GB  3.51%  GetAllServersFromServerRecordDb
    3.27GB  0.62% 90.95%     3.27GB  0.62%  net.HardwareAddr.String
    3.23GB  0.61% 91.57%     3.23GB  0.61%  reflect.makemap
(pprof)
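
For completeness: alloc_space is cumulative allocation since process start, not live memory; the inuse_space view is what shows the 80 MB currently in use. The profile endpoint is exposed with the standard net/http/pprof package, roughly like this (a sketch; the port is arbitrary):

package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers the /debug/pprof/* handlers
)

func main() {
	// Then, from another shell:
	//   go tool pprof -inuse_space http://localhost:6060/debug/pprof/heap
	//   go tool pprof -alloc_space http://localhost:6060/debug/pprof/heap
	log.Fatal(http.ListenAndServe("localhost:6060", nil))
}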


I need experts' help in analyzing this issue.

Thanks in advance!!

Tamás Gulácsi

Mar 10, 2022, 3:15:18 AM3/10/22
to golang-nuts

The gopkg.in/confluentinc/confluent-kafka-go.v1/kafka._Cfunc_GoBytes entry says it uses cgo, hiding its memory usage from Go. I bet that 900MiB of memory is there...

Rakesh K R

Mar 10, 2022, 11:06:20 AM3/10/22
to golang-nuts
Tamas,
Thank you. Any suggestion on how to make the application release this 900 MiB of memory back to the OS so that the pod does not end up OOMKilled?

Robert Engels

Mar 10, 2022, 11:20:50 AM3/10/22
to Rakesh K R, golang-nuts
You need to configure how long Kafka retains messages for replay, or use some other option to store them on disk.


Rakesh K R

Mar 10, 2022, 12:02:47 PM3/10/22
to golang-nuts
Hi,
Sorry, I am not sure which Kafka configuration you are referring to here. Can you please point me to the right configuration responsible for retaining messages for replay?
I see the following properties which might be related, but I am not sure (a sketch of how they would be set on the client follows the list):
queued.min.messages
queued.max.messages.kbytes
queue.buffering.max.messages
queue.buffering.max.kbytes
linger.ms ---> this is currently set to 1000
message.timeout.ms
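
If it matters, these are client properties passed when creating the consumer/producer via kafka.ConfigMap; a rough sketch (the broker address, group id and all values are placeholders, only the property names come from the list above):

package main

import (
	"gopkg.in/confluentinc/confluent-kafka-go.v1/kafka"
)

// Consumer-side queue limits: these cap how much librdkafka pre-fetches and
// buffers on the C side.
func newConsumer() (*kafka.Consumer, error) {
	return kafka.NewConsumer(&kafka.ConfigMap{
		"bootstrap.servers":          "kafka:9092", // placeholder
		"group.id":                   "my-group",   // placeholder
		"queued.min.messages":        1000,
		"queued.max.messages.kbytes": 16384,
	})
}

// Producer-side queue limits.
func newProducer() (*kafka.Producer, error) {
	return kafka.NewProducer(&kafka.ConfigMap{
		"bootstrap.servers":            "kafka:9092", // placeholder
		"queue.buffering.max.messages": 10000,
		"queue.buffering.max.kbytes":   16384,
		"linger.ms":                    1000,
		"message.timeout.ms":           30000,
	})
}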

Thank you

Robert Engels

Mar 10, 2022, 3:04:23 PM3/10/22
to Rakesh K R, golang-nuts
Look at log.retention.hours and log.retention.bytes

You should post this in the Kafka forums, not the Go ones.


Rakesh K R

Mar 10, 2022, 11:11:13 PM3/10/22
to golang-nuts
Hi,
Thank you. I know this is a Kafka-related question, but the thread started with what looked like a Go issue; now we suspect an issue in the Go Kafka library configuration.
FYI, I am interested in Kafka client (consumer/producer) properties, whereas log.retention.hours and log.retention.bytes are Kafka broker configuration, so I am confused now.

Robert Engels

Mar 11, 2022, 10:30:30 AM3/11/22
to Rakesh K R, golang-nuts
I think your best course of action is to go to the Kafka forums. 