OOM in S3 sink connector


Zainal A

Jan 8, 2021, 1:33:02 PM
to Confluent Platform
Hi, has anybody ever run into OOM when running the S3 sink connector?
In my experiment I gave the job 2 GB of RAM, and after it runs for a few hours it hits that memory limit and dies. In my test no data is produced to the source Kafka topic, and yet it still hits OOM after running for a few hours. I have no idea what is causing it to consume so much memory.
Also, is there a way to debug/troubleshoot OOM issues in a Kafka Connect job? Thanks!

Miguel Silvestre

Jan 12, 2021, 9:41:29 AM
to confluent...@googlegroups.com
Hi!

I've seen OOM caused by the S3 connector. In our case it was because we were writing Parquet to S3, and Parquet files are memory-intensive to generate.
The process kept reaching the memory limit set on k8s, and the pods were dying and restarting.
You can either give the process more memory or tweak the connector configuration.

In my case I tweaked flush.size and rotate.schedule.interval.ms, which led to more, but smaller, Parquet files.
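
Roughly the settings I mean, via the Connect REST API; the connector name, topic, bucket and the values below are just placeholders (not recommendations), and other required settings like region/credentials are left out:

    # Lower flush.size / rotate.schedule.interval.ms => files are closed and
    # uploaded sooner, so less data is buffered in memory per partition.
    curl -s -X PUT -H "Content-Type: application/json" \
      http://localhost:8083/connectors/s3-sink/config \
      -d '{
        "connector.class": "io.confluent.connect.s3.S3SinkConnector",
        "storage.class": "io.confluent.connect.s3.storage.S3Storage",
        "format.class": "io.confluent.connect.s3.format.parquet.ParquetFormat",
        "topics": "my-topic",
        "s3.bucket.name": "my-bucket",
        "flush.size": "1000",
        "rotate.schedule.interval.ms": "600000",
        "timezone": "UTC",
        "tasks.max": "1"
      }'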

--
Miguel Silvestre



Zainal A

Jan 13, 2021, 11:25:54 AM
to Confluent Platform
Thanks, Miguel, for the reply and for sharing your experience!
Our job (which also runs in a k8s pod) is not using Parquet, just JSON, and the data size is relatively small: 100 records are about 4.6 KB.
Earlier our config set flush.size to 500; I changed it to 100, and it still hits OOM, though less frequently: about once an hour now, whereas previously it was roughly every half hour or less.
I haven't tried adding a rotate.schedule.interval.ms setting, since I see the job committing files in less time than that anyway, so I think it would be irrelevant.
By the way, how much RAM did you allocate for your job? Ours is 2 GB, and I just changed it to 3 GB; I'll see whether that fixes this OOM issue.
What surprises me is that I also tested another job whose source Kafka topic has no data (hence nothing to write to the sink), and yet it also hits OOM fairly often, though not as often as every hour.
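
For context, this is roughly how our memory is set (deployment name and sizes are placeholders). As I understand it, on the Confluent Connect image the JVM heap comes from KAFKA_HEAP_OPTS, which is separate from the k8s memory limit, so the limit needs headroom above -Xmx:

    # Sketch only: JVM heap for the Connect worker (the Kafka scripts/images
    # read KAFKA_HEAP_OPTS):
    kubectl set env deployment/connect-worker KAFKA_HEAP_OPTS="-Xms1g -Xmx2g"
    # Container limit: keep it comfortably above -Xmx, since metaspace, direct
    # buffers and the connector's upload buffers live outside the heap.
    kubectl set resources deployment/connect-worker --limits=memory=3Gi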

Thanks,
zainal

Miguel Silvestre

Jan 14, 2021, 6:16:26 AM
to confluent...@googlegroups.com
I have 5 GB for the JVM.

What you are describing seems like very odd behaviour.
You need to check the JVM memory consumption, the machine memory, the k8s settings, etc.
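
A few things I would look at, assuming the JDK tools are available in the container (pod name and pid below are placeholders):

    # Biggest object types on the worker JVM's heap
    # (run jcmd with no arguments first to find the Java pid):
    kubectl exec <connect-pod> -- jmap -histo:live <pid>
    # Check whether k8s killed the container (OOMKilled) or the JVM itself
    # threw OutOfMemoryError; they point at different limits:
    kubectl describe pod <connect-pod>
    # For the JVM case, a heap dump on OOM helps, e.g. by adding to KAFKA_OPTS:
    #   -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/connect.hprof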

--
Miguel Silvestre


Zainal A

Jan 14, 2021, 2:15:09 PM
to Confluent Platform
I set it to 3 GB and flush.size to 100, and I haven't seen OOM so far (it seems to settle at around 2.7 GB and never goes beyond that).
Do you install the JVM tools in your container image so that you can use them for debugging?
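
In the meantime I'm just watching it from the outside, roughly like this (pod name and pid are placeholders; jstat is only there if the image ships a full JDK):

    # Container memory as k8s sees it (needs metrics-server):
    kubectl top pod <connect-pod>
    # Heap/GC utilisation inside the JVM, sampled every 5 seconds:
    kubectl exec <connect-pod> -- jstat -gcutil <pid> 5000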

Thanks,
zainal