Thanks Miguel for the reply and input from your experience!
Our job (which also runs in a k8s pod) is not using Parquet, just JSON, and the data size is relatively small: 100 records are about 4.6 KB.
Earlier our config set flush.size to 500. I changed it to 100, and it still hits OOM, though less frequently: now about once an hour, whereas previously it was more often than once every half hour.
I didn't try adding the rotate.schedule.interval.ms setting, since I see the job already commits the files in less time than that, so I think it would be irrelevant.
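For reference, the relevant part of our sink config currently looks roughly like this (a sketch, assuming the standard Confluent S3 sink connector property names; only the values mentioned above are real, the rest is illustrative):

```json
{
  "connector.class": "io.confluent.connect.s3.S3SinkConnector",
  "format.class": "io.confluent.connect.s3.format.json.JsonFormat",
  "flush.size": "100"
}
```

rotate.schedule.interval.ms is deliberately absent, per the reasoning above.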
Btw, how much RAM did you allocate for your job? Ours is 2 GB, and I just changed it to 3 GB; I'll see whether that fixes the OOM issue.
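In case it helps to compare, the RAM bump is just the pod's memory limit, something like the hypothetical fragment below (our actual manifest differs; note the worker's JVM heap, e.g. -Xmx set via KAFKA_HEAP_OPTS, still has to fit under this limit):

```yaml
# Hypothetical k8s pod spec fragment
resources:
  requests:
    memory: "3Gi"
  limits:
    memory: "3Gi"
```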
What surprises me is that I also tested another job whose source Kafka topic has no data (hence nothing to write to the sink), and it also hits OOM fairly often, though not as often as every hour.
Thanks,
zainal