Hi,
I have a Jenkins workload running in Kubernetes that keeps getting OOM killed. Based on the Prometheus data I'm not sure how to make sense of what's happening, and it looks like memory is leaking somewhere on the Kubernetes side. To begin with, I have this deployment definition:
containers:
- name: jenkins
  image: jenkins/jenkins:2.164.3
  ports:
  - containerPort: 8080
  - containerPort: 50000
  resources:
    limits:
      cpu: "4"
      memory: 5G
    requests:
      cpu: "2"
      memory: 5G
  env:
  - name: JAVA_OPTS
    value: -Djava.util.logging.config.file=/var/jenkins/log.properties -Dorg.apache.commons.jelly.tags.fmt.timeZone=America/Chicago -Dkubernetes.websocket.ping.interval=30000
  - name: _JAVA_OPTIONS
    value: -XX:MaxRAMFraction=1 -XX:MaxRAM=4g
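One variant I'm considering (just an untested sketch; the -Xmx3g value is my guess at leaving roughly 2G of headroom under the 5G limit) is to drop the MaxRAM/MaxRAMFraction pair and cap the heap explicitly, so metaspace, thread stacks, and other off-heap memory have room:
  env:
  - name: JAVA_OPTS
    value: -Djava.util.logging.config.file=/var/jenkins/log.properties -Dorg.apache.commons.jelly.tags.fmt.timeZone=America/Chicago -Dkubernetes.websocket.ping.interval=30000
  - name: _JAVA_OPTIONS
    # Explicit heap cap well below the container limit, leaving headroom for
    # metaspace, thread stacks, and other off-heap allocations (value is a guess)
    value: -Xmx3g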
Attached is how the memory consumption looks in Prometheus over the last week. The lower graph shows container_memory_rss, which stays steady at <=3GB. The first confusing part is the top graph, which shows container_memory_usage_bytes. When the Pod first starts this is around 4GB, but then the Jenkins container dies (Kubernetes OOM kills it), and when Kubernetes creates a new one a top line appears that climbs to 6GB. That container lasts a few days and hovers around 4GB, but then it is OOM killed as well, and when the next container is created the top line moves up again while the new container hovers around 4GB. The top line in the graph is this element:
container_memory_usage_bytes{
beta_kubernetes_io_arch="amd64",
beta_kubernetes_io_os="linux",
id="/kubepods/burstable/pod54b40950-9dcd-11e9-a0ee-fa163e6c897e",
instance="extra-nodes-0",
job="kubernetes-cadvisor",
kubernetes_io_arch="amd64",
kubernetes_io_hostname="extra-nodes-0",
kubernetes_io_os="linux",
namespace="jenkins",
pod="jenkins-deployment-6758bdb87f-995xj",
pod_name="jenkins-deployment-6758bdb87f-995xj"
}
This doesn't appear to be a container at all: there is no image, no container ID, etc. The series for the actual Jenkins container is:
container_memory_usage_bytes{
beta_kubernetes_io_arch="amd64",
beta_kubernetes_io_os="linux",
container="jenkins",
container_name="jenkins",
id="/kubepods/burstable/pod54b40950-9dcd-11e9-a0ee-fa163e6c897e/8092164b373b8689015cbdcba1fbee9aa084384d19ed2703e7d3e89e97d0cf08",
image="sha256:ba607c18aeb76f625680a9786d00c9527aeeb9df971aa23346a404e9904d49aa",
instance="extra-nodes-0",
job="kubernetes-cadvisor",
kubernetes_io_arch="amd64",
kubernetes_io_hostname="extra-nodes-0",
kubernetes_io_os="linux",
name="k8s_jenkins_jenkins-deployment-6758bdb87f-995xj_jenkins_54b40950-9dcd-11e9-a0ee-fa163e6c897e_3",
namespace="jenkins",
pod="jenkins-deployment-6758bdb87f-995xj",
pod_name="jenkins-deployment-6758bdb87f-995xj"
}
In this case there is a container, and the memory it uses is within the limits specified.
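For what it's worth, I think a selector along these lines isolates just the real container cgroups, since the pod-level series has no container label and the pause container shows up as "POD" (assuming the older container_name label that my cAdvisor exposes):
# Only series that belong to an actual container: drop the pod-level cgroup
# (empty container_name) and the pause container (container_name="POD")
container_memory_usage_bytes{
  namespace="jenkins",
  pod_name="jenkins-deployment-6758bdb87f-995xj",
  container_name!="",
  container_name!="POD"
}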
I'm starting with the Prometheus metrics to understand what data I'm actually seeing and what it means, but ultimately I need to end up with a configuration that won't result in Kubernetes OOM killing the Jenkins container. I also don't want to lose memory each time Kubernetes replaces a container.
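As I understand it, the container gets killed when its cgroup hits the 5G limit and the kernel can't reclaim enough cache, so container_memory_working_set_bytes (usage minus reclaimable page cache) is probably the number to watch against the limit. Something like this, where 5e9 is just the 5G limit written out in bytes:
# Jenkins container working set as a fraction of the 5G memory limit
container_memory_working_set_bytes{
  namespace="jenkins",
  container_name="jenkins"
} / 5e9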
Thanks!
Thanks for the idea about trying JDK 11. I will definitely try that.
However, I am still curious to understand why memory appears to be leaking each time a container is killed and respawned. What is that container_memory_usage_bytes element that grows with each kill/respawn cycle?
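If it helps anyone look at this with me, these are the queries I plan to compare for that pod-level cgroup (the series with no container/image labels), mainly to see whether the growth is page cache rather than RSS; the selectors are my guess at matching it:
# Total usage of the pod-level cgroup (the line that climbs after each respawn)
container_memory_usage_bytes{namespace="jenkins", pod_name="jenkins-deployment-6758bdb87f-995xj", container_name=""}
# The same cgroup broken down: page cache vs. RSS vs. working set
container_memory_cache{namespace="jenkins", pod_name="jenkins-deployment-6758bdb87f-995xj", container_name=""}
container_memory_rss{namespace="jenkins", pod_name="jenkins-deployment-6758bdb87f-995xj", container_name=""}
container_memory_working_set_bytes{namespace="jenkins", pod_name="jenkins-deployment-6758bdb87f-995xj", container_name=""}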
Daniel