Is Prometheus showing me a memory leak in Kubernetes?


Daniel Watrous

Jul 12, 2019, 9:40:10 AM
to promethe...@googlegroups.com

Hi,

I have a Jenkins workload running in Kubernetes which keeps getting OOM killed by Kubernetes. Based on the Prometheus data, I’m not sure how to make sense of what’s happening, and it looks like there is a memory leak in Kubernetes. To begin with, here is the deployment definition:

      containers:
      - name: jenkins
        image: jenkins/jenkins:2.164.3
        ports:
        - containerPort: 8080
        - containerPort: 50000
        resources:
          limits:
            cpu: "4"
            memory: 5G
          requests:
            cpu: "2"
            memory: 5G
        env:
        - name: JAVA_OPTS
          value: -Djava.util.logging.config.file=/var/jenkins/log.properties -Dorg.apache.commons.jelly.tags.fmt.timeZone=America/Chicago -Dkubernetes.websocket.ping.interval=30000
        - name: _JAVA_OPTIONS
          value: -XX:MaxRAMFraction=1 -XX:MaxRAM=4g

Attached is how the memory consumption shows in Prometheus over the last week. The lower graph shows container_memory_rss, which holds steady at <=3GB. The confusing part is the top graph, which shows container_memory_usage_bytes. When the Pod first starts, this is around 4GB, but then the Jenkins container dies (Kubernetes OOM kills it), and when Kubernetes creates a new one, the top line climbs to 6GB. That container lasts for a few days and hovers around 4GB, but then it is OOM killed as well, and when the next container is created the top line moves up again while the new container again hovers around 4GB. The top line in the graph is for this element:

container_memory_usage_bytes{
  beta_kubernetes_io_arch="amd64",
  beta_kubernetes_io_os="linux",
  id="/kubepods/burstable/pod54b40950-9dcd-11e9-a0ee-fa163e6c897e",
  instance="extra-nodes-0",
  job="kubernetes-cadvisor",
  kubernetes_io_arch="amd64",
  kubernetes_io_hostname="extra-nodes-0",
  kubernetes_io_os="linux",
  namespace="jenkins",
  pod="jenkins-deployment-6758bdb87f-995xj",
  pod_name="jenkins-deployment-6758bdb87f-995xj"
}

I see that this series doesn’t appear to be a container at all: there is no image, no container ID, etc. The element for the actual Jenkins container is:

container_memory_usage_bytes{
  beta_kubernetes_io_arch="amd64",
  beta_kubernetes_io_os="linux",
  container="jenkins",
  container_name="jenkins",
  id="/kubepods/burstable/pod54b40950-9dcd-11e9-a0ee-fa163e6c897e/8092164b373b8689015cbdcba1fbee9aa084384d19ed2703e7d3e89e97d0cf08",
  image="sha256:ba607c18aeb76f625680a9786d00c9527aeeb9df971aa23346a404e9904d49aa",
  instance="extra-nodes-0",
  job="kubernetes-cadvisor",
  kubernetes_io_arch="amd64",
  kubernetes_io_hostname="extra-nodes-0",
  kubernetes_io_os="linux",
  name="k8s_jenkins_jenkins-deployment-6758bdb87f-995xj_jenkins_54b40950-9dcd-11e9-a0ee-fa163e6c897e_3",
  namespace="jenkins",
  pod="jenkins-deployment-6758bdb87f-995xj",
  pod_name="jenkins-deployment-6758bdb87f-995xj"
}

In this case there is a container, and the memory it uses is within the limits specified.
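For reference, queries along these lines separate the two kinds of series (a sketch based on the label names shown above; container_memory_working_set_bytes is the other cAdvisor memory metric that is usually compared against the limit):

  # container-level series only; the pod- and node-level cgroup series have an empty container label
  container_memory_usage_bytes{namespace="jenkins", container!="", image!=""}

  # working set (usage minus inactive file cache) is typically the number to watch against the memory limit
  container_memory_working_set_bytes{namespace="jenkins", container="jenkins"}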

 

I’m starting with the Prometheus list to understand what data I’m actually seeing and what it means, but ultimately I need to arrive at a configuration that won’t result in Kubernetes OOM killing the Jenkins container. I also don’t want to keep losing memory each time a container is replaced by Kubernetes.

Thanks!

Attachment: jenkins-prometheus-2019-07-12_08-13-56.png

Harald Koch

Jul 12, 2019, 11:40:44 AM
to Prometheus Users
On Friday, 12 July 2019 09:40:10 UTC-4, Daniel Watrous wrote:

 

I have a Jenkins workload running in Kubernetes which keeps getting OOM killed by Kubernetes.


I believe that the Jenkins container you're using is still built on top of OpenJDK 8, which doesn't coexist well with Docker (or Kubernetes) memory management. If you want to limit memory in the resources: section of your deployment, you should first upgrade to the JDK 11-based images; the newer JDK has memory-management hooks that let it play nicely when running inside a constrained container.
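As an illustration (flag values here are a sketch, not a tested configuration), on a JDK 11 image the _JAVA_OPTIONS entry could switch to the container-aware, percentage-based sizing flags instead of MaxRAM/MaxRAMFraction:

        env:
        - name: _JAVA_OPTIONS
          # UseContainerSupport is enabled by default in JDK 11; MaxRAMPercentage replaces the deprecated MaxRAMFraction
          value: -XX:+UseContainerSupport -XX:MaxRAMPercentage=75.0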

--
Harald Koch

Daniel Watrous

Jul 15, 2019, 10:50:43 AM
to promethe...@googlegroups.com

Thanks for the idea about trying JDK 11. I will definitely try that. 

 

However, I am still curious to understand why memory appears to be leaking each time a container is killed/respawned. What is that container memory usage element that grows with each kill/respawn cycle?
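For instance (a sketch reusing the label values from my earlier queries), comparing the pod-level cgroup series against the sum of its per-container series should show where the extra memory is being accounted:

  # pod-level cgroup series for the Jenkins pod (no container label)
  container_memory_usage_bytes{namespace="jenkins", container="", pod=~"jenkins-deployment-.*"}

  # sum of the per-container series inside the same pod
  sum by (pod) (container_memory_usage_bytes{namespace="jenkins", container!="", pod=~"jenkins-deployment-.*"})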

 

Daniel
