Need some guidance/help: how to diagnose an OOM kill


John VanRyn

Sep 26, 2017, 7:37:05 AM
to Kubernetes user discussion and Q&A
I have a kube cluster running on n1-highmem-16 (16 vCPUs, 104 GB memory), using the unmodified cos-stable-60-9592-84-0 image. 

I have a Java app running under WildFly, deployed with this manifest:

<pre>
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: cas-unicas-ws
  labels:
    name: cas-unicas-ws
    model: cas
spec:
  replicas: 1
  template:
    metadata:
      labels:
        name: cas-unicas-ws
        model: cas
    spec:
      containers:
      - name: cas-unicas-ws
        image: liaisonintl/cas-unicas-ws:__CAS_TAG__
        imagePullPolicy: Always
        ports:
          - containerPort: 8080
        readinessProbe:
          periodSeconds: 20
          timeoutSeconds: 5
          successThreshold: 1
          failureThreshold: 3
          httpGet:
            path: /services/getPdfServiceConfig
            port: 8080
        resources:
          limits:
            memory: "10000M"
          requests:
            memory: "10000M"
        env:
          - name: JAVA_MEM
            value: -Xms9000m -Xmx9000m -XX:+UseG1GC -XX:+UseStringDeduplication -XX:+AlwaysPreTouch
          - name: SPRING_PROFILE
            value: __SPRING_PROFILE__
        command: ["/bin/bash","-ic"]
        args:
          - "set -xeo pipefail ; source /interpolate ; exec /opt/jboss/wildfly/bin/standalone.sh -b 0.0.0.0"
</pre>

Here are the important parts of the Dockerfile:

<pre>
FROM liaisonintl/docker-cas-base:master
MAINTAINER John VanRyn <REDACTED>

EXPOSE 8080
EXPOSE 9990

LABEL "GITHASH"="__GIT_HASH__"
ENV WILDFLY_HOME /opt/jboss/wildfly
ENV PATH $WILDFLY_HOME/bin:$PATH

ADD *.war ${WILDFLY_HOME}/standalone/deployments/

## App config
#
ADD config/ ${WILDFLY_HOME}/appConfigTemplate/

## Temporary fix just to see things working
ADD config/gen.unicas-ws.docker ${WILDFLY_HOME}/appConfig/unicas-ws.docker

USER root
ENV CAS_CONFIGS ${WILDFLY_HOME}/appConfig
ENV SPRING_PROFILE QA

ENV JAVA_OPTS="${JAVA_OPTS} ${JAVA_MEM} -XX:+UseG1GC -XX:+UseStringDeduplication -DCAS_CONFIGS=${CAS_CONFIGS} -Dspring.profiles.active=${SPRING_PROFILE}"

RUN \
mkdir -p $CAS_CONFIGS && \
    chmod 777 ${WILDFLY_HOME}/appConfig && \
    chmod 777 ${WILDFLY_HOME}/appConfigTemplate && \
    /opt/jboss/wildfly/bin/add-user.sh admin REDACTED --silent

# Add REVISION FILE FOR GITHASH Reporting
ADD config/REVISION REVISION

CMD ["/opt/jboss/wildfly/bin/standalone.sh", "-b", "0.0.0.0"]
</pre>

The log looks like this:
<pre>
+ exec /opt/jboss/wildfly/bin/standalone.sh -b 0.0.0.0
JAVA_OPTS already set in environment; overriding default settings with values:   -XX:+UseG1GC -XX:+UseStringDeduplication -DCAS_CONFIGS=/opt/jboss/wildfly/appConfig -Dspring.profiles.active=QA -Xms9000m -Xmx9000m -XX:+UseG1GC -XX:+UseStringDeduplication -XX:+AlwaysPreTouch -XX:+UseG1GC -XX:+UseStringDeduplication
=========================================================================

  JBoss Bootstrap Environment

  JBOSS_HOME: /opt/jboss/wildfly

  JAVA: /usr/lib/jvm/java/bin/java

  JAVA_OPTS:  -server -XX:+UseCompressedOops  -server -XX:+UseCompressedOops   -XX:+UseG1GC -XX:+UseStringDeduplication -DCAS_CONFIGS=/opt/jboss/wildfly/appConfig -Dspring.profiles.active=QA -Xms9000m -Xmx9000m -XX:+UseG1GC -XX:+UseStringDeduplication -XX:+AlwaysPreTouch -XX:+UseG1GC -XX:+UseStringDeduplication

=========================================================================

11:28:24,386 INFO  [org.jboss.modules] (main) JBoss Modules version 1.3.3.Final
11:28:24,539 INFO  [org.jboss.msc] (main) JBoss MSC version 1.2.2.Final
11:28:24,602 INFO  [org.jboss.as] (MSC service thread 1-6) JBAS015899: WildFly 8.2.1.Final "Tweek" starting
11:28:25,430 INFO  [org.jboss.as.controller.management-deprecated] (Controller Boot Thread) JBAS014627: Attribute any-ipv4-address is deprecated, and it might be removed in future version!
11:28:25,479 INFO  [org.jboss.as.server] (Controller Boot Thread) JBAS015888: Creating http management service using socket-binding (management-http)
11:28:25,498 INFO  [org.xnio] (MSC service thread 1-10) XNIO version 3.3.0.Final
11:28:25,510 INFO  [org.xnio.nio] (MSC service thread 1-10) XNIO NIO Implementation Version 3.3.0.Final
11:28:25,534 INFO  [org.jboss.as.clustering.infinispan] (ServerService Thread Pool -- 32) JBAS010280: Activating Infinispan subsystem.
11:28:25,542 WARN  [org.jboss.as.txn] (ServerService Thread Pool -- 46) JBAS010153: Node identifier property is set to the default value. Please make sure it is unique.
11:28:25,546 INFO  [org.jboss.as.security] (ServerService Thread Pool -- 45) JBAS013171: Activating Security Subsystem
11:28:25,551 INFO  [org.jboss.as.naming] (ServerService Thread Pool -- 40) JBAS011800: Activating Naming Subsystem
11:28:25,553 INFO  [org.jboss.as.jsf] (ServerService Thread Pool -- 38) JBAS012615: Activated the following JSF Implementations: [main]
11:28:25,564 INFO  [org.jboss.as.connector.logging] (MSC service thread 1-8) JBAS010408: Starting JCA Subsystem (IronJacamar 1.1.9.Final)
11:28:25,564 INFO  [org.jboss.as.security] (MSC service thread 1-7) JBAS013170: Current PicketBox version=4.0.21.Final
11:28:25,578 INFO  [org.jboss.as.webservices] (ServerService Thread Pool -- 48) JBAS015537: Activating WebServices Extension
11:28:25,587 INFO  [org.wildfly.extension.io] (ServerService Thread Pool -- 31) WFLYIO001: Worker 'default' has auto-configured to 16 core threads with 128 task threads based on your 8 available processors
11:28:25,594 INFO  [org.jboss.as.connector.subsystems.datasources] (ServerService Thread Pool -- 27) JBAS010403: Deploying JDBC-compliant driver class org.h2.Driver (version 1.3)
11:28:25,597 INFO  [org.jboss.as.connector.deployers.jdbc] (MSC service thread 1-14) JBAS010417: Started Driver service with driver-name = h2
11:28:25,599 INFO  [org.wildfly.extension.undertow] (MSC service thread 1-16) JBAS017502: Undertow 1.1.8.Final starting
11:28:25,599 INFO  [org.wildfly.extension.undertow] (ServerService Thread Pool -- 47) JBAS017502: Undertow 1.1.8.Final starting
11:28:25,639 INFO  [org.jboss.as.naming] (MSC service thread 1-14) JBAS011802: Starting Naming Service
11:28:25,639 INFO  [org.jboss.as.mail.extension] (MSC service thread 1-16) JBAS015400: Bound mail session [java:jboss/mail/Default]
11:28:25,677 INFO  [org.jboss.remoting] (MSC service thread 1-10) JBoss Remoting version 4.0.7.Final
11:28:25,852 INFO  [org.wildfly.extension.undertow] (ServerService Thread Pool -- 47) JBAS017527: Creating file handler for path /opt/jboss/wildfly/welcome-content
11:28:25,857 INFO  [org.wildfly.extension.undertow] (MSC service thread 1-5) JBAS017525: Started server default-server.
11:28:25,882 INFO  [org.wildfly.extension.undertow] (MSC service thread 1-13) JBAS017531: Host default-host starting
11:28:25,939 INFO  [org.wildfly.extension.undertow] (MSC service thread 1-5) JBAS017519: Undertow HTTP listener default listening on /0.0.0.0:8080
/opt/jboss/wildfly/bin/standalone.sh: line 326:   113 Killed                  "/usr/lib/jvm/java/bin/java" -D"[Standalone]" -server -XX:+UseCompressedOops -server -XX:+UseCompressedOops -XX:+UseG1GC -XX:+UseStringDeduplication -DCAS_CONFIGS=/opt/jboss/wildfly/appConfig -Dspring.profiles.active=QA -Xms9000m -Xmx9000m -XX:+UseG1GC -XX:+UseStringDeduplication -XX:+AlwaysPreTouch -XX:+UseG1GC -XX:+UseStringDeduplication "-Dorg.jboss.boot.log.file=/opt/jboss/wildfly/standalone/log/server.log" "-Dlogging.configuration=file:/opt/jboss/wildfly/standalone/configuration/logging.properties" -jar "/opt/jboss/wildfly/jboss-modules.jar" -mp "/opt/jboss/wildfly/modules" org.jboss.as.standalone -Djboss.home.dir="/opt/jboss/wildfly" -Djboss.server.base.dir="/opt/jboss/wildfly/standalone" '-b' '0.0.0.0'
</pre>

I thought it might be the code, but the same WAR runs fine on a VM, sharing its JVM with other apps.

I believe it's Kubernetes that is killing the container, but how do I confirm that? Any advice would be appreciated.

Davanum Srinivas

Sep 26, 2017, 7:50:53 AM
to kubernet...@googlegroups.com
John,

Does this help?
https://developers.redhat.com/blog/2017/03/14/java-inside-docker/

There are some details here as well:
https://github.com/moby/moby/issues/15020

Thanks,
Dims
> --
> You received this message because you are subscribed to the Google Groups
> "Kubernetes user discussion and Q&A" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to kubernetes-use...@googlegroups.com.
> To post to this group, send email to kubernet...@googlegroups.com.
> Visit this group at https://groups.google.com/group/kubernetes-users.
> For more options, visit https://groups.google.com/d/optout.



--
Davanum Srinivas :: https://twitter.com/dims

John VanRyn

Sep 26, 2017, 8:50:30 PM
to kubernet...@googlegroups.com
That helps some. We gave the Kubernetes pods almost twice as much memory as we allocate to the JVM, and that seems to get us out of the woods. It does mean we need to look into a JDK upgrade from 8, though.

Thanks
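
A rough sketch of the arithmetic behind the original limits, in Java. Kubernetes reads the `M` suffix in `memory: "10000M"` as decimal megabytes, while the JVM reads the `m` in `-Xmx9000m` as mebibytes, so the original manifest left only roughly 537 MiB between the heap and the container limit. The class and method names below are made up for illustration. Because `-Xms` equals `-Xmx` and `-XX:+AlwaysPreTouch` touches every heap page at startup, the whole heap probably counts against the limit right away, which would be consistent with a kill seconds after boot, although nothing in this thread confirms it.

```java
// Illustrative only: HeapHeadroom and its methods are hypothetical names.
final class HeapHeadroom {
    // Kubernetes quantity suffix "M" is decimal: 1M = 1,000,000 bytes.
    static long kubeMegabytes(long m) {
        return m * 1_000_000L;
    }

    // JVM size suffix "m" is binary: 1m = 1,048,576 bytes.
    static long jvmMegabytes(long m) {
        return m * 1024L * 1024L;
    }

    // Bytes left for everything outside the Java heap
    // (metaspace, code cache, thread stacks, GC structures, native libraries).
    static long headroomBytes(long limitM, long heapM) {
        return kubeMegabytes(limitM) - jvmMegabytes(heapM);
    }

    public static void main(String[] args) {
        long headroom = headroomBytes(10_000, 9_000); // values from the manifest above
        System.out.println("headroom bytes: " + headroom);
        System.out.println("headroom MiB (rounded down): " + headroom / (1024L * 1024L));
    }
}
```

Doubling the pod limit, as described above, makes that headroom several gigabytes instead of a few hundred megabytes.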


Evan Jones

Sep 27, 2017, 8:30:21 PM
to Kubernetes user discussion and Q&A
It's been a while since I've dealt with this sort of issue, but there are various libraries that use "native" memory outside the Java heap. The -Xmx flag only limits the Java heap, so it isn't surprising that some processes need a container memory limit much higher than the Java GC heap limit.
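
To make the heap versus non-heap split visible from inside the process, here is a minimal sketch using the standard `java.lang.management` API. The class name `HeapVsNonHeap` is invented for this example. Note that the "non-heap" figure reported here covers JVM-managed pools such as metaspace and the code cache; it does not include thread stacks, direct buffers, or memory that native libraries allocate on their own, so the container's real usage can still be higher.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

// Illustrative only: HeapVsNonHeap is a hypothetical helper class.
final class HeapVsNonHeap {
    private static final MemoryMXBean MEMORY = ManagementFactory.getMemoryMXBean();

    // Java heap: the only region that -Xmx caps.
    static MemoryUsage heap() {
        return MEMORY.getHeapMemoryUsage();
    }

    // JVM-managed non-heap pools (metaspace, code cache, ...).
    static MemoryUsage nonHeap() {
        return MEMORY.getNonHeapMemoryUsage();
    }

    public static void main(String[] args) {
        MemoryUsage h = heap();
        MemoryUsage n = nonHeap();
        // getMax() returns -1 when no maximum is defined.
        System.out.println("heap     used=" + h.getUsed()
                + " committed=" + h.getCommitted() + " max=" + h.getMax());
        System.out.println("non-heap used=" + n.getUsed()
                + " committed=" + n.getCommitted() + " max=" + n.getMax());
    }
}
```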

However, if the memory usage increases over time without limit, you might have some sort of native memory leak due to not closing things (e.g. direct ByteBuffers, GZIP streams, many others). You can watch the container memory usage of the pod over time, and if it seems to increase without bound this may be what is happening. The JVM's native memory tracking summary statistics can also be useful: https://docs.oracle.com/javase/8/docs/technotes/guides/troubleshoot/tooldescr007.html
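
As one concrete case of memory that -Xmx does not cover, the sketch below allocates a direct ByteBuffer and reads the JVM's "direct" buffer pool counter before and after. `DirectBufferUsage` is a made-up class name, and the 64 MiB size is arbitrary. Logging this counter periodically in a real app is one cheap way to see whether direct buffers are what grows over time; it says nothing about memory allocated by native libraries through other paths.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;

// Illustrative only: DirectBufferUsage is a hypothetical helper class.
final class DirectBufferUsage {
    // Bytes currently held by direct ByteBuffers, or -1 if the pool is not exposed.
    static long directBytesUsed() {
        for (BufferPoolMXBean pool
                : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if ("direct".equals(pool.getName())) {
                return pool.getMemoryUsed();
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        long before = directBytesUsed();
        // Allocated outside the Java heap, so it is not limited by -Xmx.
        ByteBuffer buffer = ByteBuffer.allocateDirect(64 * 1024 * 1024);
        long after = directBytesUsed();
        System.out.println("direct pool before: " + before);
        System.out.println("direct pool after:  " + after);
        // Use the buffer afterwards so it stays reachable during the measurement.
        System.out.println("buffer capacity:    " + buffer.capacity());
    }
}
```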

I've had success tracking down native memory leaks using jemalloc's profiling: http://www.evanjones.ca/java-native-leak-bug.html

Hope this helps, good luck!

Evan



Matthias Rampke

Sep 28, 2017, 3:56:33 AM
to Kubernetes user discussion and Q&A
You can also run jstatd in a running pod and then attach JVisualVM. I haven't done it myself, but the general procedure is:

- kubectl exec into the pod
- Write the policy file to disk: echo 'grant codebase "file:${java.home}/../lib/tools.jar" { permission java.security.AllPermission; };' > all.policy
- Start jstatd. This is a daemon process that exposes information on all JVMs running on the host: jstatd -p 1099 -J-Djava.security.policy=all.policy
- Connect JVisualVM using the pod IP (kubectl get pod -o wide). This may be tricky if you can't reach pod IPs directly, e.g. because of an overlay network; I think kubectl can help you proxy to it.

/MR