How to find the root cause of zombie processes for a Quarkus application?


Sébastien Dionne

Mar 5, 2021, 10:12:01 AM
to Quarkus Development mailing list
I have an issue: there are a lot of zombie processes on one node, and they appear on the node where my Quarkus application is deployed.

How to find the zombies:

ps aux | grep 'Z'

Take one PID from the output and use it in the next command:

root@test-pcl111:~# pstree -p -s 4194132
systemd(1)───containerd(387924)───containerd-shim(1647514)───java(1647541)───timeout(4194132)
root@test-pcl111:~#


root@test-pcl111:~# ps alx | grep 1647541 | grep -v time
4     0 1647541 1647514  20   0 5802568 335896 futex_ Ssl ?         4:34 java -Dquarkus.http.host=0.0.0.0 -Djava.util.logging.manager=org.jboss.logmanager.LogManager -Dquarkus.profile=staging -XshowSettings:vm -XX:+HeapDumpOnOutOfMemoryError -Xms500m -Xmx1500m -XX:+ExitOnOutOfMemoryError -cp . -jar /app/quarkus-run.jar
0     0 1648277 1647541  20   0      0     0 -      Z    ?          0:00 [sleep] <defunct>
0     0 1648840 1647541  20   0      0     0 -      Z    ?          0:00 [sleep] <defunct>
0     0 1649468 1647541  20   0      0     0 -      Z    ?          0:00 [sleep] <defunct>
0     0 1650107 1647541  20   0      0     0 -      Z    ?          0:00 [sleep] <defunct>
0     0 1651383 1647541  20   0      0     0 -      Z    ?          0:00 [sleep] <defunct>
0     0 1654707 1647541  20   0      0     0 -      Z    ?          0:00 [sleep] <defunct>
0     0 2474176 2446861  20   0   6432  2456 pipe_w S+   pts/0      0:00 grep --color=auto 1647541
root@test-pcl111:~#

I have no idea how to troubleshoot this. I don't see any timeout exceptions in my logs, and nothing special that could keep a connection alive.

My application receives calls on an endpoint and also calls other services with
@Inject
@RestClient

I have openapi in my pom.xml.
I'm using Quarkus 1.12.0.FINAL (not the latest). PS: I also had this with 1.10.0.FINAL.





Sébastien Dionne

Mar 5, 2021, 4:29:47 PM
to Quarkus Development mailing list
I want to continue the conversation here. I tried something else; see below.

************************
Hi Sébastien,

A zombie process usually occurs when a process forks a child process, the child process makes the 'exit()' system call, and the parent process does not call 'wait()' on the child process to read its exit status. In Java this can occur if you use ProcessBuilder and the standard output buffer of the child process fills up and is not read. Is the application using ProcessBuilder to fork child processes?
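
A minimal sketch of the pattern described above, assuming Java 9 or later: drain the child's output until end of stream so the pipe buffer cannot fill up, then wait for the child to exit. The class name DrainAndWait and the method run are hypothetical names for illustration, not part of any library.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;

public class DrainAndWait {

    /** Runs a command and returns its combined stdout/stderr; throws on a non-zero exit code. */
    public static String run(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command)
                .redirectErrorStream(true) // merge stderr into stdout: one pipe to drain instead of two
                .start();
        String output;
        try (InputStream in = process.getInputStream()) {
            // Read until EOF so the child never blocks on a full pipe buffer.
            output = new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
        int exitCode = process.waitFor(); // blocks until the child has exited
        if (exitCode != 0) {
            throw new IOException("Exit code " + exitCode + ", output: " + output);
        }
        return output;
    }

    public static void main(String[] args) throws Exception {
        System.out.print(run(List.of("sh", "-c", "echo hello")));
    }
}
```

This version has no timeout: a child that never closes its output would block the read, so it only fits commands that are expected to terminate on their own.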



************************
Yes, that's what I'm using.

In my Quarkus application I start a Helm process to install or update a chart. What should I add to close the subprocess correctly?



String command = "helm install release xxx";
Process pr = Runtime.getRuntime().exec(command);
List<String> lines = IOUtils.readLines(pr.getInputStream(), Charset.defaultCharset());
or 
String command = "helm install release xxx";
LOGGER.debug("handle Install request : command [{}]", command);
waitForNormalTermination(Runtime.getRuntime().exec(command), INSTALL_TIMEOUT, TimeUnit.SECONDS, name);

private void waitForNormalTermination(Process process, int timeout, TimeUnit unit, String release) throws Exception {
    if (!process.waitFor(timeout, unit)) {
        throw new TimeoutException("Timeout while executing " + process.info().commandLine().orElse(null));
    }

    if (process.exitValue() != 0) {
        String errorStreamOutput = IOUtils.toString(process.getErrorStream(), StandardCharsets.UTF_8);
        if (errorStreamOutput != null && errorStreamOutput.contains("release: not found")) {
            throw new ReleaseNotFoundException(release);
        }

        throw new Exception("Process termination was abnormal, exit value: [" + process.exitValue() + "], command:[" + process.info().commandLine().orElse(null) + "] error returned:[" + errorStreamOutput + "]");
    }
}
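
One gap in the method above is that a process that times out is left running. A possible variant, sketched here under the assumption of Java 9 or later, kills the process and any descendants it spawned before throwing. TimeoutKill and waitOrKill are hypothetical names, and the output is discarded only to keep the example short.

```java
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.stream.Collectors;

public class TimeoutKill {

    /** Waits for the process; on timeout, kills it and its descendants, then throws. */
    public static int waitOrKill(Process process, long timeout, TimeUnit unit)
            throws InterruptedException, TimeoutException {
        if (process.waitFor(timeout, unit)) {
            return process.exitValue();
        }
        // Snapshot the descendants first, while the process tree is still intact.
        List<ProcessHandle> descendants = process.descendants().collect(Collectors.toList());
        process.destroyForcibly();
        descendants.forEach(ProcessHandle::destroyForcibly);
        process.waitFor(); // wait until the direct child is really gone
        throw new TimeoutException("Killed after timeout: pid " + process.pid());
    }

    public static void main(String[] args) throws Exception {
        Process p = new ProcessBuilder("sh", "-c", "sleep 30")
                .redirectOutput(ProcessBuilder.Redirect.DISCARD) // nothing to drain
                .redirectError(ProcessBuilder.Redirect.DISCARD)
                .start();
        try {
            waitOrKill(p, 1, TimeUnit.SECONDS);
        } catch (TimeoutException e) {
            System.out.println("alive after kill: " + p.isAlive());
        }
    }
}
```

Killing descendants does not guarantee that they are reaped. When the JVM runs as PID 1 in a container, orphaned grandchildren are re-parented to it, and as far as I know the JVM only waits on processes it started itself, so such orphans may still show up as defunct.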

Another use case is calling a PowerShell script to put labels on pods:

import com.profesorfalken.jpowershell.PowerShell;
import com.profesorfalken.jpowershell.PowerShellResponse;
...
try (PowerShell powerShell = PowerShell.openSession()) {
    String script = generateScript(scriptContent.toString());

    LOGGER.debug("handle add labels to [{}] request : command [{}] with script [{}]", resourceType, command, script);

    Map<String, String> config = new HashMap<>();
    config.put("maxWait", String.valueOf(TimeUnit.SECONDS.toMillis(DEFAULT_TIMEOUT))); // timeout

    PowerShellResponse response = powerShell.configuration(config).executeScript(script);

    String commandOutput = response.getCommandOutput();

    if (response.isTimeout()) {
        throw new TimeoutException("Timeout while executing [" + script + "]");
    }
    if (response.isError()) {
        throw new Exception("Process termination was abnormal, command:[" + script + "] and output was [" + commandOutput + "]");
    }
}


*************

I added pr.destroy(); before throwing exceptions and exiting my methods, just to see whether it would change anything.

PS: the code that uses Process only runs when I'm installing or updating something from my application, which is not the case right now. So there is no such activity at the moment.


Here is what I did:

#1 - added pr.destroy(); to my code
#1b - built and published the image
#2 - killed my pod in my cluster
#3 - my pod was recreated with the new image
#4 - looked at the node where I had zombies (the same node where my application was running).
        I killed the java process that was generating zombies. I had over 12,000 zombies; now I'm back down to 4,200.
#5 - ran:  ps aux | grep 'Z' | wc -l
       in a loop to see whether new zombies appear, and yes, they are still increasing.
       Now I have this: root@test-pcl111:~# ps aux | grep 'Z' | wc -l
       4487

      I also ran: kubectl logs iep-iep-codec-staging-7596fccd85-jkn68 --follow
in another terminal to see whether there is any activity.


The number of zombies still increases every 1-2 seconds, even though there is no activity on my side other than a few periodic REST calls (polling from other applications).
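
Note that grep 'Z' matches any line containing the letter Z, including the grep command itself, so the count can be inflated. A more precise check, sketched here as an assumption about how one might do it on Linux, reads the state and parent PID from /proc/<pid>/stat and counts zombies per parent. ZombieCount and its methods are hypothetical names for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;

public class ZombieCount {

    /** Returns {state, ppid} parsed from one /proc/<pid>/stat line. */
    public static String[] stateAndParent(String statLine) {
        // The command name is wrapped in parentheses and may itself contain spaces or ')',
        // so the fixed fields start after the LAST closing parenthesis.
        String rest = statLine.substring(statLine.lastIndexOf(')') + 2);
        String[] fields = rest.split(" ");
        return new String[] { fields[0], fields[1] };
    }

    /** Counts zombie processes per parent PID by scanning /proc (Linux only). */
    public static Map<Long, Integer> zombiesByParent() throws IOException {
        Map<Long, Integer> counts = new TreeMap<>();
        try (DirectoryStream<Path> dirs = Files.newDirectoryStream(Paths.get("/proc"), "[0-9]*")) {
            for (Path dir : dirs) {
                try {
                    String stat = Files.readString(dir.resolve("stat"), StandardCharsets.ISO_8859_1);
                    String[] sp = stateAndParent(stat);
                    if (sp[0].equals("Z")) {
                        counts.merge(Long.parseLong(sp[1]), 1, Integer::sum);
                    }
                } catch (IOException | RuntimeException e) {
                    // The entry vanished or could not be parsed; skip it.
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // Prints one line per parent: "<parent pid> <number of zombie children>"
        zombiesByParent().forEach((ppid, n) -> System.out.println(ppid + " " + n));
    }
}
```

Run on the node itself (not inside the pod), this shows which parent PID owns the growing set of zombies, which narrows down where they come from.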


Did I miss something?

Guillaume Smet

Mar 6, 2021, 6:31:20 AM
to sebastie...@gmail.com, Quarkus Development mailing list
Hi Sébastien,

As Georgios already mentioned in another thread, this mailing list is dedicated to the development of Quarkus.

Usage questions should be asked on StackOverflow with the quarkus tag. But AFAICS, I'm not even sure your question is about Quarkus.

Thanks.

--
Guillaume
