I have a zombie (defunct) parent process scenario with Go.
A process (in this case, replicator) has many goroutines internally.
<<>>>:~$ ps -ef | grep replicator
root 87548 87507 0 Aug23 ? 00:00:00 [replicator] <defunct>
fatal error: concurrent map writes
goroutine 666359 [running]:
runtime.throw(0x101d6ae, 0x15)
        /home/ll/ntnx/toolchain-builds/78ae837ba07c8ef8f0ea782407d8d4626815552b.x86_64/go/src/runtime/panic.go:608 +0x72 fp=0xc00374b6f0 sp=0xc00374b6c0 pc=0x42da62
runtime.mapassign_faststr(0xdb71c0, 0xc00023f5f0, 0xc000aca990, 0x83, 0xc0009d03c8)
        /home/ll/ntnx/toolchain-builds/78ae837ba07c8ef8f0ea782407d8d4626815552b.x86_64/go/src/runtime/map_faststr.go:275 +0x3bf fp=0xc00374b758 sp=0xc00374b6f0 pc=0x41527f
github.eng.nutanix.com/xyz/abc/metadata.UpdateRecvInProgressFlag(0xc000aca990, 0x83, 0x0)
.......
goroutine 665516 [chan receive, 2 minutes]:
zeus.(*Leadership).LeaderValue.func1(0xc003d5c120, 0x0, 0xc002e906c0, 0x52, 0xc00302ec60, 0x29)
        /home/ll/ntnx/main/build/.go/src/zeus/leadership.go:244 +0x34
created by zeus.(*Leadership).LeaderValue
        /home/ll/ntnx/main/build/.go/src/zeus/leadership.go:243 +0x277
2020-08-03 00:35:04 rolled over log file
ERROR: logging before flag.Parse: I0803 00:35:04.426906 196123 dataset.go:26] initialize zfs linking
ERROR: logging before flag.Parse: I0803 00:35:04.433296 196123 dataset.go:34] completed zfs linking successfully
I0803 00:35:04.433447 196123 main.go:86] Gflags passed NodeUuid: c238e584-0eeb-48bd-b299-2a25b13602f1, External Ip: 10.15.96.163
I0803 00:35:04.433460 196123 main.go:99] Component name using for this process : abc-c238e584-0eeb-48bd-b299-2a25b13602f1
I0803 00:35:04.433467 196123 main.go:120] Trying to initialize DB
If there is a panic() from the main OS thread, my understanding is that we call exit() and clean up all OS threads of the process.
Are we hitting the following scenario? I have not looked into the M-P-G implementation in detail.
Example:
#include <stdio.h>
#include <pthread.h>
#include <unistd.h>
#include <stdlib.h>

/* Detached worker: it outlives the main thread. */
void *thread_function(void *args)
{
    (void)args;
    /* The message says 20 seconds, but the thread actually sleeps 100. */
    printf("The is new thread! Sleep 20 seconds...\n");
    sleep(100);
    printf("Exit from thread\n");
    pthread_exit(0);
}

int main(int argc, char **argv)
{
    pthread_t thrd;
    pthread_attr_t attr;
    int res = 0;

    res = pthread_attr_init(&attr);
    res = pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    res = pthread_create(&thrd, &attr, thread_function, NULL);
    res = pthread_attr_destroy(&attr);

    printf("Main thread. Sleep 5 seconds\n");
    sleep(5);
    printf("Exit from main process\n");
    /* pthread_exit() ends only the main thread; the process itself stays
       around (shown as <defunct>) until the detached thread finishes. */
    pthread_exit(0);
}
kkk@ ~/mycode/go () $ ./a.out &
[1] 108418
Main thread. Sleep 5 seconds
The is new thread! Sleep 20 seconds...
kkk@ ~/mycode/go () $
Exit from main process
PID TTY TIME CMD
49313 pts/26 00:00:01 bash
108418 pts/26 00:00:00 [a.out] <defunct>
108449 pts/26 00:00:00 ps
Note that the main process is <defunct> while the child thread is still hanging around:
kkk@ ~/mycode/go () $ sudo cat /proc/108418/task/108420/stack
[<ffffffff810b4c1d>] hrtimer_nanosleep+0xbd/0x1d0
[<ffffffff810b4dae>] SyS_nanosleep+0x7e/0x90
[<ffffffff816a63c9>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
ujonnala@ ~/mycode/go () $ Exit from thread
Any help in this regard is appreciated.
$ ps -ef | grep replicator
root 87548 87507 0 Aug23 ? 00:00:00 [replicator] <defunct>
Now, looking at the tasks within the process, I see that the threads are still stuck in the following stack:
bash-4.2# cat /proc/87548/task/87561/stack
[<ffffffffbb114714>] futex_wait_queue_me+0xc4/0x120
[<ffffffffbb11520a>] futex_wait+0x10a/0x250
[<ffffffffbb1182ce>] do_futex+0x35e/0x5b0
[<ffffffffbb11865b>] SyS_futex+0x13b/0x180
[<ffffffffbb003c09>] do_syscall_64+0x79/0x1b0
[<ffffffffbba00081>] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[<ffffffffffffffff>] 0xffffffffffffffff
As in the example above, if we create some internal threads and the main thread exits due to a panic, leaving detached threads behind, the process will stay in the zombie state until the threads within the process complete.
To view this discussion on the web visit https://groups.google.com/d/msgid/golang-nuts/f1c6abc0-13b2-41ca-a365-fe0fbc7f129an%40googlegroups.com.
Thanks for the reply. We are fixing the issue. But the point I wanted to bring up here is that a single thread can leave the Go process in the defunct state.
My kernel version is Linux version 4.14.175-1.nutanix.20200709.el7.x86_64 (dev@ca4b0551898c) (gcc version 7.3.1 20180303 (Red Hat 7.3.1-5) (GCC)) #1 SMP Fri Jul 10 02:17:54 UTC 2020
--
You received this message because you are subscribed to the Google Groups "golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/golang-nuts/ad4843e1-f7d1-43ae-8091-579bc61527fdn%40googlegroups.com.