I encountered a weird problem recently.
Our pure Go program, named controllerd, stopped doing any useful work and now just spins on a single CPU core (top shows it consuming one full core). Before it got stuck, it had been running normally from mid-April to June 12.
After profiling the process with perf, the data shows the program looping on calls to __vdso_clock_gettime:
64.20% controllerd [vdso] [.] __vdso_clock_gettime
8.82% controllerd controllerd [.] runtime.procyield
8.82% controllerd controllerd [.] runtime.suspendG
8.01% controllerd controllerd [.] runtime.nanotime1
1.47% controllerd [kernel.kallsyms] [k] __enqueue_entity
0.79% controllerd [kernel.kallsyms] [k] system_call_after_swapgs
0.79% controllerd [kernel.kallsyms] [k] set_next_entity
0.68% controllerd [kernel.kallsyms] [k] _raw_spin_lock
0.68% controllerd [kernel.kallsyms] [k] change_pte_range
0.65% controllerd [kernel.kallsyms] [k] auditsys
0.57% controllerd [kernel.kallsyms] [k] update_curr
0.45% controllerd [kernel.kallsyms] [k] native_sched_clock
0.45% controllerd [kernel.kallsyms] [k] cpuacct_charge
0.45% controllerd [kernel.kallsyms] [k] __x86_indirect_thunk_rax
0.45% controllerd [kernel.kallsyms] [k] __audit_syscall_exit
0.34% controllerd [kernel.kallsyms] [k] pick_next_task_fair
0.23% controllerd controllerd [.] runtime.osyield
0.23% controllerd [kernel.kallsyms] [k] native_queued_spin_lock_slowpath
0.23% controllerd [kernel.kallsyms] [k] dput
0.23% controllerd [kernel.kallsyms] [k] __schedule
0.23% controllerd [kernel.kallsyms] [k] put_prev_task_fair
0.23% controllerd [kernel.kallsyms] [k] yield_task_fair
0.22% controllerd [kernel.kallsyms] [k] sys_sched_yield
0.11% controllerd [kernel.kallsyms] [k] clear_buddies
0.11% controllerd [kernel.kallsyms] [k] update_rq_clock.part.78
0.11% controllerd [kernel.kallsyms] [k] update_min_vruntime
0.11% controllerd [kernel.kallsyms] [k] tick_do_update_jiffies64
0.11% controllerd [kernel.kallsyms] [k] system_call
0.11% controllerd [kernel.kallsyms] [k] rb_next
0.11% controllerd [kernel.kallsyms] [k] rb_insert_color
0.00% controllerd controllerd [.] runtime.notesleep
0.00% controllerd [kernel.kallsyms] [k] load_balance
0.00% controllerd controllerd [.] runtime.runqgrab
0.00% controllerd controllerd [.] runtime.findrunnable
0.00% controllerd [kernel.kallsyms] [k] update_numa_stats
0.00% controllerd [kernel.kallsyms] [k] __lock_task_sighand
0.00% controllerd [kernel.kallsyms] [k] idle_cpu
0.00% controllerd [kernel.kallsyms] [k] kmem_cache_free_bulk
0.00% controllerd [kernel.kallsyms] [k] task_numa_find_cpu
0.00% controllerd [kernel.kallsyms] [k] __queue_work
0.00% controllerd [kernel.kallsyms] [k] __perf_event_task_sched_in
0.00% controllerd [kernel.kallsyms] [k] finish_task_switch
0.00% controllerd [kernel.kallsyms] [k] perf_ctx_unlock
0.00% controllerd [kernel.kallsyms] [k] native_write_msr_safe
0.00% controllerd [kernel.kallsyms] [k] __perf_event_enable
I'm not familiar with the details of Go's runtime, but after looking at the perf data and reading the code of the runtime's suspendG function, my guess is that the program is stuck looping inside suspendG and never manages to find another goroutine to execute.
Thanks.