I'm using cgo to call a C function from Go. Inside the C function there is a callback into a Go function. In other words, the call chain is Go -> C -> Go.
After profiling with pprof, I noticed that `__GI___pthread_mutex_unlock` accounts for almost half of the execution time (44.63% flat), and `__lll_lock_wait` takes another 35.55%. As far as I know, cgo has some overhead, especially when calling back from C into Go, but it seems strange that so much of the run time is spent on locking. Is there something wrong with my code?
Here is the gist containing the code https://gist.github.com/phqb/2fa5bc76b77208d7f36b151b405e6dcc.
pprof output:
```
(pprof) top
Showing nodes accounting for 136.20s, 89.12% of 152.82s total
Dropped 252 nodes (cum <= 0.76s)
Showing top 10 nodes out of 64
      flat  flat%   sum%        cum   cum%
    68.20s 44.63% 44.63%     68.20s 44.63%  __GI___pthread_mutex_unlock
    54.32s 35.55% 80.17%     54.32s 35.55%  __lll_lock_wait
     3.57s  2.34% 82.51%     57.89s 37.88%  __pthread_mutex_lock
     2.38s  1.56% 84.07%      3.15s  2.06%  runtime.casgstatus
     1.81s  1.18% 85.25%      1.81s  1.18%  runtime.futex
     1.26s  0.82% 86.08%      3.99s  2.61%  runtime.mallocgc
     1.21s  0.79% 86.87%      2.54s  1.66%  runtime.lock2
     1.21s  0.79% 87.66%      1.48s  0.97%  runtime.reentersyscall
     1.15s  0.75% 88.41%      1.15s  0.75%  runtime.procyield
     1.09s  0.71% 89.12%      1.59s  1.04%  runtime.exitsyscallfast
```
Running environment:
* Go version: `go version go1.20.5 linux/amd64`
* `lscpu`:
```
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 46 bits physical, 48 bits virtual
CPU(s): 32
On-line CPU(s) list: 0-31
Thread(s) per core: 2
Core(s) per socket: 16
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 79
Model name: Intel(R) Xeon(R) CPU @ 2.20GHz
Stepping: 0
CPU MHz: 2200.152
BogoMIPS: 4400.30
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 512 KiB
L1i cache: 512 KiB
L2 cache: 4 MiB
L3 cache: 55 MiB
NUMA node0 CPU(s): 0-31
...
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti ssbd ibrs ibpb stibp fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx smap xsaveopt arat md_clear arch_capabilities
```