Sorry for getting back so late.
Here are my conclusions after a lot of testing:
1. `C Call Go` is the best method I could find (Thanks Ian!): √
The latency nearly always stays below 100us, whether the system is
under light load or moderately heavy load.
2. `Socketpair` (Thanks Sokolov for the recommendation):
net.FileConn can register a socket fd directly with the Go runtime's
event poller, so we can read/write a conn created from `Socketpair`
without blocking the OS thread or the goroutine. But the latency of this method is
hard to control. When the system is under light load, the latency is fine
(nearly the same as `C Call Go`), but it is often more than 1ms
when the system is under moderately heavy load. That is due to the different
priorities of the IPC implementations in the kernel.
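
For anyone who wants to try the `Socketpair` approach, here is a minimal sketch of what I mean. It is not my real test code: the helper names `newSocketpairConns` and `roundTrip` are made up for illustration, both ends live in the same Go process to keep it short, and it assumes a Unix-like system where `syscall.Socketpair` is available.

```go
package main

import (
	"fmt"
	"io"
	"net"
	"os"
	"syscall"
	"time"
)

// newSocketpairConns (hypothetical helper) creates an AF_UNIX stream
// socketpair and wraps both ends as net.Conn via net.FileConn, so that
// reads and writes go through the Go runtime poller.
func newSocketpairConns() (net.Conn, net.Conn, error) {
	fds, err := syscall.Socketpair(syscall.AF_UNIX, syscall.SOCK_STREAM, 0)
	if err != nil {
		return nil, nil, err
	}
	wrap := func(fd int, name string) (net.Conn, error) {
		f := os.NewFile(uintptr(fd), name)
		// net.FileConn works on a duplicate of the descriptor,
		// so the original file can be closed once the conn exists.
		defer f.Close()
		return net.FileConn(f)
	}
	a, err := wrap(fds[0], "socketpair-a")
	if err != nil {
		syscall.Close(fds[1])
		return nil, nil, err
	}
	b, err := wrap(fds[1], "socketpair-b")
	if err != nil {
		a.Close()
		return nil, nil, err
	}
	return a, b, nil
}

// roundTrip (hypothetical helper) writes msg on one end and reads the
// same number of bytes back from the other end.
func roundTrip(a, b net.Conn, msg []byte) ([]byte, error) {
	if _, err := a.Write(msg); err != nil {
		return nil, err
	}
	buf := make([]byte, len(msg))
	if _, err := io.ReadFull(b, buf); err != nil {
		return nil, err
	}
	return buf, nil
}

func main() {
	a, b, err := newSocketpairConns()
	if err != nil {
		panic(err)
	}
	defer a.Close()
	defer b.Close()

	start := time.Now()
	got, err := roundTrip(a, b, []byte("ping"))
	if err != nil {
		panic(err)
	}
	// The elapsed time is only a rough one-shot number, not a benchmark.
	fmt.Printf("received %q in %v\n", got, time.Since(start))
}
```

In a real C/Go setup, one of the two fds would be handed to the C side instead of being wrapped in Go. To see the latency behaviour described above, the round trip would have to be measured many times under different system loads.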
I hope this helps anyone who runs into the same question in the future.
Enjoy!