Note: This is NOT a proper fix but rather a quick-and-dirty workaround that allows process forking after grpc_init() has been called. Hopefully there will be a proper fix soon.
The current problem with gRPC, as described in several issues (e.g. #17695, #13412, #11814), is that once grpc_init() has been called in a process, that process should not be forked. There is partial fork support, but only if no connection has been established yet.
In my case I have an extension for PHP which is running under Apache. When Apache runs in prefork mode, it dynamically forks child processes to handle the incoming load. This means I need to be able to initialize a new gRPC instance in forked processes even when the parent process already has an active gRPC connection. Since this use case is not yet supported, I made a workaround.
Essentially, the workaround resets the internal gRPC state just enough that a new, clean gRPC instance can be created in a forked child process.
My code change is based on the 1.24.3 release and can be found here.
Several functions are called internally to collect active sockets, pthread_once states, mutexes, and condition variables:
void grpc_add_socket_fd(int fd);
void grpc_add_once_init(pthread_once_t *once);
void grpc_add_mutex(pthread_mutex_t *mutex);
void grpc_remove_mutex(pthread_mutex_t *mutex);
void grpc_add_cond(pthread_cond_t *cond);
void grpc_remove_cond(pthread_cond_t *cond);
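To illustrate the idea behind these tracking functions, here is a minimal sketch of a mutex registry. It is not the code from my patch: the `demo_*` names and the fixed-size array are hypothetical, the registry itself is not thread-safe, and re-initializing a mutex that may still be locked is formally undefined behavior under POSIX (it happens to work on glibc, which is the kind of shortcut this workaround relies on).

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical fixed-size registry; a real patch may store these differently. */
#define DEMO_MAX_MUTEXES 64

static pthread_mutex_t *demo_mutexes[DEMO_MAX_MUTEXES];
static size_t demo_mutex_count = 0;

/* Remember a mutex so it can be re-initialized in a forked child.
 * Returns 0 on success, -1 if the registry is full. */
int demo_add_mutex(pthread_mutex_t *mutex) {
    if (demo_mutex_count >= DEMO_MAX_MUTEXES) {
        return -1;
    }
    demo_mutexes[demo_mutex_count++] = mutex;
    return 0;
}

/* Forget a mutex, e.g. right before it is destroyed. */
void demo_remove_mutex(pthread_mutex_t *mutex) {
    for (size_t i = 0; i < demo_mutex_count; i++) {
        if (demo_mutexes[i] == mutex) {
            /* Fill the gap with the last entry; order does not matter. */
            demo_mutexes[i] = demo_mutexes[--demo_mutex_count];
            return;
        }
    }
}

/* Meant to run in the forked child: overwrite every tracked mutex with a
 * fresh, unlocked state (default attributes only, which is an assumption).
 * Returns the number of mutexes that were reset. */
size_t demo_reset_mutexes(void) {
    for (size_t i = 0; i < demo_mutex_count; i++) {
        int rc = pthread_mutex_init(demo_mutexes[i], NULL);
        assert(rc == 0);
        (void)rc; /* silence unused warning when NDEBUG is defined */
    }
    return demo_mutex_count;
}
```

A mutex that was locked by a thread of the parent at fork time would otherwise stay locked forever in the child, because the owning thread does not exist there. Tracking the mutexes is what makes it possible to find and reset them afterwards.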
Then there are new globals that hold pointers to critical internal gRPC state:
void **fork_g_event_engine;
void **fork_g_resolver_registry_state;
...
From the user's perspective, all that is needed is to call the newly added function grpc_clean_after_fork() before initializing gRPC. Be sure to call it ONLY ONCE per process.
grpc_clean_after_fork() closes all inherited sockets and resets the mutexes, condition variables, pthread_once states, and other internal state left over from the parent process. Afterwards gRPC will initialize properly in every forked child.
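The cleanup step can be sketched roughly as follows. This is a simplified stand-in, not the actual grpc_clean_after_fork() from my patch: all `demo_*` names are hypothetical, only sockets and pthread_once states are handled, and the pid check is just one possible way to enforce the "once per process" rule.

```c
#include <pthread.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

#define DEMO_MAX_TRACKED 64 /* hypothetical capacity */

static int demo_fds[DEMO_MAX_TRACKED];
static size_t demo_fd_count = 0;
static pthread_once_t *demo_onces[DEMO_MAX_TRACKED];
static size_t demo_once_count = 0;
static pid_t demo_cleaned_pid = 0; /* pid of the process that last ran cleanup */

/* Example of a once-guarded initializer; counts how often it really ran. */
static int demo_init_calls = 0;
static void demo_init_routine(void) { demo_init_calls++; }

/* Track a socket opened by the library. Returns 0, or -1 if full. */
int demo_add_socket_fd(int fd) {
    if (demo_fd_count >= DEMO_MAX_TRACKED) return -1;
    demo_fds[demo_fd_count++] = fd;
    return 0;
}

/* Track a pthread_once_t used by the library. Returns 0, or -1 if full. */
int demo_add_once_init(pthread_once_t *once) {
    if (demo_once_count >= DEMO_MAX_TRACKED) return -1;
    demo_onces[demo_once_count++] = once;
    return 0;
}

/* Intended to be called in a forked child before the library is
 * initialized again. Returns 1 if cleanup ran, 0 if it had already
 * run in this process. */
int demo_clean_after_fork(void) {
    static const pthread_once_t fresh_once = PTHREAD_ONCE_INIT;
    pid_t self = getpid();

    if (demo_cleaned_pid == self) {
        return 0; /* already cleaned in this process */
    }
    demo_cleaned_pid = self;

    /* Close sockets inherited from the parent; the parent keeps its copies. */
    for (size_t i = 0; i < demo_fd_count; i++) {
        close(demo_fds[i]);
    }
    demo_fd_count = 0;

    /* Make every tracked one-time initializer run again on next use. */
    for (size_t i = 0; i < demo_once_count; i++) {
        *demo_onces[i] = fresh_once;
    }
    return 1;
}
```

In a forked child, the intended order would be: call the cleanup function first, then initialize the library as usual. Closing the inherited descriptors only affects the child, so the parent's connection stays usable.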
I'm just sharing my findings here in case somebody else is in a similar situation and wants to give it a try. However, while this workaround works for me, it might not work in other cases.