Hi Alex,
One of the main goals of nxweb was a programmer-friendly C API. Writing modules for multi-process servers such as Apache or Nginx seemed like a nightmare to me. Inter-process communication is nowhere near as convenient as inter-thread communication. So I knew from the start that I was going to use threads, not processes.
I had concerns about performance, though, because some people argue that processes scale better. After studying the reasoning behind those opinions I found one potential problem: 'false sharing' of memory, where threads running on different CPU cores write to separate data that happens to sit on the same cache line. To avoid false sharing I implemented guard memory areas between allocated blocks.
The funny thing is that I did not see any performance change after adding those guard areas, so I do not think false sharing was ever a performance factor for nxweb.
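To illustrate the guard-area idea, here is a minimal sketch (not nxweb's actual code). It assumes a 64-byte cache line, which is typical for x86-64 but not universal, and the names CACHE_LINE, padded_size and alloc_guarded are made up for this example. Each block is aligned to a cache line and padded to a whole number of lines, so two blocks handed to different threads never share a line.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed cache line size; check your target CPU. */
#define CACHE_LINE 64

/* Hypothetical helper: round size up to a whole number of cache lines. */
static size_t padded_size(size_t size) {
    if (size == 0) size = 1;
    return (size + CACHE_LINE - 1) / CACHE_LINE * CACHE_LINE;
}

/* Hypothetical helper: allocate a block that starts on a cache-line
 * boundary and occupies whole cache lines, so the padding at its tail
 * acts as a guard area against false sharing with the next block.
 * Uses C11 aligned_alloc; release the block with free(). */
static void *alloc_guarded(size_t size) {
    return aligned_alloc(CACHE_LINE, padded_size(size));
}
```

The cost is wasted memory for small blocks: a 10-byte request consumes a full 64 bytes here.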
Another performance problem could be multi-threaded malloc, which internally uses some kind of locking. I use my own free-list memory allocation (separate for each thread) for the most frequently allocated blocks, but there are still calls to malloc in the code. I think the problem can be mitigated by linking against tcmalloc (or a similar library).
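A per-thread free list can be sketched roughly like this (again only an illustration, not the nxweb source). BLOCK_SIZE, free_block, block_alloc and block_free are hypothetical names, and the sketch assumes fixed-size blocks and a C11 compiler for _Thread_local. Because every thread has its own list head, taking a block from the list or putting it back needs no lock; malloc is called only when the list is empty.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical fixed block size for the frequently allocated objects. */
#define BLOCK_SIZE 256

/* A free block stores the link to the next free block in its own memory. */
typedef union free_block {
    union free_block *next;
    unsigned char payload[BLOCK_SIZE];
} free_block;

/* One list head per thread (C11 thread-local storage), so no locking. */
static _Thread_local free_block *free_list = NULL;

/* Pop a block from this thread's list, falling back to malloc. */
static void *block_alloc(void) {
    free_block *b = free_list;
    if (b != NULL) {
        free_list = b->next;
        return b;
    }
    return malloc(sizeof(free_block));
}

/* Push a block onto the calling thread's list for later reuse. */
static void block_free(void *p) {
    free_block *b = p;
    b->next = free_list;
    free_list = b;
}
```

In this sketch blocks are never returned to the system, and a block freed by a different thread than the one that allocated it simply moves to the other thread's list.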
The event loop of nxweb is single-threaded. Threads in nxweb work independently and share as little information as possible, to avoid locks and similar synchronization overhead.
By the way, what benefits do you see in process workers over threads?
Yaroslav