Some things off the top of my head:
Security:
"sandbox" is a bit generic here. On Linux desktop there are at least two layers of sandbox that affect the zygote:
1) the BPF sandbox (the thing that filters syscalls). IIRC this one is pre-initialized in the zygote, to pay the cost of building the BPF tree once, but is actually entered in each child process after fork.
2) the namespace sandbox (the thing that isolates the uid and pid namespaces).
It seems that right now the zygote also takes care of the namespace bookkeeping, so I guess you can't easily get rid of it without losing 2) (or refactoring all of that out of the zygote).
Run-time:
The zygote avoids redoing all the dynamic relocations for each renderer.
On Android the cost of relocations was in the ballpark of hundreds of ms.
On Linux, if my ld-fu is correct:
$ LD_DEBUG=statistics /opt/google/chrome-beta/chrome --help 2>&1
21610:
21610: runtime linker statistics:
21610: total startup time in dynamic loader: 73899158 clock cycles
21610: time needed for relocation: 56836478 clock cycles (76.9%)
21610: number of relocations: 4271
21610: number of relocations from cache: 11347
21610: number of relative relocations: 502740
21610: time needed to load objects: 15789844 clock cycles (21.3%)
So relocation looks like roughly ~60 ms on a 1 GHz clock (56.8M cycles), and proportionally less at higher clock speeds.
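For anyone who wants to redo the arithmetic on their own machine, here is a minimal sketch of the conversion. It assumes the cycle counter reported by the loader ticks at the nominal clock rate, and the clock speeds in the loop are just hypothetical examples.

```python
# Convert the LD_DEBUG=statistics cycle counts quoted above into wall time.
# Assumption: the reported "clock cycles" tick at the nominal CPU clock rate.

def cycles_to_ms(cycles: int, clock_ghz: float) -> float:
    """Return the milliseconds taken by `cycles` at `clock_ghz` GHz."""
    return cycles / (clock_ghz * 1e9) * 1e3

TOTAL_CYCLES = 73_899_158  # total startup time in dynamic loader
RELOC_CYCLES = 56_836_478  # time needed for relocation

if __name__ == "__main__":
    for ghz in (1.0, 2.0, 3.0):  # hypothetical clock speeds
        print(f"{ghz:.1f} GHz: relocation {cycles_to_ms(RELOC_CYCLES, ghz):.1f} ms, "
              f"loader total {cycles_to_ms(TOTAL_CYCLES, ghz):.1f} ms")
```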
On top of dynamic loading and relocations, on Linux, IIRC there is some further bootstrapping that the zygote does, amortizing the initialization cost and hence reducing the startup cost for a new renderer. If I am reading the code correctly, essentially anything that happens in ContentMainRunnerImpl::Initialize() (modulo exclusions based on process_type) is shared copy-on-write with all the zygote forks: things like loading the V8 snapshot, initializing ICU, early-initializing NSS, etc.
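To make the amortization idea concrete, here is a toy sketch of the fork-after-init pattern. It is not Chrome's zygote code: `expensive_init` and `spawn_children` are hypothetical names, and the dict only stands in for state such as relocated data or an initialized ICU. It needs a Unix-like OS for `os.fork`.

```python
import os

def expensive_init() -> dict:
    # Hypothetical stand-in for the one-time work (relocations, snapshot
    # loading, library initialization) done before any child exists.
    return {"table": list(range(100_000))}

def spawn_children(state: dict, n: int) -> list:
    """Fork n children from the already-initialized parent; return exit codes."""
    pids = []
    for _ in range(n):
        pid = os.fork()
        if pid == 0:
            # Child: `state` is already mapped copy-on-write, no re-init needed.
            ok = state["table"][-1] == 99_999
            os._exit(0 if ok else 1)
        pids.append(pid)
    codes = []
    for pid in pids:
        _, status = os.waitpid(pid, 0)
        codes.append(os.WEXITSTATUS(status))
    return codes
```

Each child starts with the initialized state already in its address space, so the init cost is paid once in the parent. (In CPython, reference counting will dirty some of those pages on access, so this shows the startup side of the argument better than the memory side.)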
Memory:
The aforementioned relocations also have a memory cost. In the case of the zygote, that cost is paid once and amortized over all the child processes.
Without the zygote, this would cost an extra ~6 MB per process:
$ readelf -WS /opt/google/chrome-beta/chrome
[Nr] Name Type Address Off Size ES Flg Lk Inf Al
...
[25] .data.rel.ro PROGBITS 0000000006a8b590 6a8a590 5b5500 00 WA 0 0 16
0x5b5500 -> 5.98 MB
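The size conversion above, as a small sketch (the helper name is mine). Note that 5.98 MB is in decimal megabytes; in MiB the same section is about 5.71.

```python
def section_size(hex_size: str):
    """Return (bytes, decimal MB, MiB) for a readelf 'Size' column value."""
    size = int(hex_size, 16)
    return size, size / 1e6, size / 2**20

if __name__ == "__main__":
    size, mb, mib = section_size("5b5500")  # .data.rel.ro size from readelf
    print(f"{size} bytes = {mb:.3f} MB = {mib:.3f} MiB")
```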
Actual impact on memory usage:
$ cat /proc/.../smaps
7fbdd1c81000-7fbdd2233000 r--p 06a5d000 fc:00 665771 /opt/google/chrome-unstable/chrome
...
Shared_Dirty: 5796 kB
(this would become Private_Dirty if the zygote were dropped)
Similarly, if my reading above is correct, re-initializing V8, ICU, etc. in each process would cause further memory to no longer be shared copy-on-write.
A quick check suggests that the zygote shares ~8 MB of dirty memory with the other processes:
$ cat /proc/$PID_OF_ZYGOTE/smaps | grep Shared_Dirt | awk '{TOTAL += $2} END {print TOTAL}'
8092
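The same sum can be scripted if you want to compare several fields or processes. This is a minimal sketch: the function names are mine, and it only relies on the `Field:   N kB` line format that smaps uses.

```python
import re

def sum_smaps_field(smaps_text: str, field: str = "Shared_Dirty") -> int:
    """Sum one per-mapping field (in kB) across all mappings in smaps text."""
    pattern = re.compile(rf"^{re.escape(field)}:\s+(\d+)\s+kB", re.MULTILINE)
    return sum(int(m.group(1)) for m in pattern.finditer(smaps_text))

def read_smaps(pid) -> str:
    """Read /proc/<pid>/smaps (Linux only; needs permission for that pid)."""
    with open(f"/proc/{pid}/smaps") as f:
        return f.read()

# Example usage on Linux:
#   print(sum_smaps_field(read_smaps("self")))
#   print(sum_smaps_field(read_smaps("self"), "Private_Dirty"))
```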