On 16 June 2015 at 03:07, Luke Gorrie <lu...@snabb.co> wrote:
> On 15 June 2015 at 18:43, Luke Gorrie <lu...@snabb.co> wrote:
>>
>> I suppose the fork() design also does not require the sandboxes and "new
>> style" apps?
>
>
> To expand on that a bit...
>
> There are a couple of reasons the "sandboxed" apps that can't see each other
> are interesting:
>
> 1. If we want the freedom to run app instances in different VMs/processes
> then it is important that they don't casually depend on each other e.g. by
> expecting to communicate using shared Lua variables.
>
> 2. If we want to run multiple app instances in parallel within the same
> process then they need to be running in separate Lua VMs (lua_states).
>
> Really there are many different ways we could tackle problem #1: separate VM
> per app, Lua-level sandbox per app, or simply explain that you should not
> depend on apps to have a shared environment (or otherwise carefully document
> the implications of running them in separate processes). This does not seem
> to be on the critical path for the parallel app network implementation: that
> will work without sandboxes and separately we can consider whether they are
> worth adding in their own right.
>
> Problem #2 is also not on the critical path if we are fork()ing worker
> processes. Then we only have one thread of execution per process and so we
> only need one Lua VM.
Nitpicking: the Lua VM itself is fork()ed, so there is one Lua VM per
process, but it isn't separately initialized; it inherits all the
state up to the fork() point.
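For instance (a minimal sketch, assuming ljsyscall's S.fork() and
S.waitpid(), which is what Snabb uses for syscalls):

    -- The child VM is not freshly initialized: it inherits
    -- everything set up before the fork() point.
    local S = require("syscall")

    local x = 42                    -- set before the fork
    local pid = S.fork()
    if pid == 0 then
       print("child sees x =", x)   -- prints 42: inherited state
       os.exit(0)
    else
       S.waitpid(pid, 0)            -- reap the child
    end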
There are some confusing effects when inheriting core.shm objects: the
mmap()'ed part seems to be shared, but there seems to be some
per-process kernel housekeeping that produces surprising COW effects.
It's much easier to do the shm.map(name) _after_ the fork(). The
pid-relative name form makes it easy to share within a fork; it would
be nice to have a whole-program id too, to make it easy to share
between forks but different between program invocations.
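Something like this (a minimal sketch; it assumes core.shm's
map(name, type) form and ljsyscall's fork(), and the "group/..." name
is made up):

    local S   = require("syscall")
    local shm = require("core.shm")

    local pid = S.fork()
    if pid == 0 then
       -- child: map *after* the fork, in this process's own page tables
       local counter = shm.map("group/rx-packets", "uint64_t")
       counter[0] = counter[0] + 1
       os.exit(0)
    else
       S.waitpid(pid, 0)
       -- parent: the same name maps the same backing file,
       -- so the child's update is visible here
       local counter = shm.map("group/rx-packets", "uint64_t")
       print("rx-packets:", tonumber(counter[0]))
    end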
>
>
> Separately...
>
>
> There is another issue in the back of my mind which is DMA memory mappings.
> If one process allocates a new HugeTLB then how do we map that into all the
> processes? That is, how do we make sure the 'struct packet *' addresses on
> the inter-app rings are actually pointing to the same valid memory in every
> worker process?
I still don't have a foolproof way to share packets between forks.
The benchmarks mentioned used a single freelist per process, and all
packets were allocated, processed and released within the same fork.
I have some hope that it should be possible to allocate the freelist
at startup (before the forks) and keep the pointers consistent and the
DMA mapping shared and non-COW. That should allow the apps to share
pointers within that space. I guess it wouldn't be able to grow after
the fork() point.
If that proves too problematic, the only way out I see would be to
work with offsets within the pool instead of pointers, and let each
fork map it at a different address.
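The conversion would be cheap either way (a sketch; struct packet is
stubbed out here, and the helper names are mine):

    local ffi = require("ffi")

    -- Stand-in for Snabb's real packet struct.
    ffi.cdef[[
    struct packet { uint16_t length; unsigned char data[10240]; };
    ]]

    -- Each fork maps the pool at its own base address and puts
    -- offsets, not raw pointers, on the inter-process rings.
    local function to_offset (base, p)
       return ffi.cast("uintptr_t", p) - ffi.cast("uintptr_t", base)
    end

    local function from_offset (base, offset)
       return ffi.cast("struct packet *",
                       ffi.cast("uintptr_t", base) + offset)
    end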
> The simplest way would be to reserve memory at the start before forking and
> only use that. Probably reasonable. Fancier would be to catch SIGSEGV,
> recognize access to unmapped DMA memory, and map it in on the fly. That
> sounds pretty horrendously complicated though. Perhaps there is a middle
> ground: a simple mechanism that supports dynamic allocation.
mmap() supports an 'address hint'. Maybe we could somehow reserve a
big address space (say, the third terabyte?) and ask mmap() to put
each DMA-allocated block at a specific place. What I don't know is
whether simply stacking one block right after the previous one would
be OK, or whether there's some housekeeping data just after or just
before the mmap()'ed block (I'm guessing not, since mmap() favors
page-aligned blocks).
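Roughly like this (a sketch with ljsyscall; the base address and the
bump allocation are illustrative only, sizes are assumed to be
page-aligned, and real DMA memory would be hugetlbfs-backed rather
than anonymous):

    local S   = require("syscall")
    local ffi = require("ffi")

    local BASE = 0x20000000000ULL    -- 2 TB: start of the third terabyte
    local next_addr = BASE

    -- Map `size` bytes at a known address, so every fork can repeat
    -- the same mapping and end up with identical pointers.
    local function map_dma (size)
       local ptr = assert(S.mmap(ffi.cast("void *", next_addr), size,
                                 "read, write",
                                 "shared, anonymous, fixed", -- MAP_FIXED: demand, not hint
                                 -1, 0))
       next_addr = next_addr + size  -- stack the next block right after
       return ptr
    end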
> There are likely other issues too e.g. what happens if a worker process
> crashes hard? (Can we reclaim the packet memory it was using? Or should we
> terminate the whole process tree and restart?) Conceivably it might even be
> worth making copies of packets when they pass between processes so that
> memory ownership is very clear. I am not sure.
Every non-shared POSIX resource is automatically released only after
the parent process calls wait() to collect the exit status; until
then it lingers as a zombie process, holding on to its resources.
Our application-specific resources would be ours to deal with; from
the OS point of view they're shared, so they won't be released while
other forks are running. Since packets don't contain pointers to
other structures, I guess it would be safe to return them to the
freelist for reuse; the biggest challenge would be how to identify
which packets were handled by this specific process.
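One hypothetical approach: stamp each packet with the owner's pid when
it leaves the freelist, so a supervisor can sweep a dead worker's
packets back. A single-process sketch (the owner_pid field and all
the names here are made up, and a real version would keep the pool
and freelist in shared memory):

    local ffi = require("ffi")
    local S   = require("syscall")

    ffi.cdef[[
    struct owned_packet {
       int32_t owner_pid;      /* 0 = free, else pid of the owning fork */
       uint16_t length;
       unsigned char data[1024];
    };
    ]]

    local npackets = 256
    local pool = ffi.new("struct owned_packet[?]", npackets)
    local freelist = {}
    for i = 0, npackets - 1 do freelist[#freelist+1] = pool + i end

    local function allocate ()
       local p = table.remove(freelist)
       p.owner_pid = S.getpid()      -- stamp ownership on allocation
       return p
    end

    -- After a worker dies, return its packets to the freelist.
    local function sweep (dead_pid)
       for i = 0, npackets - 1 do
          local p = pool + i
          if p.owner_pid == dead_pid then
             p.owner_pid = 0
             freelist[#freelist+1] = p
          end
       end
    end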
Not releasing those packets and restarting only the apps that died
should keep the system running, but leaking memory. Maybe we could
set some 'sync points' where a full terminate-and-restart (resetting
the shared freelist) would be acceptable.
> I should read the code on your fork_poc branch to see how it works for now
> :). Have been away from the keys for a bit.
It's really, really simple: just a 4-5 line spawn() function that
does a fork() and require()s a named file. In the test, that file is
just a config.new()/app/link/engine.configure()/run(). The 'main'
process just initializes the shared stats object, spawn()s a lot of
workers, wait()s for them all, and shows the stats.
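In outline (a sketch from memory, not the branch verbatim; the worker
module name is made up):

    local S = require("syscall")

    local function spawn (module_name)
       local pid = S.fork()
       if pid == 0 then
          require(module_name)  -- child: runs configure()/run() and exits
          os.exit(0)
       end
       return pid               -- parent: keep the pid for waitpid()
    end

    -- main: start the workers, then reap them all
    local pids = {}
    for i = 1, 4 do pids[#pids+1] = spawn("fork_poc.worker") end
    for _, pid in ipairs(pids) do S.waitpid(pid, 0) end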
--
Javier