What OS are you on? On POSIX (but not Windows), there's a cheap trick
that works great when you can get away with it: when you spawn a child
process via fork() (as multiprocessing does when it uses the "fork"
start method), the child process ends up with what *looks* like a
complete copy of the parent process's
memory -- but the operating system implements this using a sneaky
virtual memory hack, where the memory isn't actually copied until it's
written to (the term of art is "copy-on-write").

The easiest way to exploit this is to stash your data in a global variable,
then use multiprocessing to spawn some workers in the usual way -- but
have them access the data via the global, instead of via explicit
message passing. Note that (1) the data will be writeable in the
children, but (2) writes made in one child won't be visible in the
others, and (3) when you write to the data, your actual memory usage
will go up. So you should pretend it's read-only.
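
Here is a minimal sketch of the global-variable trick, assuming the "fork" start method is available (so Linux or similar, not Windows). The names BIG_DATA, partial_sum and run are made up for illustration, and the array of integers is only a stand-in for your real data:

```python
import multiprocessing as mp
from array import array

# Hypothetical stand-in for a large dataset; filled in by run() before forking.
BIG_DATA = None


def partial_sum(bounds):
    """Sum one slice of the inherited global; only the bounds are pickled."""
    start, stop = bounds
    # memoryview slicing reads the shared buffer without copying it.
    return sum(memoryview(BIG_DATA)[start:stop])


def run(n_items=1_000_000, n_workers=4):
    global BIG_DATA
    # The global must be set *before* the workers are forked,
    # otherwise the children inherit the old value (None).
    BIG_DATA = array("q", range(n_items))

    ctx = mp.get_context("fork")  # POSIX only
    step = n_items // n_workers
    chunks = [
        (i * step, n_items if i == n_workers - 1 else (i + 1) * step)
        for i in range(n_workers)
    ]
    with ctx.Pool(n_workers) as pool:
        return sum(pool.map(partial_sum, chunks))
```

Calling run() returns the same total as summing the data in the parent, but the only things sent between processes are the small (start, stop) tuples and the per-chunk results. The workers treat BIG_DATA as read-only, as recommended above.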

One possible spanner in the works is that every time you access a
*Python* object, its refcount gets updated -- which is a write
operation. But this is only a big deal if you have many small Python
objects -- if you have one Python object that you use to access a
giant pile of memory, then it's no big deal. This might interact badly
with pandas's habit of using the 'object' dtype though :-/
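
To make the refcount point concrete, here is a small sketch, assuming CPython. Record and refcount_delta are hypothetical names, and array.array is used only as a standard-library stand-in for "one Python object in front of a giant pile of memory":

```python
import sys
from array import array


class Record:
    """Hypothetical small object, like one element of an 'object' column."""


def refcount_delta():
    rec = Record()
    before = sys.getrefcount(rec)
    alias = rec  # merely holding another reference...
    after = sys.getrefcount(rec)  # ...changes the count stored in the object
    return after - before


# Many small objects: each carries its own refcount field, so touching them
# in a forked child writes to many different memory pages.
many_small = [Record() for _ in range(1000)]

# One big buffer: a single Python object (one refcount) wrapping raw doubles,
# so reading the values never writes to the pages that hold the data.
one_big = array("d", range(1000))
```

In a forked child, looping over many_small dirties the pages holding those objects, while reading from one_big only touches the refcount of the one wrapper object.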
-n