It depends on exactly what you want to do with the DataFrame. If each process only needs part of it, you can split it up and send each process just its own piece (e.g., iterating over groups from groupby, or over rows with iterrows).
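A minimal sketch of that idea, assuming a hypothetical per-chunk function (the names process_chunk and parallel_groupby_sum are illustrative, not a standard API): split the DataFrame by a key column and hand each group to a worker process, so only that slice is pickled and sent across the process boundary.

```python
import pandas as pd
from multiprocessing import Pool


def process_chunk(chunk):
    # Placeholder per-chunk work: sum one column of this group.
    return chunk["value"].sum()


def parallel_groupby_sum(df, n_workers=2):
    # Each group is pickled and shipped to a worker on its own;
    # the full DataFrame is never sent to any single process.
    chunks = [group for _, group in df.groupby("key")]
    with Pool(n_workers) as pool:
        return pool.map(process_chunk, chunks)


if __name__ == "__main__":
    # The __main__ guard is required on Windows, where child
    # processes re-import this module rather than forking.
    df = pd.DataFrame({"key": ["a", "a", "b"], "value": [1, 2, 5]})
    print(parallel_groupby_sum(df))
```

Note the if __name__ == "__main__" guard: on Windows (the spawn start method) each worker re-imports the main module, so unguarded top-level code would run again in every child.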
If that isn't the case, there isn't much you can do. Windows's multiprocessing capabilities are very different from those of pretty much any other modern operating system (no fork, so child processes can't cheaply inherit the parent's data), and you are running into one of those differences.
You can share data if you use multiple threads rather than processes. However, then you get bitten by Python's Global Interpreter Lock (GIL), which means that only one thread can execute Python bytecode at a time, pretty much eliminating the advantage of using multiple threads in the first place (at least for CPU-bound work)! You can, however, get around the GIL by writing multithreaded code in Cython and explicitly releasing the GIL around the parts you want to run in parallel.
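For illustration, a Cython sketch of that pattern (this is an assumption about what such code might look like, not code from the presentation below; it must be compiled with OpenMP support, e.g. via a setup.py, before it can run):

```cython
from cython.parallel import prange


def parallel_sum(double[:] data):
    # Sum a typed memoryview across threads without the GIL.
    cdef double total = 0.0
    cdef Py_ssize_t i
    # nogil=True releases the GIL inside the loop, so prange's
    # iterations can run truly in parallel on multiple cores;
    # Cython treats "total += ..." as a parallel reduction.
    for i in prange(data.shape[0], nogil=True):
        total += data[i]
    return total
```

The key constraint is that code inside the nogil block can only touch C-level data (typed memoryviews, C variables), not Python objects, which is why the DataFrame's underlying NumPy arrays are what you'd pass in.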
Here’s a brief, very readable presentation by Francesc Alted that talks about this and other ways around the GIL:
https://python.g-node.org/python-summerschool-2011/_media/materials/parallel/parallelcython.pdf
Best,
Richard Stanton