
Hello,
As I was intrigued by your problem, I ran some tests to figure it out.
Unfortunately, none of them was conclusive. I tried three different problems: a GA problem (sorting network), a GP problem (artificial ant) and an ES problem (the fctmin example provided with DEAP).
Each of them was tuned so that it spent most of its computing time in the evaluation function, allowing us to see whether the distribution works well.
I ran those tests on two separate platforms (both on Linux): a high-end workstation (8 cores, 12 GB RAM, 64-bit) and a "low-end" computer (2 cores, 2 GB RAM, 32-bit), with Python 2.7 and PyPy 1.8.
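For reference, the distribution mechanism in all three tests was the usual DEAP one: replacing the toolbox's map with multiprocessing.Pool.map. Here is a minimal sketch of that setup (the evaluate function below is just a placeholder, not the real fitness functions):

import multiprocessing

from deap import base

toolbox = base.Toolbox()

def evaluate(individual):
    # Placeholder for the real fitness function (sorting network,
    # artificial ant or fctmin); DEAP expects a fitness tuple.
    return (sum(individual),)

toolbox.register("evaluate", evaluate)

if __name__ == "__main__":
    pool = multiprocessing.Pool()       # one worker per CPU core by default
    toolbox.register("map", pool.map)   # evaluations now run in parallel
    # ... then run the algorithm as usual; DEAP calls
    # toolbox.map(toolbox.evaluate, population) internally.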
I was not able to reproduce your bug: each time, PyPy scaled adequately, just like CPython.
The attached screenshot shows this correct behavior (it was taken from an ES run): the small dips are the non-parallelizable work (selection, cloning, etc.), but we can see that all 8 CPU cores are 100% loaded when it comes to evaluating individuals, which is the expected behavior.
More specifically on ES, I should point out something important (which generalizes to any use of multiprocessing.Pool.map() with a very small number of items): multiprocessing does not implement a load-balancing algorithm, it just splits the work into approximately equal chunks. Having a small number of individuals, as in ES, combined with a high standard deviation in evaluation times may therefore lead to a performance decrease.
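To make that concrete, here is a toy sketch (not DEAP-specific; the sleep durations stand in for evaluation costs) showing why few individuals plus a high spread in evaluation times caps the speedup:

import multiprocessing
import random
import time

def evaluate(duration):
    # Stand-in for a fitness evaluation whose runtime varies a lot
    # from one individual to the next.
    time.sleep(duration)
    return (duration,)

if __name__ == "__main__":
    random.seed(42)
    # An ES-like population: barely more individuals than cores.
    population = [random.uniform(0.05, 1.0) for _ in range(10)]
    pool = multiprocessing.Pool(8)

    start = time.time()
    pool.map(evaluate, population)
    elapsed = time.time() - start

    ideal = sum(population) / 8.0   # what perfect load balancing would give
    print("wall time: %.2fs  ideal: %.2fs  slowest eval: %.2fs"
          % (elapsed, ideal, max(population)))
    # With so few items, once each worker holds one individual there is
    # nothing left to redistribute: the wall time is bounded below by the
    # slowest single evaluation, not by the ideal average.

For larger populations, where the default chunk size grows beyond one item, passing an explicit chunksize to pool.map() can also help even out the work between workers.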
If you have any additional information that might help us figure out why your algorithm does not distribute well with multiprocessing, please let us know (although I do not believe this is a DEAP bug).
Regards,
Marc-André Gardner