SCOOP/multiprocessing - no performance improvement


Christoph

Jan 18, 2016, 6:10:29 AM
to deap-users
Hi,

I am currently testing DEAP for future structural optimization tasks. Using the setup from the beginner's tutorial, I'm testing parallel evaluation by optimizing the Rosenbrock function. To assess the benefits of parallelization, I additionally do some useless but intensive calculations in the objective function. The weird thing is that SCOOP and multiprocessing both seem to work (they spawn multiple Python processes), but both are slower than not using parallelization at all. The OS is Windows 7 x64.
What's wrong here?

Code:

import random
import numpy as np
import multiprocessing

from deap import base, creator, tools
from scoop import futures
from timeit import default_timer as timer

if __name__ == '__main__': # Protecting multiprocessing pool as indicated in the tutorial

    # Create Fitness and Individual Classes
    creator.create('FitnessMin', base.Fitness, weights=(-1.0,)) # Weights must be a sequence
    creator.create('Individual', list, fitness=creator.FitnessMin)
    
    # Create Individual Type
    IND_SIZE = 2
    
    toolbox = base.Toolbox()
    def rand():
        return random.uniform(-3,3)
        
    toolbox.register('attr_float', rand)
    toolbox.register('individual', tools.initRepeat, creator.Individual,
                     toolbox.attr_float, n=IND_SIZE)
                    
    # Create Population Type
    toolbox.register('population', tools.initRepeat, list, toolbox.individual)
    
    # Define Evaluation Function
    def evaluate(individual):
        """Fitness evaluation. Must return a tuple even for single objective optimization.
        """
        x = (individual[0])
        y = (individual[1])
        
        # Do some heavy stuff which is totally useless
        n = 1873895732
        import math
        for i in range(10000000):
            math.sqrt(n)
            i = i + 1
        print 'Done!'

        f = (1-x)**2 + 100*(y-x**2)**2 # Actual rosenbrock fct

        return (f,) # Must be an iterable
        
    # Define Operators
    toolbox.register('mate', tools.cxTwoPoint)
    toolbox.register('mutate', tools.mutGaussian, mu = 0, sigma = 1, indpb = 0.1)
    toolbox.register('select', tools.selTournament, tournsize=3)
    toolbox.register('evaluate', evaluate)
    #toolbox.register('map', futures.map) # use this line instead of the following two, and run 'python -m scoop file.py' from cmd
    pool = multiprocessing.Pool()
    toolbox.register('map', pool.map)
    
    # Optimization Algorithm
    def main():
        """Defines the optimization process. Should be contained in the main function.
        """
        
        # General Settings
        POP_SIZE = 20   # Population size
        NGEN = 1        # Number of generations
        CXPB = 0.2      # Crossover probability
        MUTPB = 0.5     # Mutation probability
        
        # Initialize population
        pop = toolbox.population(n=POP_SIZE)
        
        # Evaluate the entire initial population
        fitnesses = map(toolbox.evaluate, pop)
        # Assign the fitnesses
        for ind, fit in zip(pop,fitnesses):
            ind.fitness.values = fit
            
        # Evolutionary Loop
        for g in range(NGEN):
            # Select the next generation's individuals (len(pop) individuals)
            offspring = toolbox.select(pop, len(pop))
            # Clone the selected individuals
            offspring = map(toolbox.clone, offspring)
            
            # Apply crossover on the offspring
            for child1, child2 in zip(offspring[::2], offspring[1::2]):  # Pairs offspring with neighbouring indices for crossover
                if random.random() < CXPB: 
                    # Do crossover inplace
                    toolbox.mate(child1, child2)
                    # Request reevaluation of fitnesses
                    del child1.fitness.values
                    del child2.fitness.values
                    
            # Apply mutation on the offspring
            for mutant in offspring:
                if random.random() < MUTPB:
                    # Do mutation inplace
                    toolbox.mutate(mutant)
                    # Request reevaluation of fitness
                    del mutant.fitness.values
                    
            # Evaluate fitnesses of individuals with invalid fitnesses
            invalid_ind = [ind for ind in offspring if not ind.fitness.valid]
            fitnesses = map(toolbox.evaluate, invalid_ind)
            for ind, fit in zip(invalid_ind, fitnesses):
                ind.fitness.values = fit
                        
            # Replace the population by the offspring
            pop[:] = offspring
        
        return pop
        
    tic = timer()
    pop = main()
    print timer()-tic

Best regards,
Christoph

Ben Elliston

Jan 18, 2016, 6:57:12 AM
to deap-...@googlegroups.com
Hi Christoph

How are you running this script from the command line? You need python -m scoop. Without it, you may see:

RuntimeWarning: SCOOP was not started properly.
Be sure to start your program with the '-m scoop' parameter. You can
find further information in the documentation.
Your map call has been replaced by the builtin serial Python map().

Cheers, Ben

François-Michel De Rainville

Jan 18, 2016, 7:47:48 AM
to deap-users
The problem is that the overhead of parallelisation is bigger than the time it takes to compute the very simple Rosenbrock function, so there is little to gain (cf. Amdahl's law). Try adding a sleep of a few milliseconds in the evaluation.
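A rough back-of-the-envelope model of this point, as a sketch: `amdahl_speedup` is the textbook Amdahl's-law formula, while `speedup_with_overhead` is a hypothetical per-task model (an assumption for illustration, not something SCOOP or multiprocessing report) in which each evaluation costs `t_task` seconds of work split over `workers`, plus a fixed `overhead` in seconds for pickling and dispatching it.

```python
def amdahl_speedup(parallel_fraction, workers):
    """Amdahl's law: ideal speedup when only `parallel_fraction` of the
    runtime can be spread over `workers` processes."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


def speedup_with_overhead(t_task, overhead, workers):
    """Hypothetical per-task model: the serial run spends t_task seconds per
    evaluation; the parallel run spends t_task / workers of computation plus
    a fixed dispatch overhead (pickling, inter-process communication)."""
    serial_time = t_task
    parallel_time = t_task / workers + overhead
    return serial_time / parallel_time


if __name__ == '__main__':
    # A bare Rosenbrock evaluation takes on the order of microseconds,
    # so the assumed 0.1 ms overhead dominates and the speedup is below 1.
    print(speedup_with_overhead(t_task=1e-6, overhead=1e-4, workers=4))
    # An evaluation taking about 0.1 s amortises the same overhead,
    # and the speedup approaches the number of workers.
    print(speedup_with_overhead(t_task=0.1, overhead=1e-4, workers=4))
```

The overhead value is an illustrative guess; the real figure depends on the machine, the size of the pickled individuals and the library used.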

Cheers,
François-Michel



Christoph

Jan 18, 2016, 2:58:25 PM
to deap-users
Hi Ben and François-Michel,

Thank you for your replies. I am running the script from cmd via "python -m scoop", and SCOOP is indeed launching 4 processes on my 4-core machine (as is multiprocessing; on a side note, multiprocessing does not close its processes after the computation unless I close the Python interpreter).

I am aware that I have to provide something bigger to compute in parallel. For that I was using a repeated square root computation of a large number in the evaluation, but I have also tried time.sleep(), with no gain at all from parallel processing.

I've pushed the code to pastebin for better readability. It's basically just the example from the beginner's tutorial. You may want to check it out:

Cheers,
Christoph


Marc-André Gardner

Jan 18, 2016, 3:06:34 PM
to deap-users
Hi Christoph,

Just replace:

fitnesses = toolbox.map(toolbox.evaluate, invalid_ind)

by:

fitnesses = toolbox.map(toolbox.evaluate, invalid_ind)

Don't forget that this line appears two times in main().

Otherwise, your code always uses the Python builtin map, no matter which map operator you declared in the toolbox. Apart from that, it should work (with either multiprocessing or SCOOP).
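The effect can be reproduced without DEAP. The sketch below uses a hypothetical `MiniToolbox` (loosely modelled on the idea behind `base.Toolbox.register`, which stores a partial function under an alias; this is not DEAP's code) and a `counting_map` that records each call. Only the call spelled `toolbox.map` reaches the registered map; a bare `map` is always Python's builtin.

```python
import functools


class MiniToolbox(object):
    """Hypothetical stand-in for deap.base.Toolbox: register() stores a
    functools.partial as an attribute under the given alias."""

    def register(self, alias, function, *args, **kwargs):
        setattr(self, alias, functools.partial(function, *args, **kwargs))


calls = []


def counting_map(func, iterable):
    """Stands in for pool.map or futures.map and records that it was used."""
    calls.append(func)
    return list(map(func, iterable))


def evaluate(individual):
    return (sum(individual),)


toolbox = MiniToolbox()
toolbox.register('evaluate', evaluate)
toolbox.register('map', counting_map)

population = [[1, 2], [3, 4]]

# Bare map: always the builtin, the registered map is never called.
serial = list(map(toolbox.evaluate, population))
print('calls after bare map: %d' % len(calls))

# toolbox.map: dispatches to whatever was registered (here counting_map).
dispatched = toolbox.map(toolbox.evaluate, population)
print('calls after toolbox.map: %d' % len(calls))
```

Both calls compute the same fitnesses; the only difference is which map does the work, which is why the script ran serially even though a pool had been registered.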

Do not hesitate if you have any other questions,

Marc-André

Marc-André Gardner

Jan 18, 2016, 3:10:24 PM
to deap-users
Hi again,

Sorry, as you may have guessed, the previous (wrong) code is:

fitnesses = map(toolbox.evaluate, invalid_ind)

You have to add the "toolbox" prefix to use the map registered in this toolbox. This is the correct code:

fitnesses = toolbox.map(toolbox.evaluate, invalid_ind)

Sorry for the confusion,

Marc-André

Christoph

Jan 18, 2016, 4:22:07 PM
to deap-users
Hi Marc-André,

Thank you. I should have spotted that! This brings me to the next problem. Changing exactly these lines, I obtain:

Using SCOOP:

[2016-01-18 22:10:36,105] launcher  INFO    SCOOP 0.7 1.1 on win32 using Python 2.7.11 |Anaconda 2.4.1 (32-bit)| (default, Dec  7 2015, 14:13:17) [MSC v.1500 32 bit (Intel)], API: 1013
[2016-01-18 22:10:36,105] launcher  INFO    Deploying 4 worker(s) over 1 host(s).
[2016-01-18 22:10:36,105] launcher  INFO    Worker distribution:
[2016-01-18 22:10:36,105] launcher  INFO       127.0.0.1:       3 + origin
[2016-01-18 22:10:36,885] scoopzmq  (127.0.0.1:53974) ERROR   An instance could not find its base reference on a worker. Ensure that your objects have their definition available in the root scope of your program.
'module' object has no attribute 'Individual'

Using multiprocessing:

Running the script spawns subprocesses, but they do not do anything at all. I also tried the standard multiprocessing syntax on that toolbox element, to no avail:

jobs= toolbox.map(toolbox.evaluate, invalid_ind)
fitnesses = jobs.get()

Christoph

Marc-André Gardner

Jan 18, 2016, 4:32:41 PM
to deap-users
You're right. The if __name__ == '__main__' block should be used only to call the root function.
No worker except the root one ever executes the code in this block, so if you define variables in it (as you do in your code), these variables won't be reachable by the other workers, hence the error you see: "Ensure that your objects have their definition available in the root scope of your program."
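The underlying mechanism, roughly, is that pickle (which both SCOOP and multiprocessing rely on to ship individuals to workers) stores a class by reference: a module name plus an attribute name that the receiving process must be able to look up itself. The single-process sketch below mimics this. `fake_creator` is a hypothetical module standing in for `deap.creator`, and `create` is a simplified stand-in for `creator.create`; deleting the attribute before unpickling plays the role of a worker that never ran the `create` call because it sat inside the `__main__` block.

```python
import pickle
import sys
import types

# Hypothetical stand-in for deap.creator: a module whose classes only
# exist after create() has been called at runtime.
fake_creator = types.ModuleType('fake_creator')
sys.modules['fake_creator'] = fake_creator


def create(name, base):
    """Simplified stand-in for creator.create: build a class and attach it
    to the fake_creator module under `name`."""
    cls = type(name, (base,), {'__module__': 'fake_creator'})
    setattr(fake_creator, name, cls)
    return cls


# "Root" process: the class exists, so an individual can be pickled.
Individual = create('Individual', list)
payload = pickle.dumps(Individual([0.5, -1.2]))

# Unpickling works while the class can still be found by name.
restored = pickle.loads(payload)
print('restored %s %r' % (type(restored).__name__, restored))

# "Worker" that never executed create(): the lookup by name fails.
delattr(fake_creator, 'Individual')
worker_error = None
try:
    pickle.loads(payload)
except AttributeError as exc:
    worker_error = exc
    print('worker-side failure: %s' % exc)
```

Moving the `creator.create(...)` calls to module level means every process that imports the script defines the classes, so the lookup succeeds on the workers too.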

Basically, your if __name__ == '__main__' block can be reduced to this (with SCOOP):

if __name__ == '__main__':
   tic = timer()
   pop = main()
   print timer()-tic
 
or this (with multiprocessing):

if __name__ == '__main__':
   pool = multiprocessing.Pool()
   toolbox.register('map', pool.map) 
   tic = timer()
   pop = main()
   print timer()-tic

and it should work.
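The same layout can be sketched with the standard library alone. In the DEAP script, the `creator.create` and `toolbox.register` calls would sit at module level next to `evaluate`; here a plain `evaluate` and a hypothetical `main(map_fn)` parameter stand in for them (illustrative names, not DEAP API). On Windows, multiprocessing starts workers by re-importing the script, so only module-level definitions are visible to them. The `__main__` block only creates the pool, runs `main`, and then closes and joins the pool so the worker processes exit instead of lingering.

```python
import multiprocessing
import random
from timeit import default_timer as timer


# Module level: visible to every worker process.
def evaluate(individual):
    """Rosenbrock function; returns a 1-tuple like a DEAP fitness."""
    x, y = individual
    return ((1 - x) ** 2 + 100 * (y - x ** 2) ** 2,)


def main(map_fn=map, pop_size=20, seed=0):
    """Evaluate a random population with whichever map is passed in."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-3, 3), rng.uniform(-3, 3)] for _ in range(pop_size)]
    fitnesses = list(map_fn(evaluate, pop))
    return pop, fitnesses


if __name__ == '__main__':
    # Only the root process runs this block.
    pool = multiprocessing.Pool(processes=4)
    try:
        tic = timer()
        pop, fitnesses = main(pool.map)
        print('elapsed: %.3f s' % (timer() - tic))
    finally:
        pool.close()  # accept no more tasks
        pool.join()   # wait for the worker processes to exit
```

Passing the map in explicitly is just one way to keep `main` agnostic of the backend; registering it on the toolbox, as in the replies above, achieves the same thing.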

Good luck!

Marc-André

Christoph

Jan 19, 2016, 5:13:08 AM
to deap-users
Hi,

thanks for your help - I got it working the way I wanted using

if __name__ == '__main__':
    
    pool = multiprocessing.Pool(processes=4)
    toolbox.register('map', pool.map_async)

and, for the evaluation in the main() function,

jobs = toolbox.map(toolbox.evaluate, pop)
fitnesses = jobs.get()
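The difference between the two pool calls also shows why `jobs.get()` cannot work while `pool.map` is the registered map: `map` blocks and returns a plain list, whereas `map_async` returns an `AsyncResult` whose `get()` blocks until the list is ready. The sketch uses `multiprocessing.pool.ThreadPool`, which offers the same `map`/`map_async` interface, only so that it runs anywhere without spawning processes; with `multiprocessing.Pool` the calls are the same.

```python
from multiprocessing.pool import ThreadPool


def evaluate(individual):
    """Rosenbrock fitness as a 1-tuple."""
    x, y = individual
    return ((1 - x) ** 2 + 100 * (y - x ** 2) ** 2,)


pop = [[1.0, 1.0], [0.0, 0.0], [-1.0, 2.0]]

pool = ThreadPool(processes=2)
try:
    # map: blocks and returns the list of results directly.
    direct = pool.map(evaluate, pop)

    # map_async: returns an AsyncResult immediately...
    jobs = pool.map_async(evaluate, pop)
    # ...and get() blocks until all results are available.
    fitnesses = jobs.get()

    # The plain list returned by map has no get() method.
    has_get = hasattr(direct, 'get')
finally:
    pool.close()
    pool.join()

print(fitnesses)
```

Both variants return the results in input order, so the subsequent `zip(pop, fitnesses)` assignment works unchanged; `map_async` mainly matters if the root process has other work to do before calling `get()`.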

For future reference the full code is to be found at http://pastebin.com/0WazVPKy

Cheers