Differential evolution in parallel


Gaël Grissonnanche

Dec 21, 2020, 12:16:43 AM
to lmfit-py
Hello!
I am wondering whether I could use the differential evolution algorithm in lmfit in an unorthodox way. For each generation, I would like the DE to hand me all the child members of the new generation in advance, so that I can evaluate them all at once in my objective function.
The reason is that my objective function calls COMSOL. I can submit a batch of calculations to COMSOL, which COMSOL will parallelize carefully itself, so I don't want the DE to do the parallelization. So in the end, I want to evaluate all the members in one COMSOL call. Do you have any idea whether this is achievable with lmfit?

Thank you for your help!

Matt Newville

Dec 21, 2020, 1:49:01 PM
to lmfit-py
Hi Gael,

Did you try setting options for 'workers' and 'updating'? 

--Matt 

Gaël Grissonnanche

Dec 21, 2020, 4:40:35 PM
to lmfit-py
Hi Matt,

Thank you for your answer. I haven't. Do you mean using a "map-like callable" for "workers"? I am not sure how that would work for what I need; I have looked online to understand what a "map-like callable" means for the differential evolution algorithm and how to use it, but I don't get it...

Gaël

Matt Newville

Dec 21, 2020, 6:40:36 PM
to lmfit-py
On Mon, Dec 21, 2020 at 3:40 PM Gaël Grissonnanche <gael...@gmail.com> wrote:
Hi Matt,

Thank you for your answer. I haven't. Do you mean using a "map-like callable" for "workers"? I am not sure how that would work for what I need; I have looked online to understand what a "map-like callable" means for the differential evolution algorithm and how to use it, but I don't get it...

Can you set "workers" to 1? Lmfit's differential_evolution is a pretty shallow wrapping of the scipy.optimize function. 
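For reference, scipy's differential_evolution accepts either an integer or a map-like callable for "workers". A minimal sketch (plain scipy, with a hypothetical batch_map helper) of how a whole generation could be evaluated in one batch when updating='deferred':
---------------------
import numpy as np
from scipy.optimize import differential_evolution

def batch_map(func, iterable):
    # With updating='deferred', scipy hands over every trial vector of the
    # current generation at once; one could send this whole batch to an
    # external program (e.g. COMSOL) and collect all results in one call.
    population = list(iterable)
    return [func(x) for x in population]   # placeholder: evaluate one by one

def objective(x):
    return np.sum(x**2)

bounds = [(-5, 5)] * 3
res = differential_evolution(objective, bounds, workers=batch_map,
                             updating='deferred', polish=False)
print(res.x, res.fun)
---------------------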

Just to be clear (we say this so often it is painful): if you do not include example code (a complete, minimal example), it is really very hard for us to give any meaningful help.

--Matt

Gaël Grissonnanche

Jan 6, 2021, 12:10:35 AM
to lmfit-py
Hi Matt,
I am sorry that it took me so long to come up with a basic example.
Setting my very first message aside, what I would like first of all is to make this simple code work with workers. On my computer it does not with lmfit, but it does if I use scipy's differential evolution without lmfit.

---------------------
import numpy as np
from time import time, sleep
from lmfit import minimize, Parameters, report_fit

N = 300

def residual(pars, data):

    C = np.array([pars["a"].value, pars["b"].value, pars["c"].value])

    #multiply two random matrices to slow things down artificially
    f = np.random.random([N,N])
    g = np.random.random([N,N])
    l = np.matmul(f, g)

    return (C - data)

pars = Parameters()
pars.add("a", value = 0, min = -20000, max= 20000)
pars.add("b", value = 0, min = -20000, max= 20000)
pars.add("c", value = 0, min = -20000, max= 20000)

data = np.array([12.342234, 254.4356340, 97.1327916])

t0 = time()
out = minimize(residual, pars, args=(data,), method='differential_evolution', workers=2, updating="deferred")
report_fit(out)
print("##Time total: ", time() -t0, "s")
-----------------

With workers=1 it runs fine (though it does not improve the runtime); with workers=2 it no longer works (RuntimeError: An attempt has been made to start a new process before the current process has finished its bootstrapping phase).
I have 8 cores, and I can make it work with 8 workers using scipy instead of lmfit. Do you understand what is wrong?

Thanks!

Matt Newville

Jan 6, 2021, 2:43:50 PM
to lmfit-py
Hi Gael, 

On Tue, Jan 5, 2021 at 11:10 PM Gaël Grissonnanche <gael...@gmail.com> wrote:
Hi Matt,
I am sorry that it took me so long to come up with a basic example.
Setting my very first message aside, what I would like first of all is to make this simple code work with workers. On my computer it does not with lmfit, but it does if I use scipy's differential evolution without lmfit.


Hm, not sure.  Your script "works" for me in the sense of creating many worker processes.  

I get an AbortFit exception that the maximum number of iterations (set to 4000) was exceeded. In fact, it looks like setting 'max_nfev' is currently ignored by `differential_evolution`. That's easy to fix, and setting it to 10000 does work for me for your test script. Is that consistent with what you see? You said "it does not work", which is kind of unclear... Anyway, I will make a Pull Request to fix this.
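A sketch of how that cap would then be passed, assuming `max_nfev` is honored by the differential_evolution method once fixed:

    out = minimize(residual, pars, args=(data,), method='differential_evolution', max_nfev=10000)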

FWIW, I changed N=300 to N=25 and I see the runtime as a function of the number of workers on a machine with 14 cores / 28 Threads (Linux):

Nworkers   N_function_evals   Runtime (sec)
   16            7366              77.5
   10            8401              48.5
    8            7816              43.2
    6            7996              45.2
    4            8356              31.3
    2            7771              16.8
    1            8311               0.8

Yes, that is not a typo. And yes, I did verify that multiple worker processes were created. For this problem, inter-process communication swamps the benefits of multiprocessing. No amount of "well, did you try *this*" or "what was that parameter set to" can hope to explain this away.

One of the features of lmfit is how easy it is to try out other solvers.  So, I tried 'shgo' with
    out = minimize(residual, pars, args=(data,), method='shgo')

The runtime was 0.04 sec, using 23 function evaluations. 

This is consistent with Andrea Gavana's excellent study of global solvers (see http://infinity77.net/global_optimization/index.html) and a good justification for why `shgo` was added to `lmfit` (and later, scipy).  Of course, there may be cases where one method does better than others -- everyone has their niche.  But the differential_evolution method in scipy is just not very good. 

So I have 2 suggestions:
  1. we'll fix differential_evolution to accept and apply `max_nfev` correctly.
  2. no one will ever again start out by using differential_evolution.  Start with a solver that works.

I suppose some people might object to deprecating differential_evolution, but I'll invite those objections now:

I propose we deprecate `differential_evolution`.  Is there any reason to keep it?

--Matt 

Gaël Grissonnanche

Jan 7, 2021, 11:56:27 AM
to lmfit-py
Thank you very much Matt for your answer, I love your spirit! :)

I already played with SHGO in the past on your recommendation, and it is true that it can do wonders for some problems. There is one problem with it that I haven't reported (or maybe it is SHGO itself): beyond 10 free parameters, SHGO would run for a bunch of iterations and then just freeze, not doing anything, at a number of iterations that is proportional to n and iters if I recall correctly (using "sobol"). That is another subject, so I should create another thread for it at some point.

The other problem I have with SHGO is myself: I don't understand how to set it up. The way I did it in the past was just to play with the parameters "n", "iters" and "sampling_method" until it gave me the fastest convergence, without really knowing what would be optimal for my problem. Yes, for this specific example it is impressive, but for another problem that we discussed in another thread, the default sampling method would not converge within a reasonable amount of time and I had to use sampling_method='sobol', without really understanding why; but then it was extremely fast compared to differential_evolution or AMPGO.
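For reference, a minimal sketch of passing those knobs through lmfit, assuming that extra keyword arguments given to minimize are forwarded to scipy.optimize.shgo (where the names are n, iters and sampling_method):

    out = minimize(residual, pars, args=(data,), method='shgo',
                   n=200, iters=3, sampling_method='sobol')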

Differential evolution is easy for me. For the specific problem I am looking at right now, it converges by default within ~1000 calls, while AMPGO converges over ~5000 calls. For now, I still believe that differential_evolution might be the simplest for what I need, considering my ignorance of SHGO, even if I share your opinions.

Going back to the parallelization of differential_evolution: I tried it on two different computers running Windows 10, and on both it does not converge and throws the message below every second for a very long time, so I have to abort the process manually. This does not happen if I use workers in differential_evolution directly from scipy, only from lmfit:

---------------------------------
Traceback (most recent call last):
File "<string>", line 1, in <module>
  File "C:\ProgramData\Anaconda3\lib\multiprocessing\spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "C:\ProgramData\Anaconda3\lib\multiprocessing\spawn.py", line 114, in _main
    prepare(preparation_data)
  File "C:\ProgramData\Anaconda3\lib\multiprocessing\spawn.py", line 225, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "C:\ProgramData\Anaconda3\lib\multiprocessing\spawn.py", line 277, in _fixup_main_from_path
    run_name="__mp_main__")
  File "C:\ProgramData\Anaconda3\lib\runpy.py", line 263, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "C:\ProgramData\Anaconda3\lib\runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "C:\ProgramData\Anaconda3\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "c:\Users\Oersted\Desktop\test_differential_evolution_lmfit.py", line 26, in <module>
    out = minimize(residual, pars, args=(data,), method='differential_evolution', workers=2, updating="deferred")
  File "C:\ProgramData\Anaconda3\lib\site-packages\lmfit\minimizer.py", line 2393, in minimize
    return fitter.minimize(method=method)
  File "C:\ProgramData\Anaconda3\lib\site-packages\lmfit\minimizer.py", line 2176, in minimize
    return function(**kwargs)
  File "C:\ProgramData\Anaconda3\lib\site-packages\lmfit\minimizer.py", line 925, in scalar_minimize
    ret = differential_evolution(self.penalty, _bounds, **kwargs)
  File "C:\ProgramData\Anaconda3\lib\site-packages\scipy\optimize\_differentialevolution.py", line 305, in differential_evolution
    constraints=constraints) as solver:
  File "C:\ProgramData\Anaconda3\lib\site-packages\scipy\optimize\_differentialevolution.py", line 498, in __init__
    self._mapwrapper = MapWrapper(workers)
  File "C:\ProgramData\Anaconda3\lib\site-packages\scipy\_lib\_util.py", line 385, in __init__
    self.pool = Pool(processes=int(pool))
  File "C:\ProgramData\Anaconda3\lib\multiprocessing\context.py", line 119, in Pool
    context=self.get_context())
  File "C:\ProgramData\Anaconda3\lib\multiprocessing\pool.py", line 176, in __init__
    self._repopulate_pool()
  File "C:\ProgramData\Anaconda3\lib\multiprocessing\pool.py", line 241, in _repopulate_pool
    w.start()
  File "C:\ProgramData\Anaconda3\lib\multiprocessing\process.py", line 112, in start
    self._popen = self._Popen(self)
  File "C:\ProgramData\Anaconda3\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "C:\ProgramData\Anaconda3\lib\multiprocessing\popen_spawn_win32.py", line 46, in __init__
    prep_data = spawn.get_preparation_data(process_obj._name)
  File "C:\ProgramData\Anaconda3\lib\multiprocessing\spawn.py", line 143, in get_preparation_data
    _check_not_importing_main()
  File "C:\ProgramData\Anaconda3\lib\multiprocessing\spawn.py", line 136, in _check_not_importing_main
    is not going to be frozen to produce an executable.''')
RuntimeError: 
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.
--------------------------------------

My initial question for this thread was how to use the map-like callable for workers instead of giving it the number of workers I want, because I absolutely need to initialize the pool of workers manually in my problem (there is something that I need to do only when starting the pool). But I am stuck using differential_evolution in parallel with lmfit, and I love lmfit so much that I don't want to use the differential evolution from scipy directly.
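A minimal sketch of that idea with plain scipy, assuming a hypothetical init_worker function that does the one-time setup when each worker process starts:
---------------------------------
from multiprocessing import Pool
from scipy.optimize import differential_evolution

def init_worker():
    # one-time setup per worker process (e.g. starting a COMSOL session)
    pass

def objective(x):
    return sum(xi**2 for xi in x)

if __name__ == '__main__':
    bounds = [(-5, 5)] * 3
    with Pool(processes=4, initializer=init_worker) as pool:
        res = differential_evolution(objective, bounds, workers=pool.map,
                                     updating='deferred', polish=False)
    print(res.x, res.fun)
---------------------------------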

Thanks again for everything!

Gaël

Andrea Gavana

Jan 8, 2021, 4:42:29 AM
to lmfi...@googlegroups.com
Hi Matt,

Thank you for your kind words :-) . In case you're interested, I have just completed a much bigger exercise which kind of supersedes the previous one:


Andrea.


Matt Newville

Jan 8, 2021, 9:30:34 AM
to lmfit-py
Hi Andrea,


Wow, that's a great study.   Thanks for doing that!   For sure global optimizers are far outside of my area of expertise, so your benchmarks are really valuable for me.  I should have also tried the 'ampgo' solver along with 'shgo' - it turns out to be much slower than 'shgo' for this case but roughly as efficient as 'differential_evolution' (1.5 sec and 10000 function evaluations).


It does seem like the MCS solver there would be useful to add to scipy and/or lmfit.  I was just reading about the licensing issue at the scipy-dev thread, which seems unfortunate.

Anyway,  thanks very much!

--Matt

Matt Newville

Jan 8, 2021, 9:43:14 AM
to lmfit-py
Hi Gael, 


On Thu, Jan 7, 2021 at 10:56 AM Gaël Grissonnanche <gael...@gmail.com> wrote:
Thank you very much Matt for your answer, I love your spirit! :)

I already played with SHGO in the past on your recommendation, and it is true that it can do wonders for some problems. There is one problem with it that I haven't reported (or maybe it is SHGO itself): beyond 10 free parameters, SHGO would run for a bunch of iterations and then just freeze, not doing anything, at a number of iterations that is proportional to n and iters if I recall correctly (using "sobol"). That is another subject, so I should create another thread for it at some point.


Hm, yes, I'm not sure if that is a "known problem" or not, but please try to put together a report on that.

The other problem I have with SHGO is myself: I don't understand how to set it up. The way I did it in the past was just to play with the parameters "n", "iters" and "sampling_method" until it gave me the fastest convergence, without really knowing what would be optimal for my problem. Yes, for this specific example it is impressive, but for another problem that we discussed in another thread, the default sampling method would not converge within a reasonable amount of time and I had to use sampling_method='sobol', without really understanding why; but then it was extremely fast compared to differential_evolution or AMPGO.

I don't think I can help you with these details other than pointing to the docs and source code.  

Differential evolution is easy for me. For the specific problem I am looking at right now, it converges by default within ~1000 calls, while AMPGO converges over ~5000 calls. For now, I still believe that differential_evolution might be the simplest for what I need, considering my ignorance of SHGO, even if I share your opinions.

OK. I guess there are cases where differential evolution is useful compared to AMPGO and SHGO. These global solvers are pretty opaque to me and often aimed at problems that I just do not experience (or try to solve?) often.

Hm, I do not know what that error message really means. I guess one difference between scipy and lmfit would have to do with what can be pickled and shared between processes. I thought we had that "solved", but perhaps 'differential_evolution' is doing something more complicated?

It does look to me like the problem is with 'scipy.optimize.differential_evolution', but maybe the call of it from 'lmfit':

    File "C:\ProgramData\Anaconda3\lib\site-packages\lmfit\minimizer.py", line 925, in scalar_minimize
        ret = differential_evolution(self.penalty, _bounds, **kwargs)

is sending objects that multiprocessing.Pool cannot handle. I don't know what that would be, but it might suggest that someone interested (perhaps you?) could look into it in more detail.
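One way to start looking would be to check whether the callable lmfit hands to scipy pickles cleanly. A rough sketch, assuming Minimizer.penalty is the scalar objective passed to differential_evolution, as the traceback above suggests:
---------------------
import pickle
import numpy as np
from lmfit import Parameters, Minimizer

def residual(pars, data):
    return np.array([pars["a"].value, pars["b"].value, pars["c"].value]) - data

pars = Parameters()
for name in ("a", "b", "c"):
    pars.add(name, value=0, min=-20000, max=20000)

data = np.array([12.342234, 254.4356340, 97.1327916])
mini = Minimizer(residual, pars, fcn_args=(data,))

try:
    pickle.dumps(mini.penalty)   # roughly what a spawned worker needs to receive
    print("penalty pickles fine")
except Exception as exc:
    print("not picklable:", exc)
---------------------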

--Matt 

Gaël Grissonnanche

Jan 10, 2021, 11:57:00 PM
to lmfit-py
Hi Matt,

Thanks again for all your answers above. I have sorted all of that out in my head now.
I cannot justify it here in a few words, but I am indeed in a situation where differential_evolution performs better than AMPGO and SHGO (I tested it over the weekend), and even better when using workers in parallel.
I solved the issue with the weird errors above: I simply had to put the parallel call to differential evolution behind "if __name__ == '__main__':". I am sorry for that (see below).

But wait, it gets interesting now! If you run the code below, for the same objective function and with the number of workers equal to the number of CPUs, scipy finishes within 2 seconds on my computer, but lmfit takes more than a minute. So I am pretty sure there is now a problem with lmfit and workers, not with scipy. Using the parallelization of differential_evolution with scipy, I can solve my real problem (not this dummy example) in 40 minutes instead of 3 hours, which is a significant improvement for me. But as you can see with this example, there is a problem with the parallelization of differential_evolution in lmfit, and that saddens me a lot because lmfit is always my preference, for so many reasons, no doubt!

Here is the example:
------------------------------------
import numpy as np
from time import time
from lmfit import minimize, Parameters, report_fit
from scipy.optimize import differential_evolution
from multiprocessing import Pool, cpu_count

N = 10

def residual_lmfit(pars, data):
    C = np.array([pars["a"].value, pars["b"].value, pars["c"].value])
    #multiply two random matrices to slow things down artificially
    f = np.random.random([N,N])
    g = np.random.random([N,N])
    l = np.matmul(f, g)
    return (C - data)

def residual_scipy(C, data):
    #multiply two random matrices to slow things down artificially
    f = np.random.random([N,N])
    g = np.random.random([N,N])
    l = np.matmul(f, g)
    return np.sum((C - data)**2)

pars = Parameters()
pars.add("a", value = 0, min = -20000, max= 20000)
pars.add("b", value = 0, min = -20000, max= 20000)
pars.add("c", value = 0, min = -20000, max= 20000)

#target array to evolve towards
data = np.array([12.342234, 254.4356340, 97.1327916])
bounds = [(-20000,20000), (-20000,20000), (-20000,20000)]

if __name__ == '__main__':

    percent_workers = 100
    num_cpu = cpu_count()
    num_workers = int(percent_workers / 100 * num_cpu)
    print("# cpu cores: " + str(num_cpu))
    print("# workers: " + str(num_workers))

    t0 = time()
    pool_1 = Pool(processes= num_workers)
    out = minimize(residual_lmfit, pars, args=(data,), method='differential_evolution', workers=num_workers, updating="deferred")
    # out = minimize(residual_lmfit, pars, args=(data,), method='differential_evolution', workers=pool_1.map, updating="deferred")
    report_fit(out)
    print("##Time total: ", time() -t0, "s")

    t0 = time()
    pool_2 = Pool(processes= num_workers)
    res = differential_evolution(residual_scipy, bounds, args=(data,), polish= False, updating= 'deferred', workers=pool_2.map)
    print("##Time total: ", time() -t0, "s")
    print(res.x)
    print(res.fun)
-----------------------------------------

So with scipy, I can give "workers" either the number of workers I want or a map-like callable such as "pool_2.map", and both work. In the end, a map-like callable is what I want for my real problem, because I will need to initialize my pool.
With lmfit, on my computer, "workers" only takes an int; if I give it pool_1.map, it throws the error "NotImplementedError: pool objects cannot be passed between processes or pickled". I am using lmfit 1.0.1.
I don't know what lmfit is doing differently; I have never looked at the source code of lmfit, I am sorry, but for scipy it is pretty straightforward in the source code. Do you think it is an easy fix?

Thanks again!

Gaël


Gaël Grissonnanche

Jan 17, 2021, 7:30:20 PM
to lmfit-py
Hi Matt,

To follow up on what I said last time: I think I found, not the solution to why parallel execution is slow with Pool in lmfit, but the reason why it is slow in lmfit and not in scipy.
I think it is slow in lmfit and not in scipy because the objective function in lmfit must be given params, an object of the Parameters class, and by doing so the entire Parameters object is pickled and sent to the worker processes. That is slow.
It seems to be a common problem with multiprocessing: https://stackoverflow.com/questions/37068981/minimize-overhead-in-python-multiprocessing-pool-with-numpy-scipy .
Ray seems to be a newer package that does not have this overhead problem with classes, so I might look at that and come back to you if it works better than Pool in lmfit.

Gaël

Matt Newville

Jan 18, 2021, 11:29:43 AM
to lmfit-py
Hi Gael, 


On Sun, Jan 17, 2021 at 6:30 PM Gaël Grissonnanche <gael...@gmail.com> wrote:
Hi Matt,

To follow up on what I said last time: I think I found, not the solution to why parallel execution is slow with Pool in lmfit, but the reason why it is slow in lmfit and not in scipy.
I think it is slow in lmfit and not in scipy because the objective function in lmfit must be given params, an object of the Parameters class, and by doing so the entire Parameters object is pickled and sent to the worker processes. That is slow.

Well, yes, that is a difference between using scipy.optimize and lmfit. I'm not certain that pickle itself is the slow part. You can just try that at the command line (timeit could be used):

>>> import lmfit, pickle, time
>>> mod = lmfit.models.GaussianModel()
>>> pars = mod.make_params(amplitude=1, center=3, sigma=0.3)
>>> t0 = time.time() ; newpars = pickle.loads(pickle.dumps(pars)) ; print(" pickle roundtrip = %.4f sec" % (time.time()-t0))

That is on the order of tens of milliseconds on a not-new laptop -- hard to see that by itself as a major problem. But a new process does have to be created, Python started, lmfit imported, and the model reconstructed. If the overhead time for all of that is significant compared to the actual runtime of that process, then multiprocessing will not be much of a win. That sort of comes down to process management, which is really not something we are going to solve here.
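A crude illustration of that point, not tied to lmfit: for a cheap function, process creation and pickling can easily outweigh the work itself.
---------------------
import time
import numpy as np
from multiprocessing import Pool

def cheap(x):
    # a task that finishes in microseconds
    return float(np.sum(x * x))

if __name__ == '__main__':
    args = [np.random.random(100) for _ in range(5000)]

    t0 = time.time()
    serial = list(map(cheap, args))
    print("serial map : %.3f s" % (time.time() - t0))

    t0 = time.time()
    with Pool(4) as pool:          # includes process creation and pickling cost
        parallel = pool.map(cheap, args)
    print("Pool(4).map: %.3f s" % (time.time() - t0))
---------------------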

Just to be clear, "this calculation is too slow! I know, multiprocessing will magically make it faster!" is not always correct.
 
It seems to be a common problem with multiprocessing: https://stackoverflow.com/questions/37068981/minimize-overhead-in-python-multiprocessing-pool-with-numpy-scipy .
Ray seems to be a newer package that does not have this overhead problem with classes, so I might look at that and come back to you if it works better than Pool in lmfit.

I've not heard of the Ray package, so I cannot guess what it might do differently from other multi-processing approaches.

Cheers,
--Matt

Renee Otten

Jul 2, 2021, 8:47:03 AM
to lmfi...@googlegroups.com, andrea...@gmail.com
Apologies for being late here… I was just looking at your new suite of benchmarks, Andrea; that seems like quite a bit of work and is a very valuable compilation, so thanks!

As Matt said, the MCS solver looks very useful. I tried to find the discussion about the licensing on the scipy-dev mailing list, but I am having a hard time as the list doesn't seem searchable… Any chance someone can provide a link to it or summarize the issue? I couldn't find much on the website regarding licensing (https://www.mat.univie.ac.at/~neum/software/mcs/).

Since you have already ported it to Python for your benchmarks, Andrea, would you be willing to share that code (I didn't see it on your GitHub page), perhaps off-list if you prefer? Of course we will have to figure out the licensing before thinking about including it, but I would be interested in just trying it out myself.

Best,
Renee