Differential Evolution not taking fit_kws


Gaël Grissonnanche

Apr 26, 2020, 3:28:08 AM
to lmfit-py
Hi!
I just spent hours trying to pass fit keywords to the differential evolution method, without any success.
If I download the simple example https://lmfit.github.io/lmfit-py/examples/example_diffev.html#sphx-glr-examples-example-diffev-py and then try to pass fit keywords to the scipy differential evolution function at line 47, I cannot manage to do so:

This does not work, in the sense that the code runs as if popsize were not there:
o2 = lmfit.minimize(resid, params, args=(x, yn), method='differential_evolution', popsize=2)

This does not work, same as above:
o2 = lmfit.minimize(resid, params, args=(x, yn), method='differential_evolution', options={"popsize":2})

And this does not cause any trouble for the code; it just does not care:
o2 = lmfit.minimize(resid, params, args=(x, yn), method='differential_evolution', whateverman=2)

If I instead use method="leastsq", whateverman=2 is a problem and the code does not run.

So I wonder: why can't I pass these fit_kws to the differential evolution method?

Thank you in advance for any help you can provide!

Gaël

Matt Newville

Apr 26, 2020, 10:43:19 AM
to lmfit-py
Hi Gael,

On Sun, Apr 26, 2020 at 2:28 AM Gaël Grissonnanche <gael...@gmail.com> wrote:
Hi!
I just spent hours trying to pass fit keywords to the differential evolution method, without any success.
If I download the simple example https://lmfit.github.io/lmfit-py/examples/example_diffev.html#sphx-glr-examples-example-diffev-py and then try to pass fit keywords to the scipy differential evolution function at line 47, I cannot manage to do so:

This does not work, in the sense that the code runs as if popsize were not there:
o2 = lmfit.minimize(resid, params, args=(x, yn), method='differential_evolution', popsize=2)

Hm, not sure what is going on.  That would be the intended way for it to work, and it does work for me (with 1.0.0 and github master).  Or at least using `popsize=2` definitely results in fewer function evaluations and slightly different results than using the default (popsize=15).
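
For instance, here is a minimal sketch (a toy model, not the actual example_diffev.py script; the parameter names and bounds are illustrative) where popsize=2 is passed through and visibly reduces the evaluation count:

import numpy as np
import lmfit

# toy data and residual -- illustrative only, not the model from the example script
np.random.seed(0)
x = np.linspace(0, 10, 101)
yn = 3.0 * np.exp(-x / 2.5) + np.random.normal(scale=0.05, size=x.size)

def resid(params, x, ydata):
    return params['amp'].value * np.exp(-x / params['tau'].value) - ydata

params = lmfit.Parameters()
params.add('amp', value=1.0, min=0.0, max=10.0)   # finite bounds are required for differential_evolution
params.add('tau', value=1.0, min=0.1, max=10.0)

o_default = lmfit.minimize(resid, params, args=(x, yn), method='differential_evolution')
o_small = lmfit.minimize(resid, params, args=(x, yn), method='differential_evolution', popsize=2)

# popsize=2 should report noticeably fewer function evaluations than the default (popsize=15)
print(o_default.nfev, o_small.nfev)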

What is your evidence that it is not working?

I can see (and "concede" - the past tense of "can see", for the Sunday morning word puzzle enthusiasts) that scipy's OptimizeResult and the lmfit MinimizerResult do not do a good job of preserving the actual arguments sent to the fitting function.  I think we should fix that so that somewhere in the `o2` MinimizerResult it records the value for `popsize` that was actually sent.
--Matt 

Gaël Grissonnanche

Apr 26, 2020, 6:18:47 PM
to lmfi...@googlegroups.com
Thank you very much Matt! Indeed, you are correct; I think there were so many keywords in play that I failed to see that the first solution was working.
I had to set polish=False to control the number of evaluations of the function.
Now, what was/is a problem is that, with 4 parameters as in the example, if you set maxiter=0 and popsize=5 (the minimum), minimize() still evaluates the residual function at least ~30 times, and I do not understand why. The fit report says "function evals = 6", but if you place a print statement in the residual function, you will see about 30 initial calls that I cannot account for. In practice, I would like to understand these ~30 calls because, for my application, they are very long and a waste of time. Would you know why there are these initial calls?
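
For reference, here is a sketch of the "print in the residual" check described above, using a call counter instead of a print (resid, params, x, yn are the objects from the example script):

n_calls = 0

def resid_counted(params, x, ydata):
    global n_calls
    n_calls += 1
    return resid(params, x, ydata)

out = lmfit.minimize(resid_counted, params, args=(x, yn),
                     method='differential_evolution',
                     popsize=5, maxiter=0, polish=False)

# the counter sees every call, including the evaluations of the initial population,
# while the fit report only shows a smaller "function evals" number
print('reported function evals:', out.nfev, '   actual calls:', n_calls)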

Thanks again!

Gaël


Matt Newville

Apr 27, 2020, 1:13:34 AM
to lmfit-py
Hi Gael,


Well, I think the first thing to say is that if you are worried about 6 vs 30 function calls, then differential evolution is not the algorithm for you.  Probably no scalar minimizer is going to solve for 4 parameters in fewer than 30 function calls, and differential evolution is not even slightly designed for speed.   I don't know for sure, but I can definitely believe that there will be some overhead in "warming up" differential evolution.  

If making 8*N_parameters function calls is wasteful, you need to carefully consider which method to use.   One question I would ask is: do you need to use a scalar minimizer?

--Matt


Gaël Grissonnanche

Apr 27, 2020, 3:55:47 PM
to lmfit-py
Hi Matt,

Thank you for your answer once again. I am sorry, I probably did not explain myself clearly: I do not expect to solve the problem in fewer than 30 function calls. The problem I have is that each function call is very long, about 20 s at best. At each call I am computing electric and thermoelectric transport using the Boltzmann formalism (physics), so if I can spare 30 useless function calls, that would be appreciated. I have written my own differential evolution code, and it works, but I would prefer to use the one from lmfit, which includes a lot of strategies, for example.

What I am doing in one call of the objective function is using a "heavy" calculation to generate sigma (electrical conductivity) and alpha (thermoelectric conductivity), which I compare to the data in a dual fit (same objective function). Another problem I have is that sigma and alpha are not quantities in the same units and do not have the same order of magnitude once expressed in SI units (sigma ~ 10^-9 and alpha ~ 10^-11), therefore sigma and alpha are not weighted the same in the algorithm. I guess it is a common problem, but I do not know how to work it out, as I have to concatenate the two quantities for the objective function.

I also need to understand how to follow the progress of the differential evolution algorithm; I would appreciate being able to see the best individual at a given time while it is still running (as it takes forever). I guess a callback function is what I need.
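
Something like the following is what I have in mind (a sketch only, assuming lmfit forwards the callback keyword to scipy.optimize.differential_evolution the same way it forwards popsize; scipy calls the callback after each generation with the best candidate so far and a convergence fraction):

def de_progress(xk, convergence=0.0):
    # print the best candidate of the current generation; returning True would halt the solver early
    print('best candidate so far:', xk, '   convergence:', convergence)

out = lmfit.minimize(resid, params, args=(x, yn),
                     method='differential_evolution',
                     callback=de_progress)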

Gaël

Matt Newville

Apr 27, 2020, 6:32:28 PM
to lmfit-py
Hi Gael,

On Mon, Apr 27, 2020 at 2:55 PM Gaël Grissonnanche <gael...@gmail.com> wrote:
Hi Matt,

Thank you for your answer once again. I am sorry, I probably did not explain myself clearly: I do not expect to solve the problem in fewer than 30 function calls. The problem I have is that each function call is very long, about 20 s at best. At each call I am computing electric and thermoelectric transport using the Boltzmann formalism (physics), so if I can spare 30 useless function calls, that would be appreciated. I have written my own differential evolution code, and it works, but I would prefer to use the one from lmfit, which includes a lot of strategies, for example.


Well, differential evolution is not even remotely concerned with efficiency.  If you look at that `example_diffev.py`, differential evolution used 1425 function evaluations to solve a problem that leastsq solved with 65 evaluations.  That's a bit more than a factor of 20.        

OK, so maybe that problem is not hard enough to justify using differential evolution, and maybe your problem is.  Well, maybe... have you tested that?  Have you explored other solvers?   Because, again, if sparing 30 function calls is valuable to you, then differential evolution is probably not the place to start.  There are other solvers....


What I am doing in one call of the objective function is using a "heavy" calculation to generate sigma (electrical conductivity) and alpha (thermoelectric conductivity), which I compare to the data in a dual fit (same objective function).

OK.   Are those arrays of electrical and thermal conductivity or single values?  How many variable parameters?


Another problem I have is that sigma and alpha are not quantities in the same units and do not have the same order of magnitude once expressed in SI units (sigma ~ 10^-9 and alpha ~ 10^-11), therefore sigma and alpha are not weighted the same in the algorithm. I guess it is a common problem, but I do not know how to work it out, as I have to concatenate the two quantities for the objective function.

Generally (because we don't really know what you are doing), I would recommend rescaling so that the numerical values are more in the 1.e-6 to 1.e+6 range.   Like, don't use farads when you could use pF.

Of course, we have no idea what you are trying to do without actual code.  You're concatenating values?  OK...  Are those meant to have equal weight?


I also need to understand how to follow the progress of the differential evolution algorithm; I would appreciate being able to see the best individual at a given time while it is still running (as it takes forever). I guess a callback function is what I need.

Well, not trying to be combative, but: why do you want to use differential evolution?  Is your problem actually a scalar problem: does your objective function return an array to be minimized or a single scalar value?

--Matt 

Gaël Grissonnanche

Apr 27, 2020, 7:10:10 PM
to lmfit-py
Hi Matt,

Thank you so much for trying to help me here. You may well be right that differential evolution is not the best solver for me here. I turned to it because sigma and alpha are obtained from highly non-linear computations and the number of free parameters can vary from 3 to almost 10 (I fix parameters depending on what I want to test). Here I would stick with 4 for what I want to do today. Therefore, in order to avoid local minima, I thought differential evolution would be the best-suited solver, but the reality is that, apart from leastsq and differential evolution, I do not know much about the world of solvers. I am open to any other recommendations! :)

I wish I could find a way to explain simply what the algorithm does to compute sigma and alpha, but for now I don't know how. What I can say is that it is not an analytical function, but a numerical calculation that yields sigma and alpha. I have data as a function of temperature in the form of arrays sigma_data and alpha_data, where each index corresponds to a temperature. Therefore, in the objective function there is a for loop that calls my "heavy" algorithm to compute sigma and alpha at each temperature. In the end, sigma_model and alpha_model are both arrays of the same length as the temperature array. My objective function returns:

diff_sigma = sigma_data - sigma_model # in (Ohm m)^-1
diff_alpha = alpha_data - alpha_model # in (A / K^2 m)
return np.concatenate((diff_sigma, diff_alpha))

I would like sigma and alpha to be equally weighted in the solver. But as I said, in SI units their numerical values are far from the same order of magnitude. I agree that keeping the data around unity is good; usually I keep mine in SI units because it prevents conversion errors, but I should change that here. So, are you recommending weighting alpha so that the sigma and alpha numerical values have the same order of magnitude? Is that the only way?

Thank you again!

Gaël

Matt Newville

Apr 27, 2020, 11:39:32 PM
to lmfit-py
Hi Gael,

On Mon, Apr 27, 2020 at 6:10 PM Gaël Grissonnanche <gael...@gmail.com> wrote:
Hi Matt,

Thank you so much for trying to help me here. You may well be right that differential evolution is not the best solver for me here. I turned to it because sigma and alpha are obtained from highly non-linear computations and the number of free parameters can vary from 3 to almost 10 (I fix parameters depending on what I want to test). Here I would stick with 4 for what I want to do today. Therefore, in order to avoid local minima, I thought differential evolution would be the best-suited solver, but the reality is that, apart from leastsq and differential evolution, I do not know much about the world of solvers. I am open to any other recommendations! :)


Do you *know* that false local minima are actually a problem?  The earlier discussion suggests that you could do `leastsq` with 20 different sets of randomly selected starting values in the time it takes to do 1 run of differential evolution.  If the first 15 of those 20 runs gave you the same results (or even two different sets of results), would you continue?

I wish I could find a way to explain simply what the algorithm does to compute sigma and alpha, but for now I don't know how. What I can say is that it is not an analytical function, but a numerical calculation that yields sigma and alpha.

Well, you could post actual code.  Normally, we sort of expect that.  We don't know what you are doing except what you tell us.
 
I have data as a function of temperature in the form of arrays sigma_data and alpha_data, where each index corresponds to a temperature. Therefore, in the objective function there is a for loop that calls my "heavy" algorithm to compute sigma and alpha at each temperature. In the end, sigma_model and alpha_model are both arrays of the same length as the temperature array. My objective function returns:

diff_sigma = sigma_data - sigma_model # in (Ohm m)^-1
diff_alpha = alpha_data - alpha_model # in (A / K^2 m)
return np.concatenate((diff_sigma, diff_alpha))

I would like sigma and alpha to be equally weighted in the solver. But as I said, in SI units their numerical values are far from the same order of magnitude. I agree that keeping the data around unity is good; usually I keep mine in SI units because it prevents conversion errors, but I should change that here. So, are you recommending weighting alpha so that the sigma and alpha numerical values have the same order of magnitude? Is that the only way?


Um, I guess...  If you want to weight alpha in order to have the same order of magnitude as sigma, sure, go ahead and do that.     

The mathematics does not care about SI units.  It does care about the scale of numbers.  Don't try to do fits with values that are greater in scale than 1.e+10 or less than 1.e-10.   If you need to use gigameters or femtometers instead of meters or whatever, do so.
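
For example, one purely illustrative way to do that rescaling for the concatenated residual above (heavy_transport_model() is a hypothetical stand-in for your expensive calculation, and the mean-based scales are just one possible choice): divide each block by a typical magnitude of its own data so both contribute numbers of order one to the sum of squares.

import numpy as np

def resid_scaled(params, temps, sigma_data, alpha_data):
    # hypothetical stand-in for the expensive Boltzmann-transport calculation
    sigma_model, alpha_model = heavy_transport_model(params, temps)

    # normalize each block by a typical magnitude of its own data
    sigma_scale = np.abs(sigma_data).mean()
    alpha_scale = np.abs(alpha_data).mean()

    diff_sigma = (sigma_data - sigma_model) / sigma_scale
    diff_alpha = (alpha_data - alpha_model) / alpha_scale
    return np.concatenate((diff_sigma, diff_alpha))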

Hope that helps,

--Matt

Gaël Grissonnanche

Apr 28, 2020, 12:32:23 AM
to lmfit-py
Hi Matt,

Thank you again! Great, regarding the units I should definitely do that; I had not realized how crucial it was.

Well, I know that local minima can indeed be a huge pain; do you mean something deeper by saying "false" minima? I never thought that doing `leastsq` with 20 different sets of randomly selected starting values was a fast and efficient practice; would you recommend it as a conventional practice for certain problems?

Right now I know there are parameter values that work for my data, but the fit does not find them, so indeed, if I were getting the same answer 15 times, I might consider that answer correct. It is a shame that I do not know more about solvers for my problem; now I am curious whether there is a better solver for me out there. Differential evolution was sold to me by my advisor as the ideal solver for exploring large parameter spaces, and I must confess that over the past year my own genetic algorithm worked better than "leastsq", which was getting trapped in local minima as soon as I added more than 4 free parameters. But I do not know what else is out there.

I do not mind posting my code; it is just that I am worried it would be unintelligible (find it attached). If you want to run it, you would need my package: pip install cuprates-transport ... which works but is certainly far from "public-ready".

Thank you again so much for your time Matt!

Gaël

TS_NSYM_K_S_0T_Tl-2201-33K_a_31MAR2014.dat
TS_16T_rho_a_1-2_Tl-2201_Tc_33K_30Aug2013.dat
fitting_Sxx_vs_T_Tl2201_Tc_33K.py

Matt Newville

Apr 28, 2020, 2:59:33 PM
to lmfit-py
On Mon, Apr 27, 2020 at 11:32 PM Gaël Grissonnanche <gael...@gmail.com> wrote:
Hi Matt,

Thank you again! Great, regarding the units I should definitely do that; I had not realized how crucial it was.

Well, I know that local minima can indeed be a huge pain; do you mean something deeper by saying "false" minima?
I never thought that doing `leastsq` with 20 different sets of randomly selected starting values was a fast and efficient practice; would you recommend it as a conventional practice for certain problems?

Running leastsq 20 times is not fast or efficient.  But, it can be faster and more efficient than running differential evolution once. 


Right now I know there are parameter values that work for my data, but the fit does not find them, so indeed, if I were getting the same answer 15 times, I might consider that answer correct. It is a shame that I do not know more about solvers for my problem; now I am curious whether there is a better solver for me out there. Differential evolution was sold to me by my advisor as the ideal solver for exploring large parameter spaces, and I must confess that over the past year my own genetic algorithm worked better than "leastsq", which was getting trapped in local minima as soon as I added more than 4 free parameters. But I do not know what else is out there.

Many people have a strong attachment to differential evolution.   The best algorithm to use is always problem dependent. The "global optimizers" (in which diffev is often grouped, though the scipy implementation requires min/max bounds on all parameters) are useful when parameter space has multiple minima or minima that are hard to find unless you start very close to the solution.  

Unfortunately, DiffEv does not include a steepest-descent method for each candidate value, effectively asserting that there is no point in searching for local minima at all: the global minimum will not be the best of the local minima; it will be the best of the randomly selected values, based on the values tried so far.  DiffEv also does not use the initial parameter values as a "most likely prior" with bounds that mean "min/max". Rather, it uses min/max as the range over which all values are equally likely.  All of this makes diffev brutally pessimistic about the parameter space and sort of deliberately inefficient.  One might call it "dumb by design".   All algorithms have value, but if you are counting function evaluations in the tens, diffev is not your friend.

You might find `shgo`, `dual_annealing`, or `ampgo` to be worth trying.  See http://infinity77.net/global_optimization/index.html for a discussion of `ampgo`.
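
All of these are available through lmfit.minimize as well (assuming your lmfit version exposes them as method names); a rough sketch, reusing the toy problem from the first example above:

# try a few alternative global solvers on the same toy residual; all of them,
# like differential_evolution, want finite min/max bounds on every parameter
for method in ('ampgo', 'shgo', 'dual_annealing'):
    out = lmfit.minimize(resid, params, args=(x, yn), method=method)
    print(method, out.nfev, {name: par.value for name, par in out.params.items()})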

I would also not rule out "a hundred reasonable-but-varied random starting values for leastsq".  That will give a good measure of how prevalent and dangerous local minima are for the actual problem.  It will also illuminate the correlations between parameters.   Sometimes what look like multiple local minima really come from a few highly correlated parameters and a poor accounting (or inspection) of the correlations and uncertainties.  Leastsq provides a decent measure of these for each fit, which can be incredibly useful as a first step in exploring parameter space.   Building on that, starting from such local minima and using conf_interval, conf_interval2d, and emcee to explore the parameter space around them can be very helpful.
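
Here is a sketch of that multi-start idea, again reusing the toy resid, x, yn from the first example (the parameter names, bounds, and number of restarts are illustrative):

import numpy as np
import lmfit

rng = np.random.default_rng(1)
results = []
for i in range(100):
    # draw a fresh starting point uniformly inside the bounds for each restart
    p = lmfit.Parameters()
    p.add('amp', value=rng.uniform(0.0, 10.0), min=0.0, max=10.0)
    p.add('tau', value=rng.uniform(0.1, 10.0), min=0.1, max=10.0)

    out = lmfit.minimize(resid, p, args=(x, yn), method='leastsq')
    results.append((out.chisqr, {name: par.value for name, par in out.params.items()}))

# sort by chi-square and inspect how often the restarts agree on the best answer
results.sort(key=lambda item: item[0])
for chisqr, best in results[:5]:
    print(chisqr, best)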

--Matt 

Gaël Grissonnanche

Apr 28, 2020, 5:43:35 PM
to lmfit-py
Hi Matt,

Thank you so much for your answer. It makes a lot of sense; let me study the different avenues you describe and come back to you.

Best,

Gaël