Speed of Parameters _getval

Nicolas Macro

Aug 23, 2019, 1:56:26 PM
to lmfit-py
Hello,

First of all, thank you for making such an excellent tool.  It has helped me immensely in my research.

I'm currently trying to solve a fitting problem that has both a large parameter space (~1500 parameters) and a long computation time.  I have optimized the computation using numba, which requires using params['name'].value so that a plain float is returned rather than a Parameter object.  Now it appears that the next bottleneck is at parameter.py:745(_getval).

Do you have any suggestions for speeding up the retrieval of parameter values?  Is there room for optimization of this portion of code?  I would like to give back to the project if possible.

Best,
Nick


Matt Newville

Aug 24, 2019, 8:15:47 AM
to lmfit-py
Hi Nick,

On Fri, Aug 23, 2019 at 12:56 PM Nicolas Macro <nmac...@gmail.com> wrote:
Hello,

First of all, thank you for making such an excellent tool.  It has helped me immensely in my research.

I'm currently trying to solve a fitting problem that has both a large parameter space (~1500 parameters) and a long computation time.  I have optimized the computation using numba, which requires using params['name'].value so that a plain float is returned rather than a Parameter object.  Now it appears that the next bottleneck is at parameter.py:745(_getval).


What makes you conclude that `_getval()` is the bottleneck?  If each function evaluation is a long-running computation, I would not expect that getting the parameter values would be the slow part.

You decided not to post any code or even to tell us how many of the 1500 parameters are variables.  That makes it hard to know what you are doing or to give any specific advice.  Our request that you include a complete, minimal example is really pretty clear.

I do not know how well any of the fitting algorithms would scale to 1500 variable parameters.  I expect that many would start to hit serious problems.  You also did not tell us which solver you are using.


Do you have any suggestions for speeding up the retrieval of parameter values?  Is there room for optimization of this portion of code?  I would like to give back to the project if possible.


Well, maybe.  If you have constraint expressions, or bounds, or fixed parameters, you might want to refactor your code to use fewer of them.  But in the end `_getval()` is a Python function that will be called, so if that really is the bottleneck it might be hopeless.
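One pattern that sometimes helps, sketched below on a made-up two-exponential toy problem (definitely not your model): read the values out of the Parameters object exactly once per objective call and then do the heavy numerical work on plain numpy arrays, so that `.value` (and hence `_getval()`) is hit once per parameter per evaluation rather than inside any inner loops.

    import numpy as np
    from lmfit import Parameters, minimize

    # Toy two-exponential decay; stands in for your (much bigger) model.
    def fast_model(x, amps, taus):
        # In your case this would be the numba-compiled kernel.
        return (amps[:, None] * np.exp(-x[None, :] / taus[:, None])).sum(axis=0)

    def objective(params, x, data):
        # Pull every value out of the Parameters object once per call,
        # then work only with plain floats / numpy arrays from here on.
        vals = params.valuesdict()
        amps = np.array([vals['amp_%d' % i] for i in range(2)])
        taus = np.array([vals['tau_%d' % i] for i in range(2)])
        return fast_model(x, amps, taus) - data

    x = np.linspace(0, 10, 201)
    data = 3.0 * np.exp(-x / 2.0) + 1.0 * np.exp(-x / 5.0)

    params = Parameters()
    for i, (amp, tau) in enumerate([(2.0, 1.0), (0.5, 4.0)]):
        params.add('amp_%d' % i, value=amp, min=0)
        params.add('tau_%d' % i, value=tau, min=0.01)

    result = minimize(objective, params, args=(x, data), method='least_squares')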

Then again, if you want to try to see if you can make lmfit Parameters faster, go for it!  As you can probably tell, ultimate runtime performance is actually not the highest priority for lmfit.  Or Python.  Or, really, even numba.  It is entirely possible that what you really need is Fortran, MPI, and access to a high-performance cluster.

--Matt


Nicolas Macro

Aug 24, 2019, 1:44:30 PM
to lmfit-py
Hi Matt,

Thank you for taking the time to provide such a detailed and prompt response!


What makes you conclude that `_getval()` is the bottleneck?  If each function evaluation is a long-running computation, I would not expect that getting the parameter values would be the slow part.

Please see this notebook.  It contains a simplified example of the kind of optimization problem at hand, and the bottleneck shows up clearly when a parameter is defined using `expr`.


You decided not to post any code or even to tell us how many of the 1500 parameters are variables.  That makes it hard to know what you are doing or to give any specific advice.  Our request that you include a complete, minimal example is really pretty clear.

I do not understand the distinction between a parameter and a variable in this context.  Can you please explain the difference?  Sorry about not providing an example in my first message.  I cannot share the code for the problem that I'm solving, as it is unpublished research.  Please see the above notebook for a basic example.  Let me know if I can provide any additional information.


I do not know how well any of the fitting algorithms would scale to 1500 variable parameters.  I expect that many would start to hit serious problems.  You also did not tell us which solver you are using.

I'm currently using the least_squares method, because it provided the best performance for a less ambitious version of the same type of optimization problem.



Well, maybe.  If you have constraint expressions, or bounds, or fixed parameters, you might want to refactor your code to use fewer of them.  But in the end `_getval()` is a Python function that will be called, so if that really is the bottleneck it might be hopeless.

I came to the same conclusion last night, and the above notebook provides some supporting evidence.  Using a parameter defined with `expr` significantly slows down retrieval of the value (a >5x slowdown).  That makes sense, because the expression needs to be evaluated every time, which I imagine is much slower than simply retrieving a number.  Why does calling `parameter[name].value` take ~10x longer than `parameter[name]`?  Also, if the parameter is defined using `expr`, the slowdown becomes ~100x.  Do you think that could be accelerated at all?
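To be concrete, this is roughly the shape of the comparison in the notebook (simplified here; the absolute numbers will of course depend on the machine):

    import timeit
    from lmfit import Parameters

    params = Parameters()
    params.add('plain', value=2.0)  # ordinary variable parameter
    params.add('slow', expr='2')    # value comes from a constraint expression

    n = 100000
    t_lookup = timeit.timeit(lambda: params['plain'], number=n)        # just the dict lookup
    t_value = timeit.timeit(lambda: params['plain'].value, number=n)   # goes through _getval()
    t_expr = timeit.timeit(lambda: params['slow'].value, number=n)     # _getval() plus expression evaluation
    print(t_lookup, t_value, t_expr)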


Then again, if you want to try to see if you can make lmfit Parameters faster, go for it!  As you can probably tell, ultimate runtime performance is actually not the highest priority for lmfit.  Or Python.  Or, really, even numba.  It is entirely possible that what you really need is Fortran, MPI, and access to a high-performance cluster.

I completely agree that some performance must be traded for improved usability and flexibility.  lmfit provides an excellent framework to work in, and I hope that it can be used for this problem.  I already have the code running on a compute cluster, but I could definitely eke out some additional performance by improving overall parallelization.

Please let me know if you see any significant problems with my conclusions or methodology.

Best,
Nick

Matt Newville

Aug 25, 2019, 5:20:42 PM
to lmfit-py
Hi Nick,

On Sat, Aug 24, 2019 at 12:44 PM Nicolas Macro <nmac...@gmail.com> wrote:
Hi Matt,

Thank you for taking the time to provide such a detailed and prompt response!


What makes you conclude that `_getval()` is the bottleneck?  If each function evaluation is a long-running computation, I would not expect that getting the parameter values would be the slow part.

Please see this notebook.  It contains a simplified example of the kind of optimization problem at hand, and the bottleneck shows up clearly when a parameter is defined using `expr`.


Hm, I guess I am still not sure what led you to conclude that `Parameter._getval()` "is the bottleneck".  

For sure, using constraint expressions will slow down your code -- the expressions are evaluated in Python after all.  Using a constraint expression of
 
      slow_params.add('slow', expr='2')

is an odd thing to do.  It will definitely incur a performance hit.  Yeah, don't do that.
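If the value really is meant to be a constant, something like this keeps it out of the expression machinery entirely:

      slow_params.add('slow', value=2, vary=False)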



You decided not to post any code or even to tell us how many of the 1500 parameters are variables.  That makes it hard to know what you are doing or to give any specific advice.  Our request that you include a complete, minimal example is really pretty clear.

I do not understand the distinction between a parameter and a variable in this context.  Can you please explain the difference?


A Parameter is any abstract quantity in the objective or model function describing the phenomenon you are modeling or the problem you are trying to solve.  A Parameter is the kind of value that *might* vary in a fit, but that you may instead choose to fix or constrain.  A variable is a Parameter that is actually varied in the fit.
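In code terms (a generic sketch, not anything specific to your problem):

    from lmfit import Parameters

    params = Parameters()
    params.add('sigma', value=1.0, min=0)          # a variable: actually adjusted by the fit
    params.add('baseline', value=0.0, vary=False)  # a Parameter, but fixed, so not a variable
    params.add('fwhm', expr='2.3548*sigma')        # a Parameter constrained by an expression

All three are Parameters; only 'sigma' is a variable.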


Sorry about not providing an example in my first message.  I cannot share the code for the problem that I'm solving, as it is unpublished research.  Please see the above notebook for a basic example.  Let me know if I can provide any additional information.

I do not know how well any of the fitting algorithms would scale to 1500 variable parameters.  I expect that many would start to hit serious problems.  You also did not tell us which solver you are using.

I'm currently using the least_squares method, because it provided the best performance for a less ambitious version of the same type of optimization problem.


I would assume that either `leastsq` or `least_squares` (which are actually different, because scipy.optimize is rather oddly organized) would give the best performance.  Since you've gone to this length, I will assume that you have tested this.
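For what it's worth, switching between the two in lmfit is just the `method` argument; on a toy problem it looks like this:

    import numpy as np
    from lmfit import Parameters, minimize

    def objective(params, x, data):
        return params['a'].value * x + params['b'].value - data

    x = np.linspace(0, 1, 50)
    data = 2.0 * x + 0.5

    params = Parameters()
    params.add('a', value=1.0)
    params.add('b', value=0.0)

    fit1 = minimize(objective, params, args=(x, data), method='leastsq')        # MINPACK, via scipy.optimize.leastsq
    fit2 = minimize(objective, params, args=(x, data), method='least_squares')  # scipy.optimize.least_squares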




Well, maybe.  If you have constraint expressions, or bounds, or fixed parameters, you might want to refactor your code to use fewer of them.  But in the end `_getval()` is a Python function that will be called, so if that really is the bottleneck it might be hopeless.

I came to the same conclusion last night, and the above notebook provides some supporting evidence.  Using a parameter defined with `expr` significantly slows down retrieval of the value (a >5x slowdown).  That makes sense, because the expression needs to be evaluated every time, which I imagine is much slower than simply retrieving a number.  Why does calling `parameter[name].value` take ~10x longer than `parameter[name]`?

Well, it has to determine the value in the presence of bounds and expressions...


Also, if the parameter is defined using `expr`, the slowdown becomes ~100x.  Do you think that could be accelerated at all?

Probably not.  Or perhaps: maybe, but I don't know how to do it.


--Matt

Nicolas Macro

Aug 26, 2019, 12:42:26 PM
to lmfi...@googlegroups.com
Hi Matt,

The use of `expr='2'` was simply to demonstrate the overhead associated with using an expression instead of a value.  I completely agree that expressions should only be used when required, to avoid evaluating them unnecessarily.

Anyway, I believe I now understand the reason for the measured slowdown in the previous notebook.  I'm not going to put together another notebook for this response, because most of the conclusions are fairly intuitive.  When I was timing `param[name]`, I was really only timing the dictionary lookup that returns the Parameter object, not retrieval of the value itself.  That lookup is of course faster than retrieving the value.

However, we can indeed remove some of the overhead associated with the `.value` property.  Calling `._val` instead of `.value` provided an 80x speedup (rough timing sketch below).  I'm not suggesting that the checks in `.value` are useless.  However, there may be some circumstances where they are unnecessary, because the model is constructed in such a way that `.value` always returns `._val`.
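Roughly what I timed, stripped down to its essence (the ~80x figure is from my machine; the exact ratio will vary):

    import timeit
    from lmfit import Parameters

    params = Parameters()
    params.add('amp', value=3.0, min=0, max=10)

    n = 100000
    t_value = timeit.timeit(lambda: params['amp'].value, number=n)  # full property, goes through _getval()
    t_val = timeit.timeit(lambda: params['amp']._val, number=n)     # raw attribute access, skips the checks
    print(t_value / t_val)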

Please let me know if you disagree with any of my conclusions.

Also, can you explain why the parameter itself needs to check bounds if we are using an optimizer that has bounds built in (e.g., least_squares)?  If that check is unnecessary in that scenario, and we are not using expressions, is there a downside or risk to using `._val` to retrieve the value?

Thank you for the discussion, it has been very helpful.

Update:
I have changed all instances of `.value` to `._val` in my model, and the result is exactly the same, with a 2.4x speedup.  Interestingly, the code still contains parameters defined by `expr=`, and it works just fine.  Any idea why that is?  I expected that the code would throw an error or return nonsense when using `._val` on a parameter defined by `expr=`.

Best,
Nick