Can residual function return log posterior without problems?


J F

Sep 22, 2021, 7:48:00 PM
to lmfit-py

Hi everyone,

It's not completely clear to me how lmfit works with the residual array that is returned by my objective function, but I would like to minimize the negative log posterior rather than the sum of squares.

Right now I'm trying to achieve this by calculating, inside my objective function, a Gaussian likelihood from the residuals, multiplying it by the prior probability for the given set of parameters, and then taking the negative log of that unnormalized posterior. I then return this array, which contains the negative log posterior for each time point of each individual in my data set, rather than the flat array of residuals itself.

My question is whether this is a good way of implementing it. And does lmfit still return a reasonable covariance matrix in this case? If not, what are your suggestions? It's crucial for me to minimize the negative log posterior rather than the sum of squares of the residuals, so that I'm not relying exclusively on my data set but also on my prior knowledge about the parameters.

Eager to hear your thoughts! Thank you.

Kind regards




Matt Newville

Sep 23, 2021, 12:15:14 AM
to lmfit-py
On Wed, Sep 22, 2021 at 6:48 PM J F <jeroen...@live.nl> wrote:

Hi everyone,

It's not completely clear to me how lmfit works with the residual array that is returned by my objective function, but I would like to minimize the negative log posterior rather than the sum of squares.

The lmfit minimizers take the array that your objective function returns and minimize that in the least-squares sense.

So, to the extent that 

     log_posterior = -0.5 * np.sum(residual**2)

you should be good.  If that is not sufficient and you want to concatenate a regularizing term to that residual, just be sure to scale it so that sum(residual_with_regularization**2) corresponds to the negative log of the posterior probability that you want, as in the relation above.
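
A minimal sketch of that scaling idea, assuming a Gaussian likelihood with known sigma and a Gaussian prior on one parameter (the model, parameter names, and numbers below are all invented for illustration):

    import numpy as np
    from lmfit import Parameters, minimize

    # toy data, invented purely for illustration (exponential decay, sigma = 0.1)
    rng = np.random.default_rng(0)
    t = np.linspace(0, 10, 50)
    y_obs = 3.0 * np.exp(-0.4 * t) + rng.normal(scale=0.1, size=t.size)

    def objective(params, t, y_obs):
        v = params.valuesdict()
        model = v['amp'] * np.exp(-v['k'] * t)
        resid = (y_obs - model) / 0.1          # data residual, scaled by sigma
        # Gaussian prior on k:  -log p(k) = 0.5*((k - k_prior)/k_sigma)**2 + const,
        # so appending this one extra "residual" keeps 0.5*sum(r**2) equal to the
        # negative log posterior, up to constants.
        k_prior, k_sigma = 0.5, 0.2
        prior_resid = (v['k'] - k_prior) / k_sigma
        return np.concatenate([resid, [prior_resid]])

    params = Parameters()
    params.add('amp', value=1.0)
    params.add('k', value=1.0, min=0)
    result = minimize(objective, params, args=(t, y_obs))

Because the appended prior term enters the least-squares sum the same way the data residuals do, the covariance reported by the fit should reflect the prior as well.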

--Matt 

J F

Sep 23, 2021, 3:50:17 AM
to lmfit-py
Ah, thank you! Good suggestion; in that case, I think I should take the square root of the terms in the residual first! Thanks, I'll try that! It seems that the minimizer has become much slower when I'm using the log posterior. Perhaps that was caused by lmfit using the sum of squares, which I'm thinking could be rather large (>1e6 or even higher).
Anyway, I hope that taking the square root helps.

J F

Sep 23, 2021, 3:55:15 AM
to lmfit-py
Regarding the posterior: I'm calculating the normal likelihood using scipy.stats.norm.pdf, with x the data value, mu the predicted value, and a constant sigma of 1 (at a later stage I want to pass a parameter to the optimizer for sigma as well).

I'm calculating the prior in a similar manner and then multiplying the two.

I'm using 'least_squares' for the optimization
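
For reference (the numbers here are made up), with a fixed sigma the negative log of the normal pdf and the half-squared scaled residual differ only by a constant, so handing least_squares the plain residual array already minimizes the Gaussian negative log-likelihood:

    import numpy as np
    from scipy.stats import norm

    y, mu, sigma = 2.3, 1.9, 1.0   # arbitrary illustrative values

    neg_log_like = -np.log(norm.pdf(y, loc=mu, scale=sigma))
    half_sq = 0.5 * ((y - mu) / sigma) ** 2 + np.log(sigma * np.sqrt(2 * np.pi))

    # True: identical once the constant log(sigma*sqrt(2*pi)) is accounted for
    print(np.isclose(neg_log_like, half_sq))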

Matt Newville

Sep 23, 2021, 7:41:15 AM
to lmfit-py
On Thu, Sep 23, 2021 at 2:50 AM J F <jeroen...@live.nl> wrote:
Ah, thank you! Good suggestion; in that case, I think I should take the square root of the terms in the residual first!

Don't take the square root of the actual residual or do the squaring or sum yourself.  I think you would want to concatenate the actual residual with the "properly signed" square root of any additional lambda/regularization terms.   As we say frequently, if you want a more detailed discussion, it always helps to provide an actual example of code.

Thanks, I'll try that! It seems that the minimizer has become much slower when I'm using the log posterior. Perhaps that was caused by lmfit using the sum of squares, which I'm thinking could be rather large (>1e6 or even higher).

I don't know, and maybe I don't understand.  The sum of squares is large?  I don't see why that would cause trouble.
--Matt 

J F

Sep 23, 2021, 8:14:23 AM
to lmfit-py
Thanks for your helpful comments, Matt. I'm a bit confused about what you mean by the properly signed square root. I'm using prior probabilities together with the likelihood because I find them more intuitive to work with than regularization terms.

The code might be a bit long and confusing to share so I'll try to describe the steps as clearly as possible.

Imagine I have a dataset (Y) and let's assume I only have one variable so Y is that variable in the real data set.
I run simulations of a single ODE for multiple individuals, so I get 5 simulated values (X_i_t) for 5 time points (t) per individual (X_i).
I then calculate the likelihood for each value (X_i_t) using:

LL = scipy.stats.norm.pdf(Y_i_t, loc=X_i_t, scale=1)
and put all those likelihoods in a flat array for all individuals and all t's.

Now I multiply all these likelihoods with some arbitrary prior probability constants, which is not particularly relevant right now, let's imagine it's just 1.0. So now I have an unnormalized 1D posterior array.

The reason I'm doing this inside the objective function is so that I can intuitively work with the priors, I'm trying to regularize towards a non-zero value that is known by expert knowledge.

Since I'm using the minimizer, I have to take the negative of the unnormalized posterior, and because it's easier for the optimization, I first take the log of the values.

I'm now returning this 1D negative log posterior array to the lmfit minimizer (least_squares).

Are you saying that this should just work? And that I should in principle get the parameters (and corresponding confidence intervals) that minimize this neg log posterior?

I did indeed think that the sum of squares being very large could be a problem, but I don't think it's THAT large, so probably not. Maybe the prior multiplication is making the shape of the posterior distribution significantly more complicated.
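
In code, a rough hypothetical reconstruction of the steps described above might look like this (the ODE stand-in, the prior, and all names are invented, not the actual code), mainly to show where the squaring issue comes in:

    import numpy as np
    from scipy.stats import norm

    def simulate_ode(k, t):
        # stand-in for the real ODE solution of one individual
        return np.exp(-k * t)

    def prior_probability(k):
        # stand-in Gaussian prior on the rate parameter
        return norm.pdf(k, loc=0.5, scale=0.2)

    def neg_log_posterior_terms(k, t, Y):
        # Y: data array of shape (n_individuals, n_timepoints)
        X = np.vstack([simulate_ode(k, t) for _ in range(Y.shape[0])])
        like = norm.pdf(Y, loc=X, scale=1.0)    # per-point likelihoods
        post = like * prior_probability(k)      # unnormalized posterior, per point
        # Caution: if this array is handed to least_squares, lmfit will square and
        # sum it, so the quantity minimized is sum((-log post)**2), not sum(-log post).
        return -np.log(post).ravel()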

Matt Newville

Sep 23, 2021, 11:11:19 PM
to lmfit-py
Hi J F, 


On Thu, Sep 23, 2021 at 7:14 AM J F <jeroen...@live.nl> wrote:
Thanks for your helpful comments, Matt. I'm a bit confused about what you mean by the properly signed square root. I'm using prior probabilities together with the likelihood because I find them more intuitive to work with than regularization terms.


Well, the residual can have positive or negative values, right?  Any values you add to that could be negative too.  Taking the square root of that will cause problems.  The temptation is to take the square root of the absolute value, but that loses the sign information.  What I'm saying is: preserve the sign.
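
As a small sketch, a sign-preserving square root in numpy might look like this (signed_sqrt is a hypothetical helper, not an lmfit function; note that squaring it recovers abs(x), so the earlier scaling advice still applies):

    import numpy as np

    def signed_sqrt(x):
        # square root of the magnitude, with the original sign kept
        return np.sign(x) * np.sqrt(np.abs(x))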

 

The code might be a bit long and confusing to share so I'll try to describe the steps as clearly as possible.

Imagine I have a dataset (Y) and let's assume I only have one variable so Y is that variable in the real data set.

Is Y the data or the variable?

I run simulations of a single ODE for multiple individuals, so I get 5 simulated values (X_i_t) for 5 time points (t) per individual (X_i).

I then calculate the likelihood for each value (X_i_t) using:

LL = scipy.stats.norm.pdf(Y_i_t, loc=X_i_t, scale=1)
and put all those likelihoods in a flat array for all individuals and all t's.

Now I multiply all these likelihoods with some arbitrary prior probability constants, which is not particularly relevant right now, let's imagine it's just 1.0. So now I have an unnormalized 1D posterior array.

The reason I'm doing this inside the objective function is so that I can intuitively work with the priors, I'm trying to regularize towards a non-zero value that is known by expert knowledge.

Since I'm using the minimizer, I have to take the negative of the unnormalized posterior, and because it's easier for the optimization, I first take the log of the values.

I'm now returning this 1D negative log posterior array to the lmfit minimizer (least_squares).

Are you saying that this should just work? And that I should in principle get the parameters (and corresponding confidence intervals) that minimize this neg log posterior?

I'm afraid I don't really understand any of what you are doing.  It does seem to me that people wanting to do probability statistics are really speaking a completely different language with overlapping words.  

I did indeed think that the sum of squares being very large could be a problem, but I don't think it's THAT large, so probably not. Maybe the prior multiplication is making the shape of the posterior distribution significantly more complicated.

I don't see why the actual size would matter.  

--Matt

J F

Sep 27, 2021, 9:29:31 AM
to lmfit-py
Thanks for your effort, Matt. I may be making this too complicated.

The question simply is:
If my "objective function to be minimized" returns the negative log-likelihood instead of the residuals, will lmfit still work properly, i.e., find the parameters that minimize the negative log-likelihood, with a plausible approximation of the uncertainty/standard error/confidence intervals?

J F

Sep 27, 2021, 10:41:51 AM
to lmfit-py
It's a bit clearer to me now: if I scalar-minimize the summed neg log-likelihood using BFGS, then I get the same result as if I minimize the residual array using least_squares.

Unfortunately I don't get any measure of uncertainty when I use BFGS, and it also requires many more function evals, which is problematic.

Matt Newville

Sep 27, 2021, 12:45:36 PM
to lmfit-py
Hi,


On Mon, Sep 27, 2021 at 9:41 AM J F <jeroen...@live.nl> wrote:
It's a bit clearer to me now: if I scalar-minimize the summed neg log-likelihood using BFGS, then I get the same result as if I minimize the residual array using least_squares.

Unfortunately I don't get any measure of uncertainty when I use BFGS, and it also requires many more function evals, which is problematic.

Yes, BFGS and other scalar minimizers will be less efficient than those that use a residual array.
If neg log-likelihood gives the same result as least squares, that implies that the uncertainties are pretty close to being normally distributed around the central value -- the Gaussian hypothesis that is often maligned as simplistic and incomplete, but which also happens to work reasonably well in lots of cases.  I would generally recommend starting with least-squares and then exploring where that may not be sufficiently accurate.   The confidence-interval tools in lmfit can also help reveal where these kinds of non-normal effects can be significant.
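
For reference, those confidence-interval tools are available as lmfit.conf_interval and report_ci; a minimal sketch on a toy problem (the model and numbers are invented for illustration), using the Minimizer class because conf_interval needs both the minimizer and the fit result:

    import numpy as np
    from lmfit import Minimizer, Parameters, conf_interval, report_ci

    rng = np.random.default_rng(1)
    x = np.linspace(0, 10, 100)
    y = 2.0 * np.exp(-0.3 * x) + rng.normal(scale=0.05, size=x.size)

    def resid(params, x, y):
        v = params.valuesdict()
        return y - v['amp'] * np.exp(-v['decay'] * x)

    params = Parameters()
    params.add('amp', value=1.0)
    params.add('decay', value=0.1, min=0)

    mini = Minimizer(resid, params, fcn_args=(x, y))
    result = mini.minimize()          # Levenberg-Marquardt by default
    ci = conf_interval(mini, result)  # explicit, not-necessarily-Gaussian intervals
    report_ci(ci)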

Also, if you are not aware, the scalar minimizers supported in lmfit (including LBFGS) *can* be coerced into generating uncertainties if the numdifftools package is installed.    In fact, if you have not looked at the example at [link missing], you might find that it is doing some of the things you are trying to do.
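
A sketch of that option on a toy problem (model and numbers invented), assuming numdifftools is installed so lmfit can estimate a covariance for the scalar method and fill in the stderr attributes:

    import numpy as np
    from lmfit import Parameters, minimize

    rng = np.random.default_rng(2)
    x = np.linspace(0, 10, 100)
    y = 2.0 * np.exp(-0.3 * x) + rng.normal(scale=0.05, size=x.size)

    def neg_log_like(params, x, y):
        v = params.valuesdict()
        r = y - v['amp'] * np.exp(-v['decay'] * x)
        # scalar objective: 0.5 * sum of squared residuals, i.e. the Gaussian
        # negative log-likelihood with sigma = 1, up to a constant
        return 0.5 * np.sum(r**2)

    params = Parameters()
    params.add('amp', value=1.0)
    params.add('decay', value=0.1, min=0)

    # 'lbfgsb' is one of lmfit's scalar methods; with numdifftools installed,
    # the stderr attributes should be filled in from a numerical Hessian.
    result = minimize(neg_log_like, params, method='lbfgsb', args=(x, y))
    print(result.params['amp'].stderr, result.params['decay'].stderr)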

--Matt
