Thanks for your helpful comments, Matt. I'm a bit confused about what you mean by the properly signed square root. I'm using prior probabilities together with the likelihood because I find them more intuitive to work with than regularization terms.
The code might be a bit long and confusing to share, so I'll try to describe the steps as clearly as possible.
Imagine I have a dataset (Y), and let's assume it contains only one variable, so Y is that single variable from the real dataset.
I run simulations of a single ODE for multiple individuals, so I get 5 simulated values (X_i_t) at 5 time points (t) per individual (X_i).
I then calculate the likelihood for each value (X_i_t) using:
LL = scipy.stats.norm.pdf(Y_i_t, loc=X_i_t, scale=1)
and put all those likelihoods in a flat array for all individuals and all t's.
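In code, that step looks roughly like this (a minimal sketch with made-up data and shapes; note that scipy's norm.pdf takes loc/scale rather than mu/sigma):

    import numpy as np
    from scipy import stats

    # Made-up example: 5 individuals x 5 time points.
    rng = np.random.default_rng(0)
    X = rng.normal(loc=2.0, size=(5, 5))        # simulated values X_i_t
    Y = X + rng.normal(scale=1.0, size=(5, 5))  # observed values Y_i_t

    # Per-point likelihoods, flattened into one 1D array
    # over all individuals and all time points.
    likelihoods = stats.norm.pdf(Y, loc=X, scale=1.0).ravel()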
Now I multiply all of these likelihoods by some prior probability constants; the exact values aren't particularly relevant right now, so let's imagine they're just 1.0. This gives me an unnormalized 1D posterior array.
The reason I'm doing this inside the objective function is so that I can work with the priors intuitively: I'm trying to regularize towards a non-zero value that is known from expert knowledge, along the lines of the sketch below.
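For concreteness, such a prior could look like this (a sketch only; expert_value and prior_sd are hypothetical names, and I'm assuming a Gaussian prior):

    # Hypothetical Gaussian prior centered on the expert-known value.
    expert_value = 2.5   # non-zero value known from expert knowledge
    prior_sd = 0.5       # how tightly to regularize towards it

    def prior_prob(theta):
        # Prior probability density of the parameter value theta.
        return stats.norm.pdf(theta, loc=expert_value, scale=prior_sd)

    # Multiply every per-point likelihood by the prior constant,
    # giving the unnormalized 1D posterior array from above.
    posterior_unnorm = likelihoods * prior_prob(2.3)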
Since I'm using a minimizer I have to negate the unnormalized posterior, and because it's easier for the optimization I first take the log of the values, so what I end up with is the negative log posterior.
I'm now returning this 1D negative log posterior array to the lmfit minimizer (least_squares).
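Put together, the objective I'm describing looks roughly like this (a sketch; simulate_ode is a hypothetical stand-in for my actual ODE solver, and prior_prob is the prior from above):

    import lmfit

    def simulate_ode(params):
        # Hypothetical stand-in for the real ODE solver: here just a
        # constant (5, 5) array of X_i_t values at the current parameter.
        return np.full((5, 5), params['theta'].value)

    def neg_log_posterior(params, Y):
        X = simulate_ode(params)
        ll = stats.norm.pdf(Y, loc=X, scale=1.0).ravel()
        post = ll * prior_prob(params['theta'].value)
        return -np.log(post)  # 1D negative log posterior array

    params = lmfit.Parameters()
    params.add('theta', value=1.0)
    result = lmfit.minimize(neg_log_posterior, params, args=(Y,),
                            method='least_squares')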
Are you saying that this should just work, and that I should in principle get the parameters (and corresponding confidence intervals) that minimize this negative log posterior?
I did think that the sum of squares being very large could be a problem, but I don't think it's THAT large, so probably not. Maybe the prior multiplication is making the shape of the posterior distribution significantly more complicated.