How to improve fitting of a model for oscillatory target data.

148 views
Skip to first unread message

Andres Valdez

unread,
Mar 8, 2021, 11:20:44 PM3/8/21
to lmfit-py
I am using lmfit to determine the coefficients of a non-linear system of ODEs. Integration of the coupled Van der Pol oscillators is done using python's odeint routine.

When I fit a synthetic case everything works perfectly. When I fit a "realistic" case the fitting methods failed to improve the initial guess.

I tried to use different fitting methods like differential evolution, nelder, and leastsq, no remarkable evolution from the initial guess came out.

I am attaching two figures model_fit.png (works very good, its synthetic) and model_fit_bad.png (realistic case). I am wondering if there is any way to find good initial shoots?

The function to fit combines the solution of a coupled Van der Pol system, see Figures odes.jpg and cost.jpg...

The integration of the odes was done using odeint, so I assume that implementation runs error-free... The synthetic case confirms it. But the realistic-case fails to improve the initial shoot.

Any advice will be very well received. Thanks




model_fit.png
model_fit_bad.png
cost.jpg
odes.jpg

Matt Newville

unread,
Mar 9, 2021, 7:59:12 AM3/9/21
to lmfit-py
You would need to show us a minimal, complete example of python code that shows the trouble to get any concrete help. Also, convince us that you have read and understood what the fit report is saying.   Finally, if values are not changing from initial values, make sure you have read https://lmfit.github.io/lmfit-py/faq.html#why-are-parameter-values-sometimes-stuck-at-initial-values

--
You received this message because you are subscribed to the Google Groups "lmfit-py" group.
To unsubscribe from this group and stop receiving emails from it, send an email to lmfit-py+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/lmfit-py/b4d9c555-9a19-4981-b63e-f6c620247bd9n%40googlegroups.com.

Andres Valdez

unread,
Mar 9, 2021, 4:43:48 PM3/9/21
to lmfit-py
Dear Matt Newville,

Following your advice let me share an excerpt of my fitting script.

Step 1, define constants and fixed parameters:
    # Here I define constants of the model
    c_til = 50            # air particle velocity, m/s
    d_len = 0.0175        # length of vocal folds, m
    x_0   = 0.001         # half glottal width at rest position, m
    sol_0 = [0,0.1,0,0.1] # Initial state from papers (reference) (x(0),x'(0),y(0),y'(0))
    theta = [c_til,d_len,x_0]
    
    if(dataset == 'Synthetic'):
    if(dataset == 'Synthetic'):
        # Here I define a synthetic dataset just to test the fit
        A     = 0.50 # Reference value
        B     = 0.32 # Reference value
        D     = 0.00 # Reference value
    
        t     = np.linspace(0,60*pi,1000)
        data , tgt = ode_solver(A,B,D,sol_0,t)
        data  = data + 0.1*np.random.normal(size=t.shape) # Add a noise to see the robustness.
    else:
        # Here I upload a given audio file
        t, signal , glot_flow, sr = load_audio('target.WAV')
        t    = t[:900]
        data = glot_flow[:900] * 1.0e-01
    else:
        # Here I upload a given audio file
        t, signal , glot_flow, sr = load_audio('target.WAV')
        t    = t[:900]
        data = glot_flow[:900] * 1.0e-01

Step 2, declare the parameters to fit, plus the min-max limits and other props.
    # Define the initial shoot and ranges for the parameters to fit
    ID = ['A' ,'B' ,'D']
    if(dataset == 'Synthetic'):
        i0 = [0.50,0.50,0.50] # This one works for synthetic
    else:
        i0 = [0.50,0.50,0.50]
    
    vi , vf = [0,0,0] , [1,1,1] # According to code, they are always between (0,1)

Step 3, Define and call the lmfit minimizer stuff
    # According to lmfit, you must define ranges for parameters and initial guess
    params = lmf.Parameters()
    params.add(ID[0],value=i0[0], min=vi[0], max=vf[0])  # alpha
    params.add(ID[1],value=i0[1], min=vi[1], max=vf[1])  # beta
    params.add(ID[2],value=i0[2], min=vi[2], max=vf[2])  # Delta

    # lmfit minimizer 
    foo = lmf.Minimizer(residual_ode, params,fcn_kws={'t': t, 'data':data,'ID':ID, 'theta':theta, 'sol_0':sol_0})

    result = foo.minimize(method='leastsq')
    lmf.report_fit(result)

As you can notice from the figures I shared before, the synthetic case works well... I mean I
"Validated the ode_solver() function". For the synthetic case the lmfit report is copied next:
Fitting ODEs running on lmfit v1.0.0
Analyzed dataset Synthetic
[[Fit Statistics]]
    # fitting method   = leastsq
    # function evals   = 101
    # data points      = 1000
    # variables        = 3
    chi-square         = 9.96375601
    reduced chi-square = 0.00999374
    Akaike info crit   = -4602.80117
    Bayesian info crit = -4588.07790
[[Variables]]
    A:  0.50089718 +/- 1.5368e-07 (0.00%) (init = 0.5)
    B:  0.32140460 +/- 2.1184e-05 (0.01%) (init = 0.5)
    D:  0.02961727 +/- 4.6440e-05 (0.16%) (init = 0.5)
[[Correlations]] (unreported correlations are < 0.100)
    C(B, D) = -0.857
    C(A, B) =  0.644
    C(A, D) = -0.373

For the Realistic case the lmfit report is shown next
Fitting ODEs running on lmfit v1.0.0
Analyzed dataset real
[[Fit Statistics]]
    # fitting method   = leastsq
    # function evals   = 41
    # data points      = 900
    # variables        = 3
    chi-square         = 0.98750335
    reduced chi-square = 0.00110090
    Akaike info crit   = -6127.47314
    Bayesian info crit = -6113.06595
[[Variables]]
    A:  3.6795e-06 +/- 51769552.4 (1406957752181724.50%) (init = 0.5)
    B:  0.99999944 +/- 2.6702e+08 (26701595662.24%) (init = 0.5)
    D:  0.21433002 +/- 2.4401e+11 (113848637625094.62%) (init = 0.5)
[[Correlations]] (unreported correlations are < 0.100)
    C(A, B) =  1.000
    C(B, D) = -0.964

To summarize my problem (A,B,D) should be always in (0,1) but the realistic case fails
giving good results..... I mean look at the level of uncertainty there in the report.
Without mentioning that the fit looks as the figure I shared before...

Shall I move forward on fitting an "approximate curve" something like sinus with the first
3-5 freqs.??? OR shall I explore different alternatives within the fitting????

All advice are very welcome, thanks.

Andres Valdez

unread,
Mar 9, 2021, 10:48:42 PM3/9/21
to lmfit-py
Dear Matt Newville,

Let me improve my answer by adding more details of the implementation.

Residual definition
def residual_ode(params,t,data,ID,theta,sol_0):
    """
    Here I define the residual function to minimize.
    """
    A,B,D           = params[ID[0]],params[ID[1]],params[ID[2]]
    c_til,d_len,x_0 = theta
    u0              = ode_solver(A,B,D,sol_0,t)[0]
    
    #plt.plot(t, u0, 'g-')
    
    return u0 - data

And the ODEs' solver
def ode_sys(sol,t,A,B,D):
    """
    This function returns the order reduction for the Van der Pol's ODEs
    A , B, D: are scalar parameters to be "determined"
    x_sol , y_sol: are displacements
    u_sol , v_sol: are velocities
    You need to convert the 2-second order ODEs into 4-first order ODEs
    """
    
    x_sol , u_sol , y_sol , v_sol = sol
    
    x_der = u_sol
    u_der = (A - B * (1 + np.power(x_sol,2))) * u_sol + (0.5 *D-1) * x_sol + A * v_sol
    y_der = v_sol
    v_der = (A - B * (1 + np.power(y_sol,2))) * v_sol - (0.5 *D+1) * y_sol + A * u_sol
    
    return [x_der,u_der,y_der,v_der]

def ode_solver(A,B,D,sol_0,t):
    """
    This function integrates the coupled ODEs that represent the Van der Pol oscillators
    A     :: alpha
    B     :: beta
    D     :: delta
    sol_0 :: initial condition [x(0),x'(0),y(0),y'(0)]
    t     :: time array
    u0    :: computed glottal flow
    sol   :: displacements of both vocal folds vs time
    """
    # Here I define constants of the model (physiological props.)
    c_til = 50            # air particle velocity, m/s
    d_len = 0.0175        # length of vocal folds, m
    x_0   = 0.001         # half glottal width at rest position, m
    
    sol   = odeint(ode_sys,sol_0,t,args=(A,B,D))
    u0    = c_til * d_len * (2 * x_0 + sol[:,0] + sol[:,2])
    
    return u0,sol

I think that's all.



Matt Newville

unread,
Mar 10, 2021, 8:14:24 AM3/10/21
to lmfit-py
Hi Andres,

Just to be clear, a "minimal, complete example of python code".  means both minimal (no extraneous code) and complete, as in the code would actually run and shows all the steps.  

It looks to me like you are asking for help for a fit that ran to completion, and gave a reasonable report.   That is, it looks like everything is working as expected.  If you are running into any issue it is that your parameters "A" and  "B" are hitting the bounds you set for them.  Are you sure that "A" and "B" cannot exceed [0, 1]?   Are you sure that you are sure?

I cannot really guess why "B>1" or "A<0" would never ever be sensible, but I have not tried to really understand the analysis you're trying to do.  





Andres Valdez

unread,
Mar 10, 2021, 8:28:53 AM3/10/21
to lmfit-py
Dea Matt Newville,

My apologies for my extensive answers. I tried to share the essential parts to follow my script...
Ideally, I am trying to use the classical least-squares method as an alternative to the proposed

As you can note in that article they propose the fitting problem as a sensitivity analysis problem,
Initially, I am trying to avoid the sensitivity cases by just replacing them with least-squares. Do
you have any suggestions?

I was encouraged by the synthetic case, but the realistic case failed and I don't know how to proceed.

Ps.: A, B, D should be always inside [0,1], initial states and other definitions are FIXED, no need to fit them.

Andres Valdez

unread,
Mar 11, 2021, 5:38:43 PM3/11/21
to lmfit-py
Dear Matt Newville,

Do you think this might have happened because I used TOO much data????

Matt Newville

unread,
Mar 11, 2021, 6:08:05 PM3/11/21
to lmfit-py
On Thu, Mar 11, 2021 at 4:38 PM Andres Valdez <andres1...@gmail.com> wrote:
Dear Matt Newville,

Do you think this might have happened because I used TOO much data????

Probably not , but I don't really know what you mean by "this".   

The fit you described has a fit report that ran to completion and gave sensible results.  So, the fit "worked".  It also happened to say that the best fit values for your A and B were at the limits you imposed.    That is, literally, what you asked for.  Are you sure that you are sure that you should not relax the bounds?  Is B=1.2 really, really impossible?

I would suggest running your model function with what you think are sensible inputs (like, say, the initial values) for the parameters and plotting the results with the data.  Do those "sensible inputs" give anything that is close to the data you are trying to fit?  If so, the fit will probably work.   And if not, maybe those initial values are not so sensible after all. 

Like, are the scale and frequency of your actual data anywhere close to the synthetic data?  Maybe your data needs some scaling.  Again, are you sure that you are sure that the bounds you imposed are actually, really, really correct?  If you are sure that the scaling is all correct and that the bounds are right, then you are getting the best fit result. 


Andres Valdez

unread,
Mar 11, 2021, 6:20:37 PM3/11/21
to lmfit-py
Let me answer by parts, and explain better myself.

The synthetic data came from integration the Van der Pol's equations with "prescribed (A, B, D)". I am told that (A, B, D) should be always in [0,1]. The synthetic case worked fine.

My real problem is that the realistic data, which comes from a real audio file, should be able to be fitted, I think using Least-Squares. Recently I executed a test that uses LHS sampling to choose initial shoots in the interval, and the sensitivity to inial shoots is almost zero. Then the problem seems to be somewhere else...

Ok I will take the advice of checking the frequency of the datasets (synthetic and realistic) to determine if they are comparable...

Do you suggest also to "change the residual" definition? Taking into account the frequency spectra, for example..... like a joint residual....

Matt Newville

unread,
Mar 11, 2021, 10:46:36 PM3/11/21
to lmfit-py
On Thu, Mar 11, 2021 at 5:20 PM Andres Valdez <andres1...@gmail.com> wrote:
Let me answer by parts, and explain better myself.

The synthetic data came from integration the Van der Pol's equations with "prescribed (A, B, D)". I am told that (A, B, D) should be always in [0,1]. The synthetic case worked fine.

My real problem is that the realistic data, which comes from a real audio file, should be able to be fitted, I think using Least-Squares. Recently I executed a test that uses LHS sampling to choose initial shoots in the interval, and the sensitivity to inial shoots is almost zero. Then the problem seems to be somewhere else...

Ok I will take the advice of checking the frequency of the datasets (synthetic and realistic) to determine if they are comparable...

Do you suggest also to "change the residual" definition? Taking into account the frequency spectra, for example..... like a joint residual....


How would I know if you should change the residual?  Did you ever post any code with a residual in it?  I think not.  Really, I do not know what you are doing or talking about.  I am sure that you understand Python, and have basic mathematical skills, and enough common sense to look at your different data sets and see if they share the same scale and ranges. 

If you want further help, you will really have to provide example code that is both minimal and complete that shows any problem you are having. 

Andres Valdez

unread,
Mar 12, 2021, 12:17:17 AM3/12/21
to lmfit-py
I thought you could suggest something like add to the residual definition the discrepancies between the frequencies ...
Reply all
Reply to author
Forward
0 new messages