Unexpected results for data with NaN values


Vanessa Carneiro Morita

Apr 3, 2024, 4:07:51 PM
to lmfit-py
Dear all, 

I've been struggling with some model fits and would like some advice, please.

Summary: I am fitting a custom model to my dataset and getting some odd fit results, which I believe are due to NaN values in the data. I don't know whether I missed something in my code (or in the code of the library developed in our lab to run these fits), or whether I'm doing something else wrong...

Detailed information:
I have a dataset of eye velocity over time, which has NaN values due to saccades (very high peaks in the eye velocity that I'm not interested in, so I removed them).
My model is a concatenation of three pieces: a flat regression, an exponential, and a sigmoid.
Here is an example (data in blue, fitted model in red):
image_2024-04-03_205235116.png

It works very well when the data has no NaN values, but otherwise (with nan_policy set to 'omit'; it does not work at all with 'propagate') I get weird fits like this one:

image_2024-04-03_205834500.png

So I started doing some tests. If I use the eval() method from lmfit without specifying the independent variable, I get an array of the same length as the data without NaNs, as you can see in the figure below (same data as above):

image_2024-04-03_210456127.png

If I instead use plot_fit(), which by default uses the independent variable from the fit, I get this:

image_2024-04-03_210838523.png

i.e., a "broken" exponential, when I would actually expect something like what is shown here: https://stackoverflow.com/questions/73199517/omit-values-when-fitting-lmfit-nan-policy

Do you have any idea why this happens and what I could do to get a proper fit? (Besides interpolating the data -- that would really be my last option, as interpolation introduces biases into the fitted models...)

Here is the model function that I use (in case it helps):

import numpy as np  # the function relies on numpy

def velocity_ExponentialAnti_SigmoidalPursuit(x, target_dir, baseline, anti_onset, anti_slope, anti_offset, pursuit_slope, horizontal_shift, steady_state, fit_baseline: bool = False):
    '''
    Model of eye-velocity over time, in a smooth pursuit task.
    Anticipatory phase is modeled as an exponential, and initial visually guided pursuit as a sigmoid

    Parameters
    ----------
    x : ndarray
        time in milliseconds
    target_dir : int
        direction of the target
        -1 (usually corresponding to left/down) or 1 (usually corresponding to right/up)
    baseline : float
        baseline eye velocity in deg/sec (used only when ``fit_baseline`` is ``True``)
    anti_onset : int
        anticipation onset in milliseconds
    anti_slope : float
        defines the slope of the sigmoid corresponding to the anticipatory phase
    anti_offset : int
        offset of the anticipatory phase in milliseconds
    pursuit_slope : float
        defines the slope of the sigmoid corresponding to the initial phase of the visually-guided pursuit
    horizontal_shift : int
        shift in the sigmoid function corresponding to the initial phase of the visually-guided pursuit
    steady_state : float
        eye velocity reached in the late phase of visually-guided pursuit (in degrees/sec)
    fit_baseline : bool
        if ``True``, fits a flat regression corresponding to the baseline velocity
        this parameter is intended to be used when there were problems with the eye-tracker calibration,
        such that eye velocity is not centered in zero during the fixation phase

    Returns
    -------
    velocity : ndarray
        model of the eye velocity in deg/sec
    '''

    if not fit_baseline:
        baseline = 0

    anti_sign     = np.sign(anti_slope)
    anti_slope    = abs(anti_slope)/1000 # to switch from deg/sec to deg/ms
    pursuit_slope = target_dir*pursuit_slope/1000
    steady_state  = steady_state*target_dir
    anti_onset    = np.floor(anti_onset)
    anti_offset   = np.floor(anti_offset)
    time          = x

    idx_baseline     = np.where(time < anti_onset)[0]
    idx_anticipation = np.where((time >= anti_onset)&(time < anti_offset))[0]
    idx_pursuit      = np.where(time >= anti_offset)[0]
       
    x_anticipation = np.arange(0,len(idx_anticipation),1)
    x_pursuit      = np.arange(0,len(idx_pursuit),1)

    velocity = np.zeros(len(time))
    velocity[idx_baseline]     = baseline * np.ones(len(idx_baseline))
    velocity[idx_anticipation] = baseline + anti_sign*(np.exp(anti_slope*x_anticipation)-1)

    max_anticipation      = anti_sign*max(abs(velocity))
    sigmoid_pursuit       = (steady_state-max_anticipation)/(1 + np.exp(-pursuit_slope*(x_pursuit-horizontal_shift)))
    velocity[idx_pursuit] = max_anticipation + sigmoid_pursuit - sigmoid_pursuit[0]

    return velocity

The parameters are set elsewhere in the library, but they are basically these:

param_fit = [{'name':'steady_state',     'value':value_steady_state, 'min':3.,          'max':40.,          'vary':True},
             {'name':'dir_target',       'value':dir_target,         'min':None,        'max':None,         'vary':False},
             {'name':'anti_slope_tmp',   'value':value_anti,         'min':-40.,        'max':40.,          'vary':vary_anti},
             {'name':'anti_slope',       'expr':'anti_slope_tmp if abs(anti_slope_tmp) >= .5 else 0'},  # arbitrary threshold for valid acceleration
             {'name':'anti_offset',      'value':TargetOn+90,        'min':TargetOn+30, 'max':TargetOn+120, 'vary':True},
             {'name':'anti_onset_tmp',   'value':TargetOn-100,       'min':t_0,         'max':TargetOn,     'vary':vary_start_anti},
             {'name':'anti_onset',       'expr':'anti_onset_tmp if a_anti != 0 else anti_offset-1'},
             {'name':'pursuit_slope',    'value':80,                 'min':3.,          'max':500.,         'vary':'vary'},
             {'name':'baseline',         'value':0,                  'min':-1,          'max':1,            'vary':True},
             {'name':'horizontal_shift', 'value':100,                'min':0,           'max':110,          'vary':True},
             ]

The library does the fit in two steps; for parameters where 'vary' is set to 'vary', vary=False in the first step and vary=True in the second:

out = model.fit(data_trial, params, nan_policy='omit', **inde_vars)

# make the other parameters vary now
for num_par in range(len(param_fit)):
    if 'vary' in param_fit[num_par].keys():
        if param_fit[num_par]['vary'] == 'vary':
            out.params[param_fit[num_par]['name']].set(vary=True)

result_deg = model.fit(data_trial, out.params, method='powell', nan_policy='omit', **inde_vars)

I hope I did not forget to mention anything important; please let me know if you need more information.

Thanks in advance,
Vanessa

Matt Newville

Apr 3, 2024, 10:28:40 PM
to lmfi...@googlegroups.com
Hi Vanessa, 

Including a complete, minimal example is always recommended.  

If you have data with NaN values meant to signal "missing data", then the best thing to do is simply remove those data points.  

Using `nan_policy='omit'` is meant to do exactly that: remove NaNs before doing the fit. If there are NaNs, a fit with `nan_policy='propagate'` will definitely fail.
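A minimal sketch of what 'omit' amounts to, using plain NumPy and made-up data (not the actual lmfit internals): the NaN points are dropped from the data, and the corresponding points of the independent variable are dropped with them.

```python
import numpy as np

# Hypothetical eye-velocity trace with NaNs where saccades were removed
t = np.arange(0.0, 10.0, 1.0)   # independent variable
v = np.sin(t)                   # stand-in for eye velocity
v[3:5] = np.nan                 # "missing" samples

# Dropping NaN points by hand means masking BOTH the data
# and the time axis, so the surviving (t, v) pairs stay aligned
mask = np.isfinite(v)
t_clean, v_clean = t[mask], v[mask]

print(len(t_clean))  # 8 of the original 10 points survive
```

The crucial detail is that the surviving time values keep their original spacing, so the gap is still visible in `t_clean` even though the arrays are shorter.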


It looks to me like that is what you are doing, and that it is working. That is, your third plot uses the array index as the x-axis, and values with y=NaN were removed (omitted) from the array. So that looks OK to me (but, again, example code never hurts).

You can certainly evaluate a model at any value of the independent variable, including those values where the y data was NaN.
That should then "fill in" the missing data points.
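For instance, here is a sketch of that idea using numpy.polyfit as a stand-in for the model (hypothetical data; with an lmfit ModelResult you would pass the full independent variable to result.eval(x=t_full) in the same spirit):

```python
import numpy as np

# Hypothetical data: a straight line with two missing (NaN) samples
t_full = np.arange(0.0, 10.0, 1.0)
y = 2.0 * t_full + 1.0
y[4:6] = np.nan

# Fit only the finite points -- what nan_policy='omit' leaves behind
mask = np.isfinite(y)
coeffs = np.polyfit(t_full[mask], y[mask], 1)

# Evaluate the fitted model on the FULL time axis, including the gap:
# this "fills in" the missing data points
y_model = np.polyval(coeffs, t_full)
```

The fitted model is just a function of the independent variable, so nothing stops you from evaluating it at the omitted time points.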
 



--
Matt Newville <newville at cars.uchicago.edu> 630-327-7411

Vanessa Carneiro Morita

Apr 5, 2024, 10:05:32 AM
to lmfit-py
Hi Matt,

thanks again! I was in the process of writing the reply below when it clicked: I could change the model function so that it takes into account that the data may have NaN values... and it works like a charm.
I'm also sending the new version of the notebook (example2.ipynb), in case someone else runs into the same problem and needs a solution :)

best
-------------------------------
Hi Matt,

thank you very much for your quick reply! I'm sorry I didn't include a minimal example in my first message, but it was already late and I had to sleep in order to be a functional human being today :)
In any case, I've attached a Python notebook (example.ipynb) with two examples: one with NaNs, one without...

Yes, I understand that nan_policy='omit' removes NaNs from both the data and the independent variable... what doesn't make sense to me is that the fit process seems to be ignoring my independent variable (or, more precisely, the gaps in it).
example2.ipynb
example.ipynb
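For readers finding this thread later: one likely failure mode (an assumption about the posted model, not the actual notebook fix) is that the model builds each segment from index counts, e.g. np.arange(0, len(idx_anticipation)), so once NaN rows are omitted the indices compress and the gaps in the time axis vanish. Computing each segment from the actual time values keeps the model consistent with the omitted points, sketched here for the exponential anticipatory segment:

```python
import numpy as np

def anticipation_segment(time, anti_onset, anti_slope, baseline=0.0):
    """Exponential anticipatory segment evaluated on the REAL time axis.

    Using (time - anti_onset) instead of an index-based np.arange means
    the result is unchanged when some time samples are missing, as
    happens after nan_policy='omit' drops NaN points.
    """
    dt = time - anti_onset
    return baseline + np.sign(anti_slope) * (np.exp(abs(anti_slope) / 1000 * dt) - 1)

# Full time axis vs. one with a gap (as nan_policy='omit' produces)
t_full = np.arange(0.0, 200.0, 1.0)
t_gap = np.concatenate([t_full[:50], t_full[120:]])  # 70 samples omitted

v_full = anticipation_segment(t_full, anti_onset=0.0, anti_slope=20.0)
v_gap = anticipation_segment(t_gap, anti_onset=0.0, anti_slope=20.0)

# The gapped evaluation matches the full one at every surviving time point
assert np.allclose(v_gap, v_full[np.isin(t_full, t_gap)])
```

With an index-based x, the same comparison would fail, because the points after the gap would be shifted left by the number of omitted samples.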