Dealing with missing RTs, e.g. from the CPT


Guido Biele

Mar 6, 2013, 8:40:59 AM
to hddm-...@googlegroups.com
Hi

I have a large continuous performance task (CPT) dataset which I would like to analyze with the HDDM, but I am wondering how to deal with trials without an RT.

As you probably know, the CPT has target and non-target stimuli, and the participant is instructed to press a button only upon presentation of the target stimulus.

As a result, CPT data has RTs for hits and commission errors, but no RTs for omission errors and correct non-responses. Still, we have information about the frequency of these correct and error non-responses.

One way to analyze these data with the HDDM is to use only correct responses and commission errors. However, I think it would be nicer to also include the frequency of correct and incorrect non-responses in the analysis. My question is how to go about this.

I think one way to go would be to (1) give all non-responses the same RT x and (2) use these data with a modified version of HDDM, where the predicted RT for the non-responses is always set to x. More specifically, I am thinking of an HDDM model with 2 stimulus types (target, non-target), which can be specified such that the predicted RT for non-responses to targets and non-responses to non-targets is always set to x, whereas the predicted RTs for the responses remain untouched.

It seems to me that to implement this one would need to modify the likelihood function and/or the wfpt node, but I am not sure where/how. Could you give me a hint about where to start?

Best – Guido

Imri Sofer

Mar 6, 2013, 1:45:35 PM
to hddm-...@googlegroups.com
Hi Guido,
I believe you have to include the trials in which the subjects did not respond. If you don't, you will bias the model, since all these trials are trials in which the subject responded 'no' (see Gomez et al. 2007, A model of the go/no-go task).
The likelihood of an observed trial should be the same as before, but the likelihood for a non-response should be the probability of the process hitting the "other" boundary.
So you will need to modify HDDM to assign this probability to the no-responses.
These changes should be made in likelihoods.py:
you need to create a new likelihood function, and then a new Wfpt class.
If you step over the old Wfpt, that should do the trick.


*I would check this model on simulated data before running it on the real data





guido biele

Mar 6, 2013, 1:56:40 PM
to hddm-...@googlegroups.com
hi imri,
I agree that I need to incorporate the non-responses, I was just wondering how to incorporate them into the likelihood function.
do you think assigning all non-responses the same RT and then modifying the likelihood function and the wfpt node accordingly would do the job?

more on a technical level: do I remember correctly that the wfpt classes are defined in hddm_base, or was that at another location?

cheers-guido
--
Sent from my Android phone with K-9 Mail. Please excuse my brevity.

Imri Sofer

Mar 6, 2013, 2:01:18 PM
to hddm-...@googlegroups.com
I don't think assigning the same RT will do the job.
But better to simulate and check; maybe it's a good approximation.
you can use hddm.generate.gen_rand_data and hddm.generate.gen_rand_params to simulate data


the likelihood functions are in likelihoods.py
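
For example (roughly; check the docstrings for the exact arguments):

    import hddm

    # draw a random parameter set, simulate RTs from it, then try to recover it
    params = hddm.generate.gen_rand_params()
    data, params = hddm.generate.gen_rand_data(params, size=500)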

guido biele

Mar 6, 2013, 2:09:49 PM
to hddm-...@googlegroups.com
sorry for the many emails,

but what would you do to get the likelihood for the non-response data?
I would think that if I give all non-responses the same RT x and also change the wfpt node output such that all processes that end at the "non-response boundary" get the same RT x, this would amount to making sure that the correct prediction of the probability of non-responses is reflected in the likelihood.

Thomas Wiecki

Mar 6, 2013, 3:18:57 PM
to hddm-...@googlegroups.com
I think you can just use the probability of hitting that boundary at any time. We have defined it in src/pdf.pxi. This is a left-over as nothing else is using it and it's not exposed externally (which we can easily do). For now you could just copy the function to likelihoods.py:

import numpy as np

def prob_ub(v, a, z):
    """Probability of hitting the upper boundary."""
    return (np.exp(-2 * a * z * v) - 1) / (np.exp(-2 * a * v) - 1)
and use that in your wfpt likelihood for the unobserved trials. Another way to look at this is that you are integrating out the RT.

Note that this is the probability of hitting the upper boundary, so 1-prob_ub if you want the lower boundary.
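
As a quick sanity check (plain numpy, using prob_ub as defined above): the expression approaches z as v -> 0, so you will want to special-case v == 0 to avoid dividing 0 by 0.

    print(prob_ub(1e-9, 2., .4))  # ~0.4, i.e. the limit is z
    print(prob_ub(.5, 2., .4))    # ~0.64: positive drift shifts mass to the upper boundary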

Thomas

Guido Biele

Jun 13, 2013, 4:30:47 PM
to hddm-...@googlegroups.com
Hi Thomas and Imri,

I'd like to pick up this topic again because we got positive feedback on the results that we obtained while "cheating", i.e. analysing CPT data while ignoring the no-response trials.

as a quick reminder:
- we agreed that the right way to go is to include the no-response trials
- you suggested that the likelihood of no-responses be calculated as the probability of hitting the lower boundary, for which Thomas pointed to the code in src/pdf.pxi
- you also suggested that the likelihood function as defined in likelihoods.py would need to be adapted (and correspondingly the wfpt nodes)

after going through this today I still have more questions than I'd like. I think one main reason for this is that I have probably not fully understood where in the likelihood functions the predicted and observed RT distributions come together.

my current understanding is that wfpt_like in likelihoods.py returns the likelihood of the observed RT distribution given the model and the parameters.
moreover, the function random in likelihoods.py is used to flip the errors. However, what I can't find is where the total likelihood for correct and error RT distributions is calculated. (at first I thought maybe in wiener_like in src/wfpt.pyx, but that does not seem to be the case).

I think in the end I will have to sum the likelihood for the RT distribution for the trials with RTs and the likelihood of reaching the (upper or lower) boundary for the trials without RTs. But I am still unclear about where I could do that...

can you give me a hint on where I should continue from here and correct me if I misunderstood something?

thanks in advance
Cheers - Guido

Thomas Wiecki

Jun 15, 2013, 5:20:04 PM
to hddm-...@googlegroups.com
Hi Guido,

The likelihood code is pretty complex due to the multiple levels of layering.

In essence:
* likelihoods.py provides the wrappers that make the functions available to pymc. It accesses wfpt.pyx.
* wfpt.pyx provides an interface to the actual likelihood functions; it takes an array and loops over it to compute the summed logp, calling into pdf.pxi for each RT.
* pdf.pxi has the actual likelihoods for a single RT.

To create your new likelihood you might want to start by looking at wfpt.pyx:wiener_like(), which computes the summed logp. pdf.pxi:full_pdf() returns the logp of a single RT. You need to replace this likelihood call with a call to prob_ub() if the RT is not observed (make sure to get the boundary correct, i.e. 1 - prob_ub() is the probability of hitting the lower boundary).
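
Schematically, the change would look something like this (a pure-Python sketch just to show the structure, assuming unobserved trials are coded as +/-999 depending on which boundary is the no-response boundary; the coding scheme is up to you, and the real change belongs in the Cython loop):

    import numpy as np
    import hddm

    def wiener_like_missing(rt, v, a, z, t, missing_code=999.):
        """Summed logp where RTs coded +/-missing_code are unobserved trials,
        scored by the probability of ever absorbing at that boundary."""
        rt = np.asarray(rt, dtype=float)
        observed = np.abs(rt) != missing_code
        # usual wfpt likelihood for the observed RTs (sv=sz=st=0 for brevity)
        logp = hddm.wfpt.wiener_like(rt[observed], v, 0., a, z, 0., t, 0.)
        # probability of ever hitting the upper boundary (prob_ub above);
        # a real implementation should special-case v == 0 (the limit is z)
        p_ub = (np.exp(-2 * a * z * v) - 1) / (np.exp(-2 * a * v) - 1)
        logp += np.sum(rt == missing_code) * np.log(p_ub)       # upper-boundary non-responses
        logp += np.sum(rt == -missing_code) * np.log(1 - p_ub)  # lower-boundary non-responses
        return logp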

If those are working you can copy the wiener_like_contaminant() wrapper in likelihoods.py and make it instead call your new function.

Let me know if you have more questions, this is actually pretty interesting to me.

Thomas

Guido Biele

Sep 9, 2013, 8:47:57 AM
to hddm-...@googlegroups.com
Hi Thomas,

I deleted my earlier reply because rethinking it showed me that the solution I proposed there does not work.
Rethinking it led me to the solution Imri had suggested earlier (sorry for not picking it up right then!).

This solution entails simply calculating the log likelihood as
sum_logp = sum_logp_response_trials + sum_logp_noresponse_trials
whereby sum_logp_noresponse_trials is simply the number of trials without a response multiplied by the log probability that the respective boundary is hit.

One (inelegant) way to implement this is to give the no-response trials a specific response time by which they can be identified as such, and to use this when calculating the likelihoods in likelihoods.py.
You find one such implementation below.
There are several inefficiencies in the implementation below, which mostly have to do with my inexperience in working with python.
I'd like to implement a more efficient version before I do a simple model-recovery exercise to test how well a, v, z can be recovered from data where the non-response RTs are missing.

Here are the 2 things I'd like to know so that I can make a more efficient implementation:
1) Where would I need to change the code to change the variables submitted to wfpt_like? (I am thinking of directly passing the number of no-response trials instead of calculating it in each call of wfpt_like)
2) I think it would be even faster to let wfpt.wiener_like do the entire likelihood calculation. Would implementing this be as easy as defining a new function in wfpt.pyx?

Cheers - Guido

#### modified wfpt_like for data from the CPT (or n-back, go-nogo ...) ####

#NOTE!
# I assume a bias towards responding here, 
# so that no-response trials for target-trials have negative "fake RTs" and 
# no-response trials for non-target-trials have positive "fake RTs" 

def wfpt_likeCPT(x, v, sv, a, z, sz, t, st, p_outlier=0):
        # separate trials into no-response and response trials
        # here, I use .55555 to identify no-response trials
        resp = x['rt'].abs() != .55555
        # get sum of log p for trials with response as usual
        sum_logp_resp = hddm.wfpt.wiener_like(x['rt'][resp], v, sv, a, z, sz, t, st, p_outlier=p_outlier, **wp)
        # get sum of log p for trials with no response as log p over the probability to go to the lower or upper boundary
        # lower boundary for target-trials, upper boundary for non-target trials
        p_noresp = (np.exp(-2*a*z*v) - 1) / (np.exp(-2*a*v) - 1)  
        if x['rt'][resp==0][0] < 0:
                p_noresp = 1-p_noresp
        sum_logp_noresp = np.log(p) * (x['rt'].count() - sum(resp))
        return sum_logp_resp + sum_logp_noresp

Thomas Wiecki

Sep 9, 2013, 11:08:35 AM
to hddm-...@googlegroups.com
Hi Guido,

Yes, that sounds like a better approach.


On Mon, Sep 9, 2013 at 12:47 PM, Guido Biele <g.p....@psykologi.uio.no> wrote:
1) Where would I need to change the code to change the variables submitted to wfpt_like? (I am thinking of directly passing the number of no-response trials instead of calculating them in each call of wfpt_like)

You can do that but I don't think it matters that much. Essentially you'll have to create a new knode for your likelihood as done here:
https://github.com/twiecki/stopsignal/blob/master/stopsignal/stopsignal.py#L41
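
In outline (names are illustrative, following the pattern in that file):

    from kabuki.utils import stochastic_from_dist

    # wrap the custom likelihood (wfpt_likeCPT below) into a pymc
    # distribution class that can serve as the observed wfpt node
    WfptCPT = stochastic_from_dist('wfptCPT', wfpt_likeCPT)

and then have your HDDM subclass build its wfpt knode from WfptCPT instead of the standard wfpt class, as the stopsignal model does.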
 
2) I think it would be even faster to let wfpt.wiener_like do the entire likelihood calculation. would implementing this be as easy as defining a new function in wfpt.pyx?

Yes, that should work. But just a single computation of the likelihood of hitting the boundary should not benefit much from moving it to cython (which rocks for loops).

If I were you I'd first do it in the likelihood as you started below -- premature optimization and all.

See also comments on the code below.

Thomas

        sum_logp_resp = hddm.wfpt.wiener_like(x['rt'][resp], v, sv, a, z, sz, t, st, p_outlier=p_outlier, **wp)
make sure to remove the .55555 trials for computing this.
        sum_logp_noresp = np.log(p) * (x['rt'].count() - sum(resp))
p = p_noresp, I assume? Also, why do you subtract sum(resp)?



--
Thomas Wiecki
PhD candidate, Brown University
Quantitative Researcher, Quantopian Inc, Boston

Guido Biele

Sep 9, 2013, 4:13:12 PM
to hddm-...@googlegroups.com
OK, so I'll try it with the simple approach.

as to your questions about the modified wfpt_like:

sum_logp_resp = hddm.wfpt.wiener_like(x['rt'][resp], v, sv, a, z, sz, t, st, p_outlier=p_outlier, **wp)
already has the .55555 trials removed, because resp is defined as x['rt'].abs() != .55555

in
sum_logp_noresp = np.log(p) * (x['rt'].count() - sum(resp))
I count the no-response trials by subtracting the number of response trials from the total number of trials.

thanks for the hint about cython.

cheers - guido

Thomas Wiecki

Sep 9, 2013, 4:14:33 PM
to hddm-...@googlegroups.com
Good luck. Definitely share any success (or failure) stories.

Thomas

Guido Biele

Sep 17, 2013, 4:09:45 AM
to hddm-...@googlegroups.com
Hi,

I have now run a parameter recovery test, and I think the results are satisfactory if one is mainly interested in comparing people or conditions, but unsatisfactory if one requires unbiased estimates of the parameters.

Before I get there, here is the likelihood function I used (which is very different from the still naive/wrong one in my last post ...). the key addition to the standard likelihood function is to use the Bernoulli function to get the likelihood for the %correct, and to sum the likelihood for the %correct with the likelihood for the RT distribution (the latter being what HDDM typically uses).
It appeared to me that the parameters were even more biased when adding the full likelihood for %correct, so I scaled this by the % of non-response trials. hence, the summed likelihood (sum of RT and %correct likelihood) will be dominated by the RT-likelihood if there are few no-response trials, and it will be dominated by the %correct-likelihood if there are many no-response trials.
ANY FEEDBACK ABOUT THE LIKELIHOOD FUNCTION IS APPRECIATED, ESPECIALLY IF IT HELPS REDUCING THE BIAS IN THE POSTERIORS
    
    def wfpt_like(x, v, sv, a, z, sz, t, st, p_outlier=0):
        if abs(x['rt'][x.index[0]]) > .01:
            return hddm.wfpt.wiener_like(x['rt'], v, sv, a, z, sz, t, st, p_outlier=p_outlier, **wp)
        else:
            ## get sum of log p for trials with RTs as usual ##
            sum_logp_resp = hddm.wfpt.wiener_like(x['rt'][1:], v, sv, a, z, sz, t, st, p_outlier=p_outlier, **wp)

            ## use the Bernoulli function to get the log-likelihood of the error/correct distribution given the DDM-determined accuracy
            # first get the number of error and correct trials
            # this function assumes the following format for the RTs:
            # - accuracy coding such that correct responses have a 1 and incorrect responses a 0
            # - usage of HDDMStimCoding for z
            # - within one condition the first RT is #no_response_trials/1e+5
            # - note that hddm will flip RTs, such that error trials have negative RTs,
            #   so that miss trials in the go condition and commission errors in the no-go condition will have negative RTs

            # when no-response trials have a positive RT, we are looking at no-go trials
            if x['rt'][x.index[0]] > 0:
                n_correct = x['rt'][x.index[0]] * 1e+5
                n_error = len(x) - 1
                p_noresponse = n_correct / (n_correct + n_error)
            # when no-response trials have a negative RT, we are looking at go trials
            else:
                n_error = x['rt'][x.index[0]] * -1e+5
                n_correct = len(x) - 1
                p_noresponse = n_error / (n_correct + n_error)

            # percentage correct according to the probability of reaching the upper or lower boundary
            if v == 0:
                p_correct = z
            else:
                p_correct = (np.exp(-2*a*z*v) - 1) / (np.exp(-2*a*v) - 1)

            # log likelihood with the Bernoulli function
            log_p_correct = (np.log(p_correct)*n_correct + np.log(1-p_correct)*n_error) / p_noresponse

            return sum_logp_resp + log_p_correct

Now to the results, which are in the attached figures:

PR_HDDM4CPT.png shows true parameter values and posterior means for 64 parameter combinations (4z*4a*4v, t was fixed at .2) which produce an accuracy in the range typically found in the CPT.

The data used here are all trials with RTs for correct responses and false alarms, plus the number of misses and the number of correct rejections. I generated medium-sized data sets of 1800 trials (i.e. 5 Conners CPTs).

PR_HDDM_fulldata.png shows for comparison the recovered parameters when we use the full data set (i.e. misses and correct non-responses are included with their RTs.)

The key results are:

- parameters are (as expected) well recovered when using the full data

- using only CPT data leads to underestimation of v and overestimation of a and z

- fortunately, this is a simple linear bias such that the order of the recovered parameters is (essentially) the same as the order of the true parameters

NOTE that all these results are for the Conners CPT, which has 90% target trials that require a response. I would expect the bias to be bigger for other CPTs with fewer target trials.


All in all, it looks to me as if this is usable. I would be happier without the bias, but it is not a big problem for my application, where I am more interested in finding differences within participants and between groups.


Finally I just repeat that any feedback about the likelihood function would be welcome!


Cheers - Guido


PR_HDDM_fulldata.png
PR_HDDM4CPT.png

Imri Sofer

Sep 17, 2013, 10:15:28 AM
to hddm-...@googlegroups.com
correct me if I'm wrong,
we can frame the problem in the following way:
In each condition, all processes that reach boundary A end with a response, and the ones that reach boundary B end with no response. This is true whether you use accuracy or stimulus coding.
the probability of each trial is wfpt(rt) if X=A. if X=B then the probability is integral(wfpt(rt)) = P(no_response) = P(hitting the lower boundary)

therefore the likelihood should be: sum_logp_resp + log(p_noresponse)*n_noresponse
p_noresponse should be computed directly from the parameters and not from the data, and it should be equal to P(hitting the lower boundary),
so it should be (np.exp(-2*a*z*v) - 1) / (np.exp(-2*a*v) - 1), or 1 minus this expression.
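
In equation form (my notation; wfpt is the first-passage time density and n_nr the number of no-response trials):

    \log L = \sum_{i \in \text{response trials}} \log \mathrm{wfpt}(rt_i \mid v, a, z, t)
             + n_{\mathrm{nr}} \cdot \log p_{\mathrm{nr}},
    \quad \text{with} \quad
    p_{\mathrm{nr}} = \frac{e^{-2avz} - 1}{e^{-2av} - 1}

(or 1 - p_nr, depending on which boundary is the no-response boundary).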

At least I think so.


Guido Biele

Sep 17, 2013, 11:20:38 AM
to hddm-...@googlegroups.com
Hi Imri,

the solution you propose ( sum_logp_resp + log(p_noresponse)*n_noresponse ) is indeed the first one I implemented.
However, this resulted in a large bias of the estimated parameters (compared to the current solution), which got me thinking.

I think the problem with that solution is that we get the maximum likelihood for the prediction of the non-responses if we predict that the no-response boundary is hit with p = 1.
What we need instead is, I think, a likelihood function that is maximal where observed_p_noresponse = predicted_p_noresponse. We get this with the Bernoulli as the likelihood function.
I reformulated p_noresponse into p_correct, because that seemed to streamline the code (I hope I didn't mix up anything there...*).
Therefore, p_correct and p_noresponse in my solution are calculated from the data, and the likelihood is calculated for p_correct given the model and its parameters:
(log_p_correct = (np.log(p_correct)*n_correct + np.log(1-p_correct)*n_error))
That I additionally divide by p_noresponse is a "trick" I used because it looked to me as if the likelihood for p_noresponse overwhelmed the likelihood for the RT distribution. I'm not sure if this is a good trick, but it seemed to improve the parameter recovery.

The key intuition behind the likelihood function I described is the following:
- we want parameters that accurately predict the RT distribution and the % of correct answers (which is p_response in the target trials and p_noresponse in the non-target trials)
- so I just take the RT likelihood as calculated by HDDM and add the likelihood for p_correct (or p_noresponse) as calculated from the Bernoulli distribution.

The point where I'd most like to hear your opinion (and which might resolve the discrepancy between the approaches we propose) is whether you agree that a likelihood function log(p_noresponse)*n_noresponse favors parameters that make p_noresponse = 1, and that we should rather have the maximum likelihood for p_noresponse at observed_p_noresponse = predicted_p_noresponse (which I think I am doing, even if I do it with p_correct).
(NOTE that, differently from my notation, p_noresponse in your message refers to the predicted no-response probability).

What do you think?

Cheers - Guido

*I hope I didn't screw up the transition between p_noresponse and p_correct. I'll try to reformulate the Bernoulli likelihood in terms of p_noresponse and 1-p_noresponse when the kids are sleeping.

Imri Sofer

Sep 17, 2013, 1:45:23 PM
to hddm-...@googlegroups.com
- it is true that log(p_noresponse)*n_noresponse would be maximized when p_noresponse = 1, but we also have the sum_logp_resp component, which should balance that.
- you should be able to recover the parameters using the correct likelihood function; any discrepancy points to a problem.
- Even if using the data to compute p_noresponse improves the bias, it is still incorrect. So I think it is better to find the correct solution, or at least we should understand what the correct problem is.

I think the likelihood function is correct, but it is very possible that I'm missing something.
It's very similar to modeling censored data. For example, see eq 2 in http://yaroslavvb.com/papers/gelman-parameterization.pdf

P(rt) =
if rt > 0:
    return wfpt(rt)
else:
    return the cdf of wfpt from -inf to 0 (which is equal to p_noresponse)
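
To make the analogy concrete, the same structure for a normal distribution left-censored at zero looks like this (an illustration in plain scipy, not HDDM code):

    from scipy.stats import norm

    def censored_normal_loglik(y_obs, n_censored, mu, sigma):
        """Density for the observed points, CDF mass for the censored ones."""
        ll_obs = norm.logpdf(y_obs, loc=mu, scale=sigma).sum()
        ll_cens = n_censored * norm.logcdf(0., loc=mu, scale=sigma)
        return ll_obs + ll_cens

In the DDM case, the censored mass is the probability of absorbing at the no-response boundary.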

Thomas Wiecki

Sep 17, 2013, 2:20:52 PM
to hddm-...@googlegroups.com
I understand the whole logic of integrating out unobservable data points in the way you guys are describing in a mathematical sense. However, intuitively, it seems a bit odd, just because the probability mass of hitting the boundary will be much larger for each unobserved RT than the probability density of hitting the boundary at a specific time-point. That sounds as if it would introduce bias, in the sense that the probability mass of the unobserved RTs gets maximized. Is that what you are seeing, Guido?

Can someone enlighten me why my intuition is incorrect?

Thomas

Guido Biele

Sep 17, 2013, 4:49:06 PM
to hddm-...@googlegroups.com
Hi Imri,

I agree with you that my clumsy attempt to deal with the comparatively large likelihoods for the no-responses by multiplying with observed_p_noresponse (actually I divided by it, which is also wrong) is oversimple and wrong, and that we should find a more principled solution.

I also see the similarity between using log(predicted_p_noresponse)*observed_n_noresponse and equation 2 for the likelihood of censored data in the Gelman paper.
Maybe the reason this works better there is that the integral over the censored part of the data is at the tail of the normal distribution, such that even the integral is not too large compared to the densities at the central parts of the distribution, where the likelihoods for the other data points are calculated?

also see my response to Thomas.

Cheers - Guido

Guido Biele

Sep 17, 2013, 4:54:48 PM
to hddm-...@googlegroups.com
Hi Thomas, Hi Imri,

I agree with Thomas' intuition.
to double-check, I looked at some simulated data (1620 trials, 1404 correct responses and 216 no-response trials).

Using the true parameter values I get the following likelihoods:
For correct RTs: -888
For no-response trials, calculated as log(p_noresponse)*n_noresponse: -442
(Note that there are 6.5 times as many response as no-response trials, giving the relatively rare no-response trials a large weight in the summed likelihood)

Now when I looked at the Bernoulli as the likelihood function, I saw that it is actually very similar to the approach Imri proposed:

When the no-response trials are the incorrect trials, then the likelihood for the no-response trials can be written as
log(1-p_correct)*n_errors
the Bernoulli likelihood I used was
log(1-p_correct)*n_errors + log(p_correct)*n_corrects

I see the intuition behind using log(1-p_correct)*n_errors as taking care of the trials that are not used when calculating the RT likelihoods. But it seems that the fact that their maximal likelihood is where p_error = 1, together with the relatively greater weight of the p_noresponse likelihood, causes problems.

the approach I proposed takes care of the problem that the maximum likelihood for p_noresponse is at parameters where p_error = 1, but makes the problem of overweighting the log likelihoods for p_noresponse (as I defined it) only greater. My way of dealing with this problem was wrong on two counts: first, I should have multiplied with p_noresponse, and second, even that approach lacks a deeper justification.

I have no clear idea for how to solve this at the moment, but I'll keep thinking.

Cheers - Guido

Guido Biele

Sep 17, 2013, 6:16:56 PM
to hddm-...@googlegroups.com
I forgot to add that the posterior means for the target trials (with a high percentage correct) generally look quite OK. it is the posteriors for the non-target trials (where the proportion of no-response trials is high) that often look strange.
guido

Guido Biele

Sep 18, 2013, 8:09:57 AM
to hddm-...@googlegroups.com
My first implementation of Imri's proposal must have been buggy, because I find much better results after implementing it again today.
the attached plot shows no bias, good recovery of parameters a and z, and OK recovery of v.
(it looks as if for small a small v are underestimated and large v are overestimated, but this effect does not look very strong)
sorry that my mistake cost you time!

here is the likelihood function that worked:

    def wfpt_like(x, v, sv, a, z, sz, t, st, p_outlier=0):
        if abs(x['rt'][x.index[0]]) > .01:
            return hddm.wfpt.wiener_like(x['rt'], v, sv, a, z, sz, t, st, p_outlier=p_outlier, **wp)
        else:
            ## get sum of log p for trials with RTs as usual ##
            LLH_resp = hddm.wfpt.wiener_like(x['rt'][1:], v, sv, a, z, sz, t, st, p_outlier=p_outlier, **wp)
            
            ## get sum of log p for no-response trials from p(upper_boundary|parameters) ##
            # this function assumes following format for the RTs:
            # - accuracy coding such that correct responses have a 1 and incorrect responses a 0
            # - usage of HDDMStimCoding for z
            # - within one condition the first RT is #no_response_trials/1e+5
            # - note that hddm will flip RTs, such that error trials have negative RTs
            # so that miss trials in the go condition and commission errors in the no-go condition will have negative RTs
            
            # get number of no-response trials from first entry in x
            n_noresponse = abs(x['rt'][x.index[0]]) * 1e+5
            
            # percentage correct according to probability to get to upper boundary
            if v == 0:
                p_correct = z
            else:
                p_correct = (np.exp(-2*a*z*v) - 1) / (np.exp(-2*a*v) - 1)            
            
            # calculate percent no-response trials from % correct                       
            if x['rt'][x.index[0]] > 0:
                p_noresponse = p_correct # when no-response trials have a positive RT we are looking at no-go trials
            else:
                p_noresponse = 1-p_correct # when no-response trials have a negative RT we are looking at go trials
            
            # likelihood for no-response trials
            LLH_noresp = np.log(p_noresponse)*n_noresponse
            
            return LLH_resp + LLH_noresp
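
(for reference, a quick, untested sketch of how one condition's data would be coded in this format; the numbers are made up:)

    import numpy as np
    import pandas as pd

    # hypothetical no-go condition: 50 commission errors (observed RTs,
    # negative after hddm flips errors under accuracy coding) and 130
    # correct non-responses, carried by the first row as a "fake RT"
    error_rts = -np.abs(np.random.normal(.5, .1, size=50))
    rts = np.concatenate([[130 / 1e+5], error_rts])
    data = pd.DataFrame({'rt': rts})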

 guido
PR_HDDM4CPTcorrected.png

Thomas Wiecki

Sep 18, 2013, 8:36:01 AM
to hddm-...@googlegroups.com
Hi Guido,

Looks great -- congrats on getting the likelihood to work. This would be a great addition to HDDM, so let me know if you want to add it once you've tested the model a bit more.

Thomas

Imri Sofer

Sep 18, 2013, 8:54:49 AM
to hddm-...@googlegroups.com
I'm really happy that it worked well in the end.
I have to say that I had (and still have) the same intuition as Thomas, so the fact that it actually worked is still a surprise to me.

 

Guido Biele

Sep 27, 2013, 10:22:06 AM
to hddm-...@googlegroups.com
Hi Thomas and Imri,

I ran a few more parameter recovery tests to check if we get the same results for a slightly wider range of parameters, for a smaller data set (360 trials, which is the number of trials of the CPT), and for different proportions of target trials (30, 50, 70, 90%).
The attached png shows that this still works quite well. In particular, the parameter recovery still looks quite OK for fewer target trials. However, with few trials extreme parameter values were identified in some rare instances (I didn't check why exactly this happened; I guess this could happen when we have only very few or no data for one condition, e.g. no false-alarm trials).
But I don't think that this will be a problem in a "real" data analysis, where one has data from multiple participants and a hierarchical model should rein in outliers.

from my side, I am happy to use this to analyse some data.
do you have other tests in mind I should do before we conclude that this works?
(I don't think that a posterior predictive check would be helpful here, because we already know that the data-generating process was a drift diffusion process)

Cheers - Guido
Cross_validation_360trials_ptarget_30_50_70_90_FullRange.png
Cross_validation_360trials_ptarget_30_50_70_90.png

Thomas Wiecki

Sep 27, 2013, 10:35:28 AM
to hddm-...@googlegroups.com
Hi Guido,

Looks great!

Am I wrong that we could just replace our original likelihood with one that checks for missing values (i.e. NaNs) and models them in this way (i.e. integrating them out)? Should be generally applicable.

Thomas

Imri Sofer

Sep 27, 2013, 12:48:30 PM
to hddm-...@googlegroups.com
Hi Guido,
thanks for sharing all of this.
From my own simulations, I found that it is important to use an informative prior when recovering with a small number of trials, which as you mentioned is a rare case.

Guido Biele

Sep 27, 2013, 4:20:19 PM
to hddm-...@googlegroups.com
Hi Thomas,

sorry for the long response. I'm just trying to make sure that we are on the same page...

First I should clarify something I think I haven't pointed out clearly until now: I used HDDMStimCoding for z for all analyses, in combination with accuracy coding for the data. The idea was that people have a bias towards responding (at least in CPTs where the % of target trials is clearly above 50%), so that they are biased towards the correct response in the target trials and biased towards the incorrect response in the non-target trials. [the bias should go in the opposite direction when the % of target trials is clearly below 50%; that's something I haven't actually considered when doing the tests.]

As to the modified likelihood function: it is similar to what you describe.
But NaNs won't work, because we need to know whether the trials with missing RTs are the correct or the error trials.
One could code the missing-RT trials with inf, so that when the error RTs are multiplied with -1 one can still distinguish error trials from correct trials (or stimulus-A responses from stimulus-B responses in case of stimulus coding), because depending on whether one is looking at target trials or non-target trials, the likelihood for the number of non-responses comes from the lower or the upper boundary, respectively.

I actually coded the RTs so that the first RT for each stimulus was 1e-5*number_of_errors. The idea here was that one wouldn't need to count the error trials each time the likelihood is calculated. But I now think that this maybe does not save too much time. (and 1e-5*number_of_errors is problematic with large datasets with many errors, but one could simply use 1000*number_of_errors to achieve the same thing)

I also want to make sure that I understand correctly what you mean by integrating out the non-response trials:
I think this is OK if you mean that we calculate the RT-likelihood only for the response trials. But it is important that we also add the likelihood of the % of non-response trials. So the procedure is more than only integrating out the non-response trials.

Hope this helps
Cheers-Guido

Thomas Wiecki

Sep 30, 2013, 9:41:01 AM
to hddm-...@googlegroups.com
Hi Guido,

Right, so I guess we could just code it as -999 and +999.

Yeah, I mean that we replace all trials for which no RT was observed with the prob of hitting that boundary at any time.

Thomas

Guido Biele

Oct 3, 2013, 11:34:38 AM
to hddm-...@googlegroups.com
sure, 999 is perfect.

do you want me to do anything on that (i.e. trying to make a pull request), or do you want to do it yourself?

cheers - guido

Thomas Wiecki

Oct 3, 2013, 2:10:44 PM
to hddm-...@googlegroups.com
Thanks for offering, it'd be great if you could do a PR for this.

Thomas

David

Aug 10, 2015, 4:46:53 PM
to hddm-users
Hi
I just wanted to check if there is any update on how this has been integrated now.

...

Thomas Wiecki

Aug 11, 2015, 5:26:07 AM
to hddm-...@googlegroups.com
Hi David,

it's not integrated yet, would be a great contribution though.

Thomas


Marjorie

Jun 27, 2017, 12:43:23 PM
to hddm-users
Hi all,

Can somebody advise me how to run the HDDM nowadays on data from a Go/No-Go task with missing RTs for non-responses?

Do I still have to manually recode likelihoods.py and wfpt, or is there a more up-to-date way? If so, where can I find how to do this?

Thank you!
Marjorie




Jan Willem de Gee

Jul 10, 2017, 8:36:15 AM
to hddm-users
Hi Marjorie,

It seems that this has been implemented: https://github.com/hddm-devs/hddm/blob/master/hddm/likelihoods.py (from line 55 onwards). If I understand correctly, you should now code your missing RTs as NaNs.

I personally do not understand the following comments in the code:
# - accuracy coding such that correct responses have a 1 and incorrect responses a 0
# - usage of HDDMStimCoding for z
Suppose I wanted to estimate the starting point z: could I then use HDDMStimCoding (with a column "response" for choices, and a column "stimulus" for stimuli), or would this not work?

Cheers,

JW

Marjorie

Jul 31, 2017, 10:59:55 AM
to hddm-users
Hi JW,

Thanks for the link. Yes, I use 999 or NaN for my missing RTs.

I think the code tells you that you can use the missing RTs in either accuracy coding or stimulus coding. I think, if you have a hypothesis that z is biased, you can run that model on those data.

I'm still figuring out if it is possible to run a stim-coded model with a within- and a between-subject factor (time and groups). I don't know what kind of factors you have?

Marjorie

Nate Hall

Apr 27, 2020, 12:44:40 PM
to hddm-users
Hi Thomas, Guido, and Imri,

Thanks so much for putting in the time to get an updated wfpt likelihood coded and simulated; it was fun for my own edification to read through your correspondence. In digging through the likelihoods.py history, it looks like this was implemented almost 5 years ago at this point: https://github.com/hddm-devs/hddm/commit/dd9a7eeaf6c99e770f4741d04499f8a67c208271#diff-89701615fc3609bef90159a018b110b5.

Just to make sure I'm clear (I'm relatively new to python/HDDM): it looks like, in the finalized likelihood that is currently implemented in HDDM, users are to specify no-go trials with 999s for correct no-gos and -999s for incorrect no-gos (omission errors)?

One other question I have is whether or not HDDMRegressor would be able to implement such a setup. For example, in my task the number of go trials that precede a no-go stimulus is manipulated within subject and I have become a fan of using the flexibility of HDDMRegressor to examine such within-subject effects (it is reasonable to expect that number of previous go trials might modulate z, for example). However, it looks like HDDMRegressor implements wfpt.wiener_like_multi, which doesn't seem to have Guido's code implemented. I honestly can't claim to know the difference between these two likelihood functions so perhaps it would be a simple extension? Curious to get your thoughts!

Thanks for your active maintenance of such an awesome package.

Nate



On Thursday, October 3, 2013 at 2:10:44 PM UTC-4, Thomas wrote:
Thanks for offering, it'd be great if you could do a PR for this.

Thomas

Thomas Wiecki

Apr 30, 2020, 11:12:16 AM
to hddm-...@googlegroups.com
Hi Nate,

You understand the coding correctly.

I recently added that support to HDDMRegressor but it's still unreleased. You can check it out at https://github.com/hddm-devs/hddm/pull/60. Some more eyes and further testing would be appreciated!

Best,
Thomas


hankyo...@gmail.com

May 17, 2022, 10:00:25 AM
to hddm-users
Hi Thomas and Nate,
May I ask if there is any update on fitting a task with a single response (for example, missing RTs for no-go trials) using the HDDMRegressor? The question is especially interesting to me for exploring inhibitory control when no response is made. Unfortunately, the above-mentioned solution for fitting missing responses with HDDMRegressor was deleted in the link you sent. Is there any further follow-up on this, and any idea how to implement the regression? Thanks a lot for any comment.

Best,
Hang

Alexander Fengler

Jun 2, 2022, 5:37:13 PM
to hddm-users
You might find this useful

(search for Stimulus coding with HDDMRegression)

Let us know if it doesn't address your concerns.

Best,
Alex