binomial models with varying, including zero, number of trials

427 views
Skip to first unread message

Duncan McPherson

unread,
Oct 3, 2013, 4:34:19 PM10/3/13
to r-inla-disc...@googlegroups.com
I'm trying to use INLA to model disease treatments in different areas over time.

The data are the number of cases of disease, and the number of treatments for that disease in each area in each time period. The main thing I'm interested in is the probability of treatment in any given area, so it seems obvious that I would model the number of treatments as being a binomial process with number cases known and some underlying unknown probability (to be estimated), dependant on the area and time period.

Unfortunately, some of the areas have no cases for some time periods, and therefore also no treatments. However, I can't just throw these away, since there is still some underlying probability of treatment, were there to be a case, and presumably also of there being some cases.

This paper sets out better than I why these zeros are of interest, and some ways to deal with them: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2843591/. The summary is to model the cases and treatments as two poisson processes that are correlated.

This paper uses INLA to model a poisson process and then a binomial process that depends on the number of trials in the poisson process: http://onlinelibrary.wiley.com/doi/10.1111/2041-210x.12017/abstract. It is also very similar to the toy example http://www.r-inla.org/models/tools#TOC-Models-with-more-than-one-type-of-likelihood. However, in both these cases none of the instances have zero trials.

My example looks something like this:

n=100
area = rep(1:10, each=10)
time = rep.int(1:10, 10)
cases = rpois(n, 4)
treatments = round(0.5*cases, 0)

data <- data.frame(area, time, cases, treatments)

formula <- treatments~1+f(area) #ignoring the time variable for now

result <- inla(formula, family="binomial", data=data, Ntrials=cases)

And it produces error:

*** ERROR *** INLA.Data1: Binomial data[33] (nb,y) = (0,0) is void

because it can't handle having instances of zero cases. I've also tried putting it into the form of the Illian paper, but the same error appears. Can you suggest how I can achieve something like the Scott Comulada model, or adapt the Illian model to my uses. Or, which is quite likely, can you suggest a better way than either?

Thank you,

Duncan

INLA help

unread,
Oct 4, 2013, 12:19:15 AM10/4/13
to Duncan McPherson, r-inla-disc...@googlegroups.com
Hi,

So if n=0 and y=0 in a Binomial, then the likelihood should be set to a
constant (=1) ? would that solve the issue????

H




--
Håvard Rue
he...@r-inla.org

Duncan McPherson

unread,
Oct 4, 2013, 5:19:30 AM10/4/13
to r-inla-disc...@googlegroups.com, Duncan McPherson, he...@r-inla.org
Thanks for the speedy reply. I'm not sure I'm clever enough to quite understand your suggestion.

In general, posterior=prior*likelihood, so I can see that if likelihood is set to 1 then posterior=prior. Presumably then one couldn't say that the prior was 'uninformative' and would have to make some adjustments there too?

I'm not sure what likelihood you are suggesting setting to 1. How would I do this?

I am trying to find out what the underlying 'propensity' or 'capacity' for treating the disease is in each area at each time. If there is, for example, a doctor in a clinic, then there is some capacity to treat disease even if no patients turn up during a particular month. It is more likely in my case that there is some capacity, but not enough to treat everyone, and that this capacity is increasing over time. The areas are also related to each other by a graph that I will put in later.

Referring to the example case I gave above, what I am interested in is estimating the '0.5', which is actually different for each area and time and could be anything from 0 to 1.

Thank you again, I hope I've made my problem clearer?

Duncan

Finn Lindgren

unread,
Oct 4, 2013, 5:31:22 AM10/4/13
to he...@r-inla.org, Duncan McPherson, r-inla-disc...@googlegroups.com
INLA wrote:
> So if n=0 and y=0 in a Binomial, then the likelihood should be set to a
> constant (=1) ? would that solve the issue????

Duncan,
what Håvard suggested was that we change the inla-code to support
this, not that there is somewhere where you could do it yourself.

What you have is a situation where in order to get the correct
parameter estimates, you could simply leave those "observations of
Binomial(0, p)" out of the problem entirely, since they will have no
effect on the model estimates (the locations should still be in the
model, but you don't need to actually observe them, since they will
always be zero, given "no cases", if I understood your model
correctly).

That should solve the immediate estimation issue. But if inla
supported n=0 by explicitly defining the likelihood for that
degenerate case to be =1, then you would be able to get localised
estimates of "p" for those locations without needing to change your
code (I think).

Finn L

Finn Lindgren

unread,
Oct 4, 2013, 5:33:21 AM10/4/13
to he...@r-inla.org, Duncan McPherson, r-inla-disc...@googlegroups.com
On 4 October 2013 10:31, Finn Lindgren <finn.l...@gmail.com> wrote:
> What you have is a situation where in order to get the correct
> parameter estimates, you could simply leave those "observations of
> Binomial(0, p)" out of the problem entirely, since they will have no
> effect on the model estimates (the locations should still be in the

Note: it's possible that this won't help, since implementation details
may mean that n=0 simply isn't supported at all; I haven't tested it.

Finn

Havard Rue

unread,
Oct 4, 2013, 5:58:04 AM10/4/13
to Finn Lindgren, Duncan McPherson, he...@r-inla.org, r-inla-disc...@googlegroups.com

I'm still confused about what the problem really is...

--
You received this message because you are subscribed to the Google Groups "R-inla discussion group" group.
To unsubscribe from this group and stop receiving emails from it, send an email to r-inla-discussion...@googlegroups.com.
To post to this group, send an email to r-inla-disc...@googlegroups.com.
Visit this group at http://groups.google.com/group/r-inla-discussion-group.
For more options, visit https://groups.google.com/groups/opt_out.

Duncan McPherson

unread,
Oct 4, 2013, 7:27:11 AM10/4/13
to r-inla-disc...@googlegroups.com, Finn Lindgren, Duncan McPherson, he...@r-inla.org, hr...@math.ntnu.no
 Havard and Finn

Thank you, I hadn't appreciated that it might involve changes to the code from your end.

It sounds like Finn's suggestion would solve the mathematical problem (although I'm not a mathematician, so may be wrong), but thinking about what it means to my application is interesting. For example, the information that an area has no cases of disease occasionally may (should) affect the capacity to treat that disease, but that may vary between diseases or treatments.

Let me try to explain more clearly my problem. There is a country, divided into areas. In each area there is the potential to apply a treatment for a specific disease (for example each area has specialist doctors, hospitals, medicines, equipment etc). In each area cases of the disease arise randomly (under the influence of the population in the area, their health status etc) and each case may or may not receive the treatment. The potential to treat cases is probably increasing over time, but at different rates in different areas. The area and time boundaries are blurred because the capacity to treat at time t+1 is related to the capacity to treat at time t (if the number of doctors increases slowly over time for example), and because neighbouring areas influence each other.

Now, this capacity to treat disease (phrased as the probability that a case will receive treatment) is varying but present throughout time and space, and I want to be able to know what that capacity is at any time and place (only within the time that I have measured and the areas within the country). Likewise the potential to develop cases of the disease is varying but present throughout time and space, even though there may be no realisations of the potential in a particular area at a particular time. 

Although the capacity to treat may exist, it cannot be deployed unless there is a case of the disease to apply it to. But the capacity still remains, and I still want to estimate what it is.

What I had hoped to end up with is a model something like: 

formula <- treatment ~ f(area, model="bym", ...) + f(time, model="rw1", ...) + (area.time, model="iid", ...)
result <- inla(formula, family="binomial", Ntrials=cases, ...)

But if there is a better way to do this then I'm open to suggestions. If code changes are necessary I'm happy to help with testing as well.

Thanks again

Duncan

Finn Lindgren

unread,
Oct 4, 2013, 9:05:05 AM10/4/13
to Duncan McPherson, r-inla-disc...@googlegroups.com
Hi,

I explained some more details directly to Duncan, but in summary, for
the benefit of the list:

Solution 1) Extend the INLA Binomial model to support the degenerate
deterministic "Bin(0, p)" model, where the parameter "p" has no
influence, as all realisations are necessarily =0. This requires some
small changes to the current inla code.

Solution 2) Remove the observations where cases==0.
Since those observations would have no effect anyway, this should give
the same estimates as for Solution 1, but can be done without any
changes to INLA.

Finn

--
Finn Lindgren
email: finn.l...@gmail.com

Havard Rue

unread,
Oct 4, 2013, 9:29:45 AM10/4/13
to Finn Lindgren, Duncan McPherson, r-inla-disc...@googlegroups.com

I can change the code to allow for the 0  0 case. It should be easy.

Duncan McPherson

unread,
Oct 4, 2013, 9:44:27 AM10/4/13
to r-inla-disc...@googlegroups.com, Finn Lindgren, Duncan McPherson, hr...@math.ntnu.no
Thank you very much.

I'll look forward to testing it out when it's done, let me know.

Duncan

Duncan McPherson

unread,
Oct 5, 2013, 7:52:35 AM10/5/13
to r-inla-disc...@googlegroups.com, Finn Lindgren, Duncan McPherson, hr...@math.ntnu.no
I've just tested the latest testing version (0.0-1380813620) and it handles my binomial(0,p) issue.

48 hours from issue raised to tested fix!

And that is why open source is the best...

Keep up the good work,

Duncan

bach...@gmail.com

unread,
Jun 12, 2021, 2:36:48 PM6/12/21
to R-inla discussion group
Hello there,

I believe I have inadvertently discovered that while the Binomial likelihood now tolerates the binomial(0,p) issue as noted above, the beta binomial likelihood does not.  I'm happy to follow the same solution above of filtering out the zero trials observations, but any reason not to have the same protection within this likelihood also?

Thanks,
Jonathan

Helpdesk

unread,
Jun 12, 2021, 3:18:35 PM6/12/21
to bach...@gmail.com, R-inla discussion group

its just a matter of inconsistent coding, in that case the likelihood is
just 1 and does not inform about anything. I am more tempted to remove
the y=0,n=0 case from the binomial than to go through all and make the
required changes. maybe best to leave it

H
> --
> You received this message because you are subscribed to the Google
> Groups "R-inla discussion group" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to r-inla-discussion...@googlegroups.com.
> To view this discussion on the web, visit
> https://groups.google.com/d/msgid/r-inla-discussion-group/76a9452f-184c-4304-add4-a85b634cb91dn%40googlegroups.com
> .

--
Håvard Rue
he...@r-inla.org

Reply all
Reply to author
Forward
0 new messages