what is binomial distribution doing with non-integer data


Jiang Du

Feb 10, 2017, 8:19:44 PM
to R-inla discussion group
I am going through a paper:
"The use of sample weights in Bayesian spatial hierarchical models for small area estimation"
and it has some sample code at:

It proposes handling complex survey data by using a binomial likelihood to model the "effective number of successes" out of the effective sample size, where both are calculated from the survey design effect. But since both of these numbers are fractional, I'm wondering what exactly the INLA package is doing behind the scenes. The code runs successfully, but what do the results correspond to? Is it rounding the numbers, using a generalized binomial in terms of gamma functions, or something else? The documentation at http://www.math.ntnu.no/inla/r-inla.org/doc/likelihood/binomial.pdf does not say how the INLA package handles this case.
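To make the source of the fractions concrete, here is a minimal sketch of one common way such effective counts are built. It assumes a design-based estimate of the proportion and of its variance are already available, and it is not taken from the paper's sample code; the function name and the example numbers are hypothetical.

```python
def effective_counts(p_hat, var_hat):
    """Hypothetical helper: effective sample size and effective successes.

    Assumption: n_eff is chosen so that the binomial variance
    p_hat * (1 - p_hat) / n_eff matches the design-based variance var_hat.
    """
    n_eff = p_hat * (1.0 - p_hat) / var_hat
    y_eff = n_eff * p_hat
    return y_eff, n_eff

# Made-up inputs: a weighted proportion and its design-based variance.
y_eff, n_eff = effective_counts(p_hat=0.3, var_hat=0.017)
print(y_eff, n_eff)  # both values are generally non-integer
```

Under this construction y_eff / n_eff reproduces the weighted proportion, but neither number has any reason to be an integer.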

Thanks!

INLA help

Feb 11, 2017, 11:47:28 PM
to Jiang Du, R-inla discussion group
Hi,

the relevant part of the code is

status = gsl_sf_lnchoose_e((unsigned int) n, (unsigned int) y, &res);
                   
logll[i] = res.val + y * log(p) + (n - y) * log(1.0 - p);


so the normalizing constant uses the integer parts of 'n' and 'y', which in
this case will only influence the marginal likelihood estimate.

the rest of the log-likelihood uses whatever value of 'y' is entered, and
there is no check that 'y' is an integer.
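A minimal Python sketch of the behaviour described above, written only to illustrate the two lines of C; it is not INLA code. The truncation via int() stands in for the (unsigned int) casts, and the values of y, n and p are made up.

```python
import math

def log_choose_truncated(n, y):
    # Mimic the (unsigned int) casts: drop the fractional parts first,
    # then compute log C(n, y) from log-gamma functions.
    n_int, y_int = int(n), int(y)
    return (math.lgamma(n_int + 1) - math.lgamma(y_int + 1)
            - math.lgamma(n_int - y_int + 1))

def loglik(y, n, p):
    # Constant term from truncated values, kernel from the real values.
    return (log_choose_truncated(n, y)
            + y * math.log(p) + (n - y) * math.log(1.0 - p))

def kernel(y, n, p):
    # The part of the log-likelihood that actually depends on p.
    return y * math.log(p) + (n - y) * math.log(1.0 - p)

y, n = 3.7, 12.4  # made-up fractional "successes" and "trials"
diff_full = loglik(y, n, 0.3) - loglik(y, n, 0.5)
diff_kernel = kernel(y, n, 0.3) - kernel(y, n, 0.5)
print(diff_full, diff_kernel)  # the constant cancels in the difference
```

Because the binomial coefficient cancels whenever two values of p are compared, how it is computed does not change the shape of the likelihood in p; it only shifts the log-likelihood by a constant.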


PS: If you have suggestions for improvement, please let me know

Best
H


                        
--
Håvard Rue
he...@r-inla.org

Jiang Du

Feb 12, 2017, 12:46:31 AM
to R-inla discussion group, jiang...@gmail.com, he...@r-inla.org
Thanks for your reply.

Correct me if I'm wrong. When INLA deals with this situation, the "choose y from n" part only uses the integer parts of y and n after truncating both fractions, while y * log(p) + (n - y) * log(1.0 - p) uses the real values.

To be honest, I don't have any suggestions for this problem other than restricting both y and n to be integers, since by definition the density is zero at non-integer values. There could be a smoothed approximation of the density function, but again that would interfere with the normalizing constant, which actually contributes to the posterior distributions of the relevant parameters. The fraction problem should be handled before fitting the model, though the paper I mentioned didn't say anything about that. I think I will go ahead and fit the binomial - BYM model with fractions, then compare with the results I get if I round the fractions up/down to integers.
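As a rough illustration of that comparison in a much simpler setting than the BYM model: with a single probability p and a flat Beta(1, 1) prior (both assumptions made only for this sketch), the kernel p^y (1 - p)^(n - y) gives a Beta(y + 1, n - y + 1) posterior even when y and n are fractional. The input values are made up.

```python
import math

def posterior_mean_flat_prior(y, n):
    # Mean of Beta(y + 1, n - y + 1), the posterior under a flat prior.
    return (y + 1.0) / (n + 2.0)

y_eff, n_eff = 3.7, 12.4  # made-up fractional inputs

results = {
    "fractional": posterior_mean_flat_prior(y_eff, n_eff),
    "rounded": posterior_mean_flat_prior(round(y_eff), round(n_eff)),
    "floor": posterior_mean_flat_prior(math.floor(y_eff), math.floor(n_eff)),
    "ceil": posterior_mean_flat_prior(math.ceil(y_eff), math.ceil(n_eff)),
}
for name, value in results.items():
    print(name, value)
```

With small effective sample sizes the rounding choice can move the estimate noticeably; with large ones the difference shrinks.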

On Saturday, February 11, 2017 at 10:47:28 PM UTC-6, help help wrote:

INLA help

Feb 12, 2017, 1:03:54 AM
to Jiang Du, R-inla discussion group
On Sat, 2017-02-11 at 21:46 -0800, Jiang Du wrote:
> Thanks for your reply.
>
> Correct me if I'm wrong. When INLA deals with this situation, the
> "choose y from n" part only uses the integer parts of y and n after
> truncating both fractions, while y * log(p) + (n - y) * log(1.0 - p)
> uses the real values.

that is correct. but {n \choose y} does not depend on 'p', so it's just a
constant.

however, if we allow for continuous y from 0..n, then we get the
attached density. I guess it 'has a name', but it's just the continuous
extension of the Binomial when y is continuous from 0..n (and n can also
be continuous).

Anyone know something about this one?
Screenshot from 2017-02-12 09-02-44.png
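The attached formula is not reproduced in the text, so the following is only a sketch of one natural continuous extension, which may or may not match the attachment: replace the factorials in the binomial coefficient with gamma functions. The function names, the trapezoid rule and the example values are choices made for this illustration.

```python
import math

def log_binom_continuous(y, n, p):
    # log of Gamma(n+1) / (Gamma(y+1) Gamma(n-y+1)) * p^y * (1-p)^(n-y),
    # defined for real y in [0, n] and real n > 0.
    log_coef = (math.lgamma(n + 1.0) - math.lgamma(y + 1.0)
                - math.lgamma(n - y + 1.0))
    return log_coef + y * math.log(p) + (n - y) * math.log(1.0 - p)

def integral_over_y(n, p, steps=2000):
    # Trapezoid rule for the integral of the extended function over [0, n].
    h = n / steps
    f = lambda y: math.exp(log_binom_continuous(y, n, p))
    total = 0.5 * (f(0.0) + f(n))
    for k in range(1, steps):
        total += f(k * h)
    return total * h

# At integer y the extension agrees with the usual binomial pmf,
# so summing over y = 0..n gives 1.
pmf_sum = sum(math.exp(log_binom_continuous(k, 10, 0.3)) for k in range(11))

# Over continuous y the integral need not equal 1, so a proper density
# would require its own normalizing constant.
area = integral_over_y(12.4, 0.3)
print(pmf_sum, area)
```

The extra normalizing constant would depend on p, which is why treating y as continuous is not the same as simply plugging fractional values into the binomial kernel.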