Poisson with non-integer data

Thierry Onkelinx

Jul 20, 2021, 6:10:28 AM
to R-inla discussion group
Dear all,

We have an area divided into smaller parts. We counted the number of animals in each of the parts. In some parts we used one method, in other parts a second method. The goal is to see whether the methods yield a different observed density. So the model is quite simple.

inla(
  count ~ part_area + method + f(part, model = "iid"),
  family = "poisson",
  control.fixed = list(
    mean = c(part_area = 1, default = 0),
    prec = c(part_area = 1000, default = 0.001)
  )
)

The problem is that some animals have their home range in two neighbouring parts, or partly outside the study area. Attributing a fraction of the count to each part leads to non-integer counts, for which the Poisson distribution is not defined.

Suppose that a part gets a count of 3.2. One idea is to split the observation into two observations: one with count 3 and weight 0.8, and a second with count 4 and weight 0.2. We would then apply a Poisson distribution with a weighted log-likelihood (using the weights argument). Does this make sense? Is there a better way of doing this?
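The splitting described above can be sketched in base R. The helper name `split_count` is made up for illustration and is not part of any package; this is only a minimal sketch that assumes a single non-negative count and returns the two neighbouring integers with weights chosen so that the weighted mean reproduces the original value.

```r
# Split a non-integer count into its two neighbouring integers,
# weighted so that sum(weight * count) equals the original value.
# `split_count` is a hypothetical helper, not part of any package.
split_count <- function(y) {
  stopifnot(length(y) == 1, is.finite(y), y >= 0)
  lower <- floor(y)
  frac <- y - lower
  if (frac == 0) {
    # Integer counts need no splitting.
    return(data.frame(count = lower, weight = 1))
  }
  data.frame(
    count = c(lower, lower + 1),
    weight = c(1 - frac, frac)
  )
}

split_count(3.2)
```

Each returned row would then enter the model as its own observation with the corresponding weight.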

Best regards,

ir. Thierry Onkelinx
Statisticus / Statistician

Vlaamse Overheid / Government of Flanders
INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE AND FOREST
Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance
thierry....@inbo.be
Havenlaan 88 bus 73, 1000 Brussel
www.inbo.be

///////////////////////////////////////////////////////////////////////////////////////////
To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data. ~ John Tukey
///////////////////////////////////////////////////////////////////////////////////////////


Finn Lindgren

Jul 20, 2021, 6:31:45 AM
to Thierry Onkelinx, R-inla discussion group
Hi Thierry,

Whether fractional splitting with log-likelihood weighting is the best approach to the problem is a longer story (e.g. can it be formulated as a model that is motivated directly by a data-generating process for your situation?), but when it comes to implementing it, there is a shortcut. If you have weights w1 and w2 such that w1 + w2 = 1 and y = w1 y1 + w2 y2 (as in your example), then the weighted log-likelihood for intensity lambda becomes

w1 y1 log(lambda) - w1 lambda + w2 y2 log(lambda) - w2 lambda - w1 log(y1!) - w2 log(y2!)
  = (w1 y1 + w2 y2) log(lambda) - (w1 + w2) lambda - w1 log(y1!) - w2 log(y2!)
  = y log(lambda) - lambda - w1 log(y1!) - w2 log(y2!)

Only the y-factorial terms and the fact that y can be fractional distinguish this from a regular Poisson model.
R-INLA provides the "xpoisson" model, which implements the log-likelihood

  y log(lambda) - lambda - log(floor(y)!)

for non-integer y-values.

Apart from the factorial terms, that's identical to the weighted log-likelihood model construction.
The difference in the normalisation constant means you need to be careful when comparing likelihood values across different data values (this is also true for weighted log-likelihoods).
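As a quick numerical check of this point, the following base-R sketch evaluates both expressions for the 3.2 example over a grid of lambda values. The grid is arbitrary, and the second expression is simply the formula quoted in this message, not a call into R-INLA.

```r
# Weighted Poisson log-likelihood for the split observation (3, 4) with
# weights (0.8, 0.2), compared with the xpoisson-style expression for y = 3.2.
y1 <- 3
y2 <- 4
w1 <- 0.8
w2 <- 0.2
y <- w1 * y1 + w2 * y2

# Arbitrary grid of intensities, chosen for illustration only.
lambda <- seq(0.5, 10, by = 0.5)

weighted_ll <- w1 * dpois(y1, lambda, log = TRUE) +
  w2 * dpois(y2, lambda, log = TRUE)

# Expression quoted in the thread: y log(lambda) - lambda - log(floor(y)!)
xpoisson_ll <- y * log(lambda) - lambda - lfactorial(floor(y))

# The difference between the two does not depend on lambda.
difference <- weighted_ll - xpoisson_ll
range(difference)
```

Because the difference is constant in lambda, both forms lead to the same inference about lambda for given data; only the reported likelihood values shift.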

See inla.doc("xpoisson") for some more information (it's in the same document as the regular "poisson" model).
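For illustration, a minimal sketch of the original model with the likelihood switched to "xpoisson" could look as follows. The data are simulated and entirely made up; the data frame name `dataset`, the treatment of `part_area` as the log of the area, and the list form of the priors are assumptions, not taken from the original analysis. The fit only runs if the INLA package is installed.

```r
# Toy data using the column names from the thread; all values are simulated.
set.seed(42)
n <- 40
dataset <- data.frame(
  part = seq_len(n),
  part_area = log(runif(n, 1, 5)),  # assumption: log of the part area
  method = factor(rep(c("A", "B"), length.out = n))
)
mu <- exp(0.5 + dataset$part_area + 0.3 * (dataset$method == "B"))
# Add fractions to some counts, mimicking home ranges shared between parts.
dataset$count <- rpois(n, mu) + sample(c(0, 0.2, 0.5), n, replace = TRUE)

if (requireNamespace("INLA", quietly = TRUE)) {
  library(INLA)
  fit <- inla(
    count ~ part_area + method + f(part, model = "iid"),
    family = "xpoisson",  # accepts non-integer responses
    data = dataset,
    control.fixed = list(
      mean = list(part_area = 1, default = 0),
      prec = list(part_area = 1000, default = 0.001)
    )
  )
  print(fit$summary.fixed)
}
```

The only change relative to a regular Poisson fit is the `family` argument; the formula and priors are carried over from the question.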

Finn


--
Finn Lindgren
email: finn.l...@gmail.com

Thierry Onkelinx

Jul 20, 2021, 7:45:17 AM
to Finn Lindgren, R-inla discussion group
Dear Finn,

Thanks for the feedback. I get sensible results using "xpoisson".

Best regards,

Thierry
