Random Deterministic?

Delphine Pessoa

unread,

Jan 4, 2011, 7:42:31 AM1/4/11

to py...@googlegroups.com

Hello,

I'm trying to estimate some bacterial transmission parameters.
I count the number of certain events that can be observed and estimate which parameters fit best.

Here is a simplification of my code.

obs = array( [3, 5, 4, 3, 6, 14, 6, 7, 7] )
susceptibles = array( [34.0, 30.0, 22.0, 25.0, 31.0, 36.0, 32.0, 33.0, 37.0] )
tintervals = array( [ 1. , 1.5 , 1. , 1.25, 1. , 1.25, 1. , 1.25, 0.75] )

clear_rate = Uniform('clear_rate', 0,100)

@deterministic
def exp_ev(S=susceptibles, cl_rt = clear_rate, t = tintervals):
   return (S) * (cl_rt) * (t)

likelihood = Poisson('likelihood', mu=exp_ev, value=obs, observed='True')

By simulating data, I have found that the number of observed events is less than the number of events that happen (some events are hidden), which causes the parameter estimation to be biased. I've also found that the number of unobserved events is Normally distributed. I would like to include this in the model, I tried

@deterministic
def exp_from_obs(S=susceptibles, cl_rt = clear_rate, t=tintervals):
   return (S) * (cl_rt) * (t)

unobs = Normal('unobs', 2.77, 3)

@deterministic
def exp_ev(exp_obs=exp_from_obs, unobs=unobs):
   return exp_obs - unobs

likelihood = Poisson('likelihood', mu=exp_ev, value=obs, observed='True')

As the number of unobserved events (unobs_ev) is Stochastic, it is also estimated from the data.
The Normal distribution I give is taken as a prior distribution, the posterior distribution in some cases is different and my estimation remains biased.

unobs is not just one value, but it should not be estimated from the data.
What should I do?

Thank you,

Delphine

Chris Fonnesbeck

unread,

Jan 4, 2011, 9:44:15 AM1/4/11

to PyMC

On Jan 4, 6:42 am, Delphine Pessoa <delphinepes...@gmail.com> wrote:
> As the number of unobserved events (unobs_ev) is Stochastic, it is also
> estimated from the data.
> The Normal distribution I give is taken as a prior distribution, the
> posterior distribution in some cases is different and my estimation remains
> biased.
>

> *unobs* is not just one value, but it should not be estimated from the data.
> What should I do?

If I understand your problem correctly, you are trying to model the
population event rate based on a sample of data that contains
observation bias, that is, events are not detected perfectly. In
general, an effective way of tackling such problems is to break the
model into a process submodel, which in this case is the Poisson model
for describing the true number of events, and an observation submodel,
which describes the observation bias. You have sort of done that here,
except that you have fixed the mean and variance of the unobserved
values. Another (probably better) approach is to model a detection
probability that is simply:

observed = p * actual

Then set up a stochastic node for p, using a beta, for example. The
main impediment to doing that in your case is that there may not be
sufficient information in the data to separately estimate p and the
rate parameter. Are there any covariates available that might help you
estimate detection? You said your simulations indicated observation
bias -- what was in the simulation that made this happen?

cf

Delphine Pessoa

unread,

Jan 5, 2011, 9:32:24 AM1/5/11

to py...@googlegroups.com

Thank you for your response Chris.

We count event by comparing the state of the individuals at two consecutive sampling times.
What we observe in the simulation is that the individuals might change state more than once between these sampling times and therefore the "apparent" number of events (transitions) is less than what is expected from the model.
We can simulate using chosen parameter values and sampling frequencies to get an idea of this observation bias (counting the number of apparent transitions by sampling the individual's states at chosen sampling times and comparing to the number of simulated transitions).

The problem when using this probability of observation is the same as with the number of unobserved events.
If I set p with a Beta distribution, this will be its prior probability distribution, and pymc will try to estimate it from the data.
It's not possible to differentiate between a low detection of events and high rate or the opposite. So extreme values for both are accepted and the posterior distributions are very wide.

Delphine

--
You received this message because you are subscribed to the Google Groups "PyMC" group.
To post to this group, send email to py...@googlegroups.com.
To unsubscribe from this group, send email to pymc+uns...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/pymc?hl=en.

Chris Fonnesbeck

unread,

Jan 5, 2011, 10:46:44 AM1/5/11

to PyMC

On Jan 5, 8:32 am, Delphine Pessoa <delphinepes...@gmail.com> wrote:
> Thank you for your response Chris.
>
> We count event by comparing the state of the individuals at two consecutive
> sampling times.
> What we observe in the simulation is that the individuals might change state
> more than once between these sampling times and therefore the "apparent"
> number of events (transitions) is less than what is expected from the model.
> We can simulate using chosen parameter values and sampling frequencies to
> get an idea of this observation bias (counting the number of apparent
> transitions by sampling the individual's states at chosen sampling times and
> comparing to the number of simulated transitions).
>
> The problem when using this probability of observation is the same as with
> the number of unobserved events.
> If I set p with a Beta distribution, this will be its prior probability
> distribution, and pymc will try to estimate it from the data.
> It's not possible to differentiate between a low detection of events and
> high rate or the opposite. So extreme values for both are accepted and the
> posterior distributions are very wide.

You are exactly right -- given just counts of events, you do not have
the information to separately estimate the two parameters. Typically,
you need a sequence of observations in order to get at the observation
process. If you look at the salamander occupancy model example on the
PyMC main page, it will give you an idea of what is required -- in
this case they have a sequence of capture occasions for marked
individuals, and the sequences of observations of capture events holds
independent information regarding the observation process. You may be
stuck with having to fix hyperparameters for the beta distribution
based on any prior information you may have.

Are you saying that some individuals are infected, recover, then are
infected again within the same time step? In that case, should you not
be accounting for a recovery rate? You might be able to set up a
latent variable model that predicts unobserved transitions in the same
model. If rates of infection and recovery are high, you might expect
lots of mutliple transitions, whereas if one or the other are low, you
would not. I guess i dont know enought about the system to say for
sure.

cf

Reply all

Reply to author

Forward