Hi,
I've come across some surprising (at least to me) behavior when using
PyMC to model source and background counts from astronomical objects
(quasars, in case you're curious). The goal is, not surprisingly, to
estimate the posterior probability distributions for the flux of each
object. (The fluxes of the objects are independent, as are the
backgrounds at the locations of the sources.) For one object, the
model is simple:
r_obj = pymc.Uniform('r_obj', 0., tot_data*5., value=r_obj0)
r_sky = pymc.Uniform('r_sky', 0., tot_data*5., value=r_sky0)
@pymc.deterministic
def r_tot(r_obj=r_obj, r_sky=r_sky):
return r_obj + r_sky
d_tot = pymc.Poisson('d_tot', r_tot, value=tot_data, observed=True)
M = pymc.MCMC([r_obj, r_sky, r_tot, d_sky, d_tot])
M.sample(100000, 10000, 10)
where tot_data is the observed source plus sky counts at the source
position. This model works perfectly.
The mystery arises when I try to model a bunch (several hundred)
sources simultaneously. As I add to the list of sources that I'm
modeling, the 95% HPD interval for the first source in the list
shrinks, even though, as mentioned above, the sources fluxes and
background fluxes at each source position are independent and modeled
as such. A figure illustrating the shrinkage of the HPD interval is
here:
http://dl.getdropbox.com/u/647515/PyMC/FUVFlux.png
The solid red line in the figure is the flux of the first source, and
the dashed red lines are the +/- 2 sigma range computed from the
square root of the counts (Poisson statistics, the error from the
background counts can be neglected for this object). When there are
~40 or fewer objects in the sample, the 95% HPD interval, shown by the
blue error bars, is correct, but the interval shrinks when there are
more than 40 objects in the sample.
A script to reproduce my results is here:
http://dl.getdropbox.com/u/647515/PyMC/FUVFlux.py
I'm using Python 2.5 on a Mac with Numpy 1.3 and an SVN version of
PyMC from about a month ago.
Does anyone have any thoughts about what might be going wrong?
Thanks,
Todd