There was a thread several months ago on fitting a straight line to
data with errors on both the dependent and independent variables and
intrinsic scatter explicitly included in the model
(http://groups.google.com/group/pymc/browse_thread/thread/fef4582f1ff73077#)
following the model of Kelly 2007 (ApJ 665, 1489).
I found that Todd's code took a very long time to converge (on the order
of several hundred thousand iterations) for simulated data when the
errors on `x' are significantly larger (e.g. 15 times) than the errors
on `y', and if this lack of convergence is not recognized it leads to
biased (flattened) slopes. Indeed, the results from `short' runs agreed
with the results obtained using a very broad uniform prior on x_true
(e.g. eqn 24 from Kelly 2007).
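The flattening is the classic attenuation (regression-dilution) bias you
get when the x errors are effectively ignored. A minimal numpy sketch
(simulated data with made-up parameter values of my own, not taken from
the thread) showing a naive polyfit recovering the attenuated rather
than the true slope:

```python
import numpy

rng = numpy.random.RandomState(0)
n = 500
m_true, b_true = 2.0, 1.0         # hypothetical true slope/intercept
xe, ye = 1.5, 0.1                 # x errors ~15x the y errors

x_true = rng.normal(0., 1., n)    # latent x values, unit variance
x_obs = x_true + rng.normal(0., xe, n)
y_obs = b_true + m_true*x_true + rng.normal(0., ye, n)

# Naive fit that ignores the x errors: the slope is biased low by
# roughly the attenuation factor var(x_true)/(var(x_true) + xe**2)
m_fit, b_fit = numpy.polyfit(x_obs, y_obs, 1)
atten = 1.0/(1.0 + xe**2)
print(m_fit, m_true*atten)        # m_fit lands near the attenuated slope
```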
I've coded up a verbatim implementation of Kelly's model (for K=1, a
single-component Gaussian mixture model -- extending to K components
should be straightforward) that converges substantially faster (e.g.
several thousand iterations or fewer). I slightly modified the prior
on mu0 (the location of the Gaussian distribution) because of a strange
behavior where this parameter would drift to values very far from the
observed x-values (generally only seen after a few hundred thousand
iterations). If anybody knows why this happens, I'm interested!
Finally, the code doesn't work with PyMC 2.0 because it implements a
scaled inverse-chisq distribution by passing a scaled stochastic to the
InverseGamma function, and that kind of arithmetic on stochastics
(things like `x = pymc.Uniform('x',0,1); k = x/2') doesn't work in 2.0.
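For reference, the scaled inverse-chisq prior with nu=1 and scale wsqr
that Kelly's model calls for is the same distribution as
InverseGamma(nu/2, nu*wsqr/2) = InverseGamma(0.5, 0.5*wsqr), which is
why the code can lean on pymc.InverseGamma at all. A quick numpy sanity
check of that equivalence (the wsqr value is just an arbitrary choice
for the demo; medians are compared because neither distribution has a
finite mean at nu=1):

```python
import numpy

rng = numpy.random.RandomState(2)
n = 200000
wsqr = 1.3   # arbitrary demo value of the scale parameter

# Scale-inv-chisq(nu=1, s^2=wsqr): draw as nu*s^2 / chi^2_nu
sic = 1.0*wsqr/rng.chisquare(1.0, n)

# InverseGamma(alpha=nu/2, beta=nu*s^2/2): if G ~ Gamma(alpha, rate=beta)
# then 1/G ~ InverseGamma(alpha, beta); numpy's gamma takes scale=1/rate
ig = 1.0/rng.gamma(0.5, 1.0/(0.5*wsqr), n)

# The two samples should agree distributionally
print(numpy.median(sic), numpy.median(ig))
```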
The code is below; hopefully it proves helpful!
Matt
# Linear fit with intrinsic scatter and errors on x and y
import numpy
import pymc

pi = numpy.pi

def linear_model(x, y, ye, xe=None, doscatter=False):
    if xe is None:
        xe = x*0.
    # First guess of parameters
    guess_slope, guess_inter = numpy.polyfit(x, y, 1)
    # Inf for distributions
    myinf = 1e15
    # Kelly priors; could probably be (much) more restrictive
    slope = pymc.Uniform('slope', -myinf, myinf, value=guess_slope)
    inter = pymc.Uniform('intercept', -myinf, myinf, value=guess_inter)
    # Allow intrinsic scatter to be an option
    if doscatter:
        sigma = pymc.Uniform('sigma', 0., myinf, value=0.)
    else:
        sigma = 0.
    wsqr = pymc.Uniform('wsqr', 0, myinf, value=1)
    # Original Kelly prior -- this leads to weird results for very long
    # chains:
    # mu0 = pymc.Uniform('mu0', -myinf, myinf, value=x.mean())
    mu0 = pymc.Uniform('mu0', x.min() - 3.*x.std(), x.max() + 3.*x.std(),
                       value=x.mean())
    usqr = pymc.InverseGamma('usqr', 0.5, 0.5*wsqr, value=1.)
    mu1 = pymc.Normal('mu1', mu0, usqr, value=x.mean())
    tau1 = pymc.InverseGamma('tau1', 0.5, 0.5*wsqr, value=1.)

    @pymc.observed
    def logpx(value=x, xe=xe, xt=mu1, tau=tau1):
        # Likelihood of the observed x given the Gaussian model for x_true
        err = tau**2 + xe**2
        return (-0.5*(value - xt)**2/err - 0.5*numpy.log(2*pi*err)).sum()

    @pymc.observed
    def logpy(value=y, x=x, ye=ye, xe=xe, tau=tau1, mu=mu1, m=slope,
              b=inter, s=sigma):
        # Likelihood of y given the observed x, with x_true marginalized out
        E = b + (m*tau**2/(tau**2 + xe**2))*x + (m*xe**2/(tau**2 + xe**2))*mu
        VAR = (m*tau)**2 + s**2 + ye**2 - (m*tau**2)**2/(tau**2 + xe**2)
        return (-0.5*(value - E)**2/VAR - 0.5*numpy.log(2*pi*VAR)).sum()

    # Only include sigma among the sampled nodes when it is a stochastic;
    # with doscatter=False it is a plain float and would break
    # AdaptiveMetropolis
    nodes = [slope, inter, wsqr, mu0, usqr, mu1, tau1]
    if doscatter:
        nodes.append(sigma)
    model = pymc.MCMC(nodes + [logpx, logpy])
    model.use_step_method(pymc.AdaptiveMetropolis, nodes)
    return model
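As a sanity check on the logpy expressions above (E and VAR are the
conditional mean and variance of y given the observed x, with x_true
marginalized out of the likelihood), one can verify them numerically
against draws from the generative model. A minimal numpy sketch with
hypothetical parameter values of my own choosing:

```python
import numpy

rng = numpy.random.RandomState(1)
n = 200000
mu, tau = 0.5, 2.0          # hypothetical Gaussian for x_true
m, b = 1.5, -0.3            # hypothetical slope/intercept
xe, ye, s = 1.0, 0.2, 0.4   # x error, y error, intrinsic scatter

# Generative model: latent x_true, noisy x_obs, y with scatter + errors
x_true = rng.normal(mu, tau, n)
x_obs = x_true + rng.normal(0., xe, n)
y = b + m*x_true + rng.normal(0., numpy.sqrt(s**2 + ye**2), n)

# The conditional mean and variance used in logpy
E = b + (m*tau**2/(tau**2 + xe**2))*x_obs + (m*xe**2/(tau**2 + xe**2))*mu
VAR = (m*tau)**2 + s**2 + ye**2 - (m*tau**2)**2/(tau**2 + xe**2)

resid = y - E
print(resid.mean(), resid.var(), VAR)  # mean ~0, variance ~VAR
```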