multimodal curve fitting

Paul Kienzle

Sep 25, 2009, 6:50:55 PM
to PyMC
Hi,

Following Aaron Parsons' example[1] I've set up a curve fit for a sine
wave with both frequency and phase as fitting parameters. Letting the
phase vary, I was expecting to see equally likely solutions show up
at phi = n + 0.5 for every integer n (phi enters the model as
2*pi*phi), but this did not occur.

Anyone care to comment on whether this is the right tool either for
global optimization, or for proper uncertainty analysis when multiple
solutions are possible?

Thanks in advance,

- Paul

[1] http://sciencestoriesetc.blogspot.com/2009/04/more-on-mcmc-in-python.html

My example:

import pymc, numpy as n, pylab as p
from pymc import deterministic


class Theory:
    def __init__(self, x):
        self.x = x
    def fn(self, w, phi):
        return n.sin(2*n.pi*(w*self.x + phi))
    def __call__(self, *args):
        return self.fn(*args)

x = n.arange(-1., 1, .01)
sig = .1*n.ones_like(x)
tau = 1/sig**2
actual_func = Theory(x)

# Generate some fake data
# Normally x, sig and mdata come from the experiment
noise = n.random.normal(0, scale=sig, size=x.size)
mdata = actual_func(5., 0.5)
mdata += noise
a_fit = pymc.Uniform('w', 4.5, 5.3)
b_fit = pymc.Uniform('phi', 1, 5)

@deterministic
def func(a=a_fit, b=b_fit):
    return actual_func(a, b)

mdata_var = pymc.Normal('mdata', mu=func, tau=tau,
                        value=mdata, observed=True)

mdl = pymc.Model([a_fit, b_fit, mdata_var])
mc = pymc.MCMC(mdl)
mc.sample(iter=30000,burn=10)

atrace = mc.trace('w')[:]
btrace = mc.trace('phi')[:]
a = n.mean(atrace)
b = n.median(btrace) # Hope this is one of the modes
print 'w',n.mean(atrace),n.std(atrace)
print 'phi',n.mean(btrace),n.std(btrace)
p.subplot(311); p.errorbar(x, mdata, yerr=sig); p.plot(x, actual_func(a, b))
p.subplot(312); p.hist(atrace)
p.subplot(313); p.hist(btrace)

p.show()

Paul Kienzle

Sep 25, 2009, 8:52:28 PM
to PyMC
Here's an additional example with negative correlation
between two variables. Again, pymc is not aggressively
looking for the complete set of solutions; otherwise the
a vs. b plot would stretch from (0,4) to (4,0). Has
anybody explored this sort of problem?

Thanks,

- Paul

import pymc, numpy as n, pylab as p
from pymc import deterministic

class Theory:
    def __init__(self, x):
        self.x = x
    def fn(self, a, b):
        # only the sum a+b is identified, so a and b are perfectly anticorrelated
        return (a + b)*self.x
    def __call__(self, *args):
        return self.fn(*args)

x = n.arange(-1., 1, .01)
sig = .1*n.ones_like(x)
tau = 1/sig**2
actual_func = Theory(x)

# Generate some fake data
noise = n.random.normal(0, scale=sig, size=x.size)
mdata = actual_func(2,2)
mdata += noise

a_fit = pymc.Uniform('a', 0,4)
b_fit = pymc.Uniform('b', 0,4)

@deterministic
def func(a=a_fit, b=b_fit):
    return actual_func(a, b)

mdata_var = pymc.Normal('mdata', mu=func, tau=tau,
                        value=mdata, observed=True)

mdl = pymc.Model([a_fit, b_fit, mdata_var])
mc = pymc.MCMC(mdl)
mc.sample(iter=10000,burn=100)

atrace = mc.trace('a')[:]
btrace = mc.trace('b')[:]
a = n.mean(atrace)
b = n.median(btrace)
print 'a',n.mean(atrace),n.std(atrace)
print 'b',n.mean(btrace),n.std(btrace)
p.subplot(221); p.errorbar(x, mdata, yerr=sig); p.plot(x, actual_func(a, b))
p.legend(['best fit'])
p.subplot(222); p.plot(atrace,btrace); p.legend(['a vs b'])
p.subplot(223); p.hist(atrace); p.legend(['a'])
p.subplot(224); p.hist(btrace); p.legend(['b'])

p.show()


Chris Fonnesbeck

Sep 27, 2009, 11:29:27 PM
to PyMC
On Sep 26, 11:50 am, Paul Kienzle <pakn...@gmail.com> wrote:
> Anyone care to comment on whether this is the right tool either for
> global optimization, or for proper uncertainty analysis when multiple
> solutions are possible?

Paul,

MCMC does not optimise per se, but rather attempts to draw samples
from a posterior distribution that is generally too complex to either
solve analytically or sample from directly. So, while optimisation
methods like least squares use a loss function to estimate parameters,
Metropolis-Hastings samplers generate a Markov chain, accepting or
rejecting proposed values based on the ratio of the posterior density
at the proposed value to that at the last value in the chain. In this
way, MCMC can estimate the parameters of a potentially very complex
hierarchical model. Everything is based upon fully specifying a
probability model, and letting the sampler estimate that model.
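
(For concreteness, here is a minimal random-walk Metropolis sketch of
that accept/reject step in plain numpy, not PyMC internals; the
log_post target and step size below are made up for illustration.)

import numpy as np

def log_post(theta):
    # made-up unnormalized log-posterior: a standard normal
    return -0.5*theta**2

def metropolis(log_post, start, n_samples=5000, step=1.0):
    chain = np.empty(n_samples)
    current = start
    for i in range(n_samples):
        proposal = current + np.random.normal(scale=step)
        # keep the proposal with probability min(1, p(proposal)/p(current))
        if np.log(np.random.rand()) < log_post(proposal) - log_post(current):
            current = proposal
        chain[i] = current
    return chain

chain = metropolis(log_post, start=0.0)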

Not sure if that helps,
cf

David Huard

Sep 28, 2009, 12:28:55 AM
to py...@googlegroups.com
Paul,

To complement Chris's answer, pymc will adjust the jump variance to get an optimal acceptance ratio. For well-behaved likelihoods, this generally means that the whole space is explored.

When there are multiple extrema, my guess is that you'd need to accept higher rejection ratios if you wanted to make sure you sampled all the solutions. That is, you need to climb out of the probability well.

When variables are highly correlated, you generally need a lot more samples to cover the domain, since values are often rejected if the jump is not in the right direction. To get an acceptable acceptance ratio, the algorithm will reduce the jump distance, which slows down the exploration of the space.

So simply put, if you want to explore those problems, you'll have to tweak the tuning methods. Also, for highly correlated variables, AdaptiveMetropolis may help since it makes multidimensional jumps based on the covariance of the past samples, particularly if you help it by giving an initial guess for the covariance matrix.
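
(A sketch of both suggestions using the PyMC 2 step-method API; it reuses a_fit, b_fit and mc from Paul's scripts, calling use_step_method before mc.sample, and the numbers are made up.)

# For the multimodal phi in the first script: start with a wider fixed proposal
# so the chain can hop between modes before tuning shrinks the jumps.
mc.use_step_method(pymc.Metropolis, b_fit, proposal_sd=1.0)

# For the correlated a, b in the second script: let AdaptiveMetropolis make
# joint jumps, seeded with a guess of their negative correlation.
cov_guess = n.array([[ 0.05, -0.04],
                     [-0.04,  0.05]])
mc.use_step_method(pymc.AdaptiveMetropolis, [a_fit, b_fit], cov=cov_guess)

mc.sample(iter=50000, burn=5000)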

HTH,

David

Anand Patil

Sep 30, 2009, 7:26:00 AM
to py...@googlegroups.com
Hi Paul,

If you plot btrace directly (in your first program), you can confirm David's answer: PyMC _is_ finding multiple modes early on, but it ends up spending long enough on each that it optimizes itself for exploring a single mode.
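
(With btrace and the pylab import from that script, the plot is just:)

p.figure()
p.plot(btrace)   # phi against iteration; the early hops between modes show as jumps
p.show()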

You can adjust the tuning strategy as David suggests, but in this case you can just restrict phi to an interval of width 1 a priori, with the understanding that the model would be completely indifferent to integer shifts.
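
(In the first script that is a one-line change; since phi only enters as sin(2*pi*(w*x + phi)), the likelihood has period 1 in phi, so any unit interval contains each mode exactly once.)

b_fit = pymc.Uniform('phi', 0., 1.)   # any interval of width 1 would do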

There's a big literature on MCMC methods that are relatively good at exploring multimodal posterior distributions. PyMC doesn't include any of these, but many could be implemented as step methods. If you're going to be looking at such models a lot, and you won't usually be able to identify the degeneracy ahead of time, it might be worth the effort.

Cheers,
Anand