Joseph,
I'm always amazed at how quickly you come up with realy good answers --
thanks a lot, again!
However, in my use case, I don't know a-priori the locations of the
breakpoints. I quickly hacked together something along these lines,
using scipy.optimize.minimize:
---8<-------
import numpy as np
import
numpy.ma as ma
import pandas as pd
from scipy.optimize import minimize
import matplotlib.pyplot as plt
def F(X, data, S):
# y1, y2, y3, t2 are scalars, data is a constant np.array
y1, y2, y3, t2 = X
T = np.arange(data.size)
t1, t3 = T[0], T[-1]
Xbefore = y1 + (T - t1) * (y2 - y1) / (t2 - t1)
Xafter = y2 + (T - t2) * (y3 - y2) / (t3 - t2)
Xbreak = np.where(T <= t2, Xbefore, Xafter)
return ((ma.masked_invalid(Xbreak - data)**2) /
ma.masked_invalid(S)**2).sum()
# create test data
idx = pd.period_range("2000-01", "2009-12", freq="M")
data1 = np.arange(70) * .4 + np.random.randn(70)
data2 = np.arange(50) * (- .2) + np.random.randn(50) + 29.
data = pd.Series(np.r_[data1, data2], index=idx)
_d = np.asarray(data)
# seasonally varying measurement uncertainty
S = data.groupby(lambda x: x.month).std()
_S = np.asarray(S[[t.month for t in data.index]])
# find optimal parameters
res = minimize(F, (data.mean(), data.mean(),
data.mean(), data.index.size / 2.),
(_d, _S), method="L-BFGS-B",
bounds=((None, None), (None, None), (None, None),
(0., float(_d.size))))
y1, y2, y3, t2 = res.x
# plot results
data.plot(style="*")
tmin, tmax = data.index[0].ordinal, data.index[-1].ordinal
t2 = tmin + t2
ax = plt.gca()
ax.plot([data.index[0], t2], [y1, y2], 'r-', lw=2.)
ax.plot([t2, data.index[-1], ], [y2, y3], 'r-', lw=2.)
plt.show()
---8<-------
The only a-priori is the number of breakpoints, one. However, this has
the drawback that there is no estimate for the uncertainties of the fit
parameters. Of course, this could be done by bootstrapping ...
I found MARS, which might be worthwhile to look at.
Links:
-
http://www.milbo.users.sonic.net/earth/
-
http://dx.doi.org/10.1214/aos/1176347963
-
http://en.wikipedia.org/wiki/Multivariate_adaptive_regression_splines
Actually, I just found this cython implementation of the MARS algorithm:
https://github.com/jcrudy/py-earth/
Do you have any other suggestions?
I'll play around a bit and report my experiences here, if there's interest.
Cheers, Andreas.