I'm trying to work out whether there is a curve of the form a*sin(x)+b that can sensibly be fitted to some very messy data. My x values range from 0 to 360 but each x value does not have a unique corresponding y value. I was thinking of trying to minimise the RMSE by trying a range of values for a and b, but am at a loss as to how to go about it. Reading the the help pages for likely-looking functions has only increased my confusion.
Any help greatly appreciated.
Sian
Hi Sian, You have many options in MATLAB depending on what toolboxes you have and whether you know the period of the trigonometric function you want to fit. From your post you indicate that you want to fit a period of 2*pi. Is this true? If you have no a priori idea of what oscillations are in your data, a Fourier analysis would be a good place to start.
Wayne
Hi Wayne,
Since my x values are wind directions I guess I would be looking for something with a period of 2 pi. I'm not very familiar with fourier analysis, but I thought that it could only be used when each x value has a unique corresponding y value, is this not correct? Also, given that my x-axis is direction, not time, I don't really know what to plot the transform against...
You're saying that you have multiple y-values for each x? What are you measuring for each wind direction, speed?
If you want a simple minimize the squared error of the fit, there is a decent tutorial online here:
http://www-h.eng.cam.ac.uk/help/tpl/programs/Matlab/curve_fitting.html
"Sian " <s.e....@pgr.reading.ac.uk> wrote in message <j0u05m$3ml$1...@newscl01ah.mathworks.com>...
>
> You're saying that you have multiple y-values for each x? What are you measuring for each wind direction, speed?
Yep - The data has been collected over many months, so each x-value (wind direction) has a variety of y values. The y values are arctan(vertical_wind/horizontal_wind), the idea being to work out whether the instrument is tilted (if I can fit some sort of sinusoidal curve to the data, that indicates a tilt).
The easiest way to do what you're describing is to use the fit command from
Curve Fitting Toolbox
Here's some simple example code to do what you're describing
x = linspace(0, 4*pi, 100);
x = x';
a = 3;
b = 2;
y = a*sin(x)+b;
y = y + .25 * randn(100,1);
foo = fit(x,y,'a*sin(x)+b')
As an alternative, you might prefer to play around with a nonparametric
fitting technique (localized regression, a smoothing spline, or some such)
I'm attaching a simple function that you can use to generate a nonparametric
fit for a noisy data set
(This function requires both Curve Fitting Toolbox and Statistics Toolbox)
Here, the syntax is
foo = fitit(x,y)
Alternatively, if you want confidence intervals you need to enter an
addition coefficient describing the size of the bootstrap
[foo, bar1, bar2= = fitit(x,y, 500)
--------------------
function [myfit,varargout] = fitit(X,Y,varargin)
% Input and Output checking
error(nargchk(1, 3, nargin))
error(nargchk(1, 3, nargout))
validateattributes(X, {'numeric'}, {'nonempty','vector'})
validateattributes(Y, {'numeric'}, {'nonempty','vector'})
% Finding optimal span for lowess
num = 99;
spans = linspace(.01,.99,num);
sse = zeros(size(spans));
cp = cvpartition(100,'k',10);
for j=1:length(spans),
f = @(train,test) norm(test(:,2) -
mylowess(train,test(:,1),spans(j)))^2;
sse(j) = sum(crossval(f,[X,Y],'partition',cp));
end
[~,minj] = min(sse);
span = spans(minj);
if nargin-2 == 0
yfit = mylowess([X,Y],X,span);
figure;
h1 = gca;
scatter(h1,X,Y, 'k');
hold on
line(X,yfit,'color','b','linestyle','-', 'linewidth',2)
hold off
legend(h1,'Data','LOWESS',2)
xlabel(h1,inputname(1));
ylabel(h1,inputname(2));
else
nboot = varargin{1};
% Calculating confidence intervals
f = @(xy) mylowess(xy,X,span);
yboot2 = bootstrp(nboot,f,[X,Y])';
yfit = mean(yboot2,2);
stdloess = std(yboot2,0,2);
figure;
h1 = gca;
scatter(h1,X, Y,'k')
hold on
line(X, yfit,'color','k','linestyle','-','linewidth',2);
line(X, yfit+2*stdloess,'color','r','linestyle','--','linewidth',2);
line(X, yfit-2*stdloess,'color','r','linestyle','--','linewidth',2);
hold off
legend(h1,'Data', 'Localized Regression','Confidence Intervals',2);
xlabel(h1,inputname(1));
ylabel(h1,inputname(2));
end
% Fit cubic spline to resulting data;
myfit = fit(X, yfit, 'cubicinterp');
if nargout == 3
varargout{1} = fit(X, yfit+2*stdloess, 'pchipinterp');
varargout{2} = fit(X, yfit-2*stdloess, 'pchipinterp');
end
function ys=mylowess(xy,xs,span)
%MYLOWESS Lowess smoothing, preserving x values
% YS=MYLOWESS(XY,XS) returns the smoothed version of the x/y data in the
% two-column matrix XY, but evaluates the smooth at XS and returns the
% smoothed values in YS. Any values outside the range of XY are taken to
% be equal to the closest values.
if nargin<3 || isempty(span)
span = .3;
end
% Sort and get smoothed version of xy data
xy = sortrows(xy);
x1 = xy(:,1);
y1 = xy(:,2);
ys1 = smooth(x1,y1,span,'loess');
% Remove repeats so we can interpolate
t = diff(x1)==0;
x1(t)=[]; ys1(t) = [];
% Interpolate to evaluate this at the xs values
ys = interp1(x1,ys1,xs,'linear',NaN);
% Some of the original points may have x values outside the range of the
% resampled data. Those are now NaN because we could not interpolate them.
% Replace NaN by the closest smoothed value. This amounts to extending the
% smooth curve using a horizontal line.
if any(isnan(ys))
ys(xs<x1(1)) = ys1(1);
ys(xs>x1(end)) = ys1(end);
end
Richard - thanks for taking the time to post that example. Alas, I have access to neither the curve fitting or statistics toolboxes.
% y = [ones(N,1) sin(x)]*W
W = y/[ones(N,1) sin(x)]
Hope this helps.
Greg
% y = a*sin(x)+b
% y = [ones(N,1) sin(x)]*W
W = y/[ones(N,1) sin(x)]
a = W(1)
b = W(2)
Hope this helps.
Greg