Fitting a sine curve to messy data

Sian

unread,

Jul 29, 2011, 5:57:11 AM7/29/11

to

Hi,

I'm trying to work out whether there is a curve of the form a*sin(x)+b that can sensibly be fitted to some very messy data. My x values range from 0 to 360 but each x value does not have a unique corresponding y value. I was thinking of trying to minimise the RMSE by trying a range of values for a and b, but am at a loss as to how to go about it. Reading the the help pages for likely-looking functions has only increased my confusion.

Any help greatly appreciated.

Sian

Wayne King

unread,

Jul 29, 2011, 7:01:10 AM7/29/11

to

"Sian " <s.e....@pgr.reading.ac.uk> wrote in message <j0u05m$3ml$1...@newscl01ah.mathworks.com>...

Hi Sian, You have many options in MATLAB depending on what toolboxes you have and whether you know the period of the trigonometric function you want to fit. From your post you indicate that you want to fit a period of 2*pi. Is this true? If you have no a priori idea of what oscillations are in your data, a Fourier analysis would be a good place to start.

Wayne

Sian

unread,

Jul 29, 2011, 7:58:13 AM7/29/11

to

> Hi Sian, You have many options in MATLAB depending on what toolboxes you have and whether you know the period of the trigonometric function you want to fit. From your post you indicate that you want to fit a period of 2*pi. Is this true? If you have no a priori idea of what oscillations are in your data, a Fourier analysis would be a good place to start.
>
> Wayne

Hi Wayne,

Since my x values are wind directions I guess I would be looking for something with a period of 2 pi. I'm not very familiar with fourier analysis, but I thought that it could only be used when each x value has a unique corresponding y value, is this not correct? Also, given that my x-axis is direction, not time, I don't really know what to plot the transform against...

Wayne King

unread,

Jul 29, 2011, 8:08:11 AM7/29/11

to

"Sian " <s.e....@pgr.reading.ac.uk> wrote in message <j0u78l$ii3$1...@newscl01ah.mathworks.com>...

You're saying that you have multiple y-values for each x? What are you measuring for each wind direction, speed?

Kevin

unread,

Jul 29, 2011, 8:27:10 AM7/29/11

to

>>help fminsearch

If you want a simple minimize the squared error of the fit, there is a decent tutorial online here:
http://www-h.eng.cam.ac.uk/help/tpl/programs/Matlab/curve_fitting.html

"Sian " <s.e....@pgr.reading.ac.uk> wrote in message <j0u05m$3ml$1...@newscl01ah.mathworks.com>...

Sian

unread,

Jul 29, 2011, 9:41:30 AM7/29/11

to

"Wayne King" <wmki...@gmail.com> wrote in message <j0u7rb$jun$1...@newscl01ah.mathworks.com>...

>
> You're saying that you have multiple y-values for each x? What are you measuring for each wind direction, speed?

Yep - The data has been collected over many months, so each x-value (wind direction) has a variety of y values. The y values are arctan(vertical_wind/horizontal_wind), the idea being to work out whether the instrument is tilted (if I can fit some sort of sinusoidal curve to the data, that indicates a tilt).

Richard_Willey

unread,

Jul 29, 2011, 9:53:49 AM7/29/11

to

Hi Sian

The easiest way to do what you're describing is to use the fit command from
Curve Fitting Toolbox
Here's some simple example code to do what you're describing

x = linspace(0, 4*pi, 100);
x = x';

a = 3;
b = 2;

y = a*sin(x)+b;
y = y + .25 * randn(100,1);

foo = fit(x,y,'a*sin(x)+b')

As an alternative, you might prefer to play around with a nonparametric
fitting technique (localized regression, a smoothing spline, or some such)
I'm attaching a simple function that you can use to generate a nonparametric
fit for a noisy data set
(This function requires both Curve Fitting Toolbox and Statistics Toolbox)

Here, the syntax is

foo = fitit(x,y)

Alternatively, if you want confidence intervals you need to enter an
addition coefficient describing the size of the bootstrap

[foo, bar1, bar2= = fitit(x,y, 500)

--------------------

function [myfit,varargout] = fitit(X,Y,varargin)

% Input and Output checking
error(nargchk(1, 3, nargin))
error(nargchk(1, 3, nargout))

validateattributes(X, {'numeric'}, {'nonempty','vector'})
validateattributes(Y, {'numeric'}, {'nonempty','vector'})

% Finding optimal span for lowess
num = 99;
spans = linspace(.01,.99,num);
sse = zeros(size(spans));
cp = cvpartition(100,'k',10);

for j=1:length(spans),
f = @(train,test) norm(test(:,2) -
mylowess(train,test(:,1),spans(j)))^2;
sse(j) = sum(crossval(f,[X,Y],'partition',cp));
end

[~,minj] = min(sse);
span = spans(minj);

if nargin-2 == 0
yfit = mylowess([X,Y],X,span);

figure;
h1 = gca;
scatter(h1,X,Y, 'k');
hold on
line(X,yfit,'color','b','linestyle','-', 'linewidth',2)
hold off
legend(h1,'Data','LOWESS',2)
xlabel(h1,inputname(1));
ylabel(h1,inputname(2));

else
nboot = varargin{1};

% Calculating confidence intervals

f = @(xy) mylowess(xy,X,span);
yboot2 = bootstrp(nboot,f,[X,Y])';
yfit = mean(yboot2,2);
stdloess = std(yboot2,0,2);

figure;
h1 = gca;
scatter(h1,X, Y,'k')
hold on
line(X, yfit,'color','k','linestyle','-','linewidth',2);
line(X, yfit+2*stdloess,'color','r','linestyle','--','linewidth',2);
line(X, yfit-2*stdloess,'color','r','linestyle','--','linewidth',2);
hold off
legend(h1,'Data', 'Localized Regression','Confidence Intervals',2);
xlabel(h1,inputname(1));
ylabel(h1,inputname(2));
end

% Fit cubic spline to resulting data;
myfit = fit(X, yfit, 'cubicinterp');

if nargout == 3
varargout{1} = fit(X, yfit+2*stdloess, 'pchipinterp');
varargout{2} = fit(X, yfit-2*stdloess, 'pchipinterp');
end

function ys=mylowess(xy,xs,span)
%MYLOWESS Lowess smoothing, preserving x values
% YS=MYLOWESS(XY,XS) returns the smoothed version of the x/y data in the
% two-column matrix XY, but evaluates the smooth at XS and returns the
% smoothed values in YS. Any values outside the range of XY are taken to
% be equal to the closest values.

if nargin<3 || isempty(span)
span = .3;
end

% Sort and get smoothed version of xy data
xy = sortrows(xy);
x1 = xy(:,1);
y1 = xy(:,2);
ys1 = smooth(x1,y1,span,'loess');

% Remove repeats so we can interpolate
t = diff(x1)==0;
x1(t)=[]; ys1(t) = [];

% Interpolate to evaluate this at the xs values
ys = interp1(x1,ys1,xs,'linear',NaN);

% Some of the original points may have x values outside the range of the
% resampled data. Those are now NaN because we could not interpolate them.
% Replace NaN by the closest smoothed value. This amounts to extending the
% smooth curve using a horizontal line.
if any(isnan(ys))
ys(xs<x1(1)) = ys1(1);
ys(xs>x1(end)) = ys1(end);
end

Sian

unread,

Jul 29, 2011, 11:17:14 AM7/29/11

to

Kevin - thanks for the link to the tutorial. It looks very useful.

Richard - thanks for taking the time to post that example. Alas, I have access to neither the curve fitting or statistics toolboxes.

Greg Heath

unread,

Jul 29, 2011, 9:50:58 PM7/29/11

to

% y = [ones(N,1) sin(x)]*W

W = y/[ones(N,1) sin(x)]

Hope this helps.

Greg

Greg Heath

unread,

Jul 30, 2011, 3:14:05 PM7/30/11

to

On Jul 29, 5:57 am, "Sian " <s.e.l...@pgr.reading.ac.uk> wrote:

% y = a*sin(x)+b

% y = [ones(N,1) sin(x)]*W

W = y/[ones(N,1) sin(x)]

a = W(1)
b = W(2)

Hope this helps.

Greg

Marquis Vawter

unread,

Oct 6, 2011, 2:31:26 PM10/6/11

to

I am cutting and pasting the first line of the script that you listed, however, I receive this error in 2011a. Best, Mark

function [myfit,varargout] = fitit(X,Y,varargin)
??? function [myfit,varargout] = fitit(X,Y,varargin)
|
Error: Function definitions are not permitted in this context.

ImageAnalyst

unread,

Oct 6, 2011, 2:56:55 PM10/6/11

to

--------------------------------------------
You can't have a function in the script. The script can only CALL the
function. So just have something like
myfit = fitit(x, y);
not the whole function definition. That should be in a separate file
or else you need to make your script a function itself. If your
script is a function, then it can have other functions live in the
same file below it, like this (all in a single m-file)

function my_m_file()
% stuff
myfit = fitit(x, y);
% more stuff.

function [myfit,varargout] = fitit(X,Y,varargin)
% the function contents...