fitting

Vadim Zaliva

unread,

Nov 7, 2009, 6:50:00 AM11/7/09

to

Hi!

I am very new to Mathematica and I am failing in trying to do
something very simple.
I am have the following data:

x = {8, 36, 74, 96, 123, 152, 201, 269, 415, 460, 444, 579, 711, 731,
602, 364, 151};

Which is a curve, similar to PDF function of Beta distribution:

ListLinePlot[x]

And I am trying to fit it to:

PDF[BetaDistribution[\[Alpha], \[Beta]], x

finding Alpha and Beta values.

I will appreciate it somebody can give me a hint how to do this
properly using FindFit or
any other means. I am too embarrassed to post my modest attempts here,
but trust
me, I've spent few hours trying before posting :)

Vadim

da...@wolfram.com

unread,

Nov 8, 2009, 6:52:54 AM11/8/09

to

There are a few ways to go astray, and I imagine I replicated some of
them. Here is something that seems to work. Rescale along the x axis so
that those values are evenly spaced between zero and one. Now you can
either also rescale on the y axis (e.g. divide each value by the total),
or else simply make that scaling a part of the fit parametrization. I do
the latter below.

data = {8, 36, 74, 96, 123, 152, 201, 269, 415, 460, 444, 579, 711,
731, 602, 364, 151};

In[86]:= bfit =
FindFit[Transpose[{(Range[Length[data]])/(Length[data] + 1), data}],
n*PDF[BetaDistribution[a, b], x], {a, b, n}, x]

Out[86]= {a -> 4.69588, b -> 2.34725, n -> 291.736}

To test this, you might do

Show[{ListPlot[
Transpose[{(Range[Length[data]])/(Length[data] + 1), data}]] /.
bfit, Plot[n*PDF[BetaDistribution[a, b], x] /. bfit, {x, 0, 1}]}]

One other thing I tried was to augment data by zeros at each end (and have
x values starting at zero and ending at one). This did not work, at least
not with the code I used. Another thing that failed was to let FindFit
find scaling along the x axis. Hence the explicit scaling used above.

Daniel Lichtblau
Wolfram Research

Ray Koopman

unread,

Nov 8, 2009, 7:26:37 AM11/8/09

to

You haven't said what your 'x' represents or how it is supposed to
relate to the Beta distribution. If we take the values as the counts
in equal-width bins that cover [0,1], then we can get the sample mean
and variance using the midpoints of the bins as the x-values, and use
the method of moments to estimate 'a' and 'b', the parameters of the
distribution.

k = Length[f = {8, 36, 74, 96, 123, 152, 201, 269, 415, 460, 444,
579, 711, 731, 602, 364, 151}]
x = Range[1/2,k]/k; {n = Tr@f, N[m = f.x/n], N[v = f.x^2/n - m^2]}
Solve[{m == a/(a+b), v == m(1-m)/(1+a+b)}, {a,b}]//Flatten//N

17

{5416, 0.653065, 0.0391941}

{a -> 3.12214, b -> 1.65861}

Mark Fisher

unread,

Nov 9, 2009, 5:55:27 AM11/9/09

to

Borrowing from both Ray and Dan, I interpret the data as bin counts
and use FindFit:

Clear[a, b, x];
bincounts = {8, 36, 74, 96, 123, 152, 201, 269, 415, 460, 444, 579,

711, 731, 602, 364, 151};

nbins = Length[bincounts];
bincenters = Range[1/2, nbins]/nbins;
binheights = Normalize[bincounts, Total[#]/nbins &];
bindata = Transpose[{bincenters, binheights}];
fit = FindFit[bindata, PDF[BetaDistribution[a, b], x], {a, b}, {x}];
dist = BetaDistribution[a, b] /. fit;
Plot[PDF[dist, x], {x, 0, 1},
Epilog -> {ColorData[1][2], Point[bindata]},
PlotLabel -> Column[(fit /. Rule -> Equal)]]

--Mark

Ray Koopman

unread,

Nov 10, 2009, 5:57:07 AM11/10/09

to

17 bins? That's a strange number! Were there 3 empty bins at the
start? In any case, this will get maximum likelihood estimates of
{a,b}. To keep them positive, we estimate their logs.

k = Length[f = { 0,0,0, 8, 36, 74, 96, 123, 152, 201, 269, 415, 460,
444, 579, 711, 731, 602, 364, 151}]
xx = N@Partition[Range[0,k]/k,2,1];
NMaximize[f.Log[BetaRegularized[Sequence@@#,E^ta,E^tb]&/@xx],{ta,tb}]
{a->E^ta, b->E^tb}/.%[[2]]

20

{-13831.2, {ta -> 1.53959, tb -> 0.673633}}

{a -> 4.66267, b -> 1.96135}