I am very new to Mathematica and I am failing in trying to do
something very simple.
I am have the following data:
x = {8, 36, 74, 96, 123, 152, 201, 269, 415, 460, 444, 579, 711, 731,
602, 364, 151};
Which is a curve, similar to PDF function of Beta distribution:
ListLinePlot[x]
And I am trying to fit it to:
PDF[BetaDistribution[\[Alpha], \[Beta]], x
finding Alpha and Beta values.
I will appreciate it somebody can give me a hint how to do this
properly using FindFit or
any other means. I am too embarrassed to post my modest attempts here,
but trust
me, I've spent few hours trying before posting :)
Vadim
There are a few ways to go astray, and I imagine I replicated some of
them. Here is something that seems to work. Rescale along the x axis so
that those values are evenly spaced between zero and one. Now you can
either also rescale on the y axis (e.g. divide each value by the total),
or else simply make that scaling a part of the fit parametrization. I do
the latter below.
data = {8, 36, 74, 96, 123, 152, 201, 269, 415, 460, 444, 579, 711,
731, 602, 364, 151};
In[86]:= bfit =
FindFit[Transpose[{(Range[Length[data]])/(Length[data] + 1), data}],
n*PDF[BetaDistribution[a, b], x], {a, b, n}, x]
Out[86]= {a -> 4.69588, b -> 2.34725, n -> 291.736}
To test this, you might do
Show[{ListPlot[
Transpose[{(Range[Length[data]])/(Length[data] + 1), data}]] /.
bfit, Plot[n*PDF[BetaDistribution[a, b], x] /. bfit, {x, 0, 1}]}]
One other thing I tried was to augment data by zeros at each end (and have
x values starting at zero and ending at one). This did not work, at least
not with the code I used. Another thing that failed was to let FindFit
find scaling along the x axis. Hence the explicit scaling used above.
Daniel Lichtblau
Wolfram Research
You haven't said what your 'x' represents or how it is supposed to
relate to the Beta distribution. If we take the values as the counts
in equal-width bins that cover [0,1], then we can get the sample mean
and variance using the midpoints of the bins as the x-values, and use
the method of moments to estimate 'a' and 'b', the parameters of the
distribution.
k = Length[f = {8, 36, 74, 96, 123, 152, 201, 269, 415, 460, 444,
579, 711, 731, 602, 364, 151}]
x = Range[1/2,k]/k; {n = Tr@f, N[m = f.x/n], N[v = f.x^2/n - m^2]}
Solve[{m == a/(a+b), v == m(1-m)/(1+a+b)}, {a,b}]//Flatten//N
17
{5416, 0.653065, 0.0391941}
{a -> 3.12214, b -> 1.65861}
Borrowing from both Ray and Dan, I interpret the data as bin counts
and use FindFit:
Clear[a, b, x];
bincounts = {8, 36, 74, 96, 123, 152, 201, 269, 415, 460, 444, 579,
711, 731, 602, 364, 151};
nbins = Length[bincounts];
bincenters = Range[1/2, nbins]/nbins;
binheights = Normalize[bincounts, Total[#]/nbins &];
bindata = Transpose[{bincenters, binheights}];
fit = FindFit[bindata, PDF[BetaDistribution[a, b], x], {a, b}, {x}];
dist = BetaDistribution[a, b] /. fit;
Plot[PDF[dist, x], {x, 0, 1},
Epilog -> {ColorData[1][2], Point[bindata]},
PlotLabel -> Column[(fit /. Rule -> Equal)]]
--Mark
17 bins? That's a strange number! Were there 3 empty bins at the
start? In any case, this will get maximum likelihood estimates of
{a,b}. To keep them positive, we estimate their logs.
k = Length[f = { 0,0,0, 8, 36, 74, 96, 123, 152, 201, 269, 415, 460,
444, 579, 711, 731, 602, 364, 151}]
xx = N@Partition[Range[0,k]/k,2,1];
NMaximize[f.Log[BetaRegularized[Sequence@@#,E^ta,E^tb]&/@xx],{ta,tb}]
{a->E^ta, b->E^tb}/.%[[2]]
20
{-13831.2, {ta -> 1.53959, tb -> 0.673633}}
{a -> 4.66267, b -> 1.96135}