Fitting a beta distribution between 0 and 1

Emil J K

Sep 10, 2016, 4:12:07 PM

to pystatsmodels

Hi everyone!
I am trying to fit a beta distribution, which should be defined between 0 and 1, to a data set that only has samples in a subrange. My problem is that the fit() function returns a fitted PDF that is defined only between my smallest and largest values. For instance, if my dataset has samples between 0.2 and 0.3, what I get is a PDF defined between 0.2 and 0.3, instead of between 0 and 1, as it should be. The code I am using is:

ps1 = beta.fit(selected, loc=0, scale=1)

Am I missing any parameter? Thank you!

josef...@gmail.com

Sep 10, 2016, 4:27:57 PM

to pystatsmodels

I assume the question and beta refer to the scipy.stats distribution.
The best place for this kind of question is Stack Overflow with the
scipy tag, which will usually get you an answer fast.

Specifically:
the scipy.stats distribution fit methods use loc and scale as starting
values. If you want to fix them and not estimate them, then you
have to prefix the names with an "f" for fixed, i.e.

ps1 = beta.fit(selected, floc=0, fscale=1)
should fit the shape parameters without changing the support of the distribution.

Constraining parameters to specific values in fit also works for shape
parameters, prefixed with an f.
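
As a minimal sketch of the above, assuming a recent scipy and numpy: the sample array below is synthetic and only stands in for `selected`, and the seed, sample size, and shape values are made up for illustration.

```python
import numpy as np
from scipy import stats

# Hypothetical stand-in for `selected`: beta-distributed samples that
# occupy only a narrow subrange of (0, 1).
rng = np.random.default_rng(0)
selected = rng.beta(50, 150, size=500)

# floc/fscale hold loc and scale fixed, so the support stays [0, 1]
# and only the two shape parameters a and b are estimated.
a, b, loc, scale = stats.beta.fit(selected, floc=0, fscale=1)
print(a, b, loc, scale)

# A shape parameter can be held fixed in the same way; f0 refers to the
# first shape parameter (a for beta). The value 50.0 is arbitrary here.
a_fixed, b_fit, _, _ = stats.beta.fit(selected, f0=50.0, floc=0, fscale=1)
print(a_fixed, b_fit)
```

One caveat, as far as I know: with loc and scale fixed at 0 and 1, the data have to lie strictly inside (0, 1), so samples at exactly 0 or 1 would make this fit fail.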

Josef

Emil J K

Sep 11, 2016, 7:44:26 AM

to pystatsmodels

That did the trick, thank you! I had asked the question on Stack Overflow too, but thought this would be a better place :P Anyway, I had come up with a partial solution that was working well for me, which was to replicate my samples (for the datasets that were too small) and add dummy samples at 0 and 1. Although that increased the error, it was low enough for my purpose.

josef...@gmail.com

Sep 11, 2016, 11:37:30 AM

to pystatsmodels

On Sun, Sep 11, 2016 at 7:44 AM, Emil J K <emilk...@gmail.com> wrote:
> That did the trick, thank you! I had asked the question in stackoverflow
> too, but thought this would be a better place :P Anyway, I had come with a
> partial solution that was working well for me, which was to replicate my
> samples (for the datasets that were too small) and add dummy samples at 0
> and 1. Although that was increasing error, it was low enough for my purpose.

Note:
`statsmodels` and `scipy` are two different packages, and there is no
`beta` in statsmodels.

The Stack Overflow tag for `scipy` has a large number of watchers who
can answer questions, but they will not see a question if it doesn't
have the correct tag.

Josef
