Using KDEUnivariate


Stefano Messina

Apr 22, 2014, 9:27:57 AM
to pystat...@googlegroups.com
I'm using the KDEUnivariate class to fit a PDF to my data.

Unfortunately, I cannot find a way to pass the support along with the densities to the fit method.
As a consequence, the support of the fitted PDF is centered around zero,
which is not what I expect given the range of the data (see attachment).

Any hint will be appreciated.

Stefano
test.jpeg

Skipper Seabold

Apr 22, 2014, 10:11:57 AM
to pystat...@googlegroups.com
How did you create this plot? Did you use the support attribute of the
fitted KDEUnivariate object? For example:

ax.plot(density.support, density.density)

Sourceforge appears to be down right now, but there are examples here:

https://github.com/statsmodels/statsmodels/blob/master/examples/notebooks/kernel_density.ipynb
https://github.com/statsmodels/statsmodels/blob/master/examples/python/kernel_density.py

Skipper

Stefano Messina

Apr 22, 2014, 10:38:06 AM
to pystat...@googlegroups.com
Indeed, I used the support attribute of the class.

Stefano

Skipper Seabold

Apr 22, 2014, 10:50:59 AM
to pystat...@googlegroups.com
Can you post code to replicate this behavior? I'm unable to do so.
This works as expected for me.

import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm

y = np.random.normal(10, 3, size=500)  # sample centered at 10
kde = sm.nonparametric.KDEUnivariate(y)
kde.fit()
plt.plot(kde.support, kde.density)

The plot is centered around 10, as I'd expect.

baba

Apr 22, 2014, 4:31:40 PM
to pystat...@googlegroups.com
For this purpose you'd need to load the data I'm using, so I've attached a text file.

In [1]: data = np.loadtxt('out.txt', usecols=(1,))

In [2]: data = np.log10(data)

In [3]: entries, edges = np.histogram(data, 80, normed=True)

In [4]: kde = sm.nonparametric.KDEUnivariate(entries)

In [5]: kde.fit()

In [6]: pyplot.plot(kde.support, kde.density)

The plot is not where I'd expect it to be: all values are in the range 16 to 18.5, with the peak at roughly 17.
out.txt

Skipper Seabold

Apr 22, 2014, 4:35:16 PM
to pystat...@googlegroups.com
Well, you're taking the KDE of entries, not data. You should pass data
to KDEUnivariate, not the normed bin values from the histogram.

Padarn Wilson

Apr 22, 2014, 8:30:32 PM
to pystat...@googlegroups.com


I think this is almost certainly the problem. However, if you really want to evaluate the estimated density over a different support, you can use the kde.evaluate() method to evaluate it at whatever points you like.

Skipper Seabold

Apr 22, 2014, 8:38:30 PM
to pystat...@googlegroups.com
Though in this case, the density evaluated at points in the original data's
support will be essentially zero, since the true support of the data lies
well outside the 0,1 support of the normed histogram values.

Skipper

Padarn Wilson

Apr 22, 2014, 8:42:10 PM
to pystat...@googlegroups.com
True, it seemed like a bad idea. I was just pointing out that it is possible to manually choose the support if you so desire.

baba

Apr 23, 2014, 3:18:47 AM
to pystat...@googlegroups.com
Thanks a lot!
I had thought that a PDF estimate should be performed on the normalized distribution.

Stefano