KDEUnivariate


kai

Jun 1, 2017, 3:33:02 PM
to pystatsmodels
Hi,

I am using statsmodels 0.8.0 master. I created samples from 0 to 2499, in steps of 1, in a text file, and then I run this script:


import numpy as np
import matplotlib.pyplot as plt
from statsmodels.nonparametric.kde import KDEUnivariate

# sample_file_name points to a text file with the values 0..2499
samples = np.genfromtxt(sample_file_name)
xS = np.linspace(samples.min(), samples.max(), 251)

# fit a KDE with the Scott rule-of-thumb bandwidth and a triangular kernel
kde = KDEUnivariate(samples)
kde.fit(bw="scott", kernel="tri", fft=False)
pdf1 = kde.evaluate(xS)

# plot the estimated density
lw1 = 2
fs1 = 18
fig = plt.figure(figsize=(8, 6))
ax = fig.add_axes([0.10, 0.10, 0.85, 0.85])
l1 = plt.plot(xS, pdf1, linewidth=lw1, label="optimal")

==========================================

If I use kernel="gau", it is fine, but not with "uni", "tri", etc. I saw some discussion on GitHub where it was said this had been solved in 0.8.0, but I don't know why the same error appears as in 0.6:

Traceback (most recent call last):
  File "plot_pdf.py", line 73, in <module>
    pdf1=kde.evaluate(xS)
  File "/usr/local/lib/python2.7/dist-packages/statsmodels-0.8.0-py2.7-linux-x86_64.egg/statsmodels/nonparametric/kde.py", line 265, in evaluate
    return self.kernel.density(self.endog, point)
  File "/usr/local/lib/python2.7/dist-packages/statsmodels-0.8.0-py2.7-linux-x86_64.egg/statsmodels/sandbox/nonparametric/kernels.py", line 194, in density
    xs = self.in_domain( xs, xs, x )[0]
  File "/usr/local/lib/python2.7/dist-packages/statsmodels-0.8.0-py2.7-linux-x86_64.egg/statsmodels/sandbox/nonparametric/kernels.py", line 178, in in_domain
    filtered = lfilter(isInDomain, lzip(xs, ys))
  File "/usr/local/lib/python2.7/dist-packages/statsmodels-0.8.0-py2.7-linux-x86_64.egg/statsmodels/sandbox/nonparametric/kernels.py", line 173, in isInDomain
    return u >= self.domain[0] and u <= self.domain[1]
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
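
From the traceback, it looks like isInDomain applies a scalar comparison to the whole array I passed to evaluate. A minimal illustration of that failure (my own snippet, not the statsmodels code):

import numpy as np

u = np.array([0.1, 0.5, 2.0])
# a chained comparison with `and` on an array raises the same error:
u >= -1 and u <= 1
# ValueError: The truth value of an array with more than one element is ambiguous.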

Can anyone help please?

Regards,
Kai.


josef...@gmail.com

Jun 1, 2017, 4:35:41 PM
to pystatsmodels
I was trying to debug this but didn't find anything wrong.

However, `evaluate` says the argument is a "point"; for the non-fft
version you have to loop over all the points yourself:

pdf1 = np.asarray([kde.evaluate(xi) for xi in xS])
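
As a quick sanity check (a sketch, assuming the `samples` and `xS` from your script), the estimated density should integrate to roughly 1 over the grid, slightly below because the grid stops at the sample min and max:

print(np.trapz(pdf1, xS))  # should come out close to 1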

I tried a few examples and it works; the "scott" bandwidth looks like
it is undersmoothing a bit.

Josef


josef...@gmail.com

Jun 2, 2017, 3:25:33 PM
to pystatsmodels
I came across this issue by chance:
https://github.com/statsmodels/statsmodels/issues/1239

The part about looping inside evaluate got lost, AFAICS.
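
In the meantime, a small wrapper does the looping outside evaluate (my own sketch, not part of the statsmodels API):

import numpy as np

def evaluate_grid(kde, points):
    # loop over the evaluation points, since KDEUnivariate.evaluate
    # expects a single point for the non-FFT kernels
    return np.asarray([kde.evaluate(p) for p in np.atleast_1d(points)])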

Josef



zkzkzk...@gmail.com

Jun 2, 2017, 6:51:38 PM
to josef...@gmail.com, pystatsmodels

Thanks, Josef. It is at least working now. I hadn't realized I should use pdf1 = np.asarray([kde.evaluate(xi) for xi in xS]), though it is undersmoothed not only for "scott" but for all the other methods as well.

 

Kai.

josef...@gmail.com

Jun 2, 2017, 9:13:41 PM
to pystatsmodels, zkzkzk...@gmail.com
On Fri, Jun 2, 2017 at 5:51 PM, <zkzkzk...@gmail.com> wrote:
> Thanks, Josef. It is at least working now. I hadn't realized I should use
> pdf1 = np.asarray([kde.evaluate(xi) for xi in xS]), though it is
> undersmoothed not only for "scott" but for all the other methods as well.

I got a much better looking and smoother KDE by doubling the bandwidth.

kde = KDEUnivariate(list(samples))
# first fit to get the Scott rule-of-thumb bandwidth,
# then refit with twice that bandwidth
kde.fit(bw="scott", kernel="tri", fft=False)
kde.fit(bw=kde.bw * 2, kernel="tri", fft=False)
pdf1 = np.asarray([kde.evaluate(xi) for xi in xS])

I recomputed the constants for the triangular kernel that are used in
the computation of the normal reference rule of thumb. The constants
look correct, but I don't remember how the bandwidth is computed from
the constants.
Even for a simple normally distributed sample it looks undersmoothed.
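
For reference, the normal reference rule has the form bw = C * A * n**(-1/5); a sketch of the Gaussian-kernel version as I understand it (the triangular kernel would rescale the constant, which is where the numbers above come in):

import numpy as np

def normal_reference_bw(x, c=1.059):
    # bw = c * A * n**(-1/5), with A = min(std, IQR / 1.349)
    # c ~ 1.059 is the Gaussian-kernel constant; other kernels
    # rescale it with their own canonical constant
    x = np.asarray(x, dtype=float)
    n = x.size
    iqr = np.subtract(*np.percentile(x, [75, 25]))
    a = min(x.std(ddof=1), iqr / 1.349)
    return c * a * n ** (-1.0 / 5.0)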

Josef

kai

Jun 3, 2017, 7:02:56 AM
to josef...@gmail.com, pystatsmodels
OK, thanks.


Kai.