Is it possible to calculate 'kde' without plotting it?


Paul Blelloch

May 20, 2013, 3:21:33 PM
to pyd...@googlegroups.com
I find that the 'kde' plotting capability takes a very long time because the 'kde' calculation is expensive.  I'd like to be able to save the answers for later plotting.  Is it possible to do that, or do I need to recalculate it every time I want to plot it?

Miki Tebeka

May 20, 2013, 11:50:41 PM
to pyd...@googlegroups.com
There are two implementations of KDE that I'm aware of - http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.gaussian_kde.html and http://statsmodels.sourceforge.net/devel/generated/statsmodels.nonparametric.kde.KDE.html. You can use either to pre-compute and plot when needed. 
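
A minimal sketch of the pre-compute-then-plot idea with scipy.stats.gaussian_kde. The sample data, the 200-point grid, and the cache file name are placeholders chosen for illustration, not anything from the original question:

```python
import os
import tempfile

import numpy as np
from scipy.stats import gaussian_kde

# Placeholder data standing in for the real measurements.
rng = np.random.default_rng(0)
data = rng.normal(size=5000)

# Fit once, then evaluate on a grid whose size you control.
kde = gaussian_kde(data)
grid = np.linspace(data.min(), data.max(), 200)
density = kde(grid)

# Save the evaluated curve so it never has to be recomputed.
cache_path = os.path.join(tempfile.gettempdir(), "kde_cache.npz")
np.savez(cache_path, grid=grid, density=density)

# Later (possibly in another session): reload and plot.
cached = np.load(cache_path)
cached_grid, cached_density = cached["grid"], cached["density"]
# e.g. matplotlib.pyplot.plot(cached_grid, cached_density)
```

Only the evaluated curve is stored here, so plotting later costs nothing beyond reading the file.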

Paul Blelloch

May 22, 2013, 11:54:32 AM
to pyd...@googlegroups.com
Thank you for the reply; I just saw it. I had looked at the plotting code in pandas, saw that it uses scipy.stats.gaussian_kde, and started using that directly. The drawback is that I have to pull the data out of a pandas object and put the result back in. A kde method that operated on a DataFrame or Series and returned another would be nice. I wrote a very short function that does that, though it's so short that it's barely worth thinking about. The nicest thing about the function, from my perspective, is that I can control the number of points at which the KDE is evaluated, while the plotting capability in pandas is hardwired to 1000. I think that is why it was taking so long in my case; I get a perfectly good estimate with about 100-200 points.
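
For readers who want something similar, here is a rough sketch of a Series-in, Series-out helper. It is not the function described above (which was not posted); the name series_kde and the num_points and pad parameters are hypothetical:

```python
import numpy as np
import pandas as pd
from scipy.stats import gaussian_kde


def series_kde(series, num_points=200, pad=0.5):
    """Return the Gaussian KDE of `series` as a new Series indexed by the grid.

    `num_points` sets how many grid points are evaluated; `pad` extends the
    grid beyond the data range by that fraction of the range on each side.
    Both are illustrative choices, not pandas API.
    """
    values = series.dropna().to_numpy(dtype=float)
    span = values.max() - values.min()
    grid = np.linspace(values.min() - pad * span,
                       values.max() + pad * span,
                       num_points)
    density = gaussian_kde(values)(grid)
    return pd.Series(density, index=pd.Index(grid, name="x"), name=series.name)


# Placeholder data for illustration.
sample = pd.Series(np.random.default_rng(1).normal(size=2000), name="sample")
estimate = series_kde(sample, num_points=150)
# estimate.plot() would draw the curve; estimate can also be pickled or saved.
```

Because the result is an ordinary Series, it can be stored with the rest of the data and plotted later without re-running the estimate.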



Skipper Seabold

May 22, 2013, 12:07:29 PM
to pyd...@googlegroups.com

FWIW, for univariate KDE the statsmodels implementation using FFT should be orders of magnitude faster. Some more examples are here


Skipper

Paul Blelloch

May 22, 2013, 12:15:59 PM
to pyd...@googlegroups.com
That's excellent information. The kde plotting capability in pandas can be extremely slow, so it'll be really nice to have something faster.

THANKS!!



Paul Blelloch

May 22, 2013, 1:07:55 PM
to pyd...@googlegroups.com
Has there been some major reorganization of statsmodels?  I can't get any of the KDE examples that I've found online to work.  The error is always that the method that I'm trying (KDEUnivariate or kdensityfft) doesn't exist, and the examples seem to be inconsistent in exactly how these methods are imported.  Do you have an example of an FFT KDE that works with statsmodels 0.4.3?


 

josef...@gmail.com

May 22, 2013, 1:16:01 PM
to pyd...@googlegroups.com

The test suite is a good place to check what was supposed to work with 0.4.3:
https://github.com/statsmodels/statsmodels/blob/v0.4.3/statsmodels/nonparametric/tests/test_kde.py#L5

Upgrading to a development version would help. A problem with our
annual releases is that the documentation and examples are ahead of
the release (non-)schedule.

Josef


Skipper Seabold

May 22, 2013, 1:16:34 PM
to pyd...@googlegroups.com

Yes, KDEUnivariate doesn't exist in 0.4.3. You should be able to replace it with KDE everywhere and have it work. We now have both univariate and multivariate KDE, hence the new distinction. I highly recommend upgrading to current master if at all possible.
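
A sketch of an import that tolerates both names, followed by an FFT-based fit. The fallback to KDE on 0.4.3 follows the advice above but is untested against that release; the sample data and the gridsize value are placeholders:

```python
import numpy as np

try:
    # Name used in newer statsmodels releases.
    from statsmodels.nonparametric.kde import KDEUnivariate
except ImportError:
    # Assumption: on 0.4.3 the same class is available under the name KDE.
    from statsmodels.nonparametric.kde import KDE as KDEUnivariate

# Placeholder data for illustration.
data = np.random.RandomState(2).normal(size=10000)

kde = KDEUnivariate(data)
# fft=True selects the FFT-based estimator (Gaussian kernel only);
# gridsize controls how many points the density is evaluated on.
kde.fit(fft=True, gridsize=256)

# The evaluated curve: x values and the density at each of them.
support, density = kde.support, kde.density
```

The support and density arrays can be saved and plotted later, just like the output of gaussian_kde.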

What are you doing that you get an error for kdensityfft? This should exist in 0.4.3.

Skipper