artificial datasets, examples


josef...@gmail.com

Jan 4, 2013, 7:03:59 PM
to pystatsmodels
Except for ARMA (and some old sandbox code) we don't have any
functions to create artificial examples for various models.

Where should we put those?

Case in point:

I was reading up a bit on non-parametric function estimation (for
PR#562), and there are some simulated functions that are often used to
check different estimators.
I wanted sometimes to create a collection, but always ended up just
including some in the example scripts.

I guess putting them in a central location is not so useful, but a
common name for the modules in the different subdirectories would be.

Josef

Ralf Gommers

Jan 5, 2013, 1:06:17 PM
to pystat...@googlegroups.com
On Sat, Jan 5, 2013 at 1:03 AM, <josef...@gmail.com> wrote:
> Except for ARMA (and some old sandbox code) we don't have any
> functions to create artificial examples for various models.
>
> Where should we put those?
>
> Case in point:
>
> I was reading up a bit on non-parametric function estimation (for
> PR#562), and there are some simulated functions that are often used to
> check different estimators.

I seem to remember Skipper actually had a bunch of those implemented already, just can't find the email anymore.

> I wanted sometimes to create a collection, but always ended up just
> including some in the example scripts.
>
> I guess putting them in a central location is not so useful, but a
> common name for the modules in the different subdirectories would be.

If they're really common, including them in the module may make sense. Perhaps hide all the functions themselves, but expose them in a single dict (with the same name across modules) per module?

Ralf


josef...@gmail.com

Jan 5, 2013, 9:54:20 PM
to pystat...@googlegroups.com
We might need more options if we want to make it useful, and add the
simulation of x/exog and noise to it.

So I ended up with a function and a class as first draft

https://github.com/josef-pkt/statsmodels/commit/1fd15ce691a804aecec2ba3bd9364cf23b9dbaf4

The Fan/Gijbels (1992) examples are from a paper on variable bandwidth
kernel regression, so a fixed bandwidth estimator won't do so well on
them.
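
For readers without the commit at hand, the function part of such a draft could look roughly like the sketch below: it draws x/exog, evaluates a known mean function and adds noise. This is not the code in the linked commit. `simulate_sample`, its parameters and the mean function `m` are assumptions made up for illustration, and `m` is not one of the Fan/Gijbels functions.

```python
import numpy as np


def simulate_sample(func, nobs=200, x_low=-2.0, x_high=2.0,
                    noise_scale=0.5, seed=None):
    """Return sorted x, the true mean func(x), and noisy observations y."""
    rng = np.random.default_rng(seed)
    x = np.sort(rng.uniform(x_low, x_high, size=nobs))    # simulated exog
    y_true = func(x)                                      # noise-free mean
    y = y_true + noise_scale * rng.standard_normal(nobs)  # additive Gaussian noise
    return x, y_true, y


def m(x):
    """Made-up mean function: smooth trend with a sharp local feature."""
    return np.sin(2.0 * x) + np.exp(-10.0 * x**2)


x, y_true, y = simulate_sample(m, nobs=100, seed=0)
```

Returning `y_true` alongside `y` makes it easy to plot an estimator against the known mean function.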

Josef

fg1eu.png

josef...@gmail.com

Jan 6, 2013, 9:42:07 AM
to pystat...@googlegroups.com
example results and advertising:
Kernel Regression with least squares cross-validation bandwidth choice
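
For reference, the idea behind least squares cross-validation can be sketched in plain NumPy: pick the bandwidth that minimizes the mean squared leave-one-out prediction error. This is a simplified illustration, assuming a local constant (Nadaraya-Watson) estimator, a Gaussian kernel and a grid search. It is not the statsmodels implementation, and the helper names and the simulated data are made up.

```python
import numpy as np


def nw_fit(x_eval, x, y, bw):
    """Nadaraya-Watson (local constant) estimate with a Gaussian kernel."""
    u = (x_eval[:, None] - x[None, :]) / bw
    w = np.exp(-0.5 * u**2)
    return (w @ y) / w.sum(axis=1)


def loo_cv_score(x, y, bw):
    """Mean squared leave-one-out prediction error for bandwidth bw."""
    u = (x[:, None] - x[None, :]) / bw
    w = np.exp(-0.5 * u**2)
    np.fill_diagonal(w, 0.0)  # each point is left out of its own fit
    pred = (w @ y) / w.sum(axis=1)
    return np.mean((y - pred)**2)


# Simulated data with a known mean function (illustrative values).
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(-2, 2, size=150))
y = np.sin(2 * x) + 0.3 * rng.standard_normal(150)

# Grid search over fixed bandwidths; keep the one with the lowest score.
bandwidths = np.linspace(0.05, 1.0, 40)
scores = np.array([loo_cv_score(x, y, h) for h in bandwidths])
bw_cv = bandwidths[np.argmin(scores)]
fitted = nw_fit(x, x, y, bw_cv)
```

A single bandwidth chosen this way is a compromise over the whole range of x, which is why it struggles on functions with both flat and sharply curved regions.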

Josef

kernelreggrid.png

Ralf Gommers

Jan 7, 2013, 4:07:54 PM
to pystat...@googlegroups.com
On Sun, Jan 6, 2013 at 3:42 PM, <josef...@gmail.com> wrote:
On Sat, Jan 5, 2013 at 9:54 PM,  <josef...@gmail.com> wrote:
> On Sat, Jan 5, 2013 at 1:06 PM, Ralf Gommers <ralf.g...@gmail.com> wrote:
>>
>>
>>
>> On Sat, Jan 5, 2013 at 1:03 AM, <josef...@gmail.com> wrote:
>>>
>>> Except for ARMA (and some old sandbox code) we don't have any
>>> functions to create artificial examples for various models.
>>>
>>> Where should we put those?
>>>
>>> Case in point:
>>>
>>> I was reading up a bit on non-parametric function estimation (for
>>> PR#562), and there are some simulated functions that are often used to
>>> check different estimators.
>>
>>
>> I seem to remember Skipper actually had a bunch of those implemented
>> already, just can't find the email anymore.
>>
>>> I wanted sometimes to create a collection, but always ended up just
>>> including some in the example scripts.
>>>
>>> I guess putting them in a central location is not so useful, but a
>>> common name for the modules in the different subdirectories would be.
>>>
>> If they're really common, including them in the module may make sense.
>> Perhaps hide all the functions themselves, but expose them in a single dict
>> (with the same name across modules) per module?
>
> We might need more options if we want to make it useful, and add the
> simulation of x/exog and noise to it.
>
> So I ended up with a function and a class as first draft

That's still an example file, just placed in the nonparametric module. You don't want to add all those functions and classes to the nonparametric api, do you?

I think a single dict of functions would still make sense. A class for adding various types of noise and plotting can probably be made generic enough that it can be shared across modules.

> The Fan/Gijbels (1992) examples are from a paper on variable bandwidth
> kernel regression, so a fixed bandwidth estimator won't do so well on
> them.

> example results and advertising:
> Kernel Regression with least squares cross-validation bandwidth choice

That looks good. The left top figure also shows the limitations; sometimes you just need adaptive bandwidths.

Ralf

josef...@gmail.com

Jan 7, 2013, 4:18:33 PM
to pystat...@googlegroups.com
I moved the "if __main__ ..." part to an example file. Now the module
only contains functions and classes.
No, I wouldn't add any of those to any api, except maybe some more
generic functions or classes, like the one that generates samples for ARMA.

I mentioned them in the rst doc but didn't want to add them all to a
main toc index.

>
> I think a single dict of functions would still make sense. A class for
> adding various types of noise and plotting can probably be made generic
> enough that it can be shared across modules.

I started to make the base class reasonably generic (interface and
extensions still unclear), but the advantage of explicit example
classes with default values is that we immediately get "published"
test cases.
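
A rough sketch of that split, assuming a generic base class that handles the sampling and one explicit example subclass that pins the design and noise level as defaults. The class names and default values here are hypothetical and are not the ones in my branch.

```python
import numpy as np


class UnivariateExampleBase:
    """Generic part: draw x, evaluate the mean function, add noise."""

    def __init__(self, nobs=200, x_low=0.0, x_high=1.0,
                 noise_scale=1.0, seed=None):
        rng = np.random.default_rng(seed)
        self.x = np.sort(rng.uniform(x_low, x_high, size=nobs))
        self.y_true = self.func(self.x)
        self.y = self.y_true + noise_scale * rng.standard_normal(nobs)

    def func(self, x):
        raise NotImplementedError("subclasses define the mean function")


class SineBumpExample(UnivariateExampleBase):
    """Explicit example: the fixed defaults double as a named test case."""

    def __init__(self, nobs=200, seed=None):
        super().__init__(nobs=nobs, x_low=-2.0, x_high=2.0,
                         noise_scale=0.5, seed=seed)

    def func(self, x):
        # Made-up mean function, for illustration only.
        return np.sin(2.0 * x) + np.exp(-10.0 * x**2)


dgp = SineBumpExample(nobs=50, seed=123)
```

Because the subclass fixes everything except the sample size and the seed, anyone who refers to the example by name gets the same data-generating process.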

>
>> > The Fan/Gijbels (1992) examples are from a paper on variable bandwidth
>> > kernel regression, so a fixed bandwidth estimator won't do so well on
>> > them.
>>
>> example results and advertising:
>> Kernel Regression with least squares cross-validation bandwidth choice
>
>
> That looks good. The left top figure also shows the limitations; sometimes
> you just need adaptive bandwidths.

Getting nice plots is pretty useful for seeing how well this works in
different cases.

(adaptive bandwidths would be a nice enhancement)

Josef
