merging nonparametric and others


josef...@gmail.com

Jan 2, 2013, 4:36:34 PM
to pystatsmodels
I would like to have some of the pending pull requests merged within
the next two weeks or so.

nonparametric:
This is George's GSoC work, with a lot of additional work by Ralf.
The main parts are under statsmodels.nonparametric; additional parts
(that don't have enough test coverage) are in
sandbox.nonparametric.

There are still a few variable names that are too short to make the
code easy to understand, but overall this looks good. In large parts
it is similar to the R package "np".
It needs exposure so we can find where the interface is awkward or too
different from other parts of statsmodels, and see whether it works
in all (or most) cases that users can throw at it.

Following the recent discussion on what KDE.fit() should return, one
change that we might have to make to some of its classes is to
change what fit returns. For example, I guess we should create a
results class for KernelReg and similar classes.

What we still need to do is add the new nonparametric parts to the
documentation, and provide more example files for the different
classes.

I started to play with some examples for KernelReg to see how it works.

My suggestion: Merge it now, and clean up or refactor parts in new
pull requests if or when we find anything to change.

Josef

Ralf Gommers

Jan 2, 2013, 4:42:20 PM
to pystat...@googlegroups.com
Let me first fix the things you commented on, and the hang that I can't yet reproduce. Otherwise, +1 on merging soon.

Ralf

josef...@gmail.com

Jan 2, 2013, 6:00:43 PM
to pystat...@googlegroups.com
Don't try to run with joblib on Windows. My computer has been frozen
for half an hour, and I hope it will come back without a hard restart
(losing a month of open windows).

Josef



josef...@gmail.com

Jan 2, 2013, 9:13:41 PM
to pystat...@googlegroups.com
It didn't come back: hard restart, log files are gone :(

There is something awfully wrong with the joblib part.
It keeps creating new processes, most likely recursively.
After about 500 created processes, I was fast enough with Ctrl-C to
kill the new processes. And after I managed to stop the creation of new
processes, I'm left with 40 dead processes.

Josef



josef...@gmail.com

Jan 3, 2013, 8:00:55 AM
to pystat...@googlegroups.com
"User error": I had forgotten to protect the script with
``if __name__ == '__main__':``.
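
For reference, a minimal sketch of that guard. It uses the standard
library's multiprocessing instead of joblib so that it has no extra
dependencies; the names ``square`` and ``run_parallel`` are made up for
illustration and are not part of joblib or statsmodels. On Windows each
child process re-imports the main script, so any code that starts
workers has to sit behind the guard, otherwise every child tries to
start workers of its own.

```python
import multiprocessing as mp


def square(x):
    # Worker function: defined at module level so that child
    # processes can find it after re-importing this script.
    return x * x


def run_parallel(values, processes=2):
    # Hypothetical helper: starts a pool of worker processes
    # and maps the worker function over the input values.
    with mp.Pool(processes=processes) as pool:
        return pool.map(square, values)


if __name__ == '__main__':
    # Only the process that runs the script directly gets here.
    # Children that re-import the module skip this block, so they
    # do not start pools of their own.
    print(run_parallel([1, 2, 3]))
```

The same pattern should apply to a script that calls joblib or an
estimator that uses it internally: keep the call that triggers the
parallel work inside the guarded block.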

After upgrading to joblib 0.7, an exception is raised instead of
recursively creating new processes. (I can't find the thread where Gael
discussed and introduced this.)

As an aside: running the original script in IDLE didn't cause the excess
process creation, because IDLE always runs in a different '__main__'
(or something like that).

Using defaults=sm.nonparametric.EstimatorSettings(efficient=True)
works fine now,
except that the bandwidth is 3 times higher with efficient=True than with
efficient=False.

Josef

