How do I run Nadaraya-Watson kernel regression?


David Montgomery

Mar 7, 2014, 5:48:46 AM
to pystat...@googlegroups.com
Hi,

I am using version 5.

How do I run Nadaraya-Watson kernel regression?

Here is xo, the point I am at: array([ 1.66172,  1.66167,  1.66179,  1.66167,  1.66176])


Here are my K nearest neighbors to xo.

        x_0          x_1          x_2         x_3         x_4 
0   1.66070  1.66076  1.66134  1.66133  1.66175
1   1.66170  1.66123  1.66115  1.66152  1.66175
2   1.66185  1.66196  1.66171  1.66145  1.66178
3   1.66152  1.66175  1.66188  1.66186  1.66173
4   1.66209  1.66181  1.66172  1.66167  1.66179
5   1.66189  1.66193  1.66209  1.66181  1.66172
6   1.66214  1.66208  1.66185  1.66191  1.66180
7   1.66178  1.66189  1.66193  1.66209  1.66181
8   1.66142  1.66150  1.66185  1.66196  1.66171
9   1.66133  1.66175  1.66118  1.66112  1.66170
10  1.66208  1.66185  1.66191  1.66180  1.66170
11  1.66185  1.66191  1.66180  1.66170  1.66183
12  1.66193  1.66209  1.66181  1.66172  1.66167
13  1.66181  1.66172  1.66167  1.66179  1.66167
14  1.66095  1.66116  1.66142  1.66150  1.66185
15  1.66164  1.66192  1.66214  1.66208  1.66185
16  1.66115  1.66152  1.66175  1.66188  1.66186
17  1.66175  1.66188  1.66186  1.66173  1.66165
18  1.66188  1.66186  1.66173  1.66165  1.66164
19  1.66123  1.66115  1.66152  1.66175  1.66188



kernel_regression = statsmodels.nonparametric.kernel_regression
kr = kernel_regression.KernelReg([embedding[-1]],X,'c')

Or is X a list of arrays? E.g., [[1.66070, 1.66076, 1.66134, 1.66133, 1.66175], [1.66070, 1.66076, 1.66134, 1.66133, 1.66175]]
I tried that too. It doesn't work.

Traceback (most recent call last):
  File "/home/ubuntu/workspace/chaos/forecast.py", line 168, in <module>
    forecast = get_forecast(local_df,X_cols)
  File "/home/ubuntu/workspace/chaos/forecast.py", line 132, in get_forecast
    kr = kernel_regression.KernelReg([embedding[-1]],X,'c')
  File "/usr/local/lib/python2.7/dist-packages/statsmodels/nonparametric/kernel_regression.py", line 100, in __init__
    self.exog = _adjust_shape(exog, self.k_vars)
  File "/usr/local/lib/python2.7/dist-packages/statsmodels/nonparametric/_kernel_base.py", line 443, in _adjust_shape
    dat = np.reshape(dat, (nobs, k_vars))
  File "/usr/local/lib/python2.7/dist-packages/numpy/core/fromnumeric.py", line 172, in reshape
    return reshape(newshape, order=order)
ValueError: total size of new array must be unchanged





The only thing that worked was the call below, with a single row of X:

kr = kernel_regression.KernelReg([1.66172, 1.66167, 1.66179, 1.66167, 1.66176], [1.66070, 1.66076, 1.66134, 1.66133, 1.66175], 'c')
KernelReg instance
Number of variables: k_vars = 1
Number of samples:   N = 5
Variable types:      c
BW selection method: cv_ls
Estimator type: ll













josef...@gmail.com

Mar 7, 2014, 7:33:38 AM
to pystatsmodels
Kernel regression also needs an array of y and an array of x, like
the other models: y and x must have the same shape[0], and y is 1-dimensional.

The model is y = f(x) + u, where f is the unknown, nonparametrically
estimated function.
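
For readers who want to see what the estimator does, here is a minimal pure-Python sketch of the Nadaraya-Watson (local-constant) estimate of f at a single point x0: a kernel-weighted average of the observed y. The Gaussian product kernel, the fixed bandwidths, and the function names are assumptions made for illustration; this is not how statsmodels implements it.

```python
import math


def gaussian_kernel(u):
    """Standard normal density, used here as the smoothing kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def nadaraya_watson(x0, X, y, bw):
    """Local-constant estimate of f(x0).

    x0 : sequence of k values (the query point)
    X  : sequence of n rows, each with k values (the regressors)
    y  : sequence of n values (the dependent variable, 1-dimensional)
    bw : sequence of k positive bandwidths (chosen by hand here)
    """
    weights = []
    for row in X:
        w = 1.0
        # Product kernel: one Gaussian factor per regressor.
        for x0_j, x_ij, h_j in zip(x0, row, bw):
            w *= gaussian_kernel((x0_j - x_ij) / h_j)
        weights.append(w)
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all kernel weights are zero; try a larger bandwidth")
    # Weighted average of y, with weights given by closeness to x0.
    return sum(w * y_i for w, y_i in zip(weights, y)) / total


# Toy data, invented for illustration: y equals the first regressor.
X_toy = [[0.0, 0.0], [2.0, 0.0]]
y_toy = [0.0, 2.0]
estimate = nadaraya_watson([1.0, 0.0], X_toy, y_toy, bw=[1.0, 1.0])
```

Note that y has one entry per row of X, which is the shape requirement described above.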

It seems to me that you are using one x point as the dependent variable y.

There are some examples inside the statsmodels source tree that I used
to try it out,

for example
https://github.com/statsmodels/statsmodels/blob/master/statsmodels/examples/ex_kernel_regression_dgp.py
https://github.com/statsmodels/statsmodels/blob/master/statsmodels/examples/ex_kernel_regression2.py

Josef


Anastasia Sokolova

Dec 8, 2019, 4:50:25 PM
to pystatsmodels
Hi, I think I'm 5 years late with this answer, but maybe it will be useful for others. I faced the same problem as you: Python failed with the error "cannot reshape 9805 to size (1961,1)" when I wanted to pass 5 variables with the following code:


KernelReg(X_train_new_1['goal1'], X_train_new_1[['field1','field12','field14','field16','field25']], 'c') 

where y = X_train_new_1['goal1']
X = X_train_new_1[['field1','field12','field14','field16','field25']]
y.shape = 1961 (observations)
X.shape = (1961, 5) (5 variables of 1961 obs)

I was surprised to find that the problem was in the third positional argument, var_type. Because I wanted to pass 5 variables, I had to give var_type as a list with one type per variable, for example:

KernelReg(X_train_new_1['goal1'], X_train_new_1[['field1','field12','field14','field16','field25']], var_type = ['c', 'c', 'c', 'c', 'c']) 
or
KernelReg(X_train_new_1['goal1'], X_train_new_1[['field1','field12','field14','field16','field25']], var_type = ['c', 'u', 'c', 'o', 'c']) 

I think the main problem in using statsmodels is poor documentation
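
A small pre-flight check along these lines can catch the mismatch before KernelReg raises the reshape error. This is only a sketch: check_kernelreg_inputs is a hypothetical helper, not part of statsmodels, and it assumes X can be indexed row by row (a list of lists or a 2-D NumPy array, not a DataFrame). It returns the type codes joined into one string, which is the form the statsmodels documentation describes for var_type.

```python
def check_kernelreg_inputs(y, X, var_type):
    """Hypothetical helper (not part of statsmodels).

    Checks that y has one value per row of X and that var_type has
    one code per column of X, then returns the codes as one string.
    """
    n_obs = len(X)
    n_vars = len(X[0])
    if len(y) != n_obs:
        raise ValueError(
            "y has %d values but X has %d rows" % (len(y), n_obs))
    codes = "".join(var_type)  # accepts 'ccc' as well as ['c', 'c', 'c']
    if len(codes) != n_vars:
        raise ValueError(
            "var_type has %d codes but X has %d columns" % (len(codes), n_vars))
    unknown = set(codes) - {"c", "u", "o"}  # continuous, unordered, ordered
    if unknown:
        raise ValueError("unknown var_type codes: %s" % sorted(unknown))
    return codes


# Toy data, invented for illustration: 3 observations of 2 variables.
y_demo = [1.0, 2.0, 3.0]
X_demo = [[0.1, 1], [0.2, 0], [0.3, 1]]
codes_demo = check_kernelreg_inputs(y_demo, X_demo, ["c", "u"])
```

Passing a single 'c' for a five-column X, as in the calls that failed in this thread, would be rejected by this check with a message naming both counts.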




On Friday, March 7, 2014 at 14:48:46 UTC+4, David Montgomery wrote:

kalyan dasgupta

May 5, 2020, 9:14:18 AM
to pystatsmodels
Hi,

Thanks a lot for the reply. It did help. A slight change is required. You have to convert the list to a string. For example,
var = ['c', 'c', 'c', 'c', 'c']
var1 = ''.join(var)  # 'ccccc'

KernelReg(Y, X, var_type=var1)

Thanks a lot. I was struggling with this. You made it happen. :)
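
Putting the replies in this thread together, a complete call might look like the sketch below. The data are synthetic and invented for illustration, and the names X, y and x0 are placeholders. One detail worth noting: the summary printed earlier in the thread shows "Estimator type: ll" (local linear), which is the default; as far as I know, reg_type='lc' (local constant) is the option that corresponds to Nadaraya-Watson.

```python
import numpy as np
from statsmodels.nonparametric.kernel_regression import KernelReg

# Synthetic data, for illustration only.
rng = np.random.RandomState(0)
n_obs = 60
X = rng.uniform(-1.0, 1.0, size=(n_obs, 2))        # exog: shape (n_obs, k_vars)
y = (np.sin(2.0 * X[:, 0]) + X[:, 1] ** 2
     + 0.1 * rng.normal(size=n_obs))               # endog: shape (n_obs,)

# One type code per column of X, joined into a single string.
# reg_type='lc' requests the local-constant estimator; the bandwidth is
# left at the default least-squares cross-validation ('cv_ls').
kr = KernelReg(endog=y, exog=X, var_type="cc", reg_type="lc")

# Predict at one new point; data_predict has shape (n_points, k_vars).
x0 = np.array([[0.2, -0.3]])
y_hat, mfx = kr.fit(data_predict=x0)               # fitted mean, marginal effects
```

Calling kr.fit() with no argument returns the fitted values at the training points instead.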