Getting a simple predict from OLS: something different from .6 to .8


Dartdog

Jan 24, 2018, 4:01:39 PM
to pystatsmodels
I have an OLS model that worked with statsmodels 0.6 but no longer works in 0.8.
I just don't get what I need to feed to the predict method.
My model creation looks like this:

    import statsmodels.api as sm

    def fit_line2(x, y):
        """Fit y on x by OLS with an intercept and return the fitted results."""
        # Add a column of ones to allow the calculation of the intercept
        X = sm.add_constant(x, prepend=True)
        ols_test = sm.OLS(y, X, missing='drop').fit()
        return ols_test

That works fine: I get a model out and can see the summary.
In SM .6 I got the prediction one period ahead by passing my latest value (the one I want to project forward from) to predict.
The predict call is:

     yrahead=ols_test.predict(ols_input)
   
ols_input is:

     lastqu
     2018-12-31 13209.0  
     type:
     <class 'pandas.core.frame.DataFrame'>

It is created with:

     ols_input=(sm.add_constant(merged2.lastqu[-1:], prepend=True))

The predict call gives me an error:
  ValueError: shapes (1,1) and (2,) not aligned: 1 (dim 1) != 2 (dim 0)

I tried simply feeding the number by changing ols_input to:

    13209.0
    Type: 
    <class 'numpy.float64'>
That gave me the same error:
  ValueError: shapes (1,1) and (2,) not aligned: 1 (dim 1) != 2 (dim 0)

I'm not sure where to go from here.

The base table looks like:
                   Units   lastqu  Uperchg  lqperchg
2000-12-31  19391.000000      NaN      NaN       NaN
2001-12-31  35068.000000   5925.0    80.85       NaN
2002-12-31  39279.000000   8063.0    12.01     36.08
2003-12-31  47517.000000   9473.0    20.97     17.49
2004-12-31  51439.000000  11226.0     8.25     18.51
2005-12-31  59674.000000  11667.0    16.01      3.93
2006-12-31  58664.000000  14016.0    -1.69     20.13
2007-12-31  55698.000000  13186.0    -5.06     -5.92
2008-12-31  42235.000000  11343.0   -24.17    -13.98
2009-12-31  40478.333333   7867.0    -4.16    -30.64
2010-12-31  38721.666667   8114.0    -4.34      3.14
2011-12-31  36965.000000   8361.0    -4.54      3.04
2012-12-31  39132.000000   8608.0     5.86      2.95
2013-12-31  43160.000000   9016.0    10.29      4.74
2014-12-31  44520.000000   9785.0     3.15      8.53
2015-12-31  49966.000000  10351.0    12.23      5.78
2016-12-31  53752.000000  10884.0     7.58      5.15
2017-12-31  57571.000000  12109.0     7.10     11.26
2018-12-31           NaN  13209.0      NaN      9.08

So I'm regressing Units on lastqu to project Units for 2018.

I freely confess to not really understanding why SM .6 worked the way it did, but it did!


josef...@gmail.com

Jan 27, 2018, 9:52:28 AM
to pystatsmodels
There is a bug in add_constant in that it cannot handle scalar values (anymore). 
AFAIR, pandas support in add_constant changed in 0.8

I don't get an exception with master, but it doesn't add a constant.
It looks like the has_constant check is currently always true if there is only one row.
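
To illustrate why a single row can look like an existing constant, here is a minimal NumPy sketch. It is not the statsmodels implementation; `looks_constant` is a hypothetical helper that assumes the check flags any column that never varies and is non-zero. With only one row, every non-zero column passes such a check.

```python
import numpy as np

def looks_constant(x):
    """Flag columns that never vary and are non-zero (illustrative check only)."""
    x = np.atleast_2d(np.asarray(x, dtype=float))
    return (np.ptp(x, axis=0) == 0) & np.all(x != 0, axis=0)

several_rows = np.array([[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]])
one_row = several_rows[2:3]          # shape (1, 2), values [[4., 5.]]

print(looks_constant(several_rows))  # [False False]
print(looks_constant(one_row))       # [ True  True]
```

Under this assumption, a one-row input is indistinguishable from a frame that already carries a constant column, which would explain why no constant is added.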


>>> df = pd.DataFrame(np.arange(6).reshape(3,2))
>>> df
   0  1
0  0  1
1  2  3
2  4  5
>>> add_constant(df.iloc[2], prepend=True)
   const  2
0    1.0  4
1    1.0  5
>>> add_constant(df.iloc[2, 1], prepend=True)
array(5)
>>> df.iloc[2, 1]
5


Forcing a constant raises an exception:

>>> add_constant(df.iloc[2, 1], prepend=True, has_constant='add')
Traceback (most recent call last):
  File "<pyshell#21>", line 1, in <module>
    add_constant(df.iloc[2, 1], prepend=True, has_constant='add')
  File "m:\...\statsmodels\tools\tools.py", line 287, in add_constant
    x = [np.ones(x.shape[0]), x]
IndexError: tuple index out of range


There is an ambiguity about whether a Series is a row or a column.
(It looks like a Series is interpreted as a one-column DataFrame.)

>>> add_constant(df.iloc[2:3], prepend=True)
   0  1
2  4  5
>>> add_constant(df.iloc[2:3, 1], prepend=True)
   1
2  5
>>> df.iloc[2:3, 1:]
   1
2  5


As a workaround, make sure that the object passed to add_constant is 2-D:


>>> add_constant(df.iloc[2:3, 1:], prepend=True, has_constant='add')
   const  1
2    1.0  5
>>> df.iloc[2:3, 1:].shape
(1, 1)

Or slice after adding the constant:

>>> add_constant(df[1], prepend=True)[2:]
   const  1
2    1.0  5
>>> add_constant(df[1], prepend=True).iloc[2:]
   const  1
2    1.0  5 

Josef

Dartdog

Jan 27, 2018, 11:57:20 AM
to pystatsmodels
Wow, thanks so much, no way I could have figured that out!!
Is it a bug I should file with pandas? (It seems so.)
Confirm and I will.
Thanks again
Tom

Dartdog

Jan 27, 2018, 12:15:31 PM
to pystatsmodels
Or are you saying that you have it fixed in SM master?

josef...@gmail.com

Jan 27, 2018, 12:28:13 PM
to pystatsmodels
It's a bug/limitation in add_constant, pandas behaves as expected.

It's not fixed in master, it just has a different problem.
The exception that you got seems to be fixed but add_constant still doesn't work for the scalar case.

I guess the scalar case is not what we thought about for add_constant, because it's the same as a simple list `[1, value]` except for the additional pandas index.

Josef
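
For readers following along, the alignment error from the first post can be reproduced with plain NumPy. This is a sketch under the assumption that a linear prediction is the dot product of the input row with the fitted coefficients; the coefficient values below are made up for illustration and do not come from the thread's model.

```python
import numpy as np

# Hypothetical coefficients [intercept, slope]; a real fit would supply these
params = np.array([2.0, 3.0])

# A single regressor value without the constant column: shape (1, 1)
exog_without_const = np.array([[10.0]])
message = ""
try:
    np.dot(exog_without_const, params)
except ValueError as err:
    message = str(err)
    print(message)

# With the constant prepended, the row has two columns and aligns with params
exog_with_const = np.array([[1.0, 10.0]])
prediction = np.dot(exog_with_const, params)
print(prediction)  # [32.]
```

The row passed for prediction needs one entry per estimated coefficient, in the same order as the columns used for fitting, which is why `[1, value]` works when the constant was prepended.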

josef...@gmail.com

Jan 27, 2018, 12:34:02 PM
to pystatsmodels
On Sat, Jan 27, 2018 at 12:28 PM, <josef...@gmail.com> wrote:
It's a bug/limitation in add_constant, pandas behaves as expected.

It could be that pandas changed the behavior at some point in the past, but the way it is now corresponds to standard numpy behavior.

Josef

Dartdog

Jan 28, 2018, 3:01:00 PM
to pystatsmodels
Sorry to be dense, but I can't seem to create the 2-D array when adapting your examples to my DataFrame (merged2; see the table at the end of my first post). I'm trying to feed the "lastqu" value from the last row into the predict function to predict Units.
So:

    ols_input=(sm.add_constant(merged2.iloc[-1:,1:2], prepend=True)[2:])
gives me:

    print (ols_input)
    Empty DataFrame 
    Columns: [lastqu] 
    Index: []
While:

    ols_input=(sm.add_constant(merged2.iloc[-1:,1:2], prepend=True))
Gives:

    print (ols_input)
    lastqu
    2018-12-31 13209.0

I can't get an ols_input with the proper shape (I've tried a bunch of variants that I have not included), so I'm still stuck. I'm not sure what the "proper" contents of the array should be, so I can't figure out how to construct it some other way.

Dartdog

Jan 28, 2018, 3:09:02 PM
to pystatsmodels
The second example here shows a shape of (1,1), but rather obviously does not show a constant: print (merged2.iloc[-1:,1:2].shape) gives (1,1).

josef...@gmail.com

Jan 28, 2018, 3:17:35 PM
to pystatsmodels
On Sun, Jan 28, 2018 at 3:09 PM, Dartdog <tombr...@gmail.com> wrote:
The second example here shows a shape of (1,1) but rather obviously does not show a constant? print (merged2.iloc[-1:,1:2].shape) =(1,1)


On Sunday, January 28, 2018 at 2:01:00 PM UTC-6, Dartdog wrote:
Sorry to be dense but I cannot seem to create the 2d array trying to adapt your examples to my DF (merged2) see 1st post last entry, so I'm trying to feed the "lastqu" col value from the last row into the predict function to predict Units..
so :

    ols_input=(sm.add_constant(merged2.iloc[-1:,1:2], prepend=True)[2:])
gives me:

    print (ols_input)
    Empty DataFrame 
    Columns: [lastqu] 
    Index: []
While:

   ols_input=(sm.add_constant(merged2.iloc[-1:,1:2], prepend=True))


Note: you need to add `has_constant='add'` as a keyword option to force adding a constant even if there is already a column with constant values. With only one row, every column trivially counts as constant, because the current behavior does not treat a single row as a special case.

Josef

Dartdog

Jan 28, 2018, 3:20:54 PM
to pystatsmodels
Not pretty but this gets the job done it seems?

    ols_input=np.array([1,merged2.lastqu[-1:].values])

josef...@gmail.com

Jan 28, 2018, 3:29:10 PM
to pystatsmodels
On Sun, Jan 28, 2018 at 3:20 PM, Dartdog <tombr...@gmail.com> wrote:
Not pretty but this gets the job done it seems?

    ols_input=np.array([1,merged2.lastqu[-1:].values])

Yes, that's much more direct and faster than letting a general function handle the special case (which it currently doesn't).

There should be an asarray in predict so that even a list like  ols_input = [1,merged2.lastqu[-1:].values] should work.

Josef
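
A slightly tidier variant of the direct approach, offered as a sketch rather than as the thread's own solution: taking the last value with `.iloc[-1]` yields a scalar instead of a length-1 array, so the input can be built as a plain one-row, two-column array. Newer NumPy versions can raise an error when a list mixes a scalar with an array, which this form avoids. The two-element series below is a stand-in for `merged2.lastqu`.

```python
import numpy as np
import pandas as pd

# Stand-in for merged2.lastqu, using the last two values from the posted table
lastqu = pd.Series([12109.0, 13209.0],
                   index=pd.to_datetime(["2017-12-31", "2018-12-31"]),
                   name="lastqu")

# .iloc[-1] returns a scalar, whereas [-1:].values returns a length-1 array
latest = lastqu.iloc[-1]

# One row, two columns: the constant first (matching prepend=True), then lastqu
ols_input = np.array([[1.0, latest]])
print(ols_input.shape)  # (1, 2)
```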