I thought a fairly simple regression question? using a DF value as predictor?

Dartdog

Jan 23, 2014, 6:23:33 PM
to pystat...@googlegroups.com
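Also asked on SO:
http://stackoverflow.com/questions/21319255/pandas-statsmodels-ols-regression-prediction-using-df-predictor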

So now I have:

import statsmodels.api as sm

def fit_line2(x, y):
    """Return the fitted OLS results for the best-fit line of y on x."""
    # Add a column of ones so that the intercept is estimated
    X = sm.add_constant(x, prepend=True)
    model = sm.OLS(y, X, missing='drop').fit()
    return model

And:

model = fit_line2(merged2[:-1].lastqu, merged2[:-1].Units)
print(model.summary())

But I cannot get

yrahead2=model.predict(merged2.lastqu[-1:]) 

or any variant to give me a prediction. Note that pd.ols uses the same merged2.lastqu[-1:] to grab the data I want to "predict" from. No matter what I put into the () for predict, I'm not having any joy. It seems statsmodels wants something specific in the () other than a pandas DataFrame cell. I even tried just putting a number there, e.g. 2696, but still nothing. My current error is:

----> 3 yrahead2=model.predict(merged2.lastqu[-1:])

/usr/lib/pymodules/python2.7/statsmodels/base/model.pyc in predict(self, exog, transform, *args, **kwargs)
   1004             exog = np.atleast_2d(exog) # needed in count model shape[1]
   1005 
-> 1006         return self.model.predict(self.params, exog, *args, **kwargs)
   1007 
   1008 

/usr/lib/pymodules/python2.7/statsmodels/regression/linear_model.pyc in predict(self, params, exog)
    253         if exog is None:
    254             exog = self.exog
--> 255         return np.dot(exog, params)
    256 
    257 class GLS(RegressionModel):

ValueError: objects are not aligned

> /usr/lib/pymodules/python2.7/statsmodels/regression/linear_model.py(255)predict()
    254             exog = self.exog
--> 255         return np.dot(exog, params)
    256 

josef...@gmail.com

Jan 23, 2014, 7:06:37 PM
to pystatsmodels
On Thu, Jan 23, 2014 at 6:23 PM, Dartdog <tombr...@gmail.com> wrote:
> On SO.
> http://stackoverflow.com/questions/21319255/pandas-statsmodels-ols-regression-prediction-using-df-predictor
>
> But I cannot get
>
> yrahead2=model.predict(merged2.lastqu[-1:])

try `.values` or `asarray`

yrahead2=model.predict(merged2.lastqu[-1:].values)

there might be a missing array conversion if pandas dataframes are
used without formulas. (bug)

There were also some changes to whether exog in predict is required to
be 2d or not, but I think that might be only in master for OLS.

Josef

josef...@gmail.com

Jan 23, 2014, 7:16:25 PM
to pystatsmodels
On Thu, Jan 23, 2014 at 7:06 PM, <josef...@gmail.com> wrote:
> There were also some changes to whether exog in predict is required to
> be 2d or not, but I think that might be only in master for OLS.

From what I can see, in the 0.5 release there is no np.asarray(exog)
in predict, but it is in current master.

If a formula is present, then the code goes through patsy, which
converts the DataFrame (or Series?) to an ndarray (already in 0.5).

Josef

josef...@gmail.com

Jan 23, 2014, 7:37:14 PM
to pystatsmodels
On Thu, Jan 23, 2014 at 6:23 PM, Dartdog <tombr...@gmail.com> wrote:
> ValueError: objects are not aligned
>
> /usr/lib/pymodules/python2.7/statsmodels/regression/linear_model.py(255)predict()
> 254 exog = self.exog
> --> 255 return np.dot(exog, params)
> 256

I get `matrices` in my numpy version:
ValueError: matrices are not aligned


Sorry I didn't read carefully.

Your `merged2.lastqu[-1:]` doesn't contain the constant. Try:

yrahead2=model.predict(sm.add_constant(merged2.lastqu[-1:], prepend=True))

or add the constant to the dataframe in the same way as for the X in the model.



Josef

Dartdog

Jan 23, 2014, 7:37:49 PM
to pystat...@googlegroups.com
I'm not sure what syntax to use for asarray. I'll try it if you tell me how (I did try a few things, but no luck).
I'm using statsmodels '0.6.0.dev-Unknown' (via overnight Ubuntu updates).
I tried yrahead2=model.predict(merged2.lastqu[-1:].values) and got:
ValueError                                Traceback (most recent call last)
<ipython-input-130-2610639fcd9a> in <module>()
      1 #projqu=merged2.lastqu[-1:].get_value(0)
      2 #print model.params
----> 3 yrahead2=model.predict(merged2.lastqu[-1:].values)
      4 model


/usr/lib/pymodules/python2.7/statsmodels/base/model.pyc in predict(self, exog, transform, *args, **kwargs)
   1004             exog = np.atleast_2d(exog) # needed in count model shape[1]
   1005 
-> 1006         return self.model.predict(self.params, exog, *args, **kwargs)
   1007 
   1008 

/usr/lib/pymodules/python2.7/statsmodels/regression/linear_model.pyc in predict(self, params, exog)
    253         if exog is None:
    254             exog = self.exog
--> 255         return np.dot(exog, params)
    256 
    257 class GLS(RegressionModel):

ValueError: objects are not aligned

> /usr/lib/pymodules/python2.7/statsmodels/regression/linear_model.py(255)predict()
    254             exog = self.exog
--> 255         return np.dot(exog, params)
    256 

Dartdog

Jan 23, 2014, 7:40:45 PM
to pystat...@googlegroups.com
Yahoo! That did it, thank you so much!

josef...@gmail.com

Jan 23, 2014, 7:53:00 PM
to pystatsmodels
If you just want the predicted values for the data (or a subset of
it), then you could also use `fittedvalues` directly.
Given that matplotlib interpolates, the plot with `fittedvalues` will
look the same as the one on your grid.
`fittedvalues` is a pandas.Series if you use pandas for X and y.

Josef

Dartdog

Jan 23, 2014, 8:07:09 PM
to pystat...@googlegroups.com
Is there a better place in the docs to see this stuff?

josef...@gmail.com

Jan 23, 2014, 8:45:34 PM
to pystatsmodels
On Thu, Jan 23, 2014 at 8:07 PM, Dartdog <tombr...@gmail.com> wrote:
> is there a better place in the docs to see this stuff?

The class documentation lists the available attributes and methods:
http://statsmodels.sourceforge.net/devel/generated/statsmodels.regression.linear_model.OLSResults.html

Some docstrings are not very explicit or extensive, and help with
improving those would be very welcome:
http://statsmodels.sourceforge.net/devel/generated/statsmodels.regression.linear_model.OLSResults.predict.html

Also, in most Results classes predict is inherited from the top-level
class and behaves the same way in all/most models (with model-specific
extra keywords, but no model/results-specific docstring).

Usage examples are mostly found only in the example notebooks:
http://statsmodels.sourceforge.net/devel/examples/notebooks/generated/predict.html
(without backlinks to individual sections)


Josef

Dartdog

Jan 23, 2014, 9:19:58 PM
to pystat...@googlegroups.com
FWIW I never could have figured out this option/syntax you gave me from the docs or examples!
yrahead2=model.predict(sm.add_constant(merged2.lastqu[-1:], prepend=True)) 

Thanks again.

josef...@gmail.com

Jan 23, 2014, 10:11:54 PM
to pystatsmodels
On Thu, Jan 23, 2014 at 9:19 PM, Dartdog <tombr...@gmail.com> wrote:
> FWIW I never could have figured out this option/syntax you gave me from the
> docs or examples!
> yrahead2=model.predict(sm.add_constant(merged2.lastqu[-1:], prepend=True))

It's obvious :) that's why it's not explained. (*)

predict needs an `exog` x that has the same structure (number of
columns after reshaping) as the original `exog` X used in the
estimation. (because we want to predict the same relationship between
a y_predicted and an x as we did for the Y and X that were used in the
estimation.)

If you use formulas, then patsy is doing the conversion from original
data(frame) to the `exog` X both in the model.__init__ and in predict
(and in t_test) as long as the variable names are available in the
dataframe/dictionary.
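
A small sketch of the formula route, assuming invented data and the `statsmodels.formula.api` interface (the DataFrame contents are made up; the column names mirror the thread). With a formula, the new data only needs a column with the matching name, and the intercept is handled for you.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Invented data that follows Units = 3 + 2 * lastqu exactly.
df = pd.DataFrame({
    "lastqu": [1.0, 2.0, 3.0, 4.0],
    "Units": [5.0, 7.0, 9.0, 11.0],
})
results = smf.ols("Units ~ lastqu", data=df).fit()

# No add_constant needed: the formula machinery rebuilds the design
# matrix, including the intercept, from the column name in new_data.
new_data = pd.DataFrame({"lastqu": [5.0]})
prediction = np.asarray(results.predict(new_data))

print(prediction)
```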

If you create X yourself, you also need to create the matching x in
predict yourself, since the model doesn't know what your `exog` are or
how they were created.
If you make any transformation to your original data (like
add_constant), then you need to make the **same** transformation of
your original data for the x in predict.
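
To make the column-matching rule concrete, here is a NumPy-only sketch of the `np.dot(exog, params)` step seen in the traceback. The parameter values are invented, and the intercept-first ordering assumes the constant was prepended as in the thread.

```python
import numpy as np

# Hypothetical fitted parameters: [intercept, slope].
params = np.array([100.0, 4.0])

# Raw new observation: one column only, the constant is missing.
x_raw = np.array([[2696.0]])                             # shape (1, 1)

# Same transformation as in estimation: prepend a column of ones.
x_full = np.column_stack([np.ones(len(x_raw)), x_raw])   # shape (1, 2)

try:
    np.dot(x_raw, params)       # one column against two parameters
    mismatch_error = None
except ValueError as exc:
    mismatch_error = exc        # the "not aligned" error from the thread

prediction = np.dot(x_full, params)   # intercept + slope * x

print(mismatch_error is not None)
print(prediction)
```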

I hope that helps to understand the pattern.


(*) it's often difficult to tell what's "obvious" and what's not.
I find myself often staring at a problem for a long time, and then it
becomes "obvious", and later I forget that it took me a while to
figure out why or how it is obvious (and that it might not be obvious
to everyone).

Josef