Confidence and Prediction Intervals for Multiple Linear Regression (with picture)


palm...@gmail.com

Jan 23, 2018, 4:12:38 PM
to pystatsmodels
Hi there,

I recently saw another person ask about this same topic in this forum.
But the code samples provided there were either simple linear regression models (only one independent variable) or nonlinear models.

So I would like to see some very simple code showing how to fit a basic multiple linear regression model and compute its confidence and prediction intervals using statsmodels.

I also have an image that illustrates the regression curve, confidence and prediction interval, see below:
[Inline image: regression curve with shaded confidence and prediction interval bands]

This plot was created in R with the ggplot2 library. I think it is a very nice plot: it uses an area fill to show the confidence and prediction intervals, which gives a very nice effect.

Just curious whether anybody knows which plotting library in Python can do this type of plot.

I have read that plotnine is very close to ggplot2:

http://pltn.ca/plotnine-superior-python-ggplot/

OK, hope someone can help.

Regards,


josef...@gmail.com

Jan 28, 2018, 8:38:43 PM
to pystatsmodels

I don't know which library is the best or most convenient for this. I usually just use plain matplotlib with fill_between.
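As a minimal sketch of the fill_between approach: the band half-widths below are made-up constants used only to show the plotting calls, not values computed from a model. Real bounds would come from the fitted regression.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Illustrative inputs only: a straight fitted line with constant-width bands.
x = np.linspace(0, 10, 50)
mean = 2.0 + 0.5 * x
ci_half = 0.3  # hypothetical half-width of the confidence band
pi_half = 1.0  # hypothetical half-width of the prediction band

fig, ax = plt.subplots()
# Draw the wider band first so the narrower one is visible on top of it.
ax.fill_between(x, mean - pi_half, mean + pi_half, alpha=0.2,
                label="prediction interval")
ax.fill_between(x, mean - ci_half, mean + ci_half, alpha=0.4,
                label="confidence interval")
ax.plot(x, mean, label="fitted mean")
ax.legend()
```

Calling plt.show() or fig.savefig(...) afterwards displays or saves the figure; the alpha values control how strongly the two shaded areas overlap visually.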

Note that in multiple regression, a plot like this with respect to one explanatory variable requires a decision about what to do with the other explanatory variables. In general, the predicted value depends on all variables and won't be a nice line unless all the other variables are fixed at some value.

If each observation is plotted at its actual values, the predicted mean will jump around as the values of the other variables change.

This notebook illustrates how to predict the response as a function of one explanatory variable while keeping all the other ones at their mean (last plot in the notebook), based on the example in the other prediction thread.
The scatter points in the last plot are predicted values plus residuals, which is similar to one of our regression_plots, but those don't have confidence intervals.

(Warning: quickly written without proofreading)


Josef

palm...@gmail.com

Jan 29, 2018, 12:59:30 PM
to pystatsmodels
Hi Josef,

I really really appreciate your great help with providing the sample code!

I discovered that Matplotlib has an area fill just a day after I posted here.
Thanks for providing all this detail, it will really help me a lot!

I will try things out based on your code.

My model is usually of the form y = a0 + a1 * x_1 + a2 * x_2. Occasionally I have up to 3 independent variables, but rarely, and sometimes my model is just the simple linear case.

But I usually don't plot : y versus x_1 or y versus x_2.
I usually plot : y versus time.

Both x_1 and x_2 are implicit functions of time; let's say we have some quantity like cost or energy recorded every month.
Plotting y versus time gives a 2-D plot on which confidence and prediction intervals can be shown.

I will get back to you once I try things out.

Thanks once more!

josef...@gmail.com

Jan 29, 2018, 1:09:58 PM
to pystatsmodels
On Mon, Jan 29, 2018 at 12:59 PM, <palm...@gmail.com> wrote:
But I usually don't plot : y versus x_1 or y versus x_2.
I usually plot : y versus time.

That's similar to the first case, with `range` on the x-axis as in the original example.

In the past I didn't get nice plots when plotting the original observations, because the predicted mean was jumping too much: there was no "smoothness" in the explanatory variables.

So which version looks nice depends on the actual data.


Josef

palm...@gmail.com

Jan 29, 2018, 1:35:42 PM
to pystatsmodels
Hi Josef,

For my data, when y (the regression values, let's call it the average y) is plotted as a function of time, it is very sinusoidal, i.e. it follows a "seasonal" trend.
So my plots always turn out fairly smooth.
I just need to calculate the confidence interval around the average y value.

I have plotted this in Excel for the simple linear case (one independent variable), but could not find much on how to do it in the multiple linear case.
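For reference, the standard textbook intervals for ordinary least squares with several regressors can be written as follows. The notation is an assumption for this sketch: X is the n-by-p design matrix including the column of ones, x_0 = (1, x_1, x_2) is the row at which the interval is wanted, \hat{a} is the vector of fitted coefficients (a0, a1, a2), and p is the number of coefficients including the intercept.

```latex
\hat{y}_0 = x_0^\top \hat{a}, \qquad
s^2 = \frac{1}{n - p} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2

% Confidence interval for the mean response at x_0
\hat{y}_0 \pm t_{1-\alpha/2,\; n-p} \; s \sqrt{x_0^\top (X^\top X)^{-1} x_0}

% Prediction interval for a new observation at x_0
\hat{y}_0 \pm t_{1-\alpha/2,\; n-p} \; s \sqrt{1 + x_0^\top (X^\top X)^{-1} x_0}
```

The only difference between the two is the extra 1 under the square root, which accounts for the noise of a single new observation; with one regressor these reduce to the familiar simple linear regression formulas.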



So it's great to see code that can handle the multiple linear case.

So my data columns in the multi-linear case would be:
item# | y-regression | x_1 | x_2 | month
1     | 32,500       | 345 | 52  | march-2015
2     | 44,425       | 256 | 104 | april-2015
3     | 65,340       | 202 | 137 | may-2015
etc.
So I would plot, let's say in Excel, y-regression versus month.

As you can see, the x_1 and x_2 quantities change every month.

If y-regression represents cost or something similar, I would want to know the upper and lower confidence limits of the regression y-values, and also have prediction intervals to see the range in which future data points would fall with, say, 95% confidence.

So I also don't mind jaggedness, as long as I get upper and lower limits on the regression y-values.
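A sketch of how such limits could be obtained for a table laid out like the one above, assuming the data sit in a pandas DataFrame. The training data below are synthetic (three rows are too few to fit three coefficients), and only the x_1 and x_2 values of the new rows are reused from the table. The column name y and the generating coefficients are illustrative assumptions.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic monthly history, assumed only for illustration.
rng = np.random.default_rng(2)
k = np.arange(36)
df = pd.DataFrame({
    "month": pd.date_range("2015-03-01", periods=36, freq="MS"),
    "x_1": 300 + 80 * np.sin(2 * np.pi * k / 12) + rng.normal(0, 10, 36),
    "x_2": 100 + 40 * np.cos(2 * np.pi * k / 12) + rng.normal(0, 5, 36),
})
df["y"] = 5000 + 60 * df["x_1"] + 150 * df["x_2"] + rng.normal(0, 2000, 36)

# The formula adds the intercept a0 automatically.
res = smf.ols("y ~ x_1 + x_2", data=df).fit()

# New rows at which limits are wanted (x values taken from the table above).
new_rows = pd.DataFrame({"x_1": [345.0, 256.0, 202.0],
                         "x_2": [52.0, 104.0, 137.0]})
limits = res.get_prediction(new_rows).summary_frame(alpha=0.05)

# mean_ci_*: limits for the regression (average) y value
# obs_ci_*:  limits within which a new data point is expected to fall
print(limits[["mean", "mean_ci_lower", "mean_ci_upper",
              "obs_ci_lower", "obs_ci_upper"]])
```

Changing alpha changes the coverage level, e.g. alpha=0.10 for 90% intervals.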

Regards,

P.