Automatic Forecasting

Chad Fulton

March 3, 2018, 23:08:47

To: Statsmodels Mailing List

I am glad to see interest in automatic forecasting for GSOC 2018! I thought I'd write a brief description of where we are and what I think we'd like to see integrated. I'm not an expert on this topic, though, so hopefully other people will reply with things they'd like to see. Also students should feel free to put other features in their proposal.

References
--------------

The basic reference is Hyndman and Khandakar (2008), which can be found at https://www.jstatsoft.org/article/view/v027i03/v27i03.pdf

This is implemented in the R `forecast` package in the function auto.arima (https://www.rdocumentation.org/packages/forecast/versions/8.1/topics/auto.arima). We cannot use / translate code from the forecast package (including this function) because it has an incompatible license. However, we can look at the signature and description on this link.

EViews implements Hyndman and Khandakar (2008) and describes its process here: http://www.eviews.com/help/helpintro.html#page/content/series-Automatic_ARIMA_Forecasting.html

Hyndman's `forecast` package also supports automatic selection of exponential smoothing models through the function `ets`; see e.g. https://www.rdocumentation.org/packages/forecast/versions/8.1/topics/ets and also https://otexts.org/fpp2/estimation-and-model-selection.html

Models in Statsmodels
----------------------------

There are primarily three types of models from which we'll want to consider forecasts:

- SARIMAX

- Unobserved components (UC)

- Exponential smoothing (ES)

I anticipate that GSOC proposals will probably use these models and will not construct new models, but that doesn't have to be the case if a student has something particular in mind.

Within SARIMAX / UC models information criteria (IC) can be used to select a model, and within ES models IC can be used, but IC cannot be used to select between e.g. SARIMAX and ES. See for example https://otexts.org/fpp2/arima-ets.html.

However, we can produce a comparison using out-of-sample forecasting exercises (e.g. estimate the parameters on a subset of the data and then compare on MSE of h-step ahead forecasts on the remaining data). This is what Hyndman refers to as time series cross validation, see https://www.otexts.org/fpp/2/5 and https://robjhyndman.com/hyndsight/tscv/.
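The out-of-sample comparison described above can be sketched in a few lines. This is only an illustration under stated assumptions: the two forecasters (`naive_forecast`, `ses_forecast`), the fixed `alpha`, and the data are hypothetical stand-ins; in practice each forecaster would re-fit a SARIMAX, UC, or ES model on the training window.

```python
from statistics import mean


def naive_forecast(history, horizon):
    """Repeat the last observed value for every step ahead."""
    return [history[-1]] * horizon


def ses_forecast(history, horizon, alpha=0.3):
    """Simple exponential smoothing with a fixed, illustrative alpha."""
    level = history[0]
    for value in history[1:]:
        level = alpha * value + (1 - alpha) * level
    return [level] * horizon


def rolling_origin_mse(series, forecaster, min_train, horizon=1):
    """MSE of h-step-ahead forecasts over expanding training windows."""
    squared_errors = []
    for end in range(min_train, len(series) - horizon + 1):
        # Forecast from the first `end` observations, keep the h-th step.
        prediction = forecaster(series[:end], horizon)[-1]
        actual = series[end + horizon - 1]
        squared_errors.append((actual - prediction) ** 2)
    return mean(squared_errors)


series = [10, 12, 11, 13, 12, 14, 13, 15, 14, 16, 15, 17]
candidates = {"naive": naive_forecast, "ses": ses_forecast}
scores = {name: rolling_origin_mse(series, f, min_train=6)
          for name, f in candidates.items()}
best = min(scores, key=scores.get)
```

Because every candidate is scored on the same held-out observations, the comparison does not depend on how each model class defines its likelihood.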

josef...@gmail.com

March 3, 2018, 23:31:07

To: pystatsmodels

On Sat, Mar 3, 2018 at 11:08 PM, Chad Fulton <chadf...@gmail.com> wrote:

Within SARIMAX / UC models information criteria (IC) can be used to select a model, and within ES models IC can be used, but IC cannot be used to select between e.g. SARIMAX and ES. See for example https://otexts.org/fpp2/arima-ets.html.

Not necessarily true.

The crucial part of that reference is "and the likelihood is computed in different ways".

In our MLE models we use a consistent definition across models, llf is always the full likelihood value and we don't drop terms that are irrelevant for the optimization but necessary for the comparison across models (with the same distributional assumption).

Also I think in linear models the sum of squares definition for "quasi-normal" models should allow a consistent comparison across models.

A possible inconsistency can arise from whether auxiliary parameters like scale are counted in k_params or not.
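To make the k_params point concrete, here is a minimal sketch assuming a Gaussian model whose variance is estimated by SSE / n. The numbers (`nobs`, `sse`, `k_mean`) are hypothetical, and the functions are stand-alone illustrations rather than statsmodels code.

```python
import math


def gaussian_llf(sse, nobs):
    """Full Gaussian log-likelihood at the MLE of the variance (sse / nobs)."""
    sigma2 = sse / nobs
    return -0.5 * nobs * (math.log(2 * math.pi) + math.log(sigma2) + 1)


def aic(llf, k_params):
    """Akaike information criterion: -2 * llf + 2 * k."""
    return -2 * llf + 2 * k_params


nobs, sse, k_mean = 100, 250.0, 3  # hypothetical fit with 3 mean parameters
llf = gaussian_llf(sse, nobs)
aic_without_scale = aic(llf, k_mean)
aic_with_scale = aic(llf, k_mean + 1)  # also counts the variance parameter
```

The two values differ by exactly 2, which is harmless within one model class but can tip a close comparison between classes that count the scale differently.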

However, we can produce a comparison using out-of-sample forecasting exercises (e.g. estimate the parameters on a subset of the data and then compare on MSE of h-step ahead forecasts on the remaining data). This is what Hyndman refers to as time series cross validation, see https://www.otexts.org/fpp/2/5 and https://robjhyndman.com/hyndsight/tscv/.

Chad,

you could move this later to an issue, where, at least for me, it will be easier to find.

(I'm just shutting down windows to get ready for a few days of skiing, and found a few more open tabs)

There are a few issues on outlier detection in tsa:

https://github.com/statsmodels/statsmodels/issues/2571

https://github.com/statsmodels/statsmodels/issues/3285

https://github.com/statsmodels/statsmodels/issues/2873#issuecomment-209286703 (starting point of the outlier/anomaly discussion)

There is also one specific issue on a smarter choice of start params in order selection, using a warm start:

https://github.com/statsmodels/statsmodels/issues/2198

Chad Fulton

March 3, 2018, 23:46:50

To: Statsmodels Mailing List

On Sat, Mar 3, 2018 at 11:31 PM, <josef...@gmail.com> wrote:

On Sat, Mar 3, 2018 at 11:08 PM, Chad Fulton <chadf...@gmail.com> wrote:
Within SARIMAX / UC models information criteria (IC) can be used to select a model, and within ES models IC can be used, but IC cannot be used to select between e.g. SARIMAX and ES. See for example https://otexts.org/fpp2/arima-ets.html.

Not necessarily true
crucial is "and the likelihood is computed in different ways"

In our MLE models we use a consistent definition across models, llf is always the full likelihood value and we don't drop terms that are irrelevant for the optimization but necessary for the comparison across models (with the same distributional assumption).
Also I think in linear models the sum of squares definition for "quasi-normal" models should allow a consistent comparison across models.

A possible inconsistency can arise by whether auxiliary parameters like scale are counted in k_params or not.

The ES models in master aren't developed from a likelihood-based perspective, so I don't know what the justification would be for comparing SARIMAX / UC and ES using IC. As for comparing within the ES models, in my view the IC are just a proxy for minimizing the in-sample SSE.

josef...@gmail.com

March 4, 2018, 00:08:03

To: pystatsmodels

In (cross-section) linear models AIC is equivalent to some leave-one-out statistic, so it is still better than SSE because of the penalization by the number of parameters.

IMO, those IC are good for preliminary model selection because they are fast to compute, even when the final selection uses some out-of-sample forecast (or leave-k-out) cross-validation.

My guess is that even if there is no strict theoretical justification, using the same comparison measure and penalization across models provides useful results, e.g. just think of the IC as penalized SSE:

https://github.com/statsmodels/statsmodels/blob/master/statsmodels/tools/eval_measures.py#L416

Qualification: I never read much of the literature on IC in tsa, although they are used everywhere in tsa.
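The "penalized SSE" reading can be illustrated as follows. This is a stand-alone sketch, not the code behind the link above; the function name `aic_from_sse` and the three fits are hypothetical.

```python
import math


def aic_from_sse(sse, nobs, k_params):
    """AIC written as a penalized in-sample fit: log(SSE / n) + 2k / n."""
    return math.log(sse / nobs) + 2 * k_params / nobs


# Hypothetical (label, SSE, number of parameters) for three fitted models.
nobs = 120
fits = [("small", 310.0, 2), ("medium", 305.0, 4), ("large", 299.0, 9)]
ranked = sorted(fits, key=lambda fit: aic_from_sse(fit[1], nobs, fit[2]))
best_by_sse = min(fits, key=lambda fit: fit[1])[0]
best_by_aic = ranked[0][0]
```

Here the raw SSE favours the largest model, while the penalized version prefers the smallest one because the extra parameters buy very little fit.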

Josef

Brian R

March 7, 2018, 09:58:32

To: pystatsmodels

Are the ES models actually implemented and released, i.e. in statsmodels 0.8.0?

Abhijeet Panda

March 7, 2018, 10:25:12

To: pystatsmodels

Hi Chad and Josef,

I have done some basic research on how to proceed with this project. I have gone through the references you provided.

These are some of my conclusions; please correct me where I have misunderstood something:

- In exponential smoothing models, we can use the AIC for model selection and estimation, as described in https://otexts.org/fpp2/estimation-and-model-selection.html. This is also what ets() in R uses.
- To select between SARIMAX/ARIMA and ES, https://otexts.org/fpp2/arima-ets.html describes an out-of-sample method, time series cross-validation (tsCV), comparing MSE for non-seasonal data and RMSE, MAPE, and MASE for seasonal data.
- The forecast package uses state space models for all exponential smoothing methods to make point forecasts, as explained in Hyndman's paper. This makes sense, as there are nearly 30 ES variations, while statsmodels' ES implementation has 9 of them (I only had a glance at it). Shall we proceed with the existing models or build new models?
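One way to arrive at the counts mentioned above, assuming (this is an assumption, not something stated in the thread) that the 9 comes from crossing three trend options with three seasonal options, and the 30 from additionally allowing damped trends and additive or multiplicative errors:

```python
from itertools import product

# Assumed Holt-Winters style options: none, additive, multiplicative.
trend_options = [None, "add", "mul"]
seasonal_options = [None, "add", "mul"]
holt_winters_configs = list(product(trend_options, seasonal_options))

# Assumed wider taxonomy: damped trends added, and every method paired
# with additive or multiplicative errors.
ets_trends = ["N", "A", "Ad", "M", "Md"]
ets_seasonals = ["N", "A", "M"]
ets_errors = ["A", "M"]
ets_models = list(product(ets_errors, ets_trends, ets_seasonals))
```

Enumerating the candidate space this way is also a natural starting point for an automatic search over ES specifications.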

Chad Fulton

March 7, 2018, 19:35:06

To: Statsmodels Mailing List

On Wed, Mar 7, 2018 at 9:58 AM, Brian R <rak...@gmail.com> wrote:

Are the ES models actually implemented and released, i.e in statsmodels 0.8.0?

Unfortunately no. They are implemented and in master (https://github.com/statsmodels/statsmodels/blob/master/statsmodels/tsa/holtwinters.py), but won't be part of a release until 0.9.

(Also, there is an alternative implementation in https://github.com/statsmodels/statsmodels/pull/4183).

Chad Fulton

March 7, 2018, 19:41:22

To: Statsmodels Mailing List

That's a good start! I think that we will not want to implement Hyndman's innovations state space approach as part of an automatic forecasting project, because the existing models probably cover most of the interesting cases, and so the most important part is the automatic forecasting infrastructure itself.

For the next step, it would be a good idea to start working on an application (e.g. http://python-gsoc.org/studenttemplate.html) that you can post here. Although the deadline for applications isn't until March 27, the sooner you get an application, the more time we'll have to polish it.

In particular, the "Proposal Detailed Description/Timeline (*)" section will require you to think through more concrete parts of what you are proposing, and that can take a bit of time. However, if you get a rough outline, we can usually help you fill it in, or move things around, etc.

Also, if you haven't yet, there is a lot of useful and important information available about Python / GSOC at http://python-gsoc.org/#gettingstarted

Best,

Chad

Abhijeet Panda

March 7, 2018, 20:02:28

To: pystatsmodels

Hi Chad,

Thanks for the advice. I haven't started working on the proposal yet, but I'll get started on it right now. I just had a look at the proposal template. For the code sample part, I haven't made many patches to the sub-org apart from this one:

https://github.com/statsmodels/statsmodels/pull/4290

If you know of any issues related to this project, could you please help me find them? I am new to the code base.

Apart from this, are there any other suggestions for the proposal that you want me to look into?

Chad Fulton

March 8, 2018, 19:59:09

To: Statsmodels Mailing List

On Wed, Mar 7, 2018 at 8:02 PM, Abhijeet Panda <abhijeet...@gmail.com> wrote:

Hi Chad,
Thanks for the advice. I haven't started working on the proposal part. I'll get started on it right now. I just had a look at the proposal template. In the code sample part, I haven't made enough patches to the sub-org except for the one here
https://github.com/statsmodels/statsmodels/pull/4290,
if you have any issues related to this project, can you please help me find them. I am new to the code base.

Do you have any background already in econometrics or statistics? That could help me find an appropriate place for you to make an initial contribution for your code sample (but note that the code sample does not have to be very large or complex for you to have a successful proposal).

Apart from this, Is there any other suggestions for the proposal that you want me to look into?

I gave a link to Eviews' documentation in my original e-mail. In my mind the eventual implementation in Statsmodels should look a lot like that (in terms of supported features and our general approach).

Abhijeet Panda

March 8, 2018, 20:51:24

To: pystatsmodels

Hi Chad,

On Friday, March 9, 2018 at 6:29:09 AM UTC+5:30, Chad Fulton wrote:

On Wed, Mar 7, 2018 at 8:02 PM, Abhijeet Panda <abhijeet...@gmail.com> wrote:
Hi Chad,
Thanks for the advice. I haven't started working on the proposal part. I'll get started on it right now. I just had a look at the proposal template. In the code sample part, I haven't made enough patches to the sub-org except for the one here
https://github.com/statsmodels/statsmodels/pull/4290,
if you have any issues related to this project, can you please help me find them. I am new to the code base.

Do you have any background already in econometrics or statistics? That could help me find an appropriate place for you to make an initial contribution for your code sample (but note that the code sample does not have to be very large or complex for you to have a successful proposal).

I don't have a very deep background, but I have some basic knowledge of statistics from studying machine learning. Right now, I'm writing a test class for plot_simultaneous() for the above pull request.

Apart from this, Is there any other suggestions for the proposal that you want me to look into?

I gave a link to Eviews' documentation in my original e-mail. In my mind the eventual implementation in Statsmodels should look a lot like that (in terms of supported features and our general approach).

Thank you for this advice, I'll look through the documentation and put my first version of the proposal as soon as possible.

Regards

Abhijeet

Abhijeet Panda

March 9, 2018, 19:11:29

To: pystatsmodels

Hi Chad,

Thank you for this advice, I'll look through the documentation and put my first version of the proposal as soon as possible.

Before proceeding with the proposal, I wanted to explore the idea for this project properly, so I have created a Google doc describing the process and its underlying components. It's incomplete.

The points I'm unsure about are highlighted, and you can add any suggestions or improvements there.

The Action Points section lists the stepwise functional improvements to be done in the project.

https://docs.google.com/document/d/1ZMav3vHEr7DoRJeVwIyuIO-Gb2MsnbLPvu3EOJ2W5jQ/edit?usp=sharing

Regards,

Abhijeet

Chad Fulton

March 19, 2018, 22:30:17

To: Statsmodels Mailing List

For the highlighted sections:

- I think that it is fine for your project to focus on ARIMA models and not exogenous regressors.

- Since IC and MSE are both outputs of the model, your project should give the user a choice on what to use to select the model (e.g. llf, aic, aicc, bic, hqic, mse, etc).

- The notebook for exponential smoothing is at: https://github.com/statsmodels/statsmodels/blob/master/examples/notebooks/exponential_smoothing.ipynb, but it has not been run at that link, so you'd need to download it to run it (and you need the master version installed).
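A rough sketch of what "give the user a choice" could look like, using the standard formulas for each criterion. The names `CRITERIA` and `select_model`, and the two candidate summaries, are hypothetical and only meant to show the shape of such an interface.

```python
import math

# Each criterion maps (llf, n, k, mse) to a number where smaller is better.
CRITERIA = {
    "llf": lambda llf, n, k, mse: -llf,  # negated so that min() still applies
    "aic": lambda llf, n, k, mse: -2 * llf + 2 * k,
    "aicc": lambda llf, n, k, mse: (-2 * llf + 2 * k
                                    + 2 * k * (k + 1) / (n - k - 1)),
    "bic": lambda llf, n, k, mse: -2 * llf + k * math.log(n),
    "hqic": lambda llf, n, k, mse: -2 * llf + 2 * k * math.log(math.log(n)),
    "mse": lambda llf, n, k, mse: mse,
}


def select_model(results, criterion="aic"):
    """Return the label of the candidate that is best under `criterion`."""
    score = CRITERIA[criterion]
    return min(results, key=lambda label: score(**results[label]))


# Hypothetical summaries of two fitted candidates.
results = {
    "ARIMA(1,0,0)": {"llf": -150.0, "n": 100, "k": 2, "mse": 1.20},
    "ARIMA(2,0,2)": {"llf": -147.5, "n": 100, "k": 5, "mse": 1.10},
}
```

With these made-up numbers, `select_model(results, "aic")` picks the smaller model while `select_model(results, "mse")` picks the larger one, which is exactly why the choice is worth exposing to the user.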

As Josef mentioned, the deadline isn't too far away now, but I will be pretty available in the next week and a half to help with questions.

Best,

Chad

Abhijeet Panda

March 20, 2018, 05:11:24

To: pystatsmodels

Hi Chad,

I have updated the document:

https://docs.google.com/document/d/1ZMav3vHEr7DoRJeVwIyuIO-Gb2MsnbLPvu3EOJ2W5jQ/edit?usp=sharing

- We'll be focusing on ARIMA models for now.
- I have described how we can proceed with implementing the automatic forecasting part for ARIMA models.
- I have a question there, which I have highlighted in yellow.
- For the ES automatic forecasting, I think the implementation is straightforward. Please point out any issues you think might occur in this part that I should take care of.

If I have missed something in the implementation part of the project, please let me know so that I can address it in my proposal.

I have drafted my proposal, and I will complete it as soon as the action points for this project are clear to me.

Best,

Abhijeet

Chad Fulton

March 21, 2018, 09:23:19

To: Statsmodels Mailing List

For your question: it is definitely good to have a function that replicates the Hyndman-Khandakar / auto.arima methodology, as you have written in your proposal. Ideally, though, that would only be one application of the tools that your project would develop. As an example, we might want a second function that does a "brute force" search, and it would be nice to make it easy for users to write their own heuristic algorithm for selecting a model.
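To illustrate the idea of several search strategies sharing one set of tools, here is a minimal sketch in which both a brute-force search and a simple greedy search take a user-supplied `score` callable. Everything here is hypothetical: `toy_score` stands in for "fit the model with this order and return its AIC", and the greedy search is only loosely in the spirit of a stepwise procedure, not a reproduction of the Hyndman-Khandakar algorithm.

```python
from itertools import product


def brute_force_search(score, max_p=3, max_q=3):
    """Score every (p, q) on the grid and return the lowest-scoring order."""
    grid = product(range(max_p + 1), range(max_q + 1))
    return min(grid, key=score)


def stepwise_search(score, start=(0, 0), max_p=3, max_q=3):
    """Greedy neighbourhood search: move to the best neighbour until stuck."""
    current, current_score = start, score(start)
    while True:
        p, q = current
        neighbours = [(p + dp, q + dq)
                      for dp, dq in product((-1, 0, 1), repeat=2)
                      if (dp, dq) != (0, 0)
                      and 0 <= p + dp <= max_p and 0 <= q + dq <= max_q]
        best = min(neighbours, key=score)
        if score(best) >= current_score:
            return current  # no neighbour improves on the current order
        current, current_score = best, score(best)


def toy_score(order):
    """Stand-in for an information criterion; smallest at (2, 1)."""
    p, q = order
    return (p - 2) ** 2 + (q - 1) ** 2
```

Because the search functions only depend on `score`, a user could plug in their own heuristic or their own criterion without touching the fitting code.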

For the proposal generally, it will need to be more detailed than it is currently, and you'll want to follow the template from the Python umbrella project. One element of that is that you will need:

"Proposal Detailed Description/Timeline (*)

- Please include timeline with milestones, preferably weekly ones. You may wish to read the GSoC student guide which includes several examples of good proposals with timelines, or our own information at SummerOfCode/Application

- Note that any pre-work such as setup and reading documentation should take place during the community bonding period, not after coding has started."

Another thing that will make your proposal much stronger is if you can show that you have thought about the structure of the code you're proposing (of course you're not required to stick with that, and it may all change during the summer as you actually implement things). For example:

- What models exactly will your function consider (i.e. more detail than "ARIMA" and "ES")?

- What sorts of functions / classes do you think you'll need to write? Are you proposing everything in one function (I hope not) or multiple functions? How do you imagine the end-user actually using your code?

- How do you think the model selection based on pseudo-out-of-sample forecasting will work (this is what Hyndman refers to as time series cross validation)? This is important, since this is how we can select the model when considering both ARIMA and ES classes of models.

- How will the forecasting part work? What kind of forecast output do you think you should produce?

In general, the more that you can use your proposal to demonstrate that you have thought about this project, the more likely it will be that it will be accepted. We are competing for slots across the entire Python organization (and more generally across all of GSOC), so it's important to make the proposal as strong as possible.

Chad

Abhijeet Panda

March 22, 2018, 07:36:19

To: pystatsmodels

Hi Chad,

Thank you for your advice, I have tried to draft my proposal to fill in the details that you have asked for.

https://docs.google.com/document/d/1mHO9o5KCL9ALm1i5pOy47doSuRo_YmG-kCCvf74wHSc/edit?usp=sharing

I haven't filled in the timeline yet, as I have to update it according to the requirements and plan the time for each evaluation.

I need some help with the exponential smoothing section, which I haven't drafted properly.

Also, if you see any wrong information in the proposal, please correct me.

Abhijeet

josef...@gmail.com

March 22, 2018, 08:01:32

To: pystatsmodels

For the ARIMA model it would be better to focus on SARIMAX and include
choosing the seasonal order and differencing.

I don't see anything about choosing a trend which will be important if
there is a trend and no differencing is used, otherwise ARMA might not
converge or estimate inappropriate parameters.

Josef

Abhijeet Panda

March 22, 2018, 23:14:22

To: pystatsmodels

Hi Josef,

For the ARIMA model it would be better to focus on SARIMAX and include
choosing seasonal order and differencing.
ARIMA.

I have updated the document for SARIMAX, including how we can choose the seasonal order and differencing by using successive unit root tests.

Are the tests already available in statsmodels, or should we write them?

I don't see anything about choosing a trend which will be important if
there is a trend and no differencing is used, otherwise ARMA might not
converge or estimate inappropriate parameters.

Can you help me with how to choose the trend automatically?

One approach is to look at the p-value of the time variable to see whether it is statistically significant.

Using this, we can decide whether there is a linear trend or not.

Abhijeet

josef...@gmail.com

March 23, 2018, 00:32:55

To: pystatsmodels

On Thu, Mar 22, 2018 at 11:14 PM, Abhijeet Panda
<abhijeet...@gmail.com> wrote:
> Hi Josef,
>
>> For the ARIMA model it would be better to focus on SARIMAX and include
>> choosing seasonal order and differencing.
>> ARIMA.
>>
> I have updated the document for SARIMAX and how can we choose the seasonal
> order and differencing partner by using successive unit root tests.
> Are the tests already available in statsmodels or shall we write it?

We don't have seasonal unit root tests yet in statsmodels.
One problem will be to get the tables for the distribution and p-values
for different season lengths, AFAIR.
If we only need them for a rough preliminary specification search, then having
very good p-values might not be necessary.

There might be some simple preliminary tests, like checking whether
seasonal polynomials are significant.
If users don't provide a frequency, or even if they do, a check for
spikes in the spectral density might also be useful.
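A minimal sketch of the spectral-spike idea, using a plain periodogram computed with the standard library. The function names and the synthetic series are hypothetical; for real data one would normally remove any trend first and use an FFT routine (for example from NumPy or SciPy) instead of this direct sum.

```python
import cmath
import math


def periodogram(series):
    """Squared DFT magnitude at frequencies k / n, for k = 1 .. n // 2."""
    n = len(series)
    centre = sum(series) / n
    demeaned = [x - centre for x in series]
    power = {}
    for k in range(1, n // 2 + 1):
        coef = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                   for t, x in enumerate(demeaned))
        power[k] = abs(coef) ** 2 / n
    return power


def dominant_period(series):
    """Period (in observations) of the largest periodogram ordinate."""
    power = periodogram(series)
    k_max = max(power, key=power.get)
    return len(series) / k_max


# Synthetic series: a pure sine with period 12 over 10 full cycles.
series = [math.sin(2 * math.pi * t / 12) for t in range(120)]
```

A detected period could then be compared with the frequency the user supplied, or used as the seasonal period when none was given.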

>
>> I don't see anything about choosing a trend which will be important if
>> there is a trend and no differencing is used, otherwise ARMA might not
>> converge or estimate inappropriate parameters.
>>
> Can you help me with like how to automatically choose the trend?
> One approach is to see the p-values and know if the time variable is
> statistically significant.
> Using this we can decide if there is a linear trend or not.

I would just run an initial regression on a trendline, or test whether means
differ across sequential subsamples.
If it works, it can be done in the context of SARIMAX or similar models by
choosing the constant/trend options based on significance or cross-validation.
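The "initial regression on a trendline" can be sketched with a plain OLS slope t-statistic. The function names, the critical value of 2, and the data are assumptions for illustration; since time series residuals are usually autocorrelated, this should be read as a rough screen rather than a proper test.

```python
import math


def trend_t_statistic(series):
    """t-statistic of the slope in an OLS regression of y on time."""
    n = len(series)
    t_mean = (n - 1) / 2
    y_mean = sum(series) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    residuals = [y - (intercept + slope * t) for t, y in enumerate(series)]
    sigma2 = sum(r ** 2 for r in residuals) / (n - 2)
    return slope / math.sqrt(sigma2 / sxx)


def has_linear_trend(series, critical_value=2.0):
    """Rough check: |t| above an assumed critical value of about 2."""
    return abs(trend_t_statistic(series)) > critical_value


noise = [0.3, -0.2, 0.1, -0.4, 0.2, -0.1, 0.4, -0.3, 0.0, 0.1, -0.2, 0.3]
trending = [0.5 * t + e for t, e in enumerate(noise)]
```

The outcome of such a check could feed directly into the choice of the constant/trend option of the candidate models.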

However, we got several reports about convergence and non-stationarity
problems in fitting ARMA/SARIMAX and similar models, when a stationary ARMA is
just not appropriate because there is a trend or some other non-stationary
pattern in the data. These are problems that we should be able to avoid
by some preliminary testing or diagnostics.
It might also be possible to start with a SARIMAX model that includes
trend and seasonal components, which might avoid some of the
convergence problems and then drop parts if they are not needed or don't improve
prediction.

I never did a systematic reading of the automatic forecasting literature.

A related issue, where I did some reading for the PR, is about using
the Box-Cox transformation, where the implemented method checks
what makes the variance stable in subsamples:
https://github.com/statsmodels/statsmodels/pull/3477
and the discussion in the issues and PR leading up to this.
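As an illustration of "checking what makes the variance stable in subsamples", here is a small grid search that picks the Box-Cox lambda minimizing the coefficient of variation of sd / mean^(1 - lambda) across subsamples. This follows a Guerrero-style criterion as I understand it; I have not checked that it matches the linked PR, and the function names, grid, and data are hypothetical. The series must be strictly positive.

```python
from statistics import mean, pstdev


def variance_stability(series, lam, window):
    """Coefficient of variation of sd_i / mean_i**(1 - lam) over subsamples."""
    chunks = [series[i:i + window]
              for i in range(0, len(series) - window + 1, window)]
    ratios = [pstdev(c) / mean(c) ** (1 - lam) for c in chunks]
    return pstdev(ratios) / mean(ratios)


def choose_lambda(series, window, grid=None):
    """Grid-search the Box-Cox lambda that makes the ratios most stable."""
    if grid is None:
        grid = [i / 10 for i in range(-10, 21)]  # -1.0, -0.9, ..., 2.0
    return min(grid, key=lambda lam: variance_stability(series, lam, window))


# Synthetic positive series whose spread grows in proportion to its level,
# so a log transform (lambda = 0) should stabilise the variance.
pattern = [-1, 0, 0, 1]
series = [level * (1 + 0.1 * p) for level in (10, 20, 40, 80) for p in pattern]
```

For seasonal data the natural choice of `window` would be the seasonal period, so that each subsample covers one full cycle.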

IMO: choosing the order in (S)AR(I)MA(X) or options for other models
is the core of the automatic forecasting specification search.
But if we and users throw arbitrary data like sales data at it, then
there will be messy data that might not fit many of those candidates.

Josef

>
>
> Abhijeet

Abhijeet Panda

March 23, 2018, 07:35:07

To: pystatsmodels

On Friday, March 23, 2018 at 10:02:55 AM UTC+5:30, josefpktd wrote:

On Thu, Mar 22, 2018 at 11:14 PM, Abhijeet Panda
<abhijeet...@gmail.com> wrote:
> Hi Josef,
>
>> For the ARIMA model it would be better to focus on SARIMAX and include
>> choosing seasonal order and differencing.
>> ARIMA.
>>
> I have updated the document for SARIMAX and how can we choose the seasonal
> order and differencing partner by using successive unit root tests.
> Are the tests already available in statsmodels or shall we write it?

We don't have seasonal unit root tests yet in statsmodels.
One problem will be to get the tables for the distribution and p-values
for different season lengths, AFAIR.
If we only need them for a rough preliminary specification search, then having
very good p-values might not be necessary.

There might be some simple preliminary tests, like checking whether
seasonal polynomials are significant.
If users don't provide a frequency, or even if they do, a check for
spikes in the spectral density might also be useful.

This is a good way to proceed, but we need some spectral analysis tools for it.

See, for example:

https://stats.stackexchange.com/questions/12164/testing-significance-of-peaks-in-spectral-density

>
>> I don't see anything about choosing a trend which will be important if
>> there is a trend and no differencing is used, otherwise ARMA might not
>> converge or estimate inappropriate parameters.
>>
> Can you help me with like how to automatically choose the trend?
> One approach is to see the p-values and know if the time variable is
> statistically significant.
> Using this we can decide if there is a linear trend or not.

I would just run an initial regression on a trendline, or test whether means
differ across sequential subsamples.

This would be an effective approach to detect trends. Thank you for this. But there might be an issue in selecting the length of the subsamples.

I have done some homework and came across this link:

https://stats.stackexchange.com/questions/225003/test-for-trend-and-seasonality-in-time-series

which explains how to formulate a hypothesis and then test it for the presence of trend and seasonality.

If it works, it can be done in the context of SARIMAX or similar models by
choosing the constant/trend options based on significance or cross-validation.

However, we got several reports about convergence and non-stationarity
problems in fitting ARMA/SARIMAX and similar models, when a stationary ARMA is
just not appropriate because there is a trend or some other non-stationary
pattern in the data. These are problems that we should be able to avoid
by some preliminary testing or diagnostics.

It might also be possible to start with a SARIMAX model that includes
trend and seasonal components, which might avoid some of the
convergence problems and then drop parts if they are not needed or don't improve
prediction.

I hadn't thought of this, but we can add it as one of the approaches for selecting the model, in addition to calculating the parameters for the model.

I never did a systematic reading of the automatic forecasting literature.

A related issue, where I did some reading for the PR, is about using
the Box-Cox transformation, where the implemented method checks
what makes the variance stable in subsamples:
https://github.com/statsmodels/statsmodels/pull/3477
and the discussion in the issues and PR leading up to this.

IMO: choosing the order in (S)AR(I)MA(X) or options for other models
is the core of the automatic forecasting specification search.
But if we and users throw arbitrary data like sales data at it, then
there will be messy data that might not fit many of those candidates.

Since we're only focusing on SARIMAX and ES models for now, I believe simple exponential smoothing might bring some good results here.

Later we can improve on the models and expand our model space.

Josef

>
>
> Abhijeet

Abhijeet Panda

March 24, 2018, 05:50:10

To: pystatsmodels

Hi Chad and Josef,

I have updated the proposal according to the above ideas.

I still need some help with the Exponential Smoothing section.

Any other changes are also welcome.

https://docs.google.com/document/d/1mHO9o5KCL9ALm1i5pOy47doSuRo_YmG-kCCvf74wHSc/edit?usp=sharing

ja20...@gmail.com

March 24, 2018, 14:10:20

To: pystatsmodels

Hi Chad Fulton, we should look into PDEs for further improvement, and at big-O notation for more efficiency and shorter run times:

https://pdfs.semanticscholar.org/0f71/ef0a78f86052e3ccbe9a8d0e7fbe13ba91f7.pdf

http://www.iieta.org/sites/default/files/Journals/EESRJ/04.02_01.pdf

ja20...@gmail.com

March 25, 2018, 08:39:35

To: pystatsmodels

Where should I submit the proposal?

On Sunday, March 4, 2018 at 9:08:47 AM UTC+5, Chad Fulton wrote:

josef...@gmail.com

March 25, 2018, 09:08:15

To: pystatsmodels

On Sun, Mar 25, 2018 at 8:39 AM, <ja20...@gmail.com> wrote:
> Where to submit proposal ?

The google website that administers GSOC is summerofcode.withgoogle.com

Josef

ja20...@gmail.com

March 25, 2018, 10:01:40

To: pystatsmodels

Thanks Josef!

Should I submit a rough draft here, or submit the proposal directly through the website?

josef...@gmail.com

March 25, 2018, 10:32:07

To: pystatsmodels

On Sun, Mar 25, 2018 at 10:01 AM, <ja20...@gmail.com> wrote:
> Thanks Josef!
> Should I have to submit rought draft here or just to submit purposal
> direclty over website ?

That depends on the state of your proposal. If it is very preliminary,
then it's better to discuss on the mailing list first.
For reviewing details a google doc with link here would be convenient.

Given that the deadline is approaching fast, I recommend getting into
the google system soon. There is always a rush in the last days and
complaints or comments afterwards by some students that they missed
the deadline by a few minutes. That never happened to us.
GSOC deadlines are strict.

As an update: We have one proposal for automatic forecasting in discussion
that is almost finished. We don't have any proposals under discussion
for other topics.

Josef

ja20...@gmail.com

March 25, 2018, 11:16:45

To: pystatsmodels

So does this mean that proposals on this topic are closed and I should look for another topic?

josef...@gmail.com

March 25, 2018, 11:34:39

To: pystatsmodels

On Sun, Mar 25, 2018 at 11:16 AM, <ja20...@gmail.com> wrote:
> So it means, the proposals on this topic has been closed and I should look
> forward to another topic?

No, nothing is decided yet, of course. It affects the chances of being accepted.

If there are two very good proposals on the same topic, then we still have to
choose only one of them.
If there is one good proposal on another topic, then, assuming it is good
enough for a GSOC project, it will compete for the number of available
slots but doesn't have to compete on the same topic.

Abhijeet Panda

March 25, 2018, 16:32:22

To: pystatsmodels

Hi everyone,

I have shared my proposal for review in my GSOC profile under Python Software Foundation in Statsmodels.

Please let me know if there are any improvements required so that I would be able to update it.

Regards,

Abhijeet

Chad Fulton

March 25, 2018, 17:14:44

To: Statsmodels Mailing List

On Sat, Mar 24, 2018 at 2:05 PM, <ja20...@gmail.com> wrote:

Hi Chad Fulton, we should look over to PDE's for more improvement and bigo notations for more efficiency and less time
https://pdfs.semanticscholar.org/0f71/ef0a78f86052e3ccbe9a8d0e7fbe13ba91f7.pdf
http://www.iieta.org/sites/default/files/Journals/EESRJ/04.02_01.pdf

Hello,

I'm not really sure what you have in mind here, can you describe a little bit more what this is and when it would be useful to us?

Best,

Chad

Chad Fulton

March 25, 2018, 17:15:43

To: Statsmodels Mailing List

On Sun, Mar 25, 2018 at 11:34 AM, <josef...@gmail.com> wrote:

On Sun, Mar 25, 2018 at 11:16 AM, <ja20...@gmail.com> wrote:
> So it means, the proposals on this topic has been closed and I should look
> forward to another topic?

No, nothing is decided yet, of course. It affects the chances of being accepted.

If there are two very good proposals on the same topic, then we still have to
choose only one of them.
If there is one good proposal on another topic, then, assuming it is good
enough for a GSOC project, it will compete for the number of available
slots but doesn't have to compete on the same topic.

Josef

Yes, I agree, if you would like to make a proposal there is still time and we would welcome additional ideas, although you would need to move very quickly now.

Best,

Chad
