Question on UCM model components


Jerry

Jan 17, 2024, 5:47:45 PM
to pystatsmodels
Question for UCM developers and users. The model has worked well for me on many occasions; however, there is one puzzle I need to understand for interpretation:

How do you interpret "fitted values" or "forecasted values" for historical data in UCM? 

For example, I am using monthly data from the last three years, Jan 21 to Dec 23, to run a forecast for the year 2024. When I check the forecast column, the Jan 24 forecast is equal to the sum of the components. When I check historical data, for example Dec 23, the sum of the components equals the Dec 23 actual value. Then there is also a forecasted value for Dec 23, which is different from the Dec 23 actual value, with an error term. My question is: how does the model get that forecasted value for Dec 23, and how does that relate to the future forecast? This seems different from many other forecasting methods, where I would expect the sum of the historical components to equal the forecasted value, not the actual.

Thank you in advance for any direction. Maybe I am simply not pulling the right reference columns to look at.

Jerry

Jonathan de Souza Matias

Jan 17, 2024, 8:52:06 PM
to pystat...@googlegroups.com
Hello there,

In time series analysis there are several models for estimating parameters, fitting, and then obtaining forecast values. Therefore, I strongly recommend that you first understand exactly which model you are using; after that, many doubts will resolve themselves.

In particular, from your description, your model seems to be estimated using the Kalman filter. This is an algorithm that can be used to obtain forecasts, in your case month by month. Basically, the filter starts with the first observation of the data, for example Jan 21. Then, using the filter, Feb 21 can be predicted. Using the new Feb 21 data point and the error from the last month, a new forecast can be made for the following month, and so on. And there is your prediction for months where you already have data.
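To make the recursion concrete, here is a minimal sketch of the filter for a local level model. The variance values are illustrative assumptions, not estimates, and this is a simplified stand-in for what a library like statsmodels does internally:

```python
def local_level_filter(y, sigma2_eps=1.0, sigma2_eta=0.5):
    """One-step-ahead predictions from a scalar Kalman filter for
    y_t = mu_t + eps_t,  mu_t = mu_{t-1} + eta_t  (local level model)."""
    mu, P = y[0], sigma2_eps      # simple initialization at the first observation
    preds = [None]                # no one-step-ahead prediction for t=0 here
    for t in range(1, len(y)):
        mu_pred, P_pred = mu, P + sigma2_eta  # predict: the level follows a random walk
        preds.append(mu_pred)                 # forecast for time t uses data through t-1
        F = P_pred + sigma2_eps               # forecast-error variance
        K = P_pred / F                        # Kalman gain
        mu = mu_pred + K * (y[t] - mu_pred)   # update with the new forecast error
        P = P_pred * (1 - K)
    return preds
```

Each prediction is made before the corresponding observation is used, which is why you get "forecasts" even for periods where the data already exists.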

Hope this helps.

Jonathan S Matias


--
You received this message because you are subscribed to the Google Groups "pystatsmodels" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pystatsmodel...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/pystatsmodels/ac1d5338-7ed0-4fb7-b967-35d39de38883n%40googlegroups.com.

Chad Fulton

Jan 18, 2024, 10:52:00 PM
to pystat...@googlegroups.com
Hi Jerry,

If you are using either `fittedvalues` or the output of the `predict` method (and assuming that `dynamic=False`, which is the default), then the values are the "one-step-ahead forecasts". This means that the predicted value for time `t` is constructed using the data through time `t-1` (although the forecasts are conditional on the parameters; if you used `fit`, the parameters were estimated using the entire sample).

If you are forecasting out-of-sample, then the value at horizon `h` is an `h`-step-ahead forecast. So if you have data for periods 1, 2, ..., T, then the first forecast is for period T+1 and it is also a one-step-ahead forecast, while the forecast for T+2 is a two-step-ahead forecast, and so on.

I'm not quite sure I followed your question about whether the components sum to the actual value.  In general, the predicted values should not be equal to the actual values. For example, run the following code:

import pandas as pd
import statsmodels.api as sm

data = sm.datasets.macrodata.load_pandas().data['cpi']
mod = sm.tsa.UnobservedComponents(data, 'llevel')
res = mod.fit(disp=False)

# res.plot_components();
print(pd.concat({
    'one-step-ahead': res.fittedvalues,
    'actual': data
}, axis=1))


The output I get is:
     one-step-ahead   actual
0             0.000   28.980
1            28.980   29.150
2            29.150   29.350
3            29.350   29.370
4            29.370   29.540
..              ...      ...
198         218.610  216.889
199         216.889  212.174
200         212.174  212.671
201         212.671  214.469
202         214.469  216.385

You can see that the one-step-ahead prediction contained in the `fittedvalues` attribute is not equal to the actual data at any time point. (In fact, for this simple model the forecast is a random walk, so you can see that the prediction for *tomorrow* is whatever the model observed *today*.)

If you'd like to share an example with some specific questions, please feel free.

Hope that helps,
Chad


Jerry

Jan 22, 2024, 10:21:51 AM
to pystatsmodels
Thank you all for your replies. This is super helpful. I understand the part about the one-step-ahead prediction, which is very cool by itself. I am wondering if we can easily obtain the components underlying those one-step-ahead predictions. (I remember testing this one-step-ahead prediction against the standard output; the predictions were very close but not a perfect match.)

A deeper question: I have tested the forecasting by artificially inflating the last observed point. Say the whole time series is in the 18-20 range and the last month's observation was 20; I manually changed it to 30 and found that my future monthly forecasts went up by quite a lot (only part of the upside was allocated to seasonality). I understand this could be a risk of using a local model (local level, local linear trend). Is there any recommended methodology to mitigate this type of risk, and how would you explain it? Other methods seem to give much less credit to the last observation when it is abnormal.

I will need some time to come up with a good example for discussion. More to follow. Thanks, everyone!

Jerry

Jonathan de Souza Matias

Jan 22, 2024, 12:12:53 PM
to pystat...@googlegroups.com
Basically, what you are testing is the impact of a huge observation that behaves as a random outlier, placed at the last position in the series.

In this case, you have two options:

1) Use a replacement filter on the time series whenever an observation exceeds, in absolute value, three times the median. Actually, you can use any filter that replaces abrupt observations with some central tendency measure. Then work with the filtered series to build the model and predict.

2) Search for a "robust" estimation method for time series and the Kalman filter that can accommodate extreme values.
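A rough sketch of option 1. Note that the threshold here uses the median absolute deviation (MAD) rather than the median itself, a common robust variant of the same idea:

```python
import numpy as np

def replace_outliers(y, k=3.0):
    """Replace observations more than k robust deviations from the median."""
    y = np.asarray(y, dtype=float)
    med = np.median(y)
    mad = np.median(np.abs(y - med))          # robust estimate of the spread
    cleaned = y.copy()
    cleaned[np.abs(y - med) > k * mad] = med  # replace flagged points with the median
    return cleaned

print(replace_outliers([10, 11, 10, 12, 100]))  # the 100 is replaced by the median, 11
```

Setting the flagged points to `np.nan` instead of the median is often preferable with statsmodels state space models, since the Kalman filter handles missing observations natively.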


Hope it helps. 

Jerry

Jan 26, 2024, 12:10:43 PM
to pystatsmodels
Hello -

Here is one example I tried using GDPC1 data from the year 2000 to July 2023 (attached). I intentionally overwrote the Oct 2023 value to see how the models differ. I ran two scenarios: 1. Set the Oct 2023 value to 25000. 2. Set the Oct 2023 value to 30000.

The model I specified is a local level model with seasonal = 4. The two scenarios were fitted and predicted with the following results:

Model 1: Set the Oct 2023 value to 25000. To illustrate, I only show the two lines which contain Oct 2023 and Jan 2024:
 
Date        GDPC1    forecast   std_err   lower_bound  upper_bound
...
2023-10-01  25000    22512.72   371.7801  21784.05     23241.40
2024-01-01  (blank)  24288.41   371.78    23559.74     25017.09
...

Model 2: Set the Oct 2023 value to 30000, and do the same thing:
 
Date        GDPC1    forecast     std_err      lower_bound  upper_bound
...
2023-10-01  30000    22433.51095  844.5371407  20778.24857  24088.77333
2024-01-01  (blank)  25554.43907  844.5371407  23899.17669  27209.70145
...

Here are my questions:
1. How are the two forecast values for 2023-10-01 estimated? (I thought they would be one-step-ahead predictions using all the data up to July 2023, but then they should be the same value, since all data up to and including July 2023 are identical in both scenarios.)
2. Somehow this shows that these two values are influenced by the Oct 2023 input, but not in the expected direction (the forecasted value with the 30000 input is actually lower than the one with 25000). Maybe the answer to the first question explains this as well.

Thanks in advance, and please let me know if you have any questions.
Jerry 
GDPC1.csv

Jerry

Jan 26, 2024, 12:49:45 PM
to pystatsmodels
BTW, I tried a model 3, which leaves Oct 2023 blank; the following is the result:

Model 3: Set the Oct 2023 value to blank, and do the same thing:
 
Date        GDPC1    forecast     std_err      lower_bound  upper_bound
...
2023-10-01  (blank)  22473.92637  266.8940378  21950.82367  22997.02907
2024-01-01  (blank)  22454.25032  348.7028418  21770.80531  23137.69534
...
The good thing is that all these numbers are close to each other, meaning the influence of the Oct 2023 input is quite small, but you can also see that they are all different. From my previous understanding, I thought this "forecast" value for Oct 2023 would always be 22473.92637, no matter what the actual input for Oct 2023 is. Now there is some explanation work to do. Thanks!

Jerry

sweep2009

Jan 26, 2024, 4:02:29 PM
to pystat...@googlegroups.com
Also, I have just tested that `res.fittedvalues` is exactly the same as in model 3, which is great and matches my understanding. Now the only question is why, when there is an actual value for Oct 2023, the "fitted values" change a little, and what the mechanism behind that is.

I also tested an absurd entry for Oct 2023, such as 300000; then the fitted value goes down further, to 21993. It seems that the model attributes most of this outlier to the seasonal component, where the forecasted value for Oct 2024 is 299768.5, while those for the other quarters are close to zero (one quarter has negative values). So maybe the seasonal component is somehow calculated using all of the data instead of the one-step-ahead algorithm?

Thanks!
Jerry
