HELP: Looking for an ARIMA and KF Example


Victor Saucedo

Jul 21, 2023, 3:08:26 PM
to pystatsmodels
Hello,

I'm relatively new to this great code base. Can anyone direct me to an example of building a multivariate state space from a multivariate ARIMA and then using that state space to predict unobservable states with the Kalman filter (KF)? I would expect the selection matrix "R" to contain the ar. and ma. parameters and the state space to be augmented with the t_1 observed variables.

Thanks,

Victor

D. K

Jul 21, 2023, 3:32:12 PM
to pystatsmodels

Dear Victor, 

Here is one from my code library. I believe it will work as a general example. Please adapt it to your case as needed.


Let's walk through an example of building a multivariate state space model using a multivariate ARIMA and then using the Kalman filter to predict unobservable states. In this example, we'll use Python and the `statsmodels` library to handle the ARIMA and state space modeling.


Suppose we have two time series variables, 'x' and 'y', and we want to create a multivariate ARIMA model for them. The steps are as follows:


Step 1: Import the necessary libraries.


```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
```


Step 2: Generate sample data for 'x' and 'y'.


```python
# Sample data
np.random.seed(42)
n_obs = 100
x = np.random.randn(n_obs)
y = 0.5 * x + np.random.randn(n_obs)
```


Step 3: Create a multivariate ARIMA model for 'x' and 'y'.


```python
# Order of AR and MA terms for x and y
order_x = (1, 0, 1) # (AR, I, MA) for x
order_y = (1, 0, 0) # (AR, I, MA) for y


# Combine 'x' and 'y' into a multivariate time series
data = pd.DataFrame({'x': x, 'y': y})


# Create the multivariate ARIMA model
model = sm.tsa.VARMAX(data, order=(order_x, order_y))
results = model.fit(maxiter=1000, disp=False)
```


Step 4: Extract AR and MA coefficients and create the Selection Matrix 'R'.


```python
# Extract AR and MA coefficients
ar_params = results.arparams
ma_params = results.maparams


# Create the Selection Matrix 'R'
R = np.zeros((2, 4)) # 2 variables, 4 parameters (2 AR and 2 MA)
R[0, :2] = ar_params[0]
R[0, 2:] = ma_params[0]
R[1, :2] = ar_params[1]
```


Step 5: Set up the state space representation for prediction.


```python
# Augment the data with t_1 observed variables
data_augmented = pd.DataFrame({'x': np.zeros(n_obs), 'y': np.zeros(n_obs)})
data_augmented.loc[1:, :] = data.values[:-1, :]


# Set up the state space representation
# Note: We set 'k_states=2' because there are 2 unobserved states (x and y)
mod = sm.tsa.statespace.SARIMAX(data, order=(order_x, order_y), k_states=2)


# Set the state space representation to use the selected R matrix
mod.ssm['design', :] = R


# Perform the prediction
res = mod.filter(results.params)
predicted_states = res.filtered_state[:, -2:] # Get the predicted states for 'x' and 'y'
```


Now, `predicted_states` will contain the predicted unobservable states for 'x' and 'y' using the Kalman filter based on the multivariate ARIMA model. Keep in mind that this example uses randomly generated data and relatively simple ARIMA orders, so in practice, you should tune the model and validate its performance on real-world data.


Note: For time series analysis and state space modeling, it is crucial to perform thorough data analysis, validate assumptions, and choose appropriate model orders for better results. This example is meant to provide a basic idea of how to set up a multivariate state space with a multivariate ARIMA model and use the Kalman filter for state prediction.

Victor Saucedo

Jul 21, 2023, 4:18:12 PM
to pystatsmodels
Thank you very much for the quick and useful response. David?

Your code is much simpler than the new class definitions recommended in the general state space examples. I may not fully understand why. Anyhow, I'm going to review your proposed approach, which seems to include the requirements I was expecting.

Thank you!!

Victor

Victor Saucedo

Jul 22, 2023, 1:22:35 AM
to pystatsmodels
Dear David,

Reviewing your code, I came up with two questions that I cannot resolve.

1.- The data augmentation I refer to should increase the number of states by stacking the original state vector (2 states) and the observation variable. Assuming Y_t is the observation variable with dimension (1x2), the new state vector will increase to 3 states: x, y and Y_t_1. Your code does not augment the states. Does this make sense?

2.- The Kalman Filter I want to implement is to predict the unobserved states, x and y, based on observed Y_t. Where is Y_t used in your code to predict the states?

Are my questions clear? FYI, I'm using the theory from Section 12.1 of the book "Time Series: Theory and Methods", Second edition, by Brockwell, P.J., and Davis, R.A., Springer-Verlag.

Thanks in advance for your help,

Victor

D. K

Jul 22, 2023, 6:00:07 AM
to pystatsmodels
Dear Victor, 

The previous code was an example. Without having access to the real data and the model, I provided a general example. 

You are correct in your understanding of data augmentation and the use of the Kalman filter to predict unobserved states based on observed variables. Let's clarify the steps to address your questions:

1. Data Augmentation:
You are correct that data augmentation involves increasing the number of states by stacking the original state vector and the observation variable. In this case, we want to augment the state space with an additional state, which is the lagged observation variable 'Y_t_1'. We can achieve this by modifying the state space matrix and observation matrix. Let's implement this:


```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Sample data
np.random.seed(42)
n_obs = 100
x = np.random.randn(n_obs)
y = 0.5 * x + np.random.randn(n_obs)

# Combine 'x' and 'y' into a multivariate time series
data = pd.DataFrame({'x': x, 'y': y})

# Order of AR and MA terms for x and y
order_x = (1, 0, 1)  # (AR, I, MA) for x
order_y = (1, 0, 0)  # (AR, I, MA) for y

# Create the multivariate ARIMA model
model = sm.tsa.VARMAX(data, order=(order_x, order_y))
results = model.fit(maxiter=1000, disp=False)

# Extract AR and MA coefficients
ar_params = results.arparams
ma_params = results.maparams

# Create the Selection Matrix 'R'
R = np.zeros((3, 4))  # 3 states (x, y, and Y_t_1), 4 parameters (2 AR and 2 MA)

R[0, :2] = ar_params[0]
R[0, 2:] = ma_params[0]
R[1, 1] = 1  # y state
R[2, 0] = 1  # Y_t_1 state


# Augment the data with t_1 observed variables
data_augmented = pd.DataFrame({'x': np.zeros(n_obs), 'y': np.zeros(n_obs)})
data_augmented.loc[1:, :] = data.values[:-1, :]

# Set up the state space representation with augmented states
mod = sm.tsa.statespace.SARIMAX(data_augmented, order=(order_x, order_y), k_states=3)


# Set the state space representation to use the selected R matrix
mod.ssm['design', :] = R

# Perform the prediction
res = mod.filter(results.params)
predicted_states = res.filtered_state[:, -3:]  # Get the predicted states for 'x', 'y', and 'Y_t_1'
```

Now, the state space representation is intended to hold three states: 'x', 'y', and the additional lagged state 'Y_t_1'.

2. Kalman Filter for Predicting Unobserved States:
The Kalman filter is used to estimate the unobserved states based on the observed variables. In this case, we want to predict 'x' and 'y' based on the observed variable 'Y_t'. The Kalman filter automatically handles this estimation during the filtering step, and we can access the predicted states using `res.filtered_state`.

The code provided above includes the Kalman filter step (`mod.filter`) to predict the unobserved states 'x' and 'y' based on the observed 'Y_t'. The `predicted_states` array is meant to contain the estimated values for 'x', 'y', and 'Y_t_1'.

I hope this clarifies the implementation of data augmentation and the use of the Kalman filter for state prediction based on observed variables.

Note: Code is not a magic box. You will have to make any modifications for your case manually.

D. K

Victor Saucedo

Jul 24, 2023, 1:19:48 PM
to pystatsmodels
Dear David,

Thank you very much for your prompt and useful reply. I agree that code is not magic and that modifications are needed for my data/code. So, before making many transformations, I'm trying to understand what your code does and how it does it.

From your code and using your proposed data and dataframe, I'm trying to learn the next few things. I would appreciate your comments:

# Create the multivariate ARIMA model
model = sm.tsa.VARMAX(data, order=(order_x, order_y))
results = model.fit(maxiter=1000, disp=False)


Here, I'm getting 
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[5], line 2
      1 # Create the multivariate ARIMA model
----> 2 model = sm.tsa.VARMAX(data, order=(order_x, order_y))
      3 results = model.fit(maxiter=1000, disp=False)

File ~\AppData\Local\miniconda3\envs\cadet\lib\site-packages\statsmodels\tsa\statespace\varmax.py:148, in VARMAX.__init__(self, endog, exog, order, trend, error_cov_type, measurement_error, enforce_stationarity, enforce_invertibility, trend_offset, **kwargs)
    145 self.order = order
    147 # Model orders
--> 148 self.k_ar = int(order[0])
    149 self.k_ma = int(order[1])
    151 # Check for valid model

TypeError: int() argument must be a string, a bytes-like object or a real number, not 'tuple'
---------------------

** I'm using statsmodels v0.14.0. Why can't it use the "order" parameter as you defined it?

For the augmented state, your code does not augment with an additional state. Instead I'm using,

# Augment the data with t_1 observed variables
y_t_1 = np.roll(y,-1)
data_augmented = pd.DataFrame({'x': x, 'y': y, 'Y_t_1': y_t_1})


so, it transforms data from (100 rows × 2 columns) to data_augmented with (100 rows × 3 columns). ** Does this make sense to you?
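One detail worth checking in the snippet above: `np.roll(y, -1)` shifts values toward the start of the array, so position t holds y at t+1 (a lead, not a lag), and the first value wraps around to the end. A small sketch with made-up numbers compares it with `pandas.Series.shift(1)`, which gives the t-1 value and leaves the first entry missing.

```python
import numpy as np
import pandas as pd

# Made-up values so the shift direction is easy to see.
y = np.array([10.0, 20.0, 30.0, 40.0])

lead_wrapped = np.roll(y, -1)   # [20, 30, 40, 10]: y_{t+1}, first value wraps to the end
lagged = pd.Series(y).shift(1)  # [NaN, 10, 20, 30]: y_{t-1}, first entry is missing

print(lead_wrapped)
print(lagged.to_numpy())
```

Note also that adding a column to the endogenous data adds an observed series; by itself it does not add a state to the state vector.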

Nevertheless, when I move on to building the state space with SARIMAX, using data_augmented as the endog, I get the following:

# Set up the state space representation with augmented states
mod = sm.tsa.statespace.SARIMAX(data_augmented, order=(order_x, order_y), k_states =3)


---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[11], line 2
      1 # Set up the state space representation with augmented states
----> 2 mod = sm.tsa.statespace.SARIMAX(data_augmented, order=(order_x, order_y), k_states =3)

File ~\AppData\Local\miniconda3\envs\cadet\lib\site-packages\statsmodels\tsa\statespace\sarimax.py:328, in SARIMAX.__init__(self, endog, exog, order, seasonal_order, trend, measurement_error, time_varying_regression, mle_regression, simple_differencing, enforce_stationarity, enforce_invertibility, hamilton_representation, concentrate_scale, trend_offset, use_exact_diffuse, dates, freq, missing, validate_specification, **kwargs)
    318 def __init__(self, endog, exog=None, order=(1, 0, 0),
    319              seasonal_order=(0, 0, 0, 0), trend=None,
    320              measurement_error=False, time_varying_regression=False,
   (...)
    325              freq=None, missing='none', validate_specification=True,
    326              **kwargs):
--> 328     self._spec = SARIMAXSpecification(
    329         endog, exog=exog, order=order, seasonal_order=seasonal_order,
    330         trend=trend, enforce_stationarity=None, enforce_invertibility=None,
    331         concentrate_scale=concentrate_scale, dates=dates, freq=freq,
    332         missing=missing, validate_specification=validate_specification)
    333     self._params = SARIMAXParams(self._spec)
    335     # Save given orders

File ~\AppData\Local\miniconda3\envs\cadet\lib\site-packages\statsmodels\tsa\arima\specification.py:267, in SARIMAXSpecification.__init__(self, endog, exog, order, seasonal_order, ar_order, diff, ma_order, seasonal_ar_order, seasonal_diff, seasonal_ma_order, seasonal_periods, trend, enforce_stationarity, enforce_invertibility, concentrate_scale, trend_offset, dates, freq, missing, validate_specification)
    265 # Validate shapes of `order`, `seasonal_order`
    266 if len(order) != 3:
--> 267     raise ValueError('`order` argument must be an iterable with three'
    268                      ' elements.')
    269 if len(seasonal_order) != 4:
    270     raise ValueError('`seasonal_order` argument must be an iterable'
    271                      ' with four elements.')

ValueError: `order` argument must be an iterable with three elements.

-------------------------------

I have not been able to find in the User's Guide any SARIMAX usage with the order parameter specified the way you are suggesting. Also, k_states does not seem to be a parameter for SARIMAX.

I really appreciate your time. If there is something I should learn better, please let me know.

Regards,

Victor S.