Non-positive-definite observation covariance matrix encountered at period 0

43 views
Skip to first unread message

D. K

unread,
May 30, 2023, 5:36:29 PM5/30/23
to pystatsmodels
Dear,

I am attempting to run a TVP-VAR model, exactly, as written on the model page:

https://www.statsmodels.org/stable/examples/notebooks/generated/statespace_custom_models.html#Model-1:-time-varying-coefficients

I get an error. 

Non-positive-definite observation covariance matrix encountered at period 0

This happens in step 14 of the model from your page

 mod.update_variances

 File statsmodels/tsa/statespace/_cfa_simulation_smoother.pyx:616 in statsmodels.tsa.statespace._cfa_simulation_smoother.dCFASimulationSmoother.update_sparse_posterior_moments 


LinAlgError: Non-positive-definite observation covariance matrix encountered at period 0 

I suspect it has to do with the data itself or the parameters. I have no exogenous variables; however, I have gaps in the data.

I set in the definitions of super" and "exog =none without being able to solve them. I also attempted to drop the gaps (although it violates my model) by setting df = df.dropna() and letting exog = z_t, but again, without being able to resolve (dropping na is not at all desirable), I get an occurrence in several rows at the beginning, which is certainly not an error.


but I am getting an error on the size of the update state_cov (I have numerous variables, resulting in a 756x756 matrix size for 78 years),
 
I would like to kindly ask you if you have the time to help me with some line code to solve the issue, eventually adding impulse response functions at the end of the code. Thank you very much in advance.

Best regards, 
David

Chad Fulton

unread,
Jun 6, 2023, 10:23:45 AM6/6/23
to pystat...@googlegroups.com
Hi David,

You're right, the TVP-VAR model in that notebook does not support missing values. However, you can make a few small adjustments that will accommodate them:

First, replace the first part of the __init__ method with:

        # Create a matrix with [y_t' : y_{t-1}'] for t = 2, ..., T
        augmented = sm.tsa.lagmat(y, 1, trim='both', original='in', use_pandas=True)
        augmented[augmented.isnull().any(axis=1)] = np.nan
        # Separate into y_t and z_t = [1 : y_{t-1}']
        p = y.shape[1]
        y_t = augmented.iloc[:, :p]
        z_t = sm.add_constant(augmented.iloc[:, p:])

        # Recall that the length of the state vector is p * (p + 1)
        k_states = p * (p + 1)
        super().__init__(y_t, k_states=k_states)
        self.exog = z_t.to_numpy()
        self.k_exog = self.exog.shape[1]


Then, you have to update the MCMC iterations to ignore the periods with missing values:

mask = ~mod.data.orig_endog.isnull().any(axis=1)

for i in range(niter):
    mod.update_variances(store_obs_cov[i], store_state_cov[i])
    sim.simulate()

    # 1. Sample states
    store_states[i + 1] = sim.simulated_state.T

    # 2. Simulate obs cov
    fitted = np.matmul(mod['design'].transpose(2, 0, 1), store_states[i + 1][..., None])[..., 0]
    resid = (mod.endog - fitted)[mask]
    store_obs_cov[i + 1] = invwishart.rvs(v10 + resid.shape[0], S10 + resid.T @ resid)

    # 3. Simulate state cov variances
    resid = store_states[i + 1, 1:] - store_states[i + 1, :-1]
    sse = np.sum(resid**2, axis=0)
   
    for j in range(mod.k_states):
        rv = invgamma.rvs((vi20 + mod.nobs - 1) / 2, scale=(Si20 + sse[j]) / 2)
        store_state_cov[i + 1, j] = rv


Finally, you cannot use the CFA simulation smoother in this case, so you need to use 

sim = mod.simulation_smoother()

rather than sim = mod.simulation_smoother(method='cfa').

If you would like to make a pull request to update the notebook with these changes, I'd be happy to merge that.

Finally, with regards to impulse response functions, we don't provide a way of computing them in this model. One reason is that in a time-varying model there is no single definition of the impulse responses, since those are functions of the time-varying parameter. But you can of course compute them yourself from the coefficient estimates for any particular time period.

Best,
Chad



--
You received this message because you are subscribed to the Google Groups "pystatsmodels" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pystatsmodel...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/pystatsmodels/d13f40b8-90a2-4475-ace0-d4037236472dn%40googlegroups.com.

D. K

unread,
Jun 12, 2023, 2:54:55 PM6/12/23
to pystatsmodels

Hi Chad,
I thank you very much for the help and the code.

As a general comment, I think you should update the page at some point, with the knowledge and interaction of the various users.

As a preface to my code, I arrived at the last points 16 and 17 of the guide, to explore the credible intervals of each of the variance, covariance, and variance parameters. I was given the code by a colleague in a slightly different way, trying to overcome the resulting error of not producing a graph and handling variables with one or fewer data points and excluding them from the plots., as it turns out that to be the cause.

anaconda3/lib/python3.8/site-packages/arviz/stats/density_utils.py:606: UserWarning: Something failed: `x` is of length 1. Can't produce a KDE with only one data point.
  warnings.warn("Something failed: " + str(e))
<Figure size 576x432 with 0 Axes>.
Traceback (most recent call last):


For both points 16 and 17, it is the same error, It produces only empty plots. The code I was given for the credible interval is the following. I would appreciate any ideas or help. 

Best,

David

  



import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import arviz as az

# Define your variable_names and data
variable_names = [ MY LIST]
 

# Load your data into a pandas DataFrame
data = pd.read_csv('data.csv')

import warnings
# Exclude variables with one or fewer data points
variable_names_filtered = []
for var in variable_names:
    if data[var].nunique() > 1:
        variable_names_filtered.append(var)
    else:
        warnings.warn(f"Variable '{var}' has one or fewer data points and will be excluded.")

# Print the filtered variable names
print("Variable names filtered:", variable_names_filtered)

# Check if there are enough data points for KDE plot
if len(variable_names_filtered) > 0:
    # Retrieve the parameter covariances
    param_cov = results.cov_params()

    # Retrieve the parameter variances
    param_variances = np.diagonal(param_cov)

    # Store the state innovation variance parameters in a dictionary
    state_var_dict = {
        variable_names_filtered[i]: param_variances[i]
        for i in range(len(variable_names_filtered))
    }

    # Convert the state innovation variance parameters to an InferenceData object
    az_state_var = az.convert_to_inference_data(state_var_dict)

    # Plot the credible intervals using ArviZ for each variable
    for var in variable_names_filtered:
        var_data = data[var]
       
        if var_data.nunique() > 1:
            plt.figure(figsize=(8, 6))
            az.plot_forest(az_state_var, var_names=[var], kind='ridgeplot')
            plt.xlabel('Credible Intervals')
            plt.title(f'Credible Intervals - {var}')
            plt.show()
        else:
            print(f"The variable '{var}' has insufficient variability and will not be plotted.")

    # Plot the credible intervals using ArviZ for each pair of variables
    pair_variables = variable_names_filtered

    for var1 in pair_variables:
        for var2 in pair_variables:
            if var1 != var2:
                var1_data = data[var1]
                var2_data = data[var2]
               
                if var1_data.nunique() > 1 and var2_data.nunique() > 1:
                    plt.figure(figsize=(8, 6))
                    az.plot_forest(az_state_var, var_names=[var1, var2], kind='ridgeplot')
                    plt.xlabel('Credible Intervals')
                    plt.title(f'Credible Intervals - {var1} vs {var2}')
                    plt.show()
                else:
                    print(f"The pair of variables '{var1}' vs '{var2}' has insufficient variability and will not be plotted.")
else:
    print("Insufficient data points to create a KDE plot.")
Reply all
Reply to author
Forward
0 new messages