Hello
I'm trying to set up a panel TVP-VAR using statsmodels, starting with a local model.
I am following the tutorial on their website step by step.
Coming from Stata, I am confused about how to properly reshape my data to match what statsmodels expects.
The data are saved in a long-format Stata file, as shown in the attached screenshot.
There is an identifier (id), year, country, and then a set of thirty (30) variables, say variable1 to variable30, for each country and year: a typical long panel data format.
I am getting the following error:
An unsupported index was provided and will be ignored when e.g. forecasting. self._init_dates(dates, freq)
So, my first question is how to properly reshape my data so that it is compatible with statsmodels for a local and a panel TVP-VAR model.
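For reference, here is a minimal sketch of the single-country case. The column names (country, year, variable1, variable2) and the values are made up for illustration. As far as I understand, statsmodels time-series models expect a DatetimeIndex or PeriodIndex with a frequency (or a plain integer range), so a (country, year) MultiIndex is a likely trigger for the message above.

```python
import numpy as np
import pandas as pd

# Hypothetical long-format panel: one row per (country, year).
long_df = pd.DataFrame({
    "country": ["AUS"] * 4 + ["CAN"] * 4,
    "year": [2000, 2001, 2002, 2003] * 2,
    "variable1": np.arange(8, dtype=float),
    "variable2": np.arange(8, dtype=float) * 2.0,
})

# Keep a single country and make sure the rows are in time order.
one = long_df[long_df["country"] == "AUS"].sort_values("year")

# Build an annual PeriodIndex from the year column (strings parse as years).
idx = pd.PeriodIndex(one["year"].astype(str), freq="Y", name="year")

# Leave only the numeric series as columns and attach the time index.
one = one.drop(columns=["country", "year"])
one.index = idx
```

The resulting frame has one row per year and one column per variable, which is the layout the single-country tutorial data uses.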
The second error, which I get when I run the TVP-VAR model, is:
exog contains inf or nans
I do have gaps in the data, of course. I run two types of VAR models. In the first one, all my variables are endogenous. In the second one, I treat some variables as exogenous, mostly dummies. How can this be solved? Could just setting exog=None be a solution? Apart from the attached screenshot, a small sample of my data is available at the following link:
https://drive.google.com/file/d/1YmKseNKEGZTQk_II4fOwgUVfZLAgVqJT/view?usp=share_link
For the first question I set up the panel framework as follows:
%matplotlib inline
from importlib import reload
import numpy as np
import pandas as pd
import statsmodels.api as sm
import matplotlib.pyplot as plt
from scipy.stats import invwishart, invgamma
import pyreadstat

# 1. Read the Stata file; meta carries the variable names and labels
dtafile = 'panel.dta'
dta, meta = pyreadstat.read_dta(dtafile)
dta.tail()
labels = list(meta.column_labels)
column = list(meta.column_names)

# Panel data settings: index by (country, year), keep year as a categorical column
year = pd.Categorical(dta.year)
dta = dta.set_index(["country", "year"])
dta["year"] = year
dta.head()
Thank you in advance for your help.
--
You received this message because you are subscribed to the Google Groups "pystatsmodels" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pystatsmodel...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/pystatsmodels/befd912b-a24b-43af-ad78-cefc68fe2029n%40googlegroups.com.
Thank you so much for your reply. I have realized that my initial question was unclear.
My model is a panel TVP-VAR cast as a normal linear state space model, composed of a state equation and a measurement equation, which I have managed to write as in eq. 33 of Canova and Ciccarelli (2013).
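The key model equation is attached. In it, the regressor matrix is a transformation of $X_t$, the composite error combines $X_t'$ with $u_t$, and that error is normal with mean $0$ and a covariance of the form $(I + \sigma^2 X_t' X_t)$.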
I use exactly this class of models from your site: TVP-VAR, MCMC, and sparse simulation smoothing.
https://www.statsmodels.org/devel/examples/notebooks/generated/statespace_tvpvar_mcmc_cfa.html
When I run the local model, I get the attached local graph for 'Simulations based on KFS approach, MLE parameters' and 'Simulations based on CFA approach, MLE parameters', where some countries and years appear in an unexpected format. I suspect it has to do with the data shape I am using. You can see my actual data shape in the attached local screenshot.
When I run 'Simulations with alternative parameterization yielding a smoother trend', one of the errors I get is "'value' must be an instance of str or bytes, not a tuple."
This is in addition to the earlier "An unsupported index was provided and will be ignored when e.g. forecasting. self._init_dates(dates, freq)".
I suspect this has to do with my data shape and index. Because I created my dataset in Stata, it is in long format.
My question is a bit naive. How do I reshape my data so that it is compatible with statsmodels? How do I rewrite my code to bring my data into an acceptable shape to run the TVP-VAR, MCMC, and sparse simulation smoothing example?
I hope it is clear what I am looking for. The code I am currently using is the same as in my first message above.
Thank you very much, Chand, for your contribution so far. I apologize if I am boring you; I feel bad, especially toward those who are trying to assist me. I would appreciate your understanding. However, there are two or three things I have not understood about the "pre-data construction". The first has to do with the construction of the data; the others are more theoretical.
Following your suggestion, I changed the data so that the time variable (year) is the index and all the variables are string-named columns, and I set the index to year in the following way.
I added the ISO 3-character country code as a prefix to each variable name. For example, the variable burden becomes AUS_burden for Australia, CAN_burden for Canada, and so on. I have grouped them by country, as Screenshot 1 shows. I am new to Python, so the change was made in Stata, but that doesn't matter. A second grouping is by variable, as Screenshot 2 shows. My question is which one is correct, Screenshot 1 or Screenshot 2?
Mainly, I am wondering how to select the variable I am interested in, given that there are now 80 countries in the panel and over 1800 columns. These are the variables with the country prefix. That is, I end up with many columns for the same underlying variable, e.g. AUS_burden, CAN_burden, etc. instead of a single burden. How can I have only one? I found this construction in a paper by Korobilis on PVARs. I read somewhere that "perhaps there should be a list or dictionary." I don't know how this is done, or whether it can be done. In general, I understand the correct layout in the case of a single country, as you explained, and I tested it on my data, but how is it done for the panel? My original long-format data had about thirty variables or more for 80 countries.
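To make the two layouts concrete, here is a small sketch of going from long to wide in pandas instead of Stata. The data and the names (burden, growth, AUS, CAN) are invented for illustration, and this is only one possible way to organize the columns. It shows that grouping by country or by variable only changes the column order, and that all columns for one variable can be selected either from a column MultiIndex or by name suffix.

```python
import pandas as pd

# Hypothetical long-format panel with two countries and two variables.
long_df = pd.DataFrame({
    "country": ["AUS", "AUS", "CAN", "CAN"],
    "year": [2000, 2001, 2000, 2001],
    "burden": [1.0, 2.0, 3.0, 4.0],
    "growth": [0.1, 0.2, 0.3, 0.4],
})

# Wide layout: one row per year, one column per (variable, country) pair.
wide = long_df.pivot(index="year", columns="country", values=["burden", "growth"])

# With MultiIndex columns, one variable for all countries is a single lookup.
burden = wide["burden"]  # columns are the country codes

# Flatten to prefixed names such as AUS_burden when flat names are needed.
flat = wide.copy()
flat.columns = [f"{country}_{var}" for var, country in wide.columns]

# From the flat layout, pick every country's column for one variable by suffix.
burden_flat = flat.filter(regex=r"_burden$")
```

Here `burden` and `burden_flat` hold the same numbers; only the column labels differ.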
My code now is:
import pandas as pd
import pyreadstat
dtafile = 'panel.dta'
dta, meta = pyreadstat.read_dta(dtafile)
dta.tail()
labels = list(meta.column_labels)
column = list(meta.column_names)
# Panel data settings
year = pd.date_range('1945', freq='A', periods=76)
dta = dta.set_index(["year"])
dta.head()
Second, in the local model with an AR(1) process, the series should not be stationary. Am I understanding this wrongly?
Moreover, at point 7 of the TVP-VAR code from your site, I get a MissingDataError: exog contains inf or nans. I also tried it with a single country. Of course, I have gaps in the data, but how do I solve this without deleting rows or taking averages?
Finally, in the same model, how can I add Cholesky time-varying impulse response functions?
Thank you so much. I really apologize if I have asked too much.
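As a diagnostic only, the sketch below shows one way to locate the gaps before deciding how to treat them. The column names and values are hypothetical, and it does not fill or drop anything; it just reports which columns and which years contain NaN or inf.

```python
import numpy as np
import pandas as pd

# Hypothetical wide data with one NaN and one inf, indexed by year.
wide = pd.DataFrame(
    {
        "AUS_burden": [1.0, np.nan, 3.0, 4.0],
        "CAN_burden": [2.0, 2.5, np.inf, 3.5],
    },
    index=pd.period_range("2000", periods=4, freq="Y"),
)

# Treat +/-inf as missing so they are counted together with NaN.
clean = wide.replace([np.inf, -np.inf], np.nan)

# Number of missing observations in each column.
missing_per_column = clean.isna().sum()

# Periods in which at least one column is missing.
rows_with_gaps = clean.index[clean.isna().any(axis=1)]
```

Printing `missing_per_column` and `rows_with_gaps` gives a quick picture of where the gaps sit in the sample.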
David