To understand regression analysis with dummy variables, let us take the example of using a dummy variable to capture structural changes in an economy. For example, there was a structural change in the U.S. during 1981-1982, and also a severe recession in 2007-2008. When we work with time-series data, such structural changes have an effect on our regression analysis. To check the effects of such structural changes, we add a suitable dummy variable and then run our regression.
I am taking the example of the economy of Pakistan, which had a structural change in its monetary policy in 1993. The data run from 1976 to 2015; the source is the State Bank of Pakistan's Statistical Handbook.
To understand the structural changes of the economy, we first have to specify our model of dependent and independent variables. The independent variables, CPI and real GDP, have an overall effect on the money supply (broad money, M2) of the economy. Thus, we can write the equation in the following way:
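The equation itself does not appear to have survived in this copy. Based on the variables just named and the double-log form adopted below, it would presumably take a form along these lines (the coefficient notation is mine, not the author's):

```latex
\ln(M2_t) = \beta_0 + \beta_1 \ln(CPI_t) + \beta_2 \ln(RGDP_t) + u_t
```

with a structural-change dummy added to this baseline later in the analysis.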
To understand the structural change in the economy of Pakistan, we take a time series from 1976 to 2015, sourced from the Handbook of Statistics on Pakistan Economy published by the State Bank of Pakistan. For the above equation, the following data were collected:
We will take the natural log of both sides of the equation (a double-log, or log-log, specification). Econometricians use natural logs in regression analysis for various reasons; for this equation, the key advantage is that the slope coefficients can be read directly as elasticities.
With regard to the independent variable RGDP, the estimate shows that a 1% increase in real GDP leads to a 0.9% decrease in MS. The p-value for this coefficient is significant at the 5% level, which shows there is a strong relationship between real GDP and MS. Let us add a structural-change dummy variable to understand the exact effects.
In the case of the inflation rate, however, the broad money supply M2 decreases by 0.02% when the inflation rate increases by 1%. The p-value for this relationship is very weak (0.96).
The coefficient on the 1993 monetary-policy structural-change dummy is about 2.91 and is statistically significant, which shows that there has been a change in the money supply between the pre- and post-1993 periods.
The R² value of 72% and the adjusted R² value of 74% show how well the independent variables, inflation and real GDP, explain the model; higher R² values indicate a regression with a good fit. (Note that we almost always get a high R² with time-series data because of the strong trends within the observations, whereas with cross-sectional data we tend to get a low R².)
Thus, including the dummy variable for the 1993 structural change in monetary policy in the regression yields a highly significant p-value, which leads us to conclude that there has been a structural change in the economy.
I have developed a fairly simple multivariate regression econometrics model. I am now attempting to run robust regressions (EViews calls them Robust Least Squares). I can easily run a robust regression with M-estimation. But every time I run a robust regression with MM-estimation I run into the same error: "Maximum number of singular subsamples reached." I have played around with the MM-estimation specifications by increasing/decreasing the number of iterations, the convergence level, etc. Invariably, I run into the same error.
At an EViews forum, another user ran into the exact same problem with both MM-estimation and S-estimation. The forum moderator indicated that if a model includes dummy variables without that many observations, such estimations may not reach convergence and will generate the error mentioned above. My model does have dummy variables, and some of them do not have that many observations (8 consecutive ones out of a time series with 217 observations). However, I am unclear whether this is a limitation of EViews or truly a limitation of the algorithm. I may attempt to rerun the MM-estimation in R and see if it is feasible.
Following up on the above, I did just that and ran a robust regression in R with the MASS package, using the rlm() function. Just as in EViews, I had no problem running an M-estimation. Similarly, I first ran into trouble when attempting an MM-estimation: just as in EViews, I got an error message stating the regression did not reach convergence after 20 iterations. So I reran my MM-estimation after first eliminating all my dummy variables. As predicted, it worked. Next, I added a single dummy variable at a time, rerunning the MM-estimation each time to observe when the model would break down. To my surprise, it never did, and eventually I could run my MM-estimation with all the dummy variables. I don't know why I could not run it at first with all the dummy variables in at once (maybe I made a coding error).
This leads me to conclude that R is somewhat more flexible than EViews on this count. On closer inspection, I noticed that the EViews M-estimation I ran was of the bisquare type (vs. the regular Huber one). This makes a big difference. When I ran an M-estimation of the bisquare type in R, I got almost exactly the same results as in EViews. The small differences between the two are to be expected, given that the solving process is iterative.
As you can read in my commentary, I did quite a bit of work on this issue. In the end, I am unclear why EViews consistently fails when running a robust regression of the MM-estimation type on a model that has a few dummy variables. I feel it should not. The exact same model using the same robust-regression methodology was solvable in R with the MASS package and the rlm() function using method = "MM".
In case you find yourself in similar circumstances, I would advise you to attempt the MM-type robust regression in R instead. I don't know how resilient this process is in SAS, SPSS, Python, Stata, and other similar software. Hopefully, they are more resilient than EViews on this count.
It is not unlikely that this type of model can actually cause the software to fail (after numerous iterations, the algorithm does not converge to a solution). But if my experience is any indicator, R has a much higher resilience threshold than EViews on this count.
I first found the trend that best approximates the data, then determined the relevant seasonal dummies (there were none), and then looked at the correlogram of my new model to determine the AR process.
Once I added in my lagged variables, my trend was rendered statistically insignificant, so I took the trend out. Additionally, I thought it was odd that I got no seasonality. So, on a whim, I decided to input my seasonal dummy variables after adding in my lags, and season 4 was then highly significant!
I then experimented with adding different seasons and found that, of all the models I constructed, the one with the lowest SIC (Schwarz Information Criterion, as listed in the output below) has two lags and one seasonal dummy.
The most important question is about the role of lags in my model. Why would controlling for serial correlation first allow me to see seasonality? Does the order in which one controls for certain dynamics matter? Can I control for cycles, then seasons, then trend, or should these dynamics be controlled for in a specific sequence?
Seasonal dummies and seasonal autoregressive structure are often "competitors" when searching for the best model. Your significant lag-1 and lag-2 effects often enable a clearer picture as to which approach is best, i.e., seasonal dummies or seasonal memory (ARIMA). This is why we recommend and implement a tournament approach to identifying an initial model. Care must also be taken to incorporate level shifts and/or multiple trends while dealing with pulses. If you wish, I would be glad to be more specific by actually analyzing your data. Other readers of the list might also chime in and provide productive approaches/solutions using software of their choice.
With respect to the "why?": when you introduce additional significant structure, you reduce the error sum of squares of the model; thus you reduce the standard errors of all estimated, non-redundant coefficients, and so we often see increased levels of statistical significance.
This blog post shows how an unknown structural break can be found for any variable. The following illustration is available only in EViews 8 and onward; you can get a demo version of EViews from the EViews website.
Thanks a lot for all your effort to support scientific research. I have 6 variables in my model (1 dependent and 5 independent) and 31 years of data, and I want to apply a breakpoint test in order to create a dummy variable.
What should I do in this case:
1- Should I make one dummy for all variables together? Please explain how.
2- Or should I make a dummy for every variable, as explained in the example above (i.e., 6 dummy variables)?
This problem really confuses me. I am waiting for your help.
The IMPRI Generation Alpha Data Centre (GenAlphaDC) at the IMPRI Impact and Policy Research Institute, New Delhi, conducted a two-week immersive online hands-on certificate training course, Exploratory Data Analysis with Categorical Variables: Regression Models, Dummy Variables, and Logit/Probit using EViews, on December 10 and 17, 2022. The expert trainer for the course was Nilanjan Banik, Professor at Mahindra University. He is a Visiting Consultant at IMPRI, an Academic Consultant with Geneva Network, United Kingdom, and a Senior Consultant with Hankuk University of Foreign Studies, South Korea.
Second, he mentioned that a dummy variable can capture any break or shift in the data. He used the example of the Indian economic reforms of 1991, which were a breakpoint in per capita GDP levels: after 1991 there was a big jump in GDP growth; in other words, a structural break. Dummy variables can capture such structural breaks. Third, he mentioned that dummy variables can also be used to de-seasonalize data. Using Excel, he showed how to incorporate dummy variables in a regression model and why dropping one dummy variable is important in order to avoid the dummy trap. He also showed how to de-seasonalize the data in Excel; after de-seasonalizing, the graph turned out to be more stable than before.
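The dummy-trap point generalizes beyond Excel: with an intercept in the model, one category must always be dropped, or the dummy columns sum to the constant and the design matrix is singular. In Python, pandas handles this directly (the quarterly example below is mine, not from the course):

```python
# The dummy trap: four quarterly dummies plus an intercept are perfectly
# collinear, so one category must be dropped as the base.
import pandas as pd

quarters = pd.Series(["Q1", "Q2", "Q3", "Q4"] * 3, name="quarter")
full = pd.get_dummies(quarters)                    # 4 columns: trap with an intercept
safe = pd.get_dummies(quarters, drop_first=True)   # 3 columns: Q1 is the base category
print(list(full.columns))   # ['Q1', 'Q2', 'Q3', 'Q4']
print(list(safe.columns))   # ['Q2', 'Q3', 'Q4']
```

Each retained dummy's coefficient is then read as the difference from the omitted base category.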