Principle Of Econometrics 5th Edition Solution


Namuncura Mckoy

Aug 4, 2024, 9:04:25 PM

to ensifusna

In statistics, ordinary least squares (OLS) is a type of linear least squares method for choosing the unknown parameters in a linear regression model (with fixed level-one effects of a linear function of a set of explanatory variables) by the principle of least squares: minimizing the sum of the squares of the differences between the observed values of the dependent variable in the input dataset and the output of the (linear) function of the independent variables. Some sources consider OLS to be linear regression.[1]

Consider the linear model in matrix form

$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon},$$

where $\mathbf{y}$ and $\boldsymbol{\varepsilon}$ are $n \times 1$ vectors of the response variables and the errors of the $n$ observations, and $\mathbf{X}$ is an $n \times p$ matrix of regressors, also sometimes called the design matrix, whose row $i$ is $\mathbf{x}_i^{\mathrm{T}}$ and contains the $i$-th observations on all the explanatory variables.

Regressors do not have to be independent for estimation to be consistent; for example, they may be non-linearly dependent. Perfect multicollinearity (an exact linear dependence among the regressors) makes the parameters unidentifiable, while strong but imperfect multicollinearity leaves estimation consistent but inflates the standard errors of the estimates. As a concrete example where regressors are not independent, we might suspect the response depends linearly both on a value and its square, in which case we would include one regressor whose value is just the square of another regressor. In that case, the model would be quadratic in the second regressor, but is nonetheless still considered a linear model because it is linear in the parameters ($\boldsymbol{\beta}$).
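
As a small illustration of a model that is quadratic in a regressor but linear in the parameters, the following sketch fits a constant, a value, and its square by least squares. The data are hypothetical and noise-free, and the use of NumPy's `lstsq` routine is an assumption made for illustration, not something prescribed by the text.

```python
import numpy as np

# Hypothetical noise-free data generated from y = 1 + 2x + 3x^2.
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
y = 1.0 + 2.0 * x + 3.0 * x**2

# Design matrix: constant, x, and x squared. The third column is a
# deterministic (non-linear) function of the second, yet the columns
# remain linearly independent, so the parameters are identified.
X = np.column_stack([np.ones_like(x), x, x**2])
rank = np.linalg.matrix_rank(X)

# Least-squares fit: the model is linear in beta although quadratic in x.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(rank, beta_hat)
```

Because the squared regressor is not a linear combination of the other columns, the design matrix keeps full column rank and the three coefficients are recovered.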

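The overdetermined system $\mathbf{X}\boldsymbol{\beta} = \mathbf{y}$ usually has no exact solution, so the goal is instead to find the coefficients $\boldsymbol{\beta}$ which fit the equations "best", in the sense of solving the quadratic minimization problem

$$\hat{\boldsymbol{\beta}} = \underset{\boldsymbol{\beta}}{\operatorname{arg\,min}}\; S(\boldsymbol{\beta}), \qquad S(\boldsymbol{\beta}) = \lVert \mathbf{y} - \mathbf{X}\boldsymbol{\beta} \rVert^2 .$$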

A justification for choosing this criterion is given by the properties discussed below. This minimization problem has a unique solution, provided that the $p$ columns of the matrix $\mathbf{X}$ are linearly independent, given by solving the so-called normal equations:

$$\left(\mathbf{X}^{\mathrm{T}}\mathbf{X}\right)\hat{\boldsymbol{\beta}} = \mathbf{X}^{\mathrm{T}}\mathbf{y}.$$

The matrix $\mathbf{X}^{\mathrm{T}}\mathbf{X}$ is known as the normal matrix or Gram matrix, and the matrix $\mathbf{X}^{\mathrm{T}}\mathbf{y}$ is known as the moment matrix of regressand by regressors.[3] Finally, $\hat{\boldsymbol{\beta}}$ is the coefficient vector of the least-squares hyperplane, expressed as

$$\hat{\boldsymbol{\beta}} = \left(\mathbf{X}^{\mathrm{T}}\mathbf{X}\right)^{-1}\mathbf{X}^{\mathrm{T}}\mathbf{y}.$$
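
The normal-equations solution can be sketched numerically as follows. The simulated data, the seed, and the variable names (`gram`, `moment`, `beta_normal`) are illustrative assumptions. The sketch solves the linear system directly rather than forming an explicit inverse, which is generally the more stable choice.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, illustrative data only
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + rng.normal(scale=0.1, size=n)

gram = X.T @ X    # normal (Gram) matrix, shape (p, p)
moment = X.T @ y  # moments of regressand by regressors, length p

# Solve (X'X) beta = X'y without computing the inverse explicitly.
beta_normal = np.linalg.solve(gram, moment)

# Cross-check against NumPy's general least-squares routine.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_normal, beta_lstsq)
```

Both routes give the same coefficient vector whenever the columns of X are linearly independent.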

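Equivalently, for a candidate parameter vector $b$, the sum of squared residuals is

$$S(b) = \sum_{i=1}^{n} \left(y_i - x_i^{\mathrm{T}} b\right)^2 = (y - Xb)^{\mathrm{T}}(y - Xb),$$

where T denotes the matrix transpose, and the rows of X, denoting the values of all the independent variables associated with a particular value of the dependent variable, are $X_i = x_i^{\mathrm{T}}$. The value of b which minimizes this sum is called the OLS estimator for β. When X has full column rank, the function S(b) is quadratic in b with positive-definite Hessian, and therefore possesses a unique global minimum at $b = \hat{\beta}$, which can be given by the explicit formula:[5]

$$\hat{\beta} = \underset{b}{\operatorname{arg\,min}}\; S(b) = \left(X^{\mathrm{T}}X\right)^{-1}X^{\mathrm{T}}y.$$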

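It is common to assess the goodness-of-fit of the OLS regression by comparing how much the initial variation in the sample can be reduced by regressing onto X. The coefficient of determination $R^2$ is defined as the ratio of "explained" variance to the "total" variance of the dependent variable y, in the cases where the total sum of squares equals the regression sum of squares plus the sum of squares of residuals (as it does when the regression includes a constant):[13]

$$R^2 = \frac{\sum_i (\hat{y}_i - \bar{y})^2}{\sum_i (y_i - \bar{y})^2} = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}.$$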
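
A minimal sketch of the goodness-of-fit calculation, using made-up data and a regression that includes a constant. The names `tss`, `ess`, and `rss` are illustrative labels for the total, explained, and residual sums of squares.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

X = np.column_stack([np.ones_like(x), x])  # regression includes a constant
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
y_fit = X @ beta_hat

tss = np.sum((y - y.mean()) ** 2)      # total sum of squares
ess = np.sum((y_fit - y.mean()) ** 2)  # explained (regression) sum of squares
rss = np.sum((y - y_fit) ** 2)         # residual sum of squares

r2_explained = ess / tss      # share of variation explained
r2_residual = 1.0 - rss / tss # same quantity via the residuals
print(r2_explained, r2_residual)
```

The two expressions agree here because the constant term makes the total sum of squares split exactly into explained and residual parts. Without a constant they can differ.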

Introducing $\hat{\boldsymbol{\gamma}}$ and a matrix $\mathbf{K}$, with the assumption that the matrix $[\mathbf{X}\ \mathbf{K}]$ is non-singular and $\mathbf{K}^{\mathrm{T}}\mathbf{X} = 0$ (cf. orthogonal projections), the residual vector should satisfy the following equation:

$$\hat{\mathbf{r}} := \mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}} = \mathbf{K}\hat{\boldsymbol{\gamma}}.$$

Another way of looking at it is to consider the regression line to be a weighted average of the lines passing through the combination of any two points in the dataset.[14] Although this method of calculation is more computationally expensive, it provides better intuition for OLS.
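
For simple regression with one regressor, the weighted-average view can be checked directly. The sketch below assumes weights proportional to the squared horizontal distance between each pair of points, which is one standard way to make the statement precise. The data are made up.

```python
from itertools import combinations

x = [1.0, 2.0, 4.0, 7.0]
y = [1.5, 3.0, 4.5, 9.0]
n = len(x)

# OLS slope from the usual closed form for simple regression.
x_bar, y_bar = sum(x) / n, sum(y) / n
slope_ols = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)

# Weighted average of the slopes of the lines through every pair of
# points, with weight (x_i - x_j)^2 for the pair (i, j).
num = den = 0.0
for i, j in combinations(range(n), 2):
    dx, dy = x[i] - x[j], y[i] - y[j]
    if dx != 0.0:  # pairs with equal x carry zero weight
        num += dx**2 * (dy / dx)
        den += dx**2
slope_pairs = num / den
print(slope_ols, slope_pairs)
```

The loop visits n(n-1)/2 pairs, which is why this route costs more than the closed form even though it yields the same slope.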

The moment conditions $\operatorname{E}\left[x_i\left(y_i - x_i^{\mathrm{T}}\beta\right)\right] = 0$ state that the regressors should be uncorrelated with the errors. Since $x_i$ is a p-vector, the number of moment conditions is equal to the dimension of the parameter vector β, and thus the system is exactly identified. This is the so-called classical GMM case, when the estimator does not depend on the choice of the weighting matrix.
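
The sample analogue of these moment conditions can be verified numerically: at the OLS estimate, the residuals are orthogonal to every regressor. The simulated data and names below are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed, illustrative data only
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.uniform(size=n)])
y = X @ np.array([0.5, 1.0, -1.5]) + rng.normal(size=n)

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ beta_hat

# Sample moments (1/n) * sum_i x_i * (y_i - x_i' beta_hat): one per parameter.
sample_moments = X.T @ residuals / n
print(sample_moments)
```

There are as many sample moments as parameters, so all of them can be set to zero (up to floating-point error) simultaneously, which is the exactly identified case.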

There are several different frameworks in which the linear regression model can be cast in order to make the OLS technique applicable. Each of these settings produces the same formulas and the same results. The only difference is the interpretation and the assumptions which have to be imposed in order for the method to give meaningful results. The choice of the applicable framework depends mostly on the nature of the data at hand, and on the inference task which has to be performed.

One of the lines of difference in interpretation is whether to treat the regressors as random variables, or as predefined constants. In the first case (random design) the regressors $x_i$ are random and sampled together with the $y_i$'s from some population, as in an observational study. This approach allows for more natural study of the asymptotic properties of the estimators. In the other interpretation (fixed design), the regressors X are treated as known constants set by a design, and y is sampled conditionally on the values of X as in an experiment. For practical purposes, this distinction is often unimportant, since estimation and inference are carried out while conditioning on X. All results stated in this article are within the random design framework.

The classical model focuses on "finite sample" estimation and inference, meaning that the number of observations n is fixed. This contrasts with the other approaches, which study the asymptotic behavior of OLS as the number of observations grows large.

First of all, under the strict exogeneity assumption the OLS estimators $\hat{\beta}$ and $s^2$ are unbiased, meaning that their expected values coincide with the true values of the parameters:[23]

$$\operatorname{E}[\hat{\beta} \mid X] = \beta, \qquad \operatorname{E}[s^2 \mid X] = \sigma^2.$$
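
Unbiasedness can be illustrated with a small Monte Carlo experiment. This is a sketch under assumed conditions: the design matrix is held fixed, and the errors are independent normal draws with constant variance, which satisfies strict exogeneity. The sample size, seed, and number of replications are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(42)  # fixed seed so the run is reproducible
n, p, sigma = 30, 2, 1.0
beta = np.array([1.0, 2.0])
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # fixed across replications

reps = 2000
beta_draws = np.empty((reps, p))
s2_draws = np.empty(reps)
for r in range(reps):
    eps = rng.normal(scale=sigma, size=n)  # errors drawn independently of X
    y = X @ beta + eps
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    beta_draws[r] = b
    s2_draws[r] = resid @ resid / (n - p)  # s^2 divides by n - p

beta_mean = beta_draws.mean(axis=0)  # should be close to beta
s2_mean = s2_draws.mean()            # should be close to sigma^2
print(beta_mean, s2_mean)
```

Averaged over many replications, the estimates settle near the true values. Any single replication can still miss by a noticeable margin.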

If strict exogeneity does not hold (as is the case with many time series models, where exogeneity is assumed only with respect to the past shocks but not the future ones), then these estimators will be biased in finite samples.

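Under homoscedastic, uncorrelated errors, the conditional covariance matrix of the estimator is $\operatorname{Var}[\hat{\beta} \mid X] = \sigma^2 \left(X^{\mathrm{T}}X\right)^{-1}$. In particular, the standard error of each coefficient $\hat{\beta}_j$ is equal to the square root of the j-th diagonal element of this matrix. The estimate of this standard error is obtained by replacing the unknown quantity $\sigma^2$ with its estimate $s^2$. Thus,

$$\widehat{\operatorname{s.e.}}(\hat{\beta}_j) = \sqrt{s^2 \left[\left(X^{\mathrm{T}}X\right)^{-1}\right]_{jj}}.$$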
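
The standard-error calculation described here can be sketched as follows. The data are simulated, and the variable names (`XtX_inv`, `cov_hat`, `std_err`) are illustrative assumptions. The explicit inverse is used because its diagonal is needed.

```python
import numpy as np

rng = np.random.default_rng(7)  # fixed seed, illustrative data only
n, p = 40, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = X @ np.array([2.0, 0.5, -1.0]) + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
resid = y - X @ beta_hat

s2 = resid @ resid / (n - p)         # estimate of sigma^2
cov_hat = s2 * XtX_inv               # estimated covariance matrix of beta_hat
std_err = np.sqrt(np.diag(cov_hat))  # one standard error per coefficient
print(beta_hat, std_err)
```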

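By the Gauss-Markov theorem, under the classical assumptions the OLS estimator is the best linear unbiased estimator: for any other linear unbiased estimator $\tilde{\beta}$,

$$\operatorname{Var}[\tilde{\beta} \mid X] - \operatorname{Var}[\hat{\beta} \mid X] \geq 0$$

in the sense that this is a nonnegative-definite matrix. This theorem establishes optimality only in the class of linear unbiased estimators, which is quite restrictive. Depending on the distribution of the error terms ε, other, non-linear estimators may provide better results than OLS.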

As was mentioned before, the estimator $\hat{\beta}$ is linear in y, meaning that it represents a linear combination of the dependent variables $y_i$. The weights in this linear combination are functions of the regressors X, and generally are unequal. The observations with high weights are called influential because they have a more pronounced effect on the value of the estimator.

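To analyze which observations are influential, we remove a specific j-th observation and consider how much the estimated quantities are going to change (similarly to the jackknife method). It can be shown that the change in the OLS estimator for β will be equal to [30]

$$\hat{\beta}^{(j)} - \hat{\beta} = -\frac{1}{1 - h_j}\left(X^{\mathrm{T}}X\right)^{-1} x_j\, \hat{\varepsilon}_j,$$

where $h_j = x_j^{\mathrm{T}}\left(X^{\mathrm{T}}X\right)^{-1}x_j$ is the j-th diagonal element of the hat matrix, $x_j$ is the vector of regressors for the j-th observation, and $\hat{\varepsilon}_j$ is the j-th OLS residual.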
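
The leave-one-out change can be checked against a brute-force refit. This sketch uses the standard closed form in terms of the leverage of the dropped observation. The simulated data and the choice of which observation to drop are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(3)  # fixed seed, illustrative data only
n = 25
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
resid = y - X @ beta_hat

j = 4                      # index of the observation to drop (arbitrary)
x_j = X[j]
h_j = x_j @ XtX_inv @ x_j  # leverage: j-th diagonal element of the hat matrix

# Closed-form change in the estimator when observation j is removed.
delta_formula = -(XtX_inv @ x_j) * resid[j] / (1.0 - h_j)

# Brute-force check: refit the regression without observation j.
X_drop, y_drop = np.delete(X, j, axis=0), np.delete(y, j)
beta_drop, *_ = np.linalg.lstsq(X_drop, y_drop, rcond=None)
delta_refit = beta_drop - beta_hat
print(delta_formula, delta_refit)
```

The closed form needs only quantities from the full-sample fit, so the influence of every observation can be computed without n separate refits.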

The theorem (known as the Frisch-Waugh-Lovell theorem) can be used to establish a number of theoretical results. For example, having a regression with a constant and another regressor is equivalent to subtracting the means from the dependent variable and the regressor and then running the regression for the de-meaned variables but without the constant term.
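
The de-meaning equivalence is easy to confirm on a small made-up dataset: the slope from a regression with a constant matches the slope from a no-constant regression on de-meaned variables. Variable names are illustrative.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 5.0, 8.0])
y = np.array([2.0, 2.5, 4.0, 6.5, 9.0])

# Regression of y on a constant and x.
X = np.column_stack([np.ones_like(x), x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
slope_with_const = coef[1]

# Regression of de-meaned y on de-meaned x, with no constant term.
x_dm, y_dm = x - x.mean(), y - y.mean()
slope_demeaned = (x_dm @ y_dm) / (x_dm @ x_dm)
print(slope_with_const, slope_demeaned)
```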

This expression for the constrained estimator is valid as long as the matrix $X^{\mathrm{T}}X$ is invertible. It was assumed from the beginning of this article that this matrix is of full rank, and it was noted that when the rank condition fails, β will not be identifiable. However, it may happen that adding the restriction A makes β identifiable, in which case one would like to find the formula for the estimator. The estimator is equal to [34]

The least squares estimators are point estimates of the linear regression model parameters β. However, generally we also want to know how close those estimates might be to the true values of the parameters. In other words, we want to construct interval estimates.

Since we have not made any assumption about the distribution of the error term $\varepsilon_i$, it is impossible to infer the distribution of the estimators $\hat{\beta}$ and $\hat{\sigma}^2$. Nevertheless, we can apply the central limit theorem to derive their asymptotic properties as the sample size n goes to infinity. While the sample size is necessarily finite, it is customary to assume that n is "large enough" so that the true distribution of the OLS estimator is close to its asymptotic limit.

We can show that under the model assumptions, the least squares estimator for β is consistent (that is, $\hat{\beta}$ converges in probability to β) and asymptotically normal:

These asymptotic distributions can be used for prediction, testing hypotheses, constructing other estimators, etc. As an example, consider the problem of prediction. Suppose $x_0$ is some point within the domain of distribution of the regressors, and one wants to know what the response variable would have been at that point. The mean response is the quantity $y_0 = x_0^{\mathrm{T}}\beta$, whereas the predicted response is $\hat{y}_0 = x_0^{\mathrm{T}}\hat{\beta}$. Clearly the predicted response is a random variable; its distribution can be derived from that of $\hat{\beta}$: