landdari player chryss

Eboni Kleifgen

Aug 3, 2024, 12:12:02 AM

to cuphicuke

Stata 11.1 introduced a set pformat command that specifies the output format of p-values in coefficient tables. (I don't know about STATA I'm afraid as I think that was discontinued some time in the 1980s).
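A minimal sketch of how set pformat might be used, assuming Stata 11.1 or later and the auto dataset that ships with Stata (the choice of model is only for illustration):

```stata
* Illustration only: the model and dataset are arbitrary choices
sysuse auto, clear
regress price weight mpg

* Show p-values with four decimal places (the format width may not exceed 5)
set pformat %5.4f
regress                 // replay the last results with the new format

* Restore the default format
set pformat %4.3f
```

The setting only changes how the coefficient table is displayed; the stored results keep their full precision.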

A lot of times, you can get the utmost precision if you know your p-value by its internal name. I usually type return list or ereturn list after nearly every command that I will seriously use, and then grab results that may look like e(p) or r(p) or e(p_chi2) or whatever the scalar that contains the p-value might be.
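Something along these lines shows the idea, using the auto dataset shipped with Stata as a stand-in. Note that not every command stores a ready-made p-value: ttest leaves one in r(p), whereas after regress the overall p-value has to be computed from the stored F-statistic.

```stata
sysuse auto, clear

* r-class command: inspect what is stored, then print the p-value in full
ttest mpg == 20
return list
display %12.10f r(p)

* e-class command: list the stored results, then compute the p-value of
* the overall F-test from e(F) and its degrees of freedom
quietly regress price weight mpg
ereturn list
display %12.10f Ftail(e(df_m), e(df_r), e(F))
```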

The ttest command performs t-tests for one sample, two samples and paired observations. The single-sample t-test compares the mean of the sample to a given number (which you supply). The independent samples t-test compares the difference in the means from the two groups to a given value (usually 0). In other words, it tests whether the difference in the means is 0. The dependent-sample or paired t-test compares the difference in the means from the two variables measured on the same set of subjects to a given number (usually 0), while taking into account the fact that the scores are not independent. In our examples, we will use the hsb2 data set.

The single sample t-test tests the null hypothesis that the population mean is equal to the number given after the == in the command. For this example, we will compare the mean of the variable write with a pre-selected value of 50. In practice, the value against which the mean is compared should be based on theoretical considerations and/or previous research. Stata calculates the t-statistic and its p-value under the assumption that the sample comes from an approximately normal distribution. If the p-value associated with the t-test is small (0.05 is often used as the threshold), there is evidence that the mean is different from the hypothesized value. If the p-value is not small (p > 0.05), then the null hypothesis is not rejected: the data do not provide evidence that the mean differs from the hypothesized value.
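The command for this test would look roughly as follows. The download URL for hsb2 is an assumption on my part (it is the location commonly used in the UCLA examples); substitute your own copy of the data if you have one.

```stata
* Assumed location of the hsb2 data set; adjust if you have a local copy
use https://stats.idre.ucla.edu/stat/stata/notes/hsb2, clear

* One-sample t-test of H0: mean(write) = 50
ttest write == 50
```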

In this example, the t-statistic is 4.1403 with 199 degrees of freedom. The corresponding two-tailed p-value is .0001, which is less than 0.05. We conclude that the mean of variable write is different from 50.

In the confidence interval formula, s is the sample standard deviation of the observations and N is the number of valid observations. The t-value in the formula can be computed or found in any statistics book, with N-1 degrees of freedom and cumulative probability 1-alpha/2, where alpha is 1 minus the confidence level (.95 by default, so alpha is .05).
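As a sketch, the interval for the mean (mean plus or minus t times s/sqrt(N)) can be reproduced by hand from the results that summarize leaves behind. The scalar names ci_se and ci_t are my own, and the data URL is the same assumption as elsewhere in this post.

```stata
* Assumed location of the hsb2 data set
use https://stats.idre.ucla.edu/stat/stata/notes/hsb2, clear

quietly summarize write

* Standard error of the mean: s / sqrt(N)
scalar ci_se = r(sd) / sqrt(r(N))

* Critical t-value with N-1 degrees of freedom; invttail() takes the
* upper-tail probability, i.e. alpha/2 = .025 for a 95% interval
scalar ci_t = invttail(r(N) - 1, 0.05/2)

display "lower bound: " r(mean) - scalar(ci_t)*scalar(ci_se)
display "upper bound: " r(mean) + scalar(ci_t)*scalar(ci_se)
```

The scalar() wrapper is there on purpose: a bare scalar name that happens to match a variable name or abbreviation would be read as the variable.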

In all three cases, the difference between the population means is the same. But with large variability of the sample means, as in the second graph, the two populations overlap a great deal. Therefore, the difference may well have come about by chance. On the other hand, with small variability, the difference is clearer, as in the third graph. The smaller the standard error of the mean, the larger the magnitude of the t-value and therefore the smaller the p-value.

This t-test is designed to compare the means of the same variable between two groups. In our example, we compare the mean writing score between the group of female students and the group of male students. Ideally, these subjects are randomly selected from a larger population of subjects. The test assumes that the variances for the two populations are the same. The interpretation of the p-value is the same as in the other types of t-tests.
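A sketch of the corresponding command, assuming the grouping variable in hsb2 is called female and that the data can be loaded from the URL below:

```stata
* Assumed location of the hsb2 data set
use https://stats.idre.ucla.edu/stat/stata/notes/hsb2, clear

* Two-sample t-test of write by gender, assuming equal variances
ttest write, by(female)
```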

In this example, the t-statistic is -3.7341 with 198 degrees of freedom. The corresponding two-tailed p-value is 0.0002, which is less than 0.05. We conclude that the difference of means in write between males and females is different from 0.

We are again going to compare the means of the same variable between two groups. In our example, we compare the mean writing score between the group of female students and the group of male students. Ideally, these subjects are randomly selected from a larger population of subjects. We previously assumed that the variances for the two populations are the same. Here, we will allow for unequal variances in our samples. The interpretation of the p-value is the same as in the other types of t-tests.
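The unequal-variance version would presumably differ only by the unequal option, which makes ttest use Satterthwaite's approximation for the degrees of freedom (same assumptions about the data as before):

```stata
* Assumed location of the hsb2 data set
use https://stats.idre.ucla.edu/stat/stata/notes/hsb2, clear

* Two-sample t-test of write by gender, allowing unequal variances
ttest write, by(female) unequal
```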

In this example, the t-statistic is -3.6564 with 169.707 degrees of freedom. The corresponding two-tailed p-value is 0.0003, which is less than 0.05. We conclude that the difference of means in write between males and females is different from 0, allowing for differences in variances across groups.

In Stata, you can use either the correlate or the pwcorr command to compute correlation coefficients. The following examples produce identical correlation coefficient matrices for the variables income, gnp, and interest:
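I don't have the dataset those variables come from, so this sketch simulates stand-in data under the same names (the numbers are made up purely for illustration). The two matrices agree here because nothing is missing; with missing values, correlate drops incomplete observations entirely while pwcorr works pair by pair.

```stata
* Simulated stand-in data; values are arbitrary
clear
set seed 12345
set obs 100
generate gnp      = rnormal(100, 15)
generate income   = 0.5*gnp + rnormal(0, 10)
generate interest = rnormal(5, 1)

* Both commands give the same correlation matrix for complete data
correlate income gnp interest
pwcorr income gnp interest
```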

In contrast, the pwcorr command generates a correlation coefficient matrix with p-values using the sig option, but it does not give a variance-covariance matrix. Consider the following examples for obtaining correlation coefficients and their p-values:
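The original commands did not survive in this post, so the three lines below are my best guess at them, run on simulated stand-in data (made-up values, same variable names):

```stata
* Simulated stand-in data; values are arbitrary
clear
set seed 12345
set obs 100
generate gnp      = rnormal(100, 15)
generate income   = 0.5*gnp + rnormal(0, 10)
generate interest = rnormal(5, 1)

* 1. Correlations with their p-values
pwcorr income gnp interest, sig

* 2. As above, but display only coefficients significant at the .05 level
pwcorr income gnp interest, sig print(.05)

* 3. As in 1, with an asterisk on coefficients significant at the .05 level
pwcorr income gnp interest, sig star(.05)
```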

The first command generates a correlation coefficient matrix with p-values. The second line outputs correlation coefficients and p-values only when their p-values are less than .05; that is, coefficients that are not significant at the .05 level are left blank. The print(.05) option specifies the significance level above which coefficients are suppressed. The third command generates correlation coefficients and p-values, and places an asterisk (*) next to the coefficients only when the p-value is .05 or lower. The star(.05) option requests that an asterisk be printed for correlation coefficients with p-values of .05 or lower.

The procedure is to first store a number of models and then apply esttab to these stored estimation sets to compose a regression table. The main difference between esttab and estout is that esttab produces a fully formatted table right away. Example:
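A sketch of such an example, assuming the community-contributed estout package is installed (ssc install estout) and using the auto dataset that ships with Stata; the two models are arbitrary:

```stata
* Requires the estout package: ssc install estout
sysuse auto, clear
eststo clear

* Store two regression models
eststo: quietly regress price weight mpg
eststo: quietly regress price weight mpg foreign

* Compose the table from all stored models
esttab

* Remove the stored models from memory
eststo clear
```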

The default in esttab is to display raw point estimates along with t-statistics and to print the number of observations in the table footer. To replace the t-statistics by, e.g., standard errors and add the adjusted R-squared type:
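Roughly like this, again assuming estout is installed and using two arbitrary models on the auto data:

```stata
* Requires the estout package: ssc install estout
sysuse auto, clear
eststo clear
eststo: quietly regress price weight mpg
eststo: quietly regress price weight mpg foreign

* se: standard errors instead of t-statistics; ar2: adjusted R-squared
esttab, se ar2

eststo clear
```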

esttab has sensible default settings for numerical display formats. For example, t-statistics are printed using two decimal places and R-squared measures are printed using three decimal places. For point estimates and, for example, standard errors an adaptive display format is used where the number of displayed decimal places depends on the scale of the statistic to be printed (the default format is a3; see below).

The format applied to a certain statistic can be changed by adding the appropriate display format specification in parentheses. For example, to increase precision for the point estimates and display p-values and the R-squared using four decimal places, type:
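A sketch under the same assumptions as before (estout installed, arbitrary models on the auto data). The nostar and wide options are not required for the formats; I added them only to keep the table compact.

```stata
* Requires the estout package: ssc install estout
sysuse auto, clear
eststo clear
eststo: quietly regress price weight mpg
eststo: quietly regress price weight mpg foreign

* b(a6): adaptive format with more digits for the point estimates
* p(4):  p-values with four decimal places
* r2(4): R-squared with four decimal places
esttab, b(a6) p(4) r2(4) nostar wide

eststo clear
```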

Depending on whether the plain option is specified or not, esttab uses two different variants of the CSV format. By default, that is, if plain is omitted, the contents of the table cells are enclosed in double quotes preceded by an equal sign (i.e. ="..."). This prevents Excel from trying to interpret the contents of the cells and, therefore, preserves formatting elements such as parentheses around t-statistics. One drawback of this approach is, however, that the displayed numbers cannot directly be used for further calculations in Excel. Hence, if the purpose of exporting the estimates is to do additional computations in Excel, specify the plain option. In this case, the table cells are enclosed in double quotes without the equal sign, and Excel will interpret the contents as numbers. Example:
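For instance, something like the following would write a plain CSV file. The file name example.csv is just a placeholder, and the csv option is spelled out explicitly here rather than relying on the file suffix.

```stata
* Requires the estout package: ssc install estout
sysuse auto, clear
eststo clear
eststo: quietly regress price weight mpg
eststo: quietly regress price weight mpg foreign

* plain: cells are written as "..." so that Excel reads them as numbers
esttab using example.csv, csv plain wide replace

eststo clear
```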

If you know a bit of RTF you can also include RTF commands to achieve specific effects, although you have to be careful not to break the document (most importantly, do not introduce unmatched curly braces). Useful are, for example, "{\b ...}" for boldface and "{\i ...}" for italics. A very helpful reference is the "RTF Pocket Guide" by Sean M. Burke (O'Reilly). Example:
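The original example is missing here, so take this as a guess at what it might have looked like: a title that mixes bold and italic text, written to a placeholder file example.rtf. The title wording is invented.

```stata
* Requires the estout package: ssc install estout
sysuse auto, clear
eststo clear
eststo: quietly regress price weight mpg
eststo: quietly regress price weight mpg foreign

* RTF commands inside title(): {\b ...} is bold, {\i ...} is italic.
* Every opening brace has a matching closing brace.
esttab using example.rtf, rtf replace ///
    title({\b Table 1.} {\i Price regressions})

eststo clear
```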

esttab is a wrapper for estout. Its syntax is much simpler than that of estout and, by default, it produces publication-style tables that display nicely in Stata's results window. The basic syntax of esttab is:
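As far as I recall from the help file, the syntax diagram has the shape shown in the comment below; the two calls after it are illustrative instances (the model name m1 and the file name example.txt are placeholders of mine).

```stata
* Syntax diagram, as I recall it from the esttab help file:
*   esttab [namelist] [using filename] [, options]

* Requires the estout package: ssc install estout
sysuse auto, clear
eststo clear
eststo m1: quietly regress price weight mpg

esttab m1                                 // namelist only: table in the results window
esttab m1 using example.txt, replace      // namelist, using, and an option

eststo clear
```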

The eststo command is used to store the regression models. Models stored by eststo are automatically picked up by esttab (the command eststo clear removes the models from memory). An alternative would be to use Stata's official estimates store, as in the following example:
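A sketch of the estimates store route, assuming estout is installed for the esttab call; the names m1 and m2 and the model titles are arbitrary:

```stata
sysuse auto, clear

* Store the models with official Stata's estimates store
quietly regress price weight mpg
estimates store m1, title(Model 1)
quietly regress price weight mpg foreign
estimates store m2, title(Model 2)

* Models stored this way must be named explicitly
esttab m1 m2

* Clean up
estimates drop m1 m2
```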

The t-statistics can also be replaced by p-values (option p), confidence intervals (option ci), or any parameter statistics contained in the estimates (see the aux() option). Further summary statistics options are, for example, pr2 for the pseudo R-squared and bic for Schwarz's information criterion. Moreover, there is a generic scalars() option to include any other scalar statistics contained in the stored estimates. For instance, to print p-values and add the overall F-statistic and information on the degrees of freedom, type:
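This could look as follows, with the same caveats as before (estout installed, arbitrary models on the auto data). F, df_m and df_r are the names under which regress stores the F-statistic and its degrees of freedom in e().

```stata
* Requires the estout package: ssc install estout
sysuse auto, clear
eststo clear
eststo: quietly regress price weight mpg
eststo: quietly regress price weight mpg foreign

* p: p-values instead of t-statistics
* scalars(): add e(F), e(df_m) and e(df_r) to the table footer
esttab, p scalars(F df_m df_r)

eststo clear
```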
