Stata Functions

Agenor Ramadan

Aug 3, 2024, 4:53:32 PM

to broodsummaivers

This is the twelfth post in the series Programming an estimation command in Stata. I recommend that you start at the beginning. See Programming an estimation command in Stata: A map to posted entries for a map to all the posts in this series.

Commands do work in Stata. Functions do work in Mata. Commands operate on Stata objects, like variables, and users specify options to alter the behavior. Mata functions accept arguments, operate on the arguments, and may return a result or alter the value of an argument to contain a result.

When a Mata function changes the value of an argument inside the function, that changes the value of that argument outside the function; in other words, arguments are passed by address. Mata functions can compute more than one result by storing these results in arguments. For example, sumproduct() returns both the sum and the element-wise product of two matrices.
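A minimal sketch of this behavior, assuming a sumproduct() that takes four arguments and stores its two results in the last two. The argument names and the test matrices are illustrative assumptions, not the listing from the original post.

```stata
* drop any earlier definition so the block can be rerun
capture mata: mata drop sumproduct()

mata:
// hypothetical sketch: S and P are overwritten inside the function, and the
// caller sees the new values because arguments are passed by address
function sumproduct(X, Y, S, P)
{
    S = X + Y       // sum
    P = X :* Y      // element-wise product
}

A = (1, 2 \ 3, 4)
B = (5, 6 \ 7, 8)
S = .
P = .
sumproduct(A, B, S, P)
S
P
end
```

After the call, S and P no longer hold the missing values they were given before it, although neither was assigned outside the function.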

In myadd() and sumproduct(), I did not specify what type of thing each argument must be, nor did I specify what type of thing each function would return. In other words, I used implicit declarations. Implicit declarations are easier to type than explicit declarations, but they make error messages and code less informative. I highly recommend explicitly declaring returns, arguments, and local variables to make your code and error messages more readable.

The error message in example 4 indicates that somewhere in myadd(), an operator or a function could not perform something on two objects because their types were not compatible. Do not be deluded by the simplicity of myadd(). Tracking down a type mismatch in real code can be difficult.

In contrast, the error message in example 5 says that the matrix C we passed to myadd2() is neither a real nor a complex matrix, as the argument of myadd2() requires. Looking at the code and the error message immediately tells me that I passed a string matrix to a function that requires a numeric matrix.
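As a sketch of what such explicit declarations might look like (an assumed reconstruction of myadd2(), not the post's original listing), the return value, both arguments, and the local variable are each declared as numeric matrices:

```stata
* drop any earlier definition so the block can be rerun
capture mata: mata drop myadd2()

mata:
// explicit declarations: return type, argument types, and local variable type
numeric matrix myadd2(numeric matrix X, numeric matrix Y)
{
    numeric matrix A

    A = X + Y
    return(A)
}

// numeric inputs are accepted
myadd2((1, 2 \ 3, 4), I(2))
end
```

With these declarations, a string matrix is rejected when the function is called, so the error points at the offending argument rather than at an operator somewhere inside the function.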

I use the feature that arguments may be implicitly declared to make my code easier to read. Many of the Mata functions that I write replace arguments with results. I explicitly declare arguments that are inputs, and I implicitly declare arguments that contain outputs. Consider sumproduct2().
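A hedged sketch of that convention, assuming sumproduct2() takes two declared inputs followed by two undeclared outputs (the argument names S and P are illustrative assumptions):

```stata
* drop any earlier definition so the block can be rerun
capture mata: mata drop sumproduct2()

mata:
// inputs X and Y are explicitly declared; outputs S and P are left
// implicitly declared, which marks them as arguments the function fills in
void sumproduct2(real matrix X, real matrix Y, S, P)
{
    S = X + Y       // sum
    P = X :* Y      // element-wise product
}

S2 = .
P2 = .
sumproduct2((1, 2 \ 3, 4), (5, 6 \ 7, 8), S2, P2)
S2
P2
end
```

Reading the first line of the function is then enough to tell which arguments are inputs and which will be replaced with results.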

dis binomialp(3,1,.3) will display the probability that exactly 1 (one) success will occur in a random experiment with distribution B(3,.3), that is, three trials and outcome probability .3. Stata will render the value .441.

Note that you may write dis binomialp(3,1.8,.3), requesting the probability that you will observe 1.8 successes, which is impossible as the values of a binomial random variable are always integers. Stata will use floor(1.8) instead, that is, 1.

dis binomial(3,1,.3) will display the probability that 1 (one) or fewer successes will occur in a random experiment with distribution B(3,.3). In other words, Stata will render the value of the cumulative probability function. The probability for 0 (zero) successes is .343, and together with the probability for one success (.441) this will yield a cumulative value of .784.

dis binomialtail(3,2,.3) will display the probability that 2 (two) or more successes will occur in a random experiment with distribution B(3,.3). In other words, Stata will render the upper-tail probability, that is, the probability of k (the number of successes) or more. As the value for up to 1 success is .784, the probability for 2 or more (that is, 2 or 3) successes by necessity is .216, and this is the value Stata will display.

dis invbinomial(3,1,.784) will display the parameter p (that is, the probability for success in one trial) that corresponds to a binomial random experiment with n = 3 and probability of .784 for 1 (one) or fewer successes. We know from the preceding that this parameter is .3.

dis invbinomialtail(3,2,.216) will display the parameter p (that is, the probability for success in one trial) that corresponds to a binomial random experiment with n = 3 and probability of .216 for 2 (two) or more successes. Again, this parameter is .3.
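The binomial results described above can be reproduced in one short do-file. This is a sketch using Stata's built-in binomialp(), binomial(), binomialtail(), invbinomial(), and invbinomialtail() functions, with the values quoted in the text noted in the comments:

```stata
* P(X = 1) for n = 3 and p = .3; the text gives .441
display binomialp(3, 1, .3)

* a noninteger number of successes is replaced by its floor,
* so this gives the same result as the line above
display binomialp(3, 1.8, .3)

* P(X <= 1); the text gives .784
display binomial(3, 1, .3)

* P(X >= 2); the text gives .216
display binomialtail(3, 2, .3)

* success probability p for which P(X <= 1) = .784; the text gives .3
display invbinomial(3, 1, .784)

* success probability p for which P(X >= 2) = .216; the text gives .3
display invbinomialtail(3, 2, .216)
```

The lower and upper cumulative probabilities are complementary here because "1 or fewer" and "2 or more" together cover every possible outcome.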

The hypergeometric distribution describes the behaviour of a random variable with a binary outcome for samples drawn without replacement. It has four parameters: N, the size of the population; K, the number of successes in the population; n, the size of the sample; and k, the number of successes in the sample.
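A small worked example, with numbers chosen purely for illustration: in a population of N = 10 containing K = 3 successes, the probability of exactly k = 1 success in a sample of n = 4 is C(3,1)·C(7,3)/C(10,4) = 3·35/210 = .5. Stata's hypergeometricp() takes the four parameters in the order N, K, n, k:

```stata
* P(exactly 1 success) when drawing 4 of 10 items, 3 of which are successes
display hypergeometricp(10, 3, 4, 1)

* the same probability computed from binomial coefficients
display comb(3, 1) * comb(7, 3) / comb(10, 4)
```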

Normal distributions have two parameters: the mean, referred to by Stata as m, and the standard deviation, denoted by s. As there is an infinite number of normal distributions (with different parameters m and/or s), statisticians often use the standard normal distribution with m = 0 and s = 1.

dis normalden(0) will display the density of the standard normal distribution at 0, i.e. .39894228 (the maximum, of course). This function has versions that accommodate normal distributions with means and/or standard deviations that differ from those of the standard normal distribution. Thus, dis normalden(0,2) will display the density at x = 0 of a normal distribution with mean 0 and a standard deviation of 2, that is, the density at its mean (the result being half the value for the standard normal distribution), whereas dis normalden(0,1,2) will produce an even lower value, i.e., the density at x = 0 of a normal distribution with mean 1 and a standard deviation of 2.
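The three forms of normalden() can be compared side by side. This sketch only restates the calls discussed above; the last line uses the density formula to show why the three-argument result is lower.

```stata
* standard normal density at 0
display normalden(0)

* density at 0 for mean 0 and standard deviation 2: half the value above
display normalden(0, 2)

* density at 0 for mean 1 and standard deviation 2
display normalden(0, 1, 2)

* the same value from the standard normal density: phi((x - m)/s)/s
display normalden((0 - 1)/2) / 2
```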

Student's t distribution has a bell shape similar to that of the standard normal distribution (and mean 0), but with heavier tails, and there is (in principle) an infinite number of t distributions that vary according to their "degrees of freedom" (d.f.). As the d.f. increase, the t distribution approaches the standard normal distribution.
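One way to see this convergence is to compare upper-tail critical values. The sketch below uses Stata's invttail() and invnormal() functions; the degrees of freedom and the 2.5% tail area are arbitrary choices for illustration.

```stata
* value cutting off the upper 2.5% of a t distribution, for growing d.f.
display invttail(5, .025)
display invttail(30, .025)
display invttail(1000, .025)

* the corresponding value for the standard normal distribution
display invnormal(.975)
```

The t values shrink toward the normal value as the degrees of freedom grow.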

This will create a variable called, appropriately, newvar. There are a variety of built-in functions and manual processes you can then use to calculate your desired value. Note that numeric calculations require existing numeric variables; a string variable must first be converted (for example, with destring).

Now pretend I have a variable that is skewed. One way to attempt to correct for the skew is to take a logarithm. This is a good example of one of Stata's built-in mathematical functions that can be used in the generation of new variables. Let's pretend the variable is named askew. To take the log of askew, you would type gen logskew=log(askew). As before, this command will generate a new variable called "logskew" that has as its content the natural logarithm of the skewed variable; for the base-ten logarithm, use log10(askew) instead.

To see a complete list of mathematical functions available for calculating new variables, type help math functions into the Command window. In addition to the more common mathematical functions, help functions will bring up a complete list of all the types of functions Stata offers. Click on blue text to go to the specific help file.
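A short sketch of the transformation on made-up data (the five values of askew are generated only for illustration). In Stata, log() is the natural logarithm and log10() is the base-ten logarithm:

```stata
clear
set obs 5

* hypothetical right-skewed variable: e^1, e^2, ..., e^5
generate double askew = exp(_n)

* natural logarithm
generate double logskew = log(askew)

* base-ten logarithm
generate double log10skew = log10(askew)

list
```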

The LCA_Distal Stata functions estimate the association between a latent class variable and a distal outcome. Both functions require the LCA Stata plugin (version 1.2.1 or higher) and Stata (version 9.1 or higher).

With great thanks to all those who kindly volunteered to test the beta version and have provided valuable suggestions for improvement, I am pleased to report that kobo2stata is now publicly available from the SSC. Please see my full announcement here: -kobo2stata-new-on-ssc/

You cannot output data from Kobo directly as a Stata file. But you can export your data and form as Excel and then use the kobo2stata command in Stata to automate the import/conversion of those into a labelled Stata format.

In this article, we briefly review the role of the propensity score in estimating dose-response functions as described in Hirano and Imbens (2004, Applied Bayesian Modeling and Causal Inference from Incomplete-Data Perspectives, 73-84). Then we present a set of Stata programs that estimate the propensity score in a setting with a continuous treatment, test the balancing property of the generalized propensity score, and estimate the dose-response function. We illustrate these programs by using a dataset collected by Imbens, Rubin, and Sacerdote.

The egen command consists of functions that extend the capability of the generate command. The various functions within egen create variables that hold information about patterns and calculations within subgroups or across columns.
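As a brief illustration on a made-up dataset (the variable names group, x, and y are assumptions for this sketch), the mean() function of egen computes a statistic within subgroups, and rowtotal() sums across columns within each observation:

```stata
clear
input group x y
1 1 10
1 3 20
2 5 30
2 7 .
end

* mean of x within each level of group
bysort group: egen mean_x = mean(x)

* sum of x and y within each row; missing values are treated as 0
egen total_xy = rowtotal(x y)

list
```

A plain generate could not produce mean_x in one step, because the value in each row depends on the other rows in the same group.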