Lavaan Model Syntax

Tisha

Aug 5, 2024, 1:12:32 AM

to dangbiztverlay

The code piece above will produce a model syntax object, called myModel, that can be used later when calling a function that actually estimates this model given a dataset. Note that formulas can be split over multiple lines, and you can use comments (starting with the # character) and blank lines within the single quotes to improve the readability of the model syntax.
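
As a minimal sketch of such a model syntax object (the variable names y1 to y6, x1, f1 and f2 are placeholders, not taken from a real dataset):

```r
# Placeholder variable names; this only builds the syntax string,
# nothing is estimated yet.
myModel <- '
  # latent variable definitions
  f1 =~ y1 + y2 + y3
  f2 =~ y4 + y5 + y6

  # regression
  f1 ~ f2 + x1
'
```

The object is an ordinary character string, so it can be stored, printed, or passed later to a fitting function together with a dataset.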

In our second example, we will use the built-in PoliticalDemocracy dataset. This is a dataset that has been used by Bollen in his 1989 book on structural equation modeling (and elsewhere). To learn more about the dataset, see its help page and the references therein.

The variables can be either observed or latent variables. If the two variable names are the same, the expression refers to the variance (or residual variance) of that variable. If the two variable names are different, the expression refers to the (residual) covariance among these two variables. The lavaan package automatically makes the distinction between variances and residual variances.
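
Assuming the expressions meant here are lavaan's `~~` formulas (the operator is not named in the paragraph itself), a small sketch with placeholder variable names could look like this:

```r
# f1 and f2 are latent, y1 to y6 observed; all names are placeholders.
model <- '
  f1 =~ y1 + y2 + y3
  f2 =~ y4 + y5 + y6

  y1 ~~ y1   # same name twice: (residual) variance of y1
  y1 ~~ y2   # different names: (residual) covariance of y1 and y2
  f1 ~~ f2   # covariance between two latent variables
'
```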

Consider a simple one-factor model with 4 indicators. By default, lavaan will always fix the factor loading of the first indicator to 1. The other three factor loadings are free, and their values are estimated by the model. But suppose that you have good reasons to fix all the factor loadings to 1. The syntax below illustrates how this can be done:
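
A sketch of that syntax, with a hypothetical factor f and indicators y1 to y4, pre-multiplies each remaining indicator by the constant 1:

```r
# Hypothetical names. The loading of y1 is fixed to 1 by default;
# the 1* modifier fixes the other three loadings to 1 as well.
model <- ' f =~ y1 + 1*y2 + 1*y3 + 1*y4 '
```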

If you need to fix all covariances among the latent variables in a CFA model to zero (that is, to make the latent variables orthogonal), there is a shortcut. You can omit the covariance formulas in the model syntax and simply add the argument orthogonal = TRUE to the function call:
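
A sketch of that shortcut, assuming the three-factor Holzinger and Swineford model and the HolzingerSwineford1939 dataset that ships with lavaan (the object names HS.model and fit.HS.ortho are only illustrative):

```r
library(lavaan)

HS.model <- ' visual  =~ x1 + x2 + x3
              textual =~ x4 + x5 + x6
              speed   =~ x7 + x8 + x9 '

# orthogonal = TRUE fixes all covariances among the latent variables to zero
fit.HS.ortho <- cfa(HS.model,
                    data = HolzingerSwineford1939,
                    orthogonal = TRUE)

# Inspect the latent covariances in the parameter estimates
pe <- parameterEstimates(fit.HS.ortho)
lv <- c("visual", "textual", "speed")
pe[pe$op == "~~" & pe$lhs %in% lv & pe$lhs != pe$rhs, c("lhs", "op", "rhs", "est")]
```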

The lavaan package automatically generates starting values for all free parameters. Normally, this works fine. But if you prefer to provide your own starting values, you are free to do so. The way it works is based on the pre-multiplication mechanism that we discussed before. But the numeric constant is now the argument of a special function start(). An example will make this clear:
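
For instance, a sketch using the indicator names x1 to x9 of the three-factor H&S model (the starting values themselves are arbitrary and chosen only for illustration):

```r
# start() takes the place of the numeric constant in front of the * operator.
# The parameter stays free; only its starting value changes.
HS.model <- ' visual  =~ x1 + start(0.8)*x2 + start(1.2)*x3
              textual =~ x4 + start(0.5)*x5 + start(1.0)*x6
              speed   =~ x7 + start(0.7)*x8 + start(1.8)*x9 '
```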

A nice property of the lavaan package is that all free parameters are automatically named according to a simple set of rules. This is convenient, for example, if equality constraints are needed (see the next subsection). To see how the naming mechanism works, we will use the model that we used for the Political Democracy data.

The function coef() extracts the estimated values of the free parameters in the model, together with their names. Each name consists of three parts and reflects the part of the formula where the parameter was involved. The first part is the variable name that appears on the left-hand side (lhs) of the formula. The middle part is the operator type (op) of the formula, and the third part is the variable in the right-hand side (rhs) of the formula that corresponds with the parameter.
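
A sketch of this naming scheme, assuming the usual structural equation model for the PoliticalDemocracy data (three latent variables ind60, dem60 and dem65, plus residual covariances among the y indicators):

```r
library(lavaan)

model <- '
  # measurement model
    ind60 =~ x1 + x2 + x3
    dem60 =~ y1 + y2 + y3 + y4
    dem65 =~ y5 + y6 + y7 + y8
  # regressions
    dem60 ~ ind60
    dem65 ~ ind60 + dem60
  # residual covariances
    y1 ~~ y5
    y2 ~~ y4 + y6
    y3 ~~ y7
    y4 ~~ y8
    y6 ~~ y8
'
fit <- sem(model, data = PoliticalDemocracy)

# Named numeric vector; each name is lhs, op and rhs pasted together,
# for example "ind60=~x2", "dem60~ind60" or "y1~~y5"
est <- coef(fit)
names(est)
```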

We have seen the use of the pre-multiplication mechanism (using the * operator) a number of times: to fix a parameter, to provide a starting value, and to label a parameter. We refer to these operations as modifiers, because they modify some properties of certain model parameters. More modifiers will be introduced later.

Each term on the right-hand side in a formula can have one modifier only. If you want to specify more modifiers for the same parameter, you need to list the term multiple times in the same formula. For example:
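
A sketch with hypothetical names, where the indicator y3 is listed twice so that it receives both a label and a starting value:

```r
# myLabel is a hypothetical label, 0.5 an arbitrary starting value.
# Both modifiers apply to the same loading (f =~ y3).
model <- ' f =~ y1 + y2 + myLabel*y3 + start(0.5)*y3 + y4 '
```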

In some applications, it is useful to impose equality constraints on one or more otherwise free parameters. Consider again the three-factor H&S CFA model. Suppose a user has a priori reasons to believe that the factor loadings of the x2 and x3 indicators are equal to each other. Instead of estimating two free parameters, lavaan should only estimate a single free parameter, and use that value for both factor loadings. The main mechanism to specify this type of (simple) equality constraint is by using labels: if two parameters have the same label, they will be considered to be the same, and only one value will be computed for them. This is illustrated in the following syntax:
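
A sketch of this constraint on the three-factor H&S model, using the hypothetical label v2 for both loadings and the HolzingerSwineford1939 dataset that ships with lavaan:

```r
library(lavaan)

# The shared label v2 ties the loadings of x2 and x3 together
HS.model.eq <- ' visual  =~ x1 + v2*x2 + v2*x3
                 textual =~ x4 + x5 + x6
                 speed   =~ x7 + x8 + x9 '

fit.eq <- cfa(HS.model.eq, data = HolzingerSwineford1939)

# Both loadings should show the same estimate
pe <- parameterEstimates(fit.eq)
pe[pe$lhs == "visual" & pe$op == "=~", c("lhs", "op", "rhs", "label", "est")]
```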

The lavaan model syntax describes a latent variable model. The function lavaanify turns it into a table that represents the full model as specified by the user. We refer to this table as the parameter table.
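
A small sketch of that conversion for a hypothetical one-factor model. The auto.fix.first and auto.var settings are an assumption here, chosen to mimic what a fitting function such as cfa() would typically add:

```r
library(lavaan)

model <- ' f =~ y1 + y2 + y3 '

# Build the parameter table: one row per model parameter
pt <- lavaanify(model, auto.fix.first = TRUE, auto.var = TRUE)

# free == 0 marks a fixed parameter; ustart holds its fixed or starting value
pt[, c("lhs", "op", "rhs", "free", "ustart")]
```

The first loading (f =~ y1) appears as a fixed parameter, and the variances of f, y1, y2 and y3 are added as extra rows.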

A description of the user-specified model. Typically, the model is described using the lavaan model syntax; see details for more information. Alternatively, a parameter table (e.g., the output of lavParseModelString) is also accepted.

Logical. Only relevant if the metric of each latent variable is set by fixing the first factor loading to unity. If TRUE, it implies meanstructure = TRUE and std.lv = FALSE, and it fixes the intercepts of the marker indicators to zero, while freeing the means/intercepts of the latent variables. Only works correctly for single group, single level models.

If TRUE, the metric of each latent variable is determined by fixing their variances to 1.0. If FALSE, the metric of each latent variable is determined by fixing the factor loading of the first indicator to 1.0. If there are multiple groups, std.lv = TRUE, and "loadings" is included in the group.equal argument, then only the latent variances of the first group will be fixed to 1.0, while the latent variances of other groups are set free.

Can be logical or character string. If logical and TRUE, this implies effect.coding = c("loadings", "intercepts"). If logical and FALSE, it is set equal to the empty string. If "loadings" is included, equality constraints are used so that the average of the factor loadings (per latent variable) equals 1. Note that this should not be used together with std.lv = TRUE. If "intercepts" is included, equality constraints are used so that the sum of the intercepts (belonging to the indicators of a single latent variable) equals zero. As a result, the latent mean will be freely estimated and usually equal the average of the means of the involved indicators.

If TRUE, the necessary constraints are imposed to make the (unrotated) exploratory factor analysis blocks identifiable: for each block, factor variances are set to 1, factor covariances are constrained to be zero, and factor loadings are constrained to follow an echelon pattern.

Either a single integer or a named vector of integers. If nthresholds is a single integer, all endogenous variables are assumed to be ordered, with nthresholds indicating the number of thresholds needed in the model. If nthresholds is a named vector, it indicates the number of thresholds for these ordered variables only. This argument should not be used in combination with varTable.

A vector of character strings. Only used in a multiple group analysis. Can be one or more of the following: "loadings", "intercepts", "means", "regressions", "residuals" or "covariances", specifying the pattern of equality constraints across multiple groups. When (in the model syntax) a vector of labels is used as a modifier for a certain parameter, this will override the group.equal setting if it applies to this parameter. See also the Multiple groups section below for using modifiers in multiple groups.

Logical. If TRUE, the group frequencies are considered to be free parameters in the model. In this case, a Poisson model is fitted to estimate the group frequencies. If FALSE (the default), the group frequencies are fixed to their observed values.

The model syntax consists of one or more formula-like expressions, each one describing a specific part of the model. The model syntax can be read from a file (using readLines), or can be specified as a literal string enclosed by single quotes as in the example below.

Blank lines and comments can be used in between the formulas, and formulas can be split over multiple lines. Both the sharp (#) and the exclamation (!) characters can be used to start a comment. Multiple formulas can be placed on a single line if they are separated by a semicolon (;).
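
A sketch of a model syntax string that uses these features. All variable names (f1 to f4, y1 to y13, x1 to x3) are placeholders:

```r
myModel <- '
  # 1. latent variable definitions
    f1 =~ y1 + y2 + y3
    f2 =~ y4 + y5 + y6
    f3 =~ y7 + y8 +
          y9 + y10
    f4 =~ y11 + y12 + y13

  ! this is also a comment

  # 2. regressions
    f1 ~ f3 + f4
    f2 ~ f4
    y1 + y2 ~ x1 + x2 + x3

  # 3. (co)variances
    y1 ~~ y1
    y2 ~~ y4 + y5
    f1 ~~ f2

  # 4. intercepts, two formulas on one line
    f1 ~ 1; y5 ~ 1
'
```

The definition of f3 is split over two lines, and the last line holds two formulas separated by a semicolon.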

Thresholds: The "|" operator can be used to define the thresholds of categorical endogenous variables (on the left-hand side of the operator). By convention, the thresholds (on the right-hand side, separated by the "+" operator) are named "t1", "t2", etcetera.
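
A sketch of threshold formulas for hypothetical ordered indicators u1 to u3, assuming u1 has four categories and u2 and u3 are binary:

```r
# Placeholder names; the number of thresholds is one less than
# the number of categories of each ordered variable.
model <- '
  f =~ u1 + u2 + u3

  # u1 has three thresholds (four ordered categories)
  u1 | t1 + t2 + t3

  # u2 and u3 have one threshold each (binary)
  u2 | t1
  u3 | t1
'
```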

Usually, only a single variable name appears on the left side of an operator. However, if multiple variable names are specified, separated by the "+" operator, the formula is repeated for each element on the left side (as for example in the third regression formula in the example above). The only exception is scaling factors, where only a single element is allowed on the left-hand side.

In the right-hand side of these formula-like expressions, each element can be modified (using the "*" operator) by either a numeric constant, an expression resulting in a numeric constant, an expression resulting in a character vector, or one of three special functions: start(), label() and equal(). This provides the user with a mechanism to fix parameters, to provide alternative starting values, to label the parameters, and to define equality constraints among model parameters. All "*" expressions are referred to as modifiers. They are explained in more detail in the following sections.

To constrain a parameter to be equal to another target parameter, there are two ways. If you have specified your own labels, you can use the fact that equal labels imply equal parameter values. If you rely on automatic parameter labels, you can use the special function equal(). The argument of equal() is the (automatic or user-specified) name of the target parameter. For example, in the confirmatory factor analysis example below, the intercepts of the three indicators of each latent variable are constrained to be equal to each other. For the first three, we have used the default names. For the last three, we have provided a custom label for the y2a intercept.
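
A sketch of such a model, using the indicator names y1a to y1c and y2a to y2c suggested by the paragraph above. The label int2 is hypothetical:

```r
model <- '
  # two latent variables with fixed loadings
    f1 =~ 1*y1a + 1*y1b + 1*y1c
    f2 =~ 1*y2a + 1*y2b + 1*y2c

  # intercepts constrained to be equal,
  # using the default name of the y1a intercept
    y1a ~ 1
    y1b ~ equal("y1a~1") * 1
    y1c ~ equal("y1a~1") * 1

  # intercepts constrained to be equal,
  # using a custom label
    y2a ~ int2*1
    y2b ~ int2*1
    y2c ~ int2*1
'
```

Giving all three intercepts the same label is usually the easier form to read, since it does not depend on knowing the automatic parameter name.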

In a multiple group analysis, modifiers that contain a single element should be replaced by a vector, having the same length as the number of groups. If you provide a single element, it will be recycled for all the groups. This may be dangerous, in particular when the modifier is a label. In that case, the (same) label is copied across all groups, and this would imply an equality constraint across groups. Therefore, when using modifiers in a multiple group setting, it is always safer (and cleaner) to specify the same number of elements as the number of groups. Consider this example with two groups:
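
A sketch of such a two-group model, assuming the HolzingerSwineford1939 dataset that ships with lavaan and its school variable as the grouping factor. The fixed values, starting values and the labels a1 and a2 are arbitrary illustrations:

```r
library(lavaan)

# Every modifier is a vector with one element per group
HS.model.mg <- ' visual  =~ x1 + c(0.5, 0.5)*x2 + c(0.6, 0.8)*x3
                 textual =~ x4 + start(c(1.2, 0.6))*x5 + c(a1, a2)*x6
                 speed   =~ x7 + x8 + x9 '

fit.mg <- cfa(HS.model.mg,
              data = HolzingerSwineford1939,
              group = "school")

# The loading of x3 is fixed to a different value in each group
pe <- parameterEstimates(fit.mg)
pe[pe$lhs == "visual" & pe$op == "=~" & pe$rhs == "x3", c("group", "est")]
```

Because a1 and a2 are different labels, the loading of x6 stays free in both groups. Writing a single label instead would have tied the two groups together.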