Latent Gold 6.0


Brook Mithani

Aug 4, 2024, 4:14:07 PM

to potsgrapimhtac

Statistical Innovations specializes in innovative applications of statistical modeling, especially latent class, discrete choice, and other latent variable models, to obtain meaningful segments. As developers of Latent GOLD, CORExpress, SI-CHAID, and GOLDMineR, we provide consulting, online and onsite courses, and license our popular software packages.

Latent GOLD is a powerful latent class and finite mixture program with a very user-friendly point-and-click interface (GUI). Two add-on options are available to extend the basic version of the program.

The Choice add-on allows estimation of discrete choice models via the point-and-click interface. When both the Choice and the Advanced/Syntax add-ons are obtained, various advanced choice models can be estimated, and the Syntax can also be used to further customize discrete choice models.

Includes GUI for:

LC Cluster: Latent GOLD's cluster module provides the state of the art in cluster analysis based on latent class models. Latent classes are unobservable (latent) subgroups or segments. Cases within the same latent class are homogeneous on certain criteria (variables), while cases in different latent classes are dissimilar from each other in certain important ways.

The traditional latent class model can be used to handle measurement and classification errors in categorical variables, and can accommodate variables that are nominal, ordinal, continuous, counts, or any combination of these. Covariates can also be included directly in the model for improved cluster description.

Latent GOLD improves over traditional ad-hoc types of cluster analysis methods by including model selection criteria and probability-based classification. Posterior membership probabilities are estimated directly from the model parameters and used to assign cases to the classes.
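The probability-based classification described above can be illustrated with a small sketch. It assumes binary indicators and local independence within classes, and all parameter values are made up for illustration; it is not Latent GOLD's own code.

```python
def posterior_memberships(class_priors, item_probs, responses):
    """Posterior P(class | responses) for one case via Bayes' rule.

    class_priors[k]  : P(class k)                  (hypothetical values)
    item_probs[k][j] : P(item j = 1 | class k)     (binary items assumed)
    responses[j]     : observed 0/1 answer on item j
    """
    joint = []
    for prior, probs in zip(class_priors, item_probs):
        likelihood = 1.0
        for p, y in zip(probs, responses):
            likelihood *= p if y == 1 else (1.0 - p)
        joint.append(prior * likelihood)
    total = sum(joint)
    return [j / total for j in joint]


def modal_class(posteriors):
    """Assign the case to the class with the highest posterior probability."""
    return max(range(len(posteriors)), key=lambda k: posteriors[k])


# Illustrative two-class model with three binary indicators.
priors = [0.6, 0.4]
item_probs = [[0.9, 0.8, 0.7], [0.2, 0.3, 0.1]]
post = posterior_memberships(priors, item_probs, [1, 1, 0])
assigned = modal_class(post)
```

Here `post` holds the membership probabilities for the response pattern (1, 1, 0), and `assigned` is the index of the most likely class.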

A DFactor model is often used for variable reduction or to define an ordinal attitudinal scale. It contains one or more DFactors which group together variables sharing a common source of variation. Each DFactor is either dichotomous (the default option) or consists of 3 or more ordered levels (ordered latent classes).

Latent GOLD also makes it possible to estimate a regression model in a heterogeneous population by including a categorical latent variable. Each category of this latent variable represents a homogeneous subpopulation (segment) having identical regression coefficients.

In addition to using predictors to estimate a regression model for each class, covariates can be specified to refine class descriptions and improve classification of cases into the appropriate latent classes.

After performing a latent class analysis, you might wish to investigate the relationship between class membership and external variables. A popular three-step approach is to first estimate the latent class model of interest (step 1), then assign individuals to latent classes using their posterior class membership probabilities (step 2), and subsequently investigate the association between the assigned class memberships and external variables (step 3).

In step 2, classification errors are introduced when assigning individuals to latent classes. The estimates of the association with the external variables need to be corrected for classification errors to prevent a downward bias (Bolck, Croon, and Hagenaars, 2004). The Step3 module implements two bias-adjustment procedures (Vermunt, 2010).

The Step3 module can be used with external variables predicting the class membership (Covariate option) or with external variables which are predicted by the class membership (Dependent option). These two types of external variables are also referred to as concomitant variables and distal outcomes, respectively.
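To make the step-2 classification errors concrete, the following sketch estimates a classification-error table from posterior probabilities under modal assignment. Each row gives the estimated probability of being assigned to class s given true class t. The function name and the example posteriors are hypothetical, and this is only one common way to quantify the errors, not necessarily what the Step3 module does internally.

```python
def classification_error_matrix(posteriors):
    """Estimate P(assigned = s | true = t) from per-case posteriors.

    posteriors[i][t] is the posterior probability that case i belongs to
    class t. Cases are assigned to their modal class. Assumes every class
    has a positive total posterior weight.
    """
    n_classes = len(posteriors[0])
    counts = [[0.0] * n_classes for _ in range(n_classes)]
    for post in posteriors:
        assigned = max(range(n_classes), key=lambda k: post[k])
        for t in range(n_classes):
            counts[t][assigned] += post[t]
    # Normalize each row so it sums to one.
    return [[c / sum(row) for c in row] for row in counts]


# Illustrative posteriors for four cases and two classes.
example_posteriors = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.1, 0.9]]
errors = classification_error_matrix(example_posteriors)
```

The off-diagonal entries of `errors` are the misclassification rates that a bias-adjusted step-3 analysis would need to take into account.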

Latent class (LC) choice models analyze choice data in a way that accounts for heterogeneity by allowing different population segments (latent classes) to express different preferences in making their choices.

For a first choice model, an extended multinomial logit model (MNL) is used to estimate the probability of making a specific choice as a function of choice attributes and individual characteristics (predictors).
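As a rough illustration of the first choice model, the sketch below computes plain multinomial logit probabilities and then mixes them over latent classes. The attribute values, coefficients, and class sizes are invented, and the "extended" features of the model are not represented.

```python
import math


def mnl_probabilities(alternatives, betas):
    """Multinomial logit choice probabilities for one choice set.

    alternatives[a] is the attribute vector of alternative a;
    betas are the attribute coefficients (hypothetical values).
    """
    utilities = [sum(b * x for b, x in zip(betas, alt)) for alt in alternatives]
    shift = max(utilities)  # subtract the maximum for numerical stability
    exp_u = [math.exp(u - shift) for u in utilities]
    total = sum(exp_u)
    return [e / total for e in exp_u]


def lc_choice_probabilities(alternatives, class_sizes, class_betas):
    """Mix class-specific MNL probabilities using the class sizes."""
    mixed = [0.0] * len(alternatives)
    for size, betas in zip(class_sizes, class_betas):
        for a, p in enumerate(mnl_probabilities(alternatives, betas)):
            mixed[a] += size * p
    return mixed


# Three alternatives described by (price, quality); two latent classes.
choice_set = [[1.0, 2.0], [2.0, 3.0], [3.0, 5.0]]
probs = lc_choice_probabilities(
    choice_set,
    class_sizes=[0.7, 0.3],
    class_betas=[[-1.0, 0.5], [-0.2, 1.0]],
)
```

Each class has its own preference weights, so the marginal probabilities in `probs` are a weighted average of two different MNL models.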

The sequential logit model is used for situations where two or more choices are selected from a choice set. This includes a 1st and 2nd choice, 1st and last choice (best-worst), or other partial rankings as well as a complete ranking of all alternatives.
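The sequential structure for (partial) rankings can be sketched as a product of logit probabilities over shrinking choice sets. This is a generic rank-ordered logit sketch with made-up utilities. It does not cover the best-worst case, which needs separate treatment of the "worst" choice.

```python
import math


def ranking_probability(utilities, ranking):
    """Probability of a full or partial ranking under a sequential logit.

    utilities[a] is the utility of alternative a (hypothetical values);
    ranking lists the chosen alternatives in order: 1st choice, 2nd, ...
    """
    remaining = list(range(len(utilities)))
    prob = 1.0
    for chosen in ranking:
        denom = sum(math.exp(utilities[a]) for a in remaining)
        prob *= math.exp(utilities[chosen]) / denom
        remaining.remove(chosen)  # the chosen alternative leaves the set
    return prob


utils = [0.5, -0.2, 1.1]
p_first_second = ranking_probability(utils, [2, 0])   # 1st and 2nd choice only
p_full = ranking_probability(utils, [2, 0, 1])        # complete ranking
```

With three alternatives, the last position is determined once the first two are known, so the two probabilities above coincide.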

The latent Markov model is a popular longitudinal data variant of the standard latent class model; it is in fact a latent class cluster model in which individuals are allowed to switch between clusters across measurement occasions.
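The idea of switching between clusters across occasions can be sketched with the standard forward recursion for a latent Markov (hidden Markov) model. The example assumes a single categorical indicator per occasion and time-constant parameters; all numbers are illustrative.

```python
def latent_markov_likelihood(initial, transition, emission, observations):
    """Likelihood of one case's response sequence via the forward recursion.

    initial[s]       : P(state s at the first occasion)
    transition[r][s] : P(state s at t+1 | state r at t)
    emission[s][y]   : P(response y | state s)
    observations     : observed response category at each occasion
    """
    n_states = len(initial)
    alpha = [initial[s] * emission[s][observations[0]] for s in range(n_states)]
    for y in observations[1:]:
        alpha = [
            sum(alpha[r] * transition[r][s] for r in range(n_states)) * emission[s][y]
            for s in range(n_states)
        ]
    return sum(alpha)


# Two latent states, binary response, three measurement occasions.
initial = [0.6, 0.4]
transition = [[0.9, 0.1], [0.2, 0.8]]
emission = [[0.8, 0.2], [0.3, 0.7]]
lik = latent_markov_likelihood(initial, transition, emission, [0, 0, 1])
```

Setting `transition` to the identity matrix forbids switching, and the model then reduces to an ordinary latent class model with repeated measures.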

CFactors can be used to specify continuous latent variable models, such as factor analysis, item response theory models, latent trait models, and regression models with continuous random effects. The CFactors can be included in any LC Cluster, DFactor or LC regression model.

If CFactors are included, additional information on the CFactor effects appears in the Parameters output, and CFactor scores appear in the Standard Classification, ProbMeans, and Classification Statistics output.

This advanced option is used to specify a multilevel extension to an LC Cluster, DFactor or LC Regression model which allows for explanation of the heterogeneity not only at the case level, but also at the group level.

Group-level variation may also be accounted for by specifying group-level latent classes (GClasses) and/or group-level CFactors (GCFactors). In addition, when 2 or more GClasses are specified, group-level covariates (GCovariates) can be included in the model for improved description/prediction.

Two important survey sampling designs are stratified sampling -- sampling cases within strata, and two-stage cluster sampling -- sampling within primary sampling units (PSUs) and subsequent sampling of cases within the selected PSUs. Moreover, sampling weights may exist.

While the assumed behavioral mechanism underlying RUM-based models is that individuals select the alternative having the largest utility, RRM-based models assume that individuals select the alternative having the smallest potential regret.
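The contrast between random utility maximization (RUM) and random regret minimization (RRM) can be sketched as follows. The regret function used here is one commonly cited formulation, in which regret accumulates over attribute-wise comparisons with every competing alternative. Whether Latent GOLD uses exactly this form is an assumption, and the attribute values and coefficients are invented.

```python
import math


def _softmax(values):
    shift = max(values)
    exp_v = [math.exp(v - shift) for v in values]
    total = sum(exp_v)
    return [e / total for e in exp_v]


def rum_probabilities(alternatives, betas):
    """Logit probabilities: the largest utility is the most likely choice."""
    utilities = [sum(b * x for b, x in zip(betas, alt)) for alt in alternatives]
    return _softmax(utilities)


def rrm_probabilities(alternatives, betas):
    """Logit on negative regret: the smallest regret is the most likely choice.

    Assumed regret of alternative i:
        sum over j != i and attributes m of ln(1 + exp(beta_m * (x_jm - x_im)))
    """
    regrets = []
    for i, alt_i in enumerate(alternatives):
        regret = 0.0
        for j, alt_j in enumerate(alternatives):
            if j == i:
                continue
            for b, x_i, x_j in zip(betas, alt_i, alt_j):
                regret += math.log(1.0 + math.exp(b * (x_j - x_i)))
        regrets.append(regret)
    return _softmax([-r for r in regrets])


# Three alternatives with two attributes each; the middle one is a compromise.
options = [[1.0, 5.0], [3.0, 3.0], [5.0, 1.0]]
weights = [0.6, 0.6]
p_rum = rum_probabilities(options, weights)
p_rrm = rrm_probabilities(options, weights)
```

In this example all three alternatives have equal utility, so `p_rum` is uniform, whereas `p_rrm` favors the compromise alternative because it is never strongly outperformed on any single attribute.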

All Student licenses are annual and must be renewed each year on or before the expiration date for continued use. Student licenses need to be prepaid, and proof of your full-time student status must be provided.

Tutorials take you step by step through several analyses of these sample files. These tutorials, along with various publications, are available on our website. Upon purchase of the program, users can download a 200-page User's Guide and other manuals that cover a wide range of topics on Latent Class Analysis and Latent GOLD.

There is NO limit on the number of records. The run time will depend on several factors, including the number of variables and records, the speed of your machine, and the requested output. For many models, Latent GOLD runs 20 or more times faster than other Latent Class programs, and version 5.0 is much faster than earlier versions. We suggest trying the demo program to see how fast Latent GOLD works on your machine.

Latent GOLD implements the 3 most important types of latent class (LC) models. It was designed to be extremely easy to use and to make it possible for people without a strong statistical background to apply LC analysis to their own data in a safe and easy way. LEM is a command language research tool that Prof. Jeroen Vermunt developed for applied researchers with a strong statistical background who want to apply nonstandard log-linear and latent class models to their categorical data. With LEM you can specify more probability structures with many more kinds of restrictions (if you know how to do it), but it is not designed to be Windows friendly, requires strict data and input formats, and does not provide error checks.

With Latent GOLD, continuous and count variables can be included in the model, and special LC output not available in LEM is provided, such as various graphs, classification statistics, and bivariate residuals. Latent GOLD also has faster (full Newton-Raphson) and safer (sets of starting values, Bayes constants) estimation methods for LC models than LEM. Both programs give information on nonidentifiability and boundary solutions, but Latent GOLD, unlike LEM, can prevent boundary solutions through the use of Bayes constants.

The set of example data files on our website contains various event history analysis examples. The setup for several Event History models can be opened in Latent GOLD using the HELP GUI Example Regression menu. Full tutorials are not yet available for these. However, to get you started, you might look at the data file land.sav, the full reference for which is "Land, K.C., Nagin, D.S., and McCall (2001). Discrete-time hazard regression models with hidden heterogeneity: the semi-parametric mixed Poisson approach. Sociological Methods and Research, 29, 342-373." Another good example is jobchange.dat.

Land.sav contains information on 411 males from a working-class area of London who were followed from ages 10 through 31. The dependent variable is "first serious delinquency". As can be seen, there is one record for each time point, which is called a person-period data format. The dependent "first" is zero for all records of a person, except for the last if the person experienced the event of interest at that age. The variables age and age_sq are the duration variables. These can also be seen as time-varying predictors. The variable "tot" is a time-constant covariate/predictor (a composite risk factor). Of course, the ID should be used as Case ID to indicate which records belong to the same case.
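The person-period layout described above can be sketched as follows. The function name and the example values are hypothetical, and the record keys simply mirror the variable names mentioned for land.sav.

```python
def to_person_period(case_id, start_age, last_age, event_occurred, tot):
    """Expand one person's history into one record per age (person-period).

    "first" is 0 in every record except the last one, where it is 1 only
    if the event occurred at that age. Censored cases never get a 1.
    """
    records = []
    for age in range(start_age, last_age + 1):
        records.append({
            "id": case_id,                 # Case ID linking the records
            "age": age,                    # duration variable
            "age_sq": age * age,           # squared duration variable
            "tot": tot,                    # time-constant risk factor
            "first": 1 if (event_occurred and age == last_age) else 0,
        })
    return records


# A person with a first delinquency at age 14, and a censored person.
event_case = to_person_period(1, 10, 14, event_occurred=True, tot=2)
censored_case = to_person_period(2, 10, 31, event_occurred=False, tot=0)
```

The first case contributes five records ending in `first = 1`, while the censored case contributes 22 records that all have `first = 0`.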

The dependent "first" can be treated as a Poisson count or as a binomial count. The former option yields a piece-wise constant log-linear hazard model, the latter a discrete-time logit model. If it is treated as a Poisson count, it is best to set the exposure to one half (exp_half: the event occurs in the middle of the interval) for the time point at which the event occurs. With a binomial count the exposure should be one at all time points (the default). Age and age_sq should be used as class-dependent predictors. This identifies two groups with clearly different age patterns in the rate of first delinquency. The variable "tot" can be used as a class-independent predictor, but it is more interesting to use it as a covariate: does the risk factor determine the type of delinquency trajectory?
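A minimal sketch of the two treatments, assuming a quadratic age effect and invented coefficients, may help to show how class-dependent predictors enter the likelihood. The mixture likelihood below uses the binomial (discrete-time logit) treatment. None of the numbers are estimates from land.sav.

```python
import math


def linear_predictor(age, coefs):
    b0, b_age, b_age_sq = coefs
    return b0 + b_age * age + b_age_sq * age * age


def logit_hazard(age, coefs):
    """Binomial treatment: discrete-time logit hazard at a given age."""
    return 1.0 / (1.0 + math.exp(-linear_predictor(age, coefs)))


def poisson_rate(age, coefs, exposure=1.0):
    """Poisson treatment: log-linear hazard rate times the exposure."""
    return exposure * math.exp(linear_predictor(age, coefs))


def history_likelihood(records, coefs):
    """Likelihood of one person's records under the discrete-time logit."""
    likelihood = 1.0
    for rec in records:
        h = logit_hazard(rec["age"], coefs)
        likelihood *= h if rec["first"] == 1 else (1.0 - h)
    return likelihood


def mixture_likelihood(records, class_sizes, class_coefs):
    """Mix over latent classes that have their own age coefficients."""
    return sum(
        size * history_likelihood(records, coefs)
        for size, coefs in zip(class_sizes, class_coefs)
    )


# One person observed at ages 10 to 12 with the event at age 12.
person = [
    {"age": 10, "first": 0},
    {"age": 11, "first": 0},
    {"age": 12, "first": 1},
]
lik = mixture_likelihood(
    person,
    class_sizes=[0.5, 0.5],
    class_coefs=[(0.0, 0.0, 0.0), (-3.0, 0.1, 0.0)],
)
```

In the Poisson treatment, the record in which the event occurs would use `exposure=0.5`, in line with the one-half exposure recommended above.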