Part 2: Latent growth (mixture) modeling

Day 3: Longitudinal mixture modeling

The goal of this exercise is to find subpopulations with different alcohol use trajectories. To this end, we start with an exploratory latent class growth analysis, and work towards a growth mixture model.

Exercise 2A

First, we perform an exploratory latent class growth analysis (LCGA). Set up LCGA models for 1, 2, 3, and 4 classes for the data in DDS8_1.dat. Specify the model using the | notation, and constrain the variances of the intercept and slope factors to be equal to 0. Furthermore, request TECH11 and TECH14 to help evaluate model fit.

The Mplus model syntax for the LCGA models with 1, 2, 3, and 4 classes is in exercise2A_1C.inp, exercise2A_2C.inp, exercise2A_3C.inp, and exercise2A_4C.inp, respectively.
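In case you want to check your syntax, here is a minimal sketch of what such an input file could look like for the 3-class model. The NAMES list and the time scores are assumptions (the variable names are taken from the SERIES statement used later in this document); adapt them to the actual data.

TITLE: LCGA with 3 classes (sketch);
DATA:
    FILE = DDS8_1.dat;
VARIABLE:
    NAMES = ALC1YR1 ALC1YR2 ALC1YR3;  ! assumed; the data may contain more variables
    CLASSES = c(3);
ANALYSIS:
    TYPE = MIXTURE;
MODEL:
    %OVERALL%
    i s | ALC1YR1@0 ALC1YR2@1 ALC1YR3@2;  ! growth model in | notation; assumed time scores
    i-s@0;                                ! constrain intercept and slope variances to 0 (LCGA)
OUTPUT:
    TECH11 TECH14;

For the models with 1, 2, and 4 classes, only the number in CLASSES = c(); changes.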

Exercise 2B

These models use random starting values. Several independent random starts are made to ensure that the model converges on the proper solution. The default is 20 random sets of starting values, of which 4 are run to completion. Inspect the output, and check carefully whether the model estimation has converged, especially for the larger numbers of classes. Look for warning and error messages, and make sure you understand what they are telling you.

The STARTS option is used to specify the number of initial random starting values and final stage optimizations. Now, increase the number of starts to ensure proper convergence. Once you are confident that the model has converged on the proper solution, compare the different models using the available fit information (e.g., BIC, LMR-LRT, BLRT, entropy, and the minimum number of cases in a class). Which model do you prefer, and why?

You can inspect the convergence of the model by checking that the best final stage loglikelihood value has been replicated from different starting values. See the output section RANDOM STARTS RESULTS RANKED FROM THE BEST TO THE WORST LOGLIKELIHOOD VALUES. You can increase the number of random starts by specifying STARTS = 50 10; in the ANALYSIS command, where the first number specifies the number of initial random starts, and the second number specifies the number of starts that are run to completion in the final stage.
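In context, the ANALYSIS command then looks as follows; if the best loglikelihood is still not replicated, increase the numbers further.

ANALYSIS:
    TYPE = MIXTURE;
    STARTS = 50 10;  ! 50 initial random starts, 10 run to completion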

Based on the fit indices of the fitted LCGA models, I would select a 3-class model. The fit indices and LMR-LRT tests essentially indicate that you can keep adding classes and improving the model, which makes it difficult to decide. However, if we look at the counts in each class, we see that from 4 classes onward, the smallest class has fewer than 10% of cases assigned to it. The minimum posterior classification probability and the entropy are best for the 3-class model, which means that this model can assign individuals to classes with reasonable accuracy.

Exercise 2C

Set up the same models as analyzed in the previous exercise, but now allow the means and variances of the intercept and slope factors to be freely estimated in each class (i.e., a growth mixture model). You do this by mentioning the intercept and slope explicitly in the class-specific part of the syntax, as sketched below. This is a more complex model, so we might expect to need fewer classes for a good description of the data. The analysis will also take more computing time, so add PROCESSORS = 4 to the ANALYSIS command. Make a table of the fit indices; look at the BIC, the LMR-LRT (TECH11), and the bootstrapped LRT (TECH14).
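As a sketch, the class-specific part of a 2-class GMM could look like this (the class labels and time scores follow the assumptions made above):

ANALYSIS:
    TYPE = MIXTURE;
    PROCESSORS = 4;
MODEL:
    %OVERALL%
    i s | ALC1YR1@0 ALC1YR2@1 ALC1YR3@2;
    %c#1%
    [i s];  ! class-specific growth factor means
    i s;    ! class-specific growth factor variances
    %c#2%
    [i s];
    i s;

Listing the variances i s under each class relaxes the Mplus default of holding them equal across classes; the means [i s] already vary across classes by default, but mentioning them makes the class-specific parameterization explicit.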

The Mplus model syntax for the GMMs with 1, 2, 3, and 4 classes is in exercise2C_1C.inp, exercise2C_2C.inp, exercise2C_3C.inp, and exercise2C_4C.inp, respectively. The BLRT for the GMM with 3 classes warns that “OF THE 10 BOOTSTRAP DRAWS, 7 DRAWS HAD BOTH A SMALLER LRT VALUE THAN THE OBSERVED LRT VALUE AND NOT A REPLICATED BEST LOGLIKELIHOOD VALUE FOR THE 3-CLASS MODEL.” You can increase the number of random starts for the BLRT using LRTSTARTS = 0 0 40 10; in the ANALYSIS command. Also note the convergence problems for the GMM with 4 classes; these indicate that a model with 4 classes is likely too complex for the data. Furthermore, since a GMM with only a single class is equivalent to a “simple” LGCM, we focus on the GMMs with 2 and 3 classes in the next exercises.
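A sketch of the corresponding ANALYSIS command is given below; in LRTSTARTS, the first two numbers refer to the model with k − 1 classes and the last two to the model with k classes, as estimated within each bootstrap draw.

ANALYSIS:
    TYPE = MIXTURE;
    PROCESSORS = 4;
    STARTS = 50 10;
    LRTSTARTS = 0 0 40 10;  ! 40 initial and 10 final stage starts for the k-class model in TECH14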

Exercise 2D

Plotting the model-predicted trajectories makes it easier to interpret the model. Moreover, visualizing the raw data provides yet another way to evaluate how well your mixture model fits the data. With this in mind, plot the GMMs with 2 and 3 classes you created in exercise 2C, and interpret what you see. First, plot only the predicted trajectories. Then, plot the raw data as well. Explain the benefit of plotting the raw data in your own words.

Plotting the raw data helps us understand how well the average trajectory of each class represents the individual trajectories of the individuals assigned to that class. It also lets us see how separable the classes are visually, instead of relying solely on statistics like entropy. To get plots, make sure you include:

PLOT:
    TYPE = PLOT3;
    SERIES = ALC1YR1(1) ALC1YR2(2) ALC1YR3(3);

when running the model. Then press the “plot” button and select “Estimated means and observed individual values” and “Estimated means and estimated individual values”. Below you can find these plots for the GMM with 2 classes.

Estimated means and individual values.

Estimated means and estimated individual trajectories.

Admittedly, this plot looks chaotic, primarily because the individual trajectories are not colored according to the class to which they are assigned. Alternatively, you can get the estimated means and (individual) values for each class in a separate window. However, MplusAutomation has some useful plotting functions that I recommend you explore, for example plotGrowthMixtures(), which can color the lines according to their class membership.

Exercise 2E

Covariates are often added to mixture models 1) to predict class membership, 2) to explain variation in the growth parameters within the classes, or 3) as distal outcomes. However, whenever covariates are added to the model, they change the latent class solution. Sometimes this is fine, as the covariates can help improve the classification. In other cases, you would use a 3-step approach, which Mplus has automated:

  1. Fit an unconditional LCA (without covariates).
  2. A “most likely class variable” is created using the posterior distribution of step 1.
  3. This most likely class variable is then regressed on the covariate(s).

There are a few options for how to do a 3-step analysis. They all rely on adding the AUXILIARY option to the VARIABLE command. For more info, see https://www.statmodel.com/download/webnotes/webnote15.pdf.

Commands for conducting a 3-step model

You can add the following options to the VARIABLE command:

  1. AUXILIARY = x(R);
    This is actually a 1-step method for predicting latent class membership using pseudo-class draws.
  2. AUXILIARY = x(R3STEP);
    A 3-step procedure, where covariates predict the latent class.
  3. AUXILIARY = y(E);
    A 1-step method, where the latent class predicts a continuous distal outcome.
  4. AUXILIARY = y(DE3STEP);
    A 3-step procedure, where the latent class predicts a continuous distal outcome, assuming unequal means and equal variances across classes.
  5. AUXILIARY = y(DU3STEP);
    A 3-step procedure, where the latent class predicts a continuous distal outcome, assuming unequal means and unequal variances across classes.
  6. AUXILIARY = y(DCON);
    Procedure for continuous distal outcomes, as suggested by Lanza et al. (2013).
  7. AUXILIARY = y(DCAT);
    Procedure for categorical distal outcomes, as suggested by Lanza et al. (2013).
  8. AUXILIARY = y(BCH);
    Improved and currently the best 3-step procedure for continuous distal outcomes.

Pick your final model from 2C, and add both age and gender as auxiliary variables to the model. Think about which 3-step model you want; if you are not sure, run different models so you can evaluate how the different procedures affect the results. What are the effects of age and gender?

I’m providing an example using the 3-step procedure in exercise2E.inp. The results of adding these covariates to the model as predictors of the latent class can be found under the heading TESTS OF CATEGORICAL LATENT VARIABLE MULTINOMIAL LOGISTIC REGRESSIONS USING THE 3-STEP PROCEDURE. From the overall test and the pairwise comparisons, it can be seen that the third class is significantly older and has a significantly lower proportion of girls than the other two classes.
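For reference, here is a minimal sketch of the relevant part of such an input file, assuming the covariates are named AGE and GENDER in the data and taking the 3-class model as the final model:

VARIABLE:
    NAMES = ALC1YR1 ALC1YR2 ALC1YR3 AGE GENDER;  ! assumed variable names
    USEVARIABLES = ALC1YR1 ALC1YR2 ALC1YR3;
    AUXILIARY = AGE(R3STEP) GENDER(R3STEP);      ! covariates predict class membership
    CLASSES = c(3);
ANALYSIS:
    TYPE = MIXTURE;

Because AGE and GENDER are specified as auxiliary variables, they do not change the class solution itself; the multinomial regression of class membership on the covariates is estimated in the third step.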