Part 1: Latent Class Analysis and Latent Transition Analysis
Day 3: Longitudinal mixture modeling
In this exercise we will explore the development of dating status over time using a latent transition analysis (LTA). However, we will slowly build up the model, starting with a latent class analysis (LCA) at a single time point before moving on to the LTA. Use the data in `DatingSex.dat`, which holds data on five dating indicators measured at two occasions (`u11`, …, `u15`, `u21`, …, `u25`), as well as the variable `gender`. The \(u\)-variables represent five yes/no items measured at two time points (the first digit represents the time point, the second digit the item).
The exercises in this lab session are designed, in the first place, for use with Mplus. However, for those interested, there is the option to run the mixture models in batch using the R package `MplusAutomation`. This package allows you to automate part of your workflow (like making plots and tables), and provides plotting functions that are more aesthetically pleasing, as well as more insightful. The exercises using `MplusAutomation` can be found with the course materials on SURFdrive.
Exercise 1: Latent Class Analysis
Before we can perform an LTA, we first need to determine the number of classes at each time point using separate LCAs.
Exercise 1A: Number of classes
Decide how many latent classes you think should be used. To determine this, first set up an LCA for the first time point (i.e., ignoring the variables at time point 2, as well as `gender`). Run an LCA for 2, 3, 4, and 5 classes, and collect information on the AIC, the BIC, the LMR LRT (`TECH11`), the bootstrapped LRT (`TECH14`), and the size of the smallest class. Fill in the table below, and then decide how many classes you want to use for the next set of analyses. To help in your decision, you can also create the LCA item profiles using the `SERIES` option of the `PLOT:` command.
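A minimal input sketch for the two-class run (assuming the columns in `DatingSex.dat` are ordered as described above; change `c(2)` to `c(3)`, and so on, for the other runs):

```
TITLE:    LCA at time point 1, 2 classes;
DATA:     FILE = DatingSex.dat;
VARIABLE: NAMES = u11-u15 u21-u25 gender;
          USEVARIABLES = u11-u15;   ! time point 1 only, no gender
          CATEGORICAL = u11-u15;
          CLASSES = c(2);
ANALYSIS: TYPE = MIXTURE;
OUTPUT:   TECH11 TECH14;            ! LMR LRT and bootstrapped LRT
PLOT:     TYPE = PLOT3;
          SERIES = u11-u15 (*);     ! item profile plot
```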
Number of classes | AIC | BIC | LMR LRT | Bootstrapped LRT | Size smallest class |
---|---|---|---|---|---|
2 | | | | | |
3 | | | | | |
4 | | | | | |
5 | | | | | |
The information for selecting the number of latent classes at the first time point is given in the table below.
Number of classes | AIC | BIC | LMR LRT | Bootstrapped LRT | Size smallest class |
---|---|---|---|---|---|
2 | 6681 | 6735 | \(p < .001\) | \(p < .001\) | 360 |
3 | 6659 | 6743 | \(p = .017\) | \(p < .001\) | 216 |
4 | 6668 | 6781 | \(p = .146\) | \(p = 1.000\) | 43 |
5 | 6675 | 6818 | \(p = .473\) | \(p = .500\) | 43 |
The AIC points to three classes, whereas the BIC is lowest for the two-class LCA. In deciding between the two-class and three-class solutions, I would let substantive arguments play the deciding role (i.e., see whether the different classes can be given a substantive interpretation). For now, these exercises will continue with the two-class solution.
Exercise 1B: Model convergence
Evaluate whether the two-class LCA model converged to a global solution. In other words, do we have reason to believe that Mplus did not find the best solution for the parameter estimates (i.e., is there evidence for a local solution)?
For this, we need to check whether our best loglikelihood value is replicated many times in a second run with an increased number of random start perturbations. It would be problematic if we cannot replicate the best solution from the first run (i.e., the solution with the highest loglikelihood) when we use different starting values: that would imply Mplus found local solutions, rather than ‘the best’ global solution. The `STARTS` option in Mplus helps us check this. In the output of the first run, look for the `RANDOM STARTS RESULTS RANKED FROM THE BEST TO THE WORST LOGLIKELIHOOD VALUES` section, and note the best loglikelihood. In the first run, we already see that the best loglikelihood value has been replicated approximately eight times out of ten (this number can change, as the starting values are random). Now run a second analysis with an increased number of random starts. Again, look for the `RANDOM STARTS RESULTS RANKED FROM THE BEST TO THE WORST LOGLIKELIHOOD VALUES` section, and check whether the best loglikelihood from the first run has been replicated.
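For example, if the first run used the Mplus defaults, the second run could request something like the following (the exact numbers are a judgment call, as long as they are substantially larger than in the first run):

```
ANALYSIS: TYPE = MIXTURE;
          STARTS = 500 100;   ! 500 random starts, 100 final-stage optimizations
```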
Exercise 1C: Model fit
Continuing with the two-class solution, have a look at the output, specifically the model fit information and the response pattern frequencies in `TECH10`. Evaluate the model fit. Can we use the Chi-Square test of model fit for this? Are there particular patterns in the data that the two-class LCA cannot replicate well?
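`TECH10` is requested like the other technical outputs (the bivariate fit information used below is part of it):

```
OUTPUT:   TECH10;   ! response pattern frequencies, standardized residuals,
                    ! and univariate/bivariate model fit information
```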
Looking at the Chi-Square Test of Model Fit, we can see that the Pearson Chi-Square and the Likelihood Ratio Chi-Square are very similar. This implies that we can actually use these tests to assess model fit. The p-value lower than .05 indicates that the model does not fit well. Note, however, that the similarity of the Pearson Chi-Square and the Likelihood Ratio Chi-Square values is the exception rather than the rule. Especially when you have eight or more binary outcomes, it is very likely that the two values differ substantially, in which case you wouldn’t look at either of the tests.
The Standardized Residuals (z-scores) of the response patterns (in `TECH10` of the output) can inform us about where the model misfit might come from. We are looking for absolute values greater than 1.96, which indicate significant differences between the observed and estimated frequencies. The most problematic response pattern is pattern 11, which is 00111, and also the most frequent pattern in the data: 94 individuals are observed to have this pattern, but the model predicts only 67.85 individuals to have it. We can think of adjustments to the model to resolve this problem. For example, we might look at a three-class solution after all, or we might inspect the `BIVARIATE MODEL FIT INFORMATION` to get further information about which pairs of items are problematic for the model.
Exercise 1D: LCA model parameters
Now look at the model output, specifically `RESULTS IN PROBABILITY SCALE` and `LATENT CLASS INDICATOR ODDS RATIOS FOR THE LATENT CLASS`. Which item (or items) is (are) most predictive of the class you actually belong to?
The `RESULTS IN PROBABILITY SCALE` section tells us, for each class separately, what the probability is of scoring in category 1 (‘no’) or category 2 (‘yes’) on each item. These probabilities differ the most between the classes for items 3 and 5. This indicates that these two items are the most different across the classes, and therefore might also be the most useful for predicting which class you belong to. This hunch is confirmed by the results in the `LATENT CLASS INDICATOR ODDS RATIOS FOR THE LATENT CLASS` section. Here we see that items 3 and 5 have the highest odds ratios for latent class 1 (compared to latent class 2). In other words, scoring in category 2 rather than category 1 on item `U13` multiplies your odds of belonging to latent class 1 rather than latent class 2 by 8.067.
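To make this concrete: for item `U13`, the reported odds ratio is the ratio of the yes/no odds in the two classes (a generic odds-ratio definition; the 8.067 comes from the output above), and, by the symmetry of odds ratios, the same factor multiplies the odds of class membership given the response:

\[
\text{OR} = \frac{P(\text{yes} \mid c = 1) \,/\, P(\text{no} \mid c = 1)}{P(\text{yes} \mid c = 2) \,/\, P(\text{no} \mid c = 2)} = 8.067
\]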
Exercise 2: Latent Transition Analysis
For the LTA, we are going to extend the LCA of Exercise 1 to two time points, and additionally include a structural relationship between the latent classes at time point 1 and time point 2.
Exercise 2A: Setting up the LTA
Set up a model with two latent class variables for the two time points, say `c1` and `c2`. Exclude the variable `gender` for now (i.e., explore the LTA without covariates first), and assume there are 2 latent classes at each time point. Restrict the thresholds (and hence the response probabilities) across the two time points by first repeating the thresholds for each latent class in both `MODEL c1:` and `MODEL c2:`. To be sure Mplus does what you want, include equality constraints on the five thresholds of `%c1#1%` and `%c2#1%`, and similarly for `%c1#2%` and `%c2#2%`; see the sketch below. Note that these constraints can be interpreted as imposing measurement invariance over time. Although we don’t test the assumption of measurement invariance in these exercises, it is definitely something you would want to check in practice.
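A sketch of a corresponding input file (again assuming the column order described earlier; the numeric labels in parentheses impose the threshold equalities across time points):

```
DATA:     FILE = DatingSex.dat;
VARIABLE: NAMES = u11-u15 u21-u25 gender;
          USEVARIABLES = u11-u15 u21-u25;
          CATEGORICAL = u11-u15 u21-u25;
          CLASSES = c1(2) c2(2);
ANALYSIS: TYPE = MIXTURE;
MODEL:
          %OVERALL%
          c2 ON c1;                 ! transition from time 1 to time 2
MODEL c1:
          %c1#1%
          [u11$1-u15$1] (1-5);      ! thresholds, class 1 at time 1
          %c1#2%
          [u11$1-u15$1] (6-10);     ! thresholds, class 2 at time 1
MODEL c2:
          %c2#1%
          [u21$1-u25$1] (1-5);      ! equal to %c1#1%: measurement invariance
          %c2#2%
          [u21$1-u25$1] (6-10);     ! equal to %c1#2%
OUTPUT:   TECH15;                   ! latent transition probabilities
```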
After running the analysis, inspect the proportions of yes/no answers for each of the indicators in the latent classes (look at the results in probability scale in the Mplus output).
The parameters in probability scale can be found under the heading `RESULTS IN PROBABILITY SCALE` in the output file. For example, we find that the probability of falling into category 1 of item `U11` (i.e., answering no) is 0.434 (\(SE = 0.024\)) if you are in class 1 at the first time point, and 0.566 (\(SE = 0.024\)) for falling into category 2 (i.e., answering yes). Because of the imposed constraints, the same probabilities also apply to time point 2; see `Latent Class C2#1`.
Exercise 2B: Estimated transition probabilities
Examine the proportions of participants in each class, based on the estimated model. Note that for each latent variable, the total proportions add up to 1. Next, examine the latent transition probabilities based on the estimated model. What do these probabilities signify?
You can find the proportions of participants in each class under the heading `FINAL CLASS COUNTS AND PROPORTIONS FOR EACH LATENT CLASS VARIABLE BASED ON ESTIMATED POSTERIOR PROBABILITIES`. These proportions represent the share of the total sample that is assigned to each class. Note that an individual can have a non-zero probability of being assigned to both classes. For example, there might be a 70% probability that a person belongs to class 1, and a 30% probability that the person belongs to class 2. The proportions here sum those posterior probabilities across all participants; thus, this person would contribute 30% of an observation to the count for class 2.
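In formula form, the estimated proportion for class \(k\) is the average of the individual posterior class probabilities (a standard identity, with \(N\) the sample size and \(\mathbf{u}_i\) person \(i\)'s response pattern):

\[
\hat{\pi}_k = \frac{1}{N} \sum_{i=1}^{N} \hat{P}(c_i = k \mid \mathbf{u}_i)
\]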
You can find the latent transition probabilities based on the estimated model under the heading `TECHNICAL 15 OUTPUT`. These probabilities represent the probability that an individual who belongs to a given class at time point 1 belongs to each of the classes at time point 2. For example, we see that people in class 1 at time 1 also tend to be in class 1 at time 2 (.76 probability).
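Because the two transition probabilities out of a given time-1 class must sum to 1, the reported value also gives the probability of switching classes:

\[
P(c_2 = 2 \mid c_1 = 1) = 1 - P(c_2 = 1 \mid c_1 = 1) = 1 - .76 = .24
\]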