Part 1: Latent transition analysis

Day 3: Longitudinal mixture modeling

In this exercise we will explore the development of dating status over time using a latent transition analysis. Use the data in DatingSex.dat, which holds data on five dating indicators measured at two occasions (u11, …, u15, u21, …, u25), as well as the variable gender. The u-variables represent five yes/no items measured at two time points (first digit represents time point, second digit represents the item).

The exercises in this lab session are designed primarily for use with Mplus. However, for those interested, there is the option to run the latent growth mixture models in batch using the R package MplusAutomation. This package allows you to automate part of your workflow (such as making plots and tables), and provides plotting functions whose output is more aesthetically pleasing as well as more insightful. The exercises using MplusAutomation can be found with the course materials on SURFdrive.

Exercise 1A

Set up a model with two latent class variables for the two time points, say c1 and c2. Exclude the variable gender for now (i.e., explore the LTA without covariates first), and assume there are 2 latent classes. Restrict the thresholds (and hence response probabilities) across the two time points by first repeating the thresholds for each latent class, in both MODEL c1 and MODEL c2. To be sure Mplus does what you want, include equality constraints on the five thresholds of %c1#1% and %c2#1%, and similarly for %c1#2% and %c2#2%. Note that these constraints can be interpreted as imposing measurement invariance over time. Although we don’t test the assumption of measurement invariance in these exercises, it is definitely something you would want to check in practice.

After running the analysis, inspect the proportions of yes/no answers for each of the indicators in the latent classes (look at probability scale in Mplus output).
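For reference, the link between a threshold and a response probability can be written out as follows. This is a sketch that assumes the logit link Mplus uses for categorical latent class indicators under maximum likelihood; the symbol \(\tau_{jk}\) (threshold of item \(j\) in class \(k\)) is notation introduced here, not taken from the output.

```latex
\[
P(u_j = \text{category 1} \mid c = k) = \frac{1}{1 + \exp(-\tau_{jk})},
\qquad
P(u_j = \text{category 2} \mid c = k) = \frac{1}{1 + \exp(\tau_{jk})}
\]
```

Because the mapping is one-to-one, holding a threshold equal across the two time points holds the corresponding response probability equal as well.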

The Mplus syntax is given in exercise1A.inp. The parameters on the probability scale can be found under the heading RESULTS IN PROBABILITY SCALE in the output file. For example, we find that the probability of falling into category 1 of the item U11 (i.e., answering no) is 0.434 (\(SE = 0.024\)) if you are in class 1 at the first time point, and the probability of falling into category 2 (i.e., answering yes) is 0.566 (\(SE = 0.024\)). Because of the imposed constraints, these probabilities also apply to time point 2; see Latent Class C2#1.
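As an illustration of the setup described in this exercise, a minimal input could look like the sketch below. It is not necessarily identical to exercise1A.inp: the column order in NAMES, the TITLE, and the label numbers 1 to 10 are assumptions made for this example.

```
TITLE:    LTA sketch: 2 classes per occasion, thresholds equal over time
DATA:     FILE = DatingSex.dat;
VARIABLE: NAMES = u11-u15 u21-u25 gender;   ! column order is an assumption
          USEVARIABLES = u11-u15 u21-u25;   ! gender is left out for now
          CATEGORICAL = u11-u15 u21-u25;
          CLASSES = c1(2) c2(2);
ANALYSIS: TYPE = MIXTURE;
          PARAMETERIZATION = PROBABILITY;
MODEL:
  %OVERALL%
  c2 ON c1;                  ! transition probabilities
MODEL c1:
  %c1#1%
  [u11$1-u15$1] (1-5);       ! labels 1-5: class 1 thresholds
  %c1#2%
  [u11$1-u15$1] (6-10);      ! labels 6-10: class 2 thresholds
MODEL c2:
  %c2#1%
  [u21$1-u25$1] (1-5);       ! same labels, so equal to %c1#1%
  %c2#2%
  [u21$1-u25$1] (6-10);      ! same labels, so equal to %c1#2%
OUTPUT:   TECH15;
```

Parameters that share a label are held equal, which is how the thresholds of %c1#1% and %c2#1% (and of %c1#2% and %c2#2%) are tied together.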

Exercise 1B

Examine the proportions of participants in each class, based on the estimated model. Note that for each latent variable, the total proportions add up to 1. Next, examine the latent transition probabilities based on the estimated model. What do these probabilities signify?

You can find the proportions of participants in each class under the heading FINAL CLASS COUNTS AND PROPORTIONS FOR EACH LATENT CLASS VARIABLE BASED ON ESTIMATED POSTERIOR PROBABILITIES. These proportions represent the share of the total sample that is assigned to each class. Note that an individual can have a non-zero probability of belonging to each of the classes. For example, there might be a 70% probability that a person belongs to class 1, and a 30% probability that the same person belongs to class 2. The class counts reported here are sums of these posterior probabilities across all participants, and the proportions are those counts divided by the sample size. Thus, this person would contribute 0.30 to the count for class 2.
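Written as a formula, the calculation described above looks as follows. The notation is introduced here for illustration: \(N\) is the sample size, \(\mathbf{u}_i\) the responses of person \(i\), and \(\hat{\pi}_k\) the estimated proportion in class \(k\).

```latex
\[
\hat{\pi}_k = \frac{1}{N} \sum_{i=1}^{N} P(c = k \mid \mathbf{u}_i),
\qquad
\sum_{k} \hat{\pi}_k = 1
\]
```

The proportions add up to 1 because each person's posterior probabilities add up to 1 across the classes.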

You can find the latent transition probabilities based on the estimated model under the heading TECHNICAL 15 OUTPUT. Each of these is the probability that an individual who is in a given class at time 1 will be in a given class at time 2, which includes the probability of staying in the same class. So, for example, we see that people in class 1 at time 1 also tend to be in class 1 at time 2 (.76 probability).
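In formula form, the transition probabilities are conditional probabilities, and every row of the transition matrix adds up to 1. The symbol \(p_{jk}\) is notation introduced here for illustration.

```latex
\[
p_{jk} = P(c_2 = k \mid c_1 = j),
\qquad
\sum_{k} p_{jk} = 1 \quad \text{for each } j
\]
\[
P(c_2 = k) = \sum_{j} P(c_1 = j)\, p_{jk}
\]
```

With two classes and \(p_{11} = .76\), the first row therefore implies \(p_{12} = 1 - .76 = .24\). The second line shows how the class proportions at time 2 follow from the proportions at time 1 and the transition probabilities.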

Exercise 1C

If there is time, you can conduct additional analyses. Include gender as a control variable on the observed variables.

You could include gender as a control variable on the observed variables, by adding it to the USEVARIABLES, and including the following lines:

u11-u15 ON gender;
u21-u25 ON gender;

See exercise1C.inp. Alternatively, you could regress class membership on gender to see whether men are more likely to be in a particular class than women, or vice versa. This is only allowed when you are NOT using the probability parameterization. So, you would have to remove this line:

PARAMETERIZATION = PROBABILITY;

And add this line:

c1 c2 ON gender;
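Under the logit parameterization, this statement corresponds to logistic regressions of the class variables on gender. The sketch below assumes two classes per time point, the last class as the reference class, and a model that also contains c2 ON c1; the Greek symbols are notation introduced here.

```latex
\[
\log \frac{P(c_1 = 1 \mid \text{gender})}{P(c_1 = 2 \mid \text{gender})}
  = \alpha_1 + \gamma_1\,\text{gender}
\]
\[
\log \frac{P(c_2 = 1 \mid c_1 = j, \text{gender})}{P(c_2 = 2 \mid c_1 = j, \text{gender})}
  = \alpha_2 + \beta_j + \gamma_2\,\text{gender},
\qquad \beta_2 = 0
\]
```

Here \(\exp(\gamma_1)\) would be the odds ratio for being in class 1 rather than class 2 at time 1 for a one-unit difference in gender.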

Exercise 1D

We are going to extend the LTA to a Mover-Stayer LTA. This model assumes that there is a subpopulation of “stayers” that stay in the same class, and a subpopulation of “movers” that is free to transition to a different class over time. In other words, a Mover-Stayer LTA includes an additional categorical latent variable with a “movers-class” and a “stayers-class”.

Answer the questions below:

  • Continue with 2 classes at both time points. For movers, there is no restriction on the 2 by 2 transition matrix. However, what should the transition matrix for the stayers look like in terms of probabilities? Furthermore, what should the transition matrix for stayers look like in terms of thresholds?
  • Add a Movers-Stayers class variable (i.e., an additional latent categorical variable with 2 classes, namely a movers class and a stayers class) to the LTA specified in the previous exercise. Make sure that for the movers class the transition matrix has no restrictions, and that the stayers class has the restrictions as discussed above.
  • Run the Mover-Stayer model and inspect the output. What proportion of people is categorized as movers, and what proportion as stayers?

By definition, for stayers who are in class 1 at time point 1, there is a probability of 1 of being in class 1 at time point 2 as well, and a probability of 0 of transitioning to class 2. For stayers who are in class 2 at time point 1, there is a probability of 0 of being in class 1 at time point 2, and a probability of 1 of being in class 2. Translating this to thresholds, a large negative threshold represents a very high probability of being in that class, whereas a large positive threshold represents a very low probability. So stayers who are in class 1 at time point 1 should have a large negative threshold for class 1 at time point 2 and a large positive threshold for class 2, and the reverse holds for stayers who are in class 2 at time point 1.
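In matrix form, with rows for the class at time point 1 and columns for the class at time point 2, the two transition matrices can be sketched as follows. The symbols \(p_{11}\) and \(p_{21}\) are notation introduced here for the freely estimated probabilities of the movers.

```latex
\[
\text{Movers: }
\begin{pmatrix}
p_{11} & 1 - p_{11} \\
p_{21} & 1 - p_{21}
\end{pmatrix}
\qquad
\text{Stayers: }
\begin{pmatrix}
1 & 0 \\
0 & 1
\end{pmatrix}
\]
```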

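Mplus model syntax for the mover-stayer model can be found in exercise1D.inp. Looking at TECHNICAL 15 OUTPUT, we can clearly see that we have specified a movers class (CM=1, with freely estimated transition probabilities) and a stayers class (CM=2, with transition probabilities of 0 and 1):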

P(C2=1|CM=1,C1=1)=0.271
P(C2=2|CM=1,C1=1)=0.729

P(C2=1|CM=1,C1=2)=0.542
P(C2=2|CM=1,C1=2)=0.458

P(C2=1|CM=2,C1=1)=1.000
P(C2=2|CM=2,C1=1)=0.000

P(C2=1|CM=2,C1=2)=0.000
P(C2=2|CM=2,C1=2)=1.000

Looking at the information under FINAL CLASS COUNTS AND PROPORTIONS FOR EACH LATENT CLASS VARIABLE BASED ON THEIR MOST LIKELY LATENT CLASS PATTERN, we see that 66.8% of participants are classified as movers and 33.2% as stayers.
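For orientation, the structural part of a mover-stayer specification could look like the fragment below. This is a sketch under the probability parameterization and may differ from exercise1D.inp, for instance if that file fixes the stayer restrictions on the logit scale instead. The name cm follows the CM label in the TECHNICAL 15 output; the rest of the input is assumed to be set up as in the earlier LTA.

```
VARIABLE: CLASSES = cm(2) c1(2) c2(2);  ! add cm to the existing VARIABLE command
ANALYSIS: TYPE = MIXTURE;
          PARAMETERIZATION = PROBABILITY;
MODEL:
  %OVERALL%
  c1 ON cm;              ! time 1 class proportions may differ by cm
MODEL cm:
  %cm#1%                 ! movers: free transition probabilities
  c2 ON c1;
  %cm#2%                 ! stayers: transitions fixed at 1 and 0
  c2#1 ON c1#1@1;        ! P(c2=1 | c1=1) = 1
  c2#1 ON c1#2@0;        ! P(c2=1 | c1=2) = 0
! measurement part (MODEL c1 and MODEL c2) as in the LTA without cm
OUTPUT:   TECH15;
```

Fixing these two probabilities is enough for the stayers, because in each row the probability of being in class 2 at time 2 is one minus the fixed value.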

Exercise 1E

We have not investigated whether 2 classes is the right number for this dataset. Investigate how many classes at each time point you would choose.

Think about:

  • Whether you think there should be an equal number of classes at both time points (this is mostly a theoretical decision).
  • How to build the model. Should you start by comparing unconstrained or constrained models?
  • How to decide what solution you prefer.
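One commonly used criterion for the last point is the BIC, sketched below; it is one option among several, and interpretability of the classes matters as well. Here \(\log L\) is the maximized loglikelihood, \(p\) the number of free parameters, and \(N\) the sample size.

```latex
\[
\mathrm{BIC} = -2 \log L + p \,\ln(N)
\]
```

Lower values indicate a better trade-off between fit and model complexity.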