Simulation-based calibration

By Suzanne Hoogeveen

Disclaimer: This is a draft version of the tutorial. Please let me know about any mistakes (and excuse any typos).

We have already seen various methods to assess the appropriateness of our priors and posteriors, by means of prior/posterior predictive checks. Another method for validating the appropriateness of Bayesian models is based on assessing the faithfulness of the computational model, i.e., whether the model can recover the parameters that were used to generate the data. This is often referred to as simulation-based calibration (SBC) (Talts et al. 2018). Here, we follow the Bayesian workflow proposed in Kucharský et al. (2021), based on the recommendations by Talts et al. (2018) and Schad, Betancourt, and Vasishth (2021).

A full workflow for robust Bayesian inference focuses on the following steps:

  1. assessing adequacy of priors (prior predictive checks): do the priors lead to reasonable predictions?
  2. assessing computational faithfulness (through simulation-based calibration): can the model recover the parameters that were used to generate the data?
  3. assessing model sensitivity: can the model return unbiased estimates and effectively update prior beliefs (i.e., can the model learn from data)?
  4. assessing adequacy of posteriors (posterior predictive checks): do the posterior estimates reflect reasonable predictions?

In practice this involves:

  1. Take the prior \(\pi(\theta)\) and randomly draw a parameter set \(\tilde{\theta}\) from it: \(\tilde{\theta} \sim \pi(\theta)\)
  2. Use this parameter set \(\tilde{\theta}\) to simulate hypothetical data \(\tilde{y}\) from the model: \(\tilde{y} \sim \pi(y|\tilde{\theta})\)
  3. Fit the model to this hypothetical data and draw samples from the posterior distribution: \(\tilde{\theta'} \sim \pi(\theta|\tilde{y})\)
  4. Find the rank of the true parameter \(\tilde{\theta}\) within the posterior samples \(\tilde{\theta'}\) (that is, the count of posterior samples smaller than the generating parameter value).
  5. Repeat steps 1-4, say, 100 times
  6. Assess faithfulness of the model by comparing the posterior estimates to the prior draws that were used to generate the data
  7. Assess model sensitivity by comparing the posterior estimates to the prior estimates (i.e., is the model unbiased and can it effectively update prior beliefs?)
  8. Fit the model to the actual data
  9. Assess the adequacy of the posterior estimates and predictions
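To make steps 1-5 concrete, here is a minimal sketch in R for a hypothetical conjugate model (a normal mean with known unit variance and a standard normal prior), where the posterior can be sampled exactly. The model, sample sizes, and function names are illustrative only, not part of the workflow applied below.

```r
# One SBC iteration for a toy conjugate model:
# theta ~ N(0, 1), y_i | theta ~ N(theta, 1), i = 1, ..., n_obs
set.seed(1)
one_sbc_iteration <- function(n_obs = 20, n_draws = 100) {
  theta_tilde <- rnorm(1, 0, 1)             # step 1: draw from the prior
  y_tilde     <- rnorm(n_obs, theta_tilde)  # step 2: simulate data
  # step 3: exact conjugate posterior (posterior precision is 1 + n_obs)
  post_sd   <- 1 / sqrt(1 + n_obs)
  post_mean <- sum(y_tilde) / (1 + n_obs)
  draws     <- rnorm(n_draws, post_mean, post_sd)
  sum(draws < theta_tilde)                  # step 4: rank statistic
}
ranks <- replicate(100, one_sbc_iteration())  # step 5: repeat 100 times
# steps 6-7: the ranks should be approximately uniform on 0, ..., n_draws
```

Because the posterior here is sampled exactly, the resulting ranks are uniform by construction; with an MCMC sampler, deviations from uniformity signal a computational problem.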

Example: afterlife beliefs and mind-body dualism

To show SBC in action, we’ll apply it to estimating a multilevel model on cross-cultural variation in afterlife beliefs and mind-body dualism, using a subset of the data from Hoogeveen et al. (2023). Mind-body dualism refers to the idea that high-level mental processes (e.g., love) continue after physical death, whereas bodily processes (e.g., hunger) cease. The original data consists of 10195 subjects from 24 countries, who each provided 6 continuation ratings (i.e., 61170 observations). Specifically, responses were tabulated as x out of 3 ‘yes, continues’ responses in the body-condition (hunger, working brains, hearing) and x out of 3 ‘yes, continues’ responses in the mind-condition (love, knowledge, desire). In the example here, we use a subset of the data with \(60\) subjects across \(10\) countries.

1. Prior predictive checks

To ensure that the model makes predictions in accordance with theoretical expectations, we create prior predictive plots. Because we’re working with an (aggregated) binomial model (i.e., ‘yes’ vs. ‘no’ responses, across 3 trials per category), we need to be especially careful about the prior settings. That is, because the priors are set on the transformed scale (logit-transformation), we need to check what the prior settings translate into on the probability scale. We can investigate prior predictions (on the response scale) before we touch the actual data. In this case, we have to specify priors on the intercept (i.e., the overall probability of saying ‘continues’), the experimental effect (i.e., the difference in the probability of saying ‘continues’ between mental and physical states), and the between-country variation.

Show the code
library(brms)
Loading required package: Rcpp
Loading 'brms' package (version 2.22.0). Useful instructions
can be found by typing help('brms'). A more detailed introduction
to the package is available through vignette('brms_overview').

Attaching package: 'brms'
The following object is masked from 'package:stats':

    ar
Show the code
library(dplyr)

Attaching package: 'dplyr'
The following objects are masked from 'package:stats':

    filter, lag
The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union
Show the code
library(ggplot2)

# specify the folder you want your output to be saved in
output_file <- "sbc/"
iterations = 5000
warmup = 2500

runSims = FALSE
Show the code
# the data is a subset of the full cross-cultural dataset on afterlife beliefs
# and mind-body dualism 
dat <- read.csv("afterlife_beliefs.csv")
head(dat)
  country subject resp  cat condition site
1 Belgium     468    2 -0.5      body    1
2 Belgium     468    3  0.5      mind    1
3 Belgium     473    1 -0.5      body    1
4 Belgium     473    3  0.5      mind    1
5 Belgium     475    2 -0.5      body    1
6 Belgium     475    2  0.5      mind    1
Show the code
logit <- function(p) log(p / (1 - p))
logit_dif <- function(p_diff){
  logit_A <- logit(ifelse(0.2-p_diff/2<0,0.01,0.2-p_diff/2))
  logit_B <- logit(0.2+p_diff/2)
  return(logit_B - logit_A)
}

#
m_int <- logit(0.2) # -1.39
min_int <- logit(0.05)
max_int <- logit(0.42)
sd_int <- (max_int - min_int) / 4 # 0.66

# state effect 
m <- logit_dif(0.15) # 0.98
min <- logit_dif(0.01)
max <- logit_dif(0.25)
sd <- (max - min) / 4 # 0.43

Here, we use informative priors to reflect the knowledge we have from previous studies:

  • overall average continuity is 0.2, from 0.05 to 0.42 (across different studies). Note that this range reflects variation in the mean across studies, not across countries. We don’t know much about the differences between countries, but we do know that the overall intercept is around 0.2, with a range from 0.05 to 0.42.
    • we can translate this to the logit-scale. The mean will be \(log(p / (1 - p))\), so \(log(0.2 / (1 - 0.2)) = -1.39\)
    • the minimum value is \(log(0.05 / (1 - 0.05)) = -2.94\) and the maximum is \(log(0.42 / (1 - 0.42)) = -0.32\). From that we can get an approximate standard deviation of \(sd = (max - min) / 4 = 0.66\). We thus predict a distribution that puts more density on the lower values of 0/3, 1/3, 2/3 and 3/3 responses.
  • state condition effect: we know that in previous studies, the average continuity was around 20%, with a difference between mental and bodily states of around 15%. This means an expected difference of around 0.98 on the logit-scale. We also allow for some uncertainty; specifically, the observed range from 0.01 to 0.25 in the state-effect implies a standard deviation of about 0.43 on the logit-scale (again, we don’t know much about between-country differences).

The prior predictions visualized in Figure 1 confirm these expectations. The top panel shows the prior predictive distribution of all responses. The bottom panels show that the distribution in the body condition predicts less continuity than the mind condition, which is in line with our prior information from previous literature.

Show the code
# specify priors based on previous literature
prior_inf <- c(brms::prior(normal(-1.39, 0.65), class = Intercept),
               brms::prior(normal(0.98, 0.42), class = b),
               brms::prior(normal(0.5, 0.2), class = sd),
               brms::prior(lkj(2), class = L))
# fit a model with option 'sample_prior = "only"' to generate prior predictive samples
fit_prior_inf <- brm(data = dat, family = binomial,
      resp | trials(3) ~ 1 + cat + 
        (1 + cat | site),
      prior = prior_inf,
      iter = iterations, warmup = warmup, chains = 4, cores = 4,
      sample_prior = "only",
      seed = 2025, backend = "cmdstanr")
Start sampling
Running MCMC with 4 parallel chains...

Chain 1 finished in 0.2 seconds.
Chain 2 finished in 0.2 seconds.
Chain 3 finished in 0.2 seconds.
Chain 4 finished in 0.2 seconds.

All 4 chains finished successfully.
Mean chain execution time: 0.2 seconds.
Total execution time: 0.4 seconds.
Show the code
# (1) extract samples
prior_draws <- posterior::as_draws_df(fit_prior_inf)
# (2) select all relevant samples 
vars_names <- variables(prior_draws)
vars_names <- vars_names[1:(which(vars_names=="lprior")-1)]
samples_pars <- prior_draws[, vars_names]
Warning: Dropping 'draws_df' class as required metadata was removed.
Show the code
# (3) compute prior mean and variance
mu_prior <- posterior::summarise_draws(prior_draws, mean) %>% 
  filter(variable %in% vars_names)
var_prior <- posterior::summarise_draws(prior_draws, var = distributional::variance) %>% 
  filter(variable %in% vars_names)
# (4) thin the chains and export parameter values from priors
samples_pars <- samples_pars[seq(from=1, to=nrow(samples_pars), by = 100),]
write.csv(samples_pars,
          paste0(output_file, 'prior_pred_all.csv'),
          row.names = FALSE)
# (5) thin the chains of predicted values
pred_responses <- posterior_predict(fit_prior_inf)
pred_responses <- pred_responses[seq(from=1, to=nrow(pred_responses), by = 100),]
# (6) export predicted responses from priors
colnames(pred_responses) <- paste0('yrep', 1:ncol(pred_responses))
write.csv(pred_responses,
          paste0(output_file, 'prior_pred.csv'),
          row.names = FALSE)
# (7) export mean and variance from prior
write.csv(mu_prior, paste0(output_file, 'sim_mu_prior.csv'), 
          row.names = FALSE)
write.csv(var_prior, paste0(output_file, 'sim_var_prior.csv'), 
          row.names = FALSE)
Show the code
pred_responses <- read.csv(paste0(output_file, 'prior_pred.csv'),
                           header = TRUE)
cols <- c('#99B898','#FF847C')

p1 <- bayesplot::ppd_dens_overlay(ypred = as.matrix(pred_responses)) + 
  ggtitle("Prior predictive distribution") + 
  ylim(0, 4) +
  scale_x_continuous(breaks = seq(0, 3, 1), 
                     labels = c("0/3","1/3","2/3","3/3")) +
  theme_minimal() +
  theme(legend.position="none")
# or per condition
p2 <- bayesplot::ppd_dens_overlay(ypred = as.matrix(pred_responses[,seq(1,ncol(pred_responses), by = 2)])) +
  ggtitle("Prior predictive - body condition") + 
  scale_color_manual(values = cols[1]) +
  theme_minimal() +
  ylim(0, 4) +
  scale_x_continuous(breaks = seq(0, 3, 1), 
                     labels = c("0/3","1/3","2/3","3/3")) +
  theme(legend.position="none")
p3 <- bayesplot::ppd_dens_overlay(ypred = as.matrix(pred_responses[,seq(2,ncol(pred_responses), by = 2)])) +
  ggtitle("Prior predictive - mind condition") + 
  scale_color_manual(values = cols[2]) +
  ylim(0, 4) +
  scale_x_continuous(breaks = seq(0, 3, 1), 
                     labels = c("0/3","1/3","2/3","3/3")) +
  theme_minimal() + theme(legend.position="none")

gridExtra::grid.arrange(p1, p2, p3,
             layout_matrix = rbind(c(1, 1),
                                   c(2, 3)))
Figure 1: Prior model predictions of data featuring 60 subjects across 10 countries. Before seeing the data, the model puts slightly more density on the lower values of the 0/3, 1/3, 2/3 and 3/3 continuity responses (top panel) and predicts more continuity in the mind than in the body condition (bottom panels).

2. Computational faithfulness

The idea of the computational faithfulness check is to see if the implemented sampling algorithm is able to recover the parameters that the data was simulated from. This is done by simulating data from the prior distribution, and then fitting the model to this simulated data. The MCMC algorithm should be able to recover the prior distribution accurately, which can be assessed by comparing the posterior samples to the prior draws that were used to generate the data.

Here, we generated \(100\) datasets from the prior predictive distribution and fitted the model to each of them. To validate the computational faithfulness of the model, we then compared, for each simulated dataset, the parameter value drawn from the prior (i.e., the draw that generated the data) to the samples from the posterior. This involves tallying the number of posterior samples that are smaller than the prior draw. When the model is computationally faithful, it should recover the prior distribution accurately. Specifically, all ranks (that is, the counts of posterior samples smaller than the generating parameter value) should be equally probable, resulting in an approximately uniform distribution of rank statistics. Figure 2 depicts the results of the simulation-based calibration for the fixed-effect parameters and a random subset of country-level parameters (random effects) and confirms the approximately uniform distribution. These results show that the MCMC algorithm faithfully recovers the prior distribution.

Note that for the simulation-based calibration, we do need to thin the chains, because we need the draws from the posterior to be roughly independent in order to get an accurate estimate of the rank statistic.
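To see why thinning matters, the following toy sketch uses an AR(1) series as a stand-in for an autocorrelated MCMC chain (the chain length and thinning factor mirror the ones used in this tutorial, but the autocorrelation value is purely illustrative): strong lag-1 autocorrelation nearly vanishes after keeping every 100th draw.

```r
# Toy illustration: thinning an autocorrelated chain by a factor of 100
# yields approximately independent draws (lag-1 correlation ~ rho^100)
set.seed(2)
rho <- 0.95
n   <- 10000
x   <- numeric(n)
for (i in 2:n) x[i] <- rho * x[i - 1] + rnorm(1, sd = sqrt(1 - rho^2))
lag1_full <- cor(x[-1], x[-n])                 # close to rho = 0.95
thinned   <- x[seq(1, n, by = 100)]            # keep every 100th draw
lag1_thin <- cor(thinned[-1], thinned[-length(thinned)])  # close to 0
```

Rank statistics computed from the unthinned chain would effectively be based on far fewer independent draws than the nominal sample size, distorting the rank distribution.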

Show the code
# initialize the simulation study 
nsims_target   <- 100
nsims_ready    <- length(list.files(output_file, pattern='sim_mu_post', 
                                    full.names = TRUE, recursive = TRUE))
nsims          <- nsims_target - nsims_ready

pred_responses <- read.csv(paste0(output_file, 'prior_pred.csv'),
                           header = TRUE)
mu_prior   <- t(read.csv(paste0(output_file, 'sim_mu_prior.csv'),
                       header = TRUE)[,2])
var_prior  <- t(read.csv(paste0(output_file, 'sim_var_prior.csv'), 
                       header = TRUE)[,2])
# load prior parameters samples 
sbc_prior <- read.csv(paste0(output_file, 'prior_pred_all.csv'),
                      header = TRUE)[1:nsims_target,] 
par_names_post <- par_names_prior <- colnames(sbc_prior)
colnames(mu_prior) <- colnames(var_prior) <- colnames(sbc_prior)
Show the code
# run the simulation study (this takes a while, even for this simple model)
# you only need to run this once, though;
# the output files are also included in the github repository, so you can 
# download those and inspect the results directly.
dataset <- dat
for(n in 1:nsims){
  # add responses from prior predictives to the data
  dataset$resp <- as.numeric(pred_responses[nsims_ready + n,])
  
  # fit the model on the simulated data
  fit_post <- brm(data = dataset, family = binomial,
                  resp | trials(3) ~ 1 + cat + (1 + cat | site),
                  c(prior(normal(-1.39, 0.65), class = Intercept),
                    prior(normal(0.98,0.42), class = b),
                    prior(normal(0.5,0.2), class = sd),
                    prior(lkj(2), class = L)),
                  iter = iterations, warmup = warmup, 
                  chains = 4, cores = 4,
                  seed = 2025, backend = "cmdstanr")
  
  # get posterior samples
  sbc_post <- posterior::as_draws_df(fit_post)
  sbc_post <- sbc_post[, 1:(which(colnames(sbc_post) == 'lprior')-1)]
  
  par_names_post  <- colnames(sbc_post) 
  par_names_prior <- colnames(sbc_prior)
  # compute the posterior means and variances
  mu_post  <- apply(sbc_post, 2, mean)
  var_post <- apply(sbc_post, 2, var) 
  # compute the 50% and 80% credible intervals
  ci_post  <- apply(sbc_post, 2, function(x) quantile(x, c(0.1, 0.9, 0.25, 0.75)))
  # thin the chains
  sbc_post <- sbc_post[seq(from=1, to=nrow(sbc_post), by = 100),]
  
  rank <- data.frame(matrix(nrow = 1, ncol = ncol(sbc_post)))
  colnames(rank) <- par_names_post
  
  # loop over parameters
  for (j in 1:length(par_names_post)) {
    
    # count the posterior samples that are smaller than the draw from the prior
    rank[, j] <- sum(sbc_post[, par_names_post[j]] < sbc_prior[nsims_ready + n,
                                                               par_names_prior[j]])
    
  }
  
  # export results
  write.csv(mu_post, 
            paste0(output_file, 'sim_mu_post_', nsims_ready + n, '.csv'), 
            row.names = FALSE)
  write.csv(var_post, 
            paste0(output_file, 'sim_var_post_', nsims_ready + n, '.csv'), 
            row.names = FALSE)
  write.csv(ci_post, 
            paste0(output_file, 'sim_ci_post_', nsims_ready + n, '.csv'), 
            row.names = FALSE)
  write.csv(rank, 
            paste0(output_file, 'sim_rank_', nsims_ready + n, '.csv'), 
            row.names = FALSE)
  rm(fit_post)
}
Show the code
# load the simulated prior and posterior estimates 
nsims_ready    <- sum(grepl('sim_mu_post', 
                            list.files(output_file, pattern='sim_mu_post',
                                       full.names = TRUE, recursive = TRUE)))

sbc_prior <- read.csv(paste0(output_file, 'prior_pred_all.csv'),
                      header = TRUE)[1:nsims_ready,] 
npars <- ncol(sbc_prior) # 26

# compute necessary quantities
rank           <- data.frame(matrix(nrow = 0, ncol = npars))
mu_post        <- data.frame(matrix(nrow = 0, ncol = npars))
var_post       <- data.frame(matrix(nrow = 0, ncol = npars))
ci_post        <- list()
colnames(rank) <- colnames(mu_post) <- colnames(var_post) <- par_names_post


for(n in 1:nsims_ready){
  mu_post[nrow(mu_post) + 1,] <- t(read.csv(paste0(output_file, 
                                                   'sim_mu_post_', n, '.csv'),
                                            header = TRUE))
  var_post[nrow(var_post) + 1,] <- t(read.csv(paste0(output_file, 
                                                     'sim_var_post_', n, '.csv'),
                                              header = TRUE))
  rank[nrow(rank) + 1,] <-  t(read.csv(paste0(output_file, 
                                              'sim_rank_', n, '.csv'),
                                       header = TRUE))
  ci_post[[length(ci_post) + 1]] <- read.csv(paste0(output_file, 
                                                    'sim_ci_post_', n, '.csv'),
                                             header = TRUE)
}
Show the code
# select the parameters to plot (fixed effects of intercept and condition, 
# plus a random subset of 3 random effects (country-level))
nfix  <- sum(grepl('^b_|^sd_', colnames(sbc_prior)))
nran <- sum(grepl('^r_site', colnames(sbc_prior)))/2
countries <- unique(dat$country)
all <- FALSE

if(all){
  par_vector   <- 1:npars
  param_labels <- par_names_post
  
} else {
  ran_index <- sort(sample(1:nran, 4))
  # four random intercepts and condition effects 
  intercepts <- paste0('r_site.', ran_index, '.Intercept.')
  effects    <- paste0('r_site.', ran_index, '.cat')
  fixed <- colnames(sbc_prior)[1:nfix]
  names <- c(fixed, intercepts, effects)
  
  par_vector <- NULL
  for(n in 1:length(names)){
    new_index  <- grep(names[n], colnames(sbc_prior)) 
    par_vector <- c(par_vector, new_index)
  }
  
  # gives nice labels for plotting 
  param_labels <- c(bquote('Intercept'[.(countries[ran_index[1]])]),
                    bquote('Intercept'[.(countries[ran_index[2]])]),
                    bquote('Intercept'[.(countries[ran_index[3]])]),
                    bquote('Intercept'[.(countries[ran_index[4]])]),
                    bquote('Condition'[.(countries[ran_index[1]])]),
                    bquote('Condition'[.(countries[ran_index[2]])]),
                    bquote('Condition'[.(countries[ran_index[3]])]),
                    bquote('Condition'[.(countries[ran_index[4]])]),
                    expression(sigma['Intercept']),
                    expression(sigma['Condition']),
                    expression(mu['Intercept']),
                    expression(mu['Condition']))
}
Show the code
transparentColor <- function(color, percent = 50, name = NULL) {
  rgb.val <- grDevices::col2rgb(color)
  ## Make new color using input color as base and alpha set by transparency
  t.col <- rgb(rgb.val[1], rgb.val[2], rgb.val[3],
               max = 255,
               alpha = (100 - percent) * 255 / 100,
               names = name)
  invisible(t.col)
}
band_color <- transparentColor(color = "grey36", percent = 90, name = 'colpost')
hist_color <- transparentColor(color = "slateblue", percent = 70, name = 'colpost')

# plot the histograms
par(mfrow=c(3, 4))

# grey band: expected distribution of bin counts under uniform ranks
nr_simulations <- nsims_ready
# number of posterior samples retained per simulation 
# (i.e., the highest possible rank)
L <- 100
nr_of_bins <- 10
# expected number of simulations per bin under uniformity
thinning <- nr_simulations/nr_of_bins

# for the plotting code below
breaks <- L/nr_of_bins
low  <- qbinom(0.025, nr_simulations, thinning/nr_simulations)
mid  <- qbinom(0.5, nr_simulations, thinning/nr_simulations)
high <- qbinom(0.975, nr_simulations, thinning/nr_simulations)
 
x <- seq(from = 1, to = L)

for(j in seq_along(par_vector)){
  
  hist_param <- hist(rank[, par_vector[j]], 
                     breaks = nr_of_bins, 
                     plot=FALSE)
  plot(1, type='n',
       las = 1, 
       main = param_labels[j], 
       xlab = 'Rank', ylab = 'Frequency',
       bty='n', cex.lab=1.5, cex.main=2,
       xlim = c(0, 100),
       ylim = c(0, high + 5)
  ) 
  polygon(c(x, rev(x)), rep(c(high, low), each = L), col = band_color, border = NA)
  abline(h=c(mid, high, low), lwd = c(2, 1, 1), lty=c(1, 3, 3))
  plot(hist_param, col = hist_color, add = TRUE) # Add 2nd histogram using different color
    
}
Figure 2: Histograms of the rank statistic for simulated data featuring 10 countries and 60 subjects per country. The grey band represents the 95% interval of bin counts expected under a uniform distribution of ranks. The rank statistic for each parameter is approximately uniformly distributed, which indicates that the model is computationally faithful.
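The four steps listed at the top of this tutorial can be sketched in a few lines of base R. The toy model below is hypothetical (a conjugate normal model with known observation SD, not the brms model used here), chosen so the posterior can be sampled exactly:

```r
set.seed(1)
n_sims <- 1000; n_obs <- 20; L <- 100   # L posterior draws per simulation, as above

ranks_toy <- replicate(n_sims, {
  theta <- rnorm(1, 0, 1)               # 1. draw a parameter from the prior N(0, 1)
  y     <- rnorm(n_obs, theta, 1)       # 2. simulate data given theta
  # 3. conjugate posterior for a N(0, 1) prior and known sigma = 1
  post_var  <- 1 / (1 + n_obs)
  post_mean <- post_var * sum(y)
  draws <- rnorm(L, post_mean, sqrt(post_var))
  sum(draws < theta)                    # 4. rank of the true value among the draws
})

# for a faithful model the ranks are (discretely) uniform on 0..L
hist(ranks_toy, breaks = 10)
```

For a faithful sampler the resulting ranks are uniform on \(0, \dots, L\), which is exactly what the grey band in Figure 2 tests.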

A downside of the rank histogram is that the interpretation depends on the width of the bins. A more informative way to visualize the rank statistics is to plot the empirical cumulative distribution function (ECDF) of the ranks, and more specifically, the difference between the perfectly uniform CDF and the empirical CDF of the ranks, including the 95% interval of expected deviations. As shown in Figure 3, for all parameters, the ECDF of the ranks is approximately uniform (i.e., it stays within the band), which indicates that the model is computationally faithful, that is, unbiased with the appropriate level of certainty.

Show the code
plot_sbc_ecdf_diff <- function(ranks, n_rep = 10000, conf_level = 0.95, main = "ECDF difference") {
  ranks <- sort(ranks)
  n_samples <- length(ranks)
  xvals <- 0:max(ranks)
  # Empirical ECDF
  ecdf_vals <- ecdf(ranks)(xvals)
  # Theoretical uniform ECDF
  uniform_ecdf <- xvals / max(xvals)
  # Simulate ECDFs under uniform null
  sim_ecdfs <- t(replicate(n_rep, ecdf(sample(xvals, size = n_samples, replace = TRUE))(xvals)))
  sim_diff <- sweep(sim_ecdfs, 2, uniform_ecdf, "-")
  # Compute confidence bands for the difference
  lower <- apply(sim_diff, 2, quantile, probs = (1 - conf_level) / 2)
  upper <- apply(sim_diff, 2, quantile, probs = 1 - (1 - conf_level) / 2)
  # Compute difference between observed ECDF and uniform
  observed_diff <- ecdf_vals - uniform_ecdf
  # plot
  plot(xvals, observed_diff, type = "n", bty = "n",
       ylim = range(c(observed_diff, lower, upper)),
       xlab = "Rank", ylab = "ECDF difference", main = main)
  # Confidence band
  polygon(c(xvals, rev(xvals)), c(upper, rev(lower)), col = "lightgray", border = NA)
  # Plot observed ECDF difference
  lines(xvals, observed_diff, lwd = 2, col = "slateblue")
  # Reference line: zero difference
  abline(h = 0, lty = 2, col = "darkgray")
}

# plot ECDF for selected parameters
par(mfrow=c(3, 4))
for(j in seq_along(par_vector)){
  plot_sbc_ecdf_diff(rank[, par_vector[j]], main = param_labels[j])
}
Figure 3: Difference between the empirical cumulative distribution function (ECDF) of the ranks and the uniform CDF. The grey band represents the 95% interval of expected deviations under a uniform distribution of ranks. For all parameters the ECDF difference stays within the band, which indicates that the model is computationally faithful.

These plots are not only useful to detect if the model is well-calibrated or not, but they can also provide insight into the type of misspecification. See the lecture slides for examples of different shapes and what they signal about issues in the model.
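As a hypothetical illustration of one such shape, the base-R sketch below reuses a conjugate normal toy model (not the brms model above) but draws posterior samples that are artificially too narrow (standard deviation divided by 3); the resulting ranks pile up in the extreme bins, producing the U-shaped histogram that signals an overconfident posterior:

```r
set.seed(4)
n_obs <- 20; L <- 100

ranks_over <- replicate(2000, {
  theta <- rnorm(1, 0, 1)                           # draw from the prior N(0, 1)
  y     <- rnorm(n_obs, theta, 1)                   # simulate data given theta
  post_var  <- 1 / (1 + n_obs)                      # correct conjugate posterior...
  post_mean <- post_var * sum(y)
  draws <- rnorm(L, post_mean, sqrt(post_var) / 3)  # ...sampled too narrowly
  sum(draws < theta)                                # rank of the true value
})

# under uniformity about 22% of ranks would fall in the extreme bins; here far more:
mean(ranks_over <= 10 | ranks_over >= 90)
```

An underconfident (too wide) posterior produces the opposite pattern: ranks piling up in the middle bins.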

3. Model sensitivity

Next we assess model sensitivity to confirm that (1) the implemented MCMC algorithm returns unbiased estimates and that (2) the data effectively update the prior beliefs (i.e., the likelihood should reduce the prior uncertainty, yielding a lower posterior variance). To do so, we plot the posterior z-score of each parameter against its posterior contraction for all simulated datasets. Ideally, posterior z-scores cluster around zero, indicating that estimates are not biased toward incorrect values and that there is no prior/likelihood mismatch. Posterior contraction should ideally concentrate around 1, indicating effective updating from prior to posterior. Figure 4 plots the posterior z-score as a function of the posterior contraction for all group-level parameters in the model as well as a random subset of individual-level parameters. The simulations reveal favorable posterior z-scores, predominantly clustering around zero. The posterior contraction values are also good, with many close to one. Note that the level of posterior contraction also depends on the sample size; in small samples, less updating is expected.
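In symbols, matching the code below: for each simulated dataset with generating value \(\tilde{\theta}\) drawn from the prior,

```latex
z = \frac{\mathbb{E}[\theta \mid \tilde{y}] - \tilde{\theta}}{\sqrt{\mathrm{Var}[\theta \mid \tilde{y}]}},
\qquad
\text{contraction} = 1 - \frac{\mathrm{Var}[\theta \mid \tilde{y}]}{\mathrm{Var}_{\pi}[\theta]},
```

where \(\mathbb{E}[\theta \mid \tilde{y}]\) and \(\mathrm{Var}[\theta \mid \tilde{y}]\) are the posterior mean and variance and \(\mathrm{Var}_{\pi}[\theta]\) is the prior variance.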

Show the code
# posterior z-scores and posterior contraction for each simulated dataset
z           <- (mu_post - sbc_prior) / sqrt(var_post)
contraction <- t(apply(var_post, 1, function(x)  1 - (x/var_prior)))

# scatter plot 1
par(mfrow=c(3, 4))

for(j in seq_along(par_vector)){
    
  plot(contraction[,par_vector[j]], z[,par_vector[j]],
       xlab = 'Posterior Contraction',
       ylab = 'Z Score',
       main = param_labels[j],
       xlim = c(-1, 1),
       ylim = c(-4, 4),
       las = 1, bty = 'n', cex.main=2,
       pch = 1, cex.lab=1.5,
       col = 'slateblue')
  abline(h=0, lty=3)
  points(mean(contraction[,par_vector[j]]), mean(z[, par_vector[j]]),
         cex = 2, lwd=1.5,
         #col = 'grey36',
         bg = "lightgrey",
         pch = 21)
}
Figure 4: Assessment of model sensitivity. The grey dots depict the means of the distributions. Z-scores (y-axis) clustering around zero indicate that the model returns unbiased estimates. Posterior contraction (x-axis) ranges from around 0.5 to 1, which indicates satisfactory to good updating of model parameters.

In addition to assessing model sensitivity, we can also depict the relationship between the generating parameter values and the posterior estimates to evaluate the model’s ability to recover true parameter values. A perfect recovery of the true parameter values would result in a correlation of 1. The results, shown in Figure 5, reveal correlations between \(0.73\) and \(0.97\) across population- and country-level parameters, which again indicates good recovery.

Show the code
# scatter plot
par(mfrow=c(3, 4))

for(j in seq_along(par_vector)){
  
  pos_x <- quantile(sbc_prior[,par_vector[j]], 0.80)
  pos_y <- quantile(mu_post[,par_vector[j]], 0.90)
    
  correlation <- cor(sbc_prior[,par_vector[j]], mu_post[,par_vector[j]])
  plot(sbc_prior[,par_vector[j]], mu_post[,par_vector[j]],
       xlab = 'True',
       ylab = 'Estimated',
       main = param_labels[j],
       #main = colnames(mu_post)[par_vector[j]],
       las = 1, bty = 'n', cex.main=2,
       pch = 21,
       ylim=c(min(sbc_prior[,par_vector[j]]), max(sbc_prior[,par_vector[j]])),
       col = 'slateblue', bg = 'lightgrey')
  abline(0, 1, lty=3, col = "grey36")
  legend('bottomright', legend=paste0('r = ', round(correlation, 2)), bty='n', cex = 1.5)
}
Figure 5: True versus estimated parameter values. Correlations between true values (x-axis) and estimated parameter values (y-axis) range from \(0.73\) to \(0.97\), which indicates good recovery of the model parameters.

4. Posterior predictions

The posterior predictive distribution can be interpreted as the model’s attempt to re-describe the behavioral data. Predictions from an adequate model should resemble the behavioral data. As seen before, the aggregated binomial model is not a perfect fit for these data, as it doesn’t capture the inflation of 0/3 and 3/3 responses well. Figure 6 shows that the misfit is especially present in the mind condition, where the model predicts many 1/3 and 2/3 responses, while the data show an enhanced preference for 0/3 and 3/3 responses. Note that the posterior average per condition is close to the observed average, so we can likely trust the model coefficients, even with this misfit in the exact pattern of the data.

In this case, we may want to consider alternative likelihood functions, such as a zero-inflated binomial model or an ordinal model, as discussed on Tuesday. Remember, however, that you sometimes need to make a trade-off between model complexity and model fit; easier models are typically easier to estimate and interpret, but might show more misfit to the exact data pattern. As a researcher you need to decide what fit is good enough, and ideally conduct sensitivity analyses on other model structures.
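To see what a zero-inflated binomial likelihood would add, the hypothetical simulation below mixes a point mass at zero (inflation probability 0.25, an assumed value) with a plain binomial with 3 trials; the mixture reproduces the kind of excess 0/3 responses that a plain binomial cannot (the 3/3 inflation would need a further extension, such as an ordinal model):

```r
set.seed(2025)
n <- 10000
p_succ <- 0.6    # assumed per-trial success probability
p_zero <- 0.25   # assumed zero-inflation probability

plain    <- rbinom(n, size = 3, prob = p_succ)
inflated <- ifelse(runif(n) < p_zero, 0L, rbinom(n, size = 3, prob = p_succ))

# the mixture has many more 0/3 responses than the plain binomial
c(plain = mean(plain == 0), inflated = mean(inflated == 0))
```

In brms, such a likelihood is available via the zero_inflated_binomial() family, which could replace binomial in the model call above.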

Show the code
# now finally fit the model with real data
fit_inf <- brm(data = dat, family = binomial,
                  resp | trials(3) ~ 1 + cat + (1 + cat | site),
                  prior = prior_inf,
                  iter = iterations, warmup = warmup, 
                  chains = 4, cores = 4,
                  seed = 2025, backend = "cmdstanr")
Start sampling
Running MCMC with 4 parallel chains...


All 4 chains finished successfully.
Mean chain execution time: 13.9 seconds.
Total execution time: 14.5 seconds.
Show the code
pred_post <- posterior_predict(fit_inf)
pred_post <- pred_post[seq(from=1, to=nrow(pred_post), by = 100),]

cols <- c('#2A363B','#99B898','#96281B','#FF847C')

p1 <- bayesplot::ppc_dens_overlay(yrep = as.matrix(pred_post), y = dat$resp) + 
  ggtitle("Posterior predictive distribution") + 
  theme_minimal() +
  theme(legend.position="none")
# or per condition
p2 <- bayesplot::ppc_dens_overlay(y = dat$resp[dat$cat==-0.5], yrep = as.matrix(pred_post[,seq(1,ncol(pred_post), by = 2)])) +
  ggtitle("Posterior predictive - body condition") + 
  scale_color_manual(values = cols[c(1,2)]) +
  theme_minimal() +
  theme(legend.position="none")
Show the code
p3 <- bayesplot::ppc_dens_overlay(y = dat$resp[dat$cat==0.5], yrep = as.matrix(pred_post[,seq(2,ncol(pred_post), by = 2)])) +
  ggtitle("Posterior predictive - mind condition") + 
  scale_color_manual(values = cols[c(3,4)]) +
  theme_minimal() + theme(legend.position="none")
Show the code
gridExtra::grid.arrange(p1, p2, p3,
             layout_matrix = rbind(c(1, 1),
                                   c(2, 3)))
Figure 6: Posterior predictive distribution of the data. The model predicts responses in the body condition fairly well (bottom left), but shows clear misfit in the mind condition (bottom right), as it does not capture the inflation of 0/3 and 3/3 responses.

References

Hoogeveen, Suzanne, Sacha Altay, Theiss Bendixen, Renatas Berniūnas, Joseph A Bulbulia, Arik Cheshin, Claudio Gentili, et al. 2023. “Does She Still Love and Feel Hungry? Afterlife Beliefs, Mind-Body Dualism, and Religion Across 24 Countries.” PsyArXiv. https://doi.org/10.31234/osf.io/tvycp.
Kucharský, Šimon, N.-Han Tran, Karel Veldkamp, Maartje Raijmakers, and Ingmar Visser. 2021. “Hidden Markov Models of Evidence Accumulation in Speeded Decision Tasks.” Computational Brain & Behavior 4 (4): 416–41. https://doi.org/10.1007/s42113-021-00115-0.
Schad, Daniel J., Michael Betancourt, and Shravan Vasishth. 2021. “Toward a Principled Bayesian Workflow in Cognitive Science.” Psychological Methods 26 (1): 103–26. https://doi.org/10.1037/met0000275.
Talts, Sean, Michael Betancourt, Daniel Simpson, Aki Vehtari, and Andrew Gelman. 2018. “Validating Bayesian Inference Algorithms with Simulation-Based Calibration.” arXiv. https://doi.org/10.48550/arXiv.1804.06788.