Structural Equation Modeling in R

A.Z. Andis Arietta, https://www.azandisresearch.com, 24 July 2024

Introduction

Today we are talking about structural equation models (SEM). There are lots of synonyms, sub-categories, and adjacent techniques that you may have heard before: covariance structure analysis, linear structural relations, path analysis, latent variable modeling, causal modeling, confirmatory factor analysis, exploratory factor analysis, latent growth modeling, mixture/multigroup/hierarchical/multilevel structural modeling, construct analysis, and so on.

So many names exist because there are a LOT of things that you can do with these types of models. It turns out that allowing for and testing structure in modeling can help solve lots of problems in research.

I think that the best way to understand SEMs is to start with simple regressions. Let’s consider a regression predicting a rooster crowing from the rising of the sun:

y = b0 + b1x + ε

y ~ Rooster Crowing,
x ~ Sun Rising,
b0 = 0,
ε ~ N(0, 1)

This is a silly regression because we all understand the relationship: the rooster goes cock-a-doodle-do when the sun crests the horizon. But in the language of mathematics, there is no reason I can’t rewrite this equation as:

x = ( y − ε ) / b1

This formulation makes no sense. Basically, we are saying that the rising of the sun is a function of the rooster crowing! Even though this is totally mathematically viable, it defies our common sense of causation.
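A quick illustration with simulated numbers (my own sketch, not from the tutorial): regressing in either direction fits equally "well", even though only one direction makes causal sense.

```r
set.seed(1)
sun_rising <- runif(100, 0, 1)
rooster_crowing <- 2 * sun_rising + rnorm(100, 0, 0.2)  # b0 = 0

forward  <- lm(rooster_crowing ~ sun_rising)   # sensible direction
backward <- lm(sun_rising ~ rooster_crowing)   # nonsense direction

summary(forward)$r.squared   # identical R-squared either way:
summary(backward)$r.squared  # the math alone cannot tell us the direction
```

Both fits are equally strong statistically, which is exactly why we need something beyond the regression machinery to encode direction.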

The language of structural equation modeling offers a way to impose directional structure on mathematical equations. We usually visualize that language as directed graphical models like the one below.

In a graphical model, observed variables are displayed as boxes. Unobserved or latent variables are displayed as circles. Constants are triangles. The functional relationship between variables is displayed as directional arrows. Non-directional or double-headed arrows indicate a variance or covariance.

This graphical model above is the same model as the regression in the equation above. Our independent variable x has a linear relationship with the dependent variable y with the slope parameter b1. y has a constant intercept b0 of 0. Finally, the residual variation in y not caused by x is assumed to come from some other unobserved cause with its own variance. Rather than thinking of variables as independent or dependent, we use the terms exogenous and endogenous. Exogenous variables are those (like x) that have no incoming paths. These are the most ‘upstream’ variables in the causal paths. Endogenous variables are those that receive causal paths. We’ll see later that some endogenous variables can also be causes of other endogenous variables, but they are still considered endogenous.

In practice, we generally omit the error terms and intercepts when drawing structural models, yielding a simplified graph:
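Since tidygraph and ggraph are in our package list for graph visualization, here is one way to draw that simplified graph in code (a sketch of my own; the node and edge labels are assumptions):

```r
library(tidygraph)
library(ggraph)

# Two observed variables connected by a single directed path, x -> y
path_graph <- tbl_graph(
  nodes = data.frame(name = c("x", "y")),
  edges = data.frame(from = 1, to = 2, label = "b1")
)

ggraph(path_graph, layout = "sugiyama") +
  geom_edge_link(
    aes(label = label),
    arrow = arrow(length = unit(4, "mm")),  # directed arrowhead
    end_cap = circle(8, "mm")               # keep arrow off the node label
  ) +
  geom_node_label(aes(label = name)) +
  theme_void()
```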

Now that we have a shared language, let’s take a look at a toy example to understand why SEMs can be so useful in research.

Motivating example

You will need a handful of packages for this tutorial:

packages_for_sem_workshop <-
  c(
    'tidyverse', # basic data wrangling
    'tidygraph', # graph visualization
    'ggraph', # graph visualization
    'lavaan', # sem tools
    'piecewiseSEM', # sem tools
    'mgcv', # nonlinear modeling
    'lme4', # random effect modeling
    'cvsem' # cross-validating sems
    )

install_and_load_packages <-
  function(x){
    for( i in x ){
      if( ! require( i , character.only = TRUE ) ){
        install.packages( i , dependencies = TRUE )
        library( i , character.only = TRUE )
      }
    }
  }

install_and_load_packages(packages_for_sem_workshop)

Fitting a simple regression

I find that, for most applied researchers, the language of code is more intuitive than the language of math. So, let’s simulate a toy dataset and see how we can fit the same system as a linear model or an SEM.


simple_ex <-
  data.frame(
    x = runif(n = 100, min = 0, max = 10),
    e = rnorm(n = 100, 0, 1)
  )

simple_ex <- simple_ex %>%
  mutate(
    y = 1 + 2.5*x + e
  )

Now, let’s fit a good old-fashioned linear regression model:

fit_simple_ex_lm <- 
  lm(y ~ x, data = simple_ex)

summary(fit_simple_ex_lm)
> summary(fit_simple_ex_lm)

Call:
lm(formula = y ~ x, data = simple_ex)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.61653 -0.49110 -0.01622  0.51680  2.76976 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  1.08641    0.17916   6.064 2.49e-08 ***
x            2.52441    0.02871  87.913  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.8972 on 98 degrees of freedom
Multiple R-squared:  0.9875,	Adjusted R-squared:  0.9874 
F-statistic:  7729 on 1 and 98 DF,  p-value: < 2.2e-16

We can fit the same model using the lavaan package. The syntax for lavaan is to pass the list of model statements as a simple character string, one statement per line.

simple_ex_sem <-
  '
  y ~ x
  y ~ 1
'

fit_simple_ex_sem <-
  sem(model = simple_ex_sem,
      data = simple_ex)

summary(fit_simple_ex_sem)
> summary(fit_simple_ex_sem)
lavaan 0.6-18 ended normally after 1 iteration

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                         3

  Number of observations                           100

Model Test User Model:
                                                      
  Test statistic                                 0.000
  Degrees of freedom                                 0

Parameter Estimates:

  Standard errors                             Standard
  Information                                 Expected
  Information saturated (h1) model          Structured

Regressions:
                   Estimate  Std.Err  z-value  P(>|z|)
  y ~                                                 
    x                 2.524    0.028   88.805    0.000

Intercepts:
                   Estimate  Std.Err  z-value  P(>|z|)
   .y                 1.086    0.177    6.125    0.000

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)
   .y                 0.789    0.112    7.071    0.000

lavaan gives us an informative output. We see that the parameters are estimated with ‘ML’ (maximum likelihood), whereas lm estimates via ordinary least squares (OLS). This means that estimates might differ slightly. We will trust the default optimizer, ‘NLMINB’.

We see that the model is estimating 3 parameters: the regression slope, the intercept, and the residual variance. Note that we typically do not estimate the intercepts in SEMs (I’m doing so here for continuity). To fit the SEM without the intercept, simply remove the 'y ~ 1' line from the model string.

We will pass over the Model Tests and focus on the parameter estimates. The slope estimate of 2.524 and the intercept estimate of 1.086 perfectly match our lm estimates. We don’t get an estimate of the residual variance from lm, but we can extract the residuals from the lm model and calculate the variance ourselves.

fit_simple_ex_lm %>%
  resid() %>%
  var()
> fit_simple_ex_lm %>%
+   resid() %>%
+   var()
[1] 0.7968917

That value is a bit off because lavaan’s maximum likelihood estimate of the variance divides by n, while var() divides by n − 1: multiplying 0.7969 by 99/100 recovers the 0.789 that lavaan reports.
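The two denominators are easy to check with stand-in residuals (the numbers here are made up, not from the model above):

```r
set.seed(1)
r <- rnorm(100)  # stand-in residuals
n <- length(r)

sample_var <- var(r)                    # denominator n - 1
ml_var     <- sum((r - mean(r))^2) / n  # denominator n, as ML uses

all.equal(ml_var, sample_var * (n - 1) / n)  # TRUE: off by exactly (n-1)/n
```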

Fitting complex systems

Now that we’ve seen how linear regression models can be fit as a special case of SEM, let’s take a look at an example that shows where SEM surpasses linear regression.

Again, we will use simulated data, but we’ll download the data so that we can’t see the underlying structure used to generate it. Instead, we’ll use SEMs to determine the structure.

source("https://raw.githubusercontent.com/andisa01/202407_SEM_turorial/main/scripts/SEM_tutorial_example_source.R")

The traditional approach

First, let’s analyze this data using linear regression with stepwise variable selection (a technique used by almost half of all ecological and animal behavior researchers, including my past self). In this methodology, we start with a ‘full’ model including all variables and their interactions. Then, we drop non-significant parameters, starting with the least parsimonious, and refit the model until only significant parameters remain.

# Fit the full or 'global model'
mod_ex01_full <- lm(Y ~ X1 + X2 + X1:X2, data = example01_data_anon)
summary(mod_ex01_full)
> summary(mod_ex01_full)

Call:
lm(formula = Y ~ X1 + X2 + X1:X2, data = example01_data_anon)

Residuals:
     Min       1Q   Median       3Q      Max 
-15.5935  -3.7755   0.1861   3.6929  13.7593 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.163110   7.944663  -0.146    0.884
X1          -0.264413   0.387296  -0.683    0.496
X2           0.155742   0.109397   1.424    0.156
X1:X2        0.002022   0.003950   0.512    0.609

Residual standard error: 5.066 on 196 degrees of freedom
Multiple R-squared:  0.144,	Adjusted R-squared:  0.1309 
F-statistic: 10.99 on 3 and 196 DF,  p-value: 1.06e-06

None of our parameters are significant, so we drop the interaction term.

# Drop the interaction
lm(Y ~ X1 + X2, data = example01_data_anon) %>% 
  summary()
> lm(Y ~ X1 + X2, data = example01_data_anon) %>% 
+   summary()

Call:
lm(formula = Y ~ X1 + X2, data = example01_data_anon)

Residuals:
     Min       1Q   Median       3Q      Max 
-15.5962  -3.8684   0.1564   3.6213  13.8291 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)   
(Intercept) -4.73378    3.79270  -1.248  0.21346   
X1          -0.08449    0.16216  -0.521  0.60295   
X2           0.19734    0.07308   2.701  0.00753 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 5.057 on 197 degrees of freedom
Multiple R-squared:  0.1428,	Adjusted R-squared:  0.1341 
F-statistic: 16.41 on 2 and 197 DF,  p-value: 2.555e-07

Now, x2 is significant, but x1 is not. So, we drop x1.

lm(Y ~ X2, data = example01_data_anon) %>% 
  summary()
> lm(Y ~ X2, data = example01_data_anon) %>% 
+   summary() # Drop X1

Call:
lm(formula = Y ~ X2, data = example01_data_anon)

Residuals:
     Min       1Q   Median       3Q      Max 
-15.4915  -3.8082  -0.0172   3.6272  13.5921 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -3.28441    2.57317  -1.276    0.203    
X2           0.16227    0.02839   5.716 3.97e-08 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 5.047 on 198 degrees of freedom
Multiple R-squared:  0.1416,	Adjusted R-squared:  0.1373 
F-statistic: 32.67 on 1 and 198 DF,  p-value: 3.966e-08

And, now we have a minimum adequate model to explain the system! We can explain 14% of the variation of y with x2. For every 1 unit increase in x2, we expect y to increase by 0.16. And look at that p-value! Let’s go publish a paper!!!

In fact, this method of stepwise variable selection is so common that you can do it all in one command with dredge() from the MuMIn package.

MuMIn::dredge(mod_ex01_full)
> MuMIn::dredge(mod_ex01_full)
Fixed term is "(Intercept)"
Global model call: lm(formula = Y ~ X1 + X2 + X1:X2, data = example01_data_anon)
---
Model selection table 
   (Int)       X1     X2    X1:X2 df   logLik   AICc delta weight
3 -3.284          0.1623           3 -606.551 1219.2  0.00  0.626
4 -4.734 -0.08449 0.1973           4 -606.413 1221.0  1.81  0.254
8 -1.163 -0.26440 0.1557 0.002022  5 -606.279 1222.9  3.64  0.101
2  4.870  0.31890                  3 -610.048 1226.2  6.99  0.019
1 11.280                           2 -621.824 1247.7 28.49  0.000
Models ranked by AICc(x) 

The complexity of complex systems

The problem with the linear regression approach is that there is an inherent assumption that the independent variables are, well… independent. But, in natural systems, there are almost always relationships between the independent variables. For instance, here are just a few structures that could underlie the relationship between y, x1, and x2.

The model structure we derived in our stepwise regression model is the single effect graph in the top left. The model implied by the multiple regression model is a ‘common effect’ structure where the exogenous variables are uncorrelated–in other words x1 and x2 have independent effects on y.

But other structures could exist that we cannot easily capture in regression. For instance, x1 and x2 might have independent effects on y but remain correlated with each other (common effect with correlation (bottom right)). Or x1 may have no direct effect on y, but may affect x2, which affects y in turn (this is called a chain or fully mediated effect (top middle)). Or x1 might directly affect y in addition to the indirect effect (partial mediator (top right)).

x2 may not have any effect on y at all, but may still covary because both are directly affected by x1 (common cause (bottom left)).

Using SEMs to compare structural hypotheses

Given all the possible structures, how can we ever know which governs our study system? Well, the first pass is to use common sense. If x1 is height and x2 is weight, the principle of allometry should have us exclude any model without a relationship between them.

The second pass is to use knowledge of your system from the literature. For example, if x1 is maternal genotype and x2 is F1 genotype (of the progeny), there can be no direct effect of the maternal genotype on y if the species has an annual lifecycle, but the effect might be partially mediated for perennial species.

Once you have a set of plausible structural hypotheses, we can use the mechanics of SEM to ask which structure best fits the data.

For now, we will assume that all 6 hypotheses above are plausible. We’ll fit each in turn. To do so, I’ll introduce a new operator, ~~, which denotes the undirected covariation between two variables (i.e. covariance) or of a variable with itself (i.e. variance).

# single effect
ex01_formula_x2effect <- '
Y ~ X2
X1 ~~ X1
'

ex01_sem_x2effect <- sem(ex01_formula_x2effect, data = example01_data_anon)

summary(ex01_sem_x2effect)
> summary(ex01_sem_x2effect)
lavaan 0.6-18 ended normally after 8 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                         3

  Number of observations                           200

Model Test User Model:
                                                      
  Test statistic                               377.733
  Degrees of freedom                                 2
  P-value (Chi-square)                           0.000

Parameter Estimates:

  Standard errors                             Standard
  Information                                 Expected
  Information saturated (h1) model          Structured

Regressions:
                   Estimate  Std.Err  z-value  P(>|z|)
  Y ~                                                 
    X2                0.162    0.028    5.745    0.000

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)
    X1               32.093    3.209   10.000    0.000
   .Y                25.220    2.522   10.000    0.000

We won’t use the summary output for now, so I will exclude it when fitting the rest of the models.

# chain
ex01_formula_chain <- '
X2 ~ X1
Y ~ X2
'

ex01_sem_chain <- sem(ex01_formula_chain, data = example01_data_anon)
# partially mediated
ex01_formula_mediator <- '
X2 ~ X1
Y ~ X2
Y ~ X1
'

ex01_sem_mediator <- sem(ex01_formula_mediator, data = example01_data_anon)
# common cause
ex01_formula_commoncause <- '
X2 ~ X1
Y ~ X1
'

ex01_sem_commoncause <- sem(ex01_formula_commoncause, data = example01_data_anon)
# common effect (uncorrelated)
ex01_formula_commoneffect <- '
Y ~ X1
Y ~ X2
'

ex01_sem_commoneffect <- sem(ex01_formula_commoneffect, data = example01_data_anon)
# common effect (correlated)
ex01_formula_commoneffect2 <- '
Y ~ X1
Y ~ X2
X1 ~~ X2
'

ex01_sem_commoneffect2 <- sem(ex01_formula_commoneffect2, data = example01_data_anon)

We’ll use the anova() command to get some summary statistics on all of the models.

anova(
  ex01_sem_chain, 
  ex01_sem_commoneffect, 
  ex01_sem_commoncause, 
  ex01_sem_x2effect, 
  ex01_sem_commoneffect2, 
  ex01_sem_mediator
)
Chi-Squared Difference Test

                       Df    AIC    BIC    Chisq Chisq diff RMSEA Df diff Pr(>Chisq)    
ex01_sem_commoneffect   0 1218.8 1228.7   0.0000                                        
ex01_sem_commoncause    0 2425.5 2442.0   0.0000       0.00 0.000       0               
ex01_sem_commoneffect2  0 3688.8 3708.6   0.0000       0.00 0.000       0               
ex01_sem_mediator       0 2425.5 2442.0   0.0000       0.00 0.000       0               
ex01_sem_chain          1 2423.8 2437.0   0.2754       0.28 0.000       1     0.5997    
ex01_sem_x2effect       2 2480.4 2490.3 377.7330     377.46 1.372       1     <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Warning message:
lavaan->lavTestLRT():
   some models have the same degrees of freedom

The models are not mutually nested, so we can’t interpret the chi-squared test. However, we can use the two information criteria (AIC and BIC) since those are agnostic to the particular models in comparison. Both AIC and BIC indicate that the uncorrelated common effect structure is the most likely model.

That’s not the end of the story, though…

Cross-validating SEMs

As an applied data scientist, I’ve become very skeptical of parametric goodness-of-fit metrics. After all, I can fit a neural net that perfectly fits any dataset by memorizing it. What I am really interested in is how well a model can make accurate predictions.

Ideally, we would fit our model on one dataset and then collect separate data to validate its predictive accuracy. Given the limitations of field research, however, a more common approach is to split your single dataset into two groups: one for fitting and one for validation. This is called cross-validation.

The simple form of cross-validation only gives us one chance to measure accuracy (or lack thereof, i.e. error). In order to maximize the use of our data and get better estimates, it is a standard practice to split the dataset into a given number of evenly distributed, randomly sampled splits (i.e. folds). Then we can perform the cross-validation as many times as we have folds. This is called k-fold cross-validation.
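The k-fold mechanics can be sketched in base R with an ordinary regression (an illustration of the general idea only; the data and model here are made up, and cvsem handles all of this internally for SEMs):

```r
set.seed(1)
k <- 10
dat <- data.frame(x = runif(200))
dat$y <- 2 * dat$x + rnorm(200, sd = 0.5)

# Assign each row to one of k roughly equal, randomly shuffled folds
folds <- sample(rep(seq_len(k), length.out = nrow(dat)))

# For each fold: fit on the other k - 1 folds, score on the held-out fold
cv_mse <- sapply(seq_len(k), function(i) {
  fit <- lm(y ~ x, data = dat[folds != i, ])
  held_out <- dat[folds == i, ]
  mean((held_out$y - predict(fit, held_out))^2)
})

mean(cv_mse)  # average out-of-sample error across the k folds
```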

We can use the library cvsem to easily conduct a 10-fold cross validation on all of our models.

# Cross validating
models_to_cv <- 
  cvgather(
    ex01_sem_chain, 
    ex01_sem_commoneffect, 
    ex01_sem_commoncause, 
    ex01_sem_x2effect, 
    ex01_sem_commoneffect2, 
    ex01_sem_mediator
    )

cvsem(
  data = example01_data_anon,
  Models = models_to_cv,
  k = 10
)
Cross-Validation Results of 6 models 
based on  k =  10 folds. 

                   Model E(KL-D)   SE
1         ex01_sem_chain    0.32 0.16
3   ex01_sem_commoncause    0.33 0.16
6      ex01_sem_mediator    0.33 0.16
4      ex01_sem_x2effect    5.90 1.32
5 ex01_sem_commoneffect2    7.16 1.23
2  ex01_sem_commoneffect    7.20 1.23

Both of the ‘common effect’ structures exhibit the worst predictive performance. They have the highest errors (by default, the error is calculated via KL-divergence: the difference between the covariance matrix of the validation set and the covariance matrix implied by the fitted model). To be fair, a very quick look at the relationship between x1 and x2 could have told us that any model that fails to account for the relationship between these two variables is not realistic.

The next highest error comes from the model suggested by our initial stepwise regression method. Thus, based on predictive performance, we can conclusively exclude these hypothesized structures.

The chain, common cause, and mediator structures all have similar predictive accuracy. How can we decide which is the true structure of our system? Unfortunately, there is no statistical method to differentiate them with our data. However, we can still use the structural model to help us design further experiments to select the final model. For example, we could envision an experiment in which we hold x2 constant in one set of plots and let it vary in another.

If the system were a common cause structure, holding x2 constant would cause no difference in the y values between plots. However, if the system were a fully mediated chain through x2, holding x2 constant would completely decouple the association between x1 and y, whereas controlling x2 in a partially mediated system would only attenuate the covariation between x1 and y.
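That thought experiment is easy to simulate. Here is a sketch of the fully mediated chain (the coefficients are my own made-up values, not the tutorial's): letting x2 vary transmits the x1 signal to y, while holding x2 constant decouples them.

```r
set.seed(1)
n <- 1000
x1 <- rnorm(n)

# Chain structure x1 -> x2 -> y, with x2 free to vary
x2_free <- 0.8 * x1 + rnorm(n)
y_free  <- 0.8 * x2_free + rnorm(n)

# Intervention: hold x2 at a constant value in every "plot"
x2_held <- rep(0, n)
y_held  <- 0.8 * x2_held + rnorm(n)

cor(x1, y_free)  # clearly nonzero: the chain transmits the signal
cor(x1, y_held)  # near zero: the association is decoupled
```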

In our toy example, the true causal structure will be easy to diagnose when I tell you what the variables really are:

y ~ Shark attacks,
x1 ~ Weather,
x2 ~ Ice cream sales  

Estimating SEM coefficients

You might be tempted to take the SEM for the common cause structure and use that in your paper. I’d advise against it. Since we used all of our data in estimating the structure of the system, we don’t want to reuse the same data to estimate parameter coefficients. That would be double-dipping on our data. Having our cake and eating it too.

Instead, we need to either collect additional data (ideal) or reserve a split of our original data to use as the final fitting data. Since this is a simulated dataset, we can ‘collect’ all of the new data we want.

set.seed(666)

n <- 200

weather <- rnorm(n, mean = 20, sd = 5)

ice_cream_sales <- 50 + 2 * weather + rnorm(n, mean = 0, sd = 5)

shark_attacks <- 5 + 0.3 * weather + rnorm(n, mean = 0, sd = 5)

example01_newdata <- data.frame(shark_attacks, weather, ice_cream_sales)
# common cause
ex01_new_commoncause <- '
ice_cream_sales ~ weather
shark_attacks ~ weather
# shark_attacks ~ 1
'

ex01_sem_new_commoncause <- sem(ex01_new_commoncause, data = example01_newdata)

summary(ex01_sem_new_commoncause)
> summary(ex01_sem_new_commoncause)
lavaan 0.6-18 ended normally after 10 iterations

  Estimator                                         ML
  Optimization method                           NLMINB
  Number of model parameters                         5

  Number of observations                           200

Model Test User Model:
                                                      
  Test statistic                                 0.000
  Degrees of freedom                                 0

Parameter Estimates:

  Standard errors                             Standard
  Information                                 Expected
  Information saturated (h1) model          Structured

Regressions:
                    Estimate  Std.Err  z-value  P(>|z|)
  ice_cream_sales ~                                    
    weather            2.131    0.065   32.853    0.000
  shark_attacks ~                                      
    weather            0.419    0.061    6.852    0.000

Covariances:
                     Estimate  Std.Err  z-value  P(>|z|)
 .ice_cream_sales ~~                                    
   .shark_attacks       3.026    1.644    1.841    0.066

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)
   .ice_cream_sals   24.452    2.445   10.000    0.000
   .shark_attacks    21.727    2.173   10.000    0.000

Our coefficient estimates are close, but not spot on the true parameter values we simulated. The reason is that the simulated errors are pretty high (sd = 5). Rather than fitting the final model once, we can borrow another technique from machine learning and repeatedly fit bootstrap replicates. This will both stabilize our estimates and provide a convenient, non-parametric confidence interval.
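One way to implement this (my assumption about the approach, not the tutorial's own code) is lavaan's built-in bootstrap: refit the final model with se = "bootstrap" and pull percentile confidence intervals from parameterEstimates(). The data-simulation lines are repeated so the chunk stands alone.

```r
library(lavaan)

# Re-simulate the 'new' dataset from above
set.seed(666)
n <- 200
weather <- rnorm(n, mean = 20, sd = 5)
ice_cream_sales <- 50 + 2 * weather + rnorm(n, mean = 0, sd = 5)
shark_attacks <- 5 + 0.3 * weather + rnorm(n, mean = 0, sd = 5)
example01_newdata <- data.frame(shark_attacks, weather, ice_cream_sales)

ex01_new_commoncause <- '
ice_cream_sales ~ weather
shark_attacks ~ weather
'

# Refit with bootstrapped standard errors (1000 resamples)
fit_boot <- sem(
  ex01_new_commoncause,
  data = example01_newdata,
  se = "bootstrap",
  bootstrap = 1000
)

# Non-parametric percentile confidence intervals for each coefficient
parameterEstimates(fit_boot, boot.ci.type = "perc")
```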

Bootstrapping SEM coefficients

Content to come!

Extensions to SEM

Content to come!

Latent variables

Content to come!

Composite variables

Content to come!

Non-linear relationships

Content to come!

Hierarchical relationships

Content to come!

Background

This tutorial was originally developed for a graduate workshop at Yale University on 25 July 2024.

Visualize mixed effect regressions in R with GGplot2

A.Z. Andis Arietta, https://www.azandisresearch.com, 31 December 2022
Example of the final viz.

In this post, I will show some methods of displaying mixed effect regression models and associated uncertainty using non-parametric bootstrapping. This is kind of a follow-up to my previous post on visualizing custom main effect models.

Introduction

Mixed models have quickly become the model du jour in many observation-oriented fields because these models obviate many of the issues of pseudoreplication inherent in blocked or repeated measures experiments and structured data.

They do this by treating the levels of categorical variables not as unique instances to be parameterized individually, but as random samples from an infinite distribution of levels. Instead of wasting degrees of freedom estimating parameter values for each level, we only need to estimate that global distribution (which requires only a handful of parameters) and instead focus our statistical power on the variables of interest.
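To make that bookkeeping concrete (a toy sketch of my own, not from the post): with 10 units, a fixed-factor regression spends a coefficient on every level, while the mixed model replaces all of them with a single variance parameter.

```r
library(lme4)

set.seed(1)
d <- data.frame(unit = factor(rep(1:10, each = 20)))
d$x <- runif(nrow(d), 0, 10)
d$y <- 20 + rnorm(10, 0, 5)[d$unit] + 2 * d$x + rnorm(nrow(d))

# Fixed-factor approach: intercept + slope + 9 unit dummies = 11 coefficients
length(coef(lm(y ~ x + unit, data = d)))  # 11

# Mixed model: intercept + slope + unit SD + residual SD = 4 parameters
fit <- lmer(y ~ x + (1 | unit), data = d)
fixef(fit)    # the 2 fixed effects
VarCorr(fit)  # the 2 variance components
```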

Thanks to packages like nlme and lme4, mixed models are simple to implement. For all of their virtues, mixed models can also be a pain to visualize and interpret. Although linear mixed models are conceptually similar to the plain old ordinary least-squares regression we know and love, they harbor a lot more math under the hood, which can be intimidating.

One of the reasons mixed models are difficult to intuitively visualize is because they allow us to manage many levels of uncertainty. Depending on the focus of our analyses, we usually want to focus on certain aspects of the trends and associated uncertainty. For instance, an ecologist might be interested in the effect of nutrient input across many plots, but not interested in the difference between plots (i.e. traditional random effect). Or, an educator might be interested in the effect of different curricula, but not the difference between specific classes within specific schools (i.e. nested random effects). Or, a physician might be interested in the effect of a long-term treatment on a patient after accounting for baseline difference between patients (i.e. repeated measures).

In this tutorial, I’m going to focus on how to visualize the results of mixed effect models from lme4 using ggplot2. You can also clone the annotated code from my Github.

First, load in the necessary libraries.

library(tidyverse)
library(lme4)
library(ggsci)
library(see)
library(cowplot)
theme_set(theme_classic()) # This sets the default ggplot theme

To begin, I am going to simulate an experiment with 10 experimental units each containing 100 observations. These could be 10 plots with 100 random samples, or 10 schools with 100 student test scores, or the records of 10 patients from each of 100 follow-up visits. Each of the experimental units will ultimately get its own intercept and slope effect coefficient. The rand_eff data frame is essentially a Z matrix in classic mixed model notation. For this example, I’ll assume that the intercepts come from a distribution with standard deviation of 20 and the slopes from a distribution with standard deviation of 0.5. The random effects define the variation of the experimental unit around the main effect, so the mean of these distributions is necessarily 0.

set.seed(666)
rand_eff <- data.frame(unit = as.factor(seq(1:10)),
            b0 = rnorm(10, 0, 20),
            b1 = rnorm(10, 0, 0.5))

We can now join our random effect matrix to the full dataset and define our y values as y_ij = (B0 + b0_j) + (B1 + b1_j) x_ij + ε_ij, where j indexes the experimental unit.

X <- expand.grid(unit = as.factor(seq(1:10)), obs = as.factor(seq(1:100))) %>%
  left_join(rand_eff,
            by = "unit") %>%
  mutate(x = runif(n = nrow(.), 0, 10),
         B0 = 20,
         B1 = 2,
         E = rnorm(n = nrow(.), 0, 10)) %>%
  mutate(y = B0 + b0 + x * (B1 + b1) + E)

Here’s a look at the data.

X %>%
  ggplot(aes(x = x, y = y, col = unit)) +
  geom_point() +
  facet_wrap(vars(unit))
Scatter plots of the simulated data. Each of 10 experimental units contains 100 observations.

Random intercept model

For demonstration, let’s first assume that we are primarily interested in the overall slope of the relationship. For instance, if these are 10 field plots, we might want to know the effect of adding 1 unit of nutrient fertilizer, regardless of the baseline level of nutrients in a given plot.

We can do this by fitting a random intercept model and then looking at the summary of the resulting model.

lmer1 <- lmer(y ~ x + (1|unit), data = X)

summary(lmer1)
> summary(lmer1)
Linear mixed model fit by REML ['lmerMod']
Formula: y ~ x + (1 | unit)
   Data: X

REML criterion at convergence: 7449.7

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-2.88156 -0.68745  0.01641  0.71022  2.84532 

Random effects:
 Groups   Name        Variance Std.Dev.
 unit     (Intercept) 811.59   28.488  
 Residual              94.63    9.728  
Number of obs: 1000, groups:  unit, 10

Fixed effects:
            Estimate Std. Error t value
(Intercept)  18.2652     9.0301   2.023
x             2.0091     0.1077  18.651

Correlation of Fixed Effects:
  (Intr)
x -0.060

We can see that the fitted model does a good job estimating the fixed effect slope (B1), which we simulated with a coefficient of 2, as 2.0091. However, the model underestimates the fixed effect intercept (B0) as 18.3 and overestimates the standard deviation of the random intercepts (b0) as 28.5; we simulated those values as 20 and 20, respectively.
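Those pieces can all be pulled programmatically with lme4's accessors. The chunk below re-simulates data along the lines of the post so it stands alone; the exact estimates will differ from the printed summary because the random draws happen in a different order here.

```r
library(lme4)

set.seed(666)
rand_eff <- data.frame(unit = factor(1:10),
                       b0 = rnorm(10, 0, 20),
                       b1 = rnorm(10, 0, 0.5))
X <- merge(expand.grid(unit = factor(1:10), obs = 1:100), rand_eff, by = "unit")
X$x <- runif(nrow(X), 0, 10)
X$y <- 20 + X$b0 + X$x * (2 + X$b1) + rnorm(nrow(X), 0, 10)

lmer1 <- lmer(y ~ x + (1 | unit), data = X)

fixef(lmer1)       # fixed-effect estimates: B0 (intercept) and B1 (slope)
VarCorr(lmer1)     # standard deviations of the random intercept and residual
ranef(lmer1)$unit  # each unit's estimated deviation from the fixed intercept
```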

If we think we could live with that fit, how would we go about visualizing our model?

Here is a look at our data with a linear regression fit to each experimental unit. It is clear that there is a wide spread in the intercepts, but the slopes are similar.

X %>%
ggplot(aes(x = x, y = y, col = unit)) +
geom_point() +
geom_smooth(method = 'lm', se = F)
Scatter plot of the simulated data with an independent linear regression fit to each experimental unit.

Marginal (Fixed effect) versus Conditional (Fixed + Random effect)

We might be tempted to use this built-in regression by group from ggplot as a visualization of the mixed model. However, this would be WRONG!!! GGplot is fitting an ordinary least squares regression to each unit without accounting for the random effect. That means that the estimates and the confidence intervals do not reflect our model. In this case, the estimates might be pretty close since our sample sizes across units are pretty even, but they could be wildly off, or even opposite, of the mixed model slope estimate.

In a prior post, I showed how we can use the predict function to display our custom models in ggplot. In the case of mixed effect models, you can predict both marginal and conditional values. The marginal value reflects the fixed effects alone. The conditional value combines the fixed and random effects.

In other words, the marginal effect is asking “What would I expect y to be for a given x without knowing which experimental unit it came from?” whereas the conditional effect is asking “What would I expect y to be for a given x from a given experimental unit?”

We can specify which prediction we want with the random effect formula argument re.form:

X <- X %>% 
  mutate(fit.m = predict(lmer1, re.form = NA),
         fit.c = predict(lmer1, re.form = NULL))

The simplest visualization would be to display the marginal fit on the raw values.

X %>%
  ggplot(aes(x = x, y = y)) +
    geom_point(pch = 16, col = "grey") +
    geom_line(aes(y = fit.m), col = 1, size = 2) +
    coord_cartesian(ylim = c(-40, 100))
Linear fit of the marginal (fixed) effects (black line) shown with a scatterplot of the raw data (grey dots).

However, this is a bit misleading because it underrepresents our confidence in the slope by making it look like the residuals are huge.

But the residuals, and our confidence in the fit, are based on the conditional residual variance, which is much tighter. We can see that easily when we look at the conditional fits. This is one option for visualization, but it highlights the wrong element if our primary interest is the overall slope trend.

X %>%
  ggplot(aes(x = x, y = y, col = unit)) +
  geom_point(pch = 16) +
  geom_line(aes(y = fit.c, col = unit), size = 2) +
  facet_wrap(vars(unit)) +
  coord_cartesian(ylim = c(-40, 100))
Conditional fits from the random effect model with random intercepts on the raw data points, facetted by experimental unit.

Displaying all of the conditional fits on a single panel helps. Now we can see the variation in the conditional intercepts, but as a tradeoff it is difficult to get a sense of the residual variance because there are too many overlapping points.

X %>%
  ggplot(aes(x = x, y = y, col = unit)) +
  geom_point(pch = 16) +
  geom_line(aes(y = fit.c, col = unit), size = 2)  +
  coord_cartesian(ylim = c(-40, 100))
Conditional fits from the random effect model with random intercepts on the raw data points. All experimental units are displayed on the same facet, differentiated by color.

Instead, I think it makes more sense to display the conditional residuals around the marginal effect. You can kind of think of this as collapsing all of the conditional fits from the previous plot into the single marginal fit. We can do this by extracting the residuals (which are the conditional residuals) and then displaying the points as the marginal fit plus the residuals.

X <- X %>%
  mutate(resid = resid(lmer1))

X %>%
  ggplot(aes(x = x, y = fit.m + resid, col = unit)) +
  geom_point(pch = 16) +
  geom_line(aes(y = fit.m), col = 1, size = 2)  +
  coord_cartesian(ylim = c(-40, 100))
Marginal fit from the random effect model with random intercepts on the conditional residuals of the experimental units, differentiated by color.

In some cases, we might also want to give the reader a sense of the variation in the conditional intercepts. For instance, the fact that the slope is so consistent across a wide range of baselines might actually increase our confidence in the relationship even further.

There are a couple of ways to simultaneously display both our confidence in the fit of the marginal trend and the variance in the conditional fits.

Depending on the number of conditional units, one option is to display the conditional fits below the scatter plot of the conditional residuals.

X %>%
  ggplot(aes(x = x, y = fit.m + resid)) +
  geom_line(aes(y = fit.c, col = unit), size = 1) +
  geom_point(pch = 16, col = "grey") +
  geom_line(aes(y = fit.m), col = 1, size = 2) +
  coord_cartesian(ylim = c(-40, 100))
Marginal fit (heavy black line) from the random effect model with random intercepts with the conditional residuals (grey dots) and conditional fits (thin lines) for each experimental unit, differentiated by color.

Another option is to display a density plot or histogram of the estimated conditional intercepts (also known as the Best Linear Unbiased Predictors or BLUPs). In a random effect framework, we are assuming that the conditional intercepts are samples from some infinite distribution of intercepts, so this plot of the BLUPs from our model is essentially an empirical representation of that idealized distribution. (Alternatively, we could simply plot the idealized distribution as a normal distribution from the estimated variance of the random effect, but I like the empirical density plot because it also gives a sense of when our conditional intercepts do NOT conform to the assumption of being samples from a normal distribution.)

We can extract the BLUPs from the model object (b0_hat) and add those to the model estimate of the marginal intercept (B0_hat) to get the estimated conditional intercepts. This is our data frame of conditional estimates.

Cond_DF <- as.data.frame(ranef(lmer1)) %>%
  transmute(unit = grp, b0_hat = condval) %>%
  mutate(Intercept_cond = b0_hat + summary(lmer1)$coef[1, 1])

library(see) # for geom_violinhalf()

X %>%
  ggplot(aes(x = x, y = fit.m + resid)) +
  geom_point(pch = 16, col = "grey") +
  geom_violinhalf(data = Cond_DF, aes(x = 0, y = Intercept_cond),
                  trim = FALSE, width = 3, fill = NA) +
  geom_line(aes(y = fit.m), col = 1, size = 2) +
  coord_cartesian(ylim = c(-40, 100))
Marginal fit (heavy black line) from the random effect model with random intercepts with the conditional residuals (grey dots) and a density curve of the distribution of conditional intercepts.

Random slope and intercept model

Now let’s imagine that we are not satisfied with the random intercept model and also want to fit a random slope parameter. In this case, we want to estimate the distribution of slopes for all experimental units across the values of x.

lmer2 <- lmer(y ~ x + (x|unit), data = X)

We can see that the values from the model are getting much closer to the known values that we simulated.

summary(lmer2)

> summary(lmer2)
Linear mixed model fit by REML ['lmerMod']
Formula: y ~ x + (x | unit)
   Data: X

REML criterion at convergence: 7442.8

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-3.2071 -0.6957  0.0268  0.7067  2.8482 

Random effects:
 Groups   Name        Variance Std.Dev. Corr 
 unit     (Intercept) 846.3021 29.0913       
          x             0.2151  0.4638  -0.27
 Residual              93.0853  9.6481       
Number of obs: 1000, groups:  unit, 10

Fixed effects:
            Estimate Std. Error t value
(Intercept)  18.3474     9.2202    1.99
x             2.0075     0.1816   11.05

Correlation of Fixed Effects:
  (Intr)
x -0.250

Correlation of random effects

One important addition over the random intercept-only model is the estimate of the correlation between the distributions of the random slopes and random intercepts (which the model estimates as -0.268, see output below). Because we simulated these data, we know that there is no true correlation between the unit slopes and intercepts. But, because we have a small number of units, an apparent correlation emerged by chance.

summary(lmer2)$varcor

cor.test(rand_eff$b0, rand_eff$b1)
> summary(lmer2)$varcor
 Groups   Name        Std.Dev. Corr  
 unit     (Intercept) 29.09127       
          x            0.46378 -0.268
 Residual              9.64807       
> 
> cor.test(rand_eff$b0, rand_eff$b1)

	Pearson's product-moment correlation

data:  rand_eff$b0 and rand_eff$b1
t = -1.2158, df = 8, p-value = 0.2587
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 -0.8205150  0.3123993
sample estimates:
       cor 
-0.3949022 

If you had a reason to assume NO correlation between your random effects, you could specify uncorrelated random effects with the double-bar syntax (x||unit):

lmer3 <- lmer(y ~ x + (x||unit), data = X)

summary(lmer3)

summary(lmer3)$varcor
> summary(lmer3)
Linear mixed model fit by REML ['lmerMod']
Formula: y ~ x + ((1 | unit) + (0 + x | unit))
   Data: X

REML criterion at convergence: 7443.3

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-3.1812 -0.6951  0.0223  0.7074  2.8523 

Random effects:
 Groups   Name        Variance Std.Dev.
 unit     (Intercept) 833.2681 28.8664 
 unit.1   x             0.2092  0.4574 
 Residual              93.1011  9.6489 
Number of obs: 1000, groups:  unit, 10

Fixed effects:
            Estimate Std. Error t value
(Intercept)   18.325      9.149   2.003
x              2.010      0.180  11.165

Correlation of Fixed Effects:
  (Intr)
x -0.035
> 
> summary(lmer3)$varcor
 Groups   Name        Std.Dev.
 unit     (Intercept) 28.86638
 unit.1   x            0.45738
 Residual              9.64889

As with the random intercept model, we can use the predict function to get expected values of y based on the marginal or conditional estimates. Note that re.form = NULL is the same as re.form = ~ (x|unit).

X <- X %>% 
  mutate(fit2.m = predict(lmer2, re.form = NA),
         fit2.c = predict(lmer2, re.form = NULL),
         resid2 = resid(lmer2))
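To convince ourselves of that re.form equivalence, here is a quick check using lme4's built-in sleepstudy data (a sketch of my own, not part of the analysis above):

```r
# Check that re.form = NULL (include all random effects) matches
# explicitly spelling out the full random-effect formula.
library(lme4)

fm <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)

all.equal(predict(fm, re.form = NULL),
          predict(fm, re.form = ~ (Days | Subject)))
```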

As with the random intercept model, one way to visualize the model is to show the conditional intercept/slopes as fitted lines and the conditional residuals as points.

pmain_lmer2 <- X %>%
  ggplot(aes(x = x, y = fit2.m + resid2)) +
  geom_line(aes(y = fit2.c, col = unit), size = 1) +
  geom_point(pch = 16, col = "grey") +
  geom_line(aes(y = fit2.m), col = 1, size = 2) +
  coord_cartesian(ylim = c(-40, 100))
pmain_lmer2
Marginal fit (heavy black line) from the random effect model with random intercepts and slopes with the conditional residuals (grey dots) and conditional fits (thin lines) for each experimental unit, differentiated by color.

Visualizing the random effect variance gets a bit more difficult with two random parameters. One strategy I like is to include an additional plot of the correlation and distribution of the random effects.

Basically, we make a scatter plot of the BLUPs of the slopes and intercepts. Then we make density plots for each and add those to the margins of the plot. (I'm relying heavily on Claus Wilke's post for the code to add the marginal distributions.) Finally, we patch it all together with cowplot.

Cond_DF2 <- as.data.frame(ranef(lmer2)) %>% 
  transmute(unit = grp,
            term = case_when(term == "(Intercept)" ~ "b0_hat",
                             term == "x" ~ "b1_hat"),
            value = condval) %>%
  pivot_wider(id_cols = "unit", names_from = "term", values_from = "value") %>%
  mutate(Intercept_cond = b0_hat + summary(lmer2)$coef[1,1],
         Slope_cond = b1_hat + summary(lmer2)$coef[2,1])

pmain <- Cond_DF2 %>%
  ggplot(aes(x = Intercept_cond, y = Slope_cond)) +
    geom_point(aes(col = unit), size = 3) +
    geom_density2d(bins = 4, col = "grey", adjust = 3)

xdens <- axis_canvas(pmain, axis = "x") +
  geom_density(data = Cond_DF2, aes(x = Intercept_cond), fill = "grey", col = NA, trim = FALSE, adjust = 2)

ydens <- axis_canvas(pmain, axis = "y", coord_flip = TRUE) +
  geom_density(data = Cond_DF2, aes(x = Slope_cond), fill = "grey", col = NA, trim = FALSE, adjust = 2) +
  coord_flip()

p1 <- insert_xaxis_grob(pmain, xdens, grid::unit(.2, "null"), position = "top")
p2 <- insert_yaxis_grob(p1, ydens, grid::unit(.2, "null"), position = "right")

pinsert_lmer2 <- ggdraw(p2)

plot_grid(
  pmain_lmer2,
  pinsert_lmer2,
  nrow = 1
)
Left: Marginal fit (heavy black line) from the random effect model with random intercepts and slopes with the conditional residuals (grey dots) and conditional fits (thin lines) for each experimental unit, differentiated by color. Right: Correlation of BLUPs (Best Linear Unbiased Predictors) of random intercept and slope parameters of experimental units, differentiated by color with marginal density distributions.

Bootstrapping uncertainty

For many researchers, one of the most frustrating aspects of mixed models is that estimating confidence intervals and testing the significance of parameters is not straightforward. I highly encourage folks to take a look at Ben Bolker's thorough considerations on the topic. Dr. Bolker outlines many different problems and solutions depending on the structure of your model.

I think that the most generalized solution is to use non-parametric bootstrapping. This method essentially asks the question, “How would our model fit change if we could go back in time, select different samples, and then rerun our analysis?”

We can’t go back in time, but maybe we CAN assume that our original samples were representative of the population. If so, instead of resampling the actual population, we can resample our original observations WITH REPLACEMENT to approximate resamples of the population. If we do this many times, we can then make intuitive statements like, “95% of the time, if we reran this experiment, we’d expect the main effect to be between X and Y.”

It is important to stop and consider what re-doing your data generation process would look like. For instance, imagine our mock data had come from 1000 independent random observations that we then categorized into “units” to control for autocorrelation after the fact. If we re-ran the process to generate a new dataset, we may not always get the same number of observations in each “unit”.

However, if our mock data came from an experiment where we planted 100 trees in each of 10 “units”, then when we re-ran the experiment, we could control the number of individuals per unit. We would also need to consider if we would always choose to plant in the same 10 units, or if we would also choose units at random.

The structure of the data generation process can guide our bootstrap resampling strategy. In the first example, we could simply bootstrap all individual observations (although we may need to worry about non-convergence and small sample sizes). In the second example, where unit choice is constrained, we might decide to bootstrap within units. If, in the second example, we could also randomize units, we should probably take a hierarchical approach, first bootstrapping the units and then bootstrapping the observations within each unit.
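As a rough sketch of that hierarchical approach (my own illustration, not code from the analysis above), assuming a data frame with a unit grouping column like our simulated X:

```r
# Two-stage (hierarchical) bootstrap: first resample units with
# replacement, then resample observations within each sampled unit.
set.seed(1)
X_demo <- data.frame(unit = rep(letters[1:10], each = 100),
                     x = runif(1000, 0, 10))
X_demo$y <- 20 + 2 * X_demo$x + rnorm(1000, sd = 10)

hier_boot <- function(df) {
  units <- unique(df$unit)
  sampled_units <- sample(units, length(units), replace = TRUE)
  resamples <- lapply(seq_along(sampled_units), function(j) {
    rows <- df[df$unit == sampled_units[j], ]
    out <- rows[sample(nrow(rows), nrow(rows), replace = TRUE), ]
    # Relabel so a unit drawn twice becomes two distinct pseudo-units
    out$unit <- paste0(sampled_units[j], "_", j)
    out
  })
  do.call(rbind, resamples)
}

BS_X <- hier_boot(X_demo)
nrow(BS_X)  # 1000: same total sample size as the original data
```

Relabeling repeated units as distinct pseudo-units keeps the random effect grouping well-defined when the same unit is drawn more than once.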

NOTE: The problem with non-parametric bootstrapping of this kind is that it can be computationally expensive. One trick is to parallelize the bootstraps across all of your computer’s processors. By default, R uses one processor, so it will fit one bootstrap iteration at a time, sequentially. But the bootstraps are independent and the order doesn’t matter. If your computer has 8 cores, there is no reason not to fit 8 models simultaneously on 8 processors in 1/8 of the time. Unfortunately, setting up for parallel processing can be an adventure of its own. I won’t detail it here, but will try to dedicate a future post to it. If you have a very large dataset and can’t run bootstraps in parallel, you might consider some of the other methods suggested by Dr. Bolker.
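For what it’s worth, a bare-bones sketch of parallelizing independent iterations with the base parallel package might look like this (the worker here is a placeholder, not the actual lmer refit):

```r
# Run independent bootstrap iterations across multiple cores.
# mclapply() forks processes, which works on macOS/Linux; on
# Windows, set mc.cores = 1 or use parLapply() instead.
library(parallel)

one_boot <- function(i) {
  # Placeholder worker: a real version would resample the data,
  # refit the mixed model, and return its parameter estimates.
  mean(sample(1:100, 100, replace = TRUE))
}

n_cores <- if (.Platform$OS.type == "windows") 1 else
  max(1, detectCores() - 1)

boot_out <- mclapply(1:100, one_boot, mc.cores = n_cores)
length(boot_out)  # 100: one result per iteration
```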

Since our dataset is fairly small and simple, I’ll demonstrate how we can use bootstrapping to simultaneously estimate confidence intervals of our model parameters and visualize error bands.

Just keep in mind that if you are fitting models with complex random effect designs, you’ll have to think critically about which elements and levels of variance are most important for your data story. Hopefully, these examples will at least get you started and inspired!

The bootstrapping process begins by initializing empty dataframes to accept the parameter estimates for the fixed effect coefficients and random effect variances. The number of rows in each dataframe will be the same as the number of bootstrap iterations, so we set that first. The number of iterations is the number of times we want to simulate re-doing our data generation. The convention is 1000, but the more the merrier!

Number_of_boots <- 1000

The number of columns for the dataframes will equal the number of fixed effect coefficients and random effect variances. We can extract these from the initial model. First we extract the coefficients, then transpose the table into wide format.

# Extract the fixed effect coefficients.
FE_df <- fixef(lmer2) %>% 
  t() %>%
  as.data.frame()

# Extract the random effects variance and residual variance
RE_df <- VarCorr(lmer2) %>%
  as.data.frame() %>%
  unite("Level", -c(vcov, sdcor)) %>%
  select(-vcov) %>%
  t() %>%
  as.data.frame()

Next, we create empty dataframes to take our bootstraps.

BS_params <- data.frame(matrix(nrow = Number_of_boots, ncol = ncol(FE_df)))
colnames(BS_params) <- colnames(FE_df)

BS_var <- data.frame(matrix(nrow = Number_of_boots, ncol = ncol(RE_df)))
colnames(BS_var) <- RE_df["Level",]

In addition, we will be predicting marginal values from each model, so we need to create a prediction dataframe with an empty column to store the predicted values. For this example, we only need to predict values for a handful of x values that represent the range of xs. I chose to use 10-quantiles because I want to be able to fit a non-linear confidence band later. If this were a non-linear fit, we might want even more prediction values.

BS_pred <- expand.grid(x = quantile(X$x, probs = seq(0, 1, length.out = 10)),
                       iterration = 1:Number_of_boots,
                       pred = NA)

Finally, we can write a loop that creates a resampled dataset (with replacement) and fits the original model formula to the new dataset. From the new model, we can then extract the fixed and random effects and predict y values for the subset of x values. All of these get stored in their respective dataframes, indexed by the iteration number.

for(i in 1:Number_of_boots){
  BS_X <- slice_sample(X, prop = 1, replace = TRUE)
  BS_lmer <- lmer(formula = lmer2@call$formula,
                  data = BS_X)

  BS_params[i,] <- BS_lmer %>%
    fixef() %>%
    t() %>%
    as.data.frame()

  BS_var[i,] <- BS_lmer %>%
    VarCorr() %>%
    as.data.frame() %>%
    .$sdcor

  BS_pred[which(BS_pred$iterration == i),]$pred <-
    predict(BS_lmer,
            newdata = BS_pred[which(BS_pred$iterration == i),],
            re.form = ~0)
}

Now we have a dataframe of the marginal (i.e. fixed effect) intercept and slope parameter estimates from 1000 models fit to bootstraps of our original data.

head(BS_params)
> head(BS_params)
  (Intercept)        x
1    18.06942 2.096240
2    18.18005 2.070043
3    18.13506 2.093110
4    18.77862 1.928048
5    18.47963 2.013831
6    18.28875 2.005947

One way to get a sense of our confidence in these parameter estimates is to take a look at their distributions.

BS_hist_x <- BS_params %>%
  ggplot(aes(x = x)) +
  geom_histogram()

BS_hist_intercept <- BS_params %>%
  ggplot(aes(x = `(Intercept)`)) +
  geom_histogram()

BS_hists <- plot_grid(
  BS_hist_intercept,
  BS_hist_x,
  nrow = 1)

BS_hists
Histograms of marginal (i.e. fixed effect) intercept (left) and slope (right) parameters.

These histograms tell us so much more than a typical confidence interval because we can see the full distribution. We can see that the baseline effect on y given x = 0 is around 18.5 and we are very confident it is between 17 and 20. We can also see that the effect of 1 unit change in x is expected to yield about a 2.0 unit change in y and we are extremely confident that the slope is positive and greater than 1.7 but probably less than 2.3.
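If numeric intervals are needed alongside the histograms, percentile bootstrap CIs fall straight out of these draws (e.g., quantile(BS_params$x, c(0.025, 0.975)) for the slope). A self-contained sketch, with simulated draws standing in for a BS_params column:

```r
# Percentile bootstrap CI: take the quantiles that bound the
# middle 95% of the bootstrap distribution of an estimate.
set.seed(42)
slope_draws <- rnorm(1000, mean = 2.0, sd = 0.08)  # stand-in draws
ci_95 <- quantile(slope_draws, probs = c(0.025, 0.975))
ci_95  # central 95% interval of the stand-in draws
```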

Although we won’t directly visualize the random effect variance, we can see the estimates in the BS_var dataframe.

BS_var %>% head()

> BS_var %>% head()
  unit_(Intercept)_NA unit_x_NA unit_(Intercept)_x Residual_NA_NA
1            28.30440 0.4427450         0.01009018       9.786112
2            30.31353 0.6265168        -0.42315981       9.365630
3            30.33369 0.5261389        -0.78128072       9.320844
4            28.51896 0.6079722        -0.08575830       9.739881
5            30.50690 0.5835157        -0.64068868       9.718633
6            29.17012 0.5182395        -0.28244154       9.930305

Here, the first column is the estimated variance attributed to the random intercepts, the second column is the variance estimate of the random slopes, and the third column is the correlation between the random effects. The fourth and final column is the residual variation after accounting for the random effects (i.e. the conditional residual variation). This is information you’d want to include in a table of the model output.

We also have a dataframe of 10 predicted y values for each iteration.

head(BS_pred, n = 20)
> head(BS_pred, n = 20)
             x iterration     pred
1  0.008197445          1 18.08660
2  1.098954093          1 20.37309
3  2.231045775          1 22.74622
4  3.266109165          1 24.91596
5  4.323574828          1 27.13267
6  5.667717615          1 29.95031
7  6.672807375          1 32.05722
8  7.748743587          1 34.31264
9  8.793447507          1 36.50259
10 9.968969584          1 38.96677
11 0.008197445          2 18.19702
12 1.098954093          2 20.45493
13 2.231045775          2 22.79841
14 3.266109165          2 24.94104
15 4.323574828          2 27.13003
16 5.667717615          2 29.91247
17 6.672807375          2 31.99305
18 7.748743587          2 34.22028
19 8.793447507          2 36.38286
20 9.968969584          2 38.81624

Rather than using a traditional confidence band (which basically reduces the distribution of our bootstraps down to two points: high and low), I prefer to actually show all of the iterations and let the density of the lines make a kind of optical illusion of a confidence band.

Since our estimates are pretty tightly distributed, I’m using cowplot to show what this looks like while also zooming in to a section.

plot_grid(
  BS_pred %>%
    ggplot(aes(x = x, y = pred)) +
    geom_line(aes(group = iterration), alpha = 0.1, col = "grey50") +
    geom_line(data = X,
              aes(x = x, y = fit2.m)) +
    geom_rect(aes(ymin = 35, ymax = 40,
                  xmin = 8, xmax = 10),
              col = "firebrick",
              fill = NA,
              size = 2),
  BS_pred %>%
    ggplot(aes(x = x, y = pred)) +
    geom_line(aes(group = iterration), alpha = 0.1, col = "black") +
    geom_line(data = X,
              aes(x = x, y = fit2.m),
              col = "grey", 
              size = 2) +
    coord_cartesian(xlim = c(8, 10),
                    ylim = c(35, 40)) +
    geom_rect(aes(ymin = 35, ymax = 40,
                  xmin = 8, xmax = 10),
              col = "firebrick",
              fill = NA,
              size = 2) +
    theme(axis.line = element_blank(),
          axis.text = element_blank(),
          axis.title = element_blank()) +
    labs(x = "", y = ""),
  nrow = 1
)
Marginal effects of linear mixed effect regressions fit to 1000 bootstrap resamples of the original data (thin grey lines in the left panel and thin black lines in the right panel). The model estimates fit to the original data are shown as a thick black (left) or grey (right) line. The right panel is a detailed view of the region indicated with a red rectangle in the left panel.

Of course, we can also create a more traditional 90% confidence band by summarizing the 5th and 95th percentiles across each x value.

BS_pred %>%
  group_by(x) %>%
  summarise(hi = quantile(pred, 0.95),
            lo = quantile(pred, 0.05)) %>%
  ggplot(aes(x = x)) +
    geom_ribbon(aes(ymin = lo, ymax = hi),
                fill = "grey50",
                alpha = 0.3) +
    geom_line(data = X,
              aes(x = x, y = fit2.m))
Model fit of a linear mixed effect regression (black line) and 90% confidence band (grey band) estimated from 1000 bootstrap fits.

Putting it all together

Putting it all together, here is my preferred visualization of a mixed effect model with random intercepts and slopes, using bootstrapping to display uncertainty.

BS_ci_lines <- BS_pred %>%
  ggplot(aes(x = x, y = pred)) +
    geom_line(aes(group = iterration), alpha = 0.1, col = "grey") +
    geom_line(data = X,
              aes(x = x, y = fit2.m)) +
    geom_point(data = X,
               aes(x = x, y = fit2.m + resid2, col = unit),
               alpha = 0.3,
               pch = 16)

plot_grid(
  BS_ci_lines,
  plot_grid(
    pinsert_lmer2,
    plot_grid(
      BS_hist_intercept,
      BS_hist_x,
      nrow = 1,
      labels = c("C", "D")),
    nrow = 2,
    rel_heights = c(1, 0.7),
    labels = c("B", NA)
  ),
  nrow = 1,
  rel_widths = c(1, 0.7),
  labels = c("A", NA)
)
Linear fixed effect (A, solid black line) of a mixed effect regression fit to the original data and estimates from 1000 bootstrap resamples (A, thin grey lines) with conditional residuals for experimental units (A, points). Correlation of BLUPs (Best Linear Unbiased Predictors) of random intercept and slope parameters of experimental units with marginal density distributions (B). Histograms of marginal (i.e. fixed effect) intercept (C) and slope (D) parameters estimated from 1000 bootstraps. Colors in A and B differentiate experimental units.

Or, if we want to get really fancy with it, we could inset everything into one plot panel.

BS_ci_lines +
  coord_cartesian(ylim = c(-30, 100)) +
  annotation_custom(ggplotGrob(BS_hists),
                    xmin = 5,
                    xmax = 10,
                    ymin = -30,
                    ymax = 5
  ) +
  annotation_custom(ggplotGrob(pinsert_lmer2),
                    xmin = 0,
                    xmax = 4,
                    ymin = 50,
                    ymax = 110
  )