# Introduction

This tutorial presents a comprehensive and detailed version of the examples section from my master’s thesis . The paper introduces a default Bayes factor (henceforth abbreviated as BF, Kass & Raftery, 1995) with clear operating characteristics for testing whether the fixed effects of linear two-level models are equal to zero. This was achieved by generalizing an approach for linear regression presented in Hoijtink (2021), resulting in a BF of 19 when the marginal $$R^2$$ for the fixed effects is zero in the data. A wrapper function for the R package bain was developed for testing the fixed parameters of linear two-level models fitted with the lmer function from the R package lme4 . The function includes an adjustment for the fraction of the scaling parameter of the prior distribution, as proposed in my paper. Researchers can access the function and the full paper from this repository. This tutorial also provides a step-by-step guide for calculating the Multiple Imputation-based effective sample Size (abbreviated as MI-based $$N_{eff}$$), a new method for determining the effective sample size for two-level models containing predictors with varying slopes. In the paper, I demonstrate that the sample size does not impact the calculation of the BF. However, I believe that the proposed method for determining the effective sample size of two-level models is an innovative approach. That’s why I have included a comprehensive, step-by-step guide on how to calculate the MI-based $$N_{eff}$$. Unfortunately, at this time, there is not a user-friendly R function readily available that automates this calculation.

The tutorial aims to:

• Offer practical examples for using the wrapper function.

• Demonstrate the clear operating characteristics of the proposed Bayes factor when testing fixed effects in two-level models

• Provide a clear and easy-to-follow guide for calculating the $$N_{eff}$$. Which represents a promising new approach for calculating the effective sample size for two-level models.

# Packages

library(R2MLwiN)      # includes the data set
library(lme4)         # fitting two-level models
library(tidyverse)    # data manipulation and plotting
library(jtools)       # model summaries & automatic calculation of R^2_m
library(summarytools) # descriptive statistics
library(DT)           # interactive tables

# for the MI-based N_eff
library(rjags)
library(MASS)

#the wrapper function
source("wrapper_function.R") 

Please note, that you need to have JAGS installed on your local machine to be able to compute the MI-based $$N_{eff}$$.

# The wrapper function

bain_2lmer(x, hypotheses, standardize = FALSE,
N, fraction, jref = FALSE, seed)

As illustrated in the example call above, the wrapper takes the following arguments:

• x: the fitted lmer object;

• hypotheses: the specified hypotheses (saved as a character vector), the names of the parameters specified in the hypotheses should correspond to the names of the predictors as specified when calling lmer (see the examples below);

• standardize: a logical argument, indicating whether to compute the BF based on standardized data. This is not relevant when testing whether the fixed effects are equal to zero, however, as mentioned in the paper, it is important when comparing the parameters to each other;

• N: the value for the sample size, which includes the following options:

• "level_1" - computes the BF using the number of level-1 observations;
• "level_2" - computes the BF using the number of level-2 observations;
• "ICC_effective" - computes the BF using the ICC based $$N_{eff}$$;
• a number supplied by the user - used for the MI-based $$N_{eff}$$.
• fraction: a multiplicative factor for the fraction b which is by default set to 1 i.e., the default value for J (the number of fixed effects that are set equal to zero in the hypothesis);

• jref: a logical argument, which, if set to equal TRUE, applies the calculation for the reference value for $$J$$ (i.e., $$J_{ref}$$), which yields BFs with clear operating characteristics when testing whether the fixed effects are equal to zero. In this case, the previous argument fraction is ignored (and the used need not specify it).

For more details on how to test (informative) hypotheses for different statistical models, see this vignette.

# The Data

Throughout the examples, we will be using the tutorial data, a two-level data set available from the R package R2MLwiN. The data represents a subset from a larger data set of examination results from six inner London Education Authorities with 4059 students nested within 65 schools. The variables used for the aims of this example are: (i) the standardized students’ exam score (normexam) which will serve as the outcome variable; (ii) the standardized students’ score at age 11 on the London Reading Test (standlrt); (iii) the school indicator (school) with 65 schools of varying size. Additionally, in the end, a level-2 predictor in the form of the average LRT score for each school (avslrt) is included to illustrate that this approach can also be for testing the coefficients of second-level variables.

## Load and inspect the data

data("tutorial")

tutorial.1 <- tutorial[, c(1,2,3,5)] # subset the data with only the variables of interest

st_options(descr.silent = TRUE)
descr(tutorial.1, stats = c("mean", "med", "sd", "min", "max")) # descriptive statistics for the outcome and the level-1 predictor
## Descriptive Statistics
## tutorial.1
## N: 4059
##
##                 normexam   standlrt
## ------------- ---------- ----------
##          Mean       0.00       0.00
##        Median       0.00       0.04
##       Std.Dev       1.00       0.99
##           Min      -3.67      -2.93
##           Max       3.67       3.02

# Example 1: Model with random intercept and a random level-1 predictor where $$H_0$$ is false.

For this example, we fit a two-level model containing a random intercept and a random slope for the standlrt variable, i.e.,

$\text{normexam}_{ij} = \alpha + \beta_1\;\text{standlrt}_{ij} + u_{0j} + u_{1j}\;\text{standlrt}_{ij} + \epsilon_{ij},$ where $$i$$ = 1,. . . , N and $$j$$ = 1,. . . , G with N denoting the number of students (level-1 observations i.e., 4059) and G denoting the total number of schools (level-2 observations i.e., 65). $$\alpha$$ represents the fixed intercept, $$\beta_1$$ represents the effect for standlrt, $$u_{0j}$$ represents the random component denoting the deviation of school $$j$$ from the fixed intercept and $$u_{1j}$$ denotes the deviation of school $$j$$ from the fixed intercept. Lastly, $$\epsilon_{ij}$$ represents the standard residual error term for student $$i$$ in school $$j$$. $$u_{0j}$$, $$u_{1j}$$ and $$\epsilon_{ij}$$ have estimated variance components denoted as $$\sigma^2_{u0}$$, $$\sigma^2_{u1}$$ and $$\sigma^2_\epsilon$$, respectively.

First, let’s visualize this model by plotting the predictor standlrt against the outcome normexam, with separate regression lines for each school: We can see that the schools indeed have different slopes for the predictor standlrt. So, let’s fit this model with lmer.

## Fit the model

We will estimate all the models using Full Maximum Likelihood Estimation (FML), here is an interesting blog post about the differences between FML and REML. However, researchers are advised to use whichever method they choose based on other methodological considerations which are not related to testing the fixed effects. As shown in the paper, the estimation method has a negligible influence on the value of the BF.

model.1 <- lmer(normexam ~ standlrt + (standlrt | school), REML = FALSE, data = tutorial.1)

### Inspect the estimates and calculate $$R^2_m$$

The summ function from the package jtools gives visually pleasing summaries of fitted lme4 models and reports the marginal effect size for the fixed effects ($$R^2_m$$). Additionally, this function reports p-values based on the Satterthwaite approximation for the degrees of freedom. The lme4 package deliberately omits reporting p-values due to issues regarding the calculation of the degrees of freedom (for details on evaluating the fixed effects by using Null Hypothesis Significance Testing approaches, see, Luke, 2017).

summ(model.1) 
 Observations 4059 Dependent variable normexam Type Mixed effects linear regression
 AIC 9328.87 BIC 9366.72 Pseudo-R² (fixed effects) 0.32 Pseudo-R² (total) 0.43
Fixed Effects
Est. S.E. t val. d.f. p
(Intercept) -0.01 0.04 -0.29 65.28 0.78
standlrt 0.56 0.02 27.62 62.60 0.00
p values calculated using Kenward-Roger standard errors and d.f.
Random Effects
Group Parameter Std. Dev.
school (Intercept) 0.30
school standlrt 0.12
Residual 0.74
Grouping Variables
Group # groups ICC
school 65 0.14

The fixed effect for standlrt is estimated to be larger than zero, which is evident both by its SE and by the highly significant p-value (rejecting the null hypothesis which states that the fixed effect is equal to zero). The value for the ICC in this data set is .14, which indicates the amount of within-school clustering with respect to normexam, before accounting for (part of) the explained variance by introducing (random) predictors.

#### $$R^2_m$$ using summ()

From the table above we can see that the Pseudo-$$R^2$$ (fixed effects) i.e., the $$R^2_m$$ is .32, which indicates a large effect size based on Cohen (1992) for the $$R^2$$ of multiple linear regression.

#### $$R^2_m$$ by hand

We can also manually compute the marginal $$R^2$$ which is defined as the proportion of variation in the outcome variable that can be explained by the fixed effect(s):

$R^2_m = \frac{\sigma_f^2}{\sigma_f^2 + \sigma_{u0}^2 + \sigma_{u1}^2 + \sigma_\epsilon^2},$ where $$\sigma_f^2$$ is calculated is defined as

$\sigma_f^2 = \text{var}(\alpha +\beta_1\;\text{standlrt}_{ij}).$

fixef <- fixef(model.1)  # extract the fixed effect
y_hat <- fixef + fixef*tutorial$standlrt # predict the outcome sigma_f <- var(y_hat) # obtain the variance in the predicted outcome based on the fixed effect # extract the estimated random effects from the fitted model (you can use the summary function) # i.e., summary(model.1) sigma_u0 <- 0.09044 sigma_u1 <- 0.01454 sigma_e <- 0.55366 # calculate the R^2_m marginal_Rsq_fixed <- (sigma_f)/(sigma_f + sigma_u0 + sigma_u1 + sigma_e) marginal_Rsq_fixed ##  0.3170484 We can see that this value corresponds to the Pseudo-$$R^2$$ (fixed effects) given by the summ function. Thus, we expect the BF reject the null (i.e., show support for the unconstrained hypothesis). ## Calculate the MI-based $$N_{eff}$$ In the following section, I will provide a comprehensive, yet easy-to-follow guide on how to compute the recently proposed effective sample size using the tutorial data. While there may be more sophisticated and streamlined methods from a programming perspective, the goal of this tutorial is to clearly demonstrate the underlying processes involved in calculating Mi-based $$N_eff$$ in a straightforward manner. First, we need to specify the same model in JAGS in a text file: jags.model <- "model { # Likelihood for (i in 1:4059){ normexam[i] ~ dnorm(mu[i], tau) mu[i] <- alpha[school[i]] + beta[school[i]] * standlrt[i] } # Level-2 for(j in 1:65){ alpha[j] <- U[j,1] beta[j] <- U[j,2] U[j,1:2] ~ dmnorm (MU[j,], invSigma[,]) MU[j,1] <- mu.alpha MU[j,2] <- mu.beta } # (hyper)Priors mu.alpha ~ dnorm(0, 0.0001) mu.beta ~ dnorm(0, 0.0001) tau ~ dgamma (0.001, 0.001) # resiudal variance invSigma[1:2,1:2] ~ dwish(Tau, 2) # inverse of the covariance matrix following a Wishart prior tau.alpha ~ dgamma (0.001, 0.001) # intercept variance tau.beta ~ dgamma (0.001, 0.001) # slope variance Tau[1,1] <- pow(tau.alpha, -1/2) # construct the scale matrix (a parameter of the Whishart prior) Tau[2,2] <- pow(tau.beta, -1/2) Tau[1,2] <- rho_1*tau.alpha*tau.beta # covariance between the slope of standlrt and the intrcept Tau[2,1] <- Tau[1,2] rho_1 ~ dunif(-1, 1) # correlation (between -1 and 1) not important } " ### Compare the models Since we are using uninformative priors, we will take a small detour and show that the Bayesian posterior parameter estimates correspond to those obtained with lmer when using FML. # Check and inspect the model model.def <- jags.model(file = textConnection(jags.model), inits = list(.RNG.name="base::Wichmann-Hill", .RNG.seed=100), data = tutorial, n.chains = 2) ## Compiling model graph ## Resolving undeclared variables ## Allocating nodes ## Graph information: ## Observed stochastic nodes: 4059 ## Unobserved stochastic nodes: 72 ## Total graph size: 16539 ## ## Initializing model update(object = model.def, n.iter = 1000) # ask only for these parameters parameters <- c("mu.alpha", "mu.beta", "tau", "invSigma") results <- coda.samples(model = model.def, variable.names = parameters, n.iter =1000) summary(results) ## ## Iterations = 2001:3000 ## Thinning interval = 1 ## Number of chains = 2 ## Sample size per chain = 1000 ## ## 1. Empirical mean and standard deviation for each variable, ## plus standard error of the mean: ## ## Mean SD Naive SE Time-series SE ## invSigma[1,1] 13.52115 3.58318 0.0801224 0.1558283 ## invSigma[2,1] -16.27213 8.90956 0.1992239 0.5807041 ## invSigma[1,2] -16.27213 8.90956 0.1992239 0.5807041 ## invSigma[2,2] 79.44014 31.90626 0.7134457 2.1708230 ## mu.alpha -0.01271 0.04361 0.0009752 0.0013995 ## mu.beta 0.55412 0.02121 0.0004743 0.0008501 ## tau 1.80630 0.04071 0.0009103 0.0009105 ## ## 2. Quantiles for each variable: ## ## 2.5% 25% 50% 75% 97.5% ## invSigma[1,1] 8.05983 11.05483 13.07467 15.39220 22.17893 ## invSigma[2,1] -39.38018 -20.46357 -14.63042 -10.09460 -4.02706 ## invSigma[1,2] -39.38018 -20.46357 -14.63042 -10.09460 -4.02706 ## invSigma[2,2] 37.66881 57.83871 72.84995 93.21918 170.70371 ## mu.alpha -0.09981 -0.04166 -0.01324 0.01561 0.06769 ## mu.beta 0.51329 0.53977 0.55368 0.56767 0.59710 ## tau 1.72764 1.77944 1.80671 1.83268 1.89180 #transform the precisions of the random eff. into standard deviations sqrt(1/13.52115) # intercept SD ##  0.2719526 sqrt(1/79.44014) # slope SD ##  0.1121967 sqrt(1/1.8063) # residual SD ##  0.744055 Note how the posterior mean estimates along with their standard deviations roughly correspond to their (Full) Maximum Likelihood counterparts. Note, the (co)variance components are expressed in terms of precisions, the reason for this are beyond the scope of the tutorial. Thus, in order to be able to compare the results with the ones from lmer, we need to take the inverse for the values of invSigma and tau, and then take the square root (since the random effects summarized by the summ function are expressed as standard deviations). The last three rows from the output contain the estimated random effects expressed in terms of standard deviations. ### Obtain multiple imputed data sets Now, we estimate the model again and save all the sampled random effects, which we will treat as multiple imputed values. We run the sampler by using two chains with a 1000 iterations each (excluding the burn-in period of a 1000 extra iterations). # fit the model again but ask to monitor all the random effects model.def <- jags.model(file = textConnection(jags.model), inits = list(.RNG.name="base::Wichmann-Hill", .RNG.seed=100), data = tutorial, n.chains = 2) ## Compiling model graph ## Resolving undeclared variables ## Allocating nodes ## Graph information: ## Observed stochastic nodes: 4059 ## Unobserved stochastic nodes: 72 ## Total graph size: 16539 ## ## Initializing model update(object = model.def, n.iter = 1000) # the random effects (the ones we need for the aim of this approach) parameters <- c("alpha", "beta", "mu.alpha", "mu.beta") results <- coda.samples(model = model.def, variable.names = parameters, n.iter =1000) Extract the samples from the posterior (from both chains): chain1 <- as.data.frame(results[]) chain2 <- as.data.frame(results[]) samples <- rbind(chain1, chain2) # combine both chains Before we start manipulating the sampled random (and fixed) effects, such that we can add each one to a copy of the original data set, it would be nice to have a glimpse of the data set’s layout. datatable(tutorial.1, options = list(pageLength = 5)) Extract the fixed effect from each sampled vector (i.e., $$\alpha$$ and $$\beta)$$: fixed_alphas <- samples[, 131] fixed_betas <- samples[, 132] fixed_alphas <- as.data.frame(fixed_alphas) fixed_betas <- as.data.frame(fixed_betas) Repeat each fixed effect N times (in this case 4059 - the number of $$N_{level-1}$$ observations) and store it in a separate data frame that will be merged with the imputed data sets later on: # the fixed intercepts samp_fixed_alphas <- list() for (i in 1:nrow(fixed_alphas)){ samp_fixed_alphas[[i]] <-rep.int(fixed_alphas[i, 1], nrow(tutorial.1)) } # the fixed slopes samp_fixed_betas <- list() for (i in 1:nrow(fixed_betas)){ samp_fixed_betas[[i]] <-rep.int(fixed_betas[i, 1], nrow(tutorial.1)) } Get the sampled random intercepts ($$\alpha_j's$$) First, split the data per group: split <- split(tutorial.1, tutorial.1$school)

Afterwards, extract the size of each group:

n_gr <- sapply(split, nrow) 

Exclude the columns containing the fixed effects from the samples data set:

samples <- samples[, -c(131,132)]

Extract the random $$\alpha_{j}s$$ and $$\beta_{j}s$$ into separate data frames:

alphas <- samples [, 1:65]  # extract the random intercepts
betas <- samples [, 66:130] # extract the random slopes

Use the size of each group to replicate, from each iteration, the respective $$\alpha_j$$, n_gr number of times:

samp_alphas <- list()

for (i in 1:nrow(alphas)){
samp_alphas[[i]] <-rep.int(alphas[i, ], n_gr)
}

Extract these in a big matrix with the samples from every iteration stored in a separate column:

samp_alphas_mat <- matrix(nrow = nrow(tutorial.1), ncol = nrow(alphas))

for(i in 1:nrow(alphas)){
samp_alphas_mat[, i] <- unlist(samp_alphas[[i]])
}

Get the sampled random slopes ($$\beta_j's$$)

Use the size of each group to replicate, from each iteration, the respective $$\beta_j$$, n_gr number of times:

samp_betas <- list()

for (i in 1:nrow(betas)){
samp_betas[[i]] <-rep.int(betas[i, ], n_gr)
}

Extract them in a big matrix with the samples from every iteration stored in a separate column:

samp_betas_mat <- matrix(nrow = nrow(tutorial.1), ncol = nrow(betas))

for(i in 1:nrow(betas)){
samp_betas_mat[, i] <- unlist(samp_betas[[i]])
}

Combine the sampled random effects

First, combine the $$\alpha_j's$$:

imputed <- list()

for (i in 1:ncol(samp_alphas_mat)){
imputed[[i]] <- cbind(tutorial.1, samp_alphas_mat[, i])
}

Add the $$\beta_j's$$:

for (i in 1:ncol(samp_betas_mat)){
imputed[[i]] <- cbind(imputed[[i]], samp_betas_mat[, i])
}

The $$\alpha's$$:

for (i in 1:length(imputed)){
imputed[[i]] <- cbind(imputed[[i]], samp_fixed_alphas[[i]])
}

The $$\beta's$$:

for (i in 1:length(imputed)){
imputed[[i]] <- cbind(imputed[[i]], samp_fixed_betas[[i]])
}

Voilà, we obtained 2000 imputed data sets.

### Transform the outcome variable

In order to be able to fit multiple linear regression models we need to transform the outcome variable as follows,

$Z_{ij} = \text{normexam}_{ij} - \alpha_j - \beta_j\;\text{standlrt}_{ij} + \alpha + \beta\;\text{standlrt}_{ij},$

which leads to $Z_i = \eta_0 + \eta_1\;\text{standlrt}_i + e_i.$

Construct a function that calculates $$Z$$:

transform_z <- function(df){
df\$z <- df[, 3] - df[, 5] - df[, 6] * df[, 4] + df[, 7] + df[, 8] * df[, 4]
df
}

Apply the function to the “imputed” data sets and obtain a column for $$Z$$:

imputed <- lapply(imputed, transform_z)

Now let’s quickly have a look at one of the imputed data sets:

datatable(imputed[][,-c(1,2)], options = list(pageLength = 5)) # exclude the school and student columns