A Tutorial on Clustering in Bayesian Graphical Models Using the R Packages bgms and easybgm

tutorial

Introduction

This is a short software tutorial accompanying the paper “A Bayesian Stochastic Block Model for Clustering in Graphical Models”. The paper introduces the Stochastic Block Model as a prior on the network structure of the ordinal Markov random field graphical model (Marsman et al., 2025). In addition to offering an approach for incorporating the theoretical assumption of clustering directly into the statistical model, the method provides a principled way to infer the number of clusters in the network given the data, and to test hypotheses about clustering.

In this brief tutorial, an example analysis is first illustrated using the R package bgms (Marsman et al., 2023). The same analysis is then demonstrated using the user-friendly wrapper package easybgm (Huth et al., 2024). The package easybgm provides a more accessible interface for conducting the analysis, particularly for users without a strong background in R. In the near future, this functionality will also be available in the Network module of the open-source statistical software JASP.

Suppose the aim is to evaluate whether the network structure estimated from empirical data exhibits clustering. Within the Bayesian graphical modeling framework, this can be examined by estimating a representative node allocation vector from the posterior distribution. This allocation summarizes the most plausible cluster membership of each node given the data and thereby provides a direct indication of potential community structure in the network.

In addition, the model yields posterior probabilities over the number of clusters, allowing researchers to quantify uncertainty about the clustering solution. These probabilities can be used to compute Bayes factors for testing hypotheses about the presence or absence of clustering, or for testing hypotheses about specific numbers of clusters.

The first step is to estimate the model by setting the argument edge_prior = "Stochastic-Block" in either the bgm or the easybgm function. When using the stochastic block model (SBM) as a prior on the network structure, several additional arguments are important:

  • Shape hyperparameters for the Beta prior distributions on the inclusion probabilities of within-block and between-block edges

    • beta_bernoulli_alpha
    • beta_bernoulli_beta
    • beta_bernoulli_alpha_between
    • beta_bernoulli_beta_between
  • These hyperparameters govern the expected density of edges within and between clusters (see the sketch after this list).

  • Larger values of beta_bernoulli_alpha and smaller values of beta_bernoulli_beta encourage denser connections within clusters.

  • Conversely, smaller values of beta_bernoulli_alpha_between and larger values of beta_bernoulli_beta_between encourage sparser connections between clusters.

  • Recommended settings:

    • When clustering is expected:
      • beta_bernoulli_alpha = 8
      • beta_bernoulli_beta = 1
      • beta_bernoulli_alpha_between = 1
      • beta_bernoulli_beta_between = 8
    • When clustering is not expected:
      • beta_bernoulli_alpha = 1
      • beta_bernoulli_beta = 1
      • beta_bernoulli_alpha_between = 1
      • beta_bernoulli_beta_between = 8
  • The rate hyperparameter of the truncated Poisson prior on the number of clusters

    • lambda
    • Although lambda is not exactly the expected number of clusters, it tracks it closely (see also the sketch after this list). For example:
      • lambda = 1 → expected number of clusters ≈ 1
      • lambda = 1.59 → expected number of clusters ≈ 2
      • lambda = 2.82 → expected number of clusters ≈ 3
      • lambda = 3.92 → expected number of clusters ≈ 4
      • lambda = 4.96 → expected number of clusters ≈ 5
  • The concentration parameter of the Dirichlet distribution on the probabilities of allocating the nodes into clusters

    • dirichlet_alpha
    • This hyperparameter controls how balanced we expect cluster sizes to be.
      • Larger values encourage clusters of similar size.
      • Smaller values encourage uneven, potentially sparse cluster structures.
      • The default value of 1 corresponds to a neutral prior: all partitions of nodes are equally likely, without preferring equal or unequal clusters.
    • We recommend leaving this hyperparameter at the default value, unless there are strong theoretical reasons to do otherwise.

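To build intuition for these hyperparameter choices, the short sketch below computes a few implied prior quantities: the expected edge-inclusion probabilities under the recommended Beta settings, the expected number of clusters under the truncated Poisson prior (assuming the truncation excludes zero, consistent with the prior-odds formula used later in this tutorial), and the effect of dirichlet_alpha on cluster-size balance. The helper functions beta_mean, expected_clusters, and rdirichlet_one are our own illustrations, not part of bgms.

# Expected edge-inclusion probability under a Beta(alpha, beta) prior
beta_mean <- function(alpha, beta) alpha / (alpha + beta)
beta_mean(8, 1) # within clusters: ~0.89 (dense)
beta_mean(1, 8) # between clusters: ~0.11 (sparse)

# Expected number of clusters under a zero-truncated Poisson(lambda) prior
expected_clusters <- function(lambda) lambda / (1 - exp(-lambda))
round(expected_clusters(c(1.59, 2.82, 3.92, 4.96)), 2) # ~2, 3, 4, 5

# Effect of dirichlet_alpha on cluster-size balance, illustrated by sampling
# cluster proportions via the gamma representation of the Dirichlet
rdirichlet_one <- function(alpha, k) { g <- rgamma(k, alpha); g / sum(g) }
set.seed(1)
round(rdirichlet_one(alpha = 10, k = 3), 2)  # large alpha: similar sizes
round(rdirichlet_one(alpha = 0.5, k = 3), 2) # small alpha: uneven sizes
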
It is strongly recommended that researchers carefully examine their choice of hyperparameter values when conducting analyses and perform prior sensitivity checks. For further discussion and practical guidance, see, for example, Marsman et al. (2023), Huth et al. (2024), and Sekulovski et al. (2024).

In the code examples below, data is a placeholder for the actual dataset to be analyzed. This dataset should be an \(n \times p\) matrix or data frame, where \(n\) denotes the number of observations and \(p\) the number of variables to be included in the network. In the current implementation, the variables may be binary, ordinal, or a combination of both.
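
For readers who want to run the examples end to end, a toy placeholder can be simulated as follows. This is purely illustrative random data (independent binary responses), not a substitute for a real dataset:

set.seed(123)
# Hypothetical placeholder: 200 observations of 10 binary variables
data <- matrix(rbinom(200 * 10, size = 1, prob = 0.5),
               nrow = 200, dimnames = list(NULL, paste0("V", 1:10)))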

Suppose we have a dataset with 200 observations and 10 variables, and theoretical considerations suggest a two-cluster structure. In that case, the hyperparameters could be specified as follows:

  • beta_bernoulli_alpha = 8
  • beta_bernoulli_beta = 1
  • beta_bernoulli_alpha_between = 1
  • beta_bernoulli_beta_between = 8
  • lambda = 1.59

Clustering Analysis using the package bgms

We first need to make sure that the latest CRAN version of bgms is installed.

install.packages("bgms")
library(bgms)
?bgm # for more details on the function and its arguments
# run the model
fit <- bgm(data, edge_prior = "Stochastic-Block", 
           update_method = "adaptive-metropolis",
           beta_bernoulli_alpha = 8,
           beta_bernoulli_beta = 1,
           beta_bernoulli_alpha_between = 1,
           beta_bernoulli_beta_between = 8,
           lambda = 1.59)

First, we check the estimated posterior cluster allocations of the nodes. The output is a vector whose length equals the number of variables (columns) in the dataset. Two summaries are available: the posterior mean vector and the posterior mode vector. To examine these, we inspect the allocation vectors in the fit object, where the results of our analysis are stored.

cat("Posterior mean:\n")
print(fit$posterior_mean_allocations)

cat("\nPosterior mode:\n")
print(fit$posterior_mode_allocations)

We can also examine the estimated posterior probabilities for all possible numbers of clusters. In the software, these range from 1 (i.e., no clustering) to the total number of variables in the dataset (i.e., each variable forms its own cluster). These probabilities indicate the relative plausibility of each clustering solution given the data.

fit$posterior_num_blocks
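
These probabilities can also be visualized. Assuming, as in the extraction code below, that they are stored in the first column of posterior_num_blocks, a simple bar plot is:

barplot(fit$posterior_num_blocks[, 1],
        names.arg = seq_len(nrow(fit$posterior_num_blocks)),
        xlab = "Number of clusters (B)",
        ylab = "Posterior probability")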

The posterior probabilities are particularly useful for calculating the cluster Bayes factors, which are defined as follows:

\[\text{BF}_{10} = \underbrace{\frac{p(\mathcal{H}_1\mid \text{data})}{p(\mathcal{H}_0 \mid \text{data})}}_{\text{Posterior Odds}} \bigg/ \underbrace{\frac{p(\mathcal{H}_1)}{p(\mathcal{H}_0)}}_{\text{Prior Odds}}\]

Depending on the hypotheses, we have two types of Bayes factors:

  • The Bayes factor testing the hypothesis of clustering, i.e., \(\mathcal{H}_1: B > 1\), against the null hypothesis of no clustering, \(\mathcal{H}_0: B = 1\). For this, we first calculate the posterior odds, which is simply the ratio of the posterior probabilities of the two hypotheses. For \(\mathcal{H}_1: B > 1\), this means summing the posterior probabilities of all numbers of clusters greater than 1.
H1 <- unname(fit$posterior_num_blocks[-1, 1]) # posterior probabilities under H1 (B > 1)
H0 <- unname(fit$posterior_num_blocks[1, 1]) # posterior probability under H0 (B = 1)

posterior_odds <- sum(H1) / H0 

After calculating the posterior odds, we divide by the prior odds to obtain the Bayes factor. Under the truncated Poisson prior on the number of clusters (which excludes zero), \(p(B = 1) \propto \lambda e^{-\lambda}\) and \(p(B > 1) \propto 1 - e^{-\lambda} - \lambda e^{-\lambda}\), so the prior odds simplify to:

\[\frac{\exp(\lambda) - 1 - \lambda}{\lambda}.\]

We calculate the prior odds as follows:

lambda <- fit$arguments$lambda  # extract the value of lambda used 
prior_odds <- (exp(lambda) - 1 - lambda) / lambda

Finally, the Bayes factor is given by:

BF_10 <- posterior_odds / prior_odds
cat("Bayes factor in favor of H1:\n")
BF_10  
cat("Bayes factor in favor of H0:\n")
1/BF_10 # inverse of the Bayes factor in favor of H1
  • The Bayes factor testing the hypothesis of a specific number of clusters, i.e., \(\mathcal{H}_1: B = b_1\), against \(\mathcal{H}_2: B = b_2\). Say we wish to test the hypothesis of one cluster against the hypothesis of two clusters (i.e., \(b_1 = 1\) and \(b_2 = 2\)). For this, we follow steps similar to those above.
H1 <- unname(fit$posterior_num_blocks[1, 1]) # posterior probability under H1 (B = 1)
H2 <- unname(fit$posterior_num_blocks[2, 1]) # posterior probability under H2 (B = 2)
posterior_odds <- H1 / H2

The prior odds for this Bayes factor are given by: \[\lambda^{b_1 - b_2} \cdot \frac{b_2!}{b_1!}\]

prior_odds <- lambda^(1 - 2) * factorial(2) / factorial(1)

Finally, the Bayes factor is given by:

BF_12 <- posterior_odds / prior_odds
cat("Bayes factor in favor of H1:\n")
BF_12  
cat("Bayes factor in favor of H2:\n")
1/BF_12 
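
The two computations above can be wrapped in a small helper that returns this Bayes factor for any pair \((b_1, b_2)\). This is our own sketch, assuming (as above) that the posterior probabilities for \(B = 1, 2, \ldots\) sit in the first column of posterior_num_blocks:

cluster_bf_point <- function(fit, b1, b2) {
  post <- fit$posterior_num_blocks[, 1] # posterior probabilities for B = 1, 2, ...
  lambda <- fit$arguments$lambda # Poisson rate used when fitting the model
  posterior_odds <- post[b1] / post[b2]
  prior_odds <- lambda^(b1 - b2) * factorial(b2) / factorial(b1)
  unname(posterior_odds / prior_odds)
}
cluster_bf_point(fit, b1 = 1, b2 = 2) # reproduces BF_12 above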

Researchers can additionally examine the posterior co-clustering matrix, a symmetric matrix that contains, for each pair of variables, the proportion of posterior samples in which the two variables are assigned to the same cluster. This matrix offers an informative visual summary of clustering uncertainty. Diffuse or heterogeneous patterns suggest that certain variables frequently change cluster membership or do not have a well-defined allocation.

install.packages("pheatmap")
library(pheatmap)
pheatmap(fit$posterior_mean_coclustering_matrix, 
         cluster_rows = FALSE, 
         cluster_cols = FALSE,
         color = colorRampPalette(c("#B8860B", "#009E73"))(100))

Clustering Analysis using the package easybgm

Everything that has been illustrated using bgms can be more easily achieved using easybgm. We simply execute the analysis as follows:

install.packages("easybgm")
library(easybgm)

fit <- easybgm(data, 
               type = "ordinal", # this analysis only works with binary and/or ordinal data
               edge_prior = "Stochastic-Block",
               update_method = "adaptive-metropolis",
               beta_bernoulli_alpha = 8,
               beta_bernoulli_beta = 1,
               beta_bernoulli_alpha_between = 1,
               beta_bernoulli_beta_between = 8,
               lambda = 1.59)

There is no need to perform any post-processing, manually inspect the result objects, or calculate Bayes factors “by hand.” Instead, the summary function can be used to display all relevant output directly.

summary(fit)

Following the edge overview (for more details, see Huth et al. (2024)), the output from the summary function provides a clustering overview. This includes the posterior probabilities for all possible numbers of clusters, the estimated node memberships, and the Bayes factor comparing the hypothesis of clustering against the null hypothesis of no clustering.

In case researchers wish to calculate the second type of Bayes factor, following steps similar to those above, they can simply use the function clusterBayesfactor, setting the argument type = "point" and specifying the values for \(b_1\) and \(b_2\).

clusterBayesfactor(fit, type = "point", b1 = 1, b2 = 2)

The posterior co-clustering matrix can be plotted in the same way as shown above for bgms.

library(pheatmap)
pheatmap(fit$sbm$posterior_mean_coclustering_matrix, 
         cluster_rows = FALSE, 
         cluster_cols = FALSE,
         color = colorRampPalette(c("#B8860B", "#009E73"))(100))

References

Huth, K. B. S., Keetelaar, S., Sekulovski, N., Bergh, D. van den, & Marsman, M. (2024). Simplifying Bayesian analysis of graphical models for the social sciences with easybgm: A user-friendly R package. Advances.in/Psychology, e66366. https://doi.org/10.56296/aip00010
Marsman, M., Bergh, D. van den, & Haslbeck, J. M. B. (2025). Bayesian analysis of the ordinal Markov random field. Psychometrika. https://doi.org/10.1017/psy.2024.4
Marsman, M., Huth, K., Sekulovski, N., & Bergh, D. van den. (2023). bgms: Bayesian variable selection for networks of binary and/or ordinal variables. https://CRAN.R-project.org/package=bgms
Sekulovski, N., Keetelaar, S., Haslbeck, J. M. B., & Marsman, M. (2024). Sensitivity analysis of prior distributions in Bayesian graphical modeling: Guiding informed prior choices for conditional independence testing. Advances.in/Psychology. https://doi.org/10.56296/aip00016