Preface

Disclaimer

The opinions & statements expressed in this presentation and on the following slides are solely of the presenters, and not necessarily those of Novartis.

Learning Goals

After this course, you should:

  • Be familiar with brms syntax and workflow
  • Recognize its versatility for statistical modelling in drug development
  • Have hands-on experience with the package from two guided exercises

and of course:

  • Feel empowered to use brms going forward!

Housekeeping

Acknowledgements

  • Paul Buerkner (Technical University of Dortmund)
  • Andrew Bean (Novartis)
  • Björn Holzhauer (Novartis)
  • David Ohlssen (Novartis)
  • Cong Zhang (Novartis)

Tutorial Agenda

  1. Introduction: a brms modelling workflow
  2. Historical controls
  3. Priors
  4. Dose finding (nonlinear Emax models, model averaging)
  5. Mixed models with repeated measurements
  6. Longitudinal data

A brms modelling workflow

Bayesian inference

p(\theta|y) = \frac{p(y|\theta) \, p(\theta)}{p(y)}

  • Prior p(\theta)
  • Model likelihood p(y|\theta)
  • Marginal data likelihood p(y) = \int p(y|\theta) \, p(\theta) \, d\theta
  • Posterior p(\theta|y) is conditional on data y

Bayesian inference is integration

\hat{\theta} = E[\theta|y] = \int \theta \, p(\theta|y) \, d\theta

In practice solved by Markov-Chain Monte-Carlo (MCMC) in generating draws s=1, \ldots, S

\theta_s \sim p(\theta|y)

\int \theta \, p(\theta|y) \, d\theta \approx \frac{1}{S} \sum_{s=1}^S \theta_s

Stan project: Bayesian modelling platform

  • mc-stan.org
  • discourse.mc-stan.org
  • Models are written in the Stan probabilistic programming language for priors & model likelihood conditional on data
  • Hamiltonian Monte-Carlo No-U-turn sampler (NUTS)

data {
  int<lower=0> N;
  array[N] int<lower=0,upper=1> y;
}
parameters {
  real<lower=0,upper=1> theta;
}
model {
  theta ~ beta(1,1);
  y ~ bernoulli(theta);
}

brms: Bayesian regression modelling in Stan

https://paulbuerkner.com/brms/

  • Specify models via extended R formula syntax
  • Internally write Stan code that is readable yet fast
  • Provide an easy interface for defining priors & models
  • Facilitate post-processing

Some Highlights of brms

  • Flexible hierarchical (random effects) modeling
  • Both built-in and user-defined likelihoods
  • Explicit and implicit non-linear modeling
  • Distributional regression
  • Within-chain parallelization
  • Posterior and prior predictions
  • Model checking
  • Highly dense feature matrix

Model specification via formula

Varying intercept model for a single grouping factor:

formula <- y ~ 1 + x + (1 | g)

Varying intercept-slope model for a single grouping factor:

formula <- y ~ 1 + x + (1 + x | g)

Advanced non-linear terms such as Gaussian processes:

formula <- y ~ 1 + gp(x) + (1 + x | g)

… or approximate Gaussian processes:

formula <- y ~ 1 + gp(x, k=9) + (1 + x | g)

Model specification via formula

Linear formulas for multiple distributional parameters (e.g., predict mean and overdispersion of negative binomial):

formula <- bf(
  y ~ 1 + x + (1 | g) + ...,
  par2 ~ 1 + x + (1 | g) + ...,
  par3 ~ 1 + x + (1 | g) + ...,
)

Non-linear formula for a single distributional parameter:

formula <- bf(
  y ~ fun(x, nlpar1, nlpar2),
  nlpar1 ~ 1 + x + (1 | g) + ...,
  nlpar2 ~ 1 + (1 | g) + ...,
  nl = TRUE
)

Model likelihood via family

General structure:

family <- brmsfamily(
    family = "<family>", link = "<link>",
    more_link_arguments
)

Gaussian likelihood (default):

family <- brmsfamily(family = "gaussian", link = "identity",
                     link_sigma = "log")

Poisson likelihood:

family <- brmsfamily(family = "poisson", link = "log")

See also vignette("brms_families") for details on the families.

Part 1: historical control data

Use of historical control data in clinical trials

Goal: reduce control group sample size while maintaining power

  1. Collect historical data from relevant literature systematically

  2. Evaluate heterogeniety of historical data
    data quality, patient population, trial design

  3. Pre-specify trial protocol
    prior documentation, operating charachteristics

  4. Prior-data conflict?

The meta-analytic-predictive (MAP) prior1

  • Borrow from historical data under an exchangeability assumption
  • Discount/down-weight historical data through between-trial heterogeneity \tau
  • Meta-analytic-predictive (MAP) prior through generative nature of hierarchical models
  • Pre-requisites
    • essentially the same endpoint
    • similar treatment procedures
    • similar patient eligibility criteria
    • similar patient population characteristics

flowchart LR
    M((β,τ)) --> A
    M --> B
    M --> C
    M -->|prediction| T
    A((θ<sub>1</sub>)) --> YA[Y<sub>1</sub>]
    B((θ<sub>2</sub>)) --> YB[Y<sub>2</sub>]
    C((θ<sub>3</sub>)) --> YC[Y<sub>3</sub>]
    T((θ<sub>*</sub>)) -->|new trial| YT[Y<sub>*</sub>]

Example: double-blind PoC placebo controlled trial1

  • Endpoint binary ASAS20 response at week 6
    Improvement = higher rates

  • Historical control data
    8 studies with 513 patients

  • Bayesian design

    • P(\pi_t - \pi_p > 0 | y) > 0.95
    • Placebo: (“MAP prior” from historical data)
      • \pi_{p} \sim \text{Beta}(11,32)
    • Active:
      • \pi_{t} \sim \text{Beta}(0.5,1)
  • Randomization ratio 4:1 (24 vs 6)

MAP prior:

mean sd 2.5% 50% 97.5%
0.26 0.09 0.11 0.25 0.47

Generalized Meta-Analytic-Predictive model

Y is the (control) group summary data for H historical trials \begin{align*} Y_{h}|\theta_{h} &\sim f(\theta_{h}) & \forall \; h \in [1,H] \\ Y_\ast|\theta_\ast &\sim f(\theta_\ast) & \text{for new trial} \end{align*}

Exchangeability assumption: \begin{align*} g(\theta_{h})|\beta,\tau &\sim \text{Normal}(\beta, \tau^2) & \forall \; h \in [1,H] \\ g(\theta_\ast)|\beta,\tau &\sim \text{Normal}(\beta, \tau^2) & \text{for new trial} \end{align*}

  • f likelihood / g link function: Binomial/logit, Normal (known \sigma)/identity or Poisson/log
  • \tau \rightarrow 0 \Rightarrow pooling / unbound use of historical data
  • \tau \rightarrow \infty \Rightarrow stratification / no use of historical data

Running the MAP analysis in R

  • RBesT1
library(RBesT)
set.seed(98721487)
map_mc <- gMAP(cbind(r, n-r) ~ 1 | study,
               data=RBesT::AS,
               family=binomial,
               tau.dist="HalfNormal",
               tau.prior=1,
               beta.prior=2)
  • brms
library(brms)
# use of an outcome modifier to specify number of trials
# intercept study random effect requested with (1 | study)
bmap_model <- bf(r | trials(n) ~ 1 + (1 | study),
                 family=binomial, center=FALSE)
bmap_model_prior <- prior(normal(0, 2), class=b, coef=Intercept) +
    prior(normal(0, 1), class=sd, coef=Intercept, group=study)
bmap_mc <- brm(bmap_model,
               data=RBesT::AS,
               prior=bmap_model_prior,
               seed=98721487,
               control=list(adapt_delta=0.99)) # Stan sampling parameter
study n r
Study 1 107 23
Study 2 44 12
Study 3 51 19
Study 4 39 9
Study 5 139 39
Study 6 20 6
Study 7 78 9
Study 8 35 10

Summary MAP prior model

summary(bmap_mc)
 Family: binomial 
  Links: mu = logit 
Formula: r | trials(n) ~ 1 + (1 | study) 
   Data: RBesT::AS (Number of observations: 8) 

Multilevel Hyperparameters:
~study (Number of levels: 8) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     0.37      0.21     0.04     0.86 1.00     1104     1126

Regression Coefficients:
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept    -1.11      0.19    -1.48    -0.73 1.00     1581     1753
  • Intercept is the typical trial response rate on the logit scale
  • sd(Intercept) is the between-trial heterogeneity \tau

Predicting the placebo response rate in a new study

AS_new <- data.frame(study = "new_study", n = 1)
pe <- posterior_epred(
    bmap_mc, newdata = AS_new,
    allow_new_levels = TRUE, 
    sample_new_levels = "gaussian"
)
posterior_summary(pe)
     Estimate  Est.Error      Q2.5     Q97.5
[1,] 0.257566 0.08803654 0.1101559 0.4716962
  • allow_new_levels=TRUE allows to simulate the study random effect for a new (unseen) studies
  • sample_new_levels = "gaussian" ensures that the simulation is done using the normal hierarchical model (the default is to bootstrap existing levels)

Approximation with a finite mixture

For specifying the prior as part of a pre-specified analysis we choose a parametric approximation:

pe_mix <- automixfit(pe[, 1], type = "beta")
print(pe_mix, digits=3)
EM for Beta Mixture Model
Log-Likelihood = 4433.645

Univariate beta mixture
Mixture Components:
  comp1   comp2   comp3   comp4  
w   0.452   0.198   0.184   0.165
a  34.650  22.567  11.717   2.550
b 106.893  48.984  54.192   5.588
plot(pe_mix)$mix

Region specific MAP prior (hypothetically)

withr::with_seed(654873,
                 AS_region <- bind_cols(AS, region=sample(c("asia", "europe", "north_america"), 8, TRUE)))
kable(AS_region)
study n r region
Study 1 107 23 europe
Study 2 44 12 asia
Study 3 51 19 europe
Study 4 39 9 europe
Study 5 139 39 asia
Study 6 20 6 north_america
Study 7 78 9 europe
Study 8 35 10 asia

Region specific MAP Priors via varying regions model

form_AS_region <- bf(r | trials(n) ~ 1 + (1 | region/study), 
                     family = binomial, center=FALSE)

# show which priors brms defines for this model
get_prior(form_AS_region, AS_region)

bprior_AS_region <- prior(normal(0, 2), class=b, coef=Intercept) +
  prior(normal(0, 0.50), class=sd, coef=Intercept, group=region) +
  prior(normal(0, 0.25), class=sd, coef=Intercept, group=region:study)

fit_AS_region <- brm(
  form_AS_region, data = AS_region, prior = bprior_AS_region, seed = 29856341,
  control=list(adapt_delta=0.99))
  • (1 | region/study) specifies a nested random effect structure as in this example each study is run in a single region. It is equivalent to
    (1 | region) + (1 | region:study).

Summary region model

summary(fit_AS_region)
 Family: binomial 
  Links: mu = logit 
Formula: r | trials(n) ~ 1 + (1 | region/study) 
   Data: AS_region (Number of observations: 8) 

Multilevel Hyperparameters:
~region (Number of levels: 3) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     0.28      0.22     0.01     0.86 1.00     1605     1680

~region:study (Number of levels: 8) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     0.24      0.14     0.02     0.53 1.00     1648     1345

Regression Coefficients:
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept    -1.06      0.26    -1.57    -0.46 1.00     1643     1661

Extract MAP MCMC samples

AS_region_new <- data.frame(study = "new_study_asia", 
                            n = 1, region = "asia")

pe_region <- posterior_epred(
  fit_AS_region, newdata = AS_region_new, 
  allow_new_levels = TRUE, 
  sample_new_levels = "gaussian"
)
dim(pe_region)
[1] 4000    1
posterior_summary(pe_region)
      Estimate  Est.Error      Q2.5     Q97.5
[1,] 0.2691718 0.06397847 0.1549326 0.4190814

Obtain parametric MAP prior

pe_mix_region <- RBesT::automixfit(pe_region[, 1], type = "beta")
plot(pe_mix_region)$mix

Summary historical control data

  • Bayesian random-effects meta-analysis models can be used to derive Meta-Analytic-Predictive (MAP) priors
  • Predictions for new study mean inform the MAP prior
  • Specification and inference for MAP models is simple in brms
  • Not covered: Parametric MAP mixtures can be used as priors in brms using the RBesT::mixstanvar adapter

Hands-on exercises: historical control data

Priors

The prior can only be understood in the context of the model 1

Commonly heard reasons to be afraid of priors

“Priors are inherently subjective”

“Priors bias your analysis”

“You should let the data speak for themselves”

“Non-Bayesian models don’t have priors and don’t need them”

“I have no idea how to set appropriate priors”

Are priors inherently subjective?

Consider using the alternative terms “contextual” or “conditional”1

  • on the person or people or agent performing the analysis
  • on results of previous studies
  • on structural assumptions

Do priors bias your analysis?

Definition:

An estimator \hat{\theta} of the parameter \theta is called unbiased if \text{E}_{y}(\hat{\theta}) = \theta.

Corrollary:

If an estimator \hat{\theta} is unbiased for a given prior, it will be biased for most other priors.

Corrollary:

If an estimator \hat{\theta} is unbiased at a given scale, it will be biased for most non-linear transformations f(\hat{\theta}) (Jensen’s inequality): \text{E}_y(f(\hat{\theta})) \neq f(\theta)

Unbiasedness is fragile and often only holds in the limit

Is bias even the right metric?

A sampling distribution is a distribution of a data-dependent variable (a “statistic”) as implied by variation in data y across the true data generating process

Consider two estimators \hat{\theta}_1 and \hat{\theta}_2 of \theta with sampling distributions given by

  • p(\hat{\theta}_1) := p(\hat{\theta}_1(y)) = \text{normal}(\theta, 10)
  • p(\hat{\theta}_2) := p(\hat{\theta}_2(y)) = \text{normal}(\theta + 0.2, 1)

Which of them is better?

Winning the bias-variance trade-off

The RMSE can be defined as \begin{align*} \text{RMSE}_y(\hat{\theta}, \theta) & = \sqrt{\int (\hat{\theta}(y) - \theta)^2 \; p(y) \; d y} \\ & = \sqrt{\text{Var}_y{\hat{\theta}} + \text{Bias}_y(\hat{\theta}, \theta) ^2} \end{align*}

It expresses a trade-off between variance and bias

A lot of priors will help you win the bias-variance trade-off, not the race for minimal bias!

Non-Bayesian models don’t have priors?

Basic multilevel model with J groups:

y ~ normal(b0[j] + b1 * x, sigma)
b0[j] ~ normal(b0, sd0)

Is b0[j] ~ normal(b0, sd0) a likelihood, prior, or something else?

Some reasons for using priors

  • Make a-priori implausible values unlikely (weakly informative priors)
  • Incorporate specific expert information into the model (“subjective” priors)
  • Mimic frequentist methods (uninformative/“objective” priors)
  • Represent known data structure (multilevel priors)
  • Regularize the model to avoid overfitting (shrinkage/sparsifying priors)
  • Enable hypothesis testing via Bayes factors
  • Ensure unimodal posteriors
  • Facilitate convergence

Some observations about priors

Uniformity is informative

Prior tails matter

What is the posterior?

  • … if the prior is \theta \sim \text{student-t}(4, 5, 1)

  • … and the likelihood alone implies a posterior of \theta \sim \text{normal}(-5, 1)

Prior tails matter: Student-t(4) prior vs. normal likelihood

Prior tails matter: Normal prior vs. Student-t(4) likelihood

Prior tails matter: Normal prior vs. normal likelihood

Prior tails matter: Student-t(4) prior vs. Student-t(4) likelihood

But what now?

Principles of specifying priors

  • Respecting boundaries: use hard bounds in priors exactly where parameters have natural bounds

  • Expressiveness: use prior families flexible enough to express different plausible prior knowledge

  • Controlling complexity: If you want to be conservative, put relevant amount of prior probability at or near a conservative submodel

  • Scale awareness: ensure that priors take the scales of parameters into account

  • Reflecting structure: reflect the structure of the data in the structure of the prior

  • Data Informed: Use previous data to inform the current priors

Data informed priors

Part 2: Dose finding

Typical dose response shapes

Typical dose response shapes

Typical dose response shapes

Data Set PATHWAY

  • Placebo controlled trial
  • Treatment of severe asthma with tezepelumab
  • Three different doses + placebo
  • Endpoint: annualized rate of asthma exacerbations
  • Estimates per arm from negative binomial regression (like in “arm-based meta-analysis”), not individual patient data
dose group log_est log_stderr
0 placebo −0.400 0.103
70 tezepelumab 70 mg q4w −1.347 0.177
210 tezepelumab 210 mg q4w −1.661 0.222
560 tezepelumab 280 mg q2w −1.514 0.191

Sigmoid Emax Model

f(\text{dose}; \text{parameters}) = \text{E}_0 + \text{E}_\text{max} \frac{\text{dose}^h}{\text{dose}^h + \text{ED}_{50}^h}

Parameters:

  • \text{E}_0 \in \mathbb{R}: Expected placebo outcome
  • \text{E}_\text{max} \in \mathbb{R}: Maximum effect size
  • h \in \mathbb{R}_+: Hill (steepness) parameter
  • \text{ED}_{50}: Dose at which 50% of \text{E}_\text{max} is reached

Sigmoid Emax Model: Visualization

Specifying sigmoid Emax Model with brms

form_sig <- bf( 
  log_est | se(log_stderr) ~ E0 + Emax * dose^h / 
                             (dose^h + ED50^h),
  nlf(h ~ exp(logh)), nlf(ED50 ~ exp(logED50)),
  E0 ~ 1, Emax ~ 1, logh ~ 1, logED50 ~ 1,
  nl = TRUE,
  family = gaussian()
)

prior_sig <- prior(normal(0,1), nlpar=E0) +
  prior(normal(0,1), nlpar=logh) +
  prior(normal(0,1), nlpar=Emax) +
  prior(normal(4,2), nlpar=logED50)

A note on Emax Model priors

What is more informative?

  • \text{logh} \sim \text{normal}(0,1)
  • \text{logh} \sim \text{normal}(0,3)

A note on Emax Model priors

A note on Emax Model priors

  • Biologically plausible values: approx. .25 < h < 4 (see prior interval probabilities).
    The above prior puts roughly 1/3 above 4 and 1/3 below 0.25!

  • Re-writing the fraction to \left(\frac{d}{\text{ED}_{50}}\right)^h / \left(1+\left(\frac{d}{\text{ED}_{50}}\right)^h\right) makes it obvious that for h \rightarrow 0, this goes to 1/2, and for h \rightarrow \infty, either to 0 (d < \text{ED}_{50}) or 1 (d > \text{ED}_{50}).

Fitting the sigmoid Emax Model with brms

fit_sig = brm(
  formula = form_sig, 
  data = pathway, 
  prior = prior_sig,
  control = list(adapt_delta = 0.999)
) 

Sigmoid Emax Model: Results Summary

summary(fit_sig)
 Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: log_est | se(log_stderr) ~ E0 + Emax * dose^h/(dose^h + ED50^h) 
         h ~ exp(logh)
         ED50 ~ exp(logED50)
         E0 ~ 1
         Emax ~ 1
         logh ~ 1
         logED50 ~ 1
   Data: pathway (Number of observations: 4) 

Regression Coefficients:
                  Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
E0_Intercept         -0.42      0.10    -0.61    -0.21 1.00     2065     2179
Emax_Intercept       -1.30      0.32    -2.11    -0.84 1.00     1172     1199
logh_Intercept       -0.08      0.98    -1.90     1.92 1.00     1306     1914
logED50_Intercept     2.73      1.38    -0.27     5.32 1.00     1341     1273

Further Distributional Parameters:
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     0.00      0.00     0.00     0.00   NA       NA       NA

Visualizing the Fitted Sigmoid Emax Model

tibble(dose = seq(0, 560, 1), log_stderr=1) |>
  tidybayes::add_epred_rvars(object=fit_sig) |>
  (\(x) x |> 
     left_join(x |> filter(dose==0) |> rename(pbo = .epred) |> dplyr::select(-dose), 
               by="log_stderr"))() |>
  mutate(.delta = .epred - pbo) |>
  ggplot(aes(x=dose, ydist=.delta)) +
  ggdist::stat_lineribbon()

Modified Beta Model

f(\text{dose}; \text{parameters}) = \text{E}_0 + \text{E}_\text{max} \; \frac{(\delta_1 + \delta_2)^{(\delta_1+\delta_2)}} {\delta_1^{\delta_1} \, \delta_2^{\delta_2}} \; \left(\frac{\text{dose}}{S}\right)^{\delta_1} \left(1-\frac{\text{dose}}{S}\right)^{\delta_2}

Parameters:

  • \text{E}_0 \in \mathbb{R}: Expected placebo response
  • \text{E}_\text{max} \in \mathbb{R}: Maximum effect size
  • \delta_1, \delta_2 \in \mathbb{R}_+: Shape parameters
  • S: constant > maximum dose, e.g. 1.5 \times \max(\text{dose}), here we choose S=850

Specifying the Modified Beta Model with brms

form_mbeta <- bf( 
  log_est | se(log_stderr) ~ E0 +   
    Emax * (delta1+delta2)^(delta1+delta2) /
    (delta1^delta1 * delta2^delta2) * 
    (dose/850)^delta1 * (1-dose/850)^delta2,
  nlf(delta1 ~ exp(logdelta1)), nlf(delta2 ~ exp(logdelta2)),
  E0 ~ 1, Emax ~ 1, logdelta1 ~ 1, logdelta2 ~ 1,
  nl = TRUE,
  family = gaussian()
)

prior_mbeta <- prior(normal(0,1), nlpar="E0") +
  prior(normal(0,1), nlpar="Emax") +
  prior(normal(0,1), nlpar="logdelta1") +
  prior(normal(0,1), nlpar="logdelta2")

Fitting the Modified Beta Model with brms

fit_mbeta <- brm(
  form_mbeta, 
  data = pathway, 
  prior = prior_mbeta,
  control = list(adapt_delta = 0.999)
)

Visualizing the Fitted Modified Beta Model

Model Evaluation (failed attempt)

(loo_mbeta <- loo(fit_mbeta))

Computed from 4000 by 4 log-likelihood matrix.

         Estimate  SE
elpd_loo      0.1 0.4
p_loo         2.2 0.6
looic        -0.2 0.7
------
MCSE of elpd_loo is NA.
MCSE and ESS estimates assume MCMC draws (r_eff in [0.4, 0.6]).

Pareto k diagnostic values:
                         Count Pct.    Min. ESS
(-Inf, 0.7]   (good)     1     25.0%   1282    
   (0.7, 1]   (bad)      3     75.0%   <NA>    
   (1, Inf)   (very bad) 0      0.0%   <NA>    
See help('pareto-k-diagnostic') for details.
  • brms interfaces with the loo R package that performs efficient approximate leave-one-out cross-validation (LOO) for Bayesian models by Pareto smoothed importance sampling (PSIS). See Vethari et al. (2017) and the loo documentation on CRAN.

Model Evaluation (works)

loo_exact_sig <- kfold(fit_sig, folds = "loo", silent=2)
(loo_exact_mbeta <- kfold(fit_mbeta, folds = "loo", silent=2)) 

Based on 4-fold cross-validation.

           Estimate  SE
elpd_kfold     -2.0 1.6
p_kfold         4.3 2.2
kfoldic         4.0 3.3
  • elpd_kfold: CV estimate of the expected log pointwise predictive density for a new dataset
  • p_kfold: CV estimate of difference between elpd_loo and the non-cross-validated log posterior predictive density (describes how much more difficult it is to predict future data than the observed data).
  • kfoldic: -2 * elpd_kfold (on the conventional scale of deviance)

Model Comparison

loo_compare(loo_exact_sig, loo_exact_mbeta) 
          elpd_diff se_diff
fit_sig    0.0       0.0   
fit_mbeta -1.7       0.9   
  • loo_compare computes a paired estimate of the difference in ELPD to take advantage of the fact that the same set of data points was used to fit both models (and reduce the standard error). See Vethari et al. (2017) and the loo R package manual for details.
fit_sig$criteria$loo <- loo_exact_sig
fit_mbeta$criteria$loo <- loo_exact_mbeta
(w_dose <- model_weights(fit_sig, fit_mbeta, weights = "loo")) 
  fit_sig fit_mbeta 
 0.841057  0.158943 
  • Here, model_weights from brms will compute Akaike weights based on the information criterion values returned by loo

Bayesian Model Averaging

pe_sig <- posterior_epred(fit_sig, newdata = dose_df)
pe_mbeta <- posterior_epred(fit_mbeta, newdata = dose_df)
pe_avg <- pe_sig * w_dose[1] + pe_mbeta * w_dose[2]  
pe_avg <- pe_avg |>
  posterior_summary() |>
  as.data.frame() |> 
  bind_cols(dose_df) 
    Estimate  Est.Error       Q2.5      Q97.5      dose
1 -0.4167663 0.08680825 -0.5816652 -0.2445850  0.000000
2 -0.8312366 0.25879835 -1.3256556 -0.3984107  5.656566
3 -0.9563272 0.25696458 -1.3977850 -0.4571903 11.313131
4 -1.0409796 0.24418232 -1.4400838 -0.5073584 16.969697

Visualizing the Model Averaging

Hands-on exercises: dose finding

Part 3: Bayesian Mixed Models for Repeated Measures (MMRM)

Hypothetical dose finding study, N=200

Continuous outcome measured at baseline & weeks 2, 4, 8, 12

Overview and Analysis Goals

  • Mixed Effects Model for Repeated Measures (MMRM) commonly used for longitudinal data (each patient measured at multiple visits)

  • Direct likelihood analysis that can address hypothetical estimand of all patients completing the study on treatment without doing missing data imputation first

  • Commonly no structure assumed for correlations between visits and different variance allowed for different visits (to avoid unnecessary assumptions)

  • Convergence issues with REML fit common, especially for small sample sizes, which is alleviated by Bayesian inference with (weakly-)informative priors

  • Bayesian approach allows us to incorporate prior information and historical data, which is very interesting for Phase I studies

  • brms lets us easily add more components & structure to the model

What do our data look like?

Analysis Data Model (ADaM) Basic Data Structure (BDS)

USUBJID TRT01P AVISIT ADY AVAL CHG BASE
3 0 visit1 14 1.18 0.40 0.78
3 0 visit2 28 0.87 0.10 0.78
3 0 visit3 56 −0.81 −1.59 0.78
3 0 visit4 84 0.81 0.04 0.78
9 10 visit1 14 0.95 0.84 0.11
9 10 visit2 28 1.46 1.35 0.11
9 10 visit3 56 −0.97 −1.08 0.11
9 10 visit4 84 1.21 1.10 0.11
13 20 visit1 14 2.38 0.48 1.90
13 20 visit2 28 2.13 0.23 1.90

Patients can miss visits or drop-out of the study prior to the final visit.

Model Specification: Formal

Formally, let us assume that there are V visits. We assume that the V-dimensional response \boldsymbol{Y}_i for patient i satisfies \boldsymbol{Y}_i = \boldsymbol{X}_i \boldsymbol{\beta} + \boldsymbol{Z}_i \boldsymbol{b}_i + \boldsymbol{\epsilon}_i

with \boldsymbol{b}_i \sim \text{MVN}(\boldsymbol{0}, \boldsymbol{D}) and \boldsymbol{\epsilon}_i \sim \text{MVN}(\boldsymbol{0}, \boldsymbol{\Sigma}), where \boldsymbol{\Sigma} is a diagonal matrix. This implies \boldsymbol{Y}_i \sim \text{MVN}(\boldsymbol{X}_i \boldsymbol{\beta}, \boldsymbol{V}_i), where \boldsymbol{V}_i = \boldsymbol{Z}_i \boldsymbol{D} \boldsymbol{Z}_i^T + \boldsymbol{\Sigma}. Model the correlated Y_{ij} either by

  1. marginalizing out random effects & account for them with correlated residual errors (residual covariance matrix V_i), or
  2. conditionally on (V-dimensional) random effects \boldsymbol{b}_i with residual errors \boldsymbol{\epsilon}_i independent (once we condition on \boldsymbol{b}_i).

MMRMs in SAS

Widely-used high-quality reference implementation

PROC MIXED DATA=simulated_data;
  CLASS TRT01P AVISIT USUBJID;
  MODEL CHG ~ TRT01P AVISIT BASE TRT01P*AVISIT AVISIT*BASE 
    / SOLUTION DDFM=KR ALPHA = 0.05;
  REPEATED AVISIT / TYPE=UN SUBJECT = USUBJID R Rcorr GROUP=TRT01P;
  LSMEANS TRT01P*AVISIT / DIFFS PDIFF CL OM E;
RUN;

MMRMs with the mmrm R package

Fit the model in R in a frequentist framework

library(mmrm)
mmrm_fit <- mmrm(
  formula = CHG ~ TRT01P + AVISIT + BASE + AVISIT:TRT01P + 
    AVISIT:BASE + us(AVISIT | TRT01P / USUBJID),
  method = "Kenward-Roger", 
  vcov = "Kenward-Roger-Linear", # to match SAS
  data = mutate(simulated_data, USUBJID=factor(USUBJID))
)

Refer to https://openpharma.github.io/mmrm.

Motivation Forward Difference Parametrization

We would like to put priors on the differences from visit to visit

Difference contrasts in R

Setup forward difference contrasts for changes between visits:

contrasts(simulated_data$AVISIT) <- MASS::contr.sdif

Hard to interpret the contrast matrix directly:

contrasts(simulated_data$AVISIT) |> MASS::fractions()
       visit2-visit1 visit3-visit2 visit4-visit3
visit1 -3/4          -1/2          -1/4         
visit2  1/4          -1/2          -1/4         
visit3  1/4           1/2          -1/4         
visit4  1/4           1/2           3/4         

Learn more about contrasts: https://bbolker.github.io/mixedmodels-misc/notes/contrasts.pdf

What do these contrasts mean?

Inverting the contrast matrix reveals the dummy variables’ interpretation:

# add the intercept
cmat <- cbind("Intercept" = 1, contrasts(simulated_data$AVISIT))
# compute the inverse matrix
solve(cmat) |> MASS::fractions()
              visit1 visit2 visit3 visit4
Intercept     1/4    1/4    1/4    1/4   
visit2-visit1  -1      1      0      0   
visit3-visit2   0     -1      1      0   
visit4-visit3   0      0     -1      1   

MMRMs in brms

mmrm_model1 <- bf(
  CHG ~ 1 + AVISIT + BASE + BASE:AVISIT + TRT01P + TRT01P:AVISIT 
    + unstr(time = AVISIT, gr = USUBJID),
  sigma ~ 1 + AVISIT + TRT01P + AVISIT:TRT01P
)
mmrm_prior1 <- prior(normal(0, 2), class=Intercept) +
    prior(normal(0, 1), class=b) +
    prior(normal(0, log(10.0)/1.64), class=Intercept, dpar=sigma) +
    prior(normal(0, log(2.0)/1.64), class=b, dpar=sigma) +  
    prior(lkj(1), class=cortime)
fit_mmrm1 <- brm(
  formula = mmrm_model1, data = simulated_data, prior = mmrm_prior1, ...
)

Note that a single corrleation matrix is estimated while the covariance varies as per the sigma model.

Expected marginal (least-squares) means

emm2 <- emmeans(fit_mmrm1, ~ TRT01P | AVISIT, weights="proportional")
AVISIT = visit1:
 TRT01P  emmean lower.HPD upper.HPD
 0      -0.0374   -0.2637    0.1998
 10     -0.1663   -0.3601    0.0434
 20      0.0597   -0.1276    0.2505
 40     -0.1128   -0.2838    0.0793

AVISIT = visit4:
 TRT01P  emmean lower.HPD upper.HPD
 0      -0.2937   -0.6172   -0.0181
 10      0.2062   -0.0653    0.5070
 20      0.4645    0.2035    0.7719
 40      0.2955    0.0408    0.5411

Point estimate displayed: median 
HPD interval probability: 0.95 

To work with MCMC samples of the expected marginal means & to summarize them exactly as we want (e.g. quantile credible intervals instead of highest posterior density intervals) we can use:

emm2 |> as.mcmc() |> summarize_draws()

Expected Marginal Contrasts per Visit

contrast(emm2, adjust="none", method="trt.vs.ctrl", ref="TRT01P0")
AVISIT = visit1:
 contrast           estimate lower.HPD upper.HPD
 TRT01P10 - TRT01P0  -0.1267   -0.4490     0.170
 TRT01P20 - TRT01P0   0.0959   -0.2036     0.396
 TRT01P40 - TRT01P0  -0.0765   -0.3619     0.222

AVISIT = visit4:
 contrast           estimate lower.HPD upper.HPD
 TRT01P10 - TRT01P0   0.5024    0.0925     0.922
 TRT01P20 - TRT01P0   0.7598    0.3540     1.155
 TRT01P40 - TRT01P0   0.5932    0.2090     0.969

Point estimate displayed: median 
HPD interval probability: 0.95 

as.mcmc() to work with MCMC samples of the difference, e.g.

contrast(emm2, adjust="none", method="trt.vs.ctrl", ref="TRT01P0") |> as.mcmc(()

Monotonic Effects Across Ordered Factor Levels

mmrm_model2 <- bf( 
  CHG ~ 1 + AVISIT + mo(TRT01P) + BASE + mo(TRT01P):AVISIT 
    + BASE:AVISIT + unstr(time = AVISIT, gr = USUBJID),
  sigma ~1 + AVISIT + mo(TRT01P) + mo(TRT01P):AVISIT
)

For category c=0,\ldots,(\text{categories}-1) = C-1, the monotonic term is \text{coefficient} \times (C-1) \times \sum_{k=1}^{c} \zeta_k

  • \zeta_k \in [0, 1] is the percent increase and is subject to the constraint \sum_{k=1}^{C-1} \zeta_k = 1

  • \text{coefficient} is the mean change between adjacent categories

For more details see the respective vignette:
https://paulbuerkner.com/brms/articles/brms_monotonic.html

fit_mmrm2 <- brm(formula = mmrm_model2,
                 data = simulated_data |> mutate(TRT01P=ordered(TRT01P)), prior = mmrm_prior1, ...)

Results from different MMRMs

MMRMs: Outlook

In the online case study on http://opensource.nibr.com/bamdd you additionally find:

  • Data and full code
  • More on estimands, parametrization, contrasts & priors
  • Estimating average differences across visits
  • Meta-analytic combined (MAC) using historical data
  • Robustifying MAC via a “slab-and-spike”-type prior
  • Non-linear functions over time & doses in MMRMs
  • Excercises

Part 4: Longitudinal data

Longitudinal Data: Background

  • Example: simulated study of a drug for the treatment of Psoriasis
  • 112 patients, 59 randomized to placebo, 53 to drug.
  • Efficacy was assessed using the Psoriasis Area and Severity Index (PASI), a numerical score which measures the severity and extent of psoriasis (0: no disease, 72: maximum).
  • Assessed at baseline and 7 post-baseline timepoints.

Longitudinal Data: Illustration

Longitudinal Data: Key Variables

# A tibble: 10 × 6
      CHG  BASE TRT01P AVISIT  AVISITN SUBJID
    <dbl> <dbl> <fct>  <fct>     <dbl>  <dbl>
 1 -2.2    16   PBO    Week 1        1      1
 2 -0.5    16   PBO    Week 2        2      1
 3  4.7    16   PBO    Week 4        4      1
 4  7.3    16   PBO    Week 6        6      1
 5 32.1    16   PBO    Week 8        8      1
 6 10      16   PBO    Week 10      10      1
 7 20.7    16   PBO    Week 12      12      1
 8  3.4    20.3 TRT    Week 1        1      2
 9  1.2    20.3 TRT    Week 2        2      2
10  0.600  20.3 TRT    Week 4        4      2

Longitudinal Data: Mean Model

fit_long_mean <- brm(
  CHG ~ BASE + TRT01P * AVISIT + (1 | SUBJID), 
  data = pasi_data, seed = 2454
)
fit_long_mean <- add_criterion(fit_long_mean, "loo")

Total elapsed fitting time (seconds):

sum(rstan::get_elapsed_time(fit_long_mean$fit))
[1] 3.652

Mean Model: Posterior predictive checks

pp_check(fit_long_mean)
  • pp_check in brms performs posterior predictive checks with the help of the bayesplot package (shows dens_overlay by default).
    • Can also be called with type loo_pit (probability integral transformation). In an ideal model fit, the resulting PIT-transformed values would be uniformly distributed: i.e. the blue points and dashed lines would align with the distribution function of a Uniform(0,1) variable (solid line). See chapter 12 of BAMDD for details.

Mean Model: LOO-CV results

loo(fit_long_mean)

Computed from 4000 by 684 log-likelihood matrix.

         Estimate   SE
elpd_loo  -2286.3 33.1
p_loo       102.3  9.4
looic      4572.6 66.3
------
MCSE of elpd_loo is NA.
MCSE and ESS estimates assume MCMC draws (r_eff in [0.4, 1.9]).

Pareto k diagnostic values:
                         Count Pct.    Min. ESS
(-Inf, 0.7]   (good)     681   99.6%   80      
   (0.7, 1]   (bad)        3    0.4%   <NA>    
   (1, Inf)   (very bad)   0    0.0%   <NA>    
See help('pareto-k-diagnostic') for details.
  • While this is quick, is this what we are interested in?
    • If we’re interested in the prediction for a new (unseen) patient, we should be running leave-one-patient-out CV (not leave-one-observation-out)…
      See kfold_split_grouped in the loo package.
    • For the Emax model data: not possible (IPD needed!)

Mean Model: Visualization of effects

conditional_effects(fit_long_mean, "AVISIT:TRT01P")
  • By default, numeric variables will be conditionalized by using their means and factors will get their first level assigned.
  • We might be interested in the effect integrated out over a population \rightarrow emmeans

Longitudinal Data: Linear Model

fit_long_lin <- brm(
  CHG ~ BASE + TRT01P * AVISITN + (1 | SUBJID), 
  data = pasi_data, seed = 2454
)
fit_long_lin <- add_criterion(fit_long_lin, "loo")

Total elapsed fitting time:

sum(rstan::get_elapsed_time(fit_long_lin$fit))
[1] 3.111

Linear Model: Summary

summary(fit_long_lin)
 Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: CHG ~ BASE + TRT01P * AVISITN + (1 | SUBJID) 
   Data: pasi_data (Number of observations: 684) 

Multilevel Hyperparameters:
~SUBJID (Number of levels: 112) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     5.23      0.45     4.41     6.17 1.00     1493     2463

Regression Coefficients:
                  Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept            12.99      1.83     9.42    16.69 1.00     1275     2212
BASE                 -0.70      0.08    -0.85    -0.54 1.00     1282     2047
TRT01PTRT            -5.96      1.35    -8.57    -3.34 1.00     1550     1953
AVISITN              -0.22      0.09    -0.39    -0.05 1.00     3625     3546
TRT01PTRT:AVISITN    -0.72      0.13    -0.96    -0.47 1.00     3421     3194

Further Distributional Parameters:
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     6.51      0.20     6.14     6.92 1.00     4713     2555

Linear Model: Visualization of effects

conditional_effects(fit_long_lin, "AVISITN:TRT01P")

Longitudinal Data: Linear Model with Varying Slopes

fit_long_lin_vs <- brm(
  CHG ~ BASE + TRT01P * AVISITN + (1 + AVISITN | SUBJID),
  data = pasi_data, seed = 2454
)
fit_long_lin_vs <- add_criterion(fit_long_lin_vs, "loo")

Total elapsed fitting time:

sum(rstan::get_elapsed_time(fit_long_lin_vs$fit))
[1] 8.545

Linear Model: Summary

summary(fit_long_lin_vs)
 Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: CHG ~ BASE + TRT01P * AVISITN + (1 + AVISITN | SUBJID) 
   Data: pasi_data (Number of observations: 684) 

Multilevel Hyperparameters:
~SUBJID (Number of levels: 112) 
                       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)              5.26      0.54     4.25     6.35 1.00     1578     2512
sd(AVISITN)                1.10      0.09     0.94     1.30 1.01      519     1008
cor(Intercept,AVISITN)    -0.53      0.09    -0.69    -0.34 1.01      402     1128

Regression Coefficients:
                  Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept             9.21      1.71     5.85    12.47 1.00      954     1632
BASE                 -0.49      0.08    -0.64    -0.35 1.00      997     1908
TRT01PTRT            -6.47      1.18    -8.74    -4.19 1.00     1783     2657
AVISITN              -0.25      0.16    -0.56     0.06 1.00      915     1985
TRT01PTRT:AVISITN    -0.70      0.23    -1.14    -0.26 1.00      760     1626

Further Distributional Parameters:
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     4.67      0.15     4.39     4.96 1.00     3107     3238

Linear Model with Varying Slopes: Visualization of effects

conditional_effects(fit_long_lin_vs, "AVISITN:TRT01P")

Model Comparison

loo_compare(fit_long_lin, fit_long_lin_vs)
                elpd_diff se_diff
fit_long_lin_vs    0.0       0.0 
fit_long_lin    -173.7      20.3 

Longitudinal Data: Monotonic Model

fit_long_mo <- brm(
  CHG ~ BASE + TRT01P * mo(AVISITN) + 
    (1 + mo(AVISITN) | SUBJID),
  data = pasi_data, seed = 24543
)
fit_long_mo <- add_criterion(fit_long_mo, "loo")

Total elapsed fitting time:

sum(rstan::get_elapsed_time(fit_long_mo$fit))
[1] 30.034

Monotonic Model: Visualization of effects

conditional_effects(fit_long_mo, "AVISITN:TRT01P")

Model Comparison

loo_compare(fit_long_lin_vs, fit_long_mo)
                elpd_diff se_diff
fit_long_mo       0.0       0.0  
fit_long_lin_vs -92.5      11.8  

Longitudinal Data: Quadratic Model

fit_long_quad <- brm(
  CHG ~ BASE + TRT01P * (AVISITN + I(AVISITN^2)) + 
    (1 + AVISITN + I(AVISITN^2) | SUBJID),
  data = pasi_data, seed = 245124
)
fit_long_quad <- add_criterion(fit_long_quad, "loo")

Total elapsed fitting time:

sum(rstan::get_elapsed_time(fit_long_quad$fit))
[1] 42.86

Quadratic Model: Visualization of effects

conditional_effects(fit_long_quad, "AVISITN:TRT01P")

Longitudinal Data: Centering the time variable

Non-centered variables are often highly correlated with their square

cor(pasi_data$AVISITN, pasi_data$AVISITN^2)
[1] 0.9734303

Centering can be done around any central value. Here we use 6 because this is the average week in the study

pasi_data <- pasi_data |> 
  mutate(AVISITN_c = AVISITN - 6)

After centering, the correlation is drastically reduced

cor(pasi_data$AVISITN_c, pasi_data$AVISITN_c^2)
[1] 0.2548532

Longitudinal Data: Centered Quadratic model

fit_long_quad_c <- brm(
  CHG ~ BASE + TRT01P * (AVISITN_c + I(AVISITN_c^2)) + 
    (1 + AVISITN_c + I(AVISITN_c^2) | SUBJID),
  data = pasi_data, seed = 24547
)
fit_long_quad_c <- add_criterion(fit_long_quad_c, "loo")

Total elapsed fitting time:

sum(rstan::get_elapsed_time(fit_long_quad_c$fit))
[1] 14.114

Centered Quadratic Model: Visualization of effects

conditional_effects(fit_long_quad_c, "AVISITN_c:TRT01P")

Model Comparison

loo_compare(fit_long_mo, fit_long_quad_c)
                elpd_diff se_diff
fit_long_mo       0.0       0.0  
fit_long_quad_c -20.3       9.9  

Longitudinal Data: Gaussian Process Model

fit_long_gp <- brm(
  CHG ~ BASE + gp(AVISITN_c, by = TRT01P, k=9) + 
    (1 + (AVISITN_c + I(AVISITN_c^2)) | SUBJID),
  data = pasi_data, seed = 32454,
  control = list(adapt_delta = 0.95)
)
fit_long_gp <- add_criterion(fit_long_gp, "loo")

Total elapsed fitting time:

sum(rstan::get_elapsed_time(fit_long_gp$fit))
[1] 82.378

Gaussian Process Model: LOO-CV results

loo(fit_long_gp)

Computed from 4000 by 684 log-likelihood matrix.

         Estimate   SE
elpd_loo  -2045.3 35.7
p_loo       194.4 16.1
looic      4090.7 71.4
------
MCSE of elpd_loo is NA.
MCSE and ESS estimates assume MCMC draws (r_eff in [0.3, 1.8]).

Pareto k diagnostic values:
                         Count Pct.    Min. ESS
(-Inf, 0.7]   (good)     642   93.9%   106     
   (0.7, 1]   (bad)       36    5.3%   <NA>    
   (1, Inf)   (very bad)   6    0.9%   <NA>    
See help('pareto-k-diagnostic') for details.

Gaussian Process Model: Visualization of effects

conditional_effects(fit_long_gp, "AVISITN_c:TRT01P")

Model Comparison

loo_compare(fit_long_quad_c, fit_long_gp)
                elpd_diff se_diff
fit_long_gp       0.0       0.0  
fit_long_quad_c -12.8       6.6  

Recall that model comparisons may not be trustworthy.

Longitudinal Data: Gaussian Process AR1 Model

fit_long_gp_ar <- brm(
  CHG ~ BASE + gp(AVISITN_c, by = TRT01P, k=9) + 
    ar(AVISITN_c, gr = SUBJID, p = 1) +
    (1 + (AVISITN_c + I(AVISITN_c^2)) | SUBJID),
  data = pasi_data, seed = 24235,
  iter = 4000, warmup = 1000,
  control = list(adapt_delta = 0.95)
)
fit_long_gp_ar <- add_criterion(fit_long_gp_ar, "loo")

Total elapsed fitting time:

sum(rstan::get_elapsed_time(fit_long_gp_ar$fit))
[1] 172.381

Gaussian Process AR1 Model: Autocorrelation results

Some autocorrelation may be present:

posterior_summary(fit_long_gp_ar, variable = "ar") |>
  round(3)
      Estimate Est.Error  Q2.5 Q97.5
ar[1]    0.379     0.121 0.157 0.625

A more formal model comparison may be theoretically advantageous but given that we add only a single parameter (the AR1 effect), we should probably gain more than we loose by including it.

Gaussian Process AR1 Model: Visualization of effects

conditional_effects(fit_long_gp_ar, "AVISITN_c:TRT01P")

Course wrap-up

Summary

  • Diverse opportunities for applied modelling to inform good drug-development decisions
  • Bayesian paradigm is well suited for many of these situations
    • Availability of meaningful prior information
    • Desire for probabilistically interpretable statements about unknowns and future observable quantities
  • brms is a powerful and highly flexible engine for applied modelling , facilitating straightforward model specification and inference

Looking ahead

  • We hope you have:
    • Become familiar with brms syntax and workflow
    • Seen its versatility for statistical modelling in drug development
    • Gained hands-on experience with the package from guided exercises
  • And that you feel empowered to use brms going forward!

Resources

Thank you