Hierarchical Models in Preclincal Research

Preface

Disclaimer

The opinions & statements expressed in this presentation and on the following slides are solely of the presenters, and not necessarily those of Novartis.

Learning Goals

After this course, you should:

Be familiar with brms syntax and workflow
Recognize its versatility for statistical modelling in drug development
Have hands-on experience with the package from two guided exercises

and of course:

Feel empowered to use brms going forward!

Housekeeping

Q&A: you may raise your hand at any time, or hold for Q&A sessions at the end of each section
Laptop charging: we recommend conserving battery by keeping your laptop powered down except during the hands-on exercises
For hands-on exercises, we will use the Göttingen RStudio server: http://134.76.20.1:8787
- More instructions to come when we begin the first exercise
Online case study library: https://opensource.nibr.com/bamdd
Slides & exercises for workshop
Course materials will also be uploaded to: https://github.com/Novartis/bamdd/tree/main/workshops/goettingen2024

Acknowledgements

Paul Buerkner (Technical University of Dortmund)
Andrew Bean (Novartis)
Björn Holzhauer (Novartis)
David Ohlssen (Novartis)
Cong Zhang (Novartis)

Tutorial Agenda

Introduction: a brms modelling workflow
Historical controls
Priors
Dose finding (nonlinear E_max models, model averaging)
Mixed models with repeated measurements
Longitudinal data

A `brms` modelling workflow

Bayesian inference

p(\theta|y) = \frac{p(y|\theta) \, p(\theta)}{p(y)}

Prior p(\theta)
Model likelihood p(y|\theta)
Marginal data likelihood p(y) = \int p(y|\theta) \, p(\theta) \, d\theta
Posterior p(\theta|y) is conditional on data y

Bayesian inference is integration

\hat{\theta} = E[\theta|y] = \int \theta \, p(\theta|y) \, d\theta

In practice solved by Markov-Chain Monte-Carlo (MCMC) in generating draws s=1, \ldots, S

\theta_s \sim p(\theta|y)

\int \theta \, p(\theta|y) \, d\theta \approx \frac{1}{S} \sum_{s=1}^S \theta_s

Stan project: Bayesian modelling platform

mc-stan.org
discourse.mc-stan.org
Models are written in the Stan probabilistic programming language for priors & model likelihood conditional on data
Hamiltonian Monte-Carlo No-U-turn sampler (NUTS)

data {
  int<lower=0> N;
  array[N] int<lower=0,upper=1> y;
}
parameters {
  real<lower=0,upper=1> theta;
}
model {
  theta ~ beta(1,1);
  y ~ bernoulli(theta);
}

brms: Bayesian regression modelling in Stan

https://paulbuerkner.com/brms/

Specify models via extended R formula syntax
Internally write Stan code that is readable yet fast
Provide an easy interface for defining priors & models
Facilitate post-processing

brms

Some Highlights of brms

Flexible hierarchical (random effects) modeling
Both built-in and user-defined likelihoods
Explicit and implicit non-linear modeling
Distributional regression
Within-chain parallelization
Posterior and prior predictions
Model checking
Highly dense feature matrix

Model specification via `formula`

Varying intercept model for a single grouping factor:

formula <- y ~ 1 + x + (1 | g)

Varying intercept-slope model for a single grouping factor:

formula <- y ~ 1 + x + (1 + x | g)

Advanced non-linear terms such as Gaussian processes:

formula <- y ~ 1 + gp(x) + (1 + x | g)

… or approximate Gaussian processes:

formula <- y ~ 1 + gp(x, k=9) + (1 + x | g)

Model specification via `formula`

Linear formulas for multiple distributional parameters (e.g., predict mean and overdispersion of negative binomial):

formula <- bf(
  y ~ 1 + x + (1 | g) + ...,
  par2 ~ 1 + x + (1 | g) + ...,
  par3 ~ 1 + x + (1 | g) + ...,
)

Non-linear formula for a single distributional parameter:

formula <- bf(
  y ~ fun(x, nlpar1, nlpar2),
  nlpar1 ~ 1 + x + (1 | g) + ...,
  nlpar2 ~ 1 + (1 | g) + ...,
  nl = TRUE
)

Model likelihood via `family`

General structure:

family <- brmsfamily(
    family = "<family>", link = "<link>",
    more_link_arguments
)

Gaussian likelihood (default):

family <- brmsfamily(family = "gaussian", link = "identity",
                     link_sigma = "log")

Poisson likelihood:

family <- brmsfamily(family = "poisson", link = "log")

See also vignette("brms_families") for details on the families.

Recommended global brms settings

Ensure to have at least 8 GB of RAM available for compilation step
Preferable to use the cmdstanr backend and set cmdstanr_write_stan_file_dir to a model binary caching directory. This avoids model recompilation with and between R sessions.
Use of the here R package is encouraged to manage file paths. This makes all file paths relative to a common project root.

library(brms)
here::i_am("relative_path_in_project_to/rscript.R")
library(here)

options(
  # how many processor cores would you like to use?
  mc.cores = 4, 
  # how would you like to access Stan?
  brms.backend = "cmdstanr",
  # cache model binaries
  cmdstanr_write_stan_file_dir = here::here("_brms-cache"),
  # no need to normalize likelihoods
  brms.normalize = FALSE
)
# create cache directory if not yet available
dir.create(here::here("_brms-cache"), FALSE)

Part 1: historical control data

Use of historical control data in clinical trials

Goal: reduce control group sample size while maintaining power

Collect historical data from relevant literature systematically
Evaluate heterogeniety of historical data
data quality, patient population, trial design
Pre-specify trial protocol
prior documentation, operating charachteristics
Prior-data conflict?

The meta-analytic-predictive (MAP) prior¹

Borrow from historical data under an exchangeability assumption
Discount/down-weight historical data through between-trial heterogeneity \tau
Meta-analytic-predictive (MAP) prior through generative nature of hierarchical models

Pre-requisites
- essentially the same endpoint
- similar treatment procedures
- similar patient eligibility criteria
- similar patient population characteristics

flowchart LR
    M((β,τ)) --> A
    M --> B
    M --> C
    M -->|prediction| T
    A((θ<sub>1</sub>)) --> YA[Y<sub>1</sub>]
    B((θ<sub>2</sub>)) --> YB[Y<sub>2</sub>]
    C((θ<sub>3</sub>)) --> YC[Y<sub>3</sub>]
    T((θ<sub>*</sub>)) -->|new trial| YT[Y<sub>*</sub>]

Example: double-blind PoC placebo controlled trial¹

Endpoint binary ASAS20 response at week 6
Improvement = higher rates
Historical control data
8 studies with 513 patients
Bayesian design
- P(\pi_t - \pi_p > 0 | y) > 0.95
- Placebo: (“MAP prior” from historical data)
  - \pi_{p} \sim \text{Beta}(11,32)
- Active:
  - \pi_{t} \sim \text{Beta}(0.5,1)
Randomization ratio 4:1 (24 vs 6)

MAP prior:

mean	sd	2.5%	50%	97.5%
0.26	0.09	0.11	0.25	0.47

Generalized Meta-Analytic-Predictive model

Y is the (control) group summary data for H historical trials \begin{align*} Y_{h}|\theta_{h} &\sim f(\theta_{h}) & \forall \; h \in [1,H] \\ Y_\ast|\theta_\ast &\sim f(\theta_\ast) & \text{for new trial} \end{align*}

Exchangeability assumption: \begin{align*} g(\theta_{h})|\beta,\tau &\sim \text{Normal}(\beta, \tau^2) & \forall \; h \in [1,H] \\ g(\theta_\ast)|\beta,\tau &\sim \text{Normal}(\beta, \tau^2) & \text{for new trial} \end{align*}

f likelihood / g link function: Binomial/logit, Normal (known \sigma)/identity or Poisson/log

\tau \rightarrow 0 \Rightarrow pooling / unbound use of historical data
\tau \rightarrow \infty \Rightarrow stratification / no use of historical data

Running the MAP analysis in R

RBesT¹

library(RBesT)
set.seed(98721487)
map_mc <- gMAP(cbind(r, n-r) ~ 1 | study,
               data=RBesT::AS,
               family=binomial,
               tau.dist="HalfNormal",
               tau.prior=1,
               beta.prior=2)

brms

library(brms)
# use of an outcome modifier to specify number of trials
# intercept study random effect requested with (1 | study)
bmap_model <- bf(r | trials(n) ~ 1 + (1 | study),
                 family=binomial, center=FALSE)
bmap_model_prior <- prior(normal(0, 2), class=b, coef=Intercept) +
    prior(normal(0, 1), class=sd, coef=Intercept, group=study)
bmap_mc <- brm(bmap_model,
               data=RBesT::AS,
               prior=bmap_model_prior,
               seed=98721487,
               control=list(adapt_delta=0.99)) # Stan sampling parameter

study	n	r
Study 1	107	23
Study 2	44	12
Study 3	51	19
Study 4	39	9
Study 5	139	39
Study 6	20	6
Study 7	78	9
Study 8	35	10

Summary MAP prior model

summary(bmap_mc)

 Family: binomial 
  Links: mu = logit 
Formula: r | trials(n) ~ 1 + (1 | study) 
   Data: RBesT::AS (Number of observations: 8) 

Multilevel Hyperparameters:
~study (Number of levels: 8) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     0.37      0.21     0.04     0.86 1.00     1104     1126

Regression Coefficients:
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept    -1.11      0.19    -1.48    -0.73 1.00     1581     1753

Intercept is the typical trial response rate on the logit scale
sd(Intercept) is the between-trial heterogeneity \tau

Predicting the placebo response rate in a new study

AS_new <- data.frame(study = "new_study", n = 1)
pe <- posterior_epred(
    bmap_mc, newdata = AS_new,
    allow_new_levels = TRUE, 
    sample_new_levels = "gaussian"
)
posterior_summary(pe)

     Estimate  Est.Error      Q2.5     Q97.5
[1,] 0.257566 0.08803654 0.1101559 0.4716962

allow_new_levels=TRUE allows to simulate the study random effect for a new (unseen) studies
sample_new_levels = "gaussian" ensures that the simulation is done using the normal hierarchical model (the default is to bootstrap existing levels)

Approximation with a finite mixture

For specifying the prior as part of a pre-specified analysis we choose a parametric approximation:

pe_mix <- automixfit(pe[, 1], type = "beta")
print(pe_mix, digits=3)

EM for Beta Mixture Model
Log-Likelihood = 4433.645

Univariate beta mixture
Mixture Components:
  comp1   comp2   comp3   comp4  
w   0.452   0.198   0.184   0.165
a  34.650  22.567  11.717   2.550
b 106.893  48.984  54.192   5.588

plot(pe_mix)$mix

Region specific MAP prior (hypothetically)

withr::with_seed(654873,
                 AS_region <- bind_cols(AS, region=sample(c("asia", "europe", "north_america"), 8, TRUE)))
kable(AS_region)

study	n	r	region
Study 1	107	23	europe
Study 2	44	12	asia
Study 3	51	19	europe
Study 4	39	9	europe
Study 5	139	39	asia
Study 6	20	6	north_america
Study 7	78	9	europe
Study 8	35	10	asia

Region specific MAP Priors via varying regions model

form_AS_region <- bf(r | trials(n) ~ 1 + (1 | region/study), 
                     family = binomial, center=FALSE)

# show which priors brms defines for this model
get_prior(form_AS_region, AS_region)

bprior_AS_region <- prior(normal(0, 2), class=b, coef=Intercept) +
  prior(normal(0, 0.50), class=sd, coef=Intercept, group=region) +
  prior(normal(0, 0.25), class=sd, coef=Intercept, group=region:study)

fit_AS_region <- brm(
  form_AS_region, data = AS_region, prior = bprior_AS_region, seed = 29856341,
  control=list(adapt_delta=0.99))

(1 | region/study) specifies a nested random effect structure as in this example each study is run in a single region. It is equivalent to
(1 | region) + (1 | region:study).

Summary region model

summary(fit_AS_region)

 Family: binomial 
  Links: mu = logit 
Formula: r | trials(n) ~ 1 + (1 | region/study) 
   Data: AS_region (Number of observations: 8) 

Multilevel Hyperparameters:
~region (Number of levels: 3) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     0.28      0.22     0.01     0.86 1.00     1605     1680

~region:study (Number of levels: 8) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     0.24      0.14     0.02     0.53 1.00     1648     1345

Regression Coefficients:
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept    -1.06      0.26    -1.57    -0.46 1.00     1643     1661

Extract MAP MCMC samples

AS_region_new <- data.frame(study = "new_study_asia", 
                            n = 1, region = "asia")

pe_region <- posterior_epred(
  fit_AS_region, newdata = AS_region_new, 
  allow_new_levels = TRUE, 
  sample_new_levels = "gaussian"
)
dim(pe_region)

[1] 4000    1

posterior_summary(pe_region)

      Estimate  Est.Error      Q2.5     Q97.5
[1,] 0.2691718 0.06397847 0.1549326 0.4190814

Obtain parametric MAP prior

pe_mix_region <- RBesT::automixfit(pe_region[, 1], type = "beta")
plot(pe_mix_region)$mix

Summary historical control data

Bayesian random-effects meta-analysis models can be used to derive Meta-Analytic-Predictive (MAP) priors
Predictions for new study mean inform the MAP prior
Specification and inference for MAP models is simple in brms
Not covered: Parametric MAP mixtures can be used as priors in brms using the RBesT::mixstanvar adapter

Hands-on exercises: historical control data

Priors

The prior can only be understood in the context of the model ¹

Commonly heard reasons to be afraid of priors

“Priors are inherently subjective”

“Priors bias your analysis”

“You should let the data speak for themselves”

“Non-Bayesian models don’t have priors and don’t need them”

“I have no idea how to set appropriate priors”

Are priors inherently subjective?

Consider using the alternative terms “contextual” or “conditional”¹

on the person or people or agent performing the analysis
on results of previous studies
on structural assumptions
…

Do priors bias your analysis?

Definition:

An estimator \hat{\theta} of the parameter \theta is called unbiased if \text{E}_{y}(\hat{\theta}) = \theta.

Corrollary:

If an estimator \hat{\theta} is unbiased for a given prior, it will be biased for most other priors.

Corrollary:

If an estimator \hat{\theta} is unbiased at a given scale, it will be biased for most non-linear transformations f(\hat{\theta}) (Jensen’s inequality): \text{E}_y(f(\hat{\theta})) \neq f(\theta)

Unbiasedness is fragile and often only holds in the limit

Is bias even the right metric?

A sampling distribution is a distribution of a data-dependent variable (a “statistic”) as implied by variation in data y across the true data generating process

Consider two estimators \hat{\theta}_1 and \hat{\theta}_2 of \theta with sampling distributions given by

p(\hat{\theta}_1) := p(\hat{\theta}_1(y)) = \text{normal}(\theta, 10)
p(\hat{\theta}_2) := p(\hat{\theta}_2(y)) = \text{normal}(\theta + 0.2, 1)

Which of them is better?

Winning the bias-variance trade-off

The RMSE can be defined as \begin{align*} \text{RMSE}_y(\hat{\theta}, \theta) & = \sqrt{\int (\hat{\theta}(y) - \theta)^2 \; p(y) \; d y} \\ & = \sqrt{\text{Var}_y{\hat{\theta}} + \text{Bias}_y(\hat{\theta}, \theta) ^2} \end{align*}

It expresses a trade-off between variance and bias

A lot of priors will help you win the bias-variance trade-off, not the race for minimal bias!

Non-Bayesian models don’t have priors?

Basic multilevel model with J groups:

y ~ normal(b0[j] + b1 * x, sigma)
b0[j] ~ normal(b0, sd0)

Is b0[j] ~ normal(b0, sd0) a likelihood, prior, or something else?

Some reasons for using priors

Make a-priori implausible values unlikely (weakly informative priors)
Incorporate specific expert information into the model (“subjective” priors)
Mimic frequentist methods (uninformative/“objective” priors)
Represent known data structure (multilevel priors)
Regularize the model to avoid overfitting (shrinkage/sparsifying priors)
Enable hypothesis testing via Bayes factors
Ensure unimodal posteriors
Facilitate convergence
…

Some observations about priors

Uniformity is informative

Prior tails matter

What is the posterior?

… if the prior is \theta \sim \text{student-t}(4, 5, 1)
… and the likelihood alone implies a posterior of \theta \sim \text{normal}(-5, 1)

Prior tails matter: Student-t(4) prior vs. normal likelihood

Prior tails matter: Normal prior vs. Student-t(4) likelihood

Prior tails matter: Normal prior vs. normal likelihood

Prior tails matter: Student-t(4) prior vs. Student-t(4) likelihood

But what now?

Principles of specifying priors

Respecting boundaries: use hard bounds in priors exactly where parameters have natural bounds
Expressiveness: use prior families flexible enough to express different plausible prior knowledge
Controlling complexity: If you want to be conservative, put relevant amount of prior probability at or near a conservative submodel
Scale awareness: ensure that priors take the scales of parameters into account
Reflecting structure: reflect the structure of the data in the structure of the prior
Data Informed: Use previous data to inform the current priors

Data informed priors

The RBesT package allows to build Meta-Analytic Predictive (MAP) priors, a special kind of data informed priors
See the ESMARConf2023 tutorial on RBesT on Youtube
Also check out https://opensource.nibr.com/bamdd/ the for the following MAP case studies: historical control without / with stratification, treatment effects, single arm PoS

Part 2: Dose finding

Typical dose response shapes

Data Set PATHWAY

Placebo controlled trial
Treatment of severe asthma with tezepelumab
Three different doses + placebo
Endpoint: annualized rate of asthma exacerbations
Estimates per arm from negative binomial regression (like in “arm-based meta-analysis”), not individual patient data

dose	group	log_est	log_stderr
0	placebo	−0.400	0.103
70	tezepelumab 70 mg q4w	−1.347	0.177
210	tezepelumab 210 mg q4w	−1.661	0.222
560	tezepelumab 280 mg q2w	−1.514	0.191

Sigmoid E_max Model

f(\text{dose}; \text{parameters}) = \text{E}_0 + \text{E}_\text{max} \frac{\text{dose}^h}{\text{dose}^h + \text{ED}_{50}^h}

Parameters:

\text{E}_0 \in \mathbb{R}: Expected placebo outcome
\text{E}_\text{max} \in \mathbb{R}: Maximum effect size
h \in \mathbb{R}_+: Hill (steepness) parameter
\text{ED}_{50}: Dose at which 50% of \text{E}_\text{max} is reached

Sigmoid E_max Model: Visualization

Specifying sigmoid E_max Model with `brms`

form_sig <- bf( 
  log_est | se(log_stderr) ~ E0 + Emax * dose^h / 
                             (dose^h + ED50^h),
  nlf(h ~ exp(logh)), nlf(ED50 ~ exp(logED50)),
  E0 ~ 1, Emax ~ 1, logh ~ 1, logED50 ~ 1,
  nl = TRUE,
  family = gaussian()
)

prior_sig <- prior(normal(0,1), nlpar=E0) +
  prior(normal(0,1), nlpar=logh) +
  prior(normal(0,1), nlpar=Emax) +
  prior(normal(4,2), nlpar=logED50)

A note on E_max Model priors

What is more informative?

\text{logh} \sim \text{normal}(0,1)
\text{logh} \sim \text{normal}(0,3)

A note on E_max Model priors

Biologically plausible values: approx. .25 < h < 4 (see prior interval probabilities).
The above prior puts roughly 1/3 above 4 and 1/3 below 0.25!
Re-writing the fraction to \left(\frac{d}{\text{ED}_{50}}\right)^h / \left(1+\left(\frac{d}{\text{ED}_{50}}\right)^h\right) makes it obvious that for h \rightarrow 0, this goes to 1/2, and for h \rightarrow \infty, either to 0 (d < \text{ED}_{50}) or 1 (d > \text{ED}_{50}).

Fitting the sigmoid E_max Model with `brms`

fit_sig = brm(
  formula = form_sig, 
  data = pathway, 
  prior = prior_sig,
  control = list(adapt_delta = 0.999)
)

Sigmoid E_max Model: Results Summary

summary(fit_sig)

 Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: log_est | se(log_stderr) ~ E0 + Emax * dose^h/(dose^h + ED50^h) 
         h ~ exp(logh)
         ED50 ~ exp(logED50)
         E0 ~ 1
         Emax ~ 1
         logh ~ 1
         logED50 ~ 1
   Data: pathway (Number of observations: 4) 

Regression Coefficients:
                  Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
E0_Intercept         -0.42      0.10    -0.61    -0.21 1.00     2065     2179
Emax_Intercept       -1.30      0.32    -2.11    -0.84 1.00     1172     1199
logh_Intercept       -0.08      0.98    -1.90     1.92 1.00     1306     1914
logED50_Intercept     2.73      1.38    -0.27     5.32 1.00     1341     1273

Further Distributional Parameters:
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     0.00      0.00     0.00     0.00   NA       NA       NA

Visualizing the Fitted Sigmoid E_max Model

tibble(dose = seq(0, 560, 1), log_stderr=1) |>
  tidybayes::add_epred_rvars(object=fit_sig) |>
  (\(x) x |> 
     left_join(x |> filter(dose==0) |> rename(pbo = .epred) |> dplyr::select(-dose), 
               by="log_stderr"))() |>
  mutate(.delta = .epred - pbo) |>
  ggplot(aes(x=dose, ydist=.delta)) +
  ggdist::stat_lineribbon()

Modified Beta Model

f(\text{dose}; \text{parameters}) = \text{E}_0 + \text{E}_\text{max} \; \frac{(\delta_1 + \delta_2)^{(\delta_1+\delta_2)}} {\delta_1^{\delta_1} \, \delta_2^{\delta_2}} \; \left(\frac{\text{dose}}{S}\right)^{\delta_1} \left(1-\frac{\text{dose}}{S}\right)^{\delta_2}

Parameters:

\text{E}_0 \in \mathbb{R}: Expected placebo response
\text{E}_\text{max} \in \mathbb{R}: Maximum effect size
\delta_1, \delta_2 \in \mathbb{R}_+: Shape parameters
S: constant > maximum dose, e.g. 1.5 \times \max(\text{dose}), here we choose S=850

Specifying the Modified Beta Model with `brms`

form_mbeta <- bf( 
  log_est | se(log_stderr) ~ E0 +   
    Emax * (delta1+delta2)^(delta1+delta2) /
    (delta1^delta1 * delta2^delta2) * 
    (dose/850)^delta1 * (1-dose/850)^delta2,
  nlf(delta1 ~ exp(logdelta1)), nlf(delta2 ~ exp(logdelta2)),
  E0 ~ 1, Emax ~ 1, logdelta1 ~ 1, logdelta2 ~ 1,
  nl = TRUE,
  family = gaussian()
)

prior_mbeta <- prior(normal(0,1), nlpar="E0") +
  prior(normal(0,1), nlpar="Emax") +
  prior(normal(0,1), nlpar="logdelta1") +
  prior(normal(0,1), nlpar="logdelta2")

Fitting the Modified Beta Model with `brms`

fit_mbeta <- brm(
  form_mbeta, 
  data = pathway, 
  prior = prior_mbeta,
  control = list(adapt_delta = 0.999)
)

Visualizing the Fitted Modified Beta Model

Model Evaluation (failed attempt)

(loo_mbeta <- loo(fit_mbeta))


Computed from 4000 by 4 log-likelihood matrix.

         Estimate  SE
elpd_loo      0.1 0.4
p_loo         2.2 0.6
looic        -0.2 0.7
------
MCSE of elpd_loo is NA.
MCSE and ESS estimates assume MCMC draws (r_eff in [0.4, 0.6]).

Pareto k diagnostic values:
                         Count Pct.    Min. ESS
(-Inf, 0.7]   (good)     1     25.0%   1282    
   (0.7, 1]   (bad)      3     75.0%   <NA>    
   (1, Inf)   (very bad) 0      0.0%   <NA>    
See help('pareto-k-diagnostic') for details.

brms interfaces with the loo R package that performs efficient approximate leave-one-out cross-validation (LOO) for Bayesian models by Pareto smoothed importance sampling (PSIS). See Vethari et al. (2017) and the loo documentation on CRAN.

Model Evaluation (works)

loo_exact_sig <- kfold(fit_sig, folds = "loo", silent=2)
(loo_exact_mbeta <- kfold(fit_mbeta, folds = "loo", silent=2))


Based on 4-fold cross-validation.

           Estimate  SE
elpd_kfold     -2.0 1.6
p_kfold         4.3 2.2
kfoldic         4.0 3.3

elpd_kfold: CV estimate of the expected log pointwise predictive density for a new dataset
p_kfold: CV estimate of difference between elpd_loo and the non-cross-validated log posterior predictive density (describes how much more difficult it is to predict future data than the observed data).
kfoldic: -2 * elpd_kfold (on the conventional scale of deviance)

Model Comparison

loo_compare(loo_exact_sig, loo_exact_mbeta)

          elpd_diff se_diff
fit_sig    0.0       0.0   
fit_mbeta -1.7       0.9

loo_compare computes a paired estimate of the difference in ELPD to take advantage of the fact that the same set of data points was used to fit both models (and reduce the standard error). See Vethari et al. (2017) and the loo R package manual for details.

fit_sig$criteria$loo <- loo_exact_sig
fit_mbeta$criteria$loo <- loo_exact_mbeta
(w_dose <- model_weights(fit_sig, fit_mbeta, weights = "loo"))

  fit_sig fit_mbeta 
 0.841057  0.158943

Here, model_weights from brms will compute Akaike weights based on the information criterion values returned by loo

Bayesian Model Averaging

pe_sig <- posterior_epred(fit_sig, newdata = dose_df)
pe_mbeta <- posterior_epred(fit_mbeta, newdata = dose_df)
pe_avg <- pe_sig * w_dose[1] + pe_mbeta * w_dose[2]

pe_avg <- pe_avg |>
  posterior_summary() |>
  as.data.frame() |> 
  bind_cols(dose_df)

    Estimate  Est.Error       Q2.5      Q97.5      dose
1 -0.4167663 0.08680825 -0.5816652 -0.2445850  0.000000
2 -0.8312366 0.25879835 -1.3256556 -0.3984107  5.656566
3 -0.9563272 0.25696458 -1.3977850 -0.4571903 11.313131
4 -1.0409796 0.24418232 -1.4400838 -0.5073584 16.969697

Visualizing the Model Averaging

Hands-on exercises: dose finding

Part 3: Bayesian Mixed Models for Repeated Measures (MMRM)

Hypothetical dose finding study, N=200

Continuous outcome measured at baseline & weeks 2, 4, 8, 12

Overview and Analysis Goals

Mixed Effects Model for Repeated Measures (MMRM) commonly used for longitudinal data (each patient measured at multiple visits)
Direct likelihood analysis that can address hypothetical estimand of all patients completing the study on treatment without doing missing data imputation first

Commonly no structure assumed for correlations between visits and different variance allowed for different visits (to avoid unnecessary assumptions)
Convergence issues with REML fit common, especially for small sample sizes, which is alleviated by Bayesian inference with (weakly-)informative priors

Bayesian approach allows us to incorporate prior information and historical data, which is very interesting for Phase I studies
brms lets us easily add more components & structure to the model

What do our data look like?

Analysis Data Model (ADaM) Basic Data Structure (BDS)

USUBJID	TRT01P	AVISIT	ADY	AVAL	CHG	BASE
3	0	visit1	14	1.18	0.40	0.78
3	0	visit2	28	0.87	0.10	0.78
3	0	visit3	56	−0.81	−1.59	0.78
3	0	visit4	84	0.81	0.04	0.78
9	10	visit1	14	0.95	0.84	0.11
9	10	visit2	28	1.46	1.35	0.11
9	10	visit3	56	−0.97	−1.08	0.11
9	10	visit4	84	1.21	1.10	0.11
13	20	visit1	14	2.38	0.48	1.90
13	20	visit2	28	2.13	0.23	1.90

Patients can miss visits or drop-out of the study prior to the final visit.

Model Specification: Informal (popular choice)

Fixed effects:

baseline (pre-treatment) value of the endpoint
visit as a factor
treatment as a factor
treatment by visit interaction
visit by baseline value interaction

Random effects:

Random subject effect on the visit main effect
… or equivalently correlated residual error terms within subjects

Model Specification: Formal

Formally, let us assume that there are V visits. We assume that the V-dimensional response \boldsymbol{Y}_i for patient i satisfies \boldsymbol{Y}_i = \boldsymbol{X}_i \boldsymbol{\beta} + \boldsymbol{Z}_i \boldsymbol{b}_i + \boldsymbol{\epsilon}_i

with \boldsymbol{b}_i \sim \text{MVN}(\boldsymbol{0}, \boldsymbol{D}) and \boldsymbol{\epsilon}_i \sim \text{MVN}(\boldsymbol{0}, \boldsymbol{\Sigma}), where \boldsymbol{\Sigma} is a diagonal matrix. This implies \boldsymbol{Y}_i \sim \text{MVN}(\boldsymbol{X}_i \boldsymbol{\beta}, \boldsymbol{V}_i), where \boldsymbol{V}_i = \boldsymbol{Z}_i \boldsymbol{D} \boldsymbol{Z}_i^T + \boldsymbol{\Sigma}. Model the correlated Y_{ij} either by

marginalizing out random effects & account for them with correlated residual errors (residual covariance matrix V_i), or
conditionally on (V-dimensional) random effects \boldsymbol{b}_i with residual errors \boldsymbol{\epsilon}_i independent (once we condition on \boldsymbol{b}_i).

MMRMs in SAS

Widely-used high-quality reference implementation

PROC MIXED DATA=simulated_data;
  CLASS TRT01P AVISIT USUBJID;
  MODEL CHG ~ TRT01P AVISIT BASE TRT01P*AVISIT AVISIT*BASE 
    / SOLUTION DDFM=KR ALPHA = 0.05;
  REPEATED AVISIT / TYPE=UN SUBJECT = USUBJID R Rcorr GROUP=TRT01P;
  LSMEANS TRT01P*AVISIT / DIFFS PDIFF CL OM E;
RUN;

MMRMs with the `mmrm` R package

Fit the model in R in a frequentist framework

library(mmrm)
mmrm_fit <- mmrm(
  formula = CHG ~ TRT01P + AVISIT + BASE + AVISIT:TRT01P + 
    AVISIT:BASE + us(AVISIT | TRT01P / USUBJID),
  method = "Kenward-Roger", 
  vcov = "Kenward-Roger-Linear", # to match SAS
  data = mutate(simulated_data, USUBJID=factor(USUBJID))
)

Refer to https://openpharma.github.io/mmrm.

Motivation Forward Difference Parametrization

We would like to put priors on the differences from visit to visit

Difference contrasts in R

Setup forward difference contrasts for changes between visits:

contrasts(simulated_data$AVISIT) <- MASS::contr.sdif

Hard to interpret the contrast matrix directly:

contrasts(simulated_data$AVISIT) |> MASS::fractions()

       visit2-visit1 visit3-visit2 visit4-visit3
visit1 -3/4          -1/2          -1/4         
visit2  1/4          -1/2          -1/4         
visit3  1/4           1/2          -1/4         
visit4  1/4           1/2           3/4

Learn more about contrasts: https://bbolker.github.io/mixedmodels-misc/notes/contrasts.pdf

What do these contrasts mean?

Inverting the contrast matrix reveals the dummy variables’ interpretation:

# add the intercept
cmat <- cbind("Intercept" = 1, contrasts(simulated_data$AVISIT))
# compute the inverse matrix
solve(cmat) |> MASS::fractions()

              visit1 visit2 visit3 visit4
Intercept     1/4    1/4    1/4    1/4   
visit2-visit1  -1      1      0      0   
visit3-visit2   0     -1      1      0   
visit4-visit3   0      0     -1      1

MMRMs in `brms`

mmrm_model1 <- bf(
  CHG ~ 1 + AVISIT + BASE + BASE:AVISIT + TRT01P + TRT01P:AVISIT 
    + unstr(time = AVISIT, gr = USUBJID),
  sigma ~ 1 + AVISIT + TRT01P + AVISIT:TRT01P
)

mmrm_prior1 <- prior(normal(0, 2), class=Intercept) +
    prior(normal(0, 1), class=b) +
    prior(normal(0, log(10.0)/1.64), class=Intercept, dpar=sigma) +
    prior(normal(0, log(2.0)/1.64), class=b, dpar=sigma) +  
    prior(lkj(1), class=cortime)

fit_mmrm1 <- brm(
  formula = mmrm_model1, data = simulated_data, prior = mmrm_prior1, ...
)

Note that a single corrleation matrix is estimated while the covariance varies as per the sigma model.

Expected marginal (least-squares) means

emm2 <- emmeans(fit_mmrm1, ~ TRT01P | AVISIT, weights="proportional")

AVISIT = visit1:
 TRT01P  emmean lower.HPD upper.HPD
 0      -0.0374   -0.2637    0.1998
 10     -0.1663   -0.3601    0.0434
 20      0.0597   -0.1276    0.2505
 40     -0.1128   -0.2838    0.0793

AVISIT = visit4:
 TRT01P  emmean lower.HPD upper.HPD
 0      -0.2937   -0.6172   -0.0181
 10      0.2062   -0.0653    0.5070
 20      0.4645    0.2035    0.7719
 40      0.2955    0.0408    0.5411

Point estimate displayed: median 
HPD interval probability: 0.95

To work with MCMC samples of the expected marginal means & to summarize them exactly as we want (e.g. quantile credible intervals instead of highest posterior density intervals) we can use:

emm2 |> as.mcmc() |> summarize_draws()

Expected Marginal Contrasts per Visit

contrast(emm2, adjust="none", method="trt.vs.ctrl", ref="TRT01P0")

AVISIT = visit1:
 contrast           estimate lower.HPD upper.HPD
 TRT01P10 - TRT01P0  -0.1267   -0.4490     0.170
 TRT01P20 - TRT01P0   0.0959   -0.2036     0.396
 TRT01P40 - TRT01P0  -0.0765   -0.3619     0.222

AVISIT = visit4:
 contrast           estimate lower.HPD upper.HPD
 TRT01P10 - TRT01P0   0.5024    0.0925     0.922
 TRT01P20 - TRT01P0   0.7598    0.3540     1.155
 TRT01P40 - TRT01P0   0.5932    0.2090     0.969

Point estimate displayed: median 
HPD interval probability: 0.95

as.mcmc() to work with MCMC samples of the difference, e.g.

contrast(emm2, adjust="none", method="trt.vs.ctrl", ref="TRT01P0") |> as.mcmc(()

Monotonic Effects Across Ordered Factor Levels

mmrm_model2 <- bf( 
  CHG ~ 1 + AVISIT + mo(TRT01P) + BASE + mo(TRT01P):AVISIT 
    + BASE:AVISIT + unstr(time = AVISIT, gr = USUBJID),
  sigma ~1 + AVISIT + mo(TRT01P) + mo(TRT01P):AVISIT
)

For category c=0,\ldots,(\text{categories}-1) = C-1, the monotonic term is \text{coefficient} \times (C-1) \times \sum_{k=1}^{c} \zeta_k

\zeta_k \in [0, 1] is the percent increase and is subject to the constraint \sum_{k=1}^{C-1} \zeta_k = 1
\text{coefficient} is the mean change between adjacent categories

For more details see the respective vignette:
https://paulbuerkner.com/brms/articles/brms_monotonic.html

fit_mmrm2 <- brm(formula = mmrm_model2,
                 data = simulated_data |> mutate(TRT01P=ordered(TRT01P)), prior = mmrm_prior1, ...)

Results from different MMRMs

MMRMs: Outlook

In the online case study on http://opensource.nibr.com/bamdd you additionally find:

Data and full code
More on estimands, parametrization, contrasts & priors
Estimating average differences across visits
Meta-analytic combined (MAC) using historical data
Robustifying MAC via a “slab-and-spike”-type prior
Non-linear functions over time & doses in MMRMs
Excercises

Part 4: Longitudinal data

Longitudinal Data: Background

Example: simulated study of a drug for the treatment of Psoriasis
112 patients, 59 randomized to placebo, 53 to drug.
Efficacy was assessed using the Psoriasis Area and Severity Index (PASI), a numerical score which measures the severity and extent of psoriasis (0: no disease, 72: maximum).
Assessed at baseline and 7 post-baseline timepoints.

Longitudinal Data: Illustration

Longitudinal Data: Key Variables

# A tibble: 10 × 6
      CHG  BASE TRT01P AVISIT  AVISITN SUBJID
    <dbl> <dbl> <fct>  <fct>     <dbl>  <dbl>
 1 -2.2    16   PBO    Week 1        1      1
 2 -0.5    16   PBO    Week 2        2      1
 3  4.7    16   PBO    Week 4        4      1
 4  7.3    16   PBO    Week 6        6      1
 5 32.1    16   PBO    Week 8        8      1
 6 10      16   PBO    Week 10      10      1
 7 20.7    16   PBO    Week 12      12      1
 8  3.4    20.3 TRT    Week 1        1      2
 9  1.2    20.3 TRT    Week 2        2      2
10  0.600  20.3 TRT    Week 4        4      2

…

Longitudinal Data: Mean Model

fit_long_mean <- brm(
  CHG ~ BASE + TRT01P * AVISIT + (1 | SUBJID), 
  data = pasi_data, seed = 2454
)
fit_long_mean <- add_criterion(fit_long_mean, "loo")

Total elapsed fitting time (seconds):

sum(rstan::get_elapsed_time(fit_long_mean$fit))

[1] 3.652

Mean Model: Posterior predictive checks

pp_check(fit_long_mean)

pp_check in brms performs posterior predictive checks with the help of the bayesplot package (shows dens_overlay by default).
- Can also be called with type loo_pit (probability integral transformation). In an ideal model fit, the resulting PIT-transformed values would be uniformly distributed: i.e. the blue points and dashed lines would align with the distribution function of a Uniform(0,1) variable (solid line). See chapter 12 of BAMDD for details.

Mean Model: LOO-CV results

loo(fit_long_mean)


Computed from 4000 by 684 log-likelihood matrix.

         Estimate   SE
elpd_loo  -2286.3 33.1
p_loo       102.3  9.4
looic      4572.6 66.3
------
MCSE of elpd_loo is NA.
MCSE and ESS estimates assume MCMC draws (r_eff in [0.4, 1.9]).

Pareto k diagnostic values:
                         Count Pct.    Min. ESS
(-Inf, 0.7]   (good)     681   99.6%   80      
   (0.7, 1]   (bad)        3    0.4%   <NA>    
   (1, Inf)   (very bad)   0    0.0%   <NA>    
See help('pareto-k-diagnostic') for details.

While this is quick, is this what we are interested in?
- If we’re interested in the prediction for a new (unseen) patient, we should be running leave-one-patient-out CV (not leave-one-observation-out)…
  See kfold_split_grouped in the loo package.
- For the E_max model data: not possible (IPD needed!)

Mean Model: Visualization of effects

conditional_effects(fit_long_mean, "AVISIT:TRT01P")

By default, numeric variables will be conditionalized by using their means and factors will get their first level assigned.

We might be interested in the effect integrated out over a population \rightarrow emmeans
- See longitudinal data exercise 1 in BAMDD Chapter 12

Longitudinal Data: Linear Model

fit_long_lin <- brm(
  CHG ~ BASE + TRT01P * AVISITN + (1 | SUBJID), 
  data = pasi_data, seed = 2454
)
fit_long_lin <- add_criterion(fit_long_lin, "loo")

Total elapsed fitting time:

sum(rstan::get_elapsed_time(fit_long_lin$fit))

[1] 3.111

Linear Model: Summary

summary(fit_long_lin)

 Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: CHG ~ BASE + TRT01P * AVISITN + (1 | SUBJID) 
   Data: pasi_data (Number of observations: 684) 

Multilevel Hyperparameters:
~SUBJID (Number of levels: 112) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     5.23      0.45     4.41     6.17 1.00     1493     2463

Regression Coefficients:
                  Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept            12.99      1.83     9.42    16.69 1.00     1275     2212
BASE                 -0.70      0.08    -0.85    -0.54 1.00     1282     2047
TRT01PTRT            -5.96      1.35    -8.57    -3.34 1.00     1550     1953
AVISITN              -0.22      0.09    -0.39    -0.05 1.00     3625     3546
TRT01PTRT:AVISITN    -0.72      0.13    -0.96    -0.47 1.00     3421     3194

Further Distributional Parameters:
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     6.51      0.20     6.14     6.92 1.00     4713     2555

Linear Model: Visualization of effects

conditional_effects(fit_long_lin, "AVISITN:TRT01P")

Longitudinal Data: Linear Model with Varying Slopes

fit_long_lin_vs <- brm(
  CHG ~ BASE + TRT01P * AVISITN + (1 + AVISITN | SUBJID),
  data = pasi_data, seed = 2454
)
fit_long_lin_vs <- add_criterion(fit_long_lin_vs, "loo")

Total elapsed fitting time:

sum(rstan::get_elapsed_time(fit_long_lin_vs$fit))

[1] 8.545

Linear Model: Summary

summary(fit_long_lin_vs)

 Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: CHG ~ BASE + TRT01P * AVISITN + (1 + AVISITN | SUBJID) 
   Data: pasi_data (Number of observations: 684) 

Multilevel Hyperparameters:
~SUBJID (Number of levels: 112) 
                       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)              5.26      0.54     4.25     6.35 1.00     1578     2512
sd(AVISITN)                1.10      0.09     0.94     1.30 1.01      519     1008
cor(Intercept,AVISITN)    -0.53      0.09    -0.69    -0.34 1.01      402     1128

Regression Coefficients:
                  Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept             9.21      1.71     5.85    12.47 1.00      954     1632
BASE                 -0.49      0.08    -0.64    -0.35 1.00      997     1908
TRT01PTRT            -6.47      1.18    -8.74    -4.19 1.00     1783     2657
AVISITN              -0.25      0.16    -0.56     0.06 1.00      915     1985
TRT01PTRT:AVISITN    -0.70      0.23    -1.14    -0.26 1.00      760     1626

Further Distributional Parameters:
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     4.67      0.15     4.39     4.96 1.00     3107     3238

Linear Model with Varying Slopes: Visualization of effects

conditional_effects(fit_long_lin_vs, "AVISITN:TRT01P")

Model Comparison

loo_compare(fit_long_lin, fit_long_lin_vs)

                elpd_diff se_diff
fit_long_lin_vs    0.0       0.0 
fit_long_lin    -173.7      20.3

Longitudinal Data: Monotonic Model

fit_long_mo <- brm(
  CHG ~ BASE + TRT01P * mo(AVISITN) + 
    (1 + mo(AVISITN) | SUBJID),
  data = pasi_data, seed = 24543
)
fit_long_mo <- add_criterion(fit_long_mo, "loo")

Total elapsed fitting time:

sum(rstan::get_elapsed_time(fit_long_mo$fit))

[1] 30.034

Monotonic Model: Visualization of effects

conditional_effects(fit_long_mo, "AVISITN:TRT01P")

Model Comparison

loo_compare(fit_long_lin_vs, fit_long_mo)

                elpd_diff se_diff
fit_long_mo       0.0       0.0  
fit_long_lin_vs -92.5      11.8

Longitudinal Data: Quadratic Model

fit_long_quad <- brm(
  CHG ~ BASE + TRT01P * (AVISITN + I(AVISITN^2)) + 
    (1 + AVISITN + I(AVISITN^2) | SUBJID),
  data = pasi_data, seed = 245124
)
fit_long_quad <- add_criterion(fit_long_quad, "loo")

Total elapsed fitting time:

sum(rstan::get_elapsed_time(fit_long_quad$fit))

[1] 42.86

Quadratic Model: Visualization of effects

conditional_effects(fit_long_quad, "AVISITN:TRT01P")

Longitudinal Data: Centering the time variable

Non-centered variables are often highly correlated with their square

cor(pasi_data$AVISITN, pasi_data$AVISITN^2)

[1] 0.9734303

Centering can be done around any central value. Here we use 6 because this is the average week in the study

pasi_data <- pasi_data |> 
  mutate(AVISITN_c = AVISITN - 6)

After centering, the correlation is drastically reduced

cor(pasi_data$AVISITN_c, pasi_data$AVISITN_c^2)

[1] 0.2548532

Longitudinal Data: Centered Quadratic model

fit_long_quad_c <- brm(
  CHG ~ BASE + TRT01P * (AVISITN_c + I(AVISITN_c^2)) + 
    (1 + AVISITN_c + I(AVISITN_c^2) | SUBJID),
  data = pasi_data, seed = 24547
)
fit_long_quad_c <- add_criterion(fit_long_quad_c, "loo")

Total elapsed fitting time:

sum(rstan::get_elapsed_time(fit_long_quad_c$fit))

[1] 14.114

Centered Quadratic Model: Visualization of effects

conditional_effects(fit_long_quad_c, "AVISITN_c:TRT01P")

Model Comparison

loo_compare(fit_long_mo, fit_long_quad_c)

                elpd_diff se_diff
fit_long_mo       0.0       0.0  
fit_long_quad_c -20.3       9.9

Longitudinal Data: Gaussian Process Model

fit_long_gp <- brm(
  CHG ~ BASE + gp(AVISITN_c, by = TRT01P, k=9) + 
    (1 + (AVISITN_c + I(AVISITN_c^2)) | SUBJID),
  data = pasi_data, seed = 32454,
  control = list(adapt_delta = 0.95)
)
fit_long_gp <- add_criterion(fit_long_gp, "loo")

Total elapsed fitting time:

sum(rstan::get_elapsed_time(fit_long_gp$fit))

[1] 82.378

Gaussian Process Model: LOO-CV results

loo(fit_long_gp)


Computed from 4000 by 684 log-likelihood matrix.

         Estimate   SE
elpd_loo  -2045.3 35.7
p_loo       194.4 16.1
looic      4090.7 71.4
------
MCSE of elpd_loo is NA.
MCSE and ESS estimates assume MCMC draws (r_eff in [0.3, 1.8]).

Pareto k diagnostic values:
                         Count Pct.    Min. ESS
(-Inf, 0.7]   (good)     642   93.9%   106     
   (0.7, 1]   (bad)       36    5.3%   <NA>    
   (1, Inf)   (very bad)   6    0.9%   <NA>    
See help('pareto-k-diagnostic') for details.

Gaussian Process Model: Visualization of effects

conditional_effects(fit_long_gp, "AVISITN_c:TRT01P")

Model Comparison

loo_compare(fit_long_quad_c, fit_long_gp)

                elpd_diff se_diff
fit_long_gp       0.0       0.0  
fit_long_quad_c -12.8       6.6

Recall that model comparisons may not be trustworthy.

Longitudinal Data: Gaussian Process AR1 Model

fit_long_gp_ar <- brm(
  CHG ~ BASE + gp(AVISITN_c, by = TRT01P, k=9) + 
    ar(AVISITN_c, gr = SUBJID, p = 1) +
    (1 + (AVISITN_c + I(AVISITN_c^2)) | SUBJID),
  data = pasi_data, seed = 24235,
  iter = 4000, warmup = 1000,
  control = list(adapt_delta = 0.95)
)
fit_long_gp_ar <- add_criterion(fit_long_gp_ar, "loo")

Total elapsed fitting time:

sum(rstan::get_elapsed_time(fit_long_gp_ar$fit))

[1] 172.381

Gaussian Process AR1 Model: Autocorrelation results

Some autocorrelation may be present:

posterior_summary(fit_long_gp_ar, variable = "ar") |>
  round(3)

      Estimate Est.Error  Q2.5 Q97.5
ar[1]    0.379     0.121 0.157 0.625

A more formal model comparison may be theoretically advantageous but given that we add only a single parameter (the AR1 effect), we should probably gain more than we loose by including it.

Gaussian Process AR1 Model: Visualization of effects

conditional_effects(fit_long_gp_ar, "AVISITN_c:TRT01P")

Course wrap-up

Summary

Diverse opportunities for applied modelling to inform good drug-development decisions
Bayesian paradigm is well suited for many of these situations
- Availability of meaningful prior information
- Desire for probabilistically interpretable statements about unknowns and future observable quantities
brms is a powerful and highly flexible engine for applied modelling , facilitating straightforward model specification and inference

Looking ahead

We hope you have:
- Become familiar with brms syntax and workflow
- Seen its versatility for statistical modelling in drug development
- Gained hands-on experience with the package from guided exercises
And that you feel empowered to use brms going forward!

Resources

Our open-source book:
Bayesian Applied Modelling in Drug Developoment (BAMDD)
Course materials on GitHub
brms documentation: https://paul-buerkner.github.io/brms/
Stan homepage: https://mc-stan.org
Stan Forums: https://discourse.mc-stan.org

Thank you

Thank you for your interest and participation!
Our contact information:
- Sebastian: sebastian.weber@novartis.com
- Lukas: lukas_andreas.widmer@novartis.com

Hierarchical Models in Preclincal Research

Preface

Disclaimer

Learning Goals

Housekeeping

Acknowledgements

Tutorial Agenda

A brms modelling workflow

Bayesian inference

Bayesian inference is integration

Stan project: Bayesian modelling platform

brms: Bayesian regression modelling in Stan

Some Highlights of brms

Model specification via formula

Model specification via formula

Model likelihood via family

Recommended global brms settings

Part 1: historical control data

Use of historical control data in clinical trials

The meta-analytic-predictive (MAP) prior1

Example: double-blind PoC placebo controlled trial1

Generalized Meta-Analytic-Predictive model

Running the MAP analysis in R

Summary MAP prior model

Predicting the placebo response rate in a new study

Approximation with a finite mixture

Region specific MAP prior (hypothetically)

Region specific MAP Priors via varying regions model

Summary region model

Extract MAP MCMC samples

Obtain parametric MAP prior

Summary historical control data

Hands-on exercises: historical control data

Priors

The prior can only be understood in the context of the model 1

Commonly heard reasons to be afraid of priors

Are priors inherently subjective?

Do priors bias your analysis?

Is bias even the right metric?

Winning the bias-variance trade-off

Non-Bayesian models don’t have priors?

Some reasons for using priors

Some observations about priors

Uniformity is informative

Prior tails matter

Prior tails matter: Student-t(4) prior vs. normal likelihood

Prior tails matter: Normal prior vs. Student-t(4) likelihood

Prior tails matter: Normal prior vs. normal likelihood

Prior tails matter: Student-t(4) prior vs. Student-t(4) likelihood

But what now?

Principles of specifying priors

Data informed priors

Part 2: Dose finding

Typical dose response shapes

Typical dose response shapes

Typical dose response shapes

Data Set PATHWAY

Sigmoid Emax Model

Sigmoid Emax Model: Visualization

Specifying sigmoid Emax Model with brms

A note on Emax Model priors

A note on Emax Model priors

A note on Emax Model priors

Fitting the sigmoid Emax Model with brms

Sigmoid Emax Model: Results Summary

Visualizing the Fitted Sigmoid Emax Model

Modified Beta Model

Specifying the Modified Beta Model with brms

Fitting the Modified Beta Model with brms

Visualizing the Fitted Modified Beta Model

Model Evaluation (failed attempt)

Model Evaluation (works)

Model Comparison

Bayesian Model Averaging

Visualizing the Model Averaging

Hands-on exercises: dose finding

Part 3: Bayesian Mixed Models for Repeated Measures (MMRM)

Hypothetical dose finding study, N=200

Overview and Analysis Goals

What do our data look like?

A `brms` modelling workflow

Model specification via `formula`

Model specification via `formula`

Model likelihood via `family`

The meta-analytic-predictive (MAP) prior¹

Example: double-blind PoC placebo controlled trial¹

The prior can only be understood in the context of the model ¹

Sigmoid E_max Model

Sigmoid E_max Model: Visualization

Specifying sigmoid E_max Model with `brms`

A note on E_max Model priors

A note on E_max Model priors

A note on E_max Model priors

Fitting the sigmoid E_max Model with `brms`

Sigmoid E_max Model: Results Summary

Visualizing the Fitted Sigmoid E_max Model

Specifying the Modified Beta Model with `brms`

Fitting the Modified Beta Model with `brms`

MMRMs with the `mmrm` R package

MMRMs in `brms`