Exercises Hierarchical Models in Preclincal Research

Setup

R Session

Note: All shown code can be copied by simply clicking the top right icon of the code block. The code blocks from the main presentation can also be copied by the same icon, which is visible when hovering over the respective block with your mouse.

For the course a RStudio server has been setup by the University of Göttingen: http://134.76.20.1:8787

To run exercises on your hardware please follow the instructions below.

Stan

Please first install the cmdstanr R package following the instructions from their Getting started with CmdStanR article. This also covers installation of the necessary compiler and accompanying C++ toolchain as well as cmdstan itself.

An alternative is to use rstan as described on the wiki page for rstan to get started. In this case, please use as backend for brms the setting rstan. Note that the use of the cmdstanr backend is recommended over rstan allowing to use more recent versions of Stan. Moreover, the instructions for installing a C++ toolchain is better described as part of the respective cmdstan guide chapter.

C++ toolchain installation

Windows: cmdstanr can automatically install the RTools based C++ toolchain via

cmdstanr::check_cmdstan_toolchain(fix=T)

macOS: The project macrtools provides a comprehensive and simple installation of a full toolchain as used by CRAN https://mac.thecoatlessprofessor.com/macrtools/

# install.packages("remotes")
remotes::install_github("coatless-mac/macrtools")
macrtools::macos_rtools_install()

R packages

Please also ensure to have all R packages installed as listed on the next preliminary code section.

# in case packages are missing, please run:
install.packages(c("here", "ggplot2", "dplyr", "knitr", "brms", "posterior", "tidybayes", "RBesT", "ggrepel", "patchwork", "ggdist", "withr"))

Web-site

To download the full web-site to complement the material you may do so using the following R code snippet:

bamdd_zip <- tempfile(fileext=".zip")
download.file("https://github.com/Novartis/bamdd/archive/refs/heads/main.zip", bamdd_zip)
## extracts web-site into the users home
unzip(bamdd_zip, exdir=normalizePath("~"))
browseURL(normalizePath(file.path("~", "bamdd-main")))
# to install all dependencies needed to build the web-site, please run
source(file.path("~", "bamdd-main", "src", "install_dependencies.R"))

Preliminary code

First load the required packages and set some options as described in the case study.

# uncomment below and set to the filename of you R file for the
# exercises
# here::i_am("your-exercise-file.R")
library(here)
library(ggplot2)
library(dplyr)
library(knitr)
library(brms)
library(posterior)
library(tidybayes)
library(bayesplot)
library(RBesT)
library(ggrepel)
library(patchwork)
library(ggdist)

options(
  # how many processor cores would you like to use?
  mc.cores = 4, 
  # how would you like to access Stan?
  brms.backend = "cmdstanr",
  # cache model binaries
  cmdstanr_write_stan_file_dir=here::here("_brms-cache"),
  # no need to normalize likelihoods
  brms.normalize = FALSE
)

dir.create(here::here("_brms-cache"), FALSE)
set.seed(254335)

# in case rstan is used as backend, consider to enable model caching
# by enabling the following line
# rstan_options(auto_write = TRUE)

# Set defaults for ggplot2 ----
theme_set(theme_bw(base_size=18))

scale_colour_discrete <- function(...) {
  # Alternative: ggsci::scale_color_nejm(...)
  # scale_colour_brewer(..., palette="Set1")
  ggthemes::scale_colour_colorblind(...)
}
scale_fill_discrete <- function(...) {
  # Alternative: ggsci::scale_fill_nejm(...)
  #scale_fill_brewer(..., palette="Set1")
  ggthemes::scale_fill_colorblind(...)
}
scale_colour_continuous <- function(...) {
  scale_colour_viridis_c(..., option="turbo")
}
update_geom_defaults("point", list(size=2))
update_geom_defaults("line", list(size=1.5))

Resources

Historical Control Data

Exercise setup

Create the region dataset used in the example:

withr::with_seed(654873,
                 AS_region <- bind_cols(RBesT::AS, region=sample(c("asia", "europe", "north_america"), 8, TRUE)))
kable(AS_region)

study	n	r	region
Study 1	107	23	europe
Study 2	44	12	asia
Study 3	51	19	europe
Study 4	39	9	europe
Study 5	139	39	asia
Study 6	20	6	north_america
Study 7	78	9	europe
Study 8	35	10	asia

And fit the model as shown in the case study:

model <- bf(r | trials(n) ~ 1 + (1 | study), family=binomial)

model_prior <- prior(normal(0, 2), class=Intercept) +
    prior(normal(0, 1), class=sd, coef=Intercept, group=study)

map_mc_brms  <- brm(model, AS_region, prior = model_prior,
                    seed = 4767, control=list(adapt_delta=0.99),
                    silent = 2, refresh = 0)
map_mc_brms

 Family: binomial 
  Links: mu = logit 
Formula: r | trials(n) ~ 1 + (1 | study) 
   Data: AS_region (Number of observations: 8) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Multilevel Hyperparameters:
~study (Number of levels: 8) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     0.37      0.20     0.04     0.83 1.01      938     1082

Regression Coefficients:
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept    -1.10      0.18    -1.45    -0.74 1.00     1416     1550

Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

Exercise 1: Posterior predictive check

Create a posterior predictive check based on the predictive distribution for the response rate.

Steps:

Use posterior_predict to create samples from the predictive distribution of outcomes per trial.
Use sweep(predictive, 2, AS_region$n, "/") to convert these samples from the predictive distribution of the outcome counts to samples from the predictive distribution for responder rates.
Use ppc_intervals from bayesplot to create a plot showing your results. Consider using the with function to refer more easily to the data columns of the analysis data set.
Redo the above using from the tidybayes package the add_predicted_rvars function adding the predictions directly to the analysis data set and then redo task 2 and 3. To convert a rvars column to the respective draws format of bayesplot you can use the as_draws_matrix from the posterior package.

Exercise 2: Fixed vs random effect

Redo the analysis with region, but treat region as a fixed effect. Evaluate the informativeness of the obtained MAP priors. The model formula for brms should look like

region_model_fixed <- bf(r | trials(n) ~ 1 + region + (1 | study), family=binomial)

Steps:

Consider the prior for the region fixed effect first. The reference region is included in the intercept. The reference region is implicitly defined by the first level of the variable region when defined as factor.
- Define asia to be the reference region in the example. Also include a level other in the set of levels.
- Assume that an odds-ratio of 2 between regions can be seen as very large such that a prior of \text{N}(0, (\log(2)/1.96)^2) for the region main effect is adequate.
Obtain the MAP prior for each region by using the AS_region_all data frame defined below and apply posterior_linpred as shown in the case study.
Convert the MCMC samples from the MAP prior distribution into mixture distributions with the same code as in the case study.
Calculate the effective sample size (ESS) for each prior distribution with the ess function from the RBesT R package.

AS_region_all <- data.frame(region=c("asia", "europe", "north_america", "other")) |>
    mutate(study=paste("new_study", region, sep="_"), r=0, n=6)

Exercise 3: Normal endpoint meta-analysis

Run the analysis for the normal endpoint in the crohn data set of RBesT. Refer to the RBesT vignette for a normal endpoint on more details and context.

Steps:

Use as family=gaussian and use the se response modifier in place of trials to specify a known standard error. More details on the additional response information specifiers can be found for the documentation of brmsformula.
Use the same priors as proposed in the vignette from RBesT.
Compare the obtained MAP prior (in MCMC sample form) from RBesT and brms.

Solutions Historical Control

Exercise 1: Posterior predictive check

pp_count_AS_region <- posterior_predict(map_mc_brms)

pp_rate_AS_region <- sweep(pp_count_AS_region, 2, AS_region$n, "/")

with(AS_region, ppc_intervals(r / n, pp_rate_AS_region))  +
    scale_x_continuous("Study", breaks=1:nrow(AS_region), labels=AS_region$study) +
    ylab("Responder Rate") +
    coord_flip() +
    theme(legend.position="right",
          # suppress vertical grid lines for better readability of intervals
          panel.grid.major.y = element_blank())

pp_AS_region <- AS_region |>
    add_predicted_rvars(map_mc_brms, value="pp_r") |>
    mutate(pp_rate=pp_r/n)

with(pp_AS_region, ppc_intervals(r / n, as_draws_matrix(pp_rate)))  +
    scale_x_continuous("Study", breaks=1:nrow(AS_region), labels=AS_region$study) +
    ylab("Responder Rate") +
    coord_flip() +
    theme(legend.position="right",
          # suppress vertical grid lines for better readability of intervals
          panel.grid.major.y = element_blank())

Exercise 2: Fixed vs random effect

AS_region_all <- data.frame(region=c("asia", "europe", "north_america", "other")) |>
    mutate(study=paste("new_study", region, sep="_"), r=0, n=6)

## to get brms to include the other factor level in the model, we have
## to add a fake row with region "other" and n=0
AS_region_2 <- mutate(bind_rows(AS_region, mutate(AS_region_all, n=0)[4,]),
                      region=factor(region, levels=c("asia", "europe", "north_america", "other")))

str(AS_region_2)

'data.frame':   9 obs. of  4 variables:
 $ study : chr  "Study 1" "Study 2" "Study 3" "Study 4" ...
 $ n     : num  107 44 51 39 139 20 78 35 0
 $ r     : num  23 12 19 9 39 6 9 10 0
 $ region: Factor w/ 4 levels "asia","europe",..: 2 1 2 2 1 3 2 1 4

model_fixed <- bf(r | trials(n) ~ 1 + region + (1|study), family=binomial)

get_prior(model_fixed, AS_region_2)

                prior     class                coef group resp dpar nlpar lb ub
               (flat)         b                                                
               (flat)         b        regioneurope                            
               (flat)         b regionnorth_america                            
               (flat)         b         regionother                            
 student_t(3, 0, 2.5) Intercept                                                
 student_t(3, 0, 2.5)        sd                                            0   
 student_t(3, 0, 2.5)        sd                     study                  0   
 student_t(3, 0, 2.5)        sd           Intercept study                  0   
       source
      default
 (vectorized)
 (vectorized)
 (vectorized)
      default
      default
 (vectorized)
 (vectorized)

model_fixed_prior <- prior(normal(0, 2), class=Intercept) +
    prior(normal(0, 1), class=sd, coef=Intercept, group=study) +
    prior(normal(0, log(2)/1.96), class=b)

fixed_mc_brms  <- brm(model_fixed, AS_region_2, prior=model_fixed_prior,
                      seed=4767,
                      silent = 2, refresh = 0, control=list(adapt_delta=0.99))

fixed_mc_brms

 Family: binomial 
  Links: mu = logit 
Formula: r | trials(n) ~ 1 + region + (1 | study) 
   Data: AS_region_2 (Number of observations: 9) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Multilevel Hyperparameters:
~study (Number of levels: 9) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)     0.38      0.22     0.04     0.90 1.00     1159     1208

Regression Coefficients:
                    Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept              -1.02      0.24    -1.51    -0.56 1.00     1957     1971
regioneurope           -0.17      0.26    -0.67     0.35 1.00     2020     2371
regionnorth_america     0.04      0.31    -0.56     0.64 1.00     3893     2962
regionother            -0.00      0.36    -0.69     0.70 1.00     3953     2920

Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

post_regions <- posterior_linpred(fixed_mc_brms,
                                  newdata=AS_region_all,
                                  transform=TRUE,
                                  allow_new_levels=TRUE,
                                  sample_new_levels="gaussian")

head(post_regions)

           [,1]      [,2]      [,3]      [,4]
[1,] 0.30404914 0.1931906 0.3161137 0.2367646
[2,] 0.07039955 0.2106194 0.1659046 0.3036233
[3,] 0.21368669 0.1590512 0.3322744 0.2683783
[4,] 0.37494614 0.2432188 0.3392217 0.2402818
[5,] 0.20217845 0.2834231 0.1709087 0.3674437
[6,] 0.26282384 0.3563941 0.2888155 0.2961940

colnames(post_regions) <- AS_region_all$region
map_region <- list()
for(r in AS_region_all$region) {
    map_region[[r]] <- mixfit(post_regions[,r], type="beta", Nc=3, constrain_gt1=TRUE)
}

kable(bind_rows(lapply(map_region, summary), .id="MAP"), digits=3)

MAP	mean	sd	2.5%	50.0%	97.5%
asia	0.278	0.096	0.111	0.269	0.522
europe	0.244	0.090	0.105	0.232	0.486
north_america	0.286	0.108	0.109	0.276	0.549
other	0.282	0.116	0.096	0.268	0.561

sapply(map_region, ess)

         asia        europe north_america         other 
     28.94846      34.15330      19.51994      15.76238

Exercise 3: Normal endpoint meta-analysis

crohn <- RBesT::crohn

crohn_sigma <- 88
crohn$y.se <- crohn_sigma/sqrt(crohn$n)

library(RBesT)
set.seed(1234)
rbest_normal_map_mcmc <- gMAP(cbind(y, y.se) ~ 1 | study, 
                              weights=n, data=crohn,
                              family=gaussian,
                              beta.prior=cbind(0, crohn_sigma),
                              tau.dist="HalfNormal",tau.prior=cbind(0,crohn_sigma/2))

model_normal <- bf(y | se(y.se) ~ 1 + (1 | study), family=gaussian())

prior_normal <- prior(normal(0, 88), class=Intercept) +
    prior(normal(0, 88/2), class=sd, coef=Intercept, group=study)

brms_normal_map_mcmc <- brm(model_normal, crohn, prior=prior_normal,
                            seed=4767,
                            silent = 2, refresh = 0, control=list(adapt_delta=0.99))

## comparing the outputs we see that the random effect posterior
## matches...
rbest_normal_map_mcmc

Generalized Meta Analytic Predictive Prior Analysis

Call:  gMAP(formula = cbind(y, y.se) ~ 1 | study, family = gaussian, 
    data = crohn, weights = n, tau.dist = "HalfNormal", tau.prior = cbind(0, 
        crohn_sigma/2), beta.prior = cbind(0, crohn_sigma))

Exchangeability tau strata: 1 
Prediction tau stratum    : 1 
Maximal Rhat              : 1 
Estimated reference scale : 88 

Between-trial heterogeneity of tau prediction stratum
 mean    sd  2.5%   50% 97.5% 
14.50  9.74  1.33 12.60 39.60 

MAP Prior MCMC sample
 mean    sd  2.5%   50% 97.5% 
-49.7  19.7 -91.4 -48.3 -10.7

brms_normal_map_mcmc

 Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: y | se(y.se) ~ 1 + (1 | study) 
   Data: crohn (Number of observations: 6) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Multilevel Hyperparameters:
~study (Number of levels: 6) 
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(Intercept)    14.50      9.80     1.57    39.37 1.00      928     1522

Regression Coefficients:
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept   -49.80      8.77   -68.88   -33.19 1.00      977      971

Further Distributional Parameters:
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     0.00      0.00     0.00     0.00   NA       NA       NA

Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

brms_normal_map <- posterior_epred(brms_normal_map_mcmc,
                                   newdata=data.frame(study="new", n=1, y=0, y.se=88),
                                   allow_new_levels=TRUE,
                                   sample_new_levels="gaussian")

## ... and the MAP prior is also the same
summarise_draws(brms_normal_map, mean, sd, ~quantile2(., probs = c(0.025, 0.5, 0.975)))

# A tibble: 1 × 6
  variable  mean    sd  q2.5   q50 q97.5
  <chr>    <dbl> <dbl> <dbl> <dbl> <dbl>
1 ...1     -49.4  18.7 -88.7 -48.3 -10.1

Dose-finding

Peanut allergy exercise overview

This exercises accompanies the dose finding case study.

In a hypothetical phase 2b trial children and adolescents with peanut allergy were randomly assigned to placebo or 5 doses (5 to 300 mg) of a new drug.

After 26 weeks the patients underwent a double-blind placebo-controlled food challenge and the primary endpoint was the proportion of patients that could ingest 600 mg or more of peanut protein without experiencing dose-limiting symptoms.

The plots below show an overview of the data. Note that no placebo patients were considered “responders” per the primary endpoint definition.

Show data setup code

peanut <- tibble(
  TRT01P = structure(c(
    6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L,
    6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 5L, 5L, 5L, 5L, 5L, 5L,
    5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L,
    5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
    4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
    4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
    3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
    3L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
    2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L,
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L
  ),levels = c("PBO", "5 mg", "15 mg", "50 mg", "150 mg", "300 mg"), class = "factor"),
  dose = c(
    300L, 300L, 300L, 300L, 300L, 300L, 300L, 300L, 300L, 300L, 300L, 300L, 300L, 300L, 300L, 300L, 300L, 300L, 300L,
    300L, 300L, 300L, 300L, 300L, 300L, 300L, 300L, 300L, 300L, 300L, 300L, 300L, 300L, 300L, 300L, 300L, 300L, 300L,
    300L, 300L, 300L, 300L, 300L, 300L, 300L, 300L, 300L, 300L, 300L, 300L, 150L, 150L, 150L, 150L, 150L, 150L, 150L,
    150L, 150L, 150L, 150L, 150L, 150L, 150L, 150L, 150L, 150L, 150L, 150L, 150L, 150L, 150L, 150L, 150L, 150L, 150L,
    150L, 150L, 150L, 150L, 150L, 150L, 150L, 150L, 150L, 150L, 150L, 150L, 150L, 150L, 150L, 150L, 150L, 150L, 150L,
    150L, 150L, 150L, 150L, 150L, 50L, 50L, 50L, 50L, 50L, 50L, 50L, 50L, 50L, 50L, 50L, 50L, 50L, 50L, 50L, 50L, 50L,
    50L, 50L, 50L, 50L, 50L, 50L, 50L, 50L, 50L, 50L, 50L, 50L, 50L, 50L, 50L, 50L, 50L, 50L, 50L, 50L, 50L, 50L, 50L,
    50L, 50L, 50L, 50L, 50L, 50L, 50L, 50L, 50L, 50L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L,
    15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L,
    15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L,
    5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L,
    5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
    0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
    0L, 0L),
  AVAL = c(
    1000L, 100L, 0L, 600L, 600L, 300L, 300L, 10L, 10L, 100L, 1000L, 300L, 10L, 1000L, 1000L, 1000L, 3L, 3L, 600L, 300L,
    1000L, 1000L, 100L, 1000L, 600L, 600L, 1000L, 1000L, 1000L, 300L, 600L, 1000L, 1000L, 1000L, 1000L, 1000L, 300L,
    1000L, 600L, 1000L, 300L, 1000L, 600L, 600L, 1000L, 1000L, 600L, 1000L, 1000L, 1000L, 1000L, 600L, 1000L, 1000L,
    1000L, 0L, 600L, 300L, 1000L, 1000L, 1000L, 100L, 1000L, 1000L, 600L, 1000L, 1000L, 1000L, 600L, 300L, 600L, 600L,
    100L, 600L, 1000L, 300L, 10L, 3L, 1000L, 300L, 300L, 1000L, 300L, 300L, 10L, 1000L, 1000L, 10L, 10L, 600L, 3L, 10L,
    600L, 600L, 600L, 1000L, 100L, 1000L, 1000L, 10L, 1000L, 3L, 10L, 1000L, 100L, 1000L, 100L, 10L, 300L, 10L, 100L,
    1000L, 0L, 1000L, 100L, 3L, 10L, 100L, 100L, 300L, 1000L, 1000L, 1000L, 10L, 1000L, 3L, 3L, 600L, 600L, 10L, 3L,
    600L, 600L, 300L, 300L, 1000L, 3L, 3L, 1000L, 10L, 1000L, 1000L, 600L, 100L, 300L, 600L, 10L, 100L, 0L, 100L, 3L,
    0L, 10L, 3L, 600L, 300L, 300L, 300L, 600L, 300L, 100L, 3L, 0L, 10L, 600L, 300L, 10L, 300L, 600L, 1000L, 600L, 600L,
    0L, 0L, 600L, 600L, 600L, 0L, 0L, 300L, 100L, 0L, 10L, 300L, 1000L, 300L, 600L, 600L, 300L, 10L, 600L, 100L, 100L,
    300L, 3L, 3L, 300L, 1000L, 10L, 3L, 100L, 3L, 100L, 100L, 300L, 3L, 3L, 600L, 300L, 3L, 3L, 3L, 300L, 3L, 0L, 10L,
    3L, 300L, 10L, 10L, 600L, 0L, 300L, 600L, 0L, 0L, 100L, 100L, 10L, 100L, 10L, 100L, 600L, 0L, 600L, 0L, 10L, 100L,
    0L, 0L, 0L, 0L, 3L, 10L, 3L, 300L, 600L, 0L, 0L, 300L, 10L, 10L, 100L, 300L, 0L, 0L, 0L, 3L, 0L, 0L, 10L, 0L, 300L,
    3L, 0L, 0L, 0L, 0L, 100L, 3L, 0L, 3L, 10L, 10L, 3L, 0L, 3L, 10L, 3L, 0L, 0L, 3L, 100L, 0L, 0L, 300L, 0L, 0L, 0L,
    0L, 0L, 0L, 0L, 3L, 0L, 3L, 0L, 0L, 0L, 0L),
  PARAM = structure(
    c(
      1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
      1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
      1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
      1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
      1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
      1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
      1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
      1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
      1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
      1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
      1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
    levels = "Tolerated amount of peanut protein (mg)", class = "factor"),
  USUBJID = c(
    129L, 103L, 104L, 185L, 34L, 15L, 140L, 151L, 7L, 144L, 200L, 23L, 138L, 115L, 255L, 147L, 224L, 101L, 156L, 284L,
    136L, 248L, 179L, 298L, 168L, 295L, 289L, 241L, 36L, 27L, 123L, 266L, 11L, 291L, 236L, 130L, 173L, 195L, 203L, 160L,
    274L, 167L, 290L, 94L, 9L, 269L, 122L, 135L, 64L, 26L, 95L, 10L, 5L, 234L, 161L, 299L, 88L, 69L, 35L, 233L, 286L,
    85L, 91L, 189L, 80L, 152L, 223L, 287L, 244L, 57L, 108L, 18L, 62L, 157L, 300L, 283L, 164L, 243L, 89L, 220L, 24L, 271L,
    166L, 118L, 201L, 127L, 121L, 41L, 267L, 213L, 49L, 73L, 202L, 134L, 112L, 25L, 227L, 29L, 251L, 273L, 119L, 132L,
    74L, 270L, 83L, 37L, 181L, 258L, 253L, 48L, 120L, 54L, 277L, 176L, 65L, 264L, 107L, 171L, 262L, 162L, 187L, 272L,
    288L, 294L, 245L, 109L, 172L, 204L, 275L, 22L, 66L, 186L, 247L, 17L, 149L, 141L, 177L, 280L, 216L, 40L, 75L, 263L,
    246L, 14L, 81L, 260L, 153L, 45L, 237L, 8L, 117L, 86L, 296L, 146L, 154L, 116L, 38L, 100L, 191L, 175L, 92L, 158L, 192L,
    180L, 256L, 254L, 125L, 222L, 145L, 261L, 155L, 159L, 3L, 206L, 278L, 19L, 31L, 63L, 208L, 55L, 259L, 218L, 111L,
    226L, 33L, 2L, 44L, 297L, 53L, 87L, 16L, 28L, 90L, 207L, 56L, 137L, 128L, 178L, 142L, 143L, 148L, 193L, 229L, 265L,
    97L, 252L, 205L, 150L, 165L, 188L, 52L, 99L, 93L, 1L, 221L, 124L, 210L, 6L, 232L, 21L, 211L, 163L, 96L, 60L, 183L,
    190L, 242L, 42L, 46L, 67L, 126L, 209L, 72L, 194L, 238L, 184L, 39L, 105L, 249L, 61L, 113L, 30L, 77L, 12L, 4L, 51L,
    139L, 20L, 268L, 215L, 292L, 217L, 199L, 32L, 276L, 47L, 225L, 230L, 79L, 71L, 98L, 50L, 13L, 76L, 231L, 250L, 58L,
    68L, 239L, 198L, 293L, 212L, 110L, 59L, 182L, 133L, 170L, 43L, 282L, 281L, 131L, 114L, 196L, 214L, 285L, 70L, 102L,
    279L, 106L, 197L, 257L, 228L, 84L, 235L, 78L, 169L, 240L, 219L, 174L, 82L),
  CRIT1 = structure(c(
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
    levels = "Tolerating >=600 mg of peanut protein without dose-limiting symptoms", class = "factor"),
  CRIT1FL = structure(c(
    2L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 2L, 1L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L,
    2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L,
    2L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 1L, 2L, 2L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 1L, 1L,
    1L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 1L, 2L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 2L,
    1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 2L, 1L,
    2L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L,
    2L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L,
    1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 2L,
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L,
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
    levels = c("N", "Y"), class = "factor"),
  CRIT2 = structure(c(
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
    levels = "Tolerating >=1000 mg of peanut protein without dose-limiting symptoms", class = "factor"),
  CRIT2FL = structure(c(
    2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 2L, 1L, 1L, 2L, 2L,
    2L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 1L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 1L,
    1L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 1L, 1L,
    1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 1L, 2L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 2L,
    1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 1L,
    2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
    1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
    1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
    levels = c("N", "Y"), class = "factor" ) )

Code

peanut |>
  ggplot(aes(x=factor(AVAL), fill=TRT01P)) +
  geom_bar( position=position_dodge(preserve = "single")) +
  scale_fill_manual(values=c("royalblue", "#fb6a4a", "#ef3b2c", "#cb181d", "#a50f15", "#67000d")) +
  xlab("Peanut protein tolerated without dose-limiting symptoms (mg)") +
  ylab("Patients") +
  theme(legend.position="right")

Code

peanut |>
  group_by(TRT01P) |>
  summarize(proportion = sum(CRIT1FL=="Y") / n()) |>
  ggplot(aes(x=TRT01P, y=proportion, fill=TRT01P)) +
  geom_bar(stat="identity", position=position_dodge()) +
  geom_text(aes(label=scales::percent(proportion)), size=7, vjust=c(0, rep(1.5, 5))) +
  scale_fill_manual(values=c("royalblue", "#fb6a4a", "#ef3b2c", "#cb181d", "#a50f15", "#67000d")) +
  scale_y_continuous(breaks=seq(0, 0.7, 0.1), labels=scales::percent) +
  xlab("Tolerating >=600 mg of peanut protein without dose-limiting symptoms") +
  ylab("Percentage of patients")

Excercise 1: Emax logistic regression

Fit a sigmoid Emax logistic regression model (i.e. family=binomial(link="logit")). To accelerate likelihood evaluations, first summarize data as “responders” y out of n patients per dose. We tell brms to use this information using y | trials(n) ~ ....

We expect the true placebo proportion to be around 0.05 (-2.944 on the logit scale) with much more than 0.1 or much less than 0.02 considered unlikely. It is a-priori at least possible that there is a huge treatment effect such as 95% versus 5% responders (difference on the log-scale close to 6), but we are at least mildly skeptical.

The dose response is a-priori somewhat likely to follow a Emax curve (with Hill parameter near 1), but we wish to allow for the possibility of a steeper or shallower curve. The dose with half the effect (ED50) might be near 15 mg, but have considerable uncertainty around that.

model1 <- bf( #---- YOUR CODE HERE --- # ~ E0 + Emax * dose^h/(dose^h + ED50^h),
              family= #---- YOUR CODE HERE --- # ,
              nlf(h ~ exp(logh)), nlf(ED50 ~ exp(logED50)),
              E0 ~ 1, logED50 ~ 1, logh ~ 1, Emax ~ 1, 
              nl=TRUE)

priors1 <- prior(normal(-2.944, 0.38), nlpar=E0) +
           prior(normal(0, 1), nlpar=logh) +
           prior(normal(0, 6), nlpar=Emax) +
           prior(normal(2.7, 2.5), nlpar=logED50)

Now fit the model

brmfit1 <- brm(
  formula = model1,
  prior = priors1, 
  data = summarize(peanut, y=sum(CRIT1FL=="Y"), n=n(), .by=c(TRT01P, dose))
)

Now use

tibble(dose = seq(0, 300, 1), n=1) |> tidybayes::add_epred_rvars(brmfit1)

to obtain predicted proportions for every dose from 0 to 300 mg.

Then, plot the curve of predicted proportions for each of these doses using ggplot2 and ggdist (using ydist=.epred in the aesthetics and the stat_lineribbon() geom from the ggdist package).

If you prefer, replace the default fill colors e.g. with shades of blue using + scale_fill_brewer(palette="Blues")

Next obtain the posterior distribution for the difference in proportions for each of these dose levels compared with placebo.

Excercise 2: ordinal outcome and interval censored

Try to perform a more efficient statistical analysis by either using an ordinal outcome or treating the data as interval censored.

Interval censored

Try treating the data about the logarithm of the amount of protein tolerated as interval censored (-\infty to < \log(3), >= \log(3) to < \log(10), …) using logAVAL | cens("interval", logAVALnext) as endpoint and with family = gaussian(). For the latter approach you might have to set log(0) to a very low number such as -28 (approximate \log(\text{weight}) of one single peanut protein molecule) and assume the upper interval end when a patient could tolerate 1000 mg to, say, \log(30000) (approximately 100 peanut kernels). This achieves approximately the same thing as setting these to -\infty and \infty, but is handled better by brms/Stan.

How would you change the prior distributions in this case?

Ordinal outcome aproach

Now use an ordinal outcome via family = cumulative(link = "logit", threshold = "flexible"). Details of this model are described in a journal article, for which there is also a PsyArXiv preprint in case you do not have access to the journal.

Note that the placebo response is described by a series of K-1 ordered thresholds \theta_1, \ldots, \theta_{K-1} on the logit scale for K ordered categories. For the placebo group, the probability of being in categories k=1, \ldots, K-1 is be given by \text{logit}^{-1} \theta_k and for category K by 1-\text{logit}^{-1} \theta_{K-1}. In this case, these parameters will appear in summary(brmfit3) as Intercept[1], …, Intercept[6].

How would you change the model and the prior distributions in this case?

Note, if we had a baseline amount of protein tolerated, we could treat this as a monotonic covariate e.g. using mo(BASE) or assume a particular functional form.

Solutions Dose-Finding

Excercise 1: Emax logistic regression

model1 <- bf( y | trials(n) ~ E0 + Emax * dose^h/(dose^h + ED50^h),
              family=binomial(link="logit"),
              nlf(h ~ exp(logh)), nlf(ED50 ~ exp(logED50)),
              E0 ~ 1, logED50 ~ 1, logh ~ 1, Emax ~ 1, 
              nl=TRUE)

priors1 <- prior(normal(-2.944, 0.38), nlpar=E0) +
           prior(normal(0, 1), nlpar=logh) +
           prior(normal(0, 6), nlpar=Emax) +
           prior(normal(2.7, 2.5), nlpar=logED50)


brmfit1 <- brm(
  formula = model1,
  prior = priors1, 
  data = summarize(peanut, y=sum(CRIT1FL=="Y"), n=n(), .by=c(TRT01P, dose)),
  refresh=0
)

Running MCMC with 4 parallel chains...

Chain 1 finished in 0.2 seconds.
Chain 2 finished in 0.2 seconds.
Chain 3 finished in 0.2 seconds.
Chain 4 finished in 0.2 seconds.

All 4 chains finished successfully.
Mean chain execution time: 0.2 seconds.
Total execution time: 0.5 seconds.

summary(brmfit1)

 Family: binomial 
  Links: mu = logit 
Formula: y | trials(n) ~ E0 + Emax * dose^h/(dose^h + ED50^h) 
         h ~ exp(logh)
         ED50 ~ exp(logED50)
         E0 ~ 1
         logED50 ~ 1
         logh ~ 1
         Emax ~ 1
   Data: summarize(peanut, y = sum(CRIT1FL == "Y"), n = n() (Number of observations: 6) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Regression Coefficients:
                  Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
E0_Intercept         -3.22      0.32    -3.87    -2.59 1.00     1810     2055
logED50_Intercept     3.69      1.23     1.99     6.56 1.00      973     1049
logh_Intercept       -0.62      0.38    -1.31     0.16 1.00      994     1295
Emax_Intercept        5.56      1.63     3.52     9.65 1.00      885     1004

Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

tibble(dose = seq(0, 300, 1), n=1) |> 
  tidybayes::add_epred_rvars(brmfit1) |>
  ggplot(aes(x=dose, ydist=.epred)) +
  stat_lineribbon() +
  scale_fill_brewer(palette="Blues") +
  ylab("Model predicted proportion") +
  xlab("Dose [mg]")

pr <- tibble(dose = seq(0, 300, 1), n=1) |> 
    tidybayes::add_epred_rvars(brmfit1)

left_join(pr, filter(pr, dose == 0)|> 
              dplyr::select(-dose) |> 
              rename(.pbo=.epred), 
          by="n") |>
    mutate(diff = .epred - .pbo) |>
    ggplot(aes(x=dose, ydist=diff)) +
    stat_lineribbon() +
    scale_fill_brewer(palette="Blues") +
    ylab("Model predicted difference\nin proportion vs. placebo") +
    xlab("Dose [mg]")

Excercise 2: ordinal outcome and interval censored

Interval censored

model2 <- bf( logAVAL | cens("interval", logAVALnext) ~ E0 + Emax * dose^h/(dose^h + ED50^h),
              family = gaussian(),
              nlf(h ~ exp(logh)), nlf(ED50 ~ exp(logED50)),
              E0 ~ 1, logED50 ~ 1, logh ~ 1, Emax ~ 1, 
              nl=TRUE )

priors2 <- prior(normal(0, 5), nlpar=E0) + # centered on 1 mg, but
    # very uncertain
    prior(normal(0, 1), nlpar=logh) + # No particular reason
    # to change prior (dose
    # response could be
    # just as steep here as
    # for binomial outcome)
    prior(normal(0, 6.5), nlpar=Emax) + # decent prior probability
    # for >= 1000 mg, still conceivable could reach 300000 No
    # particular reason to change prior here (ED50 could be same as
    # for binomial outcome)
    prior(normal(2.7, 2.5), nlpar=logED50) + # Additional parameter: residual SD, clearly a decent amount of
    # variability, want to
    # allow large SD on
    # the log-scale
    prior(lognormal(5, 10), class=sigma) 

brmfit2 <- brm(
  formula = model2,
  prior = priors2,
  data = peanut |> 
    mutate(logAVAL = ifelse(AVAL==0, -28, log(AVAL)),
           logAVALnext = case_when(AVAL==0 ~ log(3),
                                   AVAL==3 ~ log(10),
                                   AVAL==10 ~ log(100),
                                   AVAL==100 ~ log(300),
                                   AVAL==300 ~ log(600),
                                   AVAL==600 ~ log(1000),
                                   TRUE ~ log(30000))),
  refresh=0
  )

Running MCMC with 4 parallel chains...

Chain 4 finished in 4.6 seconds.
Chain 3 finished in 4.6 seconds.
Chain 2 finished in 4.8 seconds.
Chain 1 finished in 5.1 seconds.

All 4 chains finished successfully.
Mean chain execution time: 4.8 seconds.
Total execution time: 5.2 seconds.

summary(brmfit2)

 Family: gaussian 
  Links: mu = identity; sigma = identity 
Formula: logAVAL | cens("interval", logAVALnext) ~ E0 + Emax * dose^h/(dose^h + ED50^h) 
         h ~ exp(logh)
         ED50 ~ exp(logED50)
         E0 ~ 1
         logED50 ~ 1
         logh ~ 1
         Emax ~ 1
   Data: mutate(peanut, logAVAL = ifelse(AVAL == 0, -28, lo (Number of observations: 300) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Regression Coefficients:
                  Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
E0_Intercept          0.90      0.39     0.12     1.66 1.00     2378     2397
logED50_Intercept     3.13      1.04     1.74     5.63 1.00     1002      879
logh_Intercept       -0.64      0.33    -1.24     0.04 1.00     1027     1405
Emax_Intercept        7.66      1.80     5.33    11.97 1.00      942      795

Further Distributional Parameters:
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     2.42      0.12     2.20     2.66 1.00     2488     2235

Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

Ordinal outcome aproach

# Note that we omit E0 from the model, because the expected placebo outcome
# is already described by the intercept parameters of the distribution
model3 <- bf(AVAL ~ Emax * dose^h/(dose^h + ED50^h),
             family = cumulative(link = "logit", threshold = "flexible"),
             nlf(h ~ exp(logh)),
             nlf(ED50 ~ exp(logED50)),
             logED50 ~ 1,
             logh ~ 1,
             Emax ~ 1, 
             nl=TRUE)

priors3 <- prior(normal(0, 1), nlpar=logh) + # No particular reason to
                                             # change prior here
           prior(normal(0, 6), nlpar=Emax) + # Prior could be
                                             # different (odds from
                                             # one category to the
                                             # next not necessarily
                                             # the same as for top two
                                             # vs. rest)
           prior(normal(2.7, 2.5), nlpar=logED50) + # No particular
                                                    # reason to change
                                                    # prior here
           prior(student_t(3, 0, 2.5), class=Intercept) +
           prior(student_t(3, -0.2, 1), class=Intercept, coef=1) +
           prior(student_t(3, 2.197, 1), class=Intercept, coef=5)

brmfit3 <- brm(
  formula = model3,
  prior = priors3,
  data = mutate(peanut, AVAL = ordered(AVAL)),
  refresh=0
  )

Running MCMC with 4 parallel chains...

Chain 2 finished in 13.6 seconds.
Chain 1 finished in 16.0 seconds.
Chain 3 finished in 16.0 seconds.
Chain 4 finished in 17.6 seconds.

All 4 chains finished successfully.
Mean chain execution time: 15.8 seconds.
Total execution time: 17.9 seconds.

summary(brmfit3)

 Family: cumulative 
  Links: mu = logit; disc = identity 
Formula: AVAL ~ Emax * dose^h/(dose^h + ED50^h) 
         h ~ exp(logh)
         ED50 ~ exp(logED50)
         logED50 ~ 1
         logh ~ 1
         Emax ~ 1
   Data: mutate(peanut, AVAL = ordered(AVAL)) (Number of observations: 300) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Regression Coefficients:
                  Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept[1]          0.07      0.24    -0.41     0.55 1.00     3580     2784
Intercept[2]          1.03      0.26     0.54     1.55 1.00     4218     2715
Intercept[3]          1.75      0.28     1.23     2.31 1.00     3980     2849
Intercept[4]          2.30      0.29     1.75     2.87 1.00     4128     2761
Intercept[5]          3.05      0.30     2.48     3.66 1.00     4472     2738
Intercept[6]          4.04      0.32     3.42     4.68 1.00     4340     3034
logED50_Intercept     3.95      1.23     2.25     7.04 1.01     1275     1009
logh_Intercept       -0.64      0.30    -1.18    -0.02 1.00     1372     2008
Emax_Intercept        5.67      1.71     3.52    10.29 1.01     1155      982

Further Distributional Parameters:
     Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
disc     1.00      0.00     1.00     1.00   NA       NA       NA

Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).