Simulation 2 was designed to evaluate (1) the effect of the predominant source of variance (subject or residual) and (2) the effect of using pixels as replicates on the error rates of tests from different model specifications. We start from the same base image objects and impose known feature-level effects, subject-level variation, and residual variation. We then model abundance, either from sample means or using pixels as replicates, before comparing false and true positive rates.
library(matrixStats)
library(broom.mixed)
library(emmeans)
library(ggpubr)
library(lmerTest)
library(dplyr)
library(tidyr)
library(ggplot2)
library(Cardinal)
library(data.table)
RNGkind(kind = "L'Ecuyer-CMRG")
source("Utilities.R")
source("SimulationUtilities.R")
setCardinalVerbose(verbose = FALSE)
We first define the baseline intensity level, a constant target pixel standard deviation, and the sample design. The simulation uses 16 samples: 8 subjects crossed with a binary tissue indicator, with a binary condition indicator assigned between subjects. Each subject therefore contributes 2 tissues but only 1 condition.
baseline <- 50
pixel_sd <- 45
pdx <- PositionDataFrame(expand_grid(x = 1:100, y = 1:100))
pdx$all <- TRUE
fdx <- MassDataFrame(1:3000)
fdx$all <- baseline
samples <- data.frame(
subject = c(1:8, 1:8),
tissue = c(rep(0, 8), rep(1, 8)),
condition = c(rep(0, 4), rep(1, 4), rep(0, 4), rep(1, 4))
) |>
mutate(id = paste0(subject, "_", tissue, "_", condition))
knitr::kable(samples)
| subject | tissue | condition | id |
|---|---|---|---|
| 1 | 0 | 0 | 1_0_0 |
| 2 | 0 | 0 | 2_0_0 |
| 3 | 0 | 0 | 3_0_0 |
| 4 | 0 | 0 | 4_0_0 |
| 5 | 0 | 1 | 5_0_1 |
| 6 | 0 | 1 | 6_0_1 |
| 7 | 0 | 1 | 7_0_1 |
| 8 | 0 | 1 | 8_0_1 |
| 1 | 1 | 0 | 1_1_0 |
| 2 | 1 | 0 | 2_1_0 |
| 3 | 1 | 0 | 3_1_0 |
| 4 | 1 | 0 | 4_1_0 |
| 5 | 1 | 1 | 5_1_1 |
| 6 | 1 | 1 | 6_1_1 |
| 7 | 1 | 1 | 7_1_1 |
| 8 | 1 | 1 | 8_1_1 |
The code below was used to generate the base images for Simulations 1 and 2. In principle the images can be regenerated by running it, but the interaction of MulticoreParam with RNG seeds complicates random number generation and may produce different results on different machines. We will instead load the images from the RData object supplied on MassIVE.
#set.seed(123, kind = "L'Ecuyer-CMRG")
# sim_list_original <- list()
# for (i in 1:nrow(samples)) {
# sim_list_original[[i]] <- simulateImage(pdx,
# fdx,
# sdpixel = pixel_sd,
# sdrun = 0,
# sdmz = 0,
# sdmult = 0,
# sdpeaks = 0,
# sdnoise = 0.0,
# SAR = FALSE,
# baseline = 0,
# spcorr = 0.5,
# centroided = TRUE,
# verbose = TRUE,
# BPPARAM=MulticoreParam())
# }
#
# save(sim_list_original, file = "sim_list_original.RData")
load(file.path("Data", "Simulation-Data", "sim_list_original.RData"))
The next block is where the main simulation assumptions are encoded. We specify:
- delta, the magnitude of the fixed tissue and condition effects
- gamma, the scale of the subject-level random component
- resid, the scale of the sample-level residual variation
delta <- 15
gamma <- sqrt(20)
resid <- sqrt(120)
effects <- tibble(
mz = mz(sim_list_original[[1]]),
tissue = c(rep(0, 1500), rep(delta, 1500)),
condition = c(rep(0, 1500), rep(delta, 1500))
)
knitr::kable(head(effects, 8))
| mz | tissue | condition |
|---|---|---|
| 1 | 0 | 0 |
| 2 | 0 | 0 |
| 3 | 0 | 0 |
| 4 | 0 | 0 |
| 5 | 0 | 0 |
| 6 | 0 | 0 |
| 7 | 0 | 0 |
| 8 | 0 | 0 |
We make the effects of tissue and condition 0 for the first 1500 features, then 15 each for the last 1500. By generating features that do and do not have effects for the independent variables, we can quantify both true and false positive rates.
The script then generates two sources of random variation, both of which are centered at zero to enforce the modeling assumption that sources of random error are not systematically biased. This will preserve the intended baseline and fixed-effect interpretation on average.
set.seed(2025)
gamma_mat_cad <- matrix(rnorm(4 * nrow(effects), sd = gamma), nrow = 4)
gamma_mat_oa <- matrix(rnorm(4 * nrow(effects), sd = gamma), nrow = 4)
gamma_mat_cad <- scale(gamma_mat_cad, center = TRUE, scale = FALSE)
gamma_mat_oa <- scale(gamma_mat_oa, center = TRUE, scale = FALSE)
gamma_mat <- rbind(gamma_mat_oa, gamma_mat_cad, gamma_mat_oa, gamma_mat_cad)
residual_mat <- matrix(rnorm(16 * nrow(effects), sd = resid), nrow = 16)
residual_mat <- scale(residual_mat, center = TRUE, scale = FALSE)
dim(gamma_mat)
## [1] 16 3000
dim(residual_mat)
## [1] 16 3000
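As a quick sanity check of the centering step, a toy version (hypothetical 4 x 5 matrix) shows that `scale(center = TRUE, scale = FALSE)` forces every column of random draws to average exactly zero:

```r
# Column-centering a toy random-effect matrix: after centering, each feature's
# draws sum to zero, so baseline and fixed effects are preserved on average.
set.seed(1)
toy <- matrix(rnorm(20, sd = sqrt(20)), nrow = 4)
toy_centered <- scale(toy, center = TRUE, scale = FALSE)
max(abs(colMeans(toy_centered)))  # numerically zero
```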
The fixed effects and random components are then combined into a sample-by-feature matrix of target means.
effects_matrix <- sapply(1:16, function(i) {
samples[[i, "tissue"]] * effects$tissue +
samples[[i, "condition"]] * effects$condition +
gamma_mat[i, ] +
residual_mat[i, ] +
baseline
})
if (any(effects_matrix < 0)) {
stop("Some effects have negative means. Increase baseline.")
}
MSI abundances cannot be negative, so we check that no target mean falls below zero. This is technically an oversimplification of our simulation: it enforces the assumption that variation in each feature is independent of intensity (i.e., the random components that make up the intensity of a specific feature for each pixel are normally distributed and symmetric, even when intensity is close to zero). In reality, intensity-dependent missingness means that assumption likely does not hold near zero. To sidestep the issue for now, we raised the baseline.
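To see why the baseline matters, a back-of-the-envelope check with this simulation's pixel standard deviation (45) shows how much normal mass falls below zero at different baselines:

```r
# Probability that a normally distributed pixel intensity is negative.
# At a baseline of 50, roughly 13% of pixel-level mass sits below zero;
# tripling the baseline makes it negligible.
pnorm(0, mean = 50, sd = 45)
pnorm(0, mean = 150, sd = 45)
```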
The next step applies the target intensity to each feature, preserving the spatial autocorrelation of pixels within each feature. We are directly modeling ROIs, so there is no need to define them.
sim_list_adj <- list()
for (i in 1:length(sim_list_original)) {
m <- sim_list_original[[i]]
intensity(m) <- scale_row_sd_mean(
as.matrix(intensity(m)),
target_sd = pixel_sd,
target_mean = effects_matrix[, i]
)
sim_list_adj[[i]] <- summarizeFeatures(
m,
stat = c("mean", "sd"),
na.rm = TRUE,
verbose = FALSE
)
}
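The core of this rescaling can be sketched in a few lines (the real scale_row_sd_mean() lives in Utilities.R and operates row-wise on the full intensity matrix; this standalone version is an assumption about its essential behavior):

```r
# Standardize a vector of pixel intensities, then shift/scale it to the target
# mean and sd. The transformation is affine, so the relative spatial pattern
# (and hence the spatial autocorrelation) across pixels is unchanged.
rescale_to <- function(x, target_sd, target_mean) {
  (x - mean(x)) / sd(x) * target_sd + target_mean
}
x <- c(10, 12, 15, 11, 13)
y <- rescale_to(x, target_sd = 45, target_mean = 50)
c(mean(y), sd(y), cor(x, y))  # 50, 45, 1
```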
mse_frame_sim_list <- extract_data_for_modeling_alt(sim_list_adj, samples)
The last step takes the mean pixel intensity for each feature. At this point the simulation has created one summary value per sample and feature.
Before fitting models, it is useful to inspect two representative features, one unaffected and one affected. Feature 1000 lies in the null half of the spectrum, where condition and tissue have no relationship with intensity, whereas feature 2500 lies in the half where intensity is affected by both condition and tissue.
m_context <- cbind(sim_list_adj[[1]], sim_list_adj[[12]])
image(
m_context,
i = c(1000, 2500),
layout = c(2, 2),
scale = TRUE,
superpose = FALSE,
key = FALSE,
grid = FALSE,
xaxt = "n",
yaxt = "n",
ylab = NA,
xlab = NA
)
The first analysis treats each summarized image as the observational unit, using the per-sample feature means.
The script fits the same mixed-effects model to every feature with
lme4::lmer, using fixed effects for tissue,
condition, and their interaction, plus a random intercept
for subject.
run_models_sim_list <- run_models_MEMORY_EFF(
mse_frame_sim_list,
formu = "mean ~ tissue * condition + (1|subject)"
)
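run_models_MEMORY_EFF() is defined in the sourced utility scripts; per feature, the fit it performs amounts to the following (a self-contained sketch on toy data with hypothetical values, not the helper itself):

```r
library(lmerTest)  # lmer() with Satterthwaite-based tests

# Toy data mimicking one feature: 8 subjects x 2 tissues, with condition
# assigned between subjects (subjects 5-8), as in the sample table above.
toy <- expand.grid(subject = factor(1:8), tissue = c(0, 1))
toy$condition <- as.numeric(toy$subject %in% as.character(5:8))
set.seed(42)
toy$mean <- 50 + 15 * toy$tissue + 15 * toy$condition +
  rnorm(8, sd = sqrt(20))[toy$subject] + rnorm(16, sd = sqrt(120))
fit <- lmer(mean ~ tissue * condition + (1 | subject), data = toy)
fixef(fit)  # intercept, tissue, condition, interaction
```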
This chunk extracts intraclass correlation coefficient (ICC) values and decomposes total variance into an implied subject-level component and an implied technical component.
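The ICC is the subject-level share of total variance; with this simulation's true components (a sketch using the known values, not the fitted ones), the decomposition works as follows:

```r
# True variance components used in the simulation
sub_var  <- 20    # gamma^2, subject-level
tech_var <- 120   # resid^2, sample-level residual
icc <- sub_var / (sub_var + tech_var)   # ~0.143
total_var <- sub_var + tech_var
# Inverting the decomposition recovers the components
c(sub = total_var * icc, tech = total_var * (1 - icc))
```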
feature_models_results_for_each_combo <- run_models_sim_list[[2]] |>
mutate(
sub_var = total_var * icc,
tech_var = total_var * (1 - icc),
eff = as.numeric(feature_id) > 1500
) |>
dplyr::select(feature_id, icc, eff, total_var, sub_var, tech_var) |>
filter(icc > 0 & icc < 1 & total_var > 40 & total_var < 505) |>
group_by(eff, icc_group = icc < 0.5) |>
mutate(
target_icc = if_else(icc_group, 0.25, 0.75),
distance = abs(icc - target_icc)
) |>
arrange(eff, icc_group, distance) |>
group_by(eff, icc_group) |>
slice_head(n = 150) |>
mutate(icc_label = if_else(icc_group, "low icc", "high icc")) |>
ungroup()
feature_models_results_for_each_combo |>
group_by(eff, icc_label) |>
summarise(
median_icc = median(icc),
median_total = median(sub_var + tech_var),
median_tech = median(tech_var),
median_sub = median(sub_var),
.groups = "drop"
) |>
knitr::kable()
| eff | icc_label | median_icc | median_total | median_tech | median_sub |
|---|---|---|---|---|---|
| FALSE | high icc | 0.7025347 | 148.8525 | 40.50939 | 103.95961 |
| FALSE | low icc | 0.2491983 | 138.8390 | 104.66244 | 33.57851 |
| TRUE | high icc | 0.6877291 | 145.5031 | 43.40351 | 99.13051 |
| TRUE | low icc | 0.2430072 | 144.0007 | 105.88457 | 34.51493 |
The distributions below provide useful context for the selected features.
feature_models_results_for_each_combo |>
ggplot(aes(x = total_var, fill = icc_label)) +
geom_histogram(position = "identity", bins = 50, alpha = 0.4) +
theme_classic()
feature_models_results_for_each_combo |>
ggplot(aes(x = tech_var, fill = icc_label)) +
geom_histogram(position = "identity", bins = 50, alpha = 0.4) +
theme_classic()
feature_models_results_for_each_combo |>
ggplot(aes(x = sub_var, fill = icc_label)) +
geom_histogram(position = "identity", bins = 50, alpha = 0.4) +
theme_classic()
In practice, with many layered sources of error and the complexity of parameter estimators like REML, it is simpler to fit more models than necessary and then choose a subset with the desired amount and predominant source of variance (subject vs. residual). We do that here using the ICC, building one set with higher residual/technical variation than subject/biological variation and one with the reverse. Each subset contains features with and without condition and tissue effects, ordered by feature index so that later accuracy calculations can use the same truth convention.
n_each <- 150
feature_ids_for_lowb_hight <- feature_models_results_for_each_combo |>
filter(icc_label == "low icc") |>
arrange(feature_id) |>
pull(feature_id)
j_frame_lowb_hight <- mse_frame_sim_list |>
filter(feature_id %in% feature_ids_for_lowb_hight)
run_models_lowb_hight <- run_models_sim_list[[1]][feature_models_results_for_each_combo |>
filter(icc_label == "low icc") |>
arrange(feature_id) |>
pull(feature_id)]
feature_ids_for_highb_lowt <- feature_models_results_for_each_combo |>
filter(icc_label == "high icc") |>
arrange(feature_id) |>
pull(feature_id)
j_frame_highb_lowt <- mse_frame_sim_list |>
filter(feature_id %in% feature_ids_for_highb_lowt)
run_models_highb_lowt <- run_models_sim_list[[1]][feature_models_results_for_each_combo |>
filter(icc_label == "high icc") |>
arrange(feature_id) |>
pull(feature_id)]
In the chunk below, get_comparison_plotting_frame() takes a simulated long-format dataset plus the per-feature mixed-effects models we already fit, computes the hypothesis tests, and tallies true and false positive rates.
tests_lowb_hight <- get_comparison_plotting_frame(
j_frame_lowb_hight,
run_models_lowb_hight,
nmax = n_each
)
## Warning: There was 1 warning in `mutate()`.
## ℹ In argument: `null = case_match(...)`.
## Caused by warning:
## ! `case_match()` was deprecated in dplyr 1.2.0.
## ℹ Please use `recode_values()` instead.
tests_highb_lowt <- get_comparison_plotting_frame(
j_frame_highb_lowt,
run_models_highb_lowt,
nmax = n_each
)
For model specifications, see Table 1 of the manuscript.
It is tempting to use pixels as replicates, given the much larger sample size relative to sample means. However, linear mixed-effects models, like most other linear models (ANOVA, linear regression, t-tests), assume replicates are independent. As previous work has shown, pixels are not independent replicates; substituting them inflates the sensitivity of models and hypothesis tests to the point that the tests become meaningless. We demonstrate this effect below. Note that this section of the simulation code is computationally intense and may not be suitable for all systems; please proceed accordingly.
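The inflation is easy to reproduce in miniature (a base-R sketch with hypothetical sizes, not the full simulation): repeatedly draw two samples from the same distribution, each with its own sample-level effect, and test at the pixel level so that the between-sample variation is ignored:

```r
set.seed(7)
# Each replicate: two samples with NO true difference, each with a
# sample-level random effect (sd sqrt(20)) plus 2000 pixels of noise (sd 45).
# Treating pixels as replicates collapses the standard error and ignores the
# sample-level variance entirely.
fp <- replicate(200, {
  a <- 50 + rnorm(1, sd = sqrt(20)) + rnorm(2000, sd = 45)
  b <- 50 + rnorm(1, sd = sqrt(20)) + rnorm(2000, sd = 45)
  t.test(a, b)$p.value < 0.05
})
mean(fp)  # false positive rate far above the nominal 5%
```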
The following chunk converts each simulated image into long format,
one row per pixel-feature combination, then appends sample metadata.
Using data.table will help speed this process up and
consume less memory.
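On a toy 2-pixel x 3-feature table (hypothetical values), the reshape performed in the loop below looks like this:

```r
library(data.table)

# Wide: one column per feature, one row per pixel (x, y)
dt <- data.table(x = c(1, 2), y = c(1, 1),
                 `1` = c(10, 11), `2` = c(20, 21), `3` = c(30, 31))
# Long: one row per pixel-feature combination
dt_long <- melt(dt, id.vars = c("x", "y"),
                measure.vars = c("1", "2", "3"),
                variable.name = "feature_id", value.name = "intensity")
nrow(dt_long)  # 6 = 2 pixels x 3 features
```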
results_list <- list()
for (i in 1:16) {
m <- sim_list_adj[[i]]
transposed_intensity_matrix <- t(as.matrix(intensity(m)))
dt <- as.data.table(pData(m)[, 1:2])
for (col in 1:ncol(transposed_intensity_matrix)) {
dt[[as.character(col)]] <- transposed_intensity_matrix[, col]
}
dt_long <- melt(
dt,
id.vars = c("x", "y"),
measure.vars = as.character(1:3000),
variable.name = "feature_id",
value.name = "intensity"
)
dt_long[, subject := samples[i, 1]]
dt_long[, condition := samples[i, 3]]
dt_long[, tissue := samples[i, 2]]
results_list[[i]] <- dt_long
rm(dt, dt_long, transposed_intensity_matrix)
invisible(gc())
}
pix_as_rep_sim <- rbindlist(results_list)
rm(results_list)
cols_to_factor <- c("feature_id", "subject", "condition", "tissue")
pix_as_rep_sim[, (cols_to_factor) := lapply(.SD, as.factor), .SDcols = cols_to_factor]
To facilitate comparison, we will use the same two sets of features selected from above.
pix_as_rep_sim_lowb_hight <- pix_as_rep_sim[feature_id %in% as.factor(feature_ids_for_lowb_hight), ]
res_pixasrep_sim_lowb_hight <- run_models_MEMORY_EFF(
pix_as_rep_sim_lowb_hight,
"intensity ~ condition * tissue + (1|subject)"
)
pix_as_rep_sim_highb_lowt <- pix_as_rep_sim[feature_id %in% as.factor(feature_ids_for_highb_lowt), ]
res_pixasrep_sim_highb_lowt <- run_models_MEMORY_EFF(
pix_as_rep_sim_highb_lowt,
"intensity ~ condition * tissue + (1|subject)"
)
Like its counterpart from the pixel means section above, this chunk
uses a helper function,
get_comparison_plotting_frame_pix_as_rep(), with the
pixels-as-replicates simulated long-format dataset and the per-feature
mixed-effects models we fit above to compute hypothesis tests and
determine true and false positive rates.
get_comparison_plotting_frame_pix_as_rep <- function(mse_frame, model_res, nmax = 150) {
ttest_frame_condition <- mse_frame |>
filter(tissue == "1") |>
group_by(feature_id) |>
group_modify(~ broom::tidy(t.test(intensity ~ condition, data = .x, var.equal = TRUE))) |>
ungroup() |>
mutate(
SE = estimate / statistic,
adj_p.value = p.adjust(p.value, method = "fdr"),
old_feature_id = feature_id,
feature_id = row_number()
)
ttest_frame_tissue <- mse_frame |>
filter(condition == "1") |>
group_by(feature_id) |>
arrange(tissue, subject) |>
group_modify(~ broom::tidy(t.test(intensity ~ tissue, data = .x, var.equal = TRUE))) |>
ungroup() |>
mutate(
SE = estimate / statistic,
adj_p.value = p.adjust(p.value, method = "fdr"),
old_feature_id = feature_id,
feature_id = row_number()
)
contrasts_means <- bind_rows(lapply(model_res[[1]], FUN = function(m) {
bind_rows(
tidy(contrast(emmeans(m, ~ condition | tissue, lmer.df = "satterthwaite"), "pairwise", adjust = "none")) |> dplyr::rename(subgroup = tissue),
tidy(contrast(emmeans(m, ~ tissue | condition, lmer.df = "satterthwaite"), "pairwise", adjust = "none")) |> dplyr::rename(subgroup = condition)
)
})) |>
group_by(term, subgroup) |>
mutate(
feature_id = row_number(),
adj_p.value = p.adjust(p.value, method = "fdr")
) |>
ungroup()
bind_rows(
contrasts_means |>
filter(subgroup == "1") |>
mutate(
significant = p.value < 0.05,
truth = as.numeric(feature_id) > nmax,
finding = ifelse(significant, ifelse(truth, "True Positive", "False Positive"), ifelse(truth, "False Negative", "True Negative"))
) |>
group_by(finding, term, .drop = FALSE) |>
summarise(n = n(), .groups = "keep") |>
ungroup() |>
mutate(
model = "Mixed Effects Model with All Variables",
test = "T Test of Conditional Means",
null = case_match(term, "condition" ~ "In Medial:\nControl vs Osteoarthritis", "tissue" ~ "In Osteoarthritis:\nLateral vs Medial"),
term = case_match(term, "condition" ~ "Condition", "condition:tissue" ~ "Interaction", "tissue" ~ "Subsection")
) |>
dplyr::select(finding, n, model, test, null, term),
ttest_frame_condition |>
mutate(
significant = p.value < 0.05,
truth = as.numeric(feature_id) > nmax,
finding = ifelse(significant, ifelse(truth, "True Positive", "False Positive"), ifelse(truth, "False Negative", "True Negative"))
) |>
group_by(finding, .drop = FALSE) |>
summarise(n = n(), .groups = "keep") |>
ungroup() |>
mutate(
model = "Linear Model with Condition",
null = "In Medial:\nControl vs Osteoarthritis",
test = "Two Sample T Test",
term = "Condition"
) |>
dplyr::select(finding, n, model, test, null, term),
ttest_frame_tissue |>
mutate(
significant = p.value < 0.05,
truth = as.numeric(feature_id) > nmax,
finding = ifelse(significant, ifelse(truth, "True Positive", "False Positive"), ifelse(truth, "False Negative", "True Negative"))
) |>
group_by(finding, .drop = FALSE) |>
summarise(n = n(), .groups = "keep") |>
ungroup() |>
mutate(
model = "Linear Model of Paired Differences\nBetween tissues",
null = "In Osteoarthritis:\nLateral vs Medial",
test = "Paired T Test",
term = "Subsection"
) |>
dplyr::select(finding, n, model, test, null, term)
)
}
tests_pix_as_reps_lowb_hight <- get_comparison_plotting_frame_pix_as_rep(
pix_as_rep_sim_lowb_hight,
res_pixasrep_sim_lowb_hight
)
tests_pix_as_reps_highb_lowt <- get_comparison_plotting_frame_pix_as_rep(
pix_as_rep_sim_highb_lowt,
res_pixasrep_sim_highb_lowt
)
We will now recreate the figures from above showing the true and false positive rates for each model and hypothesis.
df_test_vari_pix_as_reps <- bind_rows(
add_zero_tests(tests_pix_as_reps_lowb_hight) |> mutate(vari = top_labs[1]),
add_zero_tests(tests_pix_as_reps_highb_lowt) |> mutate(vari = top_labs[2])
) |>
mutate(vari = factor(vari, levels = top_labs)) |>
mutate(
model_simple = factor(
ifelse(test == "T Test of Conditional Means", "Mixed Effects Model", test),
levels = c("Mixed Effects Model", "Two Sample T Test", "Paired T Test"),
labels = c(
"Supplementary\nTable 1 Model 3:\nMixed Effects\nModel with\nboth variables",
"Supplementary\nTable 1 Model 1:\nEquivalent to\nTwo Sample T Test\nin this design",
"Supplementary\nTable 1 Model 2:\nEquivalent to\nTwo Sample T Test\nin this design"
)
),
null = factor(
null,
levels = c("In Medial:\nControl vs Osteoarthritis", "In Osteoarthritis:\nLateral vs Medial"),
labels = c(
"Hypothesis A, Between-subjects\nIn Medial: Osteoarthritis vs Control",
"Hypothesis B, Within-subjects\nIn Osteoarthritis: Medial vs Lateral"
)
)
)
annotate_figure(
ggarrange(
annotate_figure(
ggplot(
df_test_vari_pix_as_reps |>
filter(vari == top_labs[1]) |>
filter(finding %in% c("False Positive", "True Positive")) |>
mutate(
n = ifelse(finding == "False Positive", -1 * n, n),
finding = factor(finding, levels = c("True Positive", "False Positive"))
),
aes(x = model_simple, y = n / 150, fill = finding)
) +
theme_classic() +
geom_bar(stat = "identity", position = "stack") +
facet_grid(. ~ null, space = "free", scales = "free_x") +
theme(legend.position = "none") +
ylab(" ") +
xlab(NULL) +
geom_hline(yintercept = 0, color = "white") +
geom_hline(yintercept = -0.05, color = "black", linetype = "dashed") +
scale_y_continuous(labels = scales::percent_format(accuracy = 1), limits = c(-1, 1)) +
scale_fill_manual(values = custom_colors),
top = top_labs[1]
),
annotate_figure(
ggplot(
df_test_vari_pix_as_reps |>
filter(vari == top_labs[2]) |>
filter(finding %in% c("False Positive", "True Positive")) |>
mutate(
n = ifelse(finding == "False Positive", -1 * n, n),
finding = factor(finding, levels = c("True Positive", "False Positive"))
),
aes(x = model_simple, y = n / 150, fill = finding)
) +
theme_classic() +
geom_bar(stat = "identity", position = "stack") +
facet_grid(. ~ null, space = "free", scales = "free_x") +
theme(legend.position = "none") +
ylab(NULL) +
xlab(NULL) +
geom_hline(yintercept = 0, color = "white") +
geom_hline(yintercept = -0.05, color = "black", linetype = "dashed") +
scale_y_continuous(labels = scales::percent_format(accuracy = 1), limits = c(-1, 1)) +
scale_fill_manual(values = custom_colors),
top = top_labs[2]
),
ncol = 2,
widths = c(4, 4),
common.legend = TRUE,
legend = "bottom"
)
)
ggsave(file.path("Recreated-Figures", "Figure-Variation-Simulation-2-Pixels-as-Replicates.png"), width = 11.5, height = 6, units = "in")