This workflow was built with R version 4.5.1. To install R on your computer, follow the directions on the R Project website. To obtain the workflow itself, download the source zip file and unzip the contents into your preferred directory.
Navigate to the directory you added the workflow to above and launch
R via command line. In bash or zsh, this can be done with the following
commands, substituting your/preferred/directory with the
directory path.
cd your/preferred/directory
R
Install renv, create the virtual environment from the renv.lock file, and install dependencies

Now we can begin setting up our workspace and installing dependencies
using renv. renv sets up isolated package
libraries called virtual environments, similar to Python’s
venv. It is best practice to use a dedicated virtual
environment for each project. However, before we can set up a virtual
environment, we first need to install renv into the system
library.
install.packages("renv")
renv::init(bioconductor = TRUE)
This workflow requires the following packages and their dependencies:
renv will automatically detect the
renv.lock file in the main directory, which will tell it to
install the packages above and their dependencies. To commence this
process, run the following:
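The command itself is not reproduced above; the standard renv call for installing everything recorded in a lockfile is `renv::restore()`, sketched here under the assumption that renv.lock sits in the project root:

```r
# Install all packages (and exact versions) recorded in ./renv.lock
# into this project's virtual environment.
renv::restore()
```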
The original dataset in Bruker .d format can be found on MassIVE under ID MSV000094448. Due to technical issues with MassIVE and .imzML files, we have stored the converted raw files and all intermediates created from this analysis in a public Amazon S3 bucket. Files can be downloaded directly over HTTPS; no AWS account, credentials, or additional software is required.
This workflow is written to look for data in the ./Data
directory, with subdirectories for specific steps. If these directories
already exist, this code does nothing; otherwise it creates them.
dirs <- c(
  file.path(".", "Data"),
  file.path(".", "Data", "Raw"),
  file.path(".", "Data", "Filtering-Aggregation"),
  file.path(".", "Data", "Preprocessing"),
  file.path(".", "Data", "Simulation-Data"),
  file.path(".", "Outputs"),
  file.path(".", "Recreated-Figures")
)

for (d in dirs) {
  dir.create(d, showWarnings = FALSE, recursive = TRUE)
}
You do not need to download everything. Set any of the following to
FALSE to skip that directory. At minimum, you will need
Raw and Simulation-Data to run the full
workflow from scratch, or the later intermediates to reproduce specific
steps.
download_dirs <- list(
  "Data/Raw" = TRUE,
  "Data/Preprocessing/Smoothed-BLR-Recalibrated" = TRUE,
  "Data/Preprocessing/Individual-Picked-Aligned-Filtered" = TRUE,
  "Data/Preprocessing/Smoothed-BLR-Recalibrated-Picked-Aligned-FreqFiltered05perRun.imzML" = TRUE,
  "Data/Preprocessing/normalized.imzML" = TRUE,
  "Data/Filtering-Aggregation/Preprocessed-NSFiltered.imzML" = TRUE,
  "Data/Filtering-Aggregation/Preprocessed-NSFiltered-Aggregated.imzML" = TRUE,
  "Data/Filtering-Aggregation/Preprocessed-NSFiltered-Aggregated-CartilageFiltered25.imzML" = TRUE,
  "Data/Simulation-Data" = TRUE
)
The following functions list and download files from the public S3 bucket using only base R. No additional packages are needed.
bucket <- "preprint-msi-arthtitis-data"
base_url <- paste0("https://", bucket, ".s3.amazonaws.com")
source("S3Utilities.R")
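The contents of S3Utilities.R are not reproduced here, but a minimal base-R sketch of the three helpers the download loop relies on (`s3_list_objects`, `s3_download_file`, and `human_size`) could look like the following. The regex-based XML parsing and exact signatures are assumptions, not the file's actual implementation:

```r
# Hypothetical base-R versions of the helpers sourced from S3Utilities.R.

# List objects under a prefix via the S3 ListObjectsV2 REST API.
# NOTE: a single request returns at most 1000 keys; larger prefixes
# would need continuation-token handling.
s3_list_objects <- function(base_url, prefix) {
  listing_url <- paste0(base_url, "/?list-type=2&prefix=",
                        utils::URLencode(prefix, reserved = TRUE))
  xml <- paste(readLines(listing_url, warn = FALSE), collapse = "")
  keys  <- regmatches(xml, gregexpr("(?<=<Key>).*?(?=</Key>)", xml, perl = TRUE))[[1]]
  sizes <- regmatches(xml, gregexpr("(?<=<Size>).*?(?=</Size>)", xml, perl = TRUE))[[1]]
  data.frame(key = keys, size = as.numeric(sizes), stringsAsFactors = FALSE)
}

# Download one object, recreating its key path under dest_dir.
s3_download_file <- function(base_url, key, dest_dir = ".") {
  dest <- file.path(dest_dir, key)
  dir.create(dirname(dest), showWarnings = FALSE, recursive = TRUE)
  utils::download.file(paste0(base_url, "/", key), dest,
                       mode = "wb", quiet = TRUE)
}

# Pretty-print a byte count, e.g. 2048 bytes -> "2.0 KB".
human_size <- function(bytes) {
  units <- c("B", "KB", "MB", "GB", "TB")
  i <- 1
  while (bytes >= 1024 && i < length(units)) {
    bytes <- bytes / 1024
    i <- i + 1
  }
  sprintf("%.1f %s", bytes, units[i])
}
```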
The following code iterates over your selected directories, lists the files in each from the S3 bucket, and downloads them.
selected <- names(download_dirs)[unlist(download_dirs)]
for (prefix in selected) {
  message("\n--- Listing: ", prefix, " ---")

  # Ensure the prefix ends with / so we only get objects inside it
  query_prefix <- ifelse(grepl("/$", prefix), prefix, paste0(prefix, "/"))

  objects <- s3_list_objects(base_url, query_prefix)
  if (nrow(objects) == 0) {
    message("  No files found for prefix: ", prefix)
    next
  }

  total_size <- sum(objects$size)

  # Check how many already exist locally
  already_exist <- vapply(objects$key, function(k) {
    file.exists(file.path(".", k))
  }, logical(1))
  n_skip <- sum(already_exist)
  n_todo <- nrow(objects) - n_skip

  message(sprintf("  %d files (%s total), %d to download, %d already present",
                  nrow(objects), human_size(total_size), n_todo, n_skip))

  if (n_todo == 0) {
    message("  All files already downloaded, skipping.")
    next
  }

  for (i in seq_len(nrow(objects))) {
    key <- objects$key[i]
    size <- objects$size[i]
    if (file.exists(file.path(".", key))) next
    message(sprintf("  [%d/%d] %s (%s)",
                    i, nrow(objects), basename(key), human_size(size)))
    s3_download_file(base_url, key, dest_dir = ".")
  }

  message("  Done: ", prefix)
}
message("\nAll downloads complete.")
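As an optional sanity check after the loop finishes, you can summarize what actually landed on disk. `summarize_dir` is a hypothetical helper, not part of S3Utilities.R:

```r
# Hypothetical helper: count files and total bytes under a directory,
# searching recursively.
summarize_dir <- function(path) {
  files <- list.files(path, recursive = TRUE, full.names = TRUE)
  list(n_files = length(files),
       bytes   = sum(file.size(files), na.rm = TRUE))
}
```

Calling `summarize_dir(file.path(".", prefix))` for each selected prefix gives a quick confirmation that local file counts match the listing messages printed during download.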