This workflow was built with R version 4.5.1. To install R on your computer, follow the directions on the R Project website. To obtain the workflow itself, download the source zip file and unzip the contents into your preferred directory.
Navigate to the directory you added the workflow to above and launch
R via command line. In bash or zsh, this can be done with the following
commands, substituting your/preferred/directory with the
directory path.
cd your/preferred/directory
R
Install renv, create the virtual environment from the renv.lock file, and install dependencies

Now we can begin setting up our workspace and installing dependencies
using renv. renv sets up isolated package
libraries called virtual environments, similar to Python’s
venv. It is best practice to use a dedicated virtual
environment for each project. However, before we can set up a virtual
environment, we first need to install renv into the system
library.
install.packages("renv")
renv::init(bioconductor = TRUE)
This workflow requires the following packages and their dependencies:
renv will automatically detect the
renv.lock file in the main directory, which will tell it to
install the packages above and their dependencies. To commence this
process, run the following:
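The command itself is not reproduced above; the standard renv call for installing everything recorded in a lockfile is `renv::restore()`, sketched here under the assumption that renv.lock sits in the project root:

```r
# Install all packages (and exact versions) recorded in ./renv.lock
# into this project's virtual environment.
renv::restore()
```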
The original dataset in Bruker .d format can be found on MassIVE under ID MSV000094448. Due to technical issues with MassIVE and .imzML files, we have stored the converted raw files and all intermediates created from this analysis in a public Amazon S3 bucket. Files can be downloaded directly over HTTPS; no AWS account, credentials, or additional software is required.
This workflow is written to look for data in the ./Data
directory, with subdirectories for specific steps. If these directories
already exist, this code does nothing; otherwise it creates them.
dirs <- c(
  file.path(".", "Data"),
  file.path(".", "Data", "Raw"),
  file.path(".", "Data", "Filtering-Aggregation"),
  file.path(".", "Data", "Preprocessing"),
  file.path(".", "Data", "Simulation-Data"),
  file.path(".", "Outputs"),
  file.path(".", "Recreated-Figures")
)

for (d in dirs) {
  dir.create(d, showWarnings = FALSE, recursive = TRUE)
}
You do not need to download everything. Set any of the following to
FALSE to skip that directory. At minimum, you will need
Raw and Simulation-Data to run the full
workflow from scratch, or the later intermediates to reproduce specific
steps.
download_dirs <- list(
  "Data/Raw" = TRUE,
  "Data/Preprocessing/Smoothed-BLR-Recalibrated" = TRUE,
  "Data/Preprocessing/Individual-Picked-Aligned-Filtered" = TRUE,
  "Data/Preprocessing/Smoothed-BLR-Recalibrated-Picked-Aligned-FreqFiltered05perRun.imzML" = TRUE,
  "Data/Preprocessing/normalized.imzML" = TRUE,
  "Data/Filtering-Aggregation/Preprocessed-NSFiltered.imzML" = TRUE,
  "Data/Filtering-Aggregation/Preprocessed-NSFiltered-Aggregated.imzML" = TRUE,
  "Data/Filtering-Aggregation/Preprocessed-NSFiltered-Aggregated-CartilageFiltered25.imzML" = TRUE,
  "Data/Simulation-Data" = TRUE
)
The following functions list and download files from the public S3 bucket using only base R. No additional packages are needed.
bucket <- "preprint-msi-arthtitis-data"
base_url <- paste0("https://", bucket, ".s3.amazonaws.com")
source("S3Utilities.R")
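The contents of S3Utilities.R are not reproduced here, but a minimal base-R sketch of the three helpers the download loop relies on (`s3_list_objects`, `s3_download_file`, and `human_size`) could look like the following. The regex-based XML parsing and exact signatures are assumptions, not the file's actual implementation:

```r
# Hypothetical base-R versions of the helpers sourced from S3Utilities.R.

# List objects under a prefix via the S3 ListObjectsV2 REST API.
# NOTE: a single request returns at most 1000 keys; larger prefixes
# would need continuation-token handling.
s3_list_objects <- function(base_url, prefix) {
  listing_url <- paste0(base_url, "/?list-type=2&prefix=",
                        utils::URLencode(prefix, reserved = TRUE))
  xml <- paste(readLines(listing_url, warn = FALSE), collapse = "")
  keys  <- regmatches(xml, gregexpr("(?<=<Key>).*?(?=</Key>)", xml, perl = TRUE))[[1]]
  sizes <- regmatches(xml, gregexpr("(?<=<Size>).*?(?=</Size>)", xml, perl = TRUE))[[1]]
  data.frame(key = keys, size = as.numeric(sizes), stringsAsFactors = FALSE)
}

# Download one object, recreating its key path under dest_dir.
s3_download_file <- function(base_url, key, dest_dir = ".") {
  dest <- file.path(dest_dir, key)
  dir.create(dirname(dest), showWarnings = FALSE, recursive = TRUE)
  utils::download.file(paste0(base_url, "/", key), dest,
                       mode = "wb", quiet = TRUE)
}

# Pretty-print a byte count, e.g. 2048 bytes -> "2.0 KB".
human_size <- function(bytes) {
  units <- c("B", "KB", "MB", "GB", "TB")
  i <- 1
  while (bytes >= 1024 && i < length(units)) {
    bytes <- bytes / 1024
    i <- i + 1
  }
  sprintf("%.1f %s", bytes, units[i])
}
```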
The following code iterates over your selected directories, lists the files in each from the S3 bucket, and downloads them.
selected <- names(download_dirs)[unlist(download_dirs)]
for (prefix in selected) {
  message("\n--- Listing: ", prefix, " ---")

  # Ensure the prefix ends with / so we only get objects inside it
  query_prefix <- ifelse(grepl("/$", prefix), prefix, paste0(prefix, "/"))

  objects <- s3_list_objects(base_url, query_prefix)
  if (nrow(objects) == 0) {
    message("  No files found for prefix: ", prefix)
    next
  }

  total_size <- sum(objects$size)

  # Check how many already exist locally
  already_exist <- vapply(objects$key, function(k) {
    file.exists(file.path(".", k))
  }, logical(1))
  n_skip <- sum(already_exist)
  n_todo <- nrow(objects) - n_skip

  message(sprintf("  %d files (%s total), %d to download, %d already present",
                  nrow(objects), human_size(total_size), n_todo, n_skip))

  if (n_todo == 0) {
    message("  All files already downloaded, skipping.")
    next
  }

  for (i in seq_len(nrow(objects))) {
    key <- objects$key[i]
    size <- objects$size[i]
    if (file.exists(file.path(".", key))) next
    message(sprintf("  [%d/%d] %s (%s)",
                    i, nrow(objects), basename(key), human_size(size)))
    s3_download_file(base_url, key, dest_dir = ".")
  }

  message("  Done: ", prefix)
}
message("\nAll downloads complete.")
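As an optional sanity check after the loop finishes, you can summarize what actually landed on disk. `summarize_dir` is a hypothetical helper, not part of S3Utilities.R:

```r
# Hypothetical helper: count files and total bytes under a directory,
# searching recursively.
summarize_dir <- function(path) {
  files <- list.files(path, recursive = TRUE, full.names = TRUE)
  list(n_files = length(files),
       bytes   = sum(file.size(files), na.rm = TRUE))
}
```

Calling `summarize_dir(file.path(".", prefix))` for each selected prefix gives a quick confirmation that local file counts match the listing messages printed during download.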