Data objects in R can be serialized to disk in R’s RData
format using the base R save() function and re-imported
using the load() function (or, for single objects, in Rds format
using saveRDS() and readRDS()). These R-specific binary data
formats can, however, not be read by other programming languages,
which prevents the exchange of R data objects between software or
programming languages. The MsStash package defines basic
classes and generic methods to export and import mass spectrometry data
objects in various storage formats aiming to facilitate data exchange
between software. This includes, among other formats, storage of
data objects using Bioconductor’s alabaster.base
package.
For export or import of MS data objects, the
saveMsObject() and readMsObject() functions
can be used. For saveMsObject(), the first parameter is the
MS data object that should be stored; for readMsObject(), it
defines the type of MS object that should be restored (returned). The second
parameter param defines and configures the storage format
of the MS data. The currently supported formats and the respective
parameter objects are:
PlainTextParam: storage of data in (a custom) plain text file format.
AlabasterParam: storage of MS data using Bioconductor’s alabaster.base framework, based on files in HDF5 and JSON format.
These storage formats are described in more detail in the following sections.
An example use of these functions and parameters:
saveMsObject(x, param = PlainTextParam(storage_path)) to
store an MS data object assigned to a variable x to a
directory storage_path using the plain text file format. To
restore the data (assuming x was an instance of a
MsExperiment class):
readMsObject(MsExperiment(), param = PlainTextParam(storage_path)).
The package can be installed with the BiocManager package.
To install BiocManager use
install.packages("BiocManager") and, after that,
BiocManager::install("RforMassSpectrometry/MsStash") to
install this package.
To illustrate how the save/read functionality can be implemented for
a specific data class, we first define a simple toy R S4 object to
represent the data from a single mass spectrum. This
MySpectrum class contains slots to hold the spectrum’s
m/z and intensity values as well as some (limited)
metadata.
#' Class definition
setClass("MySpectrum",
         slots = c(mz = "numeric",
                   intensity = "numeric",
                   rtime = "numeric",
                   msl = "integer"),
         prototype = prototype(
             mz = numeric(),
             intensity = numeric(),
             rtime = numeric(),
             msl = integer()))
#' Default constructor function
MySpectrum <- function(mz = numeric(), intensity = numeric(),
                       rtime = numeric(), msl = integer()) {
    stopifnot(length(mz) == length(intensity))
    if (length(mz) && !length(rtime)) rtime <- NA_real_
    if (length(mz) && !length(msl)) msl <- NA_integer_
    new("MySpectrum", mz = mz, intensity = intensity, rtime = rtime,
        msl = as.integer(msl))
}
We can now create an example MySpectrum object.
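The call that created the object printed below is not shown in this section. A minimal sketch, with the m/z and intensity values taken from the printed slots and the variable name s taken from the later text, and assuming the class and constructor defined above have been evaluated, could be:

```r
## Assumes the MySpectrum class and the MySpectrum() constructor from above
## are defined in the current session.
s <- MySpectrum(mz = c(1.4, 1.6, 1.9, 2.56),
                intensity = c(123.1, 1235.3, 12.45, 51.5))
s
```

Because neither rtime nor msl is provided, the constructor fills both with a missing value.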
## An object of class "MySpectrum"
## Slot "mz":
## [1] 1.40 1.60 1.90 2.56
##
## Slot "intensity":
## [1] 123.10 1235.30 12.45 51.50
##
## Slot "rtime":
## [1] NA
##
## Slot "msl":
## [1] NA
To ensure consistency, the saveMsObject() should:
path).
Both methods also support ..., so additional parameters can be added to an
implementation of the generic method if needed.
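How a method can add its own parameters through ... can be illustrated with a small, self-contained sketch. The generic saveToy() and the overwrite argument are hypothetical and only mirror the (object, param, ...) layout described here; they are not part of MsStash.

```r
library(methods)

## Hypothetical toy generic with the same argument layout: object, param, ...
setGeneric("saveToy", function(object, param, ...)
    standardGeneric("saveToy"))

## The method declares an additional `overwrite` argument. This is possible
## because the generic has `...` among its formal arguments.
setMethod("saveToy", signature(object = "numeric", param = "character"),
          function(object, param, overwrite = FALSE, ...) {
              if (file.exists(param) && !overwrite)
                  stop("File exists; use 'overwrite = TRUE' to replace it.")
              writeLines(as.character(object), con = param)
              invisible(param)
          })

fl <- tempfile(fileext = ".txt")
saveToy(c(1.4, 1.6), fl)                      # first write succeeds
saveToy(c(1.9, 2.56), fl, overwrite = TRUE)   # extra argument reaches the method
readLines(fl)
```

Callers that do not know about the additional argument can still use the two-argument form, so the interface of the generic stays unchanged.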
PlainTextParam
Storage of MS data objects in plain text format aims to support an easy exchange of data, and in particular analysis results, with external software, such as MS-DIAL or mzmine3. In most cases, the data is stored as tab-delimited text files, simplifying the use of the data and results across multiple programming languages, or their import into spreadsheet applications. MS data objects stored in plain text format can also be fully re-imported into R, thus providing an alternative, and more flexible, object serialization approach than the R-internal Rds/RData format.
We implement a saveMsObject() method for our
MySpectrum class and the PlainTextParam. This
function first creates the required directory and throws an error if a
result file is already stored there. Then it exports the data: for our
example we store the data of the object in a single text file in a
custom format that we define. The metadata is written to the file first, one
line per metadata item, followed by the m/z and intensity
values, with each m/z-intensity pair on one line, separated by a
tab.
#' Write example class to a plain text file
setMethod("saveMsObject", signature(object = "MySpectrum",
                                    param = "PlainTextParam"),
          function(object, param) {
              dir.create(path = param@path, recursive = TRUE,
                         showWarnings = FALSE)
              fl <- file.path(param@path, "my_spectrum.txt")
              if (file.exists(fl))
                  stop("Overwriting an existing result object is not ",
                       "supported.")
              ## Write the type of object as a comment followed by the
              ## metadata.
              writeLines(c(paste0("# ", class(object)[1L]),
                           paste0("rtime:", object@rtime),
                           paste0("msl:", object@msl)), con = fl)
              ## Write the peak data, i.e. m/z and intensity values
              write.table(cbind(object@mz, object@intensity), file = fl,
                          sep = "\t", append = TRUE, col.names = FALSE,
                          row.names = FALSE)
          })
We next export our example object s with the
saveMsObject() method to a temporary folder.
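The export call itself is not shown here. A sketch of what it might look like, assuming s is the example object and using the same directory that is passed to PlainTextParam() when the object is read back later in this section:

```r
library(MsStash)

## Assumes `s` is the example MySpectrum object and that the saveMsObject()
## method for MySpectrum and PlainTextParam defined above is available.
p <- PlainTextParam(path = file.path(tempdir(), "text_format"))
saveMsObject(s, p)
```

Running the call a second time would trigger the error raised by the method, since overwriting an existing file is not supported.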
The data was thus exported to this text file. The individual lines are:
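The lines shown below can be obtained with base R's readLines(). The file name my_spectrum.txt is the one used by the method above; the directory is an assumption taken from the path used when the object is read back later, and the sketch assumes the export has already been run:

```r
## Path of the file written by the saveMsObject() method for PlainTextParam
fl <- file.path(tempdir(), "text_format", "my_spectrum.txt")
readLines(fl)
```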
## [1] "# MySpectrum" "rtime:NA" "msl:NA" "1.4\t123.1" "1.6\t1235.3"
## [6] "1.9\t12.45" "2.56\t51.5"
We next implement the readMsObject() method for this
class. This function will read the text file content and assign the
imported values to the different slots of the MySpectrum
class.
#' Read example object from plain text file storage format
setMethod("readMsObject", signature(object = "MySpectrum",
                                    param = "PlainTextParam"),
          function(object, param) {
              fl <- file.path(param@path, "my_spectrum.txt")
              if (!file.exists(fl))
                  stop("my_spectrum.txt not found in the provided path")
              l <- readLines(fl, n = 3) # read the comment and the metadata
              p <- read.table(fl, sep = "\t", skip = 3)
              MySpectrum(
                  mz = p[, 1L], intensity = p[, 2L],
                  rtime = suppressWarnings(
                      as.numeric(sub("rtime:", "", l[2], fixed = TRUE))),
                  msl = suppressWarnings(
                      as.integer(sub("msl:", "", l[3], fixed = TRUE))))
          })
We can now restore our MySpectrum object with the
readMsObject() method from the exported text file:
p <- PlainTextParam(path = file.path(tempdir(), "text_format"))
b <- readMsObject(MySpectrum(), p)
b
## An object of class "MySpectrum"
## Slot "mz":
## [1] 1.40 1.60 1.90 2.56
##
## Slot "intensity":
## [1] 123.10 1235.30 12.45 51.50
##
## Slot "rtime":
## [1] NA
##
## Slot "msl":
## [1] NA
AlabasterParam
The alabaster
framework and the related Bioconductor package alabaster.base
implement methods to save a variety of R/Bioconductor objects to
on-disk representations based on standard file formats like HDF5 and
JSON. This ensures that Bioconductor objects can be easily read from
other languages like Python and JavaScript. With
AlabasterParam, MsStash provides a parameter class
to configure saving MS data objects in the alabaster storage
format.
To enable writing in this format a saveMsObject() method
should be implemented for the MS data object and
AlabasterParam. To enable full alabaster support
it is also suggested to implement the
alabaster.base::saveObject method, a validation method and
a function to read from an alabaster format. For more details refer also
to the package vignette of the alabaster.base
package, in particular chapter 5 Extending to new classes.
Below we define a saveObject() method. The generic for
this method is defined in the alabaster.base package. While it
would be possible to simply save the data as simple text files as we did
above, we use alabaster’s strategy to allow storage of more
complex objects (such as S4 objects in the individual slots). This uses
altSaveObject() and altReadObject() to save
individual slots or parent/child classes in sub-directories of
path. For each of these classes, a
saveObject() method needs to be defined.
library(alabaster.base)
setMethod("saveObject", "MySpectrum", function(x, path, ...) {
    ## Create the directory where to save the data
    dir.create(path = path, recursive = TRUE, showWarnings = FALSE)
    ## Create an "object" file; this defines the type of object stored in path
    saveObjectFile(path, "my_spectrum")
    ## Save each slot into its own directory
    altSaveObject(x@mz, path = file.path(path, "mz"))
    altSaveObject(x@intensity, path = file.path(path, "intensity"))
    altSaveObject(x@rtime, path = file.path(path, "retention_time"))
    altSaveObject(x@msl, path = file.path(path, "ms_level"))
})
We next need to implement a validation function for the
stash (directory). For our example we simply check that the
path contains the expected sub-directories with the
object’s content. This function needs then to be registered with the
registerValidateObjectFunction() method for our class.
#' Define a helper function to check that the folder contains all
#' expected sub-directories.
validateMySpectrum <- function(path, metadata) {
    if (!dir.exists(path))
        stop("Directory ", path, " does not exist")
    req_dir <- c("mz", "intensity", "retention_time", "ms_level")
    if (any(miss <- !dir.exists(file.path(path, req_dir))))
        stop("Required directories ",
             paste0("\"", req_dir[miss], "\"", collapse = ", "),
             " not found in ", path)
}
#' Register the validation function
registerValidateObjectFunction("my_spectrum", validateMySpectrum)
## NULL
Finally we define the function to read the data back from the stash.
We then register this function with alabaster’s
registerReadObjectFunction() function.
#' Define a function that can read from an alabaster-based serialization
#' of `MySpectrum` objects
readMySpectrum <- function(path, metadata, ...) {
    validateMySpectrum(path)
    ## Read the data from individual sub-directories
    mz <- altReadObject(file.path(path, "mz"))
    int <- altReadObject(file.path(path, "intensity"))
    rtime <- altReadObject(file.path(path, "retention_time"))
    msl <- altReadObject(file.path(path, "ms_level"))
    MySpectrum(mz = mz, intensity = int, rtime = rtime, msl = msl)
}
#' Register the read function
registerReadObjectFunction("my_spectrum", readMySpectrum)
Registration of the validation and read functions is generally done
in the extension package’s .onLoad() function.
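In an extension package this could look roughly like the following sketch. It assumes that validateMySpectrum() and readMySpectrum() are defined in the package namespace and that alabaster.base is listed as an import; the file name zzz.R is only a common convention.

```r
## zzz.R of a hypothetical extension package
.onLoad <- function(libname, pkgname) {
    ## Register the validation and read functions for the "my_spectrum"
    ## object type when the package namespace is loaded.
    alabaster.base::registerValidateObjectFunction(
        "my_spectrum", validateMySpectrum)
    alabaster.base::registerReadObjectFunction(
        "my_spectrum", readMySpectrum)
}
```

Registering at load time means users of the package do not have to call the registration functions themselves before reading a stash.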
With these functions defined and registered, we can store an instance
of MySpectrum directly with alabaster’s
saveObject() method:
#' Define the path where we want to export our data
p <- file.path(tempdir(), "alabaster_export")
#' Save the object
saveObject(s, path = p)
This saved the object’s content to the directory specified with
path. The content of this folder is:
## /tmp/RtmpxcIktN/alabaster_export
## ├── OBJECT
## ├── _environment.json
## ├── intensity
## │   ├── OBJECT
## │   └── contents.h5
## ├── ms_level
## │   ├── OBJECT
## │   └── contents.h5
## ├── mz
## │   ├── OBJECT
## │   └── contents.h5
## └── retention_time
##     ├── OBJECT
##     └── contents.h5
We can read the serialized object again as a MySpectrum
object:
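The reading call is not shown here. With the read function registered, the readObject() function from alabaster.base can be pointed at the directory used above. This is a sketch that assumes p still holds that path; the variable name s2 is only illustrative.

```r
library(alabaster.base)

## `p` is the directory the object was saved to with saveObject().
## readObject() looks up the object type stored in the OBJECT file and
## dispatches to the read function registered for "my_spectrum".
s2 <- readObject(p)
s2
```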
## An object of class "MySpectrum"
## Slot "mz":
## [1] 1.40 1.60 1.90 2.56
##
## Slot "intensity":
## [1] 123.10 1235.30 12.45 51.50
##
## Slot "rtime":
## [1] NA
##
## Slot "msl":
## [1] NA
We next implement the saveMsObject() and
readMsObject() methods for MySpectrum and
AlabasterParam. These can simply re-use the functions
implemented above.
#' Write example class to an alabaster-based stash
setMethod("saveMsObject", signature(object = "MySpectrum",
                                    param = "AlabasterParam"),
          function(object, param) {
              if (file.exists(file.path(param@path, "OBJECT")))
                  stop("'path' already contains an MS data stash. Overwriting",
                       " is not supported. Please remove 'path' first.")
              saveObject(object, param@path)
          })
#' Read example object from an alabaster-based stash
setMethod("readMsObject", signature(object = "MySpectrum",
                                    param = "AlabasterParam"),
          function(object, param) {
              readMySpectrum(param@path)
          })
We can now stash our MS object in either the text file-based format
(PlainTextParam) or the alabaster-based format
(AlabasterParam). Below we write it using the alabaster
approach.
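The corresponding call is not shown in this section. A sketch, assuming s is the example object, that AlabasterParam() accepts a path argument in the same way as PlainTextParam(), and using a hypothetical directory name that does not yet contain a stash:

```r
library(MsStash)

## Hypothetical target directory; it must not already contain an OBJECT
## file, otherwise the saveMsObject() method defined above stops.
stash_path <- file.path(tempdir(), "alabaster_stash")
saveMsObject(s, AlabasterParam(path = stash_path))
```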
To read the data back we can then use readMsObject()
specifying in addition the type of object we want to read.
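A sketch of such a call is given here. The directory name is hypothetical and stands for whichever directory the object was written to; AlabasterParam() is assumed to accept a path argument like PlainTextParam().

```r
library(MsStash)

## Hypothetical directory; replace with the path the object was saved to.
stash_path <- file.path(tempdir(), "alabaster_stash")
## The first argument only selects the method; its content is not used.
res <- readMsObject(MySpectrum(), AlabasterParam(path = stash_path))
res
```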
## An object of class "MySpectrum"
## Slot "mz":
## [1] 1.40 1.60 1.90 2.56
##
## Slot "intensity":
## [1] 123.10 1235.30 12.45 51.50
##
## Slot "rtime":
## [1] NA
##
## Slot "msl":
## [1] NA
## R version 4.6.0 (2026-04-24)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.4 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Etc/UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] fs_2.1.0 alabaster.base_1.13.0 MsStash_0.97.0
## [4] BiocStyle_2.41.0
##
## loaded via a namespace (and not attached):
## [1] crayon_1.5.3 cli_3.6.6 knitr_1.51
## [4] rlang_1.2.0 xfun_0.57 ProtGenerics_1.39.2
## [7] generics_0.1.4 jsonlite_2.0.0 S4Vectors_0.51.1
## [10] buildtools_1.0.0 htmltools_0.5.9 maketools_1.3.2
## [13] sys_3.4.3 stats4_4.6.0 sass_0.4.10
## [16] rmarkdown_2.31 evaluate_1.0.5 jquerylib_0.1.4
## [19] fastmap_1.2.0 yaml_2.3.12 alabaster.schemas_1.13.0
## [22] lifecycle_1.0.5 Rhdf5lib_2.1.0 BiocManager_1.30.27
## [25] compiler_4.6.0 Rcpp_1.1.1-1.1 rhdf5filters_1.25.0
## [28] rhdf5_2.57.0 digest_0.6.39 R6_2.6.1
## [31] bslib_0.10.0 tools_4.6.0 BiocGenerics_0.59.0
## [34] cachem_1.1.0