Storage Modes of MS Data Objects

Introduction

Data objects in R can be serialized to disk in R’s binary RData format using the base R save() function and re-imported using the load() function (or, for single objects, as Rds files using saveRDS() and readRDS()). These R-specific binary formats can, however, not be read by other programming languages, which prevents the exchange of R data objects between software or programming languages. The MsStash package defines basic classes and generic methods to export and import mass spectrometry (MS) data objects in various storage formats, aiming to facilitate data exchange between software. Among other formats, this includes storage of data objects using Bioconductor’s alabaster.base package.
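As a brief illustration of the R-internal serialization mentioned above, the following sketch uses only base R; the list x is a made-up example and not an MS data class. Both files can be re-imported into R, but neither can be read easily from other software.

```r
## Made-up example data (not an MS data object from any package)
x <- list(mz = c(1.4, 1.6), intensity = c(123.1, 1235.3))

## save()/load(): RData file; load() restores the object under its
## original variable name, here into a separate environment
rdata_file <- tempfile(fileext = ".RData")
save(x, file = rdata_file)
env <- new.env()
load(rdata_file, envir = env)
identical(env$x, x)

## saveRDS()/readRDS(): Rds file holding a single, unnamed object
rds_file <- tempfile(fileext = ".rds")
saveRDS(x, rds_file)
identical(readRDS(rds_file), x)
```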

For export or import of MS data objects, the saveMsObject() and readMsObject() functions can be used. For saveMsObject(), the first parameter is the MS data object that should be stored; for readMsObject(), it defines the type of MS object that should be restored (returned). The second parameter, param, defines and configures the storage format of the MS data. The currently supported formats and the respective parameter objects are:

  • PlainTextParam: storage of data in (a custom) plain text file format.
  • AlabasterParam: storage of MS data using Bioconductor’s alabaster.base framework, based on files in HDF5 and JSON format.

These storage formats are described in more detail in the following sections.

An example use of these functions and parameters: saveMsObject(x, param = PlainTextParam(storage_path)) stores an MS data object assigned to a variable x in the directory storage_path using the plain text file format. To restore the data (assuming x was an instance of the MsExperiment class): readMsObject(MsExperiment(), param = PlainTextParam(storage_path)).
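The two-argument design, with one argument for the object and one for the storage format, relies on S4 dispatch on both arguments. The sketch below mimics that idea with stand-in classes and a stand-in generic; DemoTextParam, DemoJsonParam and demoSave() are hypothetical names invented for this illustration and are not part of MsStash.

```r
library(methods)

## Stand-in parameter classes; they only mimic the role of
## PlainTextParam/AlabasterParam and are NOT the MsStash classes.
setClass("DemoTextParam", slots = c(path = "character"))
setClass("DemoJsonParam", slots = c(path = "character"))

## Stand-in generic that dispatches on the object AND the param
setGeneric("demoSave", function(object, param, ...)
    standardGeneric("demoSave"))

## One method per combination of data class and storage format
setMethod("demoSave", signature("numeric", "DemoTextParam"),
          function(object, param, ...) "numeric -> text")
setMethod("demoSave", signature("numeric", "DemoJsonParam"),
          function(object, param, ...) "numeric -> json")

res_text <- demoSave(c(1.5, 2.5), new("DemoTextParam", path = tempdir()))
res_json <- demoSave(c(1.5, 2.5), new("DemoJsonParam", path = tempdir()))
```

Supporting a new storage format then only requires a new parameter class and the corresponding methods, without changing the calling code.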

Installation

The package can be installed with the BiocManager package. To install BiocManager use install.packages("BiocManager") and, after that, BiocManager::install("RforMassSpectrometry/MsStash") to install this package.
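The installation steps can be written as the short script below. The requireNamespace() guards are a common idiom and an addition of this sketch; the check for the remotes package assumes that BiocManager relies on it to install from a GitHub repository.

```r
## Install BiocManager only if it is not yet available
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

## Assumption: installing a "user/repo" GitHub package through BiocManager
## needs the remotes package
if (!requireNamespace("remotes", quietly = TRUE))
    install.packages("remotes")

BiocManager::install("RforMassSpectrometry/MsStash")
```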

Example implementations

To illustrate how the save/read functionality can be implemented for a specific data class, we first define a simple toy R S4 object to represent the data from a single mass spectrum. This MySpectrum class contains slots to hold the spectrum’s m/z and intensity values as well as some (limited) metadata.

#' Class definition
setClass("MySpectrum",
         slots = c(mz = "numeric",
                   intensity = "numeric",
                   rtime = "numeric",
                   msl = "integer"),
         prototype = prototype(
             mz = numeric(),
             intensity = numeric(),
             rtime = numeric(),
             msl = integer()))

#' Default constructor function
MySpectrum <- function(mz = numeric(), intensity = numeric(),
                       rtime = numeric(), msl = integer()) {
    stopifnot(length(mz) == length(intensity))
    if (length(mz) && !length(rtime)) rtime <- NA_real_
    if (length(mz) && !length(msl)) msl <- NA_integer_
    new("MySpectrum", mz = mz, intensity = intensity, rtime = rtime,
        msl = as.integer(msl))
}

We can now create an example MySpectrum object.

s <- MySpectrum(c(1.4, 1.6, 1.9, 2.56), c(123.1, 1235.3, 12.45, 51.5))
s
## An object of class "MySpectrum"
## Slot "mz":
## [1] 1.40 1.60 1.90 2.56
## 
## Slot "intensity":
## [1]  123.10 1235.30   12.45   51.50
## 
## Slot "rtime":
## [1] NA
## 
## Slot "msl":
## [1] NA

Suggested properties of implemented methods

To ensure consistency, implementations of the saveMsObject() method should:

  • first create the directory to which the data should be exported (defined by the path of the param object).
  • throw an error if the directory already exists or already contains an exported object (thus avoiding accidental overwriting and possible data corruption or inconsistencies).

Both saveMsObject() and readMsObject() also support ..., hence additional parameters can be added to an implementation of the generic method if needed.
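The two suggested properties can be sketched with a small base R helper. prepare_export_dir() and the file name my_object.txt are hypothetical and not part of MsStash; the helper only rejects a directory that already holds an exported file, which is the more lenient of the two checks.

```r
## Hypothetical helper (not part of MsStash): prepare an export directory
## and refuse to overwrite a previous export.
prepare_export_dir <- function(path, marker_file) {
    fl <- file.path(path, marker_file)
    if (file.exists(fl))
        stop("'", path, "' already contains an exported object; ",
             "overwriting is not supported.")
    dir.create(path, recursive = TRUE, showWarnings = FALSE)
    invisible(fl)
}

## First export: the (nested) directory is created
out_dir <- file.path(tempfile("export_"), "nested")
fl <- prepare_export_dir(out_dir, "my_object.txt")
writeLines("# exported", fl)

## A second export into the same directory is rejected
res <- tryCatch(prepare_export_dir(out_dir, "my_object.txt"),
                error = function(e) conditionMessage(e))
res
```

Checking for the exported file rather than for the directory itself allows users to create the target directory beforehand.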

library(MsStash)

PlainTextParam

Storage of MS data objects in plain text format aims to support an easy exchange of data, and in particular analysis results, with external software, such as MS-DIAL or mzmine3. In most cases, the data is stored as tab-delimited text files, which simplifies the use of the data and results across multiple programming languages and their import into spreadsheet applications. MS data objects stored in plain text format can also be fully re-imported into R, thus providing an alternative, and more flexible, object serialization approach to the R-internal Rds/RData format.

We implement a saveMsObject() method for our MySpectrum class and the PlainTextParam. This function first creates the required directory and throws an error if a result file is already stored there. Then it exports the data: for our example we store the data of the object in a single text file in a custom format that we define. The metadata is written to the file first, one line per metadata item, followed by the m/z and intensity values, with each m/z-intensity pair on one line, separated by a tab.

#' Write example class to a plain text file
setMethod("saveMsObject", signature(object = "MySpectrum",
                                    param = "PlainTextParam"),
          function(object, param) {
              dir.create(path = param@path, recursive = TRUE,
                         showWarnings = FALSE)
              fl <- file.path(param@path, "my_spectrum.txt")
              if (file.exists(fl))
                  stop("Overwriting an existing result object is not ",
                       "supported.")
              ## Write the type of object as a comment followed by the
              ## metadata.
              writeLines(c(paste0("# ", class(object)[1L]),
                           paste0("rtime:", object@rtime),
                           paste0("msl:", object@msl)), con = fl)
              ## Write the peak data, i.e. m/z and intensity values
              write.table(cbind(object@mz, object@intensity), file = fl,
                          sep = "\t", append = TRUE, col.names = FALSE,
                          row.names = FALSE)
          })

We next export our example object s with the saveMsObject() method to a temporary folder.

p <- PlainTextParam(path = file.path(tempdir(), "text_format"))
saveMsObject(s, p)

The data was thus exported to this text file. The individual lines are:

readLines(file.path(p@path, "my_spectrum.txt"))
## [1] "# MySpectrum" "rtime:NA"     "msl:NA"       "1.4\t123.1"   "1.6\t1235.3" 
## [6] "1.9\t12.45"   "2.56\t51.5"

We next implement the readMsObject() method for this class. This function will read the text file content and assign the imported values to the different slots of the MySpectrum class.

#' Read example object from plain text file storage format
setMethod("readMsObject", signature(object = "MySpectrum",
                                    param = "PlainTextParam"),
          function(object, param) {
              fl <- file.path(param@path, "my_spectrum.txt")
              if (!file.exists(fl))
                  stop("my_spectrum.txt not found in the provided path")
              l <- readLines(fl, n = 3) # read the comment and the metadata
              p <- read.table(fl, sep = "\t", skip = 3)
              MySpectrum(
                  mz = p[, 1L], intensity = p[, 2L],
                  rtime = suppressWarnings(
                      as.numeric(sub("rtime:", "", l[2], fixed = TRUE))),
                  msl = suppressWarnings(
                      as.integer(sub("msl:", "", l[3], fixed = TRUE))))
          })

We can now restore our MySpectrum object with the readMsObject() method from the exported text file:

p <- PlainTextParam(path = file.path(tempdir(), "text_format"))
b <- readMsObject(MySpectrum(), p)
b
## An object of class "MySpectrum"
## Slot "mz":
## [1] 1.40 1.60 1.90 2.56
## 
## Slot "intensity":
## [1]  123.10 1235.30   12.45   51.50
## 
## Slot "rtime":
## [1] NA
## 
## Slot "msl":
## [1] NA
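
To check a round trip like the one above programmatically rather than by eye, the slots of the original and the restored object can be compared. The helper same_slots() below is a hypothetical sketch, and the ToyPeaks class is a stand-in used only so that the example runs on its own; all.equal() is used because numeric values written to text may differ in the last digits.

```r
library(methods)

## Hypothetical helper: compare two S4 objects slot by slot, allowing for
## small numeric differences
same_slots <- function(a, b, tolerance = 1e-8) {
    if (!identical(class(a), class(b)))
        return(FALSE)
    all(vapply(slotNames(a), function(sn) {
        isTRUE(all.equal(slot(a, sn), slot(b, sn), tolerance = tolerance))
    }, logical(1)))
}

## Stand-in class, only to make this sketch self-contained
setClass("ToyPeaks", slots = c(mz = "numeric", msl = "integer"))
x <- new("ToyPeaks", mz = c(1.4, 2.56), msl = NA_integer_)
y <- new("ToyPeaks", mz = c(1.4, 2.56) + 1e-12, msl = NA_integer_)
z <- new("ToyPeaks", mz = c(1.4, 3.00), msl = NA_integer_)

same_slots(x, y)
same_slots(x, z)
```

Applied to the example objects, same_slots(s, b) should return TRUE if the export and import preserved all values.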

AlabasterParam

The alabaster framework and the related Bioconductor package alabaster.base implement methods to save a variety of R/Bioconductor objects to on-disk representations based on standard file formats like HDF5 and JSON. This ensures that Bioconductor objects can be easily read from other languages like Python and JavaScript. With AlabasterParam, MsStash provides a parameter class to configure saving MS data objects in the alabaster storage format.

To enable writing in this format, a saveMsObject() method should be implemented for the MS data object and AlabasterParam. For full alabaster support, it is also suggested to implement the alabaster.base::saveObject() method, a validation function and a function to read from the alabaster format. For more details, refer to the vignette of the alabaster.base package, in particular chapter 5, Extending to new classes.

Below we define a saveObject() method. The generic for this method is defined in the alabaster.base package. While it would be possible to save the data as simple text files as we did above, we use alabaster’s strategy to allow storage of more complex objects (such as S4 objects in the individual slots). This uses altSaveObject() and altReadObject() to save individual slots or parent/child classes in sub-directories of path. For each of these classes, a saveObject() method needs to be defined.

library(alabaster.base)

setMethod("saveObject", "MySpectrum", function(x, path, ...) {
    ## Create the directory where to save the data
    dir.create(path = path, recursive = TRUE, showWarnings = FALSE)
    ## Create an "object" file; this defines the type of object stored in path
    saveObjectFile(path, "my_spectrum")
    ## Save each slot into its own directory
    altSaveObject(x@mz, path = file.path(path, "mz"))
    altSaveObject(x@intensity, path = file.path(path, "intensity"))
    altSaveObject(x@rtime, path = file.path(path, "retention_time"))
    altSaveObject(x@msl, path = file.path(path, "ms_level"))
})

We next need to implement a validation function for the stash (directory). For our example we simply check that the path contains the expected sub-directories with the object’s content. This function then needs to be registered for our class with the registerValidateObjectFunction() function.

#' Define a helper function to check that the folder contains all
#' expected sub-directories.
validateMySpectrum <- function(path, metadata) {
    if (!dir.exists(path))
        stop("Directory ", path, " does not exist")
    req_dir <- c("mz", "intensity", "retention_time", "ms_level")
    if (any(miss <- !dir.exists(file.path(path, req_dir))))
        stop("Required directories ",
             paste0("\"", req_dir[miss], "\"", collapse = ", "),
             " not found in ", path)
}

#' Register the validation function
registerValidateObjectFunction("my_spectrum", validateMySpectrum)
## NULL

Finally we define the function to read the data back from the stash. We then register this function with alabaster’s registerReadObjectFunction() function.

#' Define a function that can read from an alabaster-based serialization
#' of `MySpectrum` objects
readMySpectrum <- function(path, metadata, ...) {
    validateMySpectrum(path)
    ## Read the data from individual sub-directories
    mz <- altReadObject(file.path(path, "mz"))
    int <- altReadObject(file.path(path, "intensity"))
    rtime <- altReadObject(file.path(path, "retention_time"))
    msl <- altReadObject(file.path(path, "ms_level"))
    MySpectrum(mz = mz, intensity = int, rtime = rtime, msl = msl)
}

#' Register the read function
registerReadObjectFunction("my_spectrum", readMySpectrum)

Registration of the validation and read functions is generally done in the extension package’s .onLoad() function.
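
A minimal sketch of such a load hook is shown below. It assumes that validateMySpectrum() and readMySpectrum() are defined in the same package; placing the hook in a file such as R/zzz.R is a common convention, not a requirement.

```r
## Sketch of a package load hook; validateMySpectrum() and readMySpectrum()
## are assumed to be defined elsewhere in the same package.
.onLoad <- function(libname, pkgname) {
    alabaster.base::registerValidateObjectFunction(
        "my_spectrum", validateMySpectrum)
    alabaster.base::registerReadObjectFunction(
        "my_spectrum", readMySpectrum)
}
```

With this in place, the functions are registered whenever the package namespace is loaded, so users do not need to call the register functions themselves.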

With these functions defined and registered, we can store an instance of MySpectrum directly with alabaster’s saveObject() method:

#' Define the path where we want to export our data
p <- file.path(tempdir(), "alabaster_export")

#' Save the object
saveObject(s, path = p)

This saved the object’s content to the directory specified with path. The content of this folder is:

library(fs)
dir_tree(p)
## /tmp/RtmpxcIktN/alabaster_export
## ├── OBJECT
## ├── _environment.json
## ├── intensity
## │   ├── OBJECT
## │   └── contents.h5
## ├── ms_level
## │   ├── OBJECT
## │   └── contents.h5
## ├── mz
## │   ├── OBJECT
## │   └── contents.h5
## └── retention_time
##     ├── OBJECT
##     └── contents.h5

We can read the serialized object again as a MySpectrum object:

b <- readObject(p)
b
## An object of class "MySpectrum"
## Slot "mz":
## [1] 1.40 1.60 1.90 2.56
## 
## Slot "intensity":
## [1]  123.10 1235.30   12.45   51.50
## 
## Slot "rtime":
## [1] NA
## 
## Slot "msl":
## [1] NA

We next implement the saveMsObject() and readMsObject() methods for MySpectrum and AlabasterParam. These can simply re-use the functions implemented above.

#' Write example class to the alabaster-based storage format
setMethod("saveMsObject", signature(object = "MySpectrum",
                                    param = "AlabasterParam"),
          function(object, param) {
              if (file.exists(file.path(param@path, "OBJECT")))
                  stop("'path' contains already an MS data stash. Overwriting",
                       " is not supported. Please remove 'path' first.")
              saveObject(object, param@path)
          })

#' Read example object from the alabaster-based storage format
setMethod("readMsObject", signature(object = "MySpectrum",
                                    param = "AlabasterParam"),
          function(object, param) {
              readMySpectrum(param@path)
          })

We can now stash our MS object in either the text file-based format (PlainTextParam) or the alabaster-based format (AlabasterParam). Below we write it using the alabaster approach.

p <- file.path(tempdir(), "alabaster_format_2")
ap <- AlabasterParam(p)

saveMsObject(s, ap)

To read the data back we can then use readMsObject() specifying in addition the type of object we want to read.

b <- readMsObject(MySpectrum(), ap)
b
## An object of class "MySpectrum"
## Slot "mz":
## [1] 1.40 1.60 1.90 2.56
## 
## Slot "intensity":
## [1]  123.10 1235.30   12.45   51.50
## 
## Slot "rtime":
## [1] NA
## 
## Slot "msl":
## [1] NA

Session information

sessionInfo()
## R version 4.6.0 (2026-04-24)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.4 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: Etc/UTC
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] fs_2.1.0              alabaster.base_1.13.0 MsStash_0.97.0       
## [4] BiocStyle_2.41.0     
## 
## loaded via a namespace (and not attached):
##  [1] crayon_1.5.3             cli_3.6.6                knitr_1.51              
##  [4] rlang_1.2.0              xfun_0.57                ProtGenerics_1.39.2     
##  [7] generics_0.1.4           jsonlite_2.0.0           S4Vectors_0.51.1        
## [10] buildtools_1.0.0         htmltools_0.5.9          maketools_1.3.2         
## [13] sys_3.4.3                stats4_4.6.0             sass_0.4.10             
## [16] rmarkdown_2.31           evaluate_1.0.5           jquerylib_0.1.4         
## [19] fastmap_1.2.0            yaml_2.3.12              alabaster.schemas_1.13.0
## [22] lifecycle_1.0.5          Rhdf5lib_2.1.0           BiocManager_1.30.27     
## [25] compiler_4.6.0           Rcpp_1.1.1-1.1           rhdf5filters_1.25.0     
## [28] rhdf5_2.57.0             digest_0.6.39            R6_2.6.1                
## [31] bslib_0.10.0             tools_4.6.0              BiocGenerics_0.59.0     
## [34] cachem_1.1.0