Data objects in R can be serialized to disk in R’s rds or
RData format using the base R saveRDS() or save() functions and
re-imported using readRDS() or load(), respectively. These R-specific
binary data formats can, however, not easily be read by other programming
languages, which hinders the exchange of R data objects between software or
programming languages. The MsStash package defines basic
classes and generic methods to export and import mass spectrometry (MS)
data objects in various storage formats, aiming to facilitate data
exchange between software. The SpectraStash package implements
portable data storage formats (stashes) for data classes from the Spectra
package, including the Spectra object and its various data
backends.
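As a brief illustration of the R-specific formats mentioned above, the following minimal base R sketch round-trips a small object through both of them: saveRDS()/readRDS() handle the rds format and save()/load() the RData format. The object x and the file names are made up for this example.

```r
## Small example object (made-up values)
x <- list(mz = c(110.1, 115.2), intensity = c(200, 340))

## rds: one object per file; the object is returned on import
f_rds <- tempfile(fileext = ".rds")
saveRDS(x, f_rds)
x_rds <- readRDS(f_rds)

## RData: one or more objects, restored under their original names
f_rdata <- tempfile(fileext = ".RData")
save(x, file = f_rdata)
env <- new.env()
restored <- load(f_rdata, envir = env)  # returns the names of the objects

identical(x, x_rds)
identical(x, env$x)
```

Both files are R-specific binary serializations, which is what makes them hard to read from other software.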
The package can be installed with the BiocManager package.
To install BiocManager use
install.packages("BiocManager") and, after that,
BiocManager::install("RforMassSpectrometry/SpectraStash")
to install this package.
Spectra objects
MS data objects can be saved and restored through the
saveMsObject() and readMsObject() functions
into (or from) MS data stashes. Supported stash formats and their
respective parameter objects are:
PlainTextParam: storage of data in a custom plain
text file format.
AlabasterParam: storage of MS data using Bioconductor’s
alabaster.base framework using files in HDF5
and JSON format. MS stashes in this format fully support the functions
saveObject() and readObject() from
alabaster.base.
See also the vignette of the MsStash package for details on the formats and implementation notes.
As an example, we create a Spectra object from two
example MS data files from the MsDataHub package.
library(Spectra)
library(SpectraStash)
library(MsDataHub)
#' Get the file names of the two example mzML files
fls <- c(X20171016_POOL_POS_1_105.134.mzML(),
         X20171016_POOL_POS_3_105.134.mzML())
#' Create the `Spectra` object with an on-disk backend
sps <- Spectra(fls, source = MsBackendMzR())
sps
## MSn data (Spectra) with 1862 spectra in a MsBackendMzR backend:
## msLevel rtime scanIndex
## <integer> <numeric> <integer>
## 1 1 0.280 1
## 2 1 0.559 2
## 3 1 0.838 3
## 4 1 1.117 4
## 5 1 1.396 5
## ... ... ... ...
## 1858 1 258.636 927
## 1859 1 258.915 928
## 1860 1 259.194 929
## 1861 1 259.473 930
## 1862 1 259.752 931
## ... 34 more variables/columns.
##
## file(s):
## 14224b95f897_7859
## 1422343aa99_7860
We next filter the data, restricting it to spectra with a retention time between 20 and 200 seconds and to mass peaks with an m/z between 110 and 120.
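The filtering calls themselves are not shown in the text. Judging from the processing history in the output, they were presumably filterRt() and filterMzRange() from the Spectra package; the sketch below assumes that sps is the Spectra object created from the two mzML files.

```r
library(Spectra)

## Assumes `sps` is the Spectra object created from the two mzML files.
## Keep only spectra with a retention time between 20 and 200 seconds
sps <- filterRt(sps, c(20, 200))
## Keep only mass peaks with an m/z between 110 and 120; this filter is
## added to the lazy evaluation queue and applied when peaks are accessed
sps <- filterMzRange(sps, c(110, 120))
sps
```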
## MSn data (Spectra) with 1290 spectra in a MsBackendMzR backend:
## msLevel rtime scanIndex
## <integer> <numeric> <integer>
## 1 1 20.089 72
## 2 1 20.368 73
## 3 1 20.647 74
## 4 1 20.926 75
## 5 1 21.205 76
## ... ... ... ...
## 1286 1 198.649 712
## 1287 1 198.928 713
## 1288 1 199.207 714
## 1289 1 199.486 715
## 1290 1 199.765 716
## ... 34 more variables/columns.
##
## file(s):
## 14224b95f897_7859
## 1422343aa99_7860
## Lazy evaluation queue: 1 processing step(s)
## Processing:
## Filter: select retention time [20..200] on MS level(s) [Fri Jun 26 13:16:32 2026]
## Filter: select peaks with an m/z within [110, 120] [Fri Jun 26 13:16:32 2026]
We next store this Spectra object to a
SpectraStash using the saveMsObject() function. We
use an alabaster format and define the location of the stash with the
path parameter of AlabasterParam. For the
present example we save it to a temporary folder.
#' Define the location of the stash
d <- file.path(tempfile(), "spectra_stash")
#' Configure the format and location
ap <- AlabasterParam(d)
#' Save the `Spectra` object to the stash
saveMsObject(sps, ap)
The content of the stash folder is:
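The command that produced the listing is not shown. Its layout matches what dir_tree() from the fs package prints, and fs is among the attached packages in the session information, so a call along the following lines is a reasonable guess (assuming d is the stash folder defined above).

```r
library(fs)

## Assumes `d` is the stash folder defined above.
## Print the folder content as a tree
dir_tree(d)
```

In base R, list.files(d, recursive = TRUE) gives the same information as a flat vector of relative paths.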
## /tmp/RtmpLCi5TT/file142227f0070d/spectra_stash
## ├── OBJECT
## ├── _environment.json
## ├── backend
## │ ├── OBJECT
## │ └── spectra_data
## │ ├── OBJECT
## │ └── basic_columns.h5
## ├── metadata
## │ ├── OBJECT
## │ └── list_contents.json.gz
## ├── processing
## │ ├── OBJECT
## │ └── contents.h5
## ├── processing_chunk_size
## │ ├── OBJECT
## │ └── contents.h5
## ├── processing_queue_variables
## │ ├── OBJECT
## │ └── contents.h5
## └── spectra_processing_queue.json
In alabaster format, each slot of the Spectra object is
stored in its own subdirectory. Spectra objects don’t
handle the MS data themselves, but rely on a MsBackend to
provide this data. The MsBackend used by the
Spectra object is stored in its own stash located in the
backend directory of the SpectraStash. The Spectra
object can be restored with readMsObject():
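The call is not shown in the text. Given the parameter object ap defined above, it would presumably look like the following; the variable name sps_in is chosen here for illustration only.

```r
library(Spectra)
library(SpectraStash)

## Assumes `ap` is the AlabasterParam pointing to the stash created above.
## The first argument defines the type of object to restore.
sps_in <- readMsObject(Spectra(), ap)
sps_in
```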
## MSn data (Spectra) with 1290 spectra in a MsBackendMzR backend:
## msLevel rtime scanIndex
## <integer> <numeric> <integer>
## 1 1 20.089 72
## 2 1 20.368 73
## 3 1 20.647 74
## 4 1 20.926 75
## 5 1 21.205 76
## ... ... ... ...
## 1286 1 198.649 712
## 1287 1 198.928 713
## 1288 1 199.207 714
## 1289 1 199.486 715
## 1290 1 199.765 716
## ... 25 more variables/columns.
##
## file(s):
## 14224b95f897_7859
## 1422343aa99_7860
## Lazy evaluation queue: 1 processing step(s)
## Processing:
## Filter: select retention time [20..200] on MS level(s) [Fri Jun 26 13:16:32 2026]
## Filter: select peaks with an m/z within [110, 120] [Fri Jun 26 13:16:32 2026]
We need to specify the type of the object to restore with the first
parameter of the function - in our case Spectra(). The full
Spectra object was restored, including the processing queue
and history.
We can also read (restore) only the MsBackend from the
SpectraStash. Since the present stash is in alabaster format, we can
use either readMsObject() or
readObject() from alabaster.base. Using readObject():
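A sketch of the readObject() variant, assuming d is the stash folder from above and that loading SpectraStash makes the stash format known to alabaster.base (the name be is illustrative):

```r
library(SpectraStash)
library(alabaster.base)

## Assumes `d` is the stash folder defined above.
## The backend has its own stash in the "backend" sub-folder; readObject()
## determines the type of object to restore from the stash itself
be <- readObject(file.path(d, "backend"))
be
```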
## MsBackendMzR with 1290 spectra
## msLevel rtime scanIndex
## <integer> <numeric> <integer>
## 1 1 20.089 72
## 2 1 20.368 73
## 3 1 20.647 74
## 4 1 20.926 75
## 5 1 21.205 76
## ... ... ... ...
## 1286 1 198.649 712
## 1287 1 198.928 713
## 1288 1 199.207 714
## 1289 1 199.486 715
## 1290 1 199.765 716
## ... 25 more variables/columns.
##
## file(s):
## 14224b95f897_7859
## 1422343aa99_7860
Or using readMsObject():
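With readMsObject() the type of the object has to be given explicitly, here MsBackendMzR(). A minimal sketch, again assuming d from above (be is an illustrative name):

```r
library(Spectra)
library(SpectraStash)

## Assumes `d` is the stash folder defined above.
## Restore only the backend from the "backend" sub-folder of the stash
be <- readMsObject(MsBackendMzR(), AlabasterParam(file.path(d, "backend")))
be
```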
## MsBackendMzR with 1290 spectra
## msLevel rtime scanIndex
## <integer> <numeric> <integer>
## 1 1 20.089 72
## 2 1 20.368 73
## 3 1 20.647 74
## 4 1 20.926 75
## 5 1 21.205 76
## ... ... ... ...
## 1286 1 198.649 712
## 1287 1 198.928 713
## 1288 1 199.207 714
## 1289 1 199.486 715
## 1290 1 199.765 716
## ... 25 more variables/columns.
##
## file(s):
## 14224b95f897_7859
## 1422343aa99_7860
Our example Spectra object uses an
MsBackendMzR backend, which keeps only limited information
in memory and retrieves the peaks data (i.e., the m/z and
intensity values) from the original MS data files on demand. The stash
for MsBackendMzR objects therefore contains only the
spectra metadata and a reference to the original MS data files, but no
peaks data.
If the original MS data files were moved to a different location, or
if the SpectraStash folder was moved to another computer, the updated
path to the raw MS data files needs to be provided with the
spectraPath parameter of the readMsObject()
function. As an alternative, it is also possible to create a
self-contained stash by setting consolidate = TRUE in
saveMsObject(). Below, we save our Spectra
object again, this time into a self-contained stash.
d2 <- file.path(tempdir(), "spectra_stash2")
saveMsObject(sps, AlabasterParam(d2), consolidate = TRUE)
The consolidate = TRUE parameter is passed to the
saveMsObject() call of the MsBackend, which,
for MsBackendMzR, copies the original MS data files
into the stash folder:
## /tmp/RtmpLCi5TT/spectra_stash2
## ├── OBJECT
## ├── _environment.json
## ├── backend
## │ ├── 1422343aa99_7860
## │ ├── 14224b95f897_7859
## │ ├── OBJECT
## │ └── spectra_data
## │ ├── OBJECT
## │ └── basic_columns.h5
## ├── metadata
## │ ├── OBJECT
## │ └── list_contents.json.gz
## ├── processing
## │ ├── OBJECT
## │ └── contents.h5
## ├── processing_chunk_size
## │ ├── OBJECT
## │ └── contents.h5
## ├── processing_queue_variables
## │ ├── OBJECT
## │ └── contents.h5
## └── spectra_processing_queue.json
Note the two additional files in the backend folder: these are the original MS data files in mzML format. Such a self-contained stash folder allows the full data to be restored even if the stash is moved to another file system. Of course, depending on the size of the data set and the respective raw MS data files, the stash folder can become very large.
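For a stash that is not self-contained, the relocation case described earlier relies on the spectraPath parameter of readMsObject(). The sketch below uses a hypothetical folder name (new_path) that has to be replaced with the actual location of the mzML files; the guard only ensures that the example does nothing if that folder does not exist.

```r
library(Spectra)
library(SpectraStash)

## Hypothetical new location of the raw mzML files (placeholder)
new_path <- "/path/to/moved/mzML/files"

## Assumes `d` is the first (not self-contained) stash from above
if (dir.exists(new_path)) {
    sps_moved <- readMsObject(Spectra(), AlabasterParam(d),
                              spectraPath = new_path)
}
```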
Spectra with in-memory backends
In addition to the on-disk backends
MsBackendMzR and MsBackendHdf5Peaks,
Spectra also defines the in-memory backends
MsBackendMemory and MsBackendDataFrame, which
keep the full MS data in memory. Below we change the backend of our
sps object to MsBackendMemory:
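The backend switch is not shown as code. The processing history ("Switch backend from MsBackendMzR to MsBackendMemory") suggests it was done with setBackend() from Spectra, as sketched here for the filtered sps object from above.

```r
library(Spectra)

## Assumes `sps` is the filtered Spectra object from above.
## Load the full MS data into memory by changing the backend
sps <- setBackend(sps, MsBackendMemory())
sps
```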
## MSn data (Spectra) with 1290 spectra in a MsBackendMemory backend:
## msLevel rtime scanIndex
## <integer> <numeric> <integer>
## 1 1 20.089 72
## 2 1 20.368 73
## 3 1 20.647 74
## 4 1 20.926 75
## 5 1 21.205 76
## ... ... ... ...
## 1286 1 198.649 712
## 1287 1 198.928 713
## 1288 1 199.207 714
## 1289 1 199.486 715
## 1290 1 199.765 716
## ... 34 more variables/columns.
## Lazy evaluation queue: 1 processing step(s)
## Processing:
## Filter: select retention time [20..200] on MS level(s) [Fri Jun 26 13:16:32 2026]
## Filter: select peaks with an m/z within [110, 120] [Fri Jun 26 13:16:32 2026]
## Switch backend from MsBackendMzR to MsBackendMemory [Fri Jun 26 13:16:33 2026]
We next stash this updated Spectra object, first removing
the stash directory of the previous SpectraStash (because overwriting
stash directories is not allowed).
#' Remove the existing SpectraStash
unlink(d2, recursive = TRUE)
#' Store the `Spectra` object in alabaster format
saveMsObject(sps, AlabasterParam(d2))
Inspecting the content of the stash folder we can see a different structure:
## /tmp/RtmpLCi5TT/spectra_stash2
## ├── OBJECT
## ├── _environment.json
## ├── backend
## │ ├── OBJECT
## │ └── backend
## │ ├── OBJECT
## │ ├── mod_count
## │ │ ├── OBJECT
## │ │ └── contents.h5
## │ ├── peaks.h5
## │ └── spectra_data
## │ ├── OBJECT
## │ └── basic_columns.h5
## ├── metadata
## │ ├── OBJECT
## │ └── list_contents.json.gz
## ├── processing
## │ ├── OBJECT
## │ └── contents.h5
## ├── processing_chunk_size
## │ ├── OBJECT
## │ └── contents.h5
## ├── processing_queue_variables
## │ ├── OBJECT
## │ └── contents.h5
## └── spectra_processing_queue.json
The MS peaks data is now stored in a file peaks.h5, a
file in the HDF5-based format used by the MsBackendHdf5Peaks
backend: saving an in-memory backend first converts the data to a
MsBackendHdf5Peaks backend, which is then stored in an
additional backend sub-folder of the stash. We can restore the
Spectra object with:
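The restoring call is missing from the text; assuming the stash location d2 from above, it would presumably be the following (sps_in is an illustrative name):

```r
library(Spectra)
library(SpectraStash)

## Assumes `d2` is the stash folder written above.
## Restore the full Spectra object, including its in-memory backend
sps_in <- readMsObject(Spectra(), AlabasterParam(d2))
sps_in
```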
## MSn data (Spectra) with 1290 spectra in a MsBackendMemory backend:
## msLevel rtime scanIndex
## <integer> <numeric> <integer>
## 1 1 20.089 1
## 2 1 20.368 2
## 3 1 20.647 3
## 4 1 20.926 4
## 5 1 21.205 5
## ... ... ... ...
## 1286 1 198.649 1286
## 1287 1 198.928 1287
## 1288 1 199.207 1288
## 1289 1 199.486 1289
## 1290 1 199.765 1290
## ... 25 more variables/columns.
## Lazy evaluation queue: 1 processing step(s)
## Processing:
## Filter: select retention time [20..200] on MS level(s) [Fri Jun 26 13:16:32 2026]
## Filter: select peaks with an m/z within [110, 120] [Fri Jun 26 13:16:32 2026]
## Switch backend from MsBackendMzR to MsBackendMemory [Fri Jun 26 13:16:33 2026]
In addition, we can restore the MsBackendMemory
with:
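A sketch of the corresponding call, assuming the backend stash sits in the backend sub-folder of d2 as shown in the folder listing (be_mem is an illustrative name):

```r
library(Spectra)
library(SpectraStash)

## Assumes `d2` is the stash folder written above.
## Restore only the in-memory backend from the "backend" sub-folder
be_mem <- readMsObject(MsBackendMemory(),
                       AlabasterParam(file.path(d2, "backend")))
be_mem
```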
## MsBackendMemory with 1290 spectra
## msLevel rtime scanIndex
## <integer> <numeric> <integer>
## 1 1 20.089 1
## 2 1 20.368 2
## 3 1 20.647 3
## 4 1 20.926 4
## 5 1 21.205 5
## ... ... ... ...
## 1286 1 198.649 1286
## 1287 1 198.928 1287
## 1288 1 199.207 1288
## 1289 1 199.486 1289
## 1290 1 199.765 1290
## ... 25 more variables/columns.
We can also restore the MsBackendHdf5Peaks, which is used as the
actual data storage format for the in-memory
MsBackendMemory (note the double backend
sub-folder):
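One way this could be written, with the nested path taken from the folder listing above (be_h5 is an illustrative name; the call is not shown in the original text):

```r
library(Spectra)
library(SpectraStash)

## Assumes `d2` is the stash folder written above.
## The on-disk representation is stored one level deeper: backend/backend
be_h5 <- readMsObject(MsBackendHdf5Peaks(),
                      AlabasterParam(file.path(d2, "backend", "backend")))
be_h5
```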
## MsBackendHdf5Peaks with 1290 spectra
## msLevel rtime scanIndex
## <integer> <numeric> <integer>
## 1 1 20.089 1
## 2 1 20.368 2
## 3 1 20.647 3
## 4 1 20.926 4
## 5 1 21.205 5
## ... ... ... ...
## 1286 1 198.649 1286
## 1287 1 198.928 1287
## 1288 1 199.207 1288
## 1289 1 199.486 1289
## 1290 1 199.765 1290
## ... 25 more variables/columns.
##
## file(s):
## peaks.h5
## R version 4.6.1 (2026-06-24)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 26.04 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.32.so; LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Etc/UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] alabaster.base_1.13.0 fs_2.1.0 MsDataHub_1.11.5
## [4] SpectraStash_0.97.6 MsStash_0.99.0 Spectra_1.23.3
## [7] BiocParallel_1.47.0 S4Vectors_0.51.3 BiocGenerics_0.59.7
## [10] generics_0.1.4 BiocStyle_2.41.0
##
## loaded via a namespace (and not attached):
## [1] KEGGREST_1.53.4 xfun_0.59 bslib_0.11.0
## [4] httr2_1.2.3 Biobase_2.73.1 rhdf5_2.57.1
## [7] rhdf5filters_1.25.0 vctrs_0.7.3 tools_4.6.1
## [10] curl_7.1.0 parallel_4.6.1 AnnotationDbi_1.75.0
## [13] tibble_3.3.1 RSQLite_3.53.2 cluster_2.1.8.2
## [16] blob_1.3.0 pkgconfig_2.0.3 data.table_1.18.4
## [19] dbplyr_2.6.0 lifecycle_1.0.5 compiler_4.6.1
## [22] Biostrings_2.81.3 Seqinfo_1.3.0 codetools_0.2-20
## [25] ncdf4_1.24 clue_0.3-68 htmltools_0.5.9
## [28] sys_3.4.3 buildtools_1.0.0 sass_0.4.10
## [31] yaml_2.3.12 crayon_1.5.3 pillar_1.11.1
## [34] jquerylib_0.1.4 MASS_7.3-65 cachem_1.1.0
## [37] MetaboCoreUtils_1.21.1 ExperimentHub_3.3.1 AnnotationHub_4.3.1
## [40] tidyselect_1.2.1 digest_0.6.39 purrr_1.2.2
## [43] dplyr_1.2.1 BiocVersion_3.24.0 maketools_1.3.2
## [46] fastmap_1.2.0 cli_3.6.6 magrittr_2.0.5
## [49] withr_3.0.3 filelock_1.0.3 rappdirs_0.3.4
## [52] bit64_4.8.2 XVector_0.53.0 httr_1.4.8
## [55] rmarkdown_2.31 bit_4.6.0 otel_0.2.0
## [58] png_0.1-9 memoise_2.0.1 evaluate_1.0.5
## [61] knitr_1.51 IRanges_2.47.2 BiocFileCache_3.3.0
## [64] rlang_1.2.0 Rcpp_1.1.1-1.1 glue_1.8.1
## [67] DBI_1.3.0 mzR_2.47.0 BiocManager_1.30.27
## [70] alabaster.schemas_1.13.0 jsonlite_2.0.0 R6_2.6.1
## [73] Rhdf5lib_2.1.0 ProtGenerics_1.39.2 MsCoreUtils_1.25.4