---
title: "Storage Modes of MS Data Objects"
output:
BiocStyle::html_document:
toc_float: true
vignette: >
%\VignetteIndexEntry{Storage Modes of MS Data Objects}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
%\VignettePackage{MsIO}
%\VignetteDepends{MsIO,BiocStyle,msdata,MsExperiment,Spectra}
---
```{r style, echo = FALSE, results = 'asis', message=FALSE}
BiocStyle::markdown()
```
**Package**: `r Biocpkg("MsIO")`
**Authors**: `r packageDescription("MsIO")[["Author"]] `
**Compiled**: `r date()`
```{r, echo = FALSE, message = FALSE}
library(MsIO)
knitr::opts_chunk$set(echo = TRUE, message = FALSE)
library(BiocStyle)
```
# Introduction
Data objects in R can be serialized to disk in R's *Rds* format using the base R
`save()` function and re-imported using the `load()` function. This R-specific
binary data format can however not be used or read by other programming
languages preventing thus the exchange of R data objects between software or
programming languages. The *MsIO* package provides functionality to export and
import mass spectrometry data objects in various storage formats aiming to
facilitate data exchange between software. This includes, among other formats,
also storage of data objects using Bioconductor's `r Biocpkg("alabaster.base")`
package.
For export or import of MS data objects, the `saveMsObject()` and
`readMsObject()` functions can be used. For `saveMsObject()`, the first
parameter is the MS data object that should be stored, for `readMsObject()` it
defines type of MS object that should be restored (returned). The second
parameter `param` defines and configures the storage format of the MS data. The
currently supported formats and the respective parameter objects are:
- `PlainTextParam`: storage of data in plain text file format.
- `AlabasterParam`: storage of MS data using Bioconductor's
`r Biocpkg("alabaster.base")` framework based files in HDF5 and JSON format.
These storage formats are described in more details in the following sections.
An example use of these functions and parameters: `saveMsObject(x, param =
PlainTextParam(storage_path))` to store an MS data object assigned to a variable
`x` to a directory `storage_path` using the plain text file format. To restore
the data (assuming `x` was an instance of a `MsExperiment` class):
`readMsObject(MsExperiment(), param = PlainTextParam(storage_path))`.
# Installation
The package can be installed with the *BiocManager* package. To install
*BiocManager* use `install.packages("BiocManager")` and, after that,
`BiocManager::install("RforMassSpectrometry/MsIO")` to install this package.
For import or export of MS data objects installation of additional Bioconductor
packages might be needed:
- `r Biocpkg("Spectra")` (with `BiocManager::install("Spectra")`) for import or
export of `Spectra` or `MsBackendMzR` objects.
- `r Biocpkg("MsExperiment")` (with `BiocManager::install("MsExperiment")`) for
import or export of `MsExperiment` objects.
- `r Biocpkg("xcms")` (with `BiocManager::install("xcms")`) for import or export
of `XcmsExperiment` objects (result objects of *xcms*-based preprocessing).
# Plain text file format
Storage of MS data objects in *plain* text format aims to support an easy
exchange of data, and in particular analysis results, with external software,
such as [MS-DIAL](https://systemsomicslab.github.io/compms/msdial/main.html) or
[mzmine3](http://mzmine.github.io/download.html). In most cases, the data is
stored as tabulator delimited text files simplifying the use of the data and
results across multiple programming languages, or their import into spreadsheet
applications. MS data objects stored in plain text format can also be fully
re-imported into R providing thus an alternative, and more flexible, object
serialization approach than the R internal *Rds*/*RData* format.
Below we create a MS data object (`MsExperiment`) representing the data from two
raw MS data files and assign sample annotation information to these data files.
```{r}
library(MsIO)
library(MsExperiment)
fls <- dir(system.file("TripleTOF-SWATH", package = "msdata"),
full.names = TRUE)
mse <- readMsExperiment(
fls,
sampleData = data.frame(name = c("Pestmix1 DDA", "Pestmix SWATH"),
mode = c("DDA", "SWATH")))
mse
```
We can export this data object to plain text files using *MsIO*'s
`saveMsObject()` function in combination with the `PlainTextParam` parameter
object. The path to the directory to which the data should be stored can be
defined with the `path` parameter of `PlainTextParam`. With the call below we
store the MS data object to a temporary directory.
```{r}
d <- file.path(tempdir(), "ms_experiment_export")
saveMsObject(mse, PlainTextParam(path = d))
```
The data was exported to a set of text files that we list below:
```{r}
dir(d)
```
Each text file contains information about one particular *slot* of the MS data
object. See the `?PlainTextParam` help for a description of the files and their
respective formats. We can restore the MS data object again using the
`readMsObject()` function, specifying the type of object we want to restore (and
which was stored to the respective directory) with the first parameter of the
function and the data storage format with the second. In our example we use
`MsExperiment()` as first parameter and `PlainTextParam` as second. The MS data
of our `MsExperiment` data object was represented by a `Spectra` object, thus,
to import the data we need in addition to load the `r Biocpkg("Spectra")`
package.
```{r}
library(Spectra)
mse_in <- readMsObject(MsExperiment(), PlainTextParam(d))
mse_in
```
Note that at present *MsIO* does **not** support storage of the full MS data
(i.e. the individual mass peaks' *m/z* and intensity values) to plain text
file. *MsIO* supports storage of *on-disk* data objects/representations (such as
the `MsBackendMzR` object) to plain text formats. The `Spectra` object that is
used to represent the MS data of our example `MsExperiment` object uses a
`MsBackendMzR` backend and thus we were able to export and import its data. Due
to its on-disk data mode, this type of backend retrieves the MS data on-the-fly
from the original data files and hence we only need to store the MS metadata and
the location of the original data files. Thus, also with the restored MS data
object we have full access to the MS data:
```{r}
spectra(mse_in) |>
head() |>
intensity()
```
However, ff the location of the original MS data files was changed (e.g. if the
files or the stored object was moved to a different location or file system),
the new location of these files would be needed to be specified with parameter
`spectraPath` (e.g. `readMsObject(MsExperiment(), PlainTextParam(d), spectraPath
= )`).
Generally, `saveMsData()` stores the MS data objects in a modular way, i.e. the
content of each component or slot is exported to its own data file. The storage
directory of our example `MsExperiment` contains thus multiple data files:
```{r}
dir(d)
```
This modularity allows also to load only parts of the original data. We can for
example also load only the `Spectra` object representing the MS experiment's MS
data.
```{r}
s <- readMsObject(Spectra(), PlainTextParam(d))
s
```
Or even only the `MsBackendMzR` that is used by the `Spectra` object to
represent the MS data.
```{r}
be <- readMsObject(MsBackendMzR(), PlainTextParam(d))
be
```
# *alabaster*-based formats
The [alabaster framework](https://github.com/ArtifactDB/alabaster.base) and
related Bioconductor package `r Biocpkg("alabaster.base")` implements methods to
save a variety of R/Bioconductor objects to on-disk representations based on
standard file formats like HDF5 and JSON. This ensures that Bioconductor objects
can be easily read from other languages like Python and Javascript. With
`AlabasterParam`, *MsIO* supports export of MS data objects into these storage
formats. Below we export our example `MsExperiment` to a storage directory using
the alabaster format.
```{r}
d <- file.path(tempdir(), "ms_experiment_export_alabaster")
saveMsObject(mse, AlabasterParam(path = d))
```
The contents of the storage folder is listed below:
```{r}
dir(d, recursive = TRUE)
```
In contrast to the plain text format described in the previous section, that
stores all data files into a single directory, the alabaster export is
structured hierarchically into sub-folders by the MS data object's
slots/components.
To restore the object we use the `readMsObject()` function with an
`AlabasterParam` parameter objects to define the used data storage format.
```{r}
mse_in <- readMsObject(MsExperiment(), AlabasterParam(d))
mse_in
```
Also for this format, we can load parts of the data separately. We can load the
MS data as a `Spectra` object from the respective subfolder of the data storage
directory:
```{r}
s <- readMsObject(Spectra(), AlabasterParam(file.path(d, "spectra")))
s
```
The import/export functionality is completely compatible with Bioconductor's
alabaster framework and hence allows also to read the whole, or parts of the
data directly using alabaster's `readObject()` method. The full `MsExperiment`
is restored importing the full directory (i.e. providing the path to the
directory containing the full export with the function's `path` parameter).
```{r}
mse_in <- readObject(path = d)
mse_in
```
Alternatively, by providing a path to one of the MS object's components, it is
possible to read only specific parts of the data. Below we read the sample
annotation information as a `DataFrame` from the *sample_data* subfolder:
```{r}
readObject(path = file.path(d, "sample_data"))
```
# Session information
```{r}
sessionInfo()
```