Title: | Retrieve Mass Spectrometry Data from MetaboLights |
---|---|
Description: | MetaboLights is one of the main public repositories for storage of metabolomics experiments, which includes analysis results as well as raw data. The MsBackendMetaboLights package provides functionality to retrieve and represent mass spectrometry (MS) data from MetaboLights. Data files are downloaded and cached locally, avoiding repeated downloads. MS data from metabolomics experiments can thus be directly and seamlessly integrated into R-based analysis workflows with the Spectra and MsBackendMetaboLights packages. |
Authors: | Johannes Rainer [aut, cre] , Philippine Louail [aut] |
Maintainer: | Johannes Rainer <[email protected]> |
License: | Artistic-2.0 |
Version: | 0.99.1 |
Built: | 2024-11-20 06:07:24 UTC |
Source: | https://github.com/rformassspectrometry/msbackendmetabolights |
MetaboLights is one of the main public repositories for deposition of metabolomics experiments, including (raw) mass spectrometry (MS) and NMR data files as well as experimental and analysis results. The experimental metadata and results are stored as plain text files in ISA-tab format. Each MetaboLights experiment must provide a file describing the analyzed samples and at least one assay file that links the experimental samples to the (raw and processed) data files with the quantification of metabolites/features in these samples.
Each experiment in MetaboLights is identified with its unique identifier, starting with MTBLS followed by a number. The data (metadata files and MS/NMR data files) of an experiment are available through the repository's ftp server.
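The identifier format described above can be checked before any remote query is made. The following is a minimal sketch in base R; the helper `is_mtbls_id()` is hypothetical (it is not part of MsBackendMetaboLights), and the assumption that the prefix must be upper case is ours.

```r
## Hypothetical helper (not provided by the package): test whether strings
## look like MetaboLights IDs, i.e. "MTBLS" followed by one or more digits.
## Assumption: the prefix is matched case-sensitively.
is_mtbls_id <- function(x) {
    grepl("^MTBLS[0-9]+$", x)
}

ids <- c("MTBLS2", "MTBLS39", "mtbls2", "MTBLS", "PXD000001")
valid <- is_mtbls_id(ids)
names(valid) <- ids
valid
```

Such a check only validates the form of an ID; whether the data set actually exists can only be determined by querying the repository.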
The functions listed here allow querying and retrieving information on a data set/experiment from MetaboLights.
mtbls_ftp_path(): returns the FTP path for a provided MetaboLights ID. With mustWork = TRUE (the default) the function throws an error if the path is not accessible (either because the data set does not exist or because no internet connection is available). The function returns a character(1) with the FTP path to the data set folder.
mtbls_cached_data_files(): lists locally cached data files from MetaboLights. Since this function evaluates only local content, it does not require an internet connection. With the default parameters all available data files are listed. The parameters can be used to restrict the lookup.
mtbls_list_files(): returns the available files (and directories) for the specified MetaboLights data set (i.e., the FTP directory content of the data set). The function returns a character vector with the file names relative to the absolute FTP path (mtbls_ftp_path()) of the data set. Parameter pattern allows filtering the file names to define which of them should be returned.
mtbls_sync_data_files(): synchronizes the data files of a specified MetaboLights data set, downloading and locally caching them if needed. Parameter fileName allows specifying the names of selected data files to sync.
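To illustrate what filtering by pattern means, the sketch below applies the default pattern of mtbls_sync_data_files() and the assay file pattern "^a_" to a vector of made-up file names with base R. The file names are invented for illustration, and the use of grepl() is an assumption about how such a pattern could be applied, not a statement about the package's internals.

```r
## Made-up file names standing in for the content of a data set folder.
files <- c("a_MTBLS0_assay.txt", "s_MTBLS0.txt", "i_Investigation.txt",
           "FILES/sample_01.mzML", "FILES/sample_02.mzXML",
           "FILES/sample_03.cdf", "FILES/sample_04.raw")

## Default pattern of mtbls_sync_data_files(): a regular expression
## matching the file endings of the supported MS data formats.
ms_pattern <- "mzML$|CDF$|cdf$|mzXML$"
ms_files <- files[grepl(ms_pattern, files)]
ms_files

## Assay files have names starting with "a_".
assay_files <- files[grepl("^a_", files)]
assay_files
```

Because the pattern is anchored at the end of the name with `$`, a file such as "FILES/sample_04.raw" is not selected.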
mtbls_ftp_path(x = character(), mustWork = TRUE)

mtbls_list_files(x = character(), pattern = NULL)

mtbls_sync_data_files(
  mtblsId = character(),
  assayName = character(),
  pattern = "mzML$|CDF$|cdf$|mzXML$",
  fileName = character()
)

mtbls_cached_data_files(
  mtblsId = character(),
  assayName = character(),
  pattern = "*",
  fileName = character()
)
x |
|
mustWork |
for |
pattern |
for |
mtblsId |
|
assayName |
|
fileName |
for |
For mtbls_ftp_path(): character(1) with the FTP path to the specified data set on the MetaboLights FTP server.
For mtbls_list_files(): character with the names of the files in the data set's base FTP directory.
For mtbls_sync_data_files() and mtbls_cached_data_files(): a data.frame with the MetaboLights ID, the assay name(s) and the remote and local file names of the synchronized data files.
Johannes Rainer, Philippine Louail
## Get the FTP path to the data set MTBLS2
mtbls_ftp_path("MTBLS2")

## Retrieve available files (and directories) for the data set MTBLS2
mtbls_list_files("MTBLS2")

## Retrieve the available assay files (file names starting with "a_").
afiles <- mtbls_list_files("MTBLS2", pattern = "^a_")
afiles

## Read the content of one file
a <- read.table(paste0(mtbls_ftp_path("MTBLS2"), afiles[1L]),
                header = TRUE, sep = "\t", check.names = FALSE)
head(a)

## List all available files
mtbls_cached_data_files()
MsBackendMetaboLights retrieves and represents mass spectrometry (MS) data from metabolomics experiments stored in the MetaboLights repository. The backend directly extends the MsBackendMzR backend from the Spectra package and hence supports MS data in mzML, netCDF and mzXML format. Data in other formats cannot be loaded with MsBackendMetaboLights. Upon initialization with the backendInitialize() method, the MsBackendMetaboLights backend downloads and caches the MS data files of an experiment locally, thus avoiding repeated downloads of the data.
MsBackendMetaboLights()

## S4 method for signature 'MsBackendMetaboLights'
backendInitialize(
  object,
  mtblsId = character(),
  assayName = character(),
  filePattern = "mzML$|CDF$|cdf$|mzXML$",
  offline = FALSE,
  ...
)

## S4 method for signature 'MsBackendMetaboLights'
backendMerge(object, ...)

## S4 method for signature 'MsBackendMetaboLights'
backendRequiredSpectraVariables(object, ...)

mtbls_sync(x, offline = FALSE)
object |
an instance of |
mtblsId |
|
assayName |
|
filePattern |
|
offline |
|
... |
additional parameters; currently ignored. |
x |
an instance of |
File names for data files are by default extracted from the column "Derived Spectral Data File" of the MetaboLights data set's assay table. If this column does not contain any supported file names, the assay's column "Raw Spectral Data File" is evaluated instead.
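The column fallback described above can be sketched in base R as follows. The helper `select_data_files()` and the toy assay table are hypothetical and only illustrate the selection logic; this is not the package's implementation.

```r
## Hypothetical helper (not the package's code): pick data file names from
## an assay table, preferring "Derived Spectral Data File" and falling back
## to "Raw Spectral Data File" if the former has no supported file names.
select_data_files <- function(assay, pattern = "mzML$|CDF$|cdf$|mzXML$") {
    columns <- c("Derived Spectral Data File", "Raw Spectral Data File")
    for (column in columns) {
        if (column %in% colnames(assay)) {
            files <- as.character(assay[[column]])
            files <- files[!is.na(files) & grepl(pattern, files)]
            if (length(files))
                return(unique(files))
        }
    }
    character()
}

## Toy assay table in which only the raw data column lists supported files.
assay <- data.frame(
    `Sample Name` = c("S1", "S2"),
    `Raw Spectral Data File` = c("FILES/S1.mzML", "FILES/S2.mzML"),
    `Derived Spectral Data File` = c("", ""),
    check.names = FALSE, stringsAsFactors = FALSE
)
selected <- select_data_files(assay)
selected
```

In this toy table the derived column is empty, so the file names are taken from the raw data column.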
The backend uses the BiocFileCache package for caching the data files. These are stored in the default local BiocFileCache cache along with additional metadata that includes the MetaboLights ID and the name of the assay file with which the data file is associated. Note that at present only MS data files in mzML, CDF and mzXML format are supported.
The MsBackendMetaboLights backend defines and provides the additional spectra variables "mtbls_id", "mtbls_assay_name" and "derived_spectral_data_file" that list, for each individual spectrum, the MetaboLights ID, the name of the assay file and the original data file name on the MetaboLights FTP server. The "derived_spectral_data_file" variable can be used to map between the experiment's samples and the individual data files and hence their spectra. This mapping is provided in the MetaboLights assay file.
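A minimal sketch of such a mapping in base R is shown below. Both data frames are toy stand-ins: `spectra_vars` mimics a table of per-spectrum variables and `assay` mimics an assay table; the file and sample names are invented.

```r
## Toy stand-in for per-spectrum variables (one row per spectrum).
spectra_vars <- data.frame(
    rtime = c(1.2, 2.4, 1.1),
    derived_spectral_data_file = c("FILES/a.mzML", "FILES/a.mzML",
                                   "FILES/b.mzML"),
    stringsAsFactors = FALSE
)

## Toy stand-in for an assay table linking samples to data files.
assay <- data.frame(
    `Sample Name` = c("sample_A", "sample_B"),
    `Derived Spectral Data File` = c("FILES/a.mzML", "FILES/b.mzML"),
    check.names = FALSE, stringsAsFactors = FALSE
)

## Look up, for each spectrum, the row of its data file in the assay table
## and add the matching sample name.
idx <- match(spectra_vars$derived_spectral_data_file,
             assay[["Derived Spectral Data File"]])
spectra_vars$sample_name <- assay[["Sample Name"]][idx]
spectra_vars
```

Spectra whose data file is not listed in the assay table would get `NA` as sample name, which makes missing links easy to spot.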
The MsBackendMetaboLights backend is considered read-only and thus does not support changing m/z and intensity values directly.
Also, merging of MS data of MsBackendMetaboLights is not supported, and c() of several Spectra with MS data represented by MsBackendMetaboLights will therefore throw an error.
For MsBackendMetaboLights(): an instance of MsBackendMetaboLights.
For backendInitialize(): an instance of MsBackendMetaboLights with the MS data of the specified MetaboLights data set.
For backendRequiredSpectraVariables(): character with the spectra variables that are needed for the backend to provide the MS data.
For mtbls_sync(): the input MsBackendMetaboLights with the paths to the locally cached data files updated if needed.
New instances of the class can be created with the MsBackendMetaboLights() function. Data is loaded and initialized using the backendInitialize() function, which can be configured with the parameters mtblsId, assayName and filePattern. mtblsId must be the ID of a single (existing) MetaboLights data set. Parameter assayName allows defining specific assays of the MetaboLights data set from which the data files should be loaded. If provided, it should be the file name(s) of the respective assay(s) in MetaboLights (use e.g. mtbls_list_files(<MetaboLights ID>, pattern = "^a_") to list all available assay files for a given MetaboLights ID <MetaboLights ID>). By default, with assayName = character(), MS data files from all assays of a data set are loaded. The optional parameter filePattern defines the pattern that should be used to filter the file names of the MS data files. It defaults to the file endings of the supported MS data formats.
backendInitialize() requires an active internet connection because the function first compares the remote file content to the locally cached files and synchronizes changes/updates if needed. This can be skipped with offline = TRUE, in which case only locally cached content is queried.
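One way to use the offline mode is to attempt the online initialization first and to fall back to the cache if it fails. The sketch below shows this pattern with a hypothetical wrapper `init_with_fallback()`; it assumes that the online call signals an error when no connection is available. To keep the example self-contained, a stub function stands in for backendInitialize().

```r
## Hypothetical wrapper (not part of the package): call `init` with
## offline = FALSE first and retry with offline = TRUE if that fails.
init_with_fallback <- function(init, ...) {
    args <- list(...)
    tryCatch(
        do.call(init, c(args, list(offline = FALSE))),
        error = function(e) {
            message("Online initialization failed, using cached data: ",
                    conditionMessage(e))
            do.call(init, c(args, list(offline = TRUE)))
        }
    )
}

## Stub standing in for backendInitialize(): it fails unless offline = TRUE,
## mimicking a missing internet connection.
stub_init <- function(mtblsId, offline = FALSE) {
    if (!offline)
        stop("no internet connection")
    paste("cached data for", mtblsId)
}

res <- init_with_fallback(stub_init, mtblsId = "MTBLS39")
res
```

With the package loaded, the stub could be replaced by a small function such as `function(...) backendInitialize(MsBackendMetaboLights(), ...)`. Note that the fallback can only succeed if the data files were cached by an earlier online call.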
The backendRequiredSpectraVariables() function returns the names of the spectra variables required for the backend to provide the MS data.
The mtbls_sync() function can be used to synchronize the local data cache and ensure that all data files are locally available. The function checks the local cache and, if needed, downloads missing data files from the MetaboLights repository.
Philippine Louail, Johannes Rainer
library(MsBackendMetaboLights)

## List files of a MetaboLights data set
mtbls_list_files("MTBLS39")

## Initialize a MsBackendMetaboLights representing all MS data files of
## the data set with the ID "MTBLS39". This will download and cache all
## files and subsequently load and represent them in R.
be <- backendInitialize(MsBackendMetaboLights(), "MTBLS39")
be

## The `mtbls_sync()` function can be used to ensure that all data files are
## available locally. This function will eventually download missing data
## files or update their paths.
be <- mtbls_sync(be)