Title: | Infrastructure for Chromatographic Mass Spectrometry Data |
---|---|
Description: | The Chromatograms package defines an efficient infrastructure for storing and handling chromatographic mass spectrometry data. It provides different implementations of *backends* to store and represent the data. Such backends can be optimized for a small memory footprint or for fast data access and processing. A lazy evaluation queue and chunk-wise processing capabilities ensure efficient analysis of even very large data sets. |
Authors: | Laurent Gatto [aut] , Johannes Rainer [aut] , Philippine Louail [aut, cre] |
Maintainer: | Philippine Louail <[email protected]> |
License: | Artistic-2.0 |
Version: | 0.2.0 |
Built: | 2024-11-08 10:16:54 UTC |
Source: | https://github.com/rformassspectrometry/Chromatograms |
ChromBackendMemory: This backend stores chromatographic data directly in memory, making it ideal for small datasets or testing. It can be initialized with a data.frame of chromatographic data via the chromData parameter and a list of data.frame entries for peaks data using the peaksData parameter. These data can be accessed with the chromData() and peaksData() functions.
ChromBackendMemory()

## S4 method for signature 'ChromBackendMemory'
backendInitialize(
  object,
  chromData = fillCoreChromVariables(data.frame()),
  peaksData = list(.EMPTY_PEAKS_DATA)
)
object |
A |
chromData |
For |
peaksData |
For |
Philippine Louail
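The initialization described above can be sketched as follows. This is a minimal illustration, not the package's official example: the values in `cdata` and `pdata` are made up, and the package calls are guarded with `requireNamespace()` so the data-preparation part also runs where Chromatograms is not installed. It assumes that `ChromBackendMemory()` and `backendInitialize()` behave as described in the usage section.

```r
## Chromatogram-level metadata: one row per chromatogram (made-up values).
cdata <- data.frame(
    msLevel = c(1L, 1L),
    mz = c(112.2, 123.3)
)

## Peaks data: one data.frame per chromatogram, with the columns
## "rtime" and "intensity" in this order (made-up values).
pdata <- list(
    data.frame(rtime = c(12.4, 12.8, 13.2),
               intensity = c(123.3, 153.6, 2354.3)),
    data.frame(rtime = c(45.1, 46.2),
               intensity = c(100, 80.1))
)

## Guarded package calls: only evaluated if Chromatograms is available.
if (requireNamespace("Chromatograms", quietly = TRUE)) {
    library(Chromatograms)
    be <- backendInitialize(ChromBackendMemory(),
                            chromData = cdata, peaksData = pdata)
    chromData(be)   # chromatogram metadata as a data.frame
    peaksData(be)   # list of data.frames, one per chromatogram
}
```

The number of rows in `cdata` has to match the length of `pdata`, since each row describes one chromatogram.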
ChromBackend is a virtual class that defines what different backends need to provide to be used by the Chromatograms package and classes.
The backend should provide access to the chromatographic data which mainly consists of (paired) intensity and retention time values. Additional chromatographic metadata such as MS level and precursor and product m/z should also be provided.
Through their implementation different backends can be either optimized for minimal memory requirements or performance. Each backend needs to implement data access methods listed in section Backend functions: below.
An example implementation and more details and descriptions are provided in the Creating new ChromBackend classes for Chromatograms vignette.
Currently available backends are:
ChromBackendMemory: This backend stores chromatographic data directly in memory, making it ideal for small datasets or testing. It can be initialized with a data.frame of chromatographic data via the chromData parameter and a list of data.frame entries for peaks data using the peaksData parameter. These data can be accessed with the chromData() and peaksData() functions.
chromData(object, ...)
chromData(object) <- value
chromIndex(object, ...)
chromIndex(object) <- value
chromVariables(object, ...)
mzMax(object, ...)
mzMax(object) <- value
mzMin(object, ...)
mzMin(object) <- value
precursorMzMin(object, ...)
precursorMzMin(object) <- value
precursorMzMax(object, ...)
precursorMzMax(object) <- value
productMzMax(object, ...)
productMzMax(object) <- value
productMzMin(object, ...)
productMzMin(object) <- value
reset(object, ...)
coreChromVariables()
corePeaksVariables()

## S4 method for signature 'ChromBackend'
x[i, j, ..., drop = FALSE]

## S4 method for signature 'ChromBackend'
x$name

## S4 replacement method for signature 'ChromBackend'
x$name <- value

## S4 method for signature 'ChromBackend'
backendMerge(object, ...)

## S4 method for signature 'ChromBackend'
chromData(object, columns = chromVariables(object), drop = FALSE)

## S4 replacement method for signature 'ChromBackend'
chromData(object) <- value

## S4 method for signature 'ChromBackend'
peaksData(object, columns = c("rtime", "intensity"), drop = FALSE)

## S4 replacement method for signature 'ChromBackend'
peaksData(object) <- value

## S4 method for signature 'ChromBackend'
backendInitialize(object, ...)

## S4 method for signature 'ChromBackend'
backendParallelFactor(object, ...)

## S4 method for signature 'list'
backendMerge(object, ...)

## S4 method for signature 'ChromBackend'
chromIndex(object)

## S4 replacement method for signature 'ChromBackend'
chromIndex(object) <- value

## S4 method for signature 'ChromBackend'
chromVariables(object)

## S4 method for signature 'ChromBackend'
collisionEnergy(object)

## S4 replacement method for signature 'ChromBackend'
collisionEnergy(object) <- value

## S4 method for signature 'ChromBackend'
dataOrigin(object)

## S4 replacement method for signature 'ChromBackend'
dataOrigin(object) <- value

## S4 method for signature 'ChromBackend'
dataStorage(object)

## S4 replacement method for signature 'ChromBackend'
dataStorage(object) <- value

## S4 method for signature 'ChromBackend'
intensity(object)

## S4 replacement method for signature 'ChromBackend'
intensity(object) <- value

## S4 method for signature 'ChromBackend'
isEmpty(x)

## S4 method for signature 'ChromBackend'
isReadOnly(object)

## S4 method for signature 'ChromBackend'
length(x)

## S4 method for signature 'ChromBackend'
lengths(x)

## S4 method for signature 'ChromBackend'
msLevel(object)

## S4 replacement method for signature 'ChromBackend'
msLevel(object) <- value

## S4 method for signature 'ChromBackend'
mz(object)

## S4 replacement method for signature 'ChromBackend'
mz(object) <- value

## S4 method for signature 'ChromBackend'
mzMax(object)

## S4 replacement method for signature 'ChromBackend'
mzMax(object) <- value

## S4 method for signature 'ChromBackend'
mzMin(object)

## S4 replacement method for signature 'ChromBackend'
mzMin(object) <- value

## S4 method for signature 'ChromBackend'
peaksVariables(object)

## S4 method for signature 'ChromBackend'
precursorMz(object)

## S4 replacement method for signature 'ChromBackend'
precursorMz(object) <- value

## S4 method for signature 'ChromBackend'
precursorMzMax(object)

## S4 replacement method for signature 'ChromBackend'
precursorMzMax(object) <- value

## S4 method for signature 'ChromBackend'
precursorMzMin(object)

## S4 replacement method for signature 'ChromBackend'
precursorMzMin(object) <- value

## S4 method for signature 'ChromBackend'
productMz(object)

## S4 replacement method for signature 'ChromBackend'
productMz(object) <- value

## S4 method for signature 'ChromBackend'
productMzMax(object)

## S4 replacement method for signature 'ChromBackend'
productMzMax(object) <- value

## S4 method for signature 'ChromBackend'
productMzMin(object)

## S4 replacement method for signature 'ChromBackend'
productMzMin(object) <- value

## S4 method for signature 'ChromBackend'
reset(object)

## S4 method for signature 'ChromBackend'
rtime(object)

## S4 replacement method for signature 'ChromBackend'
rtime(object) <- value

## S4 method for signature 'ChromBackend,ANY'
split(x, f, drop = FALSE, ...)
object |
Object extending |
... |
Additional arguments. |
value |
replacement value for |
x |
Object extending |
i |
For |
j |
For |
drop |
For |
name |
For |
columns |
For |
f |
|
The core chromatogram variables are variables (metadata) that can/should be provided by a backend. A value needs to be returned for each of these variables; if none is defined, a missing value (of the correct data type) should be returned. The names of the chromatogram variables in your current chromatogram object are returned with the chromVariables() function.
For each core chromatogram variable a dedicated access method exists. In contrast to the peaks data described below, a single value should be returned for each chromatogram.
The coreChromVariables() function returns the core chromatogram variables along with their expected (defined) data type.
The core chromatogram variables (in alphabetical order) are:
chromIndex: an integer with the index of the chromatogram in the original source file (e.g. mzML file).
collisionEnergy: for SRM data, numeric with the collision energy of the precursor.
dataOrigin: optional character with the origin of a chromatogram.
dataStorage: character defining where the data is (currently) stored.
msLevel: integer defining the MS level of the data.
mz: optional numeric with the (target) m/z value for the chromatographic data.
mzMin: optional numeric with the lower m/z value of the m/z range in case the data (e.g. an extracted ion chromatogram, EIC) was extracted from a Spectra object.
mzMax: optional numeric with the upper m/z value of the m/z range.
precursorMz: for SRM data, numeric with the target m/z of the precursor (parent).
precursorMzMin: for SRM data, optional numeric with the lower m/z of the precursor's isolation window.
precursorMzMax: for SRM data, optional numeric with the upper m/z of the precursor's isolation window.
productMz: for SRM data, numeric with the target m/z of the product ion.
productMzMin: for SRM data, optional numeric with the lower m/z of the product's isolation window.
productMzMax: for SRM data, optional numeric with the upper m/z of the product's isolation window.
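The rule that undefined variables are returned as missing values of the correct data type can be illustrated in base R. The helper `fill_missing_variables()` below is hypothetical (it is not part of the package), and `defaults` lists only a subset of the core variables, with the types given in the list above.

```r
## Hypothetical helper (not part of Chromatograms): add every variable
## from `defaults` that is absent in `x` as a column of typed NAs.
fill_missing_variables <- function(x, defaults) {
    for (nm in setdiff(names(defaults), colnames(x)))
        x[[nm]] <- rep(defaults[[nm]], nrow(x))
    x
}

## A subset of the core chromatogram variables with typed missing values.
defaults <- list(
    chromIndex = NA_integer_,
    collisionEnergy = NA_real_,
    dataOrigin = NA_character_,
    msLevel = NA_integer_,
    mz = NA_real_
)

## Only msLevel and mz are defined by the (made-up) data source.
cdata <- data.frame(msLevel = c(1L, 2L), mz = c(112.2, 123.3))
cdata <- fill_missing_variables(cdata, defaults)

## Each column keeps the expected type, even if all values are NA.
vapply(cdata, class, character(1))
```

Using typed constants such as `NA_integer_` instead of a plain `NA` keeps the column types stable, which matters when backends are merged or compared.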
Similar to the core chromatogram variables, core peaks variables represent metadata that should be provided by a backend. Each of these variables should return a value, and if undefined, a missing value (with the appropriate data type) is returned. The number of values for a peaks variable in a single chromatogram can vary, from none to multiple, and may differ between chromatograms.
The names of peaks variables in the current chromatogram object can be obtained with the peaksVariables() function.
Each core peaks variable has a dedicated accessor method.
The corePeaksVariables() function returns the core peaks variables along with their expected (defined) data type.
The core peaks variables, listed in the required order for peaksData, are:
rtime: A numeric vector containing retention time values.
intensity: A numeric vector containing intensity values.
They should be provided for each chromatogram in the backend, in this order. No NAs are allowed for the rtime values. These characteristics will be checked with the validPeaksData() function.
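The constraints on peaks data listed above (a list of data.frame objects, columns `"rtime"` and `"intensity"` in this order, no missing retention times) can be sketched as a base R check. `check_peaks_data()` is a hypothetical stand-in for illustration only; it is not the package's `validPeaksData()` and may check more or less than that function does.

```r
## Hypothetical checker illustrating the documented constraints; this is
## NOT the implementation of validPeaksData().
check_peaks_data <- function(x) {
    if (!is.list(x) || is.data.frame(x))
        stop("peaks data has to be a list of data.frames")
    for (i in seq_along(x)) {
        pd <- x[[i]]
        if (!is.data.frame(pd))
            stop("element ", i, " is not a data.frame")
        ## The first two columns must be rtime and intensity, in this order.
        if (!identical(colnames(pd)[1:2], c("rtime", "intensity")))
            stop("element ", i, ": first columns must be rtime, intensity")
        if (!is.numeric(pd$rtime) || !is.numeric(pd$intensity))
            stop("element ", i, ": rtime and intensity must be numeric")
        ## Missing values are allowed for intensity but not for rtime.
        if (anyNA(pd$rtime))
            stop("element ", i, ": NA values in rtime are not allowed")
    }
    invisible(TRUE)
}

good <- list(data.frame(rtime = c(1.1, 1.2), intensity = c(10, NA)))
bad <- list(data.frame(rtime = c(1.1, NA), intensity = c(10, 20)))

check_peaks_data(good)
res <- try(check_peaks_data(bad), silent = TRUE)
inherits(res, "try-error")
```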
New backend classes must extend the base ChromBackend class and implement the following mandatory methods:
backendInitialize(): initialises the backend. This method is supposed to be called right after creating an instance of the backend class and should prepare the backend. Parameters can be defined freely for each backend, depending on what is needed to initialize the backend. This method has to ensure that the chromatogram variable dataStorage is set correctly.
chromData(), chromData<-: gets or sets general chromatogram metadata (annotation). chromData() returns a data.frame; chromData<- expects a data.frame with the same number of rows as there are chromatograms in object. Read-only backends might not need to implement the replacement method chromData<- (unless some internal caching mechanism could be used). chromData() should be implemented with the parameter drop set to FALSE as default. With drop = FALSE the method should return a data.frame even if only one column is requested. If drop = TRUE is specified, the output will be a vector of the single column requested. New backends should be implemented such that, if the backend is empty, the method returns a data.frame with 0 rows and the columns defined by chromVariables(). By default, the function should return at minimum the core chromatogram variables (see coreChromVariables()), even if their values are all NA.
peaksData(): returns a list of data.frame objects with the data (e.g. retention time - intensity pairs) from each chromatogram. The length of the list is equal to the number of chromatograms in object. For an empty chromatogram a data.frame with 0 rows and two columns (named "rtime" and "intensity") has to be returned. The optional parameter columns, if supported by the backend, allows defining which peak variables should be returned in each data.frame. By default (and at minimum), the columns "rtime" and "intensity" have to be provided. peaksData() should be implemented with the parameter drop set to FALSE as default. With drop = FALSE the method should return a data.frame even if only one column is requested. If drop = TRUE is specified, the output will be a vector of the single column requested.
peaksData<-: replaces the peak data (retention time and intensity values) of the backend. This method expects a list of two-dimensional arrays (data.frame) with columns representing the peak variables. All existing peaks data are expected to be replaced with these new values. The length of the list has to match the number of chromatograms in object. Note that only writeable backends need to support this method.
[: subsets the backend. Only subsetting by element (row/i) is allowed.
$, $<-: access or set/add a single chromatogram variable (column) in the backend.
backendMerge(): merges (combines) ChromBackend objects into a single instance. All objects to be merged have to be of the same type.
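The expected return shapes for `drop = FALSE`, `drop = TRUE`, and empty data can be illustrated with plain data.frame subsetting in base R. This sketch only mirrors the documented semantics; it does not call any backend method, and the values are made up.

```r
## Made-up chromatogram metadata, standing in for the result of chromData().
cdata <- data.frame(msLevel = c(1L, 2L), mz = c(112.2, 123.3))

## drop = FALSE: a single requested column is still a data.frame.
one_col <- cdata[, "mz", drop = FALSE]
class(one_col)

## drop = TRUE: the single requested column is returned as a vector.
as_vec <- cdata[, "mz", drop = TRUE]
class(as_vec)

## An empty backend: zero rows, but the columns are still defined.
empty_cdata <- cdata[integer(0), , drop = FALSE]
dim(empty_cdata)

## Peaks data of an empty chromatogram: zero rows and the two columns
## "rtime" and "intensity".
empty_peaks <- data.frame(rtime = numeric(), intensity = numeric())
colnames(empty_peaks)
```

Defaulting to `drop = FALSE` keeps the return type predictable regardless of how many columns are requested, which is why the methods above are expected to use it as default.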
Additional methods that might be implemented, but for which default implementations are already present are:
backendParallelFactor(): returns a factor defining an optimal (preferred) way in which the backend can be split for parallel processing, used for all peak data accessor or data manipulation functions. The default implementation returns a factor of length 0 (factor()), thus providing no default splitting.
chromIndex(): returns an integer vector with the index of the chromatograms in the original source file.
chromVariables(): returns a character vector with the chromatogram variables (columns, fields or attributes) available in object. Variables listed by this function are expected to be returned (if requested) by the chromData() function.
collisionEnergy(), collisionEnergy<-: gets or sets the collision energy for the precursor (for SRM data). collisionEnergy() returns a numeric of length equal to the number of chromatograms in object.
dataOrigin(), dataOrigin<-: gets or sets the data origin variable. dataOrigin() returns a character of length equal to the number of chromatograms; dataOrigin<- expects a character of length equal to length(object).
dataStorage(), dataStorage<-: gets or sets the data storage variable. dataStorage() returns a character of length equal to the number of chromatograms in object; dataStorage<- expects a character of length equal to length(object). Note that missing values (NA_character_) are not supported for dataStorage().
intensity(): gets the intensity values from the chromatograms. Returns a list of numeric vectors (intensity values for each chromatogram). The length of the list is equal to the number of chromatograms in object.
intensity<-: replaces the intensity values. value has to be a list of length equal to the number of chromatograms, with the number of values within each list element identical to the number of data pairs in the respective chromatogram. Note that only writeable backends need to support this method.
isReadOnly(): returns a logical(1) indicating whether the backend is read-only or also allows writing/updating data. Defaults to FALSE.
isEmpty(): returns a logical of length equal to the number of chromatograms, with TRUE for chromatograms without any data pairs.
length(): returns the number of chromatograms in the object.
lengths(): returns the number of data pairs (retention time and intensity values) per chromatogram.
msLevel(): gets the chromatogram's MS level. Returns an integer vector (of length equal to the number of chromatograms) with the MS level for each chromatogram (or NA_integer_ if not available).
mz(), mz<-: gets or sets the m/z value of the chromatograms. mz() returns a numeric of length equal to the number of chromatograms in object; mz<- expects a numeric of length length(object).
mzMax(), mzMax<-: gets or sets the upper m/z of the mass-to-charge range from which a chromatogram contains signal (e.g. if the chromatogram was extracted from MS data in spectra format and an m/z range was provided). mzMax() returns a numeric of length equal to the number of chromatograms in object; mzMax<- expects a numeric of length equal to the number of chromatograms in object.
mzMin(), mzMin<-: gets or sets the lower m/z of the mass-to-charge range from which a chromatogram contains signal (e.g. if the chromatogram was extracted from MS data in spectra format and an m/z range was provided). mzMin() returns a numeric of length equal to the number of chromatograms in object; mzMin<- expects a numeric of length equal to the number of chromatograms in object.
peaksVariables(): lists the available data variables for the chromatograms. Default peak variables are "rtime" and "intensity" (which all backends need to support and provide), but some backends might provide additional variables. Variables listed by this function are expected to be returned (if requested) by the peaksData() function.
precursorMz(), precursorMz<-: gets or sets the (target) m/z of the precursor (for SRM data). precursorMz() returns a numeric of length equal to the number of chromatograms in object. precursorMz<- expects a numeric of length equal to the number of chromatograms.
precursorMzMin(), precursorMzMax(), productMzMin(), productMzMax(): gets the lower and upper margin for the precursor or product isolation windows. These functions might return the value of productMz() if the respective minimal or maximal m/z values are not defined in object.
productMz(), productMz<-: gets or sets the (target) m/z of the product (for SRM data). productMz() returns a numeric of length equal to the number of chromatograms in object. productMz<- expects a numeric of length equal to the number of chromatograms.
rtime(): gets the retention times from the chromatograms. Returns a NumericList() of numeric vectors (retention times for each chromatogram). The length of the returned list is equal to the number of chromatograms in object.
rtime<-: replaces the retention times. value has to be a list (or NumericList()) of length equal to the number of chromatograms, with the number of values within each list element identical to the number of data pairs in the respective chromatogram. Note that only writeable backends support this method.
split(): splits the backend into a list of backends (depending on parameter f). The default method for ChromBackend uses split.default(), thus backends extending ChromBackend don't necessarily need to implement this method.
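What several of these accessors are expected to return can be sketched in base R on a plain list of peaks data. The code below does not use a backend object; the comments name the accessor each expression mimics, and all values are made up.

```r
## Made-up peaks data for three chromatograms; the second one is empty.
pdata <- list(
    data.frame(rtime = c(1, 2, 3), intensity = c(10, 20, 15)),
    data.frame(rtime = numeric(), intensity = numeric()),
    data.frame(rtime = c(4, 5), intensity = c(7, 9))
)

## Mimics length(): number of chromatograms.
n_chrom <- length(pdata)

## Mimics lengths(): number of data pairs per chromatogram.
n_pairs <- vapply(pdata, nrow, integer(1))

## Mimics isEmpty(): TRUE for chromatograms without any data pairs.
is_empty <- n_pairs == 0L

## Mimics intensity(): one numeric vector per chromatogram.
ints <- lapply(pdata, `[[`, "intensity")

## Mimics split(): f assigns each chromatogram to a chunk, similar to a
## factor that backendParallelFactor() could return for a backend.
f <- factor(c("a", "b", "a"))
chunks <- split(pdata, f)
lengths(chunks)
```

Splitting by a factor in this way is what makes chunk-wise or parallel processing possible: each chunk can be processed independently and the results combined afterwards.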
Backends extending ChromBackend must implement all of its methods (listed above). A guide to creating new backend classes, with additional information and an example backend implementation, is provided in a dedicated vignette.
Johannes Rainer, Philippine Louail
## Create a simple backend implementation
ChromBackendDummy <- setClass("ChromBackendDummy", contains = "ChromBackend")