Package 'Chromatograms'

Title: Infrastructure for Chromatographic Mass Spectrometry Data
Description: The Chromatograms package defines an efficient infrastructure for storing and handling chromatographic mass spectrometry data. It provides different implementations of *backends* to store and represent the data. Such backends can be optimized for small memory footprint or fast data access/processing. A lazy evaluation queue and chunk-wise processing capabilities ensure efficient analysis of even very large data sets.
Authors: Laurent Gatto [aut], Johannes Rainer [aut], Philippine Louail [aut, cre]
Maintainer: Philippine Louail <[email protected]>
License: Artistic-2.0
Version: 0.5.0
Built: 2025-02-18 08:32:00 UTC
Source: https://github.com/rformassspectrometry/Chromatograms


The Chromatograms class to manage and access chromatographic data

Description

The Chromatograms class encapsulates chromatographic data and related metadata. The chromatographic data is represented by a backend extending the virtual ChromBackend class which provides the raw data to the Chromatograms object. Different backends and their properties are described in the ChromBackend class documentation.

Usage

Chromatograms(backend = ChromBackendMemory(), processingQueue = list(), ...)

## S4 method for signature 'Chromatograms,ChromBackend'
setBackend(
  object,
  backend,
  f = processingChunkFactor(object),
  BPPARAM = SerialParam(),
  ...
)

## S4 method for signature 'Chromatograms'
x$name

## S4 replacement method for signature 'Chromatograms'
x$name <- value

Arguments

backend

ChromBackend object providing the raw data for the Chromatograms object.

processingQueue

list of processing steps (i.e. functions) to be applied to the chromatographic data. The processing steps are applied in the order in which they are listed in the processingQueue.

...

Additional arguments.

object

A Chromatograms object.

f

factor defining the grouping to split the Chromatograms object.

BPPARAM

Parallel setup configuration. See BiocParallel::bpparam() for more information.

x

A Chromatograms object.

name

A character string specifying the name of the variable to access.

value

The value to replace the variable with.

Creation of objects

Chromatograms objects can be created using the Chromatograms() construction function.

Data stored in a Chromatograms object

The Chromatograms object is a container for chromatographic data, which includes peaks data (retention time and related intensity values, also referred to as peaks data variables in the context of Chromatograms) and metadata of individual chromatograms (so-called chromatograms variables). While a core set of chromatograms variables (see coreChromVariables()) and peaks data variables (see corePeaksVariables()) is guaranteed to be provided by a Chromatograms object, arbitrary additional variables can also be added.

The Chromatograms object is designed to contain chromatographic data of a (large) set of chromatograms. The data is organized linearly and can be thought of as a list of chromatograms, i.e. each element in the Chromatograms is one chromatogram.

The chromatograms variables in the Chromatograms object can be accessed using the chromData() function. Specific chromatograms variables can be accessed either by specifying the "columns" parameter in chromData() or by using $. chromData can be accessed, replaced, and also filtered/subsetted. Refer to the chromData documentation for more details.

The peaks data variables in the Chromatograms object can be accessed using the peaksData() function. Specific peaks variables can be accessed either by specifying the "columns" parameter in peaksData() or by using $. peaksData can be accessed, replaced, and also filtered/subsetted. Refer to the peaksData documentation for more details.

Processing of Chromatograms objects

Functions that process the chromatographic data can be applied to the object either directly or through the processingQueue mechanism. The processingQueue is a list of processing steps that are stored within the object and applied only when needed. This allows all queued steps to be applied to the data in a single pass, which is particularly useful for larger datasets. In addition, the queued functions can be applied to the data in a chunk-wise manner, which allows parallel processing and reduces the memory demand. To read more about the processingQueue and how to parallelize processing, see the processingQueue documentation.
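
The idea of a lazy processing queue can be sketched in base R. The code below only illustrates the concept and is not the package's implementation; the helpers add_step() and apply_queue() and the toy peaks data are hypothetical.

```r
## Toy peaks data: one data.frame (rtime, intensity) per chromatogram.
peaks <- list(
    data.frame(rtime = c(1, 2, 3), intensity = c(10, 0, 30)),
    data.frame(rtime = c(1, 2), intensity = c(0, 5))
)

## Hypothetical helper: append a processing step (a function) to a queue.
add_step <- function(queue, fun) c(queue, list(fun))

## Hypothetical helper: apply all queued steps, in order, to each chromatogram.
apply_queue <- function(peaks, queue) {
    lapply(peaks, function(pd) Reduce(function(x, f) f(x), queue, pd))
}

## Steps are only recorded here; nothing is computed yet.
queue <- list()
queue <- add_step(queue, function(pd) pd[pd$intensity > 0, , drop = FALSE])
queue <- add_step(queue, function(pd) {
    pd$intensity <- pd$intensity * 2
    pd
})

## The steps run, in the order they were added, once the data is requested.
res <- apply_queue(peaks, queue)
```

Because the steps are plain functions acting on one chromatogram at a time, the same queue could be applied to any subset (chunk) of the data independently.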

Note

This needs to be discussed: if we want, for example, to be able to set the backend to ChromBackendMzR, backendInitialize() needs to be improved, i.e. it needs to support peaksData and chromData as arguments, and there needs to be a way to write mzML files (which we do not have for chromatographic data).

See Also

chromData for a general description of the chromatographic metadata available in the object, as well as how to access, replace and subset them. peaksData for a general description of the chromatographic peaks data available in the object, as well as how to access, replace and subset them. processingQueue for more information on the queuing of processing steps and parallelization for larger datasets.

Examples

## Create a Chromatograms object
chroms <- Chromatograms(backend = ChromBackendMemory())

Improved in-memory Chromatographic data backend

Description

ChromBackendMemory: This backend stores chromatographic data directly in memory, making it ideal for small datasets or testing. It can be initialized with a data.frame of chromatographic data via the chromData parameter and a list of data.frame entries for peaks data using the peaksData parameter. These data can be accessed with the chromData() and peaksData() functions.

Usage

ChromBackendMemory()

## S4 method for signature 'ChromBackendMemory'
backendInitialize(
  object,
  chromData = fillCoreChromVariables(data.frame()),
  peaksData = list(.EMPTY_PEAKS_DATA),
  ...
)

Arguments

object

A ChromBackendMemory object.

chromData

For backendInitialize() of a ChromBackendMemory backend, a data.frame with the chromatographic data. If not provided (or if empty), a default data.frame with the core chromatographic variables will be created.

peaksData

For backendInitialize() of a ChromBackendMemory backend, a list of data.frame with the peaks data. If not provided (or if empty), a default list of empty data.frame with the core peaks variables will be created. The length of the list should match the number of chromatograms in the chromData parameter.

...

Additional parameters to be passed.
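
As a rough illustration of the input layout described above, the following base R sketch builds a chromData data.frame and a matching peaksData list. The variable values and file names are made up, and the final call is shown only as a comment because it follows the Usage signature above and has not been run here.

```r
## Chromatogram variables: one row per chromatogram.
cdata <- data.frame(
    msLevel = c(1L, 1L, 2L),
    mz = c(112.2, 123.3, 134.4),
    dataOrigin = c("a.mzML", "a.mzML", "b.mzML")  # made-up file names
)

## Peaks data: one data.frame per chromatogram with the columns rtime and
## intensity, in this order. The third chromatogram is empty.
pdata <- list(
    data.frame(rtime = c(12.4, 12.8, 13.2), intensity = c(123.3, 153.6, 2354.3)),
    data.frame(rtime = c(45.1, 46.2), intensity = c(100, 80.1)),
    data.frame(rtime = numeric(), intensity = numeric())
)

## The length of the list has to match the number of rows of cdata.
stopifnot(length(pdata) == nrow(cdata))

## With the package attached, these objects would be passed on following the
## Usage section above (not run here):
## be <- backendInitialize(ChromBackendMemory(), chromData = cdata,
##                         peaksData = pdata)
```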

Author(s)

Philippine Louail


Chromatographic Data Backend for Reading mzML Files

Description

The ChromBackendMzR inherits all slots and methods from the base ChromBackendMemory backend, providing additional functionality for reading chromatographic data from mzML files.

Unlike the ChromBackendMemory backend, the ChromBackendMzR backend should have the dataOrigin chromatogram variable populated with the file path of the mzML file from which the chromatographic data was read.

Note that the ChromBackendMzR backend is read-only and does not support direct modification of chromatographic data. However, it does support peaksData slot replacement, which will modify the peaksData slot but not the local mzML files. This is indicated by the "inMemory" slot being set to TRUE.

Implementing functionalities with the ChromBackendMzR backend should be simplified as much as possible and reuse the methods already implemented for ChromBackendMemory when possible.

Usage

ChromBackendMzR()

## S4 method for signature 'ChromBackendMzR'
backendInitialize(object, files = character(), BPPARAM = bpparam(), ...)

Arguments

object

A ChromBackendMzR object.

files

A character vector of file paths to mzML files.

BPPARAM

Parallel setup configuration. See BiocParallel::bpparam() for more information.

...

Additional parameters to be passed.

Author(s)

Philippine Louail


Chromatographic Peaks Metadata.

Description

As explained in the Chromatograms class documentation, the Chromatograms object is a container for chromatogram data that includes chromatographic peaks data (retention time and related intensity values, also referred to as peaks data variables in the context of Chromatograms) and metadata of individual chromatograms (so-called chromatograms variables).

The chromatograms variables can be accessed using the chromData() function. It is also possible to access specific chromatograms variables using $.

chromData can be accessed, replaced but also filtered/subsetted. Refer to the sections below for more details.

Usage

## S4 method for signature 'Chromatograms'
chromData(object, columns = chromVariables(object), drop = FALSE)

## S4 replacement method for signature 'Chromatograms'
chromData(object) <- value

## S4 method for signature 'Chromatograms'
chromVariables(object)

## S4 method for signature 'Chromatograms'
chromIndex(object)

## S4 replacement method for signature 'Chromatograms'
chromIndex(object) <- value

## S4 method for signature 'Chromatograms'
collisionEnergy(object)

## S4 replacement method for signature 'Chromatograms'
collisionEnergy(object) <- value

## S4 method for signature 'Chromatograms'
dataOrigin(object)

## S4 replacement method for signature 'Chromatograms'
dataOrigin(object) <- value

## S4 method for signature 'Chromatograms'
msLevel(object)

## S4 replacement method for signature 'Chromatograms'
msLevel(object) <- value

## S4 method for signature 'Chromatograms'
mz(object)

## S4 replacement method for signature 'Chromatograms'
mz(object) <- value

## S4 method for signature 'Chromatograms'
mzMax(object)

## S4 replacement method for signature 'Chromatograms'
mzMax(object) <- value

## S4 method for signature 'Chromatograms'
mzMin(object)

## S4 replacement method for signature 'Chromatograms'
mzMin(object) <- value

## S4 method for signature 'Chromatograms'
length(x)

## S4 method for signature 'Chromatograms'
precursorMz(object)

## S4 replacement method for signature 'Chromatograms'
precursorMz(object) <- value

## S4 method for signature 'Chromatograms'
precursorMzMin(object)

## S4 replacement method for signature 'Chromatograms'
precursorMzMin(object) <- value

## S4 method for signature 'Chromatograms'
precursorMzMax(object)

## S4 replacement method for signature 'Chromatograms'
precursorMzMax(object) <- value

## S4 method for signature 'Chromatograms'
productMz(object)

## S4 replacement method for signature 'Chromatograms'
productMz(object) <- value

## S4 method for signature 'Chromatograms'
productMzMin(object)

## S4 replacement method for signature 'Chromatograms'
productMzMin(object) <- value

## S4 method for signature 'Chromatograms'
productMzMax(object)

## S4 replacement method for signature 'Chromatograms'
productMzMax(object) <- value

## S4 method for signature 'Chromatograms'
filterChromData(
  object,
  variables = character(),
  ranges = numeric(),
  match = c("any", "all"),
  keep = TRUE
)

Arguments

object

A Chromatograms object.

columns

A character vector of chromatograms variables to extract.

drop

A logical indicating whether to drop dimensions when extracting a single variable.

value

Replacement value for <- methods. See the individual method description for the expected data type.

x

A Chromatograms object.

variables

For filterChromData(): character vector with the names of the chromatogram variables to filter for. The list of available chromatogram variables can be obtained with chromVariables().

ranges

For filterChromData(): a numeric vector of paired values (upper and lower boundary) that define the ranges to filter the object. These paired values need to be in the same order as the variables parameter (see below).

match

For filterChromData(): character(1) defining whether the condition has to match for all provided ranges (match = "all"), or for any of them (match = "any"; the default, as shown in the Usage section) for chromatogram data to be retained.

keep

For filterChromData(): logical(1) defining whether to keep (keep = TRUE) or remove (keep = FALSE) the chromatogram data that match the condition.

Chromatograms variables and accessor functions

The following chromatograms variables are guaranteed to be provided by a Chromatograms object and to be accessible with either chromData() or a dedicated function named after the variable:

  • chromIndex: an integer with the index of the chromatogram in the original source file (e.g. mzML file).

  • collisionEnergy: for SRM data, numeric with the collision energy of the precursor.

  • dataOrigin: optional character with the origin of the data.

  • msLevel: integer defining the MS level of the data.

  • mz: optional numeric with the (target) m/z value for the chromatographic data.

  • mzMin: optional numeric with the lower m/z value of the m/z range in case the data (e.g. an extracted ion chromatogram EIC) was extracted from a Spectra object.

  • mzMax: optional numeric with the upper m/z value of the m/z range.

  • precursorMz: for SRM data, numeric with the target m/z of the precursor (parent).

  • precursorMzMin: for SRM data, optional numeric with the lower m/z of the precursor's isolation window.

  • precursorMzMax: for SRM data, optional numeric with the upper m/z of the precursor's isolation window.

  • productMz: for SRM data, numeric with the target m/z of the product ion.

  • productMzMin: for SRM data, optional numeric with the lower m/z of the product's isolation window.

  • productMzMax: for SRM data, optional numeric with the upper m/z of the product's isolation window.

Filter Chromatograms variables

Functions that filter Chromatograms based on chromatograms variables (i.e. chromData) will remove chromatographic data that do not meet the specified conditions. This means that if a chromatogram is filtered out, its corresponding chromData and peaksData will be removed from the object immediately.

The available functions to filter chromatogram data are:

  • filterChromData(): Filters numerical chromatographic data variables based on the provided numerical ranges. The method returns a Chromatograms object containing only the chromatograms that match the specified conditions. This function results in an object with fewer chromatograms than the original.
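
The semantics of the variables, ranges, match and keep parameters can be sketched in base R on a plain data.frame. The function filter_ranges() below is hypothetical and is not the package's filterChromData() implementation; in particular, it assumes that each pair in ranges is ordered as lower boundary followed by upper boundary.

```r
## Hypothetical base R sketch of range-based filtering of a data.frame.
filter_ranges <- function(x, variables = character(), ranges = numeric(),
                          match = c("any", "all"), keep = TRUE) {
    match <- match.arg(match)
    stopifnot(length(ranges) == 2 * length(variables))
    if (!length(variables))
        return(x)
    ## One column per variable: TRUE if the value lies within its range.
    ## Assumption: each pair of values is ordered as c(lower, upper).
    within <- vapply(seq_along(variables), function(i) {
        v <- x[[variables[i]]]
        !is.na(v) & v >= ranges[2 * i - 1] & v <= ranges[2 * i]
    }, logical(nrow(x)))
    within <- matrix(within, nrow = nrow(x), ncol = length(variables))
    hit <- if (match == "all") {
        rowSums(within) == ncol(within)
    } else {
        rowSums(within) > 0
    }
    ## keep = TRUE retains matching rows, keep = FALSE removes them.
    x[if (keep) hit else !hit, , drop = FALSE]
}

cd <- data.frame(mz = c(100, 150, 200), msLevel = c(1L, 2L, 2L))

## Rows with mz in [90, 160] or msLevel in [2, 2].
res_any <- filter_ranges(cd, c("mz", "msLevel"), c(90, 160, 2, 2),
                         match = "any")
## Rows fulfilling both conditions.
res_all <- filter_ranges(cd, c("mz", "msLevel"), c(90, 160, 2, 2),
                         match = "all")
```

In this toy example res_any retains all three rows, while res_all retains only the second one.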

Author(s)

Philippine Louail

See Also

Chromatograms for a general description of the Chromatograms object. peaksData for a general description of the chromatographic peaks data available in the object, as well as how to access, replace and subset them. processingQueue for more information on the queuing of processing steps and parallelization for larger datasets.


Chromatographic MS Data Backends

Description

ChromBackend is a virtual class that defines what different backends need to provide to be used by the Chromatograms package and classes.

The backend should provide access to the chromatographic data which mainly consists of (paired) intensity and retention time values. Additional chromatographic metadata such as MS level and precursor and product m/z should also be provided.

Through their implementation, different backends can be optimized either for minimal memory requirements or for performance. Each backend needs to implement the data access methods listed in the Mandatory methods section below.

An example implementation and more details and descriptions are provided in the Creating new ChromBackend classes for Chromatograms vignette.

Currently available backends are:

  • ChromBackendMemory: This backend stores chromatographic data directly in memory, making it ideal for small datasets or testing. It can be initialized with a data.frame of chromatographic data via the chromData parameter and a list of data.frame entries for peaks data using the peaksData parameter. These data can be accessed with the chromData() and peaksData() functions.

  • ChromBackendMzR: The ChromBackendMzR inherits all slots and methods from the base ChromBackendMemory backend, providing additional functionality for reading chromatographic data from mzML files.

filterPeaksData() filters the peak data based on the provided ranges for the given variables.

Usage

coreChromVariables()

corePeaksVariables()

## S4 method for signature 'ChromBackend'
x[i, j, ..., drop = FALSE]

## S4 method for signature 'ChromBackend'
x$name

## S4 replacement method for signature 'ChromBackend'
x$name <- value

## S4 method for signature 'ChromBackend'
backendMerge(object, ...)

## S4 method for signature 'ChromBackend'
chromData(object, columns = chromVariables(object), drop = FALSE)

## S4 replacement method for signature 'ChromBackend'
chromData(object) <- value

## S4 method for signature 'ChromBackend'
peaksData(object, columns = c("rtime", "intensity"), drop = FALSE, ...)

## S4 replacement method for signature 'ChromBackend'
peaksData(object) <- value

## S4 method for signature 'ChromBackend'
x[[i, j, ...]]

## S4 replacement method for signature 'ChromBackend'
x[[i, j, ...]] <- value

## S4 method for signature 'ChromBackend'
backendBpparam(object, BPPARAM = bpparam())

## S4 method for signature 'ChromBackend'
backendInitialize(object, ...)

## S4 method for signature 'ChromBackend'
backendParallelFactor(object, ...)

## S4 method for signature 'list'
backendMerge(object, ...)

## S4 method for signature 'ChromBackend'
chromIndex(object)

## S4 replacement method for signature 'ChromBackend'
chromIndex(object) <- value

## S4 method for signature 'ChromBackend'
chromVariables(object)

## S4 method for signature 'ChromBackend'
collisionEnergy(object)

## S4 replacement method for signature 'ChromBackend'
collisionEnergy(object) <- value

## S4 method for signature 'ChromBackend'
dataOrigin(object)

## S4 replacement method for signature 'ChromBackend'
dataOrigin(object) <- value

## S4 method for signature 'ChromBackend'
intensity(object)

## S4 replacement method for signature 'ChromBackend'
intensity(object) <- value

## S4 method for signature 'ChromBackend'
isEmpty(x)

## S4 method for signature 'ChromBackend'
isReadOnly(object)

## S4 method for signature 'ChromBackend'
length(x)

## S4 method for signature 'ChromBackend'
lengths(x)

## S4 method for signature 'ChromBackend'
msLevel(object)

## S4 replacement method for signature 'ChromBackend'
msLevel(object) <- value

## S4 method for signature 'ChromBackend'
mz(object)

## S4 replacement method for signature 'ChromBackend'
mz(object) <- value

## S4 method for signature 'ChromBackend'
mzMax(object)

## S4 replacement method for signature 'ChromBackend'
mzMax(object) <- value

## S4 method for signature 'ChromBackend'
mzMin(object)

## S4 replacement method for signature 'ChromBackend'
mzMin(object) <- value

## S4 method for signature 'ChromBackend'
peaksVariables(object)

## S4 method for signature 'ChromBackend'
precursorMz(object)

## S4 replacement method for signature 'ChromBackend'
precursorMz(object) <- value

## S4 method for signature 'ChromBackend'
precursorMzMax(object)

## S4 replacement method for signature 'ChromBackend'
precursorMzMax(object) <- value

## S4 method for signature 'ChromBackend'
precursorMzMin(object)

## S4 replacement method for signature 'ChromBackend'
precursorMzMin(object) <- value

## S4 method for signature 'ChromBackend'
productMz(object)

## S4 replacement method for signature 'ChromBackend'
productMz(object) <- value

## S4 method for signature 'ChromBackend'
productMzMax(object)

## S4 replacement method for signature 'ChromBackend'
productMzMax(object) <- value

## S4 method for signature 'ChromBackend'
productMzMin(object)

## S4 replacement method for signature 'ChromBackend'
productMzMin(object) <- value

## S4 method for signature 'ChromBackend'
reset(object)

## S4 method for signature 'ChromBackend'
rtime(object)

## S4 replacement method for signature 'ChromBackend'
rtime(object) <- value

## S4 method for signature 'ChromBackend,ANY'
split(x, f, drop = FALSE, ...)

## S4 method for signature 'ChromBackend'
filterChromData(
  object,
  variables = character(),
  ranges = numeric(),
  match = c("any", "all"),
  keep = TRUE
)

## S4 method for signature 'ChromBackend'
filterPeaksData(
  object,
  variables = character(),
  ranges = numeric(),
  match = c("any", "all"),
  keep = TRUE
)

## S4 method for signature 'ChromBackend'
supportsSetBackend(object, ...)

Arguments

x

Object extending ChromBackend.

i

For [: integer, logical or character to subset the object.

j

For [ and [[: ignored.

...

Additional arguments.

drop

For chromData() and peaksData(): logical(1) default to FALSE. If TRUE, and one column is requested by the user, the method should return a vector (or list of vector for peaksData()) of the single column requested.

name

For $ and $<-: the name of the chromatogram variable to return or set.

value

Replacement value for <- methods. See the individual method description for the expected data type.

object

Object extending ChromBackend.

columns

For chromData() accessor: optional character with column names (chromatogram variables) that should be included in the returned data.frame. By default, all columns are returned.

BPPARAM

Parallel setup configuration. See BiocParallel::bpparam() for more information.

f

factor defining the grouping to split x. See split().

variables

For filterChromData(): character vector with the names of the chromatogram variables to filter for. The list of available chromatogram variables can be obtained with chromVariables().

ranges

For filterChromData(): a numeric vector of paired values (upper and lower boundary) that define the ranges to filter the object. These paired values need to be in the same order as the variables parameter (see below).

match

For filterChromData(): character(1) defining whether the condition has to match for all provided ranges (match = "all"), or for any of them (match = "any"; the default, as shown in the Usage section) for chromatogram data to be retained.

keep

For filterChromData(): logical(1) defining whether to keep (keep = TRUE) or remove (keep = FALSE) the chromatogram data that match the condition.

Core chromatogram variables

The core chromatogram variables are variables (metadata) that can/should be provided by a backend. For each of these variables a value needs to be returned; if none is defined, a missing value (of the correct data type) should be returned. The names of the chromatogram variables in your current chromatogram object are returned by the chromVariables() function.

For each core chromatogram variable a dedicated access method exists. In contrast to the peaks data described below, a single value should be returned for each chromatogram.

The coreChromVariables() function returns the core chromatogram variables along with their expected (defined) data type.

The core chromatogram variables (in alphabetical order) are:

  • chromIndex: an integer with the index of the chromatogram in the original source file (e.g. mzML file).

  • collisionEnergy: for SRM data, numeric with the collision energy of the precursor.

  • dataOrigin: optional character with the origin of a chromatogram.

  • dataOrigin: character defining where the data is (currently) stored.

  • msLevel: integer defining the MS level of the data.

  • mz: optional numeric with the (target) m/z value for the chromatographic data.

  • mzMin: optional numeric with the lower m/z value of the m/z range in case the data (e.g. an extracted ion chromatogram EIC) was extracted from a Spectra object.

  • mzMax: optional numeric with the upper m/z value of the m/z range.

  • precursorMz: for SRM data, numeric with the target m/z of the precursor (parent).

  • precursorMzMin: for SRM data, optional numeric with the lower m/z of the precursor's isolation window.

  • precursorMzMax: for SRM data, optional numeric with the upper m/z of the precursor's isolation window.

  • productMz: for SRM data, numeric with the target m/z of the product ion.

  • productMzMin: for SRM data, optional numeric with the lower m/z of the product's isolation window.

  • productMzMax: for SRM data, optional numeric with the upper m/z of the product's isolation window.

Core Peaks variables

Similar to the core chromatogram variables, core peaks variables represent metadata that should be provided by a backend. Each of these variables should return a value, and if undefined, a missing value (with the appropriate data type) is returned. The number of values for a peaks variable in a single chromatogram can vary, from none to multiple, and may differ between chromatograms.

The names of peaks variables in the current chromatogram object can be obtained with the peaksVariables() function.

Each core peaks variable has a dedicated accessor method.

The corePeaksVariables() function returns the core peaks variables along with their expected (defined) data type.

The core peaks variables, listed in the required order for peaksData, are:

  • rtime: A numeric vector containing retention time values.

  • intensity: A numeric vector containing intensity values.

They should be provided for each chromatogram in the backend, in this order. No NAs are allowed for the rtime values. These characteristics will be checked with the validPeaksData() function.
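
The requirements on peaks data stated above can be expressed as a small base R check. The function check_peaks_data() below is a hypothetical sketch of these rules only; the package's own validPeaksData() may perform different or additional checks.

```r
## Hypothetical checker: returns TRUE or a character vector of messages.
check_peaks_data <- function(x) {
    if (!is.list(x) || is.data.frame(x))
        return("peaks data has to be a list of data.frame")
    msgs <- character()
    for (i in seq_along(x)) {
        pd <- x[[i]]
        if (!is.data.frame(pd) || ncol(pd) < 2 ||
            !identical(colnames(pd)[1:2], c("rtime", "intensity"))) {
            ## Core peaks variables have to come first, in this order.
            msgs <- c(msgs, sprintf(
                "element %d: first two columns must be rtime, intensity", i))
        } else if (anyNA(pd$rtime)) {
            ## NA intensities are tolerated by this sketch, NA rtime is not.
            msgs <- c(msgs, sprintf("element %d: NA in rtime", i))
        }
    }
    if (length(msgs)) msgs else TRUE
}

ok <- list(data.frame(rtime = c(1, 2), intensity = c(5, NA)))
bad <- list(
    data.frame(intensity = 1, rtime = 1),                 # wrong column order
    data.frame(rtime = c(1, NA), intensity = c(2, 3))     # NA retention time
)

res_ok <- check_peaks_data(ok)
res_bad <- check_peaks_data(bad)
```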

Mandatory methods

New backend classes must extend the base ChromBackend class and implement the following mandatory methods:

  • backendInitialize(): initialises the backend. This method is supposed to be called right after creating an instance of the backend class and should prepare the backend. Parameters can be defined freely for each backend, depending on what is needed to initialize the backend. This method must ensure that the chromatogram variable dataOrigin is set correctly.

  • backendBpparam(): returns the parallel processing setup supported by the backend class. This function can be used by any higher level function to evaluate whether the provided parallel processing setup (or the default one returned by bpparam()) is supported by the backend. Backends not supporting parallel processing (e.g. because they contain a connection to a database that can not be shared across processes) should extend this method to return only SerialParam() and hence disable parallel processing for (most) methods and functions. See also backendParallelFactor() for a function to provide a preferred splitting of the backend for parallel processing.

  • backendParallelFactor(): returns a factor defining an optimal (preferred) way how the backend can be split for parallel processing used for all peak data accessor or data manipulation functions. The default implementation returns a factor of length 0 (factor()) providing thus no default splitting. backendParallelFactor() for ChromBackendMzR on the other hand returns factor(dataOrigin(object)) hence suggesting to split the object by data file.

  • chromData(), chromData<-: gets or sets general chromatogram metadata (annotation). chromData() returns a data.frame, chromData<- expects a data.frame with the same number of rows as there are chromatograms in object. Read-only backends might not need to implement the replacement method chromData<- (unless some internal caching mechanism could be used). chromData() should be implemented with the parameter drop set to FALSE as default. With drop = FALSE the method should return a data.frame even if only one column is requested. If drop = TRUE is specified, the output will be a vector of the single column requested. New backends should be implemented such that, for an empty backend, the method returns a data.frame with 0 rows and the columns defined by chromVariables(). By default, the function should return at least the coreChromVariables(), even if their values are NA.

  • peaksData(): returns a list of data.frame with the data (e.g. retention time - intensity pairs) from each chromatogram. The length of the list is equal to the number of chromatograms in object. For an empty chromatogram a data.frame with 0 rows and two columns (named "rtime" and "intensity") has to be returned. The optional parameter columns, if supported by the backend, allows defining which peak variables should be returned in each data.frame. By default (and at minimum), the columns "rtime" and "intensity" have to be provided. peaksData() should be implemented with the parameter drop set to FALSE as default. With drop = FALSE the method should return a data.frame even if only one column is requested. If drop = TRUE is specified, the output will be a vector of the single column requested.

  • peaksData<-: replaces the peak data (retention time and intensity values) of the backend. This method expects a list of two-dimensional arrays (data.frame) with columns representing the peak variables. All existing peaks data are expected to be replaced with these new values. The length of the list has to match the number of chromatograms in object. Note that only writeable backends need to support this method.

  • [: subsets the backend. Only subsetting by element (row/i) is allowed. This method should be implemented so as to also support an empty integer index.

  • $, $<-: access or set/add a single chromatogram variable (column) in the backend.

  • backendMerge(): merges (combines) ChromBackend objects into a single instance. All objects to be merged have to be of the same type.
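
The role of a splitting factor such as the one returned by backendParallelFactor() can be sketched in base R. The example below uses a toy stand-in for a backend (a data.frame plus a list of peaks data) rather than a real ChromBackend object; the data and file names are made up.

```r
## Toy stand-in for a backend: chromatogram variables plus peaks data.
cd <- data.frame(dataOrigin = c("a.mzML", "b.mzML", "a.mzML", "b.mzML"))
pd <- list(
    data.frame(rtime = c(1, 2, 3), intensity = c(1, 5, 2)),
    data.frame(rtime = c(1, 2), intensity = c(4, 4)),
    data.frame(rtime = c(1, 2, 3, 4), intensity = c(0, 9, 3, 1)),
    data.frame(rtime = c(1, 2), intensity = c(7, 2))
)

## Grouping factor, analogous to factor(dataOrigin(object)).
f <- factor(cd$dataOrigin)

## Process each chunk separately. lapply() could be replaced by a parallel
## apply such as BiocParallel::bplapply().
chunks <- split(pd, f)
res <- lapply(chunks, function(chunk) {
    vapply(chunk, function(z) max(z$intensity), numeric(1))
})

## Restore the original order of the chromatograms.
max_int <- unsplit(res, f)
```

Splitting by data origin keeps all chromatograms of one file in the same chunk, so that each worker would only need to access a single file.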

Optional methods with default implementations

Additional methods that might be implemented, but for which default implementations are already present are:

  • [[

  • backendParallelFactor(): returns a factor defining an optimal (preferred) way how the backend can be split for parallel processing used for all peak data accessor or data manipulation functions. The default implementation returns a factor of length 0 (factor()) providing thus no default splitting.

  • chromIndex(): returns an integer vector with the index of the chromatograms in the original source file.

  • chromVariables(): returns a character vector with the available chromatogram variables (columns, fields or attributes) available in object. Variables listed by this function are expected to be returned (if requested) by the chromData() function.

  • collisionEnergy(), collisionEnergy<-: gets or sets the collision energy for the precursor (for SRM data). collisionEnergy() returns a numeric of length equal to the number of chromatograms in object.

  • dataOrigin(), dataOrigin<-: gets or sets the data origin variable. dataOrigin() returns a character of length equal to the number of chromatograms, dataOrigin<- expects a character of length equal to length(object).

  • filterChromData(): filters any numerical chromatographic data variables based on the provided numerical ranges. The method should return a ChromBackend object with the chromatograms that match the condition. This function results in an object with fewer chromatograms than the original.

  • intensity(): gets the intensity values from the chromatograms. Returns a list of numeric vectors (intensity values for each chromatogram). The length of the list is equal to the number of chromatograms in object.

  • intensity<-: replaces the intensity values. value has to be a list of length equal to the number of chromatograms, with the number of values within each list element identical to the number of data pairs in the respective chromatogram. Note that only writeable backends need to support this method.

  • isReadOnly(): returns a logical(1) indicating whether the backend is read-only or also allows writing/updating data. Defaults to FALSE.

  • isEmpty(): returns a logical of length equal to the number of chromatograms with TRUE for chromatograms without any data pairs.

  • length(): returns the number of chromatograms in the object.

  • lengths(): returns the number of data pairs (retention time and intensity values) per chromatogram.

  • msLevel(): gets the chromatogram's MS level. Returns an integer vector (of length equal to the number of chromatograms) with the MS level for each chromatogram (or NA_integer_ if not available).

  • mz(), mz<-: gets or sets the m/z value of the chromatograms. mz() returns a numeric of length equal to the number of chromatograms in object, mz<- expects a numeric of length length(object).

  • mzMax(), mzMax<-: gets or sets the upper m/z of the mass-to-charge range from which a chromatogram contains signal (e.g. if the chromatogram was extracted from MS data in spectra format and an m/z range was provided). mzMax() returns a numeric of length equal to the number of chromatograms in object, mzMax<- expects a numeric of length equal to the number of chromatograms in object.

  • mzMin(), mzMin<-: gets or sets the lower m/z of the mass-to-charge range from which a chromatogram contains signal (e.g. if the chromatogram was extracted from MS data in spectra format and an m/z range was provided). mzMin() returns a numeric of length equal to the number of chromatograms in object, mzMin<- expects a numeric of length equal to the number of chromatograms in object.

  • peaksVariables(): lists the available data variables for the chromatograms. Default peak variables are "rtime" and "intensity" (which all backends need to support and provide), but some backends might provide additional variables. Variables listed by this function are expected to be returned (if requested) by the peaksData() function.

  • precursorMz(), precursorMz<-: gets or sets the (target) m/z of the precursor (for SRM data). precursorMz() returns a numeric of length equal to the number of chromatograms in object. precursorMz<- expects a numeric of length equal to the number of chromatograms.

  • precursorMzMin(), precursorMzMax(), productMzMin(), productMzMax(): gets the lower and upper margin for the precursor or product isolation windows. These functions might return the value of productMz() if the respective minimal or maximal m/z values are not defined in object.

  • productMz(), productMz<-: gets or sets the (target) m/z of the product (for SRM data). productMz() returns a numeric of length equal to the number of chromatograms in object. productMz<- expects a numeric of length equal to the number of chromatograms.

  • rtime(): gets the retention times from the chromatograms. Returns a NumericList() of numeric vectors (retention times for each chromatogram). The length of the returned list is equal to the number of chromatograms in object.

  • rtime<-: replaces the retention times. value has to be a list (or NumericList()) of length equal to the number of chromatograms, and the number of values within each list element has to be identical to the number of data pairs in the respective chromatogram. Note that only writeable backends need to support this method.

  • split(): splits the backend into a list of backends (depending on parameter f). The default method for ChromBackend uses split.default(), thus backends extending ChromBackend don't necessarily need to implement this method.

  • supportsSetBackend(): whether a ChromBackend supports the Chromatograms setBackend() function. The default function takes the peaksData() and chromData() of the user's backend and passes them to the new backend. Both backends in question should therefore have adequate peaksData() and chromData() methods as well as their respective replacement methods. If the backend does not support this function, it should return FALSE.
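The range-based filtering described for filterChromData() above can be illustrated with a minimal base R sketch. It operates on a plain data.frame of chromatogram variables rather than on a real backend. The helper name filter_chrom_data_sketch is hypothetical, and its arguments (variables, ranges, match, keep) are an assumption that simply mirrors the filterPeaksData() signature documented in this manual; it is not the package's implementation.

```r
## Hypothetical helper (not part of the package): keep the rows of a table of
## chromatogram variables whose values fall within the given ranges.
## `ranges` holds one (lower, upper) pair per entry in `variables`.
filter_chrom_data_sketch <- function(cd, variables, ranges,
                                     match = c("any", "all"), keep = TRUE) {
    match <- match.arg(match)
    stopifnot(length(ranges) == 2 * length(variables))
    ## One logical column per variable: TRUE where the value is in range.
    hits <- vapply(seq_along(variables), function(i) {
        rng <- ranges[c(2 * i - 1, 2 * i)]
        v <- cd[[variables[i]]]
        !is.na(v) & v >= rng[1] & v <= rng[2]
    }, logical(nrow(cd)))
    hits <- matrix(hits, nrow = nrow(cd))
    sel <- if (match == "all") rowSums(hits) == ncol(hits) else rowSums(hits) > 0
    if (!keep) sel <- !sel
    cd[sel, , drop = FALSE]
}

cd <- data.frame(mz = c(100, 200, 300), msLevel = c(1L, 2L, 1L))
## Chromatograms with an m/z between 150 and 350: two of the three remain.
res <- filter_chrom_data_sketch(cd, variables = "mz", ranges = c(150, 350))
```

In contrast to filtering peaks data, each removed row here corresponds to a whole chromatogram, which is why the result can be shorter than the input.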

Implementation notes

Backends extending ChromBackend must implement all of its methods (listed above). A guide to creating new backend classes, including additional information and an example backend implementation, is provided in a dedicated vignette.

Note

This function replaces the peaksData() of the input object. Backends with readOnly == TRUE (i.e. ChromBackendMzR) will therefore need a carefully implemented peaksData(object) <- method.

Whether this behaviour should be based on the isReadOnly() output is still open for discussion; it may depend more on how the backend is implemented.

Author(s)

Johannes Rainer, Philippine Louail

Examples

## Create a simple backend implementation
ChromBackendDummy <- setClass("ChromBackendDummy",
    contains = "ChromBackend")

Chromatographic peaks data

Description

As explained in the Chromatograms class documentation, the Chromatograms object is a container for chromatographic data that includes chromatographic peaks data (retention time and related intensity values, also referred to as peaks data variables in the context of Chromatograms) and metadata of individual chromatograms (so-called chromatograms variables).

The peaks data variables can be accessed using the peaksData() function. It is also possible to access specific peaks variables using $.

The peaks data can be accessed, replaced, and filtered/subsetted. Refer to the sections below for more details.
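To make the shape of the peaks data concrete, the following base R sketch builds a stand-in for what peaksData() is described to return: a list with one data.frame per chromatogram, each holding "rtime" and "intensity" columns. It does not use the package. The object pd and the derived vectors are illustrative names only.

```r
## Base R stand-in for the peaks data of three chromatograms: a list with one
## data.frame per chromatogram (the last chromatogram has no data pairs).
pd <- list(
    data.frame(rtime = c(1.0, 2.0, 3.0), intensity = c(10, 50, 20)),
    data.frame(rtime = c(1.5, 2.5), intensity = c(5, 8)),
    data.frame(rtime = numeric(), intensity = numeric())
)

## Extract a single peaks variable for all chromatograms, analogous to what
## rtime() or intensity() provide: a list of numeric vectors.
rt <- lapply(pd, `[[`, "rtime")

## Number of data pairs per chromatogram and an "is empty" flag.
n_pairs <- vapply(pd, nrow, integer(1))
is_empty <- n_pairs == 0L
```

The list always has one element per chromatogram, even if a chromatogram contains no data pairs.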

Usage

## S4 method for signature 'Chromatograms'
peaksData(
  object,
  columns = peaksVariables(object),
  f = processingChunkFactor(object),
  BPPARAM = bpparam(),
  drop = FALSE,
  ...
)

## S4 replacement method for signature 'Chromatograms'
peaksData(object) <- value

## S4 method for signature 'Chromatograms'
peaksVariables(object, ...)

## S4 method for signature 'Chromatograms'
rtime(object, ...)

## S4 replacement method for signature 'Chromatograms'
rtime(object) <- value

## S4 method for signature 'Chromatograms'
intensity(object, ...)

## S4 replacement method for signature 'Chromatograms'
intensity(object) <- value

## S4 method for signature 'Chromatograms'
filterPeaksData(
  object,
  variables = character(),
  ranges = numeric(),
  match = c("any", "all"),
  keep = TRUE
)

Arguments

object

A Chromatograms object.

columns

For peaksData(): optional character with column names (peaks variables) that should be included in the returned list of data.frame. By default, all columns are returned. Available variables can be found by calling peaksVariables() on the object.

f

factor defining the grouping to split the Chromatograms object.

BPPARAM

Parallel setup configuration. See BiocParallel::bpparam() for more information.

drop

For peaksData(): logical(1), defaults to FALSE. If TRUE and a single column is requested by the user, the method returns a list of vectors containing only the requested column.

...

Additional arguments passed to the method.

value

For rtime() and intensity(): numeric vector with the values to replace the current values. The length of the vector must match the number of peaks data pairs in the Chromatograms object.

variables

For filterPeaksData(): character vector with the names of the peaks data variables to filter for. The list of available peaks data variables can be obtained with peaksVariables().

ranges

For filterPeaksData(): a numeric vector of paired values (lower and upper boundary) that define the ranges to filter the object. These paired values need to be in the same order as the variables parameter.

match

For filterPeaksData(): character(1) defining whether the condition has to match for all provided ranges (match = "all"), or for any of them (match = "any"; the default, being the first value listed in the usage above).

keep

For filterPeaksData(): logical(1) defining whether to keep (keep = TRUE) or remove (keep = FALSE) the chromatographic peaks data that match the condition.

Filter Peaks Variables

Functions that filter the peaks data (i.e., peaksData) of a Chromatograms object. These functions remove peaks data that do not meet the specified conditions. If a chromatogram in a Chromatograms object is filtered, only the corresponding peaks variable pairs (i.e., rows) in the peaksData are removed, while the chromatogram itself remains in the object.

The available functions to filter chromatographic peaks data include:

  • filterPeaksData(): Filters numerical peaks data variables based on the specified numerical ranges parameter. This method returns the same input Chromatograms object, but the filtering step is added to the processing queue. The filtered data will be reflected when the user accesses peaksData. This function does not reduce the number of chromatograms in the object, but it removes the specified peaks data (e.g., "rtime" and "intensity" pairs) from the peaksData.
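The effect described above, removing data pairs while keeping every chromatogram, can be sketched in base R on a plain list of data.frames. The helper keep_rt_range is hypothetical and only illustrates the semantics for a single "rtime" range. The package itself adds the step to the processing queue rather than filtering eagerly as done here.

```r
## Stand-in peaks data: one data.frame per chromatogram.
pd <- list(
    data.frame(rtime = c(1, 2, 3, 4), intensity = c(10, 50, 20, 5)),
    data.frame(rtime = c(1, 2), intensity = c(5, 8))
)

## Hypothetical helper: keep only rows with rtime within [lower, upper].
keep_rt_range <- function(x, lower, upper) {
    x[x$rtime >= lower & x$rtime <= upper, , drop = FALSE]
}

## Filter each chromatogram separately. Rows are removed, list elements are not.
filtered <- lapply(pd, keep_rt_range, lower = 2, upper = 3)

n_chrom <- length(filtered)                     # number of chromatograms
n_pairs <- vapply(filtered, nrow, integer(1))   # data pairs left in each
```

The number of list elements is unchanged after filtering, while the number of rows per element shrinks.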

In the case of a read-only backend (such as the ChromBackendMzR), replacing the peaks data is not possible. The peaks data can be filtered, but the filtered data will not be saved in the backend. This means the original mzML files will not be affected by computations performed on the Chromatograms object.

Author(s)

Philippine Louail

See Also

Chromatograms for a general description of the Chromatograms object, and chromData for accessing, substituting and filtering chromatographic variables. For more information on the queuing of processing steps and on parallelization for processing larger datasets, see processingQueue.


Efficiently processing Chromatograms objects.

Description

The processingQueue of a Chromatograms object is a list of processing steps (i.e., functions) that are stored within the object and applied only when needed. This design allows data to be processed in a single step, which is particularly useful for larger datasets. The processing queue enables functions to be applied in a chunk-wise manner, facilitating parallel processing and reducing memory demand.

Since the peaks data can be quite large, a processing queue is used to ensure efficiency. Generally, the processing queue is applied either temporarily when calling peaksData() or permanently when calling applyProcessing(). As explained below the processing efficiency can be further improved by enabling chunk-wise processing.
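The idea of a lazy processing queue can be sketched in a few lines of base R. This is a simplified illustration under stated assumptions, not the package's implementation: obj is a plain list standing in for a Chromatograms object, and add_step, get_peaks and apply_queue are hypothetical helpers that loosely correspond to addProcessing(), peaksData() and applyProcessing().

```r
## Hypothetical stand-in for an object that carries raw peaks data and a queue
## of not-yet-applied processing steps.
obj <- list(
    peaks = list(data.frame(rtime = c(1, 2, 3), intensity = c(10, 50, 20))),
    queue = list()
)

## Append a function to the queue without touching the data.
add_step <- function(obj, FUN) {
    obj$queue <- c(obj$queue, list(FUN))
    obj
}

## Temporary application: run the queued steps, in order, on a copy of the
## data each time the data is requested.
get_peaks <- function(obj) {
    lapply(obj$peaks, function(p) Reduce(function(d, f) f(d), obj$queue, p))
}

## Permanent application: store the processed data and empty the queue.
apply_queue <- function(obj) {
    obj$peaks <- get_peaks(obj)
    obj$queue <- list()
    obj
}

obj <- add_step(obj, function(d) d[d$intensity > 15, , drop = FALSE])
obj <- add_step(obj, function(d) transform(d, intensity = intensity / max(intensity)))

processed <- get_peaks(obj)   # steps applied on the fly
applied <- apply_queue(obj)   # steps applied permanently
```

In this sketch the raw data in obj$peaks stays untouched until apply_queue() is called, which is the property that makes the approach usable with read-only data sources.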

Usage

## S4 method for signature 'Chromatograms'
applyProcessing(
  object,
  f = processingChunkFactor(object),
  BPPARAM = bpparam(),
  ...
)

## S4 method for signature 'Chromatograms'
addProcessing(object, FUN, ...)

## S4 method for signature 'Chromatograms'
processingChunkSize(object, ...)

## S4 replacement method for signature 'Chromatograms'
processingChunkSize(object) <- value

## S4 method for signature 'Chromatograms'
processingChunkFactor(object, chunkSize = processingChunkSize(object), ...)

Arguments

object

A Chromatograms object.

f

factor defining the grouping to split the Chromatograms object.

BPPARAM

Parallel setup configuration. See BiocParallel::bpparam() for more information.

...

Additional arguments passed to the methods.

FUN

For addProcessing(), a function to be added to the Chromatograms object's processing queue.

value

integer(1) defining the chunk size.

chunkSize

integer(1) for processingChunkFactor defining the chunk size. The default is the value stored in the Chromatograms object's processingChunkSize slot.

Value

processingChunkSize() returns the currently defined processing chunk size (or Inf if it is not defined). processingChunkFactor() returns a factor defining the chunks into which object will be split for (parallel) chunk-wise processing or a factor of length 0 if no splitting is defined.

Apply Processing

The applyProcessing() function applies the processing queue to the backend and returns the updated Chromatograms object. The processing queue is a list of processing steps applied to the chromatograms data. Each element in the list is a function that processes the chromatograms data. To apply processing to the peaks data, the backend must be set to a non-read-only backend using the setBackend() function.

Parallel and Chunk-wise Processing of Chromatograms

Many operations on Chromatograms objects, especially those involving the actual peaks data (see peaksData), support chunk-wise processing. This involves splitting the Chromatograms into smaller parts (chunks) that are processed iteratively. This enables parallel processing by data chunk and reduces memory demand since only the peak data of the currently processed subset is loaded into memory. Chunk-wise processing, which is disabled by default, can be enabled by setting the processing chunk size of a Chromatograms object using the processingChunkSize() function to a value smaller than the length of the Chromatograms object. For example, setting processingChunkSize(chr) <- 1000 will cause any data manipulation operation on chr, such as filterPeaksData(), to be performed in parallel for sets of 1000 chromatograms in each iteration.

Chunk-wise processing is particularly useful for Chromatograms objects using an on-disk backend or for very large experiments. For small datasets or Chromatograms using an in-memory backend, direct processing might be more efficient. Setting the chunk size to Inf will disable chunk-wise processing.

Some backends may prefer a specific type of splitting and chunk-wise processing. For example, the ChromBackendMzR backend needs to load MS data from the original (mzML) files, so chunk-wise processing on a per-file basis is ideal. The backendParallelFactor() function for ChromBackend allows backends to suggest a preferred data chunking by returning a factor defining the respective data chunks. The ChromBackendMzR returns a factor based on the dataOrigin chromatograms variable. A factor of length 0 is returned if no particular preferred splitting is needed. The suggested chunk definition will be used if no finite processingChunkSize() is defined. Setting the processingChunkSize overrides backendParallelFactor.
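The precedence rule described above can be sketched as follows. The function choose_chunks and the file names are hypothetical and only illustrate the logic: a finite chunk size defines the chunks, otherwise the backend's suggested factor (here derived from a dataOrigin-like vector) is used.

```r
## Hypothetical data origin of five chromatograms from three files.
data_origin <- c("a.mzML", "a.mzML", "b.mzML", "c.mzML", "b.mzML")

## Per-file chunking as a backend might suggest it: one level per file.
backend_factor <- factor(data_origin, levels = unique(data_origin))

## Hypothetical helper: a finite chunk size overrides the backend's suggestion.
choose_chunks <- function(n, chunk_size, backend_factor) {
    if (is.finite(chunk_size)) {
        factor(ceiling(seq_len(n) / chunk_size))
    } else {
        backend_factor
    }
}

by_file <- choose_chunks(length(data_origin), Inf, backend_factor)
by_size <- choose_chunks(length(data_origin), 2, backend_factor)

## Indices of the chromatograms processed together in each per-file chunk.
chunks <- split(seq_along(data_origin), by_file)
```

Splitting by file means each file only needs to be opened once per processing run, which is the motivation given for the per-file preference.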

Functions to configure parallel or chunk-wise processing:

  • processingChunkSize(): Gets or sets the size of the chunks for parallel or chunk-wise processing of a Chromatograms object. With a value of Inf (the default), no chunk-wise processing will be performed.

  • processingChunkFactor(): Returns a factor defining the chunks into which a Chromatograms object will be split for chunk-wise (parallel) processing. A factor of length 0 indicates that no chunk-wise processing will be performed.
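As an illustration of how a chunk size translates into a chunk factor, the base R sketch below derives a factor from the number of elements and a chunk size, and then processes a vector chunk by chunk. The helper chunk_factor_sketch is hypothetical. Returning a factor of length 0 when the chunk size is infinite or not smaller than the number of elements is an assumption based on the description above, not a statement about the package internals.

```r
## Hypothetical helper: assign consecutive elements to chunks of `chunk_size`.
## Returns a factor of length 0 if no chunk-wise processing is needed.
chunk_factor_sketch <- function(n, chunk_size = Inf) {
    if (!is.finite(chunk_size) || chunk_size >= n)
        return(factor())
    factor(ceiling(seq_len(n) / chunk_size))
}

f <- chunk_factor_sketch(10, chunk_size = 4)
chunk_sizes <- as.integer(table(f))

## Chunk-wise processing: split, process each chunk, restore original order.
x <- seq_len(10)
res <- unsplit(lapply(split(x, f), function(chunk) chunk * 2), f)

## No chunking with the default (infinite) chunk size.
f_none <- chunk_factor_sketch(10)
```

The lapply() call over the chunks is the place where a parallel apply (for example from BiocParallel) could be substituted.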

Note

Some backends might not support parallel processing. For these, the backendBpparam() function will always return a SerialParam() regardless of how parallel processing was defined.

Author(s)

Johannes Rainer, Philippine Louail