Package 'Chromatograms'

Title: Infrastructure for Chromatographic Mass Spectrometry Data
Description: The Chromatograms package defines an efficient infrastructure for storing and handling chromatographic mass spectrometry data. It provides different implementations of *backends* to store and represent the data. Such backends can be optimized for small memory footprint or fast data access/processing. A lazy evaluation queue and chunk-wise processing capabilities ensure efficient analysis of even very large data sets.
Authors: Laurent Gatto [aut], Johannes Rainer [aut], Philippine Louail [aut, cre]
Maintainer: Philippine Louail <[email protected]>
License: Artistic-2.0
Version: 0.2.0
Built: 2024-11-08 10:16:54 UTC
Source: https://github.com/rformassspectrometry/Chromatograms

Improved in-memory Chromatographic data backend

Description

ChromBackendMemory: This backend stores chromatographic data directly in memory, making it ideal for small datasets or testing. It can be initialized with a data.frame of chromatographic data via the chromData parameter and a list of data.frame entries for peaks data using the peaksData parameter. These data can be accessed with the chromData() and peaksData() functions.

Usage

ChromBackendMemory()

## S4 method for signature 'ChromBackendMemory'
backendInitialize(
  object,
  chromData = fillCoreChromVariables(data.frame()),
  peaksData = list(.EMPTY_PEAKS_DATA)
)

Arguments

object

A ChromBackendMemory object.

chromData

For backendInitialize() of a ChromBackendMemory backend, a data.frame with the chromatographic data. If not provided (or if empty), a default data.frame with the core chromatographic variables will be created.

peaksData

For backendInitialize() of a ChromBackendMemory backend, a list of data.frame with the peaks data. If not provided (or if empty), a default list of empty data.frame with the core peaks variables will be created. The length of the list should match the number of chromatograms in the chromData parameter.
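The input layout described by the chromData and peaksData arguments can be sketched as follows. This is a minimal illustration with made-up values; the variable names cdata, pdata and be are placeholders. The call to backendInitialize() follows the usage shown above and is only attempted if the Chromatograms package is installed.

```r
## Chromatogram-level metadata: one row per chromatogram (made-up values).
cdata <- data.frame(
    msLevel = c(1L, 1L, 2L),
    mz = c(112.2, 123.3, 134.4)
)

## Peaks data: one data.frame per chromatogram with the columns "rtime"
## and "intensity", in this order. The third chromatogram is empty.
pdata <- list(
    data.frame(rtime = c(12.4, 12.8, 13.2),
               intensity = c(123.3, 153.6, 2354.3)),
    data.frame(rtime = c(45.1, 46.2),
               intensity = c(100, 80.1)),
    data.frame(rtime = numeric(), intensity = numeric())
)

## The length of the list has to match the number of chromatograms.
stopifnot(length(pdata) == nrow(cdata))

## Initialize the backend, assuming the package is available.
if (requireNamespace("Chromatograms", quietly = TRUE)) {
    suppressPackageStartupMessages(library(Chromatograms))
    be <- backendInitialize(ChromBackendMemory(),
                            chromData = cdata, peaksData = pdata)
    cd <- chromData(be)   # data.frame, one row per chromatogram
    pd <- peaksData(be)   # list of data.frame, one per chromatogram
}
```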

Author(s)

Philippine Louail


Chromatographic MS Data Backends

Description

ChromBackend is a virtual class that defines what different backends need to provide to be used by the Chromatograms package and classes.

The backend should provide access to the chromatographic data which mainly consists of (paired) intensity and retention time values. Additional chromatographic metadata such as MS level and precursor and product m/z should also be provided.

Through their implementation, different backends can be optimized either for minimal memory requirements or for performance. Each backend needs to implement the data access methods listed in the sections Mandatory methods and Optional methods with default implementations below.

An example implementation and more details and descriptions are provided in the Creating new ChromBackend classes for Chromatograms vignette.

Currently available backends are:

  • ChromBackendMemory: This backend stores chromatographic data directly in memory, making it ideal for small datasets or testing. It can be initialized with a data.frame of chromatographic data via the chromData parameter and a list of data.frame entries for peaks data using the peaksData parameter. These data can be accessed with the chromData() and peaksData() functions.

Usage

chromData(object, ...)

chromData(object) <- value

chromIndex(object, ...)

chromIndex(object) <- value

chromVariables(object, ...)

mzMax(object, ...)

mzMax(object) <- value

mzMin(object, ...)

mzMin(object) <- value

precursorMzMin(object, ...)

precursorMzMin(object) <- value

precursorMzMax(object, ...)

precursorMzMax(object) <- value

productMzMax(object, ...)

productMzMax(object) <- value

productMzMin(object, ...)

productMzMin(object) <- value

reset(object, ...)

coreChromVariables()

corePeaksVariables()

## S4 method for signature 'ChromBackend'
x[i, j, ..., drop = FALSE]

## S4 method for signature 'ChromBackend'
x$name

## S4 replacement method for signature 'ChromBackend'
x$name <- value

## S4 method for signature 'ChromBackend'
backendMerge(object, ...)

## S4 method for signature 'ChromBackend'
chromData(object, columns = chromVariables(object), drop = FALSE)

## S4 replacement method for signature 'ChromBackend'
chromData(object) <- value

## S4 method for signature 'ChromBackend'
peaksData(object, columns = c("rtime", "intensity"), drop = FALSE)

## S4 replacement method for signature 'ChromBackend'
peaksData(object) <- value

## S4 method for signature 'ChromBackend'
backendInitialize(object, ...)

## S4 method for signature 'ChromBackend'
backendParallelFactor(object, ...)

## S4 method for signature 'list'
backendMerge(object, ...)

## S4 method for signature 'ChromBackend'
chromIndex(object)

## S4 replacement method for signature 'ChromBackend'
chromIndex(object) <- value

## S4 method for signature 'ChromBackend'
chromVariables(object)

## S4 method for signature 'ChromBackend'
collisionEnergy(object)

## S4 replacement method for signature 'ChromBackend'
collisionEnergy(object) <- value

## S4 method for signature 'ChromBackend'
dataOrigin(object)

## S4 replacement method for signature 'ChromBackend'
dataOrigin(object) <- value

## S4 method for signature 'ChromBackend'
dataStorage(object)

## S4 replacement method for signature 'ChromBackend'
dataStorage(object) <- value

## S4 method for signature 'ChromBackend'
intensity(object)

## S4 replacement method for signature 'ChromBackend'
intensity(object) <- value

## S4 method for signature 'ChromBackend'
isEmpty(x)

## S4 method for signature 'ChromBackend'
isReadOnly(object)

## S4 method for signature 'ChromBackend'
length(x)

## S4 method for signature 'ChromBackend'
lengths(x)

## S4 method for signature 'ChromBackend'
msLevel(object)

## S4 replacement method for signature 'ChromBackend'
msLevel(object) <- value

## S4 method for signature 'ChromBackend'
mz(object)

## S4 replacement method for signature 'ChromBackend'
mz(object) <- value

## S4 method for signature 'ChromBackend'
mzMax(object)

## S4 replacement method for signature 'ChromBackend'
mzMax(object) <- value

## S4 method for signature 'ChromBackend'
mzMin(object)

## S4 replacement method for signature 'ChromBackend'
mzMin(object) <- value

## S4 method for signature 'ChromBackend'
peaksVariables(object)

## S4 method for signature 'ChromBackend'
precursorMz(object)

## S4 replacement method for signature 'ChromBackend'
precursorMz(object) <- value

## S4 method for signature 'ChromBackend'
precursorMzMax(object)

## S4 replacement method for signature 'ChromBackend'
precursorMzMax(object) <- value

## S4 method for signature 'ChromBackend'
precursorMzMin(object)

## S4 replacement method for signature 'ChromBackend'
precursorMzMin(object) <- value

## S4 method for signature 'ChromBackend'
productMz(object)

## S4 replacement method for signature 'ChromBackend'
productMz(object) <- value

## S4 method for signature 'ChromBackend'
productMzMax(object)

## S4 replacement method for signature 'ChromBackend'
productMzMax(object) <- value

## S4 method for signature 'ChromBackend'
productMzMin(object)

## S4 replacement method for signature 'ChromBackend'
productMzMin(object) <- value

## S4 method for signature 'ChromBackend'
reset(object)

## S4 method for signature 'ChromBackend'
rtime(object)

## S4 replacement method for signature 'ChromBackend'
rtime(object) <- value

## S4 method for signature 'ChromBackend,ANY'
split(x, f, drop = FALSE, ...)

Arguments

object

Object extending ChromBackend.

...

Additional arguments.

value

Replacement value for <- methods. See the individual method descriptions for the expected data type.

x

Object extending ChromBackend.

i

For [: integer, logical or character to subset the object.

j

For [: ignored.

drop

For chromData() and peaksData(): logical(1), defaults to FALSE. If TRUE and a single column is requested, the method should return a vector (or, for peaksData(), a list of vectors) with the values of that column.

name

For $ and $<-: the name of the chromatogram variable to return or set.

columns

For chromData() accessor: optional character with column names (chromatogram variables) that should be included in the returned data.frame. By default, all columns are returned.

f

factor defining the grouping to split x. See split().

Core chromatogram variables

The core chromatogram variables are variables (metadata) that can/should be provided by a backend. A value needs to be returned for each of these variables; if none is defined, a missing value (of the correct data type) should be returned. The names of the chromatogram variables in your current chromatogram object are returned with the chromVariables() function.

For each core chromatogram variable a dedicated access method exists. In contrast to the peaks data described below, a single value should be returned for each chromatogram.

The coreChromVariables() function returns the core chromatogram variables along with their expected (defined) data type.

The core chromatogram variables (in alphabetical order) are:

  • chromIndex: an integer with the index of the chromatogram in the original source file (e.g. mzML file).

  • collisionEnergy: for SRM data, numeric with the collision energy of the precursor.

  • dataOrigin: optional character with the origin of a chromatogram.

  • dataStorage: character defining where the data is (currently) stored.

  • msLevel: integer defining the MS level of the data.

  • mz: optional numeric with the (target) m/z value for the chromatographic data.

  • mzMin: optional numeric with the lower m/z value of the m/z range in case the data (e.g. an extracted ion chromatogram EIC) was extracted from a Spectra object.

  • mzMax: optional numeric with the upper m/z value of the m/z range.

  • precursorMz: for SRM data, numeric with the target m/z of the precursor (parent).

  • precursorMzMin: for SRM data, optional numeric with the lower m/z of the precursor's isolation window.

  • precursorMzMax: for SRM data, optional numeric with the upper m/z of the precursor's isolation window.

  • productMz: for SRM data, numeric with the target m/z of the product ion.

  • productMzMin: for SRM data, optional numeric with the lower m/z of the product's isolation window.

  • productMzMax: for SRM data, optional numeric with the upper m/z of the product's isolation window.
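The rule that every core chromatogram variable returns a value, or a missing value of the correct data type, can be sketched in base R. The helper fill_core_variables() and the vector core_types below are hypothetical names for illustration and are not the package's own implementation; the data types are taken from the list above.

```r
## Core chromatogram variables and their data types, as listed above.
core_types <- c(
    chromIndex = "integer", collisionEnergy = "numeric",
    dataOrigin = "character", dataStorage = "character",
    msLevel = "integer", mz = "numeric",
    mzMin = "numeric", mzMax = "numeric",
    precursorMz = "numeric", precursorMzMin = "numeric",
    precursorMzMax = "numeric", productMz = "numeric",
    productMzMin = "numeric", productMzMax = "numeric"
)

## Hypothetical helper: add each missing core variable as a column of
## missing values with the matching data type.
fill_core_variables <- function(x, types = core_types) {
    for (v in setdiff(names(types), colnames(x)))
        x[[v]] <- rep(as.vector(NA, mode = types[[v]]), nrow(x))
    x
}

cd <- fill_core_variables(
    data.frame(msLevel = c(1L, 2L), mz = c(112.2, 134.4))
)

## Note: a real backend has to set dataStorage to a non-missing value;
## it is left as NA here only to keep the sketch short.
str(cd)
```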

Core Peaks variables

Similar to the core chromatogram variables, core peaks variables represent metadata that should be provided by a backend. Each of these variables should return a value, and if undefined, a missing value (with the appropriate data type) is returned. The number of values for a peaks variable in a single chromatogram can vary, from none to multiple, and may differ between chromatograms.

The names of peaks variables in the current chromatogram object can be obtained with the peaksVariables() function.

Each core peaks variable has a dedicated accessor method.

The corePeaksVariables() function returns the core peaks variables along with their expected (defined) data type.

The core peaks variables, listed in the required order for peaksData, are:

  • rtime: A numeric vector containing retention time values.

  • intensity: A numeric vector containing intensity values.

They should be provided for each chromatogram in the backend, in this order. No NAs are allowed for the rtime values. These characteristics will be checked with the validPeaksData() function.
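The constraints on peaks data stated above (columns rtime and intensity in this order, numeric values, no missing retention times) can be expressed as a small base R check. The function check_peaks_data() is a hypothetical stand-in for illustration and not the implementation of validPeaksData(); that additional columns may follow the two core columns is an assumption.

```r
## Hypothetical check of a list of peaks data.frames.
check_peaks_data <- function(x) {
    stopifnot(is.list(x))
    for (i in seq_along(x)) {
        pd <- x[[i]]
        if (!is.data.frame(pd))
            stop("element ", i, " is not a data.frame")
        ## Core columns have to come first and in the required order.
        if (ncol(pd) < 2L ||
            !identical(colnames(pd)[1:2], c("rtime", "intensity")))
            stop("element ", i,
                 ": first two columns must be 'rtime' and 'intensity'")
        if (!is.numeric(pd$rtime) || !is.numeric(pd$intensity))
            stop("element ", i, ": 'rtime' and 'intensity' must be numeric")
        ## Missing retention times are not allowed.
        if (anyNA(pd$rtime))
            stop("element ", i, ": 'rtime' contains missing values")
    }
    invisible(TRUE)
}

good <- list(data.frame(rtime = c(1.1, 1.2), intensity = c(10, NA)))
bad <- list(data.frame(rtime = c(1.1, NA), intensity = c(10, 20)))

check_peaks_data(good)                     # passes silently
res <- try(check_peaks_data(bad), silent = TRUE)
inherits(res, "try-error")                 # the NA in rtime is rejected
```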

Mandatory methods

New backend classes must extend the base ChromBackend class and implement the following mandatory methods:

  • backendInitialize(): initialises the backend. This method is supposed to be called right after creating an instance of the backend class and should prepare the backend. Parameters can be defined freely for each backend, depending on what is needed to initialize the backend. This method has to ensure that the chromatogram variable dataStorage is set correctly.

  • chromData(), chromData<-: gets or sets general chromatogram metadata (annotation). chromData() returns a data.frame, chromData<- expects a data.frame with the same number of rows as there are chromatograms in object. Read-only backends might not need to implement the replacement method chromData<- (unless some internal caching mechanism could be used). chromData() should be implemented with the parameter drop set to FALSE as default. With drop = FALSE the method should return a data.frame even if only one column is requested. If drop = TRUE is specified and a single column is requested, the output will be a vector with the values of that column. New backends should be implemented such that, if empty, the method returns a data.frame with 0 rows and the columns defined by chromVariables(). By default, the function should return at least the variables listed by coreChromVariables(), even if their values are all NA.

  • peaksData(): returns a list of data.frame with the data (e.g. retention time - intensity pairs) from each chromatogram. The length of the list is equal to the number of chromatograms in object. For an empty chromatogram a data.frame with 0 rows and two columns (named "rtime" and "intensity") has to be returned. The optional parameter columns, if supported by the backend, defines which peak variables should be returned in each data.frame. By default (and as a minimum) the columns "rtime" and "intensity" have to be provided. peaksData() should be implemented with the parameter drop set to FALSE as default. With drop = FALSE the method should return a list of data.frame even if only one column is requested. If drop = TRUE is specified and a single column is requested, the output will be a list of vectors with the values of that column.

  • peaksData<-: replaces the peak data (retention time and intensity values) of the backend. This method expects a list of data.frame with columns representing the peak variables. All existing peaks data are expected to be replaced with these new values. The length of the list has to match the number of chromatograms in object. Note that only writeable backends need to support this method.

  • [: subset the backend. Only subsetting by element (row/i) is allowed.

  • $, ⁠$<-⁠: access or set/add a single chromatogram variable (column) in the backend.

  • backendMerge(): merges (combines) ChromBackend objects into a single instance. All objects to be merged have to be of the same type.
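The columns and drop contract described for peaksData() can be illustrated on a plain list of data.frame. The function toy_peaks_data() is a hypothetical base R stand-in that only mimics the documented return shapes; it is not a backend method.

```r
## Made-up peaks data for three chromatograms, the last one empty.
pdata <- list(
    data.frame(rtime = c(12.4, 12.8, 13.2),
               intensity = c(123.3, 153.6, 2354.3)),
    data.frame(rtime = c(45.1, 46.2), intensity = c(100, 80.1)),
    data.frame(rtime = numeric(), intensity = numeric())
)

## Hypothetical stand-in mimicking the peaksData() return shapes.
toy_peaks_data <- function(pd, columns = c("rtime", "intensity"),
                           drop = FALSE) {
    ## Only drop to a vector if exactly one column is requested.
    simplify <- drop && length(columns) == 1L
    lapply(pd, function(z) z[, columns, drop = simplify])
}

## Default: list of data.frame with columns "rtime" and "intensity".
full <- toy_peaks_data(pdata)

## One column with drop = FALSE: still a list of data.frame.
rt_df <- toy_peaks_data(pdata, columns = "rtime")

## One column with drop = TRUE: list of numeric vectors.
rt_vec <- toy_peaks_data(pdata, columns = "rtime", drop = TRUE)
```

In all three cases the length of the result equals the number of chromatograms, and the empty chromatogram yields a 0-row data.frame or a zero-length vector.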

Optional methods with default implementations

Additional methods that might be implemented, but for which default implementations are already present are:

  • backendParallelFactor(): returns a factor defining an optimal (preferred) way in which the backend can be split for parallel processing, used by all peak data accessor or data manipulation functions. The default implementation returns a factor of length 0 (factor()), thus providing no default splitting.

  • chromIndex(): returns an integer vector with the index of the chromatograms in the original source file.

  • chromVariables(): returns a character vector with the chromatogram variables (columns, fields or attributes) available in object. Variables listed by this function are expected to be returned (if requested) by the chromData() function.

  • collisionEnergy(), ⁠collisionEnergy<-⁠: gets or sets the collision energy for the precursor (for SRM data). collisionEnergy() returns a numeric of length equal to the number of chromatograms in object.

  • dataOrigin(), dataOrigin<-: gets or sets the data origin variable. dataOrigin() returns a character of length equal to the number of chromatograms, dataOrigin<- expects a character of length equal to length(object).

  • dataStorage(), dataStorage<-: gets or sets the data storage variable. dataStorage() returns a character of length equal to the number of chromatograms in object, dataStorage<- expects a character of length equal to length(object). Note that missing values (NA_character_) are not supported for dataStorage().

  • intensity(): gets the intensity values from the chromatograms. Returns a list of numeric vectors (intensity values for each chromatogram). The length of the list is equal to the number of chromatograms in object.

  • intensity<-: replaces the intensity values. value has to be a list of length equal to the number of chromatograms, with the number of values within each list element identical to the number of data pairs in the respective chromatogram. Note that only writeable backends need to support this method.

  • isReadOnly(): returns a logical(1) indicating whether the backend is read-only (TRUE) or also allows writing/updating data (FALSE). Defaults to FALSE.

  • isEmpty(): returns a logical of length equal to the number of chromatograms with TRUE for chromatograms without any data pairs.

  • length(): returns the number of chromatograms in the object.

  • lengths(): returns the number of data pairs (retention time and intensity values) per chromatogram.

  • msLevel(): gets the chromatogram's MS level. Returns an integer vector (of length equal to the number of chromatograms) with the MS level for each chromatogram (or NA_integer_ if not available).

  • mz(), mz<-: gets or sets the m/z value of the chromatograms. mz() returns a numeric of length equal to the number of chromatograms in object, mz<- expects a numeric of length length(object).

  • mzMax(), mzMax<-: gets or sets the upper m/z of the mass-to-charge range from which a chromatogram contains signal (e.g. if the chromatogram was extracted from MS data in spectra format and an m/z range was provided). mzMax() returns a numeric of length equal to the number of chromatograms in object, mzMax<- expects a numeric of length equal to the number of chromatograms in object.

  • mzMin(), mzMin<-: gets or sets the lower m/z of the mass-to-charge range from which a chromatogram contains signal (e.g. if the chromatogram was extracted from MS data in spectra format and an m/z range was provided). mzMin() returns a numeric of length equal to the number of chromatograms in object, mzMin<- expects a numeric of length equal to the number of chromatograms in object.

  • peaksVariables(): lists the available data variables for the chromatograms. Default peak variables are "rtime" and "intensity" (which all backends need to support and provide), but some backends might provide additional variables. Variables listed by this function are expected to be returned (if requested) by the peaksData() function.

  • precursorMz(), precursorMz<-: gets or sets the (target) m/z of the precursor (for SRM data). precursorMz() returns a numeric of length equal to the number of chromatograms in object. precursorMz<- expects a numeric of length equal to the number of chromatograms.

  • precursorMzMin(), precursorMzMax(), productMzMin(), productMzMax(): gets the lower and upper margin for the precursor or product isolation windows. These functions might return the value of precursorMz() or productMz(), respectively, if the minimal or maximal m/z values are not defined in object.

  • productMz(), productMz<-: gets or sets the (target) m/z of the product (for SRM data). productMz() returns a numeric of length equal to the number of chromatograms in object. productMz<- expects a numeric of length equal to the number of chromatograms.

  • rtime(): gets the retention times from the chromatograms. Returns a NumericList() of numeric vectors (retention times for each chromatogram). The length of the returned list is equal to the number of chromatograms in object.

  • rtime<-: replaces the retention times. value has to be a list (or NumericList()) of length equal to the number of chromatograms, with the number of values within each list element identical to the number of data pairs in the respective chromatogram. Note that only writeable backends need to support this method.

  • split(): splits the backend into a list of backends (depending on parameter f). The default method for ChromBackend uses split.default(), thus backends extending ChromBackend don't necessarily need to implement this method.
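Several of the optional methods above return values that follow directly from the peaks data. The base R sketch below shows one plausible way to derive them from a plain list of data.frame; it uses made-up values and is not necessarily how the package's default implementations work.

```r
## Made-up peaks data for three chromatograms, the last one empty.
pdata <- list(
    data.frame(rtime = c(12.4, 12.8, 13.2),
               intensity = c(123.3, 153.6, 2354.3)),
    data.frame(rtime = c(45.1, 46.2), intensity = c(100, 80.1)),
    data.frame(rtime = numeric(), intensity = numeric())
)

## lengths(): number of data pairs per chromatogram.
n_pairs <- vapply(pdata, nrow, integer(1))

## isEmpty(): TRUE for chromatograms without any data pairs.
is_empty <- n_pairs == 0L

## intensity() and rtime(): one numeric vector per chromatogram.
ints <- lapply(pdata, function(z) z$intensity)
rts <- lapply(pdata, function(z) z$rtime)

## split(): group the chromatograms by a factor f, here via the default
## split() method for lists.
f <- factor(c("a", "b", "a"))
chunks <- split(pdata, f)
```

Here chunks is a list with one element per level of f, each holding the peaks data of the chromatograms assigned to that level.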

Implementation notes

Backends extending ChromBackend must implement all of its mandatory methods (listed above); the optional methods only need to be implemented if their default implementation is not suitable. A guide to creating new backend classes, with additional information and an example backend implementation, is provided as a dedicated vignette.

Author(s)

Johannes Rainer, Philippine Louail

Examples

## Create a simple backend implementation
ChromBackendDummy <- setClass("ChromBackendDummy",
    contains = "ChromBackend")