Title: | Core Utils for Metabolomics Data |
---|---|
Description: | MetaboCoreUtils defines metabolomics-related core functionality provided as low-level functions to allow a data structure-independent usage across various R packages. This includes functions to calculate between ion (adduct) and compound mass-to-charge ratios and masses or functions to work with chemical formulas. The package provides also a set of adduct definitions and information on some commercially available internal standard mixes commonly used in MS experiments. |
Authors: | Johannes Rainer [aut, cre] , Michael Witting [aut] , Andrea Vicini [aut], Liesa Salzer [ctb] , Sebastian Gibb [aut] , Michael Stravs [ctb] , Roger Gine [aut] , Philippine Louail [aut] |
Maintainer: | Johannes Rainer <[email protected]> |
License: | Artistic-2.0 |
Version: | 1.13.1 |
Built: | 2024-10-27 06:20:47 UTC |
Source: | https://github.com/rformassspectrometry/metabocoreutils |
addElements
Add one chemical formula to another.
addElements(x, y)
addElements(x, y)
x |
|
y |
|
character
Resulting formula
Michael Witting and Sebastian Gibb
addElements("C6H12O6", "Na") addElements("C6H12O6", c("Na", "H2O"))
addElements("C6H12O6", "Na") addElements("C6H12O6", c("Na", "H2O"))
adductFormula
calculates the chemical formulas for the specified adducts
of provided chemical formulas.
adductFormula(formulas, adduct = "[M+H]+", standardize = TRUE)
adductFormula(formulas, adduct = "[M+H]+", standardize = TRUE)
formulas |
|
adduct |
|
standardize |
|
character
matrix with formula rows and adducts columns
containing all ion formulas. In case an ion can't be generated (eg.
[M-NH3+H]+ in a molecule that doesn't have nitrogen), a NA is returned
instead.
Roger Gine
adductNames()
for a list of all available predefined adducts and
adducts()
for the adduct data.frame
definition style.
# Calculate the ion formulas of glucose with adducts [M+H]+, [M+Na]+ and [M+K]+ adductFormula("C6H12O6", c("[M+H]+", "[M+Na]+", "[M+K]+")) # > "[C6H13O6]+" "[C6H12O6Na]+" "[C6H12O6K]+" # Use a custom set of adduct definitions (For instance, a iron (Fe2+) adduct) custom_ads <- data.frame(name = "[M+Fe]2+", mass_multi = 0.5, charge = 2, formula_add = "Fe", formula_sub = "C0", positive = "TRUE") adductFormula("C6H12O6", custom_ads)
# Calculate the ion formulas of glucose with adducts [M+H]+, [M+Na]+ and [M+K]+ adductFormula("C6H12O6", c("[M+H]+", "[M+Na]+", "[M+K]+")) # > "[C6H13O6]+" "[C6H12O6Na]+" "[C6H12O6K]+" # Use a custom set of adduct definitions (For instance, a iron (Fe2+) adduct) custom_ads <- data.frame(name = "[M+Fe]2+", mass_multi = 0.5, charge = 2, formula_add = "Fe", formula_sub = "C0", positive = "TRUE") adductFormula("C6H12O6", custom_ads)
adductNames
returns all supported adduct definitions that can be used by
mass2mz()
and mz2mass()
.
adducts
returns a data.frame
with the adduct definitions.
adductNames(polarity = c("positive", "negative")) adducts(polarity = c("positive", "negative"))
adductNames(polarity = c("positive", "negative")) adducts(polarity = c("positive", "negative"))
polarity |
|
for adductNames
: character
vector with all valid adduct names
for the selected ion mode. For adducts
: data.frame
with the adduct
definitions.
Michael Witting, Johannes Rainer
## retrieve names of adduct names in positive ion mode adductNames(polarity = "positive") ## retrieve names of adduct names in negative ion mode adductNames(polarity = "negative")
## retrieve names of adduct names in positive ion mode adductNames(polarity = "positive") ## retrieve names of adduct names in negative ion mode adductNames(polarity = "negative")
Kendrick mass defect analysis is a way to analyze high-resolution MS data in order to identify homologous series. The Kendrick mass (KM) is calculated by choosing a specific molecular fragment (e.g. CH2) and settings its mass to an integer mass. In case of CH2 the mass of 14.01565 would be set to 14.The Kendrick mass defect (KMD) is defined as the difference between the KM and the nominial (integer) KM. All molecules of homologoues series, e.g. only differing in the number of CH2, will have an identical KMD. In an additional step the KMD can be referenced to the mass defect of specific lipid backbone and by this normalize values to the referenced KMD (RKMD). This leads to values of 0 for saturated species or -1, -2, -3, etc for unsaturated species.
Available functoins are:
calculateKm
: calculates the Kendrick mass from an exact mass for a
specific molecular fragment, e.g. "CH2"
.
calculateKmd
: calculates the Kendrick mass defect from an exact mass for
a specific molecular fragment, e.g. "CH2"
.
calculateRkmd
: calculates the referenced Kendrick mass defect from an
exact mass for a specific molecular fragment, e.g. "CH2"
, and a reference
KMD.
isRkmd
: Checks if a calculated RKMD falls within a specific error range
around an negative integer corresponding the number of double bonds, in
case of CH2 as fragment.
calculateKm(x, fragment = 14/14.01565) calculateKmd(x, fragment = 14/14.01565) calculateRkmd(x, fragment = 14/14.01565, rkmd = 0.749206) isRkmd(x, rkmdTolerance = 0.1)
calculateKm(x, fragment = 14/14.01565) calculateKmd(x, fragment = 14/14.01565) calculateRkmd(x, fragment = 14/14.01565, rkmd = 0.749206) isRkmd(x, rkmdTolerance = 0.1)
x |
|
fragment |
|
rkmd |
|
rkmdTolerance |
|
numeric
or boolean
. All functions, except isRkmd
return a
numeric
with same length as the input corresponding to the KM, KMD or
RMKD.
isRkmd
returns a logical
with TRUE
or FALSE
indicating if the
RKMD falls within a specific range around a negative integer
corresponding to the number of double bonds.
Michael Witting
calculateKm(760.5851) calculateKmd(760.5851) calculateRkmd(760.5851, rkmd = 0.749206) isRkmd(calculateRkmd(760.5851, rkmd = 0.749206))
calculateKm(760.5851) calculateKmd(760.5851) calculateRkmd(760.5851, rkmd = 0.749206) isRkmd(calculateRkmd(760.5851, rkmd = 0.749206))
calculateMass
calculates the exact mass from a formula. Isotopes are also
supported. For isotopes, the isotope type needs to be specified as an
element's prefix, e.g. "[13C]"
for carbon 13 or "[2H]"
for deuterium.
A formula with 2 carbon 13 isotopes and 3 carbons would thus contain e.g.
"[13C2]C3"
.
calculateMass(x)
calculateMass(x)
x |
|
numeric
Resulting exact mass.
Michael Witting
calculateMass("C6H12O6") calculateMass("NH3") calculateMass(c("C6H12O6", "NH3")) ## Calculate masses for formulas containing isotope information. calculateMass(c("C6H12O6", "[13C3]C3H12O6")) ## Calculate mass for a chemical with 5 deuterium. calculateMass("C11[2H5]H7N2O2")
calculateMass("C6H12O6") calculateMass("NH3") calculateMass(c("C6H12O6", "NH3")) ## Calculate masses for formulas containing isotope information. calculateMass(c("C6H12O6", "[13C3]C3H12O6")) ## Calculate mass for a chemical with 5 deuterium. calculateMass("C11[2H5]H7N2O2")
containsElements
checks if one sum formula is contained in another.
containsElements(x, y)
containsElements(x, y)
x |
|
y |
|
logical
TRUE if y
is contained in x
Michael Witting and Sebastian Gibb
containsElements("C6H12O6", "H2O") containsElements("C6H12O6", "NH3")
containsElements("C6H12O6", "H2O") containsElements("C6H12O6", "NH3")
convertMtime
performs effective mobility scale transformation of CE(-MS)
data, which is used to overcome variations of the migration times, caused by
differences in the Electroosmotic Flow (EOF) between different runs.
In order to monitor the EOF and perform the transformation, neutral or
charged EOF markers are spiked into the sample before analysis. The
information of the EOF markers (migration time and effective mobility) will
be then used to perform the effective mobility transformation of the
migration time scale.
convertMtime( x = numeric(), rtime = numeric(), mobility = numeric(), tR = 0, U = numeric(), L = numeric() )
convertMtime( x = numeric(), rtime = numeric(), mobility = numeric(), tR = 0, U = numeric(), L = numeric() )
x |
|
rtime |
|
mobility |
|
tR |
|
U |
|
L |
|
numeric
vector of same length as x with effective mobility values.
Liesa Salzer
rtime <- c(10,20,30,40,50,60,70,80,90,100) marker_rt <- c(20,80) mobility <- c(0, 2000) convertMtime(rtime, marker_rt, mobility)
rtime <- c(10,20,30,40,50,60,70,80,90,100) marker_rt <- c(20,80) mobility <- c(0, 2000) convertMtime(rtime, marker_rt, mobility)
correctRindex
performs correction of retention indices (RIs) based on
reference substances.
Even after conversion of RTs to RIs slight deviations might exist. These
deviations can be further normalized, if they are linear, by using two
metabolites for which the RIs are known (e.g. internal standards).
correctRindex(x, y)
correctRindex(x, y)
x |
|
y |
|
numeric
vector of same length than x
with corrected retention
indices. Values are floating point decimals. If integer values shall be
used conversion has to be performed manually.
Michael Witting
ref <- data.frame(rindex = c(110, 210), refindex = c(100, 200)) rindex <- c(110, 210) correctRindex(rindex, ref)
ref <- data.frame(rindex = c(110, 210), refindex = c(100, 200)) rindex <- c(110, 210) correctRindex(rindex, ref)
countElements
parses strings representing a chemical formula into a named
vector of element counts.
countElements(x)
countElements(x)
x |
|
list
of integer
with the element counts (names being elements).
Michael Witting and Sebastian Gibb
countElements(c("C6H12O6", "C11H12N2O2"))
countElements(c("C6H12O6", "C11H12N2O2"))
The fit_lm
and adjust_lm
functions facilitate linear model-based
normalization of abundance matrices. The expected noise in a numeric
data matrix can be modeled with a linear regression model using the
fit_lm
function and the data can subsequently be adjusted using the
adjust_lm
function (i.e., the modeled noise will be removed from the
abundance values). A typical use case would be to remove injection index
dependent signal drifts in a LC-MS derived metabolomics data:
a linear model of the form y ~ injection_index
could be used to model
the measured abundances of each feature (each row in a data matrix) as a
function of the injection index in which a specific sample was measured
during the LC-MS measurement run. The fitted linear regression models
can subsequently be used to adjust the abundance matrix by removing
these dependencies from the data. This allows to perform signal
adjustments as described in (Wehrens et al. 2016).
The two functions are described in more details below:
fit_lm
allows to fit a linear regression model (defined with parameter
formula
) to each row of the numeric data matrix submitted with parameter
y
. Additional covariates of the linear model defined in formula
are
expected to be provided as columns in a data.frame
supplied via
the data
parameter.
The linear model is expected to be defined by a formula starting with
y ~
. To model for example an injection index dependency of values in
y
a formula y ~ injection_index
could be used, with values for the
injection index being provided as a column "injection_index"
in the
data
data frame. fit_lm
would thus fit this model to each row of
y
.
Linear models can be fitted either with the standard least squares of
lm()
by setting method = "lm"
(the default), or with the more robust
methods from the robustbase package with method = "lmrob"
.
adjust_lm
can be used to adjust abundances in a data matrix y
based
on linear regression models provided with parameter lm
. Parameter lm
is expected to be a list
of length equal to the number of rows of y
,
each element being a linear model (i.e., a results from lm
or lmrob
).
Covariates for the model need to be provided as columns in a data.frame
provided with parameter data
. The number of rows of that data.frame
need to match the number of columns of y
. The function returns the
input matrix y
with values in rows adjusted with the linear models
provided by lm
. No adjustment is performed for rows for which the
respective element in lm
is NA
. See examples below for details or the
vignette for more examples, descriptions and information.
fit_lm( formula, data, y, method = c("lm", "lmrob"), control = NULL, minVals = ceiling(nrow(data) * 0.75), model = TRUE, ..., BPPARAM = SerialParam() ) adjust_lm(y = matrix(), data = data.frame(), lm = list(), ...)
fit_lm( formula, data, y, method = c("lm", "lmrob"), control = NULL, minVals = ceiling(nrow(data) * 0.75), model = TRUE, ..., BPPARAM = SerialParam() ) adjust_lm(y = matrix(), data = data.frame(), lm = list(), ...)
formula |
|
data |
|
y |
for |
method |
|
control |
a |
minVals |
|
model |
|
... |
for |
BPPARAM |
parallel processing setup. See |
lm |
|
For `fit_lm`: a `list` with linear models (either of type *lm* or *lmrob*) or length equal to the number of rows of `y`. `NA` is reported for rows with too few non-missing data points (depending on parameter `minValues`). For `adjust_lm`: a numeric matrix (same dimensions as input matrix `y`) with the values adjusted with the provided linear models.
Johannes Rainer
Wehrens R, Hageman JA, van Eeuwijk F, Kooke R, Flood PJ, Wijnker E, Keurentjes JJ, Lommen A, van Eekelen HD, Hall RD Mumm R and de Vos RC. Improved batch correction in untargeted MS-based metabolomics. Metabolomics 2016; 12:88.
## See also the vignette for more details and examples. ## Load a test matrix with abundances of features from a LC-MS experiment. vals <- read.table(system.file("txt", "feature_values.txt", package = "MetaboCoreUtils"), sep = "\t") vals <- as.matrix(vals) ## Define a data.frame with the covariates to be used to model the noise sdata <- data.frame(injection_index = seq_len(ncol(vals))) ## Fit a linear model describing the feature abundances as a ## function of the index in which samples were injected during the LC-MS ## run. We're fitting the model to log2 transformed data. ## Note that such a model should **only** be fitted if the samples ## were randomized, i.e. the injection index is independent of any ## experimental covariate. Alternatively, the injection order dependent ## signal drift could be estimated using QC samples (if they were ## repeatedly injected) - see vignette for more details. ii_lm <- fit_lm(y ~ injection_index, data = sdata, y = log2(vals)) ## The result is a list of linear models ii_lm[[1]] ## Plotting the data for one feature: plot(x = sdata$injection_index, y = log2(vals[2, ]), ylab = expression(log[2]~abundance), xlab = "injection index") grid() ## plot also the fitted model abline(ii_lm[[2]], lty = 2) ## For this feature (row) a decreasing signal intensity with injection ## index was observed (and modeled). ## For another feature an increasing intensity can be observed. plot(x = sdata$injection_index, y = log2(vals[3, ]), ylab = expression(log[2]~abundance), xlab = "injection index") grid() ## plot also the fitted model abline(ii_lm[[3]], lty = 2) ## This trend can be removed from the data using the `adjust_lm` function ## by providing the linear models descring the drift. Note that, because ## we're adjusting log2 transformed data, the resulting abundances are ## also in log2 scale. vals_adj <- adjust_lm(log2(vals), data = sdata, lm = ii_lm) ## Plotting the data before (open circles) and after adjustment (filled ## points) plot(x = sdata$injection_index, y = log2(vals[2, ]), ylab = expression(log[2]~abundance), xlab = "injection index") points(x = sdata$injection_index, y = vals_adj[2, ], pch = 16) grid() ## Adding the line fitted through the raw data abline(ii_lm[[2]], lty = 2) ## Adding a line fitted through the adjusted data abline(lm(vals_adj[2, ] ~ sdata$injection_index), lty = 1) ## After adjustment there is no more dependency on injection index.
## See also the vignette for more details and examples. ## Load a test matrix with abundances of features from a LC-MS experiment. vals <- read.table(system.file("txt", "feature_values.txt", package = "MetaboCoreUtils"), sep = "\t") vals <- as.matrix(vals) ## Define a data.frame with the covariates to be used to model the noise sdata <- data.frame(injection_index = seq_len(ncol(vals))) ## Fit a linear model describing the feature abundances as a ## function of the index in which samples were injected during the LC-MS ## run. We're fitting the model to log2 transformed data. ## Note that such a model should **only** be fitted if the samples ## were randomized, i.e. the injection index is independent of any ## experimental covariate. Alternatively, the injection order dependent ## signal drift could be estimated using QC samples (if they were ## repeatedly injected) - see vignette for more details. ii_lm <- fit_lm(y ~ injection_index, data = sdata, y = log2(vals)) ## The result is a list of linear models ii_lm[[1]] ## Plotting the data for one feature: plot(x = sdata$injection_index, y = log2(vals[2, ]), ylab = expression(log[2]~abundance), xlab = "injection index") grid() ## plot also the fitted model abline(ii_lm[[2]], lty = 2) ## For this feature (row) a decreasing signal intensity with injection ## index was observed (and modeled). ## For another feature an increasing intensity can be observed. plot(x = sdata$injection_index, y = log2(vals[3, ]), ylab = expression(log[2]~abundance), xlab = "injection index") grid() ## plot also the fitted model abline(ii_lm[[3]], lty = 2) ## This trend can be removed from the data using the `adjust_lm` function ## by providing the linear models descring the drift. Note that, because ## we're adjusting log2 transformed data, the resulting abundances are ## also in log2 scale. vals_adj <- adjust_lm(log2(vals), data = sdata, lm = ii_lm) ## Plotting the data before (open circles) and after adjustment (filled ## points) plot(x = sdata$injection_index, y = log2(vals[2, ]), ylab = expression(log[2]~abundance), xlab = "injection index") points(x = sdata$injection_index, y = vals_adj[2, ], pch = 16) grid() ## Adding the line fitted through the raw data abline(ii_lm[[2]], lty = 2) ## Adding a line fitted through the adjusted data abline(lm(vals_adj[2, ] ~ sdata$injection_index), lty = 1) ## After adjustment there is no more dependency on injection index.
formula2mz
calculates the m/z values from a list of molecular formulas and
adduct definitions.
Custom adduct definitions can be passed to the adduct
parameter in form of
a data.frame
. This data.frame
is expected to have columns "mass_add"
and "mass_multi"
defining the additive and multiplicative part of the
calculation. See adducts()
for examples.
formula2mz(formula, adduct = "[M+H]+", standardize = TRUE)
formula2mz(formula, adduct = "[M+H]+", standardize = TRUE)
formula |
|
adduct |
either a |
standardize |
|
Numeric matrix
with same number of rows than elements in formula
and number of columns being equal to the length of adduct
(adduct names
are used as column names). Each column thus represents the m/z of formula
for each defined adduct
.
Roger Gine
## Calculate m/z values of adducts of a list of formulas formulas <- c("C6H12O6", "C9H11NO3", "C16H13ClN2O") ads <- c("[M+H]+", "[M+Na]+", "[2M+H]+", "[M]+") formula2mz(formulas, ads) formula2mz(formulas, adductNames()) #All available adducts ## Use custom-defined adducts as input custom_ads <- data.frame(mass_add = c(1, 2, 3), mass_multi = c(1, 2, 0.5)) formula2mz(formulas, custom_ads) ## Use standardize = FALSE to keep formula unaltered formula2mz("H12C6O6") formula2mz("H12C6O6", standardize = FALSE)
## Calculate m/z values of adducts of a list of formulas formulas <- c("C6H12O6", "C9H11NO3", "C16H13ClN2O") ads <- c("[M+H]+", "[M+Na]+", "[2M+H]+", "[M]+") formula2mz(formulas, ads) formula2mz(formulas, adductNames()) #All available adducts ## Use custom-defined adducts as input custom_ads <- data.frame(mass_add = c(1, 2, 3), mass_multi = c(1, 2, 0.5)) formula2mz(formulas, custom_ads) ## Use standardize = FALSE to keep formula unaltered formula2mz("H12C6O6") formula2mz("H12C6O6", standardize = FALSE)
indexRtime
uses a list of known substances to convert retention times (RTs)
to retention indices (RIs). By this retention information is normalized
for differences in experimental settings, such as gradient delay volume,
dead volume or flow rate. By default linear interpolation is performed,
other ways of calculation can supplied as function.
indexRtime(x, y, FUN = rtiLinear, ...)
indexRtime(x, y, FUN = rtiLinear, ...)
x |
|
y |
|
FUN |
|
... |
additional parameter used by |
numeric
vector of same length as x with retention indices. Values
floating point decimals. If integer values shall be used conversion has
to be performed manually
Michael Witting
rti <- data.frame(rtime = c(1,2,3), rindex = c(100,200,300)) rtime <- c(1.5, 2.5) indexRtime(rtime, rti)
rti <- data.frame(rtime = c(1,2,3), rindex = c(100,200,300)) rtime <- c(1.5, 2.5) indexRtime(rtime, rti)
internalStandardMixNames
returns available names of internal standard
mixes provided by the MetaboCoreUtils
package.
internalStandardMixNames()
internalStandardMixNames()
character
names of available IS mixes
Michael Witting
internalStandardMixNames()
internalStandardMixNames()
internalStandards
returns a table with metabolite standards available
in commercial internal standard mixes. The returned data frame contains
the following columns:
"name"
: the name of the standard
"formula_salt"
: chemical formula of the salt that was used to produce
the standard mix
"formula_metabolite"
: chemical formula of the metabolite in free form
"smiles_salt"
: SMILES of the salt that was used to produced the
standard mix
"smiles_metabolite"
: SMILES of the metabolite in free form
"mol_weight_salt"
: molecular (average) weight of the salt (can be used
for calculation of molar concentration, etc.)
"exact_mass_metabolite"
: exact mass of free metabolites
"conc"
: concentration of the metabolite in ug/mL (of salt form)
"mix"
: name of internal standard mix
internalStandards(mix = "QReSS")
internalStandards(mix = "QReSS")
mix |
|
data.frame
data on internal standards
Michael Witting
internalStandardMixNames()
for provided internal standard mixes.
internalStandards(mix = "QReSS") internalStandards(mix = "UltimateSplashOne")
internalStandards(mix = "QReSS") internalStandards(mix = "UltimateSplashOne")
In order to identify potential isotopologues based on only m/z and intensity
values with the isotopologues()
function, sets of pre-calculated parameters
are required. This function returns such parameter sets estimated on
different sources/databases. The nomenclature used to describe isotopes
follows the following convention: the number of neutrons is provided in [
as a prefix to the element and the number of atoms of the element as suffix.
[13]C2[37]Cl3
describes thus an isotopic substitution containing 2 [13]C
isotopes and 3 [37]Cl
isotopes.
Each row in the returned data.frame
is associated with an isotopic
substitution (which can involve isotopes of several elements or different
isotopes of the same element). In general for each isotopic substitution
multiple rows are present in the data.frame
. Each row provides parameters
to compute bounds (for the ratio between the isotopologue peak and the
monoisotopic one) on a certain mass range. The provided isotopic
substitutions are in general the most frequently observed substitutions in
the database (e.g. HMDB) on which they were defined. Parameters (columns)
defined for each isotopic substitution are:
"minmass"
: the minimal mass of a compound for which the isotopic
substitution was found. Peaks with a mass lower than this will most likely
not have the respective isotopic substitution.
"maxmass"
: the maximal mass of a compound for which the isotopic
substitution was found. Peaks with a mass higher than this will most likely
not have the respective isotopic substitution.
"md"
: the mass difference between the monoisotopic peak and a peak of an
isotopologue characterized by the respective isotopic substitution.
"leftend"
: left endpoint of the mass interval.
"rightend"
: right endpoint of the mass interval.
"LBint"
: intercept of the lower bound line on the mass interval whose
endpoints are "leftend"
and "rightend"
.
"LBslope"
: slope of the lower bound line on the mass interval.
"UBint"
: intercept of the upper bound line on the mass interval.
"UBslope"
: slope of the upper bound line on the mass interval.
isotopicSubstitutionMatrix(source = c("HMDB_NEUTRAL"))
isotopicSubstitutionMatrix(source = c("HMDB_NEUTRAL"))
source |
|
data.frame
with parameters to detect the defined isotopic
substitutions
source = "HMDB"
: most common isotopic substitutions and parameters for
these have been calculated for all compounds from the
Human Metabolome Database (HMDB, July 2021). Note that
the substitutions were calculated on the neutral masses (i.e. the
chemical formulas of the compounds, not considering any adducts).
Andrea Vicini
## Get the substitution matrix calculated on HMDB isotopicSubstitutionMatrix("HMDB_NEUTRAL")
## Get the substitution matrix calculated on HMDB isotopicSubstitutionMatrix("HMDB_NEUTRAL")
Given a spectrum (i.e. a peak matrix with m/z and intensity values)
the function identifies groups of potential isotopologue peaks based on
pre-defined mass differences and intensity (probability) ratios that need
to be passed to the function with the substDefinition
parameter. Each
isotopic substitution in a compound determines a certain isotopologue and it
is associated with a certain mass difference of that with respect to the
monoisotopic isotopologue. Also each substitution in a compound is linked
to a certain ratio between the intensities of the peaks of the corresponding
isotopologue and the monoisotopic one. This ratio isn't the same for
isotopologues corresponding to the same isotopic substitution but to
different compounds. Through the substDefinition
parameter we provide
upper and lower values to compute bounds for each isotopic substitution
dependent on the peak's mass.
isotopologues( x, substDefinition = isotopicSubstitutionMatrix(), tolerance = 0, ppm = 20, seedMz = numeric(), charge = 1, .check = TRUE )
isotopologues( x, substDefinition = isotopicSubstitutionMatrix(), tolerance = 0, ppm = 20, seedMz = numeric(), charge = 1, .check = TRUE )
x |
|
substDefinition |
|
tolerance |
|
ppm |
|
seedMz |
|
charge |
|
.check |
|
The function iterates over the peaks (rows) in x
. For each peak (which is
assumed to be the monoisotopic peak) it searches other peaks in x
with a
difference in mass matching (given ppm
and tolerance
) any of the
pre-defined mass differences in substDefinitions
(column "md"
). The mass
is obtained by multiplying the m/z of the peaks for the charge
expected
for the ionized compounds.
For matching peaks, the function next evaluates whether their intensity is
within the expected (pre-defined) intensity range. Using "LBint"
,
"LBslope"
, "UBint"
, "UBslope"
of the previously matched isotopic
substitution in substDefinition
, the function estimates a (mass dependent)
lower and upper intensity ratio limit based on the peak's mass.
When some peaks are grouped together their indexes are excluded from the set of indexes that are searched for further groups (i.e. peaks already assigned to an isotopologue group are not considered/tested again thus each peak can only be part of one isotopologue group).
list
of integer
vectors. Each integer
vector contains the
indixes of the rows in x
with potential isotopologues of the same
compound.
Andrea Vicini
## Read theoretical isotope pattern (high resolution) from example file x <- read.table(system.file("exampleSpectra", "serine-alpha-lactose-caffeine.txt", package = "MetaboCoreUtils"), header = TRUE) x <- x[order(x$mz), ] plot(x$mz, x$intensity, type = "h") isos <- isotopologues(x, ppm = 5) isos ## highlight them in the plot for (i in seq_along(isos)) { z <- isos[[i]] points(x$mz[z], x$intensity[z], col = i + 1) }
## Read theoretical isotope pattern (high resolution) from example file x <- read.table(system.file("exampleSpectra", "serine-alpha-lactose-caffeine.txt", package = "MetaboCoreUtils"), header = TRUE) x <- x[order(x$mz), ] plot(x$mz, x$intensity, type = "h") isos <- isotopologues(x, ppm = 5) isos ## highlight them in the plot for (i in seq_along(isos)) { z <- isos[[i]] points(x$mz[z], x$intensity[z], col = i + 1) }
mass2mz
calculates the m/z value from a neutral mass and an adduct
definition.
Custom adduct definitions can be passed to the adduct
parameter in form of
a data.frame
. This data.frame
is expected to have columns "mass_add"
and "mass_multi"
defining the additive and multiplicative part of the
calculation. See adducts()
for examples.
mass2mz(x, adduct = "[M+H]+")
mass2mz(x, adduct = "[M+H]+")
x |
|
adduct |
either a |
numeric matrix
with same number of rows than elements in x
and
number of columns being equal to the length of adduct
(adduct names
are used as column names). Each column thus represents the m/z of x
for each defined adduct
.
Michael Witting, Johannes Rainer
mz2mass()
for the reverse calculation, adductNames()
for
supported adduct definitions.
exact_mass <- c(100, 200, 250) adduct <- "[M+H]+" ## Calculate m/z of [M+H]+ adduct from neutral mass mass2mz(exact_mass, adduct) exact_mass <- 100 adduct <- "[M+Na]+" ## Calculate m/z of [M+Na]+ adduct from neutral mass mass2mz(exact_mass, adduct) ## Calculate m/z of multiple adducts from neutral mass mass2mz(exact_mass, adduct = adductNames()) ## Provide a custom adduct definition. adds <- data.frame(mass_add = c(1, 2, 3), mass_multi = c(1, 2, 0.5)) rownames(adds) <- c("a", "b", "c") mass2mz(c(100, 200), adds)
exact_mass <- c(100, 200, 250) adduct <- "[M+H]+" ## Calculate m/z of [M+H]+ adduct from neutral mass mass2mz(exact_mass, adduct) exact_mass <- 100 adduct <- "[M+Na]+" ## Calculate m/z of [M+Na]+ adduct from neutral mass mass2mz(exact_mass, adduct) ## Calculate m/z of multiple adducts from neutral mass mass2mz(exact_mass, adduct = adductNames()) ## Provide a custom adduct definition. adds <- data.frame(mass_add = c(1, 2, 3), mass_multi = c(1, 2, 0.5)) rownames(adds) <- c("a", "b", "c") mass2mz(c(100, 200), adds)
The mclosest
function calculates the closest rows between two matrices
(or data frames) considering pairwise differences between values in columns
of x
and table
. It returns the index of the closest row in table
for
each row in x
.
mclosest(x, table, ppm = 0, tolerance = Inf)
mclosest(x, table, ppm = 0, tolerance = Inf)
x |
|
table |
|
ppm |
|
tolerance |
|
If, for a row of x
, two rows of table
are closest only the index of first
row will be returned.
For both the tolerance
and ppm
arguments, if their length is different to
the number of columns of x
and table
, the input argument will be
replicated to match it.
integer
vector of indices indicating the closest row of table
for
each row of x
. If no suitable match is found for a row in x
based on the
specified tolerance
and ppm
, the corresponding index is set to NA
.
Philippine Louail
x <- data.frame(a = 1:5, b = 3:7) table <- data.frame(c = c(11, 23, 3, 5, 1), d = c(32:35, 45)) ## Get for each row of `x` the index of the row in `table` with the smallest ## difference of values (per column) mclosest(x, table) ## If the absolute difference is larger than `tolerance`, return `NA`. Note ## that the tolerance value of `25` is used for difference for each pairwise ## column in `x` and `table`. mclosest(x, table, tolerance = 25)
x <- data.frame(a = 1:5, b = 3:7) table <- data.frame(c = c(11, 23, 3, 5, 1), d = c(32:35, 45)) ## Get for each row of `x` the index of the row in `table` with the smallest ## difference of values (per column) mclosest(x, table) ## If the absolute difference is larger than `tolerance`, return `NA`. Note ## that the tolerance value of `25` is used for difference for each pairwise ## column in `x` and `table`. mclosest(x, table, tolerance = 25)
multiplyElements
Multiply the number of atoms of each element by a
constant, positive, integer
multiplyElements(x, k)
multiplyElements(x, k)
x |
|
k |
|
character
strings with the standardized chemical formula.
Roger Gine
multiplyElements("H2O", 3) multiplyElements(c("C6H12O6", "Na", "CH4O"), 2)
multiplyElements("H2O", 3) multiplyElements(c("C6H12O6", "Na", "CH4O"), 2)
mz2mass
calculates the neutral mass from a given m/z value and adduct
definition.
Custom adduct definitions can be passed to the adduct
parameter in form of
a data.frame
. This data.frame
is expected to have columns "mass_add"
and "mass_multi"
defining the additive and multiplicative part of the
calculation. See adducts()
for examples.
mz2mass(x, adduct = "[M+H]+")
mz2mass(x, adduct = "[M+H]+")
x |
|
adduct |
either a |
numeric matrix
with same number of rows than elements in x
and
number of columns being equal to the length of adduct
(adduct names
are used as column names. Each column thus represents the neutral mass
of x
for each defined adduct
.
Michael Witting, Johannes Rainer
mass2mz()
for the reverse calculation, adductNames()
for
supported adduct definitions.
ion_mass <- c(100, 200, 300) adduct <- "[M+H]+" ## Calculate m/z of [M+H]+ adduct from neutral mass mz2mass(ion_mass, adduct) ion_mass <- 100 adduct <- "[M+Na]+" ## Calculate m/z of [M+Na]+ adduct from neutral mass mz2mass(ion_mass, adduct) ## Provide a custom adduct definition. adds <- data.frame(mass_add = c(1, 2, 3), mass_multi = c(1, 2, 0.5)) rownames(adds) <- c("a", "b", "c") mz2mass(c(100, 200), adds)
ion_mass <- c(100, 200, 300) adduct <- "[M+H]+" ## Calculate m/z of [M+H]+ adduct from neutral mass mz2mass(ion_mass, adduct) ion_mass <- 100 adduct <- "[M+Na]+" ## Calculate m/z of [M+Na]+ adduct from neutral mass mz2mass(ion_mass, adduct) ## Provide a custom adduct definition. adds <- data.frame(mass_add = c(1, 2, 3), mass_multi = c(1, 2, 0.5)) rownames(adds) <- c("a", "b", "c") mz2mass(c(100, 200), adds)
pasteElements
creates a chemical formula from element counts (such as
returned by countElements()
).
pasteElements(x)
pasteElements(x)
x |
|
character()
with the chemical formulas.
Michael Witting and Sebastian Gibb
elements <- c("C" = 6, "H" = 12, "O" = 6) pasteElements(elements)
elements <- c("C" = 6, "H" = 12, "O" = 6) pasteElements(elements)
The following functions allow to calculate basic quality assessment estimates typically employed in the analysis of metabolomics data. These functions are designed to be applied to entire rows of data, where each row corresponds to a feature. Subsequently, these estimates can serve as a foundation for feature filtering.
rsd
and rowRsd
are convenience functions to calculate the relative
standard deviation (i.e. coefficient of variation) of a numerical vector
or for rows of a numerical matrix, respectively.
rowDratio
computes the D-ratio or dispersion ratio, defined as the
standard deviation for QC (Quality Control) samples divided by the
standard deviation for biological test samples, for each feature (row) in
the matrix.
percentMissing
and rowPercentMissing
determine the percentage of
missing values in a vector or for each row of a matrix, respectively.
rowBlank
identifies rows (i.e., features) where the mean of test samples
is lower than a specified multiple (defined by the threshold
parameter)
of the mean of blank samples. This can be used to flag features that result
from contamination in the solvent of the samples.
These functions are based on standard filtering methods described in the literature, and they are implemented to assist in preprocessing metabolomics data.
rsd(x, na.rm = TRUE, mad = FALSE) rowRsd(x, na.rm = TRUE, mad = FALSE) rowDratio(x, y, na.rm = TRUE, mad = FALSE) percentMissing(x) rowPercentMissing(x) rowBlank(x, y, threshold = 2, na.rm = TRUE)
rsd(x, na.rm = TRUE, mad = FALSE) rowRsd(x, na.rm = TRUE, mad = FALSE) rowDratio(x, y, na.rm = TRUE, mad = FALSE) percentMissing(x) rowPercentMissing(x) rowBlank(x, y, threshold = 2, na.rm = TRUE)
x |
|
na.rm |
|
mad |
|
y |
|
threshold |
|
See individual function description above for details.
For rsd
and rowRsd
the feature abundances are expected to be provided in
natural scale and not e.g. log2 scale as it may lead to incorrect
interpretations.
Philippine Louail, Johannes Rainer
Broadhurst D, Goodacre R, Reinke SN, Kuligowski J, Wilson ID, Lewis MR, Dunn WB. Guidelines and considerations for the use of system suitability and quality control samples in mass spectrometry assays applied in untargeted clinical metabolomic studies. Metabolomics. 2018;14(6):72. doi: 10.1007/s11306-018-1367-3. Epub 2018 May 18. PMID: 29805336; PMCID: PMC5960010.
## coefficient of variation a <- c(4.3, 4.5, 3.6, 5.3) rsd(a) A <- rbind(a, a, a) rowRsd(A) ## Dratio x <- c(4.3, 4.5, 3.6, 5.3) X <- rbind(a, a, a) rowDratio(X, X) #' ## Percent Missing b <- c(1, NA, 3, 4, NA) percentMissing(b) B <- matrix(c(1, 2, 3, NA, 5, 6, 7, 8, 9), nrow = 3) rowPercentMissing(B) ## Blank Rows test_samples <- matrix(c(13, 21, 3, 4, 5, 6), nrow = 2) blank_samples <- matrix(c(0, 1, 2, 3, 4, 5), nrow = 2) rowBlank(test_samples, blank_samples)
## coefficient of variation a <- c(4.3, 4.5, 3.6, 5.3) rsd(a) A <- rbind(a, a, a) rowRsd(A) ## Dratio x <- c(4.3, 4.5, 3.6, 5.3) X <- rbind(a, a, a) rowDratio(X, X) #' ## Percent Missing b <- c(1, NA, 3, 4, NA) percentMissing(b) B <- matrix(c(1, 2, 3, NA, 5, 6, 7, 8, 9), nrow = 3) rowPercentMissing(B) ## Blank Rows test_samples <- matrix(c(13, 21, 3, 4, 5, 6), nrow = 2) blank_samples <- matrix(c(0, 1, 2, 3, 4, 5), nrow = 2) rowBlank(test_samples, blank_samples)
standardizeFormula
standardizes a supplied chemical formula according
to the Hill notation system.
standardizeFormula(x)
standardizeFormula(x)
x |
|
character
strings with the standardized chemical formula.
Michael Witting and Sebastian Gibb
pasteElements()
countElements()
standardizeFormula("C6O6H12")
standardizeFormula("C6O6H12")
subtractElements
subtracts one chemical formula from another.
subtractElements(x, y)
subtractElements(x, y)
x |
|
y |
|
character
Resulting formula
Michael Witting and Sebastian Gibb
subtractElements("C6H12O6", "H2O") subtractElements("C6H12O6", "NH3")
subtractElements("C6H12O6", "H2O") subtractElements("C6H12O6", "NH3")