Title: | Quantitative features for mass spectrometry data |
---|---|
Description: | The QFeatures infrastructure enables the management and processing of quantitative features for high-throughput mass spectrometry assays. It provides a familiar Bioconductor user experience to manages quantitative data across different assay levels (such as peptide spectrum matches, peptides and proteins) in a coherent and tractable format. |
Authors: | Laurent Gatto [aut, cre] , Christophe Vanderaa [aut] |
Maintainer: | Laurent Gatto <[email protected]> |
License: | Artistic-2.0 |
Version: | 1.15.3 |
Built: | 2024-11-17 05:58:38 UTC |
Source: | https://github.com/rformassspectrometry/qfeatures |
This function aggregates the quantitative features of an assay,
applying a summarisation function (fun
) to sets of features.
The fcol
variable name points to a rowData column that defines
how to group the features during aggregate. This variable can
eigher be a vector (we then refer to an aggregation by vector)
or an adjacency matrix (aggregation by matrix).
The rowData of the aggregated SummarizedExperiment
assay
contains a .n
variable that provides the number of parent
features that were aggregated.
When aggregating with a vector, the newly aggregated
SummarizedExperiment
assay also contains a new aggcounts
assay
containing the aggregation counts matrix, i.e. the number of
features that were aggregated for each sample, which can be
accessed with the aggcounts()
accessor.
## S4 method for signature 'QFeatures' aggregateFeatures( object, i, fcol, name = "newAssay", fun = MsCoreUtils::robustSummary, ... ) ## S4 method for signature 'SummarizedExperiment' aggregateFeatures(object, fcol, fun = MsCoreUtils::robustSummary, ...) ## S4 method for signature 'QFeatures' adjacencyMatrix(object, i, adjName = "adjacencyMatrix") adjacencyMatrix(object, i, adjName = "adjacencyMatrix") <- value ## S4 method for signature 'SummarizedExperiment' aggcounts(object, ...)
## S4 method for signature 'QFeatures' aggregateFeatures( object, i, fcol, name = "newAssay", fun = MsCoreUtils::robustSummary, ... ) ## S4 method for signature 'SummarizedExperiment' aggregateFeatures(object, fcol, fun = MsCoreUtils::robustSummary, ...) ## S4 method for signature 'QFeatures' adjacencyMatrix(object, i, adjName = "adjacencyMatrix") adjacencyMatrix(object, i, adjName = "adjacencyMatrix") <- value ## S4 method for signature 'SummarizedExperiment' aggcounts(object, ...)
object |
An instance of class |
i |
When adding an adjacency matrix to an assay of a
|
fcol |
A |
name |
A |
fun |
A function used for quantitative feature aggregation. See Details for examples. |
... |
Additional parameters passed the |
adjName |
|
value |
An adjacency matrix with row and column names. The
matrix will be coerced to compressed, column-oriented sparse
matrix (class |
Aggregation is performed by a function that takes a matrix as
input and returns a vector of length equal to ncol(x)
. Examples
thereof are
MsCoreUtils::medianPolish()
to fits an additive model (two way
decomposition) using Tukey's median polish_ procedure using
stats::medpolish()
;
MsCoreUtils::robustSummary()
to calculate a robust aggregation
using MASS::rlm()
(default);
base::colMeans()
to use the mean of each column;
colMeansMat(x, MAT)
to aggregate feature by the calculating
the mean of peptide intensities via an adjacency matrix. Shared
peptides are re-used multiple times.
matrixStats::colMedians()
to use the median of each column.
base::colSums()
to use the sum of each column;
colSumsMat(x, MAT)
to aggregate feature by the summing the
peptide intensities for each protein via an adjacency
matrix. Shared peptides are re-used multiple times.
See MsCoreUtils::aggregate_by_vector()
for more aggregation functions.
A QFeatures
object with an additional assay or a
SummarizedExperiment
object (or subclass thereof).
Missing quantitative values have different effects based on the aggregation method employed:
The aggregation functions should be able to deal with missing
values by either ignoring or propagating them. This is often
done with an na.rm
argument, that can be passed with
...
. For example, rowSums
, rowMeans
, rowMedians
,
... will ignore NA
values with na.rm = TRUE
, as illustrated
below.
Missing values will result in an error when using medpolish
,
unless na.rm = TRUE
is used. Note that this option relies on
implicit assumptions and/or performes an implicit imputation:
when summing, the values are implicitly imputed by 0, assuming
that the NA
represent a trully absent features; when
averaging, the assumption is that the NA
represented a
genuinely missing value.
When using robust summarisation, individual missing values are
excluded prior to fitting the linear model by robust
regression. To remove all values in the feature containing the
missing values, use filterNA()
.
More generally, missing values often need dedicated handling such
as filtering (see filterNA()
) or imputation (see impute()
).
Missing values in the row data of an assay will also impact the
resulting (aggregated) assay row data, as illustrated in the
example below. Any feature variables (a column in the row data)
containing NA
values will be dropped from the aggregated row
data. The reasons underlying this drop are detailed in the
reduceDataFrame()
manual page: only invariant aggregated rows,
i.e. rows resulting from the aggregation from identical variables,
are preserved during aggregations.
The situation illustrated below should however only happen in rare
cases and should often be imputable using the value of the other
aggregation rows before aggregation to preserve the invariant
nature of that column. In cases where an NA
is present in an
otherwise variant column, the column would be dropped anyway.
When considering non-unique peptides explicitly, i.e. peptides
that map to multiple proteins rather than as a protein group, it
is convenient to encode this ambiguity explicitly using a
peptide-by-proteins (sparse) adjacency matrix. This matrix is
typically stored in the rowdata and set/retrieved with the
adjacencyMatrix()
function. It can be created manually (as
illustrated below) or using PSMatch::makeAdjacencyMatrix()
.
The QFeatures vignette provides an extended example and
the Processing vignette, for a complete quantitative
proteomics data processing pipeline. The
MsCoreUtils::aggregate_by_vector()
manual page provides
further details.
## --------------------------------------- ## An example QFeatures with PSM-level data ## --------------------------------------- data(feat1) feat1 ## Aggregate PSMs into peptides feat1 <- aggregateFeatures(feat1, "psms", "Sequence", name = "peptides") feat1 ## Aggregate peptides into proteins feat1 <- aggregateFeatures(feat1, "peptides", "Protein", name = "proteins") feat1 assay(feat1[[1]]) assay(feat1[[2]]) aggcounts(feat1[[2]]) assay(feat1[[3]]) aggcounts(feat1[[3]]) ## -------------------------------------------- ## Aggregation with missing quantitative values ## -------------------------------------------- data(ft_na) ft_na assay(ft_na[[1]]) rowData(ft_na[[1]]) ## By default, missing values are propagated ft2 <- aggregateFeatures(ft_na, 1, fcol = "X", fun = colSums) assay(ft2[[2]]) aggcounts(ft2[[2]]) ## The rowData .n variable tallies number of initial rows that ## were aggregated (irrespective of NAs) for all the samples. rowData(ft2[[2]]) ## Ignored when setting na.rm = TRUE ft3 <- aggregateFeatures(ft_na, 1, fcol = "X", fun = colSums, na.rm = TRUE) assay(ft3[[2]]) aggcounts(ft3[[2]]) ## ----------------------------------------------- ## Aggregation with missing values in the row data ## ----------------------------------------------- ## Row data results without any NAs, which includes the ## Y variables rowData(ft2[[2]]) ## Missing value in the Y feature variable rowData(ft_na[[1]])[1, "Y"] <- NA rowData(ft_na[[1]]) ft3 <- aggregateFeatures(ft_na, 1, fcol = "X", fun = colSums) ## The Y feature variable has been dropped! assay(ft3[[2]]) rowData(ft3[[2]]) ## -------------------------------------------- ## Using a peptide-by-proteins adjacency matrix ## -------------------------------------------- ## Let's use assay peptides from object feat1 and ## define that peptide SYGFNAAR maps to proteins ## Prot A and B se <- feat1[["peptides"]] rowData(se)$Protein[3] <- c("ProtA;ProtB") rowData(se) ## This can also be defined using anadjacency matrix, manual ## encoding here. See PSMatch::makeAdjacencyMatrix() for a ## function that does it automatically. adj <- matrix(0, nrow = 3, ncol = 2, dimnames = list(rownames(se), c("ProtA", "ProtB"))) adj[1, 1] <- adj[2, 2] <- adj[3, 1:2] <- 1 adj adjacencyMatrix(se) <- adj rowData(se) adjacencyMatrix(se) ## Aggregation using the adjacency matrix se2 <- aggregateFeatures(se, fcol = "adjacencyMatrix", fun = MsCoreUtils::colMeansMat) ## Peptide SYGFNAAR was taken into account in both ProtA and ProtB ## aggregations. assay(se2) ## Aggregation by matrix on a QFeature object works as with a ## vector ft <- QFeatures(list(peps = se)) ft <- aggregateFeatures(ft, "peps", "adjacencyMatrix", name = "protsByMat", fun = MsCoreUtils::colMeansMat) assay(ft[[2]]) rowData(ft[[2]])
## --------------------------------------- ## An example QFeatures with PSM-level data ## --------------------------------------- data(feat1) feat1 ## Aggregate PSMs into peptides feat1 <- aggregateFeatures(feat1, "psms", "Sequence", name = "peptides") feat1 ## Aggregate peptides into proteins feat1 <- aggregateFeatures(feat1, "peptides", "Protein", name = "proteins") feat1 assay(feat1[[1]]) assay(feat1[[2]]) aggcounts(feat1[[2]]) assay(feat1[[3]]) aggcounts(feat1[[3]]) ## -------------------------------------------- ## Aggregation with missing quantitative values ## -------------------------------------------- data(ft_na) ft_na assay(ft_na[[1]]) rowData(ft_na[[1]]) ## By default, missing values are propagated ft2 <- aggregateFeatures(ft_na, 1, fcol = "X", fun = colSums) assay(ft2[[2]]) aggcounts(ft2[[2]]) ## The rowData .n variable tallies number of initial rows that ## were aggregated (irrespective of NAs) for all the samples. rowData(ft2[[2]]) ## Ignored when setting na.rm = TRUE ft3 <- aggregateFeatures(ft_na, 1, fcol = "X", fun = colSums, na.rm = TRUE) assay(ft3[[2]]) aggcounts(ft3[[2]]) ## ----------------------------------------------- ## Aggregation with missing values in the row data ## ----------------------------------------------- ## Row data results without any NAs, which includes the ## Y variables rowData(ft2[[2]]) ## Missing value in the Y feature variable rowData(ft_na[[1]])[1, "Y"] <- NA rowData(ft_na[[1]]) ft3 <- aggregateFeatures(ft_na, 1, fcol = "X", fun = colSums) ## The Y feature variable has been dropped! assay(ft3[[2]]) rowData(ft3[[2]]) ## -------------------------------------------- ## Using a peptide-by-proteins adjacency matrix ## -------------------------------------------- ## Let's use assay peptides from object feat1 and ## define that peptide SYGFNAAR maps to proteins ## Prot A and B se <- feat1[["peptides"]] rowData(se)$Protein[3] <- c("ProtA;ProtB") rowData(se) ## This can also be defined using anadjacency matrix, manual ## encoding here. See PSMatch::makeAdjacencyMatrix() for a ## function that does it automatically. adj <- matrix(0, nrow = 3, ncol = 2, dimnames = list(rownames(se), c("ProtA", "ProtB"))) adj[1, 1] <- adj[2, 2] <- adj[3, 1:2] <- 1 adj adjacencyMatrix(se) <- adj rowData(se) adjacencyMatrix(se) ## Aggregation using the adjacency matrix se2 <- aggregateFeatures(se, fcol = "adjacencyMatrix", fun = MsCoreUtils::colMeansMat) ## Peptide SYGFNAAR was taken into account in both ProtA and ProtB ## aggregations. assay(se2) ## Aggregation by matrix on a QFeature object works as with a ## vector ft <- QFeatures(list(peps = se)) ft <- aggregateFeatures(ft, "peps", "adjacencyMatrix", name = "protsByMat", fun = MsCoreUtils::colMeansMat) assay(ft[[2]]) rowData(ft[[2]])
Placeholder for generics functions documentation
Links between assays within a QFeatures object are handled by an
AssayLinks
object. It is composed by a list of AssayLink
instances.
## S4 method for signature 'AssayLink' show(object) ## S4 method for signature 'AssayLinks' updateObject(object, ..., verbose = FALSE) ## S4 method for signature 'AssayLink' updateObject(object, ..., verbose = FALSE) AssayLink(name, from = NA_character_, fcol = NA_character_, hits = Hits()) AssayLinks(..., names = NULL) assayLink(x, i) assayLinks(x, i) ## S4 method for signature 'AssayLink,character,ANY,ANY' x[i, j, ..., drop = TRUE] ## S4 method for signature 'AssayLinks,list,ANY,ANY' x[i, j, ..., drop = TRUE] addAssayLink(object, from, to, varFrom, varTo) addAssayLinkOneToOne(object, from, to)
## S4 method for signature 'AssayLink' show(object) ## S4 method for signature 'AssayLinks' updateObject(object, ..., verbose = FALSE) ## S4 method for signature 'AssayLink' updateObject(object, ..., verbose = FALSE) AssayLink(name, from = NA_character_, fcol = NA_character_, hits = Hits()) AssayLinks(..., names = NULL) assayLink(x, i) assayLinks(x, i) ## S4 method for signature 'AssayLink,character,ANY,ANY' x[i, j, ..., drop = TRUE] ## S4 method for signature 'AssayLinks,list,ANY,ANY' x[i, j, ..., drop = TRUE] addAssayLink(object, from, to, varFrom, varTo) addAssayLinkOneToOne(object, from, to)
object |
An |
... |
A set of |
verbose |
logical (default FALSE) whether to print extra messages |
name |
A mandatory name of the assay(s). |
from |
A |
fcol |
The feature variable of the parent assay used to
generate the current assay (used in
|
hits |
An object of class S4Vectors::Hits matching the features of two assays. |
names |
A |
x |
An instance of class QFeatures. |
i |
The index or name of the assay whose |
j |
ignored. |
drop |
ignored. |
to |
A |
varFrom |
A |
varTo |
A |
assayLink
returns an instance of class AssayLink
.
assayLinks
returns an instance of class AssayLinks
.
Object can be created with the AssayLink()
and AssayLinks()
constructors.
assayLink(x, i)
accesses the AssayLink at position i
or with
name i
in the QFeatures object x
.
parentAssayLinks(x, i, recursive = FALSE)
accesses the
parent(s) AssayLinks
or assay with index or name i
.
addAssayLink
takes a parent assay and a child assay contained
in the QFeatures object and creates a link given a matching
feature variable in each assay's rowData
. addAssayLink
also
allows to link an assay from multiple parent assays (see
examples below).
addAssayLinkOneToOne
links two assays contained in the
QFeatures object. The parent assay and the child assay must
have the same size and contain the same rownames (a different
ordering is allowed). The matching is performed based on the row
names of the assays, instead of a supplied variable name in
rowData
. Providing multiple parents is not supported.
##----------------------------- ## Creating an AssayLink object ##----------------------------- al1 <- AssayLink(name = "assay1") al1 ##------------------------------ ## Creating an AssayLinks object ##------------------------------ AssayLinks(al1) al2 <- AssayLinks(names = c("Assay1", "Assay2")) al2 ##--------------------------------------- ## Adding an AssayLink between two assays ##--------------------------------------- ## create a QFeatures object with 2 (identical) assays ## see also '?QFeatures' se <- SummarizedExperiment(matrix(runif(20), ncol = 2, dimnames = list(LETTERS[1:10], letters[1:2])), rowData = DataFrame(ID = 1:10)) ft <- QFeatures(list(assay1 = se, assay2 = se)) ## assay1 and assay2 are not linked assayLink(ft, "assay2") ## 'from' is NA assayLink(ft, "assay1") ## 'from' is NA ## Suppose assay2 was generated from assay1 and the feature variable ## 'ID' keeps track of the relationship between the two assays ftLinked <- addAssayLink(ft, from = "assay1", to = "assay2", varFrom = "ID", varTo = "ID") assayLink(ftLinked, "assay2") ## For one-to-one relationships, you can also use ftLinked <- addAssayLinkOneToOne(ft, from = "assay1", to = "assay2") assayLink(ftLinked, "assay2") ##---------------------------------------- ## Adding an AssayLink between more assays ##---------------------------------------- ## An assay can also be linked to multiple parent assays ## Create a QFeatures object with 2 parent assays and 1 child assay ft <- QFeatures(list(parent1 = se[1:6, ], parent2 = se[4:10, ], child = se)) ft <- addAssayLink(ft, from = c("parent1", "parent2"), to = "child", varFrom = c("ID", "ID"), varTo = "ID") assayLink(ft, "child")
##----------------------------- ## Creating an AssayLink object ##----------------------------- al1 <- AssayLink(name = "assay1") al1 ##------------------------------ ## Creating an AssayLinks object ##------------------------------ AssayLinks(al1) al2 <- AssayLinks(names = c("Assay1", "Assay2")) al2 ##--------------------------------------- ## Adding an AssayLink between two assays ##--------------------------------------- ## create a QFeatures object with 2 (identical) assays ## see also '?QFeatures' se <- SummarizedExperiment(matrix(runif(20), ncol = 2, dimnames = list(LETTERS[1:10], letters[1:2])), rowData = DataFrame(ID = 1:10)) ft <- QFeatures(list(assay1 = se, assay2 = se)) ## assay1 and assay2 are not linked assayLink(ft, "assay2") ## 'from' is NA assayLink(ft, "assay1") ## 'from' is NA ## Suppose assay2 was generated from assay1 and the feature variable ## 'ID' keeps track of the relationship between the two assays ftLinked <- addAssayLink(ft, from = "assay1", to = "assay2", varFrom = "ID", varTo = "ID") assayLink(ftLinked, "assay2") ## For one-to-one relationships, you can also use ftLinked <- addAssayLinkOneToOne(ft, from = "assay1", to = "assay2") assayLink(ftLinked, "assay2") ##---------------------------------------- ## Adding an AssayLink between more assays ##---------------------------------------- ## An assay can also be linked to multiple parent assays ## Create a QFeatures object with 2 parent assays and 1 child assay ft <- QFeatures(list(parent1 = se[1:6, ], parent2 = se[4:10, ], child = se)) ft <- addAssayLink(ft, from = c("parent1", "parent2"), to = "child", varFrom = c("ID", "ID"), varTo = "ID") assayLink(ft, "child")
This function counts the number of unique features per sample. A grouping structure can be provided to count higher level features from assays, for example counting the number of unique proteins from PSM data.
countUniqueFeatures(object, i, groupBy = NULL, colDataName = "count")
countUniqueFeatures(object, i, groupBy = NULL, colDataName = "count")
object |
An object of class |
i |
A |
groupBy |
A |
colDataName |
A |
An object of class QFeatures
.
data("ft_na") ## Count number of (non-missing) PSMs ft_na <- countUniqueFeatures(ft_na, i = "na", colDataName = "counts") ft_na$counts ## Count number of unique rowData feature ft_na <- countUniqueFeatures(ft_na, i = "na", groupBy = "Y", colDataName = "Y_counts") ft_na$Y_counts
data("ft_na") ## Count number of (non-missing) PSMs ft_na <- countUniqueFeatures(ft_na, i = "na", colDataName = "counts") ft_na$counts ## Count number of unique rowData feature ft_na <- countUniqueFeatures(ft_na, i = "na", groupBy = "Y", colDataName = "Y_counts") ft_na$Y_counts
A shiny app to browser and explore the assays in an
MultiAssayExperiment
object. Each assay can be selected from the
dropdown meny in the side panel, and the quantitative data and row
metadata are displayed in the respective Assay and Row data
tabs. The Heatmap tab displays a heatmap of the assay. The
selection of rows in the Row data table is used to subset the
features displayed in the Assay table and the heatmap to those
currectly selected. See QFeatures for an example.
display(object, n = 100, ...)
display(object, n = 100, ...)
object |
An instance inheriting from |
n |
A |
... |
Additional parameters (other than |
Used for its side effect.
Laurent Gatto
## Not run: data(feat2) display(feat2) ## End(Not run)
## Not run: data(feat2) display(feat2) ## End(Not run)
feat1
is a small test QFeatures
object for testing and
demonstration. feat2
is used to demonstrate assay joins. ft_na
is a tiny test set that contains missing values used to
demonstrate and test the impact of missing values on data
processing. se_na2
is an SummarizedExperiment
with missing
values of mixed origin.
feat1
feat1
An object of class QFeatures
of length 1.
QFeatures
object after processingfeat3
is a small QFeatures
object that contains 7 assays:
psms1
, psms2
, psmsall
, peptides
, proteins
,
normpeptides
, normproteins
. The dataset contains example data
that could be obtained after running a simple processing pipeline.
You can use it to get your hands on manipulating AssayLinks
since all 3 general cases are present:
One parent to one child AssayLink
: the relationship can either be
one row to one row (e.g. "peptides" to "normpeptides") or
multiple rows to one row (e.g. "peptides" to "proteins").
One parent to multiple children AssayLink
: for instance "peptides"
to "normpeptides" and "proteins".
Multiple parents to one child AssayLink
: links the rows between
multiple assays to a single assays where some rows in different
parent assays may point to the same row in the child assay. E.g.
"psms1" and "psms2" to "psmsall"
feat3
feat3
An object of class QFeatures
of length 7.
feat3
was built from feat1
. The source code is available in
inst/scripts/test_data.R
See ?feat1
for other example/test data sets.
data("feat3") plot(feat3)
data("feat3") plot(feat3)
A data.frame
with PSM-level quantitation data by Christoforou et al.
(2016). This is the first replicate of a spatial proteomics dataset from a
hyperLOPIT experimental design on Mouse E14TG2a embryonic stem
cells. Normalised intensities for proteins for TMT 10-plex labelled
fractions are available for 3 replicates acquired in MS3 mode using an
Orbitrap Fusion mass-spectrometer.
The variable names are
X126, X127C, X127N, X128C, X128N, X129C, X129N, X130C, X130N and X131: the 10 TMT tags used to quantify the peptides along the density gradient.
Sequence: the peptide sequence.
ProteinDescriptions: the description of the protein this peptide was associated to.
NbProteins: the number of proteins in the protein group.
ProteinGroupAccessions: the main protein accession number in the protein group.
Modifications: post-translational modifications identified in the peptide.
qValue: the PSM identification q-value.
PEP: the PSM posterior error probability.
IonScore: the Mascot ion identification score.
NbMissedCleavages: the number of missed cleavages in the peptide.
IsolationInterference: the calculated precursor ion isolation interference.
IonInjectTimems: the ions injection time in milli-seconds.
Intensity: the precursor ion intensity.
Charge: the peptide charge.
mzDa: the peptide mass to charge ratio, in Daltons.
MHDa: the peptide mass, in Daltons.
DeltaMassPPM: the difference in measure and calculated mass, in parts per millions.
RTmin: the peptide retention time, in minutes.
markers: localisation for well known sub-cellular markers. QFeatures of
unknown location are encode as "unknown"
.
For further details, install the pRolocdata
package and see
?hyperLOPIT2015
.
hlpsms
hlpsms
An object of class data.frame
with 3010 rows and 28 columns.
The pRolocdata
package: http://bioconductor.org/packages/pRolocdata/
A draft map of the mouse pluripotent stem cell spatial proteome Christoforou A, Mulvey CM, Breckels LM, Geladaki A, Hurrell T, Hayward PC, Naake T, Gatto L, Viner R, Martinez Arias A, Lilley KS. Nat Commun. 2016 Jan 12;7:8992. doi: 10.1038/ncomms9992. PubMed PMID: 26754106; PubMed Central PMCID: PMC4729960.
See QFeatures to import this data using the readQFeatures()
function.
The impute
method performs data imputation on QFeatures
and
SummarizedExperiment
instance using a variety of methods.
Users should proceed with care when imputing data and take precautions to assure that the imputation produce valid results, in particular with naive imputations such as replacing missing values with 0.
See MsCoreUtils::impute_matrix()
for details on
the different imputation methods available and strategies.
impute ## S4 method for signature 'SummarizedExperiment' impute(object, method, ...) ## S4 method for signature 'QFeatures' impute(object, method, ..., i, name = "imputedAssay")
impute ## S4 method for signature 'SummarizedExperiment' impute(object, method, ...) ## S4 method for signature 'QFeatures' impute(object, method, ..., i, name = "imputedAssay")
object |
A |
method |
|
... |
Additional parameters passed to the inner imputation
function. See |
i |
A |
name |
A |
An object of class standardGeneric
of length 1.
MsCoreUtils::imputeMethods() data(se_na2) ## table of missing values along the rows (proteins) table(rowData(se_na2)$nNA) ## table of missing values along the columns (samples) colData(se_na2)$nNA ## non-random missing values notna <- which(!rowData(se_na2)$randna) length(notna) notna impute(se_na2, method = "min") if (require("imputeLCMD")) { impute(se_na2, method = "QRILC") impute(se_na2, method = "MinDet") } if (require("norm")) impute(se_na2, method = "MLE") impute(se_na2, method = "mixed", randna = rowData(se_na2)$randna, mar = "knn", mnar = "QRILC") ## neighbour averaging x <- se_na2[1:4, 1:6] assay(x)[1, 1] <- NA ## min value assay(x)[2, 3] <- NA ## average assay(x)[3, 1:2] <- NA ## min value and average ## 4th row: no imputation assay(x) assay(impute(x, "nbavg"))
MsCoreUtils::imputeMethods() data(se_na2) ## table of missing values along the rows (proteins) table(rowData(se_na2)$nNA) ## table of missing values along the columns (samples) colData(se_na2)$nNA ## non-random missing values notna <- which(!rowData(se_na2)$randna) length(notna) notna impute(se_na2, method = "min") if (require("imputeLCMD")) { impute(se_na2, method = "QRILC") impute(se_na2, method = "MinDet") } if (require("norm")) impute(se_na2, method = "MLE") impute(se_na2, method = "mixed", randna = rowData(se_na2)$randna, mar = "knn", mnar = "QRILC") ## neighbour averaging x <- se_na2[1:4, 1:6] assay(x)[1, 1] <- NA ## min value assay(x)[2, 3] <- NA ## average assay(x)[3, 1:2] <- NA ## min value and average ## 4th row: no imputation assay(x) assay(impute(x, "nbavg"))
This function applies a full-join type of operation on 2 or more
assays in a QFeatures
instance.
joinAssays(x, i, name = "joinedAssay")
joinAssays(x, i, name = "joinedAssay")
x |
An instance of class QFeatures. |
i |
The indices or names of al least two assays to be joined. |
name |
A |
The rows to be joined are chosen based on the rownames of the respective assays. It is the user's responsability to make sure these are meaningful, such as for example refering to unique peptide sequences or proteins.
The join operation acts along the rows and expects the samples
(columns) of the assays to be disjoint, i.e. the assays mustn't
share any samples. Rows that aren't present in an assay are set to
NA
when merged.
The rowData
slots are also joined. However, only columns that
are shared and that have the same values for matching columns/rows
are retained. For example of a feature variable A
in sample S1
contains value a1
and variable A
in sample S2
in a different
assay contains a2
, then the feature variable A
is dropped in
the merged assay.
The joined assay is linked to its parent assays through an AssayLink
object. The link between the child assay and the parent assays is based on
the assay row names, just like the procedure for joining the parent assays.
A QFeatures
object with an additional assay.
Laurent Gatto
## ----------------------------------------------- ## An example QFeatures with 3 assays to be joined ## ----------------------------------------------- data(feat2) feat2 feat2 <- joinAssays(feat2, 1:3) ## Individual assays to be joined, each with 4 samples and a ## variable number of rows. assay(feat2[[1]]) assay(feat2[[2]]) assay(feat2[[3]]) ## The joined assay contains 14 rows (corresponding to the union ## of those in the initial assays) and 12 samples assay(feat2[["joinedAssay"]]) ## The individual rowData to be joined. rowData(feat2[[1]]) rowData(feat2[[2]]) rowData(feat2[[3]]) ## Only the 'Prot' variable is retained because it is shared among ## all assays and the values and coherent across samples (the ## value of 'Prot' for row 'j' is always 'Pj'). The variable 'y' is ## missing in 'assay1' and while variable 'x' is present is all ## assays, the values for the shared rows are different. rowData(feat2[["joinedAssay"]])
## ----------------------------------------------- ## An example QFeatures with 3 assays to be joined ## ----------------------------------------------- data(feat2) feat2 feat2 <- joinAssays(feat2, 1:3) ## Individual assays to be joined, each with 4 samples and a ## variable number of rows. assay(feat2[[1]]) assay(feat2[[2]]) assay(feat2[[3]]) ## The joined assay contains 14 rows (corresponding to the union ## of those in the initial assays) and 12 samples assay(feat2[["joinedAssay"]]) ## The individual rowData to be joined. rowData(feat2[[1]]) rowData(feat2[[2]]) rowData(feat2[[3]]) ## Only the 'Prot' variable is retained because it is shared among ## all assays and the values and coherent across samples (the ## value of 'Prot' for row 'j' is always 'Pj'). The variable 'y' is ## missing in 'assay1' and while variable 'x' is present is all ## assays, the values for the shared rows are different. rowData(feat2[["joinedAssay"]])
This manual page describes the handling of missing values in
QFeatures objects. In the following functions, if object
is of
class QFeatures
, an optional assay index or name i
can be
specified to define the assay (by name of index) on which to
operate.
The following functions are currently available:
zeroIsNA(object, i)
replaces all 0 in object
by NA
. This
is often necessary when third-party software assume that
features that weren't quantified should be assigned an
intensity of 0.
infIsNA(object, i)
replaces all infinite values in object
by
NA
. This is necessary when third-party software divide
expression data by zero values, for instance during custom
normalization.
nNA(object, i)
returns a list of missing value summaries. The
first element nNA
gives a DataFrame
with the number and the
proportion of missing values for the whole assay; the second
element nNArows
provides a DataFrame
with the number and the
proportion of missing values for the features (rows) of the
assay(s); the third element nNAcols
provides the number and
the proportions of missing values in each sample of the
assay(s). When object
has class QFeatures
and additional
column with the assays is provided in each element's
DataFrame
.
filterNA(object, pNA, i)
removes features (rows) that contain
a proportion of more missing values of pNA
or higher.
See the Processing vignette for examples.
## S4 method for signature 'SummarizedExperiment,missing' zeroIsNA(object, i) ## S4 method for signature 'QFeatures,integer' zeroIsNA(object, i) ## S4 method for signature 'QFeatures,numeric' zeroIsNA(object, i) ## S4 method for signature 'QFeatures,character' zeroIsNA(object, i) ## S4 method for signature 'SummarizedExperiment,missing' infIsNA(object, i) ## S4 method for signature 'QFeatures,integer' infIsNA(object, i) ## S4 method for signature 'QFeatures,numeric' infIsNA(object, i) ## S4 method for signature 'QFeatures,character' infIsNA(object, i) ## S4 method for signature 'SummarizedExperiment,missing' nNA(object, i) ## S4 method for signature 'QFeatures,integer' nNA(object, i) ## S4 method for signature 'QFeatures,numeric' nNA(object, i) ## S4 method for signature 'QFeatures,character' nNA(object, i) ## S4 method for signature 'SummarizedExperiment' filterNA(object, pNA = 0) ## S4 method for signature 'QFeatures' filterNA(object, pNA = 0, i)
## S4 method for signature 'SummarizedExperiment,missing' zeroIsNA(object, i) ## S4 method for signature 'QFeatures,integer' zeroIsNA(object, i) ## S4 method for signature 'QFeatures,numeric' zeroIsNA(object, i) ## S4 method for signature 'QFeatures,character' zeroIsNA(object, i) ## S4 method for signature 'SummarizedExperiment,missing' infIsNA(object, i) ## S4 method for signature 'QFeatures,integer' infIsNA(object, i) ## S4 method for signature 'QFeatures,numeric' infIsNA(object, i) ## S4 method for signature 'QFeatures,character' infIsNA(object, i) ## S4 method for signature 'SummarizedExperiment,missing' nNA(object, i) ## S4 method for signature 'QFeatures,integer' nNA(object, i) ## S4 method for signature 'QFeatures,numeric' nNA(object, i) ## S4 method for signature 'QFeatures,character' nNA(object, i) ## S4 method for signature 'SummarizedExperiment' filterNA(object, pNA = 0) ## S4 method for signature 'QFeatures' filterNA(object, pNA = 0, i)
object |
An object of class |
i |
One or more indices or names of the assay(s) to be processed. |
pNA |
|
An instance of the same class as object
.
The impute()
for QFeautres
instances.
data(ft_na) ## Summary if missing values nNA(ft_na, 1) ## Remove rows with missing values assay(filterNA(ft_na, i = 1)) ## Replace NAs by zero and back ft_na <- impute(ft_na, i = 1, method = "zero") assay(ft_na) ft_na <- zeroIsNA(ft_na, 1) assay(ft_na)
data(ft_na) ## Summary if missing values nNA(ft_na, 1) ## Remove rows with missing values assay(filterNA(ft_na, i = 1)) ## Replace NAs by zero and back ft_na <- impute(ft_na, i = 1, method = "zero") assay(ft_na) ft_na <- zeroIsNA(ft_na, 1) assay(ft_na)
Conceptually, a QFeatures
object holds a set of assays, each
composed of a matrix
(or array
) containing quantitative data
and row annotations (meta-data). The number and the names of the
columns (samples) must always be the same across the assays, but
the number and the names of the rows (features) can vary. The
assays are typically defined as SummarizedExperiment
objects. In
addition, a QFeatures
object also uses a single DataFrame
to
annotate the samples (columns) represented in all the matrices.
The QFeatures
class extends the
MultiAssayExperiment::MultiAssayExperiment and inherits all
the functionality of the
MultiAssayExperiment::MultiAssayExperiment class.
A typical use case for such QFeatures
object is to represent
quantitative proteomics (or metabolomics) data, where different
assays represent quantitation data at the PSM (the main assay),
peptide and protein level, and where peptide values are computed
from the PSM data, and the protein-level data is calculated based
on the peptide-level values. The largest assay (the one with the
highest number of features, PSMs in the example above) is
considered the main assay.
The recommended way to create QFeatures
objects is the use the
readQFeatures()
function, that creates an instance from tabular
data. The QFeatures
constructor can be used to create objects
from their bare parts. It is the user's responsability to make
sure that these match the class validity requirements.
QFeatures(..., assayLinks = NULL) ## S4 method for signature 'QFeatures' show(object) ## S3 method for class 'QFeatures' plot(x, interactive = FALSE, ...) ## S4 method for signature 'QFeatures,ANY,ANY,ANY' x[i, j, ..., drop = TRUE] ## S4 method for signature 'QFeatures,character,ANY,ANY' x[i, j, k, ..., drop = TRUE] ## S4 method for signature 'QFeatures' c(x, ...) ## S4 method for signature 'QFeatures' dims(x, use.names = TRUE) ## S4 method for signature 'QFeatures' nrows(x, use.names = TRUE) ## S4 method for signature 'QFeatures' ncols(x, use.names = TRUE) ## S4 method for signature 'QFeatures' rowData(x, use.names = TRUE, ...) ## S4 replacement method for signature 'QFeatures,DataFrameList' rowData(x) <- value ## S4 replacement method for signature 'QFeatures,ANY' rowData(x) <- value rbindRowData(object, i) selectRowData(x, rowvars) rowDataNames(x) ## S4 replacement method for signature 'QFeatures,character' names(x) <- value longFormat(object, colvars = NULL, rowvars = NULL, index = 1L) addAssay(x, y, name, assayLinks) removeAssay(x, i) replaceAssay(x, y, i) ## S4 replacement method for signature 'QFeatures,ANY,ANY' x[[i, j, ...]] <- value ## S4 method for signature 'QFeatures' updateObject(object, ..., verbose = FALSE) dropEmptyAssays(object, dims = 1:2)
QFeatures(..., assayLinks = NULL) ## S4 method for signature 'QFeatures' show(object) ## S3 method for class 'QFeatures' plot(x, interactive = FALSE, ...) ## S4 method for signature 'QFeatures,ANY,ANY,ANY' x[i, j, ..., drop = TRUE] ## S4 method for signature 'QFeatures,character,ANY,ANY' x[i, j, k, ..., drop = TRUE] ## S4 method for signature 'QFeatures' c(x, ...) ## S4 method for signature 'QFeatures' dims(x, use.names = TRUE) ## S4 method for signature 'QFeatures' nrows(x, use.names = TRUE) ## S4 method for signature 'QFeatures' ncols(x, use.names = TRUE) ## S4 method for signature 'QFeatures' rowData(x, use.names = TRUE, ...) ## S4 replacement method for signature 'QFeatures,DataFrameList' rowData(x) <- value ## S4 replacement method for signature 'QFeatures,ANY' rowData(x) <- value rbindRowData(object, i) selectRowData(x, rowvars) rowDataNames(x) ## S4 replacement method for signature 'QFeatures,character' names(x) <- value longFormat(object, colvars = NULL, rowvars = NULL, index = 1L) addAssay(x, y, name, assayLinks) removeAssay(x, i) replaceAssay(x, y, i) ## S4 replacement method for signature 'QFeatures,ANY,ANY' x[[i, j, ...]] <- value ## S4 method for signature 'QFeatures' updateObject(object, ..., verbose = FALSE) dropEmptyAssays(object, dims = 1:2)
... |
See |
assayLinks |
An optional AssayLinks. |
object |
An instance of class QFeatures. |
x |
An instance of class QFeatures. |
interactive |
A |
i |
An indexing vector. See the corresponding section in the documentation for more details. |
j |
|
drop |
logical (default |
k |
|
use.names |
A |
value |
The values to use as a replacement. See the corresponding section in the documentation for more details. |
rowvars |
A |
colvars |
A |
index |
The assay indicator within each |
y |
An object that inherits from |
name |
A |
verbose |
logical (default FALSE) whether to print extra messages |
dims |
|
QFeatures(..., assayLinks)
allows the manual construction of
objects. It is the user's responsability to make sure these
comply. The arguments in ...
are those documented in
MultiAssayExperiment::MultiAssayExperiment()
. For details
about assayLinks
, see AssayLinks. An example is shown below.
The readQFeatures()
function constructs a QFeatures
object
from text-based spreadsheet or a data.frame
used to generate
an assay. See the function manual page for details and an
example.
The QFeatures
class extends the
MultiAssayExperiment::MultiAssayExperiment class and inherits
all its accessors and replacement methods.
The rowData
method returns a DataFrameList
containing the
rowData
for each assay of the QFeatures
object. On the other
hand, rowData
can be modified using rowData(x) <- value
,
where value
is a list of tables that can be coerced to DFrame
tables. The names of value
point to the assays for
which the rowData
must be replaced. The column names of each
table are used to replace the data in the existing rowData
. If
the column name does not exist, a new column is added to the
rowData
.
The rbindRowData
functions returns a DFrame
table that
contains the row binded rowData
tables from the selected
assays. In this context, i
is a character()
, integer()
or
logical()
object for subsetting assays. Only rowData variables
that are common to all assays are kept.
The rowDataNames
accessor returns a list with the rowData
variable names.
The longFormat
accessor takes a QFeatures
object and returns
it in a long format DataFrame
. Each quantitative value is
reported on a separate line. colData
and rowData
data can
also be added. This function is an extension of the longFormat
function in the MultiAssayExperiment::MultiAssayExperiment.
The aggregateFeatures()
function creates a new assay by
aggregating features of an existing assay.
addAssay(x, y, name, assayLinks)
: Adds one or more
new assay(s) y
to the QFeatures
instance x
. name
is a character(1)
naming the assay if only one assay is
provided, and is ignored if y
is a list of assays. assayLinks
is an optional AssayLinks. The colData(y)
is
automatically added to colData(x)
by matching sample
names, that is colnames(y)
. If the samples are not present in
x
, the rows of colData(x)
are extended to account for the
new samples. Be aware that conflicting information between the
colData(y)
and the colData(x)
will result in an
error.
removeAssay(x, i)
: Removes one or more assay(s) from the
QFeatures
instance x
. In this context, i
is a character()
,
integer()
or logical()
that indicates which assay(s) to
remove.
replaceAssay(x, y, i)
: Replaces one or more
assay(s) from the QFeatures
instance x
. In this context, i
is a character()
, integer()
or logical()
that indicates
which assay(s) to replace. The AssayLinks
from or to
any replaced assays are automatically removed, unless the
replacement has the same dimension names (columns and row, order
agnostic). Be aware that conflicting information between
colData(y)
and colData(x)
will result in an error.
x[[i]] <- value
: a generic method for adding (when i
is not
in names(x)
), removing (when value
is null) or replacing (when
i
is in names(x)
). Note that the arguments j
and ...
from
the S4 replacement method signature are not allowed.
QFeatures object can be subset using the x[i, j, k, drop = TRUE]
paradigm. In this context, i
is a character()
, integer()
,
logical()
or GRanges()
object for subsetting by rows. See
the argument descriptions for details on the remaining arguments.
The subsetByFeature()
function can be used to subset a
QFeatures
object using one or multiple feature names that will
be matched across different assays, taking the aggregation
relation between assays.
The selectRowData(x, rowvars)
function can be used to
select a limited number of rowData
columns of interest named
in rowvars
in the x
instance of class QFeatures
. All other
variables than rowvars
will be dropped. In case an element in
rowvars
isn't found in any rowData
variable, a message is
printed.
The dropEmptyAssays(object, dims)
function removes empty
assays from a QFeatures
. Empty assays are defined as having 0
rows and/or 0 columns, as defined by the dims
argument.
Laurent Gatto
The readQFeatures()
constructor and the aggregateFeatures()
function. The QFeatures vignette provides an extended example.
The QFeatures-filtering manual page demonstrates how to filter features based on their rowData.
The missing-data manual page to manage missing values in
QFeatures
objects.
The QFeatures-processing and aggregateFeatures()
manual pages
and Processing vignette describe common quantitative data
processing methods using in quantitative proteomics.
## ------------------------ ## An empty QFeatures object ## ------------------------ QFeatures() ## ----------------------------------- ## Creating a QFeatures object manually ## ----------------------------------- ## two assays (matrices) with matching column names m1 <- matrix(1:40, ncol = 4) m2 <- matrix(1:16, ncol = 4) sample_names <- paste0("S", 1:4) colnames(m1) <- colnames(m2) <- sample_names rownames(m1) <- letters[1:10] rownames(m2) <- letters[1:4] ## two corresponding feature metadata with appropriate row names df1 <- DataFrame(Fa = 1:10, Fb = letters[1:10], row.names = rownames(m1)) df2 <- DataFrame(row.names = rownames(m2)) (se1 <- SummarizedExperiment(m1, df1)) (se2 <- SummarizedExperiment(m2, df2)) ## Sample annotation (colData) cd <- DataFrame(Var1 = rnorm(4), Var2 = LETTERS[1:4], row.names = sample_names) el <- list(assay1 = se1, assay2 = se2) fts1 <- QFeatures(el, colData = cd) fts1 fts1[[1]] fts1[["assay1"]] ## Rename assay names(fts1) <- c("se1", "se2") ## Add an assay fts1 <- addAssay(fts1, se1[1:2, ], name = "se3") ## Get the assays feature metadata rowData(fts1) ## Keep only the Fa variable selectRowData(fts1, rowvars = "Fa") ## ----------------------------------- ## See ?readQFeatures to create a ## QFeatures object from a data.frame ## or spreadsheet. ## -----------------------------------
## ------------------------ ## An empty QFeatures object ## ------------------------ QFeatures() ## ----------------------------------- ## Creating a QFeatures object manually ## ----------------------------------- ## two assays (matrices) with matching column names m1 <- matrix(1:40, ncol = 4) m2 <- matrix(1:16, ncol = 4) sample_names <- paste0("S", 1:4) colnames(m1) <- colnames(m2) <- sample_names rownames(m1) <- letters[1:10] rownames(m2) <- letters[1:4] ## two corresponding feature metadata with appropriate row names df1 <- DataFrame(Fa = 1:10, Fb = letters[1:10], row.names = rownames(m1)) df2 <- DataFrame(row.names = rownames(m2)) (se1 <- SummarizedExperiment(m1, df1)) (se2 <- SummarizedExperiment(m2, df2)) ## Sample annotation (colData) cd <- DataFrame(Var1 = rnorm(4), Var2 = LETTERS[1:4], row.names = sample_names) el <- list(assay1 = se1, assay2 = se2) fts1 <- QFeatures(el, colData = cd) fts1 fts1[[1]] fts1[["assay1"]] ## Rename assay names(fts1) <- c("se1", "se2") ## Add an assay fts1 <- addAssay(fts1, se1[1:2, ], name = "se3") ## Get the assays feature metadata rowData(fts1) ## Keep only the Fa variable selectRowData(fts1, rowvars = "Fa") ## ----------------------------------- ## See ?readQFeatures to create a ## QFeatures object from a data.frame ## or spreadsheet. ## -----------------------------------
The filterFeatures
methods enables users to filter features
based on a variable in their rowData
. The features matching the
filter will be returned as a new object of class QFeatures
. The
filters can be provided as instances of class AnnotationFilter
(see below) or as formulas.
VariableFilter(field, value, condition = "==", not = FALSE) ## S4 method for signature 'QFeatures,AnnotationFilter' filterFeatures(object, filter, i, na.rm = FALSE, keep = FALSE, ...) ## S4 method for signature 'QFeatures,formula' filterFeatures(object, filter, i, na.rm = FALSE, keep = FALSE, ...)
VariableFilter(field, value, condition = "==", not = FALSE) ## S4 method for signature 'QFeatures,AnnotationFilter' filterFeatures(object, filter, i, na.rm = FALSE, keep = FALSE, ...) ## S4 method for signature 'QFeatures,formula' filterFeatures(object, filter, i, na.rm = FALSE, keep = FALSE, ...)
field |
|
value |
|
condition |
|
not |
|
object |
An instance of class QFeatures. |
filter |
Either an instance of class AnnotationFilter or a formula. |
i |
A numeric, logical or character vector pointing to the assay(s) to be filtered. |
na.rm |
|
keep |
|
... |
Additional parameters. Currently ignored. |
An filtered QFeature
object.
filterFeatures()
will go through each assay of the QFeatures
object and apply the filtering on the corresponding rowData
.
Features that do not pass the filter condition are removed from
the assay. In some cases, one may want to filter for a variable
present in some assay, but not in other. There are two options:
either provide keep = FALSE
to remove all features for those
assays (and thus leaving an empty assay), or provide keep = TRUE
to ignore filtering for those assays.
Because features in a QFeatures
object are linked between different
assays with AssayLinks
, the links are automatically updated.
However, note that the function doesn't propagate the filter to parent
assays. For example, suppose a peptide assay with 4 peptides is
linked to a protein assay with 2 proteins (2 peptides mapped per
protein) and you apply filterFeatures()
. All features pass the
filter except for one protein. The peptides mapped to that protein
will remain in the QFeatures
object. If propagation of the
filtering rules to parent assay is desired, you may want to use
x[i, , ]
instead (see the Subsetting section in ?QFeature
).
The variable filters are filters as defined in the
AnnotationFilter package. In addition to the pre-defined filter,
users can arbitrarily set a field on which to operate. These
arbitrary filters operate either on a character variables (as
CharacterVariableFilter
objects) or numerics (as
NumericVariableFilters
objects), which can be created with the
VariableFilter
constructor.
Laurent Gatto
The QFeatures man page for subsetting and the QFeatures
vignette provides an extended example.
## ---------------------------------------- ## Creating character and numberic ## variable filters ## ---------------------------------------- VariableFilter(field = "my_var", value = "value_to_keep", condition = "==") VariableFilter(field = "my_num_var", value = 0.05, condition = "<=") example(aggregateFeatures) ## ---------------------------------------------------------------- ## Filter all features that are associated to the Mitochondrion in ## the location feature variable. This variable is present in all ## assays. ## ---------------------------------------------------------------- ## using the forumla interface, exact mathc filterFeatures(feat1, ~ location == "Mitochondrion") ## using the forumula intefrace, martial match filterFeatures(feat1, ~startsWith(location, "Mito")) ## using a user-defined character filter filterFeatures(feat1, VariableFilter("location", "Mitochondrion")) ## using a user-defined character filter with partial match filterFeatures(feat1, VariableFilter("location", "Mito", "startsWith")) filterFeatures(feat1, VariableFilter("location", "itochon", "contains")) ## ---------------------------------------------------------------- ## Filter all features that aren't marked as unknown (sub-cellular ## location) in the feature variable ## ---------------------------------------------------------------- ## using a user-defined character filter filterFeatures(feat1, VariableFilter("location", "unknown", condition = "!=")) ## using the forumula interface filterFeatures(feat1, ~ location != "unknown") ## ---------------------------------------------------------------- ## Filter features that have a p-values lower or equal to 0.03 ## ---------------------------------------------------------------- ## using a user-defined numeric filter filterFeatures(feat1, VariableFilter("pval", 0.03, "<=")) ## using the formula interface filterFeatures(feat1, ~ pval <= 0.03) ## you can also remove all p-values that are NA (if any) filterFeatures(feat1, ~ !is.na(pval)) ## ---------------------------------------------------------------- ## Negative control - filtering for an non-existing markers value, ## returning empty results. ## ---------------------------------------------------------------- filterFeatures(feat1, VariableFilter("location", "not")) filterFeatures(feat1, ~ location == "not") ## ---------------------------------------------------------------- ## Filtering for a missing feature variable. The outcome is controled ## by keep ## ---------------------------------------------------------------- data(feat2) filterFeatures(feat2, ~ y < 0) filterFeatures(feat2, ~ y < 0, keep = TRUE) ## ---------------------------------------------------------------- ## Example with missing values ## ---------------------------------------------------------------- data(feat1) rowData(feat1[[1]])[1, "location"] <- NA rowData(feat1[[1]]) ## The row with the NA is not removed rowData(filterFeatures(feat1, ~ location == "Mitochondrion")[[1]]) rowData(filterFeatures(feat1, ~ location == "Mitochondrion", na.rm = FALSE)[[1]]) ## The row with the NA is removed rowData(filterFeatures(feat1, ~ location == "Mitochondrion", na.rm = TRUE)[[1]]) ## Note that in situations with missing values, it is possible to ## use the `%in%` operator or filter missing values out ## explicitly. rowData(filterFeatures(feat1, ~ location %in% "Mitochondrion")[[1]]) rowData(filterFeatures(feat1, ~ location %in% c(NA, "Mitochondrion"))[[1]]) ## Explicit handling filterFeatures(feat1, ~ !is.na(location) & location == "Mitochondrion") ## Using the pipe operator feat1 |> filterFeatures( ~ !is.na(location)) |> filterFeatures( ~ location == "Mitochondrion")
## ---------------------------------------- ## Creating character and numberic ## variable filters ## ---------------------------------------- VariableFilter(field = "my_var", value = "value_to_keep", condition = "==") VariableFilter(field = "my_num_var", value = 0.05, condition = "<=") example(aggregateFeatures) ## ---------------------------------------------------------------- ## Filter all features that are associated to the Mitochondrion in ## the location feature variable. This variable is present in all ## assays. ## ---------------------------------------------------------------- ## using the forumla interface, exact mathc filterFeatures(feat1, ~ location == "Mitochondrion") ## using the forumula intefrace, martial match filterFeatures(feat1, ~startsWith(location, "Mito")) ## using a user-defined character filter filterFeatures(feat1, VariableFilter("location", "Mitochondrion")) ## using a user-defined character filter with partial match filterFeatures(feat1, VariableFilter("location", "Mito", "startsWith")) filterFeatures(feat1, VariableFilter("location", "itochon", "contains")) ## ---------------------------------------------------------------- ## Filter all features that aren't marked as unknown (sub-cellular ## location) in the feature variable ## ---------------------------------------------------------------- ## using a user-defined character filter filterFeatures(feat1, VariableFilter("location", "unknown", condition = "!=")) ## using the forumula interface filterFeatures(feat1, ~ location != "unknown") ## ---------------------------------------------------------------- ## Filter features that have a p-values lower or equal to 0.03 ## ---------------------------------------------------------------- ## using a user-defined numeric filter filterFeatures(feat1, VariableFilter("pval", 0.03, "<=")) ## using the formula interface filterFeatures(feat1, ~ pval <= 0.03) ## you can also remove all p-values that are NA (if any) filterFeatures(feat1, ~ !is.na(pval)) ## ---------------------------------------------------------------- ## Negative control - filtering for an non-existing markers value, ## returning empty results. ## ---------------------------------------------------------------- filterFeatures(feat1, VariableFilter("location", "not")) filterFeatures(feat1, ~ location == "not") ## ---------------------------------------------------------------- ## Filtering for a missing feature variable. The outcome is controled ## by keep ## ---------------------------------------------------------------- data(feat2) filterFeatures(feat2, ~ y < 0) filterFeatures(feat2, ~ y < 0, keep = TRUE) ## ---------------------------------------------------------------- ## Example with missing values ## ---------------------------------------------------------------- data(feat1) rowData(feat1[[1]])[1, "location"] <- NA rowData(feat1[[1]]) ## The row with the NA is not removed rowData(filterFeatures(feat1, ~ location == "Mitochondrion")[[1]]) rowData(filterFeatures(feat1, ~ location == "Mitochondrion", na.rm = FALSE)[[1]]) ## The row with the NA is removed rowData(filterFeatures(feat1, ~ location == "Mitochondrion", na.rm = TRUE)[[1]]) ## Note that in situations with missing values, it is possible to ## use the `%in%` operator or filter missing values out ## explicitly. rowData(filterFeatures(feat1, ~ location %in% "Mitochondrion")[[1]]) rowData(filterFeatures(feat1, ~ location %in% c(NA, "Mitochondrion"))[[1]]) ## Explicit handling filterFeatures(feat1, ~ !is.na(location) & location == "Mitochondrion") ## Using the pipe operator feat1 |> filterFeatures( ~ !is.na(location)) |> filterFeatures( ~ location == "Mitochondrion")
This manual page describes common quantitative proteomics data
processing methods using QFeatures objects. In the following
functions, if object
is of class QFeatures
, and optional assay
index or name i
can be specified to define the assay (by name of
index) on which to operate.
The following functions are currently available:
logTransform(object, base = 2, i, pc = 0)
log-transforms (with
an optional pseudocount offset) the assay(s).
normalize(object, method, i)
normalises the assay(s) according
to method
(see Details).
scaleTransform(object, center = TRUE, scale = TRUE, i)
applies
base::scale()
to SummarizedExperiment
and QFeatures
objects.
sweep(x, MARGIN, STATS, FUN = "-", check.margin = TRUE, ...)
sweeps out array summaries from SummarizedExperiment
and
QFeatures
objects. See base::sweep()
for details.
See the Processing vignette for examples.
## S4 method for signature 'SummarizedExperiment' logTransform(object, base = 2, pc = 0) ## S4 method for signature 'QFeatures' logTransform(object, i, name = "logAssay", base = 2, pc = 0) ## S4 method for signature 'SummarizedExperiment' scaleTransform(object, center = TRUE, scale = TRUE) ## S4 method for signature 'QFeatures' scaleTransform(object, i, name = "scaledAssay", center = TRUE, scale = TRUE) ## S4 method for signature 'SummarizedExperiment' normalize(object, method, ...) ## S4 method for signature 'QFeatures' normalize(object, i, name = "normAssay", method, ...) ## S4 method for signature 'SummarizedExperiment' sweep(x, MARGIN, STATS, FUN = "-", check.margin = TRUE, ...) ## S4 method for signature 'QFeatures' sweep( x, MARGIN, STATS, FUN = "-", check.margin = TRUE, ..., i, name = "sweptAssay" )
## S4 method for signature 'SummarizedExperiment' logTransform(object, base = 2, pc = 0) ## S4 method for signature 'QFeatures' logTransform(object, i, name = "logAssay", base = 2, pc = 0) ## S4 method for signature 'SummarizedExperiment' scaleTransform(object, center = TRUE, scale = TRUE) ## S4 method for signature 'QFeatures' scaleTransform(object, i, name = "scaledAssay", center = TRUE, scale = TRUE) ## S4 method for signature 'SummarizedExperiment' normalize(object, method, ...) ## S4 method for signature 'QFeatures' normalize(object, i, name = "normAssay", method, ...) ## S4 method for signature 'SummarizedExperiment' sweep(x, MARGIN, STATS, FUN = "-", check.margin = TRUE, ...) ## S4 method for signature 'QFeatures' sweep( x, MARGIN, STATS, FUN = "-", check.margin = TRUE, ..., i, name = "sweptAssay" )
object |
An object of class |
base |
|
pc |
|
i |
A numeric vector or a character vector giving the index or the name, respectively, of the assay(s) to be processed. |
name |
A |
center |
|
scale |
|
method |
|
... |
Additional parameters passed to inner functions. |
x |
An object of class |
MARGIN |
As in |
STATS |
As in |
FUN |
As in |
check.margin |
As in |
The method
parameter in normalize
can be one of "sum"
,
"max"
, "center.mean"
, "center.median"
, "div.mean"
,
"div.median"
, "diff.median"
, "quantiles
", "quantiles.robust
"
or "vsn"
. The MsCoreUtils::normalizeMethods()
function returns
a vector of available normalisation methods.
For "sum"
and "max"
, each feature's intensity is divided by
the maximum or the sum of the feature respectively. These two
methods are applied along the features (rows).
"center.mean"
and "center.median"
center the respective
sample (column) intensities by subtracting the respective column
means or medians. "div.mean"
and "div.median"
divide by the
column means or medians. These are equivalent to sweep
ing the
column means (medians) along MARGIN = 2
with FUN = "-"
(for
"center.*"
) or FUN = "/"
(for "div.*"
).
"diff.median"
centers all samples (columns) so that they all
match the grand median by subtracting the respective columns
medians differences to the grand median.
Using "quantiles"
or "quantiles.robust"
applies (robust) quantile
normalisation, as implemented in preprocessCore::normalize.quantiles()
and preprocessCore::normalize.quantiles.robust()
. "vsn"
uses the
vsn::vsn2()
function. Note that the latter also glog-transforms the
intensities. See respective manuals for more details and function
arguments.
For further details and examples about normalisation, see
MsCoreUtils::normalize_matrix()
.
An processed object of the same class as x
or object
.
MsCoreUtils::normalizeMethods()
MsCoreUtils::normalizeMethods()
These functions convert tabular data into dedicated data
objets. The readSummarizedExperiment()
function takes a file
name or data.frame
and converts it into a
SummarizedExperiment()
object. The readQFeatures()
function
takes a data.frame
and converts it into a QFeatures
object
(see QFeatures()
for details). For the latter, two use-cases
exist:
The single-set case will generate a QFeatures
object with a
single SummarizedExperiment
containing all features of the
input table.
The multi-set case will generate a QFeatures
object containing
multiple SummarizedExperiment
s, resulting from splitting the
input table. This multi-set case is generally used when the
input table contains data from multiple runs/batches.
readSummarizedExperiment( assayData, quantCols = NULL, fnames = NULL, ecol = NULL, ... ) readQFeatures( assayData, colData = NULL, quantCols = NULL, runCol = NULL, name = "quants", removeEmptyCols = FALSE, verbose = TRUE, ecol = NULL, ... )
readSummarizedExperiment( assayData, quantCols = NULL, fnames = NULL, ecol = NULL, ... ) readQFeatures( assayData, colData = NULL, quantCols = NULL, runCol = NULL, name = "quants", removeEmptyCols = FALSE, verbose = TRUE, ecol = NULL, ... )
assayData |
A |
quantCols |
A |
fnames |
For the single- and multi-set cases, an optional
|
ecol |
Same as |
... |
Further arguments that can be passed on to |
colData |
A |
runCol |
For the multi-set case, a |
name |
For the single-set case, an optional |
removeEmptyCols |
A |
verbose |
A |
The single- and multi-set cases are defined by the quantCols
and
runCol
parameters, whether passed by the quantCols
and
runCol
vectors and/or the colData
data.frame
(see below).
The quantitative data variables are defined by the quantCols
.
The single-set case can be represented schematically as shown
below.
|------+----------------+-----------| | cols | quantCols 1..N | more cols | | . | ... | ... | | . | ... | ... | | . | ... | ... | |------+----------------+-----------|
Note that every quantCols
column contains data for a single
sample. The single-set case is defined by the absence of any
runCol
input (see next section). We here provide a
(non-exhaustive) list of typical data sets that fall under the
single-set case:
Peptide- or protein-level label-free data (bulk or single-cell).
Peptide- or protein-level multiplexed (e.g. TMT) data (bulk or single-cell).
PSM-level multiplexed data acquired in a single MS run (bulk or single-cell).
PSM-level data from fractionation experiments, where each fraction of the same sample was acquired with the same multiplexing label.
A run/batch variable, runCol
, is required to import multi-set
data. The multi-set case can be represented schematically as shown
below.
|--------+------+----------------+-----------| | runCol | cols | quantCols 1..N | more cols | | 1 | . | ... | ... | | 1 | . | ... | ... | |--------+------+----------------+-----------| | 2 | . | ... | ... | |--------+------+----------------+-----------| | . | . | ... | ... | |--------+------+----------------+-----------|
Every quantCols
column contains data for multiple samples
acquired in different runs. The multi-set case applies when
runCol
is provided, which will determine how the table is split
into multiple sets.
We here provide a (non-exhaustive) list of typical data sets that fall under the multi-set case:
PSM- or precursor-level multiplexed data acquired in multiple runs (bulk or single-cell)
PSM- or precursor-level label-free data acquired in multiple runs (bulk or single-cell)
DIA-NN data (see also readQFeaturesFromDIANN()
).
colData
We recommend providing sample annotations when creating a
QFeatures
object. The colData
is a table in which each row
corresponds to a sample and each column provides information about
the samples. There is no restriction on the number of columns and
on the type of data they should contain. However, we impose one or
two columns (depending on the use case) that allow to link the
annotations of each sample to its quantitative data:
Single-set case: the colData
must contain a column named
quantCols
that provides the names of the columns in
assayData
containing quantitative values for each sample (see
single-set cases in the examples).
Multi-set case: the colData
must contain a column named
quantCols
that provides the names of the columns in
assayData
with the quantitative values for each sample, and a
column named runCol
that provides the MS runs/batches in which
each sample has been acquired. The entries in
colData[["runCol"]]
are matched against the entries provided
by assayData[[runCol]]
.
When the quantCols
argument is not provided to
readQFeatures()
, the function will automatically determine the
quantCols
from colData[["quantCols"]]
. Therefore, quantCols
and colData
cannot be both missing.
Samples that are present in assayData
but absent
colData
will lead to a warning, and the missing entries will be
automatically added to the colData
and filled with NA
s.
When using the quantCols
and runCol
arguments only
(without colData
), the colData
contains zero
columns/variables.
An instance of class QFeatures
or
SummarizedExperiment::SummarizedExperiment()
. For the
former, the quantitative sets of each run are stored in
SummarizedExperiment::SummarizedExperiment()
object.
Laurent Gatto, Christophe Vanderaa
The QFeatures
(see QFeatures()
) class to read about how to
manipulate the resulting QFeatures
object.
The readQFeaturesFromDIANN()
function to import DIA-NN
quantitative data.
###################################### ## Single-set case. ## Load a data.frame with PSM-level data data(hlpsms) hlpsms[1:10, c(1, 2, 10:11, 14, 17)] ## Create a QFeatures object with a single psms set qf1 <- readQFeatures(hlpsms, quantCols = 1:10, name = "psms") qf1 colData(qf1) ###################################### ## Single-set case with colData. (coldat <- data.frame(var = rnorm(10), quantCols = names(hlpsms)[1:10])) qf2 <- readQFeatures(hlpsms, colData = coldat) qf2 colData(qf2) ###################################### ## Multi-set case. ## Let's simulate 3 different files/batches for that same input ## data.frame, and define a colData data.frame. hlpsms$file <- paste0("File", sample(1:3, nrow(hlpsms), replace = TRUE)) hlpsms[1:10, c(1, 2, 10:11, 14, 17, 29)] qf3 <- readQFeatures(hlpsms, quantCols = 1:10, runCol = "file") qf3 colData(qf3) ###################################### ## Multi-set case with colData. (coldat <- data.frame(runCol = rep(paste0("File", 1:3), each = 10), var = rnorm(10), quantCols = names(hlpsms)[1:10])) qf4 <- readQFeatures(hlpsms, colData = coldat, runCol = "file") qf4 colData(qf4)
###################################### ## Single-set case. ## Load a data.frame with PSM-level data data(hlpsms) hlpsms[1:10, c(1, 2, 10:11, 14, 17)] ## Create a QFeatures object with a single psms set qf1 <- readQFeatures(hlpsms, quantCols = 1:10, name = "psms") qf1 colData(qf1) ###################################### ## Single-set case with colData. (coldat <- data.frame(var = rnorm(10), quantCols = names(hlpsms)[1:10])) qf2 <- readQFeatures(hlpsms, colData = coldat) qf2 colData(qf2) ###################################### ## Multi-set case. ## Let's simulate 3 different files/batches for that same input ## data.frame, and define a colData data.frame. hlpsms$file <- paste0("File", sample(1:3, nrow(hlpsms), replace = TRUE)) hlpsms[1:10, c(1, 2, 10:11, 14, 17, 29)] qf3 <- readQFeatures(hlpsms, quantCols = 1:10, runCol = "file") qf3 colData(qf3) ###################################### ## Multi-set case with colData. (coldat <- data.frame(runCol = rep(paste0("File", 1:3), each = 10), var = rnorm(10), quantCols = names(hlpsms)[1:10])) qf4 <- readQFeatures(hlpsms, colData = coldat, runCol = "file") qf4 colData(qf4)
This function takes the Report.tsv
output files from DIA-NN and
converts them into a multi-set QFeatures
object. It is a wrapper
around readQFeatures()
with default parameters set to match
DIA-NN label-free and plexDIA report files: default runCol
is
"File.Name"
and default quantColsis
"Ms1.Area"'.
readQFeaturesFromDIANN( assayData, colData = NULL, quantCols = "Ms1.Area", runCol = "File.Name", multiplexing = c("none", "mTRAQ"), extractedData = NULL, ecol = NULL, verbose = TRUE, ... )
readQFeaturesFromDIANN( assayData, colData = NULL, quantCols = "Ms1.Area", runCol = "File.Name", multiplexing = c("none", "mTRAQ"), extractedData = NULL, ecol = NULL, verbose = TRUE, ... )
assayData |
A |
colData |
A |
quantCols |
A |
runCol |
For the multi-set case, a |
multiplexing |
A |
extractedData |
A |
ecol |
Same as |
verbose |
A |
... |
Further arguments passed to |
An instance of class QFeatures
. The quantiative data of
each acquisition run is stored in a separate set as a
SummarizedExperiment
object.
Laurent Gatto, Christophe Vanderaa
Derks, Jason, Andrew Leduc, Georg Wallmann, R. Gray Huffman, Matthew Willetts, Saad Khan, Harrison Specht, Markus Ralser, Vadim Demichev, and Nikolai Slavov. 2022. "Increasing the Throughput of Sensitive Proteomics by plexDIA." Nature Biotechnology, July. Link to article
The QFeatures
(see QFeatures()
) class to read about how to
manipulate the resulting QFeatures
object.
The readQFeatures()
function which this one depends on.
x <- read.delim(MsDataHub::benchmarkingDIA.tsv()) x[["File.Name"]] <- x[["Run"]] ################################# ## Label-free multi-set case ## using default arguments readQFeaturesFromDIANN(x) ## with a colData (and default arguments) cd <- data.frame(sampleInfo = LETTERS[1:24], quantCols = "Ms1.Area", runCol = unique(x[["File.Name"]])) readQFeaturesFromDIANN(x, colData = cd) ################################# ## mTRAQ multi-set case x2 <- read.delim(MsDataHub::Report.Derks2022.plexDIA.tsv()) x2[["File.Name"]] <- x2[["Run"]] readQFeaturesFromDIANN(x2, multiplexing = "mTRAQ")
x <- read.delim(MsDataHub::benchmarkingDIA.tsv()) x[["File.Name"]] <- x[["Run"]] ################################# ## Label-free multi-set case ## using default arguments readQFeaturesFromDIANN(x) ## with a colData (and default arguments) cd <- data.frame(sampleInfo = LETTERS[1:24], quantCols = "Ms1.Area", runCol = unique(x[["File.Name"]])) readQFeaturesFromDIANN(x, colData = cd) ################################# ## mTRAQ multi-set case x2 <- read.delim(MsDataHub::Report.Derks2022.plexDIA.tsv()) x2[["File.Name"]] <- x2[["Run"]] readQFeaturesFromDIANN(x2, multiplexing = "mTRAQ")
DataFrame
A long dataframe can be reduced by mergeing certain rows into a
single one. These new variables are constructed as a SimpleList
containing all the original values. Invariant columns, i.e columns
that have the same value along all the rows that need to be
merged, can be shrunk into a new variables containing that
invariant value (rather than in list columns). The grouping of
rows, i.e. the rows that need to be shrunk together as one, is
defined by a vector.
The opposite operation is expand. But note that for a
DataFrame
to be expanded back, it must not to be simplified.
reduceDataFrame(x, k, count = FALSE, simplify = TRUE, drop = FALSE) expandDataFrame(x, k = NULL)
reduceDataFrame(x, k, count = FALSE, simplify = TRUE, drop = FALSE) expandDataFrame(x, k = NULL)
x |
The |
k |
A ‘vector’ of length |
count |
|
simplify |
A |
drop |
A |
An expanded (reduced) DataFrame
.
Missing values do have an important effect on reduce
. Unless all
values to be reduces are missing, they will result in an
non-invariant column, and will be dropped with drop = TRUE
. See
the example below.
The presence of missing values can have side effects in higher
level functions that rely on reduction of DataFrame
objects.
Laurent Gatto
library("IRanges") k <- sample(100, 1e3, replace = TRUE) df <- DataFrame(k = k, x = round(rnorm(length(k)), 2), y = seq_len(length(k)), z = sample(LETTERS, length(k), replace = TRUE), ir = IRanges(seq_along(k), width = 10), r = Rle(sample(5, length(k), replace = TRUE)), invar = k + 1) df ## Shinks the DataFrame df2 <- reduceDataFrame(df, df$k) df2 ## With a tally of the number of members in each group reduceDataFrame(df, df$k, count = TRUE) ## Much faster, but more crowded result df3 <- reduceDataFrame(df, df$k, simplify = FALSE) df3 ## Drop all non-invariant columns reduceDataFrame(df, df$k, drop = TRUE) ## Missing values d <- DataFrame(k = rep(1:3, each = 3), x = letters[1:9], y = rep(letters[1:3], each = 3), y2 = rep(letters[1:3], each = 3)) d ## y is invariant and can be simplified reduceDataFrame(d, d$k) ## y isn't not dropped reduceDataFrame(d, d$k, drop = TRUE) ## BUT with a missing value d[1, "y"] <- NA d ## y isn't invariant/simplified anymore reduceDataFrame(d, d$k) ## y now gets dropped reduceDataFrame(d, d$k, drop = TRUE)
library("IRanges") k <- sample(100, 1e3, replace = TRUE) df <- DataFrame(k = k, x = round(rnorm(length(k)), 2), y = seq_len(length(k)), z = sample(LETTERS, length(k), replace = TRUE), ir = IRanges(seq_along(k), width = 10), r = Rle(sample(5, length(k), replace = TRUE)), invar = k + 1) df ## Shinks the DataFrame df2 <- reduceDataFrame(df, df$k) df2 ## With a tally of the number of members in each group reduceDataFrame(df, df$k, count = TRUE) ## Much faster, but more crowded result df3 <- reduceDataFrame(df, df$k, simplify = FALSE) df3 ## Drop all non-invariant columns reduceDataFrame(df, df$k, drop = TRUE) ## Missing values d <- DataFrame(k = rep(1:3, each = 3), x = letters[1:9], y = rep(letters[1:3], each = 3), y2 = rep(letters[1:3], each = 3)) d ## y is invariant and can be simplified reduceDataFrame(d, d$k) ## y isn't not dropped reduceDataFrame(d, d$k, drop = TRUE) ## BUT with a missing value d[1, "y"] <- NA d ## y isn't invariant/simplified anymore reduceDataFrame(d, d$k) ## y now gets dropped reduceDataFrame(d, d$k, drop = TRUE)
This function will find the assays and features that match directly (by name) or indirectly (through aggregation) the feature name.
The subsetByFeature
function will first identify the assay that
contains the feature(s) i
and filter the rows matching these
feature names exactly. It will then find, in the other assays, the
features that produces i
through aggregation with the
aggregateQFeatures
function.
See QFeatures for an example.
x |
An instance of class QFeatures. |
y |
A |
... |
Additional parameters. Ignored. |
An new instance of class QFeatures containing relevant assays and features.
example(aggregateFeatures) ## Retrieve protein 'ProtA' and its 2 peptides and 6 PSMs feat1["ProtA", , ]
example(aggregateFeatures) ## Retrieve protein 'ProtA' and its 2 peptides and 6 PSMs feat1["ProtA", , ]
A data frame is said to be folded when some cells contain
multiple elements. These are often encode as a semi-colon
separated character , such as "a;b"
. This function will
transform the data frame to that "a"
and "b"
are split and
recorded across two lines.
The simple example below illustrates a trivial case, where the table below
X | Y |
1 | a;b |
2 | c |
is unfolded based on the Y variable and becomes
X | Y |
1 | a |
1 | b |
2 | c |
where the value 1 of variable X is now duplicated.
If there is a second variable that follows the same pattern as the one used to unfold the table, it also gets unfolded.
X | Y | Z |
1 | a;b | x;y |
2 | c | z |
becomes
X | Y | Z |
1 | a | x |
1 | b | y |
2 | c | z |
because it is implied that the element in "a;b" are match to "x;y" by their respective indices. Note in the above example, unfolding by Y or Z produces the same result.
However, the following table unfolded by Y
X | Y | Z |
1 | a;b | x;y |
2 | c | x;y |
produces
X | Y | Z |
1 | a | x;y |
1 | b | x;y |
2 | c | x;y |
because "c" and "x;y" along the second row don't match. In this case, unfolding by Z would produce a different result. These examples are also illustrated below.
Note that there is no foldDataFrame()
function. See
reduceDataFrame()
and expandDataFrame()
to flexibly encode and
handle vectors of length > 1 within cells.
unfoldDataFrame(x, k, split = ";")
unfoldDataFrame(x, k, split = ";")
x |
A |
k |
|
split |
|
A new object unfolded object of class class(x)
with
numbers of rows >= nrow(x)
and columns identical to x
.
Laurent Gatto
(x0 <- DataFrame(X = 1:2, Y = c("a;b", "c"))) unfoldDataFrame(x0, "Y") (x1 <- DataFrame(X = 1:2, Y = c("a;b", "c"), Z = c("x;y", "z"))) unfoldDataFrame(x1, "Y") unfoldDataFrame(x1, "Z") ## same (x2 <- DataFrame(X = 1:2, Y = c("a;b", "c"), Z = c("x;y", "x;y"))) unfoldDataFrame(x2, "Y") unfoldDataFrame(x2, "Z") ## different
(x0 <- DataFrame(X = 1:2, Y = c("a;b", "c"))) unfoldDataFrame(x0, "Y") (x1 <- DataFrame(X = 1:2, Y = c("a;b", "c"), Z = c("x;y", "z"))) unfoldDataFrame(x1, "Y") unfoldDataFrame(x1, "Z") ## same (x2 <- DataFrame(X = 1:2, Y = c("a;b", "c"), Z = c("x;y", "x;y"))) unfoldDataFrame(x2, "Y") unfoldDataFrame(x2, "Z") ## different