| Title: | R API for the mzTab-M Reference Implementation |
|---|---|
| Description: | The RmzTabM package provides functions to read and write files in the HUPO-PSI mzTab-M file format. This package represents the reference implementation for mzTab-M import/export for R. Core, low level functions allow to format and parse the individual sections of the mzTab-M format. These functions are designed to be re-used by other packages implementing import/export of results in mzTab-M. |
| Authors: | Nils Hoffmann [ctb] (ORCID: <https://orcid.org/0000-0002-6540-6875>), Steffen Neumann [ctb] (ORCID: <https://orcid.org/0000-0002-7899-7192>), Johannes Rainer [aut, cre] (ORCID: <https://orcid.org/0000-0002-6977-7147>), Philippine Louail [aut] (ORCID: <https://orcid.org/0009-0007-5429-6846>, fnd: European Union HORIZON-MSCA-2021 project Grant No. 101073062: HUMAN – Harmonising and Unifying Blood Metabolic Analysis Networks), Gabriele Tomè [aut] (ORCID: <https://orcid.org/0000-0002-3976-6068>, fnd: MetaRbolomics4Galaxy project (CUP: D53C25001030003) co-funded by the Autonomous Province of Bolzano under the Joint Projects South Tyrol–Germany 2025 program.) |
| Maintainer: | Johannes Rainer <[email protected]> |
| License: | Artistic-2.0 |
| Version: | 0.97.13 |
| Built: | 2026-05-21 14:07:39 UTC |
| Source: | https://github.com/rformassspectrometry/RmzTabM |
Sets or updates contact-related metadata fields within an MTD (metadata) in a MTD section. When contact metadata already exists, the function can either replace it entirely or append new values to the existing ones.
setMtdContact( x = matrix(), name = character(), affiliation = character(), email = character(), orcid = character(), replace = FALSE ) getMtdContact(x = matrix())setMtdContact( x = matrix(), name = character(), affiliation = character(), email = character(), orcid = character(), replace = FALSE ) getMtdContact(x = matrix())
x |
A MTD section that stores metadata fields. Defaults to |
name |
|
affiliation |
|
email |
|
orcid |
|
replace |
|
For setMtdContact(): the input object x updated to include the new or
merged contact metadata fields. If x is empty, the empty x.
For getMtdContact(): returns the contact information.
Gabriele Tomè
x <- mtdSkeleton("001", software = "[MS, MS:1001582, xmcs, 4.0.0]") ## Add contact metadata to an existing mzTab object mtd <- setMtdContact(x, name = "Name Surname", affiliation = "PSI-MS", email = "[email protected]", orcid = "0000-0002-1825-0097") ## Replace all existing contact metadata mtd <- setMtdContact(mtd, name = "Name Surname", affiliation = "PSI-MS", email = "[email protected]", orcid = "0000-0002-1825-0097", replace = TRUE) getMtdContact(mtd)x <- mtdSkeleton("001", software = "[MS, MS:1001582, xmcs, 4.0.0]") ## Add contact metadata to an existing mzTab object mtd <- setMtdContact(x, name = "Name Surname", affiliation = "PSI-MS", email = "[email protected]", orcid = "0000-0002-1825-0097") ## Replace all existing contact metadata mtd <- setMtdContact(mtd, name = "Name Surname", affiliation = "PSI-MS", email = "[email protected]", orcid = "0000-0002-1825-0097", replace = TRUE) getMtdContact(mtd)
Sets or updates CV-related metadata fields within an MTD (metadata) section. When CV metadata already exists, the function can either replace it entirely or append new values to the existing ones.
setMtdCv( x = matrix(), label = character(), full_name = character(), version = character(), uri = character(), replace = FALSE ) getMtdCv(x = matrix())setMtdCv( x = matrix(), label = character(), full_name = character(), version = character(), uri = character(), replace = FALSE ) getMtdCv(x = matrix())
x |
A MTD section that stores metadata fields. Defaults to |
label |
|
full_name |
|
version |
|
uri |
|
replace |
|
For setMtdCv(): the input object x updated to include the new or
merged CV metadata fields. If x is empty, the empty x.
For getMtdCv(): returns the CV information.
Gabriele Tomè
x <- mtdSkeleton("001", software = "[MS, MS:1001582, xmcs, 4.0.0]") ## Add CV metadata to an existing mzTab object mtd <- setMtdCv(x, label = "MS", full_name = "PSI-MS controlled vocabulary", version = "4.1.11", uri = "https://purl.obolibrary.org/obo/ms.obo") ## Replace all existing CV metadata mtd <- setMtdCv(mtd, label = "MS", full_name = "PSI-MS controlled vocabulary", version = "4.1.11", uri = "https://purl.obolibrary.org/obo/ms.obo", replace = TRUE) ## Get CV infrmation getMtdCv(mtd)x <- mtdSkeleton("001", software = "[MS, MS:1001582, xmcs, 4.0.0]") ## Add CV metadata to an existing mzTab object mtd <- setMtdCv(x, label = "MS", full_name = "PSI-MS controlled vocabulary", version = "4.1.11", uri = "https://purl.obolibrary.org/obo/ms.obo") ## Replace all existing CV metadata mtd <- setMtdCv(mtd, label = "MS", full_name = "PSI-MS controlled vocabulary", version = "4.1.11", uri = "https://purl.obolibrary.org/obo/ms.obo", replace = TRUE) ## Get CV infrmation getMtdCv(mtd)
Sets or updates database-related metadata fields within an MTD (metadata) section. When database metadata already exists, the function can either replace it entirely or append new values to the existing ones.
setMtdDatabase( x = matrix(), name = character(), prefix = character(), version = character(), uri = character(), replace = FALSE ) getMtdDatabase(x = matrix())setMtdDatabase( x = matrix(), name = character(), prefix = character(), version = character(), uri = character(), replace = FALSE ) getMtdDatabase(x = matrix())
x |
A MTD section that stores metadata fields. Defaults to |
name |
|
prefix |
|
version |
|
uri |
|
replace |
|
For setMtdDatabase(): the input object x updated to include the new
or merged database metadata fields. If x is empty, the empty x.
For getMtdDatabase(): get the database metadata.
Gabriele Tomè
x <- mtdSkeleton("001", software = "[MS, MS:1001582, xmcs, 4.0.0]") ## Add database metadata to an existing mzTab object mtd <- setMtdDatabase(x, name = "[MIRIAM, MIR:00100079, HMDB, ]", prefix = "hmdb", version = "3.6", uri = "http://www.hmdb.ca/") ## Replace all existing database metadata mtd <- setMtdDatabase(mtd, name = "[MIRIAM, MIR:00100079, HMDB, ]", prefix = "hmdb", version = "3.6", uri = "http://www.hmdb.ca/", replace = TRUE) ## Get the database metadata getMtdDatabase(mtd)x <- mtdSkeleton("001", software = "[MS, MS:1001582, xmcs, 4.0.0]") ## Add database metadata to an existing mzTab object mtd <- setMtdDatabase(x, name = "[MIRIAM, MIR:00100079, HMDB, ]", prefix = "hmdb", version = "3.6", uri = "http://www.hmdb.ca/") ## Replace all existing database metadata mtd <- setMtdDatabase(mtd, name = "[MIRIAM, MIR:00100079, HMDB, ]", prefix = "hmdb", version = "3.6", uri = "http://www.hmdb.ca/", replace = TRUE) ## Get the database metadata getMtdDatabase(mtd)
The metadata section/table of the mzTab-M definition is comprehensive, but
also tricky to define. The RmzTabM package provides a variety of utility
functions that help defining this information. These might be re-used for
software package developers to export metabolomics results from their
respective software. Importantly, the helper functions listed here only
define the core elements for the MTD section, helping with re-arranging and
reformatting information available e.g. in data.frame format into the
respective fields in the MTD section. Additional (optional) fields might
need to be added manually depending on availability for an experiment.
See also the specification of the MTD section for details and more information, in particular on the format of the mzTab-M and on mandatory or optional fields.
Generally, MTD data can be categorized into the following parts:
Core information: general information on the experiment. A minimal
set can be created using the mtdSkeleton() function, which might be
further expanded with additional fields. This section allows to describe
the general experimental setup. Also, it should contain references to
all controlled vocabulary (CV) ontologies used and refered to in the
mzTab-M file.
Sample information: optional information on individual samples that were
measured with the various assays/runs. The mtdSample() function
assists in compiling the information for this section.
MS run information: information on the individual MS runs
(measurements of the samples). Each data file is one run. Use the
mtdMsRun() function to define this part of the metadata section.
Assay information: the mtdAssay() function assists in compiling the
assay section of the metadata. Mandatory fields are the name (ID) of the
assay and the reference to the MS run in which the assay was measured.
Optional information on sample reference, external links or custom
information can be provided too. In most cases (except multiplexed assays
or pre-fractionated samples) one assay will link to one MS run. Each assay
must represent one column in the following abundance matrix sections.
Study variable information: the mtdStudyVariables() function allows
to format study variable information from an experiment into the mzTab-M
format. All study variables need to be assigned to at least one assay and
must also be reported in the subsequent abundance matrices.
The helper function listed above can be used sequentially to create the metadata information. See the examples below for a general approach how to define the MTD section of an experiment.
In addition, various helper functions are available to assist in MTD data generation:
mtdSort(): to sort the MTD matrix into the expected order.
mtdFields(): helps formatting values into the mzTab-M-specific format.
setMtdInstrument()/getMtdInstrument(): set/get Instrument Metadata in
an MTD section.
setMtdDatabase()/getMtdDatabase(): set/get Database Metadata in an MTD
section.
setMtdCv()/getMtdCv(): set/get Controlled Vocabularies (CV) Metadata
in an MTD section.
setMtdContact()/getMtdContact(): set/get contact Metadata in an MTD
section.
setMtdField()/getMtdField(): set/get a Metadata Field in an MTD
section.
The general relationship between ms_run, assay and sample:
one ms_run is the measurement of one assay. In most use cases one ms_run is associated with one assay.
one assay can be measured by several MS runs (if fractionated) or multiple assays can be measured in the same MS run (if multiplexed).
one assay is (generally) one sample, but the same sample can be measured with multiple assays (i.e., technical replicates).
Philippine Louail, Johannes Rainer
SMF-export and SML-export for creating and formatting the small molecule feature (SMF) and small molecule (SML) sections.
setMtdInstrument(), MTD-database, MTD-CV, MTD-contact and
MTD-field.
## Building the mzTab-M metadata information from a `data.frame` with sample ## information of an experiment. Each row in that `data.frame` is one ## measurement of one sample (i.e., represents one *ms_run*). Columns in ## that `data.frame` provide the phenotypic and experimental variables of ## each sample. The example below represents a simple experiment in which ## 3 samples (e.g. cell lines) were measured. Columns *time_point and ## *cell_count* provide the time point (in hours) when samples were ## collected and the number of cells per sample. In addition, one ## sample has the genotype *WT* and two *KO*. Column `"operator"` contains ## the initials of the researcher extracting the samples exp <- data.frame( sample_name = c("S1_T1", "S1_T2", "S2_T1", "S2_T2", "S3_T1", "S3_T2"), sample_id = c("S1", "S1", "S2", "S2", "S3", "S3"), time_point = c(0, 6, 0, 6, 0, 6), cell_count = c(13000, 8700, 10100, 6000, 14000, 4500), genotype = c("WT", "WT", "KO", "KO", "KO", "KO"), operator = c("BB", "BB", "BB", "BB", "FB", "FB"), file_name = c("s1-t1.mzML", "s1-t2.mzML", "s2-t1.mzML", "s2-t2.mzML", "s3-t1.mzML", "s3-t2.mzML") ) exp ############################################################################# ## Core metadata information ## We first compile the general metadata information. For the present ## example we assume that we performed only preprocessing of the raw MS ## data using *xcms*, thus we don't specify annotation databases used for ## the compound identification/annotation. These could be provided through ## the `database*` parameters. Also, the quantification method and unit(s) ## could be specified using respective parameters of the function. mtd <- mtdSkeleton( id = "EXP_001", software = "[MS, MS:1001582], xcms, 4.0.0") mtd ## We can add additional content to this *skeleton* MTD information with ## any of the `setMtd*` functions. ## We next add a title and description for the experiment. mtd <- setMtdField(mtd, field = "title", value = "Experiment 1 preprocessed data") mtd <- setMtdField(mtd, field = "description", value = "The preprocessed data of experiment 1 samples.") ## We also add information on the MS instrumentation used mtd <- setMtdInstrument(mtd, name = "[MS, MS:1000449, LTQ Orbitrap,]", source = "[MS, MS:1000073, ESI,]", analyzer = c("analyzer[1]" = "[MS, MS:1000291, linear ion trap,]"), detector = "[MS, MS:1000253, electron multiplier,]", ) ## We can also add contact information mtd <- setMtdContact(mtd, name = c("frodo", "sauron"), affiliation = c("fellowship of the ring", "the dark side"), email = c("[email protected]", "[email protected]"), orcid = c("0000-0001-2345-6789", "0000-0001-2345-678X")) mtd ## Other information, such as employed sample processing methods could be ## added in a similar way. ############################################################################# ## Sample information ## We next add sample information to the metadata. In addition to the ## specific sample properties that can be defined using the function's ## parameters, arbitrary custom fields can be defined too. Below we add ## information on sample extraction as custom information. mtd_s <- mtdSample( sample = unique(exp$sample_id), species = "[NCBITaxon, NCBITaxon:9606, Homo sapiens, ]", tissue = "[BTO, BTO:0000759, liver, ]", cell_type = "[CL, CL:0000182, hepatocyte, ]", c("[,,Extraction date, 2011-12-21]", "[,,Extraction date, 2011-12-22]", "[,,Extraction date, 2011-12-23]") ) mtd_s mtd <- rbind(mtd, mtd_s) ############################################################################# ## MS run information ## The MS run information should capture information of each individual ## measurement run on an MS instrument. For this, the original data file ## names and location should be provided as well as the format of the ## data files as well as polarity etc. mtd_msr <- mtdMsRun( location = exp$file_name, format = "[MS, MS:1000584, mzML file, ]", id_format = "[MS, MS:1000530, mzML unique identifier, ]", scan_polarity = "positive") mtd <- rbind(mtd, mtd_msr) ############################################################################# ## Assay information ## Each measurement should be associated to (at least) one assay. For our ## simple example, each row in the `data.frame` represents one assay, with ## each assay being measured in one MS run. a <- mtdAssay( assay = exp$sample_name, sample_ref = c("sample[1]", "sample[1]", "sample[2]", "sample[2]", "sample[3]", "sample[3]"), ms_run_ref = paste0("ms_run[", seq_len(nrow(exp)), "]") ) a mtd <- rbind(mtd, a) ############################################################################# ## Study variable information ## Study variables can be defined directly from the experiment `data.frame`. ## In our example we use the columns (information on) `"time_point"`, ## `"genotype"`, `"cell_count"` and `"operator"`. Importantly, the row-order ## of the provided `data.frame` has to match the order of the assays (and ## MS runs). svar <- mtdStudyVariables( exp, groups = c("time_point", "genotype", "cell_count", "operator")) svar ## The individual columns got encoded to a `"study_variable_group"` with a ## defined (statistical) type and a datatype. Both are inferred ## automatically from the columns of the provided `data.frame`, but could ## also be manually provided through the respective function arguments. mtd <- rbind(mtd, svar) ## Finally, the `mtdSort()` function can be used to sort the generated ## two-column matrix in the expected order. mtd <- mtdSort(mtd) ## This metadata information can next be exported manually, or using the ## dedicated export helper functions to an mzTab-M file.## Building the mzTab-M metadata information from a `data.frame` with sample ## information of an experiment. Each row in that `data.frame` is one ## measurement of one sample (i.e., represents one *ms_run*). Columns in ## that `data.frame` provide the phenotypic and experimental variables of ## each sample. The example below represents a simple experiment in which ## 3 samples (e.g. cell lines) were measured. Columns *time_point and ## *cell_count* provide the time point (in hours) when samples were ## collected and the number of cells per sample. In addition, one ## sample has the genotype *WT* and two *KO*. Column `"operator"` contains ## the initials of the researcher extracting the samples exp <- data.frame( sample_name = c("S1_T1", "S1_T2", "S2_T1", "S2_T2", "S3_T1", "S3_T2"), sample_id = c("S1", "S1", "S2", "S2", "S3", "S3"), time_point = c(0, 6, 0, 6, 0, 6), cell_count = c(13000, 8700, 10100, 6000, 14000, 4500), genotype = c("WT", "WT", "KO", "KO", "KO", "KO"), operator = c("BB", "BB", "BB", "BB", "FB", "FB"), file_name = c("s1-t1.mzML", "s1-t2.mzML", "s2-t1.mzML", "s2-t2.mzML", "s3-t1.mzML", "s3-t2.mzML") ) exp ############################################################################# ## Core metadata information ## We first compile the general metadata information. For the present ## example we assume that we performed only preprocessing of the raw MS ## data using *xcms*, thus we don't specify annotation databases used for ## the compound identification/annotation. These could be provided through ## the `database*` parameters. Also, the quantification method and unit(s) ## could be specified using respective parameters of the function. mtd <- mtdSkeleton( id = "EXP_001", software = "[MS, MS:1001582], xcms, 4.0.0") mtd ## We can add additional content to this *skeleton* MTD information with ## any of the `setMtd*` functions. ## We next add a title and description for the experiment. mtd <- setMtdField(mtd, field = "title", value = "Experiment 1 preprocessed data") mtd <- setMtdField(mtd, field = "description", value = "The preprocessed data of experiment 1 samples.") ## We also add information on the MS instrumentation used mtd <- setMtdInstrument(mtd, name = "[MS, MS:1000449, LTQ Orbitrap,]", source = "[MS, MS:1000073, ESI,]", analyzer = c("analyzer[1]" = "[MS, MS:1000291, linear ion trap,]"), detector = "[MS, MS:1000253, electron multiplier,]", ) ## We can also add contact information mtd <- setMtdContact(mtd, name = c("frodo", "sauron"), affiliation = c("fellowship of the ring", "the dark side"), email = c("[email protected]", "[email protected]"), orcid = c("0000-0001-2345-6789", "0000-0001-2345-678X")) mtd ## Other information, such as employed sample processing methods could be ## added in a similar way. ############################################################################# ## Sample information ## We next add sample information to the metadata. In addition to the ## specific sample properties that can be defined using the function's ## parameters, arbitrary custom fields can be defined too. Below we add ## information on sample extraction as custom information. mtd_s <- mtdSample( sample = unique(exp$sample_id), species = "[NCBITaxon, NCBITaxon:9606, Homo sapiens, ]", tissue = "[BTO, BTO:0000759, liver, ]", cell_type = "[CL, CL:0000182, hepatocyte, ]", c("[,,Extraction date, 2011-12-21]", "[,,Extraction date, 2011-12-22]", "[,,Extraction date, 2011-12-23]") ) mtd_s mtd <- rbind(mtd, mtd_s) ############################################################################# ## MS run information ## The MS run information should capture information of each individual ## measurement run on an MS instrument. For this, the original data file ## names and location should be provided as well as the format of the ## data files as well as polarity etc. mtd_msr <- mtdMsRun( location = exp$file_name, format = "[MS, MS:1000584, mzML file, ]", id_format = "[MS, MS:1000530, mzML unique identifier, ]", scan_polarity = "positive") mtd <- rbind(mtd, mtd_msr) ############################################################################# ## Assay information ## Each measurement should be associated to (at least) one assay. For our ## simple example, each row in the `data.frame` represents one assay, with ## each assay being measured in one MS run. a <- mtdAssay( assay = exp$sample_name, sample_ref = c("sample[1]", "sample[1]", "sample[2]", "sample[2]", "sample[3]", "sample[3]"), ms_run_ref = paste0("ms_run[", seq_len(nrow(exp)), "]") ) a mtd <- rbind(mtd, a) ############################################################################# ## Study variable information ## Study variables can be defined directly from the experiment `data.frame`. ## In our example we use the columns (information on) `"time_point"`, ## `"genotype"`, `"cell_count"` and `"operator"`. Importantly, the row-order ## of the provided `data.frame` has to match the order of the assays (and ## MS runs). svar <- mtdStudyVariables( exp, groups = c("time_point", "genotype", "cell_count", "operator")) svar ## The individual columns got encoded to a `"study_variable_group"` with a ## defined (statistical) type and a datatype. Both are inferred ## automatically from the columns of the provided `data.frame`, but could ## also be manually provided through the respective function arguments. mtd <- rbind(mtd, svar) ## Finally, the `mtdSort()` function can be used to sort the generated ## two-column matrix in the expected order. mtd <- mtdSort(mtd) ## This metadata information can next be exported manually, or using the ## dedicated export helper functions to an mzTab-M file.
Sets or updates a generic metadata field within an MTD (metadata) section. When the field already exists, the function can either replace it entirely or append new values to the existing ones.
setMtdField( x = matrix(), field = character(), value = character(), replace = FALSE ) getMtdField(x = matrix(), field = character())setMtdField( x = matrix(), field = character(), value = character(), replace = FALSE ) getMtdField(x = matrix(), field = character())
x |
A MTD section that stores metadata fields. Defaults to |
field |
|
value |
|
replace |
|
For setMtdField(): the input object x updated to include the new or
merged field metadata. If x is empty, the empty x.
For getMtdField(): character() with the requested metadata.
Gabriele Tomè
x <- mtdSkeleton("001", software = "[MS, MS:1001582, xcms, 4.0.0]") ## Add a metadata field to an existing mzTab object mtd <- setMtdField(x, field = "publication", value = "pubmed:21063943|doi:10.1007/978-1-60761-987-1_6") ## Replace all existing values for a metadata field mtd <- setMtdField(mtd, field = "custom", value = "[,,MS operator, Florian]", replace = TRUE) getMtdField(mtd, field = "mzTab-ID")x <- mtdSkeleton("001", software = "[MS, MS:1001582, xcms, 4.0.0]") ## Add a metadata field to an existing mzTab object mtd <- setMtdField(x, field = "publication", value = "pubmed:21063943|doi:10.1007/978-1-60761-987-1_6") ## Replace all existing values for a metadata field mtd <- setMtdField(mtd, field = "custom", value = "[,,MS operator, Florian]", replace = TRUE) getMtdField(mtd, field = "mzTab-ID")
The mtdAssay() function assists in compiling the assay information of
the metadata section. Each assay must be associated with at least one
entry of the ms_run section (see mtdMsRun()). This mapping can be
defined with the ms_run_ref parameter by providing the ID/name of the
run (e.g. "ms_run[1]").
Important: to support the optional additional parameters passed along
with ... all parameters (such as assay, sample_ref etc) have to
be fully spelled out.
For details and expected input for the various parameter it is strongly suggested to consult the mzTab-M documentation.
mtdAssay( ..., assay = character(), external_uri = character(), sample_ref = character(), ms_run_ref = character(), protocol_ref = character(), parameters = character() )mtdAssay( ..., assay = character(), external_uri = character(), sample_ref = character(), ms_run_ref = character(), protocol_ref = character(), parameters = character() )
... |
optional additional (custom) parameters for each assay. If
provided, the length of the |
assay |
|
external_uri |
optional |
sample_ref |
optional |
ms_run_ref |
|
protocol_ref |
optional |
parameters |
optional |
two-column character matrix with the content for the assay
metadata section.
Johannes Rainer
MTD-export for other functions defining metadata information
## Minimal example with assay and ms_run_ref defined, each assay assigned ## to its own MS run. Note that for **all** parameters the **full** ## parameter name has to be used (e.g., `assay = `). mtdAssay(assay = c("a1", "a2", "a3"), ms_run_ref = c("ms_run[1]", "ms_run[2]", "ms_run[3]")) ## Example for a multiplexed assay. mtdAssay(assay = c("a1", "a2", "a3"), ms_run_ref = c("ms_run[1]", "ms_run[1]", "ms_run[1]")) ## Example for a pre-fractionated samples mtdAssay(assay = c("a1", "a2", "a3"), ms_run_ref = list(c("ms_run[1]", "ms_run[2]"), c("ms_run[3]", "ms_run[4]"), c("ms_run[5]", "ms_run[6]"))) ## Example adding also sample reference and an external_uri mtdAssay( assay = c("a1", "a2", "a3"), external_uri = "https://www.ebi.ac.uk/metabolights/MTBLS517/files/i_Investigation.txt", sample_ref = c("sample[1]", "sample[1]", "sample[2]"), ms_run_ref = c("ms_run[1]", "ms_run[2]", "ms_run[3]")) ## Example adding also protocol reference mtdAssay( assay = c("a1", "a2", "a3"), external_uri = "https://www.ebi.ac.uk/metabolights/MTBLS517/files/i_Investigation.txt", sample_ref = c("sample[1]", "sample[1]", "sample[2]"), ms_run_ref = c("ms_run[1]", "ms_run[2]", "ms_run[3]"), protocol_ref = c("protocol[1]", "protocol[1]", "protocol[1]|protocol[2]")) ## Providing additional, custom information for each assay. These can be ## passed as `character` vectors (same length than `assay`!). mtdAssay(assay = c("a1", "a2", "a3"), ms_run_ref = c("ms_run[1]", "ms_run[2]", "ms_run[3]"), c("[MS, , Assay operator, Fred Blogs]", "[MS, , Assay operator, Fred Blogs]", "[MS, , Assay operator, Frodo]"))## Minimal example with assay and ms_run_ref defined, each assay assigned ## to its own MS run. Note that for **all** parameters the **full** ## parameter name has to be used (e.g., `assay = `). mtdAssay(assay = c("a1", "a2", "a3"), ms_run_ref = c("ms_run[1]", "ms_run[2]", "ms_run[3]")) ## Example for a multiplexed assay. mtdAssay(assay = c("a1", "a2", "a3"), ms_run_ref = c("ms_run[1]", "ms_run[1]", "ms_run[1]")) ## Example for a pre-fractionated samples mtdAssay(assay = c("a1", "a2", "a3"), ms_run_ref = list(c("ms_run[1]", "ms_run[2]"), c("ms_run[3]", "ms_run[4]"), c("ms_run[5]", "ms_run[6]"))) ## Example adding also sample reference and an external_uri mtdAssay( assay = c("a1", "a2", "a3"), external_uri = "https://www.ebi.ac.uk/metabolights/MTBLS517/files/i_Investigation.txt", sample_ref = c("sample[1]", "sample[1]", "sample[2]"), ms_run_ref = c("ms_run[1]", "ms_run[2]", "ms_run[3]")) ## Example adding also protocol reference mtdAssay( assay = c("a1", "a2", "a3"), external_uri = "https://www.ebi.ac.uk/metabolights/MTBLS517/files/i_Investigation.txt", sample_ref = c("sample[1]", "sample[1]", "sample[2]"), ms_run_ref = c("ms_run[1]", "ms_run[2]", "ms_run[3]"), protocol_ref = c("protocol[1]", "protocol[1]", "protocol[1]|protocol[2]")) ## Providing additional, custom information for each assay. These can be ## passed as `character` vectors (same length than `assay`!). mtdAssay(assay = c("a1", "a2", "a3"), ms_run_ref = c("ms_run[1]", "ms_run[2]", "ms_run[3]"), c("[MS, , Assay operator, Fred Blogs]", "[MS, , Assay operator, Fred Blogs]", "[MS, , Assay operator, Frodo]"))
This function assists in creating and formatting information for the mzTab-M metadata section (MTD). It combines and formats the provided input values for a specific field.
See mzTab-M documentation for more information, examples and expected format.
mtdFields(..., field_prefix = "")mtdFields(..., field_prefix = "")
... |
|
field_prefix |
|
two column character matrix with the formatted elements.
Johannes Rainer, Philippine Louail
MTD-export for other functions defining metadata information
## Define the CV element with 3 CV terms: mtdFields( label = c("a", "b", "c"), full_name = c("A", "B", "C"), version = c(1, 2, 3), uri = c("u1", "u2", "u3"), field_prefix = "cv") ## Define a single software: mtdFields("[MS, MS:1002879, Progenesis QI, 3.0]", field_prefix = "software") ## Define two softwares: mtdFields(c("[MS, MS:1002879, Progenesis QI, 3.0]", "[a, b, c, d]"), field_prefix = "software") ## Define a software with the optional setting mtdFields(c("[MS, MS:1002879, Progenesis QI, 3.0]", "[a, b, c, d]"), `setting[1]` = c("my cool settings", "none"), `setting[2]` = c("other setting", "none"), field_prefix = "software") ## Define database fields mtdFields( c("[MITIAM, MRI:00100079, HMDB, ]", "[,, de novo, ]"), prefix = c("hmdb", "dn"), version = c("3.6", "Unknown"), uri = c("http://www.hmdb.ca", "null"), field_prefix = "database" )## Define the CV element with 3 CV terms: mtdFields( label = c("a", "b", "c"), full_name = c("A", "B", "C"), version = c(1, 2, 3), uri = c("u1", "u2", "u3"), field_prefix = "cv") ## Define a single software: mtdFields("[MS, MS:1002879, Progenesis QI, 3.0]", field_prefix = "software") ## Define two softwares: mtdFields(c("[MS, MS:1002879, Progenesis QI, 3.0]", "[a, b, c, d]"), field_prefix = "software") ## Define a software with the optional setting mtdFields(c("[MS, MS:1002879, Progenesis QI, 3.0]", "[a, b, c, d]"), `setting[1]` = c("my cool settings", "none"), `setting[2]` = c("other setting", "none"), field_prefix = "software") ## Define database fields mtdFields( c("[MITIAM, MRI:00100079, HMDB, ]", "[,, de novo, ]"), prefix = c("hmdb", "dn"), version = c("3.6", "Unknown"), uri = c("http://www.hmdb.ca", "null"), field_prefix = "database" )
mtdFromSampleData() compiles the sample, ms_run, assay and
study variable content of the MTD section from a sample data
data.frame. Each row of this data.frame (parameter x) is expected to
represent on MS run (i.e., acquisition of a sample) of an experimen with
columns containing information on the individual MS run and the sample(s).
The columns providing information on the MS run can be specified with
parameter msRunCols. Mandatory columns are location and scan_polarity.
The columns providing information on the individual assays can be defined
with parameter assayCols. In most cases one assay represents one MS run.
The columns from which sample information can be retrieved can be defined
with sampleCols. See notes below for more information.
The columns containing experimental/phenotype information can be defined
with parameter groups.
Arguments defining columns for MS runs, assays and samples are grouped with
parameters msRunCols., assayCols. and sampleCols,. For each a helper
function is available providing defaults and assisting in defining the
arguments. See section Defining columns with information on MS runs, assays
and samples
The MTD information is compiled from the sample data x as follows:
MS runs: each row in x is added as one MS run. The column in x with
the respective file name needs to be defined through parameter
msRunCols..
samples: the unique set of rows based on the column(s) defined with
sampleCols. are added as samples.
assay: each unique element in the column defined through assayCols.
(assay) is added as one assay referencing to samples and MS runs.
In most cases the number of MS runs will match the number of assays, i.e.,
each row in x is one MS run as well as one assay. A 1:n mapping between
assay and MS run is also possible.
study variables: each column defined with parameter groups is added as
one study variable group.
mtdFromSampleData( x, sampleCols. = sampleCols(), msRunCols. = msRunCols(), assayCols. = assayCols(), groups = character(), group_description = character(), group_type = character(), group_datatype = character(), group_unit = character(), average_function = "[MS, MS:1002962, mean, ]", variation_function = "[MS, MS:1002963, variation coefficient, ]", description = character() ) sampleCols( sample = "sample", species = "species", tissue = "tissue", cell_type = "cell_type", disease = "disease", description = "description", ... ) msRunCols( location = "location", instrument_ref = "instrument_ref", format = "format", id_format = "id_format", fragmentation_method = "fragmentation_method", scan_polarity = "scan_polarity", hash = "hash", hash_method = "hash_method" ) assayCols( assay = "assay", external_uri = "external_uri", sample_ref = "sample_ref", ms_run_ref = "ms_run_ref", ... )mtdFromSampleData( x, sampleCols. = sampleCols(), msRunCols. = msRunCols(), assayCols. = assayCols(), groups = character(), group_description = character(), group_type = character(), group_datatype = character(), group_unit = character(), average_function = "[MS, MS:1002962, mean, ]", variation_function = "[MS, MS:1002963, variation coefficient, ]", description = character() ) sampleCols( sample = "sample", species = "species", tissue = "tissue", cell_type = "cell_type", disease = "disease", description = "description", ... ) msRunCols( location = "location", instrument_ref = "instrument_ref", format = "format", id_format = "id_format", fragmentation_method = "fragmentation_method", scan_polarity = "scan_polarity", hash = "hash", hash_method = "hash_method" ) assayCols( assay = "assay", external_uri = "external_uri", sample_ref = "sample_ref", ms_run_ref = "ms_run_ref", ... )
x |
|
sampleCols. |
named |
msRunCols. |
named |
assayCols. |
named |
groups |
|
group_description |
|
group_type |
|
group_datatype |
optional |
group_unit |
optional |
average_function |
optional |
variation_function |
optional |
description |
|
sample |
|
species |
|
tissue |
|
cell_type |
|
disease |
|
... |
named |
location |
|
instrument_ref |
(optional) |
format |
(optional) |
id_format |
(optional) |
fragmentation_method |
(optional) |
scan_polarity |
|
hash |
(optional) |
hash_method |
(optional) |
assay |
|
external_uri |
optional |
sample_ref |
optional |
ms_run_ref |
|
two column character matrix.
msRunCols(): function to create a named character defining the columns
in x containinig information for the individual parameters. To change
the default for the column containing the MS data file names (default
location = "location") to a column called e.g. "mzml_file":
msRunCols(location = "mzml_file"). Parameters location and
scan_polarity have to be set to match the column names in x with the
respective information. See examples for more information.
assayCols(): function to define columns containing information on assays.
Parameter assay has to be adapted. Through ... it is also possible
to define additional columns with optional information on each assay.
sampleCols(): function to define columns containing sample information.
Through ... additional optional columns with sample information can be
provided.
The SML and SMF sections must report one column for each assay. Thus,
each row in the input sample data x is expected to be one assay.
In mzTab-M a sample is the source of an biological sample. If in an
experiment e.g. multiple blood samples are taken at different time points
from the same individual, the individual is considered a single sample.
Information on the time points can be provided as study variables using the
groups parameter.
Johannes Rainer
## Defining an example sample data: ## - file: the mzML file, i.e., the *MS run*. ## - name: the name of the measurement. This is also the name of the sample: ## QC is the pool of all samples, s1 to s4 the ID of the individual. ## - phenotype: defining the biological replicates, 2 for CVD, 2 for CTR. ## - age: covariate, age of the individuals. ## - injection_index: the order in which samples were measured. sdata <- data.frame( file = c("1.mzML", "2.mzML", "3.mzML", "4.mzML", "5.mzML", "6.mzML"), name = c("QC", "s1", "s2", "QC", "s3", "s4"), phenotype = c(NA, "CVD", "CTR", NA, "CTR", "CVD"), age = c(NA, 35, 32, NA, 43, 32), injection_index = c(1, 2, 3, 4, 5, 6)) ## Add additional required columns: sdata$polarity <- "positive" ## Add columns with optional, additional information to the individual ## samples or assays. sdata$organism <- "[NCBITaxon, NCBITaxon:9606, Homo sapiens, ]" sdata$assay_info <- c("run1", "run2", "run3", "run4", "run5", "run6") ## Define the columns in `sdata` that provide information on the individual ## samples. scols <- sampleCols(sample = "name", species = "organism") ## Define the columns in `sdata` that provide MS run information mscols <- msRunCols(location = "file", scan_polarity = "polarity") ## Define the columns in `sdata` that provide assay information; we use ## the MS run/file name also for the assay name and add an additional ## column with optional content/information. acols <- assayCols(assay = "file", assay_info = "assay_info") ## Create the MTD section from the `sdata` `data.frame`. Parameter `groups` ## allows to define the columns in `sdata` that should be encoded as ## *study variable groups*. m <- mtdFromSampleData(sdata, sampleCols = scols, msRunCols = mscols, assayCols = acols, groups = c("phenotype", "age", "injection_index")) ## The sample to assay mapping: ## The repeated injection (assay 1 and 4) of the QC sample are assigned ## to the same sample (1). getMtdField(m, "assay\\[\\d\\]-sample_ref") ## The study variable 2 represents a value of `"CVD"` for the *phenotype* ## study variable group getMtdField(m, "study_variable\\[2\\]$") ## and this study variable references the 2nd and 6th assay getMtdField(m, "study_variable\\[2\\]-assay_refs") ## It is also possible to create a MTD section without samples or ## study variables m <- mtdFromSampleData( sdata, msRunCols = c(location = "file", scan_polarity = "polarity"), assayCols = c(assay = "assay_info")) m## Defining an example sample data: ## - file: the mzML file, i.e., the *MS run*. ## - name: the name of the measurement. This is also the name of the sample: ## QC is the pool of all samples, s1 to s4 the ID of the individual. ## - phenotype: defining the biological replicates, 2 for CVD, 2 for CTR. ## - age: covariate, age of the individuals. ## - injection_index: the order in which samples were measured. sdata <- data.frame( file = c("1.mzML", "2.mzML", "3.mzML", "4.mzML", "5.mzML", "6.mzML"), name = c("QC", "s1", "s2", "QC", "s3", "s4"), phenotype = c(NA, "CVD", "CTR", NA, "CTR", "CVD"), age = c(NA, 35, 32, NA, 43, 32), injection_index = c(1, 2, 3, 4, 5, 6)) ## Add additional required columns: sdata$polarity <- "positive" ## Add columns with optional, additional information to the individual ## samples or assays. sdata$organism <- "[NCBITaxon, NCBITaxon:9606, Homo sapiens, ]" sdata$assay_info <- c("run1", "run2", "run3", "run4", "run5", "run6") ## Define the columns in `sdata` that provide information on the individual ## samples. scols <- sampleCols(sample = "name", species = "organism") ## Define the columns in `sdata` that provide MS run information mscols <- msRunCols(location = "file", scan_polarity = "polarity") ## Define the columns in `sdata` that provide assay information; we use ## the MS run/file name also for the assay name and add an additional ## column with optional content/information. acols <- assayCols(assay = "file", assay_info = "assay_info") ## Create the MTD section from the `sdata` `data.frame`. Parameter `groups` ## allows to define the columns in `sdata` that should be encoded as ## *study variable groups*. m <- mtdFromSampleData(sdata, sampleCols = scols, msRunCols = mscols, assayCols = acols, groups = c("phenotype", "age", "injection_index")) ## The sample to assay mapping: ## The repeated injection (assay 1 and 4) of the QC sample are assigned ## to the same sample (1). getMtdField(m, "assay\\[\\d\\]-sample_ref") ## The study variable 2 represents a value of `"CVD"` for the *phenotype* ## study variable group getMtdField(m, "study_variable\\[2\\]$") ## and this study variable references the 2nd and 6th assay getMtdField(m, "study_variable\\[2\\]-assay_refs") ## It is also possible to create a MTD section without samples or ## study variables m <- mtdFromSampleData( sdata, msRunCols = c(location = "file", scan_polarity = "polarity"), assayCols = c(assay = "assay_info")) m
The mtdMsRun() function allows to define and format the ms_run fields
of the mzTab-M metadata. The information is build on the actual data file
names along with optional additional parameters to characterize the MS
run(s).
For details and expected input for the various parameter it is strongly suggested to consult the mzTab-M documentation.
mtdMsRun( location = character(), instrument_ref = integer(), format = character(), id_format = character(), fragmentation_method = vector("list", length(location)), scan_polarity = character(), hash = character(), hash_method = character(), parameters = character() )mtdMsRun( location = character(), instrument_ref = integer(), format = character(), id_format = character(), fragmentation_method = vector("list", length(location)), scan_polarity = character(), hash = character(), hash_method = character(), parameters = character() )
location |
|
instrument_ref |
(optional) |
format |
(optional) |
id_format |
(optional) |
fragmentation_method |
(optional) |
scan_polarity |
|
hash |
(optional) |
hash_method |
(optional) |
parameters |
(optional) |
two column character matrix with the ms_run metadata fields
for a mzTab-M file.
At present only a single polarity per run/file is supported.
Johannes Rainer, Philippine Louail
MTD-export for other functions defining metadata information
## Build a very basic MTD ms_run section for two data files fls <- c("file:///path/to/file/a.mzML", "file:///path/to/file/b.mzML") mtdMsRun(location = fls, scan_polarity = "positive") ## Add also instrument reference information mtdMsRun(location = fls, scan_polarity = "positive", instrument_ref = 1) ## Finally, add a fragmentation method used for the second file - no ## fragmentation was used for the first file, thus `NULL` is specified. ## Parameter `fragmentation_method` expects a `list` as input to support ## also multiple fragmentation methods per MS run. mtdMsRun(location = fls, scan_polarity = "positive", fragmentation_method = list(NULL, "[MS, MS:1000133, CID, ]"))## Build a very basic MTD ms_run section for two data files fls <- c("file:///path/to/file/a.mzML", "file:///path/to/file/b.mzML") mtdMsRun(location = fls, scan_polarity = "positive") ## Add also instrument reference information mtdMsRun(location = fls, scan_polarity = "positive", instrument_ref = 1) ## Finally, add a fragmentation method used for the second file - no ## fragmentation was used for the first file, thus `NULL` is specified. ## Parameter `fragmentation_method` expects a `list` as input to support ## also multiple fragmentation methods per MS run. mtdMsRun(location = fls, scan_polarity = "positive", fragmentation_method = list(NULL, "[MS, MS:1000133, CID, ]"))
The mtdProtocol() function assists in compiling the protocol information
of the metadata section. Each protocol is referenced from an assay section
(see mtdAssay()).
For details and expected input for the various parameter it is strongly suggested to consult the mzTab-M documentation.
mtdProtocol( name = character(), type = character(), description = character(), parameters = character() )mtdProtocol( name = character(), type = character(), description = character(), parameters = character() )
name |
|
type |
|
description |
optional |
parameters |
optional |
two-column character matrix with the content for the protocol
metadata section.
Gabriele Tomè
MTD-export for other functions defining metadata information
## Minimal example with protocol. mtdProtocol(name = c("protocol1", "protocol2", "protocol3"), type = c("[,,,type1]", "[,,,type2]", "[,,,type3]")) ## Example with all the fields mtdProtocol(name = c("protocol1", "protocol2", "protocol3"), type = c("[,,,type]", "[,,,type2]", "[,,,type3]"), description = c("description1", "description2", "description3"), parameters = list(c("param1.1", "param1.2"), "param2", "param3"))## Minimal example with protocol. mtdProtocol(name = c("protocol1", "protocol2", "protocol3"), type = c("[,,,type1]", "[,,,type2]", "[,,,type3]")) ## Example with all the fields mtdProtocol(name = c("protocol1", "protocol2", "protocol3"), type = c("[,,,type]", "[,,,type2]", "[,,,type3]"), description = c("description1", "description2", "description3"), parameters = list(c("param1.1", "param1.2"), "param2", "param3"))
The mtdSample() function aids in creating and formatting the (optional)
sample information from the mzTab-M metadata section. If defined, the sample
information must be correctly linked to from the assay section. In
particular, the assays need to link to the index of the samples defined in
this section. One entry for each originating sample should be defined
(without information on experimental properties). For each sample one
or more additional characteristics (such as species, tissue, cell_type
or disease) can be provided. Thus, these parameters expect the input be
provided as a list. In addition, if a single value needs to be assigned
to each sample, a character(1) of length 1 can be provided with the
respective input parameter.
Important: to support the optional additional parameters passed along
with ... all parameters (such as sample, species etc) have to
be fully spelled out.
For details and expected input for the various parameter it is strongly suggested to consult the mzTab-M documentation.
mtdSample( ..., sample = character(), species = list(), tissue = list(), cell_type = list(), disease = list(), description = character() )mtdSample( ..., sample = character(), species = list(), tissue = list(), cell_type = list(), disease = list(), description = character() )
... |
named |
sample |
|
species |
|
tissue |
|
cell_type |
|
disease |
|
description |
|
two column character matrix with the information formatted as
sample section of the mzTab-M format.
MTD-export for other functions defining metadata information
## Example sample description data.frame for an experiment pd <- data.frame( sample_name = c("ind_1", "ind_2", "ind_1", "ind_2"), sample_id = c("i1_t1", "i2_t2", "i1_t2", "i2_t2"), time_point = c(1, 2, 1, 2)) ## Define a minimal sample information with just the sample names. mtdSample(unique(pd$sample_name)) ## Add also species information: each sample from the same species mtdSample( sample = unique(pd$sample_name), species = "[NCBITaxon, NCBITaxon:9606, Homo sapiens, ]") ## Assume first sample is a mixture of two species mtdSample( sample = unique(pd$sample_name), species = list(c("[NCBITaxon, NCBITaxon:9606, Homo sapiens, ]", "[NCBITaxon, NCBITaxon:39767, Human rhinovirus 11, ]"), "[NCBITaxon, NCBITaxon:9606, Homo sapiens, ]") ) ## Add full information including tissue, cell type and disease mtdSample( sample = unique(pd$sample_name), species = list(c("[NCBITaxon, NCBITaxon:9606, Homo sapiens, ]", "[NCBITaxon, NCBITaxon:39767, Human rhinovirus 11, ]"), "[NCBITaxon, NCBITaxon:9606, Homo sapiens, ]"), tissue = "[BTO, BTO:0000759, liver, ]", cell_type = "[CL, CL:0000182, hepatocyte, ]", disease = list(c("[DOID, DOID:684, hepatocellular carcinoma, ]", "[DOID, DOID:9451, alcoholic fatty liver, ]"), NULL) ) ## Add also additional custom variables mtdSample(sample = c("A", "B"), c("[,,Extraction date, 2011-12-21]", "[,,Extraction date, 2011-12-22]"), c("[,,Extraction reason, liver biopsy]", "[,,Extraction reason, liver biopsy]"))## Example sample description data.frame for an experiment pd <- data.frame( sample_name = c("ind_1", "ind_2", "ind_1", "ind_2"), sample_id = c("i1_t1", "i2_t2", "i1_t2", "i2_t2"), time_point = c(1, 2, 1, 2)) ## Define a minimal sample information with just the sample names. mtdSample(unique(pd$sample_name)) ## Add also species information: each sample from the same species mtdSample( sample = unique(pd$sample_name), species = "[NCBITaxon, NCBITaxon:9606, Homo sapiens, ]") ## Assume first sample is a mixture of two species mtdSample( sample = unique(pd$sample_name), species = list(c("[NCBITaxon, NCBITaxon:9606, Homo sapiens, ]", "[NCBITaxon, NCBITaxon:39767, Human rhinovirus 11, ]"), "[NCBITaxon, NCBITaxon:9606, Homo sapiens, ]") ) ## Add full information including tissue, cell type and disease mtdSample( sample = unique(pd$sample_name), species = list(c("[NCBITaxon, NCBITaxon:9606, Homo sapiens, ]", "[NCBITaxon, NCBITaxon:39767, Human rhinovirus 11, ]"), "[NCBITaxon, NCBITaxon:9606, Homo sapiens, ]"), tissue = "[BTO, BTO:0000759, liver, ]", cell_type = "[CL, CL:0000182, hepatocyte, ]", disease = list(c("[DOID, DOID:684, hepatocellular carcinoma, ]", "[DOID, DOID:9451, alcoholic fatty liver, ]"), NULL) ) ## Add also additional custom variables mtdSample(sample = c("A", "B"), c("[,,Extraction date, 2011-12-21]", "[,,Extraction date, 2011-12-22]"), c("[,,Extraction reason, liver biopsy]", "[,,Extraction reason, liver biopsy]"))
This core MTD section allows to describe the general experimental setup
and provides general information of the data set. It should contain
references to all controlled vocabulary (CV) ontologies used and
refered to in the mzTab-M file.
The mtdSkeleton() function creates a two-column matrix with the
basic mzTab-M MTD section based on the provided data. The returned
result contains only minimal information. It should be expanded, corrected
and completed with additional fields and information (i.e., the
skeleton returned by this function should be completed with flesh).
For details and expected input for the various parameter it is strongly suggested to consult the mzTab-M documentation.
mtdSkeleton( id = character(), software = character(), quantification_method = "[MS, MS:1001834, LC-MS label-free quantitation analysis, ]", cv_label = c("MS", "PRIDE", "STATO"), cv_full_name = c("PSI-MS controlled vocabulary", "PRIDE PRoteomics IDEntifications (PRIDE) database controlled vocabulary", "General purpose STATistics Ontology"), cv_version = c("4.1.138", "16:10:2023 11:38", "2026-04-20"), cv_uri = c("https://raw.githubusercontent.com/HUPO-PSI/psi-ms-CV/master/psi-ms.obo", "https://www.ebi.ac.uk/ols/ontologies/pride", "https://www.ebi.ac.uk/ols4/ontologies/stato"), database = c("[,, \"no database\", null ]"), database_prefix = c("null"), database_version = c("Unknown"), database_uri = c("null"), small_molecule_quantification_unit = "[PRIDE, PRIDE:0000330, Arbitrary quantification unit, ]", small_molecule_feature_quantification_unit = "[PRIDE, PRIDE:0000330, Arbitrary quantification unit, ]", small_molecule_identification_reliability = "[MS, MS:1002896, compound identification confidence level, ]", mztab_version = "2.1.0-M" )mtdSkeleton( id = character(), software = character(), quantification_method = "[MS, MS:1001834, LC-MS label-free quantitation analysis, ]", cv_label = c("MS", "PRIDE", "STATO"), cv_full_name = c("PSI-MS controlled vocabulary", "PRIDE PRoteomics IDEntifications (PRIDE) database controlled vocabulary", "General purpose STATistics Ontology"), cv_version = c("4.1.138", "16:10:2023 11:38", "2026-04-20"), cv_uri = c("https://raw.githubusercontent.com/HUPO-PSI/psi-ms-CV/master/psi-ms.obo", "https://www.ebi.ac.uk/ols/ontologies/pride", "https://www.ebi.ac.uk/ols4/ontologies/stato"), database = c("[,, \"no database\", null ]"), database_prefix = c("null"), database_version = c("Unknown"), database_uri = c("null"), small_molecule_quantification_unit = "[PRIDE, PRIDE:0000330, Arbitrary quantification unit, ]", small_molecule_feature_quantification_unit = "[PRIDE, PRIDE:0000330, Arbitrary quantification unit, ]", small_molecule_identification_reliability = "[MS, MS:1002896, compound identification confidence level, ]", mztab_version = "2.1.0-M" )
id |
|
software |
|
quantification_method |
|
cv_label |
|
cv_full_name |
|
cv_version |
|
cv_uri |
|
database |
|
database_prefix |
|
database_version |
|
database_uri |
|
small_molecule_quantification_unit |
|
small_molecule_feature_quantification_unit |
|
small_molecule_identification_reliability |
|
mztab_version |
|
two-column character matrix that should be expanded with
additional fields (such as title, description etc) and
information (with the help from the mtdFields() function).
Philippine Louail, Johannes Rainer
MTD-export for other functions defining metadata information
## Define a minimal mzTab-M metadata information mtd <- mtdSkeleton(id = "001", software = "[MS, MS:1001582, xmcs, 4.0.0]") ## Column 1 has the field names mtd[, 1] ## Column 2 the respective values mtd[, 2] ## Add additional fields as defined in the mzTab-M definition mtd <- rbind( mtd, c("title", "My simple xcms preprocessed data"), c("description", "A simple example xcms preprocessing.")) tail(mtd) ## Add instrument information instr <- mtdFields( name = "[MS, MS:1000449, LTQ Orbitrap,]", source = "[MS, MS:1000073, ESI,]", `analyzer[1]` = "[MS, MS:1000291, linear ion trap,]", detector = "[MS, MS:1000253, electron multiplier,]", field_prefix = "instrument" ) instr ## Add this information to the metadata mtd <- rbind(mtd, instr) ## Define sample processing fields using the mtdFields function sp <- mtdFields( c("[MSIO, MSIO:0000146, centrifugation,]", "[MSIO, MSIO:0000141, metabolite extraction,]", "[MSIO, MSIO:0000141, silylation,]"), field_prefix = "sample_processing") sp ## Add this information to the metadata mtd <- rbind(mtd, sp) ## Since a new ontology was used for the sample processing, we need also to ## add that to the metadata. We manually define the fields to add using ## `"cv[3]"` because there are already 2 CVs defined in the MTD skeleton. cv2 <- rbind( c("cv[3]-label", "MSIO"), c("cv[3]-full_name", "Metabolomics Standards Initiative Ontology"), c("cv[3]-version", "1.0.1"), c("cv[3]-uri", "http://purl.obolibrary.org/obo/msio.owl") ) ## Add this information to the metadata mtd <- rbind(mtd, cv2) ## Finally sort the metadata fields according to the expected order mtd <- mtdSort(mtd) mtd## Define a minimal mzTab-M metadata information mtd <- mtdSkeleton(id = "001", software = "[MS, MS:1001582, xmcs, 4.0.0]") ## Column 1 has the field names mtd[, 1] ## Column 2 the respective values mtd[, 2] ## Add additional fields as defined in the mzTab-M definition mtd <- rbind( mtd, c("title", "My simple xcms preprocessed data"), c("description", "A simple example xcms preprocessing.")) tail(mtd) ## Add instrument information instr <- mtdFields( name = "[MS, MS:1000449, LTQ Orbitrap,]", source = "[MS, MS:1000073, ESI,]", `analyzer[1]` = "[MS, MS:1000291, linear ion trap,]", detector = "[MS, MS:1000253, electron multiplier,]", field_prefix = "instrument" ) instr ## Add this information to the metadata mtd <- rbind(mtd, instr) ## Define sample processing fields using the mtdFields function sp <- mtdFields( c("[MSIO, MSIO:0000146, centrifugation,]", "[MSIO, MSIO:0000141, metabolite extraction,]", "[MSIO, MSIO:0000141, silylation,]"), field_prefix = "sample_processing") sp ## Add this information to the metadata mtd <- rbind(mtd, sp) ## Since a new ontology was used for the sample processing, we need also to ## add that to the metadata. We manually define the fields to add using ## `"cv[3]"` because there are already 2 CVs defined in the MTD skeleton. cv2 <- rbind( c("cv[3]-label", "MSIO"), c("cv[3]-full_name", "Metabolomics Standards Initiative Ontology"), c("cv[3]-version", "1.0.1"), c("cv[3]-uri", "http://purl.obolibrary.org/obo/msio.owl") ) ## Add this information to the metadata mtd <- rbind(mtd, cv2) ## Finally sort the metadata fields according to the expected order mtd <- mtdSort(mtd) mtd
Helper function to sort a mzTab-M MTD matrix, such as generated by
mtdSkeleton(), into the correct order of the metadata fields.
mtdSort(x)mtdSort(x)
x |
two-column matrix with the first column containing the metadata field names. |
input parameter x sorted into the correct order.
Johannes Rainer
MTD-export for other functions defining metadata information
mzTab-M (version >= 2.1) encodes the experimental design of a data set/study using study variable groups and study variables. The study variable group represents the phenotypic (or experimental) condition and the study variable the actual value of a sample (or assay) for that study group.
In R, the most common representation of an experimental design is a
data.frame where rows are individual samples (assays) and columns the
experimental or phenotypic conditions (variables). The
mtdStudyVariables() takes such a data.frame as input and encodes it
into the mzTab-M format. Additional parameters such as group_description,
and group_type allow to provide additional information for each study
variable group (phenotype) while parameters average_function,
variation_function and description can be used to provide properties for
the individual study variables. For most experiments the default values of
these parameters should suffice.
mtdStudyVariables( x, groups = character(), group_description = character(), group_type = character(), group_datatype = character(), group_unit = character(), average_function = "[MS, MS:1002962, mean, ]", variation_function = "[MS, MS:1002963, variation coefficient, ]", description = character() ) mtdDefineStudyVariables(x = data.frame(), groups = character())mtdStudyVariables( x, groups = character(), group_description = character(), group_type = character(), group_datatype = character(), group_unit = character(), average_function = "[MS, MS:1002962, mean, ]", variation_function = "[MS, MS:1002963, variation coefficient, ]", description = character() ) mtdDefineStudyVariables(x = data.frame(), groups = character())
x |
|
groups |
|
group_description |
|
group_type |
|
group_datatype |
optional |
group_unit |
optional |
average_function |
optional |
variation_function |
optional |
description |
|
Each study variable must be reported in the abundance tables.
Each assay of a data set must be referred to from at least one study
variable. Even if a data set has no experimental variables, a study variable
group and study variable with the name "undefined" must be reported.
Using mtdStudyVariables() without specifying parameter group will
create such a setup.
The mtdDefineStudyVariables() function can be used to get the set
(and order) of study variables that would be generated from an input
data.frame depending on the parameter groups.
two-column character matrix with the content for the study
variables metadata section.
Datatypes "xsd:date", "xsd:time", "xsd:dateTime" and "xsd:anyURI" are
currently mapped to character in R (and vice versa).
At present study variables are mapped to assays, but not to MS runs.
Philippine Louail, Johannes Rainer
MTD-export for other functions defining metadata information
## Example phenodata/sample data.frame. Each row is supposed to match ## the measurement of one sample (for a certain condition/time point) from ## one individual x <- data.frame( name = c("I1_0", "I2_0", "I1_6", "I2_6", "I3_0"), individual = c("I1", "I2", "I1", "I2", "I3"), BMI = c(29.3, 31.4, 29.3, 31.4, 26.5), timepoint = c(0, 6, 0, 6, 0), T2D = c(TRUE, FALSE, TRUE, FALSE, FALSE) ) ## Study variable groups for this data set could be `"individual"`, `"BMI"`, ## `"timepoint"` and `"T2D"` mtdStudyVariables(x, groups = c("individual", "BMI", "timepoint", "T2D")) ## Specifying a different average and variation function and selecting ## just two sample columns mtdStudyVariables(x, groups = c("timepoint", "T2D"), average_function = "[MS, MS:1002883, median, ]", variation_function = "[MS, MS:1002885, standard error, ]") ## Creating a study variable section without defined study variable groups mtdStudyVariables(x) ## Use `mtdDefineStudyVariables()` to get the definition of study ## variables for a given `x` and `groups` mtdDefineStudyVariables(x, c("T2D", "BMI", "individual"))## Example phenodata/sample data.frame. Each row is supposed to match ## the measurement of one sample (for a certain condition/time point) from ## one individual x <- data.frame( name = c("I1_0", "I2_0", "I1_6", "I2_6", "I3_0"), individual = c("I1", "I2", "I1", "I2", "I3"), BMI = c(29.3, 31.4, 29.3, 31.4, 26.5), timepoint = c(0, 6, 0, 6, 0), T2D = c(TRUE, FALSE, TRUE, FALSE, FALSE) ) ## Study variable groups for this data set could be `"individual"`, `"BMI"`, ## `"timepoint"` and `"T2D"` mtdStudyVariables(x, groups = c("individual", "BMI", "timepoint", "T2D")) ## Specifying a different average and variation function and selecting ## just two sample columns mtdStudyVariables(x, groups = c("timepoint", "T2D"), average_function = "[MS, MS:1002883, median, ]", variation_function = "[MS, MS:1002885, standard error, ]") ## Creating a study variable section without defined study variable groups mtdStudyVariables(x) ## Use `mtdDefineStudyVariables()` to get the definition of study ## variables for a given `x` and `groups` mtdDefineStudyVariables(x, c("T2D", "BMI", "individual"))
The MzTabM class is a simple container for the mzTab-M data/file content.
Methods for this class allow adding or updating information and validating
its content.
New instances can be created using the MzTabM() function providing the
content for the MTD, SML, SMF and SML sections (through parameters mtd,
sml, smf, and sml, respectively).
MzTabM( mtd = mtdSkeleton(id = "<replace>", software = "<replace>"), sml = matrix(ncol = 0, nrow = 0), smf = matrix(ncol = 0, nrow = 0), sme = matrix(ncol = 0, nrow = 0) )MzTabM( mtd = mtdSkeleton(id = "<replace>", software = "<replace>"), sml = matrix(ncol = 0, nrow = 0), smf = matrix(ncol = 0, nrow = 0), sme = matrix(ncol = 0, nrow = 0) )
mtd |
Two-column |
sml |
|
smf |
|
sme |
|
Various functions are available to get or set metadata information of a
MzTabM class:
getMtdInstrument() and setMtdInstrument() for instrument information.
Johannes Rainer
## Create a minimal mzTab-M with only MTD content. m <- MzTabM(mtd = mtdSkeleton(id = "001", software = "[,,RmzTabM,]")) m ## Add instrument information to the MTD section m <- setMtdInstrument(m, name = "[MS, MS:1000449, LTQ Orbitrap,]", source = "[MS, MS:1000073, ESI,]", analyzer = c(`analyzer[1]` = "[MS, MS:1000291, linear ion trap,]"), detector = "[MS, MS:1000253, electron multiplier,]") m getMtdInstrument(m)## Create a minimal mzTab-M with only MTD content. m <- MzTabM(mtd = mtdSkeleton(id = "001", software = "[,,RmzTabM,]")) m ## Add instrument information to the MTD section m <- setMtdInstrument(m, name = "[MS, MS:1000449, LTQ Orbitrap,]", source = "[MS, MS:1000073, ESI,]", analyzer = c(`analyzer[1]` = "[MS, MS:1000291, linear ion trap,]"), detector = "[MS, MS:1000253, electron multiplier,]") m getMtdInstrument(m)
Write and validate an mzTab-M (version 2.1) file.
The writer recognises the four standard sections:
MTD (Metadata): Always required. The metadata section provides additional information about the dataset(s) reported in the mzTab-M 2.1 file.
SML (Small Molecule Summary): Optional. Each row of the small molecule section is intended to report one final result to be communicated in terms of a molecule that has been quantified.
SMF (Small Molecule Feature): Optional but required when the SME section is present. The small molecule feature section represents individual MS regions that have been measured/quantified. Each SMF row SHOULD represent a single isotopomer.
SME (Small Molecule Evidence): Optional. The small molecule evidence section represents evidence for identifications of small molecules/features. In a typical case, each row represents one result from a single search or intepretation of a piece of evidence.
Column names of SMH, SFH and SEH are used as section headers.
writeMzTabM(x, path, comments = character())writeMzTabM(x, path, comments = character())
x |
|
path |
|
comments |
|
Gabriele Tomè
## Basic usage exp <- data.frame( sample_name = c("S1_T1", "S1_T2", "S2_T1", "S2_T2", "S3_T1", "S3_T2"), sample_id = c("S1", "S1", "S2", "S2", "S3", "S3"), timepoint = c("0h", "6h", "0h", "6h", "0h", "6h"), genotype = c("WT", "WT", "KO", "KO", "KO", "KO"), operator = c("BB", "BB", "BB", "BB", "FB", "FB"), file_name = c("s1-t1.mzML", "s1-t2.mzML", "s2-t1.mzML", "s2-t2.mzML", "s3-t1.mzML", "s3-t2.mzML") ) mtd <- mtdSkeleton(id = "EXP_1", software = "[MS, MS:1001582, xcms, 4.1.0]") mtd_msr <- mtdMsRun( location = exp$file_name, format = "[MS, MS:1000584, mzML file, ]", id_format = "[MS, MS:1000530, mzML unique identifier, ]", scan_polarity = "positive") mtd <- rbind(mtd, mtd_msr) mtd_a <- mtdAssay( assay = exp$sample_name, sample_ref = c("sample[1]", "sample[1]", "sample[2]", "sample[2]", "sample[3]", "sample[3]"), ms_run_ref = paste0("ms_run[", seq_len(nrow(exp)), "]") ) mtd <- rbind(mtd, mtd_a) mtd_svar <- mtdStudyVariables( exp, groups = c("timepoint", "genotype", "operator"), group_unit = c("[, , hours, ]", "", "")) mtd <- rbind(mtd, mtd_svar) mtd <- mtdSort(mtd) x <- list("MTD" = mtd) filepath <- file.path(tempdir(), "example.mztab") result <- writeMzTabM(x, path = filepath) ## Add comments comments <- c("Test comment") results <- writeMzTabM(x, path = filepath, comments = comments)## Basic usage exp <- data.frame( sample_name = c("S1_T1", "S1_T2", "S2_T1", "S2_T2", "S3_T1", "S3_T2"), sample_id = c("S1", "S1", "S2", "S2", "S3", "S3"), timepoint = c("0h", "6h", "0h", "6h", "0h", "6h"), genotype = c("WT", "WT", "KO", "KO", "KO", "KO"), operator = c("BB", "BB", "BB", "BB", "FB", "FB"), file_name = c("s1-t1.mzML", "s1-t2.mzML", "s2-t1.mzML", "s2-t2.mzML", "s3-t1.mzML", "s3-t2.mzML") ) mtd <- mtdSkeleton(id = "EXP_1", software = "[MS, MS:1001582, xcms, 4.1.0]") mtd_msr <- mtdMsRun( location = exp$file_name, format = "[MS, MS:1000584, mzML file, ]", id_format = "[MS, MS:1000530, mzML unique identifier, ]", scan_polarity = "positive") mtd <- rbind(mtd, mtd_msr) mtd_a <- mtdAssay( assay = exp$sample_name, sample_ref = c("sample[1]", "sample[1]", "sample[2]", "sample[2]", "sample[3]", "sample[3]"), ms_run_ref = paste0("ms_run[", seq_len(nrow(exp)), "]") ) mtd <- rbind(mtd, mtd_a) mtd_svar <- mtdStudyVariables( exp, groups = c("timepoint", "genotype", "operator"), group_unit = c("[, , hours, ]", "", "")) mtd <- rbind(mtd, mtd_svar) mtd <- mtdSort(mtd) x <- list("MTD" = mtd) filepath <- file.path(tempdir(), "example.mztab") result <- writeMzTabM(x, path = filepath) ## Add comments comments <- c("Test comment") results <- writeMzTabM(x, path = filepath, comments = comments)
Reads and parses an mzTab-M (version 2.1) file into a named list.
The parser recognises the four standard sections:
MTD (Metadata): Always required. The metadata section provides additional information about the dataset(s) reported in the mzTab-M 2.1 file.
SML (Small Molecule Summary): Optional. Each row of the small molecule section is intended to report one final result to be communicated in terms of a molecule that has been quantified.
SMF (Small Molecule Feature): Optional but required when the SME section is present. The small molecule feature section represents individual MS regions that have been measured/quantified. Each SMF row SHOULD represent a single isotopomer.
SME (Small Molecule Evidence): Optional. The small molecule evidence section represents evidence for identifications of small molecules/features. In a typical case, each row represents one result from a single search or intepretation of a piece of evidence.
Section headers (SMH, SFH, SEH) are used as column names for their
respective data sections. The MTD section does not have a dedicated header
row; its two-column key–value structure is returned without column names.
readMzTabM(path, ...)readMzTabM(path, ...)
path |
|
... |
Additional arguments forwarded to every internal call to
|
A named list containing between one and four elements, depending on
which sections are present in the file:
MTD: A matrix with two columns. Column names are absent (the
prefix column is dropped).
SML: A matrix whose column names are taken from the SMH
header row. Present only when SMH is found in the file.
SMF: A matrix whose column names are taken from the SFH
header row. Present only when SFH is found in the file.
SME: A matrix whose column names are taken from the SEH
header row. Present only when both SFH and SEH are found in the file.
Gabriele Tomè
## Basic usage result <- readMzTabM(system.file("mztabm/out","xcms_mzTab-M_2-1_v2.mzTab", package = "RmzTabM")) names(result)## Basic usage result <- readMzTabM(system.file("mztabm/out","xcms_mzTab-M_2-1_v2.mzTab", package = "RmzTabM")) names(result)
mzTab-M makes use of controlled vocabulary (CV) parameters. These parameters
are expected to be provided in the format
"[CV label, accession, name, value]". In addition, also user parameters
where only the name and value are provided are supported, but it is
recommendet to use full CV parameters where possible.
parseCvParameter() allows to extract individual fields from a CV
parameter.
isCvParameter() tests whether a string is in the expected (CV parameter)
format.
parseCvParameter(x, element = 2L) isCvParameter(x)parseCvParameter(x, element = 2L) isCvParameter(x)
x |
|
element |
|
character of length(x) with the parsed CV parameter elements or
NA_character_ if not present.
While mzTab supports "," in the name and value field (in which case
the respective field must be placed between "), this is currently not
supported. Thus, CV label and term (accession) are expected to be correctly
extracted, the name and value field might not if "," are present in them.
Johannes Rainer
## Extract CV term x <- c("[MS, MS:1002962, mean, ]", "[MS, MS:1002883, median, ]") parseCvParameter(x, 1) parseCvParameter(x) parseCvParameter(x, 3) parseCvParameter(x, 4) ## CV term missing parseCvParameter("[, , user, value]") parseCvParameter("[, , user, value]", 3) parseCvParameter("[, , user, value]", 4) ## Check validity of CV parameters isCvParameter(c(x, "[a, b, c, d, e]", "d"))## Extract CV term x <- c("[MS, MS:1002962, mean, ]", "[MS, MS:1002883, median, ]") parseCvParameter(x, 1) parseCvParameter(x) parseCvParameter(x, 3) parseCvParameter(x, 4) ## CV term missing parseCvParameter("[, , user, value]") parseCvParameter("[, , user, value]", 3) parseCvParameter("[, , user, value]", 4) ## Check validity of CV parameters isCvParameter(c(x, "[a, b, c, d, e]", "d"))
setMtdInstrument() sets or updates instrument-related metadata fields
within an MTD (metadata) section. When instrument metadata already exists,
the function can either replace it entirely or append new values to the
existing ones.
getMtdInstrument() returns the instrument information from an MTD section.
## S4 method for signature 'dfmatrix' setMtdInstrument( x = matrix(), name = character(), source = character(), analyzer = character(), detector = character(), replace = FALSE ) getMtdInstrument(x = matrix())## S4 method for signature 'dfmatrix' setMtdInstrument( x = matrix(), name = character(), source = character(), analyzer = character(), detector = character(), replace = FALSE ) getMtdInstrument(x = matrix())
x |
A MTD section that stores metadata fields. Can be a two-column
|
name |
|
source |
|
analyzer |
|
detector |
|
replace |
|
For setMtdInstrument(): the input object x updated to include the new
or merged instrument metadata fields. If x is empty, the empty x.
For getMtdInstrument(): a named character with the instrument
information, names being the field names.
Gabriele Tomè
x <- mtdSkeleton("001", software = "[MS, MS:1001582, xmcs, 4.0.0]") ## Add instrument metadata to an existing mzTab object mtd <- setMtdInstrument(x, name = "[MS, MS:1000449, LTQ Orbitrap,]", source = "[MS, MS:1000073, ESI,]", analyzer = c(`analyzer[1]` = "[MS, MS:1000291, linear ion trap,]"), detector = "[MS, MS:1000253, electron multiplier,]") ## Replace all existing instrument metadata mtd <- setMtdInstrument(mtd, name = "[MS, MS:1000449, LTQ Orbitrap,]", source = "[MS, MS:1000073, ESI,]", analyzer = c(`analyzer[1]` = "[MS, MS:1000291, linear ion trap,]"), detector = "[MS, MS:1000253, electron multiplier,]", replace = TRUE) x <- mtdSkeleton("001", software = "[MS, MS:1001582, xcms, 4.0.0]") mtd <- setMtdInstrument(x, name = "[MS, MS:1000449, LTQ Orbitrap,]", source = "[MS, MS:1000073, ESI,]", analyzer = c(`analyzer[1]` = "[MS, MS:1000291, linear ion trap,]"), detector = "[MS, MS:1000253, electron multiplier,]") getMtdInstrument(mtd)x <- mtdSkeleton("001", software = "[MS, MS:1001582, xmcs, 4.0.0]") ## Add instrument metadata to an existing mzTab object mtd <- setMtdInstrument(x, name = "[MS, MS:1000449, LTQ Orbitrap,]", source = "[MS, MS:1000073, ESI,]", analyzer = c(`analyzer[1]` = "[MS, MS:1000291, linear ion trap,]"), detector = "[MS, MS:1000253, electron multiplier,]") ## Replace all existing instrument metadata mtd <- setMtdInstrument(mtd, name = "[MS, MS:1000449, LTQ Orbitrap,]", source = "[MS, MS:1000073, ESI,]", analyzer = c(`analyzer[1]` = "[MS, MS:1000291, linear ion trap,]"), detector = "[MS, MS:1000253, electron multiplier,]", replace = TRUE) x <- mtdSkeleton("001", software = "[MS, MS:1001582, xcms, 4.0.0]") mtd <- setMtdInstrument(x, name = "[MS, MS:1000449, LTQ Orbitrap,]", source = "[MS, MS:1000073, ESI,]", analyzer = c(`analyzer[1]` = "[MS, MS:1000291, linear ion trap,]"), detector = "[MS, MS:1000253, electron multiplier,]") getMtdInstrument(mtd)
The Small Molecule (SME) table contains evidence for annotation
(identification) of small molecule features (defined in the SMF section
SMF-export). These annotations can result from database searches using MS2
spectra or retention time and m/z searches against an in-house annotation
database. Each row provides the evidence for one match. Multiple
matches/annotations for the same input information can be reported in
separate rows (but using the same value in the "evidence_input_id"
column).
The small molecule evidence section must always come after the
"Small Molecule Feature" (SMF) Table. All columns are
mandatory except for "opt_" columns.
A detailed description of the SME format and its columns is provided in the respective section of the mzTab-M specification.
The functions to create and format the SME content are:
smeCreate(): provides a simplified workflow to generate this table in a
single step. It takes vectors defining feature properties.
It automatically:
Adds the required SME_ID and standard mzTab-M columns (e.g.,
evidence_input_id).
Populates missing mandatory columns with "null" strings to ensure
compliance.
Checks content against information available in the MTD section.
Sets the line prefix column SEH to "SME".
Orders columns according to the mzTab-M specification.
Important: to support the optional additional parameters passed along
with ... all parameters have to be fully spelled out. All
parameters are vectorized and recycled as needed to match the number of
rows in the abundance matrix. If their length is not equal to the number
of rows or 1, an error is raised.
See also the specification of the SMF section for details.
smeSort(): can be used to sort the columns of the SME data frame
according to the standard order defined in the mzTab-M specification.
This is useful if you have added custom columns and want to ensure the
standard columns are in the correct order for export.
smeCreate( ..., evidence_input_id = character(), database_identifier = character(), chemical_formula = character(), smiles = character(), inchi = character(), chemical_name = character(), uri = character(), derivatized_form = character(), adduct_ions = character(), exp_mass_to_charge = numeric(), charge = numeric(), theoretical_mass_to_charge = numeric(), spectra_ref = character(), identification_method = character(), ms_level = character(), id_confidence_measure = NULL, rank = 1, mtd = NULL ) smeSort(x) smeIdConfidenceMeasure(x, mtd, nr = numeric()) smeSpectraRefValidator(x, mtd)smeCreate( ..., evidence_input_id = character(), database_identifier = character(), chemical_formula = character(), smiles = character(), inchi = character(), chemical_name = character(), uri = character(), derivatized_form = character(), adduct_ions = character(), exp_mass_to_charge = numeric(), charge = numeric(), theoretical_mass_to_charge = numeric(), spectra_ref = character(), identification_method = character(), ms_level = character(), id_confidence_measure = NULL, rank = 1, mtd = NULL ) smeSort(x) smeIdConfidenceMeasure(x, mtd, nr = numeric()) smeSpectraRefValidator(x, mtd)
... |
optional columns to be added to the SME. The length of arguments
passed through |
evidence_input_id |
|
database_identifier |
|
chemical_formula |
|
smiles |
|
inchi |
|
chemical_name |
|
uri |
|
derivatized_form |
|
adduct_ions |
|
exp_mass_to_charge |
|
charge |
|
theoretical_mass_to_charge |
|
spectra_ref |
|
identification_method |
|
ms_level |
|
id_confidence_measure |
|
rank |
|
mtd |
two-column |
x |
for |
nr |
|
All parameters passed to the smeCreate() function must be fully named.
A complete SME data.frame ready for export. The data frame contains
the 'SEH' line prefix, standard columns ordered according to spec, and any
optional columns.
Gabriele Tomè
MTD-export, SMF-export and SML-export for creating and formatting the metadata (MTD), small molecule feature (SMF) and small molecule (SML) sections.
## Define minimum required MTD section mtd <- cbind(c("ms_run[1]-location", "ms_run[1]-format", "ms_run[1]-id_format", "ms_run[1]-scan_polarity[1]", "ms_run[2]-location", "ms_run[2]-format", "ms_run[2]-id_format", "ms_run[2]-scan_polarity[1]", "ms_run[3]-location", "ms_run[3]-format", "ms_run[3]-id_format", "ms_run[3]-scan_polarity[1]"), c("1.mzML", "[MS, MS:1000584, mzML file, ]", "[MS, MS:1000530, mzML unique identifier, ]", "[MS, MS:1000130, positive scan, ]", "2.mzML", "[MS, MS:1000584, mzML file, ]", "[MS, MS:1000530, mzML unique identifier, ]", "[MS, MS:1000130, positive scan, ]", "3.mzML", "[MS, MS:1000584, mzML file, ]", "[MS, MS:1000530, mzML unique identifier, ]", "[MS, MS:1000130, positive scan, ]")) ## Define minimum parameter to build SME section evidence_input_id = c("ms_run[1]:mass=700.5255;rt=20·5", "ms_run[2]:mass=452.2782;rt=35.1", "ms_run[3]:mass=882.6210;rt=40.0") exp_mass_to_charge = c(700.5255, 452.2782, 882.6210) charge = c(1, 1, 1) theoretical_mass_to_charge = c(700.5281, 452.2777, 882.6224) spectra_ref = c("ms_run[1]:index=7646", "ms_run[2]:index=7640", "ms_run[3]:index=7671|ms_run[3]:index=7725") identification_method = "[, , LipidDataAnalyzer, 2.11.1]" ms_level = "[MS, MS:1000511, ms level, 2]" ## Create the final dataframe ready for export ## Note: Fields not provided are automatically set to "null" sme_final <- smeCreate( evidence_input_id = evidence_input_id, exp_mass_to_charge = exp_mass_to_charge, charge = charge, theoretical_mass_to_charge = theoretical_mass_to_charge, spectra_ref = spectra_ref, identification_method = identification_method, ms_level = ms_level, mtd = mtd ) ## The result contains the 'SEH' line prefix and standard columns head(sme_final)## Define minimum required MTD section mtd <- cbind(c("ms_run[1]-location", "ms_run[1]-format", "ms_run[1]-id_format", "ms_run[1]-scan_polarity[1]", "ms_run[2]-location", "ms_run[2]-format", "ms_run[2]-id_format", "ms_run[2]-scan_polarity[1]", "ms_run[3]-location", "ms_run[3]-format", "ms_run[3]-id_format", "ms_run[3]-scan_polarity[1]"), c("1.mzML", "[MS, MS:1000584, mzML file, ]", "[MS, MS:1000530, mzML unique identifier, ]", "[MS, MS:1000130, positive scan, ]", "2.mzML", "[MS, MS:1000584, mzML file, ]", "[MS, MS:1000530, mzML unique identifier, ]", "[MS, MS:1000130, positive scan, ]", "3.mzML", "[MS, MS:1000584, mzML file, ]", "[MS, MS:1000530, mzML unique identifier, ]", "[MS, MS:1000130, positive scan, ]")) ## Define minimum parameter to build SME section evidence_input_id = c("ms_run[1]:mass=700.5255;rt=20·5", "ms_run[2]:mass=452.2782;rt=35.1", "ms_run[3]:mass=882.6210;rt=40.0") exp_mass_to_charge = c(700.5255, 452.2782, 882.6210) charge = c(1, 1, 1) theoretical_mass_to_charge = c(700.5281, 452.2777, 882.6224) spectra_ref = c("ms_run[1]:index=7646", "ms_run[2]:index=7640", "ms_run[3]:index=7671|ms_run[3]:index=7725") identification_method = "[, , LipidDataAnalyzer, 2.11.1]" ms_level = "[MS, MS:1000511, ms level, 2]" ## Create the final dataframe ready for export ## Note: Fields not provided are automatically set to "null" sme_final <- smeCreate( evidence_input_id = evidence_input_id, exp_mass_to_charge = exp_mass_to_charge, charge = charge, theoretical_mass_to_charge = theoretical_mass_to_charge, spectra_ref = spectra_ref, identification_method = identification_method, ms_level = ms_level, mtd = mtd ) ## The result contains the 'SEH' line prefix and standard columns head(sme_final)
The Small Molecule Feature (SMF) section of the mzTab-M definition captures information on the individual MS features (quantified regions, e.g., elution profiles of specific m/z and retention times) that were measured across the assays.
smfCreate() provides a simplified workflow to generate this table in a
single step. It takes a matrix of abundances (rows=features,
columns=assays) and optional vectors defining feature properties.
It automatically:
Formats the abundance matrix (renaming columns to abundance_assay[n]).
Adds the required SMF_ID and standard mzTab-M columns (e.g.,
exp_mass_to_charge).
Populates missing mandatory columns with "null" strings to ensure
compliance.
Sets the line prefix column SFH to "SMF".
Orders columns according to the mzTab-M specification.
Important: to support the optional additional parameters passed along
with ... all parameters (such as adduct_ion,
retention_time_in_seconds etc) have to be fully spelled out. All
parameters are vectorized and recycled as needed to match the number of rows
in the abundance matrix. If their length is not equal to the number of rows
or 1, an error is raised.
See also the specification of the SMF section for details.
smfSort() can be used to sort the columns of the SMF data frame
according to the standard order defined in the mzTab-M specification.
This is useful if you have added custom columns and want to ensure the
standard columns are in the correct order for export.
smfCreate( ..., x, exp_mass_to_charge = numeric(), retention_time_in_seconds = numeric(), retention_time_in_seconds_start = numeric(), retention_time_in_seconds_end = numeric(), SME_ID_REFS = character(), SME_ID_REF_ambiguity_code = character(), charge = numeric(), adduct_ion = character(), isotopomer = character() ) smfSort(x)smfCreate( ..., x, exp_mass_to_charge = numeric(), retention_time_in_seconds = numeric(), retention_time_in_seconds_start = numeric(), retention_time_in_seconds_end = numeric(), SME_ID_REFS = character(), SME_ID_REF_ambiguity_code = character(), charge = numeric(), adduct_ion = character(), isotopomer = character() ) smfSort(x)
... |
Additional optional columns to add. These arguments must be named.
The function will automatically prepend |
x |
|
exp_mass_to_charge |
|
retention_time_in_seconds |
|
retention_time_in_seconds_start |
|
retention_time_in_seconds_end |
|
SME_ID_REFS |
|
SME_ID_REF_ambiguity_code |
|
charge |
|
adduct_ion |
|
isotopomer |
|
All parameters passed to the smfCreate() function must be fully named.
A complete SMF data.frame ready for export. The data frame contains
the 'SFH' line prefix, standard columns ordered according to spec,
abundance columns, and any optional columns.
Philippine Louail
MTD-export and SML-export for creating and formatting the metadata (MTD) and small molecule (SML) sections.
## Assume we have a matrix of abundances (e.g., from xcms or similar tools) ## Rows are features, Columns are Samples/Assays. abund_mat <- matrix( c(100.1, 105.2, 110.3, 200.5, 198.2, 201.0, 50.0, 55.0, 52.1), nrow = 3, byrow = TRUE ) ## Define feature metadata (vectors must match number of rows in matrix) mz_values <- c(150.05, 200.10, 300.15) rt_values <- c(20.5, 35.1, 40.0) adducts <- c("[M+H]+", "[M+Na]+", "[M+H]+") ## Create the final dataframe ready for export ## Note: Fields not provided (like charge) are automatically set to "null" smf_final <- smfCreate( x = abund_mat, exp_mass_to_charge = mz_values, retention_time_in_seconds = rt_values, adduct_ion = adducts, ## Optional custom column example global_custom_attribute = c("A", "B", "C") ) ## The result contains the 'SFH' line prefix, standard columns, and ## abundances head(smf_final)## Assume we have a matrix of abundances (e.g., from xcms or similar tools) ## Rows are features, Columns are Samples/Assays. abund_mat <- matrix( c(100.1, 105.2, 110.3, 200.5, 198.2, 201.0, 50.0, 55.0, 52.1), nrow = 3, byrow = TRUE ) ## Define feature metadata (vectors must match number of rows in matrix) mz_values <- c(150.05, 200.10, 300.15) rt_values <- c(20.5, 35.1, 40.0) adducts <- c("[M+H]+", "[M+Na]+", "[M+H]+") ## Create the final dataframe ready for export ## Note: Fields not provided (like charge) are automatically set to "null" smf_final <- smfCreate( x = abund_mat, exp_mass_to_charge = mz_values, retention_time_in_seconds = rt_values, adduct_ion = adducts, ## Optional custom column example global_custom_attribute = c("A", "B", "C") ) ## The result contains the 'SFH' line prefix, standard columns, and ## abundances head(smf_final)
The Small Molecule (SML) table is mandatory in the mzTab-M format and
provides abundances along with annotations on the small molecules of an
analysis result. All columns in this table need to be tab-delimited and no
empty cells are allowed ("null" must be used for missing values, for
columns that support that).
Each row of the small molecule section is intended to report one final result to be communicated in terms of a molecule that has been quantified. In many cases, this may be the molecule of biological interest. In general, different adduct forms for the same molecule would be reported in the Small Molecule Feature (SMF) section.
A detailed description of the SML format and its columns is provided in the respective section of the mzTab-M specification.
The functions to create and format the SML content are:
smlCreate(): main function that creates the SML table based on the
provided abundance matrix and small molecule annotation information.
The function automatically:
Formats the abundance matrix x (renaming column names to
"abundance_assay[n]")
Adds the required "SML_ID" column with a running integer representing
the primary key (unique ID) for each molecule.
Populates missing mandatory columns with "null" strings.
Sets the line prefix column.
Orders the columns according to the mzTab-M specification.
After smlCreate(), required study variable abundance columns should
be either added manually or using the smlAddStudyVariableColumns()
function. These required columns contain the average and variation
of molecule abundances across samples for a particular study variable.
smlAddStudyVariableColumns(): completes the SML table created
with smlCreate() by adding columns with aggregated assay values
(molecule abundances) per study variable(s). Study variables have to be
listed in the MTD section provided with the mtd parameter (e.g.
"study_variable[1]") where also the aggregation functions must be
defined (e.g. "study_variable[1]-average_function"). This aggregation
and variation functions has to be defined using the respective MS CV
ontology term (e.g. "MS:1002962" for the mean and "MS:1002963" for
the coefficient of variation). See MTD-export for details on the MTD
section.
The function:
gets all study variables from the metadata section (mtd)
for each study variable:
gets the assay references from MTD
gets the summary functions (for average and variation) from MTD (resolving the MS CV term to the respective R function)
gets the referred abundance columns in x
calculates the summary statistic and these as additional columns to
x.
smlSort(): sorts the columns of an SML table into the expected order.
smlCreate( ..., x, SMF_ID_REFS = character(), database_identifier = character(), chemical_formula = character(), smiles = character(), inchi = character(), chemical_name = character(), uri = character(), theoretical_neutral_mass = numeric(), adduct_ions = character(), reliability = character(), best_id_confidence_measure = character(), best_id_confidence_value = character() ) smlSort(x) smlAddStudyVariableColumns(x, mtd)smlCreate( ..., x, SMF_ID_REFS = character(), database_identifier = character(), chemical_formula = character(), smiles = character(), inchi = character(), chemical_name = character(), uri = character(), theoretical_neutral_mass = numeric(), adduct_ions = character(), reliability = character(), best_id_confidence_measure = character(), best_id_confidence_value = character() ) smlSort(x) smlAddStudyVariableColumns(x, mtd)
... |
optional columns to be added to the SML. The length of arguments
passed through |
x |
for |
SMF_ID_REFS |
|
database_identifier |
|
chemical_formula |
|
smiles |
|
inchi |
|
chemical_name |
|
uri |
|
theoretical_neutral_mass |
|
adduct_ions |
|
reliability |
|
best_id_confidence_measure |
|
best_id_confidence_value |
|
mtd |
two-column |
All parameters passed to the smlCreate() function must be fully named.
For smlCreate(): a character data.frame ready for export. The
data.frame contains the "SMH" line prefix, the standard columns ordered
according to the mzTab-M specification, the abundance columns, the
abundance columns aggregated per study variables and any optional columns.
For smlSort(): the input data.frame with columns ordered as defined
by the mzTab-M definition.
Johannes Rainer
MTD-export and SMF-export for creating and formatting the metadata (MTD) and small molecule feature (SMF) sections.
## Define an example abundance matrix amat <- matrix(abs(rnorm(20)), ncol = 4, nrow = 5) colnames(amat) <- c("sample_1", "sample_2", "sample_3", "sample_4") ## Define annotation information for each row (molecule). This information ## should be the result of an annotation pipeline used. anns <- data.frame( hmdb_id = c("HMDB:HMDB0001847", NA, "HMDB:HMDB0000122|HMDB:HMDB0000169", "HMDB:HMDB0000258", "HMDB:HMDB0060475"), formula = c("C8H10N4O2", NA, "C6H12O6|C6H12O6", "C12H22O11", "C5H9NO4"), smiles = c("CN1C=NC2=C1C(=O)N(C)C(=O)N2C", NA, "OC[C@H]1O[C@H](O)[C@H](O)[C@H](O)[C@H]1O|OC[C@H]1O[C@H](O)[C@H](O)[C@H](O)[C@H]1O", NA, "NC(CCC(O)=O)C(O)=O"), neutral_mass = c(194.0804, NA, "180.0634|180.0634", 342.1162, 147.0531), name = c("caffeine", NA, "glucose|mannose", "sucrose", "DL-Glutamate"), adduct_ions = c("[M+H]1+", NA, "[M+Na]1+", "[M+H]1+", "[M+H]1+"), uri = c("http://www.hmdb.ca/metabolites/HMDB0001847", NA, "http://www.hmdb.ca/metabolites/HMDB0000122|http://www.hmdb.ca/metabolites/HMDB0000169", "http://www.hmdb.ca/metabolites/HMDB0000258", "http://www.hmdb.ca/metabolites/HMDB0060475")) ## Create the SML table based on this information. Note that we have to ## use the **full** name for **all** function arguments (i.e., `x = amat` ## etc). sml <- smlCreate( x = amat, database_identifier = anns$hmdb_id, chemical_formula = anns$formula, smiles = anns$smiles, theoretical_neutral_mass = anns$neutral_mass, uri = anns$uri, chemical_name = anns$name, adduct_ions = anns$adduct_ions) sml ## Update the reference between rows in SML with those in an SMF table sml$SMF_ID_REFS <- c("3", "5", "8|23", "12", "16") ## Re-order columns in the expected order sml <- smlSort(sml) sml ## Add study variable abundance columns based on study variable definition ## in a metadata (MTD) section. ## In the example below we first define a metadata section with the ## minimally required information for `smlAddStudyVariableColumns()`: ## the study variables, the samples (assays) that belong to each and ## a average and variation function (see ?MTD-export for full help on the ## metadata section). ## In the example we assign sample 1 and 3 to the `"male"` category/study ## variable and 2 and 4 to `"female"`. For the average and variation ## function we use the mean and coefficient of variation. mtd <- cbind( c("study_variable[1]", "study_variable[1]-assay_refs", "study_variable[1]-average_function", "study_variable[1]-variation_function", "study_variable[1]-description", "study_variable[2]", "study_variable[2]-assay_refs", "study_variable[2]-average_function", "study_variable[2]-variation_function", "study_variable[2]-description"), c("male", "assay[1]|assay[3]", "[MS, MS:1002962, mean, ]", "[MS, MS:1002963, variation coefficient, ]", "sex of participants, male", "female", "assay[2]|assay[4]", "[MS, MS:1002962, mean, ]", "[MS, MS:1002963, variation coefficient, ]", "sex of participants, female") ) ## Add study variable average and variation columns and sort the columns sml <- smlAddStudyVariableColumns(sml, mtd) |> smlSort() sml ## The average and variation for each study variable were calculated from ## the assay abundances and added as additional columns to the SML.## Define an example abundance matrix amat <- matrix(abs(rnorm(20)), ncol = 4, nrow = 5) colnames(amat) <- c("sample_1", "sample_2", "sample_3", "sample_4") ## Define annotation information for each row (molecule). This information ## should be the result of an annotation pipeline used. anns <- data.frame( hmdb_id = c("HMDB:HMDB0001847", NA, "HMDB:HMDB0000122|HMDB:HMDB0000169", "HMDB:HMDB0000258", "HMDB:HMDB0060475"), formula = c("C8H10N4O2", NA, "C6H12O6|C6H12O6", "C12H22O11", "C5H9NO4"), smiles = c("CN1C=NC2=C1C(=O)N(C)C(=O)N2C", NA, "OC[C@H]1O[C@H](O)[C@H](O)[C@H](O)[C@H]1O|OC[C@H]1O[C@H](O)[C@H](O)[C@H](O)[C@H]1O", NA, "NC(CCC(O)=O)C(O)=O"), neutral_mass = c(194.0804, NA, "180.0634|180.0634", 342.1162, 147.0531), name = c("caffeine", NA, "glucose|mannose", "sucrose", "DL-Glutamate"), adduct_ions = c("[M+H]1+", NA, "[M+Na]1+", "[M+H]1+", "[M+H]1+"), uri = c("http://www.hmdb.ca/metabolites/HMDB0001847", NA, "http://www.hmdb.ca/metabolites/HMDB0000122|http://www.hmdb.ca/metabolites/HMDB0000169", "http://www.hmdb.ca/metabolites/HMDB0000258", "http://www.hmdb.ca/metabolites/HMDB0060475")) ## Create the SML table based on this information. Note that we have to ## use the **full** name for **all** function arguments (i.e., `x = amat` ## etc). sml <- smlCreate( x = amat, database_identifier = anns$hmdb_id, chemical_formula = anns$formula, smiles = anns$smiles, theoretical_neutral_mass = anns$neutral_mass, uri = anns$uri, chemical_name = anns$name, adduct_ions = anns$adduct_ions) sml ## Update the reference between rows in SML with those in an SMF table sml$SMF_ID_REFS <- c("3", "5", "8|23", "12", "16") ## Re-order columns in the expected order sml <- smlSort(sml) sml ## Add study variable abundance columns based on study variable definition ## in a metadata (MTD) section. ## In the example below we first define a metadata section with the ## minimally required information for `smlAddStudyVariableColumns()`: ## the study variables, the samples (assays) that belong to each and ## a average and variation function (see ?MTD-export for full help on the ## metadata section). ## In the example we assign sample 1 and 3 to the `"male"` category/study ## variable and 2 and 4 to `"female"`. For the average and variation ## function we use the mean and coefficient of variation. mtd <- cbind( c("study_variable[1]", "study_variable[1]-assay_refs", "study_variable[1]-average_function", "study_variable[1]-variation_function", "study_variable[1]-description", "study_variable[2]", "study_variable[2]-assay_refs", "study_variable[2]-average_function", "study_variable[2]-variation_function", "study_variable[2]-description"), c("male", "assay[1]|assay[3]", "[MS, MS:1002962, mean, ]", "[MS, MS:1002963, variation coefficient, ]", "sex of participants, male", "female", "assay[2]|assay[4]", "[MS, MS:1002962, mean, ]", "[MS, MS:1002963, variation coefficient, ]", "sex of participants, female") ) ## Add study variable average and variation columns and sort the columns sml <- smlAddStudyVariableColumns(sml, mtd) |> smlSort() sml ## The average and variation for each study variable were calculated from ## the assay abundances and added as additional columns to the SML.