Compiled: Sat Aug 31 05:53:14 2024
The SpectriPy
package allows integration of Python MS
packages into a Spectra
-based MS analysis in
R
. Python functionality is wrapped into R functions
allowing a seamless integration of the functionality of Python’s
matchms
package into R
. In addition, functions
to convert between R’s Spectra
objects and Python’s
matchms
spectrum objects are available to the advanced user
or developer.
The package requires a python environment to be available and can be
installed with the BiocManager
R package using the command
BiocManager::install("RforMassSpectrometry/SpectriPy")
. All
required python packages are installed automatically on demand.
matchms
The SpectriPy
package provides the
compareSpectriPy
function that allows to perform spectra
similarity calculations using the scoring functions from the
matchms
Python package. Below all currently supported
scoring functions are listed along with the parameter class
that allows selecting and configuring the algorithm in the
compareSpectriPy
function. Additional functions will be
added in future.
CosineGreedyParam
.CosineHungarianParam
.ModifiedCosineParam
.We next create some simple example spectra and subsequently use the
compareSpectriPy
function to calculate pairwise
similarities between these.
library(Spectra)
library(SpectriPy)
## Create a Spectra object with two MS2 spectra for Caffeine.
caf <- DataFrame(
msLevel = c(2L, 2L),
name = "Caffeine",
precursorMz = c(195.0877, 195.0877)
)
caf$intensity <- list(
c(340.0, 416, 2580, 412),
c(388.0, 3270, 85, 54, 10111))
caf$mz <- list(
c(135.0432, 138.0632, 163.0375, 195.0880),
c(110.0710, 138.0655, 138.1057, 138.1742, 195.0864))
caf <- Spectra(caf)
## Create a Spectra object with two MS2 spectra for 1-Methylhistidine
mhd <- DataFrame(
msLevel = c(2L, 2L),
precursorMz = c(170.0924, 170.0924),
id = c("HMDB0000001", "HMDB0000001"),
name = c("1-Methylhistidine", "1-Methylhistidine"))
mhd$mz <- list(
c(109.2, 124.2, 124.5, 170.16, 170.52),
c(83.1, 96.12, 97.14, 109.14, 124.08, 125.1, 170.16))
mhd$intensity <- list(
c(3.407, 47.494, 3.094, 100.0, 13.240),
c(6.685, 4.381, 3.022, 16.708, 100.0, 4.565, 40.643))
mhd <- Spectra(mhd)
We first calculate pairwise similarities between all spectra defined
above and those of caffeine using Spectra
’s built-in
compareSpectra
function.
## 1 2
## 1 1.0000000 0.1973448
## 2 0.1973448 1.0000000
## 3 0.0000000 0.0000000
## 4 0.0000000 0.0000000
Thus, compareSpectra
returned the pairwise similarity
scores (by default calculated using the normalized dot-product function)
between all spectra in all
(rows) and all spectra in
caf
(columns). compareSpectriPy
works similar,
with the difference that we need to specify and configure the similarity
function (from matchms
) using a dedicated parameter object.
Below we calculate the similarity using the CosineGreedy
function changing the tolerance
to a value of
0.05
(instead of the default 0.1
).
## + /github/home/.cache/R/basilisk/1.17.2/0/bin/conda create --yes --prefix /github/home/.cache/R/basilisk/1.17.2/SpectriPy/0.1.1/matchms_env 'python=3.10.14' --quiet -c bioconda -c conda-forge --override-channels
## + /github/home/.cache/R/basilisk/1.17.2/0/bin/conda install --yes --prefix /github/home/.cache/R/basilisk/1.17.2/SpectriPy/0.1.1/matchms_env 'python=3.10.14' -c bioconda -c conda-forge --override-channels
## + /github/home/.cache/R/basilisk/1.17.2/0/bin/conda install --yes --prefix /github/home/.cache/R/basilisk/1.17.2/SpectriPy/0.1.1/matchms_env -c bioconda -c conda-forge 'python=3.10.14' 'matchms=0.14.0' --override-channels
## [,1] [,2]
## [1,] 1.0000000 0.1948181
## [2,] 0.1948181 1.0000000
## [3,] 0.0000000 0.0000000
## [4,] 0.0000000 0.0000000
As a result compareSpectriPy
returns also a numeric
matrix of similarities. Note also that the first
compareSpectriPy
call takes usually a little longer because
the Python setup has to be initialized.
Next we use the ModifiedCosine algorithm that considers also differences between the spectra’s precursor m/z in the calculation.
## [,1] [,2]
## [1,] 1.00000000 0.1948181
## [2,] 0.19481813 1.0000000
## [3,] 0.13841831 0.8520549
## [4,] 0.05724816 0.3523997
Note that for this calculation all spectra precursor m/z values need
to be available, otherwise an error will be thrown. Thus, we should
always ensure to remove spectra without precursor m/z values prior to
similarity scoring with this similarity method. Below we remove the
precursor m/z from one of our input spectra and then show how the
Spectra
object could be subsetted to valid spectra
for this method.
## Remove precursor m/z from the 3rd spectrum
all$precursorMz[3] <- NA
## Filter the input spectra removing those with missing precursor.
all <- all[!is.na(precursorMz(all))]
compareSpectriPy(all, caf, param = ModifiedCosineParam())
## [,1] [,2]
## [1,] 1.00000000 0.1948181
## [2,] 0.19481813 1.0000000
## [3,] 0.05724816 0.3523997
For advanced users or developers, the rspec_to_pyspec
and pyspec_to_rspec
functions are available that enable
conversion between MS data representations in R
and Python
(i.e. between the Spectra
object and the Python
matchms
Spectrum object). To use these functions the
reticulate
package needs to be installed along with a
Python environment and the matchms
Python package.
To illustrate their use we initialize below the Python environment
that is bundled using the basilisk
package within SpectriPy
.
## Loading required package: reticulate
We next create a simple Spectra
object representing
fragment spectra for some small compounds.
library(Spectra)
# create example spectra
spd <- DataFrame(
msLevel = c(2L, 2L, 2L),
id = c("HMDB0000001", "HMDB0000001", "HMDB0001847"),
name = c("1-Methylhistidine", "1-Methylhistidine", "Caffeine"))
## Assign m/z and intensity values.
spd$mz <- list(
c(109.2, 124.2, 124.5, 170.16, 170.52),
c(83.1, 96.12, 97.14, 109.14, 124.08, 125.1, 170.16),
c(56.0494, 69.0447, 83.0603, 109.0395, 110.0712,
111.0551, 123.0429, 138.0662, 195.0876))
spd$intensity <- list(
c(3.407, 47.494, 3.094, 100.0, 13.240),
c(6.685, 4.381, 3.022, 16.708, 100.0, 4.565, 40.643),
c(0.459, 2.585, 2.446, 0.508, 8.968, 0.524, 0.974, 100.0, 40.994))
sps <- Spectra(spd)
We next convert the Spectra
to a matchms
Spectrum object.
## [<matchms.Spectrum.Spectrum>, <matchms.Spectrum.Spectrum>, <matchms.Spectrum.Spectrum>]
As a result we got now a Python list with 3 Spectrum objects
containing the peak data of sps
as well as a reduced set of
the available spectra variables in sps
. Which spectra
variables get copied to Python can be defined with the
mapping
parameter of rspec_to_pyspec
. The
default is to convert all variables defined by
spectraVariableMapping()
, but additional variables along
with their respective names in the Python Spectrum object can be defined
too. Below we list the pre-selected spectra variables that are converted
by default.
## precursorMz precursorIntensity
## "precursor_mz" "precursor_intensity"
## precursorCharge rtime
## "charge" "retention_time"
## collisionEnergy isolationWindowTargetMz
## "collision_energy" "isolation_window_target_mz"
## msLevel
## "ms_level"
With our Spectra
data converted to Python, we could
directly call all routines from the matchms
package using
the reticulate
R package. Below we normalize the
intensities of the 3 spectra using the from
normalize_intensities
functions from the
matchms.filtering
Python package. We thus need to import
first the functionality from that package and can then directly call the
function on the objects.
library(reticulate)
filters <- import("matchms.filtering")
res <- vector("list", length(pysps))
for (i in (seq_along(pysps) - 1))
res[[i + 1]] <- filters$normalize_intensities(pysps[i])
res <- r_to_py(res)
res
## [<matchms.Spectrum.Spectrum>, <matchms.Spectrum.Spectrum>, <matchms.Spectrum.Spectrum>]
We can now convert the list of Python Spectrum objects back to R with
pyspec_to_rspec
:
## NumericList of length 3
## [[1]] 0.03407 0.47494 0.03094 1 0.1324
## [[2]] 0.06685 0.04381 0.03022 0.16708 1 0.04565 0.40643
## [[3]] 0.00459 0.02585 0.02446 0.00508 0.08968 0.00524 0.00974 1 0.40994
## NumericList of length 3
## [[1]] 3.407 47.494 3.094 100 13.24
## [[2]] 6.685 4.381 3.022 16.708 100 4.565 40.643
## [[3]] 0.459 2.585 2.446 0.508 8.968 0.524 0.974 100 40.994
Intensity values have now been normalized to values between 0 and 1.
Note however that, it would be much better (and likely more efficient) to directly use a Python function on the list of Spectrum objects instead of the mixed R and Python code used in the example above. Below we define a simple python script that iterates over the spectra in python and performs the normalization.
py_script <- paste0("from matchms.filtering import normalize_intensities\n",
"for i in range(len(pysps)):\n",
" pysps[i] = normalize_intensities(pysps[i])\n")
py$pysps <- pysps
py_run_string(py_script)
tmp <- pyspec_to_rspec(pysps)
intensity(tmp)
## NumericList of length 3
## [[1]] 0.03407 0.47494 0.03094 1 0.1324
## [[2]] 0.06685 0.04381 0.03022 0.16708 1 0.04565 0.40643
## [[3]] 0.00459 0.02585 0.02446 0.00508 0.08968 0.00524 0.00974 1 0.40994
At very last we need also to stop the Python environment created by
basilisk
.
## R version 4.4.1 (2024-06-14)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Etc/UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] basilisk_1.17.2 reticulate_1.38.0 SpectriPy_0.1.1
## [4] Spectra_1.15.7 ProtGenerics_1.37.1 BiocParallel_1.39.0
## [7] S4Vectors_0.43.2 BiocGenerics_0.51.0 BiocStyle_2.33.1
##
## loaded via a namespace (and not attached):
## [1] Matrix_1.7-0 jsonlite_1.8.8 compiler_4.4.1
## [4] BiocManager_1.30.25 filelock_1.0.3 Rcpp_1.0.13
## [7] parallel_4.4.1 cluster_2.1.6 jquerylib_0.1.4
## [10] png_0.1-8 IRanges_2.39.2 yaml_2.3.10
## [13] fastmap_1.2.0 lattice_0.22-6 R6_2.5.1
## [16] knitr_1.48 MASS_7.3-61 maketools_1.3.0
## [19] bslib_0.8.0 rlang_1.1.4 cachem_1.1.0
## [22] dir.expiry_1.13.0 xfun_0.47 fs_1.6.4
## [25] MsCoreUtils_1.17.1 sass_0.4.9 sys_3.4.2
## [28] cli_3.6.3 withr_3.0.1 grid_4.4.1
## [31] digest_0.6.37 MetaboCoreUtils_1.13.0 lifecycle_1.0.4
## [34] clue_0.3-65 evaluate_0.24.0 codetools_0.2-20
## [37] buildtools_1.0.0 rmarkdown_2.28 basilisk.utils_1.17.2
## [40] tools_4.4.1 htmltools_0.5.8.1