Package: PSMatch
Authors: Laurent Gatto [aut, cre] (https://orcid.org/0000-0002-1520-2268), Johannes Rainer
[aut] (https://orcid.org/0000-0002-6977-7147), Sebastian Gibb
[aut] (https://orcid.org/0000-0001-7406-4443), Samuel Wieczorek
[ctb], Thomas Burger [ctb]
Last modified:
2024-10-28 13:27:57.226887
Compiled: Mon Oct 28
13:32:01 2024
This vignette is one among several illustrating how to use the
PSMatch
package, focusing on the calculation and
visualisation of MS2 fragment ions. For a general overview of the
package, see the PSMatch
package manual page
(?PSMatch
) and references therein.
To illustrate this vignette, we will import and merge raw and identification data from the msdata. For details about this section, please visit the Spectra package webpage.
Load the raw MS data:
## [1] "/github/workspace/pkglib/msdata/proteomics/TMT_Erwinia_1uLSike_Top10HCD_isol2_45stepped_60min_01-20141210.mzML.gz"
Load the identification data:
## [1] "/github/workspace/pkglib/msdata/ident/TMT_Erwinia_1uLSike_Top10HCD_isol2_45stepped_60min_01-20141210.mzid"
## Starting with 5802 PSMs:
## Removed 2896 decoy hits.
## Removed 155 PSMs with rank > 1.
## Removed 85 shared peptides.
## 2666 PSMs left.
## PSM with 2666 rows and 35 columns.
## names(35): sequence spectrumID ... subReplacementResidue subLocation
Merge both:
## Warning in joinSpectraData(sp, id, by.x = "spectrumId", by.y = "spectrumID"):
## Duplicates found in the 'y' key. Only last instance will be kept!
## MSn data (Spectra) with 7534 spectra in a MsBackendMzR backend:
## msLevel rtime scanIndex
## <integer> <numeric> <integer>
## 1 1 0.4584 1
## 2 1 0.9725 2
## 3 1 1.8524 3
## 4 1 2.7424 4
## 5 1 3.6124 5
## ... ... ... ...
## 7530 2 3600.47 7530
## 7531 2 3600.83 7531
## 7532 2 3601.18 7532
## 7533 2 3601.57 7533
## 7534 2 3601.98 7534
## ... 67 more variables/columns.
##
## file(s):
## TMT_Erwinia_1uLSike_Top10HCD_isol2_45stepped_60min_01-20141210.mzML.gz
In this example, we are going to focus the MS2 scan with index 5449 and its parent MS1 scan (index 5447, selected automatically with the filterPrecursorScan() function).
The MS2 scan was matched to SQILQQAGTSVLSQANQVPQTVLSLLR
(there was obviously no match the the MS1 scan):
## [1] NA "SQILQQAGTSVLSQANQVPQTVLSLLR"
The calculateFragments()
simply takes a peptide sequence
as input and returns a data.frame
with the fragment
sequences, M/Z, ion type, charge and position.
## Modifications used: C=57.02146
## mz ion type pos z seq
## 1 88.03931 b1 b 1 1 S
## 2 216.09789 b2 b 2 1 SQ
## 3 329.18195 b3 b 3 1 SQI
## 4 442.26601 b4 b 4 1 SQIL
## 5 570.32459 b5 b 5 1 SQILQ
## 6 698.38317 b6 b 6 1 SQILQQ
## 7 769.42028 b7 b 7 1 SQILQQA
## 8 826.44174 b8 b 8 1 SQILQQAG
## 9 927.48942 b9 b 9 1 SQILQQAGT
## 10 1014.52145 b10 b 10 1 SQILQQAGTS
## 11 1113.58986 b11 b 11 1 SQILQQAGTSV
## 12 1226.67392 b12 b 12 1 SQILQQAGTSVL
## 13 1313.70595 b13 b 13 1 SQILQQAGTSVLS
## 14 1441.76453 b14 b 14 1 SQILQQAGTSVLSQ
## 15 1512.80164 b15 b 15 1 SQILQQAGTSVLSQA
## 16 1626.84457 b16 b 16 1 SQILQQAGTSVLSQAN
## 17 1754.90315 b17 b 17 1 SQILQQAGTSVLSQANQ
## 18 1853.97156 b18 b 18 1 SQILQQAGTSVLSQANQV
## 19 1951.02432 b19 b 19 1 SQILQQAGTSVLSQANQVP
## 20 2079.08290 b20 b 20 1 SQILQQAGTSVLSQANQVPQ
## 21 2180.13058 b21 b 21 1 SQILQQAGTSVLSQANQVPQT
## 22 2279.19899 b22 b 22 1 SQILQQAGTSVLSQANQVPQTV
## 23 2392.28305 b23 b 23 1 SQILQQAGTSVLSQANQVPQTVL
## 24 2479.31508 b24 b 24 1 SQILQQAGTSVLSQANQVPQTVLS
## 25 2592.39914 b25 b 25 1 SQILQQAGTSVLSQANQVPQTVLSL
## 26 2705.48320 b26 b 26 1 SQILQQAGTSVLSQANQVPQTVLSLL
## 27 175.11895 y1 y 1 1 R
## 28 288.20301 y2 y 2 1 LR
## 29 401.28707 y3 y 3 1 LLR
## 30 488.31910 y4 y 4 1 SLLR
## 31 601.40316 y5 y 5 1 LSLLR
## 32 700.47157 y6 y 6 1 VLSLLR
## 33 801.51925 y7 y 7 1 TVLSLLR
## 34 929.57783 y8 y 8 1 QTVLSLLR
## 35 1026.63059 y9 y 9 1 PQTVLSLLR
## 36 1125.69900 y10 y 10 1 VPQTVLSLLR
## 37 1253.75758 y11 y 11 1 QVPQTVLSLLR
## 38 1367.80051 y12 y 12 1 NQVPQTVLSLLR
## 39 1438.83762 y13 y 13 1 ANQVPQTVLSLLR
## 40 1566.89620 y14 y 14 1 QANQVPQTVLSLLR
## 41 1653.92823 y15 y 15 1 SQANQVPQTVLSLLR
## 42 1767.01229 y16 y 16 1 LSQANQVPQTVLSLLR
## 43 1866.08070 y17 y 17 1 VLSQANQVPQTVLSLLR
## 44 1953.11273 y18 y 18 1 SVLSQANQVPQTVLSLLR
## 45 2054.16041 y19 y 19 1 TSVLSQANQVPQTVLSLLR
## 46 2111.18187 y20 y 20 1 GTSVLSQANQVPQTVLSLLR
## 47 2182.21898 y21 y 21 1 AGTSVLSQANQVPQTVLSLLR
## 48 2310.27756 y22 y 22 1 QAGTSVLSQANQVPQTVLSLLR
## 49 2438.33614 y23 y 23 1 QQAGTSVLSQANQVPQTVLSLLR
## 50 2551.42020 y24 y 24 1 LQQAGTSVLSQANQVPQTVLSLLR
## 51 2664.50426 y25 y 25 1 ILQQAGTSVLSQANQVPQTVLSLLR
## 52 2792.56284 y26 y 26 1 QILQQAGTSVLSQANQVPQTVLSLLR
## 53 996.51088 b10_ b_ 10 1 SQILQQAGTS
## 54 1095.57929 b11_ b_ 11 1 SQILQQAGTSV
## 55 1208.66335 b12_ b_ 12 1 SQILQQAGTSVL
## 56 1295.69538 b13_ b_ 13 1 SQILQQAGTSVLS
## 57 1423.75396 b14_ b_ 14 1 SQILQQAGTSVLSQ
## 58 1494.79107 b15_ b_ 15 1 SQILQQAGTSVLSQA
## 59 1608.83400 b16_ b_ 16 1 SQILQQAGTSVLSQAN
## 60 1736.89258 b17_ b_ 17 1 SQILQQAGTSVLSQANQ
## 61 1835.96099 b18_ b_ 18 1 SQILQQAGTSVLSQANQV
## 62 1933.01375 b19_ b_ 19 1 SQILQQAGTSVLSQANQVP
## 63 2061.07233 b20_ b_ 20 1 SQILQQAGTSVLSQANQVPQ
## 64 2162.12001 b21_ b_ 21 1 SQILQQAGTSVLSQANQVPQT
## 65 2261.18842 b22_ b_ 22 1 SQILQQAGTSVLSQANQVPQTV
## 66 2374.27248 b23_ b_ 23 1 SQILQQAGTSVLSQANQVPQTVL
## 67 2461.30451 b24_ b_ 24 1 SQILQQAGTSVLSQANQVPQTVLS
## 68 2574.38857 b25_ b_ 25 1 SQILQQAGTSVLSQANQVPQTVLSL
## 69 2687.47263 b26_ b_ 26 1 SQILQQAGTSVLSQANQVPQTVLSLL
## 70 583.39260 y5_ y_ 5 1 LSLLR
## 71 682.46101 y6_ y_ 6 1 VLSLLR
## 72 783.50869 y7_ y_ 7 1 TVLSLLR
## 73 911.56727 y8_ y_ 8 1 QTVLSLLR
## 74 1008.62003 y9_ y_ 9 1 PQTVLSLLR
## 75 1107.68844 y10_ y_ 10 1 VPQTVLSLLR
## 76 1235.74702 y11_ y_ 11 1 QVPQTVLSLLR
## 77 1349.78995 y12_ y_ 12 1 NQVPQTVLSLLR
## 78 1420.82706 y13_ y_ 13 1 ANQVPQTVLSLLR
## 79 1548.88564 y14_ y_ 14 1 QANQVPQTVLSLLR
## 80 1635.91767 y15_ y_ 15 1 SQANQVPQTVLSLLR
## 81 1749.00173 y16_ y_ 16 1 LSQANQVPQTVLSLLR
## 82 1848.07014 y17_ y_ 17 1 VLSQANQVPQTVLSLLR
## 83 1935.10217 y18_ y_ 18 1 SVLSQANQVPQTVLSLLR
## 84 2036.14985 y19_ y_ 19 1 TSVLSQANQVPQTVLSLLR
## 85 2093.17131 y20_ y_ 20 1 GTSVLSQANQVPQTVLSLLR
## 86 2164.20842 y21_ y_ 21 1 AGTSVLSQANQVPQTVLSLLR
## 87 2292.26700 y22_ y_ 22 1 QAGTSVLSQANQVPQTVLSLLR
## 88 2420.32558 y23_ y_ 23 1 QQAGTSVLSQANQVPQTVLSLLR
## 89 2533.40964 y24_ y_ 24 1 LQQAGTSVLSQANQVPQTVLSLLR
## 90 2646.49370 y25_ y_ 25 1 ILQQAGTSVLSQANQVPQTVLSLLR
## 91 2774.55228 y26_ y_ 26 1 QILQQAGTSVLSQANQVPQTVLSLLR
## 92 157.10839 y1_ y_ 1 1 R
## 93 270.19245 y2_ y_ 2 1 LR
## 94 383.27651 y3_ y_ 3 1 LLR
## 95 470.30854 y4_ y_ 4 1 SLLR
## 96 312.15540 b3* b* 3 1 SQI
## 97 425.23946 b4* b* 4 1 SQIL
## 98 553.29804 b5* b* 5 1 SQILQ
## 99 681.35662 b6* b* 6 1 SQILQQ
## 100 752.39373 b7* b* 7 1 SQILQQA
## 101 809.41519 b8* b* 8 1 SQILQQAG
## 102 910.46287 b9* b* 9 1 SQILQQAGT
## 103 997.49490 b10* b* 10 1 SQILQQAGTS
## 104 1096.56331 b11* b* 11 1 SQILQQAGTSV
## 105 1209.64737 b12* b* 12 1 SQILQQAGTSVL
## 106 1296.67940 b13* b* 13 1 SQILQQAGTSVLS
## 107 1424.73798 b14* b* 14 1 SQILQQAGTSVLSQ
## 108 1495.77509 b15* b* 15 1 SQILQQAGTSVLSQA
## 109 1609.81802 b16* b* 16 1 SQILQQAGTSVLSQAN
## 110 1737.87660 b17* b* 17 1 SQILQQAGTSVLSQANQ
## 111 1836.94501 b18* b* 18 1 SQILQQAGTSVLSQANQV
## 112 1933.99777 b19* b* 19 1 SQILQQAGTSVLSQANQVP
## 113 2062.05635 b20* b* 20 1 SQILQQAGTSVLSQANQVPQ
## 114 2163.10403 b21* b* 21 1 SQILQQAGTSVLSQANQVPQT
## 115 2262.17244 b22* b* 22 1 SQILQQAGTSVLSQANQVPQTV
## 116 2375.25650 b23* b* 23 1 SQILQQAGTSVLSQANQVPQTVL
## 117 2462.28853 b24* b* 24 1 SQILQQAGTSVLSQANQVPQTVLS
## 118 2575.37259 b25* b* 25 1 SQILQQAGTSVLSQANQVPQTVLSL
## 119 2688.45665 b26* b* 26 1 SQILQQAGTSVLSQANQVPQTVLSLL
## 120 912.55128 y8* y* 8 1 QTVLSLLR
## 121 1009.60404 y9* y* 9 1 PQTVLSLLR
## 122 1108.67245 y10* y* 10 1 VPQTVLSLLR
## 123 1236.73103 y11* y* 11 1 QVPQTVLSLLR
## 124 1350.77396 y12* y* 12 1 NQVPQTVLSLLR
## 125 1421.81107 y13* y* 13 1 ANQVPQTVLSLLR
## 126 1549.86965 y14* y* 14 1 QANQVPQTVLSLLR
## 127 1636.90168 y15* y* 15 1 SQANQVPQTVLSLLR
## 128 1749.98574 y16* y* 16 1 LSQANQVPQTVLSLLR
## 129 1849.05415 y17* y* 17 1 VLSQANQVPQTVLSLLR
## 130 1936.08618 y18* y* 18 1 SVLSQANQVPQTVLSLLR
## 131 2037.13386 y19* y* 19 1 TSVLSQANQVPQTVLSLLR
## 132 2094.15532 y20* y* 20 1 GTSVLSQANQVPQTVLSLLR
## 133 2165.19243 y21* y* 21 1 AGTSVLSQANQVPQTVLSLLR
## 134 2293.25101 y22* y* 22 1 QAGTSVLSQANQVPQTVLSLLR
## 135 2421.30959 y23* y* 23 1 QQAGTSVLSQANQVPQTVLSLLR
## 136 2534.39365 y24* y* 24 1 LQQAGTSVLSQANQVPQTVLSLLR
## 137 2647.47771 y25* y* 25 1 ILQQAGTSVLSQANQVPQTVLSLLR
## 138 2775.53629 y26* y* 26 1 QILQQAGTSVLSQANQVPQTVLSLLR
We can now visualise these fragments directly on the MS spectrum. Let’s first visualise the spectrum as is:
## R version 4.4.1 (2024-06-14)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Etc/UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] Spectra_1.15.13 BiocParallel_1.39.0
## [3] factoextra_1.0.7 ggplot2_3.5.1
## [5] QFeatures_1.15.3 MultiAssayExperiment_1.31.5
## [7] SummarizedExperiment_1.35.5 Biobase_2.65.1
## [9] GenomicRanges_1.57.2 GenomeInfoDb_1.41.2
## [11] IRanges_2.39.2 MatrixGenerics_1.17.1
## [13] matrixStats_1.4.1 PSMatch_1.9.1
## [15] S4Vectors_0.43.2 BiocGenerics_0.51.3
## [17] BiocStyle_2.33.1
##
## loaded via a namespace (and not attached):
## [1] rlang_1.1.4 magrittr_2.0.3 clue_0.3-65
## [4] compiler_4.4.1 vctrs_0.6.5 reshape2_1.4.4
## [7] stringr_1.5.1 ProtGenerics_1.37.1 pkgconfig_2.0.3
## [10] MetaboCoreUtils_1.13.1 crayon_1.5.3 fastmap_1.2.0
## [13] backports_1.5.0 XVector_0.45.0 labeling_0.4.3
## [16] utf8_1.2.4 rmarkdown_2.28 UCSC.utils_1.1.0
## [19] purrr_1.0.2 xfun_0.48 zlibbioc_1.51.2
## [22] cachem_1.1.0 jsonlite_1.8.9 highr_0.11
## [25] DelayedArray_0.31.14 broom_1.0.7 parallel_4.4.1
## [28] cluster_2.1.6 R6_2.5.1 bslib_0.8.0
## [31] stringi_1.8.4 car_3.1-3 jquerylib_0.1.4
## [34] Rcpp_1.0.13 knitr_1.48 Matrix_1.7-1
## [37] igraph_2.1.1 tidyselect_1.2.1 abind_1.4-8
## [40] yaml_2.3.10 codetools_0.2-20 lattice_0.22-6
## [43] tibble_3.2.1 plyr_1.8.9 withr_3.0.1
## [46] evaluate_1.0.1 pillar_1.9.0 BiocManager_1.30.25
## [49] ggpubr_0.6.0 carData_3.0-5 ncdf4_1.23
## [52] generics_0.1.3 munsell_0.5.1 scales_1.3.0
## [55] glue_1.8.0 lazyeval_0.2.2 maketools_1.3.1
## [58] tools_4.4.1 sys_3.4.3 mzR_2.39.2
## [61] ggsignif_0.6.4 buildtools_1.0.0 fs_1.6.4
## [64] grid_4.4.1 tidyr_1.3.1 MsCoreUtils_1.17.3
## [67] msdata_0.45.0 colorspace_2.1-1 GenomeInfoDbData_1.2.13
## [70] Formula_1.2-5 cli_3.6.3 fansi_1.0.6
## [73] S4Arrays_1.5.11 dplyr_1.1.4 AnnotationFilter_1.29.0
## [76] gtable_0.3.6 rstatix_0.7.2 sass_0.4.9
## [79] digest_0.6.37 SparseArray_1.5.45 ggrepel_0.9.6
## [82] farver_2.1.2 htmltools_0.5.8.1 lifecycle_1.0.4
## [85] httr_1.4.7 MASS_7.3-61