MS2 fragment ions

Package: PSMatch
Authors: Laurent Gatto [aut, cre] (https://orcid.org/0000-0002-1520-2268), Johannes Rainer [aut] (https://orcid.org/0000-0002-6977-7147), Sebastian Gibb [aut] (https://orcid.org/0000-0001-7406-4443), Samuel Wieczorek [ctb], Thomas Burger [ctb]
Last modified: 2024-10-28 13:27:57.226887
Compiled: Mon Oct 28 13:32:01 2024

Introduction

This vignette is one among several illustrating how to use the PSMatch package, focusing on the calculation and visualisation of MS2 fragment ions. For a general overview of the package, see the PSMatch package manual page (?PSMatch) and references therein.

To illustrate this vignette, we will import and merge raw and identification data from the msdata. For details about this section, please visit the Spectra package webpage.

Load the raw MS data:

(spf <- msdata::proteomics(pattern = "2014", full.names = TRUE))
## [1] "/github/workspace/pkglib/msdata/proteomics/TMT_Erwinia_1uLSike_Top10HCD_isol2_45stepped_60min_01-20141210.mzML.gz"
library(Spectra)
sp <- Spectra(spf)

Load the identification data:

(idf <- msdata::ident(pattern = "2014", full.names = TRUE))
## [1] "/github/workspace/pkglib/msdata/ident/TMT_Erwinia_1uLSike_Top10HCD_isol2_45stepped_60min_01-20141210.mzid"
id <- PSM(idf) |> filterPSMs()
## Starting with 5802 PSMs:
## Removed 2896 decoy hits.
## Removed 155 PSMs with rank > 1.
## Removed 85 shared peptides.
## 2666 PSMs left.
id
## PSM with 2666 rows and 35 columns.
## names(35): sequence spectrumID ... subReplacementResidue subLocation

Merge both:

sp <- joinSpectraData(sp, id, by.x = "spectrumId", by.y = "spectrumID")
## Warning in joinSpectraData(sp, id, by.x = "spectrumId", by.y = "spectrumID"):
## Duplicates found in the 'y' key. Only last instance will be kept!
sp
## MSn data (Spectra) with 7534 spectra in a MsBackendMzR backend:
##        msLevel     rtime scanIndex
##      <integer> <numeric> <integer>
## 1            1    0.4584         1
## 2            1    0.9725         2
## 3            1    1.8524         3
## 4            1    2.7424         4
## 5            1    3.6124         5
## ...        ...       ...       ...
## 7530         2   3600.47      7530
## 7531         2   3600.83      7531
## 7532         2   3601.18      7532
## 7533         2   3601.57      7533
## 7534         2   3601.98      7534
##  ... 67 more variables/columns.
## 
## file(s):
## TMT_Erwinia_1uLSike_Top10HCD_isol2_45stepped_60min_01-20141210.mzML.gz

In this example, we are going to focus the MS2 scan with index 5449 and its parent MS1 scan (index 5447, selected automatically with the filterPrecursorScan() function).

sp5449 <- filterPrecursorScan(sp, 5449)
plotSpectra(sp5449[1], xlim = c(550, 1200))
abline(v = precursorMz(sp5449)[2], col = "red", lty = "dotted")

Calculating fragment ions

The MS2 scan was matched to SQILQQAGTSVLSQANQVPQTVLSLLR (there was obviously no match the the MS1 scan):

sp5449$sequence
## [1] NA                            "SQILQQAGTSVLSQANQVPQTVLSLLR"

The calculateFragments() simply takes a peptide sequence as input and returns a data.frame with the fragment sequences, M/Z, ion type, charge and position.

calculateFragments(sp5449$sequence[2])
## Modifications used: C=57.02146
##             mz  ion type pos z                        seq
## 1     88.03931   b1    b   1 1                          S
## 2    216.09789   b2    b   2 1                         SQ
## 3    329.18195   b3    b   3 1                        SQI
## 4    442.26601   b4    b   4 1                       SQIL
## 5    570.32459   b5    b   5 1                      SQILQ
## 6    698.38317   b6    b   6 1                     SQILQQ
## 7    769.42028   b7    b   7 1                    SQILQQA
## 8    826.44174   b8    b   8 1                   SQILQQAG
## 9    927.48942   b9    b   9 1                  SQILQQAGT
## 10  1014.52145  b10    b  10 1                 SQILQQAGTS
## 11  1113.58986  b11    b  11 1                SQILQQAGTSV
## 12  1226.67392  b12    b  12 1               SQILQQAGTSVL
## 13  1313.70595  b13    b  13 1              SQILQQAGTSVLS
## 14  1441.76453  b14    b  14 1             SQILQQAGTSVLSQ
## 15  1512.80164  b15    b  15 1            SQILQQAGTSVLSQA
## 16  1626.84457  b16    b  16 1           SQILQQAGTSVLSQAN
## 17  1754.90315  b17    b  17 1          SQILQQAGTSVLSQANQ
## 18  1853.97156  b18    b  18 1         SQILQQAGTSVLSQANQV
## 19  1951.02432  b19    b  19 1        SQILQQAGTSVLSQANQVP
## 20  2079.08290  b20    b  20 1       SQILQQAGTSVLSQANQVPQ
## 21  2180.13058  b21    b  21 1      SQILQQAGTSVLSQANQVPQT
## 22  2279.19899  b22    b  22 1     SQILQQAGTSVLSQANQVPQTV
## 23  2392.28305  b23    b  23 1    SQILQQAGTSVLSQANQVPQTVL
## 24  2479.31508  b24    b  24 1   SQILQQAGTSVLSQANQVPQTVLS
## 25  2592.39914  b25    b  25 1  SQILQQAGTSVLSQANQVPQTVLSL
## 26  2705.48320  b26    b  26 1 SQILQQAGTSVLSQANQVPQTVLSLL
## 27   175.11895   y1    y   1 1                          R
## 28   288.20301   y2    y   2 1                         LR
## 29   401.28707   y3    y   3 1                        LLR
## 30   488.31910   y4    y   4 1                       SLLR
## 31   601.40316   y5    y   5 1                      LSLLR
## 32   700.47157   y6    y   6 1                     VLSLLR
## 33   801.51925   y7    y   7 1                    TVLSLLR
## 34   929.57783   y8    y   8 1                   QTVLSLLR
## 35  1026.63059   y9    y   9 1                  PQTVLSLLR
## 36  1125.69900  y10    y  10 1                 VPQTVLSLLR
## 37  1253.75758  y11    y  11 1                QVPQTVLSLLR
## 38  1367.80051  y12    y  12 1               NQVPQTVLSLLR
## 39  1438.83762  y13    y  13 1              ANQVPQTVLSLLR
## 40  1566.89620  y14    y  14 1             QANQVPQTVLSLLR
## 41  1653.92823  y15    y  15 1            SQANQVPQTVLSLLR
## 42  1767.01229  y16    y  16 1           LSQANQVPQTVLSLLR
## 43  1866.08070  y17    y  17 1          VLSQANQVPQTVLSLLR
## 44  1953.11273  y18    y  18 1         SVLSQANQVPQTVLSLLR
## 45  2054.16041  y19    y  19 1        TSVLSQANQVPQTVLSLLR
## 46  2111.18187  y20    y  20 1       GTSVLSQANQVPQTVLSLLR
## 47  2182.21898  y21    y  21 1      AGTSVLSQANQVPQTVLSLLR
## 48  2310.27756  y22    y  22 1     QAGTSVLSQANQVPQTVLSLLR
## 49  2438.33614  y23    y  23 1    QQAGTSVLSQANQVPQTVLSLLR
## 50  2551.42020  y24    y  24 1   LQQAGTSVLSQANQVPQTVLSLLR
## 51  2664.50426  y25    y  25 1  ILQQAGTSVLSQANQVPQTVLSLLR
## 52  2792.56284  y26    y  26 1 QILQQAGTSVLSQANQVPQTVLSLLR
## 53   996.51088 b10_   b_  10 1                 SQILQQAGTS
## 54  1095.57929 b11_   b_  11 1                SQILQQAGTSV
## 55  1208.66335 b12_   b_  12 1               SQILQQAGTSVL
## 56  1295.69538 b13_   b_  13 1              SQILQQAGTSVLS
## 57  1423.75396 b14_   b_  14 1             SQILQQAGTSVLSQ
## 58  1494.79107 b15_   b_  15 1            SQILQQAGTSVLSQA
## 59  1608.83400 b16_   b_  16 1           SQILQQAGTSVLSQAN
## 60  1736.89258 b17_   b_  17 1          SQILQQAGTSVLSQANQ
## 61  1835.96099 b18_   b_  18 1         SQILQQAGTSVLSQANQV
## 62  1933.01375 b19_   b_  19 1        SQILQQAGTSVLSQANQVP
## 63  2061.07233 b20_   b_  20 1       SQILQQAGTSVLSQANQVPQ
## 64  2162.12001 b21_   b_  21 1      SQILQQAGTSVLSQANQVPQT
## 65  2261.18842 b22_   b_  22 1     SQILQQAGTSVLSQANQVPQTV
## 66  2374.27248 b23_   b_  23 1    SQILQQAGTSVLSQANQVPQTVL
## 67  2461.30451 b24_   b_  24 1   SQILQQAGTSVLSQANQVPQTVLS
## 68  2574.38857 b25_   b_  25 1  SQILQQAGTSVLSQANQVPQTVLSL
## 69  2687.47263 b26_   b_  26 1 SQILQQAGTSVLSQANQVPQTVLSLL
## 70   583.39260  y5_   y_   5 1                      LSLLR
## 71   682.46101  y6_   y_   6 1                     VLSLLR
## 72   783.50869  y7_   y_   7 1                    TVLSLLR
## 73   911.56727  y8_   y_   8 1                   QTVLSLLR
## 74  1008.62003  y9_   y_   9 1                  PQTVLSLLR
## 75  1107.68844 y10_   y_  10 1                 VPQTVLSLLR
## 76  1235.74702 y11_   y_  11 1                QVPQTVLSLLR
## 77  1349.78995 y12_   y_  12 1               NQVPQTVLSLLR
## 78  1420.82706 y13_   y_  13 1              ANQVPQTVLSLLR
## 79  1548.88564 y14_   y_  14 1             QANQVPQTVLSLLR
## 80  1635.91767 y15_   y_  15 1            SQANQVPQTVLSLLR
## 81  1749.00173 y16_   y_  16 1           LSQANQVPQTVLSLLR
## 82  1848.07014 y17_   y_  17 1          VLSQANQVPQTVLSLLR
## 83  1935.10217 y18_   y_  18 1         SVLSQANQVPQTVLSLLR
## 84  2036.14985 y19_   y_  19 1        TSVLSQANQVPQTVLSLLR
## 85  2093.17131 y20_   y_  20 1       GTSVLSQANQVPQTVLSLLR
## 86  2164.20842 y21_   y_  21 1      AGTSVLSQANQVPQTVLSLLR
## 87  2292.26700 y22_   y_  22 1     QAGTSVLSQANQVPQTVLSLLR
## 88  2420.32558 y23_   y_  23 1    QQAGTSVLSQANQVPQTVLSLLR
## 89  2533.40964 y24_   y_  24 1   LQQAGTSVLSQANQVPQTVLSLLR
## 90  2646.49370 y25_   y_  25 1  ILQQAGTSVLSQANQVPQTVLSLLR
## 91  2774.55228 y26_   y_  26 1 QILQQAGTSVLSQANQVPQTVLSLLR
## 92   157.10839  y1_   y_   1 1                          R
## 93   270.19245  y2_   y_   2 1                         LR
## 94   383.27651  y3_   y_   3 1                        LLR
## 95   470.30854  y4_   y_   4 1                       SLLR
## 96   312.15540  b3*   b*   3 1                        SQI
## 97   425.23946  b4*   b*   4 1                       SQIL
## 98   553.29804  b5*   b*   5 1                      SQILQ
## 99   681.35662  b6*   b*   6 1                     SQILQQ
## 100  752.39373  b7*   b*   7 1                    SQILQQA
## 101  809.41519  b8*   b*   8 1                   SQILQQAG
## 102  910.46287  b9*   b*   9 1                  SQILQQAGT
## 103  997.49490 b10*   b*  10 1                 SQILQQAGTS
## 104 1096.56331 b11*   b*  11 1                SQILQQAGTSV
## 105 1209.64737 b12*   b*  12 1               SQILQQAGTSVL
## 106 1296.67940 b13*   b*  13 1              SQILQQAGTSVLS
## 107 1424.73798 b14*   b*  14 1             SQILQQAGTSVLSQ
## 108 1495.77509 b15*   b*  15 1            SQILQQAGTSVLSQA
## 109 1609.81802 b16*   b*  16 1           SQILQQAGTSVLSQAN
## 110 1737.87660 b17*   b*  17 1          SQILQQAGTSVLSQANQ
## 111 1836.94501 b18*   b*  18 1         SQILQQAGTSVLSQANQV
## 112 1933.99777 b19*   b*  19 1        SQILQQAGTSVLSQANQVP
## 113 2062.05635 b20*   b*  20 1       SQILQQAGTSVLSQANQVPQ
## 114 2163.10403 b21*   b*  21 1      SQILQQAGTSVLSQANQVPQT
## 115 2262.17244 b22*   b*  22 1     SQILQQAGTSVLSQANQVPQTV
## 116 2375.25650 b23*   b*  23 1    SQILQQAGTSVLSQANQVPQTVL
## 117 2462.28853 b24*   b*  24 1   SQILQQAGTSVLSQANQVPQTVLS
## 118 2575.37259 b25*   b*  25 1  SQILQQAGTSVLSQANQVPQTVLSL
## 119 2688.45665 b26*   b*  26 1 SQILQQAGTSVLSQANQVPQTVLSLL
## 120  912.55128  y8*   y*   8 1                   QTVLSLLR
## 121 1009.60404  y9*   y*   9 1                  PQTVLSLLR
## 122 1108.67245 y10*   y*  10 1                 VPQTVLSLLR
## 123 1236.73103 y11*   y*  11 1                QVPQTVLSLLR
## 124 1350.77396 y12*   y*  12 1               NQVPQTVLSLLR
## 125 1421.81107 y13*   y*  13 1              ANQVPQTVLSLLR
## 126 1549.86965 y14*   y*  14 1             QANQVPQTVLSLLR
## 127 1636.90168 y15*   y*  15 1            SQANQVPQTVLSLLR
## 128 1749.98574 y16*   y*  16 1           LSQANQVPQTVLSLLR
## 129 1849.05415 y17*   y*  17 1          VLSQANQVPQTVLSLLR
## 130 1936.08618 y18*   y*  18 1         SVLSQANQVPQTVLSLLR
## 131 2037.13386 y19*   y*  19 1        TSVLSQANQVPQTVLSLLR
## 132 2094.15532 y20*   y*  20 1       GTSVLSQANQVPQTVLSLLR
## 133 2165.19243 y21*   y*  21 1      AGTSVLSQANQVPQTVLSLLR
## 134 2293.25101 y22*   y*  22 1     QAGTSVLSQANQVPQTVLSLLR
## 135 2421.30959 y23*   y*  23 1    QQAGTSVLSQANQVPQTVLSLLR
## 136 2534.39365 y24*   y*  24 1   LQQAGTSVLSQANQVPQTVLSLLR
## 137 2647.47771 y25*   y*  25 1  ILQQAGTSVLSQANQVPQTVLSLLR
## 138 2775.53629 y26*   y*  26 1 QILQQAGTSVLSQANQVPQTVLSLLR

Visualising fragment ions

We can now visualise these fragments directly on the MS spectrum. Let’s first visualise the spectrum as is:

plotSpectra(sp5449[2])

plotSpectra(sp5449[2], labels = addFragments, labelPos = 3)

Session information

sessionInfo()
## R version 4.4.1 (2024-06-14)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3 
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so;  LAPACK version 3.12.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: Etc/UTC
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] Spectra_1.15.13             BiocParallel_1.39.0        
##  [3] factoextra_1.0.7            ggplot2_3.5.1              
##  [5] QFeatures_1.15.3            MultiAssayExperiment_1.31.5
##  [7] SummarizedExperiment_1.35.5 Biobase_2.65.1             
##  [9] GenomicRanges_1.57.2        GenomeInfoDb_1.41.2        
## [11] IRanges_2.39.2              MatrixGenerics_1.17.1      
## [13] matrixStats_1.4.1           PSMatch_1.9.1              
## [15] S4Vectors_0.43.2            BiocGenerics_0.51.3        
## [17] BiocStyle_2.33.1           
## 
## loaded via a namespace (and not attached):
##  [1] rlang_1.1.4             magrittr_2.0.3          clue_0.3-65            
##  [4] compiler_4.4.1          vctrs_0.6.5             reshape2_1.4.4         
##  [7] stringr_1.5.1           ProtGenerics_1.37.1     pkgconfig_2.0.3        
## [10] MetaboCoreUtils_1.13.1  crayon_1.5.3            fastmap_1.2.0          
## [13] backports_1.5.0         XVector_0.45.0          labeling_0.4.3         
## [16] utf8_1.2.4              rmarkdown_2.28          UCSC.utils_1.1.0       
## [19] purrr_1.0.2             xfun_0.48               zlibbioc_1.51.2        
## [22] cachem_1.1.0            jsonlite_1.8.9          highr_0.11             
## [25] DelayedArray_0.31.14    broom_1.0.7             parallel_4.4.1         
## [28] cluster_2.1.6           R6_2.5.1                bslib_0.8.0            
## [31] stringi_1.8.4           car_3.1-3               jquerylib_0.1.4        
## [34] Rcpp_1.0.13             knitr_1.48              Matrix_1.7-1           
## [37] igraph_2.1.1            tidyselect_1.2.1        abind_1.4-8            
## [40] yaml_2.3.10             codetools_0.2-20        lattice_0.22-6         
## [43] tibble_3.2.1            plyr_1.8.9              withr_3.0.1            
## [46] evaluate_1.0.1          pillar_1.9.0            BiocManager_1.30.25    
## [49] ggpubr_0.6.0            carData_3.0-5           ncdf4_1.23             
## [52] generics_0.1.3          munsell_0.5.1           scales_1.3.0           
## [55] glue_1.8.0              lazyeval_0.2.2          maketools_1.3.1        
## [58] tools_4.4.1             sys_3.4.3               mzR_2.39.2             
## [61] ggsignif_0.6.4          buildtools_1.0.0        fs_1.6.4               
## [64] grid_4.4.1              tidyr_1.3.1             MsCoreUtils_1.17.3     
## [67] msdata_0.45.0           colorspace_2.1-1        GenomeInfoDbData_1.2.13
## [70] Formula_1.2-5           cli_3.6.3               fansi_1.0.6            
## [73] S4Arrays_1.5.11         dplyr_1.1.4             AnnotationFilter_1.29.0
## [76] gtable_0.3.6            rstatix_0.7.2           sass_0.4.9             
## [79] digest_0.6.37           SparseArray_1.5.45      ggrepel_0.9.6          
## [82] farver_2.1.2            htmltools_0.5.8.1       lifecycle_1.0.4        
## [85] httr_1.4.7              MASS_7.3-61