---
title: "Importing Spectra into Sirius"
output:
  BiocStyle::html_document:
    toc_float: true
vignette: >
  %\VignetteIndexEntry{Importing Spectra into Sirius}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
  %\VignettePackage{RuSirius}
  %\VignetteDepends{Spectra, RSirius, RuSirius, MsDataHub}
---

``` r
library(RuSirius)
library(MsDataHub)
library(Spectra)
```

## Introduction

**Note**: this vignette is [**pre-computed**](https://ropensci.org/blog/2019/12/08/precompute-vignettes/). See the session info for information on packages used and the date the vignette was rendered. The vignette requires a running [Sirius](https://bio.informatik.uni-jena.de/software/sirius/) instance. To reproduce this analysis, you will need Sirius 6.3 installed and running.

This vignette demonstrates a basic workflow for importing MS data stored in a `Spectra` object into *Sirius*. It then runs Sirius's main tools (formula identification, structure database search, compound class prediction, spectral library matching, and *de novo* structure prediction) and finally retrieves the results.

This is a foundational example and does not cover all the possible parameters for each Sirius tool. For detailed parameter information, consult the `run()` function documentation. More information can be found in the [Sirius documentation online](https://v6.docs.sirius-ms.io/).

**IMPORTANT:** This is a work in progress. Feedback is highly valued, especially regarding enhancements or additions that could simplify your workflow. Your input as a user is essential.

## Prepping Spectra object

Below we load the example mass spectrometry (MS) data, provided by *MsDataHub*, as a `Spectra` object:

``` r
dda_file <- MsDataHub::PestMix1_DDA.mzML()
sp <- Spectra(dda_file)
sp <- setBackend(sp, MsBackendMemory())
sp <- filterEmptySpectra(sp)
```

To import the `Spectra` data into *Sirius*, it must be preprocessed. If spectra from multiple MS levels are present, we need to group them appropriately.
We use the `fragmentGroupIndex()` function to assign an index to each spectrum. MS2 spectra that belong to the same MS1 spectrum will share the same index. See `?fragmentGroupIndex` for details on how these spectra groups are defined.

``` r
sp |>
    msLevel() |>
    table()
#>
#>    1    2
#> 4627 2756
idxs <- fragmentGroupIndex(sp)
sp$Msn_idx <- idxs
```

## Open Sirius and project set up

The Sirius application is initialized via the API, requiring only a project ID. If the project exists, it is opened; otherwise, a new project is created. The `srs` object acts as the connection to Sirius and holds project details. It is needed for any task that communicates with the application. Shut down the connection properly with `shutdown(srs)` after completing your work. You can learn more about this object class by running `?Sirius` in the console.

If the `path` parameter is not specified, Sirius will by default try to save your project in the `sirius_projects` folder in your user directory. Note that this folder will *not* be created automatically. To save the project somewhere else, specify the `path =` parameter; below we use the current working directory.

``` r
srs <- Sirius(projectId = "test_spectra", path = getwd())
```

You could import the entire `Spectra` object, but for demonstration purposes, we will use selected examples. Here, we import two MS1-MS2 pairs and one MS1 spectrum on its own. It's also possible to import only MS2 spectra.

When importing, the `ms_column_name` parameter defines which column contains the index that groups the spectra. Each such group is considered one *feature* in Sirius terminology.

``` r
sp_subset <- sp[sp$Msn_idx %in% c(421, 707, 895)]
srs <- import(sirius = srs,
              spectra = sp_subset,
              ms_column_name = "Msn_idx",
              deleteExistingFeatures = TRUE)

## See information about the features
featuresInfo(srs)
```

## Submit job to Sirius - For structure DB search

Once data is imported, annotation and prediction can begin. The `run()` function accepts parameters for each Sirius tool, such as formula identification, structure database search, and compound class prediction.

``` r
## Start computation
run(srs,
    fallbackAdducts = c("[M + H]+", "[M + Na]+"),
    formulaIdParams = formulaIdParam(numberOfCandidates = 10,
                                     instrument = "QTOF",
                                     numberOfCandidatesPerIonization = 3,
                                     massAccuracyMS2ppm = 10,
                                     filterByIsotopePattern = FALSE,
                                     isotopeMs2Settings = c("SCORE"),
                                     performDeNovoBelowMz = 600,
                                     minPeaksToInjectSpecLibMatch = 3),
    predictParams = predictParam(),
    structureDbSearchParams = structureDbSearchParam(
        structureSearchDbs = c("BIO")
    ),
    recompute = TRUE,
    wait = TRUE
)

## Feature information after the computation
info <- featuresInfo(srs)
```

## Retrieve Results

To get a summary of all results, including top formulas, structures, and compound class predictions, use the following:

``` r
summarytb <- summary(srs, result.type = "structure")
```

This summary table offers a quick overview for checking whether the predictions meet expectations. However, we recommend not relying solely on it for in-depth analysis. Instead, use the more detailed result functions (see `?results`). Key columns include confidence scores that help assess result reliability.

## De novo structure description

``` r
## Run de novo structure prediction (MSNovelist)
run(srs,
    msNovelistParams = deNovoStructureParam(numberOfCandidateToPredict = 5),
    recompute = FALSE,
    wait = TRUE
)

summaryDeNovo <- summary(srs, result.type = "deNovo")
```

Interestingly, for the first feature, the results remain consistent, while for the second, which originally had lower confidence, the predictions now differ.

For a visual exploration of results, you can open the Sirius GUI with `openGUI(srs)` and close it again with `closeGUI(srs)`. Once you are done, shut down the connection:

``` r
# openGUI(srs)
# closeGUI(srs)
shutdown(srs)
```

For details on retrieving the other results, see the `?results` documentation or the other vignette.

## Importing MS2-only or MSn-only data

In some workflows, only MS2 (or MS2 and MS3) spectra are available, for example when working with spectral libraries, MGF files, or data that was acquired without recording MS1 scans. The SIRIUS API fully supports importing features without MS1 data.

When no MS1 spectra are present and no `ms_column_name` is provided, `import()` automatically groups MSn spectra by acquisition order: within each file (`dataOrigin`), a new group starts whenever a new MS2 `precursorMz` is encountered, and any subsequent higher-level scans (MS3+) are assigned to the same group as their preceding MS2.

With `deleteExistingFeatures = TRUE`, any previously imported spectra (*features*) and their results are removed. The example below assumes an open Sirius connection `srs`, created with `Sirius()` as shown earlier.

``` r
## Example: importing MS2-only spectra
## Assume sp_ms2 is a Spectra object containing only MS2 (and optionally MS3)
## spectra, with no MS1 data.
sp_ms2 <- filterMsLevel(sp, msLevel = 2L)
sp_ms2 <- sp_ms2[1:10] # Just an example subset of MS2 spectra

## No need for ms_column_name: the function auto-groups by acquisition order.
srs <- import(sirius = srs,
              spectra = sp_ms2,
              deleteExistingFeatures = TRUE)
featuresInfo(srs)
```

If your MSn spectra already have a grouping column (e.g., from a feature detection tool), you can still pass it via `ms_column_name` as usual.

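
To make the acquisition-order grouping rule from this section more concrete, here is a minimal base-R sketch of one possible reading of it. It is not the code used by `import()`: the helper `group_by_acquisition_order()` and the toy vectors are hypothetical and only illustrate the rule as described in the text (new group at each new MS2 `precursorMz` within a file; MS3+ scans inherit the group of the preceding MS2).

``` r
## Hypothetical helper, NOT part of RuSirius: sketches the grouping rule
## described in the text, using plain vectors instead of a Spectra object.
group_by_acquisition_order <- function(msLevel, precursorMz, dataOrigin) {
    grp <- integer(length(msLevel))
    current <- 0L            # running group index across all files
    last_mz <- NA_real_      # precursor m/z of the most recent MS2 scan
    last_origin <- NA_character_
    have_ms2 <- FALSE        # has an MS2 been seen in the current file?
    for (i in seq_along(msLevel)) {
        ## Reset the state whenever a new file starts
        if (is.na(last_origin) || dataOrigin[i] != last_origin) {
            last_origin <- dataOrigin[i]
            have_ms2 <- FALSE
        }
        ## An MS2 scan with a new precursor m/z opens a new group
        if (msLevel[i] == 2L && (!have_ms2 || precursorMz[i] != last_mz)) {
            current <- current + 1L
            last_mz <- precursorMz[i]
            have_ms2 <- TRUE
        }
        ## MS3+ scans inherit the group of the preceding MS2;
        ## scans without a preceding MS2 in their file stay ungrouped (NA)
        grp[i] <- if (have_ms2) current else NA_integer_
    }
    grp
}

## Toy data (made up for illustration): two files, MS2 and MS3 scans
ms_level <- c(2L, 2L, 3L, 2L, 2L, 3L)
prec_mz  <- c(150.1, 150.1, 98.0, 301.2, 150.1, 75.0)
origin   <- c("a.mzML", "a.mzML", "a.mzML", "a.mzML", "b.mzML", "b.mzML")

grp <- group_by_acquisition_order(ms_level, prec_mz, origin)
grp
```

In this toy example the two consecutive MS2 scans with the same precursor and the following MS3 scan fall into one group, the MS2 scan with a different precursor opens a second group, and the scans from the second file form a third. If the automatic grouping does not match your acquisition scheme, supplying your own grouping column via `ms_column_name` remains the explicit alternative.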
# Session information

The R code was run on:

``` r
date()
#> [1] "Mon Mar 23 11:27:17 2026"
```

Information on the R session:

``` r
sessionInfo()
#> R version 4.5.2 (2025-10-31 ucrt)
#> Platform: x86_64-w64-mingw32/x64
#> Running under: Windows 11 x64 (build 26100)
#>
#> Matrix products: default
#>   LAPACK version 3.12.1
#>
#> locale:
#> [1] LC_COLLATE=English_United States.utf8  LC_CTYPE=English_United States.utf8
#> [3] LC_MONETARY=English_United States.utf8 LC_NUMERIC=C
#> [5] LC_TIME=English_United States.utf8
#>
#> time zone: Europe/Rome
#> tzcode source: internal
#>
#> attached base packages:
#> [1] stats4    stats     graphics  grDevices utils     datasets  methods   base
#>
#> other attached packages:
#>  [1] MsDataHub_1.10.0        dplyr_1.2.0             RuSirius_0.2.0
#>  [4] jsonlite_2.0.0          MetaboAnnotation_1.14.0 RSirius_6.3.3
#>  [7] xcms_4.8.0              MsExperiment_1.12.0     ProtGenerics_1.42.0
#> [10] Spectra_1.20.1          BiocParallel_1.44.0     S4Vectors_0.48.0
#> [13] BiocGenerics_0.56.0     generics_0.1.4
#>
#> loaded via a namespace (and not attached):
#>   [1] RColorBrewer_1.1-3          MultiAssayExperiment_1.36.1 magrittr_2.0.4
#>   [4] farver_2.1.2                MALDIquant_1.22.3           fs_1.6.6
#>   [7] vctrs_0.7.1                 memoise_2.0.1               RCurl_1.98-1.17
#>  [10] base64enc_0.1-6             htmltools_0.5.9             S4Arrays_1.10.1
#>  [13] BiocBaseUtils_1.12.0        progress_1.2.3              curl_7.0.0
#>  [16] AnnotationHub_4.0.0         SparseArray_1.10.8          mzID_1.48.0
#>  [19] htmlwidgets_1.6.4           plyr_1.8.9                  httr2_1.2.2
#>  [22] impute_1.84.0               cachem_1.1.0                igraph_2.2.1
#>  [25] lifecycle_1.0.5             iterators_1.0.14            pkgconfig_2.0.3
#>  [28] Matrix_1.7-4                R6_2.6.1                    fastmap_1.2.0
#>  [31] MatrixGenerics_1.22.0       clue_0.3-66                 digest_0.6.39
#>  [34] pcaMethods_2.2.0            rsvg_2.7.0                  AnnotationDbi_1.72.0
#>  [37] ExperimentHub_3.0.0         GenomicRanges_1.62.1        RSQLite_2.4.5
#>  [40] filelock_1.0.3              httr_1.4.7                  abind_1.4-8
#>  [43] compiler_4.5.2              withr_3.0.2                 bit64_4.6.0-1
#>  [46] doParallel_1.0.17           S7_0.2.1                    DBI_1.2.3
#>  [49] MASS_7.3-65                 ChemmineR_3.62.0            rappdirs_0.3.4
#>  [52] DelayedArray_0.36.0         rjson_0.2.23                mzR_2.44.0
#>  [55] tools_4.5.2                 PSMatch_1.14.0              otel_0.2.0
#>  [58] CompoundDb_1.14.2           glue_1.8.0                  QFeatures_1.20.0
#>  [61] grid_4.5.2                  cluster_2.1.8.1             reshape2_1.4.5
#>  [64] snow_0.4-4                  gtable_0.3.6                preprocessCore_1.72.0
#>  [67] tidyr_1.3.2                 data.table_1.18.2.1         hms_1.1.4
#>  [70] MetaboCoreUtils_1.19.2      xml2_1.5.2                  XVector_0.50.0
#>  [73] BiocVersion_3.22.0          foreach_1.5.2               pillar_1.11.1
#>  [76] stringr_1.6.0               limma_3.66.0                BiocFileCache_3.0.0
#>  [79] lattice_0.22-7              bit_4.6.0                   tidyselect_1.2.1
#>  [82] Biostrings_2.78.0           knitr_1.51                  gridExtra_2.3
#>  [85] IRanges_2.44.0              Seqinfo_1.0.0               SummarizedExperiment_1.40.0
#>  [88] xfun_0.56                   Biobase_2.70.0              statmod_1.5.1
#>  [91] MSnbase_2.36.0              matrixStats_1.5.0           DT_0.34.0
#>  [94] stringi_1.8.7               yaml_2.3.12                 lazyeval_0.2.2
#>  [97] evaluate_1.0.5              codetools_0.2-20            MsCoreUtils_1.22.1
#> [100] tibble_3.3.1                BiocManager_1.30.27         cli_3.6.5
#> [103] affyio_1.80.0               Rcpp_1.1.1                  MassSpecWavelet_1.76.0
#> [106] dbplyr_2.5.1                png_0.1-8                   XML_3.99-0.20
#> [109] parallel_4.5.2              ggplot2_4.0.2               blob_1.3.0
#> [112] prettyunits_1.2.0           AnnotationFilter_1.34.0     bitops_1.0-9
#> [115] MsFeatures_1.18.0           scales_1.4.0                affy_1.88.0
#> [118] ncdf4_1.24                  purrr_1.2.1                 crayon_1.5.3
#> [121] rlang_1.1.7                 KEGGREST_1.50.0             vsn_3.78.1
```