---
title: "Importing Spectra into Sirius"
output:
    BiocStyle::html_document:
        toc_float: true
vignette: >
    %\VignetteIndexEntry{Importing Spectra into Sirius}
    %\VignetteEngine{knitr::rmarkdown}
    %\VignetteEncoding{UTF-8}
    %\VignettePackage{RuSirius}
    %\VignetteDepends{Spectra, RSirius, RuSirius, MsDataHub}
---

<!--
# Pre-render with (in the vignettes folder)
knitr::knit("ImportSpectra.Rmd.orig", output = "ImportSpectra.Rmd")
-->


``` r
library(RuSirius)
library(MsDataHub)
library(Spectra)
```

## Introduction

**Note**: this vignette is
[**pre-computed**](https://ropensci.org/blog/2019/12/08/precompute-vignettes/).
See the session info for information on packages used and the date the vignette
was rendered. The vignette requires a running
[Sirius](https://bio.informatik.uni-jena.de/software/sirius/) instance. To
reproduce this analysis, you will need Sirius 6.3 installed and running.

This vignette demonstrates a basic workflow for importing MS data from a
`Spectra` object into *Sirius*. It then runs Sirius's main tools (formula
identification, structure database search, compound class prediction, spectral
library matching, and *de novo* structure prediction) and finally retrieves the
results.

This is a foundational example and does not cover all the possible parameters
for each Sirius tool. For detailed parameter information, consult the `run()`
function documentation. More information can be found in the [Sirius
documentation online](https://v6.docs.sirius-ms.io/).

**IMPORTANT:** This is a work in progress. Feedback is highly valued, especially
regarding enhancements or additions that could simplify your workflow. Your
input as a user is essential.

## Prepping Spectra object

Below we load the example mass spectrometry (MS) data, provided by the
*MsDataHub* package, as a `Spectra` object:


``` r
dda_file <- MsDataHub::PestMix1_DDA.mzML()
sp <- Spectra(dda_file)
sp <- setBackend(sp, MsBackendMemory())
sp <- filterEmptySpectra(sp)
```

To import the `Spectra` data into *Sirius*, it must be preprocessed. If spectra
from multiple MS levels are present, we need to group them appropriately.

We use the `fragmentGroupIndex()` function to assign an index to each spectrum.
MS2 spectra that belong to the same MS1 spectrum will share the same index. See
`?fragmentGroupIndex` for details on how these spectra groups are defined.


``` r
sp |>
    msLevel() |>
    table()
#> 
#>    1    2 
#> 4627 2756

idxs <- fragmentGroupIndex(sp)
sp$Msn_idx <- idxs
```
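
To make the grouping idea concrete, here is a minimal base R sketch on a toy
vector of MS levels. It assumes the simplest possible rule, namely that every
MS1 scan opens a new group and the scans that follow it inherit that index. The
actual rule used by `fragmentGroupIndex()` may differ, so treat
`toy_group_index()` as a hypothetical illustration only.

``` r
## Hypothetical helper, NOT the RuSirius implementation: each MS1 scan
## starts a new group, and later scans inherit the index of that MS1.
toy_group_index <- function(ms_level) {
    cumsum(ms_level == 1L)
}

ms_level <- c(1L, 2L, 2L, 1L, 1L, 2L)
idx <- toy_group_index(ms_level)
idx

## Number of spectra per group
table(idx)
```

On the real data, `table(sp$Msn_idx)` gives the same kind of per-group count.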

## Open Sirius and project set up

The Sirius application is initialized via the API, requiring only a project ID.
If the project exists, it is opened; otherwise, a new project is created. The
`srs` object acts as the connection to Sirius and holds project details.
Shut down the connection properly with `shutdown(srs)` once your work is
complete.

The `srs` variable is needed for every task that communicates with the
application. You can learn more about this object class by running `?Sirius`
in the console. If the `path` parameter is not specified, Sirius will by default
try to save your project in the `sirius_projects` folder in your user
directory. Note that this folder will *not* be created automatically. To save
the project somewhere else, specify the `path` parameter, as done below with the
current working directory.


``` r
srs <- Sirius(projectId = "test_spectra", path = getwd())
```
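
Because the project folder is not created automatically, it can be worth
creating it before calling `Sirius()`. The sketch below uses only base R.
`ensure_dir()` is a hypothetical helper, and a temporary directory is used here
as a stand-in for your user directory.

``` r
## Hypothetical helper: create a directory if it does not exist yet and
## return its path.
ensure_dir <- function(path) {
    if (!dir.exists(path)) {
        dir.create(path, recursive = TRUE)
    }
    path
}

## A temporary directory stands in for the user directory here; replace
## it with the location where you want Sirius projects to live.
project_dir <- ensure_dir(file.path(tempdir(), "sirius_projects"))
dir.exists(project_dir)
```

The resulting path could then be passed to `Sirius()` as `path = project_dir`.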

You could import the entire `Spectra` object, but for demonstration purposes, we
will use selected examples.

Here, we import two MS1-MS2 pairs and one MS1 spectrum on its own. It's also
possible to import only MS2 spectra.

When importing, the `ms_column_name` parameter defines which column contains the
index that groups the spectra. Each such group is considered one *feature* in
Sirius terminology.


``` r
sp_subset <- sp[sp$Msn_idx %in% c(421, 707, 895)]

srs <- import(sirius = srs,
              spectra = sp_subset,
              ms_column_name = "Msn_idx",
              deleteExistingFeatures = TRUE)

## See information about the features
featuresInfo(srs)
```
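
The group indices used above are specific to this data set. As a rough sketch
of how candidate groups could be picked programmatically, the base R example
below works on a toy table holding the same kind of information (MS level and
group index) and keeps only groups that contain at least one MS2 spectrum. The
data and the names `toy` and `msn_idx` are made up for illustration.

``` r
## Toy stand-in for the MS level and group index of a Spectra object
toy <- data.frame(
    ms_level = c(1L, 2L, 1L, 1L, 2L, 2L),
    msn_idx  = c(1L, 1L, 2L, 3L, 3L, 3L)
)

## TRUE for every group that holds at least one MS2 spectrum
has_ms2 <- tapply(toy$ms_level == 2L, toy$msn_idx, any)
keep <- as.integer(names(has_ms2)[has_ms2])
keep

## Subset to the selected groups, mirroring `sp[sp$Msn_idx %in% ...]`
toy_subset <- toy[toy$msn_idx %in% keep, ]
```

With a real `Spectra` object, `msLevel(sp)` and `sp$Msn_idx` would take the
place of the toy columns.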

## Submit job to Sirius - For structure DB search

Once data is imported, annotation and prediction can begin. The `run()` function
accepts parameters for each Sirius tool, such as formula identification,
structure database search, and compound class prediction.


``` r
## Start computation
run(srs,
    fallbackAdducts = c("[M + H]+", "[M + Na]+"),
    formulaIdParams = formulaIdParam(
        numberOfCandidates = 10,
        instrument = "QTOF",
        numberOfCandidatesPerIonization = 3,
        massAccuracyMS2ppm = 10,
        filterByIsotopePattern = FALSE,
        isotopeMs2Settings = c("SCORE"),
        performDeNovoBelowMz = 600,
        minPeaksToInjectSpecLibMatch = 3
    ),
    predictParams = predictParam(),
    structureDbSearchParams = structureDbSearchParam(
        structureSearchDbs = c("BIO")
    ),
    recompute = TRUE,
    wait = TRUE
)

## Inspect the features again once the computation has finished
info <- featuresInfo(srs)
```

## Retrieve Results

To get a summary of all results, including top formulas, structures, and
compound class predictions, use the following:


``` r
summarytb <- summary(srs, result.type = "structure")
```

This summary table offers a quick overview for checking whether the predictions
meet expectations. However, we recommend not relying solely on it for in-depth
analysis. Instead, use the more detailed functions provided later in this
vignette.

Key columns include confidence scores that help assess result reliability.
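
As a hedged illustration of how such scores might be used, the sketch below
filters a toy table by a confidence cutoff. The column names `feature` and
`confidence` and the cutoff of 0.5 are hypothetical placeholders; check the
actual column names of your summary table (for example with `colnames()`)
before adapting it.

``` r
## Toy summary table; the column names are hypothetical placeholders
toy_summary <- data.frame(
    feature    = c("f1", "f2", "f3"),
    confidence = c(0.92, 0.35, NA)
)

## Keep rows whose confidence is present and at least the (arbitrary) cutoff
cutoff <- 0.5
is_confident <- !is.na(toy_summary$confidence) &
    toy_summary$confidence >= cutoff
confident <- toy_summary[is_confident, ]
confident$feature
```

Rows without a score are dropped explicitly here, so that missing values are
not mistaken for confident hits.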

## De novo structure description


``` r
## Compute de novo structure candidates
run(srs,
    msNovelistParams = deNovoStructureParam(numberOfCandidateToPredict = 5),
    recompute = FALSE,
    wait = TRUE
)

summaryDeNovo <- summary(srs, result.type = "deNovo")
```

Interestingly, for the first feature, the results remain consistent, while for
the second, which originally had lower confidence, the predictions now differ.

For a visual exploration of the results, you can open the Sirius GUI. Once you
are done, shut down the connection:


``` r
# openGUI(srs)
# closeGUI(srs)

shutdown(srs)
```

You can learn more about retrieving the other results in the `?results`
documentation or in the other vignette.

## Importing MS2-only or MSn-only data

In some workflows, only MS2 (or MS2 and MS3) spectra are available, for
example when working with spectral libraries, MGF files, or data that were
acquired without recording MS1 scans.

The SIRIUS API fully supports importing features without MS1 data. When no MS1
spectra are present and no `ms_column_name` is provided, `import()`
automatically groups MSn spectra by acquisition order: within each file
(`dataOrigin`), a new group starts whenever a new MS2 `precursorMz` is
encountered, and any subsequent higher-level scans (MS3+) are assigned to the
same group as their preceding MS2. With `deleteExistingFeatures = TRUE`, any
previously imported spectra (*features*) and their results are removed.


``` r
## Example: importing MS2-only spectra
## Build a Spectra object containing only MS2 spectra, with no MS1 data.
## This assumes an open Sirius connection `srs` (see above).
sp_ms2 <- filterMsLevel(sp, msLevel = 2L)
sp_ms2 <- sp_ms2[1:10]  # Just an example subset of MS2 spectra
## No need for ms_column_name: the function auto-groups by acquisition order.
srs <- import(sirius = srs,
              spectra = sp_ms2,
              deleteExistingFeatures = TRUE)

featuresInfo(srs)
```

If your MSn spectra already have a grouping column (e.g., from a feature
detection tool), you can still pass it via `ms_column_name` as usual.
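
The acquisition-order grouping described above can be sketched in base R as
follows. This is a hypothetical re-implementation for illustration, not the
code used by `import()`. In particular, it assumes that a "new" precursor m/z
means one that differs from the previous MS2 scan in the same file, and that a
higher-level scan with no preceding MS2 in its file forms its own group.

``` r
## Hypothetical helper for illustration only; the grouping performed
## inside import() may differ in its details.
toy_msn_groups <- function(ms_level, precursor_mz, data_origin) {
    group <- integer(length(ms_level))
    current <- 0L
    last_origin <- NA_character_
    last_mz <- NA_real_
    for (i in seq_along(ms_level)) {
        new_file <- is.na(last_origin) || data_origin[i] != last_origin
        if (new_file) {
            last_origin <- data_origin[i]
            last_mz <- NA_real_
        }
        if (ms_level[i] == 2L) {
            ## Assumption: "new" means different from the previous MS2
            ## (exact comparison; real data may need a tolerance)
            if (is.na(last_mz) || precursor_mz[i] != last_mz) {
                current <- current + 1L
                last_mz <- precursor_mz[i]
            }
        } else if (new_file) {
            ## Assumption: an MSn scan without a preceding MS2 in this
            ## file gets a group of its own
            current <- current + 1L
        }
        group[i] <- current
    }
    group
}

grp <- toy_msn_groups(
    ms_level     = c(2L, 3L, 2L, 2L, 3L, 2L),
    precursor_mz = c(100.1, 80.0, 100.1, 250.2, 120.0, 250.2),
    data_origin  = c("a", "a", "a", "a", "a", "b")
)
grp
```

In this toy input, the first three scans share a group because the second MS2
repeats the precursor m/z of the first, and the last scan starts a new group
because it comes from a different file.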

## Session information

The R code was run on:


``` r
date()
#> [1] "Mon Mar 23 11:27:17 2026"
```

Information on the R session:


``` r
sessionInfo()
#> R version 4.5.2 (2025-10-31 ucrt)
#> Platform: x86_64-w64-mingw32/x64
#> Running under: Windows 11 x64 (build 26100)
#> 
#> Matrix products: default
#>   LAPACK version 3.12.1
#> 
#> locale:
#> [1] LC_COLLATE=English_United States.utf8  LC_CTYPE=English_United States.utf8   
#> [3] LC_MONETARY=English_United States.utf8 LC_NUMERIC=C                          
#> [5] LC_TIME=English_United States.utf8    
#> 
#> time zone: Europe/Rome
#> tzcode source: internal
#> 
#> attached base packages:
#> [1] stats4    stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#>  [1] MsDataHub_1.10.0        dplyr_1.2.0             RuSirius_0.2.0         
#>  [4] jsonlite_2.0.0          MetaboAnnotation_1.14.0 RSirius_6.3.3          
#>  [7] xcms_4.8.0              MsExperiment_1.12.0     ProtGenerics_1.42.0    
#> [10] Spectra_1.20.1          BiocParallel_1.44.0     S4Vectors_0.48.0       
#> [13] BiocGenerics_0.56.0     generics_0.1.4         
#> 
#> loaded via a namespace (and not attached):
#>   [1] RColorBrewer_1.1-3          MultiAssayExperiment_1.36.1 magrittr_2.0.4             
#>   [4] farver_2.1.2                MALDIquant_1.22.3           fs_1.6.6                   
#>   [7] vctrs_0.7.1                 memoise_2.0.1               RCurl_1.98-1.17            
#>  [10] base64enc_0.1-6             htmltools_0.5.9             S4Arrays_1.10.1            
#>  [13] BiocBaseUtils_1.12.0        progress_1.2.3              curl_7.0.0                 
#>  [16] AnnotationHub_4.0.0         SparseArray_1.10.8          mzID_1.48.0                
#>  [19] htmlwidgets_1.6.4           plyr_1.8.9                  httr2_1.2.2                
#>  [22] impute_1.84.0               cachem_1.1.0                igraph_2.2.1               
#>  [25] lifecycle_1.0.5             iterators_1.0.14            pkgconfig_2.0.3            
#>  [28] Matrix_1.7-4                R6_2.6.1                    fastmap_1.2.0              
#>  [31] MatrixGenerics_1.22.0       clue_0.3-66                 digest_0.6.39              
#>  [34] pcaMethods_2.2.0            rsvg_2.7.0                  AnnotationDbi_1.72.0       
#>  [37] ExperimentHub_3.0.0         GenomicRanges_1.62.1        RSQLite_2.4.5              
#>  [40] filelock_1.0.3              httr_1.4.7                  abind_1.4-8                
#>  [43] compiler_4.5.2              withr_3.0.2                 bit64_4.6.0-1              
#>  [46] doParallel_1.0.17           S7_0.2.1                    DBI_1.2.3                  
#>  [49] MASS_7.3-65                 ChemmineR_3.62.0            rappdirs_0.3.4             
#>  [52] DelayedArray_0.36.0         rjson_0.2.23                mzR_2.44.0                 
#>  [55] tools_4.5.2                 PSMatch_1.14.0              otel_0.2.0                 
#>  [58] CompoundDb_1.14.2           glue_1.8.0                  QFeatures_1.20.0           
#>  [61] grid_4.5.2                  cluster_2.1.8.1             reshape2_1.4.5             
#>  [64] snow_0.4-4                  gtable_0.3.6                preprocessCore_1.72.0      
#>  [67] tidyr_1.3.2                 data.table_1.18.2.1         hms_1.1.4                  
#>  [70] MetaboCoreUtils_1.19.2      xml2_1.5.2                  XVector_0.50.0             
#>  [73] BiocVersion_3.22.0          foreach_1.5.2               pillar_1.11.1              
#>  [76] stringr_1.6.0               limma_3.66.0                BiocFileCache_3.0.0        
#>  [79] lattice_0.22-7              bit_4.6.0                   tidyselect_1.2.1           
#>  [82] Biostrings_2.78.0           knitr_1.51                  gridExtra_2.3              
#>  [85] IRanges_2.44.0              Seqinfo_1.0.0               SummarizedExperiment_1.40.0
#>  [88] xfun_0.56                   Biobase_2.70.0              statmod_1.5.1              
#>  [91] MSnbase_2.36.0              matrixStats_1.5.0           DT_0.34.0                  
#>  [94] stringi_1.8.7               yaml_2.3.12                 lazyeval_0.2.2             
#>  [97] evaluate_1.0.5              codetools_0.2-20            MsCoreUtils_1.22.1         
#> [100] tibble_3.3.1                BiocManager_1.30.27         cli_3.6.5                  
#> [103] affyio_1.80.0               Rcpp_1.1.1                  MassSpecWavelet_1.76.0     
#> [106] dbplyr_2.5.1                png_0.1-8                   XML_3.99-0.20              
#> [109] parallel_4.5.2              ggplot2_4.0.2               blob_1.3.0                 
#> [112] prettyunits_1.2.0           AnnotationFilter_1.34.0     bitops_1.0-9               
#> [115] MsFeatures_1.18.0           scales_1.4.0                affy_1.88.0                
#> [118] ncdf4_1.24                  purrr_1.2.1                 crayon_1.5.3               
#> [121] rlang_1.1.7                 KEGGREST_1.50.0             vsn_3.78.1
```