FBMN Pre-processing through XCMS

Rene Sieracki

Apr 5, 2021, 2:10:21 AM
to GNPS Discussion Forum and Bug Reports
Hello all,

I am fairly new to R, and my first project in it is using XCMS to pre-process lipid metabolite samples for export to GNPS. While my code runs fine, the network results are really confusing me. I expected to see more identified MS2 features and networks given the large size of the files (~46 MB for the experimental and control files and ~18 MB for the 2 blanks). In case I am reading the results wrong, here is the link to the most recent GNPS job I ran:

The data I am using is 51 .mzML/.mzXML files split into 3 sample groups: 36 control, 13 experimental, and 2 blanks (which I include as a sample group so that I can remove features found in the blanks from the other samples). This biolipid data was acquired in negative ionisation mode on a UPLC Q Exactive, and each run lasts approx. 7.5 min.
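For illustration, the blank removal described above could look roughly like the base R sketch below once a feature table exists. The matrix `feat`, its sample names, and the `groups` vector are made up for the example. In xcms the real table would come from `featureValues()`, ideally with `filled = FALSE` so that gap-filled values in the blanks do not count as detections.

```r
# Toy feature table: rows are features, columns are samples.
# NA means no peak was detected for that feature in that sample.
feat <- matrix(
  c(1200,   NA, 3400,   NA,
      NA,  800,   NA,  950,
    5000, 4100,   NA,   NA),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("FT1", "FT2", "FT3"),
                  c("ctrl_01", "exp_01", "blank_01", "blank_02"))
)
groups <- c("control", "experimental", "blank", "blank")

# A feature counts as "in the blank" if any blank sample has a value.
in_blank <- rowSums(!is.na(feat[, groups == "blank", drop = FALSE])) > 0

# Keep only features never seen in a blank, and drop the blank columns.
feat_clean <- feat[!in_blank, groups != "blank", drop = FALSE]
print(rownames(feat_clean))
```

In this toy table only FT3 survives, because FT1 and FT2 each have a value in one of the blanks. A stricter or looser rule (for example an intensity ratio against the blank) would be a different design choice.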

I have followed the XCMS documentation that exists for feature-based molecular networking and watched many tutorials, but no matter what I try I always end up with very few MS2 features at the end, barely enough to make a single network. The code runs just fine, but I wonder whether I am filtering excessively or have made a mistake in how I formatted the sample groups.

I'm going to include a bit of the code (all pretty much the same as the online documentation) because I'm very uncertain about where I have gone wrong.

First, I import the data into R and specify the 3 sample groups (blank, experimental, control). Then, I read in the files using
```{r}
library(xcms)  # also loads MSnbase, which provides readMSData()
rawData <- readMSData(msfiles, pdata = pd, centroided. = TRUE, mode = "onDisk")
```
where "pd" is an object of class "NAnnotatedDataFrame" holding the pheno data for the samples (name + sample group for each file) and msfiles lists the paths of all the files.
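As a rough sketch of how such a pheno table can be built and checked, the snippet below derives the sample group from the folder name. The file paths and the folder layout are hypothetical; only the group sizes (36, 13, 2) come from the description above.

```r
# Hypothetical file paths; the real `msfiles` would come from list.files().
msfiles <- c(
  sprintf("data/control/ctrl_%02d.mzML", 1:36),
  sprintf("data/experimental/exp_%02d.mzML", 1:13),
  sprintf("data/blank/blank_%02d.mzML", 1:2)
)

# One row per file: sample name plus the group taken from the folder name.
pd_table <- data.frame(
  sample_name  = sub("\\.mzX?ML$", "", basename(msfiles), ignore.case = TRUE),
  sample_group = basename(dirname(msfiles)),
  stringsAsFactors = FALSE
)

# Sanity check: group sizes should match the experimental design.
print(table(pd_table$sample_group))
```

With MSnbase loaded, a table like this is typically wrapped as `new("NAnnotatedDataFrame", pd_table)` before being passed to `readMSData()`. Checking the group counts first is a cheap way to catch a mislabeled file.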
Next, I pick peaks using findChromPeaks() with CentWaveParam and adjust retention times using adjustRtime() with ObiwarpParam.
To group the peaks, I used groupChromPeaks() with PeakDensityParam (and a minFraction of 0.1 for the peak to belong to a sample group).
Finally, I filled missing peaks and then used the following to create my filtered MS2 data:

```{r}
filteredMs2Spectra <- featureSpectra(processedData, return.type = "MSpectra")
filteredMs2Spectra <- clean(filteredMs2Spectra, all = TRUE)
filteredMs2Spectra <- formatSpectraForGNPS(filteredMs2Spectra)

filteredMs2Spectra_consensus <- combineSpectra(filteredMs2Spectra, fcol = "feature_id", method = consensusSpectrum,
    mzd = 0, minProp = 0.8, ppm = 10) # this mgf file ends up being very small
```
The consensus variable is the one that I used to export to GNPS along with the feature definition/intensities table.
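On the `minFraction` of 0.1 used for grouping above: as far as I understand `PeakDensityParam`, a feature is kept when peaks are found in at least that fraction of the samples in at least one sample group. The base R sketch below only does the arithmetic for the group sizes in this data set; the variable names are made up, and the exact rounding inside xcms is an assumption.

```r
# Group sizes from the design described above.
group_sizes <- c(control = 36, experimental = 13, blank = 2)
min_fraction <- 0.1

# Smallest number of samples with a peak that satisfies
# count >= min_fraction * group_size, with a floor of one sample.
min_samples_needed <- pmax(ceiling(min_fraction * group_sizes), 1)
print(min_samples_needed)
```

Under that reading, 4 control samples, 2 experimental samples, or a single blank would be enough to define a feature, so this step alone seems unlikely to be filtering excessively.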

I've been tweaking my pre-processing code and re-running it through GNPS many times, but each run takes a long time in R because of the size of the files, and I've run out of ideas. Any help or insight on this matter would be greatly appreciated.

Thanks!
Rene

Mingxun Wang

Apr 5, 2021, 2:29:14 AM
to Rene Sieracki, GNPS Discussion Forum and Bug Reports
Hi Rene,

From a brief look, it seems the spectra aren't showing up, so it might be an issue with the format of your MGF. Swing by office hours tomorrow and it should be a quick thing to investigate. I would also recommend using a newer version of the workflow: try cloning to the latest version, as it has more features and makes it easier to dig into the data.
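A quick way to look at the MGF before uploading is to count the spectrum blocks and the peaks in each one. This is only a minimal base R sketch: the example file content is made up, and the field names (`FEATURE_ID`, `PEPMASS`) are assumptions about what the export contains.

```r
# Write a tiny made-up MGF so the sketch is self-contained;
# point `mgf_path` at the real export instead.
mgf_path <- tempfile(fileext = ".mgf")
writeLines(c(
  "BEGIN IONS",
  "FEATURE_ID=FT001",
  "PEPMASS=255.2330",
  "RTINSECONDS=123.4",
  "MSLEVEL=2",
  "81.0700 1500",
  "255.2330 9800",
  "END IONS",
  "",
  "BEGIN IONS",
  "FEATURE_ID=FT002",
  "PEPMASS=281.2486",
  "MSLEVEL=2",
  "END IONS"
), mgf_path)

lines  <- readLines(mgf_path)
starts <- which(lines == "BEGIN IONS")
ends   <- which(lines == "END IONS")

# For each spectrum block, pull the feature ID and count the peak lines
# (lines that start with a digit, i.e. "m/z intensity" pairs).
summary_tab <- do.call(rbind, Map(function(s, e) {
  block <- lines[(s + 1):(e - 1)]
  id <- sub("^FEATURE_ID=", "", grep("^FEATURE_ID=", block, value = TRUE))
  data.frame(
    feature_id  = if (length(id)) id[1] else NA_character_,
    has_pepmass = any(grepl("^PEPMASS=", block)),
    n_peaks     = sum(grepl("^[0-9]", block)),
    stringsAsFactors = FALSE
  )
}, starts, ends))
print(summary_tab)
```

Blocks with `n_peaks` of 0, or with a missing feature ID or precursor mass, are worth checking first, and the feature IDs should also match those in the feature table submitted alongside the MGF.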

Ming 
