I am fairly new to R, and my first R project uses XCMS to pre-process lipid metabolite samples and export them to GNPS. While my code runs fine, the network results are really confusing me. Given the large size of the files (~46 MB for the experimental and control files and ~18 MB for the 2 blanks), I expected to see more identified MS2 features and networks. In case I am reading the results wrong, here is the link to the most recent GNPS job I ran:
The data I am using are 51 .mzML/.mzXML files split into 3 sample groups: 36 control, 13 experimental, and 2 blanks. I include the blanks as their own sample group so that I can remove background features that come from the blanks but also show up in the other samples. This lipid data was acquired in negative ionisation mode on a UPLC Q-Exactive, and each run lasts approx. 7.5 min.
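One common way to do this kind of blank removal is a fold-change cutoff on the feature intensity table. Below is a minimal base-R sketch, assuming a feature-by-sample intensity matrix with NA for undetected peaks (the shape of what featureValues() returns). The helper filter_blank_features(), the 3-fold cutoff and the toy intensities are hypothetical choices for illustration, not part of XCMS.

```r
# Hypothetical helper (not part of XCMS): keep features whose maximum intensity
# in the real samples exceeds `fold` times their mean intensity in the blanks.
# `intensities` is a feature-by-sample matrix with NA for undetected peaks.
filter_blank_features <- function(intensities, groups, fold = 3) {
  is_blank <- groups == "blank"
  blank_mean <- rowMeans(intensities[, is_blank, drop = FALSE], na.rm = TRUE)
  blank_mean[is.nan(blank_mean)] <- 0  # feature never detected in a blank
  sample_max <- apply(intensities[, !is_blank, drop = FALSE], 1, function(v) {
    if (all(is.na(v))) 0 else max(v, na.rm = TRUE)
  })
  intensities[sample_max > fold * blank_mean, , drop = FALSE]
}

# Toy data: 4 features, 2 blanks, 2 controls, 1 experimental sample
groups <- c("blank", "blank", "control", "control", "experimental")
toy <- matrix(c(1000, 1200, 5000, 6000, 7000,
                2000,   NA, 2500,   NA, 3000,
                  NA,   NA,  800,  900,   NA,
                 500,  600,   NA,   NA,   NA),
              nrow = 4, byrow = TRUE,
              dimnames = list(paste0("FT", 1:4),
                              c("blank1", "blank2", "ctrl1", "ctrl2", "exp1")))
kept <- filter_blank_features(toy, groups)
rownames(kept)  # "FT1" "FT3"
```

FT2 is dropped because its sample signal is not 3× the blank signal, and FT4 because it only occurs in the blanks. The fold cutoff is a judgment call; a looser cutoff keeps more features.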
I have followed the XCMS documentation for feature-based molecular networking and watched many tutorials, but no matter what I try I always end up with very few MS2 features, barely enough to make a single network. Since the code itself runs without errors, I wonder whether I am filtering too aggressively or have made a mistake in how I formatted the sample groups.
I'm including the key parts of my code (mostly the same as the online documentation) because I'm very unsure where I have gone wrong.
First, I import the data into R and specify the 3 sample groups (blank, experimental, control). Then, I read in the files using
```{r}
library(xcms)  # also loads MSnbase, which provides readMSData()
rawData <- readMSData(files = msfiles, pdata = pd,
                      centroided. = TRUE, mode = "onDisk")
```
where `pd` is an `NAnnotatedDataFrame` holding the phenotype data for the samples (name and sample group for each file) and `msfiles` lists the paths of all the files.
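readMSData() takes each row of the phenotype data to describe the file at the same position in the file list, so the sample groups only line up if `pd` is built in the same order as `msfiles`. A minimal sketch of building that table, using hypothetical file names in place of the real paths and deriving each group label from the file name (an assumed naming scheme) so that order cannot drift:

```r
# Hypothetical file names standing in for the real 51 .mzML/.mzXML paths,
# listed in the order they will be passed to readMSData()
msfiles <- c(sprintf("data/control_%02d.mzML", 1:36),
             sprintf("data/experimental_%02d.mzML", 1:13),
             sprintf("data/blank_%02d.mzML", 1:2))

# One row per file, in the same order as msfiles
sample_name <- sub("\\.mz(X)?ML$", "", basename(msfiles))
pd_df <- data.frame(
  sample_name  = sample_name,
  sample_group = sub("_[0-9]+$", "", sample_name),  # "control_01" -> "control"
  stringsAsFactors = FALSE
)
stopifnot(nrow(pd_df) == length(msfiles))
table(pd_df$sample_group)  # blank: 2, control: 36, experimental: 13

# With MSnbase/xcms loaded, the phenotype data for readMSData() would be:
# pd <- new("NAnnotatedDataFrame", pd_df)
```

Checking `table(pd_df$sample_group)` against the expected group sizes is a quick way to catch a misformatted group column before the long processing steps.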
Next, I pick peaks using findChromPeaks() with CentWaveParam and adjust retention times using adjustRtime() with ObiwarpParam.
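The chromatographic peaks found in this step also decide which MS2 spectra are used later: featureSpectra() links an MS2 scan to a feature only when the scan's precursor m/z and retention time fall inside one of that feature's chromatographic peaks (recent xcms versions expose arguments such as expandRt, expandMz and ppm to widen these windows; check the installed version's help page). The base-R sketch below mimics that rule on hypothetical peaks and scans; the helper assign_ms2() is illustrative, not the xcms implementation.

```r
# Hypothetical chromatographic peaks (rt in seconds), mimicking the
# mzmin/mzmax/rtmin/rtmax columns of chromPeaks()
peaks <- data.frame(mzmin = c(766.530, 790.530), mzmax = c(766.545, 790.545),
                    rtmin = c(300, 350), rtmax = c(312, 360))
# Hypothetical MS2 scans: precursor m/z and retention time of each scan
ms2 <- data.frame(precursorMz = c(766.538, 766.538, 790.540, 812.550),
                  rtime = c(305, 320, 355, 400))

# Simplified matching rule: an MS2 scan belongs to a peak if its precursor m/z
# and retention time fall inside the peak's (optionally widened) m/z-rt box
assign_ms2 <- function(peaks, ms2, ppm = 0, expand_rt = 0) {
  lapply(seq_len(nrow(peaks)), function(i) {
    mz_lo <- peaks$mzmin[i] * (1 - ppm * 1e-6)
    mz_hi <- peaks$mzmax[i] * (1 + ppm * 1e-6)
    which(ms2$precursorMz >= mz_lo & ms2$precursorMz <= mz_hi &
          ms2$rtime >= peaks$rtmin[i] - expand_rt &
          ms2$rtime <= peaks$rtmax[i] + expand_rt)
  })
}

assign_ms2(peaks, ms2)                  # scan 2 (rt 320 s) misses peak 1
assign_ms2(peaks, ms2, expand_rt = 10)  # widening rt by 10 s picks it up
```

Scan 4 matches no peak at all, so it would never reach the MGF file. Narrow CentWave peak boundaries therefore directly reduce how many MS2 scans end up attached to features.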
To group the peaks into features, I use groupChromPeaks() with PeakDensityParam (with minFraction = 0.1, i.e. a peak must be found in at least 10% of the samples of at least one sample group to form a feature).
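PeakDensityParam keeps a feature if, in at least one sample group, it is detected in at least minFraction of that group's samples and in at least minSamples of them (default 1). For the 36/13/2 design above, the per-group thresholds work out as follows in base R:

```r
group_sizes  <- c(control = 36, experimental = 13, blank = 2)
min_fraction <- 0.1
min_samples  <- 1  # PeakDensityParam default
# smallest number of samples n with n >= minFraction * group size and n >= minSamples
needed <- pmax(ceiling(min_fraction * group_sizes), min_samples)
needed  # control: 4, experimental: 2, blank: 1
```

Because the criterion only has to hold in one group, a peak found in just one of the two blanks is already enough to create a feature.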
Finally, I fill in missing peaks with fillChromPeaks() and then use the following to create my filtered MS2 data:
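```{r}
# MS2 spectra whose precursor falls within each feature's chromatographic peaks
filteredMs2Spectra <- featureSpectra(processedData, return.type = "MSpectra")
# remove zero-intensity peaks from every spectrum
filteredMs2Spectra <- clean(filteredMs2Spectra, all = TRUE)
# formatSpectraForGNPS() is the helper defined in the XCMS/GNPS documentation
filteredMs2Spectra <- formatSpectraForGNPS(filteredMs2Spectra)
# one consensus spectrum per feature; fragments must occur in >= 80% of its spectra
filteredMs2Spectra_consensus <- combineSpectra(filteredMs2Spectra, fcol = "feature_id",
                                               method = consensusSpectrum,
                                               mzd = 0, minProp = 0.8, ppm = 10)
# this mgf file ends up being very small
```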
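With minProp = 0.8, a fragment survives in the consensus spectrum only if it appears in at least 80% of the MS2 spectra combined for that feature. The sketch below is a simplified base-R stand-in for that filtering step, not MSnbase's actual consensusSpectrum() code: it pools fragment m/z values across spectra, groups them within a ppm tolerance, and keeps groups present in enough spectra. The helper consensus_mz() and the fragment lists are hypothetical.

```r
# Simplified stand-in for the peak filtering in MSnbase::consensusSpectrum()
# (hypothetical helper, not the real implementation): pool fragment m/z values
# from all spectra of one feature, group them within `ppm`, and keep a group
# only if it occurs in at least `minProp` of the spectra.
consensus_mz <- function(spectra_mz, ppm = 10, minProp = 0.8) {
  n_spec  <- length(spectra_mz)
  mz      <- unlist(spectra_mz)
  spec_id <- rep(seq_len(n_spec), lengths(spectra_mz))
  ord     <- order(mz)
  mz      <- mz[ord]
  spec_id <- spec_id[ord]
  # start a new m/z group whenever the gap to the previous value exceeds the tolerance
  group <- cumsum(c(TRUE, diff(mz) > mz[-1] * ppm * 1e-6))
  # fraction of spectra contributing at least one peak to each group
  prop <- tapply(spec_id, group, function(s) length(unique(s))) / n_spec
  as.numeric(tapply(mz, group, mean)[prop >= minProp])
}

# Hypothetical fragment m/z values from five MS2 spectra of the same feature
spectra <- list(c(100.0500, 150.0800, 200.1000),
                c(100.0502, 150.0801, 250.2000),
                c(100.0499, 200.1001),
                c(100.0501, 150.0799),
                c(100.0500, 150.0802, 300.3000))
consensus_mz(spectra)                 # 2 fragments survive (in 5/5 and 4/5 spectra)
consensus_mz(spectra, minProp = 0.4)  # the ~200.1 fragment (2/5 spectra) is kept too
```

In this toy case three of the five fragments are discarded at minProp = 0.8, which illustrates how a strict minProp can thin out the consensus spectra written to the MGF file.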
I export the consensus spectra (`filteredMs2Spectra_consensus`) as the MGF file for GNPS, together with the feature definitions/intensities table.
I've been tweaking my pre-processing code and re-running it through GNPS many times, but each run takes a long time in R because of the file sizes, and I've run out of ideas. Any help or insight on this matter would be greatly appreciated.
Thanks!
Rene