In the image above you can see that some of the signals (below 600) imported into the GNPS algorithm are artifacts of the export procedure from DataAnalysis (Bruker), whether the export is done manually or in batch mode with a script. My guess is that, when creating the line spectra, it averages the surrounding baseline signals and produces the artifact signal. The problem is that, when clustering, the algorithm treats the artifacts as real signals. Sometimes it creates random matches between nodes that happen to have the same artifacts in the same region, as well as coincidental matches with library spectra. In the second case, however, manual inspection shows that the spectral match is way off with regard to the other peaks and their relative intensities. This is very problematic because it creates nodes through random matching rather than through unique chemical information.
I'm aware that there are some controls that can be set in the GNPS algorithm parameters to help restrict false matches, but I have encountered the following limitations:
- Minimum Peak Intensity: I could raise the bar to eliminate the artifacts. However, the intensity of the baseline is not always the same between experiments. In particular, the baseline changes with the percentage of organic phase reaching the instrument (a gradient run is used in the LC-MS setup). This makes it difficult to set a "standard" minimum peak intensity without sacrificing some signals.
For lipids such as the ones containing phosphocholine, I would actually suggest collecting in negative ion mode. In positive mode, the main fragment is m/z 184, which is intrinsically charged. Negative mode fragmentation of lipids yields more fragments, which provide more structural information.
It would be helpful if you included a link to your GNPS job so we can take a look at the settings being used. We may be able to suggest some modifications to your settings that will reduce this particular issue.
You may also want to consider including a filtering step in your conversion. You can probably do this in MSConvert on the mzXMLs. Keep in mind, though, that you will lose some legitimate peaks in your MS2 data.
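If a fixed absolute cutoff doesn't work because the baseline moves with the gradient, one option is to scale the cutoff to each spectrum. Here is a rough sketch in Python, assuming the MS2 peaks have already been parsed into (m/z, intensity) pairs. The function name and the factor of 5 are made up for illustration; they are not GNPS or MSConvert settings, and the median is only a crude stand-in for a real noise estimate.

```python
from statistics import median


def filter_peaks(peaks, noise_factor=5.0):
    """Drop peaks that sit close to the spectrum's own baseline.

    peaks        -- list of (mz, intensity) tuples for one MS2 spectrum
    noise_factor -- hypothetical multiplier; tune it on your own data

    The median intensity is used as a rough noise level, which assumes
    that most peaks in the spectrum are baseline rather than real signal.
    """
    if not peaks:
        return []
    noise_level = median(intensity for _, intensity in peaks)
    cutoff = noise_factor * noise_level
    return [(mz, intensity) for mz, intensity in peaks if intensity >= cutoff]


# Toy spectrum: four low baseline-like peaks and two intense ones.
example = [
    (100.0, 10), (150.0, 12), (184.07, 500),
    (250.0, 11), (300.0, 9), (522.36, 300),
]
filtered = filter_peaks(example)
```

Because the cutoff follows each spectrum's baseline, it adapts when the baseline rises during the gradient, but a spectrum with only a few real peaks and little noise could lose legitimate signal, so check a handful of spectra by eye before trusting it.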
I agree with Laura. You should be able to collect data on the MicroTofQ at 10 Hz. I definitely collected at 10 Hz on the Dorrestein MicroTof. I collected 10 MS2 for every MS1. Your LC peak shape will suffer, but you'll have more scans. It really depends on the question you're asking and whether the chromatogram is more important than the MS2 data.
Like every mass spec tool, GNPS has tradeoffs. The broader your settings, the more noise you will get. You may want to perform feature finding on your MS1 data to visualize the production of a specific m/z value across samples. This would help support conclusions based on loose networking parameters.
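To give an idea of what comparing one m/z across samples looks like, here is a small Python sketch. It is not real feature finding (no peak picking or retention time alignment), just a summed extracted-ion intensity per sample. It assumes each sample is already loaded as a list of MS1 scans, each scan a list of (m/z, intensity) pairs; the function names, sample names, and the 10 ppm tolerance are placeholders.

```python
def summed_intensity(scans, target_mz, ppm=10.0):
    """Sum the intensity of all peaks within +/- ppm of target_mz."""
    tolerance = target_mz * ppm / 1e6
    return sum(
        intensity
        for scan in scans
        for mz, intensity in scan
        if abs(mz - target_mz) <= tolerance
    )


def compare_samples(samples, target_mz, ppm=10.0):
    """Return {sample_name: summed intensity at target_mz}."""
    return {
        name: summed_intensity(scans, target_mz, ppm)
        for name, scans in samples.items()
    }


# Toy data: two hypothetical samples with one or two MS1 scans each.
samples = {
    "sample_a": [[(522.355, 100.0), (300.0, 5.0)], [(522.356, 50.0)]],
    "sample_b": [[(522.400, 80.0)]],
}
totals = compare_samples(samples, target_mz=522.355)
```

If a node from a loosely networked job corresponds to an m/z that shows up in only the samples where you expect it, that is some independent support for the match being real.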
Best wishes,
Vanessa