Hello @ all,
I have some trouble with my network analysis using Bruker .d data converted to .mzXML, generated by an Impact II MS. In short, I see the same phenomenon as described in the following publication:
"Mass spectral similarity for untargeted metabolomics data analysis of complex mixtures"
Garg, N., Kapono, C. A., Lim, Y. W., Koyama, N., Vermeij, M. J. A., Conrad, D., et al. (2015). International Journal of Mass Spectrometry, 377, 719–727.
There, your working group describes that multiple nodes of the same m/z arise, probably due to problems with data conversion from the .d file format to .mzXML. I asked Prof. Dorrestein whether the problem has been solved in the meantime. I got the following answer (and the advice to post and discuss my problem here):
"(...) They may even point to scripts to do this. We also see the same with Bruker data here but there it is because the spectra are different (and thus reflects real differences in spectra) and based on the data it is real. Basically when you fragment at different concentrations you see ions drop out and intensity varies a bit, or the parent mass shift out of the filtering window. It also means the more of the same ions that are fragmented the more "unique" spectra either based on parent mass or ms/ms intensities are captured. The largest issue with Bruker conversions is that most mzxml converters, especially if the manufacturer provided this, it grabs the wrong pre calibration header. I suggest to contact Bruker if this is the case. (...)"
Regarding his answer, two questions remain:
1) Prof. Dorrestein described that the ion could shift out of the filtering window. As a workaround, the filtering window could be widened. But what is an appropriate filtering window for high-resolution data? In other words, how can I express a mass accuracy below 2 ppm as "parent or ion mass tolerance" in Dalton in the GNPS settings? I experimented with wider filter windows, but the problem remains.
2) The problem that the .mzXML converter grabs the wrong data header: can someone tell me how to verify this (with which software or script editor)? If this is the case, I would like to talk to Bruker. I also have contact with software developers there; maybe they can fix the converter behaviour with a script, as you said, or it may already have been fixed (I use DataAnalysis 4.3; if I understood my technician correctly, version 4.4 is a stable release and 5.0 is in development).
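For question 1), a relative tolerance in ppm corresponds to an absolute tolerance of m/z × ppm / 10^6 in Dalton, so the value depends on the m/z at which it is evaluated. A minimal Python sketch of that arithmetic follows. The function name `ppm_to_da` and the example m/z values are only illustrative, and the sketch says nothing about which tolerance GNPS actually needs for this data.

```python
def ppm_to_da(mz: float, ppm: float) -> float:
    """Absolute tolerance in Da for a relative tolerance in ppm at a given m/z."""
    return mz * ppm / 1e6


# Illustrative m/z values only; 2 ppm is the mass accuracy mentioned in question 1).
for mz in (200.0, 500.0, 1000.0):
    print(f"m/z {mz:7.1f}: 2 ppm = {ppm_to_da(mz, 2.0):.4f} Da")
```

At m/z 500, for example, 2 ppm comes out as 0.001 Da, and the same 2 ppm is 0.002 Da at m/z 1000.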
Unfortunately, the problem seems very strange to me. I started network analyses using data acquired by a Bruker micrOTOF MS. If I remember correctly, I converted that data using DataAnalysis 4.2 (with service pack), and the problem with multiple nodes did not occur. For my current data I tried different conversion routes: (i) DataAnalysis export via the menu entry, (ii) conversion with DataAnalysis via a custom script provided by Bruker, (iii) conversion using MSConvert from the ProteoWizard suite. The last of these led to data that was unusable for network analysis. I first tried 64-bit conversion; networking was aborted with the message that the data has to be 32-bit. With 32-bit data, the calculation was aborted by the GNPS server after 10 h (node termination). In addition, msconvert inflates my data from 500 MB originally to 2.3 GB afterwards. I also used "raw" data, i.e. not lock-mass corrected via downstream scripting after the MS run, and the problem remains.
So at the moment I have no clue where the problem lies. I think either something goes wrong when converting the data to an open format, or something has changed in how newer Bruker MS instruments write their data. But I do not understand why it worked with another (older) HR-MS but not with your maXis or my Impact II.
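One way to compare the conversion routes listed above is to read the precursor m/z values that each converter actually wrote into the .mzXML file and check how far apart nominally identical precursors are. The following Python sketch uses only the standard library. It assumes the usual mzXML layout (`scan` elements with an `msLevel` attribute and a `precursorMz` child); the function names and the 10 ppm default are hypothetical choices. It cannot show which calibration header a converter used, only which precursor values ended up in the file.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Strip the XML namespace so the code works across mzXML schema revisions."""
    return tag.rsplit("}", 1)[-1]


def precursor_mzs(source):
    """Yield (scan number, precursor m/z) for each MS2 scan in an mzXML file."""
    for _, elem in ET.iterparse(source, events=("end",)):
        if local(elem.tag) != "scan":
            continue
        if elem.get("msLevel") == "2":
            # Look at direct children only, so nested scans are not counted twice.
            for child in elem:
                if local(child.tag) == "precursorMz" and child.text:
                    yield int(elem.get("num")), float(child.text)
        elem.clear()  # free memory; nested scans were already handled


def group_by_ppm(mzs, ppm=10.0):
    """Group sorted m/z values; a new group starts when the gap to the
    previous value exceeds `ppm` (assumed tolerance, adjust as needed)."""
    groups = []
    for mz in sorted(mzs):
        if groups and (mz - groups[-1][-1]) / mz * 1e6 <= ppm:
            groups[-1].append(mz)
        else:
            groups.append([mz])
    return groups
```

Calling `group_by_ppm(mz for _, mz in precursor_mzs(path))` on the file from each route, where `path` is a placeholder for the converted file, and comparing the spread within each group would show whether the routes differ in the precursor masses they report.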
I would be happy to get some ideas!
Greetings from Germany
Hendrik Wolff