Networking using Bruker data & Impact II MS

hendrik...@gmail.com

Feb 15, 2016, 3:01:22 AM

to GNPS Discussion Forum and Bug Reports

Hello @ all,

I am having trouble with my network analysis of Bruker .d data, acquired on an Impact II MS and converted to .mzXML. In short, I see the same phenomenon as described in the following publication:

"Mass spectral similarity for untargeted metabolomics data analysis of complex mixtures"

Garg, N., Kapono, C. A., Lim, Y. W., Koyama, N., Vermeij, M. J. A., Conrad, D., et al. (2015). International Journal of Mass Spectrometry, 377, 719–727.

There, your group describes that multiple nodes of the same m/z arise, probably due to problems with data conversion from the .d file format to .mzXML. I asked Prof. Dorrestein whether the problem has been solved in the meantime. I got the following answer (and the advice to post and discuss my problem here):

"(...) They may even point to scripts to do this. We also see the same with Bruker data here but there it is because the spectra are different (and thus reflects real differences in spectra) and based on the data it is real. Basically when you fragment at different concentrations you see ions drop out and intensity varies a bit, or the parent mass shift out of the filtering window. It also means the more of the same ions that are fragmented the more "unique" spectra either based on parent mass or ms/ms intensities are captured. The largest issue with Bruker conversions is that most mzxml converters, especially if the manufacturer provided this, it grabs the wrong pre calibration header. I suggest to contact Bruker if this is the case. (...)"

His answer leaves me with two questions:

1) Prof. Dorrestein described that the ion could shift out of the filtering window. As a workaround, the filtering window could be widened. But what is an appropriate filtering window for high-resolution data? In other words, how can I express a mass accuracy below 2 ppm as "parent or ion mass tolerance" in Dalton in the GNPS settings? I did try wider filter windows, but the problem remains.
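For reference, a relative tolerance in ppm corresponds to an absolute tolerance of m/z × ppm × 10^-6 in Da, so the equivalent Da value depends on the m/z considered. A minimal Python sketch of this arithmetic follows; the function names are illustrative and not part of GNPS:

```python
def ppm_to_da(mz: float, ppm: float) -> float:
    """Absolute tolerance in Da equivalent to a relative tolerance in ppm at a given m/z."""
    return mz * ppm * 1e-6


def da_to_ppm(mz: float, da: float) -> float:
    """Relative tolerance in ppm equivalent to an absolute tolerance in Da at a given m/z."""
    return da / mz * 1e6


# The same 2 ppm window is narrower at low m/z and wider at high m/z.
for mz in (200.0, 500.0, 1500.0):
    print(f"m/z {mz:7.1f}: 2 ppm = {ppm_to_da(mz, 2.0):.5f} Da")
```

Because a Da tolerance is fixed over the whole mass range, one would typically pick a value that covers the highest m/z of interest.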

2) The problem that the .mzXML converter grabs the wrong data header. Can someone tell me how to verify this (which software or script editor)? If this is the case, I want to talk to Bruker. I also have contacts among the software developers there; maybe they can fix the converter behaviour with a script, as you said, or it has been fixed in the meantime (I use DataAnalysis 4.3; if I understood my technician correctly, version 4.4 is a stable release and 5.0 is in development).

Unfortunately, the problem seems very strange to me. I started network analyses using data acquired by a Bruker micrOTOF MS. If I remember rightly, I converted the data using DataAnalysis 4.2 (with service pack) and the problem with multiple nodes did not occur. I have tried different ways to convert my current data: (i) using the DataAnalysis export via the menu entry, (ii) conversion with DataAnalysis via a custom script provided by Bruker, (iii) conversion using MSConvert of the ProteoSAFe suite. The last one led to data that were unusable for network analysis. I first tried 64-bit conversion; networking was aborted with the message that the data has to be 32 bit. With 32-bit data, the calculation was aborted by the GNPS server after 10 h (node termination). In addition, MSConvert inflates my data from 500 MB originally to 2.3 GB afterwards. I also used "raw" data, i.e. not lock mass corrected via downstream scripting after finishing the MS run; the problem remains.
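One way to see which precision a converter actually wrote is to read the `precision` attribute of the `<peaks>` elements in the mzXML file. The sketch below assumes the converter writes that attribute as described in the mzXML schema; the function name and the tiny demo document are made up for illustration:

```python
import io
import xml.etree.ElementTree as ET


def peak_precisions(source):
    """Collect the distinct 'precision' values declared on <peaks> elements.

    'source' is a file path or a binary file object containing mzXML.
    """
    found = set()
    for _event, elem in ET.iterparse(source, events=("end",)):
        # Drop any XML namespace prefix: '{uri}peaks' -> 'peaks'
        if elem.tag.rsplit("}", 1)[-1] == "peaks":
            found.add(elem.get("precision"))
            elem.clear()  # free the (potentially large) base64 peak list
    return found


# Hand-written stand-in for a real file (peak data left empty).
demo = b"""<mzXML><msRun>
  <scan num="1" msLevel="1"><peaks precision="32"></peaks></scan>
  <scan num="2" msLevel="2"><peaks precision="32"></peaks></scan>
</msRun></mzXML>"""

print(peak_precisions(io.BytesIO(demo)))
```

For a real file, pass its path instead of the `BytesIO` object. The parser streams through the file, which matters for multi-gigabyte conversions.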

So I actually have no clue where the problem lies. I think something goes wrong when converting the data to an open format, or something has changed in how newer Bruker MS instruments write their data. But I do not understand why it worked with another (older) HR-MS but not with your maXis or my Impact II.

I would be happy to get some ideas!

Greetings from Germany

Hendrik Wolff

hendrik...@gmail.com

Feb 16, 2016, 8:00:23 AM

to GNPS Discussion Forum and Bug Reports

Hello again,

yesterday, I played around with my data and I want to update my observations.

First, I have to correct myself. I reproduced the multiple-nodes problem with converted data acquired by both the older micrOTOF and our current Impact II. The working network analysis mentioned above was done with ion trap data acquired by a Bruker amaZon MS. Sorry for that inaccuracy. So it seems to be a problem with handling high-resolution data.

Second, I inspected my Impact II generated .mzXML data by importing them into Excel. I looked at two files: first, data "as raw as possible", meaning no lock mass correction with automated DataAnalysis scripts etc.; second, the same analysis after applying the calibration script. In both cases, the vendor mzXML converter reports "mzXML Version 2.1" and "Bruker compass xport 3.0.9.2" (the most recent version). I also looked up the scan number of a known component; the information displayed in DataAnalysis was the same as in the exported mzXML file. The same was true for the micrOTOF data. I did this to understand what "grabbing the wrong pre calibration header" means. I have no clue...
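Instead of Excel, the precursor m/z stored with each MS/MS scan can also be listed with a short script and compared against what DataAnalysis shows for the same scan numbers. The sketch below assumes that the "header" in question is the `<precursorMz>` element of each MS2 `<scan>` in the mzXML file; the function names and the m/z values in the demo document are made up for illustration:

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an XML namespace prefix: '{uri}scan' -> 'scan'."""
    return tag.rsplit("}", 1)[-1]


def ms2_precursors(source):
    """Return (scan number, retention time string, precursor m/z) for every MS2 scan.

    'source' is a file path or a binary file object containing mzXML.
    """
    rows = []
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local_name(elem.tag) != "scan":
            continue
        if elem.get("msLevel") == "2":
            for child in elem:
                if local_name(child.tag) == "precursorMz" and child.text:
                    rows.append((int(elem.get("num")),
                                 elem.get("retentionTime"),
                                 float(child.text)))
        elem.clear()  # release peak data once the scan has been handled
    return rows


# Hand-written stand-in for a real file; values are illustrative only.
demo = b"""<mzXML><msRun>
  <scan num="1" msLevel="1" retentionTime="PT60.0S"><peaks precision="32"></peaks></scan>
  <scan num="2" msLevel="2" retentionTime="PT60.5S">
    <precursorMz precursorIntensity="1000">586.2781</precursorMz>
    <peaks precision="32"></peaks>
  </scan>
  <scan num="3" msLevel="2" retentionTime="PT61.0S">
    <precursorMz precursorIntensity="900">586.2803</precursorMz>
    <peaks precision="32"></peaks>
  </scan>
</msRun></mzXML>"""

for num, rt, mz in ms2_precursors(io.BytesIO(demo)):
    print(num, rt, mz)
```

If the listed precursor values scatter more than the instrument's mass accuracy for a compound known to be a single species, that would point towards the header rather than the spectra.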

Greetings

Hendrik

Kathleen and Pieter Dorrestein

Feb 16, 2016, 10:51:44 AM

to GNPS Discussion Forum and Bug Reports

Thanks, it is great to see you are using all resources and are troubleshooting.

Can you send links to your networking runs and the data? All jobs are retained in GNPS, so you can send the links. This will help us look into this.

There will always be multiple nodes, as spectra vary, and there is no way around this, especially if there are many spectra of the same molecule (the parent mass is slightly different; MS/MS intensities vary with the amount of analyte; the same method is run on different days; instrument temperature, let's say morning vs. afternoon, alters a spectrum; and other masses close by in the window of the same molecule create a spectrum that will network with different parent masses, etc.). But one can also merge them post-GNPS (there are Cytoscape apps for post-merging of the data). Isomers will then also be merged if you do not consider retention time.

It is also possible there are issues on our end, as we had a major server failure a few months back, but all should be fixed now and work much faster, and we would like to have a look. The header issue, in particular when we network a large number of files, gave us a large number of nodes for certain molecules. The Bruker team provided us with a fixed script, which reduced this visualization issue but did not eliminate it. Once we look at your data we can tell. The only way for us to have a closer look is for you to provide us with the data (send the MSV number and the links to the jobs you ran). This will tell us if the MS/MS data is simply different or if there are other issues: conversion, merging of data, etc.

Thanks, Pieter

hendrik...@gmail.com

Feb 16, 2016, 11:51:25 AM

to GNPS Discussion Forum and Bug Reports

Hello,

thank you for your fast response. OK, I can accept that HR-MS systems are so good that slight differences in masses are observed, but for a comprehensive network analysis this is not useful. Do you know the exact names of these merging apps for Cytoscape (and for which Cytoscape version)?

In the following I give you some links to network runs with a short description. I have to admit that I do not know where to find the "MSV numbers", but I hope the links work as well. I also hope it is no problem that I deleted the uploaded mzXML files; I can easily re-upload them.

I) ID=f4f23113332046378848407fb381533b http://gnps.ucsd.edu/ProteoSAFe/status.jsp?task=f4f23113332046378848407fb381533b I used older MS data acquired by the micrOTOF system but did the data conversion yesterday. There is a network that includes, e.g., multiple nodes with m/z 586 (a 7-node network), but all of the 586 nodes are the same compound.
II) ID=1937d1074f6242108de2de070ac0be71 http://gnps.ucsd.edu/ProteoSAFe/status.jsp?task=1937d1074f6242108de2de070ac0be71 This is data from our new Impact II. There is a network where erythromycin is correctly identified, but there are 5 nodes with slightly different masses, each containing only one MS2 spectrum. The mass data are not lock mass corrected by a script.
III) ID=84c2e2d311b244309458d534163d9ada http://gnps.ucsd.edu/ProteoSAFe/status.jsp?task=84c2e2d311b244309458d534163d9ada The same as II, but lock mass corrected by a script.

Thanks in advance, and thank you for publishing your detailed MS settings in the publication mentioned above!

Hendrik

Kathleen and Pieter Dorrestein

Feb 16, 2016, 12:12:36 PM

to GNPS Discussion Forum and Bug Reports

Thanks, we will look into your results. Yes, modern HR instruments are awesome.

If you have not created a data set yet, you can create one privately or make it public (one can upload and search files privately, but that does not make them a data set, and they won't have an MSV number). It would help if we could see the data files. The documentation describes how to create data sets: https://bix-lab.ucsd.edu/display/Public/GNPS+MassIVE+Dataset+Creation

I don't recall offhand which Cytoscape app it was, as we used it a few years back; we tend to use version 2.8. When needed, we now do post-GNPS merging using a script. This is also a way to merge Na and K adducts, but that is not perfect and all of this is very much in development.

Molecular networking combined with dereplication is not a solution to every question, so perhaps you can define the goal or question you are addressing, to see if the workflow is appropriate. It may be that it will not work for your purpose and you have to find other solutions. If that is the case, use the feature request option of this forum. We have a lot of tools in the development pipeline (some of them do not make it beyond the development stage when demand is low); one of them may be what you need, and if so you can become a tester, with all its associated frustrations :) When appropriate, we work with others to develop solutions.

P

Vanessa Phelan

Feb 16, 2016, 12:48:59 PM

to GNPS Discussion Forum and Bug Reports

Depending on the accuracy of your mass spectrometer and the version of cytoscape you're using, you may be able to remove duplicate nodes using cytoscape. I don't know if the plug-ins for cytoscape 2 are still widely available. I have not used the method for cytoscape 3.

I recommend performing network merging on high accuracy mass spec data (qTOF w/ lockmass applied, FT data or qExactive). In this case, the nodes are merged together to give you a single node. Please note that the cytoscape 2 network merge method will merge all nodes with the same m/z regardless of variations in retention time.

The generation of metanodes collapses the nodes into a larger node.

Cytoscape 2

- to cluster your nodes, use the Advanced Network Merge plug-in

- select a network

o under advanced network merge, choose the attribute you want to match between the nodes to merge (m/z)

o under advanced option, enable merging nodes/edges in the same network

- to create metanode

o you can download MetanodePlugin2 (http://apps.cytoscape.org/apps/metanodeplugin2)

Cytoscape 3

- to cluster your nodes, use the clusterMaker2 App. It should allow you to create a hierarchical cluster based on any arbitrary feature vector from your node attributes (i.e. m/z)

- to create a metanode in Cytoscape 3, you simply select a group of nodes and then right-click on the background and select Group-->Group Selected Nodes

o to do that based on an attribute, install the SetsApp, which includes the capability of creating groups based on node attributes.
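The merging idea described above, including the caveat about retention time, can be sketched outside Cytoscape as follows. This is a simplified greedy grouping written for illustration only; it is not the algorithm of any of the named plug-ins, and the node values, tolerances and field names are assumptions:

```python
def merge_nodes(nodes, ppm=10.0, rt_tol=None):
    """Greedily group nodes whose m/z agree within 'ppm'.

    If rt_tol (seconds) is given, retention times must also agree within
    rt_tol; with rt_tol=None, retention time is ignored.
    Each node is compared with the first member of every existing group.
    """
    groups = []
    for node in sorted(nodes, key=lambda n: n["mz"]):
        for group in groups:
            ref = group[0]
            mz_ok = abs(node["mz"] - ref["mz"]) <= ref["mz"] * ppm * 1e-6
            rt_ok = rt_tol is None or abs(node["rt"] - ref["rt"]) <= rt_tol
            if mz_ok and rt_ok:
                group.append(node)
                break
        else:
            groups.append([node])  # no matching group found
    return groups


# Illustrative nodes: ids 1-3 share an m/z, but id 3 elutes much later.
nodes = [
    {"id": 1, "mz": 586.2780, "rt": 300.0},
    {"id": 2, "mz": 586.2790, "rt": 302.0},
    {"id": 3, "mz": 586.2785, "rt": 600.0},
    {"id": 4, "mz": 734.4690, "rt": 450.0},
]

by_mz = [[n["id"] for n in g] for g in merge_nodes(nodes)]
by_mz_rt = [[n["id"] for n in g] for g in merge_nodes(nodes, rt_tol=10.0)]
print(by_mz)     # ids 1, 2 and 3 end up in one group
print(by_mz_rt)  # id 3 stays separate because of its retention time
```

Merging on m/z alone collapses the late-eluting node into the same group, which is how isomers get lost when retention time is not considered.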

As Pieter mentioned, scripts are being developed to overcome this issue. They just aren't quite ready for dissemination.

Best wishes,

Vanessa

Mingxun Wang

Feb 16, 2016, 3:51:50 PM

to GNPS Discussion Forum and Bug Reports

Hendrik,

Another issue is that you are turning off MS-Cluster. That is why no consensus creation is happening. You have set the "Minimum Cluster Size" to zero, which effectively turns it off. You must have "Run MSCluster" checked and also set the "Minimum Cluster Size" to at least 1.

-Ming

Kathleen and Pieter Dorrestein

Feb 16, 2016, 4:05:07 PM

to GNPS Discussion Forum and Bug Reports

Thanks Ming, I did not catch this :)

Hendrik, Ming was able to spot this by looking at the parameters in the links you provided, which is the reason why it is so important to send the links when asking about particulars of your experiment. Thanks for sending these. MSCluster is indeed key. It merges nearly identical spectra first. We typically use a minimum cluster size of 2 or 3 (I think the default is 2). This means you need 2 or 3 nearly identical spectra before a spectrum will be considered in molecular networking. This is also the best way to begin to deal with the noise. If you turn MSCluster off, you also see a lot of noise spectra. By requiring multiple spectra to be nearly identical, the spectra originating from random noise will be filtered out as well. If you still have many nodes with the same mass and retention time, then post-GNPS processing will likely be required, as discussed above.
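The effect of a minimum cluster size can be illustrated with a toy example. The sketch below is not the MS-Cluster algorithm; it is a simple greedy grouping by cosine similarity that ignores precursor mass, with made-up spectra and an arbitrary similarity threshold, and only shows why requiring at least two near-identical spectra removes one-off noise:

```python
import math


def cosine(a, b):
    """Cosine similarity of two spectra given as {nominal m/z: intensity} dicts."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster(spectra, min_similarity=0.95, min_cluster_size=2):
    """Greedy clustering; clusters smaller than min_cluster_size are dropped."""
    clusters = []
    for spec in spectra:
        for members in clusters:
            # Compare against the first spectrum of each existing cluster.
            if cosine(spec, members[0]) >= min_similarity:
                members.append(spec)
                break
        else:
            clusters.append([spec])
    return [c for c in clusters if len(c) >= min_cluster_size]


# Two near-identical spectra of one compound and one unrelated noise spectrum.
s1 = {100: 10.0, 200: 50.0, 300: 100.0}
s2 = {100: 12.0, 200: 48.0, 300: 100.0}
noise = {150: 5.0, 420: 7.0}
spectra = [s1, noise, s2]

print(len(cluster(spectra, min_cluster_size=2)))  # noise spectrum is dropped
print(len(cluster(spectra, min_cluster_size=1)))  # noise spectrum survives as its own cluster
```

With a minimum size of 2, only the pair of matching spectra survives; with a minimum size of 1, the noise spectrum is kept as a singleton and would show up as its own node.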

Let us know if this did the trick; if not, we will continue our exploration.

P

hendrik...@gmail.com

Feb 18, 2016, 3:16:27 AM

to GNPS Discussion Forum and Bug Reports

Hello again,

first of all thank you all for your fast responses and your help to fix my problem!

It seems that what Ming and Kathleen/Pieter said is right: if I use "1" as "minimum cluster size", everything works fine :) (I am so glad that it was a fault on my side). I chose "0" because I thought it would force the network to also draw single nodes without connections. I saw this in some publications. Unfortunately, this was not the right setting. Can you tell me what I have to set to get single nodes, too?

Following your advice, I subsequently ran some analyses to check the .mzXML conversion behaviour of DataAnalysis v4.3 with lock mass corrected MS data:
I) ID=bd788713cd07402caaaebc2d1d000a83 http://gnps.ucsd.edu/ProteoSAFe/status.jsp?task=bd788713cd07402caaaebc2d1d000a83 --> data were "manually" converted using menu_export
II) ID=96d316c0f8e54912942c6a01b5937f80 http://gnps.ucsd.edu/ProteoSAFe/status.jsp?task=96d316c0f8e54912942c6a01b5937f80 --> data were exported via Bruker script
In both cases the erythromycin networks contain the same nodes and the same number of corresponding MS/MS spectra, and the mzXML files have the same size. That is good, because I suspected that the script alters the export behaviour.

I also received the following advice from Michael Meehan regarding the header inaccuracy problem. I think it is worth sharing here, too:

"I'll address the second issue about the post-data-conversion header-inaccuracy that you are observing. To correct the problem requires 2 components:
1. The computer connected to the mass spectrometer needs to have the version of oTOF Control that it is running upgraded to oTOF Control v3.2 or higher. It is best to contact your local Bruker service contact for this upgrade.
2. Bruker provided us with a pre-release version of CompassXport for beta-testing. This specific version of CompassXport provided to us to test will be included in a future release of Bruker ESI Compass software suite.
From our testing we found that the combination of data acquired from a system running oTOF Control v3.2 and using the newer version of CompassXport completely corrected the inaccuracy of the precursor m/z reported in the header atop MS/MS spectrum.
However, when the data were acquired on a system running oTOF Control v3.1 or prior, then the new version of CompassXport was unable to perfectly export the precursor m/z from the header ...although it did yield a reasonable improvement in how close the precursor m/z in the header was to the actual precursor m/z in the data. Essentially, one cannot 100% fix the header issue using data acquired with the older version of oTOF Control, but all data acquired following the update to oTOF Control v3.2 can be corrected."

In my case, the MS system was bought 4 weeks ago and I am working with oTOF Control v4.x and CompassXport v3.0.9. It is speculative, but I think I cannot observe the header problem because Bruker has fixed it based on your work. If this is the case, thank you for your troubleshooting!

@Vanessa: Thank you for posting the Cytoscape plug-ins. I will have a look at them even though my problem seems to be fixed!

So, once again, thank you all for your fast responses and advice. That is how a community should work!

Greetings

Hendrik

Laura Sanchez

Feb 18, 2016, 12:23:07 PM

to GNPS Discussion Forum and Bug Reports

In order to view single nodes, use the 'Networkedges_selfloop' file in cytoscape instead of 'Networkedges'
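The reason a self-loop file helps is that a network built from an edge table can only contain nodes that appear in at least one edge. Assuming the self-loop file lists each unconnected node as an edge to itself, the effect can be sketched with plain tuples (the node ids are made up and no GNPS file format is parsed here):

```python
def nodes_from_edges(edges):
    """All node ids that occur in an edge list of (source, target) tuples."""
    return sorted({node for edge in edges for node in edge})


edges = [(1, 2), (2, 3)]            # only connected nodes are mentioned
edges_selfloop = edges + [(4, 4)]   # node 4 has no neighbours; it appears via a self-loop

print(nodes_from_edges(edges))
print(nodes_from_edges(edges_selfloop))
```

Node 4 is only visible in the second list, because the plain edge list never mentions it.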

Best,

Laura

Rachel Gregor

Feb 23, 2016, 8:32:58 AM

to GNPS Discussion Forum and Bug Reports

Hi all,

I'm just starting to play here at BGU, using an old Thermo LCQ fleet, certainly not high res :(

I've come across a similar problem in which some spectra are not being grouped together into a node that I assume should be (same mass, same RT-- see attached screenshot). I'm not sure if the problem is in the settings I'm using or in the consistency of my MS2-- I thought to try to collect more scans before sending to the exclusion list perhaps. There are a lot of examples of this in this run that I tried. I can of course manually group as suggested here, but I hope to improve it at least.

https://gnps.ucsd.edu/ProteoSAFe/status.jsp?task=95728a3dce56427b8927e221dd36884a

I also came across another problem in this data set, in which the opposite happened-- a lot of unrelated scans were grouped together as a node. See for example the first library ID, 7-acetoxycoumarin, which grouped 100 (!!) mostly unrelated scans...

Would love to hear what any of you think, thanks for this great forum resource!

Best, Rachel Gregor

mz 221.png

Laura Sanchez

Feb 23, 2016, 10:57:41 AM

to GNPS Discussion Forum and Bug Reports

Rachel,

Molecular networking really relies on MS2, so if nodes with the same MS1 mass and RT don't cluster, you should manually inspect the MS2 spectra. On GNPS you can do this by clicking on the tiny spectrum button on the left-hand side when you are looking at compounds, right next to the 11 and 12 in the screenshot you sent. If the MS2 spectra match, then we can talk about why they didn't cluster.

Your parent mass tolerance is huge, which explains the nodes that clustered but shouldn't have. I would recommend using 1 Da, even if it's an LCQ. The mass doesn't waver by more than a Da during a run, and with a parent mass tolerance of 2.5 you are actually searching +/- 5 Da for consensus nodes.