mzXML conversion issues in Bruker


Max Cruesemann

Aug 23, 2016, 10:16:18 AM
to GNPS Discussion Forum and Bug Reports
Hi all,

maybe someone here can give me a better and quicker answer than someone from Bruker.
If I convert a .d file that was recalibrated in DataAnalysis to a .mzXML file (no matter whether I do it manually or with a script), all masses are still the old, wrong ones from before recalibration, even though I saved the .d file after calibration. How can I convert the calibrated file?
Has anyone had this issue before? I am using DataAnalysis 4.1.

Thanks, and best regards,

Max

Michael J. Meehan

Aug 23, 2016, 8:14:49 PM
to GNPS Discussion Forum and Bug Reports
Hello Max,

This is a known issue. This particular problem is caused by a combination of two factors:
1. The mass spectrometer used to acquire the data is running a version of oTOF Control earlier than v3.2. If the mass spectrometer is in fact running a version of oTOF Control prior to v3.2, then there is nothing that can be done to correct the precursor m/z exported into the mzXML.
2. There is a known issue with most versions of Bruker's CompassXport utility. This utility handles the .d to .mzXML conversion, whether you perform the conversion via DataAnalysis directly or via the script.
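
For anyone scripting the conversion outside of DataAnalysis, the batch step usually just shells out to CompassXport once per .d folder. The sketch below is only an illustration: the flag names -a, -o, and -mode are written from memory and should be verified against the help output of your own CompassXport version before use.

# Minimal batch wrapper around CompassXport (illustrative sketch only).
# Assumptions: CompassXport.exe is on the PATH and accepts
#   -a <analysis.d>   -o <output file>   -mode 0   (0 = mzXML)
# Verify these flags with your local CompassXport help before relying on them.
import subprocess
from pathlib import Path

DATA_DIR = Path(r"C:\data\recalibrated")  # folder holding the saved .d analyses

for d_folder in sorted(DATA_DIR.glob("*.d")):
    out_file = d_folder.with_suffix(".mzXML")
    print("Converting:", d_folder.name)
    subprocess.run(
        ["CompassXport.exe", "-a", str(d_folder), "-o", str(out_file), "-mode", "0"],
        check=True,
    )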

The solution:
We were provided with a custom version of CompassXport which corrects the precursor m/z issue during mzXML conversion. I will correspond with you directly about both 1 and 2.

Bruker has released a new version of the Compass ESI Suite of programs that includes updated versions of Data Analysis.  We recently received this software package but I have yet to test the mzXML conversion, so I will test this out shortly and update this thread once I know whether or not the latest versions of the Bruker software fix this particular problem.

Best regards,
Mike

Max Cruesemann

Aug 24, 2016, 3:18:23 AM
to GNPS Discussion Forum and Bug Reports
Thanks Mike, that was very helpful!

Best,

Max

Juan Camilo Rojas Echeverri

Apr 26, 2017, 9:01:20 AM
to GNPS Discussion Forum and Bug Reports
Hello Mike,

I thought this thread would be the appropriate place to ask about a recurring issue I have been finding with the clustering experiments I have performed.

In the image above you can see that some of the signals (below m/z 600) that get imported into the GNPS algorithm are artifacts of the export procedure from DataAnalysis (Bruker), whether it is done manually or in batch mode with a script. My guess is that when creating the line spectra it averages surrounding baseline signals and produces the artifact signal (?). The problem is that when clustering, the algorithm treats the artifacts as real signals and sometimes creates random matches between nodes that happen to share the same artifacts in the same region, as well as coinciding matches with library spectra. In the latter case, however, manual inspection shows that the spectral match is way off with regard to the other peaks and their relative intensities. This is very problematic because it creates matches based on random artifact overlap rather than on unique chemical information.


I'm aware that there are some parameters in the GNPS algorithm that can be adjusted to help restrict false matches, but I have encountered the following limitations:


- Minimum Peak Intensity: I could raise the bar to eliminate the artifacts. However, the intensity of the baseline is not always the same between experiments; it changes in particular with the percentage of organic phase reaching the instrument (a gradient run is used in the LC-MS setup). This makes it difficult to set a "standard" minimum peak intensity without sacrificing some signals.

- Minimum Matched Fragment Ions: I could raise the number of ions required for a match. Unfortunately, some compounds (e.g. phosphatidylcholines) fragment poorly, in the sense that there are only two or three fragment signals (with CID using nitrogen). Therefore, if I raise it too much, matches will be missed.
- Minimum Cluster Size: I could raise the number of spectra required to form a cluster and thereby decrease the probability of false matches. However, the instrument we used (micrOTOF-Q III) is reportedly slow at acquiring data in general, which leaves us with few scans of the same compound. Also, some experiments had few (if any) biological replicates, which prevents clustering the same spectrum across experiments from the same biological matrix.

So I would like to find a solution at the level of the line-spectra export, and I would like to ask you, or anyone else in the community, for ideas on how to accomplish the following:

- Apply a baseline-subtraction filter to all spectra before export into line spectra. Ideally I would like to take the median intensity of each spectrum (bumped up by 10-20%) and use it as a case-by-case threshold for baseline subtraction (probably using a script); a rough sketch of the idea is below.
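
To make the idea concrete, here is a minimal sketch of that median-based threshold, assuming the pyteomics library for reading mzXML (pip install pyteomics numpy). It only reports how many peaks would survive per MS2 spectrum; writing a filtered mzXML back out would be a separate step.

# Per-spectrum, median-based intensity threshold (the 10-20% "bump" idea), sketch only.
import numpy as np
from pyteomics import mzxml

BUMP = 1.2  # threshold = median intensity * 1.2 (median bumped up by 20%)

with mzxml.read("sample.mzXML") as reader:
    for spectrum in reader:
        if spectrum.get("msLevel") != 2:
            continue
        mz = spectrum["m/z array"]
        intensity = spectrum["intensity array"]
        if intensity.size == 0:
            continue
        threshold = BUMP * np.median(intensity)
        keep = intensity > threshold
        print(f"scan {spectrum['num']}: kept {keep.sum()} of {mz.size} peaks "
              f"(threshold {threshold:.1f})")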

Do you know exactly how the CompassXport export function works, or how I could access this information?

I will keep looking into this, but thought I could perhaps start a discussion here.

Mike, I'm really grateful for all the help you have provided us in Panama and I'm sorry I haven't taken the time to share my gratitude.

Sincerely, thank you for your time and help.

Juan C. Rojas

Laura Sanchez

Apr 26, 2017, 9:47:08 AM
to GNPS Discussion Forum and Bug Reports
Juan, 
Another solution, different from what you are suggesting, is to run a MeOH blank on your instrument and subtract the noise spectra to cut down on false matches, i.e. filter out of the data all nodes that are also contained in your control. Real spectra shouldn't form a consensus node with noise, but you're right, they may cluster with it. I posted the filtering instructions a while ago; a rough sketch of that filtering step is also below.
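
In practice that filtering is just a set difference on the clustering output: drop every cluster that also contains a spectrum from the blank run. A rough pandas sketch is below; the file name clusterinfo.tsv and the column names "cluster index" and "Filename" are placeholders and will differ depending on which GNPS result file you download.

# Blank subtraction on GNPS clustering output (file/column names are placeholders).
import pandas as pd

# One row per (cluster, originating spectrum) pair, exported from the GNPS job.
clusterinfo = pd.read_csv("clusterinfo.tsv", sep="\t")

# Clusters containing at least one spectrum from the MeOH blank run.
blank_mask = clusterinfo["Filename"].str.contains("MeOH_blank", case=False)
blank_clusters = set(clusterinfo.loc[blank_mask, "cluster index"])

# Keep only clusters never observed in the blank.
filtered = clusterinfo[~clusterinfo["cluster index"].isin(blank_clusters)]
filtered.to_csv("clusterinfo_blank_filtered.tsv", sep="\t", index=False)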

For Minimum Matched Peaks, I generally don't recommend lowering that past 4; in the spectra you showed, there are clearly at least four intense fragment ions that would drive clustering. Also, a micrOTOF-Q should still be able to scan fast enough to ensure that a given real analyte is fragmented at least twice over the course of an LC run, so you can set the minimum cluster size to 2. We use that with an LCQ, which only does 3 MS2 events for every MS1, and we aren't finding this to be a problem (though we've probably already taken a hit on the minor analogues).
Best, 
Laura

--
Laura Sanchez, PhD
Assistant Professor
Medicinal Chemistry and Pharmacognosy 
University of Illinois at Chicago
833 S Wood St, 321 PHARM
Chicago, IL 60612

Vanessa Phelan

Apr 26, 2017, 4:59:15 PM
to GNPS Discussion Forum and Bug Reports
Hi Juan-

For lipids such as the ones containing phosphocholine, I would actually suggest collecting in negative ion mode. In positive mode, the main fragment is m/z 184, which is intrinsically charged. Negative-mode fragmentation of lipids yields more fragments, which provides more structural information.

It would be helpful if you included a link to your GNPS job so we can take a look at the settings being used. We may be able to suggest some additional modifications to your settings that will reduce this particular issue.

You may also want to look into including a filtering step in your conversion. You can probably do this with MSConvert on the mzXMLs (a rough example is below), but realize that you will lose some legitimate peaks in your MS2 data.
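
For example, something like the following could re-filter the existing mzXML files with ProteoWizard's msconvert, dropping peaks below an absolute intensity cutoff. This is only a sketch: the cutoff value is arbitrary, and the filter syntax should be double-checked against msconvert --help for your version.

# Re-filter mzXML files with msconvert using an absolute intensity threshold (sketch).
import subprocess
from pathlib import Path

CUTOFF = "500"  # arbitrary absolute intensity cutoff; tune per dataset

for mzxml_file in sorted(Path("converted").glob("*.mzXML")):
    subprocess.run(
        [
            "msconvert", str(mzxml_file),
            "--mzXML",
            "--filter", f"threshold absolute {CUTOFF} most-intense",
            "-o", "filtered",
        ],
        check=True,
    )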

I agree with Laura. You should be able to collect data on the MicroTofQ at 10 Hz. I definitely collected at 10 Hz on the Dorrestein MicroTof. I collected 10 MS2 for every MS1. Your LC peak shape will suffer, but you'll have more scans. It really depends on the question you're asking and whether the chromatogram is more important than the MS2 data.

Like every mass spec tool, GNPS has tradeoffs: the broader your settings, the more noise you will get. You may want to perform feature finding on your MS1 data to visualize the production of a specific m/z value across samples (a quick way to check one target m/z is sketched below). This would help support conclusions based upon loose networking parameters.
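
As a small illustration of that last point, an extracted ion chromatogram for a single m/z can be pulled straight from the mzXML files. The sketch below assumes pyteomics; the target m/z and the 10 ppm tolerance are example values to adjust for your own data.

# Extracted ion chromatogram (EIC) for one target m/z from an mzXML file (sketch).
import numpy as np
from pyteomics import mzxml

TARGET_MZ = 760.5851  # example value only; replace with the ion of interest
PPM_TOL = 10.0

times, intensities = [], []
with mzxml.read("sample.mzXML") as reader:
    for spectrum in reader:
        if spectrum.get("msLevel") != 1:
            continue
        mz = spectrum["m/z array"]
        inten = spectrum["intensity array"]
        window = np.abs(mz - TARGET_MZ) <= TARGET_MZ * PPM_TOL * 1e-6
        times.append(spectrum["retentionTime"])
        intensities.append(float(inten[window].sum()) if window.any() else 0.0)

# times/intensities now hold the EIC trace, e.g. for plotting or peak integration
print(list(zip(times, intensities))[:10])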

Best wishes,
Vanessa
