msconvert conversion of .wiff file to mzXML presents scans out of order...


Nathan Edwards

Jan 19, 2011, 12:34:09 PM
to spctools-discuss
I've had this problem with a variety of tools and their handling of .wiff
data files from Analyst, and now that I've gotten msconvert to work
(thanks Matt) I was hoping that msconvert did it "right".

Unfortunately, it doesn't seem so.

I believe that scan numbers and retention times should increase
monotonically in the mzXML file. In a tandem mass-spectrometry
experiment, I also expect each MS1 scan to be immediately followed by
the MS/MS scans whose precursors are derived from that MS1 scan.

A number (n >= 2) of converters (msconvert & ABI's) for .wiff files do
not respect this file structure and output the spectra by experiment
and cycle, with all experiment 1 (MS1 spectra) first, then all
experiment 2 (MS/MS from first selected precursor peak from each MS1
spectrum), then all experiment 3, etc.

In the msconvert mzXML output, there isn't even any reference in the
MS2 spectra to assist in determining the correct MS1 spectrum to
associate with the MS2 spectrum.

It is possible to use various tricks to try to determine cycle,
experiment, and MS1/MS2 relationships, but at the least these require
sorting (globally) on retentionTime, an expensive proposition for
large mzXML files.

I'd be happy to provide an example mzXML output to demonstrate the
issue.

- n

Matthew Chambers

Jan 19, 2011, 12:49:14 PM
to spctools...@googlegroups.com
I am well aware of this issue, but there's no schema rule requiring the file to be in retention time
order. And there is no scan number for a WIFF scan, so the mzXML output uses the arbitrary index that pwiz
translates to (that part, at least, actually does increase monotonically). Use mzML and nativeID! :)

The problem is the WiffFileReader API takes a relative eternity to switch between experiments. It's
quite slow enough as it is. :) You'll be happy to hear that the new API does not have the same
problem. With the current API it would be faster (except possibly with huge profile data) to first
convert to XML and then use a sorting filter to convert the XML to another file sorted by retention
time. Currently there is a sorting filter, but no built-in predicates that use it are accessible
from the command-line.

I'm not actually sure HOW to tell which scan is the precursor scan. In Thermo, figuring out the
precursor scan with certainty without parsing the scan event list (which comes in a fascinating
variety of formats) can be quite tricky. I don't know if the same problems exist in ABI and there's
no scan event list to check (AFAIK), so I punted.

-Matt



Nathan Edwards

Jan 19, 2011, 2:25:21 PM
to spctools-discuss

Ugh. I was worried it was due to efficiency issues with the vendor
API.

Sigh. Regardless of whether the scan numbers are real or made-up, I
think that the non-chronological order of the scans in the file is an
issue. I suspect others will be surprised by this too.

At the time of conversion it is possible to read in one order and write
in another without having to re-sort globally (read from # experiments
"caches" in turn), but without an experiment annotation in the spectrum
metadata, a global retentionTime sort is the only robust alternative I
can think of (though a linear-time merge of the # experiments
monotonically increasing runs is doable, I guess). There are also
retentionTime repeats (empty spectra before the real spectrum with the
correct retention time). More about this next.
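
A minimal Python sketch of the run-merging idea above, assuming each scan has already been reduced to a hypothetical (retention_time, scan_num) tuple in file order. It splits the list into non-decreasing retention-time runs (one per experiment in an experiment-major file) and merges them with heapq.merge. The record layout and function names are illustrative only and are not part of msconvert or pwiz.

```python
import heapq
from typing import List, Tuple

# Hypothetical record: (retention_time_seconds, scan_num), listed in file order.
Scan = Tuple[float, int]

def split_into_runs(scans: List[Scan]) -> List[List[Scan]]:
    """Split scans into maximal runs with non-decreasing retention time."""
    runs: List[List[Scan]] = []
    for scan in scans:
        if runs and scan[0] >= runs[-1][-1][0]:
            runs[-1].append(scan)   # still within the current run
        else:
            runs.append([scan])     # retention time dropped: start a new run
    return runs

def merge_by_retention_time(scans: List[Scan]) -> List[Scan]:
    """k-way merge of the runs; no global sort is needed."""
    runs = split_into_runs(scans)
    return list(heapq.merge(*runs, key=lambda scan: scan[0]))

# Two experiments written one after the other (experiment-major order).
scans = [(1.0, 1), (3.0, 2), (5.0, 3), (2.0, 4), (4.0, 5)]
print(merge_by_retention_time(scans))
```

With k runs the merge costs O(N log k), which is effectively linear for the handful of experiments in a typical cycle. When retention times tie, heapq.merge emits the scan from the earlier run first, so the repeated retention times mentioned above still need separate handling.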

How can I detect that the retention time is not monotonic without
reading a large chunk of the file? I guess I can look for a magic
string in the first 1K of the file (.wiff, Analyst) to decide whether
to do this expensive check, and fix.
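
One way to make the check itself cheap is to stream the file and stop at the first decrease. The sketch below assumes the mzXML scan elements carry retentionTime as an xs:duration in seconds (for example "PT1510.83S") and a num attribute; the function name is made up for illustration.

```python
import io
import re
import xml.etree.ElementTree as ET

# Assumed retentionTime format: xs:duration in seconds, e.g. "PT1510.83S".
_RT_SECONDS = re.compile(r"^PT([0-9.]+)S$")

def first_rt_inversion(source):
    """Return (previous_rt, current_rt, scan_num) at the first decrease in
    retention time, or None if retention times never decrease."""
    previous = None
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "end":
            elem.clear()  # discard parsed content to limit memory use
            continue
        if elem.tag.rsplit("}", 1)[-1] != "scan":  # ignore the XML namespace
            continue
        match = _RT_SECONDS.match(elem.get("retentionTime", ""))
        if match is None:
            continue
        current = float(match.group(1))
        if previous is not None and current < previous:
            return previous, current, elem.get("num")
        previous = current
    return None

# Tiny in-memory document standing in for a real file.
doc = (b'<mzXML xmlns="http://example.org/ns"><msRun>'
       b'<scan num="1" msLevel="1" retentionTime="PT1.5S"/>'
       b'<scan num="2" msLevel="1" retentionTime="PT3S"/>'
       b'<scan num="3" msLevel="2" retentionTime="PT2S"/>'
       b'</msRun></mzXML>')
print(first_rt_inversion(io.BytesIO(doc)))
```

The function also accepts a file name in place of the BytesIO object. Note the limitation: in an experiment-major file the first decrease only shows up where experiment 2 begins, so everything belonging to experiment 1 is still read before the check can answer.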

Without explicit information in the .wiff file data structure,
formally determining the precursor scan may not be possible, but the
"cycle,experiment" grouping (as opposed to experiment,cycle) will
capture the right relationships by chronology for the vast majority of
LC-MS/MS datasets.
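
A short Python sketch of the "cycle,experiment" ordering, assuming the spectra come from mzML and carry WIFF-style nativeID strings of the form "sample=1 period=1 cycle=1 experiment=1". That format is an assumption here (it is not confirmed anywhere in this thread), and the function name is hypothetical.

```python
import re
from typing import Tuple

_FIELD = re.compile(r"(\w+)=(\d+)")

def cycle_major_key(native_id: str) -> Tuple[int, int, int, int]:
    """Sort key that orders spectra by sample, period, cycle, then experiment."""
    fields = {name: int(value) for name, value in _FIELD.findall(native_id)}
    return (fields["sample"], fields["period"], fields["cycle"], fields["experiment"])

# Experiment-major order, as described for the msconvert output.
native_ids = [
    "sample=1 period=1 cycle=1 experiment=1",
    "sample=1 period=1 cycle=2 experiment=1",
    "sample=1 period=1 cycle=1 experiment=2",
    "sample=1 period=1 cycle=2 experiment=2",
]
for native_id in sorted(native_ids, key=cycle_major_key):
    print(native_id)
```

Parsing the fields as integers matters: a plain string sort would put cycle=10 before cycle=2.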

- n


Nathan Edwards

Jan 19, 2011, 2:49:02 PM
to spctools-discuss

OK, sorry, interrupted. Here is an example of the problem with
retentionTime sorting (and a msconvert/pwiz bug!):

msLevel  retentionTime  precursorMz    peaksCount  basePeakMz     startMz        endMz          num
1        1510.83        -              1263        371.082032334  225.102565257  1197.95325018  1031
1        1511.94        -              1257        473.596738128  223.037754119  1197.62626587  1032
1        1513.03        -              1281        473.596738128  223.03211017   1198.92138532  1033
2        1513.85        570.260399924  93          729.337882586  155.163041064  1393.62827746  1607
1        1513.85        -              1281        473.596738128  223.03211017   1198.92138532  1034
2        1514.6         570.260399924  121         570.260399924  175.104021303  1201.47442517  1608
1        1515.62        -              1273        371.082032334  223.037754119  1199.18311059  1035
1        1516.71        -              1232        371.089312373  217.869931256  1197.83553069  1036
1        1517.79        -              1159        371.082032334  221.094829342  1193.8887412   1037
1        1518.91        -              1181        371.082032334  223.043398139  1199.03915816  1038

I'm sure the table will be foobar'ed by a proportional font. Sigh.

Retention time 1513.85 appears twice, and I don't know how to
interpret this. I think MS1 scan num 1034 is empty in the wiff file
(it gets the retention time of the next spectrum, as it takes no time to
collect), and its spectrum is a carry-over from scan 1033 (notice that
the spectrum m/z metadata is identical for basePeakMz, startMz, and
endMz, which is very unlikely by chance). The base64 spectral data for
1033 and 1034 are identical too. That said, why are we taking an MS2
spectrum (1608) if we don't take an MS1 spectrum prior to it?
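
A small Python sketch of how such carried-over MS1 scans could be flagged automatically, using a subset of the rows from the table above. It compares each MS1 scan's summary metadata with that of the previous MS1 scan; the ScanRow type and function name are hypothetical, and a metadata match is only a hint that should be confirmed against the decoded peak data.

```python
from typing import List, NamedTuple

class ScanRow(NamedTuple):
    num: int
    ms_level: int
    retention_time: float
    peaks_count: int
    base_peak_mz: float
    start_mz: float
    end_mz: float

def repeated_ms1_scans(rows: List[ScanRow]) -> List[int]:
    """Return nums of MS1 scans whose summary metadata equals the previous MS1 scan's."""
    suspects = []
    previous = None
    for row in rows:
        if row.ms_level != 1:
            continue  # only MS1 scans are compared with each other
        fingerprint = (row.peaks_count, row.base_peak_mz, row.start_mz, row.end_mz)
        if fingerprint == previous:
            suspects.append(row.num)
        previous = fingerprint
    return suspects

# Rows copied from the table, in retention-time order.
rows = [
    ScanRow(1032, 1, 1511.94, 1257, 473.596738128, 223.037754119, 1197.62626587),
    ScanRow(1033, 1, 1513.03, 1281, 473.596738128, 223.03211017, 1198.92138532),
    ScanRow(1607, 2, 1513.85, 93, 729.337882586, 155.163041064, 1393.62827746),
    ScanRow(1034, 1, 1513.85, 1281, 473.596738128, 223.03211017, 1198.92138532),
    ScanRow(1608, 2, 1514.6, 121, 570.260399924, 175.104021303, 1201.47442517),
    ScanRow(1035, 1, 1515.62, 1273, 371.082032334, 223.037754119, 1199.18311059),
]
print(repeated_ms1_scans(rows))  # prints [1034]
```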

- n

Matthew Chambers

Jan 19, 2011, 3:01:32 PM
to spctools...@googlegroups.com
The validation would have to come from Analyst, I suspect. Do you have an installation you can look
at these cycles with? If there's a discrepancy between Analyst and WiffFileDataReader and there's a
feasible way to hack around it, that could be done.

-Matt

Nathan Edwards

Jan 19, 2011, 4:15:01 PM
to spctools-discuss
What do you mean by validation?

I have a copy of Analyst installed. I'm confident it doesn't have two
scans at the same retention time.

- n


Matthew Chambers

Jan 19, 2011, 4:21:15 PM
to spctools...@googlegroups.com
What is the corresponding nativeID for the index 1034? I'd look up that cycle and see what Analyst
has there.

-Matt

Nathan Edwards

Jan 25, 2011, 10:31:52 AM
to spctools-discuss

I'm getting back to analyzing this issue. Note that mzWiff outputs in
cycle major order (all experiments of each cycle in order), as opposed
to msconvert and the ABI tool. Furthermore, mzWiff's conversion time
for my test case wiff/wiff.scan file (a 15-sample file of approximately
27K spectra, about 120 MB of .wiff/wiff.scan data, no peak detection) is
about the same as for msconvert (about 10 minutes each) despite the cycle
major order of output.

These results suggest the API isn't always slower for cycle major
order. Perhaps the API can do random file seeks if there is
a .wiff.scan file, and not if there isn't? Dunno. However, this does
suggest there are cases where cycle major is no slower than experiment
major order...

I'll be looking into the repeat retention time issue and the apparent
duplicated spectrum next.

- n


Nathan Edwards

Jan 26, 2011, 12:43:35 PM
to spctools-discuss

Whoa. I am looking at the results of three different .wiff file
processing tools and an acquisition sample, and they all appear to
exhibit the same behavior with respect to repeated retention times and
duplicated MS1 spectra (which suggests the problem is in Analyst or the
Analyst API) :-(

Here is what I've observed. If the scans are sorted by retention time,
I observe that sometimes the last MS/MS experiment of a cycle has the
same retention time as the MS experiment of the next cycle. The MS/MS
spectrum has data in it. The MS spectrum data appears to be a repeat
of the previous cycle's MS experiment. Even mzWiff, which gives
each experiment of a cycle the same retention time, seems to have the
repeated MS scan if looked up directly.

Analyst (2.0 here) doesn't make it easy to figure out what the right
answer is, but it helped me form a hypothesis.

Some notation - let sij be spectrum in cycle i, experiment j, and
presume cycles consist of an MS spectrum (exp 1) and an MS/MS spectrum
(exp 2).

So in order, s11 -> s12 -> s21 -> s22, and I've observed that the
retention time of rt(s12) == rt(s21), and that spectrum(s11) ==
spectrum(s21). Also, precursorMz(s12) == precursorMz(s22).

In Analyst, it appears that spectrum(s22) is displayed when looking at
the s11 and s12 pair (with their retention times). Retention times
corresponding to s21 and s22 are not shown in the IDA explorer view.

What I think is happening is that MS scan s11 is taken,
precursorMz(s12) is selected and the acquisition of s12 is started.
However, time runs out (?) before enough signal is collected. s12 is
filled in with the current data when time runs out, and spectral
acquisition is continued. s22 represents the "2x acquisition time" of
s12 and holds the accumulation of two scans' worth of data. s21 is
filled in with s11's data and the spectral data in s22 is presented
with s12's meta-data.

Now we'd need LifeTech/ABI/MDS/Sciex to confirm or deny, but if all of
this is correct, the easiest fix would be to drop s12 and s21, but it
is unclear how all of this generalizes with more MS/MS experiments per
cycle with perhaps only experiment 3 requiring more time. Sigh.
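
A Python sketch of the "drop s12 and s21" rule, strictly under the hypothesis above and only for cycles of exactly one MS and one MS/MS experiment. The Spectrum type and function name are hypothetical, nothing here is confirmed vendor behavior, and the sketch does not attempt the case of more MS/MS experiments per cycle.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass(frozen=True)
class Spectrum:
    cycle: int
    experiment: int                         # 1 = MS, 2 = MS/MS
    retention_time: float
    peaks: Tuple[Tuple[float, float], ...]  # (m/z, intensity) pairs

def drop_suspect_pairs(cycles: List[Tuple[Spectrum, Spectrum]]) -> List[Spectrum]:
    """Drop s(i,2) and s(i+1,1) when rt(s(i,2)) == rt(s(i+1,1)) and
    spectrum(s(i,1)) == spectrum(s(i+1,1)); return the rest in cycle order."""
    dropped = set()
    for (ms_a, msms_a), (ms_b, _msms_b) in zip(cycles, cycles[1:]):
        same_rt = msms_a.retention_time == ms_b.retention_time
        repeated_ms = ms_a.peaks == ms_b.peaks
        if same_rt and repeated_ms:
            dropped.add((msms_a.cycle, msms_a.experiment))
            dropped.add((ms_b.cycle, ms_b.experiment))
    return [s for pair in cycles for s in pair
            if (s.cycle, s.experiment) not in dropped]

# Made-up spectra reproducing the pattern: rt(s12) == rt(s21), spectrum(s11) == spectrum(s21).
s11 = Spectrum(1, 1, 10.0, ((100.0, 5.0),))
s12 = Spectrum(1, 2, 11.0, ((200.0, 1.0),))
s21 = Spectrum(2, 1, 11.0, ((100.0, 5.0),))
s22 = Spectrum(2, 2, 12.0, ((200.0, 3.0),))
s31 = Spectrum(3, 1, 13.0, ((101.0, 4.0),))
s32 = Spectrum(3, 2, 14.0, ((300.0, 2.0),))
kept = drop_suspect_pairs([(s11, s12), (s21, s22), (s31, s32)])
print([(s.cycle, s.experiment) for s in kept])
```

Under these assumptions the surviving spectra are s11, s22, s31, and s32, so s22 ends up directly after s11.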

- n

Paul Bergen

Jan 27, 2011, 1:55:32 AM
to spctools-discuss
Sorry to interrupt the flow of ideas, but for the newcomer to these
file conversion issues: is there a tool that I could use to validate
a conversion from a .wiff file to mzXML?

thanks

Paul