batch reading of Sciex wiff files using Sciex dll


Gautam Saxena

Nov 6, 2017, 4:41:52 PM
to ProteomicsQA
Does anyone know whether any publicly shareable documentation exists for the Sciex dll, so that one could read wiff files directly in an efficient, batch manner? We took a look at ProteoWizard's use of the Sciex dll, but it "feels" like there should be a (much) faster way to do *batch* retrievals of data. Currently, ProteoWizard fires a query for every single scan, asking the Sciex dll for a spectrum *given* a sample-period-experiment combination; we wonder whether the Sciex dll supports a function call along the lines of "give me *all* spectra for this wiff file in any order that's efficient". In theory, such a call should be able to retrieve the data (in memory) in, say, "less than 5x" the time it takes the OS to copy the original wiff file to hard disk.
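
To make the contrast between per-scan queries and a single batch pass concrete, here is a minimal sketch in Python. It uses a made-up toy record layout, not the wiff format, and the names `write_toy_file`, `get_spectrum` and `iter_spectra` are hypothetical; none of them belong to the Sciex dll or to ProteoWizard.

```python
import io
import struct

# Hypothetical toy layout (NOT the wiff format). Each record is:
#   uint32 n_peaks, then n_peaks x (float64 m/z, float32 intensity)


def write_toy_file(spectra):
    """Serialize spectra and return (bytes, list of record offsets)."""
    buf = io.BytesIO()
    offsets = []
    for peaks in spectra:
        offsets.append(buf.tell())
        buf.write(struct.pack("<I", len(peaks)))
        for mz, intensity in peaks:
            buf.write(struct.pack("<df", mz, intensity))
    return buf.getvalue(), offsets


def _read_record(stream):
    """Read one record starting at the current stream position."""
    (n_peaks,) = struct.unpack("<I", stream.read(4))
    return [struct.unpack("<df", stream.read(12)) for _ in range(n_peaks)]


def get_spectrum(data, offsets, index):
    """Per-scan access: every call opens a stream and seeks to one record."""
    stream = io.BytesIO(data)
    stream.seek(offsets[index])
    return _read_record(stream)


def iter_spectra(data):
    """Batch access: one sequential pass over all records in storage order."""
    stream = io.BytesIO(data)
    while stream.tell() < len(data):
        yield _read_record(stream)
```

In this toy, the per-scan path pays a fixed cost on every call (a fresh stream and a seek), while the iterator pays it once. Whether the real dll has comparable per-call overhead, or offers anything like the iterator, is exactly the open question.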

On a related note, I heard that Sciex had a converter for Sciex wiff SWATH files (to mzXML, I think?) that ran on Linux. Does anyone have thoughts on how performant and accurate that Linux solution was, and on the easiest/best way to get it? (I couldn't find it online, so I presume it requires a special request to someone at Sciex. Does anyone know who?)

David Bouyssié

Nov 16, 2017, 3:40:22 AM
to ProteomicsQA
Hi Gautam,

We are also using the same data access approach in our ProteoWizard fork (https://github.com/mzdb/pwiz-mzdb).
I agree with you that a spectrum iterator might be more efficient than the current API.
Though I don't know how they have implemented their random access internally.
I guess they use an optimized indexing system to obtain reasonable performance.

Thus, the best advice I could give is to contact AbSciex developers and ask them if another API is available.

If you have any progress on this side I'm also very interested.
Currently our .wiff -> mzDB conversion is not as fast as a Thermo .raw -> mzDB conversion, so any IO improvement would be welcome ;-)

As for the availability of the Linux libraries, sorry again, but I have no information to share.
It's always very difficult to have information regarding vendor software development.
I guess we don't have access to people knowing the details that would be useful for our research&development projects.

Next January, we are organizing a developer meeting in Ghent.
If people from AbSciex come to this event, it would be a great opportunity to gather more information and also to extend our list of enlightened contacts.

Best,

David

Eric Deutsch

Nov 16, 2017, 10:54:54 AM
to David Bouyssié, ProteomicsQA, Eric Deutsch

Hi David, thanks for the updates. I have heard that Thermo is now supporting Linux in their MSFileReader API, so I think we can look forward to conversion from RAW to other formats under Linux in the nearish future. I have not confirmed that it is already downloadable today, but it is imminent if not.

 

I have also heard that SCIEX now has conversion of wiff to mzML under Linux, but this may only be available within their OneOmics platform, and not generally available.

 

Sorry, all I have is hearsay, but there are some possibilities out there.

 

Eric

--
You received this message because you are subscribed to the Google Groups "ProteomicsQA" group.
To unsubscribe from this group and stop receiving emails from it, send an email to proteomicsqa...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/proteomicsqa/c435f623-523b-4cf8-8810-d39f241e41ae%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

David Bouyssié

Jan 14, 2018, 5:00:06 AM
to ProteomicsQA
Hi Eric,

I received the Linux version of MSFileReader from Thermo just a week ago :-D
However, the hardest part is yet to come.
I expect some long weeks of debugging before having something really usable...

I can also confirm that AbSciex engineers are working on a Linux version of the wiff file reader.
Lyle Burton did a talk last week at the EuBIC developers meeting in Ghent (http://uahost.uantwerpen.be/eubic18/sciex.html).
He mentioned this Linux library and also explained to us that Sciex was working on a wiff2 file format based on SQLite.

I know that Bruker is pursuing the same objective: their new file format is also partially based on SQLite (metadata only, plus an external binary file for the peaks).
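
As a rough illustration of that split (metadata in SQLite, peaks in a separate binary file), here is a minimal Python sketch. The schema, the column names and the functions `write_run`, `read_ms_level` and `demo` are all invented for this example; they do not reflect the actual Bruker or wiff2 layouts.

```python
import os
import sqlite3
import struct
import tempfile


def write_run(db_path, bin_path, spectra):
    """spectra: list of (rt_seconds, ms_level, [(mz, intensity), ...])."""
    con = sqlite3.connect(db_path)
    # Hypothetical schema: one metadata row per spectrum, pointing into bin_path.
    con.execute(
        "CREATE TABLE spectrum (id INTEGER PRIMARY KEY, rt REAL, "
        "ms_level INTEGER, byte_offset INTEGER, n_peaks INTEGER)"
    )
    with open(bin_path, "wb") as f:
        for sid, (rt, ms_level, peaks) in enumerate(spectra, start=1):
            byte_offset = f.tell()
            for mz, intensity in peaks:
                f.write(struct.pack("<dd", mz, intensity))
            con.execute(
                "INSERT INTO spectrum VALUES (?, ?, ?, ?, ?)",
                (sid, rt, ms_level, byte_offset, len(peaks)),
            )
    con.commit()
    con.close()


def read_ms_level(db_path, bin_path, ms_level):
    """Select spectra via SQL, then read their peaks in file order."""
    con = sqlite3.connect(db_path)
    rows = con.execute(
        "SELECT id, byte_offset, n_peaks FROM spectrum "
        "WHERE ms_level = ? ORDER BY byte_offset",
        (ms_level,),
    ).fetchall()
    con.close()
    result = {}
    with open(bin_path, "rb") as f:
        for sid, byte_offset, n_peaks in rows:
            f.seek(byte_offset)
            raw = f.read(16 * n_peaks)
            result[sid] = [
                struct.unpack_from("<dd", raw, 16 * k) for k in range(n_peaks)
            ]
    return result


def demo():
    """Write a tiny run to a temporary directory and read back the MS2 spectra."""
    spectra = [
        (0.5, 1, [(400.25, 100.0)]),
        (0.6, 2, [(200.5, 10.0), (300.75, 20.0)]),
        (0.7, 2, []),
    ]
    with tempfile.TemporaryDirectory() as tmp:
        db_path = os.path.join(tmp, "run.sqlite")
        bin_path = os.path.join(tmp, "run.bin")
        write_run(db_path, bin_path, spectra)
        return read_ms_level(db_path, bin_path, 2)
```

The appeal of such a design is that the selection step is plain SQL that any client can run, and ordering the query by offset lets a reader fetch a whole batch in one forward pass over the binary file.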

The road to SQLite-based raw files is open!
Let's see where this road will take us ;-)

Cheers,

David

Eric Deutsch

Jan 14, 2018, 11:53:00 PM
to David Bouyssié, ProteomicsQA, Eric Deutsch

Hi David, these all sound like great developments, thanks for the update!

 

Eric

 

 


Gautam Saxena

Apr 5, 2018, 9:25:01 AM
to ProteomicsQA
Hi David,

That's awesome regarding the Thermo Linux library -- thanks for sharing!

A few questions:
  1. Is this Linux Thermo library publicly available? (If not, would you know how one can get it?)
  2. Have you had a chance to use it and see how "well" it works, especially compared to performance/accuracy of its Windows counterpart?
  3. Does this Linux library work on Mac as well?
Thanks in advance, David et al.
-Gautam

David Bouyssié

Apr 8, 2018, 5:00:10 PM
to ProteomicsQA
Hi Gautam,

> 1. Is this Linux Thermo library publicly available? (If not, would you know how one can get it?)
It depends on what you mean by "publicly".
I would say it is available upon request for now.
Do you need a contact?

> 2. Have you had a chance to use it and see how "well" it works, especially compared to performance/accuracy of its Windows counterpart?

Unfortunately not yet :(
Our objective is to integrate it in our raw2mzDB tool (https://github.com/mzdb/pwiz-mzdb).
But the main project maintainer (Alexandre Burel) is working on a second project for the rest of the year.

I think we should ask the people working on ProteoWizard (Matt Chambers, for instance) whether they have a roadmap for integrating the Linux version of MSFileReader.
I would bet yes but can't say more than that ;-)

> 3. Does this Linux library work on Mac as well?
Yes, that seems to be the case.
Here is what I found in the release notes:
"The Windows and MacOS assemblies are in a combined NuGet package file while the Linux assemblies are in a separate package."

Best,

David

Gautam Saxena

Apr 14, 2018, 11:32:17 AM
to ProteomicsQA
Hi David et al:

So, I've been trying to read Sciex wiff files from Java by using SWIG + ProteoWizard's WiffFile.cpp file. I haven't quite got it working, but I'm 99% sure that's because I don't know C/C++ (and, more importantly, Boost), so I get all these compilation issues. I read in an old paper that the ProteoWizard folks were thinking about using SWIG for the ProteoWizard readers, but I'm guessing that idea must have gotten shelved at some point. Has anyone had success applying SWIG to the Sciex WiffFile.cpp class so that Java can read a WIFF file? If not, is anyone interested in working on that? (I'd help, but I just don't seem to have enough Boost/C++ skills to wrestle with the Boost/C++/dll compilation issues.)

Thanks in advance, David et al.

Regards,
Gautam

dtabb1973

Apr 15, 2018, 1:20:13 AM
to ProteomicsQA
I cannot really speak to the use of SWIG, but I have been trying to improve my peak-listing of WIFF files from a TripleTOF, as well.  The vendor peak-lister in msConvert for SCIEX is just the Analyst centroider, and it's okay rather than great.  The ProteinPilot peaklister in the SCIEX MS Data Converter beta is superior, but I've learned that SCIEX doesn't really support its use on SWATH experiments (in my hands, it does not work on SWATH experiments).  The CantWaiT and Turbocharger functions in ProteoWizard are a pretty effective tandem, but this combo seems to be more conservative about including peaks than the Analyst centroider, even at a signal-to-noise ratio of 1.0.

Good luck!
Dave

Brian Pratt

Apr 16, 2018, 11:45:14 AM
to ProteomicsQA
Hi Gautam,

Good news: there is in fact SWIG support in ProteoWizard, developed for use with Java. Bad news: I don't think anyone has used or even tried to build it in some years, but you could try reviving it. It wraps the RAMPAdapter class, which is probably too primitive for your purposes, but could serve as a starting point for your work.

The whole idea of ProteoWizard is to provide a layer of abstraction between your code and the vendor DLLs, which means a fair bit of code complexity, so you're not going to be able to just wrap one source file. There will be lots of dependencies, but the existing (and certainly stale and doubtless broken) SWIG bindings may point you in the right direction.

And, yes, Boost and bjam are truly awful to deal with; you'll just have to lean into it if you don't want to reinvent the entire build system. On the other hand, we've finally migrated to GitHub, so at least you don't have to fool around with SourceForge and Subversion any longer.

Good luck!

Brian Pratt