Converting from XCalibur *.raw


brennmat

Apr 16, 2009, 11:49:33 AM
to spctools-discuss
Dear all

I am an absolute newbie and I am not sure if I came to the right
place... well, here's my question:

I am looking for a tool that extracts chromatogram data from XCalibur
*.raw files and converts them into a format that is readable by other
software (e.g. Matlab or Octave). My *.raw files contain data from a
GC/MS system which has both an MS and an ECD (electron capture
detector; note that the data from the ECD does not contain masses,
only signal intensity as a function of time).
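For the Matlab/Octave side, the simplest target is probably a plain two-column text file. The sketch below is not part of any Thermo or pwiz tool; `formatTwoColumn` is a hypothetical helper that assumes the times and intensities have already been read into memory by some other means, and turns them into text that `load` in Octave or Matlab can read as an N-by-2 matrix.

```cpp
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: format a chromatogram as "time intensity" rows,
// one pair per line, separated by a single space. Saved to a file, this
// text can be read with load() in Octave or Matlab.
std::string formatTwoColumn(const std::vector<double>& times,
                            const std::vector<double>& intensities)
{
    if (times.size() != intensities.size())
        throw std::invalid_argument("times and intensities differ in length");

    std::ostringstream out;
    out.precision(17);  // 17 significant digits round-trip any IEEE double
    for (std::size_t i = 0; i < times.size(); ++i)
        out << times[i] << ' ' << intensities[i] << '\n';
    return out.str();
}
```

Writing the returned string to a file such as F011.txt (e.g. with std::ofstream) and calling `d = load('F011.txt');` in Octave should then give the times in `d(:,1)` and the intensities in `d(:,2)`.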

I can think of two ways to achieve this:

1. There is already a tool available that does what I need. This might
be a converter that converts the XCalibur *.raw into a file that is
readable by Matlab or Octave. Remember that I not only need the MS
data, but also the ECD data which does not have any mass information.
Otherwise I could use XCalibur/xconvert or ReAdW, which export to ANDI
or mzXML. Is there a converter which also processes 'massless'
chromatograms?

2. I might try to write some code in C/C++ that does what I need. This
might be either a standalone program or a MEX or OCT binary that runs
from within Matlab or Octave. In both cases I'd need some help to get
me started on how to use the XCalibur DLL stuff in C/C++. I have never
worked with C/C++ on Microsoft Windows and I've got no clue how to
work with DLLs. Any pointers or code examples would be greatly
appreciated.

I'd be very grateful for any hints, comments and suggestions.


Thanks
Matthias

Matthew Chambers

Apr 16, 2009, 12:21:28 PM
to spctools...@googlegroups.com
ProteoWizard's msconvert will almost work for this. If you give me an
example RAW file I can fix it up rather quickly. ReAdW isn't close to
being able to support this, not least because mzXML doesn't support
chromatograms (and it's also entirely about mass spectra). I have tested
msconvert on RAW files with PDA spectra, but in the absence of test data
it's difficult to develop for the other controller types. I'm not even
sure which controller would be used for ECD data. Is that using the ADC
controller?

-Matt

Natalie Tasman

Apr 16, 2009, 12:29:09 PM
to spctools...@googlegroups.com
Hi Matthias,

To get started with Xcalibur, you might want to look at the "XDK" for examples.

But I do think it will be easier if you can just use and/or adapt existing tools.  I know that Matt from the ProteoWizard (PWIZ) project (included in the TPP) has been interested in parsing chromatogram data, and PWIZ (as well as the TPP) includes various tools for exporting data to text files.  Let's see what he has to say.

-Natalie

Brian Pratt

Apr 16, 2009, 12:37:51 PM
to spctools...@googlegroups.com
The great thing about TPP, of course, is that it's open source. So, you've
already got source code and MSVC build files for readw, which shows you
exactly how to use the Thermo DLLs.

After that, well, Google is your friend!

Begin here:
https://sashimi.svn.sourceforge.net/svnroot/sashimi/trunk

Or maybe better yet, here:
https://proteowizard.svn.sourceforge.net/svnroot/proteowizard/trunk



Good luck,

Brian

Christopher Mason

Apr 16, 2009, 12:47:18 PM
to spctools...@googlegroups.com
Hi Matthias,

[Looks like some other folks might be able to help you out directly;
take the below with a grain of salt]

Most of the folks here work with LC-MS rather than GC-MS. However, the
existing software for extracting LC-MS data should at least get you
started towards your own. I would start with ReAdW:

http://tools.proteomecenter.org/wiki/index.php?title=Software:ReAdW

http://sashimi.svn.sourceforge.net/viewvc/sashimi/trunk/trans_proteomic_pipeline/src/util/

It uses the "XRawfile" DLL. There's some minimal documentation in
c:\Xcalibur\examples\xdk\ and c:\Xcalibur\Help\xdkhelp\, if you choose
to install it when you install XCalibur, but I mostly end up cribbing
from others, using ObjectBrowser and writing little test programs.

Note that, at least for LCMS data, there are two types of data that
could be stored in the RAW file: "centroid" (stick peaks, just a single
mass/abundance pair for each peak), and "profile" (peaks with actual
shape). You have to use the right function for each (ReAdW does this)
or you get wacky data. I suspect you'll need to use a totally separate
function for the EC detector.

The trickiest bit I've found is the string manipulation; there are
approximately fourteen million different kinds of character strings on
Windows, and figuring out how to convert between them all is a pain. If
someone has a good reference for this, I'd love to see it.
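Most of that string pain comes down to encodings: Windows wide strings and the BSTRs returned by COM calls are UTF-16, while most portable code (and text files for Octave) wants UTF-8 or plain ASCII. On Windows the usual route is WideCharToMultiByte with CP_UTF8; purely to show what that conversion does, here is a portable sketch (not taken from ReAdW or pwiz) that converts a std::u16string to UTF-8, including surrogate pairs. Replacing unpaired surrogates with U+FFFD is a choice made for this sketch.

```cpp
#include <cstddef>
#include <string>

// Append one Unicode code point to `out`, encoded as UTF-8.
static void appendUtf8(std::string& out, char32_t cp)
{
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Convert UTF-16 (the encoding of Windows wide strings and BSTRs) to UTF-8.
// Unpaired surrogates are replaced by U+FFFD (replacement character).
std::string utf16ToUtf8(const std::u16string& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cu = in[i];
        if (cu >= 0xD800 && cu <= 0xDBFF && i + 1 < in.size()
            && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            // High surrogate followed by low surrogate: combine them.
            char32_t lo = in[++i];
            appendUtf8(out, 0x10000 + ((cu - 0xD800) << 10) + (lo - 0xDC00));
        } else if (cu >= 0xD800 && cu <= 0xDFFF) {
            appendUtf8(out, 0xFFFD);  // lone surrogate
        } else {
            appendUtf8(out, cu);
        }
    }
    return out;
}
```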

Good luck!

-c
--
Christopher Mason Proteome Software (503) 244-6027

brennmat

Apr 17, 2009, 6:10:12 AM
to spctools-discuss, Matthias Brennwald


On Apr 16, 6:29 pm, Natalie Tasman <ntas...@systemsbiology.org> wrote:
> Hi Mattias,
>
> To get started with Xcalibur, you might want to look at the "XDK" for
> examples.

That's what I did. I looked at the HTML documentation and the
examples. However, the HTML stuff does not give detailed information
on the objects and datatypes available, and the code examples are in
Visual Basic (I do not speak Visual Basic at all).

> But I do think it will be easier if you can just use and/or adapt existing
> tools.

I agree, but it might still be fun and educational to get my hands
dirty...

Thanks
Matthias

brennmat

Apr 17, 2009, 6:07:46 AM
to spctools-discuss, Matthias Brennwald


On Apr 16, 6:21 pm, Matthew Chambers <matthew.chamb...@vanderbilt.edu>
wrote:
> ProteoWizard's msconvert will almost work for this. If you give me an
> example RAW file I can fix it up rather quickly.

You can get a few of my data files here:
http://homepages.eawag.ch/~brennmat/stuff/Xcalibur/testfiles/

I tried running msconvert on one of my files. It gave the following
error:
---------------
bad lexical cast: source type value could not be interpreted as target
Error processing file F011.RAW
---------------

> ReAdW isn't close to
> being able to support this, not least because mzXML doesn't support
> chromatograms (and it's also entirely about mass spectra). I have tested
> msconvert on RAW files with PDA spectra, but in the absence of test data
> it's difficult to develop for the other controller types. I'm not even
> sure which controller would be used for ECD data. Is that using the ADC
> controller?

I do not understand well enough what this is all about. What does PDA
mean? Which analog/digital converter are you referring to?

Thanks!

Matthias

Natalie Tasman

Apr 17, 2009, 1:30:32 PM
to spctools...@googlegroups.com
Matthias,

I'm glad to hear that there's another enthusiastic developer interested in the Thermo converter.  In readw, I suggest looking at the '#import'-ed Xrawfile2.dll header, which shows which COM calls are available.  There is a .doc file in the SDK describing all of the calls.  Unfortunately, it is sometimes incomplete and poorly documents some important calls.  There are notes in the readw source code where we've encountered this.

Also, the thermo reader in the msconvert code does a nice job of writing C++ wrapper functions for the rawfile access, so that's another great place to start.

-Natalie

Matthew Chambers

Apr 17, 2009, 1:47:04 PM
to spctools...@googlegroups.com
Hi Matthias,

I looked at an example file and supporting this in pwiz is
straightforward, but I don't have time to work on it today. If you want
to patch it in yourself, look at:
ChromatogramList_Thermo.cpp
There are two member functions of that class that will need updating:
createIndex() and chromatogram().

Look for the line "case 5: // generate "Total Scan" chromatogram for
entire run", which is the PDA case, which is another kind of detector
that Thermo instruments can use. The ECD chromatograms should be handled
very similarly.

The ECD chromatogram is accessed through Controller_Analog and getting a
chromatogram from it would be like:
    rawfile_->setCurrentController(Controller_Analog, 1);
    auto_ptr<ChromatogramData> cd = rawfile_->getChromatogramData(
        Type_TIC, Operator_None, Type_MassRange,
        "", "", "", 0,
        0, rawfile_->rt(rawfile_->value(NumSpectra)),
        Smoothing_None, 0);
The only thing that I'm not sure about is whether the NumSpectra
variable still applies when the ECD doesn't collect any spectra. In that
case, there'd have to be some other way to get the maximum retention time.

Modifying pwiz should be much easier than trying to work with the raw XDK.

-Matt


DD

Apr 17, 2009, 2:00:49 PM
to spctools-discuss
Maybe give XChroDisplay in the XDK a try. You could also call
rawfile.exe in the XDK, which allows exporting of analog data. We did
do this at one time (for LC data), so I will ask the group how. We
stopped when we started using Promass (again for LC data), which also
pulls this data.

brennmat

Apr 18, 2009, 3:18:31 AM
to spctools-discuss, Matthias Brennwald
Thanks Matt!

I'll look into this next week when I'm back at work. I don't have a
Windoze machine at home.

Do you think the error I reported in my previous message is related to
pwiz not (yet) knowing about the Analog Controller?

Thanks again
Matthias


brennmat

Apr 20, 2009, 2:48:26 AM
to spctools-discuss, Matthias Brennwald
Dear Matt

I tried to apply your suggestion to the code. The modified file is
here:

http://homepages.eawag.ch/~brennmat/stuff/Xcalibur/pwiz/ChromatogramList_Thermo.cpp

However, I have to admit I did not really understand what I was doing.
Also, when I opened the project with MS Visual C++ 2008, the project
needed to be converted. I did not know what else to do, so I agreed to
convert it. If I try to build the project, the build stops with the
following:

-----------
Build Log Build started: Project: pwiz, Configuration: Debug|Win32
Command Lines Creating temporary file "C:\DOCUME~1\MATTHI~1\LOCALS~1\Temp\BAT00000440802108.bat" with contents
[
@echo off

cd c:\Documents and Settings\Matthias Brennwald\Desktop\pwiz_1.5.2\\build

bjam debug -q -n

if errorlevel 1 goto VCReportError

goto VCEnd

:VCReportError

echo Project : error PRJ0019: A tool returned an error code from "Performing Makefile project actions"

exit 1

:VCEnd
]
Creating command line "C:\DOCUME~1\MATTHI~1\LOCALS~1\Temp\BAT00000440802108.bat"
Output Window Performing Makefile project actions
The system cannot find the path specified.
'bjam' is not recognized as an internal or external command, operable program or batch file.
Project : error PRJ0019: A tool returned an error code from "Performing Makefile project actions"
Results Build log was saved at "file://c:\Documents and Settings\Matthias Brennwald\Desktop\pwiz_1.5.2\Debug\BuildLog.htm"
pwiz - 1 error(s), 0 warning(s)
-----------

I have no idea how to proceed with this... what do you think?

Matthias



brennmat

Apr 20, 2009, 3:20:40 AM
to spctools-discuss, Matthias Brennwald
Thanks for this suggestion. I tried rawfile.exe from the XDK examples.
It does export the data from the ECD, but there are a few issues:

1. I'd like to call it from a batch script or from XCalibur to
automatically extract the data from the raw files without user
intervention. I don't know if/how this can be done.

2. The time values in the exported ASCII file are truncated to only
two digits after the decimal point. Since the times are in minutes,
this gives a time resolution of 0.01 min = 0.6 seconds, which is not
enough because there are about three data points per second in my data
files. I do not know how to change this because I do not know Visual
Basic.
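To see why two decimals are not enough: at three points per second the samples are about 1/180 min apart, which is smaller than the 0.01 min step that two decimals can express, so neighbouring samples end up with the same printed time. The sketch below only reproduces that arithmetic; `formatMinutes` and `distinctLabels` are hypothetical helpers, and the 1/3 s spacing is taken from the description above.

```cpp
#include <cstddef>
#include <iomanip>
#include <set>
#include <sstream>
#include <string>

// Format a retention time (in minutes) with a fixed number of decimals.
std::string formatMinutes(double minutes, int decimals)
{
    std::ostringstream out;
    out << std::fixed << std::setprecision(decimals) << minutes;
    return out.str();
}

// Count how many distinct time labels survive when `n` samples spaced
// `spacingSeconds` apart are printed with `decimals` decimals.
std::size_t distinctLabels(std::size_t n, double spacingSeconds, int decimals)
{
    std::set<std::string> labels;
    for (std::size_t i = 0; i < n; ++i)
        labels.insert(formatMinutes(i * spacingSeconds / 60.0, decimals));
    return labels.size();
}
```

With 30 samples (10 s of data), two decimals leave only 17 distinct times, while four decimals (0.0001 min = 6 ms) keep all 30.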

Matthias

Matthew Chambers

Apr 20, 2009, 11:37:38 AM
to spctools...@googlegroups.com
I think you pretty much have it covered. We don't compile with MSVC,
though; we use Boost Build. The MSVC project is just there because I use
MSVC as a development environment. If you want an easy way to build,
just run quickbuild.bat.

-Matt

DD

Apr 20, 2009, 11:51:26 AM
to spctools-discuss
1)
a) Of course, one quick workaround is to make a processing method that
has rawfile.exe selected as a program to run. Then you can have this
processing method produce an Excel (or whatever format) file of output.
Note there is a whole slew of scriptable parameters that can be added.

b) C:\Xcalibur\system\programs\XRawfile OCX.doc pp. 153-156 is what you
want if you are going to go the ActiveX route. It looks like pwiz has
this stuff covered, but it may help end users to have access to it.

2)
I believe 1b) should handle this, but to some earlier points, Thermo
doesn't seem to update the XDK all that often, so this could be out of
date.

Later in the week I should be able to pull out our deprecated
functions for you to use for this functionality.
DD

brennmat

Apr 21, 2009, 1:53:05 AM
to spctools-discuss, Matthias Brennwald
Ok, I see. I double-clicked the quickbuild.bat file. This opened a
terminal window showing what was going on, but it closed too quickly
for me to read anything. I then opened a terminal (Start -> Run -> cmd),
cd'd to the directory containing quickbuild.bat and ran it. The process
stopped after a while saying that 'C:\Documents' is not recognized as
an internal or external command, operable program or batch file. The
previous command started with 'C:\Documents and Settings\Matthias
Brennwald\Desktop\...', so I guess Windows screwed up the spaces...
what am I doing wrong?

Are you going to include support for the Analog channel in pwiz
anyway? (I hope so!) If so, I might just wait until you come up with a
compiled binary.

Matthias


brennmat

Apr 21, 2009, 7:51:55 AM
to spctools-discuss
Thanks. I tried to look at pages 153-156 of XRawfile OCX.doc, but the
file has only 143 pages (I have XCalibur 1.4).

Matthias

Matthew Chambers

Apr 21, 2009, 12:35:40 PM
to spctools...@googlegroups.com
Looks like my script doesn't like spaces. I'll fix it, but did you try
building the source from a path without spaces?

And yes, I eventually will include support for Analog based on your
example file, it just won't be for a week or two due to an upcoming trip.

-Matt

DD

Apr 21, 2009, 2:01:37 PM
to spctools-discuss
Let me know where to drop the zipped file. In the interim:

GetChroData

long GetChroData(long nChroType1, long nChroOperator, long nChroType2,
LPCTSTR szFilter, LPCTSTR szMassRanges1, LPCTSTR szMassRanges2, double
dDelay, double FAR* pdStartTime, double FAR* pdEndTime, long
nSmoothingType, long nSmoothingValue, VARIANT FAR* pvarChroData,
VARIANT FAR* pvarPeakFlags, long FAR* pnArraySize);

Return Value

1 if successful; otherwise, see Error Codes.

Parameters

nChroType1 A long variable containing the first chromatogram trace
type of interest.

nChroOperator A long variable containing the chromatogram trace
operator.

nChroType2 A long variable containing the second chromatogram trace
type of interest.

szFilter A string containing the formatted scan filter.

szMassRanges1 A string containing the formatted mass ranges for the
first chromatogram trace type.

szMassRanges2 A string containing the formatted mass ranges for the
second chromatogram trace type.

dDelay A double precision variable containing the chromatogram delay
in minutes.

pdStartTime A pointer to a double precision variable containing the
start time of the chromatogram time range to return.

pdEndTime A pointer to a double precision variable containing the end
time of the chromatogram time range to return.

nSmoothingType A long variable containing the type of chromatogram
smoothing to be performed.

nSmoothingValue A long variable containing the chromatogram smoothing
value.

pvarChroData A valid pointer to a VARIANT variable to receive the
chromatogram data.

pvarPeakFlags A valid pointer to a VARIANT variable to receive the
peak flag data.

pnArraySize A valid pointer to a long variable to receive the number
of time intensity pairs returned in the chromatogram array.

Remarks

Returns the requested chromatogram data as an array of double
precision time intensity pairs in pvarChroData. The number of time
intensity pairs is returned in pnArraySize.

The chromatogram trace types and operator values of nChroType1,
nChroOperator, and nChroType2 are dependent on the current controller.
See Chromatogram Type and Chromatogram Operator in the Enumerated
Types section for a list of the valid values for the different
controller types.

The scan filter field is only valid for MS controllers. If no scan
filter is to be provided, the value of szFilter may be NULL or an
empty string. Scan filters must match the Xcalibur scan filter format.
See the topic scan filters format, definition in the Xcalibur online
help for information on how to construct a scan filter.

The dDelay value contains the retention time offset to add to the
returned chromatogram times. The value may be set to 0.0 if no offset
is desired. This value must be 0.0 for MS controllers. It must be
greater than or equal to 0.0 for all other controller types.

The mass ranges are only valid for MS or PDA controllers. For all
other controller types, these fields must be NULL or empty strings.
For MS controllers, the mass ranges must be correctly formatted mass
ranges and are only valid for Mass Range and Base Peak chromatogram
trace types. For PDA controllers, the mass ranges must be correctly
formatted wavelength ranges and are only valid for Wavelength Range
and Spectrum Maximum chromatogram trace types. These values may be
left empty for Base Peak or Spectrum Maximum trace types but must be
specified for Mass Range or Wavelength Range trace types. See the
topic Mass1 (m/z) text box in the Xcalibur online help for information
on how to format mass ranges.
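The delay and mass-range rules above can be checked before calling GetChroData. This sketch encodes only the rules as stated in this excerpt; the `Controller` enum and `checkChroArgs` are hypothetical names, not part of the XRawfile API, and std::string stands in for the NULL-or-empty strings (an empty string covers both cases). For the ECD on an analog controller, it means both mass-range strings must be empty.

```cpp
#include <string>

// Hypothetical classification of the current controller.
enum class Controller { MS, PDA, Analog, Other };

// Returns an empty string if the arguments satisfy the documented rules,
// otherwise a short description of the first violation found.
std::string checkChroArgs(Controller ctrl, double delayMinutes,
                          const std::string& massRanges1,
                          const std::string& massRanges2)
{
    // dDelay: must be 0.0 for MS controllers, >= 0.0 for all others.
    if (ctrl == Controller::MS && delayMinutes != 0.0)
        return "delay must be 0.0 for MS controllers";
    if (delayMinutes < 0.0)
        return "delay must not be negative";

    // Mass (or wavelength) ranges only apply to MS and PDA controllers.
    const bool rangesAllowed = ctrl == Controller::MS || ctrl == Controller::PDA;
    if (!rangesAllowed && (!massRanges1.empty() || !massRanges2.empty()))
        return "mass ranges must be empty for this controller type";

    return "";
}
```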

The start and end times, pdStartTime and pdEndTime, may be used to
return a portion of the chromatogram. The start time and end time must
be within the acquisition time range of the current controller which
may be obtained by calling GetStartTime and GetEndTime, respectively.
Alternatively, if the entire chromatogram is to be returned,
pdStartTime and pdEndTime may be set to zero. On return, pdStartTime
and pdEndTime will contain the actual time range of the returned
chromatographic data.
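Per the paragraph above, passing 0 for both start and end returns the whole chromatogram and reports the actual range back. If that also holds for the analog (ECD) controller, which is an assumption not checked here, it would sidestep the question of whether NumSpectra applies when no spectra are collected. The request-side rule can be sketched as follows; `resolveTimeRange` is a hypothetical helper that only mirrors the documented convention and does not call the XRawfile control.

```cpp
#include <optional>
#include <utility>

// Hypothetical helper mirroring the documented GetChroData convention:
// a request of (0, 0) means "the whole acquisition range"; any other
// request must lie inside [acqStart, acqEnd] (times in minutes).
// Returns the effective range, or std::nullopt if the request is invalid.
std::optional<std::pair<double, double>>
resolveTimeRange(double reqStart, double reqEnd, double acqStart, double acqEnd)
{
    if (reqStart == 0.0 && reqEnd == 0.0)
        return std::make_pair(acqStart, acqEnd);
    if (reqStart < acqStart || reqEnd > acqEnd || reqStart > reqEnd)
        return std::nullopt;
    return std::make_pair(reqStart, reqEnd);
}
```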

The nSmoothingType variable contains the type of smoothing to perform
on the returned chromatographic data. See Smoothing Type in the
Enumerated Types section for a list of the valid values for
nSmoothingType. The value of nSmoothingValue must be an odd number in
the range 3 to 15 if smoothing is desired.
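The smoothing-window rule is easy to get wrong when the value comes from user input. A minimal check, assuming only the constraint stated above (odd, 3 to 15 inclusive, when smoothing is requested); both function names are hypothetical, and rounding even values up is a choice of this sketch.

```cpp
// Documented constraint: when smoothing is requested, nSmoothingValue
// must be an odd number between 3 and 15 inclusive.
bool isValidSmoothingValue(long value)
{
    return value >= 3 && value <= 15 && value % 2 == 1;
}

// Hypothetical helper: clamp a requested window to [3, 15] and make it
// odd by rounding up (e.g. 4 -> 5, 20 -> 15, 0 -> 3).
long nearestValidSmoothingValue(long requested)
{
    if (requested < 3) return 3;
    if (requested > 15) return 15;
    return requested % 2 == 1 ? requested : requested + 1;
}
```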

The chromatogram list contents will be returned in a SafeArray
attached to the pvarChroData VARIANT variable. When passed in, the
pvarChroData variable must exist and be initialized to VARIANT type
VT_EMPTY. If the function returns successfully, pvarChroData will be
set to type VT_ARRAY | VT_R8. The format of the chromatogram list
returned will be an array of double precision values in time intensity
pairs in ascending time order (e.g. time 1, intensity 1, time 2,
intensity 2, time 3, intensity 3, etc.)
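Because the data come back as one flat array of doubles (time, intensity, time, intensity, ...), the example further down simply views it as an array of two-double records. Independent of COM, the same buffer can be split into two vectors. This portable sketch assumes the doubles have already been obtained from the SafeArray (e.g. via SafeArrayAccessData) and that `nPairs` is the value returned in pnArraySize; `Chromatogram` and `deinterleave` are hypothetical names.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

struct Chromatogram {
    std::vector<double> times;        // retention times (assumed minutes)
    std::vector<double> intensities;  // detector signal
};

// Split a flat [t1, i1, t2, i2, ...] buffer holding nPairs pairs into
// separate time and intensity vectors, checking the documented ordering.
Chromatogram deinterleave(const double* data, std::size_t nPairs)
{
    Chromatogram c;
    c.times.reserve(nPairs);
    c.intensities.reserve(nPairs);
    for (std::size_t k = 0; k < nPairs; ++k) {
        const double t = data[2 * k];
        if (!c.times.empty() && t < c.times.back())
            throw std::runtime_error("times are not in ascending order");
        c.times.push_back(t);
        c.intensities.push_back(data[2 * k + 1]);
    }
    return c;
}
```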

The pvarPeakFlags variable is currently not used. This variable is
reserved for future use to return flag information, such as saturation,
about each time intensity pair.

On successful return, pnArraySize will contain the number of time
intensity pairs stored in the pvarChroData array.

Example

// example for GetChroData to return the MS TIC trace

typedef struct _datapeak
{
    double dTime;
    double dIntensity;
} ChroDataPeak;

XRawfileCtrl.SetCurrentController( 0, 1 ); // first MS controller

VARIANT varChroData;
VariantInit( &varChroData );
VARIANT varPeakFlags;
VariantInit( &varPeakFlags );
long nArraySize = 0;
double dStartTime = 0.0; // start = end = 0.0 requests the entire chromatogram
double dEndTime = 0.0;
long nRet = XRawfileCtrl.GetChroData( 1,    // nChroType1: TIC trace
                                      0,    // nChroOperator
                                      0,    // nChroType2
                                      NULL, // szFilter: no scan filter
                                      NULL, // szMassRanges1
                                      NULL, // szMassRanges2
                                      0.0,  // dDelay: must be 0.0 for MS
                                      &dStartTime,
                                      &dEndTime,
                                      0,    // nSmoothingType
                                      0,    // nSmoothingValue
                                      &varChroData,
                                      &varPeakFlags,
                                      &nArraySize );

if( nRet != 1 )
{
    ::MessageBox( NULL, _T("Error getting chro data."), _T("Error"), MB_OK );
    nArraySize = 0; // skip the data loop below
}

if( nArraySize && varChroData.vt == ( VT_ARRAY | VT_R8 ) )
{
    // Get a pointer to the SafeArray
    SAFEARRAY FAR* psa = varChroData.parray;

    // The array holds time/intensity pairs, so view it as ChroDataPeak records
    ChroDataPeak* pDataPeaks = NULL;
    if( SUCCEEDED( SafeArrayAccessData( psa, (void**)(&pDataPeaks) ) ) )
    {
        for( long j=0; j<nArraySize; j++ )
        {
            double dTime = pDataPeaks[j].dTime;
            double dIntensity = pDataPeaks[j].dIntensity;

            // Do something with time intensity values
        }

        // Release the data handle
        SafeArrayUnaccessData( psa );
    }
}

if( varChroData.vt != VT_EMPTY )
{
    SAFEARRAY FAR* psa = varChroData.parray;
    varChroData.parray = NULL;

    // Delete the SafeArray
    SafeArrayDestroy( psa );
}

if( varPeakFlags.vt != VT_EMPTY )
{
    SAFEARRAY FAR* psa = varPeakFlags.parray;
    varPeakFlags.parray = NULL;

    // Delete the SafeArray
    SafeArrayDestroy( psa );
}

brennmat

Apr 22, 2009, 1:45:01 AM
to spctools-discuss
Thanks. You can send it to my email account.

brennmat

Apr 22, 2009, 1:52:21 AM
to spctools-discuss, Matthias Brennwald
Dear Matt

I moved the pwiz folder to C:\ and ran quickbuild.bat again by
double-clicking it. It took a long time, so I left it running when I
went home. This morning it was finished, but I could no longer see the
results (the window had closed automatically). I could not find any
*.exe files, either because I did not know where to look for them or
because they were not created. So I tried running quickbuild.bat from
a terminal and got the following message after a while:

...failed updating 3 targets...
...skipped 56 targets...

I think I'll take a break now. I will be on holiday for the next two
weeks, so we are not in a hurry. Please let me know if you include
support for the Analog channel in your binaries in the meantime.

Thanks
Matthias

